Thursday, 2 May 2013

Yeast Identification Test

A short while ago I wrote a post on a fairly technical method to identify wild (and not-so-wild) yeast. This method relies on sequencing a short piece of the yeast's genome; this sequence is then used to ID the yeast. In this article I work through an example to demonstrate how the method operates. Sadly, we have no official wild yeasts in this example, but we do have a couple of strains of Brett as well as a yeast sample from a batch of beer that may or may not have been contaminated. Specifically, I am testing:
  • Wyeast 1084 (Irish ale); a run-of-the-mill Saccharomyces cerevisiae strain of yeast.
  • White Labs Brettanomyces lambicus
  • White Labs Brettanomyces bruxellensis
  • A mystery yeast from my Guild's president - it's either White Labs Yorkshire Square, or a yeast which contaminated his latest brew.


For this example I am looking at four yeasts - Wyeast 1084 (Saccharomyces cerevisiae, Irish Ale), Brettanomyces lambicus & Brettanomyces bruxellensis from White Labs, and a (possibly) wild yeast that invaded the London Homebrewers Guild president's last couple of batches of beer...

In all cases, 1ml samples were taken from larger cultures being prepared for either freezing or brewing, the DNA was purified, and a PCR was performed on the purified DNA. The PCR was done as follows:
  • A 30ul PCR reaction was set up using Pfu polymerase, 0.5ul of the ITS1 and ITS4 primers (stock solution = 100uM) and 1ul of purified DNA.
  • The samples were run through 45 rounds of PCR amplification; each cycle comprised: 30sec at 98C, 30sec at 50C, 1min at 72C.
  • The PCR products were then run on a gel to visualize & purify the fragments.
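The numbers in the recipe above can be sanity-checked with a few lines of Python. This is only a sketch: it assumes 0.5ul of each primer was added (the recipe doesn't say whether that is per primer or in total), and it ignores the time the thermocycler spends ramping between temperatures. The function names are made up for illustration.

```python
# Reaction set-up values taken from the protocol above.
REACTION_VOLUME_UL = 30.0
PRIMER_VOLUME_UL = 0.5      # assumed to be per primer (ITS1 and ITS4 each)
PRIMER_STOCK_UM = 100.0

CYCLES = 45
STEP_SECONDS = {"denature_98C": 30, "anneal_50C": 30, "extend_72C": 60}


def final_concentration_um(stock_um, added_ul, total_ul):
    """Simple dilution: C_final = C_stock * V_added / V_total."""
    return stock_um * added_ul / total_ul


def cycling_minutes(cycles, step_seconds):
    """Total hold time over all cycles, ignoring ramping between temperatures."""
    return cycles * sum(step_seconds.values()) / 60


primer_um = final_concentration_um(PRIMER_STOCK_UM, PRIMER_VOLUME_UL, REACTION_VOLUME_UL)
run_minutes = cycling_minutes(CYCLES, STEP_SECONDS)

print(f"Each primer: {primer_um:.2f} uM final")
print(f"Cycling time: {run_minutes:.0f} min")
```

Under those assumptions each primer ends up at a little under 2uM, and the 45 cycles take about an hour and a half of hold time.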
First Attempt.
In the first attempt only the Brett produced any bands. It is unclear at this point why this happened - but I suspect I mis-measured or mis-mixed the last two samples.

The gel to the left shows the result of the first PCR attempt. These gels separate DNA pieces by size, with the largest pieces at the top. The left-most lane is a DNA ladder - essentially a mix of DNA pieces of known size. If you go up the ladder from the bottom you'll hit a gap - the band in the middle of the gap is 1500 base pairs (bp) in size; the one below it 1000bp. Every band below the 1000bp is 100bp smaller than the one above. The bright bands between 400bp and 500bp in the two lanes to the right of the ladder are the ITS regions from B. bruxellensis and B. lambicus. B. bruxellensis has an ITS of ~450bp in length, while B. lambicus is slightly larger (but less than 500bp). The fainter bands are minor products that are not uncommon when using an annealing temperature of 50C (55-60C is more common). The two blank lanes to the right of the Brett lanes should have held the ITS regions from 1084 and the mystery yeast in my brew club's president's beer.

1084 ITS
I repeated the 'presidential' and 1084 PCR reactions, using double the yeast DNA just in case the DNA quantity was limiting last time. I was also extra-anal about mixing the reactions, and dropped the annealing temperature from 50C to 49C. This second attempt is shown to the right. Only the 1084 worked (I cropped the 'presidential' lane). Here, the ITS region is only around 350bp; roughly 100bp shorter than Brett. I re-repeated the presidential PCR, using an annealing temperature of only 45C; still no band. A third attempt also failed - whatever is in there is either indestructible, or so far removed from being a yeast as to be unidentifiable....
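For anyone unused to reading gels, here is a small sketch of how a fragment size maps onto the ladder. It models only the rungs described in this post (100bp steps up to 1000bp, then 1500bp); any larger rungs on the real ladder are left out, and `flanking_rungs` is just a name made up for this example.

```python
# Ladder rungs as described in the post: 100 bp steps up to 1000 bp, then 1500 bp.
# Larger rungs above the gap are not described, so they are not modelled here.
LADDER_BP = list(range(100, 1001, 100)) + [1500]


def flanking_rungs(fragment_bp, ladder=LADDER_BP):
    """Return the (lower, upper) ladder bands a fragment would sit between."""
    lower = max((rung for rung in ladder if rung <= fragment_bp), default=None)
    upper = min((rung for rung in ladder if rung >= fragment_bp), default=None)
    return lower, upper


brett_bracket = flanking_rungs(450)   # B. bruxellensis ITS, ~450 bp
sacch_bracket = flanking_rungs(350)   # 1084 ITS, ~350 bp

print("Brett ITS runs between", brett_bracket)
print("1084 ITS runs between", sacch_bracket)
```

In practice you go the other way: you see which two rungs a band sits between and estimate the size from that, which is why the sizes quoted here are approximate.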

Sequencing Results:

The purified DNA was sequenced, producing the following result:
Sample    DNA Sequence    Species
Saccharomycetes sp. HZ94
Pichia membranifaciens strain CBS 212
Pichia membranifaciens isolate NCL 53

So there you have it - one out of three worked...what the hell happened? The answer is two things. The first is that the first 60% of the ITS is nearly identical between Sacch, Brett & Pichia. Because DNA search engines look for similarities, this biases results towards this common region. The second issue is species representation in the database. Sacch & Pichia (a plant pathogen) have thousands of sequenced strains in the database; as far as I can tell there are only a couple of Dekkera (another name for Brett) sequences in the database. As such, the sheer number of Sacch & Pichia strains dominates, biasing our results.

What if we tell the search engine to look for Dekkera and only Dekkera? It works! B. brux is ID'd as "Dekkera bruxellensis", B. lamb is ID'd as "Dekkera bruxellensis strain ATCC 56866" (technically, B. lamb is a sub-strain of B. brux). So hey - if we know what we got, we can identify it. That's. . .less than useful.

A large part of the issue appears to be the over-abundance of some species, which then bias our results. But we're not done yet - we know what sort of organisms are likely in beer, and BLAST lets us limit our results by genus or species (see red box in image to the right). So what do we get if we limit our search to a list of likelies? Specifically:
  • Saccharomyces
  • Pleurotus opuntiae
  • Cryptococcus kuetzingii
  • Candida krusei
  • Pichia fermentans
  • Rhodotorula mucilaginosa
  • Dekkera
  • Metschnikowia
  • Schizosaccharomyces
  • Hanseniaspora apiculata
  • Zygosaccharomyces
  • Aureobasidium
  • Torulaspora
  • Kluyveromyces
Everything comes up Pichia & Candida (damned pathogens, dominating a database). Exclude those and. . .it works! For the B. lamb and B. brux strains the first 25 & 28 hits are Dekkera or Brettanomyces. Searching 1084's sequence gives us the same hit as before. Yay!
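If you'd rather not type organisms into the web form one at a time, the same restriction can be written as a single Entrez-style query string. The sketch below only builds that string from the list above, dropping the Pichia and Candida entries; it does not run a search. The names `LIKELIES`, `EXCLUDED_GENERA` and `organism_filter` are made up for this example. As far as I know, a string like this can be pasted into the optional "Entrez Query" field on the BLAST page, or passed as the `entrez_query` argument of Biopython's `NCBIWWW.qblast`, but treat that as an assumption to check rather than a recipe.

```python
# The "likelies" list from the post.
LIKELIES = [
    "Saccharomyces", "Pleurotus opuntiae", "Cryptococcus kuetzingii",
    "Candida krusei", "Pichia fermentans", "Rhodotorula mucilaginosa",
    "Dekkera", "Metschnikowia", "Schizosaccharomyces",
    "Hanseniaspora apiculata", "Zygosaccharomyces", "Aureobasidium",
    "Torulaspora", "Kluyveromyces",
]

# Genera that swamped the results and were dropped from the search.
EXCLUDED_GENERA = {"Pichia", "Candida"}


def organism_filter(names, excluded_genera=frozenset()):
    """Build an Entrez query limiting hits to the named organisms.

    Any entry whose genus (first word) is in `excluded_genera` is left out.
    """
    kept = [n for n in names if n.split()[0] not in excluded_genera]
    return "(" + " OR ".join(f'"{n}"[Organism]' for n in kept) + ")"


query = organism_filter(LIKELIES, EXCLUDED_GENERA)
print(query)
```

Building the filter from a list makes it easy to add or drop a genus and re-run the search when one group starts dominating the hits.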

Will This Actually Work?

The answer is . . . yes, so long as we include some classical methods. In most cases microscopy + DNA sequencing should be enough to ID the species: by using morphology we can reduce our list of possibles to something manageable, and then use DNA for the final identification. This weekend (as in, 2 days from now) I'm starting the first real test of this method - a fully wild ferment using uncrushed malted barley as a source material. With luck, in a few weeks, I'll have a legitimate example to post.

A Note on DNA Sequences.

Example of a DNA sequencing error.
The small peaks are good sequence
reads, the large peaks lead to errors.
The more vigilant may have noticed a series of 'N's in the DNA sequence. But DNA is made of A's, T's, C's & G's. What's going on? It has to do with the Sanger method used to sequence DNA. In short, this method uses PCR to copy DNA, but adds to the mix a small number of DNA bases which have a fluorescent molecule of a specific colour attached where the next DNA base would normally be added. When these get incorporated into the DNA, they stop the PCR reaction. Because this process is random, and the fluorescent DNA bases are rare, you end up with a mix of different lengths of DNA, each terminated by a single coloured DNA base. By separating these pieces by size, then 'reading' the colour of each piece, we assemble the sequence.
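The logic of that last step is easier to see in a toy simulation. The sketch below is a deliberate simplification: it works directly on the strand being synthesized, treats every base as having the same small chance of being a labelled terminator, and uses a made-up sequence. The function names, the 5% termination chance and the number of molecules are all assumptions chosen for illustration, not values from a real sequencing reaction.

```python
import random


def sanger_fragments(strand, n_molecules=5000, p_terminate=0.05, seed=1):
    """Simulate chain termination on many copies of the same strand.

    Each copy is extended base by base; at every position there is a small
    chance that a labelled base is incorporated, which stops extension.
    Returns one (length, terminal_base) pair per terminated copy.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for position, base in enumerate(strand, start=1):
            if rng.random() < p_terminate:   # labelled base incorporated
                fragments.append((position, base))
                break                        # extension stops here
    return fragments


def read_sequence(fragments):
    """Sort fragments by size and read the terminal 'colour' of each size."""
    base_at_length = {}
    for length, base in fragments:
        base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


strand = "ATGCGTACCTGAAGTCGGATCCATTGCA"   # made-up sequence, illustration only
fragments = sanger_fragments(strand)
recovered = read_sequence(fragments)
print("Recovered the original strand:", recovered == strand)
```

Copies that never pick up a labelled base are simply dropped, since they carry no colour to read. With enough copies, every length from 1 up to the full strand is represented, which is what lets the sizes be read off in order.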

On occasion, errors happen - an incorrect colour gets incorporated, random gunk creates a fluorescent blob, etc. This creates ambiguity in the DNA sequence. For example, in the above image there are neat, single-colour peaks giving good sequence, but in the middle there are some overlapping red and green signals (which represent A's and T's). This ambiguity leaves two bases which cannot be identified, leading to two 'N's in the sequence.
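A quick way to judge how much of a read is affected is to count the 'N's and note where they fall. This is a minimal sketch using an invented read with two uncalled bases; `ambiguous_calls` is a name made up for this example.

```python
def ambiguous_calls(read):
    """Return the 1-based positions of 'N' calls and the fraction of the read they cover."""
    positions = [i for i, base in enumerate(read.upper(), start=1) if base == "N"]
    fraction = len(positions) / len(read) if read else 0.0
    return positions, fraction


toy_read = "GATTACANNGATTACA"   # invented read, not one of the sequences from this post
positions, fraction = ambiguous_calls(toy_read)
print(f"N at positions {positions}; {fraction:.1%} of the read")
```

A couple of isolated N's in the middle of a read usually don't matter for a database search; long runs of them, typically towards the end of a read, are a sign the tail should be trimmed.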


  1. Conserved primer sequences for PCR amplification and sequencing from nuclear ribosomal RNA - webpage outlining primers and methods to ID yeast by sequencing.
  2. Brewhouse-Resident Microbiota Are Responsible for Multi-Stage Fermentation of American Coolship Ale - Free scientific article on the yeasts/bacteria found in lambic-style beer & the use of sequencing primers to ID the species within the sample.
  3. NCBI BLAST - search multiple DNA databases for genome sequences.
  4. Yeast Genome Database - database of yeast & other fungi genomes, includes a BLAST feature.


  1. Hi Bryan,
    nice write-up. I have some questions. I did a multiple sequence alignment of your three sequences and the only differences between them are from the sequencing errors. Since this is the case, I see no point in allocating the sequences to different organisms if they show such high similarities. I further aligned the sequences to a rRNA sequence of D. bruxellensis (AM850055) and there are significant differences between the AM850055 sequence and yours. I therefore wouldn't conclude that some of your sequences map to Dekkera and others don't. For example, if you blast the 1084 sequence against the Dekkera database, you get Dekkera hits as well... However, with gaps like in the case of blasting WLP B. lambicus against the Dekkera database.

    Another question concerning the primers. Did you trim your sequences somehow? Because I can only find binding regions for the reverse primers.

    Yet another question. Why is the sequenced sequence of WLP B. lambicus (584 bp) longer than the one you see on the gel?

    As a side note, I did some blasting and to me it seems that you cannot, for example, amplify Kloeckera rRNA with your primers. Bokulich et al did not use these primers to amplify Kloeckera anyway. I therefore expect that several yeasts cannot be identified with your actual setup. Maybe this is the reason why you couldn't amplify the mystery yeast.

    My last thoughts are about the sequencing errors. In my experience, sequencing errors most often occur in highly repetitive sequences. But in case of your sequences, the errors are not in repetitive sequences. Might this be because of the Taq polymerase you use? We use proof reading enzymes in our lab in case we want to sequence something. And the sequences we get are very low in sequencing errors.

    Good luck with your next experiments, cheers Sam

  2. One Q at a time:
    I made some copying errors in moving sequences over from the sequence files; there are three reads of Bret B sequence in there, instead of the consensus read of each. I'll try to remember to enter the correct ones tomorrow at work.

    The "missing" 5' sequence occurs because I use the same forward primer to sequence as I do to amplify out the genes. Because of the way sequencing works you tend to not get sequence closest to the primer, and instead get sequence ~50bp downstream of the primer.

    The 'extra' length of the one sequence is simply a product of the inaccuracy of gels for determining size. There are better gels with higher resolution (polyacrylamide gels), but for this kind of work we use the poor-quality gel (agarose) as it is easier to recover the DNA from. The caveat then becomes, especially for smaller pieces of DNA, that the gel only reflects the size to within ~100bp.

    As for the sequencing errors, it wouldn't be the enzyme; the mutations it adds are random, and the mutation rate low enough, that it's statistically impossible to generate these sorts of read errors. The error I showed in the diagram was unusual; most of the errors are later in the sequence, where the peak-heights are small. These end-run errors are common in sequencing, and usually reflect nothing more than poor signal-to-noise in the machine.

    I'm not overly happy with the results - further reading (of course, after ordering the primers) led me to learn that the ITS region is best for other groups of fungi, but is less ideal for yeasts. Based on those readings I'm re-orientating my plan and instead will (probably) switch to large ribosomal subunit sequencing. But before then I'm going to download some genomes and do some "in silico" tests to make sure the new sequencing approach will work better. I.E. instead of just jumping in, I'll do all my homework first.

  3. Thanks for your replies.
    For the missing end. This makes sense if you only sequence from one side using one primer only. I am used to sequencing stuff with two primers. In this case, I would have sent in the DNA once with the forward and once with the reverse primer. Then I would get the sequences of both ends plus two sequencing results.

    Believe me, I read through a bunch of papers concerning yeast differentiation based on ITSs over the last years and I still haven't found a good way to apply ITS sequencing on our particular problem. That's why I am very sceptical about all ITS-based sequencing stuff. If one wants to stick to the ITS PCRs, multiple primers might be useful. Some specific for Saccharomyces, some specific for some Dekkera species and others for yeast species more abundant in beer such as Kloeckera, Pichia etc. That's why I might give the MALDI-TOF a go. Simply because it is easy and fast. The only disadvantage are the missing reference databases.

    I guess you are aware of primer-BLAST at NCBI. This gives you an idea what you might amplify using the primers you now have on hand. And there is another yeast hunter doing some work on yeast identifications:

    Good luck and I hope you get some great results soon, Cheers Sam

  4. The problem with 2 primers is it costs 2X as much - plus the ITS is only 300-500bp long, so you can get all of it (minus the bit by the primer) in one read. Like I said, my primer selection was a little naive; while some groups had great success, others didn't. I'm thinking seriously about the large subunit now - it looks to be as good, there are more sequences deposited covering a broader swath of yeasts, and it's bigger so it should be possible to get firmer matches.

    I am aware of primer-BLAST, but I'm not a fan of it. It's not very good at handling degeneracy in mixed samples. I have some matlab scripts which do a better job, assuming I feed them a reasonable set of genomes.

  5. Below some publications about yeast PCR methods I read so far. Maybe a multiplex PCR might work as well.

    - Cocolin et al (Molecular Detection and Identification of Brettanomyces/Dekkera bruxellensis and Brettanomyces/Dekkera anomalus in Spoiled Wines)

    - Curtin et al (Genetic diversity of Dekkera bruxellensis yeasts isolated from Australian wineries)

    - Ibeas et al (Detection of Dekkera-Brettanomyces Strains in Sherry by a Nested PCR Method)

    - Boekhout et al (Phylogeny of the Yeast Genera Hanseniaspora (Anamorph Kloeckera), Dekkera (Anamorph Brettanomyces), and Eeniella as Inferred from Partial 26S Ribosomal DNA Nucleotide Sequences)

    - Hayashi et al (Detection and identification of Brettanomyces/Dekkera sp. yeasts with a loop-mediated isothermal amplification method)

    - Kurtzman et al (Identification and phylogeny of ascomycetous yeasts from analysis of nuclear large subunit (26S) ribosomal DNA partial sequences)

    - Mitrakul et al (Discrimination of Brettanomyces/Dekkera yeast isolates from wine by using various DNA fingerprinting methods)

    Cheers, Sam

  6. Thanx! I'd only read about 1/2 of those, so I've got some more reading to do.

  7. Sam, I'd just like to thank you again for those papers. Two covered 26S ribosomal sequencing, and had great success with the NL-1 and NL-4 primers - and they sequenced the very species I'm likely to find (Sacch, Brett, Hanseniaspora, etc). I've ordered those primers and will hopefully be able to post again soon using the new ones.

    I just thought I'd comment on the DNA fingerprinting - that is a pre-(cheap) sequencing method that relied on using the size of various PCR or digest fragments to ID species. While it works well, you need to have a pre-existing database of fragment sizes. Unfortunately, those databases are generally incomplete, as cheap sequencing displaced them in the late 1990's.

    Again, thanx for the papers!


  8. You are welcome. I would be interested to know if you pick up sequence differences between B. bruxellensis and B. lambicus?

    Cheers and good luck with your next experiments,

  9. The PCR didn't work out as well as I hoped, with only the B. bruxellensis and B. lambicus PCRs working (kinda like what happened here - I'm thinking my DNA isolation on the latter 2 strains was lacking). I dropped the DNA off for sequencing today, so I should have the answer early next week.

    My guess is there may be a change or two, but I'm not expecting much. Technically, B. lambicus is actually a subspecies of B. bruxellensis. So differences may be small.


  10. From a taxonomy-genetic point of view, B. lambicus and B. bruxellensis might differ in their rDNA sequences by less than 0.5%. In other words, less than 3 nucleotides in the D1/D2 LSU rDNA. Sorry to correct you there, B.lambicus is a synonym of B. bruxellensis. Not a subspecies.

    Cheers, Sam

  11. I agree that there may be differences, but even in the variable regions, rRNA evolves slower than other genomic regions. I should have the sequences early next week, so we'll have a better idea then.

    As for B. lamb, its taxonomical status is uncertain. The ICBN oversees fungal plant species nomenclature; lambicus remains an accepted species and strain identifier in their catalogue (which is odd, as it should be one or the other, never both). In 'The Yeasts - A Taxonomic Study', the review on Brettanomyces/Dekkera shows that lambicus forms two discrete clades within the broader bruxellensis group; one inseparable from the prototype brux, and one mid-way between brux and anomala (about 1% variation).

    Long story short, in the real world there are not clear divisions between species. What we call lambicus is actually two unique clades; one an inseparable part of the brux group of species, the other an outlier mid-way between prototypical brux and prototypical anomala.

    As for my B. lambicus strain, I have no idea which "group" of lambicus it belongs to. I've already got the B. brux and B. anomala rRNA sequences from the strains used in "The Yeasts"; a 3-way comparison should show which lambicus I have - assuming it fits within the 2-clade paradigm...

  12. Can you tell me on what page in "Yeasts - A Taxonomic Study" by Kurtzman et al (I have the 5th edition) they show the discrete Lambicus clades? I can't find it. On p. 375 (Chapter 25) they list B. lambicus as a synonym for the D. bruxellensis species.

    1. I only have access to a partial copy of the 4th edition at the moment, via google books; my access to the 5th edition is limited to when I'm at work. I'm using the 4th edition for my upcoming blog post as I can link to the relevant page:

      That said, I'm 99% sure the same table appears in the 5th edition...somewhere in the first 1/3rd of the book if I recall.

      Short version is they used electrophoretic mobility (a method common pre-modern genetics) to measure evolutionary differences in a few enzymes. Using this method, "B. lambicus" appears as two distinct populations; one entirely within brux, and one mid-way between brux & anomalus.

      I've been collecting some of the 26S rRNA sequences from the B. lamb/brux/anomala strains mentioned in the table in "Yeasts"; alignments of those appear to mimic what was reported in "Yeasts" - some lambicus strains (e.g. CBS 75, ) are 99.5% to 100% identical to the prototypical brux; while one (CBS 5602) is clearly an outlier and actually clusters closer to the one anomala sequence I currently have.


  13. Sam, it looks like I have to eat crow*. The sequences arrived on a Sunday (imagine that). There was huge 26S variability between the bruxellensis and lambicus strains. I have the post up for these sequences, so look for the details there:


    *In the real world I have eaten crow - it was rather good, so I don't really understand this saying....

  14. Sorry to resurrect this thread, but I'm looking for your input on an observation that I made: microscopic examination of WLP653 revealed that the cells exhibited a Saccharomyces morphology instead of the Brett morphology of WLP648 and WLP650. I don't know whether I have a contaminated vial of WLP653 or whether this is indeed what the strain is supposed to look like. Although PCR might answer the question, I don't want to go to the expense of ordering primers. Look forward to a response.



    1. It is impossible to say, as the morphology of Brett can range from Sacc-like ovoids through to hyphae-like branching yeasts. You could ferment with it to see if you get the expected flavour profile; if it came straight from White Labs it is unlikely to be seriously contaminated.