A little bit about ABI SOLiD Sequencing

The following is an excerpt from my research project into ovarian cancer at the Peter MacCallum Cancer Centre.

SOLiD sequencing

SOLiD sequencing uses “colour space”, in which fluorescently labelled probes are bound to the DNA two nucleotides at a time, so that each colour call describes a pair of adjacent bases.  For example, a “blue” on the first di-nucleotide pair (bases 1 & 2) corresponds to a double of any nucleotide (AA, TT, GG or CC).  The next di-nucleotide to be read (bases 2 & 3) may then be “red” (AT, TA, CG or GC).  If we know that nucleotide 1 is a “C”, then the “blue” call can only correspond to nucleotide 2 being a C.  Since we now know that nucleotide 2 is a C, “red” can only correspond to nucleotide 3 being a G.  Thus it is important that the true base of the first nucleotide is known, as any mistake will mean that the entire read is decoded incorrectly.  The first base is generally well known, as it is the last base of the adapter sequence (Applied Biosystems, 2008).
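The decoding step described above can be sketched in a few lines of Python.  This is a minimal illustration rather than anything taken from the Applied Biosystems software: it assumes each base is given a 2-bit code (A=0, C=1, G=2, T=3) and that a colour is the XOR of the two adjacent base codes, which makes colour 0 the "same base twice" call.  The function names are my own.

```python
BASES = "ACGT"  # assumed 2-bit encoding: A=0, C=1, G=2, T=3


def encode(seq):
    """Return one colour (0-3) per adjacent pair of bases in seq."""
    return [BASES.index(a) ^ BASES.index(b) for a, b in zip(seq, seq[1:])]


def decode(first_base, colours):
    """Rebuild the bases from a known first base and a list of colours."""
    bases = [first_base]
    for colour in colours:
        # Each colour, combined with the previous base, fixes the next base.
        bases.append(BASES[BASES.index(bases[-1]) ^ colour])
    return "".join(bases)


colours = encode("CCG")       # the C, C, G example from the text
print(colours)                # the first colour is 0, the "double" call
print(decode("C", colours))   # decoding with the correct first base
print(decode("A", colours))   # decoding with a wrong first base
```

Decoding with the wrong first base changes every base in the result, which is why the known adapter base matters so much.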

When a reference genome is also available, the “colour space” measurements enable the detection of errors.  Without additional error correction, a single erroneous colour call will affect the calling of all downstream nucleotides.  However, where a reference is known, a single incorrect call can be detected: the read matches the reference perfectly (in colour space) aside from one call, which identifies the misread colour.  Additionally, since a SNP changes two adjacent colours, there is a clear distinction between a SNP call and a single erroneous call.
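To make the distinction concrete, here is a small sketch that compares a read to the reference in colour space.  It assumes the same XOR-style encoding of colours as numbers 0-3, and the function name and labels are hypothetical; it is not how BioScope does it.  Under that encoding a real SNP changes two adjacent colours while leaving their XOR unchanged, so the sketch checks for that as well.

```python
def classify_mismatches(ref_colours, read_colours):
    """Label the difference between a read and the reference in colour space."""
    diffs = [i for i, (r, c) in enumerate(zip(ref_colours, read_colours)) if r != c]
    if not diffs:
        return "match"
    if len(diffs) == 1:
        # One isolated colour change cannot be explained by one base change.
        return "single colour error"
    if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
        i, j = diffs
        # A SNP alters both colours but preserves their combined (XOR) value.
        if ref_colours[i] ^ ref_colours[j] == read_colours[i] ^ read_colours[j]:
            return "candidate SNP"
    return "other"


reference = [1, 3, 1, 3, 1]   # colours for the bases ACGTAC
with_snp = [1, 2, 0, 3, 1]    # colours for ACTTAC (G to T at the third base)
with_error = [1, 3, 2, 3, 1]  # one colour misread

print(classify_mismatches(reference, with_snp))
print(classify_mismatches(reference, with_error))
```

A change at the very first or last base of a read only touches one colour, so a real tool would need to treat the read ends separately.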

(Applied Biosystems, 2008)

The BioScope alignment software supplied by Applied Biosystems uses all this information to produce the alignment that we will be utilizing. 

Long mate pair & library preparation

Long mate pairs enable the sequencing of the two ends of a single DNA fragment of known length.  The distance between the two reads is known as the insert size, which is known prior to the alignment.  By comparing the known insert size to the insert size observed following alignment, it is possible to detect structural variations such as tandem duplications.

In order to sequence DNA in this way, the DNA is randomly sheared, creating a distribution of different length DNA fragments.  Fragments are then selected based on their size (in our case 1500 bp), and are capped with adaptors by ligation.  The capped fragments are then circularized, binding the two ends of the fragment together.  A nick translation reaction enables the circularized fragment to be cut on both sides of the point where the two ends are joined.  By controlling the time and temperature of the reaction, the distance of the cuts from the join can be controlled, which sets the length of DNA at each end (e.g. 50bp).  The now bare ends of the fragment are then ligated with adaptors and are suitable for sequencing.  This process is shown in the following diagram:

(Applied Biosystems, 2010)



Despite the error correcting technologies employed by this sequencing technology, the data is far from perfect.  The detection of sequencing errors by relying on the reference sequence is not infallible, as it may not be possible to know whether a read fails to map because of a single colour space read error, or because the read actually covers a structural variation.  Obviously, the statistics far favour the likelihood of an error in reading (which is claimed to be 0.1%) (Applied Biosystems 2008) over the chance of the read being correct and there being a structural variation, but when all these reads are corrected in this manner, it may be difficult to find “true” structural variations.

The library preparation process is difficult and needs to be exacting.  The selection of fragments by size requires meticulous wet-lab work, with the difference between selecting for 600bp fragments and 6kb fragments being the difference between a 1% agarose solution and a 0.8% agarose solution (Applied Biosystems 2010).  This process will always lead to significant variability in the length of the fragments.  Given that it is the insert size that is the primary signal we are looking for, this is something that needs to be corrected for.  However, the orientation of the pairs should not suffer from this variability to the same degree, due to the simpler chemistry of how the different pair adaptors are joined to the fragment.

Structural variation detection: Second generation sequencing

The following is an excerpt from my research project into ovarian cancer at the Peter MacCallum Cancer Centre.

There are four primary methods of detecting structural variation (SV) within second-generation sequencing: read pair discordance, read depth analysis, split read analysis and sequence assembly.  Sequence assembly is not yet feasible with short-read whole-genome human data, so it will not be discussed.

Read pair discordance

As mentioned previously, both SOLiD and Illumina sequencing technologies provide the opportunity to read two ends of the same strand of DNA.  The sequenced strands are of a known length, so there is an expectation that the two reads should be within a known distance of each other on the genome.

Paired end discordance methods take the reads which have been aligned to the reference genome and look at how far apart the two reads of each pair align.  If the first read is found the expected distance from the second read (and in the correct orientation) then the pair is said to be concordant.  If not, the pair is discordant and may provide evidence that the sample genome differs structurally from the reference genome, i.e. evidence of a structural variation.
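The concordance test can be sketched as a simple rule.  This is an illustration only: the record type, the function name, the tolerance value and the expected orientation are all assumptions of mine (the expected orientation depends on the library type, so it is passed in rather than hard-coded).

```python
from dataclasses import dataclass


@dataclass
class AlignedPair:
    """Hypothetical record holding where the two reads of a pair aligned."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, expected_insert, tolerance, expected_strands):
    """Return 'concordant' or a short reason why the pair is discordant."""
    if pair.chrom1 != pair.chrom2:
        return "discordant: different chromosomes"
    if (pair.strand1, pair.strand2) != expected_strands:
        return "discordant: orientation"
    observed = abs(pair.pos2 - pair.pos1)
    if observed > expected_insert + tolerance:
        return "discordant: insert too large"
    if observed < expected_insert - tolerance:
        return "discordant: insert too small"
    return "concordant"


pair = AlignedPair("chr1", 10_000, "+", "chr1", 11_450, "+")
print(classify_pair(pair, expected_insert=1500, tolerance=300,
                    expected_strands=("+", "+")))
```

In practice the tolerance would be derived from the observed insert size distribution of the library rather than chosen by hand.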

There are a large number of tools that utilize this basic methodology for the detection of structural variation.  A key metric for the use of these tools is the number of citations that each tool has received; these are summarized below.






[Table: citations per tool (Oct-2011).  Only the journal entries are recoverable: Nature Methods, Genome Research, Genome Biology, Nature Methods.]

























Due to the number of tools available and the word limit of this piece, I will only discuss BreakDancer in detail.


BreakDancer takes the aligned genome, in the form of SAM or BAM files, and looks for areas within the sample genome that contain more discordant pairs than would be expected through random chance.  These regions are then classified into six types: normal, deletion, insertion, inversion, intra-chromosomal translocation and inter-chromosomal translocation.  Categorization depends on the size of the insert size discordance and the orientation of the reads.  Regions with two or more discordant reads are considered for further analysis using a Poisson model that considers the number of supporting reads, the size of the anchoring region and the coverage of the genome.  The type of the structural variation call is then decided by the type with the most anchored reads.
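As a rough illustration of how a Poisson model can score such a region, the sketch below asks how likely it is to see at least the observed number of discordant pairs in a region of a given size if discordant pairs were scattered uniformly over the genome.  The uniform-scatter assumption, the function names and the example numbers are mine, and this is not BreakDancer's actual scoring code.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X following a Poisson distribution with mean lam."""
    below = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k))
    return max(0.0, 1.0 - below)


def chance_of_cluster(supporting_reads, region_size, discordant_total, genome_length):
    """Probability of seeing this many discordant pairs in the region by chance."""
    # Expected count if discordant pairs landed uniformly at random (assumption).
    lam = region_size * discordant_total / genome_length
    return poisson_tail(supporting_reads, lam)


# Hypothetical numbers: 5 supporting pairs in a 1 kb anchoring region,
# 30,000 discordant pairs of this type across a 3 Gb genome.
print(chance_of_cluster(5, 1_000, 30_000, 3_000_000_000))
```

The smaller this probability, the less plausible it is that the cluster of discordant pairs arose by chance alone.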

A key difficulty with using BreakDancer is the number of false positives that the tool creates, as seen in the following table:











[Table: BreakDancer calls for the normal and tumour samples.  Only the row label “Intra-chromosomal translocations” is recoverable.]








It should be noted that, in the configuration that was run, BreakDancer was not set to look for inter-chromosomal translocations.

These results imply that the normal sample has more structural variation than the tumour sample, which is known not to be the case.  Interpreting these results will be a key aspect of this research project.

Read depth techniques

A key advantage of read depth techniques is their ability to detect SV within highly repetitive regions of the genome, as “paired-end mapping frequently cannot unambiguously assign end sequences in duplicated regions, making it impossible to distinguish allelic and paralogous variation” (Alkan, Kidd et al. 2009).


MrFAST (micro-read fast alignment search tool) is a tool designed to detect CNV using second generation sequence data.  It is also the most popular SV tool described in this review, having been cited by 103 papers as of October 2011.  There is also an additional version called “DrFast” which is designed for SOLiD color-space data.

MrFAST attempts to align the raw reads to a reference genome, much like aligners such as BioScope or MAQ.  However, MrFAST differs in two key details.  Firstly, it does not attempt to map the full reads, instead breaking a read up into k-mers (with a default length of 12), which are then aligned.  Secondly, most aligners, when faced with a read that matches multiple loci within the genome, will select a locus at random.  MrFAST, on the other hand, will map the read to all matching loci in order to reduce variability.  Additionally, MrFAST tracks the “edit distance” for each read at each locus, both to reduce the impact of sequencing error and to enable the calling of SNPs, which is important in determining whether a called copy is actually functional or not (Alkan, Kidd et al. 2009).
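The two ideas, seeding with k-mers and reporting every matching locus, can be sketched as follows.  This is a toy illustration, not MrFAST's implementation: the function names are mine, the index is a plain dictionary, and a simple mismatch count stands in for a true edit distance (which would also allow insertions and deletions).

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def all_loci(read, reference, index, k, max_mismatches):
    """Return every (start, mismatches) locus that a seed from the read supports."""
    hits = {}
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


reference = "GATTACAGGGATTACACC"   # made-up sequence with a repeated region
index = build_index(reference, k=3)
print(all_loci("GATTACA", reference, index, k=3, max_mismatches=1))
```

Keeping both loci, rather than picking one at random, is what lets the read depth over duplicated regions reflect the true number of copies.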

The results called by MrFAST were validated using Array-CGH and FISH Analysis. It is difficult to quantify the success rates of the Array-CGH validation as they only sought to validate those called duplication intervals that were not shared across all three of their samples, in which case they found a validation rate of 68%.  They also used FISH analysis to validate 11 duplicated loci that were different between two of their samples, finding the FISH results to be “highly consistent with the absolute copy number predicted by MrFAST” (Alkan, Kidd et al. 2009).




CNV-Seq uses a different model to MrFAST: rather than seeking to detect the absolute number of copies, CNV-Seq uses a comparative technique, enabling the detection of differences between two samples.  This is of particular interest for cancer data, where we have two samples, tumour and normal, and the primary interest is in the differences.

Both samples are aligned to a reference genome.  Using a sliding window, the read depth of each window is calculated for each sample.  The read depth distributions are compared using a Poisson model (using a normal approximation) that enables the calculation of a probability that the difference is the result of random chance (Xie and Tammi 2009).
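A simplified version of this comparison is sketched below.  It is not the CNV-Seq code: it uses fixed, non-overlapping windows, assumes both samples have the same total number of mapped reads, and tests the difference in counts with a plain normal approximation.  All names are hypothetical.

```python
import math


def window_counts(read_starts, window, genome_length):
    """Count reads per fixed-size window from 0-based read start positions."""
    counts = [0] * math.ceil(genome_length / window)
    for pos in read_starts:
        counts[pos // window] += 1
    return counts


def compare_windows(tumour_counts, normal_counts):
    """Return (z, p) per window for the difference between the two samples."""
    results = []
    for x, y in zip(tumour_counts, normal_counts):
        if x + y == 0:
            results.append((0.0, 1.0))  # no reads in either sample
            continue
        # If both counts are Poisson with the same mean, x - y has variance
        # of roughly x + y, so z is approximately standard normal.
        z = (x - y) / math.sqrt(x + y)
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
        results.append((z, p))
    return results


print(compare_windows([100, 200], [100, 100]))
```

If the two samples were sequenced to different depths, the counts would need to be scaled to a common total before they could be compared in this way.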

Calls made by CNV-Seq were validated against the low-coverage genomes of Dr Craig Venter and Dr J Watson (at 7.5x and 7.4x coverage), with the results achieving a 50% overlap with known regions.  Results were also compared to a-CGH micro-array experiments, with the majority of calls not being validated; however, this was seen as evidence of the superior sensitivity of the CNV-Seq technique.



Split read techniques

Split read techniques attempt to locate the exact junction of a break point by looking for reads which have “hard clips”, which is to say that a large portion of the read maps to the reference genome, but the remaining section does not.  One possible explanation for a hard clip is that there is a structural variation in the region that the read spans.  Thus part of the read may map to one locus, and the remaining hard clipped region may map to another, providing evidence of a structural variation.

This technique requires long reads (the CREST algorithm requires reads of > 75bp in length (Wang, Mullighan et al. 2011)) to be effective.  With shorter reads, the hard-clipped region is typically very short, meaning that it may map to a very large number of regions within the genome making it impossible to detect the secondary loci.  
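A back-of-the-envelope calculation shows why short clipped regions are a problem.  The sketch below estimates how many times a fragment of a given length would match a genome purely by chance, assuming a genome of uniformly random bases and an exact match on either strand.  Real genomes are far more repetitive than this, so the true numbers are worse; the function name and example lengths are my own.

```python
def expected_chance_hits(k, genome_length, strands=2):
    """Expected number of exact matches of a k-base fragment in a random genome."""
    positions = genome_length - k + 1   # possible start positions
    return strands * positions / 4 ** k


human_genome = 3_000_000_000  # approximate size in bases
for k in (10, 15, 20, 25):
    print(k, expected_chance_hits(k, human_genome))
```

Under these assumptions a 10 base clipped region is expected to match in thousands of places, while a 25 base region is very unlikely to match anywhere by chance.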




BBQ: Greek style whole lamb

There is something very primal about cooking an entire animal in one go.  It also requires a lot of time, patience and beer to do correctly.


  • 15kg whole lamb
  • 12 lemons
  • 24+ beers

Animal selection & preparation

The most critical step is to select an appropriate animal for cooking.  For most of us, this means speaking with your local butcher in advance so that they can find you the perfect beast.  For this recipe, I am using a 15kg spring lamb.  This is about as big as I could manage on the 120cm spit roasting pole that I am using.

It's also important to name your lamb.  Here we have "Lambert", sitting on the counter at my local butcher.  As a ball park cost, Lambert cost me about $120.

Attaching your beast to the pole is tricky.  Essentially the pole goes through the upper chest of your beast, then out its rectum.  Hooks go through the animal's spine and the pole to keep him securely attached to the pole, while its legs are hooked through the attachment.  

Place 3 or 4 cut up lemons into the body cavity and stitch him up.

In my case I didn't have any proper butcher's string or wire, so I cut up coat hangers and bent them with pliers which actually worked really well.

With your beast on the pole and all stitched up, it's time for the hardest part of the cook: ensuring that the lamb is properly balanced on the pole.  Once you have moved the pole onto the barbecue, start the motor up to ensure that it spins evenly.  If the weighting is not correct, your beast will "fall" down one side then get stuck as the rotisserie tries to lift the heavier side.  This will not only result in uneven cooking, but also potentially break your motor.

Rub salt and pepper all over your beast, so that as he cooks, the fat and juices have something to hang on to.


Start a nice big fire in your rotisserie.  I personally try to start with paper and kindling, but if you feel like cheating then I guess you could use fire lighters.



You will want to set aside at least 6 hours to cook your lamb (12-14 beers).  "Low and Slow" is always the best for cooking any meat, and it's doubly true for doing a whole animal in one go!

Prepare a big bowl of oil and lemon in approximately even quantities.  I'd start with about 6 lemons and an equal portion of oil.  Season liberally with salt and pepper and oregano.

Cook the lamb, basting every 20 minutes (or every second beer).

Keep in mind that as you cook, the lamb will crack and shift.  This means that it may become unbalanced on the spit, so you will have to keep a keen eye on it (well as keen as your beer goggles will allow).

For extra authenticity and to impress your friends, start basting the lamb with a bunch of basil.




Carving the lamb is surprisingly enjoyable (although the hundred beers consumed may have played a role).  The key is to avoid cutting your friends' fingers off as they try to sneak a taste of the delicious, succulent meat.

We all enjoyed Lambert on pita bread complete with tomato, lettuce, onion, garlic sauce, chilli sauce and of course, more lemon!

BBQ: Apple City BBQ Sauce

This recipe is taken from Peace Love and Barbecue and modified by me (basically to make it hotter!).  I may be biased, but this is by far the most delicious BBQ sauce I have been lucky enough to taste.


  • 1 cup tomato sauce (Masterfoods)
  • 2/3 cup rice wine vinegar
  • 1/2 cup Bulmers Apple Cider
  • 1/4 cup Apple Cider Vinegar
  • 1/2 cup raw brown sugar
  • 1/4 cup Worcestershire sauce
  • 1 tablespoon American mustard
  • 1 teaspoon garlic powder
  • 1/2 teaspoon ground white pepper
  • 1 teaspoon cayenne
  • 1 teaspoon chilli powder (or more if you dare)
  • 1/2 cup Bacon bits, ground fine
  • 1/2 cup finely grated apple
  • 1/3 cup finely grated onion


Combine all ingredients aside from apple and onion in a saucepan and bring to the boil.  Slowly add apple and onion and simmer for about 20 minutes to thicken.

BBQ: Magic Dust

"Magic Dust" is just that, magic.  This paprika based dry rub is the default preparation for just about any meat, from pork to chicken or beef (probably wouldn't work so well with lamb however).


  • 1/2 cup of paprika
  • 1/4 cup of freshly ground salt
  • 1/4 cup of raw, unprocessed sugar
  • 1/3 cup of cumin
  • 3 tablespoons of mustard powder
  • 3 tablespoons of freshly ground black pepper
  • 1/3 cup of garlic powder
  • 2 tablespoons of cayenne
  • 1 tablespoon of chilli powder
  • 2 tablespoons of bacon bits (crushed)


Place bacon bits in a mortar and pestle and grind until a fine powder.

Add the remaining ingredients and mix together.

That's it!

BBQ: Tezza's Ribs

It's a little-known fact that BBQ ribs are the most delicious food in the world.  The following recipe is mostly stolen from the "Apple City Barbecue Grand World Championship Ribs" recipe found in Peace Love and Barbecue.




Build a nice fire and burn down coals until white hot.  Around 20-30 large coals should do the trick.  With the BBQ lid closed, the temperature should stabilize at about 100 degrees centigrade.

Place wood chips in a bucket of warm water, and a splash of apple cider vinegar.

With a sharp knife, clean up the ribs, removing any excess fat or dangly bits.  Cover ribs with the magic dust and set aside for 30 minutes.

Place ribs on the BBQ bone side down, cooking over indirect heat (i.e. move the coals out from under the meat).  Add a handful of the wet chips to start the smoking.

Cook ribs for about 6 hours, turning each hour or so.

In the last 20 minutes of cooking, lather the top side of the ribs with the BBQ sauce and leave for about 10 minutes, before doing the other side and leaving for a further 10 minutes.

Remove from the grill, adding more BBQ sauce and leave to rest for about 10 minutes.

I had every intention of photographing the final result, but they looked too damn good, so I just ate them instead.