AFLP-like step with 454 sequencing for studying population structure

In 2011, Simon Joly, researcher at the Jardin Botanique de Montréal, set up an experiment with Annie Archambault, research professional at the QCBS, using a high throughput (next-generation) sequencing method to study the population structure of ginseng (Panax quinquefolius) in southern Ontario and Quebec. This information is needed to establish conservation criteria for this rare plant species. The protocol used unidirectional amplicon sequencing on the Genome Sequencer FLX (GS-FLX) System with the then-current Titanium chemistry. The sequencing procedures were performed at the Centre d’Innovation McGill et Génome Québec, and the protocol for DNA library preparation is described in the following sections.

Sampling

Ten plants per population were collected from six populations of ginseng (Panax quinquefolius). Leaves were cut and dried in silica gel. Disclosing the precise localities of the populations could have a negative impact on this rare species. This is therefore sensitive information, kept confidential in accordance with the Agreement for the Protection and Recovery of Species at Risk between the governments of Canada and Quebec. All populations are at the northern limit of the Appalachians.

Molecular biology protocols

DNA extraction

A 10 mg sample of dried leaves was ground for one minute in a microcentrifuge tube with one tungsten bead in the TissueLyser (Qiagen). Total DNA was extracted using the EZ-10 Spin Column Genomic DNA kit for Plant Samples (BioBasics catalog number BS425-50), as recommended by the manufacturer. The quality and quantity of total DNA were evaluated by gel electrophoresis and by optical density measurement.

Genome complexity reduction

A modified AFLP strategy, inspired by the CRoPS technology¹⁾ (AFLP and CRoPS are registered trademarks of Keygene N.V.) and by a published study²⁾, was applied to Panax quinquefolius total DNA in order to efficiently discover sequence polymorphisms across a wide, random sample of the genome without sequencing the whole genome. One assumption of this AFLP-like method is that restriction sites within the genome are conserved among populations. The steps are described in the following paragraphs.

Restriction-digestion of total DNA

Each sample was digested with two different restriction enzymes, using a moderate amount of DNA. The enzymes used were a 4 bp cutter (here MseI, T/TAA) and a 6 bp cutter (here EcoRI, G/AATTC). Neither produces blunt ends: they leave an overhang of 2 (MseI) or 4 (EcoRI) nucleotides.
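As a rough illustration of the double digest, the sketch below cuts a made-up toy sequence at every EcoRI (G/AATTC) and MseI (T/TAA) site and keeps the fragments that carry one site of each kind at their ends. Only the top strand is modelled, and the toy sequence and function names are hypothetical, not part of the protocol.

```python
import re

# Recognition site and position of the cut within the site (top strand).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G/AATTC
    "MseI": ("TTAA", 1),     # T/TAA
}

def cut_positions(seq, site, offset):
    """Top-strand cut coordinates for every occurrence of `site` in `seq`."""
    # A lookahead is used so that overlapping occurrences are all reported.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]

def double_digest(seq):
    """Return internal fragments as (sequence, left_enzyme, right_enzyme)."""
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        cuts += [(pos, name) for pos in cut_positions(seq, site, offset)]
    cuts.sort()
    return [(seq[start:end], left, right)
            for (start, left), (end, right) in zip(cuts, cuts[1:])]

def mixed_end_fragments(seq):
    """Keep only fragments with an EcoRI cut at one end and an MseI cut at the other."""
    return [f for f in double_digest(seq) if {f[1], f[2]} == {"EcoRI", "MseI"}]

toy = "CCGAATTCAGGCTTAACGTTTAAGG"  # hypothetical sequence
for fragment in mixed_end_fragments(toy):
    print(fragment)
```

The two pieces outside the outermost cuts are ignored here, since they have a restriction site at one end only.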

Table 1 Reagents for digestion of plant genomic DNA, at 37 °C for 3 hours.

Reagent	Initial conc.	Qty added	Final conc. or Final qty
Template DNA	20 ng/µl	9 µl	180 ng
NEB4 Buffer	10X	4 µl	1X
EcoRI	100,000 U/ml	0.05 µl	5 U
MseI	10,000 U/ml	0.30 µl	3 U
BSA	10 mg/ml	0.4 µl	100 µg/ml
H2O	-	26.25 µl	-
Total volume	-	40 µl	-
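The quantities in Table 1 can be checked with simple arithmetic. The following is a minimal sketch that recomputes the amounts and final concentrations from the stock concentrations and volumes listed in the table. The helper names are illustrative only.

```python
TOTAL_VOLUME_UL = 40.0

def quantity_added(stock_per_ul, volume_ul):
    """Amount delivered = stock concentration (per µl) x volume (µl)."""
    return stock_per_ul * volume_ul

def final_concentration(stock, volume_ul, total_ul=TOTAL_VOLUME_UL):
    """C2 = C1 x V1 / V2, in the same unit as the stock."""
    return stock * volume_ul / total_ul

# Values copied from Table 1. Enzyme stocks are converted from U/ml to U/µl.
dna_ng = quantity_added(20.0, 9.0)
ecori_units = quantity_added(100_000 / 1000, 0.05)
msei_units = quantity_added(10_000 / 1000, 0.30)
buffer_x = final_concentration(10.0, 4.0)            # 10X stock
bsa_ug_per_ml = final_concentration(10_000.0, 0.4)   # 10 mg/ml = 10,000 µg/ml

# Water makes up the difference to the total volume.
water_ul = TOTAL_VOLUME_UL - (9.0 + 4.0 + 0.05 + 0.30 + 0.4)

print(f"DNA: {dna_ng:.0f} ng, EcoRI: {ecori_units:.1f} U, MseI: {msei_units:.1f} U")
print(f"Buffer: {buffer_x:.1f}X, BSA: {bsa_ug_per_ml:.0f} µg/ml, water: {water_ul:.2f} µl")
```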

Ligation of double stranded adaptors to the digested DNA

Two different double-stranded adaptors were prepared from the oligonucleotides listed in Table 2. Resuspended EcoRI_adapter1 and EcoRI_adapter2 oligonucleotides were mixed together, heated at 95 °C for 5 minutes, and slowly cooled down to form the double-stranded adaptor. The same procedure was applied to the MseI_adapter1 and MseI_adapter2 oligonucleotides. The EcoRI adaptor was diluted to a final concentration of 5 µM, while the MseI adaptor was diluted to a final concentration of 50 µM.

Table 2 Oligonucleotides for preparation of double stranded adaptors.

Oligo name	Modification	Sequence, 5' to 3'
EcoRI_adapter1		CTCGTAGACTGCGTACC
EcoRI_adapter2	5' phosphorylated	AATTGGTACGCAGTCTAC
MseI_adapter1	5' phosphorylated	TACTCAGGACTCAT
MseI_adapter2		GACGATGAGTCCTGAG
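The oligonucleotide pairs in Table 2 can be checked for complementarity. The sketch below is an illustration rather than part of the original protocol: it finds the duplex formed by each pair and the single-stranded 5' overhang left on the phosphorylated oligonucleotide, which should match the sticky end produced by the corresponding enzyme. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def duplex_and_overhang(unphosphorylated, phosphorylated):
    """Return (duplex length, 5' overhang of the phosphorylated oligo).

    The 3' end of the unphosphorylated oligo is assumed to pair with the
    3' portion of the phosphorylated one, leaving the 5' end of the latter
    single-stranded.
    """
    rc = reverse_complement(phosphorylated)
    for k in range(min(len(unphosphorylated), len(rc)), 0, -1):
        if unphosphorylated.endswith(rc[:k]):
            return k, phosphorylated[: len(phosphorylated) - k]
    return 0, phosphorylated

# Sequences copied from Table 2.
ecori = duplex_and_overhang("CTCGTAGACTGCGTACC", "AATTGGTACGCAGTCTAC")
msei = duplex_and_overhang("GACGATGAGTCCTGAG", "TACTCAGGACTCAT")

print("EcoRI adaptor:", ecori)  # overhang pairs with the AATT end left by EcoRI
print("MseI adaptor:", msei)    # overhang pairs with the TA end left by MseI
```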

Ligation of the double-stranded adaptors to digested DNA was performed in NEB4 Buffer using T4 DNA ligase and additional ATP. Table 3 describes the ligation reaction mix and Figure 1 illustrates the DNA fragments involved in adaptor ligation to digested DNA.

Table 3 Reagents for ligation of double stranded adaptors to previously digested DNA. A total volume of 10 µl of the ligation mix was added to the 40 µl of each digestion mix and incubated at 16 °C for 3 hours. Final concentrations refer to the combined 50 µl volume.

Reagent	Initial conc.	Qty added	Final conc. or Final qty
NEB4 Buffer	10X	1 µl	1X
EcoRI double-stranded adaptor	5 µM	1.5 µl	0.15 µM
MseI double-stranded adaptor	50 µM	1.5 µl	1.5 µM
T4 DNA ligase	2000 units/µl	0.1 µl	200 cohesive end units
ATP	10 mM	5 µl	1 mM
ddH2O	-	0.9 µl	-
Volume added to digestion mix	-	10 µl	-
Total volume	-	50 µl	-
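Because the 10 µl ligation mix is added to the 40 µl digestion, the final concentrations in Table 3 are obtained by diluting each stock into 50 µl. The sketch below recomputes them with C1 x V1 = C2 x V2. Variable names are illustrative.

```python
DIGEST_UL = 40.0
LIGATION_MIX_UL = 10.0
TOTAL_UL = DIGEST_UL + LIGATION_MIX_UL  # combined reaction volume

def diluted(stock, volume_ul, total_ul=TOTAL_UL):
    """Final concentration after dilution: C2 = C1 x V1 / V2."""
    return stock * volume_ul / total_ul

ecori_adaptor_um = diluted(5.0, 1.5)    # µM
msei_adaptor_um = diluted(50.0, 1.5)    # µM
atp_mm = diluted(10.0, 5.0)             # mM
# Buffer: 4 µl of 10X carried over from the digestion plus 1 µl of 10X added here.
buffer_x = diluted(10.0, 4.0 + 1.0)
ligase_units = 2000.0 * 0.1             # units/µl x µl

print(f"EcoRI adaptor: {ecori_adaptor_um:.2f} µM")
print(f"MseI adaptor: {msei_adaptor_um:.2f} µM")
print(f"ATP: {atp_mm:.1f} mM, buffer: {buffer_x:.1f}X, ligase: {ligase_units:.0f} units")
```

With these values the MseI adaptor works out to 1.5 µM, a ten-fold molar excess over the EcoRI adaptor.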

Figure 1 Pictogram of the DNA fragments involved for ligating double stranded adaptors to DNA previously digested with EcoRI and MseI restriction enzymes, in the context of a modified AFLP method for genome complexity reduction.

Selective amplification by PCR using primers specific to the adaptor sequence

The purpose of this step was to amplify only a small proportion of the total genome, thereby reducing the complexity of the pool of DNA fragments to be sequenced. The term selective refers to the addition of one or two nucleotides at the 3' end of the adaptor-specific primers, so that the primers amplify only a subset of the fragments present in the digested-ligated genome. Such primers are termed selective primers. The only genomic fragments amplified in the present selective amplification were those that, in addition to having an EcoRI or an MseI site at each end of the sample DNA, also ended with a C on an EcoRI side and with an AC on an MseI side. Because the selective primers were designed to also carry the MID (multiplex identifier) barcodes and the LibL segments used for the pyrosequencing step, the selective amplification is described in more detail in the next section. Figure 2 illustrates the DNA fragments involved in selective amplification.
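The effect of selective nucleotides can be approximated numerically. The sketch below assumes, purely for illustration, that the four bases are equally likely at each selective position, so each selective nucleotide retains about one quarter of the fragments. It also shows how a fragment could be tested against the C and AC selective bases named above. The function names and the example inserts are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def expected_fraction(n_selective_ecori, n_selective_msei):
    """Fraction of fragments retained if all four bases are equally likely."""
    return 0.25 ** (n_selective_ecori + n_selective_msei)

def is_amplified(insert, ecori_selective="C", msei_selective="AC"):
    """Check one fragment against both sets of selective bases.

    `insert` is the genomic sequence lying between the two primer-binding
    regions, read 5' to 3' from the EcoRI side. The MseI primer anneals to
    the opposite strand, so its selective bases are compared with the
    reverse complement of the insert.
    """
    return (insert.startswith(ecori_selective)
            and reverse_complement(insert).startswith(msei_selective))

print(expected_fraction(1, 2))   # 0.015625, i.e. about 1 fragment in 64
print(is_amplified("CGGATTGT"))  # hypothetical insert, matches both ends
print(is_amplified("CGGATTGA"))  # hypothetical insert, fails on the MseI side
```

Real genomes have uneven base composition, so the retained fraction is only a first approximation.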

Pooling, multiplexing and barcoding samples for high throughput sequencing

One feature of high throughput sequencing is the ability to multiplex different samples in a single sequencing run, which is made possible by the use of MIDs (multiplex identifiers). These are 10 bp segments that were added here to the 5' side of the EcoRI section of the selective primers. The barcodes are sequenced along with the organism's DNA and are then recognized and sorted using bioinformatics methods. The complete list of MIDs for the Genome Sequencer FLX system is available in TCB No. 005-2009 (April 2009), Using Multiplex Identifier (MID) Adaptors for the GS FLX Titanium Chemistry - Extended MID Set. Here, the 30 bp segment (LibL-A and key) required by the sequencing instrument was further added to the 5' side of the MID segment, following the recommendations in APP No. 001-2009, Unidirectional Sequencing of Amplicon Libraries Using the GS FLX Titanium emPCR Kits (Lib-L). In the present study, the 6 different ginseng populations were labeled with 6 different MID barcodes, and every sample from a population was labeled with the same population-specific barcode (listed in Table 4).
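The sorting of reads by barcode can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: it assumes the key has already been trimmed so that each read starts with its MID, and it requires an exact match with no tolerance for sequencing errors. The MID sequences are the 10-nucleotide segments found in the forward primers of Table 4. The function name and the example reads are hypothetical.

```python
# MID segments as they appear in the forward primers of Table 4.
MIDS = {
    "MID1": "ACGAGTGCGT",
    "MID2": "ACGCTCGACA",
    "MID3": "AGACGCACTC",
    "MID4": "AGCACTGTAG",
    "MID5": "ATCAGACACG",
    "MID6": "ATATCGCGAG",
    "MID7": "CGTGTCTCTA",
}
ECORI_SEGMENT = "GACTGCGTACCAATTC"  # template-specific part of the forward primer
MID_LENGTH = 10

def demultiplex(reads):
    """Sort reads into per-MID bins, trimming the MID and the primer segment."""
    lookup = {seq: name for name, seq in MIDS.items()}
    bins = {name: [] for name in MIDS}
    bins["unassigned"] = []
    for read in reads:
        name = lookup.get(read[:MID_LENGTH])
        if name is None:
            bins["unassigned"].append(read)
            continue
        trimmed = read[MID_LENGTH:]
        if trimmed.startswith(ECORI_SEGMENT):
            trimmed = trimmed[len(ECORI_SEGMENT):]
        bins[name].append(trimmed)
    return bins

reads = [
    "ACGAGTGCGT" + ECORI_SEGMENT + "GGTT",  # hypothetical read carrying MID1
    "AGACGCACTC" + ECORI_SEGMENT + "AACC",  # hypothetical read carrying MID3
    "TTTTTTTTTTGG",                         # no known MID
]
for name, seqs in demultiplex(reads).items():
    if seqs:
        print(name, seqs)
```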

Table 4 Selective primers used for reducing the genomic complexity of the Panax genome, and for making amplified fragments suitable for multiplexing different samples in a single run of pyrosequencing on a GS-FLX instrument. All oligonucleotides used as a forward primer include the LibL-A and the key segments necessary for the sequencing instrument, and the population-specific MID. They are followed by a unique EcoRI segment for the selective amplification. The reverse primer is made with a MseI segment (for selective amplification), and a LibL-B segment for the instrument. All oligonucleotides were purified by HPLC.

Oligo name	Sequence, 5' to 3'
LibL_A_MID1_EcoRI_plus1	CCATCTCATCCCTGCGTGTCTCCGACTCAGACGAGTGCGTGACTGCGTACCAATTC
LibL_A_MID3_EcoRI_plus1	CCATCTCATCCCTGCGTGTCTCCGACTCAGAGACGCACTCGACTGCGTACCAATTC
LibL_A_MID4_EcoRI_plus1	CCATCTCATCCCTGCGTGTCTCCGACTCAGAGCACTGTAGGACTGCGTACCAATTC
LibL_A_MID5_EcoRI_plus1	CCATCTCATCCCTGCGTGTCTCCGACTCAGATCAGACACGGACTGCGTACCAATTC
LibL_A_MID6_EcoRI_plus1	CCATCTCATCCCTGCGTGTCTCCGACTCAGATATCGCGAGGACTGCGTACCAATTC
LibL_A_MID7_EcoRI_plus1	CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTGTCTCTAGACTGCGTACCAATTC
LibL_A_MID2_EcoRI_plus1	CCATCTCATCCCTGCGTGTCTCCGACTCAGACGCTCGACAGACTGCGTACCAATTC
LibL_B_MseI_plus2	CCTATCCCCTGTGTGCCTTGGCAGTCTCAGGATGAGTCCTGAGTAAC
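The structure of the forward primers can be verified by splitting each sequence of Table 4 into its three segments: the 30-nucleotide instrument segment (LibL-A and key), the 10-nucleotide MID, and the EcoRI segment. The sketch below does this and checks that the barcodes are all distinct. The Hamming distance calculation is an added illustration, and the helper names are hypothetical.

```python
from itertools import combinations

# Forward primers copied from Table 4.
FORWARD_PRIMERS = {
    "LibL_A_MID1_EcoRI_plus1": "CCATCTCATCCCTGCGTGTCTCCGACTCAGACGAGTGCGTGACTGCGTACCAATTC",
    "LibL_A_MID2_EcoRI_plus1": "CCATCTCATCCCTGCGTGTCTCCGACTCAGACGCTCGACAGACTGCGTACCAATTC",
    "LibL_A_MID3_EcoRI_plus1": "CCATCTCATCCCTGCGTGTCTCCGACTCAGAGACGCACTCGACTGCGTACCAATTC",
    "LibL_A_MID4_EcoRI_plus1": "CCATCTCATCCCTGCGTGTCTCCGACTCAGAGCACTGTAGGACTGCGTACCAATTC",
    "LibL_A_MID5_EcoRI_plus1": "CCATCTCATCCCTGCGTGTCTCCGACTCAGATCAGACACGGACTGCGTACCAATTC",
    "LibL_A_MID6_EcoRI_plus1": "CCATCTCATCCCTGCGTGTCTCCGACTCAGATATCGCGAGGACTGCGTACCAATTC",
    "LibL_A_MID7_EcoRI_plus1": "CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTGTCTCTAGACTGCGTACCAATTC",
}

def split_forward_primer(seq):
    """Split into (instrument segment, MID, template-specific segment)."""
    return seq[:30], seq[30:40], seq[40:]

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

parts = {name: split_forward_primer(seq) for name, seq in FORWARD_PRIMERS.items()}
instrument_segments = {p[0] for p in parts.values()}
template_segments = {p[2] for p in parts.values()}
mids = {name: p[1] for name, p in parts.items()}

# Smallest pairwise difference between any two barcodes.
min_distance = min(hamming(a, b) for a, b in combinations(mids.values(), 2))

print("Shared instrument segment:", instrument_segments)
print("Shared EcoRI segment:", template_segments)
print("Minimum Hamming distance between MIDs:", min_distance)
```

A larger minimum distance means that more sequencing errors are needed before one barcode can be mistaken for another.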

Digested-ligated DNA samples were amplified in a PCR reaction in which the reverse primer was LibL_B_MseI_plus2 for all tubes and the forward primer was specific to a population (Table 4). Since the objective of the study was to reveal genetic diversity at the population level rather than among individuals, the ten Panax quinquefolius samples from each population were all labeled with the same set of barcoded population-specific primers. Each sample was nevertheless amplified separately prior to pooling, to ensure equimolar representation in the pool. Figure 2 illustrates the DNA fragments involved in the selective amplification step.

Figure 2 Pictogram of the DNA fragments involved in selective amplification of previously digested-ligated DNA, in the context of a modified AFLP method for genome complexity reduction coupled to multiplexing samples for high throughput sequencing. In addition to the template-specific region (EcoRI), the selective primers therefore contain a barcode (MID) and an instrument-specific region (LibL-A and key).

Selective amplifications were performed with a high-fidelity proofreading enzyme (iProof polymerase, BioRad, catalog number 172-5301) to minimize the risk of spurious single nucleotide polymorphisms caused by nucleotide misincorporation rather than by genuine allelic variation. The reaction mix is given in Table 5 and the cycling conditions in Table 6. Figure 3 shows an example of agarose gel electrophoresis of selective-amplification products, using Lambda BstEII as the molecular ladder.

Table 5 Reaction mix for selective amplifications, in the context of a modified AFLP method for genome complexity reduction coupled to multiplexing samples for high throughput sequencing.

Reagent	Initial conc.	Qty added	Final conc. or Final qty
HF Buffer	5X (provides 1.5 mM MgCl2 at 1X)	6 µl	1X
MgCl2	50 mM	0.6 µl	2.5 mM
dNTP	10 mM	0.6 µl	200 µM
Population specific LibL-A-MID primer	10 µM	0.9 µl	0.3 µM
general LibL_B_MseI_plus2 primer	10 µM	0.9 µl	0.3 µM
Digested-ligated DNA template	5.4 ng/µl	3 µl	16.2 ng
iProof polymerase	2 U/µl	0.24 µl	0.48 U
ddH2O	-	17.76 µl	-
Total volume	-	30 µl	-
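Since the ten samples of a population share the same primers, the reagents of Table 5 other than the template can be prepared as a master mix. The sketch below scales the per-reaction volumes for a given number of reactions. The 10 % overage for pipetting losses is an assumption made for this example and is not stated in the protocol.

```python
# Per-reaction volumes in µl, copied from Table 5 (template excluded).
PER_REACTION_UL = {
    "HF Buffer": 6.0,
    "MgCl2": 0.6,
    "dNTP": 0.6,
    "Population specific LibL-A-MID primer": 0.9,
    "general LibL_B_MseI_plus2 primer": 0.9,
    "iProof polymerase": 0.24,
    "ddH2O": 17.76,
}
TEMPLATE_UL = 3.0
REACTION_UL = 30.0

def master_mix(n_reactions, overage=0.10):
    """Scale each reagent for `n_reactions`, plus a fractional overage."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}

# Volume of master mix to dispense per tube before adding 3 µl of template.
aliquot_ul = REACTION_UL - TEMPLATE_UL

for name, vol in master_mix(10).items():
    print(f"{name}: {vol} µl")
print(f"Dispense {aliquot_ul} µl per tube, then add {TEMPLATE_UL} µl of template")
```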

Figure 3 Electrophoresis of the products of selective amplification of digested-ligated Panax quinquefolius total DNA with selective primers carrying a MID barcode tail, for a modified AFLP method for genome complexity reduction coupled to high throughput sequencing. Two different MgCl2 concentrations and different temperatures for the primer annealing step were tested. Lane 1: Lambda BstEII ladder; Lane 2: 57 °C; Lane 3: 61.8 °C; Lane 4: 65.5 °C; Lane 5: 68.7 °C; Lane 6: 57 °C; Lane 7: 61.8 °C; Lane 8: 65.5 °C; Lane 9: 68.7 °C

Table 6 Cycling conditions for selective amplifications, in the context of a modified AFLP method for genome complexity reduction coupled to multiplexing samples for high throughput sequencing.

Step	Temperature (°C)	Time
Initial denaturation	98	2 min
30 cycles
Denaturation	98	2 sec
Primers annealing	66	30 sec
Polymerization	72	30 sec
End of cycling
Last polymerization	72	5 min
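The cycling program of Table 6 can be written as a small data structure, which also gives a rough estimate of the run time. The sketch below takes the annealing and polymerization times as 30 seconds each, which is an assumption, and ignores the time the thermocycler spends ramping between temperatures.

```python
# (step name, temperature in °C, duration in seconds)
INITIAL = [("Initial denaturation", 98, 2 * 60)]
CYCLE = [
    ("Denaturation", 98, 2),
    ("Primers annealing", 66, 30),   # seconds assumed
    ("Polymerization", 72, 30),      # seconds assumed
]
FINAL = [("Last polymerization", 72, 5 * 60)]
N_CYCLES = 30

def total_seconds(initial, cycle, final, n_cycles):
    """Sum of hold times, excluding temperature ramping."""
    per_cycle = sum(duration for _, _, duration in cycle)
    fixed = sum(duration for _, _, duration in initial + final)
    return fixed + n_cycles * per_cycle

seconds = total_seconds(INITIAL, CYCLE, FINAL, N_CYCLES)
print(f"Hold time: {seconds} s ({seconds / 60:.0f} min), ramping excluded")
```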

¹⁾ van Orsouw, N. J. et al. (2007). Complexity Reduction of Polymorphic Sequences (CRoPS™): A Novel Approach for Large-Scale Polymorphism Discovery in Complex Genomes. PLoS ONE, 2, e1172.

²⁾ Gompert, Z., Forister, M. L., Fordyce, J. A., Nice, C. C., Williamson, R. J., and Buerkle, C. A. (2010). Bayesian analysis of molecular variance in pyrosequences quantifies population genetic structure across the genome of Lycaeides butterflies. Molecular Ecology, 19, 2455-2473.