Method to Characterize DNA Sequence Composition Outside of Known Border Sequence (Flanking Sequencing)

IP.com Disclosure Number: IPCOM000245346D
Publication Date: 2016-Mar-02
Document File: 3 page(s) / 235K

Publishing Venue

The IP.com Prior Art Database

Abstract

The method described enables high-throughput sequence analysis of a genetic element of interest (e.g., transgene or transposon) and its flanking DNA in any transgenic species, including crop plants (e.g., maize, soybean, canola and rice), animals and microbes.

In this approach, DNA is isolated from a sample of interest (e.g., event, line, variety, isolate, strain, Tn), the isolated DNA is fragmented, adapter sequences are attached and DNA barcodes are assigned. Ligation-mediated nested PCR is then performed on each border using modified DNA primers. Samples are then pooled, sequenced and analyzed. The analysis deconvolutes the pooled reads, identifies "seed" sequences (junction reads), extends them and maps the genetic element of interest within the genome of the sample of interest.

In one example, the flanking sequence (FS) method includes the steps of: isolating sample DNA; shearing or digesting the DNA into fragments in the preferred size range of 1-2 kb; attaching adapters and assigning DNA barcodes; performing ligation-mediated nested PCR of each border with modified DNA primers; pooling samples and performing DNA sequence analysis on a next generation DNA sequencing instrument (e.g., Illumina, Ion Torrent™, Pacific Biosciences® and Oxford Nanopore-based technologies); deconvoluting data using any suitable software program (e.g., SUSHI, BaseSpace); identifying transgene/genome junction sequence reads using any suitable software program (e.g., Junction Finder); extending reads using any suitable software program (e.g., SSAKE, ABySS); and mapping insertions (Targeted Flanking Sequence-High Throughput).

Flanking sequence data analysis and sequencing output classify each read as: insert only; genomic only; a combination of both insert and genomic sequence (junction reads, Figure 1); or background noise (optimally less than five percent). The Junction Finder algorithm pulls and filters sequence reads based on known insert sequence and read depth. Insert-only reads are filtered out, junction reads are condensed, and 80/20 junction reads for each insertion site and border are returned.
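The four read classes described above (insert only, genomic only, junction, background) can be illustrated with a minimal Python sketch. This is not the Junction Finder implementation, which is not disclosed here. The function names, the exact k-mer matching, the value of k and the in-memory reference strings are all assumptions made for illustration; a production pipeline would more likely use an aligner against an indexed genome. The read-depth filter and the 80/20 criterion are left out because the text does not define them.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_reads(reads, insert_seq, genome_seq, k=6):
    """Label each read by which reference(s) share an exact k-mer with it.

    insert_seq and genome_seq are hypothetical in-memory references.
    Returns (labels, noise_fraction), where noise_fraction is the share
    of reads that match neither reference.
    """
    insert_k = kmers(insert_seq, k)
    genome_k = kmers(genome_seq, k)
    labels = []
    for read in reads:
        read_k = kmers(read, k)
        in_insert = not read_k.isdisjoint(insert_k)
        in_genome = not read_k.isdisjoint(genome_k)
        if in_insert and in_genome:
            labels.append("junction")
        elif in_insert:
            labels.append("insert_only")
        elif in_genome:
            labels.append("genomic_only")
        else:
            labels.append("background")
    noise_fraction = labels.count("background") / len(reads) if reads else 0.0
    return labels, noise_fraction


if __name__ == "__main__":
    # Toy sequences invented for this example only.
    insert_seq = "ACGGATGCTATAAACGCG"
    genome_seq = "TTCAGGCTTCAAGGTCCA"
    reads = [
        "TCAAGGTCCAACGGATGCTA",  # genomic 3' end joined to insert 5' end
        "ATGCTATAAACG",          # lies entirely within the insert
        "TTCAGGCTTCAA",          # lies entirely within the genome
        "CCCCCCCCCCCC",          # matches neither reference
    ]
    labels, noise = classify_reads(reads, insert_seq, genome_seq)
    for read, label in zip(reads, labels):
        print(label, read)
    print("background fraction:", noise)
```

Only the reads labelled "junction" would be carried forward as seeds; the background fraction gives a quick check against the stated target of less than five percent.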

Any software that allows for the assembly of short sequences may be used. For example, SSAKE junction extension involves short sequence assembly by k-mer search and 3' read extension. SSAKE progressively searches for the perfect 3'-most k-mer using a DNA prefix tree to select the desired "seed" sequence. See Figure 2 for an example.

Figure 2

>contig1/size505/read1/cov10000.00/seed:A1_1RVb1_1
ATCGATCGTACGTAGCTAGCTACGATTTTCGATACGAACGATCGTAGCATGCTACTGCGATCTGACTAGCTAGCTAGTCGATGCTACTAGCAGTCGATGCATGCGATCGTACGATCTCTTTTCGCGGATACTGAGTCTGACTGACTAGTCGATCTGACTATCATCTGACGTATATTGCGCGCTATCTCTTACGATCGATCGATTTTACTGATCTGACTATCGATCATCGATCATCAGGGCTATTTGCTATCATCTAGTCGATCAGCTATCTGATCGATCGATCATCACGATTTTTCGGCGCGCGCGCTATACGTACTGATCGATCTCGTACTGATCTGACTATCGATCGATCGACTGATCAGTCGATCGATCGATCGATCGTACGATGCTATCGAGTCTATTATATATACGCGCGCGCTAGCTATCGATCGATCGATCGTACGATCGACTACTAGCTAGCTAGCTAGCGCGCGCGCGCGATATCGCGCGGCGCTATATCGATTTC

Green = "seed" sequence, purple = SSAKE confirmed via Sanger sequencing, yellow = SSAKE genomic sequence, white = SSAKE transgene sequence. Extension of "seed" sequence from 100 bp up to 2.3 kb (average = 928 bp).

Insertion site characterization provides information on: transgene/genome junction sequence; truncation points of the transgene (left and right T-DNA borders); length of flanking sequence; transgene sequence fragments (transgene/transgene junctions); insertion copy number for each border; genomic and physical map location; risk of gene disruption; 5' and 3' nearest genes and distance to these genes; and 5' and 3' nearest genetic markers. Data are available as an Excel output and are reported and stored in a searchable database.

Other applications of FS include promoter discovery and construct identification.

Promoter discovery enables the targeting of unknown promoter (or any other genomic DNA) sequence. In this example, primers are designed against the 5' region of the coding sequence. As a result, up to one kilobase of sequence will be generated 5' proximal to the primer site. This application has been performed successfully for promoter discovery, flanking sequence extension and copy number variation validation. The use of a next generation sequencing technology such as Illumina allows for the priming and sequence analysis of multiple loci, given the far larger read numbers that these technologies provide compared with Sanger sequencing technologies.

Construct identification enables the identification of unknown transformation constructs in a sample from, for example, a eukaryote (such as a plant or animal) or a prokaryote. This approach uses a core set of DNA primers that are commonly used to amplify across junctions that join features in a gene construct or a transformation vector. Sequences obtained from these primers are aligned against a DNA database of transformation vectors, enabling the identification of construct elements within the sample.

The method, including the laboratory and analysis steps described above, can be applied to additional applications. These applications include, but are not limited to: 1) the analysis of activation tagged populations, resulting in the mapping of enhancer element locations for high throughput trait screening in economically important plant and animal species; 2) mutator co-segregation analysis in crop species, targeting hundreds of Mu locations per plant and identifying common Mu locations in out-crossed plants; 3) construct identification, identifying a transgene in an unknown transgenic organism with limited prior information regarding the transformation event; and 4) tiling applications to detect partial insertions of DNA sequence that comprise transformation vectors.
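The seed extension step described for SSAKE (extending a "seed" sequence at its 3' end with overlapping reads) can be illustrated with a simplified greedy sketch in Python. It is not SSAKE itself: it uses a linear scan instead of a prefix tree, picks the single read with the longest exact overlap instead of building a consensus, and ignores reverse complements and sequencing errors. The function names, the `min_overlap` parameter and the toy sequences are assumptions made for illustration.

```python
def overlap_len(contig, read, min_overlap):
    """Length of the longest read prefix that exactly matches the contig's 3' end.

    Returns 0 when no overlap of at least min_overlap bases exists, or when
    the read would not add any new base to the contig.
    """
    longest = min(len(contig), len(read) - 1)
    for olap in range(longest, min_overlap - 1, -1):
        if contig.endswith(read[:olap]):
            return olap
    return 0


def extend_3prime(seed, reads, min_overlap=6):
    """Greedily extend seed in the 3' direction, using each read at most once."""
    contig = seed
    pool = list(reads)
    while pool:
        scored = [(overlap_len(contig, r, min_overlap), r) for r in pool]
        best_olap, best_read = max(scored, key=lambda pair: pair[0])
        if best_olap == 0:
            break  # no remaining read overlaps the current 3' end
        contig += best_read[best_olap:]  # append only the new bases
        pool.remove(best_read)
    return contig


if __name__ == "__main__":
    # Toy reference invented for this example; reads tile it every 4 bases.
    reference = "ATGGCGTACCTTGAGACTAAGCGT"
    reads = [reference[i:i + 12] for i in range(0, 13, 4)]
    seed = reference[:10]
    print(extend_3prime(seed, reads))
```

Because each read is removed from the pool once used, the loop always terminates. Taking the longest overlap first is a design choice that favours the most conservative extension at each step.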


Figure 1

       Genomic Sequence            Insert Sequence

1      ATCGATCGAGGGCTCTAGCTACACGGATGCTATAAACGCGCCATCACGT
2      ACGCGATCGTACGATTACGATAACGGATGCTATAAACGCGCCATCACGT
1     CATCGATCGAGGGCTCTAGCTACACGGATGCTATAAACGCGCCATCACG
2     TACGCGATCGTACGATTACGATAACGGATGCTATAAACGCGCCATCACG
2    GTACGCGATCGTACGATTACGATAACGGATGCTATAAACGCGCCATCAC
1    GCATCGATCGAGGGCTCTAGCTACACGGATGCTATAAACGCGCCATCAC
2   CGTACGCGATCGTACGATTACGATAACGGATGCTATAAACGCGCCATCA
1   AGCATCGATCGAGGGCTCTAGCTACACGGATGCTATAAACGCGCCATCA

Genomic sequence in black font, insert (transgenic) sequence in red.
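The reads in Figure 1 can be used to illustrate how junction reads might be grouped by insertion site: reads that share the same genomic bases immediately upstream of the insert border are assigned to the same site. The Python sketch below is an illustration only. The border string `ACGGATGC` is an assumption, taken as the first eight bases common to both groups of reads; the true boundary is indicated only by font color in the original figure. The function name and the `flank_len` parameter are likewise hypothetical.

```python
from collections import defaultdict

# The eight reads shown in Figure 1.
FIGURE_1_READS = [
    "ATCGATCGAGGGCTCTAGCTACACGGATGCTATAAACGCGCCATCACGT",
    "ACGCGATCGTACGATTACGATAACGGATGCTATAAACGCGCCATCACGT",
    "CATCGATCGAGGGCTCTAGCTACACGGATGCTATAAACGCGCCATCACG",
    "TACGCGATCGTACGATTACGATAACGGATGCTATAAACGCGCCATCACG",
    "GTACGCGATCGTACGATTACGATAACGGATGCTATAAACGCGCCATCAC",
    "GCATCGATCGAGGGCTCTAGCTACACGGATGCTATAAACGCGCCATCAC",
    "CGTACGCGATCGTACGATTACGATAACGGATGCTATAAACGCGCCATCA",
    "AGCATCGATCGAGGGCTCTAGCTACACGGATGCTATAAACGCGCCATCA",
]

# Assumed start of the insert sequence (not confirmed by the source).
ASSUMED_INSERT_BORDER = "ACGGATGC"


def group_by_flank(reads, insert_border, flank_len=12):
    """Group junction reads by the genomic bases just upstream of the border.

    Returns (groups, unplaced): groups maps each flank_len-base flank to the
    reads that carry it; unplaced holds reads in which the border is absent
    or preceded by fewer than flank_len bases.
    """
    groups = defaultdict(list)
    unplaced = []
    for read in reads:
        pos = read.find(insert_border)
        if pos < flank_len:  # also covers pos == -1 (border not found)
            unplaced.append(read)
            continue
        groups[read[pos - flank_len:pos]].append(read)
    return dict(groups), unplaced


if __name__ == "__main__":
    groups, unplaced = group_by_flank(FIGURE_1_READS, ASSUMED_INSERT_BORDER)
    for flank, members in groups.items():
        print(flank, len(members))
```

Under this assumption the eight reads fall into two groups of four, consistent with the 1 and 2 labels in the figure.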
