Tandem mass spectrometric sequencing of nucleic acids

Nucleic acid sequencing

Over the past two decades, automated Sanger sequencing has served as the most widely used analytical tool for DNA sequencing. Sanger sequencing offers high-quality data and long read lengths. It can therefore be used to establish the identity of both known and unknown sequence-specific nucleotide variations, and it has been proclaimed the "gold standard" against which other technologies must be judged. Applications span numerous research interests, including sequence variation studies, comparative genomics, forensics, and diagnostics. Sanger sequencing, however, is rather time-consuming, laborious, and expensive, and several alternative sequencing strategies have been introduced in recent years. Among those techniques, mass spectrometry (MS) represents a promising method for the characterization of small or modified nucleic acids, which are hardly amenable to standard sequencing methods.

Mass spectrometric sequencing

Mass spectrometric sequencing of nucleic acids often relies on reconstructing a sequence from the molecular mass differences measured between members of oligonucleotide ladders, which are produced by molecular biological methods such as degradation, cleavage, or chain termination synthesis. Despite considerable success in obtaining sequence information from oligonucleotides consisting of more than 100 nucleotides (nts), the combination of various enzymatic steps with mass spectrometric measurements usually renders enzymatic assays too costly and time-consuming for large-scale application.
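The ladder principle can be illustrated with a minimal Python sketch. It assumes an idealized DNA ladder in which each member is one nucleotide longer than the previous one, so that each mass difference equals the monoisotopic mass of one 2'-deoxynucleotide residue. The function name `read_ladder`, the tolerance of 0.02 Da, and the starting mass of 1000 Da are illustrative assumptions, not values from the text.

```python
# Monoisotopic masses (Da) of 2'-deoxynucleotide residues within a DNA chain
# (nucleoside 5'-monophosphate minus water).
RESIDUE_MASS = {"A": 313.0576, "C": 289.0464, "G": 329.0525, "T": 304.0460}


def read_ladder(masses, tol=0.02):
    """Call one base per mass difference between successive ladder members.

    `tol` (Da) is a hypothetical matching tolerance; differences that match
    no residue mass within it are reported as '?'.
    """
    ordered = sorted(masses)
    calls = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        # Pick the residue whose mass is closest to the observed difference.
        base, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - delta))
        calls.append(base if abs(mass - delta) <= tol else "?")
    return "".join(calls)


# Simulated ladder: an arbitrary starting mass extended by G, A, T, T, C.
ladder = [1000.0]
for base in "GATTC":
    ladder.append(ladder[-1] + RESIDUE_MASS[base])

print(read_ladder(ladder))
```

The bases are read in the order in which the ladder grows from its shortest to its longest member. Whether that corresponds to the 5'-to-3' or the 3'-to-5' direction depends on how the ladder was generated.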

Tandem mass spectrometric sequencing

The term tandem mass spectrometry (MS/MS) summarizes mass spectrometric methods in which a particular ion (the precursor ion) is selected and activated to generate characteristic secondary fragment ions. Oligonucleotides consist of a limited number of different nts, and the bonds between these building blocks that are preferentially broken in MS/MS experiments are well known. In the majority of cases, collision-induced dissociation (CID) experiments are used to obtain structural information from multiply charged oligonucleotide ions. Other techniques applied for oligonucleotide fragmentation with Fourier transform mass spectrometry (FTMS) include infrared multiphoton dissociation (IRMPD), electron capture dissociation (ECD), and blackbody infrared radiative dissociation (BIRD).

For oligodeoxynucleotides, CID typically produces (an-Bn)- and wn-type fragment ions, whereas for oligoribonucleotides cn- and yn-type ions dominate. Different types of mass spectrometers can be applied for activation and scanning of fragment ions. These can be classified as either 'tandem-in-space' or 'tandem-in-time' instruments.

Tandem-in-time MS, as implemented on an ion trap, is the process whereby precursor ions are created, stored in a trapped ion cell, and then sequentially fragmented to form product ions by translational excitation and subsequent collision with a background gas. Tandem-in-time MS has been used extensively to study the fragmentation mechanisms of oligonucleotides, and it was successfully applied to the sequence verification of oligonucleotides consisting of more than 50 nts.

Tandem-in-space experiments are typically performed on instruments consisting of three parts. In the first segment, which is most often a quadrupole filter, precursor ions are selected for fragmentation. Fragmentation is accomplished in the subsequent part of the instrument, a gas-filled collision cell. Finally, the fragment ions are scanned in the third section of the instrument, which can be another quadrupole, a linear ion trap, or a time-of-flight mass analyzer. Several groups have studied the fragmentation behavior of oligonucleotides in tandem-in-space MS/MS. In comparison to tandem-in-time MS, beam-type collision activation allows multiple competitive dissociation reactions to be observed, which tends to give more extensive structural information for short oligonucleotides. We were able to show that this type of fragmentation cannot be used for the sequence verification of oligonucleotides consisting of more than 25-30 nts.

A distinct advantage of tandem mass spectrometric sequencing is its inherent speed of data generation. Because fragmentation and subsequent mass analysis of the fragments take place directly in the mass spectrometer, fragment ion mass spectra can be obtained within a few seconds. Data interpretation, however, still represents a bottleneck. The complexity of a fragment ion mass spectrum increases with the length of the precursor ion, which makes "manual" interpretation of MS/MS spectra a difficult and time-consuming task. We have recently introduced a comparative sequencing algorithm (COMPAS) for the computer-aided interpretation of fragment ion mass spectra generated by electrospray ionization tandem mass spectrometry (ESI-MS/MS). By applying the algorithm to data obtained from CID in a quadrupole ion trap instrument, we were able to verify the sequences of oligodeoxynucleotides as large as 80-mers.

For a given oligonucleotide sequence, COMPAS calculates a fitness score (FS). FS determination starts with the generation of a list of monoisotopic m/z values representing all theoretically possible (an-Bn)- and wn-ions for the given reference sequence. The predicted m/z values are then compared with those obtained from the experimental spectrum, and FS is calculated. The maximum tolerable mass deviation for matching predicted and measured fragment ions was fixed at 0.1. FS takes into account the difference between measured and predicted m/z values, the relative intensity I% of the fragment ions, the number K of fragments assigned, and the number M of nucleotide positions not covered by fragment ions in the experimental spectrum. The larger the value of FS, the closer the match between the measured and the predicted spectra. Finally, in order to find the sequence that most closely matches the experimental spectrum, the initial reference sequence is varied sequentially by incorporating each of the four possible nucleotides A, T, G, and C at each position in the sequence. The reference sequence with the highest FS value is identified as the correct sequence.
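The comparison-and-substitution idea can be sketched in Python as follows. This is not the published COMPAS implementation: the FS formula is not given in the text, so `placeholder_score` is a hypothetical stand-in that merely rewards intense, closely matching fragments and penalizes uncovered positions. As further simplifying assumptions, only singly charged wn-ions of a DNA strand with 5'-OH and 3'-OH ends are predicted ((an-Bn)-ions are omitted), only single substitutions of the reference are tried, and the penalty of 10 per uncovered position is arbitrary. All function names are illustrative.

```python
# Monoisotopic masses (Da): 2'-deoxynucleotide residues, water, proton.
RESIDUE_MASS = {"A": 313.0576, "C": 289.0464, "G": 329.0525, "T": 304.0460}
WATER = 18.0106
PROTON = 1.00728


def predict_w_ions(seq, charge=1):
    """Return [(n, m/z)] for the negative ions w_1 .. w_(L-1) of `seq`.

    A w_n fragment holds the n 3'-terminal nucleotides plus a 5'-phosphate,
    so its neutral mass is the sum of n residue masses plus one water.
    """
    ions, mass = [], WATER
    for n, base in enumerate(reversed(seq[1:]), start=1):
        mass += RESIDUE_MASS[base]
        ions.append((n, (mass - charge * PROTON) / charge))
    return ions


def placeholder_score(seq, peaks, tol=0.1):
    """Hypothetical stand-in for FS; `peaks` is [(m/z, relative intensity %)]."""
    predicted = predict_w_ions(seq)
    total, matched = 0.0, 0
    for _, mz in predicted:
        hits = [(abs(p - mz), inten) for p, inten in peaks if abs(p - mz) <= tol]
        if hits:
            deviation, inten = min(hits)  # closest peak within tolerance
            total += inten * (1.0 - deviation / tol)
            matched += 1
    uncovered = len(predicted) - matched
    return total - 10.0 * uncovered  # arbitrary penalty per uncovered position


def best_single_substitution(reference, peaks, tol=0.1):
    """Try A, T, G, C at every position; keep the highest-scoring sequence."""
    best_seq = reference
    best_score = placeholder_score(reference, peaks, tol)
    for i in range(len(reference)):
        for base in "ATGC":
            candidate = reference[:i] + base + reference[i + 1:]
            candidate_score = placeholder_score(candidate, peaks, tol)
            if candidate_score > best_score:
                best_seq, best_score = candidate, candidate_score
    return best_seq, best_score


# Simulated spectrum of the "true" sequence; the reference differs at one site.
true_seq = "ACGTTGCA"
reference = "ACGATGCA"
peaks = [(mz, 100.0) for _, mz in predict_w_ions(true_seq)]
best, score = best_single_substitution(reference, peaks)
print(best, score)
```

Because wn-ions never contain the 5'-terminal nucleotide, this simplified sketch cannot distinguish substitutions at the first position. Covering that position would require the complementary (an-Bn)-ion series.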

