Wiley Registry of Tandem Mass Spectral Data, MS for ID
Mass spectrometry-based identification of small molecules
Gas chromatography (GC) hyphenated to electron impact ionization mass spectrometry (EI-MS) represents the "gold standard" for general unknown screening. Over recent decades, very large libraries of standardized spectra have been created for GC-MS techniques, enabling simultaneous screening for thousands of compounds. Despite its proven record of success, GC-MS has difficulty detecting polar, thermally labile, and high-mass molecules. Hence, complementary ionization techniques have been developed, notably electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI). Both are soft ionization techniques that usually form only molecular ions. Accurate mass measurements allow the elemental formula of a molecule to be determined. Currently, time-of-flight (TOF)-MS represents the most cost-effective technique for performing accurate mass analysis on a routine basis. Because MS cannot differentiate isobaric substances, the molecular formula alone is insufficient for unequivocal identification. Collision-induced dissociation (CID) can be used to obtain structure-related information on analytes. Diagnostic fragment ions are selectively generated in the collision cell of an instrument dedicated to tandem MS (MS/MS). For ESI and APCI, the energetic characteristics of ion production and activation are much less well defined than for EI. The ions need to cross a high-pressure region, where their internal energy can be modified, before they enter the mass analyzer system. Consequently, CID spectra may differ strongly depending on the applied experimental conditions (pressure, acceleration voltages, nature of the solution and the gas phases), which makes them difficult to compare. Thus, transferable tandem mass spectral libraries have not been established yet.
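The role of accurate mass, and its limit, can be illustrated with a short sketch. The code below is only an illustration under stated assumptions: the helper names (`monoisotopic_mass`, `ppm_error`, `candidate_formulas`), the restriction to C, H, N and O, the [M+H]+ ion type, and the 5 ppm tolerance are all chosen for the example and are not taken from the project described here.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; only C, H, N, O
# are covered to keep this illustrative sketch short.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647  # proton mass (u), used to form [M+H]+ (assumed ion type)


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(measured_mz, theoretical_mz):
    """Relative mass deviation in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def candidate_formulas(measured_mz, formulas, tol_ppm=5.0):
    """Return the formulas whose [M+H]+ m/z lies within tol_ppm (assumed value)."""
    hits = []
    for formula in formulas:
        theoretical = monoisotopic_mass(formula) + PROTON
        if abs(ppm_error(measured_mz, theoretical)) <= tol_ppm:
            hits.append(formula)
    return hits


# A measured m/z narrows the candidate list to one elemental formula ...
print(candidate_formulas(195.0877, ["C8H10N4O2", "C10H14N2O2"]))
# ... but compounds sharing that formula give the same mass, so fragment
# spectra are still needed to tell them apart.
print(monoisotopic_mass("C17H19NO3"))  # morphine and hydromorphone share this formula
```

The last line shows why the formula alone is not enough: two different drugs with the same elemental composition yield exactly the same monoisotopic mass.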
Our strategy for the setup of a tandem mass spectral library
To build up the spectral library, 1000 substances were used as reference samples. At the current stage of development, the database mainly consists of therapeutic drugs and illicit substances. All investigated compounds are of forensic or toxicological interest because they can cause severe or even fatal intoxications. Tandem mass spectra were acquired on a QqTOF instrument. To increase the tolerance of the library towards the applied collision energy (CE), product ion spectra of reference compounds were acquired at ten different CE values between 5 eV and 50 eV. As expected, the applied CE affected the number of detected fragments as well as the measured relative signal intensities. Spectra showing low, medium, and high levels of fragmentation were observed. Because of saturation effects, and to avoid false positive matches of the precursor ion with product ions associated with alternative compounds, all signals within a 4.0 u window around the m/z of the precursor ion were deleted from the obtained reference spectra. To increase specificity further, all signals in a reference spectrum that could not positively contribute to the precursor identification were eliminated. Only signals with a relative intensity above 0.01% that were observed at least twice within a collection of substance-specific product ion spectra were considered suitable for identification. The remaining species were deleted from the reference spectra. Finally, artefacts that arose from improper centroiding and had passed the preceding filtering steps were removed.
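The filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: spectra are assumed to be lists of `(mz, intensity)` tuples, the "4.0 u window" is read as ±2.0 u around the precursor, relative intensity is computed against the base peak remaining after precursor removal, and the `mz_tol` used to decide that two peaks in different spectra are the same fragment is a hypothetical parameter.

```python
def remove_precursor_window(spectrum, precursor_mz, window=4.0):
    """Drop peaks inside the window centred on the precursor m/z.

    Assumption: a 4.0 u window means +/- 2.0 u around the precursor.
    """
    half = window / 2.0
    return [(mz, i) for mz, i in spectrum if abs(mz - precursor_mz) > half]


def apply_intensity_threshold(spectrum, min_rel_percent=0.01):
    """Keep peaks whose intensity relative to the base peak exceeds the threshold (%)."""
    if not spectrum:
        return []
    base = max(i for _, i in spectrum)
    return [(mz, i) for mz, i in spectrum if 100.0 * i / base > min_rel_percent]


def keep_recurring(spectra, mz_tol=0.01, min_count=2):
    """Keep peaks seen in at least min_count spectra of one compound's CE series.

    mz_tol is a hypothetical tolerance for calling two peaks the same fragment.
    """
    def occurrences(mz):
        return sum(any(abs(mz - other) <= mz_tol for other, _ in s) for s in spectra)

    return [[(mz, i) for mz, i in s if occurrences(mz) >= min_count] for s in spectra]


def clean_reference_series(spectra, precursor_mz):
    """Apply the three filters in sequence to all spectra of one compound."""
    step1 = [remove_precursor_window(s, precursor_mz) for s in spectra]
    step2 = [apply_intensity_threshold(s) for s in step1]
    return keep_recurring(step2)


# Toy series of two product ion spectra for a precursor at m/z 195.09
series = [
    [(195.09, 1000.0), (138.07, 200.0), (110.07, 50.0), (83.06, 0.001)],
    [(195.09, 300.0), (138.07, 500.0), (110.07, 150.0), (69.04, 40.0)],
]
for cleaned in clean_reference_series(series, precursor_mz=195.09):
    print(cleaned)
```

In this toy series the precursor signal, one very weak peak, and one fragment seen in only a single spectrum are removed, leaving the two fragments that recur across collision energies. Removal of centroiding artefacts is not covered by the sketch.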
The library search algorithm
Depending on the applied experimental conditions, the number of fragment ions and/or the corresponding signal intensities can vary between compound-specific MS/MS-spectra. Common library search algorithms were developed and optimized for the comparison of highly reproducible EI-spectra. Thus, they often fail when the identity of a compound has to be established by comparing MS/MS-spectra.

We have developed a sophisticated procedure dedicated to the identification of an unknown by finding similarity and/or identity between its fragment ion spectrum and a collection of fragment ion mass spectra stored in a library. The measured product ion mass spectrum of an unknown compound represents the input for the library search. The spectrum is compared with all mass spectra stored in the library, and the similarity is determined in each case. The estimation of similarity starts with the identification of ions that are present in both spectra. They are called 'matching fragments' (mf). For a match, the difference between the m/z-values must be smaller than a user-defined value (0.1 amu). Next, the 'reference spectrum-specific match probability' (mp) is calculated. The mp-value increases with increasing correlation between the two spectra compared. As the mass spectral library consists of MS/MS-spectra that have been collected at several different collision energy values for each reference compound, a number of mp-values are obtained per compound. The reference compound-specific mp-values are averaged to yield the compound-specific 'average match probability' (amp). To facilitate comparison, amp is converted into the 'relative average match probability' (ramp), so that individual ramp-values range between 0 and 100. The substance with the highest ramp is considered to represent the unknown compound if its ramp exceeds a value of 50.0. Next, the monoisotopic mass of the best matching compound is checked for agreement with the monoisotopic mass of the precursor ion. If the monoisotopic masses do not agree, identity is excluded, and only some structural similarity between the unknown and the best matching reference compound can be inferred. Provided that the 'top hit' passes this final check, the correct compound should have been identified with high probability.
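The overall flow of the search (mf, mp, amp, ramp, cutoff, precursor check) can be sketched as below. Several parts are assumptions made only for illustration, because the text does not give the formulas: `match_probability` is a placeholder score (share of intensity explained in both spectra) and not the actual mp, ramp is assumed to be amp normalized to the sum of all amp-values, the library layout is hypothetical, and `precursor_tol` is an invented tolerance.

```python
from statistics import mean


def explained_fraction(a, b, mz_tol=0.1):
    """Fraction of the intensity in spectrum a carried by matching fragments,
    i.e. peaks with a partner in b whose m/z differs by less than mz_tol."""
    matched = sum(i for mz, i in a if any(abs(mz - other) < mz_tol for other, _ in b))
    return matched / sum(i for _, i in a)


def match_probability(unknown, reference, mz_tol=0.1):
    """Placeholder mp in [0, 1]; NOT the formula used by the authors."""
    if not unknown or not reference:
        return 0.0
    return (explained_fraction(unknown, reference, mz_tol)
            * explained_fraction(reference, unknown, mz_tol))


def search(unknown, unknown_precursor, library,
           mz_tol=0.1, ramp_cutoff=50.0, precursor_tol=0.1):
    """library (hypothetical layout): {name: {"precursor": float, "spectra": [...]}}"""
    # amp: average mp over all collision-energy spectra of each compound
    amp = {name: mean(match_probability(unknown, s, mz_tol) for s in entry["spectra"])
           for name, entry in library.items()}
    total = sum(amp.values())
    if total == 0:
        return None
    # ramp: assumed here to be amp relative to the sum of all amp-values (0-100)
    ramp = {name: 100.0 * value / total for name, value in amp.items()}
    best = max(ramp, key=ramp.get)
    if ramp[best] <= ramp_cutoff:
        return None
    # Final check: does the precursor mass agree with the best hit?
    same_mass = abs(library[best]["precursor"] - unknown_precursor) <= precursor_tol
    status = "identified" if same_mass else "structurally similar only"
    return {"compound": best, "ramp": ramp[best], "status": status}


library = {
    "compound_A": {"precursor": 195.09, "spectra": [
        [(138.07, 100.0), (110.07, 40.0)],
        [(138.07, 60.0), (110.07, 100.0), (83.06, 20.0)],
    ]},
    "compound_B": {"precursor": 152.07, "spectra": [[(110.06, 100.0), (65.04, 30.0)]]},
}
unknown = [(138.06, 80.0), (110.07, 50.0)]
print(search(unknown, 195.09, library))
```

With the toy library, the unknown is assigned to `compound_A`; if the same spectrum is submitted with a disagreeing precursor mass, the sketch reports structural similarity only rather than identity.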

Performance of the library search approach
We have evaluated the performance of the library search approach using a collection of 410 MS/MS-spectra corresponding to 22 library compounds. The MS/MS-spectra were collected in three different laboratories using different instrumental platforms (quadrupole-quadrupole-time-of-flight, quadrupole-quadrupole-linear ion trap, quadrupole-quadrupole-quadrupole, and linear ion trap-Fourier transform ion cyclotron resonance mass spectrometer) and matched against a library containing 3759 MS/MS-spectra of 402 compounds developed on a QqTOF instrument. For statistical evaluation of the identification performance, the dataset was extended with 300 spectra corresponding to 100 compounds not included in the reference library. Both the sensitivity and the specificity of the library search approach exceeded 0.95, clearly indicating a high predictive accuracy of the established matching procedure. As far as we know, no other search algorithm has previously reached a similar level of performance in MS/MS-spectral library search.
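For readers unfamiliar with the two figures of merit, the sketch below shows how sensitivity and specificity could be computed from search outcomes. The classification scheme is an assumption: spectra of library compounds count as positives, spectra of compounds absent from the library count as negatives, and the counts in the example are toy values, not the results of this evaluation.

```python
def tally(results):
    """Count outcomes from (in_library, identified) pairs.

    Assumed meaning of 'identified': for a library compound, the correct
    compound was reported; for a non-library compound, any compound was reported.
    """
    counts = {"tp": 0, "fn": 0, "tn": 0, "fp": 0}
    for in_library, identified in results:
        if in_library:
            counts["tp" if identified else "fn"] += 1
        else:
            counts["fp" if identified else "tn"] += 1
    return counts


def sensitivity(tp, fn):
    """Share of library-compound spectra that were correctly identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of non-library spectra that were correctly left unidentified."""
    return tn / (tn + fp)


# Toy outcomes (illustrative only): 10 library spectra, 20 non-library spectra
results = ([(True, True)] * 9 + [(True, False)]
           + [(False, False)] * 18 + [(False, True)] * 2)
c = tally(results)
print(sensitivity(c["tp"], c["fn"]), specificity(c["tn"], c["fp"]))
```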

Applications
Many fields of application, such as forensic analysis, bioanalysis, or environmental analysis, may benefit from these developments.
Grant
Austrian Research Promotion Agency (Österreichische Forschungsförderungsgesellschaft): dnatox (coupling of liquid chromatography with mass spectrometry as a tool for toxin and DNA analysis), KIRAS PL 2 project 813786, 2008-2009.
