The complexity of mass spectra is driven by the increasing number and density of ion signals and the growing size and heterogeneity of analytes. It is imperative to (i) improve our understanding of the relationship between mass spectral peak appearance and instrument fundamentals, such as peak interference and resolution dependence on m/z, and (ii) develop novel algorithms and tools for accurate extraction of maximum information from these complex mass spectra. Obtaining “gold-standard” datasets is crucial for training and validating bioinformatics tools, such as those for protein deconvolution and product ion annotation in top-down proteomics. To address these needs, we developed FTMS Simulator, a stand-alone Python-based software tool.
The tool has been adapted to enable accurate simulations of Orbitrap, FT-ICR, and TOF mass spectra of any complexity, including LC-MS/MS datasets, for analytes ranging from elements and metabolites to monoclonal antibodies (mAbs) and viruses. FTMS analysis of proteins, including mAbs, is characterized by isotopic beating effects, which make the resolution and signal-to-noise ratio of protein signals depend on transient length. Ignoring these effects can lead to significant artifacts in the analysis of protein mass spectra.
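The dependence of resolution on transient length can be illustrated with a toy model. The sketch below is not the FTMS Simulator implementation: it assumes two undamped, noise-free cosines of equal amplitude, and the frequencies, sampling rate, and function names are hypothetical values chosen only for illustration. It compares the spectral magnitude midway between the two frequencies with the magnitude at the frequencies themselves, for a transient shorter and longer than the beat period.

```python
import numpy as np

def simulate_transient(freqs_hz, amplitudes, duration_s, sample_rate_hz):
    """Return (t, signal) for a sum of undamped, noise-free cosines (toy model)."""
    n_samples = int(round(duration_s * sample_rate_hz))
    t = np.arange(n_samples) / sample_rate_hz
    signal = np.zeros(n_samples)
    for f, a in zip(freqs_hz, amplitudes):
        signal += a * np.cos(2.0 * np.pi * f * t)
    return t, signal

def beat_period(f1_hz, f2_hz):
    """Period of the beat pattern produced by two superimposed frequencies."""
    return 1.0 / abs(f1_hz - f2_hz)

def magnitude_at(t, signal, freq_hz):
    """Magnitude of the discrete-time Fourier transform at a single frequency."""
    return abs(np.sum(signal * np.exp(-2j * np.pi * freq_hz * t)))

def valley_ratio(f1_hz, f2_hz, duration_s, sample_rate_hz=8000.0):
    """Midpoint magnitude divided by the larger of the two peak magnitudes.

    Values near 0 mean the two signals are separated; values near or
    above 1 mean they merge into a single unresolved feature.
    """
    t, s = simulate_transient([f1_hz, f2_hz], [1.0, 1.0], duration_s, sample_rate_hz)
    peak = max(magnitude_at(t, s, f1_hz), magnitude_at(t, s, f2_hz))
    valley = magnitude_at(t, s, 0.5 * (f1_hz + f2_hz))
    return valley / peak

# Hypothetical frequencies 10 Hz apart, so the beat period is 0.1 s.
period = beat_period(1000.0, 1010.0)
ratio_short = valley_ratio(1000.0, 1010.0, duration_s=0.05)  # half a beat
ratio_long = valley_ratio(1000.0, 1010.0, duration_s=1.0)    # ten beats
```

In this toy case the transient covering half a beat leaves the two signals merged, while the transient covering ten beats separates them. Real protein transients contain many isotopologues and charge states, as well as decay and noise, all of which this sketch omits.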
Beyond generating individual isotopic envelopes and full mass spectra for specific instrument models and settings, the simulations aid feature extraction from experimental datasets via spectral matching. We have demonstrated that compound databases of diverse sizes can be efficiently converted into simulated mass spectra and used to extract features directly in the m/z space. This approach has shown advantages in several applications, including affinity-selection MS, where large databases of ligands (small molecules) are evaluated for their interaction strength with proteins. We will discuss these and other applications and outline future strategies for using simulated mass spectral datasets.
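Matching simulated spectra against an experimental spectrum in the m/z space can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used by FTMS Simulator: the Gaussian peak shape, the cosine-similarity score, the function names, and all peak values are hypothetical choices made for the example.

```python
import numpy as np

def render_profile(peaks, mz_grid, fwhm):
    """Render (m/z, intensity) pairs as Gaussian peaks on a shared m/z grid.

    The Gaussian shape is an assumption made for this sketch.
    """
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    profile = np.zeros_like(mz_grid, dtype=float)
    for mz, intensity in peaks:
        profile += intensity * np.exp(-0.5 * ((mz_grid - mz) / sigma) ** 2)
    return profile

def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors (0.0 if either is empty)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def best_match(experimental, candidates, mz_grid, fwhm):
    """Return (name, score) of the simulated candidate closest to the experiment."""
    exp_profile = render_profile(experimental, mz_grid, fwhm)
    scores = {
        name: cosine_similarity(exp_profile, render_profile(peaks, mz_grid, fwhm))
        for name, peaks in candidates.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]

# Hypothetical two-peak isotopic patterns for two database compounds.
mz_grid = np.linspace(499.0, 503.0, 4001)
candidates = {
    "compound_A": [(500.0, 1.0), (501.0, 0.3)],
    "compound_B": [(500.5, 1.0), (501.5, 0.3)],
}
experimental = [(500.0, 0.9), (501.0, 0.3)]
name, score = best_match(experimental, candidates, mz_grid, fwhm=0.01)
```

Scoring on a shared m/z grid avoids a separate peak-picking step, which is one plausible reading of extracting features directly in the m/z space. A practical implementation would also need to handle noise, baseline, and overlapping compounds.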