psm_utils.io

Parsers for proteomics search results from various search engines.

psm_utils.io.read_file(filename: str | Path, *args, filetype: str = 'infer', **kwargs)

Read PSM file into PSMList.

Parameters:
  • filename (str) – Path to file.

  • filetype (str, optional) – File type. Any PSM file type with read support. See psm_utils tag in Supported file formats.

  • *args (tuple) – Additional arguments are passed to the psm_utils.io reader.

  • **kwargs (dict, optional) – Additional keyword arguments are passed to the psm_utils.io reader.

psm_utils.io.write_file(psm_list: PSMList, filename: str | Path, *args, filetype: str = 'infer', show_progressbar: bool = False, **kwargs)

Write PSMList to PSM file.

Parameters:
  • psm_list (PSMList) – PSM list to be written.

  • filename (str) – Path to file.

  • filetype (str, optional) – File type. Any PSM file type with write support. See psm_utils tag in Supported file formats.

  • show_progressbar (bool, optional) – Show progress bar for conversion process. (default: False)

  • *args (tuple) – Additional arguments are passed to the psm_utils.io writer.

  • **kwargs (dict, optional) – Additional keyword arguments are passed to the psm_utils.io writer.

psm_utils.io.convert(input_filename: str | Path, output_filename: str | Path, input_filetype: str = 'infer', output_filetype: str = 'infer', show_progressbar: bool = False)

Convert a PSM file from one format into another.

Parameters:
  • input_filename (str) – Path to input file.

  • output_filename (str) – Path to output file.

  • input_filetype (str, optional) – File type. Any PSM file type with read support. See psm_utils tag in Supported file formats.

  • output_filetype (str, optional) – File type. Any PSM file type with write support. See psm_utils tag in Supported file formats.

  • show_progressbar (bool, optional) – Show progress bar for conversion process. (default: False)

Examples

Convert a MaxQuant msms.txt file to a MS²PIP peprec file, while inferring the applicable file types from the file extensions:

>>> from psm_utils.io import convert
>>> convert("msms.txt", "filename_out.peprec")

Convert a MaxQuant msms.txt file to a MS²PIP peprec file, while explicitly specifying both file types:

>>> convert(
...     "filename_in.msms",
...     "filename_out.peprec",
...     input_filetype="msms",
...     output_filetype="peprec"
... )

Note that file types can only be inferred for specific file names and/or extensions, such as msms.txt or *.peprec.
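As a rough illustration of why inference only works for recognisable names, the sketch below matches a file name against a small pattern table. The table and the helper infer_filetype are hypothetical and cover only the two patterns named above; the actual rules used by psm_utils.io may differ.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical pattern table: only the two patterns named in the text above.
KNOWN_PATTERNS = {
    "msms.txt": "msms",
    "*.peprec": "peprec",
}


def infer_filetype(filename):
    """Return the file type whose pattern matches the file name, else raise."""
    name = Path(filename).name
    for pattern, filetype in KNOWN_PATTERNS.items():
        if fnmatch(name, pattern):
            return filetype
    raise ValueError(f"Cannot infer file type from {name!r}; pass it explicitly.")
```

Under these assumptions, a name such as filename_in.msms matches neither pattern, which is why the second example above passes input_filetype explicitly.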

psm_utils.io.ionbot

Interface with ionbot PSM files.

Currently only supports the ionbot.first.csv files.

class psm_utils.io.ionbot.IonbotReader(filename: str | Path, *args, **kwargs)

Reader for ionbot.first.csv PSM files.

Parameters:

filename (str, pathlib.Path) – Path to PSM file.

Examples

IonbotReader supports iteration:

>>> from psm_utils.io.ionbot import IonbotReader
>>> for psm in IonbotReader("ionbot.first.csv"):
...     print(psm.peptidoform.proforma)
ACDEK
AC[Carbamidomethyl]DEFGR
[Acetyl]-AC[Carbamidomethyl]DEFGHIK

Or a full file can be read at once into a psm_utils.psm_list.PSMList object:

>>> ionbot_reader = IonbotReader("ionbot.first.csv")
>>> psm_list = ionbot_reader.read_file()

read_file() → PSMList

Read full PSM file into a PSMList object.

exception psm_utils.io.ionbot.InvalidIonbotModificationError

Invalid ionbot modification.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

psm_utils.io.idxml

Interface with OpenMS idXML PSM files.

Notes

  • idXML supports multiple peptide hits (identifications) per spectrum. Each peptide hit is parsed as an individual PSM object.

class psm_utils.io.idxml.IdXMLReader(filename: Path | str, *args, **kwargs)

Reader for idXML files.

Parameters:

filename (str, pathlib.Path) – Path to idXML file.

Examples

>>> from psm_utils.io.idxml import IdXMLReader
>>> reader = IdXMLReader("example.idXML")
>>> psm_list = [psm for psm in reader]

read_file() → PSMList

Read full PSM file into a PSMList object.

class psm_utils.io.idxml.IdXMLWriter(filename: str | Path, protein_ids=None, peptide_ids=None, *args, **kwargs)

Writer for idXML files.

Parameters:

Notes

  • Unlike other psm_utils.io writer classes, IdXMLWriter does not support writing a single PSM to a file with the write_psm() method. Only writing a full PSMList to a file at once with the write_file() method is currently supported.

  • If protein_ids and peptide_ids are provided, each PeptideIdentification object in the list peptide_ids will be updated with new rescoring_features from the PSMList. Otherwise, new pyopenms objects will be created, filled with information of PSMList and written to the idXML file.

Examples

  • Example with pyopenms objects:

>>> from psm_utils.io.idxml import IdXMLReader, IdXMLWriter
>>> reader = IdXMLReader("psm_utils/tests/test_data/test_in.idXML")
>>> psm_list = reader.read_file()
>>> for psm in psm_list:
...     psm.rescoring_features = {**psm.rescoring_features, **{"feature": 1}}
>>> writer = IdXMLWriter("psm_utils/tests/test_data/test_out.idXML", reader.protein_ids, reader.peptide_ids)
>>> writer.write_file(psm_list)

  • Example without pyopenms objects:

>>> from psm_utils.psm import PSM
>>> from psm_utils.psm_list import PSMList
>>> psm_list = PSMList(psm_list=[PSM(peptidoform="ACDK", spectrum_id=1, score=140.2, retention_time=600.2)])
>>> writer = IdXMLWriter("psm_utils/tests/test_data/test_out.idXML")
>>> writer.write_file(psm_list)

write_psm(psm: PSM)

Write a single PSM to the PSM file.

This method is currently not supported (see Notes).

Raises:

NotImplementedError – IdXMLWriter currently does not support write_psm.

write_file(psm_list: PSMList) → None

Write the PSMList to the PSM file.

If self.protein_ids and self.peptide_ids are not None, the PSM list scores, ranks, and rescoring features will first be merged with the existing IDs from those objects.

exception psm_utils.io.idxml.IdXMLException

Exception in psm_utils.io.IdXML

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception psm_utils.io.idxml.IdXMLReaderEmptyListException

Exception in psm_utils.io.IdXMLReader

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

psm_utils.io.maxquant

Interface to MaxQuant msms.txt PSM files.

class psm_utils.io.maxquant.MSMSReader(filename: str | Path, *args, **kwargs)

Reader for MaxQuant msms.txt PSM files.

Parameters:
  • filename (str, pathlib.Path) – Path to PSM file.

  • decoy_prefix (str, optional) – Protein name prefix used to denote decoy protein entries. Default: "DECOY_".

Examples

MSMSReader supports iteration:

>>> from psm_utils.io.maxquant import MSMSReader
>>> for psm in MSMSReader("msms.txt"):
...     print(psm.peptidoform.proforma)
WFEELSK
NDVPLVGGK
GANLGEMTNAGIPVPPGFC[+57.022]VTAEAYK
...

Or a full file can be read at once into a PSMList object:

>>> reader = MSMSReader("msms.txt")
>>> psm_list = reader.read_file()

read_file() → PSMList

Read full PSM file into a PSMList object.

exception psm_utils.io.maxquant.MSMSParsingError

Error while parsing MaxQuant msms.txt PSM file.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

psm_utils.io.msamanda

Interface to MS Amanda CSV result files.

class psm_utils.io.msamanda.MSAmandaReader(filename: str | Path, *args, **kwargs)

Reader for MS Amanda CSV PSM files.

Parameters:

filename (str, pathlib.Path) – Path to PSM file.

read_file() → PSMList

Read full PSM file into a PSMList object.

exception psm_utils.io.msamanda.MSAmandaParsingError

Error while parsing MS Amanda CSV PSM file.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

psm_utils.io.mzid

Reader and writer for the HUPO-PSI mzIdentML format.

See psidev.info/mzidentml for more info on the format.

class psm_utils.io.mzid.MzidReader(filename: str | Path, *args, score_key: str = None, **kwargs)

Reader for mzIdentML PSM files.

Parameters:
  • filename (str, pathlib.Path) – Path to PSM file.

  • score_key (str, optional) – Name of the score metric to use as PSM score. If not provided, the score metric is inferred from the file if one of the child parameters of MS:1001143 is present.

Examples

MzidReader supports iteration:

>>> from psm_utils.io.mzid import MzidReader
>>> for psm in MzidReader("peptides_1_1_0.mzid"):
...     print(psm.peptidoform.proforma)
ACDEK
AC[Carbamidomethyl]DEFGR
[Acetyl]-AC[Carbamidomethyl]DEFGHIK

Or a full file can be read at once into a psm_utils.psm_list.PSMList object:

>>> mzid_reader = MzidReader("peptides_1_1_0.mzid")
>>> psm_list = mzid_reader.read_file()

Notes

  • MzidReader looks for the retention time or scan start time cvParams at both the SpectrumIdentificationResult and SpectrumIdentificationItem levels. Note that according to the mzIdentML specification document (v1.1.1), neither cvParam is expected to be present at either level.

  • For the PSM.spectrum_id property, the spectrum title cvParam is preferred over the spectrumID attribute, as these titles always match the titles in the peak list files. spectrumID is then saved in PSM.metadata["mzid_spectrum_id"]. If spectrum title is absent, spectrumID is saved to PSM.spectrum_id.
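The spectrum identifier preference described in the notes above can be sketched as a small helper. The function resolve_spectrum_id is hypothetical and is not part of psm_utils; returning empty metadata when the title is absent is likewise an assumption, as the notes do not say what is stored in that case.

```python
def resolve_spectrum_id(spectrum_title, mzid_spectrum_id):
    """Return (spectrum_id, metadata), preferring the spectrum title."""
    if spectrum_title:
        # Title present: use it as the identifier and keep spectrumID as metadata.
        return spectrum_title, {"mzid_spectrum_id": mzid_spectrum_id}
    # Title absent: fall back to spectrumID (empty metadata is an assumption).
    return mzid_spectrum_id, {}
```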

read_file() → PSMList

Read full PSM file into a PSMList object.

class psm_utils.io.mzid.MzidWriter(filename: str | Path, *args, show_progressbar: bool = False, **kwargs)

Writer for mzIdentML PSM files.

Parameters:
  • filename (str, pathlib.Path) – Path to PSM file.

  • show_progressbar (bool, optional) – Show progress bar for conversion process. (default: False)

Notes

  • Unlike other psm_utils.io writer classes, MzidWriter does not support writing a single PSM to a file with the write_psm() method. Only writing a full PSMList to a file at once with the write_file() method is currently supported.

  • While not required according to the mzIdentML specification document (v1.1.1), the retention time is written as cvParam retention time to the SpectrumIdentificationItem element. As the actual unit is not known in psm_utils, the unit is written as seconds.

  • As the actual PSM score type is not known in psm_utils, the score is written as cvParam MS:1001153 to the SpectrumIdentificationItem element.

write_psm(psm: PSM)

Write a single PSM to the PSM file.

This method is currently not supported (see Notes).

Raises:

NotImplementedError – MzidWriter currently does not support write_psm.

write_file(psm_list: PSMList)

Write entire PSMList to mzid file.

exception psm_utils.io.mzid.UnknownMzidScore

No known score metric found in mzIdentML file.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

psm_utils.io.peptide_record

Interface with Peptide Record PSM files.

Peptide Record (or PEPREC) is a legacy PSM file type developed at CompOmics as input format for MS²PIP. It is a simple and flexible delimited text file where each row represents a single PSM. Required columns are:

  • spec_id: Spectrum identifier; usually the identifier used in the spectrum file.

  • peptide: Simple, stripped peptide sequence (e.g., ACDE).

  • modifications: Amino acid modifications in a custom format (see below).

Depending on the use case, more columns can be required or optional:

  • charge: Peptide precursor charge.

  • observed_retention_time: Observed retention time.

  • predicted_retention_time: Predicted retention time.

  • label: Target/decoy: 1 for target PSMs, -1 for decoy PSMs.

  • score: Primary search engine score (e.g., the score used for q-value calculation).

Peptide modifications are denoted as a pipe-separated list of pipe-separated location → label pairs for each modification. The location is an integer counted starting at 1 for the first amino acid. 0 is reserved for N-terminal modifications and -1 for C-terminal modifications. Unmodified peptides can be marked with a hyphen (-). For example:

PEPREC modification(s)          Explanation
-                               Unmodified
1|Oxidation                     Oxidation on the first amino acid
1|Oxidation|5|Carbamidomethyl   Oxidation on the first amino acid and Carbamidomethyl on the fifth
0|Acetylation                   Acetylation on the N-terminus
-1|Amidation                    Amidation on the C-terminus
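To make the notation concrete, the following minimal sketch splits a PEPREC modification string into (location, label) pairs. The helper parse_peprec_modifications is hypothetical and only mirrors the rules described above; it is not the parser used by psm_utils.

```python
def parse_peprec_modifications(modifications):
    """Split a PEPREC modification string into (location, label) pairs."""
    if modifications in ("-", ""):
        return []  # unmodified peptide
    fields = modifications.split("|")
    if len(fields) % 2 != 0:
        raise ValueError(f"Expected location|label pairs, got {modifications!r}")
    # Even positions hold locations, odd positions hold labels.
    return [(int(loc), label) for loc, label in zip(fields[0::2], fields[1::2])]
```

Locations 0 and -1 come out as plain integers, so a caller can treat them as the N- and C-terminus respectively.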

Full PEPREC example:

spec_id,modifications,peptide,charge
peptide1,-,ACDEK,2
peptide2,2|Carbamidomethyl,ACDEFGR,3
peptide3,0|Acetyl|2|Carbamidomethyl,ACDEFGHIK,2
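
As an illustration of how rows like these map to ProForma strings, the sketch below builds a plain string from the peptide, modifications, and charge columns. The function peprec_row_to_proforma_string is hypothetical; unlike the library's peprec_to_proforma(), it returns a str rather than a Peptidoform and performs no validation.

```python
def peprec_row_to_proforma_string(peptide, modifications, charge=None):
    """Render PEPREC columns as a ProForma-style string (illustrative only)."""
    residues = list(peptide)
    n_term = c_term = ""
    if modifications not in ("-", ""):
        fields = modifications.split("|")
        for location, label in zip(fields[0::2], fields[1::2]):
            location = int(location)
            if location == 0:
                n_term = f"[{label}]-"  # N-terminal modification
            elif location == -1:
                c_term = f"-[{label}]"  # C-terminal modification
            else:
                residues[location - 1] += f"[{label}]"  # locations are 1-based
    proforma = n_term + "".join(residues) + c_term
    return f"{proforma}/{charge}" if charge else proforma
```

Called without a charge on the three rows above, this yields ACDEK, AC[Carbamidomethyl]DEFGR, and [Acetyl]-AC[Carbamidomethyl]DEFGHIK.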

Attention

Labile, unlocalized, and fixed modifications are not encoded in the Peptide Record notation. To encode fixed modifications, use apply_fixed_modifications() before writing to Peptide Record.

class psm_utils.io.peptide_record.PeptideRecordReader(filename: str | Path, *args, **kwargs)

Reader for Peptide Record PSM files.

Parameters:

filename (str, pathlib.Path) – Path to PSM file.

Examples

PeptideRecordReader supports iteration:

>>> from psm_utils.io.peptide_record import PeptideRecordReader
>>> for psm in PeptideRecordReader("peprec.txt"):
...     print(psm.peptidoform.proforma)
ACDEK
AC[Carbamidomethyl]DEFGR
[Acetyl]-AC[Carbamidomethyl]DEFGHIK

Or a full file can be read at once into a PSMList object:

>>> peprec_reader = PeptideRecordReader("peprec.txt")
>>> psm_list = peprec_reader.read_file()

read_file() → PSMList

Read full PSM file into a PSMList object.

class psm_utils.io.peptide_record.PeptideRecordWriter(filename, *args, **kwargs)

Writer for Peptide Record PSM files.

Parameters:

filename (str, Path) – Path to PSM file

write_psm(psm: PSM)

Write a single PSM to new or existing Peptide Record PSM file.

Parameters:

psm (PSM) – PSM object to write.

Examples

To write single PSMs to a file, PeptideRecordWriter must be opened as a context manager. Then, within the context, write_psm() can be called:

>>> with PeptideRecordWriter("peprec.txt") as writer:
...     writer.write_psm(psm)

write_file(psm_list: PSMList)

Write an entire PSMList to a new Peptide Record PSM file.

Parameters:

psm_list (PSMList) – PSMList object to write to file.

Examples

>>> writer = PeptideRecordWriter("peprec.txt")
>>> writer.write_file(psm_list)

psm_utils.io.peptide_record.peprec_to_proforma(peptide: str, modifications: str, charge: int | None = None) → Peptidoform

Convert Peptide Record notation to Peptidoform.

Parameters:
  • peptide (str) – Stripped peptide sequence.

  • modifications (str) – Modifications in Peptide Record notation (e.g., 4|Oxidation)

  • charge (int, optional) – Precursor charge state

Returns:

peptidoform – Peptidoform

Return type:

psm_utils.peptidoform.Peptidoform

Raises:

InvalidPeprecModificationError – If a PEPREC modification cannot be parsed.

psm_utils.io.peptide_record.proforma_to_peprec(peptidoform: Peptidoform)

Convert Peptidoform to Peptide Record notation.

Parameters:

peptidoform (psm_utils.peptidoform.Peptidoform) –

Returns:

  • peptide (str) – Stripped peptide sequence

  • modifications (str) – Modifications in Peptide Record notation

  • charge (int, optional) – Precursor charge state, if available, else None

Notes

Labile, unlocalized, and fixed modifications are not encoded in the Peptide Record notation. To encode fixed modifications, use apply_fixed_modifications() before writing to Peptide Record.

psm_utils.io.peptide_record.from_dataframe(peprec_df: DataFrame) → PSMList

Convert Peptide Record Pandas DataFrame into PSMList.

Parameters:

peprec_df (pandas.DataFrame) – Peptide Record DataFrame

Returns:

psm_list – PSMList object

Return type:

PSMList

psm_utils.io.peptide_record.to_dataframe(psm_list: PSMList) → DataFrame

Convert PSMList object into Peptide Record Pandas DataFrame.

Parameters:

psm_list (PSMList) –

Return type:

pd.DataFrame

Examples

>>> psm_list = PeptideRecordReader("peprec.csv").read_file()
>>> psm_utils.io.peptide_record.to_dataframe(psm_list)
    spec_id    peptide               modifications  charge  label  ...
0  peptide1      ACDEK                           -       2      1  ...
1  peptide2    ACDEFGR           2|Carbamidomethyl       3      1  ...
2  peptide3  ACDEFGHIK  0|Acetyl|2|Carbamidomethyl       2      1  ...

exception psm_utils.io.peptide_record.InvalidPeprecError

Invalid Peptide Record file.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception psm_utils.io.peptide_record.InvalidPeprecModificationError

Invalid Peptide Record modification.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

psm_utils.io.pepxml

Interface with TPP pepXML PSM files.

class psm_utils.io.pepxml.PepXMLReader(filename: str | Path, *args, score_key: str = None, **kwargs)

Reader for pepXML PSM files.

Parameters:
  • filename (str, pathlib.Path) – Path to PSM file.

  • score_key (str, optional) – Name of the score metric to use as PSM score. If not provided, the score metric is inferred from a list of known search engine scores.

read_file() → PSMList

Read full PSM file into a PSMList object.

psm_utils.io.percolator

Reader and writer for Percolator Tab PIN/POUT PSM files.

The tab-delimited input and output formats for Percolator are defined on the Percolator GitHub Wiki pages.

Notes

  • While PercolatorTabReader supports reading the peptide notation with preceding and following amino acids (e.g. R.ACDEK.F), these amino acids are not stored and are not written by PercolatorTabWriter.
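
The flanking-residue notation mentioned in the note can be handled roughly as follows. The helper strip_flanking_residues is hypothetical and only shows the idea; how PercolatorTabReader does this internally is not specified here.

```python
def strip_flanking_residues(peptide):
    """Drop the preceding and following amino acids from 'X.SEQUENCE.Y' notation."""
    # Flanked peptides have a single character and a dot at each end.
    if len(peptide) >= 5 and peptide[1] == "." and peptide[-2] == ".":
        return peptide[2:-2]
    return peptide  # no flanking residues present
```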

class psm_utils.io.percolator.PercolatorTabReader(filename: str | Path, score_column=None, retention_time_column=None, mz_column=None, *args, **kwargs)

Reader for Percolator Tab PIN/POUT PSM file.

As the score, retention time, and precursor m/z are often embedded as feature columns, but not with a fixed column name, their respective column names need to be provided as parameters to the class. If not provided, these properties will not be added to the resulting PSM. Nevertheless, they will still be added to its rescoring_features property dictionary, along with the other features.

Parameters:
  • filename (str, pathlib.Path) – Path to PSM file.

  • score_column (str, optional) – Name of the column that holds the primary PSM score.

  • retention_time_column (str, optional) – Name of the column that holds the retention time.

  • mz_column (str, optional) – Name of the column that holds the precursor m/z.
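
The behaviour described above, where a column only becomes the PSM score if its name is provided but always remains available as a feature, can be sketched like this. The function split_pin_row and the NON_FEATURE_COLUMNS set are hypothetical; the set is taken from the pin-style columns listed for PercolatorTabWriter and the real reader may treat columns differently.

```python
# Assumed identifying columns of a PIN row; everything else is treated as a feature.
NON_FEATURE_COLUMNS = {"SpecId", "Label", "ScanNr", "Peptide", "Proteins"}


def split_pin_row(row, score_column=None):
    """Return (score, features) for one PIN row given as a dict of strings."""
    features = {
        name: float(value)
        for name, value in row.items()
        if name not in NON_FEATURE_COLUMNS
    }
    # The score is only set when the caller names the column that holds it.
    score = features.get(score_column) if score_column else None
    return score, features
```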

read_file() → PSMList

Read full PSM file into a PSMList object.

class psm_utils.io.percolator.PercolatorTabWriter(filename: str | Path, style: str = 'pin', feature_names: list[str] | None = None, add_basic_features: bool = False, *args, **kwargs)

Writer for Percolator TSV “PIN” and “POUT” PSM files.

Parameters:
  • filename (str, pathlib.Path) – Path to PSM file.

  • style (str) – Percolator Tab style. One of {pin, pout}. If pin, the columns SpecId, Label, ScanNr, ChargeN, PSMScore, Peptide, and Proteins are written alongside the requested feature names (see feature_names). If pout, the columns PSMId, Label, score, q-value, posterior_error_prob, peptide, and proteinIds are written.

  • feature_names (list[str], optional) – List of feature names to extract from PSMs and write to file. List values should correspond to keys in the rescoring_features property. If None, no rescoring features will be written to the file. If appending to an existing file, the existing header will be used to determine the feature names. Only has effect with pin style.

  • add_basic_features (bool, optional) – If True, add PSMScore and ChargeN features to the file. Only has effect with pin style. Default is False.

write_psm(psm: PSM)

Write a single PSM to the PSM file.

write_file(psm_list: PSMList)

Write an entire PSMList to the PSM file.

psm_utils.io.percolator.join_pout_files(target_filename: str | Path, decoy_filename: str | Path, output_filename: str | Path)

Join target and decoy Percolator Out (POUT) files into single PercolatorTab file.

Parameters:
  • target_filename (str, Path) – Path to the target POUT file.

  • decoy_filename (str, Path) – Path to the decoy POUT file.

  • output_filename (str, Path) – Path to the joined output file.

psm_utils.io.proteome_discoverer

Reader for Proteome Discoverer MSF PSM files.

class psm_utils.io.proteome_discoverer.MSFReader(filename: str | Path, *args, **kwargs)

Reader for Proteome Discoverer MSF file.

Parameters:

filename (str, pathlib.Path) – Path to MSF file.

read_file() → PSMList

Read full PSM file into a PSMList object.

psm_utils.io.sage

Reader for PSM files from the Sage search engine.

Reads the results.sage.tsv file as defined on the Sage documentation page.

class psm_utils.io.sage.SageReader(filename, score_column: str = 'sage_discriminant_score', *args, **kwargs)

Reader for Sage results.sage.tsv file.

Parameters:
  • filename (str or Path) – Path to PSM file.

  • score_column (str, optional) – Name of the column that holds the primary PSM score. Default is sage_discriminant_score; hyperscore could also be used.

read_file() → PSMList

Read full PSM file into a PSMList object.

psm_utils.io.tsv

Reader and writer for a simple, lossless psm_utils TSV format.

Most PSM file formats lose some information when read, written, or converted with psm_utils.io, due to differences between file formats. In contrast, PSMList objects can be written to, or read from, this simple TSV format without any information loss (with the exception of the free-form spectrum attribute).

The format follows basic TSV rules, using tab as delimiter, and supports quoting when a field contains the delimiter. Peptidoforms are written in the HUPO-PSI ProForma 2.0 notation.

Required and optional columns equate to the required and optional attributes of PSM. Dictionary items in provenance_data, metadata, and rescoring_features are flattened to separate columns, each with their column names prefixed with provenance:, meta:, and rescoring:, respectively.
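
The flattening of dictionary attributes into prefixed columns can be sketched as below. The function flatten_psm_dict and the use of a plain dict in place of a PSM object are assumptions made for illustration; only the three prefixes come from the description above.

```python
# Column prefixes as described above, keyed by PSM attribute name.
PREFIXES = {
    "provenance_data": "provenance:",
    "metadata": "meta:",
    "rescoring_features": "rescoring:",
}


def flatten_psm_dict(psm):
    """Flatten nested dictionary attributes into prefixed top-level columns."""
    row = {}
    for key, value in psm.items():
        prefix = PREFIXES.get(key)
        if prefix is None:
            row[key] = value  # regular attribute, written as-is
        else:
            for name, item in (value or {}).items():
                row[prefix + name] = item
    return row
```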

Examples

Minimal psm_utils TSV file:

peptidoform spectrum_id
RNVIDKVAK/2 1
KHLEQHPK/2  2
...

Recommended psm_utils TSV file, compatible with HUPO-PSI Universal Spectrum Identifier:

peptidoform spectrum_id     run     collection
VLHPLEGAVVIIFK/2    17555   Adult_Frontalcortex_bRP_Elite_85_f09    PXD000561
...

Full psm_utils TSV file, converted from a Percolator Tab file:

peptidoform spectrum_id     run     collection      spectrum        is_decoy        score   precursor_mz    retention_time  protein_list    source  provenance:filename     rescoring:ExpMass       rescoring:CalcMass      rescoring:hyperscore    rescoring:deltaScore    rescoring:frac_ion_b    rescoring:frac_ion_y    rescoring:Mass  rescoring:dM    rescoring:absdM rescoring:PepLen        rescoring:Charge2       rescoring:Charge3       rescoring:Charge4       rescoring:enzN  rescoring:enzC  rescoring:enzInt
RNVIDKVAK/2 _3_2_1                          False   20.3    1042.64         ['DECOY_sp|Q8U0H4_REVERSED|RTCB_PYRFU-tRNA-splicing-ligase-RtcB-OS=Pyrococcus-furiosus...']     percolator      pyro.t.xml.pin  1042.64 1042.64 20.3    6.6     0.444444        0.333333        1042.64 0.0003  0.0003  9       1       0       0       1       0       1
KHLEQHPK/2  _4_2_1                          False   26.5    1016.56         ['sp|Q8TZD9|RS15_PYRFU-30S-ribosomal-protein-S15-OS=Pyrococcus-furiosus-(strain-ATCC...']       percolator      pyro.t.xml.pin  1016.56 1016.56 26.5    18.5    0.375   0.75    1016.56 0.001   0.001   8       1       0       0       1       0       0
...

class psm_utils.io.tsv.TSVReader(filename: str | Path, *args, **kwargs)

Reader for psm_utils TSV format.

Parameters:

filename (str, pathlib.Path) – Path to PSM file.

read_file() → PSMList

Read full PSM file into a PSMList object.

class psm_utils.io.tsv.TSVWriter(filename: str | Path, example_psm: PSM | None = None, *args, **kwargs)

Writer for psm_utils TSV format.

Parameters:
  • filename (str, pathlib.Path) – Path to PSM file.

  • example_psm (psm_utils.psm.PSM, optional) – Example PSM, required to extract the column names when writing to a new file. Should contain all fields that are to be written to the PSM file, i.e., all items in the provenance_data, metadata, and rescoring_features attributes. In other words, items that are not present in the example PSM will not be written to the file, even though they are present in other PSMs passed to write_psm() or write_file().
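
The effect of example_psm on the written columns can be illustrated with the standard library alone. In this sketch, plain dicts stand in for PSM objects and write_rows is a hypothetical helper; whether TSVWriter relies on csv.DictWriter internally is an assumption.

```python
import csv
import io


def write_rows(example_row, rows):
    """Write rows as TSV, keeping only the columns present in example_row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(example_row),  # column names come from the example only
        delimiter="\t",
        extrasaction="ignore",  # silently drop keys missing from the example
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Any key that appears in a later row but not in the example row is left out of the output, which mirrors the behaviour described for example_psm.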

write_psm(psm: PSM)

Write a single PSM to new or existing PSM file.

Parameters:

psm (PSM) – PSM object to write.

write_file(psm_list: PSMList)

Write an entire PSMList to a new PSM file.

Parameters:

psm_list (PSMList) – PSMList object to write to file.

psm_utils.io.timscore

Reader for TIMScore Parquet files.

class psm_utils.io.timscore.TIMScoreReader(filename: str | Path, *args, **kwargs)

Reader for TIMScore Parquet files.

Parameters:

filename (str, pathlib.Path) – Path to TIMScore Parquet file.

read_file() → PSMList

Read full PSM file into a PSMList object.

psm_utils.io.xtandem

Interface with X!Tandem XML PSM files.

Notes

  • In X!Tandem XML, N/C-terminal modifications are encoded as normal modifications and are therefore parsed accordingly. Any information on which modifications are N/C-terminal is therefore lost.

    N-terminal modification in X!Tandem XML:

    <aa type="M" at="1" modified="42.01057" />
    
  • Consecutive modifications, i.e., a modified residue that is modified further, are encoded in X!Tandem XML as two distinct modifications on the same site. However, in psm_utils, multiple modifications on the same site are not supported. While parsing X!Tandem XML PSMs, the mass shift labels of these two modifications will therefore be summed into a single modification.

    For example, carbamidomethylation of cysteine (57.02200) plus ammonia-loss (-17.02655) will be parsed as one modification with mass shift 39.99545, which corresponds to the combined modification Pyro-carbamidomethyl (39.994915):

    <aa type="C" at="189" modified="57.02200" />
    <aa type="C" at="189" modified="-17.02655" />

    [+39.99545]
    
  • Although X!Tandem XML allows multiple peptide/protein identifications per entry, only the first peptide/protein per entry is parsed.
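
The summing of consecutive modifications described in the notes above can be sketched as follows. The helper sum_modifications_per_site is hypothetical, and rounding to five decimals is an assumption chosen to match the precision of the example values.

```python
from collections import defaultdict


def sum_modifications_per_site(modifications):
    """Sum the mass shifts of (site, mass_shift) pairs that share a site."""
    totals = defaultdict(float)
    for site, mass_shift in modifications:
        totals[site] += mass_shift
    # Round to the precision of the input values (assumed: five decimals).
    return {site: round(total, 5) for site, total in totals.items()}
```

Applied to the two modifications on residue 189 shown above, the result is a single mass shift of about 39.99545 on that site.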

class psm_utils.io.xtandem.XTandemReader(filename: str | Path, *args, decoy_prefix='DECOY_', score_key='expect', **kwargs)

Reader for X!Tandem XML PSM files.

Parameters:
  • filename (str, pathlib.Path) – Path to PSM file.

  • decoy_prefix (str, optional) – Protein name prefix used to denote decoy protein entries. Default: "DECOY_".

  • score_key (str, optional) – Key of score to use as PSM score. One of "expect", "hyperscore", "delta", or "nextscore". Default: "expect". The "expect" score (e-value) is converted to its negative natural logarithm to facilitate downstream analysis.
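
The e-value conversion mentioned for score_key="expect" amounts to a one-line transformation. The function name expect_to_score is hypothetical and used here only for illustration.

```python
import math


def expect_to_score(expect):
    """Convert an X!Tandem e-value to its negative natural logarithm."""
    return -math.log(expect)
```

With this transformation, smaller e-values give larger scores, so higher values indicate better matches, as with most other search engine scores.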

Examples

XTandemReader supports iteration:

>>> from psm_utils.io.xtandem import XTandemReader
>>> for psm in XTandemReader("pyro.t.xml"):
...     print(psm.peptidoform.proforma)
WFEELSK
NDVPLVGGK
GANLGEMTNAGIPVPPGFC[+57.022]VTAEAYK
...

Or a full file can be read at once into a PSMList object:

>>> reader = XTandemReader("pyro.t.xml")
>>> psm_list = reader.read_file()

read_file() → PSMList

Read full PSM file into a PSMList object.

exception psm_utils.io.xtandem.XTandemException

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception psm_utils.io.xtandem.XTandemModificationException

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.