
Common utilities for parsing and handling PSMs, and search engine results.

class psm_utils.Peptidoform(proforma_sequence: str | ProForma)

Peptide sequence, modifications and charge state represented in ProForma notation.


proforma_sequence – Peptidoform sequence in ProForma v2 notation as str or pyteomics.proforma.ProForma object.


List of tuples with residue and modifications for each location.

Dict with sequence-wide properties. Type: dict[str, Any]


>>> peptidoform = Peptidoform("ACDM[Oxidation]EK")
>>> peptidoform.theoretical_mass

property proforma: str

Peptidoform sequence in ProForma v2 notation.


>>> Peptidoform("AC[U:4]DEK/2").proforma

property sequence: str

Stripped peptide sequence (modifications removed).


>>> Peptidoform("AC[U:4]DEK/2").sequence

property precursor_charge: int | None

Returns the charge state as integer or None if no charge assigned.


>>> Peptidoform("ACDEK/2").precursor_charge
>>> Peptidoform("ACDEK").precursor_charge

property is_modified: bool

Whether or not the peptidoform carries any modification of any type.

Includes N- and C-terminal, fixed, sequential, labile, and unlocalized modifications.

property sequential_composition: list[Composition]

Atomic compositions of both termini and each residue, including modifications.

Includes N- and C-terminal, fixed, and sequential modifications. Does not include labile or unlocalized modifications.


>>> Peptidoform("ACDEK/2").sequential_composition
[Composition({'H': 1}),
Composition({'H': 5, 'C': 3, 'O': 1, 'N': 1}),
Composition({'H': 5, 'C': 3, 'S': 1, 'O': 1, 'N': 1}),
Composition({'H': 5, 'C': 4, 'O': 3, 'N': 1}),
Composition({'H': 7, 'C': 5, 'O': 3, 'N': 1}),
Composition({'H': 12, 'C': 6, 'N': 2, 'O': 1}),
Composition({'H': 1, 'O': 1})]

property composition: Composition

Atomic composition of the full peptidoform.

Includes all modifications, also labile and unlocalized.


>>> Peptidoform("ACDEK/2").composition
Composition({'H': 36, 'C': 21, 'O': 10, 'N': 6, 'S': 1})
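The two example outputs above are consistent with each other: when no labile or unlocalized modifications are present, the per-position compositions add up to the full composition. The following is a minimal standard-library sketch of that check, with plain dicts standing in for the Composition objects. The MONOISOTOPIC table and both helper functions are illustrative assumptions and not part of psm_utils.

```python
from collections import Counter

# Per-position compositions for "ACDEK/2", copied from the
# sequential_composition example (N-terminus, five residues, C-terminus).
sequential = [
    {"H": 1},
    {"H": 5, "C": 3, "O": 1, "N": 1},
    {"H": 5, "C": 3, "S": 1, "O": 1, "N": 1},
    {"H": 5, "C": 4, "O": 3, "N": 1},
    {"H": 7, "C": 5, "O": 3, "N": 1},
    {"H": 12, "C": 6, "N": 2, "O": 1},
    {"H": 1, "O": 1},
]

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
}


def total_composition(parts):
    """Sum element counts over all positions (hypothetical helper)."""
    total = Counter()
    for part in parts:
        total.update(part)  # Counter.update adds counts element-wise
    return dict(total)


def monoisotopic_mass(composition):
    """Monoisotopic mass of an element-count mapping (hypothetical helper)."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


full = total_composition(sequential)
mass = monoisotopic_mass(full)
print(full)
print(round(mass, 4))
```

The summed counts match the composition shown for "ACDEK/2", and the resulting mass is the uncharged monoisotopic mass of the peptide.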
property sequential_theoretical_mass: float

Monoisotopic mass of both termini and each (modified) residue.

Includes N- and C-terminal, fixed, and sequential modifications. Does not include labile or unlocalized modifications.


>>> Peptidoform("ACDEK/2").sequential_theoretical_mass

property theoretical_mass: float

Monoisotopic mass of the full uncharged peptidoform.

Includes all modifications, also labile and unlocalized.


>>> Peptidoform("ACDEK/2").theoretical_mass

property theoretical_mz: float | None

Monoisotopic mass-to-charge ratio of the full peptidoform.

Includes all modifications, also labile and unlocalized.


>>> Peptidoform("ACDEK/2").theoretical_mz
>>> Peptidoform("AC[+57.021464]DEK/2").theoretical_mz

rename_modifications(mapping: dict[str, str]) → None

Apply mapping to rename modification tags.


mapping (dict[str, str]) – Mapping of old label → new label for each modification that requires renaming. Modification labels that are not in the mapping will not be renamed.


>>> peptidoform = Peptidoform('[ac]-PEPTC[cmm]IDEK')
>>> peptidoform.rename_modifications({
...     "ac": "Acetyl",
...     "cmm": "Carbamidomethyl"
... })
>>> peptidoform.proforma
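To illustrate the renaming rule on a plain ProForma string, here is a minimal sketch that swaps bracketed labels with a regular expression. It is not the psm_utils implementation: the real method modifies the Peptidoform in place, whereas this hypothetical rename_labels function returns a new string.

```python
import re


def rename_labels(proforma: str, mapping: dict) -> str:
    """Replace each bracketed label found in mapping; leave others untouched."""

    def swap(match):
        label = match.group(1)
        return f"[{mapping.get(label, label)}]"

    # Match the text between a pair of square brackets (no nesting).
    return re.sub(r"\[([^\[\]]+)\]", swap, proforma)


renamed = rename_labels(
    "[ac]-PEPTC[cmm]IDEK",
    {"ac": "Acetyl", "cmm": "Carbamidomethyl"},
)
print(renamed)  # [Acetyl]-PEPTC[Carbamidomethyl]IDEK
```

Labels missing from the mapping pass through unchanged, which mirrors the behavior described for the mapping parameter.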
add_fixed_modifications(modification_rules: list[tuple[str, list[str]]] | dict[str, list[str]])

Add fixed modifications to peptidoform.

Add modification rules for fixed modifications to peptidoform. These will be added in the “fixed modifications” notation, at the front of the ProForma sequence.


>>> peptidoform = Peptidoform("ATPEILTCNSIGCLK")
>>> peptidoform.add_fixed_modifications([("Carbamidomethyl", ["C"])])
>>> peptidoform.proforma


While globally defined terminal modifications are not explicitly supported in ProForma v2, this function supports adding terminal modifications using the N-term and C-term targets in place of an amino acid target. These global modifications are supported in the psm_utils.peptidoform.Peptidoform.apply_fixed_modifications() method through a workaround. See for discussions on the issue.
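The "fixed modifications" notation places each rule in angle brackets at the front of the sequence. The sketch below builds that prefix on a plain string and accepts both rule formats from the signature. It is only an illustration under stated assumptions: the function name prepend_fixed_rules is hypothetical, and only amino acid targets are handled, not the N-term and C-term targets described in the note above.

```python
def prepend_fixed_rules(proforma, modification_rules):
    """Prefix a ProForma string with fixed-modification rules (sketch only)."""
    # Accept either {"label": ["C", ...]} or [("label", ["C", ...]), ...].
    if isinstance(modification_rules, dict):
        modification_rules = list(modification_rules.items())
    prefix = "".join(
        f"<[{label}]@{','.join(targets)}>" for label, targets in modification_rules
    )
    return prefix + proforma


from_list = prepend_fixed_rules("ATPEILTCNSIGCLK", [("Carbamidomethyl", ["C"])])
from_dict = prepend_fixed_rules("ATPEILTCNSIGCLK", {"Carbamidomethyl": ["C"]})
print(from_list)  # <[Carbamidomethyl]@C>ATPEILTCNSIGCLK
```

Both input formats yield the same string, which is why the method accepts either.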


apply_fixed_modifications()

Apply ProForma fixed modifications as sequential modifications.

Applies all modifications that are encoded as fixed in the ProForma notation (once at the beginning of the sequence) as modifications throughout the sequence at each affected amino acid residue.


>>> peptidoform = Peptidoform('<[Carbamidomethyl]@C>ATPEILTCNSIGCLK')
>>> peptidoform.apply_fixed_modifications()
>>> peptidoform.proforma
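The conversion from fixed to sequential notation can be sketched on plain strings as follows. This is a simplified illustration, not the psm_utils implementation: the hypothetical expand_fixed_rules function assumes amino acid targets only and a sequence that carries no other modifications or charge suffix.

```python
import re

# One fixed-modification rule such as <[Carbamidomethyl]@C> or <[Oxidation]@M,W>.
FIXED_RULE = re.compile(r"<\[([^\[\]]+)\]@([A-Z](?:,[A-Z])*)>")


def expand_fixed_rules(proforma):
    """Rewrite leading fixed-modification rules as per-residue modifications."""
    rules = FIXED_RULE.findall(proforma)
    sequence = FIXED_RULE.sub("", proforma)
    if not (sequence.isalpha() and sequence.isupper()):
        raise ValueError("sketch only supports plain, unmodified sequences")

    # Map each target residue to the labels that apply to it.
    labels_by_residue = {}
    for label, targets in rules:
        for residue in targets.split(","):
            labels_by_residue.setdefault(residue, []).append(label)

    return "".join(
        residue + "".join(f"[{label}]" for label in labels_by_residue.get(residue, []))
        for residue in sequence
    )


expanded = expand_fixed_rules("<[Carbamidomethyl]@C>ATPEILTCNSIGCLK")
print(expanded)  # ATPEILTC[Carbamidomethyl]NSIGC[Carbamidomethyl]LK
```

Each cysteine receives its own copy of the label, and the rule at the front of the sequence disappears.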
class psm_utils.PSM(*, peptidoform: Peptidoform | str, spectrum_id: int | str, run: str | None = None, collection: str | None = None, spectrum: Any | None = None, is_decoy: bool | None = None, score: float | None = None, qvalue: float | None = None, pep: float | None = None, precursor_mz: float | None = None, retention_time: float | None = None, ion_mobility: float | None = None, protein_list: List[str] | None = None, rank: int | None = None, source: str | None = None, provenance_data: Dict[str, str] | None = {}, metadata: Dict[str, str] | None = {}, rescoring_features: Dict[str, float] | None = {})

Data class representing a peptide-spectrum match (PSM).

Links a Peptidoform to an observed spectrum and holds the related information. Attribute types are coerced and enforced upon initialization.

  • peptidoform (Peptidoform, str) – Peptidoform object or string in ProForma v2 notation.

  • spectrum_id (str, int) – Spectrum identifier as used in spectrum file (e.g., mzML or MGF), usually in HUPO-PSI nativeID format (MS:1000767), e.g., controllerType=0 controllerNumber=0 scan=423.

  • run (str, optional) – Name of the MS run. Usually the spectrum file filename without extension.

  • collection (str, optional) – Identifier of the collection of spectrum files. Usually, the ProteomeXchange identifier, e.g. PXD028735.

  • spectrum (any, optional) – Observed spectrum. Can be freely used, for instance as a spectrum_utils.spectrum.MsmsSpectrum object.

  • is_decoy (bool, optional) – Boolean specifying if the PSM is a decoy (True) or target hit (False).

  • score (float, optional) – Search engine score.

  • qvalue (float, optional) – PSM-level q-value.

  • pep (float, optional) – PSM-level posterior error probability.

  • precursor_mz (float, optional) – Precursor m/z.

  • retention_time (float, optional) – Precursor retention time.

  • ion_mobility (float, optional) – Precursor ion mobility.

  • protein_list (list[str], optional) – List of proteins or protein groups associated with the peptide.

  • rank (int, optional) – Rank of the PSM.

  • source (str, optional) – PSM file type where PSM was stored or search engine that generated it. E.g., mzid, or X!Tandem.

  • provenance_data (dict[str, str], optional) – Freeform dict to hold data describing the PSM origin, e.g. a search engine-specific identifier.

  • metadata (dict[str, str], optional) – Freeform dict with additional metadata about the PSM.

  • rescoring_features (dict[str, float], optional) – Dict with features that can be used for PSM rescoring.

property precursor_mz_error: float

Difference between observed and theoretical m/z in Da.

get_precursor_charge() → int

Precursor charge, as embedded in PSM.peptidoform.

get_usi(as_url=False) → str

Compile Universal Spectrum Identifier for PSM.


as_url (bool, optional) – Return URL to USI aggregator.
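For orientation, a Universal Spectrum Identifier is a colon-separated string of the form mzspec:collection:run:index type:index:interpretation. The sketch below assembles one from the PSM attributes described above. It is an assumption-laden illustration rather than the psm_utils implementation: the function name build_usi, the run name "run01", and the choice of "scan" as index type are all hypothetical.

```python
def build_usi(collection, run, spectrum_index, proforma, index_type="scan"):
    """Assemble a USI string from its parts (sketch only)."""
    return f"mzspec:{collection}:{run}:{index_type}:{spectrum_index}:{proforma}"


usi = build_usi("PXD028735", "run01", 423, "ACDEK/2")
print(usi)  # mzspec:PXD028735:run01:scan:423:ACDEK/2
```

The interpretation part is the peptidoform in ProForma notation, including the charge state.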



class psm_utils.PSMList(*, psm_list: List[PSM])

Data class representing a list of PSMs, with some useful functionality.


psm_list (list[PSM]) – List of PSM instances.


Initiate a PSMList from a list of PSM objects:

>>> psm_list = PSMList(psm_list=[
...     PSM(peptidoform="ACDK", spectrum_id=1, score=140.2, retention_time=600.2),
...     PSM(peptidoform="CDEFR", spectrum_id=2, score=132.9, retention_time=1225.4),
...     PSM(peptidoform="DEM[Oxidation]K", spectrum_id=3, score=55.7, retention_time=3389.1),
... ])

PSMList directly supports iteration:

>>> for psm in psm_list:
...     print(psm.score)

PSM properties can be accessed as a single NumPy array:

>>> psm_list["score"]
array([140.2, 132.9, 55.7], dtype=object)

PSMList supports indexing and slicing:

>>> psm_list_subset = psm_list[0:2]
>>> psm_list_subset["score"]
array([140.2, 132.9], dtype=object)
>>> psm_list_subset = psm_list[0, 2]
>>> psm_list_subset["score"]
array([140.2, 55.7], dtype=object)
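The three access patterns shown so far (by field name, by slice, and by several positions) can all be served by one __getitem__ method that dispatches on the type of its argument. The sketch below shows the idea with a hypothetical MiniPSMList class that holds plain dicts and returns lists instead of NumPy arrays. It is not how psm_utils is implemented.

```python
class MiniPSMList:
    """Toy container illustrating type-based indexing (not psm_utils code)."""

    def __init__(self, records):
        self.records = list(records)

    def __iter__(self):
        return iter(self.records)

    def __len__(self):
        return len(self.records)

    def __getitem__(self, item):
        if isinstance(item, str):
            # Field name: collect that field from every record.
            return [record[item] for record in self.records]
        if isinstance(item, slice):
            # Slice: return a new container with the selected range.
            return MiniPSMList(self.records[item])
        if isinstance(item, (tuple, list)):
            # Several positions, e.g. psms[0, 2] passes the tuple (0, 2).
            return MiniPSMList(self.records[i] for i in item)
        # Single integer position: return the record itself.
        return self.records[item]


psms = MiniPSMList([
    {"peptidoform": "ACDK", "score": 140.2},
    {"peptidoform": "CDEFR", "score": 132.9},
    {"peptidoform": "DEM[Oxidation]K", "score": 55.7},
])
print(psms["score"])        # [140.2, 132.9, 55.7]
print(psms[0:2]["score"])   # [140.2, 132.9]
print(psms[0, 2]["score"])  # [140.2, 55.7]
```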

For more advanced and efficient vectorized access, converting the PSMList to a Pandas DataFrame is highly recommended:

>>> psm_df = psm_list.to_dataframe()
>>> psm_df[(psm_df["retention_time"] < 2000) & (psm_df["score"] > 10)]
  peptidoform  spectrum_id   run collection spectrum is_decoy  score qvalue   pep precursor_mz  retention_time protein_list  rank source provenance_data metadata rescoring_features
0        ACDK            1  None       None     None     None  140.2   None  None         None           600.2         None  None   None            None     None               None
1       CDEFR            2  None       None     None     None  132.9   None  None         None          1225.4         None  None   None            None     None               None

property collections: list

List of collections in PSMList.

property runs: list

List of runs in PSMList.

append(psm: PSM) → None

Append PSM to PSMList.

extend(psm_list: PSMList) → None

Extend PSMList with another PSMList.


Get nested dictionary of PSMs by collection, run, and spectrum_id.
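A nested lookup by collection, run, and spectrum_id can be sketched with nested defaultdicts. In this hypothetical illustration the PSMs are plain dicts and the innermost value is a list of all PSMs for that spectrum. Both choices are assumptions, not the documented return type.

```python
from collections import defaultdict


def nest_psms(psms):
    """Group PSM records as nested[collection][run][spectrum_id] (sketch only)."""
    nested = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for psm in psms:
        nested[psm["collection"]][psm["run"]][psm["spectrum_id"]].append(psm)
    return nested


records = [
    {"collection": "PXD028735", "run": "run01", "spectrum_id": 1, "peptidoform": "ACDK"},
    {"collection": "PXD028735", "run": "run01", "spectrum_id": 1, "peptidoform": "CDEFR"},
    {"collection": "PXD028735", "run": "run02", "spectrum_id": 7, "peptidoform": "ACDEK"},
]
nested = nest_psms(records)
print(len(nested["PXD028735"]["run01"][1]))  # 2
```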

set_ranks(lower_score_better: bool = False)

Set identification ranks for all PSMs in PSMList.

get_rank1_psms(*args, **kwargs) → PSMList

Return new PSMList with only first-rank PSMs.

First runs set_ranks() with *args and **kwargs if any PSM has no rank yet.
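The interplay between ranking and first-rank filtering can be sketched as follows. The sketch assumes that ranks are assigned per spectrum, grouped by collection, run, and spectrum_id, and that ties receive consecutive ranks. Both are assumptions made for illustration. The functions assign_ranks and first_rank_only are hypothetical and operate on plain dicts.

```python
from collections import defaultdict


def assign_ranks(psms, lower_score_better=False):
    """Rank PSMs within each spectrum by score, best first (sketch only)."""
    by_spectrum = defaultdict(list)
    for psm in psms:
        key = (psm["collection"], psm["run"], psm["spectrum_id"])
        by_spectrum[key].append(psm)
    for group in by_spectrum.values():
        # Sort descending unless a lower score indicates a better match.
        group.sort(key=lambda p: p["score"], reverse=not lower_score_better)
        for rank, psm in enumerate(group, start=1):
            psm["rank"] = rank


def first_rank_only(psms, **kwargs):
    """Return rank-1 PSMs, ranking first if any PSM lacks a rank."""
    if any(psm.get("rank") is None for psm in psms):
        assign_ranks(psms, **kwargs)
    return [psm for psm in psms if psm["rank"] == 1]


records = [
    {"collection": None, "run": "run01", "spectrum_id": 1, "score": 55.7},
    {"collection": None, "run": "run01", "spectrum_id": 1, "score": 140.2},
    {"collection": None, "run": "run01", "spectrum_id": 2, "score": 132.9},
]
best = first_rank_only(records)
print([psm["score"] for psm in best])  # [140.2, 132.9]
```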

find_decoys(decoy_pattern: str) → None

Use regular expression pattern to find decoy PSMs by protein name(s).

This method applies a regular expression pattern to the PSM protein_list items to set the is_decoy attribute. Decoy protein entries are commonly marked with a prefix or suffix, e.g. DECOY_ or _REVERSED. If decoy_pattern matches a substring of every entry in protein_list, the PSM is interpreted as a decoy. Existing is_decoy entries are overwritten.


decoy_pattern (str) – Regular expression pattern to match decoy protein entries.


>>> psm_list.find_decoys(r"^DECOY_")
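The "every entry must match" rule means that a PSM mapping to both a target and a decoy protein counts as a target. A minimal sketch of that decision for a single protein list is shown below. The function name looks_like_decoy is hypothetical, and treating an empty protein list as a target is a choice made for this sketch only.

```python
import re


def looks_like_decoy(protein_list, decoy_pattern):
    """True only if the pattern is found in every protein entry (sketch only)."""
    pattern = re.compile(decoy_pattern)
    # bool(protein_list) guards against all() returning True for an empty list.
    return bool(protein_list) and all(pattern.search(p) for p in protein_list)


print(looks_like_decoy(["DECOY_P12345", "DECOY_Q67890"], r"^DECOY_"))  # True
print(looks_like_decoy(["DECOY_P12345", "P12345"], r"^DECOY_"))        # False
```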
calculate_qvalues(reverse: bool = True, **kwargs) → None

Calculate q-values using the target-decoy approach.

Q-values are calculated for all PSMs from the target and decoy scores. This requires that all PSMs have a score and a target/decoy state (is_decoy) assigned. Any existing q-values will be overwritten.
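As background, one common form of the target-decoy approach estimates the false discovery rate at each score threshold as the number of decoys divided by the number of targets accepted so far, then takes the q-value as the lowest such estimate at that threshold or any more permissive one. The sketch below implements this form with the standard library. It is not necessarily the estimator used by psm_utils, it assumes higher scores are better by default, and it does not treat tied scores specially.

```python
def target_decoy_qvalues(scores, is_decoy, higher_is_better=True):
    """Q-values from target and decoy scores (illustrative sketch only)."""
    # Positions of the PSMs ordered from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=higher_is_better)

    # Running FDR estimate: decoys / targets among all PSMs accepted so far.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Walk from worst to best and keep the minimum FDR seen, so that
    # q-values never increase as the score improves.
    qvalues = [0.0] * len(scores)
    running_min = float("inf")
    for position in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[position])
        qvalues[order[position]] = running_min
    return qvalues


qvalues = target_decoy_qvalues(
    scores=[10, 9, 8, 7, 6],
    is_decoy=[False, False, True, False, True],
)
print([round(q, 3) for q in qvalues])  # [0.0, 0.0, 0.333, 0.333, 0.667]
```

The third PSM is a decoy with a raw estimate of 0.5, but its q-value drops to 0.333 because accepting one more target below it gives a lower estimate.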

rename_modifications(mapping: dict[str, str]) → None

Apply mapping to rename modification tags for all PSMs.

Applies psm_utils.peptidoform.Peptidoform.rename_modifications() on all PSM peptidoforms in the PSMList.


mapping (dict[str, str]) – Mapping of old label → new label for each modification that requires renaming. Modification labels that are not in the mapping will not be renamed.

add_fixed_modifications(modification_rules: list[tuple[str, list[str]]] | dict[str, list[str]])

Add fixed modifications to all PSM peptidoforms in PSMList.

Add modification rules for fixed modifications to peptidoform. These will be added in the “fixed modifications” notation, at the front of the ProForma sequence.


>>> psm_list.add_fixed_modifications([("Carbamidomethyl", ["C"])])
>>> psm_list.add_fixed_modifications({"Carbamidomethyl": ["C"]})

apply_fixed_modifications()

Apply ProForma fixed modifications as sequential modifications.

Applies psm_utils.peptidoform.Peptidoform.apply_fixed_modifications() on all PSM peptidoforms in the PSMList.


>>> psm_list.apply_fixed_modifications()

to_dataframe() → DataFrame

Convert PSMList to pandas.DataFrame.



class psm_utils.peptidoform.Peptidoform(proforma_sequence: str | ProForma)

Peptide sequence, modifications and charge state represented in ProForma notation.



Format number as string for ProForma mass modifications.

exception psm_utils.peptidoform.PeptidoformException

Error while handling Peptidoform.

exception psm_utils.peptidoform.AmbiguousResidueException

Error while handling ambiguous residue.

exception psm_utils.peptidoform.ModificationException

Error while handling amino acid modification.


class psm_utils.psm.PSM(*, peptidoform: Peptidoform | str, spectrum_id: int | str, run: str | None = None, collection: str | None = None, spectrum: Any | None = None, is_decoy: bool | None = None, score: float | None = None, qvalue: float | None = None, pep: float | None = None, precursor_mz: float | None = None, retention_time: float | None = None, ion_mobility: float | None = None, protein_list: List[str] | None = None, rank: int | None = None, source: str | None = None, provenance_data: Dict[str, str] | None = {}, metadata: Dict[str, str] | None = {}, rescoring_features: Dict[str, float] | None = {})

Data class representing a peptide-spectrum match (PSM).

Links a Peptidoform to an observed spectrum and holds the related information. Attribute types are coerced and enforced upon initialization.



class psm_utils.psm_list.PSMList(*, psm_list: List[PSM])

Data class representing a list of PSMs, with some useful functionality.




Various utility functions.

psm_utils.utils.mass_to_mz(mass: float, charge: int, adduct_mass: float | None = None) → float

Convert mass to m/z.

  • mass – Mass of the uncharged ion without adducts.

  • charge – Charge of the ion.

  • adduct_mass – Mass of the charge-carrying adduct. Defaults to the mass of a proton.

psm_utils.utils.mz_to_mass(mz: float, charge: int, adduct_mass: float | None = None) → float

Convert m/z to mass.

  • mz – m/z of the charged ion and adducts.

  • charge – Charge of the ion.

  • adduct_mass – Mass of the charge-carrying adduct. Defaults to the mass of a proton.
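The two conversions are inverses of each other: m/z = (mass + charge × adduct_mass) / charge, and mass = m/z × charge − charge × adduct_mass. The sketch below writes these formulas out with the standard library, under the assumption that the default adduct is a proton of about 1.007276 Da. The function bodies are an illustration of the arithmetic, not a copy of the psm_utils source.

```python
PROTON_MASS = 1.007276466621  # Da; assumed default adduct mass


def mass_to_mz(mass, charge, adduct_mass=None):
    """m/z of an ion carrying `charge` adducts on top of the neutral mass."""
    if adduct_mass is None:
        adduct_mass = PROTON_MASS
    return (mass + charge * adduct_mass) / charge


def mz_to_mass(mz, charge, adduct_mass=None):
    """Neutral mass recovered from m/z by removing the adducts."""
    if adduct_mass is None:
        adduct_mass = PROTON_MASS
    return mz * charge - charge * adduct_mass


mz = mass_to_mz(564.2213, 2)
print(round(mz, 4))                 # 283.1179
print(round(mz_to_mass(mz, 2), 4))  # 564.2213
```

The example mass is approximately that of the unmodified peptide ACDEK, so the result corresponds to the doubly protonated precursor.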