psm_utils
Common utilities for parsing and handling PSMs, and search engine results.
- class psm_utils.Peptidoform(proforma_sequence: str | ProForma)
Peptide sequence, modifications and charge state represented in ProForma notation.
- Parameters:
proforma_sequence – Peptidoform sequence in ProForma v2 notation as
str or pyteomics.proforma.ProForma object.
Examples
>>> peptidoform = Peptidoform("ACDM[Oxidation]EK")
>>> peptidoform.theoretical_mass
711.2567622919099
- property proforma: str
Peptidoform sequence in ProForma v2 notation.
Examples
>>> Peptidoform("AC[U:4]DEK/2").proforma
'AC[UNIMOD:4]DEK/2'
- property sequence: str
Stripped peptide sequence (modifications removed).
Examples
>>> Peptidoform("AC[U:4]DEK/2").sequence
'ACDEK'
- property precursor_charge: int | None
Returns the charge state as integer or None if no charge assigned.
Examples
>>> Peptidoform("ACDEK/2").precursor_charge
2
>>> Peptidoform("ACDEK").precursor_charge
None
- property is_modified: bool
Whether or not the peptidoform carries any modification of any type.
Includes N- and C-terminal, fixed, sequential, labile, and unlocalized modifications.
- property sequential_composition: list[Composition]
Atomic compositions of both termini and each residue, including modifications.
Includes N- and C-terminal, fixed, and sequential modifications. Does not include labile or unlocalized modifications.
Examples
>>> Peptidoform("ACDEK/2").sequential_composition
[Composition({'H': 1}),
 Composition({'H': 5, 'C': 3, 'O': 1, 'N': 1}),
 Composition({'H': 5, 'C': 3, 'S': 1, 'O': 1, 'N': 1}),
 Composition({'H': 5, 'C': 4, 'O': 3, 'N': 1}),
 Composition({'H': 7, 'C': 5, 'O': 3, 'N': 1}),
 Composition({'H': 12, 'C': 6, 'N': 2, 'O': 1}),
 Composition({'H': 1, 'O': 1})]
- property composition: Composition
Atomic composition of the full peptidoform.
Includes all modifications, also labile and unlocalized.
Examples
>>> Peptidoform("ACDEK/2").composition
Composition({'H': 36, 'C': 21, 'O': 10, 'N': 6, 'S': 1})
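For a peptidoform without labile or unlocalized modifications, the per-position compositions should add up to the full composition. The following standard-library sketch checks this for the ACDEK example values shown above. `collections.Counter` is used here only as a stand-in for `Composition`, and the variable names are illustrative assumptions, not part of psm_utils.

```python
from collections import Counter

# Per-position compositions copied from the sequential_composition example:
# N-terminus, A, C, D, E, K, C-terminus. Counter stands in for Composition.
sequential = [
    Counter({"H": 1}),
    Counter({"H": 5, "C": 3, "O": 1, "N": 1}),
    Counter({"H": 5, "C": 3, "S": 1, "O": 1, "N": 1}),
    Counter({"H": 5, "C": 4, "O": 3, "N": 1}),
    Counter({"H": 7, "C": 5, "O": 3, "N": 1}),
    Counter({"H": 12, "C": 6, "N": 2, "O": 1}),
    Counter({"H": 1, "O": 1}),
]

# Element-wise sum over all positions.
total = sum(sequential, Counter())
print(dict(total))
```

The summed counts can be compared with the value listed for the `composition` property.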
- property sequential_theoretical_mass: float
Monoisotopic masses of both termini and each (modified) residue, as a list with one value per position.
Includes N- and C-terminal, fixed, and sequential modifications. Does not include labile or unlocalized modifications.
Examples
>>> Peptidoform("ACDEK/2").sequential_theoretical_mass
[1.00782503207, 71.03711378471, 103.00918478471, 115.02694302383001,
 129.04259308796998, 128.09496301399997, 17.002739651629998]
- property theoretical_mass: float
Monoisotopic mass of the full uncharged peptidoform.
Includes all modifications, also labile and unlocalized.
Examples
>>> Peptidoform("ACDEK/2").theoretical_mass
564.22136237892
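The per-position masses and the total mass in the examples above are related by a plain sum, as long as no labile or unlocalized modifications are present. A small standard-library check using the ACDEK values (variable names are illustrative only):

```python
import math

# Values copied from the sequential_theoretical_mass example:
# N-terminus, A, C, D, E, K, C-terminus.
sequential_masses = [
    1.00782503207,
    71.03711378471,
    103.00918478471,
    115.02694302383001,
    129.04259308796998,
    128.09496301399997,
    17.002739651629998,
]

# fsum limits floating-point rounding error when adding many terms.
total_mass = math.fsum(sequential_masses)
print(total_mass)
```

The result can be compared against the `theoretical_mass` example; a small tolerance is advisable because of floating-point rounding.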
- property theoretical_mz: float | None
Monoisotopic mass-to-charge ratio of the full peptidoform.
Includes all modifications, also labile and unlocalized.
Examples
>>> Peptidoform("ACDEK/2").theoretical_mz
283.11850622153
>>> Peptidoform("AC[+57.021464]DEK/2").theoretical_mz
311.62923822153
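The two m/z examples above can be reproduced from the neutral mass by adding one hydrogen mass per charge and dividing by the charge. This relation is inferred from the numbers in this reference, not taken from the library source. The helper name `mz_from_mass` is hypothetical, and returning `None` for a missing charge is an assumption based on the `float | None` return type.

```python
# Hydrogen mass as it appears in the sequential_theoretical_mass example
# (the N-terminus entry).
HYDROGEN_MASS = 1.00782503207


def mz_from_mass(neutral_mass, charge):
    """Hypothetical helper: m/z from neutral mass and charge state."""
    if charge is None:
        # Assumption: without a charge state there is no m/z to report.
        return None
    return (neutral_mass + charge * HYDROGEN_MASS) / charge


# ACDEK/2, using the theoretical_mass example value.
print(mz_from_mass(564.22136237892, 2))
# AC[+57.021464]DEK/2: the mass shift is added to the neutral mass first.
print(mz_from_mass(564.22136237892 + 57.021464, 2))
```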
- rename_modifications(mapping: dict[str, str]) → None
Apply mapping to rename modification tags.
- Parameters:
mapping (dict[str, str]) – Mapping of old label → new label for each modification that requires renaming. Modification labels that are not in the mapping will not be renamed.
Examples
>>> peptidoform = Peptidoform('[ac]-PEPTC[cmm]IDEK')
>>> peptidoform.rename_modifications({
...     "ac": "Acetyl",
...     "cmm": "Carbamidomethyl"
... })
>>> peptidoform.proforma
'[Acetyl]-PEPTC[Carbamidomethyl]IDEK'
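To illustrate the mapping semantics (labels in the mapping are replaced, all others are left alone), here is a string-level sketch built on a regular expression. It is not how psm_utils implements the method, which works on the parsed ProForma object. The helper `rename_tags` is hypothetical and only handles simple bracketed labels.

```python
import re


def rename_tags(proforma, mapping):
    """Hypothetical helper: replace bracketed labels that occur in mapping."""

    def swap(match):
        label = match.group(1)
        # Labels missing from the mapping are kept unchanged.
        return "[" + mapping.get(label, label) + "]"

    return re.sub(r"\[([^\[\]]+)\]", swap, proforma)


renamed = rename_tags(
    "[ac]-PEPTC[cmm]IDEK",
    {"ac": "Acetyl", "cmm": "Carbamidomethyl"},
)
print(renamed)
```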
- add_fixed_modifications(modification_rules: list[tuple[str, list[str]]] | dict[str, list[str]])
Add fixed modifications to peptidoform.
Add modification rules for fixed modifications to peptidoform. These will be added in the “fixed modifications” notation, at the front of the ProForma sequence.
Examples
>>> peptidoform = Peptidoform("ATPEILTCNSIGCLK")
>>> peptidoform.add_fixed_modifications([("Carbamidomethyl", ["C"])])
>>> peptidoform.proforma
'<[Carbamidomethyl]@C>ATPEILTCNSIGCLK'
Notes
While globally defined terminal modifications are not explicitly supported in ProForma v2, this function supports adding terminal modifications using the N-term and C-term targets in place of an amino acid target. These global modifications are supported in the psm_utils.peptidoform.Peptidoform.apply_fixed_modifications() method through a workaround. See https://github.com/HUPO-PSI/ProForma/issues/6 for discussions on the issue.
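The signature accepts the rules either as a list of `(label, targets)` tuples or as a dict. The sketch below shows how both forms could translate into the ProForma "fixed modifications" prefix seen in the example. `fixed_mod_prefix` is a hypothetical string-level helper, not the library's implementation, and it does not treat the N-term and C-term targets specially.

```python
def fixed_mod_prefix(modification_rules):
    """Hypothetical helper: build a ProForma fixed-modification prefix."""
    if isinstance(modification_rules, dict):
        # Normalize the dict form to the list-of-tuples form.
        modification_rules = list(modification_rules.items())
    return "".join(
        f"<[{label}]@{','.join(targets)}>" for label, targets in modification_rules
    )


from_list = fixed_mod_prefix([("Carbamidomethyl", ["C"])]) + "ATPEILTCNSIGCLK"
from_dict = fixed_mod_prefix({"Carbamidomethyl": ["C"]}) + "ATPEILTCNSIGCLK"
print(from_list)
print(from_dict)
```

Both input forms give the same string here, which is why either can be passed to the method.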
- apply_fixed_modifications()
Apply ProForma fixed modifications as sequential modifications.
Applies all modifications that are encoded as fixed in the ProForma notation (once at the beginning of the sequence) as modifications throughout the sequence at each affected amino acid residue.
Examples
>>> peptidoform = Peptidoform('<[Carbamidomethyl]@C>ATPEILTCNSIGCLK')
>>> peptidoform.apply_fixed_modifications()
>>> peptidoform.proforma
'ATPEILTC[Carbamidomethyl]NSIGC[Carbamidomethyl]LK'
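The transformation described above can be pictured at the string level: read the fixed rules from the front of the sequence, drop them, and write the label after every targeted residue. This is a simplified sketch under strong assumptions. The helper `apply_fixed` is hypothetical, and it expects a sequence made up of plain residue letters with single-letter targets only (no existing modifications, no charge, no N-term or C-term targets).

```python
import re

# Matches one fixed-modification rule such as <[Carbamidomethyl]@C> or <[Label]@C,M>.
FIXED_RULE = re.compile(r"<\[([^\]]+)\]@([A-Z](?:,[A-Z])*)>")


def apply_fixed(proforma):
    """Hypothetical helper: expand fixed rules into per-residue modifications."""
    rules = FIXED_RULE.findall(proforma)
    sequence = FIXED_RULE.sub("", proforma)

    # Map each targeted residue letter to the labels that apply to it.
    labels_by_residue = {}
    for label, targets in rules:
        for residue in targets.split(","):
            labels_by_residue.setdefault(residue, []).append(label)

    parts = []
    for residue in sequence:
        parts.append(residue)
        for label in labels_by_residue.get(residue, []):
            parts.append(f"[{label}]")
    return "".join(parts)


expanded = apply_fixed("<[Carbamidomethyl]@C>ATPEILTCNSIGCLK")
print(expanded)
```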
- class psm_utils.PSM(*, peptidoform: Peptidoform | str, spectrum_id: str, run: str | None = None, collection: str | None = None, spectrum: Any | None = None, is_decoy: bool | None = None, score: float | None = None, qvalue: float | None = None, pep: float | None = None, precursor_mz: float | None = None, retention_time: float | None = None, ion_mobility: float | None = None, protein_list: List[str] | None = None, rank: int | None = None, source: str | None = None, provenance_data: Dict[str, str] | None = {}, metadata: Dict[str, str] | None = {}, rescoring_features: Dict[str, float] | None = {})
Data class representing a peptide-spectrum match (PSM).
Links a Peptidoform to an observed spectrum and holds the related information. Attribute types are coerced and enforced upon initialization.
- Parameters:
peptidoform (Peptidoform, str) – Peptidoform object or string in ProForma v2 notation.
spectrum_id (str, int) – Spectrum identifier as used in spectrum file (e.g., mzML or MGF), usually in HUPO-PSI nativeID format (MS:1000767), e.g., controllerType=0 controllerNumber=0 scan=423.
run (str, optional) – Name of the MS run. Usually the spectrum file filename without extension.
collection (str, optional) – Identifier of the collection of spectrum files. Usually, the ProteomeXchange identifier, e.g. PXD028735.
spectrum (any, optional) – Observed spectrum. Can be freely used, for instance as a spectrum_utils.spectrum.MsmsSpectrum object.
is_decoy (bool, optional) – Boolean specifying if the PSM is a decoy (True) or target hit (False).
score (float, optional) – Search engine score.
qvalue (float, optional) – PSM-level q-value.
pep (float, optional) – PSM-level posterior error probability.
precursor_mz (float, optional) – Precursor m/z.
retention_time (float, optional) – Precursor retention time.
ion_mobility (float, optional) – Precursor ion mobility.
protein_list (list[str], optional) – List of proteins or protein groups associated with the peptide.
rank (int, optional) – Rank of the PSM.
source (str, optional) – PSM file type where PSM was stored or search engine that generated it. E.g., mzid, or X!Tandem.
provenance_data (dict[str, str], optional) – Freeform dict to hold data describing the PSM origin, e.g. a search engine-specific identifier.
rescoring_features (dict[str, float], optional) – Dict with features that can be used for PSM rescoring.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'coerce_numbers_to_str': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- get_usi(as_url=False) → str
Compile Universal Spectrum Identifier for PSM.
- Parameters:
as_url (bool, optional) – Return URL to proteomeXchange.org USI aggregator.
Notes
The resulting USI will only be valid if the collection, run, and spectrum_id are defined.
There is no guarantee that the resulting USI is resolvable at ProteomeXchange. This requires that the spectrum has been fully indexed in a ProteomeXchange partner repository. For instance, the following USI should be resolvable: mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2
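The example USI above has a colon-separated layout: the `mzspec` prefix, the collection, the run, an index type (`scan`), the index itself, and the peptidoform with its charge. The sketch below assembles such a string from its parts. `build_usi` is a hypothetical helper. Using `scan` as a fixed index type and raising `ValueError` for missing parts are assumptions of this sketch, not documented behavior of `get_usi()`.

```python
def build_usi(collection, run, spectrum_id, proforma):
    """Hypothetical helper: assemble a USI string from its components."""
    if collection is None or run is None or spectrum_id is None:
        # Sketch-only choice: refuse to build a USI that cannot be valid.
        raise ValueError("collection, run, and spectrum_id must all be defined")
    return ":".join(["mzspec", collection, run, "scan", str(spectrum_id), proforma])


usi = build_usi(
    collection="PXD000561",
    run="Adult_Frontalcortex_bRP_Elite_85_f09",
    spectrum_id=17555,
    proforma="VLHPLEGAVVIIFK/2",
)
print(usi)
```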
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'ion_mobility': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'is_decoy': FieldInfo(annotation=Union[bool, NoneType], required=False, default=None), 'metadata': FieldInfo(annotation=Union[Dict[str, str], NoneType], required=False, default={}), 'pep': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'peptidoform': FieldInfo(annotation=Union[Peptidoform, str], required=True), 'precursor_mz': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'protein_list': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None), 'provenance_data': FieldInfo(annotation=Union[Dict[str, str], NoneType], required=False, default={}), 'qvalue': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'rank': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'rescoring_features': FieldInfo(annotation=Union[Dict[str, float], NoneType], required=False, default={}), 'retention_time': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'run': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'score': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'source': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'spectrum': FieldInfo(annotation=Union[Any, NoneType], required=False, default=None), 'spectrum_id': FieldInfo(annotation=str, required=True)}
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
- class psm_utils.PSMList(*, psm_list: List[PSM])
Data class representing a list of PSMs, with some useful functionality.
Examples
Initiate a PSMList from a list of PSM objects:
>>> psm_list = PSMList(psm_list=[
...     PSM(peptidoform="ACDK", spectrum_id=1, score=140.2, retention_time=600.2),
...     PSM(peptidoform="CDEFR", spectrum_id=2, score=132.9, retention_time=1225.4),
...     PSM(peptidoform="DEM[Oxidation]K", spectrum_id=3, score=55.7, retention_time=3389.1),
... ])
PSMList directly supports iteration:
>>> for psm in psm_list:
...     print(psm.score)
140.2
132.9
55.7
PSM properties can be accessed as a single NumPy array:
>>> psm_list["score"]
array([140.2, 132.9, 55.7], dtype=object)
PSMList supports indexing and slicing:
>>> psm_list_subset = psm_list[0:2]
>>> psm_list_subset["score"]
array([140.2, 132.9], dtype=object)
>>> psm_list_subset = psm_list[0, 2]
>>> psm_list_subset["score"]
array([140.2, 55.7], dtype=object)
For more advanced and efficient vectorized access, converting the PSMList to a pandas DataFrame is highly recommended:
>>> psm_df = psm_list.to_dataframe()
>>> psm_df[(psm_df["retention_time"] < 2000) & (psm_df["score"] > 10)]
  peptidoform spectrum_id   run collection spectrum is_decoy  score qvalue   pep precursor_mz retention_time protein_list  rank source provenance_data metadata rescoring_features
0        ACDK           1  None       None     None     None  140.2   None  None         None          600.2         None  None   None            None     None               None
1       CDEFR           2  None       None     None     None  132.9   None  None         None         1225.4         None  None   None            None     None               None
- get_psm_dict()
Get nested dictionary of PSMs by collection, run, and spectrum_id.
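A nested lookup of this kind can be pictured as three levels of dictionaries keyed by collection, run, and spectrum_id. The standard-library sketch below uses plain dicts as stand-ins for PSM objects. The helper name `nest_psms` and the choice of a list at the innermost level (to hold several candidate PSMs per spectrum) are assumptions of this sketch.

```python
from collections import defaultdict


def nest_psms(psms):
    """Hypothetical helper: group PSM-like records three levels deep."""
    nested = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for psm in psms:
        nested[psm["collection"]][psm["run"]][psm["spectrum_id"]].append(psm)
    return nested


# Plain dicts stand in for PSM objects; the values are made up for illustration.
records = [
    {"collection": "PXD1", "run": "run1", "spectrum_id": "1", "score": 140.2},
    {"collection": "PXD1", "run": "run1", "spectrum_id": "1", "score": 95.0},
    {"collection": "PXD1", "run": "run2", "spectrum_id": "7", "score": 55.7},
]
psm_dict = nest_psms(records)
print(len(psm_dict["PXD1"]["run1"]["1"]))
```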
- get_rank1_psms(*args, **kwargs) → PSMList
Return new PSMList with only first-rank PSMs.
First runs set_ranks() with *args and **kwargs if any PSM has no rank yet.
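The "rank if needed, then filter" logic can be sketched with plain dicts in place of PSM objects. Ranking candidates per spectrum by descending score is an assumption about what `set_ranks()` does, and both helper names are hypothetical.

```python
from collections import defaultdict


def assign_ranks(psms, higher_is_better=True):
    """Hypothetical helper: rank PSM-like records within each spectrum by score."""
    groups = defaultdict(list)
    for psm in psms:
        groups[(psm["collection"], psm["run"], psm["spectrum_id"])].append(psm)
    for group in groups.values():
        ordered = sorted(group, key=lambda p: p["score"], reverse=higher_is_better)
        for rank, psm in enumerate(ordered, start=1):
            psm["rank"] = rank  # note: modifies the records in place


def rank1_psms(psms):
    """Hypothetical helper: keep only rank-1 records, ranking first if needed."""
    if any(psm.get("rank") is None for psm in psms):
        assign_ranks(psms)
    return [psm for psm in psms if psm["rank"] == 1]


# Made-up records: two candidates for spectrum "1", one for spectrum "2".
candidates = [
    {"collection": "PXD1", "run": "run1", "spectrum_id": "1", "score": 95.0, "rank": None},
    {"collection": "PXD1", "run": "run1", "spectrum_id": "1", "score": 140.2, "rank": None},
    {"collection": "PXD1", "run": "run1", "spectrum_id": "2", "score": 55.7, "rank": None},
]
best = rank1_psms(candidates)
print([psm["score"] for psm in best])
```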
- find_decoys(decoy_pattern: str) → None
Use regular expression pattern to find decoy PSMs by protein name(s).
This method allows a regular expression pattern to be applied on PSM protein_list items to set the is_decoy attribute. Decoy protein entries are commonly marked with a prefix or suffix, e.g. DECOY_, or _REVERSED. If decoy_pattern matches a substring of every entry in protein_list, the PSM is interpreted as a decoy. Existing is_decoy entries are overwritten.
- Parameters:
decoy_pattern (str) – Regular expression pattern to match decoy protein entries.
Examples
>>> psm_list.find_decoys(r"^DECOY_")
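The rule described above (a PSM is a decoy only if the pattern matches every protein entry) can be expressed with the standard `re` module. The helper `is_decoy_match` is hypothetical and works on a plain list of protein names. How an empty `protein_list` is treated is not covered by this sketch.

```python
import re


def is_decoy_match(protein_list, decoy_pattern):
    """Hypothetical helper: True if the pattern is found in every protein entry."""
    pattern = re.compile(decoy_pattern)
    # re.search finds the pattern anywhere in the string (substring match).
    return all(pattern.search(protein) is not None for protein in protein_list)


print(is_decoy_match(["DECOY_P12345", "DECOY_Q67890"], r"^DECOY_"))
print(is_decoy_match(["DECOY_P12345", "Q67890"], r"^DECOY_"))
```

A PSM that maps to both a decoy and a target protein is therefore kept as a target under this rule.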
- calculate_qvalues(reverse: bool = True, **kwargs) → None
Calculate q-values using the target-decoy approach.
Q-values are calculated for all PSMs from the target and decoy scores. This requires that all PSMs have a score and a target/decoy state (is_decoy) assigned. Any existing q-values will be overwritten.
- Parameters:
reverse (boolean, optional) – If True (default), a higher score value indicates a better PSM.
**kwargs (dict, optional) – Additional arguments to be passed to pyteomics.auxiliary.target_decoy.qvalues.
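As background on the target-decoy approach, the sketch below computes q-values in a simplified form: PSMs are sorted from best to worst, the FDR at each position is estimated as decoys divided by targets seen so far, and the q-value is the lowest FDR at that position or any worse-scoring one. This is a generic illustration, not the pyteomics implementation, which offers further options that are not reproduced here. The function name is hypothetical.

```python
def target_decoy_qvalues(scores, is_decoy, reverse=True):
    """Hypothetical helper: simplified target-decoy q-values.

    reverse=True means a higher score indicates a better PSM.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=reverse)

    # FDR estimate at each position of the sorted list: decoys / targets so far.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: minimum FDR at this position or any worse-scoring position.
    qvalues = [0.0] * len(scores)
    running_min = float("inf")
    for position in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[position])
        qvalues[order[position]] = running_min
    return qvalues


qvalues = target_decoy_qvalues(
    scores=[10.0, 9.0, 8.0, 7.0],
    is_decoy=[False, False, True, False],
)
print(qvalues)
```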
- rename_modifications(mapping: dict[str, str]) → None
Apply mapping to rename modification tags for all PSMs.
Applies psm_utils.peptidoform.Peptidoform.rename_modifications() on all PSM peptidoforms in the PSMList.
- add_fixed_modifications(modification_rules: list[tuple[str, list[str]]] | dict[str, list[str]])
Add fixed modifications to all PSM peptidoforms in PSMList.
Add modification rules for fixed modifications to peptidoform. These will be added in the “fixed modifications” notation, at the front of the ProForma sequence.
Examples
>>> psm_list.add_fixed_modifications([("Carbamidomethyl", ["C"])])
>>> psm_list.add_fixed_modifications({"Carbamidomethyl": ["C"]})
- apply_fixed_modifications()
Apply ProForma fixed modifications as sequential modifications.
Applies psm_utils.peptidoform.Peptidoform.apply_fixed_modifications() on all PSM peptidoforms in the PSMList.
Examples
>>> psm_list.apply_fixed_modifications()
- to_dataframe() → DataFrame
Convert PSMList to pandas.DataFrame.
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
psm_utils.peptidoform
- class psm_utils.peptidoform.Peptidoform(proforma_sequence: str | ProForma)
Peptide sequence, modifications and charge state represented in ProForma notation.
- Parameters:
proforma_sequence – Peptidoform sequence in ProForma v2 notation as
str or pyteomics.proforma.ProForma object.
Examples
>>> peptidoform = Peptidoform("ACDM[Oxidation]EK")
>>> peptidoform.theoretical_mass
711.2567622919099
- property proforma: str
Peptidoform sequence in ProForma v2 notation.
Examples
>>> Peptidoform("AC[U:4]DEK/2").proforma
'AC[UNIMOD:4]DEK/2'
- property sequence: str
Stripped peptide sequence (modifications removed).
Examples
>>> Peptidoform("AC[U:4]DEK/2").sequence
'ACDEK'
- property precursor_charge: int | None
Returns the charge state as integer or None if no charge assigned.
Examples
>>> Peptidoform("ACDEK/2").precursor_charge
2
>>> Peptidoform("ACDEK").precursor_charge
None
- property is_modified: bool
Whether or not the peptidoform carries any modification of any type.
Includes N- and C-terminal, fixed, sequential, labile, and unlocalized modifications.
- property sequential_composition: list[Composition]
Atomic compositions of both termini and each residue, including modifications.
Includes N- and C-terminal, fixed, and sequential modifications. Does not include labile or unlocalized modifications.
Examples
>>> Peptidoform("ACDEK/2").sequential_composition
[Composition({'H': 1}),
 Composition({'H': 5, 'C': 3, 'O': 1, 'N': 1}),
 Composition({'H': 5, 'C': 3, 'S': 1, 'O': 1, 'N': 1}),
 Composition({'H': 5, 'C': 4, 'O': 3, 'N': 1}),
 Composition({'H': 7, 'C': 5, 'O': 3, 'N': 1}),
 Composition({'H': 12, 'C': 6, 'N': 2, 'O': 1}),
 Composition({'H': 1, 'O': 1})]
- property composition: Composition
Atomic composition of the full peptidoform.
Includes all modifications, also labile and unlocalized.
Examples
>>> Peptidoform("ACDEK/2").composition
Composition({'H': 36, 'C': 21, 'O': 10, 'N': 6, 'S': 1})
- property sequential_theoretical_mass: float
Monoisotopic masses of both termini and each (modified) residue, as a list with one value per position.
Includes N- and C-terminal, fixed, and sequential modifications. Does not include labile or unlocalized modifications.
Examples
>>> Peptidoform("ACDEK/2").sequential_theoretical_mass
[1.00782503207, 71.03711378471, 103.00918478471, 115.02694302383001,
 129.04259308796998, 128.09496301399997, 17.002739651629998]
- property theoretical_mass: float
Monoisotopic mass of the full uncharged peptidoform.
Includes all modifications, also labile and unlocalized.
Examples
>>> Peptidoform("ACDEK/2").theoretical_mass
564.22136237892
- property theoretical_mz: float | None
Monoisotopic mass-to-charge ratio of the full peptidoform.
Includes all modifications, also labile and unlocalized.
Examples
>>> Peptidoform("ACDEK/2").theoretical_mz
283.11850622153
>>> Peptidoform("AC[+57.021464]DEK/2").theoretical_mz
311.62923822153
- rename_modifications(mapping: dict[str, str]) → None
Apply mapping to rename modification tags.
- Parameters:
mapping (dict[str, str]) – Mapping of old label → new label for each modification that requires renaming. Modification labels that are not in the mapping will not be renamed.
Examples
>>> peptidoform = Peptidoform('[ac]-PEPTC[cmm]IDEK')
>>> peptidoform.rename_modifications({
...     "ac": "Acetyl",
...     "cmm": "Carbamidomethyl"
... })
>>> peptidoform.proforma
'[Acetyl]-PEPTC[Carbamidomethyl]IDEK'
- add_fixed_modifications(modification_rules: list[tuple[str, list[str]]] | dict[str, list[str]])
Add fixed modifications to peptidoform.
Add modification rules for fixed modifications to peptidoform. These will be added in the “fixed modifications” notation, at the front of the ProForma sequence.
Examples
>>> peptidoform = Peptidoform("ATPEILTCNSIGCLK")
>>> peptidoform.add_fixed_modifications([("Carbamidomethyl", ["C"])])
>>> peptidoform.proforma
'<[Carbamidomethyl]@C>ATPEILTCNSIGCLK'
Notes
While globally defined terminal modifications are not explicitly supported in ProForma v2, this function supports adding terminal modifications using the N-term and C-term targets in place of an amino acid target. These global modifications are supported in the psm_utils.peptidoform.Peptidoform.apply_fixed_modifications() method through a workaround. See https://github.com/HUPO-PSI/ProForma/issues/6 for discussions on the issue.
- apply_fixed_modifications()
Apply ProForma fixed modifications as sequential modifications.
Applies all modifications that are encoded as fixed in the ProForma notation (once at the beginning of the sequence) as modifications throughout the sequence at each affected amino acid residue.
Examples
>>> peptidoform = Peptidoform('<[Carbamidomethyl]@C>ATPEILTCNSIGCLK')
>>> peptidoform.apply_fixed_modifications()
>>> peptidoform.proforma
'ATPEILTC[Carbamidomethyl]NSIGC[Carbamidomethyl]LK'
- psm_utils.peptidoform.format_number_as_string(num)
Format number as string for ProForma mass modifications.
- exception psm_utils.peptidoform.PeptidoformException
Error while handling Peptidoform.
- exception psm_utils.peptidoform.AmbiguousResidueException
Error while handling ambiguous residue.
- exception psm_utils.peptidoform.ModificationException
Error while handling amino acid modification.
psm_utils.psm
- class psm_utils.psm.PSM(*, peptidoform: Peptidoform | str, spectrum_id: str, run: str | None = None, collection: str | None = None, spectrum: Any | None = None, is_decoy: bool | None = None, score: float | None = None, qvalue: float | None = None, pep: float | None = None, precursor_mz: float | None = None, retention_time: float | None = None, ion_mobility: float | None = None, protein_list: List[str] | None = None, rank: int | None = None, source: str | None = None, provenance_data: Dict[str, str] | None = {}, metadata: Dict[str, str] | None = {}, rescoring_features: Dict[str, float] | None = {})
Data class representing a peptide-spectrum match (PSM).
Links a Peptidoform to an observed spectrum and holds the related information. Attribute types are coerced and enforced upon initialization.
- Parameters:
peptidoform (Peptidoform, str) – Peptidoform object or string in ProForma v2 notation.
spectrum_id (str, int) – Spectrum identifier as used in spectrum file (e.g., mzML or MGF), usually in HUPO-PSI nativeID format (MS:1000767), e.g., controllerType=0 controllerNumber=0 scan=423.
run (str, optional) – Name of the MS run. Usually the spectrum file filename without extension.
collection (str, optional) – Identifier of the collection of spectrum files. Usually, the ProteomeXchange identifier, e.g. PXD028735.
spectrum (any, optional) – Observed spectrum. Can be freely used, for instance as a spectrum_utils.spectrum.MsmsSpectrum object.
is_decoy (bool, optional) – Boolean specifying if the PSM is a decoy (True) or target hit (False).
score (float, optional) – Search engine score.
qvalue (float, optional) – PSM-level q-value.
pep (float, optional) – PSM-level posterior error probability.
precursor_mz (float, optional) – Precursor m/z.
retention_time (float, optional) – Precursor retention time.
ion_mobility (float, optional) – Precursor ion mobility.
protein_list (list[str], optional) – List of proteins or protein groups associated with the peptide.
rank (int, optional) – Rank of the PSM.
source (str, optional) – PSM file type where PSM was stored or search engine that generated it. E.g., mzid, or X!Tandem.
provenance_data (dict[str, str], optional) – Freeform dict to hold data describing the PSM origin, e.g. a search engine-specific identifier.
rescoring_features (dict[str, float], optional) – Dict with features that can be used for PSM rescoring.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'coerce_numbers_to_str': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- get_usi(as_url=False) → str
Compile Universal Spectrum Identifier for PSM.
- Parameters:
as_url (bool, optional) – Return URL to proteomeXchange.org USI aggregator.
Notes
The resulting USI will only be valid if the collection, run, and spectrum_id are defined.
There is no guarantee that the resulting USI is resolvable at ProteomeXchange. This requires that the spectrum has been fully indexed in a ProteomeXchange partner repository. For instance, the following USI should be resolvable: mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'ion_mobility': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'is_decoy': FieldInfo(annotation=Union[bool, NoneType], required=False, default=None), 'metadata': FieldInfo(annotation=Union[Dict[str, str], NoneType], required=False, default={}), 'pep': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'peptidoform': FieldInfo(annotation=Union[Peptidoform, str], required=True), 'precursor_mz': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'protein_list': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None), 'provenance_data': FieldInfo(annotation=Union[Dict[str, str], NoneType], required=False, default={}), 'qvalue': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'rank': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'rescoring_features': FieldInfo(annotation=Union[Dict[str, float], NoneType], required=False, default={}), 'retention_time': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'run': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'score': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'source': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'spectrum': FieldInfo(annotation=Union[Any, NoneType], required=False, default=None), 'spectrum_id': FieldInfo(annotation=str, required=True)}
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
psm_utils.psm_list
- class psm_utils.psm_list.PSMList(*, psm_list: List[PSM])
Data class representing a list of PSMs, with some useful functionality.
Examples
Initiate a PSMList from a list of PSM objects:
>>> psm_list = PSMList(psm_list=[
...     PSM(peptidoform="ACDK", spectrum_id=1, score=140.2, retention_time=600.2),
...     PSM(peptidoform="CDEFR", spectrum_id=2, score=132.9, retention_time=1225.4),
...     PSM(peptidoform="DEM[Oxidation]K", spectrum_id=3, score=55.7, retention_time=3389.1),
... ])
PSMList directly supports iteration:
>>> for psm in psm_list:
...     print(psm.score)
140.2
132.9
55.7
PSM properties can be accessed as a single NumPy array:
>>> psm_list["score"]
array([140.2, 132.9, 55.7], dtype=object)
PSMList supports indexing and slicing:
>>> psm_list_subset = psm_list[0:2]
>>> psm_list_subset["score"]
array([140.2, 132.9], dtype=object)
>>> psm_list_subset = psm_list[0, 2]
>>> psm_list_subset["score"]
array([140.2, 55.7], dtype=object)
For more advanced and efficient vectorized access, converting the PSMList to a pandas DataFrame is highly recommended:
>>> psm_df = psm_list.to_dataframe()
>>> psm_df[(psm_df["retention_time"] < 2000) & (psm_df["score"] > 10)]
  peptidoform spectrum_id   run collection spectrum is_decoy  score qvalue   pep precursor_mz retention_time protein_list  rank source provenance_data metadata rescoring_features
0        ACDK           1  None       None     None     None  140.2   None  None         None          600.2         None  None   None            None     None               None
1       CDEFR           2  None       None     None     None  132.9   None  None         None         1225.4         None  None   None            None     None               None
- get_psm_dict()
Get nested dictionary of PSMs by collection, run, and spectrum_id.
- get_rank1_psms(*args, **kwargs) → PSMList
Return new PSMList with only first-rank PSMs.
First runs set_ranks() with *args and **kwargs if any PSM has no rank yet.
- find_decoys(decoy_pattern: str) → None
Use regular expression pattern to find decoy PSMs by protein name(s).
This method allows a regular expression pattern to be applied on PSM protein_list items to set the is_decoy attribute. Decoy protein entries are commonly marked with a prefix or suffix, e.g. DECOY_, or _REVERSED. If decoy_pattern matches a substring of every entry in protein_list, the PSM is interpreted as a decoy. Existing is_decoy entries are overwritten.
- Parameters:
decoy_pattern (str) – Regular expression pattern to match decoy protein entries.
Examples
>>> psm_list.find_decoys(r"^DECOY_")
- calculate_qvalues(reverse: bool = True, **kwargs) → None
Calculate q-values using the target-decoy approach.
Q-values are calculated for all PSMs from the target and decoy scores. This requires that all PSMs have a score and a target/decoy state (is_decoy) assigned. Any existing q-values will be overwritten.
- Parameters:
reverse (boolean, optional) – If True (default), a higher score value indicates a better PSM.
**kwargs (dict, optional) – Additional arguments to be passed to pyteomics.auxiliary.target_decoy.qvalues.
- rename_modifications(mapping: dict[str, str]) → None
Apply mapping to rename modification tags for all PSMs.
Applies psm_utils.peptidoform.Peptidoform.rename_modifications() on all PSM peptidoforms in the PSMList.
- add_fixed_modifications(modification_rules: list[tuple[str, list[str]]] | dict[str, list[str]])
Add fixed modifications to all PSM peptidoforms in PSMList.
Add modification rules for fixed modifications to peptidoform. These will be added in the “fixed modifications” notation, at the front of the ProForma sequence.
Examples
>>> psm_list.add_fixed_modifications([("Carbamidomethyl", ["C"])])
>>> psm_list.add_fixed_modifications({"Carbamidomethyl": ["C"]})
- apply_fixed_modifications()
Apply ProForma fixed modifications as sequential modifications.
Applies psm_utils.peptidoform.Peptidoform.apply_fixed_modifications() on all PSM peptidoforms in the PSMList.
Examples
>>> psm_list.apply_fixed_modifications()
- to_dataframe() → DataFrame
Convert PSMList to pandas.DataFrame.
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
psm_utils.utils
Various utility functions.