AFDB tools

This module contains utility functions for working with the AlphaFold Database (AFDB) and UniProt.

src.AFDB_tools.get_amino_acid_sequence(pdb_filename)

This function extracts the amino acid sequence from a PDB file.
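As a rough illustration of what extracting a sequence from a PDB file involves, the sketch below reads the fixed-width ATOM records and emits one letter per residue. The helper name `sequence_from_pdb_lines` is hypothetical and this is not necessarily how `get_amino_acid_sequence` is implemented; it assumes standard PDB column positions and maps any non-standard residue to `X`.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_pdb_lines(lines):
    """Hypothetical helper: one-letter sequence from PDB ATOM records."""
    seen = set()
    sequence = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        resname = line[17:20].strip()  # columns 18-20: residue name
        chain = line[21]               # column 22: chain identifier
        resseq = line[22:27]           # columns 23-27: residue number + insertion code
        key = (chain, resseq)
        if key in seen:
            continue                   # only the first atom of each residue counts
        seen.add(key)
        sequence.append(THREE_TO_ONE.get(resname, "X"))
    return "".join(sequence)
```

To use it on a file, pass an open file handle, for example `sequence_from_pdb_lines(open(pdb_filename))`.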

src.AFDB_tools.descr(pdb_path)

Extracts the pLDDT (stored in the B-factor column) of the first atom of each residue in a PDB file and returns a descriptive statistics object.

Parameters:

pdb_path (str) – The path to the PDB file.
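The following sketch shows one way the per-residue values could be collected and summarized. The names `first_atom_plddt` and `summarize` are hypothetical, and the summary is a plain dictionary built with the standard library; the actual function may return a different object (possibly something like the result of `scipy.stats.describe`, though that is not stated here).

```python
import statistics


def first_atom_plddt(lines):
    """Hypothetical helper: B-factor of the first ATOM record of each residue."""
    seen = set()
    values = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        key = (line[21], line[22:27])  # chain id, residue number + insertion code
        if key in seen:
            continue
        seen.add(key)
        values.append(float(line[60:66]))  # columns 61-66: temperature (B) factor
    return values


def summarize(values):
    """Hypothetical summary; assumes at least one value is present."""
    return {
        "nobs": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # Sample variance needs two or more observations.
        "variance": statistics.variance(values) if len(values) > 1 else 0.0,
    }
```

In AlphaFold Database files the B-factor column holds pLDDT on a 0 to 100 scale, so the summary values are on that scale as well.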

src.AFDB_tools.filter_plddt(pdb_path, thresh=0.6, minthresh=0.5)

Extracts the pLDDT (stored in the B-factor column) of the first atom of each residue in a PDB file and returns a bool indicating whether the structure is accepted.

Parameters:

pdb_path (str) – The path to the PDB file.
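The meaning of `thresh` and `minthresh` is not documented here. The sketch below shows one plausible reading, purely as an assumption: pLDDT values (0 to 100 in AlphaFold files) are rescaled to 0 to 1, and a structure is accepted when the mean is at least `thresh` and the lowest per-residue value is at least `minthresh`. The function name `accept_by_plddt` is hypothetical.

```python
def accept_by_plddt(plddt_percent, thresh=0.6, minthresh=0.5):
    """Hypothetical acceptance rule; see the assumptions in the comments."""
    # ASSUMPTION: inputs are on the 0-100 scale and thresholds on 0-1.
    scores = [p / 100.0 for p in plddt_percent]
    if not scores:
        return False  # nothing to evaluate, so reject
    mean_score = sum(scores) / len(scores)
    # ASSUMPTION: thresh bounds the mean, minthresh bounds the minimum.
    return mean_score >= thresh and min(scores) >= minthresh
```

For example, `accept_by_plddt([90.0, 40.0])` rejects the structure under this rule, because one residue falls below `minthresh` even though the mean passes.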

src.AFDB_tools.grab_struct(uniID, structfolder, rejected=None, overwrite=False)

Downloads a protein structure file from the AlphaFold website and saves it to the specified folder.

Parameters:

uniID (str): The UniProt ID of the protein for which the structure is being downloaded.
structfolder (str): The path to the folder where the structure file should be saved.
overwrite (bool, optional): A flag indicating whether to overwrite an existing file with the same name in the specified folder. Defaults to False.

Returns:

None: If the file is successfully downloaded, or if overwrite is set to True and a file with the same name is found in the specified folder.
str: If an error occurs during the download, or if a file with the same name is found in the specified folder and overwrite is set to False.

Examples:

>>> grab_struct('P00533', '/path/to/structures/')
None
>>> grab_struct('P00533', '/path/to/structures/', overwrite=True)
None
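The download-and-skip logic described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the helper names `afdb_url` and `download_structure` are hypothetical, the URL pattern and model version (`v4`) are assumptions about the AlphaFold Database file naming, the saved file name is chosen arbitrarily, and the return value (a path plus a flag) differs from the documented one.

```python
import os
import urllib.request


def afdb_url(uni_id, version=4):
    # ASSUMPTION: AFDB file naming pattern; the version number may differ.
    return f"https://alphafold.ebi.ac.uk/files/AF-{uni_id}-F1-model_v{version}.pdb"


def download_structure(uni_id, folder, overwrite=False,
                       fetch=urllib.request.urlretrieve):
    """Hypothetical sketch: returns (target_path, downloaded_flag)."""
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, f"{uni_id}.pdb")  # assumed file name
    if os.path.exists(target) and not overwrite:
        return target, False       # keep the existing file
    fetch(afdb_url(uni_id), target)  # urlretrieve(url, filename)
    return target, True
```

Passing the download function in as `fetch` keeps the skip and overwrite logic testable without network access.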

src.AFDB_tools.chunk(data, csize)

src.AFDB_tools.unirequest_tab(name, verbose=False)

Makes a request to the UniProt API and returns information about a protein in tab-separated format.

Parameters:

name (str): The name of the protein for which information is being requested.
verbose (bool, optional): A flag indicating whether to print the returned data to the console. Defaults to False.

Returns: pd.DataFrame: A DataFrame containing information about the protein, with one row for each hit in the search.

Examples:

>>> unirequest_tab('P00533')
id … sequence
0 sp|P00533|1A2K_HUMAN RecName: Full=Alpha-2-… … MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN…
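To make the request-and-parse step concrete, the sketch below builds a search URL and parses a tab-separated response into rows. It is an assumption-laden illustration: the helper names are hypothetical, the endpoint shown is the current UniProt REST search endpoint and may not be the one this module calls, and rows are returned as plain dictionaries rather than a DataFrame.

```python
import csv
import io
from urllib.parse import urlencode


def uniprot_search_url(name):
    # ASSUMPTION: UniProt REST search endpoint with tab-separated output.
    params = {"query": name, "format": "tsv"}
    return "https://rest.uniprot.org/uniprotkb/search?" + urlencode(params)


def parse_tsv(text):
    """Turn a tab-separated response body into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]
```

A list of row dictionaries can be handed to `pd.DataFrame(rows)` to obtain one row per hit.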

src.AFDB_tools.grab_entries(ids, verbose=False)

Makes requests to the UniProt API for information about proteins with the given IDs.

Parameters:

ids (list): A list of UniProt IDs for the proteins for which information is being requested.
verbose (bool, optional): A flag indicating whether to print the returned data to the console. Defaults to False.

Returns: pd.DataFrame: A DataFrame containing information about the proteins, with one row for each hit in the search.

Examples:

>>> grab_entries(['P00533', 'P15056'])
id … sequence
0 sp|P00533|1A2K_HUMAN RecName: Full=Alpha-2-… … MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN…
1 sp|P15056|1A01_HUMAN RecName: Full=Alpha-1-… … MAAARLLPLLPLLLALALALTETSCPPASQGQRASVGDRV…

Notes: If a request is successful, the returned data is processed and added to the DataFrame. If a request is unsuccessful, an error message is printed to the console.
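The success and failure handling described in the notes can be sketched as a loop that collects rows and reports errors. The name `collect_entries` and the `fetch_one` callable are hypothetical; the sketch assumes one request per ID and also returns the failed IDs, neither of which is stated for `grab_entries` itself.

```python
def collect_entries(ids, fetch_one):
    """Hypothetical sketch: gather rows for each ID, reporting failures."""
    rows = []
    failed = []
    for uni_id in ids:
        try:
            # fetch_one is assumed to return a list of row dictionaries.
            rows.extend(fetch_one(uni_id))
        except Exception as err:  # broad on purpose: any failure is reported
            print(f"Request for {uni_id} failed: {err}")
            failed.append(uni_id)
    return rows, failed
```

Continuing after a failed request means one bad ID does not discard the results already collected for the others.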

src.AFDB_tools.res2fasta(unires_df)

Converts a DataFrame containing protein information into a FASTA format string.

Parameters: unires_df (pd.DataFrame): A DataFrame containing information about proteins, with columns ‘query’ and ‘Sequence’ representing the name and sequence of each protein, respectively.

Returns: str: A string in FASTA format representing the proteins in the input DataFrame.

Examples:

>>> unires_df = pd.DataFrame([{'query': 'P00533', 'Sequence': 'MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN'}])
>>> res2fasta(unires_df)
'> P00533\nMPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN\n'
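A minimal sketch of the conversion, working on plain row dictionaries instead of a DataFrame so that it has no dependencies. The name `records_to_fasta` is hypothetical, and unlike the example output above it writes the header without a space after `>`, which is the more common FASTA convention.

```python
def records_to_fasta(records, name_key="query", seq_key="Sequence"):
    """Hypothetical helper: build a FASTA string from row dictionaries."""
    parts = []
    for rec in records:
        # One header line and one sequence line per record.
        parts.append(f">{rec[name_key]}\n{rec[seq_key]}\n")
    return "".join(parts)
```

With pandas, the rows of a DataFrame can be obtained through `unires_df.to_dict('records')` and passed to this helper.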