AFDB tools
This module contains utility functions for working with the AlphaFold Database (AFDB) and UniProt.
- src.AFDB_tools.get_amino_acid_sequence(pdb_filename)
This function extracts the amino acid sequence from a PDB file.
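As an illustration of what extracting a sequence from a PDB file involves, here is a minimal sketch that reads the fixed-width ATOM records and emits one letter per residue. It is not the module's implementation: the names `sequence_from_pdb_lines` and `atom_line` are hypothetical, and mapping unknown residues to "X" is an assumption.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_pdb_lines(lines):
    """Return a one-letter sequence, one letter per residue, from ATOM records."""
    seen = set()
    letters = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: residue name 18-20, chain ID 22,
        # residue number plus insertion code 23-27.
        resname = line[17:20].strip()
        key = (line[21], line[22:27])
        if key in seen:
            continue
        seen.add(key)
        # Assumption: unknown residue names become "X".
        letters.append(THREE_TO_ONE.get(resname, "X"))
    return "".join(letters)


def atom_line(serial, atom, resname, chain, resseq, bfactor):
    """Hypothetical helper: build a fixed-width ATOM record for the demo."""
    return (f"ATOM  {serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")


demo = [
    atom_line(1, "N", "MET", "A", 1, 90.0),
    atom_line(2, "CA", "MET", "A", 1, 90.0),
    atom_line(3, "N", "PRO", "A", 2, 80.0),
    atom_line(4, "N", "THR", "A", 3, 70.0),
]
sequence = sequence_from_pdb_lines(demo)
print(sequence)
```

To run the same logic on a file, the open file handle can be passed directly, since iterating over it yields lines.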
- src.AFDB_tools.descr(pdb_path)
Extracts the pLDDT score (stored in the B-factor column) of the first atom of each residue in a PDB file and returns a descriptive statistics object.
Parameters: pdb_path (str): The path to the PDB file.
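The per-residue extraction described above can be sketched as follows. This is an assumption-laden illustration, not the module's code: the type of the "descriptive statistics object" is not specified here, so the sketch returns a plain dict built with the standard library, and the names `first_atom_bfactors`, `describe_plddt`, and `atom_line` are hypothetical.

```python
import statistics


def first_atom_bfactors(lines):
    """Collect the B-factor of the first ATOM record seen for each residue."""
    seen = set()
    values = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        key = (line[21], line[22:27])  # chain ID, residue number + insertion code
        if key in seen:
            continue
        seen.add(key)
        values.append(float(line[60:66]))  # temperature factor, columns 61-66
    return values


def describe_plddt(lines):
    """Summarize per-residue pLDDT values (assumed return shape: a dict)."""
    values = first_atom_bfactors(lines)
    return {
        "nobs": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "variance": statistics.variance(values) if len(values) > 1 else 0.0,
    }


def atom_line(serial, atom, resname, chain, resseq, bfactor):
    """Hypothetical helper: build a fixed-width ATOM record for the demo."""
    return (f"ATOM  {serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")


demo = [
    atom_line(1, "N", "MET", "A", 1, 90.0),
    atom_line(2, "CA", "MET", "A", 1, 10.0),  # ignored: not the first atom
    atom_line(3, "N", "PRO", "A", 2, 80.0),
    atom_line(4, "N", "THR", "A", 3, 70.0),
]
summary = describe_plddt(demo)
print(summary)
```

In AlphaFold models every atom of a residue normally carries the same pLDDT, so reading only the first atom loses nothing; the second demo line uses a different value purely to show that it is skipped.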
- src.AFDB_tools.filter_plddt(pdb_path, thresh=0.6, minthresh=0.5)
Extracts the pLDDT score (stored in the B-factor column) of the first atom of each residue in a PDB file and returns a boolean indicating whether the structure is accepted.
Parameters: pdb_path (str): The path to the PDB file.
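The exact acceptance rule is not documented here, so the following is only a guess at one plausible reading. It assumes that pLDDT values (0 to 100 in AlphaFold files) are rescaled to 0 to 1, that `thresh` is compared against the mean, and that `minthresh` is compared against the lowest per-residue value. The function name `accept_plddt` is hypothetical.

```python
def accept_plddt(plddt_values, thresh=0.6, minthresh=0.5):
    """Return True if the structure passes both (assumed) confidence checks."""
    if not plddt_values:
        return False
    # Assumption: B-factor values are on a 0-100 scale and thresholds on 0-1.
    scaled = [value / 100.0 for value in plddt_values]
    mean = sum(scaled) / len(scaled)
    # Assumption: thresh applies to the mean, minthresh to the minimum.
    return mean >= thresh and min(scaled) >= minthresh


confident = accept_plddt([90.0, 80.0, 70.0])
weak_region = accept_plddt([90.0, 80.0, 40.0])
low_overall = accept_plddt([55.0, 55.0, 55.0])
print(confident, weak_region, low_overall)
```

If the real function interprets the two thresholds differently, only the final `return` line of this sketch would need to change.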
- src.AFDB_tools.grab_struct(uniID, structfolder, rejected=None, overwrite=False)
Downloads a protein structure file from the AlphaFold website and saves it to the specified folder.
Parameters: uniID (str): The UniProt ID of the protein for which the structure is being downloaded. structfolder (str): The path to the folder where the structure file should be saved. overwrite (bool, optional): A flag indicating whether to overwrite an existing file with the same name in the specified folder. Defaults to False.
Returns: None: If the file is downloaded successfully, including when an existing file with the same name is replaced because overwrite is True. str: If an error occurs during the download, or if a file with the same name already exists in the specified folder and overwrite is False.
Examples:
>>> grab_struct('P00533', '/path/to/structures/')
None
>>> grab_struct('P00533', '/path/to/structures/', overwrite=True)
None
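The return contract described above (None on success, a string otherwise) can be sketched as below. Several details are assumptions rather than facts about `grab_struct`: the URL template with model version 4, the `<uniID>.pdb` file name, the `version` and `fetch` parameters, and the function name `download_structure`. The `rejected` argument from the real signature is not documented here and is left out.

```python
import os
import tempfile
import urllib.request

# Assumption: AFDB file naming with fragment F1 and a model version suffix.
AFDB_URL = "https://alphafold.ebi.ac.uk/files/AF-{uni_id}-F1-model_v{version}.pdb"


def fetch_bytes(url):
    """Download a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_structure(uni_id, structfolder, overwrite=False, version=4,
                       fetch=fetch_bytes):
    """Save the model for uni_id; return None on success, else a message."""
    os.makedirs(structfolder, exist_ok=True)
    target = os.path.join(structfolder, f"{uni_id}.pdb")  # assumed file name
    if os.path.exists(target) and not overwrite:
        return f"{target} already exists"
    try:
        data = fetch(AFDB_URL.format(uni_id=uni_id, version=version))
    except Exception as err:  # e.g. network failure or HTTP 404
        return f"download failed for {uni_id}: {err}"
    with open(target, "wb") as handle:
        handle.write(data)
    return None


# Offline demonstration with a stand-in fetch function (no network access).
with tempfile.TemporaryDirectory() as folder:
    fake = lambda url: b"HEADER demo\n"
    first = download_structure("P00533", folder, fetch=fake)
    second = download_structure("P00533", folder, fetch=fake)
    third = download_structure("P00533", folder, overwrite=True, fetch=fake)
print(first, second is not None, third)
```

Passing the download step in as `fetch` is a design choice made for this sketch so that the overwrite logic can be exercised without touching the network.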
- src.AFDB_tools.chunk(data, csize)
- src.AFDB_tools.unirequest_tab(name, verbose=False)
Makes a request to the UniProt API and returns information about a protein in tab-separated format.
Parameters: name (str): The name of the protein for which information is being requested. verbose (bool, optional): A flag indicating whether to print the returned data to the console. Defaults to False.
Returns: pd.DataFrame: A DataFrame containing information about the protein, with one row for each hit in the search.
Examples:
>>> unirequest_tab('P00533')
   id … sequence
0  sp|P00533|1A2K_HUMAN RecName: Full=Alpha-2-… … MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN…
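A request for tab-separated results followed by parsing might look like the sketch below. It assumes the UniProt REST search endpoint with `format=tsv` and the return fields `accession`, `id`, and `sequence`; whether `unirequest_tab` uses this endpoint or these fields is not stated here. The helper names are hypothetical, the sample text is made-up placeholder data rather than a real UniProt record, and plain dicts stand in for the DataFrame.

```python
import csv
import io
from urllib.parse import urlencode

# Assumption: the UniProt REST search endpoint.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_query_url(name, fields=("accession", "id", "sequence")):
    """Build a search URL that asks for tab-separated output."""
    params = {"query": name, "format": "tsv", "fields": ",".join(fields)}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def parse_tsv(text):
    """Turn a tab-separated response body into one dict per hit."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


url = build_query_url("P00533")

# Placeholder response body for the demo; not real UniProt data.
sample = "Entry\tEntry Name\tSequence\nDEMO1\tDEMO1_TEST\tMKV\n"
rows = parse_tsv(sample)
print(url)
print(rows)
```

With pandas available, `pd.read_csv(io.StringIO(text), sep="\t")` would produce a DataFrame with one row per hit from the same text.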
- src.AFDB_tools.grab_entries(ids, verbose=False)
Makes requests to the UniProt API for information about proteins with the given IDs.
Parameters: ids (list): A list of UniProt IDs for the proteins for which information is being requested. verbose (bool, optional): A flag indicating whether to print the returned data to the console. Defaults to False.
Returns: pd.DataFrame: A DataFrame containing information about the proteins, with one row for each hit in the search.
Examples:
>>> grab_entries(['P00533', 'P15056'])
   id … sequence
0  sp|P00533|1A2K_HUMAN RecName: Full=Alpha-2-… … MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN…
1  sp|P15056|1A01_HUMAN RecName: Full=Alpha-1-… … MAAARLLPLLPLLLALALALTETSCPPASQGQRASVGDRV…
Notes: If a request is successful, the returned data is processed and added to the DataFrame. If a request fails, an error message is printed to the console.
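The behavior in the notes (collect successful responses, print a message for failed ones) can be sketched as a loop over batches of IDs. This is an illustration under assumptions: the batch size, the `fetch_batch` callable, and the names `chunks` and `collect_entries` are hypothetical, and plain dicts stand in for DataFrame rows. The module's own `chunk(data, csize)` helper may play a similar batching role, but that is a guess.

```python
def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def collect_entries(ids, fetch_batch, batch_size=50):
    """Call fetch_batch per batch; keep successful rows, report failures."""
    rows = []
    for batch in chunks(list(ids), batch_size):
        try:
            rows.extend(fetch_batch(batch))
        except Exception as err:
            # A failed request is reported and the loop continues.
            print(f"request failed for {batch}: {err}")
    return rows


def fake_fetch(batch):
    """Stand-in for a real request; fails when the batch contains 'BAD'."""
    if "BAD" in batch:
        raise ValueError("simulated failure")
    return [{"id": uid} for uid in batch]


rows = collect_entries(["A", "B", "BAD", "C"], fake_fetch, batch_size=2)
print(rows)
```

Note that in this sketch a failure discards the whole batch, so "C" is lost along with "BAD"; a smaller batch size limits how much one bad request can take with it.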
- src.AFDB_tools.res2fasta(unires_df)
Converts a DataFrame containing protein information into a FASTA format string.
Parameters: unires_df (pd.DataFrame): A DataFrame containing information about proteins, with columns ‘query’ and ‘Sequence’ representing the name and sequence of each protein, respectively.
Returns: str: A string in FASTA format representing the proteins in the input DataFrame.
Examples:
>>> unires_df = pd.DataFrame([{'query': 'P00533', 'Sequence': 'MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN'}])
>>> res2fasta(unires_df)
'> P00533\nMPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN\n'
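The conversion shown in the example can be sketched without pandas by working on a list of records. The function name `records_to_fasta` is hypothetical, and the header format ("> " followed by the query name) is taken from the example output above rather than from the module's source.

```python
def records_to_fasta(records, name_key="query", seq_key="Sequence"):
    """Join records into FASTA text: a '> name' header line, then the sequence."""
    return "".join(f"> {rec[name_key]}\n{rec[seq_key]}\n" for rec in records)


records = [
    {"query": "P00533", "Sequence": "MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN"},
]
fasta = records_to_fasta(records)
print(fasta, end="")
```

Given a DataFrame with the same two columns, `unires_df.to_dict("records")` yields a list in the shape this sketch expects.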