PyOMADB
PyOMADB is intended as a user-friendly Python wrapper around the OMA REST API.
Client
- class omadb.Client(endpoint='omabrowser.org/api', persistent_cached=False, persistent_cache_path=None)[source]
Client for the OMA browser REST API.
Initialisation example:
from omadb import Client c = Client()
- Raises:
ClientException – for 400, 404, 500 errors.
ClientTimeout – for timeout when interacting with REST endpoint.
- __init__(endpoint='omabrowser.org/api', persistent_cached=False, persistent_cache_path=None)[source]
- Parameters:
endpoint (str) – OMA REST API endpoint (default omabrowser.org/api)
persistent_cached (bool) – whether to cache queries on disk in SQLite DB.
persistent_cache_path (str or None) – location for persistent cache, optional
- genomes
Instance of
omadb.OMARestAPI.Genomes
.- entries
Instance of
omadb.OMARestAPI.Entries
.- proteins
Synonym of entries.
- hogs
Instance of
omadb.OMARestAPI.HOGs
.- groups
Instance of
omadb.OMARestAPI.OMAGroups
.- function
Instance of
omadb.OMARestAPI.Function
.- taxonomy
Instance of
omadb.OMARestAPI.Taxonomy
.- pairwise
Instance of
omadb.OMARestAPI.PairwiseRelations
.- xrefs
Instance of
omadb.OMARestAPI.ExternalReferences
.- external_references
Synonym of xrefs.
- synteny
Instance of
omadb.OMARestAPI.Synteny
.
Genomes
- class omadb.OMARestAPI.Genomes(client)[source]
API functionality for genome information.
Access indirectly, via the client.
Example:
from omadb import Client c = Client() wheat = c.genomes.genome('WHEAT')
- __getitem__(genome_id)[source]
Retrieve information on a genome in OMA.
- Parameters:
genome_id (str or int) – unique identifier for genome, NCBI taxonomic ID or UniProt species code
- Returns:
genome information
- Return type:
- as_dataframe()[source]
Retrieve information on all genomes in OMA, return as pandas data frame.
- Returns:
information on all genomes
- Return type:
pd.DataFrame
- genome(genome_id)[source]
Retrieve information on a genome in OMA.
- Parameters:
genome_id (str or int) – unique identifier for genome, NCBI taxonomic ID or UniProt species code
- Returns:
genome information
- Return type:
- property genomes[source]
Retrieve information on all genomes in OMA.
- Returns:
information on all genomes
- Return type:
Note
The
genomes
property is alazy_property
. This property’s value is computed once (the first time it is accessed) and the result is cached.
- property list
Synonym for genomes. Retrieve information on all genomes in OMA.
- Returns:
information on all genomes
- Return type:
Entries
- class omadb.OMARestAPI.Entries(client)[source]
API functionality for protein entries.
Access indirectly, via the client.
Example:
from omadb import Client c = Client() entry = c.entries['WHEAT00001']
- __getitem__(entry_id)[source]
Retrieve the information available for a protein entry.
- Parameters:
entry_id (str or int) – a unique identifier for a protein
- Returns:
entry information
- Return type:
- cross_references(entry_id, type=None)[source]
Retrieve all cross-references for a protein.
- Parameters:
entry_id (str or int) – a unique identifier for a protein
type (str or None) – specify type of cross-references to retain
- Returns:
cross references
- Return type:
dict or set
- domains(entry_id)[source]
Retrieve the domains present in a protein.
- Parameters:
entry_id (str or int) – a unique identifier for a protein
- Returns:
domain information
- Return type:
- gene_ontology(entry_id, aspect=None, as_dataframe=None, as_goatools=None, as_goea=None, progress=False, **kwargs)[source]
Retrieve any associations to Gene Ontology terms for a protein.
- Parameters:
entry_id (str or int or list) – a unique identifier for a protein
aspect (str) – GO aspect - biological process (BP), cellular component (CC), molecular function (MF)
as_dataframe (bool) – whether to return as pandas data frame, optional
as_goea (bool) – whether to return a GOEnrichmentAnalysis object, optional
as_goatools (bool) – whether to return as GOATOOLS GOEA object, optional (deprecated)
progress (bool) – whether to show a progress bar during load (default False)
- Returns:
gene ontology associations
- Return type:
list or pd.DataFrame or goatools.go_enrichment.GOEnrichmentStudy
- hog_derived_orthologs(entry_id)[source]
Retrieve list of all orthologs derived from the HOG for a given protein.
- Parameters:
entry_id (str or int) – a unique identifier for a protein
- Returns:
list of orthologs
- Return type:
- hog_derived_orthologues(entry_id)[source]
Retrieve list of all orthologues derived from the HOG for a given protein.
- Parameters:
entry_id (str or int) – a unique identifier for a protein
- Returns:
list of orthologues
- Return type:
- homoeologs(entry_id)[source]
Retrieve all homoeologs for a given protein.
- Parameters:
entry_id (str or int) – a unique identifier for a protein
- Returns:
list of homoeologs
- Return type:
- homoeologues(entry_id)[source]
Retrieve all homoeologues for a given protein.
- Parameters:
entry_id (str or int) – a unique identifier for a protein
- Returns:
list of homoeologues
- Return type:
- info(entry_id)[source]
Retrieve the information available for a protein entry.
- Parameters:
entry_id (str or int) – a unique identifier for a protein
- Returns:
entry information
- Return type:
- orthologs(entry_id, rel_type=None)[source]
Retrieve list of all identified orthologs of a protein.
- Parameters:
entry_id (str or int) – a unique identifier for a protein
rel_type (str or None) – relationship type to filter to (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’)
- Returns:
list of orthologs
- Return type:
- orthologues(entry_id, rel_type=None)[source]
Retrieve list of all identified orthologues of a protein.
- Parameters:
entry_id (str or int) – a unique identifier for a protein
rel_type (str or None) – relationship type (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’), optional
- Returns:
list of orthologues
- Return type:
- search(sequence, search=None, full_length=None)[source]
Search for closest sequence in OMA database.
- Parameters:
query (str) – query sequence
search (str or None) – search strategy (‘exact, ‘approximate’, ‘mixed’ [Default])
full_length (bool) – indicates if exact matches have to be full length (by default, not)
- Returns:
closest entries
- Return type:
HOGs
- class omadb.OMARestAPI.HOGs(client)[source]
API functionality for HOG information.
Access indirectly, via the client.
Example:
from omadb import Client c = Client() entry = c.hogs['WHEAT00001']
- __getitem__(hog_id)[source]
Retrieve the detail available for a given HOG.
- Parameters:
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
- Returns:
HOG information
- Return type:
- analyse(hog_id)[source]
Use the PyHAM package to analyse a particular hierarchical orthologous group.
- Parameters:
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
- Returns:
analysis object
- Return type:
pyham.Ham
- analyze(hog_id)[source]
Use the PyHAM package to analyse a particular hierarchical orthologous group.
- Parameters:
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
- Returns:
analysis object
- Return type:
pyham.Ham
- at_level(level, compare=None, as_dataframe=None)[source]
Retrieve list of HOGs at a particular level
- Parameters:
level (str) – level of interest
compare – if set to None, or False, returns all HOGs defined at a particular level. If
set to a parental taxonomic level, all hogs will be annotated with evolutionary events that occurred between the two points in time. If set to True, will compare with the direct parent (default None) :type compare: None or bool or str :param bool as_dataframe: whether to return as pandas data frame, optional
- Returns:
all hogs at a particular level
- Return type:
- external_references(hog_id, type=None)[source]
Retrieve external references for all members of a particular HOG.
- Parameters:
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
level (str) – level of interest
- Returns:
external references
- Return type:
dict
- gene_ontology(hog_id, level=None, aspect=None, as_dataframe=None, progress=False, **kwargs)[source]
Retrieve any ancestral associations to Gene Ontology terms for a given HOG identifier, at a particular taxonomic level.
- Parameters:
hog_id (str or list) – a unique identifier for a Hierarchical Orthologous Group - its HOGID starting with “HOG:”. (Example: HOG:0001221.1a)
level (str or list) – taxonomic level of reference for a HOG - default is its deepest level for a given HOG ID. (Example: Mammalia)
aspect (str) – GO aspect - biological process (BP), cellular component (CC), molecular function (MF)
as_dataframe (bool) – whether to return as pandas data frame, optional
progress (bool) – whether to show a progress bar during load (default False)
- Returns:
gene ontology associations
- Return type:
list or pd.DataFrame or goatools.go_enrichment.GOEnrichmentStudy
- get_orthoxml(hog_id, augmented=False)[source]
Retrieve OrthoXML (from browser) for a particular HOG.
- Parameters:
augmented (bool) – whether or not to use the augmented version of the orthoxml, that contains more information but is not fully according to the spec. Essentially it also contains orthologGroup nodes that have only one children node.
- Returns:
OrthoXML
- Return type:
str
- iham(hog_id)[source]
Create an iHam page and print path to temporary file.
- Parameters:
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
- info(hog_id)[source]
Retrieve the detail available for a given HOG.
- Parameters:
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
- Returns:
HOG information
- Return type:
- members(hog_id, level=None, as_dataframe=None)[source]
Retrieve list of protein entries in a given HOG.
- Parameters:
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
level (str) – level of interest
as_dataframe (bool) – whether to return as pandas data frame, optional
- Returns:
list of members
- Return type:
list or pd.DataFrame
- xrefs(hog_id, level=None, type=None, as_dataframe=None)[source]
Retrieve external references for all members of a particular HOG.
- Parameters:
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
level (str) – level of interest
as_dataframe (bool) – whether to return as pandas data frame, optional
- Returns:
external references
- Return type:
dict or pd.DataFrame
OMAGroups
- class omadb.OMARestAPI.OMAGroups(client)[source]
API functionality for retrieving information on OMA groups.
Access indirectly, via the client.
Example:
from omadb import Client c = Client() og = c.groups['WHEAT00001']
- __getitem__(group_id)[source]
Retrieve information available for a given OMA group.
- Parameters:
group_id (int or str) – unique identifier of a group - either group number, fingerprint or entry ID of a member.
- Returns:
group information
- Return type:
- close_groups(group_id, as_dataframe=None)[source]
Retrieve the sorted list of closely related groups for a given OMA group.
- Parameters:
group_id (int or str) – unique identifier of a group - either group number, fingerprint or entry ID of a member.
as_dataframe (bool) – whether to return as pandas data frame, optional
- Returns:
sorted list of closely related groups
- Return type:
list or pd.DataFrame
Function
- class omadb.OMARestAPI.Function(client)[source]
API functionality for retrieving functional annotations for sequences.
Access indirectly, via the client.
Example:
from omadb import Client c = Client() gos = c.function('ATCATATCAT')
- __call__(seq, as_dataframe=None)[source]
Annotate a sequence with GO terms based on annotations stored in the OMA database.
- Parameters:
query (str) – query sequence
as_dataframe (bool) – whether to return as pandas data frame, optional
- Returns:
results of fast function prediction
- Return type:
list or pd.DataFrame
Taxonomy
- class omadb.OMARestAPI.Taxonomy(client)[source]
- dendropy_tree(members=None, root=None, with_names=None)[source]
Retrieve taxonomy and load as dendropy tree.
- Parameters:
members (list) – list of members to get the induced taxonomy for, optional
root (str or int or None) – taxon ID, species name or UniProt species code for root taxonomic level, optional
with_names (bool) – whether to use species code (False, default) or species names (True), optional
- Returns:
taxonomy loaded as dendropy tree object
- Return type:
dendropy.Tree
- ete_tree(members=None, root=None, with_names=None)[source]
Retrieve taxonomy and load as ete3 tree.
- Parameters:
members (list) – list of members to get the induced taxonomy for, optional
root (str or int or None) – taxon ID, species name or UniProt species code for root taxonomic level, optional
with_names (bool) – whether to use species code (False, default) or species names (True), optional
- Returns:
taxonomy loaded as ete tree object
- Return type:
ete.Tree
- get(members=None, format=None, collapse=None)[source]
Retrieve taxonomy in a particular format and return as string.
- Parameters:
members (list) – list of members to get the induced taxonomy for, optional
format (str) – format of the taxonomy (dictionary [default], newick or phyloxml)
collapse (bool) – whether or not to collapse levels with single child, optional (default yes)
- Returns:
taxonomy
- Return type:
str
- read(root, format=None, collapse=None)[source]
Retrieve taxonomy in a particular format and return as string.
- Parameters:
root (str or int) – taxon ID, species name or UniProt species code for root taxonomic level, optional
format (str) – format of the taxonomy (dictionary [default], newick or phyloxml)
collapse (bool) – whether or not to collapse levels with single child, optional (default yes)
- Returns:
taxonomy
- Return type:
str
PairwiseRelations
- class omadb.OMARestAPI.PairwiseRelations(client)[source]
API functionality for pairwise relations..
Access indirectly, via the client.
Example:
from omadb import Client c = Client() arath_wheat_pairs = list(c.pairwise('ARATH', 'WHEAT', progress=True))
- __call__(genome_id1, genome_id2, chr1=None, chr2=None, rel_type=None, progress=False)[source]
List the pairwise relations among two genomes.
If genome_id1 == genome_id2, relations are close paralogues and homoeologues. If different, the relations are orthologues.
By using the paramaters chr1 and chr2, it is possible to limit the relations to a certain chromosome for one or both genomes. The ID of the chromosome corresponds to the IDs in, for example:
from omadb import Client c = Client() r = c.genomes.genome('HUMAN') human_inparalogues = list(c.pairwise('HUMAN', 'HUMAN', chr1=r.chromosomes[0].id, chr2=r.chromosomes[3].id, progress=True))
- Parameters:
genome_id1 (int or str) – unique identifier for first genome - either NCBI taxonomic identifier or UniProt species code.
genome_id2 (int or str) – unique identifier for second genome - either NCBI taxonomic identifier or UniProt species code.
chr1 (str or None) – ID of chromosome of interest in first genome
chr2 – ID of chromosome of interest in second genome :type chr2: str or None
rel_type – relationship type (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’), optional
rel_type – str or None
progress (bool) – whether to show a progress bar during load (default False)
- Returns:
generator of pairwise relations.
- Return type:
ExternalReferences
- class omadb.OMARestAPI.ExternalReferences(client)[source]
API functionality for external references from a query sequence.
Access indirectly, via the client.
Example:
from omadb import Client c = Client() xrefs = c.xrefs('AAA')
Synteny
- class omadb.OMARestAPI.Synteny(client)[source]
API functionality for loading synteny data.
Access indirectly, via the client.
Example:
from omadb import Client c = Client() xrefs = c.synteny.xrefs('AAA')
- at_level(level, evidence='linearized', break_circular_contigs=True)[source]
Retrieve list of HOGs at a particular level
- Parameters:
level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.
evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)
break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)
- Returns:
networkx graph of synteny
- Return type:
- neighborhood(id, level, evidence='linearized', context=2, break_circular_contigs=True)[source]
Retrieve neighborhood around a HOG or protein at a particular level.
- Parameters:
id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)
level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.
evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)
context (int) – size of the graph around the query HOG, in terms of number of edges. (default 2)
break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)
- Returns:
networkx graph of synteny
- Return type:
- neighbourhood(id, level, evidence='linearized', context=2, break_circular_contigs=True)[source]
Retrieve neighbourhood around a HOG or protein at a particular level.
- Parameters:
id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)
level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.
evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)
context (int) – size of the graph around the query HOG, in terms of number of edges. (default 2)
break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)
- Returns:
networkx graph of synteny
- Return type:
- window(id, level, n=2)[source]
Retrieve window around a HOG or protein at a particular level.
- Parameters:
id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)
level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.
n (int) – size of the window +-n around the query HOG. (default 2)
- Returns:
list of HOGs
- Return type:
GOEnrichmentAnalysis
ClientResponse
AttrDict
ClientPagedResponse
- class omadb.OMARestAPI.ClientPagedResponse(client, response, progress_desc=None)[source]
License
PyOMADB is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
PyOMADB is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with PyOMADB. If not, see <http://www.gnu.org/licenses/>.