PyOMADB
PyOMADB is intended as a user-friendly Python wrapper around the OMA REST API.
Client
- class omadb.Client(endpoint='omabrowser.org/api', persistent_cached=False, persistent_cache_path=None)[source]
Client for the OMA browser REST API.
Initialisation example:
from omadb import Client
c = Client()
- Raises
ClientException – for 400, 404, 500 errors.
ClientTimeout – for timeout when interacting with REST endpoint.
- __init__(endpoint='omabrowser.org/api', persistent_cached=False, persistent_cache_path=None)[source]
- Parameters
endpoint (str) – OMA REST API endpoint (default omabrowser.org/api)
persistent_cached (bool) – whether to cache queries on disk in SQLite DB.
persistent_cache_path (str or None) – location for persistent cache, optional
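As a minimal sketch of the constructor parameters above, the helper below builds a client with the on-disk cache enabled. `make_cached_client` and `_RecordingClient` are hypothetical names introduced for illustration. The client class is passed in as an argument (it would normally be `omadb.Client`) so that the snippet makes no network request, and the cache file name is an arbitrary example.

```python
def make_cached_client(client_cls, cache_path=None):
    """Instantiate a client with the persistent (SQLite) cache switched on.

    client_cls is assumed to be omadb.Client; it is injected here so the
    sketch can be exercised without network access. make_cached_client is
    a hypothetical helper, not part of PyOMADB.
    """
    return client_cls(persistent_cached=True, persistent_cache_path=cache_path)


class _RecordingClient:
    """Stand-in that mirrors the documented signature and records its arguments."""

    def __init__(self, endpoint='omabrowser.org/api',
                 persistent_cached=False, persistent_cache_path=None):
        self.endpoint = endpoint
        self.persistent_cached = persistent_cached
        self.persistent_cache_path = persistent_cache_path


# Illustrative cache location; any writable path could be used instead.
demo = make_cached_client(_RecordingClient, cache_path='oma_cache.sqlite')
```

With the real library, the same helper would be called as make_cached_client(Client, ...) after importing Client from omadb.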
- genomes
Instance of omadb.OMARestAPI.Genomes.
- entries
Instance of omadb.OMARestAPI.Entries.
- proteins
Synonym of entries.
- hogs
Instance of omadb.OMARestAPI.HOGs.
- groups
Instance of omadb.OMARestAPI.OMAGroups.
- function
Instance of omadb.OMARestAPI.Function.
- taxonomy
Instance of omadb.OMARestAPI.Taxonomy.
- pairwise
Instance of omadb.OMARestAPI.PairwiseRelations.
- xrefs
Instance of omadb.OMARestAPI.ExternalReferences.
- external_references
Synonym of xrefs.
- synteny
Instance of omadb.OMARestAPI.Synteny.
Genomes
- class omadb.OMARestAPI.Genomes(client)[source]
API functionality for genome information.
Access indirectly, via the client.
Example:
from omadb import Client
c = Client()
wheat = c.genomes.genome('WHEAT')
- __getitem__(genome_id)[source]
Retrieve information on a genome in OMA.
- Parameters
genome_id (str or int) – unique identifier for genome, NCBI taxonomic ID or UniProt species code
- Returns
genome information
- Return type
- as_dataframe()[source]
Retrieve information on all genomes in OMA, return as pandas data frame.
- Returns
information on all genomes
- Return type
pd.DataFrame
- genome(genome_id)[source]
Retrieve information on a genome in OMA.
- Parameters
genome_id (str or int) – unique identifier for genome, NCBI taxonomic ID or UniProt species code
- Returns
genome information
- Return type
- property genomes[source]
Retrieve information on all genomes in OMA.
- Returns
information on all genomes
- Return type
Note
The genomes property is a lazy_property. Its value is computed once (the first time it is accessed) and the result is cached.
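The compute-once-then-cache behaviour described in the note can be illustrated with the standard library. The sketch below uses `functools.cached_property` (Python 3.8+) as a stand-in; it is not PyOMADB's own `lazy_property` implementation, and `GenomeListing` and `fake_fetch` are hypothetical names.

```python
from functools import cached_property


class GenomeListing:
    """Toy stand-in illustrating a property that is computed once and cached."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that performs the expensive lookup

    @cached_property
    def genomes(self):
        # Runs only on first access; the result is then stored on the instance.
        return self._fetch()


calls = []


def fake_fetch():
    """Pretend to query a remote service, recording each invocation."""
    calls.append(1)
    return ['ARATH', 'WHEAT']


listing = GenomeListing(fake_fetch)
first = listing.genomes   # triggers fake_fetch()
second = listing.genomes  # served from the cache, no second fetch
```

A practical consequence is that repeated access to the property within one session does not repeat the underlying request.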
- property list
Synonym for genomes. Retrieve information on all genomes in OMA.
- Returns
information on all genomes
- Return type
Entries
- class omadb.OMARestAPI.Entries(client)[source]
API functionality for protein entries.
Access indirectly, via the client.
Example:
from omadb import Client
c = Client()
entry = c.entries['WHEAT00001']
- __getitem__(entry_id)[source]
Retrieve the information available for a protein entry.
- Parameters
entry_id (str or int) – a unique identifier for a protein
- Returns
entry information
- Return type
- cross_references(entry_id, type=None)[source]
Retrieve all cross-references for a protein.
- Parameters
entry_id (str or int) – a unique identifier for a protein
type (str or None) – specify type of cross-references to retain
- Returns
cross references
- Return type
dict or set
- domains(entry_id)[source]
Retrieve the domains present in a protein.
- Parameters
entry_id (str or int) – a unique identifier for a protein
- Returns
domain information
- Return type
- gene_ontology(entry_id, aspect=None, as_dataframe=None, as_goatools=None, as_goea=None, progress=False, **kwargs)[source]
Retrieve any associations to Gene Ontology terms for a protein.
- Parameters
entry_id (str or int or list) – a unique identifier for a protein
aspect (str) – GO aspect - biological process (BP), cellular component (CC), molecular function (MF)
as_dataframe (bool) – whether to return as pandas data frame, optional
as_goea (bool) – whether to return a GOEnrichmentAnalysis object, optional
as_goatools (bool) – whether to return as GOATOOLS GOEA object, optional (deprecated)
progress (bool) – whether to show a progress bar during load (default False)
- Returns
gene ontology associations
- Return type
list or pd.DataFrame or goatools.go_enrichment.GOEnrichmentStudy
- hog_derived_orthologs(entry_id)[source]
Retrieve list of all orthologs derived from the HOG for a given protein.
- Parameters
entry_id (str or int) – a unique identifier for a protein
- Returns
list of orthologs
- Return type
- hog_derived_orthologues(entry_id)[source]
Retrieve list of all orthologues derived from the HOG for a given protein.
- Parameters
entry_id (str or int) – a unique identifier for a protein
- Returns
list of orthologues
- Return type
- homoeologs(entry_id)[source]
Retrieve all homoeologs for a given protein.
- Parameters
entry_id (str or int) – a unique identifier for a protein
- Returns
list of homoeologs
- Return type
- homoeologues(entry_id)[source]
Retrieve all homoeologues for a given protein.
- Parameters
entry_id (str or int) – a unique identifier for a protein
- Returns
list of homoeologues
- Return type
- info(entry_id)[source]
Retrieve the information available for a protein entry.
- Parameters
entry_id (str or int) – a unique identifier for a protein
- Returns
entry information
- Return type
- orthologs(entry_id, rel_type=None)[source]
Retrieve list of all identified orthologs of a protein.
- Parameters
entry_id (str or int) – a unique identifier for a protein
rel_type (str or None) – relationship type to filter to (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’)
- Returns
list of orthologs
- Return type
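As a rough sketch of the rel_type filter, the helper below issues one orthologs() call per relationship type and counts the results. `count_orthologs_by_type` is a hypothetical helper, and it assumes the returned list supports len(). The stub classes stand in for a real client so the snippet runs offline.

```python
REL_TYPES = ('1:1', '1:many', 'many:1', 'many:many')


def count_orthologs_by_type(client, entry_id):
    """Return {rel_type: number of orthologs} for one protein entry.

    Uses only the documented orthologs(entry_id, rel_type=...) call.
    Hypothetical helper, not part of PyOMADB.
    """
    return {
        rel_type: len(client.entries.orthologs(entry_id, rel_type=rel_type))
        for rel_type in REL_TYPES
    }


class _StubEntries:
    """Offline stand-in for client.entries with made-up data."""

    _data = {'1:1': ['A', 'B'], '1:many': ['C'], 'many:1': [], 'many:many': []}

    def orthologs(self, entry_id, rel_type=None):
        return list(self._data[rel_type])


class _StubClient:
    entries = _StubEntries()


counts = count_orthologs_by_type(_StubClient(), 'WHEAT00001')
```

With a real client, the call would be count_orthologs_by_type(c, 'WHEAT00001'), at the cost of four requests.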
- orthologues(entry_id, rel_type=None)[source]
Retrieve list of all identified orthologues of a protein.
- Parameters
entry_id (str or int) – a unique identifier for a protein
rel_type (str or None) – relationship type (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’), optional
- Returns
list of orthologues
- Return type
- search(sequence, search=None, full_length=None)[source]
Search for closest sequence in OMA database.
- Parameters
sequence (str) – query sequence
search (str or None) – search strategy (‘exact’, ‘approximate’, ‘mixed’ [Default])
full_length (bool) – whether exact matches have to be full length (by default, they do not)
- Returns
closest entries
- Return type
HOGs
- class omadb.OMARestAPI.HOGs(client)[source]
API functionality for HOG information.
Access indirectly, via the client.
Example:
from omadb import Client
c = Client()
entry = c.hogs['WHEAT00001']
- __getitem__(hog_id)[source]
Retrieve the detail available for a given HOG.
- Parameters
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
- Returns
HOG information
- Return type
- analyse(hog_id)[source]
Use the PyHAM package to analyse a particular hierarchical orthologous group.
- Parameters
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
- Returns
analysis object
- Return type
pyham.Ham
- analyze(hog_id)[source]
Use the PyHAM package to analyse a particular hierarchical orthologous group.
- Parameters
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
- Returns
analysis object
- Return type
pyham.Ham
- at_level(level, compare=None, as_dataframe=None)[source]
Retrieve list of HOGs at a particular level
- Parameters
level (str) – level of interest
compare (None or bool or str) – if set to None or False, returns all HOGs defined at a particular level. If set to a parental taxonomic level, all HOGs will be annotated with evolutionary events that occurred between the two points in time. If set to True, will compare with the direct parent (default None)
as_dataframe (bool) – whether to return as pandas data frame, optional
- Returns
all hogs at a particular level
- Return type
- external_references(hog_id, type=None)[source]
Retrieve external references for all members of a particular HOG.
- Parameters
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
level (str) – level of interest
- Returns
external references
- Return type
dict
- get_orthoxml(hog_id, augmented=False)[source]
Retrieve OrthoXML (from browser) for a particular HOG.
- Parameters
augmented (bool) – whether or not to use the augmented version of the OrthoXML, which contains more information but does not fully conform to the spec. Essentially, it also contains orthologGroup nodes that have only one child node.
- Returns
OrthoXML
- Return type
str
- iham(hog_id)[source]
Create an iHam page and print path to temporary file.
- Parameters
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
- info(hog_id)[source]
Retrieve the detail available for a given HOG.
- Parameters
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
- Returns
HOG information
- Return type
- members(hog_id, level=None, as_dataframe=None)[source]
Retrieve list of protein entries in a given HOG.
- Parameters
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
level (str) – level of interest
as_dataframe (bool) – whether to return as pandas data frame, optional
- Returns
list of members
- Return type
list or pd.DataFrame
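To show how the level parameter of members() might be used, the sketch below counts the members reported for one HOG at several taxonomic levels. `members_per_level` is a hypothetical helper, the level names and stub data are made up for illustration, and the default list return type is assumed to support len().

```python
def members_per_level(client, hog_id, levels):
    """Map each taxonomic level to the number of HOG members reported there.

    Relies only on the documented members(hog_id, level=...) call.
    Hypothetical helper, not part of PyOMADB.
    """
    return {
        level: len(client.hogs.members(hog_id, level=level))
        for level in levels
    }


class _StubHOGs:
    """Offline stand-in for client.hogs with made-up member lists."""

    _data = {'LevelA': ['P1', 'P2', 'P3'], 'LevelB': ['P1']}

    def members(self, hog_id, level=None, as_dataframe=None):
        return list(self._data[level])


class _StubClient:
    hogs = _StubHOGs()


sizes = members_per_level(_StubClient(), 'WHEAT00001', ['LevelA', 'LevelB'])
```

Passing as_dataframe=True to members() instead would return a pandas data frame, as documented above.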
- xrefs(hog_id, level=None, type=None, as_dataframe=None)[source]
Retrieve external references for all members of a particular HOG.
- Parameters
hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins
level (str) – level of interest
as_dataframe (bool) – whether to return as pandas data frame, optional
- Returns
external references
- Return type
dict or pd.DataFrame
OMAGroups
- class omadb.OMARestAPI.OMAGroups(client)[source]
API functionality for retrieving information on OMA groups.
Access indirectly, via the client.
Example:
from omadb import Client
c = Client()
og = c.groups['WHEAT00001']
- __getitem__(group_id)[source]
Retrieve information available for a given OMA group.
- Parameters
group_id (int or str) – unique identifier of a group - either group number, fingerprint or entry ID of a member.
- Returns
group information
- Return type
- close_groups(group_id, as_dataframe=None)[source]
Retrieve the sorted list of closely related groups for a given OMA group.
- Parameters
group_id (int or str) – unique identifier of a group - either group number, fingerprint or entry ID of a member.
as_dataframe (bool) – whether to return as pandas data frame, optional
- Returns
sorted list of closely related groups
- Return type
list or pd.DataFrame
Function
- class omadb.OMARestAPI.Function(client)[source]
API functionality for retrieving functional annotations for sequences.
Access indirectly, via the client.
Example:
from omadb import Client
c = Client()
gos = c.function('ATCATATCAT')
- __call__(seq, as_dataframe=None)[source]
Annotate a sequence with GO terms based on annotations stored in the OMA database.
- Parameters
seq (str) – query sequence
as_dataframe (bool) – whether to return as pandas data frame, optional
- Returns
results of fast function prediction
- Return type
list or pd.DataFrame
Taxonomy
- class omadb.OMARestAPI.Taxonomy(client)[source]
- dendropy_tree(members=None, root=None, with_names=None)[source]
Retrieve taxonomy and load as dendropy tree.
- Parameters
members (list) – list of members to get the induced taxonomy for, optional
root (str or int or None) – taxon ID, species name or UniProt species code for root taxonomic level, optional
with_names (bool) – whether to use species code (False, default) or species names (True), optional
- Returns
taxonomy loaded as dendropy tree object
- Return type
dendropy.Tree
- ete_tree(members=None, root=None, with_names=None)[source]
Retrieve taxonomy and load as ete3 tree.
- Parameters
members (list) – list of members to get the induced taxonomy for, optional
root (str or int or None) – taxon ID, species name or UniProt species code for root taxonomic level, optional
with_names (bool) – whether to use species code (False, default) or species names (True), optional
- Returns
taxonomy loaded as ete tree object
- Return type
ete.Tree
- get(members=None, format=None, collapse=None)[source]
Retrieve taxonomy in a particular format and return as string.
- Parameters
members (list) – list of members to get the induced taxonomy for, optional
format (str) – format of the taxonomy (dictionary [default], newick or phyloxml)
collapse (bool) – whether or not to collapse levels with single child, optional (default yes)
- Returns
taxonomy
- Return type
str
- read(root, format=None, collapse=None)[source]
Retrieve taxonomy in a particular format and return as string.
- Parameters
root (str or int) – taxon ID, species name or UniProt species code for root taxonomic level, optional
format (str) – format of the taxonomy (dictionary [default], newick or phyloxml)
collapse (bool) – whether or not to collapse levels with single child, optional (default yes)
- Returns
taxonomy
- Return type
str
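A small sketch of get() with an explicit format follows: it requests the taxonomy induced by a set of members as a Newick string. `induced_taxonomy_newick` is a hypothetical helper, and the stub's Newick output is invented so the snippet runs without network access.

```python
def induced_taxonomy_newick(client, members):
    """Fetch the taxonomy induced by `members` as a Newick string.

    Uses the documented get(members=..., format=...) call with format='newick'.
    Hypothetical helper, not part of PyOMADB.
    """
    return client.taxonomy.get(members=list(members), format='newick')


class _StubTaxonomy:
    """Offline stand-in for client.taxonomy returning a made-up tree."""

    def get(self, members=None, format=None, collapse=None):
        assert format == 'newick'
        return '(' + ','.join(members) + ');'


class _StubClient:
    taxonomy = _StubTaxonomy()


newick = induced_taxonomy_newick(_StubClient(), ('ARATH', 'WHEAT'))
```

When a tree object is wanted rather than a string, dendropy_tree() or ete_tree() above may be the more direct route.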
PairwiseRelations
- class omadb.OMARestAPI.PairwiseRelations(client)[source]
API functionality for pairwise relations.
Access indirectly, via the client.
Example:
from omadb import Client
c = Client()
arath_wheat_pairs = list(c.pairwise('ARATH', 'WHEAT', progress=True))
- __call__(genome_id1, genome_id2, chr1=None, chr2=None, rel_type=None, progress=False)[source]
List the pairwise relations between two genomes.
If genome_id1 == genome_id2, relations are close paralogues and homoeologues. If different, the relations are orthologues.
By using the parameters chr1 and chr2, it is possible to limit the relations to a certain chromosome for one or both genomes. The ID of the chromosome corresponds to the IDs in the genome information, for example:
from omadb import Client
c = Client()
r = c.genomes.genome('HUMAN')
human_inparalogues = list(c.pairwise('HUMAN', 'HUMAN', chr1=r.chromosomes[0].id, chr2=r.chromosomes[3].id, progress=True))
- Parameters
genome_id1 (int or str) – unique identifier for first genome - either NCBI taxonomic identifier or UniProt species code.
genome_id2 (int or str) – unique identifier for second genome - either NCBI taxonomic identifier or UniProt species code.
chr1 (str or None) – ID of chromosome of interest in first genome
chr2 (str or None) – ID of chromosome of interest in second genome
rel_type (str or None) – relationship type (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’), optional
progress (bool) – whether to show a progress bar during load (default False)
- Returns
generator of pairwise relations.
- Return type
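Because the call returns a generator, wrapping it in list() loads every relation. As a sketch of an alternative, the helper below takes only the first n relations with `itertools.islice`. `first_pairs` is a hypothetical helper, and the stub client yields invented tuples purely to show that the generator is not exhausted.

```python
from itertools import islice


def first_pairs(client, genome_id1, genome_id2, n=10, rel_type=None):
    """Return at most n pairwise relations without exhausting the generator.

    Hypothetical helper, not part of PyOMADB.
    """
    relations = client.pairwise(genome_id1, genome_id2, rel_type=rel_type)
    return list(islice(relations, n))


class _StubClient:
    """Offline stand-in whose pairwise() lazily yields made-up relations."""

    def __init__(self):
        self.yielded = 0  # how many items have actually been produced

    def pairwise(self, genome_id1, genome_id2, chr1=None, chr2=None,
                 rel_type=None, progress=False):
        for i in range(1000):
            self.yielded += 1
            yield (genome_id1, genome_id2, i)


stub = _StubClient()
sample = first_pairs(stub, 'ARATH', 'WHEAT', n=3, rel_type='1:1')
```

This can be handy for inspecting the shape of the results before committing to a full genome-pair download.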
ExternalReferences
- class omadb.OMARestAPI.ExternalReferences(client)[source]
API functionality for external references from a query sequence.
Access indirectly, via the client.
Example:
from omadb import Client
c = Client()
xrefs = c.xrefs('AAA')
Synteny
- class omadb.OMARestAPI.Synteny(client)[source]
API functionality for loading synteny data.
Access indirectly, via the client.
Example:
from omadb import Client
c = Client()
xrefs = c.synteny.xrefs('AAA')
- at_level(level, evidence='linearized', break_circular_contigs=True)[source]
Retrieve the synteny graph at a particular level.
- Parameters
level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.
evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)
break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)
- Returns
networkx graph of synteny
- Return type
- neighborhood(id, level, evidence='linearized', context=2, break_circular_contigs=True)[source]
Retrieve neighborhood around a HOG or protein at a particular level.
- Parameters
id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)
level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.
evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)
context (int) – size of the graph around the query HOG, in terms of number of edges. (default 2)
break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)
- Returns
networkx graph of synteny
- Return type
- neighbourhood(id, level, evidence='linearized', context=2, break_circular_contigs=True)[source]
Retrieve neighbourhood around a HOG or protein at a particular level.
- Parameters
id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)
level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.
evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)
context (int) – size of the graph around the query HOG, in terms of number of edges. (default 2)
break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)
- Returns
networkx graph of synteny
- Return type
- window(id, level, n=2)[source]
Retrieve window around a HOG or protein at a particular level.
- Parameters
id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)
level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.
n (int) – size of the window +-n around the query HOG. (default 2)
- Returns
list of HOGs
- Return type
GOEnrichmentAnalysis
ClientResponse
- class omadb.OMARestAPI.ClientResponse(response, client=None, _is_paginated=None)[source]
Bases:
omadb.OMARestAPI.AttrDict
ClientPagedResponse
- class omadb.OMARestAPI.ClientPagedResponse(client, response, progress_desc=None)[source]
License
PyOMADB is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
PyOMADB is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with PyOMADB. If not, see <http://www.gnu.org/licenses/>.