PyOMADB

PyOMADB is intended as a user-friendly Python wrapper around the OMA REST API.

Client

class omadb.Client(endpoint='omabrowser.org/api', persistent_cached=False, persistent_cache_path=None)[source]

Client for the OMA browser REST API.

Initialisation example:

from omadb import Client
c = Client()
Raises:
  • ClientException – for 400, 404, 500 errors.

  • ClientTimeout – for timeout when interacting with REST endpoint.

__init__(endpoint='omabrowser.org/api', persistent_cached=False, persistent_cache_path=None)[source]
Parameters:
  • endpoint (str) – OMA REST API endpoint (default omabrowser.org/api)

  • persistent_cached (bool) – whether to cache queries on disk in SQLite DB.

  • persistent_cache_path (str or None) – location for persistent cache, optional

genomes

Instance of omadb.OMARestAPI.Genomes.

entries

Instance of omadb.OMARestAPI.Entries.

proteins

Synonym of entries.

hogs

Instance of omadb.OMARestAPI.HOGs.

groups

Instance of omadb.OMARestAPI.OMAGroups.

function

Instance of omadb.OMARestAPI.Function.

taxonomy

Instance of omadb.OMARestAPI.Taxonomy.

pairwise

Instance of omadb.OMARestAPI.PairwiseRelations.

xrefs

Instance of omadb.OMARestAPI.ExternalReferences.

external_references

Synonym of xrefs.

synteny

Instance of omadb.OMARestAPI.Synteny.

clear_cache()[source]

Clear both RAM and persistent cache.

Genomes

class omadb.OMARestAPI.Genomes(client)[source]

API functionality for genome information.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
wheat = c.genomes.genome('WHEAT')
__getitem__(genome_id)[source]

Retrieve information on a genome in OMA.

Parameters:

genome_id (str or int) – unique identifier for genome, NCBI taxonomic ID or UniProt species code

Returns:

genome information

Return type:

ClientResponse

__iter__()[source]

Iterate over all genomes.

as_dataframe()[source]

Retrieve information on all genomes in OMA, return as pandas data frame.

Returns:

information on all genomes

Return type:

pd.DataFrame

genome(genome_id)[source]

Retrieve information on a genome in OMA.

Parameters:

genome_id (str or int) – unique identifier for genome, NCBI taxonomic ID or UniProt species code

Returns:

genome information

Return type:

ClientResponse

property genomes[source]

Retrieve information on all genomes in OMA.

Returns:

information on all genomes

Return type:

ClientResponse

Note

The genomes property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

property list

Synonym for genomes. Retrieve information on all genomes in OMA.

Returns:

information on all genomes

Return type:

ClientResponse

proteins(genome_id, progress=False)[source]

Retrive all proteins for a particular genome.

Parameters:
  • genome_id (str or int) – unique identifier for genome, NCBI taxonomic ID or UniProt species code

  • progress (bool) – whether to show progress bar

Returns:

proteins in genome

Return type:

ClientPagedResponse

Entries

class omadb.OMARestAPI.Entries(client)[source]

API functionality for protein entries.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
entry = c.entries['WHEAT00001']
__getitem__(entry_id)[source]

Retrieve the information available for a protein entry.

Parameters:

entry_id (str or int) – a unique identifier for a protein

Returns:

entry information

Return type:

ClientResponse

cross_references(entry_id, type=None)[source]

Retrieve all cross-references for a protein.

Parameters:
  • entry_id (str or int) – a unique identifier for a protein

  • type (str or None) – specify type of cross-references to retain

Returns:

cross references

Return type:

dict or set

domains(entry_id)[source]

Retrieve the domains present in a protein.

Parameters:

entry_id (str or int) – a unique identifier for a protein

Returns:

domain information

Return type:

ClientResponse

gene_ontology(entry_id, aspect=None, as_dataframe=None, as_goatools=None, as_goea=None, progress=False, **kwargs)[source]

Retrieve any associations to Gene Ontology terms for a protein.

Parameters:
  • entry_id (str or int or list) – a unique identifier for a protein

  • aspect (str) – GO aspect - biological process (BP), cellular component (CC), molecular function (MF)

  • as_dataframe (bool) – whether to return as pandas data frame, optional

  • as_goea (bool) – whether to return a GOEnrichmentAnalysis object, optional

  • as_goatools (bool) – whether to return as GOATOOLS GOEA object, optional (deprecated)

  • progress (bool) – whether to show a progress bar during load (default False)

Returns:

gene ontology associations

Return type:

list or pd.DataFrame or goatools.go_enrichment.GOEnrichmentStudy

hog_derived_orthologs(entry_id)[source]

Retrieve list of all orthologs derived from the HOG for a given protein.

Parameters:

entry_id (str or int) – a unique identifier for a protein

Returns:

list of orthologs

Return type:

ClientResponse

hog_derived_orthologues(entry_id)[source]

Retrieve list of all orthologues derived from the HOG for a given protein.

Parameters:

entry_id (str or int) – a unique identifier for a protein

Returns:

list of orthologues

Return type:

ClientResponse

homoeologs(entry_id)[source]

Retrieve all homoeologs for a given protein.

Parameters:

entry_id (str or int) – a unique identifier for a protein

Returns:

list of homoeologs

Return type:

ClientResponse

homoeologues(entry_id)[source]

Retrieve all homoeologues for a given protein.

Parameters:

entry_id (str or int) – a unique identifier for a protein

Returns:

list of homoeologues

Return type:

ClientResponse

info(entry_id)[source]

Retrieve the information available for a protein entry.

Parameters:

entry_id (str or int) – a unique identifier for a protein

Returns:

entry information

Return type:

ClientResponse

orthologs(entry_id, rel_type=None)[source]

Retrieve list of all identified orthologs of a protein.

Parameters:
  • entry_id (str or int) – a unique identifier for a protein

  • rel_type (str or None) – relationship type to filter to (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’)

Returns:

list of orthologs

Return type:

ClientResponse

orthologues(entry_id, rel_type=None)[source]

Retrieve list of all identified orthologues of a protein.

Parameters:
  • entry_id (str or int) – a unique identifier for a protein

  • rel_type (str or None) – relationship type (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’), optional

Returns:

list of orthologues

Return type:

ClientResponse

search(sequence, search=None, full_length=None)[source]

Search for closest sequence in OMA database.

Parameters:
  • query (str) – query sequence

  • search (str or None) – search strategy (‘exact, ‘approximate’, ‘mixed’ [Default])

  • full_length (bool) – indicates if exact matches have to be full length (by default, not)

Returns:

closest entries

Return type:

ClientResponse

xrefs(entry_id, type=None)[source]

Retrieve all cross-references for a protein.

Parameters:
  • entry_id (str or int) – a unique identifier for a protein

  • type (str or None) – specify type of cross-references to retain

Returns:

cross references

Return type:

dict or set

HOGs

class omadb.OMARestAPI.HOGs(client)[source]

API functionality for HOG information.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
entry = c.hogs['WHEAT00001']
__getitem__(hog_id)[source]

Retrieve the detail available for a given HOG.

Parameters:

hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

Returns:

HOG information

Return type:

ClientResponse

analyse(hog_id)[source]

Use the PyHAM package to analyse a particular hierarchical orthologous group.

Parameters:

hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

Returns:

analysis object

Return type:

pyham.Ham

analyze(hog_id)[source]

Use the PyHAM package to analyse a particular hierarchical orthologous group.

Parameters:

hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

Returns:

analysis object

Return type:

pyham.Ham

at_level(level, compare=None, as_dataframe=None)[source]

Retrieve list of HOGs at a particular level

Parameters:
  • level (str) – level of interest

  • compare – if set to None, or False, returns all HOGs defined at a particular level. If

set to a parental taxonomic level, all hogs will be annotated with evolutionary events that occurred between the two points in time. If set to True, will compare with the direct parent (default None) :type compare: None or bool or str :param bool as_dataframe: whether to return as pandas data frame, optional

Returns:

all hogs at a particular level

Return type:

ClientPagedResponse

external_references(hog_id, type=None)[source]

Retrieve external references for all members of a particular HOG.

Parameters:
  • hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

  • level (str) – level of interest

Returns:

external references

Return type:

dict

gene_ontology(hog_id, level=None, aspect=None, as_dataframe=None, progress=False, **kwargs)[source]

Retrieve any ancestral associations to Gene Ontology terms for a given HOG identifier, at a particular taxonomic level.

Parameters:
  • hog_id (str or list) – a unique identifier for a Hierarchical Orthologous Group - its HOGID starting with “HOG:”. (Example: HOG:0001221.1a)

  • level (str or list) – taxonomic level of reference for a HOG - default is its deepest level for a given HOG ID. (Example: Mammalia)

  • aspect (str) – GO aspect - biological process (BP), cellular component (CC), molecular function (MF)

  • as_dataframe (bool) – whether to return as pandas data frame, optional

  • progress (bool) – whether to show a progress bar during load (default False)

Returns:

gene ontology associations

Return type:

list or pd.DataFrame or goatools.go_enrichment.GOEnrichmentStudy

get_orthoxml(hog_id, augmented=False)[source]

Retrieve OrthoXML (from browser) for a particular HOG.

Parameters:

augmented (bool) – whether or not to use the augmented version of the orthoxml, that contains more information but is not fully according to the spec. Essentially it also contains orthologGroup nodes that have only one children node.

Returns:

OrthoXML

Return type:

str

iham(hog_id)[source]

Create an iHam page and print path to temporary file.

Parameters:

hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

info(hog_id)[source]

Retrieve the detail available for a given HOG.

Parameters:

hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

Returns:

HOG information

Return type:

ClientResponse

list()[source]

Retrieve information about all HOGs.

Returns:

all HOGs

Return type:

ClientPagedResponse

members(hog_id, level=None, as_dataframe=None)[source]

Retrieve list of protein entries in a given HOG.

Parameters:
  • hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

  • level (str) – level of interest

  • as_dataframe (bool) – whether to return as pandas data frame, optional

Returns:

list of members

Return type:

list or pd.DataFrame

xrefs(hog_id, level=None, type=None, as_dataframe=None)[source]

Retrieve external references for all members of a particular HOG.

Parameters:
  • hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

  • level (str) – level of interest

  • as_dataframe (bool) – whether to return as pandas data frame, optional

Returns:

external references

Return type:

dict or pd.DataFrame

OMAGroups

class omadb.OMARestAPI.OMAGroups(client)[source]

API functionality for retrieving information on OMA groups.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
og = c.groups['WHEAT00001']
__getitem__(group_id)[source]

Retrieve information available for a given OMA group.

Parameters:

group_id (int or str) – unique identifier of a group - either group number, fingerprint or entry ID of a member.

Returns:

group information

Return type:

ClientResponse

__iter__()[source]

Iterate over all OMA groups in the current release.

Yields:

groups

close_groups(group_id, as_dataframe=None)[source]

Retrieve the sorted list of closely related groups for a given OMA group.

Parameters:
  • group_id (int or str) – unique identifier of a group - either group number, fingerprint or entry ID of a member.

  • as_dataframe (bool) – whether to return as pandas data frame, optional

Returns:

sorted list of closely related groups

Return type:

list or pd.DataFrame

info(group_id)[source]

Retrieve information available for a given OMA group.

Parameters:

group_id (int or str) – unique identifier of a group - either group number, fingerprint or entry ID of a member.

Returns:

group information

Return type:

ClientResponse

Function

class omadb.OMARestAPI.Function(client)[source]

API functionality for retrieving functional annotations for sequences.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
gos = c.function('ATCATATCAT')
__call__(seq, as_dataframe=None)[source]

Annotate a sequence with GO terms based on annotations stored in the OMA database.

Parameters:
  • query (str) – query sequence

  • as_dataframe (bool) – whether to return as pandas data frame, optional

Returns:

results of fast function prediction

Return type:

list or pd.DataFrame

Taxonomy

class omadb.OMARestAPI.Taxonomy(client)[source]
dendropy_tree(members=None, root=None, with_names=None)[source]

Retrieve taxonomy and load as dendropy tree.

Parameters:
  • members (list) – list of members to get the induced taxonomy for, optional

  • root (str or int or None) – taxon ID, species name or UniProt species code for root taxonomic level, optional

  • with_names (bool) – whether to use species code (False, default) or species names (True), optional

Returns:

taxonomy loaded as dendropy tree object

Return type:

dendropy.Tree

ete_tree(members=None, root=None, with_names=None)[source]

Retrieve taxonomy and load as ete3 tree.

Parameters:
  • members (list) – list of members to get the induced taxonomy for, optional

  • root (str or int or None) – taxon ID, species name or UniProt species code for root taxonomic level, optional

  • with_names (bool) – whether to use species code (False, default) or species names (True), optional

Returns:

taxonomy loaded as ete tree object

Return type:

ete.Tree

get(members=None, format=None, collapse=None)[source]

Retrieve taxonomy in a particular format and return as string.

Parameters:
  • members (list) – list of members to get the induced taxonomy for, optional

  • format (str) – format of the taxonomy (dictionary [default], newick or phyloxml)

  • collapse (bool) – whether or not to collapse levels with single child, optional (default yes)

Returns:

taxonomy

Return type:

str

read(root, format=None, collapse=None)[source]

Retrieve taxonomy in a particular format and return as string.

Parameters:
  • root (str or int) – taxon ID, species name or UniProt species code for root taxonomic level, optional

  • format (str) – format of the taxonomy (dictionary [default], newick or phyloxml)

  • collapse (bool) – whether or not to collapse levels with single child, optional (default yes)

Returns:

taxonomy

Return type:

str

PairwiseRelations

class omadb.OMARestAPI.PairwiseRelations(client)[source]

API functionality for pairwise relations..

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
arath_wheat_pairs = list(c.pairwise('ARATH', 'WHEAT', progress=True))
__call__(genome_id1, genome_id2, chr1=None, chr2=None, rel_type=None, progress=False)[source]

List the pairwise relations among two genomes.

If genome_id1 == genome_id2, relations are close paralogues and homoeologues. If different, the relations are orthologues.

By using the paramaters chr1 and chr2, it is possible to limit the relations to a certain chromosome for one or both genomes. The ID of the chromosome corresponds to the IDs in, for example:

from omadb import Client
c = Client()
r = c.genomes.genome('HUMAN')
human_inparalogues = list(c.pairwise('HUMAN',
                                     'HUMAN',
                                     chr1=r.chromosomes[0].id,
                                     chr2=r.chromosomes[3].id,
                                     progress=True))
Parameters:
  • genome_id1 (int or str) – unique identifier for first genome - either NCBI taxonomic identifier or UniProt species code.

  • genome_id2 (int or str) – unique identifier for second genome - either NCBI taxonomic identifier or UniProt species code.

  • chr1 (str or None) – ID of chromosome of interest in first genome

  • chr2 – ID of chromosome of interest in second genome :type chr2: str or None

  • rel_type – relationship type (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’), optional

  • rel_type – str or None

  • progress (bool) – whether to show a progress bar during load (default False)

Returns:

generator of pairwise relations.

Return type:

ClientPagedResponse

ExternalReferences

class omadb.OMARestAPI.ExternalReferences(client)[source]

API functionality for external references from a query sequence.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
xrefs = c.xrefs('AAA')
__call__(sequence)[source]

Retrieve external references for a particular sequence.

Parameters:
  • sequence (str) – query sequence or pattern

  • as_dataframe (bool) – whether to return as pandas data frame, optional

Returns:

list of cross references and match information

Return type:

ClientResponse

Synteny

class omadb.OMARestAPI.Synteny(client)[source]

API functionality for loading synteny data.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
xrefs = c.synteny.xrefs('AAA')
at_level(level, evidence='linearized', break_circular_contigs=True)[source]

Retrieve list of HOGs at a particular level

Parameters:
  • level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.

  • evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)

  • break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)

Returns:

networkx graph of synteny

Return type:

ClientResponse

neighborhood(id, level, evidence='linearized', context=2, break_circular_contigs=True)[source]

Retrieve neighborhood around a HOG or protein at a particular level.

Parameters:
  • id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)

  • level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.

  • evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)

  • context (int) – size of the graph around the query HOG, in terms of number of edges. (default 2)

  • break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)

Returns:

networkx graph of synteny

Return type:

ClientResponse

neighbourhood(id, level, evidence='linearized', context=2, break_circular_contigs=True)[source]

Retrieve neighbourhood around a HOG or protein at a particular level.

Parameters:
  • id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)

  • level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.

  • evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)

  • context (int) – size of the graph around the query HOG, in terms of number of edges. (default 2)

  • break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)

Returns:

networkx graph of synteny

Return type:

ClientResponse

window(id, level, n=2)[source]

Retrieve window around a HOG or protein at a particular level.

Parameters:
  • id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)

  • level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.

  • n (int) – size of the window +-n around the query HOG. (default 2)

Returns:

list of HOGs

Return type:

ClientResponse

GOEnrichmentAnalysis

class omadb.OMARestAPI.GOEnrichmentAnalysis(annots, go, methods)[source]

Wrapper to GOATOOLs to make it more user-friendly with the API.

run_study(foreground)[source]

Runs the GOEA study and returns a dataframe of results.

Parameters:

foreground (list) – unique identifiers for proteins in foreground set

Returns:

enrichment analysis results

Return type:

pd.DataFrame

ClientResponse

class omadb.OMARestAPI.ClientResponse(response, client=None, _is_paginated=None)[source]

Bases: AttrDict

AttrDict

class omadb.OMARestAPI.AttrDict(d)[source]

ClientPagedResponse

class omadb.OMARestAPI.ClientPagedResponse(client, response, progress_desc=None)[source]
__iter__()[source]

Iterates over all entries, silently lazily loading the next page when required.

as_dataframe()[source]

Retrieves all entries, from all pages. Returns as pandas data frame.

Note: in general, it would be better to use iter_dataframes to deal with the returned entries in chunks (if possible).

Returns:

data frame containing all entries in response

Return type:

pd.DataFrame

iter_dataframes()[source]

Yields dataframes for each page of response. That is, entries are yielded in chunks.

License

PyOMADB is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

PyOMADB is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with PyOMADB. If not, see <http://www.gnu.org/licenses/>.