PyOMADB

PyOMADB is intended as a user-friendly Python wrapper around the OMA REST API.

Client

class omadb.Client(endpoint='omabrowser.org/api', persistent_cached=False, persistent_cache_path=None)[source]

Client for the OMA browser REST API.

Initialisation example:

from omadb import Client
c = Client()
Raises
  • ClientException – for 400, 404, 500 errors.

  • ClientTimeout – for timeout when interacting with REST endpoint.

__init__(endpoint='omabrowser.org/api', persistent_cached=False, persistent_cache_path=None)[source]
Parameters
  • endpoint (str) – OMA REST API endpoint (default omabrowser.org/api)

  • persistent_cached (bool) – whether to cache queries on disk in SQLite DB.

  • persistent_cache_path (str or None) – location for persistent cache, optional

genomes

Instance of omadb.OMARestAPI.Genomes.

entries

Instance of omadb.OMARestAPI.Entries.

proteins

Synonym of entries.

hogs

Instance of omadb.OMARestAPI.HOGs.

groups

Instance of omadb.OMARestAPI.OMAGroups.

function

Instance of omadb.OMARestAPI.Function.

taxonomy

Instance of omadb.OMARestAPI.Taxonomy.

pairwise

Instance of omadb.OMARestAPI.PairwiseRelations.

xrefs

Instance of omadb.OMARestAPI.ExternalReferences.

external_references

Synonym of xrefs.

synteny

Instance of omadb.OMARestAPI.Synteny.

clear_cache()[source]

Clear both RAM and persistent cache.

Genomes

class omadb.OMARestAPI.Genomes(client)[source]

API functionality for genome information.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
wheat = c.genomes.genome('WHEAT')
__getitem__(genome_id)[source]

Retrieve information on a genome in OMA.

Parameters

genome_id (str or int) – unique identifier for genome, NCBI taxonomic ID or UniProt species code

Returns

genome information

Return type

ClientResponse

__iter__()[source]

Iterate over all genomes.

as_dataframe()[source]

Retrieve information on all genomes in OMA, return as pandas data frame.

Returns

information on all genomes

Return type

pd.DataFrame

genome(genome_id)[source]

Retrieve information on a genome in OMA.

Parameters

genome_id (str or int) – unique identifier for genome, NCBI taxonomic ID or UniProt species code

Returns

genome information

Return type

ClientResponse

property genomes[source]

Retrieve information on all genomes in OMA.

Returns

information on all genomes

Return type

ClientResponse

Note

The genomes property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

property list

Synonym for genomes. Retrieve information on all genomes in OMA.

Returns

information on all genomes

Return type

ClientResponse

proteins(genome_id, progress=False)[source]

Retrive all proteins for a particular genome.

Parameters
  • genome_id (str or int) – unique identifier for genome, NCBI taxonomic ID or UniProt species code

  • progress (bool) – whether to show progress bar

Returns

proteins in genome

Return type

ClientPagedResponse

Entries

class omadb.OMARestAPI.Entries(client)[source]

API functionality for protein entries.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
entry = c.entries['WHEAT00001']
__getitem__(entry_id)[source]

Retrieve the information available for a protein entry.

Parameters

entry_id (str or int) – a unique identifier for a protein

Returns

entry information

Return type

ClientResponse

cross_references(entry_id, type=None)[source]

Retrieve all cross-references for a protein.

Parameters
  • entry_id (str or int) – a unique identifier for a protein

  • type (str or None) – specify type of cross-references to retain

Returns

cross references

Return type

dict or set

domains(entry_id)[source]

Retrieve the domains present in a protein.

Parameters

entry_id (str or int) – a unique identifier for a protein

Returns

domain information

Return type

ClientResponse

gene_ontology(entry_id, aspect=None, as_dataframe=None, as_goatools=None, as_goea=None, progress=False, **kwargs)[source]

Retrieve any associations to Gene Ontology terms for a protein.

Parameters
  • entry_id (str or int or list) – a unique identifier for a protein

  • aspect (str) – GO aspect - biological process (BP), cellular component (CC), molecular function (MF)

  • as_dataframe (bool) – whether to return as pandas data frame, optional

  • as_goea (bool) – whether to return a GOEnrichmentAnalysis object, optional

  • as_goatools (bool) – whether to return as GOATOOLS GOEA object, optional (deprecated)

  • progress (bool) – whether to show a progress bar during load (default False)

Returns

gene ontology associations

Return type

list or pd.DataFrame or goatools.go_enrichment.GOEnrichmentStudy

hog_derived_orthologs(entry_id)[source]

Retrieve list of all orthologs derived from the HOG for a given protein.

Parameters

entry_id (str or int) – a unique identifier for a protein

Returns

list of orthologs

Return type

ClientResponse

hog_derived_orthologues(entry_id)[source]

Retrieve list of all orthologues derived from the HOG for a given protein.

Parameters

entry_id (str or int) – a unique identifier for a protein

Returns

list of orthologues

Return type

ClientResponse

homoeologs(entry_id)[source]

Retrieve all homoeologs for a given protein.

Parameters

entry_id (str or int) – a unique identifier for a protein

Returns

list of homoeologs

Return type

ClientResponse

homoeologues(entry_id)[source]

Retrieve all homoeologues for a given protein.

Parameters

entry_id (str or int) – a unique identifier for a protein

Returns

list of homoeologues

Return type

ClientResponse

info(entry_id)[source]

Retrieve the information available for a protein entry.

Parameters

entry_id (str or int) – a unique identifier for a protein

Returns

entry information

Return type

ClientResponse

orthologs(entry_id, rel_type=None)[source]

Retrieve list of all identified orthologs of a protein.

Parameters
  • entry_id (str or int) – a unique identifier for a protein

  • rel_type (str or None) – relationship type to filter to (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’)

Returns

list of orthologs

Return type

ClientResponse

orthologues(entry_id, rel_type=None)[source]

Retrieve list of all identified orthologues of a protein.

Parameters
  • entry_id (str or int) – a unique identifier for a protein

  • rel_type (str or None) – relationship type (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’), optional

Returns

list of orthologues

Return type

ClientResponse

search(sequence, search=None, full_length=None)[source]

Search for closest sequence in OMA database.

Parameters
  • query (str) – query sequence

  • search (str or None) – search strategy (‘exact, ‘approximate’, ‘mixed’ [Default])

  • full_length (bool) – indicates if exact matches have to be full length (by default, not)

Returns

closest entries

Return type

ClientResponse

xrefs(entry_id, type=None)[source]

Retrieve all cross-references for a protein.

Parameters
  • entry_id (str or int) – a unique identifier for a protein

  • type (str or None) – specify type of cross-references to retain

Returns

cross references

Return type

dict or set

HOGs

class omadb.OMARestAPI.HOGs(client)[source]

API functionality for HOG information.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
entry = c.hogs['WHEAT00001']
__getitem__(hog_id)[source]

Retrieve the detail available for a given HOG.

Parameters

hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

Returns

HOG information

Return type

ClientResponse

analyse(hog_id)[source]

Use the PyHAM package to analyse a particular hierarchical orthologous group.

Parameters

hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

Returns

analysis object

Return type

pyham.Ham

analyze(hog_id)[source]

Use the PyHAM package to analyse a particular hierarchical orthologous group.

Parameters

hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

Returns

analysis object

Return type

pyham.Ham

at_level(level, compare=None, as_dataframe=None)[source]

Retrieve list of HOGs at a particular level

Parameters
  • level (str) – level of interest

  • compare – if set to None, or False, returns all HOGs defined at a particular level. If

set to a parental taxonomic level, all hogs will be annotated with evolutionary events that occurred between the two points in time. If set to True, will compare with the direct parent (default None) :type compare: None or bool or str :param bool as_dataframe: whether to return as pandas data frame, optional

Returns

all hogs at a particular level

Return type

ClientPagedResponse

external_references(hog_id, type=None)[source]

Retrieve external references for all members of a particular HOG.

Parameters
  • hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

  • level (str) – level of interest

Returns

external references

Return type

dict

get_orthoxml(hog_id, augmented=False)[source]

Retrieve OrthoXML (from browser) for a particular HOG.

Parameters

augmented (bool) – whether or not to use the augmented version of the orthoxml, that contains more information but is not fully according to the spec. Essentially it also contains orthologGroup nodes that have only one children node.

Returns

OrthoXML

Return type

str

iham(hog_id)[source]

Create an iHam page and print path to temporary file.

Parameters

hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

info(hog_id)[source]

Retrieve the detail available for a given HOG.

Parameters

hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

Returns

HOG information

Return type

ClientResponse

list()[source]

Retrieve information about all HOGs.

Returns

all HOGs

Return type

ClientPagedResponse

members(hog_id, level=None, as_dataframe=None)[source]

Retrieve list of protein entries in a given HOG.

Parameters
  • hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

  • level (str) – level of interest

  • as_dataframe (bool) – whether to return as pandas data frame, optional

Returns

list of members

Return type

list or pd.DataFrame

xrefs(hog_id, level=None, type=None, as_dataframe=None)[source]

Retrieve external references for all members of a particular HOG.

Parameters
  • hog_id (str or int) – unique identifier for a HOG, either HOG ID or one of its member proteins

  • level (str) – level of interest

  • as_dataframe (bool) – whether to return as pandas data frame, optional

Returns

external references

Return type

dict or pd.DataFrame

OMAGroups

class omadb.OMARestAPI.OMAGroups(client)[source]

API functionality for retrieving information on OMA groups.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
og = c.groups['WHEAT00001']
__getitem__(group_id)[source]

Retrieve information available for a given OMA group.

Parameters

group_id (int or str) – unique identifier of a group - either group number, fingerprint or entry ID of a member.

Returns

group information

Return type

ClientResponse

__iter__()[source]

Iterate over all OMA groups in the current release.

Yields

groups

close_groups(group_id, as_dataframe=None)[source]

Retrieve the sorted list of closely related groups for a given OMA group.

Parameters
  • group_id (int or str) – unique identifier of a group - either group number, fingerprint or entry ID of a member.

  • as_dataframe (bool) – whether to return as pandas data frame, optional

Returns

sorted list of closely related groups

Return type

list or pd.DataFrame

info(group_id)[source]

Retrieve information available for a given OMA group.

Parameters

group_id (int or str) – unique identifier of a group - either group number, fingerprint or entry ID of a member.

Returns

group information

Return type

ClientResponse

Function

class omadb.OMARestAPI.Function(client)[source]

API functionality for retrieving functional annotations for sequences.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
gos = c.function('ATCATATCAT')
__call__(seq, as_dataframe=None)[source]

Annotate a sequence with GO terms based on annotations stored in the OMA database.

Parameters
  • query (str) – query sequence

  • as_dataframe (bool) – whether to return as pandas data frame, optional

Returns

results of fast function prediction

Return type

list or pd.DataFrame

Taxonomy

class omadb.OMARestAPI.Taxonomy(client)[source]
dendropy_tree(members=None, root=None, with_names=None)[source]

Retrieve taxonomy and load as dendropy tree.

Parameters
  • members (list) – list of members to get the induced taxonomy for, optional

  • root (str or int or None) – taxon ID, species name or UniProt species code for root taxonomic level, optional

  • with_names (bool) – whether to use species code (False, default) or species names (True), optional

Returns

taxonomy loaded as dendropy tree object

Return type

dendropy.Tree

ete_tree(members=None, root=None, with_names=None)[source]

Retrieve taxonomy and load as ete3 tree.

Parameters
  • members (list) – list of members to get the induced taxonomy for, optional

  • root (str or int or None) – taxon ID, species name or UniProt species code for root taxonomic level, optional

  • with_names (bool) – whether to use species code (False, default) or species names (True), optional

Returns

taxonomy loaded as ete tree object

Return type

ete.Tree

get(members=None, format=None, collapse=None)[source]

Retrieve taxonomy in a particular format and return as string.

Parameters
  • members (list) – list of members to get the induced taxonomy for, optional

  • format (str) – format of the taxonomy (dictionary [default], newick or phyloxml)

  • collapse (bool) – whether or not to collapse levels with single child, optional (default yes)

Returns

taxonomy

Return type

str

read(root, format=None, collapse=None)[source]

Retrieve taxonomy in a particular format and return as string.

Parameters
  • root (str or int) – taxon ID, species name or UniProt species code for root taxonomic level, optional

  • format (str) – format of the taxonomy (dictionary [default], newick or phyloxml)

  • collapse (bool) – whether or not to collapse levels with single child, optional (default yes)

Returns

taxonomy

Return type

str

PairwiseRelations

class omadb.OMARestAPI.PairwiseRelations(client)[source]

API functionality for pairwise relations..

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
arath_wheat_pairs = list(c.pairwise('ARATH', 'WHEAT', progress=True))
__call__(genome_id1, genome_id2, chr1=None, chr2=None, rel_type=None, progress=False)[source]

List the pairwise relations among two genomes.

If genome_id1 == genome_id2, relations are close paralogues and homoeologues. If different, the relations are orthologues.

By using the paramaters chr1 and chr2, it is possible to limit the relations to a certain chromosome for one or both genomes. The ID of the chromosome corresponds to the IDs in, for example:

from omadb import Client
c = Client()
r = c.genomes.genome('HUMAN')
human_inparalogues = list(c.pairwise('HUMAN',
                                     'HUMAN',
                                     chr1=r.chromosomes[0].id,
                                     chr2=r.chromosomes[3].id,
                                     progress=True))
Parameters
  • genome_id1 (int or str) – unique identifier for first genome - either NCBI taxonomic identifier or UniProt species code.

  • genome_id2 (int or str) – unique identifier for second genome - either NCBI taxonomic identifier or UniProt species code.

  • chr1 (str or None) – ID of chromosome of interest in first genome

  • chr2 – ID of chromosome of interest in second genome :type chr2: str or None

  • rel_type – relationship type (‘1:1’, ‘1:many’, ‘many:1’, or ‘many:many’), optional

  • rel_type – str or None

  • progress (bool) – whether to show a progress bar during load (default False)

Returns

generator of pairwise relations.

Return type

ClientPagedResponse

ExternalReferences

class omadb.OMARestAPI.ExternalReferences(client)[source]

API functionality for external references from a query sequence.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
xrefs = c.xrefs('AAA')
__call__(sequence)[source]

Retrieve external references for a particular sequence.

Parameters
  • sequence (str) – query sequence or pattern

  • as_dataframe (bool) – whether to return as pandas data frame, optional

Returns

list of cross references and match information

Return type

ClientResponse

Synteny

class omadb.OMARestAPI.Synteny(client)[source]

API functionality for loading synteny data.

Access indirectly, via the client.

Example:

from omadb import Client
c = Client()
xrefs = c.synteny.xrefs('AAA')
at_level(level, evidence='linearized', break_circular_contigs=True)[source]

Retrieve list of HOGs at a particular level

Parameters
  • level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.

  • evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)

  • break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)

Returns

networkx graph of synteny

Return type

ClientResponse

neighborhood(id, level, evidence='linearized', context=2, break_circular_contigs=True)[source]

Retrieve neighborhood around a HOG or protein at a particular level.

Parameters
  • id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)

  • level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.

  • evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)

  • context (int) – size of the graph around the query HOG, in terms of number of edges. (default 2)

  • break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)

Returns

networkx graph of synteny

Return type

ClientResponse

neighbourhood(id, level, evidence='linearized', context=2, break_circular_contigs=True)[source]

Retrieve neighbourhood around a HOG or protein at a particular level.

Parameters
  • id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)

  • level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.

  • evidence (str) – evidence value for the ancestral synteny graph, used for filtering. (‘linearized’, ‘parsimonious’, ‘any’). (default ‘linearized’)

  • context (int) – size of the graph around the query HOG, in terms of number of edges. (default 2)

  • break_circular_contigs (bool) – whether to break ancestral contigs on the weakest edge, when ancestral contigs end up being circular. This has no effect if evidence is not set to linearized. (default True)

Returns

networkx graph of synteny

Return type

ClientResponse

window(id, level, n=2)[source]

Retrieve window around a HOG or protein at a particular level.

Parameters
  • id (str) – unique identifier for either a HOG (for ancestral synteny), or a protein (for extant synteny)

  • level (str or int) – taxonomic level of interest. Ancestral levels accept scientific name or numeric TaxID. Extant genomes also accept UniProt 5-letter species codes.

  • n (int) – size of the window +-n around the query HOG. (default 2)

Returns

list of HOGs

Return type

ClientResponse

GOEnrichmentAnalysis

class omadb.OMARestAPI.GOEnrichmentAnalysis(annots, go, methods)[source]

Wrapper to GOATOOLs to make it more user-friendly with the API.

run_study(foreground)[source]

Runs the GOEA study and returns a dataframe of results.

Parameters

foreground (list) – unique identifiers for proteins in foreground set

Returns

enrichment analysis results

Return type

pd.DataFrame

ClientResponse

class omadb.OMARestAPI.ClientResponse(response, client=None, _is_paginated=None)[source]

Bases: omadb.OMARestAPI.AttrDict

AttrDict

class omadb.OMARestAPI.AttrDict(d)[source]

ClientPagedResponse

class omadb.OMARestAPI.ClientPagedResponse(client, response, progress_desc=None)[source]
__iter__()[source]

Iterates over all entries, silently lazily loading the next page when required.

as_dataframe()[source]

Retrieves all entries, from all pages. Returns as pandas data frame.

Note: in general, it would be better to use iter_dataframes to deal with the returned entries in chunks (if possible).

Returns

data frame containing all entries in response

Return type

pd.DataFrame

iter_dataframes()[source]

Yields dataframes for each page of response. That is, entries are yielded in chunks.

License

PyOMADB is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

PyOMADB is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with PyOMADB. If not, see <http://www.gnu.org/licenses/>.