The command-line version of Ontologizer can be used for batch processing or pipelines. Most general users will prefer the Java WebStart version, though.
Installation
If you want to use the command-line version, you need just need to download the Ontologizer.jar file.
Installation under Debian
Users that run Debian or a Debian-based distribution are recommended to install the command-line version of Ontologizer from Ontologizer’s Debian repository hosted at Bintray. To do so, first import Bintray’s public key:
$ su
$ apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 379CE192D401AB61
Then add the Ontologizer repository to your sources.list
file and update the package database like this:
$ echo deb [arch=all] https://dl.bintray.com/ontologizer/deb unstable main >>/etc/apt/sources.list.d/ontologizer.list
$ apt-get update
You are now ready to install Ontologizer:
$ apt-get install ontologizer-cli
$ exit
You may need to install the Debian package apt-transport-https
prior these steps in order to allow downloading packages
via the https
protocol.
Help
Ontologizer is a Java-Application and if you downloaded the Ontologizer.jar jar file, you need to start it via the java
command.
But if you installed it via the Debian repository, you can invoke it more conveniently via the ontologizer
wrapper script.
All possible command arguments along with a short description can be viewed via the --help
argument. E.g.
$ java -jar Ontologizer.jar --help
or
$ ontologizer --help
will produce following output:
usage: java -jar Ontologizer.jar [-a <file>] [-c <arg>] [-d <[thrsh[,id]|id]>]
[-f <arg>] [-g <file>] [-h] [-i] [-m <arg>] [-n] [-o <arg>] [-p <file>]
[-r <arg>] [-s <path>] [-t <arg>] [-v]
Analyze High-Throughput Biological Data Using Gene Ontology
-a,--association <file> File containing associations from genes to GO
terms. Required
-c,--calculation <arg> Specifies the calculation method to use. Possible
values are: "MGSA", "Parent-Child-Intersection",
"Parent-Child-Union" (default), "Term-For-Term",
"Topology-Elim", "Topology-Weighted"
-d,--dot <[thrsh[,id]|id]> For every study set analysis write out an
additional .dot file (GraphViz) containing the
graph that is induced by interesting nodes. The
optional argument thrsh must be in range between 0
and 1 and it specifies the threshold used to
identify interesting nodes (defaults to 0.05). The
GO term identifier id restricts the output to the
subgraph originating at id.
-f,--filter <arg> Filter the gene names by appling rules in a given
file (currently only mapping supported).
-g,--go <file> File containig GO terminology and structure (.obo
format). Required
-h,--help Shows this help
-i,--ignore Ignore genes to which no association exist within
the calculation.
-m,--mtc <arg> Specifies the MTC method to use. Possible values
are: "Benjamini-Hochberg", "Benjamini-Yekutieli",
"Bonferroni", "Bonferroni-Holm", "None" (default),
"Westfall-Young-Single-Step",
"Westfall-Young-Step-Down"
-n,--annotation Create an additional file per study set which
contains the annotations.
-o,--outdir <arg> Specfies the directory in which the results will
be placed.
-p,--population <file> File containing genes within the population.
Required
-r,--resamplingsteps <arg> Specifies the number of steps used in resampling
based MTCs
-s,--studyset <path> Path to a file of a study set or to a directory
containing study set files. Required
-t,--sizetolerance <arg> Specifies the percentage at which the actual study
set size and the size of the resampled study sets
are allowed to differ
-v,--version Shows version information and exits
In order to do something useful, Ontologizer must be started with several arguments (as indicated with “Required” within the output above).
Required Arguments
First, you are required to specify the -g
(or --go
) option. This defines the path to a file which contains the GO terminology and structure. Ontologizer is able to parse files in the OBO format. Such are available directly at the GO Website from http://geneontology.org/page/download-ontology.
Second, you are required to specify the -a
option which defines the mapping of gene names to GO terms. The GO Website provides association files for a variety of organisms, as well. See http://geneontology.org/page/download-annotations.
Third, you must specify a population file with the -p
option. This file contains all gene names (one per line) of the population set, e.g. the names of the genes of your microarray.
Last, you need to specify the path to your study set(s) with the -s
option. This can either be a single file for a single study set or a directory, in which case all files (ending with *.txt
) are considered as separate study sets. As for the population file, one line represents only a single gene name.
When started with these four parameters only, the output of Ontologizer’s calculation is written to a basic ascii table file into the same directory where the study files are located. The table’s filename is derived from the name of the study set in question but prepended with “table-” string.
Optional Arguments
Using the -d
option, you can instruct Ontologizer to create a graphical output of the results. For every study set, a file (name is prepended with “view-”) is written which can be read by the graphviz dot
tool to produce a viewable graphics file. In this file, terms are depicted by nodes and their hierarchical relations are depicted by edges. Because the GO DAG contains a huge amount of terms, the graph is constructed only for significant terms and their predecessors (up to the source) and those significant terms are highligthed. Which terms are considered as significant is influenced by their p-values and the significance threshold. This threshold is specified as a parameter to the -d argument. It must be a valid floating point value ranging from 0 to 0.5. For example use -d 0.05
to define those terms as significant whose p-value falls below 0.05.
In addition, you can specify a GO Term ID, after the floating-point value (separated by “,”). In this case only the subgraph starting at this term is written. For example use -d 0.05,8152
to get only a graph with the term id GO:0008152 (metabolism)
and its successors within the subgraph emanating from GO:0008152
such that all significantly overrepresented terms are included in the graph.
Some sample datasets and population sets can be downloaded from this page. To perform parent-child analysis using Westphal-Young MTC on the Yeast data set from the tutorial page and display the results using dot, enter the following command:
$ java -jar Ontologizer.jar -a gene_association.sgd -g gene_ontology.obo -s study/4hourSMinduced.txt -p population.txt -c Parent-Child-Union -m Westfall-Young-Single-Step -d 0.05 -r 1000
The corresponding files must be in the current directory (or their full path must be indicated). To create a PNG image with the result, enter
dot -Tpng view-4hourSMinduced-Parent-Child-Westfall-Young-Single-Step.dot -oExample.png
The corresponding graphic should look something like this:
Obtaining Graphviz
Graphviz is a Open Source project which provides tools for layouting and depicting graphs. Hereby, graph references to mathematical entities which consists of nodes and edges.
A installed Graphviz system is a requirement for Ontologizer’s facility to visualize the results of an enrichment analysis. More precisely, the tool named dot is invoked which performs the layouting step of the graph.
Windows users can obtain the installation package of Graphviz following the Download link of www.graphviz.org. On this site, you also can find the latest source packages which can be used, for instance by Linux, users to compile the package. Linux Distributions such as Debian provide binary packages for Graphviz as well. This allows a straithforward installation of Graphviz.
Note that in general, the standard installtion (of the Windows and Linux installation) is perfectly suited for Ontologizer. If however, for any reason, the dot exectutable is not in the command path, you have to tell Ontologizer explicitly where this command can be found by entering the full path within the Preferences window, which can be obtained via the Window > Preferences…. menu entry.