On this page you will find links to public bioinformatics products of the Generation Challenge Programme (GCP). GCP Subprogramme 4 (SP4) participated in these projects and contributed to them to varying degrees.
The last paragraphs of this page list information and links intended mainly for developers, but of course also for anyone else who is interested.
Generation Challenge Programme SP4 products: analytic software
Dayhoff
The GCP has developed Dayhoff, the ortholog gene catalogue (http://dayhoff.generationcp.org/). Dayhoff documents stress-responsive genes comparatively across plant species. It is a compendium of protein families, phylogenetic trees and multiple sequence alignments with the associated experimental evidence. It serves to elucidate orthologous and paralogous relationships between plant genes that may be involved in the response to environmental stress, mainly abiotic stresses such as water deficit (‘drought’). For example, microarray data on drought stress obtained from diverse crop species can be compared to identify common gene expression profiles under similar stresses. The web site includes query and visualization tools and allows searching and browsing of the underlying project database.
GreenPhylDB
GreenPhylDB is a phylogenomic database for plant comparative genomics. GreenPhylDB offers the most complete plant family list and is manually curated. Most of these families have been analyzed with the CIRAD-optimized phylogenomic pipeline. In addition, GreenPhylDB includes GOST (GreenPhyl Orthologs Search Tool) to predict phylogenomic relationships between any plant gene sequence and O. sativa / A. thaliana gene(s).
GOST
GOST (GreenPhyl Orthologs Search Tool) is a web site developed by GCP to predict phylogenomic relationships between any plant gene and Oryza sativa and/or Arabidopsis thaliana gene(s). GOST rapidly infers ortholog relationships by integrating a new sequence into a model phylogeny developed on these plant species. GOST is based on GreenPhylDB, a phylogenomic database for plant comparative genomics with the most complete manually curated list of plant families. GreenPhylDB is a comprehensive platform for comparative genomic analyses of the full Arabidopsis thaliana and Oryza sativa genomes.
DARwin
DARwin (Dissimilarity Analysis and Representation for Windows) is a software package developed by GCP. It offers diversity and phylogenetic analysis. Nucleic acid sequences, amino acid sequences, molecular markers, and qualitative or quantitative characters can be used to compute and transform dissimilarity matrices. Different algorithms are available for factorial analyses, tree construction and the assessment of tree reliability. Several trees based on the same data set can be compared. Easily readable graphics can be produced.
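To illustrate what a dissimilarity matrix computed from molecular markers looks like, here is a minimal sketch in Python. It uses the simple matching index on presence/absence scores, which is only one of many possible indices. It is not DARwin's implementation, and the accession names and marker scores are hypothetical.

```python
from itertools import combinations


def simple_matching_dissimilarity(a, b):
    """Fraction of markers, scored in both accessions, at which they differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no markers scored in both accessions")
    return sum(x != y for x, y in pairs) / len(pairs)


def dissimilarity_matrix(profiles):
    """Build a symmetric matrix (dict of dicts) over all accessions."""
    names = sorted(profiles)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = simple_matching_dissimilarity(profiles[n], profiles[m])
        matrix[n][m] = matrix[m][n] = d
    return names, matrix


# Hypothetical presence/absence marker scores; None marks a missing value.
profiles = {
    "acc1": [1, 0, 1, 1, None],
    "acc2": [1, 1, 1, 0, 0],
    "acc3": [0, 1, 0, 0, 0],
}

names, matrix = dissimilarity_matrix(profiles)
for n in names:
    print(n, [round(matrix[n][m], 2) for m in names])
```

Missing values are skipped pairwise, so each dissimilarity is based only on markers scored in both accessions. A matrix of this kind is the usual input for tree construction or factorial analysis.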
iMAS
The Integrated Marker-Assisted Selection System (iMAS) software is an integrated molecular breeding analysis platform. The GCP supported the development of iMAS. iMAS provides simple-to-follow guidelines to help users select the most appropriate experimental design and data analysis methods. It offers users a regularly updated selection of the most appropriate options currently available. iMAS performs the required reformatting of datasets for all included software.
DIVA-GIS
DIVA-GIS is a free and open source geographic information system (GIS) for making maps of species distribution data and analyzing these data. DIVA-GIS 6.0.3 was specifically developed by GCP at CIP for use with genebank data such as those available through national or international genebank documentation systems and SINGER, the System-wide Information Network for Genetic Resources, which is the germplasm information exchange network of the CGIAR and partners. The DIVA-GIS program is particularly useful for mapping and analyzing biodiversity data, such as the distribution of species. A program with a similar name and comparable functionality, DIVA-GIS version 5.4, was independently developed from the historic DIVA-GIS source code files. The two programs are not compatible.
CMTV
The Comparative Map and Trait Viewer (CMTV) is a graphical user interface developed for constructing complex visualizations of genetic and genomic data collected from a variety of sources. It was designed to provide researchers with an intuitive mechanism to compare and integrate their own local mapping data with data from public databases, as well as to enable the aggregation of data from disparate sources into a common representational framework.
RiCES: Rice Cis-Element Searcher
RiCES is a cis-element searching tool optimized for the rice (Oryza sativa) genome. It takes as input a list of Oryza sativa transcription unit IDs, defined on the pseudomolecules and corresponding to KOME full-length cDNA clones, and returns a list of cis-element sites that appear to be shared across the listed gene loci. The motif searching program MEME is applied in the first step of the analysis. The RiCES tool is available as a web service.
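The idea of reporting cis-element sites shared across a list of gene loci can be sketched as follows. This is not the RiCES pipeline: RiCES uses MEME for motif discovery, whereas this example only scans for already known motif patterns with regular expressions. The locus names and upstream sequences are hypothetical, and the two motif patterns are simplified illustrations of ABRE-like and DRE-like core elements.

```python
import re


def find_sites(sequence, motif):
    """Return 0-based start positions of a motif pattern (overlaps allowed)."""
    pattern = re.compile("(?=(" + motif + "))")
    return [m.start() for m in pattern.finditer(sequence)]


def shared_motifs(upstream, motifs):
    """Keep only the motifs that occur at least once in every locus."""
    shared = {}
    for name, motif in motifs.items():
        sites = {locus: find_sites(seq, motif) for locus, seq in upstream.items()}
        if all(sites.values()):
            shared[name] = sites
    return shared


# Simplified, illustrative motif patterns (assumptions, not RiCES output).
motifs = {
    "ABRE": "ACGTG[GT]C",
    "DRE": "[AG]CCGAC",
}

# Hypothetical upstream sequences keyed by hypothetical locus names.
upstream = {
    "locusA": "TTACGTGGCAAGCCGACTT",
    "locusB": "GGACCGACAAACGTGTCA",
    "locusC": "CCACGTGTCTTTTTT",
}

shared = shared_motifs(upstream, motifs)
for name, sites in shared.items():
    print(name, sites)
```

Here only the ABRE-like pattern is reported, because the DRE-like pattern is absent from one of the three sequences.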
RiceGeneThresher 2.0
RiceGeneThresher is a web-based application for mining genes underlying QTLs in the rice genome. RiceGeneThresher incorporates “The Generation Challenge Programme Comparative Plant Stress-responsive Gene Catalogue” (http://dayhoff.generationcp.org/), an online resource documenting stress-responsive genes comparatively across plant species (Wanchana, Thongjuea et al. 2008). RiceGeneThresher provides a user-friendly interface for finding stress-responsive genes and their locations in the genomic region of interest.
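Finding the stress-responsive genes located in a genomic region of interest, such as a QTL interval, amounts to an interval overlap query. The sketch below shows that query in Python under simple assumptions. The gene records, coordinates and field names are hypothetical and do not reflect the RiceGeneThresher data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chromosome: str
    start: int
    end: int
    stress_responsive: bool


def genes_in_region(genes, chromosome, region_start, region_end, stress_only=True):
    """Return genes overlapping the region, sorted by start position."""
    hits = [
        g for g in genes
        if g.chromosome == chromosome
        and g.start <= region_end      # gene begins before the region ends
        and g.end >= region_start      # gene ends after the region begins
        and (g.stress_responsive or not stress_only)
    ]
    return sorted(hits, key=lambda g: g.start)


# Hypothetical annotation records.
genes = [
    Gene("geneA", "chr1", 1_000, 2_500, True),
    Gene("geneB", "chr1", 4_800, 6_200, False),
    Gene("geneC", "chr1", 9_500, 12_000, True),
    Gene("geneD", "chr2", 5_000, 7_000, True),
]

# Hypothetical QTL interval on chr1.
candidates = genes_in_region(genes, "chr1", 5_000, 10_000)
print([g.gene_id for g in candidates])
```

Setting `stress_only=False` returns every gene overlapping the interval, which is useful for checking how many candidates the stress-responsive filter removes.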
HPC High Performance Computing
The GCP High Performance Computing (HPC) facility functions as a web service and consists of a global grid of four cluster systems.
All systems offer the Paracel BLAST routine with the Paracel BioView Workbench web interface (http://coe04.ucalgary.ca/bwb/).
Several additional programs and web interfaces are being developed for each site. The HPC Project Manager is Anthony Collins at CIP (a.collins@cgiar.org).
Passwords are freely available upon request: contact the HPC bioinformatics system administrators listed for the HPC sites below.
HPC at ICRISAT, Hyderabad, India
Besides the above-mentioned Paracel BLAST service (for internal users only), this HPC offers the Comparative genomics pipeline, with programs such as the SNP Pipeline, MUSCLE, ClustalW, Tree Puzzle and CISPrimer Tool. In addition, the Population genetics section offers the File Format Converter and VisualStruct.
Contact: B Jayashree, b.jayashree@cgiar.org
HPC at IRRI, Los Baños, Philippines
At this location you can find the BioView Workbench, EMBOSS (software tools for sequence analysis) and RiCES (Rice Cis-Element Searcher).
Contact: Richard Bruskiewich, r.bruskiewich@cgiar.org
HPC at CIP, Lima, Peru
At the CIP HPC you can find the BioView Workbench from Paracel and use the GCP HPC Structure WebService.
Structure is a free software package that uses multi-locus genotype data to investigate population structure. It can be applied to most genetic markers, including microsatellites, RFLPs and SNPs. GCP HPC Structure allows multiple runs of the Structure program to be executed.
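Multiple execution typically means repeating the analysis for several assumed numbers of populations (K), with replicate runs for each K. The following Python sketch only builds the list of command lines for such a batch and does not run anything. It says nothing about how the GCP HPC Structure web service is implemented. The option letters (-m, -e, -K, -i, -o) are those recalled from the Structure command-line documentation and should be checked against the installed version, and the file names are hypothetical.

```python
def structure_commands(infile, k_values, replicates, out_prefix,
                       mainparams="mainparams", extraparams="extraparams"):
    """Build one command line per (K, replicate) combination."""
    commands = []
    for k in k_values:
        for rep in range(1, replicates + 1):
            commands.append([
                "structure",
                "-m", mainparams,      # main parameter file (assumed option)
                "-e", extraparams,     # extra parameter file (assumed option)
                "-K", str(k),          # assumed number of populations
                "-i", infile,          # genotype input file
                "-o", f"{out_prefix}_K{k}_rep{rep}",  # distinct output per run
            ])
    return commands


# Hypothetical batch: K = 1..3 with two replicate runs each.
commands = structure_commands("genotypes.txt", range(1, 4), 2, "run")
for cmd in commands:
    print(" ".join(cmd))
```

Because every run writes to its own output file, the commands are independent of each other and could be distributed across cluster nodes by whatever scheduler is available.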
You can enter the CIP HPC facility as a guest with username CIP-HPCTEST and password hpctest.
Contact: Reinhard Simon, r.simon@cgiar.org
HPC at ILRI, Nairobi, Kenya
At the comprehensive and well-designed website of the HPC at ILRI you can select from a wide array of applications devoted to sequence analysis. Artemis, ACT, InterProScan, EMBOSS, wEMBOSS, Primer3, SignalP, STADEN, Similarity Searching, Paracel BLAST and FASTA can be accessed from the compatible, non-GCP CGIAR site at ILRI.
Contact: Etienne de Villiers, e.villiers@cgiar.org
Bioinformatics learning materials
Learning materials: online self-study courses and tutorials
GCP SP4 developed a large number of bioinformatics programs and products for the benefit of breeders and plant scientists. In close collaboration with the technical subprogrammes, GCP SP5 (Capacity Building and Enabling Delivery) produced several online training courses. These courses aim to broaden and maximize the use of these bioinformatics resources.
- The McClintock Crop Bioinformatics Course, a joint project between IRRI and the GCP. It is designed to demonstrate how basic bioinformatics tools, techniques and resources can help to manage sequencing projects effectively.
- Genomics and comparative genomics. The principal audience includes plant scientists engaged in genomics research. Developed jointly by Cornell University’s Institute for Genomic Diversity and the GCP.
More information can be found here at the capacity building corner of SP5.
ICARDA started the development of a user-friendly Laboratory Information Management System (LIMS) and Genomic Management System (GeMS) that specifically target the protocols, processes and materials of a molecular laboratory. At ICARDA, the LIMS and GeMS services offer researchers tools to store, organize and analyze research data. Genomic LIMS offers capture, storage and documentation of laboratory protocols, procedures, practices and output data. GeMS provides storage and management of molecular marker data in readily usable formats. Molecular data can be integrated with information on genealogy, phenotype and geography. A software package can be downloaded from this page.
Central Registry and Templates
The GCP Central Registry provides access to research data produced with the support of the Generation Challenge Programme. At present the GCP Central Registry contains 152 datasets.
Dataset upload: GCP partners can register their datasets and upload the files they want to share with other partners at the GCP-CR web page. GCP partners will then be able to search for and find the datasets and files.
Dataset download: Not all the GCP datasets described in the Central Registry are available or even finalized yet. The data provider decides whether download of the files is granted to the public or, temporarily, to GCP Consortium members only.
The Generation Challenge Programme (GCP) is generating a wide spectrum of data. The objective of the Data Templates task is to provide simple templates for temporarily storing or distributing the different data sets produced within SP1, SP2 and SP3 for which there is currently no provision in public or institutional databases. For more information on the GCP Templates task, please refer to the Data Submission Templates subsection of the Domain Modeling pages.
Domain Modeling, Ontology and other projects
In 2005, the Generation Challenge Programme Subprogramme 4 for crop informatics defined an agenda of commissioned research and development that includes the development of so-called domain models to help the Consortium capture, encode and structure GCP research data for further analysis and distributed (internet) dissemination. For more information on the GCP Domain Modeling task, please refer to the Domain Modeling section.
GCP Pantheon is an elaborate website devoted to data exchange and information system interoperability. Domain modeling is the central part of GCP Pantheon, where extensive sections on the ontologies of different domains are under development.
An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. The page http://ontology.generationcp.org/ provides a detailed catalog of the ontologies adopted and/or under construction by the GCP. State-of-the-art information on ontologies can be found on the GCP Pantheon pages.
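As a minimal illustration of "concepts and the relationships between them", the sketch below stores a handful of terms with is_a relationships and walks up the hierarchy to collect all ancestors of a term. The terms are hypothetical and are not taken from any GCP ontology. Real ontologies also carry identifiers, definitions and further relationship types.

```python
def ancestors(term, is_a):
    """Collect every term reachable from `term` by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical child -> parents mapping (a term may have several parents).
is_a = {
    "drought tolerance": ["abiotic stress trait"],
    "salt tolerance": ["abiotic stress trait"],
    "abiotic stress trait": ["stress trait"],
    "stress trait": ["plant trait"],
}

print(sorted(ancestors("drought tolerance", is_a)))
```

Annotating data with such terms lets a query for a general concept also retrieve data tagged with any of its more specific descendants.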
Web Services in the Generation CP: the goal of this site is to provide standard, structured access to the great wealth of information the Generation Challenge Programme generates. Web service technology has been chosen as the core foundation of the data exchange network. The informatics network allows sharing and analysis of scientific data, regardless of format or platform. Detailed information on technologies, development and installation is available.
CropForge, GCP's software trove, is a collaborative software development site that provides tools and a centralized workspace for developers to control and manage software development. CropForge contains information on projects and software, with documents such as manuals and guides for different users (administrators, developers, etc.), presentations, articles and papers.
On this page you can see the 24 projects that are currently under development. Detailed information on development status, release dates, developers, comments and download sites is available.
GCPWiki
GCPWiki, the Generation Challenge Programme wiki, is a site for technical and discussion documents of the Generation Challenge Programme (GCP). The site contains home pages for each GCP subprogramme and is organized hierarchically into sections for GCP management and the five subprogrammes. However, the site consists primarily of work in progress from Subprogramme 4 and a few pages from informatics-focused activities in Subprogramme 5 (Capacity Building). A user account, freely available upon request, is required to see and edit the content of this site.
This site is not only interesting for managers or developers, but definitely for crop scientists as well. Some examples are:
- Assessments of Analysis Tools for SP1, SP2 and SP3. In this section assessments of data analysis tools for GCP subprogrammes 1 (Genetic Diversity), 2 (Comparative Genomics) and 3 (Molecular Breeding) are published.
- Bioinformatics courses: Crop Bioinformatics is an introductory two-week online bioinformatics course intended for scientists with a reasonable background in germplasm, biology and genetics. The course syllabus can be found here.