Washington Update

NIH Seeks Feedback on CGR Project

By: NIH Staff
Thursday, March 9, 2023
The National Institutes of Health (NIH) is seeking user feedback on NCBI (National Center for Biotechnology Information) data, tools, and resources used for comparative analyses. This feedback will help guide future development and inform ongoing work on NCBI's Comparative Genomics Resource (CGR) project.

CGR is a multiyear National Library of Medicine (NLM) project to maximize the impact of all eukaryotic research organisms and their genomic data resources on biomedical research. CGR will establish an ecosystem that facilitates reliable comparative genomics analyses for all eukaryotic organisms. This ecosystem will feature an interoperable suite of NCBI repositories and knowledgebases that offer high-value data, tools, and interfaces compatible with community-provided organism resources. Visit the CGR website to explore the suite of data, tools, and resources that contribute to the CGR ecosystem.

The CGR project will also expand opportunities for new discoveries from genomic sequence data and metadata. NCBI is increasing the potential research contributions of all taxa by providing organism-agnostic tools and making it easier for users to find and build datasets with content from across the eukaryotic tree of life. Stronger connections between NCBI and community-supplied resources in the CGR ecosystem will help researchers use data from organisms they might not otherwise have known about. Feature curation efforts associated with the CGR project also improve data usability in comparative analyses. For example, adding superfamily architectures to more proteins supports protein naming and functional annotation and makes it easier to identify related proteins across different organisms.

The CGR project has developed new analysis tools. The ClusteredNR BLAST database provides faster searches, greater taxonomic reach, and easier-to-interpret results than the traditional nr database. The Comparative Genomics Viewer lets users explore whole genome alignments within and between species. In addition, NCBI Datasets provides web and programmatic interfaces that improve data discovery and retrieval for scalable, cross-species analysis, offering a seamless way to explore, analyze, and retrieve eukaryotic genome-related content.

NCBI is also developing publicly available tools that data producers can use before submission to improve genome quality and usability. The NCBI Foreign Contamination Screen tool, now available on GitHub, puts contamination screening in the hands of assembly producers, allowing them to remove contaminants before submitting to NCBI. Likewise, the NCBI Eukaryotic Genome Annotation Pipeline is being made publicly available so that the genomics community can create and submit consistent, high-quality annotations for assembled genomes from diverse taxonomic groups.

Central to CGR are standardized public application programming interfaces (APIs) and structured data packages for all genome-related sequence data and metadata, such as those available from NCBI Datasets, which can work as part of common genomics workflows. These will support emerging big data approaches to comparative genomics, such as creating artificial intelligence (AI)-ready datasets and cloud-ready tools, to meet new research needs and accommodate anticipated data growth.

How can you get involved? Send an email to cgr@nlm.nih.gov and:
  • Explain what you need to support your comparative genomics analyses and amplify your genomics data and tools
  • Provide product and tool feedback, especially if you use multiple CGR-related data, tools, and resources in combination
  • Volunteer to participate in a CGR feedback session or a tool testing session, or learn how you can connect your tools and data into the CGR ecosystem 
Keep up with the latest on the CGR project by: