UCSC and Amazon Web Services Work to Accelerate Genomics Research
UCSC NEWSCENTER. April 11, 2022. By Emily Cerf
The UC Santa Cruz Genomics Institute is collaborating with Amazon Web Services (AWS) to allow researchers to quickly and efficiently execute bioinformatics pipelines on AWS’s global cloud infrastructure. AWS and UCSC are committed to accelerating genomics research by integrating the Dockstore project, a leading repository for scientific and biomedical workflows created in part by Genomics Institute researchers, with the recently released Amazon Genomics Command Line Interface (CLI).
Researchers at the Genomics Institute seek to understand the mechanisms of human disease, and this work typically requires huge quantities of data and processing capabilities. Cloud-based platforms that store and run workflows make it possible to analyze such data, and are equally available to all researchers, independent of wealth and location.
Bioinformatics Tools in the Cloud
Dockstore, a joint development between the UC Santa Cruz Genomics Institute and the Ontario Institute for Cancer Research, acts as an app store for bioinformatics analysis tools and is used by scientists worldwide. It provides a global cloud library of analytical workflows, so that researchers can easily find and use existing analysis tools, facilitating large-scale biomedical research collaborations. Dockstore follows the principles of findability, accessibility, interoperability and reusability (FAIR) to promote the reproducibility of complex bioinformatics analyses.
The integration with AWS’s new open source tool for genomics and life science customers allows rapid deployment and execution of Dockstore-based workflows on Amazon Genomics CLI with a minimum of setup and configuration. In addition to all of the Dockstore-based workflows, the Amazon Genomics CLI natively supports Cromwell, miniWDL, Nextflow, and Snakemake.
“Dockstore’s ability to share bioinformatics workflows has already proven critical in federally funded projects such as NHLBI BioData Catalyst and NHGRI AnVIL that allow for secure, cloud-based genomics analyses,” said Benedict Paten, Associate Director of the Genomics Institute and professor in the Department of Biomolecular Engineering at the UCSC Baskin School of Engineering. “We are excited about this new collaboration, as it unlocks an entirely new category of users that can quickly utilize available workflows in the cloud to accelerate their research.”
The new integration of Dockstore with Amazon Genomics CLI aligns with technical standards set by the Global Alliance for Genomics and Health (GA4GH), a global organization that promotes common standards across genomics research projects to enable portable genomic analysis and to speed the development of data-driven, life-saving therapeutics. These standards include Application Programming Interfaces (APIs) that allow different computational platforms to interconnect, overcoming barriers that limit productivity.
Specifically, the new integration allows Dockstore to execute workflows by using the GA4GH Workflow Execution Service (WES) API. Amazon Genomics CLI provides the WES endpoint that Dockstore utilizes, thereby allowing researchers to efficiently launch analysis on AWS cloud resources with little coding or intervention.
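For readers curious what a WES request looks like, the following is a minimal sketch in Python, not the actual code used by Dockstore or Amazon Genomics CLI. It only assembles the URL and form fields that the GA4GH WES v1 specification defines for submitting a workflow run; it does not contact any server. The function name, the server address, and the workflow URL are hypothetical placeholders chosen for illustration.

```python
import json
from urllib.parse import urljoin

# Path for submitting runs, as defined by the GA4GH WES v1 specification.
WES_RUNS_PATH = "ga4gh/wes/v1/runs"


def build_wes_run_request(base_url, workflow_url, workflow_type,
                          workflow_type_version, params):
    """Return the endpoint URL and form fields for a WES run submission.

    This helper is hypothetical. A real client would POST the fields to the
    endpoint as multipart/form-data and add whatever authentication the
    server requires.
    """
    endpoint = urljoin(base_url.rstrip("/") + "/", WES_RUNS_PATH)
    form_fields = {
        "workflow_url": workflow_url,                    # where the workflow descriptor lives
        "workflow_type": workflow_type,                  # e.g. "WDL", "CWL"
        "workflow_type_version": workflow_type_version,  # descriptor language version
        "workflow_params": json.dumps(params),           # workflow inputs, JSON-encoded
    }
    return endpoint, form_fields


if __name__ == "__main__":
    # Placeholder values: neither address refers to a real service.
    endpoint, fields = build_wes_run_request(
        "https://wes.example.org",
        "https://example.org/workflows/hello.wdl",
        "WDL",
        "1.0",
        {"hello.name": "world"},
    )
    print(endpoint)
    print(fields)
```

Because every WES-compliant server accepts the same fields at the same path, a client written against one endpoint can, in principle, be pointed at another by changing only the base URL.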
“Amazon Genomics CLI promises to simplify genomics analysis in the cloud,” said Dr. Taha Kass-Hout, Director of Machine Learning at Amazon Web Services. “Our new collaboration with UCSC allows quick utilization of existing bioinformatics workflows via the Dockstore repository, and will further enhance the opportunities for computational biologists to rapidly ramp up new research directions while using AWS’s proven global infrastructure.”
For more information on how to configure Dockstore with WES servers, visit this Dockstore blog post.