What SARS-CoV-2 Has Taught Us About Data Silos, Accuracy, and Accessibility


A blog article by Ashlie Reker, Ph.D.

In an article published online in The Scientist (“Q&A: Data Gaps Hinder Monitoring of SARS-COV-2 Variants”), Jef (Jennifer) Akst interviewed Martha Nelson about the impediments to understanding, tracking, and defeating SARS-CoV-2, the virus that causes COVID-19, and ultimately ending the global pandemic. As a senior computational biologist at the Laboratory of Parasitic Diseases at the National Institute of Allergy and Infectious Diseases, Dr. Nelson has been tasked with sequencing the SARS-CoV-2 virus and its emerging variants. In the interview, Dr. Nelson emphasizes that documenting how, where, and when the variants appear, as well as the pattern, rate, and efficiency of their transmission, is crucial to our ability to manage the virus and, perhaps even more critically, to produce, organize, and distribute safe and efficacious vaccines.

Given the emergence and spread of these new variants, the primary concerns for many people the world over are: ‘Are these variants more contagious and/or more dangerous?’ and ‘Will current vaccines protect me from these variants?’ The CDC has substantiated that some of the variants are more contagious (About Variants of the Virus that Causes COVID-19). What remains unknown is whether current vaccines are effective against them and whether they are more lethal for the individuals they infect. Dr. Stuart Ray, of Johns Hopkins, says that current vaccines “…could be less effective against some of the new strains”, while Dr. Robert Bollinger, also of Johns Hopkins, noted that preliminary evidence suggests that some variants may be associated with more severe disease (New Variants of Coronavirus: What You Should Know). Regardless of these unknowns, the fact that these variants are more contagious means they will cause more deaths overall, even if each individual infection is no more dangerous: the greater the number of infections, the greater the number of deaths.
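To make that last point concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not an epidemiological forecast: the reproduction numbers, fatality rate, seed cases, and number of generations are all made-up illustrative values, and the function name `cumulative_deaths` is our own.

```python
def cumulative_deaths(r, fatality_rate, generations, seed_cases=100):
    """Toy model: total deaths after a number of transmission generations.

    r             -- average number of people each case infects (assumed constant)
    fatality_rate -- fraction of infections that end in death (assumed constant)
    """
    cases = seed_cases
    total_cases = 0.0
    for _ in range(generations):
        total_cases += cases
        cases *= r  # each case infects r others in the next generation
    return total_cases * fatality_rate


# Illustrative values only: the same fatality rate, but the variant spreads 50% more easily.
baseline = cumulative_deaths(r=1.1, fatality_rate=0.01, generations=10)
more_contagious = cumulative_deaths(r=1.1 * 1.5, fatality_rate=0.01, generations=10)

print(f"baseline: {baseline:.0f} deaths")
print(f"more contagious variant: {more_contagious:.0f} deaths")
```

Because infections compound from one generation to the next, the more contagious variant ends up with more deaths in this sketch even though the per-infection fatality rate is identical.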

According to Dr. Nelson, a significant contributing factor to the current unknowns is the decentralization of data collection in the US, which leaves the information that is collected and shared “…fragmented and hard to interpret” (Akst, 2021). At this time, GISAID and GenBank are the go-to resources that allow public access to sequenced viruses; however, GISAID does not allow direct data sharing. Alongside these limits on data repositories and access, Dr. Nelson cites the difficulty of obtaining and storing metadata, the accompanying information that allows scientists to answer those critical questions of “how, where, when, and at what rate?” Protecting patient privacy is not only a matter of ethics, but also of law. Currently, there is no standard for metadata collection, anonymization, and storage, which makes sharing this information complicated and risky. Together, these shortcomings significantly hinder the contribution sequencing data can make to mitigating the impact of COVID-19 and to preparing for the potential of more dangerous variants and a thwarted vaccination campaign.
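As a simple illustration of what preparing metadata for sharing might involve, the Python sketch below replaces a patient identifier with a keyed hash and coarsens the collection date and location. This is only a minimal sketch under our own assumptions: the field names (`patient_id`, `collection_date`, `state`, `county`, `lineage`), the sample record, and the function `pseudonymize_record` are hypothetical, no such standard exists today, and keyed hashing is pseudonymization rather than full anonymization, so it does not by itself satisfy any privacy law.

```python
import hashlib
import hmac
from datetime import date


def pseudonymize_record(record, secret_key):
    """Return a shareable copy of a (hypothetical) sample metadata record."""
    # Keyed hash: the same patient always maps to the same key,
    # but the original ID cannot be recomputed without the secret.
    digest = hmac.new(
        secret_key, record["patient_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    collected = date.fromisoformat(record["collection_date"])
    return {
        "sample_key": digest[:16],
        "collection_month": collected.strftime("%Y-%m"),  # drop the exact day
        "region": record["state"],                        # drop the county
        "lineage": record["lineage"],
    }


# Made-up example record, for illustration only.
raw = {
    "patient_id": "P-0001",
    "collection_date": "2021-01-15",
    "state": "MD",
    "county": "Montgomery",
    "lineage": "B.1.1.7",
}
shared = pseudonymize_record(raw, secret_key=b"replace-with-a-real-secret")
print(shared)
```

The shared record still answers “where, when, and which variant?” at a coarse level, while the direct identifier, exact date, and county stay with the originating lab.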

From pre-clinical to clinical research, and from academia to international biotech giants, the problems of data silos, lack of standardization, and poor collaboration lead to large gaps in knowledge that can have dire consequences, as we are now seeing played out on the global stage. As Dr. Nelson highlighted in the interview referenced here, difficulty in accessing comprehensive COVID-19 sequencing data impedes the advancement of scientific knowledge, which will translate into a paucity of therapeutic development (see our post about “The Valley of Death”).

By commonly cited estimates, data scientists spend approximately 80% of their time collating data, time that could be spent turning experimental results into therapeutic solutions. Climb 2.0, from RockStep Solutions, offers a digital solution to aggregate and harmonize experimental data, metadata, and experimental protocols. As we continue to socially distance, the cloud-based platform allows team members to analyze data and refine experiments remotely and in real time, while Microsoft security provides encryption for secure data storage and supports regulatory compliance, keeping discoveries happening and treatments coming.


Akst, Jef. (2021). Q&A: Data Gaps Hinder Monitoring of SARS-COV-2 Variants. The Scientist.