Canadian cancer researchers claim that by combining open source software with commodity hardware, they can give academics in the field access to public cloud-like resources at 60% less than Amazon charges, and help to accelerate the pace of scientific breakthroughs.
The Ontario Institute for Cancer Research (OICR) is focused on shortening the time it takes for patients to benefit from emerging cancer research findings, for example through the roll-out of new treatments and diagnostic tools.
OICR is a non-profit organisation, funded through grants from the government of Ontario, Canada, with ties to the International Cancer Genome Consortium and the Global Alliance for Genomics and Health.
Collaboration with research teams across the globe plays an important role in the work OICR does, and is made possible through the institute’s cloud-based Cancer Genome Collaboratory.
The resource, launched in 2014, runs exclusively on open source software, including OpenStack, Ceph, Ansible and Linux among many other technologies, in combination with high-density commodity server hardware.
It comprises 2,600 cores, 18TB of RAM, 7.3PB of storage (managed by Ceph), and 670TB of protected cancer genome data, which is accessed by cancer research teams across several continents for analytics purposes.
Speaking to Computer Weekly at the recent OpenStack Summit in Vancouver, George Mihaiescu, senior cloud architect at OICR, said the Collaboratory was created, in part, to open up access to the data that researchers need to advance the medical community’s understanding of cancer.
The datasets generated by sequencing cancer genomes are huge, for example, which makes it difficult for researchers to download and store the information they need to carry out their work.
The Collaboratory helps researchers sidestep these issues by acting as a centralised storage hub for the data, while providing them with the compute resources to analyse it, on a self-service basis.
“There are lots of savings from having everything in one place for researchers, as it is basically saving them time on downloading this data, and budgets, because they don’t have to build this themselves only to use it for a few months or whatever,” said Mihaiescu.
The self-service aspect is designed to give researchers a user experience similar to the one they would get from a public cloud provider when accessing the compute power they need, but at a much lower cost, said Mihaiescu.
“We charge [the researchers] for the usage, using the same billing model as the cloud providers, just that the prices we have are about 40% of those of the leading cloud providers, making it so much cheaper to do the analysis in the Collaboratory than in the Amazon cloud,” he said.
This is made possible by OICR’s decision to build the Collaboratory on open source software, coupled with the fact it runs on commodity hardware, which is all supported in-house.
“Because we use open source, not only can we troubleshoot and implement features we need and want, we can also support it ourselves, we can install it on commodity hardware, and this gives us a very low cost for building the environment – and this low cost means we can add more capacity,” said Mihaiescu.
“We couldn’t do this if we used a paid distribution [of OpenStack], where you basically pay for the hardware, the support, and pay for licensing, and the more you want, the more you pay, because all this would have taken away from budgets for capacity and research.”
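The two figures quoted for the Collaboratory's pricing, prices at about 40% of the leading cloud providers and a cost 60% lower than Amazon, describe the same relationship. A minimal sketch of the arithmetic follows; the function names and the 1,000 bill are hypothetical and are used only for illustration, not taken from OICR's billing system.

```python
# Illustrative arithmetic only: the 0.40 ratio comes from the article,
# while the function names and the sample bill are hypothetical.

def collaboratory_cost(public_cloud_cost: float, price_ratio: float = 0.40) -> float:
    """Cost of the same usage when prices are `price_ratio` of the public cloud's."""
    return public_cloud_cost * price_ratio


def saving_fraction(price_ratio: float = 0.40) -> float:
    """Fraction saved relative to the public cloud price."""
    return 1.0 - price_ratio


if __name__ == "__main__":
    hypothetical_bill = 1000.0  # made-up public cloud bill, any currency
    print(f"Collaboratory cost: {collaboratory_cost(hypothetical_bill):.2f}")
    print(f"Saving: {saving_fraction():.0%}")
```

Because the billing model is described as the same usage-based one the cloud providers use, the saving scales linearly with usage under this assumption.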
The do-it-yourself approach also means that the OICR team could create a tailor-made environment that takes into account the huge amounts of storage and compute power required to stand up the Collaboratory, he said.
“My background is not in bioinformatics or cancer research – my background is purely in IT,” said Mihaiescu. “But, at the beginning, I helped to manage a few environments where they were doing this type of analysis work, so I was able to do system monitoring on those virtual machines.
“I could see the type of stress these [workloads] put on those systems, so when I started the design phase, I knew what was important, what to focus on and what I can [get] away with, and basically craft this environment and customise it for cancer research.”
As such, the set-up gives researchers access to virtual machines with up to 30 cores, 244GB of RAM and more than 5TB of storage.
“In Amazon, a similar type of instance with eight cores, for example, has much less memory because they didn’t design it for this type of use case, but for more general cloud computing,” he said.
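The memory-heavy design Mihaiescu describes can be expressed as a RAM-per-core ratio. The sketch below uses only the largest virtual machine size quoted in the article (30 cores, 244GB of RAM); the helper name is hypothetical, and no Amazon instance figures are assumed.

```python
# Illustrative arithmetic only: 30 cores and 244GB are the article's figures
# for the largest Collaboratory VM; the helper name is hypothetical.

def ram_per_core_gb(ram_gb: float, cores: int) -> float:
    """Return gigabytes of RAM available per core."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return ram_gb / cores


if __name__ == "__main__":
    ratio = ram_per_core_gb(244, 30)
    print(f"Largest Collaboratory VM: {ratio:.1f}GB of RAM per core")
```

The same helper can be applied to any instance type a researcher wants to compare against, using that provider's published specifications.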
Such is the OICR team’s commitment to open source that it recently started packaging up and sharing the components of the Collaboratory for the benefit of the wider scientific community, through the launch of its Overture initiative.
“A lot of tools can be used to build similar genomics portals and systems for securely downloading protected data,” added Mihaiescu. “Not only for genomic data, but also for clinical data or other data that you want to share with a group of researchers in a secure and controlled manner.”