The Rayner Group
The primary goal of the Computational Biology group is to develop a secure, user-friendly data solution to integrate and share the data generated by the Centre. However, the broad range of expertise within the Centre makes this particularly challenging, as each group generates different types of data. Moreover, rather than working as independent units, the HTH and partner research centres commonly work together in cross-functional teams to achieve specific goals. This means that the data-handling solution needs to be flexible enough to accommodate the changing needs of the Centre.
We need (i) cloud storage, (ii) a version control system, (iii) a system to ensure metadata quality and (iv) the ability to implement privacy and security requirements as necessary. Standard tools such as Dropbox or Google Drive provide a shared solution, but they do not provide the required privacy and cannot be adapted to our specific needs. Hence, to meet these four requirements, we have been developing our own cloud-based solution.
Our solution implements a distributed multi-layer architecture, deployed through Docker containers and orchestrated by Docker Swarm. All partner institutions of the Centre support this architecture, which makes it feasible to build a cross-site solution that can incorporate additional partners if needed.
Cloud storage is implemented using MinIO, which provides data encryption, identity management and access control. Version control is handled by a version of Git backed by S3-compatible storage. Data quality is addressed by requiring users to supply data descriptors in the form of metadata, with a Hyperledger-based solution used to ensure that users adopt these descriptors. By bringing these technologies together, we can fully integrate data in a way that reflects the research being performed in the Centre.
For example, an organ-on-a-chip (OoC) experiment might bring together a cell population, a 3D-printed organoid, a chip design with associated microfluidics, and a sensor set to capture the data generated by the integrated system. Each of these components must be characterized individually and then collectively as part of the final system. The version control system allows us to track the evolution of each component's design and link it to the experimental data as they are collected for the integrated design. Because users must include metadata in order to submit their data, this will ultimately provide us with a rich dataset that can be used to help interpret experimental results and guide the design of future experiments.
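To make the metadata requirement more concrete, here is a minimal Python sketch of a submission gate, assuming a per-component schema of required fields. The component types follow the OoC example above, but the field names and the names `REQUIRED_FIELDS`, `Submission`, `MetadataError` and `validate_and_register` are hypothetical illustrations. The sketch does not call MinIO, Git or Hyperledger; in the system described here, enforcement is handled by Hyperledger rather than by a local check like this one.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical schema: required descriptors for each component type in an
# OoC experiment. The Centre's actual metadata schema is not specified here.
REQUIRED_FIELDS = {
    "cell_population": {"cell_line", "passage", "culture_medium"},
    "organoid": {"print_method", "material"},
    "chip_design": {"design_version", "channel_geometry"},
    "sensor_set": {"sensor_types", "sampling_rate_hz"},
}


class MetadataError(ValueError):
    """Raised when a submission lacks required metadata."""


@dataclass
class Submission:
    component_type: str
    metadata: dict
    payload: bytes
    # Version identifiers (e.g. Git commit hashes) of the component designs
    # this dataset was produced with.
    linked_versions: list = field(default_factory=list)


def validate_and_register(sub: Submission) -> dict:
    """Reject a submission whose metadata is incomplete; otherwise return a
    record keyed by the SHA-256 digest of the data payload."""
    required = REQUIRED_FIELDS.get(sub.component_type)
    if required is None:
        raise MetadataError(f"unknown component type: {sub.component_type!r}")
    # A field counts as missing if it is absent, None or an empty string.
    missing = sorted(f for f in required if sub.metadata.get(f) in (None, ""))
    if missing:
        raise MetadataError("missing metadata fields: " + ", ".join(missing))
    return {
        "sha256": hashlib.sha256(sub.payload).hexdigest(),
        "component_type": sub.component_type,
        "metadata": dict(sub.metadata),
        "linked_versions": list(sub.linked_versions),
    }


if __name__ == "__main__":
    record = validate_and_register(Submission(
        component_type="chip_design",
        metadata={"design_version": "v2", "channel_geometry": "serpentine"},
        payload=b"raw sensor export",
        linked_versions=["a1b2c3d"],
    ))
    print(record["sha256"][:12], record["linked_versions"])
```

Keying each record on a content hash means a data file can be referred to by its digest regardless of where it is stored, which is one simple way to link experimental data to the component design versions it depends on.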
A further advantage of this data solution is that it achieves a degree of compliance with the FAIR (Findability, Accessibility, Interoperability and Reusability) principles, which are supported by the Research Council of Norway (RCN) and the European Research Council (ERC). Our architecture is flexible enough to be adopted by other, larger-scale projects and can help them meet the FAIR data goals of the RCN and ERC.
To date, we have been working with test data, but our next goal is to demonstrate the value of our work by performing a standardization test across the three nodes of the Centre (Oslo, Glasgow and London) using a shared chip design and cell populations. To our knowledge, this will be the first study of its kind in the organ-on-a-chip research field.
Selected Publications (see Google Scholar for the full list):
- Han, N. et al. Comparison of Genotypes I and III in Japanese encephalitis virus reveal distinct differences in their genetic and host diversity. Journal of Virology, JVI.02050-02014 (2014).
- Wu, Y., Wei, B., Liu, H., Li, T. & Rayner, S. MiRPara: a SVM-based software tool for prediction of most probable microRNA coding regions in genome scale sequences. BMC Bioinformatics 12, 107, doi:10.1186/1471-2105-12-107 (2011). 
- Deng, X., Liu, H., Shao, Y., Rayner, S. & Yang, R. The epidemic origin and molecular properties of B′: a founder strain of the HIV-1 transmission in Asia. AIDS 22, 1851-1858 (2008).
- Ye, Y., Wei, B., Wen, L. & Rayner, S. BlastGraph: a comparative genomics tool based on BLAST and graph algorithms. Bioinformatics 29, 3222-3224 (2013). 
- Chen, P. et al. Computational evolutionary analysis of the overlapped surface (S) and polymerase (P) region in hepatitis B virus indicates the spacer domain in P is crucial for survival. PLoS ONE 8, 4 (2013).