Welcome to the first newsletter for 2019! A belated Happy New Year, wishing you all a prosperous 2019.
After a nice summer break, January started with the usual fast pace of activities at H3ABioNet central (CBIO). We have been working towards many deadlines, including the NIH mid-year report and node review submissions for our Scientific Advisory Board. Currently we have seven H3ABioNet trainers assisting the Wellcome Trust trainers in their NGS data analysis course running in Johannesburg. This will help to develop our own NGS trainers and course materials for future courses.
We are excited to announce the release of the new H3ABioNet website. Of course a website is always a work in progress but we decided it was time to replace the old website. The new site is more colourful, hopefully more user friendly, and does a better job at reflecting the activities and outputs of the consortium. This newsletter provides an article describing some of the new features of the website, but I encourage you to visit it for yourself at: http://www.h3abionet.org.
In this newsletter we also provide a report on some of the events we were involved in towards the end of 2018. This includes the H3Africa genotyping array and GWAS data analysis workshop, which took place in Cape Town in October, and the SciDataCon and International Data Week conference which took place in Gaborone, Botswana in November. H3ABioNet was well represented with talks and three organized sessions. Here we also report on our public outreach activities at the Science Forum South Africa, where we shared an exhibition booth with H3Africa.
H3ABioNet was well represented at the International Data Week Conference held in Gaborone, Botswana toward the end of 2018. IDW 2018 consisted of two joint events, the Research Data Alliance’s 12th plenary meeting, and SciDataCon, both of which focussed on the theme of “The Digital Frontiers of Global Science”.
Attendees from H3ABioNet included Nicola Mulder, Nicki Tiffin, Suresh Maslamoney, Shaun Aron, Faisal M. Fadlelmola, Shakuntala Baichoo, Amel Ghouila, Lyndon Zass, and Ayton Meintjes. These delegates presented during several sessions, including a discussion panel session titled “Big Data Challenges in the Biomedical field”, where the H3Africa/H3ABioNet topics included the H3Africa Archive, the Cloud workflows project, H3Africa array design logistics, and a discussion on minimum reporting standards for genomics research. Another session was entitled “The impact of data revolution on science and technology education and research in Africa: the perspective of bioinformatics”, where the topics presented included running a webinar series across Africa, H3ABioNet’s multiple-delivery-mode approach for providing online training in Africa, and hackathons as a means of accelerating learning in the era of Big Data. H3ABioNet members also led an Open Science Café session on Open Science in Africa, during which participants had informal discussions about the advantages and challenges faced while adopting working open principles.
Over 800 delegates attended the conference, from a broad range of scientific fields, yet several topics of relevance to H3ABioNet were covered:
- Ownership of data collected from ethnic groups, indigenous knowledge
- FAIR data, with sessions covering tools such as FAIRSharing.org
- Data Access Statement policies of Scientific journals
- Health and Disease Surveillance systems
- Data Management Plans
At the close of the meeting a draft commitment statement was shared under the title “IDW Gaborone Statement”. The statement makes particular mention of the challenges and benefits the data revolution holds for African researchers.
With data generated or soon to be generated by the H3Africa projects using the H3Africa genotyping array for genome-wide association studies (GWAS), H3ABioNet organized its first ever bring your own data (BYOD) workshop in October 2018. The H3ABioNet H3Africa Genotyping Array and GWAS workshop differed from previous workshops organized by H3ABioNet in that it was primarily application oriented. Given the time constraints of an application-based workshop involving up to 20 participants bringing their data for analysis, the limited number of experts and trainers on site, and the huge demand for learning about GWAS, the organizers decided to split the workshop into two parts. The first part consisted of seven theoretical lectures, commencing with the computational requirements for analyzing GWAS data, followed by an overview of GWAS and study design, genotype calling from an Illumina array platform, the importance of quality control in a GWAS, population structure, and imputation and its importance in GWAS, and ending with the statistical models used for GWAS. Each of the seven lectures was delivered live online by experts working in the field, and the recorded lectures are available on the H3ABioNet YouTube channel (https://tinyurl.com/gwas-h3abionet). Attendance at the theoretical lectures was mandatory for the participants selected to attend the practical workshop, and open to anyone else who wanted to attend any of the lectures. This ensured that all the participants had the prerequisite background knowledge so that the workshop could focus solely on the analysis of their datasets, while people wanting to learn more about GWAS could also do so.
The H3ABioNet H3Africa Genotyping Array and GWAS workshop also marked a couple of firsts for H3ABioNet. The workshop made use of the containerized GWAS and imputation workflows developed by H3ABioNet at a prior hackathon. Containerization enables the development and packaging of a user-defined software stack in an encapsulated compute environment, which can be executed on heterogeneous compute environments ranging from desktops and clusters to high performance computing (HPC) centers and cloud environments. In addition to containerization, workflow languages were used to convert the analysis pipelines into workflows, which have the advantage of automating an analysis from the start to the end of a bioinformatics pipeline. This removes the need to execute each step of a pipeline manually, allowing the scientist to make multiple runs of an analysis and investigate how the choice of parameters affects the results. Workflow languages combined with containerization are a boon given the numerous types, versions and parameters of bioinformatics software, and they enable reproducibility of an analysis, an important factor in genomics, where some analyses can take days to run and the same amount of time to reproduce. The development of the H3ABioNet GWAS workflow and its containerization is spearheaded by the University of the Witwatersrand’s H3ABioNet Node, led by Prof. Scott Hazelhurst, and the workflow is available from the H3ABioNet GitHub repository.
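To make the idea of an automated, repeatable workflow more concrete, here is a minimal Python sketch. It is not the H3ABioNet GWAS workflow and does not use a real workflow language or containers. The variant records, the two filter steps and the parameter names (min_call_rate, min_maf) are all invented for illustration. It only shows the principle described above: steps are chained so none is run by hand, and every run is stored together with the parameters that produced it.

```python
from itertools import product

# Toy variant records; the fields and values are invented for illustration.
VARIANTS = [
    {"id": "rs1", "call_rate": 0.99, "maf": 0.20},
    {"id": "rs2", "call_rate": 0.90, "maf": 0.30},
    {"id": "rs3", "call_rate": 0.97, "maf": 0.02},
    {"id": "rs4", "call_rate": 0.96, "maf": 0.06},
]


def filter_call_rate(variants, params):
    """Step 1 (hypothetical): keep variants genotyped in enough samples."""
    return [v for v in variants if v["call_rate"] >= params["min_call_rate"]]


def filter_maf(variants, params):
    """Step 2 (hypothetical): keep variants with a high enough minor allele frequency."""
    return [v for v in variants if v["maf"] >= params["min_maf"]]


# The workflow is an ordered list of steps, so no step is executed by hand.
STEPS = [filter_call_rate, filter_maf]


def run_workflow(variants, params):
    """Run every step in order and return the ids of the surviving variants."""
    for step in STEPS:
        variants = step(variants, params)
    return [v["id"] for v in variants]


def sweep(variants, grid):
    """Run the workflow once per parameter combination, recording the parameters used."""
    names = sorted(grid)
    runs = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        runs.append({"params": params, "kept": run_workflow(variants, params)})
    return runs


if __name__ == "__main__":
    grid = {"min_call_rate": [0.95, 0.98], "min_maf": [0.01, 0.05]}
    for run in sweep(VARIANTS, grid):
        print(run["params"], "->", run["kept"])
```

Because each result is kept alongside its parameters, two runs can be compared directly, and any run can be repeated exactly. In a real workflow the container would additionally pin the software versions behind each step.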
Another first for H3ABioNet was the creation of a cloud compute environment using ILIFU, a cloud computing platform developed for data intensive research, to whose hardware and software stack H3ABioNet is a major contributor. A virtual compute cluster was created, and participants were asked to generate their public SSH keys and provide them via a special Slack group created for the workshop so that they could be added to the virtual cluster. This allowed H3ABioNet to make use of ILIFU as Infrastructure as a Service (IaaS).
In total, 17 of the 20 selected participants, from 6 different African countries, attended the workshop, with some participants bringing their own data. Simulated datasets were provided to participants who were expecting their data to be generated soon. The workshop started with a talk by Prof. Nicola Mulder about the design and specifications of the H3Africa genotyping array. The participants then delved into some introductory HPC, including commands specific to using a Slurm scheduler and how to pull containers and execute commands. An introduction to Nextflow was provided by Prof. Scott Hazelhurst, and the workshop participants were given time to familiarize themselves with the compute environment being used. Over the course of the workshop, participants used the H3ABioNet GWAS workflow to quality control their data and explore population structure within their data or the simulated datasets. An excellent design feature of the H3ABioNet GWAS workflow is its ability to generate detailed reports covering the graphs and parameters of the various steps executed, so that one can visually inspect the results and make any corrections, or compare results when rerunning the workflow and exploring the parameter spaces of the different tools.
Additional practical topics covered included imputation, visualization of cluster probe plots using the H3Africa Genotyping Array Web Annotation server developed by Ayton Meintjes from the CBIO Division at the University of Cape Town, and visualization of GWAS signals using LocusZoom for fine mapping. There was also an introduction to variant calling file formats, tools for predicting the effects of identified variants, and other post-GWAS tools that can be used to analyze variants through interactive protein modelling, including those available on the Human Mutation Analysis platform (HUMA) developed by Prof. Ozlem Tastan Bishop’s group at the Rhodes University H3ABioNet Node (https://huma.rubi.ru.ac.za/). At the start of the workshop, participants were asked to prepare a presentation on the practical work they conducted during the course and to deliver it at the end, in order to give them some guidelines and experience in presenting GWAS results at a conference. The topics covered in the presentations included the number of samples in the study, the number of samples that passed QC, interpretation and explanation of the QQ and Manhattan plots generated, the statistical association tests used and their results, key hits found, and post-GWAS analysis of these hits using Variant Effect Predictor and HUMA. The trainers and course participants provided input and suggestions to the presenters on their results.
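For readers unfamiliar with QQ plots, the short Python sketch below shows how the points on such a plot can be derived from a list of association p-values. It is an illustration only and is not taken from the workshop material or the H3ABioNet workflow. The function name qq_points and the plotting position (rank - 0.5) / n are assumptions; other conventions exist.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ plot.

    Assumes every p-value lies in (0, 1]. The expected p-value for the
    i-th smallest of n values is taken as (i - 0.5) / n, which is one
    common convention among several.
    """
    if any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    observed = sorted(p_values)  # smallest p-value first
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = (rank - 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


if __name__ == "__main__":
    # Invented p-values, for illustration only.
    for expected, observed in qq_points([0.5, 0.05, 0.005, 0.0005]):
        print(f"expected {expected:.3f}  observed {observed:.3f}")
```

If the tests behave as expected under the null hypothesis, the points fall close to the diagonal. Points that rise above the diagonal at the right-hand end indicate more small p-values than chance alone would produce.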
The first H3ABioNet BYOD workshop, the use of the H3ABioNet workflows for GWAS and the use of ILIFU as a cloud-based IaaS were successful thanks to the great planning and organization by the teaching and technical team assembled from H3ABioNet and AWI-GEN, comprising Prof. Nicky Mulder, Shaun Aron, Prof. Scott Hazelhurst, Dr. Ananyo Choudhury, Ayton Meintjes, Mamana Mbiyavanga, Gerrit Botha, Caro Ross, Suresh Maslamoney, Verena Ras and Sumir Panji.
The H3ABioNet H3Africa Genotyping Array and GWAS workshop was quite intense due to the large practical component, and a great camaraderie was formed between participants and trainers. A quarterly follow-up call is planned to check on progress and address any issues the workshop participants may encounter as they continue with their research.
H3ABioNet and H3Africa were well represented at Science Forum South Africa 2018 (SFSA2018), with colleagues taking turns manning the very popular stand and attending the different talks on offer. Students and other youth, academics, policy makers, and other researchers and scientists, both local and international, visited our exhibition stand in their numbers. We had a very popular game where kids were asked to recreate the image of a bacteriophage using colourful sticks we offered. True to the theme of SFSA2018, robust discussions and conversations about science were ignited!
To start important dialogues with diverse thinkers and scientists from a whole array of fields and multi-disciplinary spheres, Nicola Mulder and Jantina de Vries gave a joint presentation titled “Is Africa ready to share its Genomic Data?”, which was well received and sparked challenging and meaningful discussions amongst the audience.
We gave away free T-shirts and sweets to some of those who were able to complete the bacteriophage and robustly engage with us. This image is of Charlotte Nukeri, wearing our highly coveted free T-shirt.
Science Forum South Africa 2018 was a major success and we would like to congratulate the organising team on such an important event, which gave scientists, policy-makers and ordinary civilians a rare opportunity to engage with different forms of science.
After 5 years of working with the original H3ABioNet website, we decided to redesign it to refresh the look and feel and to better reflect the activities of H3ABioNet II. The previous website was difficult to navigate and did not adequately display our activities and outputs. The new website framework was driven by the structure of the project, which is now divided into work packages. Though not all work packages have a dedicated sub-menu, all the major projects are represented within one of the sections. Tools and resources still being developed will be included when they are complete and in production. The main menu tabs include:
- Tools and Services
- Data & Standards
The home page now provides an overview and quick links to key pages.
The new sections include the “Resources” and “Data & Standards” pages. Though we have been working in these areas for some time, we did not display them on the previous website. The “Data & Standards” section provides links to useful guideline documents and to the Standard CRF package of documents and REDCap templates developed in collaboration with the Phenotype Harmonization working group.