Welcome to the August 2020 Newsletter. I find it hard to believe we are still in lockdown and have been since late March. I am getting used to working from home but still miss seeing and interacting with my team face to face. Nevertheless, work goes on and all projects are progressing well. This was particularly apparent at the H3ABioNet virtual SAB meeting which took place during the first week of August. I was proud of the project leads and team members as they demonstrated significant progress in major projects, moving from ideas and concepts this time last year, to functioning prototypes. We had excellent attendance at the SAB meeting with up to 100 participants. The only shame is not being able to interact over coffees and a good Cape Town dinner.
This newsletter highlights some of the recent activities, including a competition encouraging entrants to express “bioinformatics” in their home language and through poetry. I loved reading the excellent poems, some of which are shown in the article below. We obviously have creative talent as well as good scientists in our network. In this newsletter we also get tips on working from home, application of machine learning, and we meet one of our PIs from Morocco. Enjoy reading these articles.
Machine learning – Associate Professor Shakuntala Baichoo
Tell us a bit about yourself
I work as an Associate Professor at the University of Mauritius (UoM), in the Department of Digital Technologies. At the UoM, my work consists mainly of teaching (including project supervision), some administrative tasks and research. My research interests are mostly in cancer genomics and the application of machine learning to health and bioinformatics.
I am the site PI for the UoM H3ABioNet node. My involvement in H3ABioNet also consists of (i) co-chairing the Pipeline and Computing work package, (ii) contributing to projects such as Development and Deployment of Workflows, microbial GWAS and machine learning, and (iii) leading the machine learning project (since April 2020).
What does Machine learning entail?
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that focuses on the ability of machines to receive a set of data and learn from it, improving their algorithms as they learn more about the information being processed. The recent popularity of ML has been fuelled by the abundance of data, improved storage systems and high-performance computing. Broadly, ML can be categorized into two main types, namely supervised and unsupervised learning.
In supervised learning, the algorithm learns from past data using labelled examples and applies this knowledge to new data to predict future events. More precisely, using labelled data the algorithm creates a model that can be used to make predictions on unseen data. Supervised learning can be further categorized into classification and regression. In classification, we seek a categorical prediction: “What category does ‘x’ fall into?”. For example, a model is trained with histopathological images of cancerous and non-cancerous breast tumours and then used to classify unlabelled histopathological images of breast tumours as cancerous or non-cancerous. In regression, the model predicts a continuous value rather than a category.
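The train-then-predict workflow described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a nearest-centroid classifier (one of the simplest possible choices, not a method used in any H3ABioNet project) and invented two-number feature vectors standing in for real image features. The names `train_centroids` and `predict` are hypothetical.

```python
from math import dist


def train_centroids(samples, labels):
    """Learn one mean feature vector (centroid) per label from labelled data."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the label whose centroid is closest to the unseen sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Toy, invented data: two made-up numeric features per tumour image.
train_x = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
train_y = ["non-cancerous"] * 3 + ["cancerous"] * 3

model = train_centroids(train_x, train_y)   # learn from labelled examples
prediction_a = predict(model, [0.9, 1.1])   # classify unseen samples
prediction_b = predict(model, [3.0, 3.0])
```

In practice the features would come from real images and the model would be evaluated on held-out data before being trusted.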
In unsupervised learning, the algorithm takes unlabelled data (without any outcome) and tries to find patterns and create structure in the data, in order to derive meaning. Unsupervised learning can take two forms, namely clustering and dimensionality reduction. In clustering, similar data are grouped together such that observations in one group have similar properties or features to one another, while data in different groups have highly dissimilar properties. In clustering problems, we ask questions such as “What group does this fall into?”. For example, looking at the genomic profiles of breast cancer patients as a whole, we try to find how they group together. In dimensionality reduction, the data are re-expressed using fewer features while keeping as much of the relevant information as possible.
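As a rough sketch of the clustering idea above, the following Python code implements plain k-means, which alternates between assigning each point to its nearest centre and moving each centre to the mean of its members. The choice of k-means, the function name `kmeans`, the fixed starting centres and the two-number "profiles" are all assumptions made for illustration; real genomic profiles have far more features.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = [list(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Toy, invented data: no labels are supplied, only the measurements.
profiles = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
clusters, centres = kmeans(profiles, centroids=[profiles[0], profiles[3]])
```

Note that the algorithm is never told what the groups mean; interpreting the resulting clusters is left to the researcher.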
Apart from supervised and unsupervised learning, there are two more categories of ML algorithms: semi-supervised and reinforcement learning. Semi-supervised algorithms fall somewhere in between supervised and unsupervised learning, i.e. the algorithm receives a collection of data points, but only a small subset of these data points have associated labels and a large portion consists of unlabelled data. Reinforcement learning, on the other hand, uses a reward system and trial-and-error in order to maximize the long-term reward - the learning system interacts with the environment by producing actions and discovers errors or rewards. In recommendation systems, we ask questions such as “What options should we take?” The answer is based on the history of the data we have, and the system tries to find the best option going forward. A common scenario would be a product recommendation on an online shopping system.
In your experience, what is most frequently asked with regard to this category?
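The reward-and-trial-and-error idea can be illustrated with a deliberately simplified setting: an epsilon-greedy "multi-armed bandit" that repeatedly chooses which of three products to recommend. This is a sketch under stated assumptions rather than a full reinforcement learning system. The click probabilities are invented, and `epsilon_greedy` is a hypothetical helper name.

```python
import random


def epsilon_greedy(click_probabilities, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which option yields the highest average reward."""
    rng = random.Random(seed)
    n_options = len(click_probabilities)
    pulls = [0] * n_options        # how often each option was tried
    estimates = [0.0] * n_options  # running average reward per option
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.randrange(n_options)  # explore: try a random option
        else:
            # Exploit: pick the option that currently looks best.
            choice = max(range(n_options), key=lambda i: estimates[i])
        # The environment responds with a reward (1 = click, 0 = no click).
        reward = 1 if rng.random() < click_probabilities[choice] else 0
        pulls[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / pulls[choice]
        total_reward += reward
    return estimates, pulls, total_reward


# Invented click probabilities for three hypothetical products.
estimates, pulls, total_reward = epsilon_greedy([0.1, 0.5, 0.9])
best_option = max(range(len(estimates)), key=lambda i: estimates[i])
```

The `epsilon` parameter controls the trade-off between exploring new options and exploiting the one that has paid off best so far.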
- Taxonomy of machine learning
- Machine learning algorithms
- Implementation of machine learning applications
- Evaluation of machine learning models
How to get started with machine learning?
- If you don’t have any coding experience, choose a programming language (e.g. R or Python) and try solving problems to get accustomed to the syntax.
- Start brushing up on the basics of Linear Algebra, Statistics & Probability and Calculus, which have already been taught in high school.
- Follow an online machine learning course from sites such as Harvard edX, Coursera, Udemy or DataCamp.
- Read a few recent papers that discuss the use of machine learning to solve problems in your area of interest.
- Choose one interesting paper that also provides its methodology and data, repeat the same machine learning experiment, and compare your results with the published ones to check that you have done it correctly.
- Find an interesting problem, get the necessary data, start exploring the data and then proceed to machine learning.
Are any online resources available where I can find more info on machine learning?
- Data Science from Medium
- Machine learning from DataCamp Community tutorials
- Machine learning from Kaggle
- Machine learning and News articles from Medium
- Free theoretical course on Machine Learning - Coursera — Machine Learning (Andrew Ng)
- Udacity — Intro to Machine Learning (Sebastian Thrun)
- Data School (YouTube)
Africa Day Celebrations and Musings: Whose Africa is it anyway? Using Linguistics and Poetry for Bioinformatics Improvement in the continent
By: Paballo Abel Chauke
Africa is one of the most diverse, complex and talented continents, with a very interesting history and contemporary reality. The continent has many talents in terms of science, medicine and culture but equally faces debilitating challenges such as poverty and a lack of funds and skills, to name but a few of the myriad forces and factors intersectionally at play in the achievement of Africa’s Development Goals. Notwithstanding the many issues, Africa has so much to celebrate and look forward to. That is why at H3ABioNet, with our passion for improving the status of Bioinformatics throughout the continent and our belief that young African scientists are the future of science, we use multi-layered approaches such as training and outreach for skills transfer. This year, with everyone working and studying from home due to the truly unprecedented global scourge and panic brought about by the COVID-19 pandemic, we had to become extra creative in how to maintain the sense of camaraderie, community and support for our members, affiliates and beyond. We organized the first ever Bioinformatics and Linguistics Poetry Competition, which was very successful. We are grateful to all the entrants and voters (see image below for the advert used).
Annually on the 25th of May, Africans from across the continent celebrate “Africa Day”, a day intended to mark the achievements of the African Union (formerly OAU), created on this fateful day in 1963 to fight oppressive regimes such as colonialism and apartheid. Using these celebrations as a stepping stone, on the 25th of May 2020, which was #AfricaDay, we as H3ABioNet invited staff and students from across the continent to celebrate Africa Day with us through our Bioinformatics and Linguistics Poetry Competition. This year (2020) was the first time such an event was held. It brought training and outreach, science, linguistics, art and culture together into a melting pot, amalgamating and exposing African talent as well as creativity. Eager participants were asked to write a short poem about #bioinformatics and its uses, benefits, challenges, etc. in the African context and continent. They were further asked to translate the word bioinformatics into their own mother tongue. The hashtags we used on both Twitter and Facebook were #H3ABIONETAFRICADAY and #AfricaDay2020. According to the United Nations, Africa has 54 countries with more than a thousand languages spoken. With such diversity, it is sad, and an indictment on us, that most words in the scientific jargon of fields such as bioinformatics do not have translations in local languages that most people can understand. It is thus my view that language is an important tool in education and science, so although the competition was a small drop in an ocean of solutions, it might spark something bigger.
How the competition ran: Entrants sent us a very short poem about bioinformatics and/or bioinformatics in Africa and translated the word #Bioinformatics into their own vernacular language. They shared it on social media (Twitter and Facebook) and tagged us using #H3ABIONETAFRICADAY. The most liked entry won a voucher (worth $50!) and is featured in our monthly newsletter below. There was so much interest that we had to give two prizes instead of one; Margo and Richard won (hearty congratulations to them). This is the first time this competition ran, and we will definitely run it again next year and in future years. There was a significant number of entries and likes, and much interest in this competition. We received over 14 entries, with thousands of votes on both Facebook and Twitter. We particularly enjoyed seeing the word bioinformatics translated into multiple African languages, and it is interesting to note that it usually becomes a whole sentence or two!
As mentioned earlier, we originally promised one prize, but given the quality of the entries and the number of votes we decided to have two winners. So congratulations to our winners, and thank you to everyone who entered the competition and made the first #H3ABIONETAFRICADAY Bioinformatics competition a resounding success. The winners who received their $50 Amazon vouchers are:
Richard Adeleke - Twitter winner
Margo Sabry - Facebook winner
Two additional outstanding submissions!
1. Tell us a bit about yourself
I am Fouzia Radouani. I have a PhD in Immunology, and I am a full-time researcher and head of the microbiology laboratory at Institut Pasteur du Maroc (IPM), Casablanca, Morocco. I’m the Principal Investigator (PI) of the H3ABioNet node at IPM. My research interest is in microbiology and infectious diseases. In addition, I am responsible for the implementation of the bioinformatics unit at the Institute. I started my science activities at Cadi Ayyad University, Marrakech, where I did my first degree in biological sciences. During my academic career, I had a great interest in life sciences. During this period, I tried within my group to explore the relationship between living individuals (humans, animals and plants) and their environment. Later, I chose immunology and prepared my PhD in that field.
2. Tell us a bit about your institution
Institut Pasteur du Maroc is a public health institution under the administration of the Ministry of Health. Its management is governance intensive and interfaces with administrative boards presided over by the Ministry of Health.
Institut Pasteur du Maroc has been in existence since 1911; it was first created in Tanger, a city in the north of Morocco. On November 15, 1929, the creation of the Institut Pasteur du Maroc in Casablanca was decided on the initiative of Dr. Emile Roux, director of the Institut Pasteur in Paris at that time. The Ministry of Health then set the missions of Institut Pasteur du Maroc (IPM): to promote and protect health in Morocco through research activities, to provide public health services, to develop knowledge for the Moroccan population, and to maintain health and food safety.
IPM conducts research in infectious diseases of parasites in animals, plants and humans. We therefore have various research topics in infectious diseases, human genetics, oncology, and venoms and toxins, and most recently we have started to develop bioinformatics. We are also active in capacity building, where we provide training in microbiology, parasitology and molecular biology. The institute, on a contractual basis, also lends its expertise in biological analysis and in food and animal safety. In addition to this, it provides advice to individuals, companies, and national and international institutions, and prepares or imports serums, vaccines, enzymes and organic products for use in national human and veterinary health facilities.
Institut Pasteur du Maroc is a member of the International Pasteur Institute Network (RIIP). It was connected to the network through a collaboration agreement, which was updated in March 2010. Lastly, IPM has been a member of the International Association of National Public Health Institutes (IANPHI) since 2006, and a member of the H3ABioNet network since 2012.
3. How did you get into bioinformatics?
My interest in bioinformatics started when I began generating huge amounts of data. As a microbiologist practicing research in infectious diseases, I’m dealing with pathogens. In our work, we study bacteria at the molecular level to understand biological mechanisms and to characterize pathogens. With all these activities, we felt a need to interpret, understand and explain the data we were generating. That is why we started attending courses and workshops to improve our knowledge of bioinformatics. During these events, we met colleagues who also showed interest in bioinformatics and we constituted a national committee; some members had international partners and collaborations. Together, we began organizing courses in Morocco and inviting experts. It was in these circumstances that we heard about the constitution of H3ABioNet and were invited to join the network. With this, we started the implementation of bioinformatics in the institution.
4. What are your research interests?
My first research interest is microbiology and infectious diseases. Later, I also became interested in bioinformatics. In the institute, I hold the responsibility of implementing the laboratory for the diagnosis of and research into chlamydiae and mycoplasma infections. We are exploring the relationship between these two bacteria and the diseases with which they are associated: sexually transmitted diseases, respiratory tract diseases, and cardiovascular diseases. So, my research is in public health, where I’m coordinating and managing research projects. I author and co-author publications, and our work is also presented at different national and international conferences.
My interest in bioinformatics has grown since I joined the H3ABioNet project, which helped us to implement a bioinformatics unit in our institution. Our main activities are data analysis in genomics and proteomics; we also started developing tools and databases during our participation in projects associated with H3ABioNet activities. We are also applying bioinformatics to understand and explore some biological mechanisms such as drug resistance. In addition, we supervise and provide training to MSc and PhD students to develop their research projects.
5. What does your typical workday look like?
First of all, I try to keep up my activities in the laboratory: I develop and validate lab protocols and monitor students’ activities. In addition, I set aside time to read and write articles, and to develop projects. On the other hand, I have the responsibility to handle administrative tasks such as preparing reports to deliver feedback to my administration, to the Ministry of Health, and to other extended organizations. Besides these, I attend meetings with researchers in my institution and/or collaborators. This is practically my routine workday; however, the administrative tasks are not done daily.
6. What do you enjoy most about your job?
I am a laboratory researcher; I like being in the laboratory to develop and optimize new techniques to apply in my research activities. Also, it is motivating when I prepare and submit a publication or project and it’s accepted. I feel great satisfaction when I succeed in managing and honoring my commitments to my collaborators and partners.
7. What do you enjoy least about your job?
When I cannot deliver what I have to deliver. This can happen sometimes when I have a lot of tasks to do at the same time, which feels like a contest to me. I don’t appreciate it when I assign tasks to my students or collaborators and they fail to honor them. I also don’t like it when people working with me don’t respect their commitments.
8. How has being a part of the H3ABioNet community impacted your research group?
I can’t ignore the way the H3ABioNet community has impacted my research group. Indeed, I am impressed by the people who are responsible for each task within the network. They are very cooperative, which attracted me to bioinformatics. Also, the way information and opportunities are shared is a form of open science, and through this network, a lot of people have been able to put their knowledge of bioinformatics into practice.
9. What advice would you give a young person that is interested in pursuing a career in bioinformatics?
In my opinion, bioinformatics is an interesting and relevant science. One therefore needs to develop knowledge in both of its components, biological sciences and informatics, so that there is a balance of knowledge in these fields. Since it is a science that can help biologists to understand and interpret results, I would advise those who wish to pursue a career in bioinformatics to try and foster their knowledge in the previously cited disciplines.
10. Final words
Collaborating and working with H3ABioNet network members is a source of great pride. Through H3ABioNet, we are not only gaining scientific and managerial improvement, but we also feel we have gained and constituted a very big family throughout Africa. It was launched by a small group who succeeded through perseverance, effort and sacrifice. Now we feel our network is growing, reaching all of Africa and the wider world. Therefore, as H3ABioNet members, we need to work on ways to maintain its sustainability and keep it going.