Training materials

N.B. This page is under development.

This page catalogues training materials for students and researchers seeking to apply AI in the biological sciences. Eventually it will contain three sections:

  • Computing for Biologists, covering introductory material on programming for beginners;
  • Biology for Computational Scientists, introducing key biological concepts;
  • Interdisciplinary Materials, looking at the application of specific techniques to specific domains.

Pick your stream, browse the materials and find an entry point that works for you.

The page and some of the materials are a work in progress. For now we mostly reference outside courses from well-established sources, but as time goes on we will be developing our own materials and publishing them here.

Computing for Biologists

For biologists seeking to learn programming and computational skills, we recommend courses from The Carpentries, particularly Software Carpentry and its sister project Data Carpentry.

Software Carpentry is a volunteer project dedicated to teaching basic computing skills to researchers.

The Software Carpentry courses introduce programming concepts, tools and techniques in a very general way. The core curriculum covers the Unix shell and version control with Git, and then branches into Python and R. Additional lessons cover automation and the Make utility, programming with MATLAB, and working with SQL databases.

The lessons are designed to be taught as workshops, but can be followed in a self-directed way. All course materials are available online, including detailed explanations, installation and setup instructions, and exercises with solutions.

Core Curriculum

The core curriculum covers the Unix shell and version control with Git, and then branches into Python and R.

Lesson
Maintainers
Gerard Capes, Jacob Deppen, Benson Muite
Indraneel Chakraborty, Toan Phung
Rohit Goswami, Hugo Gruson, Isaac Jennings
Matthieu Bruneaux, Sehrish Kanwal, Naupaka Zimmerman

Additional Lessons

Additional lessons cover automation and the Make utility, programming with MATLAB, and working with SQL databases. We also include here a lesson on Image Processing from Data Carpentry, which teaches the concepts and techniques involved in working with images in Python.

Lesson
Maintainer(s)
The Carpentries is looking for maintainers for this lesson. Please contact curriculum@carpentries.org if you are interested.
Jacob Deppen, Toby Hodges, Kimberly Meechan, Ulf Schiller, Robert Turner

Biology for Computational Scientists

Interdisciplinary Materials

This section contains lessons and resources for applying computing skills to problems in the biosciences, organised by source.

  • Data Carpentry provides Genomics and Ecology curricula that build on the computing skills covered in Software Carpentry (see Computing for Biologists above) by teaching domain-specific data skills. In some cases the lessons also serve as beginner-level introductions to a particular language or tool. There is also an Image Processing lesson, teaching the concepts and techniques involved in working with images.
  • Rosalind is a learning and practice resource providing automatically assessed bitesize coding challenges based on problems from molecular biology.
  • Bioconductor is a set of open-source R libraries for bioinformatics. The Bioconductor project publishes its own lessons, some of which are based on the Software Carpentry curriculum.

 

Data Carpentry

Genomics Curriculum

The Genomics curriculum covers project organisation and management, data wrangling and processing, and provides introductions to command line tools and cloud computing. There is also a beginner-level, R-specific lesson on data analysis and visualisation (currently in beta), which can be used as an introduction to the language.

Lesson
Maintainer(s)
Anuj Guruacharya, Travis Wrightsman
Valentina Hurtado-McCormick, Paul Smith
Yuka Takemon, Jason Williams, Naupaka Zimmerman
Lesson
Maintainer(s)
Fabrice Rwasimitana, Juan Ugalde, Ethan White
Tajudeen Akanbi Akinosho, Luis J Villanueva
James Foster, Adam Mansur
James Azam, Nikki Gentle, Doug Joubert, Jay Lee, Elif Dede Yildirim
Sarah Pohl, David Palmquist, Carlos Rodrigues

Rosalind

Rosalind is designed to help people learn bioinformatics by solving bite-sized coding problems that increase in biological and computational complexity. Read the background and problem description, code your solution, then download an example dataset, generate the answer, and submit it for automatic assessment.

Rosalind is inspired by Project Euler, Google Code Jam, and the ever-growing movement of free online courses.

Problems are grouped into tracks. Most of the problems are language-agnostic, but there is an introductory programming track designed to be completed in Python. The Bioinformatics Stronghold contains the bulk of the problems, ranging from trivial to complex. The Bioinformatics Armory covers widely used tools and file formats.

A free account is needed to download datasets and submit solutions for assessment.
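
To give a flavour of the workflow described above, here is a minimal Python sketch of a challenge in the Rosalind style: counting how often each nucleotide occurs in a DNA string. The function name, the sample string and the output format (four space-separated integers) are assumptions made for illustration, not taken from the Rosalind site; on Rosalind itself you would read the downloaded dataset instead of a hard-coded string.

```python
from collections import Counter


def count_nucleotides(dna: str) -> str:
    """Return the counts of A, C, G and T in `dna`, separated by spaces."""
    # Normalise the input: drop surrounding whitespace and ignore case.
    counts = Counter(dna.strip().upper())
    # Counter returns 0 for a base that never appears.
    return " ".join(str(counts[base]) for base in "ACGT")


if __name__ == "__main__":
    # Hypothetical sample input standing in for a downloaded dataset.
    sample = "AGCTTTTCATTCTGACTGCA"
    print(count_nucleotides(sample))  # prints "4 5 3 8"
```

The printed line is what you would paste into the submission form for automatic assessment.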

Bioconductor

Bioconductor is a set of open-source R libraries for bioinformatics. The Bioconductor project publishes its own lessons, some of which are based on the Software Carpentry curriculum. The Training and Education section of the Bioconductor website has three modules:

  • bioc-intro: The Data science lesson is based on the Carpentries Ecology Curriculum. There are no prerequisites for this module, and the materials assume no prior knowledge of R or Bioconductor. It introduces R and RStudio, teaches data cleaning, management, analysis and visualisation, and introduces some Bioconductor concepts. Notes are collated in bioc-intro.md in this repo.
  • bioc-rnaseq: Analysis and interpretation of bulk RNA-Sequencing data using Bioconductor shows how to use Bioconductor packages to analyse RNA-Seq data. It expects good familiarity with R and the Bioconductor project.
  • bioc-project: The Bioconductor project lesson introduces the Bioconductor project, covering the home page, packages, package landing pages and vignettes, where to find help, workflows, the release schedule and versions, and some core infrastructure. It is meant to be used in combination with other modules as part of a wider workshop.

More advanced lessons can be found under Courses and Conference Materials.