Dear all,
Welcome to the EpiMed Open Course initiative.
We look forward to reading your questions and comments!
We hope that this remote interactive course will spark dynamic and constructive exchanges on the bioinformatics and biostatistics topics that interest us all, and that we will all learn from one another.
Best wishes to you all,
Katia, Florent and Sophie
Sessions 1 and 2 - Linear models
23 March - 3 April 2020
For the first two weeks, we plan to establish a common framework for the main questions usually addressed in statistics: linear models.
We propose the following plan:
- First, watch the following two videos:
  - Linear Models Pt.1 - Linear Regression
  - Linear Models Pt.2 - t-tests and ANOVA
- Digest the various notions at your own pace.
- When you are ready and wish to go into more detail, please ask your questions on the mailing list (in English or in French, as you prefer).
- We (or anybody who has something to say about the subject) will answer your questions collectively on the mailing list, suggest more web links …
To go further (in French):
- Régression linéaire simple
- Régression linéaire multiple
- Anova à un facteur
- Anova à 2 facteurs sans répétition
- Anova à 2 facteurs avec interaction
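As a small illustration of why linear regression and t-tests fit in the same framework, here is a minimal Python sketch using only the standard library. The data values and the helper name `fit_line` are made up for this example and do not come from the videos. It fits a least-squares line to a 0/1 group indicator: the intercept is the mean of the first group and the slope is the difference between the two group means, which is the quantity a two-sample t-test examines.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up measurements for two groups (illustrative values only).
control = [4.1, 3.8, 4.4, 4.0]
treated = [5.2, 5.6, 4.9, 5.5]

# Encode group membership as a 0/1 variable and regress the values on it.
group = [0] * len(control) + [1] * len(treated)
values = control + treated
intercept, slope = fit_line(group, values)

# The intercept matches the control mean; the slope matches the difference of means.
print("intercept:", round(intercept, 3), "control mean:", round(mean(control), 3))
print("slope:", round(slope, 3), "difference of means:", round(mean(treated) - mean(control), 3))
```

The same function also fits an ordinary regression line when `x` is a continuous variable instead of a group indicator.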
Session 3 - Presentation of tools developed by EpiMed
6 - 10 April 2020
In this session, we would like to share with you some tools that we developed to simplify our work with omics data. We hope that these tools will also be useful in your projects.
3.1. Biomarker discovery and general presentation of EpiMed pipelines and tools
Computational translational epigenetics: concept-driven omics analyses.
- The awakening of silent genes in malignancies: a new biomarker discovery strategy
- Concept driven omics analyses: EpiMed information system and pipelines
3.2. EpiMed Database - Update a list of gene symbols
EpiMed Database is a web application for gene-related and clinical meta-data processing. In this first video, we explain how to update or convert a list of gene symbols with EpiMed Database.
- Watch the tutorial: Update a list of gene symbols (16 min)
- Download slides: slides
- Answer the Quiz questions: Quiz
- If you are an R or Python user, you may also be interested in the API query for gene annotation. Please visit this documentation page, which explains how to use EpiMed Database directly from your code.
- If you have questions or comments, please contact us via the mailing list.
3.3. Manage public omics data from NCBI GEO
EpiMed developed specific tools to simplify managing omics data and corresponding bio-clinical annotations from popular public repositories, for example, NCBI GEO.
- Tutorial: Manage public omics data from NCBI GEO (19 min)
- Download slides: slides
- Tools mentioned in the tutorial:
- R package epimedtools (Florent Chuffart)
- EpiMed Database: bio-clinical annotations (Ekaterina Flin)
- For R or Python users, an API is available to query the clinical data of EpiMed Database directly from your code. Please visit this documentation page.
Session 4 - Artificial Intelligence for omics
13 - 17 April 2020
Welcome to Session 4 of the EpiMed Open Course, introducing AI for omics.
The organization of the session is explained in the video below:
- Presentation of the session and organization notes (11 min, in French).
The session is structured in 5 theoretical parts with several videos to watch. Some of them are optional, depending on your available time and interest.
To complement the theory, we invite you to study a use case with us: leukemia classification from real transcriptomic data. In this use case, we will see how the data should be prepared for machine learning training and how different ML algorithms perform on the same dataset (see parts 4.3 and 4.4).
If you have questions or comments, please contact us via the mailing list.
4.1. Introduction to Artificial Intelligence and Machine Learning
- A Gentle Introduction to Machine Learning (13 min)
- Omics data analysis towards precision medicine: a machine learning approach (45 min)
4.2. Unsupervised Machine Learning and visualization of omics data
- Principal Component Analysis (PCA) (22 min)
- Optional: t-SNE (12 min)
- Hierarchical Clustering (12 min)
4.3. Machine Learning fundamentals
- Bias and Variance (7 min)
- Cross Validation (5 min)
- Use case of leukemia classification - Part 1. Omics data challenges, variable reduction, data normalization and cross-validation. (45 min, narration in French, slides in English)
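To make the link between normalization and cross-validation concrete, here is a minimal Python sketch using only the standard library. It is not the code from the use case: the function names `k_fold_indices` and `standardize` and the values are hypothetical. It shows the usual precaution of computing normalization parameters on the training folds only, then applying them to the held-out fold.

```python
from statistics import mean, pstdev


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k folds of near-equal size (no shuffling)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def standardize(train, test):
    """Center and scale both sets with the mean and standard deviation of the training set only."""
    mu = mean(train)
    sd = pstdev(train) or 1.0  # avoid dividing by zero for a constant variable
    return [(v - mu) / sd for v in train], [(v - mu) / sd for v in test]


# Made-up values of a single variable for six samples (illustrative only).
values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

for test_idx in k_fold_indices(len(values), k=3):
    train = [v for i, v in enumerate(values) if i not in test_idx]
    test = [values[i] for i in test_idx]
    train_z, test_z = standardize(train, test)
    # A model would be trained on train_z and evaluated on test_z here.
    print("held-out samples:", test_idx, "scaled values:", [round(z, 2) for z in test_z])
```

In practice the samples are usually shuffled before splitting, and the same principle applies to any preprocessing step that learns parameters from the data, such as variable selection.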
4.4. Supervised Machine Learning
- Logistic Regression (9 min)
- Support Vector Machines (SVM) (21 min)
- Optional: SVM with Polynomial Kernel (8 min)
- Optional: SVM with RBF Kernel (16 min)
- Random Forest (10 min)
- Use case of leukemia classification - Part 2. Training machine learning algorithms. (38 min, narration in French, slides in English)
4.5. Optional: Introduction to Deep Learning (Neural Networks)
The MIT course linked below is one of the best Deep Learning courses available online. It is clearly explained and updated every year. We suggest that you watch the first, introductory lecture. If you are interested in Neural Networks, we highly recommend following the entire course.
Web resources
- https://bioinfo-fr.net
- https://statquest.org
- https://www.fun-mooc.fr/courses/course-v1:USPC+37028+session01/about
- http://math.agrocampus-ouest.fr/