Integrating machine learning with feature selection to predict treatment response from tumour profiles

A major challenge in biomarker discovery is the high dimensionality of multi-omics data. Most Machine Learning (ML) classifiers built with few data instances (e.g. 50-100 tumours) and a much higher number of features (e.g. 20,000 per tumour) strongly overfit these datasets. One route to mitigate this problem is to use more training data, but such data are usually not available. An alternative route is to consider only the most informative features in the data by Feature Selection (FS), typically discarding the many thousands of less informative features (hence strongly reducing data dimensionality while retaining most of the initial information content). This internship will compare a range of ML classifiers integrating FS on in vivo pharmaco-omic datasets. Synthetic datasets will also be analysed to understand how the number of predictive features, the total number of features and the number of data instances influence the considered methods. Some studies of the team in this area are:

https://www.frontiersin.org/articles/10.3389/fgene.2019.01041

https://www.biorxiv.org/content/10.1101/277772v2

https://doi.org/10.18632/oncotarget.20923
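The kind of synthetic-data comparison described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the team's pipeline: the data generator, the t-like univariate filter, the nearest-centroid classifier and all function names (`make_synthetic`, `select_top_k`, `cv_accuracy`) are hypothetical choices made for this example. Features are selected inside each cross-validation fold, using training samples only, so that the accuracy estimate is not inflated by information from the held-out tumours.

```python
import math
import random


def make_synthetic(n_samples, n_features, n_informative, shift=2.0, seed=0):
    """Two balanced classes; only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 0 if i < n_samples // 2 else 1
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if label == 1:
            for j in range(n_informative):
                row[j] += shift  # class 1 is shifted on the informative features
        X.append(row)
        y.append(label)
    return X, y


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def select_top_k(X, y, k):
    """Univariate filter: rank features by a t-like score, keep the k best."""
    scored = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        score = abs(_mean(a) - _mean(b)) / (_std(a) + _std(b) + 1e-12)
        scored.append((score, j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Assign each test row to the class with the closest training centroid."""
    centroids = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X_train, y_train) if l == lab]
        centroids[lab] = [_mean([r[j] for r in rows]) for j in features]
    predictions = []
    for row in X_test:
        dist = {
            lab: sum((row[j] - c) ** 2 for j, c in zip(features, centre))
            for lab, centre in centroids.items()
        }
        predictions.append(min(dist, key=dist.get))
    return predictions


def cv_accuracy(X, y, k_features=None, n_folds=5):
    """Cross-validated accuracy; k_features=None means no feature selection."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        test = [i for i in range(n) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in test], [y[i] for i in test]
        if k_features is None:
            features = list(range(len(X[0])))
        else:
            # Select on the training fold only, never on the held-out samples.
            features = select_top_k(X_tr, y_tr, k_features)
        preds = nearest_centroid_predict(X_tr, y_tr, X_te, features)
        correct += sum(p == t for p, t in zip(preds, y_te))
    return correct / n


if __name__ == "__main__":
    # Vary the total number of features while keeping 5 predictive ones.
    for n_features in (200, 2000):
        X, y = make_synthetic(60, n_features, n_informative=5)
        print(n_features, "features:",
              "all =", cv_accuracy(X, y),
              "top-5 =", cv_accuracy(X, y, k_features=5))
```

Changing `n_samples`, `n_features` and `n_informative` in `make_synthetic` gives a simple way to probe how each of the three quantities affects a classifier with and without FS; any real study would replace the toy filter and classifier with the methods under comparison.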
