Using transfer learning from prior reference knowledge to improve the clustering of single-cell RNA-Seq data

Bettina Mieth, James R.F. Hockley, Nico Görnitz, Marina M.C. Vidovic, Klaus Robert Müller, Alex Gutteridge, Daniel Ziemek

Research output: Contribution to journal › Article

Abstract

In many research areas scientists are interested in clustering objects within small datasets while making use of prior knowledge from large reference datasets. We propose a method to apply the machine learning concept of transfer learning to unsupervised clustering problems and show its effectiveness in the field of single-cell RNA sequencing (scRNA-Seq). The goal of scRNA-Seq experiments is often the definition and cataloguing of cell types from the transcriptional output of individual cells. To improve the clustering of small disease- or tissue-specific datasets, for which the identification of rare cell types is often problematic, we propose a transfer learning method to utilize large and well-annotated reference datasets, such as those produced by the Human Cell Atlas. Our approach modifies the dataset of interest while incorporating key information from the larger reference dataset via Non-negative Matrix Factorization (NMF). The modified dataset is subsequently provided to a clustering algorithm. We empirically evaluate the benefits of our approach on simulated scRNA-Seq data as well as on publicly available datasets. Finally, we present results for the analysis of a recently published small dataset and find improved clustering when transferring knowledge from a large reference dataset. Implementations of the method are available at https://github.com/nicococo/scRNA.
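The pipeline described in the abstract (factorize the reference data with NMF, use the result to modify the small dataset, then cluster) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the repository linked above. The function names `nmf` and `transfer`, the mixing weight `theta`, the genes-by-cells matrix layout, the shared gene rows, and the multiplicative-update solver are all assumptions made for illustration only.

```python
import numpy as np


def nmf(X, k, W=None, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative X (genes x cells) as W @ H.

    Uses multiplicative updates for the squared reconstruction error.
    If W is supplied, it is kept fixed and only H is fitted (k is then ignored).
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((X.shape[0], k)) + eps
    H = rng.random((W.shape[1], X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        if learn_W:
            W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def transfer(X_target, X_reference, k, theta=0.5):
    """Blend the target data with its reconstruction from reference factors.

    theta is a hypothetical mixing weight: 0 keeps the target data unchanged,
    1 uses only the reconstruction built from the reference dictionary.
    """
    W_ref, _ = nmf(X_reference, k)            # gene-level factors from the reference
    _, H_target = nmf(X_target, k, W=W_ref)   # target cells expressed in those factors
    return theta * (W_ref @ H_target) + (1.0 - theta) * X_target


# Synthetic stand-in data: 50 genes, 300 reference cells, 20 target cells.
rng = np.random.default_rng(42)
X_reference = rng.random((50, 300))
X_target = rng.random((50, 20))
X_mixed = transfer(X_target, X_reference, k=5, theta=0.7)
print(X_mixed.shape)  # (50, 20)
```

The modified matrix `X_mixed` would then be handed to whichever clustering algorithm is in use. Holding the reference factors fixed while fitting the target is one simple way to carry information across datasets. The paper and repository should be consulted for the actual formulation.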

Original language: English
Article number: 20353
Journal: Scientific Reports
Volume: 9
Issue number: 1
DOI: https://doi.org/10.1038/s41598-019-56911-z
Publication status: Published - 2019 Dec 1

Fingerprint

Cluster Analysis
RNA
RNA Sequence Analysis
Small Cytoplasmic RNA
Cataloging
Datasets
Transfer (Psychology)
Atlases
Research

ASJC Scopus subject areas

  • General

Cite this

Mieth, B., Hockley, J. R. F., Görnitz, N., Vidovic, M. M. C., Müller, K. R., Gutteridge, A., & Ziemek, D. (2019). Using transfer learning from prior reference knowledge to improve the clustering of single-cell RNA-Seq data. Scientific Reports, 9(1), Article 20353. https://doi.org/10.1038/s41598-019-56911-z

@article{faf17c7241c74e6b9768f6ea06d528d5,
title = "Using transfer learning from prior reference knowledge to improve the clustering of single-cell RNA-Seq data",
author = "Bettina Mieth and Hockley, {James R.F.} and Nico G{\"o}rnitz and Vidovic, {Marina M.C.} and M{\"u}ller, {Klaus Robert} and Alex Gutteridge and Daniel Ziemek",
year = "2019",
month = "12",
day = "1",
doi = "10.1038/s41598-019-56911-z",
language = "English",
volume = "9",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
number = "1",

}

TY - JOUR

T1 - Using transfer learning from prior reference knowledge to improve the clustering of single-cell RNA-Seq data

AU - Mieth, Bettina

AU - Hockley, James R.F.

AU - Görnitz, Nico

AU - Vidovic, Marina M.C.

AU - Müller, Klaus Robert

AU - Gutteridge, Alex

AU - Ziemek, Daniel

PY - 2019/12/1

Y1 - 2019/12/1

UR - http://www.scopus.com/inward/record.url?scp=85077220581&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85077220581&partnerID=8YFLogxK

U2 - 10.1038/s41598-019-56911-z

DO - 10.1038/s41598-019-56911-z

M3 - Article

C2 - 31889137

AN - SCOPUS:85077220581

VL - 9

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

IS - 1

M1 - 20353

ER -