A Reproducible Pipeline for Preprocessing and Annotation of scRNA-seq Data Using Seurat and Scanpy

3rd International Conference on Chemo and BioInformatics, Kragujevac, September 25-26, 2025 (pp. 371-374)

AUTHOR(S): Vladimir Kovačević, Andreja Živić, Miloš Ivanović, Nevena Milivojević Dimitrijević, Marko Živanović

ABSTRACT:

Single-cell RNA sequencing (scRNA-seq) has become a versatile platform for dissecting cellular heterogeneity across biological conditions, yet standardized preprocessing and annotation pipelines are still lacking. We present a reproducible, modular workflow that combines the strengths of Seurat (R) and Scanpy (Python) to preprocess and annotate scRNA-seq data and prepare it for downstream analysis.

The workflow starts from raw count matrices obtained from multiple biological replicates or conditions. In Seurat, we perform initial quality control, remove low-quality cells, and annotate cell types by mapping them to a reference scRNA-seq atlas. The annotated data are then converted to AnnData format for a seamless transition to the Scanpy framework. In Scanpy, the remaining steps are performed: normalization, feature selection, dimensionality reduction (PCA, UMAP), and assessment of batch effects. The resulting data structure supports flexible downstream analyses, including differential expression and pathway enrichment.
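The QC filtering and the Scanpy-side preprocessing described above can be illustrated on a toy count matrix. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the mitochondrial gene prefix `MT-` and the default thresholds (`min_genes=200`, `max_pct_mt=20.0`) are common heuristics chosen for illustration, and gene selection ranks genes by plain variance of log-normalized values rather than by Scanpy's default dispersion-based method. In Scanpy itself these steps would roughly correspond to `sc.pp.calculate_qc_metrics`, boolean filtering of cells, `sc.pp.normalize_total`, `sc.pp.log1p`, `sc.pp.highly_variable_genes` and `sc.tl.pca`.

```python
import numpy as np


def qc_metrics(counts, gene_names, mt_prefix="MT-"):
    """Per-cell QC metrics for a cells x genes raw count matrix."""
    total_counts = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)  # genes detected per cell
    # Assumption: mitochondrial genes are identified by a shared name prefix.
    is_mt = np.array([g.upper().startswith(mt_prefix) for g in gene_names])
    pct_mt = 100.0 * counts[:, is_mt].sum(axis=1) / np.maximum(total_counts, 1)
    return total_counts, n_genes, pct_mt


def filter_cells(counts, gene_names, min_genes=200, max_pct_mt=20.0):
    """Drop low-quality cells: too few detected genes or too many MT counts."""
    _, n_genes, pct_mt = qc_metrics(counts, gene_names)
    keep = (n_genes >= min_genes) & (pct_mt <= max_pct_mt)
    return counts[keep], keep


def normalize_log1p(counts, target_sum=None):
    """Scale each cell to a common total (median by default), then log1p."""
    totals = counts.sum(axis=1, keepdims=True).astype(float)
    if target_sum is None:
        target_sum = float(np.median(totals))
    return np.log1p(counts / np.maximum(totals, 1.0) * target_sum)


def top_variable_genes(logx, n_top):
    """Indices of the n_top genes with the highest variance (simplified HVG)."""
    return np.argsort(logx.var(axis=0))[::-1][:n_top]


def pca_scores(x, n_comps):
    """Principal-component scores via SVD of the column-centered matrix."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_comps] * s[:n_comps]


# Toy data: 100 cells x 500 genes; the first 10 genes are "mitochondrial".
rng = np.random.default_rng(0)
genes = [f"MT-{i}" for i in range(10)] + [f"GENE{i}" for i in range(490)]
counts = rng.poisson(1.0, size=(100, 500))
counts[:5, :10] += 50  # simulate 5 damaged cells dominated by MT counts

# Lower gene threshold than the default because the toy cells are small.
kept, mask = filter_cells(counts, genes, min_genes=100, max_pct_mt=20.0)
logx = normalize_log1p(kept)
hvg = top_variable_genes(logx, n_top=50)
pcs = pca_scores(logx[:, hvg], n_comps=10)
print(kept.shape, pcs.shape)
```

Scaling each cell to the median library size before `log1p` mirrors what `sc.pp.normalize_total` does when `target_sum` is left unset. UMAP embedding and batch-effect assessment would then be computed on the principal-component scores rather than on the full gene matrix.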

This pipeline ensures interoperability, reproducibility, and transparency, and is particularly suited to collaborative settings and comparative analyses. All preprocessing steps are thoroughly documented and parameterized, so they can be readily adapted to a range of datasets and research questions.
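One way to make a parameterized run both adaptable and reproducible is to collect its settings in a single validated object that is saved alongside the results. The sketch below assumes a Python dataclass; the class name `PreprocessParams`, its fields, and its default values are hypothetical and are not taken from the pipeline described here.

```python
import json
from dataclasses import asdict, dataclass, fields, replace


@dataclass(frozen=True)
class PreprocessParams:
    """Hypothetical run parameters; names and defaults are illustrative only."""
    min_genes: int = 200        # QC: minimum detected genes per cell
    max_pct_mt: float = 20.0    # QC: maximum % of counts from MT genes
    n_top_genes: int = 2000     # feature selection
    n_pcs: int = 50             # dimensionality reduction
    batch_key: str = "sample"   # column holding replicate/condition labels

    def __post_init__(self):
        # Fail early on values that cannot be meaningful.
        if self.min_genes < 0:
            raise ValueError("min_genes must be >= 0")
        if not 0.0 <= self.max_pct_mt <= 100.0:
            raise ValueError("max_pct_mt must lie in [0, 100]")
        if self.n_top_genes < 1 or self.n_pcs < 1:
            raise ValueError("n_top_genes and n_pcs must be >= 1")

    @classmethod
    def from_json(cls, text):
        """Build parameters from JSON, rejecting unknown keys (likely typos)."""
        data = json.loads(text)
        unknown = set(data) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        return cls(**data)

    def to_json(self):
        """Serialize so the exact settings can be stored next to the outputs."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Per-dataset override: stricter MT filter and more variable genes.
params = PreprocessParams.from_json('{"max_pct_mt": 10.0, "n_top_genes": 3000}')
# A variant for a sensitivity check; `params` itself is left unchanged.
variant = replace(params, n_pcs=30)
print(params.to_json())
```

Storing the output of `to_json()` with the processed data records exactly which thresholds produced a given result, and rejecting unknown keys catches misspelled parameter names that would otherwise be silently ignored.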

KEYWORDS:

single-cell RNA sequencing, peripheral blood mononuclear cells, gene expression, pipeline, Seurat, Scanpy

ACKNOWLEDGEMENT:

This work was supported by the EIT's Higher Education Initiative SMART-2M, DEEPTECH-2M and A-SIDE projects, coordinated by EIT RawMaterials and funded by the European Union; by the i-GREENPHARM project (HORIZON-MSCA-2023-SE-01-01, Grant No. 101182850); and by the Ministry of Education and the Ministry of Science, Technological Development and Innovation of the Republic of Serbia (Grant No. 451-03-136/2025-03/200122).
