Toward robust automated cardiovascular arrhythmia detection using self-supervised learning and 1-dimensional vision transformers



  • Kaptoge, S. et al. World health organization cardiovascular disease risk charts: revised models to estimate risk in 21 global regions. The Lancet global health 7, e1332–e1345 (2019).


  • Benjamin, E. J. et al. Heart disease and stroke statistics-2018 update: a report from the american heart association. Circulation 137, e67–e492 (2018).


  • Wilkins, E. et al. European Cardiovascular Disease Statistics 2017 (Department for Health, University of Bath, Tech. Rep., 2017).

  • Hannun, A. Y. et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature medicine 25, 65–69. https://doi.org/10.1038/s41591-018-0268-3 (2019).


  • Xiong, P., Lee, S.M.-Y. & Chan, G. Deep learning for detecting and locating myocardial infarction by electrocardiogram: A literature review. Frontiers in cardiovascular medicine 9, 860032–860032. https://doi.org/10.3389/fcvm.2022.860032 (2022).


  • Hong, S., Zhou, Y., Shang, J., Xiao, C. & Sun, J. Opportunities and challenges of deep learning methods for electrocardiogram data: A systematic review. Computers in biology and medicine 122, 103801–103801. https://doi.org/10.1016/j.compbiomed.2020.103801 (2020).


  • Abdelazez, M., Rajan, S. & Chan, A. D. Automated biosignal quality analysis of electrocardiograms. IEEE Instrumentation & Measurement Magazine 24, 37–44 (2021).


  • Vaid, A. et al. A foundational vision transformer improves diagnostic performance for electrocardiograms. NPJ Digital Medicine 6, 108. https://doi.org/10.1038/s41746-023-00840-9 (2023).


  • Hu, R., Chen, J. & Zhou, L. Spatiotemporal self-supervised representation learning from multi-lead ecg signals. Biomed. Signal Process. Control 84. https://doi.org/10.1016/j.bspc.2023.104772 (2023).

  • Merdjanovska, E. & Rashkovska, A. Benchmarking deep learning methods for arrhythmia detection. In 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO) 356–361. https://doi.org/10.23919/MIPRO55190.2022.9803367 (2022).

  • Ribeiro, A. H. et al. Automatic diagnosis of the 12-lead ecg using a deep neural network. Nature communications 11, 1760 (2020).


  • Zheng, J., Guo, H. & Chu, H. A large scale 12-lead electrocardiogram database for arrhythmia study (version 1.0.0). PhysioNet (Accessed 23 November 2022). http://physionet.org/content/ecg-arrhythmia/1.0.0/ (2022).

  • Alday, E. A. P. et al. Classification of 12-lead ecgs: the physionet/computing in cardiology challenge 2020. Physiological measurement 41, 124003 (2020).


  • Wagner, P. et al. Ptb-xl, a large publicly available electrocardiography dataset. Scientific data 7, 154 (2020).


  • Yu, C. et al. Multi-level multi-type self-generated knowledge fusion for cardiac ultrasound segmentation. Information Fusion 92, 1–12 (2023).


  • Achanta, R. et al. Slic superpixels (Tech. Rep., EPFL, 2010).

  • Li, J. et al. An electrocardiogram foundation model built on over 10 million recordings with external evaluation across multiple domains. NEJM AI (2025). http://arxiv.org/abs/2410.04133.


  • Li, J., Liu, C., Cheng, S., Arcucci, R. & Hong, S. Frozen language model helps ecg zero-shot learning. In Medical Imaging with Deep Learning (MIDL) 1–14 (2023).

  • Wang, F., Xu, J. & Yu, L. From token to rhythm: A multi-scale approach for ecg-language pre-training. In International Conference on Machine Learning (ICML). http://arxiv.org/abs/2506.21803 (2025).

  • Zhao, W. X. et al. A survey of large language models. http://arxiv.org/abs/2303.18223 (2023).

  • Kaplan, J. et al. Scaling laws for neural language models. http://arxiv.org/abs/2001.08361 (2020).

  • Henighan, T. et al. Scaling laws for autoregressive generative modeling. arXiv preprint arXiv:2010.14701 (2020).

  • Sechidis, K., Tsoumakas, G. & Vlahavas, I. On the stratification of multi-label data. In Machine Learning and Knowledge Discovery in Databases 145–158 (2011).

  • Szymanski, P. & Kajdanowicz, T. A network perspective on stratification of multi-label data. In Torgo, L., Krawczyk, B., Branco, P. & Moniz, N. (eds.) Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications 74 22–35 (PMLR, ECML-PKDD, Skopje, Macedonia, 2017).

  • Strodthoff, N., Wagner, P., Schaeffter, T. & Samek, W. Deep learning for ecg analysis: Benchmarks and insights from ptb-xl. IEEE Journal of Biomedical and Health Informatics 25, 1519–1528 (2020).


  • Mehari, T. & Strodthoff, N. Self-supervised representation learning from 12-lead ecg data. Computers in biology and medicine 141, 105114–105114. https://doi.org/10.1016/j.compbiomed.2021.105114 (2022).


  • Gu, A. & Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. http://arxiv.org/abs/2312.00752 (2023).

  • Nie, Y., Nguyen, N. H., Sinthong, P. & Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. http://arxiv.org/abs/2211.14730 (2022).

  • Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Communications of the ACM 64, 107–115 (2021).


  • Liu, S., Niles-Weed, J., Razavian, N. & Fernandez-Granda, C. Early-learning regularization prevents memorization of noisy labels. Advances in neural information processing systems 33, 20331–20342 (2020).


  • de Vos, B. D., Jansen, G. E. & Išgum, I. Stochastic co-teaching for training neural networks with unknown levels of label noise. Scientific reports 13, 16875 (2023).


  • Doggart, P., Kennedy, A., Foreman, E., Finlay, D. & Bond, R. Automated identification of label errors in large electrocardiogram datasets. In 2022 Computing in Cardiology (CinC) 498 1–4 (IEEE, 2022).

  • Chuanjian, S. Zero-shot ecg classification with multimodal learning and test-time clinical knowledge enhancement. http://arxiv.org/abs/2403.06659 (2024).

  • Yang, S., Lian, C. & Zeng, Z. Masked autoencoder for ecg representation learning. In 2022 12th International Conference on Information Science and Technology (ICIST) 95–98. https://doi.org/10.1109/ICIST55546.2022.9926900 (IEEE, 2022)

  • Zhang, W., Geng, S. & Hong, S. Maefe: Masked autoencoders family of electrocardiogram for self-supervised pretraining and transfer learning. IEEE Access https://doi.org/10.1109/ACCESS.2022.3207089 (2022).


  • Multi-scale masked autoencoder for electrocardiogram anomaly detection. http://arxiv.org/abs/2502.05494 (2025).

  • Masked transformer for electrocardiogram classification. http://arxiv.org/abs/2309.07136 (2024).

  • Reading your heart: Learning ecg words and sentences via pre-training ecg language model. http://arxiv.org/abs/2502.10707 (2025).

  • Transforming ecg diagnosis: An in-depth review of transformer-based deep learning models in cardiovascular disease detection. http://arxiv.org/abs/2306.01249 (2023).

  • Deep learning for ecg arrhythmia detection and classification: an overview of progress for period 2017–2023. Front. Physiol. 14, 1246746. https://doi.org/10.3389/fphys.2023.1246746 (2023).

  • Chatterjee, M. Code repository: Toward robust automated cardiovascular arrhythmia detection using self-supervised learning and 1-dimensional vision transformers. https://github.com/Mitchell-Chatterjee/Robust-Automated-Cardiovascular-Arrhythmia-Detection (2026).

  • Perez Alday, E. et al. Classification of 12-lead ecgs: the PhysioNet/Computing in Cardiology Challenge 2020. Physiological measurement 41, 124003 (2020).

  • Zheng, J. et al. Optimal multi-stage arrhythmia classification approach. Scientific reports 10, 2898 (2020).


  • Stearns, M. Q., Price, C., Spackman, K. A. & Wang, A. Y. Snomed clinical terms: overview of the development process and project status. In Proceedings of the AMIA Symposium 662 (American Medical Informatics Association, 2001).

  • Moody, G. B., Muldrow, W. & Mark, R. G. A noise stress test for arrhythmia detectors. Computers in cardiology 11, 381–384 (1984).


  • Bao, H., Dong, L., Piao, S. & Wei, F. Beit: Bert pre-training of image transformers. In ICLR 2022–10th International Conference on Learning Representations (2022).

  • Dosovitskiy, A. et al. An image is worth 16×16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (2021).

  • Pratiher, S., Srivastava, A., Priyatha, Y. B., Ghosh, N. & Patra, A. A dilated residual vision transformer for atrial fibrillation detection from stacked time-frequency ecg representations. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings 1121–1125. https://doi.org/10.1109/ICASSP43922.2022.9747258 (2022).

  • Che, C., Zhang, P., Zhu, M., Qu, Y. & Jin, B. Constrained transformer network for ecg signal processing and arrhythmia classification. BMC medical informatics and decision making 21, 1–184. https://doi.org/10.1186/s12911-021-01546-2 (2021).


  • Choi, S. et al. Ecgbert: Understanding hidden language of ecgs with self-supervised representation learning. arXiv preprint arXiv:2306.06340 (2023).

  • Meng, L. et al. Enhancing dynamic ecg heartbeat classification with lightweight transformer model. Artif. Intell. Med. 124. https://doi.org/10.1016/j.artmed.2022.102236 (2022).

  • Nankani, D. & Baruah, R. D. Atrial fibrillation classification and prediction explanation using transformer neural network. In Proceedings of the International Joint Conference on Neural Networks. https://doi.org/10.1109/IJCNN55064.2022.9892286 (2022).

  • Natarajan, A. et al. A wide and deep transformer neural network for 12-lead ecg classification. In 2020 Computing in Cardiology 1–4. https://doi.org/10.22489/CinC.2020.107 (2020).

  • Vazquez-Rodriguez, J., Lefebvre, G., Cumin, J. & Crowley, J. L. Transformer-based self-supervised learning for emotion recognition. In 2022 26th International Conference on Pattern Recognition (ICPR) 2605–2612. https://doi.org/10.1109/ICPR56361.2022.9956027 (2022).

  • Wang, D. et al. Inter-patient ecg characteristic wave detection based on convolutional neural network combined with transformer. Biomed. Signal Process. Control 81. https://doi.org/10.1016/j.bspc.2022.104436 (2023).

  • Wu, Y., Daoudi, M. & Amad, A. Transformer-based self-supervised multimodal representation learning for wearable emotion recognition. IEEE Trans. Affect. Comput. (2023).

  • Liu, H., Zhao, Z. & She, Q. Self-supervised ecg pre-training. Biomed. Signal Process. Control 70. https://doi.org/10.1016/j.bspc.2021.103010 (2021).

  • Ericsson, L., Gouk, H., Loy, C. C. & Hospedales, T. M. Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Processing Magazine 39, 42–62. https://doi.org/10.1109/MSP.2021.3134634 (2022).


  • Zhang, W., Geng, S. & Hong, S. A simple self-supervised ecg representation learning method via manipulated temporal–spatial reverse detection. Biomedical signal processing and control 79, 104194. https://doi.org/10.1016/j.bspc.2022.104194 (2023).


  • Gedon, D., Ribeiro, A. H., Wahlström, N. & Schön, T. B. First steps towards self-supervised pretraining of the 12-lead ecg. In 2021 Computing in Cardiology (CinC) 48, 1–4. https://doi.org/10.23919/CinC53138.2021.9662748 (2021).

  • Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. http://arxiv.org/abs/1810.04805 (2018).

  • Conover, M. B. Understanding Electrocardiography (Elsevier Health Sciences, 2002).

  • Hu, E. J. et al. Lora: Low-rank adaptation of large language models. http://arxiv.org/abs/2106.09685 (2021).

  • Dettmers, T., Pagnoni, A., Holtzman, A. & Zettlemoyer, L. Qlora: Efficient finetuning of quantized llms. Adv. Neural Inf. Process. Syst. 36 (2024).


