Toward robust automated cardiovascular arrhythmia detection using self-supervised learning and 1-dimensional vision transformers



  • Kaptoge, S. et al. World health organization cardiovascular disease risk charts: revised models to estimate risk in 21 global regions. The Lancet global health 7, e1332–e1345 (2019).


  • Benjamin, E. J. et al. Heart disease and stroke statistics-2018 update: a report from the american heart association. Circulation 137, e67–e492 (2018).


  • Wilkins, E. et al. European Cardiovascular Disease Statistics 2017 (Department for Health, University of Bath, Tech. Rep., 2017).

  • Hannun, A. Y. et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature medicine 25, 65–69. https://doi.org/10.1038/s41591-018-0268-3 (2019).


  • Xiong, P., Lee, S.M.-Y. & Chan, G. Deep learning for detecting and locating myocardial infarction by electrocardiogram: A literature review. Frontiers in cardiovascular medicine 9, 860032–860032. https://doi.org/10.3389/fcvm.2022.860032 (2022).


  • Hong, S., Zhou, Y., Shang, J., Xiao, C. & Sun, J. Opportunities and challenges of deep learning methods for electrocardiogram data: A systematic review. Computers in biology and medicine 122, 103801–103801. https://doi.org/10.1016/j.compbiomed.2020.103801 (2020).


  • Abdelazez, M., Rajan, S. & Chan, A. D. Automated biosignal quality analysis of electrocardiograms. IEEE Instrumentation & Measurement Magazine 24, 37–44 (2021).


  • Vaid, A. et al. A foundational vision transformer improves diagnostic performance for electrocardiograms. NPJ Digital Medicine 6, 108. https://doi.org/10.1038/s41746-023-00840-9 (2023).


  • Hu, R., Chen, J. & Zhou, L. Spatiotemporal self-supervised representation learning from multi-lead ecg signals. Biomed. Signal Process. Control 84. https://doi.org/10.1016/j.bspc.2023.104772 (2023).

  • Merdjanovska, E. & Rashkovska, A. Benchmarking deep learning methods for arrhythmia detection. In 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO) 356–361. https://doi.org/10.23919/MIPRO55190.2022.9803367 (2022).

  • Ribeiro, A. H. et al. Automatic diagnosis of the 12-lead ecg using a deep neural network. Nature communications 11, 1760 (2020).


  • Zheng, J., Guo, H. & Chu, H. A large scale 12-lead electrocardiogram database for arrhythmia study (version 1.0.0). PhysioNet (Accessed 23 November 2022). http://physionet.org/content/ecg-arrhythmia/1.0.0/ (2022).

  • Alday, E. A. P. et al. Classification of 12-lead ecgs: the physionet/computing in cardiology challenge 2020. Physiological measurement 41, 124003 (2020).


  • Wagner, P. et al. Ptb-xl, a large publicly available electrocardiography dataset. Scientific data 7, 154 (2020).


  • Yu, C. et al. Multi-level multi-type self-generated knowledge fusion for cardiac ultrasound segmentation. Information Fusion 92, 1–12 (2023).


  • Achanta, R. et al. Slic superpixels (Tech. Rep., EPFL, 2010).

  • Li, J. et al. An electrocardiogram foundation model built on over 10 million recordings with external evaluation across multiple domains. NEJM AI (2025). http://arxiv.org/abs/2410.04133.


  • Li, J., Liu, C., Cheng, S., Arcucci, R. & Hong, S. Frozen language model helps ecg zero-shot learning. In Medical Imaging with Deep Learning (MIDL) 1–14 (2023).

  • Wang, F., Xu, J. & Yu, L. From token to rhythm: A multi-scale approach for ecg-language pre-training. In International Conference on Machine Learning (ICML). http://arxiv.org/abs/2506.21803 (2025).

  • Zhao, W. X. et al. A survey of large language models. http://arxiv.org/abs/2303.18223 (2023).

  • Kaplan, J. et al. Scaling laws for neural language models. http://arxiv.org/abs/2001.08361 (2020).

  • Henighan, T. et al. Scaling laws for autoregressive generative modeling. arXiv preprint arXiv:2010.14701 (2020).

  • Sechidis, K., Tsoumakas, G. & Vlahavas, I. On the stratification of multi-label data. In Machine Learning and Knowledge Discovery in Databases 145–158 (2011).

  • Szymanski, P. & Kajdanowicz, T. A network perspective on stratification of multi-label data. In Torgo, L., Krawczyk, B., Branco, P. & Moniz, N. (eds.) Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications 74 22–35 (PMLR, ECML-PKDD, Skopje, Macedonia, 2017).

  • Strodthoff, N., Wagner, P., Schaeffter, T. & Samek, W. Deep learning for ecg analysis: Benchmarks and insights from ptb-xl. IEEE Journal of Biomedical and Health Informatics 25, 1519–1528 (2020).


  • Mehari, T. & Strodthoff, N. Self-supervised representation learning from 12-lead ecg data. Computers in biology and medicine 141, 105114–105114. https://doi.org/10.1016/j.compbiomed.2021.105114 (2022).


  • Gu, A. & Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. http://arxiv.org/abs/2312.00752 (2023).

  • Nie, Y., Nguyen, N. H., Sinthong, P. & Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. http://arxiv.org/abs/2211.14730 (2022).

  • Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Communications of the ACM 64, 107–115 (2021).


  • Liu, S., Niles-Weed, J., Razavian, N. & Fernandez-Granda, C. Early-learning regularization prevents memorization of noisy labels. Advances in neural information processing systems 33, 20331–20342 (2020).


  • de Vos, B. D., Jansen, G. E. & Išgum, I. Stochastic co-teaching for training neural networks with unknown levels of label noise. Scientific reports 13, 16875 (2023).


  • Doggart, P., Kennedy, A., Foreman, E., Finlay, D. & Bond, R. Automated identification of label errors in large electrocardiogram datasets. In 2022 Computing in Cardiology (CinC) 498 1–4 (IEEE, 2022).

  • Chuanjian, S. Zero-shot ecg classification with multimodal learning and test-time clinical knowledge enhancement. http://arxiv.org/abs/2403.06659 (2024).

  • Yang, S., Lian, C. & Zeng, Z. Masked autoencoder for ecg representation learning. In 2022 12th International Conference on Information Science and Technology (ICIST) 95–98. https://doi.org/10.1109/ICIST55546.2022.9926900 (IEEE, 2022)

  • Zhang, W., Geng, S. & Hong, S. Maefe: Masked autoencoders family of electrocardiogram for self-supervised pretraining and transfer learning. IEEE Access https://doi.org/10.1109/ACCESS.2022.3207089 (2022).


  • Multi-scale masked autoencoder for electrocardiogram anomaly detection. http://arxiv.org/abs/2502.05494 (2025).

  • Masked transformer for electrocardiogram classification. http://arxiv.org/abs/2309.07136 (2024).

  • Reading your heart: Learning ecg words and sentences via pre-training ecg language model. http://arxiv.org/abs/2502.10707 (2025).

  • Transforming ecg diagnosis: An in-depth review of transformer-based deep learning models in cardiovascular disease detection. http://arxiv.org/abs/2306.01249 (2023).

  • Deep learning for ecg arrhythmia detection and classification: an overview of progress for period 2017–2023. Front. Physiol. 14, 1246746. https://doi.org/10.3389/fphys.2023.1246746 (2023).

  • Chatterjee, M. Code repository: Toward robust automated cardiovascular arrhythmia detection using self-supervised learning and 1-dimensional vision transformers. https://github.com/Mitchell-Chatterjee/Robust-Automated-Cardiovascular-Arrhythmia-Detection (2026).

  • Perez Alday, E. et al. Classification of 12-lead ecgs: the PhysioNet/Computing in Cardiology Challenge 2020. Physiological measurement 41, 124003 (2020).

  • Zheng, J. et al. Optimal multi-stage arrhythmia classification approach. Scientific reports 10, 2898 (2020).


  • Stearns, M. Q., Price, C., Spackman, K. A. & Wang, A. Y. Snomed clinical terms: overview of the development process and project status. In Proceedings of the AMIA Symposium 662 (American Medical Informatics Association, 2001).

  • Moody, G. B., Muldrow, W. & Mark, R. G. A noise stress test for arrhythmia detectors. Computers in cardiology 11, 381–384 (1984).


  • Bao, H., Dong, L., Piao, S. & Wei, F. Beit: Bert pre-training of image transformers. In ICLR 2022–10th International Conference on Learning Representations (2022).

  • Dosovitskiy, A. et al. An image is worth 16×16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (2021).

  • Pratiher, S., Srivastava, A., Priyatha, Y. B., Ghosh, N. & Patra, A. A dilated residual vision transformer for atrial fibrillation detection from stacked time-frequency ecg representations. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings 1121–1125. https://doi.org/10.1109/ICASSP43922.2022.9747258 (2022).

  • Che, C., Zhang, P., Zhu, M., Qu, Y. & Jin, B. Constrained transformer network for ecg signal processing and arrhythmia classification. BMC medical informatics and decision making 21, 1–184. https://doi.org/10.1186/s12911-021-01546-2 (2021).


  • Choi, S. et al. Ecgbert: Understanding hidden language of ecgs with self-supervised representation learning. arXiv preprint arXiv:2306.06340 (2023).

  • Meng, L. et al. Enhancing dynamic ecg heartbeat classification with lightweight transformer model. Artif. Intell. Med. 124. https://doi.org/10.1016/j.artmed.2022.102236 (2022).

  • Nankani, D. & Baruah, R. D. Atrial fibrillation classification and prediction explanation using transformer neural network. In Proceedings of the International Joint Conference on Neural Networks. https://doi.org/10.1109/IJCNN55064.2022.9892286 (2022).

  • Natarajan, A. et al. A wide and deep transformer neural network for 12-lead ecg classification. In 2020 Computing in Cardiology 1–4. https://doi.org/10.22489/CinC.2020.107 (2020).

  • Vazquez-Rodriguez, J., Lefebvre, G., Cumin, J. & Crowley, J. L. Transformer-based self-supervised learning for emotion recognition. In 2022 26th International Conference on Pattern Recognition (ICPR) 2605–2612. https://doi.org/10.1109/ICPR56361.2022.9956027 (2022).

  • Wang, D. et al. Inter-patient ecg characteristic wave detection based on convolutional neural network combined with transformer. Biomed. Signal Process. Control 81. https://doi.org/10.1016/j.bspc.2022.104436 (2023).

  • Wu, Y., Daoudi, M. & Amad, A. Transformer-based self-supervised multimodal representation learning for wearable emotion recognition. IEEE Trans. Affect. Comput. (2023).

  • Liu, H., Zhao, Z. & She, Q. Self-supervised ecg pre-training. Biomed. Signal Process. Control 70. https://doi.org/10.1016/j.bspc.2021.103010 (2021).

  • Ericsson, L., Gouk, H., Loy, C. C. & Hospedales, T. M. Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Processing Magazine 39, 42–62. https://doi.org/10.1109/MSP.2021.3134634 (2022).


  • Zhang, W., Geng, S. & Hong, S. A simple self-supervised ecg representation learning method via manipulated temporal–spatial reverse detection. Biomedical signal processing and control 79, 104194. https://doi.org/10.1016/j.bspc.2022.104194 (2023).


  • Gedon, D., Ribeiro, A. H., Wahlström, N. & Schön, T. B. First steps towards self-supervised pretraining of the 12-lead ecg. In 2021 Computing in Cardiology (CinC) 48, 1–4. https://doi.org/10.23919/CinC53138.2021.9662748 (2021).

  • Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. http://arxiv.org/abs/1810.04805 (2018).

  • Conover, M. B. Understanding Electrocardiography (Elsevier Health Sciences, 2002).

  • Hu, E. J. et al. Lora: Low-rank adaptation of large language models. http://arxiv.org/abs/2106.09685 (2021).

  • Dettmers, T., Pagnoni, A., Holtzman, A. & Zettlemoyer, L. Qlora: Efficient finetuning of quantized llms. Adv. Neural Inf. Process. Syst. 36 (2024).


