Attention-Fusion-Based Two-Stream Vision Transformer for Heart Sound Classification

Authors: Ranipa K, Zhu WP, Swamy MNS


Affiliations

1 Department of Electrical and Computer Engineering, Concordia University, Montreal, QC H3G 1M8, Canada.

Description

Vision Transformers (ViTs), inspired by the success of Transformers in natural language processing, have recently gained attention for heart sound classification (HSC). However, most existing studies on HSC rely on single-stream architectures, overlooking the advantages of multi-resolution features. While multi-stream architectures employing early or late fusion strategies have been proposed, they often fall short of effectively capturing cross-modal feature interactions. Moreover, conventional fusion methods, such as concatenation, averaging, or max pooling, frequently result in information loss. To address these limitations, this paper presents a novel attention-fusion-based two-stream Vision Transformer (AFTViT) architecture for HSC that leverages two-dimensional mel-cepstral domain features. The proposed method employs a ViT-based encoder to capture long-range dependencies and diverse contextual information at multiple scales. A novel attention block then integrates cross-context features at the feature level, enhancing the overall representation. Experiments on the PhysioNet2016 and PhysioNet2022 datasets demonstrate that the AFTViT outperforms state-of-the-art CNN-based methods in terms of accuracy. These results highlight the potential of the AFTViT framework for early diagnosis of cardiovascular disease, offering a valuable tool for cardiologists and researchers developing advanced HSC techniques.
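The fusion idea the abstract outlines can be illustrated with a minimal PyTorch sketch. This is not the authors' AFTViT implementation: the dimensions, depths, class count, and the AttentionFusion module below are hypothetical placeholders, and the mel-cepstral token front end described in the paper is omitted. The sketch only shows the general pattern of two ViT-style encoder streams whose features are integrated by cross-attention at the feature level rather than by concatenation, averaging, or max pooling.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuses two token streams with cross-attention instead of concat/avg/max."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, a, b):
        # Tokens from stream A attend to stream B, so the fused representation
        # carries cross-context interactions that plain concatenation loses.
        fused, _ = self.cross_attn(query=a, key=b, value=b)
        return self.norm(a + fused)  # residual keeps stream-A content intact

class TwoStreamViT(nn.Module):
    def __init__(self, dim=192, depth=4, num_heads=4, num_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, num_heads, 4 * dim, batch_first=True)
        # nn.TransformerEncoder deep-copies `layer`, so the streams do not share weights.
        self.stream_a = nn.TransformerEncoder(layer, depth)  # e.g. fine-scale tokens
        self.stream_b = nn.TransformerEncoder(layer, depth)  # e.g. coarse-scale tokens
        self.fusion = AttentionFusion(dim, num_heads)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens_a, tokens_b):
        a = self.stream_a(tokens_a)          # (batch, n_tokens_a, dim)
        b = self.stream_b(tokens_b)          # (batch, n_tokens_b, dim)
        fused = self.fusion(a, b)            # feature-level fusion
        return self.head(fused.mean(dim=1))  # mean-pool tokens, then classify

# Hypothetical usage: 8 recordings, 64 fine and 16 coarse tokens of width 192.
model = TwoStreamViT()
logits = model(torch.randn(8, 64, 192), torch.randn(8, 16, 192))  # shape (8, 2)

Cross-attention accepts query and key/value sequences of different lengths, which is what lets the two streams tokenize the input at different resolutions while still interacting token by token.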


Keywords: attention fusion; deep learning; heart sound classification; vision transformer


Links

PubMed: https://pubmed.ncbi.nlm.nih.gov/41155032/

DOI: 10.3390/bioengineering12101033