Keyword search (4,163 papers available)

"Zhu WP" Authored Publications:

Title / Authors / PubMed ID / Dept

1. Attention-Fusion-Based Two-Stream Vision Transformer for Heart Sound Classification
   Authors: Ranipa K; Zhu WP; Swamy MNS | PMID: 41155032 | Dept: ENCS
2. Age estimation via electrocardiogram from smartwatches
   Authors: Adib A; Zhu WP; Ahmad MO | PMID: 41142465 | Dept: ENCS
3. Cooperative Schemes for Joint Latency and Energy Consumption Minimization in UAV-MEC Networks
   Authors: Cheng M; He S; Pan Y; Lin M; Zhu WP | PMID: 40942666 | Dept: ENCS
4. Cluster based statistical feature extraction method for automatic bleeding detection in wireless capsule endoscopy video
   Authors: Ghosh T; Fattah SA; Wahid KA; Zhu WP; Ahmad MO | PMID: 29407997 | Dept: IMAGING


Title: Attention-Fusion-Based Two-Stream Vision Transformer for Heart Sound Classification
Authors: Ranipa K; Zhu WP; Swamy MNS
Link: https://pubmed.ncbi.nlm.nih.gov/41155032/
DOI: 10.3390/bioengineering12101033
Publication: Bioengineering (Basel, Switzerland)
Keywords: attention fusion; deep learning; heart sound classification; vision transformer
PMID: 41155032 | Category: | Date Added: 2025-10-29
Dept Affiliation: ENCS
Affiliation: 1 Department of Electrical and Computer Engineering, Concordia University, Montreal, QC H3G 1M8, Canada.

Description:

Vision Transformers (ViTs), inspired by their success in natural language processing, have recently gained attention for heart sound classification (HSC). However, most of the existing studies on HSC rely on single-stream architectures, overlooking the advantages of multi-resolution features. While multi-stream architectures employing early or late fusion strategies have been proposed, they often fall short of effectively capturing cross-modal feature interactions. Additionally, conventional fusion methods, such as concatenation, averaging, or max pooling, frequently result in information loss. To address these limitations, this paper presents a novel attention fusion-based two-stream Vision Transformer (AFTViT) architecture for HSC that leverages two-dimensional mel-cepstral domain features. The proposed method employs a ViT-based encoder to capture long-range dependencies and diverse contextual information at multiple scales. A novel attention block is then used to integrate cross-context features at the feature level, enhancing the overall feature representation. Experiments conducted on the PhysioNet2016 and PhysioNet2022 datasets demonstrate that the AFTViT outperforms state-of-the-art CNN-based methods in terms of accuracy. These results highlight the potential of the AFTViT framework for early diagnosis of cardiovascular diseases, offering a valuable tool for cardiologists and researchers in developing advanced HSC techniques.
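The attention-based fusion the abstract contrasts with concatenation, averaging, or max pooling can be illustrated with a minimal sketch. This is not the authors' AFTViT code; it is a generic scaled dot-product cross-attention between two token streams (standing in for the two mel-cepstral resolution streams), where the shapes, names, and random inputs are all illustrative assumptions:

```python
# Illustrative sketch only (not the AFTViT implementation): fusing two
# feature streams with scaled dot-product cross-attention, so that each
# token in one stream aggregates context from the other stream instead
# of being merged by concatenation or averaging.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(a, b):
    """Fuse stream `a` (queries) with stream `b` (keys/values).

    a: (n_a, d) token features from one stream
    b: (n_b, d) token features from the other stream
    Returns (n_a, d): each token of `a` enriched with context from `b`.
    """
    d = a.shape[-1]
    scores = a @ b.T / np.sqrt(d)        # (n_a, n_b) similarity scores
    weights = softmax(scores, axis=-1)   # attention weights over stream b
    return weights @ b                   # weighted sum of b's features

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8))   # e.g., tokens from one resolution stream
b = rng.standard_normal((6, 8))   # e.g., tokens from the other stream
fused = cross_attention_fusion(a, b)
print(fused.shape)  # (4, 8): same token count as `a`, contextualized by `b`
```

Unlike max pooling or averaging, the attention weights are input-dependent, which is the property the paper credits for preserving cross-context feature interactions.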





BookR developed by Sriram Narayanan
for the Concordia University School of Health
Copyright © 2011-2026