"Zhu WP" Authored Publications:
| # | Title | Authors | PubMed ID | Dept |
|---|---|---|---|---|
| 1 | Attention-Fusion-Based Two-Stream Vision Transformer for Heart Sound Classification | Ranipa K; Zhu WP; Swamy MNS | 41155032 | ENCS |
| 2 | Age estimation via electrocardiogram from smartwatches | Adib A; Zhu WP; Ahmad MO | 41142465 | ENCS |
| 3 | Cooperative Schemes for Joint Latency and Energy Consumption Minimization in UAV-MEC Networks | Cheng M; He S; Pan Y; Lin M; Zhu WP | 40942666 | ENCS |
| 4 | Cluster based statistical feature extraction method for automatic bleeding detection in wireless capsule endoscopy video | Ghosh T; Fattah SA; Wahid KA; Zhu WP; Ahmad MO | 29407997 | IMAGING |
| Title: | Attention-Fusion-Based Two-Stream Vision Transformer for Heart Sound Classification |
|---|---|
| Authors: | Ranipa K, Zhu WP, Swamy MNS |
| Link: | https://pubmed.ncbi.nlm.nih.gov/41155032/ |
| DOI: | 10.3390/bioengineering12101033 |
| Publication: | Bioengineering (Basel, Switzerland) |
| Keywords: | attention fusion; deep learning; heart sound classification; vision transformer |
| PMID: | 41155032 |
| Category: | |
| Date Added: | 2025-10-29 |
| Dept Affiliation: | ENCS (Department of Electrical and Computer Engineering, Concordia University, Montreal, QC H3G 1M8, Canada) |

Description:

Vision Transformers (ViTs), inspired by their success in natural language processing, have recently gained attention for heart sound classification (HSC). However, most of the existing studies on HSC rely on single-stream architectures, overlooking the advantages of multi-resolution features. While multi-stream architectures employing early or late fusion strategies have been proposed, they often fall short of effectively capturing cross-modal feature interactions. Additionally, conventional fusion methods, such as concatenation, averaging, or max pooling, frequently result in information loss. To address these limitations, this paper presents a novel attention fusion-based two-stream Vision Transformer (AFTViT) architecture for HSC that leverages two-dimensional mel-cepstral domain features. The proposed method employs a ViT-based encoder to capture long-range dependencies and diverse contextual information at multiple scales. A novel attention block is then used to integrate cross-context features at the feature level, enhancing the overall feature representation. Experiments conducted on the PhysioNet2016 and PhysioNet2022 datasets demonstrate that the AFTViT outperforms state-of-the-art CNN-based methods in terms of accuracy. These results highlight the potential of the AFTViT framework for early diagnosis of cardiovascular diseases, offering a valuable tool for cardiologists and researchers in developing advanced HSC techniques.
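The abstract contrasts plain concatenation or averaging with attention-based fusion of two feature streams. A minimal NumPy sketch of the general idea (not the paper's actual AFTViT block; the function name, shapes, and residual combination are illustrative assumptions): tokens from one stream attend to the other via scaled dot-product attention, so the fused representation weights cross-stream features instead of discarding them.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(stream_a, stream_b):
    """Illustrative cross-attention fusion: tokens in stream_a attend
    to tokens in stream_b, then are combined residually rather than
    by concatenation/averaging/max pooling."""
    d_k = stream_a.shape[-1]
    scores = stream_a @ stream_b.T / np.sqrt(d_k)   # (Na, Nb) similarity
    weights = softmax(scores, axis=-1)              # attention over stream_b
    attended = weights @ stream_b                   # (Na, d) cross-stream features
    return stream_a + attended                      # residual combination

# Two token sequences, e.g. from two encoders operating on different
# mel-cepstral resolutions (shapes are made up for the sketch).
rng = np.random.default_rng(0)
a = rng.standard_normal((16, 64))   # 16 tokens, 64-dim
b = rng.standard_normal((32, 64))   # 32 tokens, 64-dim
fused = attention_fusion(a, b)
print(fused.shape)  # (16, 64)
```

In a real two-stream model the queries, keys, and values would come from learned projections and the block would sit inside the transformer stack; this sketch only shows why attention fusion preserves cross-stream interactions that pooling-style fusion loses.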



