A multi-label deep residual shrinkage network for high-density surface electromyography decomposition in real-time
Journal of NeuroEngineering and Rehabilitation volume 22, Article number: 106 (2025)
Abstract
Background
The swift and accurate identification of motor unit spike trains (MUSTs) from surface electromyography (sEMG) is essential for enabling real-time control in neural interfaces. However, existing sEMG decomposition methods, including blind source separation (BSS) and deep learning, have not yet achieved satisfactory performance due to high latency or low accuracy.
Methods
This study introduces a novel real-time high-density sEMG (HD-sEMG) decomposition algorithm named ML-DRSNet, which combines multi-label learning with a deep residual shrinkage network (DRSNet) to improve accuracy and reduce latency. ML-DRSNet was evaluated on a public sEMG dataset and the corresponding MUSTs extracted via the convolutional BSS algorithm. An improved multi-label deep convolutional neural network (ML-DCNN) was also evaluated and compared against a conventional multi-task DCNN (MT-DCNN). These networks were trained and tested on various window sizes and step sizes.
Results
With the shortest window size (20 data points) and step size (10 data points), ML-DRSNet significantly outperformed both ML-DCNN (0.86 ± 0.18 vs. 0.71 ± 0.24, P < 0.001) and MT-DCNN (0.86 ± 0.18 vs. 0.66 ± 0.16, P < 0.001) in decomposition precision. Moreover, ML-DRSNet achieved a notably lower latency (15.15 ms) than ML-DCNN (69.36 ms) and MT-DCNN (76.96 ms), both of which in turn showed reduced latency relative to BSS-based decomposition methods.
Conclusions
The proposed ML-DRSNet and the improved ML-DCNN algorithms substantially enhance both the accuracy and real-time performance in decomposing MUSTs, establishing a technical foundation for neuro-information-driven motor intention recognition and disease assessment.
Background
The implementation of neural interfaces, whether in the central or peripheral nervous system, is crucial for comprehending motor neurophysiology and developing human-machine interaction systems. Neural interfaces aim to connect human neural cells and decode their activities to interpret the neural code for cognitive or sensorimotor tasks [1, 2]. Non-invasive surface electromyography (sEMG)-based neural interfaces have the potential to surpass current solutions in terms of viability and information-transfer rate [3,4,5]. This type of neural interface decodes electrical signals from skeletal muscles (i.e., EMG) to identify spinal motor neuron activity. To extract neural information from sEMG, it is essential to identify the neural encoding behind muscle activation through sEMG decomposition, which separates the signals into motor unit (MU) spike trains (MUSTs) and MU action potential trains (MUAPTs) [6, 7]. sEMG decomposition not only aids in decoding motor commands but also facilitates more precise motor intention recognition [8]. It offers superior performance over traditional EMG control systems, paving the way for more accurate and intuitive neural interfaces [9]. Consequently, sEMG decomposition exhibits extensive applications in motor intention recognition and the control of rehabilitation devices.
In recent years, the use of two-dimensional flexible electrode arrays has enabled high-density sEMG (HD-sEMG) recording from dozens to hundreds of channels, providing rich spatiotemporal information and prompting the development of sEMG decomposition algorithms [10]. Blind source separation (BSS) algorithms, such as convolution kernel compensation (CKC) and independent component analysis (ICA), have been developed and validated to achieve accurate sEMG decomposition [11]. Given the promising applications of sEMG in various research fields, developing robust real-time decomposition methods is crucial. Online sEMG decomposition using BSS algorithms involves offline training to determine separation vectors (i.e., MU filters) and their subsequent real-time application to short segments of HD-sEMG [12, 13]. For example, Chen et al. [14] proposed a real-time sEMG decomposition method based on the CKC algorithm. It accurately identifies 12 ± 2 MUSTs during hand movements from 200 ms signal segments, with a processing time of approximately 250 ms for signal recording and computation. Zhao et al. [12] introduced an online sEMG decomposition method using the progressive fast ICA peel-off algorithm, achieving an average delay of 84 ± 28 ms for decomposing one-second signal segments in real time. Despite significant strides made by BSS-based methods in real-time decomposition, these approaches require extending sEMG segments with delayed signals to transform convolution-mixed sEMG into linear instantaneous mixed signals [15]. Whitening transformation is used to enhance convergence by reducing the number of unknowns and speeding up separation vector estimation [16]. However, spatial whitening may amplify noise, complicating neural encoding identification [15, 17]. Although preprocessing steps like extending and whitening reduce offline training complexity, they introduce delays that may undermine real-time performance in applications such as neural interfaces [14, 18]. Additionally, BSS algorithms are limited by current signal processing knowledge, potentially overlooking complex relationships in the high-dimensional sEMG space. These limitations hinder the achievement of more accurate online HD-sEMG decomposition. Therefore, methods that can directly decode raw sEMG signals for accurate real-time sEMG decomposition are needed.
With the rapid development of computer technology and artificial intelligence, deep learning has emerged as a powerful tool for addressing various inverse problems, including the decomposition of signals such as HD-sEMG [19, 20]. It often outperforms manually designed algorithms (e.g., BSS) in many applications. Recent research has explored integrating deep learning algorithms in both offline training and online application for real-time sEMG decomposition. Urh et al. [21] compared the capabilities of a dense neural network, a long short-term memory network, and a deep convolutional neural network (DCNN) in identifying MUSTs from HD-sEMG. Results indicated that the DCNN demonstrated higher accuracy and greater resistance to noise. Although the study demonstrated the potential of deep learning in HD-sEMG decomposition, it still relied on preprocessing steps like extending and whitening. Clarke et al. [20] used a gated recurrent unit network for sEMG decomposition, validating its accuracy in extracting innervation pulse trains using both simulation and experimental data. The network processed a one-second HD-sEMG segment in 67 ms. Wen et al. [22] employed the CKC algorithm to identify MUSTs from HD-sEMG, which were then used to train DCNNs (SO-DCNN and MO-DCNN). MO-DCNN required approximately 48 ms to process each 60 ms signal segment. These studies collectively suggest that training a deep learning model offline for real-time sEMG decomposition is a faster and more accurate approach. However, the application of deep learning in sEMG decomposition remains relatively limited. Considering the real-time accuracy requirements in applications like neural interfaces, further optimization of these algorithms is needed. This involves shortening the length of input signal segments and reducing processing time to enhance decomposition performance.
Currently, multi-label learning has started to be applied in bioelectrical signal processing [23, 24]. By leveraging the correlations among different labels to extract task-relevant features, it has opened new possibilities for more in-depth signal analysis. In HD-sEMG decomposition, where MU discharge patterns exhibit synergistic effects [25], multi-label learning could improve the robustness of simultaneously extracting multiple MUSTs from each sEMG segment. However, research on HD-sEMG decomposition using multi-label learning is still limited. Additionally, attention mechanisms have also gained prominence in deep learning. The deep residual shrinkage network (DRSNet) is a feature learning method designed for noisy and redundant data. It adapts to the characteristics of input data to perform adaptive denoising, and increases the weights of task-relevant feature channels [26]. In sEMG decomposition, DRSNet can utilize a small attention network integrated into the residual module to automatically set soft thresholds for each HD-sEMG segment, effectively processing signals with varying noise or redundancy. It extracts high-dimensional features of each segment and captures deep relationships among feature channels, accurately identifying MUSTs from HD-sEMG while avoiding gradient vanishing and exploding issues.
Therefore, this study assumes that multi-label learning can enhance the robustness and precision of HD-sEMG decomposition by recognizing the interrelationships among motor unit discharge patterns. Additionally, it hypothesizes that the DRSNet can effectively handle noise and redundancy in HD-sEMG with its adaptive denoising and attention mechanisms, improving decomposition performance. This study proposes a novel real-time HD-sEMG decomposition method combining multi-label learning and DRSNet (ML-DRSNet) to more accurately and rapidly identify MUSTs from raw HD-sEMG. A multi-label DCNN (ML-DCNN) is also proposed for sEMG decomposition. The primary contributions of this study are as follows:
(1) Transformed HD-sEMG decomposition into a multi-label binary classification problem using two DCNN frameworks (ML-DRSNet and ML-DCNN), validating the feasibility and effectiveness of multi-label learning in recognizing MU discharges.
(2) ML-DRSNet and ML-DCNN accurately identified MU discharges with reduced delays, achieved through shorter input window sizes and processing times, enhancing the real-time capabilities of HD-sEMG decomposition.
(3) ML-DRSNet and ML-DCNN demonstrated higher decomposition precision compared to existing DCNN methods, improving the overall accuracy of HD-sEMG decomposition.
Methods
Experimental HD-sEMG dataset
In this study, the public dataset from [27] was used to validate the proposed HD-sEMG decomposition method. The dataset includes HD-sEMG recordings from the dominant leg of 18 physically active male participants (average age: 29.4 ± 7.9 years, height: 180 ± 7 cm, body mass: 76 ± 8 kg, and BMI: 23.6 ± 2.7 kg/m²). Participants performed two types of isometric contraction tasks at three different intensities. In the first task, participants lay down on a custom force platform (TRE-50 K, Dacell, Korea) with a torque sensor, performing three plantarflexion contractions with fully extended knees and a 10° ankle joint angle (0° represents the foot perpendicular to the lower leg) at 10%, 30%, and 50% of their peak torque (maximum voluntary contraction, MVC). In the second task, participants stood on a force plate, randomly performing balance and isometric heel-raise tasks, including maintaining a standard heel height (6 cm), a neutral foot position, or internal rotation. HD-sEMG signals were collected from the soleus, gastrocnemius medialis (GM), and gastrocnemius lateralis muscles using two-dimensional adhesive grids with either 8 × 4 electrodes (10 mm interelectrode distance) or 13 × 5 electrodes (with one corner electrode absent; 8 mm interelectrode distance). The signals were recorded in monopolar mode, band-pass filtered from 10 to 900 Hz, and digitized at a sampling rate of 2048 Hz using a multi-channel acquisition system (EMG-Quattrocento, 400-channel EMG amplifier; OT Bioelettronica, Italy).
The dataset also includes HD-sEMG decomposition results obtained using convolutional BSS (CBSS). After excluding noisy channels, HD-sEMG signals were band-pass filtered with second-order Butterworth filters (20 to 750 Hz). The DEMUSE software (version 4.9; The University of Maribor, Slovenia) was used to decompose HD-sEMG with CBSS. This algorithm has been validated on both simulation and experimental data for identifying MUSTs across various contraction intensities [15, 28]. Following the automated identification of MU discharges, experienced operators visually inspected and manually edited all identified MUSTs [29, 30]. Only MUs with a pulse-to-noise ratio (PNR) greater than 30 dB were retained, ensuring a sensitivity over 90% and a false-positive rate below 2% [31, 32]. This ensures that the MUSTs in this dataset are highly reliable and serve as an excellent benchmark for evaluating the decomposition performance in this study. More detailed descriptions of the dataset and related analysis results can be found in [27].
In this study, a subset containing HD-sEMG from the GM muscle during the isometric plantarflexion task, along with the corresponding MUSTs, was selected from the public dataset to evaluate the proposed decomposition method. Due to the absence of HD-sEMG data for the isometric plantarflexion task, participants S14 and S15 were excluded from the analysis.
Data pre-processing
Figure 1 shows a schematic summarizing the data pipeline for the proposed decomposition algorithm. Given the stochastic nature of sEMG signals, a sliding window with a specified window size and step size was used to segment the HD-sEMG signals into small segments (Fig. 2). These segments were then used as inputs for the HD-sEMG decomposition models to identify MUSTs (Fig. 2).
The sliding window length is denoted as the window size \(W\), with a width equal to the number of channels \(C\) in the sEMG signals (64 in this study). The sliding distance, or increment, is represented by the step size \(S\). The input to the HD-sEMG decomposition models is expressed as a \(C \times W\) matrix, as shown in Eq. (1), with each element defined as shown in Eq. (2). The model output is a \(1 \times M\) matrix, as shown in Eq. (3), where each element is expressed in Eq. (4).
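Equations (1)-(4) themselves are not reproduced in this text. The following is a plausible reconstruction based on the symbol definitions given here and below, and on the timing relation described later (a window of length \(W\) centred on the labelled instant \(n \times S\)); it is a sketch of the likely form rather than the authors' exact notation:

```latex
\mathbf{X}_{n} = \big[\, X_{n}(c, w) \,\big]_{C \times W} \tag{1}
X_{n}(c, w) = x_{c}\!\left(nS - \tfrac{W}{2} + w\right), \quad c = 1, \ldots, C,\; w = 1, \ldots, W \tag{2}
\mathbf{Y}_{n} = \big[\, y_{1}(nS),\; y_{2}(nS),\; \ldots,\; y_{M}(nS) \,\big]_{1 \times M} \tag{3}
y_{m}(nS) \in \{0, 1\}, \quad m = 1, \ldots, M \tag{4}
```

where \(y_m(nS) = 1\) indicates a discharge of MU \(m\) at time \(nS\) and \(0\) indicates no discharge.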
Fig. 1 Block diagram of the data processing pipeline for ML-DRSNet training, validation, and testing. The MUSTs are extracted using the unsupervised CBSS algorithm from filtered, extended, and whitened HD-sEMG signals. The raw data is segmented with a sliding window to generate paired HD-sEMG segments and MUST labels. A five-fold cross-validation strategy is employed to divide the data segments into training, validation, and testing sets. All segments are z-score normalized using the mean and standard deviation of the training set before being input to the network. The binary cross-entropy loss between the predictions and the MUST labels is backpropagated to update the network. The model achieving the lowest validation loss is selected and evaluated on the testing set to assess its performance
Here, \(X\) represents an HD-sEMG segment, with \(n\) being the segment index (\(n = 1, 2, 3, \ldots\)); \(x\) is the raw HD-sEMG signal, with the subscript \(c\) denoting the channel index (ranging from \(1\) to \(C\)). \(Y\) is a binary matrix containing the discharge information for multiple MUs, while \(y\) is the binary MUST vector, in which each element is \(0\) (no discharge) or \(1\) (discharge). \(M\) represents the total number of MUs extracted by CBSS, with the subscript \(m\) denoting the MU index (ranging from \(1\) to \(M\)).
Fig. 2 A schematic diagram illustrating the sliding window for data segmentation and the input-output data of the network. (a) Input data: HD-sEMG signals are segmented using a sliding window with a specified window size (e.g., 20, 60, 100, or 140 data points) and step size (e.g., 10, 20, 30, 40, or 50 data points). (b) Output data: a one-dimensional binary vector where each binary label corresponds to a MU, with 1 indicating a MU discharge and 0 indicating no discharge
Finally, HD-sEMG segments and their corresponding MUST labels were stored for training, validation, and testing of the decomposition networks. The data preprocessing was performed using MATLAB 2022b (The Mathworks, Inc., Natick, MA, USA) and Python 3.8 (Python Software Foundation, Delaware, USA).
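As a concrete illustration of this segmentation step, the NumPy sketch below pairs each HD-sEMG window with a binary MUST label vector. The array shapes, variable names, and synthetic data are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def segment_hd_semg(emg, spikes, window=20, step=10):
    """Slide a window over HD-sEMG and pair each segment with a binary MU label vector.

    emg    : (C, T) array of raw HD-sEMG, C channels, T samples.
    spikes : (M, T) binary array, spikes[m, t] = 1 if MU m discharges at sample t.
    window : window size W in samples; step : step size S in samples.
    Returns X of shape (N, C, W) and Y of shape (N, M).
    """
    C, T = emg.shape
    half = window // 2
    segments, labels = [], []
    # Centre each window on the labelled instant n*S (see the reconstructed Eqs. 1-4 above).
    for center in range(half, T - half, step):
        segments.append(emg[:, center - half:center + half])
        labels.append(spikes[:, center])          # 1 = discharge, 0 = no discharge
    return np.stack(segments), np.stack(labels).astype(np.float32)

# Example: 64-channel recording, 10 s at 2048 Hz, 5 MUs (synthetic placeholders).
rng = np.random.default_rng(0)
emg = rng.standard_normal((64, 20480))
spikes = (rng.random((5, 20480)) < 0.005).astype(np.float32)
X, Y = segment_hd_semg(emg, spikes, window=20, step=10)
print(X.shape, Y.shape)   # (2046, 64, 20) and (2046, 5)
```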
Network structures of ML-DCNN and ML-DRSNet
To evaluate the feasibility and effectiveness of multi-label learning in real-time HD-sEMG decomposition, this study first proposed the ML-DCNN, inspired by the work in [22]. Subsequently, a new method combining multi-label learning with DRSNet (ML-DRSNet) was introduced to identify MUSTs from HD-sEMG. This section details the structures of the ML-DCNN and ML-DRSNet networks.
Network structure of ML-DCNN
ML-DCNN consists of four convolutional layers, two max-pooling layers, three dropout layers (with a dropout rate of 0.5), and two fully connected (FC) layers, as shown in Fig. 3a. The four convolutional layers have 128, 128, 128, and 64 output channels, respectively, and utilize rectified linear unit (ReLU) activation functions. The first FC layer outputs 256 features, while the final FC layer employs a sigmoid function. The number of output labels is dynamically determined by the number of MUs in the training dataset.
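A minimal PyTorch sketch consistent with this description is shown below. The paper specifies only the layer counts, channel widths, dropout rate, and output activation; the kernel sizes, padding, and exact placement of the pooling and dropout layers are assumptions here.

```python
import torch
import torch.nn as nn

class MLDCNN(nn.Module):
    """Sketch of the ML-DCNN described above; architectural details not stated in the
    text (kernel size, padding, pooling/dropout placement) are assumptions."""
    def __init__(self, n_mus: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2), nn.Dropout(0.5),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2), nn.Dropout(0.5),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(), nn.Dropout(0.5),   # first FC layer (256 features)
            nn.Linear(256, n_mus), nn.Sigmoid(),              # one probability per MU
        )

    def forward(self, x):              # x: (batch, 1, channels C, window W)
        return self.classifier(self.features(x))

# Example forward pass with a 64-channel, 20-sample window and 12 MUs.
model = MLDCNN(n_mus=12)
probs = model(torch.randn(8, 1, 64, 20))
print(probs.shape)   # torch.Size([8, 12])
```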
Network structure of ML-DRSNet
ML-DRSNet (Fig. 3b) is built on the deep residual network [26]. The residual building unit (RBU) is its core component, which consists of two batch normalization (BN) layers, two ReLU functions, two convolutional layers, and an identity shortcut. The identity shortcut helps learn residual representations between the input and output of the RBU, improving network convergence speed and prediction accuracy. Expanding on the RBU, a compact sub-network incorporating soft thresholding and an attention mechanism forms the residual shrinkage building unit with channel-wise thresholds (RSBU-CW). Soft thresholding, a pivotal technique in signal denoising [33, 34], helps mitigate gradient vanishing and exploding issues during training.
To minimize the impact of noise-related features in HD-sEMG on MUST identification, an attention mechanism adaptively sets shrinkage thresholds for each channel of the feature map. This attention network comprises two FC layers, a BN layer, a ReLU function, and a sigmoid function. The feature map output from the final BN layer in the RBU undergoes an absolute operation, global average pooling (GAP), and flattening into a one-dimensional vector. The resulting vector is then passed through two FC layers, with a BN layer and a ReLU function in between. Finally, the sigmoid function scales the output to the (0, 1) range, ensuring positive and reasonable thresholds and avoiding a feature map with all-zero output.
In this study, ML-DRSNet consists of a convolutional layer (64 output channels, 3 × 3 kernel size, padding of 1), a BN layer, a ReLU function, eight RSBU-CW blocks, a GAP layer, a flatten layer, an FC layer, and a sigmoid function. The RSBU-CW blocks have output channels of 64, 64, 128, 128, 256, 256, 512, and 512, respectively. The final FC layer outputs a number of labels dynamically determined by the number of MUs in the training dataset.
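The sketch below illustrates one possible RSBU-CW implementation in PyTorch, following the description above and the DRSN design in [26]. The pre-activation ordering, the 1 × 1 shortcut projection when channel counts change, and the exact point at which the channel-wise threshold is computed are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RSBUCW(nn.Module):
    """Channel-wise residual shrinkage building unit (RSBU-CW), sketched from the
    text and the DRSN design in [26]; layer ordering and shortcut are assumptions."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        # Attention sub-network: two FC layers with a BN layer and ReLU in between,
        # ending in a sigmoid that scales each channel's threshold into (0, 1).
        self.attn = nn.Sequential(
            nn.Linear(out_ch, out_ch), nn.BatchNorm1d(out_ch), nn.ReLU(),
            nn.Linear(out_ch, out_ch), nn.Sigmoid(),
        )
        # 1x1 projection when the identity shortcut must change shape (assumed convention).
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False)
                         if stride != 1 or in_ch != out_ch else nn.Identity())

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        # Channel-wise soft thresholding: tau_c = alpha_c * mean(|feature_c|), alpha_c in (0, 1).
        abs_mean = out.abs().mean(dim=(2, 3))                  # GAP of |features|: (batch, out_ch)
        tau = (self.attn(abs_mean) * abs_mean)[:, :, None, None]
        out = torch.sign(out) * torch.clamp(out.abs() - tau, min=0.0)
        return out + self.shortcut(x)

# One block with 64 input/output channels applied to a batch of 8 feature maps.
block = RSBUCW(64, 64)
y = block(torch.randn(8, 64, 64, 20))
print(y.shape)   # torch.Size([8, 64, 64, 20])
```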
Model training, validation and testing
PyTorch (version 1.11.0) [35] was employed as the primary deep learning framework, with ML-DRSNet and ML-DCNN implemented in a Python 3.8 environment (Ubuntu 20.04, CUDA 11.3). HD-sEMG decomposition was framed as a multi-label binary classification task, where binary cross-entropy (BCE) loss was used to quantify the difference between predicted and true labels. The BCE loss was computed for each MU, and network parameters were optimized by minimizing the average BCE loss to enhance the accuracy of sEMG decomposition through coordinated MU discharges. The Adam optimizer [36] with a learning rate of \(10^{-4}\) and a weight decay of \(10^{-6}\) was employed.
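Written out for a single window, with \(\hat{y}_m\) denoting the sigmoid output of the network for MU \(m\) and \(y_m\) the corresponding MUST label, the averaged BCE loss described here takes the standard form:

```latex
\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{M} \sum_{m=1}^{M} \Big[ y_{m} \log \hat{y}_{m} + (1 - y_{m}) \log\big(1 - \hat{y}_{m}\big) \Big]
```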
A five-fold cross-validation strategy was used to assess the performance of sEMG decomposition networks, ensuring robust evaluation and minimizing variability. Separate decomposition models were developed for each subject at three contraction intensities, with sEMG segments divided into five exclusive subsets for training, validation, and testing. Three subsets were used for training, one for validation, and one for testing in each fold. This process was repeated five times, with each subset serving as the test set once. The average performance metrics across these five tests represented the overall performance of the HD-sEMG decomposition model.
Before training, the mean and standard deviation of each channel in the training set were calculated for z-score normalization across all subsets, facilitating model convergence. Each training session consisted of 100 epochs with a batch size of 64. The model with the lowest average validation loss was selected as the final model and was evaluated on the test set. Model training, validation, and testing were conducted on a PC equipped with an Intel Xeon Platinum 8358P CPU, NVIDIA RTX 3090 GPU (24 GB), and 80 GB of RAM.
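A condensed PyTorch sketch of this training setup is given below. The data, the tiny stand-in network, and the single train/validation split are placeholders (the study uses ML-DRSNet/ML-DCNN and five-fold cross-validation), while the loss, optimizer settings, channel-wise normalization, and model-selection rule follow the description above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for segmented HD-sEMG (N, 64, W) and MUST labels (N, M).
X_train, Y_train = torch.randn(512, 64, 20), (torch.rand(512, 12) < 0.05).float()
X_val, Y_val = torch.randn(128, 64, 20), (torch.rand(128, 12) < 0.05).float()

# Z-score normalization using the per-channel mean/std computed on the training set only.
mean = X_train.mean(dim=(0, 2), keepdim=True)
std = X_train.std(dim=(0, 2), keepdim=True)
X_train, X_val = (X_train - mean) / std, (X_val - mean) / std

# A tiny stand-in network; in the study this would be ML-DRSNet or ML-DCNN.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 20, 256), nn.ReLU(),
                      nn.Linear(256, 12), nn.Sigmoid())
criterion = nn.BCELoss()                                  # BCE averaged over MUs and windows
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-6)
loader = DataLoader(TensorDataset(X_train, Y_train), batch_size=64, shuffle=True)

best_val = float("inf")
for epoch in range(100):
    model.train()
    for xb, yb in loader:
        optimizer.zero_grad()
        criterion(model(xb), yb).backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), Y_val).item()
    if val_loss < best_val:                               # keep the model with the lowest validation loss
        best_val = val_loss
        torch.save(model.state_dict(), "best_model.pt")
```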
Comparative analysis
To validate the feasibility and robustness of multi-label learning in HD-sEMG decomposition, this study implemented the DCNN provided by Wen et al. [22] using PyTorch (version 1.11.0), referred to as MT-DCNN. The effects of window size, step size, and contraction intensity on decomposition accuracy and time efficiency of ML-DRSNet were investigated. Additionally, a comparative analysis was conducted to assess the accuracy and real-time performance of ML-DRSNet, ML-DCNN, and MT-DCNN in HD-sEMG decomposition.
Effects of window size, step size, and contraction intensity on ML-DRSNet
The window size determines the length of the signal fed into ML-DRSNet for processing. On one hand, the window size was chosen based on the average duration of multiple MUAPs (approximately 10 ms) and the electromechanical delay of human muscles (225 ± 50 ms) [15, 37], for both identification and real-time purposes. On the other hand, as described in Eqs. (1), (2), (3) and (4), ML-DRSNet requires sEMG at time \(n \times S + W/2\) to predict the discharge activity of MUs at time \(n \times S\), resulting in a latency of half the window size (Fig. 2). Consequently, reducing the window size leads to a decrease in latency. In this study, four window sizes were selected: 20, 60, 100, and 140 data points, corresponding to 10, 29, 49, and 68 ms at a sampling frequency of 2048 Hz, respectively.
The step size refers to the sliding distance or increment of the sliding window used to segment HD-sEMG signals, which determines the overlap or difference between adjacent signal segments (Fig. 2). It directly influences the prediction frequency (cycle time) used for control purposes, as a smaller step size leads to more frequent predictions (higher prediction frequency), while a larger step size results in fewer predictions (lower prediction frequency). Five step sizes were selected: 10, 20, 30, 40, and 50 data points, corresponding to 5, 10, 15, 20 and 24 ms at a sampling frequency of 2048 Hz. These step sizes result in prediction frequencies of 205, 102, 68, 51, and 41 Hz, respectively. This ensures that the prediction frequency exceeds the typical MU discharge frequencies (3–100 Hz) [38,39,40] to capture high-frequency components effectively.
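The relationships between sampling rate, window size, step size, and prediction frequency quoted above can be reproduced with a few lines of arithmetic:

```python
fs = 2048                                   # sampling rate in Hz
windows = [20, 60, 100, 140]                # window sizes in samples
steps = [10, 20, 30, 40, 50]                # step sizes in samples

# Window duration in ms (half of it is the labelling latency discussed above).
print([round(w / fs * 1000) for w in windows])        # [10, 29, 49, 68]
# Step duration in ms and the resulting prediction frequency in Hz.
print([round(s / fs * 1000) for s in steps])          # [5, 10, 15, 20, 24]
print([round(fs / s) for s in steps])                 # [205, 102, 68, 51, 41]
```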
Furthermore, this study assessed the performance of ML-DRSNet at 10%, 30%, and 50% MVC, respectively, to evaluate the impact of contraction intensity on decomposition accuracy.
Comparison among ML-DRSNet, ML-DCNN, and MT-DCNN
The optimal window sizes and step sizes for maximizing decomposition performance were first determined for ML-DRSNet, ML-DCNN, and MT-DCNN individually. The decomposition accuracy of the three models was then compared at their optimal settings to assess the robustness of multi-label learning, particularly when integrated with DRSNet, for sEMG decomposition. Additionally, the decomposition accuracy of the models was also compared at the shortest window size (20 data points) and step size (10 data points) to evaluate their performance under constrained conditions.
Evaluation criteria
The performance of the decomposition models was evaluated based on accuracy and time efficiency.
Decomposition accuracy
The MUSTs obtained from the decomposition models were compared with those extracted using the CBSS algorithm (i.e., MUST labels). The precision (Eq. 5), sensitivity (Eq. 6), F1-score (Eq. 7), and miss rate (Eq. 8) were calculated for all MUs.
Here, TP represents the number of correctly identified discharges, FN the number of missed discharges, and FP the number of incorrect identifications. The miss rate was used to evaluate the performance of the decomposition models in identifying individual MUs.
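The corresponding equations are not reproduced in this text; under the standard definitions implied by these TP/FP/FN counts, they read:

```latex
\text{Precision} = \frac{TP}{TP + FP} \tag{5}
\text{Sensitivity} = \frac{TP}{TP + FN} \tag{6}
F_{1}\text{-score} = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{7}
\text{Miss rate} = \frac{FN}{TP + FN} \tag{8}
```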
This study counted the number of correctly identified MUs under the criterion of a miss rate below 0.1: if the miss rate of a MU is below 0.1, the model is considered to have correctly identified that MU. Decomposition accuracy was also evaluated as the proportion of correctly identified MUs relative to the total number of MUs used in training.
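A short NumPy sketch of this counting rule (with hypothetical array names and synthetic spike data) is:

```python
import numpy as np

# y_true, y_pred: (N, M) binary arrays of reference and predicted discharges per window.
rng = np.random.default_rng(1)
y_true = (rng.random((1000, 12)) < 0.05).astype(int)
y_pred = y_true.copy()
y_pred[rng.random(y_pred.shape) < 0.02] = 0        # simulate some missed discharges

tp = ((y_pred == 1) & (y_true == 1)).sum(axis=0)   # per-MU true positives
fn = ((y_pred == 0) & (y_true == 1)).sum(axis=0)   # per-MU missed discharges
miss_rate = fn / np.maximum(tp + fn, 1)

n_correct = int((miss_rate < 0.1).sum())
print(n_correct, n_correct / y_true.shape[1])      # count and proportion of correctly identified MUs
```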
Time efficiency
Time efficiency was assessed based on training time and prediction time, which quantify the computational complexity [22, 41]. Training time is defined as the time required for one epoch, including training, validation, and testing, as well as parameter updates, measured in seconds per epoch (s/epoch).
Prediction time was defined as the time required to make a prediction for one sEMG window (segment) during testing, measured in milliseconds per window (ms/window). The prediction time, combined with the duration of the whole input window, was used to evaluate the real-time performance of the models. In this study, the training and prediction times for the three decomposition models were measured using the data from subject S10 at 10% MVC, for reference.
Statistical analysis
Given that the accuracy metrics may not meet the assumptions of normal distribution and homogeneity of variance, the Friedman test was applied to statistically compare the decomposition accuracy (F1-score, precision, sensitivity, and the number of correctly identified MUs) of ML-DRSNet, ML-DCNN, and MT-DCNN. To control for errors from multiple comparisons, the Wilcoxon-Nemenyi-McDonald-Thompson post hoc test was performed. The significance level was set at 0.05. All statistical analyses were conducted using OriginPro 2024b (OriginLab, Northampton, MA, USA).
Results
HD-sEMG decomposition with CBSS algorithm
In this study, the experimental HD-sEMG dataset involved 16 participants, from whom a total of 738 MUs (15.38 ± 8.32 per subject and contraction intensity) were extracted using the CBSS algorithm, with an average PNR of 38.74 ± 2.52 dB. At 10% MVC, the average number of MUs was 18.25 ± 7.86 (range: 2 to 30) with a PNR of 38.53 ± 3.04 dB. At 30% MVC, the average number of MUs was 17.06 ± 8.16 (range: 4 to 31) with a PNR of 38.78 ± 2.16 dB. At 50% MVC, the average number of MUs was 10.81 ± 7.37 (range: 2 to 22) with a PNR of 38.90 ± 2.42 dB. The average number of MUs, PNR, and discharge frequency for each contraction intensity are summarized in Table 1.
Effects of window size, step size, and contraction intensity on ML-DRSNet
Decomposition accuracy
This study shows the significant effect of window size and step size on the decomposition accuracy of ML-DRSNet, including sensitivity, precision, F1-score, and the number of correctly identified MUs (Fig. 4; Table 2). When the step size is 30, 40, or 50 data points, the decomposition accuracy decreases with increasing window size. In contrast, for step sizes of 10 and 20 data points, the decomposition accuracy increases initially and then decreases as the window size grows. When the window size is 20 data points, increasing the step size from 10 to 50 data points improves decomposition accuracy (Fig. 4; Table 2). Conversely, with window sizes of 60, 100, and 140 data points, increasing the step size results in a continuous decline in accuracy (Fig. 4; Table 2). ML-DRSNet performs best with a window size of 20 data points and a step size of 50 data points, achieving a sensitivity of 0.84 ± 0.25, precision of 0.93 ± 0.12, and F1-score of 0.85 ± 0.23. With a miss rate below 0.1, ML-DRSNet correctly identified an average of 10.90 ± 10.51 MUs, representing 70.87% of the average number of MUs identified using CBSS (Fig. 4; Table 2).
Fig. 4 The impact of window size and step size on the decomposition accuracy of ML-DRSNet. The average (a) F1-score, (b) sensitivity, (c) correctly identified MU counts, and (d) precision are shown. Bars represent average values across 16 participants and 3 contraction intensities, with error bars indicating standard deviation
Moreover, the decomposition accuracy progressively decreases as the contraction intensity increases from 10 to 50% MVC (Fig. 5; Table 3). This trend may be attributed to the increased overlap among MUAPs at higher contraction intensities, which makes it more challenging to accurately identify and distinguish different MU features from sEMG signals.
Time efficiency
The average training time of ML-DRSNet is 198.43 ± 126.19 s/epoch (range: 87.43–427.53 s/epoch; Table 4). The training time is inversely related to step size, with larger step sizes reducing training time because fewer data segments are used for training. No clear correlation was observed between training time and window size.
The prediction time is virtually unaffected by both window size and step size. The average prediction time is 5.39 ± 0.12 ms/window (Table 4). With an optimal window size of 9.76 ms (20 data points at a sampling frequency of 2048 Hz), the total decomposition time required is only 15.15 ms.
Comparison among ML-DRSNet, ML-DCNN, and MT-DCNN
To compare the performance of the three decomposition models, this study summarized the optimal window size and step size combinations for ML-DRSNet, ML-DCNN, and MT-DCNN. ML-DRSNet demonstrates superior performance with a window size of 20 data points and a step size of 50 data points (Fig. 4; Table 2). For ML-DCNN, the optimal combination comprises a window size of 140 data points and a step size of 50 data points, with a sensitivity of 1.00 ± 0.00, precision of 1.00 ± 0.00, F1-score of 1.00 ± 0.00 (Fig. 6; Table 5).
ML-DCNN exhibits consistently high performance under the miss rate standard (< 0.1), maintaining an average number of correctly identified MUs of 14.71 ± 8.23, which accounts for 95.66% of the average MUs identified using CBSS (Fig. 6c; Table 5). As for MT-DCNN, the optimal combination includes a window size of 140 data points and a step size of 10 data points, achieving a sensitivity of 0.92 ± 0.04, a precision of 0.96 ± 0.04, and an F1-score of 0.94 ± 0.04. Under the miss rate standard of < 0.1, MT-DCNN correctly identified 11.90 ± 8.44 MUs, constituting 77.37% of the average MUs identified using CBSS (Fig. 7; Table 6).
Fig. 6 The impact of window size and step size on the decomposition accuracy of ML-DCNN. The average (a) F1-score, (b) sensitivity, (c) correctly identified MU counts, and (d) precision are shown. Bars represent average values across 16 participants and 3 contraction intensities, with error bars indicating standard deviation
Fig. 7 The impact of window size and step size on the decomposition accuracy of MT-DCNN. The average (a) F1-score, (b) sensitivity, (c) correctly identified MU counts, and (d) precision are shown. Bars represent average values across 16 participants and 3 contraction intensities, with error bars indicating standard deviation
This study compared the decomposition accuracy of ML-DRSNet, ML-DCNN, and MT-DCNN using their respective optimal window sizes and step sizes. For all contraction intensities, ML-DCNN consistently outperforms the other models in terms of F1-score, sensitivity, the number of correctly identified MUs, and precision (P < 0.001, Fig. 8; Table 7). No significant differences are observed between ML-DRSNet and MT-DCNN.
Fig. 8 Comparison of (a) F1-score, (b) sensitivity, (c) correctly identified MU counts, and (d) precision among ML-DRSNet, ML-DCNN, and MT-DCNN across all contraction intensities using their optimal window sizes and step sizes. Asterisks indicate statistical significance: * P < 0.05, ** P < 0.01, *** P < 0.001
At 10% MVC, ML-DCNN significantly outperforms both ML-DRSNet and MT-DCNN in precision (P < 0.001), while no significant differences are found in the number of correctly identified MUs (Fig. 9a-b; Table 8). ML-DRSNet shows no significant difference in precision compared to MT-DCNN (P = 0.962, Fig. 9a; Table 8). At 30% MVC, ML-DCNN maintains significantly higher precision than both MT-DCNN (P < 0.001) and ML-DRSNet (P < 0.05, Fig. 9c; Table 8). ML-DRSNet and MT-DCNN do not exhibit significant differences in either precision (P = 0.290) or the number of correctly identified MUs (P = 0.653, Fig. 9c-d; Table 8).
At 50% MVC, ML-DCNN continues to show significantly better precision than both MT-DCNN (P < 0.001) and ML-DRSNet (P < 0.01), while no significant differences are observed between ML-DRSNet and MT-DCNN (P = 0.541, Fig. 9e-f; Table 8).
Furthermore, this study also compared the decomposition accuracy of ML-DRSNet, ML-DCNN, and MT-DCNN at the shortest window size (20 data points) and step size (10 data points). Figure 10 and Table 9 show that ML-DRSNet significantly outperforms ML-DCNN and MT-DCNN in terms of F1-score, sensitivity, number of correctly identified MUs, and precision across all contraction intensities (P < 0.001).
Fig. 10 Comparison of (a) F1-score, (b) sensitivity, (c) correctly identified MU counts, and (d) precision among ML-DRSNet, ML-DCNN, and MT-DCNN across all contraction intensities using the shortest window size (20 data points) and step size (10 data points). Asterisks indicate statistical significance: * P < 0.05, ** P < 0.01, *** P < 0.001
At 10% MVC, the precision of ML-DRSNet is significantly higher than that of MT-DCNN (0.93 ± 0.18 vs. 0.74 ± 0.17, P < 0.001) and ML-DCNN (0.93 ± 0.18 vs. 0.84 ± 0.16, P < 0.05, Fig. 11a-c; Table 10). At 30% MVC, the precision of ML-DRSNet remains significantly higher than that of ML-DCNN (0.84 ± 0.18 vs. 0.60 ± 0.22, P < 0.01) and MT-DCNN (0.84 ± 0.18 vs. 0.65 ± 0.14, P < 0.05, Fig. 11c-d; Table 10). At 50% MVC, ML-DRSNet continues to show significantly higher decomposition accuracy compared to MT-DCNN (0.83 ± 0.19 vs. 0.59 ± 0.12, P < 0.01, Fig. 11e-f; Table 10).
Fig. 11 Comparison of precision and correctly identified MU counts among ML-DRSNet, ML-DCNN, and MT-DCNN at (a)-(b) 10% MVC, (c)-(d) 30% MVC, and (e)-(f) 50% MVC using the shortest window size (20 data points) and step size (10 data points). Asterisks indicate statistical significance: * P < 0.05, ** P < 0.01, *** P < 0.001
Table 11 summarizes the training time and prediction time (together with the optimal window size) required by ML-DRSNet, ML-DCNN, and MT-DCNN at their respective optimal step sizes and window sizes. Among the three models, MT-DCNN requires the longest training time, while ML-DCNN has the shortest. In terms of prediction time plus the optimal window duration, ML-DRSNet achieves the shortest total, at just 15.15 ms/window. The prediction time of ML-DCNN is shorter than that of MT-DCNN.
Discussion
In this study, HD-sEMG decomposition was approached as a multi-label classification task, leveraging the synergistic effects of MU discharge patterns to simultaneously identify multiple MUs. A new method, ML-DRSNet, was introduced, integrating multi-label learning with DRSNet for real-time HD-sEMG decomposition. Additionally, multi-label learning was incorporated into a simple DCNN, resulting in ML-DCNN. ML-DRSNet reduces the required window size and prediction latency, while ML-DCNN significantly improves decomposition precision compared to MT-DCNN. ML-DCNN accurately identifies most MUs, achieving precision exceeding 0.95, regardless of contraction intensity. Notably, ML-DRSNet maintains high accuracy with an average precision above 0.90, and reduces the latency to 15.15 ms/window. These results demonstrate the feasibility and robustness of combining multi-label learning with DCNNs for precise real-time HD-sEMG decomposition, establishing a technological foundation for neuro-information-driven neural interfaces.
Previous studies have employed similar supervised training methods for real-time HD-sEMG decomposition. For instance, Wen et al. [22] proposed a DCNN, referred to as MT-DCNN in this study, which demonstrated a total latency of 80 ms for processing an HD-sEMG segment of 120 data points, with both sensitivity and precision surpassing 0.80. To validate decomposition accuracy, this study replicated MT-DCNN using the provided structure and code, with results showing sensitivity, precision, and F1-score all exceeding 0.90, and a total latency of 76.96 ms for processing each window size of 140 data points. This successful replication paved the way for incorporating multi-label learning into HD-sEMG decomposition, leading to the ML-DCNN. ML-DCNN, using a window size of 140 data points, achieves sensitivity, accuracy, and F1-score above 0.95, with a reduced prediction latency of 69.36 ms. ML-DCNN consistently outperformed MT-DCNN in the number of correctly identified MUs across various contraction intensities, thanks to the ability of multi-label learning to capture complex interactions among MUs.
Further advancing this approach, ML-DRSNet integrated multi-label learning with DRSNet, efficiently processing HD-sEMG segments as small as 20 data points and reducing prediction latency to 15.15 ms, over five times faster than MT-DCNN and more than four times faster than ML-DCNN. ML-DRSNet achieves precision above 0.85 with a minimal window size of 20 data points, and significantly outperformed both MT-DCNN and ML-DCNN in decomposition accuracy at the smallest window size and step size. This improvement is likely attributable to the robust noise resistance of DRSNet and its ability to extract essential features from short windows [26], resulting in high accuracy with a reduced window size. ML-DRSNet surpasses previous methods in real-time decomposition performance (e.g., 84 ms and 94 ms using fast ICA peel-off [12, 13]; 67 ms and 28 ms using gated recurrent unit networks [20, 41]), achieving prediction latencies below the human electromechanical delay, which ranges from 70 ms to 385 ms [37].
However, several limitations must be acknowledged. First, due to the lack of appropriate experimental HD-sEMG signals, this study did not analyze the impact of factors such as electrode displacement on ML-DCNN and ML-DRSNet performance. Future research should explore the generalization and repeatability of these methods. Second, this study only validated the models in isometric contraction scenarios; dynamic contractions and muscle fatigue, which alter MUAPs, could affect performance [42, 43]. Thus, developing protocols to enhance stability across various scenarios is crucial. Third, ML-DRSNet may not generalize well to all participants, leading to higher outlier rates in some cases. This sensitivity to data distribution characteristics necessitates further optimization. Fourth, similar to previous studies [20, 22, 41], these models heavily depend on supervised learning from the results of BSS algorithms, limiting their ability to recognize MUs in post-training contractions. Future work should focus on integrating unsupervised deep learning and deep metric learning to track new MUs in dynamic scenarios. Lastly, this study only evaluated the inference time of the deep learning-based decomposition networks under GPU acceleration, without considering the latency introduced by data acquisition, transmission, or hardware implementation in real-world applications. Future work should optimize real-time performance through hardware implementation using lower-level programming languages and evaluate the full application latency, including signal acquisition, transmission, preprocessing, and decomposition.
Conclusions
This study introduced a method that integrates multi-label learning with DRSNet for real-time HD-sEMG decomposition. ML-DRSNet significantly enhances real-time performance while maintaining high precision compared to ML-DCNN, which has already demonstrated superior accuracy over MT-DCNN in decomposing MUSTs. The successful combination of multi-label learning with DCNNs for sEMG decomposition demonstrates substantial improvements in both accuracy and real-time performance. The proposed method presents a promising pathway for applying HD-sEMG decomposition algorithms to online neural interfaces.
Data availability
No datasets were generated or analysed during the current study.
Abbreviations
- EMG: Electromyography
- sEMG: Surface electromyography
- HD-sEMG: High-density surface electromyography
- MU: Motor unit
- MUSTs: Motor unit spike trains
- MUAPTs: Motor unit action potential trains
- BSS: Blind source separation
- CBSS: Convolutional blind source separation
- CKC: Convolution kernel compensation
- ICA: Independent component analysis
- DRSNet: Deep residual shrinkage network
- ML-DRSNet: Multi-label deep residual shrinkage network
- DCNN: Deep convolutional neural network
- ML-DCNN: Multi-label deep convolutional neural network
- MT-DCNN: Multi-task deep convolutional neural network
- MVC: Maximum voluntary contraction
- GM: Gastrocnemius medialis
- PNR: Pulse-to-noise ratio
- FC: Fully connected
- ReLU: Rectified linear unit
- RBU: Residual building unit
- BN: Batch normalization
- RSBU-CW: Residual shrinkage building unit with channel-wise thresholds
- GAP: Global average pooling
- BCE: Binary cross-entropy
References
Donoghue JP. Bridging the brain to the world: a perspective on neural interface systems. Neuron. 2008;60:511–21.
Hatsopoulos NG, Donoghue JP. The science of neural interface systems. Annu Rev Neurosci. 2009;32:249–66.
Farina D, Holobar A. Human-Machine interfacing by decoding the surface electromyogram. IEEE Signal Process Mag. 2015;32:115–20.
Farina D, Enoka RM. Evolution of surface electromyography: from muscle electrophysiology towards neural recording and interfacing. J Electromyogr Kinesiol. 2023;71:102796.
Balasubramanian S, Garcia-Cossio E, Birbaumer N, Burdet E, Ramos-Murguialday A. Is EMG a viable alternative to BCI for detecting movement intention in severe stroke? IEEE Trans Biomed Eng. 2018;65:2790–7.
Yang Z, Guanghua X, Yixin L, Wei Q. Improved online decomposition of non-stationary electromyogram via signal enhancement using a neuron resonance model: a simulation study. J Neural Eng. 2022;19:026030.
Xu Y, Yu Y, Xia M, Sheng X, Zhu X. A novel and efficient surface electromyography decomposition algorithm using local spatial information. IEEE J Biomed Health Inf. 2023;27:286–95.
Yu Y, Chen C, Sheng X, Zhu X. Wrist torque estimation via electromyographic motor unit decomposition and image reconstruction. IEEE J Biomed Health Inf. 2021;25:2557–66.
Farina D, Vujaklija I, Sartori M, Kapelner T, Negro F, Jiang N, et al. Man/machine interface based on the discharge timings of spinal motor neurons after targeted muscle reinnervation. Nat Biomed Eng. 2017;1:0025.
Caillet AH, Avrillon S, Kundu A, Yu T, Phillips ATM, Modenese L et al. Larger and Denser: An Optimal Design for Surface Grids of EMG Electrodes to Identify Greater and More Representative Samples of Motor Units. eNeuro. 2023;10:ENEURO.0064-0023.2023.
Farina D, Merletti R, Enoka RM. The extraction of neural strategies from the surface EMG: 2004–2024. J Appl Physiol. 2024.
Zhao H, Zhang X, Chen M, Zhou P. Online decomposition of surface electromyogram into individual motor unit activities using progressive FastICA peel-off. IEEE Trans Biomed Eng. 2024;71:160–70.
Zhao H, Zhang X, Chen M, Zhou P. Adaptive online decomposition of surface EMG using progressive FastICA peel-off. IEEE Trans Biomed Eng. 2023:1–11.
Chen C, Ma S, Sheng X, Farina D, Zhu X. Adaptive real-time identification of motor unit discharges from non-stationary high-density surface electromyographic signals. IEEE Trans Biomed Eng. 2020;67:3501–9.
Negro F, Muceli S, Castronovo AM, Holobar A, Farina D. Multi-channel intramuscular and surface EMG decomposition by convolutive blind source separation. J Neural Eng. 2016;13:026027.
Hyvarinen A. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans Neural Netw. 1999;10:626–34.
Hyvärinen A, Oja E. Independent component analysis: algorithms and applications. Neural Netw. 2000;13:411–30.
Zheng Y, Hu X. Real-time isometric finger extension force estimation based on motor unit discharge information. J Neural Eng. 2019;16:066006.
Genzel M, Macdonald J, März M. Solving inverse problems with deep neural networks: robustness included? IEEE Trans Pattern Anal Mach Intell. 2023;45:1119–34.
Clarke AK, Atashzar SF, Vecchio AD, Barsakcioglu D, Muceli S, Bentley P, et al. Deep learning for robust decomposition of high-density surface EMG signals. IEEE Trans Biomed Eng. 2021;68:526–34.
Urh F, Strnad D, Clarke A, Farina D, Holobar A. On the selection of neural network architecture for supervised motor unit identification from high-density surface EMG. In: Proc 42nd Annu Int Conf IEEE EMBC. 2020. p. 736–9.
Wen Y, Avrillon S, Hernandez-Pavon JC, Kim SJ, Hug F, Pons JL. A convolutional neural network to identify motor units from high-density surface electromyography signals in real time. J Neural Eng. 2021;18.
Du X, Deng X, Qin H, Shu Y, Liu F, Zhao G, et al. MMPosE: movie-induced multi-label positive emotion classification through EEG signals. IEEE Trans Affect Comput. 2023;14:2925–38.
Ravi MG, E.A G VVS. Explainable deep learning-based approach for multilabel classification of electrocardiogram. IEEE Trans Eng Manage. 2023;70:2787–99.
Kline JC, De Luca CJ. Synchronization of motor unit firings: an epiphenomenon of firing rate characteristics not common inputs. J Neurophysiol. 2016;115:178–92.
Zhao M, Zhong S, Fu X, Tang B, Pecht M. Deep residual shrinkage networks for fault diagnosis. IEEE Trans Ind Inf. 2020;16:4681–90.
Hug F, Del Vecchio A, Avrillon S, Farina D, Tucker K. Muscles from the same muscle group do not necessarily share common drive: evidence from the human triceps surae. J Appl Physiol. 2021;130:342–54.
Holobar A, Farina D. Blind source identification from the multichannel surface electromyogram. Physiol Meas. 2014;35:R143–65.
Hug F, Avrillon S, Del Vecchio A, Casolo A, Ibanez J, Nuccio S, et al. Analysis of motor unit spike trains estimated from high-density surface electromyography is highly reliable across operators. J Electromyogr Kinesiol. 2021;58:102548.
Del Vecchio A, Falla D, Felici F, Farina D. The relative strength of common synaptic input to motor neurons is not a determinant of the maximal rate of force development in humans. J Appl Physiol. 2019;127:205–14.
Holobar A, Minetto MA, Farina D. Accurate identification of motor unit discharge patterns from high-density surface EMG and validation with a novel signal-based performance metric. J Neural Eng. 2014;11:016008.
Laine CM, Martinez-Valdes E, Falla D, Mayer F, Farina D. Motor neuron pools of synergistic thigh muscles share most of their synaptic input. J Neurosci. 2015;35:12207–16.
Zhang X, Zhou P. Filtering of surface EMG using ensemble empirical mode decomposition. Med Eng Phys. 2013;35:537–42.
Ma S, Lv B, Lin C, Sheng X, Zhu X. EMG signal filtering based on variational mode decomposition and sub-band thresholding. IEEE J Biomed Health Inf. 2021;25:47–58.
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: an imperative style, high-performance deep learning library. In: Proc 33rd Conf Neural Information Processing Systems (NeurIPS). 2019.
Kingma DP, Ba J. Adam: a method for stochastic optimization. In: Proc 3rd Int Conf Learning Representations (ICLR). 2015.
Del Vecchio A, Úbeda A, Sartori M, Azorín JM, Felici F, Farina D. Central nervous system modulates the neuromechanical delay in a broad range for the control of muscle force. J Appl Physiol. 2018;125:1404–10.
Del Vecchio A, Holobar A, Falla D, Felici F, Enoka RM, Farina D. Tutorial: analysis of motor unit discharge characteristics from high-density surface EMG signals. J Electromyogr Kinesiol. 2020;53:102426.
Tenan MS, Marti CN, Griffin L. Motor unit discharge rate is correlated within individuals: a case for multilevel model statistical analysis. J Electromyogr Kinesiol. 2014;24:917–22.
Del Vecchio A, Negro F, Holobar A, Casolo A, Folland JP, Felici F, et al. You are as fast as your motor neurons: speed of recruitment and maximal discharge of motor neurons determine the maximal rate of force development in humans. J Physiol. 2019;597:2445–56.
Lin C, Chen C, Cui Z, Zhu X. A Bi-GRU-attention neural network to identify motor units from high-density surface electromyographic signals in real time. Front Neurosci. 2024;18:1306054.
Glaser V, Holobar A. Impact of motor unit action potential components on the motor unit identification from dynamic high-density surface electromyograms. In: Proc 8th Int IEEE/EMBS Conf Neural Eng. 2017.
Gazzoni M, Botter A, Vieira T. Surface EMG and muscle fatigue: multi-channel approaches to the study of myoelectric manifestations of muscle fatigue. Physiol Meas. 2017;38:R27.
Acknowledgements
Not applicable.
Funding
This work was supported in part by the Shenzhen Medical Research Fund under Grant A2302006, the National Natural Science Foundation of China under Grants 61973220 and 62001463, the Medicine Plus Program of Shenzhen University under Grant 2024YG002, the Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515012012, and the Shenzhen Science and Technology Program under Grant KQTD20210811090217009.
Author information
Contributions
J.M. designed the study, collected and analyzed the data, and drafted the manuscript. L.W. assisted in collecting and analyzing the data and contributed to the data processing and model training. R.W. helped perform the statistical analysis and contributed to the interpretation of results. N.Z. conducted the literature review and provided critical insights. J.W. assisted in data processing and model validation. J.L. and Q.L. contributed to the methodology and overall study design. L.T. supervised the study and ensured compliance with ethical standards. N.J. and G.D. served as corresponding authors, providing guidance throughout the study and manuscript preparation, and contributed to drafting the manuscript and revising it critically for important intellectual content. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable. This study used a publicly available dataset containing human sEMG signals. Ethical approval and participant consent were provided by the original authors of the dataset, as detailed in their publication. No additional ethical approval was required for the secondary analysis of this data.
Consent for publication
Not applicable. This study does not contain data from any individual person.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ma, J., Wang, L., Wu, R. et al. A multi-label deep residual shrinkage network for high-density surface electromyography decomposition in real-time. J NeuroEngineering Rehabil 22, 106 (2025). https://doi.org/10.1186/s12984-025-01639-3
DOI: https://doi.org/10.1186/s12984-025-01639-3