Abstract
Sign language segmentation is a fundamental task in sign language processing, needed to build automatic translation systems. In this work, we study and compare the performance of two state-of-the-art methods for automatic sign language segmentation: 'Automatic Segmentation of Sign Language into Subtitle-Units' [Bull et al., European Conference on Computer Vision Workshops, 2020] and 'Linguistically Motivated Sign Language Segmentation' [Moryossef et al., Findings of the Association for Computational Linguistics, 2023]. Each method has an online demo that can be used to run the approaches presented here on example videos, varying parameters such as the pose model and the probability thresholds. Both methods rely on pauses and movements of the skeletons derived from the videos to detect phrase boundaries. We consider two datasets, one of American Sign Language (the test set of How2Sign) and one of Uruguayan Sign Language (LSU-DS). For the evaluation, we use two metrics from the paper by Moryossef et al. In the case of LSU-DS, since we have triplets of simultaneous videos taken from different points of view, we propose to use the IoU dispersion across points of view to estimate the coherence of the temporal segmentation of a single signer simultaneously observed by different cameras. The performance of the different variants of each method is evaluated, showing the limitations of the methods, the datasets, and the metrics in capturing the quality of the automatic solutions.
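The cross-view coherence measure mentioned above can be illustrated with a minimal sketch. The snippet below is only an illustration of the idea, assuming each view's segmentation is a list of (start, end) times in seconds and summarizing coherence as the mean and standard deviation of the pairwise temporal IoUs between views; the function names, the frame step, and the choice of dispersion statistic are assumptions for the example, not the exact implementation evaluated in the paper.

```python
# Hypothetical sketch of cross-view IoU dispersion for one multi-camera recording.
# Each segmentation is a list of (start, end) times in seconds.
from itertools import combinations
from statistics import mean, pstdev


def temporal_iou(segs_a, segs_b, step=0.04):
    """IoU between the time intervals covered by two segmentations.

    step = 0.04 s (one frame at 25 fps) is an assumed discretization.
    """
    def to_mask(segs, t_max):
        n = int(t_max / step) + 1
        mask = [False] * n
        for start, end in segs:
            for i in range(int(start / step), min(int(end / step) + 1, n)):
                mask[i] = True
        return mask

    t_max = max(end for _, end in list(segs_a) + list(segs_b))
    a, b = to_mask(segs_a, t_max), to_mask(segs_b, t_max)
    inter = sum(x and y for x, y in zip(a, b))
    union = sum(x or y for x, y in zip(a, b))
    return inter / union if union else 1.0


def iou_dispersion(per_view_segments):
    """Mean and spread of the pairwise IoUs across the views of one signer."""
    ious = [temporal_iou(sa, sb) for sa, sb in combinations(per_view_segments, 2)]
    return mean(ious), pstdev(ious)


# Toy example: three camera views of the same signer with slightly shifted boundaries.
views = [
    [(0.0, 2.1), (3.0, 5.0)],
    [(0.1, 2.0), (3.1, 5.2)],
    [(0.0, 2.3), (2.9, 4.9)],
]
print(iou_dispersion(views))  # high mean / low spread -> coherent segmentation across views
```

A low dispersion indicates that the method segments the same signer consistently regardless of the camera viewpoint, which is the property the proposed measure is meant to capture.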
This is an MLBriefs article; the source code has not been reviewed!
The original source codes are available here (LMSLS) and here (ASSLiSU) (last checked 2025/07/26).
Download
- full text preprint manuscript: PDF (4.4MB)
- source codes: ZIP, ZIP