A Presentation and Short Discussion of rVAD-fast, a Fast Voice Activity Detector

Sam Perochon

doi:10.5201/ipol.2022.427

published: 2022-10-11
reference: Sam Perochon, A Presentation and Short Discussion of rVAD-fast, a Fast Voice Activity Detector, Image Processing On Line, 12 (2022), pp. 404–419. https://doi.org/10.5201/ipol.2022.427

Communicated by Jean-Michel Morel
Demo edited by Sam Perochon

Abstract

Voice activity detection (VAD) usually refers to the detection of human voices in acoustic signals and is often used as a pre-processing step in audio signal processing tasks. The unsupervised method presented here was originally developed by Zheng-Hua Tan, Achintya kr. Sarkar and Najim Dehak [Computer Speech & Language, 2020] and is a robust segment-based approach. The voice activity detection stage follows two denoising steps: the first detects high-energy segments using an a posteriori SNR-weighted energy difference, and the second enhances the speech using the MSNE-mod approach. Downstream tasks include intrusion detection, speech-to-text, speaker diarization, and emotion estimation.

This is an MLBriefs article; the source code has not been reviewed!
The original source code is available here (last checked 2022/10/10).

Download

full text manuscript: PDF (2.1MB)
source code: TAR/GZ
