WO2022030805 - SPEECH RECOGNITION SYSTEM AND METHOD FOR AUTOMATICALLY CALIBRATING DATA LABEL

Publication Number WO/2022/030805
Publication Date 10.02.2022
International Application No. PCT/KR2021/009250
International Filing Date 19.07.2021
IPC
G10L 19/005 2013.1
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  19 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
  005 Correction of errors induced by the transmission channel, if related to the coding algorithm
G10L 21/02 2006.1
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  21 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
  02 Speech enhancement, e.g. noise reduction or echo cancellation
G10L 19/26 2013.1
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  19 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
  04 using predictive techniques
  26 Pre-filtering or post-filtering
G10L 15/14 2006.1
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  15 Speech recognition
  08 Speech classification or search
  14 using statistical models, e.g. Hidden Markov Models
CPC
G10L 15/14
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  15 Speech recognition
  08 Speech classification or search
  14 using statistical models, e.g. Hidden Markov Models [HMMs]
G10L 19/005
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  19 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
  005 Correction of errors induced by the transmission channel, if related to the coding algorithm
G10L 19/26
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  19 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
  04 using predictive techniques
  26 Pre-filtering or post-filtering
G10L 21/02
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  21 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
  02 Speech enhancement, e.g. noise reduction or echo cancellation
Applicants
  • 한양대학교 산학협력단 IUCF-HYU (INDUSTRY-UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY) [KR]/[KR]
Inventors
  • 장준혁 CHANG, Joon-Hyuk
  • 이재홍 LEE, Jaehong
Agents
  • 양성보 YANG, Sungbo
Priority Data
10-2020-0096923 03.08.2020 KR
Publication Language Korean (ko)
Filing Language Korean (ko)
Title
(EN) SPEECH RECOGNITION SYSTEM AND METHOD FOR AUTOMATICALLY CALIBRATING DATA LABEL
(FR) SYSTÈME ET PROCÉDÉ DE RECONNAISSANCE VOCALE POUR ÉTALONNER AUTOMATIQUEMENT UNE ÉTIQUETTE DE DONNÉES
(KO) 데이터 라벨을 자동 교정하는 음성 인식 시스템 및 방법
Abstract
(EN) A speech recognition system and method for automatically calibrating a data label are proposed. According to an embodiment, the speech recognition method may comprise the steps of: performing confidence-based filtering, using a transformer-based speech recognition model, to find where a wrong label occurs in time-series speech data in which correct and wrong labels are temporally mixed; and, after the filtering, replacing the label at each decoder time step determined from that location to carry a wrong label, so as to improve the performance of the transformer-based speech recognition model. In the confidence-based filtering step, the wrong label may be found and calibrated using a confidence obtained from the transition probability between labels at every decoder time step.
(FR) L'invention concerne un système et un procédé de reconnaissance vocale pour étalonner automatiquement une étiquette de données. Un procédé de reconnaissance vocale pour étalonner automatiquement une étiquette de données selon un mode de réalisation peut comprendre les étapes consistant à : réaliser un filtrage basé sur la confiance pour trouver l'emplacement d'occurrence d'une étiquette incorrecte dans des données de parole en série chronologique, dans lequel une étiquette correcte et l'étiquette incorrecte sont temporairement mélangées, à l'aide d'un modèle de reconnaissance vocale basé sur un transformateur ; et après la réalisation du filtrage, remplacer une étiquette à une étape temporelle de décodage, qui a été déterminée comme étant une étiquette incorrecte par l'emplacement d'occurrence de l'étiquette incorrecte, de façon à améliorer les performances du modèle de reconnaissance vocale basé sur un transformateur, l'étape consistant à réaliser un filtrage basé sur la confiance pour trouver l'emplacement d'occurrence de l'étiquette incorrecte dans les données de parole en série chronologique comprenant la recherche et l'étalonnage de l'étiquette incorrecte à l'aide de la confiance obtenue par utilisation d'une probabilité de transition entre les étiquettes à chaque étape temporelle de décodage.
(KO) 데이터 라벨을 자동 교정하는 음성 인식 시스템 및 방법이 제시된다. 일 실시예에 따른 데이터 라벨을 자동 교정하는 음성 인식 방법은, 트랜스포머(Transformer) 기반 음성 인식 모델을 이용하여 정답 라벨과 잘못된 라벨이 시간적으로 혼재해 있는 시계열 음성 데이터에서 잘못된 라벨의 발생 위치를 찾기 위해 신뢰성 기반(confidence-based) 필터링을 수행하는 단계; 및 필터링 후, 상기 잘못된 라벨의 발생 위치에 의해 잘못된 라벨로 판단된 디코더 타임 스텝(decoder time step)에서의 라벨을 교체하여 상기 트랜스포머(Transformer) 기반 음성 인식 모델의 성능을 향상시키는 단계를 포함하고, 상기 시계열 음성 데이터에서 잘못된 라벨의 발생 위치를 찾기 위해 신뢰성 기반(confidence-based) 필터링을 수행하는 단계는, 매 디코더 타임 스텝(decoder time step)마다의 라벨 간의 전이(transition) 확률을 이용한 신뢰성(confidence)으로 잘못된 라벨을 찾아 교정할 수 있다.
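The two steps named in the abstract (confidence-based filtering, then label replacement at the flagged decoder time steps) can be illustrated with a minimal sketch. This is not the applicants' implementation. The abstract does not say how the confidence is computed from the transition probabilities, so the sketch assumes the confidence at a step is simply the probability the decoder assigns to the given label, and that a flagged label is replaced by the most probable one. The function name `correct_labels`, the `threshold` parameter, and the toy probabilities are all hypothetical.

```python
from typing import List, Sequence, Tuple


def correct_labels(
    step_probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.5,
) -> Tuple[List[int], List[int]]:
    """Return (corrected_labels, flagged_steps).

    step_probs[t][k] is the decoder's probability of label k at time step t.
    ASSUMPTION: confidence = probability of the given label at that step;
    the patent abstract does not define the exact formula.
    """
    if len(step_probs) != len(labels):
        raise ValueError("one probability row is required per label")

    corrected: List[int] = []
    flagged: List[int] = []
    for t, (probs, label) in enumerate(zip(step_probs, labels)):
        confidence = probs[label]
        if confidence < threshold:
            # Filtering: this step is judged to carry a wrong label.
            flagged.append(t)
            # Replacement (assumed rule): take the most probable label.
            corrected.append(max(range(len(probs)), key=probs.__getitem__))
        else:
            corrected.append(label)
    return corrected, flagged


# Toy data: three decoder time steps over a three-label vocabulary.
step_probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.70, 0.20, 0.10],
]
labels = [0, 1, 2]  # the label at step 2 disagrees with the model

corrected, flagged = correct_labels(step_probs, labels, threshold=0.5)
print(corrected, flagged)  # [0, 1, 0] [2]
```

In this toy run only step 2 falls below the threshold, so only that label is replaced. A fixed threshold is the simplest possible filter and is used here purely for illustration.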