Processing

Please wait...

Settings

Settings

Goto Application

1. US20180247639 - Systems and methods for automatic unit selection and target decomposition for sequence labelling

Office United States of America
Application Number 15698593
Application Date 07.09.2017
Publication Number 20180247639
Publication Date 30.08.2018
Grant Number 10373610
Grant Date 06.08.2019
Publication Kind B2
IPC
G10L 15/02
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
02Feature extraction for speech recognition; Selection of recognition unit
G10L 15/06
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/16
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
08Speech classification or search
16using artificial neural networks
G10L 15/197
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
08Speech classification or search
18using natural language modelling
183using context dependencies, e.g. language models
19Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules
197Probabilistic grammars, e.g. word n-grams
G10L 15/04
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
04Segmentation; Word boundary detection
G10L 15/26
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
26Speech to text systems
CPC
G10L 15/04
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
04Segmentation; Word boundary detection
G10L 15/063
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
063Training
G10L 15/16
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
08Speech classification or search
16using artificial neural networks
G10L 15/187
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
08Speech classification or search
18using natural language modelling
183using context dependencies, e.g. language models
187Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
G10L 15/26
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
26Speech to text systems
G10L 15/197
GPHYSICS
10MUSICAL INSTRUMENTS; ACOUSTICS
LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15Speech recognition
08Speech classification or search
18using natural language modelling
183using context dependencies, e.g. language models
19Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
197Probabilistic grammars, e.g. word n-grams
Applicants Baidu USA, LLC
Baidu USA LLC
Inventors Hairong Liu
Zhenyao Zhu
Sanjeev Satheesh
Agents North Weber & Baugh LLP
Title
(EN) Systems and methods for automatic unit selection and target decomposition for sequence labelling
Abstract
(EN)

Described herein are systems and methods for automatic unit selection and target decomposition for sequence labelling. Embodiments include a new loss function called Gram-Connectionist Temporal Classification (CTC) loss that extend the popular CTC loss function criterion to alleviate prior limitations. While preserving the advantages of CTC, Gram-CTC automatically learns the best set of basic units (grams), as well as the most suitable decomposition of target sequences. Unlike CTC, embodiments of Gram-CTC allow a model to output variable number of characters at each time step, which enables the model to capture longer term dependency and improves the computational efficiency. It is also demonstrated that embodiments of Gram-CTC improve CTC in terms of both performance and efficiency on the large vocabulary speech recognition task at multiple scales of data, and that systems that employ an embodiment of Gram-CTC can outperform the state-of-the-art on a standard speech benchmark.

Also published as