WO2020113053 - METHOD AND SYSTEM FOR PROCESSING SPEECH SIGNAL

Publication Number WO/2020/113053
Publication Date 04.06.2020
International Application No. PCT/US2019/063677
International Filing Date 27.11.2019
IPC
G06N 3/08 (2006.01)   Computer systems based on biological models using neural network models; Learning methods
G10L 15/00 (2013.01)  Speech recognition
G10L 15/02 (2006.01)  Feature extraction for speech recognition; Selection of recognition unit
G10L 15/06 (2013.01)  Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/08 (2006.01)  Speech classification or search
CPC
G10L 15/16    Speech classification or search using artificial neural networks
G10L 15/187   Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
G10L 25/12    Speech or voice analysis techniques characterised by the type of extracted parameters; the extracted parameters being prediction coefficients
G10L 25/30    Speech or voice analysis techniques characterised by the analysis technique; using neural networks
Applicants
  • ALIBABA GROUP HOLDING LIMITED
Inventors
  • ZHANG, Shiliang
  • LEI, Ming
  • LI, Wei
  • YAO, Haitao
Agents
  • CAPRON, Aaron, J.
Priority Data
201811457674.9    30.11.2018    CN
Publication Language English (EN)
Filing Language English (EN)
Designated States
Title
(EN) METHOD AND SYSTEM FOR PROCESSING SPEECH SIGNAL
(FR) PROCÉDÉ ET SYSTÈME DE TRAITEMENT DE SIGNAL VOCAL
Abstract
(EN)
Embodiments of the present disclosure provide methods and systems for processing a speech signal. The method can include: processing the speech signal to generate a plurality of speech frames; generating a first number of acoustic features based on the plurality of speech frames using a frame shift at a given frequency; and generating a second number of posteriori probability vectors based on the first number of acoustic features using an acoustic model, wherein each of the posteriori probability vectors comprises probabilities of the acoustic features corresponding to a plurality of modeling units, respectively.
(FR)
Dans certains modes de réalisation, la présente invention concerne des procédés et des systèmes de traitement d'un signal vocal. Le procédé peut consister à : traiter le signal vocal pour générer une pluralité de trames vocales, générer un premier nombre de caractéristiques acoustiques sur la base de la pluralité de trames vocales à l'aide d'un décalage de trame à une fréquence donnée ; et générer un second nombre de vecteurs de probabilité a posteriori sur la base du premier nombre de caractéristiques acoustiques à l'aide d'un modèle acoustique, chacun des vecteurs de probabilité a posteriori comprenant les probabilités que les caractéristiques acoustiques correspondent, respectivement, à une pluralité d'unités de modélisation.
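As a rough illustration of the pipeline the abstract describes (frames, then acoustic features obtained with a frame shift, then posterior probability vectors over modeling units from an acoustic model), the following is a minimal NumPy sketch. The 16 kHz sampling rate, 25 ms window with 10 ms frame shift, 40-dimensional placeholder features, 100 modeling units, the 3x frame-rate reduction, and the random "acoustic model" are all illustrative assumptions, not details taken from the publication.

```python
# Minimal sketch of the pipeline described in the abstract (illustrative only).
# Frame sizes, the feature extractor, the 3x reduction and the toy "model"
# are assumptions for demonstration, not details from WO2020113053.
import numpy as np

def split_into_frames(signal, frame_len=400, frame_shift=160):
    """Cut a 1-D speech signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // frame_shift)
    return np.stack([signal[i * frame_shift : i * frame_shift + frame_len]
                     for i in range(n_frames)])

def extract_features(frames, n_features=40):
    """Stand-in feature extractor: log energy pooled into n_features spectral bands."""
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bands = np.array_split(spectrum, n_features, axis=1)
    return np.log(np.stack([b.mean(axis=1) for b in bands], axis=1) + 1e-10)

def acoustic_model(features, n_units=100, reduction=3):
    """Hypothetical acoustic model: emits one posterior vector over n_units
    modeling units for every `reduction` input feature vectors, so the second
    number (posteriors) is smaller than the first number (features)."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal((features.shape[1], n_units))
    logits = features[::reduction] @ w           # keep every `reduction`-th feature
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)      # each row sums to 1 (posteriors)

signal = np.random.default_rng(1).standard_normal(16000)  # 1 s of fake audio
frames = split_into_frames(signal)      # plurality of speech frames
feats = extract_features(frames)        # first number of acoustic features
posteriors = acoustic_model(feats)      # second number of posterior probability vectors
print(frames.shape, feats.shape, posteriors.shape)
```

The point of the sketch is the shape relationship: the model consumes the first number of feature vectors but emits a smaller second number of posterior vectors, each a probability distribution over the modeling units, which matches the first-number/second-number distinction drawn in the abstract.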
Latest bibliographic data on file with the International Bureau