1. WO2022068233 - SPEECH RECOGNITION METHOD AND APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM

Publication Number WO/2022/068233
Publication Date 07.04.2022
International Application No. PCT/CN2021/096848
International Filing Date 28.05.2021
IPC
G PHYSICS > G10 MUSICAL INSTRUMENTS; ACOUSTICS > G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  • G10L 15/02 (2006.1) Speech recognition: Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/08 (2006.1) Speech recognition: Speech classification or search
  • G10L 15/26 (2006.1) Speech recognition: Speech to text systems
  • G10L 25/03 (2013.1) Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00: characterised by the type of extracted parameters
CPC
G PHYSICS > G10 MUSICAL INSTRUMENTS; ACOUSTICS > G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  • G10L 15/02 Speech recognition: Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/08 Speech recognition: Speech classification or search
  • G10L 15/26 Speech recognition: Speech to text systems
  • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00: characterised by the type of extracted parameters
Applicants
  • 北京捷通华声科技股份有限公司 BEIJING SINOVOICE TECHNOLOGY CO., LTD [CN]/[CN]
Inventors
  • 李健 LI, Jian
  • 韩雨 HAN, Yu
  • 武卫东 WU, Weidong
  • 陈明 CHEN, Ming
Agents
  • 北京康信知识产权代理有限责任公司 KANGXIN PARTNERS, P.C.
Priority Data
202011046734.5 29.09.2020 CN
Publication Language Chinese (ZH)
Filing Language Chinese (ZH)
Title
(EN) SPEECH RECOGNITION METHOD AND APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM
(FR) PROCÉDÉ ET APPAREIL DE RECONNAISSANCE DE LA PAROLE, ET SUPPORT DE STOCKAGE LISIBLE PAR ORDINATEUR
(ZH) 一种语音识别的方法、装置及计算机可读存储介质
Abstract
(EN) A speech recognition method and apparatus, and a computer-readable storage medium. The method comprises: converting acquired audio data into a corresponding spectrogram (101); determining whether the number of frames of the spectrogram equals a preset number of frames (102); if it does not, zero-padding the spectrogram so that the resulting spectrogram to be recognized has the preset number of frames (103); inputting the spectrogram to be recognized into an acoustic model (104); and obtaining the recognition text output by the acoustic model (105). Compared with the prior art, in which computing MFCC features loses information in the frequency domain, the solution reduces the loss of input features, makes the audio data more distinguishable, and makes it easier for the acoustic model to extract feature information.
(FR) La présente invention concerne un procédé et un appareil de reconnaissance de la parole, et un support de stockage lisible par ordinateur. Le procédé comprend les étapes consistant à : convertir des données audio acquises en un spectrogramme correspondant (101) ; déterminer si le nombre de trames du spectrogramme est un nombre prédéfini de trames (102) ; si le nombre de trames du spectrogramme n'est pas le nombre prédéfini de trames, mettre en oeuvre un remplissage par zéros sur le spectrogramme afin que le nombre de trames d'un spectrogramme à reconnaître, qui est obtenu après le remplissage par zéros, soit le nombre prédéfini de trames (103) ; appliquer ledit spectrogramme à l'entrée d'un modèle acoustique (104) ; et obtenir un texte de reconnaissance produit par le modèle acoustique (105). Par rapport à une perte d'informations dans un domaine fréquentiel causée par le calcul de caractéristiques MFCC dans l'état de la technique, la solution de l'invention présente les avantages suivants : réduction de la perte de caractéristiques d'entrée, augmentation du degré de reconnaissance de données audio et facilitation de l'extraction d'informations de caractéristiques par un modèle acoustique.
(ZH) 一种语音识别的方法、装置及计算机可读存储介质。该方法包括:将获取的音频数据转化为对应的语谱图(101);判断语谱图的帧数是否为预设帧数(102);若语谱图的帧数不为预设帧数,则对语谱图进行补零,以使补零后得到的待识别语谱图的帧数为预设帧数(103);将待识别语谱图输入到声学模型(104);获得声学模型输出的识别文本(105)。相较现有技术计算MFCC特征造成的频域上的信息损失,该方案减少了输入特征的损失,增加了音频数据的辨识度,并且更加有利于声学模型提取特征信息。
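Illustrative sketch (not part of the application)
Steps 101 to 103 of the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the frame length, hop size, Hann window, preset frame count, the trailing position of the zero frames, and the function names spectrogram and pad_to_preset are hypothetical and do not come from the application. The acoustic model of steps 104 and 105 is omitted.

```python
import cmath
import math

# Hypothetical value: the abstract does not state the preset number of frames.
PRESET_FRAMES = 8


def spectrogram(samples, frame_len=16, hop=8):
    """Step 101: magnitude spectrogram via a naive DFT.

    Returns one row per frame and one column per frequency bin.
    frame_len, hop and the Hann window are assumptions for illustration.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def pad_to_preset(spec, preset_frames=PRESET_FRAMES):
    """Steps 102-103: check the frame count and zero-pad up to preset_frames.

    Appending the zero frames at the end is an assumption; the abstract
    does not say where the padding goes.
    """
    if len(spec) == preset_frames:
        return spec
    if len(spec) > preset_frames:
        # The abstract only describes padding, so longer inputs are rejected here.
        raise ValueError("spectrogram has more frames than the preset count")
    n_bins = len(spec[0]) if spec else 0
    padding = [[0.0] * n_bins for _ in range(preset_frames - len(spec))]
    return spec + padding


# Usage: a short synthetic tone yields 5 frames, which are padded to 8.
audio = [math.sin(2 * math.pi * 4 * n / 16) for n in range(48)]
spec = spectrogram(audio)
padded = pad_to_preset(spec)
```

The padded spectrogram keeps every frequency bin of the original frames, which is the property the abstract contrasts with MFCC features. A practical implementation would more likely use an FFT routine than the naive DFT shown here.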