1. WO2020062680 - WAVEFORM SPLICING METHOD AND APPARATUS BASED ON DOUBLE SYLLABLE MIXING, AND DEVICE, AND STORAGE MEDIUM

Publication Number WO/2020/062680
Publication Date 02.04.2020
International Application No. PCT/CN2018/124440
International Filing Date 27.12.2018
IPC
All entries fall under G (PHYSICS) > G10 (MUSICAL INSTRUMENTS; ACOUSTICS) > G10L (SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING) > 13 (Speech synthesis; Text to speech systems).
  • G10L 13/02 (2013.01): Methods for producing synthetic speech; Speech synthesisers
  • G10L 13/06 (2013.01): Elementary speech units used in speech synthesisers; Concatenation rules
CPC
All entries fall under G (PHYSICS) > G10 (MUSICAL INSTRUMENTS; ACOUSTICS) > G10L (SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING) > 13 (Speech synthesis; Text to speech systems).
  • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
  • G10L 13/033: Methods for producing synthetic speech; Speech synthesisers > Voice editing, e.g. manipulating the voice of the synthesiser
  • G10L 13/06: Elementary speech units used in speech synthesisers; Concatenation rules
  • G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
  • G10L 13/10: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination > Prosody rules derived from text; Stress or intonation
  • G10L 2013/105: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination > Prosody rules derived from text; Stress or intonation > Duration
Applicants
  • 平安科技(深圳)有限公司 PING AN TECHNOLOGY (SHENZHEN) CO., LTD. [CN]/[CN]
Inventors
  • 房树明 FANG, Shuming
  • 程宁 CHENG, Ning
  • 王健宗 WANG, Jianzong
  • 肖京 XIAO, Jing
Agents
  • 北京英特普罗知识产权代理有限公司 INTELLECPRO CHINA LIMITED
Priority Data
201811153693.2  30.09.2018  CN
Publication Language Chinese (ZH)
Filing Language Chinese (ZH)
Title
(EN) WAVEFORM SPLICING METHOD AND APPARATUS BASED ON DOUBLE SYLLABLE MIXING, AND DEVICE, AND STORAGE MEDIUM
(FR) PROCÉDÉ ET APPAREIL D'ASSEMBLAGE DE FORMES D'ONDE BASÉS SUR UN MÉLANGE DE SYLLABES DOUBLES, ET DISPOSITIF, ET SUPPORT DE STOCKAGE
(ZH) 基于双音节混搭的波形拼接方法、装置、设备及存储介质
Abstract
(EN)
A waveform splicing method based on double syllable mixing, belonging to the field of concatenative speech synthesis. The method comprises: sound library production (step 10): dividing the standard audio of each disyllabic word at the Chinese finals (the vowel part of each syllable) into front, middle, and rear sections, and saving each section in a sound library as a primitive speech segment required for waveform splicing; text preprocessing (step 20): normalizing the text to be converted into speech, segmenting the normalized text according to speaking rules to form phrases, and annotating pinyin and tone; phrase waveform splicing (step 30): taking each segmented phrase as a unit, treating every two adjacent characters in the phrase as a disyllabic word to be converted, and looking up in the sound library, according to a splicing rule, the primitive speech segments corresponding to that disyllabic word; and text audio splicing (step 40): splicing the audio files of the phrases, in phrase order, into a speech file for the text. By combining double syllable mixing with segmentation at the finals, the invention can synthesize highly realistic Chinese speech both offline and in real time.
(FR)
La présente invention concerne un procédé d'assemblage de formes d'onde basé sur un mélange de syllabes doubles, qui se rapporte au domaine de la synthèse d'assemblage de parole. Le procédé comprend les étapes suivantes : production de sonothèque (étape 10) : diviser un audio standard d'un mot dissyllabique en une première section, en une section intermédiaire et en une section arrière de l'audio selon des voyelles chinoises, chaque section de l'audio étant sauvegardée dans une sonothèque en tant que segment de parole primitif requis pour un assemblage de formes d'onde ; pré-traitement de texte (étape 20) : régulariser le texte à convertir en parole, et segmenter en mots le texte régularisé selon des règles de parole pour former des locutions, et marquer l'épellation et le ton ; assemblage de formes d'onde de locution (étape 30) : dans des unités de locutions après la segmentation en mots, utiliser tous les deux mots adjacents dans des locutions comme mot dissyllabique à convertir, et rechercher, à partir de la sonothèque et selon une règle d'assemblage, un segment de parole primitif correspondant au mot dissyllabique à convertir ; et assemblage audio de texte (étape 40) : selon l'ordre de chaque locution, assembler séquentiellement un fichier audio de chaque locution en un fichier de parole du texte. Selon la présente invention, une parole chinoise hors ligne et en temps réel extrêmement réaliste peut être synthétisée au moyen d'un mélange de syllabes doubles et d'une segmentation en voyelles chinoises.
(ZH)
一种基于双音节混搭的波形拼接方法,属于语音拼接合成术领域。该方法包括:音库制作(步骤10):将双音节词的标准音频按韵母切分为前、中、后三段音频,每段音频作为波形拼接所需的基元语音片段保存至音库中;文本预处理(步骤20):将待转化成语音的文本正则化,对正则化后的文本按说话规则分词以形成短语,并标注拼音和声调;短语波形拼接(步骤30):以分词之后的短语为单位,将短语中每两个相邻的字作为一个待转化双音节词,根据拼接规则从音库中查找与所述待转化双音节词相对应的基元语音片段;文本音频拼接(步骤40):按各个短语的顺序,依次拼接各个短语的音频文件为所述文本的语音文件。其通过双音节混搭和韵母分割,可合成非常逼真的离线和实时中文语音。
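Steps 30 and 40 of the abstract can be illustrated with a minimal Python sketch. The abstract does not state the splicing rule, so the rule used here (front section of the first pair, middle section of every pair, rear section of the last pair) is only an assumption chosen so that each character is covered once. All names (`adjacent_pairs`, `plan_segments`, `splice_phrase`, `splice_text`) and the dictionary-based sound library are hypothetical, and lists of integers stand in for audio samples.

```python
from typing import Dict, List, Tuple

# Hypothetical key: (first syllable, second syllable, section name).
# Syllables are written as toned pinyin, e.g. "ni3"; sections are
# "front", "middle", or "rear" as named in the abstract.
Key = Tuple[str, str, str]
Library = Dict[Key, List[int]]  # toy stand-in for stored audio samples


def adjacent_pairs(syllables: List[str]) -> List[Tuple[str, str]]:
    """Treat every two adjacent characters as one disyllabic word (step 30)."""
    return list(zip(syllables, syllables[1:]))


def plan_segments(syllables: List[str]) -> List[Key]:
    """Assumed splicing rule, not taken from the abstract:
    front of the first pair, middle of every pair, rear of the last pair."""
    pairs = adjacent_pairs(syllables)
    if not pairs:
        raise ValueError("this sketch only handles phrases of two or more characters")
    first, last = pairs[0], pairs[-1]
    plan: List[Key] = [(first[0], first[1], "front")]
    plan += [(a, b, "middle") for a, b in pairs]
    plan.append((last[0], last[1], "rear"))
    return plan


def splice_phrase(syllables: List[str], library: Library) -> List[int]:
    """Concatenate the planned primitive segments for one phrase."""
    audio: List[int] = []
    for key in plan_segments(syllables):
        audio.extend(library[key])  # raises KeyError if a segment is missing
    return audio


def splice_text(phrases: List[List[str]], library: Library) -> List[int]:
    """Concatenate phrase audio in phrase order (step 40)."""
    audio: List[int] = []
    for phrase in phrases:
        audio.extend(splice_phrase(phrase, library))
    return audio
```

Calling `splice_text` with a list of phrases, each a list of toned pinyin syllables, and a library holding the three sections of every needed pair returns the concatenated samples. A real system would store waveform data and would probably smooth the joins, neither of which the abstract describes.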