1. WO2017204843 - UNIT-SELECTION TEXT-TO-SPEECH SYNTHESIS BASED ON PREDICTED CONCATENATION PARAMETERS

Publication Number WO/2017/204843
Publication Date 30.11.2017
International Application No. PCT/US2016/053313
International Filing Date 23.09.2016
IPC
G06F 17/20 2006.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
17 Digital computing or data processing equipment or methods, specially adapted for specific functions
20 Handling natural language data
G10L 13/00 2006.01
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
G10L 13/08 2013.01
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
G10L 15/04 2013.01
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15 Speech recognition
04 Segmentation; Word boundary detection
CPC
G10L 13/0335
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
02 Methods for producing synthetic speech; Speech synthesisers
033 Voice editing, e.g. manipulating the voice of the synthesiser
0335 Pitch control
G10L 13/06
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
06 Elementary speech units used in speech synthesisers; Concatenation rules
G10L 13/07
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
06 Elementary speech units used in speech synthesisers; Concatenation rules
07 Concatenation rules
G10L 13/10
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
10 Prosody rules derived from text; Stress or intonation
Applicants
  • APPLE INC. [US]/[US]
Inventors
  • RAITIO, Tuomo, J.
  • PRAHALLAD, Kishore, Sunkeswari
  • CONKIE, Alistair, D.
  • GOLIPOUR, Ladan
  • WINARSKY, David
Agents
  • EIDE, Christopher, B.
  • ARJOMAND, Mehran
  • BELUSKO, Vincent, J.
  • BANKO, Max
  • BASOL, Erol, C.
Priority Data
62/341,948 26.05.2016 US
Publication Language English (EN)
Filing Language English (EN)
Designated States
Title
(EN) UNIT-SELECTION TEXT-TO-SPEECH SYNTHESIS BASED ON PREDICTED CONCATENATION PARAMETERS
(FR) SYNTHÈSE TEXTE-PAROLE À SÉLECTION D'UNITÉS BASÉE SUR DES PARAMÈTRES DE CONCATÉNATION PRÉDITS
Abstract
(EN)
Systems and processes for performing unit-selection text-to-speech synthesis are provided. In an example process, text to be converted to speech is received. The text is represented as a sequence of target units. A plurality of candidate speech segments corresponding to the sequence of target units are selected. Predicted statistical parameters of acoustic features associated with the sequence of target units are determined. The predicted statistical parameters of acoustic features are used to determine target costs and concatenation costs associated with the plurality of candidate speech segments. Based on a combined cost determined from the target costs and concatenation costs, a subset of candidate speech segments is selected from the plurality of candidate speech segments. Speech corresponding to the received text is generated using the subset of candidate speech segments.
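The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the application. It assumes a single scalar acoustic feature per segment, a predicted mean and variance per target unit, a predicted variance per join, and a Viterbi-style dynamic-programming search over the combined cost. All names (`Segment`, `Prediction`, `target_cost`, `concat_cost`, `select_units`) and the cost formulas are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical candidate speech segment with one scalar feature (e.g. pitch)."""
    name: str
    start: float  # feature value at the segment start
    end: float    # feature value at the segment end


@dataclass(frozen=True)
class Prediction:
    """Hypothetical predicted statistics of the feature for one target unit."""
    mean: float
    var: float


def target_cost(seg, pred):
    # Assumed form: variance-normalized squared distance between the
    # segment's mid-point feature value and the predicted mean.
    mid = (seg.start + seg.end) / 2
    return (mid - pred.mean) ** 2 / pred.var


def concat_cost(prev, cur, join_var):
    # Assumed form: squared feature jump across the join, normalized by a
    # predicted variance for that join.
    return (cur.start - prev.end) ** 2 / join_var


def select_units(candidates, predictions, join_vars, w_target=1.0, w_concat=1.0):
    """Pick one candidate per target unit minimizing the combined cost.

    candidates[t] lists the candidates for target unit t, predictions[t]
    holds its predicted statistics, and join_vars[t] is the predicted
    variance for the join between units t and t + 1.
    """
    best = [w_target * target_cost(s, predictions[0]) for s in candidates[0]]
    back = []  # back[t - 1][i]: best predecessor index for candidate i of unit t
    for t in range(1, len(candidates)):
        new_best, ptr = [], []
        for cur in candidates[t]:
            costs = [
                best[j] + w_concat * concat_cost(prev, cur, join_vars[t - 1])
                for j, prev in enumerate(candidates[t - 1])
            ]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[j_min] + w_target * target_cost(cur, predictions[t]))
            ptr.append(j_min)
        best = new_best
        back.append(ptr)

    # Trace the cheapest path back from the last target unit.
    idx = min(range(len(best)), key=best.__getitem__)
    total = best[idx]
    path = [idx]
    for ptr in reversed(back):
        idx = ptr[idx]
        path.append(idx)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)], total
```

In this sketch the predicted variances act as weights: a small predicted variance penalizes deviations more heavily than a large one. Other features, cost functions, or search strategies would fit the same outline.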
(FR)
L'invention concerne des systèmes et des procédés permettant d'effectuer une synthèse texte-parole à sélection d'unités. Dans un exemple de processus, le texte devant être converti en parole est reçu. Le texte est représenté sous la forme d'une séquence d'unités cibles. Une pluralité de segments de parole candidats correspondant à la séquence d'unités cibles est sélectionnée. Des paramètres statistiques prédits de caractéristiques acoustiques associées à la séquence d'unités cibles sont déterminés. Les paramètres statistiques prédits des caractéristiques acoustiques sont utilisés pour déterminer des coûts cibles et des coûts de concaténation associés à la pluralité de segments de parole candidats. Sur la base d'un coût combiné déterminé à partir des coûts cibles et des coûts de concaténation, un sous-ensemble de segments de parole candidats est sélectionné parmi la pluralité de segments de parole candidats. La parole correspondant au texte reçu est générée à l'aide du sous-ensemble de segments de parole candidats.
Latest bibliographic data on file with the International Bureau