Settings

Offices: All
Languages: en
Stemming: true
Single Family Member: false
Include NPL: false

Refine Options

Offices: All
Languages: Specifies the language of your search keywords.
Stemming: Reduces inflected words to their stem or root form. For example, the words "fishing", "fished", "fish", and "fisher" are reduced to the root word "fish", so a search for "fisher" returns all of these variations (see the sketch after this list).
Single Family Member: Returns only one member of a family of patents.
Include NPL: Includes non-patent literature in the results.
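
A minimal illustration of this behaviour, using NLTK's Porter stemmer as a stand-in (PATENTSCOPE does not document which stemmer it uses, so exact reductions may differ):

    # Stemming demo with NLTK's Porter stemmer (an assumed stand-in for
    # PATENTSCOPE's unspecified stemmer).
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["fishing", "fished", "fish", "fisher"]:
        print(f"{word:8s} -> {stemmer.stem(word)}")

    # fishing  -> fish
    # fished   -> fish
    # fish     -> fish
    # fisher   -> fisher   (Porter keeps the agent noun; PATENTSCOPE's own
    #                       stemmer evidently folds it into "fish" as well)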

Full Query

AI functional applications: Speech Processing

Results

1. 20240071386 - Interpreting words prior to vocalization
US 29.02.2024
Int. Class: G10L 15/25
  G           PHYSICS
  G10         MUSICAL INSTRUMENTS; ACOUSTICS
  G10L        SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 15     Speech recognition
  G10L 15/24  Speech recognition using non-acoustical features
  G10L 15/25  using position of the lips, movement of the lips or face analysis
Appl. No. 18505368 | Applicant: Q (CUE) LTD. | Inventor: Aviad Maizels

Systems, methods, and computer program products are disclosed for removing noise from facial skin micromovement signals. Removing noise from facial skin micromovements includes, during a time period when an individual is involved in at least one non-speech-related physical activity, operating a light source in a manner enabling illumination of a facial skin region of the individual; receiving signals representing light reflections from the facial skin region; analyzing the received signals to identify a first reflection component indicative of prevocalization facial skin micromovements and a second reflection component associated with the at least one non-speech-related physical activity; and filtering out the second reflection component to enable interpretation of words from the first reflection component indicative of the prevocalization facial skin micromovements.
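
The filtering step lends itself to a short illustration. The sketch below is a loose stand-in for the claimed analysis, assuming (the abstract does not say this) that non-speech activity such as chewing occupies a lower frequency band than prevocalization micromovements; the sampling rate and cutoff are likewise invented for the example:

    # Hypothetical separation of the two reflection components by frequency.
    import numpy as np
    from scipy.signal import butter, filtfilt

    FS = 500  # Hz, assumed sampling rate of the reflection signal

    def keep_micromovement_component(reflection: np.ndarray,
                                     cutoff_hz: float = 20.0) -> np.ndarray:
        """Suppress the slow, non-speech-related component and keep the
        prevocalization micromovement component."""
        b, a = butter(4, cutoff_hz / (FS / 2), btype="highpass")
        return filtfilt(b, a, reflection)

    # Example: a 2 Hz chewing artifact mixed with a 40 Hz micromovement tone.
    t = np.arange(0, 2, 1 / FS)
    mixed = np.sin(2 * np.pi * 2 * t) + 0.1 * np.sin(2 * np.pi * 40 * t)
    cleaned = keep_micromovement_component(mixed)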

2. 20240127824 - Identifying silent speech using recorded speech
US 18.04.2024
Int. Class: G06F 40/263
  G            PHYSICS
  G06          COMPUTING; CALCULATING OR COUNTING
  G06F         ELECTRIC DIGITAL DATA PROCESSING
  G06F 40      Handling natural language data
  G06F 40/20   Natural language analysis
  G06F 40/263  Language identification
Appl. No. 18505353 | Applicant: Q (CUE) LTD. | Inventor: Yonatan Wexler

Systems, methods, and non-transitory computer readable media including instructions for interpreting facial skin micromovements are disclosed. Interpreting facial skin micromovements includes receiving during a first time period first signals representing prevocalization facial skin micromovements, and receiving during a second time period succeeding the first time period, second signals representing sounds. The sounds are analyzed to identify words spoken during the second time period, and the words are correlated with the prevocalization facial skin micromovements received during the first time period. The correlations are stored for future use. During a third time period, third signals representing facial skin micromovements are received in an absence of vocalization. Using the correlations, language associated with the third signals is identified and outputted.
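
A minimal sketch of this correlate-then-look-up flow, with the feature extractor and speech recognizer stubbed out: micromovement signals are reduced to plain feature vectors, the stored "correlations" to (vector, word) pairs, and the lookup to nearest neighbour, which is an assumption rather than the claimed matching method:

    import numpy as np

    class SilentSpeechCodebook:
        """Pairs prevocalization micromovement features with recognized words."""

        def __init__(self) -> None:
            self.features: list[np.ndarray] = []
            self.words: list[str] = []

        def correlate(self, micromovements: np.ndarray, spoken_word: str) -> None:
            # First and second time periods: store the pair for future use.
            self.features.append(micromovements)
            self.words.append(spoken_word)

        def identify(self, micromovements: np.ndarray) -> str:
            # Third time period (no vocalization): return the word whose
            # stored micromovement signature is closest to the new signal.
            dists = [np.linalg.norm(f - micromovements) for f in self.features]
            return self.words[int(np.argmin(dists))]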

3. 20210183392 - PHONEME-BASED NATURAL LANGUAGE PROCESSING
US 17.06.2021
Int. Class: G10L 15/26
  G           PHYSICS
  G10         MUSICAL INSTRUMENTS; ACOUSTICS
  G10L        SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 15     Speech recognition
  G10L 15/26  Speech to text systems
Appl. No. 17028361 | Applicant: LG ELECTRONICS INC. | Inventor: Kwangyong LEE

A natural language processing method and apparatus are disclosed. A natural language processing method according to an embodiment of the present disclosure includes extracting a phoneme string from a text corpus labeled with recognition information including at least one of a named entity (NE) or a speech intention, generating a phoneme-based training data set by labeling the recognition information in the extracted phoneme string, and generating an artificial neural network-based learning model (LM) using the generated training data set. The natural language processing method of the present disclosure may be associated with an artificial intelligence module, a drone (unmanned aerial vehicle, UAV), a robot, an AR (augmented reality) device, a VR (virtual reality) device, a device associated with 5G services, etc.
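
A rough sketch of the training-data step: convert each labeled utterance to a phoneme string and carry its recognition labels over to the phoneme level. The g2p_en package serves as a convenient grapheme-to-phoneme front end here (the patent does not name one), and the corpus entry is invented:

    from g2p_en import G2p

    g2p = G2p()
    labeled_corpus = [
        {"text": "turn on the living room light", "intent": "device_control"},
    ]

    # Phoneme-based training set: each entry pairs a phoneme string,
    # e.g. ['T', 'ER1', 'N', ...], with the original recognition label.
    phoneme_dataset = [
        {"phonemes": g2p(item["text"]), "intent": item["intent"]}
        for item in labeled_corpus
    ]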

4. WO/2025/141559 - DETECTING AND USING NON-TEXTUAL INFORMATION IN HUMAN SPEECH
WO 03.07.2025
Int. Class: G10L 15/18
  G           PHYSICS
  G10         MUSICAL INSTRUMENTS; ACOUSTICS
  G10L        SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 15     Speech recognition
  G10L 15/08  Speech classification or search
  G10L 15/18  using natural language modelling
Appl. No. PCT/IL2024/051195 | Applicant: YEDA RESEARCH AND DEVELOPMENT CO. LTD. | Inventor: HAREL, David

Automatic recognition of non-verbal messages in speech, and in particular the detection or analysis of prosodic, multilayered intonation units such as prosodic unit prototypes and their multi-labeled variations, may form a hierarchical classification for the analysis of non-verbal information or cues in speech. Speech captured by a microphone is fed to a weakly-supervised deep-learning acoustic model for speech recognition and transcription, which may be based on an encoder-decoder Transformer architecture such as Whisper by OpenAI. The model is trained to output the words that form the text of the captured speech, to identify Intonation Units (IUs) that include one or more words, and to associate non-verbal labels with each of the IUs. The labels may indicate a prototype, a discourse function (such as a conversation action), an emotion, an emphasis, or an attitude, as well as a genre of a part, or the whole, of the captured speech.
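
The transcription half of this pipeline can be shown against the openai-whisper API that the abstract name-checks; the segment-to-IU grouping and the label set below are hypothetical stubs for the hierarchical classifier the application actually describes:

    import whisper

    model = whisper.load_model("base")
    result = model.transcribe("conversation.wav")

    def label_intonation_unit(segment: dict) -> dict:
        # Hypothetical classifier: would predict prototype, discourse
        # function, emotion, emphasis, and attitude from the segment's
        # words and its audio span.
        return {"prototype": None, "discourse_function": None, "emotion": None}

    # Treat each Whisper segment as one intonation unit (a simplification).
    intonation_units = [
        {"text": seg["text"], "span": (seg["start"], seg["end"]),
         "labels": label_intonation_unit(seg)}
        for seg in result["segments"]
    ]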

5. 12112752 - Cohort determination in natural language processing
US 08.10.2024
Int. Class: G10L 15/22
  G           PHYSICS
  G10         MUSICAL INSTRUMENTS; ACOUSTICS
  G10L        SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 15     Speech recognition
  G10L 15/22  Procedures used during a speech recognition process, e.g. man-machine dialog
Appl. No. 17688279 | Applicant: Amazon Technologies, Inc. | Inventor: Rahul Gupta

Devices and techniques are generally described for cohort determination in natural language processing. In various examples, a first natural language input to a natural language processing system may be determined. The first natural language input may be associated with a first account identifier. A first machine learning model may determine first data representing one or more words of the first natural language input. A second machine learning model may determine second data representing one or more acoustic characteristics of the first natural language input. Third data may be determined, the third data including a predicted performance for processing the first natural language input by the natural language processing system. The third data may be determined based on the first data representation and the second data representation.
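
A structural sketch of the three-model arrangement: one model embeds the words, a second embeds acoustic characteristics, and a third predicts processing performance from both. All three models are stubs, and the feature sizes and linear scorer are assumptions:

    import numpy as np

    def lexical_features(words: str) -> np.ndarray:
        # First ML model (stub): would embed the recognized words.
        return np.zeros(64)

    def acoustic_features(audio: np.ndarray) -> np.ndarray:
        # Second ML model (stub): would embed acoustic characteristics.
        return np.zeros(64)

    def predicted_performance(lex: np.ndarray, ac: np.ndarray) -> float:
        # Third model: predicts how well the NLP system will handle the
        # input, here reduced to a fixed linear scorer over both vectors.
        combined = np.concatenate([lex, ac])
        weights = np.ones_like(combined) / combined.size
        return float(weights @ combined)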

6. WO/2019/245916 - METHOD AND SYSTEM FOR PARAMETRIC SPEECH SYNTHESIS
WO 26.12.2019
Int. Class: G10L 13/08
  G           PHYSICS
  G10         MUSICAL INSTRUMENTS; ACOUSTICS
  G10L        SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 13     Speech synthesis; Text to speech systems
  G10L 13/08  Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Appl. No. PCT/US2019/037294 | Applicant: GEORGETOWN UNIVERSITY | Inventor: GARMAN, Joe

Embodiments of the present systems and methods may provide techniques for synthesizing speech in any voice in any language in any accent. For example, in an embodiment, a text-to-speech conversion system may comprise a text converter adapted to convert input text to at least one phoneme selected from a plurality of phonemes stored in memory, a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme and an identity of a speaker and to generate acoustic features for each phoneme, and a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language.
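
The claimed three-stage pipeline (text converter, speaker-conditioned acoustic model, decoder) maps onto a skeleton like the following; every component is a stub, since the filing does not commit to particular model families:

    import numpy as np

    def text_to_phonemes(text: str) -> list[str]:
        # Stand-in for a real grapheme-to-phoneme front end.
        return text.lower().split()

    def acoustic_model(phonemes: list[str], speaker_id: str) -> np.ndarray:
        # Would look up the stored voice pattern for speaker_id and emit
        # per-phoneme acoustic features (e.g. 80-bin mel frames).
        return np.zeros((len(phonemes), 80))

    def decoder(features: np.ndarray) -> np.ndarray:
        # Would vocode acoustic features into a waveform.
        return np.zeros(features.shape[0] * 256)

    waveform = decoder(acoustic_model(text_to_phonemes("hello world"), "speaker_7"))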

7. 20250117469 - USING FACIAL SKIN MICROMOVEMENTS TO PAIR WITH A COMPUTING DEVICE
US 10.04.2025
Int. Class: G06F 21/32
  G           PHYSICS
  G06         COMPUTING; CALCULATING OR COUNTING
  G06F        ELECTRIC DIGITAL DATA PROCESSING
  G06F 21     Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
  G06F 21/30  Authentication, i.e. establishing the identity or authorisation of security principals
  G06F 21/31  User authentication
  G06F 21/32  using biometric data, e.g. fingerprints, iris scans or voiceprints
Appl. No. 18982905 | Applicant: Q (CUE) LTD. | Inventor: Aviad MAIZELS

Systems, methods, and non-transitory computer-readable media including instructions for detecting and utilizing facial skin micromovements are disclosed. In some non-limiting embodiments, the detection of the facial skin micromovements occurs using a speech detection system that may include a wearable housing, a light source (either a coherent light source or a non-coherent light source), a light detector, and at least one processor. One or more processors may be configured to analyze light reflections received from a facial region to determine the facial skin micromovements, and extract meaning from the determined facial skin micromovements. Examples of meaning that may be extracted from the determined facial skin micromovements may include words spoken by the individual (either silently spoken or vocally spoken), an identification of the individual, an emotional state of the individual, a heart rate of the individual, a respiration rate of the individual, or any other biometric, emotion, or speech-related indicator.
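
The pairing use reduces to a biometric match. A minimal sketch, assuming a fixed-length micromovement signature and a cosine-similarity threshold (neither is specified in the filing):

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def authorize_pairing(live_signature: np.ndarray,
                          enrolled_template: np.ndarray,
                          threshold: float = 0.9) -> bool:
        # Pair the wearable with the device only if the live micromovement
        # signature is close enough to the template stored at enrollment.
        return cosine(live_signature, enrolled_template) >= threshold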

8. 20240127817 - Earbud with facial micromovement detection capabilities
US 18.04.2024
Int. Class: G06F 21/32
  G           PHYSICS
  G06         COMPUTING; CALCULATING OR COUNTING
  G06F        ELECTRIC DIGITAL DATA PROCESSING
  G06F 21     Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
  G06F 21/30  Authentication, i.e. establishing the identity or authorisation of security principals
  G06F 21/31  User authentication
  G06F 21/32  using biometric data, e.g. fingerprints, iris scans or voiceprints
Appl. No. 18512925 | Applicant: Q (CUE) LTD. | Inventor: Yonatan Wexler

A multifunctional earpiece comprising an ear-mountable housing; a speaker integrated with the ear-mountable housing for presenting sound; a light source integrated with the ear-mountable housing for projecting light toward the skin of the wearer's face; and a light detector integrated with the ear-mountable housing and configured to receive reflections from the skin corresponding to facial skin micromovements indicative of prevocalized words of the wearer. The multifunctional earpiece is configured to simultaneously present the sound through the speaker, project the light toward the skin, and detect the received reflections indicative of the prevocalized words.

9. 20230178066 - Method and apparatus for synthesizing multi-speaker speech using artificial neural network
US 08.06.2023
Int. Class: G10L 13/047
  G            PHYSICS
  G10          MUSICAL INSTRUMENTS; ACOUSTICS
  G10L         SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 13      Speech synthesis; Text to speech systems
  G10L 13/02   Methods for producing synthetic speech; Speech synthesisers
  G10L 13/04   Details of speech synthesis systems, e.g. synthesiser structure or memory management
  G10L 13/047  Architecture of speech synthesisers
Appl. No. 17596037 | Applicant: IUCF-HYU (Industry-University Cooperation Foundation Hanyang University) | Inventor: Joon Hyuk Chang

According to an aspect, a method for synthesizing multi-speaker speech using an artificial neural network comprises: generating and storing a speech learning model for a plurality of users by training a synthetic artificial neural network of a speech synthesis model on speech data of the plurality of users; generating speaker vectors, using a speaker recognition model, for a new user who has not yet been learned and for the plurality of users who have already been learned; determining, according to preset criteria, the speaker vector among those of the already-learned users that is most similar to the speaker vector of the new user; and generating and learning a speaker embedding for the new user by training the synthetic artificial neural network of the speech synthesis model, using the speaker embedding of the user with the determined speaker vector as an initial value and based on speech data of the new user.
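
The warm start at the core of this abstract fits in a few lines: pick the enrolled speaker whose speaker vector is most similar to the new user's, and copy that speaker's embedding as the starting point for fine-tuning. Cosine similarity stands in for the "preset criteria", which the abstract leaves open:

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def init_new_speaker_embedding(new_vec: np.ndarray,
                                   known_vecs: dict[str, np.ndarray],
                                   embeddings: dict[str, np.ndarray]) -> np.ndarray:
        # Most similar already-learned speaker by speaker-vector similarity.
        nearest = max(known_vecs, key=lambda s: cosine(known_vecs[s], new_vec))
        # Their embedding becomes the initial value for the new user's
        # embedding, which is then refined on the new user's speech data.
        return embeddings[nearest].copy()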

10. 20150127349 - Method and system for cross-lingual voice conversion
US 07.05.2015
Int. Class: G10L 15/00
  G        PHYSICS
  G10      MUSICAL INSTRUMENTS; ACOUSTICS
  G10L     SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 15  Speech recognition
Appl. No. 14069492 | Applicant: Google Inc. | Inventor: Ioannis Agiomyrgiannakis

A method and system are disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to an HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis.
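
The state-matching procedure can be sketched compactly if each HMM state is reduced to a Gaussian mean and the speaker-compensating transform to a single matrix (both are simplifications; the patent works with full HMM state distributions):

    import numpy as np

    def match_states(output_means: np.ndarray,      # (n_out, d) output-HMM state means
                     aux_means: np.ndarray,         # (n_aux, d) auxiliary-HMM state means
                     speaker_transform: np.ndarray  # (d, d) compensates speaker differences
                     ) -> np.ndarray:
        # Project auxiliary states into the output speaker's space.
        projected = aux_means @ speaker_transform.T
        # For each output state, the index of the nearest auxiliary state.
        dists = np.linalg.norm(output_means[:, None, :] - projected[None, :, :], axis=-1)
        matched = dists.argmin(axis=1)
        # The cross-lingual HMM keeps the output topology but substitutes
        # the matched auxiliary states.
        return aux_means[matched]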