Settings

Offices: All
Languages: en
Stemming: true
Single Family Member: false
Include NPL: false

Refine Options

Offices: All
Languages: Specifies the language of your search keywords.
Stemming: Reduces inflected words to their stem or root form. For example, the words "fishing", "fished", "fish", and "fisher" are reduced to the root word "fish", so a search for "fisher" returns all of these variations (see the sketch after this list).
Single Family Member: Returns only one member of a family of patents.
Include NPL: Includes non-patent literature in the results.
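
A minimal illustration of this behaviour, using NLTK's Porter stemmer as a stand-in (PATENTSCOPE does not document which stemmer it uses, so exact reductions may differ):

    # Stemming demo with NLTK's Porter stemmer (an assumed stand-in for
    # PATENTSCOPE's unspecified stemmer).
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["fishing", "fished", "fish", "fisher"]:
        print(f"{word:8s} -> {stemmer.stem(word)}")

    # fishing  -> fish
    # fished   -> fish
    # fish     -> fish
    # fisher   -> fisher   (Porter keeps the agent noun; PATENTSCOPE's own
    #                       stemmer evidently folds it into "fish" as well)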

Full Query

AI functional applications: Speech Processing

Results

1. 20240071386 - Interpreting words prior to vocalization
US 29.02.2024
Int. Class: G10L 15/25
  G           PHYSICS
  G10         MUSICAL INSTRUMENTS; ACOUSTICS
  G10L        SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 15     Speech recognition
  G10L 15/24  Speech recognition using non-acoustical features
  G10L 15/25  using position of the lips, movement of the lips or face analysis
Appl. No. 18505368 | Applicant: Q (CUE) LTD. | Inventor: Aviad Maizels

Systems, methods, and computer program products are disclosed for removing noise from facial skin micromovement signals. Removing noise from facial skin micromovements includes, during a time period when an individual is involved in at least one non-speech-related physical activity, operating a light source in a manner enabling illumination of a facial skin region of the individual; receiving signals representing light reflections from the facial skin region; analyzing the received signals to identify a first reflection component indicative of prevocalization facial skin micromovements and a second reflection component associated with the at least one non-speech-related physical activity; and filtering out the second reflection component to enable interpretation of words from the first reflection component indicative of the prevocalization facial skin micromovements.
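
The filtering step lends itself to a short illustration. The sketch below is a loose stand-in for the claimed analysis, assuming (the abstract does not say this) that non-speech activity such as chewing occupies a lower frequency band than prevocalization micromovements; the sampling rate and cutoff are likewise invented for the example:

    # Hypothetical separation of the two reflection components by frequency.
    import numpy as np
    from scipy.signal import butter, filtfilt

    FS = 500  # Hz, assumed sampling rate of the reflection signal

    def keep_micromovement_component(reflection: np.ndarray,
                                     cutoff_hz: float = 20.0) -> np.ndarray:
        """Suppress the slow, non-speech-related component and keep the
        prevocalization micromovement component."""
        b, a = butter(4, cutoff_hz / (FS / 2), btype="highpass")
        return filtfilt(b, a, reflection)

    # Example: a 2 Hz chewing artifact mixed with a 40 Hz micromovement tone.
    t = np.arange(0, 2, 1 / FS)
    mixed = np.sin(2 * np.pi * 2 * t) + 0.1 * np.sin(2 * np.pi * 40 * t)
    cleaned = keep_micromovement_component(mixed)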

2. 20240127824 - Identifying silent speech using recorded speech
US 18.04.2024
Int. Class: G06F 40/263
  G            PHYSICS
  G06          COMPUTING; CALCULATING OR COUNTING
  G06F         ELECTRIC DIGITAL DATA PROCESSING
  G06F 40      Handling natural language data
  G06F 40/20   Natural language analysis
  G06F 40/263  Language identification
Appl. No. 18505353 | Applicant: Q (CUE) LTD. | Inventor: Yonatan Wexler

Systems, methods, and non-transitory computer readable media including instructions for interpreting facial skin micromovements are disclosed. Interpreting facial skin micromovements includes receiving during a first time period first signals representing prevocalization facial skin micromovements, and receiving during a second time period succeeding the first time period, second signals representing sounds. The sounds are analyzed to identify words spoken during the second time period, and the words are correlated with the prevocalization facial skin micromovements received during the first time period. The correlations are stored for future use. During a third time period, third signals representing facial skin micromovements are received in an absence of vocalization. Using the correlations, language associated with the third signals is identified and outputted.
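
A minimal sketch of this correlate-then-look-up flow, with the feature extractor and speech recognizer stubbed out: micromovement signals are reduced to plain feature vectors, the stored "correlations" to (vector, word) pairs, and the lookup to nearest neighbour, which is an assumption rather than the claimed matching method:

    import numpy as np

    class SilentSpeechCodebook:
        """Pairs prevocalization micromovement features with recognized words."""

        def __init__(self) -> None:
            self.features: list[np.ndarray] = []
            self.words: list[str] = []

        def correlate(self, micromovements: np.ndarray, spoken_word: str) -> None:
            # First and second time periods: store the pair for future use.
            self.features.append(micromovements)
            self.words.append(spoken_word)

        def identify(self, micromovements: np.ndarray) -> str:
            # Third time period (no vocalization): return the word whose
            # stored micromovement signature is closest to the new signal.
            dists = [np.linalg.norm(f - micromovements) for f in self.features]
            return self.words[int(np.argmin(dists))]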

3. 20210183392 - PHONEME-BASED NATURAL LANGUAGE PROCESSING
US 17.06.2021
Int. Class: G10L 15/26
  G           PHYSICS
  G10         MUSICAL INSTRUMENTS; ACOUSTICS
  G10L        SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 15     Speech recognition
  G10L 15/26  Speech to text systems
Appl. No. 17028361 | Applicant: LG ELECTRONICS INC. | Inventor: Kwangyong LEE

A natural language processing method and apparatus are disclosed. A natural language processing method according to an embodiment of the present disclosure includes extracting a phoneme string from a text corpus labeled with recognition information including at least one of a named entity (NE) or a speech intention, generating a phoneme-based training data set by labeling the recognition information in the extracted phoneme string, and generating an artificial neural network-based learning model (LM) using the generated training data set. The natural language processing method of the present disclosure may be associated with an artificial intelligence module, a drone (unmanned aerial vehicle, UAV), a robot, an AR (augmented reality) device, a VR (virtual reality) device, a device associated with 5G services, etc.
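
A rough sketch of the training-data step: convert each labeled utterance to a phoneme string and carry its recognition labels over to the phoneme level. The g2p_en package serves as a convenient grapheme-to-phoneme front end here (the patent does not name one), and the corpus entry is invented:

    from g2p_en import G2p

    g2p = G2p()
    labeled_corpus = [
        {"text": "turn on the living room light", "intent": "device_control"},
    ]

    # Phoneme-based training set: each entry pairs a phoneme string,
    # e.g. ['T', 'ER1', 'N', ...], with the original recognition label.
    phoneme_dataset = [
        {"phonemes": g2p(item["text"]), "intent": item["intent"]}
        for item in labeled_corpus
    ]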

4. WO/2025/141559 - DETECTING AND USING NON-TEXTUAL INFORMATION IN HUMAN SPEECH
WO 03.07.2025
Int. Class: G10L 15/18
  G           PHYSICS
  G10         MUSICAL INSTRUMENTS; ACOUSTICS
  G10L        SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 15     Speech recognition
  G10L 15/08  Speech classification or search
  G10L 15/18  using natural language modelling
Appl. No. PCT/IL2024/051195 | Applicant: YEDA RESEARCH AND DEVELOPMENT CO. LTD. | Inventor: HAREL, David

Automatic recognition of non-verbal messages in speech, and in particular the detection or analysis of prosodic, multilayered intonation units such as prosodic unit prototypes and their multi-labeled variations, may form a hierarchical classification for the analysis of non-verbal information or cues in speech. Speech captured by a microphone is fed to a weakly-supervised deep-learning acoustic model for speech recognition and transcription, which may be based on an encoder-decoder Transformer architecture such as Whisper by OpenAI. The model is trained to output the words that form the text of the captured speech, to identify Intonation Units (IUs) that include one or more words, and to associate non-verbal labels with each of the IUs. The labels may indicate a prototype, a discourse function (such as a conversation action), an emotion, an emphasis, or an attitude, as well as a genre of a part, or the whole, of the captured speech.
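
The transcription half of this pipeline can be shown against the openai-whisper API that the abstract name-checks; the segment-to-IU grouping and the label set below are hypothetical stubs for the hierarchical classifier the application actually describes:

    import whisper

    model = whisper.load_model("base")
    result = model.transcribe("conversation.wav")

    def label_intonation_unit(segment: dict) -> dict:
        # Hypothetical classifier: would predict prototype, discourse
        # function, emotion, emphasis, and attitude from the segment's
        # words and its audio span.
        return {"prototype": None, "discourse_function": None, "emotion": None}

    # Treat each Whisper segment as one intonation unit (a simplification).
    intonation_units = [
        {"text": seg["text"], "span": (seg["start"], seg["end"]),
         "labels": label_intonation_unit(seg)}
        for seg in result["segments"]
    ]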

5. 12112752 - Cohort determination in natural language processing
US 08.10.2024
Int. Class: G10L 15/22
  G           PHYSICS
  G10         MUSICAL INSTRUMENTS; ACOUSTICS
  G10L        SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 15     Speech recognition
  G10L 15/22  Procedures used during a speech recognition process, e.g. man-machine dialog
Appl. No. 17688279 | Applicant: Amazon Technologies, Inc. | Inventor: Rahul Gupta

Devices and techniques are generally described for cohort determination in natural language processing. In various examples, a first natural language input to a natural language processing system may be determined. The first natural language input may be associated with a first account identifier. A first machine learning model may determine first data representing one or more words of the first natural language input. A second machine learning model may determine second data representing one or more acoustic characteristics of the first natural language input. Third data may be determined, the third data including a predicted performance for processing the first natural language input by the natural language processing system. The third data may be determined based on the first data representation and the second data representation.
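
A structural sketch of the three-model arrangement: one model embeds the words, a second embeds acoustic characteristics, and a third predicts processing performance from both. All three models are stubs, and the feature sizes and linear scorer are assumptions:

    import numpy as np

    def lexical_features(words: str) -> np.ndarray:
        # First ML model (stub): would embed the recognized words.
        return np.zeros(64)

    def acoustic_features(audio: np.ndarray) -> np.ndarray:
        # Second ML model (stub): would embed acoustic characteristics.
        return np.zeros(64)

    def predicted_performance(lex: np.ndarray, ac: np.ndarray) -> float:
        # Third model: predicts how well the NLP system will handle the
        # input, here reduced to a fixed linear scorer over both vectors.
        combined = np.concatenate([lex, ac])
        weights = np.ones_like(combined) / combined.size
        return float(weights @ combined)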

6. WO/2019/245916 - METHOD AND SYSTEM FOR PARAMETRIC SPEECH SYNTHESIS
WO 26.12.2019
Int. Class: G10L 13/08
  G           PHYSICS
  G10         MUSICAL INSTRUMENTS; ACOUSTICS
  G10L        SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 13     Speech synthesis; Text to speech systems
  G10L 13/08  Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Appl. No. PCT/US2019/037294 | Applicant: GEORGETOWN UNIVERSITY | Inventor: GARMAN, Joe

Embodiments of the present systems and methods may provide techniques for synthesizing speech in any voice in any language in any accent. For example, in an embodiment, a text-to-speech conversion system may comprise a text converter adapted to convert input text to at least one phoneme selected from a plurality of phonemes stored in memory, a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme and an identity of a speaker and to generate acoustic features for each phoneme, and a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language.
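
The claimed three-stage pipeline (text converter, speaker-conditioned acoustic model, decoder) maps onto a skeleton like the following; every component is a stub, since the filing does not commit to particular model families:

    import numpy as np

    def text_to_phonemes(text: str) -> list[str]:
        # Stand-in for a real grapheme-to-phoneme front end.
        return text.lower().split()

    def acoustic_model(phonemes: list[str], speaker_id: str) -> np.ndarray:
        # Would look up the stored voice pattern for speaker_id and emit
        # per-phoneme acoustic features (e.g. 80-bin mel frames).
        return np.zeros((len(phonemes), 80))

    def decoder(features: np.ndarray) -> np.ndarray:
        # Would vocode acoustic features into a waveform.
        return np.zeros(features.shape[0] * 256)

    waveform = decoder(acoustic_model(text_to_phonemes("hello world"), "speaker_7"))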

7. 20250117469 - USING FACIAL SKIN MICROMOVEMENTS TO PAIR WITH A COMPUTING DEVICE
US 10.04.2025
Int. Class: G06F 21/32
  G           PHYSICS
  G06         COMPUTING; CALCULATING OR COUNTING
  G06F        ELECTRIC DIGITAL DATA PROCESSING
  G06F 21     Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
  G06F 21/30  Authentication, i.e. establishing the identity or authorisation of security principals
  G06F 21/31  User authentication
  G06F 21/32  using biometric data, e.g. fingerprints, iris scans or voiceprints
Appl. No. 18982905 | Applicant: Q (CUE) LTD. | Inventor: Aviad MAIZELS

Systems, methods, and non-transitory computer-readable media including instructions for detecting and utilizing facial skin micromovements are disclosed. In some non-limiting embodiments, the detection of the facial skin micromovements occurs using a speech detection system that may include a wearable housing, a light source (either a coherent light source or a non-coherent light source), a light detector, and at least one processor. One or more processors may be configured to analyze light reflections received from a facial region to determine the facial skin micromovements, and extract meaning from the determined facial skin micromovements. Examples of meaning that may be extracted from the determined facial skin micromovements may include words spoken by the individual (either silently spoken or vocally spoken), an identification of the individual, an emotional state of the individual, a heart rate of the individual, a respiration rate of the individual, or any other biometric, emotion, or speech-related indicator.
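
The pairing use reduces to a biometric match. A minimal sketch, assuming a fixed-length micromovement signature and a cosine-similarity threshold (neither is specified in the filing):

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def authorize_pairing(live_signature: np.ndarray,
                          enrolled_template: np.ndarray,
                          threshold: float = 0.9) -> bool:
        # Pair the wearable with the device only if the live micromovement
        # signature is close enough to the template stored at enrollment.
        return cosine(live_signature, enrolled_template) >= threshold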

8. 20240127817 - Earbud with facial micromovement detection capabilities
US 18.04.2024
Int. Class: G06F 21/32
  G           PHYSICS
  G06         COMPUTING; CALCULATING OR COUNTING
  G06F        ELECTRIC DIGITAL DATA PROCESSING
  G06F 21     Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
  G06F 21/30  Authentication, i.e. establishing the identity or authorisation of security principals
  G06F 21/31  User authentication
  G06F 21/32  using biometric data, e.g. fingerprints, iris scans or voiceprints
Appl. No. 18512925 | Applicant: Q (CUE) LTD. | Inventor: Yonatan Wexler

A multifunctional earpiece comprising an ear-mountable housing; a speaker integrated with the ear-mountable housing for presenting sound; a light source integrated with the ear-mountable housing for projecting light toward the skin of the wearer's face; and a light detector integrated with the ear-mountable housing and configured to receive reflections from the skin corresponding to facial skin micromovements indicative of prevocalized words of the wearer. The multifunctional earpiece is configured to simultaneously present the sound through the speaker, project the light toward the skin, and detect the received reflections indicative of the prevocalized words.

9. 20230178066 - Method and apparatus for synthesizing multi-speaker speech using artificial neural network
US 08.06.2023
Int. Class: G10L 13/047
  G            PHYSICS
  G10          MUSICAL INSTRUMENTS; ACOUSTICS
  G10L         SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 13      Speech synthesis; Text to speech systems
  G10L 13/02   Methods for producing synthetic speech; Speech synthesisers
  G10L 13/04   Details of speech synthesis systems, e.g. synthesiser structure or memory management
  G10L 13/047  Architecture of speech synthesisers
Appl. No. 17596037 | Applicant: IUCF-HYU (Industry-University Cooperation Foundation Hanyang University) | Inventor: Joon Hyuk Chang

According to an aspect, a method for synthesizing multi-speaker speech using an artificial neural network comprises: generating and storing a speech learning model for a plurality of users by training a synthetic artificial neural network of a speech synthesis model on speech data of the plurality of users; generating speaker vectors, using a speaker recognition model, for a new user who has not yet been learned and for the plurality of users who have already been learned; determining, according to preset criteria, the speaker vector among those of the already-learned users that is most similar to the speaker vector of the new user; and generating and learning a speaker embedding for the new user by training the synthetic artificial neural network of the speech synthesis model, using the speaker embedding of the user with the determined speaker vector as an initial value and based on speech data of the new user.
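
The warm start at the core of this abstract fits in a few lines: pick the enrolled speaker whose speaker vector is most similar to the new user's, and copy that speaker's embedding as the starting point for fine-tuning. Cosine similarity stands in for the "preset criteria", which the abstract leaves open:

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def init_new_speaker_embedding(new_vec: np.ndarray,
                                   known_vecs: dict[str, np.ndarray],
                                   embeddings: dict[str, np.ndarray]) -> np.ndarray:
        # Most similar already-learned speaker by speaker-vector similarity.
        nearest = max(known_vecs, key=lambda s: cosine(known_vecs[s], new_vec))
        # Their embedding becomes the initial value for the new user's
        # embedding, which is then refined on the new user's speech data.
        return embeddings[nearest].copy()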

10. 20150127349 - Method and system for cross-lingual voice conversion
US 07.05.2015
Int. Class: G10L 15/00
  G        PHYSICS
  G10      MUSICAL INSTRUMENTS; ACOUSTICS
  G10L     SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
  G10L 15  Speech recognition
Appl. No. 14069492 | Applicant: Google Inc. | Inventor: Ioannis Agiomyrgiannakis

A method and system are disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to an HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis.
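
The state-matching procedure can be sketched compactly if each HMM state is reduced to a Gaussian mean and the speaker-compensating transform to a single matrix (both are simplifications; the patent works with full HMM state distributions):

    import numpy as np

    def match_states(output_means: np.ndarray,      # (n_out, d) output-HMM state means
                     aux_means: np.ndarray,         # (n_aux, d) auxiliary-HMM state means
                     speaker_transform: np.ndarray  # (d, d) compensates speaker differences
                     ) -> np.ndarray:
        # Project auxiliary states into the output speaker's space.
        projected = aux_means @ speaker_transform.T
        # For each output state, the index of the nearest auxiliary state.
        dists = np.linalg.norm(output_means[:, None, :] - projected[None, :, :], axis=-1)
        matched = dists.argmin(axis=1)
        # The cross-lingual HMM keeps the output topology but substitutes
        # the matched auxiliary states.
        return aux_means[matched]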