1. WO2020117586 - SPEECH INPUT PROCESSING

Note: Text based on automatic optical character recognition processes. Only the PDF version has legal value.

[ EN ]

CLAIMS

1. A computer-implemented method comprising:

receiving, by a computing device, audio data of an utterance;

generating, by the computing device using (i) a neural network or (ii) an acoustic model and a language model, a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores that each reflect a likelihood that a respective candidate transcription is a match for the utterance;

determining, by the computing device, a context of the computing device;

based on the context of the computing device, identifying, by the computing device, grammars that correspond to the multiple candidate transcriptions;

based on the current context, determining, by the computing device and for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription;

based on the transcription confidence scores and the grammar confidence scores, selecting, by the computing device and from among the candidate transcriptions, a candidate transcription; and

providing, for output by the computing device, the selected candidate transcription as a transcription of the utterance.
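The selection step of claim 1 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the word lattice is flattened to a list of candidates, the grammars are simple prefix predicates, the context is a string label, the per-context weights are invented, and the two scores are combined with a plain average (claim 1 does not fix a combining rule).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    text: str
    transcription_confidence: float  # likelihood the text matches the utterance


# Hypothetical grammars: a name mapped to a predicate over the candidate text.
GRAMMARS: Dict[str, Callable[[str], bool]] = {
    "call_contact": lambda t: t.startswith("call "),
    "play_media": lambda t: t.startswith("play "),
}

# Hypothetical context-dependent grammar confidence scores.
CONTEXT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "phone_app": {"call_contact": 0.9, "play_media": 0.2},
    "music_app": {"call_contact": 0.2, "play_media": 0.9},
}


def grammar_confidence(text: str, context: str) -> float:
    """Best score among grammars matching the text in this context, else 0.0."""
    weights = CONTEXT_WEIGHTS[context]
    scores = [weights[name] for name, matches in GRAMMARS.items() if matches(text)]
    return max(scores, default=0.0)


def select_transcription(candidates: List[Candidate], context: str) -> str:
    """Pick the candidate with the best combined score (average is an assumption)."""
    def combined(c: Candidate) -> float:
        return (c.transcription_confidence + grammar_confidence(c.text, context)) / 2
    return max(candidates, key=combined).text


candidates = [Candidate("call tom", 0.48), Candidate("play tom", 0.52)]
print(select_transcription(candidates, "phone_app"))  # call tom
print(select_transcription(candidates, "music_app"))  # play tom
```

The same pair of candidates yields a different output in each context, which is the effect the context-dependent grammar confidence scores are meant to produce.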

2. The method of claim 1, comprising:

determining that two or more of the grammars correspond to one of the candidate transcriptions; and

based on determining that two or more of the grammars correspond to one of the candidate transcriptions, adjusting the grammar confidence scores for the two or more grammars,

wherein the computing device selects, from among the candidate transcriptions, the candidate transcription based on the transcription confidence scores and the adjusted grammar confidence scores.

3. The method of claim 2, wherein adjusting the grammar confidence scores for the two or more grammars comprises:

increasing each of the grammar confidence scores for each of the two or more grammars by a factor.
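A minimal sketch of the adjustment in claims 2 and 3, assuming that "increasing by a factor" means multiplying by a value greater than 1. The function name, the default factor of 1.2, and the dictionary layout are hypothetical; the claims do not say whether the adjusted scores are capped, so this sketch leaves them uncapped.

```python
from typing import Dict


def adjust_grammar_scores(grammar_scores: Dict[str, float],
                          factor: float = 1.2) -> Dict[str, float]:
    """Boost grammar confidence scores for ONE candidate transcription.

    grammar_scores maps each grammar that corresponds to the candidate to its
    confidence score. The boost applies only when two or more grammars match.
    """
    if len(grammar_scores) >= 2:
        return {name: score * factor for name, score in grammar_scores.items()}
    return dict(grammar_scores)  # a single matching grammar is left unchanged


print(adjust_grammar_scores({"call_contact": 0.5, "open_app": 0.25}, factor=2.0))
print(adjust_grammar_scores({"call_contact": 0.5}, factor=2.0))
```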

4. The method of claim 2, comprising:

determining, for each of the candidate transcriptions, a product of the respective transcription confidence score and the respective grammar confidence score,

wherein the computing device selects, from among the candidate transcriptions, the candidate transcription based on the products of the transcription confidence scores and the respective grammar confidence scores.
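The product-based selection of claim 4 can be sketched as follows. The tuple layout and the example scores are assumptions; picking the largest product is one plausible reading of selecting "based on the products".

```python
from typing import List, Tuple

# (text, transcription confidence, grammar confidence); hypothetical layout.
ScoredCandidate = Tuple[str, float, float]


def select_by_product(candidates: List[ScoredCandidate]) -> str:
    """Return the text whose transcription and grammar scores multiply highest."""
    return max(candidates, key=lambda c: c[1] * c[2])[0]


scored = [
    ("lights out", 0.6, 0.3),  # product about 0.18
    ("lights on", 0.5, 0.8),   # product about 0.40
]
print(select_by_product(scored))  # lights on
```

Here the candidate with the lower transcription confidence still wins because its grammar confidence is much higher.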

5. The method of claim 1, wherein determining, by the computing device, the context of the computing device is based on a location of the computing device, an application running in a foreground of the computing device, and a time of day.
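One way to picture the context of claim 5 is as a small record built from the three signals. This is only a sketch: the `DeviceContext` type, the field names, and the time-of-day buckets are hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class DeviceContext:
    location: str        # e.g. a coarse place label
    foreground_app: str  # application currently in the foreground
    time_of_day: str     # coarse bucket derived from the clock


def time_bucket(t: time) -> str:
    """Map a clock time to a coarse bucket (the boundaries are assumptions)."""
    if 5 <= t.hour < 12:
        return "morning"
    if 12 <= t.hour < 18:
        return "afternoon"
    if 18 <= t.hour < 22:
        return "evening"
    return "night"


def determine_context(location: str, foreground_app: str, now: time) -> DeviceContext:
    """Combine the three signals into one hashable context value."""
    return DeviceContext(location, foreground_app, time_bucket(now))


ctx = determine_context("home", "music_player", time(22, 30))
print(ctx)
```

Because the record is frozen and hashable, it could serve as a lookup key for per-context grammar confidence scores.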

6. The method of claim 1, wherein:

the language model is configured to identify probabilities for sequences of terms included in the word lattice, and

the acoustic model is configured to identify a phoneme that matches a portion of the audio data.
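To illustrate what "identify probabilities for sequences of terms" can mean for the language model of claim 6, here is a minimal sketch that uses an unsmoothed bigram model. The bigram approach, the tiny training corpus, and the function names are assumptions; the claims do not specify the model type. The acoustic model is not sketched.

```python
from collections import Counter
from typing import List, Tuple


def train_bigram(sentences: List[str]) -> Tuple[Counter, Counter]:
    """Count bigrams and how often each term appears as a bigram's first term."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts


def sequence_probability(sequence: str, context_counts: Counter,
                         bigram_counts: Counter) -> float:
    """Chain-rule probability of a term sequence; 0.0 for unseen transitions."""
    tokens = ["<s>"] + sequence.split()
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0
        probability *= bigram_counts[(prev, word)] / context_counts[prev]
    return probability


contexts, bigrams = train_bigram(["call mom", "call bob", "play jazz", "call mom"])
print(sequence_probability("call mom", contexts, bigrams))
print(sequence_probability("cole mom", contexts, bigrams))
```

With this corpus, "call mom" scores 3/4 times 2/3, while the unseen "cole mom" scores zero, so the model prefers the sequence it has evidence for.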

7. The method of claim 1, comprising:

performing, by the computing device, an action that is based on the selected candidate transcription and a grammar that matches the selected candidate transcription.
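A minimal sketch of the step in claim 7, assuming each grammar is a regular expression with named slots and each grammar name maps to a handler. The grammar names, patterns, and handlers are hypothetical, and the handlers only return a description of the action rather than performing it.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical grammars expressed as regular expressions with named slots.
GRAMMAR_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "set_alarm": re.compile(r"set (?:an )?alarm for (?P<time>.+)"),
    "call_contact": re.compile(r"call (?P<contact>.+)"),
}

# Hypothetical handlers keyed by grammar name.
ACTIONS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "set_alarm": lambda slots: f"alarm set for {slots['time']}",
    "call_contact": lambda slots: f"calling {slots['contact']}",
}


def perform_action(transcription: str) -> Optional[str]:
    """Run the handler of the first grammar that fully matches the transcription."""
    for name, pattern in GRAMMAR_PATTERNS.items():
        match = pattern.fullmatch(transcription)
        if match:
            return ACTIONS[name](match.groupdict())
    return None  # no grammar matched, so no action is taken


print(perform_action("set an alarm for 6 am"))  # alarm set for 6 am
print(perform_action("call mom"))               # calling mom
```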

8. A system comprising:

one or more computers; and

one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:

receiving, by a computing device, audio data of an utterance;

generating, by the computing device using (i) a neural network or (ii) an acoustic model and a language model, a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores that each reflect a likelihood that a respective candidate transcription is a match for the utterance;

determining, by the computing device, a context of the computing device;

based on the context of the computing device, identifying, by the computing device, grammars that correspond to the multiple candidate transcriptions;

based on the current context, determining, by the computing device and for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription;

based on the transcription confidence scores and the grammar confidence scores, selecting, by the computing device and from among the candidate transcriptions, a candidate transcription; and

providing, for output by the computing device, the selected candidate transcription as a transcription of the utterance.

9. The system of claim 8, wherein the operations comprise:

determining that two or more of the grammars correspond to one of the candidate transcriptions; and

based on determining that two or more of the grammars correspond to one of the candidate transcriptions, adjusting the grammar confidence scores for the two or more grammars,

wherein the computing device selects, from among the candidate transcriptions, the candidate transcription based on the transcription confidence scores and the adjusted grammar confidence scores.

10. The system of claim 9, wherein adjusting the grammar confidence scores for the two or more grammars comprises:

increasing each of the grammar confidence scores for each of the two or more grammars by a factor.

11. The system of claim 9, wherein the operations comprise:

determining, for each of the candidate transcriptions, a product of the respective transcription confidence score and the respective grammar confidence score,

wherein the computing device selects, from among the candidate transcriptions, the candidate transcription based on the products of the transcription confidence scores and the respective grammar confidence scores.

12. The system of claim 8, wherein determining, by the computing device, the context of the computing device is based on a location of the computing device, an application running in a foreground of the computing device, and a time of day.

13. The system of claim 8, wherein the operations comprise:

the language model is configured to identify probabilities for sequences of terms included in the word lattice, and

the acoustic model is configured to identify a phoneme that matches a portion of the audio data.

14. The system of claim 8, wherein the operations comprise:

performing, by the computing device, an action that is based on the selected candidate transcription and a grammar that matches the selected candidate transcription.

15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:

receiving, by a computing device, audio data of an utterance;

generating, by the computing device using (i) a neural network or (ii) an acoustic model and a language model, a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores that each reflect a likelihood that a respective candidate transcription is a match for the utterance;

determining, by the computing device, a context of the computing device;

based on the context of the computing device, identifying, by the computing device, grammars that correspond to the multiple candidate transcriptions;

based on the current context, determining, by the computing device and for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription;

based on the transcription confidence scores and the grammar confidence scores, selecting, by the computing device and from among the candidate transcriptions, a candidate transcription; and

providing, for output by the computing device, the selected candidate transcription as a transcription of the utterance.

16. The medium of claim 15, wherein the operations comprise:

determining that two or more of the grammars correspond to one of the candidate transcriptions; and

based on determining that two or more of the grammars correspond to one of the candidate transcriptions, adjusting the grammar confidence scores for the two or more grammars,

wherein the computing device selects, from among the candidate transcriptions, the candidate transcription based on the transcription confidence scores and the adjusted grammar confidence scores.

17. The medium of claim 15, wherein the operations comprise:

determining, for each of the candidate transcriptions, a product of the respective transcription confidence score and the respective grammar confidence score,

wherein the computing device selects, from among the candidate transcriptions, the candidate transcription based on the products of the transcription confidence scores and the respective grammar confidence scores.

18. The medium of claim 15, wherein determining, by the computing device, the context of the computing device is based on a location of the computing device, an application running in a foreground of the computing device, and a time of day.

19. The medium of claim 15, wherein the operations comprise:

the language model is configured to identify probabilities for sequences of terms included in the word lattice, and

the acoustic model is configured to identify a phoneme that matches a portion of the audio data.

20. The medium of claim 15, wherein the operations comprise:

performing, by the computing device, an action that is based on the selected candidate transcription and a grammar that matches the selected candidate transcription.