
WO2021038364 - SOFT-FORGETTING FOR CONNECTIONIST TEMPORAL CLASSIFICATION BASED AUTOMATIC SPEECH RECOGNITION


CLAIMS

1. A computer-implemented method comprising:

training, by one or more computer processors, a first model utilizing one or more training batches wherein each training batch of the one or more training batches comprises one or more blocks of information;

responsive to a completion of the training of the first model, initiating, by one or more computer processors, a training of a second model utilizing the one or more training batches;

jittering, by one or more computer processors, a random block size for each block of information for each of the one or more training batches for the second model;

unrolling, by one or more computer processors, the second model over one or more non-overlapping contiguous jittered blocks of information; and

responsive to the unrolling of the second model, reducing, by one or more computer processors, overfitting for the second model by applying twin regularization.
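Read procedurally, claim 1 recites a two-phase scheme: a first model is trained on whole batches, then a second model is trained on the same batches while being unrolled over randomly jittered, non-overlapping blocks. The following is a minimal sketch of the jittering and unrolling steps, assuming PyTorch; the model sizes, chunk parameters, and helper names are illustrative assumptions, not values recited in the claims.

```python
import random
import torch
import torch.nn as nn

# First model (claim 2): whole-utterance bidirectional LSTM; second model
# (claim 3): same architecture, but unrolled chunk by chunk. Feature size,
# hidden size, and chunk parameters below are assumptions.
N_MELS, HIDDEN = 80, 320

first_model = nn.LSTM(N_MELS, HIDDEN, num_layers=4, bidirectional=True)
second_model = nn.LSTM(N_MELS, HIDDEN, num_layers=4, bidirectional=True)

def jittered_blocks(num_frames, base=30, jitter=10):
    """Split an utterance of num_frames frames into non-overlapping,
    contiguous blocks whose sizes are randomly jittered around a base
    size -- the 'jittering' step of claim 1."""
    bounds, start = [], 0
    while start < num_frames:
        size = max(1, base + random.randint(-jitter, jitter))
        bounds.append((start, min(start + size, num_frames)))
        start += size
    return bounds

def unroll_chunked(model, feats):
    """Unroll the second model over the jittered blocks. The recurrent
    state is re-initialized (to zeros) at every block boundary, so
    context outside a block is forgotten -- the 'unrolling' step."""
    outs = [model(feats[s:e])[0] for s, e in jittered_blocks(feats.size(0))]
    return torch.cat(outs, dim=0)  # (num_frames, batch, 2 * HIDDEN)
```

In phase one, first_model would be trained on whole utterances; in phase two, second_model is trained on the same batches via unroll_chunked, with the twin-regularized loss written out after claim 8 below.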

2. The method of claim 1, wherein the first model is a whole-utterance bidirectional long short-term memory network.

3. The method of claim 1, wherein the second model is a chunk-based bidirectional long short-term memory network.

4. The method of claim 1, wherein the one or more non-overlapping contiguous jittered blocks of information are calculated from a connectionist temporal classification loss.

5. The method of claim 1, wherein the blocks of information are an acoustic sequence.

6. The method of claim 1, wherein the training batches comprise one or more acoustic sequences, wherein each acoustic sequence has an associated textual label.
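Claims 5 and 6 fix the data layout: each batch pairs acoustic sequences with textual labels. A minimal sketch of such a batch follows; the field names and feature dimensions are assumptions for illustration only.

```python
from dataclasses import dataclass
import torch

@dataclass
class TrainingBatch:
    # Acoustic sequences (claim 5), e.g. log-mel frames; 80 bins assumed.
    feats: torch.Tensor       # shape (T, B, 80)
    feat_lens: torch.Tensor   # shape (B,) valid frame count per utterance
    # Associated textual labels (claim 6), as token ids suitable for CTC.
    labels: torch.Tensor      # shape (B, L_max)
    label_lens: torch.Tensor  # shape (B,) label length per utterance
```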

7. The method of claim 1, further comprising:

responsive to a completion of the training of the second model, disposing of, by one or more computer processors, the first model; and

responsive to the completion of the training of the second model, deploying, by one or more computer processors, the second model to one or more production environments.

8. The method of claim 1, wherein twin regularization comprises a loss value and a connectionist temporal classification loss of the second model, wherein the loss value is a mean-squared error between the hidden states of the first model and the second model.
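Written out as a formula (with a weighting factor lambda that the claim does not recite, added here as an assumption), the twin-regularized objective of claim 8 reads:

```latex
\mathcal{L}_{\text{second}}
  = \mathcal{L}_{\text{CTC}}
  + \lambda \,\frac{1}{T}\sum_{t=1}^{T}
      \bigl\lVert h_t^{(2)} - h_t^{(1)} \bigr\rVert_2^2
```

where \(\mathcal{L}_{\text{CTC}}\) is the connectionist temporal classification loss of the second model, and the second term is the mean-squared error between the hidden states \(h_t^{(1)}\) of the first model and \(h_t^{(2)}\) of the second model at each frame \(t\).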

9. A computer program product comprising:

one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the stored program instructions comprising:

program instructions to train a first model utilizing one or more training batches wherein each training batch of the one or more training batches comprises one or more blocks of information;

program instructions to, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches;

program instructions to jitter a random block size for each block of information for each of the one or more training batches for the second model;

program instructions to unroll the second model over one or more non-overlapping contiguous jittered blocks of information; and

program instructions to, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.

10. The computer program product of claim 9, wherein the first model is a whole-utterance bidirectional long short-term memory network.

11. The computer program product of claim 9, wherein the second model is a chunk-based bidirectional long short-term memory network.

12. The computer program product of claim 9, wherein the one or more non-overlapping contiguous jittered blocks of information are calculated from a connectionist temporal classification loss.

13. The computer program product of claim 9, wherein the program instructions stored on the one or more computer readable storage media comprise:

program instructions to, responsive to a completion of the training of the second model, dispose of, by one or more computer processors, the first model; and

program instructions to, responsive to the completion of the training of the second model, deploy, by one or more computer processors, the second model to one or more production environments.

14. The computer program product of claim 9, wherein twin regularization comprises a loss value and a connectionist temporal classification loss of the second model, wherein the loss value is a mean-squared error between the hidden states of the first model and the second model.

15. A computer system comprising:

one or more computer processors;

one or more computer readable storage media; and

program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the stored program instructions comprising:

program instructions to train a first model utilizing one or more training batches wherein each training batch of the one or more training batches comprises one or more blocks of information;

program instructions to, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches;

program instructions to jitter a random block size for each block of information for each of the one or more training batches for the second model;

program instructions to unroll the second model over one or more non-overlapping contiguous jittered blocks of information; and

program instructions to, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.

16. The computer system of claim 15, wherein the first model is a whole-utterance bidirectional long short-term memory network.

17. The computer system of claim 15, wherein the second model is a chunk-based bidirectional long short-term memory network.

18. The computer system of claim 15, wherein the one or more non-overlapping contiguous jittered blocks of information are calculated from a connectionist temporal classification loss.

19. The computer system of claim 15, wherein the blocks of information are an acoustic sequence.

20. The computer system of claim 15, wherein the training batches comprise one or more acoustic sequences, wherein each acoustic sequence has an associated textual label.

21. The computer system of claim 15, wherein the program instructions stored on the one or more computer readable storage media comprise:

program instructions to, responsive to a completion of the training of the second model, dispose of, by one or more computer processors, the first model; and

program instructions to, responsive to the completion of the training of the second model, deploy, by one or more computer processors, the second model to one or more production environments.

22. The computer system of claim 15, wherein twin regularization comprises a loss value and a connectionist temporal classification loss of the second model, wherein the loss value is a mean-squared error between the hidden states of the first model and the second model.

23. A computer program comprising program code means adapted to perform the method of any of claims 1 to 14 when said program is run on a computer.