WO2020223495 - SIGNAL COMPONENT ESTIMATION USING COHERENCE

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

CLAIMS

What is claimed is:

1. A method for estimating a power spectral density of a signal component, the method comprising:

receiving, at one or more processing devices, an input signal representing audio captured using a microphone, the input signal comprising at least a first portion that represents acoustic output from a first audio source in an environment, and a second portion that represents other acoustic energy in the environment;

iteratively modifying, by the one or more processing devices, a frequency domain representation of the input signal, such that the modified frequency domain representation represents a portion of the input signal in which effects due to all but a selected one of the first and second portions are substantially reduced;

determining, from the modified frequency domain representation, an estimate of a power spectral density of the selected portion; and

at least one of reducing noise or echo in a microphone signal based upon the estimated power spectral density or inserting noise in a far-end system based upon the estimated power spectral density.
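The claims do not fix an implementation. As a minimal sketch, assuming a single loudspeaker (far-end) reference signal is available, Welch-style segment averaging, and a flat, delay-free echo path, the Python/NumPy code below estimates the PSD of the echo and of the remaining acoustic energy, then derives a per-bin echo suppression gain. With one reference a single conditioning step suffices; more sources would need repeated steps. The names `csd_matrix`, `estimate_psds`, and `echo_suppression_gain`, the segment length, and the Wiener-style gain rule are all hypothetical choices, not taken from the claims.

```python
import numpy as np


def csd_matrix(signals, nperseg=256):
    """Welch-style (unnormalized) cross-spectral density matrix per frequency bin.

    Returns S with shape (bins, channels, channels), where S[f, i, j] is the
    segment average of X_i(f) * conj(X_j(f)) over 50%-overlapping Hann segments.
    """
    signals = np.atleast_2d(np.asarray(signals, dtype=float))
    hop = nperseg // 2
    window = np.hanning(nperseg)
    starts = range(0, signals.shape[1] - nperseg + 1, hop)
    # spec has shape (segments, channels, bins)
    spec = np.stack([np.fft.rfft(signals[:, s:s + nperseg] * window, axis=-1)
                     for s in starts])
    return np.einsum("kif,kjf->fij", spec, spec.conj()) / len(spec)


def estimate_psds(reference, mic, nperseg=256):
    """Split the mic PSD into the part coherent with the reference (echo)
    and the remaining 'other acoustic energy'."""
    S = csd_matrix(np.vstack([reference, mic]), nperseg)
    s_xx, s_yy = S[:, 0, 0].real, S[:, 1, 1].real
    # |S_xy|^2 / S_xx is the mic power explained by the reference
    echo = np.abs(S[:, 0, 1]) ** 2 / np.maximum(s_xx, 1e-12)
    other = np.maximum(s_yy - echo, 0.0)
    return echo, other, s_yy


def echo_suppression_gain(echo_psd, mic_psd, floor=0.1):
    """Hypothetical Wiener-style per-bin gain that attenuates echo-dominated bins."""
    return np.clip(1.0 - echo_psd / np.maximum(mic_psd, 1e-12), floor, 1.0)


rng = np.random.default_rng(0)
far_end = rng.standard_normal(64000)      # first audio source (loudspeaker) signal
room_noise = rng.standard_normal(64000)   # "other acoustic energy"
mic = 0.8 * far_end + 0.3 * room_noise    # assumed: flat, delay-free echo path
echo, other, mic_psd = estimate_psds(far_end, mic)
gain = echo_suppression_gain(echo, mic_psd)
```

Here `other` approximates the PSD of the room noise alone; it could equally drive a noise reducer or shape comfort noise inserted on the far-end side.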

2. The method of claim 1 wherein the frequency domain representation includes, for each of a number of frequency bins:

(i) values that each represent the power of an acoustic output of the first audio source for the particular frequency bin, and

(ii) values that each represent a level of coherence between the acoustic output of the first audio source and the input signal.
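Claim 2 names the per-bin values but not how they are combined. One standard identity, assumed here rather than stated in the claims, is that the magnitude-squared coherence gamma^2 = |S_xy|^2 / (S_xx * S_yy) splits the microphone PSD S_yy into a part coherent with the source output, gamma^2 * S_yy (which equals |S_xy|^2 / S_xx), and a remainder, (1 - gamma^2) * S_yy. The helper `split_by_coherence` is hypothetical.

```python
import numpy as np


def split_by_coherence(s_xx, s_yy, s_xy):
    """Per-bin split of the mic PSD s_yy into the part coherent with the
    source output (power s_xx) and the remainder."""
    s_xx = np.asarray(s_xx, dtype=float)
    s_yy = np.asarray(s_yy, dtype=float)
    coh = np.abs(np.asarray(s_xy)) ** 2 / np.maximum(s_xx * s_yy, 1e-12)
    coh = np.clip(coh, 0.0, 1.0)       # sample estimates can slightly exceed 1
    coherent = coh * s_yy              # equals |s_xy|^2 / s_xx
    remainder = (1.0 - coh) * s_yy     # PSD of everything else at the mic
    return coh, coherent, remainder


# Two example bins
coh, coherent, remainder = split_by_coherence([2.0, 1.0], [1.0, 4.0], [1.0, 1j])
# coherent -> [0.5, 1.0], remainder -> [0.5, 3.0]
```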

3. The method of claim 1 wherein the frequency domain representation comprises a cross-spectral density matrix computed based on an output of the first audio source.

4. The method of claim 3 wherein iteratively modifying the frequency domain representation comprises executing a matrix diagonalization process on the cross-spectral density matrix.
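The claims do not say which diagonalization is meant. A minimal sketch, assuming Gaussian elimination (a congruence transform L^-1 S L^-H) on the per-bin 2x2 CSD matrix [[S_xx, S_xy], [S_yx, S_yy]]: the transformed matrix is diagonal, and its second diagonal entry, S_yy - |S_xy|^2 / S_xx, is the mic PSD with the component coherent with the source output removed. `diagonalize_2x2` is a hypothetical name.

```python
import numpy as np


def diagonalize_2x2(S):
    """One elimination step on a Hermitian per-bin CSD matrix [[S_xx, S_xy], [S_yx, S_yy]].

    Returns unit lower-triangular L and D = L^-1 S L^-H, which is diagonal.
    """
    S = np.asarray(S, dtype=complex)
    L = np.array([[1.0, 0.0], [S[1, 0] / S[0, 0], 1.0]], dtype=complex)
    Linv = np.linalg.inv(L)
    D = Linv @ S @ Linv.conj().T
    return L, D


S = np.array([[2, 1 + 1j], [1 - 1j, 3]])
L, D = diagonalize_2x2(S)
# D[1, 1] = 3 - |1 + 1j|^2 / 2 = 2, i.e. S_yy * (1 - gamma^2) with gamma^2 = 1/3
```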

5. The method of claim 1 wherein the input signal includes a third portion that represents acoustic output from a second audio source in the environment, and wherein the selected portion is one of the first, second, or third portions.

6. The method of claim 5 wherein the frequency domain representation includes, for each of a number of frequency bins:

(i) values that each represent a level of coherence between acoustic outputs from the first and second audio sources,

(ii) values that each represent a level of coherence between an acoustic output of a particular one of the first and second audio sources and the input signal, and

(iii) values that each represent the power, for the particular frequency bin, of the acoustic output of one of the first and second audio sources.
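For illustration, assuming the channel order [first source output, second source output, microphone] and a per-bin CSD matrix that has already been estimated, the three kinds of values listed in claim 6 can be read off as below. The helper `coherence_values` and the example mixing (x1 = s1, x2 = 0.5*s1 + s2, y = s1 + s2 + n with independent unit-power s1, s2, n) are hypothetical.

```python
import numpy as np


def coherence_values(S):
    """Per-bin quantities for channels ordered [source 1, source 2, mic].

    Returns the magnitude-squared coherence matrix
    gamma2[i, j] = |S_ij|^2 / (S_ii * S_jj) and the per-channel powers S_ii.
    """
    S = np.asarray(S, dtype=complex)
    power = S.diagonal().real
    gamma2 = np.abs(S) ** 2 / np.maximum(np.outer(power, power), 1e-12)
    return gamma2, power


# Example bin built from an assumed mixing matrix (columns: s1, s2, n)
M = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [1.0, 1.0, 1.0]])
gamma2, power = coherence_values(M @ M.T)
between_sources = gamma2[0, 1]   # item (i): 0.25 / (1 * 1.25) = 0.2
source_to_mic = gamma2[:2, 2]    # item (ii): [1/3, 0.6]
source_power = power[:2]         # item (iii): [1.0, 1.25]
```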

7. The method of claim 5 wherein the frequency domain representation comprises a cross-spectral density matrix computed based on outputs of the first and second audio sources.

8. The method of claim 7 wherein iteratively modifying the frequency domain representation comprises executing a matrix diagonalization process on the cross-spectral density matrix.
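One way to read "iteratively modifying" with a multi-source CSD matrix, assumed here and not confirmed by the claims, is an LDL^H (Gaussian elimination) sweep: iteration k removes from every later channel the part coherent with channel k, already conditioned on the earlier channels. With the microphone placed last, the mic PSD splits into one term per portion, |L[-1, k]|^2 * D[k]. When the sources are correlated, the shared part is credited to whichever source comes first in the ordering. `diagonalize_csd` and `mic_psd_components` are hypothetical names; the example matrix uses the same assumed mixing as a unit-power, independent s1, s2, n model.

```python
import numpy as np


def diagonalize_csd(S):
    """LDL^H-style diagonalization of one bin's Hermitian CSD matrix.

    Returns unit lower-triangular L and real diagonal D with S = L @ diag(D) @ L^H.
    D[k] is the PSD of channel k with channels 0..k-1 removed.
    """
    R = np.array(S, dtype=complex)  # working copy, updated in place
    n = R.shape[0]
    L = np.eye(n, dtype=complex)
    D = np.zeros(n)
    for k in range(n):
        D[k] = R[k, k].real
        if D[k] <= 1e-12:  # no remaining power in channel k: nothing to remove
            continue
        L[k + 1:, k] = R[k + 1:, k] / R[k, k]
        # Schur-complement update: condition the remaining channels on channel k
        R[k + 1:, k + 1:] -= np.outer(R[k + 1:, k], R[k, k + 1:]) / R[k, k]
    return L, D


def mic_psd_components(S):
    """Split the last channel's (mic) PSD into one term per conditioned channel."""
    L, D = diagonalize_csd(S)
    return np.abs(L[-1, :]) ** 2 * D


# Channels [x1, x2, mic] with x1 = s1, x2 = 0.5*s1 + s2, mic = s1 + s2 + n
M = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [1.0, 1.0, 1.0]])
S = M @ M.T
L, D = diagonalize_csd(S)
parts = mic_psd_components(S)  # [source 1, source 2 given source 1, residual]
```

Because the assumed mixing matrix is already unit lower-triangular, the sweep recovers it exactly as L, and each of the three portions contributes unit power at the microphone.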

9. A system comprising:

a signal analysis engine comprising one or more processing devices, the signal analysis engine configured to:

receive an input signal representing audio captured using a microphone, the input signal comprising at least a first portion that represents acoustic output from a first audio source in an environment and a second portion that represents other acoustic energy in the environment;

iteratively modify a frequency domain representation of the input signal, such that the modified frequency domain representation represents a portion of the input signal in which effects due to all but a selected one of the first and second portions are substantially reduced;

determine, from the modified frequency domain representation, an estimate of a power spectral density of the selected portion; and

at least one of reduce noise or echo in a microphone signal based upon the estimated power spectral density or insert noise in a far-end system based upon the estimated power spectral density.

10. The system of claim 9 wherein the frequency domain representation includes, for each of a number of frequency bins:

(i) values that each represent the power of an acoustic output of the first audio source for the particular frequency bin, and

(ii) values that each represent a level of coherence between the acoustic output of the first audio source and the input signal.

11. The system of claim 9 wherein the frequency domain representation comprises a cross-spectral density matrix computed based on an output of the first audio source.

12. The system of claim 11 wherein iteratively modifying the frequency domain representation comprises executing a matrix diagonalization process on the cross-spectral density matrix.

13. The system of claim 9 wherein the input signal includes a third portion that represents acoustic output from a second audio source in the environment, and wherein the selected portion is one of the first, second, or third portions.

14. The system of claim 13 wherein the frequency domain representation includes, for each of a number of frequency bins:

(i) values that each represent a level of coherence between acoustic outputs from the first and second audio sources,

(ii) values that each represent a level of coherence between an acoustic output of a particular one of the first and second audio sources and the input signal, and

(iii) values that each represent the power, for the particular frequency bin, of the acoustic output of one of the first and second audio sources.

15. The system of claim 13 wherein the frequency domain representation comprises a cross-spectral density matrix computed based on outputs of the first and second audio sources.

16. The system of claim 15 wherein iteratively modifying the frequency domain representation comprises executing a matrix diagonalization process on the cross-spectral density matrix.

17. One or more machine-readable storage devices having encoded thereon computer readable instructions for causing one or more processing devices to perform operations comprising:

receiving, at one or more processing devices, an input signal representing audio captured using a microphone, the input signal comprising at least a first portion that represents acoustic output from a first audio source in an environment and a second portion that represents other acoustic energy in the environment;

iteratively modifying, by the one or more processing devices, a frequency domain representation of the input signal, such that the modified frequency domain representation represents a portion of the input signal in which effects due to all but a selected one of the first and second portions are substantially reduced;

determining, from the modified frequency domain representation, an estimate of a power spectral density of the selected portion; and

at least one of reducing noise or echo in a microphone signal based upon the estimated power spectral density or inserting noise in a far-end system based upon the estimated power spectral density.

18. The storage devices of claim 17 wherein the frequency domain representation includes, for each of a number of frequency bins:

(i) values that each represent the power of an acoustic output of the first audio source for the particular frequency bin, and

(ii) values that each represent a level of coherence between the acoustic output of the first audio source and the input signal.

19. The storage devices of claim 17 wherein the frequency domain representation comprises a cross-spectral density matrix computed based on an output of the first audio source.

20. The storage devices of claim 19 wherein iteratively modifying the frequency domain representation comprises executing a matrix diagonalization process on the cross-spectral density matrix.

21. The storage devices of claim 17 wherein the input signal includes a third portion that represents acoustic output from a second audio source in the environment, and wherein the selected portion is one of the first, second, or third portions.

22. The storage devices of claim 21 wherein the frequency domain representation includes, for each of a number of frequency bins:

(i) values that each represent a level of coherence between acoustic outputs from the first and second audio sources,

(ii) values that each represent a level of coherence between an acoustic output of a particular one of the first and second audio sources and the input signal, and

(iii) values that each represent the power, for the particular frequency bin, of the acoustic output of one of the first and second audio sources.

23. The storage devices of claim 21 wherein the frequency domain representation comprises a cross-spectral density matrix computed based on outputs of the first and second audio sources.

24. The storage devices of claim 23 wherein iteratively modifying the frequency domain representation comprises executing a matrix diagonalization process on the cross-spectral density matrix.