WO2020117823 - AUGMENTED REALITY FILTERS FOR CAPTURED AUDIOVISUAL PERFORMANCES


WHAT IS CLAIMED IS:

1. A method comprising:

accessing a computer readable encoding of an audiovisual performance captured in connection with a temporally-synchronized backing track, score and lyrics; and

augmenting a rendering of the audiovisual performance with one or more applied visual effects, wherein visual scale, movement in a visual field, timing, color, or intensity of at least one of the applied visual effects is based on an audio feature computationally extracted from the audiovisual performance or from the temporally-synchronized backing track.
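To make claim 1's feature-to-effect mapping concrete, the sketch below (Python) computes a time-varying RMS signal strength from captured vocal audio and smooths it into a 0..1 intensity that could drive the scale, color, or intensity of an applied visual effect. The frame length, smoothing constant, and normalization are illustrative assumptions, not terms of the claim.

```python
import numpy as np

def rms_envelope(vocal: np.ndarray, sr: int, frame_ms: float = 20.0) -> np.ndarray:
    """Time-varying RMS signal strength, one value per analysis frame."""
    frame = max(1, int(sr * frame_ms / 1000.0))
    n = len(vocal) // frame
    frames = vocal[: n * frame].reshape(n, frame)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def effect_intensity(envelope: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Map the extracted audio feature to a 0..1 effect intensity, with
    exponential smoothing so the visual effect does not flicker."""
    normed = envelope / max(float(envelope.max()), 1e-9)
    out = np.empty_like(normed)
    acc = 0.0
    for i, v in enumerate(normed):
        acc = alpha * float(v) + (1.0 - alpha) * acc
        out[i] = acc
    return out

# Illustrative input: one second of noise standing in for captured vocals.
vocal = np.random.default_rng(0).standard_normal(48000).astype(np.float32)
intensity = effect_intensity(rms_envelope(vocal, sr=48000))
```

One such intensity value per video frame could then scale, recolor, or move the applied visual effect in the render loop.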

2. The method of claim 1,

wherein visual scale, movement in a visual field, timing, color, or intensity of at least one of the applied visual effects is based on an element of musical structure coded in, or computationally-determined from, the temporally-synchronized score or lyrics.

3. The method of claim 1,

wherein at least one of the applied visual effects includes a performance-synchronized presentation of text from the lyrics, wherein visual scale, movement in a visual field, timing, font color, or brightness of presented text is based on an audio feature extracted from the audiovisual performance or from the temporally-synchronized backing track, or based on an element of musical structure coded in, or computationally-determined from, the temporally-synchronized score or lyrics.

4. A method comprising:

accessing a computer readable encoding of an audiovisual performance captured in connection with a temporally-synchronized backing track, score and lyrics; and

augmenting a rendering of the audiovisual performance with one or more applied visual effects, wherein visual scale, movement in a visual field, timing, color, or intensity of at least one of the applied visual effects is based on an element of musical structure coded in, or computationally-determined from, the temporally-synchronized score or lyrics.

5. The method of claim 4,

wherein visual scale, movement in a visual field, timing, color, or intensity of at least one of the applied visual effects is based on an audio feature computationally extracted from the audiovisual performance or from the temporally-synchronized backing track.

6. The method of claim 4,

wherein at least one of the applied visual effects includes a performance-synchronized presentation of text from the lyrics, wherein visual scale, movement in a visual field, timing, font color, or brightness of presented text is based on an audio feature extracted from the audiovisual performance or from the temporally-synchronized backing track, or based on an element of musical structure coded in, or computationally-determined from, the temporally-synchronized score or lyrics.

7. A method comprising:

accessing a computer readable encoding of an audiovisual performance captured in connection with a temporally-synchronized backing track, score and lyrics; and

augmenting a rendering of the audiovisual performance with one or more applied visual effects, wherein at least one of the applied visual effects includes a performance-synchronized presentation of text from the lyrics, wherein visual scale, movement in a visual field, timing, font color, or brightness of presented text is based on an audio feature extracted from the audiovisual performance or from the temporally-synchronized backing track, or based on an element of musical structure coded in, or computationally-determined from, the temporally-synchronized score or lyrics.
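Claim 7's performance-synchronized lyric presentation can be sketched as a lookup from playback time into a lyric timeline, with text styling driven by an extracted loudness value. The LyricLine container, font sizing, and fade-in rate below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LyricLine:
    start: float  # seconds, relative to the backing track
    end: float
    text: str

def styled_lyric_at(t: float, lines: List[LyricLine], loudness: float) -> Optional[dict]:
    """Return render parameters for the lyric line active at playback time t."""
    for line in lines:
        if line.start <= t < line.end:
            progress = (t - line.start) / (line.end - line.start)
            return {
                "text": line.text,
                "font_px": 24 + int(24 * max(0.0, min(1.0, loudness))),  # louder -> larger
                "alpha": min(1.0, progress * 4.0),  # quick fade-in on entry
            }
    return None  # no lyric active at this time

lines = [LyricLine(0.0, 2.5, "first line"), LyricLine(2.5, 5.0, "second line")]
print(styled_lyric_at(1.0, lines, loudness=0.7))
```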

8. The method of claim 7,

wherein visual scale, movement in a visual field, timing, color, or intensity of at least one of the applied visual effects is based on an audio feature computationally extracted from the audiovisual performance or from the temporally-synchronized backing track.

9. The method of claim 7,

wherein visual scale, movement in a visual field, timing, color, or intensity of at least one of the applied visual effects is based on an element of musical structure coded in, or computationally-determined from, the temporally-synchronized score or lyrics.

10. The method of claim 1, 4, or 7,

wherein the applied visual effect is controlled, or includes content, based at least in part on a received input from a member of an audience to which the audiovisual performance is streamed.

11. The method of claim 10, further comprising:

receiving a like/love or upvote/downvote indication from the member of the audience and, based thereon, presenting the applied visual effect.

12. The method of claim 10, further comprising:

receiving chat traffic from at least one member of the audience and, based on volume, content or keywords of the received chat traffic, presenting the applied visual effect.

13. The method of claim 12,

wherein the applied visual effect includes, and visually presents, content or keywords from the received chat traffic.
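Claims 12 and 13, read together, describe a trigger pipeline: chat volume and keyword matches select effects, and matched keywords are surfaced as on-screen overlay content. A minimal sketch, with the keyword list and volume threshold as illustrative assumptions:

```python
from collections import Counter

TRIGGER_KEYWORDS = {"fire", "encore", "wow"}  # hypothetical keyword list

def effects_from_chat(messages, min_volume: int = 5):
    """Select visual effects from a window of audience chat traffic."""
    effects = []
    if len(messages) >= min_volume:  # sheer chat volume triggers an effect
        effects.append({"kind": "confetti", "overlay_text": None})
    hits = Counter(
        w
        for msg in messages
        for word in msg.lower().split()
        if (w := word.strip("!?.,")) in TRIGGER_KEYWORDS
    )
    for word, count in hits.items():
        # Keyword triggers also present the matched chat content on screen.
        effects.append({"kind": "keyword_burst", "overlay_text": word, "strength": count})
    return effects

print(effects_from_chat(["fire!!", "encore, encore", "nice run"]))
```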

14. The method of claim 1, 4, or 7, further comprising:

receiving the accessed encoding, via a communications network, from a remote portable computing device at which the audiovisual performance was captured in connection with a karaoke-style audible rendering of the temporally-synchronized backing track, and visual presentation of the temporally-synchronized lyrics and of pitch cues in correspondence with the temporally-synchronized score.

15. The method of claim 1, 4, or 7, further comprising:

capturing the audiovisual performance in connection with a karaoke-style audible rendering of the temporally-synchronized backing track, and visual presentation of the temporally-synchronized lyrics and of pitch cues in correspondence with the temporally-synchronized score.

16. The method of claim 1, 4, or 7, further comprising:

capturing a second audiovisual performance in connection with a karaoke-style visual presentation of the temporally-synchronized lyrics, the captured second audiovisual performance including performance-synchronized video of a second performer; and

compositing the captured second audiovisual performance with a first audiovisual performance including performance-synchronized video of a first performer to produce the accessed audiovisual performance, wherein the augmentation with the one or more applied visual effects is applied to either or both of first and second performer visuals detected in the visual field.

17. The method of claim 16,

wherein the captured first and second audiovisual performances present, after the compositing and the augmentation, as a duet.
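A minimal sketch of the compositing in claims 16 and 17, assuming both performances decode to HxWx3 uint8 frames in RGB order; the side-by-side duet layout and the feature-driven tint are illustrative choices:

```python
import numpy as np

def composite_duet(host: np.ndarray, guest: np.ndarray, tint: float) -> np.ndarray:
    """Side-by-side duet layout with a feature-driven warm tint (tint in 0..1)."""
    h = min(host.shape[0], guest.shape[0])
    frame = np.concatenate([host[:h], guest[:h]], axis=1)  # left/right layout
    out = frame.astype(np.float32)
    out[..., 0] = np.clip(out[..., 0] * (1.0 + 0.5 * tint), 0.0, 255.0)  # boost red
    return out.astype(np.uint8)

host = np.zeros((360, 320, 3), dtype=np.uint8)    # stand-ins for decoded frames
guest = np.full((360, 320, 3), 128, dtype=np.uint8)
duet_frame = composite_duet(host, guest, tint=0.6)  # shape (360, 640, 3)
```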

18. The method of claim 1, 4, or 7, wherein the applied visual effect includes: dynamically rendered visual augmentations to face or body visuals of a vocal performer detected in a visual field of the captured audiovisual performance.

19. The method of claim 18, wherein the dynamically rendered visual augmentations to face or body visuals include one or more of:

synthetic tattoo visuals that augment face or body visuals of the vocal performer detected in the visual field of the captured audiovisual performance;

synthetic ear, nose, hair, antenna, hat or glasses visuals that augment facial visuals of the vocal performer detected in the visual field of the captured audiovisual performance;

distortions to eyes, mouth or ears of the vocal performer detected in the visual field of the captured audiovisual performance; and

presentation of a visual avatar for the vocal performer detected in the visual field of the captured audiovisual performance.
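Per claim 19, each augmentation reduces, per frame, to locating face visuals and drawing synthetic content relative to them. The sketch below stubs out the detector with a fixed box; a real implementation would substitute an actual face-detection library, and the (x, y, w, h) box format is an assumption:

```python
import numpy as np

def detect_face(frame: np.ndarray):
    """Hypothetical stand-in for a real face detector; returns (x, y, w, h)."""
    h, w = frame.shape[:2]
    return (w // 3, h // 3, w // 3, h // 3)  # fixed box, for illustration only

def draw_hat(frame: np.ndarray) -> np.ndarray:
    """Overlay a synthetic hat band just above the detected face, per frame."""
    x, y, w, h = detect_face(frame)
    out = frame.copy()
    top = max(0, y - h // 3)
    out[top:y, x : x + w] = (200, 30, 30)  # solid colored band as the "hat"
    return out

frame = np.zeros((480, 640, 3), dtype=np.uint8)
augmented = draw_hat(frame)
```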

20. The method of claim 1, 4, or 7, wherein the applied visual effect includes one or more of:

a particle-based effect or lens flare;

transitions between, or layouts of, distinct source videos;

animations or motion of a frame within a source video;

vector graphics or images of patterns or textures; and

color, saturation or contrast.

21. The method of claim 1, 4, or 7, wherein the applied visual effect is applied to or as one of:

a vocal performer detected in the visual field;

a synthetic foreground;

a visual feature detected in a background; and

a synthetic background.

22. The method of claim 1, 4, or 7, wherein the applied visual effect includes: dynamically rendered visual augmentation of a detected reflective surface or a synthetic augmentation of the captured audiovisual performance to include an apparent reflective surface, wherein the dynamically rendered visual augmentation presents performance-synchronized second vocal performer visuals as an apparent reflection in the detected or apparent reflective surface.

23. The method of claim 1, wherein the applied visual effect includes either or both of:

a synthetic background against which a background-subtracted version of the captured audiovisual performance is rendered; and

a visually overlaid synthetic foreground.

24. The method of claim 1, 5, or 8, wherein the extracted audio feature includes one or more of:

a time-varying audio signal strength or audio energy density measure computationally determined from vocal audio of the captured audiovisual performance;

a computationally-determined measure of brightness, breathiness or vibrato; and

beats, tempo, signal strength or energy density of a backing audio track.
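Of the features in claim 24, brightness is commonly approximated by the spectral centroid. A minimal framed-analysis sketch, with frame and hop lengths as illustrative choices; breathiness and vibrato estimators would follow the same per-frame pattern:

```python
import numpy as np

def spectral_centroid(x: np.ndarray, sr: int, frame: int = 2048, hop: int = 512) -> np.ndarray:
    """Centroid of the magnitude spectrum (Hz), one value per analysis frame."""
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    window = np.hanning(frame)
    out = []
    for start in range(0, len(x) - frame, hop):
        mag = np.abs(np.fft.rfft(x[start : start + frame] * window))
        out.append(float(np.sum(freqs * mag) / max(float(np.sum(mag)), 1e-12)))
    return np.array(out)

sr = 44100
t = np.arange(sr) / sr
centroids = spectral_centroid(np.sin(2 * np.pi * 440.0 * t), sr)  # ~440 Hz throughout
```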

25. The method of claim 1, further comprising:

segmenting a vocal audio track of the audiovisual performance encoding to provide the computationally extracted audio feature.

26. The method of claim 25,

wherein the segmenting is based at least in part on a computational determination of vocal intensity with at least some segmentation boundaries constrained to temporally align with beats or tempo computationally extracted from the temporally-synchronized backing track.
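Claim 26's constraint can be illustrated by snapping candidate boundaries from a vocal-intensity pass to the nearest beat time extracted from the backing track; both arrays are assumed to be in seconds:

```python
import numpy as np

def snap_to_beats(boundaries: np.ndarray, beats: np.ndarray) -> np.ndarray:
    """Move each candidate segmentation boundary to its nearest beat time."""
    idx = np.searchsorted(beats, boundaries)
    idx = np.clip(idx, 1, len(beats) - 1)
    left, right = beats[idx - 1], beats[idx]
    return np.where(boundaries - left < right - boundaries, left, right)

beats = np.arange(0.0, 30.0, 0.5)         # e.g. a 120 BPM beat grid, in seconds
candidates = np.array([3.1, 7.76, 12.4])  # boundaries from a vocal-intensity pass
print(snap_to_beats(candidates, beats))   # -> [ 3.   8.  12.5]
```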

27. The method of claim 25,

wherein the segmenting is based at least in part on a similarity analysis computationally performed on the temporally-synchronized lyrics to classify particular portions of the audiovisual performance encoding as verse or chorus.
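Claim 27's similarity analysis can be sketched, much simplified, as repetition counting over normalized lyric lines, with exact-match counting standing in for a real similarity measure:

```python
from collections import Counter
from typing import List

def classify_lines(lyric_lines: List[str]) -> List[str]:
    """Label each lyric line 'chorus' if its normalized text repeats, else 'verse'."""
    norm = [" ".join(line.lower().split()) for line in lyric_lines]
    counts = Counter(norm)
    return ["chorus" if counts[n] > 1 else "verse" for n in norm]

lines = ["City lights", "we run all night", "city  lights", "alone again"]
print(classify_lines(lines))  # -> ['chorus', 'verse', 'chorus', 'verse']
```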

28. The method of claim 1, 5, or 8, further comprising:

segmenting the temporally-synchronized backing track to provide the computationally extracted audio feature.

29. The method of claim 1, 4, or 7,

performed, at least in part, on a content server or service platform to which geographically-distributed, network-connected, vocal capture devices are communicatively coupled.

30. The method of claim 1, 4, or 7,

performed, at least in part, on a network-connected, vocal capture device communicatively coupled to a content server or service platform.

31. The method of claim 1, 4, or 7,

performed, at least in part, on a network-connected, vocal capture device communicatively coupled as a host device to at least one other network-connected, vocal capture device operating as a paired guest device.

32. The method of claim 1, 4, or 7,

embodied, at least in part, as a computer program product encoding of instructions executable on a content server or service platform to which a plurality of geographically-distributed, network-connected, vocal capture devices are communicatively coupled.

33. The method of claim 1, 4, or 7,

embodied, at least in part, as a computer program product encoding of instructions executable on a network-connected, vocal capture device on which the augmented rendering of the audiovisual performance is audibly and visually presented to a human user.

34. The method of claim 1, 4, or 7,

wherein the temporally-synchronized score encodes musical sections of differing types; and

wherein the applied visual effects include differing visual effects for different ones of the encoded musical sections.
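Claim 34 amounts to a lookup from encoded section type to an effect preset. A minimal sketch, with the section labels and preset contents as illustrative assumptions:

```python
from typing import List, Tuple

# Illustrative presets keyed by section type; labels and contents are assumptions.
SECTION_PRESETS = {
    "verse":  {"effect": "soft_vignette",  "intensity": 0.3},
    "chorus": {"effect": "particle_burst", "intensity": 0.9},
    "bridge": {"effect": "color_shift",    "intensity": 0.6},
}

def preset_for(t: float, sections: List[Tuple[float, float, str]]) -> dict:
    """Pick an effect preset for playback time t from (start, end, type) sections."""
    for start, end, kind in sections:
        if start <= t < end:
            return SECTION_PRESETS.get(kind, SECTION_PRESETS["verse"])
    return SECTION_PRESETS["verse"]  # default outside any encoded section

score_sections = [(0.0, 20.0, "verse"), (20.0, 40.0, "chorus")]
print(preset_for(25.0, score_sections))  # -> the particle_burst preset
```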

35. The method of claim 1, 5, or 8,

wherein the extracted audio feature corresponds to one or more events or transitions in the audiovisual performance; and

wherein the applied visual effects augment the audiovisual performance with differing visual effects for different ones of the events or transitions.

36. A system comprising:

at least a guest and host pairing of network-connected devices configured to capture at least vocal audio;

the host device configured to (i) receive from the guest device an encoding of at least vocal audio, to (ii) composite the received encoding of at least vocal audio with a locally captured audiovisual performance and, based on an audio feature computationally extracted from the vocal audio, the locally captured audiovisual performance, an associated backing track, or a resulting composited audiovisual performance encoding, to (iii) augment the composited audiovisual performance encoding with one or more applied visual effects,

wherein visual scale, movement in a visual field, timing, color, or intensity of at least one of the applied visual effects is based on the computationally extracted audio feature.

37. A system comprising:

at least a guest and host pairing of network-connected devices configured to capture at least vocal audio;

the host device configured to (i) receive from the guest device an encoding of at least vocal audio, to (ii) composite the received encoding of at least vocal audio with a locally captured audiovisual performance and, based on an element of musical structure coded in, or computationally-determined from, the temporally-synchronized score or lyrics, to (iii) augment the composited audiovisual performance encoding with one or more applied visual effects,

wherein visual scale, movement in a visual field, timing, color, or intensity of at least one of the applied visual effects is based on the coded or computationally-determined element of musical structure.

38. A system comprising:

at least a guest and host pairing of network-connected devices configured to capture at least vocal audio;

the host device configured to (i) receive from the guest device an encoding of at least vocal audio, to (ii) composite the received encoding of at least vocal audio with a locally captured audiovisual performance and, based on an audio feature extracted from the audiovisual performance or from the temporally-synchronized backing track, or based on an element of musical structure coded in, or computationally-determined from, the temporally-synchronized score or lyrics, to (iii) augment the composited audiovisual performance encoding with one or more applied visual effects,

wherein at least one of the applied visual effects includes a performance-synchronized presentation of text from performance-synchronized lyrics, wherein visual scale, movement in a visual field, timing, font color, or brightness of presented text is based on the extracted audio feature or the coded or computationally-determined element of musical structure.

39. The system of any of claims 36-38,

wherein the host and guest devices are coupled as local and remote peers via a communication network with non-negligible peer-to-peer latency for transmissions of audiovisual content,

wherein the host device is communicatively coupled as the local peer to receive a media encoding including the vocal audio, and

wherein the guest device is communicatively coupled as the remote peer to supply a media encoding captured from a first one of the performers and mixed with the associated backing track.

40. The system of any of claims 36-38,

wherein the host device is configured to render the audiovisual performance encoding as a mixed audiovisual performance, including vocal audio and performance-synchronized video from the first and a second one of the performers, and to transmit the audiovisual performance encoding as an apparently live broadcast with the augmenting visual effects applied.

41. A system comprising:

a geographically distributed set of network-connected devices configured to capture audiovisual performances including vocal audio with performance-synchronized video; and

a service platform configured to (i) receive encodings of the captured audiovisual performances, to (ii) composite the received encodings and, based on an audio feature computationally extracted from one of the received encodings or a resulting composited audiovisual performance encoding, to (iii) augment the composited audiovisual performance encoding with one or more applied visual effects, wherein visual scale, movement in a visual field, timing, color, or intensity of at least one of the applied visual effects is based on the computationally extracted audio feature.

42. A system comprising:

a geographically distributed set of network-connected devices configured to capture audiovisual performances including vocal audio with performance-synchronized video; and

a service platform configured to (i) receive encodings of the captured audiovisual performances, to (ii) composite the received encodings and, based on an element of musical structure coded in, or computationally-determined from, a temporally-synchronized score or lyrics, to (iii) augment the composited audiovisual performance encoding with one or more applied visual effects,

wherein visual scale, movement in a visual field, timing, color, or intensity of at least one of the applied visual effects is based on the coded or computationally-determined element of musical structure.

43. A system comprising:

a geographically distributed set of network-connected devices configured to capture audiovisual performances including vocal audio with performance-synchronized video; and

a service platform configured to (i) receive encodings of the captured audiovisual performances, to (ii) composite the received encodings and, based on an audio feature extracted from one of the audiovisual performances or the composited audiovisual performance or from the temporally-synchronized backing track, or based on an element of musical structure coded in, or computationally-determined from, a temporally-synchronized score or lyrics, to (iii) augment the composited audiovisual performance encoding with one or more applied visual effects,

wherein at least one of the applied visual effects includes a performance-synchronized presentation of text from performance-synchronized lyrics, wherein visual scale, movement in a visual field, timing, font color, or brightness of presented text is based on the extracted audio feature or the coded or computationally-determined element of musical structure.