1. WO2018195652 - SYSTEM, METHOD AND APPARATUS FOR CO-LOCATING VISUAL IMAGES AND ASSOCIATED SOUND

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters


SYSTEM, METHOD AND APPARATUS FOR CO-LOCATING VISUAL IMAGES AND ASSOCIATED SOUND

BACKGROUND OF THE INVENTION

1. Field of Invention

This invention relates to the presentation or display of audiovisual works and, in particular, to a system, method and apparatus for co-locating visual images and associated sound.

2. Description of Related Art

Since the 1920s, sound and image have been in time with each other, i.e. synchronized. However, mere synchronization does not co-locate visual images and associated sound.

Visual displays include video screens such as television sets, computer monitors, flat-panel displays, LCDs (Liquid Crystal Displays), cathode-ray tube displays, LED (Light-Emitting Diode) arrays, OLED (Organic Light-Emitting Diode) arrays, micro-LED (microscopic Light-Emitting Diode) arrays, and ELDs (Electroluminescent Displays); projection screens such as slide-tape presentation equipment, slide projectors, film projectors; multi-screen displays; and similar devices for presenting or displaying visual works.

Conventional audiovisual displays include a screen acting as a source of (typically moving) visual images, and a speaker acting as a source of sound. Usually with conventional audiovisual systems, the audio components are dislocated spatially from visual sources and come from separate sources (e.g. speakers next to or arranged elsewhere in a room from a video screen).

Stereophonic sound is a method of sound reproduction that involves recording live sound by an array of microphones separated from each other and then playing back the recordings on multiple speakers (or stereo headphones) to recreate, as closely as possible, the spatial characteristics of the live sound.

Surround sound is a form of stereophonic sound that involves multiple discrete audio channels routed to an array of speakers. In a surround sound system, the speakers are located separately from each other with the intention of surrounding the listener. However, conventional surround-sound and other stereophonic systems do not involve co-locating the visual images and their associated sounds.

International patent application publication No. WO9601547A2 to Conley et al. discloses piezoelectric patches placed behind an LCD (Liquid Crystal Display) screen of a laptop computer. The piezoelectric patches are placed directly on the back wall of the lid of the laptop computer. The piezoelectric patches form piezo speakers when driven by an audio amplifier and transformer, so as to convert the laptop lid into a loudspeaker. However, the sound generated by the loudspeaker of Conley et al. is not co-located with associated visual images on the LCD screen.

United States patent No. 6,389,935 to Azima et al. discloses an acoustic display screen in the form of a projection screen that is also a distributed mode acoustic radiator loudspeaker having a transducer mounted wholly and exclusively thereon to vibrate the radiator to cause it to resonate to provide an acoustic output. Azima et al. disclose that the non-directional radiation property of the acoustic panel means that the sound appears to come from the general acoustic region of the screen but not from one isolated point. Azima et al. disclose that the acoustic panel of Azima et al. provides a desirable lack of specific sound source localisation. Thus, the acoustic display screen of Azima et al. does not co-locate visual images and associated sounds.

Virtual reality (VR) refers to a computer-generated immersive environment intended to create a lifelike experience for a user perceiving audiovisual media and/or haptic feedback. The effect of an immersive environment is conventionally created through the use of a VR headset having a head-mounted display with a small screen in front of the eyes to provide the visual images, and the use of headphones, earphones, or nearby speakers to provide the associated sounds. In the case of headphones or earphones, a head-related transfer function (HRTF) simulates a binaural sound that seems to come from a particular point in space distal from the user. In the case of nearby speakers, the immersive environment can alternatively be created in specially designed rooms with multiple large screens to present the visual images.

Augmented reality is a form of virtual reality in which textual or graphical information (i.e. a virtual object) is displayed in front of the eyes without blocking natural vision of the real world. The virtual objects are conventionally displayed on an otherwise transparent medium, such as glass eyewear.

Augmented virtuality is the layering of textual or graphical information onto the display of a live camera feed.

Mixed reality systems merge the display of a live camera feed and computer-generated imagery.

A VR arcade includes VR booths in which a user typically holds one or two hand controllers and wears a VR headset to create an immersive environment within the VR booth for a VR game or other VR user experience.

However, conventional virtual reality, augmented reality, augmented virtuality, and mixed reality systems do not co-locate visual images and associated sounds.

An object of the invention is to address the above shortcomings.

SUMMARY

The above shortcomings may be addressed by providing, in accordance with one aspect of the invention, an audiovisual co-location system which uses a multichannel two-dimensional array of sound signals to co-locate video imagery and sound sources, so that the perception is created that the sound emanates from the exact screen area of the visual cue. The system is composed of a display device with integrated data and audio electronics and a two-dimensional array of sound exciters. The system is operable to receive an audio signal from an external source. Two types of media can be accommodated by the system, each of which has a different method of sound-image co-location. With the first type, particularly suitable for video or other visual recordation media, the audio tracks are produced in advance so that the sounds are positioned along the two-dimensional array of sound exciters, which receive the audio signal that has been encoded for the array. The array can contain any number of channels. The second type, particularly suitable for computer-generated graphics such as found in game and virtual world media, has a channel routing audio engine which assigns sounds attached to virtual objects to screen areas in real time, depending on the current XY location of the object on the screen, causing the sound to emanate from that part of the screen. The screen may be either a video screen or a surface for projected imagery. In either type of screen, the two-dimensional array of sound signals is produced internally to the screen's casing, which causes the screen itself to vibrate as the source of the sound. The display device receives its audio signals from the array via a single data port in the screen's casing. The audio signals are sent to the display device by a video projector, video player, game console, other media device or the like.

In accordance with another aspect of the invention, there is provided an integrated system for producing the illusion that audio emanates from the screen area in a media image associated with the sound. The system comprises a display device with integrated audio electronics interfaced with an external source of audio signal from a media device. The display device contains two subsystems: a screen, which may be either a surface for projection or a video monitor, and the overall housing with additional embedded audio electronics for sound signal distribution. The screen defines the image area that may be either projected or screened video.

The display device can be operable to interface with a video projector, video player, or game console or similar device. The display device can provide either a surface for projected imagery or a video screen. The integrated audio electronics of the display device includes a two dimensional array of audio exciters attached to the interior surface of the screen. The data port's primary role is to receive from the audio source synchronized spatially distributed audio signals, which are sent to the appropriate sound exciter embedded in the display device. The display device may also have a video port for use if the screen subsystem is a video monitor rather than a surface for projection.

In circumstances where the signal comes from a linear media source, such as a video signal for film or television, the audio component of the video media preferably has been prepared in advance in the video post-production phases according to the specifications of the system.

In circumstances where the signal comes from a gaming console or otherwise employs real time audio routing, the channel routing audio engine of the game or other computer-generated graphics source preferably has the insertion, into the signal chain output, of a channel routing protocol. This protocol assigns virtual (e.g. animated game) objects with an audio output channel connected to the final destination in the display device. Such routing of audio to selected sound exciter(s)

The channel routing in a game's or other source's audio engine preferably occurs by assigning each game or other virtual object its own routing mechanism, which typically occurs during the production phase of the game or other computer-generated graphics source.

All audio received or generated by the game or other source's audio engine is preferably fed into a mixer for combining signals before being directed to the display device.

In some embodiments, the display device has internal electronics which are not visible to viewers, being encased in a housing.

The external audio source signal output and routing element is preferably housed inside of a video projector, video media device, gaming console, or the like, which sends the external audio signal to the display device.

When the external audio source is a gaming console, the content of the game preferably has had its audio elements prepared in accordance with the protocols of the system. In some embodiments, the key elements of this protocol for gaming are:

- virtual objects as particular visual images with assigned sounds are identified.

- the X-Y screen coordinates of the sound objects in the virtual world are extracted.

- these X-Y coordinates are processed through logical operators which add new information about the sound's location in the display device.

- the logical operators control a channel and gating process which assigns the sound(s) to the appropriate output.

- this output is sent to a mixer where all sounds are combined for final audio processing before being sent to the data port of the display device.

- this process is the same for, and concurrent with, all other virtual objects which have been given sound elements during the content production phase.
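
The protocol steps listed above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the 4 x 3 exciter grid, the 1920 x 1080 screen resolution, the names VirtualObject, channel_for and mix_frame, and the use of a single mono sample per object are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Assumed (hypothetical) exciter grid and screen resolution.
COLS, ROWS = 4, 3
SCREEN_W, SCREEN_H = 1920, 1080


@dataclass
class VirtualObject:
    name: str
    x: float       # current X screen coordinate of the object, in pixels
    y: float       # current Y screen coordinate of the object, in pixels
    sample: float  # current mono sample of the sound assigned to the object


def channel_for(x: float, y: float) -> int:
    """Logical operator: map X-Y screen coordinates to an output channel index."""
    col = max(0, min(int(x / SCREEN_W * COLS), COLS - 1))
    row = max(0, min(int(y / SCREEN_H * ROWS), ROWS - 1))
    return row * COLS + col  # row-major channel numbering (an assumption)


def mix_frame(objects: list[VirtualObject]) -> list[float]:
    """Gate each object's sound to its channel, then sum per channel (the mixer)."""
    frame = [0.0] * (COLS * ROWS)
    for obj in objects:
        frame[channel_for(obj.x, obj.y)] += obj.sample
    return frame


bird = VirtualObject("bird", x=1800, y=100, sample=0.5)
car = VirtualObject("car", x=200, y=1000, sample=0.25)
frame = mix_frame([bird, car])
print(frame)
```

In this sketch every object is processed by the same function in the same pass, which mirrors the statement that the process is the same for all virtual objects; a real audio engine would operate on buffers of samples rather than single values.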

Such game-based features of protocol also apply in some embodiments to any source of computer-generated graphics, including immersive computer graphics for example. Taken altogether, various parts constitute a single system. The system may comprise the display device, with its subsystems of screen and sound exciters, plus an audio and data electronics processor. The external audio source for spatially distributing the audio signal into discrete channels for the two dimensional array of sound exciters typically transmits the external audio signal for receiving by the system.

This system produces a novel function, creating the illusion that sound emanates from the visual cue at the correct spatial area of the screened image, which may be called audiovisual co-location. Thus, a large visual display is also the sound producing element, and images in the screen can have the sounds associated with them in the actual spatialized screen area of the image. This system allows for the co-location of moving image and sound. In essence, audio sound sources can be spatially tied to their visual sources on a screen. For example, if an animated bird flies around the screen, its chirping sound emanates from the exact screen area where the bird is visualized.

In accordance with another aspect of the invention, there is provided a system for co-locating visual images and associated sounds. The system includes: (a) a screen for displaying the visual images, the screen having a display surface upon which the visual images appear at corresponding areas of the display surface, the visual images having associated therewith apparent locations of the visual images, respectively, the apparent locations being selected from the group consisting of: corresponding zones defined within an immersive environment surrounding the screen, and the corresponding areas of the display surface; and (b) a plurality of sound exciters for producing the associated sounds so as to render a location effect of the associated sounds originating from the apparent locations of the visual images, respectively.

The apparent locations may be selected as the corresponding zones. The plurality of sound exciters may form one or more sound walls distal from the screen. The plurality of sound exciters may be operable to receive a spatially distributed audio signal of the associated sounds. The spatially distributed audio signal may include location representations of a plurality of sets of three-dimensional coordinates identifying the corresponding zones. The spatially distributed audio signal may include, in association with the location representations, associated representations for rendering the location effect. The associated representations may represent at least one of an audio volume, a reverberation, an echo, and a spectral filtering. The system may further include a processor. The system may further include a memory for storing computer-executable instructions for directing the processor to generate the spatially distributed audio signal.
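
By way of illustration only, one entry of such a spatially distributed audio signal might be modelled as a record that pairs a set of three-dimensional zone coordinates with optional effect parameters. The sketch below is an assumption made for the example: the class name ZoneSound, the field names, and the units are hypothetical and are not defined in this description.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ZoneSound:
    """Hypothetical record: one sound located at one three-dimensional zone."""
    zone: Tuple[int, int, int]             # three-dimensional coordinates of the zone
    volume: Optional[float] = None         # audio volume (assumed 0.0 to 1.0)
    reverberation: Optional[float] = None  # reverberation amount (assumed 0.0 to 1.0)
    echo_delay_s: Optional[float] = None   # echo delay in seconds (assumed unit)
    lowpass_hz: Optional[float] = None     # spectral filtering cutoff (assumed unit)

    def effects(self) -> Dict[str, float]:
        """Return only the effect parameters that are actually present."""
        candidates = {
            "volume": self.volume,
            "reverberation": self.reverberation,
            "echo_delay_s": self.echo_delay_s,
            "lowpass_hz": self.lowpass_hz,
        }
        return {name: value for name, value in candidates.items() if value is not None}


entry = ZoneSound(zone=(2, 0, 1), volume=0.8, lowpass_hz=4000.0)
print(entry.zone, entry.effects())
```

Making every effect field optional reflects the wording that the associated representations may represent at least one of the listed effects, rather than all of them.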

The apparent locations may be selected as the corresponding areas of the display surface. The location effect may be rendered by the sound exciters being operable to vibrate the display surface such that the associated sounds emit from the corresponding areas of the display surface, respectively. The plurality of sound exciters may be operable to receive a spatially distributed audio signal of the associated sounds. The spatially distributed audio signal may include two-dimensional location representations of a plurality of sets of two-dimensional coordinates identifying the corresponding areas. The system may further include a processor. The system may further include a memory for storing computer-executable instructions for directing the processor to generate the spatially distributed audio signal. The screen may be a video screen. The screen may be a projection screen.

In accordance with another aspect of the invention, there is provided a method of co-locating visual images and associated sound. The method involves: (a) displaying the visual images on a display surface of a screen at corresponding areas of the display surface when the visual images have associated therewith apparent locations of the visual images, respectively, upon selection of the apparent locations from the group consisting of: corresponding zones defined within an immersive environment surrounding the screen, and the corresponding areas of the display surface; and (b) producing by a plurality of sound exciters the associated sounds so as to render a location effect of the associated sounds originating from the apparent locations of the visual images, respectively.

Step (a) may involve selecting the apparent locations as the corresponding zones. Step (b) may involve producing the associated sounds by the plurality of sound exciters forming one or more sound walls. Producing the associated sounds by the plurality of sound exciters forming one or more sound walls may involve receiving by the plurality of sound exciters a spatially distributed audio signal of the associated sounds when the spatially distributed audio signal comprises location representations of a plurality of sets of three-dimensional coordinates identifying the corresponding zones. Receiving by the plurality of sound exciters the spatially distributed audio signal of the associated sounds when the spatially distributed audio signal comprises the location representations of the plurality of sets of the three-dimensional coordinates identifying the corresponding zones may involve: receiving the spatially distributed audio signal comprising, in association with the location representations, associated representations for rendering the location effect when the associated representations represent at least one of an audio volume, a reverberation, an echo, and a spectral filtering. The method may further involve directing a processor by computer-executable instructions stored in a memory to generate the spatially distributed audio signal.

Step (a) may involve selecting the apparent locations as the corresponding areas. Step (b) may involve rendering the location effect by the plurality of sound exciters vibrating the display surface at the corresponding areas so as to emit the associated sounds from the corresponding areas, respectively. Rendering the location effect by the plurality of sound exciters vibrating the display surface at the corresponding areas so as to emit the associated sounds from the corresponding areas, respectively, may involve: receiving by the plurality of sound exciters a spatially distributed audio signal of the associated sounds when the spatially distributed audio signal comprises location representations of a plurality of sets of two-dimensional coordinates identifying the corresponding areas. The method may further involve directing a processor by computer-executable instructions stored in a memory to generate the spatially distributed audio signal. Step (a) may involve displaying the visual images on the display surface of a video screen. Step (a) may involve displaying the visual images on the display surface of a projection screen.

In accordance with another aspect of the invention, there is provided a system for co-locating visual images and associated sound. The system includes: (a) display means for displaying the visual images at corresponding areas of a display surface when the visual images have associated therewith apparent locations of the visual images, respectively, upon selection of the apparent locations from the group consisting of: corresponding zones defined within an immersive environment surrounding the display means, and the corresponding areas of the display surface; and (b) audio means for producing the associated sounds so as to render a location effect of the associated sounds originating from the apparent locations of the visual images, respectively.

The system may further include: (c) processing means for generating a spatially distributed audio signal of the associated sounds such that the spatially distributed audio signal comprises representations of coordinates identifying the apparent locations.

The foregoing summary is illustrative only and is not intended to be in any way limiting. Other aspects and features of the present invention will become apparent to those of ordinary skill in the art upon review of the following description of embodiments of the invention in conjunction with the accompanying figures and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

This application includes drawings which illustrate by way of example only embodiments of the invention, as follows.

Figure 1 is a block diagram of a display device and an audio source device in accordance with a first embodiment of the invention, showing the display device having only one data port;

Figure 2 is a block diagram of a variation of the display device and audio source device shown in Figure 1, showing the display device having two data ports;

Figure 3 is an internal view of the display device, showing a two-dimensional array of sound exciters;

Figure 4 is an internal view of the display device, showing a data port, a DAC (digital-to-analog converter) and a multichannel audio amplifier;

Figure 5 is an internal view of the display device, showing the two-dimensional array of sound exciters, the data port, the DAC, and the multichannel audio amplifier;

Figure 6 is an internal view of the display device of Figure 5, showing a conceptual schematic drawing of wiring;

Figure 7 is an internal view of the display device of Figure 5, showing a data port for receiving a video signal;

Figure 8 is a sectional side view of a display device according to the first embodiment, showing sound exciters attached to a rear side of the video screen;

Figure 9 is a sectional side view of a modified display device, showing an acoustic sheet for improving acoustic performance;

Figure 10 is a screen shot of a computer game, showing visual images displayed on a number of defined screen areas;

Figure 11 is a flowchart showing a method of generating a two-dimensional spatially processed and combined audio signal, showing parallel sub-methods for each game object;

Figure 12 is a perspective view of a plurality of separated visual displays with embedded audio electronics;

Figure 13 is a perspective view of a wall panel having a plurality of adjoining visual displays with embedded audio electronics;

Figure 14 is a perspective view of a virtual reality room according to a second embodiment of the invention, showing in see-through form surrounding wall panels having arrays of sound exciters;

Figure 15 is a perspective view of the virtual reality room shown in Figure 14, showing three-dimensional zones defined within the virtual reality room; and

Figure 16 is a flowchart showing a method of generating a three-dimensional spatially processed and combined audio signal, showing parallel sub-methods for each of a number of virtual objects.

DETAILED DESCRIPTION

A system for co-locating visual images and associated sound includes: (a) display means for displaying the visual images at corresponding areas of a display surface when the visual images have associated therewith apparent locations of the visual images, respectively, upon selection of the apparent locations from the group consisting of: corresponding zones defined within an immersive environment surrounding the display means, and the corresponding areas of the display surface; and (b) audio means for producing the associated sounds so as to render a location effect of the associated sounds originating from the apparent locations of the visual images, respectively. The system may include: (c) processing means for generating a spatially distributed audio signal of the associated sounds such that the spatially distributed audio signal comprises representations of coordinates identifying the apparent locations.

Referring to Figures 1 to 7, the system 10 according to a first embodiment includes a display device 12. The display device 12 is comprised of two elements: (1) the screen 14, which may be either a projection surface for projection (i.e. projected imagery) or a video (monitor) screen; and (2) the integrated data and audio electronics for sound distribution.

The screen defines one or more image areas that may, separately or together, display either projected or screened video.

The screen in the first embodiment can be of two kinds or types: (1) a video screen 16 (Figures 2 and 7); or (2) a projection screen 18 (Figures 1 and 4 to 6) for receiving and then displaying projected video imagery.

At least some embodiments of the display device 12 have a built-in data port 20 for receiving distributed audio signals 22. If the display device 12 contains a video screen 16 rather than simply a projection surface of a projection screen 18, it will preferably also have a video port 24 (Figures 2 and 7) for receiving the video signal 26.

The display device 12 typically includes an overall housing 28 (Figures 4 to 7) for housing the embedded audio electronics. Regardless of the display device type, the integrated data and audio electronics in the overall display device are similar in at least some embodiments of the invention. The system's internal electronics are typically not visible to viewers, being encased in the display device's housing 28.

Generally, the system and apparatus each include a processor (not shown) and a memory (not shown).

The processor is typically a processing circuit that includes one or more circuit units, such as a central processing unit (CPU), digital signal processor (DSP), embedded processor, etc., and any combination thereof operating independently or in parallel, including possibly operating redundantly. The processor may be implemented by one or more integrated circuits (IC), including being implemented by a monolithic integrated circuit (MIC), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), programmable logic controller (PLC), etc. or any combination thereof. The processor may include circuitry for storing memory, such as digital data, and may comprise the memory or be in wired or wireless communication with the memory, for example.

The memory is typically operable to store digital representations of data or other information, including control commands and data, and to store digital representations of program data or other information, including program code for directing operations of the processor. Typically, the memory is all or part of a digital electronic integrated circuit or formed from a plurality of digital electronic integrated circuits. The memory may be implemented as Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory, one or more flash drives, universal serial bus (USB) connected memory units, magnetic storage, optical storage, magneto-optical storage, etc. or any combination thereof, for example. The memory may be operable to store digital representations as volatile memory, non-volatile memory, dynamic memory, etc. or any combination thereof.

The apparatus and system 10 are advantageously suitable for the context of consumer electronics, namely home, school, office and similar indoor environments. The display device 12 is typically manufactured as a single piece of equipment with embedded and integrated electronics, with its screen area in a 16:9 aspect ratio. Typical lengths of the screen area (horizontal dimension) might be 6', 8', 10' and 12', with respective heights typically not exceeding 8' when in low-ceiling rooms. However, in some embodiments larger custom built systems are designed to specification out of the base components for larger installations, such as public or outdoor spaces (described further below with reference to Figure 13).
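
As a worked check of these dimensions, the screen height implied by a 16:9 aspect ratio can be computed from each typical length. The short sketch below assumes that the stated lengths are the horizontal dimension of the screen area itself, excluding any housing; the function name screen_height and the constant CEILING_LIMIT_FT are hypothetical names chosen for the example.

```python
def screen_height(width_ft: float, aspect=(16, 9)) -> float:
    """Height of a screen area of the given width at the given aspect ratio."""
    aspect_w, aspect_h = aspect
    return width_ft * aspect_h / aspect_w


CEILING_LIMIT_FT = 8.0  # the 8' height limit mentioned for low-ceiling rooms

# Typical horizontal lengths named in the description, in feet.
heights = {width: screen_height(width) for width in (6, 8, 10, 12)}

for width, height in heights.items():
    fits = height <= CEILING_LIMIT_FT
    print(f"{width}' wide -> {height}' high, within 8' limit: {fits}")
```

Under this assumption the four typical lengths give heights of 3.375', 4.5', 5.625' and 6.75', all of which fall within the 8' limit.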

In some embodiments, the display device 12 includes a data port 20 in the display device's housing 28 through which the system receives external audio signals, such as from an external audio source 30. External audio signals 22 may be received by the system from any suitable audio source 30 such as a video projector, video player, game console, other media devices or the like for example. An external audio source 30 can be integrated into or otherwise pre-installed in the video projector, video device, gaming console, other media device or the like. The video projectors, media players, gaming consoles, other media devices or the like may be consumer grade electronics. In some embodiments, custom-made computational hardware built to spec, such as by using the same base elements, may be suitably employed for larger installations of the apparatus or system.

For example, customized installations of the apparatus and/or system 10 according to embodiments of the invention may be suitably employed at airports, casinos, hotels, museums, amusement parks and other large public spaces to create rich effects of audiovisual immersion (described further below with reference to Figure 13). In such manner, large scale video-based imagery can capitalize on the movement-spatial resources of an installation of the apparatus and/or system 10. This scenario is particularly suitable to a projector-based apparatus and/or system 10 according to embodiments of the invention (i.e. employing a projection screen 18) because video screens 16 are currently not typically manufactured to architectural scales.

The role of the data port 20 and related electronics is to receive synchronized spatially distributed audio signals 22 from the audio source 30 and to send each signal to the appropriate sound exciter 32 embedded in the display device 12. In the first embodiment, the integrated data and audio electronics for sound distribution handle distribution of the audio signals 22 to the two-dimensional array of sound exciters 32 internal to the display device 12.

The sound exciters 32 may be of any suitable type of audio emitter, including an electrodynamic speaker such as conventional electrodynamic, multi-cell flat diaphragm, or coaxial speaker; a flat panel speaker such as ribbon, ionic conduction, planar magnetic, magnetostatic, magnetostrictive, or electrostatic speaker; a plasma arc speaker; a piezoelectric speaker; a horn speaker; a bending wave speaker; an air motion speaker; a thermoacoustic speaker; a rotary woofer speaker; a digital speaker; other speaker type, and any combination thereof for example. The sound exciters may employ any suitable driver type, including a full-range driver, subwoofer, woofer, mid-range driver, tweeter, coaxial driver, other driver types, and any combination thereof for example.

While the sound exciters 32 are preferably internal to and hidden within the display device 12 for convenience and aesthetic purposes, in some embodiments the sound exciters 32 are visible from the exterior of the display device 12 and may be mounted externally to the display device 12, such as at the rear of the display device 12.

Referring to Figure 8, the screen 14, which may be a video screen 16 or a projection screen 18, presents visual images at its front side 34. Attached at the rear side 36 of the screen 14 are a number of the sound exciters 32. Wiring 38 makes electrical connections that are typically parallel electrical connections to each of the sound exciters 32. The sound exciters 32 are operable to vibrate the screen 14 so as to generate sound waves 40 emitted from the screen 14 at its front side 34.

Referring to Figure 9, in a variation an acoustic sheet 42 is attached to the screen 14, such as at its rear side 36, to enhance acoustic performance of the screen 14. The acoustic sheet 42 may be made of any suitable material, including for example a composite material. Examples of a suitable material include aluminum and graphene, although other materials and any combination thereof may be employed, for example.

Thus, the display device 12 may function as a visual media display and a substitute for stand-alone speakers. For example, the screen 14 may be part of a general entertainment system in which the screen surface acts as the sound production mechanism for a speakerless AV (audiovisual) system (whether for home or office). As a sonic display, the apparatus and/or system 10 according to embodiments of the invention can work with projected media, and can be integrated with a video screen 16 such as an ultra-thin OLED video display. The apparatus and system 10 according to embodiments of the invention can work with any media, e.g. games, video, DVD, mp3s, etc., simply by distributing mono or stereo audio channels to the grid of sound exciters 32 housed within the display device 12. Referring to Figure 10, embodiments of the invention ascribe audio files in tracks, sound files or virtual objects, for example, to a corresponding screen area 44.

Still referring to Figure 10, the screen 14 can be divided up conceptually into any number of screen areas 44. Typically, each screen area 44 corresponds to one sound exciter 32 (not visible in Figure 10) attached to the screen 14 directly behind the corresponding screen area 44. When a given visual image, such as a person, animal, filmed object, virtual object or other visual object having associated with it an audio track, audio file or other associated sound appears on the screen 14 at a corresponding screen area 44, an electronic representation of the associated sound is routed to the particular sound exciter 32 (Figures 3 and 5 to 9) physically located behind the corresponding area 44. Over time, such as during playback of an audiovisual work, the visual image may move from a first corresponding area 44 to another corresponding area 44 and the appropriate sound exciter 32 will be employed at each corresponding screen area 44 to acoustically emit the associated sound.
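By way of illustration only, the correspondence between a position on the screen 14 and a screen area 44 may be sketched as follows. The sketch assumes a rectangular grid of 8 columns by 4 rows (32 areas, numbered 1 to 32 row by row) and positions normalized to the range 0 to 1; the function name, the grid shape and the numbering order are assumptions made for the example and are not taken from the figures.

```python
def screen_area_channel(x, y, cols=8, rows=4):
    """Return the 1-based channel of the screen area containing (x, y).

    x and y are normalized screen coordinates in [0, 1], with (0, 0) at
    the top-left corner. The grid shape is an assumed example layout.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("position must lie within the screen")
    # Clamp so that x == 1.0 or y == 1.0 falls in the last column or row.
    col = min(int(x * cols), cols - 1)
    row = min(int(y * rows), rows - 1)
    return row * cols + col + 1


# A visual image moving left to right along the top row of areas.
for x in (0.05, 0.30, 0.55, 0.95):
    print(x, "->", screen_area_channel(x, 0.1))
```

As the visual image moves, calling the function on each new position yields the channel of the sound exciter 32 located behind the area in which the image currently appears.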

Referring back to Figure 1, the display device 12 is operable to receive a distributed audio signal 22 from the audio source device 30, which may be a video projector, media playback device, gaming console, other media device or the like. As shown in Figure 1, the display device 12 in some embodiments has only one data port 20, such that its screen surface is particularly suitable for projection-based media. In such embodiments, there would be a video projector (not shown in Figure 1) for projecting visual images onto the screen surface.

Referring back to Figure 2, the display device 12 is operable to receive a distributed audio signal from the audio source device 30, which may be a video projector, media playback device, gaming console, other media device or the like for example. As shown in Figure 2, the display device 12 in a variation of the first embodiment has two data ports 20 and 24, such as when its screen surface area is a video screen 16. In such variation, the system requires: (a) a spatially distributed audio signal 22; and (b) a video signal 26. In the embodiment shown in Figure 2, the media device 30 can be a gaming console, other source of computer-generated graphics, or a media playback device, for example.

Referring back to Figure 3, internal to the display device 12 is the two dimensional array of sound exciters 32 (indicated by filled circles) attached to the internal surface of the screen 14 as part of the integrated electronics.

A current prototype uses an array of thirty-two sound exciters 32 mounted to the back of a 4'x8' sheet of Alupanel (TM). A custom-built 32-channel amplifier is given thirty-two discrete analog signals 22 from an Orion (TM) 32 digital audio breakout box, which is connected by USB to a MacPro (TM) running various software. Other prototypical embodiments and variations thereof are within the scope contemplated by the present invention.

Referring back to Figures 4 and 5, internal to the display device 12 is the integrated electronics of a two dimensional array of sound exciters 32 (indicated by filled circles) and also: (a) the data port 20 for receiving the spatially distributed audio signal 22, (b) the DAC (digital-to-analog converter) 46, and (c) the multichannel audio amplifier 48 for distributing an amplified audio signal to the two dimensional array of sound exciters 32. While such components are preferably internal to and hidden within the display device 12 for convenience and aesthetic purposes, any one or more of such components can be made operable when visible from the exterior of the display device 12 and when disposed externally to the display device 12.

Referring back to Figure 6, internal to the display device 12 is wiring 38 making connections between various of (a) audio data port 20, (b) DAC (digital-to-analog converter) 46, (c) the multi-channel amplifier 48, and (d) the two dimensional array of sound exciters 32.

A memory (not shown) in accordance with some embodiments of the invention contains blocks of code comprising computer executable instructions for directing a processor (not shown) to parse and/or distribute audio channels of the spatially distributed audio signal 22 to the two-dimensional array of sound exciters 32 (indicated by filled circles), such as by transmitting the audio channels via the wires 38 shown in Figure 6 between the multi-channel amplifier 48 and the two-dimensional array of sound exciters 32.

Referring back to Figure 7, internal to the display device 12 is a data video port 24 for receiving a video signal 26 in embodiments in which the screen 14 is a video screen 16 instead of a surface for projection.

Referring to Figure 10, a component of the protocol for content preparation of games, other sources of computer-generated graphics, video, or other visual recordation media is to add a visual overlay 50 to the image to guide audio post-production engineers with a mapping of the two dimensional array of sound exciters 32 to the spatial distribution system embedded within the display device 12, thereby facilitating placement of audio during content production.

Embodiments of the present invention are suitable for use with any video production technique, including video production techniques for object-based media in which metadata is embedded into production files. Such embedded metadata may include metadata assigning an audio channel (e.g. one of the channels 1 to 32 shown in Figure 10) representing a screen area 44 to a corresponding co-located sound exciter 32 (Figures 5 to 7). Thus, in some embodiments channel assignments to screen areas 44 are accomplished by embedding metadata in audio files such that the metadata maps sounds to corresponding channels for co-located audio output.
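By way of example only, metadata of this kind might be read into a routing table as sketched below. The JSON layout and the field names "tracks", "name" and "channel" are hypothetical and are used solely for illustration; no particular metadata format is prescribed herein.

```python
import json


def routing_table(metadata_json, n_channels=32):
    """Build a {track name: channel} table from embedded metadata.

    The JSON schema used here is a hypothetical example, not a format
    defined by this specification.
    """
    meta = json.loads(metadata_json)
    table = {}
    for track in meta["tracks"]:
        channel = track["channel"]
        if not 1 <= channel <= n_channels:
            raise ValueError(f"channel {channel} is outside 1..{n_channels}")
        table[track["name"]] = channel
    return table


example = '{"tracks": [{"name": "dialogue", "channel": 12}, {"name": "door", "channel": 3}]}'
print(routing_table(example))
```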

Referring to Figure 11, the audio source 30 (Figures 1 and 2) which sends the spatially processed and distributed audio signal 22 (Figures 6 and 7) to the data port 20 (Figures 1 to 7) of the display device 12 typically has internal logic and signal routing capabilities for generating the audio signal 22. An exemplary method 52 is shown in Figure 11 in respect of three exemplary virtual game objects 54 (one of which is shown in Figure 10). Reading from top to bottom in respect of one particular game object 54, at step 56 a virtual game object 54 has its X-Y axis coordinates, i.e. its relative spatial position in the overall screen area 44, extracted. At step 58, logical operators (e.g. if X== && Y == then... ) assign the sound associated with the virtual object 54 to a channel gating and routing process of step 60, which sends the audio signal - now spatially mapped by having been assigned to a channel in the two dimensional array of sound exciters 32 - to a multi-channel mixer which at step 62 combines all spatially mapped virtual objects for final mixing processes before sending the combined audio signal 22 of all virtual objects 54 to the display device's data port 20. This overall method applies to each game object 54 in the virtual world to which sounds have been assigned.
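The steps of the method 52 may be sketched, by way of a minimal and non-limiting example, as follows. The sketch assumes normalized X-Y coordinates, an 8 by 4 grid of channels indexed from 0, and audio held as lists of floating-point samples; the class and function names are hypothetical, and the simple summation stands in for whatever final mixing processes are employed at step 62.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GameObject:
    name: str
    x: float              # normalized horizontal position, 0..1
    y: float              # normalized vertical position, 0..1
    samples: List[float]  # the sound associated with the object


def channel_for(x, y, cols=8, rows=4):
    """Step 58 (assumed form): map X-Y coordinates to a 0-based channel."""
    col = min(int(x * cols), cols - 1)
    row = min(int(y * rows), rows - 1)
    return row * cols + col


def mix_objects(objects, n_samples, cols=8, rows=4):
    """Steps 56 to 62: extract position, route to a channel, then mix."""
    channels = [[0.0] * n_samples for _ in range(cols * rows)]
    for obj in objects:
        ch = channel_for(obj.x, obj.y, cols, rows)       # steps 56 and 58
        for i, s in enumerate(obj.samples[:n_samples]):  # step 60: gate/route
            channels[ch][i] += s                         # step 62: mix
    return channels


objects = [
    GameObject("bird", 0.1, 0.1, [1.0, 1.0]),
    GameObject("wind", 0.1, 0.1, [0.5, 0.5]),
    GameObject("door", 0.9, 0.9, [0.25, 0.25]),
]
signal = mix_objects(objects, n_samples=2)
print(signal[0], signal[31])
```

In this sketch two objects occupying the same screen area are summed into the same channel, while an object elsewhere on the screen is routed to a different channel.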

While step 60 is referred to in Figure 11 by the term "channel gating and routing", step 60 in variations of embodiments is operable to employ any suitable routing technique for routing sounds associated with a game object to a co-located sound exciter 32. Such routing techniques include gating in which a game object appearing on the screen 14 at a location bounded by a given screen area 44 is routed to the particular sound exciter 32 associated with the given screen area 44. Such routing techniques include panning in which sounds moving from one channel and screen area 44 to an adjacent channel and screen area 44 are cross-faded by fading out the sound from the first co-located sound exciter 32 while fading up the sound in the adjacent co-located sound exciter 32 so as to provide a smooth sounding transition as the game object pans across the screen 14.
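A cross-fade of the kind used in panning may be sketched as follows. The equal-power (sine and cosine) gain curve is an assumption chosen for the example because it is a common audio practice; any other suitable fade curve may be employed, and the function names are hypothetical.

```python
import math


def crossfade_gains(t):
    """Return (fade_out_gain, fade_in_gain) for progress t in [0, 1].

    t = 0 places the sound wholly on the first sound exciter and t = 1
    wholly on the adjacent one. An equal-power curve is assumed here.
    """
    t = min(max(t, 0.0), 1.0)
    angle = t * math.pi / 2
    return math.cos(angle), math.sin(angle)


def pan_between(sample, t):
    """Split one sample between the first and the adjacent channel."""
    out_gain, in_gain = crossfade_gains(t)
    return sample * out_gain, sample * in_gain


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    first, adjacent = pan_between(1.0, t)
    print(f"t={t:.2f} first={first:.3f} adjacent={adjacent:.3f}")
```

With this choice of curve the sum of the squared gains remains constant during the transition, which tends to keep the perceived loudness steady as the game object crosses from one screen area 44 to the next.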

Typically, in the production phases of either video post-production or game design, content developers have preferably already treated the audio mix to the specifications of the system's spatial distribution protocol, so that the display device 12 merely has to receive an audio signal 22 that has already been spatially distributed. This data signal of distributed audio channels is received by the display device 12 and then internally sent to the internally housed two-dimensional array of sound exciters 32. The apparatus and system 10 according to embodiments of the invention are operable to make use of any available codec (coder-decoder) that would work, such as codecs associated with the Multichannel Wave (Waveform Audio File Format) PCM (Pulse-Code Modulation) format.
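For illustration only, interleaved multichannel PCM audio can be written and read back with the wave module of the Python standard library, as sketched below. The 16-bit sample width, the 48 kHz sample rate and the function names are assumptions made for the example; the sketch is not a statement of the format actually used by any embodiment.

```python
import io
import struct
import wave


def write_multichannel_wav(channels, sample_rate=48000):
    """Encode equal-length lists of 16-bit integer samples as WAV bytes.

    Samples are interleaved frame by frame: one sample from each channel
    in turn, which is how PCM WAV stores multichannel audio.
    """
    n_frames = len(channels[0])
    frames = bytearray()
    for i in range(n_frames):
        for ch in channels:
            frames += struct.pack("<h", ch[i])  # little-endian signed 16-bit
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(len(channels))
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()


def read_channel_count(wav_bytes):
    """Return (number of channels, number of frames) of encoded WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnchannels(), wav.getnframes()


# Thirty-two channels of three frames each; channel k carries the value k.
data = write_multichannel_wav([[k, k, k] for k in range(32)])
print(read_channel_count(data))
```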

The apparatus and system 10 according to embodiments of the invention advantageously provide video screens 16 and projection screens 18 that do not require external speakers, and which have excellent quality and co-location sound capability.

The apparatus and system 10 according to embodiments of the invention are advantageously suitable for use in home gaming, and other sources of computer-generated graphics, because embodiments of the invention work very well with vector-based imagery (i.e. animated and computer graphics based imagery), such as where a gaming console has a USB or other data output with 32+ channels of audio embedded in the data signal 22 (e.g. 2 WAVE PCM files, each of which can encode up to 18 channels of information). The game has preferably been pre-formatted by the game developer for spatially distributed audio output, by allowing the virtual sound objects in the game to map their sound effect components according to an XY grid which positions the sound in the corresponding screen location 44 (Figure 10). Embodiments of the invention advantageously work with any video game or other source of computer-generated graphics having a simple coding modification. This scenario works equally well with either projector-based or OLED technology, for example.

The apparatus and system according to embodiments of the invention are advantageously suitable for use with home video, such as where a video streaming box, such as an Apple (TM) TV or Chromecast (TM), has a USB or other data port which could in principle output 32+ channels of audio embedded in the data signal 22 (e.g. 2 WAVE PCM files, each of which can encode up to 18 channels of information). The video media has preferably been pre-formatted by the postproduction company for spatially distributed audio output, by allowing individual audio tracks of a multichannel sound mix to be allocated according to the XY grid of sound exciters 32, which position the sound in the corresponding screen location 44 (Figure 10). This scenario works equally well with either projector-based or OLED technology, for example.

The screen 14 according to variations of the first embodiment can also complement traditional surround sound audio formats, because its spatialization is restricted to the screen 14 and its image area. Thus, such variations of the first embodiment and traditional surround sound audio are not mutually exclusive, and can be combined for simultaneous operation, such as by having a first set of audio channels for co-location according to embodiments of the present invention and having a second set of audio channels directed to surround-sound speakers appropriately located separately from the display device 12.

Referring to Figure 12, the apparatus and system 10 according to embodiments of the invention are advantageously suitable for work-based telepresence, such as for teleconferencing of multiple speakers (i.e. individuals) present on the same video screen 16, thereby achieving overall media transparency and immersion enhanced by tying the speakers' voices to their position on the video screen 16. Teleconferencing systems 10 according to embodiments of the invention would advantageously achieve better transparency and telepresence with spatially distributed audio attached to the remote participants. This scenario works equally well with either projector-based or OLED technology, for example. Teleconferencing in some embodiments requires a suitable audio breakout box (similar to those described above, e.g. console or streaming device, such as the audio source 30 shown in Figures 1 and 2) so that the signal is preformatted for display by the screen 14 of the apparatus and/or system 10 according to embodiments of the invention.

The apparatus and system 10 according to embodiments of the invention are advantageously suitable for use in a real-time communications environment which combines audiovisual feeds with multi-channel communications to take advantage of the apparatus and/or system 10 for the enhanced reaction times afforded by the use of sound to draw attention to spatialized visual information. An example of this scenario might be real-time drone footage of a special operations raid in which sounds are used to distinguish enemy combatants and draw command room personnel's attention more quickly to tagged events and their screen areas through the use of sound. Such a system could advantageously enhance reaction and communication times of mission critical decision making. This scenario works equally well with either projector-based or OLED technology, for example.

Still referring to Figure 12, the media-rich workplace environment with high information densities across multiple screens 14, such as can be found in process control and command-and-control contexts, is particularly suitable for embodiments of the apparatus and system 10 in which sound emanates by sound waves 40 from the screen areas 44 (Figure 10) across multiple screens 14, tied to various kinds of information in the mediated content of the display devices 12.

The apparatus and system 10 according to embodiments of the invention are advantageously suitable for use in a multi-stream video interface, such as where multiple streams of video are positioned in a grid across the screen 14 according to embodiments of the invention. For example, multiple channels from a cable television program guide can be made available for viewing on the screen. A user can momentarily 'tune in' to the audio component of the represented video stream, so that the sound waves 40 come from the corresponding area 44 (Figure 10) of the screen 14. Once selected, the screen display would expand to a full-size window and the display device 12 becomes a sonic display device as described herein above. This scenario works equally well with either projector-based or OLED technology, for example.

Referring to Figures 12 and 13, the apparatus and system 10 according to embodiments of the invention are advantageously suitable for the creative and performing arts. For example, there are numerous creative applications, for galleries and performance spaces, that can utilize the features of the apparatus and/or system 10 according to embodiments of the invention. Any kind of media application with moving image and sound can be processed and presented on the apparatus and/or system according to embodiments of the invention. This scenario works equally well with either projector-based or OLED technology, for example.

Referring to Figure 13, the apparatus and system 10 according to embodiments of the invention are advantageously suitable for large scale audiovisual display, which may be an array of screens 14, to create a more immersive experience between moving image and sound for the spectators. Examples of large public spaces suitable for creating audiovisual immersion include museums, amusement parks, theme parks, escape rooms, virtual showrooms, virtual arcades, retail spaces, and other large outdoor or indoor public spaces where the apparatus and system 10 can be customized with respect to scale.

The apparatus and system 10 according to embodiments of the invention are advantageously suitable for education and simulation. By way of example only, a specialized apparatus and/or system 10 according to embodiments of the invention could be developed for education scenarios, e.g. teaching students physics and acoustics or other subject matter where animated content might be used for more effective pedagogy. In general, the apparatus and system 10 according to embodiments of the invention are particularly suited to displaying animated (i.e. vector-based) content. By way of further examples, teachers could incorporate other tools that help in the classroom, e.g. live drawing and whiteboard type interactions. This scenario works equally well with either projector-based or OLED technology, for example.

While the exemplary screens 14 in Figures 1 to 13 are shown as flat screens, the screen 14 in some embodiments is curvilinear. Such curvilinear screens 14 have screen areas 44 that wrap around with the curvature of the screen 14, generally without impeding the functionality of the present invention to co-locate visual images and their associated sounds.

Thus there is provided a system for co-locating visual images and associated sound, the system comprising: (a) a screen for displaying visual images, the screen having a display surface upon which the visual images appear at corresponding areas of the display surface, respectively; and (b) a plurality of sound exciters for vibrating the display surface to produce sound emitting from said corresponding areas of the display surface, respectively.

The system advantageously provides co-location, which means the perception that a sound emanates from the image source.

Furthermore, the system may include: (c) a processor for directing to the plurality of sound exciters a spatially distributed audio signal comprising a set of audio channels, each said audio channel corresponding to a respective sound exciter of the plurality of sound exciters, the system being operable to execute codes for directing the processor.

Embodiments of the invention advantageously associate audio and video signals so that sounds are perceived to emanate from the screen surface associated with the visual cue for the sound, which is co-location. Embodiments of the invention advantageously allow for customized large scale immersive displays to be created which co-locate moving visual images (e.g. video) and sound (e.g. their associated sound effects), and integrate well with other existing audiovisual postproduction technologies and workflows, such as soundtrack mixing, game design, or other sources of computer-generated graphics.

Second Embodiment

Still referring to Figure 13, the apparatus and system 10 according to a second embodiment is operable to present visual images to the spectators via virtual reality (VR) headsets, augmented reality (AR) eyeglasses, or mobile screens. In such an embodiment, the large display is made of an array of sound panels 64 housing the sound exciters 32. Typically, the sound exciters 32 are embedded in and hidden by the outer surface of the sound panel 64, and are operable to vibrate the outer surface so that co-located sound is emitted from the outer surface, the co-located sound being associated with the visual images presented to the spectators by VR, AR, mobile screen or other portable visual media technology.

While Figure 13 shows a flat array of audiovisual (first embodiment) and/or sound panels 64 (second embodiment), in general the array may have any suitable shape (e.g. curvilinear). The apparatus and system 10 of Figure 13 may be of any suitable size, for example.

Referring to Figures 14 and 15, another exemplary application would be immersive simulations, where virtual environments are made more realistic for training purposes by suitable use of the apparatus and/or system 10 according to embodiments of the invention. The apparatus and system 10 according to embodiments of the invention are advantageously suitable for scientific visualization, for example.

The apparatus and system 10 according to embodiments of the invention are advantageously suitable for distributed remote environments, such as where a worker deep in a tunnel of a dam or mine shaft makes use of an audio element 32 built into the surroundings that affords two-way communication between the worker and other personnel. As the worker moves through space, the conversation follows, such as by having an embodiment of the invention track their bodily position with sensors and assign sound to the appropriate sound exciter 32 based on the movement of the worker through the space. Such a communication system implemented by an embodiment of the invention may be particularly suitable where wireless signals cannot be transmitted. In such embodiments, the apparatus and/or system 10 can operate as a purely audio display, though there could be 'waystations' positioned at various points where there is a live video feed as well. The idea is to convert the surrounding space itself into a two-way communication channel, since engineering-wise any speaker can be made to work like a microphone and vice versa. In such embodiments, the apparatus and/or system 10 could act as both a microphone and a speaker, with the addition of an analog-to-digital converter (ADC) associated with the microphone and a digital-to-analog converter (DAC) associated with the speaker, allowing for communication in extreme, distributed and/or remote environments.

Referring to Figure 14, an enclosure 66 is suitable for use with the apparatus and system 10 in accordance with a second embodiment of the invention. Within the enclosure 66 is created an immersive environment on the basis of virtual reality (VR), augmented reality (AR), augmented virtuality (AV), Mixed Reality (MR), other immersive environment technologies, and any combination thereof for example. The enclosure 66 according to the second embodiment is useful as an enhanced VR booth as part of an enhanced VR arcade, for example. In the enclosure 66, a user typically wears a VR headset providing visual images that have associated therewith sounds appearing to originate from apparent locations within the enclosure 66. In some embodiments, a mobile screen or eyewear glasses are employed to provide the visual images. The visual images may be real time recordings of the real world (i.e. camera vision), virtual objects in an animated or otherwise imaginary world, and any combination thereof for example. In some embodiments, an imaginary world is created for the VR user, while visual and/or auditory warnings are interjected when the VR user is proximate to real-world objects such as physical walls, etc.

The enclosure 66 includes walls 68 that have any suitable number of sound exciters 32 attached thereto, thus creating sound walls 68. While Figure 14 shows walls 68, in general the sound exciters 32 may be disposed in any suitable manner to surround a defined volume of space. For example, the enclosure 66 may be indoors or outdoors. Additionally or alternatively, the sound exciters 32 can be mounted on physical wall 68 panels or suspended from a ceiling or other overhead structure, attached to posts, or otherwise disposed at defined locations relative to the immersive environment. Typically, the sound exciters 32 are located at the periphery of the immersive environment or at other locations intended to minimize accidental physical contact between VR users and the sound exciters 32.

Referring to Figure 15, the immersive environment created within the enclosure 66 can be divided up into any suitable number of three-dimensional zones 70 to which apparent locations of associated sounds can be mapped. For example, as shown in Figure 10, a visual image of a bird flying around, as seen by the user wearing a VR headset, will be associated with chirping sounds emanating from apparent locations within identifiable zones 70 defined within the three-dimensional immersive environment of the enclosure 66. The sound exciters 32 attached to the sound wall 68 are operated to give the effect of the chirping sound originating from the three-dimensional zone 70 where the bird is located as it appears to the VR user viewing the display screen of the VR headset, thus co-locating the image of the bird and its chirping sound.

Various audio mixing techniques can be employed to give the effect of sound, emitted by the sound exciters 32 that are attached to the sound wall 68, originating from particular zone(s) 70 associated with the apparent locations. For example, a sound whose apparent location is farther away from the VR user's current location can be quieter (i.e. have a lower audio volume), have more reverberation, have a longer-time echo, and have reduced or no high-frequency spectral content. In contrast, a sound whose apparent location is closer to the VR user's current location can be louder (i.e. have a higher audio volume), have less or no reverberation, have less or no echo, and be spectrally brighter (i.e. include or emphasize higher frequency spectral content).
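The distance cues described above may be sketched, purely as an example, by deriving mixing parameters from the distance between the VR user and the apparent location. The inverse-distance gain law, the linear interpolation, the 2 kHz and 20 kHz cutoff limits and all names are assumptions chosen for the sketch; the only property taken from the description is that farther sounds are quieter, more reverberant and spectrally duller.

```python
def distance_cues(distance, ref=1.0, max_distance=10.0):
    """Return assumed mixing parameters for a sound at the given distance.

    gain       : 1.0 at or inside the reference distance, falling as 1/d
    reverb_mix : 0.0 (dry) when near, rising to 1.0 at max_distance
    lowpass_hz : 20 kHz (bright) when near, falling to 2 kHz when far
    """
    d = max(distance, ref)
    gain = ref / d
    frac = min((d - ref) / (max_distance - ref), 1.0)
    lowpass_hz = 20000.0 - frac * (20000.0 - 2000.0)
    return {"gain": gain, "reverb_mix": frac, "lowpass_hz": lowpass_hz}


for d in (0.5, 1.0, 5.5, 10.0):
    print(d, distance_cues(d))
```

The resulting parameters would then be applied to the associated sound by whatever gain, reverberation and filtering stages the mixing system provides.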

Thus, the apparatus and system 10 according to the second embodiment is operable to display to a VR user a visual image having associated therewith an apparent location within the immersive environment; map the apparent location to a corresponding three-dimensional zone 70 in the enclosure 66; map its associated sound to a selection of one or more sound exciters 32 on the sound wall 68 based on the apparent location of the visual image and relative to the current location of the VR user; and emit the associated sound from the selected sound exciter(s) 32 with appropriate application of volume, reverb, echo, and/or spectral filtering to produce the effect of the associated sound originating from the corresponding zone 70; and continue to do so in real time for each successive set of visual images.

The zones 70 may be of any suitable size, and may be defined by a range of sets of X, Y, and Z coordinates within the immersive environment of the enclosure 66. In some embodiments, each zone 70 is defined by a single set of such X, Y, and Z coordinates for a single three-dimensional point within the immersive environment. Mapping the apparent locations of visual images to corresponding zones 70 advantageously facilitates selecting the most appropriate sound exciter(s) 32 for emitting the visual images' associated sounds. For example, the X and Y coordinates may be employed to select a sound exciter 32 located at a particular location along the planar surface of one of the walls 68, while the Z coordinate gives the distance from the walls 68 for the apparent location of the associated sound. It should be noted that the Z coordinate can have a value of nil (or equivalent) when the apparent location coincides with the X-and-Y location on the sound wall 68 where the sound is being emitted by the selected sound exciter 32.
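A minimal sketch of this mapping, under stated assumptions, is given below. It assumes cubic zones of equal size, a single planar sound wall 68 lying in the X-Y plane, and selection of the single nearest sound exciter 32; the function names and the dictionary of exciter positions are hypothetical.

```python
import math


def zone_of(x, y, z, zone_size=1.0):
    """Return the (i, j, k) index of the cubic zone containing the point."""
    return (int(x // zone_size), int(y // zone_size), int(z // zone_size))


def select_exciter(x, y, exciters):
    """Return the id of the exciter whose wall position is nearest (x, y).

    exciters maps an exciter id to its (x, y) position on the wall.
    """
    return min(
        exciters,
        key=lambda k: math.hypot(exciters[k][0] - x, exciters[k][1] - y),
    )


def locate(x, y, z, exciters, zone_size=1.0):
    """Map an apparent location to a zone, an exciter and a wall distance."""
    return {
        "zone": zone_of(x, y, z, zone_size),
        "exciter": select_exciter(x, y, exciters),
        "distance_from_wall": z,  # nil when the sound sits on the wall itself
    }


wall = {1: (0.5, 0.5), 2: (1.5, 0.5)}
print(locate(1.4, 0.6, 2.0, wall))
```

The distance from the wall returned by such a mapping can in turn be used to choose the volume, reverberation and filtering applied to the associated sound.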

While in some embodiments, the sound exciters 32 are speakers attached to the walls 68, the sound exciters 32 in accordance with the second embodiment are operable to vibrate the sound walls 68 such that the associated sounds are emitted from the surface of the sound walls 68. While Figures 14 and 15 show the sound walls 68 as having planar surfaces, in general the sound walls 68 may have any suitable shape (e.g. curvilinear). For example, in some VR arcades each VR booth includes only one continuous sound wall 68 encircling the immersive environment defined by the VR booth. The location of each sound exciter 32 on a given sound wall 68 can be suitably defined by X and Y coordinates regardless of the overall shape of the given sound wall 68.

Referring to Figure 16, a method of generating a three-dimensional spatially distributed audio signal according to the second embodiment is shown generally at 72. The exemplary method 72 is shown in respect of three virtual objects 74 created for display within an immersive environment.

Generally, the method 72 is similar to the method 52 of Figure 11, except that the spatially distributed audio signal provides location information in three-dimensions. For example, at step 76 the given virtual object 74 (one of which is shown in Figure 10) has its X-Y-Z coordinates (i.e. its apparent location within the immersive environment) extracted. One channel is generated for each virtual object 74, and the channels are combined at step 78 to form the three-dimensional spatially distributed audio signal employed by the apparatus and system 10 at the sound walls 68 (Figures 14 and 15).
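As a non-limiting sketch, steps 76 and 78 may be represented by generating one channel record per virtual object 74 and collecting the records into a single signal structure. The record layout (a channel index, the X-Y-Z coordinates and the samples) and all names are assumptions made for the example and do not define the format of the three-dimensional spatially distributed audio signal.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VirtualObject:
    name: str
    position: Tuple[float, float, float]  # apparent X-Y-Z location
    samples: List[float]                  # the object's associated sound


def build_spatial_signal(objects):
    """Step 76: extract each object's X-Y-Z coordinates.
    Step 78: combine one channel per object into a single signal."""
    return [
        {"channel": i, "xyz": obj.position, "samples": list(obj.samples)}
        for i, obj in enumerate(objects)
    ]


objects = [
    VirtualObject("bird", (1.0, 2.0, 0.5), [0.1, 0.2]),
    VirtualObject("stream", (0.0, 0.0, 3.0), [0.3, 0.3]),
    VirtualObject("footsteps", (2.5, 0.0, 1.0), [0.0, 0.4]),
]
for record in build_spatial_signal(objects):
    print(record)
```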

While Figure 16 references the exemplary virtual objects 74, in general the method 72 is applicable to any visual images, including virtual-world images, virtual-world game objects, real-world images, real-world images in real-time (e.g. real-time camera vision), graphical objects, textual objects, foreground images, background imagery, and any combination thereof for example.

Thus, there is provided a system for co-locating visual images and associated sounds. The system includes: (a) a screen for displaying the visual images, the screen having a display surface upon which the visual images appear at corresponding areas of the display surface, the visual images having associated therewith apparent locations of the visual images, respectively, the apparent locations being selected from the group consisting of: corresponding zones defined within an immersive environment surrounding the screen, and the corresponding areas of the display surface; and (b) a plurality of sound exciters for producing the associated sounds so as to render a location effect of the associated sounds originating from the apparent locations of the visual images, respectively.

While embodiments of the invention have been described and illustrated, such embodiments should be considered illustrative of the invention only. The invention may include variants not described or illustrated herein in detail. Thus, the embodiments described and illustrated herein should not be considered to limit the invention as construed in accordance with the accompanying claims.