WO2020223487 - VOLUMETRIC CAPTURE OF OBJECTS WITH A SINGLE RGBD CAMERA


WHAT IS CLAIMED IS:

1. A method for generating an image comprising:

receiving a first image including color data and depth data;

determining a viewpoint associated with an augmented reality (AR) and/or virtual reality (VR) display displaying a second image;

receiving at least one calibration image including an object in the first image, the object being in a different pose as compared to a pose of the object in the first image; and

generating the second image based on the first image, the viewpoint and the at least one calibration image.

2. The method of claim 1, wherein the first image is received from a single camera configured to capture the color data as red, green, blue (RGB) data and at least one of capture the depth data and generate the depth data based on the color data.

3. The method of any of claims 1 and 2, wherein the viewpoint associated with the AR and/or VR display is different than a viewpoint associated with the first image.

4. The method of any of claims 1 to 3, wherein the at least one calibration image is a silhouette image of the object.
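
Claim 4's silhouette can be derived directly from the depth channel. Below is a minimal sketch, assuming a metric depth map and hand-picked near/far thresholds; the claims do not specify how the silhouette image is produced.

```python
import numpy as np

def silhouette_from_depth(depth, near=0.3, far=2.0):
    """Binary silhouette: pixels whose depth lies inside a foreground band.

    near/far are illustrative thresholds in metres, not values from the claims.
    """
    mask = (depth > near) & (depth < far)
    return mask.astype(np.uint8) * 255

# Toy example: a 4x4 depth map with a 2x2 foreground block at 1 m.
depth = np.full((4, 4), 5.0)
depth[1:3, 1:3] = 1.0
print(silhouette_from_depth(depth))
```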

5. The method of any of claims 1 to 4, wherein the generating of the second image includes,

determining a target pose of the object by mapping two dimensional (2D) keypoints to corresponding three dimensional (3D) points of depth data associated with the at least one calibration image, and

generating the second image by warping the object in the at least one calibration image using a convolutional neural network that takes the at least one calibration image and the target pose of the object as input.
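
The 2D-to-3D mapping recited in claim 5 amounts to back-projecting keypoints through the depth map. A minimal sketch follows, assuming a pinhole camera with known intrinsics (fx, fy, cx, cy); the claims do not mandate any particular camera model.

```python
import numpy as np

def backproject_keypoints(kps_2d, depth, fx, fy, cx, cy):
    """Lift 2D keypoints (u, v) to 3D camera-space points via the depth map.

    Standard pinhole back-projection; intrinsics are assumed known here.
    """
    pts_3d = []
    for u, v in kps_2d:
        z = depth[int(v), int(u)]      # depth sample at the keypoint
        x = (u - cx) * z / fx          # horizontal ray scaled by depth
        y = (v - cy) * z / fy          # vertical ray scaled by depth
        pts_3d.append((x, y, z))
    return np.array(pts_3d)
```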

6. The method of any of claims 1 to 5, wherein the generating of the second image includes,

generating at least one part-mask in a first pass of a convolutional neural network having the at least one calibration image as an input,

generating at least one part-image in the first pass of the convolutional neural network, and

generating the second image in a second pass of the convolutional neural network having the at least one part-mask and the at least one part-image as input.
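
The two-pass generation of claim 6 can be pictured as two small networks: the first predicts per-part masks and per-part images from the calibration input, and the second fuses them into the second image. The PyTorch sketch below is a toy; the part count K, the channel widths, and the 3-channel pose encoding are all invented for illustration.

```python
import torch
import torch.nn as nn

K = 8  # assumed number of parts; the claim only requires "at least one"

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class PassOne(nn.Module):
    """First pass: part-masks and part-images from calibration image + pose."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(conv_block(6, 32), conv_block(32, 32))
        self.masks = nn.Conv2d(32, K, 1)      # one mask logit per part
        self.parts = nn.Conv2d(32, K * 3, 1)  # one RGB image per part

    def forward(self, calib_and_pose):
        h = self.body(calib_and_pose)
        return torch.sigmoid(self.masks(h)), torch.sigmoid(self.parts(h))

class PassTwo(nn.Module):
    """Second pass: fuses part-masks and part-images into the second image."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(conv_block(K + K * 3, 32), nn.Conv2d(32, 3, 1))

    def forward(self, masks, parts):
        return torch.sigmoid(self.body(torch.cat([masks, parts], dim=1)))

calib_and_pose = torch.randn(1, 6, 64, 64)   # RGB calibration image + pose map
masks, parts = PassOne()(calib_and_pose)
second_image = PassTwo()(masks, parts)       # shape (1, 3, 64, 64)
```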

7. The method of any of claims 1 to 6, wherein the generating of the second image includes using two passes of a convolutional neural network that is trained by minimizing at least two losses associated with warping the object.
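
One plausible reading of the "at least two losses" in claim 7 is an L1 reconstruction loss on the warped image plus a cross-entropy loss on the predicted silhouette; the claims name neither, so this pairing is an assumption.

```python
import torch.nn.functional as F

def warping_losses(pred_image, gt_image, pred_mask, gt_mask):
    """Combined training loss: L1 on pixels + BCE on the silhouette mask.

    pred_mask/gt_mask are expected in [0, 1] (e.g. post-sigmoid).
    """
    return F.l1_loss(pred_image, gt_image) + F.binary_cross_entropy(pred_mask, gt_mask)
```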

8. The method of any of claims 1 to 7, wherein the second image is blended using a neural network to generate missing portions of the second image.

9. The method of any of claims 1 to 8, wherein the second image is a silhouette image of the object, the method further comprising merging the second image with a background image.
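
Claims 8 and 9 together describe filling in missing portions of the generated image and compositing the silhouetted result over a background. The merge step reduces to alpha compositing, sketched below; the learned blending network of claim 8 is out of scope for this snippet.

```python
import numpy as np

def merge_with_background(second_image, silhouette, background):
    """Alpha-composite the generated object over a background.

    silhouette is a uint8 matte in [0, 255]; images are HxWx3 arrays.
    """
    alpha = silhouette.astype(np.float32)[..., None] / 255.0
    return (alpha * second_image + (1.0 - alpha) * background).astype(np.uint8)
```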

10. The method of any of claims 1 to 9, further comprising:

a pre-processing stage in which a plurality of images are captured while the pose of the object is changed;

storing the plurality of images as the at least one calibration image;

generating a similarity score for each of the at least one calibration image based on a target pose; and

selecting the at least one calibration image from the stored plurality of images based on the similarity score.
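
The scoring-and-selection step of claim 10 can be sketched with a simple pose-distance metric; negative mean keypoint distance stands in below for the unspecified similarity score.

```python
import numpy as np

def select_calibration_image(calib_poses, calib_images, target_pose):
    """Pick the stored calibration image whose pose best matches the target.

    Poses are (N, 3) keypoint arrays; the distance metric is an assumption.
    """
    scores = [-np.linalg.norm(p - target_pose, axis=1).mean() for p in calib_poses]
    best = int(np.argmax(scores))
    return calib_images[best], scores[best]
```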

11. The method of any of claims 1 to 10, further comprising:

a pre-processing stage in which a plurality of images are captured while the pose of the object is changed;

storing the plurality of images as the at least one calibration image;

capturing an image during a communications event, the image including the object in a new pose; and

adding the image to the stored plurality of images.
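
Claim 11 grows the calibration set at run time: images captured during a communications event join the pre-processing captures. A minimal in-memory sketch, assuming poses are stored alongside images:

```python
class CalibrationStore:
    """Holds calibration images and their poses; new captures are appended."""
    def __init__(self):
        self.images, self.poses = [], []

    def add(self, image, pose):
        self.images.append(image)
        self.poses.append(pose)
```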

12. A non-transitory computer-readable storage medium having stored thereon computer executable program code which, when executed on a computer system, causes the computer system to perform a method comprising:

receiving a first image including color data and depth data;

determining a viewpoint associated with an augmented reality (AR) and/or virtual reality (VR) display displaying a second image;

receiving at least one calibration image including an object in the first image, the object being in a different pose as compared to a pose of the object in the first image; and

generating the second image based on the first image, the viewpoint and the at least one calibration image.

13. The non-transitory computer-readable storage medium of claim 12, wherein the first image is received from a single sensor configured to capture the color data as red, green, blue (RGB) data and at least one of capture the depth data and generate the depth data based on the color data.

14. The non-transitory computer-readable storage medium of any of claims 12 and 13, wherein the generating of the second image includes:

determining a target pose of the object by mapping two dimensional (2D) keypoints to corresponding three dimensional (3D) points of depth data associated with the at least one calibration image, and

generating the second image by warping the object in the at least one calibration image using a convolutional neural network that takes the at least one calibration image and the target pose of the object as input.

15. The non-transitory computer-readable storage medium of any of claims 12 to 14, wherein the generating of the second image includes:

generating at least one part-mask in a first pass of a convolutional neural network having the at least one calibration image as an input,

generating at least one part-image in the first pass of the convolutional neural network, and

generating the second image in a second pass of the convolutional neural network having the at least one part-mask and the at least one part-image as input.

16. The non-transitory computer-readable storage medium of any of claims 12 to 15, wherein the second image is blended using a neural network to generate missing portions of the second image.

17. The non-transitory computer-readable storage medium of any of claims 12 to 16, wherein the second image is a silhouette image of the object, the method further comprising merging the second image with a background image.

18. The non-transitory computer-readable storage medium of any of claims 12 to 17, the method further comprising:

a pre-processing stage in which a plurality of images are captured while the pose of the object is changed;

storing the plurality of images as the at least one calibration image;

generating a similarity score for each of the at least one calibration image based on a target pose; and

selecting the at least one calibration image from the stored plurality of images based on the similarity score.

19. The non-transitory computer-readable storage medium of any of claims 12 to 18, the method further comprising:

a pre-processing stage in which a plurality of images are captured while the pose of the object is changed;

storing the plurality of images as the at least one calibration image;

capturing an image during a communications event, the image including the object in a new pose; and

adding the image to the stored plurality of images.

20. An augmented reality (AR) and/or virtual reality (VR) system comprising:

a sensor configured to capture color data and depth data; and

a processor configured to:

receive a first image from the sensor,

receive a viewpoint from an AR and/or VR display displaying a second image,

receive at least one calibration image including an object in the first image, the object being in a different pose as compared to a pose of the object in the first image, and

generate the second image based on the first image, the viewpoint and the at least one calibration image.