(WO2019032305) ITEM PUT AND TAKE DETECTION USING IMAGE RECOGNITION
Note: Text based on automatic optical character recognition processes. Only the PDF version has legal value.

CLAIMS

1. A system for tracking puts and takes of inventory items by subjects in an area of real space, comprising:

a plurality of cameras, cameras in the plurality of cameras producing respective sequences of images of corresponding fields of view in the real space, the field of view of each camera overlapping with the field of view of at least one other camera in the plurality of cameras;

a processing system coupled to the plurality of cameras, the processing system including a plurality of image recognition engines, receiving corresponding sequences of images from the plurality of cameras, image recognition engines in the plurality of image recognition engines processing the images in the corresponding sequences to identify subjects represented in the images; and

logic to process sets of images in the sequences of images that include the identified subjects to detect takes of inventory items by identified subjects and puts of inventory items on shelves by identified subjects.
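For illustration only (this sketch is not part of the claims, and every name in it is invented): a minimal Python outline of the arrangement recited in claim 1, in which each camera's image sequence is processed by its own image recognition engine to identify subjects, and downstream logic examines frames containing identified subjects for takes and puts.

```python
from typing import Dict, List

class ImageRecognitionEngine:
    """One hypothetical engine per camera; returns identifiers of subjects seen in a frame."""
    def identify_subjects(self, frame) -> List[int]:
        return []  # stand-in for a trained model's output

def detect_puts_and_takes(frame, subjects) -> List[dict]:
    return []  # placeholder for the detection logic recited in claims 2-6

def track(camera_frames: Dict[int, List], engines: Dict[int, ImageRecognitionEngine]) -> List[dict]:
    """Route each camera's image sequence through its engine, then pass frames
    containing identified subjects to the put/take detection logic."""
    events: List[dict] = []
    for cam_id, frames in camera_frames.items():
        for frame in frames:
            subjects = engines[cam_id].identify_subjects(frame)
            if subjects:
                events.extend(detect_puts_and_takes(frame, subjects))
    return events
```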

2. The system of claim 1, wherein the logic to process sets of images includes:

for identified subjects, logic to process images to generate classifications of the images of the identified subjects, the classifications including whether the identified subject is holding an inventory item, a first nearness classification indicating a location of a hand of the identified subject relative to a shelf, a second nearness classification indicating a location of a hand of the identified subject relative to a body of the identified subject, a third nearness classification indicating a location of a hand of the identified subject relative to a basket associated with an identified subject, and an identifier of a likely inventory item.
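As a hedged illustration of the classification outputs recited in claim 2 (all type and field names here are hypothetical, not taken from the patent), the per-image result might be carried in a structure like this:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Nearness(Enum):
    NEAR = "near"
    FAR = "far"

@dataclass
class HandClassification:
    holding_item: bool        # whether the identified subject is holding an inventory item
    near_shelf: Nearness      # first nearness: hand location relative to a shelf
    near_body: Nearness       # second nearness: hand location relative to the subject's body
    near_basket: Nearness     # third nearness: hand location relative to an associated basket
    likely_item_id: Optional[str] = None  # identifier of a likely inventory item
```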

3. The system of claim 2, including logic to perform time sequence analysis over the classifications of images to detect said takes and said puts by the identified subjects.
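A minimal sketch of the time sequence analysis recited in claim 3, reusing the hypothetical HandClassification and Nearness types from the previous sketch. The transition rules (empty hand near a shelf becoming a holding hand near the body signals a take, and the reverse a put) are invented for illustration; the patent claims do not specify them.

```python
from typing import Iterable, List, Optional, Tuple

def detect_events(frames: Iterable[HandClassification]) -> List[Tuple[str, Optional[str]]]:
    """Scan per-frame classifications in time order and emit ('take', item) or
    ('put', item) events on the assumed transitions described above."""
    events: List[Tuple[str, Optional[str]]] = []
    prev = None
    for cur in frames:
        if prev is not None:
            if (not prev.holding_item and prev.near_shelf is Nearness.NEAR
                    and cur.holding_item and cur.near_body is Nearness.NEAR):
                events.append(("take", cur.likely_item_id))
            elif (prev.holding_item and prev.near_shelf is Nearness.NEAR
                    and not cur.holding_item):
                events.append(("put", prev.likely_item_id))
        prev = cur
    return events
```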

4. The system of claim 1, wherein the logic to process sets of images includes:

for identified subjects, logic to identify bounding boxes of data representing hands in images in the sets of images of the identified subjects, and to process data in the bounding boxes to generate classifications of data within the bounding boxes for the identified subjects.

5. The system of claim 4, wherein the classifications include whether the identified subject is holding an inventory item, a first nearness classification indicating a location of a hand of the identified subject relative to a shelf, a second nearness classification indicating a location of a hand of the identified subject relative to a body of the identified subject, a third nearness classification indicating a location of a hand of the identified subject relative to a basket associated with an identified subject, and an identifier of a likely inventory item.

6. The system of claim 4, including logic to perform time sequence analysis over the classifications of data within the bounding boxes in the sets of images to detect said takes and said puts by the identified subjects.

7. The system of claim 1, including circular buffers coupled to cameras in the plurality of cameras to store sets of images in the sequences of images from the plurality of cameras.
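Claim 7's circular buffers can be illustrated with a short, assumption-laden sketch: one fixed-capacity ring buffer per camera, where the oldest frame is overwritten as new frames arrive. The FrameRingBuffer name and the capacity are invented for this example.

```python
from collections import deque

class FrameRingBuffer:
    """Fixed-capacity buffer of the most recent frames from one camera;
    the oldest frame is dropped automatically when capacity is reached."""
    def __init__(self, capacity: int = 64):
        self._frames = deque(maxlen=capacity)

    def push(self, frame) -> None:
        self._frames.append(frame)

    def latest(self, n: int) -> list:
        """Return up to the n most recent frames, oldest first."""
        return list(self._frames)[-n:]

# One buffer coupled to each camera in the plurality of cameras.
buffers = {camera_id: FrameRingBuffer(capacity=64) for camera_id in range(16)}
```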

8. The system of claim 1, wherein the logic to process sets of images comprises convolutional neural networks.

9. The system of claim 1, wherein cameras in the plurality of cameras are configured to generate synchronized sequences of images.

10. The system of claim 1, wherein the plurality of cameras comprise cameras disposed over and having fields of view encompassing respective parts of the area in real space.

11. The system of claim 1, including logic responsive to the detected takes and puts, to generate a log data structure including a list of inventory items for each identified subject.
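A hypothetical rendering of the claim 11 log data structure: a per-subject list of inventory items, updated in response to detected takes and puts. The function name, SKU strings, and the update rules are illustrative assumptions, not claim language.

```python
from collections import defaultdict

def update_log(log, subject_id, event, item_id):
    """Append an item on a detected take; remove it on a detected put."""
    items = log[subject_id]
    if event == "take":
        items.append(item_id)
    elif event == "put" and item_id in items:
        items.remove(item_id)

log = defaultdict(list)                      # subject_id -> list of inventory items
update_log(log, subject_id=7, event="take", item_id="SKU-123")
update_log(log, subject_id=7, event="take", item_id="SKU-456")
update_log(log, subject_id=7, event="put", item_id="SKU-123")
print(dict(log))                             # {7: ['SKU-456']}
```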

12. A method for tracking puts and takes of inventory items by subjects in an area of real space, the method including:

using a plurality of cameras to produce respective sequences of images of corresponding fields of view in the real space, the field of view of each camera overlapping with the field of view of at least one other camera in the plurality of cameras;

receiving corresponding sequences of images from the plurality of cameras, processing the images in the corresponding sequences using image recognition engines in a plurality of image recognition engines and identifying subjects represented in the images wherein the plurality of image recognition engines are part of a processing system coupled to the plurality of cameras; and

processing sets of images in the sequences of images that include the identified subjects to detect takes of inventory items by identified subjects and puts of inventory items on shelves by identified subjects.

13. The method of claim 12, wherein the processing sets of images includes:

for identified subjects, generating classifications of the images of the identified subjects, the classifications including whether the identified subject is holding an inventory item, a first nearness classification indicating a location of a hand of the identified subject relative to a shelf, a second nearness classification indicating a location of a hand of the identified subject relative to a body of the identified subject, a third nearness classification indicating a location of a hand of the identified subject relative to a basket associated with an identified subject, and an identifier of a likely inventory item.

14. The method of claim 13, including performing time sequence analysis over the classifications of images to detect said takes and said puts by the identified subjects.

15. The method of claim 12, wherein the processing sets of images includes:

for identified subjects, identifying bounding boxes of data representing hands in images in the sets of images of the identified subjects, and processing data in the bounding boxes to generate classifications of data within the bounding boxes for the identified subjects.

16. The method of claim 15, wherein the classifications include whether the identified subject is holding an inventory item, a first nearness classification indicating a location of a hand of the identified subject relative to a shelf, a second nearness classification indicating a location of a hand of the identified subject relative to a body of the identified subject, a third nearness classification indicating a location of a hand of the identified subject relative to a basket associated with an identified subject, and an identifier of a likely inventory item.

17. The method of claim 15, including performing time sequence analysis over the classifications of data within the bounding boxes in the sets of images to detect said takes and said puts by the identified subjects.

18. The method of claim 12, including circular buffers coupled to cameras in the plurality of cameras to store sets of images in the sequences of images from the plurality of cameras.

19. The method of claim 12, including processing sets of images using convolutional neural networks.

20. The method of claim 12, wherein cameras in the plurality of cameras are configured to generate synchronized sequences of images.

21. The method of claim 12, wherein the plurality of cameras comprise cameras disposed over and having fields of view encompassing respective parts of the area in real space.

22. The method of claim 12, including responsive to the detected takes and puts, generating a log data structure including a list of inventory items for each identified subject.

23. A system for tracking puts and takes of inventory items by subjects in an area of real space, comprising:

a plurality of cameras, cameras in the plurality of cameras producing respective sequences of images of corresponding fields of view in the real space, the field of view of each camera overlapping with the field of view of at least one other camera in the plurality of cameras;

a processing system coupled to the plurality of cameras, the processing system including:

first image recognition engines, receiving the sequences of images from the plurality of cameras, which process images to generate first data sets that identify subjects and locations of the identified subjects in the real space;

logic to process the first data sets to specify bounding boxes which include images of hands of identified subjects in images in the sequences of images;

second image recognition engines, receiving the sequences of images from the plurality of cameras, which process the specified bounding boxes in the images to generate a classification of hands of the identified subjects, the classification including whether the identified subject is holding an inventory item, a first nearness classification indicating a location of a hand of the identified subject relative to a shelf, a second nearness classification indicating a location of a hand of the identified subject relative to a body of the identified subject, a third nearness classification indicating a location of a hand of the identified subject relative to a basket associated with an identified subject, and an identifier of a likely inventory item; and

logic to process the classifications of hands for sets of images in the sequences of images of identified subjects to detect takes of inventory items by identified subjects and puts of inventory items on shelves by identified subjects.

24. The system of claim 23, including circular buffers coupled to cameras in the plurality of cameras to store sets of images in the sequences of images from the plurality of cameras.

25. The system of claim 23, wherein the first data sets comprise, for each identified subject, sets of candidate joints having coordinates in real space.

26. The system of claim 23, wherein the logic to process the first data sets to specify bounding boxes specifies bounding boxes based on locations of joints in the sets of candidate joints for each subject.
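Claims 25 and 26 can be illustrated together with a hedged sketch: the first data sets carry candidate joints with coordinates in real space, and hand bounding boxes are specified from joint locations, here as a fixed-size box centered on each wrist joint's image position. The joint naming scheme and box size are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Joint:
    name: str                               # e.g., "left_wrist" (hypothetical naming)
    image_xy: Tuple[int, int]               # pixel location in one camera's image
    world_xyz: Tuple[float, float, float]   # candidate joint coordinates in real space

def hand_bounding_boxes(joints: List[Joint], half: int = 32) -> List[Tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) crop boxes centered on wrist joints; the second
    image recognition engines would classify the image data inside each box."""
    boxes = []
    for j in joints:
        if j.name.endswith("wrist"):
            x, y = j.image_xy
            boxes.append((x - half, y - half, x + half, y + half))
    return boxes
```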

27. The system of claim 23, wherein the second image recognition engines comprise convolutional neural networks.

28. The system of claim 23, wherein the logic to process the classifications of bounding boxes comprises convolutional neural networks.

29. The system of claim 23, wherein cameras in the plurality of cameras are configured to generate synchronized sequences of images.

30. The system of claim 23, wherein the plurality of cameras comprise cameras disposed over and having fields of view encompassing respective parts of the area in real space.

31. The system of claim 23, including logic to generate a log data structure including a list of inventory items for each identified subject.

32. A method for tracking puts and takes of inventory items by subjects in an area of real space, comprising:

using a plurality of cameras to produce respective sequences of images of corresponding fields of view in the real space, the field of view of each camera overlapping with the field of view of at least one other camera in the plurality of cameras;

receiving the sequences of images from the plurality of cameras, and using first image recognition engines to process images to generate first data sets that identify subjects and locations of the identified subjects in the real space;

processing the first data sets to specify bounding boxes which include images of hands of identified subjects in images in the sequences of images;

receiving the sequences of images from the plurality of cameras, and processing the specified bounding boxes in the images to generate a classification of hands of the identified subjects using second image recognition engines, the classification including whether the identified subject is holding an inventory item, a first nearness classification indicating a location of a hand of the identified subject relative to a shelf, a second nearness classification indicating a location of a hand of the identified subject relative to a body of the identified subject, a third nearness classification indicating a location of a hand of the identified subject relative to a basket associated with an identified subject, and an identifier of a likely inventory item; and

processing the classifications of hands for sets of images in the sequences of images of identified subjects to detect takes of inventory items by identified subjects and puts of inventory items on shelves by identified subjects.

33. The method of claim 32, including circular buffers coupled to cameras in the plurality of cameras to store sets of images in the sequences of images from the plurality of cameras.

34. The method of claim 32, wherein the first data sets comprise, for each identified subject, sets of candidate joints having coordinates in real space.

35. The method of claim 32, wherein the processing the first data sets to specify bounding boxes includes specifying bounding boxes based on locations of joints in the sets of candidate joints for each subject.

36. The method of claim 32, wherein the second image recognition engines comprise convolutional neural networks.

37. The method of claim 32, including processing the classifications of bounding boxes using convolutional neural networks.

38. The method of claim 32, wherein cameras in the plurality of cameras are configured to generate synchronized sequences of images.

39. The method of claim 32, wherein the plurality of cameras comprise cameras disposed over and having fields of view encompassing respective parts of the area in real space.

40. The method of claim 32, including generating a log data structure including a list of inventory items for each identified subject.

41. A computer program product, comprising:

a computer readable memory comprising a non-transitory data storage medium;

computer instructions stored in the memory executable by a computer to track puts and takes of inventory items by subjects in an area of real space by a process including:

using a plurality of cameras to produce respective sequences of images of corresponding fields of view in the real space, the field of view of each camera overlapping with the field of view of at least one other camera in the plurality of cameras;

receiving corresponding sequences of images from the plurality of cameras, processing the images in the corresponding sequences using image recognition engines in a plurality of image recognition engines and identifying subjects represented in the images wherein the plurality of image recognition engines are part of a processing system coupled to the plurality of cameras; and

processing sets of images in the sequences of images that include the identified subjects to detect takes of inventory items by identified subjects and puts of inventory items on shelves by identified subjects.

42. The product of claim 41, wherein the processing sets of images includes:

for identified subjects, generating classifications of the images of the identified subjects, the classifications including whether the identified subject is holding an inventory item, a first nearness classification indicating a location of a hand of the identified subject relative to a shelf, a second nearness classification indicating a location of a hand of the identified subject relative to a body of the identified subject, a third nearness classification indicating a location of a hand of the identified subject relative to a basket associated with an identified subject, and an identifier of a likely inventory item.

43. The product of claim 42, including performing time sequence analysis over the classifications of images to detect said takes and said puts by the identified subjects.

44. The product of claim 41, wherein the processing sets of images includes:

for identified subjects, identifying bounding boxes of data representing hands in images in the sets of images of the identified subjects, and processing data in the bounding boxes to generate classifications of data within the bounding boxes for the identified subjects.

45. The product of claim 44, wherein the classifications include whether the identified subject is holding an inventory item, a first nearness classification indicating a location of a hand of the identified subject relative to a shelf, a second nearness classification indicating a location of a hand of the identified subject relative to a body of the identified subject, a third nearness classification indicating a location of a hand of the identified subject relative to a basket associated with an identified subject, and an identifier of a likely inventory item.

46. The product of claim 44, including performing time sequence analysis over the classifications of data within the bounding boxes in the sets of images to detect said takes and said puts by the identified subjects.

47. The product of claim 41, including circular buffers coupled to cameras in the plurality of cameras to store sets of images in the sequences of images from the plurality of cameras.

48. The product of claim 41, including processing sets of images using convolutional neural networks.

49. The product of claim 41, wherein cameras in the plurality of cameras are configured to generate synchronized sequences of images.

50. The product of claim 41, wherein the plurality of cameras comprise cameras disposed over and having fields of view encompassing respective parts of the area in real space.

51. The product of claim 41, including responsive to the detected takes and puts, generating a log data structure including a list of inventory items for each identified subject.

52. A computer program product, comprising:

a computer readable memory comprising a non-transitory data storage medium;

computer instructions stored in the memory executable by a computer to track puts and takes of inventory items by subjects in an area of real space, comprising:

using a plurality of cameras to produce respective sequences of images of corresponding fields of view in the real space, the field of view of each camera overlapping with the field of view of at least one other camera in the plurality of cameras;

receiving the sequences of images from the plurality of cameras, and using first image recognition engines to process images to generate first data sets that identify subjects and locations of the identified subjects in the real space;

processing the first data sets to specify bounding boxes which include images of hands of identified subjects in images in the sequences of images;

receiving the sequences of images from the plurality of cameras, and processing the specified bounding boxes in the images to generate a classification of hands of the identified subjects using second image recognition engines, the classification including whether the identified subject is holding an inventory item, a first nearness classification indicating a location of a hand of the identified subject relative to a shelf, a second nearness classification indicating a location of a hand of the identified subject relative to a body of the identified subject, a third nearness classification indicating a location of a hand of the identified subject relative to a basket associated with an identified subject, and an identifier of a likely inventory item; and

processing the classifications of hands for sets of images in the sequences of images of identified subjects to detect takes of inventory items by identified subjects and puts of inventory items on shelves by identified subjects.

53. The product of claim 52, including circular buffers coupled to cameras in the plurality of cameras to store sets of images in the sequences of images from the plurality of cameras.

54. The product of claim 52, wherein the first data sets comprise, for each identified subject, sets of candidate joints having coordinates in real space.

55. The product of claim 52, wherein processing the first data sets to specify bounding boxes includes specifying bounding boxes based on locations of joints in the sets of candidate joints for each subject.

56. The product of claim 52, wherein the second image recognition engines comprise convolutional neural networks.

57. The product of claim 52, including processing the classifications of bounding boxes using convolutional neural networks.

58. The product of claim 52, wherein cameras in the plurality of cameras are configured to generate synchronized sequences of images.

59. The product of claim 52, wherein the plurality of cameras comprise cameras disposed over and having fields of view encompassing respective parts of the area in real space.

60. The product of claim 52, including generating a log data structure including a list of inventory items for each identified subject.

61. A system comprising:

a camera producing a sequence of images including a hand;

a processing system coupled to the camera, the processing system including a hand image recognition engine, receiving the sequence of images, to generate classifications of the hand in time sequence, and logic to process the classifications of the hand from the sequence of images to identify an action by the subject.

62. The system of claim 61, wherein the actions are puts and takes of inventory items.

63. The system of claim 61, including logic to identify locations of joints of subjects in the images in the sequences of images, and to identify bounding boxes in corresponding images that include the hands of the subjects based on the identified joints.