(WO2019046820) IDENTIFICATION OF INDIVIDUALS IN A DIGITAL FILE USING MEDIA ANALYSIS TECHNIQUES
Note: This text was generated by OCR processing. For legal purposes, please use the PDF version.

CLAIMS

WHAT IS CLAIMED IS:

1. A method for identifying individuals within a video, the method comprising:

accessing, from computer memory, a video describing the movement of one or more unidentified individuals over a period of time and comprising one or more frames;

dividing the video into a set of segments, wherein each segment describes a part of a frame of the video;

adjusting, for each segment, a pixel resolution of the segment to a detection resolution such that a detection algorithm detects a face of one or more unidentified individuals within the segment, wherein at the detection resolution a size of the face in the segment increases relative to the size of the face in the frame;

responsive to the detection algorithm detecting a face, adjusting, for each segment, the pixel resolution of the segment from the detection resolution to a recognition resolution such that a recognition algorithm matches the face of the unidentified individual to a target individual;

determining, for each match, a confidence level describing the accuracy of the match between the unidentified individual and the target individual, wherein the confidence level is related to the distance between a feature vector of the face of the target individual and a feature vector of the face of the unidentified individual; and

generating a report of search results indicating that the unidentified individual within the video matched a target individual, the confidence level assigned to that match, and a notification indicating where in the video the target individual appeared.
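
For illustration only, the following Python sketch traces the flow claim 1 recites: detect faces at a reduced resolution, re-sample to a higher resolution for recognition, and rank candidate matches by feature-vector distance. OpenCV's Haar cascade stands in for the unnamed detection algorithm, and embed_face is a hypothetical placeholder for the recognition network; none of this is the applicant's implementation.

```python
# Minimal sketch of the claimed two-resolution pipeline (not the
# applicant's implementation). Assumes OpenCV; embed_face is a
# hypothetical stand-in for a recognition network.
import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def embed_face(face_img: np.ndarray) -> np.ndarray:
    """Hypothetical recognition network: returns a feature vector."""
    resized = cv2.resize(face_img, (32, 32)).astype(np.float32)
    vec = resized.mean(axis=2).ravel()          # placeholder features
    return vec / (np.linalg.norm(vec) + 1e-9)

def match_segment(segment, target_vec, det_res=(320, 240), rec_res=(1280, 960)):
    """Detect at low resolution, recognize at high resolution."""
    small = cv2.resize(segment, det_res)        # detection resolution
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    sx = rec_res[0] / det_res[0]                # width scale factor
    sy = rec_res[1] / det_res[1]                # height scale factor
    large = cv2.resize(segment, rec_res)        # recognition resolution
    matches = []
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
        # Map the detection-resolution box into the recognition image.
        face = large[int(y * sy):int((y + h) * sy),
                     int(x * sx):int((x + w) * sx)]
        dist = float(np.linalg.norm(embed_face(face) - target_vec))
        matches.append((dist, (x, y, w, h)))
    return sorted(matches)                      # closest match first
```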

2. The method of claim 1, wherein the video further includes one or more unidentified objects, the video describing, for each object, one or more of the following:

a position of the object;

a movement of the object; and

an orientation of the object.

3. The method of claim 2, wherein each object is associated with a class, the class representing a group of objects with at least one shared feature.

4. The method of claim 1, further comprising:

dividing the video into a set of frames, wherein each frame corresponds to a range of timestamps from the period of time during which the video was recorded; and

dividing each frame into a set of segments, wherein each segment includes a portion of data stored within the frame within the processing capacity of the detection algorithm and the recognition algorithm.
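
As a concrete reading of claim 4, a frame can be tiled into fixed-size sub-images sized to whatever the detection and recognition algorithms can process; the 512-pixel tile below is an arbitrary assumption, not a figure from the disclosure.

```python
import numpy as np

def split_into_segments(frame: np.ndarray, seg_h: int = 512, seg_w: int = 512):
    """Tile a frame into segments no larger than seg_h x seg_w pixels.

    seg_h/seg_w stand in for whatever the detection and recognition
    algorithms can process; edge tiles may be smaller.
    """
    h, w = frame.shape[:2]
    segments = []
    for y in range(0, h, seg_h):
        for x in range(0, w, seg_w):
            # Record each tile with its top-left position in the frame.
            segments.append(((y, x), frame[y:y + seg_h, x:x + seg_w]))
    return segments
```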

5. The method of claim 1, wherein a segment is a static image representing a portion of the data stored by the video.

6. The method of claim 1, wherein adjusting the pixel resolution of the segment to the detection resolution comprises:

decreasing the pixel resolution for the segment from an original pixel resolution of the video to the detection resolution;

at the detection resolution, detecting, by the detection algorithm, a face of an unidentified individual based on one or more physical features of the face; and

generating, by the detection algorithm, a bounding box encompassing the detected face, wherein the bounding box demarcates the detected face from a surrounding environment recorded by the video.
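
A minimal sketch of claim 6's downscale-then-detect step, again assuming a Haar cascade as the detection algorithm (the claims do not name one) and an assumed 320-pixel detection width. The scale factor is returned so the boxes can later be mapped back to the original or recognition resolution.

```python
import cv2

# Assumed detection algorithm; the claims leave the choice open.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_at_low_res(segment, detection_width: int = 320):
    """Downscale a segment to the detection resolution and detect faces.

    Returns the face bounding boxes (x, y, w, h) at the detection
    resolution, plus the scale factor used for the downsizing.
    """
    scale = detection_width / segment.shape[1]
    small = cv2.resize(segment, None, fx=scale, fy=scale)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    # Each box demarcates a detected face from its surroundings.
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return boxes, scale
```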

7. The method of claim 1, wherein adjusting the pixel resolution of the segment to the detection resolution comprises:

accessing, from computer memory, a pixel resolution within the processing capacity of the detection algorithm; and

assigning the accessed pixel resolution as the detection resolution.

8. The method of claim 1, wherein the detection algorithm is a neural network.

9. The method of claim 1, wherein adjusting the pixel resolution of the segment to the recognition resolution comprises:

accessing, from computer memory, a pixel resolution within the processing capacity of the recognition algorithm;

assigning the accessed pixel resolution as the recognition resolution; and

increasing the pixel resolution for the segment from the detection resolution of the segment to the recognition resolution.

10. The method of claim 1, wherein adjusting the pixel resolution of the segment to the recognition resolution comprises:

identifying, by the recognition algorithm, the corners of the bounding box surrounding a detected face, wherein the corners of the bounding box correspond to a group of pixels of the segment;

mapping the group of pixels associated with each corner of the bounding box at the detection resolution to a corresponding group of pixels within the segment at the recognition resolution;

generating a bounding box by connecting the mapped group of pixels at the recognition resolution; and

applying the recognition algorithm to match a face within the bounding box to a target individual.
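
One way to read the corner-mapping of claim 10 is as a rescaling of box coordinates between the two resolutions; the sketch below assumes the detection-resolution and recognition-resolution segments differ only by a uniform resize.

```python
def map_box_to_recognition_res(box, det_shape, rec_shape):
    """Map a bounding box from detection to recognition resolution.

    box: (x, y, w, h) at detection resolution.
    det_shape / rec_shape: (height, width) of the segment at each
    resolution. Assumes a plain resize relates the two images.
    """
    sy = rec_shape[0] / det_shape[0]
    sx = rec_shape[1] / det_shape[1]
    x, y, w, h = box
    # Scale each corner's pixel position, then reconnect the corners
    # into a box at the recognition resolution.
    return (int(x * sx), int(y * sy), int(w * sx), int(h * sy))
```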

11. The method of claim 1, wherein adjusting the pixel resolution of the segment to the recognition resolution comprises:

identifying a face within a bounding box based on one or more colors of pixels representing physical features of the face;

for each remaining pixel of the bounding box, contrasting the environment surrounding the face by normalizing the color of each remaining pixel of the bounding box sharing a row and column, the contrast focusing an image of the face; and

extracting the feature vector from the focused image of the face.
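
Claim 11's row/column color normalization is described only functionally; the sketch below is one loose interpretation, flattening each background pixel toward the mean color of its row and column so the face region stands out. The face_mask input is assumed to come from some prior segmentation step not shown here.

```python
import numpy as np

def focus_face(box_img: np.ndarray, face_mask: np.ndarray) -> np.ndarray:
    """Suppress non-face pixels inside a bounding-box crop.

    box_img: H x W x 3 crop of the bounding box.
    face_mask: H x W boolean array, True where pixels belong to the
    face (assumed to come from a color-based segmentation).
    """
    out = box_img.astype(np.float32).copy()
    bg = ~face_mask                              # environment pixels
    row_mean = out.mean(axis=1, keepdims=True)   # per-row mean color
    col_mean = out.mean(axis=0, keepdims=True)   # per-column mean color
    # Normalize each background pixel toward its row/column mean,
    # one reading of the claim's row-and-column normalization.
    out[bg] = ((row_mean + col_mean) / 2.0)[bg]
    return out.astype(np.uint8)
```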

12. The method of claim 11, further comprising:

removing pixels from the environment surrounding the face from the bounding box.

13. The method of claim 1, wherein matching the face of the unidentified individual to a target individual comprises:

extracting, for each segment, a feature vector describing at least one physical feature of the face of each unidentified individual within the segment, wherein the feature vector is extracted by a neural network.

14. The method of claim 13, wherein extracting a feature vector comprises:

extracting image data describing the physical features of the face of the unidentified individual from the segment of the video;

providing the extracted image data as input to a neural network comprising a plurality of layers; and

extracting the feature vector representing the face based upon an output of a hidden layer of the neural network.
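
Claims 13 and 14 only require that the feature vector be read from a hidden layer of a neural network. The toy PyTorch model below is an assumed architecture chosen for brevity; only the pattern of taking the hidden-layer output as the embedding reflects the claim.

```python
import torch
import torch.nn as nn

class TinyFaceNet(nn.Module):
    """Toy recognition network; the architecture is an assumption."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.hidden = nn.Linear(32, embed_dim)        # hidden layer
        self.classifier = nn.Linear(embed_dim, 1000)  # training head only

    def forward(self, x):
        return self.classifier(self.hidden(self.backbone(x)))

    def feature_vector(self, x):
        # The embedding is the hidden layer's output, per claims 13-14.
        with torch.no_grad():
            return self.hidden(self.backbone(x))

net = TinyFaceNet().eval()
face = torch.rand(1, 3, 112, 112)   # image data extracted from a segment
vec = net.feature_vector(face)      # 1 x 128 feature vector
```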

15. The method of claim 14, wherein physical features of a face comprise one or more of the following:

a piece of eyewear;

facial hair;

a piece of headwear;

illumination of the face based on the orientation of the face relative to a light source; and

a facial expression on the face.

16. The method of claim 1, further comprising:

receiving, from a user device, a query to identify one or more target individuals, the query comprising the feature vector describing physical features of the face of the target individual; and

executing a search, within each segment of the video, for an unidentified individual matching each target individual of the query.

17. The method of claim 1, wherein matching an unidentified individual to a target individual comprises:

determining a distance between the feature vector describing the face of each target individual and the extracted feature vector of each unidentified individual in a segment; and

ranking each match based on the determined distance.

18. The method of claim 1, wherein the distance comprises a Euclidean distance or a Hamming distance.
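
Both distances named in claim 18 are standard; a brief sketch, with the distance-based ranking of claim 17 (the candidate structure is an assumed convenience, not from the disclosure):

```python
import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance, for binarized feature vectors."""
    return int(np.count_nonzero(a != b))

def rank_matches(target_vec, candidates):
    """Rank candidates by distance to the target, per claim 17.

    candidates: list of dicts with a "vec" feature-vector entry.
    """
    return sorted(candidates, key=lambda c: euclidean(target_vec, c["vec"]))
```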

19. The method of claim 1, further comprising:

comparing, for each match, the determined distance between the feature vector of the target individual and the extracted feature vector of the unidentified individual to a threshold distance; and

determining, responsive to the comparison between the threshold distance and the determined distance, the confidence level for the match.

20. The method of claim 1, wherein the confidence level for a match is inversely related to the determined distance between the feature vector of the face of the target individual and the extracted feature vector of the unidentified individual.
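
Claims 19 through 21 leave the exact mapping from distance to confidence open; one monotone-decreasing choice, with an assumed linear falloff, threshold cutoff, and illustrative verbal bands:

```python
def confidence(distance: float, threshold: float) -> float:
    """Map a feature distance to a confidence in [0, 1].

    One of many mappings consistent with claims 19-20: inversely
    related to distance, zero at and beyond the threshold. The linear
    falloff is an assumption.
    """
    if distance >= threshold:
        return 0.0
    return 1.0 - distance / threshold

def qualitative(conf: float) -> str:
    """Verbal value per claim 21; the bands are illustrative."""
    return "high" if conf >= 0.75 else "medium" if conf >= 0.4 else "low"
```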

21. The method of claim 1, wherein the confidence level is a quantitative measurement or a qualitative measurement, the qualitative measurement comprising a verbal value and the quantitative measurement comprising a numerical value.

22. The method of claim 1, wherein the report, presented through a user device, further comprises:

the confidence level for each segment of the video;

the confidence level for each match; and

one or more notifications indicating when a target individual appears in the video.

23. The method of claim 1, further comprising:

detecting, within a plurality of segments, an unidentified individual, wherein the detections access the extracted feature vector for the face of the unidentified individual for each segment of the plurality;

determining, for each pair of consecutive segments, a distance between the extracted feature vectors;

responsive to determining the distance to be within a threshold distance, generating, for each pair of consecutive segments, a representative feature vector by aggregating the feature vectors from the pair of segments; and

clustering, across any segment of the video, representative feature vectors determined to be within a threshold distance.

24. The method of claim 23, wherein the representative feature vector is extracted based on a computation of the mean of the detected feature vectors.
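
A sketch of the track-building of claims 23 and 24, under the assumption that "aggregating" means taking the mean of consecutive vectors that fall within the threshold:

```python
import numpy as np

def build_tracks(segment_vecs, threshold: float):
    """Chain per-segment face vectors into tracks (claims 23-24).

    segment_vecs: feature vectors for one detected face, in segment
    order. Consecutive vectors within `threshold` of each other are
    grouped, and each group is summarized by its mean vector.
    """
    tracks = []
    members = [segment_vecs[0]]
    for prev, cur in zip(segment_vecs, segment_vecs[1:]):
        if np.linalg.norm(cur - prev) <= threshold:
            members.append(cur)                      # same track, extend
        else:
            tracks.append(np.mean(members, axis=0))  # representative vector
            members = [cur]
    tracks.append(np.mean(members, axis=0))
    return tracks
```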

25. The method of claim 23, wherein each cluster is assigned a confidence level describing the distance between the tracks of the cluster.

26. The method of claim 1, further comprising:

identifying, from the one or more segments of the video, segments in which an unidentified individual was present with a second individual;

incrementing, for each combination of unidentified individuals and second individuals, the number of segments in which both individuals are present; and

assigning a label to each combination based on the incremented number of segments, the label describing a strength of the relationship between the individuals of the combination.
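
Claim 26's co-occurrence counting reduces to incrementing a counter per pair of individuals seen in the same segment; the strength thresholds below are illustrative assumptions.

```python
from collections import Counter

def relationship_labels(appearances, strong: int = 20, weak: int = 5):
    """Label pairs of individuals by shared segment count (claim 26).

    appearances: dict mapping segment index -> set of individual IDs
    seen in that segment. The thresholds are illustrative assumptions.
    """
    counts = Counter()
    for ids in appearances.values():
        ids = sorted(ids)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                counts[(ids[i], ids[j])] += 1   # both present: increment
    return {
        pair: "strong" if n >= strong else "moderate" if n >= weak else "weak"
        for pair, n in counts.items()
    }

# Example: A and B share two segments, so their label is "weak".
appearances = {0: {"A", "B"}, 1: {"A", "B"}, 2: {"A"}}
print(relationship_labels(appearances))  # {('A', 'B'): 'weak'}
```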

27. The method of claim 1, further comprising:

identifying, from the one or more segments of the video, segments in which an unidentified individual was present with a second individual;

accessing, for each of the segments, the confidence level of the match for the unidentified individual; and

assigning a label to each combination based on the confidence level of each match within each segment, the label describing the strength of the relationship between the individuals of the combination.

28. A non-transitory computer readable storage medium comprising stored program code executable by at least one processor, the program code when executed causes the processor to:

access, from computer memory, a video describing the movement of one or more unidentified individuals over a period of time and comprising one or more frames;

divide the video into a set of segments, wherein each segment describes a part of a frame of the video;

adjust, for each segment, a pixel resolution of the segment to a detection resolution such that a detection algorithm detects a face of one or more unidentified individuals within the segment, wherein at the detection resolution a size of the face in the segment increases relative to the size of the face in the frame;

responsive to the detection algorithm detecting a face, adjust, for each segment, the pixel resolution of the segment from the detection resolution to a recognition resolution such that a recognition algorithm matches the face of the unidentified individual to a target individual;

determine, for each match, a confidence level describing the accuracy of the match between the unidentified individual and the target individual, wherein the confidence level is related to the distance between a feature vector of the face of the target individual and a feature vector of the face of the unidentified individual; and

generate a report of search results indicating that the unidentified individual within the video matched a target individual, the confidence level assigned to that match, and a notification indicating where in the video the target individual appeared.

29. The non-transitory computer readable storage medium of claim 28, further comprising stored program code that when executed causes the processor to:

decrease the pixel resolution for the segment from an original pixel resolution of the video to the detection resolution;

at the detection resolution, detect, by the detection algorithm, a face of an unidentified individual based on one or more physical features of the face; and

generate, by the detection algorithm, a bounding box encompassing the detected face, wherein the bounding box demarcates the detected face from a surrounding environment recorded by the video.

30. The non-transitory computer readable storage medium of claim 28, further comprising stored program code that when executed causes the processor to:

identify, by the recognition algorithm, the corners of the bounding box surrounding a detected face, wherein the corners of the bounding box correspond to a group of pixels of the segment;

map the group of pixels associated with each corner of the bounding box at the detection resolution to a corresponding group of pixels within the segment at the recognition resolution;

generate a bounding box by connecting the mapped group of pixels at the recognition resolution; and

apply the recognition algorithm to match a face within the bounding box to a target individual.

31. The non-transitory computer readable storage medium of claim 28, further comprising stored program code that when executed causes the processor to:

identify a face within a bounding box based on one or more colors of pixels representing physical features of the face;

for each remaining pixel of the bounding box, contrast the environment surrounding the face by normalizing the color of each remaining pixel of the bounding box sharing a row and column, the contrast focusing an image of the face; and

extract the feature vector from the focused image of the face.

32. A system comprising:

a sensor assembly, communicatively coupled to the processor, recording sensor data and storing the sensor data in computer memory;

a processor; and

a non-transitory computer readable storage medium comprising stored program code executable by at least one processor, the program code when executed causes the processor to:

access, from computer memory, a video describing the movement of one or more unidentified individuals over a period of time and comprising one or more frames;

divide the video into a set of segments, wherein each segment describes a part of a frame of the video;

adjust, for each segment, a pixel resolution of the segment to a detection resolution such that a detection algorithm detects a face of one or more unidentified individuals within the segment, wherein at the detection resolution a size of the face in the segment increases relative to the size of the face in the frame;

responsive to the detection algorithm detecting a face, adjust, for each segment, the pixel resolution of the segment from the detection resolution to a recognition resolution such that a recognition algorithm matches the face of the unidentified individual to a target individual;

determine, for each match, a confidence level describing the accuracy of the match between the unidentified individual and the target individual, wherein the confidence level is related to the distance between a feature vector of the face of the target individual and a feature vector of the face of the unidentified individual; and

generate a report of search results indicating that the unidentified individual within the video matched a target individual, the confidence level assigned to that match, and a notification indicating where in the video the target individual appeared.

33. The system of claim 32, wherein the stored program code further comprises program code that when executed causes the processor to:

decrease the pixel resolution for the segment from an original pixel resolution of the video to the detection resolution;

at the detection resolution, detect, by the detection algorithm, a face of an unidentified individual based on one or more physical features of the face; and

generate, by the detection algorithm, a bounding box encompassing the detected face, wherein the bounding box demarcates the detected face from a surrounding environment recorded by the video.

34. The system of claim 32, wherein the stored program code further comprises program code that when executed causes the processor to:

identify, by the recognition algorithm, the corners of the bounding box surrounding a detected face, wherein the corners of the bounding box correspond to a group of pixels of the segment;

map the group of pixels associated with each corner of the bounding box at the detection resolution to a corresponding group of pixels within the segment at the recognition resolution;

generate a bounding box by connecting the mapped group of pixels at the recognition resolution; and

apply the recognition algorithm to match a face within the bounding box to a target individual.

35. The system of claim 32, wherein the stored program code further comprises program code that when executed causes the processor to:

identify a face within a bounding box based on one or more colors of pixels representing physical features of the face;

for each remaining pixel of the bounding box, contrast the environment surrounding the face by normalizing the color of each remaining pixel of the bounding box sharing a row and column, the contrast focusing an image of the face; and

extract the feature vector from the focused image of the face.