
EP3690674 - METHOD FOR RECOMMENDING VIDEO CONTENT

Note: Text based on automatic optical character recognition processes. Only the PDF version has legal value.

[ EN ]
Claims

1. A method of recommending video content using a computer-based system (30), the method comprising:

providing (101) an initial set of a plurality of videos (1);

extracting (102) a digital audio signal (2) from each of said plurality of videos (1);

determining (103) at least one temporal sequence (4) of low-level audio features for each digital audio signal (2) of said plurality of videos (1) by analyzing said digital audio signals (2);

calculating (104) an audio similarity index (5) between each of said plurality of videos (1) by comparing their respective at least one temporal sequence (4) of low-level audio features;

receiving (105) a query Q comprising reference to a seed video, said seed video being one of said plurality of videos (1);

determining (106), for said seed video, a ranking (7) of the rest of the initial set of videos (1) based on their audio similarity index (5) with respect to said seed video; and

returning (107), as a reply to said query Q, an ordered set of video references according to said ranking (7).
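Editorial illustration (not part of the claims): the query, ranking, and reply steps (105) to (107) of claim 1 can be sketched as a lookup into a precomputed similarity table. This is a minimal sketch that assumes the audio similarity indexes (5) from step (104) are already available as a nested dictionary; the function name `rank_by_similarity` and the sample values are hypothetical.

```python
from typing import Dict, List


def rank_by_similarity(similarity: Dict[str, Dict[str, float]], seed_id: str) -> List[str]:
    """Return every video id except the seed, most similar to the seed first."""
    if seed_id not in similarity:
        raise KeyError(f"seed video {seed_id!r} is not in the initial set")
    others = [vid for vid in similarity if vid != seed_id]
    # Higher index means more similar; ties are broken by id so the order is deterministic.
    return sorted(others, key=lambda vid: (-similarity[seed_id][vid], vid))


# Hypothetical precomputed audio similarity indexes between three videos.
audio_similarity = {
    "a": {"a": 1.0, "b": 0.2, "c": 0.9},
    "b": {"a": 0.2, "b": 1.0, "c": 0.5},
    "c": {"a": 0.9, "b": 0.5, "c": 1.0},
}

# Query with seed video "a": the reply is the ordered set of the remaining videos.
ranking = rank_by_similarity(audio_similarity, "a")  # ["c", "b"]
```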


  2. A method according to claim 1, further comprising:

dividing (1031) each digital audio signal (2) into a plurality of audio segments (3);

determining (1032) at least one temporal sequence (4) of low-level audio features for each audio segment (3) by analyzing said audio segments (3); and

calculating (1033) an audio similarity index (5) between each of said plurality of videos (1) by comparing the respective at least one temporal sequence (4) of low-level audio features of at least one of their audio segments (3).


  3. A method according to claim 2, wherein
said plurality of audio segments (3) are non-overlapping; and
said plurality of audio segments (3) have equal segment duration Ls, wherein said segment duration is between 1 s < Ls < 60 s, more preferably between 5 s < Ls < 30 s, more preferably Ls = 15 s.
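Editorial illustration (not part of the claims): the segmentation of claims 2 and 3 can be sketched as slicing a sample buffer into non-overlapping windows of equal duration Ls. The function name `split_into_segments` is hypothetical, and dropping a trailing partial segment is an assumption; the claims do not say how a remainder shorter than Ls is handled.

```python
from typing import List, Sequence


def split_into_segments(
    samples: Sequence[float], sample_rate: int, segment_seconds: float = 15.0
) -> List[Sequence[float]]:
    """Split an audio signal into non-overlapping segments of equal duration."""
    seg_len = int(round(segment_seconds * sample_rate))
    if seg_len <= 0:
        raise ValueError("segment duration must cover at least one sample")
    # Assumption: a trailing remainder shorter than one segment is discarded.
    n_full = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]
```

With a 100-sample signal at 2 samples per second and Ls = 15 s, this yields three 30-sample segments and discards the last 10 samples.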
  4. A method according to any one of claims 2 or 3, further comprising:

determining (1034) the temporal arrangement of said plurality of audio segments (3) for each digital audio signal (2); and

calculating (1033) said audio similarity index (5) between each of said plurality of videos (1) taking into account the temporal arrangement of their respective audio segments (3).


  5. A method according to any one of claims 1 to 4, wherein calculating (104) said audio similarity index (5) by comparing said at least one temporal sequence (4) of audio features comprises:

calculating (1043) at least one high-level feature vector Vf for each digital audio signal (2) or segment (3) by analyzing said at least one temporal sequence (4) of low-level audio features, wherein the elements of said high-level feature vector Vf each represent a high-level audio feature associated with said digital audio signal (2) or segment (3), and

calculating (1044) the respective pairwise distance Dp between said high-level feature vectors Vf in the vector space, wherein the shorter pairwise distance Dp represents a higher degree of similarity between the respective digital audio signals (2) or segments (3).
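Editorial illustration (not part of the claims): step (1044) of claim 5 compares high-level feature vectors Vf by their pairwise distance Dp. The sketch below assumes Euclidean distance, which the claim does not prescribe; the function name `pairwise_distances` is hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pairwise_distances(vectors: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Euclidean distance Dp for every unordered pair of feature vectors."""
    distances = {}
    for a, b in combinations(sorted(vectors), 2):
        # A shorter distance is read as a higher degree of similarity.
        distances[(a, b)] = math.dist(vectors[a], vectors[b])
    return distances
```

Any other metric on the vector space (for example cosine distance) could be substituted without changing the structure of the sketch.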


  6. A method according to claim 5, wherein
each of said high-level feature vectors Vf comprises nf elements, wherein
each of said elements is a real or integer number, and represents one of
a perceived musical characteristic corresponding to the style, genre, rhythm, tempo, or instrumentation; or
a perceived emotional characteristic corresponding to the mood
of the respective digital audio signal (2) or segment (3), and wherein 1 ≤ nf ≤ 256, more preferably 1 ≤ nf ≤ 100, more preferably 1 ≤ nf ≤ 34.
  7. A method according to any one of claims 5 or 6, wherein calculating (1044) the respective pairwise distance Dp between said high-level feature vectors Vf comprises:

applying (1045) Dynamic Time Warping (DTW) between said high-level feature vectors Vf,

wherein the shorter pairwise distance Dp between the respective digital audio signals (2) or segments (3) in the vector space represents a higher degree of similarity.
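Editorial illustration (not part of the claims): Dynamic Time Warping, named in claim 7, aligns two sequences that may differ in length or pacing. The sketch below is the textbook dynamic-programming formulation, and it assumes each video is represented as a sequence of per-segment feature vectors Vf with Euclidean distance as the local cost; both are assumptions rather than details taken from the claims.

```python
import math
from typing import Sequence


def dtw_distance(seq_a: Sequence[Sequence[float]], seq_b: Sequence[Sequence[float]]) -> float:
    """Accumulated cost of the cheapest monotonic alignment between two sequences."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] is the best alignment cost of the first i and first j elements.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Two sequences that differ only by a repeated element, such as `[[0], [1], [2]]` and `[[0], [1], [1], [2]]`, get a distance of zero because the repeated element is absorbed by the warping.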


  8. A method according to any one of claims 5 to 7, wherein calculating (1043) said at least one high-level feature vector Vf for each digital audio signal (2) or segment (3) further comprises:

calculating (1041) at least one 2-dimensional low-level audio feature matrix (8) for each digital audio signal (2) or segment (3) based on their respective at least one temporal sequence (4) of low-level audio features,

feeding (1042) at least one of said low-level audio feature matrices (8) or said digital audio signal (2) or segment (3) into a Machine Learning (ML) engine; and

calculating (1043), using the respective output of said Machine Learning (ML) engine, at least one high-level feature vector Vf for each digital audio signal (2) or segment (3);

wherein at least one of said low-level audio features is a Mel Frequency Cepstrum Coefficient (MFCC) vector, a Mel-spectrogram, a Constant-Q transform, a Variable-Q transform, or a Short Time Fourier Transform (STFT).
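Editorial illustration (not part of the claims): one of the low-level features listed in claim 8 is the Short Time Fourier Transform, which naturally produces a 2-dimensional matrix (frames by frequency bins) of the kind referred to in step (1041). The sketch below uses a naive discrete Fourier transform in pure Python for clarity; the function name, the Hann window, and the frame and hop sizes are illustrative assumptions, and a practical system would use an FFT library.

```python
import cmath
import math
from typing import List, Sequence


def stft_magnitude(samples: Sequence[float], frame_size: int = 8, hop: int = 4) -> List[List[float]]:
    """Return a frames x bins matrix of STFT magnitudes (naive DFT, Hann window)."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    matrix = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        row = []
        # Real input: only bins 0..frame_size/2 carry independent information.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            row.append(abs(acc))
        matrix.append(row)
    return matrix
```

Each row is one time frame and each column one frequency bin, so the matrix could be handed to a downstream model as a single-channel image.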


  9. A method according to any one of claims 1 to 8, wherein said videos (1) in said initial set comprise pieces of metadata (10), each piece of said metadata (10) comprising textual information associated with the respective video such as title, description, tags, keywords, or MPEG-7 metadata, the method further comprising:

extracting (201) metadata (10) from each of said plurality of videos (1);

calculating (202) a metadata similarity index (11) between each of said plurality of videos (1) based on the degree of similarity between their respective metadata (10);

wherein said ranking (7) of the rest of the initial set of videos (1) is determined (106) by ensembling (203) the calculations of the respective similarity indexes of each video with respect to said seed video.
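Editorial illustration (not part of the claims): the metadata similarity index (11) of claim 9 could be computed in many ways. The sketch below assumes a Jaccard index over lower-cased word tokens of the textual metadata, which is one simple choice and not something the claim specifies; the function names are hypothetical.

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and keep alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def metadata_similarity(meta_a: str, meta_b: str) -> float:
    """Jaccard index of the token sets: shared tokens divided by all tokens."""
    tokens_a, tokens_b = tokenize(meta_a), tokenize(meta_b)
    if not tokens_a and not tokens_b:
        return 0.0  # Assumption: two empty metadata strings carry no evidence of similarity.
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

For example, "Live jazz concert" and "jazz concert highlights" share two of four distinct tokens, giving an index of 0.5.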


  10. A method according to any one of claims 1 to 9, the method further comprising:

collecting (301) online data (13) by analyzing online sources referring to said plurality of videos (1), said online data (13) representing similarities between said plurality of videos (1) based on at least one of Collaborative Filtering (CF), and associated editorial content;

calculating (302) an online similarity index (14) between each of said plurality of videos (1) based on the online data (13);

wherein said ranking (7) of the rest of the initial set of videos (1) is determined (106) by ensembling (303) the calculations of the respective similarity indexes of each video with respect to said seed video.
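Editorial illustration (not part of the claims): claim 10 mentions Collaborative Filtering as one source of the online similarity index (14). A minimal item-to-item sketch, assuming the collected online data (13) reduces to the set of users who interacted with each video, is the cosine similarity between those binary interaction sets. The function name and the data shape are assumptions.

```python
import math
from typing import Set


def cf_similarity(viewers_a: Set[str], viewers_b: Set[str]) -> float:
    """Cosine similarity between two videos viewed as binary user-interaction vectors."""
    if not viewers_a or not viewers_b:
        return 0.0  # No interactions recorded: treat as no evidence of similarity.
    shared = len(viewers_a & viewers_b)
    return shared / math.sqrt(len(viewers_a) * len(viewers_b))
```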


  11. A method according to any one of claims 1 to 10, the method further comprising
receiving (105) said query Q from a user (16);
extracting (401) user preference data (17) associated with said user (16) from a user profile database (18), said user preference data (17) representing said given user's preferences regarding the ranking (7) of said plurality of videos (1);
adjusting (402) said ranking (7) of the rest of the initial set of videos (1) according to said user preference data (17);
returning (107) to said user (16), as a reply to said query Q, an ordered set of videos (1) according to said adjusted ranking (7A).
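Editorial illustration (not part of the claims): the adjustment step (402) of claim 11 can be sketched as blending each video's similarity score with a per-user preference score before sorting. The linear blend, the `preference_weight` parameter, and the representation of user preference data (17) as a score per video are all assumptions made for illustration.

```python
from typing import Dict, List


def adjust_ranking(
    base_scores: Dict[str, float],
    preference: Dict[str, float],
    preference_weight: float = 0.5,
) -> List[str]:
    """Re-rank videos by a weighted blend of similarity and user preference."""
    if not 0.0 <= preference_weight <= 1.0:
        raise ValueError("preference_weight must lie in [0, 1]")
    adjusted = {
        # Videos without recorded preference data get a neutral preference of 0.
        vid: (1.0 - preference_weight) * score + preference_weight * preference.get(vid, 0.0)
        for vid, score in base_scores.items()
    }
    return sorted(adjusted, key=lambda vid: (-adjusted[vid], vid))
```

With a weight of 0 the original ranking is returned unchanged, while larger weights let the user profile reorder the results.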
  12. A method according to any one of claims 1 to 11, the method further comprising
extracting (501) a digital visual signal (19) from each of said plurality of videos (1);
optionally dividing (502) each digital visual signal (19) into a plurality of visual segments (20), according to the segmentation of the respective digital audio signal (2) of said video;
processing said digital visual signals (19) to calculate (503) at least one visual feature vector Vfv for each digital visual signal (19) or segment (20);
calculating (504) a visual similarity index (21) between each of said plurality of videos (1) based on the respective pairwise distance Dpv between their associated visual feature vectors Vfv in the vector space, wherein the shorter pairwise distance Dpv results in a higher visual similarity index (21) between the respective videos (1);
wherein said ranking (7) of the rest of the initial set of videos (1) is determined (106) by ensembling (505) the calculations of the respective similarity indexes of each video with respect to said seed video.
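Editorial illustration (not part of the claims): claims 9, 10 and 12 each determine the ranking by "ensembling" several similarity indexes with respect to the seed video. The claims do not fix the ensembling rule; the sketch below assumes a weighted average, with hypothetical index names and weights.

```python
from typing import Dict


def ensemble_scores(
    indexes: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> Dict[str, float]:
    """Weighted average of several similarity indexes, per candidate video."""
    if not indexes:
        raise ValueError("at least one similarity index is required")
    total = sum(weights[name] for name in indexes)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Only score videos for which every index provides a value.
    videos = set.intersection(*(set(scores) for scores in indexes.values()))
    return {
        vid: sum(weights[name] * scores[vid] for name, scores in indexes.items()) / total
        for vid in videos
    }
```

Sorting the returned scores in descending order gives the combined ranking; changing the weights shifts how much each index (audio, metadata, online, visual) contributes.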
  13. A method according to any one of claims 1 to 13, wherein determining (106) said ranking (7) of the rest of the initial set of videos (1) is solely based on the audio similarity index (5) of each video with respect to said seed video.
  14. A computer-based system (30) for recommending video content, the system comprising:

a storage medium (31) configured to store a plurality of videos (1);

an input device (36) configured to receive a query Q from a user (16) comprising reference to a seed video, said seed video being one of said plurality of videos (1);

a processor (32) configured to execute the steps of a method according to any one of claims 1 to 13; and

a display device (38) controlled by said processor (32) and configured to show to said user (16), as a reply to said query Q, an ordered set of videos (1) according to the ranking (7) determined by executing the steps of said method.


  15. A computer-based system (30) according to claim 14, wherein
said display device (38) is configured to display, as part of a user interface (380), a seed video selector area (384) comprising a plurality of visual representations T1..n, each visual representation T representing one video from an initial set of videos (1),
said input device (36) is configured to allow a user (16) to select one of the visual representations T of said videos (1) from the seed video selector area (384), and to send a query Q to a computer-based system (30) comprising a reference to a seed video based on the selected visual representation (386), and
wherein said display device (38) is further configured to display, as part of said user interface (380), a video recommendation area (389) comprising a plurality of visual representations T1..m, wherein each visual representation T represents one video (1) from said initial set of videos (1), and wherein said plurality of visual representations T1..m are ordered according to a ranking (7) determined by executing the steps of a method according to any one of claims 1 to 13 by said processor (32) on said computer based system (30).
  16. A computer-based system (30) according to claim 15, wherein
said display device (38) is further configured to display, as part of said user interface (380), a recommendation adjustment area (390) comprising visual means for dynamically adjusting the order of said visual representations T1..m in said video recommendation area (389), wherein said adjustment is achieved by one of
adjusting the weight with which user preference data (17) is taken into account when calculating said ranking (7), or
adjusting the weight with which different similarity indexes, such as a metadata similarity index, an online similarity index, or a visual similarity index, are taken into account during ensembling calculations for determining said ranking (7),
wherein said visual means comprise at least one of a graphical element (such as a slider) or a numerical input field.