1. WO2020197636 - ENCODER USING MACHINE-TRAINED TERM FREQUENCY WEIGHTING FACTORS THAT PRODUCES A DENSE EMBEDDING VECTOR

Publication Number WO/2020/197636
Publication Date 01.10.2020
International Application No. PCT/US2020/016305
International Filing Date 03.02.2020
IPC
G06F 16/33 2019.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
30 of unstructured textual data
33 Querying
G06N 3/04 2006.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
02 using neural network models
04 Architecture, e.g. interconnection topology
CPC
G06F 16/245
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
20 of structured data, e.g. relational data
24 Querying
245 Query processing
G06F 16/248
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
20 of structured data, e.g. relational data
24 Querying
248 Presentation of query results
G06F 16/3347
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
30 of unstructured textual data
33 Querying
3331 Query processing
334 Query execution
3347 using vector based model
G06F 17/16
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
17 Digital computing or data processing equipment or methods, specially adapted for specific functions
10 Complex mathematical operations
16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
G06K 9/325
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
9 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
20 Image acquisition
32 Aligning or centering of the image pick-up or image-field
3233 Determination of region of interest
325 Detection of text region in scene imagery, real life image or Web pages, e.g. license plates, captions on TV images
G06K 9/627
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
9 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
62 Methods or arrangements for recognition using electronic means
6267 Classification techniques
6268 relating to the classification paradigm, e.g. parametric or non-parametric approaches
627 based on distances between the pattern to be recognised and training or reference patterns
Applicants
  • MICROSOFT TECHNOLOGY LICENSING, LLC [US]/[US]
Inventors
  • WANG, Yan
  • WU, Ye
  • HU, Houdong
  • ULABALA, Surendra
  • THAKKAR, Vishal
  • SACHETI, Arun
Agents
  • MINHAS, Sandip, S.
  • ADJEMIAN, Monica
  • BARKER, Doug
  • CHATTERJEE, Aaron C.
  • CHEN, Wei-Chen Nicholas
  • CHOI, Daniel
  • CHURNA, Timothy
  • DINH, Phong
  • EVANS, Patrick
  • GABRYJELSKI, Henry
  • GUPTA, Anand
  • HINOJOSA-SMITH, Brianna L.
  • HWANG, William C.
  • JARDINE, John S.
  • LEE, Sunah
  • LEMMON, Marcus
  • MARQUIS, Thomas
  • MEYERS, Jessica
  • ROPER, Brandon
  • SPELLMAN, Steven
  • SULLIVAN, Kevin
  • SWAIN, Cassandra T.
  • WALKER, Matt
  • WIGHT, Stephen A.
  • WISDOM, Gregg
  • WONG, Ellen
  • WONG, Thomas S.
  • ZHANG, Hannah
  • TRAN, Kimberly
Priority Data
16/368,798 28.03.2019 US
Publication Language English (EN)
Filing Language English (EN)
Title
(EN) ENCODER USING MACHINE-TRAINED TERM FREQUENCY WEIGHTING FACTORS THAT PRODUCES A DENSE EMBEDDING VECTOR
(FR) CODEUR UTILISANT DES FACTEURS DE PONDÉRATION DE FRÉQUENCE À TERME ENTRAÎNÉS PAR MACHINE QUI PRODUIT UN VECTEUR D'INCORPORATION DENSE
Abstract
(EN)
A computer-implemented technique is described herein for generating a dense embedding vector that provides a distributed representation of input text. In one implementation, the technique includes: generating an input term-frequency (TF) vector of dimension g that includes frequency information relating to frequency of occurrence of terms in an instance of input text; using a TF-modifying component to modify the term-specific frequency information in the input TF vector by respective machine-trained weighting factors, to produce an intermediate vector of dimension g; and using a projection component to project the intermediate vector of dimension g into an embedding vector of dimension k, where k is less than g. Both the TF-modifying component and the projection component can use respective machine-trained neural networks. An application component can perform any of a retrieval-based function, a recognition-based function, a recommendation-based function, a classification-based function, etc. based on the embedding vector.
(FR)
L'invention concerne une technique mise en œuvre par ordinateur permettant de générer un vecteur d'incorporation dense qui fournit une représentation de distribution de texte d'entrée. Selon un mode de réalisation, la technique consiste : à générer un vecteur de fréquence à terme d'entrée (TF) de dimension g qui comprend des informations de fréquence relatives à la fréquence d'occurrence de termes dans une instance de texte d'entrée ; à utiliser une modification de TF pour modifier les informations de fréquence spécifiques à un terme dans le vecteur de TF d'entrée par des facteurs de pondération respectifs entraînés par machine, pour produire un vecteur intermédiaire de dimension g, à utiliser un composant de projection pour projeter le vecteur intermédiaire de dimension g dans un vecteur d'incorporation de dimension k, k étant inférieur à g. Le composant de modification de TF et le composant de projection peuvent tous deux utiliser des réseaux neuronaux entraînés par machine respectifs. Un composant d'application peut effectuer n'importe quelle fonction parmi une fonction basée sur une récupération, une fonction basée sur une reconnaissance, une fonction basée sur une recommandation, une fonction basée sur une classification, etc. en fonction du vecteur d'incorporation.
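The three steps named in the abstract (build a TF vector of dimension g, scale each term's frequency by its own weighting factor, project down to dimension k) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the claimed implementation: the vocabulary, the weighting factors, and the projection matrix below are hypothetical toy values standing in for machine-trained parameters, the function names are invented for this sketch, and a plain element-wise product and a single linear map are used in place of the neural networks the abstract mentions.

```python
from collections import Counter
import random


def tf_vector(text, vocab):
    """Count how often each vocabulary term occurs in the text (dimension g)."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocab]


def modify_tf(tf, weights):
    """Scale each term's frequency by its own weighting factor (still dimension g)."""
    return [f * w for f, w in zip(tf, weights)]


def project(vec, matrix):
    """Project a g-dimensional vector to k dimensions using a k x g matrix."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def encode(text, vocab, weights, matrix):
    """Chain the three steps: TF vector -> weighted TF vector -> embedding."""
    return project(modify_tf(tf_vector(text, vocab), weights), matrix)


# Hypothetical toy parameters; in the described technique these would be learned.
vocab = ["dense", "embedding", "vector", "the", "of"]   # g = 5
weights = [1.5, 1.2, 1.0, 0.1, 0.1]                     # one factor per term
rng = random.Random(0)
matrix = [[rng.uniform(-1, 1) for _ in vocab] for _ in range(2)]  # k = 2 < g

embedding = encode("the dense embedding of the vector", vocab, weights, matrix)
print(len(embedding))  # k entries, one per output dimension
```

In this toy setup the low weights on "the" and "of" damp frequent but uninformative terms before projection, which is one plausible reading of why per-term weighting precedes the dimensionality reduction.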