WO2021046191 - CLASSIFYING BUSINESS SUMMARIES AGAINST A HIERARCHICAL INDUSTRY CLASSIFICATION STRUCTURE USING SUPERVISED MACHINE LEARNING

Publication Number WO/2021/046191
Publication Date 11.03.2021
International Application No. PCT/US2020/049158
International Filing Date 03.09.2020
IPC
G06N 20/00 2019.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
20 Machine learning
CPC
G06F 17/16
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
17 Digital computing or data processing equipment or methods, specially adapted for specific functions
10 Complex mathematical operations
16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
G06F 17/18
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
17 Digital computing or data processing equipment or methods, specially adapted for specific functions
10 Complex mathematical operations
18 for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
G06F 40/279
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
40 Handling natural language data
20 Natural language analysis
279 Recognition of textual entities
G06F 40/30
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
40 Handling natural language data
30 Semantic analysis
G06K 9/6262
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
9 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
62 Methods or arrangements for recognition using electronic means
6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
6262 Validation, performance evaluation or active pattern learning techniques
G06K 9/627
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
9 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
62 Methods or arrangements for recognition using electronic means
6267 Classification techniques
6268 relating to the classification paradigm, e.g. parametric or non-parametric approaches
627 based on distances between the pattern to be recognised and training or reference patterns
Applicants
  • THE DUN & BRADSTREET CORPORATION [US]/[US]
Inventors
  • ZHILTSOV, Nikita
Agents
  • GREELEY, Paul, D.
Priority Data
16/559,963 04.09.2019 US
Publication Language English (EN)
Filing Language English (EN)
Title
(EN) CLASSIFYING BUSINESS SUMMARIES AGAINST A HIERARCHICAL INDUSTRY CLASSIFICATION STRUCTURE USING SUPERVISED MACHINE LEARNING
(FR) CLASSIFICATION DE RÉSUMÉS COMMERCIAUX ADOSSÉE À UNE STRUCTURE DE CLASSIFICATION D'INDUSTRIE HIÉRARCHIQUE À L'AIDE D'UN APPRENTISSAGE MACHINE SUPERVISÉ
Abstract
(EN)
A classification system is provided for classifying text-based business summaries, referred to herein as "summaries," against a hierarchical industry classification structure. The classification system includes a word-based sub classifier that uses a neural network to generate a vector space for each summary in a training set, where each summary in the training set is known to correspond to a particular industry classification in the hierarchical industry classification structure. Weight values in the hidden layer of a neural network used by the word-based sub classifier are changed to improve the predictive capabilities of the neural network in the business summary classification context. Embodiments include increasing representation in the training set for underrepresented parent industry classifications and attributes of the hierarchical industry classification structure, such as distances between industry classifications and whether industry classifications are in the same subgraph. The completion of training of the word-based sub classifier is based upon whether a performance metric, such as an hF1 score, satisfies one or more early stopping criteria. The classification system also includes a category-based sub classifier and a meta classifier.
(FR)
L'invention concerne un système de classification destiné à classifier des résumés commerciaux à base de texte, ici appelés « résumés », adossé une structure de classification d'industrie hiérarchique. Le système de classification comprend un sous-classificateur basé sur des mots qui utilise un réseau neuronal pour générer un espace vectoriel pour chaque résumé dans un ensemble d'apprentissage, chaque résumé de l'ensemble d'apprentissage étant connu pour correspondre à une classification d'industrie particulière de la structure de classification d'industrie hiérarchique. Des valeurs de poids dans la couche cachée d'un réseau neuronal utilisé par le sous-classificateur basé sur des mots sont modifiées pour améliorer les capacités prédictives du réseau neuronal dans le contexte de classification de résumé commercial. Des modes de réalisation comprennent une augmentation de la représentation dans l'ensemble d'apprentissage pour des classifications et des attributs d'industrie parent sous-représentés de la structure de classification d'industrie hiérarchique, tels que des distances entre des classifications industrielles et si des classifications industrielles se situent dans le même sous-graphe. L'achèvement de l'apprentissage du sous-classificateur basé sur des mots est basé sur le fait qu'une mesure de performance, telle qu'un score hF1, satisfait un ou plusieurs critères d'arrêt précoces. Le système de classification comprend également un sous-classificateur basé sur des catégories et un méta-classificateur.
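Illustrative sketch (not part of the published record): the abstract names an hF1 score as the performance metric checked against early stopping criteria, but does not define either. The Python below assumes the commonly used hierarchical F1, in which each true and predicted classification is expanded with all of its ancestors before precision and recall are computed, and a simple patience-based stopping rule. The function names, the `parent` mapping, the `patience` and `min_delta` parameters, and the sample classification codes are all hypothetical and are not taken from the application.

```python
from typing import Dict, List, Optional, Set, Tuple


def with_ancestors(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the label together with all of its ancestors (assumes a tree)."""
    nodes: Set[str] = set()
    current: Optional[str] = label
    while current is not None:
        nodes.add(current)
        current = parent.get(current)
    return nodes


def hierarchical_f1(
    pairs: List[Tuple[str, str]], parent: Dict[str, Optional[str]]
) -> float:
    """Compute hF1 over (true_label, predicted_label) pairs.

    Assumed definition: precision and recall are computed on the
    ancestor-augmented label sets, pooled over all examples.
    """
    overlap = predicted_total = true_total = 0
    for true_label, predicted_label in pairs:
        true_set = with_ancestors(true_label, parent)
        predicted_set = with_ancestors(predicted_label, parent)
        overlap += len(true_set & predicted_set)
        predicted_total += len(predicted_set)
        true_total += len(true_set)
    if overlap == 0:
        return 0.0
    precision = overlap / predicted_total
    recall = overlap / true_total
    return 2 * precision * recall / (precision + recall)


def should_stop(
    history: List[float], patience: int = 3, min_delta: float = 1e-3
) -> bool:
    """Hypothetical early stopping rule on per-epoch validation hF1.

    Stops when none of the last `patience` epochs beat the best earlier
    score by at least `min_delta`.
    """
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    return max(history[-patience:]) < best_before + min_delta


# Hypothetical three-level hierarchy: child code -> parent code.
parent = {"23": None, "236": "23", "2361": "236", "237": "23", "2371": "237"}

# One exact match and one miss that still shares the top-level ancestor.
pairs = [("2361", "2361"), ("2361", "2371")]
score = hierarchical_f1(pairs, parent)
```

Under this definition a prediction in the wrong leaf but the right top-level branch still earns partial credit, which is why a hierarchy-aware metric can be preferable to flat accuracy when monitoring training on a classification tree.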