1. EP1830281 - KEYWORD EXTRACTING DEVICE

Office
European Patent Office
Application Number 05793129
Application Date 11.10.2005
Publication Number 1830281
Publication Date 05.09.2007
Publication Kind A1
IPC
G06F 17/30
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
17 Digital computing or data processing equipment or methods, specially adapted for specific functions
30 Information retrieval; Database structures therefor
CPC
G06F 16/313
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
30 of unstructured textual data
31 Indexing; Data structures therefor; Storage structures
313 Selection or weighting of terms for indexing
Applicants INTELLECTUAL PROPERTY BANK
Inventors MASUYAMA HIROAKI
SATO HARU-TADA
ASADA MAKOTO
HASUKO KAZUMI
HOTTA HIDEAKI
Designated States
Priority Data 2004322924 05.11.2004 JP
Title
(DE) SCHLÜSSELWORT-EXTRAKTIONSEINRICHTUNG
(EN) KEYWORD EXTRACTING DEVICE
(FR) DISPOSITIF D'EXTRACTION DE MOTS-CLÉ
Abstract
(EN) A keyword extraction device for extracting keywords from a document group including a plurality of documents. The device comprises: index term extraction means for extracting index terms from data of the document group; high-frequency term extraction means for calculating, for each index term, a weight that includes an evaluation of the term's appearance frequency in the document group, and for extracting as high-frequency terms the index terms having a large weight; high-frequency term/index term co-occurrence degree calculating means for calculating a co-occurrence degree of each high-frequency term and each index term in the document group on the basis of the presence or absence of co-occurrence of the two terms in each document; clustering means for creating clusters by classifying the high-frequency terms on the basis of the calculated co-occurrence degree; score calculating means for calculating a score of each index term such that a higher score is given to an index term that co-occurs with high-frequency terms belonging to more clusters and that co-occurs with high-frequency terms in more documents; and keyword extraction means for extracting keywords on the basis of the calculated scores.
(FR) The keyword extraction device comprises: high-frequency term extraction means (30) for extracting, from the index terms (w) contained in a document group (E) consisting of documents (D), high-frequency terms, which are index terms having a large weight whose evaluation includes their frequency in the document group (E); clustering means (50) for clustering the high-frequency terms according to a co-occurrence degree (C) based on the presence or absence of co-occurrence, in units of a document, with the index terms (w) in the document group (E); score calculating means (70) for calculating the score key (w) of each index term (w) while rating highly those index terms (w) that co-occur with high-frequency terms belonging to more clusters (g) and with high-frequency terms appearing in more documents (D); and keyword extraction means (90) for extracting keywords according to the score. Keywords indicating the characteristics of a document group can thus be extracted automatically.
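
The processing chain described in the abstract (index terms, high-frequency terms, co-occurrence degrees, clusters, scores, keywords) can be illustrated with a minimal Python sketch. This is not the patented implementation: the abstract does not give the weighting, clustering, or scoring formulas, so everything concrete here is an assumption made for illustration. In particular, the function name extract_keywords, the parameters n_high, sim_threshold and n_keywords, the use of document frequency as the weight, cosine similarity with greedy threshold grouping as the clustering step, and the product "clusters touched x shared documents" as the score are all hypothetical choices.

```python
from collections import Counter
from math import sqrt


def extract_keywords(docs, n_high=4, sim_threshold=0.8, n_keywords=3):
    """docs: list of token lists. Returns (keywords, clusters, scores)."""
    doc_sets = [set(d) for d in docs]

    # Index term extraction: here simply every distinct token (assumption).
    index_terms = sorted(set().union(*doc_sets))

    # High-frequency terms: weight = document frequency (assumption),
    # ties broken alphabetically so the result is deterministic.
    df = Counter(t for s in doc_sets for t in s)
    high = sorted(index_terms, key=lambda t: (-df[t], t))[:n_high]

    # Co-occurrence degree of a high-frequency term h and an index term w:
    # the number of documents in which both are present.
    cooc = {
        h: [sum(1 for s in doc_sets if h in s and w in s) for w in index_terms]
        for h in high
    }

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    # Clustering (assumption): greedy grouping; a term joins the first
    # cluster holding a member with a similar co-occurrence vector.
    clusters = []
    for h in high:
        for c in clusters:
            if any(cosine(cooc[h], cooc[m]) >= sim_threshold for m in c):
                c.append(h)
                break
        else:
            clusters.append([h])

    # Score (assumption): clusters containing a co-occurring high-frequency
    # term, times documents shared with any other high-frequency term.
    scores = {}
    for w in index_terms:
        touched = sum(
            1 for c in clusters
            if any(h != w and h in s and w in s for h in c for s in doc_sets)
        )
        shared_docs = sum(
            1 for s in doc_sets
            if w in s and any(h != w and h in s for h in high)
        )
        scores[w] = touched * shared_docs

    # Keyword extraction: the top-scoring index terms.
    keywords = sorted(index_terms, key=lambda t: (-scores[t], t))[:n_keywords]
    return keywords, clusters, scores
```

Under these assumptions, a term that appears alongside high-frequency terms from several different clusters outranks a term confined to a single cluster, which mirrors the abstract's preference for index terms that co-occur with high-frequency terms belonging to more clusters and in more documents.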