1. CN101069177 - Keyword extracting device

Office
China
Application Number 200580037260.5
Application Date 11.10.2005
Publication Number 101069177
Publication Date 07.11.2007
Publication Kind A
IPC
G06F 17/30
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
17 Digital computing or data processing equipment or methods, specially adapted for specific functions
30 Information retrieval; Database structures therefor
CPC
G06F 16/313
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
30 of unstructured textual data
31 Indexing; Data structures therefor; Storage structures
313 Selection or weighting of terms for indexing
Applicants Intellectual Property Bank Cor
株式会社IPB
Inventors Masuyama Hiroaki
增山博昭
Sato Harutada
佐藤晴正
Asada Makoto
浅田诚
Hasuko Kazumi
莲子和巳
Hotta Hideaki
堀田任晃
Agents lujin hua xieli na
中原信达知识产权代理有限责任公司
Priority Data 322924/2004 05.11.2004 JP
Title
(EN) Keyword extracting device
(ZH) 关键字抽取装置
Abstract
(EN) A keyword extracting device comprises: high-frequency word extracting means (30) for extracting, from the index words (w) contained in a document group (E) consisting of a plurality of documents (D), high-frequency words, which are index words given a large weight by an evaluation that includes the level of their occurrence frequency in the document group (E); clustering means (50) for clustering the high-frequency words according to a co-occurrence degree (C) based on the presence or absence, in units of one document, of co-occurrence with each of the index words (w) in the document group (E); score calculating means (70) for calculating a score key(w) for each index word (w), rating more highly those index words (w) that co-occur with high-frequency words belonging to more clusters (g) and that co-occur with the high-frequency words in more documents (D); and keyword extracting means (90) for extracting keywords according to the score. With this, keywords indicating the features of a document group consisting of a plurality of documents can be extracted automatically.
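(ZH, translated)

A keyword extraction device comprising: a high-frequency word extraction unit (30) that extracts high-frequency words, namely those index words (w) contained in a document group (E) consisting of a plurality of documents (D) that have a large weight under an evaluation that includes the level of their occurrence frequency in the document group (E); a clustering unit (50) that clusters the high-frequency words based on a co-occurrence degree (C), which is based on the presence or absence of co-occurrence, on a per-document basis, with each of the index words (w) in the document group (E); a score calculation unit (70) that calculates, for each index word (w), a score key(w) that rates more highly those index words (w) that co-occur with high-frequency words belonging to more clusters (g) and that co-occur with the high-frequency words in more documents (D); and a keyword extraction unit (90) that extracts keywords based on the score. This makes it possible to automatically extract keywords that represent the features of a document group consisting of a plurality of documents.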
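
The four units named in the abstract (30, 50, 70, 90) can be illustrated with a minimal Python sketch. The abstract does not give formulas, so every concrete choice here is an assumption made for illustration only: the weight is plain document frequency, the co-occurrence degree C is approximated by the Jaccard overlap of per-document co-occurrence sets, clustering is single-link with a threshold, and key(w) is the number of clusters reached multiplied by the number of documents with a co-occurrence. The function name `extract_keywords` and its parameters are hypothetical and do not come from the patent.

```python
from collections import Counter
from itertools import combinations


def extract_keywords(docs, n_high=4, sim_threshold=0.5, n_keywords=3):
    """docs: list of sets of index words w, one set per document D."""
    # (30) High-frequency words: rank index words by document frequency
    # (assumption: the patent's weight is richer than a raw count).
    df = Counter(w for doc in docs for w in doc)
    by_freq = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    high = [w for w, _ in by_freq[:n_high]]

    # Per-document presence/absence co-occurrence: cooc[h] holds every index
    # word that shares at least one document with h (h itself included).
    cooc = {h: set().union(*(doc for doc in docs if h in doc)) for h in high}

    # (50) Clustering: single-link merge of high-frequency words whose
    # co-occurrence sets overlap enough (Jaccard similarity, an assumption).
    parent = {h: h for h in high}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(high, 2):
        union = cooc[a] | cooc[b]
        sim = len(cooc[a] & cooc[b]) / len(union) if union else 0.0
        if sim >= sim_threshold:
            parent[find(a)] = find(b)

    cluster_of = {h: find(h) for h in high}

    # (70) Score key(w): higher when w co-occurs with high-frequency words
    # from more clusters g, and does so in more documents D.
    scores = {}
    for w in df:
        clusters_hit = set()
        docs_hit = 0
        for doc in docs:
            if w not in doc:
                continue
            partners = [h for h in high if h in doc and h != w]
            if partners:
                docs_hit += 1
                clusters_hit.update(cluster_of[h] for h in partners)
        scores[w] = len(clusters_hit) * docs_hit

    # (90) Keyword extraction: take the top-scoring index words.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:n_keywords], scores, cluster_of


if __name__ == "__main__":
    example_docs = [
        {"engine", "fuel", "valve"},
        {"engine", "fuel", "pump"},
        {"battery", "motor", "valve"},
        {"battery", "motor", "pump"},
    ]
    keywords, scores, _ = extract_keywords(example_docs, n_keywords=2)
    print(keywords, scores)
```

In this toy input, words that appear alongside high-frequency words from both topic clusters are ranked above words confined to a single cluster, which is the behaviour the abstract attributes to the score calculating means.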