(EN)
A keyword extraction device comprises: high-frequency word extraction means (30) for extracting, from the index words (w) contained in a document group (E) consisting of a plurality of documents (D), high-frequency words, which are index words having a large weight under an evaluation that takes into account how frequently they occur in the document group (E); clustering means (50) for clustering the high-frequency words according to a co-occurrence degree (C) based on whether or not each high-frequency word co-occurs, in units of documents, with each of the index words (w) in the document group (E); score calculation means (70) for calculating a score key(w) for each index word (w), the score rating more highly those index words (w) that co-occur with high-frequency words belonging to more clusters (g) and that co-occur with the high-frequency words in more documents (D); and keyword extraction means (90) for extracting keywords on the basis of the score. Keywords representing the features of a document group consisting of a plurality of documents can thereby be extracted automatically.
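(ZH) A keyword extraction device comprising: a high-frequency word extraction unit (30) that extracts high-frequency words, which are those index words (w) contained in a document group (E) composed of a plurality of documents (D) that have a large weight under an evaluation that includes how high or low their frequency of occurrence in the document group (E) is; a clustering unit (50) that clusters the high-frequency words on the basis of a co-occurrence degree (C) founded on the presence or absence of co-occurrence, in units of documents, with each of the index words (w) in the document group (E); a score calculation unit (70) that calculates, for each index word (w), a score key(w) that rates more highly those index words (w) that co-occur with high-frequency words belonging to more of the clusters (g) and that co-occur with the high-frequency words in more documents (D); and a keyword extraction unit (90) that extracts keywords on the basis of the score. Keywords representing the features of a document group composed of a plurality of documents can thereby be extracted automatically.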
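
The four stages named in the abstract (high-frequency word extraction, clustering by co-occurrence, score calculation, keyword extraction) can be illustrated with a minimal Python sketch. The abstract does not give the actual weighting, similarity, clustering, or score formulas, so everything concrete here is an assumption made for illustration only: the hypothetical function `extract_keywords`, its parameters `n_high`, `sim_threshold` and `n_keywords`, the use of document frequency as the weight, cosine similarity with single-link threshold clustering, and a score equal to (number of clusters co-occurred with) × (number of documents with a co-occurrence).

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def extract_keywords(docs, n_high=4, sim_threshold=0.5, n_keywords=3):
    """Illustrative sketch only; formulas are assumptions, not the patent's."""
    doc_sets = [set(d) for d in docs]          # presence/absence per document
    vocab = sorted(set().union(*doc_sets))     # all index words w
    df = Counter(w for s in doc_sets for w in s)

    # Stage 1 (assumed weight): top n_high words by document frequency,
    # ties broken alphabetically.
    high = sorted(vocab, key=lambda w: (-df[w], w))[:n_high]

    # Co-occurrence degree (assumed): number of documents containing both words.
    def cooc(h, w):
        return sum(1 for s in doc_sets if h in s and w in s)

    vec = {h: [cooc(h, w) for w in vocab] for h in high}

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    # Stage 2 (assumed method): single-link clustering with a union-find;
    # two high-frequency words are linked when their vectors are similar.
    parent = {h: h for h in high}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(high, 2):
        if cosine(vec[a], vec[b]) >= sim_threshold:
            parent[find(a)] = find(b)

    groups = {}
    for h in high:
        groups.setdefault(find(h), set()).add(h)
    clusters = list(groups.values())

    # Stage 3 (assumed score): reward co-occurrence with more clusters
    # and in more documents; a word never counts as co-occurring with itself.
    scores = {}
    for w in vocab:
        n_clusters = sum(
            1 for g in clusters if any(h != w and cooc(h, w) > 0 for h in g)
        )
        n_docs = sum(
            1 for s in doc_sets
            if w in s and any(h != w and h in s for h in high)
        )
        scores[w] = n_clusters * n_docs

    # Stage 4: keywords are the highest-scoring index words.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked[:n_keywords], scores, clusters


docs = [
    ["engine", "fuel", "valve"],
    ["engine", "fuel", "pump"],
    ["engine", "fuel"],
    ["sensor", "signal", "valve"],
    ["sensor", "signal", "pump"],
    ["sensor", "signal"],
]
keywords, scores, clusters = extract_keywords(docs, n_keywords=2)
print(keywords, clusters)
```

In this toy document group the high-frequency words fall into two clusters ({engine, fuel} and {sensor, signal}), and "pump" and "valve" receive the highest scores because they co-occur with high-frequency words from both clusters, which is the behaviour the score calculating means (70) is described as favouring.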