1. WO2012057773 - GENERATING A TAXONOMY FROM UNSTRUCTURED INFORMATION

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

CLAIMS

What we claim is:

1. A method [200A] for generating a taxonomy from unstructured information, said method comprising:

extracting [202] at least one term from unstructured information [122];

validating [204] said at least one term [124];

determining [206] a sense of at least one extracted and validated term [108];

clustering [208] said at least one extracted and validated term [108] into at least one group [112] of terms according to said determined sense; and

generating [210] a taxonomy [120] based on said clustering and a mining of accessible taxonomies.
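The five steps of Claim 1 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: the toy sense inventory `SENSES`, the toy taxonomy `EXISTING_TAXONOMY`, and every function name are hypothetical, and each step uses the simplest logic that fits the claim language.

```python
from collections import defaultdict

# Hypothetical toy sense inventory: term -> candidate senses.
SENSES = {
    "python": ["programming_language", "snake"],
    "java": ["programming_language", "island"],
    "rust": ["programming_language"],
    "cobra": ["snake"],
}
# Hypothetical accessible taxonomy to mine: sense -> parent category.
EXISTING_TAXONOMY = {"programming_language": "software", "snake": "reptile"}


def extract_terms(text):
    """Extracting [202]: naive whitespace tokenization, lowercased."""
    return [tok.strip(".,;").lower() for tok in text.split()]


def validate_terms(terms):
    """Validating [204]: keep known terms, de-duplicated, in order."""
    return list(dict.fromkeys(t for t in terms if t in SENSES))


def determine_senses(terms):
    """Determining [206]: unambiguous terms vote for their sense;
    each term then takes its most-voted candidate sense."""
    votes = defaultdict(int)
    for term in terms:
        if len(SENSES[term]) == 1:
            votes[SENSES[term][0]] += 1
    return {t: max(SENSES[t], key=lambda s: votes[s]) for t in terms}


def cluster_terms(term_senses):
    """Clustering [208]: group terms that share a determined sense."""
    groups = defaultdict(list)
    for term, sense in term_senses.items():
        groups[sense].append(term)
    return dict(groups)


def generate_taxonomy(text):
    """Generating [210]: attach each group under the parent category
    found in the mined taxonomy (or 'unknown' if none is found)."""
    groups = cluster_terms(determine_senses(validate_terms(extract_terms(text))))
    taxonomy = defaultdict(dict)
    for sense, terms in groups.items():
        taxonomy[EXISTING_TAXONOMY.get(sense, "unknown")][sense] = terms
    return dict(taxonomy)
```

In this sketch, calling `generate_taxonomy("Rust and Python. Java, too.")` places all three terms under the hypothetical `software` category, because the unambiguous term "rust" resolves the ambiguous terms "python" and "java".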

2. The method [200A] of Claim 1, further comprising:

assigning [212] a probability value to said at least one group [112] of terms.

3. The method [200A] of Claim 1, wherein said determining [206] a sense of at least one extracted and validated term comprises:

determining a shared sense of a first set of said at least one extracted and validated term [108] that is unambiguous.

4. The method [200A] of Claim 3, further comprising:

based on a determined shared sense, disambiguating a second set of said at least one extracted and validated term [108] that is ambiguous.

5. The method [200A] of Claim 1, wherein said validating [204] at least one term [124] comprises:

estimating a probability of a co-occurrence of said at least one extracted term, based on at least one language model.
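As an illustration of estimating a co-occurrence probability from a language model, the following sketch uses a bigram model with add-one smoothing. The claim does not specify the type of language model, so the bigram choice, the smoothing, and the function names `train_bigram_model` and `cooccurrence_probability` are all assumptions made for this example.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams and adjacent word pairs in a small corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def cooccurrence_probability(first, second, unigrams, bigrams):
    """Estimate P(second | first) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(first, second)] + 1) / (unigrams[first] + vocab_size)
```

A validator built on this sketch might accept a candidate multi-word term such as "hard disk" when the estimate exceeds a chosen threshold; the threshold itself would be a further assumption.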

6. The method [200A] of Claim 1, wherein said validating [204] at least one term [124] comprises:

estimating a probability that a first term of said at least one extracted term is related to a second term of said at least one extracted term and belongs to a domain.

7. The method [200A] of Claim 1, wherein said generating [210] a taxonomy based on said clustering and a mining of taxonomies comprises:

generating said taxonomy [120] that is in a human readable format.

8. The method [200A] of Claim 1, wherein said generating [210] a taxonomy [120] based on said clustering and a mining of taxonomies comprises:

generating said taxonomy [120] that is in a computer readable format.

9. The method [200A] of Claim 1, wherein said clustering [208] said at least one extracted and validated term [108] into at least one group [112] of terms according to said determining [206] said sense comprises:

grouping together terms with shared hypernyms.
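Grouping by shared hypernyms can be sketched as inverting a term-to-hypernym map. The `HYPERNYMS` table and the function name `group_by_shared_hypernym` are hypothetical; a real system would presumably draw hypernyms from a lexical resource, which the claim does not name.

```python
from collections import defaultdict

# Hypothetical hypernym table: term -> set of broader terms.
HYPERNYMS = {
    "oak": {"tree", "plant"},
    "pine": {"tree", "plant"},
    "rose": {"flower", "plant"},
    "tulip": {"flower", "plant"},
}


def group_by_shared_hypernym(terms, hypernyms=HYPERNYMS):
    """Return hypernym -> sorted terms, keeping only hypernyms
    shared by at least two of the given terms."""
    groups = defaultdict(set)
    for term in terms:
        for hypernym in hypernyms.get(term, ()):
            groups[hypernym].add(term)
    return {h: sorted(ts) for h, ts in groups.items() if len(ts) >= 2}
```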

10. The method [200A] of Claim 1, wherein said clustering [208] said at least one extracted and validated term [108] into at least one group [112] of terms according to said determining [206] said sense comprises:

grouping synonymous terms into synonym rings.
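One way to picture synonym rings is as the connected components of a synonymy relation. The sketch below assumes synonymy is supplied as term pairs and merges them with a small union-find structure; the input format and the function name `synonym_rings` are assumptions for illustration only.

```python
def synonym_rings(pairs):
    """Merge synonym pairs into rings (connected components).

    Returns a list of alphabetically sorted rings, ordered by
    each ring's first term.
    """
    parent = {}

    def find(term):
        # Register unseen terms as their own root, then walk to the
        # root with path halving.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for a, b in pairs:
        parent[find(a)] = find(b)

    rings = {}
    for term in list(parent):
        rings.setdefault(find(term), set()).add(term)
    return sorted((sorted(ring) for ring in rings.values()), key=lambda r: r[0])
```

For example, the pairs ("car", "automobile") and ("automobile", "auto") end up in a single ring, so a search for any one of the three terms could be expanded to the other two.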

11. The method [200A] of Claim 1, wherein said clustering [208] said at least one extracted and validated term [108] into at least one group [112] of terms according to said determining [206] said sense comprises:

grouping together terms with shared senses.

12. A system [100] comprising:

a term extractor [104] configured for extracting at least one term [124] from unstructured information [122];

a term validater [106] configured for validating said at least one term [124];

a sense determiner [126] configured for determining a sense of at least one extracted and validated term [108];

a term clusterer [110] configured for clustering said at least one extracted and validated term [108] into at least one group [112] of terms according to a determined sense; and

a taxonomy generator [118] configured for generating a taxonomy [120] based on said clustering and a mining of taxonomies [102].

13. The system [100] of Claim 12, wherein said sense determiner [126] comprising:

a shared sense determiner [114] configured for determining a shared sense of a first set of said at least one extracted and validated term [108] that is unambiguous; and

a term disambiguater [116] configured for, based on a determined shared sense, disambiguating a second set of said at least one extracted and validated term [108] that is ambiguous.

14. The system [100] of Claim 12, wherein said unstructured information [122] is a document comprising text.

15. A non-transitory computer-readable storage medium comprising instructions stored thereon which, when executed by a computer system, cause said computer system to perform a method [200B] for generating a taxonomy from unstructured information [122], said method comprising:

extracting [214] at least one term [124] from unstructured information [122];

validating [216] said at least one term [124];

determining [218] a sense of at least one extracted and validated term [108], said determining comprising:

determining a shared sense of a first set of said at least one extracted and validated term [108] that is unambiguous; and

based on a determined shared sense, disambiguating a second set of said at least one extracted and validated term [108] that is ambiguous;

clustering [220] said at least one extracted and validated term [108] into at least one group of terms according to said determined sense; and

generating [222] a taxonomy [120] based on said clustering and a mining of taxonomies.