WO2020167557 - NATURAL LANGUAGE QUERYING OF A DATA LAKE USING CONTEXTUALIZED KNOWLEDGE BASES

Note: Text based on automatic optical character recognition (OCR) processes. Only the PDF version has legal value.


CLAIMS

What is claimed is:

1. A method of querying a data lake using natural language, comprising the following steps:

receiving a natural language query directed to an electronic data lake;

parsing the natural language query to determine a plurality of entities within the natural language query;

identifying the plurality of entities using at least one contextual knowledge base, wherein the plurality of entities are compared against at least one entry in the at least one contextual knowledge base;

mapping a dependency of the plurality of identified entities based on the parsed natural language query;

constructing a structured data query based on the plurality of identified entities and the mapped dependency; and

automatically generating a visual output of a result of the structured data query based on at least one characteristic from the set of: a data type, a data format, or a data size of the result of the structured data query.
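The final step of claim 1 selects a visual output from characteristics of the result. The claims do not say how the selection is made, so the following is only a minimal sketch under assumed rules: each result column is classified as temporal, numeric, or categorical (data type), the row count (data size) limits when a pie graph is offered, and the candidate names are borrowed from the chart types listed in claim 3. The functions `column_kind` and `recommend_charts` and the `max_pie_slices` cutoff are hypothetical.

```python
from datetime import date
from numbers import Number

def column_kind(values):
    """Classify a result column as 'temporal', 'numeric', or 'categorical'."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, date) for v in present):
        return "temporal"  # datetime is a subclass of date, so it lands here too
    if present and all(isinstance(v, Number) and not isinstance(v, bool) for v in present):
        return "numeric"
    return "categorical"

def recommend_charts(columns, rows, max_pie_slices=6):
    """Return candidate chart types, best first, for a (columns, rows) result.

    Hypothetical rules: data type picks the chart family, data size
    (number of rows) decides whether a pie graph is still readable.
    """
    kinds = [column_kind([row[i] for row in rows]) for i in range(len(columns))]
    if kinds == ["numeric"]:
        return ["histogram"]
    if kinds == ["temporal", "numeric"]:
        return ["line graph", "area chart", "bar graph"]
    if kinds == ["categorical", "numeric"]:
        # A pie graph only stays legible with a handful of slices.
        return ["bar graph", "pie graph"] if len(rows) <= max_pie_slices else ["bar graph"]
    if kinds == ["numeric", "numeric"]:
        return ["scatter plot", "X-Y plot"]
    if kinds == ["numeric"] * 3:
        return ["bubble plot", "scatter plot"]
    return []  # no chart rule applies; show the result as a table only
```

Returning a ranked list rather than a single type also leaves room for the "at least two types of visual output" variant of claim 4.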

2. The method of claim 1, wherein the at least one contextual knowledge base is selected from the set of: entity, entity thesaurus, code set, relationship table structure, aggregation and slice identifier, and visual recommender knowledge bases.
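Claim 1 identifies entities by comparing them against entries in a contextual knowledge base, and claim 2 names the kinds of knowledge base. The claims give no schema, so the sketch below assumes three of those kinds (entity, entity thesaurus, code set) can be modeled as plain dictionaries; all table names, columns, synonyms, and codes are invented for illustration, as are the helpers `identify_entity` and `encode_value`.

```python
# Hypothetical knowledge-base contents; the claims name the kinds of
# knowledge base but not their structure or entries.
ENTITY_KB = {  # canonical entity -> location in the data lake
    "sales": {"table": "orders", "column": "amount"},
    "region": {"table": "stores", "column": "region"},
    "date": {"table": "orders", "column": "order_date"},
}
THESAURUS_KB = {  # user vocabulary -> canonical entity
    "revenue": "sales", "turnover": "sales", "area": "region", "day": "date",
}
CODE_SET_KB = {  # entity -> {user-facing value: stored code}
    "region": {"east": "E", "west": "W"},
}

def identify_entity(term):
    """Resolve a term through the thesaurus, then look it up in the entity KB."""
    key = term.lower()
    canonical = THESAURUS_KB.get(key, key)
    entry = ENTITY_KB.get(canonical)
    return (canonical, entry) if entry else None

def encode_value(entity, value):
    """Translate a user-facing value to its stored code, if a code set exists."""
    return CODE_SET_KB.get(entity, {}).get(value.lower(), value)
```

For example, "revenue in the East" resolves "revenue" to the `orders.amount` column and "East" to the stored code `E`.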

3. The method of claims 1 or 2, wherein the visual output is a type selected from the set of: pie graph, bar graph, line graph, X-Y plot, area chart, scatter plot, bubble plot, and histogram.

4. The method of claim 3, comprising at least two types of visual output.

5. The method of claims 1, 2, 3, or 4, wherein the result of the structured data is saved as a file.
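Claim 5 only requires that the result be saved as a file, without naming a format. As one possible choice, the sketch below assumes CSV and uses Python's standard `csv` module; `save_result_csv` is a hypothetical helper.

```python
import csv
from pathlib import Path

def save_result_csv(columns, rows, path):
    """Write a structured-query result (header row plus data rows) to a CSV file."""
    path = Path(path)
    # newline="" lets the csv module control line endings, as its documentation recommends.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)
    return path
```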

6. The method of claims 1, 2, 3, 4, or 5, wherein the step of parsing the natural language query comprises the following steps:

parsing at least one sentence of the natural language query;

parsing at least one part of speech of the at least one sentence; and

creating a dependency tree based on the at least one part of speech.
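The three parsing steps of claim 6 (sentences, parts of speech, dependency tree) can be illustrated with a deliberately small rule-based sketch. It assumes a hypothetical toy lexicon in place of a trained tagger, and hypothetical attachment rules in place of a trained parser: the first verb is the root, determiners, adjectives, and prepositions attach to the next noun, a noun after a preposition attaches to the preceding noun, and other nouns attach to the root. The tree is returned as one head index per token, with `-1` marking the root.

```python
import re

# Hypothetical toy lexicon; unknown words default to NOUN.
POS_LEXICON = {
    "show": "VERB", "list": "VERB", "compare": "VERB",
    "total": "ADJ", "average": "ADJ", "monthly": "ADJ",
    "by": "ADP", "for": "ADP", "in": "ADP", "per": "ADP",
    "the": "DET", "each": "DET", "all": "DET",
}

def split_sentences(text):
    """Step 1: split the query into sentences at ., ? or ! followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]

def pos_tag(sentence):
    """Step 2: tag each word with a part of speech from the lexicon."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [(w, POS_LEXICON.get(w, "NOUN")) for w in words]

def build_dependency_tree(tagged):
    """Step 3: return one head index per token, -1 for the root."""
    tags = [t for _, t in tagged]
    if not tags:
        return []
    first_noun = next((i for i, t in enumerate(tags) if t == "NOUN"), 0)
    root = next((i for i, t in enumerate(tags) if t == "VERB"), first_noun)
    heads = [root] * len(tags)  # default: attach to the root
    heads[root] = -1
    last_noun, after_adp = None, False
    for i, tag in enumerate(tags):
        if tag in ("DET", "ADJ", "ADP") and i != root:
            # Modifiers and prepositions attach to the next noun on their right.
            nxt = next((j for j in range(i + 1, len(tags)) if tags[j] == "NOUN"), None)
            heads[i] = nxt if nxt is not None else root
            after_adp = after_adp or tag == "ADP"
        elif tag == "NOUN":
            if i != root:
                # "sales by region": region depends on sales, not on the verb.
                heads[i] = last_noun if after_adp and last_noun is not None else root
            last_noun, after_adp = i, False
    return heads
```

For "Show total sales by region." the tree makes "sales" the object of "show" and "region" a modifier of "sales", which is the kind of dependency a later step can turn into a `GROUP BY`.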

7. The method of claims 1, 2, 3, 4, 5, or 6, wherein the step of identifying the plurality of entities comprises the following steps:

combining at least one phrase of the natural language query;

soft matching the plurality of entities; and

identifying at least one entity above a threshold confidence level.
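Claim 7 combines phrases, soft matches them, and keeps entities above a confidence threshold. The claims do not name a similarity measure; the sketch below swaps in the standard library's `difflib.SequenceMatcher.ratio()` as an assumed score, combines adjacent words into phrases of up to three words, and keeps the best-scoring phrase per entity when the score reaches the threshold. The entity list and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

KNOWN_ENTITIES = ["total sales", "region", "order date", "customer name"]  # hypothetical

def candidate_phrases(tokens, max_len=3):
    """Combine adjacent tokens into every phrase of 1..max_len words."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)]

def soft_match(query, entities=KNOWN_ENTITIES, threshold=0.8):
    """Map each entity to its best (phrase, score) pair with score >= threshold."""
    tokens = query.lower().split()
    matches = {}
    for phrase in candidate_phrases(tokens):
        for entity in entities:
            score = SequenceMatcher(None, phrase, entity).ratio()
            best = matches.get(entity, (None, 0.0))[1]
            if score >= threshold and score > best:
                matches[entity] = (phrase, score)
    return matches
```

Misspellings such as "totl sales" and "regoin" still clear the threshold against "total sales" and "region", while unrelated entities do not.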

8. The method of claims 1, 2, 3, 4, 5, 6, or 7, further comprising the step of suggesting at least a portion of the natural language query.
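Claim 8 adds suggesting part of the query but does not say where suggestions come from. One simple reading, assumed here, is prefix completion against a list of known queries (for example, earlier queries or templates). The sketch keeps the list sorted and uses `bisect` to jump to the first candidate; `suggest` and the example queries are hypothetical.

```python
import bisect

def suggest(prefix, known_queries, limit=3):
    """Return up to `limit` known queries starting with `prefix`, case-insensitively."""
    items = sorted(q.lower() for q in known_queries)
    p = prefix.lower()
    start = bisect.bisect_left(items, p)  # first item that is >= the prefix
    out = []
    for q in items[start:]:
        # Sorted order keeps all matches contiguous, so stop at the first miss.
        if not q.startswith(p) or len(out) == limit:
            break
        out.append(q)
    return out
```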

9. The method of claims 1, 2, 3, 4, 5, 6, 7, or 8, wherein the structured data query is constructed in Structured Query Language (SQL).
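Claim 9 constructs the structured query in SQL. A minimal sketch, assuming the identified entities have already been resolved to a table, a measure column, an optional grouping column, and optional equality filters: identifiers are checked against a strict pattern (they are assumed to come from the knowledge base), while user-supplied values are passed as `?` parameters rather than spliced into the string. `build_sql` and the `orders` table are hypothetical; the example runs against an in-memory `sqlite3` database.

```python
import re
import sqlite3

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_AGGREGATES = {"SUM", "AVG", "COUNT", "MIN", "MAX"}

def _check(name):
    """Reject anything that is not a plain SQL identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name

def build_sql(table, measure, aggregate="SUM", group_by=None, filters=None):
    """Build a parameterized SELECT from identified entities and their dependencies."""
    if aggregate not in _AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    select = [f"{aggregate}({_check(measure)})"]
    if group_by:
        select.insert(0, _check(group_by))
    sql = f"SELECT {', '.join(select)} FROM {_check(table)}"
    params = []
    if filters:
        sql += " WHERE " + " AND ".join(f"{_check(col)} = ?" for col in filters)
        params = list(filters.values())
    if group_by:
        sql += f" GROUP BY {group_by} ORDER BY {group_by}"
    return sql, params
```

For "total sales by region in 2020", the dependency of "region" on "sales" becomes the `GROUP BY`, and "2020" becomes a bound parameter.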

10. A system for querying a data lake using natural language, comprising:

an electronic data lake;

at least one contextual knowledge base; and

a user computer device in communication with the electronic data lake and the at least one contextual knowledge base over at least one network, wherein the user computer device is configured to:

receive a natural language query directed to the electronic data lake;

parse the natural language query to determine a plurality of entities within the natural language query;

identify the plurality of entities using the at least one contextual knowledge base, wherein the plurality of entities are compared against at least one entry in the at least one contextual knowledge base;

map a dependency of the plurality of identified entities based on the parsed natural language query;

construct a structured data query based on the plurality of identified entities and the mapped dependency; and

automatically generate a visual output of a result of the structured data query on a display of the user computer device, wherein the visual output is based on at least one characteristic from the set of: a data type, a data format, or a data size of the result of the structured data query.

11. The system of claim 10, wherein the at least one contextual knowledge base is selected from the set of: entity, entity thesaurus, code set, relationship table structure, aggregation and slice identifier, and visual recommender knowledge bases.

12. The system of claims 10 or 11, wherein the visual output is a type selected from the set of: pie graph, bar graph, line graph, X-Y plot, area chart, scatter plot, bubble plot, and histogram.

13. The system of claim 12, comprising at least two types of visual output.

14. The system of claims 10, 11, 12, or 13, wherein the result of the structured data is saved as a file.

15. The system of claims 10, 11, 12, 13, or 14, wherein parsing the natural language query comprises:

parsing at least one sentence of the natural language query;

parsing at least one part of speech of the at least one sentence; and

creating a dependency tree based on the at least one part of speech.

16. The system of claims 10, 11, 12, 13, 14, or 15, wherein identifying the plurality of entities comprises:

combining at least one phrase of the natural language query;

soft matching the plurality of entities; and

identifying at least one entity above a threshold confidence level.

17. The system of claims 10, 11, 12, 13, 14, 15, or 16, wherein the user computer device is configured to suggest at least a portion of the natural language query.

18. The system of claims 10, 11, 12, 13, 14, 15, 16, or 17, wherein the structured data query is constructed in Structured Query Language (SQL).

19. The system of claims 10, 11, 12, 13, 14, 15, 16, 17, or 18, wherein the display is selected from the set of: monitor, projector, screen, and wearable.

20. The system of claims 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19, wherein the user computer device is configured to return a tabular data output displayed with the visual output.
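Claim 20 pairs the visual output with a tabular data output. The claims do not specify the table's layout; as an assumed minimal form, the sketch below renders the query result as a fixed-width text table with a header separator. `format_table` is a hypothetical helper.

```python
def format_table(columns, rows):
    """Render a query result as a fixed-width text table with a header separator."""
    cells = [[str(c) for c in columns]] + [[str(v) for v in row] for row in rows]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in cells) for i in range(len(columns))]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
             for row in cells]
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)
```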