US20210026846 - QUERYING A DATA GRAPH USING NATURAL LANGUAGE QUERIES

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

Claims

1. A computer-implemented method, the method comprising:
receiving, by at least one processor, a search query that includes a first search term;
determining, using the at least one processor, that the search query matches a query template;
mapping the first search term to a first entity in a data graph;
obtaining, using the at least one processor, a path from a model associated with the query template, the model being trained to provide the path for the query template, the path having a score representing a probability that the path provides a correct answer for the query template;
identifying, using the at least one processor, a second entity in the data graph by following the path in the data graph from the first entity; and
providing, using the at least one processor, information relating to the second entity in a response to the search query.
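For illustration only, the path lookup and graph traversal recited in claim 1 might be sketched as follows. This is a minimal sketch, not the claimed implementation: the toy `GRAPH`, the dictionary `MODEL` standing in for a trained model, the template string, and the function names are all hypothetical, and template matching and entity mapping are assumed to have already happened.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]

# Hypothetical toy data graph: (entity, edge label) -> connected entities.
GRAPH: Dict[Tuple[str, str], Set[str]] = {
    ("alice", "spouse"): {"bob"},
    ("alice", "parent"): {"carol"},
    ("carol", "spouse"): {"dave"},
}

# Hypothetical stand-in for a trained model: each query template maps to
# candidate paths, each with a score read as the probability of a correct answer.
MODEL: Dict[str, List[Tuple[Path, float]]] = {
    "spouse of $E": [(("spouse",), 0.9), (("parent",), 0.1)],
}


def follow_path(graph: Dict[Tuple[str, str], Set[str]], start: str, path: Path) -> Set[str]:
    """Walk the edge labels in `path` from `start`; return the entities reached."""
    frontier = {start}
    for label in path:
        frontier = {t for e in frontier for t in graph.get((e, label), set())}
    return frontier


def answer(template: str, first_entity: str) -> Set[str]:
    """Take the highest scoring path for the template and follow it from the entity."""
    path, _score = max(MODEL[template], key=lambda candidate: candidate[1])
    return follow_path(GRAPH, first_entity, path)
```

Calling `answer("spouse of $E", "alice")` follows the single-edge path `("spouse",)` from the hypothetical entity `alice`; the entities returned play the role of the second entity in the claim.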
2. The method of claim 1, the method further comprising:
training the model to produce the path for the query template.
3. The method of claim 1, wherein determining that the search query matches the query template includes:
determining that the first search term matches a source entity placeholder in the query template; and
determining that a remainder of the search query matches a remainder of the query template.
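One possible reading of the matching step in claim 3 can be sketched with a regular expression. The placeholder syntax `$E`, the function name `match_template`, and the case-insensitive comparison are assumptions made for illustration; the claims do not specify how a template is represented, and a real system would also check that the captured term maps to an entity in the data graph.

```python
import re
from typing import Optional


def match_template(query: str, template: str, placeholder: str = "$E") -> Optional[str]:
    """Return the text that fills the placeholder, or None if the query does not match.

    The text before and after the placeholder must match the query literally
    (ignoring case); whatever sits in the placeholder position is the search term.
    """
    prefix, found, suffix = template.partition(placeholder)
    if not found:
        return None  # the template has no source entity placeholder
    pattern = re.escape(prefix) + r"(.+)" + re.escape(suffix)
    match = re.fullmatch(pattern, query.strip(), flags=re.IGNORECASE)
    return match.group(1).strip() if match else None
```

For example, `match_template("Who is the spouse of Alice Smith", "who is the spouse of $E")` captures the term in the placeholder position, while a query with a different remainder returns `None`.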
4. The method of claim 1, wherein the model provides a plurality of candidate paths, the path being a highest scoring path of the plurality of candidate paths.
5. The method of claim 1, further comprising:
mapping the first search term to at least two entities in the data graph, the first entity being one of the at least two entities,
wherein identifying the second entity occurs for each entity of the at least two entities, resulting in at least two second entities, and
wherein providing the information includes providing information relating to the at least two second entities.
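The ambiguous-term case of claim 5 might look like the sketch below, where one search term maps to several entities and the same path is followed from each. The alias table, entity identifiers, edge label, and function name are hypothetical and exist only to make the example runnable.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical alias table: one search term can map to several graph entities.
ALIASES: Dict[str, List[str]] = {"mercury": ["mercury_planet", "mercury_element"]}

# Hypothetical toy data graph: (entity, edge label) -> connected entities.
GRAPH: Dict[Tuple[str, str], Set[str]] = {
    ("mercury_planet", "category"): {"planet"},
    ("mercury_element", "category"): {"chemical element"},
}


def answers_for_term(term: str, path: Tuple[str, ...]) -> Dict[str, Set[str]]:
    """Follow `path` from every entity the term maps to; key results by entity."""
    results: Dict[str, Set[str]] = {}
    for entity in ALIASES.get(term.lower(), []):
        frontier = {entity}
        for label in path:
            frontier = {t for e in frontier for t in GRAPH.get((e, label), set())}
        results[entity] = frontier
    return results
```

Keying the result by first entity lets a response present each second entity alongside the interpretation of the term that produced it.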
6. The method of claim 1, wherein the search query is a natural language query.
7. A computer-implemented method comprising:
training, using at least one processor, a machine learned model to associate a query template to a weighted feature, the query template including a source entity variable, the weighted feature representing one path in a data graph that starts at the source entity variable and includes a target entity that answers the query, the path having a score representing a probability that the target entity is a correct answer;
receiving a user request matching the query template;
determining, using the at least one processor, a first entity from the user request matching the source entity variable in the query template, the first entity existing in the data graph;
receiving at least a highest scoring path from the machine learned model responsive to providing the first entity to the machine learned model;
identifying, using the at least one processor, a second entity in the data graph using the first entity and the highest scoring path; and
generating a response to the user request that includes information relating to the second entity.
8. The method of claim 7, wherein the information is a name of the second entity.
9. The method of claim 7, wherein the query template is generated from a cluster of queries with similar meaning.
10. The method of claim 7, wherein training the machine learned model includes:
identifying a plurality of source entities from the data graph that satisfy the source entity variable for the query template;
identifying, for each source entity in the plurality of source entities, a set of target entities connected to the source entity in the data graph, generating a plurality of source entity-target entity pairs;
assigning each source entity-target entity pair in the plurality of source entity-target entity pairs to either a positive example or negative example based on a confidence score for the target entity; and
training the model using the positive examples and the negative examples.
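The example-assignment step of claim 10 could be sketched as a simple split on the confidence score. The fixed threshold of 0.5 and the names below are assumptions; the claim states only that the assignment is based on a confidence score for the target entity, not how the cut is made.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (source entity, target entity)


def label_pairs(
    scored_pairs: Iterable[Tuple[str, str, float]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the claims
) -> Tuple[List[Pair], List[Pair]]:
    """Split (source, target, confidence) triples into positive and negative examples."""
    positives: List[Pair] = []
    negatives: List[Pair] = []
    for source, target, confidence in scored_pairs:
        if confidence >= threshold:
            positives.append((source, target))
        else:
            negatives.append((source, target))
    return positives, negatives
```

The two lists returned here correspond to the positive and negative examples that the final training step consumes.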
11. The method of claim 10, wherein training the model includes:
for each source entity-target entity pair, identifying paths in the data graph of up to a predetermined path length that connect the source entity with the target entity; and
excluding an identified path that, during a testing phase of the training, arrives at a wrong answer more often than a correct answer.
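The path identification and exclusion steps of claim 11 might be sketched as a bounded search followed by a filter. The adjacency-list graph layout, the `outcomes` tally of correct and wrong answers per path, and both function names are hypothetical; how the testing phase actually records outcomes is not specified in the claims.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]
# Hypothetical adjacency list: entity -> list of (edge label, neighbor entity).
Graph = Dict[str, List[Tuple[str, str]]]


def paths_between(graph: Graph, source: str, target: str, max_len: int) -> Set[Path]:
    """Collect edge-label sequences of length 1..max_len that lead from source to target."""
    found: Set[Path] = set()
    stack: List[Tuple[str, Path]] = [(source, ())]
    while stack:
        node, labels = stack.pop()
        if node == target and labels:
            found.add(labels)
        if len(labels) < max_len:  # the length bound guarantees termination
            for label, neighbor in graph.get(node, []):
                stack.append((neighbor, labels + (label,)))
    return found


def keep_reliable(paths: Set[Path], outcomes: Dict[Path, Tuple[int, int]]) -> Set[Path]:
    """Drop paths whose wrong-answer count exceeds their correct-answer count.

    `outcomes` maps a path to (correct, wrong) counts from a testing phase.
    """
    kept: Set[Path] = set()
    for path in paths:
        correct, wrong = outcomes.get(path, (0, 0))
        if wrong <= correct:
            kept.add(path)
    return kept
```

A depth-bounded search is used here because the claim limits paths to a predetermined length; a path with no recorded outcomes is kept in this sketch, which is a design choice rather than something the claims dictate.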
12. The method of claim 7, wherein the score is a confidence score.
13. The method of claim 7, wherein the model provides a plurality of candidate paths, the path being a highest scoring path of the plurality of candidate paths.
14. A system comprising:
at least one processor; and
memory storing instructions that, when executed by the at least one processor, cause the system to perform operations including:
receiving an entity for a query, the entity existing in an entity graph;
identifying a path in the entity graph associated with the query, the path being predicted by a machine-learned model and having a score representing a probability that the path provides a correct answer;
starting from the entity in the entity graph, following the path to identify a second entity; and
providing information about the second entity in response to the query.
15. The system of claim 14, wherein the entity is a named entity in the query.
16. The system of claim 14, wherein the model is trained by:
identifying a plurality of source entities from the entity graph for the query;
identifying, for each source entity in the plurality of source entities, a set of target entities connected to the source entity in the entity graph, generating a plurality of source entity-target entity pairs;
assigning each source entity-target entity pair in the plurality of source entity-target entity pairs to either a positive example or negative example based on a confidence score for the target entity, wherein the confidence score represents a probability that the target entity is a correct answer to the query for the source entity; and
training the model using the positive examples and the negative examples.
17. The system of claim 16, wherein training the machine-learned model includes:
for each source entity-target entity pair, identifying paths in the entity graph of up to a predetermined path length that connect the source entity with the target entity; and
excluding an identified path that, during a testing phase of the training, arrives at a wrong answer more often than a correct answer.
18. The system of claim 14, wherein the instructions further include instructions that cause the system to perform operations including:
identifying a plurality of entities for the query, the entity being one of the plurality of entities, each of the plurality of entities existing in the entity graph,
wherein identifying the second entity occurs for each entity of the plurality of entities, resulting in at least two second entities, and
wherein providing the information includes providing information relating to the at least two second entities.