WO2020033805 - WEBSITE REPRESENTATION VECTOR TO GENERATE SEARCH RESULTS AND CLASSIFY WEBSITE

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

CLAIMS

1. A computer-implemented method comprising:

receiving data representing each website in a first plurality of websites, each website of the first plurality of websites being associated with a first knowledge domain of a plurality of knowledge domains, each website having an associated first classification of a plurality of classifications;

receiving data representing each website in a second plurality of websites, each website of the second plurality of websites being associated with the first knowledge domain, each website having an associated second classification of the plurality of classifications;

generating a first composite-representation of the first plurality of websites;

generating a second composite-representation of the second plurality of websites;

receiving a representation of a third website;

determining a first measure of difference between the first composite-representation and the representation;

determining a second measure of difference between the second composite-representation and the representation; and

based on the first measure of difference and the second measure of difference, classifying the third website.
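Claim 1 amounts to comparing a new website against one composite per class. Below is a minimal sketch. It assumes each website is already a numeric vector, uses the element-wise mean as the composite and Euclidean distance as the measure of difference, and assigns the class with the smaller difference. The claim itself leaves all three of these choices open. The names `composite` and `classify` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def composite(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equal-length vectors (one possible composite)."""
    if not vectors:
        raise ValueError("need at least one website vector")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(first_sites: Sequence[Vector], second_sites: Sequence[Vector],
             third_site: Vector, labels=("first", "second")) -> str:
    """Assign third_site the classification whose composite is nearer."""
    first_difference = math.dist(composite(first_sites), third_site)
    second_difference = math.dist(composite(second_sites), third_site)
    # Assumed decision rule: smaller difference wins; ties go to the first class.
    return labels[0] if first_difference <= second_difference else labels[1]
```

For example, with `first = [[0.0, 0.0], [0.0, 2.0]]` and `second = [[4.0, 4.0], [6.0, 4.0]]`, a third website at `[1.0, 1.0]` is classified as `"first"`. This decision rule is essentially a nearest-centroid classifier.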

2. The method of claim 1, further comprising:

generating said first and second classifications based upon respective quality scores associated with each website of the first and second plurality of websites, each quality score representing a quality measure of the respective website relative to other websites.

3. The method of claim 2, wherein said first and second classifications are generated based upon first and second threshold values.
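Claims 2 and 3 derive the two classifications from quality scores and two thresholds, but they do not specify how the thresholds are applied. The sketch below assumes scores are already expressed relative to other websites (for example, normalized to [0, 1]). Under that assumption, a score below the first threshold yields the first classification, a score at or above the second threshold yields the second, and scores in between are left unclassified. The function name and that rule are hypothetical.

```python
def classify_by_quality(scores, first_threshold, second_threshold):
    """Split websites into first/second classes by quality score.

    scores maps website -> quality score relative to other websites.
    Assumed rule: score < first_threshold -> first class,
    score >= second_threshold -> second class, otherwise unclassified.
    """
    if first_threshold > second_threshold:
        raise ValueError("first_threshold must not exceed second_threshold")
    first, second = [], []
    for site, score in scores.items():
        if score < first_threshold:
            first.append(site)
        elif score >= second_threshold:
            second.append(site)
    return first, second
```

Leaving a gap between the two thresholds keeps borderline websites out of both composites. That is one plausible reason to use two threshold values rather than one.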

4. The method of any preceding claim, further comprising:

receiving a query that includes terms that are determined to be indicative of the particular knowledge domain, and in response:

selecting one of the second plurality of websites for use in responding to the query; and

responding to the query using information from the selected second website.

5. The method of claim 4, further comprising:

determining, using the terms included in the query, that the query requests responsive data from the particular knowledge domain; and

in response to determining that the query requests responsive data from the particular knowledge domain, determining to search the second websites and not to search the first websites for search results responsive to the query.
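Claims 4 and 5 route domain queries to the second websites only. The sketch below rests on several assumptions. Domain membership is decided by overlap with a hypothetical `DOMAIN_TERMS` vocabulary. Websites are given as URL-to-text maps. The selected website is the one sharing the most terms with the query. Non-domain queries search all websites, which the claims do not address.

```python
# Hypothetical vocabulary marking a query as belonging to the knowledge domain.
DOMAIN_TERMS = {"symptom", "diagnosis", "treatment", "dosage"}


def terms_of(text):
    """Lower-cased whitespace tokens of a string."""
    return set(text.lower().split())


def is_domain_query(query, domain_terms, min_hits=1):
    """Assumed test: the query shares at least min_hits terms with the domain vocabulary."""
    return len(terms_of(query) & domain_terms) >= min_hits


def respond(query, first_sites, second_sites, domain_terms=DOMAIN_TERMS):
    """Return (url, text) of the selected website; sites map URL -> page text."""
    if is_domain_query(query, domain_terms):
        candidates = second_sites  # search the second websites, not the first
    else:
        candidates = {**first_sites, **second_sites}  # assumption: search everything
    if not candidates:
        return None
    query_terms = terms_of(query)
    # Select the candidate whose text overlaps the query most (assumed ranking).
    best_url = max(candidates, key=lambda url: len(query_terms & terms_of(candidates[url])))
    return best_url, candidates[best_url]
```

In practice, a first website that matches the query equally well is never considered for a domain query, because it is excluded before ranking.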

6. The method of any preceding claim, wherein the second plurality of websites are determined to be a collection of authoritative data sources, the method further comprising:

generating, from the collection of authoritative data sources, preprocessed responses to future queries;

receiving, after generating the preprocessed responses, a query that is determined to be indicative of the particular knowledge domain; and

in response, responding to the query with one of the preprocessed responses.
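Claim 6 precomputes answers from the authoritative sources and serves them later. A minimal sketch, assuming the anticipated queries are known in advance and that a preprocessed response is simply the set of sentences from authoritative pages containing every query term. Both the extraction rule and the function names are hypothetical.

```python
def normalize(query):
    """Canonical cache key: lower-case, single spaces."""
    return " ".join(query.lower().split())


def build_preprocessed_responses(anticipated_queries, authoritative_pages):
    """Map each anticipated query to sentences (from URL -> text pages) containing all its terms."""
    cache = {}
    for query in anticipated_queries:
        key = normalize(query)
        terms = set(key.split())
        snippets = [
            sentence.strip()
            for text in authoritative_pages.values()
            for sentence in text.split(".")
            if sentence.strip() and terms <= set(sentence.lower().split())
        ]
        if snippets:
            cache[key] = ". ".join(snippets) + "."
    return cache


def answer(query, cache):
    """Return the preprocessed response for the query, or None if none was built."""
    return cache.get(normalize(query))
```

Because the responses are built before any query arrives, serving one reduces to a dictionary lookup.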

7. The method of any preceding claim, wherein:

the data representing each website in the first and second plurality of websites comprises a feature vector derived from the corresponding website;

generating the first composite-representation comprises generating a first feature vector that is a central tendency of the feature vectors of the first plurality of websites; and

generating the second composite-representation comprises generating a second feature vector that is a central tendency of the feature vectors of the second plurality of websites.
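Claim 7 requires only "a central tendency" of the feature vectors. The sketch below computes it element-wise and lets the caller choose the statistic. Mean and median from Python's `statistics` module are shown as two possible choices; the function name is hypothetical.

```python
from statistics import mean, median


def central_tendency(vectors, stat=mean):
    """Element-wise central tendency of equal-length feature vectors."""
    if not vectors:
        raise ValueError("need at least one feature vector")
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("feature vectors must share a dimension")
    return [stat(column) for column in zip(*vectors)]
```

With `[[1.0, 0.0], [2.0, 0.0], [9.0, 3.0]]`, the mean gives `[4.0, 1.0]` and the median gives `[2.0, 0.0]`. The median is therefore less affected by a single atypical website.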

8. The method of claim 7, wherein:

determining the first measure of difference comprises determining a first scalar difference between the first composite-representation and the representation of the third website; and

determining the second measure of difference comprises determining a second scalar difference between the second composite-representation and the representation of the third website.
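Claim 8 reduces each comparison to a single scalar. Two common choices are shown as examples, Euclidean distance and cosine distance. The claim names neither one.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.dist(a, b)


def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector length, compares direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm
```

Cosine distance treats `[1, 1]` and `[2, 2]` as identical, whereas Euclidean distance does not. Which one suits a given setup depends on whether vector magnitude carries meaning in the chosen feature space.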

9. The method of claim 7, comprising generating each of the feature vectors using a neural network that receives, as input, content included in a corresponding website.
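Claim 9 does not describe the network's architecture. Purely to show the shape of the interface (website content in, fixed-length vector out), the sketch below uses a hypothetical `TinyEncoder`. It is a single untrained dense layer with `tanh` applied to a hashed bag-of-words. A real system would use a trained model, and these random weights carry no meaning.

```python
import math
import random
import zlib


class TinyEncoder:
    """Untrained one-layer network: tanh(W x + b) over a hashed bag-of-words x."""

    def __init__(self, buckets=64, dim=8, seed=0):
        rng = random.Random(seed)  # fixed seed so encodings are reproducible
        self.buckets = buckets
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(buckets)] for _ in range(dim)]
        self.bias = [0.0] * dim

    def bag_of_words(self, text):
        """Token frequencies hashed into a fixed number of buckets (crc32 is stable across runs)."""
        x = [0.0] * self.buckets
        tokens = text.lower().split()
        for token in tokens:
            x[zlib.crc32(token.encode("utf-8")) % self.buckets] += 1.0
        total = len(tokens) or 1
        return [v / total for v in x]

    def encode(self, text):
        """Feature vector for a website's text content."""
        x = self.bag_of_words(text)
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.weights, self.bias)]
```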

10. The method of claim 7, wherein:

the first feature vector comprises a first feature vector that includes averages of the feature vectors of the websites classified as first websites; and

the second feature vector comprises a second feature vector that includes averages of the feature vectors of the websites classified as second websites.

11. The method of any preceding claim, wherein the representations for at least some of the first and second plurality of websites are generated using only proper subsets of a set of resources that belong to the respective website.
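Claim 11 builds a website's representation from only a proper subset of its resources. A minimal sketch, assuming per-page vectors already exist in a `page_vectors` map, and that the subset is a seeded random sample capped at `max_pages` and always smaller than the full set. The sampling rule and both function names are hypothetical.

```python
import random


def sample_proper_subset(resources, max_pages, seed=0):
    """Pick at most max_pages resources, always strictly fewer than all of them."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    if len(resources) < 2:
        raise ValueError("a non-empty proper subset needs at least two resources")
    k = min(max_pages, len(resources) - 1)
    # Sorting first makes the sample reproducible for a given seed.
    return random.Random(seed).sample(sorted(resources), k)


def site_representation(resources, page_vectors, max_pages, seed=0):
    """Average the page vectors of a sampled proper subset of the site's resources."""
    chosen = sample_proper_subset(resources, max_pages, seed)
    vectors = [page_vectors[r] for r in chosen]
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Capping the number of pages bounds the cost of representing very large websites. The trade-off is that the representation may miss content outside the sample.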

12. A system comprising one or more computers and one or more storage devices on which are stored instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform a method according to any preceding claim.

13. A non-transitory computer storage medium encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform a method according to any one of claims 1 to 11.