1. WO2020112579 - SCALABLE IMPLEMENTATIONS OF EXACT DISTINCT COUNTS AND MULTIPLE EXACT DISTINCT COUNTS IN DISTRIBUTED QUERY PROCESSING SYSTEMS

Note: Text based on automatic optical character recognition processes. Only the PDF version has legal value.

CLAIMS

1. A system for determining multiple distinct counts for values based on a key, the system comprising:

a processing system that includes one or more processors; and

a memory configured to store program code to be executed by the processing system, the program code configured to:

access a dataset that includes a first field, a second field, and a third field;

combine the first field with the second field to generate a fourth field;

generate a set of compound keys that includes two or more compound keys each comprised of a different combination of one of at least two identifiers for the first field and the second field with one of at least two values for the third field;

assign a corresponding compound key of the set of compound keys to each value of the fourth field; and

determine a total number of unique values of the fourth field for each value in the third field, based at least in part on the set of compound keys, as the multiple distinct counts.
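
By way of a non-limiting illustration, the operations recited in claim 1 can be sketched in Python as follows. The field names `src`, `dst`, and `region`, the function name `multiple_distinct_counts`, and the sample rows are hypothetical and do not appear in the claims. The sketch runs in memory on a single machine and is one possible reading of the claim language, not the claimed implementation.

```python
from collections import defaultdict


def multiple_distinct_counts(rows, first, second, third):
    """Count unique combined (first + second) values per value of `third`."""
    # Combine the first and second fields into a single "fourth field" and
    # tag each value with a compound key: (field identifier, third value).
    keyed_values = set()
    for row in rows:
        for field_id in (first, second):
            compound_key = (field_id, row[third])
            keyed_values.add((compound_key, row[field_id]))

    # Use the second part of each compound key (the third-field value) to
    # group the fourth-field values, then count the unique values per group.
    uniques = defaultdict(set)
    for (_field_id, third_value), value in keyed_values:
        uniques[third_value].add(value)
    return {third_value: len(values) for third_value, values in uniques.items()}


# Hypothetical sample data: two identifier columns and one grouping column.
rows = [
    {"src": "a", "dst": "b", "region": "eu"},
    {"src": "b", "dst": "c", "region": "eu"},
    {"src": "a", "dst": "a", "region": "us"},
]
counts = multiple_distinct_counts(rows, "src", "dst", "region")
# counts == {"eu": 3, "us": 1}
```

In this sketch a value that appears in both `src` and `dst` for the same `region` is counted once, which is the behavior an exact distinct count over the combined field would be expected to have.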

2. The system of claim 1, wherein to combine the first field with the second field, the program code is configured to:

un-pivot the dataset based on a first column associated with the first field and a second column associated with the second field to combine the first field and the second field into the fourth field.

3. The system of claim 2, wherein to un-pivot the dataset, the program code is configured to:

generate for the fourth field a fourth column associated therewith, the fourth column including a separate row having values for each value in the first field and for each value in the second field; and

modify a third column associated with the third field to generate a modified third column that includes a number of rows for each separate row of the fourth column, each row in the number of rows for the modified third column having a corresponding value from the third column for each separate row.
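
As a hedged illustration of the un-pivot described in claims 2 and 3, the following Python sketch builds a fourth column with one row per value of the first and second columns, and a modified third column that repeats the third-column value once for each of those rows. The function name `unpivot` and the column names `src`, `dst`, and `region` are assumptions made for this example only.

```python
def unpivot(rows, first, second, third):
    """Return (fourth_column, modified_third_column) as parallel lists."""
    fourth_column = []
    modified_third_column = []
    for row in rows:
        # Each input row yields two output rows: one for the first field's
        # value and one for the second field's value.
        for field_id in (first, second):
            fourth_column.append(row[field_id])
            # The third-column value is repeated so that it stays aligned
            # with every new row of the fourth column.
            modified_third_column.append(row[third])
    return fourth_column, modified_third_column


# Hypothetical sample data.
rows = [
    {"src": "a", "dst": "b", "region": "eu"},
    {"src": "c", "dst": "a", "region": "us"},
]
fourth, third_modified = unpivot(rows, "src", "dst", "region")
# fourth == ["a", "b", "c", "a"]
# third_modified == ["eu", "eu", "us", "us"]
```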

4. The system of claim 1, wherein the program code is configured to:

divide the dataset into a plurality of partitions, each partition of the plurality of partitions being the only partition to include a respective portion of the dataset having at least one subset of identifiers of the fourth field of the dataset with a same value; and

perform a single distinct count operation across subsets of the plurality of partitions to determine the total number of unique values.
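
The partitioning condition of claim 4 (all occurrences of a given fourth-field value fall in exactly one partition) can be illustrated with hash partitioning. The sketch below is an assumption about how such a partitioning might be realized: the claims do not name a hash function, and `partition_by_value`, `distinct_count_partitioned`, and the use of `zlib.crc32` are choices made for this example. Because no value can appear in two partitions, the per-partition distinct counts can be summed without double counting.

```python
import zlib


def partition_by_value(values, num_partitions):
    """Route each value to a partition chosen by a stable hash of the value."""
    partitions = [[] for _ in range(num_partitions)]
    for value in values:
        # Equal values always hash to the same index, so a given value can
        # only ever be present in one partition.
        index = zlib.crc32(str(value).encode("utf-8")) % num_partitions
        partitions[index].append(value)
    return partitions


def distinct_count_partitioned(values, num_partitions=4):
    """Exact distinct count obtained by summing per-partition distinct counts."""
    partitions = partition_by_value(values, num_partitions)
    # Partitions are disjoint in the values they hold, so the sum is exact.
    return sum(len(set(partition)) for partition in partitions)


# Hypothetical fourth-field values with duplicates.
total = distinct_count_partitioned(["a", "b", "a", "c", "b"], num_partitions=3)
# total == 3
```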

5. The system of claim 1, wherein the at least two identifiers for the first field and the second field comprise values for a first key of a given compound key, and the at least two values for the third field comprise values for a second key of the given compound key; and wherein to determine a total number of unique values, the program code is configured to:

perform a single distinct count operation for values of the fourth field based on the set of compound keys to generate compound key counts;

pivot the compound key counts with respect to the first key and the second key; and

determine a total number of ones of the compound key counts for each of the values for the first key and the values for the second key.
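
One possible reading of the count-then-pivot sequence in claim 5 is sketched below: distinct fourth-field values are counted once per compound key, and the resulting compound key counts are pivoted into a table indexed by the second key (the third-field value) with one column per first key (the field identifier). The function names `compound_key_counts` and `pivot_counts`, the key names `src` and `dst`, and the sample data are hypothetical, and the final totaling step of the claim is not modeled here.

```python
from collections import defaultdict


def compound_key_counts(keyed_values):
    """Count unique values per compound key.

    `keyed_values` is an iterable of ((first_key, second_key), value) pairs.
    """
    seen = defaultdict(set)
    for compound_key, value in keyed_values:
        seen[compound_key].add(value)
    return {key: len(values) for key, values in seen.items()}


def pivot_counts(counts):
    """Pivot {(first_key, second_key): n} into {second_key: {first_key: n}}."""
    table = defaultdict(dict)
    for (first_key, second_key), count in counts.items():
        table[second_key][first_key] = count
    return dict(table)


# Hypothetical keyed values: (field identifier, third-field value) -> value.
keyed = [
    (("src", "eu"), "a"), (("dst", "eu"), "b"),
    (("src", "eu"), "b"), (("dst", "eu"), "c"),
    (("src", "us"), "a"), (("dst", "us"), "a"),
]
pivoted = pivot_counts(compound_key_counts(keyed))
# pivoted == {"eu": {"src": 2, "dst": 2}, "us": {"src": 1, "dst": 1}}
```

A single pass over the keyed values produces every per-field, per-group distinct count at once, which is the apparent motivation for attaching compound keys before counting.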

6. The system of claim 1, wherein the at least two identifiers for the first field and the second field comprise alphanumeric field identifiers that uniquely identify the first field and the second field.

7. The system of claim 1, wherein the dataset comprises log entries having data for at least one of a hosted web service or a hosted web application; or

wherein the system is a cloud-based system that hosts big data storage for the dataset.

8. A computer-implemented method for determining multiple distinct counts for values based on a key, the method comprising:

combining a first field of a dataset with a second field of the dataset to generate a fourth field;

generating a set of compound keys that includes two or more compound keys, each of the compound keys comprised of at least a first key and a second key, and being a different combination of first values for the first key and second values for the second key;

assigning a corresponding compound key of the set of compound keys to each value of the fourth field; and

determining a total number of unique values of the fourth field for each value in a third field, based at least in part on the set of compound keys, as the multiple distinct counts.

9. The computer-implemented method of claim 8, wherein combining the first field with the second field comprises:

un-pivoting the dataset based on a first column associated with the first field and a second column associated with the second field to combine the first field and the second field into the fourth field.

10. The computer-implemented method of claim 9, wherein un-pivoting the dataset comprises:

generating for the fourth field a fourth column associated therewith, the fourth column including a separate row having values for each value in the first field and for each value in the second field; and

modifying a third column associated with the third field to generate a modified third column that includes a number of rows for each separate row of the fourth column, each row in the number of rows for the modified third column having a corresponding value from the third column for each separate row.

11. The computer-implemented method of claim 8, the method further comprising:

dividing the dataset into a plurality of partitions, each partition of the plurality of partitions being the only partition to include a respective portion of the dataset having at least one subset of identifiers of the fourth field of the dataset with a same value; and

wherein determining the total number of unique values comprises performing a single distinct count operation across subsets of the plurality of partitions.

12. The computer-implemented method of claim 8, wherein the first values include at least two unique alphanumeric field identifiers for the first field and the second field, and wherein the second values include at least two values for the third field.

13. The computer-implemented method of claim 8, wherein determining the total number of unique values comprises:

performing a single distinct count operation for values of the fourth field based on the set of compound keys to generate compound key counts;

pivoting the compound key counts with respect to the first key and the second key; and

determining a total number of ones of the compound key counts for each of the values for the first key and the values for the second key.

14. The computer-implemented method of claim 8, wherein the dataset comprises log entries having data for at least one of a hosted web service or a hosted web application; or

wherein the computer-implemented method is performed by a system that is a cloud-based system that hosts big data storage for the dataset.

15. A computer program product that includes a computer-readable medium having computer program logic recorded thereon, comprising:

computer program logic for enabling a processor to perform any of the computer-implemented methods of claims 8-14.