WO2020112993 - SYSTEMS AND METHODS FOR DATA USAGE MONITORING IN MULTI-TENANCY ENABLED HADOOP CLUSTERS

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

CLAIMS

What is claimed is:

1. A method for monitoring data usage in multi-tenancy enabled HADOOP clusters, comprising:

in an information processing apparatus comprising at least one computer processor:

receiving metadata related to a dataset in one or more multi-tenant clusters;

receiving entitlement data for a plurality of users to the dataset;

receiving group membership data for the plurality of users;

receiving access permissions for the plurality of users to the dataset;

receiving audit logs comprising access history for the plurality of users to the dataset;

joining the metadata, entitlement data, group membership data, access permissions, and audit logs into a searchable database;

receiving a query comprising at least one of a date range, a file, a directory, a user, and a group of users;

applying the query to the searchable database; and

returning results to the query.

2. The method of claim 1, wherein the dataset comprises at least one of files, directories, tables, databases, and logical constructs.

3. The method of claim 1, wherein the metadata comprises classification metadata and properties metadata.

4. The method of claim 1, wherein the group membership comprises an identification of the plurality of users that are in groups.

5. The method of claim 1, wherein at least one of the metadata, the entitlement data, the group membership data, the access permissions, and the audit logs are received in real-time or substantially in real-time.

6. The method of claim 1, wherein at least one of the metadata, the entitlement data, the group membership data, the access permissions, and the audit logs are received periodically.

7. The method of claim 1, wherein the access permissions are received from access control lists.

8. The method of claim 7, wherein the access permissions are further received from file and directory permissions.

9. The method of claim 1, wherein the joined data is partitioned based on a unit of time.

10. The method of claim 1, further comprising:

storing the joined data in an optimal format for querying.

11. A system for monitoring data usage in multi-tenancy enabled HADOOP clusters, comprising:

a data mart comprising at least one computer processor;

one or more multi-tenant clusters storing a dataset;

a metadata source storing metadata related to the dataset;

an entitlement data source storing entitlement data for a plurality of users;

a group membership source storing group membership data for the plurality of users;

an access permissions source storing access permissions for the plurality of users to the dataset; and

an audit log source storing access history for the plurality of users to the dataset;

wherein:

the data mart receives the metadata, the entitlement data, the group membership data, the access permissions, and the audit logs;

the data mart joins the metadata, entitlement data, group membership data, access permissions, and audit logs into a searchable database;

the data mart receives a query comprising at least one of a date range, a file, a directory, a user, and a group of users;

the data mart applies the query to the searchable database; and

the data mart returns results to the query.

12. The system of claim 11, wherein the dataset comprises at least one of files, directories, tables, databases, and logical constructs.

13. The system of claim 11, wherein the metadata comprises classification metadata and properties metadata.

14. The system of claim 11, wherein the group membership comprises an identification of the plurality of users that are in groups.

15. The system of claim 11, wherein at least one of the metadata, the entitlement data, the group membership data, the access permissions, and the audit logs are received in real-time or substantially in real-time.

16. The system of claim 11, wherein at least one of the metadata, the entitlement data, the group membership data, the access permissions, and the audit logs are received periodically.

17. The system of claim 11, wherein the access permissions source comprises at least one access control list.

18. The system of claim 17, wherein the access permissions comprise file and directory permissions.

19. The system of claim 11, wherein the joined data is partitioned based on a unit of time.

20. The system of claim 11, wherein the data mart stores the joined data in an optimal format for querying.
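
Illustrative example (not part of the claims). The flow recited in claims 1 and 9 can be sketched in a few lines of Python: five inputs are joined into rows keyed by audit event, the rows are partitioned by day, and a query filters by date range, user, group, or path. This is only a minimal sketch under stated assumptions. The sample records, the field names, and the helper names `build_database` and `query` are hypothetical and do not come from the patent, and an in-memory dictionary stands in for whatever searchable database an actual implementation would use.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample inputs; every name and value here is an assumption.
METADATA = {"/data/sales/q1.csv": {"classification": "confidential"}}
ENTITLEMENTS = {"alice": {"/data/sales/q1.csv"}, "bob": set()}
GROUPS = {"alice": {"finance"}, "bob": {"marketing"}}
PERMISSIONS = {
    ("/data/sales/q1.csv", "alice"): "rw",
    ("/data/sales/q1.csv", "bob"): "r",
}
AUDIT_LOG = [
    {"day": date(2020, 1, 5), "user": "alice", "path": "/data/sales/q1.csv", "action": "read"},
    {"day": date(2020, 1, 5), "user": "bob", "path": "/data/sales/q1.csv", "action": "read"},
    {"day": date(2020, 1, 9), "user": "alice", "path": "/data/sales/q1.csv", "action": "write"},
]


def build_database(metadata, entitlements, groups, permissions, audit_log):
    """Join the five sources into one row per audit event, partitioned by day."""
    partitions = defaultdict(list)
    for event in audit_log:
        user, path = event["user"], event["path"]
        partitions[event["day"]].append({
            "day": event["day"],
            "user": user,
            "path": path,
            "action": event["action"],
            "classification": metadata.get(path, {}).get("classification"),
            "entitled": path in entitlements.get(user, set()),
            "groups": sorted(groups.get(user, set())),
            "permission": permissions.get((path, user)),
        })
    return dict(partitions)


def query(db, start=None, end=None, user=None, group=None, path_prefix=None):
    """Return joined rows matching every filter that was supplied."""
    results = []
    for day in sorted(db):
        # Whole partitions outside the date range are skipped without a scan.
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        for row in db[day]:
            if user is not None and row["user"] != user:
                continue
            if group is not None and group not in row["groups"]:
                continue
            # A path prefix covers both a single file and a directory.
            if path_prefix is not None and not row["path"].startswith(path_prefix):
                continue
            results.append(row)
    return results


db = build_database(METADATA, ENTITLEMENTS, GROUPS, PERMISSIONS, AUDIT_LOG)

# Example: accesses in January 2020 by users who hold no entitlement.
unentitled = [
    row for row in query(db, start=date(2020, 1, 1), end=date(2020, 1, 31))
    if not row["entitled"]
]
```

Partitioning by day means a date-range query touches only the partitions inside the range, which is one plausible reason to partition the joined data by a unit of time. A columnar file format would be a common choice for storing such partitions, but that choice is an assumption here and is not stated in the claims.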