1. WO2017190757 - DISTRIBUTED DATA ANALYSIS SYSTEM AND METHOD

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

Claims

1. A distributed data analysis system (100; 200) for analyzing collected measurement data, comprising:

a data input device (10; 110) configured to receive measurement data;

a first storage device (12; 112) associated with the data input device (10; 110) and configured to store the measurement data input via the data input device (10; 110);

a first computing device (14; 114) associated with the first storage device (12; 112);

a second storage device (16; 116) configured to store measurement data previously stored on the first storage device (12; 112);

a second computing device (18; 118) associated with the second storage device (16; 116);

a data distribution system (19; 119) configured to distribute the measurement data between the first storage device (12; 112) and the second storage device (16; 116) based on at least one predetermined criterion;

a data management device (22; 122) configured to store a location of the measurement data and to update the stored location based on the distribution by the data distribution device (20; 120);

a query input device (24; 124) configured to receive a query for an analysis to be performed on the collected measurement data;

an analysis device (26; 126) configured to perform the analysis on the measurement data stored on the first storage device (12; 112) and the second storage device (16; 116) by the first computing device (14; 114) and the second computing device (18; 118), respectively, based on the location of the collected measurement data stored by the data management device (22; 122); and

a reporting device (28; 128) configured to report a result of the analysis of the collected measurement data.

2. The system of claim 1, comprising:

a plurality of first storage devices (12) located at geographically different locations, each first storage device (12) being associated with a corresponding data input device (10) and a corresponding first computing device (14), wherein the second storage device (16) is a central storage device configured to store data previously stored on the plurality of first storage devices (12).

3. The system of claim 2, wherein

each first computing device (14) is configured to acquire a type of the measurement data input via the corresponding data input device (10), for example, by automatically detecting a data format of the same, and to perform a pre-processing of the measurement data based on the acquired type of measurement data prior to storing the measurement data on the associated first storage device (12).

4. The system of claim 3, wherein the pre-processing includes converting the measurement data into a format that is suitable for analysis in a cluster computing framework.

5. The system of any one of claims 2 to 4, wherein

each first computing device (14) is configured to generate meta data from the measurement data input via the corresponding data input device (10) and stored on the first storage device (20), said meta data including the location of the measurement data, and to forward the meta data to the data management device (22).

6. The system of any one of claims 2 to 5, wherein

each first computing device (14) includes a computing cluster (30), for example, a Hadoop cluster, configured to perform the analysis of the measurement data stored on the first storage device (12).

7. The system of any one of claims 2 to 6, wherein

the at least one criterion includes at least one of:

a remaining capacity of one of the plurality of first storage devices (12);

an age of the measurement data;

lapse of a predetermined time interval; and

a specific type of the measurement data stored on one of the plurality of first storage devices (12), and

the data distribution system (19) includes a data distribution device (20) configured to initiate a backup of at least part of the measurement data stored on one of the plurality of first storage devices (12) to a portable storage device (32) provided at the location of the corresponding first storage device (12) when the at least one criterion is met.
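The criteria listed in claim 7 can be illustrated with a minimal sketch. The claim names the four criteria but gives no thresholds or data structures, so `StorageState`, `BackupPolicy`, `backup_due`, and every field name and value below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class StorageState:
    """Hypothetical snapshot of one first storage device (12)."""
    free_bytes: int
    oldest_data_age_days: float
    days_since_last_backup: float
    stored_types: Set[str]


@dataclass
class BackupPolicy:
    """Hypothetical thresholds; a criterion set to None is not evaluated."""
    min_free_bytes: Optional[int] = None
    max_data_age_days: Optional[float] = None
    backup_interval_days: Optional[float] = None
    trigger_types: Optional[Set[str]] = None


def backup_due(state: StorageState, policy: BackupPolicy) -> List[str]:
    """Return the names of the criteria that are met (empty list: no backup)."""
    reasons = []
    if policy.min_free_bytes is not None and state.free_bytes < policy.min_free_bytes:
        reasons.append("remaining capacity")
    if policy.max_data_age_days is not None and state.oldest_data_age_days > policy.max_data_age_days:
        reasons.append("age of data")
    if policy.backup_interval_days is not None and state.days_since_last_backup >= policy.backup_interval_days:
        reasons.append("time interval")
    if policy.trigger_types and state.stored_types & policy.trigger_types:
        reasons.append("data type")
    return reasons
```

In this sketch a backup to the portable storage device (32) would be initiated whenever `backup_due` returns a non-empty list, which corresponds to "at least one criterion is met".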

8. The system of claim 7, further comprising a backup data input device (34) provided at the location of the second storage device (16), connectable to the portable storage device (32) and configured to read the measurement data on the portable storage device (32) and to transfer the same to the second storage device (16).

9. The system of any one of claims 2 to 8, further comprising a data verification device (36) configured to verify that data that has been transferred to the second storage device (16) is identical to the measurement data previously stored on the first storage device (12), and to initiate deletion of the measurement data previously stored on the first storage device (12) after successful verification of the data, wherein

the data management device (22) is configured to update the location of the verified measurement data after successful verification.
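The verify, delete, and update sequence of claim 9 might be sketched as follows. The claim only requires that the transferred data be "identical"; comparing SHA-256 digests is an assumption made here, and `sha256_of`, `verify_and_release`, and the `locations` dictionary are hypothetical names standing in for the data verification device (36) and the data management device (22).

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large measurement files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_release(source: Path, copy: Path, locations: Dict[str, str],
                       data_id: str, new_location: str) -> bool:
    """Delete the source and update the stored location only if the copy matches.

    Assumption: equality of SHA-256 digests is accepted as proof of identity.
    """
    if sha256_of(source) != sha256_of(copy):
        return False  # verification failed: keep the source and the old location
    source.unlink()                      # deletion after successful verification
    locations[data_id] = new_location    # location update by the management device
    return True
```

The ordering matters in this sketch: the stored location is changed only after verification succeeds, so a failed transfer leaves both the source data and its recorded location untouched.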

10. The system of any one of claims 2 to 9, wherein each of the plurality of first storage devices (12) is contained in a housing (40) together with the associated first computing device (14), the housing (40) including the data input device (10) and forming a standalone unit.

11. The system of claim 1, wherein

the second storage device (116) is co-located with the first storage device (112),

the first computing device (114) has more computing power than the second computing device (118), and

the data distribution device (120) is configured to classify the measurement data into data having different priorities, and to transfer data having a lower priority to the second storage device (116).

12. The system of claim 11, wherein the first computing device (114) and the second computing device (118) form part of a cluster computing system (119), and the data management device (120) is configured to classify the measurement data based on at least one of: access times; creation dates; types of data; and other meta data associated with the collected measurement data.
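The priority classification of claims 11 and 12 could, for example, look like the sketch below. The claims list the inputs (access times, creation dates, types of data, other meta data) but not the rule that combines them; the two-level "high"/"low" scheme, the time windows, and the names `DataRecord`, `classify`, and `select_for_transfer` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Iterable, List


@dataclass
class DataRecord:
    """Hypothetical meta data kept for one item of measurement data."""
    data_id: str
    data_type: str
    created: datetime
    last_access: datetime


def classify(record: DataRecord, now: datetime,
             hot_window: timedelta = timedelta(days=30),
             fresh_window: timedelta = timedelta(days=7),
             low_priority_types: FrozenSet[str] = frozenset()) -> str:
    """Assumed rule: listed types are always low; otherwise recent use or creation is high."""
    if record.data_type in low_priority_types:
        return "low"
    recently_accessed = now - record.last_access <= hot_window
    recently_created = now - record.created <= fresh_window
    return "high" if (recently_accessed or recently_created) else "low"


def select_for_transfer(records: Iterable[DataRecord], now: datetime, **rule) -> List[str]:
    """Ids of lower-priority data to move to the second storage device (116)."""
    return [r.data_id for r in records if classify(r, now, **rule) == "low"]
```

Under these assumptions, data that is neither recently accessed nor recently created would be moved away from the storage attached to the more powerful first computing device (114).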

13. The system of claim 11 or 12, further comprising an object store (140) having substantially no computing power and being configured for long-term storage of measurement data having a lowest priority, wherein the data management device (120) includes a meta data generation device (136) configured to analyze the data to be stored in the object store (140), and to generate appropriate meta data for efficient recovery of the stored measurement data by the data management device (120) based on the analysis.

14. The system of any preceding claim, wherein the analysis device (26; 126) is configured to generate analysis code to be executed on the first computing device (14; 114) and the second computing device (18; 118), respectively, based on the query, and to send the analysis code to the first computing device (14; 114) and the second computing device (18; 118) for execution on the same.

15. A method for distributed data analysis of collected measurement data stored on a first storage device (12; 112) and a second storage device (14; 114), comprising:

receiving measurement data;

storing a location of the received measurement data;

distributing the measurement data between the first storage device (12; 112) and the second storage device (16; 116) based on at least one predetermined criterion;

updating the location of the measurement data based on the distribution;

receiving a query for an analysis to be performed on the collected measurement data;

performing the analysis on the measurement data stored on the first storage device (12) and the second storage device (14; 114) by a first computing device (14; 114) associated with the first storage device (12; 112) and a second computing device (18; 118) associated with the second storage device (16; 116), respectively, in accordance with the stored location of the measurement data; and

reporting a result of the analysis.

16. The method of claim 15, further comprising

acquiring a type of the received measurement data, and

performing pre-processing of the measurement data based on the acquired type of measurement data prior to storing the measurement data.

17. The method of claim 16, wherein pre-processing includes converting the measurement data into a different format, for example, a format that is suitable for analysis in a cluster computing framework.

18. The method of any one of claims 15 to 17, further comprising generating meta data from the received measurement data, said meta data including the location of the measurement data.

19. The method of any one of claims 15 to 18, further comprising initiating a backup of at least part of the measurement data stored on the first storage device (12) to a portable storage device (32) provided at the location of the corresponding first storage device (12) when the at least one criterion is met, the at least one criterion including at least one of:

a remaining capacity of the first storage device (12);

an age of the measurement data;

lapse of a predetermined time interval; and

a specific type of the measurement data stored on the first storage device (12).

20. The method of claim 19, further comprising

physically transporting the portable storage device (32) to the location of the second storage device (16), for example, by mail,

transferring the backup data from the portable storage device (32) to the second storage device (16),

verifying that the transferred data is identical to the data previously stored on the first storage device (12), and

deleting the measurement data previously stored on the first storage device after successful verification.

21. The method of claim 15, further comprising

classifying the measurement data into data having different priorities based on at least one of access times, creation dates, types of data, and other meta data associated with the collected measurement data, and

transferring data having a lower priority from the first storage device (112) to the second storage device (116).

22. The method of any one of claims 15 to 21, further comprising generating analysis code based on the query, and

executing the analysis code on the first computing device (14; 114) and the second computing device (18; 118).

23. A computer program comprising computer-executable instructions that, when executed on a computer system, cause the computer system to execute the steps of the method of any one of claims 15 to 22.

24. A computer program comprising computer-executable instructions that, when executed on a computer system, cause the computer system to perform the following steps:

acquiring a location of measurement data stored on a plurality of storage devices (12, 16) provided at geographically different locations;

receiving a query for an analysis to be performed on the measurement data;

evaluating the query to determine on which of the plurality of storage devices (12, 16) data relevant to the query is stored;

generating analysis code based on the query;

forwarding the generated analysis code to computing devices (14, 18) associated with the relevant storage devices (12, 16), respectively;

receiving a partial analysis result from each computing device (14, 18);

combining the partial analysis results; and

reporting the combined analysis result.
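The steps of claim 24 follow a scatter-gather pattern, which can be sketched as below for a single, assumed query type (the mean of a named data set). The claim does not say what the analysis code looks like or how partial results are combined; the `Site` class, the in-memory `locations` mapping, and the sum/count partial result are hypothetical choices made so that the example is self-contained.

```python
from typing import Callable, Dict, List, Optional, Set

Measurements = List[float]
AnalysisCode = Callable[[Measurements], Dict[str, float]]


class Site:
    """Hypothetical stand-in for a storage device and its associated computing device."""

    def __init__(self, name: str, datasets: Dict[str, Measurements]):
        self.name = name
        self.datasets = datasets

    def execute(self, dataset: str, code: AnalysisCode) -> Dict[str, float]:
        # The analysis code runs where the data is stored; only the partial result leaves.
        return code(self.datasets[dataset])


def generate_mean_code() -> AnalysisCode:
    """Generate 'analysis code' for a mean query: each site returns a sum and a count."""
    def partial(values: Measurements) -> Dict[str, float]:
        return {"sum": sum(values), "count": len(values)}
    return partial


def combine_mean(partials: List[Dict[str, float]]) -> Optional[float]:
    """Combine partial results into one mean; None if no data was found."""
    count = sum(p["count"] for p in partials)
    return sum(p["sum"] for p in partials) / count if count else None


def analyze_mean(dataset: str, sites: List[Site],
                 locations: Dict[str, Set[str]]) -> Optional[float]:
    # Evaluate the query against the stored locations to find the relevant sites.
    relevant = [s for s in sites if s.name in locations.get(dataset, set())]
    code = generate_mean_code()
    partials = [s.execute(dataset, code) for s in relevant]  # forward code, collect results
    return combine_mean(partials)
```

Returning a sum and a count from each site, rather than a local mean, is a design choice of this sketch: it lets the partial results be combined exactly even when the sites hold different amounts of data.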

25. The computer program of claim 24, further comprising instructions for initiating a transfer of measurement data from one storage device to another when at least one predetermined criterion is met, the at least one criterion including at least one of:

a remaining capacity of the first storage device (12);

an age of the measurement data;

lapse of a predetermined time interval; and

a specific type of measurement data stored on the first storage device (12).