(WO2015038442) PROCESSING DATASETS WITH A DBMS ENGINE

CLAIMS

1. A method for processing a dataset with a database management system (DBMS) engine, the method comprising:

splitting bulk data into a plurality of chunks;

converting the chunks to an external dataset comprising a plurality of row groups, the external dataset being external to a DBMS comprising the DBMS engine, the external dataset comprising a DBMS-specific columnar format;

creating an empty DBMS table within the DBMS;

attaching the external dataset to the empty DBMS table; and

executing a MapReduce job on a cluster of compute nodes, using the external dataset as input.
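For illustration only, the following Python sketch walks through the five steps of claim 1 end to end. Every name in it (RowGroup, ExternalDataset, DBMS.attach, the toy mapreduce function) is hypothetical; none is drawn from the patent or from any particular DBMS.

```python
# Hypothetical sketch of the claim-1 flow; not the patent's implementation.
from dataclasses import dataclass, field

@dataclass
class RowGroup:
    columns: dict                        # column name -> list of values

@dataclass
class ExternalDataset:                   # lives outside the DBMS
    row_groups: list = field(default_factory=list)

class DBMS:
    def __init__(self):
        self.catalog = {}                # table name -> metadata

    def create_empty_table(self, name):
        self.catalog[name] = {"attached_row_groups": 0}        # step 3

    def attach(self, name, dataset):
        # Step 4: record the external dataset in the catalog only;
        # the row groups themselves stay outside the DBMS (cf. claim 7).
        self.catalog[name]["attached_row_groups"] = len(dataset.row_groups)

def split(bulk, n_chunks):
    # Step 1: split bulk data into n roughly equal chunks.
    size = -(-len(bulk) // n_chunks)     # ceiling division
    return [bulk[i:i + size] for i in range(0, len(bulk), size)]

def convert(chunk):
    # Step 2: row-oriented chunk -> one columnar row group.
    cols = {}
    for row in chunk:
        for name, value in row.items():
            cols.setdefault(name, []).append(value)
    return RowGroup(cols)

def mapreduce(dataset, mapper, reducer):
    # Step 5: a toy, single-process MapReduce using the dataset as input.
    shuffled = {}
    for group in dataset.row_groups:
        for key, value in mapper(group):
            shuffled.setdefault(key, []).append(value)
    return {k: reducer(vs) for k, vs in shuffled.items()}

bulk = [{"user": "a", "bytes": 10}, {"user": "b", "bytes": 20},
        {"user": "a", "bytes": 5}, {"user": "c", "bytes": 7}]
dataset = ExternalDataset([convert(c) for c in split(bulk, n_chunks=2)])
dbms = DBMS()
dbms.create_empty_table("traffic")       # step 3
dbms.attach("traffic", dataset)          # step 4
totals = mapreduce(dataset,              # step 5
                   mapper=lambda g: zip(g.columns["user"], g.columns["bytes"]),
                   reducer=sum)
print(totals)                            # {'a': 15, 'b': 20, 'c': 7}
```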

2. The method recited in claim 1, the compute nodes each comprising an instance of the DBMS.

3. The method recited in claim 2, wherein converting the chunks is performed in parallel on the compute nodes, each of the compute nodes converting one of the chunks.
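As a hypothetical sketch of the parallel conversion recited in claim 3, worker processes can stand in for compute nodes, each converting one chunk; `convert` repeats the row-to-columnar function from the sketch above.

```python
# Illustrative only: one worker process per chunk stands in for one
# compute node converting its chunk.
from multiprocessing import Pool

def convert(chunk):
    cols = {}
    for row in chunk:
        for name, value in row.items():
            cols.setdefault(name, []).append(value)
    return cols

if __name__ == "__main__":
    chunks = [[{"x": 1}, {"x": 2}], [{"x": 3}, {"x": 4}]]
    with Pool(processes=len(chunks)) as pool:    # one worker per chunk
        row_groups = pool.map(convert, chunks)   # claim 3: in parallel
    print(row_groups)                            # [{'x': [1, 2]}, {'x': [3, 4]}]
```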

4. The method recited in claim 2, comprising:

the MapReduce job sending commands to the DBMS; and

the DBMS processing the external dataset in response to the commands.
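A minimal sketch of the division of labor in claim 4, under the assumption that each map task holds a handle to a local DBMS instance; the Engine class and its single command are invented for illustration.

```python
# Illustrative only: the map task sends a command; the DBMS, not the
# map task, scans the attached external row groups.
class Engine:
    """Stand-in for a per-node DBMS instance with an attached dataset."""
    def __init__(self, attached_row_groups):
        self.attached = attached_row_groups

    def execute(self, command):
        # Only one invented command is understood here.
        if command == "SUM(bytes)":
            return sum(v for group in self.attached for v in group["bytes"])
        raise ValueError(f"unsupported command: {command}")

def map_task(engine):
    # The map task only issues the command and collects the result.
    return engine.execute("SUM(bytes)")

engine = Engine([{"bytes": [10, 20]}, {"bytes": [5, 7]}])
print(map_task(engine))   # 42
```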

5. The method recited in claim 2, wherein each of the chunks comprises a number of rows approximately equal to the number of rows of the bulk data divided by the number of compute nodes.
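The sizing rule of claim 5 (and of claim 12 below) is simple arithmetic; the hypothetical helper below distributes any remainder rows one per chunk so that every chunk holds about n_rows / n_nodes rows.

```python
# Illustrative arithmetic for claims 5 and 12; not from the patent.
def chunk_sizes(n_rows, n_nodes):
    base, extra = divmod(n_rows, n_nodes)
    # The first `extra` chunks carry one additional row each.
    return [base + 1 if i < extra else base for i in range(n_nodes)]

print(chunk_sizes(1_000_003, 8))
# [125001, 125001, 125001, 125000, 125000, 125000, 125000, 125000]
```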

6. The method recited in claim 2, wherein the MapReduce job comprises a map job executing on each of the compute nodes.

7. The method recited in claim 1, wherein attaching the external dataset comprises copying metadata describing the dataset to a catalog of the DBMS.
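A sketch of the attach step of claim 7, under the assumption that the catalog can be modeled as a dictionary: only metadata (schema, format, row-group file names) is copied into the catalog, while the row-group data itself stays outside the DBMS.

```python
# Illustrative only: a hypothetical catalog entry for an attached dataset.
dataset_metadata = {
    "schema": [("id", "INT"), ("total", "FLOAT")],
    "format": "dbms-specific-columnar",
    "row_group_files": ["rg-000.col", "rg-001.col"],   # names only, no data
}

catalog = {}                                  # the DBMS catalog

def attach(catalog, table_name, metadata):
    # Claim 7: attaching = copying the dataset's metadata to the catalog.
    catalog[table_name] = dict(metadata)

attach(catalog, "traffic", dataset_metadata)
print(catalog["traffic"]["row_group_files"])  # ['rg-000.col', 'rg-001.col']
```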

8. A system for executing a MapReduce job, comprising:

a cluster of compute nodes, each comprising:

a processing unit; and

a system memory, wherein the system memory comprises code configured to direct the processing unit to:

split bulk data into a plurality of chunks;

convert the chunks to an external dataset comprising a plurality of row groups, the external dataset being external to a DBMS comprising a DBMS engine, the external dataset comprising a DBMS-specific columnar format;

create an empty DBMS table within the DBMS;

attach the external dataset to the empty DBMS table; and

execute a MapReduce job on the cluster of compute nodes, using the external dataset as input.

9. The system recited in claim 8, the compute nodes each comprising an instance of the DBMS.

10. The system recited in claim 8, wherein converting the chunks is performed in parallel on the compute nodes, each of the compute nodes converting one of the chunks.

11. The system recited in claim 8, wherein a number of chunks is equal to a number of compute nodes.

12. The system recited in claim 8, wherein each of the chunks comprises a number of rows approximately equal to the number of rows of the bulk data divided by the number of compute nodes.

13. The system recited in claim 8, wherein the MapReduce job comprises a map job executing on each of the compute nodes.

14. The system recited in claim 8, wherein attaching the external dataset comprises copying metadata describing the dataset to a catalog of the DBMS.

15. The system recited in claim 8, the empty DBMS table comprising one or more columns of the same type and number as one or more columns of the dataset external to the DBMS.
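A hypothetical check for the column compatibility that claim 15 recites: the same number of columns and the same per-position types, with column names allowed to differ.

```python
# Illustrative only: a compatibility check between the empty table's
# declared columns and the external dataset's columns.
def schemas_compatible(table_columns, dataset_columns):
    return (len(table_columns) == len(dataset_columns) and
            all(t_type == d_type
                for (_, t_type), (_, d_type)
                in zip(table_columns, dataset_columns)))

table_columns   = [("id", "INT"), ("total", "FLOAT")]
dataset_columns = [("order_id", "INT"), ("amount", "FLOAT")]
print(schemas_compatible(table_columns, dataset_columns))   # True
```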