1. (WO2010039898) EFFICIENT LARGE-SCALE FILTERING AND/OR SORTING FOR QUERYING OF COLUMN BASED DATA ENCODED STRUCTURES
Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

WHAT IS CLAIMED:

1. A device for processing data, comprising: at least one query processor 220 that processes a query over a windowed subset of data based on a histogram computed for a distribution of values in a data store within a target window; and an encoder 210 for encoding, in response to the query, the windowed subset of data received as integer encoded and compressed sequences of values corresponding to different columns of the data.

2. The device of claim 1, wherein the statistics are dynamically obtained and dependent on WHEREs and ORDER BYs for the purpose of windowing the data to the target window.

3. The device of claim 1, wherein the statistics are built for real data, and for synthetic data values, data is first discretized according to a discretization process.

4. The device of claim 3, wherein the synthetic, discretized data is treated as a form of joined column to participate in ORDER BY operations.

5. The device of claim 1, wherein a final window is not generated until a secondary scan once the content of columns implicated by ORDER BY operations is analyzed.

6. The device of claim 1, wherein precomputed hierarchy data is used for simulating at least some synthetic, discretized columns.

7. The device of claim 1, wherein a selection algorithm is employed to choose a discretization method for discretizing data and buffering strategy for the query based on at least one of the query definition, the content of the target window, as determined from statistics, or degree of parallelization.

8. The device of claim 1, wherein a projection method is chosen based on the content of WHERE and ORDER BY buckets of data and the content of the target window, as determined from statistics.

9. The device of claim 1, wherein the final window is built in parallel using only insertions into one of up to N preconfigured buffers, with N corresponding to a number of processors of the device.

10. The device of claim 1, wherein internal buffers for buffering data are configured according to buffer sizes and their policy based on at least one of query specification, a selected discretization method, a distribution of data in the final window, as determined by histogram, or degree of parallelization applied to data processing.

11. A method for processing data, comprising: receiving 2110 a query from an application implicating at least one filter or sort operation over data in at least one data store to retrieve a subset of data applying to a local window; computing statistics 2120 about distribution of rows for any specified WHERE clauses or ORDER BY columns of the query; based on the statistics, determining 2130 at least one result set for the at least one filter or sort operation that match with the local window; and transmitting 2140 the at least one result set to the application.

12. The method of claim 11, further comprising: caching 2230 the statistics for at least partial re-use.

13. The method of claim 11, wherein the determining 2130 includes parallelizing the operations defined by the query with multiple processors and a corresponding number of segments of data, each segment handled by at least one different processor.

14. A computer readable medium comprising computer executable instructions for performing the method of claim 11.

15. A method for query processing, including: transmitting 2210 a query from an application implicating at least one filter or sort operation over data in at least one data store to retrieve a subset of data applying to a local window; based on statistics about distribution of rows for any specified WHERE clauses or ORDER BY columns of the query, receiving 2220 at least one result set for the at least one filter or sort operation that match with the local window.

16. The method of claim 15, further comprising: caching 2230 results based on policies regarding costs and potential for re-use of data.

17. The method of claim 15, further comprising dynamically determining 2120 the statistics dependent on WHEREs and ORDER BYs for the purpose of windowing the data to the local window.

18. The method of claim 15, further comprising, prior to determining the statistics, discretizing 700 synthetic data values to form synthetic, discretized data.

19. The method of claim 18, further comprising using the synthetic, discretized data during column joins to participate in ORDER BY operations.

20. The method of claim 15, further comprising waiting to generate a final window for the target window until content of columns implicated by ORDER BY operations is analyzed.
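
Illustrative note (not part of the claims): the statistics-then-window flow recited in claims 11 and 20 might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `windowed_query`, its parameters, and the use of exact per-value counts in place of a bucketed histogram are all hypothetical choices made for illustration. A first scan builds statistics over the ORDER BY column for rows passing the WHERE filter; a second scan materializes only the rows whose sort values overlap the requested window.

```python
from collections import Counter


def windowed_query(rows, where, order_key, skip, top):
    """Return the filtered, sorted rows in positions [skip, skip + top).

    Hypothetical helper: avoids sorting every matching row by first
    counting how many rows fall under each sort value.
    """
    # First scan: statistics (here, exact counts) on the ORDER BY value
    # for rows that satisfy the WHERE predicate.
    hist = Counter(order_key(r) for r in rows if where(r))

    lo, hi = skip, skip + top
    wanted = set()           # sort values whose rows overlap the window
    offset_of_first = None   # sorted position of the first wanted row
    seen = 0
    for value in sorted(hist):
        start, end = seen, seen + hist[value]
        if start < hi and end > lo:
            if not wanted:
                offset_of_first = start
            wanted.add(value)
        seen = end
        if seen >= hi:
            break

    if not wanted:
        return []

    # Second scan: the final window is built only now, from the rows
    # whose sort value was selected using the statistics.
    candidates = [r for r in rows if where(r) and order_key(r) in wanted]
    candidates.sort(key=order_key)
    begin = skip - offset_of_first
    return candidates[begin:begin + top]
```

Under these assumptions, only the rows sharing a sort value with the window are ever sorted; a real engine would presumably work over encoded column segments and coarser buckets instead of Python lists and exact counts.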