WO2012148616 - JOINING TABLES IN A MAPREDUCE PROCEDURE

Publication Number WO/2012/148616
Publication Date 01.11.2012
International Application No. PCT/US2012/030941
International Filing Date 28.03.2012
IPC
G06F 17/30 (2006.01)
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
17 Digital computing or data processing equipment or methods, specially adapted for specific functions
30 Information retrieval; Database structures therefor
CPC
G06F 16/24532
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
20 of structured data, e.g. relational data
24 Querying
245 Query processing
2453 Query optimisation
24532 of parallel queries
G06F 16/2456
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
20 of structured data, e.g. relational data
24 Querying
245 Query processing
2455 Query execution
24553 of query operations
24558 Binary matching operations
2456 Join operations
G06F 16/2471
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
20 of structured data, e.g. relational data
24 Querying
245 Query processing
2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
2471 Distributed queries
G06F 16/278
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
20 of structured data, e.g. relational data
27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
278 Data partitioning, e.g. horizontal or vertical partitioning
G06F 16/283
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
20 of structured data, e.g. relational data
28 Databases characterised by their database models, e.g. relational or object models
283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
G06F 9/5066
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
9 Arrangements for program control, e.g. control units
06 using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
46 Multiprogramming arrangements
50 Allocation of resources, e.g. of the central processing unit [CPU]
5061 Partitioning or combining of resources
5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
Applicants
  • GOOGLE INC. [US]/[US] (for all designated States except US)
  • CHATTOPADHAY, Biswapesh [IN]/[US] (for US only)
  • LIN, Liang [CN]/[US] (for US only)
Inventors
  • CHATTOPADHYAY, Biswapesh
  • LIN, Liang
Agents
  • SODERBERG, J. Richard
Priority Data
13/209,567 15.08.2011 US
61/480,563 29.04.2011 US
Publication Language English (EN)
Filing Language English (EN)
Title
(EN) JOINING TABLES IN A MAPREDUCE PROCEDURE
(FR) JOINTURE DE TABLES DANS UNE PROCÉDURE MAPREDUCE
Abstract
(EN)
Systems and techniques are described by which tables can be joined in a mapreduce procedure. In some implementations, when a large table of business data (e.g., one billion or more transaction records) is to be joined with a large table of customer data (e.g., hundreds of millions of customer records), the two tables can be organized before the mapreduce procedure to speed up the join. For example, the business data and the customer data can both be hash partitioned on the same key into shards of business data and shards of customer data, respectively. The numbers of shards in the two groups are related by an integer ratio: for example, there may be two business data shards for every customer data shard, or vice versa.
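The co-partitioning idea in the abstract can be sketched in Python. This is a minimal, single-process illustration under stated assumptions, not the patented procedure: the hash function (CRC-32), the field names `customer_id`, `txn_id`, and `amount`, and the helpers `shard_of`, `hash_partition`, `join_shard_pair`, and `join_sharded` are all hypothetical. It shows why the integer ratio matters: when the larger shard count is a multiple of the smaller one, a given key can land only in shard pairs whose indices agree modulo the smaller count, so those pairs can be joined independently without reshuffling either table.

```python
import zlib
from collections import defaultdict

def shard_of(key: str, num_shards: int) -> int:
    """Map a join key to a shard index with a deterministic hash (CRC-32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def hash_partition(records, key_field, num_shards):
    """Split records into num_shards lists, routing each record by the hash of its join key."""
    shards = [[] for _ in range(num_shards)]
    for rec in records:
        shards[shard_of(rec[key_field], num_shards)].append(rec)
    return shards

def join_shard_pair(left, right, key_field):
    """In-memory hash join of two shards: index `right` by key, then probe it with `left`."""
    index = defaultdict(list)
    for r in right:
        index[r[key_field]].append(r)
    return [{**r, **l} for l in left for r in index.get(l[key_field], [])]

def join_sharded(business_shards, customer_shards, key_field):
    """Join two hash-partitioned tables whose shard counts have an integer ratio."""
    nb, nc = len(business_shards), len(customer_shards)
    m = min(nb, nc)
    if max(nb, nc) % m != 0:
        raise ValueError("shard counts must have an integer ratio")
    joined = []
    for i in range(nb):
        for j in range(nc):
            # m divides both counts, so (h % nb) % m == h % m == (h % nc) % m.
            # A key can therefore appear in business shard i and customer shard j
            # only if i % m == j % m; all other shard pairs are skipped.
            if i % m == j % m:
                joined.extend(join_shard_pair(business_shards[i], customer_shards[j], key_field))
    return joined

# Hypothetical sample data: 10 customers, 40 transactions referencing them.
customers = [{"customer_id": f"c{n}", "name": f"Customer {n}"} for n in range(10)]
transactions = [{"txn_id": t, "customer_id": f"c{t % 10}", "amount": t * 1.5} for t in range(40)]

business_shards = hash_partition(transactions, "customer_id", 4)  # two business shards...
customer_shards = hash_partition(customers, "customer_id", 2)     # ...per customer shard
result = join_sharded(business_shards, customer_shards, "customer_id")
print(len(result))  # 40: every transaction matches exactly one customer
```

In a real mapreduce setting, each matching shard pair would presumably be handed to a separate worker; the nested loop here only stands in for that scheduling step.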
Latest bibliographic data on file with the International Bureau