(WO2010078475) METHODS AND SYSTEM FOR DOCUMENT RECONSTRUCTION
Latest bibliographic data on file with the International Bureau

Pub. No.: WO/2010/078475
International Application No.: PCT/US2009/069885
Publication Date: 08.07.2010
International Filing Date: 31.12.2009
IPC: G06F 17/27 (2006.01)
G: PHYSICS
G06: COMPUTING; CALCULATING; COUNTING
G06F: ELECTRIC DIGITAL DATA PROCESSING
G06F 17: Digital computing or data processing equipment or methods, specially adapted for specific functions
G06F 17/20: Handling natural language data
G06F 17/27: Automatic analysis, e.g. parsing, orthograph correction
Applicants:
APPLE INC. [US/US]; 1 Infinite Loop, M/s 40-pat Cupertino, CA 95014, US (AllExceptUS)
MANSFIELD, Philip, Andrew [CA/CA]; CA (UsOnly)
LEVY, Michael, Robert [CA/CA]; CA (UsOnly)
CLEGG, Derek, B. [US/US]; US (UsOnly)
Inventors:
MANSFIELD, Philip, Andrew; CA
LEVY, Michael, Robert; CA
CLEGG, Derek, B.; US
Agent:
ADELI, Mani; Adeli & Tollen LLP 11940 San Vicente Blvd., Suite 100 Los Angeles, CA 90049, US
Priority Data:
12/455,866  07.06.2009  US
12/479,842  07.06.2009  US
12/479,843  07.06.2009  US
12/479,844  07.06.2009  US
12/479,845  07.06.2009  US
12/479,847  07.06.2009  US
12/479,848  07.06.2009  US
12/479,849  07.06.2009  US
12/479,850  07.06.2009  US
12/479,852  07.06.2009  US
61/142,329  02.01.2009  US
Title (EN) METHODS AND SYSTEM FOR DOCUMENT RECONSTRUCTION
(FR) PROCÉDÉS ET SYSTÈME DE RECONSTRUCTION DE DOCUMENT
Abstract:
(EN) Different embodiments of the invention use different techniques for analyzing an unstructured document to define a structured document. The unstructured document includes numerous primitive elements, but does not include structural elements that specify the structural relationship between the primitive elements and/or structural attributes of the document based on these primitive elements. To define the structured document, the primitive elements of the unstructured document are used to identify various geometric attributes of the unstructured document. The identified geometric attributes and other attributes of the primitive elements are used to define structural elements, such as associated primitive elements (e.g., words, paragraphs, joined graphs, etc.), tables, guides, gutters, etc., as well as to define the flow of reading through the primitive and structural elements. Various methods to enhance the efficiency of the geometric analysis and document reconstruction processes (e.g., hierarchical profiling, efficient cluster analysis techniques, efficient data structures) are provided.
(FR) Différents modes de réalisation de l'invention utilisent différentes techniques pour analyser un document non structuré pour définir un document structuré. Le document non structuré comprend de nombreux éléments primitifs, mais ne comprend pas d'éléments structuraux qui spécifient la relation structurale entre les éléments primitifs et/ou des attributs structuraux du document sur la base de ces éléments primitifs. Pour définir le document structuré, les éléments primitifs du document non structuré sont utilisés pour identifier divers attributs géométriques du document non structuré. Les attributs géométriques identifiés et d'autres attributs des éléments primitifs sont utilisés pour définir des éléments structuraux, tels que des éléments primitifs associés (par exemple, des mots, paragraphes, graphiques joints, etc.), des tableaux, des guides, intercalaires colonnes, etc., ainsi que pour définir le flux de lecture à travers les éléments primitifs et structuraux. L'invention porte sur divers procédés pour améliorer l'efficacité des processus d'analyse géométrique et de reconstruction de document (par exemple, profilage hiérarchique, techniques d'analyse par groupe efficaces, structures de données efficaces).
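The abstract describes using geometric attributes of primitive elements to define structural elements such as words. As a rough illustration only, the sketch below groups glyphs on a single text line into words by comparing the horizontal gap between neighbouring glyphs with a fixed threshold. The `Glyph` class, the `group_into_words` function, and the fixed `gap_threshold` are all hypothetical names and simplifications introduced here; the abstract does not specify this procedure, and it mentions cluster analysis techniques rather than a fixed threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    """A hypothetical primitive element: one character with its horizontal extent."""
    char: str
    x0: float  # left edge
    x1: float  # right edge


def group_into_words(glyphs: List[Glyph], gap_threshold: float) -> List[str]:
    """Join glyphs on one line into words.

    A new word starts whenever the gap between a glyph's left edge and the
    previous glyph's right edge exceeds gap_threshold (an assumed constant).
    """
    words: List[str] = []
    current = ""
    prev_right: Optional[float] = None
    for g in sorted(glyphs, key=lambda g: g.x0):
        if prev_right is not None and g.x0 - prev_right > gap_threshold:
            words.append(current)
            current = ""
        current += g.char
        prev_right = g.x1
    if current:
        words.append(current)
    return words


line = [
    Glyph("H", 0, 8), Glyph("i", 9, 11),
    Glyph("y", 20, 27), Glyph("o", 28, 35), Glyph("u", 36, 43),
]
print(group_into_words(line, gap_threshold=4.0))  # ['Hi', 'you']
```

In this toy input the only gap wider than the threshold is the 9-unit gap between "i" and "y", so the line splits into two words. A real system would presumably derive the threshold from the document itself instead of hard-coding it.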
Designated States: AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PE, PG, PH, PL, PT, RO, RS, RU, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW
African Regional Intellectual Property Organization (ARIPO) (BW, GH, GM, KE, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM, ZW)
Eurasian Patent Organization (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM)
European Patent Office (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, SE, SI, SK, SM, TR)
African Intellectual Property Organization (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG)
Publication Language: English (EN)
Filing Language: English (EN)
Also published as:
EP2374067, JP2012514792, CN102317933, KR1020110112397, KR1020130051017, KR1020130116958