WO2005093599 - METHOD FOR DETERMINING LOGICAL COMPONENTS OF A DOCUMENT

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

[ EN ]

CLAIMS
1. A method for determining logical components of a portable document format (PDF) document, the method comprising:
separating the document into a plurality of layers (202);
creating a PDF document for each layer (204);
determining a logical structure for each layer (206); and
combining the logical structures from each of said layers (208).
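
The four steps of claim 1 can be sketched as a small pipeline. This is only an illustration under stated assumptions: `PageObject`, `LayerDocument`, and all function names are hypothetical, the per-layer "PDF document" is modeled as a plain in-memory container rather than a real PDF file, and the structure analysis is a placeholder that treats each object as one logical component.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PageObject:
    """Hypothetical stand-in for one drawable object in the source PDF."""
    index: int    # position in the original document order
    kind: str     # "text", "image", or "vector"
    payload: str  # placeholder for the object's actual content


@dataclass
class LayerDocument:
    """Hypothetical stand-in for the per-layer PDF document (step 204)."""
    kind: str
    objects: List[PageObject]


Component = Tuple[int, str, str]  # (original index, layer kind, payload)


def separate_layers(objects: List[PageObject]) -> Dict[str, List[PageObject]]:
    """Step 202: split the objects into layers keyed by kind."""
    layers: Dict[str, List[PageObject]] = {}
    for obj in objects:
        layers.setdefault(obj.kind, []).append(obj)
    return layers


def create_layer_documents(layers: Dict[str, List[PageObject]]) -> List[LayerDocument]:
    """Step 204: wrap each layer in its own document container."""
    return [LayerDocument(kind, objs) for kind, objs in layers.items()]


def determine_structure(doc: LayerDocument) -> List[Component]:
    """Step 206 (placeholder): one logical component per object."""
    return [(obj.index, doc.kind, obj.payload) for obj in doc.objects]


def combine_structures(structures: List[List[Component]]) -> List[Component]:
    """Step 208: merge per-layer results, restoring the original order."""
    merged = [component for structure in structures for component in structure]
    return sorted(merged, key=lambda component: component[0])


def determine_logical_components(objects: List[PageObject]) -> List[Component]:
    layers = separate_layers(objects)
    documents = create_layer_documents(layers)
    structures = [determine_structure(doc) for doc in documents]
    return combine_structures(structures)
```

Sorting on the stored index in the final step is one possible way to keep the combined result in the original object order; the claims do not prescribe a particular mechanism.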

2. The method of claim 1 wherein the plurality of layers comprise a text layer, an image layer and a vector graphics layer.

3. The method of claim 2 wherein the text layer includes characters and words, said words having at least one of a horizontal and a vertical orientation and being grouped according to the orientation.

4. The method of claim 3 further comprising:
forming text lines with words having the same attributes, said attributes including font type, size and color.
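
One possible reading of the line-forming step in claim 4 is sketched below. The `Word` record and `form_text_lines` are hypothetical names, and the sketch assumes horizontal words that share an exact baseline coordinate `y` when they sit on the same line; a word whose font type, size, or color differs from its left neighbor starts a new line.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    """Hypothetical word record extracted from the text layer."""
    text: str
    x: float     # left edge of the word
    y: float     # baseline; assumed identical for words on one line
    font: str
    size: float
    color: str

    @property
    def attrs(self) -> Tuple[str, float, str]:
        """The attributes named in the claim: font type, size, color."""
        return (self.font, self.size, self.color)


def form_text_lines(words: List[Word]) -> List[List[Word]]:
    """Group words into lines of same-baseline, same-attribute neighbors."""
    lines: List[List[Word]] = []
    # Visit words baseline by baseline, left to right within a baseline.
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        previous = lines[-1][-1] if lines else None
        if previous is not None and previous.y == word.y and previous.attrs == word.attrs:
            lines[-1].append(word)   # continue the current line
        else:
            lines.append([word])     # baseline or attributes changed
    return lines
```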

5. The method of claim 4 further comprising:
forming text segments with text lines having the same attributes; and
creating a bounding box for each text segment.
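
The segment step of claim 5 might look like the following sketch. `TextLine`, `union`, and `form_segments` are hypothetical names; the sketch assumes each line already carries an axis-aligned box `(x0, y0, x1, y1)` and that consecutive lines with identical attributes belong to one segment, whose bounding box is the union of its line boxes.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class TextLine:
    """Hypothetical text line with its box and shared attributes."""
    text: str
    bbox: BBox
    attrs: Tuple[str, float, str]  # (font type, size, color)


def union(a: BBox, b: BBox) -> BBox:
    """Smallest axis-aligned box that contains both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def form_segments(lines: List[TextLine]) -> List[Tuple[List[TextLine], BBox]]:
    """Merge consecutive same-attribute lines and track each segment's box."""
    segments: List[Tuple[List[TextLine], BBox]] = []
    for line in lines:
        if segments and segments[-1][0][-1].attrs == line.attrs:
            seg_lines, box = segments[-1]
            seg_lines.append(line)
            segments[-1] = (seg_lines, union(box, line.bbox))
        else:
            segments.append(([line], line.bbox))
    return segments
```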

6. The method of claim 2 further comprising:
determining a rectangle for an image layer, said rectangle being of a minimum size to encompass the image.
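
As an illustration of claim 6, the sketch below computes an axis-aligned rectangle that just encloses a placed image. It relies on the PDF convention that an image is painted into the unit square and positioned on the page by a transformation matrix `[a b c d e f]`; the function name `image_bounding_rect` is hypothetical, and treating "minimum size" as the axis-aligned bounding rectangle is an assumption, not something the claim states.

```python
from typing import Sequence, Tuple


def image_bounding_rect(ctm: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (x0, y0, x1, y1) enclosing the unit square mapped through ctm.

    ctm is the six-number PDF matrix [a b c d e f], which maps a point
    (x, y) to (a*x + c*y + e, b*x + d*y + f).
    """
    a, b, c, d, e, f = ctm
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]  # unit square in image space
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))
```

For an unrotated image 200 units wide and 100 units high placed at (50, 60), `image_bounding_rect((200, 0, 0, 100, 50, 60))` gives `(50, 60, 250, 160)`.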

7. The method of claim 2 wherein the vector graphics layer comprises lines, curves and rectangles forming at least one graphic object.

8. The method of claim 1 further comprising:
preserving the order of the layers.

9. A system for determining logical components of a portable document format (PDF) document comprising:
means for separating the document into a plurality of layers;
means for creating a PDF document for each of said layers;
means for determining a logical structure for each layer; and
means for combining the logical structures of said layers.

10. A computer readable medium containing executable instructions which, when executed in a processing system, cause the system to perform a method comprising:
separating a portable document format (PDF) document into a plurality of layers (202);
creating a PDF document for each layer (204);
determining a logical structure for each layer (206); and
combining the logical structures from each of said layers (208).