All documents in the PATENTSCOPE Search Service are available in PDF and as ZIP files (containing bibliographic data in XML format and complete page images in TIFF format). Published PCT international applications are also available in XML and HTML (containing search-quality OCR text of the description and claims), in addition to PDF and ZIP files. Finally, sequence listings and large documents are available as ZIP files.
Full text of PCT international applications: the description and claims in text format are obtained by applying automatic Optical Character Recognition (OCR) procedures to the scanned images of the documents. They therefore contain discrepancies with the originals and have no legal value. The texts feed the PATENTSCOPE Search Service text indexing engine and are provided free of charge as an additional service to the general public by the International Bureau, notably in the HTML renditions displayed in the "Description" and "Claims" tabs of each dossier.
As a consequence, only the PDF versions of the documents, which contain the error-free scanned images, should be used for legal matters.
For information, the average accuracy of the texts of published international applications obtained from the PCT automatic OCR procedures is generally well above 98.5% (i.e. fewer than 40 errors on a page that contains 3000 characters). However, the accuracy may drop significantly for a small percentage of difficult documents published every week. This is usually due to poor-quality paper originals, to pages with complex layouts or fonts, or to words that cannot be found in dictionaries (often in applications containing chemical or mathematical formulae printed in too small a font size).
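The relationship between the accuracy percentage and the number of errors per page can be checked with simple arithmetic. The following is a minimal Python sketch, assuming that accuracy is measured per character (the text does not state how it is measured); the function names are hypothetical and chosen only for illustration. Under that assumption, a 3000-character page at exactly 98.5% would carry 45 errors, so the figure of fewer than 40 errors corresponds to an accuracy of roughly 98.67%, which is consistent with "well above 98.5%".

```python
def errors_per_page(accuracy_percent: float, chars_per_page: int) -> float:
    """Expected number of misrecognized characters on one page.

    Assumes accuracy is a per-character rate (an assumption, not stated
    in the source text).
    """
    return chars_per_page * (100.0 - accuracy_percent) / 100.0


def accuracy_for_errors(errors: int, chars_per_page: int) -> float:
    """Per-character accuracy (in percent) implied by an error count."""
    return 100.0 * (1.0 - errors / chars_per_page)


if __name__ == "__main__":
    # Errors on a 3000-character page at exactly 98.5% accuracy.
    print(errors_per_page(98.5, 3000))
    # Accuracy implied by 40 errors on a 3000-character page.
    print(round(accuracy_for_errors(40, 3000), 2))
```

The same functions can be used to estimate how many errors to expect on a difficult document, for example by calling errors_per_page with a lower accuracy value.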