PATENTSCOPE

Search International and National Patent Collections
World Intellectual Property Organization
1. (WO2000062155) SYSTEM AND METHOD FOR PARSING A DOCUMENT
Latest bibliographic data on file with the International Bureau   

Pub. No.: WO/2000/062155
International Application No.: PCT/US2000/009357
Publication Date: 19.10.2000
International Filing Date: 06.04.2000
Chapter 2 Demand Filed: 08.11.2000
IPC:
G06F 17/27 (2006.01)
Applicants: SEMIO CORPORATION [US/US]; 1730 S. Amphlett Boulevard, Suite 101, San Mateo, CA 94402 (US)
Inventors: VOGEL, Claude; (US)
Agent: LOHSE, Timothy, W.; Gray Cary Ware & Freidenrich LLP, 3340 Hillview Avenue, Palo Alto, CA 94304 (US)
Priority Data:
09/288,994 09.04.1999 US
Title (EN) SYSTEM AND METHOD FOR PARSING A DOCUMENT
(FR) SYSTEME ET PROCEDE SERVANT A ANALYSER UN DOCUMENT
Abstract:
(EN) A parsing system and method are provided in which the break characters in the document are used to rapidly parse the document and extract one or more key phrases from the document which characterize the document (44). The break characters in the document may include explicit break characters (46), such as punctuation, soft stop words and hard stop words. The determination of which phrases in the document are extracted depends upon the type of break character appearing after the phrase in the document (52).
(FR) L'invention concerne un système et un procédé d'analyse consistant à recourir aux caractères de coupure dans un document afin d'analyser rapidement le document et d'en extraire une ou plusieurs phrases clés caractérisant ce document (44). Les caractères de coupure dans le document peuvent comprendre des caractères de coupure explicites (46) tels que la ponctuation, les mots d'arrêt programmé et les mots d'arrêt immédiat. Le choix des phrases à extraire du document dépend du type de caractère de coupure apparaissant après la phrase dans le document (52).
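Illustrative sketch (not part of the published record): the abstract describes splitting a document at break characters and keeping or discarding each phrase according to the type of break that follows it. The Python below is one possible reading of that idea, not the applicant's claimed method. The word lists, the function names `classify` and `extract_phrases`, and the default rule (keep a phrase followed by punctuation or a soft stop word, drop one followed by a hard stop word) are all assumptions made for illustration; the abstract does not state which break types lead to extraction.

```python
import re

# Hypothetical word lists; the abstract does not enumerate them.
SOFT_STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}
HARD_STOP_WORDS = {"are", "however", "is", "was", "were"}
PUNCTUATION = set(".,;:!?()")


def classify(token):
    """Return the break type of a token, or None if it is an ordinary word."""
    if token in PUNCTUATION:
        return "punctuation"
    lowered = token.lower()
    if lowered in HARD_STOP_WORDS:
        return "hard"
    if lowered in SOFT_STOP_WORDS:
        return "soft"
    return None


def extract_phrases(text, keep_after=("punctuation", "soft")):
    """Split text at break tokens and keep phrases by the break that follows.

    keep_after is an assumed policy: the break types after which a phrase
    is retained. A trailing phrase with no break after it is dropped.
    """
    tokens = re.findall(r"[A-Za-z0-9'-]+|[.,;:!?()]", text)
    phrases = []
    current = []
    for token in tokens:
        kind = classify(token)
        if kind is None:
            current.append(token)  # ordinary word: extend the current phrase
            continue
        if current and kind in keep_after:
            phrases.append(" ".join(current))
        current = []  # any break token ends the current phrase
    return phrases


if __name__ == "__main__":
    sample = ("The parsing system is fast, and the break characters "
              "of the document are used.")
    print(extract_phrases(sample))
```

Because the policy is passed in as `keep_after`, the same pass over the tokens can be reused with a different rule, for example `keep_after=("hard",)`, without changing the tokenizer or the word lists.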
Designated States: AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CU, CZ, DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZW.
African Regional Intellectual Property Organization (GH, GM, KE, LS, MW, SD, SL, SZ, TZ, UG, ZW)
Eurasian Patent Organization (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM)
European Patent Office (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE)
African Intellectual Property Organization (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).
Publication Language: English (EN)
Filing Language: English (EN)