1. WO2005069199 - METHODS AND SYSTEMS FOR TEXT SEGMENTATION

Publication Number WO/2005/069199
Publication Date 28.07.2005
International Application No. PCT/US2003/041609
International Filing Date 30.12.2003
IPC
G06K 9/72 2006.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
9 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
62 Methods or arrangements for recognition using electronic means
72 using context analysis based on the provisionally recognised identity of a number of successive patterns, e.g. a word
CPC
G06F 40/284
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
40 Handling natural language data
20 Natural language analysis
279 Recognition of textual entities
284 Lexical analysis, e.g. tokenisation or collocates
Applicants
  • GOOGLE INC. [US]/[US] (AllExceptUS)
  • WEISSMAN, Adam, J. [US]/[US] (UsOnly)
Inventors
  • WEISSMAN, Adam, J.
Agents
  • GARDNER, Steven, J.
Priority Data
Publication Language English (EN)
Filing Language English (EN)
Designated States
Title
(EN) METHODS AND SYSTEMS FOR TEXT SEGMENTATION
(FR) PROCEDES ET SYSTEMES DE SEGMENTATION DE TEXTE
Abstract
(EN)
Methods and systems for text segmentation are disclosed. In one such method and system, a string of characters is accessed (204), a long token is identified (206), contiguous characters in the long token are pinned down (208), tokens from the string of characters are determined by keeping the pinned down contiguous characters together, and a plurality of combinations of tokens are determined (210), wherein the number of combinations of tokens is reduced by the pinned down contiguous characters.
(FR)
Cette invention concerne des procédés et des systèmes de segmentation de texte. Dans un procédé et un système de cette invention, le procédé consiste à accéder à une chaîne de caractères (204), à identifier un long jeton (206), à fixer les caractères contigus dans le long jeton (208), à déterminer les jetons de la chaîne de caractères en maintenant groupés les caractères contigus fixés, et à déterminer une pluralité de combinaisons (210), le nombre de combinaisons de jetons étant réduit par les caractères contigus fixés.
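
The steps named in the abstract (204 to 210) can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the abstract does not say how a long token is identified, so the vocabulary `VOCAB`, the `min_len` threshold, and the function names below are all hypothetical choices made for illustration only.

```python
# Hypothetical vocabulary, assumed only for this illustration.
VOCAB = {"us", "used", "ed", "car", "cars", "son", "on", "line", "online"}


def segmentations(text, vocab):
    """Return every way to split `text` into tokens drawn from `vocab`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocab:
            for rest in segmentations(text[end:], vocab):
                results.append([head] + rest)
    return results


def find_long_token(text, vocab, min_len=5):
    """Return (start, end) of the longest vocabulary token of at least
    `min_len` characters found in `text`, or None (assumed heuristic)."""
    best = None
    for start in range(len(text)):
        for end in range(start + min_len, len(text) + 1):
            longer = best is None or end - start > best[1] - best[0]
            if text[start:end] in vocab and longer:
                best = (start, end)
    return best


def pinned_segmentations(text, vocab, min_len=5):
    """Segment `text` while keeping the long token's characters together."""
    span = find_long_token(text, vocab, min_len)   # identify long token (206)
    if span is None:
        return segmentations(text, vocab)
    start, end = span                              # pin down characters (208)
    lefts = segmentations(text[:start], vocab)
    rights = segmentations(text[end:], vocab)
    # Combine tokens around the pinned span (210).
    return [l + [text[start:end]] + r for l in lefts for r in rights]


text = "usedcarsonline"                            # access the string (204)
all_combos = segmentations(text, VOCAB)
pinned_combos = pinned_segmentations(text, VOCAB)
print(len(all_combos), len(pinned_combos))
```

With this toy vocabulary, pinning "online" rules out splits such as "cars" + "on" + "line" or "car" + "son" + "line", so fewer candidate combinations remain to be evaluated.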