1. WO2013016139 - CONFIGURING WEB CRAWLER TO EXTRACT WEB PAGE INFORMATION

Publication Number WO/2013/016139
Publication Date 31.01.2013
International Application No. PCT/US2012/047426
International Filing Date 19.07.2012
IPC
G06F 17/30 2006.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
17 Digital computing or data processing equipment or methods, specially adapted for specific functions
30 Information retrieval; Database structures therefor
CPC
G06F 16/353
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
30 of unstructured textual data
35 Clustering; Classification
353 into predefined classes
G06F 16/951
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
90 Details of database functions independent of the retrieved data types
95 Retrieval from the web
951 Indexing; Web crawling techniques
G06F 16/954
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
90 Details of database functions independent of the retrieved data types
95 Retrieval from the web
954 Navigation, e.g. using categorised browsing
G06F 16/9566
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
90 Details of database functions independent of the retrieved data types
95 Retrieval from the web
955 using information identifiers, e.g. uniform resource locators [URL]
9566 URL specific, e.g. using aliases, detecting broken or misspelled links
G06F 16/9577
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
90 Details of database functions independent of the retrieved data types
95 Retrieval from the web
957 Browsing optimisation, e.g. caching or content distillation
9577 Optimising the visualization of content, e.g. distillation of HTML documents
G06Q 10/10
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
10 Administration; Management
10 Office automation, e.g. computer aided management of electronic mail or groupware
Applicants
  • ALIBABA GROUP HOLDING LIMITED (AllExceptUS)
  • SUN, Yiming [CN]/[CN] (UsOnly)
  • QIANG, Qi [CN]/[CN] (UsOnly)
  • CAI, Boyang [CN]/[CN] (UsOnly)
  • JIN, Xiaojun [CN]/[CN] (UsOnly)
  • WU, Zongyuan [CN]/[CN] (UsOnly)
Inventors
  • SUN, Yiming
  • QIANG, Qi
  • CAI, Boyang
  • JIN, Xiaojun
  • WU, Zongyuan
Agents
  • FU, Diana, Y.
Priority Data
13/552,374 18.07.2012 US
201110207897.1 22.07.2011 CN
Publication Language English (EN)
Filing Language English (EN)
Designated States
Title
(EN) CONFIGURING WEB CRAWLER TO EXTRACT WEB PAGE INFORMATION
(FR) CONFIGURATION DE ROBOT WEB POUR EXTRAIRE DES INFORMATIONS DE PAGE WEB
Abstract
(EN)
Web crawling configuration includes: obtaining, using one or more computer processors, a webpage comprising a plurality of nodes; presenting the webpage to a user; receiving a user selection of a node in the webpage, the node comprising at least one element; in response to the user selection of the node, presenting a web crawling configuration option pertaining to a web crawling action to be performed with respect to the node, the web crawling configuration option depending at least in part on a type of an element included in the node; receiving a user input specifying the web crawling configuration options pertaining to the web crawling action to be performed with respect to the node; and storing user specified web crawling configuration options, performing the web crawling action on the node according to the user input, or both.
(FR)
L'invention concerne la configuration d'un robot Web qui consiste : à obtenir, à l'aide d'un ou de plusieurs processeurs d'ordinateur, une page Web comportant une pluralité de nœuds ; à présenter la page Web à un utilisateur ; à recevoir la sélection d'utilisateur d'un nœud dans la page Web, le nœud comportant au moins un élément ; en réponse à la sélection d'utilisateur du nœud, à présenter une option de configuration de robot Web concernant une action de robot Web devant être effectuée relativement au nœud, l'option de configuration de robot Web dépendant, au moins en partie, du type d'un élément inclus dans le nœud ; à recevoir une entrée d'utilisateur spécifiant les options de configuration de robot Web concernant l'action de robot Web devant être effectuée relativement au nœud ; à stocker des options de configuration de robot Web spécifiées par un utilisateur, à effectuer l'action de robot Web sur le nœud selon l'entrée d'utilisateur, ou les deux.
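
The flow described in the abstract (obtain a page made of nodes, let a user pick a node, offer configuration options that depend on the element type, then store the chosen options) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the application: the `Node` record, the `OPTIONS_BY_TAG` mapping, the option names, and the in-memory `store` dictionary are all hypothetical, and the user interface is reduced to plain function calls.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical mapping from element type to the crawl options offered for it.
# The option names are illustrative only; the abstract does not list them.
OPTIONS_BY_TAG = {
    "a": ["extract_text", "extract_href", "follow_link"],
    "img": ["extract_src", "download_image"],
}
DEFAULT_OPTIONS = ["extract_text"]


@dataclass
class Node:
    index: int   # position of the node in document order
    tag: str     # element type, e.g. "a" or "img"
    attrs: dict  # attributes of the element


class NodeCollector(HTMLParser):
    """Collects every start tag of a page as a Node, in document order."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append(Node(len(self.nodes), tag, dict(attrs)))


def parse_nodes(html):
    """Turn an HTML string into a flat list of nodes."""
    collector = NodeCollector()
    collector.feed(html)
    return collector.nodes


def options_for(node):
    """Return the configuration options to present for the selected node."""
    return OPTIONS_BY_TAG.get(node.tag, DEFAULT_OPTIONS)


def configure(node, chosen, store):
    """Validate the user's chosen options and store them for the node."""
    allowed = options_for(node)
    invalid = [option for option in chosen if option not in allowed]
    if invalid:
        raise ValueError(f"options not available for <{node.tag}>: {invalid}")
    store[node.index] = {"tag": node.tag, "actions": list(chosen)}
    return store[node.index]


if __name__ == "__main__":
    page = '<div><a href="/next">Next</a><img src="p.png"></div>'
    nodes = parse_nodes(page)
    selected = nodes[1]                 # stands in for the user's selection
    print(options_for(selected))        # options offered for that element type
    config = {}
    configure(selected, ["follow_link"], config)
    print(config)
```

In this sketch a selection is simulated by indexing into the node list; a real tool would presumably render the page and map a click back to a node, which is outside the scope of the example.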