1. WO2020141462 - SYSTEM AND METHOD FOR A WEB SCRAPING TOOL AND CLASSIFICATION ENGINE

Publication Number WO/2020/141462
Publication Date 09.07.2020
International Application No. PCT/IB2020/000025
International Filing Date 02.01.2020
IPC
G06K 9/62 2006.1
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
9 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
62 Methods or arrangements for recognition using electronic means
CPC
G06F 16/55
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
50 of still image data
55 Clustering; Classification
G06F 16/951
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
90 Details of database functions independent of the retrieved data types
95 Retrieval from the web
951 Indexing; Web crawling techniques
G06F 16/957
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
90 Details of database functions independent of the retrieved data types
95 Retrieval from the web
957 Browsing optimisation, e.g. caching or content distillation
G06F 16/9577
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
90 Details of database functions independent of the retrieved data types
95 Retrieval from the web
957 Browsing optimisation, e.g. caching or content distillation
9577 Optimising the visualization of content, e.g. distillation of HTML documents
G06K 9/6273
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
9 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
62 Methods or arrangements for recognition using electronic means
6267 Classification techniques
6268 relating to the classification paradigm, e.g. parametric or non-parametric approaches
627 based on distances between the pattern to be recognised and training or reference patterns
6271 based on distances to prototypes
6272 based on distances to cluster centroïds
6273 Smoothing the distance, e.g. Radial Basis Function Networks
G06K 9/6293
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
9 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
62 Methods or arrangements for recognition using electronic means
6288 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
6292 of classification results, e.g. of classification results related to same input data
6293 of classification results relating to different input data, e.g. multimodal recognition
Applicants
  • ZYTE GROUP LIMITED [IE]/[IE]
Inventors
  • KOROBOV, Mikhail
  • LOPUKHIN, Konstantin
Priority Data
16/279,504 19.02.2019 US
62/787,642 02.01.2019 US
Publication Language English (EN)
Filing Language English (EN)
Designated States
Title
(EN) SYSTEM AND METHOD FOR A WEB SCRAPING TOOL AND CLASSIFICATION ENGINE
(FR) SYSTÈME ET PROCÉDÉ POUR UN OUTIL D'AMÉNAGEMENT DU WEB ET MOTEUR DE CLASSIFICATION
Abstract
(EN)
A web scraping system configured with artificial intelligence and image object detection. The system processes a web page with a neural network to perform object detection to obtain structured data, including text, image and other kinds of data, from web pages. The neural network allows the system to efficiently process visual information (including screenshots), text content and HTML structure to achieve good quality and decrease extraction time.
(FR)
L'invention concerne un système d'aménagement du Web configuré avec une intelligence artificielle et une détection d'objet d'image. Le système traite une page Web avec un réseau neuronal pour effectuer une détection d'objet de sorte à obtenir des données structurées, comprenant du texte, une image et d'autres types de données, à partir de pages Web. Le réseau neuronal permet au système de traiter efficacement des informations visuelles (y compris des captures d'écran), un contenu textuel et une structure HTML pour obtenir une bonne qualité et réduire le temps d'extraction.
Related patent documents
EP2020709697 - This application is not viewable in PATENTSCOPE because the national phase entry has not been published yet or the national entry is issued from a country that does not share data with WIPO or there is a formatting issue or an unavailability of the application.
Latest bibliographic data on file with the International Bureau