
1. (WO2016058267) CHINESE WEBSITE CLASSIFICATION METHOD AND SYSTEM BASED ON CHARACTERISTIC ANALYSIS OF WEBSITE HOMEPAGE

Pub. No.: WO/2016/058267
International Application No.: PCT/CN2014/094220
Publication Date: Fri Apr 22 01:59:59 CEST 2016
International Filing Date: Fri Dec 19 00:59:59 CET 2014
IPC: G06F 17/30
Applicants: SURFILTER NETWORK TECHNOLOGY CO., LTD (任子行网络技术股份有限公司)
SOUTH CHINA UNIVERSITY OF TECHNOLOGY (华南理工大学)
Inventors: TANG, Xinmin (唐新民)
SHEN, Zhijie (沈智杰)
JING, Xiaojun (景晓军)
CAI, Yi (蔡毅)
CAI, Zhiwei (蔡志威)
Title: CHINESE WEBSITE CLASSIFICATION METHOD AND SYSTEM BASED ON CHARACTERISTIC ANALYSIS OF WEBSITE HOMEPAGE
Abstract:
Disclosed are a method and a system for classifying Chinese websites based on characteristic analysis of the website homepage. The method comprises the following steps: S1, crawling website content; S2, labeling the website type; S3, extracting website information; S4, calculating weights and expressing them as a characteristic vector; and S5, classifying the website by comparing characteristic vectors. Because only the title and meta-information of the website are extracted, noise interference is reduced as far as possible. Pre-processing and the characteristic vector representation allow the characteristics of the website to be expressed accurately as a vector, which increases classification accuracy. And because only the title and meta-information need to be processed, the quantity of data is small and processing is fast.
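
As a rough illustration of steps S3 to S5, the following Python sketch extracts only the title and meta-information from a homepage, weights the resulting terms, and assigns the label of the most similar labeled page. The abstract does not state the weighting scheme, the tokenization, or the comparison measure, so TF-IDF weights, character-bigram tokens, and cosine similarity are assumptions made here for illustration, and all function names are hypothetical. Crawling (S1) is omitted, and the labeled pages (S2) are passed in as (label, HTML) pairs.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collects the <title> text and the keywords/description meta content."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").lower() in ("keywords", "description"):
                self.parts.append(a.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_text(html):
    """S3: keep only title and meta-information; body text is ignored."""
    parser = HeadExtractor()
    parser.feed(html)
    return " ".join(parser.parts)


def tokenize(text):
    """Assumed pre-processing: ASCII words plus Chinese character bigrams."""
    tokens = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        if len(run) == 1:
            tokens.append(run)
        tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def build_idf(token_lists):
    """Smoothed inverse document frequency over the labeled pages."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf):
    """S4: sparse TF-IDF vector (assumed weighting); unseen terms are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(labeled_pages):
    """Build vectors for labeled (label, html) pairs; labels come from S2."""
    token_lists = [tokenize(extract_text(html)) for _, html in labeled_pages]
    idf = build_idf(token_lists)
    vectors = [(label, vectorize(tokens, idf))
               for (label, _), tokens in zip(labeled_pages, token_lists)]
    return vectors, idf


def classify(html, vectors, idf):
    """S5: return the label of the most similar labeled vector."""
    query = vectorize(tokenize(extract_text(html)), idf)
    return max(vectors, key=lambda lv: cosine(query, lv[1]))[0]
```

Calling `train` on a handful of labeled homepages and then `classify` on a new page returns the label of the nearest labeled page. A production system would likely use a proper Chinese word segmenter and compare against per-category vectors rather than single pages; both are left out to keep the sketch dependency-free.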