|1. (WO2016058267) CHINESE WEBSITE CLASSIFICATION METHOD AND SYSTEM BASED ON CHARACTERISTIC ANALYSIS OF WEBSITE HOMEPAGE|
|Applicants:||SURFILTER NETWORK TECHNOLOGY CO., LTD; SOUTH CHINA UNIVERSITY OF TECHNOLOGY|
|Title:||CHINESE WEBSITE CLASSIFICATION METHOD AND SYSTEM BASED ON CHARACTERISTIC ANALYSIS OF WEBSITE HOMEPAGE|
Disclosed are a Chinese website classification method and system based on characteristic analysis of a website homepage. The method comprises the following steps: S1, crawling website content; S2, labeling the website type; S3, extracting website information; S4, calculating weights and expressing them as a characteristic vector; and S5, classifying the website by comparing characteristic vectors. Because only the title and meta-information of the website are extracted, noise interference is reduced as far as possible; through pre-processing and characteristic vector expression, the characteristics of the website are accurately expressed as a vector, which increases classification accuracy; and since only the title and meta-information need to be processed, the quantity of data to be processed is small and the processing speed is high.
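Steps S2 to S5 can be illustrated with a minimal Python sketch. The abstract does not disclose the weighting formula, the word segmentation, or the comparison measure, so everything below is an assumption made for illustration: raw term frequency serves as the weight, character bigrams stand in for Chinese word segmentation, and cosine similarity against per-class summed vectors stands in for "comparing the characteristic vector". The names `HeadExtractor`, `extract_head_text`, `tokenize`, `train`, and `classify` are hypothetical and do not come from the patent. Crawling (S1) is omitted; the homepage HTML is assumed to be already available as a string.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collects only <title> text and keywords/description meta content (S3)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.parts.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_head_text(html):
    """Return title and meta text; body content is ignored."""
    parser = HeadExtractor()
    parser.feed(html)
    return " ".join(parser.parts)


def tokenize(text):
    """Assumed pre-processing: ASCII words plus CJK character bigrams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        if len(run) == 1:
            tokens.append(run)
        tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def train(labeled_pages):
    """Build one summed term-frequency vector per labeled website type (S2, S4)."""
    centroids = {}
    for label, html in labeled_pages:
        centroids.setdefault(label, Counter()).update(tokenize(extract_head_text(html)))
    return centroids


def classify(html, centroids):
    """Assign the type whose vector is most similar to the page's vector (S5)."""
    vec = Counter(tokenize(extract_head_text(html)))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

With a handful of labeled homepages passed to `train`, calling `classify(html, centroids)` on a new homepage returns the best-matching label. Because `extract_head_text` never reads the page body, navigation text and advertisements in the body cannot influence the result, which mirrors the noise-reduction argument in the abstract.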