1. WO2020111395 - DEVICE AND METHOD FOR TERM CLUSTERING OF UNSTRUCTURED TEXT DATA FOR BIG DATA ANALYSIS

Publication Number WO/2020/111395
Publication Date 04.06.2020
International Application No. PCT/KR2019/002778
International Filing Date 11.03.2019
IPC
G06F 16/35 2019.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
30 of unstructured textual data
35 Clustering; Classification
G06F 17/27 2006.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
17 Digital computing or data processing equipment or methods, specially adapted for specific functions
20 Handling natural language data
27 Automatic analysis, e.g. parsing, orthograph correction
G06N 5/04 2006.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
5 Computer systems using knowledge-based models
04 Inference methods or devices
CPC
G06F 16/35
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
30 of unstructured textual data
35 Clustering; Classification
G06N 5/048
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
5 Computer systems using knowledge-based models
04 Inference methods or devices
048 Fuzzy inferencing
Applicants
  • (주) 위세아이텍 WISEITECH CO., LTD. [KR]/[KR]
Inventors
  • 황덕열 HWANG, Deok Yeol
  • 공성원 KONG, Seong Won
  • 김세경 KIM, Se Kyung
Agents
  • 유민규 YOU, Min Gyu
Priority Data
10-2018-0147335 26.11.2018 KR
Publication Language Korean (KO)
Filing Language Korean (KO)
Title
(EN) DEVICE AND METHOD FOR TERM CLUSTERING OF UNSTRUCTURED TEXT DATA FOR BIG DATA ANALYSIS
(FR) DISPOSITIF ET PROCÉDÉ DE REGROUPEMENT DE TERMES DE DONNÉES DE TEXTE NON STRUCTURÉES POUR L'ANALYSE DE MÉGADONNÉES
(KO) 빅데이터 분석을 위한 비정형 텍스트 데이터의 용어 군집화 장치 및 방법
Abstract
(EN)
The present invention relates to a device for term clustering of unstructured text data for big data analysis. The device may comprise: a database including a data set; a data preprocessing unit that selects data from the data set and preprocesses it; a recommended term determination unit that separates the original terms in the preprocessed data into morphemes and calculates a recommendation score for each original term; and a data clustering unit that separates the original terms into phonemes, calculates the similarity between the phoneme-separated original terms, and clusters original terms whose similarity value is equal to or greater than a preconfigured threshold. The recommended term determination unit determines a recommended term from among the plurality of original terms on the basis of the recommendation scores and the clustering result.
(FR)
La présente invention concerne un dispositif de regroupement de termes de données de texte non structurées pour l'analyse de mégadonnées, le dispositif de regroupement de termes de données de texte non structurées pour l'analyse de mégadonnées comprenant : une base de données comprenant un ensemble de données ; une unité de prétraitement de données pour sélectionner et prétraiter des données provenant d'un ensemble de données compris dans la base de données ; une unité de détermination de terme recommandé pour séparer des morphèmes de termes d'origine inclus dans des données prétraitées, et calculer des scores recommandés pour les termes d'origine ; et une unité de regroupement de données pour séparer des phonèmes de termes d'origine, calculer la similarité entre les termes d'origine respectifs séparés en phonèmes, et regrouper un terme d'origine présentant une valeur de calcul de similarité supérieure ou égale à une valeur seuil prédéfinie, l'unité de détermination de terme recommandé déterminant un terme recommandé parmi la pluralité des termes d'origine sur la base des scores recommandés et d'un résultat du regroupement.
(KO)
빅데이터 분석을 위한 비정형 텍스트 데이터의 용어 군집화 장치에 관한 것이며, 빅데이터 분석을 위한 비정형 텍스트 데이터의 용어 군집화 장치는 데이터 셋을 포함하는 데이터 베이스, 상기 데이터 베이스에 포함된 데이터 셋에서 데이터를 선택하고 전처리를 수행하는 데이터 전처리부, 전처리된 데이터에 포함된 원본 용어의 형태소를 분리하고, 원본 용어의 추천 점수를 계산하는 추천 용어 결정부 및 원본 용어의 음운을 분리하고, 음운으로 분리된 각각의 원본 용어 간의 유사도 연산을 수행하고, 상기 유사도 연산 값이 미리 설정된 임계치 이상인 원본 용어를 군집화하는 데이터 군집부를 포함하되, 상기 추천 용어 결정부는, 상기 추천 점수 및 상기 군집화 결과에 기반하여 복수의 원본 용어 중 추천 용어를 결정할 수 있다.
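
The clustering and recommendation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation. Several choices are assumptions: phoneme separation is approximated by Unicode NFD decomposition of Hangul syllables into jamo, similarity is the ratio from difflib.SequenceMatcher (the abstract does not name a metric), terms are grouped by single-link merging, and the recommendation scores are passed in as a plain dictionary because the abstract does not say how they are computed. The function names to_phonemes, similarity, cluster_terms, and recommend are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def to_phonemes(term):
    # Assumption: NFD splits each precomposed Hangul syllable into its
    # initial, medial, and (optional) final jamo.
    return unicodedata.normalize("NFD", term)


def similarity(a, b):
    # Assumption: SequenceMatcher ratio stands in for the unspecified metric.
    return SequenceMatcher(None, to_phonemes(a), to_phonemes(b)).ratio()


def cluster_terms(terms, threshold):
    # Single-link grouping with union-find: two terms share a cluster if
    # their similarity is equal to or greater than the threshold.
    parent = list(range(len(terms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):
            if similarity(terms[i], terms[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, term in enumerate(terms):
        groups.setdefault(find(i), []).append(term)
    return list(groups.values())


def recommend(cluster, scores):
    # Pick the term with the highest (externally supplied) score.
    return max(cluster, key=lambda t: scores.get(t, 0.0))


if __name__ == "__main__":
    terms = ["데이터베이스", "데이타베이스", "클러스터"]
    scores = {"데이터베이스": 5.0, "데이타베이스": 2.0, "클러스터": 1.0}  # made-up values
    for cluster in cluster_terms(terms, threshold=0.8):
        print(cluster, "->", recommend(cluster, scores))
```

Comparing at the jamo level means two spellings that differ in a single vowel, such as 데이터베이스 and 데이타베이스, differ in only one of twelve symbols rather than one of six syllables, which is one plausible reason to separate phonemes before measuring similarity.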