1. WO2020204364 - METHOD AND DEVICE FOR WORD EMBEDDING ON BASIS OF CONTEXT INFORMATION AND MORPHOLOGICAL INFORMATION OF WORD

Publication Number WO/2020/204364
Publication Date 08.10.2020
International Application No. PCT/KR2020/003000
International Filing Date 03.03.2020
IPC
G06F 40/205 (2020.01)
  G PHYSICS
  06 COMPUTING; CALCULATING OR COUNTING
  F ELECTRIC DIGITAL DATA PROCESSING
  40 Handling natural language data
  20 Natural language analysis
  205 Parsing
G06F 40/289 (2020.01)
  G PHYSICS
  06 COMPUTING; CALCULATING OR COUNTING
  F ELECTRIC DIGITAL DATA PROCESSING
  40 Handling natural language data
  20 Natural language analysis
  279 Recognition of textual entities
  289 Phrasal analysis, e.g. finite state techniques or chunking
G06N 20/00 (2019.01)
  G PHYSICS
  06 COMPUTING; CALCULATING OR COUNTING
  N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
  20 Machine learning
Applicants
  • 성균관대학교 산학협력단 RESEARCH & BUSINESS FOUNDATION SUNGKYUNKWAN UNIVERSITY [KR]/[KR]
Inventors
  • 원민섭 WON, Min Sub
  • 이지형 LEE, Jee Hyong
  • 이상헌 LEE, Sang Heon
  • 신윤섭 SHIN, Yun Seob
  • 정동언 JEONG, Dong Eon
Agents
  • 인비전 특허법인 ENVISION PATENT & LAW FIRM
Priority Data
10-2019-0038587 02.04.2019 KR
Publication Language Korean (KO)
Filing Language Korean (KO)
Designated States
Title
(EN) METHOD AND DEVICE FOR WORD EMBEDDING ON BASIS OF CONTEXT INFORMATION AND MORPHOLOGICAL INFORMATION OF WORD
(FR) PROCÉDÉ ET DISPOSITIF DE PLONGEMENT LEXICAL SUR LA BASE D'INFORMATIONS CONTEXTUELLES ET D'INFORMATIONS MORPHOLOGIQUES D'UN MOT
(KO) 단어의 문맥 정보와 형태론적 정보를 고려한 단어 임베딩 방법 및 장치
Abstract
(EN)
The present invention relates to a method and device for word embedding that take into account both the context information and the morphological information of a word. A word embedding method according to one embodiment of the present invention comprises the steps of: processing a sentence to be learned by replacing each out-of-vocabulary (OOV) word in the sentence with an unknown token; inputting the characters of a target word, that is, a word in the processed sentence other than an OOV word, into a context character model to be trained; combining the surrounding context vectors of the words surrounding the target word in the sentence and setting the result as the initial state of the context character model; and training the context character model so as to minimize the error between a predicted embedding of the target word and the real embedding of the target word, the predicted embedding being generated by concatenating a forward hidden state and a backward hidden state calculated by the context character model.
(FR)
La présente invention concerne un procédé et un dispositif de plongement lexical sur la base d'informations contextuelles et d'informations morphologiques d'un mot. Un procédé de plongement lexical selon un mode de réalisation de la présente invention comprend les étapes consistant à : traiter une phrase par remplacement d'un mot hors vocabulaire (HV) dans la phrase à apprendre par un jeton inconnu ; entrer des caractères d'un mot cible excluant le mot hors vocabulaire dans la phrase traitée comme entrée d'un modèle de caractère de contexte à apprendre ; combiner des vecteurs de contexte environnants pour des mots environnants du mot cible dans la phrase de façon à définir le modèle de caractère de contexte comme état initial ; et apprendre le modèle de caractère de contexte de telle sorte qu'une erreur puisse être minimisée entre le plongement prédit et le plongement réel du mot cible, le plongement prédit étant généré par connexion d'un état caché avant et d'un état caché arrière calculés à partir du modèle de caractère de contexte.
(KO)
본 발명은 단어의 문맥 정보와 형태론적 정보를 고려한 단어 임베딩 방법 및 장치에 관한 것으로, 본 발명의 일 실시예에 따른 단어 임베딩 방법은, 학습시킬 문장에서 미등록 단어(OOV: Out Of Vocabulary)를 미지의 토큰(unknown token)으로 대체하여 문장을 가공하는 단계, 상기 가공된 문장에서 상기 미등록 단어를 제외한 타겟 단어의 문자(Character)를 학습 대상인 문맥 문자 모델(Context Character Model)의 입력으로 입력하는 단계, 상기 문장에서 타겟 단어의 주변 단어에 대한 주변 문맥 벡터를 조합하여 상기 문맥 문자 모델의 초기 상태로 설정하는 단계; 및 상기 문맥 문자 모델로부터 산출된 순방향 은닉 상태(Forward hidden state) 및 역방향 은닉 상태(Backward hidden state)를 연결하여 생성된 상기 타겟 단어의 예측 임베딩(Predicted embedding)과 상기 타겟 단어의 실제 임베딩(Real embedding) 간의 오류가 최소가 되도록, 상기 문맥 문자 모델을 학습하는 단계를 포함한다.
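
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the applicants' implementation, and the abstract leaves most details open, so the following are all assumptions made for illustration: the `<unk>` spelling of the unknown token, averaging as the way surrounding context vectors are "combined", a window of two words on each side, a linear projection (`w_init`) from the combined context vector to the hidden size, a plain tanh recurrent cell, and squared error as the error measure. Only the forward pass and the error are shown; a real training loop would minimize the error with gradient descent.

```python
import math
import random

UNK = "<unk>"  # assumed spelling of the unknown token

def replace_oov(tokens, vocab):
    """Step 1: replace every out-of-vocabulary word with the unknown token."""
    return [t if t in vocab else UNK for t in tokens]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def combine_context(tokens, idx, emb, window=2):
    """Step 3 (assumption: 'combining' = averaging the neighbours' vectors)."""
    dim = len(next(iter(emb.values())))
    lo, hi = max(0, idx - window), min(len(tokens), idx + window + 1)
    near = [emb[tokens[j]] for j in range(lo, hi) if j != idx]
    if not near:
        return [0.0] * dim
    return [sum(v[k] for v in near) / len(near) for k in range(dim)]

def run_rnn(chars, h0, char_emb, w_in, w_rec):
    """Run a tanh recurrent cell over the characters, starting from h0."""
    h = h0
    for c in chars:
        x = matvec(w_in, char_emb[c])
        r = matvec(w_rec, h)
        h = [math.tanh(a + b) for a, b in zip(x, r)]
    return h

def predict_embedding(word, context_vec, p):
    """Steps 2 and 4: characters in, concatenated final hidden states out."""
    # Assumed projection from word-vector size to hidden size.
    h0 = [math.tanh(z) for z in matvec(p["w_init"], context_vec)]
    fwd = run_rnn(word, h0, p["char_emb"], p["w_in_f"], p["w_rec_f"])
    bwd = run_rnn(reversed(word), h0, p["char_emb"], p["w_in_b"], p["w_rec_b"])
    return fwd + bwd  # list concatenation: forward state, then backward state

def squared_error(pred, real):
    return sum((a - b) ** 2 for a, b in zip(pred, real))

# Toy setup: hidden size H per direction, word vectors of size D = 2 * H.
rng = random.Random(0)
H, D, C = 2, 4, 3
vocab = {"the", "cat", "sat", UNK}
emb = {w: [rng.uniform(-1, 1) for _ in range(D)] for w in sorted(vocab)}
params = {
    "char_emb": {c: [rng.uniform(-1, 1) for _ in range(C)]
                 for c in "abcdefghijklmnopqrstuvwxyz"},
    "w_init": rand_matrix(H, D, rng),
    "w_in_f": rand_matrix(H, C, rng), "w_rec_f": rand_matrix(H, H, rng),
    "w_in_b": rand_matrix(H, C, rng), "w_rec_b": rand_matrix(H, H, rng),
}

sentence = replace_oov("the cat sat on mat".split(), vocab)
# Only in-vocabulary words serve as training targets.
targets = [i for i, t in enumerate(sentence) if t != UNK]
losses = []
for i in targets:
    ctx = combine_context(sentence, i, emb)
    pred = predict_embedding(sentence[i], ctx, params)
    losses.append(squared_error(pred, emb[sentence[i]]))
```

Because the predicted embedding is the concatenation of two hidden states, the sketch fixes the word-vector size at twice the per-direction hidden size. Once trained, such a model could produce a vector for an unseen word from its characters and its neighbours, which is the use case the OOV handling in the abstract points toward.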