PATENTSCOPE

Search International and National Patent Collections
World Intellectual Property Organization
1. (WO2016058138) CONSTRUCTION OF LEXICON FOR SELECTED CONTEXT
Latest bibliographic data on file with the International Bureau   

Pub. No.: WO/2016/058138
International Application No.: PCT/CN2014/088619
Publication Date: 21.04.2016
International Filing Date: 15.10.2014
IPC: G06F 17/30 (2006.01)
G: PHYSICS
G06: COMPUTING; CALCULATING; COUNTING
G06F: ELECTRIC DIGITAL DATA PROCESSING
G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
G06F 17/30: Information retrieval; Database structures therefor
Applicants: CHUANG, Darren [CN/CN]; CN (US)
LI, Jingmei [US/CN]; CN (US)
LIU, Zhen [FR/US]; US (US)
MAK, Chiu Chun Bobby [CN/CN]; CN (US)
MICROSOFT TECHNOLOGY LICENSING, LLC [US/US]; One Microsoft Way Redmond, WA 98052, US
Inventors: CHUANG, Darren; CN
LI, Jingmei; CN
LIU, Zhen; US
MAK, Chiu Chun Bobby; CN
Agent: SHANGHAI PATENT & TRADEMARK LAW OFFICE, LLC; 435 Guiping Road Shanghai 200233, CN
Priority Data:
Title (EN) CONSTRUCTION OF LEXICON FOR SELECTED CONTEXT
(FR) CONSTRUCTION D'UN LEXIQUE POUR UN CONTEXTE SÉLECTIONNÉ
Abstract:
(EN) Various technologies pertaining to constructing a lexicon for a defined context are set forth herein. Social media text is acquired, where the social media text has contextual data that corresponds thereto. The social media text is encoded to form encoded text (in Unicode), and the contextual data is assigned to the encoded text. A text corpus for a defined context is formed by filtering the encoded text based upon contextual data, such as location. Frequency of occurrence of words or phrases in the text corpus is used to identify words or phrases that are to be included in the lexicon.
(FR) L'invention porte sur diverses technologies relatives à la construction d'un lexique pour un contexte défini. Un texte de média social est acquis, le texte de média social comportant des données contextuelles qui lui correspondent. Le texte de média social est codé pour former un texte codé (en Unicode), et les données contextuelles sont affectées au texte codé. Un corpus de texte pour un contexte défini est formé en filtrant le texte codé sur la base de données contextuelles, telles que l'emplacement. La fréquence d'occurrence de mots ou de phrases dans le corpus de texte est utilisée pour identifier des mots ou des phrases qui sont à inclure dans le lexique.
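The steps named in the abstract (acquire text with contextual data, encode it as Unicode, filter by context, count frequencies) can be illustrated with a minimal Python sketch. This is not the method claimed in the application; it is only one possible reading of the abstract. The `Post` type, the function names, whitespace tokenisation, NFC normalisation, and the `min_count` threshold are all assumptions introduced here for illustration.

```python
import unicodedata
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    """One piece of social media text plus its contextual data (hypothetical shape)."""
    raw: bytes      # text as acquired, in its source encoding
    encoding: str   # source encoding name, e.g. "utf-8" or "gb2312"
    location: str   # contextual data; the abstract names location as one example


def encode_to_unicode(post: Post) -> tuple[str, str]:
    """Decode to a Unicode string, normalise it, and keep the contextual data attached."""
    text = post.raw.decode(post.encoding, errors="replace")
    return unicodedata.normalize("NFC", text), post.location


def build_lexicon(posts, location, min_count=2):
    """Filter encoded text by location, then keep words seen at least min_count times."""
    encoded = [encode_to_unicode(p) for p in posts]
    # Form the text corpus for the selected context (assumed: exact location match).
    corpus = [text for text, loc in encoded if loc == location]
    # Assumed tokenisation: lowercase and split on whitespace.
    counts = Counter(word for text in corpus for word in text.lower().split())
    return {word for word, n in counts.items() if n >= min_count}


posts = [
    Post("lah this is good lah".encode("utf-8"), "utf-8", "Singapore"),
    Post("so shiok lah".encode("utf-8"), "utf-8", "Singapore"),
    Post("this is wicked good".encode("utf-8"), "utf-8", "Boston"),
]
lexicon = build_lexicon(posts, "Singapore")
```

A raw count threshold is the simplest frequency criterion. The abstract does not say how frequency is used to select entries, so a real system might instead compare frequencies against a general-purpose corpus.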
Designated States: AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JP, KE, KG, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW
African Regional Intellectual Property Organization (ARIPO) (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW)
Eurasian Patent Office (AM, AZ, BY, KG, KZ, RU, TJ, TM)
European Patent Office (EPO) (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR)
African Intellectual Property Organization (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG)
Publication Language: English (EN)
Filing Language: English (EN)