1. US20180165272 - Automatic locale determination for electronic documents

Office United States of America
Application Number 15858980
Application Date 29.12.2017
Publication Number 20180165272
Publication Date 14.06.2018
Grant Number 10346538
Grant Date 09.07.2019
Publication Kind B2
IPC
G06F 16/81
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
80 of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
G06F 17/22
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
17 Digital computing or data processing equipment or methods, specially adapted for specific functions
20 Handling natural language data
21 Text processing
22 Manipulating or registering by use of codes, e.g. in sequence of text characters
G06F 17/27
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
17 Digital computing or data processing equipment or methods, specially adapted for specific functions
20 Handling natural language data
27 Automatic analysis, e.g. parsing, orthograph correction
G06F 16/182
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
10 File systems; File servers
18 File system types
182 Distributed file systems
G06F 16/9535
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
90 Details of database functions independent of the retrieved data types
95 Retrieval from the web
953 Querying, e.g. by the use of web search engines
9535 Search customisation based on user profiles and personalisation
CPC
G06F 16/182
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
10 File systems; File servers
18 File system types
182 Distributed file systems
G06F 17/275
G06F 16/81
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
80 of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
G06F 16/9535
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
F ELECTRIC DIGITAL DATA PROCESSING
16 Information retrieval; Database structures therefor; File system structures therefor
90 Details of database functions independent of the retrieved data types
95 Retrieval from the web
953 Querying, e.g. by the use of web search engines
9535 Search customisation based on user profiles and personalisation
G06F 17/2247
G06F 17/2252
Applicants Coupa Software Incorporated
Inventors Matthew Pasquini
Agents Hickman Palermo Becker Bingham LLP
Title
(EN) Automatic locale determination for electronic documents
Abstract
(EN)

Automatic locale determination for documents is described. In an embodiment, a computer server receives an electronic document comprising a plurality of unknown-language data elements, each associated with one or more types. Based on a document schema of the document, the computer system selects one or more unknown-language data elements from the plurality of unknown-language data elements and assigns to each of the one or more unknown-language data elements a corresponding weight value based on a respective type of the unknown-language data element. The computer system compares the one or more unknown-language data elements with a plurality of known-language data elements that are associated with the document schema and, based on the comparing, determines a number of unknown-language data elements in the one or more unknown-language data elements that matched any in a subset of the plurality of known-language data elements, wherein the subset of known-language data elements corresponds to a particular language. Based on the number of data elements that matched the subset of known-language data elements and on the corresponding weight assigned to each unknown-language data element in the number of unknown-language data elements, the computer system determines a language confidence level value specifying a level of machine confidence that the document is expressed in the particular language and, based on the language confidence level value for the particular language exceeding a language threshold value, automatically processes the document using the particular language.
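
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The element types, the weight table `TYPE_WEIGHTS`, the known-term sets in `KNOWN`, the 0.6 threshold, and the choice to normalize the matched weight by the total selected weight are all assumptions made for illustration; the abstract states only that the confidence depends on the number of matched elements and their weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One unknown-language data element taken from a document."""
    type: str
    text: str


# Hypothetical weights per element type (assumed, not from the patent).
TYPE_WEIGHTS = {"label": 2.0, "header": 1.5, "value": 0.5}

# Hypothetical known-language elements for one document schema (assumed).
KNOWN = {
    "en": {"invoice", "total", "date"},
    "de": {"rechnung", "gesamt", "datum"},
}


def language_confidence(elements, known_terms, weights=TYPE_WEIGHTS):
    """Return matched weight divided by total weight (assumed normalization)."""
    total = sum(weights.get(e.type, 0.0) for e in elements)
    if total == 0:
        return 0.0
    matched = sum(
        weights.get(e.type, 0.0)
        for e in elements
        if e.text.strip().lower() in known_terms
    )
    return matched / total


def detect_language(elements, known=KNOWN, threshold=0.6):
    """Return (language or None, per-language confidence scores)."""
    # Select only elements whose type has an assigned weight; this stands in
    # for the schema-based selection step.
    selected = [e for e in elements if e.type in TYPE_WEIGHTS]
    scores = {
        lang: language_confidence(selected, terms)
        for lang, terms in known.items()
    }
    best = max(scores, key=scores.get)
    # Accept the language only when its confidence exceeds the threshold.
    return (best if scores[best] > threshold else None), scores


doc = [
    Element("label", "Rechnung"),
    Element("label", "Datum"),
    Element("value", "Gesamt"),
    Element("value", "ACME GmbH"),
]
language, scores = detect_language(doc)
```

In this sketch a caller would process the document with `language` when it is not `None` and fall back to some other handling otherwise. Weighting by type lets structural elements such as labels count for more than free-form values, which is one plausible reading of the type-based weights the abstract mentions.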