1. (WO2018122269) BIT-SEQUENCE-BASED DATA CLASSIFICATION SYSTEM

Pub. No.: WO/2018/122269
International Application No.: PCT/EP2017/084654
Publication Date: Fri Jul 06 01:59:59 CEST 2018
International Filing Date: Thu Dec 28 00:59:59 CET 2017
IPC: G06F 17/30
Applicants: BUNDESDRUCKEREI GMBH
Inventors: KOMAROV, Ilya; PAESCHKE, Manfred
Title: BIT-SEQUENCE-BASED DATA CLASSIFICATION SYSTEM
Abstract:
The invention relates to a computer-implemented method for data classification. The method comprises: providing (402) a token set (153) containing tokens produced by tokenizing a plurality of field values of a plurality of data sets (DR1-DR8), wherein the tokens were produced from field values of at least two different field types (F1-F7) and wherein the tokens are stored in the form of a bit sequence; analyzing (404) one or more characteristics of the tokens at the level of the bit sequence in order to identify subsets (420, 422, 424) of tokens with similar characteristics, wherein the characteristics comprise the bit sequence of the tokens and/or the length of the bit sequence; and storing (406) a copy of each of the subsets of tokens with similar characteristics in a form (426, 428, 430) that is separated by subset, wherein each subset copy represents a class of data with similar characteristics.
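
The three steps of the abstract (providing a token set, analyzing bit-level characteristics, storing per-subset copies) can be illustrated with a minimal Python sketch. It is not the method claimed in the application. It rests on assumptions that the abstract does not state: tokens are obtained by whitespace splitting, the bit sequence is the token's UTF-8 encoding, and the only characteristic used for similarity is the length of the bit sequence. The names `tokenize`, `to_bits`, `classify_by_bit_length` and the sample records are hypothetical.

```python
from collections import defaultdict


def tokenize(records):
    """Build one token set from all field values of all records.

    Assumption: tokens are whitespace-separated substrings of the
    string form of each field value.
    """
    tokens = set()
    for record in records:
        for value in record.values():
            tokens.update(str(value).split())
    return tokens


def to_bits(token):
    """Return the token's UTF-8 encoding as a string of '0' and '1'.

    Assumption: UTF-8 is used as the stored bit sequence.
    """
    return "".join(f"{byte:08b}" for byte in token.encode("utf-8"))


def classify_by_bit_length(tokens):
    """Group tokens whose bit sequences have the same length.

    Assumption: bit-sequence length is the only similarity criterion.
    Each subset is returned as its own frozenset, i.e. a separate copy.
    """
    subsets = defaultdict(set)
    for token in tokens:
        subsets[len(to_bits(token))].add(token)
    return {length: frozenset(group) for length, group in subsets.items()}


# Hypothetical records with two field types (text and number).
records = [
    {"name": "Ada Lovelace", "year": 1815},
    {"name": "Alan Turing", "year": 1912},
]

classes = classify_by_bit_length(tokenize(records))
for length in sorted(classes):
    print(length, sorted(classes[length]))
```

In this sketch the tokens "1815", "1912" and "Alan" fall into the same class because each is 32 bits long, regardless of the field they came from. A criterion based on the bit pattern itself (for example, shared prefixes) would be a different choice that the abstract also allows.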