(WO2019050624) PROCESSING OF COMPUTER LOG MESSAGES FOR VISUALIZATION AND RETRIEVAL
Note: This text was obtained by automatic optical character recognition (OCR).
For legal purposes, please use the PDF version.

WHAT IS CLAIMED IS:

1. A computer-implemented method for processing computer log messages for log visualization and log retrieval, comprising:

collecting, by at least one processor operatively coupled to a memory, log messages from one or more computer system components;

performing, by the at least one processor, a log tokenization process on the log messages to generate tokens;

transforming, by the at least one processor for each log message, the tokens into log vectors associated with a metric space;

performing, by the at least one processor, dimensionality reduction on the metric space to project the metric space into a lower dimensional sub-space;

storing, by the at least one processor, similarity distances between respective pairs of the log vectors; and

in response to receiving a query associated with a query log message for reducing operational inefficiencies of the one or more computer system components, employing, by the at least one processor, the similarity distances to retrieve one or more similar log messages corresponding to the query log message for reducing the operational inefficiencies of the one or more computer system components.
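Claim 1 leaves both the tokenizer and the similarity distance unspecified. The sketch below shows one minimal way to realize the tokenization and distance-storage steps in Python. It assumes a regex tokenizer that splits on whitespace and common log punctuation, and it uses cosine distance as the similarity distance. The helper names `tokenize`, `cosine_distance`, and `pairwise_distances` are hypothetical.

```python
import math
import re
from itertools import combinations

# Hypothetical delimiter set (an assumption): whitespace plus common log punctuation.
_DELIMS = re.compile(r"[\s,;:=\[\]()\"']+")


def tokenize(message: str) -> list[str]:
    """Split one log message into lowercase tokens, dropping empty strings."""
    return [tok for tok in _DELIMS.split(message.lower()) if tok]


def cosine_distance(u: list[float], v: list[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally dissimilar
    return 1.0 - dot / norm


def pairwise_distances(vectors: list[list[float]]) -> dict[tuple[int, int], float]:
    """Store one distance per unordered pair (i, j) with i < j."""
    return {
        (i, j): cosine_distance(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }
```

For example, `tokenize("ERROR [db] connection timeout: host=10.0.0.5")` yields `['error', 'db', 'connection', 'timeout', 'host', '10.0.0.5']`. Because the distance is symmetric, the table stores only pairs with i < j. Even so, for n logs it still grows as n(n-1)/2 entries.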

2. The computer-implemented method of claim 1, wherein transforming the tokens into the log vectors further comprises:

determining a total number of unique tokens in a tokenized log file;

choosing a number of dimensions d for the metric space;

executing a model to learn vector representations in the metric space; and

for a given log message, averaging the vector representations of the tokens in the given log message to obtain the d-dimensional vector representation for the given log message.
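The averaging step of claim 2 can be sketched as follows. The sketch assumes the model has already produced a mapping from each token to its d-dimensional vector. The claim does not say how to handle tokens that have no learned vector, so this hypothetical `log_vector` helper simply skips them and falls back to a zero vector.

```python
def log_vector(tokens: list[str], embeddings: dict[str, list[float]], d: int) -> list[float]:
    """Average the d-dimensional vectors of the tokens in one log message.

    Tokens without a learned vector are skipped; if none are known, a zero
    vector is returned (an assumption, not specified by the claims).
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * d
    # Component-wise mean over the known token vectors.
    return [sum(vec[k] for vec in known) / len(known) for k in range(d)]
```

With `embeddings = {"disk": [1.0, 0.0], "full": [0.0, 1.0]}`, the call `log_vector(["disk", "full", "unknown"], embeddings, 2)` returns `[0.5, 0.5]`.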

3. The computer-implemented method of claim 2, wherein determining the total number of unique tokens in the tokenized log file further comprises sorting the tokens of the tokenized log file, removing duplicate tokens from the sorted tokens, and counting remaining ones of the tokens.
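The sort, de-duplicate, and count procedure of claim 3 mirrors the shell idiom `sort | uniq | wc -l`. A direct Python transcription might look like the following (the function name is hypothetical). Note that `len(set(tokens))` gives the same count.

```python
def count_unique_tokens(tokens: list[str]) -> int:
    """Count distinct tokens by sorting, removing duplicates, and counting."""
    ordered = sorted(tokens)  # step 1: sort so duplicates become adjacent
    deduped = [tok for i, tok in enumerate(ordered)
               if i == 0 or tok != ordered[i - 1]]  # step 2: drop adjacent repeats
    return len(deduped)  # step 3: count the remaining tokens
```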

4. The computer-implemented method of claim 2, wherein the model includes a continuous bag of words (CBOW) model.
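Claim 4 names a continuous bag of words (CBOW) model but gives no architecture details. The toy trainer below illustrates the CBOW idea under stated assumptions. At each position, the averaged input vectors of the tokens within a window predict the center token through a full softmax, and the model is trained by plain stochastic gradient descent. The window size, dimension d, learning rate, and epoch count are arbitrary choices. Practical word2vec implementations use negative sampling or hierarchical softmax instead, and a production system would more likely call an existing library such as gensim's `Word2Vec` with `sg=0` (CBOW mode).

```python
import math
import random


def train_cbow(sentences, d=8, window=2, epochs=200, lr=0.05, seed=0):
    """Train a toy CBOW model with a full softmax; return {token: d-dim vector}."""
    rng = random.Random(seed)
    vocab = sorted({tok for sent in sentences for tok in sent})
    index = {tok: i for i, tok in enumerate(vocab)}
    V = len(vocab)
    # Input (context) embeddings, small random init; output weights start at zero.
    w_in = [[rng.uniform(-0.5, 0.5) / d for _ in range(d)] for _ in range(V)]
    w_out = [[0.0] * d for _ in range(V)]

    for _ in range(epochs):
        for sent in sentences:
            ids = [index[t] for t in sent]
            for pos, center in enumerate(ids):
                ctx = [ids[j] for j in range(max(0, pos - window),
                                             min(len(ids), pos + window + 1))
                       if j != pos]
                if not ctx:
                    continue
                # Hidden layer: mean of the context input vectors.
                h = [sum(w_in[c][k] for c in ctx) / len(ctx) for k in range(d)]
                # Softmax over the whole vocabulary (max-shifted for stability).
                scores = [sum(w_out[v][k] * h[k] for k in range(d)) for v in range(V)]
                m = max(scores)
                exps = [math.exp(s - m) for s in scores]
                z = sum(exps)
                # Cross-entropy gradient w.r.t. scores: probs - one_hot(center).
                err = [e / z - (1.0 if v == center else 0.0) for v, e in enumerate(exps)]
                # Gradient w.r.t. h, computed before the output weights change.
                grad_h = [sum(err[v] * w_out[v][k] for v in range(V)) for k in range(d)]
                for v in range(V):
                    for k in range(d):
                        w_out[v][k] -= lr * err[v] * h[k]
                # h is a mean, so each context vector receives grad_h / len(ctx).
                for c in ctx:
                    for k in range(d):
                        w_in[c][k] -= lr * grad_h[k] / len(ctx)
    return {tok: w_in[i] for tok, i in index.items()}
```

The returned dictionary is the token-to-vector mapping that the averaging step of claim 2 consumes.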

5. The computer-implemented method of claim 1, wherein retrieving the one or more similar log messages further comprises:

clustering the log vectors into a plurality of clusters to reduce log retrieval processing time, including assigning cluster identifications (IDs) to the log vectors corresponding to their respective clusters;

assigning a cluster ID to a query log vector associated with the query log message;

ranking respective distances between the query log vector and other log vectors within the cluster to which the query log vector corresponds; and

retrieving the one or more similar log messages based on the ranked distances.
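The following is a minimal sketch of the cluster-restricted retrieval in claim 5. It assumes that cluster IDs have already been assigned (for instance by DBSCAN, as in claim 7) and that the pairwise distances stored in claim 1 are available as a dictionary keyed by index pairs (i, j) with i < j. Two further details are assumptions of this sketch: a query that falls in DBSCAN's noise label (-1) is treated as having no similar logs, and the function name is hypothetical.

```python
def retrieve_similar(query_id: int,
                     cluster_ids: list[int],
                     distances: dict[tuple[int, int], float],
                     k: int = 3) -> list[int]:
    """Return up to k indices of logs in the query's cluster, nearest first.

    cluster_ids[i] is the cluster ID of log vector i; distances holds one
    precomputed distance per unordered pair (i, j) with i < j.
    """
    target = cluster_ids[query_id]
    if target == -1:
        return []  # assumption: a DBSCAN noise point belongs to no cluster
    candidates = [i for i, cid in enumerate(cluster_ids)
                  if cid == target and i != query_id]

    def stored_distance(i: int) -> float:
        # Look up the stored distance regardless of which index is smaller.
        return distances[(min(i, query_id), max(i, query_id))]

    # Rank the in-cluster candidates by distance and keep the k nearest.
    return sorted(candidates, key=stored_distance)[:k]
```

Restricting the candidates to one cluster is what reduces retrieval time. Only the logs that share the query's cluster ID are ranked, rather than the whole collection.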

6. The computer-implemented method of claim 5, further comprising, in response to determining that the query log message is not included in the one or more collected log messages:

generating, by the at least one processor, the query log vector to embed the query log message into the metric space; and

determining, by the at least one processor, which cluster the generated query log vector belongs to prior to assigning the cluster ID to the query log vector.
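Claim 6 does not say how the cluster of a new, previously unseen query vector is determined. This sketch shows one simple hypothetical approach: it assigns the query to the cluster whose centroid (mean vector) is nearest in Euclidean distance and skips noise points labeled -1. The query vector itself is assumed to have been generated already, for example by averaging its tokens' vectors as in claim 2. DBSCAN clusters need not be convex, so a nearest-core-point rule may fit better in practice; the centroid rule is used here only for brevity.

```python
import math


def cluster_centroids(vectors, cluster_ids):
    """Mean vector of each cluster, skipping noise points labeled -1."""
    sums, counts = {}, {}
    for vec, cid in zip(vectors, cluster_ids):
        if cid == -1:
            continue
        acc = sums.setdefault(cid, [0.0] * len(vec))
        for k, x in enumerate(vec):
            acc[k] += x
        counts[cid] = counts.get(cid, 0) + 1
    return {cid: [x / counts[cid] for x in acc] for cid, acc in sums.items()}


def assign_cluster(query_vec, vectors, cluster_ids):
    """Return the ID of the cluster whose centroid is nearest, or -1 if none."""
    centroids = cluster_centroids(vectors, cluster_ids)
    if not centroids:
        return -1
    return min(centroids, key=lambda cid: math.dist(query_vec, centroids[cid]))
```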

7. The computer-implemented method of claim 5, wherein clustering the log vectors includes employing a density-based spatial clustering of applications with noise (DBSCAN) technique.
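DBSCAN treats a point with at least `min_pts` neighbors within radius `eps` as a core point. It attaches non-core neighbors of core points as border points and labels everything else as noise. The straightforward O(n^2) version below is only a sketch. `eps` and `min_pts` must be tuned to the log vectors, and for real workloads scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` provides an indexed implementation.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster ID (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        # Region query: all points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: expand
                queue.extend(j_neighbors)
        cluster += 1
    return labels
```

For example, `dbscan([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, 50]], eps=1.5, min_pts=3)` returns `[0, 0, 0, 1, 1, 1, -1]`. The result is two clusters and one noise point.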

8. The computer-implemented method of claim 5, wherein ranking the respective distances between the query log vector and the other log vectors in the cluster further comprises:

calculating a similarity distance between the query log vector and each of the other log vectors; and

sorting the other log vectors based on the similarity distances.
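The two steps of claim 8 (calculating a distance to each candidate, then sorting) reduce to a few lines. This sketch assumes Euclidean distance via `math.dist`. The claim does not fix the metric, and cosine distance would work the same way. The `candidates` mapping and the function name are illustrative.

```python
import math


def rank_by_distance(query_vec, candidates):
    """Return (index, distance) pairs for the candidate vectors, nearest first.

    candidates maps a log index to its vector; Euclidean distance is used
    here as the similarity distance (an assumption).
    """
    # Step 1: calculate the distance from the query to every candidate.
    scored = [(idx, math.dist(query_vec, vec)) for idx, vec in candidates.items()]
    # Step 2: sort the candidates by that distance.
    scored.sort(key=lambda pair: pair[1])
    return scored
```

When only the top k results are needed, `heapq.nsmallest(k, ...)` avoids a full sort, although the claim as written describes sorting.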

9. The computer-implemented method of claim 1, wherein performing the dimensionality reduction includes employing a locality-preserving dimensionality reduction technique that includes a t-Distributed Stochastic Neighbor Embedding (t-SNE) technique.
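t-SNE first converts pairwise distances into neighbor probabilities. Each point's Gaussian width is chosen by binary search so that its neighborhood has a target perplexity. The method then moves the low-dimensional points so that their Student-t similarities match those probabilities, using gradient descent on the KL divergence. The exact O(n^2) sketch below is for illustration only. The perplexity, learning rate, momentum, early-exaggeration schedule, and iteration count are all assumed values, and scikit-learn's `sklearn.manifold.TSNE(n_components=2, perplexity=...)` is the usual practical choice.

```python
import math
import random


def _conditional_row(dist_row, i, perplexity, tol=1e-5, max_iter=50):
    """Binary-search the Gaussian precision beta so row i matches the perplexity."""
    target = math.log(perplexity)  # desired entropy (natural log)
    lo, hi, beta = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        weights = [0.0 if j == i else math.exp(-d * beta) for j, d in enumerate(dist_row)]
        total = sum(weights) or 1e-12
        probs = [w / total for w in weights]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # distribution too flat: increase beta
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:  # distribution too peaked: decrease beta
            hi = beta
            beta = (beta + lo) / 2
    return probs


def tsne(points, dims=2, perplexity=5.0, iters=300, lr=10.0, seed=0):
    """Exact t-SNE: return one dims-dimensional coordinate per input point."""
    n = len(points)
    sq = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    cond = [_conditional_row(sq[i], i, perplexity) for i in range(n)]
    # Symmetrized joint probabilities P_ij (they sum to 1 over all pairs).
    P = [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    Y = [[rng.gauss(0, 1e-2) for _ in range(dims)] for _ in range(n)]
    velocity = [[0.0] * dims for _ in range(n)]
    for it in range(iters):
        exag = 4.0 if it < 50 else 1.0  # early exaggeration (assumed schedule)
        # Unnormalized Student-t (one degree of freedom) affinities.
        num = [[0.0 if i == j else
                1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
                for j in range(n)] for i in range(n)]
        z = sum(map(sum, num))
        for i in range(n):
            grad = [0.0] * dims
            for j in range(n):
                if i != j:
                    # dC/dy_i = 4 * sum_j (p_ij - q_ij) * num_ij * (y_i - y_j)
                    coeff = 4.0 * (exag * P[i][j] - num[i][j] / z) * num[i][j]
                    for k in range(dims):
                        grad[k] += coeff * (Y[i][k] - Y[j][k])
            for k in range(dims):
                velocity[i][k] = 0.8 * velocity[i][k] - lr * grad[k]
        # Apply all updates after every gradient has been computed.
        for i in range(n):
            for k in range(dims):
                Y[i][k] += velocity[i][k]
    return Y
```

The perplexity must be smaller than the number of points. The output holds one `dims`-dimensional coordinate per input log vector.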

10. The computer-implemented method of claim 1, further comprising generating, by the at least one processor, a visualization of the log vectors in the lower dimensional sub-space.

11. The computer-implemented method of claim 1, wherein the similarity distances are stored in a database that includes a Structured Query Language (SQL) database.
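To illustrate claim 11, the stored similarity distances could live in any SQL database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `log_distance` and its schema are assumptions: the table holds one row per unordered pair (log_a < log_b), and nearest neighbors are retrieved with `ORDER BY distance LIMIT k`.

```python
import sqlite3


def store_distances(conn, distances):
    """Persist pairwise distances keyed by (log_a, log_b) with log_a < log_b."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_distance ("
        " log_a INTEGER, log_b INTEGER, distance REAL,"
        " PRIMARY KEY (log_a, log_b))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO log_distance VALUES (?, ?, ?)",
        [(a, b, d) for (a, b), d in distances.items()],
    )
    conn.commit()


def nearest_logs(conn, query_id, k=3):
    """Return up to k (log_id, distance) rows closest to query_id."""
    sql = (
        "SELECT CASE WHEN log_a = ? THEN log_b ELSE log_a END AS other, distance"
        " FROM log_distance WHERE log_a = ? OR log_b = ?"
        " ORDER BY distance LIMIT ?"
    )
    return conn.execute(sql, (query_id, query_id, query_id, k)).fetchall()
```

For example, after `conn = sqlite3.connect(":memory:")` and `store_distances(conn, {(0, 1): 0.5, (0, 2): 0.9, (1, 2): 0.2})`, the call `nearest_logs(conn, 2, k=2)` returns `[(1, 0.2), (0, 0.9)]`. On large tables, an additional index on `log_b` would speed up the `OR` lookup.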

12. A system for processing computer log messages for log visualization and log retrieval to improve log processing and management systems, comprising:

a memory device for storing program code;

a processor operatively coupled to the memory device and configured to execute program code stored on the memory device to:

collect log messages from one or more computer system components;

perform a log tokenization process on the log messages to generate tokens;

transform the tokens into log vectors associated with a metric space;

perform dimensionality reduction on the metric space to project the metric space into a lower dimensional sub-space;

store similarity distances between respective pairs of the log vectors; and

in response to receipt of a query associated with a query log message for reducing operational inefficiencies of the one or more computer system components, employ the similarity distances to retrieve one or more similar log messages corresponding to the query log message for reducing the operational inefficiencies of the one or more computer system components.

13. The system of claim 12, wherein, in transforming the tokens into the log vectors, the processor is further configured to execute program code to:

determine a total number of unique tokens in a tokenized log file by sorting the tokens of the tokenized log file, removing duplicate tokens from the sorted tokens, and counting remaining ones of the tokens;

choose a number of dimensions d for the metric space;

execute a model to learn vector representations in the metric space; and

for a given log message, average the vector representations of the tokens in the given log message to obtain the d-dimensional vector representation for the given log message.

14. The system of claim 12, wherein, in retrieving the one or more similar log messages, the processor is further configured to execute program code to:

cluster the log vectors into a plurality of clusters to reduce log retrieval processing time, including assigning cluster identifications (IDs) to the log vectors corresponding to their respective clusters;

assign a cluster ID to a query log vector associated with the query log message;

rank respective distances between the query log vector and other log vectors within the cluster to which the query log vector corresponds; and

retrieve the one or more similar log messages based on the ranked distances.

15. The system of claim 14, wherein, in response to a determination that the query log message is not included in the one or more collected log messages, the processor is further configured to execute program code to:

generate the query log vector to embed the query log message into the metric space; and

determine which cluster the generated query log vector belongs to prior to assigning the cluster ID to the query log vector.

16. The system of claim 14, wherein, in ranking the respective distances between the query log vector and the other log vectors in the cluster, the processor is further configured to execute program code to:

calculate a similarity distance between the query log vector and each of the other log vectors; and

sort the other log vectors based on the similarity distances.

17. The system of claim 12, wherein the processor is configured to perform the dimensionality reduction by employing a locality-preserving dimensionality reduction technique that includes a t-Distributed Stochastic Neighbor Embedding (t-SNE) technique.

18. The system of claim 12, wherein the processor is further configured to execute program code stored on the memory device to generate a visualization of the log vectors in the lower dimensional sub-space.

19. The system of claim 12, wherein the similarity distances are stored in a database that includes a Structured Query Language (SQL) database.

20. A computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method for processing computer log messages for log visualization and log retrieval to improve log processing and management systems, the method performed by the computer comprising:

collecting log messages from one or more computer system components;

performing a log tokenization process on the log messages to generate tokens;

transforming the tokens into log vectors associated with a metric space;

performing dimensionality reduction on the metric space to project the metric space into a lower dimensional sub-space;

storing similarity distances between respective pairs of the log vectors; and

in response to receiving a query associated with a query log message for reducing operational inefficiencies of the one or more computer system components, employing the similarity distances to retrieve one or more similar log messages corresponding to the query log message for reducing the operational inefficiencies of the one or more computer system components.