Processing

Please wait...

Settings

Settings

Goto Application

1. WO2020141968 - A SYSTEM AND METHOD FOR IMPACT ANALYSIS OF CHANGE REQUEST THAT AFFECTS DATABASE STRUCTURE THROUGH CLASSIFICIATION AND KEYWORD SIMILARITY ANALYSIS

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

[ EN ]

A SYSTEM AND METHOD FOR IMPACT ANALYSIS OF CHANGE REQUEST THAT AFFECTS DATABASE STRUCTURE THROUGH CLASSIFICATION AND KEYWORD

SIMILARITY ANALYSIS

FIELD OF INVENTION

The present invention relates to a system and method for impact analysis of change request that affects database structure through classification and keyword similarity analysis. In particular, the present invention utilizes classification, tagged words and keyword similarity in detecting impact to non related requirements when a change request is proposed.

BACKGROUND ART

Currently, computing systems or machine dependent systems are built based on multiple requirements which are interrelated in nature. A change to a particular requirement would have direct and indirect impact to other requirements of the system. Detecting direct impacts are relatively easy, while indirect impacts might not have related visibility to the change request. Direct and indirect impact may cause malfunction of the system.

United States Patent No. 8,768,902 B2 (hereinafter referred to as the US 902 B2 Patent) entitled“Unified concurrent changes to data, schema and application” having a filing date of 1 1 June 2010; (Assignee: Microsoft Corporation) relates to unified concurrent changes to data, schema, and application whereby changes are managed in a multi-user database application environment. Proposed changes are collected, including changes to data, schema or application descriptions. In the US 902 B2 Patent, user’s proposed changes are made according to its dependencies and submitted in a single transaction. For example, a proposed change to a data value may be inconsistent with a change which removes a data element containing the data value. In the US 902 B2 Patent, user may propose changes direct to database objects. Further, the invention in the US 902 B2 includes a diff manager that can produce diffgrams from user gestures to detect inconsistencies whereby diffgrams represent proposed changes to observed data, schema, or application values.

United States Patent Application Publication No. 2008/0147704 A1 (hereinafter referred to as the US 704 A1 Publication) entitled“Systems and Methods for Propogation of Database Schema Changes”, having a filing date of 13 December 2006 (Applicant: Godwin et. al). The US 704 A1 Publication relates to propagation of database schema changes. The US 704 A1 Publication discloses a usability tool which identifies changes in the schema of a database.

The usability tool identifies source code dependencies that are affected by the changes in database schema whereby the tool informs a user of the affected dependencies and may also automatically update existing source code dependencies corresponding to the changed schema. The system further comprises a source code manager to determine dependencies affected by the change in schema, the dependencies comprising computer code references to at least some of the tables or columns of data in the database.

United States Patent Application Publication No. 201 1/01 19288 A1 (hereinafter referred to as the US 288 A1 Publication) entitled“Detecting and Applying Database Schema Changes to Reports”, having a filing date of 17 November 2009 (Applicant: Sinha). The US 288 A1 Publication relates to a system and method to detect database schema changes whereby database schema changes are detected via triggers. The changes are also detected by comparing database schemas and determining modifications to database tables and database columns. The invention in the US 288 A1 Publication utilizes only database information which includes tables and columns only to detect the changes. Further, the invention as disclosed in US 288 A1 Publication provides for the result of analysis to be directly applied to the end process.

With reference to the above-mentioned disclosures, there is indeed a need for an improved system and method for impact analysis of change request that affects database structure.

SUMMARY OF INVENTION

The present invention relates to a system and method for impact analysis of change request that affects database structure through classification and keyword similarity analysis. In particular, the present invention utilizes classification, tagged words and keyword similarity in detecting impact to non related requirements when a change request is proposed. The present invention incorporates a system and methodology of classification and keywords similarity analysis that improves impact analysis on requirement dependencies, whenever there is occurrence of changes to database structure. The impact analysis on system requirement of the present invention improves the visibility of the related changes when a request for change request is required in a database structure.

One aspect of the invention provides a system (100) for impact analysis of change request that affects database structure through classification and keyword similarity analysis. The system comprising a client application (102) at client side for input interaction between user and server enabling the user to configure an input interaction with the server or repository (1 10); a server side comprising a Requirement Relationship Knowledge Builder Engine (104) for gathering information from Requirement Book Knowledge Base, Domain Knowledge Base and Named Entity Recognition repository to generate direct tagged keyword; a Data Mining Engine (106) for building classification, indirect tagged keyword and related word based on direct tagged keyword generated by the Requirement Relationship Knowledge Builder Engine (104); at least one repository (1 10) for storing named entity relationship and engine data classification, tagged keyword and related words; and at least one Knowledge base storage (1 12) of Domain Knowledge Base and Requirement Book Knowledge Base. The server side further comprising an Impact Analysis Engine (108) for analysing requirements that are impacted of change request based on input from the Requirement Relationship Knowledge Builder Engine (104) and Data Mining Engine (106).

Another aspect of the invention provides that the Domain Knowledge Base includes information related to keyword’s domain classification and its related words.

A further aspect of the invention provides that the Requirement Book Knowledge Base includes relationship information between requirement, algorithm, database table and database column and column description; wherein each requirement comprises at least one algorithm; wherein each algorithm comprises at least one database table and wherein each database table comprises at least one column and its description.

Yet another aspect of the invention provides that the Domain Knowledge Base and Requirement Book Knowledge Base is used as an input to all processes and engine.

Still another aspect of the invention provides that the repository (1 10) includes Named Entity Recognition repository and related word repository.

Another aspect of the invention provides a method (200) for impact analysis of change request that affects database structure through classification and keyword similarity. The method comprising steps of generating direct tagged keyword through Requirement Relationship Knowledge Builder Engine that gathers information from Requirement Book Knowledge Base, Domain Knowledge Base and repository (202); executing data mining process through Data Mining Engine by using the direct tagged keywords to build classification, indirect tagged keyword and related word (204); storing result of classification and related word in the repository (206); initiating impact analysis process when a change request is made (208); initiating analysis to identify related database objects requirement that would be impacted due to the change request by retrieving related requirements from Requirement Book Knowledge Base (210); obtaining direct tagged keywords, D{1... M} and indirect tagged keywords, ID{1...X} related to identified related database objects requirement by sending query to the related word repository (212); performing impact analysis to determine highest indirect tagged keyword based on similarity analysis (214); and generating impact analysis report based on the highest indirect tagged keyword identified, HID{1 ,... ,n} to notify indirect or other requirements that are impacted of change request (216). The step for obtaining direct tagged keywords, D{1... M} and indirect tagged keywords, ID{1...X} related to identified related database objects requirement by sending query to the related word repository (212) further comprises steps of (400) selecting at least one indirectly tagged keyword, IDx from a list of indirectly tagged keywords, I D{ 1... X}) which was retrieved from the related word repository (402); determining if the indirectly tagged keyword, IDx is higher value than the value of the indirectly tagged keywords, ID{1...X} (404); returning set of list of highest indirect keyword if the indirectly tagged keyword, IDx is higher value than the value of the indirectly tagged keywords, ID{1...X} (406); if the indirectly tagged keyword, IDx is lower value than the value of the indirectly tagged keywords, ID{1...X} obtaining classification CID (1....Y) from each indirectly impacted tagged keyword, IDx (408); retrieving related classifications, CID (1...Z) for each direct tagged keyword, Dm from the related word repository (410); retrieving indirect related words, WID[1 _ I] from the related word repository for each CIDy (412); retrieving direct related words, WD[1....L] from the related word repository for each CDz (414); comparing related word for each indirect WID[1 _ I], with each direct related work of WD[1....L] using cosine similarity (416); and

obtaining highest similarity average value as part of relevant indirect keyword which traces back to indirect related requirements from results obtained (418).

A further aspect of the invention provides that the step of generating direct tagged keyword through Requirement Relationship Knowledge Builder Engine that gathers information from Requirement Book Knowledge Base and Named Entity Recognition repository (202) further comprises steps of (300) generating each requirement related column by using Requirement Book Knowledge Base (302, 302a); tagging each Requirement Book Knowledge Base column with keywords using Named Entity Recognition repository (304, 304a); and generating classification and related words for each tagged keywords of each Requirement Book using Data Mining Engine (306).

Still another aspect of the invention provides that indirect or other requirements that are impacted of change request are retrieved from requirement knowledge base using HID which traces back to classification, tagged keyword and related word repository.

A further aspect of the invention provides that the step of generating each requirement related column by using Requirement Book Knowledge Base (302, 302a) further comprises providing a set of related columns for each requirement.

Still another aspect of the invention provides that the step of comparing related word for each indirect WID[1 _ I], with each direct related work of WD[1....L] using cosine similarity

(416) further comprises determining similarity of words from set of related words until all related words have been selected and computed for similarity.

The present invention consist of features and a combination of parts hereinafter fully described and illustrated in the accompanying drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.

BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS

To further clarify various aspects of some embodiments of the present invention, a more particular description of the invention will be rendered by references to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the accompanying drawings.

FIG. 1.0 is a block diagram illustrating system of the present invention.

FIG. 2.0 is a flowchart illustrating general methodology of the present invention.

FIG. 3.0 illustrates a schematic diagram of a Requirement Relation Knowledge Builder Engine of the present invention.

FIG. 4.0 is a flowchart illustrating the steps involved in selecting part of relevant information indirect keyword which traces back to indirect related requirements of the present invention.

FIG. 5.0 illustrates an example of a Highest Indirect Keyword.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention relates to a system and method to improve impact analysis of change request that affects database structure through classification and keyword similarity analysis. In particular, the present invention utilizes classification, tagged words and keyword similarity in detecting impact to non related requirements when a change request is proposed. Hereinafter, this specification will describe the present invention according to the preferred embodiments. It is to be understood that limiting the description to the preferred embodiments of the invention is merely to facilitate discussion of the present invention and it is envisioned without departing from the scope of the appended claims.

Reference is first made to FIG. 1 .0 which is a block diagram illustrating the system of the present invention. As illustrated in FIG. 1.0, the system (100) for impact analysis of change request that affects database structure through classification and keyword similarity analysis of the present invention comprising a client application (102) at client side for input interaction between user and server enabling the user to configure an input interaction with the server or repository (1 10) and a server side. The configuration includes uploading of related requirement documents or initiate execution of process. The server side comprising a Requirement Relationship Knowledge Builder Engine (104) for gathering information from Requirement Book Knowledge Base, Domain Knowledge Base and Named Entity Recognition repository to generate direct tagged keyword whereby the Requirement Relationship Knowledge Builder Engine enables generation of direct tagged keyword; a Data Mining Engine (106) for building classification, indirect tagged keyword and related word based on direct tagged keyword generated by the Requirement Relationship Knowledge Builder Engine (104); an Impact Analysis Engine (108) for analysing requirements that are impacted of change request based on input from the Requirement Relationship Knowledge Builder Engine (104) and Data Mining Engine (106); at least one repository (1 10) for storing named entity relationship and engine data classification, tagged keyword and related words; and at least one Knowledge base storage (1 12) of Domain Knowledge Base and Requirement Book Knowledge Base. The Domain Knowledge Base includes information related to keyword’s domain classification and its related words while the Requirement Book Knowledge Base includes relationship information between requirement, algorithm, database table and database column and column description; wherein each requirement comprises at least one algorithm; wherein each algorithm comprises at least one database table and wherein each database table comprises at least one column and its description. The Domain Knowledge Base and Requirement Book Knowledge Base is used as an input to all processes and engine. The repository (1 10) includes Named Entity Recognition repository and related word repository.

As illustrated in FIG. 1.0, the present invention is initiated with requirement knowledge building through the Requirement Relationship Knowledge Builder Engine (104) which gathers the information from the Domain Knowledge Base and Requirement Book Knowledge Base. The repository specifically a Named Entity Recognition repository generate direct tagged keyword meaning keywords that are related to the requirement. Subsequently, it is followed by using the direct tagged keywords to build classification, indirect tagged keyword and related word through Data Mining Engine (106). Indirect tagged keywords are keywords indirectly to requirement. The result of classification and related word is then stored in the repository (1 10). As soon as a change request is made, an analysis is done by the Impact Analysis Engine (108) to identify the related database objects that would be impacted due to the change request by retrieving related requirements from the Requirement Book Knowledge Base (1 12). A further query is made to the repository (1 10) to obtain both direct and indirect tagged keyword related to the identified related requirement. Subsequently, impact analysis will be performed to determine the highest indirect tagged keyword based on similarity analysis. Finally, based on the highest indirect tagged keyword identified, a report is generated to notify the indirect or other requirements that are impacted of the change request. The methodology will be described in detail in the following paragraphs.

Reference is now made to FIG. 2.0 which is a flowchart illustrating the general methodology of the present invention. As illustrated in FIG. 2.0, the method (200) for impact analysis of change request that affects database structure through classification and keyword similarity analysis is divided into two phases, Phase A and Phase B. Phase A is a pre-requisite of Phase B. Phase A provides the steps involve for constructing the required knowledge base for the execution of Phase B. Phase B provides building blocks of the entire similarity impact analysis process for Impact Analysis Engine. The method (200) as illustrated in FIG. 2.0 comprising steps of first generating direct tagged keyword through Requirement Relationship Knowledge Builder Engine that gathers information from Requirement Book Knowledge Base and repository (202). It is followed by executing data mining process through Data Mining Engine by using direct tagged keywords to build classification, indirect tagged keyword and related word (204). The result of classification and related word is stored in the repository (206). Similarity impact analysis process is initiated when a change request is made (208). Subsequently, analysis is initiated to identify related database objects requirement that would be impacted due to the change request by retrieving related requirements from Requirement Book Knowledge Base (210). Direct tagged keywords, D{1... M} and indirect tagged keywords, ID{1...X} related to identified related database objects requirement are obtained by sending query to the related word repository (212).

Thereafter, impact analysis is performed to determine highest indirect tagged keyword based on similarity analysis (214); and subsequently generating impact analysis report based on highest indirect tagged keyword identified, HID{1 ,... ,n} to notify indirect or other requirements that are impacted of change request (216). All indirect or other requirements that are impacted of change request are retrieved from Requirement Book Knowledge Base (1 12) using Human Interface Device, HID which traces back to classification, tagged keyword and related word repository.

Reference is now made to FIG. 3.0 which illustrates a schematic diagram of the Requirement Relation Knowledge Builder Engine of the present invention. As illustrated in FIG. 3.0, the process in the Requirement Relation Knowledge Builder Engine in generating requirement knowledge building through Requirement Relationship Knowledge Builder Engine that gathers information from Requirement Book Knowledge Base and Named Entity Recognition repository (202) further comprises steps of (300) first generating each requirement related column by using Requirement Book Knowledge Base (302, 302a). Generating each requirement related column by using Requirement Book Knowledge Base (302, 302a) further comprises providing a set of related columns for each requirement. Once the related columns are generated, each requirement would have a set of related columns. For example, requirement RB1 would have {C1 , D1 } as its related columns and its description.

The method continues with tagging each Requirement Book Knowledge Base column with keywords using Named Entity Recognition repository (304, 304a). The tagging process comprises of extracting the entity name from the column’s description and tagged the column with the entity name. For example, C1 columns has description as“This column is related to employee salary based on their position", the tagging process then extract the entity name from the description which tagged the column as“Payslip”,“Staff”,“Money” and “Company”. At the end of the process, each Requirement Book would have set of related tagged keywords based on its set of columns. For example, RB1 would have {K1 , K2,... Kn}. Subsequently, classification and related words are generated for each tagged keywords of each Requirement Book using Data Mining Engine (306). Tagged keywords classification is generated by using Naive Bayes algorithm which uses Requirement Book Knowledge Base as its features reference of the tagged keywords. Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, tagged keyword “Staff” will be classified as “Person” if it has features characteristics as a person such as gender, race, address and others. On the other hand, in the same process, related words of tagged keywords are generated using Latent Semantic

Analysis to discover a new keyword based on meaning behind the words based on certain topic or domain. For example, tagged keyword“Staff” will generate related words such as “Employee”,‘Personnel”,“Team” and others. In the end of the process, all requirements and related result which include stagged keywords, classification and related words are stored in the repository as an input for the next process.

Reference is now made to FIG. 4.0 which is a flowchart illustrating the steps involved in selecting part of relevant information indirect keyword which traces back to indirect related requirements of the present invention. As illustrated in FIG. 4.0, the step of obtaining direct tagged keywords, D{1... M} and indirect tagged keywords, ID{1...X} related to identified related database objects requirement by sending query to the related word repository (212) further comprises steps of (400) first selecting at least one indirectly tagged keyword, IDx from a list of indirectly tagged keywords, ID{1...X} which was retrieved from the related word repository (402). It is further determined if the indirectly tagged keyword, IDx is higher value than the value of the indirectly tagged keywords, ID{1...X} (404). The set of list of highest indirect keyword is returned if the indirectly tagged keyword, IDx is higher value than the value of the indirectly tagged keywords, ID{1...X} (406). If the indirectly tagged keyword, IDx is lower value than the value of the indirectly tagged keywords, ID{1...X}, proceeding to obtain classification CID (1....Y) from each indirectly impacted tagged keyword, IDx (408). Thereafter, related classifications, CID (1...Z) are retrieved for each direct tagged keyword,

Dm from the related word repository (410); and indirect related words, WID[1 _ I] are retrieved from the related word repository for each CIDy (412). Subsequently, direct related words, WD[1....L] are retrieved from the related word repository for each CDz (414a) and each related word is compared for each indirect WID[1 _ I], with each direct related work of

WD[1....L] using cosine similarity (416). The highest similarity average value is obtained as part of relevant indirect keyword which traces back to indirect related requirements from results obtained (418). In comparing related word for each indirect WID[1 _ I], with each direct related work of WD[1....L] using cosine similarity (416) further comprises determining similarity of words from set of related words until all related words have been selected and computed for similarity.

Reference is now made to FIG. 5.0 which illustrates an example of the Highest Indirect Keyword. As illustrated in FIG. 5.0, in this example, the related words of 2 classification {CIDWy, CDWz} is represented as WID[1 ..J] and WD[1 ..L] For example, if the 2 classification are“Human Resource” and“Finance”, CIDWy would represent“Database” and CDWz would represent “Finance”. Each classification has its own related words. For example, CIDWy which represent “Human Resource” has related words such as

“Employee”, “Interview” and “Job”, WID1 would represent “Employee”, WID2 would represent“Interview” and WID3 would represent“Job”. Whereby CDWz which represent “Finance” has related words such as“Salary”,“Billing” and“Profit”, WD1 would represent “Salary”, WD2 would represent“Billing” and WD3 would represent“Profit”. The process then select first classification related words, WIDn i.e WID1 from the set of related words {WID1 , WID2, WID3}. The process continues by calculating the similarity of WID1 with WDm i.e. {WD1 , WD2, WD3} by using the cosine similarity until all the set of related words from WD have been selected and calculated the similarity. Numbers such as 0.8, 0.7, 0.6 and etc represent the similarity between WDm and WID1. For example, the similarity between the first set of related words WID i.e. WID1 and the classification’s related words i.e. WD1 is shown as 0.8. The steps of selecting the related words WIDn and calculating the similarity of WDm and WIDn are repeated until there are no more related words to be selected. Once these steps are completed, the process returns the set of the highest indirect keywords HIDn for each keyword.

The present invention incorporates a system and methodology of classification and keywords similarity analysis that improves impact analysis on requirement dependencies, whenever there is occurrence of changes to database structure. The impact analysis on system requirement of the present invention improves the visibility of the related changes when a request for change request is required in a database structure.

Unless the context requires otherwise or specifically stated to the contrary, integers, steps or elements of the invention recited herein as singular integers, steps or elements clearly encompass both singular and plural forms of the recited integers, steps or elements.

Throughout this specification, unless the context requires otherwise, the word“comprise”, or variations such as“comprises” or“comprising”, will be understood to imply the inclusion of a stated step or element or integer or group of steps or elements or integers, but not the exclusion of any other step or element or integer or group of steps, elements or integers. Thus, in the context of this specification, the term“comprising” is used in an inclusive sense and thus should be understood as meaning“including principally, but not necessarily solely”.