Method and system for search and rank the duplicate bug reports

IP.com Disclosure Number: IPCOM000235920D
Publication Date: 2014-Mar-29
Document File: 2 page(s) / 51K

Publishing Venue

The IP.com Prior Art Database

Abstract

During software testing, the testing and development teams use bug reports to track issues. If a bug report is similar to another report, or has the same root cause, the product triage team marks it as a duplicate of the one created earlier. After a release is complete, the testing team measures its effort and quality. Duplicate bug reports lower a testing team's measured quality, so testers want to avoid opening duplicates as far as possible. On the other hand, if a bug report matches one created in a previous release of the product, it is considered a regression, which is treated as a serious issue and a significant finding for the testing team. The mechanism described in this article helps testers find potential duplicate defects and ranks them by priority, based on the special data structure of a bug report.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 42% of the total text.

The first step is filtering, based on the required fields that a tester must fill in when creating a bug. Using the information in these required fields, a lot of noise data can be filtered out.

The second step is preprocessing. Existing solutions use typical preprocessing methods, such as tokenizing, stemming, and making word tense consistent. This invention uses these typical methods and also relies on the special data structure of a bug to remove more useless words, such as the bug template text that a testing team uses as a format or guide.

The third step is analysis. Existing solutions apply a data mining algorithm directly and use the similarity result to cluster the bug reports. This invention first uses data mining and natural language processing algorithms such as TF-IDF and "simhash" to analyze the preprocessed data and obtain a similarity result. In addition, the algorithm applies rules that are specific to the information in the bug fields. The two similarity results are then combined with a formula to generate the final ranking.
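The analysis step can be illustrated with a minimal sketch. The abbreviated text does not give the combining formula or the field rules, so everything specific here is an assumption: the weights `w_text` and `w_field`, the equal split between TF-IDF cosine similarity and simhash similarity, and the placeholder field rule (a matching "found in" value scores 1.0) are hypothetical choices, not the disclosure's method.

```python
import hashlib
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one {term: weight} dict per token list, using smoothed IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def simhash(tokens, bits=64):
    """Fingerprint a token list: each token's hash votes on every bit."""
    counts = [0] * bits
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def simhash_similarity(a, b, bits=64):
    """1.0 minus the normalized Hamming distance between two fingerprints."""
    return 1.0 - bin(simhash(a, bits) ^ simhash(b, bits)).count("1") / bits


def rank(new_tokens, new_found_in, candidates, w_text=0.7, w_field=0.3):
    """Rank candidates, given as (bug_id, tokens, found_in) tuples.

    The weights and the field rule are hypothetical placeholders.
    """
    vectors = tfidf_vectors([new_tokens] + [c[1] for c in candidates])
    scored = []
    for (bug_id, tokens, found_in), vec in zip(candidates, vectors[1:]):
        text_sim = (0.5 * cosine(vectors[0], vec)
                    + 0.5 * simhash_similarity(new_tokens, tokens))
        field_sim = 1.0 if found_in == new_found_in else 0.0  # assumed rule
        scored.append((bug_id, w_text * text_sim + w_field * field_sim))
    return sorted(scored, key=lambda s: s[1], reverse=True)
```

Calling `rank(new_tokens, "2.0", candidates)` returns `(bug_id, score)` pairs with the most likely duplicate first. In practice the weights would need tuning against bug reports that triage has already marked as duplicates.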

When a new bug is created, some fields are required and are filled in first: summary (title), description, "file against" (the part of the product in which the bug was found, usually the component name), and "found in" (the product version in which the bug was found). The summary and description contain the most detailed information about the bug, and this information is analyzed using a data mining algorithm. The "file against" field helps determine which developer will be responsible for fixing the bug. If two bugs are found in the same product component, they are more likely to be duplicates than bugs found in different components. This invention therefore compares the "file against" value of the new bug with that of each existing bug. If an existing bug has the same "file against" value as the new one, it is kept; otherwise, it is filtered out. This step removes much of the noise data.
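A minimal sketch of this filter follows. The `BugReport` class, its attribute names, and the case-insensitive comparison are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    """Hypothetical record holding the required fields named in the text."""
    bug_id: int
    summary: str
    description: str
    file_against: str  # product component the bug was filed against
    found_in: str      # product version in which the bug was found


def filter_candidates(new_bug, existing_bugs):
    """Keep only existing bugs filed against the same component as new_bug."""
    target = new_bug.file_against.strip().lower()
    return [b for b in existing_bugs
            if b.file_against.strip().lower() == target]
```

Only the bugs returned by `filter_candidates` would go on to the preprocessing and similarity steps, which keeps the more expensive text analysis confined to a single component.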

After removing the noise data, we preprocess the information in the existing bug reports. Each testing team has a bug report template that guides the tester to enter the information needed for further analysis. Because the template text appears in every report, it acts as "stop words" when the data is analyzed with data mining and natural language processing algorithms. In these domains, preprocessing typically includes tokenizing, stemming, removing common "stop words", and keeping word tense consistent. In this invention, we add the bug report template to the preprocessing step and treat it as part of the "stop word" list. For example, the bug report template for the description may contain the sections "Step to reproduce", "Actual Result", "Expected Result", "Detail error information", "Screenshot Attachment", etc.
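The template-aware preprocessing could be sketched as follows. The template phrases are the ones listed above; the small stop word set and the regular-expression tokenizer are assumptions for illustration, and stemming and tense normalization are left out to keep the example short.

```python
import re

# Section headings taken from the example template in the text.
TEMPLATE_PHRASES = [
    "Step to reproduce",
    "Actual Result",
    "Expected Result",
    "Detail error information",
    "Screenshot Attachment",
]

# Tiny illustrative stop word list; a real system would use a fuller one.
COMMON_STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "on"}


def preprocess(text, template_phrases=TEMPLATE_PHRASES):
    """Strip template headings, tokenize, and drop common stop words."""
    lowered = text.lower()
    for phrase in template_phrases:
        lowered = lowered.replace(phrase.lower(), " ")
    tokens = re.findall(r"[a-z0-9]+", lowered)
    return [t for t in tokens if t not in COMMON_STOP_WORDS]
```

Removing whole template phrases before tokenizing means words such as "result" or "reproduce" are dropped only when they are part of a heading, and kept when the tester uses them in the body of the description.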

After preprocessing the bu...