String Matching Algorithm for Supplier Deduplication – an Extension
Publication Date: 2015-Nov-02
The IP.com Prior Art Database
In Spend Analysis, one of the important processes is vendor/supplier spend consolidation. The major challenge is to group all records of the same vendor/supplier and consolidate their spend when the data comes from different ERP systems and the same supplier is entered with different string representations, for example IBM, I.B.M, IBM Corp, International Business Machines, Int'l Business Machines, International Business Machines Corporation, etc. Various string searching/matching algorithms are available to compare two strings and decide whether they match, and some can also compute the percentage of match against the compared string. However, no single algorithm is perfect or can handle every variation that occurs in practice. The best approach is to combine different algorithms and to write new ones by mining the patterns present in the data. Processing time also matters for this problem: a simple, fast, pattern-focused algorithm can be developed and combined with other algorithms (such as Fuzzy Match) to group vendor/supplier data and improve the quality of vendor/supplier spend consolidation. This article describes one such pattern, in which the same supplier name is written with its words in a different order, or with characters in a different position within a word. Example: Hotel Ibis and Ibis Hotel.
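The combination described above (a fast pattern-focused step followed by a fuzzy step) can be sketched as a two-stage grouping. This is a minimal sketch under stated assumptions, not the author's implementation: the functions `pattern_key` and `group_suppliers` and the `threshold` of 0.85 are hypothetical, upper-casing is an assumption, and Python's standard-library `difflib.SequenceMatcher` stands in for "Fuzzy Match", which the document does not tie to a specific library.

```python
import re
from difflib import SequenceMatcher


def pattern_key(name: str) -> str:
    """Pattern key: keep only ASCII letters and digits, upper-case them (assumption), sort the characters."""
    cleaned = re.sub(r"[^0-9A-Za-z]", "", name).upper()
    return "".join(sorted(cleaned))


def group_suppliers(names: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Two-stage grouping sketch: exact pattern-key grouping, then a fuzzy merge of the groups."""
    # Stage 1: names that produce the same pattern key fall into the same group.
    by_key: dict[str, list[str]] = {}
    for name in names:
        by_key.setdefault(pattern_key(name), []).append(name)

    # Stage 2: merge a group into an existing cluster when its first name is
    # fuzzy-similar (difflib ratio >= threshold) to that cluster's first name.
    clusters: list[list[str]] = []
    for members in by_key.values():
        rep = members[0].upper()
        for cluster in clusters:
            if SequenceMatcher(None, rep, cluster[0].upper()).ratio() >= threshold:
                cluster.extend(members)
                break
        else:
            clusters.append(list(members))
    return clusters


if __name__ == "__main__":
    suppliers = ["Hotel Ibis", "Ibis Hotel",
                 "International Business Machines", "International Business Machine"]
    for cluster in group_suppliers(suppliers):
        print(cluster)
```

Here the two word-order variants of Hotel Ibis share a pattern key and are grouped in stage 1, while the two IBM spellings have different keys (one extra "S") and are merged only by the fuzzy stage. Running the cheap key-based stage first keeps the number of fuzzy comparisons small.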
User Scenario & Problem Statement:
• Issue Background: When an organization procures products or services from its vendors/suppliers, the vendor/supplier information is stored in its system. Vendor/supplier records are normally entered manually, which results in multiple instances of the same vendor/supplier in the database.
• The complexity increases when a company has multiple divisions, branches, or subsidiaries and different systems are used for record maintenance.
• If a procurement person wants to know the consolidated spend with a particular vendor/supplier, consolidating it becomes difficult and time-consuming.
• Different types of string matching algorithms, such as Fuzzy Matching, are available for de-duplicating and grouping similar vendor/supplier names.
• The challenge arises when a multi-word vendor/supplier name is written with its words in a different order: the names are exactly the same, but string matching algorithms such as Fuzzy Matching may fail to capture them. For example: Hotel Ibis and Ibis Hotel.
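To illustrate why word order defeats a character-sequence comparison, the sketch below scores the two names with Python's standard-library `difflib.SequenceMatcher`. Using difflib as the "Fuzzy Matching" algorithm is an assumption made for illustration; the document does not name a specific fuzzy matcher.

```python
from difflib import SequenceMatcher

# Stand-in for a character-sequence fuzzy matcher (assumption): difflib's ratio
# is 2*M/T, where M is the number of matched characters and T the total length.
first, second = "Hotel Ibis", "Ibis Hotel"
ratio = SequenceMatcher(None, first.upper(), second.upper()).ratio()
print(f"{first!r} vs {second!r}: similarity {ratio:.2f}")
```

Only the block "HOTEL" is matched in order, so the ratio is 0.50 even though both strings contain exactly the same words.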
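The concept is to remove all special characters and blank spaces, rearrange the remaining letters and digits in sorted order, and group the names that produce the same result. Because sorting discards the original positions, names that differ only in word order produce the same string. The process is simple and needs only a few lines of code.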
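The steps above can be sketched as follows, together with the spend consolidation that motivates them. This is a minimal sketch, not the author's code: the names `match_key` and `consolidate_spend` and the sample records are hypothetical, and upper-casing before sorting is an assumption because the document does not say whether matching is case-sensitive.

```python
import re
from collections import defaultdict


def match_key(name: str) -> str:
    """Remove everything except ASCII letters and digits, then sort the characters.

    Upper-casing is an assumption; the source does not say whether matching is case-sensitive.
    """
    cleaned = re.sub(r"[^0-9A-Za-z]", "", name).upper()
    return "".join(sorted(cleaned))


def consolidate_spend(records: list[tuple[str, float]]) -> dict[str, dict]:
    """Group (supplier_name, amount) records by match_key and total the spend per group."""
    groups: dict[str, dict] = defaultdict(lambda: {"names": set(), "spend": 0.0})
    for name, amount in records:
        group = groups[match_key(name)]
        group["names"].add(name)
        group["spend"] += amount
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical spend records for illustration only.
    records = [("Hotel Ibis", 120.0), ("Ibis Hotel", 80.0), ("IBIS-HOTEL", 50.0), ("IBM Corp", 10.0)]
    for key, group in consolidate_spend(records).items():
        print(key, sorted(group["names"]), group["spend"])
```

The three Ibis variants collapse into one group with a total spend of 250.0, while "IBM Corp" stays separate. Grouping on an exact key needs only one pass over the data, which fits the document's emphasis on a simple, fast step.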
This method also identifies the same vendor/supplier when characters within a word are misplaced, since moving characters inside a word does not change the sorted result.
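The same property can be checked by comparing character counts instead of sorted strings; the two tests are equivalent. The sketch below uses `collections.Counter`; the helper names `char_profile` and `same_characters` and the misspelled input "Ibsi Hotel" are hypothetical, and ignoring case is an assumption.

```python
from collections import Counter


def char_profile(name: str) -> Counter:
    """Count each letter/digit in the name, ignoring case (assumption), spaces and punctuation."""
    return Counter(ch for ch in name.upper() if ch.isalnum())


def same_characters(a: str, b: str) -> bool:
    """True when both names contain exactly the same letters and digits, in any order."""
    return char_profile(a) == char_profile(b)


if __name__ == "__main__":
    print(same_characters("Ibis Hotel", "Ibsi Hotel"))        # transposed letters in "Ibis"
    print(same_characters("Hotel Ibis", "Hotel Ibis Group"))  # extra word
```

Because only character counts are compared, unrelated names that happen to use the same letters would also be grouped, which is one reason to pair this pattern with other algorithms as the abstract suggests.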
Solution: Process Design
• A working model is designed using Acces...