
§ System and method for fuzzy search in distributed relational database to improve query performance

IP.com Disclosure Number: IPCOM000215133D
Publication Date: 2012-Feb-21
Document File: 7 page(s) / 142K

Publishing Venue

The IP.com Prior Art Database

Abstract

Many internet-scale applications hold data so massive that it can only be stored in a distributed database system. Because such a system is built on a network topology and may scale to several thousand machines, a fuzzy search takes considerable effort when the query is executed across multiple databases. This article proposes a method that optimizes execution routing by selecting several candidate databases that can together provide a large enough result set for the query, instead of executing the query in every database; once the requested number of results is reached, the search is finished. This method improves the performance of fuzzy search in distributed databases.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.


1. Background

Large internet web sites use distributed database systems, typically following a scale-out approach.

From Wikipedia: "A distributed database is a database in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers."

In current designs, however, the massive data is partitioned across many distributed MySQL databases, and a general data access layer handles the logic of selecting which database serves each read query.

Problems


1. A distributed database is based on a network topology and may scale to several thousand machines. Because machines differ in how long they have been running and in their current workload, the execution performance of routed queries can vary widely.


2. For fuzzy search, the distributed database engine actually has the flexibility to select the best-performing machines to execute the query. For example, take a query like "Select * from product where title = 'HTC' limit 1.. 10". Although the product database may be partitioned across several hundred machines, for this query we can select just one machine that has more than 10 matching records, execute the query there, and return the data. It is not necessary to execute the query on every data partition.
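The single-machine case in problem 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Partition` class, its `estimated_matches` and `avg_latency_ms` fields, and the `pick_single_partition` helper are hypothetical names, and the excerpt does not say how per-partition match counts or latencies would be obtained.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Partition:
    """Hypothetical descriptor of one data partition (not from the disclosure)."""
    name: str
    estimated_matches: int  # assumed: estimated rows matching the predicate
    avg_latency_ms: float   # assumed: recently observed query latency


def pick_single_partition(partitions: List[Partition], limit: int) -> Optional[Partition]:
    """Return the fastest partition that can satisfy the limit alone, if any."""
    enough = [p for p in partitions if p.estimated_matches >= limit]
    if not enough:
        return None  # no single partition holds enough rows
    return min(enough, key=lambda p: p.avg_latency_ms)


partitions = [
    Partition("db-01", estimated_matches=4, avg_latency_ms=12.0),
    Partition("db-02", estimated_matches=35, avg_latency_ms=80.0),
    Partition("db-03", estimated_matches=18, avg_latency_ms=25.0),
]
chosen = pick_single_partition(partitions, limit=10)
print(chosen.name if chosen else "no single partition suffices")
```

Here `db-01` is the fastest machine but holds too few matching rows, so the helper picks the fastest of the remaining partitions that can fill the limit on its own.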

Known solutions:


1. Execute the fuzzy search on each partition group, and then merge the results.

Drawback:
Performance is poor because the query is routed to many database instances and must wait for all of the data to come back before merging.

There is no intelligent module that analyzes the query against the partition metadata and the previous & current workload status of the machines in each partition group to decide which machine in which partition group should execute the query.


2. To select the database instance inside a partition group, a load balancer is normally used.

Drawback:
A load balancer and its variants select the target database based on static rules.

It doesn't monitor the historical and current workload status of the database instances in the partition group.
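For comparison, the two known solutions can be sketched roughly as follows. The in-memory `PARTITIONS` dictionary, the replica names, and both function names are hypothetical stand-ins for real database instances, and round-robin is used here only as one common example of a static load-balancing rule.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for partitioned product data.
PARTITIONS: Dict[str, List[str]] = {
    "db-01": ["HTC One", "HTC Desire"],
    "db-02": ["HTC Sensation", "HTC Wildfire", "HTC Flyer"],
    "db-03": ["HTC Evo"],
}


def scatter_gather(term: str, limit: int) -> Tuple[List[str], int]:
    """Known solution 1: query every partition, then merge and trim."""
    merged: List[str] = []
    queried = 0
    for rows in PARTITIONS.values():
        queried += 1  # every partition is touched, even after enough rows exist
        merged.extend(r for r in rows if term in r)
    return merged[:limit], queried


def round_robin(replicas: List[str]) -> Iterator[str]:
    """Known solution 2: a static rule that ignores replica workload."""
    return itertools.cycle(replicas)


rows, queried = scatter_gather("HTC", limit=2)
print(rows, queried)

picker = round_robin(["replica-a", "replica-b"])
print([next(picker) for _ in range(3)])
```

The first call returns two rows but still queries all three partitions, and the round-robin picker alternates replicas regardless of how busy each one is, which are the two drawbacks listed above.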

Core idea of this Invention:

The core idea of this invention is a system and method to improve query performance for fuzzy search in a distributed relational database.
->When the query is a fuzzy search and has an explicit size limit on the query result, there is no need to fetch data from each db in the distributed system
->Our system has a method to select the set of dbs that can return the fuzzy search result as fast as possible
->Once the result set reaches the size limit of the query, the search is finished
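A minimal Python sketch of this idea, under stated assumptions, might look like the following. The function `limited_fuzzy_search`, the latency scores in `SCORES`, and the `fake_query` callback are all hypothetical; this excerpt does not detail how the invention ranks databases or whether it queries them sequentially or in parallel, so the sequential fastest-first loop is just one possible reading.

```python
from typing import Callable, Dict, List, Tuple


def limited_fuzzy_search(
    candidates: Dict[str, float],
    run_query: Callable[[str, int], List[str]],
    limit: int,
) -> Tuple[List[str], List[str]]:
    """Query databases fastest-first and stop once `limit` rows are collected.

    `candidates` maps a database name to an assumed latency score (lower is better).
    Returns the collected rows and the list of databases actually queried.
    """
    results: List[str] = []
    visited: List[str] = []
    for db in sorted(candidates, key=candidates.get):
        remaining = limit - len(results)
        if remaining <= 0:
            break  # result set is full, so the search is finished
        visited.append(db)
        results.extend(run_query(db, remaining)[:remaining])
    return results, visited


# Hypothetical data and workload scores for illustration only.
DATA = {
    "db-01": ["HTC One", "HTC Desire"],
    "db-02": ["HTC Sensation", "HTC Wildfire", "HTC Flyer"],
    "db-03": ["HTC Evo"],
}
SCORES = {"db-01": 30.0, "db-02": 10.0, "db-03": 20.0}


def fake_query(db: str, n: int) -> List[str]:
    """Stand-in for sending a limited fuzzy query to one database."""
    return [title for title in DATA[db] if "HTC" in title][:n]


rows, visited = limited_fuzzy_search(SCORES, fake_query, limit=4)
print(rows)
print(visited)
```

With a limit of 4, the two best-scored databases already fill the result set, so `db-01` is never contacted.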

Advantages:

->Achieves much better performance for fuzzy search queries in distributed database system


Detailed Description:

Below is a diagram illustrating the system and the interaction flow about how the sy...