
Method and System for Providing a Database Storage Apparatus for Similarity Matching over Data Series using Dictionaries

IP.com Disclosure Number: IPCOM000236902D
Publication Date: 2014-May-21
Document File: 4 page(s) / 82K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method and system is disclosed for providing an efficient database storage apparatus for similarity matching over data series using dictionaries.

This is the abbreviated version, containing approximately 38% of the total text.

Increasing amounts of data capturing digital traces are created from the daily activities of users. The collected data provides an opportunity to find insights in new and emerging types of data and content, and to answer questions for purposes including, but not limited to, scientific discovery, business intelligence, and combating fraud and crime. However, the collected data is becoming too large and complex to store and process using traditional databases or data-processing algorithms. Managing and processing such amounts of data presents formidable challenges, because the data requires better-adapted data management systems that can handle large volumes of data and scale to keep up with their growth. Data processing systems therefore need a representation of the data that allows it to be stored and processed more efficiently.

Data series: a time-ordered sequence of pairs (time, v), where v is an object of a general type (e.g., entity, object, numerical value, categorical value, etc.) and time ∈ ℕ is the instant of time when value v occurred.

Time series: a particular case of a data series in which the value v is numerical.

Data series segment: given two instants of time t1, t2 ∈ ℕ with t1 < t2, a data series segment is the subset of pairs (time, v) of the data series whose time satisfies t1 <= time <= t2.
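
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming a data series is held in memory as a time-ordered list of (time, value) pairs with integer times; the names `DataSeries`, `segment` and `is_time_series` are hypothetical and do not come from the disclosure.

```python
from typing import Any, List, Tuple

# Hypothetical in-memory layout: a time-ordered list of (time, v) pairs,
# where time is a natural number and v may be of any type.
DataSeries = List[Tuple[int, Any]]


def segment(series: DataSeries, t1: int, t2: int) -> DataSeries:
    """Return the data series segment: all pairs with t1 <= time <= t2."""
    if not t1 < t2:
        raise ValueError("a data series segment requires t1 < t2")
    return [(time, v) for time, v in series if t1 <= time <= t2]


def is_time_series(series: DataSeries) -> bool:
    """A time series is a data series whose values are all numerical."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for _, v in series
    )


series = [(0, 1.5), (1, 2.0), (2, 2.5), (3, "spike"), (4, 3.0)]
print(segment(series, 1, 3))                  # [(1, 2.0), (2, 2.5), (3, 'spike')]
print(is_time_series(series))                 # False: "spike" is categorical
print(is_time_series(segment(series, 0, 2)))  # True
```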

A dictionary is a collection of elements (called "atoms"). A data series segment is represented using a dictionary by expressing the data series segment as a linear combination of the dictionary atoms.
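
To make the definition concrete, the sketch below represents a numerical segment x of length n over one possible dictionary, the orthonormal DCT-II cosine basis. The disclosure does not name a dictionary; the cosine atoms and the names `dct_atoms`, `encode` and `decode` are illustrative assumptions. Because these atoms d_0, ..., d_{n-1} are orthonormal, each coefficient is just the inner product c_k = <x, d_k>, and the linear combination x = c_0 d_0 + ... + c_{n-1} d_{n-1} reconstructs the segment exactly.

```python
import math
from typing import List

Vector = List[float]


def dct_atoms(n: int) -> List[Vector]:
    """Assumed dictionary: the n orthonormal DCT-II cosine atoms of length n."""
    atoms = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        atoms.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return atoms


def encode(segment: Vector, atoms: List[Vector]) -> Vector:
    """Coefficient c_k = <segment, d_k>; exact because the atoms are orthonormal."""
    return [sum(x * a for x, a in zip(segment, atom)) for atom in atoms]


def decode(coeffs: Vector, atoms: List[Vector]) -> Vector:
    """Rebuild the segment as the linear combination sum_k c_k * d_k.

    If fewer coefficients than atoms are given, zip stops early and the
    result is the approximation built from the first coefficients only.
    """
    length = len(atoms[0])
    return [sum(c * atom[t] for c, atom in zip(coeffs, atoms)) for t in range(length)]


atoms = dct_atoms(4)
coeffs = encode([1.0, 2.0, 3.0, 4.0], atoms)
print(round(coeffs[0], 6))                           # 5.0
print([round(v, 6) for v in decode(coeffs, atoms)])  # [1.0, 2.0, 3.0, 4.0]
```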

Disclosed is a method and system for providing an efficient database storage apparatus for similarity matching over data series using dictionaries. It accelerates database pattern-matching queries on multidimensional data series. An original data series is first segmented into data series segments along one dimension, and each segment is represented as a linear combination of dictionary atoms. This yields a compressed representation of the segment, which is stored in a database. Similarity matching of user-specified patterns is carried out by translating each pattern into a query that runs on top of the database and searches for the pattern over the stored compressed representations of the data series segments. The accuracy of the method can be adjusted by choosing how many coefficients to consider in the result (i.e., only the first k coefficients). This relies on the exponential decay property satisfied by the representation of a data series segment over a dictionary, which states that the first coefficients contribute more than the last ones.
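
The pipeline in the paragraph above can be sketched end to end with Python's standard `sqlite3` module. This is a hedged illustration under assumptions the disclosure does not state: a one-dimensional numerical series, segmentation along time into non-overlapping windows of fixed width, an orthonormal DCT-II dictionary, a hypothetical table `coeff(seg_start, idx, value)` holding the first k coefficients of each window, and squared Euclidean distance between truncated coefficient vectors as the similarity measure. All function and table names are hypothetical.

```python
import math
import sqlite3
from typing import List, Sequence, Tuple


def dct_atoms(n: int) -> List[List[float]]:
    """Assumed dictionary: orthonormal DCT-II atoms of length n."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)]
            for k in range(n)]


def first_k_coeffs(segment: Sequence[float], atoms, k: int) -> List[float]:
    """Compressed representation: project onto the first k atoms only."""
    return [sum(x * a for x, a in zip(segment, atom)) for atom in atoms[:k]]


def build_store(values: Sequence[float], width: int, k: int) -> sqlite3.Connection:
    """Segment the series along time into windows of `width` samples and
    store the first k coefficients of each window in an in-memory database."""
    atoms = dct_atoms(width)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE coeff (seg_start INTEGER, idx INTEGER, value REAL)")
    for start in range(0, len(values) - width + 1, width):
        coeffs = first_k_coeffs(values[start:start + width], atoms, k)
        db.executemany("INSERT INTO coeff VALUES (?, ?, ?)",
                       [(start, i, c) for i, c in enumerate(coeffs)])
    return db


def match(db: sqlite3.Connection, pattern: Sequence[float], k: int,
          top: int = 1) -> List[Tuple[int, float]]:
    """Translate a pattern into a SQL query over the compressed segments.

    The pattern must have the same length as the stored windows. It is
    encoded with the same atoms, and windows are ranked by the squared
    distance between their first k coefficients and the pattern's.
    """
    atoms = dct_atoms(len(pattern))
    db.execute("CREATE TEMP TABLE IF NOT EXISTS pattern_coeff (idx INTEGER, value REAL)")
    db.execute("DELETE FROM pattern_coeff")
    db.executemany("INSERT INTO pattern_coeff VALUES (?, ?)",
                   list(enumerate(first_k_coeffs(pattern, atoms, k))))
    return db.execute(
        """SELECT c.seg_start, SUM((c.value - p.value) * (c.value - p.value)) AS dist2
           FROM coeff AS c JOIN pattern_coeff AS p ON c.idx = p.idx
           GROUP BY c.seg_start
           ORDER BY dist2
           LIMIT ?""", (top,)).fetchall()


values = [0, 0, 0, 0, 1, 2, 3, 4, 4, 3, 2, 1, 0, 0, 0, 0]
db = build_store(values, width=4, k=2)
print(match(db, [1.1, 2.0, 2.9, 4.0], k=2)[0][0])  # 4: the rising window
```

Because the DCT-II atoms are orthonormal, the squared distance over all n coefficients equals the squared Euclidean distance between the raw segments, and keeping only the first k coefficients can only drop non-negative terms, so the truncated score never exceeds the true distance. Raising k tightens it: with k = 1 the rising window [1, 2, 3, 4] and the falling window [4, 3, 2, 1] share the same coefficient and tie against the pattern, while k = 2 separates them. How quickly the remaining coefficients shrink depends on the data and the dictionary; the disclosure attributes this to an exponential decay property.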

In the first step, the data series is divided into data series segmen...