Method of limiting I/O operations for spatial data in data warehouses

IP.com Disclosure Number: IPCOM000248962D
Publication Date: 2017-Jan-24
Document File: 5 page(s) / 488K

Publishing Venue

The IP.com Prior Art Database

Abstract

One of the challenges of modern data warehouses is the amount of data processed by every query. To limit resource consumption and the number of I/O operations on disks (usually the bottleneck of the whole system), Netezza, before its acquisition by IBM, came up with low level statistics called zone maps: at the extent level (a very small chunk of data) some basic statistics are kept, such as the min and max value of a particular column. If a query asks about data that has nothing in common with an extent, that extent is not read from disk at all.

This approach works well for integers. We propose a new method which optimizes zone maps for spatial data. In addition, we propose a new function which cooperates with the new low level statistics and optimizes distance-related queries. What we want to protect with this disclosure is a method for limiting read operations on spatial columns in a data warehouse environment. The method is based on: low level statistics (statistics kept at the level of small chunks of data); a set of reference points, which can be pre-defined by the system administrator; and a distance metric calculated from a real transportation map (including any kind of transport: railways, roads, air or sea transport, etc.).

This is the abbreviated version, containing approximately 44% of the total text.

What is more, the proposed method makes it easy to implement a very fast way of returning all points from the database that are within a defined range of travel time (e.g. no farther than 5 minutes of travel).

There are a few advantages of the proposed solution:
1. It allows the creation of a new kind of low level statistics dedicated to spatial data.
2. It allows a very efficient travel distance function to be implemented (efficient from the I/O operations point of view).
3. It is scalable (the number of reference points defines the accuracy of the low level statistics).

The business value of the solution comes from the performance improvement: the system is able to skip reading data chunks which are not relevant for query processing. Additional value comes from the highly efficient travel distance function.

The novel claims are:
- A novel approach to limiting read operations on spatial data by creating low level statistics for spatial data
- Introduction of a highly efficient travel distance function
- As a related embodiment: a proposal of a distance metric, based on a real transportation map, to be used for zone map calculation

Background

Low level statistics, kept at the level of small chunks of data (hereafter called extents), are known and already protected by a patent: US 6,973,452 B2

https://www.google.com/patents/US6973452?dq=US+6,973,452+B2&hl=pl&sa=X&ei=QZLfU9zuNern7AaGjYF4&ved=0CBwQ6AEwAA

In the currently implemented approach, a zone map is an internal mapping structure that shows the range (min and max) of values within each extent.
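The following is a minimal sketch of how such a min/max zone map lets a scan skip chunks of data, assuming a simplified in-memory model. The names `Extent`, `build_zone_map` and `scan_between` are hypothetical and are not taken from the Netezza implementation or from the cited patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Extent:
    """Hypothetical in-memory stand-in for one small on-disk chunk of a column."""
    values: List[int]


def build_zone_map(extents: List[Extent]) -> List[Tuple[int, int]]:
    """Keep the (min, max) of the column for every chunk."""
    return [(min(e.values), max(e.values)) for e in extents]


def scan_between(extents, zone_map, low, high):
    """Return the matching values and the number of chunks actually read."""
    matches, chunks_read = [], 0
    for extent, (zmin, zmax) in zip(extents, zone_map):
        # Skip the chunk when its value range cannot overlap [low, high].
        if zmax < low or zmin > high:
            continue
        chunks_read += 1
        matches.extend(v for v in extent.values if low <= v <= high)
    return matches, chunks_read


extents = [Extent([1, 5, 9]), Extent([20, 25, 31]), Extent([40, 44, 47])]
zone_map = build_zone_map(extents)
matches, chunks_read = scan_between(extents, zone_map, 22, 30)
print(matches, chunks_read)
```

In this toy run only the middle chunk overlaps the queried range, so the other two are never touched.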

The idea described in this disclosure is to store a slightly different kind of statistics in order to minimize the amount of data read.

Details:

The main idea is to introduce a new kind of low level statistics for spatial data, based on a set of reference points and a knowledge base of existing maps of transportation routes.

1. Low level statistics and reference points

By reference points we mean a set of points which remain unchanged for a given period of time. The points can be easily defined by latitude and longitude. They can be chosen randomly (evenly spread across the globe) or logically, e.g. the places where the stores of a particular company are located. For example: { 71°18′N; 156°46′W // Barrow, 68°58′N; 33°05′E // Murmansk }
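As an illustration of defining reference points by latitude and longitude, the sketch below converts the degree/minute notation used above into signed decimal degrees. The helper `dms_to_decimal` and the `REFERENCE_POINTS` dictionary are hypothetical names introduced here; the disclosure does not prescribe a storage format.

```python
import re


def dms_to_decimal(text: str) -> float:
    """Convert a coordinate such as '71°18′N' to signed decimal degrees."""
    match = re.fullmatch(r"\s*(\d+)°(\d+)′([NSEW])\s*", text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, minutes, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60.0
    # South and west are negative by the usual convention.
    return -value if hemisphere in "SW" else value


# (latitude, longitude) of the two reference points named in the text.
REFERENCE_POINTS = {
    "Barrow": (dms_to_decimal("71°18′N"), dms_to_decimal("156°46′W")),
    "Murmansk": (dms_to_decimal("68°58′N"), dms_to_decimal("33°05′E")),
}
```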

The idea is to keep another type of low level statistics for every extent: the maximum and minimum distance from each defined reference point. The distance metric can be chosen according to the intended use. We propose a distance based on the time required to get from one point to another according to pre-loaded transportation maps.
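A minimal sketch of these statistics follows, with travel times in minutes copied from the example chunk of data in this section. The function names and the dictionary layout are hypothetical. The skipping test in `can_skip` is an assumption of this sketch, not something stated in the disclosure: it relies on the travel time being symmetric and obeying the triangle inequality, which real transportation maps (one-way routes, for instance) may violate.

```python
from typing import Dict, Tuple

# Travel time in minutes from each stored point to each reference point,
# copied from the example in this section (13h 5min = 785, 9h 39min = 579).
# A real system would derive these from a pre-loaded transportation map.
EXAMPLE_CHUNK: Dict[str, Dict[str, int]] = {
    "Eureka": {"Barrow": 785, "Murmansk": 1500},
    "Hammerfest": {"Barrow": 2160, "Murmansk": 579},
    "Deadhorse": {"Barrow": 45, "Murmansk": 2460},
}


def chunk_statistics(chunk: Dict[str, Dict[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Return {reference point: (min, max)} travel time over all rows of the chunk."""
    stats: Dict[str, Tuple[int, int]] = {}
    for times in chunk.values():
        for ref, minutes in times.items():
            lo, hi = stats.get(ref, (minutes, minutes))
            stats[ref] = (min(lo, minutes), max(hi, minutes))
    return stats


def can_skip(stats: Dict[str, Tuple[int, int]],
             query_to_ref: Dict[str, int],
             limit: int) -> bool:
    """True when no row of the chunk can lie within `limit` minutes of the query point.

    Assumes d(query, row) >= |d(query, ref) - d(row, ref)| for every
    reference point (symmetry plus triangle inequality).
    """
    for ref, (lo, hi) in stats.items():
        d = query_to_ref[ref]
        if d - hi > limit or lo - d > limit:
            return True
    return False


stats = chunk_statistics(EXAMPLE_CHUNK)
```

Under these assumptions, a query point that is (hypothetically) 3000 minutes from Barrow cannot be within 60 minutes of any row in this chunk, because every row is at most 2160 minutes from Barrow, so the chunk would not need to be read.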

For example, let's consider one extent:

Extent 1

value in the column               71°18′N; 156°46′W // Barrow    68°58′N; 33°05′E // Murmansk
79°59′N; 85°56′W // Eureka        13h 5min                       25h
70°40′N; 23°41′E // Hammerfest    36h                            9h 39min
70°12′N; 148°31′W // Deadhorse    45min                          41h

In columns we have reference p...