
Method and Apparatus for Using SSD as Intermediary Storage in Enabling End User to Make Tradeoff in Freshness, Completeness, and Relevance of Data for Better Analysis Performance

IP.com Disclosure Number: IPCOM000239102D
Publication Date: 2014-Oct-10

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed are methods and an apparatus for using solid-state drives (SSDs) as intermediary storage between the enterprise data warehouse and the client side, enabling users to make tradeoffs between data freshness, completeness, and relevance for better analysis performance.

Enterprise big data analytics is critical in providing enterprise users with actionable analysis. However, analyzing such massive data is often too slow to be actionable in real time. Enterprises strive for rapid analysis and are impatient with long analytics response times. In extreme cases, the results of analytical queries may already be out of date by the time they are returned. Therefore, when fully accurate results are not needed, analytics that fall within a predictable confidence interval of the fully accurate value, but are available much faster than the exact analysis, may be more useful to users.
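
As a concrete illustration of this tradeoff, the sketch below (illustrative only, not part of the original disclosure; the function name and parameters are assumptions) estimates an aggregate from a small random sample and reports a normal-approximation 95% confidence interval, trading a bounded, inspectable error for a roughly proportional reduction in the data scanned.

    import random
    import statistics

    def approximate_mean(population, sample_fraction=0.01, z=1.96):
        # Estimate the mean from a random sample and return it together
        # with a normal-approximation 95% confidence interval (z = 1.96).
        n = max(2, int(len(population) * sample_fraction))
        sample = random.sample(population, n)
        mean = statistics.fmean(sample)
        stderr = statistics.stdev(sample) / n ** 0.5
        return mean, (mean - z * stderr, mean + z * stderr)

    # Scanning 1% of the rows costs roughly 1/100th of a full scan,
    # and the caller can inspect the error bound before acting on it.
    data = [random.gauss(100, 15) for _ in range(1_000_000)]
    estimate, (low, high) = approximate_mean(data)
    print(f"approx mean: {estimate:.2f}  95% CI: [{low:.2f}, {high:.2f}]")

The wider the confidence interval the user will tolerate, the smaller the sample that must be read, which is the essence of trading accuracy for response time.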

Solutions to these analytics performance issues have focused on performance tuning of the data warehouse. Data warehouses are often large and centralized, and replacements and upgrades can be expensive, risky, and labor intensive. Yet critical decisions depend on the analysis of these data sources. The common bottleneck is hard disk drive (HDD) input/output (I/O) throughput: it is well known that HDD I/O speeds cannot increase quickly enough to keep up with the growing amounts of data. Some solutions increase data warehouse performance by exploiting processor features such as single-instruction, multiple-data (SIMD) instructions and larger on-chip caches; others store all data in memory, keeping only a copy of the in-memory image on disk. While such solutions offer significant improvements, enterprise users often require increasingly drastic improvements, due not only to the growing volume of data but also to the speed at which new data becomes available.

Recent work on solid-state drives (SSDs) also attempts to further reduce this gap between data size and processing speed by using SSDs as a caching layer. These solutions still have a number of limitations, however, such as the need for an all-SSD cache to be frequently invalidated when data becomes stale. Therefore, most current approaches use SSDs as a cache (by extending the database bufferpool with SSDs) only in a shared architecture, to simplify cache invalidation. Such SSD caching techniques fail to deliver the scaling and elasticity that today's analytical systems demand. More importantly, most analytical queries tend to have a low cache-hit ratio (unlike transactional processing workloads), so the benefit of using SSDs as a cache in a shared architecture is further limited for analytical queries. Thus, new and innovative solutions are required to close the gap between growing data sizes and processing speed.
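
For context, the bufferpool-extension style of SSD caching described above can be sketched as follows. This is an illustrative approximation under assumed names and a simple invalidation policy, not the design of any particular product. The key behavior is that any update to a table discards all of its cached pages, so fast-changing data churns the cache, and scan-heavy analytical queries with low hit ratios see little benefit.

    class SsdCacheLayer:
        # Minimal sketch of an SSD tier used as a bufferpool extension
        # (hypothetical interface, for illustration only).

        def __init__(self):
            self.ssd_pages = {}          # (table, page_id) -> page bytes
            self.hits = self.misses = 0

        def read_page(self, table, page_id, fetch_from_hdd):
            key = (table, page_id)
            if key in self.ssd_pages:
                self.hits += 1           # fast SSD path
                return self.ssd_pages[key]
            self.misses += 1
            page = fetch_from_hdd(table, page_id)   # slow HDD path
            self.ssd_pages[key] = page              # populate SSD tier
            return page

        def on_table_update(self, table):
            # Staleness handling: drop every cached page of the
            # updated table, however expensive that churn is.
            stale = [k for k in self.ssd_pages if k[0] == table]
            for k in stale:
                del self.ssd_pages[k]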

This disclosure presents methods and an apparatus for using SSDs as intermediary storage between the enterprise data warehouse and the client side, enabling users to make tradeoffs between data freshness, completeness, and relevance for better analysis performance.
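
Since the full text of the disclosure is abbreviated, the stated tradeoff can only be sketched at a high level. In the illustrative sketch below, all names, thresholds, and the routing policy are assumptions rather than the disclosure's actual design: an SSD tier holds a periodically refreshed partial snapshot of the warehouse, and a query is served from it only when the user's stated freshness and completeness tolerances are met.

    import time

    class SsdIntermediaryStore:
        # Hypothetical sketch: an SSD tier holding a sampled,
        # periodically refreshed snapshot of a warehouse table.

        def __init__(self, refreshed_at, sample_fraction):
            self.refreshed_at = refreshed_at        # epoch seconds of last refresh
            self.sample_fraction = sample_fraction  # share of rows held on SSD

        def choose_source(self, max_staleness_s, min_completeness):
            # Serve from the SSD snapshot only if it satisfies the
            # user's freshness and completeness tolerances.
            fresh = (time.time() - self.refreshed_at) <= max_staleness_s
            complete = self.sample_fraction >= min_completeness
            return "ssd_snapshot" if (fresh and complete) else "warehouse"

    # A user accepting hour-old data covering >= 10% of rows gets the
    # fast SSD path; stricter requirements route to the warehouse.
    store = SsdIntermediaryStore(refreshed_at=time.time() - 600,
                                 sample_fraction=0.25)
    print(store.choose_source(max_staleness_s=3600, min_completeness=0.10))
    print(store.choose_source(max_staleness_s=60,   min_completeness=0.10))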