
Intelligent Reduction of Noise in Big Data with Report Based Filtering Technique

IP.com Disclosure Number: IPCOM000238014D
Publication Date: 2014-Jul-25
Document File: 4 page(s) / 74K

Publishing Venue

The IP.com Prior Art Database

Abstract

1. ABSTRACT

Today, Big Data innovation is running up against formidable challenges: unchecked growth in data volumes leading to storage cost overruns, the immaturity and complexity of big data platforms, and the need to get insights from all the data much faster.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.



2. SOLUTION ARCHITECTURE

Before detailing the approach, this section gives a brief overview of how a big data analytics platform stores data today and how that data is then analyzed to build an analytics dashboard, using a sample scenario.

Existing Data Indexing Approach:

Figure 1: Building Blocks of Analytics Dashboard in Big Data Platform

Figure 1 shows the process a big data analytics platform follows to index the data and generate analytics from the indexed data. Once the data is indexed, users can write pipes that analyze the data and generate charts to visualize it.

Sample Scenario for Consideration:

As a sample scenario, assume we are building an analytics dashboard to monitor key system metrics such as CPU, memory, and IO activity. Below is sample input data received from the individual systems being monitored. Because we are not sure which data is useful for analytics, each system sends all the relevant data every 3 minutes.

{
  "%user": 40, "IFACE": "", "txkB/s": 0, "datetime": 1360038600000,
  "rxpck/s": 15, "%commit": 0, "%memused": 60, "source": "vmhost2230",
  "%nice": 10, "txpck/s": 10, "ldavg-15": 0, "rxkB/s": 0, "%swpused": 0,
  "bread/s": 10, "bwrtn/s": 432,
  "processes": [
    { "%MEM": 9.3, "COMMAND": "/opt/IBM/WebSphere/AppServer/java/bin/java", "PID": 28769, "USER": "root", "%CPU": 0.3 },
    { "%MEM": 0.7, "COMMAND": "nautilus", "PID": 2555, "USER": "root", "%CPU": 0.1 },
    { "%MEM": 0, "COMMAND": "[ksoftirqd/0]", "PID": 4, "USER": "root", "%CPU": 0.1 }
  ],
  "txerr/s": 0,
  "disks": [
    { "Available": 31794234, "Used": 8223634, "Capacity": 0.88, "Mounted_on": "/boot", "Filesystem": "/dev/vda1", "1024-blocks": 99150 },
    { "Available": 44163968, "Used": 14530252, "Capacity": 0.25, "Mounted_on": "/", "Filesystem": "/dev/vda2", "1024-blocks": 61834620 },
    { "Available": 39612566, "Used": 2763454, "Capacity": 0.01, "Mounted_on": "/dev/shm", "Filesystem": "tmpfs", "1024-blocks": 1961532 }
  ],
  "%iowait": 55, "ldavg-5": 0, "rxerr/s": 0, "ldavg-1": 0, "%system": 31
}
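To make the shape of this record concrete, the following is a minimal sketch of reading one such record and pulling out the CPU fields a dashboard would chart. It uses a trimmed copy of the sample above. Treating "datetime" as epoch milliseconds is an assumption based on its magnitude, and the variable names are illustrative only.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the sample record above; most fields are left out for brevity.
record_json = """
{ "%user": 40, "%nice": 10, "%system": 31, "%iowait": 55, "%memused": 60,
  "datetime": 1360038600000, "source": "vmhost2230" }
"""

record = json.loads(record_json)

# Assumption: "datetime" is epoch time in milliseconds.
timestamp = datetime.fromtimestamp(record["datetime"] / 1000, tz=timezone.utc)

# The three CPU fields that the CPU utilization chart is built from.
cpu_facets = ["%user", "%system", "%nice"]
cpu = {name: record[name] for name in cpu_facets}

print(record["source"], timestamp.isoformat(), cpu)
```

Of the many fields each system sends, only three are needed for the CPU chart in this sketch; the rest are carried along and indexed regardless.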

Let's see how a pipe is designed to retrieve data from this sample and generate a visualization of CPU utilization over time.

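def cpu_usage(index, configuration):
    # query, interval and keys are not defined in this excerpt; presumably
    # they are derived from `configuration` in the full disclosure.
    events = search_datetimefacets(index, 'sysmonitor', query,
                                   ['%user', '%system', '%nice'], interval)
    return chart_multiarea(events, keys)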

A pipe is a user-defined entity that searches the indexed data based on facets and plots the results as a visualization. Because the pipe is the building block of the analytics dashboard, the pipe takes the final decision on...
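The platform's search_datetimefacets function is not shown in this text, so the following is only a sketch of what a datetime facet search might do, under stated assumptions: the index is modeled as a plain dict of record lists, the interval is in milliseconds, the query is an optional predicate, and each facet is averaged per time bucket. The function body is a hypothetical stand-in, not the platform's implementation. Only the first record comes from the sample scenario; the other two are made up for illustration.

```python
from collections import defaultdict


def search_datetimefacets(index, doc_type, query, facets, interval_ms):
    """Hypothetical stand-in: average each facet per time bucket."""
    buckets = defaultdict(list)
    for record in index.get(doc_type, []):
        if query is not None and not query(record):
            continue
        # Round the timestamp down to the start of its bucket.
        bucket_start = record["datetime"] // interval_ms * interval_ms
        buckets[bucket_start].append(record)

    events = []
    for bucket_start in sorted(buckets):
        event = {"datetime": bucket_start}
        for facet in facets:
            values = [r[facet] for r in buckets[bucket_start] if facet in r]
            event[facet] = sum(values) / len(values) if values else None
        events.append(event)
    return events


def cpu_usage(index, interval_ms=180_000):
    # Only these three facets are read; every other indexed field is ignored.
    facets = ["%user", "%system", "%nice"]
    return search_datetimefacets(index, "sysmonitor", None, facets, interval_ms)


index = {"sysmonitor": [
    # First record: values from the sample scenario.
    {"datetime": 1360038600000, "%user": 40, "%system": 31, "%nice": 10, "%iowait": 55},
    # Remaining records: made-up values for illustration.
    {"datetime": 1360038660000, "%user": 20, "%system": 29, "%nice": 10, "%iowait": 5},
    {"datetime": 1360038780000, "%user": 50, "%system": 30, "%nice": 0, "%iowait": 1},
]}

events = cpu_usage(index)
for event in events:
    print(event)
```

The returned events carry only the requested facets plus the bucket timestamp, which is the form a charting call such as chart_multiarea would consume.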