
A method to enable the fast meta data analysis for object storage system

IP.com Disclosure Number: IPCOM000252237D
Publication Date: 2017-Dec-29
Document File: 5 page(s) / 166K

Publishing Venue

The IP.com Prior Art Database

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 46% of the total text.


1 Background

1.1 Object metadata

An object consists of two parts: object data and object metadata. Users can usually analyze both kinds of data held in object storage. This disclosure does not address object data analysis; it focuses on object metadata analysis. Object metadata consists of key-value pairs in the format <key,value>, e.g. size=1024Kbytes, owner=userA, permission=read. Users can also set many extended attributes on each object, such as the following:

Content-Length: 4194304
Accept-Ranges: bytes
Last-Modified: Tue, 12 Apr 2016 06:55:56 GMT
Etag: 1ef53578b2507003f5e7a5ab199c22bb
X-Timestamp: 1460444155.60423
Content-Type: application/octet-stream
X-Trans-Id: txd32ce97afd3e4f0cb853c-00570cab1f
Date: Tue, 12 Apr 2016 08:00:31 GMT

The object storage system stores all these extended attributes as metadata. Object metadata is managed by the object storage, and applications retrieve it through a specific API or interface that the object storage provides. For most object storage systems the most common interface is a REST API. For example, an application can retrieve the metadata of one object with:

curl -i -I -H "X-Auth-Token: <TOKEN>" http://<BUCKET-NAME-URL>/<OBJECT-NAME>

Another kind of interface wraps the REST API in language-specific bindings, such as Java or Python.
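The following is a minimal Python sketch of the REST interface described above. It builds, but does not send, a HEAD request equivalent to the curl command. It then turns the returned "Key: value" header lines into a <key,value> metadata dictionary. The function names and the example URL are hypothetical, and the sample headers are a subset of the attributes listed above.

```python
import urllib.request


def build_head_request(object_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a HEAD request equivalent to the curl command above."""
    # HTTP header names are case-insensitive; urllib stores this one as "X-auth-token".
    return urllib.request.Request(object_url, method="HEAD", headers={"X-Auth-Token": token})


def parse_metadata_headers(raw: str) -> dict:
    """Turn 'Key: value' header lines (e.g. curl -I output) into a {key: value} dict."""
    metadata = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and the status line (e.g. "HTTP/1.1 200 OK").
        if not line or line.startswith("HTTP/"):
            continue
        # Split on the first colon only, so values such as dates keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


if __name__ == "__main__":
    sample = (
        "HTTP/1.1 200 OK\n"
        "Content-Length: 4194304\n"
        "Last-Modified: Tue, 12 Apr 2016 06:55:56 GMT\n"
        "Content-Type: application/octet-stream\n"
    )
    print(parse_metadata_headers(sample))
    req = build_head_request("http://example.invalid/bucket/object", "<TOKEN>")
    print(req.get_method(), req.full_url)
```

Sending the request with urllib.request.urlopen(req) would return a response whose headers attribute already holds these fields. The parser is only needed when the headers arrive as raw text, as they do in curl's output.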

1.2 Hadoop and object metadata analysis

Hadoop is one of the most popular distributed computing technologies, and it is designed for data analysis. Several Hadoop distributed file system adapters are already available for object storage. These adapters allow Hadoop applications to analyze the data held in object storage systems. Increasingly, however, customers also care about metadata analysis: how object data flows, how frequently objects are updated, how to manage and organize the data, and so on. Hadoop only defines an API for reading data. If a Hadoop workload needs to analyze object metadata, the user must first use the object storage API to dump the metadata into a file and then analyze that file, as shown in figure 1:

Figure 2 Object metadata analysis with Hadoop

As the figure shows, users must first export the object metadata into files before they can start the analysis workloads. This is inefficient because the metadata files have to be created before any analysis begins. When users want to analyze the metadata of many objects, the additional file creation takes a long time.
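The following Python sketch illustrates the two-step workflow above and shows where the extra cost comes from. Every metadata record is first written to an intermediate file, and only then does the analysis read it back. The JSON Lines format, the file name and the Content-Type count are illustrative assumptions, not details from this disclosure. The counting function stands in for a Hadoop analysis job.

```python
import json
import os
import tempfile
from collections import Counter


def dump_metadata_to_file(records, path):
    """Step 1: export each (name, metadata) pair as one JSON line (the extra file-creation step)."""
    with open(path, "w", encoding="utf-8") as f:
        for name, meta in records:
            f.write(json.dumps({"name": name, **meta}) + "\n")


def count_by_content_type(path):
    """Step 2: the analysis reads the exported file, not the object store itself."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts[json.loads(line).get("Content-Type", "unknown")] += 1
    return counts


if __name__ == "__main__":
    records = [
        ("a.bin", {"Content-Type": "application/octet-stream"}),
        ("b.txt", {"Content-Type": "text/plain"}),
        ("c.bin", {"Content-Type": "application/octet-stream"}),
    ]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "metadata.jsonl")
        dump_metadata_to_file(records, path)  # must finish before analysis can start
        print(count_by_content_type(path))
```

The analysis cannot begin until step 1 has written every record, so the export time grows with the number of objects whose metadata is analyzed.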

2 The proposed method

Our key idea is a new method that lets analytics applications analyze object metadata directly, without first exporting it to files, and thereby improves analysis efficiency. There are many possible ways to implement this idea. Here, we use Hadoop analysis workloads as an example to show how it can be implemented.
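The idea of analyzing metadata directly can be sketched in Python as a record reader. The reader pulls <key,value> metadata straight from the object store and feeds it to the analysis, with no intermediate file. InMemoryObjectStore, list_objects and head_object are hypothetical stand-ins for a real object storage client, such as one wrapping the Swift REST API. The "updates per day" aggregation is only an example of metadata analysis.

```python
from collections import Counter
from email.utils import parsedate_to_datetime
from typing import Dict, Iterator, Tuple


class InMemoryObjectStore:
    """Hypothetical stand-in for an object storage client."""

    def __init__(self, objects: Dict[str, Dict[str, Dict[str, str]]]):
        self._objects = objects  # bucket -> object name -> metadata

    def list_objects(self, bucket: str) -> Iterator[str]:
        return iter(self._objects[bucket])

    def head_object(self, bucket: str, name: str) -> Dict[str, str]:
        # A real client would issue an HTTP HEAD request and return the response headers.
        return dict(self._objects[bucket][name])


def metadata_records(store, bucket: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Record reader: yield (object name, metadata) pairs directly from the store."""
    for name in store.list_objects(bucket):
        yield name, store.head_object(bucket, name)


def updates_per_day(records) -> Counter:
    """Map each record to the UTC day of its Last-Modified header, then reduce by counting."""
    counts = Counter()
    for _name, meta in records:
        day = parsedate_to_datetime(meta["Last-Modified"]).date().isoformat()
        counts[day] += 1
    return counts


if __name__ == "__main__":
    store = InMemoryObjectStore({
        "bucket-a": {
            "obj1": {"Last-Modified": "Tue, 12 Apr 2016 06:55:56 GMT"},
            "obj2": {"Last-Modified": "Tue, 12 Apr 2016 08:00:31 GMT"},
            "obj3": {"Last-Modified": "Wed, 13 Apr 2016 01:02:03 GMT"},
        }
    })
    print(updates_per_day(metadata_records(store, "bucket-a")))
```

The records come lazily from the store's own metadata interface. As a result, the analysis starts with the first fetched record rather than waiting for an export file to be written. In Hadoop, a comparable role could plausibly be played by a custom InputFormat whose RecordReader emits one record per object's metadata. That mapping is an assumption, not a detail stated in this disclosure.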

[Figure: OpenStack Swift and other object storages, metadata files, steps 1, 2, 3]

Figure 3 enabl...
