
Compression of Multivariate Data

IP.com Disclosure Number: IPCOM000249797D
Publication Date: 2017-Apr-05
Document File: 4 page(s) / 184K

Publishing Venue

The IP.com Prior Art Database

Abstract

It is suggested to use a similar approach of data compression by dimension reduction using linear (e.g. PCA/ICA) or non-linear (e.g. neural network) methods for the optical spectra. There might be even more possible applications for this as other downhole tools need to transmit various other multivariate data or time-based curves.

This is the abbreviated version, containing approximately 45% of the total text.


Motivation, prior art

It is desired to transmit high-resolution multivariate data (optical absorption and fluorescence spectra) over the slow mud-pulse telemetry channel.

One existing approach for compressing similar data is given in U.S. patent 8004279, “REAL-TIME NMR DISTRIBUTION WHILE DRILLING”, which uses principal component analysis (PCA) or independent component analysis (ICA) for dimension reduction of the NMR echo train.

Patent publication US 20080036457, “NMR Echo Train Compression”, is a predecessor of this patent and claims the use of linear as well as non-linear (neural-network-based) analysis.

These patent documents cover only NMR data, and their claims require an “NMR sensing apparatus”.

Brief description

It is suggested to use a similar approach of data compression by dimension reduction, using linear (e.g. PCA/ICA) or non-linear (e.g. neural network) methods, for the optical spectra.

Further applications are possible, since other downhole tools also need to transmit various multivariate data or time-based curves.

Background

Principal Component Analysis

PCA finds a new set of orthogonal basis vectors for the data. The basis vectors are chosen such that the first one is aligned with the direction of maximum variance in the data. The next basis vector is aligned with the direction of maximum variance perpendicular to the first vector, and so on.
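The construction of the basis can be sketched with NumPy as follows. This is only an illustration: the function name `pca_basis`, the synthetic three-channel data and the mixing matrix are assumptions made for the example and are not taken from the disclosure.

```python
import numpy as np

def pca_basis(data):
    """Return (mean, basis, variances) for data of shape (samples, channels).

    Rows of `basis` are orthonormal basis vectors, ordered by decreasing variance.
    """
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are the principal directions. Singular values are returned
    # in descending order, so the first row carries the maximum variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s**2 / (data.shape[0] - 1)
    return mean, vt, variances

# Synthetic data (assumed for illustration): 200 samples, 3 channels.
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]])
data = raw @ mixing  # correlate the channels

mean, basis, variances = pca_basis(data)
scores = (data - mean) @ basis.T  # coordinates of the data in the new basis
```

The variance of each column of `scores` equals the corresponding entry of `variances`, which is why the leading coordinates carry most of the information.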

Independent component analysis (ICA) can be used as a replacement for principal component analysis. It differs in the way the new basis is created. Both methods are referred to collectively as component analysis (CA) in the remainder of this document.

Data can be converted to and from the new coordinate system losslessly, as long as all dimensions are kept.

For compression, it is usually sufficient to keep only the first n dimensions of the data in the new coordinate system. With these reduced dimensions, it is still possible to obtain a good reconstruction of the original data by back-transformation into the original space.

In this way, CA can be used for lossy compression of multidimensional data. The amount of loss can be adjusted through the choice of n.
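The trade-off between n and the reconstruction loss can be illustrated with a short PCA-based sketch. The helper names (`fit_basis`, `compress`, `decompress`) and the synthetic 16-channel training set are hypothetical and serve only as an example. In a telemetry setting, the basis and mean would presumably be known on both sides of the channel, so that only the n coefficients have to be transmitted.

```python
import numpy as np

def fit_basis(training):
    """Learn mean and PCA basis (rows ordered by decreasing variance)."""
    mean = training.mean(axis=0)
    _, _, vt = np.linalg.svd(training - mean, full_matrices=False)
    return mean, vt

def compress(x, mean, basis, n):
    """Keep only the first n coordinates in the new coordinate system."""
    return (x - mean) @ basis[:n].T

def decompress(coeffs, mean, basis):
    """Back-transform the kept coordinates into the original space."""
    n = coeffs.shape[-1]
    return coeffs @ basis[:n] + mean

# Synthetic training set (assumed): 3 hidden patterns plus small noise.
rng = np.random.default_rng(1)
channels = 16
patterns = rng.normal(size=(3, channels))
training = (rng.normal(size=(100, 3)) @ patterns
            + 0.01 * rng.normal(size=(100, channels)))
mean, basis = fit_basis(training)

def rms_error(n):
    restored = decompress(compress(training, mean, basis, n), mean, basis)
    return float(np.sqrt(np.mean((training - restored) ** 2)))

# Error for n = 1 .. 16; with all 16 dimensions the round trip is lossless
# up to floating-point precision.
errors = [rms_error(n) for n in range(1, channels + 1)]
```

In this example the error drops sharply once n reaches the number of hidden patterns and then decreases only slowly, which is the typical situation in which a small n is sufficient.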

Since CA only uses linear operations, it can only compress linear relationships. For example, a shift in the position of a peak in a spectrum cannot be explained well by linear relationships with respect to the channels. In such a case, either a high value for n must be chosen or stronger compression artifacts must be accepted.
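The effect of a shifting peak can be made visible with a small numerical experiment. The Gaussian peak shape, the 64 channels and the 99 % variance threshold are assumptions chosen for illustration only.

```python
import numpy as np

channels = np.arange(64)

def peak(center, height=1.0, width=3.0):
    """Gaussian peak sampled on the channel grid (assumed peak shape)."""
    return height * np.exp(-0.5 * ((channels - center) / width) ** 2)

def components_needed(data, fraction=0.99):
    """Smallest n whose leading PCA components explain `fraction` of the variance."""
    centered = data - data.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(explained, fraction) + 1)

# Case 1: fixed position, varying height (a linear relationship).
amplitude_set = np.array([peak(32, height=h) for h in np.linspace(0.2, 1.0, 50)])
# Case 2: fixed height, varying position (not linear in the channels).
shift_set = np.array([peak(c) for c in np.linspace(16, 48, 50)])

n_amp = components_needed(amplitude_set)
n_shift = components_needed(shift_set)
```

A single component is enough for the amplitude case, whereas the shifted peaks require a noticeably larger n for the same explained variance.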

Optical absorption spectra of fluid mixtures are essentially linear combinations of the spectra of the individual pure components, so in theory they should be compressible using CA.
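As a sketch of this idealized case, mixtures of three hypothetical pure-component spectra can be reconstructed from three coefficients per spectrum. The band shapes and the concentration ranges are invented for the example.

```python
import numpy as np

channels = np.arange(40)

def band(center, width=4.0):
    """Hypothetical absorption band of one pure component."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2)

# Three assumed pure-component spectra and 100 random mixtures of them.
pure = np.array([band(10), band(20), band(30)])
rng = np.random.default_rng(2)
concentrations = rng.uniform(0.0, 1.0, size=(100, 3))
mixtures = concentrations @ pure  # ideal linear mixing

# PCA on the mixtures, keeping as many components as there are pure substances.
mean = mixtures.mean(axis=0)
_, s, vt = np.linalg.svd(mixtures - mean, full_matrices=False)
n = 3
coeffs = (mixtures - mean) @ vt[:n].T
restored = coeffs @ vt[:n] + mean
max_error = float(np.max(np.abs(mixtures - restored)))
```

Here 40 channels are reduced to 3 coefficients with an error at the level of floating-point precision, because all singular values beyond the third are numerically zero.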

Unfortunately, formation fluids are very complex mixtures of many substances. Spectrometers have a dynamic range limit of a few optical densities, which introduces non-linear behavior when the limit of the dynamic range is reached (e.g. water in the NIR, dark oil in the visible range). Optical scattering may cause additional variation in the spectra (baseline effects).
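The influence of the dynamic range limit can be sketched by clipping ideal mixture spectra at an assumed limit of 3 optical densities. Hard clipping is a deliberate simplification of real detector behavior, and all spectra and values below are hypothetical.

```python
import numpy as np

channels = np.arange(40)

def band(center, height=3.0, width=4.0):
    """Hypothetical band, in optical density at unit concentration."""
    return height * np.exp(-0.5 * ((channels - center) / width) ** 2)

pure = np.array([band(10), band(20), band(30)])
rng = np.random.default_rng(3)
concentrations = rng.uniform(0.0, 1.5, size=(100, 3))
true_od = concentrations @ pure        # ideal linear mixing

limit = 3.0                            # assumed dynamic range limit in OD
measured = np.minimum(true_od, limit)  # simplified saturation model

def residual(data, n):
    """RMS error left after keeping the first n PCA components."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    restored = centered @ vt[:n].T @ vt[:n]
    return float(np.sqrt(np.mean((centered - restored) ** 2)))

res_linear = residual(true_od, 3)      # numerically zero
res_saturated = residual(measured, 3)  # clearly above zero
```

With saturation, three components no longer describe the data exactly, so either n has to be increased or a larger reconstruction error has to be accepted.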

Autoencoders a...