Compression of Multivariate Data
Publication Date: 2017-Apr-05
The IP.com Prior Art Database
Motivation, prior art
It is desired to transmit high-resolution multivariate data (optical absorption and fluorescence spectra)
over the slow mud-pulse telemetry channel.
One existing approach for compression of similar data is given in U.S. patent 8004279 “REAL-TIME NMR
DISTRIBUTION WHILE DRILLING”, which uses principal component analysis (PCA) or independent
component analysis (ICA) for dimension reduction of the NMR echo train.
Patent publication US 20080036457 “NMR Echo Train Compression” is a predecessor of this and claims
use of linear as well as non-linear (neural network based) analysis.
These patent documents cover only NMR data, and their claims require an “NMR sensing apparatus”.
Brief description
It is suggested to use a similar approach of data compression by dimension reduction using linear (e.g.
PCA/ICA) or non-linear (e.g. neural network) methods for the optical spectra.
The approach may have further applications, since other downhole tools also need to transmit
multivariate data or time-based curves.
Principal Component Analysis
PCA finds a new set of basis vectors for the data. The basis vectors are chosen so
that the first one is aligned with the direction of maximum variance in the data. The next basis vector is
aligned with the direction of maximum variance perpendicular to the first vector, and so on.
Independent component analysis (ICA) can be used as a replacement for principal component analysis. It
differs in the way the new basis is created. Both methods are referred to collectively as component
analysis (CA) in the remainder of this document.
Data can be converted to and from the new coordinate system losslessly as long as all dimensions are kept.
For compression it is usually enough to keep only the first n dimensions of the data in the new
coordinate system. With these reduced dimensions it is still possible to get a good reconstruction of the
original data by back-transformation into the original space.
This way it is possible to use CA for lossy compression of multidimensional data. The amount of loss can
be adjusted by choice of n.
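
The truncation step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the method of the cited patents: the synthetic data set, the function names (fit_pca, compress, reconstruct, rms_error) and the choice of a singular value decomposition to obtain the basis are all assumptions made for the example.

```python
import numpy as np

def fit_pca(X, n):
    """Return the channel-wise mean and the first n principal directions of X.

    X has one spectrum (or curve) per row and one channel per column.
    """
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n]

def compress(x, mean, components):
    """Project data onto the retained basis vectors (n numbers per spectrum)."""
    return (x - mean) @ components.T

def reconstruct(scores, mean, components):
    """Back-transform the n retained coordinates into the original space."""
    return scores @ components + mean

# Hypothetical training set: 200 curves with 64 channels, generated from
# 3 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 64))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 64))

def rms_error(n):
    """Root-mean-square reconstruction error when only n dimensions are kept."""
    mean, components = fit_pca(X, n)
    X_hat = reconstruct(compress(X, mean, components), mean, components)
    return float(np.sqrt(np.mean((X - X_hat) ** 2)))

# The error shrinks as n grows and vanishes (up to rounding) when all
# 64 dimensions are kept.
errors = {n: rms_error(n) for n in (1, 2, 3, 64)}
```

In a telemetry setting the mean and the basis vectors would presumably be determined beforehand and be known to both ends of the link, so that only the n coordinates per spectrum have to be transmitted; that split is an assumption of this sketch.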
Since CA uses only linear operations, it can compress only linear relationships. For example, a shift in the
position of a peak in a spectrum cannot be explained well by linear relationships between the
channels. In such a case, a high value for n must be chosen or stronger compression artifacts must be
accepted.
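
The peak-shift limitation can be illustrated with synthetic data. The sketch below is an assumption-laden toy example (Gaussian peaks, 100 channels, a 99.9 % explained-variance criterion, and the helper names peak and components_needed are all made up for illustration). It compares a set of spectra in which only the peak amplitude varies with a set in which only the peak position varies.

```python
import numpy as np

channels = np.linspace(0.0, 1.0, 100)

def peak(center, amplitude=1.0, width=0.05):
    """Synthetic Gaussian peak sampled on the channel axis."""
    return amplitude * np.exp(-0.5 * ((channels - center) / width) ** 2)

def components_needed(X, explained=0.999):
    """Smallest n whose first n principal components reach the given
    fraction of the total variance."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cumulative, explained) + 1)

# Case 1: fixed peak position, varying amplitude (a linear relationship).
amplitude_set = np.stack([peak(0.5, amplitude=a) for a in np.linspace(0.5, 1.5, 50)])

# Case 2: fixed amplitude, varying peak position (not linear in the channels).
shift_set = np.stack([peak(c) for c in np.linspace(0.3, 0.7, 50)])

n_amplitude = components_needed(amplitude_set)
n_shift = components_needed(shift_set)
```

The amplitude-only set is described by a single component, while the shifted set needs several components to reach the same explained variance.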
Optical absorption spectra of fluid mixtures are basically linear combinations of the spectra of individual
pure components, so in theory they should be compressible using CA.
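
A small numerical check of this argument, under idealized assumptions: if every mixture spectrum is an exact linear combination of k pure-component spectra, the mean-centered data matrix has rank at most k, so k components reconstruct it without loss. The three Gaussian bands and the random concentrations below are hypothetical stand-ins, not measured spectra.

```python
import numpy as np

rng = np.random.default_rng(1)
channels = np.linspace(0.0, 1.0, 50)

def band(center, width):
    """Synthetic absorption band of a hypothetical pure component."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2)

# Three hypothetical pure-component spectra (one per row).
pure = np.stack([band(0.25, 0.05), band(0.5, 0.08), band(0.75, 0.04)])

# 100 mixtures with random concentrations; each mixture spectrum is a
# linear combination of the pure spectra.
conc = rng.uniform(0.0, 1.0, size=(100, 3))
spectra = conc @ pure

# Count the singular values of the centered data that are not numerically zero.
centered = spectra - spectra.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
significant = int(np.sum(singular_values > 1e-10 * singular_values[0]))
```

Here significant equals the number of pure components, even though each spectrum has 50 channels.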
Unfortunately, formation fluids are very complex mixtures of many substances. Spectrometers have a
dynamic range limit of a few optical densities, which introduces non-linear behavior when the limit of
the dynamic range is reached (e.g. water in NIR, dark oil in visible range). Optical scattering may cause
additional variation in the spectra (baseline effects).
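
The effect of a limited dynamic range can be sketched by clipping ideal linear mixture spectra at an assumed maximum optical density. The two synthetic bands, the value MAX_OD = 3.0 and the helper significant_components are assumptions chosen only for illustration; scattering and baseline effects are not modeled here.

```python
import numpy as np

rng = np.random.default_rng(2)
channels = np.linspace(0.0, 1.0, 50)

def band(center, width):
    """Synthetic absorption band of a hypothetical pure component."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2)

# Two hypothetical pure components with a peak absorbance of 4 OD.
pure = 4.0 * np.stack([band(0.3, 0.06), band(0.7, 0.06)])
conc = rng.uniform(0.0, 1.0, size=(200, 2))
ideal = conc @ pure  # exact linear mixtures

MAX_OD = 3.0  # assumed dynamic range limit of the spectrometer
measured = np.minimum(ideal, MAX_OD)  # readings saturate at the limit

def significant_components(X, rel_tol=1e-6):
    """Number of singular values of the centered data above a relative tolerance."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

n_ideal = significant_components(ideal)        # equals the number of pure components
n_measured = significant_components(measured)  # larger, because clipping is non-linear
```

Saturation therefore raises the number of components that must be kept for the same reconstruction quality.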