A method uses dynamic anonymization to protect customer sensitive data during analytics

IP.com Disclosure Number: IPCOM000249015D
Publication Date: 2017-Jan-26
Document File: 6 page(s) / 145K

Publishing Venue

The IP.com Prior Art Database

Abstract

This disclosure describes a way to run dynamic anonymization before running analytic algorithms, in order to protect customers' analytic results that might contain sensitive data. The user decides which fields are sensitive; dynamic anonymization then applies an anonymization algorithm chosen according to the field type of each selected field and the analytic algorithm to be run. The method does not change the original analytic algorithm itself; it only adds a step before the analytic algorithm runs.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Many products include analytic algorithms that help customers generate analytic results and support their decision making. The final analytic results are shown in tables, visualization diagrams, and so on. Customers can share these outputs with other users by email or social networks.

If a customer's data contains sensitive or private data that the customer does not want other people to see, how can that sensitive data be protected in the analytic results? This invention introduces a method that uses dynamic anonymization to protect customers' sensitive data without breaking the analysis results.

The analysis results are not broken because most statistical and data analytic methods are invariant to certain kinds of transformations of the dataset, a property that supports their robustness. For example, Analysis of Variance (ANOVA) is invariant to affine transformations, i.e., changes of location and scale in the univariate case. If the user decides to protect a continuous field that is analyzed with ANOVA, an affine transformation with anonymous location and (non-zero) scale parameters can be applied to this field. With the anonymized dataset, analysis results such as the P-value and R-square remain the same as with the original dataset. For other analytic algorithms, the function used for anonymization depends on the invariance properties of that algorithm.
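The ANOVA invariance described above can be checked with a small sketch. This is an illustration under stated assumptions, not the disclosure's implementation: the data, the field name `salaries`, and the helper names `one_way_anova` and `affine_anonymize` are hypothetical. The sketch computes the F statistic and R-square by hand; since the P-value is a function of F and the degrees of freedom only, an unchanged F implies an unchanged P-value.

```python
from statistics import mean


def one_way_anova(values, groups):
    """Return (F statistic, R-square) for a one-way ANOVA."""
    grand_mean = mean(values)
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)

    # Between-group and within-group sums of squares.
    ss_between = sum(
        len(vs) * (mean(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    ss_within = sum(
        (v - mean(vs)) ** 2 for vs in by_group.values() for v in vs
    )

    df_between = len(by_group) - 1
    df_within = len(values) - len(by_group)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_square = ss_between / (ss_between + ss_within)
    return f_stat, r_square


def affine_anonymize(values, scale, location):
    """Apply y = scale * x + location; the scale must be non-zero."""
    if scale == 0:
        raise ValueError("scale must be non-zero")
    return [scale * v + location for v in values]


# Hypothetical sensitive field (e.g. salaries) measured in three groups.
groups = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
salaries = [52.0, 55.5, 49.0, 53.2, 61.0, 63.4, 58.9, 60.1, 70.3, 68.8, 72.5, 69.0]

# Anonymize with location and scale parameters that are kept secret.
masked = affine_anonymize(salaries, scale=-3.7, location=1250.0)

f_orig, r2_orig = one_way_anova(salaries, groups)
f_mask, r2_mask = one_way_anova(masked, groups)

print("original:  ", f_orig, r2_orig)
print("anonymized:", f_mask, r2_mask)
```

Both sums of squares are multiplied by the square of the scale and the location cancels out of every deviation from a mean, so the two printed lines agree up to floating-point rounding even though the shared values no longer reveal the original field.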

Figure 1. Brief process for running dynamic anonymization

The core ideas of this invention:

1. The threshold of the anonymization algorithm is derived from the user token; different user sessions have different thresholds even when the same anonymization algorithm is selected.

2. Because analytics comprises different analytic algorithms, the anonymization algorithm should be chosen dynamically based on the field type, so that the analytic result is not broken.
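The second idea, choosing the anonymization function dynamically, could be sketched as a lookup keyed by analytic algorithm and field type. Everything named here is an assumption for illustration: the registry `ANONYMIZERS`, the keys `"anova"`, `"continuous"` and `"categorical"`, and the integer `seed` standing in for the session threshold. The disclosure does not list its actual algorithm-to-transformation mapping; the two entries below rely only on the facts that ANOVA is invariant to an affine change of a continuous field and to a one-to-one relabeling of group levels.

```python
import random


def affine_transform(values, seed):
    """Continuous field: y = scale * x + location with secret parameters."""
    rng = random.Random(seed)  # not cryptographically secure; illustration only
    scale = rng.uniform(0.5, 5.0)  # strictly positive, hence non-zero
    location = rng.uniform(-1000.0, 1000.0)
    return [scale * v + location for v in values]


def relabel_levels(values, seed):
    """Categorical field: replace each level with an opaque code, one-to-one."""
    rng = random.Random(seed)
    levels = sorted(set(values))
    codes = [f"level_{i}" for i in range(len(levels))]
    rng.shuffle(codes)
    mapping = dict(zip(levels, codes))
    return [mapping[v] for v in values]


# Hypothetical registry: (analytic algorithm, field type) -> anonymizer.
ANONYMIZERS = {
    ("anova", "continuous"): affine_transform,
    ("anova", "categorical"): relabel_levels,
}


def anonymize_field(values, algorithm, field_type, seed):
    """Pick the anonymizer that the given analytic algorithm is invariant to."""
    try:
        anonymizer = ANONYMIZERS[(algorithm, field_type)]
    except KeyError:
        raise ValueError(
            f"no invariant transformation registered for {algorithm}/{field_type}"
        ) from None
    return anonymizer(values, seed)


print(anonymize_field(["north", "south", "north", "east"], "anova", "categorical", seed=7))
print(anonymize_field([1.0, 2.0, 4.0], "anova", "continuous", seed=7))
```

Refusing unknown combinations, rather than falling back to a default transformation, keeps the sketch from silently applying a function the analytic algorithm is not invariant to.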

The advantage of this invention is that it lets customers decide flexibly when to share sensitive data, and they do not need to change the characteristics of the data themselves to ensure a correct analytic result.

When user A logs on, the user receives a token. When the user uploads a data source file, the system uses the user's token to generate a threshold. This threshold is used only by that user. If user A logs on to the system in another browser, and therefore receives another token, a new threshold is generated. Figure 2 shows how the threshold is generated. The threshold is not limited to being generated from the user token; it could also be generated in other ways, but the...
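Since Figure 2 is not reproduced in this abbreviated text, the following is only a hedged sketch of one way a per-session threshold could be derived from a token; it is not the scheme in the figure. The function name `derive_threshold`, the server-side `secret`, the per-field salt, and the interpretation of the threshold as a (scale, location) pair for an affine transformation are all assumptions.

```python
import hashlib
import hmac
from typing import Tuple


def derive_threshold(session_token: str, field_name: str, secret: bytes) -> Tuple[float, float]:
    """Derive a deterministic (scale, location) pair from a session token.

    Assumption: the "threshold" is treated as the parameters of an affine
    transformation. The same token and field always give the same pair;
    a different token gives a different pair.
    """
    message = f"{session_token}:{field_name}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()

    # Map two 8-byte slices of the digest to fractions in [0, 1].
    a = int.from_bytes(digest[:8], "big") / 2**64
    b = int.from_bytes(digest[8:16], "big") / 2**64

    scale = 0.5 + 4.5 * a          # in [0.5, 5.0], never zero
    location = -1000.0 + 2000.0 * b  # in [-1000, 1000]
    return scale, location


# Hypothetical tokens for the same user in two browsers.
secret = b"server-side-secret"
first_session = derive_threshold("token-from-browser-1", "salary", secret)
second_session = derive_threshold("token-from-browser-2", "salary", secret)

print(first_session)
print(second_session)
```

Keying the derivation with a server-side secret means that someone who sees a token alone cannot recompute the parameters and undo the anonymization.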