Mechanism for Privacy Protection when using third party quality checking on textual data

IP.com Disclosure Number: IPCOM000029216D
Original Publication Date: 2004-Jun-18
Included in the Prior Art Database: 2004-Jun-18
Document File: 6 page(s) / 81K

Publishing Venue

IBM

Abstract

When quality checks on textual data are performed by a third party, this may conflict with a restriction against disclosing the textual data to any third party; such a restriction may exist for reasons of privacy, confidentiality, or intellectual property. For example, when a spell checking service is implemented as a web service, transmitting the candidate text to the web site that performs the actual spell check may be considered a risk, because the owner of the textual data cannot control the security policies followed by the third party. In the following, a method is disclosed that preserves a certain level of privacy while a non-trusted party performs a quality check, for example a spell checking operation.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 26% of the total text.

Mechanism for Privacy Protection when using third party quality checking on textual data

Problem

When quality checks on textual data are performed by a third party, this may conflict with a requirement not to disclose the textual data to any third party; such a requirement may exist for reasons of privacy, confidentiality, or intellectual property.

For example, when a spell checking service is implemented as a web service, transmitting the candidate text to the web site that performs the actual spell check may be considered a risk, because the owner of the textual data cannot control the security policies followed by the third party.

Current solutions to this problem are:

Do all checking locally (no outsourcing to external service providers). Disadvantage: software and data for the quality checks must be maintained locally, which is not desired in thin-client scenarios.

Fine-granular calls of web services (e.g. one call per token to be checked). Disadvantage: huge overhead; moreover, this solves the privacy problem only if different calls are routed to different web service providers.

Maintain non-disclosure agreements between the owner of the textual data and the involved service providers. Disadvantage: big overhead; abuse is prohibited in this case but still possible.

Solution

To avoid full disclosure of the textual data to be checked, do the following:

1. Do reversible modifications to the textual data.
2. Optionally, do irreversible modifications to the textual data.
3. Submit the data to the service provider and indicate the types of checks desired.
4. Service provider performs the desired checks.
5. Retrieve the results from the service provider.
6. If necessary, do the inverse operations of the operations done in step 1 to transform the results into readable form again.

The advantage compared to prior art is that the text is not fully disclosed to the service provider: depending on the operations done in steps 1 and 2, illegal use of the text data by parties other than the data owner is inhibited or even made impossible, while quality checking by the service provider remains possible.
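The six steps can be sketched as a small pipeline. This is only an illustrative sketch, not the disclosed implementation: the names `check_with_privacy`, `modify`, `restore`, and `remote_check` are hypothetical, the "service provider" is a local stand-in function, and the reversible modification used here (reversing the token order) is a deliberately trivial placeholder that offers no real protection.

```python
from typing import Callable, List, Tuple

Tokens = List[str]
Flags = List[bool]  # one flag per submitted token: True = token looks wrong


def check_with_privacy(
    tokens: Tokens,
    modify: Callable[[Tokens], Tuple[Tokens, object]],
    restore: Callable[[Flags, object], Flags],
    remote_check: Callable[[Tokens], Flags],
) -> Flags:
    # Step 1: reversible modification; `state` is whatever step 6 needs later.
    modified, state = modify(tokens)
    # Step 2 (optional irreversible modification) is omitted in this sketch.
    # Steps 3-5: submit the modified data, let the provider check it,
    # and retrieve one result per submitted token.
    flags = remote_check(modified)
    # Step 6: map the results back onto the original token order.
    return restore(flags, state)


# Hypothetical stand-ins, for illustration only.
def reverse_order(tokens: Tokens) -> Tuple[Tokens, object]:
    return tokens[::-1], None


def undo_reverse(flags: Flags, _state: object) -> Flags:
    return flags[::-1]


KNOWN_WORDS = {"the", "quick", "brown", "fox"}


def fake_spell_checker(tokens: Tokens) -> Flags:
    # Stand-in for the third-party service: flags unknown words.
    return [t.lower() not in KNOWN_WORDS for t in tokens]


result = check_with_privacy(
    ["The", "quikc", "brown", "fox"], reverse_order, undo_reverse, fake_spell_checker
)
print(result)  # [False, True, False, False]
```

Passing the modification and its inverse as parameters keeps the pipeline independent of which reversible modification is chosen, so one or more of the modifications discussed under "Details" could be substituted.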

Details

    In this section, the 6 steps mentioned in section 2 ("Solution") are discussed in more detail.

1. Do reversible modifications to the textual data

    Using the original textual data T as input, create a modified version of the textual data T' as output. If needed, T can be reconstructed from T' again later on.

    Here are examples of reversible modifications to the textual data, of which one or more can be used:
1.1) Examples of reversible modifications that leave the textual elements (e.g. words, sentences) unchanged

1.1.1: Permuted order. Copy the input T to output T', but permute the order of the elements. If the inverse operation of this modification is to be done in step 6, record the permutation scheme and, where applicable, the seed that determined the permutation.
1.1.2: Added elements. Copy the input T to output T', but randomly insert elements from other...
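The permuted-order modification of 1.1.1 might look like the following minimal sketch. It assumes the elements are tokens held in a list and that the permutation is derived from a seed kept by the data owner; the function names `permutation_for`, `permute`, and `unpermute` are hypothetical. Python's `random` module is used only for reproducibility and is not cryptographically secure, so a real deployment would likely need a stronger source of permutations.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def permutation_for(n: int, seed: int) -> List[int]:
    # Rebuild the same index order from the recorded seed.
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def permute(elements: Sequence[T], seed: int) -> List[T]:
    # Step 1: output position `pos` receives input element `order[pos]`.
    order = permutation_for(len(elements), seed)
    return [elements[i] for i in order]


def unpermute(permuted: Sequence[T], seed: int) -> List[T]:
    # Step 6: put each element back at its original index.
    order = permutation_for(len(permuted), seed)
    original: list = [None] * len(permuted)
    for pos, i in enumerate(order):
        original[i] = permuted[pos]
    return original


words = "please check this confidential sentence".split()
sent = permute(words, seed=2004)  # what the service provider would see
assert unpermute(sent, seed=2004) == words
```

Because `unpermute` works on any sequence of the same length, it can also be applied to per-element results returned by the service provider, so that each result lines up with the original element order.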