System and Method for Auto-Assessment of Security Designation of Document
Publication Date: 2014-Apr-16
The IP.com Prior Art Database
Disclosed is a method for automatically clustering documents and performing automated document classification and re-classification for the purpose of effective role-based data authorization.
Keeping the security designation of a document current is a common challenge across organizations with respect to information management.
Assume that any document within a company can have four possible roles:
Public: the information exists on the Internet
Internal: anyone within the company can see the document, but it is not released outside of the company
Confidential: access to the document is limited to specific departments or teams
Restricted: access to the document is limited to one or more named individuals
A document may receive a Restricted designation when it is written. The challenge is that the document's role can change over time. A document may originally contain top-secret information; however, after a period, the company's actions or other released information can cause that information to become known to the public. The original document's role must then be downgraded to Confidential, Internal, or Public, depending on how much of the information has become known.
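The four roles and the downgrade path described above can be modeled as an ordered enumeration. The following is a minimal sketch, not part of the disclosed method: the names Role, is_downgrade, and allowed_downgrades are hypothetical, and the numeric ordering (Public lowest, Restricted highest) is an assumption inferred from the downgrade order given in the text.

```python
from enum import IntEnum


class Role(IntEnum):
    """Document roles, ordered from least to most restrictive (assumed ordering)."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def is_downgrade(current: Role, proposed: Role) -> bool:
    """Return True if moving from `current` to `proposed` relaxes access."""
    return proposed < current


def allowed_downgrades(current: Role) -> list[Role]:
    """List every role that is strictly less restrictive than `current`."""
    return [role for role in Role if role < current]


if __name__ == "__main__":
    # A Restricted document may be downgraded to any of the three lower roles.
    print([role.name for role in allowed_downgrades(Role.RESTRICTED)])
```

Using an integer-backed enumeration keeps role comparisons explicit, so a re-classification step can check that a proposed change really is a downgrade before applying it.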
Most organizations have millions of documents and no process for managing roles after a document is written, especially if the original owner is no longer responsible for the document and no new owner has been assigned. The sheer number of documents marked Restricted or Confidential means that most enterprise search indexes will not make the content available to someone in the company searching for this information. At this point, however, the information in question has (over time) become more public in nature and should be made accessible to search.
This disclosure discusses a method for automatically clustering documents and performing automated document classification and re-classification for the purpose of effective role-based data authorization.
The first step is to apply known information extraction techniques to the document. Using triple extraction, the method extracts triples (or facts) from the document. For example, the company may have a highly confidential oil drilling report that describes a well and a reservoir, along with values associated with them.
Extracted Triples (facts):
alpha is-a reservoir
beta is-a basin
gamma is-a region
alpha located-in beta
beta located-in region
alpha has-porosity 20%
alpha has-facies limestone
alpha has-environment shale
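The extracted facts listed above can be held in a simple subject-predicate-object structure. The sketch below only illustrates that representation; it does not perform information extraction on free text, and the names Triple, parse_triple, and facts_about are hypothetical. It assumes each fact is written as three whitespace-separated fields, as in the list above.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted fact: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def parse_triple(line: str) -> Triple:
    """Split a 'subject predicate object' line into a Triple.

    Only the first two spaces are split on, so the object may contain spaces.
    """
    subject, predicate, obj = line.split(maxsplit=2)
    return Triple(subject, predicate, obj)


# The example facts from the drilling report, copied from the text.
RAW_TRIPLES = [
    "alpha is-a reservoir",
    "beta is-a basin",
    "gamma is-a region",
    "alpha located-in beta",
    "beta located-in region",
    "alpha has-porosity 20%",
    "alpha has-facies limestone",
    "alpha has-environment shale",
]

triples = [parse_triple(line) for line in RAW_TRIPLES]


def facts_about(subject: str, facts: list[Triple]) -> list[Triple]:
    """Return every triple whose subject matches `subject`."""
    return [fact for fact in facts if fact.subject == subject]


if __name__ == "__main__":
    for fact in facts_about("alpha", triples):
        print(fact.predicate, fact.obj)
```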
These triples are added to a knowledge graph with provenance: the document from which each triple was extracted, along with that document's role (here, Restricted).
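One way to attach provenance to each triple is to key a mapping by the triple and store the source document and its role alongside it. This is a minimal sketch under stated assumptions: the KnowledgeGraph and Provenance classes are hypothetical and are not the data model of the disclosure, the graph is an in-memory dictionary rather than a real triple store, and aPostDrillDoc is the document name used in this disclosure's example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Where a fact came from and how that source is designated."""
    source_document: str
    role: str


class KnowledgeGraph:
    """Toy in-memory graph mapping each triple to its provenance records."""

    def __init__(self) -> None:
        self._facts: dict[tuple[str, str, str], set[Provenance]] = defaultdict(set)

    def add(self, triple: tuple[str, str, str], source_document: str, role: str) -> None:
        """Record that `triple` was extracted from `source_document`."""
        self._facts[triple].add(Provenance(source_document, role))

    def provenance(self, triple: tuple[str, str, str]) -> set[Provenance]:
        """Return a copy of the provenance records for `triple` (empty if unknown)."""
        return set(self._facts.get(triple, set()))


graph = KnowledgeGraph()
for fact in [("alpha", "is-a", "reservoir"), ("beta", "is-a", "basin")]:
    # Both facts come from the same Restricted drilling report.
    graph.add(fact, source_document="aPostDrillDoc", role="Restricted")

if __name__ == "__main__":
    print(graph.provenance(("alpha", "is-a", "reservoir")))
```

Storing a set of provenance records per triple lets the same fact be linked to several documents with different roles.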
The extracted triples now look like this:
(alpha is-a reservoir) hasPrimarySource aPostDrillDoc
(beta is-a basin) hasPrimarySour...