2016-Jun-20
Disclosed is a method to manage a corpus based on experimental protocols and to ensure that only valid documents are used in the corpus.

Protocol Revision-Based Corpus Management

Business and academic researchers create thousands of articles every year. Some are published, some are rejected by peer review, and some remain in the file drawer. A large number of articles are not published because of unconfirmed hypotheses, uninteresting outcomes, or rejection in peer review.

The large number of unpublished articles relative to published ones has resulted in potential publication bias, which may cause a corpus to be biased towards abnormal rather than normal outcomes and conclusions. Studies have also found inconsistencies in experiments (e.g., modifications to sample sizes). For instance, flipping a coin 10 times and obtaining one head and nine tails could become a new paper on efficient coin flipping, whereas for a fair coin the probability of heads on each flip is 0.5.
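To make the coin example concrete, the chance that a fair coin lands heads at most once in 10 flips can be computed from the binomial distribution. The sketch below is illustrative only; the function name `prob_at_most_k_heads` and the scenario of 100 independent teams repeating the experiment are assumptions, not part of this disclosure.

```python
from math import comb

def prob_at_most_k_heads(n: int, k: int, p: float = 0.5) -> float:
    """Binomial probability of observing k or fewer heads in n independent flips."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Zero or one head in ten flips of a fair coin: (C(10,0) + C(10,1)) / 2**10 = 11/1024
p_extreme = prob_at_most_k_heads(10, 1)

# Hypothetical: 100 independent teams each run the same 10-flip experiment.
teams = 100
p_any_team = 1 - (1 - p_extreme) ** teams

print(f"P(at most 1 head in 10 flips) = {p_extreme:.4f}")                  # 0.0107
print(f"P(at least one of {teams} teams sees it) = {p_any_team:.2f}")      # 0.66
```

An outcome that occurs about 1% of the time is unlikely in a single experiment, but across 100 repetitions it is about two-thirds likely to appear at least once. If only that surprising run is written up and published, the corpus over-represents an abnormal result.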

A method is needed to avoid inconsistencies in experiments and to monitor changing experiment protocols.

The novel contribution is a method to manage a corpus based on experimental protocols by:

1. Retrieving the revisions for a document

2. Extracting the protocol from the document

3. Monitoring the document for changes

4. Detecting a change to the protocol for a document

5. Modifying the document status in the corpus

The status is one of deletion, confirmation, mark for review, etc. This method can also be used to predict whether a user's documents are likely to need detailed analysis before being put into the corpus (e.g., when the researcher has low confidence with regard to the experimental protocol). The protocol may be a composite of the markers indicating that a protocol is in use.
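The five steps and the resulting status can be sketched as a small pipeline. This is a minimal sketch under several assumptions: the monitoring of step 3 has already collected the document's revisions as plain text in chronological order, a protocol is modeled as the set of markers found in one revision (the "composite of markers" above), and the decision policy is hypothetical: any protocol change between revisions yields "mark for review", otherwise "confirmation". The names `Status`, `decide_status`, and `numeric_tokens` are illustrative, not part of the disclosure.

```python
from enum import Enum
from typing import Callable, FrozenSet, List, Sequence

class Status(Enum):
    CONFIRM = "confirmation"
    MARK_FOR_REVIEW = "mark for review"
    DELETE = "deletion"  # listed in the disclosure; not produced by this simplified policy

# A protocol is modeled as the composite (set) of markers found in one revision.
Protocol = FrozenSet[str]
MarkerExtractor = Callable[[str], Protocol]

def extract_protocols(revisions: Sequence[str], extract: MarkerExtractor) -> List[Protocol]:
    """Step 2: extract the protocol from each revision retrieved in step 1."""
    return [extract(text) for text in revisions]

def protocol_changed(protocols: Sequence[Protocol]) -> bool:
    """Step 4: detect a protocol change between consecutive revisions."""
    return any(prev != curr for prev, curr in zip(protocols, protocols[1:]))

def decide_status(revisions: Sequence[str], extract: MarkerExtractor) -> Status:
    """Step 5: choose the document's corpus status (hypothetical policy)."""
    protocols = extract_protocols(revisions, extract)
    return Status.MARK_FOR_REVIEW if protocol_changed(protocols) else Status.CONFIRM

def numeric_tokens(text: str) -> Protocol:
    """Hypothetical marker extractor: whitespace-separated tokens that are numbers."""
    return frozenset(tok for tok in text.split() if tok.isdigit())

print(decide_status(["sampled 100 subjects"], numeric_tokens))  # Status.CONFIRM
print(decide_status(["sampled 100 subjects", "sampled 1000 subjects"], numeric_tokens))  # Status.MARK_FOR_REVIEW
```

Passing the marker extractor in as a function keeps the pipeline independent of how markers are recognized, so a richer extractor can replace `numeric_tokens` without changing the status logic.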

Example Embodiment

1. User A authors a document, "Elementary Expertise in ESP" - Doc 1

2. A segment of the document includes a reference to "out of 100"

3. System detects that a new document is to be loaded

4. System extracts the protocol markers: "out of 100" - Doc 1 - Version 1

5. User A does not like the original document's result

6. User A performs the experiment again with 1,000 individuals

7. User A updates the original document to reference "out of 1000"

8. System detects the new version of the document - Doc 1

9. System extracts the protocol markers: "out of 1000" - Doc 1 - Version 2

10. System determines the change to the document and its protocol markers

11. System marks User A's Doc 1 as requiring additional review and temporarily removes the document and its versions from the corpus under analysis
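The example embodiment can be traced with a small version tracker. This sketch assumes that the only protocol marker is a sample-size phrase of the form "out of N", matched with a regular expression, and it represents corpus status as a plain string; `OUT_OF`, `sample_size_markers`, `TrackedDocument`, and the status strings are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical marker pattern: sample-size phrases such as "out of 100" or "out of 1,000".
OUT_OF = re.compile(r"\bout of (\d+(?:,\d{3})*)\b", re.IGNORECASE)

def sample_size_markers(text: str) -> Set[int]:
    """Extract the sample sizes N mentioned as 'out of N' (thousands commas allowed)."""
    return {int(n.replace(",", "")) for n in OUT_OF.findall(text)}

@dataclass
class TrackedDocument:
    doc_id: str
    versions: List[Set[int]] = field(default_factory=list)
    status: str = "in corpus"

    def add_version(self, text: str) -> None:
        """Record a new version; flag the document if its protocol markers changed."""
        markers = sample_size_markers(text)
        if self.versions and markers != self.versions[-1]:
            # Protocol changed: require review and pull the document from the corpus.
            self.status = "requires review (removed from corpus)"
        self.versions.append(markers)

doc = TrackedDocument("Doc 1")
doc.add_version("Result reported out of 100 individuals.")   # Version 1
doc.add_version("Result reported out of 1000 individuals.")  # Version 2
print(doc.versions)  # [{100}, {1000}]
print(doc.status)    # requires review (removed from corpus)
```

A real system would likely track several marker types (sample sizes, control groups, significance thresholds) and compare them together as one composite protocol; a single regular expression is used here only to keep the walkthrough readable.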

Figure: At a high level, the system acquires source information, analyzes, and manages the corpus, which is used by cognitive services.


To retrieve the revisions for a document:

1. System retrieves the documents that are in, or are candidates for ingestion into, the corpus

2. System retrieves any corresponding revisions of the document (e.g., revision history, file versions) or monitors a specific site or conversation around the evolution of the document

3. System may:

A. Monitor the drafting of a document

B. Intercept the peer reviewed document

C. Extract the protocol fo...