System for Automatically Expanding a File or Email Folder into Subfolders

IP.com Disclosure Number: IPCOM000239821D
Publication Date: 2014-Dec-03
Document File: 2 page(s) / 33K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a system for automatically splitting a file folder into sub-folders upon a user's request. The system runs the documents in the folder through a topic extraction algorithm and then proposes sub-folders that correspond to the most prominent topics.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

A user who has amassed a large number of files (or emails) in a given folder often wants to manage them by splitting the original folder into two or more sub-folders. Doing this splitting by hand is often time-consuming. In addition, a user might create the new sub-folders, but the old files are often kept in the original folder and only new files end up in the sub-folders.

The solution uses Latent Dirichlet Allocation to identify the prominent topics associated with the documents in the single large existing folder, and then proposes these topics as sub-folder names. Once the user accepts the proposal, the documents are distributed among the sub-folders using the pre-identified topics.
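The distribution step can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the topic analysis has already produced one weight per topic for each document, and that each document goes to the sub-folder of its highest-weighted topic. The function name, file names, topic names, and weights are all hypothetical.

```python
from collections import defaultdict


def distribute_by_topic(doc_topic_weights, topic_names):
    """Group document ids under the name of their highest-weighted topic.

    doc_topic_weights: dict mapping a document id to a list of topic weights
                       (assumed to come from a prior topic-extraction run).
    topic_names:       list of proposed sub-folder names, one per topic.
    """
    folders = defaultdict(list)
    for doc_id, weights in doc_topic_weights.items():
        # Index of the topic with the largest weight for this document.
        dominant = max(range(len(weights)), key=lambda i: weights[i])
        folders[topic_names[dominant]].append(doc_id)
    return dict(folders)


# Hypothetical weights, standing in for the output of a topic model.
example_weights = {
    "budget.txt": [0.80, 0.15, 0.05],
    "itinerary.txt": [0.10, 0.85, 0.05],
    "invoice.txt": [0.70, 0.10, 0.20],
}
example_names = ["finance", "travel", "misc"]

proposed = distribute_by_topic(example_weights, example_names)
```

In this sketch a topic that is never dominant for any document produces no sub-folder; whether the disclosed system behaves the same way is not stated.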

The novel contribution is a system for automatically splitting a folder into sub-folders upon a user's request. The system runs the documents in the folder through a topic extraction algorithm (i.e., Latent Dirichlet Allocation (LDA)) and then proposes sub-folders that correspond to the most prominent topics. The system preemptively proposes a split into sub-folders based on a sharp division of documents along topic boundaries, where "sharpness" is determined using the following steps:

1. Documents are assigned coordinates based on the prevalent topics found in the initial analysis, and then normalized so that all document vectors have norm 1

2. A threshold value lambda is chosen (reasonable values of lambda being in the range 2 <= lambda <= 5, though other values are possible)

3. If the largest coordinate value of a document is in slot i, with weight w_i, and all other coordinate values w_j are such that w_i >= lambda w_j, then the system identifies a sharp division of documents along topic boundaries and it is appropriate...
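The per-document check in steps 1 to 3 can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not say which norm is used, so the Euclidean norm is assumed here, and the function names and the default threshold of 3.0 are hypothetical. How the per-document results combine into a folder-level decision is not specified in the text above, so the sketch stops at the single-document test.

```python
import math


def normalize(weights):
    """Scale a topic-weight vector to Euclidean norm 1 (assumed norm)."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        return list(weights)  # all-zero vector: nothing to scale
    return [w / norm for w in weights]


def is_sharp(weights, lam=3.0):
    """Return True if the largest weight w_i satisfies w_i >= lam * w_j
    for every other coordinate j of the normalized vector."""
    unit = normalize(weights)
    if not unit or max(unit) <= 0:
        return False  # no dominant topic in an empty or all-zero vector
    i = max(range(len(unit)), key=lambda k: unit[k])
    return all(unit[i] >= lam * w for j, w in enumerate(unit) if j != i)


# Hypothetical topic-weight vectors for two documents.
clearly_one_topic = is_sharp([0.8, 0.1, 0.1], lam=3.0)  # 0.8 >= 3 * 0.1
mixed_topics = is_sharp([0.5, 0.4, 0.1], lam=3.0)       # 0.5 <  3 * 0.4
```

Because the comparison w_i >= lambda w_j involves only ratios of coordinates, scaling a vector by a positive constant does not change the outcome; the normalization is kept in the sketch only to mirror step 1.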