Page level sampling when total number of pages is unknown

IP.com Disclosure Number: IPCOM000015623D
Original Publication Date: 2002-Mar-05
Included in the Prior Art Database: 2003-Jun-20
Document File: 1 page(s) / 37K

Publishing Venue

IBM

Abstract

An algorithm is disclosed that allows page level sampling of a table when the total number of pages in the table is unknown. Without knowing the page count beforehand, it is difficult to determine which pages should be part of the sample and which should not. This can lead to poor sampling, where the actual sampled percentage does not reflect the desired sampled percentage. Even when the total number of pages is unknown, however, it is possible for the actual percentage of pages sampled to be close (if not equal) to the desired sampling rate. This method of page level sampling relies on two counters: one for the number of pages included in the sample (call it sampledPages), and one for the total number of pages encountered thus far (call it totalPages). The first page is always part of the sample. At the second page, and at every subsequent page, the following rule determines whether the page is included in the sample: if (sampledPages / totalPages) < desired sampling rate, then include this page in the sample.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

   An algorithm is disclosed that allows page level sampling of a table when the total number of pages in the table is unknown. Without knowing the page count beforehand, it is difficult to determine which pages should be part of the sample and which should not. This can lead to poor sampling, where the actual sampled percentage does not reflect the desired sampled percentage. Even when the total number of pages is unknown, however, it is possible for the actual percentage of pages sampled to be close (if not equal) to the desired sampling rate. This method of page level sampling relies on two counters: one for the number of pages included in the sample (call it sampledPages), and one for the total number of pages encountered thus far (call it totalPages). The first page is always part of the sample. At the second page, and at every subsequent page, the following rule determines whether the page is included in the sample:

if ( ( sampledPages / totalPages ) < desired sampling rate )
then
    include this page in the sample
    sampledPages = sampledPages + 1
    totalPages = totalPages + 1
else
    do not include this page in the sample
    totalPages = totalPages + 1
end if

Even though the total number of pages is unknown beforehand, as this algorithm runs over the pages, the percentage of pages included in the sample approaches the desired sampling rate. This solution solves the problem because t...