Data set level notification on retried error events

IP.com Disclosure Number: IPCOM000192759D
Original Publication Date: 2010-Feb-01
Included in the Prior Art Database: 2010-Feb-01
Document File: 1 page(s) / 22K

Publishing Venue

IBM

Abstract

Today many types of I/O errors are retried. If the retry succeeds, the application performing the I/O does not realize that an error occurred. If a performance issue is noticed on a particular data set, it may be caused by retryable hardware errors that are never surfaced to the application. These errors are sometimes logged in logrec, but the record generally includes only the device number and a few other details, such as the CCHHR involved. For most data sets this may not matter, but for critical data sets it can be a problem. Our utility scans logrec for retried-type errors. We then determine whether each error event occurred on a data set that the user has identified as being on the critical performance path. If it did, we generate a report for the client with the details of the event.
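The matching step summarized in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `LogrecEvent`, `build_report`, and the sample device numbers and data set names are hypothetical; logrec records are assumed to be already parsed into simple objects carrying a device number, a CCHHR, and a retried flag; a mapping from device number to the critical data sets on that device is assumed to be maintained elsewhere; and matching is done at the device level only (mapping a CCHHR to a specific data set extent is not shown).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogrecEvent:
    """Hypothetical, simplified view of one parsed logrec error record."""
    device_number: str  # device number, e.g. "0A80"
    cchhr: str          # cylinder/head/record address, kept as hex text
    retried: bool       # True if the failing I/O was retried


def build_report(events, critical_by_device, monitor_all=False):
    """Return one report line per retried event that touches a critical data set.

    critical_by_device maps a device number to the set of critical data set
    names on that device (assumed to be maintained elsewhere).
    monitor_all models the client's wildcard: every retried event is reported.
    """
    report = []
    for event in events:
        if not event.retried:
            continue  # skip errors that were not retried
        detail = f"retried error on device {event.device_number} at CCHHR {event.cchhr}"
        if monitor_all:
            report.append(detail)
            continue
        # Only devices holding at least one critical data set produce report lines.
        for dsn in sorted(critical_by_device.get(event.device_number, ())):
            report.append(f"{dsn}: {detail}")
    return report


if __name__ == "__main__":
    events = [
        LogrecEvent("0A80", "0012000304", True),
        LogrecEvent("0B10", "0001000101", True),
        LogrecEvent("0A80", "0000000001", False),
    ]
    critical = {"0A80": {"PROD.PAYROLL.DATA"}}
    for line in build_report(events, critical):
        print(line)
```

With the sample data, only the first event is reported (`PROD.PAYROLL.DATA: retried error on device 0A80 at CCHHR 0012000304`): the second is on a device with no critical data sets, and the third was not retried.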

The first step in our idea uses a list of data set names that the client has identified as critical for performance. The client can also specify a wildcard indicating that all data sets are to be considered critical. If all data sets are to be monitored, we process every retryable event. If the user specifies a subset of data sets, we use a LISTCAT function to determine the device numbers of the volumes on which those data sets reside. During extend processing to new volumes, we hook into the SMS volume selection code and update our list of volumes whenever one of the critical data sets is extended to a new volume.

We use two data structures. The first is a hash table with the device number as the key and the critical data sets on that volume as the data. The second hash table uses the data set name as the key and the device numbers on which it resides as the data. When an extend takes place, we check whether the data set name is in the second hash table; if it is, we add the new device number to its entry. We then look in the first hash table to see whether that device number already exists. If it does, we add the data set name under the existing device number. If it does not, we add the device number along with the data set name. This is how we keep our tables up to date.
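The two-table bookkeeping described above can be sketched in Python as follows. It is a minimal illustration, not the disclosed code: the class and method names are hypothetical, device numbers and data set names are plain strings, and both the LISTCAT lookup and the SMS volume-selection hook are modeled as ordinary method calls rather than real system interfaces.

```python
class CriticalDataSetTables:
    """Two hash tables kept in sync (class and method names are hypothetical)."""

    def __init__(self):
        self.by_device = {}  # device number -> set of critical data set names on it
        self.by_dsn = {}     # critical data set name -> set of device numbers it resides on

    def register(self, dsn, devices):
        """Seed the tables for one critical data set.

        In the disclosure the device numbers come from LISTCAT; here they are
        simply passed in.
        """
        for device in devices:
            self._link(dsn, device)

    def on_extend(self, dsn, new_device):
        """Handle an extend of dsn to a volume on new_device.

        Stands in for the SMS volume-selection hook; returns True if the
        data set is critical and the tables were updated.
        """
        if dsn not in self.by_dsn:
            return False  # not a critical data set, nothing to track
        self._link(dsn, new_device)
        return True

    def _link(self, dsn, device):
        # Second table: record the device under the data set name.
        self.by_dsn.setdefault(dsn, set()).add(device)
        # First table: add the data set under an existing device,
        # or create the device entry if it is not there yet.
        if device in self.by_device:
            self.by_device[device].add(dsn)
        else:
            self.by_device[device] = {dsn}


if __name__ == "__main__":
    tables = CriticalDataSetTables()
    tables.register("PROD.PAYROLL.DATA", ["0A80"])
    tables.on_extend("PROD.PAYROLL.DATA", "0B10")  # critical: both tables updated
    tables.on_extend("TEST.SCRATCH.DATA", "0B10")  # not critical: ignored
    print(tables.by_device)
```

Keeping both directions means a device number taken from a logrec record can be resolved to its critical data sets with a single lookup, while an extend, which is reported by data set name, can be checked against the critical list just as cheaply.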

We then periodically scan logrec and search for a prede...