Method for distributing and validating data files in a clustered computing environment
Original Publication Date: 2002-May-27
Included in the Prior Art Database: 2003-Jun-21
Data distribution in clustered computing environments has been an issue in high performance computing for many years. However, two recent developments have introduced new issues in this problem domain:
• The recent rise of “commodity clusters”, primarily built with Linux on x86 hardware
• Data growth in the Life Sciences industry, particularly as it relates to genomic and proteomic data
Linux clusters, commonly known as “Beowulf” clusters, are increasingly being sought as cost-effective solutions to problems requiring high performance computing systems. A Linux cluster interconnects a group of servers (typically x86-based servers connected via 10/100 Mbps Ethernet) and runs parallel or “embarrassingly parallel” applications across them. A considerable challenge in this environment is moving data across the cluster in an efficient, reliable, and dynamic manner. Further, the problem grows linearly with the size of the cluster.