Browse Prior Art Database

Simple background garbage collection using fork() Disclosure Number: IPCOM000009148D
Publication Date: 2002-Aug-09

Publishing Venue

The Prior Art Database

Related People

Wayne Gramlich


[ IPCOM000000005S originally published 2001-05-08 06:03 UTC ] Abstract: A mark sweep garbage collector can be designed to have a minimal pause using a fork system call. Introduction: A garbage collector (GC) is a piece of software that is responsible for identifying blocks of memory in a program's address space that are no longer being used by the program. A mark sweep garbage collector is a two pass garbage collector where the first pass is responsible for identifying which blocks of memory are still being used and "marking" them; the second pass is responsible for examining all memory blocks (i.e. "sweeping") and freeing those blocks that have not been marked. For a program that only has a single thread of execution, the program becomes unresponsive to user input for the duration of both the "mark" and "sweep" phases. Programs with large address spaces may pause for several 10's of seconds; this is rather unacceptable. For multi-threaded programs, it becomes necessary to stop all threads for the duration of the mark phase. Again, stopping all threads can be quite unacceptable. There are a number of real-time garbage collectors that attempt to spread the cost of garbage collection out, but these are substantially more complicated than mark-sweep garbage collectors. Disclosure: Many operating systems in current use, including Linux and other flavors of Unix, implement a system call called fork. The fork system call results in two processes ("parent" and "child") continuing from the fork call, each with its own copy of the memory state at the time of the fork. Using "copy on write" technology, the operating system does not need to copy all the memory; instead it simply copies pointers to memory pages, so that each process shares the original memory, now marked read-only. When either process writes to a shared page, the operating system registers a page fault, duplicates the page, and updates the page tables for the two processes to point to private writable versions of the page. This minimizes and spreads the cost of the duplication, so usually there is no noticeable delay as a result of the fork. A forking garbage collector is implemented by executing a fork system call, continuing normal execution in the parent process, and running a simple mark sweep garbage collector in the child process. The parent process can run in parallel with the child process without any significant pause. When the child program is done running the garbage collector, it can report up to the parent program those blocks of memory that it found to be freeable and then dispose of itself. Any garbage found in the child process is also garbage in the parent process! The only pause is the amount of time required to execute the fork system call and the amount of time required to get the list of freeable blocks back from the child and process it. The latter may be interspersed with the running program, freeing the blocks little by little. If the list is returned over a pipe, a program waiting for I/O with a select() statement may read a bit from the pipe at a time. If the list is returned in a shared memory buffer, the program may likewise process it a bit at a time. If the operating system supports threads (like processes, but sharing the same memory space), a new thread may be started and may free the list in parallel with the main thread of execution. The child process that is performing the garbage collection can reduce the number of page faults it produces by keeping the mark bits in a mark bit table separate from the rest of the address space. Thus, only the mark bit table is written to by the child, leaving the other address space pages read-only and uncopied (except for what the parent process actually needs to write to). Note that the child process must mark blocks in the free list, but not scan their contents for pointers. This idea works just fine for memory management using handles and other methods of indirection. It is also directly applicable to multiprocessor systems. This disclosure was written for simple mark sweep garbage collection, but extensions of the mark sweep idea will also work. The idea of forking a process and running a garbage collector in parallel with a running process can also be done with other garbage collection techniques. Extensions of the idea will be obvious to those skilled in operating systems or garbage collection systems. [ 000000005S 05S 5S ]