The Disclaim Patch

These are rather technical notes about a patch to the Boehm-Demers-Weiser Conservative Garbage Collector which adds a low-level feature which is aimed to support efficient hash-consing and finalisation.

The Issues

First off, it should be noted that the performance finalisation is usually a non-issue if recommendations for their use is followed. The main use of finalisation is to free up non-critical resources. (Critical resources should be explicitly managed, since finalisation is not guaranteed.) Since such resources are typically few compared to the abundance of system memory, the overhead of finalisation is small. So why bother?

My interest in this stems from a more exotic use of the collector, hash-consing. This mechanism can be implemented in terms of finalisers or in terms of weak-pointers, but since neither is a good fit, it's useful to think of hash-consing as an independent mechanism.

Hash-consing is a technique that allows sharing of objects constructed independently but which happen to be identical. In addition to reducing memory overhead in applications where such sharing is likely, it also means that equality over complex expression trees reduces to a simple pointer comparison. The technique can be used for expression trees, as exemplified by the ATerm tree-handling library which implements its own garbage collector, and my own cuex library of culibs which uses the Boehm-Demers-Weiser collector.

The official libgc codebase handles weak pointers and schedules finalisation at the end of a collection, when the world is stopped and mark-bits are in a consistent state. The actual finalisation happens later, but if the number of finalisers are comparable to the number of heap objects, there is still a significant overhead at this critical phase when parallel threads cannot use the collector. There is also a significant time and memory overhead if finalisation is used extensively for small objects, due to hash-tables used for internal book-keeping. In heavy tree-processing application, the main part of the memory may be such small objects.

A Solution

The two things we want to accomplish is

Roughly one phase of garbage collection is

The proposed solution is to add a light-weight low-level feature upon which finalisation and hash-consing can be implemented:

What we gain from this:

Limitations of disclaim-based finalisation:

The Patch

In the following is a summary of what the patch does. To make sense of it, you should have the sources to the collector, and the patch itself, which is found in the download directory. The patch is named gc-GC_VERSION-disclaim-PATCH_VERSION.patch.

Extension to Object Kinds and Heap Block Headers

Reclaim notification can be enabled for user-defined object-kinds. Most importantly, the callback is registered along with closure data. In addition there is a flag to follow pointers from unmarked objects in associated heap blocks during the mark phase. The latter is optional and prevents collection of objects reachable from disclaim functions.

The patch also includes a default kind for finalisable objects. A debug-enabled version should be added if this patch is accepted by the GC developers.

include/gc_disclaim.h
disclaim.c
GC_finalizer_closure (struct)
GC_register_disclaim_proc
The low-level interface.
GC_init_finalized_malloc
GC_finalized_malloc
A finaliser implementation.
include/private/gc_priv.h
HAS_DISCLAIM
MARK_UNCONDITIONALLY
Heap block flags to accelerate the presence check for new below object-kind properties.
ok_mark_unconditionally
Added object kind attribute for marking for unmarked objects.
ok_disclaim_proc
ok_disclaim_cd
The disclaim callback closure.
allchblk.c
Propagate HAS_DISCLAIM and MARK_UNCONDITIONALLY to block headers.
misc.c
Initialise the new fields of object kinds.

The Essential Algorithm Changes

This part of the patch makes sure disclaim functions are called before objects of the associated kinds are reclaimed. The notifiers are allowed to resurrect objects.

I found it necessary to support resurrection to support hash-consing, because of the late callback to the notifier. Since the notifiers are not called while the world is stopped, they can not remove unmarked objects from the hash-consing hash-table in time. That is, during the time between the world is stopped and the time objects are reclaimed, the application may receive potentially recyclable objects. The notifier detects this case and prevents these object from being reclaimed.

mark.c
GC_push_unconditionally
Added a function which marks from all objects, marked or not.
GC_block_was_dirty
Call GC_push_unconditionally if MARK_UNCONDITIONALLY was set. Note that the changes above and below has no effect by itself, it's just a move of GC_push_marked so that the MARK_UNCONDITIONALLY case does not call it.
reclaim.c
GC_block_empty
We can not give up a block before reclaim notifiers are run. The current patch simply reports objects as non-empty if reclaim notification is enabled. This must be done properly.
GC_disclaim_and_reclaim
This is a version of GC_reclaim_clear which also calls reclaim notifiers.
GC_reclaim_generic
Call GC_disclaim_and_reclaim for small objects where applicable.
GC_reclaim_block
Call reclaim notifiers for big objects where applicable.

Support for Finalization

include/private/thread_local_alloc.h
thread_local_alloc.c
Declare and initialise finalized_freelists.
include/gc_disclaim.h
disclaim.c
Definitions for the "finalized" object kind was put here along with the low-level API.

Build Infrastructure and Test Case

In addition to the above there are two test cases and miscellaneous changes to the build files.

Last updated 2009-08-14 by Petter Urkedal.