The Reclaim Notification Patch

These are rather technical notes about a patch to the Boehm-Demers-Weiser Conservative Garbage Collector that adds a feature which, for lack of a better name, I call “reclaim notification”. This is a low-level feature aimed at supporting efficient hash-consing and finalisation.

The Issues

First off, it should be noted that the performance of finalisation is usually a non-issue if the recommendations for its use are followed. The main use of finalisation is to free up non-critical resources. (Critical resources should be managed explicitly, since finalisation is not guaranteed to run.) Since such resources are typically few compared to the abundance of system memory, the overhead of finalisation is small. So why bother?

My interest in this stems from a more exotic use of the collector, hash-consing, which I think should be considered a third kind of GC-related mechanism in addition to finalisation and weak pointers. Hash-consing is a technique that allows sharing of objects that are constructed independently but happen to be identical. In addition to reducing memory overhead in applications where such sharing is likely, it also means that equality over complex expression trees reduces to a simple pointer comparison. The technique can be used for expression trees, as exemplified by the ATerm tree-handling library, which implements its own garbage collector, and by the cuex library of my own culibs, which uses the Boehm-Demers-Weiser collector.
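To make the idea concrete, here is a minimal hash-consing sketch in C for binary expression nodes. It is not taken from cuex or ATerm; the names node and hc_make are hypothetical, and it uses plain malloc with no collection at all, so the table keeps every node alive forever. Removing dead entries from such a table is exactly the job that weak pointers, finalisation, or reclaim notification must take on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical hash-consed binary expression node (plain malloc, no GC). */
typedef struct node {
    int op;                     /* operator or leaf tag */
    struct node *left, *right;  /* children, NULL for leaves */
    struct node *next;          /* chain within a hash bucket */
} node;

#define NBUCKETS 1024
static node *table[NBUCKETS];

static size_t hash_node(int op, const node *l, const node *r)
{
    /* Children are already hash-consed, so their addresses identify them. */
    uintptr_t h = (uintptr_t)op;
    h = h * 31u + (uintptr_t)l;
    h = h * 31u + (uintptr_t)r;
    return (size_t)(h % NBUCKETS);
}

/* Return the unique node with this operator and these children,
   creating it only if no identical node exists yet. */
node *hc_make(int op, node *l, node *r)
{
    size_t i = hash_node(op, l, r);
    for (node *n = table[i]; n; n = n->next)
        if (n->op == op && n->left == l && n->right == r)
            return n;           /* share the existing node */
    node *n = malloc(sizeof *n);
    if (!n)
        abort();
    n->op = op;
    n->left = l;
    n->right = r;
    n->next = table[i];
    table[i] = n;
    return n;
}
```

Building x + y twice with hc_make yields the same pointer, so comparing the two trees costs a single pointer comparison regardless of their size.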

The current implementation handles weak pointers and schedules finalisation at the end of a collection, when the mark bits are in a consistent state. The actual finalisation happens later, but if the number of finalisers is comparable to the number of heap objects, there is still a significant overhead during this critical phase, when threads cannot use the collector. There is also a significant time and memory overhead when finalisation is used extensively for small objects. In heavy tree-processing applications, most of the memory may be taken up by such small objects.

A Solution

The two things we want to accomplish are:

Roughly one phase of garbage collection is

The proposed solution is to add a light-weight low-level feature upon which finalisation and hash-consing can be implemented:

What we gain from this:

Limitations:

The Patch

The following is a summary of what the patch does. To make sense of it, you should have the sources to the collector and the patch itself, which is found in the download directory.

Extension to Object Kinds and Heap Block Headers

Reclaim notification can be enabled for user-defined object kinds. Most importantly, the callback is registered along with closure data. In addition, there is a flag to follow pointers from unmarked objects in the associated heap blocks during the mark phase. The latter is optional and prevents collection of objects reachable from reclaim notifiers.
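The per-kind state described above can be modelled as a small table of kinds, each holding a notifier, its closure data, and the mark-unconditionally flag. This is a hypothetical sketch, not the collector's struct obj_kind or the patch's GC_register_reclaim_notifier signature; the names kind_sketch, new_kind_with_notifier, and notify_reclaim, and the convention that a nonzero return means "resurrect", are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notifier type: returns nonzero to resurrect the object. */
typedef int (*reclaim_notifier_proc)(void *obj, void *cd);

/* Hypothetical model of an object kind; not the collector's struct obj_kind. */
struct kind_sketch {
    reclaim_notifier_proc notifier;  /* called before an unmarked object is reclaimed */
    void *notifier_cd;               /* closure data handed back to the notifier */
    int mark_unconditionally;        /* also trace from unmarked objects */
};

#define MAX_KINDS 16
static struct kind_sketch kinds[MAX_KINDS];
static unsigned n_kinds;

/* Register a new kind; returns its index, or -1 if the table is full. */
int new_kind_with_notifier(reclaim_notifier_proc proc, void *cd,
                           int mark_unconditionally)
{
    if (n_kinds >= MAX_KINDS)
        return -1;
    kinds[n_kinds].notifier = proc;
    kinds[n_kinds].notifier_cd = cd;
    kinds[n_kinds].mark_unconditionally = mark_unconditionally;
    return (int)n_kinds++;
}

/* What a sweep would do for an unmarked object of kind k. */
int notify_reclaim(int k, void *obj)
{
    const struct kind_sketch *kd = &kinds[k];
    return kd->notifier ? kd->notifier(obj, kd->notifier_cd) : 0;
}

/* Example notifier: counts reclaimed objects in the int that cd points to. */
int counting_notifier(void *obj, void *cd)
{
    (void)obj;
    ++*(int *)cd;
    return 0;  /* never resurrect */
}
```

Passing closure data alongside the callback lets one notifier function serve several kinds, for instance several hash-consing tables, each identified by its own cd pointer.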

The patch also includes a default kind for finalisable objects. A debug-enabled version should be added if this patch is accepted by the GC developers.

include/gc_mark.h, finalized_mlc.c
  GC_register_reclaim_notifier, GC_register_reclaim_notifier_inner: The low-level interface.
include/private/gc_priv.h
  HAS_RECLAIM_NOTIFIER, MARK_UNCONDITIONALLY: Heap block flags that accelerate the new object-kind properties below.
  ok_mark_unconditionally: Added object-kind attribute for marking from unmarked objects.
  ok_reclaim_notifier_proc, ok_reclaim_notifier_cd: The callback and its closure data for kinds with reclaim notification.
allchblk.c
  Propagate HAS_RECLAIM_NOTIFIER and MARK_UNCONDITIONALLY to block headers.
misc.c
  Initialise the new fields of object kinds.
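The point of copying kind properties into block headers is that the mark and sweep loops can test a bit in the header they already hold instead of consulting the kind. The sketch below illustrates that idea; the flag names come from the patch, but their bit values, struct kind_props, struct block_header, init_block_header, and should_trace are all hypothetical simplifications.

```c
#include <assert.h>

/* Flag names from the patch; the bit values here are made up for the sketch. */
#define HAS_RECLAIM_NOTIFIER 0x1u
#define MARK_UNCONDITIONALLY 0x2u

/* Hypothetical stand-ins for an object kind and a heap block header. */
struct kind_props {
    int has_notifier;
    int mark_unconditionally;
};

struct block_header {
    int kind;
    unsigned flags;
};

/* When a block is handed out for a kind, cache the kind's properties as
   header bits so later phases test a bit instead of consulting the kind. */
void init_block_header(struct block_header *hb, int kind,
                       const struct kind_props *kp)
{
    hb->kind = kind;
    hb->flags = 0;
    if (kp->has_notifier)
        hb->flags |= HAS_RECLAIM_NOTIFIER;
    if (kp->mark_unconditionally)
        hb->flags |= MARK_UNCONDITIONALLY;
}

/* Mark phase: should pointers be followed from this object? */
int should_trace(const struct block_header *hb, int obj_is_marked)
{
    return obj_is_marked || (hb->flags & MARK_UNCONDITIONALLY) != 0;
}
```

In these terms, should_trace returning true for an unmarked object corresponds to the GC_push_unconditionally path, while ordinary blocks only trace from marked objects as GC_push_marked does.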

The Essential Algorithm Changes

This part of the patch makes sure reclaim notifiers are called before objects of the associated kinds are reclaimed. The notifiers are allowed to resurrect objects.

Supporting resurrection turned out to be necessary for hash-consing because the notifier is called late. Since the notifiers are not called while the world is stopped, they cannot remove unmarked objects from the hash-consing table in time. That is, between the time the world is stopped and the time the objects are reclaimed, the application may obtain objects that are about to be recycled. The notifier detects this case and prevents these objects from being reclaimed.
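The resurrection check can be pictured with a single-threaded sketch. It assumes, purely for illustration, that the table bumps an epoch counter when the world is stopped for marking and that every lookup stamps the node it returns; the notifier then keeps any node stamped in the current epoch and otherwise unlinks it from the table. The epoch scheme and the names hnode, lookup_or_insert, world_stopped_for_marking, and hc_notifier are hypothetical, not taken from the patch or from cuex, and a real multi-threaded table would need locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical hash-consed node; a single chain stands in for the table. */
typedef struct hnode {
    int key;
    unsigned long stamp;   /* epoch of the most recent lookup */
    struct hnode *next;
} hnode;

static hnode *chain;
static unsigned long gc_epoch;  /* bumped when the world is stopped for marking */

void world_stopped_for_marking(void) { ++gc_epoch; }

hnode *lookup_or_insert(int key)
{
    hnode *n;
    for (n = chain; n; n = n->next)
        if (n->key == key)
            break;
    if (!n) {
        n = malloc(sizeof *n);
        if (!n)
            abort();
        n->key = key;
        n->next = chain;
        chain = n;
    }
    n->stamp = gc_epoch;  /* the application holds this node as of now */
    return n;
}

/* Reclaim notifier for an unmarked node: nonzero means "resurrect". */
int hc_notifier(void *obj, void *cd)
{
    hnode *n = obj;
    (void)cd;
    if (n->stamp == gc_epoch)
        return 1;  /* handed out after marking started: keep it */
    for (hnode **p = &chain; *p; p = &(*p)->next)
        if (*p == n) {
            *p = n->next;  /* drop the dead entry from the table */
            break;
        }
    return 0;
}
```

An unmarked node that nobody looked up since marking began is unlinked and may be reclaimed, whereas one that the application fetched in the meantime survives, which is exactly the window described above.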

mark.c
  GC_push_unconditionally: Added a function which marks from all objects, marked or not.
  GC_block_was_dirty: Call GC_push_unconditionally if MARK_UNCONDITIONALLY is set. Note that the surrounding changes have no effect by themselves; they merely move the call to GC_push_marked so that the MARK_UNCONDITIONALLY case does not call it.
reclaim.c
  GC_block_empty: We cannot give up a block before its reclaim notifiers have run. The current patch simply reports blocks as non-empty if reclaim notification is enabled. This still needs a proper solution.
  GC_reclaim_with_notifiers: A version of GC_reclaim_clear which also calls reclaim notifiers.
  GC_reclaim_generic: Call GC_reclaim_with_notifiers for small objects where applicable.
  GC_reclaim_block: Call reclaim notifiers for big objects where applicable.
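The sweep side of these changes can be modelled on a toy block of fixed-size slots. The sketch is hypothetical and far simpler than GC_reclaim_with_notifiers in reclaim.c: it uses int arrays for mark and allocation state instead of the collector's mark bits and free lists, and it assumes a notifier returns nonzero to resurrect an object. It also mirrors the stop-gap rule for GC_block_empty described above.

```c
#include <assert.h>
#include <string.h>

#define SLOTS 8
#define SLOT_SIZE 16

/* Hypothetical notifier type: returns nonzero to resurrect the object. */
typedef int (*reclaim_notifier_proc)(void *obj, void *cd);

/* A toy heap block of equally sized small objects. */
struct block_sketch {
    unsigned char mem[SLOTS][SLOT_SIZE];
    int marked[SLOTS];
    int in_use[SLOTS];
    reclaim_notifier_proc notifier;  /* NULL for kinds without notification */
    void *notifier_cd;
};

/* Sweep one block: every unmarked object is offered to the notifier first;
   if it is not resurrected it is cleared and counted as freed. */
int reclaim_with_notifiers(struct block_sketch *b)
{
    int freed = 0;
    for (int i = 0; i < SLOTS; ++i) {
        if (!b->in_use[i] || b->marked[i])
            continue;                          /* free slot or live object */
        if (b->notifier && b->notifier(b->mem[i], b->notifier_cd))
            continue;                          /* resurrected by the notifier */
        memset(b->mem[i], 0, SLOT_SIZE);       /* clear, then mark the slot free */
        b->in_use[i] = 0;
        ++freed;
    }
    return freed;
}

/* The conservative rule described for GC_block_empty: a block of a kind
   with a notifier is never reported empty, so its notifiers always run. */
int block_empty(const struct block_sketch *b)
{
    if (b->notifier)
        return 0;
    for (int i = 0; i < SLOTS; ++i)
        if (b->in_use[i] && b->marked[i])
            return 0;
    return 1;
}

/* Example notifier: resurrect exactly the object passed as closure data. */
int resurrect_if_cd(void *obj, void *cd)
{
    return obj == cd;
}
```

Resurrecting an object keeps only the object itself; the objects it points to are safe only if they were traced during marking, which is what the optional mark-unconditionally flag guarantees.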

Support for Finalization

include/private/thread_local_alloc.h, thread_local_alloc.c
  Declare and initialise finalized_freelists.
include/gc_finalized.h, finalized_mlc.c
  Define a "finalized" object kind. In addition, GC_register_reclaim_notifier was placed in finalized_mlc.c to reduce the changes to existing files.
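One way to picture a finalised kind built on reclaim notification is to store the finaliser in a small header inside each object and let the kind's notifier call it. This is a minimal sketch under assumptions: finalized_alloc, struct finalized_header, finalized_notifier, and the calling convention are hypothetical and are not the interface of gc_finalized.h, and malloc stands in for allocation from the finalized free lists. What it illustrates is that no per-object registration table is involved, unlike GC_register_finalizer.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*finalizer_proc)(void *obj, void *client_data);

/* Hypothetical layout: a small header precedes the user-visible part.
   Two pointers keep the payload pointer-aligned for this sketch. */
struct finalized_header {
    finalizer_proc fn;
    void *client_data;
};

/* Allocate an object with room for its finaliser header. */
void *finalized_alloc(size_t size, finalizer_proc fn, void *cd)
{
    struct finalized_header *h = malloc(sizeof *h + size);
    if (!h)
        abort();
    h->fn = fn;
    h->client_data = cd;
    return h + 1;  /* the user sees the memory after the header */
}

/* The kind's reclaim notifier; obj points at the header. It runs the
   finaliser and returns zero so the object is reclaimed. */
int finalized_notifier(void *obj, void *kind_cd)
{
    struct finalized_header *h = obj;
    (void)kind_cd;
    if (h->fn)
        h->fn(h + 1, h->client_data);
    return 0;
}

/* Example finaliser: counts calls in the int that cd points to. */
void count_finalizer(void *obj, void *cd)
{
    (void)obj;
    ++*(int *)cd;
}
```

Because the finaliser travels inside the object, the cost per finalisable object is two words and one indirect call at reclaim time, with nothing to scan or update while the world is stopped.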

Build Infrastructure and Test Case

In addition to the above there are two test cases and miscellaneous changes to the build files.

Last updated 2006-11-17 by Petter Urkedal.