The PDB Serialized Hash Table Format
====================================

.. contents::
   :local:

.. _hash_intro:

Introduction
============

One of the design goals of the PDB format is to provide accelerated access to
debug information. For this reason, there are several places where hash tables
are serialized and embedded directly in the file, rather than requiring a
consumer to read a list of values and reconstruct the hash table on the fly.

The serialization format supports hash tables of arbitrarily large size and
capacity, with arbitrary value types and hash functions. The only supported key
type is a uint32. The only requirement is that the producer and consumer agree
on the hash function. As such, the hash function is not discussed further in
this document; it is assumed that for a particular instance of a PDB file hash
table, the appropriate hash function is being used.


On-Disk Format
==============

.. code-block:: none

  .--------------------.-- +0
  |        Size        |
  .--------------------.-- +4
  |      Capacity      |
  .--------------------.-- +8
  | Present Bit Vector |
  .--------------------.-- +N
  | Deleted Bit Vector |
  .--------------------.-- +M                  ─╮
  |        Key         |                        │
  .--------------------.-- +M+4                 │
  |       Value        |                        │
  .--------------------.-- +M+4+sizeof(Value)   │
           ...                                  ├─ |Capacity| Bucket entries
  .--------------------.                        │
  |        Key         |                        │
  .--------------------.                        │
  |       Value        |                        │
  .--------------------.                       ─╯


- **Size** - The number of values contained in the hash table.

- **Capacity** - The number of buckets in the hash table. Producers should
  keep ``Size`` no greater than ``2/3*Capacity+1``.

- **Present Bit Vector** - A serialized bit vector which records which buckets
  have valid values. If the bucket has a value, the corresponding bit will be
  set, and if the bucket doesn't have a value (either because the bucket is
  empty or because the value is a tombstone value) the bit will be unset.

- **Deleted Bit Vector** - A serialized bit vector which records which buckets
  have tombstone values. If the entry in this bucket is deleted, the bit will
  be set, otherwise it will be unset.

- **Keys and Values** - A list of ``Capacity`` hash buckets. In each bucket,
  the first entry is the key (always a uint32), and the second entry is the
  value. The state of each bucket (valid, empty, deleted) can be determined by
  examining the present and deleted bit vectors.

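
The header fields and bucket states described above can be sketched in Python
as follows. This is a minimal illustration, not a reference implementation: the
names ``read_header``, ``bucket_state``, ``max_size``, and ``BucketState`` are
hypothetical, little-endian byte order and integer (floor) division in the
``2/3*Capacity+1`` bound are assumptions, and rejecting a bucket that is marked
both present and deleted is a choice made for this sketch.

.. code-block:: python

  import struct
  from enum import Enum


  class BucketState(Enum):
      EMPTY = "empty"
      VALID = "valid"
      DELETED = "deleted"


  def read_header(buf, offset=0):
      """Return (Size, Capacity), the two uint32 fields at +0 and +4.

      Assumption: fields are stored little-endian.
      """
      size, capacity = struct.unpack_from("<II", buf, offset)
      return size, capacity


  def bucket_state(present, deleted):
      """Classify one bucket from its bits in the two bit vectors."""
      if present and deleted:
          # Assumption: a tombstone never has its present bit set,
          # so this combination is treated as malformed input.
          raise ValueError("bucket marked both present and deleted")
      if present:
          return BucketState.VALID
      if deleted:
          return BucketState.DELETED
      return BucketState.EMPTY


  def max_size(capacity):
      """Largest Size the 2/3*Capacity+1 guideline allows.

      Assumption: the division is integer (floor) division.
      """
      return capacity * 2 // 3 + 1

A producer following this sketch would grow the table before an insertion that
would push ``Size`` past ``max_size(Capacity)``.
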

.. _hash_bit_vectors:

Present and Deleted Bit Vectors
===============================

The bit vectors indicating the status of each bucket are serialized as follows:

.. code-block:: none

  .--------------------.-- +0
  |     Word Count     |
  .--------------------.-- +4
  |       Word_0       |        ─╮
  .--------------------.-- +8    │
  |       Word_1       |         │
  .--------------------.-- +12   ├─ |Word Count| values
           ...                   │
  .--------------------.         │
  |       Word_N       |         │
  .--------------------.        ─╯


The words, when viewed as a contiguous block of bytes, represent a bit vector
with the following layout:

.. code-block:: none

      .------------.         .------------.------------.
      |   Word_N   |   ...   |   Word_1   |   Word_0   |
      .------------.         .------------.------------.
      |            |         |            |            |
  +(N+1)*32      +N*32      +64          +32          +0

where the k'th bit of this bit vector represents the status of the k'th bucket
in the hash table.

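
As an illustration of this layout, the following Python sketch writes a bit
vector, reads it back, and tests individual bits. The function names are
hypothetical and several details are assumptions: words are taken to be
little-endian uint32 values, bit ``k`` is taken to live in word ``k // 32`` at
bit position ``k % 32`` counting from the least significant bit (consistent
with the diagram above, but not stated explicitly), and any bit beyond the last
serialized word is treated as unset.

.. code-block:: python

  import struct


  def write_bit_vector(set_bits):
      """Serialize a collection of bit indices as Word Count + words."""
      # Emit just enough words to hold the highest set bit.
      word_count = max(set_bits) // 32 + 1 if set_bits else 0
      words = [0] * word_count
      for k in set_bits:
          words[k // 32] |= 1 << (k % 32)
      return struct.pack(f"<I{word_count}I", word_count, *words)


  def read_bit_vector(buf, offset=0):
      """Return (words, offset of the first byte after the bit vector)."""
      (word_count,) = struct.unpack_from("<I", buf, offset)
      words = struct.unpack_from(f"<{word_count}I", buf, offset + 4)
      return list(words), offset + 4 + 4 * word_count


  def bit_is_set(words, k):
      """Report whether bit k (the status of bucket k) is set."""
      word_index, bit_index = divmod(k, 32)
      if word_index >= len(words):
          # Assumption: bits past the serialized words count as unset.
          return False
      return bool((words[word_index] >> bit_index) & 1)

The offset returned by ``read_bit_vector`` is where the next field starts, so a
reader can call it twice in a row to obtain the present and deleted bit vectors
and then continue with the bucket entries.
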