Schema Block
Audience: anyone decoding a .pulse file by hand or writing a
non-Go reader. The schema block follows the 9-byte
header and carries one descriptor per column.
From CLAUDE.md, byte-layout invariants for
.pulsefiles, plus the on-disk format documented inencoding/schema.go.
Top-level shape
u16 field_count
field_record × field_count
Each field_record is variable-width (it includes UTF-8 name and
description strings, and may include a categorical dictionary or
decimal/H3 metadata). The reader walks them sequentially.
Per-field record
In write order — see WriteSchema / ReadSchema in
encoding/schema.go:
| # | Field | Size | Encoding |
|---|---|---|---|
| 1 | type | 1 byte | FieldType byte (see Field Types) |
| 2 | name_length | 2 bytes | u16 little-endian |
| 3 | name | name_length bytes | UTF-8 |
| 4 | byte_offset | 4 bytes | u32 LE — offset within a record |
| 5 | bit_position | 1 byte | u8 — bit position within byte_offset (bit-packed types only) |
| 6 | csv_column_idx | 2 bytes | u16 LE — source column index at import time |
| 7 | description | 2 bytes length + UTF-8 | Capped at 1000 bytes (PULSE_IMPORT_DESCRIPTION_TOO_LONG) |
| 8 | (decimal only) precision | 1 byte | decimal128 and nullable_decimal128 only |
| 9 | (decimal only) scale | 1 byte | same |
| 10 | (categorical only) dictionary | variable | See Dictionary Blocks |
Order matters: every reader walks these in the listed order, so a
malformed record stops the parse with ENCODING_INVALID.
Byte offsets and bit positions
byte_offset is the offset of this field’s first byte within a
record. For bit-packed types (packed_bool, nullable_bool,
nullable_u4), byte_offset plus bit_position together locate the
field’s bits within a byte that may be shared with adjacent fields.
For non-packed types, bit_position is always 0.
Record layout mechanics — including the bit-packing rule, record-size computation, and how the encoder packs adjacent sub-byte fields — are in Record Layout.
Conditional trailers
Two trailers attach only to specific field types:
decimal128/nullable_decimal128get a(precision, scale)pair (u8,u8). Both ≤ 38.- Categorical types (
categorical_u8,categorical_u16,categorical_u32) get a full dictionary block in line — see Dictionary Blocks.
A field with none of the above writes nothing after the description.
Field descriptions
The description string is UTF-8 with a 2-byte length prefix. The
import path rejects descriptions longer than 1000 bytes
(PULSE_IMPORT_DESCRIPTION_TOO_LONG) and warns on low-quality
descriptions (empty, under 10 characters, or generic words like
"n/a", "tbd", "unknown", "field", "data", "value",
"column") — that warning is PULSE_FIELD_DESCRIPTION_LOW_QUALITY,
upgraded to an error under --strict.
When the description is empty, pulse cohort inspect synthesises a
fallback string (“Categorical field: description_source = "synthesized". The original
bytes on disk remain empty.
Reader behaviour
encoding.ReadSchema is intentionally strict:
- Field count limit comes from the u16 prefix (max 65,535 fields).
- Unknown type bytes fail loud (
ENCODING_INVALID). - Truncated records fail loud at the first short read.
- The reader produces a
*encoding.Schemawith oneencoding.Fieldper record;Schema.Field(name)looks fields up by name.
After the schema block, record data starts at the file’s first byte past the schema. The record layout is documented in Record Layout.