Schema Block

Audience: anyone decoding a .pulse file by hand or writing a non-Go reader. The schema block follows the 9-byte header and carries one descriptor per column.

Sources: the byte-layout invariants for .pulse files in CLAUDE.md, plus the on-disk format documented in encoding/schema.go.

Top-level shape

u16 field_count
field_record × field_count

Each field_record is variable-width (it includes UTF-8 name and description strings, and may include a categorical dictionary or decimal/H3 metadata). The reader walks them sequentially.
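
As a minimal sketch of the first step, a reader can skip the 9-byte header and pull field_count out of the next two bytes. The function name read_field_count is hypothetical, and the byte order of field_count is an assumption here (little-endian, matching the other integers in the schema block):

```python
import struct

HEADER_SIZE = 9  # from the text: the schema block follows the 9-byte header


def read_field_count(buf: bytes) -> tuple[int, int]:
    """Return (field_count, offset of the first field_record)."""
    end = HEADER_SIZE + 2
    if len(buf) < end:
        raise ValueError("truncated: no room for the u16 field_count")
    # Assumption: little-endian, like name_length and byte_offset.
    (field_count,) = struct.unpack_from("<H", buf, HEADER_SIZE)
    return field_count, end


# Nine placeholder header bytes followed by field_count = 3.
sample = bytes(HEADER_SIZE) + struct.pack("<H", 3)
count, first_record = read_field_count(sample)
```

The returned offset is where the sequential walk over field_record entries would begin.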

Per-field record

In write order — see WriteSchema / ReadSchema in encoding/schema.go:

| # | Field | Size | Encoding |
|---|---|---|---|
| 1 | type | 1 byte | FieldType byte (see Field Types) |
| 2 | name_length | 2 bytes | u16 little-endian |
| 3 | name | name_length bytes | UTF-8 |
| 4 | byte_offset | 4 bytes | u32 LE; offset within a record |
| 5 | bit_position | 1 byte | u8; bit position within byte_offset (bit-packed types only) |
| 6 | csv_column_idx | 2 bytes | u16 LE; source column index at import time |
| 7 | description | 2 bytes length + UTF-8 | Capped at 1000 bytes (PULSE_IMPORT_DESCRIPTION_TOO_LONG) |
| 8 | (decimal only) precision | 1 byte | decimal128 and nullable_decimal128 only |
| 9 | (decimal only) scale | 1 byte | same |
| 10 | (categorical only) dictionary | variable | See Dictionary Blocks |

Order matters: readers consume these items strictly in the listed order, and a malformed record stops the parse with ENCODING_INVALID.
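
The walk over items 1 to 7 can be sketched as below. This is an illustration for non-Go readers, not a port of ReadSchema: FieldHead and read_field_head are hypothetical names, ValueError stands in for ENCODING_INVALID, and the description length prefix is assumed to be little-endian like name_length. Items 8 to 10 depend on the type byte, whose values are listed in Field Types, so they are left out.

```python
import struct
from dataclasses import dataclass


@dataclass
class FieldHead:
    type_byte: int
    name: str
    byte_offset: int
    bit_position: int
    csv_column_idx: int
    description: str


def read_field_head(buf: bytes, pos: int):
    """Parse items 1-7 of one field record; return (FieldHead, next position)."""

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(buf):
            # Stand-in for ENCODING_INVALID: fail at the first short read.
            raise ValueError(f"short read at offset {pos}")
        chunk = buf[pos:pos + n]
        pos += n
        return chunk

    type_byte = take(1)[0]                         # 1: type
    (name_len,) = struct.unpack("<H", take(2))     # 2: name_length
    name = take(name_len).decode("utf-8")          # 3: name
    (byte_offset,) = struct.unpack("<I", take(4))  # 4: byte_offset
    bit_position = take(1)[0]                      # 5: bit_position
    (csv_idx,) = struct.unpack("<H", take(2))      # 6: csv_column_idx
    # 7: description. Assumption: its length prefix is little-endian.
    (desc_len,) = struct.unpack("<H", take(2))
    description = take(desc_len).decode("utf-8")
    head = FieldHead(type_byte, name, byte_offset, bit_position, csv_idx, description)
    return head, pos


# A hand-built record; the type byte 0x01 is an arbitrary placeholder.
desc = b"Age in years"
raw = (bytes([0x01]) + struct.pack("<H", 3) + b"age" + struct.pack("<I", 4)
       + bytes([0]) + struct.pack("<H", 2) + struct.pack("<H", len(desc)) + desc)
head, end = read_field_head(raw, 0)
```

Because every item is read through the same bounds-checked helper, a truncated buffer fails at the exact item where the bytes run out.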

Byte offsets and bit positions

byte_offset is the offset of this field’s first byte within a record. For bit-packed types (packed_bool, nullable_bool, nullable_u4), byte_offset plus bit_position together locate the field’s bits within a byte that may be shared with adjacent fields.

For non-packed types, bit_position is always 0.
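
To illustrate how byte_offset and bit_position combine, here is a small sketch that pulls a sub-byte value out of a record. It rests on assumptions this page does not confirm: that bit_position 0 is the least significant bit, that a packed field never crosses a byte boundary, and that the caller knows the field's width in bits. Record Layout is the authority on all three. The name extract_bits is hypothetical.

```python
def extract_bits(record: bytes, byte_offset: int, bit_position: int, width: int = 1) -> int:
    """Return `width` bits starting at bit_position within record[byte_offset].

    Assumption: bit_position 0 is the least significant bit of the byte.
    """
    if width < 1 or bit_position + width > 8:
        raise ValueError("field would not fit inside a single byte")
    mask = (1 << width) - 1
    return (record[byte_offset] >> bit_position) & mask


# Byte 1 holds 0b00000101: under the LSB-first assumption, bits 0 and 2 are set.
record = bytes([0x00, 0b00000101])
flag_a = extract_bits(record, byte_offset=1, bit_position=0)
flag_b = extract_bits(record, byte_offset=1, bit_position=1)
flag_c = extract_bits(record, byte_offset=1, bit_position=2)
```

Several fields can share byte 1 this way, each with the same byte_offset and a different bit_position.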

Record layout mechanics — including the bit-packing rule, record-size computation, and how the encoder packs adjacent sub-byte fields — are in Record Layout.

Conditional trailers

Two trailers attach only to specific field types:

  • decimal128 / nullable_decimal128 get a (precision, scale) pair (u8, u8). Both ≤ 38.
  • Categorical types (categorical_u8, categorical_u16, categorical_u32) get a full dictionary block in line — see Dictionary Blocks.

A field with none of the above writes nothing after the description.
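
The dispatch described above can be sketched as follows. Type names are used as strings because the numeric type bytes live in Field Types; trailer_kind and read_decimal_trailer are hypothetical helpers, and ValueError again stands in for the real error code. Dictionary parsing is omitted since its layout is in Dictionary Blocks.

```python
DECIMAL_TYPES = {"decimal128", "nullable_decimal128"}
CATEGORICAL_TYPES = {"categorical_u8", "categorical_u16", "categorical_u32"}
MAX_DECIMAL_DIGITS = 38  # from the text: precision and scale are both <= 38


def trailer_kind(type_name: str):
    """Return which trailer follows the description, or None if there is none."""
    if type_name in DECIMAL_TYPES:
        return "decimal"
    if type_name in CATEGORICAL_TYPES:
        return "dictionary"
    return None


def read_decimal_trailer(buf: bytes, pos: int):
    """Read (precision, scale) as two u8 values; return them with the next position."""
    if pos + 2 > len(buf):
        raise ValueError("short read in decimal trailer")
    precision, scale = buf[pos], buf[pos + 1]  # write order: precision, then scale
    if precision > MAX_DECIMAL_DIGITS or scale > MAX_DECIMAL_DIGITS:
        raise ValueError("precision and scale must both be <= 38")
    return precision, scale, pos + 2


precision, scale, next_pos = read_decimal_trailer(bytes([18, 4]), 0)
```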

Field descriptions

The description string is UTF-8 with a 2-byte length prefix. The import path rejects descriptions longer than 1000 bytes (PULSE_IMPORT_DESCRIPTION_TOO_LONG). It also warns on low-quality descriptions: empty, under 10 characters, or generic words like "n/a", "tbd", "unknown", "field", "data", "value", "column". That warning is PULSE_FIELD_DESCRIPTION_LOW_QUALITY, upgraded to an error under --strict.
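
A rough sketch of these checks is shown below, for anyone who wants to pre-validate descriptions before import. It is not the importer's actual logic: check_description is a hypothetical name, the word list contains only the examples quoted above, and matching the whole description case-insensitively after trimming whitespace is an assumption.

```python
MAX_DESCRIPTION_BYTES = 1000
# Only the examples quoted in the text; the importer's real list may differ.
GENERIC_WORDS = {"n/a", "tbd", "unknown", "field", "data", "value", "column"}


def check_description(text: str, strict: bool = False):
    """Return (level, code) where level is 'ok', 'warning', or 'error'."""
    # The cap is stated in bytes, so measure the UTF-8 encoding.
    if len(text.encode("utf-8")) > MAX_DESCRIPTION_BYTES:
        return "error", "PULSE_IMPORT_DESCRIPTION_TOO_LONG"
    stripped = text.strip()  # assumption: surrounding whitespace is ignored
    # Under whole-string matching the generic-word test is already covered by
    # the length rule; it is kept to mirror the rules as listed.
    low_quality = len(stripped) < 10 or stripped.lower() in GENERIC_WORDS
    if low_quality:
        level = "error" if strict else "warning"
        return level, "PULSE_FIELD_DESCRIPTION_LOW_QUALITY"
    return "ok", None


result_empty = check_description("")
result_good = check_description("Customer age in whole years")
result_strict = check_description("tbd", strict=True)
```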

When the description is empty, pulse cohort inspect synthesises a fallback string (“Categorical field: ” or “Numeric field: ”) with description_source = "synthesized". The original bytes on disk remain empty.

Reader behaviour

encoding.ReadSchema is intentionally strict:

  • Field count limit comes from the u16 prefix (max 65,535 fields).
  • Unknown type bytes fail loud (ENCODING_INVALID).
  • Truncated records fail loud at the first short read.
  • The reader produces a *encoding.Schema with one encoding.Field per record; Schema.Field(name) looks fields up by name.

Record data starts at the first byte past the schema block. The record layout is documented in Record Layout.