# Parser Subsystem

The parser transforms a flat token stream from the lexer into an abstract syntax
tree (AST) representing a complete C translation unit.  It implements a
hand-written recursive descent parser for the C language, covering C99/C11
semantics plus a broad set of GCC and Clang extensions required for
compatibility with real-world system headers and codebases.

---

## Table of Contents

1. [Why Recursive Descent](#why-recursive-descent)
2. [Module Organization](#module-organization)
3. [The Parser Struct](#the-parser-struct)
4. [Expression Parsing and Precedence Climbing](#expression-parsing-and-precedence-climbing)
5. [The Typedef/Identifier Ambiguity](#the-typedefidentifier-ambiguity)
6. [Declarator Parsing and the Inside-Out Rule](#declarator-parsing-and-the-inside-out-rule)
7. [Type Specifier Parsing](#type-specifier-parsing)
8. [Declaration Parsing](#declaration-parsing)
9. [Statement Parsing](#statement-parsing)
10. [Inline Assembly (GCC Extended Syntax)](#inline-assembly-gcc-extended-syntax)
11. [AST Node Types](#ast-node-types)
12. [Packed Bitfield Attributes](#packed-bitfield-attributes)
13. [Error Recovery Strategy](#error-recovery-strategy)
14. [Known Limitations](#known-limitations)

---

## Why Recursive Descent

The parser is written entirely by hand rather than generated by a tool like
yacc, bison, or LALR table generators.  This is a deliberate choice driven by
several properties of the C grammar:

- **Context sensitivity.**  C's grammar is not context-free.  The meaning of an
  identifier depends on whether it has been previously declared as a `typedef`
  name.  A hand-written parser can consult a live typedef table during parsing,
  which is awkward or impossible to express in a declarative grammar file.

- **Ambiguity resolution.**  Many C constructs are syntactically ambiguous.  A
  `(` token could begin a cast expression, a compound literal, a parenthesized
  expression, or a parenthesized declarator.  The parser resolves these with
  speculative parsing (save position, try one interpretation, backtrack on
  failure) -- a technique that maps naturally to imperative Rust code.

- **GCC extension coverage.**  Real-world C code (particularly Linux kernel
  headers and glibc) uses many GCC extensions: `__attribute__`, statement
  expressions, computed gotos, inline assembly, `typeof`, `__auto_type`, case
  ranges, label addresses, and more.  These extensions are easier to add
  incrementally to a hand-written parser than to graft onto a generated grammar.

- **Error reporting.**  A recursive descent parser knows exactly what construct
  it was attempting to parse when something goes wrong, enabling precise
  diagnostic messages ("expected `;` after return statement") with fix-it hints
  and notes pointing back to matching delimiters.

---

## Module Organization

The `Parser` struct is defined once in `parse.rs`, but its methods are spread
across six files via separate `impl Parser` blocks.  Each file is a private
module that adds domain-specific parsing methods:

```
parser/
  mod.rs            -- Module declarations; re-exports Parser
  parse.rs          -- Parser struct, constructor, token helpers, entry point
  expressions.rs    -- Expression parsing (precedence climbing)
  types.rs          -- Type specifier collection and resolution
  declarations.rs   -- External and local declarations, initializers, function defs
  declarators.rs    -- C declarator syntax (pointers, arrays, function pointers)
  statements.rs     -- All statement types + inline assembly
  ast.rs            -- AST type definitions (pure data, no parsing logic)
```

All parsing methods are `pub(super)` so they can call each other across module
boundaries within the parser crate, while remaining invisible outside it.  The
public API consists of `Parser::new()`, `Parser::parse()`,
`Parser::set_diagnostics()`, `Parser::take_diagnostics()`, the `error_count`
field, and the AST types exported from `ast.rs`.

This split keeps each file focused on one aspect of the grammar.  The type
specifier parser (`types.rs`) handles the combinatorial complexity of C's
type keywords; the declarator parser (`declarators.rs`) handles the recursive
inside-out syntax; the expression parser (`expressions.rs`) handles operator
precedence; and so on.  Despite spanning six files, there is a single `Parser`
struct with a single token stream and a single parse position -- no separate
sub-parsers or intermediate representations.

---

## The Parser Struct

The core state of the parser is:

```rust
pub struct Parser {
    tokens: Vec<Token>,           // Full token stream from the lexer
    pos: usize,                   // Current read position

    typedefs: FxHashSet<String>,           // Known typedef names
    shadowed_typedefs: FxHashSet<String>,  // Typedefs shadowed by local vars

    attrs: ParsedDeclAttrs,       // Accumulated decl attributes (reset per decl)

    pragma_pack_stack: Vec<Option<usize>>,  // #pragma pack push/pop stack
    pragma_pack_align: Option<usize>,       // Current #pragma pack alignment
    pragma_visibility_stack: Vec<String>,   // #pragma GCC visibility push/pop
    pragma_default_visibility: Option<String>,

    error_count: usize,
    diagnostics: DiagnosticEngine,

    enum_constants: FxHashMap<String, i64>,          // For constant-expr eval
    unevaluable_enum_constants: FxHashSet<String>,   // Enum consts with unknown values
    struct_tag_alignments: FxHashMap<String, usize>, // For __alignof__ lookups
}
```

Key design points:

- **Token array + position index.**  The parser operates on a pre-lexed
  `Vec<Token>` rather than a streaming lexer.  This enables O(1) lookahead at
  any depth and trivial backtracking by saving and restoring `pos`.

- **Typedef tracking.**  The `typedefs` set is pre-seeded with ~90 common
  standard library type names (`size_t`, `uint32_t`, `FILE`, `va_list`, etc.)
  and grows as `typedef` declarations are parsed.  The `shadowed_typedefs`
  set tracks typedef names that have been redeclared as local variables in
  the current scope, which is necessary for correct disambiguation.

- **Attribute accumulator.**  `ParsedDeclAttrs` is a temporary buffer that
  collects storage-class specifiers, type qualifiers, and GCC attributes as
  they are encountered during type specifier parsing.  It is consumed and reset
  at each declaration boundary.  Its lifecycle is "set during
  `parse_type_specifier`, consumed by declaration builders in
  `declarations.rs`."

- **Pragma state.**  The parser maintains push/pop stacks for `#pragma pack`
  (struct field alignment) and `#pragma GCC visibility` (ELF symbol visibility).
  These are emitted by the preprocessor as synthetic tokens and consumed during
  parsing.

- **Semantic caches.**  `enum_constants` maps enum constant names to their
  integer values, enabling the parser to evaluate constant expressions in
  contexts like `__attribute__((aligned(1 << ENUM_CONST)))`.
  `unevaluable_enum_constants` tracks enum constants whose values couldn't be
  computed at parse time (e.g., `SIZE = sizeof(some_typedef)`); these are
  recognized as constants by `_Static_assert` even though their values are
  unknown.  `struct_tag_alignments` stores computed struct/union alignments for
  `__alignof__` lookups on tag-only references.

The main entry point is `Parser::parse()`, which loops over
`parse_external_decl()` until EOF, collecting `ExternalDecl` nodes into a
`TranslationUnit`.

---

## Expression Parsing and Precedence Climbing

The expression parser uses **operator precedence climbing** via a table-driven
design.  Rather than writing a separate function for each of C's ten binary
precedence levels, a single `parse_binary_expr` method is parameterized by a
`PrecedenceLevel` enum:

```rust
enum PrecedenceLevel {
    LogicalOr,       // ||
    LogicalAnd,      // &&
    BitwiseOr,       // |
    BitwiseXor,      // ^
    BitwiseAnd,      // &
    Equality,        // == !=
    Relational,      // < <= > >=
    Shift,           // << >>
    Additive,        // + -
    Multiplicative,  // * / %
}
```

The `token_to_binop` method on `Parser` maps a `(&TokenKind, PrecedenceLevel)`
pair to an optional `BinOp`.  The shared loop in `parse_binary_expr` repeatedly:

1. Parses the next-tighter level via `parse_next_tighter`.
2. Checks if the current token maps to an operator at this level.
3. If so, consumes the operator, parses the right-hand operand (at the
   next-tighter level for left-associativity), and wraps both sides in a
   `BinaryOp` node.
4. Repeats until no operator matches.

This eliminates what would otherwise be ten nearly identical recursive
functions while preserving correct associativity and precedence.

The full call hierarchy from loosest to tightest binding is:

```
parse_expr                 -- comma operator (,)
  parse_assignment_expr    -- = += -= *= /= %= <<= >>= &= ^= |=
    parse_conditional_expr -- ? : (ternary, including GNU omitted-middle)
      parse_binary_expr(LogicalOr)
        ...10 levels...
          parse_binary_expr(Multiplicative)
            parse_cast_expr    -- (type)expr, compound literals
              parse_unary_expr -- prefix ++/-- +/- ~ ! & * sizeof _Alignof
                parse_postfix_expr -- postfix ++/-- [] . -> fn()
                  parse_primary_expr -- literals, identifiers, parens, builtins
```

Assignment is parsed as right-associative by having `parse_assignment_expr`
recurse to itself for the right operand.  The ternary operator similarly
recurses for both the then-branch (via `parse_expr`, allowing comma
expressions) and the else-branch (via `parse_conditional_expr`, disallowing
commas).

### Cast Expression Ambiguity

`parse_cast_expr` demonstrates speculative parsing.  When it sees `(`, it:

1. Saves the current position and attribute state.
2. Advances past `(` and checks `is_type_specifier()`.
3. If a type specifier is found, parses it through abstract declarator suffixes.
4. If `)` follows, checks for `{` (compound literal) or parses the operand as a
   cast.
5. If any step fails, restores position and falls through to `parse_unary_expr`
   (treating the `(` as a parenthesized expression).

This same speculative pattern is used by `parse_sizeof_expr`, where
`sizeof(type)` and `sizeof expr` must be disambiguated.

---

## The Typedef/Identifier Ambiguity

C's grammar is famously context-sensitive because an identifier can be either a
type name (if previously declared as a `typedef`) or a variable/function name.
This creates genuine ambiguity:

```c
typedef int T;
void f(void) {
    T * x;    // declaration: pointer to T named x? or expression: T times x?
    T(y);     // declaration: T y? or function call: T(y)?
}
```

The parser resolves this with four mechanisms:

1. **Typedef set.**  `self.typedefs` tracks all names introduced by `typedef`
   declarations.  `is_type_specifier()` returns `true` for identifiers in this
   set, causing the parser to enter the declaration path rather than the
   expression path.

2. **Typedef shadowing.**  When a local variable declaration uses a name that
   shadows a typedef (e.g., `int size_t = 42;`), the name is added to
   `shadowed_typedefs`.  The `is_type_specifier()` check excludes shadowed
   names, so subsequent uses parse as identifiers.  The shadowing set is
   saved/restored at compound statement boundaries to handle scope correctly.

3. **Label vs. declaration.**  A typedef name followed by `:` is a label, not
   a declaration (e.g., `size_t: stmt;`).  The `is_typedef_label()` check
   prevents the parser from misinterpreting labeled statements as declarations.

4. **Builtin typedefs.**  The parser pre-seeds `typedefs` with approximately
   90 common standard library type names (`size_t`, `int32_t`, `FILE`,
   `va_list`, `pthread_t`, GCC vector types, etc.).  These built-in entries
   ensure that code using standard types parses correctly even when compiling
   without real system headers (e.g., cross-compiling without a sysroot).

---

## Declarator Parsing and the Inside-Out Rule

C declarators are notoriously difficult to read because they follow an
**inside-out rule**: you start at the declared name, then alternate between
suffixes (arrays, function parameters) and prefixes (pointers), reading outward
through parentheses.  For example:

```c
int (*fp)(int)        // fp is a pointer to a function(int) returning int
int (*arr)[3][6]      // arr is a pointer to an array[3] of array[6] of int
int *(*fps[10])(int)  // fps is array[10] of pointer to function(int) returning int*
int (**fpp)(int)      // fpp is a pointer to a pointer to a function(int) returning int
```

The declarator parser in `declarators.rs` handles this by decomposing the
parse into three parts:

1. **Outer pointers.**  Leading `*` tokens (with optional cv-qualifiers) before
   the name or parenthesized group.

2. **Inner derived declarators.**  If a `(` is found that starts a
   parenthesized declarator (not a parameter list), the parser recurses into
   `parse_declarator()` to parse the inner declarator, then expects `)`.

3. **Outer suffixes.**  After the name (or closing paren), the parser
   collects array dimensions `[expr]` and function parameter lists `(params)`.

These three parts are then combined by `combine_declarator_parts`, which
applies C's inside-out reading to produce a linear `Vec<DerivedDeclarator>`.
The method handles several cases:

- **Function pointers:** `int (*fp)(int)` produces
  `[Pointer, FunctionPointer([int])]`.
- **Pointer-to-array:** `int (*p)[3]` produces `[Array(3), Pointer]`.
- **Array of function pointers:** `int (*fps[10])(int)` produces
  `[Pointer, FunctionPointer([int]), Array(10)]`.
- **Double pointers to functions:** `int (**fpp)(int)` produces
  `[Pointer, FunctionPointer([int]), Pointer]`.
- **Nested function pointers returning function pointers:** handled via
  `ParenAbstractDecl::NestedFnPtr` for types like `void(*(*)(void*))(void)`.

### Disambiguating `(` in Declarators

The `is_paren_declarator()` helper distinguishes a parenthesized declarator
(like `(*fp)`) from a function parameter list (like `(int x, int y)`).  It
peeks at the token after `(`:

| Token after `(` | Interpretation |
|---|---|
| `*`, `^`, `(`, `[` | Declarator grouping |
| Non-typedef identifier | Named declarator |
| `__attribute__`, `__extension__` | Declarator grouping |
| Type keyword (`int`, `void`, `struct`, ...) | Parameter list |
| `)`, `...` | Parameter list (empty or variadic) |
| Typedef name | Parameter list (since `(size_t)` is a parameter) |

### GCC Attributes on Declarators

`parse_declarator_with_attrs` also collects GCC `__attribute__` annotations
that may appear before and after the declarator proper.  These include
`mode(QI/HI/SI/DI/TI)` for integer width overrides, `common` for tentative
definitions, and `aligned(N)` for alignment.  Pre-declarator and
post-declarator alignment values are merged by taking the maximum, per C11
section 6.7.5.

---

## Type Specifier Parsing

C allows type specifier keywords in any order: `unsigned long int`,
`long unsigned int`, and `int long unsigned` are all the same type.  The
parser in `types.rs` handles this by collecting boolean flags into a
`TypeSpecFlags` struct:

```rust
struct TypeSpecFlags {
    has_void: bool,
    has_bool: bool,
    has_float: bool,
    has_double: bool,
    has_complex: bool,
    has_char: bool,
    has_short: bool,
    has_int: bool,
    has_unsigned: bool,
    has_signed: bool,
    has_struct: bool,
    has_union: bool,
    has_enum: bool,
    has_typeof: bool,
    long_count: u32,         // tracks "long" vs "long long"
    typedef_name: Option<String>,
}
```

The `parse_type_specifier()` method loops, consuming tokens and setting
flags, until it hits a token that is not a type specifier, qualifier, or
storage class.  Then `resolve_type_flags()` maps the flag combination to a
concrete `TypeSpecifier` variant:

| Flags | Result |
|---|---|
| `unsigned` + `long_count=2` | `UnsignedLongLong` |
| `signed` + `char` | `Char` |
| `long` + `double` | `LongDouble` |
| `complex` + `float` | `ComplexFloat` |
| `complex` + `double` | `ComplexDouble` |
| just `unsigned` | `UnsignedInt` |
| just `long_count=1` | `Long` |
| none of the above + `has_int` | `Int` |

Storage-class specifiers (`static`, `extern`, `typedef`, `inline`,
`_Thread_local`) and type qualifiers (`const`, `volatile`, `restrict`) are
accumulated into the `ParsedDeclAttrs` buffer during the same loop, rather
than being part of the returned `TypeSpecifier`.

The method also handles:

- **`struct`/`union`/`enum`** definitions and forward declarations, including
  packed structs and `#pragma pack` alignment propagation.
- **`typeof(expr)`** and **`typeof(type)`** -- GCC extensions for computed types.
- **`__auto_type`** -- GCC type inference from initializer.
- **`_Atomic(type)`** -- C11 atomic type specifier (parsed but atomic qualifier
  not tracked).
- **`_Alignas(N)`** and **`_Alignas(type)`** -- C11 alignment specifiers.
- **`__attribute__((mode(...)))`** -- GCC integer width override that
  transforms the base type to a specific bit-width (`QI`=8, `HI`=16, `SI`=32,
  `DI`=64, `TI`=128) while preserving signedness.
- **`__int128`** and **`__uint128_t`** -- 128-bit integer types (rejected on
  32-bit targets with a diagnostic).

---

## Declaration Parsing

Declarations are parsed in `declarations.rs`.  The top-level entry point
`parse_external_decl()` handles:

1. **Top-level `asm` directives** -- `asm("...")` outside any function, emitted
   verbatim as `ExternalDecl::TopLevelAsm`.

2. **`_Static_assert`** -- the assertion expression is evaluated at parse
   time; if it evaluates to zero, a compile error is emitted with the
   assertion message.  No meaningful AST node is produced (at file scope,
   an empty `Declaration` sentinel is returned to satisfy the
   `ExternalDecl` return type).

3. **Implicit `int`** -- C89 compatibility: if no type specifier is found but
   the token sequence looks like `identifier(`, it is treated as a function
   with implicit `int` return type.

4. **Type specifier + declarator** -- The method parses a type specifier, then
   a declarator (with GCC attributes), then inspects what follows:
   - If the declarator ends with `Function(...)` and the next token is `{`
     (or a type specifier for K&R style), it is a **function definition**.
   - Otherwise, it is a **declaration** (variable, typedef, function prototype).

### Function Definitions

`parse_function_def()` handles both modern and K&R-style parameter
declarations:

```c
// Modern (prototype) style:
int add(int a, int b) { return a + b; }

// K&R (identifier list) style:
int add(a, b)
    int a;
    int b;
{ return a + b; }
```

For K&R functions, `parse_kr_params()` matches parameter names from the
identifier list with type declarations that follow.

The method also shadows typedef names that collide with parameter names for
the duration of the function body, restoring the shadowing set afterward.
The return type is computed from the derived declarator chain by
`build_return_type`, which applies post-Function and pre-Function derivations
(handling cases like `int (*func())[3]` where the return type is a pointer to
an array).

### Variable Declarations

`parse_declaration_rest()` handles multi-declarator declarations
(`int x = 1, y = 2, *z;`), initializer parsing (including nested designated
initializers with C99 `.field` and `[index]` syntax plus GCC range
designators `[lo ... hi]`), and the transfer of accumulated GCC attributes
to per-declarator `DeclAttributes`.

A `DeclContext` struct groups per-declaration attribute state (alignment,
`_Alignas` type, common flag, per-declarator attributes) to avoid threading
many individual parameters between `parse_external_decl` and
`parse_declaration_rest`.

When a `typedef` declaration is parsed, the declared names are added to the
`typedefs` set, making them available for disambiguation in subsequent parsing.

### Attribute Merging

GCC attributes can appear in multiple positions on a declaration:

```c
__attribute__((aligned(16))) int __attribute__((weak)) x __attribute__((section(".data")));
```

The parser collects attributes from each position (type-level, declarator-level,
post-declarator-level) and merges them.  For alignment, the maximum value is
taken per C11 section 6.7.5.  For visibility, an explicit
`__attribute__((visibility(...)))` takes priority over the current
`#pragma GCC visibility` default.

---

## Statement Parsing

The statement parser in `statements.rs` handles all C statement forms:

| Statement | AST variant |
|---|---|
| `expr;` (expression statement) | `Stmt::Expr` |
| `return expr;` | `Stmt::Return` |
| `if (cond) then else` | `Stmt::If` |
| `while (cond) body` | `Stmt::While` |
| `do body while (cond);` | `Stmt::DoWhile` |
| `for (init; cond; inc) body` | `Stmt::For` |
| `{ items }` | `Stmt::Compound` |
| `break;` / `continue;` | `Stmt::Break` / `Stmt::Continue` |
| `switch (expr) body` | `Stmt::Switch` |
| `case expr:` / `default:` | `Stmt::Case` / `Stmt::Default` |
| `case lo ... hi:` | `Stmt::CaseRange` (GCC extension) |
| `goto label;` | `Stmt::Goto` |
| `goto *expr;` | `Stmt::GotoIndirect` (GCC computed goto) |
| `label: stmt` | `Stmt::Label` |
| `asm(...)` | `Stmt::InlineAsm` |
| declaration in statement position | `Stmt::Declaration` (C23/GNU) |

### Compound Statements and Scope Management

`parse_compound_stmt()` implements scope management.  At entry it saves the
typedef shadowing set and the declaration attribute flags; at exit it restores
both.  This prevents storage-class specifiers from declarations inside the
block from leaking into the enclosing context -- critical for GCC statement
expressions used inside `typeof()`:

```c
typeof(({ extern void f(void); 42; })) x = 10;  // 'extern' must not leak
```

The method also handles GNU `__label__` declarations at the start of blocks,
which declare local labels scoped to the compound statement.

Within a compound statement, the parser distinguishes declarations from
statements by calling `is_type_specifier()`.  If the current token begins a
type specifier (and is not a typedef-name used as a label), it takes the
declaration path; otherwise it takes the statement path.

### For Statements

`for` statements receive special handling because C99 allows a declaration in
the initializer position:

```c
for (int i = 0; i < n; i++) { ... }
```

The parser checks `is_type_specifier()` in the initializer to distinguish
`ForInit::Declaration` from `ForInit::Expr`.

---

## Inline Assembly (GCC Extended Syntax)

The parser supports GCC extended inline assembly with the full four-colon
syntax:

```c
asm [volatile] [goto] [inline] (
    "template string"           // template with %0, %1, %[name] placeholders
    : [output operands]         // first colon
    : [input operands]          // second colon
    : [clobber list]            // third colon
    : [goto labels]             // fourth colon (asm goto)
);
```

Each operand is parsed as:

```
[symbolic_name] "constraint" (c_expression)
```

The symbolic name (in brackets) is optional.  Constraints are string literals
like `"=r"`, `"+m"`, `"i"`.  The constraint string may be formed from
concatenated string literals.  Clobbers are bare string literals (`"memory"`,
`"cc"`, register names).  Goto labels are comma-separated identifiers.

All four operand sections are optional; parsing stops at each colon boundary
if there are no more sections.  Template strings support concatenation of
adjacent string literals.

Top-level `asm("...")` directives (outside any function) are also supported
and produce `ExternalDecl::TopLevelAsm` nodes for verbatim emission into the
assembly output.

---

## AST Node Types

The AST is defined as a collection of enums and structs in `ast.rs`.  The
root is `TranslationUnit`, which contains a `Vec<ExternalDecl>`.

### Top-Level Structure

```
TranslationUnit
  Vec<ExternalDecl>
    FunctionDef(FunctionDef)     -- function definition with body
    Declaration(Declaration)     -- variable/typedef/prototype declaration
    TopLevelAsm(String)          -- asm("...") directive at file scope
```

### `FunctionDef`

Represents a function definition: return type, name, parameter list, variadic
flag, compound statement body, function attributes, K&R flag, and source span.

### `Declaration`

Represents variable declarations, typedef declarations, and function
prototypes.  Contains a base `TypeSpecifier`, a list of `InitDeclarator`s (each
with a name, derived declarator chain, optional initializer, and per-declarator
`DeclAttributes`), and packed boolean flags for storage class and qualifiers.
Also carries alignment overrides, address space qualifiers (`__seg_gs`,
`__seg_fs`), and vector size attributes.

### `TypeSpecifier`

A large enum covering all C type forms:

- **Primitive types:** `Void`, `Char`, `Short`, `Int`, `Long`, `LongLong`,
  `Float`, `Double`, `LongDouble`, `Signed`, `Unsigned`, `Bool`, `Int128`
- **Unsigned variants:** `UnsignedChar` through `UnsignedInt128`
- **Complex types:** `ComplexFloat`, `ComplexDouble`, `ComplexLongDouble`
- **Aggregate types:** `Struct(name, fields, packed, pragma_pack, aligned)`,
  `Union(...)`, `Enum(name, variants, packed)`
- **Derived types:** `Pointer(inner, address_space)`, `Array(inner, size)`,
  `FunctionPointer(ret, params, variadic)`, `BareFunction(ret, params, variadic)`
- **Name-based:** `TypedefName(String)`
- **Deferred:** `Typeof(Expr)`, `TypeofType(TypeSpecifier)`, `AutoType`
- **Vector:** `Vector(inner, total_bytes)` — wraps a base element type with
  `__attribute__((vector_size(N)))` in cast/compound-literal/sizeof contexts

Note: `Signed` and `Unsigned` are standalone variants that exist for
completeness in type resolution but are not currently emitted by the parser
(they are resolved into their concrete forms like `Int` or `UnsignedInt`
during `resolve_type_flags`).

### `Expr`

The expression enum has 43 variants covering:

- **Literals:** `IntLiteral`, `UIntLiteral`, `LongLiteral`, `ULongLiteral`,
  `LongLongLiteral`, `ULongLongLiteral`, `FloatLiteral`, `FloatLiteralF32`,
  `FloatLiteralLongDouble` (with full 128-bit precision bytes), `StringLiteral`,
  `WideStringLiteral`, `Char16StringLiteral`, `CharLiteral`,
  `ImaginaryLiteral`, `ImaginaryLiteralF32`, `ImaginaryLiteralLongDouble`
- **Names:** `Identifier`
- **Operators:** `BinaryOp`, `UnaryOp`, `PostfixOp`, `Assign`,
  `CompoundAssign`
- **Access:** `ArraySubscript`, `MemberAccess`, `PointerMemberAccess`,
  `AddressOf`, `Deref`
- **Control flow:** `Conditional` (ternary), `GnuConditional` (omitted-middle
  `cond ?: else`), `Comma`
- **Type operations:** `Cast`, `Sizeof`, `Alignof`, `GnuAlignof` (preferred
  alignment), `AlignofExpr`, `GnuAlignofExpr`, `VaArg`
- **Compound constructs:** `FunctionCall`, `CompoundLiteral`,
  `StmtExpr` (GCC statement expression `({ ... })`)
- **Generic selection:** `GenericSelection` (C11 `_Generic`)
- **GCC extensions:** `LabelAddr` (`&&label` for computed goto),
  `BuiltinTypesCompatibleP`

Every `Expr` variant carries a `Span` for source location tracking.  The
`ExprId` newtype wraps a stable pointer address for use as a cache key in
downstream type-checking and constant-evaluation phases.

### `Stmt`

Covers all C statement forms as described in the
[Statement Parsing](#statement-parsing) section.  The full set of variants:

- `Expr(Option<Expr>)` -- expression statement (the most common statement type)
- `Return`, `If`, `While`, `DoWhile`, `For`, `Compound`
- `Break`, `Continue`, `Switch`, `Case`, `Default`
- `CaseRange` (GCC extension), `Goto`, `GotoIndirect` (GCC computed goto)
- `Label`, `Declaration`, `InlineAsm`

Each statement variant that represents a control-flow construct carries a `Span`
for source location, accessible via `Stmt::span()`.

### `DerivedDeclarator`

Describes type modifications applied by a declarator:

```rust
enum DerivedDeclarator {
    Pointer,                                // *
    Array(Option<Box<Expr>>),               // [expr] or []
    Function(Vec<ParamDecl>, bool),         // (params) with variadic flag
    FunctionPointer(Vec<ParamDecl>, bool),  // (*)(params) -- pointer to function
}
```

The distinction between `Function` and `FunctionPointer` is essential:
`Function` appears in function definitions and prototypes, while
`FunctionPointer` appears in declarators like `void (*fp)(int)` where the
`(*...)` syntax introduces a pointer-to-function.

### Initializers

```rust
enum Initializer {
    Expr(Expr),                         // simple: int x = 42;
    List(Vec<InitializerItem>),         // braced: int a[] = {1, 2, 3};
}

struct InitializerItem {
    designators: Vec<Designator>,       // .field, [index], [lo...hi]
    init: Initializer,                  // recursively nested
}

enum Designator {
    Index(Expr),          // [expr]
    Range(Expr, Expr),    // [lo ... hi]  (GCC extension)
    Field(String),        // .fieldname
}
```

---

## Packed Bitfield Attributes

Four AST and parser-internal types use packed bitfield integers to store
boolean flags instead of individual `bool` fields:

| Struct | Flag type | Booleans stored |
|---|---|---|
| `FunctionAttributes` | `u16` (13 bits) | static, inline, extern, gnu_inline, always_inline, noinline, constructor, destructor, weak, used, fastcall, naked, noreturn |
| `Declaration` | `u16` (9 bits) | static, extern, typedef, const, volatile, common, thread_local, transparent_union, inline |
| `DeclAttributes` | `u16` (8 bits) | constructor, destructor, weak, error_attr, noreturn, used, fastcall, naked |
| `ParsedDeclAttrs` | `u32` (19 bits) | typedef, static, extern, thread_local, inline, const, volatile, constructor, destructor, weak, used, gnu_inline, always_inline, noinline, noreturn, error_attr, transparent_union, fastcall, naked |

Each flag is a named constant (power of two) in a companion module
(e.g., `func_attr_flag::STATIC = 1 << 0`).  Getters use `flags & MASK != 0`
and setters use `flags |= MASK` / `flags &= !MASK`.

This design achieves three goals:

- **Memory savings.**  13 booleans in `FunctionAttributes` collapse from 13
  bytes to 2 bytes.  For `ParsedDeclAttrs`, 19 booleans collapse from 19 bytes
  to 4 bytes.  Since AST nodes are allocated in large numbers, these savings
  are meaningful.

- **Bulk save/restore.**  `ParsedDeclAttrs::save_flags()` and
  `restore_flags()` capture and restore all 19 boolean flags as a single `u32`,
  used to prevent storage-class specifiers from leaking across compound
  statement scope boundaries.

- **API ergonomics.**  Callers use named getter/setter methods
  (`is_static()`, `set_inline(true)`) rather than raw bit manipulation.  The
  `Debug` implementations expand flags into named booleans for readable output.

Non-boolean attributes (section name, visibility string, alias target, asm
register, cleanup function name, vector size, alignment overrides) remain as
`Option<String>` or `Option<usize>` fields alongside the packed flags.

---

## Error Recovery Strategy

The parser employs several error recovery strategies to continue parsing after
encountering invalid input:

1. **Emit and continue.**  Most `expect*` methods report an error but do not
   halt.  If the expected token is missing, the parser records the error and
   continues from the current position, producing a potentially malformed but
   complete AST.

2. **Contextual error messages.**  Four `expect` variants produce
   increasingly specific diagnostics:
   - `expect(token)` -- generic: "expected `;` before `}`"
   - `expect_after(token, context)` -- positional: "expected `;` after return
     statement" with a fix-it hint
   - `expect_context(token, context)` -- contextual: "expected `(` after `if`"
     with a fix-it hint
   - `expect_closing(token, open_span)` -- matching: "expected `)` before `;`"
     with a note "to match this `(` at file.c:10:5" and a fix-it hint

3. **Fix-it hints.**  Error diagnostics include insertion suggestions (e.g.,
   "insert `;`") that an IDE or error display could use to suggest corrections.

4. **Speculative backtracking.**  For ambiguous constructs (casts vs.
   parenthesized expressions, type specifiers vs. identifiers), the parser
   saves its position, attempts one interpretation, and backtracks on failure.
   This avoids committing to a parse that might be wrong.

5. **Top-level skip.**  In `parse()`, if `parse_external_decl()` returns
   `None` for a non-EOF, non-semicolon token, the parser reports an error and
   advances past the unrecognized token to resynchronize.

6. **Error counting.**  `error_count` tracks the total number of parse errors,
   allowing the driver to decide whether to abort compilation after parsing.

---

## Known Limitations

- **Parser does not invoke the preprocessor.**  The compiler has a fully
  integrated C preprocessor (see `frontend/preprocessor/`), but the driver
  orchestrates the pipeline: preprocessing runs first, then lexing, then
  parsing.  From the parser's perspective, input tokens are already
  macro-expanded and include-resolved.  The parser also pre-seeds a set of
  builtin typedef names (see below) so that standard types like `size_t`
  parse correctly even when compiling without real system headers.

- **No type checking during parsing.**  The parser produces an untyped AST.
  Type resolution, implicit conversions, and semantic checks happen in later
  lowering phases.  This means some constructs that are syntactically valid
  but semantically illegal will parse successfully.

- **Typedef scope is approximated.**  Typedef names are tracked globally with
  a shadowing set for local scopes.  This is correct for the vast majority of
  code but does not model C's full block-scope typedef rules with complete
  precision (e.g., a typedef introduced inside a `for` loop initializer may
  remain visible longer than the standard specifies).

- **`_Atomic` is parsed but not fully modeled.**  `_Atomic(T)` is treated as
  equivalent to `T` -- the atomic qualification is syntactically consumed but
  not preserved in the AST.

- **`restrict` is consumed but ignored.**  The `restrict` qualifier is
  recognized and skipped but does not appear in the AST, since it is an
  optimization hint that does not affect code generation in this compiler.

- **Constant expression evaluation is limited.**  The parser can evaluate
  simple constant expressions (for enum values and alignment attributes) using
  a lookup table of previously parsed enum constants, but cannot evaluate
  arbitrary compile-time expressions involving sizeof of composite types.

- **`_Generic` const tracking is partial.**  The parser tracks `is_const`
  flags on `_Generic` associations and parameter declarations to distinguish
  const-qualified pointer types, but CType itself does not carry qualifiers.
  This means const-sensitivity works for the common cases but not for complex
  expressions like casts or subscripts.

- **No C++ support.**  This is a C parser only.  C++ constructs (classes,
  templates, namespaces, references, overloading) are not recognized.