Commit 26f6f8b

Initial compiler scaffold: full pipeline for x86-64, AArch64, RISC-V
Complete end-to-end C compiler scaffold with the following components:

Frontend:
- Lexer: tokenizes C source with spans (keywords, literals, operators)
- Preprocessor: strips #include/#define/#ifdef directives (expansion TODO)
- Parser: recursive descent parser producing full AST (declarations, statements, expressions with proper operator precedence)
- Sema: semantic analysis stub (pass-through for now)

IR:
- Custom IR with basic blocks, instructions, terminators
- AST lowering to alloca-based IR (not yet SSA)
- mem2reg stub for future SSA promotion

Backend (all three targets):
- x86-64: stack-based codegen producing AT&T assembly
- AArch64: codegen using x0-x7 arg regs, stp/ldp frame
- RISC-V 64: codegen using a0-a7 arg regs, s0 frame pointer
- Assembly/linking via system gcc (native ELF writer TODO)

Test results (1% sample):
- x86-64: ~13% passing (40/300)
- AArch64: ~8% passing (23/287)
- RISC-V 64: ~15% passing (42/287)

Working features: return values, printf with string literals, basic arithmetic, local variables, if/else/while/for control flow, function calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent a28ff29 commit 26f6f8b
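
The commit message mentions a recursive descent parser that handles operator precedence. The parser source is not visible in this view, so the following is only a minimal sketch of one common way to do it (precedence climbing). The `Tok`, `Expr`, and `Parser` types and the `prec` table are hypothetical and are not taken from the commit.

```rust
// Hypothetical token and AST types for illustration only;
// they are not the types defined in this commit.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Num(i64),
    Op(char),
    LParen,
    RParen,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Bin(Box<Expr>, char, Box<Expr>),
}

/// Precedence of a binary operator; a higher number binds tighter.
fn prec(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(1),
        '*' | '/' | '%' => Some(2),
        _ => None,
    }
}

struct Parser {
    toks: Vec<Tok>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<Tok> {
        self.toks.get(self.pos).copied()
    }

    /// primary := NUMBER | '(' expr ')'
    fn primary(&mut self) -> Result<Expr, String> {
        match self.peek() {
            Some(Tok::Num(n)) => {
                self.pos += 1;
                Ok(Expr::Num(n))
            }
            Some(Tok::LParen) => {
                self.pos += 1;
                let inner = self.expr(1)?;
                if self.peek() != Some(Tok::RParen) {
                    return Err("expected ')'".to_string());
                }
                self.pos += 1;
                Ok(inner)
            }
            other => Err(format!("unexpected token: {:?}", other)),
        }
    }

    /// Parse binary operators whose precedence is at least `min_prec`.
    fn expr(&mut self, min_prec: u8) -> Result<Expr, String> {
        let mut lhs = self.primary()?;
        while let Some(Tok::Op(op)) = self.peek() {
            let p = match prec(op) {
                Some(p) if p >= min_prec => p,
                _ => break,
            };
            self.pos += 1;
            // Left-associative: the right operand may only hold tighter operators.
            let rhs = self.expr(p + 1)?;
            lhs = Expr::Bin(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }
}

/// Parse a whole token stream, rejecting trailing tokens.
fn parse(toks: Vec<Tok>) -> Result<Expr, String> {
    let mut p = Parser { toks, pos: 0 };
    let e = p.expr(1)?;
    if p.pos != p.toks.len() {
        return Err("trailing tokens".to_string());
    }
    Ok(e)
}
```

Calling `parse` on the tokens of `1 + 2 * 3` nests the multiplication under the addition, and `8 - 3 - 2` groups to the left. A real C parser would extend the table with comparison, logical, and assignment operators, the last of which are right-associative.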

59 files changed

Lines changed: 5657 additions & 11 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+/target/
+Cargo.lock
+*.o
+*.s

Cargo.toml

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+[package]
+name = "ccc"
+version = "0.1.0"
+edition = "2021"
+description = "A C compiler targeting x86-64, ARM64, and RISC-V"
+
+[[bin]]
+name = "ccc"
+path = "src/main.rs"
+
+[[bin]]
+name = "ccc-x86"
+path = "src/main.rs"
+
+[[bin]]
+name = "ccc-arm"
+path = "src/main.rs"
+
+[[bin]]
+name = "ccc-riscv"
+path = "src/main.rs"
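
All four `[[bin]]` targets build the same `src/main.rs`, so the binary presumably picks its target at run time. The driver is not visible in this view. The sketch below assumes the choice is made from the invoked program name, and `Target` and `target_from_argv0` are hypothetical names. The x86-64 default for plain `ccc` follows the README usage section.

```rust
use std::path::Path;

// Hypothetical target enum; the commit's own type is not visible here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    X86_64,
    AArch64,
    RiscV64,
}

/// Map the invoked program name to a target. The names match the
/// [[bin]] entries in Cargo.toml; dispatching on argv[0] is an assumption.
fn target_from_argv0(argv0: &str) -> Target {
    // Strip any directory and extension, e.g. "target/debug/ccc-arm" -> "ccc-arm".
    let name = Path::new(argv0)
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("");
    match name {
        "ccc-arm" => Target::AArch64,
        "ccc-riscv" => Target::RiscV64,
        // "ccc" and "ccc-x86" both fall through to x86-64.
        _ => Target::X86_64,
    }
}
```

A driver could call this with `std::env::args().next()` before parsing the remaining arguments.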

README.md

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+# CCC - C Compiler Collection
+
+A C compiler written in Rust, targeting x86-64, AArch64, and RISC-V 64.
+
+## Status
+
+**Initial scaffold complete.** The basic compilation pipeline is functional:
+- Lexer, preprocessor (strip-only), parser, semantic analysis (stub)
+- IR lowering (AST -> alloca-based IR)
+- Code generation for x86-64, AArch64, and RISC-V 64
+- Assembly and linking via system tools (gcc/gas)
+
+### Test Results (1% sample)
+- x86-64: ~13% passing
+- AArch64: ~8% passing
+- RISC-V 64: ~15% passing
+
+### What Works
+- `int main() { return N; }` for any integer N
+- `printf()` with string literal arguments (via libc linking)
+- Basic arithmetic (`+`, `-`, `*`, `/`, `%`)
+- Local variable declarations and assignments
+- `if`/`else`, `while`, `for`, `do-while` control flow
+- Function calls with up to 6 arguments on x86-64 and 8 on AArch64 and RISC-V 64
+- Comparison operators
+
+### What's Not Yet Implemented
+- Preprocessor (macros, includes, conditionals)
+- Type checking (sema is a stub)
+- Structs, unions, enums (parsed but not lowered)
+- Arrays and pointers (parsed but codegen incomplete)
+- Switch statements (stub)
+- Floating point
+- Global variables
+- String formatting (printf with %d etc)
+- Native assembler/linker (currently uses gcc)
+- Optimization passes
+
+## Building
+
+```bash
+cargo build
+# Produces: target/debug/ccc, ccc-x86, ccc-arm, ccc-riscv
+```
+
+## Usage
+
+```bash
+# Compile C to x86-64 executable
+target/debug/ccc -o output input.c
+
+# Compile for AArch64
+target/debug/ccc-arm -o output input.c
+
+# Compile for RISC-V 64
+target/debug/ccc-riscv -o output input.c
+```
+
+## Architecture
+
+```
+src/
+  frontend/
+    preprocessor/    Strip preprocessor directives (TODO: full expansion)
+    lexer/           Tokenize C source with source locations
+    parser/          Recursive descent parser, produces AST
+    sema/            Semantic analysis (TODO: type checking)
+
+  ir/
+    ir.rs            IR definition (instructions, basic blocks, values)
+    lowering/        AST -> alloca-based IR
+    mem2reg/         TODO: promote allocas to SSA
+
+  passes/            TODO: optimization passes (constant fold, DCE, etc.)
+
+  backend/
+    x86/
+      codegen/       IR -> x86-64 assembly (stack-based allocation)
+      assembler/     Assembly -> object file (via gcc -c)
+      linker/        Object files -> executable (via gcc)
+    arm/
+      codegen/       IR -> AArch64 assembly
+      assembler/     via aarch64-linux-gnu-gcc
+      linker/        via aarch64-linux-gnu-gcc
+    riscv/
+      codegen/       IR -> RISC-V 64 assembly
+      assembler/     via riscv64-linux-gnu-gcc
+      linker/        via riscv64-linux-gnu-gcc
+
+  common/
+    types.rs         CType, IrType
+    symbol_table.rs  Scoped name resolution
+    source.rs        Span, SourceLocation, SourceManager
+    error.rs         Diagnostic with span
+
+  driver/            CLI argument parsing, pipeline orchestration
+```
+
+## Running Tests
+
+```bash
+python3 /verify/verify_compiler.py --compiler target/debug/ccc-x86 --ratio 100
+python3 /verify/verify_compiler.py --compiler target/debug/ccc-arm --ratio 100
+python3 /verify/verify_compiler.py --compiler target/debug/ccc-riscv --ratio 100
+```
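
The README describes the x86-64 backend as stack-based: every IR value gets a stack slot instead of a register. The codegen source is not visible in this view, so the following is a rough sketch of that idea over a hypothetical three-instruction IR. `Inst`, `slot`, and `emit_function` are illustrative names, not the commit's API.

```rust
// Hypothetical miniature IR, not the IR defined in this commit:
// every value lives in a numbered 8-byte stack slot.
enum Inst {
    Const { dst: usize, value: i64 },
    Add { dst: usize, lhs: usize, rhs: usize },
    Ret { src: usize },
}

/// Address of a slot relative to the frame pointer: slot 0 is -8(%rbp).
fn slot(i: usize) -> String {
    format!("-{}(%rbp)", 8 * (i + 1))
}

/// Emit AT&T-syntax x86-64 assembly for one function.
fn emit_function(name: &str, num_slots: usize, body: &[Inst]) -> String {
    // Round the frame up to 16 bytes so %rsp stays aligned for calls.
    let frame = (num_slots * 8 + 15) / 16 * 16;
    let mut out = String::new();
    out.push_str(&format!("    .globl {name}\n{name}:\n"));
    out.push_str("    pushq %rbp\n    movq %rsp, %rbp\n");
    if frame > 0 {
        out.push_str(&format!("    subq ${frame}, %rsp\n"));
    }
    for inst in body {
        match inst {
            Inst::Const { dst, value } => {
                // movabsq accepts a full 64-bit immediate.
                out.push_str(&format!("    movabsq ${value}, %rax\n"));
                out.push_str(&format!("    movq %rax, {}\n", slot(*dst)));
            }
            Inst::Add { dst, lhs, rhs } => {
                // Load, operate, store: no value stays in a register
                // across instructions.
                out.push_str(&format!("    movq {}, %rax\n", slot(*lhs)));
                out.push_str(&format!("    addq {}, %rax\n", slot(*rhs)));
                out.push_str(&format!("    movq %rax, {}\n", slot(*dst)));
            }
            Inst::Ret { src } => {
                out.push_str(&format!("    movq {}, %rax\n", slot(*src)));
                out.push_str("    leave\n    ret\n");
            }
        }
    }
    out
}
```

Each instruction reloads its operands from memory and stores its result back. That keeps the backend simple at the cost of extra loads and stores, which is the trade-off a later register allocator would address.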

current_tasks/initial_compiler_scaffold.txt

Lines changed: 0 additions & 11 deletions
This file was deleted.

ideas/implement_preprocessor.txt

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+Implement a real C preprocessor:
+- #define with macro expansion (object-like and function-like macros)
+- #include with file resolution (system headers from /usr/include)
+- #if, #ifdef, #ifndef, #elif, #else, #endif with expression evaluation
+- Predefined macros (__LINE__, __FILE__, __DATE__, __TIME__, etc.)
+- Token pasting (##) and stringification (#)
+
+This is critical for passing tests that use #include <stdio.h> properly
+rather than relying on gcc's assembler/linker to resolve symbols.
+Many test failures are due to missing preprocessor support.
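
As a possible first step toward the list above, here is a minimal sketch of object-like `#define` expansion only. It is not the commit's preprocessor, which per the README only strips directives. `preprocess`, `expand_line`, and `flush_ident` are hypothetical names. The sketch does not handle function-like macros, rescanning of replacement text, or identifiers inside string literals and comments.

```rust
use std::collections::HashMap;

/// Append a pending identifier to `out`, substituting it if it names a macro.
fn flush_ident(ident: &mut String, out: &mut String, macros: &HashMap<String, String>) {
    if ident.is_empty() {
        return;
    }
    match macros.get(ident.as_str()) {
        Some(body) => out.push_str(body),
        None => out.push_str(ident),
    }
    ident.clear();
}

/// Replace whole identifiers that name a macro; other text is copied as-is.
fn expand_line(line: &str, macros: &HashMap<String, String>) -> String {
    let mut out = String::new();
    let mut ident = String::new();
    for c in line.chars() {
        if c.is_ascii_alphanumeric() || c == '_' {
            ident.push(c);
        } else {
            flush_ident(&mut ident, &mut out, macros);
            out.push(c);
        }
    }
    flush_ident(&mut ident, &mut out, macros);
    out
}

/// Record object-like #define lines and expand them in later lines.
/// Directive lines become empty lines so line numbers stay stable.
fn preprocess(src: &str) -> String {
    let mut macros: HashMap<String, String> = HashMap::new();
    let mut out = String::new();
    for line in src.lines() {
        let trimmed = line.trim_start();
        if let Some(rest) = trimmed.strip_prefix("#define") {
            let rest = rest.trim();
            let (name, body) = match rest.split_once(char::is_whitespace) {
                Some((n, b)) => (n, b.trim()),
                None => (rest, ""),
            };
            if !name.is_empty() {
                macros.insert(name.to_string(), body.to_string());
            }
        } else if !trimmed.starts_with('#') {
            // Ordinary source line: expand macros defined so far.
            out.push_str(&expand_line(line, &macros));
        }
        out.push('\n');
    }
    out
}
```

Matching whole identifiers rather than substrings is what keeps `NN` untouched when only `N` is defined.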

ideas/native_elf_writer.txt

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+Replace gcc-based assembler/linker with native ELF generation:
+
+1. Write an x86-64 instruction encoder that directly emits machine code
+2. Write an ELF object file writer
+3. Write a simple linker that:
+   - Resolves relocations
+   - Links with system libc (read its ELF to find symbols)
+   - Generates a complete ELF executable
+
+This removes the dependency on gcc for assembly and linking.
+Start with x86-64, then extend to ARM and RISC-V.
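
For step 2 above, the first thing an object file writer has to produce is the ELF file header. The sketch below builds the 64-byte ELF64 header for a little-endian x86-64 relocatable object. The field values follow the ELF specification; the function name `elf64_rel_header` and its parameters are hypothetical and do not come from this commit.

```rust
/// Build the 64-byte ELF64 file header for a little-endian x86-64
/// relocatable object (ET_REL). Program-header fields are zero because
/// relocatable objects carry no segments.
fn elf64_rel_header(shoff: u64, shnum: u16, shstrndx: u16) -> Vec<u8> {
    let mut h = Vec::with_capacity(64);
    // e_ident: magic, ELFCLASS64 (2), ELFDATA2LSB (1), EV_CURRENT (1),
    // ELFOSABI_NONE (0), ABI version 0, then 7 padding bytes.
    h.extend_from_slice(&[0x7f, b'E', b'L', b'F', 2, 1, 1, 0, 0]);
    h.extend_from_slice(&[0u8; 7]);
    h.extend_from_slice(&1u16.to_le_bytes()); // e_type = ET_REL
    h.extend_from_slice(&62u16.to_le_bytes()); // e_machine = EM_X86_64
    h.extend_from_slice(&1u32.to_le_bytes()); // e_version = EV_CURRENT
    h.extend_from_slice(&0u64.to_le_bytes()); // e_entry (none for objects)
    h.extend_from_slice(&0u64.to_le_bytes()); // e_phoff (no program headers)
    h.extend_from_slice(&shoff.to_le_bytes()); // e_shoff
    h.extend_from_slice(&0u32.to_le_bytes()); // e_flags
    h.extend_from_slice(&64u16.to_le_bytes()); // e_ehsize
    h.extend_from_slice(&0u16.to_le_bytes()); // e_phentsize
    h.extend_from_slice(&0u16.to_le_bytes()); // e_phnum
    h.extend_from_slice(&64u16.to_le_bytes()); // e_shentsize
    h.extend_from_slice(&shnum.to_le_bytes()); // e_shnum
    h.extend_from_slice(&shstrndx.to_le_bytes()); // e_shstrndx
    h
}
```

A full writer would follow this header with section contents (`.text`, `.symtab`, `.strtab`, `.shstrtab`, relocation sections) and the section header table at offset `shoff`. Extending to the other targets means at least changing `e_machine`; other fields such as `e_flags` may differ as well.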

ideas/register_allocator.txt

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+Implement a register allocator:
+
+Current codegen uses a stack-based approach where every value is stored
+on the stack. This is correct but very slow.
+
+Implement a linear scan or graph coloring register allocator:
+1. Build live intervals for each IR value
+2. Allocate registers using the chosen algorithm
+3. Insert spill/reload code for values that don't fit in registers
+4. Handle calling conventions (caller/callee-saved registers)
+
+This will dramatically improve code quality and runtime performance.
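
Steps 2 and 3 above can be sketched with a simplified linear scan. This is one possible shape, not code from the commit. `Interval`, `Loc`, and `linear_scan` are hypothetical names, live intervals are assumed to be already computed (step 1), spilled values are only marked rather than given reload code, and calling conventions (step 4) are ignored.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Loc {
    Reg(usize),
    Spill,
}

/// Live range of one IR value, with inclusive start and end positions.
#[derive(Debug, Clone, Copy)]
struct Interval {
    value: usize,
    start: usize,
    end: usize,
}

/// Assign each value a register index in 0..num_regs or mark it spilled.
fn linear_scan(intervals: &[Interval], num_regs: usize) -> HashMap<usize, Loc> {
    let mut sorted = intervals.to_vec();
    sorted.sort_by_key(|iv| iv.start);

    // Reversed so that pop() hands out register 0 first.
    let mut free: Vec<usize> = (0..num_regs).rev().collect();
    let mut active: Vec<(Interval, usize)> = Vec::new();
    let mut assignment = HashMap::new();

    for iv in sorted {
        // Expire intervals that ended before this one starts.
        active.retain(|(a, reg)| {
            if a.end < iv.start {
                free.push(*reg);
                false
            } else {
                true
            }
        });

        if let Some(reg) = free.pop() {
            assignment.insert(iv.value, Loc::Reg(reg));
            active.push((iv, reg));
            continue;
        }

        // No free register: spill whichever live interval ends last.
        let victim = active
            .iter()
            .enumerate()
            .max_by_key(|(_, entry)| entry.0.end)
            .map(|(i, _)| i);
        match victim {
            Some(i) if active[i].0.end > iv.end => {
                let (spilled, reg) = active.swap_remove(i);
                assignment.insert(spilled.value, Loc::Spill);
                assignment.insert(iv.value, Loc::Reg(reg));
                active.push((iv, reg));
            }
            _ => {
                assignment.insert(iv.value, Loc::Spill);
            }
        }
    }
    assignment
}
```

Spilling the interval that ends furthest in the future is the usual linear scan heuristic, since it frees a register for the longest time. Graph coloring, the other option named above, generally gives better assignments at a higher compile-time cost.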

src/backend/arm/assembler/assembler.rs

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+use std::process::Command;
+
+/// AArch64 assembler. Delegates to system cross-assembler.
+/// TODO: Implement native ARM instruction encoding.
+pub struct ArmAssembler;
+
+impl ArmAssembler {
+    pub fn new() -> Self {
+        Self
+    }
+
+    pub fn assemble(&self, asm_text: &str, output_path: &str) -> Result<(), String> {
+        let asm_path = format!("{}.s", output_path);
+        std::fs::write(&asm_path, asm_text)
+            .map_err(|e| format!("Failed to write assembly: {}", e))?;
+
+        let result = Command::new("aarch64-linux-gnu-gcc")
+            .args(["-c", "-o", output_path, &asm_path])
+            .output()
+            .map_err(|e| format!("Failed to run ARM assembler: {}", e));
+        // Remove the temporary .s file even if the assembler could not be spawned.
+        let _ = std::fs::remove_file(&asm_path);
+        let result = result?;
+        if !result.status.success() {
+            let stderr = String::from_utf8_lossy(&result.stderr);
+            return Err(format!("ARM assembly failed: {}", stderr));
+        }
+
+        Ok(())
+    }
+}
+
+impl Default for ArmAssembler {
+    fn default() -> Self {
+        Self::new()
+    }
+}

src/backend/arm/assembler/mod.rs

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+pub mod assembler;
+
+pub use assembler::ArmAssembler;
