x64: Refactor backend condition handling by alexcrichton · Pull Request #11097 · bytecodealliance/wasmtime

alexcrichton · 2025-06-21T23:06:28Z

This commit refactors how the x64 backend handles condition codes across instructions, with the goal of simplifying the conversion of a `Value` into a condition code for `select` and `brif`. It also deduplicates condition-code handling across `icmp` and `fcmp`, for example.

Previously the x64 backend had types such as `IcmpCondResult` and `FcmpCondResult` which represented the condition codes for `icmp` and `fcmp` values. When used in `select`, `brif`, or `trap{n,}z`, each lowering had to special-case `icmp` and `fcmp` to create these `*CondResult` values and call specialized helpers to lower the value back. On the `select` instruction this also created a matrix of ways-to-produce-the-condition-code crossed with ways-to-handle-the-condition-code to conditionally move GPR, XMM, or 128-bit values. The end result was a fair bit of duplication across rules, with optimizations on some rules but not others. For example, `brif` on a `vall_true` value was optimized but `select` was not.

The design implemented in this commit is modeled after what the riscv64 backend currently does. There is a top-level helper, `is_nonzero_cmp`, which takes a `Value` and produces a `CondResult`. This `CondResult` merges `IcmpCondResult` and `FcmpCondResult` into one type. It centralizes the ability to take any value and produce condition codes: for example, it handles `icmp`, `fcmp`, 128-bit integers, `v{all,any}_true`, etc. Once the condition is produced, it is handled separately at each location for jumps, selects, traps, etc. In effect `CondResult` serves as a "narrow waist" through which all production of condition codes flows, meaning that if an optimization is added for producing condition codes, all instructions benefit instead of each one having to be hand-updated.
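To make the "narrow waist" idea concrete, here is a minimal Rust sketch. It is a toy model, not Cranelift's actual code (the real lowering is written as backend rules): the `CC` subset, the `Value` variants, and the `lower_brif` / `lower_trapnz` helpers are all hypothetical, and the emitted "instructions" are just strings. Only the names `is_nonzero_cmp` and `CondResult` come from the description above.

```rust
// Toy model only: these types imitate the shape described above and are
// not Cranelift's actual definitions.

/// A small, illustrative subset of x64-style condition codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CC { Z, NZ, L, NL, P, NP }

impl CC {
    /// The condition code that holds exactly when `self` does not.
    fn invert(self) -> CC {
        match self {
            CC::Z => CC::NZ, CC::NZ => CC::Z,
            CC::L => CC::NL, CC::NL => CC::L,
            CC::P => CC::NP, CC::NP => CC::P,
        }
    }
}

/// The "narrow waist": one type describing how to test a condition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CondResult {
    CC(CC),
    And(CC, CC),
    Or(CC, CC),
}

/// Hypothetical stand-ins for the instructions that can define a value.
#[derive(Debug, Clone, Copy)]
enum Value { IcmpSlt, FcmpEq, FcmpNe, Plain }

/// Single producer: every way of turning a value into a condition lives here.
fn is_nonzero_cmp(v: Value) -> CondResult {
    match v {
        Value::IcmpSlt => CondResult::CC(CC::L),
        // Float equality also has to rule out the unordered (NaN) case,
        // so it needs two condition codes combined.
        Value::FcmpEq => CondResult::And(CC::NP, CC::Z),
        Value::FcmpNe => CondResult::Or(CC::P, CC::NZ),
        // Any other value is simply tested against zero.
        Value::Plain => CondResult::CC(CC::NZ),
    }
}

/// Consumer 1: conditional branch to `then`, falling through to `else`.
fn lower_brif(v: Value) -> Vec<String> {
    match is_nonzero_cmp(v) {
        CondResult::CC(cc) => vec![format!("j{:?} then", cc)],
        CondResult::And(a, b) => vec![
            format!("j{:?} else", a.invert()),
            format!("j{:?} then", b),
        ],
        CondResult::Or(a, b) => vec![
            format!("j{:?} then", a),
            format!("j{:?} then", b),
        ],
    }
}

/// Consumer 2: trap when the value is nonzero.
fn lower_trapnz(v: Value) -> Vec<String> {
    match is_nonzero_cmp(v) {
        CondResult::CC(cc) => vec![format!("trap_if {:?}", cc)],
        CondResult::And(a, b) => vec![
            format!("j{:?} skip", a.invert()),
            format!("trap_if {:?}", b),
        ],
        CondResult::Or(a, b) => vec![
            format!("trap_if {:?}", a),
            format!("trap_if {:?}", b),
        ],
    }
}
```

In this sketch, teaching `is_nonzero_cmp` about a new kind of `Value` is a one-arm change, and both consumers pick it up without being edited. A `select` consumer would match on the same three `CondResult` shapes.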

The goal of this refactoring was to leave the lowerings unchanged and only refactor internals, but it exposed a few suboptimal lowerings in the backend, which this change improves. In the future I plan to add more ways to pattern-match "produces a condition", which will now benefit all of these locations equally.


abrown

I think this makes sense to me but @fitzgen you may want to double-check the Spectre-related changes.

fitzgen

Nice!

This commit fixes an accidental regression from bytecodealliance#11097 where a `select_spectre_guard` with a boolean condition that and'd two CCs together would fail to lower and cause a panic during lowering. This was reachable when explicit bounds checks are enabled from wasm, for example. The fix here is to handle the `And` condition in the same way that lowering `select` does which is to model that as it flows into the select helper.
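As a rough illustration of why an `And` condition needs its own handling in a branch-free select, here is a small Rust sketch. It is an assumption about the general technique (De Morgan's law plus swapped operands), not a transcription of the actual fix: `Cond`, `cmov`, `select`, and this `select_spectre_guard` function are hypothetical stand-ins, and each `bool` stands for "this condition code holds for the current flags".

```rust
// Toy model: simulates conditional moves on plain integers. None of these
// names are Cranelift APIs.

/// A condition made of one or two condition codes, already evaluated.
#[derive(Debug, Clone, Copy)]
enum Cond {
    One(bool),
    And(bool, bool),
    Or(bool, bool),
}

/// Conditional move: overwrite `dst` with `src` when `cc` holds.
fn cmov(cc: bool, dst: &mut u64, src: u64) {
    if cc {
        *dst = src;
    }
}

/// Pick `if_true` when the condition holds, else `if_false`, using only
/// moves that each test a single condition code.
fn select(cond: Cond, if_true: u64, if_false: u64) -> u64 {
    match cond {
        Cond::One(cc) => {
            let mut dst = if_false;
            cmov(cc, &mut dst, if_true);
            dst
        }
        Cond::Or(a, b) => {
            // Either condition alone is enough to take `if_true`.
            let mut dst = if_false;
            cmov(a, &mut dst, if_true);
            cmov(b, &mut dst, if_true);
            dst
        }
        // `a && b` equals `!(!a || !b)`: invert both codes and swap the
        // operands, so the `Or` path can be reused.
        Cond::And(a, b) => select(Cond::Or(!a, !b), if_false, if_true),
    }
}

/// The guard variant goes through the same helper, so it accepts every
/// condition shape that `select` does, including `And`.
fn select_spectre_guard(cond: Cond, if_true: u64, if_false: u64) -> u64 {
    select(cond, if_true, if_false)
}
```

The design point in this sketch is that the guard shares the helper rather than carrying its own copy of the match, so a condition shape handled for `select` cannot be missing for `select_spectre_guard`.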



* Cranelift: x64: fix user-controlled recursion in cmp emission.

  We had a set of rules introduced in #11097 that attempted to optimize the case of testing the result of an `icmp` for a nonzero value. This allowed optimization of, for example, `(((x == 0) == 0) == 0 ...)` to a single level, either `x == 0` or `x != 0` depending on even/odd nesting depth.

  Unfortunately this kind of recursion in the backend has a depth bounded only by the user input, hence creates a DoS vulnerability: the wrong kind of compiler input can cause a stack overflow in Cranelift at compilation time. This case is reachable from Wasmtime's Wasm frontend via the `i32.eqz` operator (for example) as well.

  Ideally, this kind of deep rewrite is best done in our mid-end optimizer, where we think carefully about bounds for recursive rewrites. The left-hand sides for the backend rules should really be fixed shapes that correspond to machine instructions, rather than ad-hoc peephole optimizations in their own right.

  This fix thus simply removes the recursion case that causes the blowup. The patch includes two tests: one with optimizations disabled, showing correct compilation (without the fix, this case fails to compile with a stack overflow), and one with optimizations enabled, showing that the mid-end properly cleans up the nested expression and we get the expected one-level result anyway.

* Preserve codegen on branches.

  This change works by splitting a rule so that the entry point used by `brif` lowering can still peel off one layer of `icmp` and emit it directly, without entering the unbounded structural recursion. It also adds a mid-end rule to catch one case that we were previously catching in the backend only: `fcmp(...) != 0`.
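The difference between the recursive shape and the bounded one can be sketched in Rust. This is a toy model under stated assumptions, not the backend rules themselves: `Expr`, `Polarity`, `nest`, and the three `polarity_*` functions are hypothetical names, and `Eqz` stands in for a comparison of its operand against zero.

```rust
// Toy model of "is this value nonzero?" over nested compare-with-zero
// expressions. None of these names are Cranelift APIs.

#[derive(Debug, Clone, PartialEq, Eq)]
enum Expr {
    /// Some opaque value `x`.
    Var,
    /// `inner == 0`, producing 0 or 1.
    Eqz(Box<Expr>),
}

/// Which test to apply to the operand that remains.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Polarity { NonZero, Zero }

impl Polarity {
    fn invert(self) -> Polarity {
        match self {
            Polarity::NonZero => Polarity::Zero,
            Polarity::Zero => Polarity::NonZero,
        }
    }
}

/// Build `Eqz(Eqz(...Var...))` with `depth` layers, without recursion.
fn nest(depth: usize) -> Expr {
    let mut e = Expr::Var;
    for _ in 0..depth {
        e = Expr::Eqz(Box::new(e));
    }
    e
}

/// Problematic shape: one stack frame per layer of nesting, so the stack
/// depth is chosen by whoever wrote the input.
fn polarity_recursive(e: &Expr) -> Polarity {
    match e {
        Expr::Var => Polarity::NonZero,
        Expr::Eqz(inner) => polarity_recursive(inner).invert(),
    }
}

/// Same answer with constant stack use: the kind of bounded rewrite that
/// belongs in an optimizer pass.
fn polarity_iterative(mut e: &Expr) -> Polarity {
    let mut p = Polarity::NonZero;
    while let Expr::Eqz(inner) = e {
        p = p.invert();
        e = &**inner;
    }
    p
}

/// Fixed-shape alternative: look at no more than one layer and return the
/// operand that still has to be computed as an ordinary value.
fn polarity_one_layer(e: &Expr) -> (Polarity, &Expr) {
    match e {
        Expr::Eqz(inner) => (Polarity::Zero, &**inner),
        other => (Polarity::NonZero, other),
    }
}
```

For three layers of nesting, both the recursive and iterative versions reduce to a single `x == 0` test, while `polarity_one_layer` yields an `== 0` test on the remaining two-layer expression. The sketch deliberately uses shallow inputs only: dropping a very deep `Box` chain in Rust is itself recursive.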


alexcrichton requested review from a team as code owners June 21, 2025 23:06

alexcrichton requested review from abrown and fitzgen and removed request for a team June 21, 2025 23:06

github-actions bot added the cranelift (Issues related to the Cranelift code generator) and cranelift:area:x64 (Issues related to x64 codegen) labels Jun 22, 2025

abrown approved these changes Jun 23, 2025


fitzgen approved these changes Jun 24, 2025


fitzgen added this pull request to the merge queue Jun 24, 2025

Merged via the queue into bytecodealliance:main with commit 3d1cbfd Jun 24, 2025
53 checks passed

alexcrichton deleted the x64-refactor-conditions branch June 24, 2025 15:19

alexcrichton mentioned this pull request Jul 15, 2025

x64: Fix a missing lowering rule for select_spectre_guard #11242

Merged

cfallin mentioned this pull request Jan 13, 2026

Cranelift: x64: fix user-controlled recursion in cmp emission. #12333

Merged

cfallin mentioned this pull request Jan 13, 2026

[36.0.0] Cranelift: x64: fix user-controlled recursion in cmp emission. (#12333) #12338

Merged

cfallin mentioned this pull request Jan 13, 2026

[39.0.0] Cranelift: x64: fix user-controlled recursion in cmp emission. (#12333) #12339

Merged

cfallin mentioned this pull request Jan 13, 2026

[40.0.0] Cranelift: x64: fix user-controlled recursion in cmp emission. (#12333) #12341

Merged

cfallin mentioned this pull request Jan 13, 2026

[41.0.0] Cranelift: x64: fix user-controlled recursion in cmp emission. (#12333) #12342

Merged
