
x64: Refactor backend condition handling #11097

Merged
fitzgen merged 1 commit into bytecodealliance:main from alexcrichton:x64-refactor-conditions on Jun 24, 2025

@alexcrichton

This commit refactors the x64 backend's handling of condition codes
for various instructions, with the goal of simplifying the conversion
of a `Value` to a condition code used for `select` and `brif`. It also
tries to deduplicate handling of condition codes, for example across
`icmp` and `fcmp`.

Previously the x64 backend had types such as `IcmpCondResult` and
`FcmpCondResult` which represented the condition codes for `icmp` and
`fcmp` values. When used in `select`, `brif`, or `trap{n,}z`, each
lowering had to special-case `icmp` and `fcmp` to create these
`*CondResult` values and call specialized helpers to lower the value
back. On the `select` instruction this also created a matrix of
ways-to-produce-the-condition-code by
ways-to-handle-the-condition-code to conditionally move GPR, XMM, or
128-bit values. The end result was a fair bit of duplication across
rules, and optimizations that applied to some rules but not others. For
example, `brif` on a `vall_true` value was optimized but `select` was
not.

The design implemented in this commit is modeled after what the riscv64
backend currently does. There is a top-level helper, `is_nonzero_cmp`,
which takes a `Value` and produces a `CondResult`. `CondResult` merges
`IcmpCondResult` and `FcmpCondResult` into one type. This centralizes
the ability to take any value and produce condition codes; for example
it handles `icmp`, `fcmp`, 128-bit integers, `v{all,any}_true`, etc.
Once the condition is produced it is handled separately at each
location: jumps, selects, traps, etc. In effect `CondResult` serves as a
"narrow waist" through which all production of condition codes flows, so
an optimization added to condition-code production benefits every
instruction instead of requiring each one to be updated by hand.
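
To make the "narrow waist" idea concrete, here is a minimal Rust sketch. It is
not the backend's actual code (the real rules live in the x64 lowering and are
not shown on this page): the `Value` variants, the register-name operands, and
the `lower_brif`/`lower_select` functions are all hypothetical, and the
`CondResult` shape (one condition code, or two combined with and/or) is an
assumption for illustration. The point is only that every consumer matches on
`CondResult` and never on the kind of value that produced it.

```rust
/// A small subset of x86 condition codes, enough for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CC { Z, NZ, L, NL, P, NP }

impl CC {
    fn invert(self) -> CC {
        match self {
            CC::Z => CC::NZ, CC::NZ => CC::Z,
            CC::L => CC::NL, CC::NL => CC::L,
            CC::P => CC::NP, CC::NP => CC::P,
        }
    }
    /// Mnemonic suffix, as in `jz` or `cmovnz`.
    fn suffix(self) -> &'static str {
        match self {
            CC::Z => "z", CC::NZ => "nz", CC::L => "l",
            CC::NL => "nl", CC::P => "p", CC::NP => "np",
        }
    }
}

/// Assumed shape of the merged result: one flag test, or two combined.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CondResult { CC(CC), And(CC, CC), Or(CC, CC) }

/// Toy IR values; operands are register names used only for printing.
enum Value {
    IcmpEq(&'static str, &'static str),
    IcmpSlt(&'static str, &'static str),
    FcmpEq(&'static str, &'static str),
    FcmpNe(&'static str, &'static str),
    Other(&'static str),
}

/// The single producer: returns the flag-setting instruction (Intel syntax)
/// and the condition to test afterwards.
fn is_nonzero_cmp(v: &Value) -> (String, CondResult) {
    match v {
        Value::IcmpEq(a, b) => (format!("cmp {a}, {b}"), CondResult::CC(CC::Z)),
        Value::IcmpSlt(a, b) => (format!("cmp {a}, {b}"), CondResult::CC(CC::L)),
        // After `ucomisd`, "equal" means ZF set and PF clear (not unordered).
        Value::FcmpEq(a, b) => (format!("ucomisd {a}, {b}"), CondResult::And(CC::NP, CC::Z)),
        Value::FcmpNe(a, b) => (format!("ucomisd {a}, {b}"), CondResult::Or(CC::P, CC::NZ)),
        Value::Other(r) => (format!("test {r}, {r}"), CondResult::CC(CC::NZ)),
    }
}

/// Consumer 1: a conditional branch. Matches only on `CondResult`.
fn lower_brif(v: &Value, then_l: &str, else_l: &str) -> Vec<String> {
    let (setup, cond) = is_nonzero_cmp(v);
    let mut out = vec![setup];
    match cond {
        CondResult::CC(cc) => out.push(format!("j{} {then_l}", cc.suffix())),
        CondResult::And(a, b) => {
            out.push(format!("j{} {else_l}", a.invert().suffix()));
            out.push(format!("j{} {then_l}", b.suffix()));
        }
        CondResult::Or(a, b) => {
            out.push(format!("j{} {then_l}", a.suffix()));
            out.push(format!("j{} {then_l}", b.suffix()));
        }
    }
    out.push(format!("jmp {else_l}"));
    out
}

/// Consumer 2: a conditional move into `dst`. Same producer, different use.
fn lower_select(v: &Value, dst: &str, t: &str, f: &str) -> Vec<String> {
    let (setup, cond) = is_nonzero_cmp(v);
    let mut out = vec![setup];
    match cond {
        CondResult::CC(cc) => {
            out.push(format!("mov {dst}, {f}"));
            out.push(format!("cmov{} {dst}, {t}", cc.suffix()));
        }
        CondResult::And(a, b) => {
            out.push(format!("mov {dst}, {t}"));
            out.push(format!("cmov{} {dst}, {f}", a.invert().suffix()));
            out.push(format!("cmov{} {dst}, {f}", b.invert().suffix()));
        }
        CondResult::Or(a, b) => {
            out.push(format!("mov {dst}, {f}"));
            out.push(format!("cmov{} {dst}, {t}", a.suffix()));
            out.push(format!("cmov{} {dst}, {t}", b.suffix()));
        }
    }
    out
}
```

In this model, teaching `is_nonzero_cmp` about a new kind of value would
improve both `lower_brif` and `lower_select` without touching either, which is
the property the description above is after.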

The goal of this refactoring was to change only internals and not any
lowerings, but it exposed a few non-optimal lowerings in the backend,
which were improved as part of this change. In the future I plan to add
more ways to pattern-match "produces a condition", which will now
benefit all of these locations equally.
@alexcrichton alexcrichton requested review from a team as code owners June 21, 2025 23:06
@alexcrichton alexcrichton requested review from abrown and fitzgen and removed request for a team June 21, 2025 23:06
@github-actions github-actions bot added the cranelift and cranelift:area:x64 labels Jun 22, 2025
@abrown abrown left a comment

I think this makes sense to me but @fitzgen you may want to double-check the Spectre-related changes.

@fitzgen fitzgen left a comment

Nice!

@fitzgen fitzgen added this pull request to the merge queue Jun 24, 2025
Merged via the queue into bytecodealliance:main with commit 3d1cbfd Jun 24, 2025
53 checks passed
@alexcrichton alexcrichton deleted the x64-refactor-conditions branch June 24, 2025 15:19
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Jul 15, 2025
This commit fixes an accidental regression from bytecodealliance#11097 where a
`select_spectre_guard` with a boolean condition that and'd two CCs
together would fail to lower, causing a panic during lowering. This was
reachable when explicit bounds checks are enabled from wasm, for
example. The fix here is to handle the `And` condition in the same way
that lowering `select` does, which is to model it as it flows into the
select helper.
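
As an illustration of why a condition made of two and'd condition codes needs
its own handling in a select, here is a small executable model in Rust. It is
a sketch under assumptions, not the backend's lowering: the `Flags` struct,
`cmov`, and `select_and` are hypothetical names, and floating-point equality
after `ucomisd` is used only as a familiar source of a two-flag condition
(the commit message does not say which comparison triggered the regression).
The model starts from the "true" value and overwrites it with the "false"
value if either condition code fails, using two conditional moves.

```rust
/// The two flags this sketch cares about.
#[derive(Clone, Copy, Debug)]
struct Flags { zf: bool, pf: bool }

#[derive(Clone, Copy, Debug)]
enum CC { Z, NZ, P, NP }

fn holds(cc: CC, f: Flags) -> bool {
    match cc {
        CC::Z => f.zf,
        CC::NZ => !f.zf,
        CC::P => f.pf,
        CC::NP => !f.pf,
    }
}

fn invert(cc: CC) -> CC {
    match cc {
        CC::Z => CC::NZ,
        CC::NZ => CC::Z,
        CC::P => CC::NP,
        CC::NP => CC::P,
    }
}

/// ZF/PF as set by x86 `ucomisd`: unordered sets both, otherwise ZF means equal.
fn ucomisd_flags(x: f64, y: f64) -> Flags {
    if x.is_nan() || y.is_nan() {
        Flags { zf: true, pf: true }
    } else {
        Flags { zf: x == y, pf: false }
    }
}

/// Model of a conditional move: `dst = src` only when `cc` holds.
/// (The Rust `if` here only models the instruction's semantics.)
fn cmov(cc: CC, f: Flags, dst: &mut u64, src: u64) {
    if holds(cc, f) {
        *dst = src;
    }
}

/// Select on `a AND b`: begin with `if_true`, then fall back to `if_false`
/// when either code fails. Two cmovs, no branch on the condition.
fn select_and(a: CC, b: CC, f: Flags, if_true: u64, if_false: u64) -> u64 {
    let mut dst = if_true;
    cmov(invert(a), f, &mut dst, if_false);
    cmov(invert(b), f, &mut dst, if_false);
    dst
}

/// Example use: select on "x == y" for floats, i.e. NP and Z together.
fn select_if_float_eq(x: f64, y: f64, if_true: u64, if_false: u64) -> u64 {
    select_and(CC::NP, CC::Z, ucomisd_flags(x, y), if_true, if_false)
}
```

Calling `select_if_float_eq(f64::NAN, f64::NAN, 10, 20)` yields the "false"
value in this model, because the parity flag marks the comparison as
unordered even though the zero flag is set.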
github-merge-queue bot pushed a commit that referenced this pull request Jul 15, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Jul 15, 2025
…liance#11242)

alexcrichton added a commit that referenced this pull request Jul 22, 2025
…11248)

bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
…liance#11242)

cfallin added a commit to cfallin/wasmtime that referenced this pull request Jan 13, 2026
github-merge-queue bot pushed a commit that referenced this pull request Jan 13, 2026
* Cranelift: x64: fix user-controlled recursion in cmp emission.

We had a set of rules introduced in #11097 that attempted to optimize
the case of testing the result of an `icmp` for a nonzero value. This
allowed optimization of, for example, `(((x == 0) == 0) == 0 ...)` to
a single level, either `x == 0` or `x != 0` depending on even/odd
nesting depth.

Unfortunately this kind of recursion in the backend has a depth
bounded only by the user input, hence creates a DoS vulnerability: the
wrong kind of compiler input can cause a stack overflow in Cranelift
at compilation time. This case is reachable from Wasmtime's Wasm
frontend via the `i32.eqz` operator (for example) as well.
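
The parity argument and the recursion hazard can be sketched in a few lines of
Rust. This is a toy model and not the removed rules themselves: the `Inst`
enum, the index-based instruction list, and both functions are hypothetical.
`cond_recursive` mirrors the shape of a rule that re-enters itself once per
nested `== 0`, so its stack depth follows the input. `cond_loop` computes the
same answer with a loop and is included only to show the even/odd result at a
nesting depth where recursion would be risky; it is not the fix this commit
takes.

```rust
/// The folded, single-level comparison against zero.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Cmp { EqZero, NeZero }

impl Cmp {
    fn flip(self) -> Cmp {
        match self {
            Cmp::EqZero => Cmp::NeZero,
            Cmp::NeZero => Cmp::EqZero,
        }
    }
}

/// Toy instructions: a function parameter, or `icmp eq v, 0` on an earlier value.
#[derive(Clone, Copy, Debug)]
enum Inst { Param, IcmpEqZero(usize) }

/// Build `x`, `x == 0`, `(x == 0) == 0`, ... with `depth` comparisons.
/// Value 0 is `x`; value `depth` is the outermost comparison.
fn chain(depth: usize) -> Vec<Inst> {
    let mut insts = vec![Inst::Param];
    for i in 0..depth {
        insts.push(Inst::IcmpEqZero(i));
    }
    insts
}

/// "Is value `v` nonzero?" as one comparison on the innermost value.
/// Recursive: one stack frame per nesting level of the input.
fn cond_recursive(insts: &[Inst], v: usize) -> (Cmp, usize) {
    match insts[v] {
        Inst::Param => (Cmp::NeZero, v),
        Inst::IcmpEqZero(inner) => {
            let (cmp, root) = cond_recursive(insts, inner);
            (cmp.flip(), root)
        }
    }
}

/// Same result computed with a loop, so stack use does not grow with depth.
fn cond_loop(insts: &[Inst], mut v: usize) -> (Cmp, usize) {
    let mut cmp = Cmp::NeZero;
    while let Inst::IcmpEqZero(inner) = insts[v] {
        cmp = cmp.flip();
        v = inner;
    }
    (cmp, v)
}
```

With three nested comparisons the model answers `x == 0`, with two it answers
`x != 0`, matching the even/odd description above. The recursive form is only
safe to call here on shallow chains.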

Ideally, this kind of deep rewrite is best done in our mid-end
optimizer, where we think carefully about bounds for recursive
rewrites. The left-hand sides for the backend rules should really be
fixed shapes that correspond to machine instructions, rather than
ad-hoc peephole optimizations in their own right.

This fix thus simply removes the recursion case that causes the
blowup. The patch includes two tests: one with optimizations disabled,
showing correct compilation (without the fix, this case fails to
compile with a stack overflow), and one with optimizations enabled,
showing that the mid-end properly cleans up the nested expression and
we get the expected one-level result anyway.

* Preserve codegen on branches.

This change works by splitting a rule so that the entry point used by
`brif` lowering can still peel off one layer of `icmp` and emit it
directly, without entering the unbounded structural recursion.
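
A rough Rust model of "peel off one layer" follows. The names (`Inst`,
`BranchCond`, `brif_cond`) are hypothetical and the actual change is a split
of lowering rules rather than a Rust function; the sketch only shows the
bounded shape. The branch entry point looks at the defining instruction of
its condition exactly once: if it is an `icmp`, that comparison is used
directly, otherwise the value is tested against zero. It never looks through
the operands, so the work is constant regardless of nesting.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum IntCC { Equal, NotEqual }

/// Toy instructions; operands are indices of earlier values.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Inst {
    Param,
    Iconst(i64),
    Icmp(IntCC, usize, usize),
}

/// What the branch will test.
#[derive(Debug, PartialEq, Eq)]
enum BranchCond {
    /// Compare two already-computed values and branch on `cc`.
    Compare(IntCC, usize, usize),
    /// Fallback: test the value itself against zero.
    TestNonZero(usize),
}

/// Peel at most one `icmp` off the branch condition. No recursion: the
/// operands of the peeled `icmp` are treated as opaque values.
fn brif_cond(insts: &[Inst], v: usize) -> BranchCond {
    match insts[v] {
        Inst::Icmp(cc, lhs, rhs) => BranchCond::Compare(cc, lhs, rhs),
        _ => BranchCond::TestNonZero(v),
    }
}
```

For a nested input such as `(x == 0) == 0`, this model branches on a
comparison of the inner `x == 0` value with zero, leaving any further
simplification to an earlier pass, which is the division of labor the commit
message argues for.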

It also adds a mid-end rule to catch one case that we were previously
catching in the backend only: `fcmp(...) != 0`.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Jan 13, 2026
…odealliance#12333)

cfallin added a commit that referenced this pull request Jan 14, 2026
… (#12338)

cfallin added a commit that referenced this pull request Jan 14, 2026
… (#12341)

cfallin added a commit that referenced this pull request Jan 14, 2026
… (#12339)

cfallin added a commit that referenced this pull request Jan 14, 2026
… (#12342)
