Determine GPUExternalTexture lifetime #2124

Closed
Kangz opened this issue Sep 17, 2021 · 27 comments · Fixed by #2302

@Kangz
Contributor

Kangz commented Sep 17, 2021

It is important that importExternalTexture works with the ubiquitous HTMLVideoElement, as it does today in the spec. However, for browsers that implement WebCodecs we should also be able to import a VideoFrame as a GPUExternalTexture. Since the API will take a VideoFrame, it will simply not be possible to use it when WebCodecs isn't present, since you wouldn't be able to create a VideoFrame.

I think the only change in the API needed is the following:

dictionary GPUExternalTextureDescriptor : GPUObjectDescriptorBase {
-   required HTMLVideoElement source;
+   required (HTMLVideoElement or VideoFrame) source;
    // ...
};

Then we also need to specify what's the lifetime of the resulting GPUExternalTexture. Is it destroyed on VideoFrame.close()?

FYI @shaoboyan

[EDIT by @kainino0x]: Investigation: #1380

@Kangz Kangz added this to the V1.0 milestone Sep 17, 2021
@shaoboyan
Contributor

@Kangz Thanks for proposing this. VideoFrame.close() is at least one of the signals to destroy the GPUExternalTexture, and the current lifecycle logic could still work for GPUExternalTexture.

@kainino0x
Contributor

I'm finally thinking about this. Here are three reasonable options for the lifetimes, considering an example 3D video application that has e.g. a 24Hz video and a 60Hz render rate.

  1. No GPUExternalTexture.close().

    • When created from VideoFrame, closing the VideoFrame closes the GPUExternalTexture. It doesn't close earlier.
    • When created from <video>, it gets closed automatically in a microtask as today. App would simply re-import every time they want to use the video, but browsers likely want to cache one frame to avoid unnecessary re-importing work.
    • Applications that need more control than provided by importing <video> must use VideoFrame.
  2. Add GPUExternalTexture.close() but keep the auto-closing semantics when importing from <video>. (Close would just close early.)

    • When created from VideoFrame, closing either the VideoFrame or the GPUExternalTexture does NOT close the other. Both must be closed explicitly.
    • Still requires caching as above.
  3. Add GPUExternalTexture.close() and change the semantics of imports from <video>.

    • When created from <video>, it must be closed explicitly (causing a warning if it is GCed before closed). App would have to implement caching itself (if it wants it), by watching for requestVideoFrameCallback callbacks (or possibly by watching the currentTime attribute for changes, but IIRC this is imprecise?) and closing the previous frame when a new one is available.
    • This is more explicit and somewhat matches the style of the rest of WebGPU better.
    • Applications could stall a <video> by holding onto too many imported frames from the decoder pool. (It is no worse than [Chromium's implementation of] new VideoFrame(video_element).) If the browser uses a decoder ring buffer instead of a decoder pool (does any?), holding just one frame for too long could stall the decoder.
    • Requires browser to be able to hold strong-refs to video frames. Easy for Chromium (we do this in VideoFrame), but not sure about other browsers.

Needless to say I'm leaning away from option 3, despite the browser caching in 1/2. But maybe there is more to consider here.
Among 1/2, I slightly prefer 1, as early-closing should be unnecessary and it avoids adding a .close() method.

@shaoboyan
Contributor

shaoboyan commented Nov 3, 2021

I'm a fan of option 1.
For option 2, I feel a bit weird about "closing either the VideoFrame or the GPUExternalTexture does NOT close the other."
Option 3 is too complicated and may stall the decoder.

For option 1, from the implementation point of view, we need a mechanism to ensure VideoFrame.close() notifies WebGPU, so that we can report errors in the scenario below:

const externalTextureDescriptor = { source: videoFrame };
const externalTexture = device.importExternalTexture(externalTextureDescriptor);
// ...
videoFrame.close();
// ...
const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    {
      binding: 0,
      resource: externalTexture, // should report a validation error
    },
  ],
});
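
The notification mechanism described above can be sketched with plain JavaScript stand-ins. This is only an illustrative mock, assuming option 1 semantics: `MockVideoFrame`, `MockExternalTexture`, `MockDevice`, the `onClose` listener list and the `expired` flag are all hypothetical names, not real WebGPU or WebCodecs API, and a real implementation would report the error through WebGPU's error scopes rather than an array.

```javascript
// Hypothetical stand-ins; none of these classes are real WebGPU/WebCodecs objects.
class MockVideoFrame {
  constructor() {
    this.closed = false;
    this.onClose = []; // listeners notified once by close()
  }
  close() {
    if (this.closed) return;
    this.closed = true;
    for (const listener of this.onClose) listener();
  }
}

class MockExternalTexture {
  constructor() {
    this.expired = false;
  }
}

class MockDevice {
  constructor() {
    this.validationErrors = [];
  }
  importExternalTexture({ source }) {
    const texture = new MockExternalTexture();
    if (source.closed) {
      texture.expired = true;
    } else {
      // Closing the frame expires the texture; nothing else does.
      source.onClose.push(() => { texture.expired = true; });
    }
    return texture;
  }
  createBindGroup({ entries }) {
    for (const entry of entries) {
      if (entry.resource instanceof MockExternalTexture && entry.resource.expired) {
        this.validationErrors.push(`binding ${entry.binding}: external texture has expired`);
        return { valid: false };
      }
    }
    return { valid: true };
  }
}

const device = new MockDevice();
const videoFrame = new MockVideoFrame();
const externalTexture = device.importExternalTexture({ source: videoFrame });

const before = device.createBindGroup({ entries: [{ binding: 0, resource: externalTexture }] });
videoFrame.close();
const after = device.createBindGroup({ entries: [{ binding: 0, resource: externalTexture }] });

console.log(before.valid, after.valid, device.validationErrors.length); // true false 1
```

The mock keeps the link one-directional on purpose: the frame knows about its textures, but the texture has no way to close the frame.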

@Kangz
Contributor Author

Kangz commented Nov 3, 2021

It seems there are two orthogonal questions that may not need to be resolved the same way.

A) How is the lifetime of imported <video> managed? We have three solutions outlined so far:

  1. The application explicitly calls GPUExternalTexture.destroy/close() and risks stalling the video decoder if it doesn't do it correctly.
  2. The GPUExternalTexture is automatically closed at the end of the microtask. Requires caching on the browser side.
  3. The GPUExternalTexture is closed when the callback passed to requestVideoFrameCallback is called. No caching is required, but WebGPU takes a dependency on browsers implementing requestVideoFrameCallback.

B) How is the lifetime of imported VideoFrame managed? Solutions so far:

  1. There is only VideoFrame.close() that also controls the lifetime of the GPUExternalTexture.
  2. Both GPUExternalTexture.destroy/close() and VideoFrame.close() close the GPUExternalTexture.
  3. VideoFrame.close() doesn't do anything on the GPUExternalTexture, only GPUExternalTexture.destroy/close() does.

I think A3 would be ideal but it depends on how other browsers feel about requestVideoFrameCallback. Either B1 or B2 seems OK; I have concerns with B3 because it means that either WebCodecs or WebGPU can stall the video decoder.

@kainino0x
Contributor

For A3 I suppose we could just expose a GPUExternalTexture."is still alive" attribute (guaranteed not to change inside a task). This would expose duplicate information with rVFC but is much more straightforward/obvious and doesn't require rVFC support. My concern either way is that it's rather fragile, and pages could end up depending on lifetimes that don't turn out to be portable. Maybe it works if the spec is constraining enough, like, say it can only change on rAF boundaries just after the rAF callback? I don't know if there are any constraints today on when rVFC can fire.

So for this reason I think I still like A2. No real opinion about B; as long as we put in the right warnings, developers will adapt easily to any of these.

@kdashg
Contributor

kdashg commented Nov 4, 2021

I like A2.
I think the pre-standards aspect of WebCodec makes B hard to sign up for/agree to.

@Kangz
Contributor Author

Kangz commented Nov 4, 2021

> My concern either way is that it's rather fragile, and pages could end up depending on lifetimes that don't turn out to be portable.

This seems like a minor concern in the face of usability. There are already 1000 ways promises / callbacks get scheduled differently on the Web platform between browsers, and developers almost always get things correct. Over-specifying the scheduling of things takes some control away from browsers (like tab freezing / throttling, etc).

> I like A2.

Please describe why you like it better than others, otherwise it's hard to discuss alternatives.

> I think the pre-standards aspect of WebCodec makes B hard to sign up for/agree to.

mozilla/standards-positions#209 describes this as "worth prototyping" and Mozilla is very involved in WebCodec (and even has a group chair). In all cases I think we should design this interaction with the assumption that it can be changed at any time until WebCodec is a stable specification. It's not possible to access the functionality without WebCodec being available anyway so there is no breaking change in changing it when WebCodec isn't a stable shipped spec.

And this is somewhat similar to WebXR / WebGPU integration where we already did design work (though it happened in the immersive web group) for the integration between an unreleased API, and an API that's not in all browsers.

@kainino0x
Contributor

> This seems like a minor concern in the face of usability.

I have mixed thoughts on how much A2 presents usability issues. It will be a bit counterintuitive to users (and potentially perform poorly if there isn't good caching), but the behavior is very reliable which I think will quickly correct any developer misunderstandings.

I was mostly concerned about timing hazards, but upon further discussion with @Kangz I think we can eliminate that issue with stricter timing:

> Maybe it works if the spec is constraining enough, like, say it can only change on rAF boundaries just after the rAF callback? I don't know if there are any constraints today on when rVFC can fire.

I still don't know exactly when rVFC can fire, but I don't think it matters. At least for Chromium, it's definitely possible to hold strong references to video frames (as WebCodecs VideoFrame does this). So simply: the lifetime lasts from importExternalTexture until the end of some future rAF (in particular, the one after the next rVFC).

The user does still need to know when to re-import. rVFC actually might be too often(?) depending on its behavior when e.g. playing a 60Hz video on a 30Hz display. But regardless, it would be handy to have an alternate signal like "is still alive": this cuts the dependency on rVFC and is more practical for rendering use-cases anyway (if (not still alive) texture = device.importExternalTexture(video);).

One minor concern: this sets no expectation of caching in the browser, yet applications could still import too often. I think we could guard against this with a warning if the same video frame is imported multiple times (perhaps in different animation frames). Developers would see such a warning reliably since video framerate is almost always lower than display framerate.
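
The duplicate-import warning suggested above could look roughly like the following. This is a minimal sketch, not browser code: `MockVideo`, its `presentedFrames` counter and `ImportTracker` are hypothetical names standing in for whatever per-frame identity an implementation tracks internally.

```javascript
// Hypothetical sketch: `presentedFrames` stands in for the identity of the
// frame currently shown by the video element.
class MockVideo {
  constructor() {
    this.presentedFrames = 0;
  }
  advance() {
    this.presentedFrames += 1;
  }
}

class ImportTracker {
  constructor() {
    this.lastImported = new WeakMap(); // video -> frame counter at last import
    this.warnings = [];
  }
  recordImport(video) {
    if (this.lastImported.get(video) === video.presentedFrames) {
      this.warnings.push(`video frame ${video.presentedFrames} was imported more than once`);
    }
    this.lastImported.set(video, video.presentedFrames);
  }
}

const video = new MockVideo();
const tracker = new ImportTracker();
tracker.recordImport(video); // first import of frame 0: no warning
tracker.recordImport(video); // same frame again: warning
video.advance();
tracker.recordImport(video); // new frame: no warning

console.log(tracker.warnings.length); // 1
```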

@shaoboyan
Contributor

shaoboyan commented Nov 5, 2021

> The GPUExternalTexture is closed when the callback passed to requestVideoFrameCallback is called. No caching is required, but WebGPU takes a dependency on browsers implementing requestVideoFrameCallback.

If I understand correctly, this means we could only create (or would recommend creating) a GPUExternalTexture in requestVideoFrameCallback, and the benefit is that the browser doesn't need to cache the frame. This is because the display (browser) and the video itself run at different frame rates.

"Bullet screen" (scrolling comments overlaid on a video) is one case where the graphics need to run at the display frame rate rather than the video frame rate. When watching a "bullet screen", we can always find some fast bullets that move across the screen smoothly. I think rendering the bullet animation at the display frame rate gives a better experience (meaning the rendering logic should run in rAF). So if we require users to create the GPUExternalTexture in rVFC while they need to render bullets in rAF, the logic gets a bit complex. It might be:

const bulletTexture = device.createTexture(...);

window.requestAnimationFrame(function renderingBullets() {
  // Render bulletTexture in a separate draw.
});

video.requestVideoFrameCallback(function updateExternalTexture() {
  const externalTexture = device.importExternalTexture(...);
  // Render the video frame first.
});

Nowadays, "bullet screen" also has a popular technique called "anti-mask" (my own translation, which may not be accurate), in which the bullets don't cover human faces or other important objects in the video frame; instead they fly behind them. It is as if different parts of the video frame have different depths, and the app then composites them with the bullets. I'm not sure the code above could handle this effectively (here is a showcase if you are interested).

Having a cache in the browser seems like a good way to improve performance. In Chromium, WebGL also caches one frame to improve texImage2D(HTMLVideoElement) performance. The problem is achieving zero-copy with the cache. I think it might be possible if we could keep the single frame from being updated until a new frame arrives or the decoder tells WebGPU to release the cache.

> The GPUExternalTexture is automatically closed at the end of the microtask. Requires caching on the browser side.

I prefer A2 for importing <video>. Constraining the lifecycle of GPUExternalTexture to the microtask is reasonable. If users want to cache the frame, they can do so by rendering the frame into a texture. A destroy() function doesn't seem useful.

> There is only VideoFrame.close() that also controls the lifetime of the GPUExternalTexture.

I prefer B1 for importing VideoFrame. For me, if VideoFrame.close() doesn't affect the GPUExternalTexture, it means that the GPUExternalTexture is a copy of the VideoFrame. That is not what I expected. I think of GPUExternalTexture as a stub in WebGPU for an external object. If the external object has been destroyed, the stub should be invalid too.

@Kangz
Contributor Author

Kangz commented Nov 5, 2021

Let me rephrase the proposal @kainino0x made above, and let's name it A4:

GPUExternalTextures from HTMLVideoElement can only be invalidated at the end of a rAF. When first imported, GPUExternalTextures are guaranteed to be valid at least until the end of the next rAF (or the current one if we're in a rAF callback). GPUExternalTexture exposes some member that says whether it is still valid, like .valid.

This means that developers can use the following code:

var videoTexture = null;
var videoBindGroup = null;
function frame() {
    if (!videoTexture || !videoTexture.valid) {
        videoTexture = device.importExternalTexture({source: myVideo});
        videoBindGroup = device.createBindGroup(...);
    }

    // Do something with the video bind group

    requestAnimationFrame(frame);
}

requestAnimationFrame(frame);

> I prefer A2 for importing <video>. Constraining the lifecycle of GPUExternalTexture to the microtask is reasonable. If users want to cache the frame, they can do so by rendering the frame into a texture. A destroy() function doesn't seem useful.

While this is a workable solution, I think it makes things more difficult for developers if they want to use external texture in more asynchronous code. It's also better to design a solution that doesn't require a cache in the browser if possible because it reduces implementation complexity and makes things simpler to reason about for developers.
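
To get a feel for how often the A4 loop above would re-import, here is a small synchronous simulation of the 24Hz video / 60Hz render case mentioned earlier in the thread. It is a sketch under stated assumptions: `MockVideo`, `MockDevice` and `endOfAnimationFrame` are hypothetical, the `valid` flag follows the proposal's naming, and the video is advanced at a fixed point in each tick even though real playback is asynchronous.

```javascript
const RENDER_HZ = 60; // display/render rate from the thread's example
const VIDEO_HZ = 24;  // video frame rate from the thread's example

class MockVideo {
  constructor() {
    this.frameIndex = 0; // index of the frame currently presented
  }
}

class MockDevice {
  constructor() {
    this.liveTextures = [];
    this.importCount = 0;
  }
  importExternalTexture({ source }) {
    const texture = { source, frameIndex: source.frameIndex, valid: true };
    this.liveTextures.push(texture);
    this.importCount += 1;
    return texture;
  }
  // Hypothetical hook: under A4, invalidation may only happen here.
  endOfAnimationFrame() {
    for (const texture of this.liveTextures) {
      if (texture.source.frameIndex !== texture.frameIndex) texture.valid = false;
    }
    this.liveTextures = this.liveTextures.filter((t) => t.valid);
  }
}

const device = new MockDevice();
const myVideo = new MockVideo();
let videoTexture = null;
let rendered = 0;

function frame() {
  if (!videoTexture || !videoTexture.valid) {
    videoTexture = device.importExternalTexture({ source: myVideo });
  }
  rendered += 1; // stand-in for "do something with the video bind group"
}

// Simulate one second of rendering.
for (let tick = 0; tick < RENDER_HZ; tick++) {
  frame();
  // Assumption: the video advances to where it will be at the next tick.
  myVideo.frameIndex = Math.floor(((tick + 1) * VIDEO_HZ) / RENDER_HZ);
  device.endOfAnimationFrame();
}

console.log(`${device.importCount} imports for ${rendered} rendered frames`);
// 24 imports for 60 rendered frames
```

In this model the application imports once per distinct video frame without any browser-side cache, which is the property the proposal is after.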

@kainino0x
Contributor

> If user wants to cache the frame, they could do it by rendering the frame into a texture.

This would be unnecessarily costly because it would always create an extra copy. If the browser performs the caching, it would have the option to keep the video frame alive and cache a simple wrapper instead.

I suppose this is an example of the confusing nature of A2. It's hard to tell whether app-side copy or repeated re-importing is better.

@shaoboyan
Contributor

shaoboyan commented Nov 8, 2021

> Let me rephrase the proposal @kainino0x made above, and let's name it A4:

OK, my point is that I don't like constraining GPUExternalTexture to rVFC, which is different from the other rendering logic. I think A4 strikes a balance: the browser doesn't need an extra cache and users don't need to cache things themselves. It seems great.

@kainino0x
Contributor

kainino0x commented Nov 9, 2021

Unfortunately the reality of videos is that they're highly asynchronous and don't schedule nicely with rAF. I think we're in agreement that A3 (in which the import ends at rVFC) is not ideal.

I'll open a PR soon to propose A4 (after a little thinking about how to expose the "is valid" flag).

Also, that should give us some clarity on VideoFrame since A4 doesn't add a destroy method, so regardless of whether we spec it here or not I think it's reasonable to settle on option B1.

@kainino0x kainino0x self-assigned this Nov 9, 2021
@kainino0x
Contributor

@grorg @jdashg Can you provide feedback on whether proposal A4 will work for you, specifically:
Can you hold an HTMLVideoElement video frame (which could be a decoder pool resource) alive for a browser-controlled amount of time that would be no longer than one rAF after the video element has advanced past that frame?

@kainino0x
Contributor

FTR the corresponding investigation for VideoFrame is #1380.

@litherum
Contributor

Can this issue be renamed to something that doesn't use the word "WebCodec"? During the last call it was stated that it doesn't actually have anything to do with WebCodec.

@kainino0x
Contributor

There are two issues here, one is WebCodecs and one isn't.

@kainino0x kainino0x changed the title from "Specify the importExternalTexture of WebCodec VideoFrame" to "Allow importing VideoFrames and determine GPUExternalTexture lifetime" Nov 16, 2021
@kainino0x
Contributor

The lifetime issue came up again here because it was blocking the VideoFrame API design.

@kainino0x
Contributor

kainino0x commented Nov 23, 2021

> I don't know if there are any constraints today on when rVFC can fire.

#2302 (comment)

@litherum
Contributor

Our video decoders decode into a pool of IOSurfaces. An IOSurface will be re-used from the pool if it has a reference count of 1 (meaning the pool is the only thing referencing it). If there is no such IOSurface in the pool, a new one is allocated.

IOSurfaces are "wired" kernel memory, which means the kernel's VM system can't relocate the addresses or pages. Therefore, they require more maintenance care than regular allocations.

This means that we would be willing to allocate a few extra IOSurfaces if JS retains them, but we would not be willing to give JS carte blanche to retain as many of them as it feels like.
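
The pool behavior described above can be sketched as follows. This is an illustrative model only: `SurfacePool`, `acquire`, `release` and the `maxSurfaces` cap are hypothetical names, and the specific cap value is an assumption standing in for "a few extra" surfaces; it says nothing about how any browser actually sizes its pool.

```javascript
// Hypothetical model of a decoder surface pool with a hard cap.
class SurfacePool {
  constructor(maxSurfaces) {
    this.maxSurfaces = maxSurfaces; // assumed limit on outstanding surfaces
    this.surfaces = [];
  }
  acquire() {
    // A surface is reusable when the pool holds the only reference to it.
    let surface = this.surfaces.find((s) => s.refCount === 1);
    if (!surface) {
      if (this.surfaces.length >= this.maxSurfaces) {
        return null; // cap reached: the decoder would have to stall
      }
      surface = { id: this.surfaces.length, refCount: 1 }; // pool's own reference
      this.surfaces.push(surface);
    }
    surface.refCount += 1; // the caller's reference
    return surface;
  }
  release(surface) {
    surface.refCount -= 1;
  }
}

const pool = new SurfacePool(3);
const a = pool.acquire(); // allocates surface 0
const b = pool.acquire(); // allocates surface 1
pool.release(a);          // surface 0 is back to refCount 1
const c = pool.acquire(); // reuses surface 0
const d = pool.acquire(); // allocates surface 2
const e = pool.acquire(); // null: all three surfaces are retained

console.log(c.id, d.id, e); // 0 2 null
```

Returning `null` is just the simplest way to show the stall in a synchronous sketch; a real decoder would wait until a surface is released.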

@kainino0x
Contributor

Thanks @litherum! That shouldn't be a problem for the proposal I'm working on in #2302.

@kainino0x
Contributor

Drafted w3c/webcodecs#412 to consider adding the spec to WebCodecs.

@kainino0x
Contributor

Resolved to open that draft. I'll take care of driving it on the webcodecs side.

With that plus #2302 this issue will be closed!

@dalecurtis

dalecurtis commented Dec 2, 2021

@litherum wrote:

> This means that we would be willing to allocate a few extra IOSurfaces if JS retains them, but we would not be willing to give JS carte blanche to retain as many of them as it feels like.

FWIW, the WebCodecs spec allows implementations to stall decoding for any reason, the most common being that the client has exhausted the allowed supply of hardware decoding buffers. So WebKit would be free to limit the number of outstanding IOSurfaces to whatever it feels is appropriate.

@Kangz Kangz changed the title from "Allow importing VideoFrames and determine GPUExternalTexture lifetime" to "Determine GPUExternalTexture lifetime" Jan 12, 2022
@Kangz
Contributor Author

Kangz commented Jan 12, 2022

WebCodec should be a different issue.

@kainino0x
Contributor

Filed #2498 for WebCodecs. Closing this since this issue has too many mixed topics. #2302 is still open but we can track it on that PR instead of here.

kainino0x added a commit that referenced this issue Feb 18, 2022

* Extend lifetime of GPUExternalTexture imported from video element

Original PR: #1666
Discussion: #2124

* adjust lifetime, add flag

* editorial
Daasin added a commit to FOSS-Archives/WebGPU that referenced this issue Mar 21, 2022
* Always allow reassociation (gpuweb#2403)

Fixed: gpuweb#2402

* Update question.md

* Fix GPUBuffer.destroy when "mapping pending" (gpuweb#2411)

Fixes gpuweb#2410

* CopyExternalImageToTexture: Support copy from webgpu context canvas/offscreenCanvas (gpuweb#2375)

Address gpuweb#2350.

* Fix typo in requestDevice and clarify error cases (gpuweb#2415)

* Disallow empty buffer/texture usages (gpuweb#2422)

* Specify dispatchIndirect behavior when exceeding limit (gpuweb#2417)

The rest of the behavior of this command isn't specified yet, but this
gets this into the spec so we can close the issue and edit later.

Fixes gpuweb#323

* Require buffer bindings to have non-zero size (gpuweb#2419)

* Require depth clear value in 0.0-1.0 (gpuweb#2421)

* Require depth clear value in 0.0-1.0

* clarify handling of union

* Fix error conventions for optional features (gpuweb#2420)

* Fix error conventions for optional features

* relax claim

* Add increment and decrement statements (gpuweb#2342)

* Add increment and decrement statements

Phrased in terms of += and -= complex assignments

Fixes: gpuweb#2320

* Remove the note about ++ and -- being reserved

* Specify order of evaluation for expressions (gpuweb#2413)

Fixes gpuweb#2261

* Expressions (and function parameters) are evaluated in left-to-right
  order
* Remove todos that are covered elsewhere in the spec
  * variable lifetime
  * statement and intra-statement order

* Remove [[block]] attribute from sample (gpuweb#2439)

* Enable dynamic indexing on matrix and array values (gpuweb#2427)

Fixes: gpuweb#1782

* Behaviour of empty statement is {Next} (gpuweb#2432)

Fixes: gpuweb#2431

* wgsl: Fix TODOs in Execution section (gpuweb#2433)

- in Technical overview, say that evaluation of module-scope constants
  is the first thing to be executed
- Remove "Before an entry point begins" because that's now fully covered
  by the Technical overview
- Add introductory prose in the "Execution" top level section
- remove parens from "Program order (within an invocation)" section.

* wgsl: Fix "Entry Point" section TODOs (gpuweb#2443)

- Link 'stage' attribute text to definitions later on
- Move definition of "entry point" to the top of the "Entry Points"
  section, away from the "Entry point declaration" section.
- Rework and simplify the first part of "Entry point declaration".
  Link to other parts of the spec, e.g. to user-defined function.

* wgsl: Allow 0X for hex prefix (gpuweb#2446)

Fixes: gpuweb#1453

* Specify compilation message order/locations are impl-defined (gpuweb#2451)

Issue gpuweb#2435

* Disallow pipe for hex literals and allow capital (gpuweb#2449)

* Remove [SameObject] from GPUUncapturedErrorEvent.error (gpuweb#2423)

Implements the same behavior by prose rather than by WebIDL attribute.
The WebIDL attribute isn't currently valid on union types, and we have
to define this in prose anyway since [SameObject] is pure documentation
(has no behavioral impact on its own).

Fixes gpuweb#1225

* Make GPUDevice.lost return the same Promise object (gpuweb#2457)

Fixes gpuweb#2147

* Require alignment limits to be powers of 2 (gpuweb#2456)

Fixes gpuweb#2099

* Define GPUTextureViewDimension values (gpuweb#2455)

including the order of faces in cube maps.

Fixes gpuweb#1946

* Restore the box around algorithm divs (gpuweb#2453)

When the spec template changed, algorithms stopped having an outline
around them, which makes the spec hard to read.

* Add source image orientation to copyExternalImageToTexture (gpuweb#2376)

* Add 'originBottomLeft' attribute in GPUImageCopyTextureTagged

Resolved gpuweb#2324

* Simplify the description and move originBottomLeft to GPUImageCopyExternalImage

* Update spec/index.bs

Address Kai's description.

Co-authored-by: Kai Ninomiya <kainino1@gmail.com>

* Fix typo

* Apply suggestions from code review

* Update spec/index.bs

Co-authored-by: Kai Ninomiya <kainino1@gmail.com>

* Clarify that attachments may not alias (gpuweb#2454)

Fixes gpuweb#1358

* Fix examples classes, globals, and previews (gpuweb#2412)

* Rework encoder state and mixins (gpuweb#2452)

* GPUDebugCommandsMixin

* Move state and command list to a GPUCommandsMixin

* Propagate commands in endPass

* fix which [[commands]] is appended

* nits

* "Validate"->"Prepare" the encoder state

* Fully describe validation of render attachments (gpuweb#2458)

* Fully describe validation of render attachments

Fixes gpuweb#2303

* typos

* more typo

* Texture format caps for MSAA and resolve (gpuweb#2463)

* Texture format caps for MSAA and resolve

* Fix missing columns, add notes

* Add multisample flags even where rendering isn't supported

* [editorial] wgsl: left shifts are logical (gpuweb#2472)

* Remove 'read','read_write','write' as keywords, image formats as keywords (gpuweb#2474)

* Texture format names are not keywords

Issue: gpuweb#2428

* read, write, read_write are not keywords

Fixes: gpuweb#2428

* Only define image format names usable for storage textures (gpuweb#2475)

* Only define image format names usable for storage textures

Fixes: gpuweb#2473

* Sort texel format names by channel width first

Make it consistent with the other tables in the WGSL spec,
and with the Plain Color Formats table in the WebGPU spec.

* [editorial] Rename "built-in variable" -> "built-in value" (gpuweb#2476)

* Rename "built-in variable" -> "built-in value"

Fixes: gpuweb#2445

* Rewrite builtin-in inputs and outputs section

It needed an overhaul because with pipeline I/O via entry point
parameters and return types.  Previously it was phrased in terms
of *variables*, and some things just didn't make sense.

Added rules expressing the need to match builtin stage and direction
with entry point stage and parameter vs. return type.
This also prevents mixing builtins from different stages or conflicting
directions within a structure.

* Move Limits section to under "WGSL Program" (gpuweb#2480)

I think it makes more sense there.

* Fix declaration-and-scope section for out-of-order decls (gpuweb#2479)

* Fix declaration-and-scope section for out-of-order decls

Also reorganize to bring "resolves to" closer to the definition of "in scope".

Fixes: gpuweb#2477

* Apply review feedback

* Behaviors: Ban obviously infinite loops (gpuweb#2430)

Fixes: gpuweb#2414

* Clarify fract (gpuweb#2485)

* Officially add Brandon as a spec editor (gpuweb#2418)

Brandon has de facto equal standing as an editor and I think it's time
to recognize it.

* Require 1D texture mipLevelCount to be 1. (gpuweb#2491)

Fixes gpuweb#2490

* Add simple examples for create, init, and error handling functions

* Address feedback from Kai

* Tweak to device loss comments

* wgsl: Add bit-finding functions. (gpuweb#2467)

* wgsl: Add bit-finding functions.

- countLeadingZeros, countTrailingZeros
   - Same as MSL clz, ctz
- firstBitHigh, firstBitLow
   - Same as HLSL firstbithi, firstbitlow
   - Same as GLSL findMSB, findLSB
   - Same as SPIR-V's GLSL.std.450 FindSMsb FindUMsb, FindILsb

Fixes: gpuweb#2130

* Apply review feedback

- Better description for countLeadingZeros countTrailingZeros
- For i32, we can say -1 instead of |T|(-1)

* Apply review feedback: drop "positions"

* wgsl: Add extractBits, insertBits (gpuweb#2466)

* wgsl: Add extractBits, insertBits

Fixed: gpuweb#2129 gpuweb#288

* Formatting: break lines between parameters

* insertBits operates on both signed and unsigned integral types

* Add mixed vector-scalar float % operator (gpuweb#2495)

Fixes: gpuweb#2450

* wgsl: Remove notes about non-ref dynamic indexing (gpuweb#2483)

We re-enabled dynamically indexing into non-ref arrays and matrices in
gpuweb#2427, as discussed in gpuweb#1782.

* Disallow aliasing writable resources (gpuweb#2441)

* Describe resource aliasing rules

Fixes gpuweb#1842

* Update spec/index.bs

Co-authored-by: Kai Ninomiya <kainino1@gmail.com>

* Update spec/index.bs

Co-authored-by: Kai Ninomiya <kainino1@gmail.com>

* Editorial changes

* Editorial: split out aliasing analysis

* Consider only used bind groups and consider visibility flags

* Consider aliasing between read-only and writable bindings

* Tentatively add note about implementations

* editorial nits

* Fix algorithm

* Remove loop over shader stages

* Rephrase as "TODO figure out what happens"

* clarify

* Add back loop over shader stages

* map -> list

Co-authored-by: Myles C. Maxfield <mmaxfield@apple.com>
Co-authored-by: Myles C. Maxfield <litherum@icloud.com>

* Fix map/list confusion from gpuweb#2441 (gpuweb#2504)

I forgot to save the file before committing my last fix to gpuweb#2441.

* Fix a typo in packing built-in functions list (gpuweb#2513)

* Remove tiny duplicates (gpuweb#2514)

This removes a few tiny duplicates found in the spec.

* integer division corresponds to OpSRem (gpuweb#2518)

* wgsl: OpMod -> OpRem for integers

* OpURem -> OpUMod again

Co-authored-by: munrocket <munrocket@pm.me>

* Remove stride attribute (gpuweb#2503)

* Remove stride attribute

Fixes: gpuweb#2493

Rework the examples for satisfying uniform buffer layout, using align
and stride.

* Remove attribute list from array declaration grammar rule

Fixes: gpuweb#1534 since this is the last attribute that may be applied to a type declaration.

* Switch to `@` for Attributes (gpuweb#2517)

* Switch to `@` for Attributes

* Convert new examples

* Struct decl does not have to end in a semicolon (gpuweb#2499)

Fixes: gpuweb#2492

* wgsl: float to integer conversion saturates (gpuweb#2434)

* wgsl: float to integer conversion saturates

Fixes a TODO

* Saturation follows rounding toward zero.

Simplify the wording of the rule.

Explain what goes on at the extreme value (as discussed in the issue
and agreed at the group), how you don't actually get the max value
in the target type because of imprecision.

* Store type for buffer does not have to be structure (gpuweb#2401)

* Store type for buffer does not have to be structure

* Modify an example showing a runtime-sized array as the store type
  for a storage buffer.

Fixes: gpuweb#2188

* Update the API-side rules about minBindingSize

The store type of the corresponding variable is not always
going to be a structure type. Qualify the rule accordingly.

* Rework minimum binding size in both WebGPU and WGSL spec

Define 'minimum binding size' in WGSL spec, and link to it from WebGPU.
Repeat the rule in both places (to be helpful).

The minimum binding size for a var with store type |T| is
max(AlignOf(T),SizeOf(T)), and explain why the AlignOf part is needed:
it's because sometimes we have to wrap |T| in a struct.

This also replaces the old rule in WGSL which confusingly depended
on the storage class.  The storage class aspect is already embedded
in the alignment and size constraints for the variable.

* Simplify minimum binding size to Sizeof(store-type)

Underlying APIs don't need the extra padding at the end of any
structure which might wrap the store type for the buffer variable.

* Update API-side to SizeOf(store-type)

* Apply review feedback

- Link to SizeOf in the WGSL spec
- More carefully describe the motivation for the min-binding-size
  constraint.

* Simplify, and avoid using the word "mapping"

"Mapping" is how the buffer's storage is paged into host-accessible
address space. That's a different concept entirely, and would only
confuse things.

* Remove duplicated words (gpuweb#2529)

* Remove duplicated words `be`

Remove two duplicated `be` from the WebGPU spec.

* remove another duplicated `the`

* editorial: streamline array and structure layout descriptions (gpuweb#2521)

* Simplify array layout section

- move definition of element stride to start of memory layout section
- remove redundant explanation of array size and alignment
- remaining material in that example is just examples
- Add more detail to examples, including computing N_runtime for
  runtime-sized array

* Streamline structure layout section

- Make 'alignment' and 'size' defined terms
- Don't repeat the rule for overall struct alignment and size.
- Rename "Structure Layout Rules" to "Structure Member Layout" because
  that's all that remains.
  - Streamline the text in this section.

Fixes: gpuweb#2497

* Apply review feedback:

- state constraints at definition of the align and size attributes
- rename 'size' definition to 'byte-size'
- use the term "memory location" when defining alignment.
- rename the incorrectly-named "lastOffset" to "justPastLastMember"
- in the description of internal layout, state the general rule that the
  original buffer byte offset k must be a multiple of the alignment of the type.

* Change notation: say i'th member instead of M<sub>i</sub>

* Remove stray sentence fragment

* Change GPUObjectBase.label from nullable to union-with-undefined (gpuweb#2496)

* Separate loadOp and clear values

* Add note explaining how dispatch args and workgroup sizes interact (gpuweb#2519)

* Add a note explaining how the dispatch arguments and workgroup sizes interact

* Address feedback

* Address feedback from Kai

* Refine the supported swapchain formats (gpuweb#2522)

This removes the "-srgb" formats, and adds "rgba16float".

Fixes: gpuweb#1231

* wgsl: Fix example's builtin name (gpuweb#2530)

* wgsl: detailed semantics of integer division and remainder (gpuweb#1830)

* wgsl: detailed semantics of integer division and remainder

Adds definition for `truncate`

For divide by zero and signed integer division overflow, say they
produce a "dynamic error".

Fixes: gpuweb#1774

* Assume polyfill for the overflow case

This pins down the result for both division and remainder.

* Specify definite results for int division, % by zero

These are no longer "dynamic errors".

For integer division:   e1 / 0 = e1

For integers,           e1 % 0 = 0

The % case is somewhat arbitrary, but it makes this true:
    e1 = (e1 / 0) + (e1 % 0)

Another way of formulating the signed integer cases is to forcibly
use a divisor of 1 in the edge cases:
     where MININT = most negative value in |T|
     where Divisor = select(e2, 1, (e2==0) | ((e1 == MININT) & (e2 == -1)))
then
     "e1 / e2" = truncate(e1 / Divisor)
     "e1 % e2" = e1 - truncate(e1/Divisor) * Divisor

The unsigned integer case is similar but doesn't have the (MININT,-1) case.
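
The divisor-substitution formulation above can be sketched for the signed 32-bit case as follows. The function names (`safeDivisor`, `divI32`, `remI32`) are hypothetical, and JavaScript numbers stand in for `i32` values.

```typescript
// Most negative i32 value.
const MININT = -2147483648;

// Use a divisor of 1 in the two edge cases: division by zero,
// and the overflowing MININT / -1.
function safeDivisor(e1: number, e2: number): number {
  return e2 === 0 || (e1 === MININT && e2 === -1) ? 1 : e2;
}

// "e1 / e2" = truncate(e1 / Divisor)
function divI32(e1: number, e2: number): number {
  const divisor = safeDivisor(e1, e2);
  return Math.trunc(e1 / divisor) | 0;
}

// "e1 % e2" = e1 - truncate(e1 / Divisor) * Divisor
function remI32(e1: number, e2: number): number {
  const divisor = safeDivisor(e1, e2);
  return (e1 - Math.trunc(e1 / divisor) * divisor) | 0;
}
```

With this sketch, `divI32(7, 0)` gives 7 and `remI32(7, 0)` gives 0, matching the stated results `e1 / 0 = e1` and `e1 % 0 = 0`.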

* Add override declarations (gpuweb#2404)

* Add override declarations

* Refactor `var` and `let` section
  * have a general value declaration subsection
  * subsections on values for let and override
  * move override requirements from module constants to override
    declarations
* introduce override keyword
* remove override attribute and add id attribute
  * literal parameter is now required
* Update many references in the spec to be clearer about a let
  declaration vs an override declaration
* update examples
* Make handling of `offset` parameter for texture builtins consistent
  * always either const_expression or module scope let

* Changes for review

* combine grammar rules
* refactor validation rules for overrides
* fix typos

* add todo for creation-time constant

* fix example

* combine grammar rules

* Rename storage class into address space (gpuweb#2524)

* Rename storage class into address space

* David's review findings, plus renaming of |SC|

* Add an explainer for Adapter Identifiers to facilitate further design discussion.

* Explain smoothStep better (gpuweb#2534)

- Use more suggestive formal parameter names
- Give the formula at the function definition, not just at the
  error bounds explanation.

* Fix clamp arity in error bounds section (gpuweb#2533)

Also at the definitions, use more suggestive formal parameter names (e,low,high)
instead of the less readable (e1,e2,e3)
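
For readers who want to see the renamed parameters in use, here is a minimal sketch. The formulas are assumptions: the usual `min(max(e, low), high)` form of clamp and the common cubic Hermite form of smoothStep. The commit messages above do not quote the formulas themselves.

```typescript
// clamp with the more suggestive parameter names (e, low, high).
function clamp(e: number, low: number, high: number): number {
  return Math.min(Math.max(e, low), high);
}

// smoothStep, assuming the common cubic Hermite formulation:
//   t = clamp((x - low) / (high - low), 0, 1)
//   result = t * t * (3 - 2 * t)
function smoothStep(low: number, high: number, x: number): number {
  const t = clamp((x - low) / (high - low), 0.0, 1.0);
  return t * t * (3.0 - 2.0 * t);
}
```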

* Use consistent capitalisation in section titles (gpuweb#2544)

* remove some unnecessary `dfn` tags

* Remove unnecessary todos (gpuweb#2543)

* Defer API linkage issues to the API spec
* remove issues and todos that are covered in the API
* remove todo about array syntax

Co-authored-by: David Neto <dneto@google.com>

* Add GPUTextureDescriptor viewFormats list (gpuweb#2540)

* Add GPUTextureDescriptor viewFormats list

Initially allows only srgb formats; further rules for format
compatibility can follow.

Issue: gpuweb#168
CC: gpuweb#2322

* note on canvas config

* Enforce presence of an initializer for module-scope let (gpuweb#2538)

* Enforce presence of an initializer for module-scope let

* Since pipeline-overridable constants were split from let declarations
  the grammar for let declarations can enforce the presence of an
  initializer
* Remove global_const_initializer since it was only used for a single
  grammar production (module-scope let) and only had a single grammar
  itself
* Update extract_grammar to initialize type declarations with zero-value
  expressions

* fix typos

* Make depth/stencil LoadOp and StoreOp optional again

This change was landed as part of gpuweb#2387 but was then accidentally
reverted when gpuweb#2386 landed out of order.

* Add the uniformity analysis to the WGSL spec (gpuweb#1571)

* Add the uniformity analysis to the WGSL spec

Make the information computed more explicit per Corentin's suggestion

Add uniformity of builtins, limit the singleton rule to {Next}, do some minor cleanup

Make the typography more uniform and hopefully less confusing

Add rule for switch, and simplify rule for if

Clarify the role of CF_start, and remove two instances of the word 'simply'

Remove TODO and allow accesses to read-only global variables to be uniform

Mark functions that use implicit derivatives as ReturnValueCannotBeUniform

* s/#builtin-variables/#builtin-values/ after rebasing

* Add (trivial) rules for let/var, as suggested by @alan-baker and @dneto

* Add rules for non-shortcircuiting operators, as suggested by @alan-baker and @dneto

* Use the rowspan attribute to simplify the tables in the uniformity section

* Fix syntax of statement sequencing/blocks in the uniformity rules, following an earlier fix to the behavior analysis.

* s/adressing/addressing/, as suggested by @alan-baker in an earlier review.

* Clarify 'local variable'

* Deal with non-reconvergence at the end of functions

* s/global/module-scope/, s/built-in variable/built-in value/, and mention let-declarations

* Address the last issues found by @dneto0

* CannotBeUniform -> MayBeNonUniform

* Apply Dzmitry's suggestions

* s/MustBeUniform/RequiredToBeUniform/g

Co-authored-by: Robin Morisset <rmorisset@apple.com>

* Vectors consist of components (gpuweb#2552)

* Vectors consist of components

* Update index.bs

* Make depth/stencil LoadOp and StoreOp optional again (pt.2)

This change was landed as part of gpuweb#2387 but was then accidentally
reverted when gpuweb#2386 landed out of order.

* WGSL: Replace [SHORTNAME] with WGSL (gpuweb#2564)

Fixes gpuweb#1589

* Fix step() logic (gpuweb#2566)

* Relax vertex stride requirements (gpuweb#2554)

* s/endPass/end/ for pass encoders (gpuweb#2560)

Fixes gpuweb#2555

* Fix canvas resizing example

* Rework "validating texture copy range" for 1D/3D textures. (gpuweb#2548)

That algorithm special cased 1D and 2D textures, making empty copies
valid for 2D and not for 1D. 3D textures were just not discussed.

Fix this by just checking that the copy fits in the subresource size,
and also turn "validating texture copy range" into an algorithm with
arguments.

Co-authored-by: Dzmitry Malyshau <kvark@fastmail.com>
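
The "copy fits in the subresource size" check can be sketched as below. This is an illustration only: the interfaces and the function name `copyFitsInSubresource` are hypothetical, and any further conditions the real algorithm may impose (for example block alignment for compressed formats) are deliberately left out.

```typescript
interface Origin3D {
  x: number;
  y: number;
  z: number;
}

interface Extent3D {
  width: number;
  height: number;
  depthOrArrayLayers: number;
}

// True when the copied box [origin, origin + copySize) lies within the
// subresource, regardless of whether the texture is 1D, 2D, or 3D.
function copyFitsInSubresource(
  origin: Origin3D,
  copySize: Extent3D,
  subresourceSize: Extent3D,
): boolean {
  return (
    origin.x + copySize.width <= subresourceSize.width &&
    origin.y + copySize.height <= subresourceSize.height &&
    origin.z + copySize.depthOrArrayLayers <= subresourceSize.depthOrArrayLayers
  );
}
```

Because the check is uniform across dimensions, an empty copy passes for a 1D texture just as it does for a 2D one.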

* Add optional trailing comma for the attribute syntax. (gpuweb#2563)

All places that use a variable number of comma-separated things now
support trailing commas. However things with a fixed number of
comma-separated arguments don't. They are:

 - array_type_decl
 - texture_sampler_types
 - type_decl
 - variable_qualifier

Fixes gpuweb#1243

* wgsl: reserve `demote` and `demote_to_helper` (gpuweb#2579)

* if,switch param does not require parentheses (gpuweb#2585)

Fixes: gpuweb#2575

* Add while loop (gpuweb#2590)

Update behaviour analysis and uniformity analysis.

Fixes: gpuweb#2578

* Allow sparse color attachments and ignored FS outputs (gpuweb#2562)

* Allow sparse color attachments and ignored FS outputs

Fixes gpuweb#1250
Fixes gpuweb#2060

* Update pipeline matching rules

Co-authored-by: Dzmitry Malyshau <kvark@fastmail.com>

* Render -- as is (gpuweb#2576)

* Render -- as is

* Use backticks

* Use backticks for plus plus too

* Better separate Security and Privacy sections (gpuweb#2592)

* Better separate Security and Privacy sections

They were largely already separate but the header levels were a
bit confusing so this CL normalizes them and renames the sections
to "Security Considerations" and "Privacy Considerations" as
requested by the W3C security review guidelines.

Also expands the privacy section with a brief header, another
mention of driver bugs as a potentially identifying factor, and
a note indicating that discussions about adapter identifiers are
ongoing.

* Simplify adapter info privacy considerations note.

* Remove SPIR-V mappings (gpuweb#2594)

* Remove most references to SPIR-V opcodes and types in the
  specification
* References remain transitively in the Memory Model section as it is
  necessary for the specification
* Removed goal section as they only described SPIR-V

* Complete the Errors & Debugging section

* Addressed feedback

* Update spec/index.bs

Co-authored-by: Kai Ninomiya <kainino@chromium.org>

* Update spec/index.bs

Co-authored-by: Kai Ninomiya <kainino@chromium.org>

* Make firing the unhandlederror event optional in the algorithm

* Refactored algorithms for more sensible names.

* Fix typo in "validating texture copy range" argument (gpuweb#2596)

* Fix a typo in a note "applicaitions". (gpuweb#2602)

* Disallow renderable 3D textures (gpuweb#2603)

* `createSampler` creates a `GPUSampler` (gpuweb#2604)

Correct the link that errantly pointed to `GPUBindGroupLayout`.

* Extend lifetime of GPUExternalTexture imported from video element (gpuweb#2302)

* Extend lifetime of GPUExternalTexture imported from video element

Original PR: gpuweb#1666
Discussion: gpuweb#2124

* adjust lifetime, add flag

* editorial

* wgsl: reserve 'std', 'wgsl' (gpuweb#2606)

Fixes: gpuweb#2591

* wgsl: Fix typos (gpuweb#2610)

`signficant` -> `significant`
`consectuive` -> `consecutive`

* wgsl: Rename `firstBitHigh` and `firstBitLow`

The current names are confusing, as `High` or `Low` may refer to scanning from the MSB or LSB, or that it is scanning for the bits `1` or `0`.

By renaming to `firstLeadingBit` and `firstTrailingBit` the ambiguity is reduced, and we have a consistent terminology with `countLeadingZeros` / `countTrailingZeros`.
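
A rough sketch of what the renamed builtins compute for unsigned 32-bit inputs follows. The all-ones "not found" result for a zero input is an assumption here and is not stated in the commit message; the signed variants are omitted.

```typescript
// Assumed "no bit set" result: the u32 bit pattern of -1.
const NOT_FOUND = 0xffffffff;

// Index of the least significant 1 bit.
function firstTrailingBit(e: number): number {
  const v = e >>> 0;
  if (v === 0) return NOT_FOUND;
  // v & -v isolates the lowest set bit; clz32 then locates it.
  return 31 - Math.clz32(v & -v);
}

// Index of the most significant 1 bit.
function firstLeadingBit(e: number): number {
  const v = e >>> 0;
  if (v === 0) return NOT_FOUND;
  return 31 - Math.clz32(v);
}
```

For the input `0b1100`, the trailing result is 2 and the leading result is 3, which lines up with the "scan from the LSB" and "scan from the MSB" readings of the new names.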

* Update override examples (gpuweb#2614)

Fixes gpuweb#2613

* Update WGSL syntax for overridable constants

* Fix a typo in FP32 internal layout (gpuweb#2615)

* Fix a typo in FP32 internal layout

In the internal layout of float32, Bits 0 through 6 of byte k+2 contain
bits 16 through 22 of the fraction, which has a total of 23 bits.

* remove duplicated "bit"
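
The corrected byte layout can be checked with a small sketch. It assumes a little-endian layout where byte k holds fraction bits 0 through 7 and byte k+1 holds bits 8 through 15 (only the byte k+2 statement comes from the commit message), and `f32Fraction` is a hypothetical helper name.

```typescript
// Reassemble the 23-bit fraction of a float32 from its first three bytes.
function f32Fraction(value: number): number {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, value, true); // little-endian: offset 0 plays the role of byte k
  const byteK = view.getUint8(0); // fraction bits 0..7 (assumed)
  const byteK1 = view.getUint8(1); // fraction bits 8..15 (assumed)
  const byteK2 = view.getUint8(2); // bits 0..6 hold fraction bits 16..22
  return byteK | (byteK1 << 8) | ((byteK2 & 0x7f) << 16);
}
```

For example, `f32Fraction(1.5)` yields `0x400000`: only fraction bit 22 is set, and it comes from bit 6 of byte k+2.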

* WGSL style guide: in progress extensions developed outside main spec (gpuweb#2616)

Fixes: gpuweb#2608

* Clarify when a built-in function name can be redefined (gpuweb#2621)

* wgsl: Cleanup internal layout of matrix type (gpuweb#2624)

* Cleanup internal layout of matrix type

Use alignOf() to cleanup the description of internal layout of matrix type.

* Use "i x AlignOf()" instead of "AlignOf() x i"

* [editorial] Fix uniformity table widths (gpuweb#2623)

* Reduce table width by allowing more line breaks
* Make op consistently italicized

* Add break-if as optional at end of continuing (gpuweb#2618)

* Add break-if as optional at end of continuing

A break-if can only appear as the last statement in a continuing
clause.

Simplifies the rule about where a bare 'break' can occur: It
must not be placed such that it would exit a continuing clause.

Fixes: gpuweb#1867

Also refactor the grammar to make:
  continuing_compound_statement
  case_compound_statement
These are called out as special forms of compound statement, so
that the scope rule of declarations within a compound statement
also clearly apply to them.

* Add statement behaviour of break-if

The expression always includes {Next}.
When the expression is true, the {Break} behaviour is invoked.
Otherwise, the {Next} behaviour is invoked.

So it's   B - {Next} + {Next, Break}
or   B + {Break}

* Add uniformity analysis for break-if

* Apply review feedback

- Tighten the wording about where control is transferred for break and
  break-if
- Allow "break" to be used in a while loop.

* Avoid double-avoid

* wgsl: Reserve words from common programming languages (gpuweb#2617)

* wgsl: Reserve words from common programming languages

Reserves words from C++, Rust, ECMAScript, and Smalltalk

Add a script to automatically generate the contents of the _reserved
grammar rule.

Update extract-grammar.py to strip HTML comments before processing.

* List WGSL as a reason for a keyword reservation

* Reserve 'null'

* Reserve keywords from GLSL 4.6 and HLSL

* Use ECMAScript 2022 instead of ECMAScript 5.1

* Reserve HLSL keywords

* Add acosh, asinh, atanh builtin functions (gpuweb#2581)

* Add acosh, asinh, atanh builtin functions

Use the polyfills from Vulkan.

Fixes: gpuweb#1622

* Result is 0 in regions that make no mathematical sense: acosh, atanh
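
As an illustration of the "result is 0" rule, here is a sketch using the standard logarithmic identities for the inverse hyperbolic functions. Both the identities and the exact domain boundaries (`x < 1` for acosh, `abs(x) >= 1` for atanh) are assumptions, since the commit message does not spell out the polyfills. The `wgsl*` names are hypothetical.

```typescript
// acosh: assumed to return 0 outside its mathematical domain (x < 1).
function wgslAcosh(x: number): number {
  if (x < 1) return 0;
  return Math.log(x + Math.sqrt(x * x - 1));
}

// asinh: defined for all real inputs, so no special region is needed.
function wgslAsinh(x: number): number {
  return Math.log(x + Math.sqrt(x * x + 1));
}

// atanh: assumed to return 0 outside the open interval (-1, 1).
function wgslAtanh(x: number): number {
  if (Math.abs(x) >= 1) return 0;
  return 0.5 * Math.log((1 + x) / (1 - x));
}
```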

* wgsl: Cleanup the internal memory layout of vector using SizeOf (gpuweb#2626)

* Cleanup the internal memory layout of vector using SizeOf

Describe the internal memory layout of vector types vecN<T> with
SizeOf(T) rather than a literal number.

* Fix typo

* Fix typos of accuracy of exp and exp2 (gpuweb#2634)

Fix the accuracy requirement of exp and exp2 to 3 + 2 * abs(x) ULP.

* Reland: Only allow depth/stencil load/store ops when they have an effect

* Validate query index overwrite in timestampWrites of render pass (gpuweb#2627)

Vulkan requires that a query set be reset between uses and that the reset
command be called outside a render pass, which makes it impossible to
overwrite a query index in the same query set within a render pass. It can
still be done in a different query set or a different render pass.
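
One way such a check could look is sketched below. The `TimestampWrite` shape and the function name `hasQueryIndexOverwrite` are hypothetical simplifications, not the spec's validation algorithm; the sketch only detects the same index being written twice in the same query set within one pass.

```typescript
// Simplified stand-in for one entry of a render pass's timestampWrites.
interface TimestampWrite {
  querySet: object;
  queryIndex: number;
}

// True when two writes in the same pass target the same index of the
// same query set. The same index in a different query set is fine.
function hasQueryIndexOverwrite(writes: TimestampWrite[]): boolean {
  const seen = new Map<object, Set<number>>();
  for (const write of writes) {
    let indices = seen.get(write.querySet);
    if (indices === undefined) {
      indices = new Set<number>();
      seen.set(write.querySet, indices);
    }
    if (indices.has(write.queryIndex)) return true;
    indices.add(write.queryIndex);
  }
  return false;
}
```

Calling the function once per render pass keeps writes in different passes independent, which matches the "different render pass" allowance described above.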

* Add definitions for uniformity terms (gpuweb#2638)

* add definitions (and link back to them) for:
  * uniform control flow
  * uniform value
  * uniform variable
* define the scope of uniform control flow for different shader stages

* Allow statically unreachable code (gpuweb#2622)

* Allow statically unreachable code

Fix gpuweb#2378

* modify behavior analysis to allow statically unreachable code
  * unreachable code does not contribute to behaviors
* modify uniformity analysis to not analyze unreachable code
  * unreachable statements are not added to the uniformity graph

* Improve examples

* Clarify when sequential statement behaviour leads to a different
  behaviour from that of the individual statement
* improve example comment formatting to reduce possible horizontal
  scrolling

* Name in enable directive can be a keyword or reserved word (gpuweb#2650)

Fixes: gpuweb#2649

Also simplify description of where an enable directive can appear.
They must appear before any declaration.

* GPUDevice.createBuffer() requires a valid GPUBufferDescriptor (gpuweb#2643)

* Typo in definition of finish()

* Allow unmasked adapter info fields to be requested individually.

* Update design/AdapterIdentifiers.md

Co-authored-by: Kai Ninomiya <kainino@chromium.org>

* Explicitly note that declined consent rejects the promise

* Uses commas to separate struct members instead of semicolons (gpuweb#2656)

Fixes gpuweb#2587

* Change the separator for struct members from semicolons to commas
  * Comma is optional after the last member
* Changes the grammar to require one or more struct members
  * Already required by the prose of the spec

* [editorial] Compress expression tables (gpuweb#2658)

* [editorial] Compress expression tables

* Combine arithmetic and comparison expression table entries for
  integral and floating-point entries
  * since SPIR-V mappings were removed there is no strong need to have
    separate entries

* improved wording

* make online should die on everything (gpuweb#2644)

* Committing GPUCommandBuffers to the wrong GPUDevice is an error (gpuweb#2666)

Eventually we'll want to add more logic to make sure that command buffers are only valid on
the queue they're created from. Right now, though, every device just has exactly one queue,
so matching devices is the same as matching queues.

* Mipmap filtering might be extended separately from min/mag filtering in the future

* Add a way of setting the initial label for the default queue

* add rAF/rVFC examples

* [Process] Add RequirementsForAdditionalFunctionality.md

* Addressing Kai and Dzmitry's comments

* GPUSamplerDescriptor.maxAnisotropy gets clamped to a platform-specific maximum (gpuweb#2670)

Co-authored