Playlist: Fix tag sanitization in track metadata#76093
Playlist: Fix tag sanitization in track metadata#76093
Conversation
|
Flaky tests detected in 26e962b. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/23242097795
|
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
Replace wp_strip_all_tags() with wp_kses() + wp_specialchars_decode() for sanitizing track metadata (title, artist, album, ariaLabel). wp_strip_all_tags() internally uses strip_tags(), which treats any `<` as the start of an HTML tag. This means content like "I <3 Music" would incorrectly have "<3 Music" stripped, resulting in just "I ". The new approach uses wp_kses( $text, array() ) which properly distinguishes real HTML tags from non-tag content containing `<`/`>`, followed by wp_specialchars_decode() to convert entities back to plain text since the data is JSON-encoded via wp_interactivity_state(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
f4bc725 to
0d39bc6
Compare
|
Thanks @carolinan, I have rebased this so it's ready for another review :) |
| // Data is passed to wp_interactivity_state() which JSON-encodes it. | ||
| // We use wp_kses() to strip HTML tags safely (preserving non-tag | ||
| // characters like </>), then wp_specialchars_decode() to convert | ||
| // any entities back to plain text for JSON encoding. |
There was a problem hiding this comment.
This is working for me, and titles with > & < aren't stripped!
wp_specialchars_decode LGTM, but I think the wp_kses wrapper may be unnecessary.
These attributes are stored in the block comment JSON, so by the time we've reached this point the string attributes have been through:
- serializeAttributes() on save (escapes
<,>, and&), then - wp_pre_kses_block_attributes, which eventually runs wp_kses on the block attribute values before they're stored in the database
The latter is for users without unfiltered_html, but I tested with both an admin and author role.
Anyway, by the time we're at this point the entities are already encoded.
So, I typed in something like this for my title in the editor and saved:
Let me in - "Test" && < && ><style>background; red; </style><script>alert(\'hacked\');</script>!@#$%^&*()Logging out at this point shows me $title is already:
Let me in - "Test" && < && ><style>background; red; </style><script>alert(\'hacked\');</script>!@#$%^&*()
So wp_kses( $title, array() ) finds no real tags to strip and is a no-op.
Do you see the same?
Happy to be corrected if there's anything that I'm missing!
Side note, I also noticed that the waveform-title element in the editor is still encoded:
I think we'd need to run the props through decodeEntities ??
There was a problem hiding this comment.
Thanks, you're right. I have updated this PR to remove wp_kses since we don't need it, and dealt with the title issue you raised above instead. - we now run titles through stripHTML and decodeEntities.
… editor Block attributes are already sanitized via serializeAttributes() and wp_pre_kses_block_attributes(), making wp_kses() a no-op. Remove it and keep only wp_specialchars_decode() for the PHP render. In the editor, strip HTML tags and decode entities for the WaveformPlayer title and artist props so special characters display correctly and tags like <style>/<script> are removed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Size Change: +45 B (0%) Total Size: 8.75 MB
ℹ️ View Unchanged
|
RichText with allowedFormats=[] already prevents HTML tags from being stored in the title/artist attributes, so stripHTML is redundant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Attributes could contain HTML from sources other than RichText (Code Editor, REST API, media metadata), so stripHTML is needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

What?
Fixes how track metadata (title, artist, album) is sanitized in the playlist block, both in the PHP render and in the editor's waveform player.
Why?
This addresses two issues raised in review:
Redundant
wp_kses()in PHP render (review by @ramonjd): Block attributes are already sanitized viaserializeAttributes()(which escapes<,>,&) andwp_pre_kses_block_attributes()before database storage. By the time we reach the render function, entities are already encoded, makingwp_kses($title, array())a no-op.Encoded entities and raw HTML tags in editor waveform player: The
<WaveformPlayer>component was displaying entity-encoded text (e.g.,&instead of&) and not stripping HTML tags from titles containing markup like<style>or<script>.How?
PHP (
index.php):wp_kses()wrapper — kept onlywp_specialchars_decode()to convert entities back to plain text for JSON encoding viawp_interactivity_state().JS (
edit.js):stripHTML()(from@wordpress/dom) to remove any HTML tags from the title/artist before display in the waveform player. Although the RichText fields useallowedFormats={ [] }, attributes could still contain HTML from other sources (Code Editor, REST API, media metadata). This matches the pattern already used inplaylist-track/edit.jsfor the settings panel fields.decodeEntities()(from@wordpress/html-entities) to decode entity-encoded characters so they display correctly.Testing Instructions
Let me in - "Test" && < && ><style>background; red; </style><script>alert('hacked');</script>!@#$%^&*()🤖 Generated with Claude Code