-
Notifications
You must be signed in to change notification settings - Fork 1.9k
JS/Py/Ruby: add a bad-tag-filter query #6561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
39f5150 to
fe10a90
Compare
ac17564 to
232c8b0
Compare
c2ddeeb to
d9f22af
Compare
|
I'm not sure why the |
The error message suggests some Actions related permissions problem. Perhaps it is because your PR is from a fork. |
If that's the case then it should be fixed. It works fine locally, so you guys can ignore the error when reviewing. |
|
@aibaars I think the workflow needs to run on EDIT: That was apparently already discovered, and a PR was made #6971 |
|
QHelp previews: javascript/ql/src/Security/CWE-116/BadTagFilter.qhelpBad HTML filtering regexpIt is possible to match some single HTML tags using regular expressions (parsing general HTML using regular expressions is impossible). However, if the regular expression is not written well it might be possible to circumvent it, which can lead to cross-site scripting or other security issues. Some of these mistakes are caused by browsers having very forgiving HTML parsers, and will often render invalid HTML containing syntax errors. Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors. RecommendationUse a well-tested sanitization or parser library if at all possible. These libraries are much more likely to handle corner cases correctly than a custom implementation. ExampleThe following example attempts to filters out all function filterScript(html) {
var scriptRegex = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
var match;
while ((match = scriptRegex.exec(html)) !== null) {
html = html.replace(match[0], match[1]);
}
return html;
}The above sanitizer does not filter out all Other corner cases include that HTML comments can end with References
python/ql/src/Security/CWE-116/BadTagFilter.qhelpBad HTML filtering regexpIt is possible to match some single HTML tags using regular expressions (parsing general HTML using regular expressions is impossible). However, if the regular expression is not written well it might be possible to circumvent it, which can lead to cross-site scripting or other security issues. Some of these mistakes are caused by browsers having very forgiving HTML parsers, and will often render invalid HTML containing syntax errors. Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors. RecommendationUse a well-tested sanitization or parser library if at all possible. These libraries are much more likely to handle corner cases correctly than a custom implementation. ExampleThe following example attempts to filters out all import re
def filterScriptTags(content):
oldContent = ""
while oldContent != content:
oldContent = content
content = re.sub(r'<script.*?>.*?</script>', '', content, flags= re.DOTALL | re.IGNORECASE)
return contentThe above sanitizer does not filter out all Other corner cases include that HTML comments can end with References
ruby/ql/src/queries/security/cwe-116/BadTagFilter.qhelpBad HTML filtering regexpIt is possible to match some single HTML tags using regular expressions (parsing general HTML using regular expressions is impossible). However, if the regular expression is not written well it might be possible to circumvent it, which can lead to cross-site scripting or other security issues. Some of these mistakes are caused by browsers having very forgiving HTML parsers, and will often render invalid HTML containing syntax errors. Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors. RecommendationUse a well-tested sanitization or parser library if at all possible. These libraries are much more likely to handle corner cases correctly than a custom implementation. ExampleThe following example attempts to filters out all def filterScripTags(html)
oldHtml = "";
while (html != oldHtml)
oldHtml = html;
html = html.gsub(/<script[^>]*>.*<\/script>/m, "");
end
return html;
endThe above sanitizer does not filter out all Other corner cases include that HTML comments can end with References
|
|
The |
The complaints seems to be related to the file that was moved in this PR. |
I think we can just ignore that error for this PR, as the files were just moved. |
nickrolfe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Ruby-specific changes look pretty reasonable, but I have a suggestion on making the qhelp example a little more idiomatic.
Co-authored-by: Nick Rolfe <nickrolfe@github.com>
|
QHelp previews: javascript/ql/src/Security/CWE-116/BadTagFilter.qhelpBad HTML filtering regexpIt is possible to match some single HTML tags using regular expressions (parsing general HTML using regular expressions is impossible). However, if the regular expression is not written well it might be possible to circumvent it, which can lead to cross-site scripting or other security issues. Some of these mistakes are caused by browsers having very forgiving HTML parsers, and will often render invalid HTML containing syntax errors. Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors. RecommendationUse a well-tested sanitization or parser library if at all possible. These libraries are much more likely to handle corner cases correctly than a custom implementation. ExampleThe following example attempts to filters out all function filterScript(html) {
var scriptRegex = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
var match;
while ((match = scriptRegex.exec(html)) !== null) {
html = html.replace(match[0], match[1]);
}
return html;
}The above sanitizer does not filter out all Other corner cases include that HTML comments can end with References
python/ql/src/Security/CWE-116/BadTagFilter.qhelpBad HTML filtering regexpIt is possible to match some single HTML tags using regular expressions (parsing general HTML using regular expressions is impossible). However, if the regular expression is not written well it might be possible to circumvent it, which can lead to cross-site scripting or other security issues. Some of these mistakes are caused by browsers having very forgiving HTML parsers, and will often render invalid HTML containing syntax errors. Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors. RecommendationUse a well-tested sanitization or parser library if at all possible. These libraries are much more likely to handle corner cases correctly than a custom implementation. ExampleThe following example attempts to filters out all import re
def filterScriptTags(content):
oldContent = ""
while oldContent != content:
oldContent = content
content = re.sub(r'<script.*?>.*?</script>', '', content, flags= re.DOTALL | re.IGNORECASE)
return contentThe above sanitizer does not filter out all Other corner cases include that HTML comments can end with References
ruby/ql/src/queries/security/cwe-116/BadTagFilter.qhelpBad HTML filtering regexpIt is possible to match some single HTML tags using regular expressions (parsing general HTML using regular expressions is impossible). However, if the regular expression is not written well it might be possible to circumvent it, which can lead to cross-site scripting or other security issues. Some of these mistakes are caused by browsers having very forgiving HTML parsers, and will often render invalid HTML containing syntax errors. Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors. RecommendationUse a well-tested sanitization or parser library if at all possible. These libraries are much more likely to handle corner cases correctly than a custom implementation. ExampleThe following example attempts to filters out all def filter_script_tags(html)
old_html = ""
while (html != old_html)
old_html = html
html = html.gsub(/<script[^>]*>.*<\/script>/m, "")
end
html
endThe above sanitizer does not filter out all Other corner cases include that HTML comments can end with References
|
RasmusWL
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python 👍 (based on old reviews saying all LGTM, and no Python related changes)
yoff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still LGTM :-)
| /** | ||
| * A word boundary, that is, a regular expression term of the form `\b`. | ||
| */ | ||
| class RegExpWordBoundary extends RegExpEscape { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced \b should be parsed as RegExpEscape, because it can result in ReDoS query FPs that say "...strings starting with 'b'...", but I can handle that in #7120.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
Yes, that causes FPs in Ruby/Python (I checked).
Nice if you can handle it in #7120.
I've made a JS+Python query to detect regular expressions that are bad at matching HTML tags.
The query is currently very focused on
<script>and<!-- -->tags.Take as an example the below regexp.
There are multiple problems with this regexp:
<SCRIPT>tags.<script>tags where the body contain newlines.</script >end tags containing superfluous whitespace. (A parse error, but browsers render it anyway).The more complex cases in the query has to do with capture groups.
Consider the below program.
The regular expression matches both
<!-- foo -->and<!-- foo --!>(both are "valid" HTML5 comments).However, it matches the two types of comments with different capture groups, which will confuse parsers that use regular expressions like the above. (This is a root cause for at least 2 XSS CVEs).
The query works by implementing a regular-expression evaluator on top of
ReDoSUtil.qll, and using that evaluator to test for potential pitfalls when implementing regular expressions matching HTML tags.The
BadTagFilterQuery.qllfile is shared between JS/Python, so there is essentially no language specific implementation for this query.Copilot helped me write the example functions.
It fell into the same pitfall for both JavaScript and Python.
CVE-2021-33829: TP/TN
CVE-2020-17480: TP/TN
For context, see The Curious Case of Copy & Paste.
JS Evaluation shows a improvement in performance. (The baseline is
main+ awhere none()edition of the query).The new
cachedpredicates have no noticeable effect on total DB size.The Python evaluation looks OK.