Incorrect parsing of <svg> with <foreignObject> containing paragraphs #2452

@fauvetg

Hello, we are using jsoup 1.21.2 and have an issue when parsing an <svg> tag that:

  • contains <p> tags (in a <foreignObject> section)
  • is placed inside a <p> tag

Failing test case

var topicHtml = """
<p>
    <svg width="200" height="200" xmlns="http://www.w3.org/2000/svg">
      <foreignObject x="20" y="20" width="160" height="160">
        <div xmlns="http://www.w3.org/1999/xhtml">
          <p>Lorem ipsum</p>
        </div>
      </foreignObject>
    </svg>
</p>
""";

// Parse with jsoup's HTML parser, then print the serialized body.
var parsedHtml = Jsoup.parse(topicHtml);

System.err.println(parsedHtml.body().html());

Actual output

<p>
 <svg width="200" height="200" xmlns="http://www.w3.org/2000/svg">
  <foreignObject x="20" y="20" width="160" height="160"> </foreignObject>
 </svg>
</p>
<div xmlns="http://www.w3.org/1999/xhtml">
 <p>Lorem ipsum</p>
</div> 
<p></p>

Expected output

<p>
    <svg width="200" height="200" xmlns="http://www.w3.org/2000/svg">
      <foreignObject x="20" y="20" width="160" height="160">
        <div xmlns="http://www.w3.org/1999/xhtml">
          <p>Lorem ipsum</p>
        </div>
      </foreignObject>
    </svg>
</p>

The issue seems specific to <p> tags: when the outer <p> is replaced with a <div>, the problem vanishes. The example below parses as expected.

<div>
    <svg width="200" height="200" xmlns="http://www.w3.org/2000/svg">
      <foreignObject x="20" y="20" width="160" height="160">
        <div xmlns="http://www.w3.org/1999/xhtml">
          <p>Lorem ipsum</p>
        </div>
      </foreignObject>
    </svg>
</div>
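
To make the difference between the actual and expected output reported above concrete, here is a minimal sketch that checks whether the inner paragraph is still nested in <foreignObject> in a serialized body string. It uses only the Java standard library and does plain string matching, so it is not an HTML parser and says nothing about how jsoup works internally. The class name ForeignObjectNestingCheck and the method isInsideForeignObject are hypothetical names introduced for illustration, and the check assumes a single, non-nested <foreignObject> element.

```java
// Sketch only: a string-level check on serialized output, not an HTML parser.
// ForeignObjectNestingCheck and isInsideForeignObject are hypothetical names.
class ForeignObjectNestingCheck {

    // Body HTML reported in this issue as the actual jsoup 1.21.2 output.
    static final String ACTUAL = String.join("\n",
        "<p>",
        " <svg width=\"200\" height=\"200\" xmlns=\"http://www.w3.org/2000/svg\">",
        "  <foreignObject x=\"20\" y=\"20\" width=\"160\" height=\"160\"> </foreignObject>",
        " </svg>",
        "</p>",
        "<div xmlns=\"http://www.w3.org/1999/xhtml\">",
        " <p>Lorem ipsum</p>",
        "</div>",
        "<p></p>");

    // Body HTML given in this issue as the expected output.
    static final String EXPECTED = String.join("\n",
        "<p>",
        "    <svg width=\"200\" height=\"200\" xmlns=\"http://www.w3.org/2000/svg\">",
        "      <foreignObject x=\"20\" y=\"20\" width=\"160\" height=\"160\">",
        "        <div xmlns=\"http://www.w3.org/1999/xhtml\">",
        "          <p>Lorem ipsum</p>",
        "        </div>",
        "      </foreignObject>",
        "    </svg>",
        "</p>");

    // Returns true if needle first occurs after the first "<foreignObject"
    // and before the first "</foreignObject>". Assumes one, non-nested element.
    static boolean isInsideForeignObject(String html, String needle) {
        int open = html.indexOf("<foreignObject");
        int close = html.indexOf("</foreignObject>");
        if (open < 0 || close < open) {
            return false; // no complete <foreignObject> element found
        }
        int at = html.indexOf(needle, open);
        return at >= 0 && at < close;
    }

    public static void main(String[] args) {
        String needle = "<p>Lorem ipsum</p>";
        System.out.println("actual:   " + isInsideForeignObject(ACTUAL, needle));
        System.out.println("expected: " + isInsideForeignObject(EXPECTED, needle));
    }
}
```

Run against the two outputs quoted in this issue, the check reports false for the actual output (the <div> and its <p> were moved out of <foreignObject>) and true for the expected output.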

Labels: bug, fixed