LO fails to load document after saving with odftoolkit due to invalid UTF-16 entities #137

@FlorianBruckner

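Xalan's serializer contains a nasty bug: it writes a character outside the Basic Multilingual Plane as two separate numeric character references, one per UTF-16 surrogate, instead of a single reference for the code point. Surrogate values are not legal XML characters, so the result is a corrupt document. For example, this input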

<text:span text:style-name="T19">𝜈</text:span>

is changed to this when the document is saved with odftoolkit:

<text:span text:style-name="T19">&#55349;&#57096;</text:span>

More information about the root cause can be found here:
https://issues.apache.org/jira/browse/XALANJ-2419

Since a new Xalan release with a fix seems unlikely, one option (and this is what I am doing now) is to replace the Xalan serializer dependency with a known-good version, e.g.

        <dependency>
            <groupId>org.docx4j.org.apache</groupId>
            <artifactId>xalan-serializer</artifactId>
            <version>11.0.0</version>
        </dependency>

I cannot vouch for the integrity of this package, but I have verified that it actually fixes the invalid encoding.
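A check along these lines could confirm whether a saved document is affected: scan the text of its `content.xml` for numeric character references whose value falls in the surrogate range (0xD800 to 0xDFFF). This is only a sketch, assuming the XML has already been read into a string. `SurrogateReferenceCheck` and `findSurrogateReferences` are hypothetical names, not odftoolkit or Xalan API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SurrogateReferenceCheck {
    // Matches decimal (&#55349;) and hexadecimal (&#xD835;) character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    // Returns every character reference whose value is a UTF-16 surrogate.
    static List<String> findSurrogateReferences(String xml) {
        List<String> hits = new ArrayList<>();
        Matcher m = NUMERIC_REF.matcher(xml);
        while (m.find()) {
            String body = m.group(1);
            long value;
            try {
                value = body.startsWith("x")
                        ? Long.parseLong(body.substring(1), 16)
                        : Long.parseLong(body);
            } catch (NumberFormatException e) {
                continue; // absurdly long number, cannot be a surrogate
            }
            if (value >= 0xD800 && value <= 0xDFFF) {
                hits.add(m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String broken = "<text:span text:style-name=\"T19\">&#55349;&#57096;</text:span>";
        String fixed = "<text:span text:style-name=\"T19\">&#120584;</text:span>";
        System.out.println(findSurrogateReferences(broken).size()); // prints 2
        System.out.println(findSurrogateReferences(fixed).size());  // prints 0
    }
}
```

An empty result after saving with the replacement serializer would indicate that the surrogate references are gone. It does not check anything else about the document.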
