Fixes Issue #328 #329
Parse the keyword and text directory using UTF-8 encoding when the PNG chunk type is iTXt.
Thanks, this looks great. I'll try it on the regression test data set before merging.
@@ -248,7 +249,7 @@ private static IEnumerable<Directory> ProcessChunk(PngChunk chunk)
 else if (chunkType == PngChunkType.iTXt)
 {
     var reader = new SequentialByteArrayReader(bytes);
-    var keyword = reader.GetNullTerminatedStringValue(maxLengthBytes: 79).ToString(_latin1Encoding);
+    var keyword = reader.GetNullTerminatedStringValue(maxLengthBytes: 79).ToString(_utf8Encoding);
There's actually now an issue slightly below here. The `bytesLeft` value was based on the length of the string in bytes, which for Latin-1 is the same as the length of the string in characters. With UTF-8 that's not the case. I'll patch this up and push to your PR.
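To illustrate the byte-count versus character-count mismatch described above, here is a minimal sketch in Java (the language of the port mentioned in this thread), using only the standard library. The `bytesLeft` helper, the class name, and the chunk length of 20 are hypothetical and chosen for illustration; this is not the library's actual code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BytesLeftSketch {
    // Hypothetical helper (not the library's code): bytes remaining in a chunk
    // after a null-terminated keyword, computed from the ENCODED byte length.
    static int bytesLeft(int chunkLength, String keyword, Charset charset) {
        int keywordBytes = keyword.getBytes(charset).length;
        return chunkLength - keywordBytes - 1; // 1 accounts for the null terminator
    }

    public static void main(String[] args) {
        String keyword = "R\u00e9sum\u00e9"; // "Resume" with two accented e's: 6 characters
        int chunkLength = 20;                // arbitrary illustrative value

        // Pitfall: using the character count only works for single-byte encodings.
        int fromCharCount = chunkLength - keyword.length() - 1;

        // Each accented e takes two bytes in UTF-8, so the keyword occupies 8 bytes.
        int fromUtf8Bytes = bytesLeft(chunkLength, keyword, StandardCharsets.UTF_8);
        int fromLatin1Bytes = bytesLeft(chunkLength, keyword, StandardCharsets.ISO_8859_1);

        System.out.println(fromCharCount);   // 13
        System.out.println(fromLatin1Bytes); // 13
        System.out.println(fromUtf8Bytes);   // 11
    }
}
```

With Latin-1 the two calculations agree, which is why the original code worked; with UTF-8 they diverge as soon as the keyword contains a non-ASCII character.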
I ran this against the regression test data set and it now successfully parses a bunch of previously broken values. I see no downside to this anywhere. Great stuff, thanks!
This is a port of a fix from the .NET library in drewnoakes/metadata-extractor-dotnet#329. PNG chunks of type `iTXt` should have keywords and values decoded using UTF-8, not Latin1 encoding.
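As a rough illustration of why the choice of encoding matters here, the sketch below decodes the same null-terminated bytes once as UTF-8 and once as Latin-1. The `readKeyword` helper, the class name, and the sample bytes are hypothetical; they use only the Java standard library and are not taken from either library's source.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ITxtDecodingSketch {
    // Hypothetical helper: decode the bytes up to (not including) the first
    // null byte using the given charset.
    static String readKeyword(byte[] chunk, Charset charset) {
        int end = 0;
        while (end < chunk.length && chunk[end] != 0) {
            end++;
        }
        return new String(chunk, 0, end, charset);
    }

    public static void main(String[] args) {
        // "Caf" followed by the UTF-8 encoding of e-acute (0xC3 0xA9), then a null terminator.
        byte[] chunk = {'C', 'a', 'f', (byte) 0xC3, (byte) 0xA9, 0, 'x'};

        String asUtf8 = readKeyword(chunk, StandardCharsets.UTF_8);
        String asLatin1 = readKeyword(chunk, StandardCharsets.ISO_8859_1);

        // UTF-8 yields 4 characters ending in U+00E9; Latin-1 yields 5 characters,
        // because each of the two bytes becomes its own character (U+00C3, U+00A9).
        System.out.println(asUtf8.length());   // 4
        System.out.println(asLatin1.length()); // 5
    }
}
```

The Latin-1 result is the kind of garbled value that the regression data set previously produced for non-ASCII text.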
Thanks to @RupertAvery for reporting the issue in drewnoakes/metadata-extractor-dotnet#328 and providing a fix in drewnoakes/metadata-extractor-dotnet#329. Ported to Java in drewnoakes/metadata-extractor#611.
Fixes #328