Skip to content

Fix for issue #121 #129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 21, 2016
Merged

Fix for issue #121 #129

merged 1 commit into from
Mar 21, 2016

Conversation

ArjohnKampman
Copy link

When an incorrect jpeg segment length is specified, try to recover by scanning for the next segment marker rather than bailing out with an exception.

…fied, try to

recover by scanning for the next segment marker rather than bailing out
with an exception
@drewnoakes
Copy link
Owner

Thanks for this. In general the library tries to avoid scanning in response to unexpected data as eventually you'll find something that matches even if it's not correct, after which all bets are off and it's not uncommon to start emitting imaginary junk-filled tags.

Some ideas to limit this issue are:

  • Only apply this scanning if you can deduce that the image was created by software known to have this problem (this happens a lot with camera makernote handling based upon the make and model)
  • Only scan a limited number of bytes to reduce the likelihood of finding a matching byte pair

However there is clearly a use case for this. I wonder whether it'd make sense to introduce a configuration option that sets the desired compatibility level. For cases where values are presented directly on a user interface, junk results are more of a problem.

Before merging this I'll think this through a little more and let any feedback gather. I'll also run this code over the image database to see whether it fixes more problems than it creates on a real-world data set. It certainly sounds like that's the case in your scenario.

If the library goes in this direction, then such permissive parsing should be enabled in several places.

@ArjohnKampman
Copy link
Author

I generally appreciate being able to switch between a 'strict' and a 'lenient' mode for data parsing, but in the case of this error I think that most people will want the lenient mode. If this was an error in the data of one of the segments then the parser could recover from that by skipping to the next segment. But this particular error affects the structure of the data in the file and there is no other meta-structure to fall back to. So in this case, the strict mode would mean to stop processing completely and ignore everything else in the file. If the error occurs near the start of the file, you'll end up with little or no metadata at all.

Considering that the actual image data is often/always at the end of the file and that almost all viewers/tools are able to handle such broken data, a lenient parsing approach seems to be what is generally used for processing JPEG data. Some software will have better error recovery solutions than others though. Recording such parsing errors in a way like this is currently done in class Directory would be ideal, but I couldn't find a way to do this in JpegMetadataReader.

drewnoakes added a commit that referenced this pull request Mar 21, 2016
@drewnoakes drewnoakes merged commit 8c14064 into drewnoakes:master Mar 21, 2016
drewnoakes added a commit to drewnoakes/metadata-extractor-images that referenced this pull request Mar 21, 2016
@drewnoakes
Copy link
Owner

Thanks. Coming back to this and running it over a lot of images, it seems pretty harmless and fixes a real world problem. Thanks for your PR, and your patience in getting it merged.

@ArjohnKampman
Copy link
Author

Thanks for the merge. The last (maven) release is almost a year old, are you planning to do a new release some time soon?

drewnoakes added a commit to drewnoakes/metadata-extractor-dotnet that referenced this pull request Mar 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants