Skip to content

*scanf() fixes to make TeX work #1109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 23, 2024
Merged

*scanf() fixes to make TeX work #1109

merged 4 commits into from
Feb 23, 2024

Conversation

dfyz
Copy link
Contributor

@dfyz dfyz commented Feb 21, 2024

TeX uses sscanf() extensively, and there is some breakage when it is compiled with Cosmpolitan, which I'm trying to fix with this PR.

For clarity, the fixes are split into three more-or-less independent commits. The commit messages describe what is being fixed, and below is some context on why this is being fixed.

The first commit doesn't fix any actual breakage in TeX; it's a small fix for an edge case in the current code. It could be squashed with the second commit but I think it's cleaner if this change gets a separate commit.

The second commit fixes the mapfile parsing in TeX. It uses scanf() with %f directive to check if the current config option contains a valid float.

The third commit fixes a very weird error that looks like this:

! pdfTeX error (\pdfsetmatrix): Unrecognized format..
<argument> ...shipout:D \box_use:N \l_shipout_box
                                                  \__shipout_drop_firstpage_...

It turns out that TeX uses scanf() with %lf ... %c directive, where the trailing %c is expected to fail (which would mean the last float actually terminates the string).

dfyz added 3 commits February 21, 2024 02:59
PR jart#924 appears to use `unget()` subtly incorrectly when parsing
floating point numbers. The rest of the code only uses `unget()`
immediately followed by `goto Done;` to return back the symbol that
can't possibly belong to the directive we're processing.

With floating-point, however, the ungot characters could very well
be valid for the *next* directive, so we will essentially read them
twice. It can't be seen in `sscanf()` tests because `unget()` is a
no-op there, but the test I added for `fscanf()` fails like this:

        ...
        EXPECT_EQ(0xDEAD, i1)
                need 57005 (or 0xdead) =
                 got 908973 (or 0x000ddead)
        ...
        EXPECT_EQ(0xBEEF, i2)
                need 48879 (or 0xbeef) =
                 got 769775 (or 0x000bbeef)

This means we read 0xDDEAD instead of 0xDEAD and 0xBBEEF instead of
0xBEEF. I checked that both musl and glibc read 0xDEAD/0xBEEF, as
expected.

Fix the failing test by removing the unneeded `unget()` calls.
Currently, we just ignore any errors from `strtod()`. They can
happen either because no valid float can be parsed at all, or
because the state machine recognizes only a prefix of a valid
floating-point number.

Fix this by making sure `strtod()` parses everything we recognized,
provided it's non-empty. This requires to pop the last character
off the FP buffer, which is supposed to be parsed by the next
`*scanf()` directive.
Currently, `%c`-style directives always succeed even if there
are actually fewer characters in the input than requested.

Before the fix, the added test fails like this:

        ...
        EXPECT_EQ(2, sscanf("ab", "%c %c %c", &c2, &c3, &c4))
                need 2 (or 0x02 or '\2' or ENOENT) =
                 got 3 (or 0x03 or '\3' or ESRCH)
        ...
        EXPECT_EQ(0, sscanf("abcd", "%5c", s2))
                need 0 (or 0x0 or '\0') =
                 got 1 (or 0x01 or '\1' or EPERM)

musl and glibc pass this test.
Copy link
Collaborator

@G4Vi G4Vi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look good. Thanks for adding thorough tests! Just "Requesting Changes" so you can own your own test file.

Copy link
Owner

@jart jart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending this! Looks good!

@jart jart merged commit f7ff515 into jart:master Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants