to_json(std::filesystem::path) can create invalid UTF-8 chars on windows #4271

@MHebes

Description

This conversion function:

https://github.com/nlohmann/json/blob/7efe875495a3ed7d805ddbb01af0c7725f50c88b/include/nlohmann/detail/conversions/to_json.hpp#L416C1-L420C2

template<typename BasicJsonType>
inline void to_json(BasicJsonType& j, const std_fs::path& p)
{
    j = p.string();
}

uses p.string(), which does not give a UTF-8-encoded string on Windows (at least in some cases). Calling dump() on the resulting JSON then throws an "invalid UTF-8 byte" exception.
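One possible direction, sketched here on the assumption that std::filesystem::path::u8string() is acceptable for this conversion, is to ask the path for UTF-8 explicitly instead of relying on string(). The helper name path_to_utf8 is hypothetical and not part of nlohmann/json. The copy through iterators is there because u8string() returns std::string in C++17 but std::u8string in C++20.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not part of nlohmann/json): returns the path as UTF-8 bytes.
inline std::string path_to_utf8(const std::filesystem::path& p)
{
    // u8string() yields std::string in C++17 and std::u8string in C++20;
    // copying element by element compiles for both.
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}
```

With a helper like this, `nlohmann::json j = path_to_utf8(p);` should store UTF-8 bytes regardless of the active code page. Whether the library's own to_json overload should do the same is a separate design question.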

Reproduction steps

Convert a std::filesystem::path that contains the Unicode character "Right Single Quotation Mark" (U+2019) to a json, either implicitly or with to_json.

Inspect the bytes of the resulting json string (string_t), either by calling dump() or by converting to BSON.
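Since dump() throws on the bad byte, it can be easier to look at the stored string directly. The following is a minimal sketch of a byte dumper. The name hex_bytes is hypothetical, and it assumes the string has already been pulled out of the json value, for example with j.get<std::string>().

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders each byte of s as two lowercase hex digits,
// separated by single spaces.
inline std::string hex_bytes(const std::string& s)
{
    std::string out;
    char buf[4];
    for (unsigned char c : s)
    {
        std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(c));
        if (!out.empty())
        {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

For a correctly encoded U+2019 this would give "e2 80 99". For the string produced in this report it would give "92".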

Expected vs. actual results

Expected: "Strings are stored in UTF-8 encoding." per https://json.nlohmann.me/api/basic_json/string_t/

Actual: The string gets converted by std::filesystem::path::string(), which appears to convert it to Windows-1252 encoding. Its bytes end up as \x92 rather than \xe2\x80\x99.
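To make the expected bytes concrete, here is a small sketch that encodes a single code point as UTF-8 by hand. The function name encode_utf8 is hypothetical, and it assumes the input is a valid Unicode scalar value (not a surrogate, at most 0x10FFFF). U+2019 falls in the three-byte range, whereas Windows-1252 maps the same character to the single byte 0x92.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode scalar value as UTF-8.
// Assumes cp is valid (no surrogates, cp <= 0x10FFFF); no validation is done.
inline std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encode_utf8(0x2019) yields the three bytes \xe2\x80\x99, which is what string_t is documented to hold.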

Minimal code example

#include <filesystem>
#include <iostream>
#include <nlohmann/json.hpp>

int main() {
  try {
    wchar_t wide_unicode_right_quote[2] = {0x2019, 0};  // came from a directory_iterator in reality
    nlohmann::json apost = std::filesystem::path(wide_unicode_right_quote);
    std::cout << apost << std::endl;
    return 0;
  } catch (const std::exception& e) {
    std::cerr << e.what() << std::endl;
    return 1;
  }
}

The workaround I'm using is to call WideCharToMultiByte on .native() to get the string in UTF-8 before passing it to nlohmann:

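#include <string>
#include <string_view>
#include <windows.h>

// Converts a UTF-16 wide string to UTF-8 using the Win32 API.
inline std::string Narrow(std::wstring_view wstr) {
  if (wstr.empty()) return {};
  const int wlen = static_cast<int>(wstr.size());  // the API takes int lengths
  const int len = ::WideCharToMultiByte(CP_UTF8, 0, wstr.data(), wlen, nullptr, 0, nullptr, nullptr);
  std::string out(len, '\0');
  ::WideCharToMultiByte(CP_UTF8, 0, wstr.data(), wlen, out.data(), len, nullptr, nullptr);
  return out;
}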

int main() {
  try {
    wchar_t wide_unicode_right_quote[2] = {0x2019, 0};  // came from a directory_iterator in reality
    nlohmann::json apost = Narrow(std::filesystem::path(wide_unicode_right_quote).native());
    std::cout << apost << std::endl;
    return 0;
  } catch (const std::exception& e) {
    std::cerr << e.what() << std::endl;
    return 1;
  }
}

Error messages

[json.exception.type_error.316] invalid UTF-8 byte at index 0: 0x92

Compiler and operating system

MSVC 2022 Professional, C++20

Library version

develop - a259ecc

Labels

kind: bug
solution: proposed fix (a fix for the issue has been proposed and waits for confirmation)
state: please discuss (please discuss the issue or vote for your favorite option)