Skip to content

Conversation

svanteschubert
Copy link
Contributor

Enabling FormatCodeTests and created follow-up issues for the remaining problems, i.e. #370, #371 and by updating a test file (using different dates for different formatting to be certain to have the correct cell) and using pretty printing XML more assertions had to be disabled due to #229

What has been changed:

  1. First of all, I realised that the output was invalid, but so was the input from IBM Lotus Symphony, released October 2010, which omitted the version attribute in the manifest, created Loading an invalid ODF without version attribute in manifest:manifest should add it #364 to fix invalid files (at least in this case) by adding a version.

    then I had to (re)learn this formatting language for spreadsheet cells, which seems to be implementation-dependent:
    image
    see https://docs.oasis-open.org/office/OpenDocument/v1.4/OpenDocument-v1.4-part4-formula.html#TEXT
    But the ODF implementations describe this formatting language:
  1. Second, the formats that I received from LibreOffice differ from the expected ones from the tests, so I simply resaved the document and took the date formatting from the file, which looked good, and changed the dates to be able to prove which cell was taken by the content.

  2. Third, I made some Nullpointer sanity checks, but I avoided changing the API, like sometimes an InvalidArgumentException is thrown, when the cell is not of the time type, but if not set, Null is returned. 

@svanteschubert svanteschubert requested a review from mistmist May 13, 2025 12:31
@xzel23
Copy link
Collaborator

xzel23 commented May 14, 2025

Ah, I thought this was only about the format strings. I have changed the code to not use the obsolete pre-Java 8 time classes and the not threadsafe SimpleDateFormatter. The java.time classes already use ISO 8601 out of the box when parsing or calling toString(), so this all gets a lot easier. When using custom format strings, the DateTimeFormatter accepts the same formats as SimpleDateFormatter (plus some added fields), but it is more picky and will refuse incorrect formats that SimpleDateFormatter will silently process.

@svanteschubert
Copy link
Contributor Author

@xzel23 While I had been waiting for the review for the enabling the format tests, the time refactoring causing some trouble.
Fixed one as part of my pan-ultimate commit, but there is one more issue:
testSetGetFormat() throws Exception - Unsupported field: MonthOfYear

The test calls seem so uncritical:
cell = tablerow.getCellByIndex(3);
cell.setTimeValue(currenttime);
cell.setFormatString("HH:MM:SS");

Others had the problem as well, but I was not able to fix it quickly and will return latest Wednesday noon to my desk:
Could you please take a look at it, Axel?

Unsupported field: MonthOfYear
java.time.temporal.UnsupportedTemporalTypeException
at java.base/java.time.LocalTime.get0(LocalTime.java:710)
at java.base/java.time.LocalTime.getLong(LocalTime.java:688)
at java.base/java.time.format.DateTimePrintContext.getValue(DateTimePrintContext.java:308)
at java.base/java.time.format.DateTimeFormatterBuilder$NumberPrinterParser.format(DateTimeFormatterBuilder.java:2914)
at java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2529)
at java.base/java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1905)
at java.base/java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1879)
at java.base/java.time.LocalTime.format(LocalTime.java:1444)
at org.odftoolkit.odfdom.doc.table.OdfTableCell.setCellFormatString(OdfTableCell.java:1620)
at org.odftoolkit.odfdom.doc.table.OdfTableCell.setFormatString(OdfTableCell.java:1581)
at org.odftoolkit.odfdom.doc.table.TableCellTest.testSetGetFormat(TableCellTest.java:278)

PS: Could you please take a look at it and work on this PR?

@xzel23
Copy link
Collaborator

xzel23 commented May 18, 2025

The problem is the 'M'. Both SimpleDateFormat (pre-Java 8) and DateTimeFormatter (java.time) interpret this as 'Month of year', and there is no month in a time (without a date). For minutes, 'm' has to be used for both formatting classes. Also, 'S' is for milliseconds, so HH:MM:SS should really be HH:mm:ss. So why is it not all lowercase? Because h is for the hour (1-12) in am/pm time.

I'll have a look at this PR later.

tablerow = table.getRowByIndex(7);
cell = tablerow.getCellByIndex(3);
cell.setTimeValue(currenttime);
cell.setFormatString("HH:mm:ss");

cell.setFormatString("HH:MM:SS");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cell.setFormatString("HH:mm:ss");

I first thought that ODF handles this differently, and we'd have to convert the format strings. But OdfNumberDateStyle explicitly mentions the SimpleDateFormat format string, which uses the same format as DateTimeFormatter. So we really need to use the lowercase letters for times. The list of symbols is given in the OdfNumberDateStyle Javadoc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LibreOffice Strings are all locale dependent. If we want to use the LibreOffice strings directly, a translation method has to be implemented with lookups for all supported locales. I think, internally, the ISO 8601 format should be used and a translation method if the library user wants to use the LibreOffice format strings (which as I understand it are not part of the ODF spec).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should maybe add additional String toLibreOfficeFormat(String isoFormat, Locale locale) and String parseLibreOfficeFormat(String loFormat, Locale locale) as convenience methods for users. To have localized meanings when calling setFormatString() is a nightmare when you want to develop an application that should be able to run on different locales.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LibreOffice Strings are all locale dependent.

AFAIK LibreOffice displays locale dependent stuff in the UI but always stores it as en-US (i.e. effectively locale-independent) in ODF.

(in contrast, i've had to solve problems related to importing locale-dependent text fields from DOCX....)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so the formatting code needs some updates. I think it would be a good thing to support the (english) LibreOffice format strings as well as the ISO format, using two different methods. I see if I can put together a two-way-converter building on what is already there and make a suggestion in code (PR).
That way, compatibility for existing users is maintained and we also have a more "standard" (from the Java perspective) way of dealing with formats. Ideally, this will cover all formats, but I'll start with dates and times.

Assert.assertEquals("date", cell.getValueType());

ods = OdfSpreadsheetDocument.newSpreadsheetDocument();
tbl = ods.getTableByName("Sheet1");
for (int i = 1; i <= 10; i++) {
cell = tbl.getCellByPosition("A" + i);
cell.setTimeValue(Calendar.getInstance());
cell.setFormatString("yyyy.MM.dd HH:mm:ss");
cell.setFormatString("YYYY.MM.dd HH:MM:SS");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

      cell.setDateTimeValue(Calendar.getInstance());
      cell.setFormatString("yyyy.MM.dd HH:mm:ss");

The format string contains a date part, so the value should contain a date as well. And in the format string, lowercase letters have to be used for the time part.

@@ -270,12 +272,11 @@ public void testSetGetFormat() throws Exception {
OdfTableCell cell = tablerow.getCellByIndex(3);
Calendar currenttime = Calendar.getInstance();
cell.setDateValue(currenttime);
cell.setFormatString("yyyy-MM-dd");
cell.setFormatString("YYYY-MM-dd");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use yyyy here, YYYYis the week-based year, which will give wrong results in the first week of january if the week starts in the year before.

@@ -313,30 +314,30 @@ public void testSetGetFormat() throws Exception {
for (int i = 1; i <= 10; i++) {
cell = tbl.getCellByPosition("A" + i);
cell.setDateValue(Calendar.getInstance());
cell.setFormatString("yyyy.MM.dd");
cell.setFormatString("YYYY.MM.dd");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small yyyyis correct here, see above.

}
cell = tbl.getCellByPosition("A11");
cell.setFormula("=max(A1:A10)");
// contains 'y' 'M' 'd' should be adapted as date.
cell.setFormatString("yyyy.MM.dd");
cell.setFormatString("YYYY.MM.dd");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yyyy

@@ -1025,9 +1028,9 @@ public void testSetDefaultCellStyle() throws Exception {
contentAutoStyles = contentDom.getOrCreateAutomaticStyles();

OdfNumberDateStyle dateStyle =
new OdfNumberDateStyle(contentDom, "yyyy-MM-dd", "numberDateStyle", null);
new OdfNumberDateStyle(contentDom, "YYYY-MM-dd", "numberDateStyle", null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yyyy

@@ -1050,7 +1053,8 @@ public void testSetDefaultCellStyle() throws Exception {
OdfTableCell aCell = column.getCellByIndex(0);
aCell.setValueType("date");
String format = aCell.getFormatString();
Assert.assertEquals("yyyy-MM-dd", format);
// due to https://github.com/tdf/odftoolkit/issues/371
Assert.assertEquals("YYYY-MM-dd", format);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert.assertEquals("yyyy-MM-dd", format);

@@ -1126,6 +1127,10 @@ public LocalTime getLocalTimeValue() {
if (getTypeAttr() == OfficeValueTypeAttribute.Value.TIME) {
String timeStr = mCellElement.getOfficeTimeValueAttribute();
try {
if (timeStr == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is an improvement: the old code would throw an exception, the new code more or less silently returns null. Further down, this may lead to incorrect results. Better throw an exception. If the user wants to handle this leniently, they should catch the exception.
If on the other hand empty cells with time value type should be allowed, we should return an Optional instead. Are empty time cells allowed in ODF? I don't know.

@@ -885,7 +885,8 @@ public void testGetCellByPosition() throws Exception {
table = mOdsDoc.getTableByName("Sheet1");
cell = table.getCellByPosition("C1");
Assert.assertNotNull(cell);
Assert.assertEquals("Currency", cell.getStringValue());
// FixMe: Testfile now with pretty printing in XML https://github.com/tdf/odftoolkit/issues/229
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. Does the pretty-printing change the string value?

@@ -613,7 +612,7 @@ public void setCurrencyCode(String currency) {
if (currency == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really related to this particular PR, but I see IllegalArgumentExceptions thrown all over the place for null arguments. There are points for and against using IllegalArgumentException instead of NullPointerException. When using NPE, we could simply use Objects.requireNonNull(currency) or Objects.requireNonNull(currency, "Currency code should not be null").

If we use IAE instead, we should add such a utility method in the codebase so this becomes:

    public void setCurrencyCode(String currency) {
        checkParameterNonNull(currency, "Currency code must not be nulll");
        ...
    }

Or use JSpecify anntotations (that makes the user's IDE, Spotbugs and other tools emit warnings) and possibly use a tool such as cabe to automatically have those checks injected during compilation based on the annotations (note: as the author of Cabe, I'm naturally biased here ).

This would simply look like this:

@NullMarked
class OdfTableCell {
    ...
    // as the whole class is @Nullmarked, currency must not be null
    // the user's IDE (if correctly configured) will warn about a (possibly) null value
    public void setCurrencyCode(String currency) { 
        ...
    }
}

Only where null is allowed, the parameter has to be annotated with @Nullable.

@svanteschubert
Copy link
Contributor Author

Let's summarise a bit on the date patterns we have been discussing.

We are an ODF library, our ground truth is the ODF specification.
Unfortunately, the Date patterns are not part of the ODF specification - reusing simply the ISO standard 8601 standard - but every ODF application can use its date patterns, see :
image
see https://docs.oasis-open.org/office/OpenDocument/v1.4/OpenDocument-v1.4-part4-formula.html#TEXT

These patterns are named in the ODF specification FormatCode, which is a sequence of characters with an implementation-defined meaning.
Unfortunately, the ISO 8601 standard is usually behind a pay wall, but there is a PDF draft of working group 5 available from 2016.
But an implementation of this standard might be - hard to prove, not having the spec - the Java implementation available since JDK 8: See https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#patterns

Nevertheless, in the file formats - within the XML - only the "FormatCode" of an ODF application will be found and must be translated back and forth from their implementation-defined to the JDK class.

In this issue, we might start making the existing code, which relies on IBM Lotus Symphony (based on OpenOffice.org), run with LibreOffice:
The FormatCode of LibreOffice is defined in its online help:
https://help.libreoffice.org/24.8/en-GB/text/shared/01/05020301.html?&DbPAR=CALC&System=WIN

When all @ignore are being removed from
https://github.com/tdf/odftoolkit/blob/master/odfdom/src/test/java/org/odftoolkit/odfdom/doc/table/TableCellTest.java
The problems become apparent, in addition to the text normalisation #229, which I will focus on now.
@xzel23 Would you like to give this format code fix a try, as it was related to the time/date refactorings you did earlier?

If anyone likes to provide MS Office ODF FormatCode support, any PR would be welcome! :-)

@xzel23
Copy link
Collaborator

xzel23 commented May 21, 2025

Hi @svanteschubert, yes, I wanted to do that anyway. My suggestion is to make the get- and set-methods accept the LibreOffice format strings, but also add alternative methods, maybe setIsoDateFormat() or setJavaDateFormat(), that accept the Java SimpleDateFormatter/DateTimeFormatter ISO compliant codes, because that's what Java developers are used to and many have in their code base.

PS: The symbols to be used in format strings are defined in the first part of the standard ("ISO 8601-1:2019(en)"), under "3.2.4 Symbols used in place of digits or signs". While you need to pay (IMHO too much) money to obtain a copy of the standard, that section is available free to read when you click the "Read Sample" button below the rendered page on the left of the ISO 8601 page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants