
RAM Class Persistence: Introducing Frozen Classes #22063


Merged
merged 1 commit into from
Jul 9, 2025

Conversation

lzhou2025
Contributor

@lzhou2025 lzhou2025 commented Jun 9, 2025

During snapshot creation, J9Class structures whose class loader is the
application class loader are marked as frozen when their classObject
fields are nullified. During the restore phase, these structures are
never used and, as a limitation, are kept hidden from the GC and other
external users. This is achieved by updating the skip logic in the
J9HashTable and J9MemorySegment iterators, which prevents the GC from
encountering J9Class structures in an unusable state that could lead to
crashes. The ClassHeapIterator class is refactored to use a generic VM
interface, segmentIteratorNextClass(), to iterate over the classes in a
segment.
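
The skip logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual OpenJ9 code: `DemoClass`, `DemoSegment`, `DemoSegmentIterator`, `DEMO_CLASS_FROZEN`, and `demoSegmentNextClass()` are hypothetical stand-ins, and the real `segmentIteratorNextClass()` signature and the J9Class and J9MemorySegment layouts are not shown in this PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical flag bit; the real frozen marker in J9Class may be encoded differently.
constexpr std::uint32_t DEMO_CLASS_FROZEN = 0x1;

// Hypothetical stand-in for J9Class.
struct DemoClass {
    std::uint32_t flags;  // DEMO_CLASS_FROZEN is set while the class is frozen
    void *classObject;    // nullified in the snapshot, assigned again on load
};

// Hypothetical stand-in for a memory segment holding classes contiguously.
struct DemoSegment {
    DemoClass *classes;
    std::size_t count;
};

struct DemoSegmentIterator {
    const DemoSegment *segment;
    std::size_t next;  // index of the next class to examine
};

// Return the next class that external users may see, or nullptr at the end
// of the segment. Frozen classes are stepped over, so a caller such as the
// GC is never handed a class whose classObject was nullified.
inline DemoClass *demoSegmentNextClass(DemoSegmentIterator *it) {
    assert((nullptr != it) && (nullptr != it->segment));
    while (it->next < it->segment->count) {
        DemoClass *clazz = &it->segment->classes[it->next++];
        if (0 == (clazz->flags & DEMO_CLASS_FROZEN)) {
            return clazz;
        }
    }
    return nullptr;
}
```

Putting the check in one shared iterator function means every caller gets the same filtering, rather than each walker carrying its own null check.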

Fixes: #21892

Co-authored-by: Babneet Singh
Co-authored-by: Tobi Ajila
Co-authored-by: Lige Zhou

@lzhou2025 lzhou2025 changed the title RCP: Hide class objects restored from GC RCP: Skip GC's null checking Jun 10, 2025
@lzhou2025 lzhou2025 marked this pull request as ready for review June 10, 2025 13:42
@lzhou2025
Contributor Author

@babsingh @TobiAjila @dmitripivkine I checked all GC policies using the pingLiberty server program; please have a look.

@babsingh
Contributor

GC_ClassHeapIterator::nextClass and similar iterators will also need to be updated to skip frozen classes (J9ClassIsFrozenFromSnapshot) until their classObject is initialized.

GC_ClassHeapIterator::nextClass()

@lzhou2025 lzhou2025 marked this pull request as draft June 10, 2025 14:28
@dmitripivkine
Contributor

dmitripivkine commented Jun 10, 2025

I don't like this solution at all. Hiding J9Classes from the GC is a bad idea.
A J9Class can keep other objects alive. If it is skipped, those objects will never be scanned. You can say you guarantee that no other objects are referenced from the J9Class at this particular moment, but I think this condition can be hard to control, now and in the future.
To repeat my earlier request, let's set up a meeting with @amicic, @TobiAjila, @babsingh and myself to discuss options.
My first question (maybe I don't understand the context): why do we set j9class->classObject to NULL before creating the snapshot? This object exists in the heap and there is no other GC after this point, so the object is carried over to the restore side and can be used.
If you have a reason to clean up and re-create the class object, this new fundamental behaviour should be added to the GC code. We should legally allow a NULL pointer in j9class->classObject, at least during initial restore time. I believe this might require adding significant restrictions to GC activity (such as, but not limited to, disabling dynamic class unloading). When the GC marks j9class->classObject alive, that keeps the j9class itself alive and prevents its class loader from being unloaded.
The GC code can be adjusted to allow j9class->classObject to be NULL if it is really necessary, but it should be done properly.
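
The reachability concern can be illustrated with a toy mark pass. This is only a sketch under stated assumptions: `ToyObject`, `ToyClass`, and `toyMarkFromClasses()` are hypothetical names, and the real GC marking code is far more involved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical heap object with a mark bit.
struct ToyObject {
    bool marked;
};

// Hypothetical class structure that references one heap object,
// standing in for references such as statics or constant pool entries.
struct ToyClass {
    bool hiddenFromGC;     // true while the class is skipped by the iterators
    ToyObject *reference;  // may be nullptr
};

// Mark every object referenced from a class the GC is allowed to see.
// Objects referenced only from hidden classes are never marked.
inline void toyMarkFromClasses(ToyClass *classes, std::size_t count) {
    assert((nullptr != classes) || (0 == count));
    for (std::size_t i = 0; i < count; i++) {
        if (classes[i].hiddenFromGC) {
            continue;
        }
        if (nullptr != classes[i].reference) {
            classes[i].reference->marked = true;
        }
    }
}
```

In this toy model, an object referenced only from a hidden class stays unmarked, so hiding a class is safe only if it holds no heap references while hidden.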

@babsingh
Contributor

babsingh commented Jun 10, 2025

why do we set j9class->classObject to NULL before creating the snapshot?

We are caching RAM classes during shutdown in the snapshot, similar to how ROM classes are handled with the Shared Classes Cache. Currently, only the native portions of the J9Module, J9Class, and J9ClassLoader structures are cached in the snapshot. Java heap objects themselves are not cached, so their associated fields are nullified during snapshot creation at shutdown.

During the restore phase, the snapshot is loaded during JVM initialization. However, the corresponding Java objects are not yet initialized; they will be created later through their normal code paths. Despite this, the native structures from the snapshot become visible to the garbage collector as soon as the snapshot is loaded.

The goal is to delay the exposure of these native structures to the GC until their associated Java objects are fully initialized through standard execution paths.
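
As a rough sketch of that lifecycle (hypothetical names throughout: `CachedClass`, `CACHED_CLASS_FROZEN`, `freezeForSnapshot()`, `thawAfterLoad()`, and `isVisibleToGC()` are illustrative and not part of the VM):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bit marking a class restored from a snapshot
// whose Java heap object has not been created yet.
constexpr std::uint32_t CACHED_CLASS_FROZEN = 0x1;

// Hypothetical stand-in for the native part of a cached class.
struct CachedClass {
    std::uint32_t flags;
    void *classObject;  // stands in for the java.lang.Class heap reference
};

// Snapshot creation: heap objects are not cached, so the reference is
// dropped and the native structure is marked as frozen.
inline void freezeForSnapshot(CachedClass *clazz) {
    clazz->classObject = nullptr;
    clazz->flags |= CACHED_CLASS_FROZEN;
}

// Restore: once the normal code path has created the heap object, publish
// it first and only then clear the frozen marker.
inline void thawAfterLoad(CachedClass *clazz, void *newClassObject) {
    assert(nullptr != newClassObject);
    clazz->classObject = newClassObject;
    clazz->flags &= ~CACHED_CLASS_FROZEN;
}

// The GC-facing iterators would consult a check like this one.
inline bool isVisibleToGC(const CachedClass *clazz) {
    return 0 == (clazz->flags & CACHED_CLASS_FROZEN);
}
```

Assigning the class object before clearing the marker means a visible class in this sketch always has a non-null classObject.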

@lzhou2025
Contributor Author

GC_ClassHeapIterator::nextClass and similar iterators will also need to be updated to skip frozen classes (J9ClassIsFrozenFromSnapshot) until their classObject is initialized.

GC_ClassHeapIterator::nextClass()

I will include the update

@lzhou2025 lzhou2025 force-pushed the frozenClasses branch 2 times, most recently from 84ee554 to 157d324 Compare June 11, 2025 18:20
@lzhou2025 lzhou2025 changed the title RCP: Skip GC's null checking RCP: Introducing Frozen Classes Jun 11, 2025
@lzhou2025
Contributor Author

As per our Teams meeting with @TobiAjila and @amicic, the PR has been revised to implement frozen-class handling in the common class iterator interfaces.

@lzhou2025 lzhou2025 marked this pull request as ready for review June 11, 2025 18:34
@babsingh
Contributor

babsingh commented Jun 13, 2025

@lzhou2025 The PR description is slightly inaccurate. We are not freezing the j9class objects; we are freezing j9classes until their classObjects are initialized. Also, the why is missing in the description.

Updated version of the commit message and PR description:

During snapshot creation, J9Class structures are marked as frozen when
their classObject fields are nullified. During the restore phase, these
structures are kept hidden from the GC and other external users until
their classObject fields are properly initialized. This is achieved by
updating the skip logic in the J9HashTable and J9MemorySegment
iterators, which prevents the GC from encountering J9Class structures
in an unusable state that could lead to crashes.

In the PR description, the title is repeated, which should be fixed.

Can you confirm if #21892 is fixed through local testing once the changes are finalized?

@lzhou2025
Contributor Author

Will check it.

@lzhou2025 lzhou2025 force-pushed the frozenClasses branch 3 times, most recently from 7002eec to 865c696 Compare June 13, 2025 22:32
@lzhou2025
Contributor Author

I pushed the update but left the null check in place. If the null check is deleted, the JVM crashes. I spent some time debugging the crash: the null check appears to apply to classes that were loaded by the application class loader during the snapshot run and saved in the snapshot. In restore, that class loader is ignored and unused, and a new application class loader is created instead:

#if defined(J9VM_OPT_SNAPSHOT)

Does this cause those classes to never be initialized?

@lzhou2025 lzhou2025 marked this pull request as draft June 16, 2025 21:12
@lzhou2025 lzhou2025 force-pushed the frozenClasses branch 5 times, most recently from fe206bc to 5150e32 Compare July 4, 2025 15:38
@keithc-ca
Contributor

Could someone please expand "RCP" in this context?

@babsingh
Contributor

babsingh commented Jul 7, 2025

RCP – RAM Class Persistence: This refers to the caching of RAM classes in snapshots so they can be reused in subsequent runs.

@lzhou2025 lzhou2025 force-pushed the frozenClasses branch 2 times, most recently from afc3674 to 36a6020 Compare July 7, 2025 17:25
@keithc-ca
Contributor

RCP – RAM Class Persistence: This refers to the caching of RAM classes in snapshots so they can be reused in subsequent runs.

Both the commit message and the description here should spell that out.

@lzhou2025 lzhou2025 changed the title RCP: Introducing Frozen Classes RAM Class Persistence: Introducing Frozen Classes Jul 7, 2025
@lzhou2025
Contributor Author

Updated

@keithc-ca
Contributor

@dmitripivkine Has this changed sufficiently to address your concern?

@dmitripivkine
Contributor

@dmitripivkine Has this changed sufficiently to address your concern?

The suggested way of handling frozen classes (hiding them from the GC) introduces a number of limitations on how such classes must be treated, so the VM component takes responsibility for correctness. I have suggested adding assertions to enforce the isolation of frozen classes:

  • A frozen class should exist only for persistent class loaders, which are never unloaded (System, Application, Extensions).
  • A frozen class should not be used to create object instances, or used in any other way that lets the GC discover it indirectly (e.g. from the constant pool of another class).
  • A frozen class should first be activated (loaded) using special rules, including setting its class object; other required object pointers should be introduced only after that.
  • etc.

I can approve this change if the VM team developers are happy with this approach.
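
A minimal sketch of what such isolation assertions might check, assuming hypothetical names (`CheckedClass`, `LoaderKind`, `heapReferenceCount`, `frozenClassIsIsolated()`) that do not exist in the VM:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical flag bit marking a frozen class.
constexpr std::uint32_t CHECKED_CLASS_FROZEN = 0x1;

// Hypothetical classification of class loaders; Custom stands for any
// loader that can be unloaded.
enum class LoaderKind { System, Extension, Application, Custom };

// Hypothetical stand-in for a class structure.
struct CheckedClass {
    std::uint32_t flags;
    LoaderKind loader;
    void *classObject;
    std::size_t heapReferenceCount;  // assumed count of non-null object slots (constant pool, statics)
};

inline bool isPersistentLoader(LoaderKind kind) {
    return LoaderKind::Custom != kind;
}

// Return true if the class honours the isolation rules listed above.
// Classes that are not frozen pass trivially.
inline bool frozenClassIsIsolated(const CheckedClass *clazz) {
    if (0 == (clazz->flags & CHECKED_CLASS_FROZEN)) {
        return true;
    }
    return isPersistentLoader(clazz->loader)
        && (nullptr == clazz->classObject)
        && (0 == clazz->heapReferenceCount);
}

// Debug-build check that a VM-side walker could call on each class it touches.
inline void assertFrozenClassIsolation(const CheckedClass *clazz) {
    assert(frozenClassIsIsolated(clazz));
}
```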

@tajila
Contributor

tajila commented Jul 8, 2025

A frozen class should exist only for persistent class loaders, which are never unloaded (System, Application, Extensions).

This is an implied restriction, as we don't have the ability to cache classes from non-persistent class loaders.

A frozen class should not be used to create object instances, or used in any other way that lets the GC discover it indirectly (e.g. from the constant pool of another class).

All cached classes have empty constant pools and statics, so frozen classes can't be discovered until they are loaded from the cache (which transforms the class from a frozen class into a normal class).

A frozen class should first be activated (loaded) using special rules, including setting its class object; other required object pointers should be introduced only after that.

This is the current behaviour. This PR doesn't change the contract with the GC: NULL == clazz->classObject should not be possible when J9_EXTENDED_RUNTIME_CLASS_OBJECT_ASSIGNED is set, and the GC is free to assert this.
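
That contract could be expressed as a check along these lines. This is a sketch only: `SKETCH_CLASS_OBJECT_ASSIGNED` is a placeholder bit, not the real value of J9_EXTENDED_RUNTIME_CLASS_OBJECT_ASSIGNED, and `ContractClass`, `classObjectContractHolds()`, and `gcVisitClass()` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder bit standing in for J9_EXTENDED_RUNTIME_CLASS_OBJECT_ASSIGNED;
// the real value and where the flag is stored are not shown here.
constexpr std::uint32_t SKETCH_CLASS_OBJECT_ASSIGNED = 0x1;

// Hypothetical stand-in for a class the iterators hand to the GC.
struct ContractClass {
    void *classObject;
};

// The contract holds if the flag is not yet set, or if the class has a
// class object. Frozen classes never reach this check because the
// iterators skip them.
inline bool classObjectContractHolds(std::uint32_t runtimeFlags, const ContractClass *clazz) {
    if (0 == (runtimeFlags & SKETCH_CLASS_OBJECT_ASSIGNED)) {
        return true;
    }
    return nullptr != clazz->classObject;
}

// GC-side entry point that asserts the contract before using the class.
inline void gcVisitClass(std::uint32_t runtimeFlags, const ContractClass *clazz) {
    assert(classObjectContractHolds(runtimeFlags, clazz));
    // ... scanning of the class would follow here ...
}
```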

@tajila
Contributor

tajila commented Jul 8, 2025

jenkins test sanity.functional amac jdk17

@keithc-ca
Contributor

keithc-ca commented Jul 8, 2025

For the record, the build underway is at https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/7782.

@lzhou2025
Contributor Author

For the record, the build underway is at https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/7782.

I will cancel my job; I didn't notice that there was already a "PullRequest-OpenJ9" job.

@keithc-ca
Contributor

There's no need to cancel that job; it completed successfully.

@lzhou2025
Contributor Author


There's no need to cancel that job; it completed successfully.

Perfect

@tajila tajila merged commit 0bc5edf into eclipse-openj9:master Jul 9, 2025
2 checks passed
Development

Successfully merging this pull request may close these issues.

RCP: LibertySUDT Server crashed in restore
6 participants