Adding support for FSDP2 #109
Conversation
After a read (without the tests) I think the cognitive difficulty is too high for me: there are recurse=True/False, FSDP vs. non-FSDP modules, DTensor vs. non-DTensor, etc., and their combinations make it hard to reason about correctness. Here is my suggestion:
- implement summon_full_params in Composer, as it is a quite generic util
- only support either FSDP1 or DTensor (not just FSDP2), not a hybrid of the two
- if it is FSDP1, use the FSDP1 summon; if there is a DTensor, replace it with a full tensor and cache the FQN and tying for the context swap, as you have done; otherwise do nothing
- use this context regardless of whether a module is an FSDP module or not

Let me know if this will work or if I missed anything.
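The DTensor branch of the suggestion above (swap each sharded parameter for a full tensor, cache the FQN and original shard, restore on exit, and degenerate to a no-op when nothing is sharded) can be sketched roughly as follows. Note this is a minimal structural sketch: `ShardedParam`, `Module`, and `full_tensor()` here are lightweight stand-ins for DTensor and nn.Module, not the real torch.distributed APIs.

```python
from contextlib import contextmanager

class ShardedParam:
    """Stand-in for a DTensor shard; full_tensor() mimics DTensor.full_tensor()."""
    def __init__(self, shards):
        self.shards = shards  # this rank's shard plus its peers', flattened here

    def full_tensor(self):
        # Materialize the unsharded parameter from all shards.
        return [x for shard in self.shards for x in shard]

class Module:
    """Stand-in for an nn.Module holding FQN -> parameter."""
    def __init__(self, params):
        self.params = params

    def named_parameters(self):
        return list(self.params.items())

@contextmanager
def summon_full_params(module):
    """Swap every sharded param for its full tensor; restore the shards on exit.

    A module with no sharded params passes through untouched, so this is
    effectively a null context in that case.
    """
    cached = {}  # FQN -> original sharded param, for the context swap back
    for fqn, p in module.named_parameters():
        if isinstance(p, ShardedParam):
            cached[fqn] = p
            module.params[fqn] = p.full_tensor()
    try:
        yield module
    finally:
        for fqn, p in cached.items():
            module.params[fqn] = p
```

Usage under these stand-ins: inside the context, a sharded `"w"` appears as its full tensor while a plain `"b"` is untouched; after exit, `"w"` is a shard again.

```python
m = Module({"w": ShardedParam([[1, 2], [3, 4]]), "b": [0.5]})
with summon_full_params(m):
    assert m.params["w"] == [1, 2, 3, 4]
    assert m.params["b"] == [0.5]
assert isinstance(m.params["w"], ShardedParam)
```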
@bowenyang008 re: this comment: yeah, this should work; moving it to Composer makes sense as I can just call it.
I think it is covered by my 3rd point above: summon_full_params can work on an FSDP1 module or on a module with or without DTensors; in the last case it is basically a null context.
Also LGTM. Before merging, can you run one more e2e test to make sure it produces the same result?
LGTM - would prefer merging the Composer PR first and not having to push arbitrary branches.
Will merge after the tests pass once the Composer PR goes in + the e2e test (but will only merge both once I get GPUs 😭).
Commits:
- formatting
- plz format
- working on testing
- added new test that fails using dtensor APIs
- added some more tests
- formatted
- finally works smh
- formatted plz work im begging
- undid changes to reward modeling since that's breaking on main
- why doesn't the formatter work smh
- testing out specific FSDP2 change
- added another test
- added some logging
- wip
- a comment
Moving all of the FSDP.summon_full_params usage to support both FSDP1 and FSDP2 (fully_shard). Unfortunately, this functionality doesn't exist by default for FSDP2, so I created a workaround to handle the args that we use within the codebase.

Tests:
- compose-rl-grpo-test-WHoytM
- compose-rl-grpo-test-fsdp2-q0Gy6Q <- tested with the changes from Composer's PR

They seem to fail due to unrelated issues that occur regardless of the changes (saving the checkpoints to UC Volumes), but the graphs look reasonable (link).
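The dispatch the PR describes (pick FSDP1's native summon, the DTensor full-tensor swap, or a no-op, depending on the root module) might look roughly like this. The predicate and context-factory parameters here are hypothetical hooks for illustration; in the real code they would correspond to FSDP.summon_full_params and a DTensor full_tensor() swap.

```python
from contextlib import contextmanager, nullcontext

def summon(module, *, is_fsdp1, fsdp1_summon, has_dtensor, dtensor_summon):
    """Return the appropriate 'summon full params' context for `module`.

    Hypothetical sketch: `is_fsdp1`/`has_dtensor` are predicates on the
    module, and `fsdp1_summon`/`dtensor_summon` are context-manager
    factories supplied by the caller.
    """
    if is_fsdp1(module):
        return fsdp1_summon(module)    # FSDP1 path: native summon
    if has_dtensor(module):
        return dtensor_summon(module)  # FSDP2 path: DTensor full-tensor swap
    return nullcontext(module)         # plain module: no-op context
```

Because the plain-module case returns `contextlib.nullcontext`, callers can use the same `with summon(...)` pattern everywhere, matching the "use this context regardless" suggestion from the review.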