Description
Describe the bug
Zram is currently configured to use zstd1, which is suboptimal.
What did you expect to happen?
I've spent an inordinate amount of time optimizing zram on my system.
Benchmarks of zstd1 vs. lz4
https://www.reddit.com/r/Fedora/comments/mzun99/new_zram_tuning_benchmarks/
An explanation of vm.swappiness
https://docs.kernel.org/admin-guide/sysctl/vm.html#swappiness
https://stackoverflow.com/questions/72544562/what-is-vm-swappiness-a-percentage-of
Overcommitting memory (zram being bigger than RAM size) is good
https://issuetracker.google.com/issues/227605780
Gathered wisdoms
- Swap is usually written sequentially but read randomly
- Zram writes don't matter nearly as much, because the page being compressed is already in RAM and RAM-to-RAM transfers are absurdly quick
- Zram's block size is hardcoded to 4K
- For system memory performance, applications barely care about bandwidth; latency is what matters (CAS latency in hardware, IOPS in software)
- You still want a swapfile even with zram, because you will have incompressible or very idle pages that you want to evict from RAM. This goes double for the Deck, which will be RAM-starved in some games because the GPU claims a lot of RAM. More on this later.
Random read IOPS on a Samsung 970 EVO Plus 1TB
- lz4: 2 030 000 (!)
- zstd1: 820 000
- 970 EVO Plus: 15 300
Compression ratios on mixed data
- lz4: 2.1
- zstd1: 2.9
This is very relevant for the Deck, because <12GB is right where, in a lot of scenarios, the benefits of the extra memory from zstd1 start to outstrip the latency benefits of lz4.
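To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch in Python using the compression ratios above; the 8 GB zram size is an assumed figure for illustration, not the Deck's actual default:

```python
# Back-of-the-envelope: RAM actually consumed by a full zram device.
# Compression ratios are the mixed-data numbers quoted above; the 8 GB
# zram size is an assumed figure for illustration only.
ZRAM_SIZE_GB = 8
RATIOS = {"lz4": 2.1, "zstd1": 2.9}

for algo, ratio in RATIOS.items():
    resident_gb = ZRAM_SIZE_GB / ratio  # physical RAM the compressed pages occupy
    print(f"{algo}: {ZRAM_SIZE_GB} GB of swapped pages sits in {resident_gb:.2f} GB of RAM")
```

At these ratios, zstd1 frees up roughly an extra gigabyte per 8 GB of swapped pages, which is exactly the kind of headroom that matters once the GPU has carved out its share.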
Valve probably has a lot of profiled data, but as far as I've been able to tell, even the heaviest games don't go much over 4GB of VRAM.
Swappiness
Swappiness can be derived via a formula. The kernel documentation states:
For example, if the random IO against the swap device is on average 2x faster than IO from the filesystem, swappiness should be 133 (x + 2x = 200, 2x = 133.33).
You can reduce that to yx = 200 - x, where y is the filesystem-to-swap IO ratio.
With the 970 EVO Plus as the example again, we have the aforementioned read IOPS values: 970 EVO Plus vs. lz4 = 15 300 / 2 030 000 ≈ 0.008, so 0.008 is our ratio. We plug that in: 0.008x = 200 - x, so x = 200 / 1.008 ≈ 198.4, and we get vm.swappiness=198.
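For convenience, the same derivation as a small Python sketch (the IOPS figures are the benchmark numbers above; the truncation at the end mirrors rounding 198.4 down):

```python
def swappiness(fs_iops: float, swap_iops: float) -> int:
    """vm.swappiness per the kernel docs' rule of thumb:
    y*x = 200 - x  =>  x = 200 / (1 + y),
    where y is the filesystem-to-swap IO ratio."""
    y = fs_iops / swap_iops
    return min(200, int(200 / (1 + y)))  # truncate into the valid 0..200 range

# Read IOPS from the benchmark above: 970 EVO Plus vs. zram with lz4.
print(swappiness(fs_iops=15_300, swap_iops=2_030_000))  # -> 198
```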
Page clusters
These are logarithmic: the value is log2 of the number of pages read from swap at once. With zram, you get noticeable latency improvements by reading a single page at a time, i.e. vm.page-cluster=0
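In concrete terms (a tiny sketch; the 4 KiB page size is zram's hardcoded block size, as noted above):

```python
# vm.page-cluster is log2 of the number of pages read from swap per fault.
PAGE_KIB = 4  # zram's hardcoded block size

for pc in range(4):
    pages = 2 ** pc
    print(f"vm.page-cluster={pc}: {pages} page(s) = {pages * PAGE_KIB} KiB per swap read")
```

With zram there is no seek penalty to amortize, so the larger reads only add latency.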
Writeback device (backing swap partition)
https://www.kernel.org/doc/html/v5.9/admin-guide/blockdev/zram.html#writeback
Remember how I mentioned still needing a swapfile?
Here is where it gets slightly more convoluted.
- Zram currently only accepts swap partitions as a writeback device.
- This swap partition also cannot be the "ordinary" swap partition of the system, although I don't know if the Deck uses its swapfile for anything beyond that. Thankfully, you can replace the swapfile with a swap partition, effectively costing no extra space
- Currently, marking pages idle and evicting them has to be invoked via the corresponding sysfs commands. This could be done via either cron or perhaps a systemd service; a sketch of such a job follows the links below. This will also need a writeback limit set to prevent wearing out SSDs, especially on the 64GB eMMC Deck.
Two links about configuring page eviction with writeback:
RFE: Actually use the writeback device systemd/zram-generator#164
https://android.googlesource.com/platform/frameworks/base/+/master/services/core/java/com/android/server/ZramWriteback.java
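Based on the kernel documentation linked above and the Android approach, a periodic job could look roughly like the sketch below. Treat it as untested: the device name zram0, the 128 MiB budget, and the pass interval are all assumptions, not Deck defaults.

```python
#!/usr/bin/env python3
"""Sketch of a periodic zram writeback pass (run as root from cron or a
systemd timer). Follows the sysfs interface from the kernel zram docs;
device name and writeback budget are assumptions, not Deck defaults."""
from pathlib import Path

ZRAM = Path("/sys/block/zram0")
# Budget per pass: 128 MiB expressed in 4K pages, to limit SSD/eMMC wear.
WRITEBACK_LIMIT_PAGES = (128 * 1024 * 1024) // 4096

def writeback_pass() -> None:
    (ZRAM / "writeback_limit_enable").write_text("1")
    (ZRAM / "writeback_limit").write_text(str(WRITEBACK_LIMIT_PAGES))
    # Evict pages still flagged idle from the previous pass; anything the
    # system touched in the meantime has lost its idle flag and stays put.
    (ZRAM / "writeback").write_text("idle")
    # Re-mark all current pages idle for the next pass.
    (ZRAM / "idle").write_text("all")

if __name__ == "__main__":
    writeback_pass()
```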
Extra
There is also recompression with a secondary algorithm, although I have not tried it out yet and it is only available in newer kernels.
https://www.kernel.org/doc/html/latest/admin-guide/blockdev/zram.html#recompression
Output of rpm-ostree status
No response
Hardware
No response
Extra information or context
No response