vm.dirty_bytes Pop!OS customization trashes BTRFS performance #111
Comments
Wow, glad I found this issue. It started happening to me after upgrading to PopOS 21.10, and the experience has been HORRIBLE. I have a BTRFS system on bcache. I don't know if that combination made it worse than normal, but the system would become barely usable. I have NEVER had such a negative experience in Linux, and I've been running it for probably 15 years now.

I would get stuck watching a YouTube video when it froze, unable to leave the window with Alt+Tab or exit full screen for a long time. Sometimes just Alt+Tabbing between windows would get stuck and I couldn't tell why. I eventually figured out that I could unfreeze the system with the Magic SysRq combination Alt+SysRq+S, which syncs the filesystems, so I figured it must be some kind of filesystem buffering problem. I had to use this shortcut any time a YouTube video froze as I navigated around in it, or with any web app that opens a lot of connections. explorer.helium.com was one that triggered this pretty often, with the page just freezing forever.

I have another system using ZFS that sometimes seems slow. I wonder if that one is affected by this as well, although it doesn't seem nearly as bad as this one has been. This is TERRIBLE default behavior, and it is not at all obvious how to figure out where the issue comes from.
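The freezes described above clear after an Alt+SysRq+S sync, which points at the dirty page cache. One low-tech way to look at this is to sample the `Dirty` ("waiting to get written back to the disk") and `Writeback` ("actively being written back") counters in `/proc/meminfo` while a freeze happens. Below is a minimal sketch, assuming a Linux system. The function names are hypothetical, and the script has not been tested against the systems in this thread.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: integer value} (values are in kB where a unit is shown)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def watch_dirty(samples=30, interval=1.0, path="/proc/meminfo"):
    """Print the Dirty and Writeback counters once per interval (Linux only)."""
    for _ in range(samples):
        with open(path) as f:
            info = parse_meminfo(f.read())
        # Dirty: memory waiting to get written back to the disk.
        # Writeback: memory actively being written back to the disk.
        print(f"Dirty: {info['Dirty']:>10} kB   Writeback: {info['Writeback']:>8} kB", flush=True)
        time.sleep(interval)
```

Running `watch_dirty(samples=120)` in a terminal while reproducing a freeze lets you compare how the two counters behave with and without the Pop!OS `vm.dirty_bytes` setting in place.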
Seems this can be closed since #121 is merged.
It sort of sounds like the performance characteristics across different filesystems and disk types have diverged enough that a single, global value for […] But that is a feature request for the Linux kernel project, not here. 😉
Here is a script that manages most […]: https://gitlab.com/cscs/maxperfwiz/-/blob/master/maxperfwiz?ref_type=heads
Distribution (run `cat /etc/os-release`):

Related Application and/or Package Version (run `apt policy $PACKAGE NAME`): `apt policy pop-default-settings` reports

- Installed: `4.0.6~1611854075~20.04~6a2277e`
- Candidate: `4.0.6~1611854075~20.04~6a2277e`
- Version table: `*** 4.0.6~1611854075~20.04~6a2277e 1001`; `1001 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 Packages`; `1001 http://ppa.launchpad.net/system76/pop/ubuntu focal/main i386 Packages`; `100 /var/lib/dpkg/status`
Issue/Bug Description:
Commit 6a2277e reports:
Unfortunately this fix has the unintended side effect of completely trashing the performance of COW filesystems like BTRFS for regular use as rootfs/home on fast SSDs!
No penalty is observed when writing large files to a BTRFS partition, but the change has very negative effects on operations that do many small writes, such as touching metadata during a `btrfs receive` operation, or even just writing a lot of small files (e.g. untarring a big archive with a complex directory structure). Such operations can take up to 20 times the wall-clock time they take with this change commented out, which reverts to the defaults `vm.dirty_ratio = 20` and `vm.dirty_background_ratio = 10`.

When BTRFS is used as rootfs and home, this is even worse. Operations as simple as `apt update` (or PackageKit doing it in the background for you) or `apt upgrade` can cause freezes. So can regular Firefox/Chrome operation, since browsers can write frequently to their local on-disk cache. These freezes last from a few seconds to a few minutes, with the CPU stuck in iowait and every process on the scheduler waiting for the kernel-triggered IO thrashing to end.

Operations where the user is intentionally doing a lot of writes are even worse: compiling big projects, cloning a moderate or big git repo locally, or using `ccache` becomes just unbearable!

My suggestion is to revert this change, or to find a different compromise that fixes the occasional OOM problems when writing big files to slow block devices without making it impossible to do many small writes to fast devices.
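To see why a fixed byte limit behaves so differently from the ratio defaults, it helps to compute the thresholds the kernel ends up with. The sketch below assumes the semantics documented for the `vm` sysctls. A nonzero `dirty_bytes` (or `dirty_background_bytes`) overrides its `*_ratio` counterpart. The ratios are percentages of available memory, meaning free plus reclaimable pages. The background threshold is also assumed to be clamped to half of the hard limit whenever it would otherwise reach it. The function names are hypothetical, and per-device and real-time-task adjustments are ignored.

```python
import os

VM_KNOBS = ("dirty_bytes", "dirty_ratio", "dirty_background_bytes", "dirty_background_ratio")


def read_vm_settings(proc_dir="/proc/sys/vm"):
    """Read the four dirty-memory sysctls from procfs as integers (Linux only)."""
    settings = {}
    for name in VM_KNOBS:
        with open(os.path.join(proc_dir, name)) as f:
            settings[name] = int(f.read().strip())
    return settings


def threshold(byte_setting, ratio_setting, available_bytes):
    """A nonzero *_bytes value takes precedence; otherwise *_ratio is a percentage of available memory."""
    if byte_setting > 0:
        return byte_setting
    return available_bytes * ratio_setting // 100


def dirty_limits(settings, available_bytes):
    """Return (background, hard) dirty thresholds in bytes."""
    hard = threshold(settings["dirty_bytes"], settings["dirty_ratio"], available_bytes)
    background = threshold(
        settings["dirty_background_bytes"], settings["dirty_background_ratio"], available_bytes
    )
    # Assumed to mirror the kernel's clamp: background writeback must start below the hard limit.
    if background >= hard:
        background = hard // 2
    return background, hard
```

For example, with the defaults and about 10 GiB of available memory, the hard limit is about 2 GiB. That is the point at which a process generating writes must start writing out dirty data itself. A hypothetical `vm.dirty_bytes` of 16 MiB would move that point to 16 MiB regardless of how fast the device is. The value actually set by commit 6a2277e is not quoted in this issue, so read it with `read_vm_settings()` on an affected system. `MemAvailable` from `/proc/meminfo` is only a rough proxy for the kernel's own "available memory" figure.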
The comments on the LWN article linked in the original commit are quite enlightening. They show that similar problems on COW filesystems were anticipated when this path was taken. They also suggest it might be difficult to strike a good balance without actual kernel changes that would make these sysctl knobs superfluous.
Steps to reproduce (if you know):
- … (mount options: `defaults,noatime,compress=zstd`, but they are not particularly relevant; you can test with or without them)
- … `iotop` and `htop` to examine CPU and IO utilization; alternatively, you can also use `sysstat` to collect the data and visualize it afterwards

Expected behavior:
Using Pop!OS on a BTRFS root filesystem should be usable, and its performance should not be crippled just to avoid rare corner cases when writing large files to slow devices.
Other Notes:
The sample `.tar` I used to debug the performance issues I was seeing, which finally led me to isolate commit 6a2277e as the root cause, was a backup of my old rootfs partition. It doesn't need to be huge: anything that contains a lot of files, with a lot of associated metadata, will work. In fact, the smaller the ratio between the total archived data size and the number of files and metadata entries, the more visible the difference should be.
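The reporter's sample was a real rootfs backup. As a rough synthetic stand-in, the sketch below builds an uncompressed tar of many tiny files and times its extraction. It assumes Python 3 and a working directory on the BTRFS filesystem under test. The helper names, directory layout, file counts, and sizes are hypothetical choices, not the reporter's setup.

```python
import io
import os
import tarfile
import tempfile
import time


def build_small_file_tar(path, n_dirs=200, files_per_dir=50, file_size=512):
    """Write an uncompressed tar holding many small files in a nested directory tree."""
    payload = b"x" * file_size
    with tarfile.open(path, "w") as tar:
        for d in range(n_dirs):
            for i in range(files_per_dir):
                # Group directories in batches of 20 to get some nesting depth.
                info = tarfile.TarInfo(name=f"d{d // 20}/d{d}/f{i}.txt")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return n_dirs * files_per_dir


def time_extract(tar_path, dest, sync=True):
    """Extract tar_path into dest and return wall-clock seconds, optionally including a final sync."""
    can_sync = sync and hasattr(os, "sync")
    if can_sync:
        os.sync()  # flush data left over from building the archive so it is not timed
    start = time.monotonic()
    with tarfile.open(tar_path) as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # Python versions with extraction filters
        else:
            tar.extractall(dest)
    if can_sync:
        os.sync()  # include the writeback of the extracted files in the measurement
    return time.monotonic() - start


def run_benchmark(workdir=None, **tar_options):
    """Build and extract the archive inside workdir, which should be on the filesystem under test."""
    with tempfile.TemporaryDirectory(dir=workdir) as tmp:
        tar_path = os.path.join(tmp, "small-files.tar")
        count = build_small_file_tar(tar_path, **tar_options)
        return count, time_extract(tar_path, os.path.join(tmp, "out"))
```

One way to use it is to call `run_benchmark(workdir="/path/on/btrfs")` once with the Pop!OS setting active. Then call it again after `sudo sysctl -w vm.dirty_ratio=20 vm.dirty_background_ratio=10`; writing a ratio makes its `*_bytes` counterpart read back as 0. The final `os.sync()` is there so that the timing covers writing the data out, not just filling the page cache.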