Commit d1b7ce1

Add NFS crashed, XFS and deprecate ZFS mover.
Work continues; work around bugs found via use of a different filesystem for cache.
1 parent 0ccccef commit d1b7ce1

3 files changed

+72
-3
lines changed

README.md

+16-2
@@ -25,7 +25,8 @@ The following open-source projects seem to be able to help reach my goals. It re
2525

2626
- [SnapRAID](https://www.snapraid.it). Provides data parity, backups, checksumming of existing backups.
2727
- [Claims to be better than UNRAID's](https://www.snapraid.it/compare) own parity system with the ability to 'fix silent errors' and 'verify file integrity' among others.
28-
- [Btrfs Filesystem](https://btrfs.wiki.kernel.org/index.php/Main_Page) similar to ZFS in that it provides the ability to 'send/receive' data streams (à la `zfs send`) with the added benefit that I can run individual `disk scrubs` to detect hardware issues that require me to restore from snapraid parity.
28+
- [Btrfs Filesystem](https://btrfs.wiki.kernel.org/index.php/Main_Page) similar to ZFS in that it provides the ability to 'send/receive' data streams (à la `zfs send`) with the added benefit that I can run individual `disk scrubs` to detect hardware issues that require me to restore from snapraid parity. **In my testing, Btrfs performance is poor compared to XFS on Linux.** *Since we use Btrfs only for the 'data' disks in the slow mergerfs pool, we are not sensitive to speed.*
29+
- **XFS Filesystem for NVMe cache on mdadm array**. After finding bugs and instability in my ZFS+NFS+mergerfs implementation, my cache disks are now formatted as XFS on an mdadm RAID1 array. I did not use native Btrfs RAID1 here because Btrfs performance was poor (50% throughput penalty). XFS was able to match raw ZFS speeds (without ARC) at ~900MB/s.
2930
- [MergerFS](https://github.com/trapexit/mergerfs). FUSE filesystem that allows me to 'stitch together' multiple hard drives with different mountpoints and takes care of directing I/O operations based on a set of rules/criteria/policies.
3031
- [snapraid-btrfs](https://github.com/automorphism88/snapraid-btrfs). Automation and helper script for Btrfs-based snapraid configurations. Using Btrfs snapshots as the data source for running 'snapraid sync' allows me to continue using my system 24/7 without data corruption risks or downtime when I want to build my parity/snapraid backups.
3132
- [snapraid-btrfs-runner](https://github.com/fmoledina/snapraid-btrfs-runner). Helper script that runs `snapraid-btrfs` sending its output to the console, a log file and via email.
@@ -38,14 +39,27 @@ The following open-source projects seem to be able to help reach my goals. It re
3839
apt-get install zfsutils-linux cockpit-pcp btrfs-progs libbtrfsutil1 btrfs-compsize duc smartmontools
3940
```
4041

41-
## ZFS cache pool setup
42+
## ~~ZFS cache pool setup~~
43+
**WARNING! DEPRECATED** NFS+ZFS is unstable with this setup. Follow XFS+mdadm below.
4244

4345
RAID1 of two SSDs. All writes land here first and are then purged to slower 'cold-storage' disks via cron.
4446

4547
```
4648
zpool create -o ashift=12 cache mirror /dev/sdb /dev/nvme0n1
4749
```
4850

51+
## XFS RAID1 mirror mdadm
52+
53+
See [mergerfs](mergerfs.md) for details on ZFS instability. For our cache pool we will use the XFS filesystem. Set up the NVMe cache as follows:
54+
55+
```
56+
mdadm --create --verbose /dev/md0 --bitmap=none --level=mirror --raid-devices=2 /dev/nvme0n1 /dev/sdb
57+
mkfs.xfs -f -L cache /dev/md0
58+
mdadm --detail /dev/md0
59+
```
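`mdadm --detail /dev/md0` is a manual check. As a rough sketch of how the same "are both mirror members present?" question could be answered from a script, the function below parses the text of `/proc/mdstat`. The function name `mirror_is_complete` is hypothetical, and it assumes the usual layout in which the array's header line is followed by an indented status line containing a `[configured/active]` counter such as `[2/2]`.

```python
import re


def mirror_is_complete(mdstat_text, array="md0"):
    """Return True when every configured member of `array` is active."""
    lines = mdstat_text.splitlines()
    for index, line in enumerate(lines):
        if not line.startswith(array + " :"):
            continue
        # The "[configured/active]" counter sits on the indented lines that
        # follow the array's header line; a blank line ends the array's entry.
        for status in lines[index + 1:]:
            if not status.strip():
                break
            match = re.search(r"\[(\d+)/(\d+)\]", status)
            if match:
                return match.group(1) == match.group(2)
        break
    raise ValueError("no member counter found for " + array)
```

On the NAS this would be called with the contents of `/proc/mdstat`, for example `mirror_is_complete(open("/proc/mdstat").read())`. A `False` result means the mirror is degraded and the cache no longer has redundancy.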
60+
61+
Remember to add a mount entry so the new filesystem is mounted at boot.
62+
4963
## BTRFS (disk setup guide)
5064

5165
### BTRFS Commands TL;DR

filemover/uncache-mover.py → filemover/zfs-uncache-mover.py

+2-1
@@ -1,7 +1,8 @@
11
#!/usr/bin/python3
22
# TheLinuxGuy ZFS cache pool mergerfs tiered cache mover.
33
# File age time-based mover depending on goal % cache utilization.
4-
# TODO: move logs to standalone log file to stop adding crap to syslog
4+
# This script works but is abandoned after NFS+ZFS+mergerfs instability.
5+
# !! THIS SCRIPT IS ZFS POOL SPECIFIC !! DO NOT USE ON XFS cache setup.
56
import argparse
67
import subprocess
78
import syslog
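
The header comment above describes the mover as age-based and driven by a target cache utilization. The following is a minimal sketch of that idea, not the author's script: it moves the least recently modified files out of the cache until usage drops to the target. All function names, the use of modification time as "age", and the `get_usage` hook are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def usage_percent(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def oldest_first(cache_root):
    """List regular files under `cache_root`, least recently modified first."""
    files = [p for p in Path(cache_root).rglob("*")
             if p.is_file() and not p.is_symlink()]
    return sorted(files, key=lambda p: p.stat().st_mtime)


def move_until_below(cache_root, cold_root, target_percent,
                     get_usage=usage_percent):
    """Move the oldest files to `cold_root` until usage is at or below target.

    `get_usage` is a hypothetical hook so the usage check can be swapped out
    (for example in tests). Returns the destination paths of moved files.
    """
    moved = []
    for src in oldest_first(cache_root):
        if get_usage(cache_root) <= target_percent:
            break
        # Preserve the directory layout relative to the cache root.
        dest = Path(cold_root) / src.relative_to(cache_root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Checking utilization before every move keeps the newest files on the fast tier for as long as possible, which is the point of a tiered cache.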

mergerfs.md

+54
@@ -1,5 +1,7 @@
11
# MergerFS
22

3+
**WARNING: Using ZFS + NFS (non-zfs native export) + mergerfs causes [ZFS mount instability and crashes](https://github.com/trapexit/mergerfs/discussions/1098).**
4+
35
MergerFS is used to "merge" all physically distinct disk partitions (/mnt/disk*) into a single logical volume mount.
46

57
### Policies
@@ -44,6 +46,58 @@ To attempt to mirror what unraid provides with their share "cache" we are going
4446

4547
Recall that I chose to use ZFS and RAID1 mirror for this purpose to provide assurances that my data would not be lost before it gets moved onto parity-protected-snapraid-slow-storage-disks.
4648

49+
## NFS instability
50+
51+
`/mnt/cached` is my mergerfs pool and ZFS mountpoint on my local system. The `mergerfs` process seems to crash at some point when NFS is in use. I haven't yet found the root cause and have tried upgrading the kernel, ZFS, nfs-kernel-server, libfuse and the OS (Ubuntu 20.04 to 20.10).
52+
53+
The crashes seem to be more pronounced with NFSv4. NFSv3 is more stable, but it is a stateless protocol and I would much prefer v4-only NFS shares. I have disabled v4 and forced v3 for the time being to try to make my implementation stable.
54+
55+
Observed behavior (on local NAS):
56+
```
57+
# ls -lah /mnt/cached
58+
ls: cannot access '/mnt/cached': Input/output error
59+
```
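
The failure above can also be detected from a script, which is handy for alerting or for skipping a mover run while the pool is down. A minimal sketch, assuming a crashed mergerfs mount shows up as `EIO` ("Input/output error", as in the `ls` output) or `ENOTCONN` ("Transport endpoint is not connected", another common symptom of a dead FUSE mount); the function name and the `lister` parameter are hypothetical:

```python
import errno
import os


def mount_is_healthy(path, lister=os.listdir):
    """Return False when listing `path` fails the way a dead FUSE mount does."""
    try:
        lister(path)
    except OSError as exc:
        # EIO matches the "Input/output error" seen with ls; ENOTCONN is
        # included as an assumption about how a dead FUSE mount can appear.
        if exc.errno in (errno.EIO, errno.ENOTCONN):
            return False
        # Anything else (missing path, permissions) is a different problem.
        raise
    return True
```

Calling `mount_is_healthy("/mnt/cached")` from cron would flag the crash without anyone having to notice failed NFS clients first.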
60+
61+
Recovery steps
62+
63+
64+
### Debugging with strace
65+
66+
```
67+
root@nas:/home/gfm# strace -fvTtt -s 256 -p PIDHERE -o /tmp/mergerfs.strace.txt
68+
strace: Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf: Operation not permitted
69+
strace: attach: ptrace(PTRACE_SEIZE, 2081428): Operation not permitted
70+
root@nas:/home/gfm# echo "0"|sudo tee /proc/sys/kernel/yama/ptrace_scope
71+
0
72+
```
73+
74+
If that doesn't work, set `kernel.yama.ptrace_scope = 0` in `/etc/sysctl.d/10-ptrace.conf` and reboot.
75+
76+
According to the mergerfs developer, strace isn't helpful for a crash. The proper way to debug mergerfs is with gdb.
77+
78+
### gdb debugging mergerfs
79+
80+
```
81+
If it's crashing then strace is pretty useless. Need a stack trace from gdb.
82+
83+
gdb path/to/mergerfs
84+
85+
run -f -o options branches mountpoint
86+
87+
when it crashes
88+
89+
thread apply all bt
90+
```
91+
92+
### Remove ZFS from the equation by using XFS RAID 1
93+
94+
```
95+
mdadm --create --verbose /dev/md0 --bitmap=none --level=mirror --raid-devices=2 /dev/nvme0n1 /dev/sdb
96+
mkfs.xfs -f -L cache /dev/md0
97+
mdadm --detail /dev/md0
98+
```
99+
100+
47101
### NFS tweaks that were added
48102
49103
```
