Bug #47446
No snap trim progress after removing large snapshots (status: open)
Crash signature: 759622c65fcdf70d4f9a64584a98783ae01635168302f736846d707d878d7395
Description
I'm having trouble cleaning up a cluster with a relatively large CephFS (201M objects, 372 TiB) after snapshots were removed. I'm not sure whether this is really a bug or whether we are using CephFS with snapshots well beyond its limits.
After removing snapshots in large directories (>1 million files, >10 TB in size), snapshot trimming started and overloaded the cluster. All CPU cores of the OSD servers (8 cores with HT for 16 OSDs) were at 100% and CephFS IO was blocked.
OSDs log "removing snap head" about every two minutes while trimming is running. This is one hour of "progress" on a single OSD:
osd.0 pg_epoch: 90915 pg[8.5fs0( v 90915'1555675 (90389'1547511,90915'1555675] local-lis/les=90559/90560 n=188923 ec=70330/68831 lis/c=90559/90559 les/c/f=90560/90560/0 sis=90559) [0,15,10]p0(0) r=0 lpr=90559 crt=90915'1555675 lcod 90915'1555673 mlcod 90915'1555673 active+clean+snaptrim trimq=132] removing snap head
osd.0 pg_epoch: 90926 pg[8.5fs0( v 90926'1555837 (90389'1547711,90926'1555837] local-lis/les=90559/90560 n=188809 ec=70330/68831 lis/c=90559/90559 les/c/f=90560/90560/0 sis=90559) [0,15,10]p0(0) r=0 lpr=90559 crt=90926'1555837 lcod 90925'1555835 mlcod 90925'1555835 active+clean+snaptrim trimq=132] removing snap head
I couldn't find an explanation of these values; maybe you can help me interpret them. The trimq value hasn't decreased in 24 hours. Here is what I tried:
- Decreasing osd_max_trimming_pgs / osd_pg_max_concurrent_snap_trims from 2 to 1: reduces the OSD server load, but has no noticeable effect on the progress.
- Increasing or decreasing osd_snap_trim_sleep / osd_snap_trim_sleep_hdd: no effect.
- Setting nosnaptrim: lets the cluster go back to idle and moves all PGs to snaptrim_wait.
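For reference, the tuning attempts above correspond roughly to the following commands (a sketch, assuming a recent Ceph release with the central config database; option names and values should be double-checked against your version):

```shell
# Limit concurrent trimming (tried 2 -> 1; lowered load, no progress change):
ceph config set osd osd_max_trimming_pgs 1
ceph config set osd osd_pg_max_concurrent_snap_trims 1

# Throttle trimming between objects (tried various values; no visible effect):
ceph config set osd osd_snap_trim_sleep 2.0

# Pause trimming cluster-wide (PGs move to snaptrim_wait, cluster idles):
ceph osd set nosnaptrim
# ...and resume later:
ceph osd unset nosnaptrim
```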
While trimming is running, OSDs sometimes report crashes, maybe once a day, always on a different OSD. You can see one report in the Crash signature field.
It would be helpful for me to get some feedback on this behaviour. I want to understand whether it is just related to the large number of objects in the snapshots or whether something else is wrong.
Am I right that the trim queue is processed sequentially? If so, when I delete 10 snapshots (or all of them) in the same directory, 10 entries get added to the trimq and each object in the directory gets checked ten times.
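If that understanding is correct, the work grows multiplicatively. A toy model (not Ceph code, just the suspected cost model under the assumption that each queued snap re-scans every object):

```python
# Toy illustration of the suspected snap-trim cost model: if a PG works
# through its trim queue one snapshot at a time, and every object must
# be examined once per queued snapshot, total object checks scale as
# objects * snapshots.

def object_checks(num_objects: int, trimq_len: int) -> int:
    """Object checks needed if each queued snap scans every object once."""
    checks = 0
    for _ in range(trimq_len):   # snaps are (assumed) trimmed sequentially
        checks += num_objects    # each snap re-examines every object
    return checks

# One removed snapshot over 1 million files vs. ten snapshots of the same tree:
print(object_checks(1_000_000, 1))   # 1000000
print(object_checks(1_000_000, 10))  # 10000000
```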
Of course, any ideas on how to recover from this would be appreciated. Let me know if you need any other information.