<p><strong>Ceph RADOS - Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects</strong><br />https://tracker.ceph.com/issues/56386</p>
<p><strong>Updated by Dan van der Ster (2022-06-24)</strong></p>
<blockquote>
<p>Removing the pool snap then deep scrubbing again removes the inconsistent objects.</p>
</blockquote>
<p>This isn't true -- my quick test was probably so simple that it worked in that one case.</p>
<p>But doing more testing with pool snaps and modifying max_mds shows that you can end up with inconsistent objects that exist in the PG but belong to no snapshots. How would one remove those objects?</p>
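<p>For reference, a couple of read-only commands can show whether the pool still has any snapshots and whether a given object still carries clones (pool and object names below are simply the ones from this report):</p>
<pre>
# list any remaining pool snapshots on the metadata pool
rados lssnap -p cephfs.cephfs.meta
# show which clones (snapshotted versions) an individual object still has
rados -p cephfs.cephfs.meta listsnaps mds0_openfiles.0
</pre>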
<p>Details:</p>
<p>I just created a new snapshot, changed max_mds, then removed the snap -- this time I can't manage to "fix" the inconsistency. In this case, the inconsistent object appears to be an old version of mds0_openfiles.0:</p>
<pre>
# rados list-inconsistent-obj 3.6 | jq .
{
  "epoch": 7754,
  "inconsistents": [
    {
      "object": {
        "name": "mds0_openfiles.0",
        "nspace": "",
        "locator": "",
        "snap": 3,
        "version": 2467
      },
</pre>
<p>I tried modifying the current version of that with setomapval, but the object stays inconsistent. I even removed it from the pool (head version) and somehow that old snapshotted version remains with the wrong checksum even though the snap no longer exists:</p>
<pre>
# rados rm -p cephfs.cephfs.meta mds0_openfiles.0
#
# ceph pg ls inconsistent
PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES     OMAP_BYTES*  OMAP_KEYS*  LOG  STATE                      SINCE  VERSION    REPORTED    UP         ACTING     SCRUB_STAMP                      DEEP_SCRUB_STAMP
3.6       13         0          0        0  20971520            0           0   41  active+clean+inconsistent    2m  7852'2479  7852:12048  [0,3,2]p0  [0,3,2]p0  2022-06-24T11:31:05.605434+0200  2022-06-24T11:31:05.605434+0200
# rados lssnap -p cephfs.cephfs.meta
0 snaps
</pre>
<p>This is getting super weird (I can list the object but not stat it):</p>
<pre>
# rados ls -p cephfs.cephfs.meta | grep open
mds1_openfiles.0
mds3_openfiles.0
mds0_openfiles.0
mds2_openfiles.0
# rados stat -p cephfs.cephfs.meta mds0_openfiles.0
error stat-ing cephfs.cephfs.meta/mds0_openfiles.0: (2) No such file or directory
</pre>
<p>I then failed over the mds to a standby so mds0_openfiles.0 exists again, but the PG remains inconsistent with that old version of the object.</p>
<p>How do I remove that inconsistent object?</p>
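<p>One untested possibility (sketched here, not verified) would be to remove the stray clone offline with ceph-objectstore-tool, using the exact JSON object spec from the --op list output shown below. This is destructive, has to be done with the OSD stopped, and would need to be repeated on every OSD holding the PG:</p>
<pre>
# stop the OSD first -- the tool only operates on an offline object store
systemctl stop ceph-osd@0
# list the PG to get the JSON spec of the snapid-3 clone (same listing as below)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 3.6 --op list
# remove just that clone, passing the JSON spec verbatim (some versions may also need --force)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 3.6 \
  '["3.6",{"oid":"mds0_openfiles.0","key":"","snapid":3,"hash":1261208230,"max":0,"pool":3,"namespace":"","max":0}]' \
  remove
systemctl start ceph-osd@0
</pre>
<p>After restarting the OSD, a deep-scrub of 3.6 would show whether that clone was really the only inconsistency.</p>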
<p>Here are the PG contents. I have two objects with the relevant snapid 3:</p>
<pre>
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 3.6 --op list
Error getting attr on : 3.6_head,#-5:60000000:::scrub_3.6:head#, (61) No data available
["3.6",{"oid":"scrub_3.6","key":"","snapid":-2,"hash":6,"max":0,"pool":-5,"namespace":"","max":0}]
["3.6",{"oid":"200.00000480","key":"","snapid":-2,"hash":1217246726,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"200.00000478","key":"","snapid":-2,"hash":2867267206,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"617.00000000","key":"","snapid":3,"hash":1358031174,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"617.00000000","key":"","snapid":-2,"hash":1358031174,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"627.00000000","key":"","snapid":-2,"hash":1048409926,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"mds0_openfiles.0","key":"","snapid":3,"hash":1261208230,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"mds0_openfiles.0","key":"","snapid":-2,"hash":1261208230,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"620.00000000","key":"","snapid":-2,"hash":368001638,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"60f.00000000","key":"","snapid":-2,"hash":1879848182,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"200.00000482","key":"","snapid":-2,"hash":3898282742,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"608.00000000","key":"","snapid":-2,"hash":3292552846,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"200.00000471","key":"","snapid":-2,"hash":985378558,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"200.00000467","key":"","snapid":-2,"hash":2708043262,"max":0,"pool":3,"namespace":"","max":0}]
</pre>
<p><strong>Updated by Venky Shankar (2022-06-28)</strong></p>
<p>Hi Dan,</p>
<p>I need to check, but does the inconsistent object warning show up only after reducing max_mds?</p>
<p><strong>Updated by Dan van der Ster (2022-06-28)</strong></p>
<p>Venky Shankar wrote:</p>
<blockquote>
<p>Hi Dan,</p>
<p>I need to check, but does the inconsistent object warning show up only after reducing max_mds?</p>
</blockquote>
<p>Good point. In fact it is sufficient to just create some files in the cephfs after taking a pool snapshot.</p>
<p>With my test cluster and 1 active mds, I took a pool snapshot, then untarred the linux kernel, called `sync`, then deep-scrubbed all cephfs.cephfs.meta PGs. This found several inconsistent objects:</p>
<pre>
2022-06-28T14:31:32.224869+0200 osd.0 (osd.0) 60 : cluster [ERR] 3.0 shard 2 soid 3:09401a81:::601.00000000:5 : omap_digest 0xffffffff != omap_digest 0x5a8b636f from shard 0, omap_digest 0xffffffff != omap_digest 0x5a8b636f from auth oi 3:09401a81:::601.00000000:5(7954'756 osd.1.0:30 dirty|omap|data_digest|omap_digest s 0 uv 426 dd ffffffff od 5a8b636f alloc_hint [0 0 0])
2022-06-28T14:31:32.225018+0200 osd.0 (osd.0) 61 : cluster [ERR] 3.0 shard 3 soid 3:09401a81:::601.00000000:5 : omap_digest 0xffffffff != omap_digest 0x5a8b636f from shard 0, omap_digest 0xffffffff != omap_digest 0x5a8b636f from auth oi 3:09401a81:::601.00000000:5(7954'756 osd.1.0:30 dirty|omap|data_digest|omap_digest s 0 uv 426 dd ffffffff od 5a8b636f alloc_hint [0 0 0])
2022-06-28T14:31:32.225098+0200 osd.0 (osd.0) 62 : cluster [ERR] 3.0 shard 2 soid 3:0bd6d154:::602.00000000:5 : omap_digest 0xffffffff != omap_digest 0x421b5397 from shard 0, omap_digest 0xffffffff != omap_digest 0x421b5397 from auth oi 3:0bd6d154:::602.00000000:5(7954'758 osd.1.0:31 dirty|omap|data_digest|omap_digest s 0 uv 425 dd ffffffff od 421b5397 alloc_hint [0 0 0])
2022-06-28T14:31:32.225173+0200 osd.0 (osd.0) 63 : cluster [ERR] 3.0 shard 3 soid 3:0bd6d154:::602.00000000:5 : omap_digest 0xffffffff != omap_digest 0x421b5397 from shard 0, omap_digest 0xffffffff != omap_digest 0x421b5397 from auth oi 3:0bd6d154:::602.00000000:5(7954'758 osd.1.0:31 dirty|omap|data_digest|omap_digest s 0 uv 425 dd ffffffff od 421b5397 alloc_hint [0 0 0])
2022-06-28T14:31:32.225241+0200 osd.0 (osd.0) 64 : cluster [ERR] 3.0 shard 2 soid 3:0d82a743:::600.00000000:5 : omap_digest 0xffffffff != omap_digest 0x63500a94 from shard 0, omap_digest 0xffffffff != omap_digest 0x63500a94 from auth oi 3:0d82a743:::600.00000000:5(7954'754 osd.1.0:32 dirty|omap|data_digest|omap_digest s 0 uv 423 dd ffffffff od 63500a94 alloc_hint [0 0 0])
2022-06-28T14:31:32.225307+0200 osd.0 (osd.0) 65 : cluster [ERR] 3.0 shard 3 soid 3:0d82a743:::600.00000000:5 : omap_digest 0xffffffff != omap_digest 0x63500a94 from shard 0, omap_digest 0xffffffff != omap_digest 0x63500a94 from auth oi 3:0d82a743:::600.00000000:5(7954'754 osd.1.0:32 dirty|omap|data_digest|omap_digest s 0 uv 423 dd ffffffff od 63500a94 alloc_hint [0 0 0])
2022-06-28T14:31:32.225408+0200 osd.0 (osd.0) 66 : cluster [ERR] 3.0 shard 2 soid 3:0d89b25e:::603.00000000:5 : omap_digest 0xffffffff != omap_digest 0xa7a553c8 from shard 0, omap_digest 0xffffffff != omap_digest 0xa7a553c8 from auth oi 3:0d89b25e:::603.00000000:5(7954'760 osd.1.0:33 dirty|omap|data_digest|omap_digest s 0 uv 424 dd ffffffff od a7a553c8 alloc_hint [0 0 0])
2022-06-28T14:31:32.225483+0200 osd.0 (osd.0) 67 : cluster [ERR] 3.0 shard 3 soid 3:0d89b25e:::603.00000000:5 : omap_digest 0xffffffff != omap_digest 0xa7a553c8 from shard 0, omap_digest 0xffffffff != omap_digest 0xa7a553c8 from auth oi 3:0d89b25e:::603.00000000:5(7954'760 osd.1.0:33 dirty|omap|data_digest|omap_digest s 0 uv 424 dd ffffffff od a7a553c8 alloc_hint [0 0 0])
2022-06-28T14:31:38.298299+0200 osd.0 (osd.0) 68 : cluster [ERR] 3.0 deep-scrub 0 missing, 4 inconsistent objects
2022-06-28T14:31:38.298305+0200 osd.0 (osd.0) 69 : cluster [ERR] 3.0 deep-scrub 12 errors
2022-06-28T14:31:43.355930+0200 mon.cephoctopus-1 (mon.0) 125985 : cluster [ERR] Health check failed: 12 scrub errors (OSD_SCRUB_ERRORS)
2022-06-28T14:31:43.355969+0200 mon.cephoctopus-1 (mon.0) 125986 : cluster [ERR] Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2022-06-28T14:31:46.130046+0200 osd.0 (osd.0) 71 : cluster [ERR] 3.4 shard 3 soid 3:2f55791f:::606.00000000:5 : omap_digest 0xffffffff != omap_digest 0xc41ff5d8 from shard 0, omap_digest 0xffffffff != omap_digest 0xc41ff5d8 from auth oi 3:2f55791f:::606.00000000:5(7954'3379 osd.3.0:16 dirty|omap|data_digest|omap_digest s 0 uv 2926 dd ffffffff od c41ff5d8 alloc_hint [0 0 0])
2022-06-28T14:31:46.130051+0200 osd.0 (osd.0) 72 : cluster [ERR] 3.4 shard 3 soid 3:3ed09add:::607.00000000:5 : omap_digest 0xffffffff != omap_digest 0x1b2b7ec0 from shard 0, omap_digest 0xffffffff != omap_digest 0x1b2b7ec0 from auth oi 3:3ed09add:::607.00000000:5(7954'3381 osd.1.0:85 dirty|omap|data_digest|omap_digest s 0 uv 451 dd ffffffff od 1b2b7ec0 alloc_hint [0 0 0])
2022-06-28T14:31:46.130364+0200 osd.0 (osd.0) 73 : cluster [ERR] 3.4 deep-scrub 0 missing, 2 inconsistent objects
2022-06-28T14:31:46.130367+0200 osd.0 (osd.0) 74 : cluster [ERR] 3.4 deep-scrub 4 errors
2022-06-28T14:31:49.424773+0200 mon.cephoctopus-1 (mon.0) 125991 : cluster [ERR] Health check update: 16 scrub errors (OSD_SCRUB_ERRORS)
2022-06-28T14:31:49.424814+0200 mon.cephoctopus-1 (mon.0) 125992 : cluster [ERR] Health check update: Possible data damage: 2 pgs inconsistent (PG_DAMAGED)
2022-06-28T14:31:48.982546+0200 osd.0 (osd.0) 75 : cluster [ERR] 3.6 shard 2 soid 3:717a0223:::608.00000000:5 : omap_digest 0xffffffff != omap_digest 0xdaa9228a from shard 0, omap_digest 0xffffffff != omap_digest 0xdaa9228a from auth oi 3:717a0223:::608.00000000:5(7954'2632 osd.1.0:89 dirty|omap|data_digest|omap_digest s 0 uv 479 dd ffffffff od daa9228a alloc_hint [0 0 0])
2022-06-28T14:31:48.982552+0200 osd.0 (osd.0) 76 : cluster [ERR] 3.6 shard 3 soid 3:717a0223:::608.00000000:5 : omap_digest 0xffffffff != omap_digest 0xdaa9228a from shard 0, omap_digest 0xffffffff != omap_digest 0xdaa9228a from auth oi 3:717a0223:::608.00000000:5(7954'2632 osd.1.0:89 dirty|omap|data_digest|omap_digest s 0 uv 479 dd ffffffff od daa9228a alloc_hint [0 0 0])
2022-06-28T14:31:48.982748+0200 osd.0 (osd.0) 77 : cluster [ERR] 3.6 deep-scrub 0 missing, 1 inconsistent objects
2022-06-28T14:31:48.982752+0200 osd.0 (osd.0) 78 : cluster [ERR] 3.6 deep-scrub 3 errors
2022-06-28T14:31:52.034760+0200 osd.3 (osd.3) 23 : cluster [ERR] 3.3 soid 3:cd5a64a3:::100.00000000:5 : omap_digest 0xb4a035f1 != omap_digest 0xffffffff from shard 3
2022-06-28T14:31:52.034769+0200 osd.3 (osd.3) 24 : cluster [ERR] 3.3 shard 2 soid 3:cd5a64a3:::100.00000000:5 : omap_digest 0xffffffff != omap_digest 0xb4a035f1 from auth oi 3:cd5a64a3:::100.00000000:5(7954'1622 osd.3.0:3 dirty|omap|data_digest|omap_digest s 0 uv 1573 dd ffffffff od b4a035f1 alloc_hint [0 0 0])
2022-06-28T14:31:52.034776+0200 osd.3 (osd.3) 25 : cluster [ERR] 3.3 shard 3 soid 3:cd5a64a3:::100.00000000:5 : omap_digest 0xffffffff != omap_digest 0xb4a035f1 from auth oi 3:cd5a64a3:::100.00000000:5(7954'1622 osd.3.0:3 dirty|omap|data_digest|omap_digest s 0 uv 1573 dd ffffffff od b4a035f1 alloc_hint [0 0 0])
2022-06-28T14:31:52.671434+0200 osd.3 (osd.3) 26 : cluster [ERR] 3.3 deep-scrub 0 missing, 1 inconsistent objects
2022-06-28T14:31:52.671439+0200 osd.3 (osd.3) 27 : cluster [ERR] 3.3 deep-scrub 3 errors
2022-06-28T14:31:55.511182+0200 mon.cephoctopus-1 (mon.0) 125998 : cluster [ERR] Health check update: 22 scrub errors (OSD_SCRUB_ERRORS)
2022-06-28T14:31:55.511211+0200 mon.cephoctopus-1 (mon.0) 125999 : cluster [ERR] Health check update: Possible data damage: 4 pgs inconsistent (PG_DAMAGED)
2022-06-28T14:31:55.793167+0200 osd.0 (osd.0) 79 : cluster [ERR] 3.7 shard 3 soid 3:e0b41d06:::609.00000000:5 : omap_digest 0xffffffff != omap_digest 0x77c6f335 from shard 0, omap_digest 0xffffffff != omap_digest 0x77c6f335 from auth oi 3:e0b41d06:::609.00000000:5(7954'1276 osd.0.0:95 dirty|omap|data_digest|omap_digest s 0 uv 354 dd ffffffff od 77c6f335 alloc_hint [0 0 0])
2022-06-28T14:31:56.841661+0200 osd.0 (osd.0) 80 : cluster [ERR] 3.7 shard 3 soid 3:ff5b34d6:::1.00000000:5 : omap_digest 0xffffffff != omap_digest 0xe57ff843 from shard 0, omap_digest 0xffffffff != omap_digest 0xe57ff843 from auth oi 3:ff5b34d6:::1.00000000:5(7954'1274 osd.0.0:25 dirty|omap|data_digest|omap_digest s 0 uv 1046 dd ffffffff od e57ff843 alloc_hint [0 0 0])
2022-06-28T14:31:56.841958+0200 osd.0 (osd.0) 81 : cluster [ERR] 3.7 deep-scrub 0 missing, 2 inconsistent objects
2022-06-28T14:31:56.841962+0200 osd.0 (osd.0) 82 : cluster [ERR] 3.7 deep-scrub 4 errors
2022-06-28T14:32:00.712221+0200 osd.3 (osd.3) 28 : cluster [ERR] 3.5 shard 0 soid 3:a93a17c2:::604.00000000:5 : omap_digest 0xffffffff != omap_digest 0xfe43bdca from auth oi 3:a93a17c2:::604.00000000:5(7954'1737 osd.1.0:88 dirty|omap|data_digest|omap_digest s 0 uv 1540 dd ffffffff od fe43bdca alloc_hint [0 0 0])
2022-06-28T14:32:00.712421+0200 osd.3 (osd.3) 29 : cluster [ERR] 3.5 shard 2 soid 3:a93a17c2:::604.00000000:5 : omap_digest 0xffffffff != omap_digest 0xfe43bdca from auth oi 3:a93a17c2:::604.00000000:5(7954'1737 osd.1.0:88 dirty|omap|data_digest|omap_digest s 0 uv 1540 dd ffffffff od fe43bdca alloc_hint [0 0 0])
2022-06-28T14:32:00.712507+0200 osd.3 (osd.3) 30 : cluster [ERR] 3.5 shard 3 soid 3:a93a17c2:::604.00000000:5 : omap_digest 0xffffffff != omap_digest 0xfe43bdca from auth oi 3:a93a17c2:::604.00000000:5(7954'1737 osd.1.0:88 dirty|omap|data_digest|omap_digest s 0 uv 1540 dd ffffffff od fe43bdca alloc_hint [0 0 0])
2022-06-28T14:32:00.712679+0200 osd.3 (osd.3) 31 : cluster [ERR] 3.5 soid 3:a93a17c2:::604.00000000:5 : failed to pick suitable auth object
2022-06-28T14:32:00.712820+0200 osd.3 (osd.3) 32 : cluster [ERR] 3.5 shard 0 soid 3:b871830b:::605.00000000:5 : omap_digest 0xffffffff != omap_digest 0x4dd0b557 from auth oi 3:b871830b:::605.00000000:5(7954'1700 osd.1.0:86 dirty|omap|data_digest|omap_digest s 0 uv 1374 dd ffffffff od 4dd0b557 alloc_hint [0 0 0])
2022-06-28T14:32:00.712925+0200 osd.3 (osd.3) 33 : cluster [ERR] 3.5 shard 2 soid 3:b871830b:::605.00000000:5 : omap_digest 0xffffffff != omap_digest 0x4dd0b557 from auth oi 3:b871830b:::605.00000000:5(7954'1700 osd.1.0:86 dirty|omap|data_digest|omap_digest s 0 uv 1374 dd ffffffff od 4dd0b557 alloc_hint [0 0 0])
2022-06-28T14:32:00.713023+0200 osd.3 (osd.3) 34 : cluster [ERR] 3.5 shard 3 soid 3:b871830b:::605.00000000:5 : omap_digest 0xffffffff != omap_digest 0x4dd0b557 from auth oi 3:b871830b:::605.00000000:5(7954'1700 osd.1.0:86 dirty|omap|data_digest|omap_digest s 0 uv 1374 dd ffffffff od 4dd0b557 alloc_hint [0 0 0])
2022-06-28T14:32:00.713109+0200 osd.3 (osd.3) 35 : cluster [ERR] 3.5 soid 3:b871830b:::605.00000000:5 : failed to pick suitable auth object
2022-06-28T14:32:00.816172+0200 osd.3 (osd.3) 36 : cluster [ERR] 3.5 deep-scrub 6 errors
2022-06-28T14:32:01.829075+0200 mon.cephoctopus-1 (mon.0) 126002 : cluster [ERR] Health check update: 26 scrub errors (OSD_SCRUB_ERRORS)
2022-06-28T14:32:01.829120+0200 mon.cephoctopus-1 (mon.0) 126003 : cluster [ERR] Health check update: Possible data damage: 5 pgs inconsistent (PG_DAMAGED)
2022-06-28T14:32:06.831213+0200 mon.cephoctopus-1 (mon.0) 126007 : cluster [ERR] Health check update: 32 scrub errors (OSD_SCRUB_ERRORS)
2022-06-28T14:32:06.831251+0200 mon.cephoctopus-1 (mon.0) 126008 : cluster [ERR] Health check update: Possible data damage: 6 pgs inconsistent (PG_DAMAGED)
</pre>
<p><strong>Updated by Dan van der Ster (2022-06-28)</strong></p>
<ul><li><strong>Subject</strong> changed from <i>Changing max_mds after metadata pool snap causes inconsistent objects</i> to <i>Writes to a cephfs after metadata pool snapshot causes inconsistent objects</i></li></ul>
<p><strong>Updated by Venky Shankar (2022-07-05)</strong></p>
<p>Dan van der Ster wrote:</p>
<blockquote>
<p>Venky Shankar wrote:</p>
<blockquote>
<p>Hi Dan,</p>
<p>I need to check, but does the inconsistent object warning show up only after reducing max_mds?</p>
</blockquote>
<p>Good point. In fact it is sufficient to just create some files in the cephfs after taking a pool snapshot.</p>
<p>With my test cluster and 1 active mds, I took a pool snapshot, then untarred the linux kernel, called `sync`, then deep-scrubbed all cephfs.cephfs.meta PGs. This found several inconsistent objects:</p>
<p>[...]</p>
</blockquote>
<p>Are you able to reproduce this after flushing the MDS journal and then taking a pool snap? I wonder if this happens when the journal entries are written out to the backing store (object omap). The digest mismatch hint at being related to omap too.</p> RADOS - Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objectshttps://tracker.ceph.com/issues/56386?journal_id=2200382022-07-12T12:59:24ZGreg Farnumgfarnum@redhat.com
<p><strong>Updated by Greg Farnum (2022-07-12)</strong></p>
<ul><li><strong>Project</strong> changed from <i>CephFS</i> to <i>RADOS</i></li></ul><p>AFAICT this is just a RADOS issue?</p>
<p><strong>Updated by Greg Farnum (2022-07-12)</strong></p>
<p>That said, I wouldn’t expect anything useful from running this — pool snaps are hard to use well. What were you trying to do here, Dan?</p>
<p>I imagine that there’s something weird happening here either when the osd is generating the snapcontext for pool snaps, or that omap/bluestore has something strange going on when that’s the only change across snaps.</p>
<p><strong>Updated by Dan van der Ster (2022-07-12)</strong></p>
<p>Greg Farnum wrote:</p>
<blockquote>
<p>That said, I wouldn’t expect anything useful from running this — pool snaps are hard to use well. What were you trying to do here, Dan?</p>
<p>I imagine that there’s something weird happening here either when the osd is generating the snapcontext for pool snaps, or that omap/bluestore has something strange going on when that’s the only change across snaps.</p>
</blockquote>
<p>I opened this on behalf of another user. See the thread "[ceph-users] Inconsistent PGs after upgrade to Pacific".</p>
<p>The use case seems to have been to create a consistent rollback point during a misbehaving upgrade.</p>
<p>I was indeed thinking that we could simply prevent users from taking snapshots of the metadata pool, because the behavior of cephfs is not well defined with these pool snapshots. If we wanted to support such a feature we'd probably need a way to atomically snapshot the metadata pool and all data pools.</p>
<p>But we should anyway try to fix the underlying inconsistency issue as well, right?</p>
<p><strong>Updated by Neha Ojha (2022-07-13)</strong></p>
<ul><li><strong>Assignee</strong> set to <i>Matan Breizman</i></li></ul>
<p><strong>Updated by Pascal Ehlert (2022-07-27)</strong></p>
<p>This indeed happened during an upgrade from Octopus to Pacific.<br />I had forgotten to reduce the number of ranks in CephFS to 1, which resulted in the MDS daemons not coming up, with some inconsistency errors, if I'm not mistaken.</p>
<p>Worried that I had already done harm to the data, I thought it was best to make a pool snapshot of both data and metadata to at least be able to roll back if I should make it worse during recovery.<br />I later managed to bring CephFS up again, but as soon as the first scrub happened, the status transitioned to HEALTH_ERR with the metadata pool PGs showing as active+clean+inconsistent.</p>
<p>When trying to run `ceph osd pool rmsnap <pool> <snapname>` I just get "pool <pool> snap <snapname> does not exist".</p>
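<p>For reference, the places where a pool snap (created with rados mksnap) is still recorded can be checked with the following; pool and snap names here are placeholders:</p>
<pre>
# list pool snapshots as rados sees them
rados lssnap -p <pool>
# the OSD map also records pool snaps per pool
ceph osd pool ls detail
# a snap created with "rados mksnap" can likewise be removed with
rados rmsnap -p <pool> <snapname>
</pre>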
<p>Does anyone have an idea how I could clean this up and bring the cluster back into a healthy state?</p>
<p><strong>Updated by Neha Ojha (2022-07-27)</strong></p>
<p>Pascal Ehlert wrote:</p>
<blockquote>
<p>This indeed happened during an upgrade from Octopus to Pacific.<br />I had forgotten to reduce the number of ranks in CephFS to 1 which resulted in the mds daemons not coming up with some inconsistency errors if I'm not mistaken.</p>
<p>Worried that I had already done harm to the data, I thought it was best to make a pool snapshot of both data and metadata to at least be able to roll back if I should make it worse during recovery.<br />Later managed to bring CephFS up again, but as soon as the first scrub happened, the status transitioned to HEALTH_ERR with the metadata pool PGs showing as active+clean+inconsistent.</p>
<p>When trying to run `ceph osd pool rmsnap <pool> <snapname>` I just get "pool <pool> snap <snapname> does not exist".</p>
<p>Does anyone have an idea how I could clean this up and bring the cluster back into healthy state?</p>
</blockquote>
<p>Have you tried to repair the inconsistent PG? Did that help?</p> RADOS - Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objectshttps://tracker.ceph.com/issues/56386?journal_id=2214062022-07-27T17:59:57ZPascal Ehlert
<p><strong>Updated by Pascal Ehlert (2022-07-27)</strong></p>
<p>Tried that a few times for different PGs on different OSDs, but it doesn't help.</p>
<p><strong>Updated by Matan Breizman (2022-08-08)</strong></p>
<p>Dan van der Ster wrote:</p>
<blockquote>
<p>Good point. In fact it is sufficient to just create some files in the cephfs after taking a pool snapshot.</p>
<p>With my test cluster and 1 active mds, I took a pool snapshot, then untarred the linux kernel, called `sync`, then deep-scrubbed all cephfs.cephfs.meta PGs. This found several inconsistent objects:</p>
</blockquote>
<p>Hi Dan,<br />I'm trying to reproduce this on a vstart cluster using ceph-fuse with no luck.<br />Does the following case cause inconsistencies for you?</p>
<pre>
MDS=1 MON=1 OSD=1 MGR=1 ../src/vstart.sh -X -G --msgr1 -n --without-dashboard
ceph-fuse /mnt/cephfs/
mkdir /mnt/cephfs/mydir
rados mksnap -p cephfs.a.meta testsnap1
tar -xvf v5.19.tar.gz
sync
ceph daemon mds.a flush journal
for i in {0..9}; do ceph pg deep-scrub 3.$i; done &&
for i in {a..f}; do ceph pg deep-scrub 3.$i; done
</pre>
<p>Resulting in ok status for all the pgs:</p>
<pre>
log_channel(cluster) log [DBG] : 3.<> deep-scrub ok
</pre>
<p><strong>Updated by Matan Breizman (2022-08-15)</strong></p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Can't reproduce</i></li></ul>
<p><strong>Updated by Pascal Ehlert (2023-03-07)</strong></p>
<p>Sorry to warm this up again, but our cluster is still in an unhealthy state and we are trying to find ways to recover from it.</p>
<p>Something I apparently did not notice originally but have found now is that in the Ceph metadata pool we seem to have some (most) objects without snapshots, some with <strong>just</strong> the snapshot and no head, and some with both head and snapshots.</p>
<p>Here are some examples returned by `rados -p pool listsnaps $obj`:</p>
<pre>
20001b80584.00000000:
cloneid snaps size overlap
1 1 0 []
head - 0
10008f71b7d.00000000:
cloneid snaps size overlap
head - 0
100001cdca7.00000000:
cloneid snaps size overlap
1 1 0 []
</pre>
<p>Would it be safe to use `rmsnap` to remove the snapshots, and for the objects missing `head`, even to remove the entire object?</p>
<p>I don't think there is a valid reason to have snapshots in the metadata pool even when using the cephfs snapshot functionality, right?</p>
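<p>For what it's worth, a small read-only loop like the following (an untested sketch; the pool name is a placeholder) could at least enumerate which objects fall into each of those categories before deciding anything destructive:</p>
<pre>
pool=cephfs.cephfs.meta   # substitute the real metadata pool name
for obj in $(rados -p "$pool" ls); do
    out=$(rados -p "$pool" listsnaps "$obj")
    # objects whose listsnaps output has clones but no "head" entry
    if ! echo "$out" | grep -qw head; then
        echo "clone(s) but no head: $obj"
    # objects with a head plus at least one numeric cloneid line
    elif echo "$out" | grep -qE '^[0-9]+[[:space:]]'; then
        echo "head plus clone(s):   $obj"
    fi
done
</pre>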