Bug #56386

Writes to a CephFS after a metadata pool snapshot cause inconsistent objects

Added by Dan van der Ster almost 2 years ago. Updated about 1 year ago.

Status:
Can't reproduce
Priority:
Normal
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If you take a snapshot of the metadata pool and then decrease max_mds, metadata objects become inconsistent.
Removing the pool snap then deep scrubbing again removes the inconsistent objects.
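
In brief (condensed from the transcript below; pool id 3 is cephfs.cephfs.meta on this cluster), the reproduction amounts to:

rados mksnap -p cephfs.cephfs.meta testsnap1        # pool-level snapshot of the metadata pool
ceph fs set cephfs max_mds 1                        # generate metadata writes while the pool snap exists
sleep 60
for i in {0..7}; do ceph pg deep-scrub 3.$i; done   # deep-scrub all metadata pool PGs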

Details:

# ceph versions
{
    "mon": {
        "ceph version 16.2.9-1 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 4
    },
    "mgr": {
        "ceph version 16.2.9-1 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 4
    },
    "osd": {
        "ceph version 16.2.9-1 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 4
    },
    "mds": {
        "ceph version 16.2.9-1 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 4
    },
    "overall": {
        "ceph version 16.2.9-1 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 16
    }
}

# ceph -s
  cluster:
    id:     03a1871e-32d8-4f3c-9be1-a7d7e9846205
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum cephoctopus-1,cephoctopus-2,cephoctopus-3,cephoctopus-4 (age 6h)
    mgr: cephoctopus-1(active, since 9d), standbys: cephoctopus-3, cephoctopus-4, cephoctopus-2
    mds: 2/2 daemons up, 2 standby
    osd: 4 osds: 4 up (since 2d), 4 in (since 16M)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 106 pgs
    objects: 16.91k objects, 12 GiB
    usage:   39 GiB used, 41 GiB / 80 GiB avail
    pgs:     106 active+clean

# ceph df
--- RAW STORAGE ---
CLASS    SIZE   AVAIL    USED  RAW USED  %RAW USED
hdd    80 GiB  41 GiB  39 GiB    39 GiB      48.50
TOTAL  80 GiB  41 GiB  39 GiB    39 GiB      48.50

--- POOLS ---
POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
device_health_metrics   1    1  3.8 KiB        1   12 KiB      0     11 GiB
test                    2   32  3.3 MiB      847  159 MiB   0.48     11 GiB
cephfs.cephfs.meta      3    8  193 MiB      102  560 MiB   1.67     11 GiB
cephfs.cephfs.data      4   32  102 MiB      981  470 MiB   1.41     11 GiB
volumes                 5    8  2.5 GiB    6.35k  7.8 GiB  19.60     11 GiB
images                  6    8  5.5 GiB      750   17 GiB  33.98     11 GiB
backups                 7    8  1.5 GiB      472  4.6 GiB  12.42     11 GiB
vms                     8    8     19 B        1  192 KiB      0     11 GiB
testpool               10    1  2.3 GiB    7.42k  8.1 GiB  20.07     11 GiB

# rados mksnap -p cephfs.cephfs.meta testsnap1
created pool cephfs.cephfs.meta snap testsnap1

# ceph fs set cephfs max_mds 1

# sleep 60 

# for i in {0..7}; do ceph pg deep-scrub 3.$i; done
instructing pg 3.0 on osd.0 to deep-scrub
instructing pg 3.1 on osd.3 to deep-scrub
instructing pg 3.2 on osd.1 to deep-scrub
instructing pg 3.3 on osd.3 to deep-scrub
instructing pg 3.4 on osd.0 to deep-scrub
instructing pg 3.5 on osd.3 to deep-scrub
instructing pg 3.6 on osd.0 to deep-scrub
instructing pg 3.7 on osd.0 to deep-scrub

2022-06-24T09:09:19.245382+0200 mon.cephoctopus-1 (mon.0) 106142 : cluster [INF] daemon mds.xxx-2 finished stopping rank 1 in filesystem cephfs (now has 1 ranks)
2022-06-24T09:09:33.385965+0200 osd.1 (osd.1) 923 : cluster [ERR] 3.2 shard 3 soid 3:54ba6923:::mds1_openfiles.0:1 : omap_digest 0xffffffff != omap_digest 0x22bb480a from shard 1, omap_digest 0xffffffff != omap_digest 0x22bb480a from auth oi 3:54ba6923:::mds1_openfiles.0:1(7845'910 osd.1.0:15 dirty|omap|data_digest|omap_digest s 0 uv 908 dd ffffffff od 22bb480a alloc_hint [0 0 0])
2022-06-24T09:09:33.386347+0200 osd.1 (osd.1) 924 : cluster [ERR] 3.2 deep-scrub 0 missing, 1 inconsistent objects
2022-06-24T09:09:33.386383+0200 osd.1 (osd.1) 925 : cluster [ERR] 3.2 deep-scrub 2 errors
2022-06-24T09:09:35.615407+0200 osd.3 (osd.3) 593 : cluster [ERR] 3.3 soid 3:c27f454c:::mds1_sessionmap:1 : omap_digest 0x95baa69c != omap_digest 0xffffffff from shard 3
2022-06-24T09:09:35.615418+0200 osd.3 (osd.3) 594 : cluster [ERR] 3.3 shard 2 soid 3:c27f454c:::mds1_sessionmap:1 : omap_digest 0xffffffff != omap_digest 0x95baa69c from auth oi 3:c27f454c:::mds1_sessionmap:1(7845'1553 osd.3.0:54 dirty|omap|data_digest|omap_digest s 0 uv 1412 dd ffffffff od 95baa69c alloc_hint [0 0 0])
2022-06-24T09:09:35.615425+0200 osd.3 (osd.3) 595 : cluster [ERR] 3.3 shard 3 soid 3:c27f454c:::mds1_sessionmap:1 : omap_digest 0xffffffff != omap_digest 0x95baa69c from auth oi 3:c27f454c:::mds1_sessionmap:1(7845'1553 osd.3.0:54 dirty|omap|data_digest|omap_digest s 0 uv 1412 dd ffffffff od 95baa69c alloc_hint [0 0 0])
# rados list-inconsistent-obj 3.2 | jq .
{
  "epoch": 7835,
  "inconsistents": [
    {
      "object": {
        "name": "mds1_openfiles.0",
        "nspace": "",
        "locator": "",
        "snap": 1,
        "version": 908
      },
      "errors": [
        "omap_digest_mismatch" 
      ],
      "union_shard_errors": [
        "omap_digest_mismatch_info" 
      ],
      "selected_object_info": {
        "oid": {
          "oid": "mds1_openfiles.0",
          "key": "",
          "snapid": 1,
          "hash": 3298188586,
          "max": 0,
          "pool": 3,
          "namespace": "" 
        },
        "version": "7845'910",
        "prior_version": "7818'909",
        "last_reqid": "osd.1.0:15",
        "user_version": 908,
        "size": 0,
        "mtime": "2022-06-14T18:22:20.287883+0200",
        "local_mtime": "2022-06-14T18:22:20.288351+0200",
        "lost": 0,
        "flags": [
          "dirty",
          "omap",
          "data_digest",
          "omap_digest" 
        ],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": "0xffffffff",
        "omap_digest": "0x22bb480a",
        "expected_object_size": 0,
        "expected_write_size": 0,
        "alloc_hint_flags": 0,
        "manifest": {
          "type": 0
        },
        "watchers": {}
      },
      "shards": [
        {
          "osd": 0,
          "primary": false,
          "errors": [],
          "size": 0,
          "omap_digest": "0x22bb480a",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 1,
          "primary": true,
          "errors": [],
          "size": 0,
          "omap_digest": "0x22bb480a",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 3,
          "primary": false,
          "errors": [
            "omap_digest_mismatch_info" 
          ],
          "size": 0,
          "omap_digest": "0xffffffff",
          "data_digest": "0xffffffff" 
        }
      ]
    }
  ]
}
# rados list-inconsistent-obj 3.3 | jq .
{
  "epoch": 7758,
  "inconsistents": [
    {
      "object": {
        "name": "mds1_sessionmap",
        "nspace": "",
        "locator": "",
        "snap": 1,
        "version": 1412
      },
      "errors": [
        "omap_digest_mismatch" 
      ],
      "union_shard_errors": [
        "omap_digest_mismatch_info" 
      ],
      "selected_object_info": {
        "oid": {
          "oid": "mds1_sessionmap",
          "key": "",
          "snapid": 1,
          "hash": 849542723,
          "max": 0,
          "pool": 3,
          "namespace": "" 
        },
        "version": "7845'1553",
        "prior_version": "7511'1467",
        "last_reqid": "osd.3.0:54",
        "user_version": 1412,
        "size": 0,
        "mtime": "2022-05-11T11:47:08.738876+0200",
        "local_mtime": "2022-05-11T11:47:08.741128+0200",
        "lost": 0,
        "flags": [
          "dirty",
          "omap",
          "data_digest",
          "omap_digest" 
        ],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": "0xffffffff",
        "omap_digest": "0x95baa69c",
        "expected_object_size": 0,
        "expected_write_size": 0,
        "alloc_hint_flags": 0,
        "manifest": {
          "type": 0
        },
        "watchers": {}
      },
      "shards": [
        {
          "osd": 0,
          "primary": false,
          "errors": [],
          "size": 0,
          "omap_digest": "0x95baa69c",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 2,
          "primary": false,
          "errors": [
            "omap_digest_mismatch_info" 
          ],
          "size": 0,
          "omap_digest": "0xffffffff",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 3,
          "primary": true,
          "errors": [
            "omap_digest_mismatch_info" 
          ],
          "size": 0,
          "omap_digest": "0xffffffff",
          "data_digest": "0xffffffff" 
        }
      ]
    }
  ]
}
# rados lssnap -p cephfs.cephfs.meta
1    testsnap1    2022.06.24 09:06:56
1 snaps
# rados rmsnap -p cephfs.cephfs.meta testsnap1
removed pool cephfs.cephfs.meta snap testsnap1
# rados lssnap -p cephfs.cephfs.meta
0 snaps

# for i in {0..7}; do ceph pg deep-scrub 3.$i; done
instructing pg 3.0 on osd.0 to deep-scrub
instructing pg 3.1 on osd.3 to deep-scrub
instructing pg 3.2 on osd.1 to deep-scrub
instructing pg 3.3 on osd.3 to deep-scrub
instructing pg 3.4 on osd.0 to deep-scrub
instructing pg 3.5 on osd.3 to deep-scrub
instructing pg 3.6 on osd.0 to deep-scrub
instructing pg 3.7 on osd.0 to deep-scrub
# ceph health
HEALTH_OK

History

#1 Updated by Dan van der Ster almost 2 years ago

Removing the pool snap then deep scrubbing again removes the inconsistent objects.

This isn't true -- my quick test was probably so simple that it worked in that one case.

But doing more testing with pool snaps and modifying max_mds shows that you can end up with inconsistent objects that exist in the PG but belong to no snapshots. How would one remove those objects?

Details:

I just created a new snapshot, changed max_mds, then removed the snap -- this time I can't manage to "fix" the inconsistency. In this case, the inconsistent object appears to be an old version of mds0_openfiles.0:

# rados list-inconsistent-obj 3.6 | jq .
{
  "epoch": 7754,
  "inconsistents": [
    {
      "object": {
        "name": "mds0_openfiles.0",
        "nspace": "",
        "locator": "",
        "snap": 3,
        "version": 2467
      },

I tried modifying the current version of that with setomapval, but the object stays inconsistent. I even removed it from the pool (head version), and somehow that old snapshotted version remains with the wrong checksum even though the snap no longer exists.
# rados rm -p cephfs.cephfs.meta mds0_openfiles.0
#

# ceph pg ls inconsistent
PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES     OMAP_BYTES*  OMAP_KEYS*  LOG  STATE                      SINCE  VERSION    REPORTED    UP         ACTING     SCRUB_STAMP                      DEEP_SCRUB_STAMP
3.6       13         0          0        0  20971520            0           0   41  active+clean+inconsistent     2m  7852'2479  7852:12048  [0,3,2]p0  [0,3,2]p0  2022-06-24T11:31:05.605434+0200  2022-06-24T11:31:05.605434+0200

# rados lssnap -p cephfs.cephfs.meta
0 snaps

This is getting super weird (I can list the object but not stat it):
# rados ls -p cephfs.cephfs.meta | grep open
mds1_openfiles.0
mds3_openfiles.0
mds0_openfiles.0
mds2_openfiles.0

# rados stat -p cephfs.cephfs.meta mds0_openfiles.0
error stat-ing cephfs.cephfs.meta/mds0_openfiles.0: (2) No such file or directory

I then failed over the mds to a standby so mds0_openfiles.0 exists again, but the PG remains inconsistent with that old version of the object.

How do I remove that inconsistent object?

Here are the PG contents. I have two objects with the relevant snapid 3:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 3.6 --op list
Error getting attr on : 3.6_head,#-5:60000000:::scrub_3.6:head#, (61) No data available
["3.6",{"oid":"scrub_3.6","key":"","snapid":-2,"hash":6,"max":0,"pool":-5,"namespace":"","max":0}]
["3.6",{"oid":"200.00000480","key":"","snapid":-2,"hash":1217246726,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"200.00000478","key":"","snapid":-2,"hash":2867267206,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"617.00000000","key":"","snapid":3,"hash":1358031174,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"617.00000000","key":"","snapid":-2,"hash":1358031174,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"627.00000000","key":"","snapid":-2,"hash":1048409926,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"mds0_openfiles.0","key":"","snapid":3,"hash":1261208230,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"mds0_openfiles.0","key":"","snapid":-2,"hash":1261208230,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"620.00000000","key":"","snapid":-2,"hash":368001638,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"60f.00000000","key":"","snapid":-2,"hash":1879848182,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"200.00000482","key":"","snapid":-2,"hash":3898282742,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"608.00000000","key":"","snapid":-2,"hash":3292552846,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"200.00000471","key":"","snapid":-2,"hash":985378558,"max":0,"pool":3,"namespace":"","max":0}]
["3.6",{"oid":"200.00000467","key":"","snapid":-2,"hash":2708043262,"max":0,"pool":3,"namespace":"","max":0}]

#2 Updated by Venky Shankar almost 2 years ago

Hi Dan,

I need to check, but does the inconsistent object warning show up only after reducing max_mds?

#3 Updated by Dan van der Ster almost 2 years ago

Venky Shankar wrote:

Hi Dan,

I need to check, but does the inconsistent object warning show up only after reducing max_mds?

Good point. In fact it is sufficient to just create some files in the cephfs after taking a pool snapshot.

With my test cluster and 1 active mds, I took a pool snapshot, then untarred the linux kernel, called `sync`, then deep-scrubbed all cephfs.cephfs.meta PGs. This found several inconsistent objects:

2022-06-28T14:31:32.224869+0200 osd.0 (osd.0) 60 : cluster [ERR] 3.0 shard 2 soid 3:09401a81:::601.00000000:5 : omap_digest 0xffffffff != omap_digest 0x5a8b636f from shard 0, omap_digest 0xffffffff != omap_digest 0x5a8b636f from auth oi 3:09401a81:::601.00000000:5(7954'756 osd.1.0:30 dirty|omap|data_digest|omap_digest s 0 uv 426 dd ffffffff od 5a8b636f alloc_hint [0 0 0])
2022-06-28T14:31:32.225018+0200 osd.0 (osd.0) 61 : cluster [ERR] 3.0 shard 3 soid 3:09401a81:::601.00000000:5 : omap_digest 0xffffffff != omap_digest 0x5a8b636f from shard 0, omap_digest 0xffffffff != omap_digest 0x5a8b636f from auth oi 3:09401a81:::601.00000000:5(7954'756 osd.1.0:30 dirty|omap|data_digest|omap_digest s 0 uv 426 dd ffffffff od 5a8b636f alloc_hint [0 0 0])
2022-06-28T14:31:32.225098+0200 osd.0 (osd.0) 62 : cluster [ERR] 3.0 shard 2 soid 3:0bd6d154:::602.00000000:5 : omap_digest 0xffffffff != omap_digest 0x421b5397 from shard 0, omap_digest 0xffffffff != omap_digest 0x421b5397 from auth oi 3:0bd6d154:::602.00000000:5(7954'758 osd.1.0:31 dirty|omap|data_digest|omap_digest s 0 uv 425 dd ffffffff od 421b5397 alloc_hint [0 0 0])
2022-06-28T14:31:32.225173+0200 osd.0 (osd.0) 63 : cluster [ERR] 3.0 shard 3 soid 3:0bd6d154:::602.00000000:5 : omap_digest 0xffffffff != omap_digest 0x421b5397 from shard 0, omap_digest 0xffffffff != omap_digest 0x421b5397 from auth oi 3:0bd6d154:::602.00000000:5(7954'758 osd.1.0:31 dirty|omap|data_digest|omap_digest s 0 uv 425 dd ffffffff od 421b5397 alloc_hint [0 0 0])
2022-06-28T14:31:32.225241+0200 osd.0 (osd.0) 64 : cluster [ERR] 3.0 shard 2 soid 3:0d82a743:::600.00000000:5 : omap_digest 0xffffffff != omap_digest 0x63500a94 from shard 0, omap_digest 0xffffffff != omap_digest 0x63500a94 from auth oi 3:0d82a743:::600.00000000:5(7954'754 osd.1.0:32 dirty|omap|data_digest|omap_digest s 0 uv 423 dd ffffffff od 63500a94 alloc_hint [0 0 0])
2022-06-28T14:31:32.225307+0200 osd.0 (osd.0) 65 : cluster [ERR] 3.0 shard 3 soid 3:0d82a743:::600.00000000:5 : omap_digest 0xffffffff != omap_digest 0x63500a94 from shard 0, omap_digest 0xffffffff != omap_digest 0x63500a94 from auth oi 3:0d82a743:::600.00000000:5(7954'754 osd.1.0:32 dirty|omap|data_digest|omap_digest s 0 uv 423 dd ffffffff od 63500a94 alloc_hint [0 0 0])
2022-06-28T14:31:32.225408+0200 osd.0 (osd.0) 66 : cluster [ERR] 3.0 shard 2 soid 3:0d89b25e:::603.00000000:5 : omap_digest 0xffffffff != omap_digest 0xa7a553c8 from shard 0, omap_digest 0xffffffff != omap_digest 0xa7a553c8 from auth oi 3:0d89b25e:::603.00000000:5(7954'760 osd.1.0:33 dirty|omap|data_digest|omap_digest s 0 uv 424 dd ffffffff od a7a553c8 alloc_hint [0 0 0])
2022-06-28T14:31:32.225483+0200 osd.0 (osd.0) 67 : cluster [ERR] 3.0 shard 3 soid 3:0d89b25e:::603.00000000:5 : omap_digest 0xffffffff != omap_digest 0xa7a553c8 from shard 0, omap_digest 0xffffffff != omap_digest 0xa7a553c8 from auth oi 3:0d89b25e:::603.00000000:5(7954'760 osd.1.0:33 dirty|omap|data_digest|omap_digest s 0 uv 424 dd ffffffff od a7a553c8 alloc_hint [0 0 0])
2022-06-28T14:31:38.298299+0200 osd.0 (osd.0) 68 : cluster [ERR] 3.0 deep-scrub 0 missing, 4 inconsistent objects
2022-06-28T14:31:38.298305+0200 osd.0 (osd.0) 69 : cluster [ERR] 3.0 deep-scrub 12 errors
2022-06-28T14:31:43.355930+0200 mon.cephoctopus-1 (mon.0) 125985 : cluster [ERR] Health check failed: 12 scrub errors (OSD_SCRUB_ERRORS)
2022-06-28T14:31:43.355969+0200 mon.cephoctopus-1 (mon.0) 125986 : cluster [ERR] Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2022-06-28T14:31:46.130046+0200 osd.0 (osd.0) 71 : cluster [ERR] 3.4 shard 3 soid 3:2f55791f:::606.00000000:5 : omap_digest 0xffffffff != omap_digest 0xc41ff5d8 from shard 0, omap_digest 0xffffffff != omap_digest 0xc41ff5d8 from auth oi 3:2f55791f:::606.00000000:5(7954'3379 osd.3.0:16 dirty|omap|data_digest|omap_digest s 0 uv 2926 dd ffffffff od c41ff5d8 alloc_hint [0 0 0])
2022-06-28T14:31:46.130051+0200 osd.0 (osd.0) 72 : cluster [ERR] 3.4 shard 3 soid 3:3ed09add:::607.00000000:5 : omap_digest 0xffffffff != omap_digest 0x1b2b7ec0 from shard 0, omap_digest 0xffffffff != omap_digest 0x1b2b7ec0 from auth oi 3:3ed09add:::607.00000000:5(7954'3381 osd.1.0:85 dirty|omap|data_digest|omap_digest s 0 uv 451 dd ffffffff od 1b2b7ec0 alloc_hint [0 0 0])
2022-06-28T14:31:46.130364+0200 osd.0 (osd.0) 73 : cluster [ERR] 3.4 deep-scrub 0 missing, 2 inconsistent objects
2022-06-28T14:31:46.130367+0200 osd.0 (osd.0) 74 : cluster [ERR] 3.4 deep-scrub 4 errors
2022-06-28T14:31:49.424773+0200 mon.cephoctopus-1 (mon.0) 125991 : cluster [ERR] Health check update: 16 scrub errors (OSD_SCRUB_ERRORS)
2022-06-28T14:31:49.424814+0200 mon.cephoctopus-1 (mon.0) 125992 : cluster [ERR] Health check update: Possible data damage: 2 pgs inconsistent (PG_DAMAGED)
2022-06-28T14:31:48.982546+0200 osd.0 (osd.0) 75 : cluster [ERR] 3.6 shard 2 soid 3:717a0223:::608.00000000:5 : omap_digest 0xffffffff != omap_digest 0xdaa9228a from shard 0, omap_digest 0xffffffff != omap_digest 0xdaa9228a from auth oi 3:717a0223:::608.00000000:5(7954'2632 osd.1.0:89 dirty|omap|data_digest|omap_digest s 0 uv 479 dd ffffffff od daa9228a alloc_hint [0 0 0])
2022-06-28T14:31:48.982552+0200 osd.0 (osd.0) 76 : cluster [ERR] 3.6 shard 3 soid 3:717a0223:::608.00000000:5 : omap_digest 0xffffffff != omap_digest 0xdaa9228a from shard 0, omap_digest 0xffffffff != omap_digest 0xdaa9228a from auth oi 3:717a0223:::608.00000000:5(7954'2632 osd.1.0:89 dirty|omap|data_digest|omap_digest s 0 uv 479 dd ffffffff od daa9228a alloc_hint [0 0 0])
2022-06-28T14:31:48.982748+0200 osd.0 (osd.0) 77 : cluster [ERR] 3.6 deep-scrub 0 missing, 1 inconsistent objects
2022-06-28T14:31:48.982752+0200 osd.0 (osd.0) 78 : cluster [ERR] 3.6 deep-scrub 3 errors
2022-06-28T14:31:52.034760+0200 osd.3 (osd.3) 23 : cluster [ERR] 3.3 soid 3:cd5a64a3:::100.00000000:5 : omap_digest 0xb4a035f1 != omap_digest 0xffffffff from shard 3
2022-06-28T14:31:52.034769+0200 osd.3 (osd.3) 24 : cluster [ERR] 3.3 shard 2 soid 3:cd5a64a3:::100.00000000:5 : omap_digest 0xffffffff != omap_digest 0xb4a035f1 from auth oi 3:cd5a64a3:::100.00000000:5(7954'1622 osd.3.0:3 dirty|omap|data_digest|omap_digest s 0 uv 1573 dd ffffffff od b4a035f1 alloc_hint [0 0 0])
2022-06-28T14:31:52.034776+0200 osd.3 (osd.3) 25 : cluster [ERR] 3.3 shard 3 soid 3:cd5a64a3:::100.00000000:5 : omap_digest 0xffffffff != omap_digest 0xb4a035f1 from auth oi 3:cd5a64a3:::100.00000000:5(7954'1622 osd.3.0:3 dirty|omap|data_digest|omap_digest s 0 uv 1573 dd ffffffff od b4a035f1 alloc_hint [0 0 0])
2022-06-28T14:31:52.671434+0200 osd.3 (osd.3) 26 : cluster [ERR] 3.3 deep-scrub 0 missing, 1 inconsistent objects
2022-06-28T14:31:52.671439+0200 osd.3 (osd.3) 27 : cluster [ERR] 3.3 deep-scrub 3 errors
2022-06-28T14:31:55.511182+0200 mon.cephoctopus-1 (mon.0) 125998 : cluster [ERR] Health check update: 22 scrub errors (OSD_SCRUB_ERRORS)
2022-06-28T14:31:55.511211+0200 mon.cephoctopus-1 (mon.0) 125999 : cluster [ERR] Health check update: Possible data damage: 4 pgs inconsistent (PG_DAMAGED)
2022-06-28T14:31:55.793167+0200 osd.0 (osd.0) 79 : cluster [ERR] 3.7 shard 3 soid 3:e0b41d06:::609.00000000:5 : omap_digest 0xffffffff != omap_digest 0x77c6f335 from shard 0, omap_digest 0xffffffff != omap_digest 0x77c6f335 from auth oi 3:e0b41d06:::609.00000000:5(7954'1276 osd.0.0:95 dirty|omap|data_digest|omap_digest s 0 uv 354 dd ffffffff od 77c6f335 alloc_hint [0 0 0])
2022-06-28T14:31:56.841661+0200 osd.0 (osd.0) 80 : cluster [ERR] 3.7 shard 3 soid 3:ff5b34d6:::1.00000000:5 : omap_digest 0xffffffff != omap_digest 0xe57ff843 from shard 0, omap_digest 0xffffffff != omap_digest 0xe57ff843 from auth oi 3:ff5b34d6:::1.00000000:5(7954'1274 osd.0.0:25 dirty|omap|data_digest|omap_digest s 0 uv 1046 dd ffffffff od e57ff843 alloc_hint [0 0 0])
2022-06-28T14:31:56.841958+0200 osd.0 (osd.0) 81 : cluster [ERR] 3.7 deep-scrub 0 missing, 2 inconsistent objects
2022-06-28T14:31:56.841962+0200 osd.0 (osd.0) 82 : cluster [ERR] 3.7 deep-scrub 4 errors
2022-06-28T14:32:00.712221+0200 osd.3 (osd.3) 28 : cluster [ERR] 3.5 shard 0 soid 3:a93a17c2:::604.00000000:5 : omap_digest 0xffffffff != omap_digest 0xfe43bdca from auth oi 3:a93a17c2:::604.00000000:5(7954'1737 osd.1.0:88 dirty|omap|data_digest|omap_digest s 0 uv 1540 dd ffffffff od fe43bdca alloc_hint [0 0 0])
2022-06-28T14:32:00.712421+0200 osd.3 (osd.3) 29 : cluster [ERR] 3.5 shard 2 soid 3:a93a17c2:::604.00000000:5 : omap_digest 0xffffffff != omap_digest 0xfe43bdca from auth oi 3:a93a17c2:::604.00000000:5(7954'1737 osd.1.0:88 dirty|omap|data_digest|omap_digest s 0 uv 1540 dd ffffffff od fe43bdca alloc_hint [0 0 0])
2022-06-28T14:32:00.712507+0200 osd.3 (osd.3) 30 : cluster [ERR] 3.5 shard 3 soid 3:a93a17c2:::604.00000000:5 : omap_digest 0xffffffff != omap_digest 0xfe43bdca from auth oi 3:a93a17c2:::604.00000000:5(7954'1737 osd.1.0:88 dirty|omap|data_digest|omap_digest s 0 uv 1540 dd ffffffff od fe43bdca alloc_hint [0 0 0])
2022-06-28T14:32:00.712679+0200 osd.3 (osd.3) 31 : cluster [ERR] 3.5 soid 3:a93a17c2:::604.00000000:5 : failed to pick suitable auth object
2022-06-28T14:32:00.712820+0200 osd.3 (osd.3) 32 : cluster [ERR] 3.5 shard 0 soid 3:b871830b:::605.00000000:5 : omap_digest 0xffffffff != omap_digest 0x4dd0b557 from auth oi 3:b871830b:::605.00000000:5(7954'1700 osd.1.0:86 dirty|omap|data_digest|omap_digest s 0 uv 1374 dd ffffffff od 4dd0b557 alloc_hint [0 0 0])
2022-06-28T14:32:00.712925+0200 osd.3 (osd.3) 33 : cluster [ERR] 3.5 shard 2 soid 3:b871830b:::605.00000000:5 : omap_digest 0xffffffff != omap_digest 0x4dd0b557 from auth oi 3:b871830b:::605.00000000:5(7954'1700 osd.1.0:86 dirty|omap|data_digest|omap_digest s 0 uv 1374 dd ffffffff od 4dd0b557 alloc_hint [0 0 0])
2022-06-28T14:32:00.713023+0200 osd.3 (osd.3) 34 : cluster [ERR] 3.5 shard 3 soid 3:b871830b:::605.00000000:5 : omap_digest 0xffffffff != omap_digest 0x4dd0b557 from auth oi 3:b871830b:::605.00000000:5(7954'1700 osd.1.0:86 dirty|omap|data_digest|omap_digest s 0 uv 1374 dd ffffffff od 4dd0b557 alloc_hint [0 0 0])
2022-06-28T14:32:00.713109+0200 osd.3 (osd.3) 35 : cluster [ERR] 3.5 soid 3:b871830b:::605.00000000:5 : failed to pick suitable auth object
2022-06-28T14:32:00.816172+0200 osd.3 (osd.3) 36 : cluster [ERR] 3.5 deep-scrub 6 errors
2022-06-28T14:32:01.829075+0200 mon.cephoctopus-1 (mon.0) 126002 : cluster [ERR] Health check update: 26 scrub errors (OSD_SCRUB_ERRORS)
2022-06-28T14:32:01.829120+0200 mon.cephoctopus-1 (mon.0) 126003 : cluster [ERR] Health check update: Possible data damage: 5 pgs inconsistent (PG_DAMAGED)
2022-06-28T14:32:06.831213+0200 mon.cephoctopus-1 (mon.0) 126007 : cluster [ERR] Health check update: 32 scrub errors (OSD_SCRUB_ERRORS)
2022-06-28T14:32:06.831251+0200 mon.cephoctopus-1 (mon.0) 126008 : cluster [ERR] Health check update: Possible data damage: 6 pgs inconsistent (PG_DAMAGED)

#4 Updated by Dan van der Ster almost 2 years ago

  • Subject changed from Changing max_mds after metadata pool snap causes inconsistent objects to Writes to a CephFS after a metadata pool snapshot cause inconsistent objects

#5 Updated by Venky Shankar over 1 year ago

Dan van der Ster wrote:

Venky Shankar wrote:

Hi Dan,

I need to check, but does the inconsistent object warning show up only after reducing max_mds?

Good point. In fact it is sufficient to just create some files in the cephfs after taking a pool snapshot.

With my test cluster and 1 active mds, I took a pool snapshot, then untarred the linux kernel, called `sync`, then deep-scrubbed all cephfs.cephfs.meta PGs. This found several inconsistent objects:

[...]

Are you able to reproduce this after flushing the MDS journal and then taking a pool snap? I wonder if this happens when the journal entries are written out to the backing store (object omap). The digest mismatches hint at this being related to omap too.
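
Roughly, the ordering I have in mind (a sketch with placeholder MDS/snap names, pool id 3 as in your cluster):

ceph daemon mds.<active-mds-id> flush journal       # on the MDS host: flush journaled metadata into the omap objects first
rados mksnap -p cephfs.cephfs.meta testsnap2        # only then take the pool snapshot
# ... create some files in the cephfs, sync ...
for i in {0..7}; do ceph pg deep-scrub 3.$i; done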

#6 Updated by Greg Farnum over 1 year ago

  • Project changed from CephFS to RADOS

AFAICT this is just a RADOS issue?

#7 Updated by Greg Farnum over 1 year ago

That said, I wouldn’t expect anything useful from running this — pool snaps are hard to use well. What were you trying to do here, Dan?

I imagine that there’s something weird happening here either when the osd is generating the snapcontext for pool snaps, or that omap/bluestore has something strange going on when that’s the only change across snaps.

#8 Updated by Dan van der Ster over 1 year ago

Greg Farnum wrote:

That said, I wouldn’t expect anything useful from running this — pool snaps are hard to use well. What were you trying to do here, Dan?

I imagine that there’s something weird happening here either when the osd is generating the snapcontext for pool snaps, or that omap/bluestore has something strange going on when that’s the only change across snaps.

I opened this on behalf of another user. See the thread [ceph-users] Inconsistent PGs after upgrade to Pacific

The use case seems to have been to create a consistent rollback point during a misbehaving upgrade.

I was indeed thinking that we could simply prevent users from taking snapshots of the metadata pool, because the behavior of CephFS is not well defined with these pool snapshots. If we wanted to support such a feature, we'd probably need a way to atomically snapshot the metadata pool and all data pools.

But we should anyway try to fix the underlying inconsistency issue as well, right?

#9 Updated by Neha Ojha over 1 year ago

  • Assignee set to Matan Breizman

#10 Updated by Pascal Ehlert over 1 year ago

This indeed happened during an upgrade from Octopus to Pacific.
I had forgotten to reduce the number of ranks in CephFS to 1, which, if I'm not mistaken, resulted in the MDS daemons not coming up and reporting some inconsistency errors.

Worried that I had already done harm to the data, I thought it was best to make a pool snapshot of both data and metadata to at least be able to roll back if I should make it worse during recovery.
I later managed to bring CephFS up again, but as soon as the first scrub happened, the status transitioned to HEALTH_ERR with the metadata pool PGs showing as active+clean+inconsistent.

When trying to run `ceph osd pool rmsnap <pool> <snapname>` I just get "pool <pool> snap <snapname> does not exist".

Does anyone have an idea how I could clean this up and bring the cluster back into healthy state?
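
For reference, the pool snapshot state can be double-checked read-only with something like (pool id/name are from the reproducer above, adjust to yours):

rados lssnap -p cephfs.cephfs.meta                        # pool snaps as seen by librados
ceph osd dump | grep -A3 "pool 3 'cephfs.cephfs.meta'"    # pool snaps / removed_snaps recorded in the OSDMap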

#11 Updated by Neha Ojha over 1 year ago

Pascal Ehlert wrote:

This indeed happened during an upgrade from Octopus to Pacific.
I had forgotten to reduce the number of ranks in CephFS to 1, which, if I'm not mistaken, resulted in the MDS daemons not coming up and reporting some inconsistency errors.

Worried that I had already done harm to the data, I thought it was best to make a pool snapshot of both data and metadata to at least be able to roll back if I should make it worse during recovery.
I later managed to bring CephFS up again, but as soon as the first scrub happened, the status transitioned to HEALTH_ERR with the metadata pool PGs showing as active+clean+inconsistent.

When trying to run `ceph osd pool rmsnap <pool> <snapname>` I just get "pool <pool> snap <snapname> does not exist".

Does anyone have an idea how I could clean this up and bring the cluster back into healthy state?

Have you tried to repair the inconsistent PG? Did that help?
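
For example, roughly (the PG id is just the one from the earlier output; list-inconsistent-pg shows the ones affected on your cluster):

ceph health detail                               # which PGs are flagged inconsistent
rados list-inconsistent-pg cephfs.cephfs.meta    # same, per pool
ceph pg repair 3.6                               # ask the primary OSD to repair that PG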

#12 Updated by Pascal Ehlert over 1 year ago

I tried that a few times for different PGs on different OSDs, but it doesn't help.

#13 Updated by Matan Breizman over 1 year ago

Dan van der Ster wrote:

Good point. In fact it is sufficient to just create some files in the cephfs after taking a pool snapshot.

With my test cluster and 1 active mds, I took a pool snapshot, then untarred the linux kernel, called `sync`, then deep-scrubbed all cephfs.cephfs.meta PGs. This found several inconsistent objects:

Hi Dan,
I'm trying to reproduce this on a vstart cluster using ceph-fuse with no luck.
Does the following case cause inconsistencies for you?

MDS=1 MON=1 OSD=1 MGR=1 ../src/vstart.sh -X -G --msgr1 -n --without-dashboard
ceph-fuse /mnt/cephfs/
mkdir /mnt/cephfs/mydir
rados mksnap -p cephfs.a.meta testsnap1
tar -xvf v5.19.tar.gz
sync
ceph daemon mds.a flush journal
for i in {0..9}; do ceph pg deep-scrub 3.$i; done && 
for i in {a..f}; do ceph pg deep-scrub 3.$i; done

This results in an ok status for all the PGs:
log_channel(cluster) log [DBG] : 3.<> deep-scrub ok

#14 Updated by Matan Breizman over 1 year ago

  • Status changed from New to Can't reproduce

#15 Updated by Pascal Ehlert about 1 year ago

Sorry to bring this up again, but our cluster is still in an unhealthy state and we are trying to find ways to recover from it.

Something I apparently did not notice originally but have found now is that in the CephFS metadata pool we seem to have mostly objects without snapshots, some with just a snapshot clone and no head, and some with both a head and snapshot clones.

Here are some examples returned by `rados -p pool listsnaps $obj`:

20001b80584.00000000:
cloneid  snaps  size  overlap
1        1      0     []
head     -      0

10008f71b7d.00000000:
cloneid  snaps  size  overlap
head     -      0

100001cdca7.00000000:
cloneid  snaps  size  overlap
1        1      0     []

Would it be safe to use `rmsnap` to remove the snapshots and, for the objects missing `head`, even to remove the entire object?

I don't think there is a valid reason to have snapshots in the metadata pool even when using the cephfs snapshot functionality, right?
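
For reference, a read-only loop to enumerate which objects still carry snapshot clones (pool name as above; a rough sketch):

pool=cephfs.cephfs.meta
rados -p "$pool" ls | while read -r obj; do
    out=$(rados -p "$pool" listsnaps "$obj")
    # clone rows start with a numeric cloneid; line 1 is the object name, line 2 the header
    clones=$(echo "$out" | awk 'NR > 2 && $1 ~ /^[0-9]+$/' | wc -l)
    heads=$(echo "$out" | grep -c '^head')
    if [ "$clones" -gt 0 ]; then
        echo "== $obj: $clones clone(s), head rows: $heads"
        echo "$out"
    fi
done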
