Bug #48819
fsck error: found stray (per-pg) omap data on omap_head
Description
/a/kchai-2021-01-10_13:20:22-rados-master-distro-basic-smithi/
Updated by Josh Durgin over 3 years ago
This is happening due to an exception running ceph-objectstore-tool:
2021-01-10T13:59:28.286 ERROR:tasks.thrashosds.thrasher:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 1069, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 115, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 1201, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 308, in kill_osd
    format(ret=proc.exitstatus))
Exception: ceph-objectstore-tool: exp list-pgs failure with status 1
This halts the thrasher thread, since nothing handles the exception, and leaves the cluster with at least one osd down. Adding exception handling around these ceph-objectstore-tool calls would fix the problem.
As for why ceph-objectstore-tool failed, it may be that an osd was not completely stopped when the tool was run.
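The exception handling suggested above might look roughly like the sketch below. It is not the actual ceph_manager.py code: ToolFailure and run_actions are hypothetical names, and the max_failures limit is an assumption added so that a persistent failure still surfaces instead of being swallowed forever.

```python
import logging

log = logging.getLogger(__name__)


class ToolFailure(Exception):
    """Hypothetical stand-in for the exception raised when ceph-objectstore-tool fails."""


def run_actions(actions, max_failures=3):
    """Run thrash actions, logging tool failures instead of letting them end the loop.

    Returns the number of failed actions. Re-raises once max_failures is
    reached (an assumed policy) so a persistent problem is still reported.
    """
    failures = 0
    for action in actions:
        try:
            action()
        except ToolFailure:
            failures += 1
            log.exception("thrash action failed (%d/%d)", failures, max_failures)
            if failures >= max_failures:
                raise
    return failures
```

With this shape, a single failed list-pgs call is logged and the loop moves on to the next action, so the thrasher can still revive the osd it took down.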
Updated by Josh Durgin over 3 years ago
- Project changed from RADOS to bluestore
The cause appears to be a bluestore issue: running the objectstore command manually with --log-to-stderr shows it failing fsck with stray per-pg omap data:
2021-01-15T23:56:00.067+0000 7f77a73d6dc0 1 bluefs fsck
2021-01-15T23:56:00.067+0000 7f77a73d6dc0 1 bluestore(/var/lib/ceph/osd/ceph-4) _fsck_on_open walking object keyspace
2021-01-15T23:56:00.069+0000 7f77a73d6dc0 1 bluestore(/var/lib/ceph/osd/ceph-4) _fsck_on_open checking shared_blobs
2021-01-15T23:56:00.069+0000 7f77a73d6dc0 1 bluestore(/var/lib/ceph/osd/ceph-4) _fsck_on_open checking pool_statfs
2021-01-15T23:56:00.069+0000 7f77a73d6dc0 1 bluestore(/var/lib/ceph/osd/ceph-4) _fsck_on_open checking for stray omap data
2021-01-15T23:56:00.069+0000 7f77a73d6dc0 -1 bluestore(/var/lib/ceph/osd/ceph-4) fsck error: found stray (per-pg) omap data on omap_head 16092768843478336515 0 0
2021-01-15T23:56:00.069+0000 7f77a73d6dc0 1 bluestore(/var/lib/ceph/osd/ceph-4) _fsck_on_open checking deferred events
2021-01-15T23:56:00.069+0000 7f77a73d6dc0 1 bluestore(/var/lib/ceph/osd/ceph-4) _fsck_on_open checking freelist vs allocated
2021-01-15T23:56:00.141+0000 7f77a73d6dc0 1 bluestore(/var/lib/ceph/osd/ceph-4) _fsck_on_open <<<FINISH>>> with 1 errors, 0 warnings, 0 repaired, 1 remaining in 0.075985 seconds
2021-01-15T23:56:00.141+0000 7f7769ffb700 0 bluestore(/var/lib/ceph/osd/ceph-4) allocation stats probe 0: cnt: 0 frags: 0 size: 0
2021-01-15T23:56:00.141+0000 7f7769ffb700 0 bluestore(/var/lib/ceph/osd/ceph-4) probe -1: 0, 0, 0
2021-01-15T23:56:00.141+0000 7f7769ffb700 0 bluestore(/var/lib/ceph/osd/ceph-4) probe -2: 0, 0, 0
2021-01-15T23:56:00.141+0000 7f7769ffb700 0 bluestore(/var/lib/ceph/osd/ceph-4) probe -4: 0, 0, 0
2021-01-15T23:56:00.141+0000 7f7769ffb700 0 bluestore(/var/lib/ceph/osd/ceph-4) probe -8: 0, 0, 0
2021-01-15T23:56:00.141+0000 7f7769ffb700 0 bluestore(/var/lib/ceph/osd/ceph-4) probe -16: 0, 0, 0
2021-01-15T23:56:00.141+0000 7f7769ffb700 0 bluestore(/var/lib/ceph/osd/ceph-4) ------------
2021-01-15T23:56:00.141+0000 7f77a73d6dc0 4 rocksdb: [db_impl/db_impl.cc:397] Shutdown: canceling all background work
2021-01-15T23:56:00.141+0000 7f77a73d6dc0 4 rocksdb: [db_impl/db_impl.cc:573] Shutdown complete
2021-01-15T23:56:00.141+0000 7f77a73d6dc0 1 bluefs umount
2021-01-15T23:56:00.141+0000 7f77a73d6dc0 1 bdev(0x55dfbbe73bb0 /var/lib/ceph/osd/ceph-4/block) close
2021-01-15T23:56:00.321+0000 7f77a73d6dc0 1 freelist shutdown
2021-01-15T23:56:00.321+0000 7f77a73d6dc0 1 stupidalloc 0x0x55dfbbe7f7e0 shutdown
2021-01-15T23:56:00.321+0000 7f77a73d6dc0 1 bdev(0x55dfbbe3ba00 /var/lib/ceph/osd/ceph-4/block) close
Mount failed with '(5) Input/output error'
2021-01-15T23:56:00.554+0000 7f77a73d6dc0 -1 bluestore(/var/lib/ceph/osd/ceph-4) _mount fsck found 1 errors
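To spot this kind of failure quickly in a long log, something like the following sketch could pull out the fsck error lines and the final tally. The patterns are inferred only from the log output quoted above and may not match other Ceph versions; summarize_fsck is a hypothetical helper, not part of any Ceph tool.

```python
import re

# Patterns inferred from the log excerpt in this report (an assumption, not a spec).
FSCK_ERROR = re.compile(r"fsck error: (?P<msg>.+)$")
FSCK_SUMMARY = re.compile(
    r"<<<FINISH>>> with (?P<errors>\d+) errors, (?P<warnings>\d+) warnings, "
    r"(?P<repaired>\d+) repaired, (?P<remaining>\d+) remaining"
)


def summarize_fsck(lines):
    """Return (error messages, summary counts or None) found in the log lines."""
    errors = []
    summary = None
    for line in lines:
        match = FSCK_ERROR.search(line)
        if match:
            errors.append(match.group("msg").strip())
            continue
        match = FSCK_SUMMARY.search(line)
        if match:
            # Convert the captured counts to integers for easy comparison.
            summary = {key: int(value) for key, value in match.groupdict().items()}
    return errors, summary
```

Fed the log above, this would report the single stray omap_head error and a summary with one error remaining, which is the condition that makes the mount fail.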
Updated by Neha Ojha over 3 years ago
- Related to Bug #48810: "mount fsck found 1 errors" in crimson-rados-master added
Updated by Igor Fedotov over 3 years ago
This looks related to my recent PR introducing the per-pg omap naming scheme: https://github.com/ceph/ceph/pull/38651
How can I run the test case manually (and locally) to reproduce the issue?
Failing that, some insight into the test case scenario would be highly appreciated.
Updated by Josh Durgin over 3 years ago
Igor, you can log on to ubuntu@smithi092 right now and check out osd.4.
The test isn't doing anything special: it thrashes osds (taking them up and down) and tests monitor recovery. I would expect it to do little I/O other than osdmaps.
Updated by Josh Durgin over 3 years ago
https://github.com/ceph/ceph/pull/38929 will at least let us see this error in the teuthology logs.
Updated by Josh Durgin over 3 years ago
- Subject changed from verify (valgrind) test times out to fsck error: found stray (per-pg) omap data on omap_head
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943842/
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6944045/