Project

General

Profile

Actions

Bug #40998

closed

ceph-objectstore-tool remove broken

Added by David Zafman over 4 years ago. Updated over 4 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

It seems like the remove function no longer works properly with snaps. Test errors are detected with OSDs down then restart never goes healthy due to snaptrim_errors. This may be due changes in snap handling in the OSD without corresponding changes to ceph-objectstore-tool.

ERROR:/home/dzafman/ceph/build/bin/ceph-objectstore-tool --no-mon-config --data-path dev/osd3 '["1.1", {"hash": 1397205569, "max": 0, "oid": "REPobject1", "namespace": "ns1", "snapid": -2, "key": "", "pool": 1}]' remove
ERROR:Should have failed, but got exit 0
ERROR:remove with --force failed for ["1.1", {"hash": 1397205569, "max": 0, "oid": "REPobject1", "namespace": "ns1", "snapid": -2, "key": "", "pool": 1}]
ERROR:removeall failed for ["1.1", {"hash": 1397205569, "max": 0, "oid": "REPobject1", "namespace": "ns1", "snapid": -2, "key": "", "pool": 1}]
ERROR:removeall failed for ["1.1", {"hash": 1397205569, "max": 0, "oid": "REPobject1", "namespace": "ns1", "snapid": -2, "key": "", "pool": 1}]
ERROR:Removeall didn't remove all objects ns1/REPobject1 : ['["1.1",{"oid":"REPobject1","key":"","snapid":1,"hash":1397205569,"max":0,"pool":1,"namespace":"ns1","max":0}]']
vstarting....DONE
Wait for health_ok...DONE
Wait for health_ok...Traceback (most recent call last):
  File "../qa/standalone/special/ceph_objectstore_tool.py", line 2055, in <module>
    status = main(sys.argv[1:])
  File "../qa/standalone/special/ceph_objectstore_tool.py", line 2011, in main
    ERRORS += test_removeall(CFSD_PREFIX, db, OBJREPPGS, REP_POOL, CEPH_BIN, OSDDIR, REP_NAME, NUM_CLONED_REP_OBJECTS)
  File "../qa/standalone/special/ceph_objectstore_tool.py", line 666, in test_removeall
    wait_for_health()
  File "../qa/standalone/special/ceph_objectstore_tool.py", line 63, in wait_for_health
    raise Exception("Time exceeded to go to health")
Exception: Time exceeded to go to health
Command exited with non-zero status 1
Elapsed 7:19.54 (439.54 seconds)
special/ceph_objectstore_tool.py .............. FAILED
1 TESTS FAILED, 1 TOTAL TESTS
2019-07-29T17:32:14.132+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] snap_trimmer posting
2019-07-29T17:32:14.132+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] SnapTrimmer state<Trimming/AwaitAsyncWork>: AwaitAsyncWork: trimming snap 1
2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c81c0) wait_for_apply #-1:c0371625:::snapmapper:0# done
2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) get_omap_iterator(5390): meta/#-1:c0371625:::snapmapper:0#
2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_0000000000000001_0000000000000001.0) returns 0 SNA_1_0000000000000001_0000000000000001.43C7F252.1.REPobject1..,buffer::list(len=63,
        buffer::ptr(0~63 0x557b85915700 in raw 0x557b85915700 len 63 nref 1)
)
2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c81c0) wait_for_apply #-1:c0371625:::snapmapper:0# done
2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) get_omap_iterator(5390): meta/#-1:c0371625:::snapmapper:0#
2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_0000000000000001_0000000000000001.4) returns 0 SNA_1_0000000000000001_0000000000000001.43C7F252.1.REPobject1..,buffer::list(len=63,
        buffer::ptr(0~63 0x557b85915700 in raw 0x557b85915700 len 63 nref 1)
)
2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim SNA_1_0000000000000001_0000000000000001.43C7F252.1.REPobject1..
2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c81c0) wait_for_apply #-1:c0371625:::snapmapper:0# done
2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) get_omap_iterator(5390): meta/#-1:c0371625:::snapmapper:0#
2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_0000000000000001_0000000000000001.43C7F252.1.REPobject1..) returns 0 SNA_1_0000000000000001_0000000000000001.C5E790A5.1.REPobject2..ns1,buffer::list(len=66,
        buffer::ptr(0~66 0x557b869e7ef0 in raw 0x557b869e7ef0 len 66 nref 1)
)
2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c81c0) wait_for_apply #-1:c0371625:::snapmapper:0# done
2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) get_omap_iterator(5390): meta/#-1:c0371625:::snapmapper:0#
2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_0000000000000001_0000000000000001.8) returns 0 SNA_1_0000000000000001_0000000000000001.C5E790A5.1.REPobject2..ns1,buffer::list(len=66,
        buffer::ptr(0~66 0x557b869e7a20 in raw 0x557b869e7a20 len 66 nref 1)
)
2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c81c0) wait_for_apply #-1:c0371625:::snapmapper:0# done
2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) get_omap_iterator(5390): meta/#-1:c0371625:::snapmapper:0#
2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_0000000000000001_0000000000000001.C) returns 0 SNA_1_0000000000000001_0000000000000001.C5E790A5.1.REPobject2..ns1,buffer::list(len=66,
        buffer::ptr(0~66 0x557b869e7ef0 in raw 0x557b869e7ef0 len 66 nref 1)
)
2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim SNA_1_0000000000000001_0000000000000001.C5E790A5.1.REPobject2..ns1
2019-07-29T17:32:14.132+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] SnapTrimmer state<Trimming/AwaitAsyncWork>: AwaitAsyncWork react trimming 1:2c3ef4a4:::REPobject1:1
2019-07-29T17:32:14.132+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] get_object_context: obc NOT found in cache: 1:2c3ef4a4:::REPobject1:1
2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) getattr(4576): 1.0_head/#1:2c3ef4a4:::REPobject1:1# '_'
2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c9180) wait_for_apply #1:2c3ef4a4:::REPobject1:1# done
2019-07-29T17:32:14.132+0000 7f76952da700 10 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) getattr(4614): 1.0_head/#1:2c3ef4a4:::REPobject1:1# '_' = 242
2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) getattr(4576): 1.0_head/#1:2c3ef4a4:::REPobject1:head# 'snapset'
2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c9180) wait_for_apply #1:2c3ef4a4:::REPobject1:head# done
2019-07-29T17:32:14.136+0000 7f76952da700 10 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) error opening file /home/dzafman/ceph/build/td/cot_dir/dev/osd1/current/1.0_head/REPobject1__head_252F7C34__1 with flags=2: (2) No such file or directory
2019-07-29T17:32:14.136+0000 7f76952da700 10 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) getattr(4614): 1.0_head/#1:2c3ef4a4:::REPobject1:head# 'snapset' = -2
2019-07-29T17:32:14.136+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] populate_obc_watchers 1:2c3ef4a4:::REPobject1:1
2019-07-29T17:32:14.136+0000 7f76952da700 20 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] PrimaryLogPG::check_blacklisted_obc_watchers for obc 1:2c3ef4a4:::REPobject1:1
2019-07-29T17:32:14.136+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] get_object_context: creating obc from disk: 0x557b86a67e40
2019-07-29T17:32:14.136+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] get_object_context: 0x557b86a67e40 1:2c3ef4a4:::REPobject1:1 rwstate(none n=0 w=0) oi: 1:2c3ef4a4:::REPobject1:1(23'10 client.4147.0:1 dirty|data_digest s 440 uv 1 dd 75f7fd52 alloc_hint [0 0 0]) exists: 1 ssc: 0x557b86a31340 snapset: 0=[]:{}
2019-07-29T17:32:14.136+0000 7f76952da700 -1 log_channel(cluster) log [ERR] : trim_object: Can not trim 1:2c3ef4a4:::REPobject1:1 repair needed (no obc->ssc or !exists)
2019-07-29T17:32:14.136+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim+snaptrim_error trimq=[1~1]] SnapTrimmer state<Trimming/AwaitAsyncWork>: Snaptrim error=-2
2019-07-29T17:32:14.136+0000 7f76952da700 20 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim+snaptrim_error trimq=[1~1]] exit Trimming/AwaitAsyncWork
2019-07-29T17:32:14.136+0000 7f76952da700 20 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim+snaptrim_error trimq=[1~1]] exit Trimming
Actions #1

Updated by David Zafman over 4 years ago

  • Status changed from New to Can't reproduce

After rebuilding the problem went away.

Actions

Also available in: Atom PDF