Actions
Bug #40998
closedceph-objectstore-tool remove broken
Status:
Can't reproduce
Priority:
Normal
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
It seems like the remove function no longer works properly with snaps. Test errors are detected with OSDs down then restart never goes healthy due to snaptrim_errors. This may be due changes in snap handling in the OSD without corresponding changes to ceph-objectstore-tool.
ERROR:/home/dzafman/ceph/build/bin/ceph-objectstore-tool --no-mon-config --data-path dev/osd3 '["1.1", {"hash": 1397205569, "max": 0, "oid": "REPobject1", "namespace": "ns1", "snapid": -2, "key": "", "pool": 1}]' remove ERROR:Should have failed, but got exit 0 ERROR:remove with --force failed for ["1.1", {"hash": 1397205569, "max": 0, "oid": "REPobject1", "namespace": "ns1", "snapid": -2, "key": "", "pool": 1}] ERROR:removeall failed for ["1.1", {"hash": 1397205569, "max": 0, "oid": "REPobject1", "namespace": "ns1", "snapid": -2, "key": "", "pool": 1}] ERROR:removeall failed for ["1.1", {"hash": 1397205569, "max": 0, "oid": "REPobject1", "namespace": "ns1", "snapid": -2, "key": "", "pool": 1}] ERROR:Removeall didn't remove all objects ns1/REPobject1 : ['["1.1",{"oid":"REPobject1","key":"","snapid":1,"hash":1397205569,"max":0,"pool":1,"namespace":"ns1","max":0}]'] vstarting....DONE Wait for health_ok...DONE Wait for health_ok...Traceback (most recent call last): File "../qa/standalone/special/ceph_objectstore_tool.py", line 2055, in <module> status = main(sys.argv[1:]) File "../qa/standalone/special/ceph_objectstore_tool.py", line 2011, in main ERRORS += test_removeall(CFSD_PREFIX, db, OBJREPPGS, REP_POOL, CEPH_BIN, OSDDIR, REP_NAME, NUM_CLONED_REP_OBJECTS) File "../qa/standalone/special/ceph_objectstore_tool.py", line 666, in test_removeall wait_for_health() File "../qa/standalone/special/ceph_objectstore_tool.py", line 63, in wait_for_health raise Exception("Time exceeded to go to health") Exception: Time exceeded to go to health Command exited with non-zero status 1 Elapsed 7:19.54 (439.54 seconds) special/ceph_objectstore_tool.py .............. FAILED 1 TESTS FAILED, 1 TOTAL TESTS
2019-07-29T17:32:14.132+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] snap_trimmer posting 2019-07-29T17:32:14.132+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] SnapTrimmer state<Trimming/AwaitAsyncWork>: AwaitAsyncWork: trimming snap 1 2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c81c0) wait_for_apply #-1:c0371625:::snapmapper:0# done 2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) get_omap_iterator(5390): meta/#-1:c0371625:::snapmapper:0# 2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_0000000000000001_0000000000000001.0) returns 0 SNA_1_0000000000000001_0000000000000001.43C7F252.1.REPobject1..,buffer::list(len=63, buffer::ptr(0~63 0x557b85915700 in raw 0x557b85915700 len 63 nref 1) ) 2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c81c0) wait_for_apply #-1:c0371625:::snapmapper:0# done 2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) get_omap_iterator(5390): meta/#-1:c0371625:::snapmapper:0# 2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_0000000000000001_0000000000000001.4) returns 0 SNA_1_0000000000000001_0000000000000001.43C7F252.1.REPobject1..,buffer::list(len=63, buffer::ptr(0~63 0x557b85915700 in raw 0x557b85915700 len 63 nref 1) ) 2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim SNA_1_0000000000000001_0000000000000001.43C7F252.1.REPobject1.. 2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c81c0) wait_for_apply #-1:c0371625:::snapmapper:0# done 2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) get_omap_iterator(5390): meta/#-1:c0371625:::snapmapper:0# 2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_0000000000000001_0000000000000001.43C7F252.1.REPobject1..) returns 0 SNA_1_0000000000000001_0000000000000001.C5E790A5.1.REPobject2..ns1,buffer::list(len=66, buffer::ptr(0~66 0x557b869e7ef0 in raw 0x557b869e7ef0 len 66 nref 1) ) 2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c81c0) wait_for_apply #-1:c0371625:::snapmapper:0# done 2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) get_omap_iterator(5390): meta/#-1:c0371625:::snapmapper:0# 2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_0000000000000001_0000000000000001.8) returns 0 SNA_1_0000000000000001_0000000000000001.C5E790A5.1.REPobject2..ns1,buffer::list(len=66, buffer::ptr(0~66 0x557b869e7a20 in raw 0x557b869e7a20 len 66 nref 1) ) 2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c81c0) wait_for_apply #-1:c0371625:::snapmapper:0# done 2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) get_omap_iterator(5390): meta/#-1:c0371625:::snapmapper:0# 2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_0000000000000001_0000000000000001.C) returns 0 SNA_1_0000000000000001_0000000000000001.C5E790A5.1.REPobject2..ns1,buffer::list(len=66, buffer::ptr(0~66 0x557b869e7ef0 in raw 0x557b869e7ef0 len 66 nref 1) ) 2019-07-29T17:32:14.132+0000 7f76952da700 20 snap_mapper.get_next_objects_to_trim SNA_1_0000000000000001_0000000000000001.C5E790A5.1.REPobject2..ns1 2019-07-29T17:32:14.132+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] SnapTrimmer state<Trimming/AwaitAsyncWork>: AwaitAsyncWork react trimming 1:2c3ef4a4:::REPobject1:1 2019-07-29T17:32:14.132+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] get_object_context: obc NOT found in cache: 1:2c3ef4a4:::REPobject1:1 2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) getattr(4576): 1.0_head/#1:2c3ef4a4:::REPobject1:1# '_' 2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c9180) wait_for_apply #1:2c3ef4a4:::REPobject1:1# done 2019-07-29T17:32:14.132+0000 7f76952da700 10 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) getattr(4614): 1.0_head/#1:2c3ef4a4:::REPobject1:1# '_' = 242 2019-07-29T17:32:14.132+0000 7f76952da700 15 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) getattr(4576): 1.0_head/#1:2c3ef4a4:::REPobject1:head# 'snapset' 2019-07-29T17:32:14.132+0000 7f76952da700 20 filestore.osr(0x557b858c9180) wait_for_apply #1:2c3ef4a4:::REPobject1:head# done 2019-07-29T17:32:14.136+0000 7f76952da700 10 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) error opening file /home/dzafman/ceph/build/td/cot_dir/dev/osd1/current/1.0_head/REPobject1__head_252F7C34__1 with flags=2: (2) No such file or directory 2019-07-29T17:32:14.136+0000 7f76952da700 10 filestore(/home/dzafman/ceph/build/td/cot_dir/dev/osd1) getattr(4614): 1.0_head/#1:2c3ef4a4:::REPobject1:head# 'snapset' = -2 2019-07-29T17:32:14.136+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] populate_obc_watchers 1:2c3ef4a4:::REPobject1:1 2019-07-29T17:32:14.136+0000 7f76952da700 20 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] PrimaryLogPG::check_blacklisted_obc_watchers for obc 1:2c3ef4a4:::REPobject1:1 2019-07-29T17:32:14.136+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] get_object_context: creating obc from disk: 0x557b86a67e40 2019-07-29T17:32:14.136+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim trimq=[1~1]] get_object_context: 0x557b86a67e40 1:2c3ef4a4:::REPobject1:1 rwstate(none n=0 w=0) oi: 1:2c3ef4a4:::REPobject1:1(23'10 client.4147.0:1 dirty|data_digest s 440 uv 1 dd 75f7fd52 alloc_hint [0 0 0]) exists: 1 ssc: 0x557b86a31340 snapset: 0=[]:{} 2019-07-29T17:32:14.136+0000 7f76952da700 -1 log_channel(cluster) log [ERR] : trim_object: Can not trim 1:2c3ef4a4:::REPobject1:1 repair needed (no obc->ssc or !exists) 2019-07-29T17:32:14.136+0000 7f76952da700 10 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim+snaptrim_error trimq=[1~1]] SnapTrimmer state<Trimming/AwaitAsyncWork>: Snaptrim error=-2 2019-07-29T17:32:14.136+0000 7f76952da700 20 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim+snaptrim_error trimq=[1~1]] exit Trimming/AwaitAsyncWork 2019-07-29T17:32:14.136+0000 7f76952da700 20 osd.1 pg_epoch: 68 pg[1.0( v 23'15 (0'0,23'15] local-lis/les=67/68 n=6 ec=15/15 lis/c 67/67 les/c/f 68/68/0 67/67/67) [1,0,2] r=0 lpr=67 crt=23'15 lcod 0'0 mlcod 0'0 active+clean+snaptrim+snaptrim_error trimq=[1~1]] exit Trimming
Updated by David Zafman over 4 years ago
- Status changed from New to Can't reproduce
After rebuilding the problem went away.
Actions