Bug #4216
closedosd: dbojectmap incorrectly skipping ops
0%
Description
ubuntu@teuthology:/a/sage-ooo6/8248 crashed with a bad array index in remove_object_with_snap_hardlinks because a snapped object had an empty oi.snaps vector. this appears to be because when the object was written (first created) a bit earlier the setxattr failed with
2013-02-20 21:43:56.498875 7f6a9d4e1700 10 filestore(/var/lib/ceph/osd/ceph-1) _do_transaction on 0x3a93838 2013-02-20 21:43:56.498881 7f6a9d4e1700 15 filestore(/var/lib/ceph/osd/ceph-1) clone 0.7_head/d30883e7/plana605797-98/head//0 -> 0.7_head/d30883e7/plana605797-98/13c//0 2013-02-20 21:43:56.499046 7f6a9d4e1700 20 filestore(/var/lib/ceph/osd/ceph-1) _do_clone_range 0~3204735 to 0 2013-02-20 21:43:56.499058 7f6a9d4e1700 20 filestore(/var/lib/ceph/osd/ceph-1) _do_clone_range using copy 2013-02-20 21:43:56.499060 7f6a9d4e1700 20 filestore(/var/lib/ceph/osd/ceph-1) _do_copy_range 0~3204735 to 0 2013-02-20 21:43:56.503670 7f6a9d4e1700 20 filestore(/var/lib/ceph/osd/ceph-1) _do_copy_range 0~3204735 to 0 = 59007 2013-02-20 21:43:56.503683 7f6a9d4e1700 20 filestore(/var/lib/ceph/osd/ceph-1) objectmap clone 2013-02-20 21:43:56.503760 7f6a9d4e1700 20 filestore dbobjectmap: seq is 1992 2013-02-20 21:43:56.503778 7f6a9d4e1700 20 filestore dbobjectmap: seq is 1993 2013-02-20 21:43:56.503789 7f6a9d4e1700 20 filestore set_header: setting seq 1594 2013-02-20 21:43:56.503799 7f6a9d4e1700 20 filestore set_map_header: setting 1991 hoid d30883e7/plana605797-98/head//0 parent seq 1594 2013-02-20 21:43:56.503811 7f6a9d4e1700 20 filestore set_map_header: setting 1992 hoid d30883e7/plana605797-98/13c//0 parent seq 1594 2013-02-20 21:43:56.503907 7f6a9d4e1700 20 filestore(/var/lib/ceph/osd/ceph-1) fgetattrs 45 getting '_' 2013-02-20 21:43:56.503927 7f6a9d4e1700 20 filestore(/var/lib/ceph/osd/ceph-1) fgetattrs 45 getting 'snapset' 2013-02-20 21:43:56.503976 7f6a9d4e1700 15 filestore(/var/lib/ceph/osd/ceph-1) setattrs 0.7_head/d30883e7/plana605797-98/13c//0 2013-02-20 21:43:56.504059 7f6a9d4e1700 10 filestore hoid: d30883e7/plana605797-98/13c//0 skipping op, *spos 13278.1.0 <= header.spos 13278.1.0 2013-02-20 21:43:56.504100 7f6a9d4e1700 10 filestore hoid: d30883e7/plana605797-98/13c//0 skipping op, *spos 13278.1.0 <= header.spos 13278.1.0 2013-02-20 21:43:56.504129 7f6a9d4e1700 10 filestore(/var/lib/ceph/osd/ceph-1) setattrs 0.7_head/d30883e7/plana605797-98/13c//0 = 31 2013-02-20 21:43:56.504135 7f6a9d4e1700 10 filestore(/var/lib/ceph/osd/ceph-1) _set_replay_guard 13278.1.0 2013-02-20 21:43:56.573789 7f6a9dce2700 20 filestore dbobjectmap: seq is 1993
note that this is the first time this object appears on this particular OSD.
Updated by Sage Weil about 11 years ago
from gdb:
(gdb) p oi $5 = { soid = { oid = { name = { static npos = <optimized out>, _M_dataplus = { <std::allocator<char>> = { <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider: _M_p = 0x2c4d738 "plana605797-98" } } }, snap = { val = 18446744073709551614 }, hash = 3540550631, max = false, pool = 0, nspace = { static npos = <optimized out>, _M_dataplus = { <std::allocator<char>> = { <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider: _M_p = 0xc4ed98 "" } }, key = { static npos = <optimized out>, _M_dataplus = { <std::allocator<char>> = { <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider: _M_p = 0xc4ed98 "" } } }, oloc = { pool = 0, key = { static npos = <optimized out>, _M_dataplus = { <std::allocator<char>> = { <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider: _M_p = 0xc4ed98 "" } } }, category = { static npos = <optimized out>, _M_dataplus = { ---Type <return> to continue, or q <return> to quit--- <std::allocator<char>> = { <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider: _M_p = 0xc4ed98 "" } }, version = { version = 9, epoch = 7, __pad = 0 }, prior_version = { version = 8, epoch = 7, __pad = 0 }, user_version = { version = 9, epoch = 7, __pad = 0 }, last_reqid = { name = { _type = 8 '\b', _num = 4143, static TYPE_MON = 1, static TYPE_MDS = 2, static TYPE_OSD = 4, static TYPE_CLIENT = 8, static NEW = -1 }, tid = 262, inc = 0 }, size = 3204735, mtime = { tv = { tv_sec = 1361424682, tv_nsec = 196749000 } }, lost = false, wrlock_by = { name = { _type = 0 '\000', _num = 0, static TYPE_MON = 1, static TYPE_MDS = 2, static TYPE_OSD = 4, static TYPE_CLIENT = 8, static NEW = -1 }, tid = 0, inc = 0 }, snaps = { <std::_Vector_base<snapid_t, std::allocator<snapid_t> >> = { _M_impl = { <std::allocator<snapid_t>> = { ---Type <return> to continue, or q <return> to quit--- <__gnu_cxx::new_allocator<snapid_t>> = {<No data fields>}, <No data fields>}, members of std::_Vector_base<snapid_t, std::allocator<snapid_t> >::_Vector_impl: _M_start = 0x0, _M_finish = 0x0, _M_end_of_storage = 0x0 } }, <No data fields>}, truncate_seq = 0, truncate_size = 0, watchers = { _M_t = { _M_impl = { <std::allocator<std::_Rb_tree_node<std::pair<std::pair<unsigned long, entity_name_t> const, watch_info_t> > >> = { <__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::pair<unsigned long, entity_name_t> const, watch_info_t> > >> = {<No data fields>}, <No data fields>}, members of std::_Rb_tree<std::pair<unsigned long, entity_name_t>, std::pair<std::pair<unsigned long, entity_name_t> const, watch_info_t>, std::_Select1st<std::pair<std::pair<unsigned long, entity_name_t> const, watch_info_t> >, std::less<std::pair<unsigned long, entity_name_t> >, std::allocator<std::pair<std::pair<unsigned long, entity_name_t> const, watch_info_t> > >::_Rb_tree_impl<std::less<std::pair<unsigned long, entity_name_t> >, false>: _M_key_compare = { <std::binary_function<std::pair<unsigned long, entity_name_t>, std::pair<unsigned long, entity_name_t>, bool>> = {<No data fields>}, <No data fields>}, _M_header = { _M_color = std::_S_red, _M_parent = 0x0, _M_left = 0x7f5989ad1930, _M_right = 0x7f5989ad1930 }, _M_node_count = 0 } } }, uses_tmap = false } (gdb) (gdb) p oi.snaps $6 = { <std::_Vector_base<snapid_t, std::allocator<snapid_t> >> = { _M_impl = { <std::allocator<snapid_t>> = { <__gnu_cxx::new_allocator<snapid_t>> = {<No data fields>}, <No data fields>}, members of std::_Vector_base<snapid_t, std::allocator<snapid_t> >::_Vector_impl: _M_start = 0x0, _M_finish = 0x0, _M_end_of_storage = 0x0 } }, <No data fields>}
but not that when the object was written originally,
2013-02-20 21:43:56.051200 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] do_osd_op d30883e7/plana605797-98/head//0 [rollback fb] 2013-02-20 21:43:56.051222 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] do_osd_op rollback fb 2013-02-20 21:43:56.051243 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] _rollback_to d30883e7/plana605797-98/head//0 snapid fb 2013-02-20 21:43:56.051265 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] get_snapset_context plana605797-98 1 -> 2 2013-02-20 21:43:56.051285 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] find_object_context d30883e7/plana605797-98/fb//0 @fb snap set 0=[]:[]+head 2013-02-20 21:43:56.051309 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] get_object_context 0x503c780 d30883e7/plana605797-98/head/ /0 1 -> 2 2013-02-20 21:43:56.051331 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] find_object_context d30883e7/plana605797-98/head//0 want fb > snapset seq 0 -- HIT d30883e7/plana605797-98/head//0 2013-02-20 21:43:56.051354 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] put_snapset_context plana605797-98 2 -> 1 2013-02-20 21:43:56.051376 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] put_object_context 0x503c780 d30883e7/plana605797-98/head/ /0 2 -> 1 2013-02-20 21:43:56.051398 7f6a964d3700 15 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] do_osd_op_effects on session 0x2c0a0e0 2013-02-20 21:43:56.051419 7f6a964d3700 20 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] make_writeable d30883e7/plana605797-98/head//0 snapset=0x4 0034e0 snapc=13c=[13c,134,133,131,130,12b,129,127,126,120,11f,11e,11b,113,112,10a,105,102,100,fb,f7,f3,d8,ae,96] 2013-02-20 21:43:56.051523 7f6a964d3700 20 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] write_info bigbl 428 2013-02-20 21:43:56.051550 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] create_snap_collection 0.7_13c, set now [96~1,ae~1,d8~1,f3 ~1,113~1,120~1,13c~1] 2013-02-20 21:43:56.051587 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] cloning v 7'9 to d30883e7/plana605797-98/13c//0 v 411'135 snaps=[13c,134,133,131,130,12b,129,127,126,120,11f,11e,11b,113,112,10a,105,102,100,fb,f7,f3,d8,ae,96] 2013-02-20 21:43:56.051624 7f6a964d3700 20 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] make_writeable d30883e7/plana605797-98/head//0 done, snaps et=13c=[13c,134,133,131,130,12b,129,127,126,120,11f,11e,11b,113,112,10a,105,102,100,fb,f7,f3,d8,ae,96]:[13c]+head 2013-02-20 21:43:56.051659 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] set mtime to 2013-02-20 21:43:56.051131 2013-02-20 21:43:56.051693 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=15 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] final snapset 13c=[13c,134,133,131,130,12b,129,127,126,12 0,11f,11e,11b,113,112,10a,105,102,100,fb,f7,f3,d8,ae,96]:[13c]+head in d30883e7/plana605797-98/head//0 2013-02-20 21:43:56.051733 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 406'134 (0'0,406'134] local-les=317 n=16 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 lcod 405'132 mlcod 405'132 active+clean] append_log log(0'0,406'134] [411'135 (7'9) clone d30883 e7/plana605797-98/13c//0 by client.4143.0:3045 2013-02-20 21:31:22.196749 snaps [13c,134,133,131,130,12b,129,127,126,120,11f,11e,11b,113,112,10a,105,102,100,fb,f7,f3,d8,ae,96],411'136 (7'9) modify d30883e7/plana605797-98/head//0 by client.4143.0:3045 2013-02-20 21:43:56.051131] 2013-02-20 21:43:56.051797 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 411'135 (0'0,411'135] local-les=317 n=16 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 luod=406'134 lcod 405'132 mlcod 405'132 active+clean] add_log_entry 411'135 (7'9) clone d30883e7 /plana605797-98/13c//0 by client.4143.0:3045 2013-02-20 21:31:22.196749 snaps [13c,134,133,131,130,12b,129,127,126,120,11f,11e,11b,113,112,10a,105,102,100,fb,f7,f3,d8,ae,96] 2013-02-20 21:43:56.051861 7f6a964d3700 10 osd.1 pg_epoch: 411 pg[0.7( v 411'136 (0'0,411'136] local-les=317 n=16 ec=1 les/c 317/322 316/316/316) [1,4] r=0 lpr=316 luod=406'134 lcod 405'132 mlcod 405'132 active+clean] add_log_entry 411'136 (7'9) modify d30883e7 /plana605797-98/head//0 by client.4143.0:3045 2013-02-20 21:43:56.051131
note that the version is 411'135 there, and the prior version is 7'9.. which is what is in the oi xattr. basically, the rollback in DBObjectMap seems to have not worked.
Updated by Sage Weil about 11 years ago
root@plana46:~# gdb /usr/bin/ceph-osd /tmp/1361425499.5801.core
if you want to look at hte core
Updated by Samuel Just about 11 years ago
A bit below continuing on the same thread (7f6a9d4e1700):
2013-02-20 21:43:56.597514 7f6a9d4e1700 20 filestore dbobjectmap: seq is 1993
2013-02-20 21:43:56.597596 7f6a9e4e3700 5 filestore(/var/lib/ceph/osd/ceph-1) _journaled_ahead 0x5c23280 seq 13281 osr(default 0x28dc7d0) 0x4c19750
2013-02-20 21:43:56.597608 7f6a9d4e1700 10 filestore hoid: d30883e7/plana605797-98/13c//0 setting spos to 13278.1.0
2013-02-20 21:43:56.597616 7f6a9e4e3700 5 filestore(/var/lib/ceph/osd/ceph-1) queue_op 0x5c23280 seq 13281 osr(default 0x28dc7d0) 5304 bytes (queue has 28 ops and 31507 bytes)
2013-02-20 21:43:56.597617 7f6a9d4e1700 20 filestore set_map_header: setting 1992 hoid d30883e7/plana605797-98/13c//0 parent seq 1594
Above, we set the spos to 13278.1.0 at the end of the clone (for make_writeable). 13278.1.1 would have set the object_info attr for the clone. However, this is the last we hear from this thread before the OSD restarts. Note, we have _journaled_ahead up to seq 13281.
After the replay:
2013-02-20 21:44:53.074062 7f599b887780 10 filestore(/var/lib/ceph/osd/ceph-1) _do_transaction on 0x296cbd0
2013-02-20 21:44:53.074067 7f599b887780 15 filestore(/var/lib/ceph/osd/ceph-1) _omap_setkeys meta/16ef7597/infos/head//-1
2013-02-20 21:44:53.074136 7f599b887780 10 filestore hoid: 16ef7597/infos/head//-1 not skipping op, *spos 13263.0.0
2013-02-20 21:44:53.074145 7f599b887780 10 filestore > header.spos 0.0.0
2013-02-20 21:44:53.074248 7f599b887780 1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 22: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-02-20 21:44:53.074610 7f5994ae9700 20 filestore(/var/lib/ceph/osd/ceph-1) flusher_entry start
2013-02-20 21:44:53.074631 7f5994ae9700 20 filestore(/var/lib/ceph/osd/ceph-1) flusher_entry sleeping
2013-02-20 21:44:53.074721 7f599b887780 5 filestore(/var/lib/ceph/osd/ceph-1) umount /var/lib/ceph/osd/ceph-1
2013-02-20 21:44:53.074757 7f5997aef700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry force_sync set
2013-02-20 21:44:53.074808 7f5997aef700 15 filestore(/var/lib/ceph/osd/ceph-1) sync_entry committing 13263 sync_epoch 1
2013-02-20 21:44:53.074805 7f5994ae9700 20 filestore(/var/lib/ceph/osd/ceph-1) flusher_entry awoke
2013-02-20 21:44:53.074814 7f5994ae9700 20 filestore(/var/lib/ceph/osd/ceph-1) flusher_entry finish
2013-02-20 21:44:53.074828 7f5997aef700 15 filestore(/var/lib/ceph/osd/ceph-1) sync_entry doing a full sync (syncfs(2) if possible)
2013-02-20 21:44:53.180041 7f5997aef700 10 filestore(/var/lib/ceph/osd/ceph-1) sync_entry commit took 0.105250, interval was 0.163580
2013-02-20 21:44:53.180076 7f5997aef700 15 filestore(/var/lib/ceph/osd/ceph-1) sync_entry committed to op_seq 13263
2013-02-20 21:44:53.180519 7f599b887780 1 journal close /var/lib/ceph/osd/ceph-1/journal
we have replayed up to 13263.0.0 and stopped rather than reading up to 13281. The rest of the make_writeable transaction in 13278 including the object_info attr set did not get replayed. Still looking.