Bug #1028
closed
segfault in OSDMap::object_locator_to_pg
Description
As reported yesterday on IRC, this is a crash I get when starting an OSD.
This is at v0.27.
From the logs:
2011-04-26 19:33:53.105243 7f7ff576c700 osd0 2055 pg[3.9( v 1956'286201 (1956'286196,1956'286201]+backlog n=5 ec=2 les=2047 2051/2054/2051) [0,1] r=0 lcod 0'0 mlcod 0'0 active] oi.user_version=0'0 is_modify=1
2011-04-26 19:33:53.105300 7f7ff576c700 osd0 2055 pg[3.9( v 1956'286201 (1956'286196,1956'286201]+backlog n=5 ec=2 les=2047 2051/2054/2051) [0,1] r=0 lcod 0'0 mlcod 0'0 active] watch: ctx->obc=0x1c0b480 cookie=1 oi.version=1 ctx->at_version=2055'286202
2011-04-26 19:33:53.105315 7f7ff576c700 osd0 2055 pg[3.9( v 1956'286201 (1956'286196,1956'286201]+backlog n=5 ec=2 les=2047 2051/2054/2051) [0,1] r=0 lcod 0'0 mlcod 0'0 active] watch: oi.user_version=0
*** Caught signal (Segmentation fault) **
 in thread 0x7f7ff576c700
 ceph version (commit:)
 1: /usr/bin/cosd() [0x642279]
 2: (()+0xfc60) [0x7f800273fc60]
 3: (OSDMap::object_locator_to_pg(object_t const&, object_locator_t const&)+0x72) [0x4d6a52]
 4: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&, ceph::buffer::list&)+0x8207) [0x4c5637]
 5: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x68) [0x4c6278]
 6: (ReplicatedPG::do_op(MOSDOp*)+0x97f) [0x4c7c3f]
 7: (OSD::dequeue_op(PG*)+0x36d) [0x51050d]
 8: (ThreadPool::worker()+0x2a2) [0x626fa2]
 9: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
 10: (()+0x6d8c) [0x7f8002736d8c]
 11: (clone()+0x6d) [0x7f800138404d]
What GDB has to say:
#0 0x00007f104e54db3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f104e54db3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x0000000000641a12 in reraise_fatal (signum=11) at common/signal.cc:63
#2 0x000000000064248c in handle_fatal_signal (signum=11) at common/signal.cc:110
#3 <signal handler called>
#4 0x00000000004d6a52 in OSDMap::object_locator_to_pg (this=0x1cb7900, oid=..., loc=...) at osd/OSDMap.h:748
#5 0x00000000004c5637 in ReplicatedPG::do_osd_ops (this=0x2260000, ctx=0x467d678, ops=..., odata=...) at osd/ReplicatedPG.cc:1617
#6 0x00000000004c6278 in ReplicatedPG::prepare_transaction (this=0x2260000, ctx=0x3c81b00) at osd/ReplicatedPG.cc:2240
#7 0x00000000004c7c3f in ReplicatedPG::do_op (this=0x2260000, op=0x4711000) at osd/ReplicatedPG.cc:501
#8 0x000000000051050d in OSD::dequeue_op (this=0x1ca7000, pg=0x2260000) at osd/OSD.cc:5437
#9 0x0000000000626fa2 in ThreadPool::worker (this=0x1ca73f0) at common/WorkQueue.cc:44
#10 0x0000000000529f1d in ThreadPool::WorkThread::entry (this=<value optimized out>) at ./common/WorkQueue.h:113
#11 0x00007f104e544d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007f104d19204d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x0000000000000000 in ?? ()
(gdb) f 4
#4 0x00000000004d6a52 in OSDMap::object_locator_to_pg (this=0x1cb7900, oid=..., loc=...) at osd/OSDMap.h:748
748 osd/OSDMap.h: No such file or directory.
 in osd/OSDMap.h
(gdb) p oid
$1 = (const object_t &) @0x2e0a248: {name = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x467d678 "munin.rbd"}}}
(gdb) p loc
$2 = (const object_locator_t &) @0x2e0a258: {pool = -1, preferred = -1, key = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x8e4778 ""}}}
(gdb) f 7
#7 0x00000000004c7c3f in ReplicatedPG::do_op (this=0x2260000, op=0x4711000) at osd/ReplicatedPG.cc:501
501 osd/ReplicatedPG.cc: No such file or directory.
 in osd/ReplicatedPG.cc
(gdb) p op->oloc
$3 = {pool = 3, preferred = -1, key = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x8e4778 ""}}}
Files
Updated by Sage Weil almost 13 years ago
- Category set to OSD
- Target version set to v0.27.1
Added some debug checks in the code to track this one down: 85292b367b0e6e6d8963de32ad198482500c887f
Updated by Sage Weil almost 13 years ago
- Status changed from New to In Progress
Updated by ar Fred almost 13 years ago
- File osd.0.log.gz added
- File osd.1.log.gz added
- File osd.2.log.gz added
I cherry-picked 85292b367b0e6e6d8963de32ad198482500c887f into the stable branch; here are the logs. I kept the core files, so do not hesitate to ask if you need more data from gdb!
Thanks!
Updated by Sage Weil almost 13 years ago
The problem is that the locator stored in the object_info_t on disk is wrong. Can you say anything about when the objects were written? Is this a really old file system that got upgraded, by any chance?
This should get you up and running:
diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index 95473f4..15dd2a5 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -2726,6 +2726,13 @@ ReplicatedPG::ObjectContext *ReplicatedPG::get_object_context(const sobject_t& s
   } else {
     object_info_t oi(bv);
+
+    // if the on-disk oloc is bad/undefined, set up the pool value
+    if (oi.oloc.get_pool() < 0) {
+      oi.oloc.pool = info.pgid.pool();
+      oi.oloc.preferred = info.pgid.preferred();
+    }
+
     SnapSetContext *ssc = NULL;
     if (can_create)
       ssc = get_snapset_context(soid.oid, true);
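For anyone reading along, the fallback in the patch can be sketched in isolation as below. This is only an illustration under stated assumptions: `ObjectLocator`, `PgId`, and `repair_locator` are hypothetical, simplified stand-ins and not the real Ceph types or functions. The values mirror the gdb output above, where the on-disk locator has pool = -1 while the op's locator has pool = 3.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for object_locator_t (not the Ceph definition).
struct ObjectLocator {
  int64_t pool = -1;       // -1 means "never set", as seen in `p loc` above
  int32_t preferred = -1;
  std::string key;
  int64_t get_pool() const { return pool; }
};

// Hypothetical, simplified stand-in for the PG id that owns the object.
struct PgId {
  int64_t pool_id;
  int32_t preferred_osd;
  int64_t pool() const { return pool_id; }
  int32_t preferred() const { return preferred_osd; }
};

// If the locator loaded from disk has no valid pool, fill it in from the
// PG the object lives in. Returns true when a repair was made.
bool repair_locator(ObjectLocator& oloc, const PgId& pgid) {
  if (oloc.get_pool() >= 0)
    return false;  // locator already valid, leave it untouched
  oloc.pool = pgid.pool();
  oloc.preferred = pgid.preferred();
  assert(oloc.pool == pgid.pool());
  return true;
}
```

Calling `repair_locator` on a default-constructed `ObjectLocator` with `PgId{3, -1}` sets its pool to 3; a locator that already has a non-negative pool is left as it is.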
Updated by ar Fred almost 13 years ago
Thank you for the patch, compiling right now.
This is indeed an old FS that got created approximately a year ago, and upgraded on a regular basis since that time!
Updated by Sage Weil almost 13 years ago
- Status changed from In Progress to Resolved