Bug #10119

closed

0.88 EC+ KV OSDs crashing

Added by Kenneth Waegeman over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,
I am testing the EC + KV setup further, and the OSDs crashed again, so I updated ticket #9727.
The OSDs initially crashed without logging any information. I tried restarting them a few times, but they crash again immediately, now with this error:

  -26> 2014-11-17 15:23:00.614802 7fca3492f700  1 -- 10.141.8.181:6825/1741 <== osd.5 10.141.8.180:0/33493 96 ==== osd_ping(ping e205 stamp 2014-11-17 15:23:00.614191) v2 ==== 47+0+0 (4154056460 0 0) 0x51850a0 con 0x5ff5280
   -25> 2014-11-17 15:23:00.614833 7fca3492f700  1 -- 10.141.8.181:6825/1741 --> 10.141.8.180:0/33493 -- osd_ping(ping_reply e205 stamp 2014-11-17 15:23:00.614191) v2 -- ?+0 0xa6185a0 con 0x5ff5280
   -24> 2014-11-17 15:23:00.614860 7fca3312c700  1 -- 10.143.8.181:6825/1741 <== osd.5 10.141.8.180:0/33493 96 ==== osd_ping(ping e205 stamp 2014-11-17 15:23:00.614191) v2 ==== 47+0+0 (4154056460 0 0) 0x10779fe0 con 0x5ff4fc0
   -23> 2014-11-17 15:23:00.614877 7fca3312c700  1 -- 10.143.8.181:6825/1741 --> 10.141.8.180:0/33493 -- osd_ping(ping_reply e205 stamp 2014-11-17 15:23:00.614191) v2 -- ?+0 0xa8b9680 con 0x5ff4fc0
   -22> 2014-11-17 15:23:00.850437 7fca3492f700  1 -- 10.141.8.181:6825/1741 <== osd.44 10.143.8.182:0/35567 108 ==== osd_ping(ping e205 stamp 2014-11-17 15:23:00.849840) v2 ==== 47+0+0 (3661645159 0 0) 0xa8205a0 con 0x63aa3c0
   -21> 2014-11-17 15:23:00.850471 7fca3492f700  1 -- 10.141.8.181:6825/1741 --> 10.143.8.182:0/35567 -- osd_ping(ping_reply e205 stamp 2014-11-17 15:23:00.849840) v2 -- ?+0 0x51850a0 con 0x63aa3c0
   -20> 2014-11-17 15:23:00.850485 7fca3312c700  1 -- 10.143.8.181:6825/1741 <== osd.44 10.143.8.182:0/35567 108 ==== osd_ping(ping e205 stamp 2014-11-17 15:23:00.849840) v2 ==== 47+0+0 (3661645159 0 0) 0x5184ce0 con 0x5e95ac0
   -19> 2014-11-17 15:23:00.850503 7fca3312c700  1 -- 10.143.8.181:6825/1741 --> 10.143.8.182:0/35567 -- osd_ping(ping_reply e205 stamp 2014-11-17 15:23:00.849840) v2 -- ?+0 0x10779fe0 con 0x5e95ac0
   -18> 2014-11-17 15:23:00.926742 7fca3312c700  1 -- 10.143.8.181:6825/1741 <== osd.28 10.143.8.181:0/60984 106 ==== osd_ping(ping e205 stamp 2014-11-17 15:23:00.926191) v2 ==== 47+0+0 (377939256 0 0) 0xa6c2b20 con 0x63b6ca0
   -17> 2014-11-17 15:23:00.926765 7fca3312c700  1 -- 10.143.8.181:6825/1741 --> 10.143.8.181:0/60984 -- osd_ping(ping_reply e205 stamp 2014-11-17 15:23:00.926191) v2 -- ?+0 0x5184ce0 con 0x63b6ca0
   -16> 2014-11-17 15:23:00.926861 7fca3492f700  1 -- 10.141.8.181:6825/1741 <== osd.28 10.143.8.181:0/60984 106 ==== osd_ping(ping e205 stamp 2014-11-17 15:23:00.926191) v2 ==== 47+0+0 (377939256 0 0) 0xa42c740 con 0x63b7640
   -15> 2014-11-17 15:23:00.926878 7fca3492f700  1 -- 10.141.8.181:6825/1741 --> 10.143.8.181:0/60984 -- osd_ping(ping_reply e205 stamp 2014-11-17 15:23:00.926191) v2 -- ?+0 0xa8205a0 con 0x63b7640
   -14> 2014-11-17 15:23:01.081963 7fca3312c700  1 -- 10.143.8.181:6825/1741 <== osd.40 10.141.8.182:0/32395 108 ==== osd_ping(ping e205 stamp 2014-11-17 15:23:01.081429) v2 ==== 47+0+0 (2256845560 0 0) 0xa8b4560 con 0x63b6720
   -13> 2014-11-17 15:23:01.081994 7fca3312c700  1 -- 10.143.8.181:6825/1741 --> 10.141.8.182:0/32395 -- osd_ping(ping_reply e205 stamp 2014-11-17 15:23:01.081429) v2 -- ?+0 0xa6c2b20 con 0x63b6720
   -12> 2014-11-17 15:23:01.082029 7fca3492f700  1 -- 10.141.8.181:6825/1741 <== osd.40 10.141.8.182:0/32395 108 ==== osd_ping(ping e205 stamp 2014-11-17 15:23:01.081429) v2 ==== 47+0+0 (2256845560 0 0) 0xa61e180 con 0x64e2c00
   -11> 2014-11-17 15:23:01.082055 7fca3492f700  1 -- 10.141.8.181:6825/1741 --> 10.141.8.182:0/32395 -- osd_ping(ping_reply e205 stamp 2014-11-17 15:23:01.081429) v2 -- ?+0 0xa42c740 con 0x64e2c00
   -10> 2014-11-17 15:23:01.124234 7fca1f87c700  1 -- 10.143.8.181:6824/1741 <== osd.29 10.143.8.181:6818/1234 117 ==== osd_sub_op(unknown.0.0:0 2.7ds4 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[]) v11 ==== 1202+0+0 (724286469 0 0) 0xd5eac00 con 0x5ff35a0
    -9> 2014-11-17 15:23:01.124258 7fca1f87c700  5 -- op tracker -- seq: 1721, time: 2014-11-17 15:23:01.124154, event: header_read, op: osd_sub_op(unknown.0.0:0 2.7ds4 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[])
    -8> 2014-11-17 15:23:01.124265 7fca1f87c700  5 -- op tracker -- seq: 1721, time: 2014-11-17 15:23:01.124156, event: throttled, op: osd_sub_op(unknown.0.0:0 2.7ds4 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[])
    -7> 2014-11-17 15:23:01.124272 7fca1f87c700  5 -- op tracker -- seq: 1721, time: 2014-11-17 15:23:01.124228, event: all_read, op: osd_sub_op(unknown.0.0:0 2.7ds4 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[])
    -6> 2014-11-17 15:23:01.124284 7fca1f87c700  5 -- op tracker -- seq: 1721, time: 0.000000, event: dispatched, op: osd_sub_op(unknown.0.0:0 2.7ds4 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[])
    -5> 2014-11-17 15:23:01.124323 7fca2c11e700  5 -- op tracker -- seq: 1721, time: 2014-11-17 15:23:01.124323, event: reached_pg, op: osd_sub_op(unknown.0.0:0 2.7ds4 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[])
    -4> 2014-11-17 15:23:01.124333 7fca2c11e700  5 -- op tracker -- seq: 1721, time: 2014-11-17 15:23:01.124333, event: started, op: osd_sub_op(unknown.0.0:0 2.7ds4 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[])
    -3> 2014-11-17 15:23:01.124342 7fca2c11e700  1 -- 10.143.8.181:6824/1741 --> 10.143.8.181:6818/1234 -- osd_sub_op_reply(unknown.0.0:0 2.7ds0 0//0//-1 [scrub-reserve] ack, result = 0) v2 -- ?+1 0x8661080 con 0x5ff35a0
    -2> 2014-11-17 15:23:01.124356 7fca2c11e700  5 -- op tracker -- seq: 1721, time: 2014-11-17 15:23:01.124356, event: done, op: osd_sub_op(unknown.0.0:0 2.7ds4 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[])
    -1> 2014-11-17 15:23:01.125415 7fca37134700  1 -- 10.143.8.181:6824/1741 <== osd.29 10.143.8.181:6818/1234 118 ==== replica scrub(pg: 2.7ds4,from:0'0,to:116'42,epoch:205,start:0//0//-1,end:3cb6e67d//0//-1,chunky:1,deep:0,version:5) v5 ==== 126+0+0 (2329518172 0 0) 0x95dd780 con 0x5ff35a0
     0> 2014-11-17 15:23:01.128900 7fca29118700 -1 os/GenericObjectMap.cc: In function 'int GenericObjectMap::list_objects(const coll_t&, ghobject_t, int, std::vector<ghobject_t>*, ghobject_t*)' thread 7fca29118700 time 2014-11-17 15:23:01.125860
os/GenericObjectMap.cc: 1098: FAILED assert(start <= header.oid)

 ceph version 0.88 (4be687bf4480474117f56c387febc75c904036be)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xb8b095]
 2: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0xa60384]
 3: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x923624]
 4: (KeyValueStore::collection_list_range(coll_t, ghobject_t, ghobject_t, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*)+0x164) [0x947d24]
 5: (PGBackend::objects_list_range(hobject_t const&, hobject_t const&, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, std::vector<ghobject_t, std::allocator<ghobject_t> >*)+0x106) [0x8c12c6]
 6: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x268) [0x7c1f18]
 7: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x502) [0x7c2832]
 8: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xda) [0x6c804a]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa66) [0xb7bdd6]
 10: (ThreadPool::WorkThread::entry()+0x10) [0xb7ce60]
 11: (()+0x7df3) [0x7fca49a87df3]
 12: (clone()+0x6d) [0x7fca4854e01d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -22> 2014-11-17 15:23:01.282625 7fca24606700  2 -- 10.143.8.181:6824/1741 >> 10.143.8.180:6816/36202 pipe(0x5d25800 sd=23 :46066 s=2 pgs=375 cs=1 l=0 c=0x5d3ad60).reader couldn't read tag, (11) Resource temporarily unavailable
   -21> 2014-11-17 15:23:01.282681 7fca24606700  2 -- 10.143.8.181:6824/1741 >> 10.143.8.180:6816/36202 pipe(0x5d25800 sd=23 :46066 s=2 pgs=375 cs=1 l=0 c=0x5d3ad60).fault (11) Resource temporarily unavailable
   -20> 2014-11-17 15:23:01.282697 7fca24606700  0 -- 10.143.8.181:6824/1741 >> 10.143.8.180:6816/36202 pipe(0x5d25800 sd=23 :46066 s=2 pgs=375 cs=1 l=0 c=0x5d3ad60).fault with nothing to send, going to standby
   -19> 2014-11-17 15:23:01.282747 7fca1ea6e700  2 -- 10.143.8.181:0/1741 >> 10.141.8.180:6817/36202 pipe(0x6232000 sd=66 :36760 s=2 pgs=67 cs=1 l=1 c=0x5ff7220).reader couldn't read tag, (0) Success
   -18> 2014-11-17 15:23:01.282788 7fca1ea6e700  2 -- 10.143.8.181:0/1741 >> 10.141.8.180:6817/36202 pipe(0x6232000 sd=66 :36760 s=2 pgs=67 cs=1 l=1 c=0x5ff7220).fault (0) Success
   -17> 2014-11-17 15:23:01.282771 7fca1e96d700  2 -- 10.143.8.181:0/1741 >> 10.143.8.180:6817/36202 pipe(0x6220000 sd=65 :45986 s=2 pgs=67 cs=1 l=1 c=0x5ff6880).reader couldn't read tag, (0) Success
   -16> 2014-11-17 15:23:01.282808 7fca1e96d700  2 -- 10.143.8.181:0/1741 >> 10.143.8.180:6817/36202 pipe(0x6220000 sd=65 :45986 s=2 pgs=67 cs=1 l=1 c=0x5ff6880).fault (0) Success
   -15> 2014-11-17 15:23:01.282828 7fca35931700  1 -- 10.143.8.181:0/1741 mark_down 0x5ff6880 -- 0x6220000
   -14> 2014-11-17 15:23:01.282854 7fca39e99700  2 -- 10.143.8.181:6825/1741 >> 10.143.8.180:0/36202 pipe(0x569c800 sd=119 :6825 s=2 pgs=475 cs=1 l=1 c=0x63b3c80).reader couldn't read tag, (0) Success
   -13> 2014-11-17 15:23:01.282875 7fca39e99700  2 -- 10.143.8.181:6825/1741 >> 10.143.8.180:0/36202 pipe(0x569c800 sd=119 :6825 s=2 pgs=475 cs=1 l=1 c=0x63b3c80).fault (0) Success
   -12> 2014-11-17 15:23:01.283024 7fca39d98700  2 -- 10.141.8.181:6825/1741 >> 10.143.8.180:0/36202 pipe(0x6475800 sd=118 :6825 s=2 pgs=476 cs=1 l=1 c=0x63a61a0).reader couldn't read tag, (0) Success
   -11> 2014-11-17 15:23:01.283046 7fca39d98700  2 -- 10.141.8.181:6825/1741 >> 10.143.8.180:0/36202 pipe(0x6475800 sd=118 :6825 s=2 pgs=476 cs=1 l=1 c=0x63a61a0).fault (0) Success
   -10> 2014-11-17 15:23:01.283535 7fca39c97700  2 -- 10.143.8.181:0/1741 >> 10.141.8.180:6817/36202 pipe(0x6220000 sd=66 :0 s=1 pgs=0 cs=0 l=1 c=0x5d3f7a0).connect error 10.141.8.180:6817/36202, (111) Connection refused
    -9> 2014-11-17 15:23:01.283619 7fca39d98700  2 -- 10.143.8.181:0/1741 >> 10.143.8.180:6817/36202 pipe(0x6475800 sd=65 :0 s=1 pgs=0 cs=0 l=1 c=0x5d3f4e0).connect error 10.143.8.180:6817/36202, (111) Connection refused
    -8> 2014-11-17 15:23:01.283656 7fca39d98700  2 -- 10.143.8.181:0/1741 >> 10.143.8.180:6817/36202 pipe(0x6475800 sd=65 :0 s=1 pgs=0 cs=0 l=1 c=0x5d3f4e0).fault (111) Connection refused
    -7> 2014-11-17 15:23:01.283685 7fca39d98700  0 -- 10.143.8.181:0/1741 >> 10.143.8.180:6817/36202 pipe(0x6475800 sd=65 :0 s=1 pgs=0 cs=0 l=1 c=0x5d3f4e0).fault
    -6> 2014-11-17 15:23:01.283751 7fca39c97700  2 -- 10.143.8.181:0/1741 >> 10.141.8.180:6817/36202 pipe(0x6220000 sd=66 :0 s=1 pgs=0 cs=0 l=1 c=0x5d3f7a0).fault (111) Connection refused
    -5> 2014-11-17 15:23:01.283777 7fca39c97700  0 -- 10.143.8.181:0/1741 >> 10.141.8.180:6817/36202 pipe(0x6220000 sd=66 :0 s=1 pgs=0 cs=0 l=1 c=0x5d3f7a0).fault
    -4> 2014-11-17 15:23:01.283833 7fca39d98700  2 -- 10.143.8.181:0/1741 >> 10.143.8.180:6817/36202 pipe(0x6475800 sd=65 :0 s=1 pgs=0 cs=0 l=1 c=0x5d3f4e0).connect error 10.143.8.180:6817/36202, (111) Connection refused
    -3> 2014-11-17 15:23:01.284116 7fca39d98700  2 -- 10.143.8.181:0/1741 >> 10.143.8.180:6817/36202 pipe(0x6475800 sd=65 :0 s=1 pgs=0 cs=0 l=1 c=0x5d3f4e0).fault (111) Connection refused
    -2> 2014-11-17 15:23:01.284135 7fca39c97700  2 -- 10.143.8.181:0/1741 >> 10.141.8.180:6817/36202 pipe(0x6220000 sd=66 :0 s=1 pgs=0 cs=0 l=1 c=0x5d3f7a0).connect error 10.141.8.180:6817/36202, (111) Connection refused
    -1> 2014-11-17 15:23:01.284168 7fca39c97700  2 -- 10.143.8.181:0/1741 >> 10.141.8.180:6817/36202 pipe(0x6220000 sd=66 :0 s=1 pgs=0 cs=0 l=1 c=0x5d3f7a0).fault (111) Connection refused
     0> 2014-11-17 15:23:01.290314 7fca29118700 -1 *** Caught signal (Aborted) **
 in thread 7fca29118700

 ceph version 0.88 (4be687bf4480474117f56c387febc75c904036be)
 1: /usr/bin/ceph-osd() [0xa97bb2]
 2: (()+0xf130) [0x7fca49a8f130]
 3: (gsignal()+0x39) [0x7fca4848d5c9]
 4: (abort()+0x148) [0x7fca4848ecd8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fca48da09d5]
 6: (()+0x5e946) [0x7fca48d9e946]
 7: (()+0x5e973) [0x7fca48d9e973]
 8: (()+0x5eb9f) [0x7fca48d9eb9f]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xb8b28a]
 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0xa60384]
 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x923624]
 12: (KeyValueStore::collection_list_range(coll_t, ghobject_t, ghobject_t, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*)+0x164) [0x947d24]
 13: (PGBackend::objects_list_range(hobject_t const&, hobject_t const&, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, std::vector<ghobject_t, std::allocator<ghobject_t> >*)+0x106) [0x8c12c6]
 14: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x268) [0x7c1f18]
 15: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x502) [0x7c2832]
 16: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xda) [0x6c804a]
 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa66) [0xb7bdd6]
 18: (ThreadPool::WorkThread::entry()+0x10) [0xb7ce60]
 19: (()+0x7df3) [0x7fca49a87df3]
 20: (clone()+0x6d) [0x7fca4854e01d]


The OSDs will not start again.

Because this is a different Ceph version and a different error, I opened this new ticket.

Thanks!
Kenneth
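
For readers unfamiliar with the failing check, `assert(start <= header.oid)` says that a range listing must never return an object that sorts before the requested start. The Python sketch below is only an illustration of that invariant over a sorted key list. The function `list_objects`, its parameters, and the sample keys are hypothetical stand-ins, not the Ceph `GenericObjectMap` code, and the sketch makes no claim about what caused the crash in this ticket.

```python
from bisect import bisect_left


def list_objects(sorted_keys, start, max_count):
    """Return up to max_count keys >= start, plus the key to resume from.

    Hypothetical stand-in for a range listing over a key-value store;
    this is not the Ceph implementation.
    """
    begin = bisect_left(sorted_keys, start)  # seek to the first key >= start
    end = begin + max_count
    listed = []
    for key in sorted_keys[begin:end]:
        # Same shape as the failing check: nothing listed may sort before start.
        assert start <= key, f"listed {key!r} before start {start!r}"
        listed.append(key)
    next_start = sorted_keys[end] if end < len(sorted_keys) else None
    return listed, next_start


keys = sorted(["obj.a", "obj.c", "obj.f", "obj.k"])
page, nxt = list_objects(keys, "obj.b", 2)
print(page, nxt)
```

The invariant holds here because the seek and the check use the same ordering. In general, a check of this shape can only fail if the listing yields an entry that the comparison places before `start`.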

Actions #1

Updated by Haomai Wang over 9 years ago

Hmm, that's strange, because I already fixed this bug previously. Maybe this is a different one?

Could you run the crashed OSD again with debug_keyvaluestore=20/20?

Actions #2

Updated by Kenneth Waegeman over 9 years ago

I added the debug_keyvaluestore logging and restarted them. The OSDs started crashing immediately again, but there is no stack trace anymore. I get this on the OSDs:


2014-11-18 11:04:29.882303 7f5210216700  5 submit_transaction r = 0
2014-11-18 11:04:29.882355 7f5210216700 10 _do_op 0x5933e40 seq 10 r = 0, finisher 0x5307d80 0
2014-11-18 11:04:29.882366 7f5210216700 10 _finish_op 0x5933e40 seq 10 osr(2.65s11 0x47805d0)/0x47805d0
2014-11-18 11:04:29.924235 7f5210a17700  5 submit_transaction r = 0
2014-11-18 11:04:29.924402 7f5210a17700 10 _do_op 0x559db80 seq 24 r = 0, finisher 0x5adfa20 0
2014-11-18 11:04:29.924412 7f5210a17700 10 _finish_op 0x559db80 seq 24 osr(2.59s6 0x47805c0)/0x47805c0
2014-11-18 11:04:29.999487 7f51f7832700  0 -- 10.143.8.181:6802/59834 >> 10.143.8.181:6812/59941 pipe(0x53da000 sd=29 :44612 s=2 pgs=193 cs=1 l=0 c=0x5266b40).fault, initiating reconnect
2014-11-18 11:04:29.999655 7f51f7b35700  0 -- 10.143.8.181:6802/59834 >> 10.143.8.181:6812/59941 pipe(0x53da000 sd=29 :44612 s=1 pgs=193 cs=2 l=0 c=0x5266b40).fault
2014-11-18 11:04:30.000113 7f51f2cee700  0 -- 10.141.8.181:0/59834 >> 10.143.8.181:6813/59941 pipe(0x5b1d800 sd=63 :0 s=1 pgs=0 cs=0 l=1 c=0x59da7e0).fault
2014-11-18 11:04:30.000132 7f51f4607700  0 -- 10.141.8.181:0/59834 >> 10.141.8.181:6813/59941 pipe(0x611d800 sd=77 :0 s=1 pgs=0 cs=0 l=1 c=0x5b694a0).fault
2014-11-18 11:04:30.055067 7f51f8a44700  0 -- 10.143.8.181:6802/59834 >> 10.143.8.180:6812/34367 pipe(0x524a000 sd=21 :54585 s=1 pgs=505 cs=2 l=0 c=0x5261600).fault
2014-11-18 11:04:30.055697 7f51fde52700 15 getattr 2.4as0_head/6b75e54a/10000006252.00000000/head//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.055789 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 6b75e54a/10000006252.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.055795 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.055798 7f51fde52700 15 getattr 2.4as0_head/6b75e54a/10000006252.00000000/snapdir//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.055831 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 6b75e54a/10000006252.00000000/snapdir//2/18446744073709551615/0
2014-11-18 11:04:30.055836 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.055909 7f5200657700 15 getattr 2.4as0_head/6b75e54a/10000006252.00000000/head//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.056002 7f5200657700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 6b75e54a/10000006252.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.056009 7f5200657700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.056074 7f51fde52700 15 getattr 2.4as0_head/6d4954a/100000062eb.00000000/head//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.056129 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 6d4954a/100000062eb.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.056135 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.056139 7f51fde52700 15 getattr 2.4as0_head/6d4954a/100000062eb.00000000/snapdir//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.056175 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 6d4954a/100000062eb.00000000/snapdir//2/18446744073709551615/0
2014-11-18 11:04:30.056180 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.056701 7f5200657700 15 getattr 2.4as0_head/34b2f64a/100000061a1.00000000/head//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.056747 7f5200657700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 34b2f64a/100000061a1.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.056752 7f5200657700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.056755 7f5200657700 15 getattr 2.4as0_head/34b2f64a/100000061a1.00000000/snapdir//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.056784 7f5200657700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 34b2f64a/100000061a1.00000000/snapdir//2/18446744073709551615/0
2014-11-18 11:04:30.056802 7f5200657700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.056856 7f51fde52700 15 getattr 2.4as0_head/d944ec4a/10000001fa5.00000000/head//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.056902 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid d944ec4a/10000001fa5.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.056907 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.056911 7f51fde52700 15 getattr 2.4as0_head/d944ec4a/10000001fa5.00000000/head//2/18446744073709551615/0 'snapset'
2014-11-18 11:04:30.056941 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid d944ec4a/10000001fa5.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.056946 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.056949 7f51fde52700 15 getattr 2.4as0_head/d944ec4a/10000001fa5.00000000/snapdir//2/18446744073709551615/0 'snapset'
2014-11-18 11:04:30.056976 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid d944ec4a/10000001fa5.00000000/snapdir//2/18446744073709551615/0
2014-11-18 11:04:30.056981 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.056995 7f51fde52700 15 getattr 2.4as0_head/d944ec4a/10000001fa5.00000000/snapdir//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.057021 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid d944ec4a/10000001fa5.00000000/snapdir//2/18446744073709551615/0
2014-11-18 11:04:30.057026 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.057169 7f5200657700 15 getattr 2.4as0_head/6d4954a/100000062eb.00000000/head//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.057207 7f5200657700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 6d4954a/100000062eb.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.057212 7f5200657700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.057247 7f51fde52700 15 getattr 2.4as0_head/34b2f64a/100000061a1.00000000/head//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.057280 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 34b2f64a/100000061a1.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.057285 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.057325 7f5200657700 15 getattr 2.4as0_head/8337d8ca/100000064b1.00000000/head//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.057361 7f5200657700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 8337d8ca/100000064b1.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.057366 7f5200657700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.057372 7f5200657700 15 getattr 2.4as0_head/8337d8ca/100000064b1.00000000/snapdir//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.057398 7f5200657700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 8337d8ca/100000064b1.00000000/snapdir//2/18446744073709551615/0
2014-11-18 11:04:30.057401 7f5200657700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.057434 7f51fde52700 15 getattr 2.4as0_head/8337d8ca/100000064b1.00000000/head//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.057467 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 8337d8ca/100000064b1.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.057472 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.057531 7f51fde52700 15 getattr 2.4as0_head/787b0d4a/100000064e6.00000000/head//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.057571 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 787b0d4a/100000064e6.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.057578 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.057582 7f51fde52700 15 getattr 2.4as0_head/787b0d4a/100000064e6.00000000/snapdir//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.057609 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 787b0d4a/100000064e6.00000000/snapdir//2/18446744073709551615/0
2014-11-18 11:04:30.057614 7f51fde52700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.057659 7f5200657700 15 getattr 2.4as0_head/787b0d4a/100000064e6.00000000/head//2/18446744073709551615/0 '_'
2014-11-18 11:04:30.057701 7f5200657700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 787b0d4a/100000064e6.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.057706 7f5200657700 10 getattr lookup_strip_header failed: r =-2
2014-11-18 11:04:30.059743 7f51fae4c700 10 stat 2.4as0_head/d944ec4a/10000001fa5.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.059805 7f51fae4c700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid d944ec4a/10000001fa5.00000000/head//2/18446744073709551615/0
2014-11-18 11:04:30.059811 7f51fae4c700 10 stat 2.4as0_head/d944ec4a/10000001fa5.00000000/head//2/18446744073709551615/0=-2


That is the last thing I see until I start the OSD again:

2014-11-18 11:10:34.573361 7fdf51499880  0 ceph version 0.88 (4be687bf4480474117f56c387febc75c904036be), process ceph-osd, pid 2865
2014-11-18 11:10:34.576604 7fdf51499880  5 basedir /var/lib/ceph/osd/ceph-26
2014-11-18 11:10:34.576631 7fdf51499880 10 mount fsid is f000cf9e-06ff-4e01-a5e7-12803c1e223b
2014-11-18 11:10:34.693630 7fdf51499880 20 (init)genericobjectmap: seq is 4601
2014-11-18 11:10:34.693780 7fdf51499880  5 umount /var/lib/ceph/osd/ceph-26
2014-11-18 11:10:34.704819 7fdf51499880  5 test_mount basedir /var/lib/ceph/osd/ceph-26
2014-11-18 11:10:34.705040 7fdf51499880  5 basedir /var/lib/ceph/osd/ceph-26
2014-11-18 11:10:34.705059 7fdf51499880 10 mount fsid is f000cf9e-06ff-4e01-a5e7-12803c1e223b
2014-11-18 11:10:34.867810 7fdf51499880 20 (init)genericobjectmap: seq is 4601
2014-11-18 11:10:34.867937 7fdf51499880 15 read meta/23c2fcde/osd_superblock/0//-1 0~0
2014-11-18 11:10:34.868012 7fdf51499880 10 lookup_strip_header done  cid meta oid 23c2fcde/osd_superblock/0//-1
2014-11-18 11:10:34.868022 7fdf51499880 10 file_to_extents done 
2014-11-18 11:10:34.868092 7fdf51499880 10 _generic_read meta/23c2fcde/osd_superblock/0//-1 0~414/414 r = 0
2014-11-18 11:10:34.868119 7fdf51499880 10 existscollection: meta object: 16ef7597/infos/head//-1
2014-11-18 11:10:34.868169 7fdf51499880 10 lookup_strip_header done  cid meta oid 16ef7597/infos/head//-1
2014-11-18 11:10:34.868176 7fdf51499880 10 existscollection: meta object: a468ec03/snapmapper/0//-1
2014-11-18 11:10:34.868235 7fdf51499880 10 lookup_strip_header done  cid meta oid a468ec03/snapmapper/0//-1
2014-11-18 11:10:34.868959 7fdf51499880  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2014-11-18 11:10:34.875142 7fdf51499880 15 read meta/ac96dad5/osdmap.270/0//-1 0~0
2014-11-18 11:10:34.875221 7fdf51499880 10 lookup_strip_header done  cid meta oid ac96dad5/osdmap.270/0//-1
2014-11-18 11:10:34.875229 7fdf51499880 10 file_to_extents done 
...

Do you need any other output or statistics?
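
The debug log above repeats the same `lookup_strip_header failed` message for a handful of objects, each followed by `r =-2`. A small Python sketch like the one below could condense such output before attaching it to a ticket. The regular expressions are written against the line shapes shown in this comment only, the function names are hypothetical, and reading `-2` as a negated errno (`ENOENT`) is an assumption based on the usual convention of returning negative errno values.

```python
import errno
import re
from collections import Counter

# Matches lines such as:
#   ... 20 lookup_strip_header failed to get strip_header  cid <cid> oid <oid>
FAILED_LOOKUP = re.compile(
    r"lookup_strip_header failed to get strip_header\s+cid (\S+) oid (\S+)"
)
# Matches the return code in lines such as:
#   ... 10 getattr lookup_strip_header failed: r =-2
RETURN_CODE = re.compile(r"failed: r\s*=\s*(-?\d+)")


def summarize(lines):
    """Count failed header lookups per (cid, oid) and tally return codes."""
    lookups = Counter()
    codes = Counter()
    for line in lines:
        match = FAILED_LOOKUP.search(line)
        if match:
            lookups[(match.group(1), match.group(2))] += 1
            continue
        match = RETURN_CODE.search(line)
        if match:
            codes[int(match.group(1))] += 1
    return lookups, codes


def errno_name(code):
    """Map a negative return code such as -2 to its errno name."""
    return errno.errorcode.get(-code, "unknown")


# Three lines copied from the log in this comment.
sample = [
    "2014-11-18 11:04:30.055697 7f51fde52700 15 getattr 2.4as0_head/6b75e54a/10000006252.00000000/head//2/18446744073709551615/0 '_'",
    "2014-11-18 11:04:30.055789 7f51fde52700 20 lookup_strip_header failed to get strip_header  cid 2.4as0_head oid 6b75e54a/10000006252.00000000/head//2/18446744073709551615/0",
    "2014-11-18 11:04:30.055795 7f51fde52700 10 getattr lookup_strip_header failed: r =-2",
]
lookups, codes = summarize(sample)
for (cid, oid), count in lookups.most_common():
    print(count, cid, oid)
for code, count in codes.items():
    print(code, errno_name(code), count)
```

To run it over a whole log, pass an open file object to `summarize` instead of the `sample` list.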

Actions #3

Updated by Haomai Wang over 9 years ago

Thank you. I have started a three-OSD keyvaluestore cluster to run benchmarks and try to trigger the crash.

Actions #4

Updated by Samuel Just over 9 years ago

  • Assignee set to Haomai Wang
Actions #5

Updated by Haomai Wang over 9 years ago

  • Status changed from New to Fix Under Review
Actions #6

Updated by Haomai Wang over 9 years ago

  • Status changed from Fix Under Review to Resolved