Project

General

Profile

Bug #19427

Updated by Sage Weil about 7 years ago

<pre> 
    -35> 2017-03-29 22:36:36.972567 7f2538e96700 20 osd.2 756 should_share_map osd.1 172.21.15.117:6814/13151 756 
    -34> 2017-03-29 22:36:36.972564 7f2546eb2700    1 -- 172.21.15.117:6808/1991 <== mon.2 172.21.15.117:6790/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 580+0+0 (1929996202 0 0) 0x565217318800 con 0x56521713b3c0 
    -33> 2017-03-29 22:36:36.972571 7f2538e96700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] handle_message: 0x565216b7db00 
    -32> 2017-03-29 22:36:36.972575 7f2538e96700 15 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] do_sub_op osd_sub_op(unknown.0.0:0 0.28 MIN [scrub-map] v 0'0 snapset=0=[]:[]) v12 
    -31> 2017-03-29 22:36:36.972579 7f2538e96700    7 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] sub_op_scrub_map 
    -30> 2017-03-29 22:36:36.972582 7f2538e96700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing]    got 1 scrub map 
    -29> 2017-03-29 22:36:36.972586 7f2538e96700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] map version is 0'0 
    -28> 2017-03-29 22:36:36.972589 7f2538e96700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] requeue_scrub: queueing 
    -27> 2017-03-29 22:36:36.972591 7f2538e96700 20 osd.2 op_wq(0) _enqueue 0.28 PGQueueable(PGScrub prio 5 cost 52428800 e756) 
    -26> 2017-03-29 22:36:36.972596 7f2538e96700 10 osd.2 756 dequeue_op 0x565216b7db00 finish 
    -25> 2017-03-29 22:36:36.972600 7f253b69b700 20 osd.2 op_wq(0) _process 0.28 item PGQueueable(PGScrub prio 5 cost 52428800 e756) queued 
    -24> 2017-03-29 22:36:36.972602 7f2538e96700 20 osd.2 op_wq(0) _process empty q, waiting 
    -23> 2017-03-29 22:36:36.972608 7f253b69b700 20 osd.2 op_wq(0) _process 0.28 item PGQueueable(PGScrub prio 5 cost 52428800 e756) pg 0x565215be2000 
    -22> 2017-03-29 22:36:36.972608 7f2546eb2700    1 -- 172.21.15.117:6808/1991 mark_down 0x565217138000 -- 0x56521562b400 
    -21> 2017-03-29 22:36:36.972610 7f253b69b700 20 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub state WAIT_REPLICAS 
    -20> 2017-03-29 22:36:36.972614 7f253b69b700 20 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub state COMPARE_MAPS 
    -19> 2017-03-29 22:36:36.972617 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_compare_maps has maps, analyzing 
    -18> 2017-03-29 22:36:36.972619 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_compare_maps    comparing replica scrub maps 
    -17> 2017-03-29 22:36:36.972622 7f253b69b700    2 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_compare_maps     osd.2 has 0 items 
    -16> 2017-03-29 22:36:36.972625 7f253b69b700    2 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_compare_maps replica 1 has 0 items 
    -15> 2017-03-29 22:36:36.972629 7f253b69b700    2 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing]  
    -14> 2017-03-29 22:36:36.972630 7f2546eb2700    1 monclient: found mon.c 
    -13> 2017-03-29 22:36:36.972632 7f2546eb2700 10 monclient: _send_mon_message to mon.c at 172.21.15.117:6790/0 
    -12> 2017-03-29 22:36:36.972632 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_snapshot_metadata 
    -11> 2017-03-29 22:36:36.972633 7f2546eb2700    1 -- 172.21.15.117:6808/1991 --> 172.21.15.117:6790/0 -- mon_subscribe({mgrmap=0+,monmap=3+,osd_pg_creates=0+,osdmap=757}) v2 -- ?+0 0x565216867400 con 0x56521713b3c0 
    -10> 2017-03-29 22:36:36.972636 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_snapshot_metadata (scrub) finish 
     -9> 2017-03-29 22:36:36.972639 7f2546eb2700 10 monclient: _send_mon_message to mon.c at 172.21.15.117:6790/0 
     -8> 2017-03-29 22:36:36.972642 7f2546eb2700    1 -- 172.21.15.117:6808/1991 --> 172.21.15.117:6790/0 -- pg_stats(39 pgs tid 177 v 0) v1 -- ?+0 0x565215f23e40 con 0x56521713b3c0 
     -7> 2017-03-29 22:36:36.972640 7f253b69b700 20 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub state FINISH 
     -6> 2017-03-29 22:36:36.972645 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub got 0/0 objects, 0/0 clones, 0/0 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 bytes, 0/0 hit_set_archive bytes. 
     -5> 2017-03-29 22:36:36.972654 7f2546eb2700 10 monclient: _send_mon_message to mon.c at 172.21.15.117:6790/0 
     -4> 2017-03-29 22:36:36.972652 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_process_inconsistent: checking authoritative 
     -3> 2017-03-29 22:36:36.972656 7f2546eb2700    1 -- 172.21.15.117:6808/1991 --> 172.21.15.117:6790/0 -- pg_stats(39 pgs tid 178 v 0) v1 -- ?+0 0x5652197f19c0 con 0x56521713b3c0 
     -2> 2017-03-29 22:36:36.972659 7f253b69b700    0 log_channel(cluster) log [INF] : 0.28 scrub ok 
     -1> 2017-03-29 22:36:36.972660 7f2546eb2700 10 log_client    log_queue is 3 last_log 600 sent 596 num 3 unsent 4 sending 4 
      0> 2017-03-29 22:36:36.976280 7f2546eb2700 -1 /build/ceph-12.0.0-2001-gcc51f56/src/common/LogClient.cc: In function 'Message* LogClient::_get_mon_log_message()' thread 7f2546eb2700 time 2017-03-29 22:36:36.972662 
 /build/ceph-12.0.0-2001-gcc51f56/src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size()) 

  ceph version 12.0.0-2001-gcc51f56 (cc51f56bb2c153bf35078eb95311e1d35c118c9f) 
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x56520b4d28b2] 
  2: (LogClient::_get_mon_log_message()+0x13e5) [0x56520b482d25] 
  3: (LogClient::get_mon_log_message()+0x1f) [0x56520b482ecf] 
  4: (MonClient::send_log()+0x15) [0x56520b4e64a5] 
  5: (MonClient::handle_auth(MAuthReply*)+0x348) [0x56520b4ef488] 
  6: (MonClient::ms_dispatch(Message*)+0x39b) [0x56520b4efa9b] 
  7: (DispatchQueue::entry()+0xf4a) [0x56520b69e94a] 
  8: (DispatchQueue::DispatchThread::entry()+0xd) [0x56520b55a33d] 
  9: (()+0x770a) [0x7f25590bd70a] 
  10: (clone()+0x6d) [0x7f255813482d] 
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
 </pre> 

 this was on wip-mgr-commands, but no changes were made the osd or logclient, so I'm guessing this is related to the monclient multiconnection refactor? 

 /a/sage-2017-03-29_21:10:58-rados-wip-mgr-commands---basic-smithi/962237

Back