Project

General

Profile

Actions

Bug #19427

closed

common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())

Added by Sage Weil about 7 years ago. Updated over 6 years ago.

Status:
Duplicate
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
jewel, kraken
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

   -35> 2017-03-29 22:36:36.972567 7f2538e96700 20 osd.2 756 should_share_map osd.1 172.21.15.117:6814/13151 756
   -34> 2017-03-29 22:36:36.972564 7f2546eb2700  1 -- 172.21.15.117:6808/1991 <== mon.2 172.21.15.117:6790/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 580+0+0 (1929996202 0 0) 0x565217318800 con 0x56521713b3c0
   -33> 2017-03-29 22:36:36.972571 7f2538e96700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] handle_message: 0x565216b7db00
   -32> 2017-03-29 22:36:36.972575 7f2538e96700 15 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] do_sub_op osd_sub_op(unknown.0.0:0 0.28 MIN [scrub-map] v 0'0 snapset=0=[]:[]) v12
   -31> 2017-03-29 22:36:36.972579 7f2538e96700  7 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] sub_op_scrub_map
   -30> 2017-03-29 22:36:36.972582 7f2538e96700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing]  got 1 scrub map
   -29> 2017-03-29 22:36:36.972586 7f2538e96700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] map version is 0'0
   -28> 2017-03-29 22:36:36.972589 7f2538e96700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] requeue_scrub: queueing
   -27> 2017-03-29 22:36:36.972591 7f2538e96700 20 osd.2 op_wq(0) _enqueue 0.28 PGQueueable(PGScrub prio 5 cost 52428800 e756)
   -26> 2017-03-29 22:36:36.972596 7f2538e96700 10 osd.2 756 dequeue_op 0x565216b7db00 finish
   -25> 2017-03-29 22:36:36.972600 7f253b69b700 20 osd.2 op_wq(0) _process 0.28 item PGQueueable(PGScrub prio 5 cost 52428800 e756) queued
   -24> 2017-03-29 22:36:36.972602 7f2538e96700 20 osd.2 op_wq(0) _process empty q, waiting
   -23> 2017-03-29 22:36:36.972608 7f253b69b700 20 osd.2 op_wq(0) _process 0.28 item PGQueueable(PGScrub prio 5 cost 52428800 e756) pg 0x565215be2000
   -22> 2017-03-29 22:36:36.972608 7f2546eb2700  1 -- 172.21.15.117:6808/1991 mark_down 0x565217138000 -- 0x56521562b400
   -21> 2017-03-29 22:36:36.972610 7f253b69b700 20 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub state WAIT_REPLICAS
   -20> 2017-03-29 22:36:36.972614 7f253b69b700 20 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub state COMPARE_MAPS
   -19> 2017-03-29 22:36:36.972617 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_compare_maps has maps, analyzing
   -18> 2017-03-29 22:36:36.972619 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_compare_maps  comparing replica scrub maps
   -17> 2017-03-29 22:36:36.972622 7f253b69b700  2 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_compare_maps   osd.2 has 0 items
   -16> 2017-03-29 22:36:36.972625 7f253b69b700  2 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_compare_maps replica 1 has 0 items
   -15> 2017-03-29 22:36:36.972629 7f253b69b700  2 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] 
   -14> 2017-03-29 22:36:36.972630 7f2546eb2700  1 monclient: found mon.c
   -13> 2017-03-29 22:36:36.972632 7f2546eb2700 10 monclient: _send_mon_message to mon.c at 172.21.15.117:6790/0
   -12> 2017-03-29 22:36:36.972632 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_snapshot_metadata
   -11> 2017-03-29 22:36:36.972633 7f2546eb2700  1 -- 172.21.15.117:6808/1991 --> 172.21.15.117:6790/0 -- mon_subscribe({mgrmap=0+,monmap=3+,osd_pg_creates=0+,osdmap=757}) v2 -- ?+0 0x565216867400 con 0x56521713b3c0
   -10> 2017-03-29 22:36:36.972636 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_snapshot_metadata (scrub) finish
    -9> 2017-03-29 22:36:36.972639 7f2546eb2700 10 monclient: _send_mon_message to mon.c at 172.21.15.117:6790/0
    -8> 2017-03-29 22:36:36.972642 7f2546eb2700  1 -- 172.21.15.117:6808/1991 --> 172.21.15.117:6790/0 -- pg_stats(39 pgs tid 177 v 0) v1 -- ?+0 0x565215f23e40 con 0x56521713b3c0
    -7> 2017-03-29 22:36:36.972640 7f253b69b700 20 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub state FINISH
    -6> 2017-03-29 22:36:36.972645 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub got 0/0 objects, 0/0 clones, 0/0 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 bytes, 0/0 hit_set_archive bytes.
    -5> 2017-03-29 22:36:36.972654 7f2546eb2700 10 monclient: _send_mon_message to mon.c at 172.21.15.117:6790/0
    -4> 2017-03-29 22:36:36.972652 7f253b69b700 10 osd.2 pg_epoch: 756 pg[0.28( empty local-les=727 n=0 ec=47 les/c/f 727/727/0 725/725/163) [2,1] r=0 lpr=725 crt=0'0 mlcod 0'0 active+clean+scrubbing] scrub_process_inconsistent: checking authoritative
    -3> 2017-03-29 22:36:36.972656 7f2546eb2700  1 -- 172.21.15.117:6808/1991 --> 172.21.15.117:6790/0 -- pg_stats(39 pgs tid 178 v 0) v1 -- ?+0 0x5652197f19c0 con 0x56521713b3c0
    -2> 2017-03-29 22:36:36.972659 7f253b69b700  0 log_channel(cluster) log [INF] : 0.28 scrub ok
    -1> 2017-03-29 22:36:36.972660 7f2546eb2700 10 log_client  log_queue is 3 last_log 600 sent 596 num 3 unsent 4 sending 4
     0> 2017-03-29 22:36:36.976280 7f2546eb2700 -1 /build/ceph-12.0.0-2001-gcc51f56/src/common/LogClient.cc: In function 'Message* LogClient::_get_mon_log_message()' thread 7f2546eb2700 time 2017-03-29 22:36:36.972662
/build/ceph-12.0.0-2001-gcc51f56/src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())

 ceph version 12.0.0-2001-gcc51f56 (cc51f56bb2c153bf35078eb95311e1d35c118c9f)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x56520b4d28b2]
 2: (LogClient::_get_mon_log_message()+0x13e5) [0x56520b482d25]
 3: (LogClient::get_mon_log_message()+0x1f) [0x56520b482ecf]
 4: (MonClient::send_log()+0x15) [0x56520b4e64a5]
 5: (MonClient::handle_auth(MAuthReply*)+0x348) [0x56520b4ef488]
 6: (MonClient::ms_dispatch(Message*)+0x39b) [0x56520b4efa9b]
 7: (DispatchQueue::entry()+0xf4a) [0x56520b69e94a]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x56520b55a33d]
 9: (()+0x770a) [0x7f25590bd70a]
 10: (clone()+0x6d) [0x7f255813482d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

this was on wip-mgr-commands, but no changes were made the osd or logclient, so I'm guessing this is related to the monclient multiconnection refactor?

/a/sage-2017-03-29_21:10:58-rados-wip-mgr-commands---basic-smithi/962237


Related issues 4 (0 open4 closed)

Has duplicate Ceph - Bug #20680: osd: FAILED assert(num_unsent <= log_queue.size())Closed07/18/2017

Actions
Is duplicate of RADOS - Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())Duplicate12/09/2016

Actions
Copied to Ceph - Backport #19617: jewel: common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())ResolvedAlexey SheplyakovActions
Copied to Ceph - Backport #19618: kraken: common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())ResolvedAlexey SheplyakovActions
Actions

Also available in: Atom PDF