
Bug #23807

Monitor mon_daemon_bytes and mon_client_bytes throttle full after monitor bootstrap

Added by Xuehan Xu about 1 year ago. Updated 4 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
04/20/2018
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Recently, we added a monitor to one of our existing clusters, after which one monitor's "mon_daemon_bytes" and "mon_client_bytes" throttles became full. Debugging showed that the direct cause is that the problematic monitor is a peon and the leader doesn't "ack" any of its forwarded requests. Further debugging indicates that this is due to a very large "in_seq_acked" on the leader, as the following gdb output shows.

(gdb) $1 = (Pipe) {
  <RefCountedObject> = {
    _vptr.RefCountedObject = 0xc38090 <vtable for Pipe+16>,
    nref = {
      val = 2
    },
    cct = 0x7f272485b100
  },
  members of Pipe:
  reader_thread = {
    <Thread> = {
      _vptr.Thread = 0xc38010 <vtable for Pipe::Reader+16>,
      thread_id = 139804361803520,
      pid = 698632,
      ioprio_class = -1,
      ioprio_priority = -1
    },
    members of Pipe::Reader:
    pipe = 0x7f26e5e0c000
  },
  writer_thread = {
    <Thread> = {
      _vptr.Thread = 0xc38050 <vtable for Pipe::Writer+16>,
      thread_id = 139804372301568,
      pid = 1287458,
      ioprio_class = -1,
      ioprio_priority = -1
    },
    members of Pipe::Writer:
    pipe = 0x7f26e5e0c000
  },
  delay_thread = 0x0,
  msgr = 0x7f26df0c8600,
  conn_id = 2,
  recv_buf = 0x7f26df1d6000 "\b{a\004\005",
  recv_max_prefetch = 4096,
  recv_ofs = 164,
  recv_len = 164,
  sd = 15,
  msgvec = {{
      iov_base = 0x7f25f471001e,
      iov_len = 36
    }, {
      iov_base = 0x7f26ea33f000,
      iov_len = 484
    },
    .........
    {
      iov_base = 0x7f25f471035a,
      iov_len = 36
    }},
  .........
  cond = {
    _c = {
      __data = {
        __mutex = 0x7f26e5e10160,
        __nwaiters = 2,
        __broadcast_seq = 1197734778
      },
      __size = "\000\000\000\000\365\366ǎ{\373cG\000\000\000\000z\373cG\000\000\000\000z\373cG\000\000\000\000`\001\341\345&\177\000\000\002\000\000\000z\373cG",
      __align = -8158280667830943744
    },
    waiter_mutex = 0x7f26e5e10150
  },
  send_keepalive = false,
  send_keepalive_ack = false,
  keepalive_ack_stamp = {
    tv = {
      tv_sec = 0,
      tv_nsec = 0
    }
  },
  halt_delivery = false,
  connect_seq = 1,
  peer_global_seq = 26443177,
  out_seq = 84173179,
  in_seq = 4015527,
  in_seq_acked = 740356360
}
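The stuck-throttle symptom can be sketched with a toy byte throttle (a minimal model for illustration only, not Ceph's actual Throttle class): incoming messages take bytes from the throttle when they are read off the wire and return them only when the message is processed and freed, so forwarded requests that are never acked pin throttle capacity forever until the throttle rejects everything.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a byte throttle (illustrative; not Ceph's Throttle).
class ByteThrottle {
  uint64_t max_;
  uint64_t cur_ = 0;
public:
  explicit ByteThrottle(uint64_t max) : max_(max) {}

  // Returns false when admitting the message would exceed the cap,
  // i.e. the throttle is "full" and the reader must block.
  bool try_take(uint64_t bytes) {
    if (cur_ + bytes > max_) return false;
    cur_ += bytes;
    return true;
  }

  // Called when a message is finally processed/acked and freed.
  void put(uint64_t bytes) { cur_ -= bytes; }

  uint64_t current() const { return cur_; }
};
```

If the leader never acks the peon's forwarded requests, `put()` is never called for them, `cur_` only grows, and eventually every `try_take()` fails — matching the "throttle full" state we observed.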

Analysis of the logs from the monitor addition shows that when the new monitor was added, the leader found that its connection to the problematic monitor had failed and initiated a reconnect, as the following log shows.

2018-04-09 10:17:46.724220 7f26bd52c700  0 -- 10.211.83.147:6789/0 >> 10.211.85.146:6789/0 pipe(0x7f26e5e0c000 sd=15 :60628 s=2 pgs=7626694 cs=1 l=0 c=0x7f272481d580).fault, initiating reconnect
2018-04-09 10:17:47.030728 7f26bdf2f700  0 -- 10.211.83.147:6789/0 >> 10.211.85.146:6789/0 pipe(0x7f26e5e0c000 sd=15 :58156 s=1 pgs=7626694 cs=2 l=0 c=0x7f272481d580).connect got RESETSESSION

The problematic monitor received the connect request; however, the following log shows that existing->connect_seq was 0 at the time, which indicates that the existing pipe was also newly created. The log also shows that the problematic monitor had just started its "bootstrap" operation, which reaps all existing pipes and then probes the other monitors; the probing phase creates new pipes.

2018-04-09 10:17:46.715816 7f8c73dc4700  0 mon.rg1-ceph06@2(peon) e3  my rank is now 3 (was 2)
2018-04-09 10:17:46.751760 7f8c6d49d700  0 -- 10.211.85.146:6789/0 >> 10.211.83.174:6789/0 pipe(0x7f8b613f4000 sd=868 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f8b612b3100).accept connect_seq 2 vs existing 0 state connecting
2018-04-09 10:17:46.751792 7f8c6d49d700  0 -- 10.211.85.146:6789/0 >> 10.211.83.174:6789/0 pipe(0x7f8b613f4000 sd=868 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f8b612b3100).accept we reset (peer sent cseq 2, 0x7f85d122f000.cseq = 0), sending RESETSESSION
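The decision visible in the log above ("peer sent cseq 2, cseq = 0, sending RESETSESSION") can be sketched as follows. This is a hypothetical, heavily simplified version of the connect_seq comparison that Pipe::accept performs; the function name and reply set are illustrative, not Ceph's exact code.

```cpp
#include <cstdint>

enum class AcceptReply { ResetSession, RetrySession, Ready };

// Simplified sketch of the accept-side connect_seq check.
AcceptReply handle_connect(uint32_t peer_cseq, uint32_t existing_cseq) {
  if (peer_cseq > 0 && existing_cseq == 0) {
    // The peer believes an established session exists (cseq 2 in the
    // log), but our pipe is brand new (cseq 0) because bootstrap just
    // reaped the old one: we must have reset, so say RESETSESSION.
    return AcceptReply::ResetSession;
  }
  if (peer_cseq < existing_cseq) {
    // Stale connect attempt: ask the peer to retry with a newer cseq.
    return AcceptReply::RetrySession;
  }
  return AcceptReply::Ready;
}
```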

Further analysis of the source code showed that reaping a pipe discards its out_q, which makes the "discard_requeued_up_to" operation in Pipe::accept unable to advance its out_seq to the connecting side's in_seq. We believe this is the root cause.
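The sequence-number mismatch can be illustrated with a small sketch (a toy model, not Ceph's actual Pipe class): discard_requeued_up_to walks the outgoing queue, dropping messages the peer already received and advancing out_seq past them. If bootstrap has already reaped the pipe and cleared out_q, there is nothing to walk, so out_seq is left behind the peer's in_seq.

```cpp
#include <cstdint>
#include <deque>

struct Msg { uint64_t seq; };

// Toy model of the relevant Pipe state (illustrative only).
struct PipeSketch {
  std::deque<Msg> out_q;
  uint64_t out_seq = 0;

  // Modeled after discard_requeued_up_to(): drop messages the peer
  // already received and advance out_seq past them.
  void discard_requeued_up_to(uint64_t peer_in_seq) {
    while (!out_q.empty() && out_q.front().seq <= peer_in_seq) {
      out_seq = out_q.front().seq;
      out_q.pop_front();
    }
  }

  // What bootstrap's reap does: the out_q is discarded wholesale,
  // losing the sequence information needed to advance out_seq.
  void reap() { out_q.clear(); }
};
```

With messages 1..5 queued, discarding up to the peer's in_seq of 3 advances out_seq to 3; but if the pipe was reaped first, the same call leaves out_seq at 0, and the two sides disagree about sequence numbers from then on.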

History

#1 Updated by Kefu Chai about 1 year ago

  • Category set to msgr
  • Status changed from New to Need Review
  • Assignee set to Xuehan Xu

#3 Updated by Nathan Cutler about 1 year ago

  • Status changed from Need Review to Resolved

#4 Updated by Greg Farnum 4 months ago

  • Project changed from Ceph to Messengers
  • Category deleted (msgr)
