Bug #22949
Status: closed
ceph_test_admin_socket_output --all times out
Updated by Kefu Chai about 6 years ago
- Assignee changed from Kefu Chai to Brad Hubbard
Brad, I am not able to reproduce this issue. Could you help take a look?
Updated by Brad Hubbard about 6 years ago
Sure mate, added a patch to get better debugging and will test as soon as it's built.
Updated by Brad Hubbard about 6 years ago
- Category deleted (Tests)
This is not a problem with the test (although it highlights a deficiency with error reporting which I'll submit a PR for). It is timing out because the monitor is aborting and not responding.
The issue can be reproduced using the following minimal test case.
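#include <iostream>
#include <string>

#include "common/admin_socket_client.h"

// Minimal reproducer: send "add_bootstrap_peer_hint" to the admin socket
// whose path is given as the first argument, then print the reply.
int main(int argc, char **argv)
{
  if (argc < 2) {
    std::cerr << "usage: " << argv[0] << " <path-to-asok>" << std::endl;
    return 1;
  }
  AdminSocketClient client(argv[1]);
  std::string response;
  std::string err = client.do_request("{\"prefix\":\"add_bootstrap_peer_hint\"}", &response);
  if (!err.empty()) {
    // do_request() reports failures through a non-empty error string.
    std::cerr << __func__ << " AdminSocketClient::do_request errored with: " << err << std::endl;
    return 1;
  }
  std::cout << response << '\n';
  return 0;
}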
Once we run this we see the following output.
$ ./tmp /tmp/ceph-asok.niTDiy/mon.a.asok
main AdminSocketClient::do_request errored with: safe_read(3) failed to read message size: (33) Numerical argument out of domain
The following stack trace is seen in the mon log.
    -2> 2018-02-10 16:10:31.633 7fc5bde5e700 10 mon.a@0(leader).log v20 logging 2018-02-10 16:10:31.634350 mon.a mon.0 127.0.0.1:40273/0 94 : audit [INF] from='admin socket' entity='admin socket' cmd='add_bootstrap_peer_hint' args=[]: dispatch
    -1> 2018-02-10 16:10:31.633 7fc5bde5e700 10 mon.a@0(leader).paxosservice(logm 1..20) setting proposal_timer 0x557d02a7c6c0 with delay of 0.05
     0> 2018-02-10 16:10:31.634 7fc5c36c5700 -1 *** Caught signal (Aborted) **
 in thread 7fc5c36c5700 thread_name:admin_socket
 ceph version 13.0.1-1838-g261dd057c8 (261dd057c855dccab29221fa7b9cc709dcbdec35) mimic (dev)
 1: (()+0x443440) [0x557cff647440]
 2: (()+0x12af0) [0x7fc5c8d6daf0]
 3: (gsignal()+0xcb) [0x7fc5c5da866b]
 4: (abort()+0x141) [0x7fc5c5daa381]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7fc5c6752025]
 6: (()+0x8fc16) [0x7fc5c674fc16]
 7: (()+0x8eb19) [0x7fc5c674eb19]
 8: (__gxx_personality_v0()+0x328) [0x7fc5c674f508]
 9: (()+0xfee3) [0x7fc5c6163ee3]
 10: (_Unwind_Resume()+0x11e) [0x7fc5c616470e]
 11: (AdminSocket::do_accept()+0x1df2) [0x7fc5c9a03422]
 12: (AdminSocket::entry()+0x288) [0x7fc5c9a03968]
 13: (()+0xbbfef) [0x7fc5c677bfef]
 14: (()+0x761b) [0x7fc5c8d6261b]
 15: (clone()+0x3f) [0x7fc5c5e8898f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I've narrowed this down to https://github.com/adamemerson/ceph/commit/2a8be5510ee313a41b2d853134a7afc699901ecd, so the solution is not to merge that commit in its present form :P
Updated by Kefu Chai about 6 years ago
- Assignee deleted (Brad Hubbard)
Thanks Brad. My bad, I thought the bug was in master too. Closing this ticket, as the related PR is not yet merged.