Bug #22949

closed

ceph_test_admin_socket_output --all times out

Added by Kefu Chai about 6 years ago. Updated about 6 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Actions #1

Updated by Kefu Chai about 6 years ago

  • Assignee changed from Kefu Chai to Brad Hubbard

Brad, I am not able to reproduce this issue. Could you help take a look?

Actions #2

Updated by Brad Hubbard about 6 years ago

Sure mate, added a patch to get better debugging and will test as soon as it's built.

Actions #3

Updated by Brad Hubbard about 6 years ago

  • Category deleted (Tests)

This is not a problem with the test (although it highlights a deficiency with error reporting which I'll submit a PR for). It is timing out because the monitor is aborting and not responding.

The issue can be reproduced using the following minimal test case.

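#include <iostream>
#include <string>

#include "common/admin_socket_client.h"

// Minimal reproducer: send add_bootstrap_peer_hint with no arguments
// to the admin socket whose path is given as argv[1].
int main(int argc, char **argv)
{
  if (argc < 2) {
    std::cerr << "usage: " << argv[0] << " <path-to-asok>" << std::endl;
    return 1;
  }
  AdminSocketClient client(argv[1]);
  std::string response;
  // do_request() returns an empty string on success, otherwise an error message.
  std::string err = client.do_request("{\"prefix\":\"add_bootstrap_peer_hint\"}", &response);
  if (!err.empty()) {
    std::cerr << __func__ << " AdminSocketClient::do_request errored with: "
      << err << std::endl;
    return 1;
  }
  std::cout << response << '\n';
  return 0;
}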

Once we run this we see the following output.

$ ./tmp /tmp/ceph-asok.niTDiy/mon.a.asok
main AdminSocketClient::do_request errored with: safe_read(3) failed to read message size: (33) Numerical argument out of domain
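
Errno 33 on Linux is EDOM, and the message says the client could not even read the reply's size field, which is what one would expect if the monitor went away before answering. The sketch below illustrates that failure mode only; it is not Ceph's code. The names read_exact and read_reply are hypothetical, and both the 4-byte big-endian length prefix and the use of -EDOM for a short read are assumptions made for the illustration.

```cpp
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical helper: read exactly `count` bytes.
// Returns 0 on success, -EDOM if the peer closed the socket early,
// or -errno on a read error.
static int read_exact(int fd, void *buf, size_t count)
{
  char *p = static_cast<char *>(buf);
  size_t done = 0;
  while (done < count) {
    ssize_t r = ::read(fd, p + done, count - done);
    if (r < 0) {
      if (errno == EINTR)
        continue;  // interrupted by a signal, retry
      return -errno;
    }
    if (r == 0)
      return -EDOM;  // EOF before the full message arrived
    done += static_cast<size_t>(r);
  }
  return 0;
}

// Hypothetical reply reader: assumed 4-byte big-endian length, then payload.
static int read_reply(int fd, std::string *out)
{
  assert(out != nullptr);
  uint32_t raw = 0;
  int r = read_exact(fd, &raw, sizeof(raw));
  if (r < 0)
    return r;  // corresponds to "failed to read message size"
  std::string payload(ntohl(raw), '\0');
  r = read_exact(fd, &payload[0], payload.size());
  if (r < 0)
    return r;
  *out = std::move(payload);
  return 0;
}
```

Calling read_reply on a socket whose peer has already closed returns -EDOM from the very first read, which matches the shape of the error printed by the reproducer.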

The following stack trace is seen in the mon log.

    -2> 2018-02-10 16:10:31.633 7fc5bde5e700 10 mon.a@0(leader).log v20  logging 2018-02-10 16:10:31.634350 mon.a mon.0 127.0.0.1:40273/0 94 : audit [INF] from='admin socket' entity='admin socket' cmd='add_bootstrap_peer_hint' args=[]: dispatch
    -1> 2018-02-10 16:10:31.633 7fc5bde5e700 10 mon.a@0(leader).paxosservice(logm 1..20)  setting proposal_timer 0x557d02a7c6c0 with delay of 0.05
     0> 2018-02-10 16:10:31.634 7fc5c36c5700 -1 *** Caught signal (Aborted) **
 in thread 7fc5c36c5700 thread_name:admin_socket

 ceph version 13.0.1-1838-g261dd057c8 (261dd057c855dccab29221fa7b9cc709dcbdec35) mimic (dev)
 1: (()+0x443440) [0x557cff647440]
 2: (()+0x12af0) [0x7fc5c8d6daf0]
 3: (gsignal()+0xcb) [0x7fc5c5da866b]
 4: (abort()+0x141) [0x7fc5c5daa381]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7fc5c6752025]
 6: (()+0x8fc16) [0x7fc5c674fc16]
 7: (()+0x8eb19) [0x7fc5c674eb19]
 8: (__gxx_personality_v0()+0x328) [0x7fc5c674f508]
 9: (()+0xfee3) [0x7fc5c6163ee3]
 10: (_Unwind_Resume()+0x11e) [0x7fc5c616470e]
 11: (AdminSocket::do_accept()+0x1df2) [0x7fc5c9a03422]
 12: (AdminSocket::entry()+0x288) [0x7fc5c9a03968]
 13: (()+0xbbfef) [0x7fc5c677bfef]
 14: (()+0x761b) [0x7fc5c8d6261b]
 15: (clone()+0x3f) [0x7fc5c5e8898f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
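
Frames 4 to 12 show abort() being reached from the C++ terminate handler while unwinding out of AdminSocket::do_accept() on the admin_socket thread, which suggests an exception that nothing caught. The audit line shows the command arriving with args=[], so a handler tripping over a missing argument is one plausible trigger, though the ticket does not confirm it. The sketch below shows the general defensive pattern of turning a handler exception into an error reply instead of letting it escape the thread. It is not Ceph's code: add_peer_hint, safe_dispatch, and the "addr" argument name are all hypothetical.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

using Args = std::map<std::string, std::string>;
using Handler = std::function<std::string(const Args &)>;

// Hypothetical command handler: throws when a required argument is missing.
static std::string add_peer_hint(const Args &args)
{
  auto it = args.find("addr");
  if (it == args.end())
    throw std::invalid_argument("missing required argument 'addr'");
  return "added " + it->second;
}

// Hypothetical dispatcher: catch anything derived from std::exception and
// report it to the caller, so a bad request cannot terminate the process.
static std::string safe_dispatch(const Handler &h, const Args &args)
{
  try {
    return h(args);
  } catch (const std::exception &e) {
    return std::string("error: ") + e.what();
  }
}
```

With this wrapper, an empty argument map yields an error string for the client rather than an abort of the whole daemon.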

I've narrowed this down to https://github.com/adamemerson/ceph/commit/2a8be5510ee313a41b2d853134a7afc699901ecd so the solution is not to merge that commit in its present form :P

Actions #4

Updated by Brad Hubbard about 6 years ago

  • Status changed from New to 12

Actions #5

Updated by Kefu Chai about 6 years ago

  • Assignee deleted (Brad Hubbard)

Thanks Brad. My bad, I thought the bug was also in master. Closing this ticket, as the related PR is not yet merged.

Actions #6

Updated by Kefu Chai about 6 years ago

  • Status changed from 12 to Rejected