Bug #47179
Status: Closed
AdminSocket::do_accept() terminate called after throwing an instance of 'std::out_of_range'
Description
2020-08-27T12:28:49.073 INFO:teuthology.orchestra.run.smithi072.stdout:Sending commands to client socket
2020-08-27T12:28:49.074 INFO:teuthology.orchestra.run.smithi072.stdout:Sending command "cache erase"
2020-08-27T12:28:49.081 INFO:tasks.rgw.client.0.smithi072.stdout:terminate called after throwing an instance of 'std::out_of_range'
2020-08-27T12:28:49.081 INFO:tasks.rgw.client.0.smithi072.stdout: what(): map::at
2020-08-27T12:28:49.081 INFO:tasks.rgw.client.0.smithi072.stdout:*** Caught signal (Aborted) **
2020-08-27T12:28:49.082 INFO:tasks.rgw.client.0.smithi072.stdout: in thread 7f006bfff700 thread_name:admin_socket
2020-08-27T12:28:49.082 INFO:tasks.rgw.client.0.smithi072.stdout: ceph version 16.0.0-4708-gfb44c2f (fb44c2f5009f8e2e18663251149c3bf6b06256a9) pacific (dev)
2020-08-27T12:28:49.082 INFO:tasks.rgw.client.0.smithi072.stdout: 1: (()+0x3efd0) [0x7f008d6bffd0]
2020-08-27T12:28:49.083 INFO:tasks.rgw.client.0.smithi072.stdout: 2: (gsignal()+0xc7) [0x7f008d6bff47]
2020-08-27T12:28:49.083 INFO:tasks.rgw.client.0.smithi072.stdout: 3: (abort()+0x141) [0x7f008d6c18b1]
2020-08-27T12:28:49.083 INFO:tasks.rgw.client.0.smithi072.stdout: 4: (()+0x8c957) [0x7f00827c8957]
2020-08-27T12:28:49.083 INFO:tasks.rgw.client.0.smithi072.stdout: 5: (()+0x92ae6) [0x7f00827ceae6]
2020-08-27T12:28:49.083 INFO:tasks.rgw.client.0.smithi072.stdout: 6: (()+0x91b49) [0x7f00827cdb49]
2020-08-27T12:28:49.084 INFO:tasks.rgw.client.0.smithi072.stdout: 7: (__gxx_personality_v0()+0x2a8) [0x7f00827ce4b8]
2020-08-27T12:28:49.084 INFO:tasks.rgw.client.0.smithi072.stdout: 8: (()+0x10573) [0x7f0082196573]
2020-08-27T12:28:49.084 INFO:tasks.rgw.client.0.smithi072.stdout: 9: (_Unwind_Resume()+0x125) [0x7f0082196df5]
2020-08-27T12:28:49.085 INFO:tasks.rgw.client.0.smithi072.stdout: 10: (AdminSocket::do_accept()+0xecc) [0x7f0083ff130c]
2020-08-27T12:28:49.085 INFO:tasks.rgw.client.0.smithi072.stdout: 11: (AdminSocket::entry()+0x328) [0x7f0083ff1868]
2020-08-27T12:28:49.085 INFO:tasks.rgw.client.0.smithi072.stdout: 12: (()+0xbd6df) [0x7f00827f96df]
2020-08-27T12:28:49.085 INFO:tasks.rgw.client.0.smithi072.stdout: 13: (()+0x76db) [0x7f0082f976db]
2020-08-27T12:28:49.085 INFO:tasks.rgw.client.0.smithi072.stdout: 14: (clone()+0x3f) [0x7f008d7a2a3f]
rados/singleton-nomsgr/{all/admin_socket_output rados supported-random-distro$/{ubuntu_latest}}
/a/teuthology-2020-08-26_07:01:02-rados-master-distro-basic-smithi/5377107
/a/kchai-2020-08-27_10:01:54-rados-wip-kefu-testing-2020-08-27-1255-distro-basic-smithi/5379955
Updated by Brad Hubbard over 3 years ago
Ah, this stack is coming from an instance of radosgw. I wondered about the call to clone, since ceph_test_admin_socket_output is single-threaded.
The stack from ceph_test_admin_socket_output is further down and looks like this.
2020-08-27T12:28:49.199 INFO:teuthology.orchestra.run.smithi072.stderr:run_command AdminSocketClient::do_request errored with: safe_read(3) failed to read message size: (33) Numerical argument out of domain
2020-08-27T12:28:49.199 INFO:teuthology.orchestra.run.smithi072.stderr:/build/ceph-16.0.0-4708-gfb44c2f/src/test/admin_socket_output.cc: In function 'std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > AdminSocketOutput::run_command(AdminSocketClient&, const string&, bool)' thread 7fd5cd138c00 time 2020-08-27T12:28:49.201692+0000
2020-08-27T12:28:49.200 INFO:teuthology.orchestra.run.smithi072.stderr:/build/ceph-16.0.0-4708-gfb44c2f/src/test/admin_socket_output.cc: 119: ceph_abort_msg("abort() called")
2020-08-27T12:28:49.200 INFO:teuthology.orchestra.run.smithi072.stderr: ceph version 16.0.0-4708-gfb44c2f (fb44c2f5009f8e2e18663251149c3bf6b06256a9) pacific (dev)
2020-08-27T12:28:49.200 INFO:teuthology.orchestra.run.smithi072.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe1) [0x7fd5c42ec477]
2020-08-27T12:28:49.200 INFO:teuthology.orchestra.run.smithi072.stderr: 2: (AdminSocketOutput::run_command(AdminSocketClient&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)+0x3a0) [0x5612e7362a10]
2020-08-27T12:28:49.201 INFO:teuthology.orchestra.run.smithi072.stderr: 3: (AdminSocketOutput::gather_socket_output()+0x3fa) [0x5612e736494a]
2020-08-27T12:28:49.201 INFO:teuthology.orchestra.run.smithi072.stderr: 4: (AdminSocketOutput::exec()+0x88) [0x5612e7364fd8]
2020-08-27T12:28:49.201 INFO:teuthology.orchestra.run.smithi072.stderr: 5: (main()+0x833) [0x5612e735cd93]
2020-08-27T12:28:49.201 INFO:teuthology.orchestra.run.smithi072.stderr: 6: (__libc_start_main()+0xe7) [0x7fd5c34f8b97]
2020-08-27T12:28:49.201 INFO:teuthology.orchestra.run.smithi072.stderr: 7: (_start()+0x2a) [0x5612e735d6ba]
Looking at the rgw log, it looks like this.
-23> 2020-08-27T12:28:48.975+0000 7f00627fc700 10 monclient: tick
-22> 2020-08-27T12:28:48.987+0000 7f0063fff700 10 monclient: _renew_subs
-21> 2020-08-27T12:28:48.987+0000 7f0063fff700 10 monclient: _send_mon_message to mon.a at v2:172.21.15.72:3300/0
-20> 2020-08-27T12:28:48.991+0000 7eff6effd700 1 ====== starting new request req=0x7eff6eff5040 =====
-19> 2020-08-27T12:28:48.991+0000 7eff6effd700 2 req 1 0s initializing for trans_id = tx000000000000000000001-005f47a700-1098-default
-18> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s getting op 0
-17> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets verifying requester
-16> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets normalizing buckets and tenants
-15> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets init permissions
-14> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets recalculating target
-13> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets reading permissions
-12> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets init op
-11> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets verifying op mask
-10> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets verifying op permissions
-9> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets verifying op params
-8> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets pre-executing
-7> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets executing
-6> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets completing
-5> 2020-08-27T12:28:48.995+0000 7eff6effd700 5 nothing to log for operation
-4> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets op status=0
-3> 2020-08-27T12:28:48.995+0000 7eff6effd700 2 req 1 0.004000000s s3:list_buckets http status=200
-2> 2020-08-27T12:28:48.995+0000 7eff6effd700 1 ====== req done req=0x7eff6eff5040 op status=0 http_status=200 latency=0.004000000s ======
-1> 2020-08-27T12:28:48.995+0000 7eff6effd700 1 civetweb: 0x7f007829a010: 172.21.15.72 - - [27/Aug/2020:12:28:48 +0000] "GET / HTTP/1.1" 200 413 - curl/7.58.0
0> 2020-08-27T12:28:49.079+0000 7f006bfff700 -1 *** Caught signal (Aborted) **
 in thread 7f006bfff700 thread_name:admin_socket
 ceph version 16.0.0-4708-gfb44c2f (fb44c2f5009f8e2e18663251149c3bf6b06256a9) pacific (dev)
 1: (()+0x3efd0) [0x7f008d6bffd0]
 2: (gsignal()+0xc7) [0x7f008d6bff47]
 3: (abort()+0x141) [0x7f008d6c18b1]
 4: (()+0x8c957) [0x7f00827c8957]
 5: (()+0x92ae6) [0x7f00827ceae6]
 6: (()+0x91b49) [0x7f00827cdb49]
 7: (__gxx_personality_v0()+0x2a8) [0x7f00827ce4b8]
 8: (()+0x10573) [0x7f0082196573]
 9: (_Unwind_Resume()+0x125) [0x7f0082196df5]
 10: (AdminSocket::do_accept()+0xecc) [0x7f0083ff130c]
 11: (AdminSocket::entry()+0x328) [0x7f0083ff1868]
 12: (()+0xbd6df) [0x7f00827f96df]
 13: (()+0x76db) [0x7f0082f976db]
 14: (clone()+0x3f) [0x7f008d7a2a3f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
So ceph_test_admin_socket_output sent the first admin socket command, "cache erase", to the radosgw, which killed the daemon; ceph_test_admin_socket_output then crashed due to the error from 'safe_read' (the socket had been disconnected). So ceph_test_admin_socket_output was doing its job.
The crash seems to be specific to radosgw (but in 'common' code) and I can reproduce it locally with that daemon but not the others. Will investigate further.
Updated by Brad Hubbard over 3 years ago
- Pull request ID set to 36938
This is happening because the code for "cache erase" and "cache inspect" does not check that a 'target' argument is present in the command before trying to use it. I have submitted a patch.
Updated by Brad Hubbard over 3 years ago
- Status changed from New to Fix Under Review
Updated by Brad Hubbard over 3 years ago
- Related to Bug #46985: common: validate type CephBool cause 'invalid command json' added
Updated by Brad Hubbard over 3 years ago
Updated by Kefu Chai over 3 years ago
- Pull request ID changed from 36938 to 37098
Updated by Kefu Chai over 3 years ago
- Backport set to octopus, nautilus
- Regression changed from No to Yes
Updated by Kefu Chai over 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Kefu Chai over 3 years ago
Please assign the backport to me if the backport script does not work for the related PR.
Updated by Nathan Cutler over 3 years ago
- Copied to Backport #47465: nautilus: AdminSocket::do_accept() terminate called after throwing an instance of 'std::out_of_range' added
Updated by Nathan Cutler over 3 years ago
- Copied to Backport #47466: octopus: AdminSocket::do_accept() terminate called after throwing an instance of 'std::out_of_range' added
Updated by Konstantin Shalygin over 1 year ago
- Status changed from Pending Backport to Resolved