Bug #51533 (closed): mon: return -EINVAL when handling unknown option in 'ceph osd pool get'

Added by Cuicui Zhao almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: octopus, pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Fix a monitor crash that occurs when a client issues "ceph osd pool get MyTestPoolName InvalidVariable" through the API, caused by improper variable parsing in the monitor.
A client can bring down all the monitors this way, making the whole cluster unavailable.
Found in Luminous, Nautilus, and the master branch. The fix should be merged to the LTS releases.
It can't be reproduced by running "ceph osd pool get MyTestPoolName AInvalidVariable" directly on the command line, as the argument is checked and rejected on the client side.

A PR with the fix can be found here:
https://github.com/ceph/ceph/pull/42179
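
For illustration only, here is a minimal C++ sketch of the behaviour the ticket title asks for; it is not the actual diff from the PR above. The idea is to look the requested option up in a table like the ALL_CHOICES map named in the failed assert below and return -EINVAL with a message, instead of asserting. The enum, option names, and helper function here are hypothetical.

// Illustrative sketch only -- not the real OSDMonitor.cc change.
#include <cerrno>
#include <map>
#include <ostream>
#include <string>

enum class pool_opt { SIZE, MIN_SIZE, PG_NUM };  // hypothetical subset

// Stand-in for the ALL_CHOICES table referenced by the failed assert.
static const std::map<std::string, pool_opt> ALL_CHOICES = {
  {"size", pool_opt::SIZE},
  {"min_size", pool_opt::MIN_SIZE},
  {"pg_num", pool_opt::PG_NUM},
};

// Hypothetical helper: validate the "var" argument of "osd pool get".
// Returns 0 and sets *out on success, or -EINVAL with a message when the
// client sent an unknown (or, as in this ticket, a missing) variable.
int validate_pool_get_var(const std::string& var, pool_opt* out,
                          std::ostream& ss)
{
  auto i = ALL_CHOICES.find(var);
  if (i == ALL_CHOICES.end()) {
    // The pre-fix code asserted here (ceph_assert(i != ALL_CHOICES.end())),
    // which aborts the whole ceph-mon process.
    ss << "invalid variable '" << var << "'";
    return -EINVAL;
  }
  *out = i->second;
  return 0;
}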

What we see
In a master-branch (v17) Ceph cluster with 3 monitors, we found all the monitors down with the log below.
The result is the same on Luminous and Nautilus.
It seems that while processing "ceph osd pool get mytestpoolname <variable>", the monitor crashes whenever the variable is invalid (a simplified sketch of the failing check follows the backtrace).

-18> 2021-07-05T06:43:29.115+0000 7f3d3410c700  1 -- [v2:10.0.0.1:40236/0,v1:10.0.0.1:40237/0] --> 10.0.0.1:0/2820881513 -- osd_map(54..54 src has 1..54) v4 -- 0x5599b311a8c0 con 0x5599b2e4e000
   -17> 2021-07-05T06:43:29.118+0000 7f3d3410c700  1 -- [v2:10.0.0.1:40236/0,v1:10.0.0.1:40237/0] <== client.? 10.0.0.1:0/2820881513 5 ==== mon_command({"prefix": "osd pool get", "pool": "mytestpoolname", "format": "json"} v 0) v1 ==== 105+0+0 (secure 0 0 0) 0x5599b2fea000 con 0x5599b2e4e000
   -16> 2021-07-05T06:43:29.118+0000 7f3d3410c700 20 mon.a@0(leader) e1 _ms_dispatch existing session 0x5599b2e2c900 for client.?
   -15> 2021-07-05T06:43:29.118+0000 7f3d3410c700 20 mon.a@0(leader) e1  entity_name client.admin global_id 4170 (new_ok) caps allow *
   -14> 2021-07-05T06:43:29.118+0000 7f3d3410c700  0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "osd pool get", "pool": "mytestpoolname", "format": "json"} v 0) v1
   -13> 2021-07-05T06:43:29.118+0000 7f3d3410c700 20 is_capable service=osd command=osd pool get read addr 10.0.0.1:0/2820881513 on cap allow *
   -12> 2021-07-05T06:43:29.118+0000 7f3d3410c700 20  allow so far , doing grant allow *
   -11> 2021-07-05T06:43:29.118+0000 7f3d3410c700 20  allow all
   -10> 2021-07-05T06:43:29.118+0000 7f3d3410c700 10 mon.a@0(leader) e1 _allowed_command capable
    -9> 2021-07-05T06:43:29.118+0000 7f3d3410c700  0 log_channel(audit) log [DBG] : from='client.? 10.0.0.1:0/2820881513' entity='client.admin' cmd=[{"prefix": "osd pool get", "pool": "mytestpoolname", "format": "json"}]: dispatch
    -8> 2021-07-05T06:43:29.118+0000 7f3d3410c700 10 log_client _send_to_mon log to self
    -7> 2021-07-05T06:43:29.118+0000 7f3d3410c700 10 log_client  log_queue is 3 last_log 224 sent 223 num 3 unsent 1 sending 1
    -6> 2021-07-05T06:43:29.118+0000 7f3d3410c700 10 log_client  will send 2021-07-05T06:43:29.118728+0000 mon.a (mon.0) 224 : audit [DBG] from='client.? 10.0.0.1:0/2820881513' entity='client.admin' cmd=[{"prefix": "osd pool get", "pool": "mytestpoolname", "format": "json"}]: dispatch
    -5> 2021-07-05T06:43:29.118+0000 7f3d3410c700  1 -- [v2:10.0.0.1:40236/0,v1:10.0.0.1:40237/0] --> [v2:10.0.0.1:40236/0,v1:10.0.0.1:40237/0] -- log(1 entries from seq 224 at 2021-07-05T06:43:29.118728+0000) v1 -- 0x5599b311bc00 con 0x5599b2964000
    -4> 2021-07-05T06:43:29.118+0000 7f3d3410c700 10 mon.a@0(leader).paxosservice(osdmap 1..54) dispatch 0x5599b2fea000 mon_command({"prefix": "osd pool get", "pool": "mytestpoolname", "format": "json"} v 0) v1 from client.? 10.0.0.1:0/2820881513 con 0x5599b2e4e000
    -3> 2021-07-05T06:43:29.118+0000 7f3d3410c700  5 mon.a@0(leader).paxos(paxos active c 1..475) is_readable = 1 - now=2021-07-05T06:43:29.118900+0000 lease_expire=1970-01-01T00:00:00.000000+0000 has v0 lc 475
    -2> 2021-07-05T06:43:29.118+0000 7f3d3410c700 10 mon.a@0(leader).osd e54 preprocess_query mon_command({"prefix": "osd pool get", "pool": "mytestpoolname", "format": "json"} v 0) v1 from client.? 10.0.0.1:0/2820881513
    -1> 2021-07-05T06:43:29.133+0000 7f3d3410c700 -1 ../src/mon/OSDMonitor.cc: In function 'bool OSDMonitor::preprocess_command(MonOpRequestRef)' thread 7f3d3410c700 time 2021-07-05T06:43:29.119200+0000
../src/mon/OSDMonitor.cc: 6196: FAILED ceph_assert(i != ALL_CHOICES.end())

 ceph version 17.0.0-5658-gc42712dd180 (c42712dd18032d9651e2ca5f6fb9ae5a078378df) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1aa) [0x7f3d4267d9a2]
 2: /work/ceph/ceph8/build/lib/libceph-common.so.2(+0x1665c24) [0x7f3d4267dc24]
 3: (OSDMonitor::preprocess_command(boost::intrusive_ptr<MonOpRequest>)+0x79e1) [0x5599aa7eb127]
 4: (OSDMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x240) [0x5599aa7c2dec]
 5: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x99d) [0x5599aa7a263f]
 6: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x28ac) [0x5599aa4ca478]
 7: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xb67) [0x5599aa4d63cf]
 8: (Monitor::_ms_dispatch(Message*)+0xfd0) [0x5599aa4d5502]
 9: (Monitor::ms_dispatch(Message*)+0x4d) [0x5599aa51a671]
 10: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x5599aa50d076]
 11: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xe9) [0x7f3d42875d9b]
 12: (DispatchQueue::entry()+0x61d) [0x7f3d428747e7]
 13: (DispatchQueue::DispatchThread::entry()+0x1c) [0x7f3d429f9dbe]
 14: (Thread::entry_wrapper()+0x83) [0x7f3d4263dfa7]
 15: (Thread::_entry_func(void*)+0x18) [0x7f3d4263df1a]
 16: /lib64/libpthread.so.0(+0x814a) [0x7f3d3ed7e14a]
 17: clone()
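
A simplified stand-in for the failing check above, to show why a single malformed request takes the daemon down. This is not the real OSDMonitor.cc code; it uses plain assert() in place of Ceph's ceph_assert(), which on failure logs the backtrace shown above (__ceph_assert_fail) and aborts the ceph-mon process.

// Simplified stand-in for the failing check at OSDMonitor.cc:6196.
#include <cassert>
#include <map>
#include <string>

int main() {
  const std::map<std::string, int> ALL_CHOICES = {{"size", 1}, {"pg_num", 2}};
  std::string var;                  // the crafted request omits "var"
  auto i = ALL_CHOICES.find(var);   // lookup of "" fails
  assert(i != ALL_CHOICES.end());   // stand-in for ceph_assert: aborts here
  return 0;
}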

How to reproduce
The simple Python script below is enough; it is equivalent to "ceph osd pool get mytestpoolname".
It is an invalid request (the variable is missing), and the monitor should reply with an error message rather than crash.
Run "python3 the_script_as_follows.py" to trigger the monitor crash.
It can't be reproduced by running "ceph osd pool get mytestpoolname" directly, as the command is checked and rejected on the client side.

import json
import rados

# Connect to the cluster using the local ceph.conf (admin keyring assumed).
c = rados.Rados(conffile='/etc/ceph/ceph.conf')
c.connect()

# Same as "ceph osd pool get mytestpoolname", but without the "var"
# argument that the CLI would normally require and validate.
cmd = json.dumps({"prefix": "osd pool get", "pool": "mytestpoolname", "format": "json"})

# mon_command() returns a (return_code, output, status_string) tuple;
# against an unpatched monitor this request crashes the mon instead.
print(c.mon_command(cmd, b''))

Related issues: 2 (0 open, 2 closed)

Copied to RADOS - Backport #51555: octopus: mon: return -EINVAL when handling unknown option in 'ceph osd pool get' (Resolved, Cory Snyder)
Copied to RADOS - Backport #51556: pacific: mon: return -EINVAL when handling unknown option in 'ceph osd pool get' (Resolved, Cory Snyder)
#1 - Updated by Kefu Chai almost 3 years ago

  • Status changed from New to Fix Under Review
#2 - Updated by Kefu Chai almost 3 years ago

  • Backport set to octopus, pacific
#3 - Updated by Kefu Chai almost 3 years ago

  • Status changed from Fix Under Review to Pending Backport
#4 - Updated by Backport Bot almost 3 years ago

  • Copied to Backport #51555: octopus: mon: return -EINVAL when handling unknown option in 'ceph osd pool get' added
#5 - Updated by Backport Bot almost 3 years ago

  • Copied to Backport #51556: pacific: mon: return -EINVAL when handling unknown option in 'ceph osd pool get' added
#6 - Updated by Loïc Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
