Bug #11569
closedmon: "pg ls" is broken
60%
Description
I got handed this:
Backtrace ========= -- begin dump of recent events --- 0> 2015-05-06 07:28:28.359165 7fad29cf5700 -1 *** Caught signal (Aborted) ** in thread 7fad29cf5700 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff) 1: /usr/bin/ceph-mon() [0x9017e2] 2: (()+0xf130) [0x7fad3005b130] 3: (gsignal()+0x37) [0x7fad2ea755d7] 4: (abort()+0x148) [0x7fad2ea76cc8] 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fad2f3799b5] 6: (()+0x5e926) [0x7fad2f377926] 7: (()+0x5e953) [0x7fad2f377953] 8: (()+0x5eb73) [0x7fad2f377b73] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0x7b361a] 10: (PGMap::get_filtered_pg_stats(std::string&, long, long, bool, std::set<pg_t, std::less<pg_t>, std::allocator<pg_t> >&)+0x1d3) [0x885ea3] 11: (PGMonitor::preprocess_command(MMonCommand*)+0x1ccd) [0x66398d] 12: (PGMonitor::preprocess_query(PaxosServiceMessage*)+0x27f) [0x66584f] 13: (PaxosService::dispatch(PaxosServiceMessage*)+0x833) [0x5cacd3] 14: (Monitor::handle_command(MMonCommand*)+0x1549) [0x591b19] 15: (Monitor::dispatch(MonSession*, Message*, bool)+0xf9) [0x594c89] 16: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x595936] 17: (Monitor::ms_dispatch(Message*)+0x23) [0x5b5403] 18: (DispatchQueue::entry()+0x64a) [0x8a1d9a] 19: (DispatchQueue::DispatchThread::entry()+0xd) [0x79bd9d] 20: (()+0x7df5) [0x7fad30053df5] 21: (clone()+0x6d) [0x7fad2eb361ad] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
The only assert in the get_filtered_pg_stats function is a check that the "type" returned from osd_types::pg_string_state() is a valid one. It looks like pg_string_state() is actually just broken, as it checks for equality of state strings (e.g. "active", "clean") but not for inclusion ("active+clean") — which is how they're most commonly presented. I think this was functionality added for Calamari? But I haven't dug all the way into it so I could be missing something.
Updated by Kefu Chai almost 9 years ago
- Status changed from New to 12
yes, by inspecting the log file,
$ grep "pg ls" logs/*.log logs/ceph-mon.mon1.log:2015-05-06 07:28:30.607083 7fd9237a5700 0 mon.mon1@0(leader) e1 handle_command mon_command({"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2} v 0) v1 logs/ceph-mon.mon1.log:2015-05-06 07:28:30.607184 7fd9237a5700 0 log_channel(audit) log [DBG] : from='client.4762 10.12.27.2:0/1024942' entity='client.admin' cmd=[{"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2}]: dispatch logs/ceph-mon.mon1.log: -8> 2015-05-06 07:28:30.606924 7fd9237a5700 1 -- 10.12.27.2:6789/0 <== client.4762 10.12.27.2:0/1024942 4 ==== mon_command({"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2} v 0) v1 ==== 102+0+0 (153034552 0 0) 0x6593800 con 0x660ce60 logs/ceph-mon.mon1.log: -7> 2015-05-06 07:28:30.607083 7fd9237a5700 0 mon.mon1@0(leader) e1 handle_command mon_command({"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2} v 0) v1 logs/ceph-mon.mon1.log: -6> 2015-05-06 07:28:30.607184 7fd9237a5700 0 log_channel(audit) log [DBG] : from='client.4762 10.12.27.2:0/1024942' entity='client.admin' cmd=[{"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2}]: dispatch logs/ceph-mon.mon1.log: -3> 2015-05-06 07:28:30.607214 7fd9237a5700 10 log_client will send 2015-05-06 07:28:30.607195 mon.0 10.12.27.2:6789/0 2165 : audit [DBG] from='client.4762 10.12.27.2:0/1024942' entity='client.admin' cmd=[{"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2}]: dispatch logs/ceph-mon.mon2.log:2015-05-06 07:28:28.138983 7fad29cf5700 0 mon.mon2@1(peon) e1 handle_command mon_command({"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2} v 0) v1 logs/ceph-mon.mon2.log:2015-05-06 07:28:28.139077 7fad29cf5700 0 log_channel(audit) log [DBG] : from='client.? 10.12.27.2:0/1024942' entity='client.admin' cmd=[{"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2}]: dispatch logs/ceph-mon.mon2.log: -8> 2015-05-06 07:28:28.138779 7fad29cf5700 1 -- 10.12.27.3:6789/0 <== client.4762 10.12.27.2:0/1024942 8 ==== mon_command({"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2} v 0) v1 ==== 102+0+0 (153034552 0 0) 0x4522800 con 0x425dc20 logs/ceph-mon.mon2.log: -7> 2015-05-06 07:28:28.138983 7fad29cf5700 0 mon.mon2@1(peon) e1 handle_command mon_command({"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2} v 0) v1 logs/ceph-mon.mon2.log: -6> 2015-05-06 07:28:28.139077 7fad29cf5700 0 log_channel(audit) log [DBG] : from='client.? 10.12.27.2:0/1024942' entity='client.admin' cmd=[{"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2}]: dispatch logs/ceph-mon.mon2.log: -3> 2015-05-06 07:28:28.139100 7fad29cf5700 10 log_client will send 2015-05-06 07:28:28.139085 mon.1 10.12.27.3:6789/0 245 : audit [DBG] from='client.? 10.12.27.2:0/1024942' entity='client.admin' cmd=[{"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2}]: dispatch logs/ceph-mon.mon3.log:2015-05-06 06:09:14.232803 7fcbea862700 0 mon.mon3@2(peon) e1 handle_command mon_command({"prefix": "pg ls-by-pool", "poolstr": "remapped"} v 0) v1 logs/ceph-mon.mon3.log:2015-05-06 06:09:14.232927 7fcbea862700 0 log_channel(audit) log [DBG] : from='client.? 10.12.27.2:0/1017585' entity='client.admin' cmd=[{"prefix": "pg ls-by-pool", "poolstr": "remapped"}]: dispatch logs/ceph-mon.mon3.log:2015-05-06 06:09:24.683393 7fcbea862700 0 mon.mon3@2(peon) e1 handle_command mon_command({"states": ["remapped"], "prefix": "pg ls-by-pool", "poolstr": "testpool"} v 0) v1 logs/ceph-mon.mon3.log:2015-05-06 06:09:24.683518 7fcbea862700 0 log_channel(audit) log [DBG] : from='client.? 10.12.27.2:0/1017696' entity='client.admin' cmd=[{"states": ["remapped"], "prefix": "pg ls-by-pool", "poolstr": "testpool"}]: dispatch logs/ceph-mon.mon3.log:2015-05-06 07:28:28.359575 7fcbea862700 0 mon.mon3@2(peon) e1 handle_command mon_command({"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2} v 0) v1 logs/ceph-mon.mon3.log:2015-05-06 07:28:28.359682 7fcbea862700 0 log_channel(audit) log [DBG] : from='client.4762 10.12.27.2:0/1024942' entity='client.admin' cmd=[{"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2}]: dispatch logs/ceph-mon.mon3.log: -8> 2015-05-06 07:28:28.359428 7fcbea862700 1 -- 10.12.27.5:6789/0 <== client.4762 10.12.27.2:0/1024942 4 ==== mon_command({"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2} v 0) v1 ==== 102+0+0 (153034552 0 0) 0x5a6dc00 con 0x7720f20 logs/ceph-mon.mon3.log: -7> 2015-05-06 07:28:28.359575 7fcbea862700 0 mon.mon3@2(peon) e1 handle_command mon_command({"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2} v 0) v1 logs/ceph-mon.mon3.log: -6> 2015-05-06 07:28:28.359682 7fcbea862700 0 log_channel(audit) log [DBG] : from='client.4762 10.12.27.2:0/1024942' entity='client.admin' cmd=[{"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2}]: dispatch logs/ceph-mon.mon3.log: -3> 2015-05-06 07:28:28.359718 7fcbea862700 10 log_client will send 2015-05-06 07:28:28.359695 mon.2 10.12.27.5:6789/0 269 : audit [DBG] from='client.4762 10.12.27.2:0/1024942' entity='client.admin' cmd=[{"states": ["recovery"], "prefix": "pg ls-by-osd", "osd": 2}]: dispatch
and we normalize these "pg ls-by-*" commands to "pg ls". and try to translate the "state" to the internal representation using pg_string_state()
, which brings down the mon if the string is not recognized. in this case, "recovery" is listed in the MonCommands, but not understood by pg_string_state()
.
the root cause of this issue is the inconsistency in between the command spec and the implementation of the command.
Updated by Kefu Chai almost 9 years ago
- Status changed from 12 to Fix Under Review
- % Done changed from 0 to 60
Updated by Greg Farnum almost 9 years ago
Ah, so we're actually only querying on one kind of state at a time, but it was still broken. Coolio.
Updated by Sage Weil almost 9 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Abhishek Lekshmanan almost 9 years ago
hammer-backport: https://github.com/ceph/ceph/pull/5160
Updated by Loïc Dachary over 8 years ago
- Status changed from Pending Backport to Resolved