Ceph : Issues
https://tracker.ceph.com/ (2018-11-01T23:47:36Z)
Ceph - Bug #36679 (Resolved): "log last" command skips latest entry
https://tracker.ceph.com/issues/36679 (2018-11-01T23:47:36Z, John Spray <jcspray@gmail.com>)
<p>This showed up for me as a failure in cephtool/test.sh when run locally, even though it's passing in teuthology.</p>
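The "skips the latest entry" symptom is the classic shape of an off-by-one slice bound. As a hedged illustration only (the function names here are hypothetical, not the actual LogMonitor code):

```python
# Hypothetical sketch of how a "skip the newest entry" bug can arise:
# an exclusive upper bound of -1 silently drops the most recent record.
def log_last_buggy(entries, n):
    # BUG: the -1 bound excludes the newest entry
    return entries[-(n + 1):-1]

def log_last_fixed(entries, n):
    # take the last n entries, newest included
    return entries[-n:]

log = ["mon.a joined", "osd.0 up", "pool created", "test message"]
print(log_last_buggy(log, 2))  # newest entry "test message" is missing
print(log_last_fixed(log, 2))  # ['pool created', 'test message']
```

Whether the real bug has exactly this shape would need confirming against the mon's log handling code.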
<p>I looked at the teuthology logs from a recent master run, and there the test log message somehow gets written twice (on a 3-mon cluster, there are 6 instances of it in ceph.log.gz), which would explain why the test passes anyway.</p>

RADOS - Bug #36405 (Resolved): unittest_seastar_messenger failure on ARM
https://tracker.ceph.com/issues/36405 (2018-10-11T15:14:58Z, John Spray <jcspray@gmail.com>)
<p>We often ignore these failures, but when I looked at the log I realised it's actually a recently added test that's failing, so probably a genuine bug?</p>
<p><a class="external" href="https://jenkins.ceph.com/job/ceph-pull-requests-arm64/24113/console">https://jenkins.ceph.com/job/ceph-pull-requests-arm64/24113/console</a></p>
<pre>
The following tests FAILED:
157 - unittest_seastar_messenger (SEGFAULT)
</pre>

RADOS - Bug #36358 (Resolved): Interactive mode CLI prints no output since Mimic
https://tracker.ceph.com/issues/36358 (2018-10-09T13:34:04Z, John Spray <jcspray@gmail.com>)
<p>The polling-command changes (for iostat) altered the path for printing output, and now an interactive session produces no output at all.</p>

CephFS - Bug #36028 (Resolved): "ceph fs add_data_pool" applies pool application metadata incorre...
https://tracker.ceph.com/issues/36028 (2018-09-17T07:20:39Z, John Spray <jcspray@gmail.com>)
<p>From mailing list thread "[ceph-users] CephFS "authorize" on erasure-coded FS"</p>

mgr - Bug #35985 (Resolved): deadlock in standby ceph-mgr daemons
https://tracker.ceph.com/issues/35985 (2018-09-14T11:20:06Z, John Spray <jcspray@gmail.com>)
<p>From "[ceph-users] Standby mgr stopped sending beacons after upgrade to 12.2.8"</p>
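The backtrace below shows a lock-ordering inversion: thread 11 holds the standby-modules mutex inside StandbyPyModule::load while blocking to acquire the GIL, and thread 2 holds the GIL while blocking on a condition guarded by that same mutex in get_config. A minimal Python model of the hang (illustrative only, with timeouts so the demo terminates rather than wedging):

```python
import threading

# Two stand-in locks (an illustrative model, not Ceph code):
gil = threading.Lock()           # models the Python GIL
modules_lock = threading.Lock()  # models the standby modules' mutex
barrier = threading.Barrier(2)   # ensure both threads hold their first lock
results = {}

def standby_load():
    # models StandbyPyModule::load: takes the modules mutex, then needs the GIL
    modules_lock.acquire()
    barrier.wait()
    results["load"] = gil.acquire(timeout=0.5)

def standby_get_config():
    # models get_config: runs with the GIL held, then waits on the modules mutex
    gil.acquire()
    barrier.wait()
    results["get_config"] = modules_lock.acquire(timeout=0.5)

threads = [threading.Thread(target=standby_load),
           threading.Thread(target=standby_get_config)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Both acquisitions time out; in the real daemon (no timeouts) neither thread
# makes progress, so the standby mgr stops sending beacons.
print(results)
```

The usual fix for this pattern is to never call into the interpreter while holding the native mutex, or to drop the GIL before waiting on the condition.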
<pre>
Thread 11 (Thread 0x7fc30888d700 (LWP 224053)):
#0 0x00007fc30f2e0afb in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1 0x00007fc30f2e0b8f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x00007fc30f2e0c2b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3 0x00007fc311275735 in PyThread_acquire_lock () from /lib64/libpython2.7.so.1.0
#4 0x00007fc311241296 in PyEval_RestoreThread () from /lib64/libpython2.7.so.1.0
#5 0x00007fc31127942e in lock_PyThread_acquire_lock () from /lib64/libpython2.7.so.1.0
#6 0x00007fc311248cf0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7 0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#8 0x00007fc31124853c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#9 0x00007fc3112486bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#10 0x00007fc3112486bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#12 0x00007fc31124853c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#13 0x00007fc3112486bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#14 0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#15 0x00007fc3111d4978 in function_call () from /lib64/libpython2.7.so.1.0
#16 0x00007fc3111afa63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#17 0x00007fc3111bea55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#18 0x00007fc3111afa63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#19 0x00007fc311206a87 in slot_tp_init () from /lib64/libpython2.7.so.1.0
#20 0x00007fc31120579f in type_call () from /lib64/libpython2.7.so.1.0
#21 0x00007fc3111afa63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#22 0x00007fc3112418f7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#23 0x000056446a400480 in StandbyPyModule::load (this=0x564475c03420) at /usr/src/debug/ceph-12.2.8/src/mgr/StandbyPyModules.cc:124
#24 0x000056446a40160f in StandbyPyModules::start_one (this=0x564476345340, module_name="prometheus", pClass=<optimized out>,
pMyThreadState=...) at /usr/src/debug/ceph-12.2.8/src/mgr/StandbyPyModules.cc:96
#25 0x000056446a405265 in PyModuleRegistry::standby_start (this=this@entry=0x7ffd3bf1e680, monc=monc@entry=0x7ffd3bf1c8c8)
at /usr/src/debug/ceph-12.2.8/src/mgr/PyModuleRegistry.cc:321
#26 0x000056446a41a246 in MgrStandby::handle_mgr_map (this=this@entry=0x7ffd3bf1c8b0, mmap=mmap@entry=0x5644755942c0)
at /usr/src/debug/ceph-12.2.8/src/mgr/MgrStandby.cc:361
#27 0x000056446a41ab04 in MgrStandby::ms_dispatch (this=0x7ffd3bf1c8b0, m=0x5644755942c0)
at /usr/src/debug/ceph-12.2.8/src/mgr/MgrStandby.cc:376
#28 0x000056446a815cb2 in ms_deliver_dispatch (m=0x5644755942c0, this=0x564475332700) at /usr/src/debug/ceph-12.2.8/src/msg/Messenger.h:668
#29 DispatchQueue::entry (this=0x564475332858) at /usr/src/debug/ceph-12.2.8/src/msg/DispatchQueue.cc:197
#30 0x000056446a5ffbed in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-12.2.8/src/msg/DispatchQueue.h:101
#31 0x00007fc30f2dae25 in start_thread () from /lib64/libpthread.so.0
#32 0x00007fc30e3babad in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fc2eb7dc700 (LWP 224066)):
#0 0x00007fc30f2de995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000056446a402e1e in Cond::Wait (this=0x564476345558, mutex=...) at /usr/src/debug/ceph-12.2.8/src/common/Cond.h:48
#2 0x000056446a400ab5 in with_config<StandbyPyModule::get_config(const string&, std::string*) const::__lambda8> (
cb=<unknown type in /usr/lib/debug/usr/bin/ceph-mgr.debug, CU 0xc5a7bd, DIE 0xe5b056>, this=0x5644763453d0)
at /usr/src/debug/ceph-12.2.8/src/mgr/StandbyPyModules.h:76
#3 StandbyPyModule::get_config (this=0x564475c036c0, key="ceph06/server_addr", value=value@entry=0x7fc2eb7d9ef0)
at /usr/src/debug/ceph-12.2.8/src/mgr/StandbyPyModules.cc:186
#4 0x000056446a413a91 in ceph_config_get (self=0x7fc2ec096bd8, args=<optimized out>)
at /usr/src/debug/ceph-12.2.8/src/mgr/BaseMgrStandbyModule.cc:73
#5 0x00007fc311248cf0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#6 0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#7 0x00007fc31124853c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#8 0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#9 0x00007fc31124853c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#10 0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#11 0x00007fc3111d4978 in function_call () from /lib64/libpython2.7.so.1.0
#12 0x00007fc3111afa63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#13 0x00007fc3111bea55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#14 0x00007fc3111afa63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#15 0x00007fc3111afb45 in call_function_tail () from /lib64/libpython2.7.so.1.0
#16 0x00007fc3111afe7b in PyObject_CallMethod () from /lib64/libpython2.7.so.1.0
#17 0x000056446a407ffc in PyModuleRunner::serve (this=0x564475c036c0) at /usr/src/debug/ceph-12.2.8/src/mgr/PyModuleRunner.cc:51
#18 0x000056446a40867f in PyModuleRunner::PyModuleRunnerThread::entry (this=0x564475c036f8)
at /usr/src/debug/ceph-12.2.8/src/mgr/PyModuleRunner.cc:112
#19 0x00007fc30f2dae25 in start_thread () from /lib64/libpthread.so.0
#20 0x00007fc30e3babad in clone () from /lib64/libc.so.6
</pre>

mgr - Bug #25345 (Resolved): "balancer execute" only requires read permissions
https://tracker.ceph.com/issues/25345 (2018-08-02T10:42:09Z, John Spray <jcspray@gmail.com>)

mgr - Bug #25197 (Resolved): Can't turn off mgrc stats with mgr_stats_threshold
https://tracker.ceph.com/issues/25197 (2018-07-31T16:50:34Z, John Spray <jcspray@gmail.com>)
<p>Reported in <a class="external" href="https://tracker.ceph.com/issues/24982">https://tracker.ceph.com/issues/24982</a></p>
<p>The maximum on this setting prevents turning stats off entirely.</p>

Ceph - Bug #25172 (Resolved): Unnecessarily obscure CLI error on EPERM
https://tracker.ceph.com/issues/25172 (2018-07-30T15:32:24Z, John Spray <jcspray@gmail.com>)
<p>Inspired by thread "[ceph-users] cephfs tell command not working"</p>
<p>If someone has "allow" instead of "allow *" caps, we give them the unnecessarily obscure message "Error EPERM: problem getting command descriptions from mds.0".</p>

CephFS - Bug #24780 (Resolved): Some cephfs tool commands silently operate on only rank 0, even i...
https://tracker.ceph.com/issues/24780 (2018-07-05T13:28:02Z, John Spray <jcspray@gmail.com>)
<p>I think this is biting people, e.g. in thread "[ceph-users] CephFS - How to handle "loaded dup inode" errors"</p>
<p>We made commands like cephfs-journal-tool operate on rank 0 by default, at a time when single MDS systems were the norm.</p>
<p>On a system with many MDS ranks, it is confusing to run this command, see it succeed, but have it only act on one rank. Maybe we should change the default to all ranks, or maybe we should warn the user that they're only acting on one, but the current silent behaviour is a bit awkward.</p>
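One possible shape for a less surprising interface, sketched with hypothetical flag names (this is not the actual cephfs-journal-tool argument handling):

```python
import argparse

# Hypothetical sketch: require an explicit rank rather than silently
# defaulting to rank 0, with "all" expanding to every active rank.
parser = argparse.ArgumentParser(prog="cephfs-journal-tool")
parser.add_argument("--rank", required=True,
                    help="MDS rank to operate on, or 'all' for every rank")

def parse_ranks(value, active_ranks):
    """Expand 'all' into the active rank list, else one explicit rank."""
    if value == "all":
        return list(active_ranks)
    return [int(value)]

args = parser.parse_args(["--rank", "all"])
print(parse_ranks(args.rank, [0, 1, 2]))  # [0, 1, 2]
```

Making the argument mandatory also forces the operator to think about which ranks they are touching, which addresses the "silent success on one rank" confusion directly.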
<p>We could get rid of the default entirely. It is also somewhat scary that these commands (I think?) operate on the default filesystem when multiple filesystems exist, which makes it easy to forget which filesystem a "reset" on a particular rank is actually acting on.</p>

CephFS - Feature #24604 (Resolved): Implement "cephfs-journal-tool event splice" equivalent for p...
https://tracker.ceph.com/issues/24604 (2018-06-21T12:53:59Z, John Spray <jcspray@gmail.com>)
<p>cephfs-journal-tool recently got the ability to scan the purge queue via the --journal=purge_queue argument.</p>
<p>However, all the 'event' mode commands are disabled for the purge queue, because they rely on the actual contained event format (e.g. ENoOp for splice).</p>
<p>When the purge queue scan reports a region of damage, we need a command for splicing out that region.</p>

CephFS - Bug #24533 (Resolved): PurgeQueue sometimes ignores Journaler errors
https://tracker.ceph.com/issues/24533 (2018-06-15T15:33:19Z, John Spray <jcspray@gmail.com>)
<p>We check journaler.get_error() in PurgeQueue::_recover, but never later in _consume -- if something like a decode error happens, the MDS may silently stop progressing the queue.</p>
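In Python-flavoured pseudocode (a sketch of the control flow only, not the C++ MDS code), the fix amounts to re-checking the journaler error inside the consume loop:

```python
# Toy journaler: reading a corrupt entry sets an error code (illustrative).
class ToyJournaler:
    def __init__(self, entries):
        self.entries = list(entries)
        self.error = 0

    def try_read_entry(self):
        if not self.entries:
            return None
        entry = self.entries.pop(0)
        if entry == "corrupt":
            self.error = -22  # models a decode error (EINVAL-style)
            return None
        return entry

def consume(journaler, handled):
    """Drain the queue, surfacing errors instead of silently stalling."""
    while True:
        if journaler.error:          # the check _consume was missing
            return journaler.error
        entry = journaler.try_read_entry()
        if entry is None:
            # distinguish "queue empty" from "read failed"
            return journaler.error if journaler.error else 0
        handled.append(entry)

handled = []
rc = consume(ToyJournaler(["a", "b", "corrupt", "c"]), handled)
print(rc, handled)  # -22 ['a', 'b']
```

With the check in place the error is reported to the caller rather than leaving the queue stuck with no indication of why.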
<p>Ticket inspired by thread "[ceph-users] MDS: journaler.pq decode error"</p>

RADOS - Bug #24304 (Resolved): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
https://tracker.ceph.com/issues/24304 (2018-05-25T12:29:46Z, John Spray <jcspray@gmail.com>)
<pre>
May 25 12:21:06 magna044.ceph.redhat.com ceph-mon[30366]: 2018-05-25 12:21:06.540921 7f8a87f49ec0 -1 mon.magna044@-1(probing).mgrstat failed to decode mgrstat state; luminous dev version?
May 25 12:21:06 magna044.ceph.redhat.com ceph-mon[30366]: 2018-05-25 12:21:06.754346 7f8a79d83700 -1 mon.magna044@0(leader).mgrstat failed to decode mgrstat state; luminous dev version?
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: terminate called after throwing an instance of 'ceph::buffer::malformed_input'
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: what(): buffer::malformed_input: void object_stat_sum_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: *** Caught signal (Aborted) **
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: in thread 7f8a7e58c700 thread_name:ms_dispatch
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: ceph version 12.2.5-15.el7cp (8af5074c84901971d2c7807ba8270b44b5fbc09b) luminous (stable)
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 1: (()+0x8f6621) [0x55d7416b2621]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 2: (()+0xf680) [0x7f8a872fa680]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 3: (gsignal()+0x37) [0x7f8a84635207]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 4: (abort()+0x148) [0x7f8a846368f8]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f8a84f447d5]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 6: (()+0x5e746) [0x7f8a84f42746]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 7: (()+0x5e773) [0x7f8a84f42773]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 8: (()+0x5e993) [0x7f8a84f42993]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 9: (object_stat_sum_t::decode(ceph::buffer::list::iterator&)+0x587) [0x55d7414c3117]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 10: (object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x67) [0x55d7414df317]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 11: (pool_stat_t::decode(ceph::buffer::list::iterator&)+0x5a) [0x55d7414dfeaa]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 12: (PGMapDigest::decode(ceph::buffer::list::iterator&)+0x1cc) [0x55d7412337ec]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 13: (MgrStatMonitor::prepare_report(boost::intrusive_ptr<MonOpRequest>)+0x72) [0x55d7413699b2]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 14: (MgrStatMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0xbf) [0x55d741369def]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 15: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0xaf8) [0x55d74129dc28]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 16: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x51f) [0x55d74117dc7f]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 17: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x55d74117f2fb]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 18: (Monitor::ms_dispatch(Message*)+0x23) [0x55d7411ab463]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 19: (DispatchQueue::entry()+0x792) [0x55d74165d9c2]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x55d741455ccd]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 21: (()+0x7dd5) [0x7f8a872f2dd5]
May 25 12:21:08 magna044.ceph.redhat.com ceph-mon[30366]: 22: (clone()+0x6d) [0x7f8a846fdb3d]
</pre>

mgr - Bug #24175 (Resolved): status module output going to stderr
https://tracker.ceph.com/issues/24175 (2018-05-18T15:45:23Z, John Spray <jcspray@gmail.com>)

mgr - Feature #24013 (New): Handle module config in ceph.conf
https://tracker.ceph.com/issues/24013 (2018-05-04T13:42:47Z, John Spray <jcspray@gmail.com>)
<p>Since Mimic, ceph-mgr module configuration is stored in the same central config store on the mons as the rest of Ceph config.</p>
<p>That should make it possible to load the config from ceph.conf too -- I haven't tried this and suspect it might need some tweaks to get it working, but it's probably worth the effort in order to be consistent with the other config options.</p>
<p>If it's easier, I would probably be okay with making this take options only from the ceph.conf on the monitor. There's a reasonable use case for file-based config (writing all the config before starting the cluster), but there's probably no really good case for configuring mgr modules by writing out to /etc on local mgr nodes.</p>

mgr - Feature #24010 (Resolved): Enable "-i" CLI file input to mgr module commands
https://tracker.ceph.com/issues/24010 (2018-05-04T13:25:55Z, John Spray <jcspray@gmail.com>)
<p>Currently, people input things like SSL certificates by doing "ceph config-key set <foo> -i myfile.bin".</p>
<p>The "-i" functionality puts file content into the data part of the Message object. We don't do anything with this in DaemonServer when it receives a message, so mgr modules currently can't implement commands that use the "-i" mode.</p>
<p>This is problematic, because the "right" way to do things like SSL certificate loading is via the mgr module itself, so that users don't have to restart the mgr to pick up the contents of the mon store.</p>
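As a sketch of what the feature would enable (the module class, command prefix, and the idea of an `inbuf` argument here are assumptions for illustration, not the actual mgr module API):

```python
# Hypothetical mgr module accepting a certificate via "ceph ... -i cert.pem":
# the "-i" payload in the Message's data part would need to reach the
# module's command handler (here modelled as the 'inbuf' argument).
class CertStoreModule:
    def __init__(self):
        self.store = {}

    def handle_command(self, inbuf, cmd):
        if cmd.get("prefix") == "mymodule set-cert":
            if not inbuf:
                return -22, "", "usage: mymodule set-cert -i <certificate file>"
            self.store["ssl_certificate"] = inbuf
            return 0, "", "certificate stored ({} bytes)".format(len(inbuf))
        return -22, "", "unknown command"

mod = CertStoreModule()
rc, out, status = mod.handle_command(
    b"-----BEGIN CERTIFICATE-----...", {"prefix": "mymodule set-cert"})
print(rc, status)  # 0 certificate stored (30 bytes)
```

With something like this wired through DaemonServer, the module could store the certificate itself and no mgr restart would be needed to pick it up.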