Bug #15235
MDS : erroneous error message about reading config file
Description
Hi,
I have hit a bug with the MDS on Infernalis.
When an MDS is failed and goes to standby mode (ceph mds fail X), it crashes with:
global_init: error reading config file.
But it never tries to read the config file! Running it under strace shows:
[...]
fcntl(13, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
write(10, "\0", 1) = 1
rt_sigaction(SIGINT, {0x561eb327c360, ~[RTMIN RT_1], SA_RESTORER|SA_RESETHAND, 0x7f68356c58d0}, {SIG_DFL, [], 0}, 8) = 0
pipe([15, 16]) = 0
fcntl(15, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
write(10, "\0", 1) = 1
rt_sigaction(SIGTERM, {0x561eb327c360, ~[RTMIN RT_1], SA_RESTORER|SA_RESETHAND, 0x7f68356c58d0}, {SIG_DFL, [], 0}, 8) = 0
futex(0x7f682f2f79d0, FUTEX_WAIT, 5913, NULLglobal_init: error reading config file.
<unfinished ...>
+++ exited with 1 +++
No file is opened before the error.
The MDS runs as the ceph user, which has permission to read /etc/ceph/ceph.conf:
- su - ceph -s /bin/bash -c "test -r /etc/ceph/ceph.conf && echo OK"
OK
History
#1 Updated by Greg Farnum about 8 years ago
If you're failing an MDS from the monitor it's respawning itself, so maybe something weird happened there. Or maybe it's just busted — we don't give it a ton of testing in that case, honestly. Can you upload the MDS log of this occurring, either to the tracker (if it compresses small enough) or via ceph-post-file?
#2 Updated by Florent B about 8 years ago
It should respawn itself, but it fails on reading the configuration file. Yet as you can see in the strace output, it does not try to read any file.
The last few minutes of the log are here (I can't attach it to the tracker, I get "error"): paste.ubuntu.com/15333772/
#3 Updated by Greg Farnum about 8 years ago
That paste doesn't contain an error of any kind in it. :) It seems to end with the respawn command getting invoked, but nothing follows it.
#4 Updated by Florent B about 8 years ago
Yes I know, but it ends like this... :(
#5 Updated by Greg Farnum about 8 years ago
- Subject changed from "MDS : active to standby error reading config file" to "MDS : erroneous error message about reading config file"
Oh. I gather you're running into #14144, given https://github.com/ceph/ceph/pull/7060#issuecomment-200745759. The "unable to read config file" output is probably an incorrect interpretation of a return code it gets because it can't start threads. You can look at internal_safe_to_start_threads and its related functions if you want to explore, but I think this is just another result of #14144 running the system out of process IDs.
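For anyone who wants to check the thread-exhaustion theory on their own node, here is a rough Python sketch. It assumes a Linux host with procfs; the helper names are made up for illustration and are not part of Ceph. It reads the "Threads:" field from /proc/<pid>/status and compares it with /proc/sys/kernel/pid_max.

```python
from pathlib import Path


def parse_thread_count(status_text: str) -> int:
    """Extract the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def fraction_of_pid_max(status_text: str, pid_max_text: str) -> float:
    """Share of the system-wide ID space used by this one process's threads."""
    return parse_thread_count(status_text) / int(pid_max_text.strip())


def thread_usage(pid: int) -> float:
    """Hypothetical convenience wrapper: read procfs for a running process."""
    status = Path(f"/proc/{pid}/status").read_text()
    pid_max = Path("/proc/sys/kernel/pid_max").read_text()
    return fraction_of_pid_max(status, pid_max)
```

Calling thread_usage() with the PID of the ceph-mds process a few times while it is in standby-replay would show whether its thread count keeps climbing towards pid_max. Other limits (for example per-user process limits) can also stop thread creation, so a low fraction does not rule the theory out.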
I've created https://github.com/ceph/ceph/pull/8302 to change the error message.
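To illustrate the kind of mix-up described above, here is a small Python sketch. It is not the actual Ceph code: the function names and the choice of errno values are assumptions for illustration only. It shows how a caller that prints one fixed message for every non-zero return code ends up blaming the config file for a thread-start failure, and how reporting the return code itself avoids that.

```python
import errno
import os


def init_sketch(config_readable: bool, can_start_threads: bool) -> int:
    """Hypothetical startup routine: returns 0 on success, negative errno on failure."""
    if not config_readable:
        return -errno.ENOENT
    if not can_start_threads:
        # Assumed here: thread creation fails with EAGAIN when IDs run out.
        return -errno.EAGAIN
    return 0


def misleading_message(ret: int) -> str:
    """Treats every failure as a config problem, whatever the real cause."""
    return "" if ret == 0 else "global_init: error reading config file."


def clearer_message(ret: int) -> str:
    """Reports the underlying error code instead of guessing the cause."""
    if ret == 0:
        return ""
    return f"global_init: startup failed: {os.strerror(-ret)} (errno {-ret})"


if __name__ == "__main__":
    ret = init_sketch(config_readable=True, can_start_threads=False)
    print(misleading_message(ret))  # blames the config file
    print(clearer_message(ret))     # names the actual error
```

With the config readable but thread creation failing, the first message matches what was seen in the strace output, even though no file was ever opened.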
#6 Updated by Greg Farnum about 8 years ago
- Status changed from New to Fix Under Review
- Priority changed from High to Normal
- Source changed from other to Community (user)
#7 Updated by Florent B about 8 years ago
Ok I didn't know it could be related. No problem in this case :)
#8 Updated by Nathan Cutler about 8 years ago
- Project changed from Ceph to CephFS
#9 Updated by Nathan Cutler about 8 years ago
- Related to Bug #14144: standy-replay MDS does not cleanup finished replay threads added
#10 Updated by Greg Farnum almost 8 years ago
- Status changed from Fix Under Review to Closed
There's a conversation on that PR.