Bug #65423
closed
Monitor crashes when I try to create a FS. The stack may be related to the metadata server map decoder during the PAXOS service
Description
I created a Ceph cluster with 5 monitors and 2 metadata servers.
After that, I wanted to create a file system, so I ran the following command:
ceph fs new fuchenFS cephfs_metadata cephfs_data
where cephfs_metadata and cephfs_data are two pools.
The command hung, and I found that 4 of the 5 monitors had crashed. In addition, I could no longer check the state of the cluster with 'ceph -s'.
I then tried to restart the monitor with:
ceph-mon -i ceph2
The command failed with an exception:
root@ceph2:~# ceph-mon -i ceph2
terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of_buffer'
  what():  End of buffer [buffer:2]
*** Caught signal (Aborted) **
 in thread 7f1d57bd7d40 thread_name:ceph-mon
 ceph version 18.0.0-5756-g17f1ece3750 (17f1ece375002ab61703156123c5cfd6000b45b3) reef (dev)
 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f1d58ddf520]
 2: pthread_kill()
 3: raise()
 4: abort()
 5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e) [0x7f1d5916fb9e]
 6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f1d5917b20c]
 7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7f1d5917b277]
 8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8) [0x7f1d5917b4d8]
 9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0x5a) [0x7f1d59ce3624]
 10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xd26) [0x7f1d59f2f5bc]
 11: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x2de) [0x7f1d59f3af40]
 12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x436) [0x7f1d59f3f61e]
 13: (MDSMonitor::update_from_paxos(bool*)+0x3da) [0x55d1ac8490c2]
 14: (PaxosService::refresh(bool*)+0x310) [0x55d1ac75ea92]
 15: (Monitor::refresh_from_paxos(bool*)+0x323) [0x55d1ac58cbc9]
 16: (Monitor::init_paxos()+0x1b2) [0x55d1ac5a2408]
 17: (Monitor::preinit()+0x13ce) [0x55d1ac5d49d4]
 18: main()
 19: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f1d58dc6d90]
 20: __libc_start_main()
 21: _start()
2024-04-11T08:19:29.155+0000 7f1d57bd7d40 -1 *** Caught signal (Aborted) **
 in thread 7f1d57bd7d40 thread_name:ceph-mon
 ceph version 18.0.0-5756-g17f1ece3750 (17f1ece375002ab61703156123c5cfd6000b45b3) reef (dev)
 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f1d58ddf520]
 2: pthread_kill()
 3: raise()
 4: abort()
 5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e) [0x7f1d5916fb9e]
 6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f1d5917b20c]
 7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7f1d5917b277]
 8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8) [0x7f1d5917b4d8]
 9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0x5a) [0x7f1d59ce3624]
 10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xd26) [0x7f1d59f2f5bc]
 11: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x2de) [0x7f1d59f3af40]
 12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x436) [0x7f1d59f3f61e]
 13: (MDSMonitor::update_from_paxos(bool*)+0x3da) [0x55d1ac8490c2]
 14: (PaxosService::refresh(bool*)+0x310) [0x55d1ac75ea92]
 15: (Monitor::refresh_from_paxos(bool*)+0x323) [0x55d1ac58cbc9]
 16: (Monitor::init_paxos()+0x1b2) [0x55d1ac5a2408]
 17: (Monitor::preinit()+0x13ce) [0x55d1ac5d49d4]
 18: main()
 19: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f1d58dc6d90]
 20: __libc_start_main()
 21: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
     0> 2024-04-11T08:19:29.155+0000 7f1d57bd7d40 -1 *** Caught signal (Aborted) **
 in thread 7f1d57bd7d40 thread_name:ceph-mon
 ceph version 18.0.0-5756-g17f1ece3750 (17f1ece375002ab61703156123c5cfd6000b45b3) reef (dev)
 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f1d58ddf520]
 2: pthread_kill()
 3: raise()
 4: abort()
 5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e) [0x7f1d5916fb9e]
 6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f1d5917b20c]
 7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7f1d5917b277]
 8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8) [0x7f1d5917b4d8]
 9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0x5a) [0x7f1d59ce3624]
 10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xd26) [0x7f1d59f2f5bc]
 11: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x2de) [0x7f1d59f3af40]
 12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x436) [0x7f1d59f3f61e]
 13: (MDSMonitor::update_from_paxos(bool*)+0x3da) [0x55d1ac8490c2]
 14: (PaxosService::refresh(bool*)+0x310) [0x55d1ac75ea92]
 15: (Monitor::refresh_from_paxos(bool*)+0x323) [0x55d1ac58cbc9]
 16: (Monitor::init_paxos()+0x1b2) [0x55d1ac5a2408]
 17: (Monitor::preinit()+0x13ce) [0x55d1ac5d49d4]
 18: main()
 19: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f1d58dc6d90]
 20: __libc_start_main()
 21: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
  -297> 2024-04-11T08:19:29.155+0000 7f1d57bd7d40 -1 *** Caught signal (Aborted) **
 in thread 7f1d57bd7d40 thread_name:ceph-mon
 ceph version 18.0.0-5756-g17f1ece3750 (17f1ece375002ab61703156123c5cfd6000b45b3) reef (dev)
 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f1d58ddf520]
 2: pthread_kill()
 3: raise()
 4: abort()
 5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e) [0x7f1d5916fb9e]
 6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f1d5917b20c]
 7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7f1d5917b277]
 8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8) [0x7f1d5917b4d8]
 9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0x5a) [0x7f1d59ce3624]
 10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xd26) [0x7f1d59f2f5bc]
 11: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x2de) [0x7f1d59f3af40]
 12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x436) [0x7f1d59f3f61e]
 13: (MDSMonitor::update_from_paxos(bool*)+0x3da) [0x55d1ac8490c2]
 14: (PaxosService::refresh(bool*)+0x310) [0x55d1ac75ea92]
 15: (Monitor::refresh_from_paxos(bool*)+0x323) [0x55d1ac58cbc9]
 16: (Monitor::init_paxos()+0x1b2) [0x55d1ac5a2408]
 17: (Monitor::preinit()+0x13ce) [0x55d1ac5d49d4]
 18: main()
 19: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f1d58dc6d90]
 20: __libc_start_main()
 21: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by fuchen ma 25 days ago
Additional information:
I found that the monitor that did not crash is running 18.2.2, while the monitors that crashed are running 18.0.0.
When I created the file system with 'ceph fs new', the version of Ceph was still 18.0.0.
So, is this a compatibility bug?
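To make the question concrete, here is a toy sketch in Python of how an encoding mismatch can surface as an end-of-buffer exception. It is not Ceph's actual MDSMap encoding (real Ceph structures carry version and length headers), and it does not claim to be the cause of this crash. The layouts, the `flags` field, and all helper names are hypothetical. It only shows a decoder that expects a field the encoder never wrote, and therefore tries to read past the end of the data.

```python
import struct


class EndOfBuffer(Exception):
    """Raised when a read asks for more bytes than remain in the buffer."""


class Reader:
    """Minimal forward-only byte reader (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def copy(self, n: int) -> bytes:
        remaining = len(self.data) - self.pos
        if n > remaining:
            raise EndOfBuffer(f"need {n} bytes, only {remaining} left")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


def encode_layout_a(epoch: int, name: str) -> bytes:
    """Layout A (hypothetical): u32 epoch, u32 name length, name bytes."""
    raw = name.encode()
    return struct.pack("<II", epoch, len(raw)) + raw


def decode_layout_b(data: bytes) -> dict:
    """Layout B (hypothetical): layout A followed by a u64 flags field."""
    r = Reader(data)
    epoch, name_len = struct.unpack("<II", r.copy(8))
    name = r.copy(name_len).decode()
    # Layout A never wrote this field, so this read runs off the end.
    (flags,) = struct.unpack("<Q", r.copy(8))
    return {"epoch": epoch, "name": name, "flags": flags}


def try_decode(data: bytes):
    """Decode with layout B, returning the error text on failure."""
    try:
        return decode_layout_b(data)
    except EndOfBuffer as exc:
        return f"EndOfBuffer: {exc}"


print(try_decode(encode_layout_a(1, "fuchenFS")))
```

When both sides agree on the layout, the same decoder succeeds. That is why running one release on every monitor avoids this class of failure.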
Updated by fuchen ma 25 days ago
fuchen ma wrote in #note-1:
Additional information:
I found that the monitor that did not crash is running 18.2.2, while the monitors that crashed are running 18.0.0.
When I created the file system with 'ceph fs new', the version of Ceph was still 18.0.0. So, is this a compatibility bug?
The version of the ceph-mon that did not crash is 18.2.2.
Updated by Venky Shankar 20 days ago
- Status changed from New to Need More Info
fuchen ma wrote in #note-1:
Additional information:
I found that the monitor that did not crash is running 18.2.2, while the monitors that crashed are running 18.0.0.
When I created the file system with 'ceph fs new', the version of Ceph was still 18.0.0. So, is this a compatibility bug?
18.0.* is a dev release. Why are you using it? You should be using 18.2.* on all your servers (mixed versions are fine).
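One way to spot leftover dev builds is to look at the JSON printed by `ceph versions`, which groups running daemons by type and version string. The Python sketch below works on a hard-coded sample rather than a live cluster. The sample is hypothetical: its counts mirror this report, and the 18.2.2 commit hash is left as a placeholder. The function name `mixed_daemons` is also an assumption for illustration.

```python
import json


def mixed_daemons(versions: dict) -> dict:
    """Return daemon types that report more than one version string."""
    return {
        daemon: sorted(by_version)
        for daemon, by_version in versions.items()
        if daemon != "overall" and len(by_version) > 1
    }


# Hypothetical sample shaped like `ceph versions` output.
sample = json.loads("""
{
    "mon": {
        "ceph version 18.0.0-5756-g17f1ece3750 (17f1ece375002ab61703156123c5cfd6000b45b3) reef (dev)": 4,
        "ceph version 18.2.2 (<sha1>) reef (stable)": 1
    },
    "overall": {
        "ceph version 18.0.0-5756-g17f1ece3750 (17f1ece375002ab61703156123c5cfd6000b45b3) reef (dev)": 4,
        "ceph version 18.2.2 (<sha1>) reef (stable)": 1
    }
}
""")

for daemon, found in mixed_daemons(sample).items():
    print(f"{daemon} runs {len(found)} different versions:")
    for version in found:
        print(f"  {version}")
```

On a live cluster, the sample could be replaced with the parsed output of `ceph versions`, provided enough monitors are up to form a quorum and answer the command.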
Updated by Venky Shankar 6 days ago
- Status changed from Need More Info to Rejected
Please follow the suggestion in note-3.