
Bug #65423

closed

Monitors crash when I try to create a FS. The stack trace may be related to the MDS map decoder during the Paxos service refresh

Added by fuchen ma 25 days ago. Updated 6 days ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Regression: No
Severity: 2 - major

Description

I created a Ceph cluster with 5 monitors and 2 metadata servers.
After that, I wanted to create a file system, so I used the following command:

ceph fs new fuchenFS cephfs_metadata cephfs_data

where cephfs_metadata and cephfs_data are two pools.

But the command hangs, and I found that 4 of the 5 monitors crashed. In addition, I cannot check the state of the cluster with 'ceph -s'.

Then I tried to restart the monitor with:

ceph-mon -i ceph2

The command failed with an exception:

root@ceph2:~# ceph-mon -i ceph2
terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of_buffer'
  what():  End of buffer [buffer:2]
*** Caught signal (Aborted) **
 in thread 7f1d57bd7d40 thread_name:ceph-mon
 ceph version 18.0.0-5756-g17f1ece3750 (17f1ece375002ab61703156123c5cfd6000b45b3) reef (dev)
 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f1d58ddf520]
 2: pthread_kill()
 3: raise()
 4: abort()
 5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e) [0x7f1d5916fb9e]
 6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f1d5917b20c]
 7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7f1d5917b277]
 8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8) [0x7f1d5917b4d8]
 9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0x5a) [0x7f1d59ce3624]
 10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xd26) [0x7f1d59f2f5bc]
 11: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x2de) [0x7f1d59f3af40]
 12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x436) [0x7f1d59f3f61e]
 13: (MDSMonitor::update_from_paxos(bool*)+0x3da) [0x55d1ac8490c2]
 14: (PaxosService::refresh(bool*)+0x310) [0x55d1ac75ea92]
 15: (Monitor::refresh_from_paxos(bool*)+0x323) [0x55d1ac58cbc9]
 16: (Monitor::init_paxos()+0x1b2) [0x55d1ac5a2408]
 17: (Monitor::preinit()+0x13ce) [0x55d1ac5d49d4]
 18: main()
 19: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f1d58dc6d90]
 20: __libc_start_main()
 21: _start()
2024-04-11T08:19:29.155+0000 7f1d57bd7d40 -1 *** Caught signal (Aborted) **
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

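The stack trace shows `MDSMap::decode` running off the end of its input (`end_of_buffer` thrown from `list::iterator_impl::copy`) while the monitor replays the FSMap from its store. One plausible way this can happen with mixed builds is that the decoding build believes a given encoding version carries more fields than the encoding build actually wrote. The Python sketch below illustrates only that failure mode. It is a simplified model loosely based on Ceph's versioned encoding (a `struct_v`/`struct_compat`/length header, with unknown trailing fields skipped on decode); the class names, field layouts and version number are hypothetical and are not taken from the MDSMap source.

```python
import struct


class EndOfBuffer(Exception):
    """Stand-in for ceph::buffer::end_of_buffer in this sketch."""


class Reader:
    """Minimal cursor over bytes; take() plays the role of iterator::copy()."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        # Refuse to read past the end of the buffer, like the real iterator.
        if self.pos + n > len(self.data):
            raise EndOfBuffer(
                f"need {n} bytes at offset {self.pos}, only {len(self.data) - self.pos} left")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


def encode_map(struct_v: int, fields: list) -> bytes:
    """Header modeled on ENCODE_START: u8 struct_v, u8 struct_compat, le32 payload length."""
    payload = b"".join(struct.pack("<Q", f) for f in fields)
    return struct.pack("<BBI", struct_v, 1, len(payload)) + payload


def decode_map(data: bytes, fields_for_version) -> list:
    """Decode using this build's own belief about which fields struct_v implies."""
    r = Reader(data)
    struct_v, _compat, length = struct.unpack("<BBI", r.take(6))
    struct_end = r.pos + length
    count = fields_for_version(struct_v)
    fields = [struct.unpack("<Q", r.take(8))[0] for _ in range(count)]
    # Like DECODE_FINISH: skip trailing fields this build does not know about.
    if r.pos < struct_end:
        r.pos = struct_end
    return fields


# Hypothetical layouts for the same struct_v = 5 encoding.
def writer_layout(v):       # the build that wrote the map: 2 fields at v5
    return 2 if v >= 5 else 1


def older_layout(v):        # an older reader that only knows the first field
    return 1


def mismatched_layout(v):   # a build that thinks v5 carries 3 fields
    return 3 if v >= 5 else 1


blob = encode_map(5, [10, 20])
print(decode_map(blob, writer_layout))   # [10, 20]
print(decode_map(blob, older_layout))    # [10]
try:
    decode_map(blob, mismatched_layout)
except EndOfBuffer as e:
    print("decode failed:", e)
```

In this model, a reader that knows fewer fields than were written still succeeds because it skips to the end of the struct. The overrun appears only when a build assigns extra fields to a `struct_v` the writer did not include, which would be consistent with the development/stable mix described in note-1.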

Note #1

Updated by fuchen ma 25 days ago

Additional information:
I found that the non-crashed one runs version 18.2.2, while the crashed ones run 18.0.0.
When I created the file system with 'ceph fs new', the Ceph version was still 18.0.0.

So, is this a compatibility bug?

Note #2

Updated by fuchen ma 25 days ago

fuchen ma wrote in #note-1:

Additional information:
I found that the non-crashed one runs version 18.2.2, while the crashed ones run 18.0.0.
When I created the file system with 'ceph fs new', the Ceph version was still 18.0.0.

So, is this a compatibility bug?

To clarify: the non-crashed ceph-mon runs version 18.2.2.

Note #3

Updated by Venky Shankar 20 days ago

  • Status changed from New to Need More Info

fuchen ma wrote in #note-1:

Additional information:
I found that the non-crashed one runs version 18.2.2, while the crashed ones run 18.0.0.
When I created the file system with 'ceph fs new', the Ceph version was still 18.0.0.

So, is this a compatibility bug?

18.0.* is a dev release. Why are you using it? You should be using 18.2.* on all your servers (mixed versions are fine).
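Ceph numbers its releases as x.0.z for development builds, x.1.z for release candidates and x.2.z for stable releases, so a quick check of each daemon's version string can flag development builds before running commands such as `ceph fs new`. The sketch below assumes a dict shaped like the JSON output of `ceph versions` (daemon type mapped to version-string counts); note that this command needs a monitor quorum, which this cluster had lost. The function names are hypothetical, and the sample input is hand-built from the versions reported in this issue, with the 18.2.2 build hash left out as a placeholder.

```python
import re

# Ceph release numbering: x.0.z = development, x.1.z = release candidate, x.2.z = stable.
_VERSION_RE = re.compile(r"ceph version (\d+)\.(\d+)\.(\d+)")


def release_kind(version_string: str) -> str:
    """Classify a 'ceph version ...' string as 'dev', 'rc', 'stable' or 'unknown'."""
    m = _VERSION_RE.search(version_string)
    if m is None:
        return "unknown"
    minor = int(m.group(2))
    return {0: "dev", 1: "rc"}.get(minor, "stable")


def non_stable_daemons(versions: dict) -> dict:
    """Return {daemon_type: [version strings]} for every build not recognized as stable.

    `versions` is assumed to look like `ceph versions` output:
    {"mon": {"ceph version ...": count, ...}, ..., "overall": {...}}.
    """
    flagged = {}
    for daemon_type, counts in versions.items():
        if daemon_type == "overall":  # summary entry repeats the per-type data
            continue
        bad = [v for v in counts if release_kind(v) != "stable"]
        if bad:
            flagged[daemon_type] = bad
    return flagged


# Hand-built sample mirroring this report: 4 monitors on 18.0.0, 1 on 18.2.2.
sample = {
    "mon": {
        "ceph version 18.0.0-5756-g17f1ece3750 "
        "(17f1ece375002ab61703156123c5cfd6000b45b3) reef (dev)": 4,
        "ceph version 18.2.2 (<hash>) reef (stable)": 1,
    },
}
print(non_stable_daemons(sample))
```

Treating unparseable strings as non-stable is a deliberate choice here: an unexpected version string is worth a human look rather than a silent pass.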

Note #4

Updated by Venky Shankar 6 days ago

  • Status changed from Need More Info to Rejected

Please follow the suggestion in note-3.
