Bug #59271
closed
mon: FAILED ceph_assert(osdmon()->is_writeable())
100%
Description
/a/yuriw-2023-03-27_23:05:54-rados-wip-yuri4-testing-2023-03-25-0714-distro-default-smithi/7222122/remote/smithi181/log/mon.a.log.gz
2023-03-28T10:03:07.368+0000 7f5b069f1640 -1 ./src/mon/Monitor.cc: In function 'void Monitor::trigger_degraded_stretch_mode(const std::set<std::__cxx11::basic_string<char> >&, const std::set<int>&)' thread 7f5b069f1640 time 2023-03-28T10:03:07.362541+0000
./src/mon/Monitor.cc: 6824: FAILED ceph_assert(osdmon()->is_writeable())
ceph version 18.0.0-3130-g5fe9aaa7 (5fe9aaa76edfc0b4bca939f2691b5d6a7fac53e8) reef (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x121) [0x7f5b11edf6e1]
2: /usr/lib/ceph/libceph-common.so.2(+0x161895) [0x7f5b11edf895]
3: (Monitor::trigger_degraded_stretch_mode(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&)+0x1fc) [0x5626e5b69b1c]
4: (Monitor::maybe_go_degraded_stretch_mode()+0x52a) [0x5626e5b6a10a]
5: ceph-mon(+0xb0818d) [0x5626e5b0818d]
6: ceph-mon(+0xb6af00) [0x5626e5b6af00]
7: (PaxosService::_active()+0xab) [0x5626e5c3818b]
8: ceph-mon(+0xb0818d) [0x5626e5b0818d]
9: ceph-mon(+0xb6af00) [0x5626e5b6af00]
10: (Paxos::finish_round()+0x73) [0x5626e5c28593]
11: ceph-mon(+0xc31178) [0x5626e5c31178]
12: ceph-mon(+0xb0818d) [0x5626e5b0818d]
13: ceph-mon(+0xb0818d) [0x5626e5b0818d]
14: (Finisher::finisher_thread_entry()+0x175) [0x7f5b11f925f5]
15: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f5b115e1b43]
16: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f5b11673a00]
Updated by Laura Flores about 1 year ago
- Assignee set to Kamoltat (Junior) Sirivadhna
Junior, maybe you have an idea? The last issue like this fixed on Pacific was https://tracker.ceph.com/issues/58239.
Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
Thanks for reporting this. I'm summarizing the history for the record:
1. We had this bug, which also happens downstream: https://tracker.ceph.com/issues/57017,
which was fixed by https://github.com/ceph/ceph/pull/47340 (merged).
2. After we backported the fix to pacific (https://github.com/ceph/ceph/pull/48803), we hit https://tracker.ceph.com/issues/58239,
so we reverted it: https://github.com/ceph/ceph/pull/49412
3. The quincy backport is still open: https://github.com/ceph/ceph/pull/48802
4. We didn't revert main because this `ceph_assert(osdmon()->is_writeable())` failure was really hard to reproduce: one time I ran 100 jobs specifically targeting mon/mon-stretched-cluster.sh and it didn't reproduce.
In my opinion, https://github.com/ceph/ceph/pull/47340 did fix https://tracker.ceph.com/issues/57017; however, there is also an underlying bug that still needs fixing, which is what we are seeing here. I don't think it's a regression, because you can only hit this case by trying to reproduce https://tracker.ceph.com/issues/57017.
Updated by Laura Flores about 1 year ago
Let's mark this one as a duplicate if you already have a Tracker open for the issue.
Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
Let's keep this tracker; I'll mark it as related to https://tracker.ceph.com/issues/57017
Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
- Related to Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash added
Updated by Laura Flores about 1 year ago
/a/yuriw-2023-03-28_22:43:59-rados-wip-yuri11-testing-2023-03-28-0950-distro-default-smithi/7224432/remote/smithi137/log/mon.b.log.gz
Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
- Status changed from New to Fix Under Review
- Priority changed from Normal to High
- Pull request ID set to 50857
Updated by Kamoltat (Junior) Sirivadhna 12 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to reef, quincy, pacific
Updated by Kamoltat (Junior) Sirivadhna 12 months ago
- Category set to Stretch Clusters
Updated by Backport Bot 12 months ago
- Copied to Backport #59700: pacific: mon: FAILED ceph_assert(osdmon()->is_writeable()) added
Updated by Backport Bot 12 months ago
- Copied to Backport #59701: quincy: mon: FAILED ceph_assert(osdmon()->is_writeable()) added
Updated by Backport Bot 12 months ago
- Copied to Backport #59702: reef: mon: FAILED ceph_assert(osdmon()->is_writeable()) added
Updated by Yuri Weinstein 11 months ago
Updated by Konstantin Shalygin about 2 months ago
- Status changed from Pending Backport to Resolved
- % Done changed from 0 to 100