Bug #59271

closed

mon: FAILED ceph_assert(osdmon()->is_writeable())

Added by Laura Flores about 1 year ago. Updated about 2 months ago.

Status: Resolved
Priority: High
Category: Stretch Clusters
Target version: -
% Done: 100%
Tags: backport_processed
Backport: reef, quincy, pacific
Regression: No
Severity: 3 - minor
Pull request ID: 50857

Description

/a/yuriw-2023-03-27_23:05:54-rados-wip-yuri4-testing-2023-03-25-0714-distro-default-smithi/7222122/remote/smithi181/log/mon.a.log.gz

2023-03-28T10:03:07.368+0000 7f5b069f1640 -1 ./src/mon/Monitor.cc: In function 'void Monitor::trigger_degraded_stretch_mode(const std::set<std::__cxx11::basic_string<char> >&, const std::set<int>&)' thread 7f5b069f1640 time 2023-03-28T10:03:07.362541+0000
./src/mon/Monitor.cc: 6824: FAILED ceph_assert(osdmon()->is_writeable())

 ceph version 18.0.0-3130-g5fe9aaa7 (5fe9aaa76edfc0b4bca939f2691b5d6a7fac53e8) reef (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x121) [0x7f5b11edf6e1]
 2: /usr/lib/ceph/libceph-common.so.2(+0x161895) [0x7f5b11edf895]
 3: (Monitor::trigger_degraded_stretch_mode(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&)+0x1fc) [0x5626e5b69b1c]
 4: (Monitor::maybe_go_degraded_stretch_mode()+0x52a) [0x5626e5b6a10a]
 5: ceph-mon(+0xb0818d) [0x5626e5b0818d]
 6: ceph-mon(+0xb6af00) [0x5626e5b6af00]
 7: (PaxosService::_active()+0xab) [0x5626e5c3818b]
 8: ceph-mon(+0xb0818d) [0x5626e5b0818d]
 9: ceph-mon(+0xb6af00) [0x5626e5b6af00]
 10: (Paxos::finish_round()+0x73) [0x5626e5c28593]
 11: ceph-mon(+0xc31178) [0x5626e5c31178]
 12: ceph-mon(+0xb0818d) [0x5626e5b0818d]
 13: ceph-mon(+0xb0818d) [0x5626e5b0818d]
 14: (Finisher::finisher_thread_entry()+0x175) [0x7f5b11f925f5]
 15: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f5b115e1b43]
 16: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f5b11673a00]
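The backtrace shows the assertion firing when Monitor::maybe_go_degraded_stretch_mode() calls Monitor::trigger_degraded_stretch_mode() from a Paxos completion callback (Paxos::finish_round() leading into PaxosService::_active()) while the OSDMonitor is not writeable. The C++ sketch below models that invariant in a self-contained way and shows one way to avoid tripping it: check writeability first and, if it does not hold, defer the work until the service becomes writeable. All types and methods here (FakeService, FakeMonitor, wait_for_writeable, set_writeable) are hypothetical stand-ins, not Ceph's real classes. The sketch is not claimed to be the change merged in PR 50857.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Paxos-backed service such as the OSDMonitor.
// It is either writeable or not, and it queues callbacks to run once it is.
class FakeService {
 public:
  bool is_writeable() const { return writeable_; }

  // Queue a callback to run the next time the service becomes writeable.
  void wait_for_writeable(std::function<void()> cb) {
    waiters_.push_back(std::move(cb));
  }

  // Flip writeability; on becoming writeable, drain the queued callbacks.
  void set_writeable(bool w) {
    writeable_ = w;
    if (!w) return;
    std::vector<std::function<void()>> pending;
    pending.swap(waiters_);  // callbacks may re-queue safely while we iterate
    for (auto& cb : pending) cb();
  }

 private:
  bool writeable_ = false;
  std::vector<std::function<void()>> waiters_;
};

// Hypothetical monitor with the same call shape as the backtrace:
// maybe_go_degraded() -> trigger_degraded(), which asserts writeability.
struct FakeMonitor {
  FakeService osdmon;
  int transitions = 0;

  void trigger_degraded() {
    // Same invariant as the failing ceph_assert in this report.
    assert(osdmon.is_writeable());
    ++transitions;
  }

  void maybe_go_degraded() {
    if (!osdmon.is_writeable()) {
      // Defer instead of asserting: retry once the service is writeable.
      osdmon.wait_for_writeable([this] { maybe_go_degraded(); });
      return;
    }
    trigger_degraded();
  }
};
```

For example, calling mon.maybe_go_degraded() while mon.osdmon is not writeable leaves mon.transitions at 0; a later mon.osdmon.set_writeable(true) runs the deferred callback and mon.transitions becomes 1. Deferring rather than asserting is the design choice being illustrated; whether the real fix takes exactly this shape is an assumption.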


Related issues 4 (0 open, 4 closed)

Related to RADOS - Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash (Resolved, Kamoltat (Junior) Sirivadhna)
Copied to RADOS - Backport #59700: pacific: mon: FAILED ceph_assert(osdmon()->is_writeable()) (Rejected, Kamoltat (Junior) Sirivadhna)
Copied to RADOS - Backport #59701: quincy: mon: FAILED ceph_assert(osdmon()->is_writeable()) (Resolved, Kamoltat (Junior) Sirivadhna)
Copied to RADOS - Backport #59702: reef: mon: FAILED ceph_assert(osdmon()->is_writeable()) (Resolved, Kamoltat (Junior) Sirivadhna)

#1

Updated by Laura Flores about 1 year ago

  • Assignee set to Kamoltat (Junior) Sirivadhna

Junior, maybe you have an idea? The last similar issue on Pacific was https://tracker.ceph.com/issues/58239.

#2

Updated by Kamoltat (Junior) Sirivadhna about 1 year ago

Thanks for reporting this. Summarizing for the record:

1. We had this bug, which also happens downstream: https://tracker.ceph.com/issues/57017.
It was fixed by https://github.com/ceph/ceph/pull/47340 (merged).

2. After we backported the fix to Pacific (https://github.com/ceph/ceph/pull/48803), we hit https://tracker.ceph.com/issues/58239,
so we reverted it: https://github.com/ceph/ceph/pull/49412.

3. The Quincy backport is still open: https://github.com/ceph/ceph/pull/48802.

4. We didn't revert main because this `ceph_assert(osdmon()->is_writeable())` failure was very hard to reproduce; at one point I ran 100 jobs specifically targeting mon/mon-stretched-cluster.sh and none of them reproduced it.

In my opinion, https://github.com/ceph/ceph/pull/47340 did fix https://tracker.ceph.com/issues/57017; however, there is also an underlying bug that still needs fixing, and that is what we are seeing here. I don't think it's a regression, because you can only hit this case while trying to reproduce https://tracker.ceph.com/issues/57017.

#3

Updated by Laura Flores about 1 year ago

Let's mark this one as a duplicate if you already have a Tracker open for the issue.

#4

Updated by Kamoltat (Junior) Sirivadhna about 1 year ago

Let's keep this tracker, and I'll mark it as related to https://tracker.ceph.com/issues/57017.

#5

Updated by Kamoltat (Junior) Sirivadhna about 1 year ago

  • Related to Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash added
#6

Updated by Laura Flores about 1 year ago

/a/yuriw-2023-03-28_22:43:59-rados-wip-yuri11-testing-2023-03-28-0950-distro-default-smithi/7224432/remote/smithi137/log/mon.b.log.gz

#7

Updated by Kamoltat (Junior) Sirivadhna about 1 year ago

  • Status changed from New to Fix Under Review
  • Priority changed from Normal to High
  • Pull request ID set to 50857
#8

Updated by Kamoltat (Junior) Sirivadhna 12 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to reef, quincy, pacific
#10

Updated by Kamoltat (Junior) Sirivadhna 12 months ago

  • Category set to Stretch Clusters
#11

Updated by Backport Bot 12 months ago

  • Copied to Backport #59700: pacific: mon: FAILED ceph_assert(osdmon()->is_writeable()) added
#12

Updated by Backport Bot 12 months ago

  • Copied to Backport #59701: quincy: mon: FAILED ceph_assert(osdmon()->is_writeable()) added
#13

Updated by Backport Bot 12 months ago

  • Copied to Backport #59702: reef: mon: FAILED ceph_assert(osdmon()->is_writeable()) added
#14

Updated by Backport Bot 12 months ago

  • Tags set to backport_processed
#15

Updated by Yuri Weinstein 11 months ago

Kamoltat (Junior) Sirivadhna wrote:

quincy: https://github.com/ceph/ceph/pull/51413

merged

#16

Updated by Konstantin Shalygin about 2 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100