Bug #38124
closedOSD down on snaptrim.
0%
Description
All of ceph cluster OSD's crash when ceph runs snaptrim.
The particular error osd is throwing before crashing is:
2019-01-31 10:46:01.310 7fbb2fd45700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.4/rpm/el7/BUILD/ceph-13.2.4/src/osd/PrimaryLogPG.h: In function 'Pri
maryLogPG::Trimming::Trimming(boost::statechart::state<PrimaryLogPG::Trimming, PrimaryLogPG::SnapTrimmer, PrimaryLogPG::WaitReservation>::my_context)' thread 7fbb2fd45700 time 2019-01-31 10:46:01.306356
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.4/rpm/el7/BUILD/ceph-13.2.4/src/osd/PrimaryLogPG.h: 1571: FAILED assert(context< SnapTrimmer >().can_trim())
ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7fbb59f3716f]
2: (()+0x25a337) [0x7fbb59f37337]
3: (PrimaryLogPG::NotTrimming::react(PrimaryLogPG::KickTrim const&)+0x783) [0x559a606efbc3]
4: (boost::statechart::simple_state<PrimaryLogPG::NotTrimming, PrimaryLogPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na
, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xa9) [0x559a6073b629]
5: (boost::statechart::state_machine<PrimaryLogPG::SnapTrimmer, PrimaryLogPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x559a60715f23]
6: (boost::statechart::state_machine<PrimaryLogPG::SnapTrimmer, PrimaryLogPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x559a60716187]
7: (_ZZN12PrimaryLogPG13WaitTrimTimerC4EN5boost10statechart5stateIS0_NS_8TrimmingENS1_3mpl4listIN4mpl_2naES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EELNS2_12history_modeE0EE10my_contextEEN7OnTimer6finishEi()+0xb2) [0x559a607163a2]
8: (Context::complete(int)+0x9) [0x559a60580e49]
9: (SafeTimer::timer_thread()+0x18b) [0x7fbb59f33a8b]
10: (SafeTimerThread::entry()+0xd) [0x7fbb59f3504d]
11: (()+0x7dd5) [0x7fbb56ab8dd5]
12: (clone()+0x6d) [0x7fbb55ba8ead]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
The rest of the log is attached with the file.
Files
Updated by Darius Kasparavičius about 5 years ago
- File ceph-osd.tar.gz ceph-osd.tar.gz added
Hello,
I have collected additional information Sage asked. Attached log has debug_osd=20 set.
How this happened:
1. One of the nodes had all it's osd's set to out. To clean them up for replacement.
2. Noticed that a lot of snaptrim was running.
3. Set nosnaptrim flag on the cluster.
4. Once mon_osd_snap_trim_queue_warn_on appeared. Removed nosnaptrim flag.
5. All osds on the cluster crashed and started flapping. Set nosnaptrim flag back on.
Updated by Greg Farnum about 5 years ago
I was theorizing in a bug scrub that maybe the PG was running behind on OSDMaps and so missing the nosnaptrim flag update, but that isn't the case — the OSD doesn't look at it directly at all, just the PG when it activates a map.
However, since the crash came from the WaitTrimTimer state's timer triggering a transition into NotTrimming and posting a KickTrim event, I think it's safe to say there's some race or missed timer cleanup that causes this when the flag changes state. Since the timer is cleaned up when you exit the WaitTrimTimer state, that also seems a bit odd, but maybe...oh, in fact, I don't see anything that directly kills it. Maybe we do a wide reset() somewhere? Kinda looks like we only do that in PrimaryLogPG::on_change() and that is only triggered in interval changes, though.
(Also interesting: the WaitRepops state goes into WaitTrimTimer if !can_trim() when the replies come back.)
Updated by Darius Kasparavičius about 5 years ago
Hello,
any updates regarding this bug? I would love a patch to resolve this issue ASAP. One of my monitors just died and I can't add new one. As it's throwing slow io errors while trying to synchronise.
Updated by Erikas Kučinskis about 5 years ago
Hello it's been two months now is there any update about this bug?
Updated by David Zafman almost 5 years ago
- Status changed from New to In Progress
- Assignee set to David Zafman
I am able to reproduce this, so I'll work on a fix.
Updated by David Zafman almost 5 years ago
The following script sometimes hits the race and crashes an OSD. I've removed the assert and the script has been running in a loop without seeing any other core dumps.
#! /bin/bash -x ../src/stop.sh MGR=1 MON=1 MDS=0 OSD=5 ../src/vstart.sh -l -d -n -o osd_snap_trim_sleep=5.0 2> /dev/null sleep 5 bin/ceph osd pool create test 1 1 2> /dev/null sleep 5 sleep 2 bin/ceph pg dump pgs 2> /dev/null for s in $(seq 1 20) do dd if=/dev/urandom bs=1m count=1 of=data for i in $(seq 1 100) do bin/rados -p test put obj$i data 2> /dev/null done bin/rados -p test mksnap snap${s} 2> /dev/null done while(true); do bin/ceph osd set nosnaptrim; sleep 1; bin/ceph osd unset nosnaptrim; done & for s in $(seq 1 20) do bin/rados -p test rmsnap snap$s sleep 3 done sleep 60 bin/ceph status bin/ceph osd dump kill %% wait bin/ceph status bin/ceph osd dump
Updated by David Zafman almost 5 years ago
- Status changed from In Progress to Fix Under Review
Updated by Erikas Kučinskis almost 5 years ago
Hi is there any ETA when the bug will be live?
Updated by Erikas Kučinskis almost 5 years ago
Erikas Kučinskis wrote:
Hi is there any ETA when the bug fix will be live?
Updated by Greg Farnum almost 5 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to mimic, nautilus
No ETA; it'll have to wend its way through the backports process. I don't think any releases are imminent so it should be the next point release though.
Updated by Erikas Kučinskis almost 5 years ago
Greg Farnum wrote:
No ETA; it'll have to wend its way through the backports process. I don't think any releases are imminent so it should be the next point release though.
Thank you for the information.
Updated by Nathan Cutler almost 5 years ago
- Copied to Backport #39698: mimic: OSD down on snaptrim. added
Updated by Nathan Cutler almost 5 years ago
- Copied to Backport #39699: nautilus: OSD down on snaptrim. added
Updated by Nathan Cutler almost 5 years ago
- Status changed from Pending Backport to Resolved