Bug #43825

osd stuck down

Added by Sage Weil 10 months ago. Updated 6 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Tags:
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

The OSD is stuck at epoch 99 while the cluster is at roughly epoch 2000.

The monc (MonClient) fails to reconnect to the mon.

/a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4703135


Related issues

Copied to RADOS - Backport #43919: nautilus: osd stuck down Resolved

History

#1 Updated by Sage Weil 10 months ago

huh, also /a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4703159 osd.7

#2 Updated by Sage Weil 10 months ago

  • Status changed from New to Need More Info

#3 Updated by Sage Weil 10 months ago

  • Target version set to v15.0.0

#4 Updated by Sage Weil 10 months ago

  • Subject changed from monc fails to reconnect, osd stuck down to osd stuck down
2020-01-28T15:07:56.366+0000 7fd3bc46f700 10 osd.6 400 handle_osd_map waiting for pgs to consume 300 (shard 0 min 0, map cache is 50, max_lag_factor 2)

The OSD is stuck at an old epoch; the cluster marks it down and it does not come back up.

/a/sage-2020-01-28_03:52:05-rados-wip-sage2-testing-2020-01-27-1839-distro-basic-smithi/4713213

Thread 38 (Thread 0x7fd3bc46f700 (LWP 40649)):
#0  0x00007fd3d20884dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3d1749780 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x000055b84445f8bf in std::condition_variable::wait<OSDShard::wait_min_pg_epoch(epoch_t)::<lambda()> > (__p=..., __lock=..., this=0x55b85090e858) at /usr/src/debug/ceph-15.0.0-9869.g4b944a6.el8.x86_64/src/log/Entry.h:35
#3  OSDShard::wait_min_pg_epoch (this=0x55b85090e6c0, need=300) at /usr/src/debug/ceph-15.0.0-9869.g4b944a6.el8.x86_64/src/osd/OSD.cc:10007
#4  0x000055b84445ffd8 in OSD::handle_osd_map (this=0x55b84fce8000, m=0x55b853c5b440) at /usr/src/debug/ceph-15.0.0-9869.g4b944a6.el8.x86_64/src/osd/OSD.cc:7770
#5  0x000055b844474b8b in OSD::_dispatch (this=0x55b84fce8000, m=0x55b853c5b440) at /usr/src/debug/ceph-15.0.0-9869.g4b944a6.el8.x86_64/src/osd/OSD.cc:7180
#6  0x000055b844474e16 in OSD::ms_dispatch (this=0x55b84fce8000, m=0x55b853c5b440) at /usr/src/debug/ceph-15.0.0-9869.g4b944a6.el8.x86_64/src/osd/OSD.cc:6917
#7  0x000055b844db4228 in Dispatcher::ms_dispatch2 (m=..., this=0x55b84fce8000) at /usr/src/debug/ceph-15.0.0-9869.g4b944a6.el8.x86_64/src/msg/Dispatcher.h:124
#8  Messenger::ms_deliver_dispatch (m=..., this=0x55b84fb7a800) at /usr/src/debug/ceph-15.0.0-9869.g4b944a6.el8.x86_64/src/msg/Messenger.h:693
#9  DispatchQueue::entry (this=0x55b84fb7ab28) at /usr/src/debug/ceph-15.0.0-9869.g4b944a6.el8.x86_64/src/msg/DispatchQueue.cc:199
#10 0x000055b844c0c5d1 in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-9869.g4b944a6.el8.x86_64/src/msg/DispatchQueue.h:101
#11 0x00007fd3d20822de in start_thread () from /lib64/libpthread.so.0
#12 0x00007fd3d0e2c4b3 in clone () from /lib64/libc.so.6
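
Frames #2–#3 of the backtrace show the map-dispatch thread parked in OSDShard::wait_min_pg_epoch on a condition variable that is only signalled when PGs consume maps. A minimal Python sketch of that blocking pattern (hypothetical names; Ceph's real implementation is C++ in src/osd/OSD.cc):

```python
import threading

class ShardSketch:
    """Toy model of OSDShard::wait_min_pg_epoch: the dispatch thread
    blocks until every PG on the shard has consumed maps up to the
    needed epoch. (Hypothetical names; not Ceph's actual code.)"""
    def __init__(self):
        self.cond = threading.Condition()
        self.pg_epochs = {}  # pgid -> last consumed epoch

    def min_pg_epoch(self):
        return min(self.pg_epochs.values(), default=None)

    def wait_min_pg_epoch(self, need, timeout=None):
        # Returns True once the shard has caught up, False on timeout.
        with self.cond:
            return self.cond.wait_for(
                lambda: self.min_pg_epoch() is None
                        or self.min_pg_epoch() >= need,
                timeout)

    def advance_pg(self, pgid, epoch):
        with self.cond:
            self.pg_epochs[pgid] = epoch
            self.cond.notify_all()

shard = ShardSketch()
shard.advance_pg("1.5", 268)
# A stalled PG (like 1.25 here) pins min_pg_epoch at its old value,
# so the dispatch thread never wakes up:
assert shard.wait_min_pg_epoch(300, timeout=0.1) is False
shard.advance_pg("1.5", 300)
assert shard.wait_min_pg_epoch(300, timeout=0.1) is True
```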

it's stuck here:
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 400 advance_pg 1.5 is merge target, sources are 1.25
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 400 advance_pg have 0/1
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 400 advance_pg not ready to merge yet
2020-01-28T15:07:56.366+0000 7fd3ac071700 10 log is not dirty
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 400 advance_pg kicking source 1.25
2020-01-28T15:07:56.366+0000 7fd3ac071700 15 osd.6 400 enqueue_peering_evt 1.25 epoch_sent: 282 epoch_requested: 282 NullEvt
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 op_wq(5) _enqueue OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282)
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 op_wq(5) _process 1.25 to_process <> waiting <> waiting_peering {282=<OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282),OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282),OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282)>}
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 op_wq(5) _process OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282) queued
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 op_wq(5) _process 1.25 to_process <OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282)> waiting <> waiting_peering {282=<OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282),OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282),OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282)>}
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 op_wq(5) _process OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282) pg 0
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 op_wq(5) _process 1.25 splitting 268
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 op_wq _add_slot_waiter 1.25 peering, item epoch is 282, will wait on OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282)
2020-01-28T15:07:56.366+0000 7fd3ac071700 20 osd.6 op_wq(5) _process empty q, waiting

#5 Updated by Sage Weil 10 months ago

2020-01-28T14:56:26.155+0000 7fd3ba08d700 20 osd.6 285 identify_splits_and_merges 1.5 e245 to e285 pg_nums {76=28,89=38,116=37,119=36,123=35,127=34,131=33,135=32,139=31,143=30,147=29,151=28,179=27,189=28,248=27,251=28,268=38,282=37}

epoch 268: 1.25 is logically created by splitting from 1.5
epoch 282: 1.25 logically merges back into 1.5
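
The pg_num history in the log line above is enough to derive both epochs: pg_num grows past seed 37 at epoch 268 and shrinks back to it at epoch 282, and 1.25's split/merge partner is 1.5 (drop the top bit of seed 0x25). A toy Python sketch of that arithmetic (hypothetical function names; the real logic is the identify_splits_and_merges machinery seen in the log, which is more involved):

```python
def parent_seed(seed):
    """Drop the top set bit of the placement seed: pg 1.25 has seed
    0x25 = 37 = 0b100101, whose parent/merge-target seed is
    0b00101 = 5, i.e. pg 1.5. (Simplified sketch.)"""
    if seed == 0:
        return 0
    return seed & ~(1 << (seed.bit_length() - 1))

# Abbreviated pg_num history for pool 1 from the log (epoch -> pg_num):
pg_num_history = {251: 28, 268: 38, 282: 37}

def creation_and_merge_epochs(seed, history):
    """Find the epoch where `seed` came into existence (pg_num grew
    past it) and where it merges away again (pg_num shrank back)."""
    created = merged = None
    prev = None
    for epoch in sorted(history):
        cur = history[epoch]
        if prev is not None:
            if prev <= seed < cur:
                created = epoch   # split: pg_num grew past seed
            if cur <= seed < prev:
                merged = epoch    # merge: pg_num shrank back
        prev = cur
    return created, merged

assert parent_seed(0x25) == 0x5                  # 1.25 <-> 1.5
assert creation_and_merge_epochs(37, pg_num_history) == (268, 282)
```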

#6 Updated by Sage Weil 10 months ago

  • Status changed from Need More Info to In Progress
  • Assignee set to Sage Weil

we are splitting:

2020-01-28T14:56:26.159+0000 7fd3ac071700 10 osd.6 285 split_pgs splitting pg[1.5( empty local-lis/les=195/196 n=0 ec=15/15 lis/c=195/195 les/c/f=196/196/0 sis=268) [5]/[5,4] r=-1 lpr=268 pi=[195,268)/3 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] into 1.25
2020-01-28T14:56:26.159+0000 7fd3ac071700 10 osd.6 285 _make_pg 1.25
2020-01-28T14:56:26.159+0000 7fd3ac071700  5 osd.6 pg_epoch: 268 pg[1.25(unlocked)] enter Initial
2020-01-28T14:56:26.159+0000 7fd3ac071700 20 osd.6 pg_epoch: 268 pg[1.25(unlocked)] enter NotTrimming
2020-01-28T14:56:26.159+0000 7fd3ac071700 10 bluestore(/var/lib/ceph/osd/ceph-6) _osr_attach 1.25_head fresh osr 0x55b850a28280
2020-01-28T14:56:26.159+0000 7fd3ac071700 10 osd.6 285  pg_num is 38, m_seed 37, split_bits is 6
2020-01-28T14:56:26.159+0000 7fd3ac071700 20 osd.6 pg_epoch: 268 pg[1.25( empty local-lis/les=195/196 n=0 ec=268/15 lis/c=195/195 les/c/f=196/196/0 sis=268) [5] r=-1 lpr=0 pi=[195,268)/3 crt=0'0 mlcod 0'0 unknown mbc={}] on_new_interval
2020-01-28T14:56:26.159+0000 7fd3ac071700 20 osd.6 pg_epoch: 268 pg[1.25( empty local-lis/les=195/196 n=0 ec=268/15 lis/c=195/195 les/c/f=196/196/0 sis=268) [5] r=-1 lpr=0 pi=[195,268)/3 crt=0'0 mlcod 0'0 unknown mbc={}] on_new_interval upacting_features 0x3f01cfb8ffadffff from [5]+[5]
2020-01-28T14:56:26.159+0000 7fd3ac071700 20 osd.6 pg_epoch: 268 pg[1.25( empty local-lis/les=195/196 n=0 ec=268/15 lis/c=195/195 les/c/f=196/196/0 sis=268) [5] r=-1 lpr=0 pi=[195,268)/3 crt=0'0 mlcod 0'0 unknown mbc={}] on_new_interval checking missing set deletes flag. missing = missing(0 may_include_deletes = 1)
2020-01-28T14:56:26.159+0000 7fd3ac071700 10 osd.6 pg_epoch: 268 pg[1.25( empty local-lis/les=195/196 n=0 ec=268/15 lis/c=195/195 les/c/f=196/196/0 sis=268) [5] r=-1 lpr=0 pi=[195,268)/3 crt=0'0 mlcod 0'0 unknown mbc={}] init_hb_stamps now []
2020-01-28T14:56:26.159+0000 7fd3ac071700 10 osd.6 pg_epoch: 268 pg[1.25( empty local-lis/les=195/196 n=0 ec=268/15 lis/c=195/195 les/c/f=196/196/0 sis=268) [5] r=-1 lpr=0 pi=[195,268)/3 crt=0'0 mlcod 0'0 unknown mbc={}] on_new_interval prior_readable_until_ub 0s (mnow 0.072469917s + 0s)
2020-01-28T14:56:26.159+0000 7fd3ac071700 10 osd.6 pg_epoch: 268 pg[1.25( empty local-lis/les=195/196 n=0 ec=268/15 lis/c=195/195 les/c/f=196/196/0 sis=268) [5] r=-1 lpr=0 pi=[195,268)/3 crt=0'0 mlcod 0'0 unknown mbc={}] cancel_recovery
2020-01-28T14:56:26.159+0000 7fd3ac071700 10 osd.6 pg_epoch: 268 pg[1.25( empty local-lis/les=195/196 n=0 ec=268/15 lis/c=195/195 les/c/f=196/196/0 sis=268) [5] r=-1 lpr=0 pi=[195,268)/3 crt=0'0 mlcod 0'0 unknown mbc={}] clear_recovery_state
2020-01-28T14:56:26.159+0000 7fd3ac071700 10 osd.6 pg_epoch: 268 pg[1.5( empty local-lis/les=195/196 n=0 ec=15/15 lis/c=195/195 les/c/f=196/196/0 sis=268) [5]/[5,4] r=-1 lpr=268 pi=[195,268)/3 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] release_backoffs [MIN,MAX)
...

and then we bail out of advance_pg because there is a merge coming up:
2020-01-28T14:56:26.162+0000 7fd3ac071700 20 osd.6 285 advance_pg not ready to merge yet
2020-01-28T14:56:26.162+0000 7fd3ac071700 10 log is not dirty
2020-01-28T14:56:26.162+0000 7fd3ac071700 20 osd.6 285 advance_pg kicking source 1.25
2020-01-28T14:56:26.162+0000 7fd3ac071700 15 osd.6 285 enqueue_peering_evt 1.25 epoch_sent: 282 epoch_requested: 282 NullEvt
2020-01-28T14:56:26.162+0000 7fd3ac071700 20 osd.6 op_wq(5) _enqueue OpSchedulerItem(1.25 PGPeeringEvent(epoch_sent: 282 epoch_requested: 282 NullEvt) prio 255 cost 10 e282)
...
2020-01-28T14:56:26.162+0000 7fd3ac071700 20 osd.6 op_wq(5) _process 1.25 to_process <> waiting <> waiting_peering {}

but we do not submit the rctx transaction (and thus do not queue the split finisher) before dropping the PG lock
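
In other words, the split of 1.25 is never finalized, so the peering events parked in waiting_peering at epoch 282 are never requeued and the OSD can never consume maps past the merge. A toy Python model of the deadlock (hypothetical names and structure; the real logic lives in OSD::advance_pg and the op_wq slot code):

```python
# The split of 1.5 into 1.25 is only finalized when the rctx
# transaction commits and queues the "split finished" completion.
# If advance_pg bails out early for the upcoming merge without
# submitting that transaction, the child slot stays "splitting"
# and every peering event for 1.25 is parked forever.

class SlotSketch:
    def __init__(self):
        self.splitting = True       # set once the split is identified
        self.waiting_peering = []   # events parked until split finishes

    def deliver_peering_evt(self, evt):
        if self.splitting:
            self.waiting_peering.append(evt)   # _add_slot_waiter
            return "parked"
        return "processed"

    def split_finished(self):
        # Runs from the transaction completion; requeues waiters.
        self.splitting = False
        requeued, self.waiting_peering = self.waiting_peering, []
        return requeued

def advance_pg(slot, submit_rctx):
    # Buggy path: bail for the pending merge WITHOUT submitting the
    # rctx transaction, so split_finished() never runs.
    if submit_rctx:
        slot.split_finished()

buggy = SlotSketch()
advance_pg(buggy, submit_rctx=False)
assert buggy.deliver_peering_evt("NullEvt e282") == "parked"   # stuck

fixed = SlotSketch()
advance_pg(fixed, submit_rctx=True)        # fix: submit before unlock
assert fixed.deliver_peering_evt("NullEvt e282") == "processed"
```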

#7 Updated by Sage Weil 10 months ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to nautilus
  • Pull request ID set to 32942

#8 Updated by Sage Weil 10 months ago

  • Status changed from Fix Under Review to Pending Backport

#9 Updated by Nathan Cutler 10 months ago

#10 Updated by Nathan Cutler 6 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
