Bug #52781
closed
shard-threads cannot wakeup bug
100%
Description
osd: fix shard-threads cannot wakeup bug
Reproduce:
(1) the ceph cluster is not serving any client IO
(2) the only operation performed is: ceph osd in osd.14
Reason:
(1) one shard-queue is served by three shard-threads
(2) one or more PeeringOps carry an epoch newer than the osdmap epoch held by the current osd,
so these PeeringOps are parked via _add_slot_waiter()
(3) the shard-queue becomes empty and all three shard-threads block in cond.wait()
(4) a new osdmap is consumed, which calls _wake_pg_slot()
The problem is here:
1> OSDShard::consume() loops over the waiters of every pg slot
and requeues more than one PeeringOp to the shard-queue
2> but it notifies only one shard-thread to wake up;
the other two shard-threads stay in cond.wait()
3> OSD::ShardedOpWQ::_enqueue() finds the shard-queue not empty
and therefore does not notify the remaining shard-threads
As a result, for a period of time only one of the three shard-threads is running.
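The sequence above can be reproduced outside Ceph with a small C++ sketch. It is not the OSD code: Shard, shard_thread() and busy_after_requeue() are hypothetical names, and the "gate" that keeps an op in progress only simulates a slow PeeringOp. The sketch parks every worker in cond.wait(), requeues one op per worker in a single batch, wakes the workers with either notify_one() or notify_all(), and reports how many workers are processing at the same time.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one OSD shard; not Ceph's real OSDShard.
struct Shard {
  std::mutex lock;
  std::condition_variable cond;      // shard-threads sleep here when the queue is empty
  std::condition_variable observer;  // lets the caller watch the idle/busy counters
  std::condition_variable gate;      // an op stays "in progress" until the gate opens
  std::deque<int> queue;             // requeued ops
  int idle = 0;                      // threads parked in (or about to re-check) cond.wait()
  int busy = 0;                      // threads currently processing an op
  bool gate_open = false;
  bool stop = false;
};

void shard_thread(Shard& s) {
  std::unique_lock<std::mutex> l(s.lock);
  for (;;) {
    ++s.idle;
    s.observer.notify_all();
    s.cond.wait(l, [&] { return s.stop || !s.queue.empty(); });
    --s.idle;
    if (s.queue.empty()) return;  // stop requested and nothing left to process
    s.queue.pop_front();
    ++s.busy;
    s.observer.notify_all();
    s.gate.wait(l, [&] { return s.gate_open; });  // simulate a slow op
    --s.busy;
  }
}

// Returns how many threads are processing concurrently after one batch requeue.
int busy_after_requeue(bool wake_all, int threads = 3) {
  assert(threads > 0);
  Shard s;
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i) pool.emplace_back([&s] { shard_thread(s); });

  int seen = 0;
  {
    std::unique_lock<std::mutex> l(s.lock);
    // Step (3): wait until every shard-thread is blocked in cond.wait().
    s.observer.wait(l, [&] { return s.idle == threads; });
    // Step (4): requeue several ops at once, then wake one thread or all of them.
    for (int i = 0; i < threads; ++i) s.queue.push_back(i);
    if (wake_all) s.cond.notify_all(); else s.cond.notify_one();
    // At least one thread must pick up work; give the others a grace period.
    s.observer.wait(l, [&] { return s.busy >= 1; });
    s.observer.wait_for(l, std::chrono::milliseconds(1000),
                        [&] { return s.busy == threads; });
    seen = s.busy;
    s.gate_open = true;  // let in-progress ops finish
    s.stop = true;       // and shut the pool down once the queue drains
  }
  s.gate.notify_all();
  s.cond.notify_all();
  for (auto& t : pool) t.join();
  return seen;
}
```

With wake_all set to true, all three workers end up busy at once. With wake_all set to false, the count normally stays at one until the first op finishes, which matches the behaviour described in this report (a spurious wakeup could raise it, so the exact value is not guaranteed). Waking every waiter after a batch requeue is one possible remedy; whether it is the change made in the linked pull request is not stated here.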
Updated by jianwei zhang over 2 years ago
Updated by Neha Ojha over 2 years ago
- Project changed from Ceph to RADOS
- Category deleted (OSD)
Updated by Kefu Chai over 2 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 43360
Updated by Kefu Chai over 2 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to octopus, pacific
Updated by Backport Bot over 2 years ago
- Copied to Backport #52840: octopus: shard-threads cannot wakeup bug added
Updated by Backport Bot over 2 years ago
- Copied to Backport #52841: pacific: shard-threads cannot wakeup bug added
Updated by Konstantin Shalygin 5 months ago
- Status changed from Pending Backport to Resolved
- Assignee set to jianwei zhang
- % Done changed from 0 to 100