Feature #12476
closed
thrashosds: send sighup to daemons
Description
Add a thrasher option to send SIGHUP to a random daemon at some frequency. Default to something pretty frequent (say 0.1 seconds?).
Updated by Zack Cerza over 8 years ago
This is to test the fix for http://tracker.ceph.com/issues/12465
Updated by Zack Cerza over 8 years ago
I think Sage means "a random OSD daemon", correct?
Updated by Zack Cerza over 8 years ago
So I'm not totally clear on this, but I think the proposed change would be to:
- Add a feature to ceph_manager.Thrasher to periodically send a HUP signal to a random OSD at a given interval
- Have thrashosds use that feature (by default?) with a value of 0.1s
Is this correct? And, are we okay with the amount of logging this would generate?
Updated by Zack Cerza over 8 years ago
Looking at Thrasher.do_thrash(), I have another question:
Should this be something chosen by choose_action() as called here:
https://github.com/ceph/ceph-qa-suite/blob/master/tasks/ceph_manager.py#L679
Or as a codepath similar to:
https://github.com/ceph/ceph-qa-suite/blob/master/tasks/ceph_manager.py#L661
?
Updated by Sage Weil over 8 years ago
We'd like to send SIGHUP with a higher frequency than the existing event loop allows. I don't think sending it every few seconds is enough to reliably trigger the original race.
Ideally it'd be a separate thread/greenlet/whatever thing and the interval could be measured in milliseconds (I suggest 100ms default). Sam, do you agree? Or is every few seconds enough?
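For illustration, the "separate thread with a 100ms interval" idea could be sketched as follows. This is only a sketch, not code from ceph-qa-suite: start_sighup_loop, get_pid and send are hypothetical names, and a plain stdlib thread stands in for a greenlet.

```python
import os
import signal
import threading


def start_sighup_loop(get_pid, interval=0.1, send=os.kill):
    """Send SIGHUP to get_pid() every `interval` seconds in a background thread.

    Returns (thread, stop_event); set stop_event to end the loop.
    `send` defaults to os.kill and can be swapped out for testing.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop is set,
        # so it doubles as an interruptible sleep.
        while not stop.wait(interval):
            send(get_pid(), signal.SIGHUP)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop
```

Because the loop runs on its own timer, its interval is independent of however long one pass of the main thrashing loop takes.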
Updated by Samuel Just over 8 years ago
Might just want to add it to daemon helper. I don't think we need to put it in do_thrash; if we don't put it in daemon helper, I would rather just spawn an extra thread doing only that from the manager task.
Updated by Andrew Schoen over 8 years ago
Zack and I talked about a proposed solution. Here's what we're thinking:
1) Add a method to CephManager called signal_osd that takes an OSD id and a signal ID.
2) Add a new config option to the thrashosds task that sets the delay between SIGHUPs.
3) Add a new method to Thrasher which performs the call to CephManager.signal_osd for a random live OSD. This will run in "parallel" with Thrasher.do_thrash in a separate greenlet. The defaults in Thrasher will be set so that only thrashosds sends the SIGHUP by default.
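A rough sketch of how the three steps above might fit together is below. It is not the actual ceph-qa-suite change: the classes are minimal stand-ins, the option name sighup_delay and the method name do_sighup are assumptions, signal_osd only records the signal instead of delivering it, and a stdlib thread is used in place of a greenlet so the example runs on its own.

```python
import random
import signal
import threading


class CephManager:
    """Stand-in for the real manager; only tracks which OSDs are live."""

    def __init__(self, live_osds):
        self.live_osds = list(live_osds)
        self.sent = []  # (osd_id, signal) pairs, recorded instead of delivered

    def signal_osd(self, osd_id, sig):
        # Step 1: the real method would deliver `sig` to the OSD's process.
        self.sent.append((osd_id, sig))


class Thrasher:
    def __init__(self, manager, config=None):
        self.manager = manager
        config = config or {}
        # Step 2: hypothetical option name; None leaves the feature off.
        self.sighup_delay = config.get("sighup_delay")
        self.stopping = threading.Event()
        self.sighup_thread = None
        if self.sighup_delay is not None:
            self.sighup_thread = threading.Thread(
                target=self.do_sighup, daemon=True)
            self.sighup_thread.start()

    def do_sighup(self):
        # Step 3: runs alongside do_thrash, on its own much shorter interval.
        while not self.stopping.wait(self.sighup_delay):
            live = self.manager.live_osds
            if live:
                self.manager.signal_osd(random.choice(live), signal.SIGHUP)

    def stop(self):
        self.stopping.set()
        if self.sighup_thread is not None:
            self.sighup_thread.join()
```

In this sketch a caller such as thrashosds would pass {"sighup_delay": 0.1} to turn the feature on, while any other user of Thrasher that omits the option gets no SIGHUP thread at all.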
Updated by Andrew Schoen over 8 years ago
- Status changed from New to 7
Updated by Zack Cerza over 8 years ago
- Status changed from 7 to Resolved