Feature #12476

closed

thrashosds: send sighup to daemons

Added by Sage Weil almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:

Description

Add a thrasher option to send SIGHUP to a random daemon at a configurable interval. Default to something pretty frequent (say, 0.1 seconds?).

Actions #1

Updated by Zack Cerza almost 9 years ago

  • Tracker changed from Bug to Feature
Actions #2

Updated by Zack Cerza over 8 years ago

This is to test the fix for http://tracker.ceph.com/issues/12465

Actions #3

Updated by Zack Cerza over 8 years ago

I think Sage means "a random OSD daemon", correct?

Actions #4

Updated by Zack Cerza over 8 years ago

So I'm not totally clear on this, but I think the proposed change would be to:

  1. Add a feature to ceph_manager.Thrasher to periodically send a HUP signal to a random OSD at a given interval
  2. Have thrashosds use that feature (by default?) with a value of 0.1s

Is this correct? And, are we okay with the amount of logging this would generate?

Actions #5

Updated by Zack Cerza over 8 years ago

Looking at Thrasher.do_thrash(), I have another question:

Should this be something chosen by choose_action() as called here:
https://github.com/ceph/ceph-qa-suite/blob/master/tasks/ceph_manager.py#L679

Or as a codepath similar to:
https://github.com/ceph/ceph-qa-suite/blob/master/tasks/ceph_manager.py#L661

?

Actions #6

Updated by Sage Weil over 8 years ago

We'd like to send SIGHUP at a higher frequency than the existing event loop allows. I don't think sending it every few seconds is enough to reliably trigger the original race.

Ideally it'd be a separate thread/greenlet/whatever thing and the interval could be measured in milliseconds (I suggest 100ms default). Sam, do you agree? Or is every few seconds enough?

Actions #7

Updated by Samuel Just over 8 years ago

Might just want to add it to daemon-helper. I don't think we need to put it in do_thrash; if we don't put it in daemon-helper, I'd rather just spawn an extra thread from the manager task that does only that.

Actions #8

Updated by Sage Weil over 8 years ago

  • Priority changed from Urgent to Immediate
Actions #9

Updated by Andrew Schoen over 8 years ago

  • Assignee set to Andrew Schoen
Actions #10

Updated by Andrew Schoen over 8 years ago

Zack and I talked about a proposed solution. Here's what we're thinking:

1) Add a method to CephManager called signal_osd that takes an OSD ID and a signal ID.
2) Add a new config option to the thrashosds task that sets the delay between SIGHUPs.
3) Add a new method to Thrasher that calls CephManager.signal_osd for a random live OSD. This will run in "parallel" with Thrasher.do_thrash, in a separate greenlet. The default in Thrasher will be chosen so that only thrashosds sends the SIGHUP by default.
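
For illustration only, here is a minimal sketch of how the three steps above could fit together. It is not the code that landed in ceph-qa-suite. It uses the standard library's threading module in place of a greenlet so it runs on its own, and FakeCephManager, sighup_delay, live_osds, start() and stop() are hypothetical names; only signal_osd and Thrasher come from the proposal.

```python
import random
import signal
import threading


class FakeCephManager:
    """Hypothetical stand-in for CephManager: records signals instead of sending them."""

    def __init__(self):
        self.sent = []  # (osd_id, signal) pairs, in the order they were requested
        self._lock = threading.Lock()

    def signal_osd(self, osd_id, sig):
        # A real implementation would deliver `sig` to the OSD's process.
        with self._lock:
            self.sent.append((osd_id, sig))


class Thrasher:
    """Sketch of the SIGHUP part only; do_thrash is omitted."""

    def __init__(self, manager, live_osds, sighup_delay=None):
        self.manager = manager
        self.live_osds = list(live_osds)
        # None disables the feature, so only callers that opt in send SIGHUP.
        self.sighup_delay = sighup_delay
        self._stopping = threading.Event()
        self._sighup_thread = None

    def _do_sighup(self):
        # Event.wait() returns False on timeout and True once stop() is called,
        # so the loop fires roughly every sighup_delay seconds until stopped.
        while not self._stopping.wait(self.sighup_delay):
            if not self.live_osds:
                continue
            osd = random.choice(self.live_osds)
            self.manager.signal_osd(osd, signal.SIGHUP)

    def start(self):
        # Runs alongside the main thrash loop rather than inside it.
        if self.sighup_delay is not None:
            self._sighup_thread = threading.Thread(
                target=self._do_sighup, daemon=True
            )
            self._sighup_thread.start()

    def stop(self):
        self._stopping.set()
        if self._sighup_thread is not None:
            self._sighup_thread.join()
```

In this sketch a thrashosds-style caller would pass sighup_delay=0.1 to opt in, while any other user of Thrasher would leave it at None and never send the signal. Waiting on an Event rather than sleeping lets stop() interrupt the loop promptly even with a long delay.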

Actions #11

Updated by Sage Weil over 8 years ago

sounds perfect

Actions #12

Updated by Andrew Schoen over 8 years ago

  • Status changed from New to 7