Bug #44159: [rbd-mirror] Mirror daemon never recovers from being blacklisted - rbd - Ceph

Added by Oliver Freyermuth about 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee: Mykola Golub
Backport: luminous,mimic,nautilus
Severity: 2 - major
Affected Versions: Ceph - v14.2.7
Pull request ID: 33411


Description

I can reproduce this rather reliably by:
- Restarting many OSDs (old nodes with slow spinning disks, likely exceeding the default blacklist timeout).
- Sometimes it also happens when restarting the other RBD mirror daemons (we have 3).

The attached log, captured at log level 15, is from one blacklisted RBD mirror daemon that was unable to recover.
RBD volume names and domains are sanitized; otherwise the log is untouched.

Files

ceph-client.rbd_mirror_backup.log.gz (479 KB), added by Oliver Freyermuth, 02/15/2020 05:18 PM

Related issues: 3 (0 open, 3 closed)


Updated by Mykola Golub about 4 years ago

Status changed from New to In Progress
Assignee set to Mykola Golub


Updated by Mykola Golub about 4 years ago

Status changed from In Progress to Fix Under Review
Pull request ID set to 33411

The provided log contains many messages like these:

2020-02-14 02:14:56.653 7f42f7ac1700 -1 rbd::mirror::InstanceReplayer: 0x55bab1dc3b80 start_image_replayer: global_image_id=446b538f-1f61-4daa-b05f-93f76cd5e652: blacklisted detected during image replay

2020-02-14 02:14:56.660 7f42f7ac1700  5 rbd::mirror::LeaderWatcher: 0x55bab29a9200 handle_rewatch_complete: r=-108

So both rbd-mirror's InstanceReplayer and LeaderWatcher detected the "blacklisted" state, but it was not propagated up to the higher level to restart the PoolReplayer.
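For illustration only, the missing propagation step can be sketched in Python. The classes below are hypothetical stand-ins that borrow rbd-mirror's component names; they are not Ceph's C++ implementation, and the fix in PR 33411 may be structured differently. The sketch assumes a blacklisted client surfaces as -ESHUTDOWN (108 on Linux, matching the r=-108 in the LeaderWatcher line above). Each child reports that result to its parent instead of only logging it, and the parent restarts on its next pass.

```python
import errno

# Assumption: a blacklisted client surfaces as -ESHUTDOWN (108 on Linux).
BLACKLISTED = -errno.ESHUTDOWN


class PoolReplayer:
    """Hypothetical parent: restarts once any child reports blacklisting."""

    def __init__(self):
        self.blacklisted = False
        self.restarts = 0

    def notify_blacklisted(self):
        # Children call this instead of only logging the condition locally.
        self.blacklisted = True

    def run_once(self):
        # One pass of a hypothetical periodic loop: restart if flagged.
        if self.blacklisted:
            self.restart()

    def restart(self):
        # A real daemon would shut down its children and reconnect with a
        # new client; this sketch only counts restarts and clears the flag.
        self.restarts += 1
        self.blacklisted = False


class Child:
    """Hypothetical shared base for the LeaderWatcher/InstanceReplayer stand-ins."""

    def __init__(self, parent):
        self.parent = parent

    def handle_result(self, r):
        # Propagate blacklisting upward; other results are left to the caller.
        if r == BLACKLISTED:
            self.parent.notify_blacklisted()
        return r


class LeaderWatcher(Child):
    def handle_rewatch_complete(self, r):
        return self.handle_result(r)


class InstanceReplayer(Child):
    def handle_image_replay_result(self, r):  # hypothetical method name
        return self.handle_result(r)


pool = PoolReplayer()
watcher = LeaderWatcher(pool)
replayer = InstanceReplayer(pool)

replayer.handle_image_replay_result(0)        # success: nothing propagated
watcher.handle_rewatch_complete(BLACKLISTED)  # blacklisted: flag the parent
pool.run_once()
print(pool.restarts)  # prints 1
```

In this sketch the children only set a flag and the owner performs the restart, so a child never tears down the object that owns it.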


Updated by Jason Dillaman about 4 years ago

Backport set to luminous,mimic,nautilus


Updated by Jason Dillaman about 4 years ago

Status changed from Fix Under Review to Pending Backport


Updated by Nathan Cutler about 4 years ago

Copied to Backport #44262: mimic: [rbd-mirror] Mirror daemon never recovers from being blacklisted added


Updated by Nathan Cutler about 4 years ago

Copied to Backport #44263: nautilus: [rbd-mirror] Mirror daemon never recovers from being blacklisted added


Updated by Nathan Cutler about 4 years ago

Copied to Backport #44264: luminous: [rbd-mirror] Mirror daemon never recovers from being blacklisted added


Updated by Nathan Cutler about 3 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
