Bug #17023: OSD failed to subscribe skipped osdmaps after "ceph osd pause" - Ceph - Ceph



Added by Kefu Chai over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: Kefu Chai
Source: Community (user)
Backport: jewel
Severity: 3 - minor

Description

Per Wido's comment in #16982-7:

I tried adding new OSDs to the cluster and they also have to catch up, which never happens until I restart them over and over.
osd.136 in this case is a fresh OSD. You can see it jump ahead 1,000 maps (osd_map_message_max), but then it just waits.
I restart the OSD, it goes 1k maps forward and waits. I restart it again, and so on.
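The catch-up pattern in the quote can be put in numbers. The following is a minimal Python sketch, assuming (as the report implies) that each monitor reply carries at most osd_map_message_max consecutive epochs. Under that assumption an OSD that is N epochs behind needs ceil(N / osd_map_message_max) subscribe/reply round trips, and each new subscription must ask for the epoch right after the last one received. The function names and epoch values are hypothetical; this is not Ceph code.

```python
"""Sketch: an OSD catching up on osdmaps in bounded batches.

Assumption (hypothetical model): each monitor reply carries at most
`max_per_msg` consecutive epochs (osd_map_message_max, 1,000 in the report),
so the OSD must send one fresh subscription per batch.
"""
import math
from typing import List, Tuple


def catch_up_batches(have: int, newest: int, max_per_msg: int = 1000) -> List[Tuple[int, int]]:
    """Return the (first, last) epoch range delivered by each reply.

    `have` is the newest epoch the OSD already holds; `newest` is the
    cluster's current epoch.
    """
    if max_per_msg <= 0:
        raise ValueError("max_per_msg must be positive")
    batches = []
    start = have + 1
    while start <= newest:
        last = min(start + max_per_msg - 1, newest)
        batches.append((start, last))
        start = last + 1  # the next subscription must ask for last + 1
    return batches


def round_trips(have: int, newest: int, max_per_msg: int = 1000) -> int:
    """Number of subscribe/reply round trips needed to catch up."""
    return math.ceil(max(newest - have, 0) / max_per_msg)


print(catch_up_batches(4000, 6500))
# [(4001, 5000), (5001, 6000), (6001, 6500)]
print(round_trips(4000, 6500))  # 3
```

If any one of those follow-up subscriptions is never sent, the OSD stops after the batch it already has, which matches the "1k maps forward and waits" behavior described above.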

The root cause is analyzed in #16982-11.

In short: after "ceph osd pause", the osdmap subscription sent by the Objecter always gets in the way of the OSD's own subscription, so the OSD cannot subscribe to the older osdmaps it needs to catch up with the cluster.

So a workaround is to avoid running "ceph osd pause".
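The root-cause summary can be illustrated with a toy model. The sketch below is a hypothetical Python model, not Ceph's MonClient or Objecter code. It assumes the OSD and its Objecter share one client that keeps a single pending "osdmap" subscription, so whichever component asks last decides which start epoch goes to the monitor. ToySubscriptions, want(), send(), next_request() and objecter_epoch are all illustrative names. With the pause flags set, the Objecter's request for a newer epoch replaces the OSD's request for the older epochs it is missing.

```python
"""Toy model: two components sharing one "osdmap" subscription slot.

Assumptions (hypothetical, not Ceph's real API): the client keeps a single
pending subscription per map name, and a later want() for the same name
replaces the earlier one, so the last writer decides the start epoch sent.
"""
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToySubscriptions:
    pending: Dict[str, int] = field(default_factory=dict)

    def want(self, name: str, start: int) -> None:
        # One slot per name: this overwrites any earlier request.
        self.pending[name] = start

    def send(self, name: str) -> Optional[int]:
        # Return (and clear) the start epoch that would go to the monitor.
        return self.pending.pop(name, None)


def next_request(osd_have: int, objecter_epoch: int, paused: bool) -> Optional[int]:
    subs = ToySubscriptions()
    # The OSD asks for the epoch right after the newest one it holds.
    subs.want("osdmap", osd_have + 1)
    if paused:
        # In this model, with the pause flags set the Objecter keeps
        # re-requesting the map after its own (newer) epoch, clobbering
        # the OSD's request for the older maps.
        subs.want("osdmap", objecter_epoch + 1)
    return subs.send("osdmap")


print(next_request(1000, 6500, paused=False))  # 1001
print(next_request(1000, 6500, paused=True))   # 6501
```

In this model the OSD's own start epoch only reaches the monitor when the cluster is not paused, which is consistent with the suggested workaround.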

Files

ceph-osd.136.log.gz (47.9 KB)

Wido den Hollander, 08/11/2016 06:51 AM

Related issues: 2 (0 open, 2 closed)


Updated by Kefu Chai over 7 years ago

Updated by Wido den Hollander over 7 years ago

Updated by Kefu Chai over 7 years ago

Updated by Kefu Chai over 7 years ago

Updated by Kefu Chai over 7 years ago

Updated by Loïc Dachary over 7 years ago

Updated by Loïc Dachary over 7 years ago

Updated by Loïc Dachary over 7 years ago

Updated by Loïc Dachary over 7 years ago

Updated by Loïc Dachary over 7 years ago