Bug #21287 (closed)

1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"

Added by Henrik Korkuc over 6 years ago. Updated over 3 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: EC Pools
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

One PG went down during a large rebalance (I added racks to the OSD placement, so almost all data had to be shuffled). This is an RBD EC pool; min_size was set to k because some PGs went inactive after the rebalance/OSD restarts.

The OSD fails with FAILED assert(i->prior_version == last || i->is_error()) during peering. This does not seem to be specific to one OSD: I moved the PG to another OSD and it hit the same assert after I marked the original OSD lost. Also, shutting down certain other OSDs of that PG lets a previously failing OSD start. For example: OSD 133 fails to start. Shut down OSD 65, and OSD 133 starts successfully. Start OSD 65 again, and OSDs 133, 118 (which has a copy of that PG from OSD 133), and 65 crash. Shut down OSD 381, and OSD 65 can start.
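
For context on the quoted assert: it enforces a continuity invariant over a single object's PG log chain, where each entry's prior_version is expected to equal the version of the preceding entry for that object, unless the entry is an error entry. The standalone C++ sketch below is illustrative only, not Ceph's actual code; eversion and log_entry are simplified stand-ins for eversion_t and pg_log_entry_t, and check_object_log_chain is a hypothetical helper showing how a gap in the chain (such as one left by an inconsistent or truncated log found during peering) trips this kind of assert.

#include <cassert>
#include <vector>

struct eversion {                 // simplified stand-in for Ceph's eversion_t
  unsigned epoch = 0;
  unsigned version = 0;
  bool operator==(const eversion &o) const {
    return epoch == o.epoch && version == o.version;
  }
};

struct log_entry {                // simplified stand-in for pg_log_entry_t
  eversion version;               // version produced by this entry
  eversion prior_version;         // object version before this entry
  bool error = false;             // error entries are exempt from the check
  bool is_error() const { return error; }
};

// Walk one object's log entries oldest-to-newest and enforce continuity;
// this mirrors the shape of the assert quoted in the report.
void check_object_log_chain(const std::vector<log_entry>& entries) {
  eversion last;  // 0'0: the object did not exist before the first entry
  for (auto i = entries.begin(); i != entries.end(); ++i) {
    assert(i->prior_version == last || i->is_error());
    last = i->version;
  }
}

int main() {
  // Continuous chain 0'0 -> 1'1 -> 1'2: passes.
  check_object_log_chain({{{1, 1}, {0, 0}}, {{1, 2}, {1, 1}}});

  // Broken chain: the second entry claims prior_version 1'2, but the log
  // only shows 1'1 -- uncommenting this call trips the assert, analogous
  // to the peering failure described above.
  // check_object_log_chain({{{1, 1}, {0, 0}}, {{1, 3}, {1, 2}}});
  return 0;
}

This is only meant to make the quoted assert easier to read; the real check in Ceph runs over the actual PG log entries during peering.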

I posted the cluster log and yesterday's OSD 133 log, and will paste an OSD 65 log with more debugging in the next message.

ceph-post-file for cluster log: 0d3a2d70-cb27-4ade-b0d7-6ee6f69ffbc9
ceph-post-file for OSD 133: b48cce81-5571-4575-a626-daa9f43725d7


Related issues: 1 (0 open, 1 closed)

Is duplicate of RADOS - Bug #22916: OSD crashing in peering (Duplicate, 02/05/2018)
