Bug #21174 (closed)

OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)

Added by Martin Millnert over 6 years ago. Updated almost 5 years ago.

Status: Rejected
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags: -
Backport: -
Regression: No
Severity: 2 - major
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(RADOS): -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

I've set up a CephFS erasure-coded pool on a small cluster consisting of 5 BlueStore OSDs.
The pools were created as follows:

ceph osd pool create cephfs_metadata 160 160 replicated
ceph osd pool create cephfs_data 160 160 erasure ecpool ec
ceph osd pool set cephfs_data allow_ec_overwrites true
ceph fs new cephfs cephfs_metadata cephfs_data
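
The ecpool erasure code profile referenced in the second command was created beforehand. For reference, a profile along these lines would be plausible on a 5-OSD cluster (the k/m values and failure domain shown here are illustrative assumptions, not necessarily the ones I used):

ceph osd erasure-code-profile set ecpool k=3 m=2 crush-failure-domain=osd
ceph osd erasure-code-profile get ecpool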

I started copying files onto the CephFS, and the OSDs have now started crashing in an endless loop. The cluster is unavailable (which is not critical for me, but would be for a "live cluster").
The log, available at https://martin.millnert.se/files/cephfs_ec/ceph-osd.1.log.all.gz, shows the crash occurring in the vicinity of a lot of output about "omap" operations.

In the documentation at http://docs.ceph.com/docs/master/rados/operations/erasure-code/ it is stated that erasure coded pools do not support omap operations, which is why special care has to be taken when using them with RBD.
For CephFS, it simply states: "For Cephfs, using an erasure coded pool means setting that pool in a file layout." with a link to the section on CephFS file layouts: http://docs.ceph.com/docs/master/cephfs/file-layouts/
The file layouts section does not reciprocate this link, i.e. the logic/context of using erasure coded pools with CephFS is not explained any further there.
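
As far as I can tell, "setting that pool in a file layout" means the EC pool should not be the filesystem's default data pool but an additional data pool, with a directory's layout then pointed at it, e.g. (the mount point and directory name here are just examples):

setfattr -n ceph.dir.layout.pool -v cephfs_data /mnt/cephfs/ecdata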

So, provided I've made a user error here and there is no other bug causing my OSDs to crash, I think it would be wise to make the documentation on how to use EC pools for CephFS more explicit.

Furthermore, if it is indeed illegal to create a cephfs using the command I did, i.e. "ceph fs new cephfs <replicated_metadata_pool> <erasure_coded_data_pool>", the code should probably test for and reject that, to avoid cluster-down states further down the road.
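
If that is the case, the working alternative would presumably be to create the filesystem with a replicated default data pool and then attach the EC pool as an additional data pool, along these lines (the cephfs_data_rep name is just an example):

ceph osd pool create cephfs_data_rep 160 160 replicated
ceph fs new cephfs cephfs_metadata cephfs_data_rep
ceph fs add_data_pool cephfs cephfs_data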


Related issues (2: 0 open, 2 closed)

Related to RADOS - Bug #39023: osd/PGLog: preserve original_crt to check rollbackability (Resolved, Neha Ojha, 03/28/2019)

Related to RADOS - Bug #16279: assert(objiter->second->version > last_divergent_update) failed (Closed, 06/14/2016)
