Bug #18037

leveldb corruption leads to "Operation not permitted not handled" and assert

Added by Nathan Cutler 8 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
11/24/2016
Due date:
% Done:

0%

Source:
Tags:
Backport:
jewel,kraken
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No

Description

We have a jewel OSD that fails to start, apparently because of leveldb corruption. After raising the debug levels, we can see filestore emit an odd message, "Operation not permitted not handled", followed by "unexpected error code" and a dump of the operation, and then assert(0 == "unexpected error").
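
The odd wording reads like a C library error description with " not handled" appended to it. As a rough sketch under that assumption (this is not the filestore code, and the helper name `errno_names_for_message` is made up for illustration), the leading text can be mapped back to a symbolic errno name:

```python
import errno
import os


def errno_names_for_message(message):
    """Return the symbolic errno names whose strerror() text equals `message`."""
    return sorted(
        name
        for code, name in errno.errorcode.items()
        if os.strerror(code) == message
    )


# The fragment quoted in the description above.
log_fragment = "Operation not permitted not handled"

# Assumption: the message is "<strerror text>" + " not handled".
suffix = " not handled"
description = log_fragment[: -len(suffix)]

print(description)
print(errno_names_for_message(description))
```

The strerror text is platform-dependent, so the lookup is only meaningful on the same kind of system that produced the log.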

Detailed log attached.

ceph-osd.14.log View (141 KB) Nathan Cutler, 11/24/2016 04:41 PM


Related issues

Related to Ceph - Bug #16257: Ceph random bug is killing osds (os/filestore/FileStore.cc: 2912: FAILED assert(0 == "unexpected error") Closed 06/13/2016
Copied to Ceph - Backport #18417: jewel: leveldb corruption leads to "Operation not permitted not handled" and assert Resolved
Copied to Ceph - Backport #18418: kraken: leveldb corruption leads to "Operation not permitted not handled" and assert Resolved

History

#1 Updated by Nathan Cutler 8 months ago

ceph version 10.2.2

#2 Updated by Nathan Cutler 8 months ago

  • Related to Bug #16257: Ceph random bug is killing osds (os/filestore/FileStore.cc: 2912: FAILED assert(0 == "unexpected error") added

#3 Updated by Nathan Cutler 8 months ago

  • Description updated (diff)

#4 Updated by Nathan Cutler 8 months ago

  • Description updated (diff)
EPERM

Operation not permitted; only the owner of the file
(or other resource) or processes with special privileges
can perform the operation.

Hm. This OSD had XFS corruption which was repaired. Maybe the xfs_repair restored a file, but left it owned by root? I'll try "chown -R ceph.ceph /var/lib/osd/..." and report back.
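
Before running a recursive chown, the ownership hypothesis could be checked directly by listing entries that do not belong to the expected user. A minimal sketch, assuming the OSD data directory is readable by whoever runs it; the function name `paths_not_owned_by` is hypothetical and the directory to scan must be supplied by the reader:

```python
import os


def paths_not_owned_by(root, uid):
    """Return sorted paths at or under `root` whose owner uid differs from `uid`.

    Uses lstat() so symlinks are checked themselves rather than followed.
    """
    mismatched = []
    if os.lstat(root).st_uid != uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return sorted(mismatched)


# Example use (not executed here): look up the uid of the "ceph" user
# and scan a directory of your choosing.
#
#   import pwd
#   ceph_uid = pwd.getpwnam("ceph").pw_uid
#   for path in paths_not_owned_by(osd_data_dir, ceph_uid):
#       print(path)
```

An empty result would suggest that file ownership is not the cause of the EPERM.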

#6 Updated by Nathan Cutler 8 months ago

  • Status changed from New to In Progress
  • Assignee set to Nathan Cutler

#7 Updated by Nathan Cutler 8 months ago

  • Status changed from In Progress to Need Review
  • Backport set to jewel

This PR improves the error message: https://github.com/ceph/ceph/pull/12181

#8 Updated by Nathan Cutler 8 months ago

> This OSD had XFS corruption which was repaired. Maybe the xfs_repair restored a file, but left it owned by root? I'll try "chown -R ceph.ceph /var/lib/osd/..." and report back.

I just got word that the recursive chown does not help.

#9 Updated by Nathan Cutler 7 months ago

  • Status changed from Need Review to Pending Backport
  • Backport changed from jewel to jewel,kraken

#10 Updated by Nathan Cutler 7 months ago

  • Copied to Backport #18417: jewel: leveldb corruption leads to "Operation not permitted not handled" and assert added

#11 Updated by Nathan Cutler 7 months ago

  • Copied to Backport #18418: kraken: leveldb corruption leads to "Operation not permitted not handled" and assert added

#12 Updated by Nathan Cutler 3 months ago

  • Status changed from Pending Backport to Resolved
