Bug #18037

closed

leveldb corruption leads to "Operation not permitted not handled" and assert

Added by Nathan Cutler over 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: jewel,kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a jewel OSD that does not start, apparently due to leveldb corruption. After raising the debug levels, we can see filestore emitting an odd message, "Operation not permitted not handled", followed by "unexpected error code" and a dump of the operation, followed by assert(0 == "unexpected error").

Detailed log attached.
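
For readers decoding the message, the leading part of "Operation not permitted not handled" looks like the operating system's standard description of an errno value with "not handled" appended. The following is a minimal Python sketch, not Ceph code, that maps such a message back to a symbolic errno name; the helper name errno_from_message is hypothetical.

```python
import errno
import os


def errno_from_message(message):
    """Return the symbolic errno name whose description starts `message`, or None.

    Hypothetical helper for illustration; it only compares the message
    against the platform's own strerror() texts.
    """
    for code, name in errno.errorcode.items():
        if message.startswith(os.strerror(code)):
            return name
    return None


# The message text quoted in this report.
print(errno_from_message("Operation not permitted not handled"))
```

On a typical Linux system this should report EPERM, although the exact description strings depend on the platform's C library.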


Files

ceph-osd.14.log (141 KB), Nathan Cutler, 11/24/2016 04:41 PM

Related issues 3 (0 open, 3 closed)

Related to Ceph - Bug #16257: Ceph random bug is killing osds (os/filestore/FileStore.cc: 2912: FAILED assert(0 == "unexpected error") (Closed, Samuel Just, 06/13/2016)
Copied to Ceph - Backport #18417: jewel: leveldb corruption leads to "Operation not permitted not handled" and assert (Resolved, Nathan Cutler)
Copied to Ceph - Backport #18418: kraken: leveldb corruption leads to "Operation not permitted not handled" and assert (Resolved, Nathan Cutler)

Actions #1

Updated by Nathan Cutler over 7 years ago

ceph version 10.2.2

Actions #2

Updated by Nathan Cutler over 7 years ago

  • Related to Bug #16257: Ceph random bug is killing osds (os/filestore/FileStore.cc: 2912: FAILED assert(0 == "unexpected error") added
Actions #3

Updated by Nathan Cutler over 7 years ago

  • Description updated (diff)
Actions #4

Updated by Nathan Cutler over 7 years ago

  • Description updated (diff)

EPERM: Operation not permitted; only the owner of the file (or other resource) or processes with special privileges can perform the operation.

Hm. This OSD had XFS corruption which was repaired. Maybe the xfs_repair restored a file, but left it owned by root? I'll try "chown -R ceph.ceph /var/lib/osd/..." and report back.
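
Before running a recursive chown, it can help to see which files actually have an unexpected owner. The following is a minimal Python sketch for that check; the function name find_foreign_owned is hypothetical, and the caller supplies the directory and the expected numeric uid, since neither is spelled out in this report.

```python
import os


def find_foreign_owned(root, expected_uid):
    """Return paths at or below `root` whose owner uid is not `expected_uid`.

    Hypothetical diagnostic helper; it reads ownership only and changes nothing.
    """
    mismatched = []
    # Check the top-level directory itself first.
    if os.lstat(root).st_uid != expected_uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat() inspects a symlink itself rather than its target.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched
```

On a Unix system the expected uid could be looked up with pwd.getpwnam("ceph").pw_uid, assuming the OSD runs as the ceph user. An empty result would suggest that file ownership is not the cause of the EPERM.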

Actions #6

Updated by Nathan Cutler over 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Nathan Cutler
Actions #7

Updated by Nathan Cutler over 7 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to jewel

This PR improves the error message: https://github.com/ceph/ceph/pull/12181

Actions #8

Updated by Nathan Cutler over 7 years ago

> This OSD had XFS corruption which was repaired. Maybe the xfs_repair restored a file, but left it owned by root? I'll try "chown -R ceph.ceph /var/lib/osd/..." and report back.

I just got word that the recursive chown does not help.

Actions #9

Updated by Nathan Cutler over 7 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from jewel to jewel,kraken
Actions #10

Updated by Nathan Cutler over 7 years ago

  • Copied to Backport #18417: jewel: leveldb corruption leads to "Operation not permitted not handled" and assert added
Actions #11

Updated by Nathan Cutler over 7 years ago

  • Copied to Backport #18418: kraken: leveldb corruption leads to "Operation not permitted not handled" and assert added
Actions #12

Updated by Nathan Cutler about 7 years ago

  • Status changed from Pending Backport to Resolved