Bug #18037
closed
leveldb corruption leads to "Operation not permitted not handled" and assert
Added by Nathan Cutler over 7 years ago.
Updated about 7 years ago.
Description
We have a jewel OSD that fails to start, apparently because of leveldb corruption. We raised the debug levels and can see filestore emitting an odd message, "Operation not permitted not handled", followed by "unexpected error code" and a dump of the operation, followed by assert(0 == "unexpected error").
Detailed log attached.
Files
- Related to Bug #16257: Ceph random bug is killing osds (os/filestore/FileStore.cc: 2912: FAILED assert(0 == "unexpected error") added
- Description updated (diff)
- Description updated (diff)
EPERM
Operation not permitted; only the owner of the file
(or other resource) or processes with special privileges
can perform the operation.
Hm. This OSD had XFS corruption which was repaired. Maybe the xfs_repair restored a file, but left it owned by root? I'll try "chown -R ceph.ceph /var/lib/osd/..." and report back.
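Before running a recursive chown, the ownership theory can be checked directly by listing everything under the OSD data directory that is not owned by the expected user. The following is a minimal sketch, not part of Ceph: the function name `find_foreign_owned` is hypothetical, and the directory and uid to check are left to the caller because the real path is elided above.

```python
import os


def find_foreign_owned(root, expected_uid):
    """Return (path, uid) pairs for entries under root not owned by expected_uid.

    Hypothetical helper for illustration; it only reads metadata and
    changes nothing on disk.
    """
    mismatches = []

    def check(path):
        try:
            # lstat so that symlinks are reported by their own owner,
            # not the owner of whatever they point to.
            st = os.lstat(path)
        except OSError:
            # Entry vanished or is unreadable; skip it in this sketch.
            return
        if st.st_uid != expected_uid:
            mismatches.append((path, st.st_uid))

    check(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            check(os.path.join(dirpath, name))
    return mismatches
```

It could be called with the uid of the ceph user, for example `pwd.getpwnam("ceph").pw_uid`, and the OSD's data directory. A non-empty result would support the root-owned-file theory; an empty result would suggest the EPERM comes from somewhere other than file ownership.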
- Status changed from New to In Progress
- Assignee set to Nathan Cutler
- Status changed from In Progress to Fix Under Review
- Backport set to jewel
This OSD had XFS corruption which was repaired. Maybe the xfs_repair restored a file, but left it owned by root? I'll try "chown -R ceph.ceph /var/lib/osd/..." and report back.
I just got word that the recursive chown does not help.
- Status changed from Fix Under Review to Pending Backport
- Backport changed from jewel to jewel,kraken
- Copied to Backport #18417: jewel: leveldb corruption leads to "Operation not permitted not handled" and assert added
- Copied to Backport #18418: kraken: leveldb corruption leads to "Operation not permitted not handled" and assert added
- Status changed from Pending Backport to Resolved