Bug #5445: random osd EPERM on journal - Ceph

Bug #5445

closed

random osd EPERM on journal

Added by Sage Weil almost 11 years ago. Updated over 10 years ago.

Status: Can't reproduce
Priority: High
Source: Q/A
Severity: 3 - minor

Description

ubuntu@teuthology:/a/teuthology-2013-06-24_18:48:34-rados-cuttlefish-testing-basic/45071

2013-06-24 19:00:31.187572 7f9e55fa8780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2013-06-24 19:00:31.187586 7f9e55fa8780  1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 27: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-06-24 19:00:31.187654 7f9e55fa8780  1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 27: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-06-24 19:00:42.557275 7f9e55fa8780  1 journal close /var/lib/ceph/osd/ceph-1/journal
2013-06-24 19:00:42.557652 7f9e55fa8780 -1 ** ERROR: osd init failed: (1) Operation not permitted

Updated by Sage Weil almost 11 years ago

Status changed from New to In Progress

This happens on tasks that don't use all available disks. A previous job with ceph-deploy leaves behind OSD disks, something (not sure what) triggers an activate, and they get mounted over teuthology's OSD.

Updated by Sage Weil almost 11 years ago

Oh, the test in question isn't mounting a drive; it stores the data directly in /var/lib/ceph/osd/ceph-$id. The ceph-disk test should bail out if a whoami (or similar) file is already present.
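
A minimal sketch of the kind of guard described above, assuming the check only needs to look for marker files in the target directory before anything is mounted over it. This is not the code pushed to wip-5445; the function names and the `OSD_MARKER_FILES` tuple are hypothetical, and `whoami` is the only file name taken from the comment.

```python
import os

# Hypothetical list of files whose presence means the directory is
# already in use as an OSD. The comment above names only "whoami".
OSD_MARKER_FILES = ("whoami",)


def osd_dir_in_use(osd_dir, markers=OSD_MARKER_FILES):
    """Return True if osd_dir already contains any OSD marker file."""
    return any(
        os.path.exists(os.path.join(osd_dir, name)) for name in markers
    )


def check_before_activate(osd_dir):
    """Raise instead of mounting over a directory that already holds an OSD."""
    if osd_dir_in_use(osd_dir):
        raise RuntimeError(
            "%s already contains OSD data; refusing to mount over it" % osd_dir
        )
```

Calling `check_before_activate("/var/lib/ceph/osd/ceph-1")` before the mount step would then fail early in the situation described here, where teuthology has already written its OSD data directly into that directory.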

Updated by Sage Weil almost 11 years ago

Status changed from In Progress to Fix Under Review

pushed wip-5445.

This normally wouldn't happen, except that teuthology does not define fsid in its ceph.conf, so ceph-disk assumes the cluster name is 'ceph'. Adding fsid there would also block this particular failure.
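
The fallback described above can be illustrated with a small sketch. It assumes the lookup works by comparing the fsid recorded on the disk against the fsid of each known cluster conf, and that a lone 'ceph' conf with no fsid is accepted as a match. The function `guess_cluster_name` and its dict input are hypothetical and are not ceph-disk's actual code.

```python
def guess_cluster_name(disk_fsid, conf_fsids):
    """Pick the cluster a leftover OSD disk belongs to.

    conf_fsids maps cluster name -> fsid from that cluster's conf file,
    or None when the conf file does not define fsid (hypothetical input).
    """
    wanted = disk_fsid.lower()
    missing = []
    for cluster, fsid in conf_fsids.items():
        if fsid is None:
            missing.append(cluster)
        elif fsid.lower() == wanted:
            return cluster
    # Assumed tolerant fallback: a lone 'ceph' conf without an fsid is
    # treated as a match, which lets a stale disk be claimed.
    if missing == ["ceph"]:
        return "ceph"
    return None
```

Under these assumptions, a stale disk matches when the conf has no fsid (`guess_cluster_name("abc", {"ceph": None})` returns `"ceph"`), while a conf with a different fsid rejects it (`guess_cluster_name("abc", {"ceph": "def"})` returns `None`).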
