Bug #5445
random osd EPERM on journal
Description
ubuntu@teuthology:/a/teuthology-2013-06-24_18:48:34-rados-cuttlefish-testing-basic/45071
2013-06-24 19:00:31.187572 7f9e55fa8780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2013-06-24 19:00:31.187586 7f9e55fa8780 1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 27: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-06-24 19:00:31.187654 7f9e55fa8780 1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 27: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-06-24 19:00:42.557275 7f9e55fa8780 1 journal close /var/lib/ceph/osd/ceph-1/journal
2013-06-24 19:00:42.557652 7f9e55fa8780 -1 ** ERROR: osd init failed: (1) Operation not permitted
Associated revisions
ceph-disk: do not mount over an osd directly in /var/lib/ceph/osd/$cluster-$id
If we see a 'ready' file in the target OSD dir, do not mount our device
on top of it.
Among other things, this prevents ceph-disk activate on stray disks from
stepping on teuthology osds.
Fixes: #5445
Signed-off-by: Sage Weil <sage@inktank.com>
ceph-disk: do not mount over an osd directly in /var/lib/ceph/osd/$cluster-$id
If we see a 'ready' file in the target OSD dir, do not mount our device
on top of it.
Among other things, this prevents ceph-disk activate on stray disks from
stepping on teuthology osds.
Fixes: #5445
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 8a17f33b14d858235dfeaa42be1f4842dcfd66d2)
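The guard described in the commit message can be sketched in Python, the language ceph-disk is written in. This is a minimal illustration under stated assumptions, not the actual patch: the helper names `existing_osd_marker` and `safe_to_mount` are hypothetical, and checking `whoami` in addition to `ready` follows the suggestion in comment #2 rather than the commit itself.

```python
import os

# Hypothetical marker files that indicate a directory already holds OSD data.
# 'ready' comes from the commit message; 'whoami' is an assumption based on
# the discussion in this ticket.
OSD_MARKER_FILES = ("ready", "whoami")


def existing_osd_marker(osd_dir):
    """Return the name of the first marker file found in osd_dir, or None."""
    for name in OSD_MARKER_FILES:
        if os.path.exists(os.path.join(osd_dir, name)):
            return name
    return None


def safe_to_mount(osd_dir):
    """Hypothetical check: True if mounting a device at osd_dir looks safe."""
    # A directory that does not exist yet cannot hold an OSD.
    if not os.path.isdir(osd_dir):
        return True
    # If OSD data is stored directly in the directory (as teuthology does in
    # /var/lib/ceph/osd/$cluster-$id), refuse to mount over it.
    return existing_osd_marker(osd_dir) is None
```

An activate path would call `safe_to_mount()` on the target directory and skip the mount (logging which marker it found) when it returns False, instead of hiding a running OSD's data underneath a stray disk.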
History
#1 Updated by Sage Weil about 10 years ago
- Status changed from New to In Progress
This happens on tasks that don't use all available disks. A previous job with ceph-deploy leaves behind OSD disks, something (not sure what) triggers an activate, and they get mounted over teuthology's OSD.
#2 Updated by Sage Weil about 10 years ago
Oh, the test in question isn't mounting a drive; it stores the data directly in /var/lib/ceph/osd/ceph-$id. The ceph-disk check should bail out if a whoami file (or similar) is already present.
#3 Updated by Sage Weil about 10 years ago
- Status changed from In Progress to Fix Under Review
Pushed wip-5445.
This normally wouldn't happen, except that teuthology does not define fsid in ceph.conf, so ceph-disk assumes the cluster name is 'ceph'. Defining fsid there would also block this particular failure.
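The cluster-name fallback mentioned in this comment can be illustrated with a small Python sketch. It is a simplified assumption of the behavior, not ceph-disk's actual code: the function name `find_cluster_by_fsid` and its dict input (cluster name mapped to the fsid found in `/etc/ceph/<cluster>.conf`, or None when that file defines no fsid) are hypothetical, chosen so the logic can run without reading real config files.

```python
from typing import Dict, Optional


def find_cluster_by_fsid(osd_fsid: str,
                         conf_fsids: Dict[str, Optional[str]]) -> Optional[str]:
    """Hypothetical sketch: pick the cluster name for a disk's OSD fsid.

    conf_fsids maps a cluster name (e.g. 'ceph' for /etc/ceph/ceph.conf) to
    the fsid defined in that file, or None if the file defines no fsid.
    """
    no_fsid = []
    for cluster, fsid in conf_fsids.items():
        if fsid is None:
            no_fsid.append(cluster)
        elif fsid == osd_fsid:
            # Exact match: the disk belongs to this cluster.
            return cluster
    # Fallback described above: if ceph.conf is the only config without an
    # fsid, assume the cluster is named 'ceph'. This is what lets a stray
    # disk from an earlier job activate into ceph-$id.
    if no_fsid == ["ceph"]:
        return "ceph"
    return None
```

With no fsid in ceph.conf, a leftover disk resolves to 'ceph' and can be activated on top of teuthology's OSD; once ceph.conf defines a different fsid, the lookup returns None and activation stops there.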
#4 Updated by Sage Weil about 10 years ago
- Status changed from Fix Under Review to Pending Backport
#5 Updated by Sage Weil about 10 years ago
- Status changed from Pending Backport to Resolved
#6 Updated by Sage Weil about 10 years ago
- Status changed from Resolved to 12
teuthology-2013-07-05_01:00:13-rados-master-testing-basic 55351: and 55360:
#7 Updated by Sage Weil about 10 years ago
- Priority changed from Urgent to High
#8 Updated by Sage Weil about 10 years ago
- Status changed from 12 to Can't reproduce