Bug #11298

open

aio gets EPERM when update-grub runs

Added by Igor Megov about 9 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

osd.38 suddenly disappeared from the cluster at 12:50. Below is an excerpt from the OSD log; the full log is attached to this issue.

2015-04-01 12:50:54.807675 7fb75fbb1700 -1 journal aio to 2809413632~12288 got (1) Operation not permitted
2015-04-01 12:50:54.809220 7fb75fbb1700 -1 os/FileJournal.cc: In function 'void FileJournal::write_finish_thread_entry()' thread 7fb75fbb1700 time 2015-04-01 12:50:54.807703
os/FileJournal.cc: 1357: FAILED assert(0 == "unexpected aio error")

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (FileJournal::write_finish_thread_entry()+0x947) [0xa91467]
 2: (FileJournal::WriteFinisher::entry()+0xd) [0x9cc3bd]
 3: (()+0x80a4) [0x7fb76a2f40a4]
 4: (clone()+0x6d) [0x7fb768e84c2d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

We have a 5-node cluster in production; the journals are on SSDs.
The Ceph version is 0.80.7-1~bpo70+1, running on Debian GNU/Linux 7.6 (wheezy)
with a 3.16-2-amd64 kernel built from the jessie/sid source package.

I think I can guess the cause of the failure: I added the "elevator=deadline"
parameter to the grub config and ran "update-grub" afterwards. Grub itself
scans block devices for partitions and filesystems, and may have locked
the journal partition in some way. The Ceph journaling code then got a
"not permitted" error and exited abnormally.
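
For anyone triaging a similar crash, here is a minimal Python sketch that decodes the failing journal line quoted above. It is not part of Ceph; the helper name `parse_aio_error` and the regular expression are my own, and it assumes the `A~B` notation in the log means offset~length, as is customary in Ceph logs. It only confirms that error number 1 corresponds to EPERM and pulls out the write position; it says nothing about why the kernel returned that error.

```python
import errno
import re

# Hypothetical pattern matching the journal aio failure line shown in the log
# excerpt above; assumes the "A~B" pair is offset~length.
AIO_ERR = re.compile(r"journal aio to (\d+)~(\d+) got \((\d+)\) (.+)$")


def parse_aio_error(line):
    """Return the offset, length, errno number/name and message, or None."""
    m = AIO_ERR.search(line)
    if m is None:
        return None
    offset, length, err = (int(g) for g in m.groups()[:3])
    return {
        "offset": offset,
        "length": length,
        "errno": err,
        # Map the numeric code to its symbolic name on this platform.
        "name": errno.errorcode.get(err, "UNKNOWN"),
        "message": m.group(4),
    }


line = ("2015-04-01 12:50:54.807675 7fb75fbb1700 -1 journal aio to "
        "2809413632~12288 got (1) Operation not permitted")
info = parse_aio_error(line)
print(info)
```

Running this over the whole attached log would show whether the EPERM was a single event around the time "update-grub" ran or one of several.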


Files

ceph-osd.38.log.1.bz2 (154 KB), OSD log on crash, Igor Megov, 04/01/2015 09:53 AM