Bug #12100

closed

OSD crash, unexpected aio error in FileJournal.cc

Added by Daniel Schneller almost 9 years ago. Updated almost 7 years ago.

Status: Rejected
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This morning we had an OSD crash on us with the following trace:

2015-06-20 11:21:37.197601 7f3c3896e700 -1 journal aio to 1027784032~80 wrote 12288
2015-06-20 11:21:37.304686 7f3c3896e700 -1 os/FileJournal.cc: In function 'void FileJournal::write_finish_thread_entry()' thread 7f3c3896e700 time 2015-06-20 11:21:37.208575
os/FileJournal.cc: 1426: FAILED assert(0 == "unexpected aio error")

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc271b]
 2: (FileJournal::write_finish_thread_entry()+0x685) [0xa7e755]
 3: (FileJournal::WriteFinisher::entry()+0xd) [0x92bedd]
 4: (()+0x8182) [0x7f3c45058182]
 5: (clone()+0x6d) [0x7f3c435c3fbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
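
For anyone triaging similar crashes, the first line of the trace can be checked mechanically. The sketch below assumes the message reads as `journal aio to <offset>~<length> wrote <result>`, so that a result differing from the length is what precedes the assert. That reading is inferred from the log line itself, not confirmed against the source, and the helper name `parse_aio_mismatch` is hypothetical.

```python
import re

# Assumed layout of the message: "journal aio to <offset>~<length> wrote <result>".
# This interpretation is a guess based on the log line in this report.
AIO_LINE = re.compile(
    r"journal aio to (?P<off>\d+)~(?P<len>\d+) wrote (?P<res>-?\d+)"
)


def parse_aio_mismatch(line):
    """Return (offset, requested_length, result), or None if the line does not match."""
    match = AIO_LINE.search(line)
    if match is None:
        return None
    return int(match.group("off")), int(match.group("len")), int(match.group("res"))


# Log line copied from the trace in this report.
line = (
    "2015-06-20 11:21:37.197601 7f3c3896e700 -1 "
    "journal aio to 1027784032~80 wrote 12288"
)

offset, length, result = parse_aio_mismatch(line)
# Flag the line when the reported result differs from the requested length.
print(f"offset={offset} length={length} result={result} mismatch={result != length}")
```

Run over a full OSD log, a filter like this would show whether the mismatch was a one-off or had been occurring before the crash.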

The full log is available at [[https://public.centerdevice.de/e42a98af-69ef-4330-b36c-5858889ba566]]
Attaching it here failed repeatedly.

After that, it recovered:

2015-06-20 11:30:49.799204 7f8c6791d900  0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-osd, pid 32519
2015-06-20 11:30:49.992485 7f8c6791d900  0 filestore(/var/lib/ceph/osd/ceph-11) backend xfs (magic 0x58465342)
2015-06-20 11:30:49.994221 7f8c6791d900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: FIEMAP ioctl is supported and appears to work
2015-06-20 11:30:49.994237 7f8c6791d900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-06-20 11:30:50.021112 7f8c6791d900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-06-20 11:30:50.021273 7f8c6791d900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature: extsize is supported and kernel 3.13.0-27-generic >= 3.5
2015-06-20 11:30:50.500414 7f8c6791d900  0 filestore(/var/lib/ceph/osd/ceph-11) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-06-20 11:30:51.601678 7f8c6791d900  1 journal _open /var/lib/ceph/osd/ceph-11/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-06-20 11:30:51.651860 7f8c6791d900  1 journal _open /var/lib/ceph/osd/ceph-11/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-06-20 11:30:51.741162 7f8c6791d900  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2015-06-20 11:30:51.893813 7f8c6791d900  0 osd.11 36845 crush map has features 2200130813952, adjusting msgr requires for clients
2015-06-20 11:30:51.893841 7f8c6791d900  0 osd.11 36845 crush map has features 2200130813952 was 8705, adjusting msgr requires for mons
2015-06-20 11:30:51.893849 7f8c6791d900  0 osd.11 36845 crush map has features 2200130813952, adjusting msgr requires for osds
2015-06-20 11:30:51.893876 7f8c6791d900  0 osd.11 36845 load_pgs
2015-06-20 11:31:19.324981 7f8c6791d900  0 osd.11 36845 load_pgs opened 913 pgs
2015-06-20 11:31:19.331751 7f8c6791d900 -1 osd.11 36845 log_to_monitors {default=true}
2015-06-20 11:31:19.360728 7f8c443c4700  0 osd.11 36845 ignoring osdmap until we have initialized
2015-06-20 11:31:19.360796 7f8c443c4700  0 osd.11 36845 ignoring osdmap until we have initialized
2015-06-20 11:31:20.013535 7f8c6791d900  0 osd.11 36845 done with init, starting boot process
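
To quantify the recovery, the timestamps in the two log excerpts can be compared directly. This is a minimal sketch using only the timestamps quoted in this report. The function name `parse_ts` is hypothetical, and treating the assert line as the moment of the crash is an assumption.

```python
from datetime import datetime

# Timestamp layout used at the start of each log line in this report.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(log_line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff' timestamp of a log line."""
    date_part, time_part = log_line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", TS_FORMAT)


# Lines copied (truncated) from the excerpts above.
crash = parse_ts("2015-06-20 11:21:37.304686 7f3c3896e700 -1 os/FileJournal.cc: ...")
restart = parse_ts("2015-06-20 11:30:49.799204 7f8c6791d900  0 ceph version 0.94.1 ...")
booted = parse_ts("2015-06-20 11:31:20.013535 7f8c6791d900  0 osd.11 36845 done with init ...")

down_seconds = (restart - crash).total_seconds()
init_seconds = (booted - restart).total_seconds()
print(f"down before restart: {down_seconds:.1f}s, init after restart: {init_seconds:.1f}s")
```

By these timestamps the OSD process was down for roughly nine minutes, then needed about half a minute to load its PGs and begin booting.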

We could not find anything else relevant in our logs.
The disk seems to be fine, at least according to MegaCLI.
The server is a Dell PowerEdge R510.

We found similar-looking issues (at least superficially) here: [[http://tracker.ceph.com/issues/11298]] and here: [[http://tracker.ceph.com/issues/9570]]

This was the first time we saw an issue like this.


Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #9570: osd crash in FileJournal::WriteFinisher::entry() aio (Rejected, Loïc Dachary, 09/22/2014)
