Project

General

Profile

Actions

Bug #9570

closed

osd crash in FileJournal::WriteFinisher::entry() aio

Added by Sheldon Mustard over 9 years ago. Updated almost 8 years ago.

Status:
Rejected
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:

30%

Source:
Support
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Workaround

Try with a kernel newer than 3.13 - as new as the environment allows.

Collect more information

If the problem happens in your environment, increase the in memory logging with:

debug osd = 0/30
debug filestore = 0/30
debug ms = 0/1
debug journal = 0/30

save the core dump and add it to this issue, with the log, when it happens.

Original description

On an OSD running dumpling v0.67.10

   -13> 2014-09-22 16:08:54.484825 7feeb1ebd700  5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484530, event: header_read, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
   -12> 2014-09-22 16:08:54.484842 7feeb1ebd700  5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484555, event: throttled, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
   -11> 2014-09-22 16:08:54.484854 7feeb1ebd700  5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484722, event: all_read, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
   -10> 2014-09-22 16:08:54.484866 7feeb1ebd700  5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484819, event: dispatched, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
    -9> 2014-09-22 16:08:54.484878 7feeb1ebd700  5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484878, event: waiting_for_osdmap, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
    -8> 2014-09-22 16:08:54.484966 7feead6b4700  5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484966, event: reached_pg, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
    -7> 2014-09-22 16:08:54.485003 7feead6b4700  5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.485003, event: started, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
    -6> 2014-09-22 16:08:54.485121 7feead6b4700  5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.485121, event: started, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
    -5> 2014-09-22 16:08:54.485157 7feead6b4700  5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.485157, event: commit_queued_for_journal_write, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
    -4> 2014-09-22 16:08:54.536191 7feeaf6b8700  1 -- 10.10.10.7:6810/25820 <== osd.9 10.10.10.24:0/16694 156020 ==== osd_ping(ping e2466 stamp 2014-09-22 16:08:54.535761) v2 ==== 47+0+0 (2119288752 0 0) 0x5071a8c0 con 0x4f26d840
    -3> 2014-09-22 16:08:54.536226 7feeb06ba700  1 -- 10.10.10.7:6812/25820 <== osd.9 10.10.10.24:0/16694 156020 ==== osd_ping(ping e2466 stamp 2014-09-22 16:08:54.535761) v2 ==== 47+0+0 (2119288752 0 0) 0x516b1e00 con 0x4f33d6e0
    -2> 2014-09-22 16:08:54.536258 7feeaf6b8700  1 -- 10.10.10.7:6810/25820 --> 10.10.10.24:0/16694 -- osd_ping(ping_reply e2466 stamp 2014-09-22 16:08:54.535761) v2 -- ?+0 0x38f711c0 con 0x4f26d840
    -1> 2014-09-22 16:08:54.536436 7feeb06ba700  1 -- 10.10.10.7:6812/25820 --> 10.10.10.24:0/16694 -- osd_ping(ping_reply e2466 stamp 2014-09-22 16:08:54.535761) v2 -- ?+0 0x4cdcdc40 con 0x4f33d6e0
     0> 2014-09-22 16:08:54.558046 7feeca125700 -1 *** Caught signal (Aborted) **
 in thread 7feeca125700

 ceph version 0.67.10 (9d446bd416c52cd785ccf048ca67737ceafcdd7f)
 1: /usr/bin/ceph-osd() [0x7fd50a]
 2: (()+0xfcb0) [0x7feed27e7cb0]
 3: (gsignal()+0x35) [0x7feed0f3e0d5]
 4: (abort()+0x17b) [0x7feed0f4183b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7feed189769d]
 6: (()+0xb5846) [0x7feed1895846]
 7: (()+0xb5873) [0x7feed1895873]
 8: (()+0xb596e) [0x7feed189596e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x8c68df]
 10: (FileJournal::write_finish_thread_entry()+0x6d7) [0x766c67]
 11: (FileJournal::WriteFinisher::entry()+0xd) [0x69cbad]
 12: (()+0x7e9a) [0x7feed27dfe9a]
 13: (clone()+0x6d) [0x7feed0ffc31d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.18.log
--- end dump of recent events ---
2014-09-22 16:08:55.656515 7fd6f29d9780  0 ceph version 0.67.10 (9d446bd416c52cd785ccf048ca67737ceafcdd7f), process ceph-osd, pid 8912
2014-09-22 16:08:55.736899 7fd6f29d9780  1 filestore(/var/lib/ceph/osd/ceph-18) mount detected xfs
2014-09-22 16:08:55.736910 7fd6f29d9780  1 filestore(/var/lib/ceph/osd/ceph-18)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2014-09-22 16:08:55.745399 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount FIEMAP ioctl is supported and appears to work
2014-09-22 16:08:55.745427 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-09-22 16:08:55.746476 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount did NOT detect btrfs
2014-09-22 16:08:55.775236 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount syncfs(2) syscall fully supported (by glibc and kernel)
2014-09-22 16:08:55.775847 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount found snaps <>
2014-09-22 16:08:56.254811 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2014-09-22 16:09:00.301748 7fd6f29d9780  1 journal _open /var/lib/ceph/osd/ceph-18/journal fd 19: 16382951424 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-09-22 16:09:00.620119 7fd6f29d9780  1 journal _open /var/lib/ceph/osd/ceph-18/journal fd 19: 16382951424 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-09-22 16:09:00.651236 7fd6f29d9780  1 journal close /var/lib/ceph/osd/ceph-18/journal
2014-09-22 16:09:00.700334 7fd6f29d9780  1 filestore(/var/lib/ceph/osd/ceph-18) mount detected xfs
2014-09-22 16:09:00.707685 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount FIEMAP ioctl is supported and appears to work
2014-09-22 16:09:00.707707 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-09-22 16:09:00.708656 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount did NOT detect btrfs
2014-09-22 16:09:00.731348 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount syncfs(2) syscall fully supported (by glibc and kernel)
2014-09-22 16:09:00.731493 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount found snaps <>
2014-09-22 16:09:00.744450 7fd6f29d9780  0 filestore(/var/lib/ceph/osd/ceph-18) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2014-09-22 16:09:00.773887 7fd6f29d9780  1 journal _open /var/lib/ceph/osd/ceph-18/journal fd 20: 16382951424 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-09-22 16:09:00.799478 7fd6f29d9780  1 journal _open /var/lib/ceph/osd/ceph-18/journal fd 20: 16382951424 bytes, block size 4096 bytes, directio = 1, aio = 1


Files

ceph-osd.18.log.gz (204 KB) ceph-osd.18.log.gz Sheldon Mustard, 09/30/2014 08:02 AM

Related issues 2 (0 open2 closed)

Related to Ceph - Bug #12100: OSD crash, uexpected aio error in FileJournal.ccRejectedSage Weil06/20/2015

Actions
Related to Ceph - Bug #14558: journal AIO write error message is confusingResolved01/29/2016

Actions
Actions

Also available in: Atom PDF