Bug #9570
Updated by Loïc Dachary about 8 years ago
h3. Workaround
Try with a kernel newer than 3.13 - as new as the environment allows.
h3. Collect more information
If the problem happens in your environment, increase the in memory logging with:
<pre>
debug osd = 0/30
debug filestore = 0/30
debug ms = 0/1
debug journal = 0/30
</pre>
save the core dump and add it to this issue, with the log, when it happens.
h3. Original description
On an OSD running dumpling v0.67.10
<pre>
-13> 2014-09-22 16:08:54.484825 7feeb1ebd700 5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484530, event: header_read, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
-12> 2014-09-22 16:08:54.484842 7feeb1ebd700 5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484555, event: throttled, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
-11> 2014-09-22 16:08:54.484854 7feeb1ebd700 5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484722, event: all_read, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
-10> 2014-09-22 16:08:54.484866 7feeb1ebd700 5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484819, event: dispatched, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
-9> 2014-09-22 16:08:54.484878 7feeb1ebd700 5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484878, event: waiting_for_osdmap, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
-8> 2014-09-22 16:08:54.484966 7feead6b4700 5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.484966, event: reached_pg, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
-7> 2014-09-22 16:08:54.485003 7feead6b4700 5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.485003, event: started, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
-6> 2014-09-22 16:08:54.485121 7feead6b4700 5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.485121, event: started, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
-5> 2014-09-22 16:08:54.485157 7feead6b4700 5 --OSD::tracker-- reqid: client.19451671.0:182034, seq: 81280699, time: 2014-09-22 16:08:54.485157, event: commit_queued_for_journal_write, request: osd_sub_op(client.19451671.0:182034 366.33e f4e4873e/rb.0.113a598.238e1f29.000000000e4e/head//366 [] v 2466'680337 snapset=0=[]:[] snapc=0=[]) v7
-4> 2014-09-22 16:08:54.536191 7feeaf6b8700 1 -- 10.10.10.7:6810/25820 <== osd.9 10.10.10.24:0/16694 156020 ==== osd_ping(ping e2466 stamp 2014-09-22 16:08:54.535761) v2 ==== 47+0+0 (2119288752 0 0) 0x5071a8c0 con 0x4f26d840
-3> 2014-09-22 16:08:54.536226 7feeb06ba700 1 -- 10.10.10.7:6812/25820 <== osd.9 10.10.10.24:0/16694 156020 ==== osd_ping(ping e2466 stamp 2014-09-22 16:08:54.535761) v2 ==== 47+0+0 (2119288752 0 0) 0x516b1e00 con 0x4f33d6e0
-2> 2014-09-22 16:08:54.536258 7feeaf6b8700 1 -- 10.10.10.7:6810/25820 --> 10.10.10.24:0/16694 -- osd_ping(ping_reply e2466 stamp 2014-09-22 16:08:54.535761) v2 -- ?+0 0x38f711c0 con 0x4f26d840
-1> 2014-09-22 16:08:54.536436 7feeb06ba700 1 -- 10.10.10.7:6812/25820 --> 10.10.10.24:0/16694 -- osd_ping(ping_reply e2466 stamp 2014-09-22 16:08:54.535761) v2 -- ?+0 0x4cdcdc40 con 0x4f33d6e0
0> 2014-09-22 16:08:54.558046 7feeca125700 -1 *** Caught signal (Aborted) **
in thread 7feeca125700
ceph version 0.67.10 (9d446bd416c52cd785ccf048ca67737ceafcdd7f)
1: /usr/bin/ceph-osd() [0x7fd50a]
2: (()+0xfcb0) [0x7feed27e7cb0]
3: (gsignal()+0x35) [0x7feed0f3e0d5]
4: (abort()+0x17b) [0x7feed0f4183b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7feed189769d]
6: (()+0xb5846) [0x7feed1895846]
7: (()+0xb5873) [0x7feed1895873]
8: (()+0xb596e) [0x7feed189596e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x8c68df]
10: (FileJournal::write_finish_thread_entry()+0x6d7) [0x766c67]
11: (FileJournal::WriteFinisher::entry()+0xd) [0x69cbad]
12: (()+0x7e9a) [0x7feed27dfe9a]
13: (clone()+0x6d) [0x7feed0ffc31d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.18.log
--- end dump of recent events ---
2014-09-22 16:08:55.656515 7fd6f29d9780 0 ceph version 0.67.10 (9d446bd416c52cd785ccf048ca67737ceafcdd7f), process ceph-osd, pid 8912
2014-09-22 16:08:55.736899 7fd6f29d9780 1 filestore(/var/lib/ceph/osd/ceph-18) mount detected xfs
2014-09-22 16:08:55.736910 7fd6f29d9780 1 filestore(/var/lib/ceph/osd/ceph-18) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2014-09-22 16:08:55.745399 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount FIEMAP ioctl is supported and appears to work
2014-09-22 16:08:55.745427 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-09-22 16:08:55.746476 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount did NOT detect btrfs
2014-09-22 16:08:55.775236 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount syncfs(2) syscall fully supported (by glibc and kernel)
2014-09-22 16:08:55.775847 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount found snaps <>
2014-09-22 16:08:56.254811 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2014-09-22 16:09:00.301748 7fd6f29d9780 1 journal _open /var/lib/ceph/osd/ceph-18/journal fd 19: 16382951424 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-09-22 16:09:00.620119 7fd6f29d9780 1 journal _open /var/lib/ceph/osd/ceph-18/journal fd 19: 16382951424 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-09-22 16:09:00.651236 7fd6f29d9780 1 journal close /var/lib/ceph/osd/ceph-18/journal
2014-09-22 16:09:00.700334 7fd6f29d9780 1 filestore(/var/lib/ceph/osd/ceph-18) mount detected xfs
2014-09-22 16:09:00.707685 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount FIEMAP ioctl is supported and appears to work
2014-09-22 16:09:00.707707 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-09-22 16:09:00.708656 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount did NOT detect btrfs
2014-09-22 16:09:00.731348 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount syncfs(2) syscall fully supported (by glibc and kernel)
2014-09-22 16:09:00.731493 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount found snaps <>
2014-09-22 16:09:00.744450 7fd6f29d9780 0 filestore(/var/lib/ceph/osd/ceph-18) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2014-09-22 16:09:00.773887 7fd6f29d9780 1 journal _open /var/lib/ceph/osd/ceph-18/journal fd 20: 16382951424 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-09-22 16:09:00.799478 7fd6f29d9780 1 journal _open /var/lib/ceph/osd/ceph-18/journal fd 20: 16382951424 bytes, block size 4096 bytes, directio = 1, aio = 1
</pre>