Backport #16915
Status: Closed
Jewel: OSD crash with Hammer to Jewel Upgrade: void FileStore::init_temp_collections()
Release: jewel
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Updated by David Zafman over 7 years ago
- Copied from Bug #16672: OSD crash with Hammer to Jewel Upgrade: void FileStore::init_temp_collections() added
Updated by David Zafman over 7 years ago
https://github.com/ceph/ceph/pull/10561
We should merge the master pull request first, after review and testing. Then we can merge this Jewel backport.
Updated by Nathan Cutler over 7 years ago
- Description updated (diff)
Original description
During an upgrade from 0.94.7 to 10.2.2 I'm seeing multiple OSDs crash with this backtrace:
-7> 2016-07-13 10:23:42.811650 7f3f5fd0c800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-476) detect_features: splice is supported
-6> 2016-07-13 10:23:42.843643 7f3f5fd0c800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-476) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
-5> 2016-07-13 10:23:42.844736 7f3f5fd0c800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-476) detect_feature: extsize is disabled by conf
-4> 2016-07-13 10:23:42.847300 7f3f5fd0c800  1 leveldb: Recovering log #876
-3> 2016-07-13 10:23:42.894086 7f3f5fd0c800  1 leveldb: Delete type=0 #876
-2> 2016-07-13 10:23:42.894171 7f3f5fd0c800  1 leveldb: Delete type=3 #875
-1> 2016-07-13 10:23:42.895373 7f3f5fd0c800  0 filestore(/var/lib/ceph/osd/ceph-476) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
 0> 2016-07-13 10:23:42.905145 7f3f5fd0c800 -1 os/filestore/FileStore.cc: In function 'void FileStore::init_temp_collections()' thread 7f3f5fd0c800 time 2016-07-13 10:23:42.902566
os/filestore/FileStore.cc: 1735: FAILED assert(r == 0)
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f3f607375b5]
 2: (FileStore::init_temp_collections()+0xa63) [0x7f3f603ff613]
 3: (FileStore::mount()+0x3ed0) [0x7f3f604036d0]
 4: (OSD::init()+0x27d) [0x7f3f600c704d]
 5: (main()+0x2c55) [0x7f3f6002cbe5]
 6: (__libc_start_main()+0xf5) [0x7f3f5cc1eb15]
 7: (()+0x353009) [0x7f3f60077009]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
 0/ 5 none   0/ 1 lockdep   0/ 1 context   1/ 1 crush
 1/ 5 mds   1/ 5 mds_balancer   1/ 5 mds_locker   1/ 5 mds_log
 1/ 5 mds_log_expire   1/ 5 mds_migrator   0/ 1 buffer   0/ 1 timer
 0/ 1 filer   0/ 1 striper   0/ 1 objecter   0/ 0 rados
 0/ 0 rbd   0/ 5 rbd_mirror   0/ 5 rbd_replay   0/ 5 journaler
 0/ 5 objectcacher   0/ 5 client   0/ 0 osd   0/ 5 optracker
 0/ 5 objclass   0/ 0 filestore   0/ 0 journal   0/ 0 ms
 1/ 5 mon   0/10 monc   1/ 5 paxos   0/ 5 tp
 0/ 0 auth   1/ 5 crypto   1/ 1 finisher   1/ 5 heartbeatmap
 1/ 5 perfcounter   1/ 5 rgw   1/10 civetweb   1/ 5 javaclient
 1/ 5 asok   1/ 1 throttle   0/ 0 refs   1/ 5 xio
 1/ 5 compressor   1/ 5 newstore   1/ 5 bluestore   1/ 5 bluefs
 1/ 3 bdev   1/ 5 kstore   4/ 5 rocksdb   4/ 5 leveldb
 1/ 5 kinetic   1/ 5 fuse
 -2/-2 (syslog threshold)
 -1/-1 (stderr threshold)
 max_recent 10000
 max_new 1000
 log_file /var/log/ceph/ceph-osd.476.log
--- end dump of recent events ---
2016-07-13 10:23:42.909150 7f3f5fd0c800 -1 *** Caught signal (Aborted) **
 in thread 7f3f5fd0c800 thread_name:ceph-osd
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x91341a) [0x7f3f6063741a]
 2: (()+0xf100) [0x7f3f5e66f100]
 3: (gsignal()+0x37) [0x7f3f5cc325f7]
 4: (abort()+0x148) [0x7f3f5cc33ce8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f3f60737797]
 6: (FileStore::init_temp_collections()+0xa63) [0x7f3f603ff613]
 7: (FileStore::mount()+0x3ed0) [0x7f3f604036d0]
 8: (OSD::init()+0x27d) [0x7f3f600c704d]
 9: (main()+0x2c55) [0x7f3f6002cbe5]
 10: (__libc_start_main()+0xf5) [0x7f3f5cc1eb15]
 11: (()+0x353009) [0x7f3f60077009]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
If I try to start them again it works, but sometimes requires two additional tries.
systemctl reset-failed ceph-osd@476.service
systemctl start ceph-osd@476.service
It is a bit hard to diagnose since it doesn't happen on all OSDs and it doesn't always happen a second time on the same OSD.
Updated by David Zafman over 7 years ago
- Status changed from In Progress to Resolved