Bug #10067 (closed)
::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
From /a/teuthology-2014-11-10_19:13:02-upgrade:dumpling-x-firefly-distro-basic-vps/594806/remote/vpm048/log/ceph-osd.1.log.gz:

     0> 2014-11-11 13:57:12.325521 7f7158621700 -1 *** Caught signal (Aborted) **
     in thread 7f7158621700

     ceph version 0.80.7-124-g0804dee (0804deeab293e09123d1b58825051ccc4dddbc0e)
     1: ceph-osd() [0xab6352]
     2: (()+0xf030) [0x7f7161eaf030]
     3: (gsignal()+0x35) [0x7f7160a22475]
     4: (abort()+0x180) [0x7f7160a256f0]
     5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f716127789d]
     6: (()+0x63996) [0x7f7161275996]
     7: (()+0x639c3) [0x7f71612759c3]
     8: (()+0x63bee) [0x7f7161275bee]
     9: (ceph::buffer::create_page_aligned(unsigned int)+0xfb) [0xb97eeb]
     10: (ceph::buffer::list::rebuild_page_aligned()+0x14a) [0xb98b9a]
     11: (FileJournal::align_bl(long, ceph::buffer::list&)+0x4e) [0xa8b72e]
     12: (FileJournal::do_write(ceph::buffer::list&)+0x133) [0xa927c3]
     13: (FileJournal::write_thread_entry()+0x22f) [0xa9527f]
     14: (FileJournal::Writer::entry()+0xd) [0x9cd20d]
     15: (()+0x6b50) [0x7f7161ea6b50]
     16: (clone()+0x6d) [0x7f7160acaa7d]
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
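Frames 9 to 11 show the journal writer asking for a page-aligned copy of its buffer list: FileJournal::align_bl() calls buffer::list::rebuild_page_aligned(), which calls buffer::create_page_aligned(). The following is a hypothetical, much simplified model of that step, not Ceph's code: the Segment type, the helper names and the 4096-byte page size are all assumptions. It copies every segment into one posix_memalign'ed buffer and reports allocation failure as std::bad_alloc; the real rebuild_page_aligned() is more selective about which segments it copies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical, heavily simplified buffer list: a sequence of (pointer, length)
// segments. This is NOT ceph::buffer::list.
struct Segment {
  const char* data;
  std::size_t len;
};

constexpr std::size_t kPageSize = 4096;  // assumption: CEPH_PAGE_SIZE on x86_64

// A segment counts as page aligned if it starts on a page boundary and spans whole pages.
bool segment_page_aligned(const Segment& s) {
  return reinterpret_cast<std::uintptr_t>(s.data) % kPageSize == 0 &&
         s.len % kPageSize == 0;
}

bool list_page_aligned(const std::vector<Segment>& segs) {
  for (const Segment& s : segs)
    if (!segment_page_aligned(s)) return false;
  return true;
}

// Copy all segments into one new page-aligned allocation (caller frees it with
// std::free). The single posix_memalign call plays the role of
// create_page_aligned(len); a nonzero return is turned into std::bad_alloc.
char* rebuild_page_aligned_sketch(const std::vector<Segment>& segs, std::size_t* out_len) {
  assert(out_len != nullptr);
  std::size_t total = 0;
  for (const Segment& s : segs) total += s.len;

  void* p = nullptr;
  if (::posix_memalign(&p, kPageSize, total) != 0)
    throw std::bad_alloc();

  char* dst = static_cast<char*>(p);
  for (const Segment& s : segs) {
    if (s.len != 0) std::memcpy(dst, s.data, s.len);
    dst += s.len;
  }
  *out_len = total;
  return static_cast<char*>(p);
}
```

The caller owns the returned buffer and releases it with std::free(). Note that the copy's start is aligned but its length need not be a whole number of pages.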
Also note the number of ceph-client.admin.*.log.gz files in the log directory (?!)
Updated by Samuel Just over 9 years ago
- Priority changed from Normal to Urgent
The client.admin.* logs are normal. The crash probably is not.
Updated by Loïc Dachary over 9 years ago
It's a stress-split test, so no erasure code is involved after upgrading to firefly.
Updated by Loïc Dachary over 9 years ago
    -1327> 2014-11-11 13:57:05.496639 7f71525e9700 10 osd.1 5945 handle_replica_op osd_sub_op_reply(osd.1.0:2438 421.27 b104eea7/vpm05325025-391/1c4//421 [] ondisk, result = 0) v2 epoch 5945
    -1326> 2014-11-11 13:57:05.496645 7f71525e9700 15 osd.1 5945 require_same_or_newer_map 5945 (i am 5945) 0x5330780
    -1325> 2014-11-11 13:57:05.496674 7f71525e9700 20 osd.1 5945 _share_map_incoming osd.4 10.214.140.111:6804/32429 5945
    -1324> 2014-11-11 13:57:05.496684 7f71525e9700 15 osd.1 5945 enqueue_op 0xd4cf770 prio 196 cost 0 latency 0.000190 osd_sub_op_reply(osd.1.0:2438 421.27 b104eea7/vpm05325025-391/1c4//421 [] ondisk, result = 0) v2
    -1323> 2014-11-11 13:57:05.496702 7f71525e9700 10 osd.1 5945 do_waiters -- start
    -1322> 2014-11-11 13:57:05.496704 7f71525e9700 10 osd.1 5945 do_waiters -- finish
    -1321> 2014-11-11 13:57:05.500882 7f7158621700 10 journal room 82665471 max_size 104857600 pos 37638144 header.start 15450112 top 4096
    -1320> 2014-11-11 13:57:05.500894 7f7158621700 10 journal check_for_full at 37638144 : 3153920 < 82665471
    -1319> 2014-11-11 13:57:05.500896 7f7158621700 15 journal prepare_single_write 18 will write 37638144 : seq 265226 len 3150926 -> 3153920 (head 40 pre_pad 1428 ebl 3150926 post_pad 1486 tail 40) (ebl alignment 1468)
    -1318> 2014-11-11 13:57:05.503133 7f7158621700 20 journal prepare_multi_write hit max write size 10485760
    -1317> 2014-11-11 13:57:05.503141 7f7158621700 20 journal prepare_multi_write queue_pos now 40792064
    -1316> 2014-11-11 13:57:05.503144 7f7158621700 15 journal do_write writing 27795456~12996608
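The journal figures in this excerpt (thread 7f7158621700, the same thread that later aborts) can be cross-checked by hand. The compile-time checks below redo that arithmetic. The journal_room() formula is an assumption, reconstructed so that it reproduces the logged "room" value rather than copied from FileJournal, and CEPH_PAGE_SIZE is assumed to be 4096.

```cpp
#include <cstdint>

// Values copied from the osd.1 log excerpt.
constexpr std::int64_t kMaxSize = 104857600;     // journal max_size (100 MiB)
constexpr std::int64_t kPos = 37638144;          // next write position
constexpr std::int64_t kHeaderStart = 15450112;  // header.start: oldest live entry
constexpr std::int64_t kTop = 4096;              // first usable byte after the journal header
constexpr std::int64_t kPageSize = 4096;         // assumption: CEPH_PAGE_SIZE on x86_64

// Free space in a circular journal. When the write position is at or past the oldest
// live entry, free space is the tail end plus the wrapped-around head; one byte is held
// back so that a full journal is never confused with an empty one (reconstructed).
constexpr std::int64_t journal_room(std::int64_t max_size, std::int64_t pos,
                                    std::int64_t start, std::int64_t top) {
  return pos >= start ? (max_size - pos) + (start - top) - 1 : start - pos - 1;
}

// On-disk size of one entry: header, padding, encoded payload (ebl), padding, tail.
constexpr std::int64_t entry_size(std::int64_t head, std::int64_t pre_pad, std::int64_t ebl,
                                  std::int64_t post_pad, std::int64_t tail) {
  return head + pre_pad + ebl + post_pad + tail;
}

// "room 82665471" and "check_for_full at 37638144 : 3153920 < 82665471"
static_assert(journal_room(kMaxSize, kPos, kHeaderStart, kTop) == 82665471, "room");
static_assert(entry_size(40, 1428, 3150926, 1486, 40) == 3153920, "entry size");
static_assert(3153920 % kPageSize == 0, "entry is a whole number of pages (770)");
// pos is page aligned, so the payload starts at page offset head + pre_pad,
// which equals the logged "ebl alignment 1468".
static_assert(kPos % kPageSize == 0 && (kPos + 40 + 1428) % kPageSize == 1468, "ebl alignment");
// "do_write writing 27795456~12996608" ends exactly at "queue_pos now 40792064".
static_assert(27795456 + 12996608 == 40792064, "write extent");
static_assert(12996608 % kPageSize == 0 && 12996608 / kPageSize == 3173, "3173 pages");
```

The file compiles (for example with `g++ -std=c++11 -c`) only if every check holds. The batch passed to do_write() here is 12996608 bytes, about 12.4 MiB, which is 2510848 bytes past the logged max write size of 10485760; this particular write is about 1300 log entries before the abort, so it is not necessarily the one that failed.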
Updated by Loïc Dachary over 9 years ago
The call

    int r = ::posix_memalign((void**)(void*)&data, CEPH_PAGE_SIZE, len);

at http://workbench.dachary.org/ceph/ceph/blob/firefly/src/common/buffer.cc#L239 fails. Since the logs look consistent, I'm not sure it's memory corruption. Maybe the OSD ran out of memory?
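posix_memalign() reports failure through its return value (EINVAL for an invalid alignment, ENOMEM when the memory cannot be obtained) rather than through errno. The sketch below probes those results and models how a failed call could become the abort in the backtrace. It assumes, as the __verbose_terminate_handler frames suggest, that create_page_aligned() throws std::bad_alloc on a nonzero return and that nothing up the journal writer's stack catches it; posix_memalign_result and page_aligned_alloc_sketch are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Returns posix_memalign's result for the given alignment and size, freeing any
// memory it obtained. Failure is signalled only through the return value.
int posix_memalign_result(std::size_t alignment, std::size_t len) {
  void* p = nullptr;
  const int r = ::posix_memalign(&p, alignment, len);
  if (r == 0)
    std::free(p);
  return r;
}

// Hypothetical stand-in for the allocation in create_page_aligned(), assuming a
// nonzero return is converted into std::bad_alloc. If no caller catches it, the
// runtime calls std::terminate(); libstdc++'s default handler,
// __gnu_cxx::__verbose_terminate_handler, reports the exception and calls abort().
char* page_aligned_alloc_sketch(std::size_t len, std::size_t page_size = 4096) {
  void* p = nullptr;
  if (::posix_memalign(&p, page_size, len) != 0)
    throw std::bad_alloc();
  assert(reinterpret_cast<std::uintptr_t>(p) % page_size == 0);
  return static_cast<char*>(p);  // caller releases with std::free()
}
```

Assuming CEPH_PAGE_SIZE is the system page size, the alignment is always a power of two and a multiple of sizeof(void*), so EINVAL should not occur; under that assumption the remaining failure is ENOMEM, which fits the out-of-memory guess.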
Updated by Loïc Dachary over 9 years ago
- Subject changed from osd.1 crashed in upgrade:dumpling-x-firefly-distro-basic-vps run to ::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7
- Status changed from New to Can't reproduce