Bug #10067

closed

::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_19:13:02-upgrade:dumpling-x-firefly-distro-basic-vps/594806/

From /a/teuthology-2014-11-10_19:13:02-upgrade:dumpling-x-firefly-distro-basic-vps/594806/remote/vpm048/log/ceph-osd.1.log.gz:

     0> 2014-11-11 13:57:12.325521 7f7158621700 -1 *** Caught signal (Aborted) **
 in thread 7f7158621700

 ceph version 0.80.7-124-g0804dee (0804deeab293e09123d1b58825051ccc4dddbc0e)
 1: ceph-osd() [0xab6352]
 2: (()+0xf030) [0x7f7161eaf030]
 3: (gsignal()+0x35) [0x7f7160a22475]
 4: (abort()+0x180) [0x7f7160a256f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f716127789d]
 6: (()+0x63996) [0x7f7161275996]
 7: (()+0x639c3) [0x7f71612759c3]
 8: (()+0x63bee) [0x7f7161275bee]
 9: (ceph::buffer::create_page_aligned(unsigned int)+0xfb) [0xb97eeb]
 10: (ceph::buffer::list::rebuild_page_aligned()+0x14a) [0xb98b9a]
 11: (FileJournal::align_bl(long, ceph::buffer::list&)+0x4e) [0xa8b72e]
 12: (FileJournal::do_write(ceph::buffer::list&)+0x133) [0xa927c3]
 13: (FileJournal::write_thread_entry()+0x22f) [0xa9527f]
 14: (FileJournal::Writer::entry()+0xd) [0x9cd20d]
 15: (()+0x6b50) [0x7f7161ea6b50]
 16: (clone()+0x6d) [0x7f7160acaa7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
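
Reading the backtrace from frame 12 down: FileJournal::do_write() calls align_bl(), which calls buffer::list::rebuild_page_aligned(), which calls create_page_aligned(). The frames above that (__gnu_cxx::__verbose_terminate_handler, abort(), gsignal()) are what libstdc++'s default terminate handler does when an exception is not caught, so the trace is consistent with a failed allocation being reported as an exception that nothing handled. The sketch below is a hypothetical, much simplified model of "rebuild page aligned" (copy the segments into one contiguous, page-aligned allocation), meant only to show that a single posix_memalign call has to satisfy the whole requested length. The names coalesce_page_aligned, Coalesced and kPageSize are illustrative, the 4096-byte page size is an assumption, and this is not a reproduction of Ceph's rebuild_page_aligned().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>      // posix_memalign, free
#include <cstring>      // std::memcpy
#include <memory>       // std::unique_ptr
#include <new>          // std::bad_alloc
#include <string_view>
#include <vector>

// Assumption: 4 KiB pages (a common CEPH_PAGE_SIZE on x86-64).
constexpr std::size_t kPageSize = 4096;

// Memory from posix_memalign must be released with free(), not delete.
struct FreeDeleter {
  void operator()(char* p) const { std::free(p); }
};
using AlignedBuf = std::unique_ptr<char, FreeDeleter>;

// Hypothetical result type: the aligned buffer plus its length.
struct Coalesced {
  AlignedBuf buf;
  std::size_t len;
};

// Hypothetical, simplified model: copy every segment into ONE contiguous,
// page-aligned allocation, so the full length must come from one request.
Coalesced coalesce_page_aligned(const std::vector<std::string_view>& segs) {
  std::size_t total = 0;
  for (std::string_view s : segs) total += s.size();

  void* p = nullptr;
  // posix_memalign signals failure via its return value (EINVAL or ENOMEM).
  if (::posix_memalign(&p, kPageSize, total) != 0)
    throw std::bad_alloc();  // if uncaught, this ends in std::terminate()

  Coalesced out{AlignedBuf(static_cast<char*>(p)), total};
  std::size_t off = 0;
  for (std::string_view s : segs) {
    if (s.empty()) continue;  // avoid memcpy from a possibly-null data()
    std::memcpy(out.buf.get() + off, s.data(), s.size());
    off += s.size();
  }
  return out;
}
```

For example, coalescing the segments "abc" and "de" yields a 5-byte buffer whose address is a multiple of 4096.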

Also note the number of ceph-client.admin.*.log.gz files in the log directory (?!)

#1

Updated by Yuri Weinstein over 9 years ago

  • Description updated (diff)

#2

Updated by Samuel Just over 9 years ago

  • Priority changed from Normal to Urgent

The client.admin.* logs are normal. The crash probably is not.

#3

Updated by Loïc Dachary over 9 years ago

  • Assignee set to Loïc Dachary

#4

Updated by Loïc Dachary over 9 years ago

It's a stress-split test, so no erasure code is involved after upgrading to firefly.

#5

Updated by Loïc Dachary over 9 years ago

 -1327> 2014-11-11 13:57:05.496639 7f71525e9700 10 osd.1 5945 handle_replica_op osd_sub_op_reply(osd.1.0:2438 421.27 b104eea7/vpm05325025-391/1c4//421 [] ondisk, result = 0) v2 epoch 5945
 -1326> 2014-11-11 13:57:05.496645 7f71525e9700 15 osd.1 5945 require_same_or_newer_map 5945 (i am 5945) 0x5330780
 -1325> 2014-11-11 13:57:05.496674 7f71525e9700 20 osd.1 5945 _share_map_incoming osd.4 10.214.140.111:6804/32429 5945
 -1324> 2014-11-11 13:57:05.496684 7f71525e9700 15 osd.1 5945 enqueue_op 0xd4cf770 prio 196 cost 0 latency 0.000190 osd_sub_op_reply(osd.1.0:2438 421.27 b104eea7/vpm05325025-391/1c4//421 [] ondisk, result = 0) v2
 -1323> 2014-11-11 13:57:05.496702 7f71525e9700 10 osd.1 5945 do_waiters -- start
 -1322> 2014-11-11 13:57:05.496704 7f71525e9700 10 osd.1 5945 do_waiters -- finish
 -1321> 2014-11-11 13:57:05.500882 7f7158621700 10 journal room 82665471 max_size 104857600 pos 37638144 header.start 15450112 top 4096
 -1320> 2014-11-11 13:57:05.500894 7f7158621700 10 journal check_for_full at 37638144 : 3153920 < 82665471
 -1319> 2014-11-11 13:57:05.500896 7f7158621700 15 journal prepare_single_write 18 will write 37638144 : seq 265226 len 3150926 -> 3153920 (head 40 pre_pad 1428 ebl 3150926 post_pad 1486 tail 40) (ebl alignment 1468)
 -1318> 2014-11-11 13:57:05.503133 7f7158621700 20 journal prepare_multi_write hit max write size 10485760
 -1317> 2014-11-11 13:57:05.503141 7f7158621700 20 journal prepare_multi_write queue_pos now 40792064
 -1316> 2014-11-11 13:57:05.503144 7f7158621700 15 journal do_write writing 27795456~12996608
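
The journal figures in these lines are internally consistent, and the arithmetic can be checked at compile time. The constants are copied from the log. The formula for room is an assumption (free space in a circular journal whose write position is past header.start), chosen because it reproduces the logged value rather than taken from FileJournal.cc; the 4096-byte page size is likewise assumed.

```cpp
#include <cstdint>

// Values copied from the osd.1 journal log lines above.
constexpr std::int64_t max_size     = 104857600;
constexpr std::int64_t pos          = 37638144;
constexpr std::int64_t header_start = 15450112;
constexpr std::int64_t top          = 4096;
constexpr std::int64_t page         = 4096;  // assumption: 4 KiB alignment unit

// Assumed wrap-around free-space formula; it matches "journal room 82665471".
constexpr std::int64_t room = (max_size - pos) + (header_start - top) - 1;
static_assert(room == 82665471, "journal room");

// prepare_single_write: head + pre_pad + ebl + post_pad + tail.
constexpr std::int64_t head = 40, pre_pad = 1428, ebl = 3150926,
                       post_pad = 1486, tail = 40;
constexpr std::int64_t entry = head + pre_pad + ebl + post_pad + tail;
static_assert(entry == 3153920, "padded entry length");
static_assert(entry % page == 0, "entry is a whole number of pages");
static_assert(pos % page == 0, "entry starts on a page boundary");
static_assert(head + pre_pad == 1468, "equals the logged ebl alignment 1468");
static_assert(entry < room, "check_for_full: 3153920 < 82665471");

// do_write writing off~len: the batch ends exactly at the new queue_pos.
constexpr std::int64_t write_off = 27795456, write_len = 12996608;
static_assert(write_off + write_len == 40792064, "matches queue_pos");
static_assert(write_len > 10485760, "batch is larger than max write size");
```

Any mismatch is a compile error. The batch passed to do_write() is 12996608 bytes (about 12.4 MiB), larger than the logged max write size of 10485760; the log alone does not show how much of it align_bl() had to copy into a fresh page-aligned buffer.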

#6

Updated by Loïc Dachary over 9 years ago

The call

      int r = ::posix_memalign((void**)(void*)&data, CEPH_PAGE_SIZE, len);

at http://workbench.dachary.org/ceph/ceph/blob/firefly/src/common/buffer.cc#L239 fails. Since the logs look consistent, I'm not sure it's memory corruption. Maybe it ran out of memory?
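
posix_memalign() reports failure through its return value: 0 on success, otherwise an error number, EINVAL when the alignment is not a power of two multiple of sizeof(void*), or ENOMEM when the memory cannot be allocated. Assuming CEPH_PAGE_SIZE is a valid power-of-two page size, EINVAL should not apply here, which leaves ENOMEM as the likely code if this call is what failed. A minimal sketch of checking that result and converting it into std::bad_alloc; try_alloc_aligned and alloc_aligned_or_throw are hypothetical helpers, not Ceph functions.

```cpp
#include <cassert>
#include <cerrno>    // EINVAL, ENOMEM
#include <cstddef>
#include <cstdint>   // std::uintptr_t
#include <cstdlib>   // posix_memalign, free
#include <new>       // std::bad_alloc

// Hypothetical helper: returns the raw posix_memalign result so a caller
// can tell EINVAL (bad alignment) apart from ENOMEM (out of memory).
int try_alloc_aligned(void** out, std::size_t align, std::size_t len) {
  *out = nullptr;
  // Result is 0, EINVAL or ENOMEM; check it rather than errno.
  return ::posix_memalign(out, align, len);
}

// Hypothetical throwing variant: any nonzero result becomes std::bad_alloc.
// If nothing catches it, std::terminate() runs; libstdc++'s default handler
// (__gnu_cxx::__verbose_terminate_handler) then calls abort().
void* alloc_aligned_or_throw(std::size_t align, std::size_t len) {
  void* p = nullptr;
  if (try_alloc_aligned(&p, align, len) != 0)
    throw std::bad_alloc();
  return p;
}
```

Only the EINVAL path (alignment 3) is deterministic enough to exercise in a check; whether a very large request yields ENOMEM depends on available memory and the kernel's overcommit policy.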

#7

Updated by Loïc Dachary over 9 years ago

  • Subject changed from osd.1 crashed in upgrade:dumpling-x-firefly-distro-basic-vps run to ::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7
  • Status changed from New to Can't reproduce