Bug #2798
handle_osd_ping assert
Description
ceph version 0.48argonaut-404-gabe05a3 (abe05a3fbbb120d8d354623258d9104584db66f7)
1: (OSDMap::get_cluster_inst(int) const+0xc9) [0x58cde9]
2: (OSD::handle_osd_ping(MOSDPing*)+0x8cf) [0x5d4b4f]
3: (OSD::heartbeat_dispatch(Message*)+0x71) [0x5d5491]
4: (SimpleMessenger::DispatchQueue::entry()+0x583) [0x7d5683]
5: (SimpleMessenger::dispatch_entry()+0x15) [0x7d6a05]
6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7957bd]
7: (()+0x77f1) [0x7ffff76507f1]
8: (clone()+0x6d) [0x7ffff6aa1ccd]
Associated revisions
OSD: handle_osd_ping: use service->get_osdmap()
This way, we avoid grabbing the map_lock. Furthermore,
get curmap at the beginning of the method to ensure that
we send the message using the same map used to check
is_up.
This should also fix #2798, which was caused by
an osd being marked up between service.get_osdmap()
and OSD::osdmap.
Signed-off-by: Samuel Just <sam.just@inktank.com>
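The race described in the commit message can be illustrated with a minimal sketch. This is not Ceph code: MapSnapshot, MapRef, ping_reply_target and ping_reply_target_racy are hypothetical stand-ins, and the lookup throws where the backtrace suggests the real OSDMap::get_cluster_inst asserts. The point is only that the is_up check and the address lookup should consult one map snapshot taken at the start of the handler.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for one OSD map epoch (not Ceph's OSDMap).
struct MapSnapshot {
    unsigned epoch;
    std::map<int, std::string> up_addrs;  // osd id -> cluster address, up OSDs only

    bool is_up(int osd) const { return up_addrs.count(osd) != 0; }

    // Looking up an OSD that is not up in *this* map is treated as a
    // programming error. The sketch throws so it can be exercised safely.
    const std::string& get_cluster_inst(int osd) const {
        auto it = up_addrs.find(osd);
        if (it == up_addrs.end())
            throw std::logic_error("osd is not up in this map");
        return it->second;
    }
};

using MapRef = std::shared_ptr<const MapSnapshot>;

// Racy shape: the check and the lookup use two maps that may be at
// different epochs, so the lookup can fail even though the check passed.
std::optional<std::string> ping_reply_target_racy(const MapRef& check_map,
                                                  const MapRef& send_map,
                                                  int from) {
    if (!check_map->is_up(from))
        return std::nullopt;
    return send_map->get_cluster_inst(from);  // may throw
}

// Fixed shape: one snapshot, taken once, used for both the check and the lookup.
std::optional<std::string> ping_reply_target(const MapRef& curmap, int from) {
    if (!curmap->is_up(from))
        return std::nullopt;  // sender not up in this epoch: no reply target
    return curmap->get_cluster_inst(from);
}
```

With a single snapshot, a ping from an OSD that is not yet up in that epoch is simply skipped rather than tripping the lookup's precondition.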
History
#1 Updated by Tamilarasi muthamizhan about 11 years ago
Recent logs: /a/teuthology-2012-07-17_19:00:06-regression-master-testing-gcov/13020
ubuntu@teuthology:/a/teuthology-2012-07-17_19:00:06-regression-master-testing-gcov/13020$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: 14240f8208136dbbe7e825caedc0104806027aae
nuke-on-error: true
overrides:
  ceph:
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 90ddc5ae51627e7656459085d7e15105c8b8316d
  workunit:
    coverage: true
    sha1: 90ddc5ae51627e7656459085d7e15105c8b8316d
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana39.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDo+Kh24vRxeTQ6/n5PIIGuxrPHPRO/xMQlwoLHi7mR01cIXJMG5wet7mp2om3/5SZSDcLBHduDKrdWL142Sg5fC0zZPUggbxS7nz/UCjYBzMsOtHEUAU5Gs0KFopOCHXNEveK95ezsroMAD5+jS/IEpiooYCkrR3H+NSvUU0Ae352PlXqV0vamkYzyQyEMmhFE50ALhUXbKMve3d2mxJee5sqVZSBmQTbze9RKUA96t9iiwiheflXbN1i9WHlbBOIue5pZ5fM3/vqPWgaShfFpa0pT56QKJfjyFcDeCLOislo23E5qKAJOi5vn5BoYVtG3niNQpt/YbYGfDEHVeqt9
  ubuntu@plana40.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEwyNlwC9Utqf3PCjL2JR4wwDkzpdEJuW93DOW82vYVisYEGod454JwXeNkjqzTUk6tXeRoUM9f/C6sZS3LFgHcMYt6m0sxP8DC4qU+q0YxCw9zLY8bXKe4DDjijM62h/SnyqyOWIh9amGT7wRwZEHBV1BKvZbNxQIJ7ESkuKsk/tJfWKhq7dSw6E/+MZ4yQtXvTyaJ3pK96Hq2uoUkawv+FxXBrzG3FtTTYA8gqA1SIiV3erEIQuBK/WD74i5yK4rwpfGTo7jNc0V6wrwO1BKFj/OGjSC+2LSAkBgf8WLe6UL/dHr3bBEyzm0V4xMf5Iqb8JGvkaXNEfbFqzKC2Wv
  ubuntu@plana42.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDLK24JC5ni+UfmxncggOIxjHNsF5J/y/iureYMSpWSQLUsVIudQstka+qcojWrVfcG21W/2kokrvLHqLWyd1ar27ovQE3pMv+nG8jxL+OpEpR72QxAH5g+e7Uv3VewKVfssbyOvtOYWIVfp3z7GtpVlCO9V0DuUgbFyoiNih93kZGpIEkv2QA6j0YA65JntISwMwvXGF+VNilUOERHKXLdmLHKIbZ5aAjxGCJMs8fhsVTfPu9l1nEXklK7bgRa7395BJa48dfONdoESoZZrr4IwE3C6acc+NIw+qGwSnNweAVFvH9C+rbN6gAvW9ax0MMeK5TveDiwaTbEB6Po5UKF
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/pjd.sh
ubuntu@teuthology:/a/teuthology-2012-07-17_19:00:06-regression-master-testing-gcov/13020$ cat summary.yaml
ceph-sha1: 90ddc5ae51627e7656459085d7e15105c8b8316d
client.0-kernel-sha1: 14240f8208136dbbe7e825caedc0104806027aae
description: collection:basic clusters:fixed-3.yaml fs:btrfs.yaml tasks:cfuse_workunit_suites_pjd.yaml
duration: 2266.0501101016998
failure_reason: 'Command failed with status 1: ''/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage
  /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper kill /tmp/cephtest/binary/usr/local/bin/ceph-osd
  -f -i 3 -c /tmp/cephtest/ceph.conf'''
flavor: basic
mon.a-kernel-sha1: 14240f8208136dbbe7e825caedc0104806027aae
mon.b-kernel-sha1: 14240f8208136dbbe7e825caedc0104806027aae
owner: scheduled_teuthology@teuthology
success: false
ubuntu@teuthology:/a/teuthology-2012-07-17_19:00:06-regression-master-testing-gcov/13020$ zcat remote/*/log/osd.3.log.gz
ceph version 0.48argonaut-413-g90ddc5a (commit:90ddc5ae51627e7656459085d7e15105c8b8316d)
1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x71ba9a]
2: (()+0xfcb0) [0x7f8e22a9ccb0]
3: (OSD::handle_osd_ping(MOSDPing*)+0x74d) [0x5dbdfd]
4: (OSD::heartbeat_dispatch(Message*)+0x22b) [0x5dc70b]
5: (SimpleMessenger::DispatchQueue::entry()+0x92b) [0x7b5b3b]
6: (SimpleMessenger::dispatch_entry()+0x24) [0x7b6914]
7: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7762fd]
8: (()+0x7e9a) [0x7f8e22a94e9a]
9: (clone()+0x6d) [0x7f8e210494bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
#2 Updated by Tamilarasi muthamizhan about 11 years ago
Also seen in:
ubuntu@teuthology: /a/teuthology-2012-07-17_19:00:06-regression-master-testing-gcov/13039
ubuntu@teuthology: /a/teuthology-2012-07-17_19:00:06-regression-master-testing-gcov/13042
ubuntu@teuthology: /a/teuthology-2012-07-17_19:00:06-regression-master-testing-gcov/13115
#3 Updated by Samuel Just about 11 years ago
- Status changed from 12 to Resolved