Bug #15408
closed
"osd/PG.cc: 1892: FAILED assert(waiting_for_peered.empty())" in upgrade:hammer-hammer-distro-basic-openstack
Description
Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-04-06_20:05:02-upgrade:hammer-hammer-distro-basic-openstack/
Job: 30258
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-04-06_20:05:02-upgrade:hammer-hammer-distro-basic-openstack/30258/teuthology.log
2016-04-06T22:09:29.988 INFO:teuthology.orchestra.run.target086019.stderr:dumped all in format json
2016-04-06T22:09:30.814 INFO:teuthology.orchestra.run.target086019:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format=json'
2016-04-06T22:09:31.295 INFO:teuthology.orchestra.run.target086019:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph -m 158.69.86.189:6790 mon_status'
2016-04-06T22:09:32.021 INFO:tasks.ceph.osd.5.target086189.stderr:osd/PG.cc: In function 'void PG::replay_queued_ops()' thread 7f2da141d700 time 2016-04-06 22:09:31.915151
2016-04-06T22:09:32.022 INFO:tasks.ceph.osd.5.target086189.stderr:osd/PG.cc: 1892: FAILED assert(waiting_for_peered.empty())
2016-04-06T22:09:32.212 INFO:tasks.ceph.osd.5.target086189.stderr: ceph version 0.94.6-254-ge219e85 (e219e85be00088eecde7b1f29d7699493a79bc4d)
2016-04-06T22:09:32.213 INFO:tasks.ceph.osd.5.target086189.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1c6b]
2016-04-06T22:09:32.213 INFO:tasks.ceph.osd.5.target086189.stderr: 2: (PG::replay_queued_ops()+0x432) [0x7cc962]
2016-04-06T22:09:32.214 INFO:tasks.ceph.osd.5.target086189.stderr: 3: (OSD::check_replay_queue()+0x3f1) [0x674c61]
2016-04-06T22:09:32.214 INFO:tasks.ceph.osd.5.target086189.stderr: 4: (OSD::tick()+0x60c) [0x6b3d1c]
2016-04-06T22:09:32.214 INFO:tasks.ceph.osd.5.target086189.stderr: 5: (Context::complete(int)+0x9) [0x6c2a49]
2016-04-06T22:09:32.214 INFO:tasks.ceph.osd.5.target086189.stderr: 6: (SafeTimer::timer_thread()+0xec) [0xb9ad6c]
2016-04-06T22:09:32.215 INFO:tasks.ceph.osd.5.target086189.stderr: 7: (SafeTimerThread::entry()+0xd) [0xb9bd0d]
2016-04-06T22:09:32.215 INFO:tasks.ceph.osd.5.target086189.stderr: 8: (()+0x8182) [0x7f2daa145182]
2016-04-06T22:09:32.215 INFO:tasks.ceph.osd.5.target086189.stderr: 9: (clone()+0x6d) [0x7f2da86b047d]
2016-04-06T22:09:32.216 INFO:tasks.ceph.osd.5.target086189.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-04-06T22:09:32.217 INFO:teuthology.orchestra.run.target086019.stderr:dumped all in format json
Updated by Loïc Dachary about 8 years ago
It fails when upgrading from v0.94.2 to the latest hammer. The part of the code where it fails has been modified by http://tracker.ceph.com/projects/ceph/repository/revisions/9f3aebee16e256888b149fa770df845787b06b6e/diff in v0.94.6
Updated by Sage Weil about 8 years ago
- Status changed from New to 12
I don't think this is hammer specific:
- boost::statechart::result PG::RecoveryState::Active::react(const AllReplicasActivated &evt) (and elsewhere) guards the requeue of waiting_for_peered:
    if (pg->flushes_in_progress == 0) {
      pg->requeue_ops(pg->waiting_for_peered);
    }
- but the replay queue blindly takes pgs that have expired and tries to do the queued events:
    if ((pg->is_active() || pg->is_activating()) &&
        pg->is_replay() &&
        pg->is_primary() &&
        pg->replay_until == p->second) {
      pg->replay_queued_ops();
    }
(which ignores flushes_in_progress), and replay_queued_ops will
    if (is_active()) {
      requeue_ops(replay);
      requeue_ops(waiting_for_active);
      assert(waiting_for_peered.empty());
    }
i.e., the active state is not linked to whether there are flushes in progress, yet the assert effectively assumes there are none. We could delay replay_queued_ops until we are flushed, I suppose?
Honestly, I'd rather rip replay out entirely.
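To make the interaction above concrete, here is a minimal toy model in C++ of the suggested "delay replay until flushed" idea. It is a sketch under stated assumptions, not Ceph code: ToyPG, op_queue, on_flushed(), replay_would_assert() and try_replay_queued_ops() are hypothetical names invented for illustration, and only flushes_in_progress, waiting_for_peered and requeue_ops echo identifiers quoted in this comment.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Toy model only: member names echo the PG fields quoted above, but the types
// and control flow are simplified assumptions, not the real Ceph classes.
struct ToyPG {
  bool active = false;
  int flushes_in_progress = 0;
  std::deque<std::string> waiting_for_peered;
  std::deque<std::string> replay_queue;
  std::deque<std::string> op_queue;  // stand-in for the OSD work queue

  // Move every op from q to the back of op_queue.
  void requeue_ops(std::deque<std::string>& q) {
    op_queue.insert(op_queue.end(), q.begin(), q.end());
    q.clear();
  }

  // Mirrors the guard quoted from Active::react(AllReplicasActivated).
  void on_all_replicas_activated() {
    active = true;
    if (flushes_in_progress == 0) {
      requeue_ops(waiting_for_peered);
    }
  }

  // Hypothetical flush-completion hook: drains what the guard held back.
  void on_flushed() {
    flushes_in_progress = 0;
    if (active) {
      requeue_ops(waiting_for_peered);
    }
  }

  // True when the unguarded replay path would trip the assert.
  bool replay_would_assert() const {
    return active && !waiting_for_peered.empty();
  }

  // Hypothetical guarded replay: returns false (deferred) while flushing.
  bool try_replay_queued_ops() {
    if (flushes_in_progress > 0) {
      return false;
    }
    if (active) {
      requeue_ops(replay_queue);
      assert(waiting_for_peered.empty());
    }
    return true;
  }
};
```

In this model, a PG that becomes active with flushes_in_progress set to 1 and one op in waiting_for_peered reports replay_would_assert() as true, and try_replay_queued_ops() defers instead of asserting. Once on_flushed() has run, the guarded replay proceeds. Whether the real fix belongs in check_replay_queue or in replay_queued_ops is left open here.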
Updated by Samuel Just over 7 years ago
- Status changed from 12 to Can't reproduce