Bug #11053

closed

"FAILED assert(beacon_seq_stamp[seq] > beacon_last_acked_stamp)" in upgrade:dumpling-x-firefly-distro-basic-vps

Added by Yuri Weinstein about 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 80%
Source: Q/A
Tags:
Backport: dumpling, emperor, firefly, giant
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/dumpling-firefly-x
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ceph.com/teuthology-2015-03-04_19:13:01-upgrade:dumpling-x-firefly-distro-basic-vps/
Job: 789599
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-04_19:13:01-upgrade:dumpling-x-firefly-distro-basic-vps/789599/teuthology.log

2015-03-05T20:31:05.602 INFO:teuthology.orchestra.run.vpm078:Running: 'sudo yum -y install libcephfs_jni1 rbd-fuse ceph-radosgw librbd1 ceph-debuginfo ceph-fuse python-ceph ceph ceph-devel ceph-test librados2 cephfs-java rest-bench libcephfs1'
2015-03-05T20:31:05.721 INFO:teuthology.orchestra.run.vpm078.stdout:Loaded plugins: priorities
2015-03-05T20:31:05.862 INFO:tasks.ceph.mds.a.vpm078.stderr:mds/MDS.cc: In function 'void MDS::handle_mds_beacon(MMDSBeacon*)' thread 7f8745004700 time 2015-03-05 23:31:05.856375
2015-03-05T20:31:05.863 INFO:tasks.ceph.mds.a.vpm078.stderr:mds/MDS.cc: 666: FAILED assert(beacon_seq_stamp[seq] > beacon_last_acked_stamp)
2015-03-05T20:31:05.864 INFO:tasks.ceph.mds.a.vpm078.stderr: ceph version 0.61.9-11-ge146934 (e146934ea488219075209816ee96dd16b6d89da2)
2015-03-05T20:31:05.864 INFO:tasks.ceph.mds.a.vpm078.stderr: 1: (MDS::handle_mds_beacon(MMDSBeacon*)+0x16b) [0x4b7c6b]
2015-03-05T20:31:05.864 INFO:tasks.ceph.mds.a.vpm078.stderr: 2: (MDS::handle_core_message(Message*)+0x923) [0x4cc893]
2015-03-05T20:31:05.865 INFO:tasks.ceph.mds.a.vpm078.stderr: 3: (MDS::_dispatch(Message*)+0x2f) [0x4cc97f]
2015-03-05T20:31:05.865 INFO:tasks.ceph.mds.a.vpm078.stderr: 4: (MDS::ms_dispatch(Message*)+0x19b) [0x4ce42b]
2015-03-05T20:31:05.865 INFO:tasks.ceph.mds.a.vpm078.stderr: 5: (DispatchQueue::entry()+0x421) [0x7ed901]
2015-03-05T20:31:05.866 INFO:tasks.ceph.mds.a.vpm078.stderr: 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x7803ad]
2015-03-05T20:31:05.866 INFO:tasks.ceph.mds.a.vpm078.stderr: 7: (()+0x7851) [0x7f874a53d851]
2015-03-05T20:31:05.866 INFO:tasks.ceph.mds.a.vpm078.stderr: 8: (clone()+0x6d) [0x7f874947290d]
2015-03-05T20:31:05.866 INFO:tasks.ceph.mds.a.vpm078.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues 1 (0 open, 1 closed)

Has duplicate: CephFS - Bug #11100: "FAILED assert(beacon_seq_stamp[seq]" in upgrade:dumpling-x-firefly-distro-basic-vps run (Duplicate, 03/11/2015)

Actions #1

Updated by Sage Weil about 9 years ago

  • Project changed from Ceph to CephFS
Actions #2

Updated by Zheng Yan about 9 years ago

  • Status changed from New to 12

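It looks like someone changed the system's time: in the MDS log below, the timestamps jump backwards from 2015-03-06 04:16:03 to 2015-03-05 23:31:01.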

   -67> 2015-03-06 04:16:03.756151 7f87425fe700  1 -- 10.214.130.78:6806/4328 --> 10.214.130.37:6789/0 -- mdsbeacon(4101/a up:creating seq 2 v4) v2 -- ?+0 0x1ee0840 con 0x1e61b80
   -66> 2015-03-06 04:16:03.757429 7f8745004700  1 -- 10.214.130.78:6806/4328 <== mon.0 10.214.130.37:6789/0 17 ==== mdsbeacon(4101/a up:creating seq 2 v4) v2 ==== 103+0+0 (3972643044 0 0) 0x1ee02c0 con 0x1e61b80
   -65> 2015-03-05 23:31:01.806534 7f8741bfd700  2 -- 10.214.130.78:6806/4328 >> 10.214.130.37:6800/4053 pipe(0x1ef0f00 sd=17 :35809 s=2 pgs=8 cs=1 l=1).reader couldn't read tag, Success
   -64> 2015-03-05 23:31:01.806587 7f8741bfd700  2 -- 10.214.130.78:6806/4328 >> 10.214.130.37:6800/4053 pipe(0x1ef0f00 sd=17 :35809 s=2 pgs=8 cs=1 l=1).fault 0: Success
   -63> 2015-03-05 23:31:01.806782 7f8745004700  5 mds.0.1 ms_handle_reset on 10.214.130.37:6800/4053
   -62> 2015-03-05 23:31:01.806803 7f8745004700  1 mds.0.objecter ms_handle_reset on osd.3

Should we fix this assertion?
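To make the backwards step in the excerpt above easy to spot in a longer log, here is a minimal sketch that scans log lines for timestamps that go back in time. It assumes the "<index>> <date> <time> ..." layout seen in the excerpt; the function name `find_backward_jumps`, the one-second tolerance, and the abbreviated sample lines are illustrative and not part of any Ceph tool.

```python
from datetime import datetime, timedelta


def find_backward_jumps(lines, tolerance=timedelta(seconds=1)):
    """Return (previous, current) timestamp pairs where the log clock moves backwards."""
    jumps = []
    prev = None
    for line in lines:
        fields = line.split()
        # Assumed layout: "<index>> <date> <time> <thread> ..."
        if len(fields) < 3:
            continue
        try:
            stamp = datetime.strptime(fields[1] + " " + fields[2],
                                      "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # not a timestamped log line
        if prev is not None and prev - stamp > tolerance:
            jumps.append((prev, stamp))
        prev = stamp
    return jumps


# Sample lines abbreviated from the excerpt above ("..." marks omitted text).
sample = [
    "-66> 2015-03-06 04:16:03.757429 7f8745004700  1 -- ... mdsbeacon(4101/a up:creating seq 2 v4) v2",
    "-65> 2015-03-05 23:31:01.806534 7f8741bfd700  2 -- ... reader couldn't read tag, Success",
    "-64> 2015-03-05 23:31:01.806587 7f8741bfd700  2 -- ... fault 0: Success",
]

for before, after in find_backward_jumps(sample):
    print(f"clock went back by {before - after}: {before} -> {after}")
```

Only the step between entries -66 and -65 is reported; the later lines move forward again.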

Actions #3

Updated by Greg Farnum about 9 years ago

We talked about this briefly in standup. It looks like we can just toss this state out and try again, probably setting ourselves laggy until we get an ack with our new timestamps.
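The idea above can be sketched as a toy model. This is not the code in mds/MDS.cc and not the fix that was merged; the class `BeaconTracker` and its methods are hypothetical names, and timestamps are plain floats. The sketch assumes the assert fires when an acked beacon carries a send time that is not newer than the last acked one, which is what a backwards clock step would produce. Instead of asserting, the model drops its bookkeeping and marks itself laggy until an ack arrives for a beacon sent with the new timestamps.

```python
class BeaconTracker:
    """Toy model of beacon send/ack bookkeeping (illustrative only)."""

    def __init__(self):
        self.seq_stamp = {}           # seq -> time the beacon was sent
        self.last_acked_stamp = None  # send time of the newest acked beacon
        self.last_seq = 0
        self.laggy = False

    def _reset(self):
        # Toss the stale state and stay laggy until a fresh ack arrives.
        self.seq_stamp.clear()
        self.last_acked_stamp = None
        self.laggy = True

    def send(self, now):
        """Record a beacon sent at time `now` and return its sequence number."""
        self.last_seq += 1
        self.seq_stamp[self.last_seq] = now
        return self.last_seq

    def handle_ack(self, seq):
        stamp = self.seq_stamp.get(seq)
        if stamp is None:
            return  # ack for a beacon whose state was already dropped
        if self.last_acked_stamp is not None and stamp <= self.last_acked_stamp:
            # The original code asserted here; a clock step lands us in this branch.
            self._reset()
            return
        self.last_acked_stamp = stamp
        for old in [s for s in self.seq_stamp if s <= seq]:
            del self.seq_stamp[old]
        self.laggy = False


tracker = BeaconTracker()
first = tracker.send(now=1000.0)
tracker.handle_ack(first)          # normal ack
second = tracker.send(now=400.0)   # clock stepped backwards before this beacon
tracker.handle_ack(second)         # state is tossed, tracker goes laggy
laggy_after_jump = tracker.laggy
third = tracker.send(now=405.0)
tracker.handle_ack(third)          # ack with the new timestamps clears laggy
```

Resetting on the ack path keeps the model small; a real daemon would also have to decide what to do with beacons still in flight when the clock changes.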

Actions #4

Updated by Zheng Yan about 9 years ago

  • Status changed from 12 to Fix Under Review
Actions #5

Updated by Loïc Dachary about 9 years ago

  • Status changed from Fix Under Review to 7
  • Assignee set to Zheng Yan
  • % Done changed from 0 to 80
  • Backport set to firefly
Actions #7

Updated by Loïc Dachary about 9 years ago

  • Description updated (diff)
Actions #8

Updated by Greg Farnum about 9 years ago

  • Backport changed from firefly to dumpling, emperor, firefly, giant
Actions #9

Updated by Greg Farnum about 9 years ago

  • Status changed from 7 to Resolved

These PRs all got merged.

Actions #10

Updated by Yuri Weinstein about 9 years ago

  • Status changed from Resolved to New

Still seeing this in:
Run: http://pulpito.ceph.com/teuthology-2015-04-10_19:13:01-upgrade:dumpling-x-firefly-distro-basic-vps/
Job: 843744
Logs: http://pulpito.ceph.com/teuthology-2015-04-10_19:13:01-upgrade:dumpling-x-firefly-distro-basic-vps/843744/

Assertion: mds/MDS.cc: 666: FAILED assert(beacon_seq_stamp[seq] > beacon_last_acked_stamp)
ceph version 0.61.9-11-ge146934 (e146934ea488219075209816ee96dd16b6d89da2)
 1: (MDS::handle_mds_beacon(MMDSBeacon*)+0x16b) [0x4b7c6b]
 2: (MDS::handle_core_message(Message*)+0x923) [0x4cc893]
 3: (MDS::_dispatch(Message*)+0x2f) [0x4cc97f]
 4: (MDS::ms_dispatch(Message*)+0x19b) [0x4ce42b]
 5: (DispatchQueue::entry()+0x421) [0x84dea1]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x8207bd]
 7: (()+0x79d1) [0x7f93bb6b29d1]
 8: (clone()+0x6d) [0x7f93ba5e686d]
Actions #11

Updated by Yuri Weinstein about 9 years ago

  • ceph-qa-suite upgrade/dumpling-firefly-x added
Actions #12

Updated by John Spray about 9 years ago

The stack trace appears to be from Cuttlefish, to which the fix wasn't backported.

It's annoying that we would have to fix this in such an old branch just to make the upgrade tests happy.

Is it possible that there is more of an infrastructure issue here if clocks are jumping around enough to trigger the bug?
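The Cuttlefish identification above follows from the "ceph version 0.61.9-..." line in the trace. As a small sketch, the release name can be looked up from the major.minor prefix of that line; the table covers only the releases named in this ticket, and the helper name `release_name` is made up for illustration.

```python
import re

# Release series for the versions mentioned in this ticket.
RELEASES = {
    (0, 61): "cuttlefish",
    (0, 67): "dumpling",
    (0, 72): "emperor",
    (0, 80): "firefly",
    (0, 87): "giant",
}


def release_name(version_line):
    """Map a 'ceph version X.Y.Z...' line to a release name, or None if unknown."""
    match = re.search(r"ceph version (\d+)\.(\d+)", version_line)
    if match is None:
        return None
    return RELEASES.get((int(match.group(1)), int(match.group(2))))


trace_line = "ceph version 0.61.9-11-ge146934 (e146934ea488219075209816ee96dd16b6d89da2)"
print(release_name(trace_line))
```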

Actions #13

Updated by Greg Farnum about 9 years ago

  • Status changed from New to Resolved

Yeah, we're not doing anything with this. I'm confused why cuttlefish is running at all though, since it's pre-dumpling...
