Bug #11053

"FAILED assert(beacon_seq_stamp[seq] > beacon_last_acked_stamp)" in upgrade:dumpling-x-firefly-distro-basic-vps

Added by Yuri Weinstein over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
03/06/2015
Due date:
% Done:

80%

Source:
Q/A
Tags:
Backport:
dumpling, emperor, firefly, giant
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
upgrade/dumpling-firefly-x
Component(FS):
Labels (FS):
Pull request ID:

Description

Run: http://pulpito.ceph.com/teuthology-2015-03-04_19:13:01-upgrade:dumpling-x-firefly-distro-basic-vps/
Job: 789599
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-04_19:13:01-upgrade:dumpling-x-firefly-distro-basic-vps/789599/teuthology.log

2015-03-05T20:31:05.602 INFO:teuthology.orchestra.run.vpm078:Running: 'sudo yum -y install libcephfs_jni1 rbd-fuse ceph-radosgw librbd1 ceph-debuginfo ceph-fuse python-ceph ceph ceph-devel ceph-test librados2 cephfs-java rest-bench libcephfs1'
2015-03-05T20:31:05.721 INFO:teuthology.orchestra.run.vpm078.stdout:Loaded plugins: priorities
2015-03-05T20:31:05.862 INFO:tasks.ceph.mds.a.vpm078.stderr:mds/MDS.cc: In function 'void MDS::handle_mds_beacon(MMDSBeacon*)' thread 7f8745004700 time 2015-03-05 23:31:05.856375
2015-03-05T20:31:05.863 INFO:tasks.ceph.mds.a.vpm078.stderr:mds/MDS.cc: 666: FAILED assert(beacon_seq_stamp[seq] > beacon_last_acked_stamp)
2015-03-05T20:31:05.864 INFO:tasks.ceph.mds.a.vpm078.stderr: ceph version 0.61.9-11-ge146934 (e146934ea488219075209816ee96dd16b6d89da2)
2015-03-05T20:31:05.864 INFO:tasks.ceph.mds.a.vpm078.stderr: 1: (MDS::handle_mds_beacon(MMDSBeacon*)+0x16b) [0x4b7c6b]
2015-03-05T20:31:05.864 INFO:tasks.ceph.mds.a.vpm078.stderr: 2: (MDS::handle_core_message(Message*)+0x923) [0x4cc893]
2015-03-05T20:31:05.865 INFO:tasks.ceph.mds.a.vpm078.stderr: 3: (MDS::_dispatch(Message*)+0x2f) [0x4cc97f]
2015-03-05T20:31:05.865 INFO:tasks.ceph.mds.a.vpm078.stderr: 4: (MDS::ms_dispatch(Message*)+0x19b) [0x4ce42b]
2015-03-05T20:31:05.865 INFO:tasks.ceph.mds.a.vpm078.stderr: 5: (DispatchQueue::entry()+0x421) [0x7ed901]
2015-03-05T20:31:05.866 INFO:tasks.ceph.mds.a.vpm078.stderr: 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x7803ad]
2015-03-05T20:31:05.866 INFO:tasks.ceph.mds.a.vpm078.stderr: 7: (()+0x7851) [0x7f874a53d851]
2015-03-05T20:31:05.866 INFO:tasks.ceph.mds.a.vpm078.stderr: 8: (clone()+0x6d) [0x7f874947290d]
2015-03-05T20:31:05.866 INFO:tasks.ceph.mds.a.vpm078.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues

Duplicated by fs - Bug #11100: "FAILED assert(beacon_seq_stamp[seq]" in upgrade:dumpling-x-firefly-distro-basic-vps run Duplicate 03/11/2015

Associated revisions

Revision 75bf4bec (diff)
Added by Yan, Zheng over 3 years ago

mds: fix assertion caused by system clock backwards

Fixes: #11053
Signed-off-by: Yan, Zheng <>

Revision 07fc9f66 (diff)
Added by Yan, Zheng over 3 years ago

mds: fix assertion caused by system clock backwards

Fixes: #11053
Signed-off-by: Yan, Zheng <>

Revision 61d60068 (diff)
Added by Yan, Zheng over 3 years ago

mds: fix assertion caused by system clock backwards

Fixes: #11053
Signed-off-by: Yan, Zheng <>

Revision c7cf4933 (diff)
Added by Yan, Zheng over 3 years ago

mds: fix assertion caused by system clock backwards

Fixes: #11053
Signed-off-by: Yan, Zheng <>

History

#1 Updated by Sage Weil over 3 years ago

  • Project changed from Ceph to fs

#2 Updated by Zheng Yan over 3 years ago

  • Status changed from New to Verified

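Looks like someone changed the system's time: the log timestamps jump backwards from 2015-03-06 04:16:03 to 2015-03-05 23:31:01 between entries -66 and -65.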

   -67> 2015-03-06 04:16:03.756151 7f87425fe700  1 -- 10.214.130.78:6806/4328 --> 10.214.130.37:6789/0 -- mdsbeacon(4101/a up:creating seq 2 v4) v2 -- ?+0 0x1ee0840 con 0x1e61b80
   -66> 2015-03-06 04:16:03.757429 7f8745004700  1 -- 10.214.130.78:6806/4328 <== mon.0 10.214.130.37:6789/0 17 ==== mdsbeacon(4101/a up:creating seq 2 v4) v2 ==== 103+0+0 (3972643044 0 0) 0x1ee02c0 con 0x1e61b80
   -65> 2015-03-05 23:31:01.806534 7f8741bfd700  2 -- 10.214.130.78:6806/4328 >> 10.214.130.37:6800/4053 pipe(0x1ef0f00 sd=17 :35809 s=2 pgs=8 cs=1 l=1).reader couldn't read tag, Success
   -64> 2015-03-05 23:31:01.806587 7f8741bfd700  2 -- 10.214.130.78:6806/4328 >> 10.214.130.37:6800/4053 pipe(0x1ef0f00 sd=17 :35809 s=2 pgs=8 cs=1 l=1).fault 0: Success
   -63> 2015-03-05 23:31:01.806782 7f8745004700  5 mds.0.1 ms_handle_reset on 10.214.130.37:6800/4053
   -62> 2015-03-05 23:31:01.806803 7f8745004700  1 mds.0.objecter ms_handle_reset on osd.3

Should we fix this assertion?
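For illustration, a minimal sketch of how the backwards jump in the log excerpt above could be spotted mechanically. The helper name `first_backward_jump` is hypothetical and not part of Ceph; it assumes timestamps in the fixed-width `YYYY-MM-DD HH:MM:SS.ffffff` form shown in the log, which sort correctly as plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not Ceph code): returns the index of the first
// timestamp that is earlier than its predecessor, or -1 if the sequence
// never goes backwards. Fixed-width "YYYY-MM-DD HH:MM:SS.ffffff" strings
// compare in chronological order under ordinary string comparison.
long first_backward_jump(const std::vector<std::string>& stamps) {
    for (std::size_t i = 1; i < stamps.size(); ++i) {
        if (stamps[i] < stamps[i - 1]) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Fed the timestamps of entries -67 through -64 above, this would flag the third entry (index 2), which is where the clock goes from 04:16:03 back to 23:31:01 on the previous day.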

#3 Updated by Greg Farnum over 3 years ago

We talked about this briefly in standup. It looks like we can just toss this state out and try again, probably setting ourselves laggy until we get an ack with our new timestamps.
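A rough sketch of that idea, for discussion only. This is not the merged fix and not the real `MDS::handle_mds_beacon`; `BeaconTracker`, `send`, and `handle_ack` are hypothetical names, and timestamps are plain doubles (seconds) rather than Ceph's `utime_t`. Resetting `last_acked_stamp` to zero on a backwards jump is likewise an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the MDS beacon bookkeeping (illustrative only).
struct BeaconTracker {
    std::map<std::uint64_t, double> seq_stamp;  // beacon seq -> send time
    double last_acked_stamp = 0.0;              // send time of last acked beacon
    bool laggy = false;

    // Record the wall-clock time at which beacon `seq` was sent.
    void send(std::uint64_t seq, double now) { seq_stamp[seq] = now; }

    // Process an ack for `seq`. Returns true if the ack was accepted.
    // Where the original code asserts stamp > last_acked_stamp, this sketch
    // discards the state and marks the daemon laggy instead.
    bool handle_ack(std::uint64_t seq) {
        auto it = seq_stamp.find(seq);
        if (it == seq_stamp.end()) {
            return false;  // unknown or already discarded beacon; ignore
        }
        const double stamp = it->second;
        if (stamp <= last_acked_stamp) {
            // Clock went backwards: toss the state and wait for an ack
            // that carries a timestamp taken after the jump.
            seq_stamp.clear();
            last_acked_stamp = 0.0;
            laggy = true;
            return false;
        }
        last_acked_stamp = stamp;
        // Drop this beacon and every older one still outstanding.
        seq_stamp.erase(seq_stamp.begin(), seq_stamp.upper_bound(seq));
        laggy = false;
        return true;
    }
};
```

With this shape, an ack for a beacon stamped before the last acked one no longer aborts the daemon; it stays laggy until the next beacon sent after the jump is acked.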

#4 Updated by Zheng Yan over 3 years ago

  • Status changed from Verified to Need Review

#5 Updated by Loic Dachary over 3 years ago

  • Status changed from Need Review to Testing
  • Assignee set to Zheng Yan
  • % Done changed from 0 to 80
  • Backport set to firefly

#7 Updated by Loic Dachary over 3 years ago

  • Description updated (diff)

#8 Updated by Greg Farnum over 3 years ago

  • Backport changed from firefly to dumpling, emperor, firefly, giant

#9 Updated by Greg Farnum over 3 years ago

  • Status changed from Testing to Resolved

These PRs all got merged.

#10 Updated by Yuri Weinstein over 3 years ago

  • Status changed from Resolved to New

Still seeing this in:
Run: http://pulpito.ceph.com/teuthology-2015-04-10_19:13:01-upgrade:dumpling-x-firefly-distro-basic-vps/
Job: 843744
Logs: http://pulpito.ceph.com/teuthology-2015-04-10_19:13:01-upgrade:dumpling-x-firefly-distro-basic-vps/843744/

Assertion: mds/MDS.cc: 666: FAILED assert(beacon_seq_stamp[seq] > beacon_last_acked_stamp)
ceph version 0.61.9-11-ge146934 (e146934ea488219075209816ee96dd16b6d89da2)
 1: (MDS::handle_mds_beacon(MMDSBeacon*)+0x16b) [0x4b7c6b]
 2: (MDS::handle_core_message(Message*)+0x923) [0x4cc893]
 3: (MDS::_dispatch(Message*)+0x2f) [0x4cc97f]
 4: (MDS::ms_dispatch(Message*)+0x19b) [0x4ce42b]
 5: (DispatchQueue::entry()+0x421) [0x84dea1]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x8207bd]
 7: (()+0x79d1) [0x7f93bb6b29d1]
 8: (clone()+0x6d) [0x7f93ba5e686d]

#11 Updated by Yuri Weinstein over 3 years ago

  • ceph-qa-suite upgrade/dumpling-firefly-x added

#12 Updated by John Spray over 3 years ago

The stack trace appears to be from Cuttlefish, to which the fix wasn't backported.

It's annoying that we would have to fix this in such an old branch just to make the upgrade tests happy.

Is it possible that there is more of an infrastructure issue here if clocks are jumping around enough to trigger the bug?

#13 Updated by Greg Farnum over 3 years ago

  • Status changed from New to Resolved

Yeah, we're not doing anything with this. I'm confused why cuttlefish is running at all though, since it's pre-dumpling...
