Bug #19825

mon crash on shutdown, lease_ack_timeout event

Added by Sage Weil 8 months ago. Updated 4 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
Start date: 05/02/2017
Due date:
% Done: 0%
Source:
Tags:
Backport: jewel,kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No

Description

   -12> 2017-05-01 22:26:14.521630 178f3700 10 mon.b@0(leader) e2 ms_handle_refused 0x13d4af30 172.21.15.37:6790/0
   -11> 2017-05-01 22:26:15.478776 1a0f8700 -1 received  signal: Terminated from  PID: 6260 task name: /usr/bin/python /bin/daemon-helper term valgrind --trace-children=no --child-silent-after-fork=yes --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/mon.b.log --time-stamp=yes --tool=memcheck --leak-check=full --show-reachable=yes ceph-mon -f --cluster ceph -i b  UID: 0
   -10> 2017-05-01 22:26:15.483074 1a0f8700 -1 mon.b@0(leader) e2 *** Got Signal Terminated ***
    -9> 2017-05-01 22:26:15.485762 1a0f8700  1 mon.b@0(leader) e2 shutdown
    -8> 2017-05-01 22:26:15.493068 1a0f8700  5 asok(0xdf374f0) unregister_command mon_status
    -7> 2017-05-01 22:26:15.503498 1a0f8700  5 asok(0xdf374f0) unregister_command quorum_status
    -6> 2017-05-01 22:26:15.504742 1a0f8700  5 asok(0xdf374f0) unregister_command sync_force
    -5> 2017-05-01 22:26:15.505482 1a0f8700  5 asok(0xdf374f0) unregister_command add_bootstrap_peer_hint
    -4> 2017-05-01 22:26:15.506473 1a0f8700  5 asok(0xdf374f0) unregister_command quorum enter
    -3> 2017-05-01 22:26:15.507240 1a0f8700  5 asok(0xdf374f0) unregister_command quorum exit
    -2> 2017-05-01 22:26:15.507961 1a0f8700  5 asok(0xdf374f0) unregister_command ops
    -1> 2017-05-01 22:26:15.526724 150ee700  1 mon.b@0(shutdown).paxos(paxos active c 1..266) lease_ack_timeout -- calling new election
     0> 2017-05-01 22:26:15.585400 150ee700 -1 /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.1-1789-g491f4ce/rpm/el7/BUILD/ceph-12.0.1-1789-g491f4ce/src/mon/Paxos.cc: In function 'void Paxos::lease_ack_timeout()' thread 150ee700 time 2017-05-01 22:26:15.528453
/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.1-1789-g491f4ce/rpm/el7/BUILD/ceph-12.0.1-1789-g491f4ce/src/mon/Paxos.cc: 1204: FAILED assert(mon->is_leader())

 ceph version 12.0.1-1789-g491f4ce (491f4ce2ef640975497d6561c1f0c1ae265e3996)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x6ab810]
 2: (Paxos::lease_ack_timeout()+0x1d3) [0x5011a3]
 3: (Context::complete(int)+0x9) [0x4df0f9]
 4: (SafeTimer::timer_thread()+0x104) [0x6a8204]
 5: (SafeTimerThread::entry()+0xd) [0x6a9c2d]
 6: (()+0x7dc5) [0xa984dc5]
 7: (clone()+0x6d) [0xd3b273d]

/a/yuriw-2017-05-01_21:57:19-rados-wip-yuri-testing_2017_4_29_2---basic-smithi/1088991
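
In the log above, thread 1a0f8700 handles the signal and moves the monitor to the shutdown state, and about 40 ms later the timer thread 150ee700 still runs lease_ack_timeout, which fails assert(mon->is_leader()). This ticket does not record what the actual fix was. The following is only a minimal sketch of one way such a callback could be guarded, assuming a shutdown check at the top of the handler. ToyMonitor, ToyTimer, ToyPaxos, and timeout_after_shutdown_is_ignored are hypothetical stand-ins for illustration, not Ceph's Monitor, SafeTimer, or Paxos classes, and the timer is fired by hand rather than from a real thread.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not Ceph's real classes.
enum class MonState { Leader, Peon, Shutdown };

struct ToyMonitor {
  MonState state = MonState::Leader;
  bool is_leader() const { return state == MonState::Leader; }
  bool is_shutdown() const { return state == MonState::Shutdown; }
  void shutdown() { state = MonState::Shutdown; }
};

// A toy timer: callbacks are queued and fired manually, which makes the
// "event fires after shutdown" ordering deterministic.
struct ToyTimer {
  std::vector<std::function<void()>> pending;

  void add_event(std::function<void()> cb) { pending.push_back(std::move(cb)); }

  // Run every queued callback, as a timer thread would once they expire.
  void fire_all() {
    std::vector<std::function<void()>> ready;
    ready.swap(pending);
    for (auto& cb : ready) cb();
  }
};

struct ToyPaxos {
  ToyMonitor* mon;
  int elections_called = 0;
  int timeouts_ignored = 0;

  void lease_ack_timeout() {
    // Assumed guard: a timeout that fires after shutdown began is dropped
    // instead of reaching the leader-only assertion below.
    if (mon->is_shutdown()) {
      ++timeouts_ignored;
      return;
    }
    assert(mon->is_leader());
    ++elections_called;  // stands in for "calling new election"
  }
};

// Replays the ordering seen in the log: the timeout is scheduled while the
// monitor is leader, shutdown happens, and only then does the timer fire.
inline bool timeout_after_shutdown_is_ignored() {
  ToyMonitor mon;
  ToyPaxos paxos{&mon};
  ToyTimer timer;
  timer.add_event([&paxos] { paxos.lease_ack_timeout(); });
  mon.shutdown();
  timer.fire_all();
  return paxos.timeouts_ignored == 1 && paxos.elections_called == 0;
}
```

Without the is_shutdown() check, the same sequence reaches assert(mon->is_leader()) with the state already set to Shutdown, which mirrors the failure reported here. Another option, also not confirmed by this ticket, would be to cancel pending timer events before the state changes.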


Related issues

Copied to Ceph - Backport #19926: jewel: mon crash on shutdown, lease_ack_timeout event Resolved
Copied to Ceph - Backport #19928: kraken: mon crash on shutdown, lease_ack_timeout event Resolved

History

#1 Updated by Kefu Chai 8 months ago

  • Status changed from New to In Progress
  • Assignee set to Kefu Chai

#2 Updated by Kefu Chai 8 months ago

  • Status changed from In Progress to Need Review
  • Backport set to jewel,kraken

#3 Updated by Sage Weil 7 months ago

/a/sage-2017-05-03_20:20:09-rados-wip-sage-testing---basic-smithi/1098340

This time it was the probe timeout. The same fix should apply.

#4 Updated by Sage Weil 7 months ago

  • Status changed from Need Review to Pending Backport
  • Priority changed from Immediate to High

#5 Updated by Alexey Sheplyakov 7 months ago

  • Copied to Backport #19926: jewel: mon crash on shutdown, lease_ack_timeout event added

#6 Updated by Alexey Sheplyakov 7 months ago

  • Copied to Backport #19928: kraken: mon crash on shutdown, lease_ack_timeout event added

#7 Updated by Nathan Cutler 4 months ago

  • Status changed from Pending Backport to Resolved
