Bug #15591

infernalis to jewel upgrade

Added by Emil Öhgren almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 04/25/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:

Description

ubuntu trusty
3.13.0-68-generic

# dpkg -l | egrep '(ceph|rbd)'
    ii ceph 10.2.0-1trusty amd64 distributed storage and file system
    ii ceph-base 10.2.0-1trusty amd64 common ceph daemon libraries and management tools
    ii ceph-common 10.2.0-1trusty amd64 common utilities to mount and interact with a ceph storage cluster
    ii ceph-fs-common 10.2.0-1trusty amd64 common utilities to mount and interact with a ceph file system
    ii ceph-fuse 10.2.0-1trusty amd64 FUSE-based client for the Ceph distributed file system
    ii ceph-mds 10.2.0-1trusty amd64 metadata server for the ceph distributed file system
    ii ceph-mon 10.2.0-1trusty amd64 monitor server for the ceph storage system
    ii ceph-osd 10.2.0-1trusty amd64 OSD server for the ceph storage system
    ii libcephfs1 10.2.0-1trusty amd64 Ceph distributed file system client library
    ii librbd1 10.2.0-1trusty amd64 RADOS block device client library
    ii python-cephfs 10.2.0-1trusty amd64 Python libraries for the Ceph libcephfs library
    ii python-rbd 10.2.0-1trusty amd64 Python libraries for the Ceph librbd library

procedure:

update repo:
# cat /etc/apt/sources.list.d/ceph.list
deb "http://download.ceph.com/debian-jewel/" trusty main
# apt-get update && apt-get install -y ceph ceph-common ceph-fs-common ceph-fuse ceph-mds libcephfs1 python-cephfs

restart ceph-mon-all

# ceph mds dump
    dumped mdsmap epoch 4363
    epoch 4363
    flags 0
    created 2015-12-03 11:21:28.128193
    modified 2016-04-25 14:56:39.017597
    tableserver 0
    root 0
    session_timeout 60
    session_autoclose 300
    max_file_size 1099511627776
    last_failure 1781
    last_failure_osd_epoch 5359
    compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
    max_mds 1
    in 0
    up {0=386085}
    failed
    damaged
    stopped
    data_pools 2
    metadata_pool 3
    inline_data disabled
    512768: 10.3.215.5:6800/3025 'ceph-mds03' mds.-1.0 up:standby-replay seq 2 (standby for rank 0)
    386085: 10.3.215.10:6805/19663 'ceph-mds04' mds.0.118 up:active seq 972918 (standby for rank 0)
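
The two daemon lines at the end of this dump are the part that matters for the crash below: ceph-mds03 is in up:standby-replay but is printed as mds.-1.0, that is, with no rank assigned. The following is a minimal Python sketch for spotting such entries in saved `ceph mds dump` output. The regular expression and the names `parse_daemon_line` and `unranked_standby_replay` are hypothetical helpers written against the line format shown above, not part of Ceph, and other Ceph releases may print these lines differently.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical pattern (not from Ceph) for daemon lines such as:
# 512768: 10.3.215.5:6800/3025 'ceph-mds03' mds.-1.0 up:standby-replay seq 2 (standby for rank 0)
DAEMON_LINE = re.compile(
    r"^\s*(?P<gid>\d+):\s+(?P<addr>\S+)\s+'(?P<name>[^']+)'\s+"
    r"mds\.(?P<rank>-?\d+)\.(?P<inc>\d+)\s+up:(?P<state>[\w-]+)\s+seq\s+(?P<seq>\d+)"
)


class MdsEntry(NamedTuple):
    gid: int
    name: str
    rank: int   # -1 means no rank is assigned
    state: str  # e.g. "active", "standby", "standby-replay"


def parse_daemon_line(line: str) -> Optional[MdsEntry]:
    """Return the parsed daemon entry, or None for any other line of the dump."""
    m = DAEMON_LINE.match(line)
    if m is None:
        return None
    return MdsEntry(int(m.group("gid")), m.group("name"),
                    int(m.group("rank")), m.group("state"))


def unranked_standby_replay(dump_text: str) -> List[MdsEntry]:
    """List daemons that are in standby-replay but carry rank -1."""
    entries = (parse_daemon_line(line) for line in dump_text.splitlines())
    return [e for e in entries
            if e is not None and e.state == "standby-replay" and e.rank == -1]
```

Run against the dump above, `unranked_standby_replay` would report only the ceph-mds03 entry (gid 512768).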

log:

2016-04-25 16:57:38.117631 7f85e99074c0 0 set uid:gid to 64045:64045 (ceph:ceph)
2016-04-25 16:57:38.117766 7f85e99074c0 0 ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9), process ceph-mon, pid 31105
2016-04-25 16:57:38.117843 7f85e99074c0 0 pidfile_write: ignore empty --pid-file
2016-04-25 16:57:38.142888 7f85e99074c0 1 leveldb: Recovering log #1752835
2016-04-25 16:57:38.153152 7f85e99074c0 1 leveldb: Delete type=3 #1752834

2016-04-25 16:57:38.153218 7f85e99074c0 1 leveldb: Delete type=0 #1752835

2016-04-25 16:57:38.153584 7f85e99074c0 0 starting mon.ceph-mon01 rank 1 at 10.3.138.29:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph-mon01 fsid 391a2547-7dd3-4ca4-8988-4490c660697a
2016-04-25 16:57:38.154114 7f85e99074c0 1 mon.ceph-mon01@-1(probing) e3 preinit fsid 391a2547-7dd3-4ca4-8988-4490c660697a
2016-04-25 16:57:38.154399 7f85e99074c0 1 mon.ceph-mon01@-1(probing).paxosservice(pgmap 9188751..9189280) refresh upgraded, format 0 -> 1
2016-04-25 16:57:38.154433 7f85e99074c0 1 mon.ceph-mon01@-1(probing).pg v0 on_upgrade discarding in-core PGMap
2016-04-25 16:57:38.167066 7f85e99074c0 0 mon.ceph-mon01@-1(probing).mds e4272 print_map
e4272
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}

Filesystem 'cephfs' (0)
fs_name cephfs
epoch 4272
flags 0
created 2015-12-03 11:21:28.128193
modified 2016-04-23 06:57:22.817176
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 1781
last_failure_osd_epoch 4643
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
max_mds 1
in 0
up {0=386085}
failed
damaged
stopped
data_pools 2
metadata_pool 3
inline_data disabled
386085: 10.3.215.10:6805/19663 'ceph-mds04' mds.0.118 up:active seq 929632 (standby for rank 0)

Standby daemons:

482265: 10.3.215.5:6800/10334 'ceph-mds03' mds.-1.0 up:standby-replay seq 2 (standby for rank 0)

2016-04-25 16:57:38.177164 7f85e99074c0 -1 mds/FSMap.cc: In function 'void FSMap::sanity() const' thread 7f85e99074c0 time 2016-04-25 16:57:38.167192
mds/FSMap.cc: 607: FAILED assert(i.second.state == MDSMap::STATE_STANDBY)

ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f85e944c9ab]
2: (FSMap::sanity() const+0x932) [0x7f85e9372682]
3: (MDSMonitor::update_from_paxos(bool*)+0x450) [0x7f85e919da80]
4: (PaxosService::refresh(bool*)+0x19a) [0x7f85e911194a]
5: (Monitor::refresh_from_paxos(bool*)+0x143) [0x7f85e90ae6b3]
6: (Monitor::init_paxos()+0x85) [0x7f85e90aeac5]
7: (Monitor::preinit()+0x925) [0x7f85e90be575]
8: (main()+0x236d) [0x7f85e904d15d]
9: (__libc_start_main()+0xf5) [0x7f85e63a9ec5]
10: (()+0x25f2fa) [0x7f85e909f2fa]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-40> 2016-04-25 16:57:38.097958 7f85e99074c0 5 asok(0x7f85f3526000) register_command perfcounters_dump hook 0x7f85f344e050
-39> 2016-04-25 16:57:38.097983 7f85e99074c0 5 asok(0x7f85f3526000) register_command 1 hook 0x7f85f344e050
-38> 2016-04-25 16:57:38.097987 7f85e99074c0 5 asok(0x7f85f3526000) register_command perf dump hook 0x7f85f344e050
-37> 2016-04-25 16:57:38.097989 7f85e99074c0 5 asok(0x7f85f3526000) register_command perfcounters_schema hook 0x7f85f344e050
-36> 2016-04-25 16:57:38.097991 7f85e99074c0 5 asok(0x7f85f3526000) register_command 2 hook 0x7f85f344e050
-35> 2016-04-25 16:57:38.097992 7f85e99074c0 5 asok(0x7f85f3526000) register_command perf schema hook 0x7f85f344e050
-34> 2016-04-25 16:57:38.097994 7f85e99074c0 5 asok(0x7f85f3526000) register_command perf reset hook 0x7f85f344e050
-33> 2016-04-25 16:57:38.097996 7f85e99074c0 5 asok(0x7f85f3526000) register_command config show hook 0x7f85f344e050
-32> 2016-04-25 16:57:38.097999 7f85e99074c0 5 asok(0x7f85f3526000) register_command config set hook 0x7f85f344e050
-31> 2016-04-25 16:57:38.098001 7f85e99074c0 5 asok(0x7f85f3526000) register_command config get hook 0x7f85f344e050
-30> 2016-04-25 16:57:38.098003 7f85e99074c0 5 asok(0x7f85f3526000) register_command config diff hook 0x7f85f344e050
-29> 2016-04-25 16:57:38.098013 7f85e99074c0 5 asok(0x7f85f3526000) register_command log flush hook 0x7f85f344e050
-28> 2016-04-25 16:57:38.098015 7f85e99074c0 5 asok(0x7f85f3526000) register_command log dump hook 0x7f85f344e050
-27> 2016-04-25 16:57:38.098017 7f85e99074c0 5 asok(0x7f85f3526000) register_command log reopen hook 0x7f85f344e050
-26> 2016-04-25 16:57:38.117631 7f85e99074c0 0 set uid:gid to 64045:64045 (ceph:ceph)
-25> 2016-04-25 16:57:38.117766 7f85e99074c0 0 ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9), process ceph-mon, pid 31105
-24> 2016-04-25 16:57:38.117843 7f85e99074c0 0 pidfile_write: ignore empty --pid-file
-23> 2016-04-25 16:57:38.123648 7f85e99074c0 5 asok(0x7f85f3526000) init /var/run/ceph/ceph-mon.ceph-mon01.asok
-22> 2016-04-25 16:57:38.123676 7f85e99074c0 5 asok(0x7f85f3526000) bind_and_listen /var/run/ceph/ceph-mon.ceph-mon01.asok
-21> 2016-04-25 16:57:38.123721 7f85e99074c0 5 asok(0x7f85f3526000) register_command 0 hook 0x7f85f344a0c0
-20> 2016-04-25 16:57:38.123738 7f85e99074c0 5 asok(0x7f85f3526000) register_command version hook 0x7f85f344a0c0
-19> 2016-04-25 16:57:38.123750 7f85e99074c0 5 asok(0x7f85f3526000) register_command git_version hook 0x7f85f344a0c0
-18> 2016-04-25 16:57:38.123762 7f85e99074c0 5 asok(0x7f85f3526000) register_command help hook 0x7f85f344e240
-17> 2016-04-25 16:57:38.123773 7f85e99074c0 5 asok(0x7f85f3526000) register_command get_command_descriptions hook 0x7f85f344e230
-16> 2016-04-25 16:57:38.135096 7f85e374d700 5 asok(0x7f85f3526000) entry start
-15> 2016-04-25 16:57:38.142888 7f85e99074c0 1 leveldb: Recovering log #1752835
-14> 2016-04-25 16:57:38.153152 7f85e99074c0 1 leveldb: Delete type=3 #1752834

-13> 2016-04-25 16:57:38.153218 7f85e99074c0 1 leveldb: Delete type=0 #1752835
-12> 2016-04-25 16:57:38.153584 7f85e99074c0 0 starting mon.ceph-mon01 rank 1 at 10.3.138.29:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph-mon01 fsid 391a2547-7dd3-4ca4-8988-4490c660697a
-11> 2016-04-25 16:57:38.153671 7f85e99074c0 1 -- 10.3.138.29:6789/0 learned my addr 10.3.138.29:6789/0
-10> 2016-04-25 16:57:38.153685 7f85e99074c0 1 accepter.accepter.bind my_inst.addr is 10.3.138.29:6789/0 need_addr=0
-9> 2016-04-25 16:57:38.153921 7f85e99074c0 5 adding auth protocol: cephx
-8> 2016-04-25 16:57:38.153936 7f85e99074c0 5 adding auth protocol: cephx
-7> 2016-04-25 16:57:38.153978 7f85e99074c0 10 log_channel(cluster) update_config to_monitors: true to_syslog: false syslog_facility: daemon prio: info to_graylog: false graylog_host: 127.0.0.1 graylog_port: 12201)
-6> 2016-04-25 16:57:38.153994 7f85e99074c0 10 log_channel(audit) update_config to_monitors: true to_syslog: false syslog_facility: local0 prio: info to_graylog: false graylog_host: 127.0.0.1 graylog_port: 12201)
-5> 2016-04-25 16:57:38.154114 7f85e99074c0 1 mon.ceph-mon01@-1(probing) e3 preinit fsid 391a2547-7dd3-4ca4-8988-4490c660697a
-4> 2016-04-25 16:57:38.154399 7f85e99074c0 1 mon.ceph-mon01@-1(probing).paxosservice(pgmap 9188751..9189280) refresh upgraded, format 0 -> 1
-3> 2016-04-25 16:57:38.154433 7f85e99074c0 1 mon.ceph-mon01@-1(probing).pg v0 on_upgrade discarding in-core PGMap
-2> 2016-04-25 16:57:38.162502 7f85e99074c0 4 mon.ceph-mon01@-1(probing).mds e4272 new map
-1> 2016-04-25 16:57:38.167066 7f85e99074c0 0 mon.ceph-mon01@-1(probing).mds e4272 print_map
e4272
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}

Filesystem 'cephfs' (0)
fs_name cephfs
epoch 4272
flags 0
created 2015-12-03 11:21:28.128193
modified 2016-04-23 06:57:22.817176
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 1781
last_failure_osd_epoch 4643
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
max_mds 1
in 0
up {0=386085}
failed
damaged
stopped
data_pools 2
metadata_pool 3
inline_data disabled
386085: 10.3.215.10:6805/19663 'ceph-mds04' mds.0.118 up:active seq 929632 (standby for rank 0)

Standby daemons:

482265: 10.3.215.5:6800/10334 'ceph-mds03' mds.-1.0 up:standby-replay seq 2 (standby for rank 0)

0> 2016-04-25 16:57:38.177164 7f85e99074c0 -1 mds/FSMap.cc: In function 'void FSMap::sanity() const' thread 7f85e99074c0 time 2016-04-25 16:57:38.167192
mds/FSMap.cc: 607: FAILED assert(i.second.state == MDSMap::STATE_STANDBY)
ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f85e944c9ab]
2: (FSMap::sanity() const+0x932) [0x7f85e9372682]
3: (MDSMonitor::update_from_paxos(bool*)+0x450) [0x7f85e919da80]
4: (PaxosService::refresh(bool*)+0x19a) [0x7f85e911194a]
5: (Monitor::refresh_from_paxos(bool*)+0x143) [0x7f85e90ae6b3]
6: (Monitor::init_paxos()+0x85) [0x7f85e90aeac5]
7: (Monitor::preinit()+0x925) [0x7f85e90be575]
8: (main()+0x236d) [0x7f85e904d15d]
9: (__libc_start_main()+0xf5) [0x7f85e63a9ec5]
10: (()+0x25f2fa) [0x7f85e909f2fa]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues

Copied to fs - Backport #15738: jewel: infernalis to jewel upgrade Resolved

History

#1 Updated by John Spray almost 3 years ago

482265: 10.3.215.5:6800/10334 'ceph-mds03' mds.-1.0 up:standby-replay seq 2 (standby for rank 0)

Ah, the code is expecting that a standby-replay daemon will have its rank set to the rank it is following, but that isn't actually the case, hence the assertion failure. I think I intended that standby-replay daemons (those that were actually replaying) would be in the MDSMap for their filesystem and would have their rank set, but that doesn't seem to be what happens. Not sure why the tests in test_failover.py are passing, though!
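
To make the failure mode concrete, here is a simplified Python model of what the log shows. It is a sketch under stated assumptions, not the Ceph implementation: the rule in `split_by_rank` (daemons without a rank go to the standby list) is an assumption inferred from the print_map output, where ceph-mds03 appears under "Standby daemons:", and `Daemon`, `split_by_rank` and `sanity` are hypothetical names. Only the checked condition is taken from the log (`i.second.state == MDSMap::STATE_STANDBY`).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

RANK_NONE = -1  # printed as "mds.-1.0" in the map output


@dataclass
class Daemon:
    gid: int
    name: str
    rank: int
    state: str  # "active", "standby", "standby-replay", ...


def split_by_rank(daemons: Iterable[Daemon]) -> Tuple[Dict[int, Daemon], Dict[int, Daemon]]:
    """Assumed placement rule: ranked daemons belong to the filesystem,
    everything else ends up in the standby list."""
    daemons = list(daemons)
    in_fs = {d.gid: d for d in daemons if d.rank != RANK_NONE}
    standbys = {d.gid: d for d in daemons if d.rank == RANK_NONE}
    return in_fs, standbys


def sanity(standbys: Dict[int, Daemon]) -> None:
    """Model of the failed check: every entry in the standby list must be
    in plain standby state."""
    for gid, d in standbys.items():
        if d.state != "standby":
            raise AssertionError(
                f"gid {gid} ({d.name}) is in the standby list in state {d.state}"
            )


if __name__ == "__main__":
    # The two daemons from the e4272 map in the monitor log.
    daemons = [
        Daemon(386085, "ceph-mds04", 0, "active"),
        Daemon(482265, "ceph-mds03", RANK_NONE, "standby-replay"),
    ]
    _, standbys = split_by_rank(daemons)
    sanity(standbys)  # raises AssertionError for ceph-mds03
```

In this model the standby-replay daemon has no rank, so it is filed with the plain standbys and then trips the state check, which matches the description above.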

#2 Updated by John Spray almost 3 years ago

  • Status changed from New to In Progress
  • Assignee set to John Spray

#3 Updated by John Spray almost 3 years ago

  • Project changed from Ceph to fs
  • Category set to 47

#4 Updated by John Spray almost 3 years ago

  • Status changed from In Progress to Need Review

Added the fix for this to https://github.com/ceph/ceph/pull/8300

#5 Updated by Greg Farnum almost 3 years ago

  • Status changed from Need Review to Pending Backport
  • Backport set to jewel

#6 Updated by Nathan Cutler almost 3 years ago

#7 Updated by Greg Farnum almost 3 years ago

  • Status changed from Pending Backport to Resolved

#8 Updated by Greg Farnum over 2 years ago

  • Component(FS) MDS added
