Bug #17400


mon/tool: PGMonitor::check_osd_map assert fails when rebuilding the mon store

Added by huanwen ren over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
hammer,jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

1. ceph version
https://github.com/ceph/ceph/pull/11126
tchaikov:wip-17179-jewel

2. cluster info:

  [root@node181 mon]# ceph osd tree
ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.51472 root default                                       
-2 0.22887     host node181                                   
 1 0.07629         osd.1         up  1.00000          1.00000 
 2 0.07629         osd.2         up  1.00000          1.00000 
 0 0.07628         osd.0         up  1.00000          1.00000 
-3 0.28586     host node173                                   
 4 0.09529         osd.4         up  1.00000          1.00000 
 5 0.09529         osd.5         up  1.00000          1.00000 
 3 0.09528         osd.3         up  1.00000          1.00000 

3. Steps to reproduce

  1. Stop all daemons
    # on node173
    systemctl stop ceph-osd.target
    systemctl stop ceph-mon@node173
    # on node181
    systemctl stop ceph-osd.target
    systemctl stop ceph-mon@node181

  2. Create a temporary store and back up the original mon directories
    mkdir -p /tmp/mon-store    # on node173
    mkdir -p /tmp/mon-store    # on node181
    mv /var/lib/ceph/mon/ceph-node173 /var/lib/ceph/mon/ceph-node173_back
    mv /var/lib/ceph/mon/ceph-node181 /var/lib/ceph/mon/ceph-node181_back

  3. Collect the cluster map from the OSDs on node173
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5/ --op update-mon-db --mon-store-path /tmp/mon-store/

  4. Sync the mon store to node181
    rsync -avz /tmp/mon-store/ 10.118.202.181:/tmp/mon-store/

  5. Collect the cluster map from the OSDs on node181
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --op update-mon-db --mon-store-path /tmp/mon-store/

  6. Rebuild the monitor store
    /usr/bin/ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring
    mkdir -p /var/lib/ceph/mon/ceph-node181
    cp -r /tmp/mon-store/* /var/lib/ceph/mon/ceph-node181
    cp /keyring /var/lib/ceph/mon/ceph-node181
    touch done; touch systemd

    chown ceph:ceph -R ../ceph-node181

    mkdir -p /var/lib/ceph/mon/ceph-node173    # on node173
    scp -r /tmp/mon-store/* 10.118.202.173:/var/lib/ceph/mon/ceph-node173
    cp /keyring /var/lib/ceph/mon/ceph-node173
    touch done; touch systemd
    chown ceph:ceph -R ../ceph-node173

  7. Start the mons
    systemctl start ceph-mon@node173
    systemctl start ceph-mon@node181
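The per-OSD collection in steps 3 and 5 can be folded into one loop per node. A minimal sketch (the `collect_maps` helper is hypothetical, not part of the tools; it assumes the standard /var/lib/ceph/osd layout and skips OSD directories that are not hosted locally, which matters here because each node only holds half of the OSDs):

```shell
# collect_maps: run ceph-objectstore-tool against every OSD data dir found
# under the given root, accumulating cluster maps into one mon store.
# OSDTOOL can be overridden (e.g. OSDTOOL=echo) for a dry run.
collect_maps() {
    osd_root=$1      # e.g. /var/lib/ceph/osd
    mon_store=$2     # e.g. /tmp/mon-store
    mkdir -p "$mon_store"
    for osd in "$osd_root"/ceph-*; do
        # skip OSDs not hosted on this node (glob unmatched or dir absent)
        [ -d "$osd" ] || continue
        "${OSDTOOL:-/usr/bin/ceph-objectstore-tool}" \
            --data-path "$osd/" --op update-mon-db --mon-store-path "$mon_store/"
    done
}

# On node173: collect_maps /var/lib/ceph/osd /tmp/mon-store
# then rsync /tmp/mon-store to node181 and run the same loop there.
```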

4. core info

2016-09-22 16:49:59.900190 7fab37d8b700 5 mon.node173@0(leader).paxos(paxos active c 1..3) is_readable = 1 - now=2016-09-22 16:49:59.900190 lease_expire=2016-09-22 16:50:04.900174 has v0 lc 3
2016-09-22 16:49:59.916854 7fab37d8b700 -1 mon/PGMonitor.cc: In function 'void PGMonitor::check_osd_map(epoch_t)' thread 7fab37d8b700 time 2016-09-22 16:49:59.900215
mon/PGMonitor.cc: 892: FAILED assert(err == 0)

ceph version 10.2.2.8 (4cf7ed7423032cffc3768f1a091251d3733b26d0)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fab3eddbaa5]
2: (PGMonitor::check_osd_map(unsigned int)+0x1528) [0x7fab3eb3efe8]
3: (PGMonitor::on_active()+0xf6) [0x7fab3eb3f416]
4: (PaxosService::_active()+0x207) [0x7fab3ea88217]
5: (Context::complete(int)+0x9) [0x7fab3ea54a39]
6: (void finish_contexts(CephContext*, std::list&lt;Context*, std::allocator&lt;Context*&gt; &gt;&amp;, int)+0xac) [0x7fab3ea5b27c]
7: (Paxos::finish_round()+0xd1) [0x7fab3ea7f5c1]
8: (Paxos::commit_finish()+0x656) [0x7fab3ea81df6]
9: (C_Committed::finish(int)+0x2b) [0x7fab3ea8501b]
10: (Context::complete(int)+0x9) [0x7fab3ea54a39]
11: (MonitorDBStore::C_DoTransaction::finish(int)+0xa7) [0x7fab3ea83bc7]
12: (Context::complete(int)+0x9) [0x7fab3ea54a39]
13: (Finisher::finisher_thread_entry()+0x216) [0x7fab3ed020e6]
14: (()+0x7df3) [0x7fab3d308df3]
15: (clone()+0x6d) [0x7fab3bbd33dd]
NOTE: a copy of the executable, or objdump -rdS <executable> is needed to interpret this.
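Before starting the mons it is worth checking that the rebuilt store actually contains full osdmaps, since the assert at mon/PGMonitor.cc:892 fires when check_osd_map cannot read a map back from the store. `ceph-monstore-tool <store> dump-keys` lists the store contents; the filter below is a hypothetical sketch (the "osdmap full_<epoch>" key naming shown is an assumption for illustration, not taken from this cluster):

```shell
# count_full_osdmaps: read a dump-keys listing on stdin and count the
# full-osdmap entries; a count of zero would explain the assert above.
count_full_osdmaps() {
    grep -c '^osdmap[[:space:]]\{1,\}full_'
}

# e.g.: /usr/bin/ceph-monstore-tool /tmp/mon-store dump-keys | count_full_osdmaps
```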


Files

store.db.7z (56.4 KB) — huanwen ren, 09/26/2016 06:34 AM

Related issues (2): 0 open, 2 closed

Copied to Ceph - Backport #17602: hammer: mon/tool: PGMonitor::check_osd_map assert fails when rebuilding the mon store (Resolved, Kefu Chai)
Copied to Ceph - Backport #17603: jewel: mon/tool: PGMonitor::check_osd_map assert fails when rebuilding the mon store (Resolved, Kefu Chai)
#1

Updated by huanwen ren over 7 years ago

for i in `seq 0 5`; do
ls /var/lib/ceph/osd/ceph-3/current/meta/
done

DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
#2

Updated by Kefu Chai over 7 years ago

sorry should be

for i in `seq 0 5`; do
  ls /var/lib/ceph/osd/ceph-$i/current/meta/
done

#3

Updated by huanwen ren over 7 years ago

for i in `seq 0 5`;do
ls /var/lib/ceph/osd/ceph-$i/current/meta/
done

[root@node181 ~]# ./aaaaaa.sh 
DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
ls: cannot access /var/lib/ceph/osd/ceph-3/current/meta/: No such file or directory
ls: cannot access /var/lib/ceph/osd/ceph-4/current/meta/: No such file or directory
ls: cannot access /var/lib/ceph/osd/ceph-5/current/meta/: No such file or directory
[root@node173 renhw]# ./aaaaaa.sh 
ls: cannot access /var/lib/ceph/osd/ceph-0/current/meta/: No such file or directory
ls: cannot access /var/lib/ceph/osd/ceph-1/current/meta/: No such file or directory
ls: cannot access /var/lib/ceph/osd/ceph-2/current/meta/: No such file or directory
DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
DIR_2  DIR_5  osd\usuperblock__0_23C2FCDE__none  snapmapper__0_A468EC03__none
#4

Updated by Kefu Chai over 7 years ago

  • Status changed from New to Fix Under Review
#5

Updated by Kefu Chai over 7 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to hammer,jewel
#6

Updated by Nathan Cutler over 7 years ago

  • Copied to Backport #17602: hammer: mon/tool: PGMonitor::check_osd_map assert fail when the rebuild mon store added
#7

Updated by Nathan Cutler over 7 years ago

  • Copied to Backport #17603: jewel: mon/tool: PGMonitor::check_osd_map assert fail when the rebuild mon store added
#8

Updated by Nathan Cutler over 7 years ago

  • Status changed from Pending Backport to Resolved