Bug #17400
mon/tool: PGMonitor::check_osd_map assert fail when the rebuild mon store
Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
hammer,jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
1. ceph version
https://github.com/ceph/ceph/pull/11126
tchaikov:wip-17179-jewel
2. cluster info:
[root@node181 mon]# ceph osd tree
ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.51472 root default
-2 0.22887     host node181
 1 0.07629         osd.1         up  1.00000          1.00000
 2 0.07629         osd.2         up  1.00000          1.00000
 0 0.07628         osd.0         up  1.00000          1.00000
-3 0.28586     host node173
 4 0.09529         osd.4         up  1.00000          1.00000
 5 0.09529         osd.5         up  1.00000          1.00000
 3 0.09528         osd.3         up  1.00000          1.00000
3. steps info
1. stop all servers
    systemctl stop ceph-osd.target
    systemctl stop ceph-mon@node173
    systemctl stop ceph-osd.target
    systemctl stop ceph-mon@node181
2. mkdir tmp_store and back up the original mon
    mkdir -p /tmp/mon-store // in node173
    mkdir -p /tmp/mon-store // in node181
    mv /var/lib/ceph/mon/ceph-node173 /var/lib/ceph/mon/ceph-node173_back
    mv /var/lib/ceph/mon/ceph-node181 /var/lib/ceph/mon/ceph-node181_back
3. collect the cluster map from the OSDs in node173
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5/ --op update-mon-db --mon-store-path /tmp/mon-store/
4. sync mon-store to node181
    rsync -avz /tmp/mon-store/ 10.118.202.181:/tmp/mon-store/
5. collect the cluster map from the OSDs in node181
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --op update-mon-db --mon-store-path /tmp/mon-store/
6. rebuild the monitor store
    /usr/bin/ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring
    mkdir -p /var/lib/ceph/mon/ceph-node181
    cp -r /tmp/mon-store/* /var/lib/ceph/mon/ceph-node181
    cp /keyring /var/lib/ceph/mon/ceph-node181
    touch done; touch systemd
    chown ceph:ceph -R ../ceph-node181
    mkdir -p /var/lib/ceph/mon/ceph-node173 // in node173
    scp -r /tmp/mon-store/* 10.118.202.173:/var/lib/ceph/mon/ceph-node173
    cp /keyring /var/lib/ceph/mon/ceph-node173
    touch done; touch systemd
    chown ceph:ceph -R ../ceph-node173
7. start mon
    systemctl start ceph-mon@node173
    systemctl start ceph-mon@node181
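Steps 3 and 5 run the same ceph-objectstore-tool command once per local OSD. A minimal sketch of that loop follows, assuming the OSD data directories use the /var/lib/ceph/osd/ceph-<id> layout from the commands in the steps. The function name collect_maps and the variables OSD_ROOT, MON_STORE and TOOL are hypothetical and not part of the original report. By default it is a dry run that only prints each command.

```shell
#!/bin/sh
# Hypothetical helper: feed every local OSD's maps into one mon store.
# OSD_ROOT, MON_STORE and TOOL are illustrative names, not from the report.
OSD_ROOT="${OSD_ROOT:-/var/lib/ceph/osd}"
MON_STORE="${MON_STORE:-/tmp/mon-store}"
# Dry run by default: each command is printed, not executed.
# Set TOOL=/usr/bin/ceph-objectstore-tool to run it for real.
TOOL="${TOOL:-echo ceph-objectstore-tool}"

collect_maps() {
    for osd in "$OSD_ROOT"/ceph-*; do
        # Skip the unexpanded glob when this host has no OSD directories.
        [ -d "$osd" ] || continue
        # $TOOL is intentionally unquoted so "echo ceph-objectstore-tool" splits into words.
        $TOOL --data-path "$osd" --op update-mon-db --mon-store-path "$MON_STORE" || return 1
    done
}
```

Calling collect_maps on node173, syncing /tmp/mon-store as in step 4, and calling it again on node181 would correspond to steps 3 to 5.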
4. core info
2016-09-22 16:49:59.900190 7fab37d8b700 5 mon.node173@0(leader).paxos(paxos active c 1..3) is_readable = 1 - now=2016-09-22 16:49:59.900190 lease_expire=2016-09-22 16:50:04.900174 has v0 lc 3
2016-09-22 16:49:59.916854 7fab37d8b700 -1 mon/PGMonitor.cc: In function 'void PGMonitor::check_osd_map(epoch_t)' thread 7fab37d8b700 time 2016-09-22 16:49:59.900215
mon/PGMonitor.cc: 892: FAILED assert(err == 0)
 ceph version 10.2.2.8 (4cf7ed7423032cffc3768f1a091251d3733b26d0)
 1: (ceph::__ceph_assert_fail(char const, char const, int, char const)+0x85) [0x7fab3eddbaa5]
 2: (PGMonitor::check_osd_map(unsigned int)+0x1528) [0x7fab3eb3efe8]
 3: (PGMonitor::on_active()+0xf6) [0x7fab3eb3f416]
 4: (PaxosService::_active()+0x207) [0x7fab3ea88217]
 5: (Context::complete(int)+0x9) [0x7fab3ea54a39]
 6: (void finish_contexts(CephContext, std::list >&, int)+0xac) [0x7fab3ea5b27c]
 7: (Paxos::finish_round()+0xd1) [0x7fab3ea7f5c1]
 8: (Paxos::commit_finish()+0x656) [0x7fab3ea81df6]
 9: (C_Committed::finish(int)+0x2b) [0x7fab3ea8501b]
 10: (Context::complete(int)+0x9) [0x7fab3ea54a39]
 11: (MonitorDBStore::C_DoTransaction::finish(int)+0xa7) [0x7fab3ea83bc7]
 12: (Context::complete(int)+0x9) [0x7fab3ea54a39]
 13: (Finisher::finisher_thread_entry()+0x216) [0x7fab3ed020e6]
 14: (()+0x7df3) [0x7fab3d308df3]
 15: (clone()+0x6d) [0x7fab3bbd33dd]
 NOTE: a copy of the executable, or objdump -rdS <executable> is needed to interpret this.
Files
Updated by huanwen ren over 7 years ago
- File store.db.7z added
for i in `seq 0 5`; do
ls /var/lib/ceph/osd/ceph-3/current/meta/
done
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
Updated by Kefu Chai over 7 years ago
sorry should be
for i in `seq 0 5`; do
ls /var/lib/ceph/osd/ceph-$i/current/meta/
done
Updated by huanwen ren over 7 years ago
for i in `seq 0 5`;do
ls /var/lib/ceph/osd/ceph-$i/current/meta/
done
[root@node181 ~]# ./aaaaaa.sh
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
ls: cannot access /var/lib/ceph/osd/ceph-3/current/meta/: No such file or directory
ls: cannot access /var/lib/ceph/osd/ceph-4/current/meta/: No such file or directory
ls: cannot access /var/lib/ceph/osd/ceph-5/current/meta/: No such file or directory
[root@node173 renhw]# ./aaaaaa.sh
ls: cannot access /var/lib/ceph/osd/ceph-0/current/meta/: No such file or directory
ls: cannot access /var/lib/ceph/osd/ceph-1/current/meta/: No such file or directory
ls: cannot access /var/lib/ceph/osd/ceph-2/current/meta/: No such file or directory
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
DIR_2 DIR_5 osd\usuperblock__0_23C2FCDE__none snapmapper__0_A468EC03__none
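The two runs above show that each host holds only its own three OSDs, so half of the ls calls fail on either node. A small sketch of a variant that checks for the directory first and labels each result is given here. The function name list_local_meta and the OSD_ROOT variable are hypothetical and not part of the thread; the path layout is taken from the loop above.

```shell
#!/bin/sh
# OSD_ROOT is an illustrative variable; the default matches the paths in the thread.
OSD_ROOT="${OSD_ROOT:-/var/lib/ceph/osd}"

list_local_meta() {
    for i in 0 1 2 3 4 5; do
        dir="$OSD_ROOT/ceph-$i/current/meta"
        if [ -d "$dir" ]; then
            # This OSD lives on the current host: show its meta directory.
            echo "osd.$i:"
            ls "$dir"
        else
            # The OSD belongs to another host (or is not mounted here).
            echo "osd.$i: not on this host"
        fi
    done
}
```

Running list_local_meta on each node would print the meta contents for the local OSDs and a short note for the remote ones instead of an ls error.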
Updated by Kefu Chai over 7 years ago
- Status changed from New to Fix Under Review
Updated by Kefu Chai over 7 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to hammer,jewel
Updated by Nathan Cutler over 7 years ago
- Copied to Backport #17602: hammer: mon/tool: PGMonitor::check_osd_map assert fail when the rebuild mon store added
Updated by Nathan Cutler over 7 years ago
- Copied to Backport #17603: jewel: mon/tool: PGMonitor::check_osd_map assert fail when the rebuild mon store added
Updated by Nathan Cutler over 7 years ago
- Status changed from Pending Backport to Resolved