Bug #24419
Closed: ceph-objectstore-tool unable to open mon store
Description
Hi everyone,
I am using Luminous v12.2.5 and I am trying to recover the monitor database from the OSDs.
I followed the official documentation step by step:
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
I have 2 hosts, A and B, with 3 OSDs per host.
After I collected the maps from the OSDs on host A, rsynced the mon store to host B, and tried to collect the maps from the OSDs on host B,
ceph-objectstore-tool printed the error "unable to open mon store: /tmp/mon-store/". Here is what I did.
Step 1: collect from host A
[root@gz-open-dw-c204 recover]# MS=/tmp/mon-store/; mkdir $MS; host=gz-open-dw-c204; ssh -t root@$host "for osd in /var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --type bluestore --data-path \$osd --op update-mon-db --mon-store-path $MS; done"
root@gz-open-dw-c204's password:
osd.0 : 0 osdmaps trimmed, 52 osdmaps added. 267 pgs added.
osd.1 : 0 osdmaps trimmed, 0 osdmaps added. 259 pgs added.
osd.2 : 0 osdmaps trimmed, 0 osdmaps added. 242 pgs added.
Connection to gz-open-dw-c204 closed.
Step 2: rsync to host B
[root@gz-open-dw-c204 recover]# MS=/tmp/mon-store; host=gz-open-dw-c205; rsync -avz $MS/ root@$host:$MS/
root@gz-open-dw-c205's password:
sending incremental file list
created directory /tmp/mon-store
./
kv_backend
store.db.wal/
store.db.wal/000009.log
store.db/
store.db/CURRENT
store.db/IDENTITY
store.db/LOCK
store.db/MANIFEST-000008
store.db/OPTIONS-000008
store.db/OPTIONS-000011

sent 16457 bytes  received 175 bytes  3696.00 bytes/sec
total size is 148691  speedup is 8.94
Step 3: collect from host B
[root@gz-open-dw-c204 recover]# MS=/tmp/mon-store/; host=gz-open-dw-c205; ssh -t root@$host "for osd in /var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --type bluestore --data-path \$osd --op update-mon-db --mon-store-path $MS; done"
root@gz-open-dw-c205's password:
unable to open mon store: /tmp/mon-store/
unable to open mon store: /tmp/mon-store/
unable to open mon store: /tmp/mon-store/
Connection to gz-open-dw-c205 closed.
Why does this happen? I tried to read the source code, but it is too hard for me. I hope somebody can help.
Updated by Josh Durgin over 4 years ago
- Project changed from Ceph to RADOS
- Category changed from Monitor to Administration/Usability
- Status changed from New to Won't Fix
It looks like this is due to bluestore setting the rocksdb_db_paths config option in luminous. This causes the ceph-objectstore-tool update-mon-db command to write rocksdb files to 'db' and 'db.slow' directories in the CWD.
These extra directories need to be copied around in addition to the monstore directory, and at the end copied into the monstore/store.db directory.
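To check whether a given run of update-mon-db was affected, one can look for RocksDB .sst files under 'db' and 'db.slow' in the directory the command was run from. A minimal sketch, assuming bash; list_spilled_sst is a hypothetical helper name, not part of Ceph:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of Ceph): print the RocksDB .sst
# files that update-mon-db may have written to the 'db' and 'db.slow'
# directories under a working directory (default: the current directory).
list_spilled_sst() {
    local dir="${1:-.}"
    local sub f
    for sub in db db.slow; do
        # skip a directory that was never created
        [ -d "$dir/$sub" ] || continue
        for f in "$dir/$sub"/*.sst; do
            # an unmatched glob is left as a literal pattern; ignore it
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
}
```

Run it as list_spilled_sst "$PWD" from the directory where ceph-objectstore-tool was invoked; an empty result means no .sst files ended up outside the mon store.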
The procedure for recovering monitors from bluestore in luminous should look like:
ms=/root/mon-store
db=/root/db
db_slow=/root/db.slow

mkdir $ms

# collect the cluster map from stopped OSDs
for host in $hosts; do
  rsync -avz $ms/. user@$host:$ms.remote
  rsync -avz $db/. user@$host:$db.remote
  rsync -avz $db_slow/. user@$host:$db_slow.remote
  rm -rf $ms
  ssh user@$host <<EOF
    for osd in /var/lib/ceph/osd/ceph-*; do
      ceph-objectstore-tool --data-path \$osd --no-mon-config --op update-mon-db --mon-store-path $ms.remote
    done
EOF
  rsync -avz user@$host:$ms.remote/. $ms
  rsync -avz user@$host:$db.remote/. $db
  rsync -avz user@$host:$db_slow.remote/. $db_slow
done

# rebuild the monitor store from the collected map. If the cluster does not
# use cephx authentication, we can skip the following steps to update the
# keyring with the caps, and there is no need to pass the "--keyring" option,
# i.e. just use "ceph-monstore-tool $ms rebuild" instead.
ceph-authtool /path/to/admin.keyring -n mon. \
  --cap mon 'allow *'
ceph-authtool /path/to/admin.keyring -n client.admin \
  --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring

# make a backup of the corrupted store.db just in case! Repeat for
# all monitors.
mv /var/lib/ceph/mon/mon.foo/store.db /var/lib/ceph/mon/mon.foo/store.db.corrupted

# move the sst files from the db and db.slow directories to the monitor store
mv $db/*.sst $db_slow/*.sst $ms/store.db

# move the rebuilt store.db into place. Repeat for all monitors.
mv $ms/store.db /var/lib/ceph/mon/mon.foo/store.db
chown -R ceph:ceph /var/lib/ceph/mon/mon.foo/store.db
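The final "mv $db/*.sst $db_slow/*.sst $ms/store.db" step reports an error if one of the two directories holds no .sst files, because with default bash settings an unmatched glob is passed to mv literally. A slightly more defensive variant of that step is sketched below, assuming bash: it skips missing or empty directories and refuses to overwrite a file that already exists in store.db. merge_spilled_sst is a hypothetical helper name, not a Ceph tool.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of Ceph): move .sst files from
# <workdir>/db and <workdir>/db.slow into <mon-store>/store.db.
merge_spilled_sst() {
    local workdir="$1" ms="$2"
    local sub f name
    if [ ! -d "$ms/store.db" ]; then
        echo "no store.db under $ms" >&2
        return 1
    fi
    for sub in db db.slow; do
        # skip a directory that was never created
        [ -d "$workdir/$sub" ] || continue
        for f in "$workdir/$sub"/*.sst; do
            # an unmatched glob is left literal; nothing to move
            [ -e "$f" ] || continue
            name=$(basename "$f")
            # never clobber a file already present in store.db
            if [ -e "$ms/store.db/$name" ]; then
                echo "refusing to overwrite $ms/store.db/$name" >&2
                return 1
            fi
            mv "$f" "$ms/store.db/" || return 1
        done
    done
    return 0
}
```

With the variables used in the procedure above (db=/root/db, db_slow=/root/db.slow), the call would be merge_spilled_sst /root $ms, run after ceph-monstore-tool rebuild and before moving $ms/store.db into place.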
This was fixed incidentally in the move to centralized ceph config, by https://github.com/ceph/ceph/commit/955537776b4f29d0950dd6ca1e56f2ea30a7f1c9
This isn't backportable to luminous at this point, when it's close to EOL, so I'm closing this ticket.
Updated by Josh Durgin over 4 years ago
To be clear, this isn't an issue in mimic or later releases.