Bug #24419

ceph-objectstore-tool unable to open mon store

Added by dovefi Z about 2 years ago. Updated 8 months ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: Administration/Usability
Target version:
% Done: 0%
Source: Support
Tags: ceph-objectstore-tool --mon-store-path
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

Hi everyone,
I am running luminous v12.2.5 and am trying to recover the monitor database from the OSDs.
I followed the official documentation step by step:
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
I have 2 hosts, A and B, with 3 OSDs per host.
After I collected the maps from the OSDs on A and rsynced the mon store to B, collecting the maps on B failed
with the error "unable to open mon store: /tmp/mon-store/". Here is what I did.
Step 1: collect from host A

[root@gz-open-dw-c204 recover]# MS=/tmp/mon-store/; mkdir $MS; host=gz-open-dw-c204; ssh -t root@$host "for osd in /var/lib/ceph/osd/ceph-*; do  ceph-objectstore-tool --type bluestore --data-path \$osd --op update-mon-db --mon-store-path $MS; done" 
root@gz-open-dw-c204's password:
osd.0   : 0 osdmaps trimmed, 52 osdmaps added.
          267 pgs added.
osd.1   : 0 osdmaps trimmed, 0 osdmaps added.
          259 pgs added.
osd.2   : 0 osdmaps trimmed, 0 osdmaps added.
          242 pgs added.
Connection to gz-open-dw-c204 closed.

Step 2: rsync to host B
[root@gz-open-dw-c204 recover]# MS=/tmp/mon-store; host=gz-open-dw-c205; rsync -avz $MS/ root@$host:$MS/
root@gz-open-dw-c205's password:
sending incremental file list
created directory /tmp/mon-store
./
kv_backend
store.db.wal/
store.db.wal/000009.log
store.db/
store.db/CURRENT
store.db/IDENTITY
store.db/LOCK
store.db/MANIFEST-000008
store.db/OPTIONS-000008
store.db/OPTIONS-000011

sent 16457 bytes  received 175 bytes  3696.00 bytes/sec
total size is 148691  speedup is 8.94

Step 3: collect from host B
[root@gz-open-dw-c204 recover]# MS=/tmp/mon-store/; host=gz-open-dw-c205; ssh -t root@$host "for osd in /var/lib/ceph/osd/ceph-*; do  ceph-objectstore-tool --type bluestore --data-path \$osd --op update-mon-db --mon-store-path $MS; done" 
root@gz-open-dw-c205's password:
unable to open mon store: /tmp/mon-store/
unable to open mon store: /tmp/mon-store/
unable to open mon store: /tmp/mon-store/
Connection to gz-open-dw-c205 closed.

Why does this happen? I tried to read the source code, but it is too hard for me. I hope somebody can help.

History

#1 Updated by Kevin Cao over 1 year ago

Were you able to figure out why?

#2 Updated by yite gu about 1 year ago

I have the same problem.

#3 Updated by Josh Durgin 8 months ago

  • Project changed from Ceph to RADOS
  • Category changed from Monitor to Administration/Usability
  • Status changed from New to Won't Fix

It looks like this is due to bluestore setting the rocksdb_db_paths config option in luminous. This causes the ceph-objectstore-tool update-mon-db command to write rocksdb files to 'db' and 'db.slow' directories in the CWD.

These extra directories need to be copied around in addition to the monstore directory, and at the end copied into the monstore/store.db directory.
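
A quick way to see whether a run of update-mon-db left these directories behind is to look for them in the directory the tool was started from. The following is a minimal sketch only; find_stray_db_dirs is a hypothetical helper name, not part of Ceph, and it assumes the directories are named exactly 'db' and 'db.slow' as described above.

```shell
# find_stray_db_dirs [DIR]
# Hypothetical helper (not part of Ceph): print the paths of any "db" or
# "db.slow" directories that exist directly under DIR (default: the CWD).
find_stray_db_dirs() {
  dir="${1:-.}"
  for name in db db.slow; do
    if [ -d "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
}

# Example (run from the directory where ceph-objectstore-tool was invoked):
#   find_stray_db_dirs .
```

If this prints nothing, no such directories were created in that location.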

The procedure for recovering monitors from bluestore in luminous should look like:

ms=/root/mon-store
db=/root/db
db_slow=/root/db.slow
mkdir $ms

# collect the cluster map from stopped OSDs
for host in $hosts; do
  rsync -avz $ms/. user@$host:$ms.remote
  rsync -avz $db/. user@$host:$db.remote
  rsync -avz $db_slow/. user@$host:$db_slow.remote
  rm -rf $ms
  ssh user@$host <<EOF
    for osd in /var/lib/ceph/osd/ceph-*; do
      ceph-objectstore-tool --data-path \$osd --no-mon-config --op update-mon-db --mon-store-path $ms.remote
    done
EOF
  rsync -avz user@$host:$ms.remote/. $ms
  rsync -avz user@$host:$db.remote/. $db
  rsync -avz user@$host:$db_slow.remote/. $db_slow
done

# rebuild the monitor store from the collected map. If the cluster does not
# use cephx authentication, skip the following steps that update the
# keyring with the caps; there is no need to pass the "--keyring" option,
# i.e. just use "ceph-monstore-tool $ms rebuild" instead.
ceph-authtool /path/to/admin.keyring -n mon. \
  --cap mon 'allow *'
ceph-authtool /path/to/admin.keyring -n client.admin \
  --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring

# make a backup of the corrupted store.db just in case!  repeat for
# all monitors.
mv /var/lib/ceph/mon/mon.foo/store.db /var/lib/ceph/mon/mon.foo/store.db.corrupted

# move the sst files from db and db.slow directories to the monitor store
mv $db/*.sst $db_slow/*.sst $ms/store.db
# move the rebuilt store.db into place.  repeat for all monitors.
mv $ms/store.db /var/lib/ceph/mon/mon.foo/store.db
chown -R ceph:ceph /var/lib/ceph/mon/mon.foo/store.db
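
One caveat with the "mv $db/*.sst $db_slow/*.sst $ms/store.db" step: if either directory holds no .sst files, the shell passes the unmatched pattern to mv literally and mv reports a missing file. A more tolerant variant might look like the sketch below. move_sst_files is a hypothetical helper, not a Ceph command, and it only assumes the files of interest end in ".sst" as in the procedure above.

```shell
# move_sst_files SRC_DIR DEST_DIR
# Hypothetical helper (not part of Ceph): move every *.sst file from SRC_DIR
# into DEST_DIR. A missing SRC_DIR or one without .sst files is not an error.
move_sst_files() {
  src="$1"
  dest="$2"
  [ -d "$src" ] || return 0
  for f in "$src"/*.sst; do
    # An unmatched glob stays literal; skip it instead of handing it to mv.
    [ -e "$f" ] || continue
    mv -- "$f" "$dest"/ || return 1
  done
}

# Example, using the variables from the procedure above:
#   move_sst_files "$db" "$ms/store.db"
#   move_sst_files "$db_slow" "$ms/store.db"
```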

This was fixed incidentally in the move to centralized ceph config, by https://github.com/ceph/ceph/commit/955537776b4f29d0950dd6ca1e56f2ea30a7f1c9
This isn't backportable to luminous at this point, when it's close to EOL, so I'm closing this ticket.

#4 Updated by Josh Durgin 8 months ago

To be clear, this isn't an issue in mimic or later releases.
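
For scripts that need to decide whether to apply the extra db/db.slow handling, a version check along these lines could be used. This is only a sketch: is_luminous is a hypothetical helper, it takes a plain version string such as "12.2.5" (luminous is the 12.x series, mimic is 13.x), and how that string is obtained is left to the caller. Releases older than luminous are not covered by this ticket, so the check deliberately matches 12.x only.

```shell
# is_luminous VERSION
# Hypothetical helper (not part of Ceph): succeed if VERSION (e.g. "12.2.5")
# belongs to the luminous (12.x) series, fail otherwise.
is_luminous() {
  case "$1" in
    12.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Example:
#   if is_luminous "12.2.5"; then
#     echo "copy db and db.slow along with the mon store"
#   fi
```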
