OSDs stuck in "booting" state after catastrophic data loss
I have a Ceph cluster with 3 MONs+MGRs and 5 OSDs with attached disks. All nodes run CoreOS as the host OS, and the Ceph daemons run in Docker (luminous, ubuntu 16.04, bluestore, etcd). Ceph reports HEALTH_OK and I have some data stored on the OSDs.
Now imagine that (intentionally or due to some failure) I destroy all the cluster nodes almost simultaneously, keeping only the OSD disks. I then want to recreate the cluster while keeping all my OSD data. I recreate all the nodes and each of them gets a new IP address. The MONs and MGRs start without any problem and form a quorum (I see all mons in "ceph -s"), and I see the "/ceph-config/ceph/monSetupComplete" flag in etcd (etcd is redeployed as well, so the flag is definitely newly added). The OSDs, however, fail to start, getting stuck on the "start_boot" step, and "ceph osd tree" shows all of them as "down".
What are the correct steps to bootstrap a Ceph cluster with existing (prepared and activated) OSD disks? One thing I figured out is that I should not regenerate the cluster "fsid", so I patched the "ceph/daemon" image to pass my own "fsid". But that seems not to be enough, because the OSDs are still stuck. My guess is that every OSD tries to connect to its peers using the previous osdmap and the old OSD IP addresses. If so, is there any way to reset the osdmap while keeping the data stored on the OSDs? What else could I do?
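As an aside, before booting a recreated cluster it is possible to check which cluster fsid an existing BlueStore OSD believes it belongs to, which confirms whether the patched fsid actually matches the disks. A sketch (the device path and mount point below are illustrative, not taken from this report):

```shell
# Read the label written at OSD creation time; it records the cluster
# fsid, the OSD's own uuid, and its id. The device path is an example.
ceph-bluestore-tool show-label --dev /dev/sdb

# Alternatively, if the OSD data directory is mounted, the cluster fsid
# is also stored there as a plain text file (path is an example):
cat /var/lib/ceph/osd/ceph-0/ceph_fsid
```

If the fsid printed here differs from the one the new MONs report in "ceph -s", the OSDs will refuse to join regardless of the osdmap contents.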
So far this blocks me from using Ceph in production, since I cannot be sure my data would survive a cluster failure. Feel free to ask if you need any logs or other details.
#1 Updated by Maxim Manuylov over 3 years ago
core@mm-ceph-mon-0 ~ $ ceph -s
  cluster:
    id:     ecf1b1ee-d10f-741d-4e01-5124fb84ec4b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum mm-ceph-mon-2,mm-ceph-mon-0,mm-ceph-mon-1
    mgr: mm-ceph-mon-1(active), standbys: mm-ceph-mon-2, mm-ceph-mon-0
    osd: 5 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:

core@mm-ceph-mon-0 ~ $ ceph osd tree
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.30945 root default
-4       0.06189     host mm-ceph-osd-0
 0       0.06189         osd.0             down        0 1.00000
-3       0.06189     host mm-ceph-osd-1
 4       0.06189         osd.4             down        0 1.00000
-5       0.06189     host mm-ceph-osd-2
 3       0.06189         osd.3             down        0 1.00000
-6       0.06189     host mm-ceph-osd-3
 2       0.06189         osd.2             down        0 1.00000
-2       0.06189     host mm-ceph-osd-4
 1       0.06189         osd.1             down        0 1.00000

core@mm-ceph-mon-0 ~ $ ceph osd dump
epoch 6
fsid ecf1b1ee-d10f-741d-4e01-5124fb84ec4b
created 2017-11-15 15:56:37.832653
modified 2017-11-15 15:56:45.402958
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 5
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
max_osd 5
osd.0 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new fc9f64c3-5301-4981-9668-96fbb3d2b606
osd.1 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new 2252cf36-ccda-469e-9836-6dcb55891517
osd.2 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new cdc1a1c6-7016-4470-ab1a-0cce2809092e
osd.3 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new 1dbb0120-850b-4c59-bbce-c43bef2161d8
osd.4 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new 1780ee97-2f46-441b-80bb-714dc5cd2f1b
core@mm-ceph-mon-0 ~ $
#3 Updated by Greg Farnum over 3 years ago
- Tracker changed from Bug to Support
- Project changed from Ceph to RADOS
- Subject changed from OSDs stuck in "booting" state after entire cluster redeploy to OSDs stuck in "booting" state after catastrophic data loss
- Category deleted
- Status changed from New to Resolved
This isn't impossible, but I believe you've gone about it the wrong way. See http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#monitor-store-failures, and I recommend discussing on the mailing list if you have further questions. :)
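For context, the approach described on that page ("Recovery using OSDs") goes in the opposite direction from regenerating the cluster: instead of creating fresh monitors and trying to make the old OSDs join them, it rebuilds the monitor store from the cluster maps that every OSD keeps locally. A rough sketch of that procedure, with illustrative hostnames, paths, and keyring location (adapt to your deployment; inside containers the data paths differ):

```shell
# Accumulator directory for the recovered monitor store
ms=/tmp/mon-store
mkdir -p "$ms"

# On each OSD host, feed every local OSD's copy of the cluster maps
# into the store being rebuilt (hostnames and paths are examples).
for host in mm-ceph-osd-0 mm-ceph-osd-1 mm-ceph-osd-2 mm-ceph-osd-3 mm-ceph-osd-4; do
  rsync -avz "$ms/." "root@$host:$ms"
  ssh "root@$host" 'for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path "$osd" \
      --op update-mon-db --mon-store-path /tmp/mon-store
  done'
  rsync -avz "root@$host:$ms/." "$ms"
done

# Rebuild the monitor store from the collected maps; the keyring must
# contain the admin and mon. keys (path is an example).
ceph-monstore-tool "$ms" rebuild -- --keyring /path/to/admin.keyring

# The resulting store.db then replaces the lost store.db on each
# monitor's data directory before the mons are started.
```

This keeps the cluster fsid, osdmap history, and OSD identities consistent with what is on the disks, which is why the OSDs can then boot, rather than being stuck waiting for a cluster that no longer matches their local metadata.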