Support #18630
ceph osd after reinstall always in down state
Status:
Closed
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Tags:
Reviewed:
Affected Versions:
Pull request ID:
Description
My environment runs Ceph version 10.2.1. It has 6 machines, each with 4 SATA OSDs and 1 SSD OSD, 30 OSDs in total.
The environment was built 7 months ago. Yesterday I tried to reinstall the 6 SSD OSDs: first I removed the pool and waited until all of its PGs were deleted.
Then I removed osd.0:
ceph osd out 0
systemctl stop ceph-osd@0
ceph osd crush remove osd.0
ceph osd rm 0
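For comparison, the OSD removal procedure in the Jewel documentation also deletes the OSD's cephx key before removing the id; a sketch of that sequence, using osd.0 as in this report:

ceph osd out 0
systemctl stop ceph-osd@0
ceph osd crush remove osd.0
ceph auth del osd.0      # delete the old cephx key so a fresh one can be registered later
ceph osd rm 0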
Then I cleared the /var/lib/ceph/osd/ceph-0/ folder.
Then I recreated osd.0, passing the blkid UUID to ceph osd create:
ceph osd create uuid
ceph-osd -i 0 --mkfs
chown ceph:ceph -R /var/lib/ceph/osd/ceph-0/*
systemctl start ceph-osd@0
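For reference, the manual (short form) deployment steps in the Jewel documentation also generate and register a new cephx key for the OSD between mkfs and starting the daemon; a sketch under that assumption, with the UUID left as a placeholder:

ceph osd create <uuid>                     # returns the reused id, expected to be 0 here
ceph-osd -i 0 --mkfs --mkkey               # --mkkey also writes a fresh keyring into the OSD data dir
ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-0/keyring
chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
systemctl start ceph-osd@0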
The ceph-osd -i 0 process is running, but the OSD state stays down.
The ceph-osd log shows the following:
2017-01-21 16:42:17.712695 7f3c2e115700 10 osd.0 6954 do_waiters -- start
2017-01-21 16:42:17.712698 7f3c2e115700 10 osd.0 6954 do_waiters -- finish
2017-01-21 16:42:18.712768 7f3c2d914700 10 osd.0 6954 tick_without_osd_lock
2017-01-21 16:42:18.712775 7f3c2e115700 10 osd.0 6954 tick
2017-01-21 16:42:18.712799 7f3c2e115700 10 osd.0 6954 do_waiters -- start
2017-01-21 16:42:18.712802 7f3c2e115700 10 osd.0 6954 do_waiters -- finish
2017-01-21 16:42:19.712890 7f3c2d914700 10 osd.0 6954 tick_without_osd_lock
2017-01-21 16:42:19.712897 7f3c2e115700 10 osd.0 6954 tick
2017-01-21 16:42:19.712912 7f3c2e115700 10 osd.0 6954 do_waiters -- start
2017-01-21 16:42:19.712915 7f3c2e115700 10 osd.0 6954 do_waiters -- finish
2017-01-21 16:42:20.459919 7f3c14f4d700 5 osd.0 6954 heartbeat: osd_stat(20548 MB used, 682 GB avail, 702 GB total, peers []/[] op hist [])
2017-01-21 16:42:20.712990 7f3c2e115700 10 osd.0 6954 tick
2017-01-21 16:42:20.712995 7f3c2d914700 10 osd.0 6954 tick_without_osd_lock
2017-01-21 16:42:20.713014 7f3c2e115700 10 osd.0 6954 do_waiters -- start
2017-01-21 16:42:20.713016 7f3c2e115700 10 osd.0 6954 do_waiters -- finish
2017-01-21 16:42:21.713088 7f3c2e115700 10 osd.0 6954 tick
2017-01-21 16:42:21.713107 7f3c2e115700 10 osd.0 6954 do_waiters -- start
2017-01-21 16:42:21.713110 7f3c2e115700 10 osd.0 6954 do_waiters -- finish
2017-01-21 16:42:21.713119 7f3c2d914700 10 osd.0 6954 tick_without_osd_lock
2017-01-21 16:42:22.713201 7f3c2e115700 10 osd.0 6954 tick
2017-01-21 16:42:22.713223 7f3c2e115700 10 osd.0 6954 do_waiters -- start
2017-01-21 16:42:22.713226 7f3c2e115700 10 osd.0 6954 do_waiters -- finish
2017-01-21 16:42:22.713220 7f3c2d914700 10 osd.0 6954 tick_without_osd_lock
2017-01-21 16:42:23.713249 7f3c2e115700 10 osd.0 6954 tick
2017-01-21 16:42:23.713263 7f3c2e115700 10 osd.0 6954 do_waiters -- start
2017-01-21 16:42:23.713265 7f3c2e115700 10 osd.0 6954 do_waiters -- finish
2017-01-21 16:42:23.713291 7f3c2d914700 10 osd.0 6954 tick_without_osd_lock
2017-01-21 16:42:24.713333 7f3c2e115700 10 osd.0 6954 tick
2017-01-21 16:42:24.713348 7f3c2e115700 10 osd.0 6954 do_waiters -- start
2017-01-21 16:42:24.713350 7f3c2e115700 10 osd.0 6954 do_waiters -- finish
2017-01-21 16:42:24.713394 7f3c2d914700 10 osd.0 6954 tick_without_osd_lock
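The heartbeat line above shows empty peer sets (peers []/[]), i.e. the daemon keeps ticking without ever booting into the cluster. One way to check whether the daemon itself thinks it has booted is the admin socket; a minimal example, assuming the default admin socket path for osd.0:

ceph daemon osd.0 status      # the "state" field should move from "booting" to "active" once the monitors mark the OSD up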
Ceph osd tree
[root@AIBJ-ONLINE-CEPH-001 ~]# ceph osd tree
ID  WEIGHT    TYPE NAME                              UP/DOWN REWEIGHT PRIMARY-AFFINITY
-28         0 root default
-26 347.95190 root sata
-15  57.99194     rack SATA-K2
-14  57.99194         host SATA-AIBJ-ONLINE-CEPH-001
  4  14.49799             osd.4                           up  1.00000          1.00000
  2  14.49799             osd.2                           up  1.00000          1.00000
  3  14.49799             osd.3                           up  1.00000          1.00000
  1  14.49799             osd.1                           up  1.00000          1.00000
-17  57.99199     rack SATA-K3
-16  57.99199         host SATA-AIBJ-ONLINE-CEPH-002
  6  14.49799             osd.6                           up  1.00000          1.00000
  7  14.49799             osd.7                           up  1.00000          1.00000
  8  14.49799             osd.8                           up  1.00000          1.00000
  9  14.49799             osd.9                           up  1.00000          1.00000
-19  57.99199     rack SATA-K4
-18  57.99199         host SATA-AIBJ-ONLINE-CEPH-003
 11  14.49799             osd.11                          up  1.00000          1.00000
 12  14.49799             osd.12                          up  1.00000          1.00000
 13  14.49799             osd.13                          up  1.00000          1.00000
 14  14.49799             osd.14                          up  1.00000          1.00000
-21  57.99199     rack SATA-K5
-20  57.99199         host SATA-AIBJ-ONLINE-CEPH-004
 16  14.49799             osd.16                          up  1.00000          1.00000
 17  14.49799             osd.17                          up  1.00000          1.00000
 18  14.49799             osd.18                          up  1.00000          1.00000
 19  14.49799             osd.19                          up  1.00000          1.00000
-23  57.99199     rack SATA-K6
-22  57.99199         host SATA-AIBJ-ONLINE-CEPH-005
 21  14.49799             osd.21                          up  1.00000          1.00000
 22  14.49799             osd.22                          up  1.00000          1.00000
 23  14.49799             osd.23                          up  1.00000          1.00000
 24  14.49799             osd.24                          up  1.00000          1.00000
-25  57.99199     rack SATA-K7
-24  57.99199         host SATA-AIBJ-ONLINE-CEPH-006
 26  14.49799             osd.26                          up  1.00000          1.00000
 27  14.49799             osd.27                          up  1.00000          1.00000
 28  14.49799             osd.28                          up  1.00000          1.00000
 29  14.49799             osd.29                          up  1.00000          1.00000
-13   0.68639 root ssd
 -2   0.68639     rack SSD-K2
 -1   0.68639         host SSD-AIBJ-ONLINE-CEPH-001
  0   0.68639             osd.0                         down  1.00000          1.00000
 -4         0     rack SSD-K3
 -3         0         host SSD-AIBJ-ONLINE-CEPH-002
 -6         0     rack SSD-K4
 -5         0         host SSD-AIBJ-ONLINE-CEPH-003
 -8         0     rack SSD-K5
 -7         0         host SSD-AIBJ-ONLINE-CEPH-004
-10         0     rack SSD-K6
 -9         0         host SSD-AIBJ-ONLINE-CEPH-005
-12         0     rack SSD-K7
-11         0         host SSD-AIBJ-ONLINE-CEPH-006
Ceph osdmap
epoch 6956
fsid a3972be0-382d-4227-a3d0-c6e98c18b917
created 2016-06-02 15:12:17.828409
modified 2017-01-21 17:47:43.429416
flags noout,noscrub,nodeep-scrub,sortbitwise
osd.0 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) :/0 :/0 :/0 :/0 exists,new 3c15fbc7-05a2-433e-a6eb-2ced71893a8b
osd.1 up in weight 1 up_from 2684 up_thru 6944 down_at 2681 last_clean_interval [1892,2682) 10.19.4.37:6802/3975 10.164.0.37:6802/2003975 10.164.0.37:6805/2003975 10.19.4.37:6807/2003975 exists,up f7a45e51-0d36-4569-9999-855fde5e8989
osd.2 up in weight 1 up_from 2594 up_thru 6944 down_at 2590 last_clean_interval [1897,2593) 10.19.4.37:6800/3967 10.164.0.37:6800/3003967 10.164.0.37:6804/3003967 10.19.4.37:6805/3003967 exists,up 4c89923b-e41e-4857-bed3-3ea6eba7f780
osd.3 up in weight 1 up_from 6847 up_thru 6944 down_at 6845 last_clean_interval [1897,6845) 10.19.4.37:6808/7171 10.164.0.37:6803/7171 10.164.0.37:6808/7171 10.19.4.37:6810/7171 exists,up 304b3e89-176d-4311-a5cd-77f066608603
osd.4 up in weight 1 up_from 3123 up_thru 6944 down_at 3120 last_clean_interval [1892,3122) 10.19.4.37:6804/3973 10.164.0.37:6807/3003973 10.164.0.37:6809/3003973 10.19.4.37:6809/3003973 exists,up 41c720c3-eb11-4cb3-ae23-881c83999fd1
osd.5 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) :/0 :/0 :/0 :/0 exists,new ea7a061e-6383-463c-a265-e91470fb68a9
osd.6 up in weight 1 up_from 2257 up_thru 6944 down_at 2255 last_clean_interval [1825,2255) 10.19.4.38:6802/5118 10.164.0.38:6802/2005118 10.164.0.38:6803/2005118 10.19.4.38:6803/2005118 exists,up 2d722150-bb7a-46dd-b648-c156c04f1d15
osd.7 up in weight 1 up_from 3222 up_thru 6944 down_at 3216 last_clean_interval [1833,3220) 10.19.4.38:6804/5497 10.164.0.38:6810/3005497 10.164.0.38:6811/3005497 10.19.4.38:6810/3005497 exists,up 737a36eb-caa4-4502-8fe7-c31ae1d895d8
osd.8 up in weight 1 up_from 6248 up_thru 6944 down_at 6246 last_clean_interval [1835,6247) 10.19.4.38:6806/5772 10.164.0.38:6800/6005772 10.164.0.38:6806/6005772 10.19.4.38:6807/6005772 exists,up 57273aec-3553-47f4-a54c-a0edbd28c565
osd.9 up in weight 1 up_from 2126 up_thru 6944 down_at 2123 last_clean_interval [1840,2125) 10.19.4.38:6808/6077 10.164.0.38:6808/2006077 10.164.0.38:6809/2006077 10.19.4.38:6809/2006077 exists,up d0a7766d-e804-4530-9176-06fa0e42cb02
osd.11 up in weight 1 up_from 6870 up_thru 6944 down_at 6854 last_clean_interval [1638,6853) 10.19.4.39:6802/4059 10.164.0.39:6801/4059 10.164.0.39:6805/4059 10.19.4.39:6805/4059 exists,up a4f626ff-fd3f-4f28-8c79-64e63a3e542e
osd.12 up in weight 1 up_from 6871 up_thru 6944 down_at 6854 last_clean_interval [1638,6853) 10.19.4.39:6800/4044 10.164.0.39:6800/4044 10.164.0.39:6804/4044 10.19.4.39:6804/4044 exists,up 84453d07-02fb-4caa-91d1-5a38c5194a40
osd.13 up in weight 1 up_from 6870 up_thru 6946 down_at 6856 last_clean_interval [1637,6853) 10.19.4.39:6803/4049 10.164.0.39:6803/4049 10.164.0.39:6807/4049 10.19.4.39:6807/4049 exists,up a782e2c5-fe2d-4aa0-9769-21ad0998ce39
osd.14 up in weight 1 up_from 6874 up_thru 6944 down_at 6854 last_clean_interval [1638,6853) 10.19.4.39:6801/4058 10.164.0.39:6802/4058 10.164.0.39:6806/4058 10.19.4.39:6806/4058 exists,up dbd29c97-ae97-4499-972d-530269bf7386
osd.16 up in weight 1 up_from 4608 up_thru 6944 down_at 4605 last_clean_interval [1769,4606) 10.19.4.40:6807/3950 10.164.0.40:6802/6003950 10.164.0.40:6803/6003950 10.19.4.40:6804/6003950 exists,up 7560be94-81f9-4a50-963a-d1fc0f8f672d
osd.17 up in weight 1 up_from 4479 up_thru 6944 down_at 4476 last_clean_interval [1768,4477) 10.19.4.40:6803/3942 10.164.0.40:6804/5003942 10.164.0.40:6809/5003942 10.19.4.40:6809/5003942 exists,up 2970493c-f435-49bc-87dc-e235a2eec1bb
osd.18 up in weight 1 up_from 2282 up_thru 6944 down_at 2270 last_clean_interval [1771,2271) 10.19.4.40:6806/3949 10.164.0.40:6800/2003949 10.164.0.40:6801/2003949 10.19.4.40:6801/2003949 exists,up 155363cd-0f0b-447e-a956-c9ff9e46accf
osd.19 up in weight 1 up_from 4582 up_thru 6944 down_at 4580 last_clean_interval [1769,4581) 10.19.4.40:6800/3948 10.164.0.40:6805/6003948 10.164.0.40:6808/6003948 10.19.4.40:6805/6003948 exists,up 04a851b1-51c8-4b03-a55f-f8056cfbb300
osd.21 up in weight 1 up_from 6619 up_thru 6944 down_at 6617 last_clean_interval [1790,6617) 10.19.4.41:6800/3937 10.164.0.41:6800/9003937 10.164.0.41:6801/9003937 10.19.4.41:6804/9003937 exists,up e4e72f5c-4723-4afa-9081-6fa26e22be3b
osd.22 up in weight 1 up_from 2904 up_thru 6944 down_at 2901 last_clean_interval [1790,2903) 10.19.4.41:6801/3943 10.164.0.41:6802/2003943 10.164.0.41:6807/2003943 10.19.4.41:6807/2003943 exists,up 0b6fe0b7-dd50-4748-aab2-8d65fff151d9
osd.23 up in weight 1 up_from 2759 up_thru 6944 down_at 2756 last_clean_interval [1790,2757) 10.19.4.41:6802/3951 10.164.0.41:6805/2003951 10.164.0.41:6806/2003951 10.19.4.41:6805/2003951 exists,up baecf789-0443-4855-8e92-170b658737b8
osd.24 up in weight 1 up_from 3635 up_thru 6944 down_at 3632 last_clean_interval [1791,3634) 10.19.4.41:6803/3932 10.164.0.41:6804/2003932 10.164.0.41:6809/2003932 10.19.4.41:6808/2003932 exists,up a02207c4-6889-466b-92b5-4d42d0cb4e88
osd.26 up in weight 1 up_from 6866 up_thru 6944 down_at 6864 last_clean_interval [5826,6865) 10.19.4.42:6800/4030 10.164.0.42:6812/1004030 10.164.0.42:6813/1004030 10.19.4.42:6811/1004030 exists,up bea179e7-53d2-4516-a682-bdb29561c287
osd.27 up in weight 1 up_from 6230 up_thru 6944 down_at 6228 last_clean_interval [5815,6228) 10.19.4.42:6803/4034 10.164.0.42:6810/3004034 10.164.0.42:6811/3004034 10.19.4.42:6810/3004034 exists,up 8f05aa2d-5257-451e-a92e-f0f991a40af2
osd.28 up in weight 1 up_from 5831 up_thru 6944 down_at 5791 last_clean_interval [1850,5790) 10.19.4.42:6802/4029 10.164.0.42:6803/4029 10.164.0.42:6807/4029 10.19.4.42:6808/4029 exists,up 7f169dae-db12-489f-be91-c55c40bedd58
osd.29 up in weight 1 up_from 6861 up_thru 6944 down_at 6859 last_clean_interval [5817,6860) 10.19.4.42:6804/4024 10.164.0.42:6802/3004024 10.164.0.42:6806/3004024 10.19.4.42:6807/3004024 exists,up 11ec930e-6e9a-4bbc-8797-705263e54c02
ceph status
[root@AIBJ-ONLINE-CEPH-001 ~]# ceph -s
    cluster a3972be0-382d-4227-a3d0-c6e98c18b917
     health HEALTH_WARN
            too many PGs per OSD (1920 > max 300)
            pool cn_bj_dev_cinder has many more objects per pg than average (too few pgs?)
            2/26 in osds are down
            noout,noscrub,nodeep-scrub,sortbitwise flag(s) set
     monmap e11: 5 mons at {0=10.19.4.37:6789/0,1=10.19.4.38:6789/0,2=10.19.4.39:6789/0,3=10.19.4.40:6789/0,4=10.19.4.41:6789/0}
            election epoch 486, quorum 0,1,2,3,4 0,1,2,3,4
     osdmap e6956: 26 osds: 24 up, 26 in
            flags noout,noscrub,nodeep-scrub,sortbitwise
      pgmap v17797174: 16640 pgs, 28 pools, 73890 GB data, 9898 kobjects
            216 TB used, 131 TB / 347 TB avail
                 16640 active+clean
ceph-osd -i 0 process network stat
[root@AIBJ-ONLINE-CEPH-001 ~]# netstat -napt | grep 28643
tcp        0      0 10.164.0.37:6801        0.0.0.0:*               LISTEN      28643/ceph-osd
tcp        0      0 10.19.4.37:6801         0.0.0.0:*               LISTEN      28643/ceph-osd
tcp        0      0 10.19.4.37:6803         0.0.0.0:*               LISTEN      28643/ceph-osd
tcp        0      0 10.164.0.37:6806        0.0.0.0:*               LISTEN      28643/ceph-osd
tcp        0      0 10.19.4.37:51177        10.19.4.37:6789         ESTABLISHED 28643/ceph-osd
ceph-mon log
2017-01-21 16:26:41.643298 7ffe762bc700 0 log_channel(audit) log [INF] : from='client.? 10.19.4.37:0/2377338628' entity='osd.0' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=AIBJ-ONLINE-CEPH-001", "root=default"], "id": 0, "weight": 0.6864}]: dispatch
2017-01-21 16:26:41.643420 7ffe762bc700 0 mon.0@0(leader).osd e6952 create-or-move crush item name 'osd.0' initial_weight 0.6864 at location {host=AIBJ-ONLINE-CEPH-001,root=default}
2017-01-21 16:26:42.538790 7ffe794b2700 1 mon.0@0(leader).osd e6953 e6953: 26 osds: 24 up, 26 in
2017-01-21 16:26:42.539270 7ffe794b2700 0 mon.0@0(leader).osd e6953 crush map has features 288514051259236352, adjusting msgr requires
2017-01-21 16:26:42.539280 7ffe794b2700 0 mon.0@0(leader).osd e6953 crush map has features 288514051259236352, adjusting msgr requires
2017-01-21 16:26:42.539286 7ffe794b2700 0 mon.0@0(leader).osd e6953 crush map has features 288514051259236352, adjusting msgr requires
2017-01-21 16:26:42.539291 7ffe794b2700 0 mon.0@0(leader).osd e6953 crush map has features 288514051259236352, adjusting msgr requires
2017-01-21 16:26:42.539492 7ffe794b2700 10 remove_redundant_temporaries
2017-01-21 16:26:42.539495 7ffe794b2700 10 remove_down_pg_temp
2017-01-21 16:26:42.539620 7ffe794b2700 0 log_channel(audit) log [INF] : from='client.? 10.19.4.37:0/2377338628' entity='osd.0' cmd='[{"prefix": "osd crush create-or-move", "args": ["host=AIBJ-ONLINE-CEPH-001", "root=default"], "id": 0, "weight": 0.6864}]': finished