Bug #16336

closed

After restarting Ceph, an OSD is stuck in the booting state (noup not set)

Added by luree liu almost 8 years ago. Updated almost 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After restarting Ceph, osd.4 was stuck in the booting state.
The cluster was stuck in peering because osd.4 did not reply to the other OSDs' peering requests.
After about 30 minutes, the monitor detected that osd.4 had not reported PG stats for 900 seconds and marked osd.4 down. After another 900 seconds osd.4 was marked out, although the osd.4 process was still running.
Ceph then started peering and recovery, and after some time everything was OK.

After restarting osd.4, everything was OK.


Files

B]ZR34]HI[AVB9PU`BJB}TB.jpg (86.8 KB), luree liu, 06/16/2016 02:58 AM
ceph-osd.4.log (273 KB), luree liu, 06/16/2016 02:58 AM
osd.4.ops (116 KB), luree liu, 06/16/2016 02:59 AM
pg.1.162 (5.49 KB), luree liu, 06/16/2016 02:59 AM
ceph.conf (4.87 KB), luree liu, 06/16/2016 03:12 AM
Actions #1

Updated by luree liu almost 8 years ago

osdmap before the Ceph restart:

epoch 8136
fsid 674513c8-27bd-11e6-bf9b-00505687b8a5
created 2016-06-01 13:55:20.767146
modified 2016-06-15 09:57:46.195717
flags

pool 1 'pool' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 600 pgp_num 600 last_change 40 flags hashpspool stripe_width 0
removed_snaps [1~3]

max_osd 12
osd.0 up in weight 1 up_from 8116 up_thru 8135 down_at 8098 last_clean_interval [8092,8095) 37.0.2.101:6800/3280 37.0.1.101:6800/3280 37.0.1.101:6801/3280 37.0.2.101:6801/3280 exists,up 80d20dd9-c741-45c9-8d64-fdf611ea59af
osd.1 up in weight 1 up_from 8115 up_thru 8135 down_at 8098 last_clean_interval [8090,8095) 37.0.2.101:6802/3606 37.0.1.101:6802/3606 37.0.1.101:6803/3606 37.0.2.101:6803/3606 exists,up e39a9792-0497-4888-bdbc-2ad866a98b5a
osd.2 up in weight 1 up_from 8117 up_thru 8135 down_at 8098 last_clean_interval [8092,8095) 37.0.2.101:6804/3933 37.0.1.101:6804/3933 37.0.1.101:6805/3933 37.0.2.101:6805/3933 exists,up a6781ac2-0e38-447b-9010-e8d0bd1b777a
osd.3 up in weight 1 up_from 8120 up_thru 8135 down_at 8096 last_clean_interval [8094,8095) 37.0.2.101:6806/4287 37.0.1.101:6806/4287 37.0.1.101:6807/4287 37.0.2.101:6807/4287 exists,up 2d93a1b9-cf35-4381-b3d0-16243b6cd171
osd.4 up in weight 1 up_from 8131 up_thru 8131 down_at 8125 last_clean_interval [8068,8124) 37.0.2.100:6800/3390 37.0.1.100:6800/3390 37.0.1.100:6801/3390 37.0.2.100:6801/3390 exists,up 1c7948f1-38ac-4d6f-bd98-a236e630e3aa
osd.5 up in weight 1 up_from 8132 up_thru 8132 down_at 8127 last_clean_interval [8070,8124) 37.0.2.100:6802/3725 37.0.1.100:6802/3725 37.0.1.100:6803/3725 37.0.2.100:6803/3725 exists,up bef69cca-c4a7-4376-91db-50150043ea82
osd.6 up in weight 1 up_from 8133 up_thru 8133 down_at 8125 last_clean_interval [8070,8124) 37.0.2.100:6804/4006 37.0.1.100:6804/4006 37.0.1.100:6805/4006 37.0.2.100:6805/4006 exists,up 00146ef1-d71f-4c3c-b049-8dca33984df2
osd.7 up in weight 1 up_from 8135 up_thru 8135 down_at 8125 last_clean_interval [8069,8124) 37.0.2.100:6806/4346 37.0.1.100:6806/4346 37.0.1.100:6807/4346 37.0.2.100:6807/4346 exists,up 7b6708b0-5bdc-43af-85d1-7b06b0f55dea
osd.8 up in weight 1 up_from 8064 up_thru 8127 down_at 8063 last_clean_interval [6437,8062) 37.0.2.102:6804/3957 37.0.1.102:6804/3957 37.0.1.102:6805/3957 37.0.2.102:6805/3957 exists,up a7714146-f29c-4d69-8ef5-84938cc9bd4b
osd.9 up in weight 1 up_from 8070 up_thru 8135 down_at 8069 last_clean_interval [6444,8062) 37.0.2.102:6806/4326 37.0.1.102:6806/4326 37.0.1.102:6807/4326 37.0.2.102:6807/4326 exists,up 388ff0af-8f1c-4af5-945b-62413d0c8f3f
osd.10 up in weight 1 up_from 8067 up_thru 8135 down_at 8066 last_clean_interval [6440,8062) 37.0.2.102:6800/3271 37.0.1.102:6800/3271 37.0.1.102:6801/3271 37.0.2.102:6801/3271 exists,up f2a1cfb4-08a2-45ab-9633-65f44ccb18a0
osd.11 up in weight 1 up_from 8068 up_thru 8135 down_at 8067 last_clean_interval [6440,8062) 37.0.2.102:6802/3593 37.0.1.102:6802/3593 37.0.1.102:6803/3593 37.0.2.102:6803/3593 exists,up b60fa2da-3077-4ea8-a63d-8b02780ad717

Actions #2

Updated by luree liu almost 8 years ago

epoch 8137
fsid 674513c8-27bd-11e6-bf9b-00505687b8a5
created 2016-06-01 13:55:20.767146
modified 2016-06-15 10:45:34.455058
flags

pool 1 'pool' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 600 pgp_num 600 last_change 40 flags hashpspool stripe_width 0
removed_snaps [1~3]

max_osd 12
osd.0 up in weight 1 up_from 8116 up_thru 8135 down_at 8098 last_clean_interval [8092,8095) 37.0.2.101:6800/3280 37.0.1.101:6800/3280 37.0.1.101:6801/3280 37.0.2.101:6801/3280 exists,up 80d20dd9-c741-45c9-8d64-fdf611ea59af
osd.1 up in weight 1 up_from 8115 up_thru 8135 down_at 8098 last_clean_interval [8090,8095) 37.0.2.101:6802/3606 37.0.1.101:6802/3606 37.0.1.101:6803/3606 37.0.2.101:6803/3606 exists,up e39a9792-0497-4888-bdbc-2ad866a98b5a
osd.2 up in weight 1 up_from 8117 up_thru 8135 down_at 8098 last_clean_interval [8092,8095) 37.0.2.101:6804/3933 37.0.1.101:6804/3933 37.0.1.101:6805/3933 37.0.2.101:6805/3933 exists,up a6781ac2-0e38-447b-9010-e8d0bd1b777a
osd.3 up in weight 1 up_from 8120 up_thru 8135 down_at 8096 last_clean_interval [8094,8095) 37.0.2.101:6806/4287 37.0.1.101:6806/4287 37.0.1.101:6807/4287 37.0.2.101:6807/4287 exists,up 2d93a1b9-cf35-4381-b3d0-16243b6cd171
osd.4 up in weight 1 up_from 8131 up_thru 8131 down_at 8125 last_clean_interval [8068,8124) 37.0.2.100:6800/3390 37.0.1.100:6800/3390 37.0.1.100:6801/3390 37.0.2.100:6801/3390 exists,up 1c7948f1-38ac-4d6f-bd98-a236e630e3aa
osd.5 up in weight 1 up_from 8132 up_thru 8132 down_at 8127 last_clean_interval [8070,8124) 37.0.2.100:6802/3725 37.0.1.100:6802/3725 37.0.1.100:6803/3725 37.0.2.100:6803/3725 exists,up bef69cca-c4a7-4376-91db-50150043ea82
osd.6 up in weight 1 up_from 8133 up_thru 8133 down_at 8125 last_clean_interval [8070,8124) 37.0.2.100:6804/4006 37.0.1.100:6804/4006 37.0.1.100:6805/4006 37.0.2.100:6805/4006 exists,up 00146ef1-d71f-4c3c-b049-8dca33984df2
osd.7 up in weight 1 up_from 8135 up_thru 8135 down_at 8125 last_clean_interval [8069,8124) 37.0.2.100:6806/4346 37.0.1.100:6806/4346 37.0.1.100:6807/4346 37.0.2.100:6807/4346 exists,up 7b6708b0-5bdc-43af-85d1-7b06b0f55dea
osd.8 down in weight 1 up_from 8064 up_thru 8127 down_at 8137 last_clean_interval [6437,8062) 37.0.2.102:6804/3957 37.0.1.102:6804/3957 37.0.1.102:6805/3957 37.0.2.102:6805/3957 exists a7714146-f29c-4d69-8ef5-84938cc9bd4b
osd.9 up in weight 1 up_from 8070 up_thru 8135 down_at 8069 last_clean_interval [6444,8062) 37.0.2.102:6806/4326 37.0.1.102:6806/4326 37.0.1.102:6807/4326 37.0.2.102:6807/4326 exists,up 388ff0af-8f1c-4af5-945b-62413d0c8f3f
osd.10 up in weight 1 up_from 8067 up_thru 8135 down_at 8066 last_clean_interval [6440,8062) 37.0.2.102:6800/3271 37.0.1.102:6800/3271 37.0.1.102:6801/3271 37.0.2.102:6801/3271 exists,up f2a1cfb4-08a2-45ab-9633-65f44ccb18a0
osd.11 up in weight 1 up_from 8068 up_thru 8135 down_at 8067 last_clean_interval [6440,8062) 37.0.2.102:6802/3593 37.0.1.102:6802/3593 37.0.1.102:6803/3593 37.0.2.102:6803/3593 exists,up b60fa2da-3077-4ea8-a63d-8b02780ad717

Actions #3

Updated by luree liu almost 8 years ago

osd.4 has the latest osdmap:

epoch 8150
fsid 674513c8-27bd-11e6-bf9b-00505687b8a5
created 2016-06-01 13:55:20.767146
modified 2016-06-15 10:45:48.852040
flags

pool 1 'pool' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 600 pgp_num 600 last_change 40 flags hashpspool stripe_width 0
removed_snaps [1~3]

max_osd 12
osd.0 up in weight 1 up_from 8146 up_thru 8146 down_at 8145 last_clean_interval [8116,8136) 37.0.2.101:6800/3262 37.0.1.101:6800/3262 37.0.1.101:6801/3262 37.0.2.101:6801/3262 exists,up 80d20dd9-c741-45c9-8d64-fdf611ea59af
osd.1 up in weight 1 up_from 8143 up_thru 8145 down_at 8142 last_clean_interval [8115,8136) 37.0.2.101:6802/3585 37.0.1.101:6802/3585 37.0.1.101:6803/3585 37.0.2.101:6803/3585 exists,up e39a9792-0497-4888-bdbc-2ad866a98b5a
osd.2 up in weight 1 up_from 8146 up_thru 8146 down_at 8145 last_clean_interval [8117,8136) 37.0.2.101:6804/3911 37.0.1.101:6804/3911 37.0.1.101:6805/3911 37.0.2.101:6805/3911 exists,up a6781ac2-0e38-447b-9010-e8d0bd1b777a
osd.3 up in weight 1 up_from 8149 up_thru 8149 down_at 8148 last_clean_interval [8120,8136) 37.0.2.101:6806/4233 37.0.1.101:6806/4233 37.0.1.101:6807/4233 37.0.2.101:6807/4233 exists,up 2d93a1b9-cf35-4381-b3d0-16243b6cd171
osd.4 up in weight 1 up_from 8131 up_thru 8149 down_at 8125 last_clean_interval [8068,8124) 37.0.2.100:6800/3390 37.0.1.100:6800/3390 37.0.1.100:6801/3390 37.0.2.100:6801/3390 exists,up 1c7948f1-38ac-4d6f-bd98-a236e630e3aa
osd.5 up in weight 1 up_from 8139 up_thru 8149 down_at 8138 last_clean_interval [8132,8136) 37.0.2.100:6802/3716 37.0.1.100:6802/3716 37.0.1.100:6803/3716 37.0.2.100:6803/3716 exists,up bef69cca-c4a7-4376-91db-50150043ea82
osd.6 up in weight 1 up_from 8139 up_thru 8149 down_at 8138 last_clean_interval [8133,8136) 37.0.2.100:6804/4005 37.0.1.100:6804/4005 37.0.1.100:6805/4005 37.0.2.100:6805/4005 exists,up 00146ef1-d71f-4c3c-b049-8dca33984df2
osd.7 up in weight 1 up_from 8140 up_thru 8149 down_at 8139 last_clean_interval [8135,8136) 37.0.2.100:6806/4347 37.0.1.100:6806/4347 37.0.1.100:6807/4347 37.0.2.100:6807/4347 exists,up 7b6708b0-5bdc-43af-85d1-7b06b0f55dea
osd.8 up in weight 1 up_from 8138 up_thru 8145 down_at 8137 last_clean_interval [8064,8136) 37.0.2.102:6804/3995 37.0.1.102:6804/3995 37.0.1.102:6805/3995 37.0.2.102:6805/3995 exists,up a7714146-f29c-4d69-8ef5-84938cc9bd4b
osd.9 up in weight 1 up_from 8145 up_thru 8149 down_at 8144 last_clean_interval [8070,8136) 37.0.2.102:6806/4398 37.0.1.102:6806/4398 37.0.1.102:6807/4398 37.0.2.102:6807/4398 exists,up 388ff0af-8f1c-4af5-945b-62413d0c8f3f
osd.10 up in weight 1 up_from 8143 up_thru 8149 down_at 8142 last_clean_interval [8067,8136) 37.0.2.102:6800/3304 37.0.1.102:6800/3304 37.0.1.102:6801/3304 37.0.2.102:6801/3304 exists,up f2a1cfb4-08a2-45ab-9633-65f44ccb18a0
osd.11 up in weight 1 up_from 8143 up_thru 8149 down_at 8142 last_clean_interval [8068,8136) 37.0.2.102:6802/3545 37.0.1.102:6802/3545 37.0.1.102:6803/3545 37.0.2.102:6803/3545 exists,up b60fa2da-3077-4ea8-a63d-8b02780ad717
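
One way to make the difference between the epoch 8136 and epoch 8150 dumps easier to see is to compare each OSD's up_from value across the two maps: an OSD whose up_from did not change was never recorded as booting again. The following is only a rough sketch. The function names `parse_osdmap` and `not_rebooted` are hypothetical helpers, not part of Ceph, and the sample lines are copied from the dumps in this report with the address and UUID fields trimmed.

```python
import re

# Leading fields of an "osd.N ..." line as they appear in the osdmap dumps above.
OSD_LINE = re.compile(
    r"^osd\.(?P<id>\d+)\s+(?P<state>up|down)\s+(?P<inout>in|out)\s+.*?"
    r"up_from\s+(?P<up_from>\d+)\s+up_thru\s+(?P<up_thru>\d+)\s+"
    r"down_at\s+(?P<down_at>\d+)"
)


def parse_osdmap(text):
    """Return {osd_id: fields} for every osd.N line found in a dump."""
    osds = {}
    for line in text.splitlines():
        m = OSD_LINE.match(line.strip())
        if m:
            osds[int(m["id"])] = {
                "state": m["state"],
                "up_from": int(m["up_from"]),
                "up_thru": int(m["up_thru"]),
                "down_at": int(m["down_at"]),
            }
    return osds


def not_rebooted(before, after):
    """OSD ids whose up_from epoch is the same in both maps."""
    return sorted(
        osd for osd in before
        if osd in after and before[osd]["up_from"] == after[osd]["up_from"]
    )


# Trimmed sample lines from epoch 8136 (before) and epoch 8150 (after).
BEFORE = """
osd.3 up in weight 1 up_from 8120 up_thru 8135 down_at 8096 last_clean_interval [8094,8095)
osd.4 up in weight 1 up_from 8131 up_thru 8131 down_at 8125 last_clean_interval [8068,8124)
osd.5 up in weight 1 up_from 8132 up_thru 8132 down_at 8127 last_clean_interval [8070,8124)
"""
AFTER = """
osd.3 up in weight 1 up_from 8149 up_thru 8149 down_at 8148 last_clean_interval [8120,8136)
osd.4 up in weight 1 up_from 8131 up_thru 8149 down_at 8125 last_clean_interval [8068,8124)
osd.5 up in weight 1 up_from 8139 up_thru 8149 down_at 8138 last_clean_interval [8132,8136)
"""

if __name__ == "__main__":
    print(not_rebooted(parse_osdmap(BEFORE), parse_osdmap(AFTER)))
```

On these three sample lines only osd.4 keeps the same up_from (8131) in both maps, which matches the observation that the map still shows it as up from its earlier boot.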

Actions #4

Updated by luree liu almost 8 years ago

At 2016-06-15 10:45, the monitor did not log a boot message for osd.4:

[root@cvm-40-43 meta]# cat /var/log/ceph/tmp/ceph-mon.1.log-20160616 | grep boot
2016-06-15 09:57:40.784419 7fad4bd8d700 0 log_channel(cluster) log [INF] : osd.4 37.0.2.100:6800/3390 boot
2016-06-15 09:57:41.768579 7fad4bd8d700 0 log_channel(cluster) log [INF] : osd.5 37.0.2.100:6802/3725 boot
2016-06-15 09:57:42.786249 7fad4bd8d700 0 log_channel(cluster) log [INF] : osd.6 37.0.2.100:6804/4006 boot
2016-06-15 09:57:45.121083 7fad4bd8d700 0 log_channel(cluster) log [INF] : osd.7 37.0.2.100:6806/4346 boot
2016-06-15 10:45:35.541561 7f620984e700 0 log_channel(cluster) log [INF] : osd.8 37.0.2.102:6804/3995 boot
2016-06-15 10:45:36.561945 7f620984e700 0 log_channel(cluster) log [INF] : osd.5 37.0.2.100:6802/3716 boot
2016-06-15 10:45:36.562066 7f620984e700 0 log_channel(cluster) log [INF] : osd.6 37.0.2.100:6804/4005 boot
2016-06-15 10:45:37.641431 7f620984e700 0 log_channel(cluster) log [INF] : osd.7 37.0.2.100:6806/4347 boot
2016-06-15 10:45:40.774959 7f620984e700 0 log_channel(cluster) log [INF] : osd.1 37.0.2.101:6802/3585 boot
2016-06-15 10:45:40.775044 7f620984e700 0 log_channel(cluster) log [INF] : osd.11 37.0.2.102:6802/3545 boot
2016-06-15 10:45:40.775213 7f620984e700 0 log_channel(cluster) log [INF] : osd.10 37.0.2.102:6800/3304 boot
2016-06-15 10:45:42.786944 7f620984e700 0 log_channel(cluster) log [INF] : osd.9 37.0.2.102:6806/4398 boot
2016-06-15 10:45:43.793249 7f620984e700 0 log_channel(cluster) log [INF] : osd.2 37.0.2.101:6804/3911 boot
2016-06-15 10:45:43.793320 7f620984e700 0 log_channel(cluster) log [INF] : osd.0 37.0.2.101:6800/3262 boot
2016-06-15 10:45:47.853283 7f620984e700 0 log_channel(cluster) log [INF] : osd.3 37.0.2.101:6806/4233 boot
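
The same check that the grep output supports by eye can be scripted: collect the OSD ids that logged a boot message at or after a given time, then subtract them from the set of OSDs that were expected to boot. This is a minimal sketch under assumptions. `booted_since` and `missing_boots` are hypothetical helper names, the regular expression only targets the monitor log line format shown in this comment, and the sample contains just three of the lines above.

```python
import re
from datetime import datetime

# Matches cluster-log lines of the form shown above:
# "<timestamp> <thread> 0 log_channel(cluster) log [INF] : osd.N <addr> boot"
BOOT_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"\[INF\] : osd\.(?P<id>\d+) \S+ boot$"
)


def booted_since(log_text, since):
    """Return the set of OSD ids with a boot message at or after `since`."""
    booted = set()
    for line in log_text.splitlines():
        m = BOOT_LINE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        if ts >= since:
            booted.add(int(m["id"]))
    return booted


def missing_boots(log_text, since, expected):
    """OSD ids in `expected` that have no boot message at or after `since`."""
    return sorted(set(expected) - booted_since(log_text, since))


# Three lines copied from the grep output above.
SAMPLE_LOG = """
2016-06-15 09:57:40.784419 7fad4bd8d700 0 log_channel(cluster) log [INF] : osd.4 37.0.2.100:6800/3390 boot
2016-06-15 10:45:35.541561 7f620984e700 0 log_channel(cluster) log [INF] : osd.8 37.0.2.102:6804/3995 boot
2016-06-15 10:45:36.561945 7f620984e700 0 log_channel(cluster) log [INF] : osd.5 37.0.2.100:6802/3716 boot
"""

if __name__ == "__main__":
    restart = datetime(2016, 6, 15, 10, 45)
    print(missing_boots(SAMPLE_LOG, restart, expected=[4, 5, 8]))
```

Run against the full log with `expected=range(12)` (max_osd is 12 in the osdmap dumps), this kind of check would list every OSD that has no boot message after the restart time.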

Actions #5

Updated by Sage Weil almost 7 years ago

  • Status changed from New to Can't reproduce

Please reopen if you see this behavior on Jewel or later!
