Bug #5082

closed

OSD wrongly marked as down

Added by Ivan Kudryavtsev almost 11 years ago. Updated almost 11 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During ceph crush manipulation

ceph osd crush set 17 osd.17 0.8 pool=default host=ceph-osd-2-1

I see messages like this:

2013-05-16 12:13:59.089557 mon.0 [INF] osdmap e26129: 37 osds: 32 up, 36 in
2013-05-16 12:15:30.739882 osd.31 [WRN] map e26100 wrongly marked me down

Actually, I have 36 working OSDs, but 4 of them are marked as down even though they are alive. I'm using replication factor 3, and as you can see, even with that replication factor I can end up in a situation where all OSDs holding a PG are marked offline. Is this correct, and how can it be avoided?

After a few seconds they come back online and I have 36 of 37 up, which is OK.
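When this happens, the down OSDs can be listed by filtering 'ceph osd tree' output; a minimal sketch (the awk field positions assume the tree format shown later in this ticket, and a captured sample is used here so the snippet runs without a cluster):

```shell
# Filter a `ceph osd tree`-style listing for OSDs marked down.
# On a live cluster: ceph osd tree | awk '$4 == "down" { print $3 }'
tree_sample='1	0.8	osd.1	up	1
9	0.8	osd.9	down	1
21	1	osd.21	down	1'
printf '%s\n' "$tree_sample" | awk '$4 == "down" { print $3 }'
# prints osd.9 and osd.21, one per line
```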

Actions #1

Updated by Ivan Kudryavtsev almost 11 years ago

ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)

Actions #2

Updated by Ivan Kudryavtsev almost 11 years ago

Could it be because I'm using the command above instead of
ceph osd crush reweight?
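For reference, the two forms differ: 'crush set' sets the weight and also (re)positions the item at the given CRUSH location, while 'crush reweight' only changes the weight of an existing item in place. A sketch that just assembles the command lines (no cluster needed to run it; the location arguments are the ones from this ticket):

```shell
# Build (but do not execute) the two CRUSH weight-change commands.
osd_id=17
weight=0.8
# `crush set`: sets the weight AND places the item at the given location.
set_cmd="ceph osd crush set $osd_id osd.$osd_id $weight pool=default host=ceph-osd-2-1"
# `crush reweight`: changes only the weight of an existing item.
reweight_cmd="ceph osd crush reweight osd.$osd_id $weight"
printf '%s\n%s\n' "$set_cmd" "$reweight_cmd"
```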

Actions #3

Updated by Sage Weil almost 11 years ago

Can you attach 'ceph osd tree' output before and after the command? It's not clear to me what is going on... you shouldn't see OSDs go down from a reweight, though!

Actions #4

Updated by Ivan Kudryavtsev almost 11 years ago

When it reports them down, they're also down in the tree; after a few seconds they're up again.

ceph osd crush reweight leads to the same effect.

Dmesg fragment from one of the clients (after reweighting osd.1 from 1 to 0.8):

[6873884.384863] libceph: osd1 down
[6873914.882049] libceph: tid 562749284 timed out on osd32, will reset osd
[6873914.882131] libceph: tid 562749289 timed out on osd11, will reset osd
[6873914.882188] libceph: tid 562749291 timed out on osd14, will reset osd
[6873914.882241] libceph: tid 562749306 timed out on osd33, will reset osd
[6873914.882297] libceph: tid 562749422 timed out on osd34, will reset osd
[6873914.882355] libceph: tid 562749423 timed out on osd28, will reset osd
[6873914.882412] libceph: tid 562749450 timed out on osd8, will reset osd
[6873924.878521] libceph: tid 562749530 timed out on osd20, will reset osd
[6873924.878600] libceph: tid 562749535 timed out on osd30, will reset osd
[6873929.876708] libceph: tid 562749547 timed out on osd0, will reset osd
[6873933.185726] libceph: osd1 up
[6873939.873141] libceph: tid 562749576 timed out on osd2, will reset osd
[6873944.871372] libceph: tid 562749637 timed out on osd21, will reset osd
[6874405.538742] libceph: osd1 down
[6874405.538859] libceph: osd1 up

Actions #5

Updated by Ivan Kudryavtsev almost 11 years ago

After reweighting osd.9, both osd.9 and osd.21 went down:

# id    weight    type name    up/down    reweight
-1    28.45    pool default
-11    20.2        datacenter zingdc-1
-10    20.2            room zingdc-1-room-1
-8    9.15                rack zingdc-1-rack-1
-4    9.15                    host ceph-osd-2-1
1    0.8                        osd.1    up    1    
5    1                        osd.5    up    1    
6    0.25                        osd.6    up    1    
9    0.8                        osd.9    down    1    
12    1                        osd.12    up    1    
15    1                        osd.15    up    1    
16    0.25                        osd.16    up    1    
17    0.8                        osd.17    up    1    
18    0.25                        osd.18    up    1    
30    1                        osd.30    up    1    
31    1                        osd.31    up    1    
32    1                        osd.32    up    1    
-9    11.05                rack zingdc-1-rack-5
-5    11.05                    host ceph-osd-1-1
2    1                        osd.2    up    1    
7    1                        osd.7    up    1    
8    0.25                        osd.8    up    1    
11    1                        osd.11    up    1    
13    1                        osd.13    up    1    
19    1                        osd.19    up    1    
20    0.8                        osd.20    up    1    
21    1                        osd.21    down    1    
22    1                        osd.22    up    1    
27    1                        osd.27    up    1    
28    1                        osd.28    up    1    
29    1                        osd.29    up    1    
-7    8.25        datacenter megafonsib-1
-6    8.25            room megafonsib-1-room-1
-3    8.25                rack megafonsib-1-rack-1
-2    8.25                    host ceph-osd-3-1
0    0.25                        osd.0    up    1    
3    0.25                        osd.3    up    1    
4    0.25                        osd.4    up    1    
10    0.25                        osd.10    up    1    
14    0.25                        osd.14    up    1    
23    1                        osd.23    up    1    
24    0.5                        osd.24    up    1    
25    1                        osd.25    up    1    
26    0.5                        osd.26    up    1    
33    1                        osd.33    up    1    
34    1                        osd.34    up    1    
35    1                        osd.35    up    1    
36    1                        osd.36    down    0    

osd.36 is also down, but that's expected.

After a few seconds:

# id    weight    type name    up/down    reweight
-1    28.45    pool default
-11    20.2        datacenter zingdc-1
-10    20.2            room zingdc-1-room-1
-8    9.15                rack zingdc-1-rack-1
-4    9.15                    host ceph-osd-2-1
1    0.8                        osd.1    up    1    
5    1                        osd.5    up    1    
6    0.25                        osd.6    up    1    
9    0.8                        osd.9    up    1    
12    1                        osd.12    up    1    
15    1                        osd.15    up    1    
16    0.25                        osd.16    up    1    
17    0.8                        osd.17    up    1    
18    0.25                        osd.18    up    1    
30    1                        osd.30    up    1    
31    1                        osd.31    up    1    
32    1                        osd.32    up    1    
-9    11.05                rack zingdc-1-rack-5
-5    11.05                    host ceph-osd-1-1
2    1                        osd.2    up    1    
7    1                        osd.7    up    1    
8    0.25                        osd.8    up    1    
11    1                        osd.11    up    1    
13    1                        osd.13    up    1    
19    1                        osd.19    up    1    
20    0.8                        osd.20    up    1    
21    1                        osd.21    up    1    
22    1                        osd.22    up    1    
27    1                        osd.27    up    1    
28    1                        osd.28    up    1    
29    1                        osd.29    up    1    
-7    8.25        datacenter megafonsib-1
-6    8.25            room megafonsib-1-room-1
-3    8.25                rack megafonsib-1-rack-1
-2    8.25                    host ceph-osd-3-1
0    0.25                        osd.0    up    1    
3    0.25                        osd.3    up    1    
4    0.25                        osd.4    up    1    
10    0.25                        osd.10    up    1    
14    0.25                        osd.14    up    1    
23    1                        osd.23    up    1    
24    0.5                        osd.24    up    1    
25    1                        osd.25    up    1    
26    0.5                        osd.26    up    1    
33    1                        osd.33    up    1    
34    1                        osd.34    up    1    
35    1                        osd.35    up    1    
36    1                        osd.36    down    0    
Actions #6

Updated by Sage Weil almost 11 years ago

  • Status changed from New to Need More Info

Hmm, OK, this is going to need more logs to diagnose. Can you capture with 'debug mon = 10' and 'debug ms = 1' on the mons and reproduce? Then we can see who is marking whom down, and go from there.

thanks!
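The requested settings correspond to a ceph.conf fragment like the following (a sketch; they can also be injected at runtime with 'ceph tell mon.* injectargs', avoiding a restart):

```ini
[mon]
    debug mon = 10
    debug ms = 1
```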

Actions #7

Updated by Ivan Kudryavtsev almost 11 years ago

BTW, I got a lot of

2013-05-17 00:52:47.278462 osd.23 [WRN] slow request 30.313363 seconds old, received at 2013-05-17 00:52:16.965044: osd_op(client.11689.1:568237435 rb.0.2cd1.238e1f29.000000000ca7 [write 1699840~4096] 2.7c2320da RETRY) currently waiting for ondisk
2013-05-17 00:52:47.278466 osd.23 [WRN] slow request 30.313346 seconds old, received at 2013-05-17 00:52:16.965061: osd_op(client.11689.1:568237436 rb.0.2cd1.238e1f29.000000000ca7 [write 1556480~4096] 2.7c2320da RETRY) currently waiting for ondisk
2013-05-17 00:52:47.278470 osd.23 [WRN] slow request 30.313332 seconds old, received at 2013-05-17 00:52:16.965075: osd_op(client.11689.1:568237507 rb.0.2cd1.238e1f29.000000000c8d [write 3395584~4096] 2.f5810eda RETRY) currently waiting for ondisk
2013-05-17 00:52:47.278474 osd.23 [WRN] slow request 30.311470 seconds old, received at 2013-05-17 00:52:16.966937: osd_op(client.11689.1:568237508 rb.0.2cd1.238e1f29.000000000c8d [write 3457024~4096] 2.f5810eda RETRY) currently waiting for ondisk
2013-05-17 00:52:47.278477 osd.23 [WRN] slow request 30.311455 seconds old, received at 2013-05-17 00:52:16.966952: osd_op(client.11689.1:568237665 rb.0.3bfe.2ae8944a.00000009b450 [write 0~32768] 2.81ae81da RETRY) currently waiting for ondisk
2013-05-17 00:52:48.278604 osd.23 [WRN] 14 slow requests, 2 included below; oldest blocked for > 95.203284 secs
2013-05-17 00:52:48.278609 osd.23 [WRN] slow request 31.308406 seconds old, received at 2013-05-17 00:52:16.970157: osd_op(client.11689.1:568237859 rb.0.2c86.238e1f29.00000000111f [write 2965504~4096] 2.7b0e33da RETRY) currently waiting for ondisk
2013-05-17 00:52:48.278615 osd.23 [WRN] slow request 31.308391 seconds old, received at 2013-05-17 00:52:16.970172: osd_op(client.11689.1:568237860 rb.0.2c86.238e1f29.00000000111f [write 2969600~32768] 2.7b0e33da RETRY) currently waiting for ondisk

records, and some VMs hung as a result...

Actions #8

Updated by Ivan Kudryavtsev almost 11 years ago

As I see it, it could be that during the process a lot of IO lands on the backing device, and the OSD just waits in the 'D' state without replying, which causes the slow requests and the wrong down markings. Could that be it?

Actions #9

Updated by Ivan Kudryavtsev almost 11 years ago

I wonder if the following scenario could apply in my case. I have a lot of data on the OSDs, and I change the weight such that an OSD moves a lot of data. I also have a large amount of RAM (currently 16 GB free), and when data migration starts, the data moving out of one OSD creates a lot of pressure on other OSDs, especially those on the same host, so the OSDs under pressure become unresponsive.

Could it be?

Actions #10

Updated by Ivan Kudryavtsev almost 11 years ago

Correction to the previous comment: 'loses a moves' should read 'moves'.

Actions #11

Updated by Ivan Kudryavtsev almost 11 years ago

Since I use XFS on the OSDs, and it (and the Linux VFS in general) does heavy RAM caching, don't you think that could be the reason?

Actions #12

Updated by Ivan Kudryavtsev almost 11 years ago

I added the options

osd recovery max active = 1
osd max backfills = 1

to the config and injected them into the running configuration, but I still get slow requests, even though IO is not high according to iostat.
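For completeness, the runtime injection of these two options looks roughly like the command below (assembled as a string here so the snippet runs without a cluster; note the option is spelled 'osd max backfills' in the config, i.e. '--osd-max-backfills' on the command line):

```shell
# Build (but do not execute) the runtime-injection command for the
# recovery/backfill throttling options mentioned above.
inject_cmd='ceph tell osd.* injectargs "--osd-recovery-max-active 1 --osd-max-backfills 1"'
echo "$inject_cmd"
```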

Actions #13

Updated by Sage Weil almost 11 years ago

Still need a log to track down the OSD-marked-down issue, if you have it.

Actions #14

Updated by Sage Weil almost 11 years ago

  • Status changed from Need More Info to Can't reproduce