Project

General

Profile

Actions

Bug #1192

closed

multi-client stall under heavy write load

Added by Jim Schutt almost 13 years ago. Updated almost 13 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
OSD
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I've been having trouble with Linux client stalls under heavy
write load. I think I've got the signature isolated. The
following logs were collected using the current stable branch
(commit f9e9490d77d), patched with a little extra debugging output.

This was with 64 clients against 48 OSDs, with 4 OSDs/server.

When this type of stall happens, multiple clients each have at least
one request that they retry forever. or until I get tired of waiting.
Each of these requests are directed to the same OSD.

Apparently the stalled requests are waiting in the policy throttler;
here's an example:

# grep -Hn "" osd.32.log | grep "0x983dc80 sd=312" 
osd.32.log:4278322:2011-06-15 11:14:29.325620 7f010b5d5940 -- 172.17.40.30:6800/7317 >> :/0 pipe(0x983dc80 sd=312 pgs=0 cs=0 l=0).accept
osd.32.log:4278323:2011-06-15 11:14:29.325766 7f010b5d5940 -- 172.17.40.30:6800/7317 >> :/0 pipe(0x983dc80 sd=312 pgs=0 cs=0 l=0).accept sd=312
osd.32.log:4278324:2011-06-15 11:14:29.325805 7f010b5d5940 -- 172.17.40.30:6800/7317 >> :/3749931588 pipe(0x983dc80 sd=312 pgs=0 cs=0 l=0).accept peer addr is :/3749931588
osd.32.log:4278325:2011-06-15 11:14:29.325820 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=0 cs=0 l=0).accept peer addr is really 172.17.40.104:0/3749931588 (socket is 172.17.40.104:40535/0)
osd.32.log:4278326:2011-06-15 11:14:29.325855 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=0 cs=0 l=0).accept got peer connect_seq 0 global_seq 374
osd.32.log:4278327:2011-06-15 11:14:29.325868 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=0 cs=0 l=1).accept of host_type 8, policy.lossy=1
osd.32.log:4278328:2011-06-15 11:14:29.325881 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=0 cs=0 l=1).accept my proto 24, their proto 24
osd.32.log:4278331:2011-06-15 11:14:29.325949 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=0 cs=0 l=1).accept existing 0x5f69500.gseq 352 <= 374, looks ok
osd.32.log:4278332:2011-06-15 11:14:29.325959 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=0 cs=0 l=1).accept replacing existing (lossy) channel (new one lossy=1)
osd.32.log:4278337:2011-06-15 11:14:29.326022 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=0 cs=0 l=1).accept replacing 0x5f69500
osd.32.log:4278343:2011-06-15 11:14:29.326106 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).accept success, connect_seq = 1, sending READY
osd.32.log:4278344:2011-06-15 11:14:29.326119 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).accept features 138
osd.32.log:4278345:2011-06-15 11:14:29.326143 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).register_pipe 0x983dc80
osd.32.log:4278348:2011-06-15 11:14:29.326208 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).accept starting writer, state=2
osd.32.log:4278349:2011-06-15 11:14:29.326254 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).accept done
osd.32.log:4278350:2011-06-15 11:14:29.326282 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).reader reading tag...
osd.32.log:4278351:2011-06-15 11:14:29.326295 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).reader got MSG
osd.32.log:4278352:2011-06-15 11:14:29.326359 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4278353:2011-06-15 11:14:29.326371 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).reader wants 2097280 from policy throttler 23068672/23333333
osd.32.log:4278354:2011-06-15 11:14:29.326494 7f010b4d4940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).writer: state = 2 policy.server=1
osd.32.log:4278355:2011-06-15 11:14:29.326521 7f010b4d4940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).writer sleeping
osd.32.log:4408968:2011-06-15 11:15:29.566658 7f0103555940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).was_session_reset
osd.32.log:4408969:2011-06-15 11:15:29.566677 7f0103555940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).discard_queue
osd.32.log:4408970:2011-06-15 11:15:29.566690 7f0103555940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1). dequeued pipe
osd.32.log:4408973:2011-06-15 11:15:29.566744 7f0103555940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=0 l=1).stop
osd.32.log:4408974:2011-06-15 11:15:29.566760 7f0103555940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=0 l=1).SHUT_RDWR on sd=312
osd.32.log:4408977:2011-06-15 11:15:29.566832 7f0103555940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=0 l=1).unregister_pipe 0x983dc80
osd.32.log:4408981:2011-06-15 11:15:29.566911 7f010b4d4940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=0 l=1).writer finishing
osd.32.log:4408982:2011-06-15 11:15:29.566939 7f010b4d4940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=0 l=1).writer done

That particular request was retried 11 times before I gave up:

# grep -Hn "" osd.32.log | grep "reader got envelope type=42 src client4209 tid=3084" 
osd.32.log:3751659:2011-06-15 11:12:28.790521 7f011b6d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x1125000 sd=184 pgs=314 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4117033:2011-06-15 11:13:30.492619 7f0114261940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x5f69500 sd=248 pgs=352 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4278352:2011-06-15 11:14:29.326359 7f010b5d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983dc80 sd=312 pgs=374 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4408989:2011-06-15 11:15:29.567876 7f0103555940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x503d280 sd=376 pgs=398 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4516095:2011-06-15 11:16:29.806672 7f00fb4d5940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x908c780 sd=440 pgs=417 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4606297:2011-06-15 11:17:29.960538 7f00f3657940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x1236000 sd=503 pgs=423 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4678884:2011-06-15 11:18:30.082235 7f00eb5d7940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x983f500 sd=567 pgs=428 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4739569:2011-06-15 11:19:30.195269 7f00e3557940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x973da00 sd=631 pgs=430 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4800037:2011-06-15 11:20:30.306227 7f00db4d7940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x908f000 sd=695 pgs=433 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4864997:2011-06-15 11:21:30.416680 7f00d3457940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x2069000 sd=759 pgs=434 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4924028:2011-06-15 11:22:30.527851 7f00cb3d7940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x5e9f500 sd=823 pgs=437 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:4980694:2011-06-15 11:23:30.639673 7f00c3357940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x5f4ca00 sd=887 pgs=438 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:5038812:2011-06-15 11:24:30.751323 7f00bb2d7940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x97bf000 sd=951 pgs=441 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0
osd.32.log:5098966:2011-06-15 11:25:30.862398 7f00b3257940 -- 172.17.40.30:6800/7317 >> 172.17.40.104:0/3749931588 pipe(0x9815500 sd=1015 pgs=442 cs=1 l=1).reader got envelope type=42 src client4209 tid=3084 front=128 data=2097152 off 0

When I parse that OSD log looking for messages that were allocated but not
freed, I find 11 such:

102062 messaged allocated, 102051 freed, 11 leaked

osd.32.log:2887851:2011-06-15 11:11:22.812374 7f0125d7b940 -- 172.17.40.30:6800/7317 >> 172.17.40.48:0/710998528 pipe(0x205b000 sd=101 pgs=197 cs=1 l=1).reader got message 23 0x5f5f900 osd_op(client4153.1:2857 10000000bbb.00000b17 [write 0~2097152 [1@-1]] 0.c879 RETRY snapc 1=[])
osd.32.log:2888373:2011-06-15 11:11:22.898354 7f0121d3b940 -- 172.17.40.30:6800/7317 >> 172.17.40.102:0/4124033206 pipe(0x209f500 sd=109 pgs=141 cs=1 l=1).reader got message 42 0x2fe96c0 osd_op(client4202.1:2218 1000000f658.00000899 [write 0~2097152 [1@-1]] 0.54a9 RETRY snapc 1=[])
osd.32.log:2888669:2011-06-15 11:11:22.914830 7f0120927940 -- 172.17.40.30:6800/7317 >> 172.17.40.99:0/4088545569 pipe(0xfde000 sd=119 pgs=99 cs=1 l=1).reader got message 29 0x97f8b40 osd_op(client4201.1:2761 1000000f26f.00000ab4 [write 0~2097152 [1@-1]] 0.43c9 RETRY snapc 1=[])
osd.32.log:2888717:2011-06-15 11:11:22.915870 7f011f715940 -- 172.17.40.30:6800/7317 >> 172.17.40.66:0/2121378630 pipe(0x20bfa00 sd=154 pgs=140 cs=1 l=1).reader got message 42 0x96ecb40 osd_op(client4160.1:2285 10000006d7c.000008da [write 0~2097152 [1@-1]] 0.2879 RETRY snapc 1=[])
osd.32.log:2889440:2011-06-15 11:11:22.965759 7f0122b49940 -- 172.17.40.30:6800/7317 >> 172.17.40.87:0/2771882616 pipe(0x2069280 sd=138 pgs=81 cs=1 l=1).reader got message 46 0x11dec480 osd_op(client4190.1:2397 1000000b7c7.0000094a [write 0~2097152 [1@-1]] 0.8822 RETRY snapc 1=[])
osd.32.log:2889643:2011-06-15 11:11:22.971452 7f0124e6c940 -- 172.17.40.30:6800/7317 >> 172.17.40.71:0/598682247 pipe(0x13f0a00 sd=147 pgs=216 cs=1 l=1).reader got message 33 0x13e6240 osd_op(client4178.1:2817 100000059ef.00000aeb [write 0~2097152 [1@-1]] 0.5b4d RETRY snapc 1=[])
osd.32.log:2890879:2011-06-15 11:11:22.997740 7f0125775940 -- 172.17.40.30:6800/7317 >> 172.17.40.46:0/134575799 pipe(0x1432a00 sd=104 pgs=34 cs=1 l=1).reader got message 77 0x5f106c0 osd_op(client4150.1:2650 10000001f48.00000a4a [write 0~2097152 [1@-1]] 0.2822 RETRY snapc 1=[])
osd.32.log:2891083:2011-06-15 11:11:23.021261 7f011f917940 -- 172.17.40.30:6800/7317 >> 172.17.40.69:0/3984853652 pipe(0x13dc000 sd=157 pgs=178 cs=1 l=1).reader got message 33 0x2077b40 osd_op(client4169.1:3001 10000004279.00000ba3 [write 0~2097152 [1@-1]] 0.6e9c RETRY snapc 1=[])
osd.32.log:2891144:2011-06-15 11:11:23.031152 7f012516f940 -- 172.17.40.30:6800/7317 >> 172.17.40.55:0/2649293018 pipe(0x9088c80 sd=134 pgs=212 cs=1 l=1).reader got message 22 0x97d5480 osd_op(client4177.1:2919 100000088db.00000b51 [write 0~2097152 [1@-1]] 0.ae22 RETRY snapc 1=[])
osd.32.log:2891159:2011-06-15 11:11:23.038210 7f0120d2b940 -- 172.17.40.30:6800/7317 >> 172.17.40.81:0/1726187101 pipe(0x2069a00 sd=173 pgs=138 cs=1 l=1).reader got message 51 0x5111900 osd_op(client4180.1:2846 100000090ad.00000b09 [write 0~2097152 [1@-1]] 0.a677 RETRY snapc 1=[])
osd.32.log:2891782:2011-06-15 11:11:23.188190 7f011d0ef940 -- 172.17.40.30:6800/7317 >> 172.17.40.67:0/988437998 pipe(0x2066500 sd=158 pgs=175 cs=1 l=1).reader got message 31 0x900db40 osd_op(client4171.1:2632 10000004a4b.00000a34 [write 0~2097152 [1@-1]] 0.8ca9 RETRY snapc 1=[]) 

If I look at the processing of one of those messages, I see
that it seems to be stuck waiting for require_same_or_newer_map()
to succeed:

osd.32.log:2886557:2011-06-15 11:11:22.606598 7f0125d7b940 -- 172.17.40.30:6800/7317 >> 172.17.40.48:0/710998528 pipe(0x205b000 sd=101 pgs=197 cs=1 l=1).reader got envelope type=42 src client4153 tid=2857 front=128 data=2097152 off 0
osd.32.log:2887851:2011-06-15 11:11:22.812374 7f0125d7b940 -- 172.17.40.30:6800/7317 >> 172.17.40.48:0/710998528 pipe(0x205b000 sd=101 pgs=197 cs=1 l=1).reader got message 23 0x5f5f900 osd_op(client4153.1:2857 10000000bbb.00000b17 [write 0~2097152 [1@-1]] 0.c879 RETRY snapc 1=[])
osd.32.log:2887864:2011-06-15 11:11:22.812773 7f0137b4b940 -- 172.17.40.30:6800/7317 dispatch_entry pipe 0x205b000 dequeued 0x5f5f900
osd.32.log:2887865:2011-06-15 11:11:22.812795 7f0137b4b940 -- 172.17.40.30:6800/7317 <== client4153 172.17.40.48:0/710998528 23 ==== osd_op(client4153.1:2857 10000000bbb.00000b17 [write 0~2097152 [1@-1]] 0.c879 RETRY snapc 1=[]) ==== 128+0+2097152 (1533754582 0 0) 0x5f5f900 con 0x9866500
osd.32.log:2887867:2011-06-15 11:11:22.812828 7f0137b4b940 osd32 46 _dispatch 0x5f5f900 osd_op(client4153.1:2857 10000000bbb.00000b17 [write 0~2097152 [1@-1]] 0.c879 RETRY snapc 1=[])
osd.32.log:2887868:2011-06-15 11:11:22.812840 7f0137b4b940 osd32 46 require_same_or_newer_map 47 (i am 46) 0x5f5f900
osd.32.log:2887869:2011-06-15 11:11:22.812850 7f0137b4b940 osd32 46 waiting for newer map epoch 47 > my 46 with 0x5f5f900
osd.32.log:2887878:2011-06-15 11:11:22.812964 7f0137b4b940 -- 172.17.40.30:6800/7317 done calling dispatch on 0x5f5f900
osd.32.log:2901509:2011-06-15 11:11:24.218758 7f0137b4b940 osd32 47 _dispatch 0x5f5f900 osd_op(client4153.1:2857 10000000bbb.00000b17 [write 0~2097152 [1@-1]] 0.c879 RETRY snapc 1=[])
osd.32.log:2901510:2011-06-15 11:11:24.218766 7f0137b4b940 osd32 47 require_same_or_newer_map 47 (i am 47) 0x5f5f900
osd.32.log:2915628:2011-06-15 11:11:25.498130 7f0137b4b940 osd32 49 _dispatch 0x5f5f900 osd_op(client4153.1:2857 10000000bbb.00000b17 [write 0~2097152 [1@-1]] 0.c879 RETRY snapc 1=[])
osd.32.log:2915629:2011-06-15 11:11:25.498137 7f0137b4b940 osd32 49 require_same_or_newer_map 47 (i am 49) 0x5f5f900
osd.32.log:2930434:2011-06-15 11:11:26.616930 7f013734a940 osd32 50 _dispatch 0x5f5f900 osd_op(client4153.1:2857 10000000bbb.00000b17 [write 0~2097152 [1@-1]] 0.c879 RETRY snapc 1=[])
osd.32.log:2930435:2011-06-15 11:11:26.616938 7f013734a940 osd32 50 require_same_or_newer_map 47 (i am 50) 0x5f5f900
osd.32.log:2934698:2011-06-15 11:11:26.754837 7f013734a940 osd32 50 _dispatch 0x5f5f900 osd_op(client4153.1:2857 10000000bbb.00000b17 [write 0~2097152 [1@-1]] 0.c879 RETRY snapc 1=[])
osd.32.log:2934699:2011-06-15 11:11:26.754845 7f013734a940 osd32 50 require_same_or_newer_map 47 (i am 50) 0x5f5f900
osd.32.log:2970736:2011-06-15 11:11:28.662143 7f0137b4b940 osd32 52 _dispatch 0x5f5f900 osd_op(client4153.1:2857 10000000bbb.00000b17 [write 0~2097152 [1@-1]] 0.c879 RETRY snapc 1=[])
osd.32.log:2970737:2011-06-15 11:11:28.662152 7f0137b4b940 osd32 52 require_same_or_newer_map 47 (i am 52) 0x5f5f900
osd.32.log:3048499:2011-06-15 11:11:34.166768 7f0137b4b940 osd32 58 _dispatch 0x5f5f900 osd_op(client4153.1:2857 10000000bbb.00000b17 [write 0~2097152 [1@-1]] 0.c879 RETRY snapc 1=[])
osd.32.log:3048500:2011-06-15 11:11:34.166779 7f0137b4b940 osd32 58 require_same_or_newer_map 47 (i am 58) 0x5f5f900
osd.32.log:3091922:2011-06-15 11:11:36.250031 7f0137b4b940 osd32 60 _dispatch 0x5f5f900 osd_op(client4153.1:2857 10000000bbb.00000b17 [write 0~2097152 [1@-1]] 0.c879 RETRY snapc 1=[])
osd.32.log:3091923:2011-06-15 11:11:36.250045 7f0137b4b940 osd32 60 require_same_or_newer_map 47 (i am 60) 0x5f5f900

I can reproduce this, so I can try debug patches if needed, but
this issue is hard to trigger, and can take multiple trials, so
turnaround might be slow.

Actions

Also available in: Atom PDF