Bug #4079 (Closed)
osd: journal aio deadlock
Added by Alex Elder about 11 years ago. Updated about 11 years ago.
Description
I don't really understand this yet. I have seen it occurring
with the new request code. I thought there could be a
connection with http://tracker.ceph.com/issues/3936 but maybe
that was just wishful thinking.
Starting last week I found that my normal rbd xfstests run
is hitting a problem around the time it runs test 074 or 075.
When this problem occurs, the test seems to be hung. However,
if you give it enough time you see that some forward progress
is made, it just takes a HUGE amount of time before tests
complete (i.e., hours rather than seconds or minutes).
I also notice that log messages like those shown below
show up (the "Mounting Filesystem" and "Ending clean mount"
messages are normal when running these tests).
This is a fairly serious problem, and it is preventing me
from completing my normal test runs (well, within a reasonable
amount of time).
[ 4761.797182] XFS (rbd2): Mounting Filesystem
[ 4765.918310] XFS (rbd2): Ending clean mount
[ 4767.139085] XFS (rbd2): Mounting Filesystem
[ 4771.009442] XFS (rbd2): Ending clean mount
[ 5665.686530] libceph: osd0 10.214.133.23:6800 socket closed (con state OPEN)
[ 6101.597699] XFS (rbd2): Mounting Filesystem
[ 6101.899587] XFS (rbd2): Ending clean mount
[ 6875.996583] XFS (rbd2): Mounting Filesystem
[ 6880.217723] XFS (rbd2): Ending clean mount
[ 7860.758422] libceph: osd0 10.214.133.23:6800 socket closed (con state OPEN)
[ 8759.535196] libceph: osd0 10.214.133.23:6800 socket closed (con state OPEN)
[10927.275859] libceph: osd0 10.214.133.23:6800 socket closed (con state OPEN)
[12131.669510] libceph: osd0 10.214.133.23:6800 socket closed (con state OPEN)
[14428.139118] libceph: osd0 10.214.133.23:6800 socket closed (con state OPEN)
[15326.917246] libceph: osd0 10.214.133.23:6800 socket closed (con state OPEN)
[16954.044979] libceph: osd0 10.214.133.23:6800 socket closed (con state OPEN)
[16954.070327] libceph: osd0 10.214.133.23:6800 socket error on read
[19011.107288] libceph: osd0 10.214.133.23:6800 socket closed (con state OPEN)
[21159.809958] XFS (rbd2): Mounting Filesystem
[21164.052011] XFS (rbd2): Ending clean mount
Updated by Alex Elder about 11 years ago
Below is a chart showing each test number, the time from an
example run with results in line with what's expected, and
the time from a slow run I've seen. Not all tests
are slowed down, and that may give some information about
what is affected by the problem. Variability in the test
time is normal, but it's normally a few seconds off, not
a factor of 100.
Test Time   Slow    Test Time   Slow    Test Time   Slow    Test Time   Slow
---- ----   ----    ---- ----   ----    ---- ----   ----    ---- ----   ----
001    4s   123s    002    0s     0s    003    0s     0s    004    5s   510s
005    1s     0s    006    2s     1s    007    1s     1s    008    5s     3s
009    4s   485s    011    1s     6s    012    2s    16s    013  103s  9642s
014  219s  5381s    015    3s   203s    017   33s  5760s    019    3s   499s
020    3s     3s    021    2s   489s    026   15s   550s    027   15s   560s
028   25s   506s    029    2s   459s    031   51s  7736s    032   47s   855s
033   29s  3279s    034    4s   511s    041   77s  5271s    046   14s   512s
047   25s   507s    048    1s     0s    050   28s  3591s    051    1s     1s
052    3s   496s    053    3s   488s    054   15s  2434s    056   14s   552s
061   13s   549s    063   15s   551s    064   60s  1021s    065   40s   983s
066    4s   499s    067    6s   927s    069    7s   484s    070   27s  4876s
074  329s  3076s    075  363s 39375s    076   41s  3352s    078   70s  1330s
079    6s   495s    084   59s   136s    085    4s   498s    086  154s 15205s
087   52s  4976s    088    0s     0s    089    8s    16s    091  862s 15408s
092    4s   431s    100   18s   303s    103    2s   431s    105    3s   457s
108   13s   728s    110   38s  2217s    116    5s   488s    117   33s  5670s
118    7s   457s    119   10s  1394s    120   19s   472s    121   10s   488s
124    7s   530s    126    0s     0s    129    5s   435s    130   56s   807s
131    1s     1s    132   21s   468s    133 1149s 84964s    134    5s   469s
135    4s   474s    137   16s   489s    138   53s  2348s    139   51s  2390s
140   47s  2349s    141    3s   464s    164    0s     1s    165    1s     0s
166    2s   464s    167  333s  8033s    182   18s   535s    184    0s     0s
187   11s  1773s    188   20s   810s    189    6s   601s    190    4s   439s
192   40s    66s    194    3s   453s    196    6s   474s    199    3s   469s
201   58s  3615s    203    4s   464s    214    1s    36s    215    2s     2s
216   55s  3922s    220    6s   938s    221    1s     1s    222    4s    92s
223   31s   949s    224   39s   451s    225   30s   997s    226   96s  8298s
227  614s 76177s    234   14s   780s    236    1s     1s    237    0s     0s
238    4s   485s    241   73s   242s    243    1s     0s    244   13s  1424s
245    0s     0s    246    0s     1s    247   93s  2689s    248    0s     0s
249    2s    19s    253   11s   978s    257    0s     1s    258    1s    34s
259   51s   560s    261    2s    92s    262    5s   509s    269  419s  5961s
277    5s   498s    278    6s   835s    280    5s   505s    285    1s     0s
286   11s    57s
Updated by Alex Elder about 11 years ago
- Status changed from New to In Progress
I tried running just tests 74 and 75 this morning, and running
just those two (twice each, as well as doing that several times
in a row) did not reproduce this behavior. Test 74 takes about
290+ seconds for me and test 75 takes roughly 230+ seconds.
I just tried running tests 50-80 (which ends up running some
tests we normally don't run from that range) and it looks like
I've hit it.
Tests 55 and 57-60 were not run. Test 62 (normally not run) had
errors. Tests 63-72 ran successfully (including 68, which is
normally not run). 73 produced an output mismatch. Then test
74 took 1524 seconds (rather than 290) and test 75 is still
running.
Looking at the client machine, this is the exact process
running right now:
0 0 19660 18907 20 0 24820 21216 sleep_ D ? 0:03 /var/lib/xfstests/ltp/fsx -d -N 10000 -l 10485760 -S 0 -P /var/lib/xfstests 075.2
That seems to be operating on this file:
/tmp/cephtest/test_dir.96W7qOfsGy/fsx/075.2
And attempting to read that file hangs on the read, or did
initially.
Using dynamic debug was showing no activity in rbd.
I just turned on dynamic debug for libceph and got some
output. I'll go take a look at that.
Updated by Alex Elder about 11 years ago
I haven't done a lot of this, but here's something I find in
the client log after turning on all debug output for the
libceph kernel module.
This message gets sent to osd 1:
[40948.126518] libceph: ----- ffff88020cb9a200 to osd1 42=osd_op len 139+0+32768 -----
Above that we see:
[40948.126510] libceph: calc_object_layout 'rb.0.1008.238e1f29.000000000095' pgid 2.7938354d
[40948.126515] libceph: send_request ffff88020cbcfc00 tid 74425 to osd1 flags 36
So this is a WRITE ONDISK request with 32KB of data to object *00095.
Note transaction id is 74425.
That message gets written successfully in 0.450628 seconds:
[40948.946834] libceph: prepare_write_message ffff88020cb9a200 seq 14941 type 42 len 139+0+32768 0 pgs
[40949.010656] libceph: prepare_write_message front_crc 3377833246 middle_crc 0
[40949.046656] libceph: try_write out_kvec_bytes 202
[40949.080050] libceph: write_partial_kvec ffff880221d7b030 202 left
[40949.115100] libceph: write_partial_kvec ffff880221d7b030 0 left in 0 kvecs ret = 1
[40949.180145] libceph: write_partial_msg_pages ffff880221d7b030 msg ffff88020cb9a200 page 0/0 offset 0
[40949.250134] libceph: write_partial_msg_pages ffff880221d7b030 msg ffff88020cb9a200 done
[40949.321297] libceph: prepare_write_message_footer ffff880221d7b030
[40949.359660] libceph: write_partial_kvec ffff880221d7b030 13 left
[40949.397462] libceph: write_partial_kvec ffff880221d7b030 0 left in 0 kvecs ret = 1
Nothing really happens then for about 3 seconds, until:
[40952.142028] libceph: ceph_sock_data_ready on ffff880221d7b030 state = 5, queueing work
The tag we get is 7, which indicates an incoming message, and then
we learn the message is type 43, CEPH_MSG_OSD_OPREPLY.
[40952.626335] libceph: ===== ffff88020cb9ba00 14941 from osd1 43=osd_opreply len 117+0 (2055398074 0 0) =====
[40952.688569] libceph: handle_reply ffff88020cb9ba00 tid 74425 result 0
It's the same transaction id, 74425 as the write we just sent.
Should I expect a 32KB write request to take 2.744566 seconds
to complete?
Updated by Alex Elder about 11 years ago
And finally test 75 completed, taking 18,723 seconds,
more than 5 hours.
Again, forward progress, absurdly slow.
I could use some help looking at what's going on.
Updated by Alex Elder about 11 years ago
As Sage suggested, I tried with an older kernel. Actually,
I just tried it with the newly updated master branch, which
is based on 3.8-rc5 and includes basically everything in
the testing branch except the new rbd request code.
3ebc21f7 libceph: fix messenger CONFIG_BLOCK dependencies
These tests ran: 50-54 56 61 63-72 74-76 78 79
And the time for tests 74 and 75 (and 76):
074 324s
075 253s
076 31s
I then switched to the testing branch.
07741308 rbd: add barriers near done flag operations
The same tests were run. Results:
074 307s
075 199s
076 30s
I'm going to keep trying. Obviously it doesn't always
happen.
Updated by Alex Elder about 11 years ago
I should also mention, I'm running against the current "next"
branch from the ceph repository. At the moment, that is:
0942e005 OSD: always activate_map in advance_pgs, only send messages if up
And for what it's worth, I'm running with this for teuthology:
0c663ca8 schedule_suite.sh: ensure ceph and kernel branches exist
Updated by Alex Elder about 11 years ago
Is there any chance clock skew could have an impact?
I just got these messages. I don't suppose this is a
problem but I thought I'd mention it.
INFO:teuthology.task.ceph.mon.b.err:2013-02-14 15:34:32.740303 7f73b1fa8700 -1 mon.b@1(peon).paxos(mdsmap active c 1..5) lease_expire from mon.0 10.214.131.10:6789/0 is 0.053851 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.b.err:2013-02-14 15:34:32.740423 7f73b1fa8700 -1 mon.b@1(peon).paxos(monmap active c 1..1) lease_expire from mon.0 10.214.131.10:6789/0 is 0.053820 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.b.err:2013-02-14 15:34:32.740526 7f73b1fa8700 -1 mon.b@1(peon).paxos(auth active c 1..17) lease_expire from mon.0 10.214.131.10:6789/0 is 0.053785 seconds in the past; mons are laggy or clocks are too skewed
Updated by Alex Elder about 11 years ago
I am running xfstests 258 and 259 over rbd, repeating
10 times. The first three iterations took about 1 second
and 53 seconds for the tests, respectively. Then on the
fourth iteration it started taking about 10 seconds and
300+ seconds for them.
So I did some debug tracing on the client.
Here is an example, with extraneous stuff pulled out:
[42111.355376] libceph: send_request ffff880110be8400 tid 48315 to osd4 flags 36
[42111.453278] libceph: ----- ffff88020b8f0c00 to osd4 42=osd_op len 139+0+8192 -----
[42115.456282] libceph: prepare_write_message ffff88020b8f0c00 seq 3186 type 42 len 139+0+8192 0 pgs
This means it took 0.097902 seconds from the osd client
reporting it was going to send the request until the messenger
saw it, and then 4.003004 more seconds between the messenger
queueing the message and beginning to put it on the wire.
Note that after reporting this message (----- ffff88...) there is
a test_and_set_bit() to see if we need to reschedule con_work().
Could this be yet another read/write barrier issue? If so
we seem to have some GREAT hardware for finding these.
I'm going to try an experiment to see if I can prove this
happens.
Updated by Alex Elder about 11 years ago
Well, it absolutely looks like it can happen.
I found cases where different CPUs set and cleared
the WRITE_PENDING bit.
I'm going to add some memory barriers to see if I
can make the delays go away. I seem to be able to
reproduce the slowness (300 seconds rather than 50)
on test 259 fairly reliably.
Updated by Alex Elder about 11 years ago
For a while I was pretty hopeful, because it appeared that
with these barriers in place I was getting slowness less
often. I had been getting it pretty reproducibly after one or
two passes running tests 258 and 259. The first time I ran
with barriers in place it didn't get slow for 10 consecutive
passes. The next one got fairly slow during the second pass.
The last time I ran them I got 8 passes of 1 second/55 seconds
or so, then pass 9 with 1 second/148 seconds, then the 10th
pass took 146 seconds for test 258 (rather than about 1 second)
and 425 seconds for test 259.
So it's possible the barrier thing is improving things, but
it's not at all conclusive. And even if it does, there
is more to it than just that making things slow.
Updated by Alex Elder about 11 years ago
I don't think the barrier improved anything.
All of those state changes should be happening
under protection of a mutex, so there should
be no need for a barrier.
The next thing I found yesterday was that 0-length
requests were handled differently in the new request
code. Previously they were ignored, whereas the new
code ended up submitting them as 0-length object
requests. I believe that should have been
harmless, but just to be sure I made a change to
ignore 0-length requests. Once again, things
look OK for a while, then they get slow.
I have been noticing that when things get slow
there is a "mkfs.xfs" process running, and it
appears to be spending a lot of time in one of
two places. Example stack traces of each are
below. I'm using kdb this morning to try to
learn more detail about what's going on.
root@plana27:~# cat /proc/17552/stack
[<ffffffff81314f31>] blkdev_issue_flush+0x141/0x1a0
[<ffffffff811bd45e>] blkdev_fsync+0x3e/0x50
[<ffffffff811b443d>] do_fsync+0x5d/0x90
[<ffffffff811b4840>] sys_fsync+0x10/0x20
[<ffffffff81666f19>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
root@plana27:~#
root@plana27:~# cat /proc/7240/stack
[<ffffffff811c13b7>] do_blockdev_direct_IO+0x1947/0x2060
[<ffffffff811c1b25>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811bd047>] blkdev_direct_IO+0x57/0x60
[<ffffffff81131b80>] generic_file_direct_write+0xd0/0x190
[<ffffffff81131f1d>] __generic_file_aio_write+0x2dd/0x3b0
[<ffffffff811bd912>] blkdev_aio_write+0x52/0xc0
[<ffffffff81183843>] do_sync_write+0xa3/0xe0
[<ffffffff81183ef3>] vfs_write+0xb3/0x180
[<ffffffff811843ba>] sys_pwrite64+0x9a/0xa0
[<ffffffff81666f19>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
root@plana27:~#
Updated by Alex Elder about 11 years ago
After lots of failed attempts to fix this I have what
I hope to be something helpful.
I've added dout() calls to basically trace the process
of a request as it moves through rbd. I just capture
10 seconds worth of this during a period where I find
mkfs.xfs is stuck waiting for completion of direct I/O.
Here's that process:
16959 ? D 0:00 /sbin/mkfs.xfs -f -l su=32k /dev/rbd3
Here is its kernel stack trace before I started capturing
the rbd tracing info:
[<ffffffff811c13b7>] do_blockdev_direct_IO+0x1947/0x2060
[<ffffffff811c1b25>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811bd047>] blkdev_direct_IO+0x57/0x60
[<ffffffff81131b80>] generic_file_direct_write+0xd0/0x190
[<ffffffff81131f1d>] __generic_file_aio_write+0x2dd/0x3b0
[<ffffffff811bd912>] blkdev_aio_write+0x52/0xc0
[<ffffffff81183843>] do_sync_write+0xa3/0xe0
[<ffffffff81183ef3>] vfs_write+0xb3/0x180
[<ffffffff811843ba>] sys_pwrite64+0x9a/0xa0
[<ffffffff81666f19>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
And here it is after that 10 seconds had passed (it's
the same as before).
[<ffffffff811c13b7>] do_blockdev_direct_IO+0x1947/0x2060
[<ffffffff811c1b25>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811bd047>] blkdev_direct_IO+0x57/0x60
[<ffffffff81131b80>] generic_file_direct_write+0xd0/0x190
[<ffffffff81131f1d>] __generic_file_aio_write+0x2dd/0x3b0
[<ffffffff811bd912>] blkdev_aio_write+0x52/0xc0
[<ffffffff81183843>] do_sync_write+0xa3/0xe0
[<ffffffff81183ef3>] vfs_write+0xb3/0x180
[<ffffffff811843ba>] sys_pwrite64+0x9a/0xa0
[<ffffffff81666f19>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
Here is the tracing information that got dumped to
the log, with some added annotation. It walks through
an entire image request lifetime.
[ 2192.531728] rbd: rbd_osd_write_callback: ffff88020b259d00 result 0 512/512
[ 2192.531734] rbd: rbd_obj_request_complete: ffff88020b259d00 callback ffffffffa09d9e10
[ 2192.531738] rbd: rbd_img_obj_callback: img ffff88020ca36240 obj ffff88020b259d00
[ 2192.531741] rbd: rbd_img_obj_callback: which 0 next_compl 0 req_count 1
[ 2192.531760] rbd: rbd_img_request_complete: ffff88020ca36240
[ 2192.531763] rbd: rbd_img_request_destroy(ffff88020ca36240)
[ 2192.531765] rbd: rbd_img_obj_request_del(ffff88020ca36240, ffff88020b259d00) 0
[ 2192.531768] rbd: rbd_obj_request_destroy(ffff88020b259d00)
(Partial log of activity above, ignore it for now.)
First, a new image request gets created in response to a request
from the block layer. The data structure representing it is
located at ffff88020ca36840.
[ 2192.531845] rbd: rbd_img_request_create(ffff880219805800, 0x1bc01000, 0x1000, true) -> ffff88020ca36840
That request is split into object requests, each of which
goes to one object containing rbd image data. There is
only one in this case. The object request is at address
ffff88020b258700, and is added to the image request.
[ 2192.531851] rbd: rbd_img_request_fill_bio(ffff88020ca36840, ffff88020ca37d40)
[ 2192.531857] rbd: rbd_obj_request_create("rb.0.1021.238e1f29.00000000006f", 0x1000, 0x1000, 1) -> ffff88020b258700
[ 2192.531866] rbd: rbd_img_obj_request_add(ffff88020ca36840, ffff88020b258700) 0
The image request is submitted, and that causes all of its
object requests to be sent to their target osds.
[ 2192.531868] rbd: rbd_img_request_submit(ffff88020ca36840)
[ 2192.531871] rbd: rbd_obj_request_submit(ffff88020ca3e960, ffff88020b258700)
Now we wait until we get notified of completion. Eventually,
net/ceph/osd_client.c:handle_reply() gets called when the
messenger receives a message destined for the osd client.
Once that has done its processing, it calls the request's
r_callback function, which will always be rbd_osd_req_callback()
for rbd requests.
That function fans out to one of several functions depending
on which type of request it was. In this case it was a
write:
[ 2197.581086] rbd: rbd_osd_write_callback: ffff88020b258700 result 0 4096/4096
The osd write request will now be marked done, so the object
request is completed:
[ 2197.581091] rbd: rbd_obj_request_complete: ffff88020b258700 callback ffffffffa09d9e10
Now that the object request is complete, its callback function
is called. For object requests attached to image requests,
this will be rbd_img_obj_callback():
[ 2197.581094] rbd: rbd_img_obj_callback: img ffff88020ca36840 obj ffff88020b258700
And this is object request 0 for that image, which is the
next one we're waiting to complete for this image request,
and there is only one such object request.
[ 2197.581098] rbd: rbd_img_obj_callback: which 0 next_compl 0 req_count 1
Since that was the last (only) object request for this
image, we've completed the entire image request, so its
completion routine is called:
[ 2197.581109] rbd: rbd_img_request_complete: ffff88020ca36840
This image request has no callback routine, so we're done.
We just drop a reference on this image request, which
turns out to be the last reference, so it gets destroyed.
[ 2197.581112] rbd: rbd_img_request_destroy(ffff88020ca36840)
...including destroying all its component object requests.
[ 2197.581115] rbd: rbd_img_obj_request_del(ffff88020ca36840, ffff88020b258700) 0
[ 2197.581118] rbd: rbd_obj_request_destroy(ffff88020b258700)
And that's it.
Now we start again with the next one.
[ 2197.581185] rbd: rbd_img_request_create(ffff880219805800, 0x1bc02000, 0x1000, true) -> ffff88020ca37a40
[ 2197.581191] rbd: rbd_img_request_fill_bio(ffff88020ca37a40, ffff88020ca37d40)
[ 2197.581197] rbd: rbd_obj_request_create("rb.0.1021.238e1f29.00000000006f", 0x2000, 0x1000, 1) -> ffff88020b259700
[ 2197.581206] rbd: rbd_img_obj_request_add(ffff88020ca37a40, ffff88020b259700) 0
[ 2197.581209] rbd: rbd_img_request_submit(ffff88020ca37a40)
[ 2197.581211] rbd: rbd_obj_request_submit(ffff88020ca3e960, ffff88020b259700)
OK, now let's look at the important two lines, which are
consecutive in the tracing output:
[ 2192.531871] rbd: rbd_obj_request_submit(ffff88020ca3e960, ffff88020b258700)
[ 2197.581086] rbd: rbd_osd_write_callback: ffff88020b258700 result 0 4096/4096
The time difference between these is 5.049215 seconds.
The first function consists of this:
return ceph_osdc_start_request(osdc, obj_request->osd_req, false);
And the second one is as described above, called back
immediately after receiving a response from the request
sent to the osd.
This is why I think I can conclude that something's going
on in the osd (or rados) that is causing these long delays.
I will try to do the same thing, but with full debug logging
so we can see all the messenger and osd client activity between
those two lines I highlighted above.
Updated by Alex Elder about 11 years ago
The process:
20767 ? D 0:00 /sbin/mkfs.xfs -f -l su=32k /dev/rbd3
Its "before" kernel stack trace:
[<ffffffff811c13b7>] do_blockdev_direct_IO+0x1947/0x2060
[<ffffffff811c1b25>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811bd047>] blkdev_direct_IO+0x57/0x60
[<ffffffff81131b80>] generic_file_direct_write+0xd0/0x190
[<ffffffff81131f1d>] __generic_file_aio_write+0x2dd/0x3b0
[<ffffffff811bd912>] blkdev_aio_write+0x52/0xc0
[<ffffffff81183843>] do_sync_write+0xa3/0xe0
[<ffffffff81183ef3>] vfs_write+0xb3/0x180
[<ffffffff81184232>] sys_write+0x52/0xa0
[<ffffffff81666f19>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
Its "after" kernel stack trace:
[<ffffffff811c13b7>] do_blockdev_direct_IO+0x1947/0x2060
[<ffffffff811c1b25>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811bd047>] blkdev_direct_IO+0x57/0x60
[<ffffffff81131b80>] generic_file_direct_write+0xd0/0x190
[<ffffffff81131f1d>] __generic_file_aio_write+0x2dd/0x3b0
[<ffffffff811bd912>] blkdev_aio_write+0x52/0xc0
[<ffffffff81183843>] do_sync_write+0xa3/0xe0
[<ffffffff81183ef3>] vfs_write+0xb3/0x180
[<ffffffff81184232>] sys_write+0x52/0xa0
[<ffffffff81666f19>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
And below is my analysis of the very verbose trace info.
We wait about 5 seconds for a response from the osd.
--------------------
New rbd image request: ffff88020b582a80
offset: 0x25808000 = 629178368; length: 0x40000 = 262144 (256KB); write: yes
--------------------
[ 3953.583136] rbd: rbd_img_request_create(ffff880219805800, 0x25808000, 0x40000, true) -> ffff88020b582a80
[ 3953.583140] rbd: rbd_img_request_fill_bio(ffff88020b582a80, ffff88020b583980)
--------------------
New rbd object request: ffff88020b582a80
object: "rb.0.1021.238e1f29.000000000096"; offset: 0x8000 = 32768; length: 0x40000 = 262144 (256KB); type: 1 = OBJ_REQUEST_BIO
--------------------
[ 3953.583145] rbd: rbd_obj_request_create("rb.0.1021.238e1f29.000000000096", 0x8000, 0x40000, 1) -> ffff880225f2a700
--------------------
OSD request allocated, including request and response messages;
the request message is ffff880225f2bd00
--------------------
[ 3953.583152] libceph: ceph_msg_new ffff880225f2b900 front 512
[ 3953.583155] libceph: ceph_msg_new ffff880225f2bd00 front 208
--------------------
object request added to image request, and submitted
--------------------
[ 3953.583159] rbd: rbd_obj_request_get ffff880225f2a700 (was 1)
[ 3953.583162] rbd: rbd_img_obj_request_add(ffff88020b582a80, ffff880225f2a700) 0
[ 3953.583164] rbd: rbd_img_request_submit(ffff88020b582a80)
[ 3953.583167] rbd: rbd_obj_request_submit(ffff88020ca3e960, ffff880225f2a700)
--------------------
ceph_osdc_start_request() registers and maps the request
--------------------
[ 3953.583171] libceph: __register_request ffff88020f185800 tid 65536
[ 3953.583174] libceph: first request, scheduling timeout
[ 3953.583179] libceph: map_request ffff88020f185800 tid 65536
[ 3953.583183] libceph: calc_object_layout 'rb.0.1021.238e1f29.000000000096' pgid 2.526e76bf
[ 3953.583190] libceph: map_request tid 65536 pgid 2.76bf osd1 (was osd-1)
[ 3953.583193] libceph: __remove_osd_from_lru ffff88020b595000
--------------------
__send_request() hands the request message to the messenger
--------------------
[ 3953.583196] libceph: send_request ffff88020f185800 tid 65536 to osd1 flags 36
--------------------
ceph_con_send() calls con->ops->get(con)
--------------------
[ 3953.583199] libceph: ceph_con_send: 2: mutex_lock(ffff88020b595030)
[ 3953.583202] libceph: get_osd ffff88020b595000 1 -> 2
--------------------
ceph_con_send() puts the message on the out queue
--------------------
[ 3953.583207] libceph: ----- ffff880225f2bd00 to osd1 42=osd_op len 139+0+262144 -----
[ 3953.583210] libceph: ceph_con_send: 2: mutex_unlock(ffff88020b595030)
--------------------
queue_con_delay() calls con->ops->get(con) (again)
...and ensures con_work will run
--------------------
[ 3953.583212] libceph: get_osd ffff88020b595000 2 -> 3
[ 3953.583220] libceph: queue_con_delay ffff88020b595030 0
--------------------
rbd drops an object request reference; done for now
--------------------
[ 3953.583224] rbd: rbd_obj_request_put ffff880225f2a700 (was 2)
--------------------
Now the messenger needs to send the message
--------------------
[ 3953.583232] libceph: con_work: 2: mutex_lock(ffff88020b595030)
--------------------
There is nothing incoming
--------------------
[ 3953.583236] libceph: try_read start on ffff88020b595030 state 5
[ 3953.583238] libceph: try_read tag 1 in_base_pos 0
[ 3953.583245] libceph: try_read done on ffff88020b595030 ret 0
--------------------
The only outgoing request is the one we built above, ffff880225f2bd00
--------------------
[ 3953.583247] libceph: try_write start ffff88020b595030 state 5
[ 3953.583250] libceph: try_write out_kvec_bytes 0
[ 3953.583253] libceph: prepare_write_message ffff880225f2bd00 seq 2 type 42 len 139+0+262144 0 pgs
[ 3953.583258] libceph: prepare_write_message front_crc 541896034 middle_crc 0
[ 3953.583260] libceph: try_write out_kvec_bytes 193
--------------------
Write out the header and middle
--------------------
[ 3953.583263] libceph: write_partial_kvec ffff88020b595030 193 left
[ 3953.583270] libceph: write_partial_kvec ffff88020b595030 0 left in 0 kvecs ret = 1
--------------------
Write out some of the bio data
--------------------
[ 3953.583273] libceph: write_partial_msg_pages ffff88020b595030 msg ffff880225f2bd00 page 0/0 offset 0
[ 3953.583399] libceph: try_write done on ffff88020b595030 ret 0
--------------------
con_work() is done, call con->ops->put(con)
--------------------
[ 3953.583405] libceph: put_osd ffff88020b595000 3 -> 2
--------------------
messenger gets notified the socket has space to write
--------------------
[ 3953.583959] libceph: ceph_sock_write_space ffff88020b595030 queueing write work
--------------------
con_work() fires again
--------------------
[ 3953.583965] libceph: get_osd ffff88020b595000 2 -> 3
[ 3953.583973] libceph: queue_con_delay ffff88020b595030 0
[ 3953.583985] libceph: con_work: 0: mutex_lock(ffff88020b595030)
--------------------
nothing to read
--------------------
[ 3953.583990] libceph: try_read start on ffff88020b595030 state 5
[ 3953.583992] libceph: try_read tag 1 in_base_pos 0
[ 3953.583998] libceph: try_read done on ffff88020b595030 ret 0
--------------------
write out some more of the bio data from the same message
--------------------
[ 3953.584001] libceph: try_write start ffff88020b595030 state 5
[ 3953.584003] libceph: try_write out_kvec_bytes 0
[ 3953.584007] libceph: write_partial_msg_pages ffff88020b595030 msg ffff880225f2bd00 page 13/0 offset 2086
[ 3953.584104] libceph: try_write done on ffff88020b595030 ret 0
[ 3953.584108] libceph: con_work: 0: mutex_unlock(ffff88020b595030)
[ 3953.584111] libceph: put_osd ffff88020b595000 3 -> 2
--------------------
the socket has space to write again
--------------------
[ 3953.584928] libceph: ceph_sock_write_space ffff88020b595030 queueing write work
[ 3953.584933] libceph: get_osd ffff88020b595000 2 -> 3
[ 3953.584940] libceph: queue_con_delay ffff88020b595030 0
[ 3953.584952] libceph: con_work: 0: mutex_lock(ffff88020b595030)
--------------------
nothing to read
--------------------
[ 3953.584956] libceph: try_read start on ffff88020b595030 state 5
[ 3953.584959] libceph: try_read tag 1 in_base_pos 0
[ 3953.584965] libceph: try_read done on ffff88020b595030 ret 0
[ 3953.584967] libceph: try_write start ffff88020b595030 state 5
[ 3953.584969] libceph: try_write out_kvec_bytes 0
--------------------
write out some more of the bio data from the same message
--------------------
[ 3953.584973] libceph: write_partial_msg_pages ffff88020b595030 msg ffff880225f2bd00 page 25/0 offset 718
[ 3953.585097] libceph: try_write done on ffff88020b595030 ret 0
[ 3953.585101] libceph: con_work: 0: mutex_unlock(ffff88020b595030)
[ 3953.585104] libceph: put_osd ffff88020b595000 3 -> 2
--------------------
and again--more space, nothing to read, write more from the bio
--------------------
[ 3953.585546] libceph: ceph_sock_write_space ffff88020b595030 queueing write work
[ 3953.585550] libceph: get_osd ffff88020b595000 2 -> 3
[ 3953.585556] libceph: queue_con_delay ffff88020b595030 0
[ 3953.586209] libceph: con_work: 0: mutex_lock(ffff88020b595030)
[ 3953.586214] libceph: try_read start on ffff88020b595030 state 5
[ 3953.586216] libceph: try_read tag 1 in_base_pos 0
[ 3953.586222] libceph: try_read done on ffff88020b595030 ret 0
[ 3953.586225] libceph: try_write start ffff88020b595030 state 5
[ 3953.586227] libceph: try_write out_kvec_bytes 0
[ 3953.586230] libceph: write_partial_msg_pages ffff88020b595030 msg ffff880225f2bd00 page 40/0 offset 94
[ 3953.586412] libceph: try_write done on ffff88020b595030 ret 0
[ 3953.586416] libceph: con_work: 0: mutex_unlock(ffff88020b595030)
[ 3953.586419] libceph: put_osd ffff88020b595000 3 -> 2
--------------------
and again--more space, nothing to read, write the rest of the bio
--------------------
[ 3953.586620] libceph: ceph_sock_write_space ffff88020b595030 queueing write work
[ 3953.586625] libceph: get_osd ffff88020b595000 2 -> 3
[ 3953.586633] libceph: queue_con_delay ffff88020b595030 0
[ 3953.586645] libceph: con_work: 0: mutex_lock(ffff88020b595030)
[ 3953.586649] libceph: try_read start on ffff88020b595030 state 5
[ 3953.586652] libceph: try_read tag 1 in_base_pos 0
[ 3953.586658] libceph: try_read done on ffff88020b595030 ret 0
[ 3953.586660] libceph: try_write start ffff88020b595030 state 5
[ 3953.586663] libceph: try_write out_kvec_bytes 0
[ 3953.586666] libceph: write_partial_msg_pages ffff88020b595030 msg ffff880225f2bd00 page 61/0 offset 958
[ 3953.586690] libceph: write_partial_msg_pages ffff88020b595030 msg ffff880225f2bd00 done
--------------------
now write out the message footer
--------------------
[ 3953.586692] libceph: prepare_write_message_footer ffff88020b595030
[ 3953.586695] libceph: write_partial_kvec ffff88020b595030 13 left
[ 3953.586701] libceph: write_partial_kvec ffff88020b595030 0 left in 0 kvecs ret = 1
--------------------
the footer is written, and we're done
--------------------
[ 3953.586704] libceph: try_write nothing else to write.
[ 3953.586706] libceph: try_write done on ffff88020b595030 ret 0
[ 3953.586710] libceph: con_work: 0: mutex_unlock(ffff88020b595030)
[ 3953.586712] libceph: put_osd ffff88020b595000 3 -> 2
--------------------
messenger gets notified the socket has incoming data
--------------------
[ 3953.802155] libceph: ceph_sock_data_ready on ffff88020b595030 state = 5, queueing work
--------------------
con_work reads in the tag
--------------------
[ 3953.802161] libceph: get_osd ffff88020b595000 2 -> 3
[ 3953.802170] libceph: queue_con_delay ffff88020b595030 0
[ 3953.802187] libceph: con_work: 0: mutex_lock(ffff88020b595030)
[ 3953.802192] libceph: try_read start on ffff88020b595030 state 5
[ 3953.802194] libceph: try_read tag 1 in_base_pos 0
[ 3953.802201] libceph: try_read got tag 8
--------------------
the tag is 8, CEPH_MSGR_TAG_ACK, an ack for our message
--------------------
[ 3953.802204] libceph: prepare_read_ack ffff88020b595030
[ 3953.802210] libceph: got ack for seq 2 type 42 at ffff880225f2bd00
[ 3953.802213] libceph: ceph_msg_remove
[ 3953.802216] libceph: put_osd ffff88020b595000 3 -> 2
[ 3953.802219] libceph: prepare_read_tag ffff88020b595030
[ 3953.802221] libceph: try_read start on ffff88020b595030 state 5
[ 3953.802224] libceph: try_read tag 1 in_base_pos 0
[ 3953.802228] libceph: try_read done on ffff88020b595030 ret 0
[ 3953.802231] libceph: try_write start ffff88020b595030 state 5
[ 3953.802233] libceph: try_write out_kvec_bytes 0
[ 3953.802236] libceph: try_write nothing else to write.
[ 3953.802238] libceph: try_write done on ffff88020b595030 ret 0
[ 3953.802241] libceph: con_work: 0: mutex_unlock(ffff88020b595030)
[ 3953.802244] libceph: put_osd ffff88020b595000 2 -> 1
... now nothing for a while ...
--------------------
mon_client isn't hunting, so sends out a keepalive
--------------------
[ 3954.443971] libceph: monc delayed_work
[ 3954.443978] libceph: con_keepalive ffff88020ca3e410
[ 3954.443981] libceph: ceph_con_keepalive: 0: mutex_lock(ffff88020ca3e410)
[ 3954.443986] libceph: ceph_con_keepalive: 0: mutex_unlock(ffff88020ca3e410)
[ 3954.443991] libceph: queue_con_delay ffff88020ca3e410 0
--------------------
call __send_subscribe()
--------------------
[ 3954.443995] libceph: ceph_x_should_authenticate want=39 need=0 have=39
[ 3954.443998] libceph: ceph_x_is_authenticated want=39 need=0 have=39
[ 3954.444001] libceph: __send_subscribe sub_sent=0 exp=0 want_osd=2
[ 3954.444004] libceph: __schedule_delayed after 2000
--------------------
send out the monitor message
--------------------
[ 3954.444010] libceph: con_work: 0: mutex_lock(ffff88020ca3e410)
[ 3954.444014] libceph: try_read start on ffff88020ca3e410 state 5
[ 3954.444016] libceph: try_read tag 1 in_base_pos 0
[ 3954.444023] libceph: try_read done on ffff88020ca3e410 ret 0
[ 3954.444026] libceph: try_write start ffff88020ca3e410 state 5
[ 3954.444028] libceph: try_write out_kvec_bytes 0
[ 3954.444030] libceph: prepare_write_keepalive ffff88020ca3e410
[ 3954.444033] libceph: try_write out_kvec_bytes 1
[ 3954.444035] libceph: write_partial_kvec ffff88020ca3e410 1 left
[ 3954.444044] libceph: write_partial_kvec ffff88020ca3e410 0 left in 0 kvecs ret = 1
[ 3954.444047] libceph: try_write nothing else to write.
[ 3954.444049] libceph: try_write done on ffff88020ca3e410 ret 0
[ 3954.444052] libceph: con_work: 0: mutex_unlock(ffff88020ca3e410)
... and nothing for 4 more seconds ...
--------------------
now osd_client times out
--------------------
[ 3958.576811] libceph: timeout
[ 3958.576817] libceph: request_next_osdmap have 7
[ 3958.576823] libceph: tid 65536 is slow, will send keepalive on osd1
--------------------
handle_timeout() calls ceph_con_keepalive()
--------------------
[ 3958.576826] libceph: con_keepalive ffff88020b595030
[ 3958.576829] libceph: ceph_con_keepalive: 2: mutex_lock(ffff88020b595030)
[ 3958.576833] libceph: ceph_con_keepalive: 2: mutex_unlock(ffff88020b595030)
[ 3958.576836] libceph: get_osd ffff88020b595000 1 -> 2
[ 3958.576840] libceph: queue_con_delay ffff88020b595030 0
[ 3958.576844] libceph: __send_queued
--------------------
con_work() sends out the keepalive
--------------------
[ 3958.576848] libceph: con_work: 2: mutex_lock(ffff88020b595030)
[ 3958.576852] libceph: try_read start on ffff88020b595030 state 5
[ 3958.576855] libceph: try_read tag 1 in_base_pos 0
[ 3958.576861] libceph: try_read done on ffff88020b595030 ret 0
[ 3958.576864] libceph: try_write start ffff88020b595030 state 5
[ 3958.576866] libceph: try_write out_kvec_bytes 0
[ 3958.576868] libceph: prepare_write_keepalive ffff88020b595030
[ 3958.576871] libceph: try_write out_kvec_bytes 1
[ 3958.576873] libceph: write_partial_kvec ffff88020b595030 1 left
[ 3958.576892] libceph: write_partial_kvec ffff88020b595030 0 left in 0 kvecs ret = 1
[ 3958.576894] libceph: try_write nothing else to write.
[ 3958.576897] libceph: try_write done on ffff88020b595030 ret 0 [ 3958.576900] libceph: con_work: 2: mutex_unlock(ffff88020b595030) [ 3958.576903] libceph: put_osd ffff88020b595000 2 -> 1 -------------------- messenger gets notified the socket has incoming data -------------------- [ 3958.623971] libceph: ceph_sock_data_ready on ffff88020b595030 state = 5, queueing work -------------------- con_work reads in the tag, it's CEPH_MSGR_TAG_MSG -------------------- [ 3958.623977] libceph: get_osd ffff88020b595000 1 -> 2 [ 3958.623986] libceph: queue_con_delay ffff88020b595030 0 [ 3958.624002] libceph: con_work: 0: mutex_lock(ffff88020b595030) [ 3958.624006] libceph: try_read start on ffff88020b595030 state 5 [ 3958.624009] libceph: try_read tag 1 in_base_pos 0 [ 3958.624016] libceph: try_read got tag 7 -------------------- read in the message header--it's our response -------------------- [ 3958.624018] libceph: prepare_read_message ffff88020b595030 [ 3958.624021] libceph: read_partial_message con ffff88020b595030 msg (null) [ 3958.624027] libceph: got hdr type 43 front 117 data 0 [ 3958.624030] libceph: ceph_con_in_msg_alloc: 0: mutex_unlock(ffff88020b595030) [ 3958.624034] libceph: get_osd ffff88020b595000 2 -> 3 --- Finally, response to our request arrives [ 3958.624037] libceph: get_reply tid 65536 ffff880225f2b900 [ 3958.624040] libceph: ceph_con_in_msg_alloc: 0: mutex_lock(ffff88020b595030) -------------------- read in the message data -------------------- [ 3958.624043] libceph: get_osd ffff88020b595000 3 -> 4 [ 3958.624053] libceph: read_partial_message got msg ffff880225f2b900 117 (4140793455) + 0 (0) + 0 (0) [ 3958.624058] libceph: put_osd ffff88020b595000 4 -> 3 [ 3958.624063] libceph: process_message: 0: mutex_unlock(ffff88020b595030) [ 3958.624070] libceph: ===== ffff880225f2b900 2 from osd1 43=osd_opreply len 117+0 (4140793455 0 0) ===== [ 3958.624076] libceph: handle_reply ffff880225f2b900 tid 65536 result 0 [ 3958.624079] libceph: dropping 
con_filling_msg ref ffff88020b595030 [ 3958.624082] libceph: put_osd ffff88020b595000 3 -> 2 -------------------- result was good -------------------- [ 3958.624084] libceph: handle_reply result 0 bytes 0 [ 3958.624087] libceph: handle_reply tid 65536 flags 37 [ 3958.624089] libceph: __unregister_request ffff88020f185800 tid 65536 [ 3958.624092] libceph: moving osd to ffff88020b595000 lru [ 3958.624094] libceph: __move_osd_to_lru ffff88020b595000 [ 3958.624096] libceph: no requests, canceling timeout -------------------- and now we're back in rbd, getting called back -------------------- [ 3958.624101] rbd: rbd_osd_req_callback: ffff88020f185800 ffff880225f2b900 [ 3958.624104] rbd: rbd_osd_write_callback: ffff880225f2a700 result 0 262144/262144 [ 3958.624107] rbd: rbd_obj_request_complete: ffff880225f2a700 callback ffffffffa09d9e10 [ 3958.624110] rbd: rbd_img_obj_callback(ffff880225f2a700) 0 [ 3958.624113] rbd: rbd_img_obj_callback: img ffff88020b582a80 obj ffff880225f2a700 [ 3958.624116] rbd: rbd_img_obj_callback: which 0 next_compl 0 req_count 1 [ 3958.624132] rbd: rbd_img_request_complete: ffff88020b582a80 [ 3958.624135] rbd: rbd_img_request_put ffff88020b582a80 (was 1) [ 3958.624138] rbd: rbd_img_request_destroy(ffff88020b582a80) [ 3958.624141] rbd: rbd_img_obj_request_del(ffff88020b582a80, ffff880225f2a700) 0
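For readers unfamiliar with the messenger internals traced above: con_work() takes the connection mutex, try_read() dispatches on a one-byte tag from the socket (the trace shows tag 8 = CEPH_MSGR_TAG_ACK and tag 7 = CEPH_MSGR_TAG_MSG), and then try_write() flushes anything queued. A rough Python sketch of that dispatch cycle, purely as an illustration (the class and method names mirror the trace but this is not the kernel code):

```python
# Illustrative model of the con_work() read/write cycle in the trace.
# Tag values match those logged (7 = MSG, 8 = ACK); everything else is
# a simplification, not the actual net/ceph/messenger.c logic.

CEPH_MSGR_TAG_MSG = 7   # a full message follows (e.g. osd_opreply)
CEPH_MSGR_TAG_ACK = 8   # peer acknowledges a sent sequence number

class Connection:
    def __init__(self):
        self.out_queue = []      # messages waiting to be written
        self.sent = {}           # seq -> message, awaiting ack
        self.log = []

    def con_work(self, incoming):
        """One pass of the work function: read side, then write side."""
        self.try_read(incoming)
        self.try_write()

    def try_read(self, incoming):
        for tag, payload in incoming:
            if tag == CEPH_MSGR_TAG_ACK:
                # "got ack for seq N": drop our ref on the sent message
                self.sent.pop(payload, None)
                self.log.append(f"ack seq {payload}")
            elif tag == CEPH_MSGR_TAG_MSG:
                # "prepare_read_message" then "handle_reply"
                self.log.append(f"msg {payload}")

    def try_write(self):
        while self.out_queue:
            seq, msg = self.out_queue.pop(0)
            self.sent[seq] = msg             # keep until acked
            self.log.append(f"sent seq {seq}")
        self.log.append("nothing else to write")

con = Connection()
con.out_queue.append((2, "osd_op tid 65536"))
con.con_work(incoming=[])                          # write out the request
con.con_work(incoming=[(CEPH_MSGR_TAG_ACK, 2)])    # peer acks seq 2
con.con_work(incoming=[(CEPH_MSGR_TAG_MSG, "osd_opreply tid 65536")])
```

The point the trace makes is that this whole cycle completes normally on the client; the multi-second gaps come from the OSD side, which is what points at the journal.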
Updated by Alex Elder about 11 years ago
Here is the yaml file I used for reproducing this.
interactive-on-error:
roles:
- [mon.a, mon.c, osd.0, osd.1, osd.2]
- [mon.b, mds.a, osd.3, osd.4, osd.5]
- [client.0]
kernel:
  mon:
    branch: wip-4079
    kdb: true
  client:
    branch: wip-4079
    kdb: true
overrides:
  ceph:
    conf:
      osd:
        osd op complaint time: 120
    fs: btrfs
    log-whitelist:
    - clocks not synchronized
    - old request
tasks:
- ceph:
    branch: next
    conf:
      global:
        debug ms: 10
      osd:
        debug osd: 20
- rbd:
    all:
- rbd.xfstests:
    client.0:
      count: 5
      tests: 258-259
Updated by Ian Colle about 11 years ago
- Assignee changed from Alex Elder to Josh Durgin
Updated by Alex Elder about 11 years ago
Another possible clue. Things seemed to be going reasonably
well on this run, but sometime during test 78 things went bad,
and we also got lots of "mons are laggy or clocks are too skewed"
messages.
First, this is a set of possibly typical timings for these tests:
. . .
INFO:teuthology.task.rbd.client.0.out:070 22s
INFO:teuthology.task.rbd.client.0.out:074 295s
INFO:teuthology.task.rbd.client.0.out:075 240s
INFO:teuthology.task.rbd.client.0.out:076 37s
INFO:teuthology.task.rbd.client.0.out:078 72s
INFO:teuthology.task.rbd.client.0.out:079 4s
INFO:teuthology.task.rbd.client.0.out:084 59s
INFO:teuthology.task.rbd.client.0.out:085 4s
INFO:teuthology.task.rbd.client.0.out:086 160s
. . .
Now on this one, note that tests 75 and 76 show about
the right elapsed time (240 and 37 seconds), but then
test 78 (after the messages) took 707 seconds rather than
around 72, and test 79 took 213 seconds rather than
the expected 4 or so.
. . .
INFO:teuthology.task.rbd.client.0.out:070 23s
INFO:teuthology.task.rbd.client.0.out:074 326s
INFO:teuthology.task.rbd.client.0.out:075 240s
INFO:teuthology.task.rbd.client.0.out:076 37s
INFO:teuthology.task.ceph.mon.a.err:2013-02-22 19:12:49.873079 7fb1ba56e700 -1 mon.a@1(peon).paxos(monmap active c 1..1) lease_expire from mon.0 10.214.133.20:6789/0 is 0.100789 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.c.err:2013-02-22 19:12:49.873082 7f72de10a700 -1 mon.c@2(peon).paxos(monmap active c 1..1) lease_expire from mon.0 10.214.133.20:6789/0 is 0.100793 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.a.err:2013-02-22 19:13:00.061164 7fb1ba56e700 -1 mon.a@1(peon).paxos(logm updating c 132..634) lease_expire from mon.0 10.214.133.20:6789/0 is 0.847412 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.c.err:2013-02-22 19:13:00.061199 7f72de10a700 -1 mon.c@2(peon).paxos(logm updating c 132..634) lease_expire from mon.0 10.214.133.20:6789/0 is 0.847448 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.a.err:2013-02-22 19:13:00.588403 7fb1ba56e700 -1 mon.a@1(peon).paxos(monmap active c 1..1) lease_expire from mon.0 10.214.133.20:6789/0 is 0.544320 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.a.err:2013-02-22 19:13:00.588527 7fb1ba56e700 -1 mon.a@1(peon).paxos(osdmap active c 1..5) lease_expire from mon.0 10.214.133.20:6789/0 is 0.463277 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.a.err:2013-02-22 19:13:00.588650 7fb1ba56e700 -1 mon.a@1(peon).paxos(auth active c 1..10) lease_expire from mon.0 10.214.133.20:6789/0 is 0.463295 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.a.err:2013-02-22 19:13:00.588762 7fb1ba56e700 -1 mon.a@1(peon).paxos(mdsmap active c 1..5) lease_expire from mon.0 10.214.133.20:6789/0 is 0.463101 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.c.err:2013-02-22 19:13:00.588384 7f72de10a700 -1 mon.c@2(peon).paxos(monmap active c 1..1) lease_expire from mon.0 10.214.133.20:6789/0 is 0.544301 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.c.err:2013-02-22 19:13:00.588489 7f72de10a700 -1 mon.c@2(peon).paxos(osdmap active c 1..5) lease_expire from mon.0 10.214.133.20:6789/0 is 0.463238 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.c.err:2013-02-22 19:13:00.588624 7f72de10a700 -1 mon.c@2(peon).paxos(auth active c 1..10) lease_expire from mon.0 10.214.133.20:6789/0 is 0.463268 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.c.err:2013-02-22 19:13:00.588847 7f72de10a700 -1 mon.c@2(peon).paxos(mdsmap active c 1..5) lease_expire from mon.0 10.214.133.20:6789/0 is 0.463186 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.a.err:2013-02-22 19:13:01.181790 7fb1ba56e700 -1 mon.a@1(peon).paxos(pgmap updating c 124..626) lease_expire from mon.0 10.214.133.20:6789/0 is 0.889894 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.ceph.mon.c.err:2013-02-22 19:13:01.181809 7f72de10a700 -1 mon.c@2(peon).paxos(pgmap updating c 124..626) lease_expire from mon.0 10.214.133.20:6789/0 is 0.889913 seconds in the past; mons are laggy or clocks are too skewed
INFO:teuthology.task.rbd.client.0.out:078 707s
INFO:teuthology.task.rbd.client.0.out:079 213s
INFO:teuthology.task.rbd.client.0.out:084 100s
INFO:teuthology.task.rbd.client.0.out:085 204s
INFO:teuthology.task.rbd.client.0.out:086 4306s
. . .
Updated by Sage Weil about 11 years ago
- Assignee deleted (Josh Durgin)
- Priority changed from High to Urgent
Updated by Sage Weil about 11 years ago
- Subject changed from krbd: requests become extremely slow following socket error to osd: journal aio deadlock
Tracked this down to a hang in the journal aio code: specifically, the throttle code in write_thread_entry().
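For context on the failure mode named in the new subject line: the journal write thread blocks in throttle code waiting for in-flight journal bytes to drain, but with aio those bytes are only returned when completions are reaped, so if completions stop being processed the writer waits forever. A toy Python model of that shape, with hypothetical names (Throttle, write_thread_entry, etc. are illustrative, not Ceph's actual FileJournal implementation):

```python
# Toy model of the hang: the journal write thread blocks in take()
# until aio completions return bytes via put(). If completions are
# never reaped, the second take() below would block forever -- the
# deadlock this ticket tracks down. Names are illustrative only.
import threading
import queue

class Throttle:
    def __init__(self, max_bytes):
        self.max, self.cur = max_bytes, 0
        self.cond = threading.Condition()

    def take(self, n):
        with self.cond:
            # the write thread blocks here while the journal is full
            while self.cur + n > self.max:
                self.cond.wait()
            self.cur += n

    def put(self, n):
        with self.cond:
            self.cur -= n
            self.cond.notify_all()

throttle = Throttle(max_bytes=100)
completions = queue.Queue()
written = []

def write_thread_entry(entries):
    for size in entries:
        throttle.take(size)        # may block waiting for completions
        written.append(size)
        completions.put(size)      # submit aio; completion reaped later

def aio_completion_thread(n):
    for _ in range(n):
        size = completions.get()   # reap one completed aio
        throttle.put(size)         # release the throttled bytes

entries = [60, 60, 60]             # each entry must wait for the previous
w = threading.Thread(target=write_thread_entry, args=(entries,))
c = threading.Thread(target=aio_completion_thread, args=(len(entries),))
w.start(); c.start(); w.join(); c.join()
```

With the completion thread running, all three entries go through and the throttle drains back to zero; remove it and the writer parks in take() for the second entry, which matches the observed symptom of requests trickling through at a glacial pace rather than failing outright.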
Updated by Josh Durgin about 11 years ago
Alex saw a slowdown in one of his xfstests runs against ceph master. Logs and a short description of the setup are on plana29 in /home/ubuntu/bug_4079
Updated by Sage Weil about 11 years ago
not this bug.
Also, facepalm: the reason I can't reproduce this is because we turned off journal aio by default until we fix this. You need to put journal aio = true in the config to trigger it.
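Per the comment above, reproducing this requires re-enabling aio journaling, since it is now off by default. In ceph.conf terms that would be something like:

```ini
[osd]
    journal aio = true
```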
Updated by Sage Weil about 11 years ago
- Status changed from In Progress to Resolved