Project

General

Profile

Bug #2316

rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"

Added by Oliver Francke almost 9 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
librbd
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Hi,

in my current test-setup all four VM's are started with rbd_cache parm. After all VM's are started and began their stress-testing, some of them got stale, no ping, no VNC, no communication to qemu-process at all. So I think, the new caching needs some more robustness in relation to qemu?
All I do on the node is some "kill -SIGTOP, -9 ..." to the ceph-osd process. And let enough time to settle them down again after restart.
What I recognized is, that the time from "stale+... degraded ( 50%)" to clean take way longer than before with ceph-0.44.

Even if all OSD are "active+clean;" again, no more response from the VM's, they need to be killed and next starts leads to fsck with errors and so on...
Logfiles attached.

Regards,

Oliver.

ceph.osd.0-stale.log.bz2 (24 MB) Oliver Francke, 04/19/2012 03:19 AM

ceph.osd.1-stale.log.bz2 (24.1 MB) Oliver Francke, 04/19/2012 03:19 AM

hung_vms.tar (56.6 MB) Josh Durgin, 05/07/2012 06:05 PM

906_test.log View (16.5 MB) Josh Durgin, 05/11/2012 09:03 PM


Related issues

Duplicated by Ceph - Bug #2393: objecter: dropping messages (old connection being used) Duplicate 05/09/2012

History

#1 Updated by Sage Weil over 8 years ago

  • Category set to librbd
  • Priority changed from Normal to High
  • Target version set to v0.47
  • Source changed from Development to Community (user)

#2 Updated by Josh Durgin over 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Josh Durgin

#3 Updated by Josh Durgin over 8 years ago

There were a couple bugs fixed that may have caused this problem. Could you try upgrading to the 'next' branch (at commit 9579365720818125a4b15741ae65e58948b9c69f currently)? I haven't been able to reproduce this problem in my environment, but I haven't tried without those bug fixes. I'm still running more tests to be safe though.

#4 Updated by Oliver Francke over 8 years ago

Hi Josh,

well, just allowed myself another time-window for some investigations. Do you have some pointers to the fixed bugs? Other then that, I get a:
--- 8-< ---
Cloning into ceph-object-corpus...
fatal: read error: Connection reset by peer
Clone of 'git://ceph.newdream.net/git/ceph-object-corpus.git' into submodule path 'ceph-object-corpus' failed
--- 8-< ---
might be silly me not being able to work with git & friends. Perhaps you can help me out there.
git describe gives me: v0.45-295-g9579365, which seems to be correct.

Kind regards,

Oliver.

#5 Updated by Josh Durgin over 8 years ago

There was a fix to librbd's handling of short reads that span objects (b94d6a6cf459478b27539eb8eccd30e19a67bbc5), as well as a couple fixes for replay of non-idempotent transactions (d1740bd586db80068fc0292223cf21911de66428 and 6910d83897fda5ad2d4dc1d4ce405a6ec95ee4e1).

If you run the following commands from your ceph checkout, you should be at the right version:

git fetch
git clean -fdx
git reset --hard origin/next
git submodule init
git submodule update

Then you should be able to ./autogen.sh && ./configure etc. as usual.

#6 Updated by Oliver Francke over 8 years ago

Hi Josh, ( sending this second time, "bad browser request..." grrr)

I had some spare-time over the weekend and decided to do some testing... To make sure everything is ok, I made three runs without being destructive. After that raised debug-levels, deleted everything and startet off.
Three of four session got stale, could not even establish a VNC-console to have a look what happend to the VM...
After that I was curious what happens, if I remove/create the testdisk-images...:
---
rbd rm data/908-testdisk.rbd
Removing image: 99% complete...failed.
delete error: image has snapshots - these must be deleted with 'rbd snap purge' before the image can be removed.2012-04-29 14:33:01.381680 7fc2b5baa760 -1 librbd: error removing header: (16) Device or resource busy

Error 0 removing data/908-testdisk.rbd
---

Error-pattern slightly changed, but no improve so far. sigh
Logfiles attached, hope it helps.

Kind regards,

Oliver.

P.S.: Got a "Bad request" couple of times... logfiles now on root@fcmsnode3:2316/...

#7 Updated by Oliver Francke over 8 years ago

Hi Josh, Sage,

as we are now proud to be mentioned on ceph.com, we should push things a bit, cause have to breathe some life back into our platform; serving what we are talking about in theory right now with a working setup ;)

Comments as always welcome,

Chris + Jens + myself.

#8 Updated by Josh Durgin over 8 years ago

The most obvious problem is with clients thinking their requests outstanding, when the osds think they've completed all requests.

ceph --admin-daemon /var/run/ceph/osd.1.asok dump_ops_in_flight
{ "num_ops": 0,
  "ops": []}
ceph --admin-daemon 907_root.asok objecter_requests | wc -l
1610

I'm attaching logs with client and osd debugging turned up. osd.1 is killed and restarted repeatedly, and this lead to it replaying some ops without sending a response to the client (i.e. op is applied, osd is restarted, replays op, never sends ack to client). One such op is tid 5400 from the 905_root device. There may be other things going on, but this is the most obvious cause of the stalls.

#9 Updated by Sage Weil over 8 years ago

op 5400 was resent in epoch 8565, but the osd never saw it arrive the second time around. looking at the other laggy ops, they seem to be in the same boat.

is it possible to reproduce this with debug ms = 20? maybe leaked refs in the osd preventing msgs from being read off the wire? or something else wonky in the messenger..

#10 Updated by Josh Durgin over 8 years ago

You're right that one of the requests was dropped, but it was just because osd.1 was killed, and it was the second message, not the third. osd.0 didn't reply to the third attempt because it hit this condition:

(m->get_map_epoch() < pg->info.history.same_primary_since)

This shows up in the log as:

grep -A 10 osd_op.*\:5400 ceph.osd.0.log | grep changed
2012-05-08 02:07:17.018693 7fa0cb604700  7 osd.0 8566 pg[14.3( v 8564'453375 (8464'452374,8564'453375] n=964 ec=1365 les/c 8564/8564 8566/8566/8566) [1,0] r=1 lpr=8566 lcod 8564'453374 inactive] changed after 8565, dropping
2012-05-08 02:07:17.018715 7fa0cb604700  7 osd.0 8566 pg[14.3( v 8564'453375 (8464'452374,8564'453375] n=964 ec=1365 les/c 8564/8564 8566/8566/8566) [1,0] r=1 lpr=8566 lcod 8564'453374 inactive] changed after 8565, dropping
2012-05-08 02:07:17.018735 7fa0cb604700  7 osd.0 8566 pg[14.3( v 8564'453375 (8464'452374,8564'453375] n=964 ec=1365 les/c 8564/8564 8566/8566/8566) [1,0] r=1 lpr=8566 lcod 8564'453374 inactive] changed after 8565, dropping

The client doesn't continue retrying because for some reason it doesn't get any maps after 8565, despite continuing to subscribe to them. Maybe there's a bug with having only one monitor?

Also, this is very reproducible in Oliver's environment. If we need more debugging, it can be done.

#11 Updated by Josh Durgin over 8 years ago

With more debugging, this looks like a problem with the messenger or the monitor client's use of it. After a fault, the messenger is failing the pipe and it is reaped, but no new pipe is opened for the connection. The connection object is still being used to submit map subscription requests, which are subsequently dropped because it's a lossy connection with no pipe.

Here's where it goes wrong:

2012-05-11 00:08:55.005962 7f1752ffd700 10 client.16053.objecter maybe_request_map subscribing (onetime) to next osd map
2012-05-11 00:08:55.005970 7f1752ffd700  1 -- 10.0.0.11:0/1028586 --> 10.0.0.11:6800/28268 -- ping v1 -- ?+0 0x7f1762dd18a0 con 0x7f1762da94d0
2012-05-11 00:08:55.005984 7f1752ffd700 20 -- 10.0.0.11:0/1028586 submit_message ping v1 remote 10.0.0.11:6800/28268
2012-05-11 00:08:55.006004 7f1758173700 10 -- 10.0.0.11:0/1028586 >> 10.0.0.11:6800/28268 pipe(0x7f1762da9260 sd=18 pgs=2 cs=1 l=1).writer: state = 2 policy.server=0
2012-05-11 00:08:55.006023 7f1758173700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.11:6800/28268 pipe(0x7f1762da9260 sd=18 pgs=2 cs=1 l=1).writer encoding 2678 0x7f1762dd18a0 pi
ng v1
2012-05-11 00:08:55.006032 7f1758173700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.11:6800/28268 pipe(0x7f1762da9260 sd=18 pgs=2 cs=1 l=1).writer sending 2678 0x7f1762dd18a0
2012-05-11 00:08:55.006040 7f1758173700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.11:6800/28268 pipe(0x7f1762da9260 sd=18 pgs=2 cs=1 l=1).write_message 0x7f1762dd18a0
2012-05-11 00:08:55.006091 7f1758173700 10 -- 10.0.0.11:0/1028586 >> 10.0.0.11:6800/28268 pipe(0x7f1762da9260 sd=18 pgs=2 cs=1 l=1).writer: state = 2 policy.server=0
2012-05-11 00:08:55.006098 7f1758173700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.11:6800/28268 pipe(0x7f1762da9260 sd=18 pgs=2 cs=1 l=1).writer sleeping
2012-05-11 00:08:55.207239 7f17517fa700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.11:6800/28268 pipe(0x7f1762da9260 sd=18 pgs=2 cs=1 l=1).reader got ACK
2012-05-11 00:08:55.207260 7f17517fa700 15 reader got ack seq 2678
2012-05-11 00:08:55.207277 7f17517fa700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.11:6800/28268 pipe(0x7f1762da9260 sd=18 pgs=2 cs=1 l=1).reader reading tag...
2012-05-11 00:08:55.589150 7f1758274700  2 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).reader couldn't read tag, Success
2012-05-11 00:08:55.589177 7f1758274700  2 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).fault 0: Success
2012-05-11 00:08:55.589198 7f1758274700 10 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).fault on lossy channel, failing
2012-05-11 00:08:55.589208 7f1758274700 10 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).fail
2012-05-11 00:08:55.589216 7f1758274700 10 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).stop
2012-05-11 00:08:55.589224 7f1758274700 10 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).discard_queue
2012-05-11 00:08:55.589239 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1). dequeued pipe 
2012-05-11 00:08:55.589247 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 164 to dispatch throttler 101148/104857600
2012-05-11 00:08:55.589262 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f174c044ba0
2012-05-11 00:08:55.589287 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 100984/104857600
2012-05-11 00:08:55.589292 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f174c0755e0
2012-05-11 00:08:55.589301 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 3628 to dispatch throttler 100964/104857600
2012-05-11 00:08:55.589306 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f174c3d36f0
2012-05-11 00:08:55.589326 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 97336/104857600
2012-05-11 00:08:55.589331 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f174c1aecb0
2012-05-11 00:08:55.589339 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 3628 to dispatch throttler 97316/104857600
2012-05-11 00:08:55.589344 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f174c3c9d80
2012-05-11 00:08:55.589362 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 93688/104857600
2012-05-11 00:08:55.589366 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f174c4beb90
2012-05-11 00:08:55.589391 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 3889 to dispatch throttler 93668/104857600
2012-05-11 00:08:55.589395 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f17547168a0
2012-05-11 00:08:55.589433 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 89779/104857600
2012-05-11 00:08:55.589438 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f175421cb20
2012-05-11 00:08:55.589444 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 5278 to dispatch throttler 89759/104857600
2012-05-11 00:08:55.589447 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f175488ad00
2012-05-11 00:08:55.589467 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 84481/104857600
2012-05-11 00:08:55.589470 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f175481caa0
2012-05-11 00:08:55.589477 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 6111 to dispatch throttler 84461/104857600
2012-05-11 00:08:55.589480 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f176328eb00
2012-05-11 00:08:55.589500 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 78350/104857600
2012-05-11 00:08:55.589502 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f1762f1d9b0
2012-05-11 00:08:55.589508 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 6251 to dispatch throttler 78330/104857600
2012-05-11 00:08:55.589511 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f1764076df0
2012-05-11 00:08:55.589531 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 72079/104857600
2012-05-11 00:08:55.589534 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f17638e5300
2012-05-11 00:08:55.589540 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 6251 to dispatch throttler 72059/104857600
2012-05-11 00:08:55.589543 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f17634a65b0
2012-05-11 00:08:55.589562 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 65808/104857600
2012-05-11 00:08:55.589565 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f17634ac690
2012-05-11 00:08:55.589571 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 6251 to dispatch throttler 65788/104857600
2012-05-11 00:08:55.589574 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f1762fc57e0
2012-05-11 00:08:55.589597 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 59537/104857600
2012-05-11 00:08:55.589600 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f1762f80f90
2012-05-11 00:08:55.589606 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 6251 to dispatch throttler 59517/104857600
2012-05-11 00:08:55.589609 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f17633c6910
2012-05-11 00:08:55.589632 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 53266/104857600
2012-05-11 00:08:55.589636 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f1762e920d0
2012-05-11 00:08:55.589642 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 7208 to dispatch throttler 53246/104857600
2012-05-11 00:08:55.589645 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f1763495490
2012-05-11 00:08:55.589665 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 46038/104857600
2012-05-11 00:08:55.589669 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f17633b8550
2012-05-11 00:08:55.589674 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 8041 to dispatch throttler 46018/104857600
2012-05-11 00:08:55.589678 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f17633b8990
2012-05-11 00:08:55.589704 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 37977/104857600
2012-05-11 00:08:55.589707 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f176321c780
2012-05-11 00:08:55.589717 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 8041 to dispatch throttler 37957/104857600
2012-05-11 00:08:55.589720 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f1763169160
2012-05-11 00:08:55.589743 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 29916/104857600
2012-05-11 00:08:55.589753 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f17631032e0
2012-05-11 00:08:55.589759 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 8041 to dispatch throttler 29896/104857600
2012-05-11 00:08:55.589762 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f1763329570
2012-05-11 00:08:55.589782 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 21855/104857600
2012-05-11 00:08:55.589785 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f17633f2e50
2012-05-11 00:08:55.589791 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 8041 to dispatch throttler 21835/104857600
2012-05-11 00:08:55.589793 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f1762ddaf60
2012-05-11 00:08:55.589809 7f1758274700 10 -- 10.0.0.11:0/1028586 dispatch_throttle_release 20 to dispatch throttler 13794/104857600
2012-05-11 00:08:55.589812 7f1758274700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).  discard 0x7f1762eb6260
2012-05-11 00:08:55.589825 7f1758274700 20 -- 10.0.0.11:0/1028586 >> :/0 pipe(0x7f1762d972e0 sd=-1 pgs=0 cs=0 l=0).queue_received queuing pipe
2012-05-11 00:08:55.589832 7f1758274700 10 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).reader done
2012-05-11 00:08:55.589897 7f17605ca700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).writer finishing
2012-05-11 00:08:55.589913 7f17605ca700 10 -- 10.0.0.11:0/1028586 queue_reap 0x7f1762d98ef0
2012-05-11 00:08:55.589922 7f17605ca700 10 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).writer done
2012-05-11 00:08:55.589958 7f1758a75700 10 -- 10.0.0.11:0/1028586 reaper
2012-05-11 00:08:55.589966 7f1758a75700 10 -- 10.0.0.11:0/1028586 reaper reaping pipe 0x7f1762d98ef0 10.0.0.10:6789/0
2012-05-11 00:08:55.589971 7f1758a75700 10 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).discard_queue
2012-05-11 00:08:55.589976 7f1758a75700 20 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1). dequeued pipe 
2012-05-11 00:08:55.589984 7f1758a75700 10 -- 10.0.0.11:0/1028586 >> 10.0.0.10:6789/0 pipe(0x7f1762d98ef0 sd=17 pgs=62 cs=1 l=1).unregister_pipe
2012-05-11 00:08:55.590006 7f1758a75700 10 -- 10.0.0.11:0/1028586 reaper reaped pipe 0x7f1762d98ef0 10.0.0.10:6789/0
2012-05-11 00:08:55.590022 7f1758a75700 10 -- 10.0.0.11:0/1028586 reaper deleted pipe 0x7f1762d98ef0
2012-05-11 00:08:55.590025 7f1758a75700 10 -- 10.0.0.11:0/1028586 reaper done
2012-05-11 00:08:57.991788 7f1753fff700  0 -- 10.0.0.11:0/1028586 send_message dropped message mon_subscribe({monmap=2+,osdmap=8634}) v2 because of no pipe on con 0x7f1762d99160
2012-05-11 00:08:57.991815 7f1753fff700  0 -- 10.0.0.11:0/1028586 send_keepalive con 0x7f1762d99160, no pipe.

#12 Updated by Josh Durgin over 8 years ago

So the connection with no pipe problem only happens when the monitor is restarted. The issue still happens otherwise though - the mon_subscribe_ack isn't being dispatched - it's received and read, but not actually transferred to the higher level dispatchers. Investigating more...

Log of this

2012-05-11 18:08:48.083494 7f98abfff700 10 monclient: tick
2012-05-11 18:08:48.083503 7f98abfff700 10 monclient: renew_subs
2012-05-11 18:08:48.083508 7f98abfff700 10 monclient: _send_mon_message to mon.0 at 10.0.0.10:6789/0
2012-05-11 18:08:48.083515 7f98abfff700  1 -- 10.0.0.11:0/2022646 --> 10.0.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=8771}) v2 -- ?+0 0x7f98c3df9450 con 0x7f98b400e3
90
2012-05-11 18:08:48.083526 7f98abfff700 20 -- 10.0.0.11:0/2022646 submit_message mon_subscribe({monmap=2+,osdmap=8771}) v2 remote 10.0.0.10:6789/0
2012-05-11 18:08:48.083539 7f98abfff700 20 -- 10.0.0.11:0/2022646 send_keepalive con 0x7f98b400e390, have pipe.
2012-05-11 18:08:48.083544 7f98b849e700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer: state = 2 policy.server=0
2012-05-11 18:08:48.083567 7f98b849e700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer encoding 96 0x7f98c3df9450 mon_sub
scribe({monmap=2+,osdmap=8771}) v2
2012-05-11 18:08:48.083591 7f98b849e700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer sending 96 0x7f98c3df9450
2012-05-11 18:08:48.083602 7f98b849e700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).write_message 0x7f98c3df9450
2012-05-11 18:08:48.083625 7f98b849e700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer: state = 2 policy.server=0
2012-05-11 18:08:48.083636 7f98b849e700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).write_keepalive
2012-05-11 18:08:48.083649 7f98b849e700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer: state = 2 policy.server=0
2012-05-11 18:08:48.083658 7f98b849e700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer sleeping
2012-05-11 18:08:48.084599 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader got ACK
2012-05-11 18:08:48.084627 7f98b839d700 15 reader got ack seq 96
2012-05-11 18:08:48.084630 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader reading tag...
2012-05-11 18:08:48.084640 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader got MSG
2012-05-11 18:08:48.084651 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader got envelope type=41 src mon.0 fro
nt=6939 data=0 off 0
2012-05-11 18:08:48.084661 7f98b839d700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader wants 6939 from dispatch throttler
 568491/104857600
2012-05-11 18:08:48.084680 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader got front 6939
2012-05-11 18:08:48.084696 7f98b839d700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).aborted = 0
2012-05-11 18:08:48.084704 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader got 6939 + 0 + 0 byte message
2012-05-11 18:08:48.084750 7f98b839d700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader got message 191 0x7f98c39606a0 osd
_map(8771..8796 src has 8296..8796) v3
2012-05-11 18:08:48.084765 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader reading tag...
2012-05-11 18:08:48.084767 7f98b849e700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer: state = 2 policy.server=0
2012-05-11 18:08:48.084777 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader got MSG
2012-05-11 18:08:48.084779 7f98b849e700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).write_ack 191
2012-05-11 18:08:48.084792 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader got envelope type=16 src mon.0 fro
nt=20 data=0 off 0
2012-05-11 18:08:48.084796 7f98b849e700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer: state = 2 policy.server=0
2012-05-11 18:08:48.084802 7f98b839d700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader wants 20 from dispatch throttler 5
75430/104857600
2012-05-11 18:08:48.084827 7f98b849e700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer sleeping
2012-05-11 18:08:48.084831 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader got front 20
2012-05-11 18:08:48.084839 7f98b839d700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).aborted = 0
2012-05-11 18:08:48.084846 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader got 20 + 0 + 0 byte message
2012-05-11 18:08:48.084855 7f98b839d700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader got message 192 0x7f98802b1860 mon
_subscribe_ack(300s) v1
2012-05-11 18:08:48.084863 7f98b839d700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).reader reading tag...
2012-05-11 18:08:48.084865 7f98b849e700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer: state = 2 policy.server=0
2012-05-11 18:08:48.084872 7f98b849e700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).write_ack 192
2012-05-11 18:08:48.084878 7f98b849e700 10 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer: state = 2 policy.server=0
2012-05-11 18:08:48.084884 7f98b849e700 20 -- 10.0.0.11:0/2022646 >> 10.0.0.10:6789/0 pipe(0x7f98b400e120 sd=25 pgs=26 cs=1 l=1).writer sleeping
2012-05-11 18:08:50.173867 7f98aaffd700 10 client.16418.objecter tick

#13 Updated by Josh Durgin over 8 years ago

The dispatch thread hasn't run since this point in the log:

2012-05-11 17:53:07.402591 7f98b17fa700 15 client.16418.objecter send_op 836 to osd.0
2012-05-11 17:53:07.402597 7f98b17fa700  1 -- 10.0.0.11:0/2022646 --> 10.0.0.11:6803/21968 -- osd_op(client.16418.0:836 rb.0.450.000000000109 [write 0~3304448] 14.ee44b5a
a RETRY) v4 -- ?+0 0x7f98b44748e0 con 0x7f98b44fee40
2012-05-11 17:53:07.402603 7f98b17fa700 20 -- 10.0.0.11:0/2022646 submit_message osd_op(client.16418.0:836 rb.0.450.000000000109 [write 0~3304448] 14.ee44b5aa RETRY) v4 r
emote 10.0.0.11:6803/21968
2012-05-11 17:53:07.402610 7f98b17fa700 15 client.16418.objecter send_linger 1
2012-05-11 17:53:07.402619 7f98b17fa700 10 client.16418.objecter recalc_op_target tid 0 pgid 14.dc4274ab acting [0]

#14 Updated by Josh Durgin over 8 years ago

Here's a log in which the dispatcher never runs again, despite the osd_map_ack and other messages being received.

Oliver, would it be possible to install ceph packages with debugging symbols?

#15 Updated by Oliver Francke over 8 years ago

Hi Josh,

should've been done, if I missedt s/t, stuff is in /usr/src...

Hope it helps,

Oliver.

#16 Updated by Josh Durgin over 8 years ago

  • Status changed from In Progress to 7

The deadlock was due to throttling of resent linger requests during map changes. wip-objecter-throttle should fix this.

The backtrace of the hung thread, for posterity:

Thread 6 (Thread 0x7fd78dffb700 (LWP 27691)):
#0  0x00007fd7a3ad22d4 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fd7a5023c09 in Wait (mutex=..., this=<optimized out>) at ./common/Cond.h:48
#2  _wait (c=<optimized out>, this=<optimized out>) at ./common/Throttle.h:46
#3  Throttle::get (this=0x7fd7a7db5a48, c=3096576, m=<optimized out>) at ./common/Throttle.h:89
#4  0x00007fd7a5024574 in Objecter::throttle_op (this=0x7fd7a7db5810, op=<optimized out>, op_budget=<optimized out>) at osdc/Objecter.cc:1142
#5  0x00007fd7a5024872 in take_op_budget (op=<optimized out>, this=<optimized out>) at osdc/Objecter.h:897
#6  Objecter::op_submit (this=0x7fd7a7db5810, op=0x7fd7903d0830, s=0x0) at osdc/Objecter.cc:845
#7  0x00007fd7a5012398 in Objecter::write (this=0x7fd7a7db5810, oid=..., oloc=..., off=941056, len=<optimized out>, snapc=..., bl=..., mtime=..., onack=0x7fd790580c60, 
    oncommit=0x7fd7908c5260, objver=0x7fd790b618d8, extra_ops=0x0, flags=<optimized out>) at ./osdc/Objecter.h:1197
#8  0x00007fd7a5012bec in librados::IoCtxImpl::aio_write (this=0x7fd79800e230, oid=..., c=<optimized out>, bl=..., len=<optimized out>, off=<optimized out>)
    at librados/IoCtxImpl.cc:755
#9  0x00007fd7a4fef712 in librados::IoCtx::aio_write (this=0x7fd798009738, oid=..., c=<optimized out>, bl=..., len=<optimized out>, off=<optimized out>)
    at librados/librados.cc:934
#10 0x00007fd7a54a5952 in LibrbdWriteback::write (this=0x7fd798009720, oid=..., oloc=..., off=941056, len=<optimized out>, snapc=..., bl=..., mtime=..., trunc_size=0, 
    trunc_seq=0, oncommit=0x7fd790779af0) at librbd/LibrbdWriteback.cc:81
#11 0x00007fd7a54aae84 in ObjectCacher::bh_write (this=0x7fd79800e390, bh=0x7fd767ed6b60) at osdc/ObjectCacher.cc:654
#12 0x00007fd7a54b377d in ObjectCacher::flusher_entry (this=0x7fd79800e390) at osdc/ObjectCacher.cc:1196
---Type <return> to continue, or q <return> to quit---
#13 0x00007fd7a54bdd2d in ObjectCacher::FlusherThread::entry() () from /usr/lib/librbd.so.1
#14 0x00007fd7a3acdb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fd7a381890d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()

#17 Updated by Josh Durgin over 8 years ago

Er, the important hung thread:

Thread 14 (Thread 0x7fd7957fa700 (LWP 27683)):
#0  0x00007fd7a3ad22d4 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fd7a5023c09 in Wait (mutex=..., this=<optimized out>) at ./common/Cond.h:48
#2  _wait (c=<optimized out>, this=<optimized out>) at ./common/Throttle.h:46
#3  Throttle::get (this=0x7fd7a7db5a48, c=0, m=<optimized out>) at ./common/Throttle.h:89
#4  0x00007fd7a5024574 in Objecter::throttle_op (this=0x7fd7a7db5810, op=<optimized out>, op_budget=<optimized out>) at osdc/Objecter.cc:1142
#5  0x00007fd7a5024872 in take_op_budget (op=<optimized out>, this=<optimized out>) at osdc/Objecter.h:897
#6  Objecter::op_submit (this=0x7fd7a7db5810, op=0x7fd7a7f9b0f0, s=0x7fd7a7f7dc40) at osdc/Objecter.cc:845
#7  0x00007fd7a50278b1 in Objecter::send_linger (this=<optimized out>, info=<optimized out>) at osdc/Objecter.cc:267
#8  0x00007fd7a502a1dd in Objecter::handle_osd_map (this=0x7fd7a7db5810, m=<optimized out>) at osdc/Objecter.cc:540
#9  0x00007fd7a4fff5ac in librados::RadosClient::_dispatch (this=0x7fd7a7db1650, m=0x80) at librados/RadosClient.cc:295
#10 0x00007fd7a5000b7b in librados::RadosClient::ms_dispatch (this=0x7fd7a7db1650, m=0x7fd7907d7600) at librados/RadosClient.cc:261
#11 0x00007fd7a510fb59 in ms_deliver_dispatch (m=<optimized out>, this=<optimized out>) at msg/Messenger.h:178
#12 SimpleMessenger::dispatch_entry (this=0x7fd7a7db49c0) at msg/SimpleMessenger.cc:363
#13 0x00007fd7a50038dd in SimpleMessenger::DispatchThread::entry (this=<optimized out>) at ./msg/SimpleMessenger.h:560
#14 0x00007fd7a3acdb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fd7a381890d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()

#18 Updated by Josh Durgin over 8 years ago

  • Status changed from 7 to Resolved

More than 100 runs with no problems, I'd say this is fixed. d7343814a01257a4f727fdfc752361b930ab5719

#19 Updated by Oliver Francke over 8 years ago

Hi Josh,

yeah, seems to be fixed. When do we have to expect it in a more "stable" release?

Thnx for all efforts,

Oliver.

#20 Updated by Josh Durgin over 8 years ago

Hi Oliver,

The fix will be in 0.47, which should be out in the next couple days. Glad to hear it's fixed!

Josh

Also available in: Atom PDF