Bug #8515
rbd ls or rbd info hangs forever on a specific client
Description
I'm observing something on a cluster which I can't explain and I think it's a bug in the OSD.
On a freshly installed client (deployed with Puppet) a "rbd ls" will block forever.
It seems somewhat related to #2454, but on closer inspection I ruled that out: in #2454 the client would go to the wrong OSD, and in this case it does not.
When running with --debug-rados=20 this is what I see:
2014-06-03 13:26:11.614320 7f876cd72780 1 librados: starting msgr at :/0
2014-06-03 13:26:11.614329 7f876cd72780 1 librados: starting objecter
2014-06-03 13:26:11.614488 7f876cd72780 1 librados: setting wanted keys
2014-06-03 13:26:11.614498 7f876cd72780 1 librados: calling monclient init
2014-06-03 13:26:11.619010 7f876cd72780 1 librados: init done
2014-06-03 13:26:11.619040 7f876cd72780 10 librados: wait_for_osdmap waiting
2014-06-03 13:26:11.621082 7f876cd72780 10 librados: wait_for_osdmap done waiting
2014-06-03 13:26:11.621135 7f876cd72780 10 librados: read oid=rbd_directory nspace=
2014-06-03 13:26:11.624063 7f876cd72780 10 librados: Objecter returned from read r=0
2014-06-03 13:26:11.624165 7f876cd72780 10 librados: call oid=rbd_directory nspace=
^C
As you can see, I have to interrupt the command since it never finishes.
Running with debug-objecter=20 I can see this happening:
2014-06-03 13:31:17.877171 7f1dbc046700 10 client.?.objecter ms_handle_connect 0x17af530
2014-06-03 13:31:17.877240 7f1dbc046700 10 client.?.objecter resend_mon_ops
2014-06-03 13:31:17.879852 7f1dc108b780 10 client.942971.objecter maybe_request_map subscribing (onetime) to next osd map
2014-06-03 13:31:17.881358 7f1dbc046700 3 client.942971.objecter handle_osd_map got epochs [25152,25152] > 0
2014-06-03 13:31:17.881375 7f1dbc046700 3 client.942971.objecter handle_osd_map decoding full epoch 25152
2014-06-03 13:31:17.881929 7f1dbc046700 20 client.942971.objecter dump_active .. 0 homeless
2014-06-03 13:31:17.882041 7f1dc108b780 10 client.942971.objecter recalc_op_target tid 1 pgid 20.30a98c1c acting [56,40,19]
2014-06-03 13:31:17.882161 7f1dc108b780 20 client.942971.objecter note: not requesting commit
2014-06-03 13:31:17.882167 7f1dc108b780 10 client.942971.objecter op_submit oid rbd_directory @20 [read 0~0] tid 1 osd.56
2014-06-03 13:31:17.882173 7f1dc108b780 15 client.942971.objecter send_op 1 to osd.56
2014-06-03 13:31:17.882189 7f1dc108b780 5 client.942971.objecter 1 unacked, 0 uncommitted
2014-06-03 13:31:17.882208 7f1dbc046700 3 client.942971.objecter handle_osd_map ignoring epochs [25152,25152] <= 25152
2014-06-03 13:31:17.882217 7f1dbc046700 20 client.942971.objecter dump_active ..
0 homeless
2014-06-03 13:31:17.882220 7f1dbc046700 20 client.942971.objecter 1 20.30a98c1c osd.56 rbd_directory [read 0~0]
2014-06-03 13:31:17.884183 7f1dbc046700 10 client.942971.objecter ms_handle_connect 0x17b23c0
2014-06-03 13:31:17.885587 7f1dbc046700 10 client.942971.objecter in handle_osd_op_reply
2014-06-03 13:31:17.885592 7f1dbc046700 7 client.942971.objecter handle_osd_op_reply 1 ondisk v 25152'296551 in 20.30a98c1c attempt 0
2014-06-03 13:31:17.885600 7f1dbc046700 10 client.942971.objecter op 0 rval 0 len 0
2014-06-03 13:31:17.885604 7f1dbc046700 15 client.942971.objecter handle_osd_op_reply ack
2014-06-03 13:31:17.885609 7f1dbc046700 15 client.942971.objecter handle_osd_op_reply completed tid 1
2014-06-03 13:31:17.885617 7f1dbc046700 15 client.942971.objecter finish_op 1
2014-06-03 13:31:17.885628 7f1dbc046700 5 client.942971.objecter 0 unacked, 0 uncommitted
2014-06-03 13:31:17.885715 7f1dc108b780 10 client.942971.objecter recalc_op_target tid 2 pgid 20.30a98c1c acting [56,40,19]
2014-06-03 13:31:17.885722 7f1dc108b780 20 client.942971.objecter note: not requesting commit
2014-06-03 13:31:17.885724 7f1dc108b780 10 client.942971.objecter op_submit oid rbd_directory @20 [call rbd.dir_list] tid 2 osd.56
2014-06-03 13:31:17.885730 7f1dc108b780 15 client.942971.objecter send_op 2 to osd.56
2014-06-03 13:31:17.885750 7f1dc108b780 5 client.942971.objecter 1 unacked, 0 uncommitted
2014-06-03 13:31:22.879952 7f1db9f41700 10 client.942971.objecter tick
2014-06-03 13:31:27.880154 7f1db9f41700 10 client.942971.objecter tick
2014-06-03 13:31:32.880300 7f1db9f41700 10 client.942971.objecter tick
2014-06-03 13:31:32.880326 7f1db9f41700 2 client.942971.objecter tid 2 on osd.56 is laggy
2014-06-03 13:31:32.880334 7f1db9f41700 10 client.942971.objecter maybe_request_map subscribing (onetime) to next osd map
2014-06-03 13:31:37.880590 7f1db9f41700 10 client.942971.objecter tick
2014-06-03 13:31:37.880620 7f1db9f41700 2 client.942971.objecter tid 2 on osd.56 is laggy
2014-06-03 13:31:37.880629 7f1db9f41700 10 client.942971.objecter maybe_request_map subscribing (onetime) to next osd map
This keeps going on: osd.56 stays laggy and the command never finishes.
Running this command on a different machine in the cluster works just fine. It's only this specific client which is not working.
On all clients I verified that they have the correct osdmap (e25152) and that rbd_directory is indeed on osd.56.
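For reference, the object-to-OSD mapping can be checked from any node with the stock ceph CLI. A sketch; the pool name 'rbd' here is an assumption, since only the pool id (20) appears in the logs:

```shell
# Ask the monitors where an object maps ('rbd' is a placeholder
# for the pool with id 20 seen in the objecter output).
ceph osd map rbd rbd_directory
# The output should name pg 20.30a98c1c with acting set [56,40,19],
# matching the recalc_op_target lines in the logs above.
```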
The whole cluster was upgraded from 0.67.7 to 0.67.9 to rule out a bug that had already been resolved, but that didn't fix the issue either.
When running on a different node you can see this:
2014-06-03 13:33:43.433939 7fe840c2d7c0 10 client.942986.objecter recalc_op_target tid 2 pgid 20.30a98c1c acting [56,40,19]
2014-06-03 13:33:43.433951 7fe840c2d7c0 20 client.942986.objecter note: not requesting commit
2014-06-03 13:33:43.433954 7fe840c2d7c0 10 client.942986.objecter op_submit oid rbd_directory @20 [call rbd.dir_list] tid 2 osd.56
2014-06-03 13:33:43.433962 7fe840c2d7c0 15 client.942986.objecter send_op 2 to osd.56
2014-06-03 13:33:43.433999 7fe840c2d7c0 5 client.942986.objecter 1 unacked, 0 uncommitted
2014-06-03 13:33:43.449012 7fe83b676700 10 client.942986.objecter in handle_osd_op_reply
2014-06-03 13:33:43.449016 7fe83b676700 7 client.942986.objecter handle_osd_op_reply 2 ondisk v 25152'296551 uv 296551 in 20.30a98c1c attempt 0
2014-06-03 13:33:43.449022 7fe83b676700 10 client.942986.objecter op 0 rval 0 len 37434
2014-06-03 13:33:43.449024 7fe83b676700 15 client.942986.objecter handle_osd_op_reply ack
2014-06-03 13:33:43.449027 7fe83b676700 15 client.942986.objecter handle_osd_op_reply completed tid 2
2014-06-03 13:33:43.449029 7fe83b676700 15 client.942986.objecter finish_op 2
2014-06-03 13:33:43.449035 7fe83b676700 5 client.942986.objecter 0 unacked, 0 uncommitted
...
...
2014-06-03 13:33:43.458260 7fe840c2d7c0 10 client.942986.objecter close_session for osd.56
Again, I verified the network (IPv6) is working just fine between all the nodes and clients in the cluster. It's HEALTH_OK and all I/O is working.
You would say this has to be a client-specific problem, but this client was freshly installed with Puppet and runs exactly the same version (0.67.7) as every other client on the network.
It's not only osd.56 that this client is having issues with. When running "rbd info" against the 2k RBD images in that pool, 9 of them block. In those cases it tries to talk to osd.35, which shows the same symptoms.
Related issues
History
#1 Updated by Wido den Hollander almost 10 years ago
What I forgot to add is that stopping osd.56 doesn't help either: osd.40 then becomes primary for that PG and the result is the same. The command never finishes against osd.40.
#2 Updated by Sage Weil almost 10 years ago
- Status changed from New to Need More Info
- Assignee set to Sage Weil
- Priority changed from Normal to High
- Source changed from other to Community (dev)
Can you confirm that the librados2, ceph-common, and librbd1 package versions all match?
If so, can you turn up logging on osd.52 and capture a log fragment with both the successful client and the blocked client?
thanks!
#3 Updated by Wido den Hollander almost 10 years ago
- File ceph-osd.56.log.gz added
Sage Weil wrote:
Can you confirm that the librados2, ceph-common, and librbd1 package versions all match?
Yes, I checked and they match.
On the client:
[root@n05:~]$ ceph --version
ceph version 0.67.9 (ba340a97c3dafc9155023da8d515eecc675c619a)
[root@n05:~]$ dpkg -l|grep librados
ii  librados2    0.67.9-1precise    RADOS distributed object store client library
[root@n05:~]$ dpkg -l|grep librbd
ii  librbd1      0.67.9-1precise    RADOS block device client library
[root@n05:~]$ dpkg -l|grep ceph-common
ii  ceph-common  0.67.9-1precise    common utilities to mount and interact with a ceph storage cluster
On the machine running osd.56:
[root@ceph10:~]$ ceph --version
ceph version 0.67.9 (ba340a97c3dafc9155023da8d515eecc675c619a)
[root@ceph10:~]$ dpkg -l|grep librados
ii  librados2    0.67.9-1precise    RADOS distributed object store client library
[root@ceph10:~]$ dpkg -l|grep librbd
ii  librbd1      0.67.9-1precise    RADOS block device client library
[root@ceph10:~]$ dpkg -l|grep ceph-common
ii  ceph-common  0.67.9-1precise    common utilities to mount and interact with a ceph storage cluster
If so, can you turn up logging on osd.52 and capture a log fragment with both the successful client and the blocked client?
It's osd.56, but that doesn't matter. I set debug_osd to 20 and captured both a successful and a blocked client. The log is attached.
The successful client is 'client.944775' and the blocked one is 'client.947030'.
I also tried stopping osd.56 and letting osd.40 become the primary. Same result: that client blocked.
We also set up a second client, freshly installed. Same issue.
thanks!
#4 Updated by Wido den Hollander almost 10 years ago
I also tried upgrading the client to Firefly 0.80.1, but that didn't change anything; the end result is the same.
#5 Updated by Sage Weil almost 10 years ago
Hrm, I'm not seeing what is going wrong here. Any chance you can repeat this with debug ms = 20 on the OSD and client side for the failure case?
Thanks!
#6 Updated by Wido den Hollander almost 10 years ago
- File ceph-osd.56-ms.log.gz added
- File ceph-client-ms.log View added
Sage Weil wrote:
Hrm, I'm not seeing what is going wrong here. Any chance you can repeat this with debug ms = 20 on the OSD and client side for the failure case?
I attached both logs. Grep for the IPv6 address ending in 6801:5 to see all the traffic of that client.
What I noticed is that the OSD seems to send the reply, but the client never receives it:
2014-06-04 13:47:31.225791 7fa4609a6700 20 -- [XXXX:200::6789:10]:6800/29479 >> [XXXX:200:0:2:6801:5]:0/1005152 pipe(0x2761c80 sd=560 :6800 s=2 pgs=2 cs=1 l=1 c=0xffc7000).writer encoding 2 features 4432406249471 0x1074ba00 osd_op_reply(2 rbd_directory [call rbd.dir_list] ondisk = 0) v4
2014-06-04 13:47:31.225977 7fa4609a6700 20 -- [XXXX:200::6789:10]:6800/29479 >> [XXXX:200:0:2:6801:5]:0/1005152 pipe(0x2761c80 sd=560 :6800 s=2 pgs=2 cs=1 l=1 c=0xffc7000).writer signed seq # 2): sig = 9946002230841854515
2014-06-04 13:47:31.226013 7fa4609a6700 20 -- [XXXX:200::6789:10]:6800/29479 >> [XXXX:200:0:2:6801:5]:0/1005152 pipe(0x2761c80 sd=560 :6800 s=2 pgs=2 cs=1 l=1 c=0xffc7000).writer sending 2 0x1074ba00
2014-06-04 13:47:31.226118 7fa4609a6700 10 -- [XXXX:200::6789:10]:6800/29479 >> [XXXX:200:0:2:6801:5]:0/1005152 pipe(0x2761c80 sd=560 :6800 s=2 pgs=2 cs=1 l=1 c=0xffc7000).writer: state = open policy.server=1
#7 Updated by Wido den Hollander almost 10 years ago
- Status changed from Need More Info to Rejected
So it turns out this was not a Ceph problem after all.
The cluster in this case runs over IPv6, and the network admin forgot to enable jumbo frames on the switch ports of these specific clients.
The Ceph nodes all run with an MTU of 9000, and so do the clients.
IPv6 does not allow fragmentation in transit; intermediate nodes behave as if the Do Not Fragment bit were always set, so the switches discarded the oversized packets instead of splitting them into smaller ones.
With IPv4 this setup would have worked, but the switches would have been very busy fragmenting all the packets.
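The failure mode can be sketched in a few lines of arithmetic (an illustration only, not Ceph code; the 1500-byte default port MTU and the header sizes are assumptions):

```python
# Why the large rbd.dir_list reply vanished: in IPv6 only the sender
# may fragment, so any packet bigger than a hop's MTU is dropped by
# the switch instead of being split up.

IPV6_HEADER = 40  # fixed IPv6 header size, bytes
TCP_HEADER = 20   # minimal TCP header, bytes

def fits_in_path(payload: int, path_mtu: int) -> bool:
    """Return True if a single packet carrying `payload` bytes
    survives a hop whose MTU is `path_mtu`."""
    return IPV6_HEADER + TCP_HEADER + payload <= path_mtu

# Ceph nodes and clients use jumbo frames (MTU 9000), but the client's
# switch port was left at the common 1500-byte default.
print(fits_in_path(100, 1500))   # small 'read' reply: delivered
print(fits_in_path(8900, 1500))  # jumbo-sized segment of the 37434-byte
                                 # dir_list reply: silently dropped
```

This also matches the symptom: small replies (the initial read of rbd_directory) got through, while only the large class-method reply never arrived.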