
Bug #8515

rbd ls or rbd info hangs forever on a specific client

Added by Wido den Hollander about 5 years ago. Updated about 5 years ago.

Status:
Rejected
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
06/03/2014
Due date:
% Done:

0%

Spent time:
Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

I'm observing something on a cluster which I can't explain, and I think it's a bug in the OSD.

On a freshly installed client (deployed with Puppet) a "rbd ls" blocks forever.

It seems somewhat related to #2454, but on closer inspection I ruled that out.

In #2454 the client would go to the wrong OSD, but in this case it does not.

When running with --debug-rados=20 this is what I see:

2014-06-03 13:26:11.614320 7f876cd72780  1 librados: starting msgr at :/0
2014-06-03 13:26:11.614329 7f876cd72780  1 librados: starting objecter
2014-06-03 13:26:11.614488 7f876cd72780  1 librados: setting wanted keys
2014-06-03 13:26:11.614498 7f876cd72780  1 librados: calling monclient init
2014-06-03 13:26:11.619010 7f876cd72780  1 librados: init done
2014-06-03 13:26:11.619040 7f876cd72780 10 librados: wait_for_osdmap waiting
2014-06-03 13:26:11.621082 7f876cd72780 10 librados: wait_for_osdmap done waiting
2014-06-03 13:26:11.621135 7f876cd72780 10 librados: read oid=rbd_directory nspace=
2014-06-03 13:26:11.624063 7f876cd72780 10 librados: Objecter returned from read r=0
2014-06-03 13:26:11.624165 7f876cd72780 10 librados: call oid=rbd_directory nspace=
^C

As you can see, I have to interrupt the command since it never finishes.

Running with debug-objecter=20 I can see this happening:

2014-06-03 13:31:17.877171 7f1dbc046700 10 client.?.objecter ms_handle_connect 0x17af530
2014-06-03 13:31:17.877240 7f1dbc046700 10 client.?.objecter resend_mon_ops
2014-06-03 13:31:17.879852 7f1dc108b780 10 client.942971.objecter maybe_request_map subscribing (onetime) to next osd map
2014-06-03 13:31:17.881358 7f1dbc046700  3 client.942971.objecter handle_osd_map got epochs [25152,25152] > 0
2014-06-03 13:31:17.881375 7f1dbc046700  3 client.942971.objecter handle_osd_map decoding full epoch 25152
2014-06-03 13:31:17.881929 7f1dbc046700 20 client.942971.objecter dump_active .. 0 homeless
2014-06-03 13:31:17.882041 7f1dc108b780 10 client.942971.objecter recalc_op_target tid 1 pgid 20.30a98c1c acting [56,40,19]
2014-06-03 13:31:17.882161 7f1dc108b780 20 client.942971.objecter  note: not requesting commit
2014-06-03 13:31:17.882167 7f1dc108b780 10 client.942971.objecter op_submit oid rbd_directory @20 [read 0~0] tid 1 osd.56
2014-06-03 13:31:17.882173 7f1dc108b780 15 client.942971.objecter send_op 1 to osd.56
2014-06-03 13:31:17.882189 7f1dc108b780  5 client.942971.objecter 1 unacked, 0 uncommitted
2014-06-03 13:31:17.882208 7f1dbc046700  3 client.942971.objecter handle_osd_map ignoring epochs [25152,25152] <= 25152
2014-06-03 13:31:17.882217 7f1dbc046700 20 client.942971.objecter dump_active .. 0 homeless
2014-06-03 13:31:17.882220 7f1dbc046700 20 client.942971.objecter 1    20.30a98c1c    osd.56    rbd_directory    [read 0~0]
2014-06-03 13:31:17.884183 7f1dbc046700 10 client.942971.objecter ms_handle_connect 0x17b23c0
2014-06-03 13:31:17.885587 7f1dbc046700 10 client.942971.objecter in handle_osd_op_reply
2014-06-03 13:31:17.885592 7f1dbc046700  7 client.942971.objecter handle_osd_op_reply 1 ondisk v 25152'296551 in 20.30a98c1c attempt 0
2014-06-03 13:31:17.885600 7f1dbc046700 10 client.942971.objecter  op 0 rval 0 len 0
2014-06-03 13:31:17.885604 7f1dbc046700 15 client.942971.objecter handle_osd_op_reply ack
2014-06-03 13:31:17.885609 7f1dbc046700 15 client.942971.objecter handle_osd_op_reply completed tid 1
2014-06-03 13:31:17.885617 7f1dbc046700 15 client.942971.objecter finish_op 1
2014-06-03 13:31:17.885628 7f1dbc046700  5 client.942971.objecter 0 unacked, 0 uncommitted
2014-06-03 13:31:17.885715 7f1dc108b780 10 client.942971.objecter recalc_op_target tid 2 pgid 20.30a98c1c acting [56,40,19]
2014-06-03 13:31:17.885722 7f1dc108b780 20 client.942971.objecter  note: not requesting commit
2014-06-03 13:31:17.885724 7f1dc108b780 10 client.942971.objecter op_submit oid rbd_directory @20 [call rbd.dir_list] tid 2 osd.56
2014-06-03 13:31:17.885730 7f1dc108b780 15 client.942971.objecter send_op 2 to osd.56
2014-06-03 13:31:17.885750 7f1dc108b780  5 client.942971.objecter 1 unacked, 0 uncommitted
2014-06-03 13:31:22.879952 7f1db9f41700 10 client.942971.objecter tick
2014-06-03 13:31:27.880154 7f1db9f41700 10 client.942971.objecter tick
2014-06-03 13:31:32.880300 7f1db9f41700 10 client.942971.objecter tick
2014-06-03 13:31:32.880326 7f1db9f41700  2 client.942971.objecter  tid 2 on osd.56 is laggy
2014-06-03 13:31:32.880334 7f1db9f41700 10 client.942971.objecter maybe_request_map subscribing (onetime) to next osd map
2014-06-03 13:31:37.880590 7f1db9f41700 10 client.942971.objecter tick
2014-06-03 13:31:37.880620 7f1db9f41700  2 client.942971.objecter  tid 2 on osd.56 is laggy
2014-06-03 13:31:37.880629 7f1db9f41700 10 client.942971.objecter maybe_request_map subscribing (onetime) to next osd map

This keeps going on: osd.56 stays laggy and the command never finishes.
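Both debug runs above used one-shot command-line flags; the same levels can be made persistent in the client's ceph.conf instead (a minimal sketch; the option names mirror the --debug-* flags used above):

```ini
# Equivalent persistent settings in the client's ceph.conf
# (a sketch; same subsystems as the one-shot --debug-* flags above)
[client]
    debug rados = 20
    debug objecter = 20
```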

Running this command on a different machine in the cluster works just fine. It's only this specific client which is not working.

On all clients I verified that they get the correct osdmap (e25152) and that rbd_directory indeed lives on osd.56.

The whole cluster was upgraded from 0.67.7 to 0.67.9 to rule out a bug that had already been fixed, but that didn't resolve the issue either.

When running on a different node you can see this:

2014-06-03 13:33:43.433939 7fe840c2d7c0 10 client.942986.objecter recalc_op_target tid 2 pgid 20.30a98c1c acting [56,40,19]
2014-06-03 13:33:43.433951 7fe840c2d7c0 20 client.942986.objecter  note: not requesting commit
2014-06-03 13:33:43.433954 7fe840c2d7c0 10 client.942986.objecter op_submit oid rbd_directory @20 [call rbd.dir_list] tid 2 osd.56
2014-06-03 13:33:43.433962 7fe840c2d7c0 15 client.942986.objecter send_op 2 to osd.56
2014-06-03 13:33:43.433999 7fe840c2d7c0  5 client.942986.objecter 1 unacked, 0 uncommitted
2014-06-03 13:33:43.449012 7fe83b676700 10 client.942986.objecter in handle_osd_op_reply
2014-06-03 13:33:43.449016 7fe83b676700  7 client.942986.objecter handle_osd_op_reply 2 ondisk v 25152'296551 uv 296551 in 20.30a98c1c attempt 0
2014-06-03 13:33:43.449022 7fe83b676700 10 client.942986.objecter  op 0 rval 0 len 37434
2014-06-03 13:33:43.449024 7fe83b676700 15 client.942986.objecter handle_osd_op_reply ack
2014-06-03 13:33:43.449027 7fe83b676700 15 client.942986.objecter handle_osd_op_reply completed tid 2
2014-06-03 13:33:43.449029 7fe83b676700 15 client.942986.objecter finish_op 2
2014-06-03 13:33:43.449035 7fe83b676700  5 client.942986.objecter 0 unacked, 0 uncommitted
...
...
2014-06-03 13:33:43.458260 7fe840c2d7c0 10 client.942986.objecter close_session for osd.56

Again, I verified that the (IPv6) network is working just fine between all nodes and clients in the cluster. The cluster is HEALTH_OK and all I/O is working.

You would think this has to be a client-specific problem, but this client was freshly installed with Puppet and runs exactly the same version (0.67.7) as every other client in the network.

It's not only osd.56 that this client has issues with. When running "rbd info" against the roughly 2k RBD images in that pool, 9 of them block.

In this case it tries to talk to osd.35, which is showing similar issues.

ceph-osd.56.log.gz (104 KB) Wido den Hollander, 06/03/2014 02:50 PM

ceph-osd.56-ms.log.gz (380 KB) Wido den Hollander, 06/04/2014 04:56 AM

ceph-client-ms.log (76.6 KB) Wido den Hollander, 06/04/2014 04:56 AM


Related issues

Related to Ceph - Bug #2454: "rbd info xyz" hanging forever sometimes Resolved 05/20/2012

History

#1 Updated by Wido den Hollander about 5 years ago

What I forgot to add is that stopping osd.56 doesn't help either.

osd.40 then becomes primary for that PG and the result is the same: the command never finishes against osd.40.

#2 Updated by Sage Weil about 5 years ago

  • Status changed from New to Need More Info
  • Assignee set to Sage Weil
  • Priority changed from Normal to High
  • Source changed from other to Community (dev)

can you confirm that librados2 and ceph-common and librbd1 package versions all match?

if so, can you turn up logging on osd.52 and capture a log fragment with both the successful client and the blocked client?

thanks!

#3 Updated by Wido den Hollander about 5 years ago

Sage Weil wrote:

can you confirm that librados2 and ceph-common and librbd1 package versions all match?

Yes, I did and they match:

On the client:

[root@n05:~]$ ceph --version
ceph version 0.67.9 (ba340a97c3dafc9155023da8d515eecc675c619a)
[root@n05:~]$ dpkg -l|grep librados
ii  librados2                        0.67.9-1precise                            RADOS distributed object store client library
[root@n05:~]$ dpkg -l|grep librbd
ii  librbd1                          0.67.9-1precise                            RADOS block device client library
[root@n05:~]$ dpkg -l|grep ceph-common
ii  ceph-common                      0.67.9-1precise                            common utilities to mount and interact with a ceph storage cluster
[root@n05:~]$

On the machine running osd.56:

[root@ceph10:~]$ ceph --version
ceph version 0.67.9 (ba340a97c3dafc9155023da8d515eecc675c619a)
[root@ceph10:~]$ dpkg -l|grep librados
ii  librados2                        0.67.9-1precise                   RADOS distributed object store client library
[root@ceph10:~]$ dpkg -l|grep librbd
ii  librbd1                          0.67.9-1precise                   RADOS block device client library
[root@ceph10:~]$ dpkg -l|grep ceph-common
ii  ceph-common                      0.67.9-1precise                   common utilities to mount and interact with a ceph storage cluster
[root@ceph10:~]$

if so, can you turn up logging on osd.52 and capture a log fragment with both the successful client and the blocked client?

It's osd.56, but that doesn't matter. I set debug_osd to 20 and tested with both a successful and a blocked client. The log is attached.

The successful client is 'client.944775' and the blocked one is 'client.947030'.

I also tried to stop osd.56 and let osd.40 become the primary. Same result, that client blocked.

We also set up a second client, freshly installed. Same issue.


#4 Updated by Wido den Hollander about 5 years ago

I also tried upgrading the client to Firefly 0.80.1, but that didn't change anything. The end result is the same.

#5 Updated by Sage Weil about 5 years ago

Hrm, I'm not seeing what is going wrong here. Any chance you can repeat this with debug ms = 20 on the OSD and client side for the failure case?

Thanks!

#6 Updated by Wido den Hollander about 5 years ago

Sage Weil wrote:

Hrm, I'm not seeing what is going wrong here. Any chance you can repeat this with debug ms = 20 on the OSD and client side for the failure case?

I attached both logs. Grep for the IPv6 address ending in 6801:5 to see all the traffic of that client.

What I noticed is that the OSD seems to send the reply, but the client never receives it?

2014-06-04 13:47:31.225791 7fa4609a6700 20 -- [XXXX:200::6789:10]:6800/29479 >> [XXXX:200:0:2:6801:5]:0/1005152 pipe(0x2761c80 sd=560 :6800 s=2 pgs=2 cs=1 l=1 c=0xffc7000).writer encoding 2 features 4432406249471 0x1074ba00 osd_op_reply(2 rbd_directory [call rbd.dir_list] ondisk = 0) v4
2014-06-04 13:47:31.225977 7fa4609a6700 20 -- [XXXX:200::6789:10]:6800/29479 >> [XXXX:200:0:2:6801:5]:0/1005152 pipe(0x2761c80 sd=560 :6800 s=2 pgs=2 cs=1 l=1 c=0xffc7000).writer signed seq # 2): sig = 9946002230841854515
2014-06-04 13:47:31.226013 7fa4609a6700 20 -- [XXXX:200::6789:10]:6800/29479 >> [XXXX:200:0:2:6801:5]:0/1005152 pipe(0x2761c80 sd=560 :6800 s=2 pgs=2 cs=1 l=1 c=0xffc7000).writer sending 2 0x1074ba00
2014-06-04 13:47:31.226118 7fa4609a6700 10 -- [XXXX:200::6789:10]:6800/29479 >> [XXXX:200:0:2:6801:5]:0/1005152 pipe(0x2761c80 sd=560 :6800 s=2 pgs=2 cs=1 l=1 c=0xffc7000).writer: state = open policy.server=1

#7 Updated by Wido den Hollander about 5 years ago

  • Status changed from Need More Info to Rejected

So it turns out this was not a Ceph problem after all.

The cluster in this case runs over IPv6, and the network admin forgot to enable Jumbo Frames on the switch ports of these specific clients.

The Ceph nodes all run with an MTU of 9000, as do the clients.

IPv6 packets cannot be fragmented in transit (unlike IPv4, there is no fragmentation by intermediate hops), so the switches discarded the oversized packets instead of splitting them into smaller ones.

With IPv4 this setup would have worked, but the switches would have been very busy fragmenting all the packets.
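The numbers in the logs above line up with this explanation: the rbd.dir_list reply that never arrived carried a 37,434-byte payload (the len reported by the working client), while the tid 1 read that did succeed returned len 0. A minimal sketch of the arithmetic, assuming standard IPv6/TCP header sizes and a non-jumbo switch-port MTU of 1500 (the 1500 figure is an assumption, not from the ticket):

```python
# Why the large rbd.dir_list reply was dropped while the tiny tid 1
# read got through. Assumptions: minimal IPv6/TCP headers, and switch
# ports still at a 1500-byte default MTU (hypothetical figure).

IPV6_HEADER = 40   # fixed IPv6 header, bytes
TCP_HEADER = 20    # minimal TCP header, bytes

def tcp_payload_per_frame(mtu):
    """Maximum TCP payload one frame can carry at the given IP MTU."""
    return mtu - IPV6_HEADER - TCP_HEADER

reply_len = 37434  # osd_op_reply payload length from the client log

# Both endpoints have MTU 9000, so TCP negotiates ~8.9 KB segments:
mss = tcp_payload_per_frame(9000)    # 8940 bytes per segment
segments = -(-reply_len // mss)      # ceil division: 5 segments

# Each of those segments exceeds a 1500-byte switch-port MTU, and IPv6
# is never fragmented in transit, so the frames are silently discarded
# and the client retries forever. A reply small enough to fit in one
# small frame (like the len 0 read in tid 1) is unaffected.
assert mss > tcp_payload_per_frame(1500)
print(mss, segments)  # 8940 5
```

This also explains why the failover in comment #1 made no difference: the bottleneck was the client's switch port, not any particular OSD.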
