
Bug #17573

librados doesn't properly report failure to create socket

Added by Ben England over 3 years ago. Updated over 3 years ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

In a really large Jewel cluster with > 1000 OSDs, I see rados bench hang on a pool with the command below. It starts off fine, but then it reports an error saying it couldn't create a socket: too many open files. I'm guessing it is trying to open a socket to an OSD. If I run it with "ulimit -n 131072", this doesn't happen. My point is that rados bench should exit when it gets this error rather than hang, so that CBT doesn't hang and we can reach some kind of conclusion about the health of the cluster.

Is this a problem bigger than the rados utility? If so, it could impact Ceph scalability with OSD count.

To reproduce in a smaller cluster, you might have to lower the open file descriptor limit drastically.
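The failure mode itself is easy to reproduce outside Ceph. A minimal Python sketch (an illustration I'm adding, not from the original report; it assumes a Linux host, where socket creation fails with errno 24, EMFILE, once the per-process fd limit is exhausted):

```python
import errno
import resource
import socket

def exhaust_sockets(limit=64):
    """Lower the soft fd limit, then open sockets until creation fails.

    Returns the errno of the failure; on Linux this is EMFILE (24),
    the same "Too many open files" error shown in the log below.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    socks = []
    try:
        for _ in range(limit + 1):
            socks.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    except OSError as e:
        return e.errno
    finally:
        for s in socks:
            s.close()
        # Restore the original soft limit (raising soft back up to the
        # original value is always allowed as long as it is <= hard).
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return 0

print(exhaust_sockets())  # prints 24 (errno.EMFILE) on Linux
```

The same effect is what `ulimit -n 1024` achieves for the rados bench run shown below, just at a scale where > 1000 OSD connections overrun the default limit.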

workload:
/usr/bin/rados -c /etc/ceph/ceph.conf -p rados-bench-cbt bench -b 4194304 30 write --concurrent-ios 16 --run-name `hostname -s`-0 --no-cleanup
-=>>ulimit -n 1024 ; /usr/bin/rados -c /etc/ceph/ceph.conf -p rados-bench-cbt bench -b 4194304 30 write --concurrent-ios 16 --run-name `hostname -s`-0 --no-cleanup
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 30 seconds or 0 objects
Object prefix: benchmark_data_c02-h17-r620.rdu.openstack.en_48662
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 287 271 1083.94 1084 0.0389778 0.0567504
2 16 567 551 1101.88 1120 0.110503 0.0570464
3 16 846 830 1106.53 1116 0.0659647 0.057096
4 16 1127 1111 1110.86 1124 0.0575751 0.0570865
5 16 1407 1391 1112.65 1120 0.0415876 0.0571044
6 16 1687 1671 1113.84 1120 0.0728675 0.0569945
7 16 1969 1953 1115.84 1128 0.0406718 0.0570041
8 16 2250 2234 1116.84 1124 0.0374575 0.0570422
9 16 2529 2513 1116.73 1116 0.0466408 0.0570168
10 16 2813 2797 1118.64 1136 0.0933425 0.0570534
11 16 3090 3074 1117.66 1108 0.0443647 0.0570197
12 16 3370 3354 1117.84 1120 0.0398344 0.0570368
13 16 3655 3639 1119.53 1140 0.0525464 0.0570459
14 16 3934 3918 1119.27 1116 0.0366038 0.0570136
15 16 4215 4199 1119.58 1124 0.0470502 0.0570443
16 16 4496 4480 1119.84 1124 0.0656467 0.0570273
2016-10-13 18:17:35.702165 7f893ae34700 -1 - 172.18.66.49:0/1757557866 >> 172.18.64.232:6904/113736 pipe(0x7f8cda701810 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8cda43f870).connect couldn't created socket (24) Too many open files
2016-10-13 18:17:35.702299 7f893ae34700 0 -- 172.18.66.49:0/1757557866 >> 172.18.64.232:6904/113736 pipe(0x7f8cda701810 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8cda43f870).fault
2016-10-13 18:17:35.702393 7f893ae34700 1 - 172.18.66.49:0/1757557866 >> 172.18.64.232:6904/113736 pipe(0x7f8cda701810 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8cda43f870).connect couldn't created socket (24) Too many open files
2016-10-13 18:17:35.725879 7f893ad33700 1 - 172.18.66.49:0/1757557866 >> 172.18.65.170:6928/218835 pipe(0x7f8cda7044c0 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8cda6f2990).connect couldn't created socket (24) Too many open files
2016-10-13 18:17:35.725962 7f893ad33700 0 -- 172.18.66.49:0/1757557866 >> 172.18.65.170:6928/218835 pipe(0x7f8cda7044c0 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8cda6f2990).fault

There is just one rados bench process running, nothing else going on in the cluster.

-=>>ceph -s
cluster 6942aa60-f4c4-4954-88d7-c67e0d0a0df5
health HEALTH_OK
monmap e1: 5 mons at {c04-h33-6018r=172.18.65.121:6789/0,c05-h33-6018r=172.18.65.118:6789/0,c06-h29-6018r=172.18.64.251:6789/0,c07-h29-6018r=172.18.64.226:6789/0,c07-h30-6018r=172.18.64.238:6789/0}
election epoch 6, quorum 0,1,2,3,4 c07-h29-6018r,c07-h30-6018r,c06-h29-6018r,c05-h33-6018r,c04-h33-6018r
fsmap e6: 1/1/1 up {0=c07-h29-6018r=up:active}
osdmap e2149: 1043 osds: 1043 up, 1043 in
flags sortbitwise
pgmap v7937: 27752 pgs, 10 pools, 241 GB data, 61964 objects
1141 GB used, 1895 TB / 1896 TB avail
27752 active+clean

pool 22 'rados-bench-cbt' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 16384 pgp_num 16384 last_change 2149 flags hashpspool stripe_width 0


Related issues

Duplicates Ceph - Feature #15308: LIBRADOS modify Pipe::connect() to return the error code Resolved 06/15/2016

History

#1 Updated by Ben England over 3 years ago

This is bigger than rados bench; it happens with librbd too. I ran into the same problem with fio using the rbd engine, first with the CBT librbdfio benchmark and then standalone. So I think it's librbd, not just rados bench.

(root@c05-h21-6048r) - (21:11) - ()
-=>>ulimit -n 512
(root@c05-h21-6048r) - (21:12) - ()
-=>>/usr/local/bin/fio --ioengine=rbd --clientname=admin --pool=cbt-librbdfio --rbdname=cbt-librbdfio-`hostname -s`-7 --invalidate=0 --rw=write --numjobs=1 --bs=4M --size 16384M --name=librbdfio-`hostname -s`-0
librbdfio-c05-h21-6048r-0: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=rbd, iodepth=1
fio-2.14-27-gafd2c
Starting 1 process
rbd engine: RBD version: 0.1.10
Jobs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 03m:09s]

Jobs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 03m:39s]
Jobs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 03m:44s]
Jobs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 03m:49s]
Jobs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 03m:54s]
Jobs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 03m:59s]
Jobs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 04m:14s]
^Cbs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 04m:19s]
fio: terminating on signal 2
^Cbs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 04m:34s]
fio: terminating on signal 2
^Cbs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 04m:39s]
fio: terminating on signal 2
^Zbs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 04m:44s]
[1]+ Stopped /usr/local/bin/fio --ioengine=rbd --clientname=admin --pool=cbt-librbdfio --rbdname=cbt-librbdfio-`hostname -s`-7 --invalidate=0 --rw=write --numjobs=1 --bs=4M --size 16384M --name=librbdfio-`hostname -s`-0
(root@c05-h21-6048r) - (21:13) - (~)
-=>>kill %1

fio: terminating on signal 15
(root@c05-h21-6048r) - (21:13) - ()0KB/0KB/0KB /s] [0/0/0 iops] [eta 04m:59s]
-=>>
(root@c05-h21-6048r) - (21:13) - ()
Jobs: 1 (f=1): [W(1)] [16.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 05m:34s]
(root@c05-h21-6048r) - (21:13) - ()
-=>>
[1]+ Killed /usr/local/bin/fio --ioengine=rbd --clientname=admin --pool=cbt-librbdfio --rbdname=cbt-librbdfio-`hostname -s`-7 --invalidate=0 --rw=write --numjobs=1 --bs=4M --size 16384M --name=librbdfio-`hostname -s`-0
(root@c05-h21-6048r) - (21:13) - ()

#2 Updated by Ben England over 3 years ago

To be clear, the problem isn't that librbd gets the error creating a socket (caused in some cases by failure to allocate enough file descriptors). The problem is that librbd does not return a failure to the application, so the sysadmin never learns there is a problem, and the user experiences a hang instead of a clear indication that something is wrong in the RBD I/O path.
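Until the library propagates the error, the behavior being asked for here can be approximated on the caller's side. A minimal sketch (my addition, not part of librbd or CBT; `run_with_timeout` is a hypothetical helper) that turns an indefinite hang into a detectable failure:

```python
import concurrent.futures

def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run a potentially-hanging I/O call in a worker thread and raise
    concurrent.futures.TimeoutError instead of blocking the harness forever."""
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    fut = ex.submit(fn, *args, **kwargs)
    try:
        return fut.result(timeout=timeout_s)
    finally:
        # Don't wait for a wedged worker thread; let the caller decide
        # whether to abort the whole benchmark run.
        ex.shutdown(wait=False)
```

A harness that wraps each rados/rbd call this way can report a failed run instead of hanging, which is what a tool like CBT needs to reach a conclusion about cluster health.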

#3 Updated by Ben England over 3 years ago

Here is an article from the Red Hat Customer Portal; this problem is apparently not new. One of the symptoms there is different (mkfs.xfs run inside a guest on a Ceph-backed Cinder volume hangs), but it's the same underlying issue.

https://access.redhat.com/solutions/1602683

#4 Updated by David Galloway over 3 years ago

  • Project changed from www.ceph.com to Ceph

#5 Updated by Jason Dillaman over 3 years ago

rados bench doesn't use librbd -- the "problem" is at a lower level, within librados. We really don't want VMs to crash when they hit a file descriptor limit, which would be the end result of what you are proposing. From an RBD point of view, we do similar things: IO will hang if the cluster is unhealthy, the cluster is full, the pool is over its quota, etc.

#6 Updated by Ian Colle over 3 years ago

  • Subject changed from rados bench hangs because of failure to create socket to librados doesn't properly report failure to create socket
  • Assignee set to Josh Durgin

#7 Updated by Josh Durgin over 3 years ago

  • Status changed from New to Duplicate

This is a non-issue with the async messenger, since it uses a fixed number of fds. In jewel this is a well-known issue that requires high fd limit settings. The error reporting has already been improved in #15308 (and already backported), so I'm closing this one as a dup.
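For clients still on the simple messenger in jewel, the high-fd-limit mitigation can also be applied from inside the client process rather than via `ulimit`. A hedged sketch (my addition; the 131072 figure comes from the description above):

```python
import resource

def ensure_fd_limit(minimum=131072):
    """Raise the soft RLIMIT_NOFILE as close to `minimum` as the hard
    limit allows, and return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < minimum:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_NOFILE, (min(minimum, hard), hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling something like this before connecting the librados client is equivalent to the `ulimit -n 131072` workaround noted in the description, without depending on the invoking shell.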

#8 Updated by Josh Durgin over 3 years ago

  • Duplicates Feature #15308: LIBRADOS modify Pipe::connect() to return the error code added
