Bug #3491

closed

test_librbd_fsx: too many open files

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is probably something running away in the msgr, but here is the failing run:

ubuntu@teuthology:/a/teuthology-2012-11-13_19:00:04-regression-next-testing-basic/15008$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 22cddde104d715600a4c218bf9224923208afe90
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 7926ef53935313501d4a7fe0e587f3e3b00b313c
  s3tests:
    branch: next
  workunit:
    sha1: 7926ef53935313501d4a7fe0e587f3e3b00b313c
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana42.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCzQfmtpfECJ+NZaaiSH/R8X+dGXHH+aDTCKGLLiHhW9fttxzfzcJJaBx1b664D3ynZAC7NiaegfLDTCMW7FFVDUltMQcWjsM4BqfFipIquDP4KOclCc6EwG5aYG/MLCJwL6sovt1uKg00bSkVQsUSHBgZbMJKCjCbBb0XPxfuS4dppA3diEZBOMt1YHr+NdV7sace/Gc7YBlGsNOinnqkKfVWIpfYCiTQ18cvaisSEHsQR6zhKqrX4afQk13cTjdvZeQp9AXxRIf1g9fq2zHVWMdJdVNR8D0BSBtfAzMqIqZ8qcJqmzQN0Zq9Wk9Y021vMFORZy2SFI6c7yBWDJLdT
  ubuntu@plana44.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYE0eu9E8TQwtUy89Wldp54VbNBEoO9XQf77eXXzzmNwYUFRrNX0mZV/I8GqyRJuMrPG8V4aZBthBHTtnEmQ6RAS7fVdthi/hEgwnM9cAqY3KX9mR5xJnHBc/fa5KLrnSr3Wrztf42PpQNEN5Tk55K6wWUlZOTHU3vE0j3kF+YQ5FeBhQbghztHPKFR8bOmZJp9TpbXgbvEM2RWr9bYtro1KuQOgrairyVVNWdAuwZuxSQT4soyHoSkY9JmeXKsNRAOamxH9w57mDC3PXui7r6Fp8OCWSK+GmlLTtPaZtulSCcucaZtpVae7F4s9JNxaRl5RxuUtwMRfgAHGlL2BZv
  ubuntu@plana55.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCdrzGTR0Fbl6sedYlwlX+FlmF6fuE3l/RTu2kzOkmG47rPEn5CI37Injb7Epc50RXCbUIfzmDqtEY6uZT3YssYrE4jvhQlynPndbn1KmiTbgxTyuumGXv7O4OOntezighA1W49phUNZys1DhdEEO8VSQAIdHrBgBLhY9DDgC4LAhrP4BSbDTN0rUXtYYHBj4aa3sJV0o3sKjpsyjjlieEQnto6JkjK6EGZCSuY+AyMZyLJjFTgMwJ9i4aC5eZoWZAWSDfDsxo8PtFR+kjUmz5uiheyn5lAzKBxmd4ZNojf7wOhSGia0ghbtUeQkdoRZXZhP2ourNn3uAguf1xt43kX
task:
- ceph:
    conf:
      client:
        rbd cache: true
        rbd cache max dirty: 0
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- rbd_fsx:
    clients:
    - client.0
    ops: 5000

Tail of the log:
2012-11-14 09:10:46.378617 7f4ccbc81700 -1 -- 10.214.132.23:0/1030198 >> 10.214.132.34:6803/21803 pipe(0x7f4f100f7920 sd=-1 :0 pgs=0 cs=0 l=1).connect couldn't created socket Too many open files
2012-11-14 09:10:46.378640 7f4ccb77c700 -1 -- 10.214.132.23:0/1030198 >> 10.214.132.34:6806/21804 pipe(0x7f4f103151a0 sd=-1 :0 pgs=0 cs=0 l=1).connect couldn't created socket Too many open files
2012-11-14 09:10:46.378647 7f4ccb87d700 -1 -- 10.214.132.23:0/1030198 >> 10.214.132.36:6800/10787 pipe(0x7f4f10064f90 sd=-1 :0 pgs=0 cs=0 l=1).connect couldn't created socket Too many open files
2012-11-14 09:10:47.088595 7f4ccb67b700 -1 -- 10.214.132.23:0/1030198 >> 10.214.132.34:6789/0 pipe(0x7f4f0c0aa300 sd=-1 :0 pgs=0 cs=0 l=1).connect couldn't created socket Too many open files
2012-11-14 09:10:47.088642 7f4ccb67b700  0 -- 10.214.132.23:0/1030198 >> 10.214.132.34:6789/0 pipe(0x7f4f0c0aa300 sd=-1 :0 pgs=0 cs=0 l=1).fault
2012-11-14 09:10:47.088660 7f4ccb67b700 -1 -- 10.214.132.23:0/1030198 >> 10.214.132.34:6789/0 pipe(0x7f4f0c0aa300 sd=-1 :0 pgs=0 cs=0 l=1).connect couldn't created socket Too many open files
2012-11-14 09:10:47.288840 7f4ccb67b700 -1 -- 10.214.132.23:0/1030198 >> 10.214.132.34:6789/0 pipe(0x7f4f0c0aa300 sd=-1 :0 pgs=0 cs=0 l=1).connect couldn't created socket Too many open files
2012-11-14 09:10:47.689041 7f4ccb67b700 -1 -- 10.214.132.23:0/1030198 >> 10.214.132.34:6789/0 pipe(0x7f4f0c0aa300 sd=-1 :0 pgs=0 cs=0 l=1).connect couldn't created socket Too many open files
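
"Too many open files" is the standard error text for EMFILE, meaning the client process has reached its per-process file descriptor limit. A minimal sketch for checking how close a process is to that limit, assuming a Linux host with /proc mounted. The helper names count_open_fds and soft_nofile_limit are hypothetical and are not part of Ceph or teuthology:

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count open descriptors via /proc/<pid>/fd (Linux only).

    Returns None when the directory cannot be read, e.g. no /proc
    or insufficient privileges for another process.
    """
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


def soft_nofile_limit():
    """Soft RLIMIT_NOFILE of the *current* process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    print("open fds:", count_open_fds(), "soft limit:", soft_nofile_limit())
```

Pointing count_open_fds at the PID of the still-running client (with sufficient privileges) would show whether its descriptor count is sitting at the limit.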

Actions #1

Updated by Sage Weil over 11 years ago

A zillion msgr threads are blocked behind the throttle:

Thread 916 (Thread 0x7f4e1a5e5700 (LWP 22937)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f4f2a00a085 in Wait (mutex=..., this=0x7f4efdd8b680) at ./common/Cond.h:55
#2  Throttle::_wait (this=0x1eccf10, c=33) at common/Throttle.cc:87
#3  0x00007f4f2a00ad64 in Throttle::get (this=0x1eccf10, c=33, m=0) at common/Throttle.cc:142
#4  0x00007f4f2a0c4be6 in Pipe::read_message (this=0x7f4f0c027700, pm=0x7f4e1a5e4db0) at msg/Pipe.cc:1487
#5  0x00007f4f2a0d6190 in Pipe::reader (this=0x7f4f0c027700) at msg/Pipe.cc:1199
#6  0x00007f4f2a0d8e3d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#7  0x00007f4f2959de9a in start_thread (arg=0x7f4e1a5e5700) at pthread_create.c:308
#8  0x00007f4f298a54bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9  0x0000000000000000 in ?? ()

This is probably a deadlock in which nothing is being released back to the throttle, while the msgr keeps faulting and retrying because of timeouts and such.

It is frustrating that mark_down won't zap the threads blocked on the throttler, but that shouldn't be a problem as long as the throttler isn't blocked indefinitely... ?

The process is still running.
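
To illustrate the suspected failure mode, here is a toy counting throttle in Python. It is not Ceph's Throttle class: the class, the budget of 100, and the helper demo_stuck_readers are assumptions made for illustration, and only the request size of 33 is taken from the backtrace above. Readers that call get() while the budget is exhausted stay parked until something calls put(), so if nothing is ever released, every stuck reader keeps its thread alive.

```python
import threading
import time


class Throttle:
    """Toy counting throttle (hypothetical stand-in, not Ceph's implementation)."""

    def __init__(self, maximum):
        self.maximum = maximum
        self.current = 0
        self.waiters = 0
        self.cond = threading.Condition()

    def get(self, count, timeout=None):
        """Block until `count` units fit under the maximum; False on timeout."""
        with self.cond:
            self.waiters += 1
            try:
                ok = self.cond.wait_for(
                    lambda: self.current + count <= self.maximum, timeout)
                if ok:
                    self.current += count
                return ok
            finally:
                self.waiters -= 1

    def put(self, count):
        """Return `count` units to the budget and wake all blocked getters."""
        with self.cond:
            self.current -= count
            self.cond.notify_all()


def demo_stuck_readers(num_readers=3):
    throttle = Throttle(maximum=100)
    throttle.get(100)  # budget fully consumed and, for now, never released

    results = []
    readers = [
        threading.Thread(
            target=lambda: results.append(throttle.get(33, timeout=5)))
        for _ in range(num_readers)
    ]
    for t in readers:
        t.start()

    # Wait (bounded) until every reader is parked inside get().
    deadline = time.monotonic() + 5
    stuck = 0
    while time.monotonic() < deadline:
        with throttle.cond:
            stuck = throttle.waiters
        if stuck == num_readers:
            break
        time.sleep(0.01)

    throttle.put(100)  # the release that was missing; readers now proceed
    for t in readers:
        t.join()
    return stuck, results


if __name__ == "__main__":
    print(demo_stuck_readers())
```

In the sketch the readers only make progress once put() is called. Without that release they would wait until their timeout, which matches the worry above that blocked readers are harmless only if the throttler is not blocked indefinitely.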

Actions #2

Updated by Sage Weil over 11 years ago

  • Assignee set to Samuel Just

Actions #3

Updated by Sage Weil over 11 years ago

  • Status changed from New to Resolved

commit:12c2b7fa20be6878bc0763404d2a5c648e5fadbc
