Bug #3075

closed

rados python tests occasionally hang with ms failures

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: librados
Target version: -
% Done: 0%
Source: Development

Description


kernel: &id001
  branch: testing
  kdb: true
nuke-on-error: true
overrides:
  ceph:
    branch: wip-3070
    conf:
      client:
        debug client: 20
        debug monc: 20
      global:
        debug ms: 20
        ms inject socket failures: 500
      mds:
        debug mds: 20
    fs: btrfs
    log-whitelist:
    - slow request
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana54.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC60+3L8IN2WBpWHY94YAuCOMVdKs3xUqYQpO1ie127fBomk7fiEhR0RovhmDHWIzNr3qvvNkIp9Y+MHUcpZ7C4MFFMYsy2+zq026Ag3XLEOyZWDSyPfMapd5+nmuvxJqEvAx4wAWBhYVEB3aPFmDmz4mayZ9aSYoA1lhsClxfYpAHZ0zRWX3kY1KxXlk6UrZy0igYGvKIvmubkYcmFzOPsI3aWpgWU1rEXGWsFHOlwaor0KJPnpEsZYTrlPyLZqJcKbI/EcHgti0ak22vsDT7LVMKoyPXeUFL5ZGUEpuqQ+IMiECCMKa8X8vPG2MN9V6DK3gQezF+lo5CRCAu7DYdn
  ubuntu@plana55.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCdrzGTR0Fbl6sedYlwlX+FlmF6fuE3l/RTu2kzOkmG47rPEn5CI37Injb7Epc50RXCbUIfzmDqtEY6uZT3YssYrE4jvhQlynPndbn1KmiTbgxTyuumGXv7O4OOntezighA1W49phUNZys1DhdEEO8VSQAIdHrBgBLhY9DDgC4LAhrP4BSbDTN0rUXtYYHBj4aa3sJV0o3sKjpsyjjlieEQnto6JkjK6EGZCSuY+AyMZyLJjFTgMwJ9i4aC5eZoWZAWSDfDsxo8PtFR+kjUmz5uiheyn5lAzKBxmd4ZNojf7wOhSGia0ghbtUeQkdoRZXZhP2ourNn3uAguf1xt43kX
  ubuntu@plana57.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCuMOcu2XPQovy/Qzmwyvc9tvGP9JZVJ6cqiJ3RPOSGgAifKLTxe2ramHpD8AKcdthu8VAfouFpZK4CtBWKJowurR+4yZKgEugzvYuZ/nK/np56vreBQmRBWD1vLPtxPsTT3YGu5qx+ixdSwrSxexxc0/7+EW9x1D6knL+OGUNWksoGIRlXxjh9qafbw/1XKeQQF28vxBXHofXUFY8USMUcq5HDuaFfmgKzufH6vk84oqyr/jtGej6b4g6tbGiHPYR+o5tmTQHyxpOxqLZP2RFFqHlQ/QaOmRvSNIoOo+1UbqdcWsLk16/lXIS1mI+BZsZouk1H+fGeMTEUDGktiPW7
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/test_python.sh

I got 2 hangs out of ~200 runs.

Attaching gdb to the python process didn't say much, except that it was blocked in RadosClient::pool_delete in one case and


#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f6bfc279196 in Wait (mutex=..., this=0x7fff537bf790) at ./common/Cond.h:55
#2  librados::RadosClient::pool_create (this=<optimized out>, name=..., auid=0, crush_rule=0 '\000') at librados/RadosClient.cc:413
#3  0x00007f6bfc26c839 in rados_pool_create (cluster=0x12faf10, name=<optimized out>) at librados/librados.cc:1652

in another. Probably an issue in the monclient/mon interaction...

ubuntu@teuthology:/var/lib/teuthworker/archive/sage-fuse2/14739
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-fuse2/14564

Actions #1

Updated by Sage Weil over 11 years ago

void Objecter::maybe_request_map(epoch_t epoch)
{
  int flag = 0;
  if (osdmap->test_flag(CEPH_OSDMAP_FULL)) {
    ldout(cct, 10) << "maybe_request_map subscribing (continuous) to next osd map (FULL flag is set)" << dendl;
  } else {
    ldout(cct, 10) << "maybe_request_map subscribing (onetime) to next osd map" << dendl;
    flag = CEPH_SUBSCRIBE_ONETIME;
  }
  if (!epoch) {
    epoch = osdmap->get_epoch() ? osdmap->get_epoch()+1 : 0;
  }
  if (monc->sub_want("osdmap", epoch, flag))
    monc->renew_subs();
}

This is broken: the epoch we send with the subscription is the start epoch, not the epoch we (think we) want.
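
To illustrate the start-epoch point, here is a minimal toy model, not Ceph code. It assumes the monitor sends incrementals from the subscription's start epoch up to its latest map, and that a client can only apply an incremental that directly follows the map it holds. `sub_start_epoch` and `apply_updates` are hypothetical helpers written for this sketch; only the `have ? have+1 : 0` expression comes from the snippet above.

```cpp
#include <cstdint>

using epoch_t = std::uint32_t;  // stand-in for Ceph's epoch_t

// Start epoch for the subscription, derived only from the map we hold.
// Mirrors the `!epoch` branch of maybe_request_map() quoted above.
epoch_t sub_start_epoch(epoch_t have) {
  return have ? have + 1 : 0;
}

// Toy model (assumption, not real mon/objecter behavior):
// the monitor sends incrementals start..latest, and the client can apply
// incremental e only while it holds e-1. A start of 0 is treated as
// "send the latest full map".
epoch_t apply_updates(epoch_t have, epoch_t start, epoch_t latest) {
  if (start == 0)
    return latest;
  for (epoch_t e = start; e <= latest; ++e) {
    if (e == have + 1)
      have = e;  // contiguous with what we hold: apply it
    // otherwise there is a gap and this incremental is unusable
  }
  return have;
}
```

In this model a client holding epoch 10 that subscribes from `sub_start_epoch(10)` catches up to the monitor's latest map, while one that subscribes from the epoch it is waiting for (say 13) never advances, because everything it receives starts past the gap.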
Actions #2

Updated by Sage Weil over 11 years ago

  • Status changed from New to Resolved