Bug #7620

closed

BUG: soft lockup - CPU#0 stuck for 23s!

Added by Zack Cerza about 10 years ago. Updated over 9 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Noticed this hung job:

http://pulpito.front.sepia.ceph.com/teuthology-2014-03-04_19:01:51-rbd-dumpling-testing-basic-plana/116699/
http://qa-proxy.ceph.com/teuthology/teuthology-2014-03-04_19:01:51-rbd-dumpling-testing-basic-plana/116699/teuthology.log


2014-03-04T23:09:03.958 INFO:teuthology.task.qemu.client.0.out:[10.214.133.24]: mount: block device /dev/sr0 is write-protected, mounting read-only
2014-03-05T03:23:44.115 INFO:teuthology.task.qemu.client.0.out:[10.214.133.24]: [15292.040016] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:11087]
2014-03-05T03:23:44.115 INFO:teuthology.task.qemu.client.0.out:[10.214.133.24]: [15292.040736] Stack:
2014-03-05T03:23:44.115 INFO:teuthology.task.qemu.client.0.out:[10.214.133.24]: [15292.040949] Call Trace:
2014-03-05T03:23:44.115 INFO:teuthology.task.qemu.client.0.out:[10.214.133.24]: [15292.041398] Code: dd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 b8 00 01 00 00 48 89 e5 3e 66 0f c1 07 0f b6 d4 38 c2 74 0c 0f 1f 00 f3 90 <0f> b6 07 38 d0 75 f7 5d c3 66 66 66 66 2e 0f 1f 84 00 00 00 00

I am not killing the job, to give others a chance to investigate.
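
For anyone triaging similar reports, here is a minimal sketch of how the soft-lockup lines quoted in this ticket could be parsed into fields. The function name `parse_soft_lockup` and the returned keys are hypothetical and are not part of teuthology. The pattern is based only on the two message shapes seen in this ticket, so other kernel versions may format the line differently.

```python
import re

# Matches the kernel message as it appears in the logs in this ticket, e.g.
#   BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:11087]
# The bracketed part is "<task name>:<pid>"; the task name may itself contain
# colons (kworker/0:2), so the pid is taken after the last colon.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return a dict describing the lockup, or None if the line does not match."""
    m = SOFT_LOCKUP.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "seconds": int(m.group("secs")),
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
    }
```

Applied to the line above, this should yield CPU 0, a 22 second stall, and the task `kworker/0:2` with pid 11087.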

Actions #1

Updated by Yuri Weinstein about 10 years ago

A test failed with a similar error.

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-04-24_20:35:03-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/213298/

2014-04-25T02:25:14.274 INFO:teuthology.orchestra.run.out:[10.214.138.107]: rsyslog start/running, process 12755
2014-04-25T02:25:14.276 INFO:teuthology.task.internal:Checking logs for errors...
2014-04-25T02:25:14.276 DEBUG:teuthology.task.internal:Checking ubuntu@vpm040.front.sepia.ceph.com
2014-04-25T02:25:14.276 DEBUG:teuthology.orchestra.run:Running [10.214.138.94]: "egrep --binary-files=text '\\bBUG\\b|\\bINFO\\b|\\bDEADLOCK\\b' /home/ubuntu/cephtest/archive/syslog/*.log | grep -v 'task .* blocked for more than .* seconds' | grep -v 'lockdep is turned off' | grep -v 'trying to register non-static key' | grep -v 'DEBUG: fsize' | grep -v CRON | grep -v 'BUG: bad unlock balance detected' | grep -v 'inconsistent lock state' | grep -v '*** DEADLOCK ***' | grep -v 'INFO: possible irq lock inversion dependency detected' | grep -v 'INFO: NMI handler (perf_event_nmi_handler) took too long to run' | grep -v 'INFO: recovery required on readonly' | head -n 1" 
2014-04-25T02:25:14.441 DEBUG:teuthology.task.internal:Checking ubuntu@vpm039.front.sepia.ceph.com
2014-04-25T02:25:14.441 DEBUG:teuthology.orchestra.run:Running [10.214.138.96]: "egrep --binary-files=text '\\bBUG\\b|\\bINFO\\b|\\bDEADLOCK\\b' /home/ubuntu/cephtest/archive/syslog/*.log | grep -v 'task .* blocked for more than .* seconds' | grep -v 'lockdep is turned off' | grep -v 'trying to register non-static key' | grep -v 'DEBUG: fsize' | grep -v CRON | grep -v 'BUG: bad unlock balance detected' | grep -v 'inconsistent lock state' | grep -v '*** DEADLOCK ***' | grep -v 'INFO: possible irq lock inversion dependency detected' | grep -v 'INFO: NMI handler (perf_event_nmi_handler) took too long to run' | grep -v 'INFO: recovery required on readonly' | head -n 1" 
2014-04-25T02:25:14.616 DEBUG:teuthology.task.internal:Checking ubuntu@vpm038.front.sepia.ceph.com
2014-04-25T02:25:14.617 DEBUG:teuthology.orchestra.run:Running [10.214.138.107]: "egrep --binary-files=text '\\bBUG\\b|\\bINFO\\b|\\bDEADLOCK\\b' /home/ubuntu/cephtest/archive/syslog/*.log | grep -v 'task .* blocked for more than .* seconds' | grep -v 'lockdep is turned off' | grep -v 'trying to register non-static key' | grep -v 'DEBUG: fsize' | grep -v CRON | grep -v 'BUG: bad unlock balance detected' | grep -v 'inconsistent lock state' | grep -v '*** DEADLOCK ***' | grep -v 'INFO: possible irq lock inversion dependency detected' | grep -v 'INFO: NMI handler (perf_event_nmi_handler) took too long to run' | grep -v 'INFO: recovery required on readonly' | head -n 1" 
2014-04-25T02:25:19.267 ERROR:teuthology.task.internal:Error in syslog on ubuntu@vpm038.front.sepia.ceph.com: /home/ubuntu/cephtest/archive/syslog/kern.log:2014-04-25T04:36:10.915422+00:00 vpm038 kernel: [ 2080.060036] BUG: soft lockup - CPU#0 stuck for 24s! [rwhod:1265]

2014-04-25T02:25:19.268 INFO:teuthology.task.internal:Compressing syslogs...
2014-04-25T02:25:19.268 DEBUG:teuthology.orchestra.run:Running [10.214.138.107]: "find /home/ubuntu/cephtest/archive/syslog -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --" 
2014-04-25T02:25:19.271 DEBUG:teuthology.orchestra.run:Running [10.214.138.96]: "find /home/ubuntu/cephtest/archive/syslog -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --" 
2014-04-25T02:25:19.274 DEBUG:teuthology.orchestra.run:Running [10.214.138.94]: "find /home/ubuntu/cephtest/archive/syslog -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --" 
archive_path: /var/lib/teuthworker/archive/teuthology-2014-04-24_20:35:03-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/213298
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rados_api_tests.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml}
email: null
job_id: '213298'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-04-24_20:35:03-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps
nuke-on-error: true
os_type: ubuntu
os_version: '12.04'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 2708c3c559d99e6f3b557ee1d223efa3745f655c
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 2708c3c559d99e6f3b557ee1d223efa3745f655c
  s3tests:
    branch: master
  workunit:
    sha1: 2708c3c559d99e6f3b557ee1d223efa3745f655c
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
targets:
  ubuntu@vpm038.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDFyMM1axZoScxRE3sY8ImoFGCHd+vZG2KaLx/9xlG8JmluSW6yvj6kz2Vg7hPbFppKS+NsIGsDDvtJYDmHbEKilQlzAcAbvBeuPna+VmyEfaaknnz7LzN59+G06itecO3Ix1PfW9c5FbLQiaZ1go1stfkTwpS3gzu1PJG3wPlF0RKyh0Y12FiybXncvciD/Rbhjq5axyZMGbe1vAIuL71/YcpBWdRPSL9Nsom24cIPufNhosOcQKEw7mYfby0/qYgExA2h7wDS90WEQgr7Tx9j0icYF/tqPzzIAoTrUlULJgk0fUycuj6ckPEWHwZJ01P0HDCdaAHrXY9kpLlaxRB
  ubuntu@vpm039.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCw8YQxiNxcHF8Zda6Lbvyu/VixWj2Fh2vDzx8ftawS+h0umt3giu4fwk8XBWybmw65jT2+lEAVDo8vqB72v+94X7Yf05kK0c+BzCtESp7pFhCy0L+kx5eBMQPBy9hwdEJxxgJkzgg6omVOteSxP0QN/E+Q7rHdPKKbai77EuJL8elBDPDOabBpGvJ/WHv5wWRMKxlWbrDWX1Ywr4noJ3/btzBbSnlIcQgOXJYuYIV/3tBSnej5Qn332ZANJT0yFf+5e07HcIT//P6RBlA2+pQmiWBdiihrtmJ9oT6f/DsXvaNZ/RHr0onBVo5IU3w/WT7Iln3vC7NtEHugiLaAR+FZ
  ubuntu@vpm040.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8lY1htTUcngc7uQ8fqkA4Co8iBWMNytgkKCaENRLmA71VlLrZdqlraeWyOmBmyQ9bkdO1omCvqzf5gshqNrJkEam2Ve7tLFTkvcuZbSyy/TRTdSLiQ3cN8wKRxgQqbl0QMm0VD5fTHkoRWAIa7MwtdtZRH6tb9VPd7K2MGTME/EzlMXv9qutCk1zaVk+p98HKCXpp6wqgypTTf45s/xtUAPCZxiDhQpRkk4F7qGdkn6ws42w4CJ1m9WQAK01k1eesfUzZItooU9f4WVb77YTzHpcLAgVtLm4U+l70eVxvl42ClUhGyN9D3mCQgPs1EqzdM+qtbJU+Nq+dNE7ruVKr
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.30566
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rados_api_tests.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml}
duration: 20001.275528907776
failure_reason: '''/home/ubuntu/cephtest/archive/syslog/kern.log:2014-04-25T04:36:10.915422+00:00
  vpm038 kernel: [ 2080.060036] BUG: soft lockup - CPU#0 stuck for 24s! [rwhod:1265]

  '' in syslog'
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
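
For reference, the syslog check that failed this job (the `egrep ... | grep -v ... | head -n 1` command in the log above) can be approximated in Python as follows. This is only a sketch for reading the failure, not teuthology's actual implementation. The name `first_syslog_error` is hypothetical. Treating every exclusion except the first as a literal substring is an assumption about the intent of the original `grep -v` patterns.

```python
import re

# Lines of interest: the same alternation as the egrep pattern in the log.
WANTED = re.compile(r"\bBUG\b|\bINFO\b|\bDEADLOCK\b")

# Known-benign messages, copied from the chain of `grep -v` filters.
# Only this first one uses wildcards; it is kept as a regex.
IGNORED_REGEX = re.compile(r"task .* blocked for more than .* seconds")

# The remaining filters are matched as plain substrings (assumption).
IGNORED_LITERALS = [
    "lockdep is turned off",
    "trying to register non-static key",
    "DEBUG: fsize",
    "CRON",
    "BUG: bad unlock balance detected",
    "inconsistent lock state",
    "*** DEADLOCK ***",
    "INFO: possible irq lock inversion dependency detected",
    "INFO: NMI handler (perf_event_nmi_handler) took too long to run",
    "INFO: recovery required on readonly",
]


def first_syslog_error(lines):
    """Return the first suspicious line that is not whitelisted, else None.

    Mirrors the trailing `head -n 1`: only the first hit is reported.
    """
    for line in lines:
        if not WANTED.search(line):
            continue
        if IGNORED_REGEX.search(line):
            continue
        if any(text in line for text in IGNORED_LITERALS):
            continue
        return line
    return None
```

A soft lockup message contains the word `BUG` and matches none of the exclusions, which is why the `rwhod` line on vpm038 was reported as the failure reason.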
Actions #2

Updated by Sage Weil over 9 years ago

  • Status changed from New to Can't reproduce