Bug #2715

closed

krbd: spinlock wrong CPU

Added by Sage Weil almost 12 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description


2012-07-04T19:29:21.090076-07:00 plana78 kernel: [15514.186151] BUG: spinlock wrong CPU on CPU#3, rbd/29337
2012-07-04T19:29:21.090092-07:00 plana78 kernel: [15514.207654]  lock: ffffffffa010d8e0, .magic: dead4ead, .owner: rbd/29337, .owner_cpu: 2
2012-07-04T19:29:21.133316-07:00 plana78 kernel: [15514.250817] Pid: 29337, comm: rbd Not tainted 3.4.0-ceph #1
2012-07-04T19:29:21.158012-07:00 plana78 kernel: [15514.275494] Call Trace:
2012-07-04T19:29:21.179210-07:00 plana78 kernel: [15514.296619]  [<ffffffff81328328>] spin_dump+0x78/0xc0
2012-07-04T19:29:21.204454-07:00 plana78 kernel: [15514.321823]  [<ffffffff8132839b>] spin_bug+0x2b/0x40
2012-07-04T19:29:21.230004-07:00 plana78 kernel: [15514.347354]  [<ffffffff81328438>] do_raw_spin_unlock+0x88/0xb0
2012-07-04T19:29:21.257344-07:00 plana78 kernel: [15514.374647]  [<ffffffff8162114b>] _raw_spin_unlock+0x2b/0x40
2012-07-04T19:29:21.285363-07:00 plana78 kernel: [15514.402620]  [<ffffffffa0109cf2>] rbd_put_client+0x42/0x60 [rbd]
2012-07-04T19:29:21.314504-07:00 plana78 kernel: [15514.431707]  [<ffffffffa010b426>] rbd_dev_release+0xe6/0x170 [rbd]
2012-07-04T19:29:21.344534-07:00 plana78 kernel: [15514.461689]  [<ffffffff813f0637>] device_release+0x27/0xa0
2012-07-04T19:29:21.374534-07:00 plana78 kernel: [15514.491634]  [<ffffffff8131758d>] kobject_release+0x8d/0x1d0
2012-07-04T19:29:21.405406-07:00 plana78 kernel: [15514.522453]  [<ffffffff8131740c>] kobject_put+0x2c/0x60
2012-07-04T19:29:21.436270-07:00 plana78 kernel: [15514.553268]  [<ffffffff813f01f7>] put_device+0x17/0x20
2012-07-04T19:29:21.467567-07:00 plana78 kernel: [15514.584505]  [<ffffffff813f124a>] device_unregister+0x2a/0x60
2012-07-04T19:29:21.500144-07:00 plana78 kernel: [15514.617028]  [<ffffffffa010922b>] rbd_remove+0x13b/0x170 [rbd]
2012-07-04T19:29:21.533001-07:00 plana78 kernel: [15514.649829]  [<ffffffff813f2577>] bus_attr_store+0x27/0x30
2012-07-04T19:29:21.565515-07:00 plana78 kernel: [15514.682284]  [<ffffffff811edc06>] sysfs_write_file+0xe6/0x170
2012-07-04T19:29:21.598249-07:00 plana78 kernel: [15514.714963]  [<ffffffff8117da18>] vfs_write+0xc8/0x190
2012-07-04T19:29:21.629500-07:00 plana78 kernel: [15514.746159]  [<ffffffff8117dbd1>] sys_write+0x51/0x90
2012-07-04T19:29:21.659573-07:00 plana78 kernel: [15514.776179]  [<ffffffff816294a9>] system_call_fastpath+0x16/0x1b
2012-07-04T19:30:10.651828-07:00 plana78 kernel: Kernel logging (proc) stopped.
ubuntu@teuthology:/a/teuthology-2012-07-04_19:00:11-regression-master-testing-gcov/5968$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 5a73ba3314ee1ac094ab1cb19b91c7e876861606
nuke-on-error: true
overrides:
  ceph:
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 7fa85790fb568ade3c68ba116c654eb95240d68c
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana73.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAZtaJPPhGrqcL1tRUHvgMMykh4hU2IdrX9cC2hYhotYVpzeejuA2TkmRdFhno54E6BM4dRIh3Wqv95YXpsToxOvzGp46lmRl3aNrLeWEEE5KxwgwTWuK72vMpZTuD/8XClj/faeFYuKvdiFFK0jGiX+j74HBOOrl3A3WdCi+/xFJZ/b5vZUJRFZxEkAkpy868iebiMgMHWDzs9A8GVb35MGf8nQGwFtPQlHEtgQNK51Wm7rj3qJM1tP1ZL2ghtrck9mdLBUzo6Fj1YTifewKt/PhteEZ1aP8lMpakOoRrJWawM27XQ7XJZDldZcI4REvF3suJLbiQHzdGArUTG3oz
  ubuntu@plana78.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5u611hXVaR3i+fQJ3IWIw9y4mhS0WrusfWBkk4lDeuEyaJy0SqxbN6D5GhB7KzLZfpyiVcmdFzxk4w9Kc3XgQpJ42mNx4EDTen5ndI8HL8GclRd71RaocVIerynAD94lCAVzLSQCRBOA3HLH35OITPMR+ztZG8zSXUhhdYpwCLTJVb2BE8bA47GzyhCvu4L1JYwnIRzVUrufwugSS4odhlBJvge++omtL+r3Cm8M8W3Ahow+Yk8Ni40m60j82CEs7/Mc95j/CXYIQzwJWkIeRoQMMgT3HmK9koJRzEBkK8FQS9KMnrEi6mJAa+8eL/AgVMvyxpz+W7z77cH5p2Kab
  ubuntu@plana80.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCv9ah6KrNde/qjaxGaNCBWMlzzprTwkPmQetLrGhY+MsTz9VN6CRBegYq94TcOjE6A+w3swGNkuPzgG+meNGKV3fq+rggCEDJq+lPPgvM2dU86d0eoPr3m4BYgHcEdxcIzkaZ6f+F+vC956Lc1a1DGRyz6ygAuD8HJh/4FKhGe+PaaQ2EtvrwY8zEj+i9PARj9PHttLYLgsaOrdhszJkXBPGHGSJ7jaBI/AViv12WmYN9WpXWE+khICRpqfm2PIhzKZOgrRBMUuH8YpHTh8hUH5HjBtC2kNS6FPowrNhJU4luFQWV5un85CDRHthfGW+e8/ZelH56NfS1oOxnRM7jL
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- rbd:
    all:
      image_size: 20480
- workunit:
    clients:
      all:
      - suites/iozone.sh
Actions #1

Updated by Tamilarasi muthamizhan almost 12 years ago

Latest logs: ubuntu@teuthology:/a/teuthology-2012-07-11_19:00:11-regression-master-testing-gcov/9371

2012-07-11T19:36:03.524831-07:00 plana31 kernel: [ 2506.796650] BUG: spinlock wrong CPU on CPU#3, rbd/31496
2012-07-11T19:36:03.524848-07:00 plana31 kernel: [ 2506.818397] lock: ffffffffa02168e0, .magic: dead4ead, .owner: rbd/31496, .owner_cpu: 6
2012-07-11T19:36:03.592569-07:00 plana31 kernel: [ 2506.861640] Pid: 31496, comm: rbd Not tainted 3.4.0-ceph #1
2012-07-11T19:36:03.592579-07:00 plana31 kernel: [ 2506.886033] Call Trace:
2012-07-11T19:36:03.638953-07:00 plana31 kernel: [ 2506.907084] [<ffffffff81328328>] spin_dump+0x78/0xc0
2012-07-11T19:36:03.638965-07:00 plana31 kernel: [ 2506.932331] [<ffffffff8132839b>] spin_bug+0x2b/0x40
2012-07-11T19:36:03.691554-07:00 plana31 kernel: [ 2506.957706] [<ffffffff81328438>] do_raw_spin_unlock+0x88/0xb0
2012-07-11T19:36:03.691563-07:00 plana31 kernel: [ 2506.984839] [<ffffffff8162114b>] _raw_spin_unlock+0x2b/0x40
2012-07-11T19:36:03.748253-07:00 plana31 kernel: [ 2507.012493] [<ffffffffa0212cf2>] rbd_put_client+0x42/0x60 [rbd]
2012-07-11T19:36:03.748263-07:00 plana31 kernel: [ 2507.041436] [<ffffffffa0214426>] rbd_dev_release+0xe6/0x170 [rbd]
2012-07-11T19:36:03.807948-07:00 plana31 kernel: [ 2507.071253] [<ffffffff813f0637>] device_release+0x27/0xa0
2012-07-11T19:36:03.807958-07:00 plana31 kernel: [ 2507.101030] [<ffffffff8131758d>] kobject_release+0x8d/0x1d0
2012-07-11T19:36:03.869370-07:00 plana31 kernel: [ 2507.131711] [<ffffffff8131740c>] kobject_put+0x2c/0x60
2012-07-11T19:36:03.869382-07:00 plana31 kernel: [ 2507.162346] [<ffffffff813f01f7>] put_device+0x17/0x20
2012-07-11T19:36:03.932852-07:00 plana31 kernel: [ 2507.193398] [<ffffffff813f124a>] device_unregister+0x2a/0x60
2012-07-11T19:36:03.932862-07:00 plana31 kernel: [ 2507.225715] [<ffffffffa021222b>] rbd_remove+0x13b/0x170 [rbd]
2012-07-11T19:36:03.997753-07:00 plana31 kernel: [ 2507.258307] [<ffffffff813f2577>] bus_attr_store+0x27/0x30
2012-07-11T19:36:03.997764-07:00 plana31 kernel: [ 2507.290501] [<ffffffff811edc06>] sysfs_write_file+0xe6/0x170
2012-07-11T19:36:04.061230-07:00 plana31 kernel: [ 2507.322899] [<ffffffff8117da18>] vfs_write+0xc8/0x190
2012-07-11T19:36:04.061239-07:00 plana31 kernel: [ 2507.353872] [<ffffffff8117dbd1>] sys_write+0x51/0x90
2012-07-11T19:36:04.122300-07:00 plana31 kernel: [ 2507.383662] [<ffffffff816294a9>] system_call_fastpath+0x16/0x1b
2012-07-11T19:36:13.211027-07:00 plana31 kernel: Kernel logging (proc) stopped.
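
Both traces have the same shape: the headline names the CPU on which the unlock ran, and the lock dump on the next line shows the CPU recorded in .owner_cpu when the lock was taken. A minimal sketch for pulling those two values out of kern.log lines follows. The helper name find_wrong_cpu_reports and the regular expressions are illustrative assumptions written against the two log lines quoted in this ticket, not part of teuthology or the kernel.

```python
import re

# Headline printed when the unlock runs on a CPU other than the recorded owner CPU.
BUG_RE = re.compile(
    r"BUG: spinlock wrong CPU on CPU#(?P<cpu>\d+), (?P<comm>[^/\s]+)/(?P<pid>\d+)"
)
# Follow-up line that dumps the lock's debug fields.
OWNER_RE = re.compile(r"\.owner: (?P<owner>[^,]+), \.owner_cpu: (?P<owner_cpu>-?\d+)")


def find_wrong_cpu_reports(lines):
    """Pair each 'spinlock wrong CPU' headline with the lock dump that follows it."""
    reports = []
    pending = None
    for line in lines:
        bug = BUG_RE.search(line)
        if bug:
            pending = {
                "task": f"{bug.group('comm')}/{bug.group('pid')}",
                "unlock_cpu": int(bug.group("cpu")),
            }
            continue
        owner = OWNER_RE.search(line)
        if owner and pending is not None:
            pending["owner"] = owner.group("owner")
            pending["owner_cpu"] = int(owner.group("owner_cpu"))
            reports.append(pending)
            pending = None
    return reports


# Sample input: the headline and lock dump lines from the two runs in this ticket.
SAMPLE = [
    "2012-07-04T19:29:21.090076-07:00 plana78 kernel: [15514.186151] BUG: spinlock wrong CPU on CPU#3, rbd/29337",
    "2012-07-04T19:29:21.090092-07:00 plana78 kernel: [15514.207654]  lock: ffffffffa010d8e0, .magic: dead4ead, .owner: rbd/29337, .owner_cpu: 2",
    "2012-07-11T19:36:03.524831-07:00 plana31 kernel: [ 2506.796650] BUG: spinlock wrong CPU on CPU#3, rbd/31496",
    "2012-07-11T19:36:03.524848-07:00 plana31 kernel: [ 2506.818397] lock: ffffffffa02168e0, .magic: dead4ead, .owner: rbd/31496, .owner_cpu: 6",
]

if __name__ == "__main__":
    for report in find_wrong_cpu_reports(SAMPLE):
        print(
            f"{report['task']}: unlock ran on CPU {report['unlock_cpu']}, "
            f"lock recorded owner_cpu {report['owner_cpu']}"
        )
```

In both runs the owner task matches the task doing the unlock (rbd/29337 and rbd/31496), so only the CPU differs between lock and unlock.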

ubuntu@teuthology:/a/teuthology-2012-07-11_19:00:11-regression-master-testing-gcov/9371$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: ea18acf27e2f7cee4ac9d01719564414d2cd64b5
nuke-on-error: true
overrides:
  ceph:
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 0782db3694e10db0cdeb678d5771f378e1a372ca
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYDpptyOTrVH3RqiwH5A//Q2CkkVz5dPTpd/s8qG/Q4EHVA4WDMu80pcDvdSewOfFJl83MEtDKKjuJOuEzI4OGn0DPptDN5wHC1OWrXqFMcIaWVe/KBYOdWEZbA7FECeXgEZR1Sid2bH7XDUE9AYalpS2/SmuuHEU1ObL6zSpAqoY6AIPCR6LgFrtxAqrYmIdpb8YfSuI5uPBv6qikl0yvam06WNerUNQ9lnZXFmFm1wBeicRvWH3jZ6w/xlQBIp/zG6k9IJa0vaLm+FqztLkDWri8Qz1dbdsz0bNjyzD6iRuDOpgmz0Kf8m2IjaJRgRgz2ARcOOdBJKmwnnW/knk5
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5J4n7rTsH+IMjGAu+EfhukuK5+zScoSaPIfXDOUU8LfvuI/3x8Luiyv9eRVwZgwuLBWZ/zorBbGZ+G2Iaxy3632AG/XE7cRZA9AxzZT+Qvm9D+BW+Uletgf92cttKMk7qwK3DetQwRKKl6AMv0SDpUff+nzqnJH6LMS8zoBPVXDHFM3Lup8h9H6DYEs1F/Zn8LVSw8hNiD279rg1n1hqWdItmnKBPKyC/qkRoPa6h7gDU6FPaBiNhuhBd0016XGrVwL7Y8gqoDBiArP+NDt1lcnbeiK43bFhqW+pYovOdIA2MJC6z+bkZDlOJdxoz9mDP0cJZBdB43v3UdbS1R+WT
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDZv0weNolvyE+m05bRiGKWyjAbZs/nYL1/5DAdr8QYbmjTSrEjycJ2iZCHVPXHHPTkFN4vWtWQzfEE2Ra8T/Ti0w65C+H6HNwckWvLk0RRYWSFNLjfvTR+0OeCNBbpTcCBaeGpIdJhMcM9k6eek5GGm1djc7ZgG11jepzVe6HiKKbh2roc/EZGuADs8sY/bBf0cRbhsPc/1EJ2sxd8CUmnWXrGfCj5izW/1bAdyBQcAZPpMp5OJkuAT2OznMVYWxkg54JM8TlKKj1T8nccEC5+c01Dbe0vAxuIPCeU2obkxr+VvQN3oJbhUXFDYv9PCNaS0LReuVBKVN9bRYn97xXB
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- rbd:
    all: null
- workunit:
    clients:
      all:
      - suites/fsx.sh
ubuntu@teuthology:/a/teuthology-2012-07-11_19:00:11-regression-master-testing-gcov/9371$ cat summary.yaml
ceph-sha1: 0782db3694e10db0cdeb678d5771f378e1a372ca
description: collection:basic clusters:fixed-3.yaml fs:btrfs.yaml tasks:rbd_workunit_suites_fsx.yaml
duration: 184.97839188575745
failure_reason: '''/tmp/cephtest/archive/syslog/kern.log:2012-07-11T19:36:03.524831-07:00
  plana31 kernel: [ 2506.796650] BUG: spinlock wrong CPU on CPU#3, rbd/31496

  '' in syslog'
flavor: gcov
owner: scheduled_teuthology@teuthology
success: false
Actions #2

Updated by Sage Weil almost 12 years ago

  • Status changed from New to 7

Hoping this was caused by the mutex-less con_open, or something similar. Will keep this open for a few more days to make sure it's gone.

Actions #3

Updated by Sage Weil over 11 years ago

  • Status changed from 7 to Resolved