
Bug #13905

closed

"BUG: held lock freed!"

Added by John Spray over 8 years ago. Updated about 8 years ago.

Status: Closed
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: powercycle
Crash signature (v1):
Crash signature (v2):

Description

Seen:
http://pulpito.ceph.com/teuthology-2015-11-27_23:10:02-knfs-master-testing-basic-multi/1161921/
http://pulpito.ceph.com/sage-2015-11-26_05:24:56-rados-wip-sage-testing---basic-multi/1160684/

2015-11-26T10:59:24.851878-05:00 plana41 kernel: =========================
2015-11-26T10:59:24.851930-05:00 plana41 kernel: [ BUG: held lock freed! ]
2015-11-26T10:59:24.851964-05:00 plana41 kernel: 4.4.0-rc2-ceph-12616-gb0098c3 #1 Tainted: G          I
2015-11-26T10:59:24.851995-05:00 plana41 kernel: -------------------------
2015-11-26T10:59:24.852046-05:00 plana41 kernel: ceph-osd/22228 is freeing memory ffff8801cd571ec0-ffff8801cd5728ef, with a lock still held there!
2015-11-26T10:59:24.852080-05:00 plana41 kernel: (&(&queue->rskq_lock)->rlock){+.-.-.}, at: [<ffffffff816da598>] inet_csk_reqsk_queue_add+0x28/0xa0
2015-11-26T10:59:24.852112-05:00 plana41 kernel: 5 locks held by ceph-osd/22228:
2015-11-26T10:59:24.852143-05:00 plana41 kernel: #0:  (&(&p->alloc_lock)->rlock){+.+...}, at: [<ffffffff810c94bd>] switch_task_namespaces+0x3d/0x80
2015-11-26T10:59:24.852180-05:00 plana41 kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816848eb>] netif_receive_skb_internal+0x4b/0x1f0
2015-11-26T10:59:24.852233-05:00 plana41 kernel: #2:  (rcu_read_lock){......}, at: [<ffffffff816cd27f>] ip_local_deliver_finish+0x3f/0x380
2015-11-26T10:59:24.852294-05:00 plana41 kernel: #3:  (slock-AF_INET){+.-.-.}, at: [<ffffffff8166be3d>] sk_clone_lock+0x19d/0x4a0
2015-11-26T10:59:24.852349-05:00 plana41 kernel: #4:  (&(&queue->rskq_lock)->rlock){+.-.-.}, at: [<ffffffff816da598>] inet_csk_reqsk_queue_add+0x28/0xa0
2015-11-26T10:59:24.852408-05:00 plana41 kernel:
stack backtrace:
2015-11-26T10:59:24.852464-05:00 plana41 kernel: CPU: 2 PID: 22228 Comm: ceph-osd Tainted: G          I     4.4.0-rc2-ceph-12616-gb0098c3 #1

2015-11-28T07:39:09.676922-05:00 plana77 kernel: =========================
2015-11-28T07:39:09.676952-05:00 plana77 kernel: [ BUG: held lock freed! ]
2015-11-28T07:39:09.676980-05:00 plana77 kernel: 4.4.0-rc2-ceph-12616-gb0098c3 #1 Tainted: G          I
2015-11-28T07:39:09.677016-05:00 plana77 kernel: -------------------------
2015-11-28T07:39:09.677044-05:00 plana77 kernel: ceph-osd/32180 is freeing memory ffff880224dca900-ffff880224dcb32f, with a lock still held there!
2015-11-28T07:39:09.677073-05:00 plana77 kernel: (&(&queue->rskq_lock)->rlock){+.-.-.}, at: [<ffffffff816da598>] inet_csk_reqsk_queue_add+0x28/0xa0
2015-11-28T07:39:09.677100-05:00 plana77 kernel: 4 locks held by ceph-osd/32180:
2015-11-28T07:39:09.677127-05:00 plana77 kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff81683b8d>] process_backlog+0x14d/0x240
2015-11-28T07:39:09.677155-05:00 plana77 kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816cd27f>] ip_local_deliver_finish+0x3f/0x380
2015-11-28T07:39:09.677182-05:00 plana77 kernel: #2:  (slock-AF_INET){+.-.-.}, at: [<ffffffff8166be3d>] sk_clone_lock+0x19d/0x4a0
2015-11-28T07:39:09.677209-05:00 plana77 kernel: #3:  (&(&queue->rskq_lock)->rlock){+.-.-.}, at: [<ffffffff816da598>] inet_csk_reqsk_queue_add+0x28/0xa0
2015-11-28T07:39:09.677237-05:00 plana77 kernel:
stack backtrace:
2015-11-28T07:39:09.677269-05:00 plana77 kernel: CPU: 4 PID: 32180 Comm: ceph-osd Tainted: G          I     4.4.0-rc2-ceph-12616-gb0098c3 #1
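For readers unfamiliar with this lockdep report: roughly speaking, when lock debugging is enabled the slab free paths call debug_check_no_locks_freed(), which walks the current task's held-lock stack and complains if any held lock lives inside the memory range being freed. The C sketch below is a simplified user-space model of that range check, not kernel code; struct held_lock_model, find_held_lock_in_range() and check_no_locks_freed() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified model of one entry on a task's held-lock stack. */
struct held_lock_model {
    uintptr_t lock_addr;   /* address of the lock object itself */
    const char *name;      /* lock class name, e.g. "rskq_lock" */
};

/* Return the index of the first held lock whose address lies inside
 * [mem_from, mem_from + mem_len), or -1 if none does. */
static int find_held_lock_in_range(const struct held_lock_model *held,
                                   size_t nr_held,
                                   uintptr_t mem_from, size_t mem_len)
{
    assert(mem_len > 0);
    for (size_t i = 0; i < nr_held; i++) {
        /* Unsigned subtraction: addresses below mem_from wrap to huge
         * values, so a single comparison checks both range bounds. */
        if (held[i].lock_addr - mem_from < mem_len)
            return (int)i;
    }
    return -1;
}

/* Model of the check done on free: report and return 1 if a held lock
 * is about to be freed, otherwise return 0. */
static int check_no_locks_freed(const struct held_lock_model *held,
                                size_t nr_held,
                                uintptr_t mem_from, size_t mem_len)
{
    int idx = find_held_lock_in_range(held, nr_held, mem_from, mem_len);

    if (idx >= 0) {
        printf("[ BUG: held lock freed! ] %s\n", held[idx].name);
        return 1;
    }
    return 0;
}
```

Assuming the end address printed in the report is inclusive, both splats free a region of the same size: 0xffff8801cd5728ef - 0xffff8801cd571ec0 + 1 = 0xa30 (2608) bytes, and likewise for the plana77 range. In both cases the lock found inside that region is rskq_lock, taken in inet_csk_reqsk_queue_add().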
Actions #1

Updated by John Spray over 8 years ago

  • Category set to OSD
  • Priority changed from Normal to Urgent
Actions #2

Updated by Sage Weil over 8 years ago

  • Project changed from Ceph to Linux kernel client
  • Subject changed from ceph-osd "BUG: held lock freed!" to "BUG: held lock freed!"
  • Category deleted (OSD)
Actions #3

Updated by Ilya Dryomov about 8 years ago

  • Status changed from New to Closed

This looks like a problem in the networking stack; the kernel client isn't involved here. Closing, given that it was a random testing kernel and the issue didn't recur.

Actions #4

Updated by Loïc Dachary about 8 years ago

  • Status changed from Closed to 12
Actions #5

Updated by Ilya Dryomov about 8 years ago

No, this occurrence is on a very recent kernel, so it could be a kernel bug. I'll look into it as time allows.

Actions #6

Updated by Ilya Dryomov about 8 years ago

  • Priority changed from Urgent to Normal
Actions #7

Updated by Ilya Dryomov about 8 years ago

  • Assignee set to Ilya Dryomov
Actions #8

Updated by Yuri Weinstein about 8 years ago

  • Priority changed from Normal to Urgent
  • Source changed from other to Q/A
  • Release set to infernalis
  • ceph-qa-suite powercycle added

infernalis v9.2.1 testing
Run: http://pulpito.ceph.com/teuthology-2016-02-18_09:34:39-powercycle-infernalis-testing-basic-smithi/
Jobs: ['15330', '15344', '15352', '15354', '15355', '15368', '15370']

Actions #9

Updated by Ilya Dryomov about 8 years ago

  • Status changed from 12 to Closed

This indeed turned out to be a regression in the networking layer; see http://www.spinics.net/lists/ceph-devel/msg28655.html.

I pushed Eric's patch ("tcp/dccp: fix another race at listener dismantle") for it and a prerequisite ("tcp: md5: release request socket instead of listener") to testing to avoid further failures.
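Only the linked thread and the patches themselves describe the actual fix. Purely as a generic illustration of the pattern the splat points at (a task still holding rskq_lock while the memory containing that lock is freed), here is a user-space C sketch of one common remedy: decide under the lock that a reference must be dropped, but drop it only after unlocking. It is not the code of Eric's patch, and struct obj, obj_put() and obj_queue_add() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object that embeds its own lock,
 * loosely analogous to a queue lock living inside a socket. */
struct obj {
    pthread_mutex_t lock;
    atomic_int refcnt;
    int nr_queued;          /* protected by lock */
};

static int nr_frees;        /* lets a caller observe when an object is freed */

static struct obj *obj_new(void)
{
    struct obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    pthread_mutex_init(&o->lock, NULL);
    atomic_init(&o->refcnt, 1);
    o->nr_queued = 0;
    return o;
}

static void obj_get(struct obj *o)
{
    atomic_fetch_add(&o->refcnt, 1);
}

/* Drop one reference; the last put frees the object and its embedded lock. */
static void obj_put(struct obj *o)
{
    int old = atomic_fetch_sub(&o->refcnt, 1);

    assert(old > 0);
    if (old == 1) {
        pthread_mutex_destroy(&o->lock);
        free(o);
        nr_frees++;
    }
}

/* The caller hands over one reference. If the item is accepted, the
 * reference is kept; otherwise it must be dropped. Calling obj_put()
 * before pthread_mutex_unlock() could free o while o->lock is still held,
 * the shape of bug lockdep reports as "held lock freed". Here the decision
 * is recorded under the lock and the put happens only after unlocking. */
static void obj_queue_add(struct obj *o, int accept)
{
    int drop_ref = 0;

    pthread_mutex_lock(&o->lock);
    if (accept)
        o->nr_queued++;
    else
        drop_ref = 1;
    pthread_mutex_unlock(&o->lock);

    if (drop_ref)
        obj_put(o);
}
```

Deferring the put keeps the lock's lifetime strictly longer than the critical section, even when the rejected reference turns out to be the last one.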
