Bug #13905: "BUG: held lock freed!"
Status: Closed
Done: 0%
Description
Seen:
http://pulpito.ceph.com/teuthology-2015-11-27_23:10:02-knfs-master-testing-basic-multi/1161921/
http://pulpito.ceph.com/sage-2015-11-26_05:24:56-rados-wip-sage-testing---basic-multi/1160684/
2015-11-26T10:59:24.851878-05:00 plana41 kernel: =========================
2015-11-26T10:59:24.851930-05:00 plana41 kernel: [ BUG: held lock freed! ]
2015-11-26T10:59:24.851964-05:00 plana41 kernel: 4.4.0-rc2-ceph-12616-gb0098c3 #1 Tainted: G I
2015-11-26T10:59:24.851995-05:00 plana41 kernel: -------------------------
2015-11-26T10:59:24.852046-05:00 plana41 kernel: ceph-osd/22228 is freeing memory ffff8801cd571ec0-ffff8801cd5728ef, with a lock still held there!
2015-11-26T10:59:24.852080-05:00 plana41 kernel: (&(&queue->rskq_lock)->rlock){+.-.-.}, at: [<ffffffff816da598>] inet_csk_reqsk_queue_add+0x28/0xa0
2015-11-26T10:59:24.852112-05:00 plana41 kernel: 5 locks held by ceph-osd/22228:
2015-11-26T10:59:24.852143-05:00 plana41 kernel: #0: (&(&p->alloc_lock)->rlock){+.+...}, at: [<ffffffff810c94bd>] switch_task_namespaces+0x3d/0x80
2015-11-26T10:59:24.852180-05:00 plana41 kernel: #1: (rcu_read_lock){......}, at: [<ffffffff816848eb>] netif_receive_skb_internal+0x4b/0x1f0
2015-11-26T10:59:24.852233-05:00 plana41 kernel: #2: (rcu_read_lock){......}, at: [<ffffffff816cd27f>] ip_local_deliver_finish+0x3f/0x380
2015-11-26T10:59:24.852294-05:00 plana41 kernel: #3: (slock-AF_INET){+.-.-.}, at: [<ffffffff8166be3d>] sk_clone_lock+0x19d/0x4a0
2015-11-26T10:59:24.852349-05:00 plana41 kernel: #4: (&(&queue->rskq_lock)->rlock){+.-.-.}, at: [<ffffffff816da598>] inet_csk_reqsk_queue_add+0x28/0xa0
2015-11-26T10:59:24.852408-05:00 plana41 kernel: stack backtrace:
2015-11-26T10:59:24.852464-05:00 plana41 kernel: CPU: 2 PID: 22228 Comm: ceph-osd Tainted: G I 4.4.0-rc2-ceph-12616-gb0098c3 #1
2015-11-28T07:39:09.676922-05:00 plana77 kernel: =========================
2015-11-28T07:39:09.676952-05:00 plana77 kernel: [ BUG: held lock freed! ]
2015-11-28T07:39:09.676980-05:00 plana77 kernel: 4.4.0-rc2-ceph-12616-gb0098c3 #1 Tainted: G I
2015-11-28T07:39:09.677016-05:00 plana77 kernel: -------------------------
2015-11-28T07:39:09.677044-05:00 plana77 kernel: ceph-osd/32180 is freeing memory ffff880224dca900-ffff880224dcb32f, with a lock still held there!
2015-11-28T07:39:09.677073-05:00 plana77 kernel: (&(&queue->rskq_lock)->rlock){+.-.-.}, at: [<ffffffff816da598>] inet_csk_reqsk_queue_add+0x28/0xa0
2015-11-28T07:39:09.677100-05:00 plana77 kernel: 4 locks held by ceph-osd/32180:
2015-11-28T07:39:09.677127-05:00 plana77 kernel: #0: (rcu_read_lock){......}, at: [<ffffffff81683b8d>] process_backlog+0x14d/0x240
2015-11-28T07:39:09.677155-05:00 plana77 kernel: #1: (rcu_read_lock){......}, at: [<ffffffff816cd27f>] ip_local_deliver_finish+0x3f/0x380
2015-11-28T07:39:09.677182-05:00 plana77 kernel: #2: (slock-AF_INET){+.-.-.}, at: [<ffffffff8166be3d>] sk_clone_lock+0x19d/0x4a0
2015-11-28T07:39:09.677209-05:00 plana77 kernel: #3: (&(&queue->rskq_lock)->rlock){+.-.-.}, at: [<ffffffff816da598>] inet_csk_reqsk_queue_add+0x28/0xa0
2015-11-28T07:39:09.677237-05:00 plana77 kernel: stack backtrace:
2015-11-28T07:39:09.677269-05:00 plana77 kernel: CPU: 4 PID: 32180 Comm: ceph-osd Tainted: G I 4.4.0-rc2-ceph-12616-gb0098c3 #1
Updated by John Spray over 8 years ago
- Category set to OSD
- Priority changed from Normal to Urgent
Updated by Sage Weil over 8 years ago
- Project changed from Ceph to Linux kernel client
- Subject changed from ceph-osd "BUG: held lock freed!" to "BUG: held lock freed!"
- Category deleted (OSD)
Updated by Ilya Dryomov about 8 years ago
- Status changed from New to Closed
This looks like a problem in the networking stack; the kernel client isn't in play here. Closing, given that it was a random testing kernel and the issue didn't repeat itself.
Updated by Loïc Dachary about 8 years ago
- Status changed from Closed to 12
Happened twice in three runs of the powercycle suite. Should we rule it environmental and whitelist the message in teuthology?
Updated by Ilya Dryomov about 8 years ago
No - this occurrence is on a very recent kernel, so it could be a kernel bug. I'll look into it as time allows.
Updated by Ilya Dryomov about 8 years ago
- Priority changed from Urgent to Normal
Updated by Yuri Weinstein about 8 years ago
- Priority changed from Normal to Urgent
- Source changed from other to Q/A
- Release set to infernalis
- ceph-qa-suite powercycle added
infernalis v9.2.1 testing
Run: http://pulpito.ceph.com/teuthology-2016-02-18_09:34:39-powercycle-infernalis-testing-basic-smithi/
Jobs: ['15330', '15344', '15352', '15354', '15355', '15368', '15370']
Updated by Ilya Dryomov about 8 years ago
- Status changed from 12 to Closed
This indeed turned out to be a regression in the networking layer; see http://www.spinics.net/lists/ceph-devel/msg28655.html.
I pushed Eric's patch ("tcp/dccp: fix another race at listener dismantle") for it and a prerequisite ("tcp: md5: release request socket instead of listener") to testing to avoid further failures.