Bug #37733

os/bluestore: fixup access a destroy cond cause deadlock or undefine behaviors

Added by bing lin 10 months ago. Updated 8 months ago.

Status: Resolved
Priority: High
Assignee: -
Target version: -
Start date: 12/21/2018
Due date:
% Done: 0%
Source:
Tags:
Backport: luminous,mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

1. The OSD was marked down because it stopped responding to heartbeats.
2. After attaching gdb, I found a thread hung in __lll_lock_wait:

(gdb) t 2
[Switching to thread 2 (Thread 0x7f6af1c87700 (LWP 1543))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     2:      movl    %edx, %eax

3. cond.__data.__lock is 2, which is what causes the hang:
(gdb) p cond
$7 = {_M_cond = {__data = {__lock = 2, __futex = 0, __total_seq = 18446744073709551615, __wakeup_seq = 94435500207862, __woken_seq = 0, __mutex = 0xe7d39a28b922f100, __nwaiters = 0, __broadcast_seq = 0},
    __size = "\002\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377\366\256qz\343U", '\000' <repeats 11 times>, "\361\"\271(\232\323\347\000\000\000\000\000\000\000", __align = 2}}

4. I wrote a test to confirm this. The main finding: when a cond is destroyed, cond.__data.__lock is set to 1, which means the cond must not be used again until pthread_cond_init is called on it. If a destroyed cond is accessed anyway, the calling thread hangs in __lll_lock_wait.
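
The lifetime rule in point 4 can be illustrated with a small sketch. This is not the fix from the pull request and not Ceph code: Completion and run_and_wait are hypothetical names made up for this example. It shows one common way to keep a condition variable from being destroyed while another thread may still touch it, by giving the waiter and the notifier shared ownership of it.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical helper type (not from Ceph): everything the waiter and the
// notifier share lives in one reference-counted object, so the condition
// variable is destroyed only after both sides have dropped their reference.
struct Completion {
  std::mutex lock;
  std::condition_variable cond;
  bool done = false;
  int result = 0;
};

// Hypothetical function: hands work to another thread and waits for it.
int run_and_wait(int value) {
  auto c = std::make_shared<Completion>();

  // The worker captures its own copy of the shared_ptr.
  std::thread worker([c, value] {
    {
      std::lock_guard<std::mutex> l(c->lock);
      c->result = value * 2;
      c->done = true;
    }
    // The notify happens after the mutex is released. If Completion lived
    // on the waiter's stack, the waiter could already have woken up,
    // returned, and destroyed cond by now. The worker's reference rules
    // that out here.
    c->cond.notify_one();
  });

  int r = 0;
  {
    std::unique_lock<std::mutex> l(c->lock);
    c->cond.wait(l, [&] { return c->done; });
    r = c->result;
  }

  // The waiter gives up its reference early, as if its frame were gone.
  // The object stays alive until the worker's copy is released too.
  c.reset();

  worker.join();
  return r;
}
```

Calling run_and_wait(21) is expected to return 42. The design choice worth noting is that the notifier never touches a condition variable it does not co-own, so the destroy can only happen after the last access.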


Related issues

Copied to bluestore - Backport #38142: luminous: os/bluestore: fixup access a destroy cond cause deadlock or undefine behaviors Resolved
Copied to bluestore - Backport #38143: mimic: os/bluestore: fixup access a destroy cond cause deadlock or undefine behaviors Resolved

History

#1 Updated by Sage Weil 10 months ago

  • Status changed from New to Need Review

#2 Updated by Sage Weil 9 months ago

  • Priority changed from Normal to High
  • Backport set to luminous,mimic

#3 Updated by Kefu Chai 9 months ago

  • Pull request ID set to 25631

#4 Updated by Neha Ojha 9 months ago

  • Status changed from Need Review to Pending Backport

#5 Updated by Nathan Cutler 8 months ago

  • Copied to Backport #38142: luminous: os/bluestore: fixup access a destroy cond cause deadlock or undefine behaviors added

#6 Updated by Nathan Cutler 8 months ago

  • Copied to Backport #38143: mimic: os/bluestore: fixup access a destroy cond cause deadlock or undefine behaviors added

#7 Updated by Nathan Cutler 8 months ago

  • Status changed from Pending Backport to Resolved
