Bug #21480

closed

bluestore: flush_commit is racy

Added by Sage Weil over 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Observed a hang on the 'osd bench' command:
/a/yuriw-2017-09-19_19:54:13-rados-wip-yuri-testing3-2017-09-19-1710-distro-basic-smithi/1648854

2017-09-19 22:29:11.978724 7f70bd261700  1 heartbeat_map is_healthy 'OSD::command_tp thread 0x7f709a9e7700' had suicide timed out after 900
2017-09-19 22:29:11.983003 7f709a9e7700 -1 *** Caught signal (Aborted) **
 in thread 7f709a9e7700 thread_name:tp_osd_cmd

 ceph version 13.0.0-1010-g1c941a3 (1c941a39eaa824e91551e0b37ebcca96e0f6f174) mimic (dev)
 1: (()+0xa39e89) [0x7f70c29c3e89]
 2: (()+0x10330) [0x7f70c0699330]
 3: (pthread_cond_wait()+0xc4) [0x7f70c0695404]
 4: (C_SaferCond::wait()+0x8c) [0x7f70c24f741c]
 5: (OSD::do_command(Connection*, unsigned long, std::vector<std::string, std::allocator<std::string> >&, ceph::buffer::list&)+0x1b7d) [0x7f70c24e4d7d]
 6: (OSD::CommandWQ::_process(OSD::Command*, ThreadPool::TPHandle&)+0x49) [0x7f70c25291a9]
 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa6e) [0x7f70c2a06d4e]
 8: (ThreadPool::WorkThread::entry()+0x10) [0x7f70c2a07c30]

flush_commit() appears to be racy: the state is set to KV_DONE without holding the osr lock, whereas the flush_commit() waiter relies on that lock.
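To illustrate the kind of locking discipline that would avoid such a race, here is a minimal sketch in which the state change and the waiter registration are both done under the same mutex. It is not the BlueStore code: `MiniSequencer`, `TxcState`, `mark_kv_done()` and the member names are hypothetical, and the real sequencer tracks a queue of transactions rather than a single state.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for an op sequencer.
// None of these names are taken from the Ceph source tree.
enum class TxcState { Prepare, KvQueued, KvDone };

class MiniSequencer {
 public:
  // Returns true if the transaction has already committed. Otherwise
  // registers on_commit to run once it does, and returns false.
  // The state check and the registration happen under one lock, so a
  // concurrent mark_kv_done() cannot slip in between them.
  bool flush_commit(std::function<void()> on_commit) {
    std::lock_guard<std::mutex> l(lock_);
    if (state_ >= TxcState::KvDone) return true;
    waiters_.push_back(std::move(on_commit));
    return false;
  }

  // Commit path: the state transition takes the same lock as the waiter.
  // If state_ were written without lock_, a waiter could observe the old
  // state, register after the waiter list was drained, and never fire.
  void mark_kv_done() {
    std::vector<std::function<void()>> ready;
    {
      std::lock_guard<std::mutex> l(lock_);
      assert(state_ != TxcState::KvDone);  // sketch: commit happens once
      state_ = TxcState::KvDone;
      ready.swap(waiters_);
    }
    // Run callbacks outside the lock so they may call back into us.
    for (auto& cb : ready) cb();
  }

 private:
  std::mutex lock_;
  TxcState state_ = TxcState::KvQueued;
  std::vector<std::function<void()>> waiters_;
};
```

In this sketch a caller that gets `false` from `flush_commit()` is guaranteed a later callback, which is the property a blocked waiter such as the one in the backtrace above depends on.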

The possibly good news is that there are only a handful of users of flush_commit(), so maybe we can just drop it.


Related issues 2 (0 open, 2 closed)

Copied to bluestore - Backport #24260: luminous: bluestore: flush_commit is racy (Resolved, Igor Fedotov)
Copied to bluestore - Backport #24261: mimic: bluestore: flush_commit is racy (Resolved, Prashant D)