Bug #13926
Closed
Lockup in multithreaded application
Description
A multithreaded application ends up in a blocked state when multiple threads try to access the same file.
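The access pattern described above can be sketched roughly as follows. This is only an illustration, not the reporter's `hammer` program, whose source is not part of this report; the function name `hammer_file`, the thread count, and the iteration count are all assumptions. To try it against the reported setup, the path would need to point at a file on a ceph-fuse mount.

```python
import os
import tempfile
import threading


def hammer_file(path, n_threads=8, n_iters=100):
    """Have n_threads threads repeatedly open and read the same file.

    Returns the total number of bytes read across all threads.
    """
    totals = [0] * n_threads  # one slot per thread, so no lock is needed

    def worker(idx):
        for _ in range(n_iters):
            # Each iteration opens its own handle on the shared file.
            with open(path, "rb") as fh:
                totals[idx] += len(fh.read())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # on a hung mount this join would never return
    return sum(totals)


if __name__ == "__main__":
    # Hypothetical stand-in: a local temp file instead of a file on the ceph-fuse mount.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
    try:
        print(hammer_file(tmp.name))
    finally:
        os.unlink(tmp.name)
```

On a healthy filesystem the call returns quickly; a lockup like the one reported would show up as the program never getting past the `join()` calls.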
- apt-cache policy ceph-fuse
ceph-fuse:
Installed: 0.94.5-1trusty
ceph.conf:
[global]
osd_pool_default_pgp_num = 512
osd_pool_default_min_size = 2
auth_service_required = cephx
mon_initial_members = <one monitor>
fsid = <fs id>
cluster_network = <network>
auth_supported = cephx
auth_cluster_required = cephx
mon_host = <monitor hosts>
auth_client_required = cephx
osd_pool_default_size = 3
osd_pool_default_pg_num = 512
public_network = <network>
#fuse_use_invalidate_cb = True
debug_client=20/20
The locked-up process consumes 100% CPU in system calls at that time (the host has 40 CPU cores):
top - 10:38:26 up 7 min, 1 user, load average: 0.99, 0.64, 0.30
Tasks: 40 total, 2 running, 38 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 2.5 sy, 0.0 ni, 97.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 26411547+total, 2673076 used, 26144240+free, 43872 buffers
KiB Swap: 26855424+total, 0 used, 26855424+free. 687924 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4039 blinke 20 0 16.198g 38060 2152 R 100.1 0.0 2:41.25 hammer
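As a quick consistency check on the output above: one process saturating a single core on a 40-core host should account for about 2.5% of total CPU time, which matches the `2.5 sy` figure in the summary line. A minimal sketch of that arithmetic (the helper name is made up for illustration):

```python
def system_share_percent(process_cpu_percent, n_cores):
    """Share of the host's total CPU capacity used by one process, in percent."""
    return process_cpu_percent / n_cores


# 100.1% of one core, spread over 40 cores
print(round(system_share_percent(100.1, 40), 1))  # 2.5
```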
Trying to terminate the process (e.g. CTRL-C) kills the worker thread, but the main thread keeps running. Accessing the list of file handles associated with the process (/proc/$PID/fd) also blocks.
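Since listing `/proc/$PID/fd` blocks, one possible alternative for seeing which threads are still alive and what state they are in is to read the per-thread `status` files under `/proc/<pid>/task/`. The sketch below is Linux-only and the helper name `thread_states` is an assumption; whether these reads stay responsive during this particular lockup was not tested in this report.

```python
import os


def thread_states(pid):
    """Return {tid: state} for every thread of pid, read from procfs (Linux only)."""
    states = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/status") as fh:
                for line in fh:
                    # Example line: "State:\tR (running)"
                    if line.startswith("State:"):
                        states[int(tid)] = line.split(":", 1)[1].strip()
                        break
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were iterating
    return states


if __name__ == "__main__":
    # Demonstration on the current process; pass the PID of the hung process instead.
    for tid, state in sorted(thread_states(os.getpid()).items()):
        print(tid, state)
```

A thread stuck in state `R` with steadily growing CPU time would be consistent with the busy loop in a system call described above, while `D` would indicate uninterruptible sleep.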
Debug output is available with ceph-post-file id a8eb75d5-cc13-430a-bed8-428c8a33d6d8
Updated by Zheng Yan over 8 years ago
Looks like you have debug_client=20. Could you upload the client log somewhere?
Updated by Burkhard Linke over 8 years ago
The log file has already been uploaded with ceph-post-file; the id is a8eb75d5-cc13-430a-bed8-428c8a33d
Updated by Zheng Yan about 8 years ago
- Status changed from New to Need More Info
I did not find anything in the log.
Updated by Greg Farnum almost 8 years ago
- Category changed from 45 to Correctness/Safety
- Component(FS) ceph-fuse added
Updated by Zheng Yan over 5 years ago
- Status changed from Need More Info to Closed
No update for a long time.