Bug #37474

Ceph-fuse mount point hang

Added by wei wei rong over 5 years ago. Updated over 5 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: ceph cli
Target version:
% Done: 0%
Source: Support
Tags: Ceph-fuse
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We suffered a network disconnect of about 20 minutes overnight during a switch upgrade. Since then, the mount point cannot be opened, and `fusermount -u` also hangs.
The ceph-fuse process is still running:

root@sg-gcloud-10-65-28-33:~# ps -ef | grep ceph
root 337 2 0 Oct19 ? 00:00:00 [ceph-msgr]
root 4372 1 0 Oct19 ? 00:08:55 ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m x.x.x.x:6789,x.x.x.x:6789 /Repo/ -d --debug-client=20 --debug-ms=1 --debug-monc=20

debug log

2018-11-30 14:56:41.224350 7fb7b5139700 10 monclient: tick
2018-11-30 14:56:41.224365 7fb7b5139700 20 monclient: _check_auth_rotating not needed by client.admin
2018-11-30 14:56:42.216312 7fb7b713d700 20 client.24295 trim_cache size 5015 max 16384
2018-11-30 14:56:43.216396 7fb7b713d700 20 client.24295 trim_cache size 5015 max 16384
2018-11-30 14:56:44.216463 7fb7b713d700 20 client.24295 trim_cache size 5015 max 16384
2018-11-30 14:56:45.216537 7fb7b713d700 20 client.24295 trim_cache size 5015 max 16384
2018-11-30 14:56:46.216602 7fb7b713d700 10 client.24295 renew_caps()
2018-11-30 14:56:46.216620 7fb7b713d700 15 client.24295 renew_caps requesting from mds.0
2018-11-30 14:56:46.216623 7fb7b713d700 10 client.24295 renew_caps mds.0
2018-11-30 14:56:46.216626 7fb7b713d700 1 -- 10.65.28.33:0/917778705 --> x.x.x.x:6805/4114465471 -- client_session(request_renewcaps seq 181923) v2 -- ?+0 0x55b607e40b40 con 0x55b605b02600
2018-11-30 14:56:46.216643 7fb7b713d700 20 client.24295 trim_cache size 5015 max 16384
2018-11-30 14:56:47.216677 7fb7b713d700 20 client.24295 trim_cache size 5015 max 16384

strace log

root@sg-gcloud-10-65-28-33:/etc/ceph# strace -ff -p 4372
strace: Process 4372 attached with 32 threads
[pid 26266] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 26265] futex(0x55b607dd3a74, FUTEX_WAIT_PRIVATE, 16273, NULL <unfinished ...>
[pid 24530] read(11, <unfinished ...>
[pid 24469] read(11, <unfinished ...>
[pid 13482] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 32589] futex(0x7fb7aee28d3c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 24436] futex(0x7fb7ae526d3c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 32588] futex(0x7fb7af629d3c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 21335] futex(0x7fb7afe2ad3c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 4421] futex(0x7fb7b062bd3c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 4419] futex(0x7fb7b0e2cd3c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 4416] futex(0x7fb7b162dd3c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 4414] futex(0x55b605b6265c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 4413] futex(0x55b605b624d4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 4412] futex(0x55b605b6234c, FUTEX_WAIT_PRIVATE, 933, NULL <unfinished ...>
[pid 4411] futex(0x55b605b621c4, FUTEX_WAIT_PRIVATE, 757, NULL <unfinished ...>
[pid 4409] futex(0x55b605b7c674, FUTEX_WAIT_PRIVATE, 756793, NULL <unfinished ...>
[pid 4404] futex(0x55b605b56394, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 4403] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 4402] futex(0x55b605b5a44c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 4401] futex(0x55b605b5a284, FUTEX_WAIT_PRIVATE, 377651, NULL <unfinished ...>
[pid 4400] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 4399] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 4398] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 4397] futex(0x55b605b5a72c, FUTEX_WAIT_PRIVATE, 41, NULL <unfinished ...>
[pid 4396] futex(0x55b605b627e4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 4395] futex(0x55b605ad770c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 4394] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 4393] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 4392] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 4384] futex(0x55b605ade0bc, FUTEX_WAIT_PRIVATE, 8395399, NULL <unfinished ...>
[pid 4372] futex(0x7fffeccd22d0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff <unfinished ...>
[pid 4400] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 4400] futex(0x55b605b62d60, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 4400] futex(0x55b605ad761c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 7276941, {1543561045, 857290766}, ffffffff <unfinished ...>
[pid 4392] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 4392] futex(0x55b605b0a140, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 4392] futex(0x55b605b0a194, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1455471, {1543561050, 70055767}, ffffffff <unfinished ...>
[pid 4399] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 4399] futex(0x55b605ade0bc, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x55b605ade0b8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
[pid 4384] <... futex resumed> ) = 0
[pid 4399] futex(0x55b605b62d60, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 4384] futex(0x55b605ade038, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 4399] <... futex resumed> ) = 0
[pid 4384] <... futex resumed> ) = 0
[pid 4399] futex(0x55b605b62054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 7276923, {1543561046, 219489337}, ffffffff <unfinished ...>

ceph fuse version
root@sg-gcloud-10-65-28-33:/etc/ceph# ceph-fuse -v
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)

ceph server version
root@sg-gcloud-203-116-214-238:~# ceph -v
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)

History

#1 Updated by Patrick Donnelly over 5 years ago

  • Status changed from New to Rejected

The client was probably blacklisted. You should use `umount -f` to force the unmount.
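
As a rough sketch of that suggestion (not a verified procedure for this cluster), the steps could look like the script below. The `ceph osd blacklist ls` command lists blacklisted client addresses on releases of this era; newer releases rename it to `blocklist`. The mount point `/Repo/` comes from the `ps` output in the description, and the entry to look for would be the client address seen in the debug log (`10.65.28.33:0/917778705`). The `run` helper and the `DRY_RUN` and `MOUNT_POINT` variables are hypothetical names used only for illustration; with the default `DRY_RUN=1` the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch only: check for a blacklist entry, then force the unmount.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"
MOUNT_POINT="${MOUNT_POINT:-/Repo/}"   # mount point taken from the ps output

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On a node with an admin keyring: list blacklisted client addresses.
# (Newer Ceph releases call this "ceph osd blocklist ls".)
run ceph osd blacklist ls

# On the client: force the unmount, as suggested in the comment.
run umount -f "$MOUNT_POINT"
```

Setting `DRY_RUN=0` executes the commands for real; review the printed commands first, since a forced unmount discards any state the hung client still holds.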

Please ask questions like this on ceph-users in the future.
