Bug #45635 (closed)

kclient: kclient node gets stuck due to a double lock

Added by Xiubo Li almost 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: kcephfs
Crash signature (v1):
Crash signature (v2):

Description

[183549.259361] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[183549.260653] Call Trace:
[183549.261436]  ? __schedule+0x272/0x5b0
[183549.262431]  schedule+0x45/0xb0
[183549.263322]  schedule_preempt_disabled+0x5/0x10
[183549.264472]  __mutex_lock.isra.0+0x262/0x4b0
[183549.265549]  ? get_page_from_freelist+0x710/0x1000
[183549.266719]  ? __ceph_caps_issued+0x68/0xc0 [ceph]
[183549.267878]  ceph_check_caps+0x4a9/0x980 [ceph]
[183549.268965]  ? con_get+0xc/0x20 [ceph]
[183549.269892]  ? msg_con_set.isra.0+0x31/0x50 [libceph]
[183549.271113]  ? ceph_con_send+0xbc/0x1b0 [libceph]
[183549.272263]  ? __send_request+0x683/0x890 [ceph]
[183549.273391]  ? select_collect2+0xe0/0xe0
[183549.274474]  ceph_put_cap_refs+0x24c/0x320 [ceph]
[183549.275643]  send_mds_reconnect+0x268/0x679 [ceph]
[183549.276779]  ceph_mdsc_handle_mdsmap+0x5b6/0x620 [ceph]
[183549.278007]  ? extra_mon_dispatch+0x2f/0x40 [ceph]
[183549.279192]  extra_mon_dispatch+0x2f/0x40 [ceph]
[183549.280325]  dispatch+0x527/0x8d0 [libceph]
[183549.281423]  ceph_con_workfn+0xcc6/0x29d0 [libceph]
[183549.282614]  ? __switch_to_asm+0x34/0x70
[183549.283620]  ? __switch_to_asm+0x40/0x70
[183549.284640]  ? __switch_to_asm+0x34/0x70
[183549.285640]  ? __switch_to_asm+0x40/0x70
[183549.286682]  ? __switch_to_asm+0x34/0x70
[183549.287678]  ? __switch_to_asm+0x34/0x70
[183549.288693]  ? __switch_to_asm+0x40/0x70
[183549.289705]  ? __switch_to_asm+0x34/0x70
[183549.290696]  ? __switch_to+0x162/0x3f0
[183549.291634]  process_one_work+0x1d2/0x3a0
[183549.292682]  worker_thread+0x45/0x3c0
[183549.293647]  kthread+0xf6/0x130
[183549.294539]  ? process_one_work+0x3a0/0x3a0
[183549.295956]  ? kthread_park+0x80/0x80
[183549.297424]  ret_from_fork+0x22/0x40
Actions #1

Updated by Xiubo Li almost 4 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Xiubo Li almost 4 years ago

ceph_check_caps() may take and release the session lock (session->s_mutex) internally.

There are deadlock cases such as the following (a simplified userspace sketch follows the chain):

handle_forward()
...
mutex_lock(&mdsc->mutex)
...
ceph_mdsc_put_request()
  --> ceph_mdsc_release_request()
    --> ceph_put_cap_request()
      --> ceph_put_cap_refs()
        --> ceph_check_caps()
...
mutex_unlock(&mdsc->mutex)
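
A minimal userspace sketch of the hazard this chain creates, using pthread mutexes in place of the kernel's mdsc->mutex and session->s_mutex: ceph_check_caps() ends up taking the session lock while mdsc->mutex is already held, so any path that takes the two locks in the opposite order can deadlock against it. The opposite-order path and all names below are illustrative assumptions, not the actual ceph code; when run, the program intentionally deadlocks.

/*
 * Illustrative only: two threads taking two non-recursive mutexes in
 * opposite order, mirroring "hold mdsc->mutex, then take the session
 * lock inside ceph_check_caps()" against a hypothetical path that
 * takes the session lock first.
 * Build with: cc -pthread lock_order.c
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mdsc_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t session_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Like handle_forward() -> ... -> ceph_check_caps():
 * mdsc->mutex first, then session->s_mutex. */
static void *forward_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mdsc_mutex);
    sleep(1);                            /* widen the race window */
    pthread_mutex_lock(&session_mutex);  /* blocks while the other thread holds it */
    printf("forward path done\n");
    pthread_mutex_unlock(&session_mutex);
    pthread_mutex_unlock(&mdsc_mutex);
    return NULL;
}

/* Hypothetical opposite ordering: session->s_mutex first, then mdsc->mutex. */
static void *session_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&session_mutex);
    sleep(1);
    pthread_mutex_lock(&mdsc_mutex);     /* blocks while the other thread holds it */
    printf("session path done\n");
    pthread_mutex_unlock(&mdsc_mutex);
    pthread_mutex_unlock(&session_mutex);
    return NULL;
}

int main(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, forward_path, NULL);
    pthread_create(&b, NULL, session_path, NULL);
    pthread_join(a, NULL);   /* never returns: the two threads are deadlocked */
    pthread_join(b, NULL);
    return 0;
}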

There can also be double session lock cases, such as the following (again, a simplified sketch follows the chain):

send_mds_reconnect()
...
mutex_lock(&session->s_mutex);
...
  --> replay_unsafe_requests()
    --> ceph_mdsc_release_dir_caps()
      --> ceph_put_cap_refs()
        --> ceph_check_caps()
...
mutex_unlock(&session->s_mutex);
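
This second chain matches the call trace in the description and the "double lock" in the title: send_mds_reconnect() already holds session->s_mutex, and ceph_check_caps(), reached via ceph_put_cap_refs(), tries to take the same non-recursive mutex again, so the kworker blocks forever in __mutex_lock(). Below is a minimal userspace sketch of that pattern with a pthread mutex; all names are illustrative stand-ins, not the real ceph functions, and the program intentionally hangs when run.

/*
 * Illustrative only: re-acquiring a non-recursive mutex in the same
 * thread, mirroring send_mds_reconnect() -> ... -> ceph_check_caps()
 * both taking session->s_mutex.
 * Build with: cc -pthread double_lock.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t s_mutex;

/* Stand-in for ceph_check_caps(): takes the session lock internally. */
static void check_caps(void)
{
    pthread_mutex_lock(&s_mutex);    /* second acquisition: blocks forever */
    printf("never reached\n");
    pthread_mutex_unlock(&s_mutex);
}

/* Stand-in for ceph_put_cap_refs() dropping the last reference. */
static void put_cap_refs(void)
{
    check_caps();
}

/* Stand-in for send_mds_reconnect(): holds the session lock across the call. */
static void mds_reconnect(void)
{
    pthread_mutex_lock(&s_mutex);    /* first acquisition */
    put_cap_refs();                  /* deadlocks: the mutex is not recursive */
    pthread_mutex_unlock(&s_mutex);
}

int main(void)
{
    pthread_mutexattr_t attr;

    /* PTHREAD_MUTEX_NORMAL guarantees the relock blocks instead of erroring out. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
    pthread_mutex_init(&s_mutex, &attr);

    mds_reconnect();
    return 0;
}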
Actions #3

Updated by Xiubo Li almost 4 years ago

  • Status changed from In Progress to Fix Under Review
Actions #4

Updated by Xiubo Li over 3 years ago

  • Status changed from Fix Under Review to Resolved