Bug #43199

recursive lock of RGWCoroutinesManager::lock (43)

Added by Chang Liu about 2 months ago. Updated about 18 hours ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags: multisite elasticsearch
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

I backported this PR (https://github.com/ceph/ceph/pull/29637) to luminous and found that RGW segfaults as follows:

    -2> 2019-12-09 09:43:52.213009 7f76635f9e00 20 _will_lock RGWCoroutinesManager::lock (43)
    -1> 2019-12-09 09:43:52.213010 7f76635f9e00  0
recursive lock of RGWCoroutinesManager::lock (43)
 ceph version 12.2.8-387-g14bf2e5 (14bf2e55a49eef6ddcc926f8a20844f8b0ecbdcf) luminous (stable)
 1: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x167) [0x7f76630ce417]
 2: (RGWCoroutinesManager::run(RGWCoroutine*)+0x98) [0x7f76630cf548]
 3: (RGWElasticDataSyncModule::init(RGWDataSyncEnv*, unsigned long)+0x341) [0x7f76631e99c1]
 4: (RGWDataSyncCR::operate()+0x1c3) [0x7f76630a0bb3]
 5: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x7e) [0x7f76630cbb8e]
 6: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x4cb) [0x7f76630ce77b]
 7: (RGWCoroutinesManager::run(RGWCoroutine*)+0x98) [0x7f76630cf548]
 8: (RGWRemoteDataLog::run_sync(int)+0xe3) [0x7f766307c3f3]
 9: (main()+0x1f7db) [0x7f7662f9547b]
 10: (__libc_start_main()+0xf5) [0x7f765680c505]
 11: (()+0x1751b0) [0x7f7662fa11b0]

previously locked at
 ceph version 12.2.8-387-g14bf2e5 (14bf2e55a49eef6ddcc926f8a20844f8b0ecbdcf) luminous (stable)
 1: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x4ab) [0x7f76630ce75b]
 2: (RGWCoroutinesManager::run(RGWCoroutine*)+0x98) [0x7f76630cf548]
 3: (RGWRemoteDataLog::run_sync(int)+0xe3) [0x7f766307c3f3]
 4: (main()+0x1f7db) [0x7f7662f9547b]
 5: (__libc_start_main()+0xf5) [0x7f765680c505]
 6: (()+0x1751b0) [0x7f7662fa11b0]

It seems that it is invalid to run an RGWCoroutinesManager inside another RGWCoroutinesManager. The RGW ES sync module works as expected if I disable lockdep detection, as follows:

  void init(RGWDataSyncEnv *sync_env, uint64_t instance_id) override {
    conf->init_instance(sync_env->store->svc()->zone->get_realm(), instance_id);
    // try to get elastic search version
    lockdep_unregister_ceph_context(sync_env->store->ctx());
    .....
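
The two backtraces are consistent with a checker that identifies locks by name: the outer run() already holds a lock named RGWCoroutinesManager::lock when the run() started from RGWElasticDataSyncModule::init takes a lock registered under the same name. The sketch below illustrates that behaviour under the assumption of a name-keyed, single-threaded checker. `NameKeyedLockdep`, `HeldName`, `ToyManager` and `nested_run_trips_checker` are hypothetical names for illustration only; this is not Ceph's lockdep or RGWCoroutinesManager code, and the report does not say whether the nested manager is a distinct instance.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical checker: remembers the *names* of locks currently held.
// Single-threaded toy; a real checker would track this per thread.
class NameKeyedLockdep {
 public:
  void will_lock(const std::string& name) {
    // A second acquisition under the same name is reported as recursive,
    // even if it belongs to a different lock object.
    if (!held_.insert(name).second) {
      throw std::runtime_error("recursive lock of " + name);
    }
  }
  void unlocked(const std::string& name) {
    assert(held_.count(name) == 1);
    held_.erase(name);
  }

 private:
  std::set<std::string> held_;
};

// RAII helper: registers the name on construction, releases on destruction.
// If will_lock() throws, the destructor does not run, so the outer holder's
// registration is left intact.
struct HeldName {
  NameKeyedLockdep& dep;
  std::string name;
  HeldName(NameKeyedLockdep& d, std::string n) : dep(d), name(std::move(n)) {
    dep.will_lock(name);
  }
  ~HeldName() { dep.unlocked(name); }
};

// Hypothetical stand-in for a manager whose run() holds its own lock
// while the scheduled work executes.
class ToyManager {
 public:
  explicit ToyManager(NameKeyedLockdep& dep) : dep_(dep) {}

  template <typename Work>
  void run(Work&& work) {
    HeldName held(dep_, "ToyManager::lock");    // checker sees only the name
    std::lock_guard<std::mutex> guard(mutex_);  // the real, per-instance lock
    work();
  }

 private:
  NameKeyedLockdep& dep_;
  std::mutex mutex_;
};

// Returns true if running one manager from inside another trips the checker.
bool nested_run_trips_checker() {
  NameKeyedLockdep dep;
  ToyManager outer(dep);
  ToyManager inner(dep);  // distinct object, same lock name
  try {
    outer.run([&inner] { inner.run([] {}); });
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

In this toy the two mutexes are distinct, so the nested call would not deadlock; the checker still rejects it because both locks share one name. Unregistering the context from the checker, as in the snippet above, hides the report rather than removing the nested run.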

History

#1 Updated by Chang Liu about 2 months ago

Could we reuse the data sync instance's RGWCoroutinesManager here?
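
As a rough illustration of the idea, the sketch below hands the follow-up step to the scheduler that is already running instead of constructing a second one and calling run() on it. It is a toy under stated assumptions: `ToyScheduler`, `spawn` and `run_init_on_parent_scheduler` are hypothetical names and do not correspond to the RGWCoroutinesManager API or to the eventual fix.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded scheduler standing in for a coroutine manager.
class ToyScheduler {
 public:
  using Task = std::function<void(ToyScheduler&)>;

  // Queue a task; may be called from inside a task that is currently running.
  void spawn(Task task) { queue_.push_back(std::move(task)); }

  // Drain the queue in FIFO order. A manager-wide lock would be needed
  // only in this one loop, so there is no nested acquisition.
  void run() {
    while (!queue_.empty()) {
      Task task = std::move(queue_.front());
      queue_.pop_front();
      task(*this);
    }
  }

 private:
  std::deque<Task> queue_;
};

// Runs a parent step that schedules its init step on the same scheduler,
// and returns the order in which the steps executed.
std::vector<std::string> run_init_on_parent_scheduler() {
  std::vector<std::string> trace;
  ToyScheduler sched;
  sched.spawn([&trace](ToyScheduler& s) {
    trace.push_back("data sync: started");
    // No second scheduler here: the init step joins the running queue.
    s.spawn([&trace](ToyScheduler&) {
      trace.push_back("module init: finished");
    });
  });
  sched.run();
  return trace;
}
```

The trade-off in this toy is that the parent step no longer blocks on the init step; it has to be written so that whatever depends on the init result runs after the queued step completes.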

#2 Updated by Casey Bodley about 1 month ago

  • Assignee set to Casey Bodley

#3 Updated by Casey Bodley about 1 month ago

  • Status changed from New to In Progress

I added a comment about this to https://github.com/ceph/ceph/pull/29637/files#r357250244. We should address that on master first, then include the change with this backport.

#4 Updated by Chang Liu about 1 month ago

  • Pull request ID set to 32269

#5 Updated by Casey Bodley 22 days ago

  • Status changed from In Progress to Fix Under Review
  • Tags set to multisite elasticsearch

#6 Updated by Daniel Gryniewicz about 18 hours ago

  • Status changed from Fix Under Review to Resolved
