Bug #22080
radosgw-admin data sync run crashes
% Done: 0%
Backport: luminous
Regression: No
Severity: 3 - minor
Description
Seen on a multisite setup with a jewel master and a luminous secondary. It may be reproducible on a luminous-luminous cluster as well, but I haven't tried that yet.
$ radosgw-admin data sync run --source-zone=az1
ceph version 12.2.1-367-g40d92ddf14 (40d92ddf1435ebeea6d9c17464367ef9ad332f0e) luminous (stable)
 1: (ceph::BackTrace::BackTrace(int)+0x48) [0x55d8caafb994]
 2: (()+0x9f6b2b) [0x55d8caafab2b]
 3: (()+0x10b10) [0x7fdfaa0d6b10]
 4: (RGWDataSyncCR::operate()+0x1db) [0x55d8ca81fff9]
 5: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x15b) [0x55d8ca85dced]
 6: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x1fd) [0x55d8ca85f51f]
 7: (RGWCoroutinesManager::run(RGWCoroutine*)+0x9b) [0x55d8ca860847]
 8: (RGWRemoteDataLog::run_sync(int)+0xc1) [0x55d8ca802615]
 9: (RGWDataSyncStatusManager::run()+0x28) [0x55d8ca6dbeb8]
 10: (main()+0x17ae5) [0x55d8ca6c0579]
 11: (__libc_start_main()+0xf5) [0x7fdf9ddfa6e5]
 12: (_start()+0x29) [0x55d8ca69c7f9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Running this through gdb:
1456 data_sync_module = sync_env->sync_module->get_data_handler();
Missing separate debuginfos, use: zypper install krb5-debuginfo-1.12.5-9.1.x86_64 libblkid1-debuginfo-2.29.2-3.4.x86_64 libcurl4-debuginfo-7.37.0-23.1.x86_64 libexpat1-debuginfo-2.1.0-24.1.x86_64 libfreebl3-debuginfo-3.28.6-44.1.x86_64 libibverbs1-debuginfo-14-6.4.x86_64 libkeyutils1-debuginfo-1.5.9-7.13.x86_64 libldap-2_4-2-debuginfo-2.4.44-18.1.x86_64 libopenssl1_0_0-debuginfo-1.0.2j-10.1.x86_64 libpcre1-debuginfo-8.39-11.1.x86_64 libselinux1-debuginfo-2.5-4.17.x86_64 libsoftokn3-debuginfo-3.28.6-44.1.x86_64 liburcu0-debuginfo-debuginfo-0.8.8-5.3.x86_64 liburcu2-debuginfo-debuginfo-0.8.7-4.1.x86_64 libuuid1-debuginfo-2.29.2-3.4.x86_64 libz1-debuginfo-1.2.8-13.15.x86_64 mozilla-nspr-debuginfo-4.17-1.1.x86_64 mozilla-nss-debuginfo-3.33-2.1.x86_64
(gdb) bt
#0 0x0000555555c6fff9 in RGWDataSyncCR::operate (this=0x5555566037b0) at /ssd/builds/cpp/ceph_cmake_new/src/rgw/rgw_data_sync.cc:1456
#1 0x0000555555cadced in RGWCoroutinesStack::operate (this=0x555556617a10, _env=0x7fffffff97b0) at /ssd/builds/cpp/ceph_cmake_new/src/rgw/rgw_coroutine.cc:195
#2 0x0000555555caf51f in RGWCoroutinesManager::run (this=0x7fffffffb0a8, stacks=...) at /ssd/builds/cpp/ceph_cmake_new/src/rgw/rgw_coroutine.cc:485
#3 0x0000555555cb0847 in RGWCoroutinesManager::run (this=0x7fffffffb0a8, op=0x5555566031c0) at /ssd/builds/cpp/ceph_cmake_new/src/rgw/rgw_coroutine.cc:624
#4 0x0000555555c52615 in RGWRemoteDataLog::run_sync (this=0x7fffffffb0a8, num_shards=128) at /ssd/builds/cpp/ceph_cmake_new/src/rgw/rgw_data_sync.cc:1643
#5 0x0000555555b2beb8 in RGWDataSyncStatusManager::run (this=0x7fffffffb050) at /ssd/builds/cpp/ceph_cmake_new/src/rgw/rgw_data_sync.h:320
#6 0x0000555555b10579 in main (argc=7, argv=0x7fffffffda48) at /ssd/builds/cpp/ceph_cmake_new/src/rgw/rgw_admin.cc:6438
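It looks like we may not initialize the sync module instance in radosgw-admin (sync_modules_manage->create_instance in rados init)? If so, sync_env->sync_module would still be null when line 1456 of rgw_data_sync.cc dereferences it, which would match the crash in frame #0.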
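To illustrate the suspected failure mode, here is a minimal, self-contained C++ sketch. It is not the Ceph code and not the fix from the linked pull requests: `SyncEnv`, `SyncModule`, `DataHandler`, `get_handler_checked` and `init_sync_module` are hypothetical stand-ins, and the only assumption carried over from the report is that a module pointer is dereferenced before anything has created the module instance.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the real RGW types; names are illustrative only.
struct DataHandler {
  std::string name;
};

struct SyncModule {
  DataHandler handler{"default"};
  DataHandler* get_data_handler() { return &handler; }
};

struct SyncEnv {
  // Stays null if nothing ever creates a module instance, which is the
  // situation the report suspects for the admin tool.
  std::shared_ptr<SyncModule> sync_module;
};

// Returns the data handler, or nullptr when no sync module was set up,
// instead of dereferencing a null pointer.
DataHandler* get_handler_checked(const SyncEnv& env) {
  if (!env.sync_module) {
    return nullptr;
  }
  return env.sync_module->get_data_handler();
}

// Hypothetical init step standing in for "create the sync module instance".
void init_sync_module(SyncEnv& env) {
  env.sync_module = std::make_shared<SyncModule>();
}

// Usage sketch: before init the lookup reports "no module"; after init it succeeds.
void example() {
  SyncEnv env;
  assert(get_handler_checked(env) == nullptr);
  init_sync_module(env);
  assert(get_handler_checked(env) != nullptr);
}
```

In this sketch the unchecked equivalent of `env.sync_module->get_data_handler()` on a freshly constructed `SyncEnv` is undefined behaviour and would typically segfault, much like frame #0 above. Whether the real fix adds a guard or simply makes sure the instance is created during init is not something this sketch can say.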
History
#1 Updated by Abhishek Lekshmanan over 6 years ago
- Assignee set to Yehuda Sadeh
#2 Updated by Abhishek Lekshmanan over 6 years ago
Seeing this in a luminous-luminous cluster as well.
#3 Updated by Kefu Chai over 6 years ago
- Status changed from New to Fix Under Review
#4 Updated by Yehuda Sadeh over 6 years ago
- Status changed from Fix Under Review to 7
#5 Updated by Casey Bodley about 6 years ago
- Status changed from 7 to Pending Backport
- Backport set to luminous
Both the original PR and a follow-up need to be backported:
https://github.com/ceph/ceph/pull/18898
https://github.com/ceph/ceph/pull/20611
#6 Updated by Nathan Cutler about 6 years ago
- Copied to Backport #23180: luminous: radosgw-admin data sync run crashes added
#7 Updated by Nathan Cutler almost 6 years ago
- Status changed from Pending Backport to Resolved