Bug #20906

closed

multisite: FAILED assert(prev_iter != pos_to_prev.end()) in RGWMetaSyncShardCR::collect_children()

Added by Casey Bodley over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-08-03 15:19:13.189356 7f05bcff9700 20 cr:s=0x7f053c004a30:op=0x7f053c0f3d90:21RGWReadMDLogEntriesCR: operate()
2017-08-03 15:19:13.189370 7f05bcff9700 20 cr:s=0x7f053c004a30:op=0x7f053c02f0c0:18RGWMetaSyncShardCR: operate()
2017-08-03 15:19:13.189378 7f05bcff9700 20 meta sync: incremental_sync:1665: shard_id=0 log_entry: 1_1501787951.872747_56.1:bucket.instance:haixjc-1:b2f89ca6-a10a-425c-ad79-82ba886bd6fe.4109.1:2017-08-03 15:19:11.872747
2017-08-03 15:19:13.189418 7f05bcff9700 20 cr:s=0x7f053c083170:op=0x7f053c0f3d90:24RGWMetaSyncSingleEntryCR: operate()
2017-08-03 15:19:13.189423 7f05bcff9700 20 meta sync: skipping pending operation
2017-08-03 15:19:13.189440 7f05bcff9700 20 cr:s=0x7f053c004a30:op=0x7f053c02f0c0:18RGWMetaSyncShardCR: operate()
2017-08-03 15:19:13.189454 7f05bcff9700 20 meta sync: incremental_sync:1665: shard_id=0 log_entry: 1_1501787952.051313_57.1:bucket.instance:haixjc-1:b2f89ca6-a10a-425c-ad79-82ba886bd6fe.4109.1:2017-08-03 15:19:12.051313
2017-08-03 15:19:13.189457 7f05bcff9700  0 meta sync: ERROR: cannot start syncing 1_1501787952.051313_57.1. Duplicate entry?
2017-08-03 15:19:13.189460 7f05bcff9700 20 meta sync: incremental_sync:1665: shard_id=0 log_entry: 1_1501787952.111111_58.1:bucket:haixjc-1:2017-08-03 15:19:12.111111
2017-08-03 15:19:13.189480 7f05bcff9700 20 cr:s=0x7f053c083170:op=0x7f053c0f3d90:24RGWMetaSyncSingleEntryCR: operate()
2017-08-03 15:19:13.189487 7f05bcff9700 20 run: stack=0x7f053c083170 is done
2017-08-03 15:19:13.189518 7f05bcff9700 20 cr:s=0x7f053c071a20:op=0x7f053c0a4450:24RGWMetaSyncSingleEntryCR: operate()
2017-08-03 15:19:13.189521 7f05bcff9700 20 meta sync: skipping pending operation
2017-08-03 15:19:13.189529 7f05bcff9700 20 cr:s=0x7f053c004a30:op=0x7f053c02f0c0:18RGWMetaSyncShardCR: operate()
2017-08-03 15:19:13.189535 7f05bcff9700 20 meta sync: incremental_sync:1665: shard_id=0 log_entry: 1_1501787952.263821_59.1:bucket:haixjc-1:2017-08-03 15:19:12.263821
2017-08-03 15:19:13.189539 7f05bcff9700  0 meta sync: ERROR: cannot start syncing 1_1501787952.263821_59.1. Duplicate entry?
2017-08-03 15:19:13.189548 7f05bcff9700 20 meta sync: incremental_sync:1665: shard_id=0 log_entry: 1_1501787952.473367_60.1:bucket.instance:haixjc-2:b2f89ca6-a10a-425c-ad79-82ba886bd6fe.4109.2:2017-08-03 15:19:12.473367
2017-08-03 15:19:13.189572 7f05bcff9700 20 cr:s=0x7f053c071a20:op=0x7f053c0a4450:24RGWMetaSyncSingleEntryCR: operate()
2017-08-03 15:19:13.189580 7f05bcff9700 20 run: stack=0x7f053c071a20 is done
2017-08-03 15:19:13.189585 7f05bcff9700 20 cr:s=0x7f053c0726e0:op=0x7f053c0f3d90:24RGWMetaSyncSingleEntryCR: operate()
2017-08-03 15:19:13.189587 7f05bcff9700 20 meta sync: skipping pending operation
2017-08-03 15:19:13.189593 7f05bcff9700 20 cr:s=0x7f053c004a30:op=0x7f053c02f0c0:18RGWMetaSyncShardCR: operate()
2017-08-03 15:19:13.189598 7f05bcff9700 20 meta sync: incremental_sync:1665: shard_id=0 log_entry: 1_1501787952.611686_61.1:bucket.instance:haixjc-2:b2f89ca6-a10a-425c-ad79-82ba886bd6fe.4109.2:2017-08-03 15:19:12.611686
2017-08-03 15:19:13.189602 7f05bcff9700  0 meta sync: ERROR: cannot start syncing 1_1501787952.611686_61.1. Duplicate entry?
2017-08-03 15:19:13.189604 7f05bcff9700 20 meta sync: incremental_sync:1665: shard_id=0 log_entry: 1_1501787952.660282_62.1:bucket:haixjc-2:2017-08-03 15:19:12.660282
2017-08-03 15:19:13.189624 7f05bcff9700 20 cr:s=0x7f053c0726e0:op=0x7f053c0f3d90:24RGWMetaSyncSingleEntryCR: operate()
2017-08-03 15:19:13.189630 7f05bcff9700 20 run: stack=0x7f053c0726e0 is done
2017-08-03 15:19:13.189636 7f05bcff9700 20 cr:s=0x7f053c068790:op=0x7f053c0a45c0:24RGWMetaSyncSingleEntryCR: operate()
2017-08-03 15:19:13.189637 7f05bcff9700 20 meta sync: skipping pending operation
2017-08-03 15:19:13.189643 7f05bcff9700 20 cr:s=0x7f053c004a30:op=0x7f053c02f0c0:18RGWMetaSyncShardCR: operate()
2017-08-03 15:19:13.189648 7f05bcff9700 20 meta sync: incremental_sync:1665: shard_id=0 log_entry: 1_1501787952.787951_65.1:bucket:haixjc-2:2017-08-03 15:19:12.787951
2017-08-03 15:19:13.189651 7f05bcff9700  0 meta sync: ERROR: cannot start syncing 1_1501787952.787951_65.1. Duplicate entry?
2017-08-03 15:19:13.189655 7f05bcff9700  4 meta sync: cr:s=0x7f053c004a30:op=0x7f053c02f0c0:18RGWMetaSyncShardCR: adjusting marker pos=1_1501787951.872747_56.1
2017-08-03 15:19:13.196648 7f05bcff9700 -1 /home/cbodley/ceph/src/rgw/rgw_sync.cc: In function 'void RGWMetaSyncShardCR::collect_children()' thread 7f05bcff9700 time 2017-08-03 15:19:13.189662
/home/cbodley/ceph/src/rgw/rgw_sync.cc: 1398: FAILED assert(prev_iter != pos_to_prev.end())

 ceph version Development (no_version) luminous (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x137) [0x7f0630aebbb2]
 2: (RGWMetaSyncShardCR::collect_children()+0x303) [0x5569581341bb]
 3: (RGWMetaSyncShardCR::incremental_sync()+0x2862) [0x556958138dea]
 4: (RGWMetaSyncShardCR::operate()+0x1ee) [0x556958133cd6]
 5: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x191) [0x556957e7af39]
 6: (RGWCoroutinesManager::run(std::__cxx11::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x290) [0x556957e7cb04]
 7: (RGWCoroutinesManager::run(RGWCoroutine*)+0xbc) [0x556957e7deb0]
 8: (RGWRemoteMetaLog::run_sync()+0x175e) [0x55695812045c]
 9: (RGWMetaSyncStatusManager::run()+0x1c) [0x556957f7470a]
 10: (RGWMetaSyncProcessorThread::process()+0x1c) [0x556957f76f06]
 11: (RGWRadosThread::Worker::entry()+0xe8) [0x556957f10dd2]
 12: (Thread::entry_wrapper()+0xc1) [0x7f0630f372a3]
 13: (Thread::_entry_func(void*)+0x18) [0x7f0630f371d8]
 14: (()+0x773a) [0x7f0639b5173a]
 15: (clone()+0x3f) [0x7f062cf83e0f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Further debugging shows that RGWMetaSyncShardCR is looping over the same entries a second time. The marker_tracker detects some of those duplicates (see the "Duplicate entry?" messages), but not others.
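
To illustrate how a tracker can catch some replayed entries but not others, here is a minimal sketch. It is not the Ceph code: the class `PendingTracker` and its methods are hypothetical, and the assumption that a position is forgotten once it finishes is mine, not something this report establishes. Under that assumption, a replayed entry is rejected only while the first attempt is still in flight.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for a sync marker tracker.
// It is NOT the Ceph implementation; names and behavior are assumptions.
class PendingTracker {
  std::set<std::string> pending;  // log positions currently being synced

 public:
  // Begin syncing `pos`. Returns false if `pos` is already in flight,
  // which is the case a "Duplicate entry?" style error would report.
  bool start(const std::string& pos) {
    return pending.insert(pos).second;
  }

  // Mark `pos` as complete and stop tracking it.
  void finish(const std::string& pos) {
    auto it = pending.find(pos);
    assert(it != pending.end());  // finishing an untracked position is a caller bug
    pending.erase(it);
  }

  // True while `pos` has been started but not yet finished.
  bool is_pending(const std::string& pos) const {
    return pending.count(pos) != 0;
  }
};
```

In this sketch, calling `start()` twice on the same position fails the second time, but calling `start()` again after `finish()` succeeds, so a second pass over already-completed entries would go unnoticed. Whether this is the mechanism behind the assert in `collect_children()` is not confirmed here.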


Related issues 1 (0 open, 1 closed)

Copied to rgw - Backport #21097: luminous: multisite: FAILED assert(prev_iter != pos_to_prev.end()) in RGWMetaSyncShardCR::collect_children() (Resolved, Abhishek Lekshmanan)
Actions #1

Updated by Casey Bodley over 6 years ago

Possibly related to the cls changes in https://github.com/ceph/ceph/pull/16667?

Actions #2

Updated by Casey Bodley over 6 years ago

  • Status changed from New to Fix Under Review
  • Backport set to luminous
Actions #3

Updated by Orit Wasserman over 6 years ago

  • Status changed from Fix Under Review to 17
Actions #4

Updated by Orit Wasserman over 6 years ago

  • Assignee set to Casey Bodley
Actions #5

Updated by Yuri Weinstein over 6 years ago

Casey Bodley wrote:

https://github.com/ceph/ceph/pull/17024

merged

Actions #6

Updated by Nathan Cutler over 6 years ago

  • Status changed from 17 to Pending Backport
Actions #7

Updated by Abhishek Lekshmanan over 6 years ago

  • Copied to Backport #21097: luminous: multisite: FAILED assert(prev_iter != pos_to_prev.end()) in RGWMetaSyncShardCR::collect_children() added
Actions #8

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved