Bug #22726

open

some objects miss sync when put via COSBench with uniform object names

Added by Amine Liu over 6 years ago. Updated over 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rgw
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If the objects' name suffix is uniform, some objects are not synced to the slave zone:

<workflow>

    <workstage name="init">  
      <work type="init" workers="1" config="cprefix=www2;containers=r(1,2)" />
    </workstage>

    <workstage name="prepare">
      <work type="prepare" workers="1"  config="cprefix=www2;containers=r(1,2);objects=r(1,10);sizes=c(100)KB" />
    </workstage>

    <workstage name="main">  
      <work name="main" workers="100" runtime="300">
        <operation type="read" ratio="80" config="cprefix=www2;containers=u(1,2);objects=u(1,10)" />
        <operation type="write" ratio="20" config="cprefix=www2;containers=u(1,2);objects=u(11,5000);sizes=c(100)KB" />
      </work>
    </workstage>

  </workflow>
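For reference, the write operation in the main stage draws object names uniformly from a fixed integer range, so every written object shares one name pattern and differs only in its numeric suffix. A minimal sketch of that name space, assuming COSBench's default object prefix `myobjects` (the workload above does not set `oprefix`, so the prefix is an assumption here):

```python
def cosbench_write_names(low=11, high=5000, oprefix="myobjects"):
    """Enumerate the object names objects=u(11,5000) can produce.

    u(low,high) picks an integer uniformly from [low, high]; the
    object name is the object prefix plus that integer.
    """
    return [f"{oprefix}{i}" for i in range(low, high + 1)]

names = cosbench_write_names()
# every name shares the same prefix; only the numeric suffix varies
assert all(n.startswith("myobjects") for n in names)
```

This is the "uniform" naming the report refers to: 4990 candidate names, all with a shared prefix and a dense numeric suffix range.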

Files

20180119095418.jpg (87.6 KB) — "slave index is always locking" — Amine Liu, 01/19/2018 01:52 AM
Actions #1

Updated by John Spray over 6 years ago

  • Project changed from Ceph to rgw
  • Description updated (diff)
  • Category deleted (librados)

(added pre tags to fix formatting)

Actions #2

Updated by Casey Bodley over 6 years ago

How are you coming to the conclusion that objects are not syncing? Bucket listing on the other zone?

Can you share the sync status of this secondary zone? You might also look at the 'sync error list' for entries related to those missing object names.

Actions #3

Updated by Amine Liu over 6 years ago

Casey Bodley wrote:

How are you coming to the conclusion that objects are not syncing? Bucket listing on the other zone?

Can you share the sync status of this secondary zone? You might also look at the 'sync error list' for entries related to those missing object names.

`radosgw-admin sync status` reports OK on both the master and slave zones, but `sync error list` shows some entries with `"message": "failed to sync bucket instance: (16) Device or resource busy"`; the missing object names do not appear there.

Yes, those objects are not in the slave bucket, and they are not found in the slave index (I found those index shards are locked). Even if I delete the objects that failed to sync and then re-put them, they still cannot be synced; the log shows "stack is still running".
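One way to check whether the missing objects ever surface in `radosgw-admin sync error list` is to filter its JSON output for the object names or a message fragment. The sketch below is illustrative only: the entry layout and the sample payload are assumptions modeled on the `"message"` field quoted above, not a definitive description of the command's output format.

```python
import json

def find_error_entries(sync_error_json, needles):
    """Return entries from `sync error list` JSON whose serialized
    form mentions any needle (an object name or message fragment)."""
    shards = json.loads(sync_error_json)
    hits = []
    for shard in shards:
        for entry in shard.get("entries", []):
            blob = json.dumps(entry)
            if any(needle in blob for needle in needles):
                hits.append(entry)
    return hits

# made-up sample resembling the error quoted in this report
sample = json.dumps([{
    "shard_id": 7,
    "entries": [{
        "id": "1_1516177458.134717_624.1",
        "section": "data",
        "name": "www21:e3026b1f-be0f-4ce3-a159-b4bf6c5bf416.39492.12:10",
        "info": {"message": "failed to sync bucket instance: "
                            "(16) Device or resource busy"},
    }],
}])

hits = find_error_entries(sample, ["Device or resource busy"])
```

An empty result for the missing object names, as reported here, would mean the failures never reached the error list, only the generic bucket-instance EBUSY entries.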

log on master:
2018-01-18 11:12:59.480274 7f89a37fe700 20 process(): notifying datalog change, shard_id=111: www21:e3026b1f-be0f-4ce3-a159-b4bf6c5bf416.39492.12:10

log on slave:
2018-01-18 11:13:00.530596 7f3bb2eed700 20 execute(): modified key=www21:e3026b1f-be0f-4ce3-a159-b4bf6c5bf416.39492.12:10
2018-01-18 11:13:00.530599 7f3c56ffd700 20 cr:s=0x7f3a39a5a050:op=0x7f3a39ab5f80:24RGWBucketShardFullSyncCR: operate()
2018-01-18 11:13:00.530601 7f3c56ffd700 20 collect(): s=0x7f3a39a5a050 stack=0x7f3a397cd1b0 explicitily skipping stack
2018-01-18 11:13:00.530603 7f3c56ffd700 20 collect(): s=0x7f3a39a5a050 stack=0x7f3a3998c370 is still running
2018-01-18 11:13:00.530605 7f3c56ffd700 20 cr:s=0x7f3c400b6a60:op=0x7f3a30798fc0:31RGWReadRemoteDataLogShardInfoCR: operate()
2018-01-18 11:13:00.530599 7f3bb2eed700 20 wakeup_data_sync_shards: source_zone=e3026b1f-be0f-4ce3-a159-b4bf6c5bf416, shard_ids={111=www21:e3026b1f-be0f-4ce3-a159-b4bf6c5bf416.39492.12:10}
2018-01-18 11:13:00.530626 7f3bb2eed700 2 req 1146797:1.000895::POST /admin/log:datalog_notify:completing
2018-01-18 11:13:00.530673 7f3bb2eed700 2 req 1146797:1.000944::POST /admin/log:datalog_notify:op status=0
2018-01-18 11:13:00.530677 7f3bb2eed700 2 req 1146797:1.000947::POST /admin/log:datalog_notify:http status=200
2018-01-18 11:13:00.530680 7f3bb2eed700 1 ====== req done req=0x7f3bb2ee7710 op status=0 http_status=200 ======
2018-01-18 11:13:00.530720 7f3c56ffd700 20 cr:s=0x7f3a39a5a050:op=0x7f3a39ab5f80:24RGWBucketShardFullSyncCR: operate()
2018-01-18 11:13:00.530724 7f3c56ffd700 20 collect(): s=0x7f3a39a5a050 stack=0x7f3a397cd1b0 explicitily skipping stack
2018-01-18 11:13:00.530725 7f3c56ffd700 20 collect(): s=0x7f3a39a5a050 stack=0x7f3a3998c370 is still running
2018-01-18 11:13:00.530727 7f3c56ffd700 20 cr:s=0x7f3c400b6a60:op=0x7f3a30599780:18RGWDataSyncShardCR: operate()
2018-01-18 11:13:00.530730 7f3c56ffd700 20 incremental_sync:1256: shard_id=7 datalog_marker=1_1516177458.134717_624.1 sync_marker.marker=1_1516177458.134717_624.1
2018-01-18 11:13:00.530734 7f3c56ffd700 20 incremental_sync:1309: shard_id=7 datalog_marker=1_1516177458.134717_624.1 sync_marker.marker=1_1516177458.134717_624.1
2018-01-18 11:13:00.530744 7f3c56ffd700 20 run: stack=0x7f3c400b6a60 is io blocked
2018-01-18 11:13:00.530746 7f3c56ffd700 20 cr:s=0x7f3a39a5a050:op=0x7f3a39ab5f80:24RGWBucketShardFullSyncCR: operate()
2018-01-18 11:13:00.530747 7f3c56ffd700 20 collect(): s=0x7f3a39a5a050 stack=0x7f3a397cd1b0 explicitily skipping stack
2018-01-18 11:13:00.530749 7f3c56ffd700 20 collect(): s=0x7f3a39a5a050 stack=0x7f3a3998c370 is still running
2018-01-18 11:13:00.530751 7f3c56ffd700 20 cr:s=0x7f3a39a5a050:op=0x7f3a39ab5f80:24RGWBucketShardFullSyncCR: operate()
2018-01-18 11:13:00.530752 7f3c56ffd700 20 collect(): s=0x7f3a39a5a050 stack=0x7f3a397cd1b0 explicitily skipping stack
2018-01-18 11:13:00.530754 7f3c56ffd700 20 collect(): s=0x7f3a39a5a050 stack=0x7f3a3998c370 is still running
Also, `incremental_sync()` is not seen after `wakeup_data_sync_shards`, and no `sync start` appears.
Actions #4

Updated by Amine Liu over 6 years ago

my config:

rgw_override_bucket_index_max_shards = 20
