Project

General

Profile

Actions

Bug #55465

open

test/rgw-multi: test_datalog_autotrim is failed.

Added by ren weiguo about 2 years ago. Updated about 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In my tethology test rgw/multisite,tasks is three-zone test. I find test_datalog_autotrim is failed.
2022-04-21T13:23:34.933 INFO:tasks.rgw_multisite_tests:FAIL: tasks.rgw_multi.tests.test_datalog_autotrim
2022-04-21T13:23:34.934 INFO:tasks.rgw_multisite_tests:----------------------------------------------------------------------
2022-04-21T13:23:34.934 INFO:tasks.rgw_multisite_tests:Traceback (most recent call last):
2022-04-21T13:23:34.934 INFO:tasks.rgw_multisite_tests: File "/home/teuthology/src/teuthology-master/virtualenv/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
2022-04-21T13:23:34.934 INFO:tasks.rgw_multisite_tests: self.test(self.arg)
2022-04-21T13:23:34.934 INFO:tasks.rgw_multisite_tests: File "/home/teuthology/src/ceph/qa/tasks/rgw_multi/tests.py", line 998, in test_datalog_autotrim
2022-04-21T13:23:34.934 INFO:tasks.rgw_multisite_tests: assert before_trim < after_trim, "any datalog entries must be newer than trim"
2022-04-21T13:23:34.934 INFO:tasks.rgw_multisite_tests:AssertionError: any datalog entries must be newer than trim

So I add debug_rgw=20 level to reproduce it.
brief log
2022-04-24T19:39:21.853 DEBUG:teuthology.orchestra.run.plana002:> adjust-ulimits ceph-coverage /mnt/cephtest/archive/coverage radosgw-admin --log-to-stderr --format json -n client.0 --cluster c1 datalog status --cluster c1 --debug-rgw 0 --debug-ms 0 --rgw-cache-enabled false
2022-04-24T19:39:22.327 INFO:teuthology.orchestra.run.plana002.stdout:[{"marker":"","last_update":"0.000000"},{"marker":"","last_update":"0.000000"},{"marker":"1_1650800358.088348_685.1","last_update":"2022-04-24 11:39:18.088348Z"},{"marker":"","last_update":"0.000000"}]
2022-04-24T19:39:22.328 INFO:tasks.util.rgw:rgwadmin: c1.client.0 : ['datalog', 'autotrim', '--cluster', 'c1', '--debug-rgw', '20', '--debug-ms', '0']
2022-04-24T19:39:22.328 DEBUG:tasks.util.rgw:rgwadmin: cmd=['adjust-ulimits', 'ceph-coverage', '/mnt/cephtest/archive/coverage', 'radosgw-admin', '--log-to-stderr', '--format', 'json', '-n', 'client.0', '--cluster', 'c1', 'datalog', 'autotrim', '--cluster', 'c1', '--debug-rgw', '20', '--debug-ms', '0']
2022-04-24T19:39:22.328 DEBUG:teuthology.orchestra.run.plana002:> adjust-ulimits ceph-coverage /mnt/cephtest/archive/coverage radosgw-admin --log-to-stderr --format json -n client.0 --cluster c1 datalog autotrim --cluster c1 --debug-rgw 20 --debug-ms 0
2022-04-24T19:39:22.751 INFO:teuthology.orchestra.run.plana002.stderr:2022-04-24 19:39:22.741246 7f9a1a09ef80 10 data trim: trimming log shards
2022-04-24T19:39:22.751 INFO:teuthology.orchestra.run.plana002.stderr:2022-04-24 19:39:22.741248 7f9a1a09ef80 10 data trim: trimming log shard 2 at marker=1_1650800351.661341_677.1 last_trim=
2022-04-24T19:39:22.793 DEBUG:teuthology.orchestra.run.plana002:> adjust-ulimits ceph-coverage /mnt/cephtest/archive/coverage radosgw-admin --log-to-stderr --format json -n client.0 --cluster c1 datalog list --shard-id 2 --max-entries 1 --cluster c1 --debug-rgw 0 --debug-ms 0 --rgw-cache-enabled false
2022-04-24T19:39:23.261 INFO:teuthology.orchestra.run.plana002.stdout:[{"entity_type":"bucket","key":"wshhld-47:af2b0b0d-db02-449c-83d3-071cd0e2d80d.6525.4","timestamp":"2022-04-24 11:39:18.088348Z"}]
2022-04-24T19:39:23.262 INFO:tasks.rgw_multisite_tests:tasks.rgw_multi.tests.test_datalog_autotrim ... FAIL.
I find that trim is not based on the latest position.

So i add some code to prove my guess.
*Code:

def test_datalog_autotrim():
zonegroup = realm.master_zonegroup()
zonegroup_conns = ZonegroupConns(zonegroup)
buckets, zone_bucket = create_bucket_per_zone(zonegroup_conns)

  1. upload an object to each zone to generate a datalog entry
    for zone, bucket in zone_bucket:
    k = new_key(zone, bucket.name, 'key')
    k.set_contents_from_string('body')
  1. wait for metadata and data sync to catch up
    zonegroup_meta_checkpoint(zonegroup)
    zonegroup_data_checkpoint(zonegroup_conns)
  1. trim each datalog
    for zone, _ in zone_bucket: # read max markers for each shard
    status = datalog_status(zone.zone)

    datalog_autotrim(zone.zone)

    for shard_id, shard_status in enumerate(status):
    try:
    before_trim = dateutil.parser.isoparse(shard_status['last_update'])
    except: # empty timestamps look like "0.000000" and will fail here
    continue
    entries = datalog_list(zone.zone, ['--shard-id', str(shard_id), '--max-entries', '1'])
    if not len(entries):
    continue
    after_trim = dateutil.parser.isoparse(entries0['timestamp'])
    + if before_trim >= after_trim:
    + log.info('zone=%s need datasync', zone.name)
    + zonegroup_data_checkpoint(zonegroup_conns)
    + datalog_autotrim(zone.zone)
    + datalog_list(zone.zone, ['--shard-id', str(shard_id), '--max-entries', '1'])
    assert before_trim < after_trim, "any datalog entries must be newer than trim"
    Look 1.log.
    reslut:
    After executing first zonegroup_data_checkpoint, it does not guarantee that all zones are finished synchronously.
    In my photo.
    zonegroup_data_checkpoint start stage1.it find B,C datalog cauht up,and wait stage2 A datalog cauht up. But the all datalog caught up is in stage3.


Files

1.log (115 KB) 1.log ren weiguo, 04/27/2022 08:17 AM
3zone-putobj.png (32.9 KB) 3zone-putobj.png ren weiguo, 04/27/2022 08:28 AM
Actions #1

Updated by ren weiguo about 2 years ago

https://github.com/ceph/ceph/pull/46048
sorry I can't modify the format so it looks bad.

Actions

Also available in: Atom PDF