Project

General

Profile

Actions

Bug #51846

open

rados/test.sh: LibRadosList.ListObjectsCursor did not complete.

Added by Sridhar Seshasayee almost 3 years ago. Updated about 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/nojha-2021-07-22_18:00:50-rados-wip-sseshasa-testing-2021-07-14-1320-distro-basic-smithi/6287310
/a/nojha-2021-07-21_18:05:51-rados-wip-sseshasa-testing-2021-07-14-1320-distro-basic-smithi/6285172

Logs from JobID: 6287310

The logs of the api_list test abruptly ends after a seek shown below. The test is presumably hung for some reason.

2021-07-22T18:58:56.346 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: [ RUN ] LibRadosList.ListObjectsCursor
2021-07-22T18:58:56.347 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: x cursor=MIN
2021-07-22T18:58:56.347 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=1 cursor=7:02547ec2:::1:head
2021-07-22T18:58:56.347 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=10 cursor=7:52ea6a34:::10:head
2021-07-22T18:58:56.348 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=13 cursor=7:566253c9:::13:head
2021-07-22T18:58:56.348 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=7 cursor=7:5c6b0b28:::7:head
2021-07-22T18:58:56.348 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=14 cursor=7:62a1935d:::14:head
2021-07-22T18:58:56.348 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=0 cursor=7:6cac518f:::0:head
2021-07-22T18:58:56.349 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=15 cursor=7:863748b0:::15:head
2021-07-22T18:58:56.349 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=11 cursor=7:89d3ae78:::11:head
2021-07-22T18:58:56.349 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=5 cursor=7:b29083e3:::5:head
2021-07-22T18:58:56.349 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=8 cursor=7:bd63b0f1:::8:head
2021-07-22T18:58:56.350 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=6 cursor=7:c4fdafeb:::6:head
2021-07-22T18:58:56.350 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=3 cursor=7:cfc208b3:::3:head
2021-07-22T18:58:56.350 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=4 cursor=7:d83876eb:::4:head
2021-07-22T18:58:56.350 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=12 cursor=7:de5d7c5f:::12:head
2021-07-22T18:58:56.350 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=9 cursor=7:e960b815:::9:head
2021-07-22T18:58:56.351 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > oid=2 cursor=7:f905c69b:::2:head
2021-07-22T18:58:56.351 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: FIRST> seek to MIN oid=1
2021-07-22T18:58:56.351 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=1 cursor=7:02547ec2:::1:head
2021-07-22T18:58:56.351 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:02547ec2:::1:head
2021-07-22T18:58:56.352 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:02547ec2:::1:head -> 1
2021-07-22T18:58:56.352 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=10 cursor=7:52ea6a34:::10:head
2021-07-22T18:58:56.352 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:52ea6a34:::10:head
2021-07-22T18:58:56.353 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:52ea6a34:::10:head -> 10
2021-07-22T18:58:56.353 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=13 cursor=7:566253c9:::13:head
2021-07-22T18:58:56.353 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:566253c9:::13:head
2021-07-22T18:58:56.353 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:566253c9:::13:head -> 13
2021-07-22T18:58:56.353 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=7 cursor=7:5c6b0b28:::7:head
2021-07-22T18:58:56.354 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:5c6b0b28:::7:head
2021-07-22T18:58:56.354 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:5c6b0b28:::7:head -> 7
2021-07-22T18:59:05.777 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=14 cursor=7:62a1935d:::14:head
2021-07-22T18:59:05.777 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:62a1935d:::14:head
2021-07-22T18:59:05.777 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:62a1935d:::14:head -> 14
2021-07-22T18:59:05.778 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=0 cursor=7:6cac518f:::0:head
2021-07-22T18:59:05.778 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:6cac518f:::0:head
2021-07-22T18:59:05.778 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:6cac518f:::0:head -> 0
2021-07-22T18:59:05.778 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=15 cursor=7:863748b0:::15:head
2021-07-22T18:59:05.778 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:863748b0:::15:head
2021-07-22T18:59:05.779 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:863748b0:::15:head -> 15
2021-07-22T18:59:05.779 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=11 cursor=7:89d3ae78:::11:head
2021-07-22T18:59:05.779 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:89d3ae78:::11:head
2021-07-22T18:59:05.779 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:89d3ae78:::11:head -> 11
2021-07-22T18:59:05.780 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=5 cursor=7:b29083e3:::5:head
2021-07-22T18:59:05.780 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:b29083e3:::5:head
2021-07-22T18:59:05.780 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:b29083e3:::5:head -> 5
2021-07-22T18:59:05.780 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=8 cursor=7:bd63b0f1:::8:head
2021-07-22T18:59:05.781 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:bd63b0f1:::8:head
2021-07-22T18:59:05.781 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:bd63b0f1:::8:head -> 8
2021-07-22T18:59:05.781 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=6 cursor=7:c4fdafeb:::6:head
2021-07-22T18:59:05.781 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:c4fdafeb:::6:head
2021-07-22T18:59:05.782 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:c4fdafeb:::6:head -> 6
2021-07-22T18:59:05.782 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=3 cursor=7:cfc208b3:::3:head
2021-07-22T18:59:05.782 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:cfc208b3:::3:head
2021-07-22T18:59:05.782 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:cfc208b3:::3:head -> 3
2021-07-22T18:59:05.782 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=4 cursor=7:d83876eb:::4:head
2021-07-22T18:59:05.783 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:d83876eb:::4:head
2021-07-22T18:59:05.783 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:d83876eb:::4:head -> 4
2021-07-22T18:59:05.783 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=12 cursor=7:de5d7c5f:::12:head
2021-07-22T18:59:05.783 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:de5d7c5f:::12:head
2021-07-22T18:59:05.783 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:de5d7c5f:::12:head -> 12
2021-07-22T18:59:05.784 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=9 cursor=7:e960b815:::9:head
2021-07-22T18:59:05.784 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:e960b815:::9:head
2021-07-22T18:59:05.784 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:e960b815:::9:head -> 9
2021-07-22T18:59:05.784 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : oid=2 cursor=7:f905c69b:::2:head
2021-07-22T18:59:05.785 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:f905c69b:::2:head
2021-07-22T18:59:05.785 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:f905c69b:::2:head -> 2
2021-07-22T18:59:05.785 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:f905c69b:::2:head
2021-07-22T18:59:05.785 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : cursor()=7:f905c69b:::2:head expected=7:f905c69b:::2:head
2021-07-22T18:59:05.785 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:f905c69b:::2:head -> 2
2021-07-22T18:59:05.786 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : entry=2 expected=2
2021-07-22T18:59:05.786 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:e960b815:::9:head
2021-07-22T18:59:05.786 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : cursor()=7:e960b815:::9:head expected=7:e960b815:::9:head
2021-07-22T18:59:05.786 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:e960b815:::9:head -> 9
2021-07-22T18:59:05.787 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : entry=9 expected=9
2021-07-22T18:59:05.787 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:566253c9:::13:head
2021-07-22T18:59:05.787 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : cursor()=7:566253c9:::13:head expected=7:566253c9:::13:head
2021-07-22T18:59:05.788 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:566253c9:::13:head -> 13
2021-07-22T18:59:05.788 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : entry=13 expected=13
2021-07-22T18:59:05.788 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:02547ec2:::1:head
2021-07-22T18:59:05.788 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : cursor()=7:02547ec2:::1:head expected=7:02547ec2:::1:head
2021-07-22T18:59:05.788 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:02547ec2:::1:head -> 1
2021-07-22T18:59:05.789 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : entry=1 expected=1
2021-07-22T18:59:05.789 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:6cac518f:::0:head
2021-07-22T18:59:05.789 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : cursor()=7:6cac518f:::0:head expected=7:6cac518f:::0:head
2021-07-22T18:59:05.789 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:6cac518f:::0:head -> 0
2021-07-22T18:59:05.790 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : entry=0 expected=0
2021-07-22T18:59:05.790 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:62a1935d:::14:head
2021-07-22T18:59:05.790 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : cursor()=7:62a1935d:::14:head expected=7:62a1935d:::14:head
2021-07-22T18:59:05.790 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:62a1935d:::14:head -> 14
2021-07-22T18:59:05.790 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : entry=14 expected=14
2021-07-22T18:59:05.791 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:de5d7c5f:::12:head
2021-07-22T18:59:05.791 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : cursor()=7:de5d7c5f:::12:head expected=7:de5d7c5f:::12:head
2021-07-22T18:59:05.791 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:de5d7c5f:::12:head -> 12
2021-07-22T18:59:05.791 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : entry=12 expected=12
2021-07-22T18:59:05.791 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:d83876eb:::4:head
2021-07-22T18:59:05.792 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : cursor()=7:d83876eb:::4:head expected=7:d83876eb:::4:head
2021-07-22T18:59:05.792 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: > 7:d83876eb:::4:head -> 4
2021-07-22T18:59:05.792 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : entry=4 expected=4
2021-07-22T18:59:05.792 INFO:tasks.workunit.client.0.smithi115.stdout: api_list: : seek to 7:cfc208b3:::3:head

All the other tests succeeded. Much later in the logs, the wait_for_recovery check fails:

2021-07-21T19:55:12.224 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 188, in wrapper
return func(self)
File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 1404, in _do_thrash
timeout=self.config.get('timeout')
File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 2804, in wait_for_recovery
'wait_for_recovery: failed before timeout expired'
AssertionError: wait_for_recovery: failed before timeout expired

2021-07-21T19:55:12.224 ERROR:tasks.thrashosds.thrasher:exception:
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 1279, in do_thrash
self._do_thrash()
File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 188, in wrapper
return func(self)
File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 1404, in _do_thrash
timeout=self.config.get('timeout')
File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 2804, in wait_for_recovery
'wait_for_recovery: failed before timeout expired'
AssertionError: wait_for_recovery: failed before timeout expired

The test performs a seek within a loop but it looks like the seek probably didn't complete:

TEST_F(LibRadosList, ListObjectsCursor) {
...
while (p != cursor_to_obj.rend()) {
cout << ": seek to " << p->first << std::endl;
rados_object_list_cursor cursor;
rados_object_list_cursor oid(p->first);
rados_nobjects_list_seek_cursor(ctx, oid);
...
}

Actions #1

Updated by Sridhar Seshasayee almost 3 years ago

Attempted to reproduce this issue multiple times using the original config file. This occurs rarely. After multiple attempts, the "wait_for_recovery" timeout was hit on master with the following SHA1:

SHA1: 7b616677119eb020987377f163ce775ca5d94c29

2021-07-22T19:41:18.286 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 188, in wrapper
    return func(self)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 1404, in _do_thrash
    timeout=self.config.get('timeout')
  File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 2804, in wait_for_recovery
    'wait_for_recovery: failed before timeout expired'
AssertionError: wait_for_recovery: failed before timeout expired

2021-07-22T19:41:18.287 ERROR:tasks.thrashosds.thrasher:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 1279, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 188, in wrapper
    return func(self)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 1404, in _do_thrash
    timeout=self.config.get('timeout')
  File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 2804, in wait_for_recovery
    'wait_for_recovery: failed before timeout expired'

The following rados api tests ran and passed:

2021-07-30T09:49:08.782 INFO:tasks.workunit.client.0.smithi125.stdout:                 api_asio: [  PASSED  ] 12 tests.
2021-07-30T09:49:10.146 INFO:tasks.workunit.client.0.smithi125.stdout:     api_cls_remote_reads: [  PASSED  ] 1 test.
2021-07-30T09:49:26.394 INFO:tasks.workunit.client.0.smithi125.stdout:              api_service: [  PASSED  ] 4 tests.
2021-07-30T09:49:29.311 INFO:tasks.workunit.client.0.smithi125.stdout:           api_service_pp: [  PASSED  ] 4 tests.
2021-07-30T09:49:30.060 INFO:tasks.workunit.client.0.smithi125.stdout:                 api_pool: [  PASSED  ] 7 tests.
2021-07-30T09:49:41.746 INFO:tasks.workunit.client.0.smithi125.stdout:                 api_misc: [  PASSED  ] 11 tests.
2021-07-30T09:50:27.149 INFO:tasks.workunit.client.0.smithi125.stdout:   api_c_write_operations: [  PASSED  ] 8 tests.
2021-07-30T10:08:33.514 INFO:tasks.workunit.client.0.smithi125.stdout:                   api_io: [  PASSED  ] 23 tests.
2021-07-30T10:09:51.365 INFO:tasks.workunit.client.0.smithi125.stdout:      api_watch_notify_pp: [  PASSED  ] 16 tests.
2021-07-30T10:33:29.239 INFO:tasks.workunit.client.0.smithi125.stdout:              api_lock_pp: [  PASSED  ] 16 tests.
2021-07-30T10:50:15.128 INFO:tasks.workunit.client.0.smithi125.stdout:               api_aio_pp: [  PASSED  ] 53 tests.
2021-07-30T10:51:10.135 INFO:tasks.workunit.client.0.smithi125.stdout:                  api_aio: [  PASSED  ] 40 tests.
2021-07-30T10:55:12.089 INFO:tasks.workunit.client.0.smithi125.stdout:              api_stat_pp: [  PASSED  ] 9 tests.
2021-07-30T10:55:30.629 INFO:tasks.workunit.client.0.smithi125.stdout:                 api_stat: [  PASSED  ] 8 tests.
2021-07-30T11:02:03.800 INFO:tasks.workunit.client.0.smithi125.stdout:         api_snapshots_pp: [  PASSED  ] 18 tests.
2021-07-30T11:03:04.323 INFO:tasks.workunit.client.0.smithi125.stdout:                 api_lock: [  PASSED  ] 16 tests.
2021-07-30T11:34:12.961 INFO:tasks.workunit.client.0.smithi125.stdout:            api_snapshots: [  PASSED  ] 12 tests.

The background thrashing is resulting in many PGs going into backfill_wait/recovery_wait states indefinitely. For e.g.,

2021-07-30T11:54:20.114 INFO:tasks.ceph.ceph_manager.ceph:PG 22.2 is not active+clean
2021-07-30T11:54:20.114 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '22.2', 'version': "69'1", 'reported_seq': 1273, 'reported_epoch': 4235, 'state': 'active+recovery_wait+degraded', 'last_fresh': '2021-07-30T11:54:12.821600+0000', 'last_change': '2021-07-30T11:09:04.946334+0000', 'last_active': '2021-07-30T11:54:12.821600+0000', 'last_peered': '2021-07-30T11:54:12.821600+0000', 'last_clean': '2021-07-30T09:49:16.052921+0000', 'last_became_active': '2021-07-30T11:08:16.490204+0000', 'last_became_peered': '2021-07-30T11:08:16.490204+0000', 'last_unstale': '2021-07-30T11:54:12.821600+0000', 'last_undegraded': '2021-07-30T11:07:23.621302+0000', 'last_fullsized': '2021-07-30T11:54:12.821600+0000', 'mapping_epoch': 2883, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 53, 'last_epoch_clean': 56, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2021-07-30T09:49:00.048478+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2021-07-30T09:49:00.048478+0000', 'last_clean_scrub_stamp': '2021-07-30T09:49:00.048478+0000', 'log_size': 1, 'ondisk_log_size': 1, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 74, 'num_objects': 1, 'num_object_clones': 0, 'num_object_copies': 3, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 2, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 1, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 1, 'num_write_kb': 1, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [1, 2, 7], 'acting': [1, 2, 7], 'avail_no_missing': ['1', '3', '6'], 'object_location_counts': [{'shards': '1,3,6', 'objects': 1}], 'blocked_by': [], 'up_primary': 1, 'acting_primary': 1, 'purged_snaps': []}
2021-07-30T11:54:20.114 INFO:tasks.ceph.ceph_manager.ceph:PG 28.7 is not active+clean
2021-07-30T11:54:20.115 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '28.7', 'version': "0'0", 'reported_seq': 4051, 'reported_epoch': 4235, 'state': 'active+remapped+backfill_wait', 'last_fresh': '2021-07-30T11:54:12.920651+0000', 'last_change': '2021-07-30T11:10:53.435718+0000', 'last_active': '2021-07-30T11:54:12.920651+0000', 'last_peered': '2021-07-30T11:54:12.920651+0000', 'last_clean': '2021-07-30T11:06:26.522618+0000', 'last_became_active': '2021-07-30T11:10:53.435012+0000', 'last_became_peered': '2021-07-30T11:10:53.435012+0000', 'last_unstale': '2021-07-30T11:54:12.920651+0000', 'last_undegraded': '2021-07-30T11:54:12.920651+0000', 'last_fullsized': '2021-07-30T11:54:12.920651+0000', 'mapping_epoch': 2884, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 53, 'last_epoch_clean': 2878, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2021-07-30T09:49:00.048478+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2021-07-30T09:49:00.048478+0000', 'last_clean_scrub_stamp': '2021-07-30T09:49:00.048478+0000', 'log_size': 0, 'ondisk_log_size': 0, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 0, 'num_objects': 0, 'num_object_clones': 0, 'num_object_copies': 0, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 0, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 0, 'num_write_kb': 0, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [2, 4, 1], 'acting': [2, 4, 3], 'avail_no_missing': ['2', '1', '3', '4'], 'object_location_counts': [], 'blocked_by': [], 'up_primary': 2, 'acting_primary': 2, 'purged_snaps': []}

The PGs going into the above states is consistently seen across all the failures mentioned above.

Actions #2

Updated by Kamoltat (Junior) Sirivadhna about 2 years ago

/a/yuriw-2022-03-04_21:56:41-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6721371
/a/yuriw-2022-03-04_21:56:41-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6721651

Actions #3

Updated by Neha Ojha about 2 years ago

Kamoltat Sirivadhna wrote:

/a/yuriw-2022-03-04_21:56:41-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6721371
/a/yuriw-2022-03-04_21:56:41-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6721651

These are different failures, https://tracker.ceph.com/issues/51076.

Actions

Also available in: Atom PDF