Bug #61201

closed

qa: test_rebuild_moved_file (tasks/data-scan) fails because mds crashes in pacific

Added by Kotresh Hiremath Ravishankar 12 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Administration/Usability
Target version:
% Done:

0%

Source:
Tags:
backport_processed
Backport:
reef,quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description: fs/functional/{begin clusters/1a3s-mds-4c-client conf/{client mds mon osd} distro/{ubuntu_latest} mount/kclient/{mount overrides/{distro/stock/{k-stock rhel_8} ms-die-on-skipped}} objectstore/bluestore-bitmap overrides/{ignorelist_health ignorelist_wrongly_marked_down no_client_pidfile} tasks/data-scan}

Log: http://qa-proxy.ceph.com/teuthology/yuriw-2023-05-15_21:56:33-fs-wip-yuri2-testing-2023-05-15-0810-pacific_2-distro-default-smithi/7274986/teuthology.log

Job: https://pulpito.ceph.com/yuriw-2023-05-15_21:56:33-fs-wip-yuri2-testing-2023-05-15-0810-pacific_2-distro-default-smithi/7274986/

Sentry event: https://sentry.ceph.com/organizations/ceph/?query=1660ede559ce46d6bcc4eff3f8ec2d71

Failure Reason:
Test failure: test_rebuild_moved_file (tasks.cephfs.test_data_scan.TestDataScan)

The test failed while waiting for the MDS to become up:standby:

2023-05-16T05:53:56.526 INFO:tasks.cephfs_test_runner:======================================================================
2023-05-16T05:53:56.527 INFO:tasks.cephfs_test_runner:ERROR: test_rebuild_moved_file (tasks.cephfs.test_data_scan.TestDataScan)
2023-05-16T05:53:56.527 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2023-05-16T05:53:56.527 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2023-05-16T05:53:56.527 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_20531b61f434caac5e29cea07411fac6911e8e91/qa/tasks/cephfs/test_data_scan.py", line 399, in test_rebuild_moved_file
2023-05-16T05:53:56.527 INFO:tasks.cephfs_test_runner:    self._rebuild_metadata(MovedFile(self.fs, self.mount_a))
2023-05-16T05:53:56.528 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_20531b61f434caac5e29cea07411fac6911e8e91/qa/tasks/cephfs/test_data_scan.py", line 352, in _rebuild_metadata
2023-05-16T05:53:56.528 INFO:tasks.cephfs_test_runner:    self.wait_until_equal(
2023-05-16T05:53:56.528 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_20531b61f434caac5e29cea07411fac6911e8e91/qa/tasks/ceph_test_case.py", line 198, in wait_until_equal
2023-05-16T05:53:56.528 INFO:tasks.cephfs_test_runner:    raise TestTimeoutError("Timed out after {0} seconds waiting for {1} (currently {2})".format(
2023-05-16T05:53:56.528 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 60 seconds waiting for up:standby (currently None)
2023-05-16T05:53:56.529 INFO:tasks.cephfs_test_runner:
2023-05-16T05:53:56.529 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
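For reference, the 60-second timeout is raised by the generic polling helper wait_until_equal() in qa/tasks/ceph_test_case.py (line 198 in the traceback above). A minimal sketch of that kind of poll-until-equal loop, with simplified names rather than the exact upstream implementation, looks like:

    import time

    class TestTimeoutError(RuntimeError):
        pass

    def wait_until_equal(get_fn, expect_val, timeout, period=5):
        """Poll get_fn() until it returns expect_val or `timeout` seconds elapse."""
        elapsed = 0
        while True:
            val = get_fn()
            if val == expect_val:
                return
            if elapsed >= timeout:
                # Matches the failure above: the polled value never becomes
                # up:standby, so the 60s budget runs out with val still None.
                raise TestTimeoutError(
                    "Timed out after {0} seconds waiting for {1} (currently {2})".format(
                        timeout, expect_val, val))
            time.sleep(period)
            elapsed += period

    # Hypothetical usage mirroring the test: poll the MDS state until up:standby.
    # wait_until_equal(lambda: get_mds_state('c'), 'up:standby', timeout=60)

In this run the polled value stays None because the mds.c daemon has already crashed (see the stack trace below), so no standby ever registers with the mon.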

The MDS stack trace is as follows:

   -30> 2023-05-16T05:52:41.036+0000 7f7365170700 10 log_client  will send 2023-05-16T05:52:41.038437+0000 mds.c (mds.0) 1 : cluster [ERR] error reading table object 'mds0_inotable' -2 ((2) No such file or directory)
   -29> 2023-05-16T05:52:41.036+0000 7f7365170700 10 monclient: _send_mon_message to mon.a at v2:172.21.15.16:3300/0
   -28> 2023-05-16T05:52:41.036+0000 7f7365170700  1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] --> [v2:172.21.15.16:3300/0,v1:172.21.15.16:6789/0] -- log(1 entries from seq 1 at 2023-05-16T05:52:41.038437+0000) v1 -- 0x55df90ab2700 con 0x55df90b83000
   -27> 2023-05-16T05:52:41.036+0000 7f736d981700  1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.1 v2:172.21.15.16:6816/116597 1 ==== osd_op_reply(5 mds0_openfiles.0 [omap-get-header,omap-get-vals] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 202+0+0 (crc 0 0 0) 0x55df91780fc0 con 0x55df918c5800
   -26> 2023-05-16T05:52:41.036+0000 7f7365170700  5 mds.beacon.c Sending beacon down:damaged seq 2
   -25> 2023-05-16T05:52:41.037+0000 7f7365170700 10 monclient: _send_mon_message to mon.a at v2:172.21.15.16:3300/0
   -24> 2023-05-16T05:52:41.037+0000 7f7365170700  1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] --> [v2:172.21.15.16:3300/0,v1:172.21.15.16:6789/0] -- mdsbeacon(10998/c down:damaged seq=2 v152) v8 -- 0x55df90b4cdc0 con 0x55df90b83000
   -23> 2023-05-16T05:52:41.037+0000 7f7365170700 20 mds.beacon.c send_and_wait: awaiting 2 for up to 5s
   -22> 2023-05-16T05:52:41.037+0000 7f736d981700  1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.1 v2:172.21.15.16:6816/116597 2 ==== osd_op_reply(6 mds_snaptable [read 0~0] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 157+0+0 (crc 0 0 0) 0x55df91780fc0 con 0x55df918c5800
   -21> 2023-05-16T05:52:41.038+0000 7f736b17c700  1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== mon.0 v2:172.21.15.16:3300/0 17 ==== osd_map(168..168 src has 1..168) v4 ==== 366+0+0 (secure 0 0 0) 0x55df90ab2700 con 0x55df90b83000
   -20> 2023-05-16T05:52:41.039+0000 7f736e182700  1 --2- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] >> [v2:172.21.15.16:6802/116595,v1:172.21.15.16:6804/116595] conn(0x55df918c4c00 0x55df918ab200 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
   -19> 2023-05-16T05:52:41.039+0000 7f736e182700 10 monclient: get_auth_request con 0x55df918c4c00 auth_method 0
   -18> 2023-05-16T05:52:41.039+0000 7f736e182700  1 --2- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] >> [v2:172.21.15.16:6802/116595,v1:172.21.15.16:6804/116595] conn(0x55df918c4c00 0x55df918ab200 crc :-1 s=READY pgs=66 cs=0 l=1 rev1=1 rx=0 tx=0).ready entity=osd.2 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
   -17> 2023-05-16T05:52:41.039+0000 7f736e182700  1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.2 v2:172.21.15.16:6802/116595 1 ==== osd_op_reply(4 500.00000000 [read 0~0] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 156+0+0 (crc 0 0 0) 0x55df9188ab40 con 0x55df918c4c00
   -16> 2023-05-16T05:52:41.039+0000 7f7366172700  0 mds.0.journaler.pq(ro) error getting journal off disk
   -15> 2023-05-16T05:52:41.039+0000 7f736e182700  1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.2 v2:172.21.15.16:6802/116595 2 ==== osd_op_reply(3 400.00000000 [read 0~0] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 156+0+0 (crc 0 0 0) 0x55df9188ab40 con 0x55df918c4c00
   -14> 2023-05-16T05:52:41.039+0000 7f7366172700  1 mds.0.purge_queue operator(): Purge Queue not found, assuming this is an upgrade and creating it.
   -13> 2023-05-16T05:52:41.039+0000 7f7366172700  4 mds.0.purge_queue create: creating
   -12> 2023-05-16T05:52:41.039+0000 7f7366172700  1 mds.0.journaler.pq(ro) set_writeable
   -11> 2023-05-16T05:52:41.039+0000 7f7366172700  1 mds.0.journaler.pq(rw) created blank journal at inode 0x0x500, format=1
   -10> 2023-05-16T05:52:41.039+0000 7f736496f700  1 mds.0.journalpointer Journal pointer '400.00000000' read failed: (2) No such file or directory
    -9> 2023-05-16T05:52:41.039+0000 7f736496f700  4 mds.0.journalpointer Writing pointer object '400.00000000': 0x0x200:0x0x0
    -8> 2023-05-16T05:52:41.039+0000 7f7366172700  1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] --> [v2:172.21.15.16:6802/116595,v1:172.21.15.16:6804/116595] -- osd_op(unknown.0.152:7 21.15 21:aa448500:::500.00000000:head [writefull 0~90 in=90b] snapc 0=[] ondisk+write+known_if_redirected+full_force e168) v8 -- 0x55df918c5000 con 0x55df918c4c00
    -7> 2023-05-16T05:52:41.039+0000 7f736496f700  1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] --> [v2:172.21.15.16:6802/116595,v1:172.21.15.16:6804/116595] -- osd_op(unknown.0.152:8 21.f 21:f1f69726:::400.00000000:head [writefull 0~22 in=22b] snapc 0=[] ondisk+write+known_if_redirected+full_force e168) v8 -- 0x55df918c5400 con 0x55df918c4c00
    -6> 2023-05-16T05:52:41.041+0000 7f736e182700  1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.2 v2:172.21.15.16:6802/116595 3 ==== osd_op_reply(7 500.00000000 [writefull 0~90] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 156+0+0 (crc 0 0 0) 0x55df9188ab40 con 0x55df918c4c00
    -5> 2023-05-16T05:52:41.041+0000 7f7366172700 -1 mds.0.journaler.pq(rw) _finish_write_head got (108) Cannot send after transport endpoint shutdown
    -4> 2023-05-16T05:52:41.041+0000 7f7366172700 -1 mds.0.journaler.pq(rw) handle_write_error (108) Cannot send after transport endpoint shutdown
    -3> 2023-05-16T05:52:41.041+0000 7f736e182700  1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.2 v2:172.21.15.16:6802/116595 4 ==== osd_op_reply(8 400.00000000 [writefull 0~22] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 156+0+0 (crc 0 0 0) 0x55df9188ab40 con 0x55df918c4c00
    -2> 2023-05-16T05:52:41.042+0000 7f736496f700 -1 mds.0.journalpointer Error writing pointer object '400.00000000': (108) Cannot send after transport endpoint shutdown
    -1> 2023-05-16T05:52:41.042+0000 7f736496f700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.13-145-g20531b61/rpm/el8/BUILD/ceph-16.2.13-145-g20531b61/src/mds/MDLog.cc: In function 'void MDLog::_recovery_thread(MDSContext*)' thread 7f736496f700 time 2023-05-16T05:52:41.043527+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.13-145-g20531b61/rpm/el8/BUILD/ceph-16.2.13-145-g20531b61/src/mds/MDLog.cc: 979: FAILED ceph_assert(write_result >= 0)

 ceph version 16.2.13-145-g20531b61 (20531b61f434caac5e29cea07411fac6911e8e91) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7f73739911e2]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x2793fc) [0x7f73739913fc]
 3: (MDLog::_recovery_thread(MDSContext*)+0xd72) [0x55df8ef15132]
 4: (MDLog::RecoveryThread::entry()+0x15) [0x55df8ec07845]
 5: /lib64/libpthread.so.0(+0x814a) [0x7f737297514a]
 6: clone()
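The assert fires in MDLog::_recovery_thread() after the journal-pointer rewrite comes back with -108 (Cannot send after transport endpoint shutdown): by that point the MDS has already read a missing 'mds0_inotable' and sent a down:damaged beacon (entries -30 and -26 above), so the OSD session is being torn down, the write can never succeed, and ceph_assert(write_result >= 0) aborts the daemon. When triaging a run like this, one way to confirm the rank was actually marked damaged rather than merely slow to register a standby is to inspect the MDS map. A small illustrative snippet (assuming a working ceph CLI and a filesystem named "cephfs"; this is not part of the qa suite):

    import json
    import subprocess

    def get_mdsmap(fs_name="cephfs"):
        # 'ceph fs get <name>' dumps the MDS map for one filesystem as JSON.
        out = subprocess.check_output(
            ["ceph", "fs", "get", fs_name, "--format=json"])
        return json.loads(out)["mdsmap"]

    mdsmap = get_mdsmap()
    print("damaged ranks:", mdsmap.get("damaged", []))
    print("standby count wanted:", mdsmap.get("standby_count_wanted"))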

Related issues 3 (0 open, 3 closed)

Copied to CephFS - Backport #62054: reef: qa: test_rebuild_moved_file (tasks/data-scan) fails because mds crashes in pacific (Resolved, Jos Collin)
Copied to CephFS - Backport #62055: pacific: qa: test_rebuild_moved_file (tasks/data-scan) fails because mds crashes in pacific (Resolved, Jos Collin)
Copied to CephFS - Backport #62056: quincy: qa: test_rebuild_moved_file (tasks/data-scan) fails because mds crashes in pacific (Resolved, Jos Collin)