Bug #61201
closedqa: test_rebuild_moved_file (tasks/data-scan) fails because mds crashes in pacific
Status: Resolved
Priority: Normal
Assignee:
Category: Administration/Usability
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: reef,quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Description: fs/functional/{begin clusters/1a3s-mds-4c-client conf/{client mds mon osd} distro/{ubuntu_latest} mount/kclient/{mount overrides/{distro/stock/{k-stock rhel_8} ms-die-on-skipped}} objectstore/bluestore-bitmap overrides/{ignorelist_health ignorelist_wrongly_marked_down no_client_pidfile} tasks/data-scan}
Sentry event: https://sentry.ceph.com/organizations/ceph/?query=1660ede559ce46d6bcc4eff3f8ec2d71
Failure Reason:
Test failure: test_rebuild_moved_file (tasks.cephfs.test_data_scan.TestDataScan)
The test failed while waiting for the MDS to become up:standby:
2023-05-16T05:53:56.526 INFO:tasks.cephfs_test_runner:======================================================================
2023-05-16T05:53:56.527 INFO:tasks.cephfs_test_runner:ERROR: test_rebuild_moved_file (tasks.cephfs.test_data_scan.TestDataScan)
2023-05-16T05:53:56.527 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2023-05-16T05:53:56.527 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2023-05-16T05:53:56.527 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_20531b61f434caac5e29cea07411fac6911e8e91/qa/tasks/cephfs/test_data_scan.py", line 399, in test_rebuild_moved_file
2023-05-16T05:53:56.527 INFO:tasks.cephfs_test_runner:    self._rebuild_metadata(MovedFile(self.fs, self.mount_a))
2023-05-16T05:53:56.528 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_20531b61f434caac5e29cea07411fac6911e8e91/qa/tasks/cephfs/test_data_scan.py", line 352, in _rebuild_metadata
2023-05-16T05:53:56.528 INFO:tasks.cephfs_test_runner:    self.wait_until_equal(
2023-05-16T05:53:56.528 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_20531b61f434caac5e29cea07411fac6911e8e91/qa/tasks/ceph_test_case.py", line 198, in wait_until_equal
2023-05-16T05:53:56.528 INFO:tasks.cephfs_test_runner:    raise TestTimeoutError("Timed out after {0} seconds waiting for {1} (currently {2})".format(
2023-05-16T05:53:56.528 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 60 seconds waiting for up:standby (currently None)
2023-05-16T05:53:56.529 INFO:tasks.cephfs_test_runner:
2023-05-16T05:53:56.529 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
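The timeout above comes from a generic polling helper in the QA framework. A minimal sketch of the pattern (hypothetical simplified names, not the actual teuthology code) shows why a crashed MDS produces "currently None" after 60 seconds: the daemon never reaches any state, so each poll returns None until the deadline expires.

```python
import time


class TestTimeoutError(RuntimeError):
    pass


def wait_until_equal(get, expect, timeout, period=1):
    """Poll get() until it returns expect, or raise once timeout seconds elapse."""
    elapsed = 0
    while True:
        val = get()
        if val == expect:
            return
        if elapsed >= timeout:
            raise TestTimeoutError(
                "Timed out after {0} seconds waiting for {1} (currently {2})".format(
                    timeout, expect, val))
        time.sleep(period)
        elapsed += period


# When the MDS has crashed, get() keeps returning None, so the helper
# eventually raises with "(currently None)" -- matching the error above.
```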
The MDS stack trace is as follows:
-30> 2023-05-16T05:52:41.036+0000 7f7365170700 10 log_client will send 2023-05-16T05:52:41.038437+0000 mds.c (mds.0) 1 : cluster [ERR] error reading table object 'mds0_inotable' -2 ((2) No such file or directory)
-29> 2023-05-16T05:52:41.036+0000 7f7365170700 10 monclient: _send_mon_message to mon.a at v2:172.21.15.16:3300/0
-28> 2023-05-16T05:52:41.036+0000 7f7365170700 1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] --> [v2:172.21.15.16:3300/0,v1:172.21.15.16:6789/0] -- log(1 entries from seq 1 at 2023-05-16T05:52:41.038437+0000) v1 -- 0x55df90ab2700 con 0x55df90b83000
-27> 2023-05-16T05:52:41.036+0000 7f736d981700 1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.1 v2:172.21.15.16:6816/116597 1 ==== osd_op_reply(5 mds0_openfiles.0 [omap-get-header,omap-get-vals] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 202+0+0 (crc 0 0 0) 0x55df91780fc0 con 0x55df918c5800
-26> 2023-05-16T05:52:41.036+0000 7f7365170700 5 mds.beacon.c Sending beacon down:damaged seq 2
-25> 2023-05-16T05:52:41.037+0000 7f7365170700 10 monclient: _send_mon_message to mon.a at v2:172.21.15.16:3300/0
-24> 2023-05-16T05:52:41.037+0000 7f7365170700 1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] --> [v2:172.21.15.16:3300/0,v1:172.21.15.16:6789/0] -- mdsbeacon(10998/c down:damaged seq=2 v152) v8 -- 0x55df90b4cdc0 con 0x55df90b83000
-23> 2023-05-16T05:52:41.037+0000 7f7365170700 20 mds.beacon.c send_and_wait: awaiting 2 for up to 5s
-22> 2023-05-16T05:52:41.037+0000 7f736d981700 1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.1 v2:172.21.15.16:6816/116597 2 ==== osd_op_reply(6 mds_snaptable [read 0~0] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 157+0+0 (crc 0 0 0) 0x55df91780fc0 con 0x55df918c5800
-21> 2023-05-16T05:52:41.038+0000 7f736b17c700 1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== mon.0 v2:172.21.15.16:3300/0 17 ==== osd_map(168..168 src has 1..168) v4 ==== 366+0+0 (secure 0 0 0) 0x55df90ab2700 con 0x55df90b83000
-20> 2023-05-16T05:52:41.039+0000 7f736e182700 1 --2- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] >> [v2:172.21.15.16:6802/116595,v1:172.21.15.16:6804/116595] conn(0x55df918c4c00 0x55df918ab200 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
-19> 2023-05-16T05:52:41.039+0000 7f736e182700 10 monclient: get_auth_request con 0x55df918c4c00 auth_method 0
-18> 2023-05-16T05:52:41.039+0000 7f736e182700 1 --2- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] >> [v2:172.21.15.16:6802/116595,v1:172.21.15.16:6804/116595] conn(0x55df918c4c00 0x55df918ab200 crc :-1 s=READY pgs=66 cs=0 l=1 rev1=1 rx=0 tx=0).ready entity=osd.2 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
-17> 2023-05-16T05:52:41.039+0000 7f736e182700 1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.2 v2:172.21.15.16:6802/116595 1 ==== osd_op_reply(4 500.00000000 [read 0~0] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 156+0+0 (crc 0 0 0) 0x55df9188ab40 con 0x55df918c4c00
-16> 2023-05-16T05:52:41.039+0000 7f7366172700 0 mds.0.journaler.pq(ro) error getting journal off disk
-15> 2023-05-16T05:52:41.039+0000 7f736e182700 1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.2 v2:172.21.15.16:6802/116595 2 ==== osd_op_reply(3 400.00000000 [read 0~0] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 156+0+0 (crc 0 0 0) 0x55df9188ab40 con 0x55df918c4c00
-14> 2023-05-16T05:52:41.039+0000 7f7366172700 1 mds.0.purge_queue operator(): Purge Queue not found, assuming this is an upgrade and creating it.
-13> 2023-05-16T05:52:41.039+0000 7f7366172700 4 mds.0.purge_queue create: creating
-12> 2023-05-16T05:52:41.039+0000 7f7366172700 1 mds.0.journaler.pq(ro) set_writeable
-11> 2023-05-16T05:52:41.039+0000 7f7366172700 1 mds.0.journaler.pq(rw) created blank journal at inode 0x0x500, format=1
-10> 2023-05-16T05:52:41.039+0000 7f736496f700 1 mds.0.journalpointer Journal pointer '400.00000000' read failed: (2) No such file or directory
-9> 2023-05-16T05:52:41.039+0000 7f736496f700 4 mds.0.journalpointer Writing pointer object '400.00000000': 0x0x200:0x0x0
-8> 2023-05-16T05:52:41.039+0000 7f7366172700 1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] --> [v2:172.21.15.16:6802/116595,v1:172.21.15.16:6804/116595] -- osd_op(unknown.0.152:7 21.15 21:aa448500:::500.00000000:head [writefull 0~90 in=90b] snapc 0=[] ondisk+write+known_if_redirected+full_force e168) v8 -- 0x55df918c5000 con 0x55df918c4c00
-7> 2023-05-16T05:52:41.039+0000 7f736496f700 1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] --> [v2:172.21.15.16:6802/116595,v1:172.21.15.16:6804/116595] -- osd_op(unknown.0.152:8 21.f 21:f1f69726:::400.00000000:head [writefull 0~22 in=22b] snapc 0=[] ondisk+write+known_if_redirected+full_force e168) v8 -- 0x55df918c5400 con 0x55df918c4c00
-6> 2023-05-16T05:52:41.041+0000 7f736e182700 1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.2 v2:172.21.15.16:6802/116595 3 ==== osd_op_reply(7 500.00000000 [writefull 0~90] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 156+0+0 (crc 0 0 0) 0x55df9188ab40 con 0x55df918c4c00
-5> 2023-05-16T05:52:41.041+0000 7f7366172700 -1 mds.0.journaler.pq(rw) _finish_write_head got (108) Cannot send after transport endpoint shutdown
-4> 2023-05-16T05:52:41.041+0000 7f7366172700 -1 mds.0.journaler.pq(rw) handle_write_error (108) Cannot send after transport endpoint shutdown
-3> 2023-05-16T05:52:41.041+0000 7f736e182700 1 -- [v2:172.21.15.16:6833/627211370,v1:172.21.15.16:6835/627211370] <== osd.2 v2:172.21.15.16:6802/116595 4 ==== osd_op_reply(8 400.00000000 [writefull 0~22] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 156+0+0 (crc 0 0 0) 0x55df9188ab40 con 0x55df918c4c00
-2> 2023-05-16T05:52:41.042+0000 7f736496f700 -1 mds.0.journalpointer Error writing pointer object '400.00000000': (108) Cannot send after transport endpoint shutdown
-1> 2023-05-16T05:52:41.042+0000 7f736496f700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.13-145-g20531b61/rpm/el8/BUILD/ceph-16.2.13-145-g20531b61/src/mds/MDLog.cc: In function 'void MDLog::_recovery_thread(MDSContext*)' thread 7f736496f700 time 2023-05-16T05:52:41.043527+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.13-145-g20531b61/rpm/el8/BUILD/ceph-16.2.13-145-g20531b61/src/mds/MDLog.cc: 979: FAILED ceph_assert(write_result >= 0)
ceph version 16.2.13-145-g20531b61 (20531b61f434caac5e29cea07411fac6911e8e91) pacific (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7f73739911e2]
2: /usr/lib64/ceph/libceph-common.so.2(+0x2793fc) [0x7f73739913fc]
3: (MDLog::_recovery_thread(MDSContext*)+0xd72) [0x55df8ef15132]
4: (MDLog::RecoveryThread::entry()+0x15) [0x55df8ec07845]
5: /lib64/libpthread.so.0(+0x814a) [0x7f737297514a]
6: clone()
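The proximate cause is the ceph_assert at MDLog.cc:979: the recovery thread treats any failed journal-pointer write as fatal, so the -108 reply from the OSD (the connection had already been torn down while the MDS was going down:damaged) aborts the daemon outright. A minimal Python sketch of that failure chain (hypothetical names, paraphrasing the C++ logic, not the actual Ceph code):

```python
import errno


def osd_write_after_shutdown():
    # The OSD op returns a negative errno once the connection is gone;
    # errno.ESHUTDOWN is "Cannot send after transport endpoint shutdown"
    # (value 108 on Linux).
    return -errno.ESHUTDOWN


def write_journal_pointer(osd_write):
    write_result = osd_write()
    # MDLog::_recovery_thread effectively does `ceph_assert(write_result >= 0)`:
    # a write error during recovery aborts the MDS process instead of being
    # handled as a damaged/failed state.
    assert write_result >= 0, "FAILED ceph_assert(write_result >= 0)"
```

Because the beacon had already reported down:damaged (seq 2, above) before the pointer write completed, the write raced with connection teardown and tripped the assert, which is why the test then never saw the daemon reach up:standby.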