Bug #39634
qa: test_full_same_file timeout
Description
2019-05-08T05:26:30.146 INFO:tasks.cephfs_test_runner:======================================================================
2019-05-08T05:26:30.146 INFO:tasks.cephfs_test_runner:ERROR: test_full_same_file (tasks.cephfs.test_full.TestClusterFull)
2019-05-08T05:26:30.146 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2019-05-08T05:26:30.146 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2019-05-08T05:26:30.147 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20190501.220632/qa/tasks/cephfs/test_full.py", line 205, in test_full_same_file
2019-05-08T05:26:30.147 INFO:tasks.cephfs_test_runner:    self._test_full(False)
2019-05-08T05:26:30.147 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20190501.220632/qa/tasks/cephfs/test_full.py", line 187, in _test_full
2019-05-08T05:26:30.147 INFO:tasks.cephfs_test_runner:    timeout=osd_mon_report_interval * 5)
2019-05-08T05:26:30.147 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20190501.220632/qa/tasks/ceph_test_case.py", line 163, in wait_until_true
2019-05-08T05:26:30.148 INFO:tasks.cephfs_test_runner:    raise RuntimeError("Timed out after {0}s".format(elapsed))
2019-05-08T05:26:30.148 INFO:tasks.cephfs_test_runner:RuntimeError: Timed out after 25s
/ceph/teuthology-archive/pdonnell-2019-05-08_04:00:30-fs-wip-pdonnell-testing-20190501.220632-distro-basic-smithi/3939426/teuthology.log
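The failure comes out of the wait_until_true helper in qa/tasks/ceph_test_case.py, called with timeout=osd_mon_report_interval * 5 (25s here). A minimal sketch of that polling pattern, reconstructed from the traceback (the real helper's signature and internals may differ):

```python
import time

def wait_until_true(condition, timeout, period=5):
    """Poll `condition` until it returns True, or raise after `timeout` seconds."""
    elapsed = 0
    while True:
        if condition():
            return
        if elapsed >= timeout:
            # Matches the error seen in the run: "Timed out after 25s"
            raise RuntimeError("Timed out after {0}s".format(elapsed))
        time.sleep(period)
        elapsed += period
```

With osd_mon_report_interval = 5, the test waits at most 25s for the cluster to report full, then raises the RuntimeError shown above.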
History
#1 Updated by Patrick Donnelly about 4 years ago
- Target version deleted (v15.0.0)
#2 Updated by Kotresh Hiremath Ravishankar over 2 years ago
Seen in the pacific run below.
Log: /ceph/teuthology-archive/yuriw-2021-11-08_20:21:11-fs-wip-yuri5-testing-2021-11-08-1003-pacific-distro-basic-smithi/6492110/teuthology.log
#4 Updated by Xiubo Li over 2 years ago
While the test_full test case was deleting "large_file_b" and "large_file_a", from /ceph/teuthology-archive/yuriw-2021-11-08_20:21:11-fs-wip-yuri5-testing-2021-11-08-1003-pacific-distro-basic-smithi/6492110/teuthology.log:
2021-11-09T15:08:25.412 DEBUG:teuthology.orchestra.run.smithi053:> (cd /home/ubuntu/cephtest/mnt.0 && exec rm -f large_file_b)
2021-11-09T15:08:25.485 INFO:teuthology.orchestra.run:Running command with timeout 900
2021-11-09T15:08:25.486 DEBUG:teuthology.orchestra.run.smithi053:> (cd /home/ubuntu/cephtest/mnt.0 && exec rm -f large_file_a)
In /ceph/teuthology-archive/yuriw-2021-11-08_20:21:11-fs-wip-yuri5-testing-2021-11-08-1003-pacific-distro-basic-smithi/6492110/remote/smithi060/log/ceph-osd.4.log.gz:
2021-11-09T15:08:20.659+0000 7f0e1144e700 20 osd.4 110 check_full_status cur ratio 1, physical ratio 1, new state full
2021-11-09T15:08:20.659+0000 7f0e1144e700 10 osd.4 110 check_full_status none -> full
...
2021-11-09T15:08:26.563+0000 7f0e1144e700 20 osd.4 112 check_full_status cur ratio 1, physical ratio 1, new state full
...
2021-11-09T15:08:27.659+0000 7f0e1144e700 20 osd.4 113 check_full_status cur ratio 1, physical ratio 1, new state full
...
2021-11-09T15:08:27.763+0000 7f0e13452700 20 osd.4 pg_epoch: 114 pg[14.24( v 114'9 (0'0,114'9] local-lis/les=113/114 n=0 ec=107/107 lis/c=113/107 les/c/f=114/108/110 sis=113) [7,4] r=1 lpr=113 pi=[107,113)/1 luod=0'0 lua=114'5 crt=114'9 lcod 114'7 mlcod 109'2 active mbc={}] rollforward: entry=114'9 (109'4) delete 14:276f434d:::10000000001.0000003d:head by mds.0.43:200 2021-11-09T15:08:27.255282+0000 0 ObjectCleanRegions clean_offsets: [4194304~18446744073705357311], clean_omap: 0, new_object: 0
...
2021-11-09T15:08:33.563+0000 7f0e1144e700 20 osd.4 116 check_full_status cur ratio 9.22337e+10, physical ratio 9.22337e+10, new state failsafe
2021-11-09T15:08:33.563+0000 7f0e1144e700 10 osd.4 116 check_full_status full -> failsafe
...
2021-11-09T15:20:43.168+0000 7f0e1144e700 20 osd.4 127 check_full_status cur ratio 9.22337e+10, physical ratio 9.22337e+10, new state failsafe
While the objects for those two files were being deleted, osd.4 entered the failsafe state and stayed there until test_full timed out.
This appears to be an OSD-side bug.
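The clean_offsets interval in the rollforward entry above is itself suspicious. Assuming Ceph's usual "offset~length" interval notation, a quick check shows it spans from 4 MiB to the very top of the 64-bit offset range:

```python
# Decode the ObjectCleanRegions interval "4194304~18446744073705357311"
# from the osd.4 log, assuming "offset~length" notation.
offset, length = 4194304, 18446744073705357311

print(offset == 4 * 1024 * 1024)      # True: the clean region starts at 4 MiB
print(offset + length == 2**64 - 1)   # True: it runs to the 64-bit limit
```

A length that reaches exactly 2**64 - 1 is the kind of value produced by unsigned-integer wraparound rather than a real object size.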
#5 Updated by Xiubo Li over 2 years ago
Checked all the other OSDs: none of them reached the "mon osd full ratio: 0.7"; only osd.4 did.
That means the PGs were still not evenly distributed among the OSDs for the fs pools.
#6 Updated by Xiubo Li over 2 years ago
In /ceph/teuthology-archive/yuriw-2021-11-08_20:21:11-fs-wip-yuri5-testing-2021-11-08-1003-pacific-distro-basic-smithi/6492110/remote/smithi060/log/ceph-osd.4.log.gz:
2021-11-09T15:08:27.659+0000 7f0e1144e700  5 osd.4 113 heartbeat osd_stat(store_statfs(0x0/0x0/0xbebc200, data 0x0/0x0, compress 0x0/0x0/0x0, omap 0x0, meta 0x0), peers [0,1,2,3,5,6,7] op hist [])
2021-11-09T15:08:27.659+0000 7f0e1144e700 20 osd.4 113 check_full_status cur ratio 1, physical ratio 1, new state full
...
2021-11-09T15:08:33.563+0000 7f0e1144e700  5 osd.4 116 heartbeat osd_stat(store_statfs(0xc215534/0x0/0xbebc200, data 0x0/0x0, compress 0x0/0x0/0x0, omap 0x0, meta 0x0), peers [0,1,2,3,5,6,7] op hist [])
2021-11-09T15:08:33.563+0000 7f0e1144e700 20 osd.4 116 check_full_status cur ratio 9.22337e+10, physical ratio 9.22337e+10, new state failsafe
2021-11-09T15:08:33.563+0000 7f0e1144e700 10 osd.4 116 check_full_status full -> failsafe
Something went wrong when calculating the statfs "0xc215534/0x0/0xbebc200", which is "available/internally_reserved/total": the reported available disk space is larger than the total.
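That bad statfs also explains the absurd "cur ratio 9.22337e+10". If the used space is derived as total - available in 64-bit unsigned arithmetic, then available > total makes the subtraction wrap around, and the resulting ratio matches the log exactly. This is an illustrative reconstruction, not necessarily the exact OSD code path:

```python
# statfs from the osd.4 heartbeat: available > total.
available = 0xc215534   # 203,511,092 bytes reported available
total = 0xbebc200       # 200,000,000 bytes total

# "used = total - available" with 64-bit unsigned wraparound:
used = (total - available) % 2**64
ratio = used / total

print(f"{ratio:.5e}")   # 9.22337e+10, matching "cur ratio" in the log
```

Since the wrapped value is close to 2**64, the ratio blows past every full threshold, which is why osd.4 jumped straight into the failsafe state.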
#7 Updated by Xiubo Li over 2 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 44128