Bug #22136
closedceph-disk-test.py:test_activate_multipath fails because nearfull on osd.2
0%
Description
This is luminous v12.2.2 point release
Run: http://pulpito.ceph.com/yuriw-2017-11-14_21:13:54-ceph-disk-luminous-distro-basic-vps/
Jobs: 1849027
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2017-11-14_21:13:54-ceph-disk-luminous-distro-basic-vps/1849027/teuthology.log
-p /dev/vdb2 2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger: trigger /dev/vdb2 parttype 45b0969e-9b03-4f30-b4c6-35865ceff106 uuid 059eb62e-c988-11e7-aa08-525400ca8456 2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/ceph-disk --verbose activate-journal --dmcrypt /dev/vdb2 2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger: 2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger: command: Running command: /usr/sbin/blkid -o udev -p /dev/vdb2 2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/blkid -o udev -p /dev/vdb2 2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:get_dmcrypt_key: no `ceph_fsid` found falling back to 'ceph' for cluster name 2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:Traceback (most recent call last): 2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr: File "/usr/sbin/ceph-disk", line 9, in <module> 2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr: load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')() 2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5709, in run 2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr: main(sys.argv[1:]) 2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5660, in main 2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr: args.func(args) 2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5410, in <lambda> 2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr: func=lambda args: main_activate_space(name, args), 2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4125, in main_activate_space 2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr: dev = dmcrypt_map(args.dev, args.dmcrypt_key_dir) 2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3453, in dmcrypt_map 2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr: dmcrypt_key = get_dmcrypt_key(part_uuid, dmcrypt_key_dir, luks) 2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1314, in get_dmcrypt_key 2017-11-14T22:07:16.739 INFO:tasks.workunit.client.0.vpm039.stderr: raise Error('unknown key-management-mode ' + str(mode))
Updated by Kefu Chai over 6 years ago
- Status changed from New to Can't reproduce
Updated by Yuri Weinstein over 6 years ago
@Kefu Chai the job from your run has the same error http://pulpito.ceph.com/kchai-2017-11-22_04:38:54-ceph-disk-wip-pr-19075-luminous-kefu-distro-basic-vps/1877198/
Updated by Yuri Weinstein over 6 years ago
- Status changed from Can't reproduce to New
Updated by Kefu Chai over 6 years ago
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/ceph-disk --verbose activate-journal --dmcrypt /dev/vdb2
the command above was launched because the OSD.2 using /dev/vdb2 as its journal was just deactivated and all data devices. as a sequence, /dev/vdb1 (data device), /dev/vdb2 (journal devices), /dev/vdb5 (lockbox) are unmounted.
i think this is expected.
and the error message is expected and harmless. the other two passed tests also share the same error. see http://pulpito.ceph.com/kchai-2017-11-24_11:36:00-ceph-disk-wip-22136-kefu-distro-basic-vps/.
the reason why the test fails is
2017-11-27 10:47:31.105 7f66a8fab700 20 osd.2 328 check_full_status cur ratio 0.878733. nearfull_ratio 0.85. backfillfull_ratio 0.9, full_rat io 0.95, failsafe_ratio 0.97, new state nearfull
see /a/kchai-2017-11-27_09:59:39-ceph-disk-wip-22136-kefu-distro-basic-vps/1896328/remote/*/log/ceph-osd.2*
2017-11-27 10:47:13.402 7f66d4b7d400 0 ceph version 13.0.0-3501-g72834d9 (72834d933cac17295fc18ce0c00e6394fe8440b2) mimic (dev), process (unknown), pid 73969 ... 2017-11-27 10:47:13.592 7f66a8fab700 20 osd.2 0 update_osd_stat osd_stat(9912 kB used, 87988 kB avail, 97900 kB total, peers [] op hist [])
for some reason, osd.2 has only 97900KB space, less than 100MB.
but it had abundant space before that:
2017-11-27 10:26:28.847 7f4c8e3bf700 20 osd.2 0 update_osd_stat osd_stat(37616 kB used, 102212 MB avail, 102249 MB total, peers [] op hist []
i think the multiple-path device is too small: the journal device sizes 100MB, and the data device also sizes 100MB. this bug only happens on centos: test_activate_multipath is skipped due to https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1488688 .
if we take the inode size into consideration, the reported total size matches the size of journal we allocate for osd.2.
Updated by Kefu Chai over 6 years ago
- Subject changed from "stderr:get_dmcrypt_key: no `ceph_fsid` found falling back to 'ceph' for cluster name" in ceph-disk-luminous to ceph-disk-test.py:test_activate_multipath fails because nearfull on osd.2
Updated by Kefu Chai over 6 years ago
- Status changed from New to Fix Under Review
- Backport set to luminous
Updated by Kefu Chai over 6 years ago
- Copied to Backport #22262: luminous: ceph-disk-test.py:test_activate_multipath fails because nearfull on osd.2 added
Updated by Kefu Chai over 6 years ago
- Status changed from Fix Under Review to Resolved