Bug #22136

ceph-disk-test.py:test_activate_multipath fails because nearfull on osd.2

Added by Yuri Weinstein almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 11/15/2017
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-disk
Pull request ID:

Description

This is for the luminous v12.2.2 point release.

Run: http://pulpito.ceph.com/yuriw-2017-11-14_21:13:54-ceph-disk-luminous-distro-basic-vps/
Jobs: 1849027
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2017-11-14_21:13:54-ceph-disk-luminous-distro-basic-vps/1849027/teuthology.log

-p /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger: trigger /dev/vdb2 parttype 45b0969e-9b03-4f30-b4c6-35865ceff106 uuid 059eb62e-c988-11e7-aa08-525400ca8456
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/ceph-disk --verbose activate-journal --dmcrypt /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger:
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger: command: Running command: /usr/sbin/blkid -o udev -p /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/blkid -o udev -p /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:get_dmcrypt_key: no `ceph_fsid` found falling back to 'ceph' for cluster name
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:Traceback (most recent call last):
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/sbin/ceph-disk", line 9, in <module>
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5709, in run
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    main(sys.argv[1:])
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5660, in main
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    args.func(args)
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5410, in <lambda>
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    func=lambda args: main_activate_space(name, args),
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4125, in main_activate_space
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    dev = dmcrypt_map(args.dev, args.dmcrypt_key_dir)
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3453, in dmcrypt_map
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    dmcrypt_key = get_dmcrypt_key(part_uuid, dmcrypt_key_dir, luks)
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1314, in get_dmcrypt_key
2017-11-14T22:07:16.739 INFO:tasks.workunit.client.0.vpm039.stderr:    raise Error('unknown key-management-mode ' + str(mode))

Related issues

Copied to Ceph - Backport #22262: luminous: ceph-disk-test.py:test_activate_multipath fails because nearfull on osd.2 Resolved

History

#1 Updated by Kefu Chai almost 2 years ago

  • Assignee set to Kefu Chai

#4 Updated by Yuri Weinstein over 1 year ago

  • Status changed from Can't reproduce to New

#5 Updated by Kefu Chai over 1 year ago

2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/ceph-disk --verbose activate-journal --dmcrypt /dev/vdb2

the command above was launched because osd.2, which uses /dev/vdb2 as its journal, had just been deactivated along with all data devices. as a consequence, /dev/vdb1 (data device), /dev/vdb2 (journal device), and /dev/vdb5 (lockbox) are unmounted.

i think this is expected.

and the error message is expected and harmless. the other two passed tests also share the same error. see http://pulpito.ceph.com/kchai-2017-11-24_11:36:00-ceph-disk-wip-22136-kefu-distro-basic-vps/.

the reason why the test fails is that osd.2 becomes nearfull:

2017-11-27 10:47:31.105 7f66a8fab700 20 osd.2 328 check_full_status cur ratio 0.878733. nearfull_ratio 0.85. backfillfull_ratio 0.9, full_ratio 0.95, failsafe_ratio 0.97, new state nearfull

see /a/kchai-2017-11-27_09:59:39-ceph-disk-wip-22136-kefu-distro-basic-vps/1896328/remote/*/log/ceph-osd.2*
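
To make the check_full_status line above easier to read, here is a minimal sketch of how a usage ratio maps onto the thresholds printed in that log. It is a simplified illustration and not Ceph's actual implementation: the function name `classify_fullness`, the strict `>` comparison, and the "none" fallback are assumptions made for this example. Only the ratio values come from the log.

```python
def classify_fullness(cur_ratio, nearfull=0.85, backfillfull=0.90,
                      full=0.95, failsafe=0.97):
    """Return the name of the most severe threshold that cur_ratio exceeds.

    Hypothetical helper for illustration; the default thresholds are the
    values printed in the check_full_status log line.
    """
    # Check from most to least severe so the first match wins.
    thresholds = [
        ("failsafe", failsafe),
        ("full", full),
        ("backfillfull", backfillfull),
        ("nearfull", nearfull),
    ]
    for state, limit in thresholds:
        if cur_ratio > limit:
            return state
    # Assumed label for "below every threshold".
    return "none"


# The ratio reported for osd.2 in the log line.
print(classify_fullness(0.878733))
```

With the logged ratio of 0.878733, the sketch lands between nearfull_ratio (0.85) and backfillfull_ratio (0.9), which agrees with the "new state nearfull" reported by the OSD.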

2017-11-27 10:47:13.402 7f66d4b7d400  0 ceph version 13.0.0-3501-g72834d9 (72834d933cac17295fc18ce0c00e6394fe8440b2) mimic (dev), process (unknown), pid 73969
...
2017-11-27 10:47:13.592 7f66a8fab700 20 osd.2 0 update_osd_stat osd_stat(9912 kB used, 87988 kB avail, 97900 kB total, peers [] op hist [])

for some reason, osd.2 has only 97900 kB of space in total, less than 100 MB.

but it had abundant space before that:

2017-11-27 10:26:28.847 7f4c8e3bf700 20 osd.2 0 update_osd_stat osd_stat(37616 kB used, 102212 MB avail, 102249 MB total, peers [] op hist []

i think the multipath device is too small: the journal device is 100MB, and the data device is also 100MB. this bug only happens on centos, because on ubuntu test_activate_multipath is skipped due to https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1488688 .

if we take the inode size into consideration, the reported total size matches the size of the journal we allocate for osd.2.
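
The size comparison in this comment can be reproduced with a short sketch that parses the two update_osd_stat lines quoted above and converts the totals to a common unit. The helper `parse_osd_stat`, the regular expression, and the assumption that 1 MB equals 1024 kB in these logs are all illustrative choices, not part of Ceph or ceph-disk.

```python
import re

# Matches the size fields of an update_osd_stat log line, e.g.
# "osd_stat(9912 kB used, 87988 kB avail, 97900 kB total, ..."
OSD_STAT_RE = re.compile(
    r"osd_stat\((\d+) (kB|MB) used, (\d+) (kB|MB) avail, (\d+) (kB|MB) total"
)

# Assumption: the log uses binary units (1 MB = 1024 kB).
KB_PER_UNIT = {"kB": 1, "MB": 1024}


def parse_osd_stat(line):
    """Return (used_kb, avail_kb, total_kb) from an update_osd_stat line."""
    match = OSD_STAT_RE.search(line)
    if match is None:
        raise ValueError("no osd_stat fields found in line")
    used, used_unit, avail, avail_unit, total, total_unit = match.groups()
    return (
        int(used) * KB_PER_UNIT[used_unit],
        int(avail) * KB_PER_UNIT[avail_unit],
        int(total) * KB_PER_UNIT[total_unit],
    )


# The two log excerpts quoted in this comment (timestamps omitted).
before = "osd_stat(37616 kB used, 102212 MB avail, 102249 MB total, peers [] op hist []"
after = "osd_stat(9912 kB used, 87988 kB avail, 97900 kB total, peers [] op hist [])"

for label, line in (("before", before), ("after", after)):
    used_kb, avail_kb, total_kb = parse_osd_stat(line)
    print(f"{label}: total {total_kb / 1024:.1f} MB, used {used_kb / total_kb:.1%}")
```

Under the binary-unit assumption, the later line works out to a total of roughly 95.6 MB, which is in the range of the 100MB devices mentioned above, while the earlier line reports more than 100000 MB.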

#6 Updated by Kefu Chai over 1 year ago

  • Subject changed from "stderr:get_dmcrypt_key: no `ceph_fsid` found falling back to 'ceph' for cluster name" in ceph-disk-luminous to ceph-disk-test.py:test_activate_multipath fails because nearfull on osd.2

#7 Updated by Kefu Chai over 1 year ago

  • Status changed from New to Need Review
  • Backport set to luminous

#8 Updated by Kefu Chai over 1 year ago

  • Copied to Backport #22262: luminous: ceph-disk-test.py:test_activate_multipath fails because nearfull on osd.2 added

#9 Updated by Kefu Chai over 1 year ago

  • Status changed from Need Review to Resolved
