Bug #22136: ceph-disk-test.py:test_activate_multipath fails because nearfull on osd.2 - Ceph - Ceph

Actions

Copy link

Bug #22136

closed

ceph-disk-test.py:test_activate_multipath fails because nearfull on osd.2

Added by Yuri Weinstein over 6 years ago. Updated over 6 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

Kefu Chai

Category:

Target version:

% Done:

Source:

Q/A

Tags:

Backport:

luminous

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

ceph-disk

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

This is luminous v12.2.2 point release

Run: http://pulpito.ceph.com/yuriw-2017-11-14_21:13:54-ceph-disk-luminous-distro-basic-vps/
Jobs: 1849027
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2017-11-14_21:13:54-ceph-disk-luminous-distro-basic-vps/1849027/teuthology.log

-p /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger: trigger /dev/vdb2 parttype 45b0969e-9b03-4f30-b4c6-35865ceff106 uuid 059eb62e-c988-11e7-aa08-525400ca8456
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/ceph-disk --verbose activate-journal --dmcrypt /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger:
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger: command: Running command: /usr/sbin/blkid -o udev -p /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/blkid -o udev -p /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:get_dmcrypt_key: no `ceph_fsid` found falling back to 'ceph' for cluster name
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:Traceback (most recent call last):
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/sbin/ceph-disk", line 9, in <module>
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5709, in run
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    main(sys.argv[1:])
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5660, in main
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    args.func(args)
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5410, in <lambda>
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    func=lambda args: main_activate_space(name, args),
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4125, in main_activate_space
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    dev = dmcrypt_map(args.dev, args.dmcrypt_key_dir)
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3453, in dmcrypt_map
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    dmcrypt_key = get_dmcrypt_key(part_uuid, dmcrypt_key_dir, luks)
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1314, in get_dmcrypt_key
2017-11-14T22:07:16.739 INFO:tasks.workunit.client.0.vpm039.stderr:    raise Error('unknown key-management-mode ' + str(mode))

Related issues 1 (0 open — 1 closed)

Actions

Copy link

Updated by Kefu Chai over 6 years ago

Assignee set to Kefu Chai

Actions

Copy link

Updated by Kefu Chai over 6 years ago

Status changed from New to Can't reproduce

http://pulpito.ceph.com/kchai-2017-11-22_04:38:54-ceph-disk-wip-pr-19075-luminous-kefu-distro-basic-vps/

Actions

Copy link

Updated by Yuri Weinstein over 6 years ago

@Kefu Chai the job from your run has the same error http://pulpito.ceph.com/kchai-2017-11-22_04:38:54-ceph-disk-wip-pr-19075-luminous-kefu-distro-basic-vps/1877198/

Actions

Copy link

Updated by Yuri Weinstein over 6 years ago

Status changed from Can't reproduce to New

Actions

Copy link

Updated by Kefu Chai over 6 years ago

2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/ceph-disk --verbose activate-journal --dmcrypt /dev/vdb2

the command above was launched because the OSD.2 using /dev/vdb2 as its journal was just deactivated and all data devices. as a sequence, /dev/vdb1 (data device), /dev/vdb2 (journal devices), /dev/vdb5 (lockbox) are unmounted.

i think this is expected.

and the error message is expected and harmless. the other two passed tests also share the same error. see http://pulpito.ceph.com/kchai-2017-11-24_11:36:00-ceph-disk-wip-22136-kefu-distro-basic-vps/.

the reason why the test fails is

2017-11-27 10:47:31.105 7f66a8fab700 20 osd.2 328 check_full_status cur ratio 0.878733. nearfull_ratio 0.85. backfillfull_ratio 0.9, full_rat
io 0.95, failsafe_ratio 0.97, new state nearfull

see /a/kchai-2017-11-27_09:59:39-ceph-disk-wip-22136-kefu-distro-basic-vps/1896328/remote/*/log/ceph-osd.2*

2017-11-27 10:47:13.402 7f66d4b7d400  0 ceph version 13.0.0-3501-g72834d9 (72834d933cac17295fc18ce0c00e6394fe8440b2) mimic (dev), process (unknown), pid 73969
...
2017-11-27 10:47:13.592 7f66a8fab700 20 osd.2 0 update_osd_stat osd_stat(9912 kB used, 87988 kB avail, 97900 kB total, peers [] op hist [])

for some reason, osd.2 has only 97900KB space, less than 100MB.

but it had abundant space before that:

2017-11-27 10:26:28.847 7f4c8e3bf700 20 osd.2 0 update_osd_stat osd_stat(37616 kB used, 102212 MB avail, 102249 MB total, peers [] op hist []

i think the multiple-path device is too small: the journal device sizes 100MB, and the data device also sizes 100MB. this bug only happens on centos: test_activate_multipath is skipped due to https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1488688 .

if we take the inode size into consideration, the reported total size matches the size of journal we allocate for osd.2.

Actions

Copy link

Updated by Kefu Chai over 6 years ago

Subject changed from "stderr:get_dmcrypt_key: no `ceph_fsid` found falling back to 'ceph' for cluster name" in ceph-disk-luminous to ceph-disk-test.py:test_activate_multipath fails because nearfull on osd.2

Actions

Copy link