Bug #22136 fails because nearfull on osd.2

This is for the luminous v12.2.2 point release.

Jobs: 1849027

-p /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger: trigger /dev/vdb2 parttype 45b0969e-9b03-4f30-b4c6-35865ceff106 uuid 059eb62e-c988-11e7-aa08-525400ca8456
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/ceph-disk --verbose activate-journal --dmcrypt /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger:
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:main_trigger: command: Running command: /usr/sbin/blkid -o udev -p /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/blkid -o udev -p /dev/vdb2
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:get_dmcrypt_key: no `ceph_fsid` found falling back to 'ceph' for cluster name
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:Traceback (most recent call last):
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/sbin/ceph-disk", line 9, in <module>
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/", line 5709, in run
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    main(sys.argv[1:])
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/", line 5660, in main
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    args.func(args)
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/", line 5410, in <lambda>
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    func=lambda args: main_activate_space(name, args),
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/", line 4125, in main_activate_space
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    dev = dmcrypt_map(, args.dmcrypt_key_dir)
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/", line 3453, in dmcrypt_map
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:    dmcrypt_key = get_dmcrypt_key(part_uuid, dmcrypt_key_dir, luks)
2017-11-14T22:07:16.738 INFO:tasks.workunit.client.0.vpm039.stderr:  File "/usr/lib/python2.7/site-packages/ceph_disk/", line 1314, in get_dmcrypt_key
2017-11-14T22:07:16.739 INFO:tasks.workunit.client.0.vpm039.stderr:    raise Error('unknown key-management-mode ' + str(mode))

Related issues

Copied to Ceph - Backport #22262: luminous: fails because nearfull on osd.2 Resolved


#1 Updated by Kefu Chai almost 2 years ago

  • Assignee set to Kefu Chai

#4 Updated by Yuri Weinstein over 1 year ago

  • Status changed from Can't reproduce to New

#5 Updated by Kefu Chai over 1 year ago

2017-11-14T22:07:16.737 INFO:tasks.workunit.client.0.vpm039.stderr:command: Running command: /usr/sbin/ceph-disk --verbose activate-journal --dmcrypt /dev/vdb2

the command above was launched because osd.2, which uses /dev/vdb2 as its journal, was just deactivated along with all its data devices. as a consequence, /dev/vdb1 (data device), /dev/vdb2 (journal device), and /dev/vdb5 (lockbox) are unmounted.

i think this is expected.

and the error message is expected and harmless. the other two passed tests also share the same error. see

the reason why the test fails is

2017-11-27 10:47:31.105 7f66a8fab700 20 osd.2 328 check_full_status cur ratio 0.878733. nearfull_ratio 0.85. backfillfull_ratio 0.9, full_ratio 0.95, failsafe_ratio 0.97, new state nearfull
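To make the check_full_status line above easier to read, here is a minimal sketch of how a usage ratio maps onto the thresholds printed in that log. This is not Ceph's actual code: the function name `full_state` is hypothetical, the default ratios are copied from the log line, and the strict `>` comparison at each boundary is an assumption.

```python
def full_state(cur_ratio, nearfull=0.85, backfillfull=0.90,
               full=0.95, failsafe=0.97):
    """Return the name of the highest threshold that cur_ratio exceeds.

    Hypothetical helper for illustration only; the thresholds default
    to the values printed in the osd.2 log, and the strict comparison
    is an assumption about boundary behaviour.
    """
    # Check from the most severe threshold down to the least severe.
    thresholds = [
        (failsafe, "failsafe"),
        (full, "full"),
        (backfillfull, "backfillfull"),
        (nearfull, "nearfull"),
    ]
    for limit, name in thresholds:
        if cur_ratio > limit:
            return name
    return "none"


# Ratio reported for osd.2 in the log: above 0.85 but below 0.9.
state = full_state(0.878733)
print(state)
```

With the ratio from the log, 0.878733 sits between nearfull_ratio and backfillfull_ratio, which is consistent with the "new state nearfull" reported there.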

see /a/kchai-2017-11-27_09:59:39-ceph-disk-wip-22136-kefu-distro-basic-vps/1896328/remote/*/log/ceph-osd.2*

2017-11-27 10:47:13.402 7f66d4b7d400  0 ceph version 13.0.0-3501-g72834d9 (72834d933cac17295fc18ce0c00e6394fe8440b2) mimic (dev), process (unknown), pid 73969
2017-11-27 10:47:13.592 7f66a8fab700 20 osd.2 0 update_osd_stat osd_stat(9912 kB used, 87988 kB avail, 97900 kB total, peers [] op hist [])

for some reason, osd.2 has only 97900KB space, less than 100MB.

but it had abundant space before that:

2017-11-27 10:26:28.847 7f4c8e3bf700 20 osd.2 0 update_osd_stat osd_stat(37616 kB used, 102212 MB avail, 102249 MB total, peers [] op hist []
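The two update_osd_stat lines report sizes in different units (kB and MB), so a small parser makes the comparison explicit. This is a sketch only: `parse_osd_stat` is a hypothetical helper, the regular expression is written against the two log lines quoted in this comment, and treating 1 MB as 1024 kB is an assumption about how the log prints sizes.

```python
import re

# Assumed conversion factors to kB (1 MB taken as 1024 kB).
_UNITS_IN_KB = {"kB": 1, "MB": 1024, "GB": 1024 * 1024}

_STAT_RE = re.compile(
    r"osd_stat\((\d+) (\w+) used, (\d+) (\w+) avail, (\d+) (\w+) total"
)


def parse_osd_stat(line):
    """Extract used/avail/total from an update_osd_stat log line, in kB."""
    m = _STAT_RE.search(line)
    if m is None:
        raise ValueError("no osd_stat(...) found in line")
    used, used_u, avail, avail_u, total, total_u = m.groups()
    return {
        "used_kb": int(used) * _UNITS_IN_KB[used_u],
        "avail_kb": int(avail) * _UNITS_IN_KB[avail_u],
        "total_kb": int(total) * _UNITS_IN_KB[total_u],
    }


# The two osd_stat fragments quoted in this comment.
after = parse_osd_stat(
    "osd_stat(9912 kB used, 87988 kB avail, 97900 kB total, peers [] op hist [])"
)
before = parse_osd_stat(
    "osd_stat(37616 kB used, 102212 MB avail, 102249 MB total, peers [] op hist []"
)

# True when the later total is below 100 MB (102400 kB).
print(after["total_kb"] < 100 * 1024)
# Rough factor by which the reported total shrank between the two lines.
print(before["total_kb"] // after["total_kb"])
```

Normalizing to one unit shows the drop directly: the earlier line reports roughly 100 GB total, while the later one reports under 100 MB.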

i think the multipath device is too small: the journal device is 100MB, and the data device is also 100MB. this bug only happens on centos: test_activate_multipath is skipped due to .

if we take the inode size into consideration, the reported total size matches the size of the journal we allocate for osd.2.

#6 Updated by Kefu Chai over 1 year ago

  • Subject changed from "stderr:get_dmcrypt_key: no `ceph_fsid` found falling back to 'ceph' for cluster name" in ceph-disk-luminous to fails because nearfull on osd.2

#7 Updated by Kefu Chai over 1 year ago

  • Status changed from New to Need Review
  • Backport set to luminous

#8 Updated by Kefu Chai over 1 year ago

  • Copied to Backport #22262: luminous: fails because nearfull on osd.2 added

#9 Updated by Kefu Chai over 1 year ago

  • Status changed from Need Review to Resolved

