Fix #8109

closed

OSD-disk fails to activate when final mount dir is not empty and shows no proper error message

Added by Alphe Salas about 10 years ago. Updated almost 6 years ago.

Status:
Closed
Priority:
High
Assignee:
-
Category:
ceph-disk
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

An OSD disk fails to activate when the final mount directory, STATEDIR + '/osd/{cluster}-{osd_id}'.format(cluster=cluster, osd_id=osd_id), is not empty, and ceph-disk shows no proper error message.

Here is how the code currently proceeds in the function mount_activate().
This function fetches the config data for the drive we want to mount, then mounts the drive in a
temporary directory, then runs tests and raises flags according to the results of those tests (activate=True or other=True).

The problem shows up in two places: in the if tests that raise the flags, and in the if blocks that display a message once a flag is up.

In my case nothing reported that the test
    elif os.listdir('/var/lib/ceph/osd/{cluster}-{osd_id}'.format(
            cluster=cluster,
            osd_id=osd_id,
            )):
        other = True
had fired. I know it was operating properly because the activate variable still had its default value, False.

The error message for the os.listdir test is triggered by a raise Error(), and this doesn't work properly. Even if it did, the message would not be accurate: Error('another %s osd.%s already mounted in position (old/different cluster instance?); unmounting ours.' % (cluster, osd_id))

At that point it should say:
    '/var/lib/ceph/osd/{cluster}-{osd_id}'.format(
        cluster=cluster,
        osd_id=osd_id,
        ) + ' is not an empty directory! Please delete the files in it if it was properly unmounted!'

So how would I change it?

Instead of raising an Error that displays no message other than that activation has failed (variable other=True) after a successful os.listdir, I would do a
    LOG.debug('/var/lib/ceph/osd/{cluster}-{osd_id}'.format(
        cluster=cluster,
        osd_id=osd_id,
        ) + ' is not an empty directory! If it was previously properly unmounted then delete the files in that directory!')

Something like:
    elif other:
        LOG.debug('/var/lib/ceph/osd/{cluster}-{osd_id}'.format(
            cluster=cluster,
            osd_id=osd_id,
            ) + ' is not an empty directory! If it was previously properly unmounted then delete the files in that directory!')
        raise Error('')

That way we still have the proper error raised, and we also get additional information about the nature of the problem that is more accurate and allows a faster resolution.
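
A minimal sketch of the behaviour proposed above, written in Python like ceph-disk itself. The Error class and the check_final_mount_dir helper are hypothetical stand-ins, not the actual ceph-disk code. The sketch only illustrates logging the non-empty directory and raising an error that carries the same text, so the reason also reaches the ERROR line.

```python
import logging
import os
import tempfile

LOG = logging.getLogger("ceph-disk-sketch")


class Error(Exception):
    """Hypothetical stand-in for ceph-disk's own error class."""


def check_final_mount_dir(path):
    """Raise Error with a descriptive message if path exists and is not empty."""
    if os.path.isdir(path) and os.listdir(path):
        message = (
            "{path} is not an empty directory! If it was previously "
            "properly unmounted then delete the files in that directory!"
        ).format(path=path)
        LOG.debug(message)
        # Unlike raise Error(''), the exception itself names the cause.
        raise Error(message)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    with tempfile.TemporaryDirectory() as final_dir:
        check_final_mount_dir(final_dir)  # empty directory: passes silently
        # Leave a stale file behind, as in the reproduction steps.
        open(os.path.join(final_dir, "whoami"), "w").close()
        try:
            check_final_mount_dir(final_dir)
        except Error as exc:
            print("activation refused:", exc)
```

Putting the text in the exception as well as in the debug log means the cause is visible even when ceph-disk is run without -v.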

Currently, the verbose output makes no mention of the cause of this problem, as you can see:
# ceph-disk -v activate /dev/sda1
DEBUG:ceph-disk:Mounting /dev/sda1 on /var/lib/ceph/tmp/mnt.HFXmKe with options noatime
DEBUG:ceph-disk:Cluster uuid is 735977a6-8415-4132-8316-b9ba6efc53dc
DEBUG:ceph-disk:Cluster name is ceph
DEBUG:ceph-disk:OSD uuid is d38f05cf-1bd1-4497-b2a4-89db783ccb27
DEBUG:ceph-disk:OSD id is 4
DEBUG:ceph-disk:Marking with init system upstart
DEBUG:ceph-disk:ceph osd.4 data dir is ready at /var/lib/ceph/tmp/mnt.HFXmK
ERROR:ceph-disk:Failed to activate /dev/sda1
DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.HFXmKe

To reproduce the problem, copy some of the top-level files and directories from the disk of one of your OSDs. Then unmount that disk and copy the files and directories back into the directory that is the final mount point, '/var/lib/ceph/osd/{cluster}-{osd_id}'.format(cluster=cluster, osd_id=osd_id). Then run ceph-disk -v activate /dev/sdX1 and see that it neither works nor shows the reason why it fails.
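
To check for this condition before running ceph-disk, something like the following sketch could be used. The leftover_entries helper is hypothetical and not part of ceph-disk. The path layout is the one quoted in this report, and the cluster name and OSD id in the usage part are just the values from the log above.

```python
import os


def leftover_entries(cluster, osd_id, statedir="/var/lib/ceph"):
    """Return the entries in the final mount dir when nothing is mounted there.

    Returns an empty list if the directory is missing, empty, or a mount point.
    """
    path = os.path.join(
        statedir,
        "osd",
        "{cluster}-{osd_id}".format(cluster=cluster, osd_id=osd_id),
    )
    if not os.path.isdir(path) or os.path.ismount(path):
        return []
    return sorted(os.listdir(path))


if __name__ == "__main__":
    entries = leftover_entries("ceph", 4)
    if entries:
        print("stale files in final mount dir:", entries)
    else:
        print("final mount dir is empty, missing, or a mount point")
```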

Actions #1

Updated by Alphe Salas about 10 years ago

Another way to solve this is simply to change

    elif os.listdir('/var/lib/ceph/osd/{cluster}-{osd_id}'.format(
            cluster=cluster,
            osd_id=osd_id,
            )):
        other = True

to

    elif os.listdir('/var/lib/ceph/osd/{cluster}-{osd_id}'.format(
            cluster=cluster,
            osd_id=osd_id,
            )):
        LOG.debug('the directory ' + '/var/lib/ceph/osd/{cluster}-{osd_id}'.format(
            cluster=cluster,
            osd_id=osd_id,
            ) + " is not empty! Check that the related drive isn't mounted!")

Actions #2

Updated by Sage Weil almost 6 years ago

  • Status changed from New to Closed