Bug #23911

ceph:luminous: osd out/down when setup with ubuntu/bluestore

Added by Vasu Kulkarni almost 6 years ago. Updated about 2 years ago.

Status: Won't Fix - EOL
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This could be a systemd issue or something more. Steps to reproduce:

a) Set up a cluster using ceph-deploy.
b) Create the OSDs with the ceph-disk/bluestore option:
cd /home/ubuntu/cephtest/ceph-deploy && ./ceph-deploy osd create --bluestore mira006:sdb

c) The OSD ends up out/down.

http://qa-proxy.ceph.com/teuthology/teuthology-2018-04-26_05:55:02-ceph-deploy-luminous-distro-basic-mira/2441160/teuthology.log
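
For reference, outside teuthology the same sequence can be reproduced roughly as below. This is only a sketch: it assumes a single node named mira006 with a spare /dev/sdb and ceph-deploy 1.5.x checked out in the working directory; only the disk zap and osd create lines are taken from the test log itself.

# bootstrap a one-node cluster (standard ceph-deploy workflow, not from the log)
./ceph-deploy new mira006
./ceph-deploy install --release luminous mira006
./ceph-deploy mon create-initial

# zap the disk and create a BlueStore OSD on it (these two commands appear in the log)
./ceph-deploy disk zap mira006:sdb
./ceph-deploy osd create --bluestore mira006:sdb

# the OSD should register as up/in; in this run it stays down and out
sudo /usr/bin/ceph --cluster=ceph osd stat --format=json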


2018-04-26T09:49:38.010 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.IwXf08
2018-04-26T09:49:38.011 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] unmount: Unmounting /var/lib/ceph/tmp/mnt.IwXf08
2018-04-26T09:49:38.011 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.IwXf08
2018-04-26T09:49:38.125 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
2018-04-26T09:49:38.132 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] command_check_call: Running command: /sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sdb
2018-04-26T09:49:39.149 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] Warning: The kernel is still using the old partition table.
2018-04-26T09:49:39.149 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] The new table will be used at the next reboot or after you
2018-04-26T09:49:39.149 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] run partprobe(8) or kpartx(8)
2018-04-26T09:49:39.149 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] The operation has completed successfully.
2018-04-26T09:49:39.149 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] update_partition: Calling partprobe on prepared device /dev/sdb
2018-04-26T09:49:39.149 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] command_check_call: Running command: /sbin/udevadm settle --timeout=600
2018-04-26T09:49:39.150 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] command: Running command: /usr/bin/flock -s /dev/sdb /sbin/partprobe /dev/sdb
2018-04-26T09:49:39.263 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] command_check_call: Running command: /sbin/udevadm settle --timeout=600
2018-04-26T09:49:39.263 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] command_check_call: Running command: /sbin/udevadm trigger --action=add --sysname-match sdb1
2018-04-26T09:49:39.280 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][INFO  ] Running command: sudo systemctl enable ceph.target
2018-04-26T09:49:44.402 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][INFO  ] checking OSD status...
2018-04-26T09:49:44.402 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] find the location of an executable
2018-04-26T09:49:44.406 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][INFO  ] Running command: sudo /usr/bin/ceph --cluster=ceph osd stat --format=json
2018-04-26T09:49:44.723 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] there is 1 OSD down
2018-04-26T09:49:44.724 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][WARNING] there is 1 OSD out
2018-04-26T09:49:44.724 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.osd][DEBUG ] Host mira006 is now ready for osd use.

History

#2 Updated by Alfredo Deza almost 6 years ago

Looking at the logs for the OSD that failed:

2018-04-26 08:53:01.504628 7f8b27cb0d00  0 set uid:gid to 167:167 (ceph:ceph)
2018-04-26 08:53:01.504666 7f8b27cb0d00  0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 33448
2018-04-26 08:53:01.505092 7f8b27cb0d00 10 bluestore(/var/lib/ceph/tmp/mnt.90QTvg) set_cache_shards 1
2018-04-26 08:53:01.509392 7f8b27cb0d00 10 bluestore(/var/lib/ceph/tmp/mnt.90QTvg) _set_csum csum_type crc32c
2018-04-26 08:53:01.511346 7f8b27cb0d00  1 bluestore(/var/lib/ceph/tmp/mnt.90QTvg) mkfs path /var/lib/ceph/tmp/mnt.90QTvg
2018-04-26 08:53:01.511356 7f8b27cb0d00 10 bluestore(/var/lib/ceph/tmp/mnt.90QTvg/block) _read_bdev_label
2018-04-26 08:53:01.511866 7f8b27cb0d00 10 bluestore(/var/lib/ceph/tmp/mnt.90QTvg/block) _read_bdev_label got bdev(osd_uuid 607e79c6-8970-408c-9679-64c4ce3e59bd, size 0x1680000000, btime 2018-04-12 07:25:03.997935, desc main, 7 meta)
2018-04-26 08:53:01.511896 7f8b27cb0d00  1 bluestore(/var/lib/ceph/tmp/mnt.90QTvg) mkfs already created
2018-04-26 08:53:01.511900 7f8b27cb0d00  1 bluestore(/var/lib/ceph/tmp/mnt.90QTvg) _fsck repair (shallow) start
2018-04-26 08:53:01.511954 7f8b27cb0d00  1 bdev create path /var/lib/ceph/tmp/mnt.90QTvg/block type kernel
2018-04-26 08:53:01.511963 7f8b27cb0d00  1 bdev(0x55bd2cfc3e00 /var/lib/ceph/tmp/mnt.90QTvg/block) open path /var/lib/ceph/tmp/mnt.90QTvg/block
2018-04-26 08:53:01.512393 7f8b27cb0d00  1 bdev(0x55bd2cfc3e00 /var/lib/ceph/tmp/mnt.90QTvg/block) open size 96636764160 (0x1680000000, 92160 MB) block_size 4096 (4096 B) rotational
2018-04-26 08:53:01.512408 7f8b27cb0d00 10 bluestore(/var/lib/ceph/tmp/mnt.90QTvg/block) _read_bdev_label
2018-04-26 08:53:01.512524 7f8b27cb0d00 10 bluestore(/var/lib/ceph/tmp/mnt.90QTvg/block) _read_bdev_label got bdev(osd_uuid 607e79c6-8970-408c-9679-64c4ce3e59bd, size 0x1680000000, btime 2018-04-12 07:25:03.997935, desc main, 7 meta)
2018-04-26 08:53:01.512536 7f8b27cb0d00 -1 bluestore(/var/lib/ceph/tmp/mnt.90QTvg/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.90QTvg/block fsid 607e79c6-8970-408c-9679-64c4ce3e59bd does not match our fsid 02da9e3b-b896-441e-b46f-4e76699219b7
2018-04-26 08:53:01.512544 7f8b27cb0d00  1 bdev(0x55bd2cfc3e00 /var/lib/ceph/tmp/mnt.90QTvg/block) close
2018-04-26 08:53:01.770845 7f8b27cb0d00 -1 bluestore(/var/lib/ceph/tmp/mnt.90QTvg) mkfs fsck found fatal error: (5) Input/output error
2018-04-26 08:53:01.770892 7f8b27cb0d00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
2018-04-26 08:53:01.771184 7f8b27cb0d00 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.90QTvg: (5) Input/output error
2018-04-26 08:53:03.392737 7efd11702d00  0 set uid:gid to 167:167 (ceph:ceph)
2018-04-26 08:53:03.392767 7efd11702d00  0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 33524
2018-04-26 08:53:03.393076 7efd11702d00 10 bluestore(/var/lib/ceph/tmp/mnt.BFbBCU) set_cache_shards 1
2018-04-26 08:53:03.396358 7efd11702d00 10 bluestore(/var/lib/ceph/tmp/mnt.BFbBCU) _set_csum csum_type crc32c
2018-04-26 08:53:03.398031 7efd11702d00  1 bluestore(/var/lib/ceph/tmp/mnt.BFbBCU) mkfs path /var/lib/ceph/tmp/mnt.BFbBCU
2018-04-26 08:53:03.398040 7efd11702d00 10 bluestore(/var/lib/ceph/tmp/mnt.BFbBCU/block) _read_bdev_label
2018-04-26 08:53:03.398440 7efd11702d00 10 bluestore(/var/lib/ceph/tmp/mnt.BFbBCU/block) _read_bdev_label got bdev(osd_uuid 607e79c6-8970-408c-9679-64c4ce3e59bd, size 0x1680000000, btime 2018-04-12 07:25:03.997935, desc main, 7 meta)
2018-04-26 08:53:03.398488 7efd11702d00  1 bluestore(/var/lib/ceph/tmp/mnt.BFbBCU) mkfs already created
2018-04-26 08:53:03.398496 7efd11702d00  1 bluestore(/var/lib/ceph/tmp/mnt.BFbBCU) _fsck repair (shallow) start
2018-04-26 08:53:03.398589 7efd11702d00  1 bdev create path /var/lib/ceph/tmp/mnt.BFbBCU/block type kernel
2018-04-26 08:53:03.398605 7efd11702d00  1 bdev(0x55a712649e00 /var/lib/ceph/tmp/mnt.BFbBCU/block) open path /var/lib/ceph/tmp/mnt.BFbBCU/block
2018-04-26 08:53:03.399337 7efd11702d00  1 bdev(0x55a712649e00 /var/lib/ceph/tmp/mnt.BFbBCU/block) open size 96636764160 (0x1680000000, 92160 MB) block_size 4096 (4096 B) rotational
2018-04-26 08:53:03.399364 7efd11702d00 10 bluestore(/var/lib/ceph/tmp/mnt.BFbBCU/block) _read_bdev_label
2018-04-26 08:53:03.399562 7efd11702d00 10 bluestore(/var/lib/ceph/tmp/mnt.BFbBCU/block) _read_bdev_label got bdev(osd_uuid 607e79c6-8970-408c-9679-64c4ce3e59bd, size 0x1680000000, btime 2018-04-12 07:25:03.997935, desc main, 7 meta)
2018-04-26 08:53:03.399583 7efd11702d00 -1 bluestore(/var/lib/ceph/tmp/mnt.BFbBCU/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.BFbBCU/block fsid 607e79c6-8970-408c-9679-64c4ce3e59bd does not match our fsid 02da9e3b-b896-441e-b46f-4e76699219b7
2018-04-26 08:53:03.399598 7efd11702d00  1 bdev(0x55a712649e00 /var/lib/ceph/tmp/mnt.BFbBCU/block) close
2018-04-26 08:53:03.657847 7efd11702d00 -1 bluestore(/var/lib/ceph/tmp/mnt.BFbBCU) mkfs fsck found fatal error: (5) Input/output error
2018-04-26 08:53:03.657886 7efd11702d00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
2018-04-26 08:53:03.658126 7efd11702d00 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.BFbBCU: (5) Input/output error
2018-04-26 08:53:05.227931 7fe9d1706d00  0 set uid:gid to 167:167 (ceph:ceph)
2018-04-26 08:53:05.227971 7fe9d1706d00  0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 33598
2018-04-26 08:53:05.228290 7fe9d1706d00 10 bluestore(/var/lib/ceph/tmp/mnt.svj4hW) set_cache_shards 1
2018-04-26 08:53:05.232601 7fe9d1706d00 10 bluestore(/var/lib/ceph/tmp/mnt.svj4hW) _set_csum csum_type crc32c
2018-04-26 08:53:05.234586 7fe9d1706d00  1 bluestore(/var/lib/ceph/tmp/mnt.svj4hW) mkfs path /var/lib/ceph/tmp/mnt.svj4hW
2018-04-26 08:53:05.234600 7fe9d1706d00 10 bluestore(/var/lib/ceph/tmp/mnt.svj4hW/block) _read_bdev_label
2018-04-26 08:53:05.234973 7fe9d1706d00 10 bluestore(/var/lib/ceph/tmp/mnt.svj4hW/block) _read_bdev_label got bdev(osd_uuid 607e79c6-8970-408c-9679-64c4ce3e59bd, size 0x1680000000, btime 2018-04-12 07:25:03.997935, desc main, 7 meta)
2018-04-26 08:53:05.235017 7fe9d1706d00  1 bluestore(/var/lib/ceph/tmp/mnt.svj4hW) mkfs already created
2018-04-26 08:53:05.235023 7fe9d1706d00  1 bluestore(/var/lib/ceph/tmp/mnt.svj4hW) _fsck repair (shallow) start
2018-04-26 08:53:05.235103 7fe9d1706d00  1 bdev create path /var/lib/ceph/tmp/mnt.svj4hW/block type kernel
2018-04-26 08:53:05.235122 7fe9d1706d00  1 bdev(0x5654f61f9e00 /var/lib/ceph/tmp/mnt.svj4hW/block) open path /var/lib/ceph/tmp/mnt.svj4hW/block
2018-04-26 08:53:05.235527 7fe9d1706d00  1 bdev(0x5654f61f9e00 /var/lib/ceph/tmp/mnt.svj4hW/block) open size 96636764160 (0x1680000000, 92160 MB) block_size 4096 (4096 B) rotational
2018-04-26 08:53:05.235556 7fe9d1706d00 10 bluestore(/var/lib/ceph/tmp/mnt.svj4hW/block) _read_bdev_label
2018-04-26 08:53:05.235712 7fe9d1706d00 10 bluestore(/var/lib/ceph/tmp/mnt.svj4hW/block) _read_bdev_label got bdev(osd_uuid 607e79c6-8970-408c-9679-64c4ce3e59bd, size 0x1680000000, btime 2018-04-12 07:25:03.997935, desc main, 7 meta)
2018-04-26 08:53:05.235732 7fe9d1706d00 -1 bluestore(/var/lib/ceph/tmp/mnt.svj4hW/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.svj4hW/block fsid 607e79c6-8970-408c-9679-64c4ce3e59bd does not match our fsid 02da9e3b-b896-441e-b46f-4e76699219b7
2018-04-26 08:53:05.235748 7fe9d1706d00  1 bdev(0x5654f61f9e00 /var/lib/ceph/tmp/mnt.svj4hW/block) close
2018-04-26 08:53:05.493646 7fe9d1706d00 -1 bluestore(/var/lib/ceph/tmp/mnt.svj4hW) mkfs fsck found fatal error: (5) Input/output error
2018-04-26 08:53:05.493696 7fe9d1706d00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
2018-04-26 08:53:05.493946 7fe9d1706d00 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.svj4hW: (5) Input/output error

This line implies that the device was used by a previous cluster and was not cleaned up:

2018-04-26 08:53:01.512536 7f8b27cb0d00 -1 bluestore(/var/lib/ceph/tmp/mnt.90QTvg/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.90QTvg/block fsid 607e79c6-8970-408c-9679-64c4ce3e59bd does not match our fsid 02da9e3b-b896-441e-b46f-4e76699219b7
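
A hedged way to confirm this on the node is to dump the stale BlueStore label directly. The partition number is an assumption (ceph-disk normally puts the data filesystem on sdb1 and the BlueStore block device on sdb2), and the command is the luminous-era ceph-bluestore-tool:

# show the bdev label left over from the previous cluster
# (assumption: the BlueStore block partition is /dev/sdb2)
sudo ceph-bluestore-tool show-label --dev /dev/sdb2

The osd_uuid printed there (607e79c6-... in this run) is the one from the old deployment; it no longer matches the fsid that the new ceph-disk prepare generated (02da9e3b-...), so the shallow fsck run by mkfs aborts with EIO and the OSD is never created.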

#3 Updated by Vasu Kulkarni almost 6 years ago

Thanks, Alfredo.

This shows that zap is not working correctly. I think we should fix ceph-disk zap to properly clean the BlueStore device, since that is the procedure we document everywhere.

#5 Updated by Vasu Kulkarni almost 6 years ago

Based on the log output, the zap run here is definitely not zeroing the first blocks of the device (see the sketch after the log excerpt below):

2018-04-26T09:49:30.042 INFO:teuthology.orchestra.run.mira006:Running: 'cd /home/ubuntu/cephtest/ceph-deploy && ./ceph-deploy disk zap mira006:sdb'
2018-04-26T09:49:30.255 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ubuntu/.cephdeploy.conf
2018-04-26T09:49:30.255 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ] Invoked (1.5.39): ./ceph-deploy disk zap mira006:sdb
2018-04-26T09:49:30.255 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ] ceph-deploy options:
2018-04-26T09:49:30.255 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ]  username                      : None
2018-04-26T09:49:30.255 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ]  verbose                       : False
2018-04-26T09:49:30.255 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
2018-04-26T09:49:30.255 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ]  subcommand                    : zap
2018-04-26T09:49:30.256 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ]  quiet                         : False
2018-04-26T09:49:30.256 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f3d1b0a5200>
2018-04-26T09:49:30.256 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ]  cluster                       : ceph
2018-04-26T09:49:30.257 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ]  func                          : <function disk at 0x7f3d1b07f578>
2018-04-26T09:49:30.257 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
2018-04-26T09:49:30.257 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ]  default_release               : False
2018-04-26T09:49:30.257 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.cli][INFO  ]  disk                          : [('mira006', '/dev/sdb', None)]
2018-04-26T09:49:30.257 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.osd][DEBUG ] zapping /dev/sdb on mira006
2018-04-26T09:49:30.302 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] connection detected need for sudo
2018-04-26T09:49:30.333 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] connected to host: mira006
2018-04-26T09:49:30.333 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] detect platform information from remote host
2018-04-26T09:49:30.359 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] detect machine type
2018-04-26T09:49:30.365 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] find the location of an executable
2018-04-26T09:49:30.366 INFO:teuthology.orchestra.run.mira006.stderr:[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 16.04 xenial
2018-04-26T09:49:30.366 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] zeroing last few blocks of device
2018-04-26T09:49:30.367 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] find the location of an executable
2018-04-26T09:49:30.370 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][INFO  ] Running command: sudo /usr/sbin/ceph-disk zap /dev/sdb
2018-04-26T09:49:31.906 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] Creating new GPT entries.
2018-04-26T09:49:31.906 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
2018-04-26T09:49:31.906 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] other utilities.
2018-04-26T09:49:32.874 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] Creating new GPT entries.
2018-04-26T09:49:32.874 INFO:teuthology.orchestra.run.mira006.stderr:[mira006][DEBUG ] The operation has completed successfully.
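
Only sgdisk runs here; nothing overwrites the BlueStore label sitting at the start of the old block partition, so the next prepare finds it again. As a hedged manual workaround (and roughly what a fixed zap would need to do), zero the head of every existing partition before destroying the table; the device name and sizes below are illustrative:

# wipe the BlueStore bdev label at the start of each existing partition
# (the label lives in the first 4 KB, so a few MB of zeros is more than enough)
for part in /dev/sdb?; do
    sudo dd if=/dev/zero of="$part" bs=1M count=10 oflag=direct
done

# then destroy the GPT (primary and backup) and re-read the partition table
sudo sgdisk --zap-all /dev/sdb
sudo partprobe /dev/sdb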

#6 Updated by Sage Weil almost 3 years ago

  • Project changed from Ceph to RADOS

#7 Updated by Neha Ojha about 2 years ago

  • Status changed from New to Won't Fix - EOL
