Bug #5345


ceph-disk: handle less common device names

Added by Sage Weil almost 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Severity: 3 - minor

Description

/dev/sdaa*
/dev/cciss/c0d0p1
etc.
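The two naming quirks at play can be sketched as a hypothetical helper (illustrative only, not the actual ceph-disk code): multi-letter disks like sdaa take plain numeric suffixes (sdaa1), while disks whose base name ends in a digit, such as cciss/c0d0, separate the partition number with a 'p' (cciss/c0d0p1).

```python
import re

def split_partition(name):
    """Split a kernel partition name into (disk, partition number).

    Illustrative sketch only: handles letter-suffixed disks
    (sda1, sdaa12) and digit-terminated disks that use a 'p'
    separator (cciss/c0d0p1).
    """
    # Disk name ends in a digit -> a literal 'p' precedes the number.
    m = re.match(r'^(.*\d)p(\d+)$', name)
    if m is None:
        # Otherwise the number is appended directly after a non-digit.
        m = re.match(r'^(.*\D)(\d+)$', name)
    if m is None:
        raise ValueError('not a partition name: %s' % name)
    return m.group(1), int(m.group(2))
```

A parser that assumes the sda1 pattern will mis-split names in the second family, which is the class of bugs this ticket covers.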


Related issues: 1 (0 open, 1 closed)

Has duplicate: devops - Bug #5734: ceph-deploy osd prepare <part1>:<part2> fails, tries to look up /sys/block/<part1> (Duplicate, added 07/23/2013)

Actions #1

Updated by Sage Weil almost 11 years ago

  • Assignee set to Sage Weil
Actions #2

Updated by Sage Weil almost 11 years ago

  • Status changed from New to Need More Info
Actions #3

Updated by Sage Weil almost 11 years ago

  • Priority changed from Urgent to High
Actions #4

Updated by Tomas Lovato almost 11 years ago

I have several HP dl180's with p400 raid controllers. This is all standard hardware.

The disk paths are enumerated as /dev/cciss/c0d0, with partitions being p0, p1, p2, etc.

The error occurs when I try an osd prepare. I can't proceed.

ceph-deploy osd prepare ceph01:/dev/cciss/c0d0p3:/dev/cciss/c0d0p2
ceph-disk-prepare -- /dev/cciss/c0d0p3 /dev/cciss/c0d0p2 returned 1
Information: Moved requested sector from 4194338 to 4196352 in
order to align on 2048-sector boundaries.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.

WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
Warning: WARNING: the kernel failed to re-read the partition table on /dev/cciss/c0d0p2 (Invalid argument). As a result, it may not reflect all of your changes until after reboot.
ceph-disk: Error: Command '['partprobe', '/dev/cciss/c0d0p2']' returned non-zero exit status 1

ceph-deploy: Failed to create 1 OSDs

Actions #5

Updated by Sage Weil almost 11 years ago

can you please try the version of ceph-disk in the wip-ceph-disk branch? it has a bunch of changes to be smarter about using the basename, but I don't have a system with that kind of driver to test it against.

Actions #6

Updated by Jing Yuan Luke almost 11 years ago

I had a similar problem to Tomas's, but mine are HP Blades (more specifically BL460 G1) with P200i controllers. I suspect this only affects certain HP products; the newer so-called G7 or G8 models don't seem to display this behaviour.

Actually, I should have provided the following information earlier, but the activation email was somehow blocked by my corporate email server, so this is my second attempt, using a login with a different email account.

Anyway, some information for your reference (this was run from one of the OSD hosts after I manually changed ceph-disk myself):
find /dev/disk -ls
1397 0 drwxr-xr-x 7 root root 140 Jun 12 13:06 /dev/disk
20986 0 drwxr-xr-x 2 root root 80 Jun 12 13:06 /dev/disk/by-partlabel
22167 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partlabel/ceph\x20data -> ../../cciss/c0d1p1
20987 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partlabel/ceph\x20journal -> ../../cciss/c0d1p2
1429 0 drwxr-xr-x 2 root root 120 Jun 12 13:06 /dev/disk/by-uuid
22175 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-uuid/770be2c4-1961-4b32-8082-68f73ded5145 -> ../../cciss/c0d1p1
8989 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-uuid/754906aa-ec5c-4ecb-985c-500c60f564b0 -> ../../cciss/c0d0p2
1463 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-uuid/831417b6-493c-488f-bace-05f4a6f9be2d -> ../../cciss/c0d0p4
1430 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-uuid/72451fa4-ce32-487d-b630-aafd64dba78f -> ../../cciss/c0d0p3
1422 0 drwxr-xr-x 2 root root 140 Jun 12 13:06 /dev/disk/by-partuuid
22170 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partuuid/a520444d-28f5-4d7e-8c8c-24ce61aaddc1 -> ../../cciss/c0d1p1
20990 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partuuid/2c95b241-c514-4d42-a176-4306bef58e38 -> ../../cciss/c0d1p2
8983 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-partuuid/bbbe29b9-22c7-40d4-bfda-beb27f4a465a -> ../../cciss/c0d0p2
1457 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-partuuid/f6ccc910-f761-4e64-a189-33cf0a7994c9 -> ../../cciss/c0d0p4
1423 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-partuuid/9cdf604b-f075-4cc5-bb8e-583fb6154943 -> ../../cciss/c0d0p3
1405 0 drwxr-xr-x 2 root root 140 Jun 12 13:06 /dev/disk/by-path
22173 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-path/pci-0000:0b:08.0-part1 -> ../../cciss/c0d1p1
20993 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-path/pci-0000:0b:08.0-part2 -> ../../cciss/c0d1p2
20724 0 lrwxrwxrwx 1 root root 16 Jun 12 13:06 /dev/disk/by-path/pci-0000:0b:08.0 -> ../../cciss/c0d1
1460 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-path/pci-0000:0b:08.0-part4 -> ../../cciss/c0d0p4
1426 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-path/pci-0000:0b:08.0-part3 -> ../../cciss/c0d0p3
1398 0 drwxr-xr-x 2 root root 240 Jun 12 13:06 /dev/disk/by-id
22165 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part1 -> ../../cciss/c0d1p1
22163 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part1 -> ../../cciss/c0d1p1
20984 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part2 -> ../../cciss/c0d1p2
20982 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part2 -> ../../cciss/c0d1p2
20723 0 lrwxrwxrwx 1 root root 16 Jun 12 13:06 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000 -> ../../cciss/c0d1
20722 0 lrwxrwxrwx 1 root root 16 Jun 12 13:06 /dev/disk/by-id/cciss-3600508b1001037373620202020200000 -> ../../cciss/c0d1
1454 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part4 -> ../../cciss/c0d0p4
1451 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part4 -> ../../cciss/c0d0p4
1419 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part3 -> ../../cciss/c0d0p3
1416 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part3 -> ../../cciss/c0d0p3

find /dev/cciss -ls
7500 0 drwxr-xr-x 2 root root 200 Jun 12 13:06 /dev/cciss
18074 0 brw-rw---- 1 root disk Jul 2 15:33 /dev/cciss/c0d1p2
18073 0 brw-rw---- 1 root disk Jun 12 13:06 /dev/cciss/c0d1p1
18073 0 brw-rw---- 1 root disk Jun 12 13:06 /dev/cciss/c0d1p1
13426 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p1
7508 0 brw-rw---- 1 root disk Jun 12 13:06 /dev/cciss/c0d1
7505 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p4
7504 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p3
7503 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p2
7501 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0

Actions #7

Updated by Sage Weil almost 11 years ago

Hi Jing,

As far as I can tell the current ceph-disk supports these device names, but as I mentioned I don't have a system to test with. Can you pull the latest from https://raw.github.com/ceph/ceph/master/src/ceph-disk, put it in /usr/sbin, and see if everything works?

If not, my first guess is that /dev/cciss/$foo doesn't appear at /sys/block/$foo; can you check? Thanks!

Actions #8

Updated by Jing Yuan Luke almost 11 years ago

Hi Sage,

I just tried the ceph-disk as per your suggestion; however, I got the following error:

ceph-deploy osd prepare xxx:cciss/c0d1
ceph-disk-prepare -- /dev/cciss/c0d1 returned 1

Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 2295, in <module>
    main()
  File "/usr/sbin/ceph-disk", line 2284, in main
    args.func(args)
  File "/usr/sbin/ceph-disk", line 1116, in main_prepare
    verify_not_in_use(args.data)
  File "/usr/sbin/ceph-disk", line 323, in verify_not_in_use
    for partition in list_partitions(dev):
  File "/usr/sbin/ceph-disk", line 233, in list_partitions
    for name in os.listdir(os.path.join('/sys/block', base)):
OSError: [Errno 2] No such file or directory: '/sys/block/c0d1'

ceph-deploy: Failed to create 1 OSDs

Going through the /sys/block I found the following:
find /sys/block -ls
2651 0 drwxr-xr-x 2 root root 0 Jun 11 18:17 /sys/block
12289 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram0 -> ../devices/virtual/block/ram0
12364 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram1 -> ../devices/virtual/block/ram1
12439 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram2 -> ../devices/virtual/block/ram2
12514 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram3 -> ../devices/virtual/block/ram3
12589 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram4 -> ../devices/virtual/block/ram4
12664 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram5 -> ../devices/virtual/block/ram5
12739 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram6 -> ../devices/virtual/block/ram6
12814 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram7 -> ../devices/virtual/block/ram7
12889 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram8 -> ../devices/virtual/block/ram8
12964 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram9 -> ../devices/virtual/block/ram9
13039 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram10 -> ../devices/virtual/block/ram10
13114 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram11 -> ../devices/virtual/block/ram11
13189 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram12 -> ../devices/virtual/block/ram12
13264 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram13 -> ../devices/virtual/block/ram13
13339 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram14 -> ../devices/virtual/block/ram14
13414 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram15 -> ../devices/virtual/block/ram15
13505 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop0 -> ../devices/virtual/block/loop0
13580 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop1 -> ../devices/virtual/block/loop1
13655 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop2 -> ../devices/virtual/block/loop2
13730 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop3 -> ../devices/virtual/block/loop3
13805 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop4 -> ../devices/virtual/block/loop4
13880 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop5 -> ../devices/virtual/block/loop5
13955 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop6 -> ../devices/virtual/block/loop6
14030 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop7 -> ../devices/virtual/block/loop7
16402 0 lrwxrwxrwx 1 root root 0 Jun 11 18:17 /sys/block/cciss!c0d0 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d0/block/cciss!c0d0
16636 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/cciss!c0d1 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d1/block/cciss!c0d1

Thanks.
Luke

Actions #9

Updated by Sage Weil almost 11 years ago

  • Status changed from Need More Info to In Progress
  • Priority changed from High to Urgent

Thanks, Luke--that was exactly the info I needed!

Actions #10

Updated by Sage Weil almost 11 years ago

  • Status changed from In Progress to Fix Under Review
Actions #11

Updated by Sage Weil almost 11 years ago

Hi Luke, Tomas,

Are you able to test the latest version in this branch? https://raw.github.com/ceph/ceph/wip-ceph-disk/src/ceph-disk

Thanks!

Actions #12

Updated by Jing Yuan Luke almost 11 years ago

Hi Sage,

I had the following error:

root@yyy:~/ceph-configure# ceph-deploy -v osd prepare xxx:cciss/c0d1
Preparing cluster ceph disks xxx:/dev/cciss/c0d1:
Deploying osd to xxx
Host xxx is now ready for osd use.
Preparing host xxx disk /dev/cciss/c0d1 journal None activate False
ceph-disk-prepare -- /dev/cciss/c0d1 returned 1

ceph-disk: Error: not a disk or partition: /dev/cciss/c0d1

ceph-deploy: Failed to create 1 OSDs

I tested the same script on 2 separate servers and was able to replicate the same error on both. Also, no partitions were created.

Regards,
Luke

Actions #13

Updated by Sage Weil almost 11 years ago

Jing Yuan Luke wrote:

> ceph-disk-prepare -- /dev/cciss/c0d1 returned 1
> ceph-disk: Error: not a disk or partition: /dev/cciss/c0d1

Hi Luke,

I pushed a version to the same branch (same URL) that should print a line like 'dev ... name is ...' if you run 'ceph-disk -v /dev/cciss/c0d1'. Can you try it? And then verify whether /sys/block/$name is present? It looks like it is doing a simple s/\//!/ on the path relative to /dev, but I may be wrong.

Thanks!
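The translation described above (the kernel exposing /dev/cciss/c0d1 under /sys/block with '/' replaced by '!') can be sketched as follows; the helper name is hypothetical and not taken from ceph-disk itself:

```python
import os

def sysfs_block_path(dev):
    """Map a /dev block device path to its /sys/block entry.

    Assumes the kernel convention of replacing '/' in the device's
    name (relative to /dev) with '!', e.g. /dev/cciss/c0d1 appears
    as /sys/block/cciss!c0d1.
    """
    name = os.path.relpath(dev, '/dev')          # e.g. 'cciss/c0d1'
    return os.path.join('/sys/block', name.replace('/', '!'))
```

This matches the find output above, where /sys/block contains cciss!c0d0 and cciss!c0d1 rather than a cciss/ subdirectory.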

Actions #14

Updated by Jing Yuan Luke almost 11 years ago

Hi Sage,

Here is what I got:

ceph-disk -v prepare /dev/cciss/c0d1
DEBUG:ceph-disk:dev /dev/cciss/c0d1 name is cciss/c0d1
ceph-disk: Error: not a disk or partition: /dev/cciss/c0d1

Checking /sys/block:

find /sys/block/cciss* -ls
16402 0 lrwxrwxrwx 1 root root 0 Jun 5 18:51 /sys/block/cciss!c0d0 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d0/block/cciss!c0d0
16636 0 lrwxrwxrwx 1 root root 0 Jul 10 09:15 /sys/block/cciss!c0d1 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d1/block/cciss!c0d1

Regards,
Luke

Actions #15

Updated by Sage Weil almost 11 years ago

Aha, I see the problem. Pushed a fix; can you see if it works now?

(thanks!)

Actions #16

Updated by Jing Yuan Luke almost 11 years ago

Hi Sage,

I think there is a typo in line 327:

Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 2307, in <module>
    main()
  File "/usr/sbin/ceph-disk", line 2296, in main
    args.func(args)
  File "/usr/sbin/ceph-disk", line 1128, in main_prepare
    verify_not_in_use(args.data)
  File "/usr/sbin/ceph-disk", line 327, in verify_not_in_use
    basename = get_dev_name(os.realpath(dev))
AttributeError: 'module' object has no attribute 'realpath'

Anyway, after I changed it to basename = get_dev_name(os.path.realpath(dev)), I managed to get it running again until I hit the following error:

/dev/cciss/c0d11: No such file or directory
Usage: mkfs.xfs
< some long message from mkfs.xfs >
ceph-disk: Error: Command '['mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/cciss/c0d11']' returned non-zero exit status 1

ceph-deploy: Failed to create 1 OSDs

I believe the correct one should be /dev/cciss/c0d1p1; here is the output of my /proc/partitions:
cat /proc/partitions
major minor  #blocks  name

 104     0  244163520 cciss/c0d0
 104     1       1024 cciss/c0d0p1
 104     2     195584 cciss/c0d0p2
 104     3  235966464 cciss/c0d0p3
 104     4    7998464 cciss/c0d0p4
 104    16  244163520 cciss/c0d1
 104    17  243113903 cciss/c0d1p1
 104    18    1047552 cciss/c0d1p2

Basically, the partitions are correctly created but not formatted.
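The bad name /dev/cciss/c0d11 above comes from naively concatenating the disk path and the partition number; per the kernel naming rule, a disk name ending in a digit needs a 'p' separator. A minimal sketch of the correct join (hypothetical helper, not the ceph-disk code):

```python
def partition_dev(disk, num):
    """Build a partition device path from a disk path and number.

    Sketch of the kernel naming rule only: a disk whose name ends
    in a digit (e.g. /dev/cciss/c0d1) gets a 'p' before the
    partition number; others (e.g. /dev/sda) get the number
    appended directly.
    """
    sep = 'p' if disk[-1].isdigit() else ''
    return '%s%s%d' % (disk, sep, num)
```

Without the digit check, disk /dev/cciss/c0d1 plus partition 1 yields the nonexistent /dev/cciss/c0d11, which is exactly the mkfs failure reported here.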

Regards,
Luke

Actions #17

Updated by Sage Weil almost 11 years ago

OK, fixed the typo and redid the partition naming code. Try again?

thanks!

Actions #18

Updated by Jing Yuan Luke almost 11 years ago

Hi Sage,

The code got through without any error, but I think prepare_journal or something related failed to set up the journal properly, and the OSD daemon failed (ceph osd tree shows the host as being down and out). Below are some observations:

From ceph.log after ceph-deploy:
2013-07-12 09:49:56,293 ceph_deploy.osd DEBUG Preparing cluster ceph disks xxxx:/dev/cciss/c0d1:
2013-07-12 09:49:56,527 ceph_deploy.osd DEBUG Deploying osd to xxxx
2013-07-12 09:49:56,950 ceph_deploy.osd DEBUG Host xxxx is now ready for osd use.
2013-07-12 09:49:56,950 ceph_deploy.osd DEBUG Preparing host xxxx disk /dev/cciss/c0d1 journal None activate False
2013-07-12 09:50:50,791 ceph_deploy.osd DEBUG Activating cluster ceph disks xxxx:/dev/cciss/c0d1:
2013-07-12 09:50:51,025 ceph_deploy.osd DEBUG Activating host xxxx disk /dev/cciss/c0d1
2013-07-12 09:50:51,157 ceph_deploy.osd DEBUG Distro Ubuntu codename precise, will use upstart

From ceph osd tree:

# id  weight  type name       up/down  reweight
-1    1.88    root default
-2    0.23      host aaa
0     0.23        osd.0       up       1
-3    0.23      host bbb
1     0.23        osd.1       up       1
-4    0.23      host ccc
2     0.23        osd.2       up       1
-5    0.5       host ddd
3     0.5         osd.3       up       1
-6    0.23      host eee
4     0.23        osd.4       up       1
-7    0.23      host fff
5     0.23        osd.5       up       1
-8    0.23      host xxxx
6     0.23        osd.6       down     0

From the host where I tried to prepare the OSD:

parted /dev/cciss/c0d1 p
Model: Compaq Smart Array (cpqarray)
Disk /dev/cciss/c0d1: 250GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start End Size File system Name Flags
2 1049kB 1074MB 1073MB ceph journal
1 1075MB 250GB 249GB xfs ceph data

Then /proc/partitions (here c0d1p2, aka the journal, is missing):
cat /proc/partitions
major minor  #blocks  name

 104     0  244163520 cciss/c0d0
 104     1       1024 cciss/c0d0p1
 104     2     195584 cciss/c0d0p2
 104     3  235966464 cciss/c0d0p3
 104     4    7998464 cciss/c0d0p4
 104    16  244163520 cciss/c0d1
 104    17  243113903 cciss/c0d1p1

The mount command showed the data is indeed mounted:
mount | grep ceph
/dev/cciss/c0d1p1 on /var/lib/ceph/osd/ceph-6 type xfs (rw)

Looking into the mount, I found that the journal link's target is missing (not showing in /dev/disk/by-partuuid):

ls -l /var/lib/ceph/osd/ceph-6/
total 40
-rw-r--r-- 1 root root 490 Jul 12 09:49 activate.monmap
-rw-r--r-- 1 root root   3 Jul 12 09:49 active
-rw-r--r-- 1 root root  37 Jul 12 09:49 ceph_fsid
drwxr-xr-x 4 root root  61 Jul 12 09:49 current
-rw-r--r-- 1 root root  37 Jul 12 09:49 fsid
lrwxrwxrwx 1 root root  58 Jul 12 09:49 journal -> /dev/disk/by-partuuid/49c86d49-56fd-41af-a4a8-a7bd1de5e7a7
-rw-r--r-- 1 root root  37 Jul 12 09:49 journal_uuid
-rw------- 1 root root  56 Jul 12 09:49 keyring
-rw-r--r-- 1 root root  21 Jul 12 09:49 magic
-rw-r--r-- 1 root root   6 Jul 12 09:49 ready
-rw-r--r-- 1 root root   4 Jul 12 09:49 store_version
-rw-r--r-- 1 root root   0 Jul 12 09:49 upstart
-rw-r--r-- 1 root root   2 Jul 12 09:49 whoami

ls -l /dev/disk/by-partuuid/
total 0
lrwxrwxrwx 1 root root 18 Jul 12 09:48 24b8f94f-05ac-4440-aee2-65fc4874c8ff -> ../../cciss/c0d0p4
lrwxrwxrwx 1 root root 18 Jul 12 09:49 41088959-ef5c-4588-9d37-0e447a63db0c -> ../../cciss/c0d1p1
lrwxrwxrwx 1 root root 18 Jul 12 09:48 87657540-58ec-4ab8-8cef-ba4909e9f201 -> ../../cciss/c0d0p2
lrwxrwxrwx 1 root root 18 Jul 12 09:48 e083f517-8a32-4829-be99-7ceb5cbf6f9b -> ../../cciss/c0d0p3

From the ceph-osd.0.log:
2013-07-12 09:48:42.832776 7f3ec6dd5780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17205
2013-07-12 09:48:42.894943 7f3ec6dd5780 1 journal _open /dev/cciss/c0d1p2 fd 5: 1072693248 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-07-12 09:48:42.895731 7f3ec6dd5780 -1 journal read_header error decoding journal header
2013-07-12 09:48:44.492938 7f7ea4469780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17227
2013-07-12 09:48:44.555216 7f7ea4469780 1 journal _open /dev/cciss/c0d1p2 fd 5: 1072693248 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-07-12 09:48:44.555782 7f7ea4469780 -1 journal read_header error decoding journal header
2013-07-12 09:49:05.530703 7f7d3197f780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17261
2013-07-12 09:49:05.592673 7f7d3197f780 1 journal _open /dev/cciss/c0d1p2 fd 4: 1072693248 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-07-12 09:49:05.593157 7f7d3197f780 -1 journal read_header error decoding journal header

And lastly ceph-osd.6.log:
2013-07-12 09:49:06.227090 7f8632fdf780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17299
2013-07-12 09:49:06.230161 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkfs in /var/lib/ceph/tmp/mnt.WImxPL
2013-07-12 09:49:06.230203 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkfs fsid is already set to 41088959-ef5c-4588-9d37-0e447a63db0c
2013-07-12 09:49:06.360547 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) leveldb db exists/created
2013-07-12 09:49:06.422731 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 10: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:06.423208 7f8632fdf780 -1 journal read_header error decoding journal header
2013-07-12 09:49:06.487265 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 10: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:06.523450 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkjournal created journal on /var/lib/ceph/tmp/mnt.WImxPL/journal
2013-07-12 09:49:06.523494 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkfs done in /var/lib/ceph/tmp/mnt.WImxPL
2013-07-12 09:49:06.671646 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount FIEMAP ioctl is supported and appears to work
2013-07-12 09:49:06.671699 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-07-12 09:49:06.672100 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount did NOT detect btrfs
2013-07-12 09:49:06.894047 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-07-12 09:49:06.894121 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount found snaps <>
2013-07-12 09:49:06.961018 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-07-12 09:49:07.023186 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 16: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:07.087738 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 16: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:07.088183 7f8632fdf780 -1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2013-07-12 09:49:07.505819 7f8632fdf780 1 journal close /var/lib/ceph/tmp/mnt.WImxPL/journal
2013-07-12 09:49:07.506224 7f8632fdf780 -1 created object store /var/lib/ceph/tmp/mnt.WImxPL journal /var/lib/ceph/tmp/mnt.WImxPL/journal for osd.6 fsid da54c1ec-86a9-4bd8-b8bc-31f870c744ed
2013-07-12 09:49:07.506276 7f8632fdf780 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.WImxPL/keyring: can't open /var/lib/ceph/tmp/mnt.WImxPL/keyring: (2) No such file or directory
2013-07-12 09:49:07.506376 7f8632fdf780 -1 created new key in keyring /var/lib/ceph/tmp/mnt.WImxPL/keyring

I am not sure if this is correct, but from somewhere I read that this may be due to a udev issue?

Regards,
Luke

Actions #19

Updated by Sage Weil almost 11 years ago

What is strange is that parted showed 2 partitions but cat /proc/partitions only showed 1. Is it still in that state? Can you do 'partprobe /dev/cciss/c0d1' and then cat /proc/partitions again and see if that makes it show both partitions?

In general, when testing this, the most helpful output is from running ceph-disk directly with -v, e.g.

ceph-disk -v zap /dev/cciss/c0d1
ceph-disk -v prepare /dev/cciss/c0d1
Actions #20

Updated by Sage Weil almost 11 years ago

Hi Luke- have you had a chance to see if a 'partprobe /dev/cciss/c0d1' makes the journal partition appear?

Actions #21

Updated by Jing Yuan Luke almost 11 years ago

Hi Sage,

I ran partprobe, and after probably 10 minutes I am still not seeing the second partition in /proc/partitions.

Regards,
Luke

Actions #22

Updated by Sage Weil almost 11 years ago

It sounds like the partition table didn't actually get written, or there is some other problem between your kernel and parted. :/ The ceph-disk parts seem to be behaving properly, at least, so I'll merge them in.

Luke, how reproducible is this? Does it happen every time? Even after you, say, ceph-deploy zap host:disk to blow away the old partition table and try it again?

Actions #23

Updated by Sage Weil almost 11 years ago

  • Priority changed from Urgent to High
Actions #24

Updated by Sage Weil almost 11 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Priority changed from High to Urgent
Actions #25

Updated by Jing Yuan Luke almost 11 years ago

Hi Sage,

I tried the methods you suggested (zap and prepare) on 2 identical servers with the same controller. Both show the same problem despite doing partprobe twice 30 minutes apart.

However, when I rebooted both servers, the partition table somehow got updated, and now the OSDs are up and operational. I am not sure if it's a kernel issue; I have Ubuntu Precise installed with the standard 3.2.0 kernel.

Regards,
Luke

Actions #26

Updated by Sage Weil almost 11 years ago

Jing Yuan Luke wrote:

> Both show the same problem despite doing partprobe twice 30 minutes apart. However when I rebooted both servers, the partition table somehow got updated and now the OSDs are up and operational.

Oh- Is one of the other partitions on the disk mounted at the time you run partprobe? I believe that can prevent a refresh.
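The mounted-partition pitfall mentioned above can be sketched as a pre-flight check: scan /proc/mounts for partitions of the disk before running partprobe (or zap). The helper name is hypothetical and the matching deliberately simplified:

```python
def mounted_partitions(disk, mounts_text):
    """Return mounted device paths belonging to `disk`.

    `mounts_text` is the content of /proc/mounts; this simplified
    check matches any device field starting with the disk path.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(disk):
            hits.append(fields[0])
    return hits
```

In practice one would pass in open('/proc/mounts').read() and refuse to repartition while the returned list is non-empty, since the kernel may decline to re-read the table on a disk with a mounted partition.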

Actions #27

Updated by Jing Yuan Luke almost 11 years ago

Hi Sage,

I think the first partition (data) was mounted. I should have noted that even after running zap, parted still showed the partitions (probably there is a need to check for mounts before zap as well?).

Regards,
Luke

Actions #28

Updated by Sage Weil almost 11 years ago

  • Status changed from Pending Backport to Resolved