Bug #5345
ceph-disk: handle less common device names (Closed)
Description
/dev/sdaa*
/dev/cciss/c0d0p1
etc.
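For context: the disk/partition split differs between these naming schemes, and a whole cciss disk such as c0d0 ends in a digit just as a partition does, so a purely lexical split is not enough. A rough Python sketch of the lexical part (illustrative only, not the actual ceph-disk code):

import re

def split_partition(path):
    name = path[len('/dev/'):]
    # cciss style: base ends in a digit, then 'p' + partition number
    m = re.match(r'^(.+\d)p(\d+)$', name)
    if m:
        return m.group(1), int(m.group(2))
    # sdXN style: base ends in a letter, then the partition number.
    # NOTE: a whole cciss disk like 'cciss/c0d0' would also match here,
    # so the code has to consult /sys/block rather than rely on names alone.
    m = re.match(r'^(.+\D)(\d+)$', name)
    if m:
        return m.group(1), int(m.group(2))
    return name, None

assert split_partition('/dev/sdaa1') == ('sdaa', 1)
assert split_partition('/dev/cciss/c0d0p1') == ('cciss/c0d0', 1)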
Updated by Sage Weil almost 11 years ago
- Status changed from New to Need More Info
Updated by Tomas Lovato almost 11 years ago
I have several HP DL180s with P400 RAID controllers. This is all standard hardware.
The disk paths are enumerated as /dev/cciss/c0d0, with partitions being p0, p1, p2, etc.
The error occurs when I try an osd prepare. I can't proceed.
- ceph-deploy osd prepare ceph01:/dev/cciss/c0d0p3:/dev/cciss/c0d0p2
ceph-disk-prepare -- /dev/cciss/c0d0p3 /dev/cciss/c0d0p2 returned 1
Information: Moved requested sector from 4194338 to 4196352 in
order to align on 2048-sector boundaries.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
Warning: WARNING: the kernel failed to re-read the partition table on /dev/cciss/c0d0p2 (Invalid argument). As a result, it may not reflect all of your changes until after reboot.
ceph-disk: Error: Command '['partprobe', '/dev/cciss/c0d0p2']' returned non-zero exit status 1
ceph-deploy: Failed to create 1 OSDs
Updated by Sage Weil almost 11 years ago
can you please try the version of ceph-disk in the wip-ceph-disk branch? it has a bunch of changes to be smarter about using the basename, but I don't have a system with that kind of driver to test it against.
Updated by Jing Yuan Luke almost 11 years ago
I have a similar problem to Tomas's, but mine are HP blades (more specifically BL460 G1) with P200i controllers. I suspect this only affects certain HP products; the newer so-called G7 or G8 models don't seem to display this behaviour.
Actually, I should have provided the following information earlier, but the activation email was somehow blocked by my corporate email server, so this is my second attempt, using a login with a different email account.
Anyway, some information for your reference (this was run from one of the OSD hosts after I manually changed ceph-disk myself):
find /dev/disk -ls
1397 0 drwxr-xr-x 7 root root 140 Jun 12 13:06 /dev/disk
20986 0 drwxr-xr-x 2 root root 80 Jun 12 13:06 /dev/disk/by-partlabel
22167 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partlabel/ceph\x20data -> ../../cciss/c0d1p1
20987 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partlabel/ceph\x20journal -> ../../cciss/c0d1p2
1429 0 drwxr-xr-x 2 root root 120 Jun 12 13:06 /dev/disk/by-uuid
22175 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-uuid/770be2c4-1961-4b32-8082-68f73ded5145 -> ../../cciss/c0d1p1
8989 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-uuid/754906aa-ec5c-4ecb-985c-500c60f564b0 -> ../../cciss/c0d0p2
1463 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-uuid/831417b6-493c-488f-bace-05f4a6f9be2d -> ../../cciss/c0d0p4
1430 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-uuid/72451fa4-ce32-487d-b630-aafd64dba78f -> ../../cciss/c0d0p3
1422 0 drwxr-xr-x 2 root root 140 Jun 12 13:06 /dev/disk/by-partuuid
22170 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partuuid/a520444d-28f5-4d7e-8c8c-24ce61aaddc1 -> ../../cciss/c0d1p1
20990 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partuuid/2c95b241-c514-4d42-a176-4306bef58e38 -> ../../cciss/c0d1p2
8983 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-partuuid/bbbe29b9-22c7-40d4-bfda-beb27f4a465a -> ../../cciss/c0d0p2
1457 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-partuuid/f6ccc910-f761-4e64-a189-33cf0a7994c9 -> ../../cciss/c0d0p4
1423 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-partuuid/9cdf604b-f075-4cc5-bb8e-583fb6154943 -> ../../cciss/c0d0p3
1405 0 drwxr-xr-x 2 root root 140 Jun 12 13:06 /dev/disk/by-path
22173 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-path/pci-0000:0b:08.0-part1 -> ../../cciss/c0d1p1
20993 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-path/pci-0000:0b:08.0-part2 -> ../../cciss/c0d1p2
20724 0 lrwxrwxrwx 1 root root 16 Jun 12 13:06 /dev/disk/by-path/pci-0000:0b:08.0 -> ../../cciss/c0d1
1460 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-path/pci-0000:0b:08.0-part4 -> ../../cciss/c0d0p4
1426 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-path/pci-0000:0b:08.0-part3 -> ../../cciss/c0d0p3
1398 0 drwxr-xr-x 2 root root 240 Jun 12 13:06 /dev/disk/by-id
22165 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part1 -> ../../cciss/c0d1p1
22163 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part1 -> ../../cciss/c0d1p1
20984 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part2 -> ../../cciss/c0d1p2
20982 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part2 -> ../../cciss/c0d1p2
20723 0 lrwxrwxrwx 1 root root 16 Jun 12 13:06 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000 -> ../../cciss/c0d1
20722 0 lrwxrwxrwx 1 root root 16 Jun 12 13:06 /dev/disk/by-id/cciss-3600508b1001037373620202020200000 -> ../../cciss/c0d1
1454 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part4 -> ../../cciss/c0d0p4
1451 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part4 -> ../../cciss/c0d0p4
1419 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part3 -> ../../cciss/c0d0p3
1416 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part3 -> ../../cciss/c0d0p3
find /dev/cciss -ls
7500 0 drwxr-xr-x 2 root root 200 Jun 12 13:06 /dev/cciss
18074 0 brw-rw---- 1 root disk Jul 2 15:33 /dev/cciss/c0d1p2
18073 0 brw-rw---- 1 root disk Jun 12 13:06 /dev/cciss/c0d1p1
13426 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p1
7508 0 brw-rw---- 1 root disk Jun 12 13:06 /dev/cciss/c0d1
7505 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p4
7504 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p3
7503 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p2
7501 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0
Updated by Sage Weil almost 11 years ago
Hi Jing,
As far as I can tell the current ceph-disk supports these device names, but as I mentioned I don't have a system to test with. Can you pull the latest from https://raw.github.com/ceph/ceph/master/src/ceph-disk, put it in /usr/sbin, and see if everything works?
If not, my first guess is that /dev/cciss/$foo doesn't appear at /sys/block/$foo; can you check? Thanks!
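A quick way to run that check (a sketch; the device path is the one from this report):

import os

dev = '/dev/cciss/c0d0'
name = dev[len('/dev/'):]                        # 'cciss/c0d0'
print(os.path.exists('/sys/block/' + name))      # literal relative name
# the kernel replaces '/' with '!' in sysfs block device names:
print(os.path.exists('/sys/block/' + name.replace('/', '!')))  # 'cciss!c0d0'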
Updated by Jing Yuan Luke almost 11 years ago
Hi Sage,
Just tried the ceph-disk as per your suggestion; however, I found the following error:
ceph-deploy osd prepare xxx:cciss/c0d1
ceph-disk-prepare -- /dev/cciss/c0d1 returned 1
Traceback (most recent call last):
File "/usr/sbin/ceph-disk", line 2295, in <module>
main()
File "/usr/sbin/ceph-disk", line 2284, in main
args.func(args)
File "/usr/sbin/ceph-disk", line 1116, in main_prepare
verify_not_in_use(args.data)
File "/usr/sbin/ceph-disk", line 323, in verify_not_in_use
for partition in list_partitions(dev):
File "/usr/sbin/ceph-disk", line 233, in list_partitions
for name in os.listdir(os.path.join('/sys/block', base)):
OSError: [Errno 2] No such file or directory: '/sys/block/c0d1'
ceph-deploy: Failed to create 1 OSDs
Going through /sys/block, I found the following:
find /sys/block -ls
2651 0 drwxr-xr-x 2 root root 0 Jun 11 18:17 /sys/block
12289 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram0 -> ../devices/virtual/block/ram0
12364 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram1 -> ../devices/virtual/block/ram1
12439 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram2 -> ../devices/virtual/block/ram2
12514 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram3 -> ../devices/virtual/block/ram3
12589 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram4 -> ../devices/virtual/block/ram4
12664 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram5 -> ../devices/virtual/block/ram5
12739 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram6 -> ../devices/virtual/block/ram6
12814 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram7 -> ../devices/virtual/block/ram7
12889 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram8 -> ../devices/virtual/block/ram8
12964 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram9 -> ../devices/virtual/block/ram9
13039 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram10 -> ../devices/virtual/block/ram10
13114 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram11 -> ../devices/virtual/block/ram11
13189 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram12 -> ../devices/virtual/block/ram12
13264 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram13 -> ../devices/virtual/block/ram13
13339 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram14 -> ../devices/virtual/block/ram14
13414 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram15 -> ../devices/virtual/block/ram15
13505 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop0 -> ../devices/virtual/block/loop0
13580 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop1 -> ../devices/virtual/block/loop1
13655 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop2 -> ../devices/virtual/block/loop2
13730 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop3 -> ../devices/virtual/block/loop3
13805 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop4 -> ../devices/virtual/block/loop4
13880 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop5 -> ../devices/virtual/block/loop5
13955 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop6 -> ../devices/virtual/block/loop6
14030 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop7 -> ../devices/virtual/block/loop7
16402 0 lrwxrwxrwx 1 root root 0 Jun 11 18:17 /sys/block/cciss!c0d0 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d0/block/cciss!c0d0
16636 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/cciss!c0d1 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d1/block/cciss!c0d1
Thanks.
Luke
Updated by Sage Weil almost 11 years ago
- Status changed from Need More Info to In Progress
- Priority changed from High to Urgent
Thanks, Luke--that was exactly the info I needed!
Updated by Sage Weil almost 11 years ago
- Status changed from In Progress to Fix Under Review
Updated by Sage Weil almost 11 years ago
Hi Luke, Tomas,
Are you able to test the latest version in this branch? https://raw.github.com/ceph/ceph/wip-ceph-disk/src/ceph-disk
Thanks!
Updated by Jing Yuan Luke almost 11 years ago
Hi Sage,
I had the following error:
root@yyy:~/ceph-configure# ceph-deploy -v osd prepare xxx:cciss/c0d1
Preparing cluster ceph disks xxx:/dev/cciss/c0d1:
Deploying osd to xxx
Host xxx is now ready for osd use.
Preparing host xxx disk /dev/cciss/c0d1 journal None activate False
ceph-disk-prepare -- /dev/cciss/c0d1 returned 1
ceph-disk: Error: not a disk or partition: /dev/cciss/c0d1
ceph-deploy: Failed to create 1 OSDs
I tested the same script on 2 separate servers and was able to replicate the error on both. Also, no partitions were created.
Regards,
Luke
Updated by Sage Weil almost 11 years ago
Jing Yuan Luke wrote:
Hi Sage,
I had the following error:
root@yyy:~/ceph-configure# ceph-deploy -v osd prepare xxx:cciss/c0d1
Preparing cluster ceph disks xxx:/dev/cciss/c0d1:
Deploying osd to xxx
Host xxx is now ready for osd use.
Preparing host xxx disk /dev/cciss/c0d1 journal None activate False
ceph-disk-prepare -- /dev/cciss/c0d1 returned 1
ceph-disk: Error: not a disk or partition: /dev/cciss/c0d1
ceph-deploy: Failed to create 1 OSDs
I tested the same script on 2 separate servers and was able to replicate the error on both. Also, no partitions were created.
Regards,
Luke
Hi Luke,
I pushed a version to the same branch (same url) that should print a line like 'dev ... name is ...' if you run 'ceph-disk -v /dev/cciss/c0d1'. Can you try it? And then verify whether /sys/block/$name is present? It looked like it is doing a simple s/\//!/ to the path relative to /dev, but I may be wrong.
Thanks!
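The translation Sage describes amounts to something like this sketch (helper names here are illustrative, not necessarily ceph-disk's):

def dev_to_sysfs_name(path):
    # '/dev/cciss/c0d1' -> 'cciss!c0d1'; the real code would resolve
    # symlinks first (os.path.realpath) before taking the relative name
    return path[len('/dev/'):].replace('/', '!')

def sysfs_to_dev_path(name):
    # 'cciss!c0d1' -> '/dev/cciss/c0d1'
    return '/dev/' + name.replace('!', '/')

assert dev_to_sysfs_name('/dev/cciss/c0d1') == 'cciss!c0d1'
assert sysfs_to_dev_path('cciss!c0d1') == '/dev/cciss/c0d1'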
Updated by Jing Yuan Luke almost 11 years ago
Hi Sage,
Here is what I got:
ceph-disk -v prepare /dev/cciss/c0d1
DEBUG:ceph-disk:dev /dev/cciss/c0d1 name is cciss/c0d1
ceph-disk: Error: not a disk or partition: /dev/cciss/c0d1
Checking /sys/block:
find /sys/block/cciss* -ls
16402 0 lrwxrwxrwx 1 root root 0 Jun 5 18:51 /sys/block/cciss!c0d0 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d0/block/cciss!c0d0
16636 0 lrwxrwxrwx 1 root root 0 Jul 10 09:15 /sys/block/cciss!c0d1 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d1/block/cciss!c0d1
Regards,
Luke
Updated by Sage Weil almost 11 years ago
aha, I see the problem. pushed a fix.. can you see if it works now?
(thanks!)
Updated by Jing Yuan Luke almost 11 years ago
Hi Sage,
I think there is a typo in line 327:
Traceback (most recent call last):
File "/usr/sbin/ceph-disk", line 2307, in <module>
main()
File "/usr/sbin/ceph-disk", line 2296, in main
args.func(args)
File "/usr/sbin/ceph-disk", line 1128, in main_prepare
verify_not_in_use(args.data)
File "/usr/sbin/ceph-disk", line 327, in verify_not_in_use
basename = get_dev_name(os.realpath(dev))
AttributeError: 'module' object has no attribute 'realpath'
Anyway, after I changed it to basename = get_dev_name(os.path.realpath(dev)), I managed to get it running again until I hit the following error:
/dev/cciss/c0d11: No such file or directory
Usage: mkfs.xfs
< some long message from mkfs.xfs >
ceph-disk: Error: Command '['mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/cciss/c0d11']' returned non-zero exit status 1
ceph-deploy: Failed to create 1 OSDs
I believe the correct one should be /dev/cciss/c0d1p1; here is the output of my /proc/partitions:
cat /proc/partitions
major minor #blocks name
104 0 244163520 cciss/c0d0
104 1 1024 cciss/c0d0p1
104 2 195584 cciss/c0d0p2
104 3 235966464 cciss/c0d0p3
104 4 7998464 cciss/c0d0p4
104 16 244163520 cciss/c0d1
104 17 243113903 cciss/c0d1p1
104 18 1047552 cciss/c0d1p2
Basically, the partitions are correctly created but not formatted.
Regards,
Luke
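For reference, the kernel convention the partition-naming fix has to follow, as a short sketch: when the base device name ends in a digit (c0d1), a 'p' separator is inserted before the partition number; otherwise the number is appended directly.

def partition_dev(disk, num):
    # '/dev/cciss/c0d1', 1 -> '/dev/cciss/c0d1p1'
    # '/dev/sdaa', 1       -> '/dev/sdaa1'
    if disk[-1].isdigit():
        return '%sp%d' % (disk, num)
    return '%s%d' % (disk, num)

assert partition_dev('/dev/cciss/c0d1', 1) == '/dev/cciss/c0d1p1'
assert partition_dev('/dev/sdaa', 1) == '/dev/sdaa1'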
Updated by Sage Weil almost 11 years ago
ok, fixed the typo and redid the partition naming code.. try again?
thanks!
Updated by Jing Yuan Luke almost 11 years ago
Hi Sage,
The code got through without any error, but I think prepare_journal or something related failed to set up the journal properly, and the OSD daemon failed (ceph osd tree shows the host as down and out). Below are some observations:
From ceph.log after ceph-deploy:
2013-07-12 09:49:56,293 ceph_deploy.osd DEBUG Preparing cluster ceph disks xxxx:/dev/cciss/c0d1:
2013-07-12 09:49:56,527 ceph_deploy.osd DEBUG Deploying osd to xxxx
2013-07-12 09:49:56,950 ceph_deploy.osd DEBUG Host xxxx is now ready for osd use.
2013-07-12 09:49:56,950 ceph_deploy.osd DEBUG Preparing host xxxx disk /dev/cciss/c0d1 journal None activate False
2013-07-12 09:50:50,791 ceph_deploy.osd DEBUG Activating cluster ceph disks xxxx:/dev/cciss/c0d1:
2013-07-12 09:50:51,025 ceph_deploy.osd DEBUG Activating host xxxx disk /dev/cciss/c0d1
2013-07-12 09:50:51,157 ceph_deploy.osd DEBUG Distro Ubuntu codename precise, will use upstart
# id weight type name up/down reweight
-1 1.88 root default
-2 0.23 host aaa
0 0.23 osd.0 up 1
-3 0.23 host bbb
1 0.23 osd.1 up 1
-4 0.23 host ccc
2 0.23 osd.2 up 1
-5 0.5 host ddd
3 0.5 osd.3 up 1
-6 0.23 host eee
4 0.23 osd.4 up 1
-7 0.23 host fff
5 0.23 osd.5 up 1
-8 0.23 host xxxx
6 0.23 osd.6 down 0
From the host where I tried to prepare the OSD:
parted /dev/cciss/c0d1 p
Model: Compaq Smart Array (cpqarray)
Disk /dev/cciss/c0d1: 250GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
2 1049kB 1074MB 1073MB ceph journal
1 1075MB 250GB 249GB xfs ceph data
Then /proc/partitions (here c0d1p2, aka the journal, is missing):
cat /proc/partitions
major minor #blocks name
104 0 244163520 cciss/c0d0
104 1 1024 cciss/c0d0p1
104 2 195584 cciss/c0d0p2
104 3 235966464 cciss/c0d0p3
104 4 7998464 cciss/c0d0p4
104 16 244163520 cciss/c0d1
104 17 243113903 cciss/c0d1p1
The mount command showed the data is indeed mounted:
mount | grep ceph
/dev/cciss/c0d1p1 on /var/lib/ceph/osd/ceph-6 type xfs (rw)
Looking into the mount, I found that the journal link target is missing (not showing in /dev/disk/by-partuuid):
ls -l /var/lib/ceph/osd/ceph-6/
total 40
-rw-r--r-- 1 root root 490 Jul 12 09:49 activate.monmap
-rw-r--r-- 1 root root 3 Jul 12 09:49 active
-rw-r--r-- 1 root root 37 Jul 12 09:49 ceph_fsid
drwxr-xr-x 4 root root 61 Jul 12 09:49 current
-rw-r--r-- 1 root root 37 Jul 12 09:49 fsid
lrwxrwxrwx 1 root root 58 Jul 12 09:49 journal -> /dev/disk/by-partuuid/49c86d49-56fd-41af-a4a8-a7bd1de5e7a7
-rw-r--r-- 1 root root 37 Jul 12 09:49 journal_uuid
-rw------- 1 root root 56 Jul 12 09:49 keyring
-rw-r--r-- 1 root root 21 Jul 12 09:49 magic
-rw-r--r-- 1 root root 6 Jul 12 09:49 ready
-rw-r--r-- 1 root root 4 Jul 12 09:49 store_version
-rw-r--r-- 1 root root 0 Jul 12 09:49 upstart
-rw-r--r-- 1 root root 2 Jul 12 09:49 whoami
ls -l /dev/disk/by-partuuid/
total 0
lrwxrwxrwx 1 root root 18 Jul 12 09:48 24b8f94f-05ac-4440-aee2-65fc4874c8ff -> ../../cciss/c0d0p4
lrwxrwxrwx 1 root root 18 Jul 12 09:49 41088959-ef5c-4588-9d37-0e447a63db0c -> ../../cciss/c0d1p1
lrwxrwxrwx 1 root root 18 Jul 12 09:48 87657540-58ec-4ab8-8cef-ba4909e9f201 -> ../../cciss/c0d0p2
lrwxrwxrwx 1 root root 18 Jul 12 09:48 e083f517-8a32-4829-be99-7ceb5cbf6f9b -> ../../cciss/c0d0p3
From the ceph-osd.0.log:
2013-07-12 09:48:42.832776 7f3ec6dd5780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17205
2013-07-12 09:48:42.894943 7f3ec6dd5780 1 journal _open /dev/cciss/c0d1p2 fd 5: 1072693248 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-07-12 09:48:42.895731 7f3ec6dd5780 -1 journal read_header error decoding journal header
2013-07-12 09:48:44.492938 7f7ea4469780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17227
2013-07-12 09:48:44.555216 7f7ea4469780 1 journal _open /dev/cciss/c0d1p2 fd 5: 1072693248 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-07-12 09:48:44.555782 7f7ea4469780 -1 journal read_header error decoding journal header
2013-07-12 09:49:05.530703 7f7d3197f780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17261
2013-07-12 09:49:05.592673 7f7d3197f780 1 journal _open /dev/cciss/c0d1p2 fd 4: 1072693248 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-07-12 09:49:05.593157 7f7d3197f780 -1 journal read_header error decoding journal header
And lastly ceph-osd.6.log:
2013-07-12 09:49:06.227090 7f8632fdf780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17299
2013-07-12 09:49:06.230161 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkfs in /var/lib/ceph/tmp/mnt.WImxPL
2013-07-12 09:49:06.230203 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkfs fsid is already set to 41088959-ef5c-4588-9d37-0e447a63db0c
2013-07-12 09:49:06.360547 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) leveldb db exists/created
2013-07-12 09:49:06.422731 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 10: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:06.423208 7f8632fdf780 -1 journal read_header error decoding journal header
2013-07-12 09:49:06.487265 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 10: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:06.523450 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkjournal created journal on /var/lib/ceph/tmp/mnt.WImxPL/journal
2013-07-12 09:49:06.523494 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkfs done in /var/lib/ceph/tmp/mnt.WImxPL
2013-07-12 09:49:06.671646 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount FIEMAP ioctl is supported and appears to work
2013-07-12 09:49:06.671699 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-07-12 09:49:06.672100 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount did NOT detect btrfs
2013-07-12 09:49:06.894047 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-07-12 09:49:06.894121 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount found snaps <>
2013-07-12 09:49:06.961018 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-07-12 09:49:07.023186 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 16: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:07.087738 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 16: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:07.088183 7f8632fdf780 -1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2013-07-12 09:49:07.505819 7f8632fdf780 1 journal close /var/lib/ceph/tmp/mnt.WImxPL/journal
2013-07-12 09:49:07.506224 7f8632fdf780 -1 created object store /var/lib/ceph/tmp/mnt.WImxPL journal /var/lib/ceph/tmp/mnt.WImxPL/journal for osd.6 fsid da54c1ec-86a9-4bd8-b8bc-31f870c744ed
2013-07-12 09:49:07.506276 7f8632fdf780 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.WImxPL/keyring: can't open /var/lib/ceph/tmp/mnt.WImxPL/keyring: (2) No such file or directory
2013-07-12 09:49:07.506376 7f8632fdf780 -1 created new key in keyring /var/lib/ceph/tmp/mnt.WImxPL/keyring
I am not sure if this is correct, but from what I have read elsewhere, this may be due to a udev issue?
Regards,
Luke
Updated by Sage Weil almost 11 years ago
what is strange is that parted showed 2 partitions but cat /proc/partitions only showed 1. is it still in that state? can you do 'partprobe /dev/cciss/c0d1' and then cat /proc/partitions again and see if that makes it show both partitions?
in general, when testing this, the most helpful output is from running ceph-disk directly with -v, e.g.
ceph-disk -v zap /dev/cciss/c0d1
ceph-disk -v prepare /dev/cciss/c0d1
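To see whether the kernel actually picked up a new table, something like this sketch can be used alongside those commands (device and partition names are the ones from this report):

import subprocess

subprocess.check_call(['partprobe', '/dev/cciss/c0d1'])  # ask for a re-read
with open('/proc/partitions') as f:
    names = [line.split()[-1] for line in f if line.strip()]
print('cciss/c0d1p2' in names)   # is the journal partition visible yet?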
Updated by Sage Weil almost 11 years ago
Hi Luke- have you had a chance to see if a 'partprobe /dev/cciss/c0d1' makes the journal partition appear?
Updated by Jing Yuan Luke almost 11 years ago
Hi Sage,
I ran partprobe and, after probably 10 minutes, I am still not seeing the second partition in /proc/partitions.
Regards,
Luke
Updated by Sage Weil almost 11 years ago
it sounds like the partition table didn't actually get written, or there is some other problem between your kernel and parted. :/ The ceph-disk parts seem to be behaving properly, at least, so I'll merge them in.
Luke, how reproducible is this? does it happen every time? even after you, say, ceph-deploy zap host:disk to blow away the old partition table and try it again?
Updated by Sage Weil almost 11 years ago
- Status changed from Fix Under Review to Pending Backport
- Priority changed from High to Urgent
Updated by Jing Yuan Luke almost 11 years ago
Hi Sage,
I tried the methods you suggested (zap and prepare) on 2 identical servers with the same controller. Both show the same problem despite doing partprobe twice 30 minutes apart.
However, when I rebooted both servers, the partition table somehow got updated and now the OSDs are up and operational. I am not sure if it's a kernel issue; I have Ubuntu Precise installed with the standard 3.2.0 kernel.
Regards,
Luke
Updated by Sage Weil almost 11 years ago
Jing Yuan Luke wrote:
Hi Sage,
I tried the methods you suggested (zap and prepare) on 2 identical servers with the same controller. Both show the same problem despite doing partprobe twice 30 minutes apart.
However, when I rebooted both servers, the partition table somehow got updated and now the OSDs are up and operational. I am not sure if it's a kernel issue; I have Ubuntu Precise installed with the standard 3.2.0 kernel.
Oh- Is one of the other partitions on the disk mounted at the time you run partprobe? I believe that can prevent a refresh.
Updated by Jing Yuan Luke almost 11 years ago
Hi Sage,
I think the first partition (data) was mounted. I should have noted that, after running zap, parted still showed the partitions (probably there is a need to do a mount check before zap as well?).
Regards,
Luke
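A pre-zap mount check along the lines Luke suggests might look like this sketch (not what ceph-disk actually does): a mounted partition can keep the kernel from re-reading the table, so refuse to proceed while any partition of the disk is in use.

def mounted_under(disk):
    # disk='/dev/cciss/c0d1' also matches /dev/cciss/c0d1p1, c0d1p2, ...
    with open('/proc/mounts') as f:
        return [line.split()[0] for line in f
                if line.split()[0].startswith(disk)]

busy = mounted_under('/dev/cciss/c0d1')
if busy:
    raise SystemExit('refusing to zap; mounted: ' + ', '.join(busy))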
Updated by Sage Weil almost 11 years ago
- Status changed from Pending Backport to Resolved