Bug #5345
ceph-disk: handle less common device names
Description
/dev/sdaa*
/dev/cciss/c0d0p1
etc.
Related issues
History
#1 Updated by Sage Weil over 10 years ago
- Assignee set to Sage Weil
#2 Updated by Sage Weil over 10 years ago
- Status changed from New to Need More Info
#3 Updated by Sage Weil over 10 years ago
- Priority changed from Urgent to High
#4 Updated by Tomas Lovato about 10 years ago
I have several HP DL180s with P400 RAID controllers. This is all standard hardware.
The disk paths are enumerated as /dev/cciss/c0d0, with partitions being p0, p1, p2, etc.
The error occurs when I try an osd prepare; I can't proceed.
- ceph-deploy osd prepare ceph01:/dev/cciss/c0d0p3:/dev/cciss/c0d0p2
ceph-disk-prepare -- /dev/cciss/c0d0p3 /dev/cciss/c0d0p2 returned 1
Information: Moved requested sector from 4194338 to 4196352 in
order to align on 2048-sector boundaries.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
Warning: WARNING: the kernel failed to re-read the partition table on /dev/cciss/c0d0p2 (Invalid argument). As a result, it may not reflect all of your changes until after reboot.
ceph-disk: Error: Command '['partprobe', '/dev/cciss/c0d0p2']' returned non-zero exit status 1
ceph-deploy: Failed to create 1 OSDs
#5 Updated by Sage Weil about 10 years ago
Can you please try the version of ceph-disk in the wip-ceph-disk branch? It has a bunch of changes to be smarter about using the basename, but I don't have a system with that kind of driver to test it against.
#6 Updated by Jing Yuan Luke about 10 years ago
I had a similar problem to Tomas, but mine are HP blades (more specifically BL460 G1) with P200i controllers. I suspect this only affects certain HP products; the newer so-called G7 or G8 models don't seem to display this behaviour.
Actually, I should have provided the following information earlier, but the activation email was somehow blocked by my corporate email server, so this is my second attempt using a login with a different email account.
Anyway, some information for your reference (this was run from one of the OSD hosts after I manually changed ceph-disk myself):
find /dev/disk -ls
1397 0 drwxr-xr-x 7 root root 140 Jun 12 13:06 /dev/disk
20986 0 drwxr-xr-x 2 root root 80 Jun 12 13:06 /dev/disk/by-partlabel
22167 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partlabel/ceph\\x20data -> ../../cciss/c0d1p1
20987 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partlabel/ceph\\x20journal -> ../../cciss/c0d1p2
1429 0 drwxr-xr-x 2 root root 120 Jun 12 13:06 /dev/disk/by-uuid
22175 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-uuid/770be2c4-1961-4b32-8082-68f73ded5145 -> ../../cciss/c0d1p1
8989 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-uuid/754906aa-ec5c-4ecb-985c-500c60f564b0 -> ../../cciss/c0d0p2
1463 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-uuid/831417b6-493c-488f-bace-05f4a6f9be2d -> ../../cciss/c0d0p4
1430 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-uuid/72451fa4-ce32-487d-b630-aafd64dba78f -> ../../cciss/c0d0p3
1422 0 drwxr-xr-x 2 root root 140 Jun 12 13:06 /dev/disk/by-partuuid
22170 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partuuid/a520444d-28f5-4d7e-8c8c-24ce61aaddc1 -> ../../cciss/c0d1p1
20990 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-partuuid/2c95b241-c514-4d42-a176-4306bef58e38 -> ../../cciss/c0d1p2
8983 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-partuuid/bbbe29b9-22c7-40d4-bfda-beb27f4a465a -> ../../cciss/c0d0p2
1457 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-partuuid/f6ccc910-f761-4e64-a189-33cf0a7994c9 -> ../../cciss/c0d0p4
1423 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-partuuid/9cdf604b-f075-4cc5-bb8e-583fb6154943 -> ../../cciss/c0d0p3
1405 0 drwxr-xr-x 2 root root 140 Jun 12 13:06 /dev/disk/by-path
22173 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-path/pci-0000:0b:08.0-part1 -> ../../cciss/c0d1p1
20993 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-path/pci-0000:0b:08.0-part2 -> ../../cciss/c0d1p2
20724 0 lrwxrwxrwx 1 root root 16 Jun 12 13:06 /dev/disk/by-path/pci-0000:0b:08.0 -> ../../cciss/c0d1
1460 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-path/pci-0000:0b:08.0-part4 -> ../../cciss/c0d0p4
1426 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-path/pci-0000:0b:08.0-part3 -> ../../cciss/c0d0p3
1398 0 drwxr-xr-x 2 root root 240 Jun 12 13:06 /dev/disk/by-id
22165 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part1 -> ../../cciss/c0d1p1
22163 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part1 -> ../../cciss/c0d1p1
20984 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part2 -> ../../cciss/c0d1p2
20982 0 lrwxrwxrwx 1 root root 18 Jun 12 13:06 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part2 -> ../../cciss/c0d1p2
20723 0 lrwxrwxrwx 1 root root 16 Jun 12 13:06 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000 -> ../../cciss/c0d1
20722 0 lrwxrwxrwx 1 root root 16 Jun 12 13:06 /dev/disk/by-id/cciss-3600508b1001037373620202020200000 -> ../../cciss/c0d1
1454 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part4 -> ../../cciss/c0d0p4
1451 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part4 -> ../../cciss/c0d0p4
1419 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/wwn-0x600508b1001037373620202020200000-part3 -> ../../cciss/c0d0p3
1416 0 lrwxrwxrwx 1 root root 18 Jun 12 13:02 /dev/disk/by-id/cciss-3600508b1001037373620202020200000-part3 -> ../../cciss/c0d0p3
find /dev/cciss -ls
7500 0 drwxr-xr-x 2 root root 200 Jun 12 13:06 /dev/cciss
18074 0 brw-rw---- 1 root disk Jul 2 15:33 /dev/cciss/c0d1p2
18073 0 brw-rw---- 1 root disk Jun 12 13:06 /dev/cciss/c0d1p1
13426 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p1
7508 0 brw-rw---- 1 root disk Jun 12 13:06 /dev/cciss/c0d1
7505 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p4
7504 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p3
7503 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0p2
7501 0 brw-rw---- 1 root disk Jun 12 13:02 /dev/cciss/c0d0
#7 Updated by Sage Weil about 10 years ago
Hi Jing,
As far as I can tell the current ceph-disk supports these device names, but as I mentioned I don't have a system to test with. Can you pull the latest from https://raw.github.com/ceph/ceph/master/src/ceph-disk, put it in /usr/sbin, and see if everything works?
If not, my first guess is that /dev/cciss/$foo doesn't appear at /sys/block/$foo; can you check? Thanks!
#8 Updated by Jing Yuan Luke about 10 years ago
Hi Sage,
I just tried the ceph-disk as per your suggestion; however, I found the following error:
ceph-deploy osd prepare xxx:cciss/c0d1
ceph-disk-prepare -- /dev/cciss/c0d1 returned 1
Traceback (most recent call last):
File "/usr/sbin/ceph-disk", line 2295, in <module>
main()
File "/usr/sbin/ceph-disk", line 2284, in main
args.func(args)
File "/usr/sbin/ceph-disk", line 1116, in main_prepare
verify_not_in_use(args.data)
File "/usr/sbin/ceph-disk", line 323, in verify_not_in_use
for partition in list_partitions(dev):
File "/usr/sbin/ceph-disk", line 233, in list_partitions
for name in os.listdir(os.path.join('/sys/block', base)):
OSError: [Errno 2] No such file or directory: '/sys/block/c0d1'
ceph-deploy: Failed to create 1 OSDs
Going through /sys/block, I found the following:
find /sys/block -ls
2651 0 drwxr-xr-x 2 root root 0 Jun 11 18:17 /sys/block
12289 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram0 -> ../devices/virtual/block/ram0
12364 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram1 -> ../devices/virtual/block/ram1
12439 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram2 -> ../devices/virtual/block/ram2
12514 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram3 -> ../devices/virtual/block/ram3
12589 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram4 -> ../devices/virtual/block/ram4
12664 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram5 -> ../devices/virtual/block/ram5
12739 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram6 -> ../devices/virtual/block/ram6
12814 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram7 -> ../devices/virtual/block/ram7
12889 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram8 -> ../devices/virtual/block/ram8
12964 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram9 -> ../devices/virtual/block/ram9
13039 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram10 -> ../devices/virtual/block/ram10
13114 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram11 -> ../devices/virtual/block/ram11
13189 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram12 -> ../devices/virtual/block/ram12
13264 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram13 -> ../devices/virtual/block/ram13
13339 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram14 -> ../devices/virtual/block/ram14
13414 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/ram15 -> ../devices/virtual/block/ram15
13505 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop0 -> ../devices/virtual/block/loop0
13580 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop1 -> ../devices/virtual/block/loop1
13655 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop2 -> ../devices/virtual/block/loop2
13730 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop3 -> ../devices/virtual/block/loop3
13805 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop4 -> ../devices/virtual/block/loop4
13880 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop5 -> ../devices/virtual/block/loop5
13955 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop6 -> ../devices/virtual/block/loop6
14030 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/loop7 -> ../devices/virtual/block/loop7
16402 0 lrwxrwxrwx 1 root root 0 Jun 11 18:17 /sys/block/cciss!c0d0 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d0/block/cciss!c0d0
16636 0 lrwxrwxrwx 1 root root 0 Jul 3 15:02 /sys/block/cciss!c0d1 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d1/block/cciss!c0d1
Thanks.
Luke
#9 Updated by Sage Weil about 10 years ago
- Status changed from Need More Info to In Progress
- Priority changed from High to Urgent
Thanks, Luke--that was exactly the info I needed!
#10 Updated by Sage Weil about 10 years ago
- Status changed from In Progress to Fix Under Review
#11 Updated by Sage Weil about 10 years ago
Hi Luke, Tomas,
Are you able to test the latest version in this branch? https://raw.github.com/ceph/ceph/wip-ceph-disk/src/ceph-disk
Thanks!
#12 Updated by Jing Yuan Luke about 10 years ago
Hi Sage,
I had the following error:
root@yyy:~/ceph-configure# ceph-deploy -v osd prepare xxx:cciss/c0d1
Preparing cluster ceph disks xxx:/dev/cciss/c0d1:
Deploying osd to xxx
Host xxx is now ready for osd use.
Preparing host xxx disk /dev/cciss/c0d1 journal None activate False
ceph-disk-prepare -- /dev/cciss/c0d1 returned 1
ceph-disk: Error: not a disk or partition: /dev/cciss/c0d1
ceph-deploy: Failed to create 1 OSDs
I tested the same script on 2 separate servers and was able to replicate the same error on both. Also, no partitions were created.
Regards,
Luke
#13 Updated by Sage Weil about 10 years ago
Jing Yuan Luke wrote:
Hi Sage,
I had the following error:
root@yyy:~/ceph-configure# ceph-deploy -v osd prepare xxx:cciss/c0d1
Preparing cluster ceph disks xxx:/dev/cciss/c0d1:
Deploying osd to xxx
Host xxx is now ready for osd use.
Preparing host xxx disk /dev/cciss/c0d1 journal None activate False
ceph-disk-prepare -- /dev/cciss/c0d1 returned 1
ceph-disk: Error: not a disk or partition: /dev/cciss/c0d1
ceph-deploy: Failed to create 1 OSDs
I tested the same script on 2 separate servers and was able to replicate the same error on both. Also, no partitions were created.
Regards,
Luke
Hi Luke,
I pushed a version to the same branch (same URL) that should print a line like 'dev ... name is ...' if you run 'ceph-disk -v /dev/cciss/c0d1'. Can you try it? And then verify whether /sys/block/$name is present? It looks like it is doing a simple s/\//!/ on the path relative to /dev, but I may be wrong.
Thanks!
#14 Updated by Jing Yuan Luke about 10 years ago
Hi Sage,
Here is what I got:
ceph-disk -v prepare /dev/cciss/c0d1
DEBUG:ceph-disk:dev /dev/cciss/c0d1 name is cciss/c0d1
ceph-disk: Error: not a disk or partition: /dev/cciss/c0d1
Checking /sys/block:
find /sys/block/cciss* -ls
16402 0 lrwxrwxrwx 1 root root 0 Jun 5 18:51 /sys/block/cciss!c0d0 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d0/block/cciss!c0d0
16636 0 lrwxrwxrwx 1 root root 0 Jul 10 09:15 /sys/block/cciss!c0d1 -> ../devices/pci0000:00/0000:00:03.0/0000:0a:00.0/0000:0b:08.0/cciss0/c0d1/block/cciss!c0d1
Regards,
Luke
#15 Updated by Sage Weil about 10 years ago
Aha, I see the problem. I pushed a fix; can you see if it works now?
(thanks!)
#16 Updated by Jing Yuan Luke about 10 years ago
Hi Sage,
I think there is a typo in line 327:
Traceback (most recent call last):
File "/usr/sbin/ceph-disk", line 2307, in <module>
main()
File "/usr/sbin/ceph-disk", line 2296, in main
args.func(args)
File "/usr/sbin/ceph-disk", line 1128, in main_prepare
verify_not_in_use(args.data)
File "/usr/sbin/ceph-disk", line 327, in verify_not_in_use
basename = get_dev_name(os.realpath(dev))
AttributeError: 'module' object has no attribute 'realpath'
Anyway, after I changed it to basename = get_dev_name(os.path.realpath(dev)), I managed to get it running again until I hit the following error:
/dev/cciss/c0d11: No such file or directory
Usage: mkfs.xfs
< some long message from mkfs.xfs >
ceph-disk: Error: Command '['mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/cciss/c0d11']' returned non-zero exit status 1
ceph-deploy: Failed to create 1 OSDs
I believe the correct one should be /dev/cciss/c0d1p1; here is the output of my /proc/partitions:
cat /proc/partitions
major minor #blocks name
104 0 244163520 cciss/c0d0
104 1 1024 cciss/c0d0p1
104 2 195584 cciss/c0d0p2
104 3 235966464 cciss/c0d0p3
104 4 7998464 cciss/c0d0p4
104 16 244163520 cciss/c0d1
104 17 243113903 cciss/c0d1p1
104 18 1047552 cciss/c0d1p2
Basically, the partitions are correctly created but not formatted.
Regards,
Luke
#17 Updated by Sage Weil about 10 years ago
OK, fixed the typo and redid the partition naming code... try again?
thanks!
#18 Updated by Jing Yuan Luke about 10 years ago
Hi Sage,
The code got through without any error, but I think prepare_journal or something related failed to set up the journal properly and the OSD daemon failed (ceph osd tree shows the host as being down and out). Below are some observations:
From ceph.log after ceph-deploy:
2013-07-12 09:49:56,293 ceph_deploy.osd DEBUG Preparing cluster ceph disks xxxx:/dev/cciss/c0d1:
2013-07-12 09:49:56,527 ceph_deploy.osd DEBUG Deploying osd to xxxx
2013-07-12 09:49:56,950 ceph_deploy.osd DEBUG Host xxxx is now ready for osd use.
2013-07-12 09:49:56,950 ceph_deploy.osd DEBUG Preparing host xxxx disk /dev/cciss/c0d1 journal None activate False
2013-07-12 09:50:50,791 ceph_deploy.osd DEBUG Activating cluster ceph disks xxxx:/dev/cciss/c0d1:
2013-07-12 09:50:51,025 ceph_deploy.osd DEBUG Activating host xxxx disk /dev/cciss/c0d1
2013-07-12 09:50:51,157 ceph_deploy.osd DEBUG Distro Ubuntu codename precise, will use upstart
# id weight type name up/down reweight
-1 1.88 root default
-2 0.23 host aaa
0 0.23 osd.0 up 1
-3 0.23 host bbb
1 0.23 osd.1 up 1
-4 0.23 host ccc
2 0.23 osd.2 up 1
-5 0.5 host ddd
3 0.5 osd.3 up 1
-6 0.23 host eee
4 0.23 osd.4 up 1
-7 0.23 host fff
5 0.23 osd.5 up 1
-8 0.23 host xxxx
6 0.23 osd.6 down 0
From the host where I tried to prepare the OSD:
parted /dev/cciss/c0d1 p
Model: Compaq Smart Array (cpqarray)
Disk /dev/cciss/c0d1: 250GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
2 1049kB 1074MB 1073MB ceph journal
1 1075MB 250GB 249GB xfs ceph data
Then /proc/partitions (here c0d1p2, aka the journal, is missing):
cat /proc/partitions
major minor #blocks name
104 0 244163520 cciss/c0d0
104 1 1024 cciss/c0d0p1
104 2 195584 cciss/c0d0p2
104 3 235966464 cciss/c0d0p3
104 4 7998464 cciss/c0d0p4
104 16 244163520 cciss/c0d1
104 17 243113903 cciss/c0d1p1
The mount command showed the data partition is indeed mounted:
mount | grep ceph
/dev/cciss/c0d1p1 on /var/lib/ceph/osd/ceph-6 type xfs (rw)
Looking into the mount, I found the link to the journal is missing (not showing in /dev/disk/by-partuuid):
ls -l /var/lib/ceph/osd/ceph-6/
total 40
-rw-r--r-- 1 root root 490 Jul 12 09:49 activate.monmap
-rw-r--r-- 1 root root 3 Jul 12 09:49 active
-rw-r--r-- 1 root root 37 Jul 12 09:49 ceph_fsid
drwxr-xr-x 4 root root 61 Jul 12 09:49 current
-rw-r--r-- 1 root root 37 Jul 12 09:49 fsid
lrwxrwxrwx 1 root root 58 Jul 12 09:49 journal -> /dev/disk/by-partuuid/49c86d49-56fd-41af-a4a8-a7bd1de5e7a7
-rw-r--r-- 1 root root 37 Jul 12 09:49 journal_uuid
-rw------- 1 root root 56 Jul 12 09:49 keyring
-rw-r--r-- 1 root root 21 Jul 12 09:49 magic
-rw-r--r-- 1 root root 6 Jul 12 09:49 ready
-rw-r--r-- 1 root root 4 Jul 12 09:49 store_version
-rw-r--r-- 1 root root 0 Jul 12 09:49 upstart
-rw-r--r-- 1 root root 2 Jul 12 09:49 whoami
ls -l /dev/disk/by-partuuid/
total 0
lrwxrwxrwx 1 root root 18 Jul 12 09:48 24b8f94f-05ac-4440-aee2-65fc4874c8ff -> ../../cciss/c0d0p4
lrwxrwxrwx 1 root root 18 Jul 12 09:49 41088959-ef5c-4588-9d37-0e447a63db0c -> ../../cciss/c0d1p1
lrwxrwxrwx 1 root root 18 Jul 12 09:48 87657540-58ec-4ab8-8cef-ba4909e9f201 -> ../../cciss/c0d0p2
lrwxrwxrwx 1 root root 18 Jul 12 09:48 e083f517-8a32-4829-be99-7ceb5cbf6f9b -> ../../cciss/c0d0p3
From the ceph-osd.0.log:
2013-07-12 09:48:42.832776 7f3ec6dd5780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17205
2013-07-12 09:48:42.894943 7f3ec6dd5780 1 journal _open /dev/cciss/c0d1p2 fd 5: 1072693248 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-07-12 09:48:42.895731 7f3ec6dd5780 -1 journal read_header error decoding journal header
2013-07-12 09:48:44.492938 7f7ea4469780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17227
2013-07-12 09:48:44.555216 7f7ea4469780 1 journal _open /dev/cciss/c0d1p2 fd 5: 1072693248 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-07-12 09:48:44.555782 7f7ea4469780 -1 journal read_header error decoding journal header
2013-07-12 09:49:05.530703 7f7d3197f780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17261
2013-07-12 09:49:05.592673 7f7d3197f780 1 journal _open /dev/cciss/c0d1p2 fd 4: 1072693248 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-07-12 09:49:05.593157 7f7d3197f780 -1 journal read_header error decoding journal header
And lastly ceph-osd.6.log:
2013-07-12 09:49:06.227090 7f8632fdf780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-osd, pid 17299
2013-07-12 09:49:06.230161 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkfs in /var/lib/ceph/tmp/mnt.WImxPL
2013-07-12 09:49:06.230203 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkfs fsid is already set to 41088959-ef5c-4588-9d37-0e447a63db0c
2013-07-12 09:49:06.360547 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) leveldb db exists/created
2013-07-12 09:49:06.422731 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 10: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:06.423208 7f8632fdf780 -1 journal read_header error decoding journal header
2013-07-12 09:49:06.487265 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 10: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:06.523450 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkjournal created journal on /var/lib/ceph/tmp/mnt.WImxPL/journal
2013-07-12 09:49:06.523494 7f8632fdf780 1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mkfs done in /var/lib/ceph/tmp/mnt.WImxPL
2013-07-12 09:49:06.671646 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount FIEMAP ioctl is supported and appears to work
2013-07-12 09:49:06.671699 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-07-12 09:49:06.672100 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount did NOT detect btrfs
2013-07-12 09:49:06.894047 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-07-12 09:49:06.894121 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount found snaps <>
2013-07-12 09:49:06.961018 7f8632fdf780 0 filestore(/var/lib/ceph/tmp/mnt.WImxPL) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-07-12 09:49:07.023186 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 16: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:07.087738 7f8632fdf780 1 journal _open /var/lib/ceph/tmp/mnt.WImxPL/journal fd 16: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-07-12 09:49:07.088183 7f8632fdf780 -1 filestore(/var/lib/ceph/tmp/mnt.WImxPL) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2013-07-12 09:49:07.505819 7f8632fdf780 1 journal close /var/lib/ceph/tmp/mnt.WImxPL/journal
2013-07-12 09:49:07.506224 7f8632fdf780 -1 created object store /var/lib/ceph/tmp/mnt.WImxPL journal /var/lib/ceph/tmp/mnt.WImxPL/journal for osd.6 fsid da54c1ec-86a9-4bd8-b8bc-31f870c744ed
2013-07-12 09:49:07.506276 7f8632fdf780 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.WImxPL/keyring: can't open /var/lib/ceph/tmp/mnt.WImxPL/keyring: (2) No such file or directory
2013-07-12 09:49:07.506376 7f8632fdf780 -1 created new key in keyring /var/lib/ceph/tmp/mnt.WImxPL/keyring
I am not sure if this is correct, but from what I have read somewhere, this may be due to a udev issue?
Regards,
Luke
#19 Updated by Sage Weil about 10 years ago
What is strange is that parted showed 2 partitions but cat /proc/partitions only showed 1. Is it still in that state? Can you do 'partprobe /dev/cciss/c0d1' and then cat /proc/partitions again and see if that makes it show both partitions?
In general, when testing this, the most helpful output is from running ceph-disk directly with -v, e.g.
ceph-disk -v zap /dev/cciss/c0d1
ceph-disk -v prepare /dev/cciss/c0d1
#20 Updated by Sage Weil about 10 years ago
Hi Luke, have you had a chance to see if a 'partprobe /dev/cciss/c0d1' makes the journal partition appear?
#21 Updated by Jing Yuan Luke about 10 years ago
Hi Sage,
I ran partprobe, and after probably 10 minutes I am still not seeing the second partition in /proc/partitions.
Regards,
Luke
#22 Updated by Sage Weil about 10 years ago
It sounds like the partition table didn't actually get written, or there is some other problem between your kernel and parted. :/ The ceph-disk parts seem to be behaving properly, at least, so I'll merge them in.
Luke, how reproducible is this? Does it happen every time? Even after you, say, ceph-deploy zap host:disk to blow away the old partition table and try it again?
#23 Updated by Sage Weil about 10 years ago
- Priority changed from Urgent to High
#24 Updated by Sage Weil about 10 years ago
- Status changed from Fix Under Review to Pending Backport
- Priority changed from High to Urgent
#25 Updated by Jing Yuan Luke about 10 years ago
Hi Sage,
I tried the methods you suggested (zap and prepare) on 2 identical servers with the same controller. Both show the same problem despite running partprobe twice, 30 minutes apart.
However, when I rebooted both servers, the partition table somehow got updated and now the OSDs are up and operational. I am not sure if it's a kernel issue; I have Ubuntu Precise installed with the standard 3.2.0 kernel.
Regards,
Luke
#26 Updated by Sage Weil about 10 years ago
Jing Yuan Luke wrote:
Hi Sage,
I tried the methods you suggested (zap and prepare) on 2 identical servers with the same controller. Both show the same problem despite running partprobe twice, 30 minutes apart.
However, when I rebooted both servers, the partition table somehow got updated and now the OSDs are up and operational. I am not sure if it's a kernel issue; I have Ubuntu Precise installed with the standard 3.2.0 kernel.
Oh, is one of the other partitions on the disk mounted at the time you run partprobe? I believe that can prevent a refresh.
#27 Updated by Jing Yuan Luke about 10 years ago
Hi Sage,
I think the first partition (data) was mounted. I should have noted that after running zap, parted still showed the partitions (probably there is a need to check for mounts before zap as well?).
Regards,
Luke
#28 Updated by Sage Weil about 10 years ago
- Status changed from Pending Backport to Resolved