Bug #381
closed
"Operation not supported" not supported on various actions
Description
This might seem like a duplicate of #263, #322 and #377, but it keeps coming back.
This morning I updated to the latest unstable to verify that #377 had been solved. It was, partially: the OSD crashes were gone, but the "Operation not supported" error was still there.
root@node13:~# ceph class list
10.08.26_17:12:37.930985 mon <- [class,list]
10.08.26_17:12:38.000635 mon1 -> 'installed classes: rbd (v1.2 [x86-64]) sync (v1.0 [x86-64]) ' (0)
root@node13:~# ceph class list
10.08.26_17:12:38.788628 mon <- [class,list]
10.08.26_17:12:38.789166 mon0 -> 'installed classes: rbd (v1.2 [x86-64]) sync (v1.0 [x86-64]) ' (0)
root@node13:~# rbd create --size 10240 delta
failed to assign a block name for image
create error: Operation not supported
root@node13:~# rbd snap ls alpha
list_snaps failed: Operation not supported
failed to list snapshots: Operation not supported
root@node13:~# rbd snap create --snap=alpha001 alpha
list_snaps failed: Operation not supported
error searching for snapshot: Operation not supported
root@node13:~#
Before doing so I ran "cclass -a" and waited for some time (hours!), but it kept failing.
root@node13:~# cclass -a && sleep 30 && rbd snap ls alpha
Loading class: /usr/lib/rados-classes/libcls_sync.so.1.0.0: sync 1.0 x86-64
read 184515 bytes from /usr/lib/rados-classes/libcls_sync.so.1.0.0
10.08.26_17:19:35.405774 mon <- [class,add,sync,1.0,x86-64,changed]
10.08.26_17:19:37.018712 mon1 -> 'updated' (0)
Loading class: /usr/lib/rados-classes/libcls_rbd.so.1.0.0: rbd 1.2 x86-64
read 197658 bytes from /usr/lib/rados-classes/libcls_rbd.so.1.0.0
10.08.26_17:19:37.067614 mon <- [class,add,rbd,1.2,x86-64,changed]
10.08.26_17:19:38.231043 mon1 -> 'updated' (0)
list_snaps failed: Operation not supported
failed to list snapshots: Operation not supported
root@node13:~#
On IRC it seems that other people are also experiencing this:
[22:41] <wido> about rbd, i still have some issues with it, like i reported in #377. Although the RBD class should be loaded correctly, it doesn't seem to do so
[22:41] <wido> i still have to run cclass -a and ceph class list
[22:41] <todinini> me too
[22:41] <wido> to propogate it through the cluster
[22:42] <wido> todinini: Ubuntu 10.04?
[22:42] <todinini> wido: yep
I tried these commands on various machines, but they all kept failing. I even tried running cclass -a on all my nodes, although according to the manpage that shouldn't make a difference.
kvm-rbd runs fine (the VMs), but creating an image with qemu-img also fails:
root@client01:~# qemu-img create -f rbd rbd:rbd/delta 10G
Formatting 'rbd:rbd/delta', fmt=rbd size=10737418240 cluster_size=0
qemu-img: Error while formatting (Input/output error)
failed assigning block id
root@client01:~#
This might need a little more work to be really set to "Resolved"?
Updated by Sage Weil over 13 years ago
Did this include f429dc8aaac1a1e02f50381d16206d501c49656b ?
Updated by Wido den Hollander over 13 years ago
Yes, see my git log:
root@node01:/usr/src/ceph# git log -3
commit 4e3003249cd86898e447e56c1f4fca787f7eb905
Author: Sage Weil <sage@newdream.net>
Date: Wed Aug 25 21:31:34 2010 -0700
    mds: omit lock state in debug output when it's uninteresting (sync, no locks)
commit 6dae938316cc7d7dd40dfbeaf6c61631dad8e3ef
Author: Sage Weil <sage@newdream.net>
Date: Tue Aug 24 16:37:15 2010 -0700
    qa: rename snapmove -> snaptest-parents
commit f429dc8aaac1a1e02f50381d16206d501c49656b
Author: Sage Weil <sage@newdream.net>
Date: Wed Aug 25 16:24:25 2010 -0700
    osd: don't reply on missing class
    Instead, we queue ourselves in the waiting_for_class list. Broken by f1eb9a8751d48.
root@node01:/usr/src/ceph#
root@node01:/usr/src/ceph# ls -al /usr/bin/cosd
-rwxr-xr-x 1 root root 2379072 2010-08-26 09:39 /usr/bin/cosd
root@node01:/usr/src/ceph#
As you can see I am at the right commit and my cosd binary has changed this morning.
Updated by Sage Weil over 13 years ago
Ok, there were a couple of different problems with the class loading and with timeouts. Please try 610b2e9dae2213462af7a152308c4beed54974a4.
Among other things, there are now 2 timeouts: 'osd class timeout' for good classes in the osd cache, and 'osd class error timeout' for missing classes. The latter timeout is shorter, so if you try a class and it's not loaded, then load it, you won't have to wait as long before the osd will re-request it from the monitor (assuming the error timeout is shorter! by default, 1 minute vs 1 hour for good classes).
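The two-timeout behaviour described above can be illustrated with a small Python model. This is only a sketch of the idea, not the OSD's actual code: `ClassCache`, `store`, `needs_request` and the injected clock are hypothetical names, and the defaults simply mirror the 1 minute and 1 hour figures mentioned above.

```python
import time


class ClassCache:
    """Toy model of a class cache with separate expiry for good and missing classes."""

    def __init__(self, good_timeout=3600.0, error_timeout=60.0, clock=time.monotonic):
        self.good_timeout = good_timeout    # models 'osd class timeout'
        self.error_timeout = error_timeout  # models 'osd class error timeout'
        self.clock = clock                  # injectable so the model can be tested
        self._entries = {}                  # name -> (loaded_ok, time of caching)

    def store(self, name, loaded_ok):
        """Remember the monitor's answer for a class, good or bad."""
        self._entries[name] = (loaded_ok, self.clock())

    def needs_request(self, name):
        """Return True if the monitor should be asked (again) for this class."""
        entry = self._entries.get(name)
        if entry is None:
            return True
        loaded_ok, cached_at = entry
        ttl = self.good_timeout if loaded_ok else self.error_timeout
        return self.clock() - cached_at >= ttl


# Simulated clock: a missing class is retried after the short error timeout.
now = [0.0]
cache = ClassCache(clock=lambda: now[0])
cache.store("rbd", loaded_ok=False)
now[0] = 30.0
print(cache.needs_request("rbd"))  # False: still inside the 60 s error timeout
now[0] = 61.0
print(cache.needs_request("rbd"))  # True: error entry expired, ask the monitor again
```

In this model a failed lookup is retried after the short timeout, while a successfully loaded class is only refreshed after the long one.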
Updated by Wido den Hollander over 13 years ago
Still no luck with this:
root@node13:~# cclass -a && sleep 90 && rbd snap ls alpha
Loading class: /usr/lib/rados-classes/libcls_sync.so.1.0.0: sync 1.0 x86-64
read 184515 bytes from /usr/lib/rados-classes/libcls_sync.so.1.0.0
10.08.27_15:15:40.972608 mon <- [class,add,sync,1.0,x86-64,changed]
10.08.27_15:15:42.157734 mon1 -> 'updated' (0)
Loading class: /usr/lib/rados-classes/libcls_rbd.so.1.0.0: rbd 1.2 x86-64
read 197658 bytes from /usr/lib/rados-classes/libcls_rbd.so.1.0.0
10.08.27_15:15:42.206335 mon <- [class,add,rbd,1.2,x86-64,changed]
10.08.27_15:15:43.365032 mon1 -> 'updated' (0)
list_snaps failed: Operation not supported
failed to list snapshots: Operation not supported
root@node13:~#
I'm 100% sure I'm at the right commit. I checked the modtime of the binaries and the build time of my .debs, and they are all OK. I even restarted my whole cluster a few times.
My OSD config:
[osd]
keyring = /etc/ceph/keyring.$name
osd data = /srv/ceph/$name
osd recovery max active = 1
osd pool default size = 3
osd pool default crush rule = 0
osd class error timeout = 30
osd class timeout = 60.0
debug osd = 1
debug filestore = 1
So in the example above, I should be getting results after 90 seconds. But even after a few hours the class is not working, while it is loaded:
root@node13:~# ceph class list
10.08.27_15:20:10.288448 mon <- [class,list]
10.08.27_15:20:10.288810 mon0 -> 'installed classes: rbd (v1.2 [x86-64]) sync (v1.0 [x86-64]) ' (0)
root@node13:~# ceph -s
10.08.27_15:20:16.423661 pg v224603: 3240 pgs: 3240 active+clean; 1395 GB data, 1592 GB used, 3559 GB / 5151 GB avail
10.08.27_15:20:16.432695 mds e235: 1/1/1 up {0=up:active(laggy or crashed)}, 2 up:standby(laggy or crashed)
10.08.27_15:20:16.432737 osd e11336: 12 osds: 10 up, 10 in
10.08.27_15:20:16.432832 log 10.08.27_09:05:20.417379 mon0 [2001:16f8:10:2::c3c3:3f9b]:6789/0 20 : [INF] osd11 out (down for 302.300002)
10.08.27_15:20:16.432943 class rbd (v1.2 [x86-64])
10.08.27_15:20:16.432962 mon e1: 2 mons at {node13=[2001:16f8:10:2::c3c3:3f9b]:6789/0,node14=[2001:16f8:10:2::c3c3:2e5c]:6789/0}
root@node13:~#
It simply won't work: the RBD class is loaded, but it does not seem to function.
Updated by Yehuda Sadeh over 13 years ago
We have seen a case in the past where the library execution path (the path where we temporarily store the objects on the osd before loading) didn't have execute permissions and loading failed, though I think that produced a different error.
Another possible problem would be if you're running osds over multiple architectures (e.g., x86-32).
Updated by Wido den Hollander over 13 years ago
My whole cluster is running x86-64 on Ubuntu 10.04 (even the same kernel), so the arch mix won't be the issue.
Updated by Wido den Hollander over 13 years ago
I've been going through the OSD logs and I see:
osd11
10.08.27_08:26:27.888976 7f0467030710 osd11 10954 pg[3.13a( v 10954'9058 (10954'9056,10954'9058] n=34 ec=2 les=10948 10946/10946/10946) [11,1] r=0 mlcod 10954'9057 active+clean] call class rbd does not exist
10.08.27_08:49:13.235395 7f0469135710 osd11 10957 class rbd method assign_bid flags=
10.08.27_08:49:13.235846 7f0467831710 osd11 10957 get_class 'rbd' by 0x48da900
10.08.27_08:49:13.235876 7f0467831710 class timed out going to send request for rbd v [x86-64]
10.08.27_08:49:13.235889 7f0467831710 osd11 10957 send_class_request class=rbd version= [x86-64]
10.08.27_08:49:13.235916 7f0467831710 osd11 10957 class not supported
10.08.27_08:49:13.235935 7f0467831710 osd11 10957 pg[3.13a( v 10957'9059 (10954'9057,10957'9059] n=34 ec=2 les=10948 10946/10946/10946) [11,1] r=0 mlcod 10954'9058 active+clean] call class rbd does not exist
10.08.27_08:49:13.236240 7f0469135710 osd11 10957 _dispatch 0x40612c0 class(SET, 1 entrieslast 4096) v1
10.08.27_08:49:13.236270 7f0469135710 osd11 10957 handle_class action=3
10.08.27_08:49:13.236284 7f0469135710 handle_class rbd
10.08.27_08:49:13.236296 7f0469135710 response of an invalid class 'rbd'
10.08.27_08:49:13.236338 7f0469135710 osd11 10957 got_class 'rbd'
10.08.27_08:49:21.238478 7f0469135710 osd11 10957 class rbd method assign_bid flags=
10.08.27_08:49:21.238948 7f0467030710 osd11 10957 get_class 'rbd' by 0x48dab40
10.08.27_08:49:21.238979 7f0467030710 osd11 10957 class not supported
10.08.27_08:49:21.238992 7f0467030710 osd11 10957 pg[3.13a( v 10957'9060 (10954'9058,10957'9060] n=34 ec=2 les=10948 10946/10946/10946) [11,1] r=0 mlcod 10957'9059 active+clean] call class rbd does not exist
10.08.27_08:49:25.337251 7f0469135710 osd11 10957 class rbd method assign_bid flags=
10.08.27_08:49:25.337699 7f0467831710 osd11 10957 get_class 'rbd' by 0x43e1240
10.08.27_08:49:25.337731 7f0467831710 osd11 10957 class not supported
10.08.27_08:49:25.337743 7f0467831710 osd11 10957 pg[3.13a( v 10957'9061 (10957'9059,10957'9061] n=34 ec=2 les=10948 10946/10946/10946) [11,1] r=0 mlcod 10957'9060 active+clean] call class rbd does not exist
10.08.27_15:42:15.836594 7f37adb6a710 get_obj: adding new class name=rbd ptr=0x3e4d7a8
10.08.27_15:42:15.836639 7f37adb6a710 osd11 11573 class rbd method assign_bid flags=
10.08.27_15:42:15.837116 7f37adb6a710 osd11 11573 class rbd method assign_bid flags=
10.08.27_15:42:15.837269 7f37ab964710 osd11 11573 class not supported
On the other OSDs I wasn't able to find any messages about the RBD class.
I also checked the timestamps of /usr/lib/rados-classes/libcls_rbd.so.1.0.0 on all the OSDs, and they match: they were all modified this morning (when I updated the cluster).
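The search through the OSD logs described above can be sketched in Python as follows. This is a rough helper under stated assumptions: the failure phrases and the timestamp layout are copied from the osd11 excerpt in this thread, a log from another build may word things differently, and `class_failures` is a hypothetical name.

```python
import re

# Phrases taken from the osd11 excerpt in this thread (assumption: other
# builds may log different wording).
FAILURE_PHRASES = (
    "class not supported",
    "call class rbd does not exist",
    "response of an invalid class",
)

# Timestamps in the excerpt look like 10.08.27_08:49:13.235916
TIMESTAMP = re.compile(r"^(\d{2}\.\d{2}\.\d{2}_\d{2}:\d{2}:\d{2}\.\d+)\s")


def class_failures(lines, phrases=FAILURE_PHRASES):
    """Yield (timestamp, phrase) for each log line containing a failure phrase."""
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines that do not start with a timestamp
        for phrase in phrases:
            if phrase in line:
                yield match.group(1), phrase
                break  # report each line at most once


# Three lines copied from the excerpt above; only the first two are failures.
sample = [
    "10.08.27_08:49:13.235916 7f0467831710 osd11 10957 class not supported",
    "10.08.27_08:49:13.236296 7f0469135710 response of an invalid class 'rbd'",
    "10.08.27_08:49:13.236338 7f0469135710 osd11 10957 got_class 'rbd'",
]
for stamp, phrase in class_failures(sample):
    print(stamp, phrase)
```

Running the same function over each OSD's log file, opened as an iterable of lines, would show which daemons reported class failures and when.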
Updated by Sage Weil over 13 years ago
The problem is on the monitor:
root@logger:~# ceph class list
10.08.28_16:36:38.531048 mon <- [class,list]
10.08.28_16:36:38.531634 mon0 -> 'installed classes: (v []) [active] sync (v1.0 [x86-64]) ' (0)
cclass -a and my attempts to add rbd by hand fail:
root@logger:~# ceph class del rbd 1.2 x86-64
10.08.28_16:38:53.608356 mon <- [class,del,rbd,1.2,x86-64]
10.08.28_16:38:54.719862 mon0 -> 'updated' (0)
root@logger:~# ceph class list
10.08.28_16:38:58.713116 mon <- [class,list]
10.08.28_16:38:58.745979 mon1 -> 'installed classes: (v []) [active] sync (v1.0 [x86-64]) ' (0)
root@logger:~# ceph class del '' '' ''
10.08.28_16:39:11.484623 mon <- [class,del,,,]
10.08.28_16:39:11.485178 mon0 -> 'couldn't find class ' (-22)
root@logger:~# ceph class list
10.08.28_16:39:16.168194 mon <- [class,list]
10.08.28_16:39:16.206056 mon1 -> 'installed classes: (v []) [active] sync (v1.0 [x86-64]) ' (0)
root@logger:~# ceph -i /usr/lib/rados-classes/libcls_rbd.so.1.0.0 class add rbd 1.2 x86-64
read 197658 bytes from /usr/lib/rados-classes/libcls_rbd.so.1.0.0
10.08.28_16:39:26.228687 mon <- [class,add,rbd,1.2,x86-64]
10.08.28_16:39:27.838072 mon1 -> 'updated' (0)
root@logger:~# ceph class list
10.08.28_16:39:31.018152 mon <- [class,list]
10.08.28_16:39:31.018756 mon0 -> 'installed classes: (v []) [active] sync (v1.0 [x86-64]) ' (0)
Updated by Wido den Hollander over 13 years ago
Right now RBD seems to be loaded again:
root@node13:~# ceph class list
10.08.30_09:56:55.689043 mon <- [class,list]
10.08.30_09:56:55.690341 mon1 -> 'installed classes: rbd (v1.2 [x86-64]) sync (v1.0 [x86-64]) ' (0)
root@node13:~#
But creating an RBD image is still not possible; I get the same errors as before.
Updated by Yehuda Sadeh over 13 years ago
This one should be tested again after the fix for #386 (5ae8e26cc86e046d92454ad41b124d5cad5bd1ba). First del/add the rbd class, as the monitor might have gotten into a bad state.
Updated by Wido den Hollander over 13 years ago
Seems to be fixed for me now: removing the class and running "cclass -a" fixed it, and all operations work again.
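The recovery sequence described above (delete the class, re-run "cclass -a", then check the class list) can be written down as a small Python sketch. It only uses command lines that appear earlier in this thread; `recovery_commands` and `run_recovery` are hypothetical helper names, and the default is a dry run that prints the commands without executing anything.

```python
import subprocess


def recovery_commands(name, version, arch):
    """Return the command lines from this thread, in the order they were applied."""
    return [
        ["ceph", "class", "del", name, version, arch],  # drop the stale entry
        ["cclass", "-a"],                               # re-add all classes
        ["ceph", "class", "list"],                      # confirm what the monitor reports
    ]


def run_recovery(name="rbd", version="1.2", arch="x86-64", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in recovery_commands(name, version, arch):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing command


# Dry run: shows the three commands without touching the cluster.
run_recovery()
```

Keeping the dry run as the default makes it possible to review the exact commands before anything is sent to the monitor.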
Updated by Yehuda Sadeh over 13 years ago
- Status changed from New to Resolved
I'll close it for now, assuming it fixed the problem.
Updated by Sage Weil over 13 years ago
- Project changed from 3 to Ceph
- Category deleted (8)