Bug #381

"Operation not supported" not supported on various actions

Added by Wido den Hollander about 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 08/26/2010
% Done: 0%
Regression: No
Severity: 3 - minor

Description

This might seem like a duplicate of #263, #322 and #377, but it keeps coming back.

This morning I updated to the latest unstable to verify that #377 had been solved. It was, partially: the OSD crashes are gone, but the "Operation not supported" error is still there.

root@node13:~# ceph class list
10.08.26_17:12:37.930985 mon <- [class,list]
10.08.26_17:12:38.000635 mon1 -> 'installed classes: 
rbd (v1.2 [x86-64])
sync (v1.0 [x86-64])
' (0)
root@node13:~# ceph class list
10.08.26_17:12:38.788628 mon <- [class,list]
10.08.26_17:12:38.789166 mon0 -> 'installed classes: 
rbd (v1.2 [x86-64])
sync (v1.0 [x86-64])
' (0)
root@node13:~# rbd create --size 10240 delta
failed to assign a block name for image
create error: Operation not supported
root@node13:~# rbd snap ls alpha
list_snaps failed: Operation not supported
failed to list snapshots: Operation not supported
root@node13:~# rbd snap create --snap=alpha001 alpha
list_snaps failed: Operation not supported
error searching for snapshot: Operation not supported
root@node13:~#

Before doing so I ran "cclass -a" and waited for some time (hours!), but it kept failing.

root@node13:~# cclass -a && sleep 30 && rbd snap ls alpha
Loading class: /usr/lib/rados-classes/libcls_sync.so.1.0.0: sync 1.0 x86-64
read 184515 bytes from /usr/lib/rados-classes/libcls_sync.so.1.0.0
10.08.26_17:19:35.405774 mon <- [class,add,sync,1.0,x86-64,changed]
10.08.26_17:19:37.018712 mon1 -> 'updated' (0)
Loading class: /usr/lib/rados-classes/libcls_rbd.so.1.0.0: rbd 1.2 x86-64
read 197658 bytes from /usr/lib/rados-classes/libcls_rbd.so.1.0.0
10.08.26_17:19:37.067614 mon <- [class,add,rbd,1.2,x86-64,changed]
10.08.26_17:19:38.231043 mon1 -> 'updated' (0)
list_snaps failed: Operation not supported
failed to list snapshots: Operation not supported
root@node13:~#

On IRC it seems that other people are also experiencing this:

[22:41] <wido>  about rbd, i still have some issues with it, like i reported in #377. Although the RBD class should be loaded correctly, it doesn't seem to do so
[22:41] <wido> i still have to run cclass -a and ceph class list
[22:41] <todinini> me too
[22:41] <wido> to propogate it through the cluster
[22:42] <wido> todinini: Ubuntu 10.04?
[22:42] <todinini> wido: yep

I tried these commands on various machines, but they all kept failing. I even tried running cclass -a on all my nodes, although according to the manpage that shouldn't make a difference.

kvm-rbd runs fine (the VMs), but creating an image with qemu-img also fails:

root@client01:~# qemu-img create -f rbd rbd:rbd/delta 10G
Formatting 'rbd:rbd/delta', fmt=rbd size=10737418240 cluster_size=0 
qemu-img: Error while formatting (Input/output error)
failed assigning block id
root@client01:~#

This might need a little more work before it can really be set to "Resolved".

History

#2 Updated by Wido den Hollander about 9 years ago

Yes, see my git log:

root@node01:/usr/src/ceph# git log -3
commit 4e3003249cd86898e447e56c1f4fca787f7eb905
Author: Sage Weil <sage@newdream.net>
Date:   Wed Aug 25 21:31:34 2010 -0700

    mds: omit lock state in debug output when it's uninteresting (sync, no locks)

commit 6dae938316cc7d7dd40dfbeaf6c61631dad8e3ef
Author: Sage Weil <sage@newdream.net>
Date:   Tue Aug 24 16:37:15 2010 -0700

    qa: rename snapmove -> snaptest-parents

commit f429dc8aaac1a1e02f50381d16206d501c49656b
Author: Sage Weil <sage@newdream.net>
Date:   Wed Aug 25 16:24:25 2010 -0700

    osd: don't reply on missing class

    Instead, we queue ourselves in the waiting_for_class list.  Broken by
    f1eb9a8751d48.
root@node01:/usr/src/ceph#
root@node01:/usr/src/ceph# ls -al /usr/bin/cosd 
-rwxr-xr-x 1 root root 2379072 2010-08-26 09:39 /usr/bin/cosd
root@node01:/usr/src/ceph# 

As you can see, I am at the right commit and my cosd binary was updated this morning.

#3 Updated by Sage Weil about 9 years ago

Ok, there were a couple of different problems with the class loading and with timeouts. Please try 610b2e9dae2213462af7a152308c4beed54974a4.

Among other things, there are now two timeouts: 'osd class timeout' for good classes in the osd cache, and 'osd class error timeout' for missing classes. By default the error timeout is shorter (1 minute, versus 1 hour for good classes), so if you try a class that isn't loaded and then load it, you won't have to wait as long before the osd re-requests it from the monitor.
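The two-timeout behaviour described above can be pictured with a small sketch. This is not Ceph's code: the `ClassCache` name, its methods and the injectable clock are all hypothetical, and only the default values (1 hour for good classes, 1 minute for missing ones) come from the comment above.

```python
import time


class ClassCache:
    """Toy cache with separate lifetimes for found and missing classes.

    Hypothetical illustration only; it does not mirror the OSD's
    actual data structures.
    """

    def __init__(self, ok_timeout=3600.0, error_timeout=60.0,
                 clock=time.monotonic):
        self.ok_timeout = ok_timeout        # lifetime of a "good" entry
        self.error_timeout = error_timeout  # lifetime of a "missing" entry
        self.clock = clock                  # injectable for testing
        self.entries = {}                   # name -> (found, cached_at)

    def store(self, name, found):
        """Record the outcome of asking for class `name`."""
        self.entries[name] = (found, self.clock())

    def needs_refresh(self, name):
        """Return True when the class should be requested (again)."""
        if name not in self.entries:
            return True
        found, cached_at = self.entries[name]
        timeout = self.ok_timeout if found else self.error_timeout
        return self.clock() - cached_at >= timeout
```

With these defaults, a class that was missing on the first attempt would be looked up again after one minute rather than after an hour.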

#4 Updated by Wido den Hollander about 9 years ago

Still no luck with this:

root@node13:~# cclass -a && sleep 90 && rbd snap ls alpha
Loading class: /usr/lib/rados-classes/libcls_sync.so.1.0.0: sync 1.0 x86-64
read 184515 bytes from /usr/lib/rados-classes/libcls_sync.so.1.0.0
10.08.27_15:15:40.972608 mon <- [class,add,sync,1.0,x86-64,changed]
10.08.27_15:15:42.157734 mon1 -> 'updated' (0)
Loading class: /usr/lib/rados-classes/libcls_rbd.so.1.0.0: rbd 1.2 x86-64
read 197658 bytes from /usr/lib/rados-classes/libcls_rbd.so.1.0.0
10.08.27_15:15:42.206335 mon <- [class,add,rbd,1.2,x86-64,changed]
10.08.27_15:15:43.365032 mon1 -> 'updated' (0)
list_snaps failed: Operation not supported
failed to list snapshots: Operation not supported
root@node13:~#

I'm 100% sure I'm at the right commit. I checked the modtime of the binaries and the build time of my .debs, and they are all OK. I even restarted my whole cluster a few times.

My OSD config:

[osd]
        keyring = /etc/ceph/keyring.$name
        osd data = /srv/ceph/$name
        osd recovery max active = 1
        osd pool default size = 3
        osd pool default crush rule = 0
        osd class error timeout = 30
        osd class timeout = 60.0
        debug osd = 1
        debug filestore = 1

So in the example above, I should be getting results after 90 seconds. But even after a few hours the class is not working, even though it is loaded:

root@node13:~# ceph class list
10.08.27_15:20:10.288448 mon <- [class,list]
10.08.27_15:20:10.288810 mon0 -> 'installed classes: 
rbd (v1.2 [x86-64])
sync (v1.0 [x86-64])
' (0)
root@node13:~# ceph -s
10.08.27_15:20:16.423661    pg v224603: 3240 pgs: 3240 active+clean; 1395 GB data, 1592 GB used, 3559 GB / 5151 GB avail
10.08.27_15:20:16.432695   mds e235: 1/1/1 up {0=up:active(laggy or crashed)}, 2 up:standby(laggy or crashed)
10.08.27_15:20:16.432737   osd e11336: 12 osds: 10 up, 10 in
10.08.27_15:20:16.432832   log 10.08.27_09:05:20.417379 mon0 [2001:16f8:10:2::c3c3:3f9b]:6789/0 20 : [INF] osd11 out (down for 302.300002)
10.08.27_15:20:16.432943   class rbd (v1.2 [x86-64])
10.08.27_15:20:16.432962   mon e1: 2 mons at {node13=[2001:16f8:10:2::c3c3:3f9b]:6789/0,node14=[2001:16f8:10:2::c3c3:2e5c]:6789/0}
root@node13:~#

It simply won't work: the RBD class is loaded, but does not seem to function.

#5 Updated by Yehuda Sadeh about 9 years ago

In the past we have seen a case where the library execution path (the path where we temporarily store the objects on the osd before loading) didn't have execute permissions and loading failed, though I think that produced a different error.
Another possible problem would be running osds over multiple architectures (e.g., x86-32).

#6 Updated by Wido den Hollander about 9 years ago

My whole cluster is running x86-64 on Ubuntu 10.04 (even the same kernel), so the arch mix won't be the issue.

#7 Updated by Wido den Hollander about 9 years ago

I've been going through the OSD logs and I see:

osd11

10.08.27_08:26:27.888976 7f0467030710 osd11 10954 pg[3.13a( v 10954'9058 (10954'9056,10954'9058] n=34 ec=2 les=10948 10946/10946/10946) [11,1] r=0 mlcod 10954'9057 active+clean] call class rbd does not exist
10.08.27_08:49:13.235395 7f0469135710 osd11 10957 class rbd method assign_bid flags=
10.08.27_08:49:13.235846 7f0467831710 osd11 10957 get_class 'rbd' by 0x48da900
10.08.27_08:49:13.235876 7f0467831710 class timed out going to send request for rbd v [x86-64]
10.08.27_08:49:13.235889 7f0467831710 osd11 10957 send_class_request class=rbd version= [x86-64]
10.08.27_08:49:13.235916 7f0467831710 osd11 10957 class not supported
10.08.27_08:49:13.235935 7f0467831710 osd11 10957 pg[3.13a( v 10957'9059 (10954'9057,10957'9059] n=34 ec=2 les=10948 10946/10946/10946) [11,1] r=0 mlcod 10954'9058 active+clean] call class rbd does not exist
10.08.27_08:49:13.236240 7f0469135710 osd11 10957 _dispatch 0x40612c0 class(SET, 1 entrieslast 4096) v1
10.08.27_08:49:13.236270 7f0469135710 osd11 10957 handle_class action=3
10.08.27_08:49:13.236284 7f0469135710 handle_class rbd
10.08.27_08:49:13.236296 7f0469135710 response of an invalid class 'rbd'
10.08.27_08:49:13.236338 7f0469135710 osd11 10957 got_class 'rbd'
10.08.27_08:49:21.238478 7f0469135710 osd11 10957 class rbd method assign_bid flags=
10.08.27_08:49:21.238948 7f0467030710 osd11 10957 get_class 'rbd' by 0x48dab40
10.08.27_08:49:21.238979 7f0467030710 osd11 10957 class not supported
10.08.27_08:49:21.238992 7f0467030710 osd11 10957 pg[3.13a( v 10957'9060 (10954'9058,10957'9060] n=34 ec=2 les=10948 10946/10946/10946) [11,1] r=0 mlcod 10957'9059 active+clean] call class rbd does not exist
10.08.27_08:49:25.337251 7f0469135710 osd11 10957 class rbd method assign_bid flags=
10.08.27_08:49:25.337699 7f0467831710 osd11 10957 get_class 'rbd' by 0x43e1240
10.08.27_08:49:25.337731 7f0467831710 osd11 10957 class not supported
10.08.27_08:49:25.337743 7f0467831710 osd11 10957 pg[3.13a( v 10957'9061 (10957'9059,10957'9061] n=34 ec=2 les=10948 10946/10946/10946) [11,1] r=0 mlcod 10957'9060 active+clean] call class rbd does not exist

10.08.27_15:42:15.836594 7f37adb6a710 get_obj: adding new class name=rbd ptr=0x3e4d7a8
10.08.27_15:42:15.836639 7f37adb6a710 osd11 11573 class rbd method assign_bid flags=
10.08.27_15:42:15.837116 7f37adb6a710 osd11 11573 class rbd method assign_bid flags=
10.08.27_15:42:15.837269 7f37ab964710 osd11 11573 class not supported

On the other OSDs I wasn't able to find any messages about the RBD class.

I also checked the timestamps of /usr/lib/rados-classes/libcls_rbd.so.1.0.0 on all the OSDs, and they match: all were modified this morning (when I updated the cluster).
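Matching timestamps do not prove that the files are identical, so comparing checksums across nodes is a stricter check. A minimal sketch, assuming Python is available on each node; `file_digest` is a hypothetical helper and not part of Ceph.

```python
import hashlib
import sys


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large libraries are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print one "digest  path" line per file given on the command line.
    for path in sys.argv[1:]:
        print(file_digest(path), path, sep="  ")
```

Running it against /usr/lib/rados-classes/libcls_rbd.so.1.0.0 on every node and comparing the printed digests would show whether all nodes carry the same build of the class.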

#8 Updated by Sage Weil about 9 years ago

The problem is on the monitor:

root@logger:~# ceph class list
10.08.28_16:36:38.531048 mon <- [class,list]
10.08.28_16:36:38.531634 mon0 -> 'installed classes: 
 (v []) [active]
sync (v1.0 [x86-64])
' (0)

cclass -a and my attempts to add rbd by hand fail:
root@logger:~# ceph class del rbd 1.2 x86-64
10.08.28_16:38:53.608356 mon <- [class,del,rbd,1.2,x86-64]
10.08.28_16:38:54.719862 mon0 -> 'updated' (0)
root@logger:~# ceph class list
10.08.28_16:38:58.713116 mon <- [class,list]
10.08.28_16:38:58.745979 mon1 -> 'installed classes: 
 (v []) [active]
sync (v1.0 [x86-64])
' (0)
root@logger:~# ceph class del '' '' ''
10.08.28_16:39:11.484623 mon <- [class,del,,,]
10.08.28_16:39:11.485178 mon0 -> 'couldn't find class ' (-22)
root@logger:~# ceph class list
10.08.28_16:39:16.168194 mon <- [class,list]
10.08.28_16:39:16.206056 mon1 -> 'installed classes: 
 (v []) [active]
sync (v1.0 [x86-64])
' (0)
root@logger:~# ceph -i /usr/lib/rados-classes/libcls_rbd.so.1.0.0 class add rbd 1.2 x86-64
read 197658 bytes from /usr/lib/rados-classes/libcls_rbd.so.1.0.0
10.08.28_16:39:26.228687 mon <- [class,add,rbd,1.2,x86-64]
10.08.28_16:39:27.838072 mon1 -> 'updated' (0)
root@logger:~# ceph class list
10.08.28_16:39:31.018152 mon <- [class,list]
10.08.28_16:39:31.018756 mon0 -> 'installed classes: 
 (v []) [active]
sync (v1.0 [x86-64])
' (0)
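The bad monitor state above is visible as a class entry whose name, version and architecture are all empty. A minimal sketch for spotting such entries in saved "ceph class list" output, assuming the line format seen in this ticket; `parse_class_list` and the `ENTRY` pattern are hypothetical helpers, not part of Ceph.

```python
import re

# Matches lines such as "rbd (v1.2 [x86-64])" and also the broken
# " (v []) [active]" form, where every field is empty.
ENTRY = re.compile(
    r"^(?P<name>\S*) \(v(?P<version>\S*) \[(?P<arch>[^\]]*)\]\)"
)


def parse_class_list(text):
    """Split class entries into well-formed tuples and malformed lines."""
    good, bad = [], []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match is None:
            continue  # timestamped header or the trailing "' (0)" line
        entry = (match["name"], match["version"], match["arch"])
        if all(entry):
            good.append(entry)
        else:
            bad.append(line)  # at least one field is empty
    return good, bad
```

A non-empty second list would suggest that the monitor holds a corrupted class entry like the one shown above.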

#9 Updated by Wido den Hollander about 9 years ago

Right now RBD seems to be loaded again:

root@node13:~# ceph class list
10.08.30_09:56:55.689043 mon <- [class,list]
10.08.30_09:56:55.690341 mon1 -> 'installed classes: 
rbd (v1.2 [x86-64])
sync (v1.0 [x86-64])
' (0)
root@node13:~#

But creating an RBD image is still not possible; I get the same errors as before.

#10 Updated by Yehuda Sadeh about 9 years ago

This should be retested after the fix for #386 (5ae8e26cc86e046d92454ad41b124d5cad5bd1ba). First delete and re-add the rbd class, as the monitor might have gotten into a bad state.

#11 Updated by Wido den Hollander about 9 years ago

Seems to be fixed for me now: removing the class and running "cclass -a" did it, and all operations work again.

#12 Updated by Yehuda Sadeh about 9 years ago

  • Status changed from New to Resolved

I'll close it for now, assuming it fixed the problem.

#13 Updated by Sage Weil almost 9 years ago

  • Project changed from 3 to Ceph
  • Category deleted (8)
