Bug #55334
closed
mgr/cephadm: socket path too long for some daemons
0%
Description
Admin socket commands fail for daemons such as rgw, rbd-mirror, and cephfs-mirror with:
"admin_socket: exception getting command descriptions: AF_UNIX path too long".
This is probably because the socket file names are too long. An example path for rbd-mirror:
/var/run/ceph/e3f41acc-ba6a-11ec-9629-525400c43ed6/ceph-client.rbd-mirror.ceph-node-00.dpqslq.2.93914410882624.asok
116 chars
This exceeds sizeof(sockaddr_un.sun_path), which is currently 108:
struct sockaddr_un
{
    __SOCKADDR_COMMON (sun_);
    char sun_path[108]; /* Path name. */
};
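As a quick sanity check of the numbers above, here is a minimal Python sketch that measures the example path against the 108-byte sun_path buffer. The helper name fits_in_sun_path is made up for illustration, and reserving one byte for a terminating NUL is an assumption (it is what portable code usually does). The path itself is 115 bytes; a count of 116 presumably includes a trailing newline.

```python
import os

SUN_PATH_SIZE = 108  # sizeof(sockaddr_un.sun_path) on Linux, per the struct above


def fits_in_sun_path(path, size=SUN_PATH_SIZE):
    """Return True if path fits in sun_path with room for a terminating NUL.

    Hypothetical helper for illustration; not part of Ceph or cephadm.
    """
    return len(os.fsencode(path)) + 1 <= size


EXAMPLE = (
    "/var/run/ceph/e3f41acc-ba6a-11ec-9629-525400c43ed6/"
    "ceph-client.rbd-mirror.ceph-node-00.dpqslq.2.93914410882624.asok"
)

print(len(os.fsencode(EXAMPLE)), fits_in_sun_path(EXAMPLE))  # 115 False
```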
Updated by Avan Thakkar about 2 years ago
OK, I see that in src/cephadm the names for these containers are stored in the form
client.rbd-mirror.{daemon_id}
but the asok files stored under /var/run/ceph/<fsid> are named differently, for example:
ceph-client.rbd-mirror.ceph-node-00.fqbmns.2.94595577574976.asok
So where is the extra 2.94595577574976 coming from? I couldn't find it in the rbd-mirror daemon spec either.
Updated by Redouane Kachach Elhichou almost 2 years ago
I'm afraid cephadm has nothing to do with this issue. cephadm's responsibility ends at daemon name generation. The admin socket is created and managed by the RGW code. I guess the 94595577574976 part corresponds to the cctid (from sample.ceph.conf):
# The Ceph admin socket allows you to query a daemon via a socket interface
# From a client perspective this can be a virtual machine using librbd
# Type: String
# Required: No
;admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
I'm not sure if this could be changed or not (RGW team may probably help here) but this is something that cephadm cannot control AFAIK.
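To see how the quoted template could produce the file name observed earlier in this thread, here is a rough Python sketch. It uses string.Template only as a stand-in for Ceph's own metavariable expansion, and the mapping of 2 to $pid and 94595577574976 to $cctid is an assumption based on the guess above.

```python
from string import Template

# Template taken from the sample.ceph.conf line quoted above.
ADMIN_SOCKET = Template("/var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok")

# Values inferred from the asok file name seen in this thread (assumptions).
path = ADMIN_SOCKET.substitute(
    cluster="ceph",
    type="client",
    id="rbd-mirror.ceph-node-00.fqbmns",  # daemon name as generated by cephadm
    pid=2,                                # assumed: PID inside the container
    cctid=94595577574976,                 # assumed: the $cctid value
)

print(path)
```

If that mapping is right, only the $id part is under cephadm's control; the .$pid.$cctid suffix comes from the socket template itself.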
Updated by Redouane Kachach Elhichou almost 2 years ago
- Status changed from New to Rejected
Please, feel free to re-open if you still think it's a cephadm issue. Otherwise you can move this issue to the corresponding component.
Updated by Sebastian Wagner almost 2 years ago
This issue already popped up a while ago. I think there is an internal rhbz about it. $cctid was added years ago to keep multiple RBD volumes (each with its own librados instance, and thus each opening a socket) from conflicting with each other. Back then there was no easy way to solve it. Neha might help?
Don't think that simply closing this as "Rejected" is going to make this issue go away any time soon.
Updated by Matt Benjamin 12 months ago
Avan Thakkar,
A global requirement to make all socket paths less than 108 characters seems generally unreasonable, in an era when PATH_MAX is 4K (and generally NAME_MAX has been 255 for many many years). Obviously, that's not under your control.
However, I notice that
len(ceph-client.rbd-mirror.ceph-node-00.dpqslq.2.93914410882624.asok)
is in fact 64--well within the limit. So another view of 55334 is that, due to the use of long-ish socket file names, the Ceph tooling should not be using a full path.
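One way a tool could avoid the full path is to change into the socket's directory and connect using only the file name. The Python sketch below illustrates that idea under stated assumptions: connect_via_chdir is a hypothetical helper, not something the Ceph tooling is known to do, and because chdir affects the whole process it is not safe in multithreaded code.

```python
import os
import socket


def connect_via_chdir(sock_path):
    """Connect to a UNIX socket using only its file name as the address.

    Hypothetical helper: temporarily changes the working directory so that
    the address passed to connect() is the short file name, not the full path.
    """
    directory, name = os.path.split(sock_path)
    previous = os.getcwd()
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        os.chdir(directory)
        client.connect(name)  # relative address, resolved against the cwd
    except OSError:
        client.close()
        raise
    finally:
        os.chdir(previous)  # always restore the original working directory
    return client
```

The caller owns the returned socket and should close it when done.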
Updated by Redouane Kachach Elhichou 10 months ago
- Status changed from Rejected to New
Updated by Redouane Kachach Elhichou 10 months ago
Reopening just to confirm whether this is a real issue or not.
I created an rgw daemon on my test cluster, and the length of the admin ASOK is the following:
(running from inside a cephadm shell):
[ceph: root@ceph-node-0 /]# echo "/var/run/ceph/ceph-client.rgw.rgw.1.ceph-node-0.rrzylk.2.94169732110424.asok" | wc
      1       1      77
Running an admin socket command on the rgw daemon:
[ceph: root@ceph-node-0 /]# ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgw.1.ceph-node-0.rrzylk.2.94169732110424.asok help
{
    "config diff": "dump diff of current config and default config",
    "config diff get": "dump diff get <field>: dump diff of current and default config setting <field>",
    "config get": "config get <field>: get the config value",
    "config help": "get config setting schema and descriptions",
So in summary, when working from inside a cephadm shell there is no issue, because the path '/var/run/ceph/ceph-client.rgw.rgw.1.ceph-node-0.rrzylk.2.94169732110424.asok' is shorter than the 108-character maximum.
It could only be an issue if somebody tries to use the admin socket from outside the cephadm shell. In that case the whole path is longer than 108 characters: '/var/run/ceph/994d5594-060c-11ee-b1a9-525400a90a1b/ceph-client.rgw.rgw.1.ceph-node-0.rrzylk.2.93938707486808.asok'
[root@ceph-node-0 ~]# echo "/var/run/ceph/994d5594-060c-11ee-b1a9-525400a90a1b/ceph-client.rgw.rgw.1.ceph-node-0.rrzylk.2.93938707486808.asok" | wc
      1       1     114
However, I'm not sure whether the last use case (outside of the cephadm shell) makes sense, or when such usage would be needed.
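The difference between the two cases comes down to how many bytes the directory prefix leaves for the socket file name. The following Python sketch works through that arithmetic; filename_budget is a hypothetical helper, and reserving one byte for a terminating NUL is an assumption.

```python
SUN_PATH_SIZE = 108  # sizeof(sockaddr_un.sun_path), from the issue description


def filename_budget(directory, size=SUN_PATH_SIZE):
    """Bytes left for the socket file name under directory.

    Hypothetical helper: keeps one byte for the terminating NUL.
    """
    prefix = directory.rstrip("/") + "/"
    return size - 1 - len(prefix.encode())


inside = filename_budget("/var/run/ceph")
outside = filename_budget("/var/run/ceph/994d5594-060c-11ee-b1a9-525400a90a1b")
name = "ceph-client.rgw.rgw.1.ceph-node-0.rrzylk.2.93938707486808.asok"

print(inside, outside, len(name))  # 93 56 62
```

The 62-byte file name fits in the 93 bytes available inside the shell but not in the 56 bytes left once the fsid directory is part of the path.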
Updated by Redouane Kachach Elhichou 10 months ago
- Status changed from New to Rejected
After an offline discussion with Avan, we agreed that this is in fact not a valid bug, because it doesn't make sense to run the commands from outside the cephadm shell. If you have a different opinion, feel free to reopen the bug.