Bug #50237
cephfs-journal-tool/cephfs-data-scan: Stuck in infinite loop with "NetHandler create_socket couldn't create socket"
Description
Both tools are getting stuck in an infinite loop and only outputting this message.
Env:
version: 15.2.4
dockerized: ceph/ceph:v15.2.4
kernel: 3.10.0-1160.15.2.el7.x86_64
mons are msgrv2 only
ceph.conf:
mon host = [v2:10.0.0.1:3300],[v2:10.0.0.2:3300], ...
ms_mon_client/cluster/service_mode = secure
ms_cluster/service/client = secure
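To double-check the "msgrv2 only" claim against the mon host line above, a small helper along these lines could list which messenger protocols each monitor advertises. This is a minimal sketch, not part of any Ceph tooling: the function name mon_protocols is hypothetical, it assumes IPv4 addresses in the bracketed form shown above, and it uses only the two addresses quoted in the report (the elided remainder is left out).

```python
import re


def mon_protocols(mon_host: str) -> dict:
    """Map each monitor IP to the set of messenger protocols listed for it.

    Hypothetical helper: assumes IPv4 addresses written as
    [v2:IP:PORT] or [v2:IP:PORT,v1:IP:PORT].
    """
    result = {}
    # Each bracketed group is the address vector of one monitor.
    for group in re.findall(r"\[([^\]]*)\]", mon_host):
        for addr in group.split(","):
            # Accept an optional "/nonce" suffix after the port.
            m = re.fullmatch(r"\s*(v[12]):([^:]+):(\d+)(?:/\d+)?\s*", addr)
            if m:
                proto, ip, _port = m.groups()
                result.setdefault(ip, set()).add(proto)
    return result


# Only the two addresses that appear in the report; the rest is elided there.
mon_host = "[v2:10.0.0.1:3300],[v2:10.0.0.2:3300]"
protocols = mon_protocols(mon_host)

# Monitors that a v1-only client could not reach through this config line.
v2_only = sorted(ip for ip, protos in protocols.items() if protos == {"v2"})
print(v2_only)
```

With the line from this report, both listed monitors come back as v2-only, which matches the description of the environment.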
Example commands we tried:
cephfs-data-scan pg_files / 23.4
cephfs-journal-tool journal export backup.bin
What we have tried so far for debugging:
- Running the command from a non-containerized env
- Multiple client versions (nautilus, very recent octopus build)
- Looking at debug logs: the client appears to try msgrv1 but not msgrv2. I'm not 100% sure, because I didn't record the logs (I can go back and verify if needed)
Is it possible that these tools cannot speak msgrv2?
If yes, what would be the process to "enable" msgrv1? (Our monitors are already listening on the v1 port. However, if I update ceph.conf with the v1 addresses and run e.g. ceph -s, the monitor that receives the query crashes.)
The following is a similar bug report, however in our case the tools don't work at all.
https://tracker.ceph.com/issues/41034
Updated by Patrick Donnelly about 3 years ago
- Affected Versions deleted (v0.55d)
The warning is probably unrelated. Please collect logs.
Updated by Patrick Donnelly about 3 years ago
- Subject changed from cephfs-journal-tool/cephfs-data-scan: Stuck in infinite loop with "NetHandler create_socket couldn't create socket" to cephfs-journal-tool/cephfs-data-scan: Stuck in infinite loop with "NetHandler create_socket couldn't create socket"
- Status changed from New to Need More Info
Updated by n u about 3 years ago
thanks for the reply!
i'll collect relevant mon/mds debug logs, and the strace outputs this week
if there is anything else i should prepare, please let me know
Updated by n u about 3 years ago
- File debuglogs_50237.tar.gz added
we ran the following from our mon.0 (10.223.14.4):
"cephfs-data-scan pg_files / 23.4" at 2021-04-16T12:16:58.280+0000
"cephfs-journal-tool journal export backup.bin" at 2021-04-16T12:17:37.243+0000
and collected the logs from that timeframe, as well as strace outputs
extra info:
mds.0 was the active MDS at the time, and we currently have 1 filesystem in the cluster
the cluster was reinstalled from scratch 2 weeks ago
Updated by n u almost 3 years ago
it seems that the cephfs tools don't support msgrv2
enabling msgrv1 fixed the issue
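The thread does not record how msgrv1 was enabled. Purely as an illustration of the dual-address syntax, and not as the reporter's actual fix, the sketch below rewrites a v2-only mon host value so that each monitor also lists a v1 address. The function name add_v1_addresses is hypothetical, port 6789 is assumed to be the v1 port in use (it is the usual default), and addresses are assumed to be IPv4. Note that the description above reports a monitor crash after v1 addresses were put into ceph.conf, so a value produced this way should be tested with care.

```python
import re

V1_PORT = 6789  # assumption: the monitors listen for msgr v1 on the default port


def add_v1_addresses(mon_host: str, v1_port: int = V1_PORT) -> str:
    """Return mon_host with a v1 address appended to every v2-only monitor entry.

    Hypothetical helper: only the string is rewritten, nothing is applied
    to a cluster.
    """

    def expand(match: re.Match) -> str:
        group = match.group(1)
        if "v1:" in group:
            # Entry already lists a v1 address; leave it untouched.
            return match.group(0)
        v2 = re.search(r"v2:([^:,\]]+):\d+", group)
        if not v2:
            # Nothing recognizable to derive an IP from; leave it untouched.
            return match.group(0)
        return f"[{group},v1:{v2.group(1)}:{v1_port}]"

    return re.sub(r"\[([^\]]*)\]", expand, mon_host)


# Only the two addresses that appear in the report are used here.
print(add_v1_addresses("[v2:10.0.0.1:3300],[v2:10.0.0.2:3300]"))
```

Entries that already carry a v1 address are returned unchanged, so running the helper twice on the same value does not duplicate addresses.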