Bug #50237

open

cephfs-journal-tool/cephfs-data-scan: Stuck in infinite loop with "NetHandler create_socket couldn't create socket"

Added by n u about 3 years ago. Updated almost 3 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): tools
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Both tools get stuck in an infinite loop and repeatedly print only this message.

Env:
version: 15.2.4
dockerized: ceph/ceph:v15.2.4
kernel: 3.10.0-1160.15.2.el7.x86_64

The mons are msgrv2 only.
ceph.conf:
mon host = [v2:10.0.0.1:3300],[v2:10.0.0.2:3300], ...
ms_mon_client_mode = secure
ms_mon_cluster_mode = secure
ms_mon_service_mode = secure
ms_cluster_mode = secure
ms_service_mode = secure
ms_client_mode = secure
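Since the report suspects the client only attempts msgrv1 while `mon host` lists only v2 addresses, one quick check is to parse that value and see which protocols it actually advertises. A minimal Python sketch, assuming the bracketed `vN:ip:port` syntax used in the config above; `parse_mon_host` and `advertised_protocols` are hypothetical helpers, not Ceph APIs, and untagged legacy `ip:port` entries are ignored:

```python
import re

# Hypothetical helper (not part of Ceph). Matches "v1:ip:port" / "v2:ip:port"
# entries; untagged legacy "ip:port" addresses are deliberately not matched.
ADDR_RE = re.compile(r"(v[12]):([0-9.]+):(\d+)")


def parse_mon_host(value: str) -> list[tuple[str, str, int]]:
    """Return every (protocol, ip, port) address found in a mon host string."""
    return [(proto, ip, int(port)) for proto, ip, port in ADDR_RE.findall(value)]


def advertised_protocols(value: str) -> set[str]:
    """Return the set of messenger protocols ("v1", "v2") the string lists."""
    return {proto for proto, _, _ in parse_mon_host(value)}


mon_host = "[v2:10.0.0.1:3300],[v2:10.0.0.2:3300]"
print(advertised_protocols(mon_host))  # {'v2'}
if "v1" not in advertised_protocols(mon_host):
    # Assumption: a client that only tries msgrv1 has nothing to connect to.
    print("no msgrv1 addresses listed")
```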

Example commands we tried:
cephfs-data-scan pg_files / 23.4
cephfs-journal-tool journal export backup.bin

Debugging steps we have tried so far:
- Running the commands from a non-containerized environment
- Multiple client versions (Nautilus, a very recent Octopus build)
- Looking at the debug logs: the client seems to try msgrv1 but not msgrv2. I'm not 100% sure, because I didn't save the logs (I can go back and verify if needed).

Is it possible that these tools cannot speak msgrv2?
If so, what is the process to "enable" msgrv1? (Our monitors are already listening on the v1 port; however, if I update ceph.conf with the v1 addresses and run, e.g., ceph -s, the monitor that receives the query crashes.)

The following is a similar bug report; however, in our case the tools don't work at all:
https://tracker.ceph.com/issues/41034


Files

debuglogs_50237.tar.gz (785 KB): daemon debug logs and strace outputs while running the CephFS recovery tools. Added by n u, 04/16/2021 01:30 PM
#1

Updated by Patrick Donnelly about 3 years ago

  • Affected Versions deleted (v0.55d)

The warning is probably unrelated. Please collect logs.

#2

Updated by Patrick Donnelly about 3 years ago

  • Subject changed from cephfs-journal-tool/cephfs-data-scan: Stuck in infinite loop with "NetHandler create_socket couldn't create socket" to cephfs-journal-tool/cephfs-data-scan: Stuck in infinite loop with "NetHandler create_socket couldn't create socket"
  • Status changed from New to Need More Info
#3

Updated by n u about 3 years ago

Thanks for the reply!
I'll collect the relevant mon/MDS debug logs and the strace outputs this week.
If there is anything else I should prepare, please let me know.

#4

Updated by n u about 3 years ago

We ran the following from our mon.0 (10.223.14.4):

"cephfs-data-scan pg_files / 23.4" at 2021-04-16T12:16:58.280+0000
"cephfs-journal-tool journal export backup.bin" at 2021-04-16T12:17:37.243+0000

and collected the logs from that time frame, as well as the strace outputs.

Extra info:
mds.0 was the active MDS at the time, and there is currently one filesystem in the cluster.
The cluster was reinstalled from scratch two weeks ago.

#5

Updated by n u almost 3 years ago

It seems that the CephFS tools don't support msgrv2.

Enabling msgrv1 fixed the issue.
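The thread does not say exactly how msgrv1 was enabled. One plausible shape, assuming the monitors also listen on the default msgrv1 port 6789 (msgrv2 defaults to 3300), is a `mon host` value that lists both address types for each monitor. The Python sketch below only builds that string; `dual_mon_host` is a hypothetical helper, the IPs are the placeholders from the description, and given the monitor crash reported earlier when v1 addresses were added, treat the output as an assumption to verify rather than a confirmed fix:

```python
# Hypothetical helper (not part of Ceph). Assumes every monitor listens on the
# default ports: 3300 for msgrv2 and 6789 for msgrv1.
MSGR2_PORT = 3300
MSGR1_PORT = 6789


def dual_mon_host(ips: list[str]) -> str:
    """Build a mon host value with a [v2,v1] address pair per monitor IP."""
    entries = [f"[v2:{ip}:{MSGR2_PORT},v1:{ip}:{MSGR1_PORT}]" for ip in ips]
    return ",".join(entries)


print("mon host = " + dual_mon_host(["10.0.0.1", "10.0.0.2"]))
# mon host = [v2:10.0.0.1:3300,v1:10.0.0.1:6789],[v2:10.0.0.2:3300,v1:10.0.0.2:6789]
```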
