Bug #56470 (open)

Ceph iSCSI: tcmu-runner segfault in librados

Added by imirc tw almost 2 years ago. Updated almost 2 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: librados
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

Our iSCSI-on-RBD setup keeps crashing at random times with a segfault. This happens on different gateways:

ework-thread372615: segfault at b0 ip 00007fd7590641a4 sp 00007fd744596978 error 4 in librados.so.2.0.0[7fd758fb2000+1a4000]

This is a cephadm-deployed cluster using Podman containers.
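A kernel segfault line like the one above can be decoded to locate the faulting instruction inside librados. The sketch below is not a Ceph tool; the regex, the helper name `decode`, and the field layout it assumes are our own. It parses the message, computes the instruction pointer's offset from the start of the mapping the kernel printed, and decodes the x86 page-fault error bits (assuming an x86-64 kernel, which prints these numbers in hex). For this report it prints `librados.so.2.0.0 + 0xb21a4`, and error 4 decodes to a user-mode read of a non-present page. Together with the fault address 0xb0, that suggests, but does not prove, a read through a NULL pointer at field offset 0xb0.

```python
import re

# x86 page-fault error-code bits (the kernel's X86_PF_* flags).
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: fault in kernel mode, 1: fault in user mode

# The kernel message from this report, split only for line length.
LINE = ("ework-thread372615: segfault at b0 ip 00007fd7590641a4 "
        "sp 00007fd744596978 error 4 in librados.so.2.0.0[7fd758fb2000+1a4000]")

# Assumed layout: "segfault at ADDR ip IP sp SP error CODE in OBJ[START+SIZE]",
# with every number printed in hex.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\s\[]+)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode(line: str) -> dict:
    """Split a kernel segfault line into the fields needed for addr2line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip, start, size = (int(m[k], 16) for k in ("ip", "start", "size"))
    err = int(m["err"], 16)
    offset = ip - start  # instruction pointer relative to the printed mapping
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        "offset_in_mapping": offset,
        "access": "write" if err & PF_WRITE else "read",
        "mode": "user" if err & PF_USER else "kernel",
        "page_present": bool(err & PF_PROT),
    }


info = decode(LINE)
print(f"{info['object']} + {info['offset_in_mapping']:#x}")
print(info)
```

The offset is relative to the start of the mapping shown in brackets. If librados.so was linked with a separate code segment, that mapping may not begin at file offset 0, so check the segment layout with `readelf -l` and adjust before passing the address to `addr2line -e librados.so.2.0.0` with matching debug symbols.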

#1

Updated by Xiubo Li almost 2 years ago

  • Status changed from New to Need More Info

Which tcmu-runner version are you running? And do you have the call trace from the message logs?

#2

Updated by imirc tw almost 2 years ago

Container tag: v16.2.9 (the default); tcmu-runner package: tcmu-runner-1.5.2-89.g245914c.el8.x86_64

No call trace was recorded in the log files, only the following:

2022-07-05 03:19:07.527 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/235578344'. (Err -13)
2022-07-05 03:19:13.616 7 [WARN] tcmu_notify_lock_lost:271 rbd/rbd.xcp-ng-ha: Async lock drop. Old state 5
2022-07-05 03:19:13.655 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/2016994959'. (Err -13)
2022-07-05 03:19:16.485 7 [WARN] tcmu_notify_lock_lost:271 rbd/rbd.xcp-ng-ha: Async lock drop. Old state 5
2022-07-05 03:19:16.516 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/3332688053'. (Err -13)
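Each failed blacklist-entry removal in this excerpt reports `Err -13`. That value looks like a negated errno, the convention librados uses for its return codes; 13 is `EACCES` ("Permission denied") on Linux. The sketch below parses the five lines with regexes inferred only from these lines (the layout is an assumption, not a documented tcmu-runner format; `summarize` is a hypothetical helper) and decodes the errno. It does not establish which permission was missing.

```python
import errno
import os
import re

# The five lines quoted in note #2.
LOG = """\
2022-07-05 03:19:07.527 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/235578344'. (Err -13)
2022-07-05 03:19:13.616 7 [WARN] tcmu_notify_lock_lost:271 rbd/rbd.xcp-ng-ha: Async lock drop. Old state 5
2022-07-05 03:19:13.655 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/2016994959'. (Err -13)
2022-07-05 03:19:16.485 7 [WARN] tcmu_notify_lock_lost:271 rbd/rbd.xcp-ng-ha: Async lock drop. Old state 5
2022-07-05 03:19:16.516 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/3332688053'. (Err -13)
"""

# Assumed layout, inferred from these lines only:
# "DATE TIME TID [LEVEL] function:line device: message"
LINE_RE = re.compile(
    r"^(?P<time>\S+ \S+) \d+ \[(?P<level>[A-Z]+)\] "
    r"(?P<func>\w+):\d+ (?P<dev>\S+): (?P<msg>.*)$"
)
ERR_RE = re.compile(r"\(Err (-?\d+)\)")


def summarize(log: str) -> list[dict]:
    """Return one dict per recognised line, with errno decoded when present."""
    events = []
    for raw in log.splitlines():
        m = LINE_RE.match(raw)
        if m is None:
            continue  # not in the layout above
        event = {"time": m["time"], "level": m["level"], "func": m["func"]}
        err = ERR_RE.search(m["msg"])
        if err:
            code = abs(int(err.group(1)))  # "Err -13" -> errno 13
            event["errno"] = errno.errorcode.get(code, str(code))
            event["reason"] = os.strerror(code)
        events.append(event)
    return events


for ev in summarize(LOG):
    print(ev["time"], ev["level"], ev["func"], ev.get("errno", ""), ev.get("reason", ""))
```

Read in order, the output also shows that each `Async lock drop` warning is followed within about 40 ms by another refused blacklist removal.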

#3

Updated by imirc tw almost 2 years ago

Oops. I just noticed that the clients in this cluster are missing the LIO-ORG device configuration and are using all paths as active. I've reconfigured them and will let you know whether that solves the problem or whether something else is going on.
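One plausible link between the all-active path setup and the lock-drop messages in note #2 (an interpretation, not something confirmed in this thread): the RBD image's exclusive lock can be held by only one gateway at a time, so when the initiator spreads I/O over every path, the gateways keep taking the lock from each other. The toy model below counts lock acquisitions under dm-multipath's `failover` and `multibus` path grouping policies. The function name, the gateway names, and the `batch` parameter (I/Os sent down one path before switching) are hypothetical, and the model ignores timing, ALUA state changes, and the real lock protocol.

```python
def lock_acquisitions(io_count: int, gateways: list[str], policy: str,
                      batch: int = 1) -> int:
    """Toy model: count exclusive-lock acquisitions, including the first one.

    Assumes each I/O must be served by the gateway holding the image's
    exclusive lock, so an I/O arriving at a different gateway makes that
    gateway take the lock over. Timing and ALUA state changes are ignored.
    """
    if batch < 1:
        raise ValueError("batch must be >= 1")
    if policy == "failover":
        # Only one path group is active: every I/O goes to the first gateway.
        targets = [gateways[0]] * io_count
    elif policy == "multibus":
        # All paths active: switch to the next gateway after every `batch` I/Os.
        targets = [gateways[(i // batch) % len(gateways)] for i in range(io_count)]
    else:
        raise ValueError(f"unknown policy: {policy!r}")

    owner = None
    acquisitions = 0
    for gw in targets:
        if gw != owner:
            owner = gw  # this gateway takes the lock from the previous owner
            acquisitions += 1
    return acquisitions


gateways = ["gw-a", "gw-b"]  # hypothetical gateway names
for policy in ("failover", "multibus"):
    print(policy, lock_acquisitions(1000, gateways, policy))
print("multibus, batch=10:", lock_acquisitions(1000, gateways, "multibus", batch=10))
```

With 1000 I/Os this prints 1 for `failover` and 1000 for `multibus`; with `batch=10` the `multibus` count drops to 100, still one lock handover per batch.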

#4

Updated by Xiubo Li almost 2 years ago

Replying to imirc tw's note #2:

tcmu-runner 1.5.2 is a little old, and several bugs have been fixed since then, including this one. Could you upgrade it and try again?
