Bug #56470
open
ceph iscsi tcmu-runner segfault in librados
Added by imirc tw almost 2 years ago.
Updated almost 2 years ago.
Description
Hi,
Our iSCSI on RBD keeps crashing at random times with a segfault. This happens on different gateways:
ework-thread372615: segfault at b0 ip 00007fd7590641a4 sp 00007fd744596978 error 4 in librados.so.2.0.0[7fd758fb2000+1a4000]
This is a cephadm-deployed cluster using podman containers.
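For anyone triaging the kernel line above, here is a minimal sketch that extracts the fields and computes the faulting instruction's offset inside the library. It assumes the standard x86 Linux segfault message layout and the usual x86 page-fault error-code bits; the function name `decode_segfault` is made up for illustration and is not part of any Ceph or tcmu-runner tooling.

```python
import re

# The kernel line quoted in the report, copied verbatim.
LINE = ("ework-thread372615: segfault at b0 ip 00007fd7590641a4 "
        "sp 00007fd744596978 error 4 in librados.so.2.0.0[7fd758fb2000+1a4000]")

# Assumed layout: "segfault at ADDR ip IP sp SP error ERR in LIB[BASE+SIZE]",
# with all numeric fields in hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Hypothetical helper: parse one kernel segfault line into a dict."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m["err"], 16)
    return {
        "library": m["lib"],
        "fault_address": int(m["addr"], 16),
        # Offset of the faulting instruction relative to the mapping base.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        # x86 page-fault error-code bits: 0 = page present, 1 = write, 2 = user mode.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

info = decode_segfault(LINE)
print(info["library"], hex(info["offset"]), hex(info["fault_address"]))
```

Read this way, the line describes a user-mode read of an unmapped page at address 0xb0, which is consistent with dereferencing a field of a NULL pointer, although that is only an inference. With matching debuginfo installed, the computed offset could be passed to a tool such as `addr2line -e` against the same librados build to locate the faulting function.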
- Status changed from New to Need More Info
What's the tcmu-runner version? And do you have the call trace from the message logs?
Container tag v16.2.9 default; tcmu-runner-1.5.2-89.g245914c.el8.x86_64
No call trace was recorded in the log files, only the following:
2022-07-05 03:19:07.527 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/235578344'. (Err -13)
2022-07-05 03:19:13.616 7 [WARN] tcmu_notify_lock_lost:271 rbd/rbd.xcp-ng-ha: Async lock drop. Old state 5
2022-07-05 03:19:13.655 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/2016994959'. (Err -13)
2022-07-05 03:19:16.485 7 [WARN] tcmu_notify_lock_lost:271 rbd/rbd.xcp-ng-ha: Async lock drop. Old state 5
2022-07-05 03:19:16.516 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/3332688053'. (Err -13)
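As a side note on the `(Err -13)` in the lines above: assuming tcmu-runner reports negated Linux errno values here (a common convention, but not confirmed in this thread), the code can be decoded with a short sketch like the following. The helper name `decode_err` is hypothetical.

```python
import errno
import os
import re

# One of the log lines from the report, copied verbatim.
LOG = ("2022-07-05 03:19:07.527 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 "
       "rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/235578344'. (Err -13)")

def decode_err(line):
    """Hypothetical helper: map a trailing '(Err -N)' to an errno name and message."""
    m = re.search(r"\(Err (-?\d+)\)", line)
    if m is None:
        return None
    code = abs(int(m.group(1)))  # assumption: the value is a negated errno
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

name, message = decode_err(LOG)
print(name, "-", message)
```

Under that assumption the value maps to EACCES (permission denied), which may indicate that the client key used by the gateway lacks permission to remove blacklist entries. That would be a separate issue from the segfault itself.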
Prrrt.. I just noticed the clients in this cluster are missing the LIO-ORG device config and are using all paths as active. I've reconfigured them and will let you know if that solved the problem or if something else is going on.
imirc tw wrote:
Container tag v16.2.9 default; tcmu-runner-1.5.2-89.g245914c.el8.x86_64
This version is a little old, and there have been several bug fixes since then, including one for this issue. Could you upgrade and try again?