Bug #56470
openceph iscsi tcmu-runner segfault in librados
Description
Hi,
Our iSCSI on RBD keeps crashing at random times with a segfault. This happens on different gateways:
ework-thread372615: segfault at b0 ip 00007fd7590641a4 sp 00007fd744596978 error 4 in librados.so.2.0.0[7fd758fb2000+1a4000]
This is a cephadm-deployed cluster using podman containers.
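When no call trace is available, the kernel segfault line above still carries enough information to locate the faulting instruction inside librados. The following is a minimal sketch, assuming the usual x86 kernel report layout (`segfault at <addr> ip <ip> sp <sp> error <code> in <object>[<base>+<size>]`); the helper name `library_offset` is hypothetical. It subtracts the mapping base from the instruction pointer to get an offset within the reported mapping.

```python
import re

# Layout assumed from the kernel line quoted in the description.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def library_offset(line):
    """Return (object name, offset of ip inside the mapping, faulting address)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    offset = ip - base
    # Sanity check: the instruction pointer must fall inside the reported mapping.
    if not 0 <= offset < size:
        raise ValueError("instruction pointer is outside the reported mapping")
    return m["obj"], offset, int(m["addr"], 16)


line = (
    "ework-thread372615: segfault at b0 ip 00007fd7590641a4 sp 00007fd744596978 "
    "error 4 in librados.so.2.0.0[7fd758fb2000+1a4000]"
)
obj, offset, fault_addr = library_offset(line)
print(obj, hex(offset), hex(fault_addr))  # librados.so.2.0.0 0xb21a4 0xb0
```

The offset 0xb21a4 can then be looked up with a tool such as addr2line or gdb against the exact librados build in the container, provided matching debug symbols are installed; note that it is relative to the start of the reported mapping, which is not necessarily the start of the file. Error code 4 means a user-mode read of an unmapped page, and the low faulting address 0xb0 is consistent with, though not proof of, a field access through a NULL pointer.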
Updated by Xiubo Li almost 2 years ago
- Status changed from New to Need More Info
What's the tcmu-runner version? And do you have the call trace from the message logs?
Updated by imirc tw almost 2 years ago
Container tag v16.2.9 default; tcmu-runner-1.5.2-89.g245914c.el8.x86_64
No call trace was recorded in the log files, only the following:
2022-07-05 03:19:07.527 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/235578344'. (Err -13)
2022-07-05 03:19:13.616 7 [WARN] tcmu_notify_lock_lost:271 rbd/rbd.xcp-ng-ha: Async lock drop. Old state 5
2022-07-05 03:19:13.655 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/2016994959'. (Err -13)
2022-07-05 03:19:16.485 7 [WARN] tcmu_notify_lock_lost:271 rbd/rbd.xcp-ng-ha: Async lock drop. Old state 5
2022-07-05 03:19:16.516 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/3332688053'. (Err -13)
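The `Err -13` in the log lines above can be decoded with a short sketch like the one below, assuming tcmu-runner prints the negative errno value returned by the Ceph client libraries (an assumption, not confirmed in this thread). The helper name `describe_err` is hypothetical.

```python
import errno
import os


def describe_err(code):
    """Map a negative-errno style return code to (symbolic name, message)."""
    num = abs(code)
    return errno.errorcode.get(num, "UNKNOWN"), os.strerror(num)


print(describe_err(-13))
```

On Linux, 13 is EACCES ("Permission denied"), which would suggest the gateway's Ceph user was not allowed to remove the blacklist entry. That is only a reading of the error code and not a confirmed cause of the segfault.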
Updated by imirc tw almost 2 years ago
Prrrt.. I just noticed the clients in this cluster are missing the LIO-ORG device config and are using all paths as active. I've reconfigured them and will let you know if that solved the problem or if something else is going on.
Updated by Xiubo Li almost 2 years ago
imirc tw wrote:
Container tag v16.2.9 default; tcmu-runner-1.5.2-89.g245914c.el8.x86_64
No call trace was recorded in the log files, only the following:
2022-07-05 03:19:07.527 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/235578344'. (Err -13)
2022-07-05 03:19:13.616 7 [WARN] tcmu_notify_lock_lost:271 rbd/rbd.xcp-ng-ha: Async lock drop. Old state 5
2022-07-05 03:19:13.655 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/2016994959'. (Err -13)
2022-07-05 03:19:16.485 7 [WARN] tcmu_notify_lock_lost:271 rbd/rbd.xcp-ng-ha: Async lock drop. Old state 5
2022-07-05 03:19:16.516 7 [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:321 rbd/rbd.xcp-ng-ha: Could not rm blacklist entry '10.0.11.113:0/3332688053'. (Err -13)
This version is a little old, and a number of bugs have been fixed since then, including this one. Could you upgrade and try again?