Bug #42981

run-backend-api-tests.sh: mgr oneshot signal handlers do not revert to killing process

Added by Sage Weil about 2 months ago. Updated 2 days ago.

Status: New
Priority: High
Assignee: -
Category: dashboard/qa
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

On master, when running the dashboard tests, the test tries to kill the mgr. The first SIGINT triggers shutdown of the python modules, but that invariably seems to hang due to some deadlock. Subsequent kill signals fail to kill the process.

To reproduce,

cd ../src/pybind/mgr/dashboard
./run-backend-api-tests.sh

and it will eventually time out trying to kill ceph-mgr.

The mgr log will note that the first signal was received,

2019-11-22T11:44:12.259-0600 7fef96fa4700 -1 received  signal: Terminated from python ../qa/tasks/vstart_runner.py --ignore-missing-binaries tasks.mgr.test_dashboard tasks.mgr.dashboard.test_auth tasks.mgr.dashboard.test_cephfs tasks.mgr.dashboard.test_cluster_configuration tasks.mgr.dashboard.test_erasure_code_profile tasks.mgr.dashboard.test_ganesha tasks.mgr.dashboard.test_health tasks.mgr.dashboard.test_host tasks.mgr.dashboard.test_logs tasks.mgr.dashboard.test_mgr_module tasks.mgr.dashboard.test_monitor tasks.mgr.dashboard.test_orchestrator tasks.mgr.dashboard.test_osd tasks.mgr.dashboard.test_perf_counters tasks.mgr.dashboard.test_pool tasks.mgr.dashboard.test_rbd_mirroring tasks.mgr.dashboard.test_rbd tasks.mgr.dashboard.test_requests tasks.mgr.dashboard.test_rgw tasks.mgr.dashboard.test_role tasks.mgr.dashboard.test_settings tasks.mgr.dashboard.test_summary tasks.mgr.dashboard.test_user tasks.mgr.test_module_selftest  (PID: 156456) UID: 1031
2019-11-22T11:44:12.259-0600 7fef96fa4700 -1 mgr handle_mgr_signal  *** Got signal Terminated ***

and shutdown deadlocks (different bug!). But sending another SIGINT fails to kill the process.

The signals are registered with

  register_async_signal_handler_oneshot(SIGINT, handle_mgr_signal);
  register_async_signal_handler_oneshot(SIGTERM, handle_mgr_signal);

and the oneshot sets the SA_RESETHAND flag,

  act.sa_flags = SA_SIGINFO | (oneshot ? SA_RESETHAND : 0);
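
For reference, the expected SA_RESETHAND behavior can be checked in isolation. The sketch below is not Ceph code: `install_handler`, `count_signal`, `handler_calls`, and `disposition_is_default` are hypothetical names, and it uses SIGUSR1 instead of SIGINT/SIGTERM so that the reset to the default action can be inspected without killing the process.

```cpp
#include <signal.h>

// Number of times the handler has run. sig_atomic_t is safe to touch
// from a signal handler.
volatile sig_atomic_t handler_calls = 0;

// Three-argument handler signature, required when SA_SIGINFO is set.
void count_signal(int, siginfo_t*, void*) {
  handler_calls = handler_calls + 1;
}

// Hypothetical helper: installs count_signal using the same flag
// expression as the line quoted above.
bool install_handler(int signum, bool oneshot) {
  struct sigaction act {};
  sigemptyset(&act.sa_mask);
  act.sa_sigaction = count_signal;
  act.sa_flags = SA_SIGINFO | (oneshot ? SA_RESETHAND : 0);
  return sigaction(signum, &act, nullptr) == 0;
}

// Queries the current disposition without changing it and reports
// whether it is the default action (SIG_DFL).
bool disposition_is_default(int signum) {
  struct sigaction cur {};
  if (sigaction(signum, nullptr, &cur) != 0) {
    return false;
  }
  return cur.sa_handler == SIG_DFL;
}
```

After `install_handler(SIGUSR1, true)` and a single `raise(SIGUSR1)`, the handler should have run once and `disposition_is_default(SIGUSR1)` should report true. With `oneshot` set to false the handler stays installed across deliveries.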

I verified that this works correctly with a kludge to ceph_mgr.cc (see attached), but it's not working later on for some reason!

kludge.patch (1.25 KB) Sage Weil, 11/22/2019 08:20 PM


Related issues

Related to mgr - Bug #42744: mgr/dashboard: Executing the run-backend-api-tests script results in infinite loop Resolved

History

#1 Updated by Sage Weil about 2 months ago

To clarify: the signal handler is (supposed to be) installed as a one-shot: the first SIGINT/SIGTERM will trigger the handling code, and remove the signal handler, reverting to the default, so that the next SIGINT/SIGTERM just kills the process immediately. I'm not sure why this isn't happening, but what I observe is that a SIGTERM is sent hundreds of times but is seemingly ignored.
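
One way to narrow this down on Linux might be to look at the signal masks the kernel reports for the hung ceph-mgr in /proc/<pid>/status: SigCgt (caught), SigIgn (ignored), and SigBlk (blocked). Each is a hex bitmask in which bit N-1 stands for signal N. The helpers below are a hypothetical diagnostic sketch, not part of Ceph, and they assume a Linux /proc filesystem.

```cpp
#include <csignal>
#include <fstream>
#include <string>
#include <sys/types.h>

// Returns the hex mask for `field` (for example "SigCgt") from
// /proc/<pid>/status, or an empty string if the field is not found.
std::string read_signal_mask(pid_t pid, const std::string& field) {
  std::ifstream status("/proc/" + std::to_string(pid) + "/status");
  const std::string prefix = field + ":";
  std::string line;
  while (std::getline(status, line)) {
    if (line.compare(0, prefix.size(), prefix) == 0) {
      const auto start = line.find_first_not_of(" \t", prefix.size());
      return start == std::string::npos ? "" : line.substr(start);
    }
  }
  return "";
}

// Tests whether signal `signum` is set in a hex mask as printed in
// /proc/<pid>/status (bit signum-1 corresponds to signal signum).
bool mask_has_signal(const std::string& hex_mask, int signum) {
  if (hex_mask.empty() || signum < 1 || signum > 64) {
    return false;
  }
  const unsigned long long mask = std::stoull(hex_mask, nullptr, 16);
  return ((mask >> (signum - 1)) & 1ULL) != 0;
}
```

For example, `mask_has_signal(read_signal_mask(pid, "SigCgt"), SIGTERM)` still being true after the first signal would suggest the handler was not reset (or was reinstalled), while SIGTERM showing up in SigIgn or SigBlk would point at the signal being ignored or blocked instead. Note that the blocked mask is per thread, so the per-thread files under /proc/<pid>/task/<tid>/status may also be worth checking.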

#2 Updated by Sage Weil about 2 months ago

  • Related to Bug #42744: mgr/dashboard: Executing the run-backend-api-tests script results in infinite loop added

#3 Updated by Patrick Donnelly about 1 month ago

  • Status changed from 12 to New

#4 Updated by Sebastian Wagner 2 days ago

  • Project changed from RADOS to mgr
  • Subject changed from mgr oneshot signal handlers do not revert to killing process to run-backend-api-tests.sh: mgr oneshot signal handlers do not revert to killing process
  • Category set to dashboard/qa
