Bug #22746
osd/common: ceph-osd process is terminated by the logrotate task
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
1. Reproduce the scenario:
(1) Step 1:
Open terminal_1 and prepare the command from logrotate.conf: "killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw"
(2) Step 2:
Open terminal_2 and run the command: "systemctl start ceph-osd@0"
(3) Step 3:
Switch to terminal_1 and run the command repeatedly while ceph-osd is starting: "killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw"
2. Observed behavior:
The ceph-osd process exits during initialization, generally around the "osd = new OSD" step in ceph_osd.cc.
3. The reason:
As shown in the following code,
......
  osd = new OSD(g_ceph_context,
                store,
                whoami,
                ms_cluster,
                ms_public,
                ms_hb_front_client,
                ms_hb_back_client,
                ms_hb_front_server,
                ms_hb_back_server,
                ms_objecter,
                &mc,
                g_conf->osd_data,
                g_conf->osd_journal);

  int err = osd->pre_init();
  if (err < 0) {
    derr << TEXT_RED << " ** ERROR: osd pre_init failed: " << cpp_strerror(-err)
         << TEXT_NORMAL << dendl;
    return 1;
  }

  ms_public->start();
  ms_hb_front_client->start();
  ms_hb_back_client->start();
  ms_hb_front_server->start();
  ms_hb_back_server->start();
  ms_cluster->start();
  ms_objecter->start();

  // start osd
  err = osd->init();
  if (err < 0) {
    derr << TEXT_RED << " ** ERROR: osd init failed: " << cpp_strerror(-err)
         << TEXT_NORMAL << dendl;
    return 1;
  }

#ifdef BUILDING_FOR_EMBEDDED
  cephd_preload_rados_classes(osd);
#endif

  // install signal handlers
  init_async_signal_handler();
  register_async_signal_handler(SIGHUP, sighup_handler);
  register_async_signal_handler_oneshot(SIGINT, handle_osd_signal);
  register_async_signal_handler_oneshot(SIGTERM, handle_osd_signal);
......
the ceph-osd main() function does not register a SIGHUP handler until after "new OSD", pre_init(), and init() have completed.
If the SIGHUP sent by the killall command arrives before that point, the process falls back to the system default action for SIGHUP,
which terminates the process.
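The effect of the default SIGHUP action can be illustrated outside of Ceph. The following is a minimal POSIX sketch, not Ceph code: the function name child_killed_by_sighup and the sleep loop that stands in for OSD initialization are assumptions made for illustration, and it uses plain sigaction() rather than Ceph's async signal handler. A child process either installs a handler first or keeps the default action, and the parent then sends SIGHUP the way "killall -1" would.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_hup = 0;
static void on_hup(int) { got_hup = 1; }

// Returns 1 if the child was terminated by SIGHUP, 0 if it survived,
// and -1 if the experiment itself could not be set up.
int child_killed_by_sighup(bool install_handler_first) {
  int ready[2];
  if (pipe(ready) != 0) return -1;
  pid_t pid = fork();
  if (pid < 0) return -1;

  if (pid == 0) {  // child: hypothetical stand-in for the starting daemon
    close(ready[0]);
    sigset_t hup;
    sigemptyset(&hup);
    sigaddset(&hup, SIGHUP);
    sigprocmask(SIG_UNBLOCK, &hup, nullptr);  // make sure SIGHUP is deliverable
    if (install_handler_first) {
      struct sigaction sa {};
      sa.sa_handler = on_hup;
      sigemptyset(&sa.sa_mask);
      sigaction(SIGHUP, &sa, nullptr);
    } else {
      signal(SIGHUP, SIG_DFL);  // default action: terminate the process
    }
    char byte = 'r';
    if (write(ready[1], &byte, 1) != 1) _exit(3);  // tell the parent we are set up
    // Stand-in for the long-running initialization (up to about 5 seconds).
    for (int i = 0; i < 500 && !got_hup; ++i) usleep(10000);
    _exit(got_hup ? 0 : 2);
  }

  // parent: plays the role of the logrotate task
  close(ready[1]);
  char byte = 0;
  if (read(ready[0], &byte, 1) != 1) {
    close(ready[0]);
    waitpid(pid, nullptr, 0);
    return -1;
  }
  close(ready[0]);
  kill(pid, SIGHUP);
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return (WIFSIGNALED(status) && WTERMSIG(status) == SIGHUP) ? 1 : 0;
}
```

Calling child_killed_by_sighup(false) models the window described above, where no handler is registered yet; child_killed_by_sighup(true) models a process whose handler is already in place when the signal arrives.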
4. Solution:
Register the SIGHUP handler earlier in ceph_osd.cc, before the OSD is constructed and initialized.
https://github.com/ceph/ceph/pull/19958
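The ordering idea behind this fix can be sketched in a few lines. This is an illustration under stated assumptions, not the content of the pull request: start_daemon_sketch, slow_initialization, and sighup_handler_sketch are hypothetical names, plain sigaction() stands in for Ceph's register_async_signal_handler(), and raise(SIGHUP) simulates the logrotate signal arriving in the middle of initialization.

```cpp
#include <signal.h>

static volatile sig_atomic_t reopen_logs_requested = 0;

// Hypothetical handler: only records that a log reopen was requested.
static void sighup_handler_sketch(int) { reopen_logs_requested = 1; }

// Hypothetical stand-in for "new OSD", pre_init() and init().
// The raise() call simulates SIGHUP arriving while initialization is running.
static void slow_initialization() { raise(SIGHUP); }

// Returns true if the process was still running after initialization and the
// handler observed the signal; false if the handler could not be installed.
bool start_daemon_sketch() {
  sigset_t hup;
  sigemptyset(&hup);
  sigaddset(&hup, SIGHUP);
  sigprocmask(SIG_UNBLOCK, &hup, nullptr);  // make sure SIGHUP is deliverable

  // Step 1: install the SIGHUP handler before any slow work begins.
  struct sigaction sa {};
  sa.sa_handler = sighup_handler_sketch;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGHUP, &sa, nullptr) != 0) return false;

  // Step 2: only now run the long initialization.
  slow_initialization();

  // Reaching this line means SIGHUP did not terminate the process.
  return reopen_logs_requested == 1;
}
```

If the two steps were swapped, the simulated signal would hit the default action and the function would never return, which matches the failure described in this report.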
History
#1 Updated by John Spray about 6 years ago
- Project changed from Ceph to RADOS
- Status changed from New to Fix Under Review
- Component(RADOS) OSD added
#2 Updated by Kefu Chai about 6 years ago
- Status changed from Fix Under Review to Resolved