Bug #22746

osd/common: ceph-osd process is terminated by the logrotate task

Added by huanwen ren over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 01/22/2018
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature:

Description

1. Steps to reproduce:

(1) Step 1:
Open terminal_1 and prepare, without running it yet, the command taken from logrotate.conf:
"killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw"

(2) Step 2:
Open terminal_2 and run the command: "systemctl start ceph-osd@0"

(3) Step 3:
Switch back to terminal_1 and run the prepared command repeatedly while ceph-osd is starting:
"killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw"

2. Observed behavior:
The ceph-osd process exits during initialization, typically around the "osd = new OSD" step in ceph_osd.cc.

3. The reason:
As shown in the following code,


    ......
    osd = new OSD(g_ceph_context,
                store,
                whoami,
                ms_cluster,
                ms_public,
                ms_hb_front_client,
                ms_hb_back_client,
                ms_hb_front_server,
                ms_hb_back_server,
                ms_objecter,
                &mc,
                g_conf->osd_data,
                g_conf->osd_journal);

    int err = osd->pre_init();
    if (err < 0) {
      derr << TEXT_RED << " ** ERROR: osd pre_init failed: " << cpp_strerror(-err)
           << TEXT_NORMAL << dendl;
      return 1;
    }

    ms_public->start();
    ms_hb_front_client->start();
    ms_hb_back_client->start();
    ms_hb_front_server->start();
    ms_hb_back_server->start();
    ms_cluster->start();
    ms_objecter->start();

    // start osd
    err = osd->init();
    if (err < 0) {
      derr << TEXT_RED << " ** ERROR: osd init failed: " << cpp_strerror(-err)
           << TEXT_NORMAL << dendl;
      return 1;
    }

    #ifdef BUILDING_FOR_EMBEDDED
      cephd_preload_rados_classes(osd);
    #endif

    // install signal handlers
    init_async_signal_handler();
    register_async_signal_handler(SIGHUP, sighup_handler);
    register_async_signal_handler_oneshot(SIGINT, handle_osd_signal);
    register_async_signal_handler_oneshot(SIGTERM, handle_osd_signal);
    ......
   

The ceph-osd main() function does not register its SIGHUP handler until after "new OSD", pre_init() and init() have completed.
If the SIGHUP sent by the killall command arrives before that point, the signal gets the system default action,
which terminates the process.
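
The effect of the default SIGHUP disposition can be checked outside Ceph. The following is a minimal POSIX sketch, not Ceph code: the names child_killed_by_sighup and noop_sighup_handler are hypothetical, and the sleep loop only stands in for a slow startup phase. A forked child either keeps the default action or installs a handler first, and the parent then sends it a single SIGHUP.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a real SIGHUP handler (e.g. one that reopens logs).
static void noop_sighup_handler(int) {}

// Forks a child that simulates a slow startup, sends it one SIGHUP, and
// reports whether the child was terminated by that signal.
bool child_killed_by_sighup(bool install_handler_first) {
  int ready[2];
  if (pipe(ready) != 0) return false;

  pid_t pid = fork();
  if (pid < 0) {
    close(ready[0]);
    close(ready[1]);
    return false;
  }

  if (pid == 0) {
    // Child: choose the SIGHUP disposition before "initialization" begins.
    close(ready[0]);
    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    if (install_handler_first) {
      sa.sa_handler = noop_sighup_handler;
    } else {
      sa.sa_handler = SIG_DFL;  // default action for SIGHUP: terminate
    }
    sigaction(SIGHUP, &sa, nullptr);

    // Tell the parent that the disposition is in place.
    char token = 1;
    if (write(ready[1], &token, 1) != 1) _exit(2);

    // Simulated slow startup; a handled signal only interrupts one sleep.
    for (int i = 0; i < 10; ++i) usleep(50 * 1000);
    _exit(0);
  }

  // Parent: wait until the child is ready, then send SIGHUP once.
  close(ready[1]);
  char token = 0;
  bool child_ready = (read(ready[0], &token, 1) == 1);
  close(ready[0]);
  if (child_ready) kill(pid, SIGHUP);

  int status = 0;
  waitpid(pid, &status, 0);
  return WIFSIGNALED(status) && WTERMSIG(status) == SIGHUP;
}
```

Calling child_killed_by_sighup(false) corresponds to the window described above, where no handler is registered yet; child_killed_by_sighup(true) corresponds to a process that registered a handler before starting its slow work.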

4. Solution:
Register the SIGHUP handler earlier in ceph_osd.cc, before the initialization shown above:
https://github.com/ceph/ceph/pull/19958
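
The ordering idea can be sketched as follows. This is an illustration under stated assumptions, not the change made in the pull request: install_sighup_handler, startup_with_early_handler, count_sighup and g_sighup_count are hypothetical names, plain sigaction() is used in place of Ceph's async signal handler helpers, and the slow_init callback stands in for the OSD construction and init steps.

```cpp
#include <signal.h>

// Counts SIGHUPs seen so far; sig_atomic_t is safe to write from a handler.
static volatile sig_atomic_t g_sighup_count = 0;

// Hypothetical handler: only records that a SIGHUP arrived.
static void count_sighup(int) { g_sighup_count = g_sighup_count + 1; }

// Installs the handler; returns true on success.
bool install_sighup_handler() {
  struct sigaction sa = {};
  sigemptyset(&sa.sa_mask);
  sa.sa_handler = count_sighup;
  sa.sa_flags = SA_RESTART;  // restart interrupted system calls where possible
  return sigaction(SIGHUP, &sa, nullptr) == 0;
}

// Registers the handler first, then runs the (possibly slow) initialization.
// Returns the number of SIGHUPs handled so far, or -1 if registration failed.
int startup_with_early_handler(void (*slow_init)()) {
  if (!install_sighup_handler()) return -1;
  slow_init();  // any SIGHUP arriving here is handled, not fatal
  return g_sighup_count;
}
```

With this ordering, a SIGHUP delivered while slow_init is still running is passed to the handler instead of triggering the default action, so the process keeps starting up.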

History

#1 Updated by John Spray over 1 year ago

  • Project changed from Ceph to RADOS
  • Status changed from New to Need Review
  • Component(RADOS) OSD added

#2 Updated by Kefu Chai over 1 year ago

  • Status changed from Need Review to Resolved
