Bug #10487: logclient: setting clog_to_monitor=false on live osd crashes - Ceph - Ceph

closed

Added by Sage Weil over 9 years ago. Updated over 8 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category:
Target version:
% Done:
Source: Community (user)
Tags:
Backport: firefly
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description

2015-01-07 21:20:22.802780 7f5c373ae700 -1 common/LogClient.cc: In function 'Message* LogClient::_get_mon_log_message()' thread 7f5c373ae700 time 2015-01-07 21:20:22.474387
common/LogClient.cc: 155: FAILED assert(p != log_queue.end())

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (LogClient::_get_mon_log_message()+0x131c) [0x9fd89c]
 2: (LogClient::get_mon_log_message()+0x2c) [0x9fd9ec]
 3: (MonClient::send_log()+0x15) [0xa4baf5]
 4: (MonClient::tick()+0x158) [0xa55078]
 5: (Context::complete(int)+0x9) [0x6745a9]
 6: (SafeTimer::timer_thread()+0x425) [0xa97075]
 7: (SafeTimerThread::entry()+0xd) [0xa97cad]
 8: (()+0x7e9a) [0x7f5c4b47ae9a]
 9: (clone()+0x6d) [0x7f5c4a3742ed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This probably happens only if log events are already queued?


Updated by Mykola Golub about 9 years ago

I am interested in fixing this issue, but so far I have failed to reproduce it on master. I tried running a script that toggles the parameter in a loop:

while sleep ${SLEEP}
do
    ceph tell osd.0 injectargs --clog_to_monitors true
    sleep ${SLEEP}
    ceph tell osd.0 injectargs --clog_to_monitors false
done

with SLEEP varying from 0 to several seconds, while generating clog events at the same time. I also tried lowering the mon_client_max_log_entries_per_message and mon_client_ping_interval parameters. Reviewing the code did not help either.
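
For anyone retrying this, the toggle loop can be wrapped in a function so that it stops after a fixed number of rounds and can be dry-run without a cluster. This is only a sketch, not the script that was actually run: the CEPH, OSD and ROUNDS variables and the toggle_clog name are hypothetical, and the only ceph invocation used is the injectargs call quoted in this comment.

```shell
#!/bin/sh
# Hypothetical knobs (not part of the original script); override via environment.
CEPH=${CEPH:-ceph}      # command to run; set CEPH=echo for a dry run
OSD=${OSD:-osd.0}       # daemon to target
SLEEP=${SLEEP:-1}       # seconds to wait after each toggle
ROUNDS=${ROUNDS:-10}    # number of true/false cycles

# Flip clog_to_monitors on and off ROUNDS times, stopping on the first failure.
toggle_clog() {
    round=0
    while [ "$round" -lt "$ROUNDS" ]; do
        for value in true false; do
            "$CEPH" tell "$OSD" injectargs --clog_to_monitors "$value" || return 1
            sleep "$SLEEP"
        done
        round=$((round + 1))
    done
}
```

With CEPH=echo the function only prints the commands it would run; calling toggle_clog with the default CEPH=ceph sends them to the cluster.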

Sage, maybe you have some details to share that would help reproduce the issue or understand where the problem is? Have you seen this only once? Under what workload? What makes you believe it was caused by the clog_to_monitor setting?
