Bug #10487

closed

logclient: setting clog_to_monitor=false on live osd crashes

Added by Sage Weil over 9 years ago. Updated over 8 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
firefly
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2015-01-07 21:20:22.802780 7f5c373ae700 -1 common/LogClient.cc: In function 'Message* LogClient::_get_mon_log_message()' thread 7f5c373ae700 time 2015-01-07 21:20:22.474387
common/LogClient.cc: 155: FAILED assert(p != log_queue.end())

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (LogClient::_get_mon_log_message()+0x131c) [0x9fd89c]
 2: (LogClient::get_mon_log_message()+0x2c) [0x9fd9ec]
 3: (MonClient::send_log()+0x15) [0xa4baf5]
 4: (MonClient::tick()+0x158) [0xa55078]
 5: (Context::complete(int)+0x9) [0x6745a9]
 6: (SafeTimer::timer_thread()+0x425) [0xa97075]
 7: (SafeTimerThread::entry()+0xd) [0xa97cad]
 8: (()+0x7e9a) [0x7f5c4b47ae9a]
 9: (clone()+0x6d) [0x7f5c4a3742ed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This probably only happens if log events are already queued?

Actions #1

Updated by Mykola Golub about 9 years ago

I am interested in fixing this issue, but so far I have failed to reproduce it on master. I tried running a script that toggles the parameter in a loop:

while sleep ${SLEEP}
do
    ceph tell osd.0 injectargs --clog_to_monitors true
    sleep ${SLEEP}
    ceph tell osd.0 injectargs --clog_to_monitors false
done

I ran it with SLEEP varying from 0 to several seconds, while generating clog events at the same time. I also tried decreasing the mon_client_max_log_entries_per_message and mon_client_ping_interval parameters. Reviewing the code did not help either.
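
For anyone retrying this, a bounded variant of the toggle loop might look like the sketch below. It issues the same injectargs command, but stops after a fixed number of rounds and aborts if a command fails. The CEPH, OSD and ITERATIONS variables and the function name are illustrative additions, not part of the original script and not Ceph settings. Setting CEPH=echo gives a dry run that only prints the commands.

```shell
# Sketch only: CEPH, OSD, SLEEP and ITERATIONS are illustrative shell
# variables (assumptions), not Ceph configuration options.
CEPH="${CEPH:-ceph}"              # command to run; use "echo" for a dry run
OSD="${OSD:-osd.0}"               # daemon whose option is toggled
SLEEP="${SLEEP:-1}"               # seconds to wait after each change
ITERATIONS="${ITERATIONS:-10}"    # number of true/false rounds

toggle_clog_to_monitors() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]
    do
        for value in true false
        do
            # Same injectargs invocation as in the original loop;
            # stop at the first failing command.
            "$CEPH" tell "$OSD" injectargs --clog_to_monitors "$value" || return 1
            sleep "$SLEEP"
        done
        i=$((i + 1))
    done
}
```

Calling toggle_clog_to_monitors with ITERATIONS=100 and SLEEP=0 would approximate the tightest case tried above. Whether any timing actually triggers the assert is unknown.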

Sage, maybe you have some details to share that would help to reproduce the issue or understand where the problem is? Have you seen this only once? Under what workload? What makes you believe it was due to the clog_to_monitor setting?

Actions #2

Updated by Loïc Dachary about 9 years ago

  • Backport changed from giant,firefly to firefly

It's not critical to fix in giant.

Actions #3

Updated by Sage Weil over 8 years ago

  • Status changed from 12 to Can't reproduce
  • Regression set to No

I wasn't able to reproduce this either. :(
