Feature #23242

closed

ceph-objectstore-tool command to trim the pg log

Added by Vikhyat Umrao about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: Support
Tags:
Backport: luminous,jewel
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

ceph-objectstore-tool command to trim the pg log
The motivation for this request is a command that trims the pg log without consuming so much memory and CPU that the OSD node becomes unstable.
Red Hat bug - https://bugzilla.redhat.com/show_bug.cgi?id=1552094


Related issues 3 (1 open, 2 closed)

Related to RADOS - Bug #23269: Early use of clog in OSD startup crashes OSD (New, 03/07/2018)
Copied to RADOS - Backport #23275: luminous: ceph-objectstore-tool command to trim the pg log (Resolved, David Zafman)
Copied to RADOS - Backport #23307: jewel: ceph-objectstore-tool command to trim the pg log (Resolved, David Zafman)

#1

Updated by David Zafman about 6 years ago

  • Status changed from New to In Progress
  • Assignee set to David Zafman

When testing the log trimming code on master, the OSD crashes like this:

2018-03-06 16:19:07.413 7f0a38ee4700 20 filestore(/home/dzafman/ceph/build/dev/osd1) sync_entry(4039): woke after 5.001077
2018-03-06 16:19:07.413 7f0a38ee4700 10 journal commit_start max_applied_seq 98, open_ops 0
2018-03-06 16:19:07.413 7f0a38ee4700 10 journal commit_start blocked, all open_ops have completed
2018-03-06 16:19:07.413 7f0a38ee4700 10 journal commit_start committing 98, still blocked
2018-03-06 16:19:07.413 7f0a38ee4700 10 journal commit_start
2018-03-06 16:19:07.413 7f0a38ee4700 15 filestore(/home/dzafman/ceph/build/dev/osd1) sync_entry(4070): committing 98
2018-03-06 16:19:07.413 7f0a38ee4700 10 journal commit_started committing 98, unblocking
2018-03-06 16:19:07.413 7f0a38ee4700 20 filestore dbobjectmap: seq is 2
2018-03-06 16:19:07.413 7f0a38ee4700 15 genericfilestorebackend(/home/dzafman/ceph/build/dev/osd1) syncfs: doing a full sync (syncfs(2) if possible)
2018-03-06 16:19:07.417 7f0a31ed6700 10 monclient: tick
2018-03-06 16:19:07.417 7f0a31ed6700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2018-03-06 16:18:37.420105)
2018-03-06 16:19:07.417 7f0a31ed6700 10 log_client  log_queue is 1 last_log 2 sent 0 num 1 unsent 2 sending 2
2018-03-06 16:19:07.417 7f0a31ed6700 -1 /home/dzafman/ceph/src/common/LogClient.cc: In function 'Message* LogClient::_get_mon_log_message()' thread 7f0a31ed6700 time 2018-03-06 16:19:07.420182
/home/dzafman/ceph/src/common/LogClient.cc: 292: FAILED assert(num_unsent <= log_queue.size())

 ceph version 13.0.1-2394-g485c784 (485c784c8f3f91e64c58f52dca852e43e99cb48e) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f0a507f61d2]
 2: (LogClient::_get_mon_log_message()+0xdd5) [0x7f0a507b4965]
 3: (LogClient::get_mon_log_message(bool)+0x43) [0x7f0a507b4a43]
 4: (MonClient::send_log(bool)+0x1c) [0x7f0a5084859c]
 5: (MonClient::tick()+0x612) [0x7f0a50852882]
 6: (Context::complete(int)+0x9) [0x563605744579]
 7: (SafeTimer::timer_thread()+0x20f) [0x7f0a507f2a3f]
 8: (SafeTimerThread::entry()+0xd) [0x7f0a507f407d]
 9: (()+0x76ba) [0x7f0a4f3106ba]
 10: (clone()+0x6d) [0x7f0a4e59a82d]

#2

Updated by David Zafman about 6 years ago

From PG::log_weirdness():

2018-03-06 16:18:57.413 7f0a593b9dc0 -1 log_channel(cluster) log [ERR] : 1.0 log bound mismatch, info (tail,head] (12'22,12'41] actual [12'22,12'41]
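
The message above can be read as an interval check: info describes the log as the half-open range (tail,head], so the tail version itself should not appear as an entry, while the actual log starts at 12'22, which equals the tail. The following is a minimal sketch of that reading, not the code in PG::log_weirdness(). The helper names are hypothetical, and the rule that the last entry must not exceed head is an assumption taken from the interval notation.

```python
from typing import List, Tuple

# An eversion is printed as epoch'version, e.g. 12'22.
Eversion = Tuple[int, int]


def parse_eversion(text: str) -> Eversion:
    """Parse "12'22" into (12, 22) so tuples compare epoch first, then version."""
    epoch, version = text.split("'")
    return int(epoch), int(version)


def log_bounds_consistent(tail: Eversion, head: Eversion,
                          entries: List[Eversion]) -> bool:
    """Return True when the entries fit the half-open range (tail, head].

    Simplified assumption: an empty log has nothing to compare and passes.
    """
    if not entries:
        return True
    first, last = entries[0], entries[-1]
    # The tail is exclusive, so the first entry must be strictly newer than it.
    return tail < first and last <= head


tail = parse_eversion("12'22")
head = parse_eversion("12'41")

# Values from the error message: the first entry equals the tail.
mismatch = log_bounds_consistent(tail, head, [tail, head])

# The same bounds with a first entry one version past the tail.
consistent = log_bounds_consistent(tail, head, [(12, 23), head])
```

With the values from the message, the first call reports a mismatch because the first entry is not strictly newer than the tail. That is consistent with a trim that moved the tail to 12'22 without removing the 12'22 entry, although the ticket does not state the cause.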

#3

Updated by David Zafman about 6 years ago

The assert(num_unsent <= log_queue.size()) is probably not directly related to this feature. The log_weirdness() function uses clog too early in the OSD start-up process after the log has been trimmed with ceph-objectstore-tool. For this feature, we need to fix the "log bound mismatch". A separate tracker issue should be filed for the assert, since this should not crash the OSD.
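
The log_client line printed just before the crash already shows the failing condition. Below is a small sketch that pulls the counters out of that line and evaluates the asserted inequality. It assumes that the number after "log_queue is" is log_queue.size() and that the number after "unsent" is num_unsent; that mapping is inferred from the message wording, not confirmed in this ticket, and the function names are hypothetical.

```python
import re

# Line copied from the OSD log above (timestamp and thread id dropped).
LINE = "log_client  log_queue is 1 last_log 2 sent 0 num 1 unsent 2 sending 2"

PATTERN = re.compile(
    r"log_queue is (?P<queue_size>\d+) last_log (?P<last_log>\d+) "
    r"sent (?P<sent>\d+) num (?P<num>\d+) "
    r"unsent (?P<unsent>\d+) sending (?P<sending>\d+)"
)


def parse_log_client_line(line: str) -> dict:
    """Extract the integer counters from a log_client debug line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a log_client queue line: %r" % line)
    return {name: int(value) for name, value in match.groupdict().items()}


def assert_would_hold(fields: dict) -> bool:
    """Evaluate num_unsent <= log_queue.size() under the assumed field mapping."""
    return fields["unsent"] <= fields["queue_size"]


fields = parse_log_client_line(LINE)
holds = assert_would_hold(fields)  # 2 <= 1 does not hold
```

Under that reading, two entries are counted as unsent while only one is queued, which matches the FAILED assert in the backtrace.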

#4

Updated by David Zafman about 6 years ago

  • Related to Bug #23269: Early use of clog in OSD startup crashes OSD added

#5

Updated by Vikhyat Umrao about 6 years ago

  • Source set to Support
  • Backport set to luminous,jewel

#7

Updated by David Zafman about 6 years ago

  • Copied to Backport #23275: luminous: ceph-objectstore-tool command to trim the pg log added

#8

Updated by David Zafman about 6 years ago

  • Status changed from In Progress to Pending Backport

#9

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #23307: jewel: ceph-objectstore-tool command to trim the pg log added

#10

Updated by Nathan Cutler about 6 years ago

  • Status changed from Pending Backport to Resolved