Fix #5232

closed

osd: slow peering due to pg log rewrites

Added by Stefan Priebe almost 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version:
% Done: 0%
Source: other

Description

Since cuttlefish, I have noticed that the OSD recovery process is extremely slow. Client I/O to the recovering OSD also stalls, which results in slow request log messages.

Things I noticed:
- The recovering OSD uses 100-200% CPU until it has recovered
- There is nearly no disk I/O from that OSD
- Under bobtail I had much lower CPU usage and high disk I/O (which makes sense)
- Recovering from a simple OSD restart (fetching the objects changed since the stop) is awfully slow: even though my cluster was only 0.1% degraded (24 TB, 3 replicas), it took 10-20 min
- I'm using XFS

This is a major blocker, as recovery is nearly impossible.

Version used: current upstream/cuttlefish

I've provided Sage with a gdb backtrace of all threads.

Greets
Stefan


Files

gdb.txt.gz (31.9 KB) Sage Weil, 06/02/2013 09:38 AM
ceph-osd.23.tar.gz (1.83 MB) Stefan Priebe, 06/02/2013 11:32 AM

Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #5238: osd: slow recovery (uselessly dirtying pg logs during peering), Resolved, Samuel Just, 06/03/2013
