Bug #8582

Cluster very slow after upgrade to 80.1

Added by Rens Reinders almost 10 years ago. Updated over 9 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Last week we upgraded a production cluster (3 nodes, 18 OSDs) to Firefly 0.80.1. The upgrade went fine, but the speed of the entire cluster dropped drastically. We've benchmarked all the OSDs and none of them seem slow. There are also no unusual messages in dmesg. If I map an rbd device on one of the nodes and run dd with conv=sync oflag=direct, I get only 10 MB/s; it used to be 170. We use 10 Gbit networking with a dedicated OSD and cluster network. The problems started after rebooting an OSD server. Recovery of the 6% degraded data took ages and became exponentially slower, until it stalled completely at 1.1%. Only when I stopped some clients did recovery kick back in and finish. The cluster is still very slow, especially for reads.

Before we upgraded from 0.72, everything was super fast. I know this isn't a properly sized cluster, but it shouldn't be slower after an upgrade, right?

Please note that we're in the process of migrating everything off this cluster (at 10 MB/s), and only after that finishes will I be able to restart daemons and such. Please also note that this is new hardware with all enterprise disks.

History

#1 Updated by Sage Weil almost 10 years ago

Can you look at something like iostat or iotop to see if the disks are busy or not? Are the ceph-osd procs using a lot of CPU?

Can you use a tool like 'perf', perhaps to see where the time is being spent?
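
For readers following along, here is a minimal sketch of the kind of "are the disks busy?" check that iostat performs. It is not part of the original exchange. It assumes a Linux host where /proc/diskstats is readable, and the helper names (parse_diskstats, utilization, sample) are made up for illustration. It takes two samples of the "milliseconds spent doing I/O" counter and reports the share of the interval each device was busy, similar in spirit to iostat's %util column.

```python
import time


def parse_diskstats(text):
    """Return {device_name: milliseconds_spent_doing_io} from /proc/diskstats text.

    Each line is: major minor name, then the I/O counters. The tenth counter
    (index 12 when the line is split on whitespace) is the total time in
    milliseconds the device had I/O in flight.
    """
    busy_ms = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpectedly short lines
        busy_ms[fields[2]] = int(fields[12])
    return busy_ms


def utilization(before, after, interval_s):
    """Percentage of the interval each device spent busy."""
    result = {}
    for dev, ms_after in after.items():
        if dev in before:
            result[dev] = 100.0 * (ms_after - before[dev]) / (interval_s * 1000.0)
    return result


def sample(interval_s=5.0, path="/proc/diskstats"):
    """Read the counters twice, interval_s seconds apart, and compare them."""
    with open(path) as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_diskstats(f.read())
    return utilization(first, second, interval_s)


if __name__ == "__main__":
    for dev, pct in sorted(sample().items()):
        print(f"{dev:12s} {pct:6.1f}% busy")
```

Run on each OSD node while the slow dd is in progress: devices that sit near 100% busy point at the disks, while idle disks alongside slow client I/O would suggest looking elsewhere, such as the CPU usage of the ceph-osd processes or the network.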

#2 Updated by Sage Weil almost 10 years ago

  • Status changed from New to Need More Info
  • Priority changed from Immediate to Urgent
  • Source changed from other to Community (user)

#3 Updated by Samuel Just almost 10 years ago

  • Priority changed from Urgent to High

#4 Updated by Sage Weil over 9 years ago

Any update here, Rens?

#5 Updated by Sage Weil over 9 years ago

  • Status changed from Need More Info to Can't reproduce
