Bug #10823

The existence of snapshots causes huge performance issues

Added by Stefan Himpich about 9 years ago. Updated about 8 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Having snapshots of RBD images (just the existence of the snapshots, taken for backup purposes, no fancy stuff like layering) has a huge impact on the performance of the whole Ceph cluster. This applies to both throughput and latency.

Rados benchmark with 7 snapshots per RBD image:
------------------
Total time run: 301.939224
Total writes made: 3366
Write size: 4194304
Bandwidth (MB/sec): 44.592

Stddev Bandwidth: 32.8543
Max bandwidth (MB/sec): 164
Min bandwidth (MB/sec): 0
Average Latency: 1.43471
Stddev Latency: 1.92137
Max latency: 17.6235
Min latency: 0.044953
------------------

Rados benchmark after removing all snapshots:
------------------
Total time run: 300.615888
Total writes made: 7343
Write size: 4194304
Bandwidth (MB/sec): 97.706

Stddev Bandwidth: 50.4059
Max bandwidth (MB/sec): 256
Min bandwidth (MB/sec): 0
Average Latency: 0.655008
Stddev Latency: 0.817465
Max latency: 10.1572
Min latency: 0.042001
------------------
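
For reference: the numbers above come from rados bench with the default 4 MB write size over roughly 300 seconds. The exact invocation is not recorded in this ticket, but a comparable run would look like this (the pool name "rbd" is an assumption):
--------------------
# Sketch: a 300-second write benchmark with the default 4 MB object size,
# matching the duration and write size of the runs above.
# The pool name ("rbd") is an assumption; substitute the pool actually tested.
rados bench -p rbd 300 write
--------------------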

Setup:
- ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
- 4 Ceph nodes with 10 spinning disks (2 TB SATA RAID-edition drives) and 6 SSDs
- all journals are on SSDs
- the nodes are linked with 10GbE
- currently 16 VM images
- replicated size 3
- Ceph nodes run Ubuntu Trusty with an updated kernel (3.18.0)

Script used to create the snapshots:
--------------------
#!/bin/bash
# Create a timestamped snapshot of every RBD image in the default pool.
DATE=$(date +"%Y%m%d_%H:%M")
MYTAG=autosnapshot
for IMAGE in $(rbd ls); do
    rbd snap create "$IMAGE@$MYTAG-$DATE"
    sleep 180
done
--------------------
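
For the second benchmark the snapshots were removed again; a sketch of how that can be done for the naming scheme used above (not the exact commands from this ticket):
--------------------
#!/bin/bash
# Sketch: remove the snapshots created by the script above.
# Assumes all snapshot names start with the "autosnapshot" tag.
MYTAG=autosnapshot
for IMAGE in $(rbd ls); do
    for SNAP in $(rbd snap ls "$IMAGE" | awk -v tag="$MYTAG" '$2 ~ "^"tag {print $2}'); do
        rbd snap rm "$IMAGE@$SNAP"
    done
done
--------------------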

Attachments:
- ceph.conf
- munin disk io graph of one node (snapshots were removed at 18:00; afterwards I ran hourly rados benchmarks)
- zabbix io wait graph for a VM using the cluster (snapshots were removed at 18:00; afterwards I ran hourly rados benchmarks)

ceph.conf (20.8 KB) Stefan Himpich, 02/10/2015 12:02 PM

node1016 - diskstats_iops-day.png - Munin Disk-IO Graph (46.6 KB) Stefan Himpich, 02/10/2015 12:02 PM

sandbox01-iowait.png - Zabbix IO Wait (68.7 KB) Stefan Himpich, 02/10/2015 12:02 PM

History

#1 Updated by Samuel Just about 9 years ago

  • Status changed from New to Won't Fix

Snapshots cause COW for the first write to each block after the snapshot. This is perhaps not optimal, but it's how it works for now. The performance should level off as more writes happen after the snapshot.
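
For what it's worth, the effect described here can be reproduced at the RADOS level with a pool snapshot (RBD uses self-managed snapshots, but the clone-on-first-write mechanism is the same); the pool, object, and snapshot names below are just examples:
--------------------
# Sketch: time the first overwrite of an object after a snapshot (triggers a clone)
# versus a later overwrite (no clone). Pool/object/snapshot names are examples.
dd if=/dev/zero of=/tmp/4m.bin bs=4M count=1
rados -p rbd put testobj /tmp/4m.bin           # initial write
rados -p rbd mksnap before-write               # take a pool snapshot
time rados -p rbd put testobj /tmp/4m.bin      # first overwrite: object gets cloned
time rados -p rbd put testobj /tmp/4m.bin      # second overwrite: no clone needed
rados -p rbd rmsnap before-write               # clean up
--------------------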

#2 Updated by Stefan Himpich about 9 years ago

I have no problem with the snapshotted image itself becoming slower (I see no technical way around that), but the problem is that the whole cluster is affected (see the rados bench results, which have nothing to do with the images).

#3 Updated by Andreas John about 9 years ago

Hi,
could anyone point out why ~50 IOPS cause ~300 IOPS with a single snapshot? If a "normal" COW happens and we write ~50 IOPS, then we must read ~50 IOPS (the current data) and write it to the snapshot (that's ~50 IOPS again).

So ~50 write IOPS ==> ~100 write IOPS + ~50 read IOPS = ~150 IOPS total.

Why is there a multiplication factor of 6 in Ceph?
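
One way to narrow this down would be to watch per-disk I/O on an OSD node while writing to a freshly snapshotted image and compare it with the client-side IOPS; a rough sketch (the device names are placeholders for the OSD data disks):
--------------------
# Sketch: observe write amplification on an OSD node during a snapshotted write load.
# Device names are placeholders; list the actual OSD data disks and journal SSDs.
iostat -x 5 /dev/sdb /dev/sdc /dev/sdd
# cluster-wide per-OSD commit/apply latencies:
ceph osd perf
--------------------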

#4 Updated by Daniel Kraft about 8 years ago

I don't understand the "Won't Fix" status.
As noted before, it's obvious that the first writes to blocks of a snapshotted image will be slow. This is because of the design decision to keep the existing block for the snapshot and write a new one for the live image.

But that the whole cluster gets slower with each snapshot (even images that don't have any snapshots) sounds like a bad performance characteristic to me. Any information on this?

And even for the COW latency: would it be better if I used 4k blocks for the image, so that the COW operation doesn't take as long? Are there any performance measurements with 4k (or other small) blocks?
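
For reference, the RBD object size is fixed at image creation time via --order (object size = 2^order bytes, default 22 = 4 MB, minimum 12 = 4 KB); an image with 4 KB objects could be created roughly like this (image name and size are examples):
--------------------
# Sketch: create a test image with 4 KiB objects instead of the default 4 MiB.
# --order 12 means 2^12 = 4096-byte objects; image name and size are examples.
rbd create small-obj-test --size 10240 --order 12
rbd info small-obj-test
--------------------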
