Bug #8100

Rados Bench seq read errors on tiered configuration

Added by Mark Nelson about 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: other
Severity: 3 - minor

Description

Attempting the following rados bench seq read test, on a tiered pool setup that was previously populated via rados bench write, results in the following error:

nhm@burnupiY:~$ /usr/bin/rados -c /tmp/cbt/ceph/ceph.conf -p rados-bench-`hostname -s`-3 -b 4096 bench 300 seq --concurrent-ios 32 --no-cleanup
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
read got 4096
error during benchmark: -5
error 5: (5) Input/output error
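
For context, a cache-tier setup of the kind being exercised here looks roughly like the sketch below. The pool names follow the "<pool>-cache" pattern visible in the health warning, but the pg counts and the target_max_bytes value are hypothetical placeholders, not the actual CBT-generated configuration:

# Hypothetical sketch of the tiered setup under test; pg counts and
# the 4 GB cache target are placeholders.
ceph osd pool create rados-bench-burnupiY-3 1024 1024 replicated
ceph osd pool create rados-bench-burnupiY-3-cache 1024 1024 replicated
ceph osd tier add rados-bench-burnupiY-3 rados-bench-burnupiY-3-cache
ceph osd tier cache-mode rados-bench-burnupiY-3-cache writeback
ceph osd tier set-overlay rados-bench-burnupiY-3 rados-bench-burnupiY-3-cache
ceph osd pool set rados-bench-burnupiY-3-cache target_max_bytes 4294967296
# Populate, then run the failing seq read:
rados -p rados-bench-burnupiY-3 -b 4096 bench 300 write --no-cleanup
rados -p rados-bench-burnupiY-3 bench 300 seq --concurrent-ios 32 --no-cleanup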

ceph -s shows all OSDs are up and in, but some cache pools are at/near their target max:

nhm@burnupiY:~$ ceph -s
    cluster 72746997-0aea-479e-bc48-6153b319cf35
     health HEALTH_WARN 'rados-bench-burnupiY-0-cache' at/near target max; 'rados-bench-burnupiY-1-cache' at/near target max; 'rados-bench-burnupiY-2-cache' at/near target max; 'rados-bench-burnupiY-3-cache' at/near target max
     monmap e1: 1 mons at {a=192.168.10.2:6789/0}, election epoch 2, quorum 0 a
     osdmap e220: 36 osds: 36 up, 36 in
      pgmap v1977: 21248 pgs, 11 pools, 278 GB data, 703 kobjects
            694 GB used, 27907 GB / 28602 GB avail
               21248 active+clean
  client io 6747 kB/s wr, 131 op/s
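
Since the warning is about cache pools at or near their target max, one thing worth ruling out (a suggestion, not something done in the original report) is a mismatch between the configured cache limits and actual usage, using standard commands:

# Standard Ceph commands; the pool name is one of the four from the
# warning above.
ceph osd pool get rados-bench-burnupiY-3-cache target_max_bytes
ceph osd pool get rados-bench-burnupiY-3-cache target_max_objects
ceph df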

Not much to go on yet, but seems to be repeatable. Will do more testing.

#1

Updated by Mark Nelson about 10 years ago

This appears to be happening on non-tiered pools as well, regardless of whether erasure coding or replication is used.
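
A minimal sketch of reproducing on plain (non-tiered) pools of both kinds; the pool names and pg counts here are placeholders, and the erasure pool uses the default erasure-code profile:

# Placeholder pool names and pg counts.
ceph osd pool create bench-rep 1024 1024 replicated
ceph osd pool create bench-ec 1024 1024 erasure
rados -p bench-rep -b 4096 bench 60 write --no-cleanup
rados -p bench-rep bench 60 seq --concurrent-ios 32 --no-cleanup
rados -p bench-ec -b 4096 bench 60 write --no-cleanup
rados -p bench-ec bench 60 seq --concurrent-ios 32 --no-cleanup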

#2

Updated by Greg Farnum about 10 years ago

Did you check for typos? :p Right pool name? That "-3" looks easy to get wrong.

#3

Updated by Mark Nelson about 10 years ago

It's all automated, though I did try manually testing reads from the command line as well. FWIW, with debugging enabled we see a number of reads succeed with ondisk = 0 before we hit the Input/output error.
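
For reference, client-side debugging of that sort can be enabled with Ceph's standard per-subsystem debug options; the subsystems and levels below are illustrative choices, not the exact settings used:

# Illustrative debug levels for the client-side subsystems:
rados -c /tmp/cbt/ceph/ceph.conf -p rados-bench-burnupiY-3 \
    --debug-objecter 20 --debug-ms 1 \
    bench 300 seq --concurrent-ios 32 --no-cleanup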

#4

Updated by Mark Nelson about 10 years ago

Through some bisecting and a well-informed guess by Yehuda, it appears that this is being caused by d99f1d9f.
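
The bisection described can be driven with git in the usual way; this is a generic sketch, where the known-good ref and the test script are placeholders:

# <known-good-ref> and test.sh are placeholders. test.sh should
# rebuild/install, repopulate a pool, run the seq bench, and exit
# non-zero when the EIO reproduces.
git bisect start
git bisect bad HEAD
git bisect good <known-good-ref>
git bisect run ./test.sh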

#5

Updated by David Zafman about 10 years ago

  • Status changed from New to 7
  • Assignee set to David Zafman
#6

Updated by David Zafman about 10 years ago

  • Status changed from 7 to Resolved

a3d452acdf2fcf9ad10002c5f24c2548d12952bd
