Bug #9854

closed

librbd: reads contending for cache space can cause livelock

Added by Josh Durgin over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Urgent
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: giant,firefly,dumpling
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is a result of accounting for reads properly with #9513. Using qemu-io (a test program) is one way to trigger it: qemu-iotests tries to read 128M.

Actions #1

Updated by Sage Weil over 9 years ago

  • Priority changed from Urgent to Immediate
Actions #2

Updated by Josh Durgin over 9 years ago

  • Assignee set to Jason Dillaman
Actions #3

Updated by Yuri Weinstein over 9 years ago

Update:

Run teuthology-2014-10-21_23:17:01-upgrade:firefly:newer-firefly-distro-basic-vps
Job: ['565380']

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-21_23:17:01-upgrade:firefly:newer-firefly-distro-basic-vps/565380/

Crash: Command failed on vpm164 with status 139: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=0975ec9cec1c466f7b15f5173541a7eab02dae18 TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" RBD_CREATE_ARGS=--new-format adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.0/rbd/import_export.sh'
ceph version 0.80.7-74-g0975ec9 (0975ec9cec1c466f7b15f5173541a7eab02dae18)
 1: rbd() [0x422f91]
 2: (()+0xf710) [0x7f37041e1710]
 3: (Mutex::Lock(bool)+0x1b) [0x7f3704622bfb]
 4: (Finisher::finisher_thread_entry()+0x27) [0x7f370463c497]
 5: (()+0x79d1) [0x7f37041d99d1]
 6: (clone()+0x6d) [0x7f37033e286d]
Actions #4

Updated by Jason Dillaman over 9 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Josh Durgin over 9 years ago

  • Subject changed from librbd: reads larger than cache size hang with caching enabled to librbd: reads contending for cache space can cause livelock

Reads thrashing the cache can be reproduced with:

ceph_test_objectcacher_stress --ops 5000 --percent-read 0.90 --delay-ns 0 --objects 100 --max-op-size 1048576 --client-oc-max-dirty 25165824
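When scripting the reproducer above, it may help to run it under a deadline so that a hang or livelock shows up as a failure instead of blocking the run. The following is only a sketch: the helper name run_with_deadline and the deadline value are hypothetical and not part of Ceph. It relies on the coreutils timeout command, which exits with status 124 when the wrapped command is killed for exceeding the deadline.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): run a command under a deadline
# and report whether it finished or was killed by timeout(1).
run_with_deadline() {
    deadline="$1"
    shift
    timeout "$deadline" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # coreutils timeout exits with 124 when the deadline expires
        echo "timed out after ${deadline}: possible hang or livelock"
    else
        echo "exited with status ${status}"
    fi
    return "$status"
}

# Example invocation (assumes ceph_test_objectcacher_stress is on PATH;
# the 600 second deadline is an arbitrary choice):
# run_with_deadline 600 ceph_test_objectcacher_stress --ops 5000 \
#     --percent-read 0.90 --delay-ns 0 --objects 100 \
#     --max-op-size 1048576 --client-oc-max-dirty 25165824
```

A timeout only indicates that the command did not finish in time; confirming a livelock still requires inspecting the process.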

Actions #6

Updated by Josh Durgin over 9 years ago

  • Backport changed from giant to giant,firefly,dumpling
Actions #7

Updated by Sage Weil over 9 years ago

  • Priority changed from Immediate to Urgent
Actions #8

Updated by Jason Dillaman over 9 years ago

  • Status changed from In Progress to Fix Under Review
Actions #9

Updated by Jason Dillaman over 9 years ago

Actions #10

Updated by Jason Dillaman over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #11

Updated by Loïc Dachary about 9 years ago

  • Status changed from Pending Backport to Resolved
Actions #12

Updated by Loïc Dachary about 9 years ago

  • Status changed from Resolved to Pending Backport

I incorrectly changed this ticket to resolved after the backport was merged in giant.

Actions #13

Updated by Josh Durgin about 9 years ago

  • Status changed from Pending Backport to 7
Actions #14

Updated by Ken Dreyer about 9 years ago

PR (now merged) for firefly: https://github.com/ceph/ceph/pull/3410

Actions #15

Updated by Josh Durgin about 9 years ago

  • Status changed from 7 to Resolved