Bug #8413

closed

test_librbd_fsx: krbd size check races with disk size revalidation

Added by Josh Durgin almost 10 years ago. Updated almost 10 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: other
Severity: 3 - minor

Description

Running test_librbd_fsx in kernel rbd mode against wip-discard in ceph-client.git, this error was reported:

2014-05-20T17:59:58.607 INFO:teuthology.orchestra.run.out:[10.214.135.104]: 56476 trunc from 0xde2bc00 to 0x169b200
2014-05-20T18:00:08.638 INFO:teuthology.orchestra.run.out:[10.214.135.104]: Size error: expected 0x169b200 stat 0xde2bc00

A later manual check with blockdev --getsize64 /dev/rbd0 showed the expected size. The notify and disk revalidation therefore did work, just not before the test checked the block device size. Because krbd re-reads and updates the disk size before it responds to the notify, this should not happen unless the update took longer than the notify timeout. The OSD in question was hitting some of its own timeouts, so this may have happened.
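The toy Python model below makes the ordering concrete. It is not Ceph or krbd code. FakeDisk, watcher and resize_and_check are hypothetical names, and the delays are scaled down from seconds to fractions of a second. The model makes two assumptions, both taken from the description above:

- The watcher (krbd) updates the disk size before acknowledging the notify.
- The notifier stops waiting once the notify timeout expires.

Under those assumptions, a size check made right after the notify returns sees the stale size only when the watcher is slower than the timeout.

```python
import threading
import time


class FakeDisk:
    """Hypothetical stand-in for the mapped device: holds the size the kernel reports."""

    def __init__(self, size):
        self.size = size


def watcher(disk, new_size, delay, acked):
    """Model of the krbd watch callback: revalidate the disk, then ack the notify."""
    time.sleep(delay)      # time the kernel client takes to handle the notify
    disk.size = new_size   # the size is updated *before* the ack is sent
    acked.set()            # ack the notify


def resize_and_check(old_size, new_size, delay, notify_timeout):
    """Send a resize 'notify', wait for the ack or the timeout, then check the size."""
    disk = FakeDisk(old_size)
    acked = threading.Event()
    t = threading.Thread(target=watcher, args=(disk, new_size, delay, acked))
    t.start()
    acked.wait(timeout=notify_timeout)  # the notify returns on ack or on timeout
    observed = disk.size                # the fsx-style size check happens here
    t.join()                            # let the slow watcher finish before returning
    return observed
```

For example, resize_and_check(0xde2bc00, 0x169b200, delay=0.3, notify_timeout=0.05) returns the old size 0xde2bc00. That matches the "expected 0x169b200 stat 0xde2bc00" pattern in the log. With a timeout longer than the delay, the same call returns 0x169b200.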

I'm not sure of a reliable way around it.

#1

Updated by Josh Durgin almost 10 years ago

  • Status changed from New to Rejected

In this case the kernel client did in fact take more than 10 seconds to respond to the notify, so this is not a bug. Having a lot of dynamic debugging enabled may have contributed to the delay. A workaround that makes the test much more reliable is to increase client_notify_timeout. With it set to 60s, I haven't been able to reproduce the issue yet.
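Raising client_notify_timeout is the workaround used here. A test-side alternative would be to let the size check tolerate slow revalidation by polling for a bounded time instead of checking once. The Python sketch below is hypothetical: device_size and wait_for_size are not part of test_librbd_fsx, which is written in C. It reads the size by seeking to the end of the device, which on Linux should report the same value as blockdev --getsize64.

```python
import os
import time


def device_size(path):
    """Return the size in bytes of a block device (or regular file) by seeking to its end."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def wait_for_size(path, expected, timeout=10.0, interval=0.1, get_size=device_size):
    """Poll until the size of `path` equals `expected` or `timeout` seconds pass.

    Returns the last observed size, so the caller can still report a size
    error with both the expected and the observed values.
    """
    deadline = time.monotonic() + timeout
    while True:
        size = get_size(path)
        if size == expected or time.monotonic() >= deadline:
            return size
        time.sleep(interval)  # give the kernel time to revalidate the disk
```

For example, wait_for_size("/dev/rbd0", 0x169b200, timeout=60) would wait up to 60 seconds for the truncate from the log to become visible. A genuine size mismatch still fails, only later. The trade-off is slower failures in exchange for fewer spurious ones.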
