Bug #13699

closed

krbd: crash under pblio benchmark

Added by Josh Durgin over 8 years ago. Updated over 8 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Kernel version: unknown. Environment: running in AWS, with provisioned IOPS volumes beneath the OSDs.

Running pblio (https://github.com/pblcache/pblcache/wiki/Pblio) with three different rbd devices as ASU1, ASU2, and ASU3, and increasing the BSU parameter, caused the client to crash once BSU exceeded 100. No stack trace is available; the VM simply hung. This was reproduced a few times.
Hypothesis: past 100 BSUs, the test starts to read and write the same blocks, a pattern that is usually masked by the page cache or filesystem on top of rbd. Reproducing this with OSDs running on tmpfs or memstore may work.
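
If the hypothesis holds, a much smaller workload than pblio might be enough to exercise the same pattern. The following is only a rough sketch of that idea, not the reporter's method: several threads read and write a small, shared set of blocks on one file descriptor. The function name `hammer_same_blocks` and all of its parameters are hypothetical, and the sketch uses ordinary buffered I/O on a scratch file so that it runs anywhere. Against a real rbd device one would presumably need to open with `os.O_DIRECT` and aligned buffers so the page cache does not mask the overlap; that part is an assumption and is not shown.

```python
import os
import random
import tempfile
import threading


def hammer_same_blocks(path, block_size=4096, num_blocks=8, workers=4,
                       ops_per_worker=200, seed=0):
    """Issue concurrent reads and writes against a small, shared set of blocks.

    Returns the total number of I/O operations completed.
    All names and defaults here are illustrative assumptions.
    """
    # Buffered I/O keeps the sketch portable. A test against an rbd device
    # would likely add os.O_DIRECT (with aligned buffers) to bypass the cache.
    fd = os.open(path, os.O_RDWR)
    completed = [0] * workers

    def worker(idx):
        rng = random.Random(seed + idx)          # per-thread, repeatable choices
        payload = bytes([idx % 256]) * block_size
        for _ in range(ops_per_worker):
            # Every worker picks from the same few offsets, so reads and
            # writes from different threads land on the same blocks.
            offset = rng.randrange(num_blocks) * block_size
            if rng.random() < 0.5:
                os.pwrite(fd, payload, offset)
            else:
                os.pread(fd, block_size, offset)
            completed[idx] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        os.close(fd)
    return sum(completed)


if __name__ == "__main__":
    # Demonstrate against a scratch file rather than a real block device.
    with tempfile.NamedTemporaryFile() as scratch:
        scratch.truncate(8 * 4096)
        scratch.flush()
        print(hammer_same_blocks(scratch.name))
```

Raising `workers` while keeping `num_blocks` small increases how often different threads touch the same block, which is the condition the hypothesis attributes to high BSU counts.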

#1

Updated by Douglas Fuller over 8 years ago

Are there more details available? How large were the ASUs? This benchmark ran fine on a test cluster with upstream kernel 4.2.0 using 1GB ASUs and up to 256 BSUs (the highest number tested).

#2

Updated by Douglas Fuller over 8 years ago

  • Status changed from New to Can't reproduce

Also couldn't reproduce with kernel 3.10.
