Bug #13699: krbd: crash under pblio benchmark - Linux kernel client - Ceph

Bug #13699

closed

krbd: crash under pblio benchmark

Added by Josh Durgin over 8 years ago. Updated over 8 years ago.

Status:

Can't reproduce

Priority:

Normal

Assignee:

Category:

Target version:

% Done:

Source:

other

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Crash signature (v1):

Crash signature (v2):

Description

Kernel version: unknown. Running in AWS, with provisioned IOPS volumes beneath the OSDs.

Running pblio (https://github.com/pblcache/pblcache/wiki/Pblio) with three different rbd devices as ASU1, ASU2, and ASU3, and increasing the BSU parameter, caused the client to crash once BSU exceeded 100. No stack trace is available; the VM simply hung. This was reproduced a few times.
Hypothesis: past 100 BSUs, the test starts to read and write the same blocks concurrently, a pattern that is usually masked by the page cache or filesystem on top of rbd. Reproducing this with OSDs running on tmpfs or memstore may work.
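
If the hypothesis holds, a much smaller load generator than pblio might trigger the same hang. The following is a rough sketch only, not pblio's actual I/O pattern: several threads alternate reads and writes over a handful of shared block offsets. The names `hammer` and `run_overlap_test`, the 4 KiB block size, and the count of 8 hot blocks are all assumptions made for illustration. Passing `direct=True` opens the target with `O_DIRECT` so the page cache is bypassed; that only works on targets that support direct I/O.

```python
import mmap
import os
import threading

BLOCK_SIZE = 4096   # assumed I/O size; the report does not state one
HOT_BLOCKS = 8      # deliberately small so workers hit the same blocks


def hammer(path, worker_id, iterations, direct, errors):
    """Alternate writes and reads over the same few block offsets."""
    flags = os.O_RDWR | (os.O_DIRECT if direct else 0)
    # Anonymous mmap gives a page-aligned buffer, which O_DIRECT requires.
    buf = mmap.mmap(-1, BLOCK_SIZE)
    buf.write(bytes([worker_id % 256]) * BLOCK_SIZE)
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        errors.append((worker_id, repr(exc)))
        buf.close()
        return
    try:
        for i in range(iterations):
            offset = (i % HOT_BLOCKS) * BLOCK_SIZE
            # The buffer contents do not matter here; only the overlap does.
            if (i + worker_id) % 2 == 0:
                done = os.pwritev(fd, [buf], offset)
            else:
                done = os.preadv(fd, [buf], offset)
            if done != BLOCK_SIZE:
                errors.append((worker_id, offset, done))
    except OSError as exc:
        errors.append((worker_id, repr(exc)))
    finally:
        os.close(fd)
        buf.close()


def run_overlap_test(path, workers=4, iterations=1000, direct=False):
    """Run concurrent workers against `path`; return the errors they saw."""
    errors = []
    threads = [
        threading.Thread(target=hammer,
                         args=(path, w, iterations, direct, errors))
        for w in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Against a mapped image this would be called as something like `run_overlap_test("/dev/rbd0", workers=16, direct=True)`, where the device path is hypothetical and the target must be at least `HOT_BLOCKS * BLOCK_SIZE` bytes. It overwrites data on the target, so it should only be pointed at a scratch image.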

Updated by Douglas Fuller over 8 years ago

Are there more details available? How large were the ASUs? This benchmark ran fine on a test cluster with upstream kernel 4.2.0 using 1GB ASUs and up to 256 BSUs (the highest number tested).

Updated by Douglas Fuller over 8 years ago

Status changed from New to Can't reproduce

Also couldn't reproduce with kernel 3.10.
