Bug #14197

closed

Rados bench 4K write stops and reports pipe fault

Added by Jianjian Huo over 8 years ago. Updated about 8 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Installed Infernalis 9.2.0 on an x64 Ubuntu 14.04 machine and set up one monitor and one OSD on the same machine. Running rados bench with 4 MB or 16 KB writes to fill the empty OSD to full worked without problems, but running rados bench with 4 KB writes always stopped at roughly half capacity (55488 kobjects, 216 GB of data in total) and reported the error below:

2015-12-28 21:15:24.479928 7f566036b700 0 -- 10.1.10.102:0/2817697853 >> 10.1.10.102:6800/3804 pipe(0x7f5658002110 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f56580063b0).fault
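
For reference, the 4 KB fill was driven by rados bench; a command along these lines reproduces that kind of run (the pool name, duration, and concurrency here are illustrative assumptions, not taken from the report):

    rados bench -p rbd 36000 write -b 4096 -t 16 --no-cleanup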

$ ceph -s
    cluster ae29003b-8559-4896-8315-5e887eba1ede
     health HEALTH_WARN
            192 pgs stale
            192 pgs stuck stale
            1/1 in osds are down
     monmap e1: 1 mons at {msl-lab-dsg02=10.1.10.102:6789/0}
            election epoch 2, quorum 0 msl-lab-dsg02
     osdmap e8: 1 osds: 0 up, 1 in
            flags sortbitwise
      pgmap v3731: 192 pgs, 2 pools, 216 GB data, 55488 kobjects
            339 GB used, 101 GB / 441 GB avail
                 192 stale+active+clean

And ceph.log has this at the bottom:
2015-12-28 21:30:19.459596 mon.0 10.1.10.102:6789/0 3779 : cluster [INF] osd.0 marked down after no pg stats for 901.104704seconds

I then tried the latest master branch on another machine and hit the same issue; it is easy to reproduce. I had run Hammer 0.94.2 a few months back and did not see this issue.

#1

Updated by Jianjian Huo over 8 years ago

Re-ran the same test on Hammer 0.94.2: it was able to fill the empty OSD to full with 4 KB writes, and I did not see any pipe fault.
However, Hammer was only able to write 41967 kobjects with 419 GB used, while Infernalis wrote 55488 kobjects with only 339 GB used, which is a big difference.

$ ceph -s
    cluster bd38d860-8a75-4525-af91-caccc37f22e5
     health HEALTH_ERR
            1 full osd(s)
     monmap e1: 1 mons at {MyMachine=IP:6789/0}
            election epoch 2, quorum 0 MyMachine
     osdmap e8: 1 osds: 1 up, 1 in
            flags full
      pgmap v3308: 192 pgs, 2 pools, 163 GB data, 41967 kobjects
            419 GB used, 22531 MB / 441 GB avail
                 192 active+clean

Is there a limit on how many objects one FileStore OSD can store? Maybe this test with Infernalis hit that limit?

#2

Updated by Samuel Just over 8 years ago

2015-12-28 21:15:24.479928 7f566036b700 0 -- 10.1.10.102:0/2817697853 >> 10.1.10.102:6800/3804 pipe(0x7f5658002110 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f56580063b0).fault

isn't an interesting message by itself. It sounds like the OSD crashed; reproduce and post the OSD log, ideally with

debug osd = 20
debug filestore = 20
debug ms = 1
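
One way to apply those settings (a sketch; the [osd] section placement and the osd.0 id are assumptions, not from this report) is to add them to ceph.conf before restarting the OSD, or to inject them into the running daemon:

    [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1

    ceph tell osd.0 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'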

#3

Updated by Jianjian Huo over 8 years ago

Thanks, Samuel.

After seeing this error later with Ceph Hammer as well, I realized it may be caused by my setup. I looked into various logs, and it seems the OSD ran out of XFS inodes. I was able to fix it on Hammer with some tweaks; I need to verify whether the same fix works on master. After that I will update here, and then we can close this.
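
For anyone hitting something similar, inode exhaustion on the OSD's XFS data partition can be checked with standard tools (a sketch; /var/lib/ceph/osd/ceph-0 is an assumed mount point):

    df -i /var/lib/ceph/osd/ceph-0      # IFree/IUse% reveal inode exhaustion
    xfs_info /var/lib/ceph/osd/ceph-0   # imaxpct shows the max % of space usable for inodes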

#4

Updated by Samuel Just about 8 years ago

  • Status changed from New to Rejected

Rejecting for now (seems to be a setup issue?). Feel free to reopen if there is new information.
