Bug #14197
Closed: Rados bench 4K write stops and reports pipe fault
Description
Installed Infernalis 9.2.0 on one x64 Ubuntu 14.04 machine and set up one monitor and one OSD on that machine. Running rados bench with 4 MB or 16 KB writes filled the empty OSD to full without problems. Running rados bench with 4 KB writes to fill the empty OSD, however, always stopped at about half capacity (55488 kobjects, 216 GB data in total) and reported the error below:
2015-12-28 21:15:24.479928 7f566036b700 0 -- 10.1.10.102:0/2817697853 >> 10.1.10.102:6800/3804 pipe(0x7f5658002110 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f56580063b0).fault
$ ceph -s
cluster ae29003b-8559-4896-8315-5e887eba1ede
health HEALTH_WARN
192 pgs stale
192 pgs stuck stale
1/1 in osds are down
monmap e1: 1 mons at {msl-lab-dsg02=10.1.10.102:6789/0}
election epoch 2, quorum 0 msl-lab-dsg02
osdmap e8: 1 osds: 0 up, 1 in
flags sortbitwise
pgmap v3731: 192 pgs, 2 pools, 216 GB data, 55488 kobjects
339 GB used, 101 GB / 441 GB avail
192 stale+active+clean
The end of ceph.log shows:
2015-12-28 21:30:19.459596 mon.0 10.1.10.102:6789/0 3779 : cluster [INF] osd.0 marked down after no pg stats for 901.104704seconds
I then tried the latest master branch on another machine and hit the same issue; it is easy to reproduce. I ran Hammer 0.94.2 a few months ago and did not see this issue.
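A note on the ceph.log line above: the monitor marks an OSD down once it has gone longer than a report timeout without sending PG stats. The sketch below assumes that timeout is `mon osd report timeout` at a 900-second default (an assumption, not stated in this ticket) and works through the timestamps quoted above. `should_mark_down` is a hypothetical helper, not Ceph code. The gap between the client's pipe fault (21:15:24) and the mark-down (21:30:19) is just under 900 s. That fits an OSD that stopped reporting a few seconds before the client saw the fault, for example because it crashed.

```python
from datetime import datetime, timedelta

# Assumed default of mon_osd_report_timeout (seconds); not taken from the ticket.
REPORT_TIMEOUT = 900.0


def should_mark_down(silent_for: float, timeout: float = REPORT_TIMEOUT) -> bool:
    """Hypothetical helper: True once an OSD has been silent longer than timeout."""
    return silent_for > timeout


# Timestamps copied from the client log and ceph.log in this ticket.
pipe_fault = datetime.fromisoformat("2015-12-28 21:15:24.479928")
marked_down = datetime.fromisoformat("2015-12-28 21:30:19.459596")
silent_for = 901.104704  # "no pg stats for 901.104704seconds"

# Work back from the mark-down to when the OSD last sent PG stats.
last_pg_stats = marked_down - timedelta(seconds=silent_for)
gap = (marked_down - pipe_fault).total_seconds()

print("last PG stats at:", last_pg_stats.time())
print(f"pipe fault to mark-down: {gap:.1f} s")
print("past timeout:", should_mark_down(silent_for))
```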
Updated by Jianjian Huo over 8 years ago
Re-ran the same test on Hammer 0.94.2: it was able to fill the empty OSD to full with 4K writes, and I didn't see any pipe fault.
However, Hammer was only able to write 41967 kobjects using 419 GB, while Infernalis wrote 55488 kobjects using 339 GB, which is a big difference.
$ ceph -s
cluster bd38d860-8a75-4525-af91-caccc37f22e5
health HEALTH_ERR
1 full osd(s)
monmap e1: 1 mons at {MyMachine=IP:6789/0}
election epoch 2, quorum 0 MyMachine
osdmap e8: 1 osds: 1 up, 1 in
flags full
pgmap v3308: 192 pgs, 2 pools, 163 GB data, 41967 kobjects
419 GB used, 22531 MB / 441 GB avail
192 active+clean
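Dividing the two `ceph -s` outputs by their object counts makes the difference concrete. Both runs store roughly 4 KB of logical data per object, as expected for 4K writes, so the gap lies in raw space consumed per object. The sketch below treats "GB" and "kobjects" as decimal units (an assumption; Ceph may print binary units, which shifts the absolute numbers but not the comparison between runs). It also assumes "used" covers everything on the OSD filesystem, which may include the FileStore journal and XFS metadata, so the overhead figure is only indicative. `per_object_bytes` is a hypothetical helper.

```python
# Rough per-object accounting from the two `ceph -s` outputs in this ticket.
# Assumption: "GB" = 1e9 bytes and "kobjects" = 1e3 objects.

RUNS = {
    # release: (data_gb, used_gb, kobjects)
    "Infernalis 9.2.0": (216, 339, 55_488),
    "Hammer 0.94.2": (163, 419, 41_967),
}


def per_object_bytes(data_gb: float, used_gb: float, kobjects: float):
    """Return (logical bytes, raw used bytes, overhead bytes) per object."""
    objects = kobjects * 1e3
    data = data_gb * 1e9 / objects
    used = used_gb * 1e9 / objects
    return data, used, used - data


for release, (data_gb, used_gb, kobjs) in RUNS.items():
    data, used, overhead = per_object_bytes(data_gb, used_gb, kobjs)
    print(f"{release}: data {data:.0f} B, used {used:.0f} B, "
          f"overhead {overhead:.0f} B per object")
```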
Is there a limit on how many objects one FileStore OSD can store? Could this test with Infernalis have hit that limit?
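One way to put a number on this question: FileStore stores each RADOS object as a file, so the object count is bounded by how many inodes the OSD's XFS filesystem will allocate. The sketch below assumes the filesystem was created with 2 KiB inodes (`-i size=2048`, commonly used for FileStore OSDs) and that mkfs.xfs applied its default `imaxpct` of 25% for a filesystem under 1 TB. Neither assumption is confirmed in this ticket, and `max_xfs_inodes` is a hypothetical helper. Under those assumptions the ceiling is roughly 54 to 58 million inodes, depending on whether "441 GB" means GiB or GB. That is in the same range as the ~55 million objects the Infernalis run wrote. Treat it as a back-of-envelope check, not a diagnosis; directories also consume inodes.

```python
# Order-of-magnitude estimate of the XFS inode ceiling on this OSD.
# Assumptions (not confirmed in the ticket):
#   * the OSD filesystem was made with `-i size=2048` (2 KiB inodes), and
#   * mkfs.xfs used its default imaxpct of 25% for a sub-1 TB filesystem.
# FileStore keeps each RADOS object as a file, so every object needs an inode.

def max_xfs_inodes(fs_bytes: float, imaxpct: float = 25.0,
                   inode_size: int = 2048) -> int:
    """Upper bound on inodes: imaxpct percent of the space divided by inode size."""
    return int(fs_bytes * imaxpct / 100 / inode_size)


objects_written = 55_488_000  # Infernalis run, from `ceph -s`

# `ceph -s` reports 441 GB; try both binary and decimal readings of "GB".
for label, size in (("441 GiB", 441 * 2**30), ("441 GB", 441 * 10**9)):
    cap = max_xfs_inodes(size)
    print(f"{label}: ~{cap:,} inodes max, "
          f"{objects_written / cap:.0%} needed by objects alone")
```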
Updated by Samuel Just over 8 years ago
2015-12-28 21:15:24.479928 7f566036b700 0 -- 10.1.10.102:0/2817697853 >> 10.1.10.102:6800/3804 pipe(0x7f5658002110 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f56580063b0).fault
isn't an interesting message. It sounds like the OSD crashed; please reproduce and post the OSD log, ideally with:
debug osd = 20
debug filestore = 20
debug ms = 1
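These debug levels typically go in the OSD section of ceph.conf before the OSD is restarted to reproduce the crash. A small sketch using Python's standard `configparser` can merge them into the `[osd]` section of a ceph.conf. `add_osd_debug` is a hypothetical helper. `configparser` drops comments when it writes the file back out, so review the output before replacing a real `/etc/ceph/ceph.conf`.

```python
# Sketch: merge the requested debug levels into the [osd] section of a
# ceph.conf held as text. Comments in the input are not preserved.
import configparser
import io

DEBUG_SETTINGS = {
    "debug osd": "20",
    "debug filestore": "20",
    "debug ms": "1",
}


def add_osd_debug(conf_text: str) -> str:
    """Return conf_text with DEBUG_SETTINGS set in its [osd] section."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("osd"):
        parser.add_section("osd")
    for key, value in DEBUG_SETTINGS.items():
        parser.set("osd", key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


example = "[global]\nfsid = ae29003b-8559-4896-8315-5e887eba1ede\n"
print(add_osd_debug(example))
```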
Updated by Jianjian Huo over 8 years ago
Thanks, Samuel.
After later seeing this error with Ceph Hammer as well, I realized it may be caused by my setup. I looked into various logs, and it seems XFS ran out of inodes. I was able to fix it on Hammer with some tweaks; I need to verify whether this also works on the master code. After that, I will post an update and then we can close this.
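To check for this kind of inode exhaustion, compare inode usage with space usage on the OSD's filesystem. These are the same counters that `df -i` and `df` report. A minimal sketch using `os.statvfs`; the helper names are hypothetical:

```python
# Sketch: compare inode usage with block usage for the filesystem that
# holds a given path, to spot "out of inodes while space remains".
import os
from typing import Tuple


def inode_usage(path: str) -> Tuple[int, int, float]:
    """Return (total inodes, free inodes, fraction used) for path's filesystem."""
    st = os.statvfs(path)
    total, free = st.f_files, st.f_ffree
    used_frac = (total - free) / total if total else 0.0
    return total, free, used_frac


def space_usage(path: str) -> float:
    """Return the fraction of blocks in use on path's filesystem."""
    st = os.statvfs(path)
    return 1 - st.f_bfree / st.f_blocks if st.f_blocks else 0.0
```

For example, run `inode_usage("/var/lib/ceph/osd/ceph-0")` and `space_usage(...)` on the same path. That directory is the conventional FileStore data directory for osd.0 and is an assumption about this setup. On an OSD in the state described here, inode usage near 100% while space usage is well below it would point to inode exhaustion.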
Updated by Samuel Just about 8 years ago
- Status changed from New to Rejected
Rejecting for now (seems to be a setup issue?). Feel free to reopen if there is new information.