Bug #48785

closed

kceph: osd socket connection was closed

Added by Xiubo Li over 3 years ago. Updated over 2 years ago.

Status: Won't Fix
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I ran this against a vstart Ceph cluster built from the latest upstream code, mounted the kernel client (kclient), and ran the following test script several times:

# for i in {1..10000}; do echo $i > file$i.txt; done; for i in {1..10000}; do rm -f file$i.txt; done;

The script gets stuck, and dmesg shows the following messages:

<4>[  618.390897] libceph: osd0 (1)10.72.47.117:6801 socket closed (con state OPEN)
<4>[  618.394087] libceph: osd0 (1)10.72.47.117:6801 socket error on write
<6>[  618.445075] libceph: osd0 down
<4>[  637.991918] libceph: osd2 (1)10.72.47.117:6809 socket closed (con state OPEN)
<4>[  637.992565] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  638.251396] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  638.755376] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  639.923390] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  641.972401] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  645.939306] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  654.067008] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  668.914496] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  684.273877] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  700.145453] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  714.992791] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  730.352353] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  746.223853] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  762.095293] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  776.942800] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  792.302200] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  808.173644] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  824.045071] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  839.404610] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  855.276046] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  871.147474] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  885.995108] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  901.354415] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<6>[  916.711242] libceph: osd1 weight 0x0 (out)
<4>[  917.225958] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  933.097298] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  947.944823] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  963.304275] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  979.176261] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[  995.047232] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1010.406943] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1026.278149] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1042.149890] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1058.021045] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1073.380703] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1089.252613] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1105.123450] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1119.970876] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1135.330522] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1151.201866] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1167.073204] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1181.920785] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1197.280300] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1213.151667] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1229.023141] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1244.382513] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1260.253995] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1276.125480] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1290.972967] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1306.332381] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1322.203871] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1338.075323] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1352.923316] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1368.282278] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1384.153756] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1400.025264] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1415.384779] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1431.256134] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1447.127539] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1461.975087] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1477.334520] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1493.205982] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<6>[ 1507.428136] libceph: osd2 down
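
For readers who want to see how the retries are spaced, the gaps between consecutive "socket error on write" messages can be pulled out of the log. The following is a minimal sketch, not part of any Ceph tooling: `retry_gaps` is a hypothetical helper name, and it assumes dmesg lines carry a bracketed seconds-since-boot timestamp as in the log above (it will not work with human-readable timestamps such as those from `dmesg -T`).

```shell
#!/usr/bin/env bash
# retry_gaps (hypothetical helper): read dmesg lines on stdin and print the
# gap, in seconds, between consecutive "socket error on write" messages
# for the OSD named in $1 (for example "osd2").
retry_gaps() {
    local osd="$1"
    awk -v osd="$osd" '
        $0 ~ ("libceph: " osd " ") && /socket error on write/ {
            # The timestamp is the text between the first "[" and "]".
            start = index($0, "[")
            end = index($0, "]")
            ts = substr($0, start + 1, end - start - 1) + 0
            # Print the difference to the previous matching line, if any.
            if (seen) printf "%.3f\n", ts - prev
            prev = ts
            seen = 1
        }
    '
}
```

It could be used as `dmesg | retry_gaps osd2`. For the osd2 lines above, the timestamps show the gaps starting at roughly 0.26 s, about doubling for the first several attempts, and then settling at around 15 to 16 s.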
Actions #1

Updated by Xiubo Li over 3 years ago

The kceph client was also using the latest upstream code.

Actions #2

Updated by Xiubo Li over 3 years ago

  • Assignee set to Xiubo Li
Actions #3

Updated by Ilya Dryomov about 3 years ago

Hi Xiubo,

Was messenger failure injection enabled on the server side?

Do you have the corresponding "debug ms" logs from the OSDs?

Actions #4

Updated by Xiubo Li about 3 years ago

Ilya Dryomov wrote:

Hi Xiubo,

Was messenger failure injection enabled on the server side?

I didn't change anything; I just set up the cluster with:

# MDS=3 MON=3 OSD=3 MGR=1 ../src/vstart.sh -n -X -G --msgr1

Messenger failure injection was left at its default setting.

Do you have the corresponding "debug ms" logs from the OSDs?

I didn't save the logs, since the issue was very easy to reproduce. I am not sure whether it was caused by running out of disk space. The disk is 300G, and my cephfs tests, run in a loop, have filled it many times.

  data:
    pools:   3 pools, 192 pgs
    objects: 22 objects, 11 KiB
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     192 active+clean
Actions #5

Updated by Xiubo Li over 2 years ago

  • Status changed from New to Won't Fix

I reproduced this locally many times and eventually found that every occurrence was caused by running out of disk space. This is not a bug, so I am closing it.
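
Since the root cause was a full disk, a looped test could check free space before each run. The following is a minimal sketch under stated assumptions: `has_free_space` is a hypothetical helper, the directory to check (wherever the vstart cluster keeps its OSD data) and the threshold are placeholders chosen for illustration, and only the standard `df -Pk` output format is relied on.

```shell
#!/usr/bin/env bash
# has_free_space (hypothetical helper): succeed if the filesystem that
# holds directory $1 has at least $2 KiB available.
has_free_space() {
    local dir="$1" min_kb="$2" avail_kb
    # df -Pk prints POSIX-format output in 1024-byte units;
    # the 4th column of the 2nd line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}
```

For example, `has_free_space . $((10 * 1024 * 1024)) || echo "less than 10 GiB free" >&2` run from the cluster's data directory before each test iteration would flag the condition early, instead of leaving it to show up as socket errors in dmesg.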
