Bug #3828 (closed): seeing error "fault, server, going to standby" whenever I run a ceph-syn load generation

Added by Anonymous over 11 years ago. Updated over 11 years ago.

Status:
Rejected
Priority:
Low
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Severity:
3 - minor

Description

While validating bug 520, I saw an interesting error. It may be a red herring, as I am seeing no problems with the writes, reads, or deletes. I am marking the bug low priority because an error is produced but it does not seem harmful.

While validating bug 520, I had my conf file set with:

debug ms = 1, and
debug mds = 5.

I received output telling me the test completed successfully. Yay.
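For clarity, the relevant MDS section looked roughly like this for the first run (a sketch pieced together from the full config further down, not a verbatim copy of the file at that point):

[mds.a]
# first (clean) run: verbose MDS logging, no fault messages in the MDS log
debug mds = 5
host = centos1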

But when I changed to "debug mds = 1", I started getting this error:

2013-01-16 15:44:13.402478 7ff0a30f9700 0 -- 10.1.10.123:6802/30003 >> 10.1.10.126:6800/12755 pipe(0x7ff09d4b1d70 sd=31 :6802 pgs=2 cs=1 l=0).fault, server, going to standby
2013-01-16 15:46:43.413868 7ff0a2ef7700 0 -- 10.1.10.123:6802/30003 >> 10.1.10.126:6800/12770 pipe(0x7ff09d4b2090 sd=32 :6802 pgs=2 cs=1 l=0).fault, server, going to standby
2013-01-16 15:48:46.086045 7ff0a2af7700 0 -- 10.1.10.123:6802/30003 >> 10.1.10.126:6800/12786 pipe(0x7ff09d4b2570 sd=33 :6802 pgs=2 cs=1 l=0).fault, server, going to standby

when I ran this command on a remote client that has the Ceph filesystem FUSE-mounted:
ceph-syn --syn makefiles 1000 1 0 -m 10.1.10.125

The command executes fine:
2013-01-16 15:59:12.553746 7fd086b7e700 0 client.6638 makefiles time is 5.910647 or 0.00591065 per file
and I can see that the directories and files are created with zero size, but I also get the "fault, server, going to standby" messages in /var/log/ceph/ceph-mds.a.log.
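For anyone reproducing this, the messages can be pulled out of the MDS log with a plain grep (standard tools only; the path is the one mentioned above):

grep "fault, server, going to standby" /var/log/ceph/ceph-mds.a.log | tail -n 5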

when I use this config:

3-node cluster:
ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)

cat /etc/ceph/ceph.conf
[global]
osd_journal_size = 1024
filestore_xattr_use_omap = true
auth cluster required = cephx
auth service required = cephx
auth client required = cephx

[osd]
debug ms = 1
osd journal size = 1000
filestore xattr use omap = true
osd heartbeat interval = 12

[mon.a]
host = centos1
mon addr = 10.1.10.123:6789

[mon.b]
host = centos2
mon addr = 10.1.10.124:6789

[mon.c]
host = centos3
mon addr = 10.1.10.125:6789

[mds.a]
debug mds = 1 <--------
host = centos1

[osd.0]
host = centos1
[osd.1]
host = centos1
[osd.2]
host = centos1
[osd.3]
host = centos2
[osd.4]
host = centos2
[osd.5]
host = centos2
[osd.6]
host = centos3
[osd.7]
host = centos3
[osd.8]
host = centos3
Ran:
ceph-syn --syn makefiles 1000 1 0 -m 10.1.10.125

multiple times, with no errors. It appears the "Success unsafe" issue has been resolved.
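For the record, "multiple times" was nothing more elaborate than a small loop along these lines (a sketch of the repetition, not my exact shell history):

for i in 1 2 3 4 5; do
    ceph-syn --syn makefiles 1000 1 0 -m 10.1.10.125
done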

#1 Updated by Greg Farnum over 11 years ago

  • Status changed from New to Rejected

This is showing up on your MDS, about 15 minutes after a client completes accesses, right? This is associated with the closing issues we discussed during bug scrub; the MDS is not closing the client's session when the client disconnects, but after 15 minutes of idling the messenger system closes the socket even so. (Which is what's generating the actual output, and is totally normal.)

So this is a symptom of an issue we should deal with, but not itself any kind of problem, and we have bugs addressing the issues!
(Also, I thought we had a "Not a Bug" status, but I don't see it so this wins the Rejected label instead! :D)
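(Side note in case anyone wants to tune around the noise in the meantime: my reading is that the 15-minute gap matches the messenger's idle TCP read timeout, which defaults to 900 seconds; that mapping is an assumption on my part, not something verified in this ticket. The knob lives in ceph.conf:

[global]
# 900 seconds is the default; changing it only moves when the idle-close, and this log line, happens
ms tcp read timeout = 900

It does not change the underlying session-close behavior described above.)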
