Bug #5272

Updating ceph from 0.61.2 to 0.61.3 obviously changes tunables of existing cluster

Added by To Pro almost 11 years ago. Updated almost 11 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm running a Ceph cluster with three server nodes, each running one MON, one MDS and three OSDs to provide CephFS storage to clients. The server and client nodes run Debian wheezy with the linux-3.8.x kernel from debian.org and ceph packages from ceph.com/debian-cuttlefish. The clients mount CephFS using the kernel client. The cluster was created back when it ran bobtail and was then upgraded to cuttlefish 0.61.2 without issues. Today, when I updated the server nodes to ceph 0.61.3 and restarted ceph on all server nodes (service ceph restart) consecutively, at some point during that process all clients that had CephFS mounted stalled on IO. Rebooting the clients didn't change a thing, so I decided to shut down all (eight) clients, stop the ceph daemons on all servers completely and wait for things to settle before starting up again. I then restarted the ceph daemons on all server nodes, waited again for things to settle, then tried booting up a single client and mounting CephFS, which still blocked on IO.
I then remembered having read about Ceph tunables, and since I still use linux-3.8 I would not be able to mount the FS if the tunables were set to "optimal". So I gave it a try and ran "ceph osd crush tunables legacy", and from that moment all my clients could mount CephFS again, just as they had with 0.61.2.
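For reference, a minimal sketch of how the tunables can be inspected and switched back to the legacy profile; the /tmp paths are just examples, and the exact "tunable" lines shown in the decompiled map depend on the Ceph release:

    # Dump the current CRUSH map and decompile it; the "tunable ..." lines
    # near the top of the decompiled map show which tunables are in effect.
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
    head /tmp/crushmap.txt

    # Switch back to the legacy tunables profile so that older kernel
    # clients (such as linux-3.8) can still talk to the cluster.
    ceph osd crush tunables legacy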

As requested by Sage, I placed a tarred-up mon dir on cephdrop for debugging purposes; the file is called topro_mon_0.61.2_to_0.61.3_tunables_issue.tar.bz2

History

#1 Updated by Ian Colle almost 11 years ago

  • Assignee set to Greg Farnum

#2 Updated by Greg Farnum almost 11 years ago

  • Status changed from New to Need More Info
  • Assignee deleted (Greg Farnum)

I went through a diff and there's nothing obvious between those two versions that could have caused these feature bits to either change or get set incorrectly.
Nor are any of the crush maps in the monitor store (covering many months) different from the newest crushmap.

It's possible there's some bug that caused the features to be reported incorrectly to clients, but I haven't heard of anybody else running into a problem with it, so for now I'm inclined to call this a different bug that got fixed either when the osdmap got jiggled or through one of the other steps.

#3 Updated by To Pro almost 11 years ago

I'm afraid that as long as no one else encounters this issue I am not able to provide more detailed information. The only thing I'm quite sure about is the fact that "ceph osd crush tunables legacy" actually made the cluster accessible again for my cephfs clients running linux-3.8.

Of course I can provide you with more detailed information about the specific cluster setup.

#4 Updated by To Pro almost 11 years ago

As I re-encountered the same issue without upgrading, just by restarting the MDS daemon, I think this tracker issue may be closed. It seems that, as Greg supposed, something else made the cluster work again by chance, at the same time as I was issuing the ceph tunables command.

My investigation this time suggests it has something to do with ceph daemons denying connections from restarted daemons because of a change of process id. I saw log warnings in ceph-osd.X.log which support that assumption. Grep for something like "wrong node!" in the OSD log files if you look into that issue.
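A quick way to check for those messages, assuming the default log location under /var/log/ceph:

    # Search all OSD logs for the messenger's "wrong node!" warning, which
    # the report above associates with daemons reconnecting under a new
    # process id after a restart.
    grep -n "wrong node!" /var/log/ceph/ceph-osd.*.log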

#5 Updated by Sage Weil almost 11 years ago

  • Status changed from Need More Info to Can't reproduce

#6 Updated by Greg Farnum almost 11 years ago

  • Status changed from Can't reproduce to Duplicate

This was an issue with the MDS doing heavy reads off the OSDs. See #4405 and the related caching issues.