Bug #37617

closed

CephFS did not recover after re-plugging network cable

Added by Niklas Hambuechen over 5 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Today my hosting provider performed planned switch maintenance, during which the switch that one of my 3 Ceph nodes was connected to was restarted.

Unfortunately, my CephFS fuse mount did not recover from this interruption; running e.g. `ls` or `df` on the mount point hung forever, even after all other services on the machine had recovered.

I'm using Ceph 13.2.2.

The logs look like this:

Dec 11 03:14:45 node-2 kernel: e1000e: eth0 NIC Link is Down
...
Dec 11 03:15:02 node-2 ceph-osd[20482]: 2018-12-11 03:15:02.491 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.1:6804 osd.0 since back 2018-12-11 03:14:>
Dec 11 03:15:02 node-2 ceph-osd[20482]: 2018-12-11 03:15:02.491 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.3:6807 osd.2 since back 2018-12-11 03:14:>
...
Dec 11 03:24:34 node-2 kernel: e1000e: eth0 NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx

A script I have that continuously tries to write to the mount kept producing this output, from the start of the downtime until well past the network recovery:

Dec 11 03:28:03 node-2 myfs-mounted-pre-start[6973]: ceph-fuse /var/myfs/files fuse.ceph-fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
Dec 11 03:28:03 node-2 myfs-mounted-pre-start[6973]: touch: cannot touch '/var/myfs/files/.myfs-writable': Cannot send after transport endpoint shutdown
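
For reference, such a writability probe boils down to roughly the following (a hypothetical standalone sketch, not my actual script; the path is the one from the log above, and ESHUTDOWN is the errno that strerror() renders as "Cannot send after transport endpoint shutdown"):

// Hypothetical probe: try a small write inside the CephFS mount and report
// whether the client looks blacklisted/shut down (exit 2), otherwise failed (1) or ok (0).
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv) {
  // Path from the log above; pass a different one as argv[1] if needed.
  const char *probe = argc > 1 ? argv[1] : "/var/myfs/files/.myfs-writable";

  int fd = open(probe, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) {
    std::fprintf(stderr, "open %s: %s\n", probe, std::strerror(errno));
    return errno == ESHUTDOWN ? 2 : 1;   // 2 = mount looks dead/blacklisted
  }
  ssize_t n = write(fd, "ok\n", 3);
  int saved_errno = errno;               // save before close() can overwrite it
  close(fd);
  if (n < 0) {
    std::fprintf(stderr, "write %s: %s\n", probe, std::strerror(saved_errno));
    return saved_errno == ESHUTDOWN ? 2 : 1;
  }
  return 0;                              // mount is writable
}

Caveat: as noted above, some operations (`ls`, `df`) simply hung instead of returning an error, so any such probe also needs an external timeout.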

I also found these messages in the logs:

Dec 11 03:25:49 node-2 ceph-fuse[20150]: 2018-12-11 03:25:49.472 7f252e811700 -1 client.227355134 I was blacklisted at osd epoch 602
Dec 11 04:19:46 node-2 ceph-fuse[20150]: 2018-12-11 04:19:46.069 7f252e811700 -1 client.227355134 mds.0 rejected us (unknown error)

I also have full `journalctl` logs from that period, but I'm leaving it at this for now in case the problem is already clear or it's a known issue.

I would expect Ceph to recover automatically from this short 11-minute network interruption.

Unfortunately, it did not, and I had to kill the ceph-fuse process and `fusermount -u` the mount point, then restart ceph-fuse.

I am also not sure whether it is a CephFS issue specifically, or an issue in one of the lower Ceph layers.


Files

diff.patch (733 Bytes) geng jichao, 01/09/2020 06:44 AM
Actions #1

Updated by Niklas Hambuechen over 5 years ago

The OSD logs around time `Dec 11 03:25:49` (the "I was blacklisted" time):

2018-12-11 03:25:48.811 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.1:6804 osd.0 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:28.812044)
2018-12-11 03:25:48.811 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.3:6807 osd.2 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:28.812044)
2018-12-11 03:25:49.807 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.1:6804 osd.0 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:29.808082)
2018-12-11 03:25:49.807 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.3:6807 osd.2 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:29.808082)
2018-12-11 03:25:50.807 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.1:6804 osd.0 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:30.807746)
2018-12-11 03:25:50.807 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.3:6807 osd.2 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:30.807746)
2018-12-11 03:25:51.828 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.1:6804 osd.0 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:31.829055)
2018-12-11 03:25:51.828 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.3:6807 osd.2 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:31.829055)
2018-12-11 03:25:52.812 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.1:6804 osd.0 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:32.812618)
2018-12-11 03:25:52.812 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.3:6807 osd.2 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:32.812618)
2018-12-11 03:25:53.813 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.1:6804 osd.0 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:33.814465)
2018-12-11 03:25:53.813 7f02a74f2700 -1 osd.1 598 heartbeat_check: no reply from 10.0.0.3:6807 osd.2 since back 2018-12-11 03:14:42.396352 front 2018-12-11 03:14:42.396352 (cutoff 2018-12-11 03:25:33.814465)
2018-12-11 03:25:54.137 7f029acd9700  0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.1 down, but it is still running
2018-12-11 03:25:54.137 7f029acd9700  0 log_channel(cluster) log [DBG] : map e602 wrongly marked me down at e599
2018-12-11 03:25:54.137 7f029acd9700  1 osd.1 602 start_waiting_for_healthy

This doesn't seem to show anything related, but I notice that the cutoff point keeps moving along with the current time.

Is that another bug?

Shouldn't the cutoff time be some constant time?

Actions #2

Updated by Patrick Donnelly over 5 years ago

  • Status changed from New to Rejected

I would expect Ceph to recover automatically from this short 11-minute network interruption.

Ceph will recover but your client became partitioned and the MDS will respond to this by evicting/blacklisting the client. Currently, it is necessary to restart the client when this happens.

In the future, please make a post to ceph-users for help diagnosing these kinds of problems.

Actions #3

Updated by Niklas Hambuechen over 5 years ago

Hey Patrick,

Currently, it is necessary to restart the client when this happens.

Is there already a feature request somewhere about not having to do that?

If not, could we turn this into one?

Not having to do manual recovery from network outages is a key reason we chose Ceph.

Also, what about my question about the moving cutoff point: is that expected?

Actions #4

Updated by Patrick Donnelly over 5 years ago

Niklas Hambuechen wrote:

Hey Patrick,

Currently, it is necessary to restart the client when this happens.

Is there already a feature request somewhere about not having to do that?

If not, could we turn this into one?

We do not have a tracker yet. Work is planned in the near future on this and we'll create a tracker once that is scoped.

Also, what about my question of the moving cutoff point, is that expected?

Don't know. Best to ask on ceph-users.

Actions #5

Updated by Niklas Hambuechen over 5 years ago

Patrick Donnelly wrote:

We do not have a tracker yet. Work is planned in the near future on this and we'll create a tracker once that is scoped.

OK. From a developer's perspective this makes sense to me, but from a user-process perspective an "appeal to the future" is suboptimal: we have no ticket we can subscribe to for updates, or link to from our internal issue tracker as the upstream feature request (both would be solved by having the Redmine issue). Instead, we'll have to periodically poll to find out when this exists somewhere in the near future.

Don't know. Best to ask on ceph-users.

All clear, I've posted it there (will link it once it appears in the archive).

Actions #6

Updated by geng jichao over 4 years ago

I also encountered the same problem. When I use `ls` or other operations on the mount point, they fail, even after the network is back up and more than mon_osd_blacklist_default_expire has elapsed.

[root@YZ-222-8-83 fuse]# ls
ls: cannot open directory .: Cannot send after transport endpoint shutdown

The reason is that when the network comes back up after more than mon_osd_blacklist_default_expire, ceph-fuse receives two OSDMap epochs: one that includes the blacklist entry and a later one in which the blacklist has been lifted. However, blacklist_events still includes the already-removed blacklist entry, so ceph-fuse never clears its blacklisted flag. I have fixed the problem and tested that it works.

--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -2480,12 +2480,14 @@ void Client::handle_osd_map(MOSDMap *m)
     // some PGs were inaccessible.
     objecter->op_cancel_writes(-EBLACKLISTED);
 
-  } else if (blacklisted) {
+  }
+
+  if(blacklisted) {
     // Handle case where we were blacklisted but no longer are
     blacklisted = objecter->with_osdmap([myaddr](const OSDMap &o){
         return o.is_blacklisted(myaddr);});
   }

Actions #7

Updated by Greg Farnum about 4 years ago

  • Status changed from Rejected to Resolved
  • Pull request ID set to 33169