Bug #8291

closed

0.80: fuse client hangs after wake-up from suspend until restart of active MDS

Added by Dmitry Smirnov almost 10 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When a computer with CephFS mounted via ceph-fuse wakes up from suspend, the /mnt/ceph mount stops responding (i.e. `df -h /mnt/ceph` never returns, etc.).
To remedy the situation I restarted the active MDS, and connectivity was restored as soon as another MDS took over.
Here is what I found in the MDS log:

2014-05-06 16:56:09.301733 7f2602a3d700  0 -- 192.168.0.204:6800/4504 >> 192.168.0.7:6801/11147 pipe(0x7f268320a780 sd=30 :0 s=1 pgs=0 cs=0 l=1 c=0x7f261088fde0).fault
2014-05-06 17:01:15.820747 7f2603e43700  0 log [INF] : closing stale session client.2311588 192.168.0.7:0/9222 after 300.686328
2014-05-06 17:53:42.524279 7f2602c3f700  0 auth: could not find secret_id=2792
2014-05-06 17:53:42.524287 7f2602c3f700  0 cephx: verify_authorizer could not get service secret for service mds secret_id=2792
2014-05-06 17:53:42.524292 7f2602c3f700  0 -- 192.168.0.204:6800/4504 >> 192.168.0.7:0/9222 pipe(0x7f2680d7cc80 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f260ed53600).accept: got bad authorizer
2014-05-06 17:53:42.739958 7f2602c3f700  0 -- 192.168.0.204:6800/4504 >> 192.168.0.7:0/9222 pipe(0x7f268299fe00 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f268588b1e0).accept we reset (peer sent cseq 2), sending RESETSESSION
2014-05-06 18:16:05.409976 7f2603642700 -1 mds.0.166 *** got signal Terminated ***
2014-05-06 18:16:05.410014 7f2603642700  1 mds.0.166 suicide.  wanted down:dne, now up:active
2014-05-06 18:16:06.838411 7f044fadf7c0  0 ceph version 0.80-rc1 (6769f4dc88425396921f94e1a37a1c90758aa3ea), process ceph-mds, pid 28665
2014-05-06 18:16:07.315467 7f044a72d700  1 mds.-1.0 handle_mds_map standby

Files

sleep-recover.patch (445 Bytes) sleep-recover.patch Zheng Yan, 05/12/2014 07:42 PM
Actions #1

Updated by Dmitry Smirnov almost 10 years ago

Another MDS has no problem reconnecting to the client:

2014-05-06 18:16:09.441894 7f2ac5567700  1 mds.0.server reconnect_clients -- 3 sessions
2014-05-06 18:16:09.443429 7f2ac5567700  0 log [DBG] : reconnect by new client.2311588 192.168.0.7:0/9222 after 0.000021
Actions #2

Updated by Zheng Yan almost 10 years ago

  • Status changed from New to Resolved

The current master branch contains a fix for this issue (use the "kick_stale_sessions" admin socket command).

Actions #3

Updated by Dmitry Smirnov almost 10 years ago

Many thanks, Zheng. Looks like it is commit:09a1bc5.

Actions #4

Updated by Dmitry Smirnov almost 10 years ago

I'm sorry to say that commit:09a1bc5 did not fix it.
I cherry-picked it to 0.80, and here is how it looks in the MDS log:

2014-05-11 09:57:58.152628 7fe9f87b6700  0 auth: could not find secret_id=2895
2014-05-11 09:57:58.152634 7fe9f87b6700  0 cephx: verify_authorizer could not get service secret for service mds secret_id=2895
2014-05-11 09:57:58.152637 7fe9f87b6700  0 -- 192.168.0.2:6800/25496 >> 192.168.0.7:0/2446 pipe(0x7fea50d3db80 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0x7fea03f6a2c0).accept: got bad authorizer
2014-05-11 09:57:58.154039 7fe9f87b6700  0 auth: could not find secret_id=2895
2014-05-11 09:57:58.154045 7fe9f87b6700  0 cephx: verify_authorizer could not get service secret for service mds secret_id=2895
2014-05-11 09:57:58.154048 7fe9f87b6700  0 -- 192.168.0.2:6800/25496 >> 192.168.0.7:0/2446 pipe(0x7fea05538000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0x7fea03f6b340).accept: got bad authorizer
2014-05-11 09:57:58.355668 7fe9f87b6700  0 -- 192.168.0.2:6800/25496 >> 192.168.0.7:0/2446 pipe(0x7fea3a94ec80 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0x7fea04094580).accept connect_seq 2 vs existing 1 state standby
2014-05-11 09:57:58.759059 7fe9f87b6700  0 -- 192.168.0.2:6800/25496 >> 192.168.0.7:0/2446 pipe(0x7fea3a94e780 sd=21 :6800 s=0 pgs=0 cs=0 l=0 c=0x7fea040946e0).accept we reset (peer sent cseq 2), sending RESETSESSION
2014-05-11 13:32:00.837406 7fe9fc3c8700  0 mds.0.181 handle_mds_beacon no longer laggy
2014-05-11 18:03:13.986293 7fe9f9ac2700 -1 mds.0.181 *** got signal Terminated ***
2014-05-11 18:03:13.986323 7fe9f9ac2700  1 mds.0.181 suicide.  wanted down:dne, now up:active

At 18:03 I restarted the active MDS and the next "standby" one took over:

2014-05-11 18:03:15.420060 7fdd0aa3d700  1 mds.0.0 replay_done (as standby)
2014-05-11 18:03:16.380048 7fdd0aa3d700  1 mds.0.182 handle_mds_map i am now mds.0.182
2014-05-11 18:03:16.380053 7fdd0aa3d700  1 mds.0.182 handle_mds_map state change up:standby-replay --> up:replay
2014-05-11 18:03:16.420142 7fdd08938700  1 mds.0.182 standby_replay_restart (as standby)
2014-05-11 18:03:16.422594 7fdd0aa3d700  1 mds.0.182 replay_done (as standby)
2014-05-11 18:03:16.422598 7fdd0aa3d700  1 mds.0.182 standby_replay_restart (final takeover pass)
2014-05-11 18:03:16.424686 7fdd0aa3d700  1 mds.0.182 replay_done
2014-05-11 18:03:16.424687 7fdd0aa3d700  1 mds.0.182 making mds journal writeable
2014-05-11 18:03:21.566836 7fdd0aa3d700  1 mds.0.182 handle_mds_map i am now mds.0.182
2014-05-11 18:03:21.566843 7fdd0aa3d700  1 mds.0.182 handle_mds_map state change up:replay --> up:reconnect
2014-05-11 18:03:21.566846 7fdd0aa3d700  1 mds.0.182 reconnect_start
2014-05-11 18:03:21.566848 7fdd0aa3d700  1 mds.0.182 reopen_log
2014-05-11 18:03:21.568912 7fdd0aa3d700  1 mds.0.server reconnect_clients -- 4 sessions
2014-05-11 18:03:21.588278 7fdd0aa3d700  0 log [DBG] : reconnect by new client.2919908 192.168.0.7:0/2446 after 0.018920
2014-05-11 18:03:21.604682 7fdd0aa3d700  0 log [DBG] : reconnect by client.2976796 192.168.0.250:0/19436 after 0.035745
2014-05-11 18:03:21.819712 7fdd0aa3d700  0 log [DBG] : reconnect by client.2927732 192.168.0.204:0/4206 after 0.250776
2014-05-11 18:03:22.407688 7fdd0aa3d700  0 log [DBG] : reconnect by client.2802672 192.168.0.2:0/12050 after 0.838754
2014-05-11 18:03:24.823116 7fdd0aa3d700  0 log [DBG] : reconnect by client.2788926 192.168.0.6:0/2428 after 3.254022
2014-05-11 18:03:25.095932 7fdd0aa3d700  1 mds.0.182 reconnect_done
2014-05-11 18:03:25.345916 7fdd0aa3d700  1 mds.0.182 handle_mds_map i am now mds.0.182
2014-05-11 18:03:25.345923 7fdd0aa3d700  1 mds.0.182 handle_mds_map state change up:reconnect --> up:rejoin
2014-05-11 18:03:25.345925 7fdd0aa3d700  1 mds.0.182 rejoin_start
2014-05-11 18:03:25.404962 7fdd0aa3d700  1 mds.0.182 rejoin_joint_start
2014-05-11 18:03:26.557855 7fdd0aa3d700  1 mds.0.182 rejoin_done
2014-05-11 18:03:27.805918 7fdd0aa3d700  1 mds.0.182 handle_mds_map i am now mds.0.182
2014-05-11 18:03:27.805924 7fdd0aa3d700  1 mds.0.182 handle_mds_map state change up:rejoin --> up:active
2014-05-11 18:03:27.805934 7fdd0aa3d700  1 mds.0.182 recovery_done -- successful recovery!
2014-05-11 18:03:28.435315 7fdd0aa3d700  1 mds.0.182 active_start
2014-05-11 18:03:28.435961 7fdd0aa3d700  1 mds.0.182 cluster recovered.

As you can see, it managed to reconnect to the fuse client at 192.168.0.7, where the mount point started to respond again.

Actions #5

Updated by Zheng Yan almost 10 years ago

Several other commits that follow commit 09a1bc5 are also required. These commits do not make the fuse client recover automatically; you still need to use the admin socket to kick stale sessions. (This is because some metadata updates may get lost in this case. Actually, 'restarting the MDS makes the fuse client recover' is a bug: the MDS should not allow the client to reconnect.)

Actions #6

Updated by Dmitry Smirnov almost 10 years ago

  • Subject changed from 0.80~rc1: fuse client hangs after wake-up from suspend until restart of active MDS to 0.80: fuse client hangs after wake-up from suspend until restart of active MDS

Perhaps my initial confusion arose because only the above-mentioned commit was cherry-picked to the "firefly" branch.

Zheng, could you be more specific, please?
I applied commit:a0867dbdb0dfd014006b9852005bbada19973ed2 as merged from wip-client-sleep to 0.80 and easily reproduced the problem again.
We need to backport the fix to Firefly. Are you sure this problem is fixed?

I used the following command to squash the merge commit into a patch:

git log -p -m --first-parent -1 a0867dbdb0dfd014006b9852005bbada19973ed2 >client-sleep.patch
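As an aside, a merge commit can also be replayed directly with `git cherry-pick -m 1`, which applies the merge's combined diff against its first parent instead of hand-squashing `git log -p -m` output. A minimal throwaway demonstration (hypothetical repo and file names, not the actual Ceph tree):

```shell
# Sketch: backporting a merge commit with cherry-pick -m 1, in a scratch repo.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.com
git config user.name dev
main=$(git symbolic-ref --short HEAD)   # 'master' or 'main', depending on git version
echo base > file.txt
git add file.txt && git commit -qm 'base'
git checkout -qb topic
echo fix >> file.txt && git commit -qam 'fix'
git checkout -q "$main"
git merge --no-ff -q -m 'merge topic' topic
merge=$(git rev-parse HEAD)
git checkout -qb backport "$main~1"     # branch off the pre-merge commit
git cherry-pick -m 1 "$merge"           # -m 1: diff the merge against its first parent
tail -n 1 file.txt                      # the 'fix' change now exists on 'backport'
```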

In the context of wake-up from suspend, reconnecting to the MDS should be safe, as everything is supposed to be flushed before suspend.
We should allow clients to reconnect to the MDS on wake-up even if the active MDS changed during sleep.

Actions #7

Updated by Zheng Yan almost 10 years ago

Dmitry Smirnov wrote:

Perhaps my initial confusion was because only the above-mentioned commit was cherry-picked to "firefly" branch.

Zheng could you be more specific please?
I applied commit:a0867dbdb0dfd014006b9852005bbada19973ed2 as merged from wip-client-sleep to 0.80 and easily reproduced the problem again.
We need to backport fix to Firefly. Are you sure this problem is fixed?

I used the following command to squash merge-commit into patch

The patch doesn't make ceph-fuse recover automatically. You need to use the following command to recover a hung fuse client:

ceph --admin-daemon /var/run/ceph/ceph-fuse.asok kick_stale_sessions

In the context of wake up from suspend reconnecting to MDS should be safe as everything supposed to be flushed before suspend.
We should allow clients to reconnect to MDS on wake-up even if active MDS had changed during sleep.

It might be safe for the suspend/resume case. But the MDS can't distinguish it from other cases (such as someone freezing the ceph-fuse process).

Actions #8

Updated by Dmitry Smirnov almost 10 years ago

  • Status changed from Resolved to 12

Zheng Yan wrote:

The patch doesn't make ceph-fuse recover automatically. you need to use following command to recover hang fuse client

ceph --admin-daemon /var/run/ceph/ceph-fuse.asok kick_stale_sessions

No, it doesn't work (after wake-up), at least not from the client.
It just prints "{}".
If it worked I could probably run that command from a wake-up hook, but wouldn't that be ugly?

It might be safe for the suspend/resume case. But the MDS can't distinguish it from other cases (such as someone freeze the ceph-fuse process).

There must be a (safe) way to resume from suspend. It should be implemented.
After wake-up, a computer with CephFS mounted (using ceph-fuse) is unusable once the mount point is accessed.
I can't even shut down the system normally because nothing that has touched the ceph mount point responds.
Only a forced (lazy) unmount works if the mount point was open at the time of suspend.
This situation is very serious. What do you think we could do to fix it?

Perhaps we could just treat a suspended client as idle for a very long time (weeks) until wake-up?
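The wake-up hook mentioned above could be sketched as a systemd system-sleep script. This is a hypothetical configuration sketch: the file name, socket glob, and paths are assumptions, and per the report above the kick command alone may not suffice on a firefly-era build.

```shell
#!/bin/sh
# Hypothetical hook, e.g. /usr/lib/systemd/system-sleep/ceph-fuse-kick
# systemd invokes system-sleep hooks as: <hook> pre|post suspend|hibernate|hybrid-sleep
if [ "$1" = "post" ]; then
    # Socket path pattern is an assumption; match your ceph-fuse admin socket(s).
    for sock in /var/run/ceph/ceph-fuse.*.asok; do
        [ -S "$sock" ] || continue
        ceph --admin-daemon "$sock" kick_stale_sessions || true
    done
fi
```

This only illustrates where such a command could be wired in; it does not make the recovery any safer on the MDS side.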

Actions #9

Updated by Greg Farnum almost 10 years ago

A suspended client isn't participating in the cluster and gets booted out; if it has stale data it cannot rejoin the cluster safely, and that isn't something we can resolve. For now I'd recommend unmounting before suspend.

I'm not very familiar with how the kernel normally behaves around suspend, though. If it's flushing everything before actually going to sleep, we could perhaps have the client drop its caches and reconnect as a new client if it has no dirty buffers. Thoughts, Zheng?
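For reference, the point at which a silent client is evicted is governed by configurable session timeouts. A ceph.conf sketch (firefly-era option names; values shown are the defaults, and the 300 s autoclose matches the "closing stale session ... after 300" line in the original log):

```ini
[mds]
# Seconds without a cap renewal before a client session is considered stale
mds session timeout = 60
# Seconds after which a stale session is closed
mds session autoclose = 300
```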

Actions #10

Updated by Dmitry Smirnov almost 10 years ago

Greg Farnum wrote:

A suspended client isn't participating in the cluster and gets booted out; if it has stale data it cannot rejoin the cluster safely and that isn't something we can resolve.

Kicking out a client that has the file system mounted is very unsafe -- on wake-up the client can't even save other files to other file systems, because the UI hangs and applications don't respond.
A reset (i.e. an unclean shutdown) is hardly a safe way to recover from this situation.
Unexpectedly disconnected clients leave some files behind (e.g. LibreOffice lock files etc.).

We should always allow a client to rejoin, even after long periods of inactivity.
That's what NFS does, and perhaps other file systems as well.

Without knowing what the application is doing, it is not possible to decide on the server side whether it is "safe" or "unsafe".
If a client falls behind after a long period of inactivity, let's make it reload its cache etc., but don't just kick it out as if it did something wrong.
Let's be user friendly, and a bit more environmentally friendly, by allowing resume from suspend. :)

For now I'd recommend unmounting before suspend.

What if my /home is on CephFS?
Even if the user has enough rights to (un)mount, can we seriously recommend "close all files and unmount before suspend, or risk losing all your work on resume"?

I'm not very familiar with the kernel normally behaves around suspend, though — if it's flushing everything before actually going to sleep, we could perhaps have the client drop its caches and reconnect as a new client if it has no dirty buffers.

This looks like the right thing to do.
Don't take my word for it, but I think the suspend procedure syncs all file systems before changing power mode.
Actually, whatever the CephFS kernel client does looks like correct behaviour.
At least with CephFS mounted using the kernel client, I was able to resume from a short sleep successfully.
I'll try a longer suspend to see how it works...

Actions #11

Updated by Zheng Yan almost 10 years ago

Dmitry Smirnov wrote:

Greg Farnum wrote:

A suspended client isn't participating in the cluster and gets booted out; if it has stale data it cannot rejoin the cluster safely and that isn't something we can resolve.

Kicking client who have file system mounted is very unsafe -- on wake up client can't even save other files to other file systems because UI hangs and application don't respond.
Reset (i.e. unclean shutdown) is hardly a safe action to recover from this situation.
Unexpectedly disconnected clients leave some files behind (e.g. libreoffice lock files etc.).

We should always allow client to rejoin even after long periods of inactivity.
That's what NFS do and perhaps other file systems as well.

Without knowing what application is doing it is not possible to decide whether it is "safe" or "unsafe" on server side.
If client falls behind after long period of inactivity let's make it reload its cache etc. but don't just kick it out like if it did something wrong.
Let's be user friendly and a bit more environment friendly by allowing resume from suspend. :)

For now I'd recommend unmounting before suspend.

What if my /home is on CephFS?
Even if user have enough rights to (un)mount can we seriously recommend to "close all files and unmount before suspend or risk losing all your work on resume"?

I'm not very familiar with the kernel normally behaves around suspend, though — if it's flushing everything before actually going to sleep, we could perhaps have the client drop its caches and reconnect as a new client if it has no dirty buffers.

This looks like a right thing to do.
Don't take my word on it but I think suspend procedure sync all file systems before changing power mode.
Actually whatever CepFS kernel client does it looks like correct behaviour.
At least with CepFS mounted using kernel client I was able to resume from short sleep successfully.
I'll try longer suspend to see how it work...

The attached (one-line) patch can make the fuse client work as you expect. But you are at risk of losing dirty metadata without noticing.

Actions #12

Updated by Dmitry Smirnov almost 10 years ago

Zheng Yan wrote:

The attached (one line) patch can make fuse-client work as you expected.

Thanks, but I still can't make it work, sorry.
I applied your patch to v0.80.1, but on resume ceph-fuse did not respond, as usual.

It keeps logging the following to ceph-client.user.log:

2014-05-14 02:21:16.686178 7f79f08be700  0 -- 192.168.0.6:0/2046 submit_message client_session(request_renewcaps seq 142) v1 remote, 192.168.0.2:6802/20464, failed lossy con, dropping message 0x7f79cc00a250

Once again, the fuse client became operational only after a restart of the active MDS:

2014-05-14 02:28:13.682140 7f774155e700  0 log [DBG] : reconnect by new client.3216376 192.168.0.6:0/2046 after 0.023876

What is missing this time?

But you are at risk of losing dirty metadata unnoticeably.

That's OK. In the absence of a journal, integrity can't be guaranteed without knowing what the client is doing.

Actions #13

Updated by Dmitry Smirnov almost 10 years ago

  • Status changed from 12 to Resolved

Finally, with the following commits from "master":

  • commit:b8aa58af793bea4ed1a150ac5bf554fc894774f1
  • commit:70ab0793cc7d2d75ac942efb677a0b3297d3792a
  • commit:3e41f92b8439fcae33f412c7b833a7233e1faecb

and sleep-recover.patch applied to 0.80.1, the fuse client successfully re-established connectivity with the MDS after more than 5 hours in suspend:

From the active MDS's log:

2014-05-14 09:13:18.539094 7f1d17e1b700  0 auth: could not find secret_id=2971
2014-05-14 09:13:18.539108 7f1d17e1b700  0 cephx: verify_authorizer could not get service secret for service mds secret_id=2971
2014-05-14 09:13:18.539115 7f1d17e1b700  0 -- 192.168.0.250:6800/25664 >> 192.168.0.6:0/13127 pipe(0x7f1d32488a00 sd=32 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f1d24262000).accept: got bad authorizer
2014-05-14 09:13:18.541459 7f1d17e1b700  0 auth: could not find secret_id=2971
2014-05-14 09:13:18.541466 7f1d17e1b700  0 cephx: verify_authorizer could not get service secret for service mds secret_id=2971
2014-05-14 09:13:18.541469 7f1d17e1b700  0 -- 192.168.0.250:6800/25664 >> 192.168.0.6:0/13127 pipe(0x7f1d32489b80 sd=32 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f1d230251e0).accept: got bad authorizer
2014-05-14 09:13:18.742889 7f1d17e1b700  0 -- 192.168.0.250:6800/25664 >> 192.168.0.6:0/13127 pipe(0x7f1d2ec0f400 sd=32 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f1d230254a0).accept we reset (peer sent cseq 2), sending RESETSESSION
2014-05-14 09:13:18.743522 7f1d17e1b700  0 -- 192.168.0.250:6800/25664 >> 192.168.0.6:0/13127 pipe(0x7f1d2ec0f400 sd=32 :6800 s=2 pgs=9 cs=1 l=0 c=0x7f1d230254a0).fault with nothing to send, going to standby
2014-05-14 09:14:39.295933 7f1d158fd700  0 -- 192.168.0.250:6800/25664 >> 192.168.0.6:0/13127 pipe(0x7f1d2361e780 sd=33 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f1d23025080).accept connect_seq 0 vs existing 1 state standby
2014-05-14 09:14:39.295950 7f1d158fd700  0 -- 192.168.0.250:6800/25664 >> 192.168.0.6:0/13127 pipe(0x7f1d2361e780 sd=33 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f1d23025080).accept peer reset, then tried to connect to us, replacing

Thanks very much, Zheng Yan.

Now I think it would be nice to make the behaviour of ceph-fuse consistent with the kernel client (i.e. to restore the connection after suspend).

Actions #14

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
