Bug #45417

cephadm: nfs grace remove killed before completion

Added by Michael Fritch 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Category: cephadm (binary)
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

ganesha-rados-grace remove is killed before it completes.

The shutdown of the NFS container plus the grace remove (unit.poststop) takes longer than 15 seconds, so systemd times out the unit in the 'stop-post' state and terminates it before the grace remove has finished.

May 06 15:13:46 host1 bash[7542]: 06/05/2020 21:13:46 : epoch 5eb3287e : host1 : ganesha.nfsd-1[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
May 06 15:13:46 host1 bash[7542]: 06/05/2020 21:13:46 : epoch 5eb3287e : host1 : ganesha.nfsd-1[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
May 06 15:13:46 host1 bash[7542]: 06/05/2020 21:13:46 : epoch 5eb3287e : host1 : ganesha.nfsd-1[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
May 06 15:13:57 host1 systemd[1]: Stopping Ceph nfs.foo.host1 for 05b7b025-8a6d-43a6-bef2-701222ad75d7...
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :MAIN :EVENT :NFS EXIT: stopping NFS service
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :MAIN :EVENT :Stopping delayed executor.
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :MAIN :EVENT :Delayed executor stopped.
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :MAIN :EVENT :Stopping state asynchronous request thread
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :THREAD :EVENT :State asynchronous request system shut down.
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :MAIN :EVENT :Unregistering ports used by NFS service
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :MAIN :EVENT :Shutting down RPC services
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[svc_8] rpc :TIRPC :EVENT :svc_rqst_rec_destroy: sr_rec 0x56350da58760 evchan 2 ev_refcnt 0 epoll_fd 40 control fd pair (38:39) unhook failed (9)
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :MAIN :EVENT :Stopping worker threads
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :THREAD :EVENT :General fridge shut down.
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :THREAD :EVENT :Reaper thread shut down.
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :MAIN :EVENT :Removing all exports.
May 06 15:13:57 host1 bash[7542]: 06/05/2020 21:13:57 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :MAIN :EVENT :Removing all DSs.
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[0x7fc656ffd6d0] fridgethr_wake :THREAD :MAJ :Attempt to wake stopped/paused fridge reaper.
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :MAIN :EVENT :Destroying the FSAL system.
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] destroy_fsals :FSAL :EVENT :Shutting down handles for FSAL MDCACHE
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] destroy_fsals :FSAL :EVENT :Shutting down DS handles for FSAL MDCACHE
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] destroy_fsals :FSAL :EVENT :Shutting down exports for FSAL MDCACHE
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] destroy_fsals :FSAL :EVENT :Exports for FSAL MDCACHE shut down
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] destroy_fsals :FSAL :EVENT :Shutting down handles for FSAL PSEUDO
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] destroy_fsals :FSAL :EVENT :Shutting down DS handles for FSAL PSEUDO
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] destroy_fsals :FSAL :EVENT :Shutting down exports for FSAL PSEUDO
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] destroy_fsals :FSAL :EVENT :Exports for FSAL PSEUDO shut down
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[Admin] do_shutdown :MAIN :EVENT :FSAL system destroyed.
May 06 15:13:58 host1 bash[7542]: 06/05/2020 21:13:58 : epoch 5eb3287e : host1 : ganesha.nfsd-1[main] nfs_start :MAIN :EVENT :NFS EXIT: regular exit
May 06 15:14:00 host1 podman[7722]: 2020-05-06 15:14:00.799854248 -0600 MDT m=+3.072652256 container died 326a3c5b4ecd29818dd3fff31ca68062e3be6bdb8270673d4233075e391d9e01 (image=docker.io/ceph/daemon-base:latest-master-devel, name=ceph-05b7b025-8a6d-43a6-bef2-701222ad75d7-nfs.foo.host1)
May 06 15:14:02 host1 podman[7722]: 2020-05-06 15:14:02.68372362 -0600 MDT m=+4.956521660 container stop 326a3c5b4ecd29818dd3fff31ca68062e3be6bdb8270673d4233075e391d9e01 (image=docker.io/ceph/daemon-base:latest-master-devel, name=ceph-05b7b025-8a6d-43a6-bef2-701222ad75d7-nfs.foo.host1)
May 06 15:14:02 host1 podman[7722]: 326a3c5b4ecd29818dd3fff31ca68062e3be6bdb8270673d4233075e391d9e01
May 06 15:14:07 host1 podman[7542]: 2020-05-06 15:14:07.76693186 -0600 MDT m=+46.335048304 container remove 326a3c5b4ecd29818dd3fff31ca68062e3be6bdb8270673d4233075e391d9e01 (image=docker.io/ceph/daemon-base:latest-master-devel, name=ceph-05b7b025-8a6d-43a6-bef2-701222ad75d7-nfs.foo.host1)
May 06 15:14:11 host1 podman[7832]: 2020-05-06 15:14:11.33339705 -0600 MDT m=+8.595038930 container create 0701ddfe656a9ed04cdb47186a37e24cc25250e766786fb6c071d66c8c37886c (image=docker.io/ceph/daemon-base:latest-master-devel, name=ceph-05b7b025-8a6d-43a6-bef2-701222ad75d7-nfs.foo.host1-grace-remove)
May 06 15:14:14 host1 podman[7832]: 2020-05-06 15:14:14.075035135 -0600 MDT m=+11.336676988 container init 0701ddfe656a9ed04cdb47186a37e24cc25250e766786fb6c071d66c8c37886c (image=docker.io/ceph/daemon-base:latest-master-devel, name=ceph-05b7b025-8a6d-43a6-bef2-701222ad75d7-nfs.foo.host1-grace-remove)
May 06 15:14:15 host1 podman[7832]: 2020-05-06 15:14:15.016598392 -0600 MDT m=+12.278240262 container start 0701ddfe656a9ed04cdb47186a37e24cc25250e766786fb6c071d66c8c37886c (image=docker.io/ceph/daemon-base:latest-master-devel, name=ceph-05b7b025-8a6d-43a6-bef2-701222ad75d7-nfs.foo.host1-grace-remove)
May 06 15:14:15 host1 podman[7832]: 2020-05-06 15:14:15.016813883 -0600 MDT m=+12.278455737 container attach 0701ddfe656a9ed04cdb47186a37e24cc25250e766786fb6c071d66c8c37886c (image=docker.io/ceph/daemon-base:latest-master-devel, name=ceph-05b7b025-8a6d-43a6-bef2-701222ad75d7-nfs.foo.host1-grace-remove)
May 06 15:14:17 host1 systemd[1]: ceph-05b7b025-8a6d-43a6-bef2-701222ad75d7@nfs.foo.host1.service: State 'stop-post' timed out. Terminating.
May 06 15:14:17 host1 systemd[1]: ceph-05b7b025-8a6d-43a6-bef2-701222ad75d7@nfs.foo.host1.service: Failed with result 'timeout'.
May 06 15:14:17 host1 systemd[1]: Stopped Ceph nfs.foo.host1 for 05b7b025-8a6d-43a6-bef2-701222ad75d7.
May 06 15:14:19 host1 podman[7832]: 2020-05-06 15:14:19.927666126 -0600 MDT m=+17.189307983 container died 0701ddfe656a9ed04cdb47186a37e24cc25250e766786fb6c071d66c8c37886c (image=docker.io/ceph/daemon-base:latest-master-devel, name=ceph-05b7b025-8a6d-43a6-bef2-701222ad75d7-nfs.foo.host1-grace-remove)
May 06 15:14:29 host1 podman[7832]: 2020-05-06 15:14:29.249473218 -0600 MDT m=+26.511115095 container remove 0701ddfe656a9ed04cdb47186a37e24cc25250e766786fb6c071d66c8c37886c (image=docker.io/ceph/daemon-base:latest-master-devel, name=ceph-05b7b025-8a6d-43a6-bef2-701222ad75d7-nfs.foo.host1-grace-remove)
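
The timing in the journal excerpt above can be checked with a short script. This is only a sketch: it assumes that the m=+... value podman prints is the number of seconds since the podman process started (Go's monotonic clock reading), and the two log lines are abbreviated copies with the container ID and image name omitted. The names parse_podman_event, CREATE_LINE and DIED_LINE are made up for this illustration.

```python
import re
from datetime import datetime, timedelta

# Abbreviated copies of two podman[7832] lines from the journal excerpt
# (container ID and image name omitted for brevity).
CREATE_LINE = (
    "May 06 15:14:11 host1 podman[7832]: 2020-05-06 15:14:11.33339705 "
    "-0600 MDT m=+8.595038930 container create"
)
DIED_LINE = (
    "May 06 15:14:19 host1 podman[7832]: 2020-05-06 15:14:19.927666126 "
    "-0600 MDT m=+17.189307983 container died"
)

# systemd logged "State 'stop-post' timed out" at 15:14:17 (one-second resolution).
TIMEOUT_LOGGED = datetime(2020, 5, 6, 15, 14, 17)

PODMAN_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.(\d+) \S+ \S+ "
    r"m=\+([\d.]+) container (\w+)"
)


def parse_podman_event(line):
    """Return (event, wall-clock time, offset since process start)."""
    match = PODMAN_RE.search(line)
    if match is None:
        raise ValueError("not a podman container event: " + line)
    whole, frac, offset, event = match.groups()
    wallclock = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    wallclock += timedelta(seconds=float("0." + frac))
    return event, wallclock, timedelta(seconds=float(offset))


_, create_time, create_offset = parse_podman_event(CREATE_LINE)
_, died_time, died_offset = parse_podman_event(DIED_LINE)

# Assumption: subtracting the monotonic offset gives the moment the
# grace-remove podman process (the post-stop step) was started.
process_start = create_time - create_offset
seconds_until_timeout = (TIMEOUT_LOGGED - process_start).total_seconds()
seconds_until_exit = died_offset.total_seconds()

print("post-stop process started:", process_start.time())
print("seconds from start to systemd timeout:", round(seconds_until_timeout, 2))
print("seconds from start to container exit:", round(seconds_until_exit, 2))
```

Under that assumption, the grace-remove process started at about 15:14:02 and systemd reported the timeout roughly 14 to 15 seconds later (the journal timestamp only has one-second resolution), while the grace-remove container did not exit until about 17 seconds after the process started. That matches the description: the post-stop step needs slightly more time than systemd allowed.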

History

#1 Updated by Michael Fritch 2 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 34938

#2 Updated by Sebastian Wagner about 2 months ago

  • Status changed from Fix Under Review to Pending Backport

#3 Updated by Sebastian Wagner about 1 month ago

  • Status changed from Pending Backport to Resolved
  • Target version set to v15.2.4
