Project

General

Profile

Support #38374

Crash when using cephfs as /var/lib/docker in devicemapper mode

Added by Jérôme Poulin about 5 years ago. Updated over 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Tags:
Reviewed:
Affected Versions:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:

Description

It seems when Docker creates the first device with dmsetup, the MDS crashes and all further MDS replaying the log crash too.

Mounted with:

10.10.0.16,10.10.0.17,10.10.0.18:/gitlab/runner/docker on /var/lib/docker type ceph (rw,noatime,name=gitlabrunner,secret=<hidden>,acl,wsize=16777216,write_congestion_kb=127616)

The mount point has xattr set to use a different size=1 pool and an alternate namespace if it matters (same as Kubernetes does in another pool).

Thread 25 "fn_anonymous" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe07fc700 (LWP 32514)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51    ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fffede11801 in __GI_abort () at abort.c:79
#2  0x00007fffef401790 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>)
    at /home/jerome.poulin/GIT/ceph/src/common/assert.cc:66
#3  0x00007fffef401807 in ceph::__ceph_assert_fail (ctx=...) at /home/jerome.poulin/GIT/ceph/src/common/assert.cc:71
#4  0x00005555557ba700 in virtual thunk to C_IO_MDC_TruncateFinish::finish(int) ()
#5  0x00005555558da4e9 in Context::complete (r=-1, this=0x55555689a820) at /home/jerome.poulin/GIT/ceph/src/include/Context.h:77
#6  MDSIOContextBase::complete (this=0x55555689a820, r=-1) at /home/jerome.poulin/GIT/ceph/src/mds/MDSContext.cc:116
#7  0x00007fffef3ffbde in Finisher::finisher_thread_entry (this=0x5555564c3980) at /home/jerome.poulin/GIT/ceph/src/common/Finisher.cc:68
#8  0x00007fffeed0c6db in start_thread (arg=0x7fffe07fc700) at pthread_create.c:463
#9  0x00007fffedef288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

mds-crash.log View (174 KB) Jérôme Poulin, 02/19/2019 02:33 AM

mds-crash-no-assert.log View (175 KB) Jérôme Poulin, 02/19/2019 02:35 AM

History

#1 Updated by Jérôme Poulin about 5 years ago

If I remove the assert, the MDS switches to damaged.

#2 Updated by Jérôme Poulin about 5 years ago

  1. cephfs-journal-tool --rank=cephfs:0 journal inspect
    Overall journal integrity: OK
  1. Extracts from the journal (I have a backup) from the path I was working in when the crash happened.
  1. cephfs-journal-tool --rank=cephfs:0 event get --path=gitlab/runner list
    2019-02-18 19:45:28.531676 0x854c865 UPDATE: (rename)
    gitlabrunner
    gitlab/runner
    2019-02-18 19:45:40.996344 0x854d64e UPDATE: (set vxattr layout)
    gitlab/runner
    2019-02-18 19:46:33.766931 0x854e177 UPDATE: (unlink_local)
    stray4/10000004bd4
    gitlab/runner/test
    2019-02-18 19:46:39.930615 0x854f514 UPDATE: (openc)
    gitlab/runner/test
    2019-02-18 19:46:45.229092 0x8551b40 UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:46:55.948214 0x8552fae UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:01.952960 0x855a5d4 UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:04.235145 0x855ba1c UPDATE: (unlink_local)
    stray4/10000004bd5
    gitlab/runner/test
    2019-02-18 19:47:06.840181 0x855cdce UPDATE: (openc)
    gitlab/runner/test
    2019-02-18 19:47:07.970368 0x8564516 UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:09.133796 0x856b4ca UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:10.185168 0x8572c17 UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:11.206615 0x8579bcb UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:12.293469 0x8580b7f UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:13.380762 0x8587d70 UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:14.438138 0x858f29e UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:15.208945 0x8593838 UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:15.518060 0x8596905 UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:17.058621 0x859c987 UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:22.391426 0x859d676 UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:47:24.855424 0x859ef2c UPDATE: (unlink_local)
    stray5/10000004bd6
    gitlab/runner/test
    2019-02-18 19:51:14.147476 0x85a1263 UPDATE: (openc)
    gitlab/runner/a
    2019-02-18 19:51:17.511474 0x85a1f88 UPDATE: (openc)
    gitlab/runner/b
    2019-02-18 19:51:19.286667 0x85a2cad UPDATE: (cap update)
    gitlab/runner/a
    2019-02-18 19:51:21.678302 0x85a416d UPDATE: (openc)
    gitlab/runner/test
    2019-02-18 19:51:22.678282 0x85ac825 UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:51:23.747205 0x85b3817 UPDATE: (cap update)
    gitlab/runner/test
    2019-02-18 19:51:28.080793 0x85b4cdd UPDATE: (openc)
    gitlab/runner/test2
    2019-02-18 19:51:28.125241 0x85b693a UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 19:51:29.090955 0x85bf447 UPDATE: (cap update)
    gitlab/runner/test2
    2019-02-18 19:51:34.883792 0x85c2cf2 UPDATE: (cap update)
    gitlab/runner/test2
    2019-02-18 19:51:35.185037 0x85c39fa UPDATE: (set vxattr layout)
    gitlab/runner
    2019-02-18 19:51:54.281991 0x85c4de6 UPDATE: (openc)
    gitlab/runner/test3
    2019-02-18 19:51:55.333816 0x85ce87d UPDATE: (cap update)
    gitlab/runner/test3
    2019-02-18 19:51:59.756196 0x85d0d71 UPDATE: (cap update)
    gitlab/runner/test3
    2019-02-18 19:52:06.118409 0x85d777e UPDATE: (cap update)
    gitlab/runner/test3
    2019-02-18 19:52:14.651556 0x85d84c4 UPDATE: (unlink_local)
    stray5/10000004bd7
    gitlab/runner/a
    2019-02-18 19:52:37.381793 0x85de4af UPDATE: (openc)
    gitlab/runner/test3
    2019-02-18 19:52:43.686823 0x85e67c3 UPDATE: (cap update)
    gitlab/runner/test3
    2019-02-18 19:53:42.204744 0x85e81fd UPDATE: (set vxattr layout)
    gitlab/runner
    2019-02-18 20:06:07.402970 0x85e95aa UPDATE: (mkdir)
    gitlab/runner/docker-overlay
    2019-02-18 20:06:32.120432 0x85ea572 UPDATE: (cap update)
    gitlab/runner/docker-overlay
    2019-02-18 20:06:38.104444 0x85f9655 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:06:38.108086 0x85fab44 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:10:48.352505 0x85fd2a0 UPDATE: (mkdir)
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b
    2019-02-18 20:10:49.446393 0x86c1e85 UPDATE: (mkdir)
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b/diff/srv
    2019-02-18 20:10:50.001303 0x87774fd UPDATE: (symlink)

    2019-02-18 20:10:50.809114 0x8799218 UPDATE: (cap update)
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b/diff/lib
    2019-02-18 20:10:51.933395 0x87a604c UPDATE: (mkdir)
    gitlab/runner/docker-overlay/b357688e9abb3115181ef013da5bc957f26413a0d49321f3efa9b63f5db30632-init
    2019-02-18 20:10:53.558078 0x87ab89b UPDATE: (cap update)
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b/link
    2019-02-18 20:10:58.140472 0x87d2712 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b
    2019-02-18 20:11:08.118136 0x87ffffb UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:11:08.118515 0x8801fbe UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b/diff/usr
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b/diff/var
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b/diff/lib
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b/diff/etc
    2019-02-18 20:11:08.142698 0x8803fb0 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:11:18.115240 0x8806a57 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:11:28.108013 0x8808352 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:11:57.041720 0x88099d9 UPDATE: (cap update)
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b/link
    2019-02-18 20:12:03.108687 0x882db45 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b
    2019-02-18 20:12:13.121422 0x8837674 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:12:23.109173 0x8838b21 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:21:56.104180 0x8839c04 UPDATE: (unlink_local)
    stray7/100000051bc
    gitlab/runner/docker-overlay/l/VVZIYIW4ENZD5I2S7TFDWOL4U7
    2019-02-18 20:21:57.728785 0x89403c8 UPDATE: (unlink_local)
    stray7/100000050d0
    gitlab/runner/docker-overlay/a9085a016970f6d7015ddf2224a317b371d7334e235e4b424f41c39e7f16788b/diff/usr
    2019-02-18 20:22:03.169610 0x89d0ee1 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:22:03.255098 0x8a1a1f6 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:23:16.469673 0x8a1b746 UPDATE: (mkdir)
    gitlab/runner/docker-overlay/mnt
    2019-02-18 20:23:23.121150 0x8a1da85 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:23:27.760089 0x8a1e5c4 UPDATE: (mkdir)
    gitlab/runner/docker-overlay/mnt/ae64b34f260e5cb422ea2d9e5c36fba0dbc3e203ca58f30288bfa0e289ddae54
    2019-02-18 20:23:28.770354 0x8ae31a8 UPDATE: (mkdir)
    gitlab/runner/docker-overlay/diff/ae64b34f260e5cb422ea2d9e5c36fba0dbc3e203ca58f30288bfa0e289ddae54/srv
    2019-02-18 20:23:29.785204 0x8bb1ed6 UPDATE: (cap update)
    gitlab/runner/docker-overlay/diff/ae64b34f260e5cb422ea2d9e5c36fba0dbc3e203ca58f30288bfa0e289ddae54/bin
    2019-02-18 20:23:30.150564 0x8bbc5d2 UPDATE: (cap update)
    gitlab/runner/docker-overlay/diff/ae64b34f260e5cb422ea2d9e5c36fba0dbc3e203ca58f30288bfa0e289ddae54/usr/bin
    2019-02-18 20:23:30.984902 0x8bc696c UPDATE: (mkdir)
    gitlab/runner/docker-overlay/mnt/7a2f70c5a119d13bd7dbd640abe1c41cb93b07ab7a7fe75c0d05d0eb1e059414-init
    2019-02-18 20:23:33.122864 0x8c05ad7 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:23:33.159578 0x8c06cd1 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay/diff/ae64b34f260e5cb422ea2d9e5c36fba0dbc3e203ca58f30288bfa0e289ddae54/etc/apk
    2019-02-18 20:23:38.121445 0x8c0c4a0 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay/diff/ae64b34f260e5cb422ea2d9e5c36fba0dbc3e203ca58f30288bfa0e289ddae54/etc/network
    2019-02-18 20:23:43.136204 0x8c169e6 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:23:48.121716 0x8c17ea4 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay/diff/ae64b34f260e5cb422ea2d9e5c36fba0dbc3e203ca58f30288bfa0e289ddae54
    2019-02-18 20:23:58.121709 0x8c1bc01 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:24:34.042831 0x8c1f341 UPDATE: (cap update)
    gitlab/runner/docker-overlay/layers/ae64b34f260e5cb422ea2d9e5c36fba0dbc3e203ca58f30288bfa0e289ddae54
    2019-02-18 20:24:43.122575 0x8c44754 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:24:53.237987 0x8c4e00c UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:25:03.122986 0x8c4fc49 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:25:03.139734 0x8c51107 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:25:10.115273 0x8c51af6 UPDATE: (mkdir)
    gitlab/runner/docker-overlay/mnt/88c89f6710acf974d13b104fa14040957b9dae4d51435046123155048a962193-init
    2019-02-18 20:25:18.123399 0x8c5a909 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:25:57.962784 0x8c5d610 UPDATE: (mknod)
    gitlab/runner/docker-overlay/asd
    2019-02-18 20:26:02.786677 0x8c5e679 UPDATE: (unlink_local)
    stray0/10000004dd7
    gitlab/runner/docker-overlay/asd
    2019-02-18 20:26:11.782457 0x8c60035 UPDATE: (mkdir)
    gitlab/runner/docker-overlay/mnt/6dda0db20ce5d96f7daba4433f710dfe5a2038a6c34f383f594d7012b590af6f-init
    2019-02-18 20:26:18.124494 0x8c66e86 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:26:31.794069 0x8c69745 UPDATE: (cap update)
    gitlab/runner/docker-overlay/diff/88c89f6710acf974d13b104fa14040957b9dae4d51435046123155048a962193-init/.wh..wh.aufs
    2019-02-18 20:26:38.124624 0x8c6b5f1 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:27:14.033845 0x8c6c46f UPDATE: (cap update)
    gitlab/runner/docker-overlay/diff/6dda0db20ce5d96f7daba4433f710dfe5a2038a6c34f383f594d7012b590af6f-init/.wh..wh.aufs
    2019-02-18 20:27:23.125956 0x8c6e31b UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:28:45.780342 0x8c6f199 UPDATE: (mkdir)
    gitlab/runner/docker-overlay/mnt/ac4f40d4a6c9f5350709bc2bfc2d262e211a313881c3bdfa81f189b5f902e686-init
    2019-02-18 20:28:53.127384 0x8c77d45 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:29:48.914015 0x8c7aca3 UPDATE: (cap update)
    gitlab/runner/docker-overlay/diff/ac4f40d4a6c9f5350709bc2bfc2d262e211a313881c3bdfa81f189b5f902e686-init/.wh..wh.aufs
    2019-02-18 20:29:58.128196 0x8c7cb4f UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:33:29.358825 0x8c7db9b UPDATE: (mkdir)
    gitlab/runner/docker-overlay/mnt/3339c43224ede83ea9d5683623ffa2c758d24680915208736883790e3358fa49
    2019-02-18 20:33:30.416385 0x8d05356 UPDATE: (mkdir)
    gitlab/runner/docker-overlay/diff/3339c43224ede83ea9d5683623ffa2c758d24680915208736883790e3358fa49/media
    2019-02-18 20:33:31.423845 0x8dfa632 UPDATE: (setattr)

    2019-02-18 20:33:31.525452 0x8e059c5 UPDATE: (mkdir)
    gitlab/runner/docker-overlay/diff/3339c43224ede83ea9d5683623ffa2c758d24680915208736883790e3358fa49/var
    2019-02-18 20:33:32.995931 0x8e26c46 UPDATE: (mkdir)
    gitlab/runner/docker-overlay/mnt/ac6c2a8a5cd35e67068b716a6250a79c6724a5579143e7237ba5999ff9520cff-init
    2019-02-18 20:33:38.132214 0x8e69df5 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay
    2019-02-18 20:33:44.221921 0x8e7592f UPDATE: (mkdir)
    gitlab/runner/docker-overlay/mnt/7c304afacac119c99dc690a5f79bc9ce3cd11ec3afecb52a39c6d9fb07f8c7b5-init
    2019-02-18 20:33:48.162216 0x8e7fd4b UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:33:58.132281 0x8e82aa0 UPDATE: (scatter_writebehind)
    gitlab/runner/docker-overlay/diff/3339c43224ede83ea9d5683623ffa2c758d24680915208736883790e3358fa49/usr
    2019-02-18 20:34:56.325813 0x8e99c13 UPDATE: (mkdir)
    gitlab/runner/docker
    2019-02-18 20:35:11.296275 0x8e9abab UPDATE: (setattr)
    gitlab/runner/docker
    2019-02-18 20:35:13.168673 0x8ebe7e1 UPDATE: (unlink_local)
    stray1/10000005207
    gitlab/runner/docker/tmp/docker-aufs-union967015101
    2019-02-18 20:35:18.133745 0x8ec8e01 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:35:18.223818 0x8ecd5a1 UPDATE: (scatter_writebehind)
    gitlab/runner/docker
    2019-02-18 20:35:20.115717 0x8ed1ab5 UPDATE: (openc)
    gitlab/runner/docker/tmp/GetImageBlob600375544
    2019-02-18 20:35:21.791506 0x8f51801 UPDATE: (mkdir)
    gitlab/runner/docker/aufs/diff/efa7dc80d100e58e3929eefcfd3099b5629259ae1a71259fc9665c442d9ad00e/home
    2019-02-18 20:35:22.275900 0x8fda69e UPDATE: (setattr)
    gitlab/runner/docker/aufs/diff/efa7dc80d100e58e3929eefcfd3099b5629259ae1a71259fc9665c442d9ad00e/usr/bin/nmeter
    2019-02-18 20:35:22.321780 0x9001601 UPDATE: (symlink)
    gitlab/runner/docker/aufs/diff/efa7dc80d100e58e3929eefcfd3099b5629259ae1a71259fc9665c442d9ad00e/usr/bin/test
    2019-02-18 20:35:22.911920 0x906ccc6 UPDATE: (cap update)
    gitlab/runner/docker/aufs/diff/efa7dc80d100e58e3929eefcfd3099b5629259ae1a71259fc9665c442d9ad00e/bin
    2019-02-18 20:35:28.134532 0x90de12a UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:35:28.173152 0x90e80a2 UPDATE: (scatter_writebehind)
    gitlab/runner/docker
    2019-02-18 20:35:33.134324 0x90eea41 UPDATE: (scatter_writebehind)
    gitlab/runner/docker/image/aufs
    2019-02-18 20:35:38.133824 0x90f3aec UPDATE: (scatter_writebehind)
    gitlab/runner/docker/aufs/diff/efa7dc80d100e58e3929eefcfd3099b5629259ae1a71259fc9665c442d9ad00e/usr/share/apk
    2019-02-18 20:35:38.218252 0x90f6537 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:35:38.230131 0x90f78d1 UPDATE: (scatter_writebehind)
    gitlab/runner/docker
    2019-02-18 20:35:42.844970 0x90f8e8c UPDATE: (unlink_local)
    stray3/10000005216
    gitlab/runner/docker/aufs/mnt/efa7dc80d100e58e3929eefcfd3099b5629259ae1a71259fc9665c442d9ad00e
    2019-02-18 20:35:44.345391 0x9283020 UPDATE: (unlink_local)
    stray3/100000051f2
    gitlab/runner/docker/aufs
    2019-02-18 20:35:44.363010 0x928554a UPDATE: (unlink_local)
    stray4/1000000520c
    gitlab/runner/docker/builder
    2019-02-18 20:35:48.158769 0x92ba53e UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:35:53.134789 0x930c459 UPDATE: (scatter_writebehind)
    gitlab/runner/docker
    2019-02-18 20:35:59.747461 0x930d96e UPDATE: (mkdir)
    gitlab/runner/docker/containerd
    2019-02-18 20:36:02.864433 0x933d5bf UPDATE: (openc)
    gitlab/runner/docker/tmp/GetImageBlob234906285
    2019-02-18 20:36:04.210918 0x93ca380 UPDATE: (unlink_local)
    stray4/10000005452
    gitlab/runner/docker/tmp/GetImageBlob234906285
    2019-02-18 20:36:04.287379 0x93d0372 UPDATE: (mkdir)
    gitlab/runner/docker/overlay2/1f0cfd912d4a4239ca0fda4d4706c67fe5da304b2099f6beb03de6566aee755d/diff/run
    2019-02-18 20:36:04.367393 0x94011ab UPDATE: (setattr)
    gitlab/runner/docker/overlay2/1f0cfd912d4a4239ca0fda4d4706c67fe5da304b2099f6beb03de6566aee755d/diff/sbin/slattach
    2019-02-18 20:36:06.006817 0x94da29e UPDATE: (mkdir)
    gitlab/runner/docker/overlay2/1f0cfd912d4a4239ca0fda4d4706c67fe5da304b2099f6beb03de6566aee755d/diff/usr/share/apk/keys/aarch64
    2019-02-18 20:36:07.245192 0x94f695a UPDATE: (mkdir)
    gitlab/runner/docker/overlay2/1f0cfd912d4a4239ca0fda4d4706c67fe5da304b2099f6beb03de6566aee755d/diff/var/log
    2019-02-18 20:36:08.136844 0x9515674 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:36:08.238070 0x951b67d UPDATE: (scatter_writebehind)
    gitlab/runner/docker
    2019-02-18 20:36:10.615324 0x952c5a4 UPDATE: (cap update)
    gitlab/runner/docker/overlay2/1f0cfd912d4a4239ca0fda4d4706c67fe5da304b2099f6beb03de6566aee755d/diff/var
    2019-02-18 20:36:11.625945 0x952e740 UPDATE: (cap update)
    gitlab/runner/docker/overlay2/1f0cfd912d4a4239ca0fda4d4706c67fe5da304b2099f6beb03de6566aee755d/diff/var/lib
    2019-02-18 20:36:13.000480 0x9535cc6 UPDATE: (mkdir)
    gitlab/runner/docker/image/overlay2/layerdb/sha256
    2019-02-18 20:36:18.134700 0x95554ea UPDATE: (scatter_writebehind)
    gitlab/runner/docker/image
    2019-02-18 20:36:18.204451 0x955c100 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:36:23.134590 0x955d4d8 UPDATE: (scatter_writebehind)
    gitlab/runner/docker/overlay2/1f0cfd912d4a4239ca0fda4d4706c67fe5da304b2099f6beb03de6566aee755d/diff
    2019-02-18 20:36:28.134716 0x9563938 UPDATE: (scatter_writebehind)
    gitlab/runner/docker/image/overlay2
    2019-02-18 20:36:33.134524 0x9566ade UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:36:33.155117 0x9568b4e UPDATE: (scatter_writebehind)
    gitlab/runner/docker
    2019-02-18 20:36:36.726005 0x956a10d UPDATE: (cap update)

    2019-02-18 20:36:39.793937 0x956fa24 UPDATE: (cap update)
    gitlab/runner/docker/overlay2/1f0cfd912d4a4239ca0fda4d4706c67fe5da304b2099f6beb03de6566aee755d/diff/usr/share/misc
    2019-02-18 20:36:43.163758 0x95737d1 UPDATE: (scatter_writebehind)
    gitlab/runner/docker/overlay2
    2019-02-18 20:36:44.913740 0x9575112 UPDATE: (cap update)
    gitlab/runner/docker/overlay2/1f0cfd912d4a4239ca0fda4d4706c67fe5da304b2099f6beb03de6566aee755d/diff/var/tmp
    2019-02-18 20:36:46.969416 0x957b7ca UPDATE: (cap update)
    gitlab/runner/docker/overlay2/b734edaaacf05891a3a3fdfb50c468e0c225769fdd976ded04f339ced8fd1434-init/link
    2019-02-18 20:36:48.135301 0x957d69b UPDATE: (scatter_writebehind)
    gitlab/runner/docker/overlay2/1f0cfd912d4a4239ca0fda4d4706c67fe5da304b2099f6beb03de6566aee755d/diff/usr
    2019-02-18 20:36:50.033693 0x957f224 UPDATE: (cap update)
    gitlab/runner/docker/image/overlay2/repositories.json
    2019-02-18 20:36:53.160268 0x9584853 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:36:58.134992 0x9587805 UPDATE: (scatter_writebehind)
    gitlab/runner/docker/image/overlay2
    2019-02-18 20:37:03.135255 0x958eecd UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:37:04.925121 0x958fa0c UPDATE: (cap update)
    gitlab/runner/docker/volumes/metadata.db
    2019-02-18 20:37:05.931614 0x9591b88 UPDATE: (cap update)
    gitlab/runner/docker/buildkit/cache.db
    2019-02-18 20:37:07.089513 0x9594529 UPDATE: (unlink_local)
    stray6/1000000544a
    gitlab/runner/docker/builder/fscache.db
    2019-02-18 20:37:08.136556 0x9619c7b UPDATE: (scatter_writebehind)
    gitlab/runner/docker
    2019-02-18 20:37:08.367465 0x964a737 UPDATE: (unlink_local)
    stray7/100000055f7
    gitlab/runner/docker/overlay2/1f0cfd912d4a4239ca0fda4d4706c67fe5da304b2099f6beb03de6566aee755d/diff/usr/sbin/chpasswd
    2019-02-18 20:37:13.209891 0x97a4274 UPDATE: (scatter_writebehind)
    gitlab/runner
    2019-02-18 20:37:14.363680 0x97a4db3 UPDATE: (mkdir)
    gitlab/runner/docker/containerd
    2019-02-18 20:37:15.712271 0x97c1b9c UPDATE: (cap update)
    gitlab/runner/docker/devicemapper/devicemapper/metadata
    2019-02-18 20:37:16.836052 0x97d5822 UPDATE: (mkdir)
    gitlab/runner/docker/swarm
    2019-02-18 20:37:18.136272 0x97dbadc UPDATE: (scatter_writebehind)
    gitlab/runner/docker
    2019-02-18 20:37:19.184364 0x97dd505 UPDATE: (cap update)
    gitlab/runner/docker/devicemapper/devicemapper/metadata
    2019-02-18 20:37:20.392242 0x97df3ba UPDATE: (cap update)
    gitlab/runner/docker/tmp/GetImageBlob071814131
    2019-02-18 20:37:20.926810 0x97edbf4 UPDATE: (cap update)
    gitlab/runner/docker/devicemapper/devicemapper/metadata
    2019-02-18 20:37:21.141452 0x9801cbd UPDATE: (openc)
    gitlab/runner/docker/image/devicemapper/.tmp-repositories.json101563899
    2019-02-18 20:37:21.478299 0x9815d01 UPDATE: (rename)
    gitlab/runner/docker/devicemapper/metadata/a549e8eb9e9b0ad4c9966466681f1ab496e27815540b6135ade9053ddb814918
    gitlab/runner/docker/devicemapper/metadata/.tmp500329311
    2019-02-18 20:40:12.633291 0x9831f3e UPDATE: (cap update)
    gitlab/runner/docker/image/devicemapper/layerdb/mounts/3abdac5565b0758d959c4332dd1f36234ab7c9ec6c6ba48978697e6cbb60423d/init-id
    2019-02-18 20:40:12.633989 0x9834224 UPDATE: (cap update)
    gitlab/runner/docker/image/devicemapper/repositories.json
  1. cephfs-journal-tool --rank=cephfs:0 event recover_dentries --path=gitlab/runner summary
    Events by type:
    UPDATE: 171
    Errors: 0

Since I didn't have any recent changes I cared about, I did reset the journal and it still failes:

  1. ceph-mds --setuser ceph --setgroup ceph i sg1vosrv08 -d
    2019-02-18 22:36:44.904 7fb40f1ce440 0 set uid:gid to 64045:64045 (ceph:ceph)
    2019-02-18 22:36:44.904 7fb40f1ce440 0 ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable), process ceph-mds, pid 1954715
    2019-02-18 22:36:44.908 7fb40f1ce440 0 pidfile_write: ignore empty --pid-file
    starting mds.sg1vosrv08 at -
    2019-02-18 22:36:44.912 7fb3fe3df700 1 mds.sg1vosrv08 Updating MDS map to version 395 from mon.0
    2019-02-18 22:36:45.148 7fb3fe3df700 1 mds.sg1vosrv08 Updating MDS map to version 396 from mon.0
    2019-02-18 22:36:45.152 7fb3fe3df700 1 mds.sg1vosrv08 Map has assigned me to become a standby
    2019-02-18 22:36:45.168 7fb3fe3df700 1 mds.sg1vosrv08 Updating MDS map to version 397 from mon.0
    2019-02-18 22:36:45.168 7fb3fe3df700 1 mds.0.397 handle_mds_map i am now mds.0.397
    2019-02-18 22:36:45.168 7fb3fe3df700 1 mds.0.397 handle_mds_map state change up:boot --> up:replay
    2019-02-18 22:36:45.168 7fb3fe3df700 1 mds.0.397 replay_start
    2019-02-18 22:36:45.168 7fb3fe3df700 1 mds.0.397 recovery set is
    2019-02-18 22:36:45.180 7fb3f83d3700 0 mds.0.cache creating system inode with ino:0x100
    2019-02-18 22:36:45.180 7fb3f83d3700 0 mds.0.cache creating system inode with ino:0x1
    2019-02-18 22:36:45.184 7fb3f6bd0700 1 mds.0.journal EResetJournal
    2019-02-18 22:36:45.184 7fb3f6bd0700 1 mds.0.sessionmap wipe start
    2019-02-18 22:36:45.184 7fb3f6bd0700 1 mds.0.sessionmap wipe result
    2019-02-18 22:36:45.184 7fb3f6bd0700 1 mds.0.sessionmap wipe done
    2019-02-18 22:36:45.184 7fb3f6bd0700 1 mds.0.397 replay_done
    2019-02-18 22:36:45.184 7fb3f6bd0700 1 mds.0.397 making mds journal writeable
    2019-02-18 22:36:46.196 7fb3fe3df700 1 mds.sg1vosrv08 Updating MDS map to version 398 from mon.0
    2019-02-18 22:36:46.196 7fb3fe3df700 1 mds.0.397 handle_mds_map i am now mds.0.397
    2019-02-18 22:36:46.196 7fb3fe3df700 1 mds.0.397 handle_mds_map state change up:replay --> up:reconnect
    2019-02-18 22:36:46.196 7fb3fe3df700 1 mds.0.397 reconnect_start
    2019-02-18 22:36:46.196 7fb3fe3df700 1 mds.0.397 reopen_log
    2019-02-18 22:36:46.196 7fb3fe3df700 1 mds.0.397 reconnect_done
    2019-02-18 22:36:46.200 7fb3fe3df700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
    2019-02-18 22:36:46.200 7fb3fe3df700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:reconnect) from client.16348125 10.10.181.16:0/4010683472 after 2019-02-18 22:36:46.205310 (allowed interval 45)
    2019-02-18 22:36:46.204 7fb400be4700 0 -
    10.10.0.16:6821/4246809671 >> 10.10.181.16:0/4010683472 conn(0x55a159a9e000 :6821 s=STATE_OPEN pgs=150 cs=1 l=0).fault server, going to standby
    2019-02-18 22:36:47.216 7fb3fe3df700 1 mds.sg1vosrv08 Updating MDS map to version 399 from mon.0
    2019-02-18 22:36:47.216 7fb3fe3df700 1 mds.0.397 handle_mds_map i am now mds.0.397
    2019-02-18 22:36:47.216 7fb3fe3df700 1 mds.0.397 handle_mds_map state change up:reconnect --> up:rejoin
    2019-02-18 22:36:47.216 7fb3fe3df700 1 mds.0.397 rejoin_start
    2019-02-18 22:36:47.216 7fb3fe3df700 1 mds.0.397 rejoin_joint_start
    2019-02-18 22:36:47.220 7fb3f83d3700 0 mds.0.cache.dir(0x1) _fetched badness: got (but i already had) [inode 0x10000000004 [2,head] /Win10_1703_English_x64.iso auth v68 s=4334315520 n(v0 b4334315520 1=1+0) (iversion lock) 0x55a159bda700] mode 33188 mtime 2018-04-12 10:07:10.123469
    2019-02-18 22:36:47.220 7fb3f83d3700 1 log_channel(cluster) log [ERR] : loaded dup inode 0x10000000004 [2,head] v62 at /.Win10_1703_English_x64.iso.UN6JBY, but inode 0x10000000004.head v68 already exists at /Win10_1703_English_x64.iso
    2019-02-18 22:36:47.220 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x100000047ea
    2019-02-18 22:36:47.220 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10000000005
    2019-02-18 22:36:47.224 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10000000006
    2019-02-18 22:36:47.224 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10000003062
    2019-02-18 22:36:47.224 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10000003066
    2019-02-18 22:36:47.224 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10000003067
    2019-02-18 22:36:47.224 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10000989683
    2019-02-18 22:36:47.224 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10000989687
    2019-02-18 22:36:47.232 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10000003071
    2019-02-18 22:36:47.236 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10000003076
    2019-02-18 22:36:47.236 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10000003074
    2019-02-18 22:36:47.236 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10000003072
    2019-02-18 22:36:47.236 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x100000047e9
    2019-02-18 22:36:47.236 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000098968a
    2019-02-18 22:36:47.240 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000098968b
    2019-02-18 22:36:47.240 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000098968d
    2019-02-18 22:36:47.240 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000098968c
    2019-02-18 22:36:47.244 7fb3f9bd6700 1 mds.0.397 rejoin_done
    2019-02-18 22:36:48.264 7fb3fe3df700 1 mds.sg1vosrv08 Updating MDS map to version 400 from mon.0
    2019-02-18 22:36:48.264 7fb3fe3df700 1 mds.0.397 handle_mds_map i am now mds.0.397
    2019-02-18 22:36:48.264 7fb3fe3df700 1 mds.0.397 handle_mds_map state change up:rejoin --> up:active
    2019-02-18 22:36:48.264 7fb3fe3df700 1 mds.0.397 recovery_done -
    successful recovery!
    2019-02-18 22:36:48.264 7fb3fe3df700 1 mds.0.397 active_start
    2019-02-18 22:36:48.268 7fb3fe3df700 1 mds.0.397 cluster recovered.
    2019-02-18 22:36:48.268 7fb400be4700 0 -- 10.10.0.16:6821/4246809671 >> 10.10.181.16:0/4010683472 conn(0x55a159e8ca00 :6821 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing
    2019-02-18 22:36:48.272 7fb3f83d3700 0 mds.0 RecoveryQueue::_recovered recovery error! -1
    2019-02-18 22:36:48.272 7fb3f83d3700 -1 log_channel(cluster) log [ERR] : OSD read error while recovering size for inode 0x10000004bdc
    2019-02-18 22:36:48.272 7fb3f83d3700 1 mds.sg1vosrv08 respawn!

#3 Updated by Jérôme Poulin about 5 years ago

I was able to mount the volume after a couple of asserts removed and backup/restore data in a new filesystem. All procedures presented in "Disaster recovery" didn't succeed. I have a RADOS backup of meta/data/journal of the FS if needed.

#4 Updated by Patrick Donnelly about 5 years ago

  • Tracker changed from Bug to Support
  • Description updated (diff)
  • Start date deleted (02/19/2019)
  • Affected Versions deleted (v13.2.4)

What the assert is indicating that the file could not be truncated due to RADOS errors (probably). The current status of your cluster would be helpful.

The docker aspects of this ticket are probably not relevant.

#5 Updated by Jérôme Poulin about 5 years ago

All PGs were active+clean, the only indication was that the filesystem was failed and MDS were all missing.

#6 Updated by Zheng Yan about 5 years ago

restart mds with debug_ms=1. I am curious what error OSD returned

#7 Updated by Jérôme Poulin about 5 years ago

I restored the backup of the meta and data pool for the CephFS on another cluster but I'm not able to attach a filesystem without clearing the metadata pool. If you have any trick to fool the filesystem to use this metadata pool instead, I can run the MDS again.

2019-02-22 14:36:10.817508 7f8ae2540700 1 mds.0.740 replay_start
2019-02-22 14:36:10.817511 7f8ae2540700 1 mds.0.740 recovery set is
2019-02-22 14:36:10.817515 7f8ae2540700 1 mds.0.740 waiting for osdmap 3305 (which blacklists prior instance)
2019-02-22 14:36:10.848348 7f8add536700 -1 mds.0.journaler.pq(ro) may got older pool id from header layout
  • Caught signal (Aborted)
    in thread 7f8add536700 thread_name:PQ_Finisher
    ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
    1: (()+0x5b9444) [0x55f0a039c444]
    2: (()+0x11390) [0x7f8ae77fd390]
    3: (gsignal()+0x38) [0x7f8ae678d428]
    4: (abort()+0x16a) [0x7f8ae678f02a]
    5: (Journaler::_set_layout(file_layout_t const*)+0x192) [0x55f0a0347cf2]
    6: (Journaler::_finish_read_head(int, ceph::buffer::list&)+0x7db) [0x55f0a034bd6b]
    7: (Context::complete(int)+0x9) [0x55f0a0084089]
    8: (Finisher::finisher_thread_entry()+0x460) [0x55f0a03e1550]
    9: (()+0x76ba) [0x7f8ae77f36ba]
    10: (clone()+0x6d) [0x7f8ae685f41d]
    2019-02-22 14:36:10.850120 7f8add536700 -1
    Caught signal (Aborted) *
    in thread 7f8add536700 thread_name:PQ_Finisher

#8 Updated by Jérôme Poulin over 1 year ago

We're not using this kind of setup anymore and won't be troubleshooting further. We can close this for now, probably unresolved.

Also available in: Atom PDF