Project

General

Profile

Bug #41935

ceph mdss keep on crashing

Added by Kenneth Waegeman 7 months ago. Updated 7 months ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash, multimds
Pull request ID:
Crash signature:

Description

I updated ceph to 14.2.3 yesterday. everything was running fine, but today mds start crashing. I tried restarting all mds, but issue remains

[root@mds03 ~]# ceph -s
cluster:
id: 92bfcf0a-1d39-43b3-b60f-44f01b630e47
health: HEALTH_WARN
insufficient standby MDS daemons available

services:
mon: 3 daemons, quorum mds01,mds02,mds03 (age 22h)
mgr: mds02(active, since 25h), standbys: mds01, mds03
mds: ceph_fs:2 {0=mds02=up:active,1=mds03=up:active(laggy or crashed)}
osd: 535 osds: 535 up, 535 in
data:
pools: 3 pools, 3328 pgs
objects: 370.95M objects, 666 TiB
usage: 1.0 PiB used, 2.2 PiB / 3.2 PiB avail
pgs: 3317 active+clean
9 active+clean+scrubbing+deep
2 active+clean+scrubbing

This is the stacktraces I see in th mds logs:


Related issues

Duplicates fs - Bug #41948: nautilus: mds: incomplete backport of #40444 (MDCache::cow_inode does not cleanup unneeded client_snap_caps) Resolved

History

#1 Updated by Kenneth Waegeman 7 months ago

2019-09-19 15:41:40.156 7f037e6e01c0  0 set uid:gid to 167:167 (ceph:ceph)
2019-09-19 15:41:40.156 7f037e6e01c0  0 ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39) nautilus (stable), process ceph-mds, pid 294068
2019-09-19 15:41:40.156 7f037e6e01c0  0 pidfile_write: ignore empty --pid-file
2019-09-19 15:41:40.196 7f036c2f0700  1 mds.mds03 Updating MDS map to version 553402 from mon.2
2019-09-19 15:41:40.386 7f036c2f0700  1 mds.mds03 Updating MDS map to version 553403 from mon.2
2019-09-19 15:41:40.386 7f036c2f0700  1 mds.mds03 Map has assigned me to become a standby
2019-09-19 15:41:40.386 7f036c2f0700  1 mds.mds03 Updating MDS map to version 553404 from mon.2
2019-09-19 15:41:40.386 7f036c2f0700  1 mds.1.553404 handle_mds_map i am now mds.1.553404
2019-09-19 15:41:40.386 7f036c2f0700  1 mds.1.553404 handle_mds_map state change up:boot --> up:replay
2019-09-19 15:41:40.386 7f036c2f0700  1 mds.1.553404 replay_start
2019-09-19 15:41:40.386 7f036c2f0700  1 mds.1.553404  recovery set is 0
2019-09-19 15:41:40.386 7f036c2f0700  1 mds.1.553404  waiting for osdmap 203605 (which blacklists prior instance)
2019-09-19 15:41:40.656 7f0365ae3700  0 mds.1.cache creating system inode with ino:0x101
2019-09-19 15:41:40.656 7f0365ae3700  0 mds.1.cache creating system inode with ino:0x1
2019-09-19 15:41:46.166 7f0364ae1700  1 mds.1.553404 Finished replaying journal
2019-09-19 15:41:46.166 7f0364ae1700  1 mds.1.553404 making mds journal writeable
2019-09-19 15:41:46.276 7f036c2f0700  1 mds.mds03 Updating MDS map to version 553405 from mds.0
2019-09-19 15:41:46.276 7f036c2f0700  1 mds.1.553404 handle_mds_map i am now mds.1.553404
2019-09-19 15:41:46.276 7f036c2f0700  1 mds.1.553404 handle_mds_map state change up:replay --> up:resolve
2019-09-19 15:41:46.276 7f036c2f0700  1 mds.1.553404 resolve_start
2019-09-19 15:41:46.276 7f036c2f0700  1 mds.1.553404 reopen_log
2019-09-19 15:41:46.276 7f036c2f0700  1 mds.1.553404  recovery set is 0
2019-09-19 15:41:46.566 7f036c2f0700  1 mds.1.553404 resolve_done
2019-09-19 15:41:47.406 7f036c2f0700  1 mds.mds03 Updating MDS map to version 553406 from mon.2
2019-09-19 15:41:47.406 7f036c2f0700  1 mds.1.553404 handle_mds_map i am now mds.1.553404
2019-09-19 15:41:47.406 7f036c2f0700  1 mds.1.553404 handle_mds_map state change up:resolve --> up:reconnect
2019-09-19 15:41:47.406 7f036c2f0700  1 mds.1.553404 reconnect_start
2019-09-19 15:41:47.406 7f036c2f0700  1 mds.1.server reconnect_clients -- 20 sessions
2019-09-19 15:41:47.406 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37962144 10.141.16.17:0/1354686824 after 0
2019-09-19 15:41:47.406 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37962162 10.141.16.24:0/2772929523 after 0
2019-09-19 15:41:47.406 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37962153 10.141.16.22:0/7426576 after 0
2019-09-19 15:41:47.406 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37962180 10.141.16.23:0/2413712232 after 0
2019-09-19 15:41:47.406 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37962183 10.141.16.20:0/743246641 after 0
2019-09-19 15:41:47.406 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37962189 10.141.16.19:0/2209083713 after 0
2019-09-19 15:41:47.406 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37962150 10.141.16.15:0/2231346138 after 0
2019-09-19 15:41:47.406 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37962177 10.141.16.25:0/648554407 after 0
2019-09-19 15:41:47.416 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.36165377 v1:10.141.10.90:0/1063780365 after 0.01
2019-09-19 15:41:47.416 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37960382 10.141.16.11:0/1996585949 after 0.01
2019-09-19 15:41:47.416 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37961209 10.141.16.21:0/2395019382 after 0.01
2019-09-19 15:41:47.416 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37960403 10.141.16.12:0/704451057 after 0.01
2019-09-19 15:41:47.416 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37961197 10.141.16.13:0/3937945744 after 0.01
2019-09-19 15:41:47.416 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37961236 10.141.16.14:0/304725641 after 0.01
2019-09-19 15:41:47.416 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37960373 10.141.16.18:0/2156995594 after 0.01
2019-09-19 15:41:47.416 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37960400 10.141.16.16:0/2737464876 after 0.01
2019-09-19 15:41:47.416 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37865319 10.141.16.21:0/1591580297 after 0.01
2019-09-19 15:41:47.416 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37865333 10.141.16.20:0/3660809459 after 0.01
2019-09-19 15:41:47.416 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37865372 10.141.16.19:0/3498217528 after 0.01
2019-09-19 15:41:47.456 7f036c2f0700  0 log_channel(cluster) log [DBG] : reconnect by client.37865356 10.141.16.11:0/1948991158 after 0.0500002
2019-09-19 15:41:47.456 7f036c2f0700  1 mds.1.553404 reconnect_done
2019-09-19 15:41:48.476 7f036c2f0700  1 mds.mds03 Updating MDS map to version 553407 from mon.2
2019-09-19 15:41:48.476 7f036c2f0700  1 mds.1.553404 handle_mds_map i am now mds.1.553404
2019-09-19 15:41:48.476 7f036c2f0700  1 mds.1.553404 handle_mds_map state change up:reconnect --> up:rejoin
2019-09-19 15:41:48.476 7f036c2f0700  1 mds.1.553404 rejoin_start
2019-09-19 15:41:48.476 7f036c2f0700  1 mds.1.553404 rejoin_joint_start
2019-09-19 15:41:49.216 7f036c2f0700  1 mds.1.553404 rejoin_done
2019-09-19 15:41:49.526 7f036c2f0700  1 mds.mds03 Updating MDS map to version 553408 from mon.2
2019-09-19 15:41:49.526 7f036c2f0700  1 mds.1.553404 handle_mds_map i am now mds.1.553404
2019-09-19 15:41:49.526 7f036c2f0700  1 mds.1.553404 handle_mds_map state change up:rejoin --> up:active
2019-09-19 15:41:49.526 7f036c2f0700  1 mds.1.553404 recovery_done -- successful recovery!
2019-09-19 15:41:49.526 7f036c2f0700  1 mds.1.553404 active_start
2019-09-19 15:41:49.756 7f036c2f0700  1 mds.1.553404 cluster recovered.
2019-09-19 15:42:00.336 7f036c2f0700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.3/rpm/el7/BUILD/ceph-14.2.3/src/mds/MDCache.cc: In function 'CInode* MDCache::cow_inode(CInode*, snapid_t)' thread 7f036c2f0700 time 2019-09-19 15:42:00.333567
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.3/rpm/el7/BUILD/ceph-14.2.3/src/mds/MDCache.cc: 1498: FAILED ceph_assert(!lock->get_num_wrlocks())

 ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f0375773ac2]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f0375773c90]
 3: (()+0x1ee48d) [0x55a7e4ccf48d]
 4: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x823) [0x55a7e4ccfcb3]
 5: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xbc) [0x55a7e4cd042c]
 6: (Locker::_do_cap_update(CInode*, Capability*, int, snapid_t, boost::intrusive_ptr<MClientCaps const> const&, boost::intrusive_ptr<MClientCaps> const&, bool*)+0xfb6) [0x55a7e4d957e6]
 7: (Locker::handle_client_caps(boost::intrusive_ptr<MClientCaps const> const&)+0x2059) [0x55a7e4d9c8e9]
 8: (Locker::dispatch(boost::intrusive_ptr<Message const> const&)+0xe7) [0x55a7e4daaf97]
 9: (MDSRank::handle_deferrable_message(boost::intrusive_ptr<Message const> const&)+0x304) [0x55a7e4c089e4]
 10: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x6eb) [0x55a7e4c0b1bb]
 11: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x40) [0x55a7e4c0b8d0]
 12: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xfc) [0x55a7e4bf82ac]
 13: (DispatchQueue::entry()+0x12a9) [0x7f0375968dd9]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f0375a183dd]
 15: (()+0x7dd5) [0x7f0373642dd5]
 16: (clone()+0x6d) [0x7f03722f302d]

2019-09-19 15:42:00.336 7f036c2f0700 -1 *** Caught signal (Aborted) **
 in thread 7f036c2f0700 thread_name:ms_dispatch

 ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39) nautilus (stable)
 1: (()+0xf5d0) [0x7f037364a5d0]
 2: (gsignal()+0x37) [0x7f037222b2c7]
 3: (abort()+0x148) [0x7f037222c9b8]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7f0375773b11]
 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f0375773c90]
 6: (()+0x1ee48d) [0x55a7e4ccf48d]
 7: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x823) [0x55a7e4ccfcb3]
 8: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xbc) [0x55a7e4cd042c]
 9: (Locker::_do_cap_update(CInode*, Capability*, int, snapid_t, boost::intrusive_ptr<MClientCaps const> const&, boost::intrusive_ptr<MClientCaps> const&, bool*)+0xfb6) [0x55a7e4d957e6]
 10: (Locker::handle_client_caps(boost::intrusive_ptr<MClientCaps const> const&)+0x2059) [0x55a7e4d9c8e9]
 11: (Locker::dispatch(boost::intrusive_ptr<Message const> const&)+0xe7) [0x55a7e4daaf97]
 12: (MDSRank::handle_deferrable_message(boost::intrusive_ptr<Message const> const&)+0x304) [0x55a7e4c089e4]
 13: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x6eb) [0x55a7e4c0b1bb]
 14: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x40) [0x55a7e4c0b8d0]
 15: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xfc) [0x55a7e4bf82ac]
 16: (DispatchQueue::entry()+0x12a9) [0x7f0375968dd9]
 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f0375a183dd]
 18: (()+0x7dd5) [0x7f0373642dd5]
 19: (clone()+0x6d) [0x7f03722f302d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
....
  -221> 2019-09-19 15:41:48.476 7f036c2f0700  1 mds.mds03 Updating MDS map to version 553407 from mon.2
  -220> 2019-09-19 15:41:48.476 7f036c2f0700  1 mds.1.553404 handle_mds_map i am now mds.1.553404
  -219> 2019-09-19 15:41:48.476 7f036c2f0700  1 mds.1.553404 handle_mds_map state change up:reconnect --> up:rejoin
  -218> 2019-09-19 15:41:48.476 7f036c2f0700  1 mds.1.553404 rejoin_start
  -217> 2019-09-19 15:41:48.476 7f036c2f0700  1 mds.1.553404 rejoin_joint_start
  -216> 2019-09-19 15:41:48.476 7f036c2f0700  5 mds.mds03 handle_mds_map old map epoch 553407 <= 553407, discarding
  -215> 2019-09-19 15:41:48.486 7f036faf7700  5 mds.beacon.mds03 received beacon reply up:rejoin seq 5 rtt 1.03
  -214> 2019-09-19 15:41:48.896 7f03652e2700  5 mds.1.log _submit_thread 30520209192609~1084 : ESessions 20 opens cmapv 115357868
  -213> 2019-09-19 15:41:49.216 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37962144
  -212> 2019-09-19 15:41:49.216 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37960373
  -211> 2019-09-19 15:41:49.216 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37960400
  -210> 2019-09-19 15:41:49.216 7f036c2f0700  1 mds.1.553404 rejoin_done
  -209> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.553404 request_state up:active
  -208> 2019-09-19 15:41:49.336 7f036c2f0700  5 mds.beacon.mds03 set_want_state: up:rejoin -> up:active
  -207> 2019-09-19 15:41:49.336 7f036c2f0700  5 mds.beacon.mds03 Sending beacon up:active seq 6
  -206> 2019-09-19 15:41:49.336 7f036c2f0700 10 monclient: _send_mon_message to mon.mds03 at v2:10.141.16.3:3300/0
  -205> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37962150
  -204> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37962177
  -203> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37962162
  -202> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37962153
  -201> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37962180
  -200> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37962183
  -199> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37962189
  -198> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37960403
  -197> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37960382
  -196> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37961197
  -195> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37961236
  -194> 2019-09-19 15:41:49.336 7f036c2f0700  3 mds.1.server handle_client_session client_session(request_renewcaps seq 9) v1 from client.37961209
  -193> 2019-09-19 15:41:49.526 7f036c2f0700  1 mds.mds03 Updating MDS map to version 553408 from mon.2
  -192> 2019-09-19 15:41:49.526 7f036c2f0700  1 mds.1.553404 handle_mds_map i am now mds.1.553404
  -191> 2019-09-19 15:41:49.526 7f036c2f0700  1 mds.1.553404 handle_mds_map state change up:rejoin --> up:active
  -190> 2019-09-19 15:41:49.526 7f036c2f0700  1 mds.1.553404 recovery_done -- successful recovery!
  -189> 2019-09-19 15:41:49.526 7f036c2f0700  1 mds.1.553404 active_start
  -188> 2019-09-19 15:41:49.526 7f036c2f0700  4 mds.1.553404 set_osd_epoch_barrier: epoch=203604
  -187> 2019-09-19 15:41:49.526 7f03652e2700  5 mds.1.log _submit_thread 30520209193713~25997 : EUpdate cap update [metablob 0x1000bd8ac7b, 12 dirs]
  -186> 2019-09-19 15:41:49.526 7f03652e2700  5 mds.1.log _submit_thread 30520209219730~1083 : EUpdate cap update [metablob 0x20005af5c74, 1 dirs]
  -185> 2019-09-19 15:41:49.526 7f03652e2700  5 mds.1.log _submit_thread 30520209220833~1037 : EUpdate check_inode_max_size [metablob 0x20005af5c74, 1 dirs]
  -184> 2019-09-19 15:41:49.526 7f036faf7700  5 mds.beacon.mds03 received beacon reply up:active seq 6 rtt 0.190001
  -183> 2019-09-19 15:41:49.756 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865319:2375 lookup #0x1001172f39d/koi8-c 2019-09-19 15:41:39.482105 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4
  -182> 2019-09-19 15:41:49.756 7f036c2f0700  5 mds.1.server waiting for root
  -181> 2019-09-19 15:41:49.756 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8866 lookup #0x100166004d4/Windows-MSVC-C.cmake 2019-09-19 15:41:39.479661 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4
  -180> 2019-09-19 15:41:49.756 7f036c2f0700  5 mds.1.server waiting for root
  -179> 2019-09-19 15:41:49.756 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865372:5807 setattr size=5055 #0x20005a6eb66 2019-09-19 15:41:39.478859 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4
  -178> 2019-09-19 15:41:49.756 7f036c2f0700  5 mds.1.server waiting for root
  -177> 2019-09-19 15:41:49.756 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865372:5808 lookup #0x20005a6eb3a/schema.cpython-37.pyc 2019-09-19 15:41:39.589633 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4
  -176> 2019-09-19 15:41:49.756 7f036c2f0700  5 mds.1.server waiting for root
  -175> 2019-09-19 15:41:49.756 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865356:3604570 readdir #0x2000090d629 2019-09-19 15:41:39.523863 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4
  -174> 2019-09-19 15:41:49.756 7f036c2f0700  5 mds.1.server waiting for root
  -173> 2019-09-19 15:41:49.756 7f036c2f0700  1 mds.1.553404 cluster recovered.
  -172> 2019-09-19 15:41:49.756 7f036c2f0700  4 mds.1.553404 set_osd_epoch_barrier: epoch=203605
  -171> 2019-09-19 15:41:49.756 7f036c2f0700  5 mds.mds03 handle_mds_map old map epoch 553408 <= 553408, discarding
  -170> 2019-09-19 15:41:50.196 7f036b2ee700 10 monclient: tick
  -169> 2019-09-19 15:41:50.196 7f036b2ee700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-09-19 15:41:20.200249)
  -168> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  log_queue is 20 last_log 20 sent 0 num 20 unsent 20 sending 20
  -167> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.416276 mds.mds03 (mds.1) 1 : cluster [DBG] reconnect by client.37962144 10.141.16.17:0/1354686824 after 0
  -166> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.416356 mds.mds03 (mds.1) 2 : cluster [DBG] reconnect by client.37962162 10.141.16.24:0/2772929523 after 0
  -165> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.416403 mds.mds03 (mds.1) 3 : cluster [DBG] reconnect by client.37962153 10.141.16.22:0/7426576 after 0
  -164> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.416441 mds.mds03 (mds.1) 4 : cluster [DBG] reconnect by client.37962180 10.141.16.23:0/2413712232 after 0
  -163> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.416483 mds.mds03 (mds.1) 5 : cluster [DBG] reconnect by client.37962183 10.141.16.20:0/743246641 after 0
  -162> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.416524 mds.mds03 (mds.1) 6 : cluster [DBG] reconnect by client.37962189 10.141.16.19:0/2209083713 after 0
  -161> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.416562 mds.mds03 (mds.1) 7 : cluster [DBG] reconnect by client.37962150 10.141.16.15:0/2231346138 after 0
  -160> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.416625 mds.mds03 (mds.1) 8 : cluster [DBG] reconnect by client.37962177 10.141.16.25:0/648554407 after 0
  -159> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.416915 mds.mds03 (mds.1) 9 : cluster [DBG] reconnect by client.36165377 v1:10.141.10.90:0/1063780365 after 0.01
  -158> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.417130 mds.mds03 (mds.1) 10 : cluster [DBG] reconnect by client.37960382 10.141.16.11:0/1996585949 after 0.01
  -157> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.417177 mds.mds03 (mds.1) 11 : cluster [DBG] reconnect by client.37961209 10.141.16.21:0/2395019382 after 0.01
  -156> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.417212 mds.mds03 (mds.1) 12 : cluster [DBG] reconnect by client.37960403 10.141.16.12:0/704451057 after 0.01
  -155> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.417259 mds.mds03 (mds.1) 13 : cluster [DBG] reconnect by client.37961197 10.141.16.13:0/3937945744 after 0.01
  -154> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.417706 mds.mds03 (mds.1) 14 : cluster [DBG] reconnect by client.37961236 10.141.16.14:0/304725641 after 0.01
  -153> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.417815 mds.mds03 (mds.1) 15 : cluster [DBG] reconnect by client.37960373 10.141.16.18:0/2156995594 after 0.01
  -152> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.417935 mds.mds03 (mds.1) 16 : cluster [DBG] reconnect by client.37960400 10.141.16.16:0/2737464876 after 0.01
  -151> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.418627 mds.mds03 (mds.1) 17 : cluster [DBG] reconnect by client.37865319 10.141.16.21:0/1591580297 after 0.01
  -150> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.419295 mds.mds03 (mds.1) 18 : cluster [DBG] reconnect by client.37865333 10.141.16.20:0/3660809459 after 0.01
  -149> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.419728 mds.mds03 (mds.1) 19 : cluster [DBG] reconnect by client.37865372 10.141.16.19:0/3498217528 after 0.01
  -148> 2019-09-19 15:41:50.196 7f036b2ee700 10 log_client  will send 2019-09-19 15:41:47.466316 mds.mds03 (mds.1) 20 : cluster [DBG] reconnect by client.37865356 10.141.16.11:0/1948991158 after 0.0500002
  -147> 2019-09-19 15:41:50.196 7f036b2ee700 10 monclient: _send_mon_message to mon.mds03 at v2:10.141.16.3:3300/0
  -146> 2019-09-19 15:41:50.216 7f0369aeb700  2 mds.1.cache Memory usage:  total 2071464, rss 1637780, heap 323836, baseline 323836, 477 / 613577 inodes have caps, 482 caps, 0.000785557 caps per inode
  -145> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209221890~9062 : EUpdate scatter_writebehind [metablob 0x1000bd8ac7b, 2 dirs]
  -144> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209230972~4965 : EUpdate scatter_writebehind [metablob 0x1000bf27a9b, 2 dirs]
  -143> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209235957~4519 : EUpdate scatter_writebehind [metablob 0x100171e181c, 2 dirs]
  -142> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209240496~2809 : EUpdate scatter_writebehind [metablob 0x101, 2 dirs]
  -141> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209243325~2809 : EUpdate scatter_writebehind [metablob 0x101, 2 dirs]
  -140> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209246154~4071 : EUpdate scatter_writebehind [metablob 0x1000bf26309, 2 dirs]
  -139> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209250245~10579 : EUpdate scatter_writebehind [metablob 0x1000bf26309, 9 dirs]
  -138> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209260844~2305 : EUpdate scatter_writebehind [metablob 0x1000bf2745b.001*, 2 dirs]
  -137> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209263169~1845 : EUpdate scatter_writebehind [metablob 0x1000c233753, 2 dirs]
  -136> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209265034~1359 : EUpdate scatter_writebehind [metablob 0x20005f881ed, 2 dirs]
  -135> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209266413~5669 : EUpdate scatter_writebehind [metablob 0x10010dcb97f.101*, 5 dirs]
  -134> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209272102~1356 : EUpdate scatter_writebehind [metablob 0x20005f881ef, 2 dirs]
  -133> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209273478~1357 : EUpdate scatter_writebehind [metablob 0x20005f881ed, 2 dirs]
  -132> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209274855~1357 : EUpdate scatter_writebehind [metablob 0x20005f881ed, 2 dirs]
  -131> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209276232~1353 : EUpdate scatter_writebehind [metablob 0x20005f881ed, 2 dirs]
  -130> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209277605~1818 : EUpdate scatter_writebehind [metablob 0x20005af5c68, 2 dirs]
  -129> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209279443~1811 : EUpdate scatter_writebehind [metablob 0x20005af5c69, 2 dirs]
  -128> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209281274~12149 : EUpdate scatter_writebehind [metablob 0x1000bd8ac7b, 3 dirs]
  -127> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209293443~9062 : EUpdate scatter_writebehind [metablob 0x1000bd8ac7b, 2 dirs]
  -126> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209302525~8615 : EUpdate scatter_writebehind [metablob 0x1000bf27a9b, 2 dirs]
  -125> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209311160~7042 : EUpdate scatter_writebehind [metablob 0x1000bf2745a, 8 dirs]
  -124> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209318222~2305 : EUpdate scatter_writebehind [metablob 0x1000bf2745b.001*, 2 dirs]
  -123> 2019-09-19 15:41:50.216 7f03652e2700  5 mds.1.log _submit_thread 30520209320547~3305 : EUpdate scatter_writebehind [metablob 0x1000c233753, 2 dirs]
  -122> 2019-09-19 15:41:50.226 7f03652e2700  5 mds.1.log _submit_thread 30520209323872~1846 : EUpdate scatter_writebehind [metablob 0x20005f881ed, 2 dirs]
  -121> 2019-09-19 15:41:50.226 7f03652e2700  5 mds.1.log _submit_thread 30520209325738~4038 : EUpdate scatter_writebehind [metablob 0x20005af5c63, 3 dirs]
  -120> 2019-09-19 15:41:50.226 7f03652e2700  5 mds.1.log _submit_thread 30520209329796~2760 : EUpdate scatter_writebehind [metablob 0x20005af5c68, 2 dirs]
  -119> 2019-09-19 15:41:50.226 7f03652e2700  5 mds.1.log _submit_thread 30520209332576~4651 : EUpdate lock inest accounted scatter stat update [metablob 0x1000bd8ac7b, 1 dirs]
  -118> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client handle_log_ack log(last 20) v1
  -117> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.416276 mds.mds03 (mds.1) 1 : cluster [DBG] reconnect by client.37962144 10.141.16.17:0/1354686824 after 0
  -116> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.416356 mds.mds03 (mds.1) 2 : cluster [DBG] reconnect by client.37962162 10.141.16.24:0/2772929523 after 0
  -115> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.416403 mds.mds03 (mds.1) 3 : cluster [DBG] reconnect by client.37962153 10.141.16.22:0/7426576 after 0
  -114> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.416441 mds.mds03 (mds.1) 4 : cluster [DBG] reconnect by client.37962180 10.141.16.23:0/2413712232 after 0
  -113> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.416483 mds.mds03 (mds.1) 5 : cluster [DBG] reconnect by client.37962183 10.141.16.20:0/743246641 after 0
  -112> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.416524 mds.mds03 (mds.1) 6 : cluster [DBG] reconnect by client.37962189 10.141.16.19:0/2209083713 after 0
  -111> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.416562 mds.mds03 (mds.1) 7 : cluster [DBG] reconnect by client.37962150 10.141.16.15:0/2231346138 after 0
  -110> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.416625 mds.mds03 (mds.1) 8 : cluster [DBG] reconnect by client.37962177 10.141.16.25:0/648554407 after 0
  -109> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.416915 mds.mds03 (mds.1) 9 : cluster [DBG] reconnect by client.36165377 v1:10.141.10.90:0/1063780365 after 0.01
  -108> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.417130 mds.mds03 (mds.1) 10 : cluster [DBG] reconnect by client.37960382 10.141.16.11:0/1996585949 after 0.01
  -107> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.417177 mds.mds03 (mds.1) 11 : cluster [DBG] reconnect by client.37961209 10.141.16.21:0/2395019382 after 0.01
  -106> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.417212 mds.mds03 (mds.1) 12 : cluster [DBG] reconnect by client.37960403 10.141.16.12:0/704451057 after 0.01
  -105> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.417259 mds.mds03 (mds.1) 13 : cluster [DBG] reconnect by client.37961197 10.141.16.13:0/3937945744 after 0.01
  -104> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.417706 mds.mds03 (mds.1) 14 : cluster [DBG] reconnect by client.37961236 10.141.16.14:0/304725641 after 0.01
  -103> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.417815 mds.mds03 (mds.1) 15 : cluster [DBG] reconnect by client.37960373 10.141.16.18:0/2156995594 after 0.01
  -102> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.417935 mds.mds03 (mds.1) 16 : cluster [DBG] reconnect by client.37960400 10.141.16.16:0/2737464876 after 0.01
  -101> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.418627 mds.mds03 (mds.1) 17 : cluster [DBG] reconnect by client.37865319 10.141.16.21:0/1591580297 after 0.01
  -100> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.419295 mds.mds03 (mds.1) 18 : cluster [DBG] reconnect by client.37865333 10.141.16.20:0/3660809459 after 0.01
   -99> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.419728 mds.mds03 (mds.1) 19 : cluster [DBG] reconnect by client.37865372 10.141.16.19:0/3498217528 after 0.01
   -98> 2019-09-19 15:41:50.626 7f036c2f0700 10 log_client  logged 2019-09-19 15:41:47.466316 mds.mds03 (mds.1) 20 : cluster [DBG] reconnect by client.37865356 10.141.16.11:0/1948991158 after 0.0500002
   -97> 2019-09-19 15:41:53.336 7f03692ea700  5 mds.beacon.mds03 Sending beacon up:active seq 7
   -96> 2019-09-19 15:41:53.336 7f03692ea700 10 monclient: _send_mon_message to mon.mds03 at v2:10.141.16.3:3300/0
   -95> 2019-09-19 15:41:53.346 7f036faf7700  5 mds.beacon.mds03 received beacon reply up:active seq 7 rtt 0.01
   -94> 2019-09-19 15:41:55.216 7f0369aeb700  2 mds.1.cache Memory usage:  total 3260792, rss 2826672, heap 323836, baseline 323836, 477 / 888012 inodes have caps, 482 caps, 0.000542785 caps per inode
   -93> 2019-09-19 15:41:55.216 7f03652e2700  5 mds.1.log _submit_thread 30520209337247~1811 : EUpdate scatter_writebehind [metablob 0x10010dcb97f.101*, 2 dirs]
   -92> 2019-09-19 15:41:55.216 7f03652e2700  5 mds.1.log _submit_thread 30520209339078~1835 : EUpdate scatter_writebehind [metablob 0x20005af5c62, 2 dirs]
   -91> 2019-09-19 15:41:55.216 7f03652e2700  5 mds.1.log _submit_thread 30520209340933~1531 : EUpdate scatter_writebehind [metablob 0x101, 1 dirs]
   -90> 2019-09-19 15:41:55.216 7f03652e2700  5 mds.1.log _submit_thread 30520209342484~5656 : EUpdate scatter_writebehind [metablob 0x20005af5c63, 5 dirs]
   -89> 2019-09-19 15:41:55.216 7f03652e2700  5 mds.1.log _submit_thread 30520209348160~2846 : EUpdate scatter_writebehind [metablob 0x20005af5c6c, 2 dirs]
   -88> 2019-09-19 15:41:55.216 7f03652e2700  5 mds.1.log _submit_thread 30520209351026~11258 : EUpdate scatter_writebehind [metablob 0x1000bf2626b, 3 dirs]
   -87> 2019-09-19 15:41:55.216 7f03652e2700  5 mds.1.log _submit_thread 30520209362304~9900 : EUpdate scatter_writebehind [metablob 0x1000bf26309, 7 dirs]
   -86> 2019-09-19 15:41:55.216 7f03652e2700  5 mds.1.log _submit_thread 30520209372224~8611 : EUpdate scatter_writebehind [metablob 0x1000bd8ac7b, 2 dirs]
   -85> 2019-09-19 15:41:55.216 7f03652e2700  5 mds.1.log _submit_thread 30520209380855~2777 : EUpdate scatter_writebehind [metablob 0x10010dcb97f.101*, 2 dirs]
   -84> 2019-09-19 15:41:55.216 7f03652e2700  5 mds.1.log _submit_thread 30520209383652~2774 : EUpdate scatter_writebehind [metablob 0x20005af5c62, 2 dirs]
   -83> 2019-09-19 15:41:55.216 7f03652e2700  5 mds.1.log _submit_thread 30520209386446~4047 : EUpdate scatter_writebehind [metablob 0x20005af5c68, 3 dirs]
   -82> 2019-09-19 15:41:57.336 7f03692ea700  5 mds.beacon.mds03 Sending beacon up:active seq 8
   -81> 2019-09-19 15:41:57.336 7f03692ea700 10 monclient: _send_mon_message to mon.mds03 at v2:10.141.16.3:3300/0
   -80> 2019-09-19 15:41:57.346 7f036faf7700  5 mds.beacon.mds03 received beacon reply up:active seq 8 rtt 0.01
   -79> 2019-09-19 15:42:00.116 7f0367ae7700  4 mds.1.server handle_client_request client_request(client.37865319:2375 lookup #0x1001172f39d/koi8-c 2019-09-19 15:41:39.482105 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4
   -78> 2019-09-19 15:42:00.116 7f0367ae7700  4 mds.1.server handle_client_request client_request(client.37865333:8866 lookup #0x100166004d4/Windows-MSVC-C.cmake 2019-09-19 15:41:39.479661 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4
   -77> 2019-09-19 15:42:00.116 7f0367ae7700  4 mds.1.server handle_client_request client_request(client.37865372:5807 setattr size=5055 #0x20005a6eb66 2019-09-19 15:41:39.478859 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4
   -76> 2019-09-19 15:42:00.116 7f0367ae7700  4 mds.1.server handle_client_request client_request(client.37865372:5808 lookup #0x20005a6eb3a/schema.cpython-37.pyc 2019-09-19 15:41:39.589633 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4
   -75> 2019-09-19 15:42:00.116 7f0367ae7700  4 mds.1.server handle_client_request client_request(client.37865356:3604570 readdir #0x2000090d629 2019-09-19 15:41:39.523863 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4
   -74> 2019-09-19 15:42:00.116 7f0367ae7700  5 mds.1.bal mds.1 epoch 9 load mdsload<[pop IRD:[C 3.00e+00] IWR:[C 7.60e-01] RDR:[C 0.00e+00] FET:[C 0.00e+00] STR:[C 0.00e+00] *LOAD:4.5]
/[pop IRD:[C 3.00e+00] IWR:[C 7.60e-01] RDR:[C 0.00e+00] FET:[C 0.00e+00] STR:[C 0.00e+00] *LOAD:4.5]
, req 0, hr 0, qlen 0, cpu 0>
   -73> 2019-09-19 15:42:00.116 7f0367ae7700  5 mds.1.bal   import_map from 0 -> 4.52093
   -72> 2019-09-19 15:42:00.116 7f0367ae7700  5 mds.1.bal  prep_rebalance: cluster loads are
   -71> 2019-09-19 15:42:00.116 7f0367ae7700  5 mds.1.bal prep_rebalance:  my load 4.52093   target 2.26046   total 4.52093
   -70> 2019-09-19 15:42:00.116 7f0367ae7700  5 mds.1.bal  mds.0 is underloaded or barely overloaded.
   -69> 2019-09-19 15:42:00.116 7f0367ae7700  5 mds.1.bal   i am sufficiently overloaded
   -68> 2019-09-19 15:42:00.116 7f0367ae7700  5 mds.1.bal    - mds.1 exports 2.26046 to mds.0
   -67> 2019-09-19 15:42:00.116 7f0367ae7700  5 mds.1.bal rebalance done
   -66> 2019-09-19 15:42:00.116 7f03652e2700  5 mds.1.log _submit_thread 30520209390513~25561 : EUpdate setattr [metablob 0x1000bd8ac7b, 12 dirs]
   -65> 2019-09-19 15:42:00.116 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865372:5809 rmxattr #0x20005a6eb66 security.ima 2019-09-19 15:42:00.124872 caller_uid=0, caller_gid=0{0,}) v4
   -64> 2019-09-19 15:42:00.126 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865319:2376 readdir #0x1001172f3d2 2019-09-19 15:42:00.127381 caller_uid=0, caller_gid=0{0,}) v4
   -63> 2019-09-19 15:42:00.126 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8867 lookup #0x100166004d4/Windows-MSVC-CXX.cmake 2019-09-19 15:42:00.127537 caller_uid=0, caller_gid=0{0,}) v4
   -62> 2019-09-19 15:42:00.126 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865372:5810 lookup #0x20005af5c74/lxml.cpython-37.pyc 2019-09-19 15:42:00.130435 caller_uid=0, caller_gid=0{0,}) v4
   -61> 2019-09-19 15:42:00.126 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8868 lookup #0x100166004d4/Windows-MSVC.cmake 2019-09-19 15:42:00.132050 caller_uid=0, caller_gid=0{0,}) v4
   -60> 2019-09-19 15:42:00.126 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865372:5811 setattr mtime=2019-09-16 12:00:14.047762 atime=2019-09-19 15:42:00.121361 #0x20005a6eb66 2019-09-19 15:42:00.135404 caller_uid=0, caller_gid=0{0,}) v4
   -59> 2019-09-19 15:42:00.126 7f03652e2700  5 mds.1.log _submit_thread 30520209416094~2363 : EUpdate setattr [metablob 0x20005af5c6c, 2 dirs]
   -58> 2019-09-19 15:42:00.126 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865372:5812 rmxattr #0x20005a6eb66 security.ima 2019-09-19 15:42:00.136273 caller_uid=0, caller_gid=0{0,}) v4
   -57> 2019-09-19 15:42:00.126 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8869 lookup #0x100166004d4/Windows-NMcl.cmake 2019-09-19 15:42:00.136439 caller_uid=0, caller_gid=0{0,}) v4
   -56> 2019-09-19 15:42:00.136 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865372:5813 lookup #0x20005af5c74/minidom.cpython-37.pyc 2019-09-19 15:42:00.140199 caller_uid=0, caller_gid=0{0,}) v4
   -55> 2019-09-19 15:42:00.136 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8870 lookup #0x100166004d4/Windows-NVIDIA-CUDA.cmake 2019-09-19 15:42:00.140572 caller_uid=0, caller_gid=0{0,}) v4
   -54> 2019-09-19 15:42:00.136 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865372:5814 open #0x20005a6eb67 2019-09-19 15:42:00.144319 caller_uid=0, caller_gid=0{0,}) v4
   -53> 2019-09-19 15:42:00.136 7f03652e2700  5 mds.1.log _submit_thread 30520209418477~1532 : EOpen [metablob 0x20005af5c74, 1 dirs], 1 open files
   -52> 2019-09-19 15:42:00.136 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8871 lookup #0x100166004d4/Windows-OpenWatcom-C.cmake 2019-09-19 15:42:00.144713 caller_uid=0, caller_gid=0{0,}) v4
   -51> 2019-09-19 15:42:00.146 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8872 lookup #0x100166004d4/Windows-OpenWatcom-CXX.cmake 2019-09-19 15:42:00.148593 caller_uid=0, caller_gid=0{0,}) v4
   -50> 2019-09-19 15:42:00.146 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8873 lookup #0x100166004d4/Windows-OpenWatcom.cmake 2019-09-19 15:42:00.152535 caller_uid=0, caller_gid=0{0,}) v4
   -49> 2019-09-19 15:42:00.156 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8874 lookup #0x100166004d4/Windows-PGI-C.cmake 2019-09-19 15:42:00.156622 caller_uid=0, caller_gid=0{0,}) v4
   -48> 2019-09-19 15:42:00.156 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865319:2377 lookup #0x1001172f39d/koi8-r 2019-09-19 15:42:00.158324 caller_uid=0, caller_gid=0{0,}) v4
   -47> 2019-09-19 15:42:00.156 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865356:3604571 readdir #0x2000090d62b 2019-09-19 15:42:00.159812 caller_uid=0, caller_gid=0{0,}) v4
   -46> 2019-09-19 15:42:00.156 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8875 lookup #0x100166004d4/Windows-PGI-Fortran.cmake 2019-09-19 15:42:00.160603 caller_uid=0, caller_gid=0{0,}) v4
   -45> 2019-09-19 15:42:00.156 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865319:2378 readdir #0x1001172f3d4 2019-09-19 15:42:00.161768 caller_uid=0, caller_gid=0{0,}) v4
   -44> 2019-09-19 15:42:00.156 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8876 lookup #0x100166004d4/Windows-PGI.cmake 2019-09-19 15:42:00.164482 caller_uid=0, caller_gid=0{0,}) v4
   -43> 2019-09-19 15:42:00.166 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8877 lookup #0x100166004d4/Windows-Watcom-C.cmake 2019-09-19 15:42:00.168639 caller_uid=0, caller_gid=0{0,}) v4
   -42> 2019-09-19 15:42:00.166 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8878 lookup #0x100166004d4/Windows-Watcom-CXX.cmake 2019-09-19 15:42:00.172441 caller_uid=0, caller_gid=0{0,}) v4
   -41> 2019-09-19 15:42:00.166 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8879 lookup #0x100166004d4/Windows-df.cmake 2019-09-19 15:42:00.175790 caller_uid=0, caller_gid=0{0,}) v4
   -40> 2019-09-19 15:42:00.176 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8880 lookup #0x100166004d4/Windows-windres.cmake 2019-09-19 15:42:00.179236 caller_uid=0, caller_gid=0{0,}) v4
   -39> 2019-09-19 15:42:00.176 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8881 lookup #0x100166004d4/Windows.cmake 2019-09-19 15:42:00.182733 caller_uid=0, caller_gid=0{0,}) v4
   -38> 2019-09-19 15:42:00.176 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8882 lookup #0x100166004d4/WindowsCE-MSVC-C.cmake 2019-09-19 15:42:00.186270 caller_uid=0, caller_gid=0{0,}) v4
   -37> 2019-09-19 15:42:00.186 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865356:3604572 lookup #0x2000090d465/fa 2019-09-19 15:42:00.189287 caller_uid=0, caller_gid=0{0,}) v4
   -36> 2019-09-19 15:42:00.186 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865319:2379 lookup #0x1001172f39d/koi8-u 2019-09-19 15:42:00.189393 caller_uid=0, caller_gid=0{0,}) v4
   -35> 2019-09-19 15:42:00.186 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8883 lookup #0x100166004d4/WindowsCE-MSVC-CXX.cmake 2019-09-19 15:42:00.189747 caller_uid=0, caller_gid=0{0,}) v4
   -34> 2019-09-19 15:42:00.186 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8884 lookup #0x100166004d4/WindowsCE.cmake 2019-09-19 15:42:00.193093 caller_uid=0, caller_gid=0{0,}) v4
   -33> 2019-09-19 15:42:00.186 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865319:2380 readdir #0x1001172f3d5 2019-09-19 15:42:00.193504 caller_uid=0, caller_gid=0{0,}) v4
   -32> 2019-09-19 15:42:00.186 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865356:3604573 readdir #0x2000090d49c 2019-09-19 15:42:00.194633 caller_uid=0, caller_gid=0{0,}) v4
   -31> 2019-09-19 15:42:00.186 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8885 lookup #0x100166004d4/WindowsPaths.cmake 2019-09-19 15:42:00.196436 caller_uid=0, caller_gid=0{0,}) v4
   -30> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8886 lookup #0x100166004d4/WindowsPhone-MSVC-C.cmake 2019-09-19 15:42:00.199747 caller_uid=0, caller_gid=0{0,}) v4
   -29> 2019-09-19 15:42:00.196 7f036b2ee700 10 monclient: tick
   -28> 2019-09-19 15:42:00.196 7f036b2ee700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-09-19 15:41:30.200562)
   -27> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8887 lookup #0x100166004d4/WindowsPhone-MSVC-CXX.cmake 2019-09-19 15:42:00.203132 caller_uid=0, caller_gid=0{0,}) v4
   -26> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865372:5815 lookup #0x20005a6eb3a/selectable.cpython-37.pyc 2019-09-19 15:42:00.204970 caller_uid=0, caller_gid=0{0,}) v4
   -25> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8888 lookup #0x100166004d4/WindowsPhone.cmake 2019-09-19 15:42:00.206381 caller_uid=0, caller_gid=0{0,}) v4
   -24> 2019-09-19 15:42:00.206 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8889 lookup #0x100166004d4/WindowsStore-MSVC-C.cmake 2019-09-19 15:42:00.209703 caller_uid=0, caller_gid=0{0,}) v4
   -23> 2019-09-19 15:42:00.206 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8890 lookup #0x100166004d4/WindowsStore-MSVC-CXX.cmake 2019-09-19 15:42:00.213200 caller_uid=0, caller_gid=0{0,}) v4
   -22> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8891 lookup #0x100166004d4/WindowsStore.cmake 2019-09-19 15:42:00.216577 caller_uid=0, caller_gid=0{0,}) v4
   -21> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8892 lookup #0x100166004d4/Xenix.cmake 2019-09-19 15:42:00.220230 caller_uid=0, caller_gid=0{0,}) v4
   -20> 2019-09-19 15:42:00.216 7f0369aeb700  2 mds.1.cache Memory usage:  total 4603496, rss 4167920, heap 323836, baseline 323836, 501 / 1162471 inodes have caps, 506 caps, 0.00043528 caps per inode
   -19> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 30520209420029~9062 : EUpdate scatter_writebehind [metablob 0x1000bd8ac7b, 2 dirs]
   -18> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 30520209429111~10579 : EUpdate scatter_writebehind [metablob 0x1000bf26309, 9 dirs]
   -17> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 30520209439710~2305 : EUpdate scatter_writebehind [metablob 0x1000bf2745b.001*, 2 dirs]
   -16> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 30520209442035~1845 : EUpdate scatter_writebehind [metablob 0x1000c233753, 2 dirs]
   -15> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8893 lookup #0x100166004d4/eCos.cmake 2019-09-19 15:42:00.223360 caller_uid=0, caller_gid=0{0,}) v4
   -14> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865319:2381 lookup #0x1001172f39d/microsoft-cp1251 2019-09-19 15:42:00.224940 caller_uid=0, caller_gid=0{0,}) v4
   -13> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8894 lookup #0x100166004d4/gas.cmake 2019-09-19 15:42:00.226624 caller_uid=0, caller_gid=0{0,}) v4
   -12> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865319:2382 readdir #0x1001172f3d7 2019-09-19 15:42:00.228673 caller_uid=0, caller_gid=0{0,}) v4
   -11> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8895 lookup #0x100166004d4/kFreeBSD.cmake 2019-09-19 15:42:00.229668 caller_uid=0, caller_gid=0{0,}) v4
   -10> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8896 lookup #0x100166004d4/syllable.cmake 2019-09-19 15:42:00.232746 caller_uid=0, caller_gid=0{0,}) v4
    -9> 2019-09-19 15:42:00.236 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865333:8897 readdir #0x10016601379 2019-09-19 15:42:00.240672 caller_uid=0, caller_gid=0{0,}) v4
    -8> 2019-09-19 15:42:00.236 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865356:3604574 readdir #0x2000090d630 2019-09-19 15:42:00.241832 caller_uid=0, caller_gid=0{0,}) v4
    -7> 2019-09-19 15:42:00.266 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865356:3604575 readdir #0x2000090d631 2019-09-19 15:42:00.272158 caller_uid=0, caller_gid=0{0,}) v4
    -6> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log _submit_thread 30520209443900~3089 : EUpdate scatter_writebehind [metablob 0x20005af5c63, 3 dirs]
    -5> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log _submit_thread 30520209447009~10579 : EUpdate scatter_writebehind [metablob 0x1000bf26309, 9 dirs]
    -4> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log _submit_thread 30520209457608~2305 : EUpdate scatter_writebehind [metablob 0x1000bf2745b.001*, 2 dirs]
    -3> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log _submit_thread 30520209459933~1030 : EUpdate check_inode_max_size [metablob 0x20005af5c74, 1 dirs]
    -2> 2019-09-19 15:42:00.326 7f036c2f0700  4 mds.1.server handle_client_request client_request(client.37865372:5816 setattr size=1138 #0x20005a6eb67 2019-09-19 15:42:00.333015 caller_uid=0, caller_gid=0{0,}) v4
    -1> 2019-09-19 15:42:00.336 7f036c2f0700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.3/rpm/el7/BUILD/ceph-14.2.3/src/mds/MDCache.cc: In function 'CInode* MDCache::cow_inode(CInode*, snapid_t)' thread 7f036c2f0700 time 2019-09-19 15:42:00.333567
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.3/rpm/el7/BUILD/ceph-14.2.3/src/mds/MDCache.cc: 1498: FAILED ceph_assert(!lock->get_num_wrlocks())

 ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f0375773ac2]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f0375773c90]
 3: (()+0x1ee48d) [0x55a7e4ccf48d]
 4: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x823) [0x55a7e4ccfcb3]
 5: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xbc) [0x55a7e4cd042c]
 6: (Locker::_do_cap_update(CInode*, Capability*, int, snapid_t, boost::intrusive_ptr<MClientCaps const> const&, boost::intrusive_ptr<MClientCaps> const&, bool*)+0xfb6) [0x55a7e4d957e6]
 7: (Locker::handle_client_caps(boost::intrusive_ptr<MClientCaps const> const&)+0x2059) [0x55a7e4d9c8e9]
 8: (Locker::dispatch(boost::intrusive_ptr<Message const> const&)+0xe7) [0x55a7e4daaf97]
 9: (MDSRank::handle_deferrable_message(boost::intrusive_ptr<Message const> const&)+0x304) [0x55a7e4c089e4]
 10: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x6eb) [0x55a7e4c0b1bb]
 11: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x40) [0x55a7e4c0b8d0]
 12: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xfc) [0x55a7e4bf82ac]
 13: (DispatchQueue::entry()+0x12a9) [0x7f0375968dd9]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f0375a183dd]
 15: (()+0x7dd5) [0x7f0373642dd5]
 16: (clone()+0x6d) [0x7f03722f302d]

     0> 2019-09-19 15:42:00.336 7f036c2f0700 -1 *** Caught signal (Aborted) **
 in thread 7f036c2f0700 thread_name:ms_dispatch

 ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39) nautilus (stable)
 1: (()+0xf5d0) [0x7f037364a5d0]
 2: (gsignal()+0x37) [0x7f037222b2c7]
 3: (abort()+0x148) [0x7f037222c9b8]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7f0375773b11]
 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f0375773c90]
 6: (()+0x1ee48d) [0x55a7e4ccf48d]
 7: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x823) [0x55a7e4ccfcb3]
 8: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xbc) [0x55a7e4cd042c]
 9: (Locker::_do_cap_update(CInode*, Capability*, int, snapid_t, boost::intrusive_ptr<MClientCaps const> const&, boost::intrusive_ptr<MClientCaps> const&, bool*)+0xfb6) [0x55a7e4d957e6]
 10: (Locker::handle_client_caps(boost::intrusive_ptr<MClientCaps const> const&)+0x2059) [0x55a7e4d9c8e9]
 11: (Locker::dispatch(boost::intrusive_ptr<Message const> const&)+0xe7) [0x55a7e4daaf97]
 12: (MDSRank::handle_deferrable_message(boost::intrusive_ptr<Message const> const&)+0x304) [0x55a7e4c089e4]
 13: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x6eb) [0x55a7e4c0b1bb]
 14: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x40) [0x55a7e4c0b8d0]
 15: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xfc) [0x55a7e4bf82ac]
 16: (DispatchQueue::entry()+0x12a9) [0x7f0375968dd9]
 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f0375a183dd]
 18: (()+0x7dd5) [0x7f0373642dd5]
 19: (clone()+0x6d) [0x7f03722f302d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mds.mds03.log
--- end dump of recent events ---
[root@mds03 ~]# 

#2 Updated by Dan van der Ster 7 months ago

Looks like the backport of https://tracker.ceph.com/issues/39987 to nautilus was incomplete, it's missing https://github.com/ceph/ceph/pull/28459

#4 Updated by Patrick Donnelly 7 months ago

  • Status changed from New to Duplicate

#5 Updated by Patrick Donnelly 7 months ago

  • Duplicates Bug #41948: nautilus: mds: incomplete backport of #40444 (MDCache::cow_inode does not cleanup unneeded client_snap_caps) added

Also available in: Atom PDF