Support #20788 (Closed)
MDS reports "failed to open ino 10007be02d9 err -61/0" and cannot restart successfully
Description
Ceph version is v10.2.8.
My MDS now cannot restart successfully.
I have cephfs_metadata and cephfs_data pools.
I cannot reproduce this, and did not have high-level debug logging enabled.
Please tell me how to fix this. Using the rados tools, I found that this inode belongs to the cephfs_data pool.
Log:
2017-07-27 09:26:15.442277 7f1613ff8700 1 mds.0.1977485 reconnect_done
2017-07-27 09:26:16.224473 7f1613ff8700 1 mds.0.1977485 handle_mds_map i am now mds.0.1977485
2017-07-27 09:26:16.224475 7f1613ff8700 1 mds.0.1977485 handle_mds_map state change up:reconnect --> up:rejoin
2017-07-27 09:26:16.224480 7f1613ff8700 1 mds.0.1977485 rejoin_start
2017-07-27 09:26:20.510579 7f1613ff8700 1 mds.0.1977485 rejoin_joint_start
2017-07-27 09:26:21.370883 7f160f6ee700 0 mds.0.cache failed to open ino 10007be02d9 err -61/0
tcmalloc: large alloc 1560289280 bytes 0x7f170fce2000 0x7f1619bb5bf3 0x7f1619bd8115 0x7f161a2e18d3 0x7f161a42e8bd 0x7f161a43054f 0x7f161a49e8f4 0x7f161a56e2a6 0x7f1619423dc5 0x7f1617ef076d (nil)
tcmalloc: large alloc 3120570368 bytes 0x7f17a0544000
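The tcmalloc "large alloc" sizes in the log above can be converted to GiB to see their scale; the second allocation is roughly double the first, which would be consistent with a buffer growing during rejoin (a sketch; the doubling interpretation is an assumption, not confirmed by the thread):

```python
# Convert the two tcmalloc "large alloc" sizes reported in the log to GiB.
allocs = [1560289280, 3120570368]
for n in allocs:
    print(f"{n} bytes = {n / 2**30:.2f} GiB")
# 1560289280 bytes = 1.45 GiB
# 3120570368 bytes = 2.91 GiB
```

At ~1.5 GiB and ~3 GiB, allocations of this size alone could push the MDS past a modest container memory limit.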
Updated by Greg Farnum over 6 years ago
- Tracker changed from Bug to Support
61 is ENODATA. Sounds like something broke in the cluster; you'll need to provide a timeline of events.
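Greg's errno reading can be checked from Python's errno module (on Linux; errno numbering is platform-specific, so 61 may map to a different error elsewhere):

```python
import errno
import os

# On Linux, errno 61 is ENODATA ("No data available"); the MDS log's
# "err -61/0" is -ENODATA. In this context it likely means an expected
# entry was missing when the MDS tried to open the inode (an inference,
# not stated in the thread).
print(errno.errorcode[61], "-", os.strerror(61))
# ENODATA - No data available
```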
Updated by Zheng Yan over 6 years ago
"failed to open ino" is a normal when mds is recovery. what do you mean "ceph can not restart"? mds crashed or mds hung at rejoin state?
Updated by dongdong tao over 6 years ago
We have now figured out the reason:
the MDS was killed by Docker when it reached its container memory limit.
Thanks for your help!