Support #20788

MDS reports "failed to open ino 10007be02d9 err -61/0" and cannot restart successfully

Added by dongdong tao almost 7 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Tags:
Reviewed:
Affected Versions:
Component(FS):
Labels (FS):
Pull request ID:

Description

The Ceph version is v10.2.8.
Right now my MDS cannot restart.
I have cephfs_metadata and cephfs_data pools.
I cannot reproduce this, and I did not have high-level debug logging enabled.
Please tell me how to fix this. Using the rados tools, I found that this inode belongs to the cephfs_data pool.

log here:
2017-07-27 09:26:15.442277 7f1613ff8700 1 mds.0.1977485 reconnect_done
2017-07-27 09:26:16.224473 7f1613ff8700 1 mds.0.1977485 handle_mds_map i am now mds.0.1977485
2017-07-27 09:26:16.224475 7f1613ff8700 1 mds.0.1977485 handle_mds_map state change up:reconnect --> up:rejoin
2017-07-27 09:26:16.224480 7f1613ff8700 1 mds.0.1977485 rejoin_start
2017-07-27 09:26:20.510579 7f1613ff8700 1 mds.0.1977485 rejoin_joint_start
2017-07-27 09:26:21.370883 7f160f6ee700 0 mds.0.cache failed to open ino 10007be02d9 err -61/0
tcmalloc: large alloc 1560289280 bytes 0x7f170fce2000 0x7f1619bb5bf3 0x7f1619bd8115 0x7f161a2e18d3 0x7f161a42e8bd 0x7f161a43054f 0x7f161a49e8f4 0x7f161a56e2a6 0x7f1619423dc5 0x7f1617ef076d (nil)
tcmalloc: large alloc 3120570368 bytes 0x7f17a0544000 0x7f1619bb5bf3 0x7f1619bd8115 0x7f161a2e18d3 0x7f161a42e8bd 0x7f161a43054f 0x7f161a49e8f4 0x7f161a56e2a6 0x7f1619423dc5 0x7f1617ef076d (nil)
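
As a point of reference, below is a minimal python-rados sketch of the kind of check the reporter describes doing with the rados tools: confirming that the first data object for inode 10007be02d9 exists in cephfs_data and whether its "parent" backtrace xattr (the thing whose absence shows up as ENODATA during open_ino) is present. The pool name comes from the description; the ceph.conf path and the object-naming assumption (<ino hex>.00000000 for the first object) are mine.

  # Check the first data object of ino 10007be02d9 and its backtrace xattr.
  # Assumes python-rados is installed and /etc/ceph/ceph.conf plus a usable
  # keyring are readable by the caller.
  import rados

  OBJ = "10007be02d9.00000000"  # CephFS names data objects <ino hex>.<chunk index>

  cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
  cluster.connect()
  try:
      ioctx = cluster.open_ioctx("cephfs_data")
      try:
          size, mtime = ioctx.stat(OBJ)               # raises ObjectNotFound if missing
          print("object %s: %d bytes, mtime %s" % (OBJ, size, mtime))
          backtrace = ioctx.get_xattr(OBJ, "parent")  # raises NoData if xattr absent
          print("backtrace xattr present, %d bytes" % len(backtrace))
      except rados.ObjectNotFound:
          print("object %s not found in cephfs_data" % OBJ)
      except rados.NoData:
          print("object exists but has no 'parent' xattr (ENODATA)")
      finally:
          ioctx.close()
  finally:
      cluster.shutdown()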

#1

Updated by Greg Farnum over 6 years ago

  • Tracker changed from Bug to Support

61 is ENODATA. Sounds like something broke in the cluster; you'll need to provide a timeline of events.
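
For anyone following along, the numeric error from the log line can be translated locally; a quick Python check (Linux errno numbering assumed):

  # Map the -61 from "failed to open ino ... err -61/0" to its symbolic name.
  import errno, os
  print(errno.errorcode[61], "-", os.strerror(61))  # ENODATA - No data available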

#2

Updated by Greg Farnum over 6 years ago

  • Project changed from Ceph to CephFS
#3

Updated by Zheng Yan over 6 years ago

"failed to open ino" is a normal when mds is recovery. what do you mean "ceph can not restart"? mds crashed or mds hung at rejoin state?

#4

Updated by dongdong tao over 6 years ago

We have now figured out the reason: the MDS was killed by Docker when it reached its memory limit.

Thanks for your help!
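
For future readers, a rough sketch of how a kill like this can be confirmed from the Docker host. The cgroup v1 layout Docker used at the time and passing the container ID as an argument are assumptions; note that on Jewel (v10.2.x) the MDS cache is bounded by mds_cache_size (an inode count), not a byte limit, so the container memory limit has to be sized with that in mind.

  # Compare the MDS container's cgroup memory limit with its current usage and
  # dump the OOM-control status. Pass the container ID as the first argument.
  # A kernel log line such as "Memory cgroup out of memory: Kill process ...
  # (ceph-mds)" in dmesg is the usual confirmation of an OOM kill.
  import sys

  cg = "/sys/fs/cgroup/memory/docker/" + sys.argv[1]  # cgroup v1 path (Docker, ~2017)

  def read_int(path):
      with open(path) as f:
          return int(f.read().strip())

  limit = read_int(cg + "/memory.limit_in_bytes")
  usage = read_int(cg + "/memory.usage_in_bytes")
  print("limit: %d MiB, usage: %d MiB" % (limit // 2**20, usage // 2**20))

  with open(cg + "/memory.oom_control") as f:
      print(f.read().strip())  # shows oom_kill_disable / under_oom status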

#5

Updated by Zheng Yan over 6 years ago

  • Status changed from New to Closed