Bug #2385

max mds = 2: MDS hangs and crashes

Added by Yavuz Selim Komur almost 12 years ago. Updated over 10 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

History

#1 Updated by Yavuz Selim Komur almost 12 years ago

Sorry.

I'm using the wheezy binaries from the ceph.com/debian repo. All client tasks hang.

May  5 11:36:10 kahya ceph-mds: 2012-05-05 11:36:10.979694 7f3faebf0700 -1 *** Caught signal (Aborted) **
 in thread 7f3faebf0700

 ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
 1: /usr/bin/ceph-mds() [0x7dabb8]
 2: (()+0xf030) [0x7f3fb36bc030]
 3: (gsignal()+0x35) [0x7f3fb1e42475]
 4: (abort()+0x180) [0x7f3fb1e456f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f3fb269352d]
 6: (()+0x62636) [0x7f3fb2691636]
 7: (()+0x62663) [0x7f3fb2691663]
 8: (()+0x6288e) [0x7f3fb269188e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x261) [0x774b31]
 10: (Server::do_rename_rollback(ceph::buffer::list&, int, MDRequest*)+0x2a01) [0x519871]
 11: (MDCache::handle_resolve_ack(MMDSResolveAck*)+0x1025) [0x5a0135]
 12: (MDCache::dispatch(Message*)+0x145) [0x5b7ec5]
 13: (MDS::handle_deferrable_message(Message*)+0x760) [0x4c1360]
 14: (MDS::_dispatch(Message*)+0x702) [0x4d3712]
 15: (MDS::ms_dispatch(Message*)+0x1d3) [0x4d4ac3]
 16: (SimpleMessenger::dispatch_entry()+0x853) [0x758b63]
 17: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7224ad]
 18: (()+0x6b50) [0x7f3fb36b3b50]
 19: (clone()+0x6d) [0x7f3fb1ee890d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May  5 11:36:10 kahya ceph-mds: --- begin dump of recent events ---
May  5 11:36:10 kahya ceph-mds:      0> 2012-05-05 11:36:10.979694 7f3faebf0700 -1 *** Caught signal (Aborted) **
 in thread 7f3faebf0700

 ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
 1: /usr/bin/ceph-mds() [0x7dabb8]
 2: (()+0xf030) [0x7f3fb36bc030]
 3: (gsignal()+0x35) [0x7f3fb1e42475]
 4: (abort()+0x180) [0x7f3fb1e456f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f3fb269352d]
 6: (()+0x62636) [0x7f3fb2691636]
 7: (()+0x62663) [0x7f3fb2691663]
 8: (()+0x6288e) [0x7f3fb269188e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x261) [0x774b31]
 10: (Server::do_rename_rollback(ceph::buffer::list&, int, MDRequest*)+0x2a01) [0x519871]
 11: (MDCache::handle_resolve_ack(MMDSResolveAck*)+0x1025) [0x5a0135]
 12: (MDCache::dispatch(Message*)+0x145) [0x5b7ec5]
 13: (MDS::handle_deferrable_message(Message*)+0x760) [0x4c1360]
 14: (MDS::_dispatch(Message*)+0x702) [0x4d3712]
 15: (MDS::ms_dispatch(Message*)+0x1d3) [0x4d4ac3]
 16: (SimpleMessenger::dispatch_entry()+0x853) [0x758b63]
 17: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7224ad]
 18: (()+0x6b50) [0x7f3fb36b3b50]
 19: (clone()+0x6d) [0x7f3fb1ee890d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May  5 11:36:10 kahya ceph-mds: --- end dump of recent events ---

#2 Updated by Yavuz Selim Komur almost 12 years ago

Ceph hates symlinks. :)

I can't rsync the Debian and Ubuntu mirrors from local disk to CephFS.

Every single time the debian/dists tree is rsynced, ceph-mds crashes or gets killed, and it cannot come back up. I recreated the Ceph filesystem and ran the rsync again. The last time, I separated the pool and dists folders: ubuntu/pool and debian/pool synced successfully, but the debian/dists folder specifically fails.

Please calculate/resolve the "realpath" when a folder or file is created.

#3 Updated by Greg Farnum almost 12 years ago

The full Ceph filesystem is not currently well-tested, but if you can recreate this with MDS logging on and post the logs somewhere we should be able to take a look at some point. :)

#4 Updated by Yavuz Selim Komur almost 12 years ago

net.core.wmem_max = 536870912
net.core.rmem_max = 536870912
net.core.wmem_default = 262144
net.core.rmem_default = 262144
net.ipv4.tcp_wmem = 65536 262144 67108864
net.ipv4.tcp_rmem = 65536 262144 67108864


Ceph seems to like/need/want more memory.

With these settings, the rsync succeeds on 0.46, and 0.47 does even better.
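The settings above are in sysctl.conf "key = value" form. As a minimal sketch (the helper names `parse_sysctl`, `proc_path` and `triples_ordered` are hypothetical, not part of any Ceph or procps tool), the Python below parses them, shows the /proc/sys file each key maps to, prints the sizes in MiB, and sanity-checks that the `tcp_rmem`/`tcp_wmem` triples are in min, default, max order as described in tcp(7).

```python
from typing import Dict, List

# The settings from the comment above, in sysctl.conf "key = value" form.
SETTINGS = """\
net.core.wmem_max = 536870912
net.core.rmem_max = 536870912
net.core.wmem_default = 262144
net.core.rmem_default = 262144
net.ipv4.tcp_wmem = 65536 262144 67108864
net.ipv4.tcp_rmem = 65536 262144 67108864
"""

MIB = 1024 * 1024


def parse_sysctl(text: str) -> Dict[str, List[int]]:
    """Parse sysctl.conf-style lines into {key: [int, ...]}, skipping comments."""
    settings: Dict[str, List[int]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = [int(v) for v in value.split()]
    return settings


def proc_path(key: str) -> str:
    """Map a sysctl key to its /proc/sys file (dots become slashes)."""
    return "/proc/sys/" + key.replace(".", "/")


def triples_ordered(settings: Dict[str, List[int]]) -> bool:
    """Sanity check: tcp_rmem/tcp_wmem hold min, default, max (see tcp(7))."""
    for key in ("net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem"):
        low, default, high = settings[key]
        if not low <= default <= high:
            return False
    return True


if __name__ == "__main__":
    parsed = parse_sysctl(SETTINGS)
    for key, values in parsed.items():
        sizes = ", ".join(f"{v / MIB:g} MiB" for v in values)
        print(f"{proc_path(key)}: {sizes}")
    print("tcp triples ordered:", triples_ordered(parsed))
```

The sketch only reads and checks the values. Actually applying them still goes through `sysctl -p` (or writing to the /proc/sys files) as root.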

#5 Updated by Yavuz Selim Komur over 11 years ago

keepalive:/mnt/srv/ftp/debian/pool/main# cd h/
keepalive:/mnt/srv/ftp/debian/pool/main/h# ls -la
total 0
drwxr-xr-x 1 root root 18446744073702009635 Jun 17 13:19 .
drwxr-xr-x 1 root root         145343684536 Jun 17 12:56 ..
drwxrwxr-x 1 root root               186043 Jun 16 11:25 haskell-data-accessor-mtl
drwxrwxr-x 1 root root               271578 Jun 16 11:25 haskell-data-accessor-template
drwxrwxr-x 1 root root               931014 Jun 17 13:16 haskell-filestore
drwxrwxr-x 1 root root 18446744073709485334 Jun 17 13:16 haskell-happstack
drwxrwxr-x 1 root root 18446744073709495241 Jun 17 13:17 haskell-llvm
drwxrwxr-x 1 root root 18446744073709370402 Jun 17 13:17 haskell-rsa
drwxrwxr-x 1 root root 18446744073709531853 Jun 17 13:18 haskell-warp
drwxrwxr-x 1 root root 18446744073709087324 Jun 17 13:18 haskell-yaml
keepalive:/mnt/srv/ftp/debian/pool/main/h# cd haskell-rsa 
keepalive:/mnt/srv/ftp/debian/pool/main/h/haskell-rsa# ls -la
total 0
drwxrwxr-x 1 root root 18446744073709370402 Jun 17 13:17 .
drwxr-xr-x 1 root root 18446744073702009635 Jun 17 13:19 ..
keepalive:/mnt/srv/ftp/debian/pool/main/h/haskell-rsa# ls -lah
total 0
drwxrwxr-x 1 root root 16E Jun 17 13:17 .
drwxr-xr-x 1 root root 16E Jun 17 13:19 ..
keepalive:/mnt/srv/ftp/debian/pool/main/h/haskell-rsa# 
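The huge directory sizes in the listing above look like small negative numbers printed as unsigned 64-bit integers. For example, 18446744073709370402 is 2^64 - 181214, and values that close to 2^64 are why `ls -lah` shows them as 16E. The Python below is a minimal sketch that reinterprets them as two's-complement signed values. It assumes that the directory "size" CephFS reports here is the recursive byte count (rbytes) held in an unsigned 64-bit field. `as_signed64` and `SIZES` are hypothetical names for illustration.

```python
U64_MODULUS = 1 << 64
SIGN_BIT = 1 << 63


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    if not 0 <= value < U64_MODULUS:
        raise ValueError("not an unsigned 64-bit value")
    return value - U64_MODULUS if value >= SIGN_BIT else value


# Directory sizes copied from the `ls -la` listing of debian/pool/main/h.
SIZES = {
    ".": 18446744073702009635,
    "..": 145343684536,
    "haskell-data-accessor-mtl": 186043,
    "haskell-data-accessor-template": 271578,
    "haskell-filestore": 931014,
    "haskell-happstack": 18446744073709485334,
    "haskell-llvm": 18446744073709495241,
    "haskell-rsa": 18446744073709370402,
    "haskell-warp": 18446744073709531853,
    "haskell-yaml": 18446744073709087324,
}

if __name__ == "__main__":
    for name, size in SIZES.items():
        signed = as_signed64(size)
        flag = "  <- negative (wrapped)" if signed < 0 else ""
        print(f"{name:32} {signed:>14}{flag}")
```

Under that assumption, `.` and five of its subdirectories would have negative recursive byte counts. That would be consistent with the rstat rbytes mismatch in the next comment.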

#6 Updated by Yavuz Selim Komur over 11 years ago

2012-06-17 16:49:46.941046 mds.0 192.168.10.110:6800/30426 68 : [ERR] unmatched rstat rbytes on single dirfrag 100000001f3, inode has n(v1 rc2012-06-17 16:49:46.941015 b2328889 4=3+1), dirfrag has n(v1 rc2012-06-17 16:49:46.941015 b2393421 3=3+0)
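The [ERR] line above compares the recursive stats stored on the directory inode with those of its single dirfrag. The Python below is a minimal sketch that pulls the two `n(...)` blocks apart and diffs them. It assumes, based on how Ceph prints `nest_info_t`, that `b` is rbytes and `N=F+S` is total = rfiles + rsubdirs. The regex and the `parse_rstats` helper are hypothetical, not Ceph tooling, and only handle the full form seen on this line. On this line the two rbytes values differ by 64532 bytes. The file counts match, but the inode counts one subdirectory that the dirfrag does not.

```python
import re
from typing import Dict

# Assumed layout of Ceph's nest_info_t dump:
#   n(v<version> rc<rctime> b<rbytes> <total>=<rfiles>+<rsubdirs>)
# Fields can be omitted in other messages; this regex only handles the full form.
NEST_RE = re.compile(
    r"(?P<who>inode|dirfrag) has n\(v(?P<version>\d+) "
    r"rc(?P<rctime>\S+ \S+) b(?P<rbytes>\d+) "
    r"(?P<total>\d+)=(?P<rfiles>\d+)\+(?P<rsubdirs>\d+)\)"
)

FIELDS = ("version", "rbytes", "total", "rfiles", "rsubdirs")

LINE = (
    "unmatched rstat rbytes on single dirfrag 100000001f3, "
    "inode has n(v1 rc2012-06-17 16:49:46.941015 b2328889 4=3+1), "
    "dirfrag has n(v1 rc2012-06-17 16:49:46.941015 b2393421 3=3+0)"
)


def parse_rstats(line: str) -> Dict[str, Dict[str, int]]:
    """Return {"inode": {...}, "dirfrag": {...}} with the numeric fields."""
    return {
        m.group("who"): {f: int(m.group(f)) for f in FIELDS}
        for m in NEST_RE.finditer(line)
    }


if __name__ == "__main__":
    stats = parse_rstats(LINE)
    for field in ("rbytes", "rfiles", "rsubdirs"):
        inode, frag = stats["inode"][field], stats["dirfrag"][field]
        print(f"{field:9} inode={inode:<9} dirfrag={frag:<9} diff={frag - inode}")
```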

#7 Updated by Ian Colle almost 11 years ago

  • Project changed from Ceph to CephFS

#8 Updated by Zheng Yan over 10 years ago

  • Status changed from New to Can't reproduce
