Ceph : Issues
https://tracker.ceph.com/ (feed generated 2012-09-18T03:07:29Z)

Ceph - Bug #3172 (Resolved): ceph::buffer::bad_alloc downloading a large object using rados
https://tracker.ceph.com/issues/3172 (2012-09-18T03:07:29Z, John Leach <john@johnleach.co.uk>)
<p>I have a 1 GB object; when I try to get it, I get a bad_alloc error.</p>
<p>This is on a 64bit Ubuntu 12.04.1 LTS box, with packages installed from the ceph repository.</p>
<pre>
root@srv-2bz4v:~# rados -p test get zero.1 z1
terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
what(): buffer::bad_alloc
*** Caught signal (Aborted) **
in thread 7f5064534700
ceph version 0.51 (commit:c03ca95d235c9a072dcd8a77ad5274a52e93ae30)
1: rados() [0x4434fa]
2: (()+0xfcb0) [0x7f5069e98cb0]
3: (gsignal()+0x35) [0x7f5068ee5445]
4: (abort()+0x17b) [0x7f5068ee8bab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f506983369d]
6: (()+0xb5846) [0x7f5069831846]
7: (()+0xb5873) [0x7f5069831873]
8: (()+0xb596e) [0x7f506983196e]
9: (ceph::buffer::create_page_aligned(unsigned int)+0x95) [0x7f506a849635]
10: (SimpleMessenger::Pipe::read_message(Message**)+0x1db0) [0x7f506a7f02b0]
11: (SimpleMessenger::Pipe::reader()+0xb44) [0x7f506a804734]
12: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x7f506a6db2cd]
13: (()+0x7e9a) [0x7f5069e90e9a]
14: (clone()+0x6d) [0x7f5068fa14bd]
2012-09-18 08:51:14.390520 7f5064534700 -1 *** Caught signal (Aborted) **
in thread 7f5064534700
</pre>
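<p>A couple of quick checks that may help narrow this down (a sketch only, assuming the failure is a single large allocation for the whole object; the pool and object names are taken from the command above):</p>
<pre>
# confirm the object's size as rados sees it
rados -p test stat zero.1

# check free memory on the client; a ~1 GB contiguous, page-aligned
# allocation can fail even when total RAM looks sufficient
free -m
</pre>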
<p>Attached: full output from the crash and the apport core dump data.</p>

Messengers - Bug #1674 (Can't reproduce): daemons crash when sent random data
https://tracker.ceph.com/issues/1674 (2011-11-03T06:29:55Z, John Leach <john@johnleach.co.uk>)
<p>The mon seems to crash every time; the osd seems to take a few attempts (with a similar stack trace). I have not tested the mds.</p>
<pre>
( echo "ceph v027" ; dd if=/dev/urandom ) | nc 10.82.103.194 6789
</pre>
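<p>A bounded variant of the same test, in case it is useful for reproducing against the other daemon types too (a sketch; the address is the one above, and the port list is an assumption about this cluster's layout):</p>
<pre>
# send the banner followed by a few MB of random data to each daemon port;
# 6789 is the mon, 68xx are typical osd ports
for port in 6789 6800 6801; do
  ( echo "ceph v027" ; dd if=/dev/urandom bs=1M count=4 ) | nc -w 2 10.82.103.194 $port
done
</pre>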
<p>I assume this is probably fixable, but it hints at deeper problems perhaps?</p>
<p>I know ceph isn't supposed to be byzantine fault tolerant, but I doubt that means this :)</p>
<pre>
*** Caught signal (Aborted) **
in thread 7f94ec6ac700
ceph version (commit:)
1: /usr/bin/ceph-mon() [0x5c3124]
2: (()+0xf8f0) [0x7f94f12398f0]
3: (gsignal()+0x35) [0x7f94efc02a75]
4: (abort()+0x180) [0x7f94efc065c0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f94f04b88e5]
6: (()+0xcad16) [0x7f94f04b6d16]
7: (()+0xcad43) [0x7f94f04b6d43]
8: (()+0xcae3e) [0x7f94f04b6e3e]
9: (operator new[](unsigned long)+0x3be) [0x7f94f072e95e]
10: (ceph::buffer::create(unsigned int)+0x67) [0x552b57]
11: (SimpleMessenger::Pipe::accept()+0x1365) [0x539fc5]
12: (SimpleMessenger::Pipe::reader()+0x109d) [0x542d2d]
13: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x45eced]
14: (()+0x69ca) [0x7f94f12309ca]
15: (clone()+0x6d) [0x7f94efcb570d]
</pre>

Ceph - Bug #1628 (Resolved): segfault attempting to map an rbd snapshot
https://tracker.ceph.com/issues/1628 (2011-10-18T15:18:56Z, John Leach <john@johnleach.co.uk>)
<pre>
# gdb --args rbd -n client.srv-3ucz4 --snap=two map test --user srv-3ucz4 --secret secret
Reading symbols from /usr/bin/rbd...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/rbd -n client.srv-3ucz4 --snap=two map test --user srv-3ucz4 --secret secret
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff5f75700 (LWP 1321)]
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bcb170 in librbd::snap_set(librbd::ImageCtx*, char const*) () from /usr/lib/librbd.so.1
(gdb) bt
#0 0x00007ffff7bcb170 in librbd::snap_set(librbd::ImageCtx*, char const*) () from /usr/lib/librbd.so.1
#1 0x000000000044dd88 in main ()
(gdb)
</pre>
<p>Same with the test@two usage.</p>
<p>The device maps fine on its own, just not the snapshot (not sure if I'm supposed to be able to map a snapshot, but it certainly shouldn't segfault!).</p>
<p>This is a build from master from a couple of days ago.</p>

CephFS - Bug #1535 (Resolved): concurrent creating and removing directories crashes cmds
https://tracker.ceph.com/issues/1535 (2011-09-12T13:43:19Z, John Leach <john@johnleach.co.uk>)
<p>I set up two clients with a mounted ceph filesystem, and had one creating a hierarchy of empty directories in a loop and the other deleting them in a loop.</p>
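<p>Something along these lines reproduces the load (a sketch only, not the exact scripts used; the mount point and directory depth are guesses):</p>
<pre>
# client 1: create a hierarchy of empty directories in a loop
while true; do mkdir -p /mnt/ceph/dirs/$RANDOM/$RANDOM/$RANDOM; done

# client 2: delete them in a loop
while true; do rm -rf /mnt/ceph/dirs/*; done
</pre>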
<p>There are two mds servers; one crashed:</p>
<pre>
2011-09-12 18:02:35.414589 7f1c719b4700 mds0.migrator nicely exporting to mds1 [dir 10000010809 /dirs/ [2,head] auth{1=1} pv=3421 v=3417 cv=0/0 ap=2+2+3 state=1610612738|complete f(v6 m2011-09-12 17:40:09.473169 4=0+4) n(v214 rc2011-09-12 18:02:34.405528 1260=0+1260)/n(v214 rc2011-09-12 18:02:34.347155 1261=0+1261) hs=4+1,ss=0+0 dirty=4 | child replicated dirty authpin 0x122395b8]
2011-09-12 18:02:45.183404 7f1c719b4700 mds0.bal mds0 mdsload<[79.2128,0 79.2128]/[7.27142,0 7.27142], req 0, hr 0, qlen 0, cpu 0.25> = 64.8246 ~ 79.2128
2011-09-12 18:02:45.183454 7f1c719b4700 mds0.bal mds1 mdsload<[0,0 0]/[0,0 0], req 0, hr 0, qlen 0, cpu 0.33> = 532.38 ~ 650.545
mds/MDCache.cc: In function 'void MDCache::handle_dentry_link(MDentryLink*)', in thread '0x7f1c719b4700'
mds/MDCache.cc: 9213: FAILED assert(dn)
ceph version (commit:)
1: (MDCache::handle_dentry_link(MDentryLink*)+0x361) [0x56a1c1]
2: (MDCache::dispatch(Message*)+0x175) [0x5aced5]
3: (MDS::handle_deferrable_message(Message*)+0x60f) [0x4a333f]
4: (MDS::_dispatch(Message*)+0x5e5) [0x4b92f5]
5: (MDS::ms_dispatch(Message*)+0x71) [0x4ba5c1]
6: (SimpleMessenger::dispatch_entry()+0x879) [0x722769]
7: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49439c]
8: (()+0x69ca) [0x7f1c752339ca]
9: (clone()+0x6d) [0x7f1c73cb870d]
ceph version (commit:)
1: (MDCache::handle_dentry_link(MDentryLink*)+0x361) [0x56a1c1]
2: (MDCache::dispatch(Message*)+0x175) [0x5aced5]
3: (MDS::handle_deferrable_message(Message*)+0x60f) [0x4a333f]
4: (MDS::_dispatch(Message*)+0x5e5) [0x4b92f5]
5: (MDS::ms_dispatch(Message*)+0x71) [0x4ba5c1]
6: (SimpleMessenger::dispatch_entry()+0x879) [0x722769]
7: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49439c]
8: (()+0x69ca) [0x7f1c752339ca]
9: (clone()+0x6d) [0x7f1c73cb870d]
*** Caught signal (Aborted) **
in thread 0x7f1c719b4700
ceph version (commit:)
1: /usr/bin/cmds() [0x794e74]
2: (()+0xf8f0) [0x7f1c7523c8f0]
3: (gsignal()+0x35) [0x7f1c73c05a75]
4: (abort()+0x180) [0x7f1c73c095c0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f1c744bb8e5]
6: (()+0xcad16) [0x7f1c744b9d16]
7: (()+0xcad43) [0x7f1c744b9d43]
8: (()+0xcae3e) [0x7f1c744b9e3e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x716cdf]
10: (MDCache::handle_dentry_link(MDentryLink*)+0x361) [0x56a1c1]
11: (MDCache::dispatch(Message*)+0x175) [0x5aced5]
12: (MDS::handle_deferrable_message(Message*)+0x60f) [0x4a333f]
13: (MDS::_dispatch(Message*)+0x5e5) [0x4b92f5]
14: (MDS::ms_dispatch(Message*)+0x71) [0x4ba5c1]
15: (SimpleMessenger::dispatch_entry()+0x879) [0x722769]
16: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49439c]
17: (()+0x69ca) [0x7f1c752339ca]
18: (clone()+0x6d) [0x7f1c73cb870d]
</pre>
<p>The filesystem hung on both servers.</p>
<p>I restarted cmds on the crashed node and it crashed again during startup:<br /><pre>
2011-09-12 18:15:36.018540 7ffb8e9c3700 mds0.3 reconnect_done
2011-09-12 18:15:36.027241 7ffb8e9c3700 mds0.3 handle_mds_map i am now mds0.3
2011-09-12 18:15:36.027259 7ffb8e9c3700 mds0.3 handle_mds_map state change up:reconnect --> up:rejoin
2011-09-12 18:15:36.027266 7ffb8e9c3700 mds0.3 rejoin_joint_start
2011-09-12 18:15:36.018540 7ffb8e9c3700 mds0.3 reconnect_done
2011-09-12 18:15:36.027241 7ffb8e9c3700 mds0.3 handle_mds_map i am now mds0.3
2011-09-12 18:15:36.027259 7ffb8e9c3700 mds0.3 handle_mds_map state change up:reconnect --> up:rejoin
2011-09-12 18:15:36.027266 7ffb8e9c3700 mds0.3 rejoin_joint_start
mds/MDCache.cc: In function 'CDir* MDCache::rejoin_invent_dirfrag(dirfrag_t)', in thread '0x7ffb8e9c3700'
mds/MDCache.cc: 3937: FAILED assert(in->is_dir())
ceph version (commit:)
1: (MDCache::rejoin_invent_dirfrag(dirfrag_t)+0x219) [0x570a89]
2: (MDCache::handle_cache_rejoin_strong(MMDSCacheRejoin*)+0x418e) [0x590b0e]
3: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x1b3) [0x5a55f3]
4: (MDCache::dispatch(Message*)+0x105) [0x5ace65]
5: (MDS::handle_deferrable_message(Message*)+0x60f) [0x4a333f]
6: (MDS::_dispatch(Message*)+0x5e5) [0x4b92f5]
7: (MDS::ms_dispatch(Message*)+0x71) [0x4ba5c1]
8: (SimpleMessenger::dispatch_entry()+0x879) [0x722769]
9: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49439c]
10: (()+0x69ca) [0x7ffb922429ca]
11: (clone()+0x6d) [0x7ffb90cc770d]
ceph version (commit:)
1: (MDCache::rejoin_invent_dirfrag(dirfrag_t)+0x219) [0x570a89]
2: (MDCache::handle_cache_rejoin_strong(MMDSCacheRejoin*)+0x418e) [0x590b0e]
3: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x1b3) [0x5a55f3]
4: (MDCache::dispatch(Message*)+0x105) [0x5ace65]
5: (MDS::handle_deferrable_message(Message*)+0x60f) [0x4a333f]
6: (MDS::_dispatch(Message*)+0x5e5) [0x4b92f5]
7: (MDS::ms_dispatch(Message*)+0x71) [0x4ba5c1]
8: (SimpleMessenger::dispatch_entry()+0x879) [0x722769]
9: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49439c]
10: (()+0x69ca) [0x7ffb922429ca]
11: (clone()+0x6d) [0x7ffb90cc770d]
*** Caught signal (Aborted) **
in thread 0x7ffb8e9c3700
ceph version (commit:)
1: /usr/bin/cmds() [0x794e74]
2: (()+0xf8f0) [0x7ffb9224b8f0]
3: (gsignal()+0x35) [0x7ffb90c14a75]
4: (abort()+0x180) [0x7ffb90c185c0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7ffb914ca8e5]
6: (()+0xcad16) [0x7ffb914c8d16]
7: (()+0xcad43) [0x7ffb914c8d43]
8: (()+0xcae3e) [0x7ffb914c8e3e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x716cdf]
10: (MDCache::rejoin_invent_dirfrag(dirfrag_t)+0x219) [0x570a89]
11: (MDCache::handle_cache_rejoin_strong(MMDSCacheRejoin*)+0x418e) [0x590b0e]
12: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x1b3) [0x5a55f3]
13: (MDCache::dispatch(Message*)+0x105) [0x5ace65]
14: (MDS::handle_deferrable_message(Message*)+0x60f) [0x4a333f]
15: (MDS::_dispatch(Message*)+0x5e5) [0x4b92f5]
16: (MDS::ms_dispatch(Message*)+0x71) [0x4ba5c1]
17: (SimpleMessenger::dispatch_entry()+0x879) [0x722769]
18: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49439c]
19: (()+0x69ca) [0x7ffb922429ca]
20: (clone()+0x6d) [0x7ffb90cc770d]
</pre></p>
<p>The filesystem stayed stuck on the clients.</p>
<p>I stopped cmds on the other node and then started it on both nodes, and they started successfully this time. Within a minute the filesystem on the clients started working again.</p>
<p>I started up the test scripts again and cmds crashed almost immediately (on the other node this time) with a different error:</p>
<pre>
mds/CInode.cc: In function 'virtual void CInode::auth_unpin(void*)', in thread '0x7f084ddab700'
mds/CInode.cc: 1946: FAILED assert(auth_pins >= 0)
ceph version (commit:)
1: (CInode::auth_unpin(void*)+0x49e) [0x66dd7e]
2: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*, std::allocator<Context*> >*)+0x4ef) [0x5e4c1f]
3: (Locker::handle_file_lock(ScatterLock*, MLock*)+0xf22) [0x5f6582]
4: (Locker::handle_lock(MLock*)+0x1e6) [0x5f6be6]
5: (MDS::handle_deferrable_message(Message*)+0x62f) [0x4a335f]
6: (MDS::_dispatch(Message*)+0x5e5) [0x4b92f5]
7: (MDS::ms_dispatch(Message*)+0x71) [0x4ba5c1]
8: (SimpleMessenger::dispatch_entry()+0x879) [0x722769]
9: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49439c]
10: (()+0x69ca) [0x7f085162a9ca]
11: (clone()+0x6d) [0x7f08500af70d]
ceph version (commit:)
1: (CInode::auth_unpin(void*)+0x49e) [0x66dd7e]
2: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*, std::allocator<Context*> >*)+0x4ef) [0x5e4c1f]
3: (Locker::handle_file_lock(ScatterLock*, MLock*)+0xf22) [0x5f6582]
4: (Locker::handle_lock(MLock*)+0x1e6) [0x5f6be6]
5: (MDS::handle_deferrable_message(Message*)+0x62f) [0x4a335f]
6: (MDS::_dispatch(Message*)+0x5e5) [0x4b92f5]
7: (MDS::ms_dispatch(Message*)+0x71) [0x4ba5c1]
8: (SimpleMessenger::dispatch_entry()+0x879) [0x722769]
9: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49439c]
10: (()+0x69ca) [0x7f085162a9ca]
11: (clone()+0x6d) [0x7f08500af70d]
*** Caught signal (Aborted) **
in thread 0x7f084ddab700
ceph version (commit:)
1: /usr/bin/cmds() [0x794e74]
2: (()+0xf8f0) [0x7f08516338f0]
3: (gsignal()+0x35) [0x7f084fffca75]
4: (abort()+0x180) [0x7f08500005c0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f08508b28e5]
6: (()+0xcad16) [0x7f08508b0d16]
7: (()+0xcad43) [0x7f08508b0d43]
8: (()+0xcae3e) [0x7f08508b0e3e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x716cdf]
10: (CInode::auth_unpin(void*)+0x49e) [0x66dd7e]
11: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*, std::allocator<Context*> >*)+0x4ef) [0x5e4c1f]
12: (Locker::handle_file_lock(ScatterLock*, MLock*)+0xf22) [0x5f6582]
13: (Locker::handle_lock(MLock*)+0x1e6) [0x5f6be6]
14: (MDS::handle_deferrable_message(Message*)+0x62f) [0x4a335f]
15: (MDS::_dispatch(Message*)+0x5e5) [0x4b92f5]
16: (MDS::ms_dispatch(Message*)+0x71) [0x4ba5c1]
17: (SimpleMessenger::dispatch_entry()+0x879) [0x722769]
18: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49439c]
19: (()+0x69ca) [0x7f085162a9ca]
20: (clone()+0x6d) [0x7f08500af70d]
</pre>
<p>I repeated the test another time and got the same auth_unpin error.</p>
<p>This is with the latest code from master.</p>

Ceph - Bug #1502 (Resolved): osd FAILED assert(pg->log.tail <= pg->info.last_complete || pg->log....
https://tracker.ceph.com/issues/1502 (2011-09-04T11:26:12Z, John Leach <john@johnleach.co.uk>)
<p>I had a 4 osd cluster. I kill -9'd one cosd process (as a test); it was detected as failed, and the cluster became degraded and started rebuilding.</p>
<p>I then added a 5th osd node to the cluster, at which point (though it's hard to tell exactly which point) the whole cluster fell apart. A couple of cosd processes crashed (in some cases, I think, due to huge RAM inflation and the OOM killer). Restarting them gave loads of errors like this:</p>
<pre>
2011-09-04 18:02:23.916216 7f1d12bc8720 osd2 152 pg[1.45( v 24'57 lc 0'0 (4'55,24'57]+backlog n=47 ec=1 les/c 113/121 149/149/149) [] r=0 (info mismatch, log(4'55,0'0]+backlog) (log bound mismatch, empty) mlcod 0'0 inactive] Got exception 'read_log_error: Could not find hash for hoid 20000000465.00000000/head
' while reading log. Moving corrupted log file to 'corrupt_log_2011-09-04_18:02_1.45' for later analysis.
2011-09-04 18:02:23.968465 7f1d12bc8720 osd2 152 pg[0.6f( v 4'803 lc 0'0 (4'801,4'803]+backlog n=801 ec=1 les/c 128/53 149/149/54) [] r=0 (info mismatch, log(4'801,0'0]+backlog) (log bound mismatch, empty) mlcod 0'0 inactive] Got exception 'read_log_error: Could not find hash for hoid 10000000024.00000000/head
</pre>
<p>(These servers were not rebooted, so no unflushed written data should have been lost.)</p>
<p>Attached is a log from one of the osds that crashed; there is no evidence that this one OOMed, but two others definitely had processes killed by the OOM killer.</p>
<p>This server now crashes with the failed assert every time I start it; it looks like the data is corrupt on disk now (but again, there is no evidence of filesystem corruption here).</p>
<p>The crashes occurred with packages built from git commit e8b12d80b5, but I've since upgraded to commit 933e7945a.</p>

Ceph - Bug #1470 (Closed): broken osd after filestore upgrade
https://tracker.ceph.com/issues/1470 (2011-08-31T03:30:10Z, John Leach <john@johnleach.co.uk>)
<p>I upgraded to latest master (commit 961260d3a) and cosd began the automatic upgrade of the filestore, which seemed to complete fine, but the osds then failed to start. All the osds refuse to start, reporting "unable to read osd superblock".</p>
<pre>
2011-08-31 10:08:25.769905 7f701f117720 Updating collection 2.e1_head current version is 0
2011-08-31 10:08:25.783932 7f701f117720 collection 2.e1_head updated
2011-08-31 10:08:25.783971 7f701f117720 480/486 processed
2011-08-31 10:08:25.784022 7f701f117720 Updating collection 2.40_head current version is 0
2011-08-31 10:08:25.797404 7f701f117720 collection 2.40_head updated
2011-08-31 10:08:25.797440 7f701f117720 481/486 processed
2011-08-31 10:08:25.797486 7f701f117720 Updating collection 1.ea_head current version is 0
2011-08-31 10:08:25.813655 7f701f117720 collection 1.ea_head updated
2011-08-31 10:08:25.813688 7f701f117720 482/486 processed
2011-08-31 10:08:25.813727 7f701f117720 Updating collection 1.14_head current version is 0
2011-08-31 10:08:25.834735 7f701f117720 collection 1.14_head updated
2011-08-31 10:08:25.834779 7f701f117720 483/486 processed
2011-08-31 10:08:25.834823 7f701f117720 Updating collection 1.9b_head current version is 0
2011-08-31 10:08:25.860830 7f701f117720 collection 1.9b_head updated
2011-08-31 10:08:25.860863 7f701f117720 484/486 processed
2011-08-31 10:08:25.860889 7f701f117720 Updating collection 1.6a_head current version is 0
2011-08-31 10:08:25.879706 7f701f117720 collection 1.6a_head updated
2011-08-31 10:08:25.879737 7f701f117720 485/486 processed
2011-08-31 10:08:25.879773 7f701f117720 Updating collection 0.80_head current version is 0
2011-08-31 10:08:30.019669 7f701f117720 collection 0.80_head updated
2011-08-31 10:08:30.029900 7f701f117720 journal close /data/osd/journal
2011-08-31 10:08:30.077887 7f701f117720 filestore(/data/osd) mount FIEMAP ioctl is supported
2011-08-31 10:08:30.077926 7f701f117720 filestore(/data/osd) mount did NOT detect btrfs
2011-08-31 10:08:30.078420 7f701f117720 filestore(/data/osd) mount found snaps <>
2011-08-31 10:08:30.078453 7f701f117720 filestore(/data/osd) mount: WRITEAHEAD journal mode explicitly enabled in conf
2011-08-31 10:08:30.085258 7f701f117720 journal _open /data/osd/journal fd 16: 536870912 bytes, block size 4096 bytes, directio = 1
2011-08-31 10:08:30.095640 7f701f117720 journal read_entry 77070336 : seq 785930 207906 bytes
2011-08-31 10:08:30.095715 7f701f117720 journal read_entry 77283328 : seq 781861 1923 bytes
2011-08-31 10:08:30.095731 7f701f117720 journal read_entry 77283328 : seq 781861 1923 bytes
2011-08-31 10:08:30.095751 7f701f117720 journal _open /data/osd/journal fd 16: 536870912 bytes, block size 4096 bytes, directio = 1
2011-08-31 10:08:30.096167 7f701f117720 osd1 0 OSD::init() : unable to read osd superblock
2011-08-31 10:08:30.096718 7f701f117720 journal close /data/osd/journal
2011-08-31 10:08:30.098546 7f701f117720 ** ERROR: initializing osd failed: error 1: Operation not permitted
</pre>
<p>Upping the filestore debug level gives:</p>
<pre>
2011-08-31 10:26:42.757501 7f22ae768720 ceph version .commit: . process: cosd. pid: 19822
2011-08-31 10:26:42.757647 7f22ae768720 filestore(/data/osd) basedir /data/osd journal /data/osd/journal
2011-08-31 10:26:42.766997 7f22ae768720 filestore(/data/osd) mount FIEMAP ioctl is supported
2011-08-31 10:26:42.767046 7f22ae768720 filestore(/data/osd) mount did NOT detect btrfs
2011-08-31 10:26:42.767206 7f22ae768720 filestore(/data/osd) mount fsid is 1461041640
2011-08-31 10:26:42.767280 7f22ae768720 filestore(/data/osd) mount found snaps <>
2011-08-31 10:26:42.767308 7f22ae768720 filestore(/data/osd) mount op_seq is 785930
2011-08-31 10:26:42.767314 7f22ae768720 filestore(/data/osd) open_journal at /data/osd/journal
2011-08-31 10:26:42.767348 7f22ae768720 filestore(/data/osd) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2011-08-31 10:26:42.767366 7f22ae768720 filestore(/data/osd) list_collections
2011-08-31 10:26:42.775765 7f22ae768720 journal _open /data/osd/journal fd 9: 536870912 bytes, block size 4096 bytes, directio = 1
2011-08-31 10:26:42.776001 7f22ae768720 journal read_entry 77070336 : seq 785930 207906 bytes
2011-08-31 10:26:42.776023 7f22ae768720 journal read_entry 77283328 : seq 781861 1923 bytes
2011-08-31 10:26:42.776032 7f22ae768720 journal read_entry 77283328 : seq 781861 1923 bytes
2011-08-31 10:26:42.776045 7f22ae768720 journal _open /data/osd/journal fd 9: 536870912 bytes, block size 4096 bytes, directio = 1
2011-08-31 10:26:42.776374 7f22ab8bf700 filestore(/data/osd) sync_entry waiting for max_interval 5.000000
2011-08-31 10:26:42.776492 7f22ae768720 FileStore is up to date.
2011-08-31 10:26:42.776511 7f22ae768720 filestore(/data/osd) umount /data/osd
2011-08-31 10:26:42.776569 7f22aa0bc700 filestore(/data/osd) flusher_entry start
2011-08-31 10:26:42.776580 7f22aa0bc700 filestore(/data/osd) flusher_entry sleeping
2011-08-31 10:26:42.776607 7f22ab8bf700 filestore(/data/osd) sync_entry force_sync set
2011-08-31 10:26:42.776719 7f22aa0bc700 filestore(/data/osd) flusher_entry awoke
2011-08-31 10:26:42.776729 7f22aa0bc700 filestore(/data/osd) flusher_entry finish
2011-08-31 10:26:42.776850 7f22ae768720 journal close /data/osd/journal
2011-08-31 10:26:42.776906 7f22ae768720 filestore(/data/osd) stop_logger
2011-08-31 10:26:42.778024 7f22ae768720 filestore(/data/osd) test_mount basedir /data/osd journal /data/osd/journal
2011-08-31 10:26:42.778689 7f22ae768720 osd1 0 mounting /data/osd /data/osd/journal
2011-08-31 10:26:42.778726 7f22ae768720 filestore(/data/osd) basedir /data/osd journal /data/osd/journal
2011-08-31 10:26:42.786620 7f22ae768720 filestore(/data/osd) mount FIEMAP ioctl is supported
2011-08-31 10:26:42.786640 7f22ae768720 filestore(/data/osd) mount did NOT detect btrfs
2011-08-31 10:26:42.786677 7f22ae768720 filestore(/data/osd) mount fsid is 1461041640
2011-08-31 10:26:42.786730 7f22ae768720 filestore(/data/osd) mount found snaps <>
2011-08-31 10:26:42.786746 7f22ae768720 filestore(/data/osd) mount op_seq is 785930
2011-08-31 10:26:42.786751 7f22ae768720 filestore(/data/osd) open_journal at /data/osd/journal
2011-08-31 10:26:42.786763 7f22ae768720 filestore(/data/osd) mount: WRITEAHEAD journal mode explicitly enabled in conf
2011-08-31 10:26:42.786772 7f22ae768720 filestore(/data/osd) list_collections
2011-08-31 10:26:42.792436 7f22ae768720 journal _open /data/osd/journal fd 17: 536870912 bytes, block size 4096 bytes, directio = 1
2011-08-31 10:26:42.792578 7f22ae768720 journal read_entry 77070336 : seq 785930 207906 bytes
2011-08-31 10:26:42.792594 7f22ae768720 journal read_entry 77283328 : seq 781861 1923 bytes
2011-08-31 10:26:42.792601 7f22ae768720 journal read_entry 77283328 : seq 781861 1923 bytes
2011-08-31 10:26:42.792614 7f22ae768720 journal _open /data/osd/journal fd 17: 536870912 bytes, block size 4096 bytes, directio = 1
2011-08-31 10:26:42.792888 7f22a60b4700 filestore(/data/osd) sync_entry waiting for max_interval 5.000000
2011-08-31 10:26:42.792984 7f22a48b1700 filestore(/data/osd) flusher_entry start
2011-08-31 10:26:42.792997 7f22a48b1700 filestore(/data/osd) flusher_entry sleeping
2011-08-31 10:26:42.793051 7f22ae768720 osd1 0 boot
2011-08-31 10:26:42.793078 7f22ae768720 filestore(/data/osd) read meta/osd_superblock/0 0~0
2011-08-31 10:26:42.793229 7f22ae768720 filestore(/data/osd) FileStore::read(meta/osd_superblock/0): open error error 2: No such file or directory
2011-08-31 10:26:42.793242 7f22ae768720 osd1 0 OSD::init() : unable to read osd superblock
2011-08-31 10:26:42.793247 7f22ae768720 filestore(/data/osd) umount /data/osd
2011-08-31 10:26:42.793284 7f22a48b1700 filestore(/data/osd) flusher_entry awoke
2011-08-31 10:26:42.793295 7f22a48b1700 filestore(/data/osd) flusher_entry finish
2011-08-31 10:26:42.793327 7f22a60b4700 filestore(/data/osd) sync_entry force_sync set
2011-08-31 10:26:42.793503 7f22ae768720 journal close /data/osd/journal
2011-08-31 10:26:42.793544 7f22ae768720 filestore(/data/osd) stop_logger
2011-08-31 10:26:42.793625 7f22ae768720 ** ERROR: initializing osd failed: error 1: Operation not permitted
</pre>

Ceph - Bug #1434 (Resolved): osds all failing each other
https://tracker.ceph.com/issues/1434 (2011-08-22T15:46:14Z, John Leach <john@johnleach.co.uk>)
<p>This is a build of git tree 3a623bb327.</p>
<p>The cluster boots, then within a minute all the osds start failing each other. The logs show heartbeats not being received.</p>
<p>It was working fine before, though this is the same cluster that was previously reporting errant scrub stat mismatches (but otherwise apparently functioning fine). See ticket <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: errant scrub stat mismatch logs after upgrade (Resolved)" href="https://tracker.ceph.com/issues/1376">#1376</a>.</p>
<p>See the attached log from one of my 4 osds.</p>

Ceph - Bug #1376 (Resolved): errant scrub stat mismatch logs after upgrade
https://tracker.ceph.com/issues/1376 (2011-08-08T15:50:51Z, John Leach <john@johnleach.co.uk>)
<p>I upgraded from git commit 394537092d to git commit 68cbbf42c42, and after restarting the cluster I immediately saw many "scrub stat mismatch" errors:</p>
<pre>
2011-08-08 21:45:00.590610 log 2011-08-08 21:44:53.557073 osd1 10.42.77.90:6800/10375 1 : [ERR] 0.2 scrub stat mismatch, got 63/63 objects, 0/0 clones, 542633/18446744073701705641 bytes, 566/18446744073709543990 kb.
2011-08-08 21:45:01.646482 log 2011-08-08 21:44:56.610022 osd2 10.219.16.42:6801/16231 1 : [ERR] 0.6 scrub stat mismatch, got 76/76 objects, 0/0 clones, 3822237/18446744073709179549 bytes, 3772/18446744073709551292 kb.
2011-08-08 21:45:03.779548 log 2011-08-08 21:44:55.965413 osd3 10.200.35.118:6800/1574 10 : [ERR] 0.15 scrub stat mismatch, got 83/83 objects, 0/0 clones, 2329158/18446744073707686470 bytes, 2321/18446744073709549841 kb.
</pre>
<p>As these came in from multiple osds, and just after an upgrade (I've never seen them before and have done many upgrades), it looks more like a bug than real data corruption.</p>
<p>Cluster layout is:<br /><pre>
2011-08-08 22:45:29.712106 pg v35386: 800 pgs: 800 active+clean; 2679 MB data, 13032 MB used, 2979 GB / 3152 GB avail
2011-08-08 22:45:29.713446 mds e91021: 2/2/2 up {0=1=up:active,1=0=up:active}
2011-08-08 22:45:29.713474 osd e102: 4 osds: 4 up, 4 in
2011-08-08 22:45:29.713520 log 2011-08-08 22:38:41.736291 osd3 10.200.35.118:6800/2525 297 : [ERR] 0.15 scrub 1 errors
2011-08-08 22:45:29.713583 mon e1: 3 mons at {0=10.126.174.94:6789/0,1=10.82.103.194:6789/0,2=10.115.202.218:6789/0}
</pre></p>
<p>The cluster is just a test cluster with no real data, and at the time of the upgrade it had no clients accessing it.</p>
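<p>For reference, a scrub can be requested by hand along these lines (a sketch, assuming the pg/osd scrub subcommands available in this release; the pg and osd ids are taken from the log excerpts above):</p>
<pre>
# scrub one of the placement groups that reported a mismatch
ceph pg scrub 0.15

# or scrub everything on a single osd
ceph osd scrub 3
</pre>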
<p>Attached is a log from one of the osds after manually requesting a scrub (debug level 20).</p>

CephFS - Bug #805 (Resolved): mds startup: _replay journaler got error -22, aborting
https://tracker.ceph.com/issues/805 (2011-02-14T13:02:56Z, John Leach <john@johnleach.co.uk>)
<p>As per <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: mds assert failed replaying journal after respawn (Resolved)" href="https://tracker.ceph.com/issues/803">#803</a>, the assert is now fixed but starting up the cluster I now get:<br /><pre>
2011-02-14 21:02:14.026899 7f91c9865700 mds0.14 ms_handle_connect on 10.135.211.78:6801/22875
2011-02-14 21:02:59.867133 7f91c9865700 mds0.cache creating system inode with ino:100
2011-02-14 21:02:59.867415 7f91c9865700 mds0.cache creating system inode with ino:1
2011-02-14 21:03:05.579551 7f91c7059700 mds0.journaler try_read_entry got 0 len entry at offset 4051698581
2011-02-14 21:03:05.579615 7f91c7059700 mds0.log _replay journaler got error -22, aborting
2011-02-14 21:03:05.582298 7f91c7059700 mds0.14 boot_start encountered an error, failing
2011-02-14 21:03:05.582323 7f91c7059700 mds0.14 suicide. wanted up:replay, now down:dne
2011-02-14 21:03:05.583656 7f91cc303720 stopped.
</pre></p>
<p>See <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: mds assert failed replaying journal after respawn (Resolved)" href="https://tracker.ceph.com/issues/803">#803</a> for more logs and other details.</p>

Ceph - Bug #803 (Resolved): mds assert failed replaying journal after respawn
https://tracker.ceph.com/issues/803 (2011-02-13T12:49:59Z, John Leach <john@johnleach.co.uk>)
<pre>
sdc/Journaler.h:225: FAILED assert(readonly || state == STATE_READHEAD)
</pre>
<p>I created a dir with about 500,000 files, then ran "ls" in it. The process ran for many minutes, with the mds process at 100% cpu (and osd processes quite busy too iirc). Whilst it was running, I used injectargs to experiment with a few debug levels, at which point the mds crashed and now won't start back up.</p>
<p>Judging from the mds logs, it was detected as being laggy (possibly due to my messing with debug settings?), got set as down, respawned, but failed to replay the journal (logs attached).</p>
<p>No other processes crashed or oomed (all 4 osds and 3 mons stayed up).</p>
<p>It's a build from the master branch, commit da6966958471db1dbf20f30e467221338b2b2e7d.</p>
<p>Possibly related to <a class="issue tracker-1 status-3 priority-4 priority-default closed parent" title="Bug: mount hung, tid timed out messages in log (Resolved)" href="https://tracker.ceph.com/issues/777">#777</a>.</p>

Ceph - Bug #777 (Resolved): mount hung, tid timed out messages in log
https://tracker.ceph.com/issues/777 (2011-02-05T10:45:07Z, John Leach <john@johnleach.co.uk>)
<p>I have a ceph cluster with 3 mons, 1 mds and 4 osds. I mounted the ceph filesystem on another machine using the default 2.6.37 kernel client and wrote a script to create a million files, between 1k and 4k in size. It ran for some unknown amount of time (at a guess, at least 30 to 60 minutes), putting load on the cluster, but at some point my script process got stuck in D state and the mount became inaccessible (I could see the free space, but any kind of read or write operation on the mount stuck the process in D state) and all load ceased.</p>
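<p>Something like this generates the described load (a sketch, not the exact script; the mount point is an example, and file sizes are drawn between 1 KB and 4 KB):</p>
<pre>
mkdir -p /mnt/ceph/files
for i in $(seq 1 1000000); do
  size=$(( (RANDOM % 3073) + 1024 ))   # 1024..4096 bytes
  head -c $size /dev/urandom > /mnt/ceph/files/file.$i
done
</pre>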
<p>In the kernel log on the client I see messages like "tid 405965 timed out on osd1, will reset osd" (see client.log). You'll also see where I tried restarting the whole ceph cluster (though not the client) in an attempt to get the mount working again (with no luck).</p>
<p>Logs for 3 of the 4 osds attached (osd0 never seemed to have any timed out messages).</p>
<p>I started the test at around 12:30 on 4th Feb. I turned osd debugging on at about 23:36 on 4th Feb and shut the cluster down to go to bed at around 00:45 on 5th Feb.</p>
<p>I've also included a tarball of the /sys/kernel/debug/ceph directory on the client (though this was made today, many hours after the cluster had been shut down).</p>

Ceph - Bug #698 (Resolved): cosd memory usage with large number of pools
https://tracker.ceph.com/issues/698 (2011-01-11T10:49:23Z, John Leach <john@johnleach.co.uk>)
<p>I reported this on the mailing list a week ago but never filed it here. Still present in 0.24.1.</p>
<p>I've got a 3 node test cluster (3 mons, 3 osds) with about 24,000,000 very small objects across 2400 pools (written directly with librados; this isn't a ceph filesystem).</p>
<p>The cosd processes have steadily grown in RAM size, have finally exhausted RAM and are getting killed by the OOM killer (the nodes have 6 GB RAM and no swap).</p>
<p>When I start them back up they just very quickly increase in RAM size again and get killed.</p>
<p>I'm running ceph 0.24 (and 0.24.1) on 64-bit Ubuntu Lucid servers. In case it's useful, I've just written these objects serially: no reading, no rewrites, no updates and no snapshots.</p>
<p>I haven't touched the pg_nums on this cluster that I recall (it's been up a couple of weeks but has been used nearly exclusively for writing this test data).</p>
<p>A tcmalloc heap profiling report is attached (the last profile written before cosd was killed).</p>
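<p>For anyone wanting to reproduce the profiling: a heap profile like the attached one can be gathered with gperftools roughly as follows (a sketch; the paths, osd id and profile filename are examples):</p>
<pre>
# run cosd with tcmalloc heap profiling enabled
HEAPPROFILE=/tmp/cosd.hprof /usr/bin/cosd -i 0 -c /etc/ceph/ceph.conf

# after the process is killed, summarise the last profile it wrote
google-pprof --text /usr/bin/cosd /tmp/cosd.hprof.0042.heap | head -n 20
</pre>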
<p>cosd debug output from around the time of the final ram exhaustion:</p>
<pre>
2011-01-05 00:17:58.532524 mon e1: 3 mons at {0=10.135.211.78:6789/0,1=10.61.136.222:6789/0,2=10.202.105.222:6789/0}
2011-01-05 00:22:53.325264 osd e10659: 3 osds: 3 up, 3 in
2011-01-05 00:22:53.383272 pg v151295: 20936 pgs: 1 creating, 2 peering, 10352 crashed+peering, 3052 active+clean+degraded, 7053 degraded+peering, 476 crashed+degraded+peering; 24130 MB data, 266 GB used, 332 GB / 630 GB avail; 12489924/49420044 degraded (25.273%)
2011-01-05 00:22:53.422433 log 2011-01-05 00:22:53.325027 mon0 10.135.211.78:6789/0 4 : [INF] osd0 10.135.211.78:6801/31836 boot
2011-01-05 00:24:47.301186 pg v151296: 20936 pgs: 1 creating, 2 peering, 10352 crashed+peering, 3052 active+clean+degraded, 7053 degraded+peering, 476 crashed+degraded+peering; 24130 MB data, 266 GB used, 332 GB / 630 GB avail; 12489924/49420044 degraded (25.273%)
<cosd crashes here>
2011-01-05 00:25:52.422340 log 2011-01-05 00:25:52.189259 mon0 10.135.211.78:6789/0 5 : [INF] osd0 10.135.211.78:6801/31836 failed (by osd2 10.61.136.222:6800/915)
2011-01-05 00:25:57.265635 log 2011-01-05 00:25:57.121870 mon0 10.135.211.78:6789/0 6 : [INF] osd0 10.135.211.78:6801/31836 failed (by osd2 10.61.136.222:6800/915)
2011-01-05 00:26:02.341805 osd e10660: 3 osds: 2 up, 3 in
2011-01-05 00:26:02.362526 log 2011-01-05 00:26:02.127627 mon0 10.135.211.78:6789/0 7 : [INF] osd0 10.135.211.78:6801/31836 failed (by osd2 10.61.136.222:6800/915)
2011-01-05 00:26:02.470942 pg v151297: 20936 pgs: 1 creating, 2 peering, 10352 crashed+peering, 3052 active+clean+degraded, 7053 degraded+peering, 476 crashed+degraded+peering; 24130 MB data, 266 GB used, 332 GB / 630 GB avail; 12489924/49420044 degraded (25.273%)
2011-01-05 00:26:12.578266 pg v151298: 20936 pgs: 1 creating, 2 peering, 3393 crashed+peering, 3052 active+clean+degraded, 7053 degraded+peering, 7435 crashed+degraded+peering; 24130 MB data, 266 GB used, 332 GB / 630 GB avail; 20728862/49420044 degraded (41.944%)
</pre>

Ceph - Bug #629 (Resolved): cosd segfaults when deleting a pool containing degraded objects
https://tracker.ceph.com/issues/629 (2010-12-02T16:54:13Z, John Leach <john@johnleach.co.uk>)
<p>I started a 4 node osd cluster, created some pools with some objects in them, killed one osd node, and waited for it to be noticed and for the cluster to become degraded. I then deleted 3 pools containing degraded objects (using rados rmpool), and shortly afterward the other cosd processes segfaulted:</p>
<pre>
2010-12-03 00:48:39.443120 7fffeab28710 osd1 5181 pg[385.0( v 1158'1219 lc 0'0 (1158'1217,1158'1219]+backlog n=1219 ec=1155 les=5138 5158/5158/5158) [] r=-1 (info mismatch, log(0'0,0'0]) stray DELETING] write_log to 0~0
2010-12-03 00:48:39.443155 7fffeab28710 osd1 5181 _remove_pg 385.0 0 objects
2010-12-03 00:48:39.443163 7fffeab28710 osd1 5181 _remove_pg 385.0 flushing store
2010-12-03 00:48:39.443815 7fffeab28710 osd1 5181 _remove_pg 385.0 taking osd_lock
2010-12-03 00:48:39.443832 7fffeab28710 osd1 5181 _remove_pg 385.0 removing final
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeab28710 (LWP 13457)]
0x00000000004c37b4 in OSD::_put_pool(int) ()
(gdb) bt
#0 0x00000000004c37b4 in OSD::_put_pool(int) ()
#1 0x00000000004d6e7a in OSD::_remove_pg(PG*) ()
#2 0x00000000005d0a4f in ThreadPool::worker() ()
#3 0x00000000004feeed in ThreadPool::WorkThread::entry() ()
#4 0x0000000000470baa in Thread::_entry_func(void*) ()
#5 0x00007ffff79c29ca in start_thread () from /lib/libpthread.so.0
#6 0x00007ffff694070d in clone () from /lib/libc.so.6
#7 0x0000000000000000 in ?? ()
</pre>
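<p>Condensed, the reproduction is roughly the following (a sketch; pool and object names are examples):</p>
<pre>
# create a pool with some objects in it
rados mkpool test1
rados -p test1 put obj1 /etc/hosts

# kill one cosd, wait for the cluster to report itself degraded, then:
rados rmpool test1
</pre>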
<p>A full all-thread backtrace is attached.</p>

Ceph - Bug #622 (Resolved): crushtool useless parse error
https://tracker.ceph.com/issues/622 (2010-12-02T03:07:32Z, John Leach <john@johnleach.co.uk>)
<p>I can't decide whether this is a bug in crushtool or a bug in my crushmap, but whichever it is, the error message isn't helpful.</p>
<p>Attached is my human readable crushmap. When I compile this with crushtool, I get an error:</p>
<pre>
root@srv-an56n:~# crushtool -c /tmp/crush.txt -o /tmp/crush.new
/tmp/crush.txt:28 error: parse error at '# regions'
</pre>
<p>If I delete that comment line I get:</p>
<pre>
/tmp/crush.txt:27 error: parse error at ''
</pre>
<p>I'm certain I successfully compiled a very similar crushmap with 0.23.1.</p>
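<p>One way to localise a parse error like this is to round-trip a known-good map and diff it against the edited text (a sketch; file names are examples):</p>
<pre>
# grab the map the cluster is currently using and decompile it
ceph osd getcrushmap -o /tmp/crush.bin
crushtool -d /tmp/crush.bin -o /tmp/crush.known-good.txt

# compare against the hand-edited map that fails to compile
diff -u /tmp/crush.known-good.txt /tmp/crush.txt
crushtool -c /tmp/crush.txt -o /tmp/crush.new
</pre>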
<p>This is with <a class="changeset" title="osd: fix log tail vs last_complete assert on replica activation The last_complete may be below t..." href="https://tracker.ceph.com/projects/ceph/repository/revisions/78a14622438addcd5c337c4924cce1f67d053ee9">78a14622438addcd5c337c4924cce1f67d053ee9</a> from the rc branch.</p>

Ceph - Bug #621 (Resolved): error building unstable branch, rbd.cc:837: error: no matching functi...
https://tracker.ceph.com/issues/621 (2010-12-01T07:08:16Z, John Leach <john@johnleach.co.uk>)
<p>Building on i386 Ubuntu Lucid, it fails while building rbd.</p>
<p>This is a build of unstable at commit bf784cdb4f605c467eb094a0bc226115cc751b93.</p>
<pre>
g++ -DHAVE_CONFIG_H -I. -Wall -D__CEPH__ -D_FILE_OFFSET_BITS=64 -D_REENTRANT -D_THREAD_SAFE -rdynamic -g -O2 -MT rbd.o -MD -MP -MF .deps/rbd.Tpo -c -o rbd.o rbd.cc
rbd.cc: In function 'int do_import(void*, const char*, int, const char*)':
rbd.cc:837: error: no matching function for call to 'min(uint64_t&, off_t)'
make[3]: *** [rbd.o] Error 1
make[3]: Leaving directory `/build/buildd/ceph-0.23.1+gitbf784cdb4/src'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/build/buildd/ceph-0.23.1+gitbf784cdb4/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/build/buildd/ceph-0.23.1+gitbf784cdb4'
make: *** [build-stamp] Error 2
</pre>
<p>full build log here: <a class="external" href="http://launchpadlibrarian.net/59908082/buildlog_ubuntu-lucid-i386.ceph_0.23.1%2Bgitbf784cdb4-2_FAILEDTOBUILD.txt.gz">http://launchpadlibrarian.net/59908082/buildlog_ubuntu-lucid-i386.ceph_0.23.1%2Bgitbf784cdb4-2_FAILEDTOBUILD.txt.gz</a></p>