Activity

From 11/05/2010 to 12/04/2010

12/04/2010

08:53 PM Tasks #616 (Rejected): radosacl needs a man page
Sage Weil
08:52 PM Bug #627 (Resolved): replace openssl with crypto++
Sage Weil
08:38 PM CephFS Feature #630 (Resolved): release caps on inodes unlinked by other clients
If client A writes a file, and client B unlinks it, client A needs to drop the inode sooner rather than later.
O...
Sage Weil
03:34 AM Revision 15d8bdf3 (ceph): crypto: use crypto++ for aes instead of openssl
need to implement it more efficiently, currently going through a string object Yehuda Sadeh
03:34 AM Revision 58f3ce4a (ceph): crypto: test for allocation failure, cleanup
Yehuda Sadeh
03:34 AM Revision 6ec622c0 (ceph): common: use ceph_armor instead of openssl based functions
also modify ceph_[un]armor to get dest buffer length Yehuda Sadeh
03:34 AM Revision 7fa9426c (ceph): makefile.am: most binaries (except rgw_*) don't link with openssl
Yehuda Sadeh
03:34 AM Revision e135e924 (ceph): crypto: remove old openssl implementation
Yehuda Sadeh
03:34 AM Revision 76e02c71 (ceph): common: remove base64.c
Yehuda Sadeh
03:34 AM Revision 88213770 (ceph): crypto: change include
Yehuda Sadeh
03:34 AM Revision a28b4494 (ceph): configure: check for the presence of libcrypto++ header files
Yehuda Sadeh
03:34 AM Revision f2424dfb (ceph): rgw: get rid of openssl altogether
Yehuda Sadeh
03:34 AM Revision e0059259 (ceph): rgw: null terminate armor result
Yehuda Sadeh
03:34 AM Revision 23f37043 (ceph): ceph.spec.in: update dependency
Yehuda Sadeh

12/03/2010

06:02 PM Revision a457cbb9 (ceph): mon: fix typo
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:02 PM Revision 378d13df (ceph): osd: remove poid/soid from ScrubMap::object; clean up callers
The soid is in the key in the map; no need to store it in the value.
Update the scrub code appropriately.
Signed-off...
Sage Weil
05:35 PM Revision a4cc929c (ceph): make: create log directories and tmp directories
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
05:10 PM Revision a5297388 (ceph): msgr: Correctly handle half-open connections.
If poll() says a socket is ready for reading, but zero bytes
are read, that means that the peer has sent a FIN. Hand...
Jim Schutt
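The half-open check described in this commit message (poll() reports the socket readable, yet the read returns zero bytes) can be illustrated with a minimal sketch. This is not the Ceph messenger code: the helper name `read_when_ready` is hypothetical, and a local `socket.socketpair()` stands in for a TCP connection whose peer has sent a FIN. It assumes a POSIX system where `select.poll` is available.

```python
import select
import socket


def read_when_ready(sock, timeout_ms=1000):
    """Poll sock for readability, then read once.

    Returns None if nothing became readable within the timeout,
    b"" if the socket was readable but zero bytes were read
    (the peer has closed its end), or the bytes received.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)
    if not events:
        return None
    return sock.recv(4096)


# Case 1: the peer closes without sending anything.
a, b = socket.socketpair()
b.close()
closed_result = read_when_ready(a)
a.close()

# Case 2: the peer sends data and stays open.
c, d = socket.socketpair()
d.sendall(b"ping")
data_result = read_when_ready(c)
c.close()
d.close()
```

A caller would treat an empty result as the signal to tear the connection down rather than polling again, since a closed peer keeps the socket readable indefinitely.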
11:11 AM Bug #590 (In Progress): osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
Colin McCabe
11:11 AM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
Hi Fred,
When you say you removed the PGs "by hand"... does that mean you used "rm -rf" on the object store while ...
Colin McCabe
07:33 AM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
this is with ceph rc 39b42b21e9805b3ec838f8682420166fede719f2
I tried to solve the ENOSPC problem by removing PGs ...
ar Fred
09:36 AM CephFS Bug #623 (Resolved): MDS: MDSTable::load_2
Sage Weil
12:32 AM CephFS Bug #623: MDS: MDSTable::load_2
Yes, tried with the latest rc, works!
MDS starts and recovers; mounting and using the FS also goes fine.
Wido den Hollander
09:35 AM Bug #625 (Resolved): make install should create dirs
implemented by commit:a4cc929cedb0ee773a2fa68d691a9951221ae31a and commit:39b42b21e9805b3ec838f8682420166fede719f2
C.
Colin McCabe
01:35 AM Revision 39b42b21 (ceph): make: create /etc/ceph if it doesn't exist
make: create /etc/ceph if it doesn't exist. On uninstall, remove the
directory if it's empty. (Never remove a user's ...
Colin Patrick McCabe
12:56 AM Revision da5ab7c9 (ceph): osd: object_info_t: decode old versions correctly
object_info_t has one constructor that initializes everything from a
bufferlist. This means that the decode function ...
Colin Patrick McCabe
12:18 AM Revision 03eb4e7a (ceph): man: add man page for cephfs
Add to Makefile, debian, and ceph.spec.in bits Greg Farnum

12/02/2010

07:52 PM Revision 6518fae3 (ceph): watch: some more linger fixes
Yehuda Sadeh
06:16 PM CephFS Feature #91: mds: up:shadow mode
I have yet to implement trimming, but the basic restarting-replay bits are now in place along with hooks to make it s... Greg Farnum
05:14 PM Bug #479: ceph/mount crash badly when writing
Hi all:
Ok, so I gitted again, original/unstable,
- Linux ss1 2.6.36-02063601-generic #201011231330 SMP Tue Nov 2...
DongJin Lee
05:07 PM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
Colin McCabe
05:07 PM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
Hi Fred,
I think the assertion you're seeing here was fixed very recently by commit:78a14622438addcd5c337c4924cce1...
Colin McCabe
05:02 PM Bug #629: cosd segfaults when deleting a pool containing degraded objects
Looks like some kind of lifecycle issue related to deleting pools.
OSD::_remove_pg does a _put_pool, and that does...
Colin McCabe
04:54 PM Bug #629 (Resolved): cosd segfaults when deleting a pool containing degraded objects
started a 4 node osd cluster. created some pools with some objects in them. killed one osd node. waited for it to be ... John Leach
04:52 PM CephFS Bug #623: MDS: MDSTable::load_2
I think that commit:da5ab7c9a49f8996b41783175683d4b8b13ece4d should fix this issue.
wido, can you re-run with the ...
Colin McCabe
04:44 PM CephFS Bug #623: MDS: MDSTable::load_2
root@noisy:/var/log/ceph# grep mark_all_unfound_as_lost *
[ no results ]
So we're not marking things as lost in...
Colin McCabe
11:52 AM CephFS Bug #623: MDS: MDSTable::load_2
Actually, -23 is ENFILE, which I think is coming from the LOST code...but that should never trigger unless the admin ha... Sage Weil
05:12 AM CephFS Bug #623 (Resolved): MDS: MDSTable::load_2
On a small test machine I have a Ceph RC cluster running (which was running an old unstable before), after my upgrade ... Wido den Hollander
04:13 PM Tasks #617: cephfs needs a man page
Already had it in the Makefile, put it in the other bits and updated the commit. Greg Farnum
03:56 PM Tasks #617: cephfs needs a man page
need to add filename to debian/ceph.install and ceph.spec.in too.
and to man/Makefile.am
Sage Weil
02:07 PM Tasks #617 (Resolved): cephfs needs a man page
Done in commit:6cdaa2f6a7670357313401ddbd322bdf529a1547 on the rc branch. Greg Farnum
03:29 PM Bug #622 (Resolved): crushtool useless parse error
Resolved-- the crushmap.txt was bad.
I created #628 for getting better error messages from crushtool.
Colin McCabe
01:19 PM Bug #622: crushtool useless parse error
There is a more advanced error handling API for spirit described at:
http://www.boost.org/doc/libs/1_41_0/libs/spiri...
Colin McCabe
11:35 AM Bug #622: crushtool useless parse error
Reposting the diff; hopefully clearer this time.
--- crushmap.txt.1 2010-12-02 11:38:43.816441440 -0800
++...
Colin McCabe
11:33 AM Bug #622: crushtool useless parse error
I was able to get the crushmap.txt to work by deleting the word "domain" in the gb1 region.
We should definitely h...
Colin McCabe
03:07 AM Bug #622 (Resolved): crushtool useless parse error
I can't decide whether this is a bug in crushtool or a bug in my crushmap but whichever it is, the error message isn'... John Leach
03:27 PM RADOS Feature #628 (New): crushtool: better error messages when parsing a crushmap.txt
There is a more advanced error handling API for spirit described at:
http://www.boost.org/doc/libs/1_41_0/libs/spiri...
Colin McCabe
03:22 PM Bug #625: make install should create dirs
Should be pretty straightforward. The only question is, should we remove those directories on an uninstall? Colin McCabe
11:11 AM Bug #625 (Resolved): make install should create dirs
/var/log/ceph
/var/lib/ceph/tmp
?
check debian/ceph.dirs to see what else gets created...
Sage Weil
11:57 AM Bug #627 (Resolved): replace openssl with crypto++
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/684011 Sage Weil
11:25 AM CephFS Feature #626 (Closed): qa: add IOR, rompio, or other parallel workloads suite
We've had reports that rompio is just terrifically unstable, and shows serious scaling issues.
IOR is a more commo...
Greg Farnum
09:39 AM Feature #624: radostool: make 'put' write large objects in chunks
Could the chunk size be settable with an argument, for testing this kind of thing in future? John Leach
09:38 AM Feature #624 (Resolved): radostool: make 'put' write large objects in chunks
otherwise a put on a large (100mb+) file can fail because it exceeds the size of the osd journals. it's also clearly... Sage Weil
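The chunked put requested in #624 can be sketched roughly as follows. This is an illustration only, not radostool's implementation: `put_in_chunks` and the `write_at(offset, data)` callback are hypothetical names, the 4 MiB default is an arbitrary assumption, and an in-memory bytearray stands in for the object store. The chunk size is a parameter, in the spirit of the comment above asking for it to be configurable for testing.

```python
import io


def put_in_chunks(src, write_at, chunk_size=4 * 1024 * 1024):
    """Copy a file-like object to a store, one bounded write at a time.

    write_at(offset, data) is a hypothetical callback that writes data
    at the given byte offset. Returns the total number of bytes written.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        data = src.read(chunk_size)
        if not data:
            break  # end of input
        write_at(offset, data)
        offset += len(data)
    return offset


# In-memory stand-in for an object; records each write for inspection.
stored = bytearray()
calls = []


def fake_write(offset, data):
    calls.append((offset, len(data)))
    stored[offset:offset + len(data)] = data


total = put_in_chunks(io.BytesIO(b"abcdefghij"), fake_write, chunk_size=4)
```

Because no single write exceeds `chunk_size`, the size of each operation stays bounded regardless of how large the source file is.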

12/01/2010

11:40 PM Revision 78a14622 (ceph): osd: fix log tail vs last_complete assert on replica activation
The last_complete may be below the log tail IFF we have a backlog.
Fixes 756918be3b24d8164699da301ddfbc8e6fd6b751.
...
Sage Weil
11:11 PM Revision 63fab458 (ceph): rados_bencher.h:
bench_write and bench_seq will now wait on any write/read
rather than the one least recently started.
bench_write ...
Samuel Just
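The change described here, waiting on whichever in-flight operation finishes first instead of the one started earliest, can be sketched with Python's standard `concurrent.futures`. This is an analogy rather than the rados_bencher.h code: `fake_op` and `run_ops` are hypothetical names, and sleeps stand in for reads and writes.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def fake_op(index, seconds):
    """Stand-in for one read or write; sleeps, then returns its index."""
    time.sleep(seconds)
    return index


def run_ops(durations, max_in_flight=2):
    """Keep up to max_in_flight ops running and refill a slot as soon
    as ANY op completes. Returns op indices in completion order."""
    order = []
    queue = list(enumerate(durations))
    in_flight = set()
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        while queue or in_flight:
            # Fill every free slot before waiting.
            while queue and len(in_flight) < max_in_flight:
                index, seconds = queue.pop(0)
                in_flight.add(pool.submit(fake_op, index, seconds))
            # Wake on the first completion, not on the oldest op.
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            order.extend(sorted(f.result() for f in done))
    return order


# Op 0 is slow; ops 1 and 2 are fast and should not queue behind it.
order = run_ops([0.3, 0.05, 0.05])
```

With the slow operation started first, waiting only on the oldest operation would leave the second slot idle until it finished; waiting on any completion lets the fast operations cycle through that slot in the meantime.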
11:00 PM Revision 0ea601ab (ceph): Create SyslogStreambuf
SyslogStreambuf is a kind of stream buffer that allows you to output
characters from an ostream to syslog. Most stand...
Colin Patrick McCabe
09:48 PM Revision a3d8c527 (ceph): filestore: call lower-level do_transactions() during journal replay
We used to call apply_transactions, which avoided rejournaling anything
because the journal wasn't writeable yet, but...
Sage Weil
09:46 PM Revision 9ecbc300 (ceph): filestore: do journal mode autodetect and sanity check _before_ replay
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:25 PM Tasks #617: cephfs needs a man page
I'll get this tomorrow. I wrote the tool and have had a task in my private manager to do this ever since then. Greg Farnum
07:05 PM Revision f9fa855a (ceph): filestore: fix journal locking on trailing mode
We're already holding journal_lock due to the surrounding
op_submit_{start,finish}.
Signed-off-by: Sage Weil <sage@n...
Sage Weil
06:20 PM Revision 0897edaf (ceph): Merge branch 'testing' into rc
Conflicts:
configure.ac
Sage Weil
06:20 PM Revision cbb56208 (ceph): rbd: use MIN instead of min()
Not even sure where min() was coming from, but it seems to be missing on
i386 lucid:
g++ -DHAVE_CONFIG_H -I. -W...
Sage Weil
06:20 PM Revision 792b04ba (ceph): client: connect to export targets on cap EXPORT
Also unconditionally connect on reconnect, even when there aren't any
outstanding requests.
Signed-off-by: Sage Weil...
Sage Weil
06:03 PM Revision bde0c721 (ceph): filestore: do not autodetect BTRFS_IOC_SNAP_CREATE_ASYNC until interfac...
Li has proposed an alternative V2 ioctl that looks nicer, so wait until
that is finalized.
Signed-off-by: Sage Weil ...
Sage Weil
06:03 PM Revision 5bdae2af (ceph): ceph v0.23.2
Sage Weil
05:44 PM Revision 4592c220 (ceph): client: fix cap export handler
An EXPORT cap msg can race with a cap release; deal with that (realigning
this code with the kclient).
Signed-off-by...
Sage Weil
05:24 PM Revision 15c272e8 (ceph): man: fix monmaptool man page
I've found the manpage problem that I've noted before. It's about
monmaptool, the CLI says its usage:
[--print] [--c...
Laszlo Boszormenyi
03:17 PM Bug #611 (Resolved): OSD: OSDMap::get_cluster_inst
Sage Weil
03:17 PM Bug #612 (Resolved): OSD: Crash during auto scrub
Sage Weil
02:58 PM Linux kernel client Bug #564 (Resolved): Configuration via configfs instead of sysfs
acked by greg kh, yay Sage Weil
02:58 PM rbd Bug #391 (Can't reproduce): snap create/delete caused corruption
this is old Sage Weil
02:47 PM Bug #550 (Can't reproduce): mon: PGMonitor::update_from_paxos()
haven't been able to reproduce this. commit:62716aa7 gives us useful error messages. if/when it comes up again we'l... Sage Weil
02:28 PM Linux kernel client Bug #436 (Can't reproduce): cmon: basic_string::_S_construct NULL not valid
Sage Weil
02:28 PM Bug #460 (Can't reproduce): OSD crash: ReplicatedPG::push_to_replica / Rb_tree
Sage Weil
09:46 AM Bug #621 (Resolved): error building unstable branch, rbd.cc:837: error: no matching function for ...
should be fixed by commit:307404231ecb09fdd2f6dd6e50677e746bba4236 Sage Weil
07:08 AM Bug #621 (Resolved): error building unstable branch, rbd.cc:837: error: no matching function for ...
Building on i386 Ubuntu Lucid, it fails building rbd.
This is a build of unstable at commit bf784cdb4f605c467eb094...
John Leach
09:14 AM CephFS Bug #344 (Resolved): cfuse should pass all qa tests
Sage asked me to mark this resolved. I ran the bonnie test yesterday and it eventually crashed when the disk ran out ... Greg Farnum
02:22 AM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
I just tried the latest unstable: fe9fad7bea
osd log attached...
osd/OSD.cc: In function 'void OSD::_process_pg...
ar Fred
12:50 AM Revision 6d96104e (ceph): osd: simplify scrub sanity checks
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:50 AM Revision 76b55c8a (ceph): osd: only adjust osd scrub_pending if pg was reserved
If for some reason we enter scrub() without scrub_reserved == true, don't
adjust the osd->scrubs_pending or we'll scr...
Sage Weil
12:38 AM Revision 260840f5 (ceph): mds: fix import_reverse re-exporting of caps
Make the import_reverse() set the pin/state before it clears them by using
the helper that sets them.
Signed-off-by:...
Sage Weil
12:25 AM Revision fe9fad7b (ceph): v0.25~rc
Sage Weil
12:25 AM Revision 109e3f18 (ceph): mds: turn off mds_bal_frag until resolve vs split/merge is fixed
See #594
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:11 AM Revision f216b020 (ceph): Merge remote branch 'origin/lost' into unstable
Conflicts:
src/osd/osd_types.h
Sage Weil

11/30/2010

11:48 PM Revision 0cc8d34e (ceph): osd: refactor object_info_t constructor a bit
Create a copy constructor for object_info_t, since we often want to copy
an object_info_t and would rather not try to...
Colin Patrick McCabe
11:48 PM Revision c281e1e0 (ceph): osd: mark_all_unfound_as_lost: wake waiters
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
11:48 PM Revision d5e6cae2 (ceph): radostool: fix memleak in error path
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
11:48 PM Revision 55f7e567 (ceph): osd: mark_all_unfound_as_lost: set lost attr
In mark_all_unfound_as_lost, we need to set the lost bit in the objects'
object_info_t.
Signed-off-by: Colin McCabe ...
Colin Patrick McCabe
11:48 PM Revision 5e243f3e (ceph): osd: create lost2 test
This one verifies:
1. Client asks for an unfound object and gets put to sleep
2. Object gets declared lost
3. Client ...
Colin Patrick McCabe
11:48 PM Revision b46f847c (ceph): osd: mark_obj_as_lost: don't assume we have obj
In PG::mark_obj_as_lost, we have to mark a missing object as lost. We
should not assume that we have an old version o...
Colin Patrick McCabe
11:48 PM Revision c29fbb12 (ceph): osd: mark_all_unfound_as_lost: bugfix, refactor
mark_all_unfound_as_lost: just delete items from the rmissing set as we
find them, rather than using a multi-pass sys...
Colin Patrick McCabe
11:48 PM Revision e9ccd7eb (ceph): osd: mark_obj_as_lost: fix oloc init, eversion
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
11:48 PM Revision cee3cd51 (ceph): osd: share_pg_log: update peer_missing
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
11:48 PM Revision ad4e5f36 (ceph): osd: ReplicatedPG::do_op: error on read-from-lost
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
11:48 PM Revision b15a97c7 (ceph): test_lost: add lost1 test
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
11:47 PM Revision 136dfdeb (ceph): osd: don't mark objs as lost unless we're active
We don't have enough information to mark objects as lost until we
activate the PG. might_have_unfound isn't even buil...
Colin Patrick McCabe
11:43 PM Revision 08bd4ead (ceph): mds: fix resolve for surviving observers
Make all survivors participate in resolve stage, so that survivors can
properly determine the outcome of migrations t...
Sage Weil
11:43 PM Revision fb4734be (ceph): (re)add mechanism for marking objects as lost
In activate_map, we now mark objects that we know are unfindable as
lost. This relies on the might_have_unfound set i...
Colin Patrick McCabe
11:43 PM Revision 80f3ea10 (ceph): Add ./ceph dump pg debug degraded_pgs_exist
./ceph dump pg debug degraded_pgs_exist returns TRUE if some pgs are
degraded; false otherwise.
tests: move start_re...
Colin Patrick McCabe
11:43 PM Revision de094224 (ceph): osd: object_info_t: add lost field
We can now permanently mark objects as lost by setting the lost bit in
their object_info_t. Rev the object_info_t str...
Colin Patrick McCabe
11:43 PM Revision e555899c (ceph): osd: active replicas process logs from primaries
In _process_pg_info, if the primary sends us a PG::Log, a replica should
merge that log into its own.
mark_all_unfou...
Colin Patrick McCabe
11:43 PM Revision c0e60afe (ceph): test: dump_osd_store: sort dump output
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
09:21 PM Revision 1123b5c5 (ceph): osd, librados: misc fixes, linger related issues
Yehuda Sadeh
08:57 PM Revision bf784cdb (ceph): osd: fix object_info_t() initialization of oloc
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:56 PM Revision 91a75590 (ceph): mds: add debug output to make completions easier to track
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:48 PM Revision ba1f3cb9 (ceph): osd: fix misuses of OLOC_BLANK
Commit 6e2b594b fixed a bunch of bad get_object_context() calls, but even
with the parameter fixed some were still br...
Sage Weil
08:23 PM Revision 2ad901b3 (ceph): Revert "mds: resolve cleanup"
This reverts commit cd53719f3ce712a060e4ac80cab934c597531a5e.
We need this on surviving nodes too to resolve ambiguo...
Sage Weil
08:19 PM Revision b39f0425 (ceph): Merge branch 'testing' into unstable
Conflicts:
src/os/FileJournal.cc
Sage Weil
07:43 PM Revision 1b06332d (ceph): osd: make recovery_oids debug list per-pg
Otherwise we hit bad asserts if an object of the same name in different
pools is getting recovered simultaneously.
S...
Sage Weil
06:56 PM Revision 05ad97b6 (ceph): client: Set the DirResult buffer to NULL when deleting it.
This should fix a crash exposed by our bonnie workunit. Previously
the client would keep trying to read out of the (d...
Greg Farnum
05:22 PM Revision 559d4d20 (ceph): ceph.spec.in: include gui files
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:13 PM Revision 93601269 (ceph): debian: many many cleanups
Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu> Sage Weil
04:55 PM Revision 5eb8ef7f (ceph): filejournal: fix throttle vs FULL behavior
We don't want to add to the throttler if we aren't going to queue the
write, or else we'll never take it off again.
...
Sage Weil
04:45 PM Bug #612: OSD: Crash during auto scrub
this should be fixed by commit:76b55c8a121acd4e5e8b6f5dbb83c25926ac9f76 Sage Weil
04:32 PM Revision 132f74c5 (ceph): Merge branch 'osd_journaling' into unstable
Sage Weil
04:30 PM Revision 7af9ffdf (ceph): filestore: make sure blocked op_start's wake up in order
If they wake up out of order (which, theoretically, they could before) we
can screw up journal submitting order in wr...
Sage Weil
04:24 PM Revision fac7266d (ceph): filestore: assert op_submit_finish is called in order
Verify/assert that we aren't screwing up the submission pipeline ordering.
Namely, we want to make sure that if op_ap...
Sage Weil
04:20 PM CephFS Bug #594: mds: frag split/merge vs replay
disabled in v0.24 Sage Weil
04:01 PM Tasks #539 (Resolved): wiki: document pg expansion
Documented on:
http://ceph.newdream.net/wiki/Changing_the_number_of_PGs
Colin McCabe
03:54 PM Revision 5e391db0 (ceph): filejournal: rework journal FULL behavior and fix throttling
Keep distinct states for FULL, WAIT, and NOTFULL.
The old code was more or less correct at one point, but assumed th...
Sage Weil
03:51 PM Revision 79419c33 (ceph): filestore: refactor op_queue/journal locking
- Combine journal_lock and lock.
- Move throttling outside of the lock (this fixes potential deadlock in
parallel j...
Sage Weil
03:22 PM Revision 0df9dd6e (ceph): filestore: do not throttle op_queue in queue_op()
In parallel mode, queue_op is called while holding the journal lock, so it
is not okay to throttle there. Instead, t...
Sage Weil
12:25 PM Feature #620 (Resolved): objecter: (optionally) read from replica if on localhost and primary is not
This can either compare the ip address, or possibly have a netmask (set in g_conf) to determine 'locality' (where 255... Sage Weil
12:23 PM Feature #619 (Resolved): objecter: optionally read from replicas
Add a read flag to allow reads to come from a random replica. If a replica replies with EAGAIN, retry the request, b... Sage Weil
12:21 PM Feature #618 (Resolved): osd: allow reads from replicas
Allow osd to handle reads on a replica. If the replica is missing the object in question, reply with -EAGAIN to the ... Sage Weil
11:41 AM Bug #613 (Resolved): OSD crash: FAILED assert(recovery_oids.count(soid) == 0)
this was actually a problem with the debug sanity checks. fixed by commit:1b06332de69b332092d115451efbd29afec79269 Sage Weil
10:04 AM Tasks #617 (Resolved): cephfs needs a man page
Sage Weil
10:04 AM Tasks #616 (Rejected): radosacl needs a man page
Sage Weil
08:36 AM Bug #615 (Resolved): osd: improve op+journal throttling
Currently we block first, then take locks, then update the throttle accounting. This makes things racy, because a bu... Sage Weil
08:34 AM Bug #598 (Resolved): osd: journal reset in parallel mode acts weird
fixed as of commit:132f74c56064fdb3c47943679c48aa2a6b98f4eb, along with a ton of other related issues with the io que... Sage Weil
02:49 AM Revision 8003915b (ceph): Makefile: add bloom_filter.hpp to noinst_HEADERs
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
01:16 AM Revision 62075f34 (ceph): Makefile: Fix VPATH builds
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
12:41 AM Revision 0bcdc84a (ceph): osd: osd_types.h: const cleanup
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
12:40 AM Revision 7ee50add (ceph): osd: don't try to load a PG in a nonexistent pool
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
12:38 AM Revision 6ab17236 (ceph): filestore: simplify apply_transactions
Always use queue_transactions, even in no-journal case.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil

11/29/2010

11:52 PM Revision c9f864a0 (ceph): osd: PG::trim: fix inverted conditional in assert
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
11:12 PM Revision b2bcf4b3 (ceph): common: prevent infinite recursion on SIGSEGV
Install SIGSEGV / SIGABRT handlers with sigaction using SA_RESETHAND.
This will ensure that if the signal handler it...
Colin Patrick McCabe
10:12 PM Revision 85191813 (ceph): osd: Create pg_split test
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
09:35 PM Revision fb60e114 (ceph): logger: Fix a crash when the MDS shuts down cleanly.
We weren't holding the lock on the logger_timer before calling shutdown. Greg Farnum
09:35 PM Revision b4db4100 (ceph): Timer: add some asserts to catch certain errors.
Greg Farnum
08:56 PM Revision adbb5459 (ceph): osd: some notify simplifications and FIXMEs
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:56 PM Revision ec15c465 (ceph): osd: track unconnected_watchers and when they expire
- set up an initial expiration when we load the obc off disk
- remove expiration when we connect to an existing watch...
Sage Weil
08:55 PM Revision 376870fa (ceph): osd: add timeout to watch_info_t
Allow the watch timeout to be set on a per-watch basis. Still need to figure
out where that comes from.. the client? A...
Sage Weil
08:55 PM Revision 239c0a12 (ceph): rbd: fix version renaming
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:55 PM Revision b3051531 (ceph): osd: fix up WATCH
Separate various paths: registering new watch, reconnecting to existing
watch, removing watch, etc.
Signed-off-by: S...
Sage Weil
08:55 PM Revision 2563905b (ceph): osd: some cleanup
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:55 PM Revision b722662e (ceph): osd: use pg_t to find PG's again
The ceph_object_layout is approaching obsolete. Also, use a more general
lookup_lock_raw_pg() helper that doesn't ta...
Sage Weil
08:54 PM Revision a61f6b5e (ceph): osd: add missing Watch.cc
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:54 PM Revision 0e62c421 (ceph): osdc: spell out version
Cosmetic
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:51 PM Revision 15ffbc8d (ceph): makefile: add missing MWatchNotify.h
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:50 PM Revision 4dca64b2 (ceph): osd: drop unused fields
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:18 PM Revision 463d624d (ceph): Makefile: Add --as-needed to LDFLAGS
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
07:51 PM Revision a77eb6bd (ceph): vstart.sh: don't specify journaling mode
Let the autodetection kick in, or let the dev specify via -o '...'.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:41 PM Revision e0b927b2 (ceph): osd: PG::trim: add assert
Assert that we're not trimming the PG log past last_complete.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe
05:48 PM Revision 756918be (ceph): osd: _process_pg_info: add assert for replicas
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
05:06 PM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
Fred, can you see if this reproduces on the latest unstable? Thanks.
-C
Colin McCabe
11:14 AM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
I added the PG::trim assert. It seems to cause problems immediately with test_unfound.sh
The plot thickens...
Colin McCabe
10:36 AM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
Argh yeah I was all wrong here. The recovery code looks ok.. I think the problem is that _before_ this the log was t... Sage Weil
09:21 AM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
> The replicas only ever get messages from the primary, and the primary
> sends a log to activate. Never anything e...
Colin McCabe
04:51 PM Bug #614: SEGV loop on _open_lock_pg after rmpool
Er, by that I mean:
load_pgs shouldn't try to load a PG that is in a nonexistent pool. This could only happen aft...
Colin McCabe
04:49 PM Bug #614 (Resolved): SEGV loop on _open_lock_pg after rmpool
In OSD::load_pgs, we weren't checking to make sure that the pool existed when going through all the collections.
F...
Colin McCabe
02:23 PM Bug #614 (Resolved): SEGV loop on _open_lock_pg after rmpool
discovered my cosd processes at 100%, possibly following some "rados rmpool" commands to delete some pools. Stopped ... John Leach
04:41 PM Bug #598: osd: journal reset in parallel mode acts weird
bunch of problems here, not all related to a full journal. Sage Weil
12:18 PM Feature #568 (Resolved): debian: build with --as-needed?
Implemented!
before:
cmccabe@flab:~/src/ceph2/src$ ldd .libs/rados
linux-vdso.so.1 => (0x00007fff4eff...
Colin McCabe
11:13 AM Bug #575 (Resolved): monmaptool terminates when input file is not a monmap
Samuel Just
10:49 AM Bug #479 (Can't reproduce): ceph/mount crash badly when writing
Sage Weil
10:15 AM CephFS Subtask #547 (Resolved): mds: define fsck strategy, required metadata
Sage Weil
10:13 AM CephFS Bug #594: mds: frag split/merge vs replay
needs to be fixed in 0.24, or g_conf.mds_frag needs to be disabled. Sage Weil
10:06 AM Bug #595 (Won't Fix): Autogen: not a literal
seems to go away with latest automake Sage Weil
07:12 AM Bug #613 (Resolved): OSD crash: FAILED assert(recovery_oids.count(soid) == 0)
I'm running a script that reads and writes random objects using librados (creating a new pool once in a while). Runn... John Leach

11/25/2010

07:36 AM Revision 3ab60091 (ceph): osd: dump_missing: also dump missing_loc
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
07:35 AM Revision da087e47 (ceph): osd: discover_all_missing fix
Don't request information from an OSD unless it is up and part of the
might_have_unfound set. Add more logging.
Sign...
Colin Patrick McCabe
12:18 AM Bug #611: OSD: OSDMap::get_cluster_inst
commit:da087e47c21190f9cbde4d24182b7dfe581cd069 should resolve this Colin McCabe

11/24/2010

10:54 PM Bug #611: OSD: OSDMap::get_cluster_inst
I'll take a look Colin McCabe
10:18 PM Bug #611: OSD: OSDMap::get_cluster_inst
Okay, I somehow commented/set this bug backwards with another one. Whoops, sorry guys!
This looks like the OSD is as...
Greg Farnum
10:38 AM Bug #611: OSD: OSDMap::get_cluster_inst
Sam said he'd look at this since it's in the background scrubbing bits that he and Josh did. Greg Farnum
05:11 AM Bug #611 (Resolved): OSD: OSDMap::get_cluster_inst
After upgrading to the latest unstable, one OSD crashed. Before the upgrade, 10 of the 12 OSDs were online.
When ...
Wido den Hollander
10:18 PM Bug #612: OSD: Crash during auto scrub
Dunno how, but somehow commented/assigned this and another bug backwards. Meant to say:
Sam said he'd look at this s...
Greg Farnum
10:38 AM Bug #612: OSD: Crash during auto scrub
This looks like the OSD is assembling a list of missing queries and then sending them out without bothering to check ... Greg Farnum
05:28 AM Bug #612 (Resolved): OSD: Crash during auto scrub
After I saw #611 my cluster started to crash. One after the other, the OSDs started to go down, all with a message a... Wido den Hollander
10:09 PM Feature #453 (Resolved): osd: return error (instead of blocking) on lost objects
It's passing the lost1 and lost2 unit tests now. Colin McCabe
09:41 PM rgw Bug #353: Handle non-ascii filenames
Yeah, I agree with Amazon's approach here. UTF-8 makes sense. I think we could continue to use std::string internally... Colin McCabe
02:03 AM Revision d6e8e8d1 (ceph): gui: some cleanup
Rather than vectors of pointers, use vectors of NodeInfo structures.
This avoids the problem of freeing the NodeInfo ...
Colin Patrick McCabe
12:56 AM Revision 1b1e040e (ceph): osd: add a map for lingering messages
Yehuda Sadeh
12:55 AM Revision 99e1e4de (ceph): librados: assert_version on sync operations
Yehuda Sadeh
12:55 AM Revision c4b97953 (ceph): librados: last_objver is set on the pool, and not per thread
Yehuda Sadeh
12:55 AM Revision 454ea06e (ceph): rbd: notify about header changes
Yehuda Sadeh
12:55 AM Revision 520b523b (ceph): librados: fix unnecessary locking
Yehuda Sadeh
12:55 AM Revision 4c8bdc53 (ceph): osd: don't notify notifier
Yehuda Sadeh
12:54 AM Revision a76de3b2 (ceph): librados: complete C interface for watch/notify
Yehuda Sadeh
12:54 AM Revision 38c8e383 (ceph): librados: rename cookie to handle in api
Yehuda Sadeh
12:54 AM Revision 2954799a (ceph): librados: notify waits for completion
Yehuda Sadeh
12:50 AM Revision e7184e6d (ceph): librados: start implementing watch/notify
Yehuda Sadeh
12:50 AM Revision a4864bd8 (ceph): librados: enable object versioning
Yehuda Sadeh
12:50 AM Revision f36677f8 (ceph): librados: update C api
Yehuda Sadeh
12:49 AM Revision f8af4f2c (ceph): osd: add watch/notify timeout
Yehuda Sadeh
12:49 AM Revision cc62f2eb (ceph): osd: fix bad mutex lock
Yehuda Sadeh
12:49 AM Revision e0c548ad (ceph): osd: fix ms_handle_reset
Yehuda Sadeh
12:49 AM Revision d5cc6732 (ceph): osd: some notify related cleanups
Yehuda Sadeh
12:49 AM Revision 7272bfec (ceph): osd: send notify response from reset handler if needed
Yehuda Sadeh
12:49 AM Revision d66b52e1 (ceph): osd: watch infrastructure
third attempt Yehuda Sadeh
12:49 AM Revision 2b5e61ca (ceph): osd: send notification id
Yehuda Sadeh
12:49 AM Revision 59e61d0e (ceph): osd: discard of disconnected watchers
still need to add a timeout Yehuda Sadeh
12:49 AM Revision f5f33822 (ceph): osd: send notify reply if there are no watchers
Yehuda Sadeh
12:49 AM Revision 9437ea84 (ceph): osd: add user_version field in object_info_t
Yehuda Sadeh
12:49 AM Revision 7bda45a1 (ceph): osd: reply with either user_version or at_version, depends on the op
Yehuda Sadeh
12:49 AM Revision f7b7d67a (ceph): osd: check requested watch version number
send appropriate status code if needed Yehuda Sadeh
12:47 AM Revision 2bce34e7 (ceph): osd: handle watch op, register client on object xattr
Yehuda Sadeh
12:47 AM Revision 3110e361 (ceph): osd: basic watch/notify handling
Yehuda Sadeh
12:47 AM Revision e493c7ae (ceph): osd: handle notify-ack
Yehuda Sadeh

11/23/2010

11:39 PM Revision 2f13dd8e (ceph): gui: more reindenting
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
11:37 PM Revision 66a78c23 (ceph): gui: reindent a bunch of code
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
10:40 PM Revision d8652de6 (ceph): mdcache: in trim_non_auth, only print out path if it has a parent dentry.
This should only occur with the root inode, but caused a segfault for
anybody running more than one MDS who restarted...
Greg Farnum
10:04 PM Revision 8768b52d (ceph): mds: Reply checking_lock while reading filelock
Use checking_lock to replace lock_state in the extra buffer list so the client can get the correct file lock reply. Herb Shiu
09:59 PM Revision 4041bf0d (ceph): mds: fix set_state_rejoin auth_pin check
We carry an auth pin IFF !stable AND auth.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
09:59 PM Revision 5ed06ffc (ceph): client: remove inode from flush_caps list when auth_cap changes
Avoid confusing other code (e.g. kick_flushing_caps) by staying on the mds
flushing_caps list when we don't even have...
Sage Weil
09:52 PM Revision 285cc946 (ceph): osd: fix is_all_uptodate()
This should only return true when recovery is done, i.e., no more missing
objects. Nothing to do with unfound.
Sign...
Sage Weil
09:52 PM Revision 36f703e1 (ceph): osd: removing unused variable, fix warning
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:52 PM Revision 413ecb0b (ceph): osd: only search_for_missing if there are unfound objects
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:52 PM Revision 671b1c09 (ceph): osd: add get_num_unfound() helper
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:52 PM Revision 7ea7a435 (ceph): osd: only discover_all_missing if unfound
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:52 PM Revision 5452dae6 (ceph): osd: recover_primary() until primary has all found objects
The logic in that if was effectively reversed.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
09:52 PM Revision 5498c467 (ceph): osd: fix recover_replicas() unfound check
missing_loc.count(soid) == 0 only means unfound if it's not missing on the
primary.
Signed-off-by: Sage Weil <sage@n...
Sage Weil
09:52 PM Revision e97eae15 (ceph): init-ceph: tolerate failure in cleanalllogs
Otherwise /var/log/ceph/stat makes rm -f error out and we fail.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
09:52 PM Revision 84612286 (ceph): Build might_have_unfound set at activation
The might_have_unfound set is used by the primary OSD during recovery.
This set tracks the OSDs which might have unfo...
Colin Patrick McCabe
09:52 PM Revision 0e15da8d (ceph): Rename peer_summary_requested to peer_backlog_req
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
09:52 PM Revision c0c301d5 (ceph): osd: PG::read_log: don't be clever with lost xattr
Formerly, we had a special case in read_log for dealing with objects
whose objects were present on the disk, but not ...
Colin Patrick McCabe
09:52 PM Revision 55570baf (ceph): osd: fix PG::is_all_uptodate
In PG::is_all_uptodate, don't try to look for peer_missing[osd->whoami].
The primary keeps that in PG::missing!
Sign...
Colin Patrick McCabe
08:26 PM Revision 36c6569c (ceph): monmaptool: Return a non-zero error code and print a useful error
message if unable to read the monmap file.
Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
Samuel Just
06:14 PM Feature #610 (Resolved): gui: make PG view prettier
The ceph -g GUI should display PGs in a list, rather than as icons that have to be clicked on. We should get rid of t... Colin McCabe
06:13 PM Bug #604 (Resolved): Compiler warning: 'status' may be used uninitialized in this function
Fixed by commit:d6e8e8d15d22b51ec86bc5687336c3d50d9b3a5d
We should change PG view on the GUI to be a list view at ...
Colin McCabe
05:43 PM Revision fc212548 (ceph): mds: allow for old fs's with stray instead of stray0
New fs's get stray0, but we want to still behave with old ones.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:37 PM Revision de61991a (ceph): Merge branch 'testing' into unstable
Conflicts:
configure.ac
Sage Weil
03:00 PM Bug #531: Journaling Causes System Hang
Awesome, thanks for the help. I will give these patches a shot towards the end of the week.
Thanks
Bryan Tong
02:43 PM Bug #599 (Resolved): recover_master_log, doesn't
There were two problems here:
1) we were restarting the osds before the monitors, which in this case prevented a f...
Colin McCabe
02:01 PM Linux kernel client Bug #552: Samba with kernel oplocks=on produces lots of corrupt mds entries in dmesg
Our friends at Tcloud just submitted patches for this today, which I've applied to the unstable branch of our kernel ... Greg Farnum
11:46 AM CephFS Feature #593 (Rejected): mds: fsck: anchor table repair
dup Sage Weil
11:42 AM Feature #609 (Resolved): osd: query pool/pg for objects with given xattr
This will probably take the form of a pool class plugin?
It could start as just a hack, for now.
Sage Weil
11:03 AM Bug #595: Autogen: not a literal
This problem does not seem to occur using 2.68 on my local machine. Slider et al. seem to be using 2.67. Samuel Just
09:39 AM CephFS Bug #608 (Resolved): mds: MDCache::create_system_inode()
this should be fixed by commit:fc212548aea1d7f001b56ba096a79ba54b8a92c3
Thanks!
Sage Weil
07:09 AM CephFS Bug #608 (Resolved): mds: MDCache::create_system_inode()
On a small test cluster I saw that my MDS was not coming up after a fresh mkcephfs, this is what the log showed:
<...
Wido den Hollander
09:33 AM Tasks #584: do throughput scaling tests on sepia
What was the variance in per-node throughput? Did we have one node dominating? Greg Farnum
09:22 AM Tasks #584 (In Progress): do throughput scaling tests on sepia
There's definitely a problem here; the total throughput should be scaling more or less linearly until we hit a bottle... Sage Weil
07:44 AM Bug #563: osd: btrfs, warning at inode.c ( btrfs_orphan_commit_root )
I'll have to rebuild, since I didn't look at the messages that closely. Wido den Hollander
07:02 AM Revision 868665d5 (ceph): v0.23.1
Sage Weil
06:41 AM Revision c327c6a2 (ceph): mon: always use send_reply for auth replies
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:41 AM Revision 61dd4f03 (ceph): mon: simplify send_reply code
No need to specify destination in send_reply, as we always have the request
for reference.
Simplify MRoute construct...
Sage Weil
01:37 AM Revision 2c71bd33 (ceph): osd: add assert to _process_pg_info
When activating an inactive replica, assert that we are doing so based
on a message from the primary.
Signed-off-by:...
Colin Patrick McCabe
01:35 AM Revision a70943fd (ceph): osd: re-indent some code in _process_pg_info
Re-indent the code and add a comment.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe
12:12 AM Revision 71369541 (ceph): msgr: tolerate 0 bytes from tcp_read_nonblocking
This can happen, I believe, when we get a signal or something.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:12 AM Revision 7ec0034b (ceph): init-ceph: fix (and test!) cleanlogs and cleanalllogs
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:03 AM Revision 7b4a801f (ceph): mds: fix rejoin_scour_survivor_replicas inode check
We want to remove replicas that we don't ack, but those don't appear in
the strong_inode map; they're appended to the...
Sage Weil

11/22/2010

11:08 PM Revision 8d95b5b6 (ceph): messenger: init rc to -1, removing compiler warning.
This actually is initialized before all uses, but compilers tend to
have trouble with assignment in if-else branches,...
Greg Farnum
11:08 PM Revision dd11fe27 (ceph): types: Allow inodeno_t structs to alias.
This removes a compiler warning that appeared in a gcc upgrade and
is apparently erroneous, about its usage violating...
Greg Farnum
10:56 PM Bug #540 (Resolved): CephxClientHandler::handle_response
couldn't reproduce this, but fixed two smallish things that may have been responsible for this:
commit:61dd4f03e6e15...
Sage Weil
10:35 PM Linux kernel client Bug #552: Samba with kernel oplocks=on produces lots of corrupt mds entries in dmesg
From the reply dump, it looks like a ceph_mds_reply_head, a length 0 tracebl, a length 1 extrabl (containing a u8 == ... Sage Weil
09:25 PM Revision ac6b018a (ceph): Causes the MDSes to switch among a set of stray directories when
switching to a new journal segment.
MDSCache:
The stray member has been replaced with strays, an array of inodes
r...
Samuel Just
09:16 PM Revision 3f8f5905 (ceph): Timer must be initialized in Client::init and shutdown in
Client::shutdown.
Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
Samuel Just
06:47 PM Revision 8eb4de9e (ceph): generate_past_intervals:generate back to lastclean
PG::generate_past_intervals needs to generate all the intervals back to
history.last_epoch_clean, rather than just to...
Colin Patrick McCabe
06:07 PM Revision 80f28235 (ceph): vstart.sh: 'init-ceph stop' instead of 'stop.sh'
This just makes it easier to run multiple vstart sessions as the same user
on the same host.
Signed-off-by: Sage Wei...
Sage Weil
05:55 PM Revision 53d0650a (ceph): Merge branch 'osd_msgr' into unstable
Sage Weil
05:55 PM Revision cd53719f (ceph): mds: resolve cleanup
Only track ambiguous imports and such if we get a resolve message while in
the resolve state.
Signed-off-by: Sage We...
Sage Weil
05:55 PM Revision c0c81d53 (ceph): mds: trim exported subtree _after_ adjusting auth
We need to set the subtree bounds before trimming it away, or else we may
throw out things we're still auth for.
Sig...
Sage Weil
05:55 PM Revision 9e15ade8 (ceph): mds: do not eval subtree root when replay|resolve
This is nonsensical. And can lead to scatter_writebehind, which breaks
horribly.
Signed-off-by: Sage Weil <sage@new...
Sage Weil
05:55 PM Revision 27c6f217 (ceph): mds: remove bogus assert
Causes problems during resolve finish.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:49 PM Revision 924b1fcb (ceph): osd: bind to new cluster address when wrongly marked down
If we come back up on the same address, there is a possible race. Other
nodes will mark_down when they see us go dow...
Sage Weil
05:45 PM Revision 19409763 (ceph): msgr: implement rebind() to pick a new port
Closes out all old connections and binds to a _different_ port. This
ensures that someone doing mark_down on our old...
Sage Weil
05:09 PM Revision f7170f95 (ceph): client: only encode_cap_releases once per request.
Accomplish this by making a list of cap releases in the (permanent)
MetaRequest, and then copying that into the (pote...
Greg Farnum
04:36 PM Bug #607 (Rejected): osd: ReplicatedPG: sub_op_modify: fix creation of ObjectState
There's a part of the ReplicatedPG::sub_op_modify code that goes like this:
> // do op
> ObjectStat...
Colin McCabe
04:29 PM CephFS Feature #91: mds: up:shadow mode
Updated Journaler to make new interface options asynchronous.
Presently working on how to disambiguate between a one...
Greg Farnum
03:48 PM Tasks #584 (Resolved): do throughput scaling tests on sepia
Results of running rados -p bench bench 20 write on <Nodes>. <Average Throughput> is the average of the Bandwidth st... Samuel Just
01:24 PM CephFS Feature #88 (Resolved): mds: change stray commit strategy to avoid rolling stray dir commits
commit:ac6b018acbeaf8670f8c268db164cfb8a12c171d Sage Weil
12:59 PM Bug #563: osd: btrfs, warning at inode.c ( btrfs_orphan_commit_root )
Is the stack trace you're getting now identical, or different? The FileStore.cc change _should_ have avoided the asy... Sage Weil
09:28 AM Bug #563: osd: btrfs, warning at inode.c ( btrfs_orphan_commit_root )
Just to update the issue, Sage asked me to change something in FileStore.cc, tried that for some days, but that didn'... Wido den Hollander
12:47 PM CephFS Feature #606 (Duplicate): mds: optionally store parent attr on file objects
The goal is to be able to find files contained in rebuilt directories (#603). We can store the same attrs we do for ... Sage Weil
12:45 PM CephFS Feature #605 (Rejected): mds: verify/repair anchor table
- Make sure every item we encounter while traversing the that is anchored correctly appears in the anchor table.
- M...
Sage Weil
12:44 PM Bug #604 (Resolved): Compiler warning: 'status' may be used uninitialized in this function
In gui.cc
The warning's location references are a bit off, but the function gen_node_info_from_icons declares a "sta...
Greg Farnum
12:43 PM CephFS Feature #603 (Resolved): mds: repair directory hierarchy
The goals are
- rebuild missing/corrupt directories
- repair multiple primary links to directories
We'll do so...
Sage Weil
12:40 PM CephFS Feature #602 (Resolved): mds: handle corrupt/missing journals
This probably means
- shutting down current instances, resetting cluster membership
- throwing out journals (or m...
Sage Weil
12:37 PM CephFS Feature #601 (New): mds: order directory commits after rename
When we rename something between directories, we should try to commit the target directory _before_ the source direct... Sage Weil
12:34 PM CephFS Feature #600 (Resolved): mds: store full trace on directories
Currently we only store the immediate parent; store a full trace up to the root. This is CInode::encode_parent_mutat... Sage Weil
12:17 PM Bug #599: recover_master_log, doesn't
Also, I have verified that osd3 and osd9 did NOT crash. They're still running, and they did receive the messages from... Colin McCabe
12:13 PM Bug #599 (Resolved): recover_master_log, doesn't
This is another peering bug. We found it on wido's cluster. Basically, peering never completes.
I just examined PG...
Colin McCabe
09:52 AM Bug #592 (Resolved): osd: rebind cluster_messenger when wrongly marked down
commit:53d0650a42cbfd2f02db2c708a570b6d9e116bb4 Sage Weil
09:14 AM CephFS Bug #596 (Resolved): crash during mds reconnect
Well, that seems to fix it. I added a releases vector to the MetaRequest so it will only encode the releases once, and... Greg Farnum
08:49 AM Bug #598 (Resolved): osd: journal reset in parallel mode acts weird
from ML:... Sage Weil
04:52 AM Revision 51abcaa2 (ceph): mon: clean up cluster_addr code a bit, better debug output
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
04:52 AM Revision 20313644 (ceph): osdmap: fix cluster_addr encoding; printing
The cluster addrs were getting lost because we were checking v instead of
ev.
Signed-off-by: Sage Weil <sage@newdrea...
Sage Weil
04:52 AM Revision 28498a00 (ceph): osd: send correct ip addrs to monitor for cluster_, hb_addr
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
03:59 AM Revision ec434eda (ceph): osd: unconditionally set up separate msgr instance for osd<->osd msgs
Always set up cluster_messenger (before we would only do so if there was
an explicit address configured for it). The...
Sage Weil
12:16 AM Revision 0dddf453 (ceph): filestore: only warn about disk write cache on kernels <2.6.33
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:15 AM Revision 0856f57e (ceph): osd: fix search_for_missing: old last_update implies object not present
For example, if an osd sends an empty PG::Info (last_update = 0'0) and
empty missing, we should not conclude that the...
Sage Weil
12:09 AM Revision 6ef5c2f3 (ceph): init-ceph: fix cleanlogs for no log_sym_dir case
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil

11/21/2010

07:55 PM Linux kernel client Bug #549 (Resolved): bonnie++ file stat failure
commit:3105c19c450ac7c18ab28c19d364b588767261b3 Sage Weil
03:50 PM Bug #592: osd: rebind cluster_messenger when wrongly marked down
I think the cleanest solution here is to re-bind the cluster_messenger to a new port when we are marked down and go b... Sage Weil
03:38 PM Linux kernel client Bug #597 (Closed): Reproducible crash mounting multiple directories from a pool
This bug was fixed in v2.6.36, commit:ca04d9c3ec721e474f00992efc1b1afb625507f5. Thanks for the report though! :) Sage Weil
03:34 PM Linux kernel client Bug #597: Reproducible crash mounting multiple directories from a pool
Should have mentioned - this is with the Ubuntu 10.10 desktop kernel, which is 2.6.35-22, I think. Ravi Pinjala
03:33 PM Linux kernel client Bug #597 (Closed): Reproducible crash mounting multiple directories from a pool
When trying to mount a pool multiple times (with different subdirectories) I get a consistent system hang.
Steps t...
Ravi Pinjala

11/20/2010

05:06 PM Bug #531: Journaling Causes System Hang
Please try out the patches in the filestore_throttle branch, commit:b28c0bf82ac28ded4fe85573d32fdc111c66e50b
It lo...
Sage Weil
03:15 AM Revision fc9b0976 (ceph): OSDMap: const cleanup
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
03:14 AM Revision 2a5c3893 (ceph): mds-dumper: Define Dumper::~Dumper()
To fix compile error.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe

11/19/2010

10:21 PM Revision 8566c5cd (ceph): ReplicatedPG::pull: fix test for unfound
The test for unfound objects was reversed, leading us to try to pull
unfound objects and refrain from pulling objects...
Colin Patrick McCabe
09:41 PM Revision 2f5502fa (ceph): osdmap: fix printing, again
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:21 PM CephFS Bug #596: crash during mds reconnect
The encode_cap_releases can only be called _once_, the very first time we send the request. So at some level this is... Sage Weil
04:22 PM CephFS Bug #596 (Resolved): crash during mds reconnect
While testing my Journaler changes, I got a cfuse segfault. My steps:
vstart with 1 of each daemon
mount cfuse
cop...
Greg Farnum
06:17 PM Revision 4303820b (ceph): Merge remote branch 'origin/mds' into unstable
Sage Weil
04:26 PM CephFS Feature #91 (In Progress): mds: up:shadow mode
I've been getting some proper time in on this on and off over the last few days. Pushed the Journaler changes to the ... Greg Farnum
03:52 PM Bug #531: Journaling Causes System Hang
Okay,
More updates.
1) All the VMs deployed okay but it looks like towards the end of the deployments I hit the...
Bryan Tong
02:49 PM Bug #531: Journaling Causes System Hang
Okay,
I just started the deployment of 12 vms on a new cephfs with 3 osds in and ssd's for journals on all the sys...
Bryan Tong
02:37 PM Bug #531: Journaling Causes System Hang
I am working on getting the output now. We are having to work on several projects at once right now. Sorry for the de... Bryan Tong
03:36 PM Bug #595 (Won't Fix): Autogen: not a literal
We get this running on autoconf 2.67:
configure.ac:6: warning: AC_INIT: not a literal: Sage Weil <sage@newdream.net>...
Greg Farnum
02:29 PM CephFS Bug #594 (Resolved): mds: frag split/merge vs replay
Need to reconcile refragmenting with resolve stage. Currently handle_resolve assumes frags match, when in reality th... Sage Weil
12:11 PM Bug #585 (Resolved): OSD: ReplicatedPG::pull
Fixed by commit:82f1de8c0d6e7817ca7d6dd710e3176b2a549e12 Colin McCabe
10:43 AM Bug #585 (In Progress): OSD: ReplicatedPG::pull
need to see what's going on with this Colin McCabe
11:47 AM Bug #503 (Closed): osd: query osds since last_epoch_clean before concluding objects lost?
Sage Weil
11:39 AM Bug #515 (Can't reproduce): osd: recovery isn't completing
With the recent changes I'm closing this one out, and will reopen with specifics if it comes up in testing over the nex... Sage Weil
10:14 AM CephFS Feature #545 (Resolved): mds: use bloom filter to supplement dirfrag COMPLETE flag
merged commit:4303820b43721a8b46ef36d0e9ef4e1167857c80 Sage Weil
09:38 AM CephFS Feature #593 (Rejected): mds: fsck: anchor table repair
We need to be able to fix up the anchor table when there are problems, to avoid e.g.... Sage Weil
05:13 AM Revision b91e14e1 (ceph): multi-dump.sh: add diff mode
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
04:57 AM Revision 9cab522e (ceph): Add multi-dump.sh
This is a debug tool that can dump out Ceph information at various
epochs. For instance, it can show how the OSDmap c...
Colin Patrick McCabe

11/18/2010

11:05 PM Revision 6e2b594b (ceph): ReplicatedPG::get_object_context: fix broken calls
ReplicatedPG::get_object_context takes three parameters. The last two
are "const object_locator_t& oloc" and "bool c...
Colin Patrick McCabe
09:50 PM Bug #592: osd: rebind cluster_messenger when wrongly marked down
Ah. Looks like you got it figured out.
I wasn't aware of what mark_down did.
Just in case anyone finds it useful...
Colin McCabe
09:22 PM Bug #592: osd: rebind cluster_messenger when wrongly marked down
ok, this is a problem with how the osd is interacting with the messenger. looking at the history of 0.5, we see
<pr...
Sage Weil
08:42 PM Bug #592: osd: rebind cluster_messenger when wrongly marked down
i suspect 0.5 didn't get set up on osd1 or 2 before osd0 went down? do you have the full logs for the other instances? Sage Weil
05:07 PM Bug #592: osd: rebind cluster_messenger when wrongly marked down
I should also add that Greg Farnum helped me examine the logs for this bug. Colin McCabe
05:03 PM Bug #592 (Resolved): osd: rebind cluster_messenger when wrongly marked down
This happened with commit:323565343071ce695f7d454ed29590688de64d5d on flab.ceph.dreamhost.com
While running test_u...
Colin McCabe
08:50 PM Revision 43e0b267 (ceph): ReplicatedPG: call finish_recovery when needed
Don't loop in ReplicatedPG::start_recovery_ops. There is already a loop
in both recover_replicas and recover_primary ...
Colin Patrick McCabe
08:33 PM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
Colin McCabe wrote:
> Another potential issue that I can see here is that the code in OSD::_process_pg_info doesn't ...
Sage Weil
12:43 PM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
Another potential issue that I can see here is that the code in OSD::_process_pg_info doesn't check whether it got a ... Colin McCabe
09:26 AM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
Need to look at this more closely. Fred, pretty sure no data is lost here, but the recovery code needs some fixing.
...
Sage Weil
06:19 AM Bug #590 (Resolved): osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
After upgrading to ceph 0.23, the cluster (3 osd, 3 mon, 3 non-clustered mds) worked for about 2 hours and then one c... ar Fred
06:09 PM Revision ea5d1d66 (ceph): osd_resurrection_1_impl: turn on recovery at end
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
09:47 AM Feature #526 (Resolved): osd: unfound objects rework
We now let the PG become active even when there are unfound objects. When the user tries to read one of those objects... Colin McCabe
07:39 AM Linux kernel client Feature #591 (Resolved): implement FALLOC_FL_PUNCH_HOLE
Sage Weil
12:52 AM Revision 4adfdee7 (ceph): Makefile: fix builddir weirdness
Signed-off-by: Jim Schutt <jaschut@sandia.gov> Jim Schutt
12:10 AM Bug #585: OSD: ReplicatedPG::pull
Well, it did show up again:... Wido den Hollander

11/17/2010

10:37 PM Revision 7e9812b4 (ceph): osd: rev PG::Info encoding for last_epoch_clean change
This was missed by 184fbf582b27c10b47101735a4495fe8c73ad186, so any fs
created between now and then won't decode prop...
Sage Weil
09:06 PM Revision c17e7da4 (ceph): Merge branch 'mds_frags' into unstable
Sage Weil
09:06 PM Revision 7f6a2561 (ceph): mds: clear PIN_SUBTREE on split/merge in purge_strays
This makes the helper work for merge as well as split. Remove the special
fixups in the caller that were making spli...
Sage Weil
09:06 PM Revision 66d43ac8 (ceph): mds: fix subtree map update on dirfrag merge
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:06 PM Revision b705be11 (ceph): mds: wrlock scatterlocks to prevent a gather racing with split/merge lo...
We have the dirs split in our cache for some time while journaling it to
disk, before the fragment_notify goes out. ...
Sage Weil
09:06 PM Revision f6823a79 (ceph): mds: adjust dir_auth_pins on steal_dentry
dir_auth_pins is a counter of dentry auth_pins in the current dir; those
need to be added in when stealing.
Signed-o...
Sage Weil
09:06 PM Revision cd5ee006 (ceph): mds: initialize PIN_SUBTREE on split
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:06 PM Revision d538817f (ceph): mds: flush log on fragment
This makes request lock auth_pins expire, so the fragment moves along.
Otherwise we can end up waiting for the log fl...
Sage Weil
09:06 PM Revision 3777ff8a (ceph): mds: move dirty rstat inodes to new dir on refragment
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:06 PM Revision 669b5544 (ceph): mds: don't complete freeze while parent inode is frozen
This makes maybe_finish_freeze() conditions match that of is_freezeable()
and avoids an assert.
Signed-off-by: Sage ...
Sage Weil
09:04 PM Revision b58b8d09 (ceph): mds: fix discover requests, tracking wrt fragments
Track discover requests by tid. The old system of tracking outstanding
discovers was kludgey and somewhat broken. A...
Sage Weil
09:02 PM Revision a63c06c8 (ceph): mds: fix EFragment replay
If the inode already exists in our cache, adjust our (existing) fragments.
But it might not. In that case, we just r...
Sage Weil
09:02 PM Revision a961049b (ceph): mds: don't fragment mdsdir or .ceph
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:48 PM Revision b54880e0 (ceph): Detect broken system linux/fiemap.h
RedHat 5.5 has a /usr/include/linux/fiemap.h, but it is
broken because it does not itself include linux/types.h.
As a...
Jim Schutt
06:24 PM Revision 29a9e668 (ceph): osdmap: don't include blacklist info in summary
It's confusing users and isn't that important.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:58 PM Revision c43455ce (ceph): client: Remove the I_COMPLETE flag from the parent directory in relink_...
This papers over issues arising from the client's lack of proper support
for hard links, and lets it pass the snaptes...
Greg Farnum
02:35 PM Bug #589 (Resolved): OSD: crash on startup, PG::read_state
Ok, this is fixed by commit:7e9812b4a9bbf320a8b0bd0abec48c1c5d78fe66. Assuming your fs is old enough you should be o... Sage Weil
11:38 AM Bug #589 (Resolved): OSD: crash on startup, PG::read_state
After upgrading to today's unstable, all my OSDs crashed directly after startup, for example osd0:
Last loglines a...
Wido den Hollander
12:56 PM Bug #531: Journaling Causes System Hang
Just pinging you on this one. If you can send the logs I'd like to sort this out. Thanks! Sage Weil
09:59 AM CephFS Bug #344: cfuse should pass all qa tests
At this point the only test it's failing is bonnie. This one tends to fail on a SEGV that just keeps going through th... Greg Farnum
09:57 AM CephFS Bug #583 (Resolved): cfuse fails snaptest-upchildrealms
Okay, a proper fix for this is going to require a bit of work, since right now Inodes can only have one parent dentry... Greg Farnum
09:52 AM CephFS Cleanup #588 (Resolved): Allow Inodes to have multiple parent Dentries
Right now, cached Inodes can only have one parent Dentry. This is unfortunate when there are multiple hard links to a... Greg Farnum
09:40 AM Tasks #587 (Rejected): install mpich2 on sepia*
this will make management and testing easier Sage Weil
07:52 AM Bug #585 (Closed): OSD: ReplicatedPG::pull
This one should also be fixed in the latest unstable. Probably. The recovery code is still being worked on a bit, b... Sage Weil
02:55 AM Bug #585 (Resolved): OSD: ReplicatedPG::pull
On two OSDs (osd5 and osd10) I'm seeing the same crash; it occurs almost directly after starting them.
I cranked ...
Wido den Hollander
07:19 AM Bug #586 (Resolved): OSD: Crash during scheduled scrub
This was fixed in the commit right after what you were running, commit:556ba7397c352f5a6cb7fe03087c6e2f51dbce32 Sage Weil
05:31 AM Bug #586 (Resolved): OSD: Crash during scheduled scrub
After I reported #585 I didn't pay much attention to my cluster, until I found out that I had only one OSD left onlin... Wido den Hollander
12:09 AM Revision d57181d3 (ceph): config: added max_mds
MDSMonitor: create_new_fs adapted to use the max_mds parameter
max_mds is now a configurable value and create_new_fs...
Samuel Just

11/16/2010

09:00 PM Tasks #584 (Rejected): do throughput scaling tests on sepia
Use rados bench on N nodes, scaling N, and see how the throughput scales. Sage Weil
08:09 PM Revision c4931265 (ceph): mds: make dirfrag thrashing join and split
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:09 PM Revision d1dcc035 (ceph): mds: allow frag merge on subtree root
Fix purge_stolen and adjust_dir_fragments.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:08 PM Revision 8f24919d (ceph): mds: add timestamp to LogEvents
This just gives us a bit of useful info when debugging problems.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:32 PM Revision 56b9e927 (ceph): osd: fix trailing + in pg state string rendering
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:21 PM CephFS Bug #583: cfuse fails snaptest-upchildrealms
Looks like the problem is caused by linking b/bar to b/foo. The server response to goes through insert_dentry_inode v... Greg Farnum
06:17 PM CephFS Bug #583 (Resolved): cfuse fails snaptest-upchildrealms
Fails to rm a/b, ENOTEMPTY. Greg Farnum
06:11 PM Feature #582 (Closed): Make max_mds configurable
Samuel Just
03:06 PM Feature #582 (Closed): Make max_mds configurable
Right now the only way to set it is with the set_max_mds mon command. Add it to the config stuff and have create_new_... Greg Farnum
06:10 PM Revision 2c9873f0 (ceph): Merge remote branch 'origin/unfound' into unstable
Sage Weil
06:06 PM Revision d17f7444 (ceph): mds: be less noisy about cap imports
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:01 PM Revision 05bd6b07 (ceph): Merge branch 'mds_dir_hash' into unstable
Sage Weil
06:01 PM Revision e146767e (ceph): mds: make dentry hash a dir layout property
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:01 PM Revision cc709df8 (ceph): mds: add DIRLAYOUTHASH feature bit
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:01 PM Revision be29e4c3 (ceph): mds: set mode before all the file type dependent inode initialization!
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:01 PM Revision 33580460 (ceph): mds: set dir hash on root inode
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:01 PM Revision 77c05fbc (ceph): mds/client: pass dir hash over the wire
Add a feature bit DIRLAYOUTHASH.
Also fix client request routing for lookups (we were only hashing when
a Dentry poi...
Sage Weil
05:13 PM Bug #479: ceph/mount crash badly when writing
Sorry Sage and Yehuda for the late update..
I was spending time experimenting, and just using the default btrfs with...
DongJin Lee
01:48 PM Bug #538: Write performance does not scale over multiple computers
Did you update your installed version of the rados tool as Sage said? If you did and are still getting poor performan... Greg Farnum
12:48 PM Bug #518: cfuse crashed on ls
Confirmed this is fixed in 0.23.1 (sorry for huge delay in confirmation). John Leach
12:06 PM CephFS Feature #483 (Resolved): mds: add timestamp to LogEvent
commit:8f24919d39734cf518f2bf6e50faf6f5266d6eff Sage Weil
11:52 AM CephFS Feature #560 (Resolved): mds: alternate directory hashing
kernel part is done and in unstable branch, currently commit:9f62e3eaafd52875e1f2e4344e11e51ddb726f48 Sage Weil
09:59 AM CephFS Feature #560: mds: alternate directory hashing
commit:05bd6b078d743d6c235c0fcedda7ee4f64ab2ad5 has it working for the user client. Sage Weil
02:33 AM Revision 267cd845 (ceph): RadosClient::shutdown: call monclient::shutdown
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
02:22 AM Revision dfb78ebf (ceph): osd: don't stop recovery when there are unfound
There are two phases in recovery: one where we get all the right objects
on to the primary, and another where we push...
Colin Patrick McCabe
01:03 AM Revision d014acb6 (ceph): dumpjournal.cc: fix compile
dumpjournal needs to create its own SafeTimers and pass them in to some
constructors.
Signed-off-by: Colin McCabe <c...
Colin Patrick McCabe
12:44 AM Revision da2d5018 (ceph): rbd: fix rbd snap rm class handling
Yehuda Sadeh

11/15/2010

10:59 PM Revision 250d414e (ceph): Merge remote branch 'origin/unfound_last_epoch_clean' into unstable
Sage Weil
10:47 PM Revision c7075115 (ceph): Add ./ceph osd tell <osd-num> dump_missing <out>
Add a command that tells the OSD to dump its missing set for all PGs to
a file. This should be useful for debugging m...
Colin Patrick McCabe
10:38 PM Revision 755f5759 (ceph): search_for_missing:recalc stats if unfound changed
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
09:31 PM Revision d883a547 (ceph): mds: Use CDir bloom filter as appropriate.
Add items to the bloom filter when trimming, and look for them
in the filter in the few places where a simple existen...
Greg Farnum
09:31 PM Revision be2da00a (ceph): mds: Add bloom filter to CDir.
You can now add items to a bloom filter and check for their existence.
This is intended to be used when trimming item...
Greg Farnum
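For readers unfamiliar with the structure, a minimal Bloom filter sketch in Python follows. This is not the Ceph code: the class name, bit-array size, hash count, and the salted-SHA-256 hashing scheme are all assumptions chosen for illustration. It shows the two operations the commit message mentions, adding items and checking for their existence, and the key property that a negative answer is definite while a positive answer is only probable.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustration values, not tuned for any workload.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely never added"; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

A caller would add each name as it is dropped from a cache and later treat a False from might_contain() as proof that the name was never there.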
09:23 PM Revision 1fe31e18 (ceph): timer: make init/shutdown explicit
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:39 PM Revision d2af7b7e (ceph): test_unfound.sh: start recovery at end of test
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
08:31 PM Revision c293b9af (ceph): test_common.sh: add dump_osd_store
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
08:15 PM Revision 184fbf58 (ceph): osd: add last_epoch_clean to PG::Info
This changes the encoding in a non-backwards compatible way.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:15 PM Revision 873e9bf8 (ceph): osd: add incompat feature LEC for last_epoch_clean
So an old binary will fail to mount a store with new Info encoding.
Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe
08:15 PM Revision b0c22bd5 (ceph): Add MOSDPGMissing
Add MOSDPGMissing, a message which just contains the missing objects
information for a PG. We will request messages l...
Colin Patrick McCabe
08:15 PM Revision d3cf4787 (ceph): PG::finish_recovery: set info.last_epoch_clean
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
08:15 PM Revision e768bbdf (ceph): Add stray_test to test_unfound.sh
This test is designed to produce a stray that nonetheless has some
useful objects. The primary should be able to find...
Colin Patrick McCabe
08:15 PM Revision 796ff1d1 (ceph): Fix bugs in search_for_missing, _process_pg_info
PG::search_for_missing: fix a bug with the handling of MSG_OSD_PG_INFO
messages. Formerly, when processing these mess...
Colin Patrick McCabe
08:15 PM Revision e3f65076 (ceph): osd: add discover_all_missing
Add discover_all_missing. This function makes sure that we have messages
en route to any OSD that we think might have...
Colin Patrick McCabe
08:15 PM Revision 470b1990 (ceph): stray_test:don't use up/down. timeout extension
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
08:15 PM Revision 05a16d32 (ceph): test_unfound.sh: fix return codes
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
08:15 PM Revision 6a65cc4f (ceph): test_common.sh: remove messenger debug for now
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
08:06 PM Revision 873180aa (ceph): osd: skip unfound in recover_replicas
This is moot currently, since we don't currently start recovering replicas
until the primary is complete.
Signed-off...
Sage Weil
08:04 PM Revision d61bc3bf (ceph): osd: skip unfound objects in recover_primary()
We also need to make sure we come back later when they are found.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:57 PM Revision 9ea1d8bb (ceph): osdmap: make printing a bit easier to read
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
07:50 PM Revision beae97f9 (ceph): objecter: don't dereference null op->outbl
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
07:36 PM Revision 089cd12d (ceph): include: Add bloom filter library to include/
Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Greg Farnum
07:25 PM Revision f2c080b3 (ceph): Merge remote branch 'origin/testing' into unstable
Sage Weil
07:25 PM Revision 556ba739 (ceph): osd: unreg scrub when removing pg
This fixes this crash:
osd/OSD.cc: In function 'PG* OSD::_lookup_lock_pg(pg_t)':
osd/OSD.cc:956: FAILED asse...
Sage Weil
04:54 PM CephFS Feature #560: mds: alternate directory hashing
almost there. need to fix/test uclient hashing.
then implement for kclient...
Sage Weil
04:44 PM Bug #580 (Resolved): rbd rm snap is broken
Fixed with commit:da2d50180dfdc0e30b4348f2acceb2be650f20b7 Yehuda Sadeh
03:42 PM Bug #580 (Resolved): rbd rm snap is broken
When doing 'rbd rm snap', the rbd image header gets corrupted. Yehuda Sadeh
01:49 PM Bug #535 (Resolved): cephtool hangs forever until a UNIX signal is received
Sage spent some time on the messenger too, and I suspect we're done now. Greg Farnum
01:39 PM CephFS Feature #545: mds: use bloom filter to supplement dirfrag COMPLETE flag
Pushed it to branch "mds" (which I apparently created, but thought existed...weird!). Testing it now on a secondary i... Greg Farnum
11:19 AM Bug #579 (Resolved): OSD::sched_scrub: FAILED assert(pg_map.count(pgid)
commit:f46f674261bf65a6f7f6313fb688ec4773f526b5 Sage Weil
10:56 AM Bug #579: OSD::sched_scrub: FAILED assert(pg_map.count(pgid)
Some more information about this bug.
OSD1 and OSD2 have a PG named 0.6
OSD0 does not.
=====================
...
Colin McCabe
10:51 AM Bug #579 (Resolved): OSD::sched_scrub: FAILED assert(pg_map.count(pgid)
On unfound_last_epoch_clean at commit commit:7201497f2feef6a2bbd0baf89e3a14b8a880e79f
I found this assert when run...
Colin McCabe
07:05 AM Bug #538: Write performance does not scale over multiple computers
I set 'osd heartbeat grace=120' and that got rid of the chatter. My performance is now:... Ed Burnette
04:48 AM Revision 7f38858c (ceph): Merge branch 'msgr_zerocopy_read' into unstable
Sage Weil
04:39 AM Revision 7cb2d508 (ceph): msgr: use provided rx buffer if present
This changes the read path so that we hold the Connection::lock mutex while
reading data off the socket. This ensure...
Sage Weil
04:39 AM Revision e8132cd9 (ceph): objecter: post rx buffer to msgr if target bufferlist is present
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
04:39 AM Revision 975dd8fa (ceph): librados: pass provided buffer to objecter on rados_read
This allows us to avoid the data copy if the objecter and msgr manage
to use it.
Signed-off-by: Sage Weil <sage@n...
Sage Weil
04:23 AM Revision 2854dae8 (ceph): msgr: add Connection rx buffer interface
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
04:23 AM Revision c04ba725 (ceph): msgr: implement get_connection()
Get a Connection* for the given destination. This mirrors submit_message,
but does not actually queue a message.
Si...
Sage Weil
04:21 AM Revision 67852352 (ceph): buffer: implement list::iterator::get_current_ptr()
Return a buffer::ptr for the ptr at the current position/offset, with the
length set to the remaining space in the cu...
Sage Weil

11/14/2010

09:05 PM Messengers Feature #527 (Resolved): zero copy reads, msgr rx buffer infrastructure
commit:7f38858c0c19db36c5ecf36cb4d333579981c811 Sage Weil
07:29 PM Revision 4af14db4 (ceph): Objecter::shutdown: shut down timer.
We have to explicitly shut down the timer in Objecter::shutdown.
Otherwise, we are relying on the destructor of SafeTi...
Colin Patrick McCabe
11:33 AM Bug #578 (Resolved): assert triggered on radostool shutdown
Colin McCabe
11:33 AM Bug #578: assert triggered on radostool shutdown
Fixed by commit:4af14db424e770c2f3e99dad6fd2b6f2059feacd
A mutex lifecycle issue.
Colin McCabe
11:26 AM Bug #578 (Resolved): assert triggered on radostool shutdown
I hit this assert when radostool was exiting.
./common/Mutex.h:97: FAILED assert(nlock == 0)
ceph version 0.24~r...
Colin McCabe

11/13/2010

08:46 PM Bug #574: timer: event cancellation apparently broken
cancel_event always relied on the caller to take the SafeTimer lock, and then goes on to take the Timer lock. So it's... Colin McCabe
08:39 PM Bug #535: cephtool hangs forever until a UNIX signal is received
It looks good so far. Colin McCabe
04:43 AM Revision f18609e8 (ceph): Merge remote branch 'origin/msgr' into testing
Sage Weil
12:00 AM Revision 2be4215a (ceph): debug: don't print thread id twice
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil

11/12/2010

11:59 PM Revision b61af6a7 (ceph): msgr: cleanup: make queue_received non-inline; some helpful debug
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
11:56 PM Revision f99c84e6 (ceph): msgr: do not clear halt_delivery
We need to keep the halt_delivery plug set on failure/shutdown in order to
prevent a racing reader from queuing new m...
Sage Weil
10:55 PM Revision 1071a9ab (ceph): msgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock
Close a few different races here.
Also, assert that queue_items are not queued in ~Pipe().
Signed-off-by: Sage Weil...
Sage Weil
10:55 PM Revision d4746ab5 (ceph): msgr: close enqueue/discard race
We need to re-check halt_delivery after dropping and retaking pipe_lock.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:55 PM Revision 20937e88 (ceph): msgr: protect pipe queuing with _both_ pipe and dispatch_queue locks
We want to make sure the pipe's queue item doesn't go away.
Also, make queue_received() require pipe_lock to be held...
Sage Weil
10:55 PM Revision cbf154e1 (ceph): msgr: only close socket on reconnect or shutdown
We can't modify 'sd' or (more importantly) close sd while any other thread
might be using it, or else we might race w...
Sage Weil
10:55 PM Revision 70fe062f (ceph): msgr: add 'ms inject socket failures = foo'
Where we fail roughly every foo'th socket operation.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
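The "fail roughly every foo'th socket operation" idea can be sketched as a one-in-N random check. The Python below is only an illustration, not the messenger code: the function name, the rng parameter, and the convention that 0 disables injection are assumptions.

```python
import random


def should_inject_failure(every_n, rng=random):
    """Return True on roughly one call in `every_n`.

    Assumption of this sketch: a value of 0 (or less) disables injection.
    """
    if every_n <= 0:
        return False
    # randrange(every_n) is uniform over 0..every_n-1, so 0 comes up 1/every_n of the time.
    return rng.randrange(every_n) == 0
```

A socket wrapper would call this before each read or write and simulate an error when it returns True, which exercises the reconnect paths without needing a faulty network.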
10:49 PM Revision 20affc65 (ceph): TestTimers: don't test (nonexistent) Timer
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
10:45 PM Revision d5032a05 (ceph): Rename PG::peer to PG::do_peer
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
03:59 PM Revision 46cf27d4 (ceph): Merge branch 'testing' into unstable
Sage Weil
03:55 PM Revision c5b2d28b (ceph): uclient: insert lssnap results under snapdir, not live dir
Put the readdir results (list of snapshots) in the right place in the
hierarchy; we were putting them in the parent d...
Sage Weil
03:36 PM Revision 7ccdae8c (ceph): msg: fix buffer size for IPv6 address parsing
Signed-off-by: Wido den Hollander <wido@widodh.nl> Wido den Hollander
02:20 PM Bug #577 (Resolved): unify PG creation code in OSD::handle_pg_notify and OSD::_process_PG_info
unify PG creation code in OSD::handle_pg_notify and OSD::_process_PG_info
Duplicated code here. They're slightly d...
Colin McCabe
02:16 PM CephFS Feature #545: mds: use bloom filter to supplement dirfrag COMPLETE flag
Trying to find a bloom filter library. Unfortunately there don't seem to be any available under a GPL-compatible lice... Greg Farnum
01:16 PM Bug #490 (Can't reproduce): Cluster stays in a degraded state
Sage Weil
01:15 PM CephFS Cleanup #514 (Rejected): Optimize MIX/MIX_STALE reconnects, etc
mix_stale is no more Sage Weil
12:56 PM Linux kernel client Bug #576 (Can't reproduce): readdir returns too many results
... Sage Weil
11:02 AM Bug #535: cephtool hangs forever until a UNIX signal is received
Pushed a potential fix to the msgr branch, waiting for Colin to report back on if it works or not. :) Greg Farnum
07:56 AM CephFS Bug #561 (Resolved): snaptest-2 doesn't execute properly
Figured this out. LSSNAPs was adding the snap dentries to the cache under the parent dir instead of the hidden .snap... Sage Weil
07:37 AM Messengers Bug #573 (Resolved): monmaptool fails to parse IPv6 address
Thanks, applied as commit:7ccdae8cd44c143550234511a2a09bab38c6515e Sage Weil
04:56 AM Messengers Bug #573: monmaptool fails to parse IPv6 address
After searching through the source I found it :)
Attached is a patch to fix the IPv6 address parsing. The buffer w...
Wido den Hollander
05:12 AM Bug #575 (Resolved): monmaptool terminates when input file is not a monmap
For example:... Wido den Hollander
03:30 AM Bug #540: CephxClientHandler::handle_response
Just saw it again on the same cluster, this time osd2 crashed when upgrading to this morning's unstable:... Wido den Hollander
12:29 AM Bug #540: CephxClientHandler::handle_response
I saw that on a test machine of mine. The 'ceph -w' command was hanging for about 10 seconds and then exited with thi... Wido den Hollander
12:38 AM Revision ce6d6394 (ceph): timer: rewrite mostly from scratch
Just use the provided lock. This _vastly_ reduces the complexity because
we don't have to worry about races between ...
Sage Weil

11/11/2010

11:31 PM Revision 54848991 (ceph): mds: hit inode created via CREATE
We missed this path!
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:28 PM Revision f8b3271f (ceph): Merge branch 'rc' into unstable
Conflicts:
configure.ac
src/Makefile.am
Sage Weil
05:47 PM Bug #531: Journaling Causes System Hang
Sorry, I haven't been able to get the debug output yet. We have spent the last few days working with our production syste... Bryan Tong
04:47 PM Linux kernel client Tasks #569 (Resolved): test dir frags
a few fixes, mostly fine. commit:7b88dadc13e0004947de52df128dbd5b0754ed0a Sage Weil
04:43 PM Bug #574 (Resolved): timer: event cancellation apparently broken
Looking into this, it appears that the problem was that the wrong lock was taken during cancel event. Or that the ev... Sage Weil
03:38 PM Bug #574 (Resolved): timer: event cancellation apparently broken
Just saw this on latest unstable, commit:f8b3271f45cc4a87e3f3f212d22e3d34ff13da44
The monitor schedules a propose ...
Sage Weil
03:09 PM CephFS Tasks #366 (New): test snaptests against clustered mds failures
Sage Weil
03:08 PM CephFS Tasks #366 (Rejected): test snaptests against clustered mds failures
Sage Weil
03:08 PM CephFS Bug #362 (Rejected): mds: rejoin crashes on snaptest-2 workload
Sage Weil
02:45 PM Bug #540: CephxClientHandler::handle_response
Wido just saw this:... Sage Weil
05:18 AM Revision 5d1d8d0c (ceph): v0.23
Sage Weil
04:58 AM Revision 3d10b340 (ceph): mds: fix null_snapflush with multiple intervening snaps
The client is allowed to not send a snapflush if there is no dirty metadata
to write for a given snap. However, the ...
Sage Weil
02:17 AM Messengers Bug #573 (Resolved): monmaptool fails to parse IPv6 address
I'm trying to set up a small cluster with IPv6, but mkcephfs fails:... Wido den Hollander
12:36 AM Revision 3d6e9155 (ceph): Merge remote branch 'origin/unfound' into unstable
Sage Weil
12:31 AM Revision 4d941cf4 (ceph): osd: scrub: change cancel behavior
Use explicit flag, so that scrub_reserved always indicates whether the
osd count includes us or not.
Signed-off-by: ...
Sage Weil
12:31 AM Revision a87e8901 (ceph): osd: track last_scrubbed in PG::Info::History
Share with peers and write to disk on scrub completion.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:31 AM Revision 6548fb65 (ceph): osd: do scrub schedule state changes inside scrub()
Update these values under protection of pg lock iff we start scrubbing,
otherwise back out.
On scrub completion, unr...
Sage Weil
12:31 AM Revision 815c3d56 (ceph): osd: fix sched_scrub
Insert whoami into reserved set on primary, not 0! Also more cleanup of
sched state helpers.
Signed-off-by: Sage We...
Sage Weil
12:31 AM Revision 92572910 (ceph): osd: call sched_scrub on reserve reply
Otherwise we have to wait until the next time it's called by the timer, and
during that period we have a reservation ...
Sage Weil
12:31 AM Revision c12829a2 (ceph): osd: don't scrub something we just scrubbed
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:31 AM Revision 85e08905 (ceph): osd: scrub least recently scrubbed pgs first; once a day
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
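A rough Python sketch of the "least recently scrubbed first, once a day" policy named in 85e08905 follows. It is not the OSD scheduler: the function name and the dictionary input are hypothetical, and reading "once a day" as a minimum interval between scrubs of the same PG is an assumption.

```python
def pick_pgs_to_scrub(last_scrubbed, now, min_interval=24 * 60 * 60):
    """Return PG ids that are due for a scrub, least recently scrubbed first.

    `last_scrubbed` maps a PG id to the time (in seconds) of its last scrub.
    PGs scrubbed less than `min_interval` seconds ago are skipped.
    """
    due = [pg for pg, stamp in last_scrubbed.items() if now - stamp >= min_interval]
    # Oldest scrub stamp first, so neglected PGs are handled before recent ones.
    return sorted(due, key=lambda pg: last_scrubbed[pg])
```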

11/10/2010

10:50 PM Revision 231434af (ceph): pg_state_string: use an ostringstream
Use an ostringstream for efficiency's sake.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe
09:49 PM Revision d247616c (ceph): vstart: stop logging to /tmp/foo
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:46 PM CephFS Bug #561: snaptest-2 doesn't execute properly
I ran the test again and didn't get an mds crash. There was one issue remaining:... Greg Farnum
06:14 PM CephFS Bug #561 (In Progress): snaptest-2 doesn't execute properly
I think I may have finally nailed this problem, or at least found a band-aid by more aggressively removing the I_COMP... Greg Farnum
09:39 PM Revision 74be621c (ceph): osd: fix scrub reserved state when starting scrub
Also document scrub scheduling/pending/active states.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
09:18 PM CephFS Bug #570 (Resolved): Locker::_do_null_snapflush assert failure
Sage Weil
09:18 PM CephFS Bug #570: Locker::_do_null_snapflush assert failure
Nice catch. Fixed by commit:3d10b340748e5bbff86b49ac7386da9efa27a070. Added a unit test too! Sage Weil
02:58 PM CephFS Bug #570 (Resolved): Locker::_do_null_snapflush assert failure
Seen this a lot while working on the snaptest-2 issue, when shutting down cfuse.... Greg Farnum
09:16 PM Revision 8650418f (ceph): vstart: turn down msgr debugging
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:13 PM Revision 9e4027fb (ceph): monc: cancel timer events with lock held
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:23 PM Revision 07bb6756 (ceph): Wake up clients waiting for now-found objects
PG::search_for_missing: when we find a previously unfound object, check
to see if there is an entry in waiting_for_mi...
Colin Patrick McCabe
07:46 PM Revision 8288a23a (ceph): PG::peer: don't block if objects are unfound
Erase the code in PG::peer that used to keep us from becoming active
when objects were still unfound. Print out the n...
Colin Patrick McCabe
07:46 PM Revision 040c4bcd (ceph): PG::search_for_missing: minor refactoring, comment
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
07:46 PM Revision 5153ba5e (ceph): Add PG::Missing::have_missing()
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
07:46 PM Revision 85c4e6e6 (ceph): OSD::_process_pg_info:search_for_missing sometimes
OSD::_process_pg_info: If we're the primary for this active PG, and we
have missing objects, call search_for_missing....
Colin Patrick McCabe
07:46 PM Revision 6a04ac52 (ceph): PG::recover_master_log: rename a local variable
PG::recover_master_log: rename a local variable to avoid using the
overloaded term "missing".
Signed-off-by: Colin M...
Colin Patrick McCabe
07:46 PM Revision b5181133 (ceph): test_unfound.sh: shorter test
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
07:46 PM Revision 02ec7219 (ceph): Add num_objects_unfound to struct pg_stat_t
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
07:46 PM Revision fc605ced (ceph): test_unfound.sh: verify that we have unfound objs
test_unfound.sh: verify that we have unfound objs.
Then, when we bring up the other OSD, verify that those unfound ob...
Colin Patrick McCabe
07:46 PM Revision b9191ddc (ceph): test_unfound.sh: test reading an unfound object.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
07:46 PM Revision e6b6c539 (ceph): PG::peer: count/find cleanup
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
06:30 PM Revision b80f3e6a (ceph): PG: move ostream operator to .cpp file
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
06:30 PM Revision a46f15e7 (ceph): PG: nomenclature change: talk about unfound objs
Describe objects as "unfound" when we don't know what OSD has them.
Signed-off-by: Colin McCabe <colinm@hq.newdream....
Colin Patrick McCabe
06:30 PM Revision ef1f8ecd (ceph): PG.h erase deadcode
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
06:16 PM Bug #535 (In Progress): cephtool hangs forever until a UNIX signal is received
After checking the logs and conferring with Sage, I think I've found a possible cause. Designing and testing a fix no... Greg Farnum
05:43 PM Revision 82aa79f8 (ceph): mds: fix inode->frag rstat projected with snaps
The snapid 'first' value needs to be >= inode->first; move that into
the helper.
Signed-off-by: Sage Weil <sage@newd...
Sage Weil
05:04 PM Revision 5deef243 (ceph): osdmap: break up asserts for easier debugging
If we fail one of these it's helpful to know which one.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:03 PM Revision 586c9e7a (ceph): objecter: throttle before looking at lock protected state
The take_op_budget() may drop our lock if we are in keep_balanced_budget
mode, so we need to do that _before_ we take...
Sage Weil
04:50 PM Revision 57513739 (ceph): mon: drop unnecessary state checks
We want to ignore all beacons from the mds regardless of what state they
are in.
Signed-off-by: Sage Weil <sage@newd...
Sage Weil
04:46 PM Feature #567 (Resolved): osd: background scrub frequency, scheduling
fixed up some scheduling problems, then added the interval and oldest-scrubs-first stuff. Sage Weil
04:45 PM Revision 84840ed7 (ceph): debian: don't explicitly depend on libgoogle-perftools0
dpkg-buildpackage will autodetect the dependency. Except on lenny, where
it doesn't exist and we don't use it!
Sign...
Sage Weil
04:14 PM Revision ca3693d8 (ceph): mds: Enable --journal_check mode.
This replaces the old --shadow option, which didn't work.
It starts up the MDS daemon, then replays the journal for
a...
Greg Farnum
04:13 PM Revision 214b7269 (ceph): osdc: Fix bad assert in ~ObjectCacher.
The objects data member is never empty on shutdown since it now consists
of a vector of pools. Instead, check each po...
Greg Farnum
03:43 PM Feature #572 (Resolved): Implement lingering osd requests
For the watch/notify feature we need to implement lingering osd requests on the userspace client side. Lingering osd ... Yehuda Sadeh
03:42 PM Revision 5035c822 (ceph): uclient: only update inode if version increased
This realigns the code with the kernel version, fixing a number of
problems when you have multiple MDSs returning inf...
Sage Weil
03:21 PM Linux kernel client Bug #571 (Closed): client hangs after osd disconnection
This happens on the rbd watch/notify sync branch. Probably related to lingering requests. Yehuda Sadeh
12:12 PM Bug #559 (Rejected): osd: dup requests can ack early
nevermind, this is already done and merged! Sage Weil
11:01 AM Linux kernel client Tasks #569 (Resolved): test dir frags
Make sure we behave with fragmented dirs, esp readdir. (probably need to mirror the recent cfuse fixes.) Sage Weil
09:43 AM Bug #521 (Resolved): objecter: crash in osdmap assert
commit:586c9e7a80b425802ca77d8c09bb00da5c25d616 Sage Weil
09:15 AM Feature #568 (Resolved): debian: build with --as-needed?
Can we do this to limit dependencies? See #544.
And the current warnings like...
Sage Weil
08:18 AM CephFS Feature #548 (Resolved): mds: shadowreplay one-shot mode
commit:ca3693d8ffcdffc3ae95eaba506a72889829bcb5 makes minimal changes to the MDS and MDSMonitor code to enable the ne... Greg Farnum
08:03 AM Revision 255e34af (ceph): decompile_crush_bucket: fix depth-first decomp
We need to ensure that buckets are output after their dependencies. The
best way to do this is a depth-first traversa...
Colin Patrick McCabe
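The ordering requirement described in 255e34af (a bucket is output only after the buckets it depends on) is what a post-order depth-first traversal produces. The Python sketch below illustrates that technique on a plain dictionary. It is not crushtool's code, and the function name and input shape are assumptions.

```python
def dependency_order(children):
    """Return bucket names so that every bucket follows the buckets it references.

    `children` maps a bucket name to the list of bucket names it contains.
    """
    order = []
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        for child in children.get(name, []):
            visit(child)
        # Post-order: emit a bucket only after everything beneath it.
        order.append(name)

    for name in children:
        visit(name)
    return order
```

Dumping buckets in this order means a recompile never meets a reference to a bucket it has not yet seen.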
07:58 AM Revision d1f15daf (ceph): CrushWrapper:get_bucket: ret ENOENT for no bucket
All the callers of CrushWrapper::get_bucket() check for error codes, but
not for NULL returns. So if there is no buck...
Colin Patrick McCabe
07:24 AM Bug #531: Journaling Causes System Hang
What would be helpful in diagnosing this problem is:
- turn up osd logging, in [osd] section:
debug osd = 20
...
Sage Weil

11/09/2010

11:56 PM Revision 11cfcfe8 (ceph): Merge branch 'sched_scrub' into unstable
Conflicts:
src/osd/PG.cc
src/osd/PG.h
Sage Weil
11:50 PM Revision e8ad6d26 (ceph): osd: small cleanup
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
11:46 PM Revision 28b44293 (ceph): osd: scrub: list objects without lock held
We'll go back to get anything we missed later.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
11:46 PM Revision c2d6d05f (ceph): Merge branch 'scrub_no_lock' into unstable
Sage Weil
11:34 PM Revision 966369aa (ceph): ps-ceph.pl: don't show self
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
11:04 PM Revision 6bc31511 (ceph): gui: add missing #include
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
10:50 PM Revision 58394828 (ceph): Merge branch 'rbd-fiemap' into unstable
Sage Weil
10:49 PM Revision e991702e (ceph): objecter: set READ flag on new objecter mapext/read_sparse ops
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
10:48 PM Revision adac5163 (ceph): objecter: fix balancer for ops with length < 0
Notably, mapext.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:36 PM Revision 20060548 (ceph): filestore: autodetect presence of FIEMAP ioctl
If it's not there, assume the whole object is allocated.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:35 PM Revision e5488718 (ceph): fiemap: include linux fiemap.h header; unconditionally compile helper
If the system doesn't have the header, use our copy.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:33 PM Revision 9f14dd25 (ceph): ps-ceph.pl: display Ceph tests
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
10:23 PM Revision 53b076d5 (ceph): Merge remote branch 'origin/rbd-fiemap' into unstable
Sage Weil
10:06 PM Revision 2325a1a2 (ceph): Fix example config file
We need to specify a journal size for the file-based journal we set up
in the example config file.
Signed-off-by: Co...
Colin Patrick McCabe
09:59 PM Revision 2947d19d (ceph): TimerThread:don't call pop_front before iter deref
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
09:30 PM Revision 1c7d8f1a (ceph): Makefile: use openssl module check
This allows ceph to build with --as-needed.
Signed-off-by: Kacper Kowalik <xarthisius@gentoo.org>
Kacper Kowalik
09:17 PM Revision 954ad982 (ceph): osd: shut down if we do not exist
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:08 PM Revision ea56dfdc (ceph): osd: handle osds that no longer exist in prior_set_affected
Consider no-longer-existent OSDs lost.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:05 PM Revision 29428b9b (ceph): Objecter: initialize timer in Objecter::init
Just in case future users of Objecter want to create one before calling
Messenger::start as a daemon.
Signed-off-by:...
Colin Patrick McCabe
06:15 PM Revision ec4200b0 (ceph): Add test_crushtool.sh
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
06:06 PM Revision 019bb70e (ceph): mds: turn on mds_bal_frag (dir fragmentation) by default
Let the fun begin!
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:04 PM Revision ae13fc86 (ceph): osd: handle osds that no longer exist in build_prior
Fix build_prior to handle OSDs that no longer exist in the current map.
Consider them lost.
Signed-off-by: Sage Weil...
Sage Weil
06:04 PM Revision e15c9569 (ceph): mds: fix inode freeze auth pin allowance
When we're renaming across nodes, we need to freeze the inode. This
requires that we allow for the auth_pins that _w...
Sage Weil
06:03 PM Revision 3107944e (ceph): osdmap: cleanup: add parens
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:59 PM Revision f28b99b3 (ceph): CrushWrapper::get_bucket_item: bounds check
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> Colin Patrick McCabe
05:59 PM Revision 9b487256 (ceph): crushtool: don't create a dump we can't recompile
In crushtool, dump buckets in tree order. Buckets which reference other
buckets must be dumped after their dependencie...
Colin Patrick McCabe
05:55 PM Revision e1588dc4 (ceph): mds: wipe out client sessions on startup
For disaster recovery and such.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:55 PM Revision 05a47387 (ceph): mon: implement 'mds newfs <metapool> <datapool>' command
Create a new fs (by creating a new MDSMap) using the given pools.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:55 PM Revision d80948ad (ceph): mds: use mdsmap data pool for root inode default layout
The MDSMap may specify any random pool as the data pool; use that.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:55 PM Revision 8a21c6f6 (ceph): mds: add mds_skip_ino and mds_wipe_ino_prealloc options
These are last-ditch recovery tools. Not particularly effective ones,
though.
Signed-off-by: Sage Weil <sage@newdre...
Sage Weil
05:04 PM Linux kernel client Bug #549: bonnie++ file stat failure
bonnie tests are running under ceph 5, 6, 8, and 9, logging to /data/qa/ on each machine. Terri Haber
04:28 PM Bug #535: cephtool hangs forever until a UNIX signal is received
cephtool-hang-at-966369aad07461f2610b4dd2a9cdc770155c5a89.txt Colin McCabe
03:08 PM Bug #535: cephtool hangs forever until a UNIX signal is received
messenger-bug.txt Colin McCabe
04:26 PM Bug #521: objecter: crash in osdmap assert
Can you try with something like... Sage Weil
09:45 AM Bug #521: objecter: crash in osdmap assert
latest from ML:... Sage Weil
03:59 PM Feature #567 (Resolved): osd: background scrub frequency, scheduling
We should have some min interval such that the osds won't scrub the same osd more frequently than that.
Also, the ...
Sage Weil
03:56 PM Feature #425 (Resolved): trigger osd scrub automatically
Sage Weil
03:54 PM Subtask #485 (Resolved): osd: cooperative scrub scheduling
merged by commit:11cfcfe87503e50c892178d9c5c5b55da3aac740 Sage Weil
03:45 PM Subtask #486 (Resolved): osd: make scrub not block writes
merged commit:28b44293e34c5e97f350b4c68becdf9e7767ed6f Sage Weil
02:52 PM Bug #248 (Resolved): rbdtool import should use fiemap
Sage Weil
02:52 PM Bug #248: rbdtool import should use fiemap
Merged by commit:58394828a01950d7b26430d61d32df91df5a5fb1, bringing it in line with the objecter changes over the las... Sage Weil
02:13 PM RADOS Bug #558 (Resolved): crushtool cannot always re-encode a crushmap that it's created
Fixed by commit:9b48725614a880cf1f4bcad0bba2ceefdc76c167
C.
Colin McCabe
02:11 PM Bug #533 (Resolved): radostool hang on shutdown
Should be fixed by timer-fixes.
C.
Colin McCabe
02:10 PM Bug #565 (Resolved): Example config file is broken
Fixed by 2325a1a27b434cea7d7af832efff7a9257724fe6
C.
Colin McCabe
01:30 PM Bug #544 (Resolved): ceph-0.22.2: fails to build with --as-needed
Sage Weil
01:16 PM Bug #566 (Resolved): osd: build_prior needs to be wary of nonexistent osds
fixed by commit:954ad98230085c9c2a174fe15af24df237498977 commit:ea56dfdc663f8b0e19346bb63ffe3fec0c7759c4 commit:ae13f... Sage Weil
12:59 PM CephFS Bug #556 (Resolved): clustered mds: rename
this wasn't too bad.. the locking auth_pin scheme changed a while ago and the auth_pin allowance didn't get adjusted ... Sage Weil
12:42 PM Linux kernel client Bug #546 (Resolved): direct i/o does not work when offset is not page-aligned
See commit:c5c6b19d4b8f5431fca05f28ae9e141045022149. Passes my tests. Sage Weil
06:03 AM Revision aad3f7f2 (ceph): ceph.spec.in: don't strip rados classes
Signed-off-by: Christian Brunner <christian@brunner-muc.de> Christian Brunner

11/08/2010

10:49 PM Bug #535: cephtool hangs forever until a UNIX signal is received
> Look, I know it's a pain, but work on this isn't going to progress unless
> we collect AT LEAST:
> 1) The state ...
Colin McCabe
01:05 PM Bug #535 (Can't reproduce): cephtool hangs forever until a UNIX signal is received
Look, I know it's a pain, but work on this isn't going to progress unless we collect AT LEAST:
1) The state of each ...
Greg Farnum
10:35 AM Bug #535: cephtool hangs forever until a UNIX signal is received
The process that is hung is 17181, cephtool. Colin McCabe
10:35 AM Bug #535 (In Progress): cephtool hangs forever until a UNIX signal is received
Reproduced again on the unfound branch, which is very close to what is in unstable now.
cmccabe@flab:~/src/ceph/...
Colin McCabe
09:22 PM Revision 64f95ad9 (ceph): mds: add missing Dumper.[h,cc]
Sage Weil
09:18 PM Revision be9328ac (ceph): mds: tolerate/fix negative dir size counts
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:44 PM Revision d5515a8f (ceph): mds: add missing Dumper.[h,cc]
Sage Weil
08:40 PM Bug #566 (Resolved): osd: build_prior needs to be wary of nonexistent osds
... Sage Weil
08:09 PM Bug #565 (Resolved): Example config file is broken
The example config file (src/sample.ceph.conf) specifies the OSD journal as a file, but doesn't specify the size, whi... Ravi Pinjala
05:45 PM Revision 1ab7c7ff (ceph): Replace ps-ceph.sh shell script with perl script
A much faster version of ps-ceph.sh.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Andrew F
04:17 PM Linux kernel client Bug #384 (Closed): crash in splice_dentry
Sage Weil
03:07 PM Feature #80 (Resolved): uclient: readdir from cache
He already did it, yay! Greg Farnum
02:41 PM Feature #96 (Resolved): msgr: close idle connections?
Yay, this got done with the recent SimpleMessenger changes! Greg Farnum
02:15 PM Feature #276 (Resolved): Possibility to dump/list xattrs from RADOS object
Yehuda says he did this! Greg Farnum
01:24 PM Bug #531: Journaling Causes System Hang
We've looked at this a bit more but decided today that Sage is taking it over since he's a lot more familiar with the... Greg Farnum
12:49 PM Linux kernel client Bug #564 (Resolved): Configuration via configfs instead of sysfs
Will allow creation of different devices and setting them up. Should be device oriented, and will create a sub direct... Yehuda Sadeh
11:05 AM Bug #563 (Closed): osd: btrfs, warning at inode.c ( btrfs_orphan_commit_root )
I'm running the unstable branch and I'm seeing in my dmesg:... Wido den Hollander
09:32 AM CephFS Bug #561: snaptest-2 doesn't execute properly
Okay, looks like this may be an issue with the test rather than Ceph. I just copied it into the root of the ceph moun... Greg Farnum
09:07 AM CephFS Bug #561 (Resolved): snaptest-2 doesn't execute properly
Checked it on cfuse and kclient:... Greg Farnum
09:27 AM RADOS Bug #558: crushtool cannot always re-encode a crushmap that it's created
Either the compiler part just needs to be updated to allow forward bucket references, or the dumper needs to dump by ... Sage Weil
09:26 AM Feature #562 (Closed): separate gui into separate binary, package
This will mean refactoring common ceph.cc bits into a separate file and .a. Sage Weil
09:22 AM Linux kernel client Bug #434: mds: clustered mds pjd failures
a few more fixes here on inode updates version check and mtime. Sage Weil
07:23 AM Linux kernel client Bug #434 (Resolved): mds: clustered mds pjd failures
this was a kclient problem caused by bad uid/gid in resent requests. fixed by commit:cb4276cca4695670916a82e359f2e377... Sage Weil
09:20 AM Tasks #406 (Closed): push v0.20.2 to upstream debian, ubuntu maintainers
Sage Weil
09:20 AM CephFS Cleanup #427 (Rejected): mds: tie scatter pins directly to freeze machinery
no more scatterpins, yay! Sage Weil
09:19 AM Linux kernel client Bug #554 (Resolved): clustered mds: max_size not updated
Sage Weil
07:39 AM CephFS Feature #560 (Resolved): mds: alternate directory hashing
Currently dentries are hashed among dirfrags using the linux dcache's hash function, which is pretty trivial. The pr... Sage Weil
07:30 AM Bug #559: osd: dup requests can ack early
The dup request check looks at the reqid in the log, and replies early. That request could still be in flight to dis... Sage Weil
07:28 AM Bug #559 (Rejected): osd: dup requests can ack early
Sage Weil

11/07/2010

06:02 PM RADOS Bug #558 (Resolved): crushtool cannot always re-encode a crushmap that it's created
When a CRUSH text map is encoded, the buckets are read in such a way that they must be defined before they are refere... Ravi Pinjala
05:56 PM Revision 0feec2f4 (ceph): Merge remote branch 'origin/object_locator' into unstable
Conflicts:
src/osd/OSD.cc
src/osd/ReplicatedPG.cc
src/osd/ReplicatedPG.h
src/osd/osd_types.h
Sage Weil
05:45 PM Revision b7f578cf (ceph): Merge remote branch 'origin/timer-fixes' into unstable
Sage Weil
05:44 PM Revision deb9ef76 (ceph): v0.24~rc
Sage Weil
05:42 PM Revision 0b190920 (ceph): Merge remote branch 'origin/testing' into unstable
Sage Weil
03:49 PM Revision a4674af5 (ceph): mds: eval: put scatter in MIX if replicated, otherwise LOCK
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
03:45 PM Revision 33c6e230 (ceph): mds: do not scatter_writebehind in MIX state
Replicas might come in while we're flushing and get a MIX state with
the old state.
Signed-off-by: Sage Weil <sage@n...
Sage Weil
11:29 AM Feature #231: Slow OSDs shouldn't destroy cluster performance
Today I experienced a btrfs bug where *[btrfs-transacti]* got to status D, causing my OSD to hang (also go into st... Wido den Hollander
10:18 AM Linux kernel client Bug #554: clustered mds: max_size not updated
fixed by commit:912a9b0319a8eb9e0834b19a25e01013ab2d6a9f. also commit:feb4cc9bb433bf1491ac5ffbba133f3258dacf06 for g... Sage Weil
10:15 AM Feature #524 (In Progress): object_locator_t
Work so far merged by commit:0feec2f4f31aa3a259b2cdf885d6458995ce860b
Still need to update the on-wire protocol to...
Sage Weil
10:08 AM CephFS Feature #495 (Resolved): mds: add MIX_STALE
merged in commit:0b1909209800229f5098cdc848fc3901508c1e19. best part of this is MIX_STALE went away. yay! Sage Weil
10:05 AM Bug #248 (In Progress): rbdtool import should use fiemap
whoops, this never got merged. Sage Weil
08:58 AM Linux kernel client Bug #557 (Can't reproduce): BUG_ON(!session->s_num_cap_releases);
... Sage Weil
08:11 AM CephFS Bug #556 (Resolved): clustered mds: rename
various hangs with thrash-exports and pjd rename tests. Sage Weil
04:05 AM Revision 1bf8e732 (ceph): Merge branch 'unstable' into mix_stale
Sage Weil
04:01 AM Revision 1eb94da2 (ceph): mds: introduce/use helpers to resync stale fragstat/rstat; update version
Simplifies code.
Also, update the version when we resync!
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
04:01 AM Revision c1ee560e (ceph): mds: don't fuss with versions when taking frag/rstat from frag; it's ne...
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
04:01 AM Revision bdc2fa5b (ceph): mds: remove MIX_STALE
Yay, we don't need it!
If we can't update the frag on scatter, fine. The staleness of the frag
is implicit in the f...
Sage Weil
04:00 AM Revision c2034829 (ceph): mds: ignore done_locking on slave requests' acquire_locks()
Slave requests ask for each xlock one at a time. Don't bail out based on
the done_locking flag.
Signed-off-by: Sage...
Sage Weil
04:00 AM Revision 51b6a863 (ceph): mds: don't use helper for rename srcdn
The rdlock_path_xlock_dentry helper works for _auth_ dentries that we
create locally in an auth dirfrag. For the src...
Sage Weil
04:00 AM Revision eb0a60d0 (ceph): mds: never complete a gather on a flushing lock
The scatter_writebehind() takes a wrlock, but that may still allow the lock
to complete a gather to LOCK and even mov...
Sage Weil

11/06/2010

04:38 PM Revision bdf3bc5e (ceph): mds: update version when bring stale rstat back up to date
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
02:58 PM Revision a74054d1 (ceph): mds: simplify stale semantics a bit
is_stale() => next MIX is MIX_STALE. Stale flag is then cleared. Then we
special case the import to preserve stale-n...
Sage Weil
01:30 PM Bug #555 (Closed): debian/ubuntu: ceph-client-tools needs to depend on libgtkmm-2.4-1c2a
Invalid report, it was due to an upgrade. When doing a fresh install of the packages, they do depend on libgtkmm.
Cl...
Wido den Hollander
11:52 AM Bug #555 (Closed): debian/ubuntu: ceph-client-tools needs to depend on libgtkmm-2.4-1c2a
Right now, the building process depends on libgtkmm-2.4-dev, but when installing the packages and running 'ceph -g' y... Wido den Hollander
11:55 AM Linux kernel client Bug #434: mds: clustered mds pjd failures
Just saw this again:... Sage Weil
11:18 AM Bug #553: Kernelmodule doen't build under Debian lenny
Ok, a backport-kernel works fine AFAIS. I updated the wiki-page. DaB Punkt
10:10 AM Bug #553 (Won't Fix): Kernelmodule doen't build under Debian lenny
Unfortunately you're going to need to upgrade your kernel if you want the in-kernel client. Using the backports branc... Greg Farnum
09:52 AM Bug #553 (Won't Fix): Kernelmodule doen't build under Debian lenny
Hello all,
the wiki-page [1] says that ceph runs under Debian lenny, but as far as I see that is not true because th...
DaB Punkt
11:16 AM Linux kernel client Bug #554 (Resolved): clustered mds: max_size not updated
3 mds, export thrashing, dbench 1 hang waiting on max_size. Sage Weil
04:52 AM Revision e27f111f (ceph): mds: preserve stale state on import; some cleanup
Our new invariant is that MIX_STALE always implies is_stale(). And on
import, if is_stale(), MIX becomes MIX_STALE. ...
Sage Weil
12:08 AM Revision a582345c (ceph): Merge branch 'mix_stale' into unstable
Sage Weil
12:06 AM Revision 4126d1ce (ceph): mds: add more verify_scatter asserts
For catching fragstat errors sooner.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil

11/05/2010

10:24 PM Revision ae670c33 (ceph): mds: fix version check on resyncing stale rstat in predirty_journal_par...
We're resyncing rstat, so check the rstat version (not fragstat!)
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:45 PM Revision 4cee6ead (ceph): mds: Fix bad inode deref.
Accidentally trying to print out the CInode after removing it in trim_non_auth!
Move the print to before it's been un...
Greg Farnum
07:20 PM Revision 93344fb2 (ceph): Revisit std::multimap decoder
Previously I changed the std::multimap decoder to minimize the number of
constructor invocations. However, it could b...
Colin Patrick McCabe
06:34 PM Revision f015c989 (ceph): autogen.sh: check for pkg-config
To avoid seeing confusing errors later in the configure process, in
autogen.sh, check to make sure the pkg-config pro...
Colin Patrick McCabe
05:57 PM Revision fd397aba (ceph): PG.cc: build_scrub_map now drops the PG lock while scanning the PG
build_inc_scrub_map scans all files modified since the given
version number and creates an incremental scr...
Samuel Just
05:38 PM Revision 989fa67d (ceph): mds: preserve version when recovering rstat from dirfrag in predirty_jo...
We don't want to screw up the version here. This aligns the code with
other instances of this check.
Signed-off-by:...
Sage Weil
02:50 PM Linux kernel client Bug #552 (Resolved): Samba with kernel oplocks=on produces lots of corrupt mds entries in dmesg
With kernel oplocks = yes, samba fills up dmesg with those
[ 4472.504211] ceph: problem parsing dir contents -5
[...
Paul Komkoff
01:56 PM Linux kernel client Bug #434: mds: clustered mds pjd failures
Sage has taken over the clustered MDS stuff for now, so here's the bug! Greg Farnum
01:55 PM CephFS Feature #495: mds: add MIX_STALE
Sage Weil
01:36 PM Bug #521: objecter: crash in osdmap assert
Sage Weil
01:02 PM CephFS Bug #551 (Can't reproduce): cfuse crash on quick mds restart
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004704ad in Client::kick_flushing_caps (this=0x...
Greg Farnum
12:29 PM Bug #550: mon: PGMonitor::update_from_paxos()
While I thought it wasn't related to the MDS issue I'm seeing, it seems it might be:... Wido den Hollander
12:11 PM Bug #550 (Can't reproduce): mon: PGMonitor::update_from_paxos()
One of my monitors crashed, got this backtrace:... Wido den Hollander
10:59 AM Linux kernel client Bug #549: bonnie++ file stat failure
Terri, can you have the qa machines loop through _just_ the bonnie++ command he's having problems with? Something li... Sage Weil
10:57 AM Linux kernel client Bug #549 (Resolved): bonnie++ file stat failure
From ML:... Sage Weil
10:49 AM Bug #531: Journaling Causes System Hang
Hello,
1) Correct, we are running transparent 10GbE
2) From what I can tell monitoring dstat across the cluster ...
Bryan Tong
10:14 AM CephFS Feature #91: mds: up:shadow mode
Update the journaler interface to allow the MDS to 'tail' the journal... periodically check to see if it's been exten... Sage Weil
10:10 AM CephFS Feature #548 (Resolved): mds: shadowreplay one-shot mode
Make sure the current mechanism still works. Clean it up if needed. Sage Weil
09:19 AM CephFS Subtask #547 (Resolved): mds: define fsck strategy, required metadata
Sage Weil
09:19 AM CephFS Feature #340 (Closed): large directories, directory fragmenting
Sage Weil
09:19 AM CephFS Feature #519 (Closed): mds: dirfrag merge
Sage Weil
06:20 AM Revision 9586e905 (ceph): mds: restructure finish_scatter_gather_update()
Separate behavior into two dimensions: whether or not we are updating
the dirfrag, and whether or not the dirfrag is ...
Sage Weil
06:15 AM Revision 669a8afa (ceph): mds: do not bump scatter stat lock in predirty_journal_parents
If we're in the MIX state, we clearly can't touch this without screwing up
the delicate scatter/gather behavior. If ...
Sage Weil
05:48 AM Revision 663b470f (ceph): mds: mark scatterlock stale on import of stale frag scatter stat
When the lock scattered, if we didn't have an auth frag that was frozen,
we go into MIX state. Later, we may import ...
Sage Weil
05:44 AM Revision 63c1ad84 (ceph): mds: match bottom half of assimilate_dirty_rstat_inodes with a dir flag
We only do the assimilate_dirty_rstat_inodes if we do an update AND the
frag rstat was non-stale, but the bottom half...
Sage Weil
05:19 AM Revision 9b6d96e9 (ceph): mds: fix inode version used for inest in decode_lock_state
We need to pass the inode rstat's version into finish_scatter_update, not
the shadowed local variable. Otherwise we ...
Sage Weil
 
