Activity
From 05/10/2010 to 06/08/2010
06/08/2010
- 11:43 PM Revision 15a7a839 (ceph): mds: use helper to send message to client; fix send to null connection
- Sometimes session->connection is NULL; use session->inst in that case.
- 11:33 PM Revision c992d020 (ceph): mds: remove erroneous bracket
- 11:25 PM Revision 2a88e2e5 (ceph): add checks for being a snapshot root to dir_is_nonempty
- 11:11 PM Revision 0d8c2975 (ceph): Merge branch 'unstable' into mon-remove
- 11:03 PM Revision e3c4459e (ceph): monclient: track cur_mon by name, not rank
- 10:59 PM Linux kernel client Bug #189 (Resolved): leaked dentry
- Running unstable, commit:e041c5f
I think this is triggered by bonnie.sh. The bonnie.sh below is the dir where the...
- 10:53 PM Bug #114: osd: corrupted pglog
- Running 'src/script/check_pglog.sh $osd_data_dir' (from ceph.git) periodically will let you check the osd for any cor...
- 10:48 PM Revision 1d162972 (ceph): mon: rename whoami rank, simplify rank change logic
- 10:37 PM Revision 0dc95695 (ceph): mds: fix stale lease trimming xlist iterator abuse
- 10:37 PM Revision 26a4d0ea (ceph): throttle: allow take(0)
- 10:17 PM Revision 2fa4a069 (ceph): mon: identify monitors by name, not rank
- 10:13 PM Revision c196851b (ceph): osd: init auid to CEPH_AUTH_UID_DEFAULT in case authorizer doesn't set it.
- We should probably also require the authorizer to set it for us.
- 10:13 PM Revision 76fb75e9 (ceph): buffer: fix padding distances
- 10:13 PM Revision 7c856462 (ceph): mon: fix memory-leaked messages
- 09:48 PM Bug #56 (Closed): osd: crash on repop completion
- 09:48 PM Bug #56 (Rejected): osd: crash on repop completion
- I haven't seen this in forever. Closing out.
- 08:50 PM Feature #138: Try out tcmalloc
- Going to implement this as a compile-time option (based on available libraries) once I've audited the mds and osd for...
- 08:38 PM Revision 80c42d06 (ceph): mds: scan stray dir, eval strays on mds startup
- 08:31 PM Bug #179 (In Progress): corrupted LogEntry in mon data
- 08:31 PM Bug #179 (Rejected): corrupted LogEntry in mon data
- Hmm, yeah i give up on this one. I see that it's corrupt, but not in any particularly suggestive way. No idea what ...
- 12:49 AM Bug #179: corrupted LogEntry in mon data
- I also had some problems using gdb...
gdb won't work if cmon is not in /usr/bin and debug symbols for cmon (which I ...
- 06:39 PM Revision da520d7b (ceph): mon: clean up monmapmonitor warnings, style
- 06:37 PM Revision 76191d23 (ceph): logclient: clean up interaction with monclient, monitor
- Use monclient where available. Otherwise, we are a monitor, so send to
ourselves.
- 06:27 PM Revision 0ab54c70 (ceph): monclient: make get_monmap_privately() clean up after itself
- - set cur_mon=-1 when we're done
clean up connections
- 06:22 PM Revision 0b44ef02 (ceph): monclient: Make MonClient update cur_mon on getting new monmap
- A MonClient starting with an incorrect monmap (i.e. mon id in the
starting map does not match the actual mon id found...
- 04:42 PM Revision 2e1b0d35 (ceph): mon: make mon lease clock check protocol change backward compatible
- 04:29 PM Bug #186 (Resolved): BUG: failed to decode message of type 66 v1: buffer::exception
- 09:38 AM Bug #186 (In Progress): BUG: failed to decode message of type 66 v1: buffer::exception
- This was due to commit:29a42efe2e4c789092f59b98b29632bdc4b88a80, which made a protocol change. I'll fix it up today ...
- 06:41 AM Bug #186 (Resolved): BUG: failed to decode message of type 66 v1: buffer::exception
- While my cluster was in a degraded state due to a disk failure at random 2 out of my 3 monitors crashed.
I really ...
- 04:25 PM Feature #180 (Resolved): Return ENOTEMPTY when trying to remove a directory which has a snapshot
- As of 2a88e2e54ee0b9449e86cec02315e2809b75ca8b it will return ENOTEMPTY if you try to delete a dir which roots a snap...
- 04:23 PM CephFS Bug #188 (Resolved): cfuse crashes on snapshot file read
- gregf@pudgy:~/ceph/src$ sudo mkdir mnt/a
[sudo] password for gregf:
gregf@pudgy:~/ceph/src$ sudo mkdir mnt/a/b
gr...
- 03:18 PM Bug #181 (Resolved): monitor eats 8G of memory before beeing oom killed
- Found and fixed many more monitor memory leaks in 7c85646240a02a3e82a727045de6e4432cc2ed9e. Valgrind is a lot happier...
- 01:00 AM Bug #181: monitor eats 8G of memory before beeing oom killed
- Ok, here is an update 1 day after posting comment #2:
mon0 is dead, mon1 is also dead, both OOM-killed I guess (no...
- 10:42 AM RADOS Bug #187 (Rejected): crush: high variance, latency for straw buckets
- 09:47 AM Cleanup #103: Introduction of namespaces
- Markus Elfring wrote:
> Can a bit of "C++ namespaces advice":http://stackoverflow.com/questions/713698/c-namespaces-...
- 03:06 AM Cleanup #103: Introduction of namespaces
- Can a bit of "C++ namespaces advice":http://stackoverflow.com/questions/713698/c-namespaces-advice help to clarify yo...
- 07:49 AM Revision bc9fba0e (ceph): Introduced ceph mon remove command
- Added ceph mon remove <ip>:<port> command. The command will remove
the target monitor from the monmap and shutdown th...
- 05:18 AM Revision 58fe4b8d (ceph): qa: add untar_snap_rm.sh
- 05:05 AM Revision ac10d837 (ceph): osd: print rollback osd_op nicely
- 03:55 AM Bug #99: Check return codes everywhere
- The C/C++ programming language makes it easy to "overlook unused return values":http://stackoverflow.com/questions/12...
- 02:31 AM Feature #101: Conversion of pointer parameters into references
- > ... and because it makes the pass-by-reference more explicit.
I find this opinion questionable. I assume that yo...
- 02:10 AM Cleanup #146: Complete build options for Pthread API
- ...
06/07/2010
- 11:03 PM Revision 470a6fde (ceph): mds: wire Connection to Session when Session already exists on connect
- 11:03 PM Revision 6d770abe (ceph): mds: funnel mds->client messages through single Session* helper
- Simplify callers where possible.
- 10:52 PM Feature #6 (Rejected): libceph could use a backward-compatible-to function
- Usually this is handled via the shared object versioning scheme. The bit that doesn't address (I think) is when the ...
- 10:50 PM Bug #98: reserved identifier violation
- The qemu guys were worried about this when we submitted the rbd driver. Simply changing __FOO_H to CEPH_FOO_H throug...
- 10:48 PM Bug #99: Check return codes everywhere
- Markus Elfring wrote:
> Would you like to reuse any class library?
>
> I do not like "assert" for consistent erro...
- 10:42 PM Cleanup #103: Introduction of namespaces
- Markus Elfring wrote:
> This issue corresponds to my previous "feature request":https://sourceforge.net/tracker/?fun...
- 10:42 PM Revision 29a42efe (ceph): mon: simplify clock drift checks
- Ignore lease sent vs lease_ack receive times bc multiple lease msgs may
be in flight and the ack may be from a previo...
- 10:40 PM Feature #101 (Rejected): Conversion of pointer parameters into references
- Markus Elfring wrote:
> This issue corresponds to my previous "feature request":https://sourceforge.net/tracker/?fun...
- 10:38 PM Cleanup #146: Complete build options for Pthread API
- Markus Elfring wrote:
> Would you like to integrate the appended changes into your source code repository?
This g...
- 10:25 PM Bug #149 (Closed): Stale NFS Handle when copying from snapshot
- Still finding bugs with this basic workload, but I haven't seen ESTALE pretty much ever. Closing this one out for no...
- 10:04 PM Revision 527d5fd7 (ceph): monc: behave in ms_handle_reset if cur_mon is < 0
- 10:03 PM Revision 6ff2a876 (ceph): msgr: don't throttle.get 0
- 10:00 PM Revision 191cb2e4 (ceph): throttle: allow put(0)
- Still returns a consistent value for the count.
- 09:59 PM Revision e505fb5a (ceph): msgr: don't thottle.put 0
- 09:47 PM Revision d2740973 (ceph): Merge remote branch 'origin/msgr' into unstable
- 07:05 PM Revision 4ecd8fac (ceph): mds: use cap on head if there is none on the snapped inode
- This is needed, in particular, when we're flushing snap data on an inode
that already got COWed.
- 06:40 PM Revision 5be26609 (ceph): osd: use low-level helper getting obc in sub_op_push
- find_object_context does all sorts of stuff we don't need here: we know
which object the context is for. Just set it...
- 03:09 PM Feature #185: mds: set file layout policy on directory hierarchy
- If we use an xattr for this, one issue will be keeping the ancestor directory policy consistent across any mds node ma...
- 03:07 PM Feature #185: mds: set file layout policy on directory hierarchy
- Alex Nelson wrote:
> One complication that may arise: If directory xattrs are what are used, what would be the sema...
- 03:04 PM Feature #185: mds: set file layout policy on directory hierarchy
- One complication that may arise: If directory xattrs are what are used, what would be the semantics of a file hard-l...
- 03:03 PM Feature #185 (Resolved): mds: set file layout policy on directory hierarchy
- It would be helpful to have a way to specify multiple files' object sizes. Currently there is an ioctl for controllin...
- 02:37 PM CephFS Bug #177: unlinked inode during try_to_expire()
- The problem is that the dirty_inodes list assumes the inodes are either linked or base inodes. That should be the ca...
- 02:32 PM Bug #179 (In Progress): corrupted LogEntry in mon data
- Okay, I can't make heads or tails of your core file on my system for some reason. Can you try this on your machine? ...
- 12:55 PM Revision 61555cce (ceph): throtle: add asserts on max and change parameters where appropriate
- 12:54 PM Revision 8413ed49 (ceph): throttle: fix assert count to actually use count
- 12:04 PM Linux kernel client Bug #182 (Resolved): VFS: Busy inodes after unmount of ceph.
- This was actually an mds bug. It wasn't responding to a client_caps flushsnap. Fixed in ceph.git commit:4ecd8facd91...
- 09:43 AM Linux kernel client Bug #182 (Resolved): VFS: Busy inodes after unmount of ceph.
- Am Sun, 06 Jun 2010 21:10:28 -0700 schrieb Sage Weil:
> On Sat, 5 Jun 2010, Thomas Mueller wrote:
>> hi
>>
>> ...
- 11:35 AM Feature #184 (Resolved): librados support for truncate, writefull
- We need to add librados support for truncate and writefull. We should make sure that the S3 gateway handles object re...
- 11:11 AM CephFS Bug #52 (Resolved): mds: dentry versionlock
- 11:11 AM Feature #112 (Resolved): osd: snap rollback object op
- 11:10 AM Bug #176 (Resolved): osd: make_clone needs to duplicate xattrs
- 11:09 AM Feature #183 (Resolved): qa: xfstests workunit
- simple script that will run a subset of xfstests in the current directory.
- 09:49 AM Bug #181 (In Progress): monitor eats 8G of memory before beeing oom killed
- Guess I'll look at this a bit more.
- 01:22 AM Bug #181: monitor eats 8G of memory before beeing oom killed
- Hi, thanks for the fixes.
I just finished testing the new version, and my monitor survived (eating 6.8G memory, mo...
- 05:15 AM Revision 21a97d1e (ceph): mon: don't leak MAuth
- 05:15 AM Revision d57b6296 (ceph): crypto: don't leak memory in CryptoAES::encrypt()
- 05:15 AM Revision 520a2c37 (ceph): crypto: don't clean up EVP table on every decrypt()
- Don't think that's appropriate? And certainly doesn't happen for the
encrypt() case.
- 01:47 AM Revision ba63a7a4 (ceph): Removed all copies of the whoami value
06/06/2010
- 10:24 PM Bug #181 (Resolved): monitor eats 8G of memory before beeing oom killed
- fixed two leaks, commit:21a97d1e7ce329fac07b5e69362d27bb7edb31f5 and commit:d57b629699158abacdcc3880d43111291a6fdf77 ...
- 05:08 AM Bug #181 (Resolved): monitor eats 8G of memory before beeing oom killed
- Hi, I installed the latest ceph 0c38b3d63dd24fb8b86283de5e00f260a03d4024, and the latest qemu-rbd e6d8dbce416bfdba880...
- 02:08 PM Feature #138 (In Progress): Try out tcmalloc
- Okay, this is definitely something we need to look into more. I tried running with tcmalloc and the standard malloc, ...
- 06:23 AM Cleanup #146: Complete build options for Pthread API
- Would you like to integrate the appended changes into your source code repository?
06/05/2010
- 12:01 AM Revision 989c9ee1 (ceph): throttle: use signed counters and assert that count never drops below 0
06/04/2010
- 11:42 PM Revision 46040a5f (ceph): msgr: switch to get/set functions for Message:throttler
- 11:42 PM Revision 246415b3 (ceph): osd: fix compile issues
- 11:42 PM Revision 3b333f7a (ceph): msgr: put throttler usage on Message destruct
- 11:42 PM Revision 800da082 (ceph): msgr: Fix uses of get_[data, payload, middle] to use throttling-aware f...
- 11:42 PM Revision 0d4bdfac (ceph): osd: add osd_client_message_size_cap option to config; default 500MB
- And change the name in cosd to be that
- 11:32 PM Revision a76d8fc6 (ceph): objectcacher: cleanup formatting
- 11:32 PM Revision dff7cb33 (ceph): objectcacher: fix stat accounting when resizing bufferheads
- Must keep stats in mind when adjusting bufferheads!
- 11:32 PM Revision 0c38b3d6 (ceph): objectcacher: add verify_stats() debugging helper
- 11:32 PM Revision 12a5d7b2 (ceph): objectcacher: match states before merging in map_write
- The caller is going to set us to dirty, so we don't care what state we
have, so long as the left and right bits we're...
- 11:32 PM Revision 462552ab (ceph): objectcacher: fix use of invalid iterator in map_write()
- The p points to bh, which is removed by merge_left. Move it back to final,
so we can advance to the new next a few l...
- 11:23 PM Revision 522c12e5 (ceph): osd: fix rollback when head points at the rolled back snapshot
- 08:23 PM Revision 33b947cd (ceph): msg: remove copy_payload and copy_data functions; change set to use thr...
- 08:10 PM Revision 8d1e7739 (ceph): Merge branch 'rbd' into unstable
- 08:09 PM Revision 7b6aea6a (ceph): osd: clean up rollback debug output
- 08:01 PM Revision 1b5920f8 (ceph): uclient: handle inode with no caps from mds
- This happens when you readdir and some inodes are in a different snaprealm.
- 07:57 PM Revision e79a3fae (ceph): osd: filter_xattrs on a rollback op
- 07:55 PM Revision 48555f52 (ceph): osd: fix naughty iterator usage after invalidating it
- 07:49 PM Revision c730b85c (ceph): osd: add filter_xattrs function to remove non-user xattrs from a map of...
- 07:49 PM Revision a70a3668 (ceph): osd: _make_clone now properly duplicates xattrs
- 07:04 PM Revision f60be8e3 (ceph): progress
- 06:07 PM Revision 84b279a4 (ceph): mds: fix straydn->first part deux
- 9ed0c30ecf6611193db52e1facc1f46b37f04bc4 forgot to remove the old code.
- 04:45 PM Bug #173: Throttle client requests on OSD
- Pushed to msgr, commit 800da082ad8aad032ff5299b5ad0c05bc378a1e3.
This definitely, definitely does not set a hard m...
- 04:31 PM Bug #178 (Resolved): cfuse fails dbench
- fixed this, and some related ObjectCacher bugs. dbench exercised a lot of code that simple tests in the past had not...
- 09:50 AM Bug #178 (Resolved): cfuse fails dbench
- From Thomas Mueller <thomas@chaschperli.ch>:
cfuse fails dbench with ceph.git testing. See logs:
Debian testing...
- 02:04 PM Bug #179: corrupted LogEntry in mon data
- Doh.. so it looks like the piece of info I need was in the logm directory. If you still have it, great. If not, I c...
- 11:13 AM Bug #179 (Closed): corrupted LogEntry in mon data
- this is after a restart due to the update of all ceph daemons to c4e6482d302aa288031ced6cd845d60ba655e5c8
#0 0x...
- 12:58 PM Bug #176: osd: make_clone needs to duplicate xattrs
- Pushed to unstable; and updated _rollback_to in the rbd branch to filter properly.
- 11:50 AM CephFS Bug #165 (Resolved): cmds crash
- 11:39 AM CephFS Bug #165: cmds crash
- Indeed, can't reproduce the crash with the latest unstable.
I did 3-4 restart of all mds and it worked fine, that'...
- 11:43 AM Feature #180 (Resolved): Return ENOTEMPTY when trying to remove a directory which has a snapshot
- When the following command sequence is used, valuable data in a snapshot could be lost:...
- 10:37 AM CephFS Bug #172: OSD and MDS crash on rm -r
- Sage Weil wrote:
> Wido den Hollander wrote:
> > Today i ran the same test again, almost the same result.
> >
> ...
- 09:53 AM CephFS Bug #172: OSD and MDS crash on rm -r
- Wido den Hollander wrote:
> Today i ran the same test again, almost the same result.
>
> Before i ran the test i ...
- 06:22 AM CephFS Bug #172: OSD and MDS crash on rm -r
- Today i ran the same test again, almost the same result.
Before i ran the test i created a fresh fs with mkcephfs....
- 03:25 AM CephFS Bug #177 (Resolved): unlinked inode during try_to_expire()
- After trying to recover from bug #172 my MDS started to crash on their recovery.
Both mds0 and mds1 crashed while ...
- 01:22 AM Revision 97f00aec (ceph): debugging output
- 01:22 AM Revision d3863272 (ceph): rados: print out pool instead of object
- 12:33 AM Revision 9ead80f8 (ceph): mds: fix CDir::take_sub_waiting vs dnwaiter pin
- Signed-off-by: Sage Weil <sage@newdream.net>
- 12:33 AM Revision 074a9b10 (ceph): mds: make discover work for multiversion inodes (e.g. dirs)
- If we don't have the specific snap, look up the head and see if it's
multiversion.
This doesn't give us a "range" lo...
- 12:33 AM Revision ec0aa43a (ceph): mds: don't export stray (~mdsfoo/stray), and ignore in balancer
- We _must_ keep mdsdir and stray on local mds for normal operations.
Signed-off-by: Sage Weil <sage@newdream.net>
- 12:33 AM Revision 9ed0c30e (ceph): mds: set straydn first to match inode on unlink
- 12:33 AM Revision c4e6482d (ceph): mds: only purge dentries with no extra refs (besides dirty)
- Signed-off-by: Sage Weil <sage@newdream.net>
- 12:33 AM Revision 551a12f5 (ceph): mds: fix cap clone logic to look at matching first, not last
- The cap->client_follows is set to follows+1 by flushsnap, since the real
follows value isn't convenient. But it is e...
- 12:33 AM Revision 791ca282 (ceph): mds: kill open_foreign_stray; but open remote mdsdirs instead
- Signed-off-by: Sage Weil <sage@newdream.net>
06/03/2010
- 11:45 PM Revision ff0e8715 (ceph): libatomic: fix assert.h compilation
- 11:40 PM Revision 900d4c6c (ceph): msgr: add Throttle pointer to Policy
- 11:20 PM Revision 1facfe0f (ceph): Merge branch 'unstable' into msgr
- 09:14 PM Revision 62b900f5 (ceph): mds: open past snap parents at end of rejoin phase
- We really need past parents open before we go active or else anything
that needs to build a snap context will fail.
- 09:12 PM Revision 3989ae40 (ceph): osd: make sure we don't return EAGAIN to client
- 08:48 PM Revision 26449e7c (ceph): mdsmap: show individual mds states in summary
- 08:26 PM Revision 09185a00 (ceph): osd: improve snap_trimmer debug output
- 08:24 PM Revision 2b33d99b (ceph): mds: another cap_exports message/mdcache encoding fix
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:08 PM Revision 55da048f (ceph): mds: only adjust dn->first on lock msg if !multiversion
- The multiversion dn->first references a range of inode versions; don't
drag it forward. Fixes 38cb2403c043e6676b5631...
- 07:03 PM Revision 5f905961 (ceph): mds: more fix cap_exports typing
- 06:59 PM Revision 054669ab (ceph): mds: fix scatter_nudge infinite loop
- 06:08 PM Revision 40b23227 (ceph): mds: fix ESessions type
- 06:04 PM Revision 5cd7919a (ceph): mds: drag in->first forward with straydn in handle_dentry_unlink
- 05:38 PM Revision 394d9c3d (ceph): mds: fix anchorclient dup lookups, again
- 05:17 PM Revision 980f234f (ceph): mds: only log successful requests as completed
- 05:09 PM Revision fa1e5603 (ceph): mds: anchor dir on mksnap
- 04:45 PM Revision c09d610c (ceph): mkcephfs: error when creating journal file in a directory that differs...
- mkcephfs creates osd data directory automatically, but it doesn't create a
directory for the osd journal file.
When ...
- 04:40 PM Revision 5dd4a2d6 (ceph): mds: fix mismatched cap_exports type between msg and MDCache
- The types need to match because they are encoded/decoded interchangeably.
See MMDSCacheRejoin::decode() and MDCache::...
- 04:33 PM Revision 609e6572 (ceph): mds: fix trim_unlinked iterator badness
- We may remove the next inode in the map. Queue up unlinked roots first,
which we know remove_inode_recursive() won't...
- 04:28 PM Revision 915ab3ca (ceph): mds: define MDS_REF_SET in unstable
- 04:27 PM Revision ef095e1f (ceph): mds: clear dirtyscattered in remove_inode()
- 04:17 PM Revision 26822162 (ceph): mds: allow dup lookups in anchorclient
- It's not practical for callers to avoid dups, particularly since they may
be unaware of each other. And it's trivial...
- 04:13 PM Bug #176 (Resolved): osd: make_clone needs to duplicate xattrs
- 04:01 PM Revision 8a2a9bd6 (ceph): assert: fix assert vs atomic_ops.h breakage
- This was causing us to use the system assert, not the ceph one.
- 03:23 PM Feature #175 (Resolved): Make the system large-object safe
- This will require extensive work throughout the system, especially in OSD recovery code. Right now, Ceph assumes that...
- 03:20 PM rgw Feature #174 (Resolved): Support large files better
- Right now, the rados gateway just dumps a given file into the RADOS store as a single large dump. If somebody's stori...
- 03:19 PM Revision f5ccc662 (ceph): mds: ensure past snap parents get opened before doing file recovery
- Otherwise we can fail to get_snaps() when we start the recovery:
#0 0x00007fa037625f55 in *__GI_raise (sig=<value o...
- 03:17 PM Bug #173 (In Progress): Throttle client requests on OSD
- Yep, working on it now.
- 02:44 PM Bug #173 (Resolved): Throttle client requests on OSD
- See Jim Schutt's issue on the mailing list and simple patch to illustrate the problem.
Namely, overzealous clients c...
- 03:16 PM Bug #149: Stale NFS Handle when copying from snapshot
- Putting this back in the mix since Sage has been handling a lot of bugs from this test case today.
- 03:04 PM Revision c0e9d210 (ceph): mds: relax lock state before encoding export (and lock state)
- We can't fuss with lock state in the finish method because we already
encoded the old state to the new auth, and we a...
- 01:54 PM Linux kernel client Bug #111 (Resolved): handle EAGAIN from osd
- Looks to me like this can't actually happen. The function ReplicatedPG::find_object_context can return EAGAIN, and an...
- 01:39 PM CephFS Bug #170 (Rejected): null pointer dereference in journal_cow_dentry causes assertion failure
- 10:33 AM CephFS Bug #170: null pointer dereference in journal_cow_dentry causes assertion failure
- Unfortunately I don't -- on Yehuda's suggestion I recompiled with optimization off and have been trying to reproduce ...
- 01:39 PM CephFS Bug #171 (Resolved): mds: MDSTableClient::_logged_ack(version_t) FAILED assert(pending_commit.cou...
- 07:42 AM CephFS Bug #171: mds: MDSTableClient::_logged_ack(version_t) FAILED assert(pending_commit.count(tid))
- fixed by commit:3768ef941e67d17ecd710994b2c88960ba60627d
- 06:07 AM Revision 3768ef94 (ceph): mds: do not bother tableserver until it is active
- We resend these requests when the TS does go active, and if we send dups
things get all screwed up (see partial log b...
- 05:14 AM Revision 7c0df054 (ceph): mds: do not reset filelock state when checking max_size during recovery
- This was broken by d5574993 (probably, that commit fixed a similar
problem). The rejoin_ack initializes replica stat...
- 05:11 AM CephFS Bug #172 (Closed): OSD and MDS crash on rm -r
- I'm still using my test script which unpacks the kernel source and then removes it again with a few steps in between....
- 04:33 AM Revision 15c6651f (ceph): mds: lock->sync replica state is lock, not sync
- It's not readable yet. And after the lock->sync gather completes we send
out a SYNC.
Fixes failed assertion like:
...
- 02:37 AM Revision 1c930f9b (ceph): msg: add missing msg_types.cc
06/02/2010
- 11:09 PM Linux kernel client Bug #111: handle EAGAIN from osd
- I agree. Though we should differentiate between two cases. One is that we initiate the EAGAIN (e.g., when reached a l...
- 11:00 PM Linux kernel client Bug #111: handle EAGAIN from osd
- Yehuda Sadeh wrote:
> We should make the client handle it, but we should also try to make sure that the osd doesn't ...
- 10:51 PM Linux kernel client Bug #111: handle EAGAIN from osd
- We should make the client handle it, but we should also try to make sure that the osd doesn't ever return it (at leas...
- 10:56 PM CephFS Bug #171 (Resolved): mds: MDSTableClient::_logged_ack(version_t) FAILED assert(pending_commit.cou...
- ...
- 10:40 PM Linux kernel client Bug #38 (Resolved): rm -r failure
- I'm going to chalk this one up to commit:13a4214cd9ec14d7b77e98bd3ee51f60f868a6e5 (the d_subdirs ordering problem) an...
- 10:37 PM Linux kernel client Bug #69: ceph: ffff88001976ba50 auth cap (null) not mds0 ???
- For a multi-mds system, this can be caused if we are between an export and import on a cap.
But when I saw this th...
- 10:24 PM CephFS Bug #165: cmds crash
- This looks an awful lot like it might be fixed by commit:15c6651ff57b88722b5c896f5698bf1d033e1f98. And possibly prev...
- 01:31 AM CephFS Bug #165: cmds crash
- just got the same crash of mds2 using b441fbdc9fdca271ed3bd100fc3c98c800b509b1
please find the full logs of each m...
- 09:36 PM CephFS Bug #170: null pointer dereference in journal_cow_dentry causes assertion failure
- this is actually a failed assertion, not a null deref. it looks like gdb is having trouble resolving the symbols pro...
- 03:04 PM CephFS Bug #170 (Rejected): null pointer dereference in journal_cow_dentry causes assertion failure
- I've seen this a few times today.
Using the latest unstable servers(08afc8df680dc0cd5ad26f3f89152aa25a72b639), and m...
- 07:40 PM Revision 5262a96a (ceph): mds: add export_dir command
- 07:40 PM Revision 4075b95c (ceph): mds: add MDCache::cache_traverse()
- 06:50 PM Revision a3323c98 (ceph): tcp: parse ipv4 and ipv6 addresses
- 06:50 PM Revision 0d1e5dbf (ceph): move addr parse() into entity_addr_t
- 06:50 PM Revision eac36cb5 (ceph): initscript: unmount btrfs if we mounted it
- 06:34 PM Revision 08afc8df (ceph): mon: fix unsynchronized clock logic;
- change output for clarity
- 11:19 AM Feature #169 (Resolved): osd: start up despite corrupted pg log(s)
- Catch decoding, memory alloc exceptions, and skip corrupt pgs so the osd can still start up. Log the errors.
- 10:02 AM Bug #149 (In Progress): Stale NFS Handle when copying from snapshot
06/01/2010
- 11:34 PM Revision b441fbdc (ceph): mds: lookup exact snap dn on import
- 11:33 PM Revision 38cb2403 (ceph): mds: update dn->first too when lock state adjusts inode->first
- This keeps dn->first in sync with inode->first
- 10:23 PM Revision 9248cd9e (ceph): mds: don't change lock states on replicated inode
- The reconnect will infer some client caps, which will affect what lock
states we want. If we're not replicated, fine...
- 10:02 PM Revision afadb122 (ceph): mds: fix root null deref in recalc_auth_bits
- Root may be null if we don't have any subtrees besides ~mds$id.
- 09:14 PM Revision 364f3cb0 (ceph): mds: adjust subtree map when unlinking dirs
- Otherwise we get subtree bounds in the stray dir and get confused down
the line.
- 08:18 PM CephFS Bug #30: multimds: slave_request on getattr
- I think the problem is that we authpin anything we rdlock... is that really necessary?
- 07:57 PM Revision c4bbb000 (ceph): mds: discover snapped paths on retried ops
- This is intended to mitigate a livelock issue with traversing to snapped
metadata. The client specifies all snap req...
- 06:39 PM Revision 464e46c8 (ceph): mon: add wiggle room for clock synchronization check
- 05:30 PM Revision 7f8a743c (ceph): mds: add case for CEPH_LOCK_DVERSION to LockType
- 03:23 PM CephFS Bug #165: cmds crash
- I pushed a fix to unstable that _might_ fix the root cause of this, but it's hard to say. Can you leave 'debug mds =...
- 12:58 PM CephFS Bug #165: cmds crash
- ar Fred wrote:
> A bit later, I restarted the whole cluster, mds0 and mds2 crashed with the same stack trace, mds1 w...
- 03:08 PM Linux kernel client Cleanup #168 (Closed): new truncate sequence
- The new truncate sequence was merged for 2.6.35-rc1. (->truncate is deprecated?)
We need to see what updates (i...
- 03:02 PM CephFS Bug #167 (Resolved): mds crash
- fixed by commit:afadb1224516fc3a615d0cc51fe7560fcc0b5e7c
- 01:21 PM CephFS Bug #167 (Resolved): mds crash
- Core was generated by `/usr/bin/cmds -i r1-11 -c /tmp/fetched.ceph.conf.5518'.
Program terminated with signal 11, Se...
- 12:48 PM Linux kernel client Bug #166 (Can't reproduce): Failing some pjd tests?
- Best guess is an unsynchronized client/server clock.
- 11:55 AM Linux kernel client Feature #42: Resize of rbd image
- There is a refresh /sys/class/.. interface, however, resizing of an image should be lock protected, and probably shou...
- 10:28 AM Linux kernel client Bug #164 (Resolved): memory leak in statfs
- Fixed.
commit: 5d97634a3b824ed746ba0d5441bf3d1d65f490a0
05/31/2010
- 06:22 AM CephFS Bug #165: cmds crash
- A bit later, I restarted the whole cluster, mds0 and mds2 crashed with the same stack trace, mds1 was fine.
- 03:04 AM CephFS Bug #165 (Resolved): cmds crash
- one of my 3 mds crashed quickly after startup of the whole cluster:
this is using latest unstable (00c3dafd5afe6461f...
- 03:41 AM Linux kernel client Bug #166 (Can't reproduce): Failing some pjd tests?
- Failed Test Stat Wstat Total Fail Failed List of Failed
----------------------------------------...
05/30/2010
05/29/2010
- 10:07 PM Linux kernel client Bug #144: GPF at con_close_socket+0x40/0x9f
- Yeah, i think this is related to #163, but i still don't know how that would cause this problem. The basic issue is ...
- 09:58 PM Linux kernel client Bug #163 (Resolved): put_osd on umount can use client after free
- fixed by commit:a922d38fd10d55d5033f10df15baf966e8f5b18c
- 04:40 PM Linux kernel client Bug #163: put_osd on umount can use client after free
- That would explain bug #144:
[12836.065773] Last user: [<ffffffffa01106b9>](put_osd+0x3f/0x82 [ceph])
- 09:25 AM Linux kernel client Bug #163 (Resolved): put_osd on umount can use client after free
- the connection can be put after ceph_client is freed, at which point this will dereference a bad pointer...
- 09:57 PM Linux kernel client Bug #164 (Resolved): memory leak in statfs
- workload dbench
master branch...
- 06:06 PM Revision 79b39625 (ceph): ObjectCacher: do not try to deref an invalidated xlist::iterator
- Fixes #159
- 11:09 AM Bug #159 (Resolved): cfuse abort on file delete (0.20.2)
- All right, fixed in 0d437a205b4c239cb85f08ad6976868d84bf9ab4.
The ObjectCacher wasn't properly cleaning up objects i...
05/28/2010
- 08:21 PM Revision 83094d97 (ceph): paxos: fix store_state fix
- 07:59 PM Revision 62e290e8 (ceph): msgr: print bind errors to stderr
- 07:56 PM Revision 6060bdd8 (ceph): rbd: some fixes to conform with qemy code style
- 07:50 PM Revision 3a705ded (ceph): paxos: cleanup
- 07:48 PM Revision 3c3e82e0 (ceph): paxos: only store committed values in store_state
- The uncommitted value is handled specially by handle_last()
- 07:41 PM Revision 187011cd (ceph): initscript: fix typo with $lockfile stuff
- 07:37 PM Revision 6b72d70b (ceph): paxos: set last_committed in share_state()
- It wasn't getting set for LAST message, which broke recovery somewhat.
Broken by 8e76c5a1d827e01f77149245679bd00ba27...
- 01:44 PM Linux kernel client Bug #162 (Can't reproduce): list bug during shrink_dcache_for_umount
- ceph3, rsync workload.
unstable circa 5/25...
- 01:12 PM Linux kernel client Bug #141 (Resolved): ERESTARTSYS on mds update operations cause bad results
- 10:49 AM Linux kernel client Bug #141: ERESTARTSYS on mds update operations cause bad results
- I assume that switching to wait_for_completion_killable() fixed this one?
related commit: 0ec773c7f9ecbff4b75c3c68...
- 12:58 PM Bug #158 (Resolved): cmon silently fails if addr is wrong in ceph.conf (0.20.1)
- fixed by commit:62e290e87fa2ce5b33a847e0837b2198bac6842b
- 08:42 AM Bug #158 (Resolved): cmon silently fails if addr is wrong in ceph.conf (0.20.1)
- 12:47 PM Bug #161: Monitor crashes on begin
- actually, commit:3c3e82e0f5feacef5f191a5ce34bf96c15fdaed5
- 12:37 PM Bug #161 (Resolved): Monitor crashes on begin
- fixed by commit:6b72d70be42823e32bb8bcec033ac3a62943e089
- 11:39 AM Bug #161 (Resolved): Monitor crashes on begin
- On an assert:
assert(begin->last_committed == last_committed);
(gdb) bt
#0 0x00007f39eacfdf45 in *__GI_raise (sig...
- 12:07 PM Linux kernel client Bug #148 (Resolved): iozone failure
- yeah, this has survived 24 hours, whereas before it was failing after an hour or two.
- 12:00 PM Linux kernel client Bug #144: GPF at con_close_socket+0x40/0x9f
- What was the specific scenario? Can it be reproduced?
- 11:40 AM Bug #159 (In Progress): cfuse abort on file delete (0.20.2)
- 08:43 AM Bug #159 (Resolved): cfuse abort on file delete (0.20.2)
- 11:17 AM Linux kernel client Bug #150: order:1 page allocation failure
- Too many dirty pages? Too many pending osd requests?
We should probably try to get how many osds requests were in-fl...
- 11:07 AM Linux kernel client Bug #147: lockdep: possible irq lock inversion dependency w/ osdc->request_mutex and con->mutex
- nfs uses the rpc code, which, if I understand it correctly initializes a work queue for socket allocation and connect...
- 10:36 AM Feature #160 (Resolved): rbd revert-to-snapshot
- Need to fully implement revert-to-snapshot functionality. Currently there's a partial implementation in the rbd-class...
- 12:12 AM Revision 8c448257 (ceph): osd: fix compilation
05/27/2010
- 11:32 PM Revision 4b797745 (ceph): mds: fix null dn deref during anchor_prepare
- 11:14 PM Revision bb8b1398 (ceph): mds: fix invalid use of connection
- 10:25 PM Revision 93804416 (ceph): mds: switch some session->inst send_message calls to session->connection;
- switch an MDS broadcast from instance-based to Connection *-based send.
- 10:02 PM Support #156: Example debug levels in sample.ceph.conf
- Looks quite useful, thank you. There are also some logging directories available, e.g. logger sym. May those be add...
- 09:49 PM Support #156 (Resolved): Example debug levels in sample.ceph.conf
- 09:59 PM Revision 330e1e21 (ceph): osd: warn, don't crash, on purged_snaps shrinkage
- 09:59 PM Revision a1a13502 (ceph): mkcephfs: pass -c to cmon --mkfs
- 09:59 PM Revision 0a1d526b (ceph): osdmap: assert maxrep >= minrep
- 09:59 PM Revision 594d4568 (ceph): osdmaptool: include raw, up, acting mappings
- 09:59 PM Revision 892a0e25 (ceph): config: parse in $host from conf file
- So you can do stuff like
log dir = /data/$host
- 09:58 PM Revision d2c40055 (ceph): initscript: incorporate Josef's fedora fixes
- Add 'status' command.
Add chkconfig line.
Do lockfile stuff only if /var/run/subsys exists.
Still specifying the run...
- 09:31 PM Revision b83b0733 (ceph): rados: add op for rollback
- 09:31 PM Revision e935b8ec (ceph): osd: add rollback to ceph_osd_op_name
- 09:31 PM Revision 23336561 (ceph): osd: create _delete_head function, move CEPH_OSD_OP_DELETE handling to it.
- 09:31 PM Revision be1030d8 (ceph): rados: add snap.snapid to ceph_osd_op, to replace use of MOSDOp's snapid
- 09:31 PM Revision b82ba820 (ceph): osd: implement rollback functionality
- 09:31 PM Revision 91fb924a (ceph): objecter: add rollback_object function, which rolls back a single objec...
- 09:31 PM Revision 0292f2e6 (ceph): librados: add rollback_object functions.
- 09:31 PM Revision 9dd35584 (ceph): rados: add rollback functionality to rados
- 09:31 PM Revision bd9cf968 (ceph): osd: set clone_overlaps properly on rollback
- 09:31 PM Revision edffc122 (ceph): librados: update C header file to proper name for rollback function
- 09:31 PM Revision 7cc3ab62 (ceph): rados.h: should use __le64 instead of __u64
- 09:26 PM Linux kernel client Bug #148: iozone failure
- I think this may have been caused by the mds request signal handling? It isn't happening on the latest unstable.
- 09:24 PM Revision 08f69663 (ceph): ceph.spec: build-required libatomic_ops-devel, not libatomic_ops
- And no perl-devel.
- 09:23 PM Linux kernel client Bug #147: lockdep: possible irq lock inversion dependency w/ osdc->request_mutex and con->mutex
- We could have a pool of preallocated sockets.. but that could be exhausted.
Or duplicate a bunch of socket creation ...
- 06:23 PM Revision f95e1e0a (ceph): mds: add Connection * to Session
- 06:22 PM Revision 53523267 (ceph): Merge branch 'unstable' into msgr
- 04:34 PM CephFS Feature #45: Investigate adding Connection * to mds Session
- Added Connection *; now testing my send_message conversions to make sure I'm not trying to use any Connection *s whil...
- 02:57 PM Linux kernel client Bug #157 (Resolved): fix auth_x memory leak
- fixed by 'ceph: fix leak of osd authorizer'. the osd_client put_osd() didn't clean up the ceph_authorizer.
- 01:14 PM Linux kernel client Bug #157 (Resolved): fix auth_x memory leak
- this is on ceph1, qa loopall.sh workload, unstable branch....
- 04:47 AM Revision a3dc4bda (ceph): sample.ceph.conf: include debug options, commented out
05/26/2010
- 11:58 PM Revision 78375cfd (ceph): mon: add crush_rule data member to MPoolOp; use it in new pool creation...
- 11:58 PM Revision a9e17271 (ceph): objecter: add optional crush_rule parameter; set in pool_op_submit as n...
- 11:58 PM Revision 8044f7ac (ceph): librados: add crush_rule parameter to create_pool functions
- 11:58 PM Revision 05256bb0 (ceph): rados: you can now set the crush rule to use when creating a pool
- 09:54 PM Support #156 (In Progress): Example debug levels in sample.ceph.conf
- We should still add a wiki page with debugging information. I can include info about debug options, and also other s...
- 09:53 PM Support #156 (Resolved): Example debug levels in sample.ceph.conf
- good idea. commit:a3dc4bdac2057c2d0fcd27cab9c416c5089b4c76
- 05:20 PM Support #156 (Resolved): Example debug levels in sample.ceph.conf
- The debug options for ms, osd, etc. could afford to be listed in the sample.ceph.conf file, even commented out. Ther...
- 09:47 PM Revision a92df208 (ceph): mds: include LAZYIO in CEPH_CAP_ANY set
- 09:47 PM Revision a13b5b1c (ceph): mds: include LAYZIO cap in sync->mix and mix->sync transitions
- 09:47 PM Revision 297d3ecd (ceph): client: update ioctl.h (lazyio, invalidate_range)
- 09:47 PM Revision 648ce976 (ceph): mds: LAYZIO is not liked, but it is allowed
- 09:35 PM Revision 9b4d25b9 (ceph): mon: detect and warn on clock synchronization problems;
- change MMonPaxos::lease_expire to lease_timestamp
- 09:35 PM Revision 75de2723 (ceph): mon: warn to log, not just dout, on clock drift
- 09:11 PM Revision bee74a1e (ceph): ceph: add conversion to qemu coding style
- Hi Yehuda,
I've added a small hack to make push_to_qemu.pl convert tabs to spaces.
Christian
- 05:59 PM Revision a1c99811 (ceph): paxos: use helper to store committed state; fix master mon catch up usi...
- The catch up logic in handle_last didn't handle the stashed state, so we
crashed and burned if it was the master that...
- 05:01 PM Revision c0df916a (ceph): cfuse: bail out on mount() errors
- 04:58 PM Feature #135 (Resolved): Specify crush rules
- Added crush_rule parameters/data members as appropriate to OSDMonitor pool creation functions, objecter, librados, an...
- 02:47 PM Feature #33: O_LAZY or equivalent
- 02:32 PM Feature #105 (Resolved): mon: warn on clock drift
- It warns to dout and the logger:
1) when the slave notices the leader is behind by >(mon_lease - latency), or
2) wh...
- 11:33 AM CephFS Bug #52: mds: dentry versionlock
- merged into unstable
- 11:32 AM Bug #37 (Rejected): osd: recover missing clone object
- this could have been related to the osd recovery fixes (wrt snapdir). haven't seen this in weeks. dropping it for now.
- 11:03 AM Feature #112: osd: snap rollback object op
- 11:01 AM Bug #151 (Resolved): cmon crash in PGMonitor::update_from_paxos at mon/PGMonitor.cc:90
- fixed by commit:a1c99811bae2199a4ef3eef8681ac70ccfa128f5
- 06:39 AM Bug #151 (Resolved): cmon crash in PGMonitor::update_from_paxos at mon/PGMonitor.cc:90
- one of my 3 monitors crashed today, the whole ceph cluster was idle at that time.
cmon compiled at f7708dea1f, ple...
- 10:44 AM Documentation #155 (Resolved): document ceph auth
- 10:14 AM Feature #154 (Closed): support IPv6 addresses
- most of the infrastructure is there...
- 10:02 AM Bug #152 (Resolved): cfuse problem
- fixed by commit:c0df916a790f9560d487c74c22152a7e16e6f226
- 06:50 AM Bug #152 (Resolved): cfuse problem
- Hi,
compiled at 7ecf493fd2c
it seems cfuse starts fuse when it fails connecting to a monitor (which is dead in ...
- 10:00 AM Bug #145: Check build dependencies for FastCGI
- Will any adjustments be needed to check for required header files also in a subdirectory like "fastcgi"?
Does a conf...
- 09:54 AM CephFS Bug #153 (Resolved): mds: fix snap dentry replication vs readdir on frag auth
- The request may be something like #123//foo/some/path/to/dir, dir lives in the stray dir, and is auth on another node...
05/25/2010
- 11:40 PM Revision 32d34f06 (ceph): Merge branch 'lazyio' into unstable
- Conflicts:
src/mds/locks.c
- 09:44 PM Revision e6b9055f (ceph): interval_set: fix union_of, intersection_of size accounting
- 08:47 PM Revision 2b9ef644 (ceph): init-ceph: use = not == for comparison operator
- 08:13 PM Revision fc228b5b (ceph): Merge branch 'mds_dentries' into unstable
- 08:01 PM Revision 701d2672 (ceph): mds: better debugging on rmdir
- 08:01 PM Revision 29ca21f5 (ceph): mds: fix scatterlock gather, writebehind
- We stopped overloading the virtual is_updated() when we renamed to
is_dirty.
broken by 7f19ee1ac36095cd4d4c169858d93...
- 03:42 PM Linux kernel client Bug #143 (Resolved): avoid resending requests on mon ticket renewal
- fixed by 'ceph: do not resend mon requests on auth ticket renewal' and 'ceph: renew auth tickets before they expire'
- 02:37 PM Linux kernel client Bug #147: lockdep: possible irq lock inversion dependency w/ osdc->request_mutex and con->mutex
- What it actually means is that sock_alloc_inode is being called under the kswapd context and it does an allocation wi...
- 10:34 AM Linux kernel client Bug #147 (Resolved): lockdep: possible irq lock inversion dependency w/ osdc->request_mutex and c...
- ...
- 02:15 PM Cleanup #146: Complete build options for Pthread API
- If there is an environment where the -lpthread isn't sufficient, sure. Send a patch! :)
- 06:15 AM Cleanup #146 (Rejected): Complete build options for Pthread API
- Would you like to combine "your check for this programming interface":http://ceph.newdream.net/git/?p=ceph.git;a=blob...
- 02:07 PM Bug #145: Check build dependencies for FastCGI
- Something nonstandard with Suse then? On debian it's
fatty:src 02:05 PM $ dpkg -S /usr/include/fcgiapp.h
libfcgi...
- 01:41 AM Bug #145: Check build dependencies for FastCGI
- I wonder why a header is not found because the file "/usr/include/fastcgi/fcgiapp.h" is available from the package "F...
- 02:01 PM Linux kernel client Bug #150 (Can't reproduce): order:1 page allocation failure
- workload was rsync to a ceph mount.
ceph3 mounting cosd0:/
not sure which version. probably unstable from last wee...
- 11:09 AM Bug #149: Stale NFS Handle when copying from snapshot
- Reproduced on version:
kclient: current unstable branch (240ed68eb567d80dd6bab739341999a5ab0ad55d)
server: current ...
- 11:05 AM Bug #149 (Closed): Stale NFS Handle when copying from snapshot
- Happens in the following scenario:
mount ceph
cd /mnt
mkdir a; cd a
tar xvfj ~/linux-2.6.xx.tar.bz2 (^C after a f...
- 10:37 AM Bug #134 (Resolved): rbdtool segfaults when listing
- Might have been due to protocol change in the pool-op that didn't get a protocol version bumped up. Resolving it unti...
- 10:35 AM Linux kernel client Bug #148 (Resolved): iozone failure
- on ceph4, running
* rbd 3a6e756 ceph-rbd: snapshots support...
- 10:32 AM Linux kernel client Bug #106 (Resolved): msgpool depletion?
- 10:28 AM Linux kernel client Bug #106: msgpool depletion?
- On what version did it happen? Do we have any reproducible scenario?
05/24/2010
- 11:51 PM Revision f8f9e6c4 (ceph): mds: make export targets stay in mdsmap for a while
- This limits the mdsmap churn some. Keep old targets around for at least
min-max iterations before removing them.
- 11:51 PM Revision 7f0ef1cd (ceph): mds: balancer cleanup
- 11:50 PM Revision da42d061 (ceph): mds: warn on dn release that dne
- 11:11 PM Revision 06b86ea4 (ceph): rbd: modify rbd on-disk header
- 10:58 PM Revision 7cf48614 (ceph): rbd: fix push_to_qemu.pl
- 10:56 PM Revision be082f0d (ceph): filestore: make mkfs() zap any file or dirs it finds
- 10:56 PM Revision 7113775b (ceph): mon: roll mkmonfs functionality into cmon --mkfs
- 10:55 PM Revision 5e8a6096 (ceph): rbd: modify header, add utility to ease sync with qemu tree
- 09:00 PM Revision a9b494c4 (ceph): mon: no need for 'whoami' file in store
- The monitor rank is provided during startup. No need to verify it against
the monitor store, especially since the st...
- 09:00 PM Revision 0d98fc6f (ceph): osd: keep recovery ops in sync with pull
- Call start_recovery_op from pull() instead of fixing every caller (some
were wrong). This keeps the recovery state i...
- 03:31 PM Bug #132 (Resolved): slow mon recovery after operating degraded for too long
- fixed by commit:bf1cb87d255b88d8e06b2988b6700e400ceb1b92 and commit:357aa0334436da79065dc67b270ff78f8899493f
- 03:30 PM Cleanup #121 (Resolved): roll mkmonfs functionality into cmon
- commit:752a0fd5630aba92dedc3bb30fccec0ec837fa59
- 02:25 PM Bug #133 (Resolved): mds crash on snapshot
- The crash I saw here (related to an anchor table lookup) is fixed by commit:51c5823472ef8208c1b7a6b094f1655ccdc1190e
- 02:23 PM Bug #145: Check build dependencies for FastCGI
- Hmm, there is a rule in configure.ac checking for FCGX_Init. Is that rule broken, or is checking FCGX_Init insuffici...
- 02:14 PM Bug #145 (Resolved): Check build dependencies for FastCGI
- I stumble on the following messages for my compilation try....
05/23/2010
- 10:13 PM Revision 56c4043a (ceph): reword blacklisted output so it's clearly discussing MDSes and not OSDs
05/22/2010
- 04:56 PM Revision f7708dea (ceph): uclient: don't unlink null dentry when getting null linkage in mds reply
- This broke semi-recently when the mds started returning null linkages (and
associated leases).
- 10:02 AM Bug #140 (Resolved): Cfuse crashes when mv-ing a file
- fixed by commit:f7708dea1f2db5d3be31ddc2aaf1500e1d50746d
- 09:47 AM Bug #140 (In Progress): Cfuse crashes when mv-ing a file
05/21/2010
- 11:17 PM Revision bf1cb87d (ceph): mon: trim pgmap states even when we don't have a full quorum
- 11:17 PM Revision 357aa033 (ceph): paxos: recover using stashed latest when state histories don't overlap
- If we don't have incremental states to catch up, jump to the latest.
- 09:55 PM Revision 51c58234 (ceph): mds: anchor multiversion inode before unlinking it
- If we are going to create a remote dentry linking to a multiversion inode
we're unlinking, make sure it's anchored!
...
- 08:44 PM Revision fbbff743 (ceph): librados.h: add other TMAP definitions
- also add a comment in rados.h about the defines in librados.h
- 07:08 PM Revision 3ec46059 (ceph): monc: hunting by default
- Otherwise if we fail to connect to the first mon we try, we never retry.
- 07:08 PM Revision cd1b0710 (ceph): monc: pick a different mon when repicking
- 07:08 PM Revision 929048f9 (ceph): mds: fix readdir pingpong on snapped dir with multiple mds
- Our traverse helper will follow the auth if we're looking at snapped
metadata, but we _don't_ want that for readdir b...
- 01:39 PM Linux kernel client Bug #141: ERESTARTSYS on mds update operations cause bad results
- It seems pretty important to me that users be able to abort MDS requests -- if for some reason part of the filesystem...
- 08:53 AM Linux kernel client Bug #141 (Resolved): ERESTARTSYS on mds update operations cause bad results
- - process does a create
- gets signal and returns ERESTARTSYS before reply comes back
- kernel retries the operatio...
- 12:56 PM Linux kernel client Cleanup #142 (Resolved): reuse message for mon subscribe
- 12:20 PM Linux kernel client Cleanup #142 (Resolved): reuse message for mon subscribe
- no need to allocate a fresh message each time around
- 12:50 PM Linux kernel client Bug #144 (Can't reproduce): GPF at con_close_socket+0x40/0x9f
- ...
- 12:31 PM Linux kernel client Bug #143 (Resolved): avoid resending requests on mon ticket renewal
- 12:22 PM Linux kernel client Bug #66 (Resolved): BUG_ON(req->r_reply) at fs/ceph/mds_client.c:1841!
- 12:22 PM Linux kernel client Bug #66 (In Progress): BUG_ON(req->r_reply) at fs/ceph/mds_client.c:1841!
- 08:58 AM Linux kernel client Bug #139: BUG ceph_dentry_info: Objects remaining on kmem_cache_close()
- Looks like this isn't fixed after all (see #63). Maybe a dentry is allocated but never added to the dcache?
- 02:08 AM Linux kernel client Bug #139 (Resolved): BUG ceph_dentry_info: Objects remaining on kmem_cache_close()
- After unmounting my Ceph filesystem and removing my kernel module i got the following message:...
- 02:33 AM Bug #140 (Resolved): Cfuse crashes when mv-ing a file
- Hello,
I try to set up a little test ceph cluster, based on the testing branch.
I encounter a problem using the c...
05/20/2010
- 11:19 PM Revision 16ab067c (ceph): librados: update librados to define CEPH_OSD_TMAP_SET
- 06:14 PM Revision 0050dd84 (ceph): mon: fix mon injectargs, and simplify
- 05:58 PM Revision 9e4e53e0 (ceph): osd: simplify --mkjournal, add --flush-journal
- 05:39 PM Revision 41b26060 (ceph): Merge branch 'osd_snapdir' into unstable
- 05:39 PM Revision 1d9ab261 (ceph): osd: nicer debug output
- 05:33 PM Revision f3ab812b (ceph): interval_set: fix union_of _size accounting; optimize ==
- 01:50 PM Bug #98: reserved identifier violation
- Markus Elfring wrote:
> I suggest to append a kind of "UUID":http://en.wikipedia.org/wiki/Universally_Unique_Identif...
- 01:49 PM Bug #98: reserved identifier violation
- The header ifdef guards should definitely be fixed. I think
#ifndef CEPH_FOO_H
#define CEPH_FOO_H
should be s...
- 10:59 AM Cleanup #137 (Resolved): osd: --apply-journal, --mkjournal?
- 10:40 AM Bug #76 (Resolved): osd: snapdir object recovery doesn't work
- 09:08 AM Cleanup #67: add 'autoscan' items to configure.ac
- Would you like to integrate any changes from the appended update suggestion into your source code repository?
05/19/2010
- 11:43 PM Revision e162aab3 (ceph): mds: fix interval_set copy of projected_free
- 11:42 PM Revision 155efe24 (ceph): mds: fix interval_set copy of projected_free
- 11:11 PM Revision f6c48274 (ceph): osd: use blank reqid for snapdir events, too
- Make reqid_is_indexed() less weird.
- 11:10 PM Revision 8dfe74f6 (ceph): osd: do not index by reqid if reqid not defined
- 10:42 PM Revision 4ed3acb0 (ceph): osd: update purged_snaps in PG::Info on trim completion; and replicate
- 08:17 PM Revision 9149dfa1 (ceph): rbd: fix snap_seq type in rbd_header
- 06:49 PM Revision 99690f63 (ceph): initscripts: remove 'flushoncommit' from default btrfs mount options
- 05:53 PM Revision 46891dd0 (ceph): osd: trim snaps via replicated osd ops
- 05:42 PM Revision ff94c3a4 (ceph): osd: make build_removed_snaps, is_removed_snap consistent
- 05:35 PM Revision e0315485 (ceph): rados: update documentation to mention mkpool and rmpool
- 03:25 PM Feature #138 (Resolved): Try out tcmalloc
- The cosd daemon seems to eat up fragmented memory or something, since heap size stays fairly consistent but top memor...
- 02:20 PM Cleanup #137 (Resolved): osd: --apply-journal, --mkjournal?
- onetime style commands (do something, then exit), ala --mkfs.
- 11:39 AM Feature #136 (Rejected): sensible grammar for monitor commands
- - clean up commands so the syntax is more intuitive
- some sort of help dump (full or partial?) grammar so you can s...
- 10:46 AM Feature #135 (Resolved): Specify crush rules
- The ability to create many CRUSH rules isn't very helpful if you can't use them all! Implement all appropriate ways t...
- 05:58 AM Bug #134 (Resolved): rbdtool segfaults when listing
- I'm experiencing a segfault with rbdtool when listing after a fresh mkcephfs....
05/18/2010
- 11:00 PM Bug #130: Build needs more configuration files.
- > see commit:a7769755c18882a259af6b8756f227bf2e71561e
I can not see it so far. When will this change be also publi...
- 09:41 AM Bug #130 (Resolved): Build needs more configuration files.
- Ah, that makes sense. Thanks!
- 09:30 AM Bug #130: Build needs more configuration files.
- It seems that the error messages about missing files will be displayed if somebody like me tries to regenerate the sc...
- 08:03 AM Bug #130: Build needs more configuration files.
- I added the automake args, and fixed the INCLUDES thing, but I don't see the other errors (on automake 1.10.1). and ...
- 06:18 AM Bug #130 (Resolved): Build needs more configuration files.
- I would like to update the "build configuration template":http://ceph.newdream.net/git/?p=ceph.git;a=blob;f=configure...
- 10:04 PM Revision ee218a1e (ceph): osd: fix peer_info updates on active primary
- 07:18 PM Revision baba34bc (ceph): msgr: remove unused utime_t now
- 03:01 PM Revision a7769755 (ceph): automake: some clean up
- 02:40 PM Bug #133 (Resolved): mds crash on snapshot
- (09:45:41 AM) wido: i'm experiencing a MDS crash regarding snapshots.
(09:46:11 AM) wido: Test case: download the Li...
- 12:20 PM rgw Support #8 (Closed): Document differences from S3
- It's on the wiki, now, anyway.
- 11:49 AM Linux kernel client Feature #23: fcntl/flock advisory lock support
- Ahah, file_lock's fl_nspid pointer isn't filled in before calling the filesystem's lock handlers. I've fixed that so ...
- 10:03 AM Linux kernel client Feature #23 (In Progress): fcntl/flock advisory lock support
- Found some issues with recovery after all; working on them now.
- 09:03 AM Bug #132 (Resolved): slow mon recovery after operating degraded for too long
- need to trim even when degraded. and make sure recovery works using stashed latest.
- 08:37 AM Linux kernel client Feature #19 (Resolved): rbd
- 08:35 AM Feature #71 (Resolved): msgr: throttle incoming messages
- 08:04 AM Feature #131 (Resolved): bring wireshark plugin up to date
05/17/2010
- 11:29 PM Revision 1ea0f858 (ceph): poolop: fix MPoolOpReply decoding
- 09:53 PM Revision 736d837e (ceph): throttle: allow large items if we're under our max
- Normally we stay under max, but for large items, take it as long as we're
currently below max. This avoids deadlock.
- 04:14 PM Bug #74 (Resolved): make removed_snaps contiguous
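The admission rule described for revision 736d837e (throttle: allow large items if we're under our max) can be illustrated with a minimal sketch. This is not the Ceph implementation: the class and method names below are hypothetical, the check is non-blocking for simplicity, and the only behavior taken from the commit message is that an item larger than max is admitted whenever the current count is below max.

```python
class Throttle:
    """Hypothetical budget tracker; names are illustrative, not Ceph's API."""

    def __init__(self, max_units):
        self.max = max_units
        self.current = 0

    def _should_wait(self, count):
        if count <= self.max:
            # Normal item: admit only if the total stays at or under max.
            return self.current + count > self.max
        # Oversized item: it can never fit under max, so admit it whenever
        # we are currently below max instead of making it wait forever.
        return self.current >= self.max

    def try_take(self, count):
        """Reserve `count` units if allowed; return True on success."""
        if self._should_wait(count):
            return False
        self.current += count
        return True

    def put(self, count):
        """Release `count` previously taken units."""
        self.current -= count
```

With max set to 10 and 4 units in use, a request for 50 units is admitted under this rule, while a later request for 1 unit is refused until the large item is released.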
- 03:43 PM Linux kernel client Feature #23 (Resolved): fcntl/flock advisory lock support
- It should support flock and fcntl locks now. Currently there are no caps for this, so all locking requests are routed...
- 02:04 PM Linux kernel client Bug #63 (Resolved): dentry_info slab not empty
- 11:30 AM Linux kernel client Bug #63: dentry_info slab not empty
- hopefully fixed by commit:7a597c3f4aa58d30d1236b1c1bf980e28a899578
- 01:40 PM Linux kernel client Feature #26: statlite
- http://marc.info/?t=123908749900004&r=1&w=2
http://marc.info/?t=123914651100002&r=1&w=2
http://marc.info/?l=linux-f...
05/16/2010
05/14/2010
- 09:57 PM Linux kernel client Bug #38 (In Progress): rm -r failure
- 09:57 PM Linux kernel client Bug #38: rm -r failure
- 09:18 PM Revision 47ba928b (ceph): osd: include snapdir objects in pg log for proper replication, recovery
- 06:36 PM Revision ffd72a98 (ceph): strings: clean up pool op names
- 06:32 PM Revision d1c78fcb (ceph): mds: allow readdir result limit in bytes
- This will allow the client to bound the size of the reply it gets
- 04:43 PM Revision 7be27f43 (ceph): debian: put proper distribution in debian changelog
- 04:14 PM Revision 33bf1a2b (ceph): version: use next version ~rc for unstable branch
- This makes unstable always sort after stable, testing releases:
0.21~rc-unstable... > 0.20.1-testing...
- 01:21 PM Linux kernel client Bug #126 (Resolved): qemu rbd driver doesn't work with virtio
- Fixed, pushed.
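The version ordering relied on by revision 33bf1a2b (version: use next version ~rc for unstable branch) follows the Debian convention that "~" sorts before everything, including the end of the string. The sketch below is a simplified comparison in that style, written for illustration only: the function names are hypothetical, and it ignores epochs and the separate handling of the Debian revision that dpkg performs.

```python
def _order(ch):
    """Sort weight of one non-digit position, Debian style."""
    if ch == "~":
        return -1            # tilde sorts before everything, even end of string
    if ch == "" or ch.isdigit():
        return 0             # end of string, or start of a numeric part
    if ch.isalpha():
        return ord(ch)       # letters sort before other punctuation
    return ord(ch) + 256


def compare_versions(a, b):
    """Return -1, 0 or 1 as a sorts before, equal to, or after b."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit run position by position.
        while (i < len(a) and not a[i].isdigit()) or \
              (j < len(b) and not b[j].isdigit()):
            oa = _order(a[i] if i < len(a) else "")
            ob = _order(b[j] if j < len(b) else "")
            if oa != ob:
                return -1 if oa < ob else 1
            i += 1
            j += 1
        # Compare the following digit run numerically.
        si, sj = i, j
        while i < len(a) and a[i].isdigit():
            i += 1
        while j < len(b) and b[j].isdigit():
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0
```

Under this ordering `compare_versions("0.21~rc", "0.20.1")` is positive, so a 0.21~rc build sorts after any 0.20.x release, while `compare_versions("0.21~rc", "0.21")` is negative, so it still sorts before the final 0.21.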
- 12:01 PM Cleanup #129 (Rejected): msgr: separate message encoding into sections
- instead of front, middle, data, just break the message encoding into N sections, identified by some integer. this makes...
- 11:31 AM rbd Feature #41: Support snapshots
- Read-only snapshots are now implemented on the kernel client. Still need to have the kvm-rbd implementation.
- 09:24 AM Linux kernel client Bug #127 (Resolved): fix r_aborted locking
- 09:10 AM Linux kernel client Bug #66 (Resolved): BUG_ON(req->r_reply) at fs/ceph/mds_client.c:1841!
05/13/2010
- 08:28 PM Revision 0e177d28 (ceph): radosgw_admin: die after first bad argument
- 05:47 PM Revision c54d6cde (ceph): objecter: separately track pgmap, osdmap state machine version
- Mixing these up can make our request hang on the monitor indefinitely.
- 05:47 PM Revision 52e544bf (ceph): mon: return correct state machine epoch in replies
- 05:46 PM Revision ecc4f686 (ceph): testrados: fix aio api usage
- 05:46 PM Revision c3a8adaa (ceph): librados: implement rados_stat_pool()
- 01:04 PM Linux kernel client Bug #127 (In Progress): fix r_aborted locking
- 11:10 AM Linux kernel client Bug #127: fix r_aborted locking
- no, there needs to be some locking. if we abort and return to the caller, we need to know that fill_trace isn't doin...
- 08:08 AM Linux kernel client Bug #127 (Resolved): fix r_aborted locking
- r_aborted is protected by mdsc->mutex (along with r_reply, r_err), but it is tested during fill_trace under s_mutex. is...
- 10:50 AM Feature #128 (Resolved): librados: implement get pool stats
- 09:22 AM Feature #128 (Resolved): librados: implement get pool stats
- 07:57 AM Linux kernel client Bug #107: lockup on __cap_is_valid (via aio_write) vs __ceph_remove_cap
- here are the final two crashes i got on this, presumably because i had the wrong version of the module loaded:
<pr...
- 07:28 AM Linux kernel client Bug #66 (In Progress): BUG_ON(req->r_reply) at fs/ceph/mds_client.c:1841!
- hit this again, on commit:e84346b726ea90a8ed470bc81c4136a7b8710ea5
workload was kernel compilation....
05/12/2010
- 11:13 PM Revision 4ee1e261 (ceph): Merge branch 'unstable' of ceph.newdream.net:git/ceph into unstable
- 11:13 PM Revision e741d43f (ceph): rados: fix typo
- 11:06 PM Revision 342fc871 (ceph): osd: add CEPH_PG_MAX_SIZE to header
- 11:06 PM Revision 7f43cf8a (ceph): filestore: update btrfs ioctl.h
- 11:06 PM Revision 856bdf2f (ceph): client: un-"fix" u64 types in client/ioctl.h
- 11:06 PM Revision c9b1aee1 (ceph): msgr: fix possible overflow when sending seq
- 11:06 PM Revision 65074e5a (ceph): msgr: print message encoding version to aid debugging
- 10:10 PM Revision a902bf02 (ceph): cmpxattr: null termination fixes
- 10:10 PM Revision 80dcc28a (ceph): ceph_fs.h: checkpatch fixes
- 10:10 PM Revision afa1993e (ceph): ceph_strings: checkpatch fix
- 06:55 PM Revision 07fdde4b (ceph): rados: add 'tmap dump'
- 06:55 PM Revision b5029166 (ceph): osd: set obs.exists in projected object state during recovery
- 06:55 PM Revision dbcb4f82 (ceph): mds: warn, don't crash, on trailing garbage in dir objects
- 03:36 PM Linux kernel client Bug #126 (Resolved): qemu rbd driver doesn't work with virtio
- rbd images work fine with virtio off, but with it on I just get:...
- 12:03 PM Bug #122 (Resolved): msgr: msgvec should go on heap
- 11:26 AM Feature #125 (Resolved): log rotation
- 04:30 AM Revision 8fbabe03 (ceph): msgr: put msgvec on heap
- It can get too big for the stack.
- 04:14 AM Revision 876a0ccd (ceph): msgr: tolerate incoming seq #'s that skip ahead
- This is necessary because the kclient may pull messages out of the out/sent
queues, and we can't renumber previously ...
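A receiver that tolerates sequence numbers skipping ahead, as revision 876a0ccd describes, might look like the sketch below. It is an illustration under assumptions, not the messenger code: the class and field names are hypothetical, and discarding messages whose seq is not greater than the last one seen is an assumed duplicate-handling policy that the log entry itself does not spell out.

```python
class SeqReceiver:
    """Hypothetical receiver that tracks the highest sequence number seen."""

    def __init__(self):
        self.in_seq = 0          # last sequence number accepted
        self.delivered = []      # payloads handed to the application

    def handle(self, seq, payload):
        """Accept a message unless it is old; gaps in seq are tolerated."""
        if seq <= self.in_seq:
            # Assumed policy: treat as a duplicate or stale resend and drop it.
            return False
        # seq may be in_seq + 1 or may skip ahead; both are accepted, since
        # the sender may have pulled messages out of its queues after
        # numbering them.
        self.in_seq = seq
        self.delivered.append(payload)
        return True
```

A strict receiver would instead reject any seq other than in_seq + 1, which is the behavior this change relaxes.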
05/11/2010
- 09:46 PM Revision a576e6e2 (ceph): ceph: return error code returned by server
- 09:43 PM Cleanup #124 (Resolved): msgr: change protocol handshake to exchange in_seq
- This will allow peer to only requeue sent messages that weren't actually received. No need to resend stuff that will...
- 09:42 PM Linux kernel client Bug #123 (Resolved): fix msgr message retry seq numbering
- fix:
- we now allow seq #'s to jump forward
- we only assign seq # once after ceph_con_send'ing a message. if it r...
- 09:07 PM Linux kernel client Bug #123 (Resolved): fix msgr message retry seq numbering
- we currently assign seq #'s when we send the message over the wire. this numbering breaks when we reconnect because ...
- 08:28 PM Revision 12f7c0b4 (ceph): mds: drop 'closed' bit from MClientReconnect
- 08:13 PM Revision 7a23b5fb (ceph): msgr: set outgoing msg connection before encoding
- This allows encode_payload to adjust behavior based on the target peer's
feature bits.
- 04:59 PM Revision f6c2e1c4 (ceph): poolop: make new encoding backward compatible
- This makes cea221c64 behave when messages using the old encoding (that is,
older versions of the client talk to us).
- 04:48 PM Bug #122 (Resolved): msgr: msgvec should go on heap
- it can be too large for the stack for big messages
- 04:36 PM Revision 0ebf2599 (ceph): osd: fix layout return type
- 04:36 PM Revision ffc3e63f (ceph): osd: fix compile error from cmpxattr, cleanup.
- I think the xattr bufferlist still needs to be null terminated...
- 04:36 PM Revision 4a7118e2 (ceph): msgr: be less noisy about msgr throttling
- 01:40 PM Cleanup #58 (Resolved): kill nstring/cstring, use std::basic_string instead
- 12:49 PM Linux kernel client Bug #107 (Resolved): lockup on __cap_is_valid (via aio_write) vs __ceph_remove_cap
- fixed by 'ceph: fix cap removal race' commit:d855b8010914b52d8dd596f6d22c162bf81ccf21
- 11:56 AM Linux kernel client Bug #107: lockup on __cap_is_valid (via aio_write) vs __ceph_remove_cap
- finally caught it!...
- 11:43 AM Linux kernel client Cleanup #113 (Resolved): audit mds_client locking, esp reply handler
- 11:42 AM Linux kernel client Bug #116 (Resolved): can we drop user. xattr prefix for magic ceph xattrs?
- yes. see 'ceph: use ceph. prefix for virtual xattrs'.
- 10:54 AM Cleanup #121 (Resolved): roll mkmonfs functionality into cmon
- We can just do 'cmon --mkfs ...' instead of 'mkmonfs', similar to how the cosd initializes its local storage.
- 10:41 AM Bug #120 (Resolved): monitor cluster expansion broken
05/10/2010
- 11:18 PM Revision f857a2e1 (ceph): osd: add cmpxattr op handling
- 11:00 PM Revision 4d667dde (ceph): debian: remove pull.sh calls from helper .sh scripts
- 10:53 PM Revision 324fe827 (ceph): atomic: cast away const on read()
- (Only needed to build on lenny, this can go away someday)
- 10:10 PM Linux kernel client Cleanup #113 (In Progress): audit mds_client locking, esp reply handler
- 03:25 PM Linux kernel client Cleanup #113: audit mds_client locking, esp reply handler
- see also #66
- 10:09 PM Linux kernel client Bug #64: crash in handle_mds_map (corrupt s_waiting list?)
- fixed by commit:1c0806d2caacc683c56a587eaf1502769a7c0698
- 04:35 PM Linux kernel client Bug #64 (Resolved): crash in handle_mds_map (corrupt s_waiting list?)
- fixed by 'ceph: fix locking, error paths when waking reconnect requests'
- 10:09 PM Linux kernel client Bug #66: BUG_ON(req->r_reply) at fs/ceph/mds_client.c:1841!
- fixed by commit:9abf82b8bc93dd904738a71ca69aa5df356d4d24
- 04:34 PM Linux kernel client Bug #66 (Resolved): BUG_ON(req->r_reply) at fs/ceph/mds_client.c:1841!
- fixed by 'ceph: fix locking, error paths when waking reconnect requests'
- 03:24 PM Linux kernel client Bug #66: BUG_ON(req->r_reply) at fs/ceph/mds_client.c:1841!
- unable to reproduce... but, see #113
- 10:04 PM Bug #120 (Resolved): monitor cluster expansion broken
- The wiki procedure at http://ceph.newdream.net/wiki/Monitor_cluster_expansion does not work. It crashes with:
<pr...
- 08:40 PM Revision 99cdd525 (ceph): osd: 'stop' command
- 04:44 PM Linux kernel client Feature #119 (New): avoid looping connect/retry errors on console
- we should try to avoid filling up logs with stuff like this:...
- 04:35 PM Linux kernel client Bug #78 (Resolved): bdi_init list bug
- 04:13 PM Linux kernel client Feature #18 (Resolved): reconnect fixups
- 03:42 PM CephFS Feature #118 (Rejected): kclient: clean pages when throwing out dirty metadata on session teardown
- see 'ceph: throw out dirty caps metadata, data on session teardown'
- 03:25 PM Linux kernel client Bug #50 (Resolved): osd timeout reset leaves some ops hanging
- 10:31 AM Linux kernel client Bug #50: osd timeout reset leaves some ops hanging
- 10:31 AM Linux kernel client Bug #50: osd timeout reset leaves some ops hanging
- finally found this, fixed by commit:77eb74b92fee7340d104b24a9ee2800196b0f140
- 03:23 PM Bug #117 (Rejected): osd: lone osd might not notice new peers of simultaneously marked down
- 10:35 AM Bug #117 (Rejected): osd: lone osd might not notice new peers of simultaneously marked down
- see sepia.a's osd6 epoch 3206