Activity
From 10/04/2010 to 11/02/2010
11/02/2010
- 11:34 PM Revision 3ae8c001 (ceph): osd: Make a per-pg sched_scrub, and remove non-active accounting from t...
- 11:28 PM Revision 9d1984e8 (ceph): mds: mark scatterlock stale if dir is frozen, not inode
- It's the dir we're auth for and that might potentially be frozen.
Signed-off-by: Sage Weil <sage@newdream.net>
- 11:25 PM Revision 4838016d (ceph): Merge branch 'unstable' into mix_stale
- 10:26 PM Revision e304a245 (ceph): rados: benchmark using unique object names
- Include hostname and pid in object name, so that instances running on
different hosts write to unique objects.
Signe...
- 09:37 PM Revision 38f96c65 (ceph): debian packaging: set --sbindir=/sbin
- We want mkcephfs and mount.ceph to be under /sbin.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 08:00 PM Revision 68f7fede (ceph): config: fix sigsegv handler
- Fixed this with sigabrt, forgot to do sigsegv too.
See 7a688a9f999a6b9d3bcdcbebbd8cd984afc70e31.
Signed-off-by: Sag...
- 06:10 PM Revision 235aa1c3 (ceph): filestore: disable 'filestore btrfs snap' when SNAP_DESTROY is missing
- We want to enable the new snap stuff by default. But we also want to work
with the default configuration on old kern...
- 05:43 PM Revision 4cfd198c (ceph): Makefile.am: include the libcrush headers when installing
- Signed-off-by: Wido den Hollander <wido@widodh.nl>
- 05:10 PM Revision abb0b6d9 (ceph): Merge branch 'testing' into unstable
- 05:09 PM Revision 5310ab6e (ceph): uclient: Warn on truncate_[size|seq] changes for non-file inodes.
- 05:09 PM Revision 630db2a9 (ceph): mds: Init system CInodes to have a truncate_size of -1.
- This should help with bug #518.
- 05:09 PM Revision 524c8903 (ceph): client: match initialization with mds
- (see Server::prepare_new_inode())
Signed-off-by: Sage Weil <sage@newdream.net>
- 05:09 PM Revision 20e8a451 (ceph): client: only do truncate on regular files
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:45 PM Revision 905ff763 (ceph): debian: add pkg-config as build-depends
- Signed-off-by: Wido den Hollander <wido@widodh.nl>
- 03:34 PM Bug #535 (In Progress): cephtool hangs forever until a UNIX signal is received
- Okay, got in on a hang. The Pipe's been doing a disconnect/reconnect loop for about 4 minutes, it's currently in stat...
- 03:24 PM Bug #538: Write performance does not scale over multiple computers
- Oh, there is another issue: the rados bench command always writes to objects "Object %d". So all of your nodes are w...
- 12:32 PM Bug #538: Write performance does not scale over multiple computers
- Just to be clear, do you have all 208 nodes running server daemons and the client? What's your configuration look lik...
- 11:55 AM Bug #538: Write performance does not scale over multiple computers
- I also tried setting the target pg count to 4,000 and got about the same numbers as 400, maybe a small amount faster....
- 11:24 AM Bug #538: Write performance does not scale over multiple computers
- I set the target pg count to 400 and tried again. It helped some, up to 2x, but is still slower than I expected:
<...
- 10:19 AM Bug #538: Write performance does not scale over multiple computers
- If the benchpool is a new pool you created, the problem is likely that it is too small. By default, new pools have o...
- 08:15 AM Bug #538 (Closed): Write performance does not scale over multiple computers
- I have ceph0.22.1 installed on a cluster of 208 lightly loaded 64-bit Linux nodes (RHEL5.5 ext3). The configuration i...
- 02:48 PM Bug #537: debian/ubuntu: Build system broken after commit
- oh yeah, and thanks for your patches, Wido. Good call with the libcrush headers.
C.
- 02:47 PM Bug #537 (Resolved): debian/ubuntu: Build system broken after commit
- should be fixed by 38f96c658dee3e7e26a68a3c57eec2a5d8758e17
cheers,
C.
- 10:13 AM Bug #537: debian/ubuntu: Build system broken after commit
- I applied #2, but for #1, we really do want those installed in /sbin (so say the debian/ubuntu guys). That's unfortu...
- 06:45 AM Bug #537 (Resolved): debian/ubuntu: Build system broken after commit
- commit 1dd5042e655b80eae99f002047fe1dfb4cc46120 broke some things when building .deb packages, mainly because the loc...
- 10:46 AM Feature #389: Synchronize header modifications between clients
- Still working on it. Major functionality that was implemented:
- new osd- watch/notify/notify-ack messages
- most...
- 10:32 AM Linux kernel client Bug #69: ceph: ffff88001976ba50 auth cap (null) not mds0 ???
- just saw this on ceph1, running commit:2f56f56ad991edd51ffd0baf1182245ee1277a04...
- 10:19 AM Tasks #539 (Resolved): wiki: document pg expansion
- 10:18 AM CephFS Bug #529 (Resolved): Cfuse: Software caused connection abort
- There was a sequence of commits in this, some of which were one step forward and two steps back. The testing branch ...
- 05:51 AM CephFS Bug #529: Cfuse: Software caused connection abort
- I was going to apply the patch to my version but I noted that my src/client/Client.h line 516 already says "truncate_...
- 09:41 AM Bug #536 (Resolved): debian/ubuntu: Add pkg-config as a build dependency
- applied commit:905ff7635297614633175f129f491a83c3b2f314, thanks!
- 02:37 AM Bug #536 (Resolved): debian/ubuntu: Add pkg-config as a build dependency
- When trying to build today's unstable, I got the following message:...
- 05:44 AM Bug #532 (Closed): OSD: repop_queue.front() == repop
- Indeed, my build system was still building the *rc* branch, oops!
- 05:09 AM Revision bc9bc4cb (ceph): init-ceph: make lockfile dir configuration (redhat)
- Reported-by: Ed Burnette <ed.burnette@sas.com>
Signed-off-by: Sage Weil <sage@newdream.net>
- 02:31 AM Revision 85ba4f2d (ceph): object.h: const cleanup
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
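Revision e304a245 and the 03:24 PM comment on Bug #538 above describe the same fix: benchmark instances on different hosts must not all write to "Object %d". A minimal sketch of that idea follows, assuming a hypothetical helper `bench_object_name` and an illustrative name format; the format actually used by rados bench is not shown in this log.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>  // gethostname, getpid (POSIX)

// Hypothetical helper: build an object name that differs per host,
// per process, and per object index. The format is illustrative only.
std::string bench_object_name(const std::string& host, long pid, int index) {
  assert(index >= 0);
  return host + "_" + std::to_string(pid) + "_object" + std::to_string(index);
}

// Convenience wrapper that fills in the local hostname and pid.
std::string local_bench_object_name(int index) {
  char host[256] = {0};
  if (gethostname(host, sizeof(host) - 1) != 0) {
    host[0] = '\0';  // fall back to an empty host part on error
  }
  return bench_object_name(host, static_cast<long>(getpid()), index);
}
```

With names built this way, two clients started on different nodes (or twice on the same node) no longer overwrite each other's objects, so the measured write rate reflects independent writers.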
11/01/2010
- 10:36 PM CephFS Bug #451 (Can't reproduce): mds: replay error
- 10:35 PM CephFS Bug #523: cfuse locks don't wake on mds reconnect?
- This might be the same issue as #535 (which looks to me like it's waiting on tcp_read/poll?).
- 10:33 PM CephFS Bug #529: Cfuse: Software caused connection abort
- Hey Greg, this looks like client truncation stuff again. This was biting me today, almost immediately. These two pa...
- 07:40 AM CephFS Bug #529 (Resolved): Cfuse: Software caused connection abort
- After using ceph for a few minutes it gets into a state where I can no longer access the cfuse mount point. It also s...
- 10:33 PM Revision ee3fc3bd (ceph): osd: Add scrub to the names of scrub scheduling-related things.
- 10:31 PM Revision 993ba1cd (ceph): osd: refactor OSD::sched_scrub
- Take sched_scrub_lock sparingly, and push active/pending accounting to the work queue.
- 10:31 PM Revision 8d200a7d (ceph): osd: Move pending/active scrub accounting into the scrub work queue.
- 10:30 PM Revision 378f84c1 (ceph): osd: Add the rest of infrastructure for scheduling scrubbing
- 10:29 PM Bug #531: Journaling Causes System Hang
- Yeah,
I figured not running with journals wouldn't work right. As long as the block size of the writes is very lar...
- 10:22 PM Bug #531: Journaling Causes System Hang
- It's expected that you'll get extremely slow performance without the journal.
I'll work on replicating this in o...
- 12:08 PM Bug #531: Journaling Causes System Hang
- I forgot a bit about the setup.
4 x OSD all with journals on separate drives. Each OSD is on a separate system.
B...
- 12:00 PM Bug #531 (Resolved): Journaling Causes System Hang
- Hello,
It seems that when doing a large write once the journal fills up the system goes into a state of lock and h...
- 10:28 PM Bug #530 (Resolved): No way to override lock file path on RH
- This look okay? commit:bc9bc4cb28376728e5428eff0ddb3ff301831e50
- 07:57 AM Bug #530 (Resolved): No way to override lock file path on RH
- init-ceph checks if /var/lock/subsys exists and if it does, tries to create a lock file there. In my case for various ...
- 10:28 PM Revision 7b68a403 (ceph): osd: add variables to track scrub scheduling
- Add OSD, PG, and config variables to track pending and active scrubs.
- 05:12 PM Bug #535: cephtool hangs forever until a UNIX signal is received
- Colin McCabe wrote:
> While running vstart.sh, I reproduced this bug with debug_ms = 20.
>
> Here's what the outp...
- 05:09 PM Bug #535: cephtool hangs forever until a UNIX signal is received
- While running vstart.sh, I reproduced this bug with debug_ms = 20.
Here's what the output was. Since cephtool does...
- 04:42 PM Bug #535: cephtool hangs forever until a UNIX signal is received
- > Perhaps this bug is caused by Nagle's algorithm?
>
As Sage pointed out, we're already running with TCP_NODELAY...
- 04:31 PM Bug #535 (Resolved): cephtool hangs forever until a UNIX signal is received
- I just saw this twice in a row. cephtool hangs forever until a UNIX signal is received. That seems to break the logja...
- 05:04 PM Revision 3d85a7b9 (ceph): logrotate: separate rule for stat/*.log
- Logrotate seems to ignore the entire rule if any part of the file list
is not found. This happens on nodes with only...
- 04:56 PM Bug #533: radostool hang on shutdown
- I think I have a fix for this one.
- 03:48 PM Bug #533 (Resolved): radostool hang on shutdown
- radostool still seems to be hanging from time to time on shutdown.
Sending a signal resolves the issue.
For examp...
- 04:53 PM Revision 49153c2c (ceph): osd::PG: Update PG comments
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:50 PM Revision e6df8074 (ceph): test: create test_unfound.sh
- Create test_unfound.sh to test handling unfound objects.
Move more test functions into test/test_common.sh to facili...
- 03:53 PM Linux kernel client Feature #534 (Resolved): support CEPH_FEATURE_RECONNECT_SEQ in klibceph
- 02:07 PM Bug #532: OSD: repop_queue.front() == repop
- This problem was in v0.22, but fixed in v0.22.1. Can you try with the latest testing (v0.22.2) or unstable?
- 12:52 PM Bug #532: OSD: repop_queue.front() == repop
- I think I was a bit too premature about that, since osd5 just crashed again with the same backtrace....
- 12:45 PM Bug #532 (Closed): OSD: repop_queue.front() == repop
- On two of my OSD's I had the following crash:...
- 03:43 AM Revision 1dd5042e (ceph): fix make distcheck, make uninstall
- Make distclean was failing because make uninstall was broken. (There were
still leftover files after running make ins...
- 02:50 AM Bug #462: cephx: verify_authorizer_reply exception in decode_decrypt
- I've just done a fresh mkcephfs on my cluster and then I started to see:...
10/30/2010
- 10:46 PM Revision 33e4d533 (ceph): Merge remote branch 'origin/mds_frags' into unstable
- 10:22 PM Revision c044829c (ceph): filestore: automatically choose appropriate journaling mode
- The three modes each get an explicit config option that defaults to false.
You can choose one explicitly by enabling ...
- 10:04 PM Revision 6c69c259 (ceph): Merge remote branch 'origin/testing' into unstable
- Conflicts:
configure.ac
- 06:24 PM Revision 9f4fd4a6 (ceph): v0.22.2
- 06:12 PM Revision 5b06ca1c (ceph): filestore: use updated btrfs ioctls
- Switch to the interface finally merged for 2.6.37-rc1.
Signed-off-by: Sage Weil <sage@newdream.net>
- 05:32 PM Revision a831b2aa (ceph): btrfs: update ioctls.h
- This is what was finally merged for 2.6.37-rc1.
Signed-off-by: Sage Weil <sage@newdream.net>
- 01:37 PM Bug #528 (Closed): cephx client: unknown request_type 20737
- nevermind, this was ancient code.
- 01:36 PM Bug #528 (Closed): cephx client: unknown request_type 20737
- ...
- 12:04 AM Revision bb628d38 (ceph): Get "make dist" working, fix gui build issues
- * Fix VPATH builds (i.e., builds where srcdir != builddir).
Don't assume that we can get a source files named blah wi...
10/29/2010
- 11:42 PM Revision 7e8fc103 (ceph): mds: detect small dirs that should be merged, and merge them
- Signed-off-by: Sage Weil <sage@newdream.net>
- 10:47 PM Revision 8a8e37b8 (ceph): mds: hit dir popularity on unlink
- Signed-off-by: Sage Weil <sage@newdream.net>
- 10:29 PM Revision 73c6e4cc (ceph): osd: write potentially large pg info to object, not xattr [format change]
- Write past_intervals and snap_collections to a separate object instead of
an attr on the collection directory. This ...
- 09:36 PM Revision ed89d9a2 (ceph): cephtool-gui : more helpful error on pixbuf fail
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 09:06 PM Revision 18bb14ea (ceph): osd: snap_trimmer: flush between collection sets
- We need to make sure the objects whose collection sets we just adjusted
are reflected on disk when we make the next p...
- 08:07 PM Revision 0ce1d509 (ceph): mds: Refactor need_snapflush into CInode helpers
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:07 PM Revision 440cc439 (ceph): mds: auth_pin head/snap pairs for all need_snapflush entries
- This ensures that when snap metadata is flushed, we will be auth on both
inodes and be able to do the update properly...
- 07:40 PM Revision 9f86a79d (ceph): configure.ac: default to --enable-gtk
- Default to enabling gtk rather than disabling it. Gracefully handle
cases where the user tries to enable it but it ca...
- 07:06 PM Revision c2045286 (ceph): osd: fix decoding of legacy (v2) coll_t
- It was u8, not int.
Signed-off-by: Sage Weil <sage@newdream.net>
- 06:42 PM Revision 4e180fef (ceph): mds: fix use-after-free iterator no-no in remove_inode_recursive()
- We can't close_dirfrag and use the iterator if it's pointing to the
released element.
Signed-off-by: Sage Weil <sage...
- 06:42 PM Revision 96e583d3 (ceph): mds: use list instead of vector for trim_unlinked_inodes
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:30 PM Revision 8906819a (ceph): Merge branch 'mds_journal' into unstable
- 06:29 PM Revision aa83e11c (ceph): mds: log if trim_non_auth does anything, since it shouldn't
- be now except on rollbacks.
- 06:29 PM Revision 404c83e3 (ceph): mds: Call trim_non_auth_subtree when appropriate.
- 06:29 PM Revision 20745218 (ceph): mds: add function MDCache::trim_non_auth_subtree
- Trims the subtree rooted at the given dir from cache, except
for those portions linking to directories on other MDSes...
- 05:48 PM Revision 8255a671 (ceph): debian: fix changelog
- (This was actually in the 0.22.1-1 package we built.)
Signed-off-by: Sage Weil <sage@newdream.net>
- 04:41 PM Bug #374 (Can't reproduce): mon: osd will null addr added to map
- Couldn't find anything with code inspection, and haven't been able to reproduce. Hopefully if/when this pops up agai...
- 04:39 PM CephFS Feature #340: large directories, directory fragmenting
- split and merge rewritten and working. now for the stress testing.
- 03:24 PM Bug #522 (Resolved): osd: put potentially large pg info in separately object, not xattr
- commit:73c6e4cca7d8265e1e478e83d97a638cc7fa6a24
- 02:42 PM CephFS Bug #520: mds: change ifile state mix->sync on (many) lookups?
- Nothing wrong on the client.. it's just that the mds has / (a subtree root) in the MIX state, and file_eval doesn't d...
- 01:02 PM CephFS Bug #360 (Resolved): mds: head/snapped snap_cap linkage may cross mdss
- For now, let's just auth_pin(). resolved by commit:440cc43956f367e6c8fb1077c83693ff568c9d2c
- 12:43 PM Bug #518 (Resolved): cfuse crashed on ls
- Checked the MDS side and haven't heard back yet, so I'm going to close this out unless I hear about more issues.
- 12:38 PM CephFS Bug #525 (Closed): Audit CInode creation code for initialization
- Looks all good now!
- 09:16 AM CephFS Bug #525 (Closed): Audit CInode creation code for initialization
- Specifically, it seems there are some times when truncate_size (and truncate_seq?) aren't getting set. Check if there ...
- 11:31 AM CephFS Bug #329 (Resolved): mds: mislinked dentry found during journal replay
- The multi-mds fix has been pushed to mds_journal branch commit:aa83e11c67165878e1ca1b0fe66ff9b8c3a906c8. Then merged ...
- 10:38 AM Messengers Feature #527 (Resolved): zero copy reads, msgr rx buffer infrastructure
- Currently all messages read off the wire (include read results) go into newly allocated buffers. This results in a d...
- 10:21 AM Feature #526 (Resolved): osd: unfound objects rework
- We want to let the placement group go active even if there are some unfound objects. We will keep track of which objects a...
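Revision 4e180fef above (06:42 PM) fixes a use-after-free where an iterator was used after the element it pointed to had been released. The general pattern can be sketched as follows; `close_matching` is a hypothetical name and the container is a plain `std::list`, not the MDS data structures touched by the commit.

```cpp
#include <cassert>
#include <list>

// Remove every element for which should_close(x) is true. The iterator
// is never dereferenced or advanced after its element has been erased:
// std::list::erase returns the iterator to the following element.
template <typename T, typename Pred>
void close_matching(std::list<T>& items, Pred should_close) {
  auto it = items.begin();
  while (it != items.end()) {
    if (should_close(*it)) {
      it = items.erase(it);  // old iterator is invalid; use the returned one
    } else {
      ++it;
    }
  }
}
```

The buggy shape is `items.erase(it); ++it;`, which touches a released node. Erasing from a `std::list` leaves all other iterators valid, which is one reason a list can be more convenient than a vector for this kind of trimming.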
10/28/2010
- 11:46 PM Revision ba6d931b (ceph): Mutex: add more checks to lockdep
- When lockdep is enabled, use PTHREAD_MUTEX_ERRORCHECK instead of
PTHREAD_MUTEX_NORMAL for non-recursive mutexes.
Sig...
- 10:51 PM Revision 65fbd2ec (ceph): Add the Ceph monitoring GUI
- This adds a graphical monitoring mode to the ceph cluster monitoring tool. Its
functionality is similar to ./ceph -w...
- 10:51 PM Revision c8839035 (ceph): cephtool gui: install and locate gui_resources
- Make install now installs the gui resource files into
/usr/share/cephtool/gui_resources (or wherever we configure it ...
- 10:51 PM Revision c13183e7 (ceph): cephtool: join GUI thread before shutting down
- Join GUI thread before shutting down.
Move open_icon function.
Delete unused get_widgets function.
Signed-off-by: ...
- 10:51 PM Revision 819ad635 (ceph): cephtool: fix initialization race
- Call GuiMonitor::link_elements before GuiMonitor::connect_signals.
It doesn't seem safe to set up callbacks before t...
- 10:51 PM Revision 89273b7f (ceph): cephtool: only initialize the tokenizer once
- Only initialize the tokenizer once. It gets cranky if we call
tok_init more than once.
Signed-off-by: Colin McCab...
- 10:51 PM Revision 329fbc24 (ceph): cephtool: gui: handle bad input in view_node
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 10:51 PM Revision 7f90cc27 (ceph): ceph: gui: update copyright foo
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 10:49 PM Revision 10466c52 (ceph): qa: add rbd test
- 09:55 PM Revision b44901cb (ceph): SubmittingPatches: initial version
- Largely based on Linux's version. Includes the Signed-off-by stuff at
the top, and a bit more modern description of ...
- 09:32 PM Revision b6ffdf18 (ceph): qa: add basic rbd test
- 09:10 PM Revision b434bb1a (ceph): osd: store locator with object_info; add incompat feature
- Also: We adjust the get_object_context() et al helpers to take a locator.
We include a locator (sometimes) in the MOS...
- 09:03 PM Revision ec8960ff (ceph): osd: make object_locator_t encodable
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:59 PM Bug #518: cfuse crashed on ls
- Okay, commit:4fd49203b6c757b97455f1b85d5b93c76e20e199 is a partial revert of my initial (incorrect) fix, but keeps th...
- 05:03 PM Bug #518 (In Progress): cfuse crashed on ls
- Okay, never mind my previous statements. I just hit this while using vstart.sh -n -d on testing.
Looking over potent...
- 10:32 AM Bug #518 (Resolved): cfuse crashed on ls
- Okay, best I can tell this is happening because of some weird interactions between a few different cfuse patchsets we...
- 08:49 PM Revision 771c2c44 (ceph): mds: Migrator needs to add_dir_context all the way to root.
- It was going to the default subtree root, which doesn't
work when we've just created a new subtree root out of the gi...
- 08:40 PM Revision 66e1d9fc (ceph): osd: fix unneeded get_object_context() (and leak) in _rollback_to
- All we want is the name of the head sobject_t, which is 'soid' in the
parent frame.
Signed-off-by: Sage Weil <sage@n...
- 08:30 PM Revision 4fe3ec91 (ceph): cephfs: remove unused variables
- Signed-off-by: Sage Weil <sage@newdream.net>
- 07:10 PM Revision ee27a61b (ceph): objecter: refactor interface with object_locator_t
- This paves the way for a locator that lets the user specify an arbitrary
string to hash for placement (instead of the...
- 07:10 PM Revision 7a688a9f (ceph): config: fix signal handler recursion
- Avoid having old handler pointer match the new handler.
Avoid calling an old handler if its pointer is null.
Signed-...
- 06:02 PM Revision 8f085108 (ceph): mds: pin NEEDSNAPFLUSH only when adding item
- This is mainly paranoia.
Signed-off-by: Sage Weil <sage@newdream.net>
- 04:42 PM Feature #509 (Resolved): assimilate ceph gui code
- It's in there. I integrated it with cephtool too.
cheers,
C.
- 02:20 PM Feature #524 (Resolved): object_locator_t
- 02:16 PM Tasks #442 (Closed): reconfigure cosd cluster
- did this. now set up like sepia (sudo make DESTDIR=/images/cosd). running latest unstable.
- 02:00 PM CephFS Bug #523 (Can't reproduce): cfuse locks don't wake on mds reconnect?
- Don't know the exact cause, but I was running clustered mds tests using snaptest-2.sh and once the MDSes had failed a...
- 04:27 AM Revision 1a0ac01f (ceph): osd: handle missing objects on snap read
- The old check in handle_op doesn't work because we don't provide a snap
context on read, and we haven't loaded one of...
- 04:27 AM Revision 8d37b280 (ceph): debian: change compat to 6 to match debhelper require
- Reported-by: Laszlo Boszormenyi <gcs@debian.hu>
Signed-off-by: Sage Weil <sage@newdream.net>
- 12:17 AM Revision 69f5ccdd (ceph): mds: store dir inode in separate object; fetch from both. incompat flag.
- This avoids setting large xattrs. There's no reason the inode needs to be
on the same object as the dir(frag) data.
...
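Revision ba6d931b at the top of this day's list switches non-recursive mutexes to PTHREAD_MUTEX_ERRORCHECK when lockdep is enabled. The effect of that attribute can be sketched with plain POSIX calls; `init_mutex` and `relock_result` are hypothetical helper names, not Ceph's Mutex class.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Initialize a non-recursive mutex. With `lockdep` set, request the
// error-checking type so misuse is reported through return codes.
int init_mutex(pthread_mutex_t* m, bool lockdep) {
  assert(m != nullptr);
  pthread_mutexattr_t attr;
  int r = pthread_mutexattr_init(&attr);
  if (r != 0) return r;
  r = pthread_mutexattr_settype(
      &attr, lockdep ? PTHREAD_MUTEX_ERRORCHECK : PTHREAD_MUTEX_NORMAL);
  if (r == 0) r = pthread_mutex_init(m, &attr);
  pthread_mutexattr_destroy(&attr);
  return r;
}

// Lock twice from the same thread and return the result of the second
// attempt. Only call this on an error-checking mutex: with
// PTHREAD_MUTEX_NORMAL the second lock would deadlock.
int relock_result(pthread_mutex_t* m) {
  int r = pthread_mutex_lock(m);
  if (r != 0) return r;
  r = pthread_mutex_lock(m);  // EDEADLK for an error-checking mutex
  pthread_mutex_unlock(m);    // release the first, successful lock
  return r;
}
```

The trade-off is a small amount of extra bookkeeping per lock operation, which is presumably why the commit ties the stricter type to lockdep being enabled.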
10/27/2010
- 10:15 PM Linux kernel client Tasks #480 (Resolved): rebase btrfs snapshot ioctls, resend to list
- 09:43 PM Revision 745a8ee5 (ceph): clarify CDir/CInode content comments a little bit
- 08:21 PM Revision c1d07816 (ceph): filestore: can force use of stale snaps
- also, overwrite the commit_seq with the current version in case we
forced stale snaps.
- 06:39 PM Revision bcc068ea (ceph): filestore: read commit_seq before mounting (btrfs ioctls)
- 06:39 PM Revision c1a6ee57 (ceph): filestore: don't revert to old snapshots on startup
- This should fix bug #55
- 05:14 PM Bug #522 (Resolved): osd: put potentially large pg info in separately object, not xattr
- I'm looking at the prior interval stuff, currently an attr on the head pg dir. This can be an object in meta/.
- 04:25 PM Bug #518: cfuse crashed on ls
- compiled b5d9bec659daa8ba26810e7508ec473aba8ad287 but is still crashing on ls:...
- 01:21 PM Bug #55 (Resolved): osd: fix transition from snaps -> no snaps -> snaps
- Fixed with commit:c1d078160a454c92fea899659d506e0b0ab7d92b.
- 11:45 AM Bug #521 (Resolved): objecter: crash in osdmap assert
- ...
- 06:28 AM Revision ae78ed42 (ceph): ceph.cc: delete deadcode
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:25 AM Revision 551711fb (ceph): Move ceph.cc to tools/
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 03:59 AM Revision a14dd819 (ceph): configure.ac: add ./configure option for gtk2
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 03:06 AM Revision 5fe0b5a0 (ceph): mds: fix split use after free; merge works
- Signed-off-by: Sage Weil <sage@newdream.net>
- 02:20 AM Revision b771ba89 (ceph): mds: simplify fragtree_t printer
- val/bits^split
Signed-off-by: Sage Weil <sage@newdream.net>
- 02:19 AM Revision 4afbc529 (ceph): mds: check/take wrlock on dirfragtreelock; unwind after freeze if needed
- Signed-off-by: Sage Weil <sage@newdream.net>
- 02:19 AM Revision 4c79f369 (ceph): mds: requeue dir if we can't split now due to dftlock
- Signed-off-by: Sage Weil <sage@newdream.net>
- 02:19 AM Revision 05fa106c (ceph): mds: implement frag.parse()
- 02:19 AM Revision 7bd00b96 (ceph): mds: implement command 'merge_dir path frag'
- Signed-off-by: Sage Weil <sage@newdream.net>
- 02:19 AM Revision 0f8f02d3 (ceph): mds: add 'mds bal split bits' config option (default 3)
- This is how many bits we fragment by, by default.
Signed-off-by: Sage Weil <sage@newdream.net>
- 02:19 AM Revision 2f9c9606 (ceph): client: fix dup entries in multifrag readdir
- We need a next_offset of 0 for non-leftmost frags. Otherwise we set
our dentry offsets incorrectly and the next_offs...
- 02:19 AM Revision 96d26e38 (ceph): mds: reimplement split_dir
- Do not use an mdrequest; the old approach was totally broken wrt freezing,
locks, and deadlock.
First freeze, then l...
- 02:19 AM Revision e1b53794 (ceph): mds: generalize split/merge call chain a bit
- Still need work at the lower levels.
Signed-off-by: Sage Weil <sage@newdream.net>
- 02:19 AM Revision 332195a2 (ceph): mds: clean up merge() callchain
- Signed-off-by: Sage Weil <sage@newdream.net>
- 02:19 AM Revision a4b21449 (ceph): mds: don't replicate new frags (at least for now)
- Leave commented out stubs in place.
- 02:19 AM Revision e79417ba (ceph): mds: move fragment checks into shared helper
- Signed-off-by: Sage Weil <sage@newdream.net>
10/26/2010
- 11:28 PM Revision 96beaf6c (ceph): messenger: always unlock existing pipes, even if they're lossy
- 07:58 PM Revision b5d9bec6 (ceph): client: Initialize Inode::truncate_size to 0 instead of -1, and check p...
- on truncation.
truncate_size needs to precisely match the defaults on the MDS, or we run into
problems when importin...
- 07:30 PM CephFS Bug #520 (Closed): mds: change ifile state mix->sync on (many) lookups?
- I'm seeing this on csyn --syn makefiles 1000 1 0...
- 07:04 PM Revision 2ed57d2a (ceph): Merge remote branch 'origin/testing' into unstable
- Conflicts:
configure.ac
src/rados.cc
- 07:00 PM Revision ef90cb5e (ceph): filestore: some cleanup
- 06:59 PM Revision 54fdd641 (ceph): filestore: escape the xattr chunk names
- 06:41 PM Revision 84b85aa6 (ceph): osd::Missing: const cleanup
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:41 PM Revision 45f7110d (ceph): osd: move PG::Missing implementation to PG.cc
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:06 PM Revision 44202873 (ceph): filestore: some cleanup
- 01:05 PM Bug #518: cfuse crashed on ls
- commit:b5d9bec659daa8ba26810e7508ec473aba8ad287 in testing. Waiting to hear back before closing.
- 10:34 AM Bug #518: cfuse crashed on ls
- These numbers confused me. When I get confused, I like to generate logs. Like the attached one.
- 09:47 AM Bug #518: cfuse crashed on ls
- And while I've got it here's the inode printout:
$4 = {ino = {val = 1099511632800}, snapid = {val = 18446744073709...
- 09:39 AM Bug #518: cfuse crashed on ls
- All right, got in and found:
Identical truncate_seqs of 2.
Identical truncate_sizes of 0.
prior_size of 209715200....
- 12:34 PM Feature #169 (Resolved): osd: start up despite corrupted pg log(s)
- 04:52 AM Revision 2a3e73bb (ceph): Merge branch 'btrfs_snap_ioctls' into unstable
- 04:52 AM Revision f131f429 (ceph): filestore: warn if btrfs_snaps enabled but no async snap create ioctl
- Signed-off-by: Sage Weil <sage@newdream.net>
10/25/2010
- 11:51 PM Revision 5e453454 (ceph): Merge branch 'objectcacher' into unstable
- 11:50 PM Revision b15e3b48 (ceph): client: fix to handle new ObjectCacher pool requirements.
- 11:50 PM Revision 38d7ddf2 (ceph): osdc: Add pool awareness to the ObjectCacher, to prevent unfortunate co...
- 11:45 PM Revision 7bb31f75 (ceph): osdc: Fix release_all so it loops properly
- 11:44 PM Revision a8f6ba94 (ceph): add cephfs to deb, rpm
- 11:44 PM Revision 00d54428 (ceph): mds: fix up mds_bal_frag options
- Use the mds_bal_frag option to enable/disable. Make checks consistent.
Signed-off-by: Sage Weil <sage@newdream.net>
- 11:44 PM Revision e275e855 (ceph): mon: remove pg from deleted pools from pg_map
- Signed-off-by: Sage Weil <sage@newdream.net>
- 11:36 PM Revision e27f0b1e (ceph): filestore: escape the xattr chunk names
- 10:24 PM Revision 961d3bc4 (ceph): PG::Log::Entry: remove unused snap_t field
- The snap_t information is stored in the sobject_t field now.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 10:11 PM Feature #359 (Resolved): osd: use new btrfs snapshot ioctls
- looks like the ioctl checks were fine. merged in commit:2a3e73bb325f89b708df1fc1fa889de238f5edd7
- 09:11 PM Feature #359: osd: use new btrfs snapshot ioctls
- Now that we know what's going in this cycle, we just need to make sure the ioctl checks are correct (no more DESTROY_...
- 09:08 PM CephFS Feature #519 (Closed): mds: dirfrag merge
- 08:42 PM Revision 61b3fc35 (ceph): makefile: add cephfs
- 07:31 PM Revision d4bbde5a (ceph): ./ceph osd setcrushmap: validate crushmap
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 07:08 PM Revision 394b0712 (ceph): crush: improve error handling in map decoding
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 05:10 PM Bug #518: cfuse crashed on ls
- Unfortunately I can't use these because my libraries don't match....gdb is finicky. :(
Can you instead run gdb you...
- 04:36 PM Bug #518: cfuse crashed on ls
- attached core dump and cfuse binary as requested on irc
- 10:59 AM Bug #518 (Resolved): cfuse crashed on ls
- cluster: 2 monitors, 2 metadata servers, 3 osds.
cfuses segfault when ls is run on the mount point (df works, cd /...
- 04:51 PM Bug #507 (Resolved): objectcacher mixes pool namespaces
- Merged into unstable as of commit:5e453454f8cc539de46d0ee2666e7a98e71a27a6.
- 03:56 PM Bug #507: objectcacher mixes pool namespaces
- Pushed to the objectcacher branch. I think this is done but need to make sure it's not breaking anything with its han...
- 12:35 PM Bug #517 (Resolved): monitors crashing on startup after injecting corrupt crush map
- Fixed by commit:d4bbde5ab171b37d1ecefdd396b7b04c6d41d0d2 and commit:394b0712bc2c12cba6b6043f633a9670c46e4df7
- 09:57 AM Bug #517: monitors crashing on startup after injecting corrupt crush map
- Need to decode the provided map in a try {} block to verify it is valid before using it. In OSDMonitor::prepare_comm...
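The note on Bug #517 above asks for the supplied map to be decoded inside a try block so that a corrupt map is rejected before it is used. A minimal sketch of that validate-then-commit pattern follows. `ToyMap`, its decimal-string format, and `try_set_map` are hypothetical stand-ins; the real CRUSH map encoding and the OSDMonitor code are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a map type whose decode() throws on bad input.
struct ToyMap {
  int num_devices = 0;

  // Toy format: the blob must be a non-negative decimal number.
  void decode(const std::string& blob) {
    if (blob.empty()) throw std::invalid_argument("empty map");
    std::size_t used = 0;
    int n = std::stoi(blob, &used);  // throws on non-numeric or huge input
    if (used != blob.size() || n < 0) {
      throw std::invalid_argument("malformed map");
    }
    num_devices = n;
  }
};

// Decode into a scratch copy first; replace `current` only on success.
bool try_set_map(ToyMap& current, const std::string& blob) {
  ToyMap candidate;
  try {
    candidate.decode(blob);
  } catch (const std::exception&) {
    return false;  // reject the input; `current` is left untouched
  }
  current = candidate;
  assert(current.num_devices >= 0);
  return true;
}
```

Decoding into a scratch object means a failure part-way through cannot leave the live map half-updated, which matches the symptom in the bug title (monitors crashing on startup after a corrupt map had been accepted).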
10/24/2010
- 09:28 PM Revision a869b35a (ceph): cap_reconnect_t: ignore embedded NULLs in the path
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:56 PM Linux kernel client Bug #498: reconnect sends string with NULL?
- By change a869b35abdab37bd4505f435bf0f7ab1860b28cc, we no longer assert when there's a NULL in the path string.
I'...
- 08:53 AM Bug #517 (Resolved): monitors crashing on startup after injecting corrupt crush map
- I followed the instructions at http://ceph.newdream.net/wiki/OSD_cluster_expansion/contraction to add a 3rd osd node ...
10/23/2010
- 08:47 PM Bug #516 (Closed): filestore: handle large xattrs on ext3
- 08:46 PM Bug #515 (Can't reproduce): osd: recovery isn't completing
- I'm seeing a few stray objects left over on sepia.
- 05:49 PM Revision e912e686 (ceph): v0.22.1
- 05:17 PM Revision 96d46737 (ceph): Makefile: add errno.h
- 05:17 PM Revision a974cfda (ceph): mds: be quiet about snaprealm push/pop
- Signed-off-by: Sage Weil <sage@newdream.net>
10/22/2010
- 11:06 PM Revision 69078946 (ceph): filestore: ignore ENOSPC on setxattr pending a better workaround
- This effectively reverts to old behavior (we weren't checking for ENOSPC
errors at all before). Log which object it ...
- 10:55 PM Revision 6826ce4a (ceph): filestore: change xattr chunk size to 2048
- 10:55 PM Revision 76157d91 (ceph): filestore: split xattrs to multiple chunks
- 10:55 PM Revision 3ee37ee7 (ceph): rados: add getxattr, setxattr
- 10:52 PM Revision 22bb2118 (ceph): filestore: change xattr chunk size to 2048
- 10:51 PM Revision 557e7e34 (ceph): mds: Add new LOCK_MIX_STALE state to lock structs.
- 10:51 PM Revision 512a1da9 (ceph): mds: Check for LOCK_MIX_STALE along with LOCK_MIX
- LOCK_MIX_STALE precludes writing to the protected data, but
in general cases it's an acceptable state whenever LOCK_M...
- 10:51 PM Revision f893a63b (ceph): mds: rename Locker::file_mixed to scatter_mix
- 10:51 PM Revision 372e8b3e (ceph): mds: Add bool "dirty" to ScatterLock, plus manipulation functions.
- Also add is_dirty() to SimpleLock so we don't need typing in these checks.
This lets us set that a dirfrag's account...
- 10:51 PM Revision 47a5fc95 (ceph): mds: Whenever we set locks to state LOCK_MIX, check is_stale()
- and set to state LOCK_MIX_STALE instead, if necessary.
- 10:51 PM Revision db6759fe (ceph): mds: use set_stale() as appropriate:
- 1) When we update a lock but can't write its new data,
2) We load potentially-stale data off disk (ie, in restart).
- 10:51 PM Revision b4fd986a (ceph): mds: Remove scatter_pins.
- We used these to prevent freezing a tree during gather-scatter ops,
but now we can just go stale on data when a scatt...
- 09:33 PM Revision 9d4f7b8e (ceph): librados: add rmxattr
- 08:36 PM Revision 429b2d99 (ceph): Revert "messenger: Make sure to unlock existing->pipe_lock. There are a...
- This reverts commit 96692d24c8cdf0fe88260949b67f8580e0c70696.
This patch accidentally got merged into the tree twice,...
- 06:50 PM Revision 242b5992 (ceph): test_lost.sh: put common functions in test_common
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:24 PM Revision 55fcbc64 (ceph): Merge branch 'msgr' into unstable
- 06:16 PM Revision 696da815 (ceph): messenger: If we error out of accept() but have messages in our queue, ...
- This can occur if we're replacing another Pipe and hit an error
in the process.
- 06:15 PM Revision 49d8fd8a (ceph): messenger: If we're replacing an existing Pipe, steal queue when we kil...
- Previously we could fail out after killing existing but before
splicing its queue into our own, which lost messages.
- 06:01 PM Revision bf0d347d (ceph): PG::peer: introduce prior_set_build flag
- Just because we have prior_set.empty() doesn't mean that the prior set
wasn't built. Create a flag to represent this ...
- 05:16 PM Revision b8c0d3df (ceph): filestore: update btrfs_ioctl.h
- 05:16 PM Revision 1a7a341d (ceph): filestore: use different encoding for snap async_create
- 05:16 PM Revision bb451d20 (ceph): filestore: use SNAP_DESTROY_ASYNC ioctl if available
- 05:16 PM Revision a3d8c1ff (ceph): filestore: remove stray async_snap_test if present
- This cleans up if a prior instance failed to delete its
async_snap_test subvol.
- 05:16 PM Revision 953ef1da (ceph): filestore: use new async btrfs ioctls
- 05:11 PM Revision 78352b32 (ceph): osd: fix deadlock in map handler
- To avoid deadlock,
- we need to drop osd_lock while we flush.
- we need to take map_lock _after_ we flush.
Signed-of...
- 05:10 PM Revision 515efd5a (ceph): rados: add getxattr, setxattr
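The lock ordering described in revision 78352b32 (drop osd_lock while flushing, take map_lock only after the flush) can be sketched as follows. This is an illustration under assumptions, not the OSD code: `flush()`, `handle_map()` and the `events` list are hypothetical, and reacquiring osd_lock before taking map_lock is a guess about the surrounding code.

```python
import threading

osd_lock = threading.Lock()
map_lock = threading.Lock()
events = []  # records what happened, for inspection


def flush():
    # Hypothetical stand-in for the flush. It records whether either lock
    # is held while it runs; both should be free.
    events.append(("flush", osd_lock.locked(), map_lock.locked()))


def handle_map(new_epoch):
    osd_lock.acquire()
    try:
        # Drop osd_lock while we flush, so that work the flush waits on
        # can take osd_lock itself.
        osd_lock.release()
        try:
            flush()
        finally:
            osd_lock.acquire()  # assumption: the handler continues under osd_lock
        # Take map_lock only after the flush has finished.
        with map_lock:
            events.append(("apply", new_epoch))
    finally:
        osd_lock.release()
```

Calling `handle_map(5)` leaves `events` holding one flush record with both lock flags false, followed by the apply record.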
- 05:10 PM Revision f96eb805 (ceph): filestore: split xattrs to multiple chunks
- 04:44 PM CephFS Feature #340: large directories, directory fragmenting
- We still need to add a wrlock of the dirfragtreelock.
- 04:23 PM CephFS Cleanup #514 (Rejected): Optimize MIX/MIX_STALE reconnects, etc
- Right now the MDS puts locks into the MIX_STALE state whenever it loads from disk. This is safe but unnecessary. Fix!
- 04:11 PM CephFS Feature #495 (In Progress): mds: add MIX_STALE
- A first pass is done and pushed to the mix_stale branch. Testing and debugging now, but that may take a while.
- 11:42 AM Bug #55: osd: fix transition from snaps -> no snaps -> snaps
- I think all we need to do is look at current/commit_op_seq. If it is greater than the newest snap, then that snap is...
- 11:25 AM Bug #505 (Resolved): osd assert on flab
- Well, that was a Duh.
Fixed in commit:49d8fd8a21778d0f805176d670d5f63f14e36b47 and commit:696da81588621ac9ee256993a1...
- 04:45 AM Revision 6a88d572 (ceph): mds: implement 'fragment_dir path frag by' command
- For testing dir fragmentation.
Signed-off-by: Sage Weil <sage@newdream.net>
- 04:38 AM Revision b4f82328 (ceph): Merge branch 'testing' into unstable
- 12:31 AM Revision ce050ef6 (ceph): Create cpp_strerror to make error reporting easier
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision dec5b787 (ceph): errno: add missing common/errno.h
- 12:31 AM Revision 881bf02d (ceph): include/utime.h: should include include/types.h
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision 399d31fa (ceph): test_lost.sh: ensure that recovery doesn't start.
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision 831075c4 (ceph): osd: PG::prior_set_affected: fix lost OSD detection
- When looking for newly-lost OSDs, we should check prior_set_lost rather
than prior_set. Down OSDs often are in PG::pr...
- 12:31 AM Revision 17c615c0 (ceph): osd: build_prior: clean up started_since_joining
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision 3cbeaa14 (ceph): prior_set_affected: log msg when we see a lost osd
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision 7207476e (ceph): PG::recover_master_log: replace count with find
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision 96692d24 (ceph): messenger: Make sure to unlock existing->pipe_lock. There are a few cas...
- 12:31 AM Revision ad270f91 (ceph): osd: test: Add script to test LOST state
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision dc18e7a0 (ceph): osd mon: validate arguments before marking lost
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision 3e4e73f2 (ceph): OSDMap::print: print osd_info_t using ostream op
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision 794cf707 (ceph): osd: fix spacing in OSDMap::print
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision e3a53bbf (ceph): osd: track prior_set_lost
- In the placement group code, track prior_set_lost. This fixes a bug
where a new OSDMap updates an OSD's lost_at time,...
- 12:31 AM Revision dca856d1 (ceph): PG::build_prior: update comment
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision f812f7eb (ceph): OSDMap: const cleanup
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision 1d8d744e (ceph): test_lost.sh: update timeout, fix payload
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision 10ec8ce5 (ceph): Timer.cc: add testtimers
- Add testtimers to test the timer code.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision 17b8b0d7 (ceph): TestTimers: test SafeTimer as well as Timer
- Test SafeTimer as well as Timer. Test timer shutdown.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:31 AM Revision 9c59a6aa (ceph): debian: 0.22-4
- 12:31 AM Revision 4d1b9e69 (ceph): makefile: simplify cdebugpack install rule
- 12:31 AM Revision 3d94b6af (ceph): FileJournal: fix journal size calculation
- If the journal is a raw block device, the user shouldn't need to give a
journal size argument most of the time-- it s...
10/21/2010
- 11:16 PM Revision acc2e4de (ceph): mds: show readdir frag
- Signed-off-by: Sage Weil <sage@newdream.net>
- 11:16 PM Revision e380fc2e (ceph): client: reset fg after _readdir_get_frag
- The _readdir_get_frag may remap our frag; update the local variable
accordingly.
Signed-off-by: Sage Weil <sage@newd...
- 11:15 PM Revision 0abf57b6 (ceph): client: fix skipped dentry on readdir chunk boundaries
- The at_cache_name is the last name successfully passed to the caller.
Signed-off-by: Sage Weil <sage@newdream.net>
- 11:15 PM Revision 59426bdd (ceph): client: fix dcache removal during multiple frags
- We remove unexpected dentries from our cache while processing mds results.
Results are ordered within a frag, but not...
- 11:13 PM Revision 6c2f0f07 (ceph): client: show file offsets in hex
- This makes it easy to pick out frags and offsets.
Signed-off-by: Sage Weil <sage@newdream.net>
- 10:50 PM Revision 28d89928 (ceph): messenger: a 0 timeout on ::poll really means don't wait
- (as opposed to -1, which waits until an event occurs).
So, set the default timeout to -1, and convert ms_tcp_read_ti...
- 08:07 PM Revision 32ba7760 (ceph): mds: fix inodestat encoding when frags are present
- Also simplify the max_size check calculation.
Signed-off-by: Sage Weil <sage@newdream.net>
- 07:48 PM Revision cb82eb59 (ceph): mds: do not finish_scatter_gather_update_accounted on dirfraglock
- This needs to match finish_scatter_gather_update, and we don't
update/project the dirfrag there.
Signed-off-by: Sage...
- 06:37 PM Revision 814f9dbd (ceph): objecter: reconnect on osd disconnect
- If the connection closes to an OSD, we need to reconnect and resubmit our
ops. Otherwise we just hang. This is prob...
- 06:18 PM Revision 34da1ac8 (ceph): rgw: return 204 on successful removal of bucket/object
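The poll timeout semantics noted in revision 28d89928 above (a timeout of 0 means don't wait, -1 means wait until an event occurs) can be seen with a small sketch. It uses Python's `select.poll` on a pipe rather than the messenger's sockets; the helper names `poll_readable` and `demo` are made up for illustration, and `select.poll` is only available on platforms that provide poll(2).

```python
import os
import select


def poll_readable(fd, timeout_ms):
    """Return the poll() result list for fd with the given timeout in milliseconds."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return poller.poll(timeout_ms)


def demo():
    read_fd, write_fd = os.pipe()
    try:
        # Timeout 0: nothing is readable yet, so poll returns at once with no events.
        idle = poll_readable(read_fd, 0)
        os.write(write_fd, b"x")
        # Timeout -1: block until an event occurs. Data is already queued,
        # so this call returns instead of waiting indefinitely.
        ready = poll_readable(read_fd, -1)
        return idle, ready, read_fd
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

A caller that passes 0 while intending "no timeout" ends up spinning or seeing spurious empty results, which is why the default is changed to -1 in that revision.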
- 06:18 PM Revision 44c78634 (ceph): init-ceph: Make sure daemon_is_running() checks the correct instance
- When starting multiple instances of a daemon on a single host,
for unknown reasons /var/run/ceph/$type.$id.pid can ho...
- 06:14 PM Revision 78660cd6 (ceph): objecter: pause writes when FULL flag is set
- Also, subscribe to all osdmap updates while FULL flag is set, so that we
discover when it is unset.
Signed-off-by: S...
- 06:14 PM Revision 66e493dd (ceph): objecter: always set READ or WRITE flag
- We should set either (or both). Assert if we don't.
Signed-off-by: Sage Weil <sage@newdream.net>
- 05:56 PM Revision 58f2f375 (ceph): include/utime.h: should include include/types.h
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:53 PM Revision c1f2f9a1 (ceph): rgw: return 204 on successful removal of bucket/object
- 04:07 PM Bug #505 (In Progress): osd assert on flab
- The error that exposed this was introduced in commit:8528ebb0c6286eb6660773fcaf29d1cccd98d72c, but the root cause is ...
- 03:36 PM Bug #513 (Closed): limited xattrs length
- When we have to use really large xattrs, we hit a limitation of the underlying fs. Need to figure o...
- 03:04 PM Bug #512 (Resolved): rados_initialize returns 0 when ceph.conf contains no monitors
- If ceph.conf contains no monitors, calling rados_initialize prints "unable to find any monitors in conf", doesn't act...
- 11:56 AM Feature #511 (Resolved): librados: implement flush
- Just wait for any previous writes to complete.
- 11:35 AM Bug #506 (Resolved): objecter: handle disconnects from osds
- Actually, it wasn't handling osd reconnects at all. Doh.
Fixed by commit:814f9dbdc57238d4e10c8e93fc298e9d3744516b
- 11:16 AM Bug #510 (Resolved): objecter: (optionally) honor osdmap full flag
- commit:78660cd6ebd9456a26df10c39a13226267061745
- 10:18 AM Bug #510 (Resolved): objecter: (optionally) honor osdmap full flag
- We don't want to honor it on the MDS, but we do for librados etc. Make it optional.
- 10:16 AM Bug #496 (Closed): osd: OSDMap::decode / PG::read_log
- See #502. Closing this one out.
- 10:09 AM Feature #509 (Resolved): assimilate ceph gui code
- Michael McThrow has written a simple ceph gui with similar functionality to 'ceph -w', but based on an old version of...
- 12:30 AM Revision a5f6da43 (ceph): errno: add missing common/errno.h
10/20/2010
- 11:38 PM Revision 1127e47c (ceph): Create cpp_strerror to make error reporting easier
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 10:27 PM Revision 18b1f78b (ceph): FileJournal: fix journal size calculation
- If the journal is a raw block device, the user shouldn't need to give a
journal size argument most of the time-- it s...
- 08:47 PM Revision 9e3607fe (ceph): debian: 0.22-4
- 08:47 PM Revision 9b4ec49c (ceph): makefile: simplify cdebugpack install rule
- 07:14 PM Revision 6620a5a8 (ceph): TestTimers: test SafeTimer as well as Timer
- Test SafeTimer as well as Timer. Test timer shutdown.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 07:13 PM Revision 1b0cf69b (ceph): Timer.cc: add testtimers
- Add testtimers to test the timer code.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:24 PM Revision 1c6349c9 (ceph): Merge remote branch 'origin/testing' into unstable
- 04:10 PM Revision 99013bad (ceph): mon: Don't force a wait of paxos_propose_interval seconds on every commit.
- Instead, we wait
1) Until last_commit_time + paxos_propose_interval
2) If past paxos_propose_interval, for paxos_min_...
- 01:48 PM Tasks #508 (Closed): test hadoop on sepia
- There are two options: the userland version (see http://ceph.newdream.net/wiki/Hadoop_FileSystem) and the kernel vers...
- 09:56 AM Bug #505: osd assert on flab
- Sage thinks it's a problem in reconnect, where the messenger is dropping messages which causes the OSD assert.
- 09:53 AM Bug #474 (Resolved): mon: improve paxos commit batching
- Pushed a fix to unstable in commit:99013badb676986deb82757b77d91d0aa1f54cc9.
Instead of waiting g_conf.paxos_propose...
10/19/2010
- 10:55 PM Revision 197928c2 (ceph): Objecter::shutdown(): call SafeTimer::Join()
- Objecter::shutdown() needs to call Timer::join() to ensure that
concurrently executing events in other threads get f...
- 10:33 PM Bug #507 (Resolved): objectcacher mixes pool namespaces
- ObjectCacher uses a single object map for all objects, regardless of pool. Whoops.
- 09:44 PM Revision 5aca7285 (ceph): btrfs_ioc_test.c: added a unitest
- 05:23 PM Bug #505: osd assert on flab
- because:...
- 05:23 PM Bug #505: osd assert on flab
- but osd.0 didn't see those two it skipped:...
- 05:20 PM Bug #505: osd assert on flab
- osd.1 is replying out of order:...
- 12:14 PM Bug #505 (Resolved): osd assert on flab
- When running a few tests with radostool, I hit an assert in the OSD:
(12:12:48 PM) colinm@newdream.net/: osd/Repli...
- 05:05 PM Bug #504 (Resolved): hang when using radostool
- The second issue looks like a transient osd issue.
Closing this for now, but we should keep an eye out for it happ...
- 04:03 PM Bug #504: hang when using radostool
- Perhaps 197928c26cec52e0f3f91e930988b1e5767e355b will resolve the radostool shutdown race condition.
The second ba...
- 12:06 PM Bug #504 (Resolved): hang when using radostool
- I was adding some objects using radostool, when I got an unexplained hang. It looked like this:
gdb -p 19724
(g...
- 05:05 PM Revision dac9ecd0 (ceph): SimpleMessenger::Pipe::Accept(): fix open
- When not replacing an existing pipe, zero the 'existing' pointer.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:36 PM Revision b55af75f (ceph): Revert "Revert "messenger: introduce a "halt_delivery" flag, checked by...
- This reverts commit d44267c2d6a77d4a3cda1e44ec7c58a19be51cc4.
The problem with this code was that it's possible for t...
- 02:55 PM Bug #506 (Resolved): objecter: handle disconnects from osds
- The kclient is smart about osd disconnect: if there are outstanding requests, it reopens the connection. Objecter do...
- 12:16 PM Bug #501: unexpected lockdep crash during vstart.sh
- I believe that the second crash I saw should be fixed by dac9ecd0e05f75744fd0f10ae51ec1d92e9931c1.
Resolved.
- 11:12 AM Bug #479: ceph/mount crash badly when writing
- What's the exact client version and kernel that you're running on?
Please do the following under ceph-client-standalo...
- 01:02 AM Bug #479: ceph/mount crash badly when writing
- Continuing with the ext4
I have set no-journal mode by commenting out the two lines in the ceph
; osd journal = /da...
- 10:20 AM Bug #484 (Resolved): msgr: crash on just-closed pipe
- I think the issue is in SimpleMessenger::Pipe::connect()...
10/18/2010
- 08:29 PM Revision c0db71fb (ceph): debian: update standards-version; fix ceph-client-tools-dbg
- 08:29 PM Revision aab2a360 (ceph): debian: sign/publish specific deb version
- 08:29 PM Revision fd42c852 (ceph): filestore: deliberate crash on ENOSPC or EIO
- Neither of these are handled, so crash when we hit them. This ensures we
don't blindly continue on with a partially ...
- 08:28 PM Revision 19c2c833 (ceph): filestore: deliberate crash on ENOSPC or EIO
- Neither of these are handled, so crash when we hit them. This ensures we
don't blindly continue on with a partially ...
- 08:02 PM Revision 781874f0 (ceph): messenger: Make sure to unlock existing->pipe_lock.
- There are a few cases in the "open" section where we can go to
fail_unlocked while still holding existing->pipe_lock....
- 05:19 PM Revision 1b2e9927 (ceph): debian: update scripts to do packaging fixes
- 04:59 PM Bug #479: ceph/mount crash badly when writing
- Looks like some issue with the journal:
2010-10-19 11:42:43.918144 7ffae0acf710 journal room 3928063 max_size 1048...
- 03:58 PM Bug #479: ceph/mount crash badly when writing
- Update: more concise setup :)
I created four simple files: file1 (1MiB), file10 (10MiB), file100 (100MiB), file1000 ...
- 03:27 PM Bug #479: ceph/mount crash badly when writing
- Thanks.
I've run the above lines, no crash. But there was nothing in the osdc.
I'm unsure what output to expect fro...
- 02:08 PM Bug #479: ceph/mount crash badly when writing
- From the osdc.txt, it looks as if none of the IOs are actually flushing to disk. Can you do a simple test like
<pre...
- 03:54 PM Bug #501: unexpected lockdep crash during vstart.sh
- I applied the fix, but then I got a different crash in cmon:
#0 0x0000000000000000 in ?? ()
#1 0x0000000000719b...
- 11:24 AM Bug #501 (Resolved): unexpected lockdep crash during vstart.sh
- Looks like it's caused by trying to pipe_lock.Lock() while holding existing->pipe_lock.
Should be fixed in commit:4b...
- 11:01 AM Bug #501 (Resolved): unexpected lockdep crash during vstart.sh
- I was running the unstable branch, at commit 1190313ae954f12f9b5bc364e1226d6d2440880c.
To test, I was running "vstar... - 03:13 PM Bug #354 (Resolved): Detect errors during transactions
- EIO and ENOSPC now checked.
- 02:23 PM Linux kernel client Tasks #499 (Resolved): avoid dcache_lock inside i_lock, if possible
- 11:54 AM Linux kernel client Tasks #499: avoid dcache_lock inside i_lock, if possible
- see commit:95c9f6141d0d4af18dd41165cc4e5a1d0fc10f57 ?
- 08:51 AM Linux kernel client Tasks #499: avoid dcache_lock inside i_lock, if possible
- Mmm, see this friendly thread: http://marc.info/?t=128721715100001&r=1&w=2
- 01:41 PM Bug #503 (Closed): osd: query osds since last_epoch_clean before concluding objects lost?
- We currently query prior_set osds through last_epoch_started. This gives us the latest log and version. But if we ar...
- 01:33 PM Linux kernel client Tasks #422 (Resolved): update ceph-client-standalone.git for multiple modules
- 01:32 PM Linux kernel client Bug #502 (Won't Fix): honor osdmap FULL flag
- We should return ENOSPC (presumably) if attempting to write to a full osd cluster.
This needs to go somewhere in o...
- 01:29 PM Bug #496: osd: OSDMap::decode / PG::read_log
- See commit:fd42c8527be21923d633b253a3260e1e600c1853 and commit:19c2c8332915c323defb2ff2e62bee2e7a3db845. These will ...
- 01:14 PM Bug #496: osd: OSDMap::decode / PG::read_log
- This all looks like fallout from a full disk and failed writes.
The unstable branch has some code to handle corrup...
- 12:47 PM Linux kernel client Bug #497 (Closed): (no request) in /sys/kernel/debug/ceph/*/mdsc?
- The stalled reconnect was probably #498.
- 10:19 AM Bug #484 (In Progress): msgr: crash on just-closed pipe
- Apparently the playground was failing because nobody could connect to the monitors, and with this commit reverted it ...
- 08:37 AM CephFS Bug #500 (Closed): mds: FAILED assert("shouldn't be called if we are already xlockable" == 0)
- nevermind, old code.
- 08:36 AM CephFS Bug #500 (Closed): mds: FAILED assert("shouldn't be called if we are already xlockable" == 0)
- ...
- 03:15 AM Revision d44267c2 (ceph): Revert "messenger: introduce a "halt_delivery" flag, checked by queue_d...
- This reverts commit 69be0df61d29a093dbeadf6dbcd4e18b429d0a22.
- 03:04 AM Revision 69b764a8 (ceph): mon: add 'mds rm <gid>' and 'mds rmfailed <id>' commands
- For cleaning up the mds map when things get weird.
Signed-off-by: Sage Weil <sage@newdream.net>
- 03:00 AM Revision ce09cbdd (ceph): Merge remote branch 'origin/testing' into testing
10/17/2010
- 09:18 PM Linux kernel client Tasks #499 (Resolved): avoid dcache_lock inside i_lock, if possible
- ...
- 08:50 PM Linux kernel client Bug #498 (Can't reproduce): reconnect sends string with NULL?
- I saw this with the latest v0.22 in the mds log:...
- 08:48 PM Linux kernel client Bug #497 (Closed): (no request) in /sys/kernel/debug/ceph/*/mdsc?
- saw this after an mds restart on ladder0:...
10/16/2010
- 01:59 AM Bug #496 (Closed): osd: OSDMap::decode / PG::read_log
- This morning I found out that 4 of my 12 OSD's had crashed at almost exactly the same moment, all with the following ...
10/15/2010
- 11:43 PM Revision 1190313a (ceph): Merge branch 'rc' into unstable
- Conflicts:
configure.ac
src/mds/ScatterLock.h
- 10:34 PM Revision 2bc159e6 (ceph): debian: no libgoogle-perftools-dev on lenny
- 10:34 PM Revision 8a7c95f6 (ceph): v0.22
- 08:41 PM Revision d8ee92a6 (ceph): mds: take nestlock wrlock when projecting rstat into dirfrag
- We were already checking that we _can_ wrlock before doing the rstat
projection (if we can't, we mark_dirty_rstat() o...
- 08:41 PM Revision 0e472d4a (ceph): mds: use correct helper when pinning past snaprealm parent
- The helper also updates the SnapRealm::open_past_parents, which is needed
for the have_past_parents_open() check.
Tha...
- 08:41 PM Revision b8ab009a (ceph): mds: cleanup: print waiter masks in hex
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:41 PM Revision 180f4412 (ceph): mds: cleanup: clarify issue_seq in cap release debug output
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:21 PM Revision 8528ebb0 (ceph): messenger: introduce timeouts on pipes.
- This will return read errors on a pipe if it gets no data
for the given period of time (default 15 minutes). In a sta...
- 05:41 PM Revision 6e1eeac3 (ceph): rgw: small cleanup
- 05:41 PM Revision b378cb48 (ceph): Add RGW_PRINT_CONTINUE to control wether we print the 100-continue header
- Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
- 05:17 PM Revision 32e790cf (ceph): conf: only set sig handler if wasn't set already
- Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
- 04:12 PM Bug #481 (Resolved): cosd leaking messenger threads
- 11:46 AM Feature #169: osd: start up despite corrupted pg log(s)
- 11:15 AM CephFS Cleanup #493 (Rejected): mds: allow scatter_pinned inode to go from mix -> sync
- We're going to kill scatter_pins instead; see #495
- 11:13 AM CephFS Feature #495 (Resolved): mds: add MIX_STALE
- ...
- 10:40 AM rgw Bug #439 (Resolved): Duplicate "Status" headers being sent
- Applied patch, commit:b378cb4899e78d1d1e5b81f376a2536e56fe54c4. Resolving.
- 10:16 AM Bug #494 (Resolved): reentrant sigabort handler?
- 10:16 AM Bug #494: reentrant sigabort handler?
- We set the signal handler multiple times (probably due to injectargs). Fixed by commit:32e790cf03c80b71cd224cf9c2e284...
- 07:43 AM Bug #494 (Resolved): reentrant sigabort handler?
- Something is amiss here? This was triggered by a regular assertion failure, on ceph version 0.22~rc (commit:60bfc670...
- 07:02 AM rbd Bug #489 (Closed): Memory leak when doing a lot of I/O
- I've got two rsync's running right now (Debian CD and kernel.org pub) without any problems at all. Memory usage is st...
- 05:47 AM rbd Bug #489: Memory leak when doing a lot of I/O
- I'm positive I used the latest version. I just backported qemu-kvm from Ubuntu 10.10 (Maverick) which is Qemu-kvm ver...
- 06:01 AM Bug #490: Cluster stays in a degraded state
- Tnx, this shows me a lot of information, but it's not clear what tells me which PG is degraded.
Just checked my OS...
- 03:06 AM Revision dfc46f5e (ceph): mon: do not assert if paxosv < monmap->epoch
- Signed-off-by: Sage Weil <sage@newdream.net>
- 03:06 AM Revision 406648e1 (ceph): mon: do not delete mon->monmap which is not created by new
- Signed-off-by: Sage Weil <sage@newdream.net>
10/14/2010
- 10:18 PM Revision 94c96fa8 (ceph): Merge remote branch 'origin/osd_pglog_checksums' into unstable
- 10:07 PM Revision 04189f84 (ceph): mds: fix can_scatter_pin() to be only SYNC and MIX
- Those are the only states where the replica can effectively prevent the
lock from cycling in a way that would force a...
- 10:06 PM Revision 9a8f1ad8 (ceph): object store: create OP_COLL_RENAME operation
- The OP_COLL_RENAME operation is used to rename collections.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 09:48 PM CephFS Bug #329: mds: mislinked dentry found during journal replay
- I suspect the solution (for the clustered case) is something like:
- trim_non_auth and a _subtree_ when we replay...
- 09:39 PM CephFS Bug #329: mds: mislinked dentry found during journal replay
- This can come up with multiple MDSs. (Wido saw it with one MDS; not sure how that happened.)
With multiple MDSs, ...
- 09:46 PM Revision 039a86f7 (ceph): doc: add object_store.dot
- Add object_store.dot. This graph is a rough sketch of the dependencies
between modules in the object store.
Signed-o...
- 09:42 PM Revision 966a5b84 (ceph): conf: actually handle long long config options from conf file
- 09:31 PM Tasks #417 (Resolved): update wiki article on mon cluster expansion for v0.22 and monitor naming ...
- 07:09 PM Revision ad12d5d5 (ceph): Fix bug #487: osd: fix hang during mkfs
- If the user has turned on journalling, but left osd_journal_size at 0,
normally we would use the existing size of the...
- 07:09 PM Revision 17de417f (ceph): FileJournal.h: add attribute __packed where needed
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:05 PM Revision 69be0df6 (ceph): messenger: introduce a "halt_delivery" flag, checked by queue_delivery.
- Defaults to false, is set to true by destroy_queue.
- 04:24 PM rbd Bug #489: Memory leak when doing a lot of I/O
- I do see the memory going up when running on 0.12.3, but not when running with the original version (the one on the r...
- 04:16 PM rbd Bug #489: Memory leak when doing a lot of I/O
- This patch doesn't compile (at least on my system). Are you sure you got the latest version running?
- 12:10 PM rbd Bug #489 (Closed): Memory leak when doing a lot of I/O
- I have a virtual machine with the following configuration:...
- 03:14 PM Bug #484 (Resolved): msgr: crash on just-closed pipe
- 11:10 AM Bug #484: msgr: crash on just-closed pipe
- Okay, make that commit:69be0df61d29a093dbeadf6dbcd4e18b429d0a22.
Adds a halt_delivery flag instead.
- 10:39 AM Bug #484: msgr: crash on just-closed pipe
- Pretty sure this was dealt with by commit:587d1d5b42c378ebc8ede04e9bc72d260ed04f93, which makes destroy_queue check f...
- 03:07 PM CephFS Cleanup #493 (Rejected): mds: allow scatter_pinned inode to go from mix -> sync
- 02:30 PM Bug #492 (Rejected): osd: do not remove divergent objects
- Instead of blindly removing divergent objects, we should try to be smart about recovery. Overwrite them with a diffe...
- 01:02 PM Bug #490: Cluster stays in a degraded state
- ceph pg dump -o -
should let you know which PGs are degraded. If you're still running Cephx and having issues betwee...
- 12:19 PM Bug #490 (Can't reproduce): Cluster stays in a degraded state
- My cluster is staying in a degraded state for the last few days....
- 12:26 PM Bug #491 (Can't reproduce): osd: pg incorrectly going active
- This wiped out some data on ceph-playground:...
- 12:08 PM Bug #487 (Resolved): osd: fix hang during mkfs
- Fixed by ad12d5d5be41ce740dfb8a6084484858d40898cc
cheers,
C.
- 11:15 AM Bug #487: osd: fix hang during mkfs
- I can reproduce every time by not specifying any osd journal size in my ceph.conf.
- 10:46 AM Feature #488 (Resolved): osd: prehash pg content into subcollections
- We want to pre-hash pg content into subcollections (subdirs) based on the same hash we map objects into pgs with, so ...
- 10:36 AM Bug #482 (Closed): cephx assert
- We decided this was caused by walking off into the weeds due to #484.
10/13/2010
- 08:29 PM Bug #487: osd: fix hang during mkfs
- This was on the testing branch.
Need to confirm the source of the problem and fix in testing; we'll merge it into ...
- 08:25 PM Bug #487 (Resolved): osd: fix hang during mkfs
- Ted writes on ML:...
- 07:11 PM Revision 60bfc670 (ceph): osd: fix MOSDBoot versioning
- 1 is what it was before; make it 2.
Signed-off-by: Sage Weil <sage@newdream.net>
- 07:08 PM Bug #460: OSD crash: ReplicatedPG::push_to_replica / Rb_tree
- Sage Weil wrote:
> This shouldn't ever happen...
I have it happening on quite a few OSDs in my test cluster. Gen...
- 06:20 PM Revision 0ff6e41d (ceph): RadosClient: clean up Rados::client use
- Forward declare RadosClient in librados.hpp so that we don't have to use
so many typecasts in class Rados.
Signed-of...
- 05:33 PM Revision 36b61da5 (ceph): mds: SimpleLock and subclasses: const cleanup
- Const cleanup for SimpleLock, ScatterLock, and LocalLock.
Make SimpleLock::get_state_name() nonvirtual, since nobody...
- 05:33 PM Revision d5d45039 (ceph): lists templates: const cleanup
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 05:09 PM Revision 7f493a11 (ceph): qa: add ffsb
- 04:40 PM Bug #484 (In Progress): msgr: crash on just-closed pipe
- Okay, it looks like there is a race between dispatch_entry and discard_queue. I'll patch that today, but I'd like to ...
- 01:17 PM Bug #484 (Resolved): msgr: crash on just-closed pipe
- The log shows the pipe 0x7fea08000c30 was just marked down:...
- 03:50 PM Revision e6d28ce3 (ceph): prefix git sha1 with commit:
- This just makes it into a link when pasted directly into redmine.
Signed-off-by: Sage Weil <sage@newdream.net>
- 03:34 PM Bug #479: ceph/mount crash badly when writing
- Update.
re-ran again, this time capturing sys/kernel/debug/ceph/*/
briefly,
10:55:00 - start the ceph (sudo m...
- 03:21 AM Bug #479 (Can't reproduce): ceph/mount crash badly when writing
- ceph version 0.23~rc (a7ed2ee05dc7453942018d7876401c28d3918214)
kclient master-backport
Linux ss1 2.6.36-020636rc7-... - 02:56 PM Bug #481 (In Progress): cosd leaking messenger threads
- The problem here is that tcp_read never times out, and OSDs don't write to sessions unless they're replying to someth...
- 08:40 AM Bug #481: cosd leaking messenger threads
- see ballpit3:/tmp/a
- 08:40 AM Bug #481 (Resolved): cosd leaking messenger threads
- 600 threads on ballpit3, running 0.22~rc, almost all messenger threads.
- 02:19 PM Subtask #486 (Resolved): osd: make scrub not block writes
- The overarching goal is to make scrub interact with writes. I think currently it holds the pg lock the whole time an...
- 02:13 PM Subtask #485 (Resolved): osd: cooperative scrub scheduling
- Each OSD probably needs some concurrency target (max concurrent scrubs). And a counter that indicates how many are i...
- 08:55 AM CephFS Feature #483 (Resolved): mds: add timestamp to LogEvent
- Would be nice if every log event had an mtime associated with it.
- 08:47 AM Bug #482 (Closed): cephx assert
- commit:e5882981b55f3c74d6b8b22a2bf5fbec81b775e6...
- 08:02 AM Linux kernel client Tasks #480 (Resolved): rebase btrfs snapshot ioctls, resend to list
- 01:55 AM CephFS Bug #478 (Can't reproduce): MDS crash: LogEvent::decode()
- On both my MDS'es I'm seeing the following crash:...
10/12/2010
- 10:26 PM Revision dc295a37 (ceph): mds: don't assert on mismatched rbytes
- 10:15 PM Revision 53decffc (ceph): Merge branch 'testing' into rc
- 10:15 PM Revision f35bdc28 (ceph): add rc to release.sh
- 09:42 PM Revision 098a4931 (ceph): mdsmonitor: remove unused variable
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 09:35 PM Revision fbb5a457 (ceph): mon: add 'ceph health' command
- Create MDSMonitor::get_health and OSDMonitor::get_health to check the
health of the MDSes and OSDes, respectively.
S...
- 08:59 PM Revision 219b4764 (ceph): mds: fix const-ness of is_dirty()
- This was fixed before, got lost somehow.
Signed-off-by: Sage Weil <sage@newdream.net>
- 07:42 PM Revision df265a22 (ceph): mon: don't include endl on clock drift warning
- 06:17 PM Revision dead368d (ceph): Makefile: add cdebugpack.in to EXTRA_DIST
- 02:49 PM Revision 53fe418d (ceph): mds: MDCache should adjust_nested_anchors once the op's been logged.
- Fixes crashes from assert(nested_anchors >= 0) failures
when updating at the wrong point.
- 02:49 PM Revision c56ab53f (ceph): mds: Locker::local_wrlock_finish now calls finish_waiters!
- Fixes a bug that could cause requests to hang since they were
put to sleep and never woken up.
- 02:49 PM Revision 4ba060cc (ceph): mds: CInode doesn't always call assimilate_dirty_rstate_inodes_finish
- This was causing a mis-match in the projection code, since
assimilate_...finish() calls pop_and_dirty_projected_inode...
- 02:49 PM Revision b438b3d6 (ceph): mds: Fix projection in rename code paths.
- We aren't actually projecting the inode unless destdn->is_auth(),
so check for that before projecting the snaprealm (... - 02:37 PM Cleanup #430 (Resolved): make simple 'ceph mon stat' check syntax
- 02:36 PM Cleanup #430: make simple 'ceph mon stat' check syntax
- Implemented "ceph health" in fbb5a457bacc656cd
The format is:
"HEALTH_OK|HEALTH_WARN|HEALTH_ERR <free-text-string...
- 10:41 AM Linux kernel client Bug #473: Kernel panic: ceph_pagelist_append
- It looks like it wasn't the master branch, but some outtake from the unstable branch, probably commit:53f05210b418eaa...
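The 'ceph health' output format noted under Cleanup #430 above (a status token followed by free text) could be consumed by a monitoring script roughly as follows. This is a minimal Python sketch, not code from the Ceph tree: parse_health and Health are hypothetical names, and because the format string in the note is truncated, treating everything after the status token as free text is an assumption.

```python
from typing import NamedTuple

# Status tokens as quoted in the Cleanup #430 note.
HEALTH_STATES = ("HEALTH_OK", "HEALTH_WARN", "HEALTH_ERR")


class Health(NamedTuple):
    """Hypothetical container for one parsed health line."""
    status: str
    detail: str


def parse_health(line: str) -> Health:
    """Split a health line into its status token and trailing free text.

    Assumption: the status token is the first whitespace-delimited word
    and the remainder of the line is free-form detail.
    """
    status, _, detail = line.strip().partition(" ")
    if status not in HEALTH_STATES:
        raise ValueError(f"unrecognized health status: {status!r}")
    return Health(status, detail.strip())
```

A caller might map the status to an exit code for a Nagios-style check; the detail text is passed through untouched because the note does not describe its structure.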
- 09:43 AM Linux kernel client Bug #473: Kernel panic: ceph_pagelist_append
- commit:299ef41b70e26e6725073c2d0f85e5da7aa547d0 touches similar code, although it's not clear to me that it could cau...
- 10:08 AM Linux kernel client Bug #477 (Can't reproduce): kernel BUG at fs/inode.c:295
- On the playground machine, kernel version 2.6.36-rc3.
client commit 5954ea853b08105190d960032aa33cc339b2a3f1
[601...
- 09:41 AM Linux kernel client Bug #464 (Resolved): fix bdi warning
- fixed upstream
- 04:25 AM Revision fc609846 (ceph): mds: avoid EXCL if mds_caps_wanted in _do_cap_update
- The file_excl() trigger asserts mds_caps_wanted is empty. The caller
shouldn't call it if that's the case. If it is...
- 04:13 AM Revision fa2c371f (ceph): mds: bump dirstat.version during link/unlink/mtime update
- Signed-off-by: Sage Weil <sage@newdream.net>
- 03:57 AM Revision 9e5a203d (ceph): mds: fix get_xlock() assert on slave xlock
- If we do a slave request xlock, the state is LOCK, not XLOCK. Weaken
the SimpleLock::get_xlock() assert accordingly....
- 03:32 AM Revision f9b102e0 (ceph): mds: bump rstat version in predirty_journal_parents
- When we propagate the rstat to inode in predirty_journal_parents (because
we hold the nestlock), bump the rstat versi...
10/11/2010
- 08:54 PM Bug #376 (Can't reproduce): File corruption after cluster crashes
- 05:50 PM CephFS Bug #472: mds: fragstat crash
- Well, this seems to have gotten rid of the first assert issue -- and made pjd last a bit longer -- and it's a bit mor...
- 04:50 PM CephFS Bug #472: mds: fragstat crash
- let's try...
- 04:32 PM CephFS Bug #472: mds: fragstat crash
- Applied patch you gave me. Got new crash:
#0 0x0000000000000000 in ?? ()
#1 0x0000000000a1e317 in sigabrt_handler...
- 09:51 AM CephFS Bug #472: mds: fragstat crash
- Similarly:
#0 0x0000000000000000 in ?? ()
#1 0x0000000000a1e2e7 in sigabrt_handler (signum=6) at config.cc:238
#...
- 04:46 PM Cleanup #430: make simple 'ceph mon stat' check syntax
- or just 'ceph health'
- 01:10 PM Cleanup #430: make simple 'ceph mon stat' check syntax
- * probably want to call it 'ceph mon health'
* should check status of all components, not just monitor
- 01:10 PM Tasks #476 (Resolved): wiki page for adding mds
- Looks good. Changed a couple things.
- 10:45 AM Tasks #476: wiki page for adding mds
- Something like this? http://ceph.newdream.net/wiki/MDS_cluster_expansion
I've also grouped the cluster expanding ac...
- 09:28 AM Tasks #476 (Resolved): wiki page for adding mds
- 10:09 AM Bug #475 (Resolved): failed to parse ceph_options
- Fixed by 566292a5871686e612b30bee58481db489b27bfb
- 10:05 AM Bug #326 (Resolved): OSD crash PG::IndexedLog::unindex
- fixed by commit:6bcda253e593b1f59f62a16798f56a92bdbbe0ab
- 09:44 AM Linux kernel client Bug #434: mds: clustered mds pjd failures
- To reproduce, you need to turn on mds thrashing (mds thrash exports = 1 in ceph.conf).
However, I've yet to get thes...
- 01:40 AM Linux kernel client Bug #473: Kernel panic: ceph_pagelist_append
- I'm not completely sure, I see my vmlinuz is from 30-09-2010, so about 12 days old.
*vmlinuz-2.6.36-rc5-rbd-20014-...
10/10/2010
- 11:54 PM Bug #475: failed to parse ceph_options
- System: 2 x Intel Xeon E5630 (8 cores), 16GB Ram
OS: Linux ss1 2.6.36-020636rc7-generic #201010070908 SMP Thu Oct 7 ...
- 09:32 PM Bug #475 (Resolved): failed to parse ceph_options
- from ML...
- 08:27 PM Bug #474 (Resolved): mon: improve paxos commit batching
- We should commit immediately if we haven't committed in the last 2 seconds. Currently we delay 2 seconds from the fi...
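The batching rule proposed in Bug #474 can be read as a small timing decision. The sketch below is one interpretation in Python, not the monitor's actual code: commit_delay is a hypothetical helper, the 2-second interval is the figure quoted in the report, and waiting out the remainder of the interval when a commit happened recently is an assumption, since the report text is truncated.

```python
COMMIT_INTERVAL = 2.0  # seconds; the figure quoted in the bug report


def commit_delay(now: float, last_commit: float,
                 interval: float = COMMIT_INTERVAL) -> float:
    """Return how long to wait before committing a newly queued update.

    If nothing has been committed within `interval` seconds, commit right
    away (delay 0). Otherwise (assumed behavior) wait until `interval`
    seconds have passed since the last commit so updates can batch.
    """
    elapsed = now - last_commit
    if elapsed >= interval:
        return 0.0
    return interval - elapsed
```

Under this reading an idle monitor never adds latency to the first update, while a busy one still commits at most once per interval.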
- 08:05 PM Linux kernel client Bug #473: Kernel panic: ceph_pagelist_append
- Do you know the commit id the client was running?
10/09/2010
- 11:35 AM Linux kernel client Bug #473: Kernel panic: ceph_pagelist_append
- I checked my mds log, this shows:...
- 05:22 AM Linux kernel client Bug #473 (Can't reproduce): Kernel panic: ceph_pagelist_append
- I was just doing a rsync of kernel.org, debian and ubuntu (simultaneous) and my client got a kernel panic.
The dme...
- 04:22 AM Tasks #417: update wiki article on mon cluster expansion for v0.22 and monitor naming changes
- Something like this? http://ceph.newdream.net/wiki/Monitor_cluster_expansion
- 12:23 AM Revision d2175ee8 (ceph): filestore: don't start commit if nothing new is _applied_
- We were starting a commit if we had started a new op, but that left a
window in which the op could be being journaled...
- 12:10 AM Revision a7ed2ee0 (ceph): mon: const crusade
- Make print_summary, print, dump, etc. functions const methods.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
10/08/2010
- 08:55 PM Revision 55370d3a (ceph): cdebugpack: update Makefile.am, add missing line
- 07:16 PM Revision 3d9a93ed (ceph): mount.ceph: make -v a little more verbose
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 07:07 PM Revision 8efef663 (ceph): mount.ceph: const cleanup
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:09 PM Revision 566292a5 (ceph): mount.ceph: allow the user to omit ceph_options
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 05:44 PM CephFS Bug #472 (Resolved): mds: fragstat crash
- see pudgy:/home/gregf/logs/fragstat_assert...
- 12:40 PM CephFS Cleanup #468 (Resolved): mds: use enum for LOCK_* in mds/locks.h
- 12:15 PM Linux kernel client Bug #471: NULL pointer dereference __list_add+0x42/0x89 kick_requests+0x24/0x9e
- Here's the full dmesg, fwiw:...
- 12:04 PM Linux kernel client Bug #471 (Can't reproduce): NULL pointer dereference __list_add+0x42/0x89 kick_requests+0x24/0x9e
- On commit:0d328c1...
- 06:21 AM Revision 0b26f315 (ceph): mon: class library encodes/decodes activated class
- This fixes bug #470
- 01:12 AM Revision 932cfcbe (ceph): mount.ceph: add usage message
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 01:07 AM Revision 35c08d5f (ceph): mount.ceph: argument parsing cleanup
- * Functions that are local to the file are now static
* Don't modify the string argument to mount_resolve_src / pars...
10/07/2010
- 11:56 PM Bug #460: OSD crash: ReplicatedPG::push_to_replica / Rb_tree
- node07 and node12 are online again (about 12 hours).
- 01:55 PM Bug #460: OSD crash: ReplicatedPG::push_to_replica / Rb_tree
- The problem here is that we don't have the snapset attr. This happens when there is no _head and no _snapset object....
- 01:08 AM Bug #460: OSD crash: ReplicatedPG::push_to_replica / Rb_tree
- I just saw this crash again.
Used "cdebugpack" to gather the right files.
Added "issue_460_node02.tar.gz" to th...
- 11:19 PM Bug #470 (Resolved): Class gets disactivated
- Fixed by commit:0b26f3153f7aa06b70ebbab7aa61887bfe634909.
- 04:23 PM Bug #470 (Resolved): Class gets disactivated
- From time to time we see cases where classes lost their 'active' status. Might happen after restarting the monitors.
- 11:17 PM Revision 6679c274 (ceph): osd: move to boot state if down OR wrong address in map
- Saw an OSD that was up in the map, but the address didn't match. Caused
all kinds of strange behavior. I'm not sure...
- 11:17 PM Revision 6bcda253 (ceph): osd: loosen caller_ops asserts
- The problem is that merge_log adds new items to the log before it unindexes
divergent items, and that behavior is nee...
- 11:17 PM Revision 873095be (ceph): osd: fix merge_log cut point
- Look at the eversion.version field (not the whole eversion) when deciding
what is divergent. That way if we have
ou...
- 06:16 PM Tasks #441: reconfigure sepia cluster
- All the daemons are running now! I'm still testing the stability of everything, of course.
Also, the make install ...
- 01:56 PM Tasks #441 (Resolved): reconfigure sepia cluster
- 06:14 PM CephFS Cleanup #468: mds: use enum for LOCK_* in mds/locks.h
- Implemented in the cleanup branch.
C.
- 04:47 PM Revision 6545f3ca (ceph): cdebugpack: behave when /bin/sh is dash
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:38 PM Revision af749e62 (ceph): cdebugpack: man page
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:31 PM Revision 9805eb5b (ceph): cdebugpack: include cdebugpack.XXXX dir in tarball
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:31 PM Revision 2c49ac4d (ceph): cdebugpack: include .tar.gz in usage filename
- 04:25 PM Revision 3b1b8f89 (ceph): cdebugpack: include in deb, rpm
- 02:52 PM Revision f10906b3 (ceph): mds: respawn (instead of suicide) on being marked down
- This makes temporarily laggy daemons restart and rejoin the cluster
in standby mode.
Signed-off-by: Sage Weil <...
- 02:52 PM Revision a2bcb419 (ceph): debug: always append to log
- We were truncating if we were in log_per_instance mode. But normally those
logs don't exist. And if they do, we pro...
- 02:28 PM Revision a7deada2 (ceph): init-ceph: DTRT when cconf returns host = localhost
- cconf behavior was just changed by bcf1bdef56a256d4857dd4f9d859acca631cc347
Signed-off-by: Sage Weil <sage@newdream....
- 09:46 AM Feature #463 (Resolved): tool to capture debug info
- commit:6545f3ca1c9d358870e643bb511bd318710f2b94
- 12:51 AM Feature #463: tool to capture debug info
- There is a little bug in "cdebugpack".
/bin/sh is used as interpreter. On Debian systems /bin/sh is symlinked to /b...
- 09:18 AM Bug #469 (Rejected): Profiler detection is inaccurate
- Upon further inspection, I don't think this is a problem with the detection scripts, since IsHeapProfilerRunning is r...
- 08:09 AM Bug #469 (Rejected): Profiler detection is inaccurate
- After the latest git update, i.e., 22nd-Sept (unstable)
The 'make' breaks down. Here's the last line.
> /bin/bash...
- 07:52 AM CephFS Feature #466 (Resolved): mds: respawn on suicide
- commit:f10906b3fdb720ef822478c7221836d67becef2b
- 03:30 AM Revision a18213d6 (ceph): debugpack: add ceph-pg-dump
- 03:04 AM Revision f6e49cbb (ceph): cdebugpack: save some more info
- ceph.conf
ceph -s
ceph osd dump
ceph mds dump
10/06/2010
- 11:42 PM Revision 8b716c6d (ceph): mds: Check the lock state, not the inode state!
- This was causing a lot of slowdowns.
Additionally, pin the inode when exporting caps -- otherwise it could
disappear ...
- 11:06 PM Revision b778f830 (ceph): osd: on clearing corrupt logs, call pg::write_info
- After changing PG::info, call PG::write_info to get the on-disk
information back in sync with the in-memory state.
S...
- 09:51 PM Revision 23bcc53a (ceph): Merge branch 'unstable' into osd_pglog_checksums
- 09:33 PM Revision 430377be (ceph): v0.23~rc (new unstable branch)
- 08:42 PM Revision 48196f91 (ceph): Merge branch 'testing' into unstable
- Conflicts:
src/osd/ReplicatedPG.cc
- 08:41 PM Feature #463: tool to capture debug info
- Yehuda can you make a quick man page?
- 08:40 PM Feature #463 (In Progress): tool to capture debug info
- add to deb, rpm packages
- 08:29 PM Feature #463 (Resolved): tool to capture debug info
- done with commit:a18213d6fab3910ed75c838a150573b5456d8cec.
- 04:09 PM Feature #463: tool to capture debug info
- (04:08:30 PM) sage@newdream.net/slip: logs, binaries, core
(04:08:36 PM) sage@newdream.net/slip: /usr/lib/debug bina...
- 08:21 PM Revision e5882981 (ceph): osd: fix pull completion tests, again
- op->complete==false is inconclusive.
Signed-off-by: Sage Weil <sage@newdream.net>
- 08:21 PM Revision 47f2efb2 (ceph): osd: log error instead of crashing on failed pull attempt
- If peering screws up and the primary mistakenly tries to pull an object
from us we don't have, log an error instead o...
- 08:05 PM Revision a2806854 (ceph): osd: save corrupt pg_logs to a special collection
- If the PG log is corrupt when we start up, save it to a special
collection so that we can examine it later.
Signed-o...
- 08:01 PM Revision f6b47e38 (ceph): osd: clean out redundant (and wrong) complete calculation
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:01 PM Revision 1bb60b45 (ceph): osd: make sparse data/clone push behave with partial object push
- We can't error out if we don't get everything we want in one go now that
we support pushing objects in pieces. Remov... - 05:39 PM CephFS Cleanup #468 (Resolved): mds: use enum for LOCK_* in mds/locks.h
- We just fixed a bug that (I think?) the compiler would have warned about.. in->get_state() == LOCK_MIX instead of loc...
- 04:45 PM Revision 5ef97562 (ceph): Merge branch 'osd_lost_objects' into unstable
- 04:41 PM Linux kernel client Bug #459 (Resolved): bonnie++ is slow on clustered mds
- Solved the most apparent issue, which is that if the kclient had already dropped caps for the MDS on an existing inod...
- 04:23 PM Feature #169: osd: start up despite corrupted pg log(s)
- Done. We put each corrupt pg log in a new collection.
- 04:17 PM CephFS Bug #295 (Can't reproduce): mds: can't rmdir due to dir size underflow
- 07:06 AM Revision ed3976ce (ceph): rgw: change default content type to binary/octet-stream
- 05:04 AM Revision 1f94a8fe (ceph): monclient: fix leaks in build_initial_monmap address lookup
- Signed-off-by: Sage Weil <sage@newdream.net>
- 05:02 AM Revision 7935e30e (ceph): monclient: fix off-by-one buffer overrun
- Still leaked, though.
Signed-off-by: Sage Weil <sage@newdream.net>
- 05:01 AM Revision 16f053f7 (ceph): addr_parsing: remove unused mount_path logic
- This was breaking parsing if any of the hosts included a ":port" too.
Signed-off-by: Sage Weil <sage@newdream.net>
- 12:05 AM rgw Bug #467 (Resolved): change default content type
- 12:05 AM rgw Bug #467: change default content type
- Should be fixed with commit:ed3976ce562908a0df02828d7c8d3dc79fa6443e.
- 12:03 AM rgw Bug #467 (Resolved): change default content type
- If content type was not specified we need to set it as 'binary/octet-stream' and not as 'text/plain'.
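The fallback described in rgw Bug #467 amounts to a one-line default. A minimal Python sketch of that rule follows; it is illustrative only and not the rgw implementation. effective_content_type is a hypothetical name, and treating an empty or whitespace-only value as "not specified" is an assumption.

```python
from typing import Optional

# Default named in Bug #467 (replacing the earlier 'text/plain').
DEFAULT_CONTENT_TYPE = "binary/octet-stream"


def effective_content_type(requested: Optional[str]) -> str:
    """Return the content type to store for an uploaded object.

    Falls back to the default when the client supplied no value.
    Assumption: an empty or blank string counts as unspecified.
    """
    if requested is None or not requested.strip():
        return DEFAULT_CONTENT_TYPE
    return requested.strip()
```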
10/05/2010
- 11:47 PM Revision b2774979 (ceph): Merge remote branch 'origin/testing' into unstable
- 11:47 PM Revision 6a53d733 (ceph): Merge branch 'unstable' of ssh://ceph.newdream.net/home/sage/ceph.newdr...
- 11:26 PM Revision 109dcdf6 (ceph): cdebugpack: add a utility to generate a debug package
- 10:47 PM Revision 4bc4cba5 (ceph): osd: ignore info queries on deleting pgs
- Since we cancel deletion on pg change, we will only receive these from
old primaries, so we can safely ignore.
Signe...
- 10:47 PM Revision a4eb5996 (ceph): osd: cancel deletion on pg change
- If the primary changes, cancel deletion so that the new primary has the
benefit of considering whether they need anyt...
- 10:47 PM Revision ed2eee54 (ceph): config: fix address list parsing
- Skip past comma, whitespace.
Signed-off-by: Sage Weil <sage@newdream.net>
- 10:44 PM Revision 414bc4f9 (ceph): cmon: better error handling
- If we can't create the mon0/magic file, show an error message rather
than calling assert(). These cases are probably ...
- 10:28 PM CephFS Feature #466 (Resolved): mds: respawn on suicide
- Either that, or we need some wrapper that restarts the daemon. Otherwise a cmds that gets laggy and is replaced won'...
- 10:16 PM Linux kernel client Bug #465 (Resolved): need to refresh osdmap when full flag is set
- Something as simple as calling
ceph_monc_request_next_osdmap(&osdc->client->monc);
before retur...
- 10:02 PM Revision bcf1bdef (ceph): conf: cconf return default values from config.cc if not found
- 07:38 PM Revision 12373a6e (ceph): mds: allow do_null_snapflush on multiversion inodes
- The _do_snap_update() can handle a multiversion inode. Behave when
_do_null_snapflush() encounters one.
Signed-off-...
- 07:26 PM Revision e064796b (ceph): signal handlers: be more elaborate about caught signals
- 07:16 PM Revision 22c38466 (ceph): mds: don't call mrk_dirty_rstat for base/root inodes
- Base inodes have no parent.
Signed-off-by: Sage Weil <sage@newdream.net>
- 07:05 PM Revision 3e56ac4b (ceph): dump backtrace when getting sigsegv and sigabrt
- 06:54 PM Revision f5958ad5 (ceph): mds: set dir layout during replay
- Need to copy layout from the EMetaBlob::fullbit into the inode.
Signed-off-by: Sage Weil <sage@newdream.net>
- 06:54 PM Revision 09b2db73 (ceph): mds: use helper to update inode from EMetaBlob during replay
- Removes 3 copies of this code.
Signed-off-by: Sage Weil <sage@newdream.net>
- 06:54 PM Revision 11a24f5e (ceph): mds: set root dir_layout during mkfs
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:54 PM Revision d600596a (ceph): mds: fix EMetaBlob dir_layout lifecycle
- Initialize, delete pointer.
Signed-off-by: Sage Weil <sage@newdream.net>
- 06:54 PM Revision 95e273a6 (ceph): mds: zero inode layout for dirs
- These aren't used for anything.
Also rename the default_dir_layout to _log_, since that's all that we now
use it for...
- 06:54 PM Revision 50d91f62 (ceph): osd: less chatty in log about caps
- 06:54 PM Revision 994525ad (ceph): mds: fix typo in EMetaBlob encoder
- This was wrongly setting the dir_layout_exists flag to true.
Signed-off-by: Sage Weil <sage@newdream.net>
- 06:54 PM Revision cdc2b898 (ceph): mds: set root inode default_file_layout on mkfs
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:20 PM Revision ede37634 (ceph): mds: fix LocalLock xlocking by replacing default
- 06:20 PM Revision e4d86f31 (ceph): client: Fix truncate_seq/truncate_length initialization.
- Initializing to 0 was causing file_to_extents to get called on every inode
since the MDS initializes truncate_seq to ...
- 06:08 PM Revision 5febcb90 (ceph): osd: read_log: clear the pagelog if it is corrupt
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:05 PM Revision e10f4607 (ceph): Merge branch 'unstable' into osd_pglog_checksums
- 05:12 PM Revision f4581e0d (ceph): mds: fix ESession/ESessions event id type again
- Not sure how many times we've screwed this one up!
Signed-off-by: Sage Weil <sage@newdream.net>
- 04:57 PM Revision ff463df5 (ceph): filestore: drop unused parse_coll() declaration
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:17 PM Feature #463: tool to capture debug info
- commit:baa3772b1558af280a878c7b32b1d739c4054ed3 introduces cdebugpack. Generates a tar.gz (name needs to be specified...
- 10:48 AM Feature #463 (Resolved): tool to capture debug info
- - /usr/bin/ binary
- /usr/lib/debug/usr/bin symbol binary (if any)
- core files (if any)
- logs
Maybe it should...
- 01:39 PM CephFS Tasks #365 (Resolved): test snaptests against single mds failure
- 12:37 PM Linux kernel client Bug #464 (Resolved): fix bdi warning
- I'm seeing this on the unstable branch:...
- 12:24 PM Feature #446 (Resolved): dump stack to log on segfault
- We'll keep the backtrace in the assertion code for now. Commit:e064796bea3985c088e74f75f35637225827bab8 adds some inf...
- 12:03 PM Feature #446: dump stack to log on segfault
- Commit:3e56ac4b377a3f39f040556beffb7c58cc2baea4 adds the signal handling part. Need to decide whether we keep the cur...
- 10:27 AM CephFS Bug #362: mds: rejoin crashes on snaptest-2 workload
- work on recovery in v0.23
- 10:26 AM CephFS Bug #395 (Resolved): mds: interval_set assert(0) during journal replay
- 10:26 AM CephFS Bug #426 (Resolved): mds: rstat propagation
- 04:22 AM Bug #460: OSD crash: ReplicatedPG::push_to_replica / Rb_tree
- I'm now seeing this crash on multiple OSD's.
Added some coredumps to the collection on the logger machine.
10/04/2010
- 09:45 PM Bug #461: Hanging OSD during recovery
- While testing #462, I restarted osd6 to see if the cephx problems went away.
During boot, osd6 started to hang to...
- 09:34 PM Bug #461 (Closed): Hanging OSD during recovery
- The OSD shut down after about 3 hours, it seems, without any logging, so we probably won't find whatever caused the...
- 12:01 PM Bug #461 (Closed): Hanging OSD during recovery
- While my cluster was recovering from a few OSD crashes, one of my OSD's....
- 09:39 PM Bug #462 (Resolved): cephx: verify_authorizer_reply exception in decode_decrypt
- Since I started using _cephx_ on my cluster I started seeing these messages in my logfiles.
Now for example, I see...
- 09:24 PM Bug #460: OSD crash: ReplicatedPG::push_to_replica / Rb_tree
- I just tested if I could start the OSD again, but it crashed again, with almost the same backtrace:...
- 11:55 AM Bug #460 (Can't reproduce): OSD crash: ReplicatedPG::push_to_replica / Rb_tree
- After my cluster recovered from the latest crashes, I wanted to check if my RBD data was still intact.
This cause...
- 08:59 PM CephFS Bug #451: mds: replay error
- Uhh...Sorry, I thought the log should be enough, so I re-deployed the cluster and destroyed everything...
- 10:09 AM CephFS Bug #451: mds: replay error
- Henry Chang wrote:
> OK.. I've put it on the gateway machine: /tmp/ceph_logs/mds.1.log.gz
Got it, thanks.
Okay...
- 06:21 PM Revision c3d3b422 (ceph): Merge branch 'testing' into unstable
- Conflicts:
src/mds/Locker.cc
- 06:08 PM Revision 7aab70dd (ceph): Merge branch 'file_layouts' into unstable
- Conflicts:
src/mds/CInode.cc
src/mds/CInode.h
src/mds/MDCache.cc
src/mds/SimpleLock.h
- 06:04 PM Revision 2b4eb4ab (ceph): add set layout ops to ceph_strings
- 06:04 PM Revision 45fa4a2f (ceph): mds: Conditionally encode default dir layout.
- Previously we unconditionally encoded the standard layout, which
on a directory inode is meaningless. So, use that sp...
- 06:04 PM Revision 8938f271 (ceph): cephfs: Wrote and committed cephfs
- 06:04 PM Revision 212c1890 (ceph): client: update test_ioctls to test new stuff
- 05:50 PM Revision b5889832 (ceph): always throw by value; always catch by const ref
- Always throw exceptions by value rather than as pointers. Always catch
exceptions as const references to avoid uneces...
- 05:42 PM Revision 2d194c67 (ceph): mds: If a projected inode has a dir_layout, we now encode it to disk.
- 05:42 PM Revision cb7b3601 (ceph): mds: misc fixes for dir default layout projection
- 05:42 PM Revision 64c3556d (ceph): mds: fix setlayout truncation check.
- The trunc_seq is initialized to 1 in prepare_new_inode.
- 05:42 PM Revision fbbf4481 (ceph): client: import ioctl header from ceph-client
- 05:42 PM Revision 79d18933 (ceph): mds: zero out the layout in handle_client_setlayout
- Could have led to an invalid layout by mistake.
- 05:42 PM Revision 42c7ed44 (ceph): mds: Implement op CEPH_MDS_OP_SETDIRLAYOUT.
- Implement handler functions, add to inode projection machinery, etc.
- 05:42 PM Revision 54e95fed (ceph): mds: Look for and make use of directory tree default layouts, if existent.
- 03:50 PM Revision 01ae1be2 (ceph): filestore: make list_collections() list all dirs
- coll_t is now unstructured; list all dirs besides '.' and '..'.
The old coll_t::parse() was broken. Remove it. Fix...
- 03:44 PM Revision 940354b9 (ceph): osd: make load_pgs verbose
- Show what it's skipping and why.
Signed-off-by: Sage Weil <sage@newdream.net>
- 11:42 AM Bug #458 (Won't Fix): OSD::activate_pg
- This is from the old (broken) recovery code attempting to forget lost objects. The bandaid is to just comment out th...
- 11:35 AM Bug #458 (Won't Fix): OSD::activate_pg
- On one of my OSD's (osd7) I started to see:...
- 11:35 AM Linux kernel client Bug #459 (Resolved): bonnie++ is slow on clustered mds
- We tracked it down to a problem with cap revocation while deleting inodes. The MDS is requesting that the kclient dro...
- 11:32 AM Linux kernel client Bug #434 (In Progress): mds: clustered mds pjd failures
- Looking at this now.
- 11:27 AM Bug #428 (Resolved): osd: recovery stalls on mismatched snapset and object
- There's a separate issue open for the remaining issue #453. Closing this one out.
- 11:19 AM CephFS Bug #447 (Resolved): mds: failed assert(cap) in void Locker::handle_client_caps(MClientCaps*)
- I suspect this one is fixed by commit:113a9bcd957839f2838c0e0cb80c25108278fde2, which will be in v0.21.4 and v0.22. ...
- 11:17 AM Feature #185 (Resolved): mds: set file layout policy on directory hierarchy
- Pushed in commit:7aab70ddc464355f068a143ea0e972183c155f24 (userspace) and commit:f670ee7872e51842e817e1606539e3c72e4b...
- 11:04 AM Feature #457 (Rejected): osd: alphanumeric names
- 11:03 AM Bug #450 (Won't Fix): osd named with leading/padding 0 gets stripped
- 11:03 AM Bug #450: osd named with leading/padding 0 gets stripped
- This is normal. The OSD ids are purely numeric (ints). We could add a layer of alphanumeric names at some point, bu...
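The behavior reported in Bug #450 follows directly from ids being stored as integers: any zero padding is lost on the round trip. The Python sketch below only illustrates that effect; normalize_osd_id is a hypothetical helper and does not reproduce Ceph's own id parsing.

```python
def normalize_osd_id(name: str) -> str:
    """Round-trip an OSD id through an integer, as a purely numeric id would be.

    Leading zeros are not part of an integer's value, so they are lost
    once the id is parsed and printed again.
    """
    if not (name.isascii() and name.isdigit()):
        raise ValueError(f"OSD ids are numeric, got {name!r}")
    return str(int(name))
```

For example, an OSD configured as "01" and one configured as "1" normalize to the same id, which is why the padded form cannot be preserved without a separate naming layer.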
- 10:19 AM Feature #456 (Resolved): make dumpjournal functionality usable
- It could be integrated into cmds? Maybe something like,...
- 09:14 AM Bug #455 (Resolved): OSD::_create_lock_pg
- fixed by commit:01ae1be288bae196180ad03065e14be867b5e12e
- 12:40 AM Bug #455: OSD::_create_lock_pg
- I just checked (haven't checked the cluster state for about a day and a half) and then found that osd11 crashed again w...