Activity
From 10/31/2010 to 11/29/2010
11/29/2010
- 11:52 PM Revision c9f864a0 (ceph): osd: PG::trim: fix inverted conditional in assert
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 11:12 PM Revision b2bcf4b3 (ceph): common: prevent infinite recursion on SIGSEGV
- Install SIGSEGV / SIGABRT handlers with sigaction using SA_RESETHAND.
This will ensure that if the signal handler it...
- 10:12 PM Revision 85191813 (ceph): osd: Create pg_split test
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 09:35 PM Revision fb60e114 (ceph): logger: Fix a crash when the MDS shuts down cleanly.
- We weren't holding the lock on the logger_timer before calling shutdown.
- 09:35 PM Revision b4db4100 (ceph): Timer: add some asserts to catch certain errors.
- 08:56 PM Revision adbb5459 (ceph): osd: some notify simplifications and FIXMEs
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:56 PM Revision ec15c465 (ceph): osd: track unconnected_watchers and when they expire
- - set up an initial expiration when we load the obc off disk
- remove expiration when we connect to an existing watch...
- 08:55 PM Revision 376870fa (ceph): osd: add timeout to watch_info_t
- Allow the watch timeout to be set on a per-watch basis. Still need to figure
out where that comes from.. the client? A...
- 08:55 PM Revision 239c0a12 (ceph): rbd: fix version renaming
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:55 PM Revision b3051531 (ceph): osd: fix up WATCH
- Separate various paths: registering new watch, reconnecting to existing
watch, removing watch, etc.
Signed-off-by: S...
- 08:55 PM Revision 2563905b (ceph): osd: some cleanup
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:55 PM Revision b722662e (ceph): osd: use pg_t to find PG's again
- The ceph_object_layout is approaching obsolete. Also, use a more general
lookup_lock_raw_pg() helper that doesn't ta...
- 08:54 PM Revision a61f6b5e (ceph): osd: add missing Watch.cc
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:54 PM Revision 0e62c421 (ceph): osdc: spell out version
- Cosmetic
Signed-off-by: Sage Weil <sage@newdream.net>
- 08:51 PM Revision 15ffbc8d (ceph): makefile: add missing MWatchNotify.h
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:50 PM Revision 4dca64b2 (ceph): osd: drop unused fields
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:18 PM Revision 463d624d (ceph): Makefile: Add --as-needed to LDFLAGS
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 07:51 PM Revision a77eb6bd (ceph): vstart.sh: don't specify journaling mode
- Let the autodetection kick in, or let the dev specify via -o '...'.
Signed-off-by: Sage Weil <sage@newdream.net>
- 07:41 PM Revision e0b927b2 (ceph): osd: PG::trim: add assert
- Assert that we're not trimming the PG log past last_complete.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 05:48 PM Revision 756918be (ceph): osd: _process_pg_info: add assert for replicas
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 05:06 PM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
- Fred, can you see if this reproduces on the latest unstable? Thanks.
-C
- 11:14 AM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
- I added the PG::trim assert. It seems to cause problems immediately with test_unfound.sh
The plot thickens...
- 10:36 AM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
- Argh yeah I was all wrong here. The recovery code looks ok.. I think the problem is that _before_ this the log was t...
- 09:21 AM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
- > The replicas only ever get messages from the primary, and the primary
> sends a log to activate. Never anything e...
- 04:51 PM Bug #614: SEGV loop on _open_lock_pg after rmpool
- Er, by that I mean:
load_pgs shouldn't try to load a PG that is in a nonexistent pool. This could only happen aft...
- 04:49 PM Bug #614 (Resolved): SEGV loop on _open_lock_pg after rmpool
- In OSD::load_pgs, we weren't checking to make sure that the pool existed when going through all the collections.
F...
- 02:23 PM Bug #614 (Resolved): SEGV loop on _open_lock_pg after rmpool
- discovered my cosd processes at 100%, possibly following some "rados rmpool" commands to delete some pools. Stopped ...
- 04:41 PM Bug #598: osd: journal reset in parallel mode acts weird
- bunch of problems here, not all related to a full journal.
- 12:18 PM Feature #568 (Resolved): debian: build with --as-needed?
- Implemented!
before:
cmccabe@flab:~/src/ceph2/src$ ldd .libs/rados
linux-vdso.so.1 => (0x00007fff4eff...
- 11:13 AM Bug #575 (Resolved): monmaptool terminates when input file is not a monmap
- 10:49 AM Bug #479 (Can't reproduce): ceph/mount crash badly when writing
- 10:15 AM CephFS Subtask #547 (Resolved): mds: define fsck strategy, required metadata
- 10:13 AM CephFS Bug #594: mds: frag split/merge vs replay
- needs to be fixed in 0.24, or g_conf.mds_frag needs to be disabled.
- 10:06 AM Bug #595 (Won't Fix): Autogen: not a literal
- seems to go away with latest automake
- 07:12 AM Bug #613 (Resolved): OSD crash: FAILED assert(recovery_oids.count(soid) == 0)
- I'm running a script that reads and writes random objects using librados (creating a new pool once in a while). Runn...
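Revision b2bcf4b3 above mentions installing fatal-signal handlers with sigaction and SA_RESETHAND. The following is a minimal sketch of that flag's effect, not the Ceph implementation: the names `one_shot_handler`, `install_one_shot_handler`, and `disposition_is_default` are hypothetical, and the demonstration uses SIGUSR1 so that nothing actually crashes. With SA_RESETHAND, the disposition is reset to SIG_DFL when the handler is entered, so a second fault raised inside the handler gets the default action instead of re-entering the handler.

```cpp
#include <cassert>
#include <csignal>
#include <cstring>
#include <signal.h>

// Counts handler invocations; sig_atomic_t is safe to touch in a handler.
volatile std::sig_atomic_t g_handler_runs = 0;

// Hypothetical handler: a real crash handler would log and then return or
// re-raise; this one only records that it ran.
extern "C" void one_shot_handler(int) {
  g_handler_runs = g_handler_runs + 1;
}

// Install the handler with SA_RESETHAND so it fires at most once per install.
bool install_one_shot_handler(int signum) {
  struct sigaction sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sa_handler = one_shot_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESETHAND;  // reset disposition to SIG_DFL on handler entry
  return sigaction(signum, &sa, nullptr) == 0;
}

// Query the current disposition without changing it.
bool disposition_is_default(int signum) {
  struct sigaction cur;
  if (sigaction(signum, nullptr, &cur) != 0) return false;
  return cur.sa_handler == SIG_DFL;
}
```

Calling `install_one_shot_handler(SIGUSR1)` followed by `raise(SIGUSR1)` runs the handler once; afterwards `disposition_is_default(SIGUSR1)` reports that the default action is back in place.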
11/25/2010
- 07:36 AM Revision 3ab60091 (ceph): osd: dump_missing: also dump missing_loc
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 07:35 AM Revision da087e47 (ceph): osd: discover_all_missing fix
- Don't request information from an OSD unless it is up and part of the
might_have_unfound set. Add more logging.
Sign...
- 12:18 AM Bug #611: OSD: OSDMap::get_cluster_inst
- commit:da087e47c21190f9cbde4d24182b7dfe581cd069 should resolve this
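Revision da087e47 above describes only requesting information from an OSD that is up and part of the might_have_unfound set. As a rough illustration of that filter, assuming OSDs are identified by plain integers and that "up" is decided by a caller-supplied predicate, a sketch could look like this. `peers_to_query` and its parameters are hypothetical names, not functions from the Ceph tree.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical helper: pick the OSDs worth querying for missing objects.
// A candidate is kept only if it is currently up AND it is a member of the
// might_have_unfound set; everything else is skipped.
std::vector<int> peers_to_query(const std::set<int>& might_have_unfound,
                                const std::set<int>& candidates,
                                const std::function<bool(int)>& is_up) {
  std::vector<int> out;
  for (int osd : candidates) {
    if (!is_up(osd)) continue;                         // down: no point asking
    if (might_have_unfound.count(osd) == 0) continue;  // not a possible source
    out.push_back(osd);
  }
  return out;
}
```

Checking both conditions before building the query list avoids looking up connection details for an OSD that is down, which is the kind of lookup the Bug #611 discussion points at.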
11/24/2010
- 10:54 PM Bug #611: OSD: OSDMap::get_cluster_inst
- I'll take a look
- 10:18 PM Bug #611: OSD: OSDMap::get_cluster_inst
- Okay, I somehow commented/set this bug backwards with another one. Whoops, sorry guys!
This looks like the OSD is as...
- 10:38 AM Bug #611: OSD: OSDMap::get_cluster_inst
- Sam said he'd look at this since it's in the background scrubbing bits that he and Josh did.
- 05:11 AM Bug #611 (Resolved): OSD: OSDMap::get_cluster_inst
- After upgrading to the latest unstable, one OSD crashed. Before the upgrade, 10 of the 12 OSD's were online.
When ...
- 10:18 PM Bug #612: OSD: Crash during auto scrub
- Dunno how, but somehow commented/assigned this and another bug backwards. Meant to say:
Sam said he'd look at this s...
- 10:38 AM Bug #612: OSD: Crash during auto scrub
- This looks like the OSD is assembling a list of missing queries and then sending them out without bothering to check ...
- 05:28 AM Bug #612 (Resolved): OSD: Crash during auto scrub
- After I saw #611 my cluster started to crash. One after the other, the OSD's started to go down, all with a message a...
- 10:09 PM Feature #453 (Resolved): osd: return error (instead of blocking) on lost objects
- It's passing the lost1 and lost2 unit tests now.
- 09:41 PM rgw Bug #353: Handle non-ascii filenames
- Yeah, I agree with Amazon's approach here. UTF-8 makes sense. I think we could continue to use std::string internally...
- 02:03 AM Revision d6e8e8d1 (ceph): gui: some cleanup
- Rather than vectors of pointers, use vectors of NodeInfo structures.
This avoids the problem of freeing the NodeInfo ...
- 12:56 AM Revision 1b1e040e (ceph): osd: add a map for lingering messages
- 12:55 AM Revision 99e1e4de (ceph): librados: assert_version on sync operations
- 12:55 AM Revision c4b97953 (ceph): librados: last_objver is set on the pool, and not per thread
- 12:55 AM Revision 454ea06e (ceph): rbd: notify about header changes
- 12:55 AM Revision 520b523b (ceph): librados: fix unnecessary locking
- 12:55 AM Revision 4c8bdc53 (ceph): osd: don't notify notifier
- 12:54 AM Revision a76de3b2 (ceph): librados: complete C interface for watch/notify
- 12:54 AM Revision 38c8e383 (ceph): librados: rename cookie to handle in api
- 12:54 AM Revision 2954799a (ceph): librados: notify waits for completion
- 12:50 AM Revision e7184e6d (ceph): librados: start implementing watch/notify
- 12:50 AM Revision a4864bd8 (ceph): librados: enable object versioning
- 12:50 AM Revision f36677f8 (ceph): librados: update C api
- 12:49 AM Revision f8af4f2c (ceph): osd: add watch/notify timeout
- 12:49 AM Revision cc62f2eb (ceph): osd: fix bad mutex lock
- 12:49 AM Revision e0c548ad (ceph): osd: fix ms_handle_reset
- 12:49 AM Revision d5cc6732 (ceph): osd: some notify related cleanups
- 12:49 AM Revision 7272bfec (ceph): osd: send notify response from reset handler if needed
- 12:49 AM Revision d66b52e1 (ceph): osd: watch infrastructure
- third attempt
- 12:49 AM Revision 2b5e61ca (ceph): osd: send notification id
- 12:49 AM Revision 59e61d0e (ceph): osd: discard of disconnected watchers
- still need to add a timeout
- 12:49 AM Revision f5f33822 (ceph): osd: send notify reply if there are not watchers
- 12:49 AM Revision 9437ea84 (ceph): osd: add user_version field in object_info_t
- 12:49 AM Revision 7bda45a1 (ceph): osd: reply with either user_version or at_version, depends on the op
- 12:49 AM Revision f7b7d67a (ceph): osd: check requested watch version number
- send appropriate status code if needed
- 12:47 AM Revision 2bce34e7 (ceph): osd: handle watch op, register client on object xattr
- 12:47 AM Revision 3110e361 (ceph): osd: basic watch/notify handling
- 12:47 AM Revision e493c7ae (ceph): osd: handle notify-ack
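Revision d6e8e8d1 above replaces vectors of pointers with vectors of NodeInfo structures. The sketch below illustrates the general idea only: the fields of `NodeInfo` and the function `collect_nodes` are assumptions made for the example, not the GUI's actual definitions. Storing elements by value means the vector owns them, so there is no separate delete step to forget.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the GUI's per-node record.
struct NodeInfo {
  int id;
  std::string label;
};

// Build the list by value: each NodeInfo lives inside the vector and is
// destroyed automatically when the vector goes out of scope.
std::vector<NodeInfo> collect_nodes(std::size_t count) {
  std::vector<NodeInfo> nodes;
  nodes.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    nodes.push_back(NodeInfo{static_cast<int>(i), "osd" + std::to_string(i)});
  }
  return nodes;
}
```

With a `std::vector<NodeInfo*>` the caller would have to delete every element on every exit path; the by-value form removes that obligation.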
11/23/2010
- 11:39 PM Revision 2f13dd8e (ceph): gui: more reindenting
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 11:37 PM Revision 66a78c23 (ceph): gui: reindent a bunch of code
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 10:40 PM Revision d8652de6 (ceph): mdcache: in trim_non_auth, only print out path if it has a parent dentry.
- This should only occur with the root inode, but caused a segfault for
anybody running more than one MDS who restarted...
- 10:04 PM Revision 8768b52d (ceph): mds: Reply checking_lock while reading filelock
- Use checking_lock to replace lock_state in the extra buffer list so the client gets the correct file lock reply.
- 09:59 PM Revision 4041bf0d (ceph): mds: fix set_state_rejoin auth_pin check
- We carry an auth pin IFF !stable AND auth.
Signed-off-by: Sage Weil <sage@newdream.net>
- 09:59 PM Revision 5ed06ffc (ceph): client: remove inode from flush_caps list when auth_cap changes
- Avoid confusing other code (e.g. kick_flushing_caps) by staying on the mds
flushing_caps list when we don't even have...
- 09:52 PM Revision 285cc946 (ceph): osd: fix is_all_uptodate()
- This should only return true when recovery is done, i.e., no more missing
objects. Nothing to do with unfound.
Sign...
- 09:52 PM Revision 36f703e1 (ceph): osd: removing unused variable, fix warning
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:52 PM Revision 413ecb0b (ceph): osd: only search_for_missing if there are unfound objects
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:52 PM Revision 671b1c09 (ceph): osd: add get_num_unfound() helper
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:52 PM Revision 7ea7a435 (ceph): osd: only discover_all_missing if unfound
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:52 PM Revision 5452dae6 (ceph): osd: recover_primary() until primary has all found objects
- The logic in that if was effectively reversed.
Signed-off-by: Sage Weil <sage@newdream.net>
- 09:52 PM Revision 5498c467 (ceph): osd: fix recover_replicas() unfound check
- missing_loc.count(soid) == 0 only means unfound if it's not missing on the
primary.
Signed-off-by: Sage Weil <sage@n...
- 09:52 PM Revision e97eae15 (ceph): init-ceph: tolerate failure in cleanalllogs
- Otherwise /var/log/ceph/stat makes rm -f error out and we fail.
Signed-off-by: Sage Weil <sage@newdream.net>
- 09:52 PM Revision 84612286 (ceph): Build might_have_unfound set at activation
- The might_have_unfound set is used by the primary OSD during recovery.
This set tracks the OSDs which might have unfo...
- 09:52 PM Revision 0e15da8d (ceph): Rename peer_summary_requested to peer_backlog_req
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 09:52 PM Revision c0c301d5 (ceph): osd: PG::read_log: don't be clever with lost xattr
- Formerly, we had a special case in read_log for dealing with objects
whose objects were present on the disk, but not ...
- 09:52 PM Revision 55570baf (ceph): osd: fix PG::is_all_uptodate
- In PG::is_all_uptodate, don't try to look for peer_missing[osd->whoami].
The primary keeps that in PG::missing!
Sign...
- 08:26 PM Revision 36c6569c (ceph): monmaptool: Return a non-zero error code and print a useful error
- message if unable to read the monmap file.
Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
- 06:14 PM Feature #610 (Resolved): gui: make PG view prettier
- The ceph -g GUI should display PGs in a list, rather than as icons that have to be clicked on. We should get rid of t...
- 06:13 PM Bug #604 (Resolved): Compiler warning: 'status' may be used uninitialized in this function
- Fixed by commit:d6e8e8d15d22b51ec86bc5687336c3d50d9b3a5d
We should change PG view on the GUI to be a list view at ...
- 05:43 PM Revision fc212548 (ceph): mds: allow for old fs's with stray instead of stray0
- New fs's get stray0, but we want to still behave with old ones.
Signed-off-by: Sage Weil <sage@newdream.net>
- 05:37 PM Revision de61991a (ceph): Merge branch 'testing' into unstable
- Conflicts:
configure.ac
- 03:00 PM Bug #531: Journaling Causes System Hang
- Awesome, thanks for the help. I will give these patches a shot towards the end of the week.
Thanks
- 02:43 PM Bug #599 (Resolved): recover_master_log, doesn't
- There were two problems here:
1) we were restarting the osds before the monitors, which in this case prevented a f...
- 02:01 PM Linux kernel client Bug #552: Samba with kernel oplocks=on produces lots of corrupt mds entries in dmesg
- Our friends at Tcloud just submitted patches for this today, which I've applied to the unstable branch of our kernel ...
- 11:46 AM CephFS Feature #593 (Rejected): mds: fsck: anchor table repair
- dup
- 11:42 AM Feature #609 (Resolved): osd: query pool/pg for objects with given xattr
- This will probably take the form of a pool class plugin?
It could start as just a hack, for now.
- 11:03 AM Bug #595: Autogen: not a literal
- This problem does not seem to occur using 2.68 on my local machine. Slider et al. seem to be using 2.67.
- 09:39 AM CephFS Bug #608 (Resolved): mds: MDCache::create_system_inode()
- this should be fixed by commit:fc212548aea1d7f001b56ba096a79ba54b8a92c3
Thanks!
- 07:09 AM CephFS Bug #608 (Resolved): mds: MDCache::create_system_inode()
- On a small test cluster I saw that my MDS was not coming up after a fresh mkcephfs, this is what the log showed:
<...
- 09:33 AM Tasks #584: do throughput scaling tests on sepia
- What was the variance in per-node throughput? Did we have one node dominating?
- 09:22 AM Tasks #584 (In Progress): do throughput scaling tests on sepia
- There's definitely a problem here; the total throughput should be scaling more or less linearly until we hit a bottle...
- 07:44 AM Bug #563: osd: btrfs, warning at inode.c ( btrfs_orphan_commit_root )
- I'll have to rebuild, since I didn't look at the messages that closely.
- 07:02 AM Revision 868665d5 (ceph): v0.23.1
- 06:41 AM Revision c327c6a2 (ceph): mon: always use send_reply for auth replies
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:41 AM Revision 61dd4f03 (ceph): mon: simplify send_reply code
- No need to specify destination in send_reply, as we always have the request
for reference.
Simplify MRoute construct...
- 01:37 AM Revision 2c71bd33 (ceph): osd: add assert to _process_pg_info
- When activating an inactive replica, assert that we are doing so based
on a message from the primary.
Signed-off-by:...
- 01:35 AM Revision a70943fd (ceph): osd: re-indent some code in _process_pg_info
- Re-indent the code and add a comment.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:12 AM Revision 71369541 (ceph): msgr: tolerate 0 bytes from tcp_read_nonblocking
- This can happen, I believe, when we get a signal or something.
Signed-off-by: Sage Weil <sage@newdream.net>
- 12:12 AM Revision 7ec0034b (ceph): init-ceph: fix (and test!) cleanlogs and cleanalllogs
- Signed-off-by: Sage Weil <sage@newdream.net>
- 12:03 AM Revision 7b4a801f (ceph): mds: fix rejoin_scour_survivor_replicas inode check
- We want to remove replicas that we don't ack, but those don't appear in
the strong_inode map; they're appended to the...
11/22/2010
- 11:08 PM Revision 8d95b5b6 (ceph): messenger: init rc to -1, removing compiler warning.
- This actually is initialized before all uses, but compilers tend to
have trouble with assignment in if-else branches,...
- 11:08 PM Revision dd11fe27 (ceph): types: Allow inodeno_t structs to alias.
- This removes a compiler warning that appeared in a gcc upgrade and
is apparently erroneous, about its usage violating...
- 10:56 PM Bug #540 (Resolved): CephxClientHandler::handle_response
- couldn't reproduce this, but fixed two smallish things that may have been responsible for this:
commit:61dd4f03e6e15...
- 10:35 PM Linux kernel client Bug #552: Samba with kernel oplocks=on produces lots of corrupt mds entries in dmesg
- From the reply dump, it looks like a ceph_mds_reply_head, a length 0 tracebl, a length 1 extrabl (containing a u8 == ...
- 09:25 PM Revision ac6b018a (ceph): Causes the MDSes to switch among a set of stray directories when
- switching to a new journal segment.
MDSCache:
The stray member has been replaced with strays, an array of inodes
r...
- 09:16 PM Revision 3f8f5905 (ceph): Timer must be initialized in Client::init and shutdown in
- Client::shutdown.
Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
- 06:47 PM Revision 8eb4de9e (ceph): generate_past_intervals:generate back to lastclean
- PG::generate_past_intervals needs to generate all the intervals back to
history.last_epoch_clean, rather than just to...
- 06:07 PM Revision 80f28235 (ceph): vstart.sh: 'init-ceph stop' instead of 'stop.sh'
- This just makes it easier to run multiple vstart sessions as the same user
on the same host.
Signed-off-by: Sage Wei...
- 05:55 PM Revision 53d0650a (ceph): Merge branch 'osd_msgr' into unstable
- 05:55 PM Revision cd53719f (ceph): mds: resolve cleanup
- Only track ambiguous imports and such if we get a resolve message while in
the resolve state.
Signed-off-by: Sage We...
- 05:55 PM Revision c0c81d53 (ceph): mds: trim exported subtree _after_ adjusting auth
- We need to set the subtree bounds before trimming it away, or else we may
throw out things we're still auth for.
Sig...
- 05:55 PM Revision 9e15ade8 (ceph): mds: do not eval subtree root when replay|resolve
- This is nonsensical. And can lead to scatter_writebehind, which breaks
horribly.
Signed-off-by: Sage Weil <sage@new...
- 05:55 PM Revision 27c6f217 (ceph): mds: remove bogus assert
- Causes problems during resolve finish.
Signed-off-by: Sage Weil <sage@newdream.net>
- 05:49 PM Revision 924b1fcb (ceph): osd: bind to new cluster address when wrongly marked down
- If we come back up on the same address, there is a possible race. Other
nodes will mark_down when they see us go dow...
- 05:45 PM Revision 19409763 (ceph): msgr: implement rebind() to pick a new port
- Closes out all old connections and binds to a _different_ port. This
ensures that someone doing mark_down on our old...
- 05:09 PM Revision f7170f95 (ceph): client: only encode_cap_releases once per request.
- Accomplish this by making a list of cap releases in the (permanent)
MetaRequest, and then copying that into the (pote...
- 04:36 PM Bug #607 (Rejected): osd: ReplicatedPG: sub_op_modify: fix creation of ObjectState
- There's a part of the ReplicatedPG::sub_op_modify code that goes like this:
> // do op
> ObjectStat...
- 04:29 PM CephFS Feature #91: mds: up:shadow mode
- Updated Journaler to make new interface options asynchronous.
Presently working on how to disambiguate between a one...
- 03:48 PM Tasks #584 (Resolved): do throughput scaling tests on sepia
- Results of running rados -p bench bench 20 write on <Nodes>. <Average Throughput> is the average of the Bandwidth st...
- 01:24 PM CephFS Feature #88 (Resolved): mds: change stray commit strategy to avoid rolling stray dir commits
- commit:ac6b018acbeaf8670f8c268db164cfb8a12c171d
- 12:59 PM Bug #563: osd: btrfs, warning at inode.c ( btrfs_orphan_commit_root )
- Is the stack trace you're getting now identical, or different? The FileStore.cc change _should_ have avoided the asy...
- 09:28 AM Bug #563: osd: btrfs, warning at inode.c ( btrfs_orphan_commit_root )
- Just to update the issue, Sage asked me to change something in FileStore.cc, tried that for some days, but that didn'...
- 12:47 PM CephFS Feature #606 (Duplicate): mds: optionally store parent attr on file objects
- The goal is to be able to find files contained in rebuilt directories (#603). We can store the same attrs we do for ...
- 12:45 PM CephFS Feature #605 (Rejected): mds: verify/repair anchor table
- - Make sure every item we encounter while traversing the that is anchored correctly appears in the anchor table.
- M...
- 12:44 PM Bug #604 (Resolved): Compiler warning: 'status' may be used uninitialized in this function
- In gui.cc
The warning's location references are a bit off, but the function gen_node_info_from_icons declares a "sta...
- 12:43 PM CephFS Feature #603 (Resolved): mds: repair directory hierarchy
- The goals are
- rebuild missing/corrupt directories
- repair multiple primary links to directories
We'll do so...
- 12:40 PM CephFS Feature #602 (Resolved): mds: handle corrupt/missing journals
- This probably means
- shutting down current instances, resetting cluster membership
- throwing out journals (or m...
- 12:37 PM CephFS Feature #601 (New): mds: order directory commits after rename
- When we rename something between directories, we should try to commit the target directory _before_ the source direct...
- 12:34 PM CephFS Feature #600 (Resolved): mds: store full trace on directories
- Currently we only store the immediate parent; store a full trace up to the root. This is CInode::encode_parent_mutat...
- 12:17 PM Bug #599: recover_master_log, doesn't
- Also, I have verified that osd3 and osd9 did NOT crash. They're still running, and they did receive the messages from...
- 12:13 PM Bug #599 (Resolved): recover_master_log, doesn't
- This is another peering bug. We found it on wido's cluster. Basically, peering never completes.
I just examined PG...
- 09:52 AM Bug #592 (Resolved): osd: rebind cluster_messenger when wrongly marked down
- commit:53d0650a42cbfd2f02db2c708a570b6d9e116bb4
- 09:14 AM CephFS Bug #596 (Resolved): crash during mds reconnect
- Well, that seems to fix it. I added a releases vector to the MetaRequest so it will only encode the releases once, and...
- 08:49 AM Bug #598 (Resolved): osd: journal reset in parallel mode acts weird
- from ML:...
- 04:52 AM Revision 51abcaa2 (ceph): mon: clean up cluster_addr code a bit, better debug output
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:52 AM Revision 20313644 (ceph): osdmap: fix cluster_addr encoding; printing
- The cluster addrs were getting lost because we were checking v instead of
ev.
Signed-off-by: Sage Weil <sage@newdrea...
- 04:52 AM Revision 28498a00 (ceph): osd: send correct ip addrs to monitor for cluster_, hb_addr
- Signed-off-by: Sage Weil <sage@newdream.net>
- 03:59 AM Revision ec434eda (ceph): osd: unconditionally set up separate msgr instance for osd<->osd msgs
- Always set up cluster_messenger (before we would only do so if there was
an explicit address configured for it). The...
- 12:16 AM Revision 0dddf453 (ceph): filestore: only warn about disk write cache on kernels <2.6.33
- Signed-off-by: Sage Weil <sage@newdream.net>
- 12:15 AM Revision 0856f57e (ceph): osd: fix search_for_missing: old last_update implies object not present
- For example, if an osd sends an empty PG::Info (last_update = 0'0) and
empty missing, we should not conclude that the...
- 12:09 AM Revision 6ef5c2f3 (ceph): init-ceph: fix cleanlogs for no log_sym_dir case
- Signed-off-by: Sage Weil <sage@newdream.net>
11/21/2010
- 07:55 PM Linux kernel client Bug #549 (Resolved): bonnie++ file stat failure
- commit:3105c19c450ac7c18ab28c19d364b588767261b3
- 03:50 PM Bug #592: osd: rebind cluster_messenger when wrongly marked down
- I think the cleanest solution here is to re-bind the cluster_messenger to a new port when we are marked down and go b...
- 03:38 PM Linux kernel client Bug #597 (Closed): Reproducible crash mounting multiple directories from a pool
- This bug was fixed in v2.6.36, commit:ca04d9c3ec721e474f00992efc1b1afb625507f5. Thanks for the report though! :)
- 03:34 PM Linux kernel client Bug #597: Reproducible crash mounting multiple directories from a pool
- Should have mentioned - this is with the Ubuntu 10.10 desktop kernel, which is 2.6.35-22, I think.
- 03:33 PM Linux kernel client Bug #597 (Closed): Reproducible crash mounting multiple directories from a pool
- When trying to mount a pool multiple times (with different subdirectories) I get a consistent system hang.
Steps t...
11/20/2010
- 05:06 PM Bug #531: Journaling Causes System Hang
- Please try out the patches in the filestore_throttle branch, commit:b28c0bf82ac28ded4fe85573d32fdc111c66e50b
It lo...
- 03:15 AM Revision fc9b0976 (ceph): OSDMap: const cleanup
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 03:14 AM Revision 2a5c3893 (ceph): mds-dumper: Define Dumper::~Dumper()
- To fix compile error.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
11/19/2010
- 10:21 PM Revision 8566c5cd (ceph): ReplicatedPG::pull: fix test for unfound
- The test for unfound objects was reversed, leading us to try to pull
unfound objects and refrain from pulling objects...
- 09:41 PM Revision 2f5502fa (ceph): osdmap: fix printing, again
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:21 PM CephFS Bug #596: crash during mds reconnect
- The encode_cap_releases can only be called _once_, the very first time we send the request. So at some level this is...
- 04:22 PM CephFS Bug #596 (Resolved): crash during mds reconnect
- While testing my Journaler changes, I got a cfuse segfault. My steps:
vstart with 1 of each daemon
mount cfuse
cop...
- 06:17 PM Revision 4303820b (ceph): Merge remote branch 'origin/mds' into unstable
- 04:26 PM CephFS Feature #91 (In Progress): mds: up:shadow mode
- I've been getting some proper time in on this on and off over the last few days. Pushed the Journaler changes to the ...
- 03:52 PM Bug #531: Journaling Causes System Hang
- Okay,
More updates.
1) All the VMs deployed okay but it looks like towards the end of the deployments I hit the...
- 02:49 PM Bug #531: Journaling Causes System Hang
- Okay,
I just started the deployment of 12 vms on a new cephfs with 3 osds in and ssd's for journals on all the sys...
- 02:37 PM Bug #531: Journaling Causes System Hang
- I am working on getting the output now. We are having to work on several projects at once right now. Sorry for the de...
- 03:36 PM Bug #595 (Won't Fix): Autogen: not a literal
- We get this running on autoconf 2.67:
configure.ac:6: warning: AC_INIT: not a literal: Sage Weil <sage@newdream.net>...
- 02:29 PM CephFS Bug #594 (Resolved): mds: frag split/merge vs replay
- Need to reconcile refragmenting with resolve stage. Currently handle_resolve assumes frags match, when in reality th...
- 12:11 PM Bug #585 (Resolved): OSD: ReplicatedPG::pull
- Fixed by commit:82f1de8c0d6e7817ca7d6dd710e3176b2a549e12
- 10:43 AM Bug #585 (In Progress): OSD: ReplicatedPG::pull
- need to see what's going on with this
- 11:47 AM Bug #503 (Closed): osd: query osds since last_epoch_clean before concluding objects lost?
- 11:39 AM Bug #515 (Can't reproduce): osd: recovery isn't completing
- with the recent changes i'm closing this one out, and reopening with specifics if it comes up in testing over the nex...
- 10:14 AM CephFS Feature #545 (Resolved): mds: use bloom filter to supplement dirfrag COMPLETE flag
- merged commit:4303820b43721a8b46ef36d0e9ef4e1167857c80
- 09:38 AM CephFS Feature #593 (Rejected): mds: fsck: anchor table repair
- We need to be able to fix up the anchor table when there are problems, to avoid e.g....
- 05:13 AM Revision b91e14e1 (ceph): multi-dump.sh: add diff mode
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:57 AM Revision 9cab522e (ceph): Add multi-dump.sh
- This is a debug tool that can dump out Ceph information at various
epochs. For instance, it can show how the OSDmap c...
11/18/2010
- 11:05 PM Revision 6e2b594b (ceph): ReplicatedPG::get_object_context: fix broken calls
- ReplicatedPG::get_object_context takes three parameters. The last two
are "const object_locator_t& oloc" and "bool c...
- 09:50 PM Bug #592: osd: rebind cluster_messenger when wrongly marked down
- Ah. Looks like you got it figured out.
I wasn't aware of what mark_down did.
Just in case anyone finds it useful...
- 09:22 PM Bug #592: osd: rebind cluster_messenger when wrongly marked down
- ok, this is a problem with how the osd is interacting with the messenger. looking at the history of 0.5, we see
<pr...
- 08:42 PM Bug #592: osd: rebind cluster_messenger when wrongly marked down
- i suspect 0.5 didn't get set up on osd1 or 2 before osd0 went down? do you have the full logs for the other instances?
- 05:07 PM Bug #592: osd: rebind cluster_messenger when wrongly marked down
- I should also add that Greg Farnum helped me examine the logs for this bug.
- 05:03 PM Bug #592 (Resolved): osd: rebind cluster_messenger when wrongly marked down
- This happened with commit:323565343071ce695f7d454ed29590688de64d5d on flab.ceph.dreamhost.com
While running test_u...
- 08:50 PM Revision 43e0b267 (ceph): ReplicatedPG: call finish_recovery when needed
- Don't loop in ReplicatedPG::start_recovery_ops. There is already a loop
in both recover_replicas and recover_primary ...
- 08:33 PM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
- Colin McCabe wrote:
> Another potential issue that I can see here is that the code in OSD::_process_pg_info doesn't ...
- 12:43 PM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
- Another potential issue that I can see here is that the code in OSD::_process_pg_info doesn't check whether it got a ...
- 09:26 AM Bug #590: osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
- Need to look at this more closely. Fred, pretty sure no data is lost here, but the recovery code needs some fixing.
...
- 06:19 AM Bug #590 (Resolved): osd/PG.cc:1645: FAILED assert(info.last_complete >= log.tail || log.backlog)
- After upgrading to ceph 0.23, the cluster (3 osd, 3 mon, 3 non-clustered mds) worked for about 2 hours and then one c...
- 06:09 PM Revision ea5d1d66 (ceph): osd_resurrection_1_impl: turn on recovery at end
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 09:47 AM Feature #526 (Resolved): osd: unfound objects rework
- We now let the PG become active even when there are unfound objects. When the user tries to read one of those objects...
- 07:39 AM Linux kernel client Feature #591 (Resolved): implement FALLOC_FL_PUNCH_HOLE
- 12:52 AM Revision 4adfdee7 (ceph): Makefile: fix builddir weirdness
- Signed-off-by: Jim Schutt <jaschut@sandia.gov>
- 12:10 AM Bug #585: OSD: ReplicatedPG::pull
- Well, it did show up again:...
11/17/2010
- 10:37 PM Revision 7e9812b4 (ceph): osd: rev PG::Info encoding for last_epoch_clean change
- This was missed by 184fbf582b27c10b47101735a4495fe8c73ad186, so any fs
created between now and then won't decode prop...
- 09:06 PM Revision c17e7da4 (ceph): Merge branch 'mds_frags' into unstable
- 09:06 PM Revision 7f6a2561 (ceph): mds: clear PIN_SUBTREE on split/merge in purge_strays
- This makes the helper work for merge as well as split. Remove the special
fixups in the caller that were making spli... - 09:06 PM Revision 66d43ac8 (ceph): mds: fix subtree map update on dirfrag merge
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:06 PM Revision b705be11 (ceph): mds: wrlock scatterlocks to prevent a gather racing with split/merge lo...
- We have the dirs split in our cache for some time while journaling it to
disk, before the fragment_notify goes out. ... - 09:06 PM Revision f6823a79 (ceph): mds: adjust dir_auth_pins on steal_dentry
- dir_auth_pins is a counter of dentry auth_pins in the current dir; those
need to be added in when stealing.
Signed-o... - 09:06 PM Revision cd5ee006 (ceph): mds: initialize PIN_SUBTREE on split
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:06 PM Revision d538817f (ceph): mds: flush log on fragment
- This makes request lock auth_pins expire, so the fragment moves along.
Otherwise we can end up waiting for the log fl... - 09:06 PM Revision 3777ff8a (ceph): mds: move dirty rstat inodes to new dir on refragment
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:06 PM Revision 669b5544 (ceph): mds: don't complete freeze while parent inode is frozen
- This makes maybe_finish_freeze() conditions match that of is_freezeable()
and avoids an assert.
Signed-off-by: Sage ... - 09:04 PM Revision b58b8d09 (ceph): mds: fix discover requests, tracking wrt fragments
- Track discover requests by tid. The old system of tracking outstanding
discovers was kludgey and somewhat broken. A... - 09:02 PM Revision a63c06c8 (ceph): mds: fix EFragment replay
- If the inode already exists in our cache, adjust our (existing) fragments.
But it might not. In that case, we just r... - 09:02 PM Revision a961049b (ceph): mds: don't fragment mdsdir or .ceph
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:48 PM Revision b54880e0 (ceph): Detect broken system linux/fiemap.h
- RedHat 5.5 has a /usr/include/linux/fiemap.h, but it is
broken because it does not itself include linux/types.h.
As a... - 06:24 PM Revision 29a9e668 (ceph): osdmap: don't include blacklist info in summary
- It's confusing users and isn't that important.
Signed-off-by: Sage Weil <sage@newdream.net> - 05:58 PM Revision c43455ce (ceph): client: Remove the I_COMPLETE flag from the parent directory in relink_...
- This papers over issues arising from the client's lack of proper support
for hard links, and lets it pass the snaptes... - 02:35 PM Bug #589 (Resolved): OSD: crash on startup, PG::read_state
- Ok, this is fixed by commit:7e9812b4a9bbf320a8b0bd0abec48c1c5d78fe66. Assuming your fs is old enough you should be o...
- 11:38 AM Bug #589 (Resolved): OSD: crash on startup, PG::read_state
- After upgrading to today's unstable, all my OSDs crashed directly after startup, for example osd0:
Last loglines a...
- 12:56 PM Bug #531: Journaling Causes System Hang
- Just pinging you on this one. If you can send the logs I'd like to sort this out. Thanks!
- 09:59 AM CephFS Bug #344: cfuse should pass all qa tests
- At this point the only test it's failing is bonnie. This one tends to fail on a SEGV that just keeps going through th...
- 09:57 AM CephFS Bug #583 (Resolved): cfuse fails snaptest-upchildrealms
- Okay, a proper fix for this is going to require a bit of work, since right now Inodes can only have one parent dentry...
- 09:52 AM CephFS Cleanup #588 (Resolved): Allow Inodes to have multiple parent Dentries
- Right now, cached Inodes can only have one parent Dentry. This is unfortunate when there are multiple hard links to a...
- 09:40 AM Tasks #587 (Rejected): install mpich2 on sepia*
- this will make management and testing easier
- 07:52 AM Bug #585 (Closed): OSD: ReplicatedPG::pull
- This one should also be fixed in the latest unstable. Probably. The recovery code is still being worked on a bit, b...
- 02:55 AM Bug #585 (Resolved): OSD: ReplicatedPG::pull
- On two OSDs (osd5 and osd10) I'm seeing the same crash, almost directly after starting them.
I cranked ...
- 07:19 AM Bug #586 (Resolved): OSD: Crash during scheduled scrub
- This was fixed in the commit right after what you were running, commit:556ba7397c352f5a6cb7fe03087c6e2f51dbce32
- 05:31 AM Bug #586 (Resolved): OSD: Crash during scheduled scrub
- After I reported #585 I didn't pay much attention to my cluster, until I found out that I had only one OSD left onlin...
- 12:09 AM Revision d57181d3 (ceph): config: added max_mds
- MDSMonitor: create_new_fs adapted to use the max_mds parameter
max_mds is now a configurable value and create_new_fs...
11/16/2010
- 09:00 PM Tasks #584 (Rejected): do throughput scaling tests on sepia
- Use rados bench on N nodes, scaling N, and see how the throughput scales.
- 08:09 PM Revision c4931265 (ceph): mds: make dirfrag thrashing join and split
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:09 PM Revision d1dcc035 (ceph): mds: allow frag merge on subtree root
- Fix purge_stolen and adjust_dir_fragments.
Signed-off-by: Sage Weil <sage@newdream.net> - 08:08 PM Revision 8f24919d (ceph): mds: add timestamp to LogEvents
- This just gives us a bit of useful info when debugging problems.
Signed-off-by: Sage Weil <sage@newdream.net> - 06:32 PM Revision 56b9e927 (ceph): osd: fix trailing + in pg state string rendering
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:21 PM CephFS Bug #583: cfuse fails snaptest-upchildrealms
- Looks like the problem is caused by linking b/bar to b/foo. The server response to goes through insert_dentry_inode v...
- 06:17 PM CephFS Bug #583 (Resolved): cfuse fails snaptest-upchildrealms
- Fails to rm a/b, ENOTEMPTY.
- 06:11 PM Feature #582 (Closed): Make max_mds configurable
- 03:06 PM Feature #582 (Closed): Make max_mds configurable
- Right now the only way to set it is with the set_max_mds mon command. Add it to the config stuff and have create_new_...
- 06:10 PM Revision 2c9873f0 (ceph): Merge remote branch 'origin/unfound' into unstable
- 06:06 PM Revision d17f7444 (ceph): mds: be less noisy about cap imports
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:01 PM Revision 05bd6b07 (ceph): Merge branch 'mds_dir_hash' into unstable
- 06:01 PM Revision e146767e (ceph): mds: make dentry hash a dir layout property
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:01 PM Revision cc709df8 (ceph): mds: add DIRLAYOUTHASH feature bit
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:01 PM Revision be29e4c3 (ceph): mds: set mode before all the file type dependent inode initialization!
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:01 PM Revision 33580460 (ceph): mds: set dir hash on root inode
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:01 PM Revision 77c05fbc (ceph): mds/client: pass dir hash over the wire
- Add a feature bit DIRLAYOUTHASH.
Also fix client request routing for lookups (we were only hashing when
a Dentry poi... - 05:13 PM Bug #479: ceph/mount crash badly when writing
- Sorry Sage and Yehuda for the late update..
I was spending time experimenting, and just using the default btrfs with... - 01:48 PM Bug #538: Write performance does not scale over multiple computers
- Did you update your installed version of the rados tool as Sage said? If you did and are still getting poor performan...
- 12:48 PM Bug #518: cfuse crashed on ls
- Confirmed this is fixed in 0.23.1 (sorry for the huge delay in confirmation).
- 12:06 PM CephFS Feature #483 (Resolved): mds: add timestamp to LogEvent
- commit:8f24919d39734cf518f2bf6e50faf6f5266d6eff
- 11:52 AM CephFS Feature #560 (Resolved): mds: alternate directory hashing
- kernel part is done and in unstable branch, currently commit:9f62e3eaafd52875e1f2e4344e11e51ddb726f48
- 09:59 AM CephFS Feature #560: mds: alternate directory hashing
- commit:05bd6b078d743d6c235c0fcedda7ee4f64ab2ad5 has it working for the user client.
- 02:33 AM Revision 267cd845 (ceph): RadosClient::shutdown: call monclient::shutdown
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 02:22 AM Revision dfb78ebf (ceph): osd: don't stop recovery when there are unfound
- There are two phases in recovery: one where we get all the right objects
on to the primary, and another where we push... - 01:03 AM Revision d014acb6 (ceph): dumpjournal.cc: fix compile
- dumpjournal needs to create its own SafeTimers and pass them in to some
constructors.
Signed-off-by: Colin McCabe <c... - 12:44 AM Revision da2d5018 (ceph): rbd: fix rbd snap rm class handling
11/15/2010
- 10:59 PM Revision 250d414e (ceph): Merge remote branch 'origin/unfound_last_epoch_clean' into unstable
- 10:47 PM Revision c7075115 (ceph): Add ./ceph osd tell <osd-num> dump_missing <out>
- Add a command that tells the OSD to dump its missing set for all PGs to
a file. This should be useful for debugging m... - 10:38 PM Revision 755f5759 (ceph): search_for_missing:recalc stats if unfound changed
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 09:31 PM Revision d883a547 (ceph): mds: Use CDir bloom filter as appropriate.
- Add items to the bloom filter when trimming, and look for them
in the filter in the few places where a simple existen... - 09:31 PM Revision be2da00a (ceph): mds: Add bloom filter to CDir.
- You can now add items to a bloom filter and check for their existence.
This is intended to be used when trimming item... - 09:23 PM Revision 1fe31e18 (ceph): timer: make init/shutdown explicit
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:39 PM Revision d2af7b7e (ceph): test_unfound.sh: start recovery at end of test
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 08:31 PM Revision c293b9af (ceph): test_common.sh: add dump_osd_store
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 08:15 PM Revision 184fbf58 (ceph): osd: add last_epoch_clean to PG::Info
- This changes the encoding in a non-backwards compatible way.
Signed-off-by: Sage Weil <sage@newdream.net> - 08:15 PM Revision 873e9bf8 (ceph): osd: add incompat feature LEC for last_epoch_clean
- So an old binary will fail to mount a store with new Info encoding.
Signed-off-by: Sage Weil <sage@newdream.net> - 08:15 PM Revision b0c22bd5 (ceph): Add MOSDPGMissing
- Add MOSDPGMissing, a message which just contains the missing objects
information for a PG. We will request messages l... - 08:15 PM Revision d3cf4787 (ceph): PG::finish_recovery: set info.last_epoch_clean
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 08:15 PM Revision e768bbdf (ceph): Add stray_test to test_unfound.sh
- This test is designed to produce a stray that nonetheless has some
useful objects. The primary should be able to find... - 08:15 PM Revision 796ff1d1 (ceph): Fix bugs in search_for_missing, _process_pg_info
- PG::search_for_missing: fix a bug with the handling of MSG_OSD_PG_INFO
messages. Formerly, when processing these mess... - 08:15 PM Revision e3f65076 (ceph): osd: add discover_all_missing
- Add discover_all_missing. This function makes sure that we have messages
en route to any OSD that we think might have... - 08:15 PM Revision 470b1990 (ceph): stray_test:don't use up/down. timeout extension
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 08:15 PM Revision 05a16d32 (ceph): test_unfound.sh: fix return codes
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 08:15 PM Revision 6a65cc4f (ceph): test_common.sh: remove messenger debug for now
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 08:06 PM Revision 873180aa (ceph): osd: skip unfound in recover_replicas
- This is moot currently, since we don't currently start recovering replicas
until the primary is complete.
Signed-off... - 08:04 PM Revision d61bc3bf (ceph): osd: skip unfound objects in recover_primary()
- We also need to make sure we come back later when they are found.
Signed-off-by: Sage Weil <sage@newdream.net> - 07:57 PM Revision 9ea1d8bb (ceph): osdmap: make printing a bit easier to read
- Signed-off-by: Sage Weil <sage@newdream.net>
- 07:50 PM Revision beae97f9 (ceph): objecter: don't dereference null op->outbl
- Signed-off-by: Sage Weil <sage@newdream.net>
- 07:36 PM Revision 089cd12d (ceph): include: Add bloom filter library to include/
- Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
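For readers unfamiliar with the data structure behind the bloom filter commits above (089cd12d, be2da00a, d883a547), here is a minimal, generic sketch in Python. It is not the library added to include/ and not the CDir code; the class name, sizes, and hashing scheme are all assumptions chosen for illustration. It only shows the property those commits rely on: a name that was added is always reported as possibly present, while a name that was never added is usually, but not always, reported as absent.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # num_bits and num_hashes are arbitrary illustrative defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly added"; False means "definitely never added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Hypothetical usage: remember names as they are trimmed from a cache.
trimmed = BloomFilter()
for name in ("a.txt", "b.txt"):
    trimmed.add(name)
```

A `False` answer from `might_contain` can be trusted outright, which is what makes such a filter usable as a cheap supplement to an exact completeness flag; a `True` answer still needs a real lookup.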
- 07:25 PM Revision f2c080b3 (ceph): Merge remote branch 'origin/testing' into unstable
- 07:25 PM Revision 556ba739 (ceph): osd: unreg scrub when removing pg
- This fixes this crash:
osd/OSD.cc: In function 'PG* OSD::_lookup_lock_pg(pg_t)':
osd/OSD.cc:956: FAILED asse... - 04:54 PM CephFS Feature #560: mds: alternate directory hashing
- almost there. need to fix/test uclient hashing.
then implement for kclient... - 04:44 PM Bug #580 (Resolved): rbd rm snap is broken
- Fixed with commit:da2d50180dfdc0e30b4348f2acceb2be650f20b7
- 03:42 PM Bug #580 (Resolved): rbd rm snap is broken
- When doing 'rbd rm snap', the rbd image header gets corrupted.
- 01:49 PM Bug #535 (Resolved): cephtool hangs forever until a UNIX signal is received
- Sage spent some time on the messenger too, and I suspect we're done now.
- 01:39 PM CephFS Feature #545: mds: use bloom filter to supplement dirfrag COMPLETE flag
- Pushed it to branch "mds" (which I apparently created, but thought existed...weird!). Testing it now on a secondary i...
- 11:19 AM Bug #579 (Resolved): OSD::sched_scrub: FAILED assert(pg_map.count(pgid)
- commit:f46f674261bf65a6f7f6313fb688ec4773f526b5
- 10:56 AM Bug #579: OSD::sched_scrub: FAILED assert(pg_map.count(pgid)
- Some more information about this bug.
OSD1 and OSD2 have a PG named 0.6
OSD0 does not.
=====================
... - 10:51 AM Bug #579 (Resolved): OSD::sched_scrub: FAILED assert(pg_map.count(pgid)
- On unfound_last_epoch_clean at commit commit:7201497f2feef6a2bbd0baf89e3a14b8a880e79f
I found this assert when run... - 07:05 AM Bug #538: Write performance does not scale over multiple computers
- I set 'osd heartbeat grace=120' and that got rid of the chatter. My performance is now:...
- 04:48 AM Revision 7f38858c (ceph): Merge branch 'msgr_zerocopy_read' into unstable
- 04:39 AM Revision 7cb2d508 (ceph): msgr: use provided rx buffer if present
- This changes the read path so that we hold the Connection::lock mutex while
reading data off the socket. This ensure... - 04:39 AM Revision e8132cd9 (ceph): objecter: post rx buffer to msgr if target bufferlist is present
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:39 AM Revision 975dd8fa (ceph): librados: pass provided buffer to objecter on rados_read
- This allows us to avoid the data copy if the objecter and msgr manage
to use it.
Signed-off-by: Sage Weil <sage@n...
- 04:23 AM Revision 2854dae8 (ceph): msgr: add Connection rx buffer interface
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:23 AM Revision c04ba725 (ceph): msgr: implement get_connection()
- Get a Connection* for the given destination. This mirrors submit_message,
but does not actually queue a message.
Si... - 04:21 AM Revision 67852352 (ceph): buffer: implement list::iterator::get_current_ptr()
- Return a buffer::ptr for the ptr at the current position/offset, with the
length set to the remaining space in the cu...
11/14/2010
- 09:05 PM Messengers Feature #527 (Resolved): zero copy reads, msgr rx buffer infrastructure
- commit:7f38858c0c19db36c5ecf36cb4d333579981c811
- 07:29 PM Revision 4af14db4 (ceph): Objecter::shutdown: shut down timer.
- We have to explicitly shut down the timer in Objecter::shutdown.
Otherwise, we are relying on the destructor of SafeTi...
- 11:33 AM Bug #578 (Resolved): assert triggered on radostool shutdown
- 11:33 AM Bug #578: assert triggered on radostool shutdown
- Fixed by commit:4af14db424e770c2f3e99dad6fd2b6f2059feacd
A mutex lifecycle issue.
- 11:26 AM Bug #578 (Resolved): assert triggered on radostool shutdown
- I hit this assert when radostool was exiting.
./common/Mutex.h:97: FAILED assert(nlock == 0)
ceph version 0.24~r...
11/13/2010
- 08:46 PM Bug #574: timer: event cancellation apparently broken
- cancel_event always relied on the caller to take the SafeTimer lock, and then goes on to take the Timer lock. So it's...
- 08:39 PM Bug #535: cephtool hangs forever until a UNIX signal is received
- It looks good so far.
- 04:43 AM Revision f18609e8 (ceph): Merge remote branch 'origin/msgr' into testing
- 12:00 AM Revision 2be4215a (ceph): debug: don't print thread id twice
- Signed-off-by: Sage Weil <sage@newdream.net>
11/12/2010
- 11:59 PM Revision b61af6a7 (ceph): msgr: cleanup: make queue_received non-inline; some helpful debug
- Signed-off-by: Sage Weil <sage@newdream.net>
- 11:56 PM Revision f99c84e6 (ceph): msgr: do not clear halt_delivery
- We need to keep the halt_delivery plug set on failure/shutdown in order to
prevent a racing reader from queuing new m...
- 10:55 PM Revision 1071a9ab (ceph): msgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock
- Close a few different races here.
Also, assert that queue_items are not queued in ~Pipe().
Signed-off-by: Sage Weil...
- 10:55 PM Revision d4746ab5 (ceph): msgr: close enqueue/discard race
- We need to re-check halt_delivery after dropping and retaking pipe_lock.
Signed-off-by: Sage Weil <sage@newdream.net>
- 10:55 PM Revision 20937e88 (ceph): msgr: protect pipe queuing with _both_ pipe and dispatch_queue locks
- We want to make sure the pipe's queue item doesn't go away.
Also, make queue_received() require pipe_lock to be held...
- 10:55 PM Revision cbf154e1 (ceph): msgr: only close socket on reconnect or shutdown
- We can't modify 'sd' or (more importantly) close sd while any other thread
might be using it, or else we might race w...
- 10:55 PM Revision 70fe062f (ceph): msgr: add 'ms inject socket failures = foo'
- Where we fail roughly every foo'th socket operation.
Signed-off-by: Sage Weil <sage@newdream.net>
- 10:49 PM Revision 20affc65 (ceph): TestTimers: don't test (nonexistent) Timer
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 10:45 PM Revision d5032a05 (ceph): Rename PG::peer to PG::do_peer
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 03:59 PM Revision 46cf27d4 (ceph): Merge branch 'testing' into unstable
- 03:55 PM Revision c5b2d28b (ceph): uclient: insert lssnap results under snapdir, not live dir
- Put the readdir results (list of snapshots) in the right place in the
hierarchy; we were putting them in the parent d... - 03:36 PM Revision 7ccdae8c (ceph): msg: fix buffer size for IPv6 address parsing
- Signed-off-by: Wido den Hollander <wido@widodh.nl>
- 02:20 PM Bug #577 (Resolved): unify PG creation code in OSD::handle_pg_notify and OSD::_process_PG_info
- unify PG creation code in OSD::handle_pg_notify and OSD::_process_PG_info
Duplicated code here. They're slightly d... - 02:16 PM CephFS Feature #545: mds: use bloom filter to supplement dirfrag COMPLETE flag
- Trying to find a bloom filter library. Unfortunately there don't seem to be any available under a GPL-compatible lice...
- 01:16 PM Bug #490 (Can't reproduce): Cluster stays in a degraded state
- 01:15 PM CephFS Cleanup #514 (Rejected): Optimize MIX/MIX_STALE reconnects, etc
- mix_stale is no more
- 12:56 PM Linux kernel client Bug #576 (Can't reproduce): readdir returns too many results
- ...
- 11:02 AM Bug #535: cephtool hangs forever until a UNIX signal is received
- Pushed a potential fix to the msgr branch, waiting for Colin to report back on whether it works. :)
- 07:56 AM CephFS Bug #561 (Resolved): snaptest-2 doesn't execute properly
- Figured this out. LSSNAPs was adding the snap dentries to the cache under the parent dir instead of the hidden .snap...
- 07:37 AM Messengers Bug #573 (Resolved): monmaptool fails to parse IPv6 address
- Thanks, applied as commit:7ccdae8cd44c143550234511a2a09bab38c6515e
- 04:56 AM Messengers Bug #573: monmaptool fails to parse IPv6 address
- After searching through the source I found it :)
Attached is a patch to fix the IPv6 address parsing. The buffer w... - 05:12 AM Bug #575 (Resolved): monmaptool terminates when input file is not a monmap
- For example:...
- 03:30 AM Bug #540: CephxClientHandler::handle_response
- Just saw it again on the same cluster, this time osd2 crashed when upgrading to this morning's unstable:...
- 12:29 AM Bug #540: CephxClientHandler::handle_response
- I saw that on a test machine of mine. The 'ceph -w' command was hanging for about 10 seconds and then exited with thi...
- 12:38 AM Revision ce6d6394 (ceph): timer: rewrite mostly from scratch
- Just use the provided lock. This _vastly_ reduces the complexity because
we don't have to worry about races between ...
11/11/2010
- 11:31 PM Revision 54848991 (ceph): mds: hit inode created via CREATE
- We missed this path!
Signed-off-by: Sage Weil <sage@newdream.net>
- 10:28 PM Revision f8b3271f (ceph): Merge branch 'rc' into unstable
- Conflicts:
configure.ac
src/Makefile.am
- 05:47 PM Bug #531: Journaling Causes System Hang
- Sorry, I haven't been able to get the debug output yet. We have spent the last few days working with our production syste...
- 04:47 PM Linux kernel client Tasks #569 (Resolved): test dir frags
- a few fixes, mostly fine. commit:7b88dadc13e0004947de52df128dbd5b0754ed0a
- 04:43 PM Bug #574 (Resolved): timer: event cancellation apparently broken
- Looking into this, it appears that the problem was that the wrong lock was taken during cancel event. Or that the ev...
- 03:38 PM Bug #574 (Resolved): timer: event cancellation apparently broken
- Just saw this on latest unstable, commit:f8b3271f45cc4a87e3f3f212d22e3d34ff13da44
The monitor schedules a propose ... - 03:09 PM CephFS Tasks #366 (New): test snaptests against clustered mds failures
- 03:08 PM CephFS Tasks #366 (Rejected): test snaptests against clustered mds failures
- 03:08 PM CephFS Bug #362 (Rejected): mds: rejoin crashes on snaptest-2 workload
- 02:45 PM Bug #540: CephxClientHandler::handle_response
- Wido just saw this:...
- 05:18 AM Revision 5d1d8d0c (ceph): v0.23
- 04:58 AM Revision 3d10b340 (ceph): mds: fix null_snapflush with multiple intervening snaps
- The client is allowed to not send a snapflush if there is no dirty metadata
to write for a given snap. However, the ... - 02:17 AM Messengers Bug #573 (Resolved): monmaptool fails to parse IPv6 address
- I'm trying to setup a small cluster with IPv6, but mkcephfs fails:...
- 12:36 AM Revision 3d6e9155 (ceph): Merge remote branch 'origin/unfound' into unstable
- 12:31 AM Revision 4d941cf4 (ceph): osd: scrub: change cancel behavior
- Use explicit flag, so that scrub_reserved always indicates whether the
osd count includes us or not.
Signed-off-by: ... - 12:31 AM Revision a87e8901 (ceph): osd: track last_scrubbed in PG::Info::History
- Share with peers and write to disk on scrub completion.
Signed-off-by: Sage Weil <sage@newdream.net> - 12:31 AM Revision 6548fb65 (ceph): osd: do scrub schedule state changes inside scrub()
- Update these values under protection of pg lock iff we start scrubbing,
otherwise back out.
On scrub completion, unr... - 12:31 AM Revision 815c3d56 (ceph): osd: fix sched_scrub
- Insert whoami into reserved set on primary, not 0! Also more cleanup of
sched state helpers.
Signed-off-by: Sage We... - 12:31 AM Revision 92572910 (ceph): osd: call sched_scrub on reserve reply
- Otherwise we have to wait until the next time it's called by the timer, and
during that period we have a reservation ... - 12:31 AM Revision c12829a2 (ceph): osd: don't scrub something we just scrubbed
- Signed-off-by: Sage Weil <sage@newdream.net>
- 12:31 AM Revision 85e08905 (ceph): osd: scrub least recently scrubbed pgs first; once a day
- Signed-off-by: Sage Weil <sage@newdream.net>
11/10/2010
- 10:50 PM Revision 231434af (ceph): pg_state_string: use an ostringstream
- Use an ostringstream for efficiency's sake.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> - 09:49 PM Revision d247616c (ceph): vstart: stop logging to /tmp/foo
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:46 PM CephFS Bug #561: snaptest-2 doesn't execute properly
- I ran the test again and didn't get an mds crash. There was one issue remaining:...
- 06:14 PM CephFS Bug #561 (In Progress): snaptest-2 doesn't execute properly
- I think I may have finally nailed this problem, or at least found a band-aid by more aggressively removing the I_COMP...
- 09:39 PM Revision 74be621c (ceph): osd: fix scrub reserved state when starting scrub
- Also document scrub scheduling/pending/active states.
Signed-off-by: Sage Weil <sage@newdream.net> - 09:18 PM CephFS Bug #570 (Resolved): Locker::_do_null_snapflush assert failure
- 09:18 PM CephFS Bug #570: Locker::_do_null_snapflush assert failure
- Nice catch. Fixed by commit:3d10b340748e5bbff86b49ac7386da9efa27a070. Added a unit test too!
- 02:58 PM CephFS Bug #570 (Resolved): Locker::_do_null_snapflush assert failure
- Seen this a lot while working on the snaptest-2 issue, when shutting down cfuse....
- 09:16 PM Revision 8650418f (ceph): vstart: turn down msgr debugging
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:13 PM Revision 9e4027fb (ceph): monc: cancel timer events with lock held
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:23 PM Revision 07bb6756 (ceph): Wake up clients waiting for now-found objects
- PG::search_for_missing: when we find a previously unfound object, check
to see if there is an entry in waiting_for_mi... - 07:46 PM Revision 8288a23a (ceph): PG::peer: don't block if objects are unfound
- Erase the code in PG::peer that used to keep us from becoming active
when objects were still unfound. Print out the n... - 07:46 PM Revision 040c4bcd (ceph): PG::search_for_missing: minor refactoring, comment
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 07:46 PM Revision 5153ba5e (ceph): Add PG::Missing::have_missing()
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 07:46 PM Revision 85c4e6e6 (ceph): OSD::_process_pg_info:search_for_missing sometimes
- OSD::_process_pg_info: If we're the primary for this active PG, and we
have missing objects, call search_for_missing.... - 07:46 PM Revision 6a04ac52 (ceph): PG::recover_master_log: rename a local variable
- PG::recover_master_log: rename a local variable to avoid using the
overloaded term "missing".
Signed-off-by: Colin M... - 07:46 PM Revision b5181133 (ceph): test_unfound.sh: shorter test
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 07:46 PM Revision 02ec7219 (ceph): Add num_objects_unfound to struct pg_stat_t
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 07:46 PM Revision fc605ced (ceph): test_unfound.sh: verify that we have unfound objs
- test_unfound.sh: verify that we have unfound objs.
Then, when we bring up the other OSD, verify that those unfound ob... - 07:46 PM Revision b9191ddc (ceph): test_unfound.sh: test reading an unfound object.
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 07:46 PM Revision e6b6c539 (ceph): PG::peer: count/find cleanup
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:30 PM Revision b80f3e6a (ceph): PG: move ostream operator to .cpp file
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:30 PM Revision a46f15e7 (ceph): PG: nomenclature change: talk about unfound objs
- Describe objects as "unfound" when we don't know what OSD has them.
Signed-off-by: Colin McCabe <colinm@hq.newdream.... - 06:30 PM Revision ef1f8ecd (ceph): PG.h erase deadcode
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:16 PM Bug #535 (In Progress): cephtool hangs forever until a UNIX signal is received
- After checking the logs and conferring with Sage, I think I've found a possible cause. Designing and testing a fix no...
- 05:43 PM Revision 82aa79f8 (ceph): mds: fix inode->frag rstat projected with snaps
- The snapid 'first' value needs to be >= inode->first; move that into
the helper.
Signed-off-by: Sage Weil <sage@newd... - 05:04 PM Revision 5deef243 (ceph): osdmap: break up asserts for easier debugging
- If we fail one of these it's helpful to know which one.
Signed-off-by: Sage Weil <sage@newdream.net> - 05:03 PM Revision 586c9e7a (ceph): objecter: throttle before looking at lock protected state
- The take_op_budget() may drop our lock if we are in keep_balanced_budget
mode, so we need to do that _before_ we take... - 04:50 PM Revision 57513739 (ceph): mon: drop unnecessary state checks
- We want to ignore all beacons from the mds regardless of what state they
are in.
Signed-off-by: Sage Weil <sage@newd... - 04:46 PM Feature #567 (Resolved): osd: background scrub frequency, scheduling
- fixed up some scheduling problems, then added the interval and oldest-scrubs-first stuff.
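The "interval and oldest-scrubs-first" policy mentioned here (see also revisions 85e08905 and c12829a2) can be pictured with a small Python sketch. This is only an illustration of the ordering idea, not the OSD's scheduler: the function name, the dictionary of timestamps, and the one-day default are assumptions drawn from the commit titles, and PG ids are plain strings.

```python
from datetime import datetime, timedelta


def pick_scrub_candidates(last_scrubbed, now, min_interval=timedelta(days=1)):
    """Return PG ids due for a scrub, least recently scrubbed first.

    last_scrubbed maps a PG id to the time of its last completed scrub.
    PGs scrubbed more recently than min_interval ago are skipped.
    """
    due = [
        (stamp, pgid)
        for pgid, stamp in last_scrubbed.items()
        if now - stamp >= min_interval
    ]
    # Sorting by timestamp puts the oldest scrub first.
    return [pgid for _stamp, pgid in sorted(due)]


# Hypothetical example data.
now = datetime(2010, 11, 11, 0, 0)
last_scrubbed = {
    "0.1": now - timedelta(hours=30),
    "0.2": now - timedelta(hours=2),   # scrubbed recently, so skipped
    "0.3": now - timedelta(days=3),
}
order = pick_scrub_candidates(last_scrubbed, now)
```

With this data, `order` lists "0.3" before "0.1" and leaves out "0.2", which was scrubbed within the last day.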
- 04:45 PM Revision 84840ed7 (ceph): debian: don't explicitly depend on libgoogle-perftools0
- dpkg-buildpackage will autodetect the dependency. Except on lenny, where
it doesn't exist and we don't use it!
Sign... - 04:14 PM Revision ca3693d8 (ceph): mds: Enable --journal_check mode.
- This replaces the old --shadow option, which didn't work.
It starts up the MDS daemon, then replays the journal for
a... - 04:13 PM Revision 214b7269 (ceph): osdc: Fix bad assert in ~ObjectCacher.
- The objects data member is never empty on shutdown since it now consists
of a vector of pools. Instead, check each po... - 03:43 PM Feature #572 (Resolved): Implement lingering osd requests
- For the watch/notify feature we need to implement lingering osd requests on the userspace client side. Lingering osd ...
- 03:42 PM Revision 5035c822 (ceph): uclient: only update inode if version increased
- This realigns the code with the kernel version, fixing a number of
problems when you have multiple MDSs returning inf... - 03:21 PM Linux kernel client Bug #571 (Closed): client hangs after osd disconnection
- This happens on the rbd watch/notify sync branch. Probably related to lingering requests.
- 12:12 PM Bug #559 (Rejected): osd: dup requests can ack early
- nevermind, this is already done and merged!
- 11:01 AM Linux kernel client Tasks #569 (Resolved): test dir frags
- Make sure we behave with fragmented dirs, esp readdir. (probably need to mirror the recent cfuse fixes.)
- 09:43 AM Bug #521 (Resolved): objecter: crash in osdmap assert
- commit:586c9e7a80b425802ca77d8c09bb00da5c25d616
- 09:15 AM Feature #568 (Resolved): debian: build with --as-needed?
- Can we do this to limit dependencies? See #544.
And the current warnings like... - 08:18 AM CephFS Feature #548 (Resolved): mds: shadowreplay one-shot mode
- commit:ca3693d8ffcdffc3ae95eaba506a72889829bcb5 makes minimal changes to the MDS and MDSMonitor code to enable the ne...
- 08:03 AM Revision 255e34af (ceph): decompile_crush_bucket: fix depth-first decomp
- We need to ensure that buckets are output after their dependencies. The
best way to do this is a depth-first traversa... - 07:58 AM Revision d1f15daf (ceph): CrushWrapper:get_bucket: ret ENOENT for no bucket
- All the callers of CrushWrapper::get_bucket() check for error codes, but
not for NULL returns. So if there is no buck... - 07:24 AM Bug #531: Journaling Causes System Hang
- What would be helpful in diagnosing this problem is:
- turn up osd logging, in [osd] section:
debug osd = 20
...
11/09/2010
- 11:56 PM Revision 11cfcfe8 (ceph): Merge branch 'sched_scrub' into unstable
- Conflicts:
src/osd/PG.cc
src/osd/PG.h
- 11:50 PM Revision e8ad6d26 (ceph): osd: small cleanup
- Signed-off-by: Sage Weil <sage@newdream.net>
- 11:46 PM Revision 28b44293 (ceph): osd: scrub: list objects without lock held
- We'll go back to get anything we missed later.
Signed-off-by: Sage Weil <sage@newdream.net> - 11:46 PM Revision c2d6d05f (ceph): Merge branch 'scrub_no_lock' into unstable
- 11:34 PM Revision 966369aa (ceph): ps-ceph.pl: don't show self
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 11:04 PM Revision 6bc31511 (ceph): gui: add missing #include
- Signed-off-by: Sage Weil <sage@newdream.net>
- 10:50 PM Revision 58394828 (ceph): Merge branch 'rbd-fiemap' into unstable
- 10:49 PM Revision e991702e (ceph): objecter: set READ flag on new objecter mapext/read_sparse ops
- Signed-off-by: Sage Weil <sage@newdream.net>
- 10:48 PM Revision adac5163 (ceph): objecter: fix balancer for ops with length < 0
- Notably, mapext.
Signed-off-by: Sage Weil <sage@newdream.net> - 10:36 PM Revision 20060548 (ceph): filestore: autodetect presence of FIEMAP ioctl
- If it's not there, assume the whole object is allocated.
Signed-off-by: Sage Weil <sage@newdream.net> - 10:35 PM Revision e5488718 (ceph): fiemap: include linux fiemap.h header; unconditionally compile helper
- If the system doesn't have the header, use our copy.
Signed-off-by: Sage Weil <sage@newdream.net> - 10:33 PM Revision 9f14dd25 (ceph): ps-ceph.pl: display Ceph tests
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 10:23 PM Revision 53b076d5 (ceph): Merge remote branch 'origin/rbd-fiemap' into unstable
- 10:06 PM Revision 2325a1a2 (ceph): Fix example config file
- We need to specify a journal size for the file-based journal we set up
in the example config file.
Signed-off-by: Co... - 09:59 PM Revision 2947d19d (ceph): TimerThread:don't call pop_front before iter deref
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 09:30 PM Revision 1c7d8f1a (ceph): Makefile: use openssl module check
- This allows ceph to build with --as-needed.
Signed-off-by: Kacper Kowalik <xarthisius@gentoo.org> - 09:17 PM Revision 954ad982 (ceph): osd: shut down if we do not exist
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:08 PM Revision ea56dfdc (ceph): osd: handle osds that no longer exist in prior_set_affected
- Consider no-longer-existent OSDs lost.
Signed-off-by: Sage Weil <sage@newdream.net> - 08:05 PM Revision 29428b9b (ceph): Objecter: initialize timer in Objecter::init
- Just in case future users of Objecter want to create one before calling
Messenger::start as a daemon.
Signed-off-by:... - 06:15 PM Revision ec4200b0 (ceph): Add test_crushtool.sh
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:06 PM Revision 019bb70e (ceph): mds: turn on mds_bal_frag (dir fragmentation) by default
- Let the fun begin!
Signed-off-by: Sage Weil <sage@newdream.net> - 06:04 PM Revision ae13fc86 (ceph): osd: handle osds that no longer exist in build_prior
- Fix build_prior to handle OSDs that no longer exist in the current map.
Consider them lost.
Signed-off-by: Sage Weil... - 06:04 PM Revision e15c9569 (ceph): mds: fix inode freeze auth pin allowance
- When we're renaming across nodes, we need to freeze the inode. This
requires that we allow for the auth_pins that _w... - 06:03 PM Revision 3107944e (ceph): osdmap: cleanup: add parens
- Signed-off-by: Sage Weil <sage@newdream.net>
- 05:59 PM Revision f28b99b3 (ceph): CrushWrapper::get_bucket_item: bounds check
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 05:59 PM Revision 9b487256 (ceph): crushtool: don't create a dump we can't recompile
- In crushtool, dump buckets in tree order. Buckets which reference other
buckets must be dumped after their dependencie... - 05:55 PM Revision e1588dc4 (ceph): mds: wipe out client sessions on startup
- For disaster recovery and such.
Signed-off-by: Sage Weil <sage@newdream.net> - 05:55 PM Revision 05a47387 (ceph): mon: implement 'mds newfs <metapool> <datapool>' command
- Create a new fs (by creating a new MDSMap) using the given pools.
Signed-off-by: Sage Weil <sage@newdream.net> - 05:55 PM Revision d80948ad (ceph): mds: use mdsmap data pool for root inode default layout
- The MDSMap may specify any random pool as the data pool; use that.
Signed-off-by: Sage Weil <sage@newdream.net> - 05:55 PM Revision 8a21c6f6 (ceph): mds: add mds_skip_ino and mds_wipe_ino_prealloc options
- These are last-ditch recovery tools. Not particularly effective ones,
though.
Signed-off-by: Sage Weil <sage@newdre... - 05:04 PM Linux kernel client Bug #549: bonnie++ file stat failure
- bonnie tests are running under ceph 5, 6, 8, and 9, logging to /data/qa/ on each machine.
- 04:28 PM Bug #535: cephtool hangs forever until a UNIX signal is received
- cephtool-hang-at-966369aad07461f2610b4dd2a9cdc770155c5a89.txt
- 03:08 PM Bug #535: cephtool hangs forever until a UNIX signal is received
- messenger-bug.txt
- 04:26 PM Bug #521: objecter: crash in osdmap assert
- Can you try with something like...
- 09:45 AM Bug #521: objecter: crash in osdmap assert
- latest from ML:...
- 03:59 PM Feature #567 (Resolved): osd: background scrub frequency, scheduling
- We should have some min interval such that the osds won't scrub the same osd more frequently than that.
Also, the ... - 03:56 PM Feature #425 (Resolved): trigger osd scrub automatically
- 03:54 PM Subtask #485 (Resolved): osd: cooperative scrub scheduling
- merged by commit:11cfcfe87503e50c892178d9c5c5b55da3aac740
- 03:45 PM Subtask #486 (Resolved): osd: make scrub not block writes
- merged commit:28b44293e34c5e97f350b4c68becdf9e7767ed6f
- 02:52 PM Bug #248 (Resolved): rbdtool import should use fiemap
- 02:52 PM Bug #248: rbdtool import should use fiemap
- Merged by commit:58394828a01950d7b26430d61d32df91df5a5fb1, bringing it in line with the objecter changes over the las...
- 02:13 PM RADOS Bug #558 (Resolved): crushtool cannot always re-encode a crushmap that it's created
- Fixed by commit:9b48725614a880cf1f4bcad0bba2ceefdc76c167
C. - 02:11 PM Bug #533 (Resolved): radostool hang on shutdown
- Should be fixed by timer-fixes.
C. - 02:10 PM Bug #565 (Resolved): Example config file is broken
- Fixed by 2325a1a27b434cea7d7af832efff7a9257724fe6
C. - 01:30 PM Bug #544 (Resolved): ceph-0.22.2: fails to build with --as-needed
- 01:16 PM Bug #566 (Resolved): osd: build_prior needs to be wary of nonexistent osds
- fixed by commit:954ad98230085c9c2a174fe15af24df237498977 commit:ea56dfdc663f8b0e19346bb63ffe3fec0c7759c4 commit:ae13f...
- 12:59 PM CephFS Bug #556 (Resolved): clustered mds: rename
- this wasn't too bad.. the locking auth_pin scheme changed a while ago and the auth_pin allowance didn't get adjusted ...
- 12:42 PM Linux kernel client Bug #546 (Resolved): direct i/o does not work when offset is not page-aligned
- See commit:c5c6b19d4b8f5431fca05f28ae9e141045022149. Passes my tests.
- 06:03 AM Revision aad3f7f2 (ceph): ceph.spec.in: don't strip rados classes
- Signed-off-by: Christian Brunner <christian@brunner-muc.de>
11/08/2010
- 10:49 PM Bug #535: cephtool hangs forever until a UNIX signal is received
- > Look, I know it's a pain, but work on this isn't going to progress unless
> we collect AT LEAST:
> 1) The state ... - 01:05 PM Bug #535 (Can't reproduce): cephtool hangs forever until a UNIX signal is received
- Look, I know it's a pain, but work on this isn't going to progress unless we collect AT LEAST:
1) The state of each ... - 10:35 AM Bug #535: cephtool hangs forever until a UNIX signal is received
- The process that is hung is 17181, cephtool.
- 10:35 AM Bug #535 (In Progress): cephtool hangs forever until a UNIX signal is received
- Reproduced again on the unfound branch, which is very close to what is in unstable now.
cmccabe@flab:~/src/ceph/... - 09:22 PM Revision 64f95ad9 (ceph): mds: add missing Dumper.[h,cc]
- 09:18 PM Revision be9328ac (ceph): mds: tolerate/fix negative dir size counts
- Signed-off-by: Sage Weil <sage@newdream.net>
- 08:44 PM Revision d5515a8f (ceph): mds: add missing Dumper.[h,cc]
- 08:40 PM Bug #566 (Resolved): osd: build_prior needs to be wary of nonexistent osds
- ...
- 08:09 PM Bug #565 (Resolved): Example config file is broken
- The example config file (src/sample.ceph.conf) specifies the OSD journal as a file, but doesn't specify the size, whi...
- 05:45 PM Revision 1ab7c7ff (ceph): Replace ps-ceph.sh shell script with perl script
- A much faster version of ps-ceph.sh.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> - 04:17 PM Linux kernel client Bug #384 (Closed): crash in splice_dentry
- 03:07 PM Feature #80 (Resolved): uclient: readdir from cache
- He already did it, yay!
- 02:41 PM Feature #96 (Resolved): msgr: close idle connections?
- Yay, this got done with the recent SimpleMessenger changes!
- 02:15 PM Feature #276 (Resolved): Possibility to dump/list xattrs from RADOS object
- Yehuda says he did this!
- 01:24 PM Bug #531: Journaling Causes System Hang
- We've looked at this a bit more but decided today that Sage is taking it over since he's a lot more familiar with the...
- 12:49 PM Linux kernel client Bug #564 (Resolved): Configuration via configfs instead of sysfs
- Will allow creation of different devices and setting them up. Should be device oriented, and will create a sub direct...
- 11:05 AM Bug #563 (Closed): osd: btrfs, warning at inode.c ( btrfs_orphan_commit_root )
- I'm running the unstable branch and I'm seeing in my dmesg:...
- 09:32 AM CephFS Bug #561: snaptest-2 doesn't execute properly
- Okay, looks like this may be an issue with the test rather than Ceph. I just copied it into the root of the ceph moun...
- 09:07 AM CephFS Bug #561 (Resolved): snaptest-2 doesn't execute properly
- Checked it on cfuse and kclient:...
- 09:27 AM RADOS Bug #558: crushtool cannot always re-encode a crushmap that it's created
- Either the compiler part just needs to be updated to allow forward bucket references, or the dumper needs to dump by ...
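The dumper-side option mentioned here, emitting each bucket only after the buckets it references, amounts to a depth-first post-order traversal. A minimal Python sketch follows. It assumes a hypothetical dict mapping each bucket name to the names of its children; it is not crushtool's actual data structure or code.

```python
def dump_order(buckets):
    """Return bucket names so that every bucket follows the buckets it references.

    `buckets` is a hypothetical dict: bucket name -> list of child names.
    Children that are not themselves buckets (e.g. devices) are skipped.
    """
    order = []
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        for child in buckets[name]:
            if child in buckets:
                visit(child)
        # Appended only after all referenced buckets have been emitted.
        order.append(name)

    for name in buckets:
        visit(name)
    return order


# Illustrative hierarchy; the names are made up for this example.
example = {
    "root": ["rack0", "rack1"],
    "rack0": ["host0"],
    "rack1": ["host1"],
    "host0": ["osd.0"],
    "host1": ["osd.1"],
}
order = dump_order(example)
```

With this ordering a compiler that requires buckets to be defined before use can read the dump back without needing forward references.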
- 09:26 AM Feature #562 (Closed): separate gui into separate binary, package
- This will mean refactoring common ceph.cc bits into a separate file and .a.
- 09:22 AM Linux kernel client Bug #434: mds: clustered mds pjd failures
- a few more fixes here on inode updates version check and mtime.
- 07:23 AM Linux kernel client Bug #434 (Resolved): mds: clustered mds pjd failures
- this was a kclient problem caused by bad uid/gid in resent requests. fixed by commit:cb4276cca4695670916a82e359f2e377...
- 09:20 AM Tasks #406 (Closed): push v0.20.2 to upstream debian, ubuntu maintainers
- 09:20 AM CephFS Cleanup #427 (Rejected): mds: tie scatter pins directly to freeze machinery
- no more scatterpins, yay!
- 09:19 AM Linux kernel client Bug #554 (Resolved): clustered mds: max_size not updated
- 07:39 AM CephFS Feature #560 (Resolved): mds: alternate directory hashing
- Currently dentries are hashed among dirfrags using the linux dcache's hash function, which is pretty trivial. The pr...
- 07:30 AM Bug #559: osd: dup requests can ack early
- The dup request check looks at the reqid in the log, and replies early. That request could still be in flight to dis...
- 07:28 AM Bug #559 (Rejected): osd: dup requests can ack early
11/07/2010
- 06:02 PM RADOS Bug #558 (Resolved): crushtool cannot always re-encode a crushmap that it's created
- When a CRUSH text map is encoded, the buckets are read in such a way that they must be defined before they are refere...
- 05:56 PM Revision 0feec2f4 (ceph): Merge remote branch 'origin/object_locator' into unstable
- Conflicts:
src/osd/OSD.cc
src/osd/ReplicatedPG.cc
src/osd/ReplicatedPG.h
src/osd/osd_types.h - 05:45 PM Revision b7f578cf (ceph): Merge remote branch 'origin/timer-fixes' into unstable
- 05:44 PM Revision deb9ef76 (ceph): v0.24~rc
- 05:42 PM Revision 0b190920 (ceph): Merge remote branch 'origin/testing' into unstable
- 03:49 PM Revision a4674af5 (ceph): mds: eval: put scatter in MIX if replicated, otherwise LOCK
- Signed-off-by: Sage Weil <sage@newdream.net>
- 03:45 PM Revision 33c6e230 (ceph): mds: do not scatter_writebehind in MIX state
- Replicas might come in while we're flushing and get a MIX state with
the old state.
Signed-off-by: Sage Weil <sage@n... - 11:29 AM Feature #231: Slow OSDs shouldn't destroy cluster performance
- Today I experienced a btrfs bug where *[btrfs-transacti]* went into status D, causing my OSD to hang (also go into st...
- 10:18 AM Linux kernel client Bug #554: clustered mds: max_size not updated
- fixed by commit:912a9b0319a8eb9e0834b19a25e01013ab2d6a9f. also commit:feb4cc9bb433bf1491ac5ffbba133f3258dacf06 for g...
- 10:15 AM Feature #524 (In Progress): object_locator_t
- Work so far merged by commit:0feec2f4f31aa3a259b2cdf885d6458995ce860b
Still need to update the on-wire protocol to... - 10:08 AM CephFS Feature #495 (Resolved): mds: add MIX_STALE
- merged in commit:0b1909209800229f5098cdc848fc3901508c1e19. best part of this is MIX_STALE went away. yay!
- 10:05 AM Bug #248 (In Progress): rbdtool import should use fiemap
- whoops, this never got merged.
- 08:58 AM Linux kernel client Bug #557 (Can't reproduce): BUG_ON(!session->s_num_cap_releases);
- ...
- 08:11 AM CephFS Bug #556 (Resolved): clustered mds: rename
- various hangs with thrash-exports and pjd rename tests.
- 04:05 AM Revision 1bf8e732 (ceph): Merge branch 'unstable' into mix_stale
- 04:01 AM Revision 1eb94da2 (ceph): mds: introduce/use helpers to resync stale fragstat/rstat; update version
- Simplifies code.
Also, update the version when we resync!
Signed-off-by: Sage Weil <sage@newdream.net> - 04:01 AM Revision c1ee560e (ceph): mds: don't fuss with versions when taking frag/rstat from frag; it's ne...
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:01 AM Revision bdc2fa5b (ceph): mds: remove MIX_STALE
- Yay, we don't need it!
If we can't update the frag on scatter, fine. The staleness of the frag
is implicit in the f... - 04:00 AM Revision c2034829 (ceph): mds: ignore done_locking on slave requests' acquire_locks()
- Slave requests ask for each xlock one at a time. Don't bail out based on
the done_locking flag.
Signed-off-by: Sage... - 04:00 AM Revision 51b6a863 (ceph): mds: don't use helper for rename srcdn
- The rdlock_path_xlock_dentry helper works for _auth_ dentries that we
create locally in an auth dirfrag. For the src... - 04:00 AM Revision eb0a60d0 (ceph): mds: never complete a gather on a flushing lock
- The scatter_writebehind() takes a wrlock, but that may still allow the lock
to complete a gather to LOCK and even mov...
11/06/2010
- 04:38 PM Revision bdf3bc5e (ceph): mds: update version when bring stale rstat back up to date
- Signed-off-by: Sage Weil <sage@newdream.net>
- 02:58 PM Revision a74054d1 (ceph): mds: simplify stale semantics a bit
- is_stale() => next MIX is MIX_STALE. Stale flag is then cleared. Then we
special case the import to preserve stale-n... - 01:30 PM Bug #555 (Closed): debian/ubuntu: ceph-client-tools needs to depend on libgtkmm-2.4-1c2a
- Invalid report; it was due to an upgrade. On a fresh install, the packages do depend on libgtkmm.
Cl... - 11:52 AM Bug #555 (Closed): debian/ubuntu: ceph-client-tools needs to depend on libgtkmm-2.4-1c2a
- Right now, the building process depends on libgtkmm-2.4-dev, but when installing the packages and running 'ceph -g' y...
- 11:55 AM Linux kernel client Bug #434: mds: clustered mds pjd failures
- Just saw this again:...
- 11:18 AM Bug #553: Kernel module doesn't build under Debian lenny
- Ok, a backport-kernel works fine AFAIS. I updated the wiki-page.
- 10:10 AM Bug #553 (Won't Fix): Kernel module doesn't build under Debian lenny
- Unfortunately you're going to need to upgrade your kernel if you want the in-kernel client. Using the backports branc...
- 09:52 AM Bug #553 (Won't Fix): Kernel module doesn't build under Debian lenny
- Hello all,
the wiki-page [1] says that ceph runs under Debian lenny, but as far as I see that is not true because th... - 11:16 AM Linux kernel client Bug #554 (Resolved): clustered mds: max_size not updated
- 3 mds, export thrashing, dbench 1 hang waiting on max_size.
- 04:52 AM Revision e27f111f (ceph): mds: preserve stale state on import; some cleanup
- Our new invariant is that MIX_STALE always implies is_stale(). And on
import, if is_stale(), MIX becomes MIX_STALE. ... - 12:08 AM Revision a582345c (ceph): Merge branch 'mix_stale' into unstable
- 12:06 AM Revision 4126d1ce (ceph): mds: add more verify_scatter asserts
- For catching fragstat errors sooner.
Signed-off-by: Sage Weil <sage@newdream.net>
11/05/2010
- 10:24 PM Revision ae670c33 (ceph): mds: fix version check on resyncing stale rstat in predirty_journal_par...
- We're resyncing rstat, so check the rstat version (not fragstat!)
Signed-off-by: Sage Weil <sage@newdream.net> - 07:45 PM Revision 4cee6ead (ceph): mds: Fix bad inode deref.
- Accidentally trying to print out the CInode after removing it in trim_non_auth!
Move the print to before it's been un... - 07:20 PM Revision 93344fb2 (ceph): Revisit std::multimap decoder
- Previously I changed the std::multimap decoder to minimize the number of
constructor invocations. However, it could b... - 06:34 PM Revision f015c989 (ceph): autogen.sh: check for pkg-config
- To avoid seeing confusing errors later in the configure process, in
autogen.sh, check to make sure the pkg-config pro... - 05:57 PM Revision fd397aba (ceph): PG.cc: build_scrub_map now drops the PG lock while scanning the PG
- build_inc_scrub_map scans all files modified since the given
version number and creates an incremental scr... - 05:38 PM Revision 989fa67d (ceph): mds: preserve version when recovering rstat from dirfrag in predirty_jo...
- We don't want to screw up the version here. This aligns the code with
other instances of this check.
Signed-off-by:... - 02:50 PM Linux kernel client Bug #552 (Resolved): Samba with kernel oplocks=on produces lots of corrupt mds entries in dmesg
- With kernel oplocks = yes, samba fills up dmesg with those
[ 4472.504211] ceph: problem parsing dir contents -5
[... - 01:56 PM Linux kernel client Bug #434: mds: clustered mds pjd failures
- Sage has taken over the clustered MDS stuff for now, so here's the bug!
- 01:55 PM CephFS Feature #495: mds: add MIX_STALE
- 01:36 PM Bug #521: objecter: crash in osdmap assert
- 01:02 PM CephFS Bug #551 (Can't reproduce): cfuse crash on quick mds restart
- Program terminated with signal 11, Segmentation fault.
#0 0x00000000004704ad in Client::kick_flushing_caps (this=0x... - 12:29 PM Bug #550: mon: PGMonitor::update_from_paxos()
- While I thought it wasn't related to the MDS issue I'm seeing, it seems it might be:...
- 12:11 PM Bug #550 (Can't reproduce): mon: PGMonitor::update_from_paxos()
- One of my monitors crashed, got this backtrace:...
- 10:59 AM Linux kernel client Bug #549: bonnie++ file stat failure
- Terri, can you have the qa machines loop through _just_ the bonnie++ command he's having problems with?  Something li...
- 10:57 AM Linux kernel client Bug #549 (Resolved): bonnie++ file stat failure
- From ML:...
- 10:49 AM Bug #531: Journaling Causes System Hang
- Hello,
1) Correct we are running transparent 10GbE
2) From what I can tell monitoring dstat across the cluster ... - 10:14 AM CephFS Feature #91: mds: up:shadow mode
- Update the journaler interface to allow the MDS to 'tail' the journal... periodically check to see if it's been exten...
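The "tail" behaviour described here can be sketched as a reader that remembers how far it has read and, on each periodic check, returns only what was appended since. The Python below is an illustration under assumptions: `JournalTailer` and `append` are hypothetical names, and an in-memory stream stands in for the journal. It is not the journaler interface.

```python
import io


class JournalTailer:
    """Hypothetical reader that returns only newly appended journal bytes."""

    def __init__(self, stream):
        self.stream = stream
        self.read_pos = 0  # offset up to which the journal has been consumed

    def poll(self):
        # Find the current end of the journal.
        self.stream.seek(0, io.SEEK_END)
        end = self.stream.tell()
        if end <= self.read_pos:
            return b""  # nothing new since the last check
        # Read just the extension and advance our position.
        self.stream.seek(self.read_pos)
        data = self.stream.read(end - self.read_pos)
        self.read_pos = end
        return data


def append(stream, data):
    """Stand-in for the active writer extending the journal."""
    stream.seek(0, io.SEEK_END)
    stream.write(data)


journal = io.BytesIO()
tailer = JournalTailer(journal)

append(journal, b"event-1")
first = tailer.poll()    # picks up the first extension
idle = tailer.poll()     # no growth since the previous poll
append(journal, b"event-2")
second = tailer.poll()   # picks up only the new bytes
```

A real implementation would call the equivalent of `poll()` on a timer and replay whatever it returns.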
- 10:10 AM CephFS Feature #548 (Resolved): mds: shadowreplay one-shot mode
- Make sure the current mechanism still works. Clean it up if needed.
- 09:19 AM CephFS Subtask #547 (Resolved): mds: define fsck strategy, required metadata
- 09:19 AM CephFS Feature #340 (Closed): large directories, directory fragmenting
- 09:19 AM CephFS Feature #519 (Closed): mds: dirfrag merge
- 06:20 AM Revision 9586e905 (ceph): mds: restructure finish_scatter_gather_update()
- Separate behavior into two dimensions: whether or not we are updating
the dirfrag, and whether or not the dirfrag is ... - 06:15 AM Revision 669a8afa (ceph): mds: do not bump scatter stat lock in predirty_journal_parents
- If we're in the MIX state, we clearly can't touch this without screwing up
the delicate scatter/gather behavior. If ... - 05:48 AM Revision 663b470f (ceph): mds: mark scatterlock stale on import of stale frag scatter stat
- When the lock scattered, if we didn't have an auth frag that was frozen,
we go into MIX state.  Later, we may import ... - 05:44 AM Revision 63c1ad84 (ceph): mds: match bottom half of assimilate_dirty_rstat_inodes with a dir flag
- We only do the assimilate_dirty_rstat_inodes if we do an update AND the
frag rstat was non-stale, but the bottom half... - 05:19 AM Revision 9b6d96e9 (ceph): mds: fix inode version used for inest in decode_lock_state
- We need to pass the inode rstat's version into finish_scatter_update, not
the shadowed local variable. Otherwise we ...
11/04/2010
- 11:22 PM Revision 62716aa7 (ceph): PGMonitor::update_from_paxos: check for bad input
- Be more robust against bad data coming in from the network.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net> - 11:04 PM Linux kernel client Bug #546: direct i/o does not work when offset is not page-aligned
- Attached is the testing file.
- 10:55 PM Linux kernel client Bug #546 (Resolved): direct i/o does not work when offset is not page-aligned
- When a file is opened with O_DIRECT, seeking to offset 6656 and reading 512 bytes returns the wrong data.
Below is a strace log... - 09:33 PM Revision 8f3672dc (ceph): Replace sprintf with snprintf
- Replace sprintf with snprintf. This is especially critical when the
format string includes "%s".
Signed-off-by: Coli... - 09:26 PM Revision 56179d12 (ceph): start_profiler/enable_profiler_options:fix memleak
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 09:11 PM Revision e6a751bd (ceph): Set HEAP_PROFILE_INUSE_INTERVAL based on conf
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 09:09 PM Revision 8c8bfdb3 (ceph): CInode::make_path_string: don't coerce ino
- CInode::make_path_string: don't coerce the inode number to 32-bits.
Everyone else is treating it as 64 bits; this fun... - 08:17 PM Revision f23ba003 (ceph): mds: verify single frag rstat on projection too
- Currently we do a sanity check on gather; do the same check in
project_rstat_frag_to_inode().
Signed-off-by: Sage We... - 08:17 PM Revision 53f6ed16 (ceph): mds: mds debug scatterstat to print out projected rstat/fragstat
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:58 PM Revision 4df92ade (ceph): Merge branch 'dumpjournal' into unstable
- 06:41 PM Revision d3c2b9cb (ceph): cmds: Include journal dumper functionality.
- 06:41 PM Revision e0a5de25 (ceph): dumper: Add new Dumper class.
- This lets you dump an MDS journal to a file.
- 06:33 PM Revision 28f956ae (ceph): mds: fix optional frag asserts
- We want these to trigger when mds_verify_scatter is true. Only one !.
Signed-off-by: Sage Weil <sage@newdream.net> - 06:28 PM Revision 86d6e51e (ceph): objecter: add new wait_for_osd_map function.
- 06:13 PM Revision 8a41d096 (ceph): osd: clean up active <-> booting state transitions
- Among other things, get rid of the 'wrongly marked down' log message on
normal startup.
Signed-off-by: Sage Weil <sa... - 05:24 PM Revision f917df79 (ceph): TestEncoding: count number of ctor invocations
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 12:00 PM Feature #456 (Resolved): make dumpjournal functionality usable
- Pushed to branch dumpjournal, and merged into unstable in commit:4df92adea46730bdb7cb88290203cad2369ed895.
Tested ... - 10:57 AM Bug #544: ceph-0.22.2: fails to build with --as-needed
- Sage Weil wrote:
> Thanks! Can I add your Signed-off-by to this (as per SubmittingPatches)?
Sure. - 10:04 AM Bug #544: ceph-0.22.2: fails to build with --as-needed
- Thanks! Can I add your Signed-off-by to this (as per SubmittingPatches)?
- 12:57 AM Bug #544 (Resolved): ceph-0.22.2: fails to build with --as-needed
- Due to the wrong linking order[1] of ceph's libraries, the whole package fails to build with LDFLAGS="-Wl,--as-needed".
Supp... - 10:25 AM Bug #531: Journaling Causes System Hang
- Brian, can you give us a few more details about your cluster and the performance drop you're seeing here? Specific qu...
- 10:09 AM Bug #538: Write performance does not scale over multiple computers
- Ed Burnette wrote:
> I'll try that if I can get the servers to stay up long enough. ceph -w is swamped with chatter abou... - 09:56 AM CephFS Feature #545 (Resolved): mds: use bloom filter to supplement dirfrag COMPLETE flag
- Currently we need the complete flag (or a cached negative dentry) to conclude a name does not exist in a frag before ...
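The appeal of a Bloom filter here is that it never reports a stored name as absent. A negative answer is therefore enough to conclude that a name does not exist, while a positive answer still needs a real lookup. A small Python sketch of that property follows. The class, its sizes, and the hash construction are assumptions for illustration, not the MDS design.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; parameters are arbitrary, not tuned."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, name):
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, name):
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, name):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(name)
        )


# Hypothetical dentry names stored in one dirfrag.
names = ["a.txt", "b.txt", "c.txt"]
frag_filter = BloomFilter()
for n in names:
    frag_filter.add(n)
```

A lookup for a name where `may_contain()` returns False could be answered negatively straight away. A True result may be a false positive and would fall back to the existing path.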
- 05:28 AM Revision 1c934ebd (ceph): mds: wait for last_failure_osd_epoch before starting journal replay
- This is extremely important, and it forces the MDS to get the osdmap that
includes the blacklist entry for its predec... - 05:28 AM Revision e90a3b62 (ceph): mds: dump corrupt events; optionally skip them
- If we encounter a bad event in the journal, dump it to the log.
Optionally skip it, if 'mds log skip corrupt events ... - 05:28 AM Revision f5112866 (ceph): mon: blacklist and update last_failure_osd_epoch in all failure paths
- This includes the pure failure in do_stop(), and the explicit admin
fail command.
Signed-off-by: Sage Weil <sage@new... - 05:28 AM Revision 6345fcda (ceph): mon: update mdsmap.last_failure_osd_epoch when blacklisting
- We need to note the osdmap epoch the taking-over mds needs in the mdsmap.
Signed-off-by: Sage Weil <sage@newdream.net> - 05:28 AM Revision 0fb22974 (ceph): mds: add last_failure_osd_epoch to extended section of mdsmap
- Signed-off-by: Sage Weil <sage@newdream.net>
- 05:00 AM Revision c4e56e9a (ceph): MonClient: start SafeTimer in MonClient::init()
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:55 AM Revision 8f33a415 (ceph): cosd: start SafeTimer in OSD::init()
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:40 AM Revision 1cf5bc74 (ceph): cephtool: fix timer init/destruction
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:40 AM Revision 2c7d293d (ceph): vstart.sh: turn on MDS debugging
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:40 AM Revision e4853fa8 (ceph): SafeTimer: delete contexts under the event_lock
- SafeTimer: delete contexts under the event_lock.
Also add more debug printouts and create two convenience functions.
... - 04:40 AM Revision b0e73746 (ceph): TestTimers: add test for out-of-order timer insert
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:40 AM Revision 0b9f2e23 (ceph): Timer: add verbose debugging when debug timer = 20
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:40 AM Revision 124d287a (ceph): Monitor: start timer thread in init(), not ctor
- Don't start the SafeTimer when class Monitor is created. We want to hold off on
starting the thread until SimpleMesse... - 04:40 AM Revision e6b8dbae (ceph): Timer: fix timer shutdown, efficiency issues
- Rework Timer and SafeTimer to be more efficient and to handle shutdown
correctly. Document the API, especially what l... - 04:40 AM Revision cd316651 (ceph): TestTimers: call common_init and parse argv
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:40 AM Revision d840e4f0 (ceph): TestTimers: test cancelling single events
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:40 AM Revision 8279f14b (ceph): Timer.cc: clean up debug printouts
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:40 AM Revision 571e3750 (ceph): SafeTimer: clean up copy constructor declaration
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:40 AM Revision d3ead43a (ceph): Logger.cc: avoid creating SafeTimer in global-ctor
- Don't create a SafeTimer at global constructor time. Timers
contain a Thread, and the library stuff may not have been...
11/03/2010
- 11:41 PM Revision 0d1bfe06 (ceph): client: print useful max_size waiting message
- Signed-off-by: Sage Weil <sage@newdream.net>
- 11:40 PM Revision fc9059e5 (ceph): Merge branch 'mix_stale' into unstable
- 11:40 PM Revision 4f24fcbc (ceph): debian: add gtk build-depends
- For ceph -g.
Signed-off-by: Sage Weil <sage@newdream.net> - 11:30 PM Bug #479: ceph/mount crash badly when writing
- Hi DongJin,
Any luck on this issue? Has the problem gone away, or do you have time to help us track it down?
T... - 11:27 PM CephFS Bug #478 (Can't reproduce): MDS crash: LogEvent::decode()
- 11:27 PM CephFS Bug #478: MDS crash: LogEvent::decode()
- From the mds dump in the debugpacks, it looks like there were MDS daemons on two different nodes. I'm inclined to ch...
- 10:49 PM CephFS Bug #542 (Resolved): mds journal corruption
- 10:49 PM CephFS Bug #542: mds journal corruption
- commit:1c934ebd6ff3a3a7000671821a12e83c609f1e27
- 10:24 PM CephFS Bug #542: mds journal corruption
- Mystery solved.. this was actually a takeover:
- where the old mds was blacklisted
- new mds probed and read jour... - 09:38 PM CephFS Bug #542 (Resolved): mds journal corruption
- I saw this on the playground.
The last bit of the replay log:... - 10:49 PM Bug #535: cephtool hangs forever until a UNIX signal is received
- I should have written this at the top of the bug report, but this was on the unstable branch.
Anyway, I'll add mor... - 02:11 PM Bug #535 (Rejected): cephtool hangs forever until a UNIX signal is received
- This occurrence is a problem on the monitor side that reproduces in the timer-fixes branch, but not unstable.
- 10:45 PM Feature #543 (Resolved): PG::search_for_missing: don't iterate over all missing
- PG::search_for_missing processes a replica's missing map to determine if it has any objects that we need.
If the m... - 09:47 PM Revision fd57f4de (ceph): mds: fix put_xlock() assert for slave masters
- If we are a master of a slave, the state will be LOCK.
Signed-off-by: Sage Weil <sage@newdream.net> - 09:47 PM Revision d0c29d7d (ceph): mds: add 'mds verify scatter' and re-add some scatter asserts
- Check on ifile and inest gather that stats match single-frag dirs.
Signed-off-by: Sage Weil <sage@newdream.net> - 09:47 PM Revision 563a9ba6 (ceph): mds: finish_scatter_update on auth dirfrags too
- We can update the dirfrag accounted on auth dirfrags at scatter time too.
Signed-off-by: Sage Weil <sage@newdream.net> - 09:47 PM Revision 8b9342c7 (ceph): mds: disable tempsync
- Tempsync is not implemented in the filelock state machine. Never use it,
at least for now!
Signed-off-by: Sage Weil... - 09:47 PM Revision 4d669c8c (ceph): mds: request unscatter when MIX_STALE on replica
- This means implementing REQUNSCATTER.
Eventually this should use TEMPSYNC, but that isn't fully implemented yet.
Si... - 09:47 PM Revision a98812f9 (ceph): mds: rename 'mix stale' => 'mix_stale'
- For unambiguous debug output
Signed-off-by: Sage Weil <sage@newdream.net> - 09:12 PM CephFS Bug #472 (Resolved): mds: fragstat crash
- 08:08 PM Revision 0e079bc8 (ceph): mds: use helper for scatter dirfrag update; use on local dirfrags
- Any time we scatter is an opportunity to update the dirfrag with the
accounted scatter stat if it is out of date. We... - 07:52 PM Revision 77ec378d (ceph): Add the ps-ceph.sh tool
- This allows you to see at a glance which ceph programs and tools you
have running.
Signed-off-by: Colin McCabe <coli... - 07:19 PM Revision 4e586dd0 (ceph): encoding.h: fix compiler warning
- Fix a compiler warning about an uninitialized variable. Basically, we
used to insert uninitialized values into a std:... - 07:19 PM Revision c98b0268 (ceph): TestEncoding: add templated encode-then-decode fn
- TestEncoding: add a templated encode-then-decode fn that can be used to
test encoding followed by decoding of any typ... - 07:18 PM Revision 84e2da8d (ceph): Create TestEncoding to test serialization code
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 06:07 PM Revision 60c59aed (ceph): mds: add some scatterlock notes
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:03 PM Revision 0dc75a94 (ceph): ceph: remove bad assert for old frag stat
- It's normal for old fragstat info to be mismatched (stat !=
accounted_stat).
Signed-off-by: Sage Weil <sage@newdream... - 05:51 PM Revision 34135185 (ceph): mds: match conditions in finish_scatter_gather_update_accounted
- This needs to match the frozen check in finish_scatter_gather_update.
Signed-off-by: Sage Weil <sage@newdream.net> - 05:12 PM Revision 33268e20 (ceph): mds: handle MIX_STALE on auth too
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:51 PM Revision 14f4d22c (ceph): mds: scatter_info_t ancestor for nest_info_t and frag_info_t
- This will facilitate using generic code for the inest and ifile
scatterlocks.
Signed-off-by: Sage Weil <sage@newdrea... - 04:47 PM Revision cbacc1d4 (ceph): mds: only mark auth dirfrags stale in start_scatter
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:40 PM CephFS Feature #495 (Resolved): mds: add MIX_STALE
- commit:fc9059e5270380c3266f7f958da6a8cc9b042f22
- 04:05 PM CephFS Feature #495: mds: add MIX_STALE
- Sage has been working on this today.
- 03:42 PM CephFS Feature #541 (Resolved): mds: tempsync
- Integrate this into the filelock state machine, and then use it when appropriate (namely, unscatter)
- 08:25 AM Bug #540 (Resolved): CephxClientHandler::handle_response
- Saw this crash today after upgrading to the latest unstable:...
- 07:19 AM Bug #538: Write performance does not scale over multiple computers
- Greg Farnum wrote:
> Just to be clear, do you have all 208 nodes running server daemons and the client? What's your ...
- 04:51 AM Revision 44574e86 (ceph): mds: mark scatterlock stale if any auth dirfrags appear stale
- The auth needs to move to MIX_STALE for the same reasons a replica does:
if, on scatter, any dirfrags have an old acc...
- 04:49 AM Revision 4a0f7312 (ceph): mds: do not update accounted_*stat if auth and frozen
- The auth can't update a frozen dirfrag for the same reason a replica
can't.
Signed-off-by: Sage Weil <sage@newdream....
- 12:52 AM Revision 839371cc (ceph): osd: Added load threshold for scrub scheduling
11/02/2010
- 11:34 PM Revision 3ae8c001 (ceph): osd: Make a per-pg sched_scrub, and remove non-active accounting from t...
- 11:28 PM Revision 9d1984e8 (ceph): mds: mark scatterlock stale if dir is frozen, not inode
- It's the dir we're auth for and that might potentially be frozen.
Signed-off-by: Sage Weil <sage@newdream.net>
- 11:25 PM Revision 4838016d (ceph): Merge branch 'unstable' into mix_stale
- 10:26 PM Revision e304a245 (ceph): rados: benchmark using unique object names
- Include hostname and pid in object name, so that instances running on
different hosts write to unique objects.
Signe...
- 09:37 PM Revision 38f96c65 (ceph): debian packaging: set --sbindir=/sbin
- We want mkcephfs and mount.ceph to be under /sbin.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 08:00 PM Revision 68f7fede (ceph): config: fix sigsegv handler
- Fixed this with sigabrt, forgot to do sigsegv too.
See 7a688a9f999a6b9d3bcdcbebbd8cd984afc70e31.
Signed-off-by: Sag...
- 06:10 PM Revision 235aa1c3 (ceph): filestore: disable 'filestore btrfs snap' when SNAP_DESTROY is missing
- We want to enable the new snap stuff by default. But we also want to work
with the default configuration on old kern...
- 05:43 PM Revision 4cfd198c (ceph): Makefile.am: include the libcrush headers when installing
- Signed-off-by: Wido den Hollander <wido@widodh.nl>
- 05:10 PM Revision abb0b6d9 (ceph): Merge branch 'testing' into unstable
- 05:09 PM Revision 5310ab6e (ceph): uclient: Warn on truncate_[size|seq] changes for non-file inodes.
- 05:09 PM Revision 630db2a9 (ceph): mds: Init system CInodes to have a truncate_size of -1.
- This should help with bug #518.
- 05:09 PM Revision 524c8903 (ceph): client: match initialization with mds
- (see Server::prepare_new_inode())
Signed-off-by: Sage Weil <sage@newdream.net>
- 05:09 PM Revision 20e8a451 (ceph): client: only do truncate on regular files
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:45 PM Revision 905ff763 (ceph): debian: add pkg-config as build-depends
- Signed-off-by: Wido den Hollander <wido@widodh.nl>
- 03:34 PM Bug #535 (In Progress): cephtool hangs forever until a UNIX signal is received
- Okay, got in on a hang. The Pipe's been doing a disconnect/reconnect loop for about 4 minutes, it's currently in stat...
- 03:24 PM Bug #538: Write performance does not scale over multiple computers
- Oh, there is another issue: the rados bench command always writes to objects "Object %d". So all of your nodes are w...
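The collision described in this comment, and the fix recorded in revision e304a245 (include hostname and pid in the object name), can be sketched as follows. This is a minimal Python illustration, not the rados bench code: the function names and the `benchmark_<host>_<pid>_object<N>` format are assumptions made for the example.

```python
import os
import socket


def shared_object_name(index):
    # Naming scheme quoted in the comment: every instance on every host
    # produces the same names, so all writers target the same objects.
    return "Object %d" % index


def unique_object_name(index, hostname=None, pid=None):
    # Hypothetical per-instance scheme: mix hostname and pid into the name
    # so that instances on different hosts write to distinct objects.
    host = hostname if hostname is not None else socket.gethostname()
    proc = pid if pid is not None else os.getpid()
    return "benchmark_%s_%d_object%d" % (host, proc, index)
```

With the shared scheme, two nodes writing object 3 both produce "Object 3"; with the per-instance scheme the names differ whenever the hostname or the pid differs.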
- 12:32 PM Bug #538: Write performance does not scale over multiple computers
- Just to be clear, do you have all 208 nodes running server daemons and the client? What's your configuration look lik...
- 11:55 AM Bug #538: Write performance does not scale over multiple computers
- I also tried setting the target pg count to 4,000 and got about the same numbers as 400, maybe a small amount faster....
- 11:24 AM Bug #538: Write performance does not scale over multiple computers
- I set the target pg count to 400 and tried again. It helped some, up to 2x, but is still slower than I expected:
<...
- 10:19 AM Bug #538: Write performance does not scale over multiple computers
- If the benchpool is a new pool you created, the problem is likely that it is too small. By default, new pools have o...
- 08:15 AM Bug #538 (Closed): Write performance does not scale over multiple computers
- I have ceph 0.22.1 installed on a cluster of 208 lightly loaded 64-bit Linux nodes (RHEL5.5 ext3). The configuration i...
- 02:48 PM Bug #537: debian/ubuntu: Build system broken after commit
- oh yeah, and thanks for your patches, Wido. Good call with the libcrush headers.
C.
- 02:47 PM Bug #537 (Resolved): debian/ubuntu: Build system broken after commit
- should be fixed by 38f96c658dee3e7e26a68a3c57eec2a5d8758e17
cheers,
C.
- 10:13 AM Bug #537: debian/ubuntu: Build system broken after commit
- I applied #2, but for #1, we really do want those installed in /sbin (so say the debian/ubuntu guys). That's unfortu...
- 06:45 AM Bug #537 (Resolved): debian/ubuntu: Build system broken after commit
- commit 1dd5042e655b80eae99f002047fe1dfb4cc46120 broke some things when building .deb packages, mainly because the loc...
- 10:46 AM Feature #389: Synchronize header modifications between clients
- Still working on it. Major functionality that was implemented:
- new osd- watch/notify/notify-ack messages
- most...
- 10:32 AM Linux kernel client Bug #69: ceph: ffff88001976ba50 auth cap (null) not mds0 ???
- just saw this on ceph1, running commit:2f56f56ad991edd51ffd0baf1182245ee1277a04...
- 10:19 AM Tasks #539 (Resolved): wiki: document pg expansion
- 10:18 AM CephFS Bug #529 (Resolved): Cfuse: Software caused connection abort
- There was a sequence of commits in this, some of which were one step forward and two steps back. The testing branch ...
- 05:51 AM CephFS Bug #529: Cfuse: Software caused connection abort
- I was going to apply the patch to my version but I noted that my src/client/Client.h line 516 already says "truncate_...
- 09:41 AM Bug #536 (Resolved): debian/ubuntu: Add pkg-config as a build dependency
- applied commit:905ff7635297614633175f129f491a83c3b2f314, thanks!
- 02:37 AM Bug #536 (Resolved): debian/ubuntu: Add pkg-config as a build dependency
- When trying to build today's unstable, I got the following message:...
- 05:44 AM Bug #532 (Closed): OSD: repop_queue.front() == repop
- Indeed, my build system was still building the *rc* branch, oops!
- 05:09 AM Revision bc9bc4cb (ceph): init-ceph: make lockfile dir configuration (redhat)
- Reported-by: Ed Burnette <ed.burnette@sas.com>
Signed-off-by: Sage Weil <sage@newdream.net>
- 02:31 AM Revision 85ba4f2d (ceph): object.h: const cleanup
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
11/01/2010
- 10:36 PM CephFS Bug #451 (Can't reproduce): mds: replay error
- 10:35 PM CephFS Bug #523: cfuse locks don't wake on mds reconnect?
- This might be the same issue as #535 (which looks to me like it's waiting on tcp_read/poll?).
- 10:33 PM CephFS Bug #529: Cfuse: Software caused connection abort
- Hey Greg, this looks like client truncation stuff again. This was biting me today, almost immediately. These two pa...
- 07:40 AM CephFS Bug #529 (Resolved): Cfuse: Software caused connection abort
- After using ceph for a few minutes it gets into a state where I can no longer access the cfuse mount point. It also s...
- 10:33 PM Revision ee3fc3bd (ceph): osd: Add scrub to the names of scrub scheduling-related things.
- 10:31 PM Revision 993ba1cd (ceph): osd: refactor OSD::sched_scrub
- Take sched_scrub_lock sparingly, and push active/pending accounting to the work queue.
- 10:31 PM Revision 8d200a7d (ceph): osd: Move pending/active scrub accounting into the scrub work queue.
- 10:30 PM Revision 378f84c1 (ceph): osd: Add the rest of infrastructure for scheduling scrubbing
- 10:29 PM Bug #531: Journaling Causes System Hang
- Yeah,
I figured not running with journals wouldn't work right. As long as the block size of the writes is very lar...
- 10:22 PM Bug #531: Journaling Causes System Hang
- It's expected that you'll get extremely slow performance without the journal.
I'll work on replicating this in o...
- 12:08 PM Bug #531: Journaling Causes System Hang
- I forgot a bit about the setup.
4 x OSD all with journals on separate drives. Each OSD is on a separate system.
B...
- 12:00 PM Bug #531 (Resolved): Journaling Causes System Hang
- Hello,
It seems that when doing a large write once the journal fills up the system goes into a state of lock and h...
- 10:28 PM Bug #530 (Resolved): No way to override lock file path on RH
- Does this look okay? commit:bc9bc4cb28376728e5428eff0ddb3ff301831e50
- 07:57 AM Bug #530 (Resolved): No way to override lock file path on RH
- init-ceph checks if /var/lock/subsys exists and if it does, tries to create a lock file there. In my case for various ...
- 10:28 PM Revision 7b68a403 (ceph): osd: add variables to track scrub scheduling
- Add OSD, PG, and config variables to track pending and active scrubs.
- 05:12 PM Bug #535: cephtool hangs forever until a UNIX signal is received
- Colin McCabe wrote:
> While running vstart.sh, I reproduced this bug with debug_ms = 20.
>
> Here's what the outp...
- 05:09 PM Bug #535: cephtool hangs forever until a UNIX signal is received
- While running vstart.sh, I reproduced this bug with debug_ms = 20.
Here's what the output was. Since cephtool does...
- 04:42 PM Bug #535: cephtool hangs forever until a UNIX signal is received
- > Perhaps this bug is caused by Nagle's algorithm?
>
As Sage pointed out, we're already running with TCP_NODELAY...
- 04:31 PM Bug #535 (Resolved): cephtool hangs forever until a UNIX signal is received
- I just saw this twice in a row. cephtool hangs forever until a UNIX signal is received. That seems to break the logja...
- 05:04 PM Revision 3d85a7b9 (ceph): logrotate: separate rule for stat/*.log
- Logrotate seems to ignore the entire rule if any part of the file list
is not found. This happens on nodes with only...
- 04:56 PM Bug #533: radostool hang on shutdown
- I think I have a fix for this one.
- 03:48 PM Bug #533 (Resolved): radostool hang on shutdown
- radostool still seems to be hanging from time to time on shutdown.
Sending a signal resolves the issue.
For examp...
- 04:53 PM Revision 49153c2c (ceph): osd::PG: Update PG comments
- Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
- 04:50 PM Revision e6df8074 (ceph): test: create test_unfound.sh
- Create test_unfound.sh to test handling unfound objects.
Move more test functions into test/test_common.sh to facili...
- 03:53 PM Linux kernel client Feature #534 (Resolved): support CEPH_FEATURE_RECONNECT_SEQ in klibceph
- 02:07 PM Bug #532: OSD: repop_queue.front() == repop
- This problem was in v0.22, but fixed in v0.22.1. Can you try with the latest testing (v0.22.2) or unstable?
- 12:52 PM Bug #532: OSD: repop_queue.front() == repop
- I think I was a bit too premature about that, since osd5 just crashed again with the same backtrace....
- 12:45 PM Bug #532 (Closed): OSD: repop_queue.front() == repop
- On two of my OSDs I had the following crash:...
- 03:43 AM Revision 1dd5042e (ceph): fix make distcheck, make uninstall
- Make distclean was failing because make uninstall was broken. (There were
still leftover files after running make ins...
- 02:50 AM Bug #462: cephx: verify_authorizer_reply exception in decode_decrypt
- I've just done a fresh mkcephfs on my cluster and then I started to see:...