Activity
From 03/04/2013 to 04/02/2013
04/02/2013
- 11:24 PM Bug #1535 (Resolved): concurrent creating and removing directories crashes cmds
- I think this has been fixed by commit 00025462
- 10:48 PM Bug #1945: blogbench hang on caps
- Sorry for the delay, I didn't notice the notification. I fixed several bugs that may cause hangs of this type, but I...
- 07:24 PM Bug #4489: ceph fs hangs on file stat
- Hm, snapdirname is something obfuscated (but it has no use, actually).
I've got the same error one more time, so I bel...
- 06:14 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
- Sorry, I mean the mds journal, not the debug logs, when referring to the size.
- 05:12 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
- Greg Farnum wrote:
> Strange, it looks like you have an MDS log of about 1236MB, which is...large. What config optio...
- 04:28 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
- Strange, it looks like you have an MDS log of about 1236MB, which is...large. What config options are you setting?
...
- 12:36 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
- I changed back to max_mds 1. same result:...
- 09:42 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
- I'll check my assumptions today (already downloaded the logs), but with multiple active MDSes this doesn't warrant a ...
- 07:14 AM Bug #4618 (Resolved): Journaler: _is_readable() and _prefetch() don't communicate correctly
- The Journaler has mechanisms to try and read extra data if an event is large enough that it exceeds the current prefe...
- 02:48 PM Bug #4619 (Resolved): mds: anchortable hangs on new cluster
- Merged and pushed to master in commit:3842ff7d677bae98462f7d050f5fda9d85f6273d
- 02:20 PM Bug #4619: mds: anchortable hangs on new cluster
- Code looks good. Sorry for the bug!
- 01:06 PM Bug #4619 (Fix Under Review): mds: anchortable hangs on new cluster
- recovery_done() breaks on a fresh machine because of the populate_mydir() ordering. The problem is that both recover...
- 09:52 AM Bug #4619 (In Progress): mds: anchortable hangs on new cluster
- Sage said he'd look at the double-send as well.
- 09:27 AM Bug #4619 (Resolved): mds: anchortable hangs on new cluster
- commit:968c6c0c9408b33904041e5ddbd9ea738e831713
- 09:13 AM Bug #4619: mds: anchortable hangs on new cluster
- I think this isn't correct. If we restart the table server MDS, it will send two ready messages to the table client. ...
- 09:02 AM Bug #4619: mds: anchortable hangs on new cluster
- Code looks good, assuming the tests run.
Sorry about that! :(
- 08:15 AM Bug #4619 (Fix Under Review): mds: anchortable hangs on new cluster
- wip-4619
- 08:14 AM Bug #4619 (Resolved): mds: anchortable hangs on new cluster
- 02:30 PM Bug #4621 (Rejected): failed pjd chown/00.t 124
- Okay, all symlink attempts that made it to the MDS were successes, and I can't find any failed ceph-fuse symlink/ll_s...
- 01:59 PM Bug #4621: failed pjd chown/00.t 124
- Sorry, not an lchown, just a symlink create.
- 01:29 PM Bug #4621: failed pjd chown/00.t 124
- Well, it's always an adventure to figure out which one is busted, but it looks to be an lchown on a symlink failing. ...
- 09:30 AM Bug #4621 (Rejected): failed pjd chown/00.t 124
- 2013-04-02T09:04:34.029 INFO:teuthology.task.workunit.client.0.out:../pjd-fstest-20090130-RC-open24/tests/chown/00.t ...
- 02:27 PM Feature #4630 (New): make lchown work in ceph-fuse for pjd
- pjd doesn't believe that ceph-fuse supports lchown. Maybe this is pjd's fault; maybe it's ours. Figure out why so tha...
- 11:49 AM Documentation #2206: Need a control command to gracefully shutdown an active MDS prior to planned...
- This is partially documented by 0c16b31db7a5ed72a9c306ae91b191c326d0776a on github.
04/01/2013
- 03:18 PM Bug #3266: "ceph mds tell 0 dumpcache /etc/passwd" is not cool
- Before anybody embarks on solving this, I assume there's a standard way to handle this by outlawing certain kinds of ...
- 01:23 PM Bug #2657: kclient: direct io write larger than 8MiB fails
- in testing, there is now a test workunit
- 01:23 PM Bug #2657 (Resolved): kclient: direct io write larger than 8MiB fails
- 01:22 PM Bug #4434 (Resolved): looping waiting for quorum after upgrade
- Whoops@!
- 01:14 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- I'll look into the code around this today.
- 11:03 AM Bug #4489: ceph fs hangs on file stat
- Why are you specifying the snapdirname to that weird value when mounting this?
- 11:00 AM Bug #4405: MDCache::populate_mydir can loop forever
- This dump has 1063591 inodes in the cache, of which only 122104 are non-stray. That doesn't seem quite right.
I do...
- 09:37 AM Bug #4590 (Resolved): ceph-fuse: fsx fails with 'client oc = false'
- commit:c01e2e42f368ca003e03debe9a7bd5f12eb79d2c
03/31/2013
- 10:33 AM Bug #4601 (Can't reproduce): symlink with size zero
- Somehow I got into a situation in which a number of symlinks, all of them created and later modified at about the sam...
03/29/2013
- 09:05 PM Bug #4590 (Resolved): ceph-fuse: fsx fails with 'client oc = false'
- ...
- 03:22 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
- Oh, yeah, we can do the same in the userspace client. I'll do that and re-push. Thanks Yan!
- 03:12 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
- FYI:
The kclient deals with this case by calling wake_up_session_caps(). It just clears i_wanted_max_size/i_requested...
- 01:04 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
- I believe those are okay as truncate size changes should end up actually journaled (as setattrs) so they'll be replay...
- 12:58 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
- I spent most of this morning figuring out if it made sense to send the full cap (ceph_mds_caps -- and get rid of the ...
- 12:31 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
- I'm not sure this is wrong, but it's confusing me a bit. I thought that the Client sent all capabilities it holds bac...
- 12:14 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
- I just pushed wip-4582. Testing it on the fsstress test with mds_thrasher now. I'm not positive this is the right a...
- 11:53 AM Bug #4582 (In Progress): mds: Client hang on fsstress with mds_thrasher
- 11:53 AM Bug #4582 (Resolved): mds: Client hang on fsstress with mds_thrasher
- While trying to reproduce #4565, fsstress eventually hangs where the client is waiting for a max size update that t...
- 01:55 PM Feature #4583 (Resolved): libcephfs: add test that kills a client and verifies mds cleans it up
- 01:28 PM Feature #4022 (In Progress): client: qa: test non-cached operation (force sync mode)
- 01:24 PM Fix #4191 (Resolved): qa: mulitiple mds in nightly (non-failure case)
- 11:31 AM Bug #4578 (Resolved): client: hangs on unlink
- 11:16 AM Bug #4578: client: hangs on unlink
- This patch solves the problem :)
- 12:51 AM Bug #4578: client: hangs on unlink
- yes, patch is also attached
- 11:11 AM Feature #4442 (Resolved): java: add topology API support
- Err, forgot to close. Thanks. ebc3abaf6dc62678f5ef5914862e9d8f216fffbf
- 11:05 AM Feature #4442: java: add topology API support
- I think this already got reviewed and merged, right? Or is there something else we need?
- 11:02 AM Bug #4569 (Resolved): ceph-mds: segfault
- commit:4f8ba0e7756a1b0647867db0e9b5549b3e82f6b1 in master. This wasn't a bug in any released versions, so no backports.
- 10:50 AM Bug #4569: ceph-mds: segfault
- In case it matters at all, the segfault was happening when I was furiously sigterm'n my hung-on-unlink client.
- 10:33 AM Bug #4569: ceph-mds: segfault
- Yep, the problem here is that the Session was created during replay and it never had a Connection associated with it ...
- 10:20 AM Bug #4569: ceph-mds: segfault
- In the logs the session in question is one that failed to reconnect. Was there a different event that caused the MDS ...
03/28/2013
- 08:47 PM Bug #4578 (Resolved): client: hangs on unlink
- Looks like somebody accidentally deleted #4570 (and there's no undelete in Redmine best I can tell), so this ticket w...
- 06:58 PM Feature #4576 (Rejected): java: support ByteBuffer interface for NIO and NIO.2 high-perf I/O
- ByteBuffer interface in NIO avoids needless copying, and is used by NIO.2 and the new VFS infrastructure in Java 7. T...
- 10:21 AM Bug #4569: ceph-mds: segfault
- It looks like the session is getting closed because it's stale, and then killed, but the session->connection field pas...
- 10:00 AM Feature #4354 (In Progress): mds: add an equivalent to the OSD OpTracker
- 07:31 AM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- Update on trying to track this down...running this test in teuthology, I don't hit the same assertion, but I do see t...
03/27/2013
- 09:23 PM Bug #4308 (Won't Fix): ceph-fuse crashed during blogbench test (argonaut)
- this is most likely memory corruption in argonaut's ceph-fuse.
- 09:21 PM Bug #4564 (Resolved): client: Close session doesn't wait for outstanding requests
- 09:09 AM Bug #4564 (Fix Under Review): client: Close session doesn't wait for outstanding requests
- Pushed a fix to wip-4564.
- 07:13 AM Bug #4564 (Resolved): client: Close session doesn't wait for outstanding requests
- Ran into another failure related to testing #4451 on the client where the following occurs:
client sends create/...
- 11:45 AM Bug #4569 (Resolved): ceph-mds: segfault
- I started receiving this segfault in ceph-mds with the latest master today....
- 09:35 AM Bug #4565 (Resolved): MDS/client: issue decoding MClientReconnect on MDS
- ...
- 08:26 AM Bug #4539 (Resolved): include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::standby_trim_...
- commit:295c92c
- 07:47 AM Bug #4539 (Fix Under Review): include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::stand...
- Yep. There's no state bit, and the cache is unchanged by the backtrace updates list. The standby mds is free to cle...
- 08:04 AM Bug #4555 (Resolved): The CephFileSystem class is missing the createNonRecursive method
- 0a5175722a8444579715c1871c09c246969e7890
03/26/2013
- 10:22 PM Bug #4539: include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::standby_trim_segments()
- I think this is as simple as...
- 04:04 PM Feature #4277 (Closed): Move built hadoop artificats to download URL
- 04:03 PM Feature #4277: Move built hadoop artificats to download URL
- For now, we're manually posting Hadoop bindings to http://ceph.com/download/. I'll close this for now and we can revi...
- 03:13 PM Bug #4530 (Resolved): client: Assert failure on session close
- 10:19 AM Bug #4530: client: Assert failure on session close
- I went to file another bug for the client reconnect triggering a session close, but the log indicates that it's not ac...
- 02:23 PM Fix #4191 (In Progress): qa: mulitiple mds in nightly (non-failure case)
- 12:09 PM Bug #4555 (Resolved): The CephFileSystem class is missing the createNonRecursive method
- This is needed by HBase
There is a pull request here: https://github.com/ceph/hadoop-common/pull/1
- 11:45 AM Feature #2144: mon: improve mds health checks
- commit:1baf66b
- 11:45 AM Feature #2144 (Resolved): mon: improve mds health checks
- 11:11 AM Bug #4545 (Can't reproduce): error creating empty object store. Invalid argument.
- 09:33 AM Bug #4545: error creating empty object store. Invalid argument.
- i've seen this regularly in the qa runs over the last week or so
- 09:41 AM Bug #4537 (Resolved): mds: hang on rmdir, unlink
- 07:04 AM Bug #4537 (Fix Under Review): mds: hang on rmdir, unlink
- Fix pushed to wip-4537.
03/25/2013
- 07:23 PM Bug #4405: MDCache::populate_mydir can loop forever
- Ok, I did
ceph mds tell 0 dumpcache /tmp/dump.txt
http://91.226.13.93/dump.txt.gz
- 07:16 PM Bug #4405: MDCache::populate_mydir can loop forever
- ...
- 09:20 AM Bug #4405: MDCache::populate_mydir can loop forever
- If you run "ceph mds tell 0 dumpcache <filename>" then the MDS will dump everything it has in cache to the filename you sp...
- 01:37 PM Bug #4537 (In Progress): mds: hang on rmdir, unlink
- 01:06 PM Bug #4530 (Fix Under Review): client: Assert failure on session close
- I pushed some fixes to wip-4530 for the client side part of this. Needs review.
- 09:49 AM Bug #4530 (In Progress): client: Assert failure on session close
- 12:57 PM Bug #4545: error creating empty object store. Invalid argument.
- Alright, I no longer think the apache2 signature is related. This seems like a proper bug in its own right.
- 12:04 PM Bug #4545: error creating empty object store. Invalid argument.
- This may be failing due to a package signing issue that I thought had been resolved. I'll hold onto this ticket until...
- 12:02 PM Bug #4545: error creating empty object store. Invalid argument.
- Added the yaml file I was using (needs 3 locked hosts) and the teuthology output as attachments.
- 12:00 PM Bug #4545 (Can't reproduce): error creating empty object store. Invalid argument.
- While running a teuthology test, mkcephfs failed with this error:
INFO:teuthology.task.ceph:Running mkfs on osd node...
- 09:47 AM Bug #4517 (Resolved): ceph_rename fails success case
03/24/2013
- 05:34 PM Bug #4537: mds: hang on rmdir, unlink
- ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-24_08:45:56-kernel-master-testing-basic/2501
cro...
- 05:33 PM Bug #4537: mds: hang on rmdir, unlink
- ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-24_08:45:56-kernel-master-testing-basic/2503
<pr...
- 03:18 PM Bug #4537: mds: hang on rmdir, unlink
- similar hang:...
- 02:41 PM Bug #4537 (Resolved): mds: hang on rmdir, unlink
- ...
- 03:22 PM Bug #4539: include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::standby_trim_segments()
- ubuntu@teuthology:/a/sage-2013-03-24_08:29:36-fs-master-testing-basic/2410
- 03:22 PM Bug #4539: include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::standby_trim_segments()
- also ubuntu@teuthology:/a/sage-2013-03-24_08:29:36-fs-master-testing-basic/2414
- 03:22 PM Bug #4539 (Resolved): include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::standby_trim_...
- ...
- 11:16 AM Bug #4536 (Resolved): hadoop: receiving unexpected filenotfound exceptions
- Fixed by 150e914c7549f7197eff9fe980abd17a921799ce
03/23/2013
- 12:39 PM Bug #4517: ceph_rename fails success case
- All working fine now. Thanks
- 12:19 PM Bug #4517: ceph_rename fails success case
- Thanks for testing/reporting that Noah. That commit last night was bogus. Pushed wip-4517b.
- 11:04 AM Bug #4517: ceph_rename fails success case
- I'm testing this branch, and I'm getting a segfault running the LibCephFS.Rename test....
- 11:53 AM Bug #4536 (Resolved): hadoop: receiving unexpected filenotfound exceptions
- Jobs have started failing with the following trace....
03/22/2013
- 03:17 PM Feature #4535 (New): mds: add group usage statistics gathering to the MemoryModel
- Once we've updated our MemoryModel (#4502, #4503) and have selected groups of in-memory data that we believe we can s...
- 02:09 PM Bug #4517: ceph_rename fails success case
- Indeed. Updated wip-4517.
- 01:49 PM Bug #4517: ceph_rename fails success case
- I'm just skimming this in the middle of a meeting, but it looks like we're now failing the rename if the destination ...
- 01:31 PM Bug #4517 (Fix Under Review): ceph_rename fails success case
- 12:21 PM Bug #4517 (In Progress): ceph_rename fails success case
- 01:27 PM Feature #4442 (Fix Under Review): java: add topology API support
- 09:42 AM Bug #4530 (Resolved): client: Assert failure on session close
- During testing of #4451:
../../src/common/Cond.h: In function 'int Cond::Signal()' thread 7fe04c36f700 time 2013...
03/21/2013
- 12:52 PM Bug #4517 (Resolved): ceph_rename fails success case
- ceph_rename has started returning -ENONET in the common case (source path exists, dest path doesn't exist). In the cl...
03/20/2013
- 10:50 AM Bug #4405: MDCache::populate_mydir can loop forever
- 1) I don't use filesystem snapshots at all.
2) I really have 3 big directories with 40000 files total
3) Some days ...
- 09:40 AM Bug #4405: MDCache::populate_mydir can loop forever
- Sorry this got dropped on the floor. I found the problems.
The MDS never finishes the "populate_mydir()" function ...
- 09:45 AM Bug #4451: client: Ceph client not releasing cap
- Uploaded an annotated log with only the lines related to the inode exhibiting the problem. The problem occurs from t...
03/19/2013
- 11:35 PM Bug #4489: ceph fs hangs on file stat
- And an MDS reload didn't fix the problem until I rebooted one of the FS clients.
- 11:34 PM Bug #4489: ceph fs hangs on file stat
- Oh, sorry for that. It seems I got the log wrong. I will attach the correct log the next time the problem occurs. But the problem...
- 02:05 PM Bug #4489 (Need More Info): ceph fs hangs on file stat
- That log is from a standby MDS. You'll need to provide the log of the active MDS for us to do anything with it. :)
- 10:38 AM Feature #4504 (Resolved): mds: trim based on total memory usage
- Right now the MDS only trims based on the count of the dentry cache. We should, based on a config option, optionally ...
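The difference between count-based and memory-based trimming described in this ticket can be sketched as follows. This is only an illustration, not the MDS implementation: `CacheStats`, `should_trim`, and both limit parameters are hypothetical names, and the memory limit is assumed to be optional, as the ticket suggests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheStats:
    """Hypothetical snapshot of cache usage (not a Ceph structure)."""
    dentry_count: int   # number of cached dentries
    memory_bytes: int   # estimated total memory used by the cache


def should_trim(stats: CacheStats,
                max_dentries: int,
                max_memory_bytes: Optional[int] = None) -> bool:
    """Return True when the cache exceeds either configured limit.

    The dentry count is always checked; the memory limit is checked
    only when it is configured (assumed optional here).
    """
    if stats.dentry_count > max_dentries:
        return True
    if max_memory_bytes is not None and stats.memory_bytes > max_memory_bytes:
        return True
    return False
```

With only a count limit, a cache holding few but very large entries would never be trimmed; the optional memory limit covers that case.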
- 10:21 AM Feature #4503 (New): mds: MemoryModel: include the different boost::pools we use
- We use a different boost::pool for each of CDir, CDentry, CInode, Capability. Include these pools, and any others we'...
- 10:19 AM Feature #4502 (New): mds: Make the MemoryModel useful
- Right now the MDCache's MemoryModel is trying to parse out usage from /proc/self/status. Switch it to use tcmalloc's ...
- 10:08 AM Tasks #4499: Identify fields in CInode which aren't permanently necessary
- Also, a small one but one that's everywhere: each of the classes in this sequence of bugs has an MDCache pointer. Pro...
- 09:43 AM Tasks #4499 (Resolved): Identify fields in CInode which aren't permanently necessary
- There are a number of fields in CInode that we don't always need. Examples include everything involved with projectio...
- 10:01 AM Cleanup #89 (Closed): mds: put inode dirty fields in dirty_bits_t to reduce memory footprint
- This is a less-specific duplicate of #4499 now.
- 10:01 AM Feature #4501 (Resolved): Identify fields in CDir which aren't permanently necessary
- The CDir has some machinery for handling things like dirty data that isn't always necessary. Audit it for these membe...
- 09:57 AM Tasks #4500 (New): Identify fields in CDentry which aren't permanently necessary
- CDentry is in far better shape than CInode in this regard, but audit it for things which we don't always need in memo...
- 09:16 AM Feature #4484: Enable Hadoop bindings to pull configuration options from the monitor
- cdca0babf9145a8f6e7613ab7026cf0968b3bc91
- 09:00 AM Feature #3540 (Resolved): mds: maintain per-file backpointers on first file object
- 8b798867731d298c05d9f93b0c207a541d2b5e90 merged to master
03/18/2013
- 09:07 PM Bug #4491 (Resolved): mds: assert failure on _purge_forward_pointers
- 03:37 PM Bug #4491 (Fix Under Review): mds: assert failure on _purge_forward_pointers
- I pushed a proposed fix to wip-4491. Basically we just need to handle the case that the osd returns ENODATA.
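The idea behind the proposed fix, treating ENODATA from the OSD as an acceptable outcome instead of asserting, might look roughly like this. It is a sketch only: `purge_result_ok` is a hypothetical helper, not code from wip-4491, and it assumes return codes are negative errno values.

```python
import errno


def purge_result_ok(return_code: int) -> bool:
    """Treat success and ENODATA as non-fatal results.

    Assumption: the operation reports errors as negative errno values,
    and ENODATA simply means there was nothing to remove.
    """
    return return_code in (0, -errno.ENODATA)
```

Any other error code would still be handled as a failure by the caller.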
- 03:03 PM Bug #4491: mds: assert failure on _purge_forward_pointers
- This happens soon after ceph-fuse mount. I hit this when trying to run blogbench test.
- 02:47 PM Bug #4491 (Resolved): mds: assert failure on _purge_forward_pointers
- Joe Buck reported a bug with master:
INFO:teuthology.task.ceph.mds.0.err:mds/MDCache.cc: In function 'void MDCac...
- 06:03 PM Bug #4489: ceph fs hangs on file stat
- and there were no other specific events at that moment (like scrubbing, osd/mds/mon failures).
- 06:02 PM Bug #4489: ceph fs hangs on file stat
- no, it started earlier than #4486, so the reason is another one.
- 06:00 PM Bug #4489: ceph fs hangs on file stat
- I think it could be connected with #4486, because, I found about 150 launched cron tasks and every task is launched t...
- 05:46 PM Bug #4489: ceph fs hangs on file stat
- Wrapping cron.d code....
- 05:44 PM Bug #4489: ceph fs hangs on file stat
- code which caused the hang, running on two hosts:...
- 10:20 AM Bug #4489: ceph fs hangs on file stat
- can provide shell access to one of servers but don't know if it can be reproduced easily.
- 10:17 AM Bug #4489 (Can't reproduce): ceph fs hangs on file stat
- hi. I have cephfs (kernel client) mounted from two hosts at /var/www.
I'm trying to do...
- 05:46 PM Feature #1448: test hadoop on sepia
- Are nodes available for scale testing? Issdm cluster is withering away..
- 05:42 PM Feature #4484 (Resolved): Enable Hadoop bindings to pull configuration options from the monitor
- 04:28 PM Feature #4494 (New): qa: exercise recovery from migration points
- In #4493 we checked recovery in an MDS cluster. Now we need to check recovery following each kill point involved in m...
- 04:23 PM Feature #4493 (New): qa: trigger each kill_at point related to clustered recovery
- Write a workunit using the restart teuthology task interface that handles running several MDS daemons and fully exerc...
- 04:18 PM Tasks #4492 (New): mds: Define kill points involved in clustered migration and recovery
- We need to define all the separate points at which a break in 1) clustered recovery and 2) migration leaves a differe...
- 09:55 AM Bug #4434: looping waiting for quorum after upgrade
- Yep! This says that you ran a branch which included an unreleased set of encoding rules on the MDS which would have c...
03/17/2013
- 12:12 PM Feature #4484 (Fix Under Review): Enable Hadoop bindings to pull configuration options from the m...
- 12:12 PM Feature #4484: Enable Hadoop bindings to pull configuration options from the monitor
- ceph.git wip-4484
hadoop-common.git cephfs/wip-4484
- 11:10 AM Feature #4485 (New): Improve "needsrecover" handling
- Jim Schutt reported issues on the mailing list[1] with slow stats that turned out to be due to inodes with the "needs...
03/16/2013
- 03:50 PM Feature #4484: Enable Hadoop bindings to pull configuration options from the monitor
- I'd lean towards keyring files, but we may want to float this on Monday's stand-up.
- 03:18 PM Feature #4484: Enable Hadoop bindings to pull configuration options from the monitor
- Seems like there are keyring files, secret strings, client usernames, etc… Which one(s) should we use?
http://ceph...
- 03:10 PM Feature #4484: Enable Hadoop bindings to pull configuration options from the monitor
- This replaces the current approach, which assumes that the host with the JobTracker (likely not an OSD and possibly n...
- 02:59 PM Feature #4484 (Resolved): Enable Hadoop bindings to pull configuration options from the monitor
- At present, the Hadoop bindings require several options be specified in xml files.
It would be easier for users if ...
03/15/2013
- 02:54 PM Feature #4326: qa: add samba + (kclient|ceph-fuse) to suite
- I'm going to need to dig into why it doesn't seem to be finishing, but I think it might be exposing some (more) file ...
- 10:51 AM Feature #4326: qa: add samba + (kclient|ceph-fuse) to suite
- The wip-samba-on-ceph branch has "samba", "cifs-mount", and "smbtorture" tasks.
I notice that smbtorture on ceph-f...
- 02:23 PM Bug #4451: client: Ceph client not releasing cap
- Looked at this again briefly. I notice:
1) the inode was previously in the stray directory (before MDS restart)
2) ...
- 09:31 AM Bug #4451: client: Ceph client not releasing cap
- For some reason the MDS is sending back an "export" on the caps for that inode (timestamp 2013-03-15 09:07:38.098273)...
- 08:59 AM Bug #4451 (Resolved): client: Ceph client not releasing cap
- I'm occasionally hitting a hang in my backtrace testing, where unmount never completes. The client log shows a dis...
- 01:56 PM Tasks #4467 (New): qa: make ior tasks work
- 01:42 PM Fix #3630 (Resolved): mds: broken closed connection cleanup
- 01:38 PM Fix #4286: SLES 11 - cfuse: disable 'big_writes'and 'atomic_o_trunc
- also, the invalidate callback code probably needs to be conditional, too!
- 01:35 PM Fix #2215 (Resolved): ceph-fuse does not invalidate page cache
- Sage is turning it on by default now following weeks of testing in the nightlies!
- 01:33 PM Feature #2903 (Resolved): ceph-fuse: Support -o noallow_other
03/14/2013
- 03:51 PM Bug #4434: looping waiting for quorum after upgrade
- This is what was captured at the time the test was run successfully: Ceph Version: 0.57-667-g6a9cda7
The next inst...
- 10:12 AM Bug #4434: looping waiting for quorum after upgrade
- Just to make sure I'm tracking these upgrades correctly:
It was created on v0.56.3? (Not a branch.) Then it moved to...
- 09:30 AM Bug #4434: looping waiting for quorum after upgrade
- It's quite possible the upgrade was corrupted somewhere along the line. Prior to the issues the system was on 0.56.3...
- 01:30 PM Feature #4441 (Resolved): libcephfs: add ceph_get_osd_addr()
- 01:19 PM Feature #4441 (Resolved): libcephfs: add ceph_get_osd_addr()
- 01:20 PM Feature #4442 (Resolved): java: add topology API support
- 09:53 AM Bug #4358 (Resolved): kclient: ENOENT during kernel build on kclient
- 08:54 AM Bug #4358: kclient: ENOENT during kernel build on kclient
- passed another 100 iterations (modulo a machine lockup on the server side)
03/13/2013
- 08:21 PM Bug #4405: MDCache::populate_mydir can loop forever
- And what's interesting: the whole time, the MDS server has incoming traffic of ~40MB/s, but no active clients. I found it aft...
- 08:14 PM Bug #4405: MDCache::populate_mydir can loop forever
- OK, I don't know what you mean by the term "start". But actually, the MDS ran the whole time with
debug ms =1 and debug ...
- 06:52 PM Bug #4405: MDCache::populate_mydir can loop forever
- Hi Ivan-
Looking at the log, it looks like all 3 times the MDS started up it came up within 5 seconds or so. Do y...
- 06:14 PM Bug #4358: kclient: ENOENT during kernel build on kclient
- 20 iterations on testing branch. i ran a bunch on master to make sure i could trigger the old bug, but then couldn't...
- 04:40 PM Bug #4390 (Resolved): mds: zapping named mds causes client assertion
- commit:f67596a44739e8071cc97fb0463f37203502faaa
- 04:39 PM Bug #4385 (Resolved): mds: refusing connections with high open socket count
- commit:8b713371447f9761597457af2c81f0b870d3c4ba
- 03:03 PM Bug #4434: looping waiting for quorum after upgrade
- More details? I'm not sure how the title relates to the bug description or MDS log. The log is crashing on the Sessio...
- 02:52 PM Bug #4434: looping waiting for quorum after upgrade
- changed the project
- 02:41 PM Bug #4434: looping waiting for quorum after upgrade
- Part of the bug appears to be in ceph, where the following returns an error, causing an infinite loop in get_key():...
- 02:36 PM Bug #4434 (Resolved): looping waiting for quorum after upgrade
- How we got here:
Bobtail .56 installed on burnupi60 failed daily upgrade due to new gitbuilder keys.
updated key.
...
- 02:25 PM Bug #3640 (Duplicate): kclient: hang and kernel panic
- dup of #3088
- 02:24 PM Bug #3088: NULL pointer dereference at ceph_d_prune
- this code may be gone now with yan's d_prune changes...
- 02:06 PM Bug #1945: blogbench hang on caps
- Yan, would you mind taking a look at this when you have time?
- 02:05 PM Bug #3637: client: not issuing caps for with clients doing shared writes
03/12/2013
- 11:18 PM Feature #4277: Move built hadoop artificats to download URL
- I have a documentation branch pushed up that is waiting for the URLs. Let me know what those are and I can integrate ...
- 11:22 AM Feature #4277 (In Progress): Move built hadoop artificats to download URL
- As a starting point, let's post this on the download page as stand-alone jar files. I'll take ownership of doing that...
- 08:57 PM Bug #4385 (Fix Under Review): mds: refusing connections with high open socket count
- sounds right. thanks for testing!
- 08:25 PM Bug #4385: mds: refusing connections with high open socket count
- Err, "unclean mounts" = "exiting without unmounting"
- 08:23 PM Bug #4385: mds: refusing connections with high open socket count
- Well hot damn. That branch seems to solve two problems. First, clients that do a clean unmount don't leave lots of FD...
- 07:55 PM Bug #4385: mds: refusing connections with high open socket count
- Noah, do you want to try wip-mds-con?
- 07:45 PM Bug #4385 (In Progress): mds: refusing connections with high open socket count
- 01:53 PM Bug #4385: mds: refusing connections with high open socket count
- Although the high counts were because of double counting by lsof, the sockets still are not being closed. Without any...
- 12:32 PM Bug #4385: mds: refusing connections with high open socket count
- Hmm, did we screw up our refactoring work so that replaced sockets are no longer actually closed? That might explain ...
- 12:25 PM Bug #4385: mds: refusing connections with high open socket count
- Here's some more info after investigating this a bit further.
Open socket counts by category after a fresh MDS reb...
- 11:52 AM Documentation #4422: Typo on Release Process webpage
- Make that one fewer "are". Got to love making a typo on a ticket about a typo.
- 11:38 AM Documentation #4422 (Resolved): Typo on Release Process webpage
- This sentence (in section 1) needs one less instance of "and":
"The RPM based packages are are built natively, so on...
03/11/2013
- 09:34 PM Feature #4393 (Resolved): Add apache-hadoop gitbuilder to master gitbuilder webpage
- gitbuilder.sepia.com just needed the new gitbuilder added to the proxy config file.
- 05:55 PM Bug #4398: fix kclient_workunit_misc.yaml in the nightlies
- ubuntu@teuthology:/a/teuthology-2013-03-11_01:00:04-regression-master-testing-gcov/21326
- 09:38 AM Bug #4398 (Duplicate): fix kclient_workunit_misc.yaml in the nightlies
- 03:22 PM Feature #4073 (Resolved): qa: add message delay injection to test suite
- 03:20 PM Feature #4190 (Resolved): qa: add mds thrashing to nightly
- 09:47 AM Feature #4326 (In Progress): qa: add samba + (kclient|ceph-fuse) to suite
- 04:42 AM Bug #4405: MDCache::populate_mydir can loop forever
- The log was taken when it was stuck last time. I stopped the MDS, increased the log level, and started it again.
03/10/2013
- 11:40 PM Bug #4405: MDCache::populate_mydir can loop forever
- Log download link: http://pixeltram.com/ceph-mds.1.log.1.gz
- 07:43 AM Bug #4405: MDCache::populate_mydir can loop forever
- If the stuck startup is reproducible now (by lowering the cache size and restarting), a log with debug ms =1 and debu...
- 12:31 AM Bug #4405: MDCache::populate_mydir can loop forever
- I mounted the ceph root and counted the number of files, and it's less than the default cache size of 100000...
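A check like the one described above, counting entries under a mounted filesystem and comparing the total against the cache size, can be sketched in a few lines. This is an illustrative helper, not the commenter's actual method: `count_entries` is a hypothetical name, and any mount point passed to it is an assumption.

```python
import os


def count_entries(root: str) -> int:
    """Count files and directories below root (root itself excluded)."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        # Each directory level contributes its subdirectories and files.
        total += len(dirnames) + len(filenames)
    return total
```

The result can then be compared against the configured cache size, for example `count_entries("/mnt/ceph") < 100000`, where `/mnt/ceph` is a placeholder mount point and 100000 is the default mentioned in the comment.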
- 12:05 AM Bug #4405: MDCache::populate_mydir can loop forever
- Actually, regarding the initial ticket message: I think the MDS goes into some kind of loop during start when the cache size is sm...
03/09/2013
- 11:59 PM Bug #4405: MDCache::populate_mydir can loop forever
- I think it's important to specify some kind of metrics so everyone could calculate memory utilization of specific cac...
- 11:49 PM Bug #4405: MDCache::populate_mydir can loop forever
- regarding q2:
I increased mds cache size to
mds cache size = 100000000
and it started in seconds.
I don't...
- 11:27 PM Bug #4405 (Resolved): MDCache::populate_mydir can loop forever
- I had an unusual MDS failure. My server NIC started to flap and as a result (finally)
my CEPH FS started to recover an...
- 10:35 PM Bug #4390: mds: zapping named mds causes client assertion
- ran this through the fs suite and it passed. i would expect breakage in mds thrashing and multimds situations, thoug...
- 07:24 AM Bug #4358: kclient: ENOENT during kernel build on kclient
- That might work, as long as we don't need to update the flags and i_release_count atomically... that'd have to become...
- 06:12 AM Bug #4358: kclient: ENOENT during kernel build on kclient
- any idea how to fix the locking issue? use an atomic bit operation to modify i_ceph_flags?
03/08/2013
- 05:27 PM Bug #4398: fix kclient_workunit_misc.yaml in the nightlies
- looks like the test failed due to,
2013-03-06T06:38:55.270 INFO:teuthology.task.workunit.client.0.out:
2013-03-06...
- 05:16 PM Bug #4398 (Duplicate): fix kclient_workunit_misc.yaml in the nightlies
- log: ubuntu@teuthology:/a/teuthology-2013-03-06_01:00:04-regression-master-testing-gcov/16995...
- 04:34 PM Bug #4385: mds: refusing connections with high open socket count
- Log file fun. Here is the MDS log up until it stopped accepting connections.
http://piha.soe.ucsc.edu/ceph-mds.a.l...
- 10:57 AM Bug #4385: mds: refusing connections with high open socket count
- Would you like the logs up to the point that the MDS stops accepting connections, or just a snapshot after the FD list ...
- 10:34 AM Bug #4385: mds: refusing connections with high open socket count
- Can you reproduce with debug ms = 20 and debug mds = 20? Those logs would be helpful.
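A minimal sketch that renders the two debug settings requested above as a ceph.conf-style fragment. Placing them under an `[mds]` section is an assumption, and the helper name is hypothetical.

```python
import configparser
import io


def mds_debug_fragment(ms_level=20, mds_level=20):
    """Return an INI-style fragment with the two debug settings.

    Assumption: the settings are placed in the [mds] section.
    """
    parser = configparser.ConfigParser()
    parser["mds"] = {
        "debug ms": str(ms_level),
        "debug mds": str(mds_level),
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(mds_debug_fragment())
```

The fragment would then be merged into the configuration file used by the MDS before it is restarted to reproduce the problem.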
- 10:30 AM Bug #4385: mds: refusing connections with high open socket count
- To Greg's question, it seems as though the connections were not timing out. I'd toss out a rough estimate of about 45...
- 10:29 AM Bug #4385: mds: refusing connections with high open socket count
- Is there anything I can do to get more information for this ticket?
- 09:56 AM Bug #4385: mds: refusing connections with high open socket count
- It might be contributing, but I believe the sockets should still be getting closed after a timeout period, right?
- 09:45 AM Bug #4385: mds: refusing connections with high open socket count
- I bet #3630 is contributing here.
- 07:11 AM Bug #4385: mds: refusing connections with high open socket count
- I had this thought that the set of FDs in the logs would be much larger than the set shown in lsof, and that we'd want to cros...
- 02:11 PM Bug #4390: mds: zapping named mds causes client assertion
- pushed wip-4390-b, which solves this on the client side.
i don't really want to delay the mark-down/failing in the...
- 08:48 AM Bug #4390: mds: zapping named mds causes client assertion
- That approach was breaking the monitor. Just pushed a new approach that queues the zap for later.
- 06:37 AM Bug #4390 (Fix Under Review): mds: zapping named mds causes client assertion
- 06:37 AM Bug #4390: mds: zapping named mds causes client assertion
- Proposed fix in wip-4390. Should we also clean up the client code to wait until the mdsmap contains up members? Separ...
- 06:31 AM Bug #4390 (Resolved): mds: zapping named mds causes client assertion
- Hit the following assertion on the client with backtrace testing:
../../src/mds/MDSMap.h: In function 'const ent...
- 09:29 AM Feature #4393 (Resolved): Add apache-hadoop gitbuilder to master gitbuilder webpage
- I brought a new gitbuilder online at gitbuilder-precise-apache-hadoop-amd64.front.sepia.ceph.com and ran the command ...
- 09:01 AM Bug #4358: kclient: ENOENT during kernel build on kclient
- I hit this today while testing. Sorry, I don't remember
which test but Sage says he knows what happened.
http://pa...
- 08:46 AM Fix #2215: ceph-fuse does not invalidate page cache
- Those tests are part of the full regression test suite.
03/07/2013
- 10:32 PM Fix #4286 (In Progress): SLES 11 - cfuse: disable 'big_writes'and 'atomic_o_trunc
- big_writes was added in fuse 2.8; SLES has fuse version 2.7.2
atomic_o_trunc requires fuse > 2.2 and kernel > 2.6.2...
- 09:48 PM Bug #4385: mds: refusing connections with high open socket count
- Doesn't /proc tell you whether the fd is a socket or not? Or do you mean correlate activity?
In any case, all the ...
- 07:05 PM Bug #4385: mds: refusing connections with high open socket count
- Err, bump up the level on the MDS...
- 07:04 PM Bug #4385: mds: refusing connections with high open socket count
- I'll test out the ulimit as a workaround, and presumably to verify the open fd limit theory.
I checked all my clie...
- 06:53 PM Bug #4385: mds: refusing connections with high open socket count
- The direct cause of this is almost certainly an open fd limit coming from the OS, which you can probably work around ...
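A minimal, Linux-only sketch for testing the open fd limit theory: count a process's open descriptors through /proc, note how many are sockets, and read the soft limit for comparison. The function names are hypothetical. `soft_fd_limit()` reports the limit of the calling process, so for a running MDS you would inspect that process's own limit instead.

```python
import os
import resource


def fd_usage(pid="self"):
    """Return (total_fds, socket_fds) for a process, read from /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        total += 1
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        # Socket descriptors appear as links of the form "socket:[inode]".
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


def soft_fd_limit():
    """Return the soft RLIMIT_NOFILE of the calling process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    total, sockets = fd_usage()
    print(f"open fds: {total} (sockets: {sockets}), soft limit: {soft_fd_limit()}")
```

If the total sits at or near the limit and most descriptors are sockets, that would be consistent with the theory in the comment above. Reading another process's fd directory typically requires the same user or root.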
- 06:04 PM Bug #4385 (Resolved): mds: refusing connections with high open socket count
- My MDS has become unresponsive after a long period of map-reduce jobs. The MDS process is idle, but is eating up 16 G...
- 09:06 PM Fix #4034 (Resolved): mds: fix replayed ino creation extra_bl
- commit:3a7233bc8b199c97fbde9c1e44370353f0504af8
- 05:46 PM Fix #4034: mds: fix replayed ino creation extra_bl
- There's still a bad comment in 0c0313c6f6d4e2733fcf972b49456bf1faad9255, but the rest looks good!
- 03:49 PM Fix #4034: mds: fix replayed ino creation extra_bl
- Reviewed on Github.
- 09:06 PM Feature #4074 (Resolved): qa: add traceless reply test to fs suite
- commit:de62a79589fc4feed4243ac278d365b6363bfa2b fixed ceph.git bugs. added tests to fs suite.
- 03:49 PM Feature #4074: qa: add traceless reply test to fs suite
- Edit; wrong bug, sort of.
- 08:41 PM Cleanup #4387 (Resolved): mds: EMetaBlob::client_reqs doesn't need to be a list
- It is either set or not set at all, currently.
- 07:21 PM Feature #4386 (Resolved): kclient: Mount error message when no MDS present
- Right now you either get an input/output error or a message about not being able to find the superblock when trying t...
- 01:09 PM Bug #4358: kclient: ENOENT during kernel build on kclient
- An initial patch from Yan is in our testing branch and should fix this issue. (Or at least fixes one cause.) It may g...
- 09:35 AM Bug #4358: kclient: ENOENT during kernel build on kclient
- Let's see if this happens in testing branch after Yan's patches are all applied.
- 01:01 AM Bug #4358: kclient: ENOENT during kernel build on kclient
- Got the following message for the kernel build error: "find: `./include/generated': No such file or directory".
It's strange ...
- 12:38 PM Bug #4370 (New): mds: high-cpu utilization in memorymodel:_sample
- Shortly after running some fs workloads on a 1-mds/16-osd cluster, cpu utilization spikes and never returns to normal...
03/06/2013
- 05:16 PM Feature #4361 (Resolved): Setup another gitbuilder VM for building external Hadoop git repo(s)
- One of the first things I did today was create it but it was taking a while and I started working on some other stuff...
- 11:55 AM Feature #4361 (Resolved): Setup another gitbuilder VM for building external Hadoop git repo(s)
- We're moving from building our own monolithic Hadoop packages to building a Hadoop/Ceph library and then running that...
- 03:34 PM Feature #4356 (Closed): libcephfs: expose osd topology
- 07:31 AM Bug #4358 (Resolved): kclient: ENOENT during kernel build on kclient
- ...
03/05/2013
- 07:47 PM Feature #4356 (Fix Under Review): libcephfs: expose osd topology
- 07:46 PM Feature #4356 (Closed): libcephfs: expose osd topology
- wip-expose-topo
e8da4bf and 6b3fce1
- 07:14 PM Feature #4074 (Fix Under Review): qa: add traceless reply test to fs suite
- 07:14 PM Fix #4034 (Fix Under Review): mds: fix replayed ino creation extra_bl
- 03:47 PM Feature #4355 (New): uclient: add perfcounters
- The client currently has 3 perfcounters: average latency of replies, of processing a request, and of a file write.
...
- 03:46 PM Feature #4354 (Resolved): mds: add an equivalent to the OSD OpTracker
- Like it says — we want to be able to get information about ops-in-flight and their current status in a lot of differe...
- 01:29 PM Cleanup #4166 (Resolved): ceph: simplify ceph_sync_write() page_align
- This has been committed to the ceph-client testing branch.
038832c ceph: simplify ceph_sync_write() page_align cal...
- 11:12 AM Bug #4350 (Rejected): ceph-fuse: lockup from 40g loopback mkfs.ext3
- The underlying RADOS cluster in this report isn't fully healthy. I'm pretty sure that's all there is. Unless we hear ...
- 09:00 AM Bug #4350 (Rejected): ceph-fuse: lockup from 40g loopback mkfs.ext3
- ...
03/04/2013
- 02:10 PM Feature #4326 (Resolved): qa: add samba + (kclient|ceph-fuse) to suite
- 11:01 AM Documentation #3796 (Resolved): FUSE mount documentation needs some corrections for v0,56x
- Page has been updated with instructions, and a hyperlink to Cephx Configuration Reference. See http://ceph.com/docs/m...
- 10:06 AM Documentation #3796 (In Progress): FUSE mount documentation needs some corrections for v0,56x
- 10:38 AM Cleanup #4166: ceph: simplify ceph_sync_write() page_align
- This patch (2/3 below) has been posted for review, along with
a few others I include here for context. Marking for ...
- 08:50 AM Cleanup #4166 (In Progress): ceph: simplify ceph_sync_write() page_align
- I'm reopening this after all.
It turns out that the original patch was fine. The only part
that was bad was due ...
- 07:14 AM Feature #4277: Move built hadoop artificats to download URL
- Thanks for the info Gary! Let me do a little bit more research on how users want to obtain the artifacts. I think tha...