Activity

From 03/04/2013 to 04/02/2013

04/02/2013

11:24 PM Bug #1535 (Resolved): concurrent creating and removing directories crashes cmds
I think this has been fixed by commit 00025462 Zheng Yan
10:48 PM Bug #1945: blogbench hang on caps
Sorry for the delay, I didn't notice the notification. I fixed several bugs that may cause hangs of this type, but I... Zheng Yan
07:24 PM Bug #4489: ceph fs hangs on file stat
Hm, snapdirname is something obfuscated (but it has no use, actually).
I've got the same error one more time, so I bel...
Ivan Kudryavtsev
06:14 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Sorry, I mean the mds journal, not the debug logs, when referring to the size. Greg Farnum
05:12 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Greg Farnum wrote:
> Strange, it looks like you have an MDS log of about 1236MB, which is...large. What config optio...
Andras Elso
04:28 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Strange, it looks like you have an MDS log of about 1236MB, which is...large. What config options are you setting?
...
Greg Farnum
12:36 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
I changed back to max_mds 1. same result:... Andras Elso
09:42 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
I'll check my assumptions today (already downloaded the logs), but with multiple active MDSes this doesn't warrant a ... Greg Farnum
07:14 AM Bug #4618 (Resolved): Journaler: _is_readable() and _prefetch() don't communicate correctly
The Journaler has mechanisms to try and read extra data if an event is large enough that it exceeds the current prefe... Andras Elso
02:48 PM Bug #4619 (Resolved): mds: anchortable hangs on new cluster
Merged and pushed to master in commit:3842ff7d677bae98462f7d050f5fda9d85f6273d Greg Farnum
02:20 PM Bug #4619: mds: anchortable hangs on new cluster
Code looks good. Sorry for the bug! Zheng Yan
01:06 PM Bug #4619 (Fix Under Review): mds: anchortable hangs on new cluster
recovery_done() breaks on a fresh machine because of the populate_mydir() ordering. The problem is that both recover... Sage Weil
09:52 AM Bug #4619 (In Progress): mds: anchortable hangs on new cluster
Sage said he'd look at the double-send as well. Greg Farnum
09:27 AM Bug #4619 (Resolved): mds: anchortable hangs on new cluster
commit:968c6c0c9408b33904041e5ddbd9ea738e831713 Sage Weil
09:13 AM Bug #4619: mds: anchortable hangs on new cluster
I think this isn't correct. If we restart the table server MDS, it will send two ready messages to the table client. ... Zheng Yan
09:02 AM Bug #4619: mds: anchortable hangs on new cluster
Code looks good, assuming the tests run.
Sorry about that! :(
Greg Farnum
08:15 AM Bug #4619 (Fix Under Review): mds: anchortable hangs on new cluster
wip-4619 Sage Weil
08:14 AM Bug #4619 (Resolved): mds: anchortable hangs on new cluster
Sage Weil
02:30 PM Bug #4621 (Rejected): failed pjd chown/00.t 124
Okay, all symlink attempts that made it to the MDS were successes, and I can't find any failed ceph-fuse symlink/ll_s... Greg Farnum
01:59 PM Bug #4621: failed pjd chown/00.t 124
Sorry, not an lchown, just a symlink create. Greg Farnum
01:29 PM Bug #4621: failed pjd chown/00.t 124
Well, it's always an adventure to figure out which one is busted, but it looks to be an lchown on a symlink failing. ... Greg Farnum
09:30 AM Bug #4621 (Rejected): failed pjd chown/00.t 124
2013-04-02T09:04:34.029 INFO:teuthology.task.workunit.client.0.out:../pjd-fstest-20090130-RC-open24/tests/chown/00.t ... Sage Weil
02:27 PM Feature #4630 (New): make lchown work in ceph-fuse for pjd
pjd doesn't believe that ceph-fuse supports lchown. Maybe this is pjd's fault; maybe it's ours. Figure out why so tha... Greg Farnum
11:49 AM Documentation #2206: Need a control command to gracefully shutdown an active MDS prior to planned...
This is partially documented by 0c16b31db7a5ed72a9c306ae91b191c326d0776a on github. Matthew Roy

04/01/2013

03:18 PM Bug #3266: "ceph mds tell 0 dumpcache /etc/passwd" is not cool
Before anybody embarks on solving this, I assume there's a standard way to handle this by outlawing certain kinds of ... Greg Farnum
01:23 PM Bug #2657: kclient: direct io write larger than 8MiB fails
in testing, there is now a test workunit Sage Weil
01:23 PM Bug #2657 (Resolved): kclient: direct io write larger than 8MiB fails
Sage Weil
01:22 PM Bug #4434 (Resolved): looping waiting for quorum after upgrade
Whoops@! Greg Farnum
01:14 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
I'll look into the code around this today. Greg Farnum
11:03 AM Bug #4489: ceph fs hangs on file stat
Why are you specifying the snapdirname to that weird value when mounting this? Greg Farnum
11:00 AM Bug #4405: MDCache::populate_mydir can loop forever
This dump has 1063591 inodes in the cache, of which only 122104 are non-stray. That doesn't seem quite right.
I do...
Greg Farnum
09:37 AM Bug #4590 (Resolved): ceph-fuse: fsx fails with 'client oc = false'
commit:c01e2e42f368ca003e03debe9a7bd5f12eb79d2c Sage Weil

03/31/2013

10:33 AM Bug #4601 (Can't reproduce): symlink with size zero
Somehow I got into a situation in which a number of symlinks, all of them created and later modified at about the sam... Alexandre Oliva

03/29/2013

09:05 PM Bug #4590 (Resolved): ceph-fuse: fsx fails with 'client oc = false'
... Sage Weil
03:22 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
Oh, yeah, we can do the same in the userspace client. I'll do that and re-push. Thanks Yan! Sam Lang
03:12 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
FYI:
The kclient deals with this case by calling wake_up_session_caps(). It just clears i_wanted_max_size/i_requested...
Zheng Yan
01:04 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
I believe those are okay as truncate size changes should end up actually journaled (as setattrs) so they'll be replay... Greg Farnum
12:58 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
I spent most of this morning figuring out if it made sense to send the full cap (ceph_mds_caps -- and get rid of the ... Sam Lang
12:31 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
I'm not sure this is wrong, but it's confusing me a bit. I thought that the Client sent all capabilities it holds bac... Greg Farnum
12:14 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
I just pushed wip-4582. Testing it on the fsstress test with mds_thrasher now. I'm not positive this is the right a... Sam Lang
11:53 AM Bug #4582 (In Progress): mds: Client hang on fsstress with mds_thrasher
Sam Lang
11:53 AM Bug #4582 (Resolved): mds: Client hang on fsstress with mds_thrasher

While trying to reproduce #4565, fsstress eventually hangs where the client is waiting for a max size update that t...
Sam Lang
01:55 PM Feature #4583 (Resolved): libcephfs: add test that kills a client and verifies mds cleans it up
Sage Weil
01:28 PM Feature #4022 (In Progress): client: qa: test non-cached operation (force sync mode)
Sage Weil
01:24 PM Fix #4191 (Resolved): qa: mulitiple mds in nightly (non-failure case)
Sage Weil
11:31 AM Bug #4578 (Resolved): client: hangs on unlink
Noah Watkins
11:16 AM Bug #4578: client: hangs on unlink
This patch solves the problem :) Noah Watkins
12:51 AM Bug #4578: client: hangs on unlink
yes, patch is also attached Zheng Yan
11:11 AM Feature #4442 (Resolved): java: add topology API support
Err, forgot to close. Thanks. ebc3abaf6dc62678f5ef5914862e9d8f216fffbf Noah Watkins
11:05 AM Feature #4442: java: add topology API support
I think this already got reviewed and merged, right? Or is there something else we need? Greg Farnum
11:02 AM Bug #4569 (Resolved): ceph-mds: segfault
commit:4f8ba0e7756a1b0647867db0e9b5549b3e82f6b1 in master. This wasn't a bug in any released versions, so no backports. Greg Farnum
10:50 AM Bug #4569: ceph-mds: segfault
In case it matters at all, the segfault was happening when I was furiously sigterm'n my hung-on-unlink client. Noah Watkins
10:33 AM Bug #4569: ceph-mds: segfault
Yep, the problem here is that the Session was created during replay and it never had a Connection associated with it ... Greg Farnum
10:20 AM Bug #4569: ceph-mds: segfault
In the logs the session in question is one that failed to reconnect. Was there a different event that caused the MDS ... Greg Farnum

03/28/2013

08:47 PM Bug #4578 (Resolved): client: hangs on unlink
Looks like somebody accidentally deleted #4570 (and there's no undelete in Redmine best I can tell), so this ticket w... Greg Farnum
06:58 PM Feature #4576 (Rejected): java: support ByteBuffer interface for NIO and NIO.2 high-perf I/O
ByteBuffer interface in NIO avoids needless copying, and is used by NIO.2 and the new VFS infrastructure in Java 7. T... Noah Watkins
10:21 AM Bug #4569: ceph-mds: segfault
It looks like the session is getting closed because it's stale, and then killed, but the session->connection field pas... Sam Lang
10:00 AM Feature #4354 (In Progress): mds: add an equivalent to the OSD OpTracker
Greg Farnum
07:31 AM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
Update on trying to track this down...running this test in teuthology, I don't hit the same assertion, but I do see t... Sam Lang

03/27/2013

09:23 PM Bug #4308 (Won't Fix): ceph-fuse crashed during blogbench test (argonaut)
this is most likely memory corruption in argonaut's ceph-fuse. Sage Weil
09:21 PM Bug #4564 (Resolved): client: Close session doesn't wait for outstanding requests
Sage Weil
09:09 AM Bug #4564 (Fix Under Review): client: Close session doesn't wait for outstanding requests
Pushed a fix to wip-4564. Sam Lang
07:13 AM Bug #4564 (Resolved): client: Close session doesn't wait for outstanding requests

Ran into another failure related to testing #4451 on the client where the following occurs:
client sends create/...
Sam Lang
11:45 AM Bug #4569 (Resolved): ceph-mds: segfault
I started receiving this segfault in ceph-mds with the latest master today.... Noah Watkins
09:35 AM Bug #4565 (Resolved): MDS/client: issue decoding MClientReconnect on MDS
... Sage Weil
08:26 AM Bug #4539 (Resolved): include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::standby_trim_...
commit:295c92c Sage Weil
07:47 AM Bug #4539 (Fix Under Review): include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::stand...
Yep. There's no state bit, and the cache is unchanged by the backtrace updates list. The standby mds is free to cle... Sam Lang
08:04 AM Bug #4555 (Resolved): The CephFileSystem class is missing the createNonRecursive method
0a5175722a8444579715c1871c09c246969e7890 Noah Watkins

03/26/2013

10:22 PM Bug #4539: include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::standby_trim_segments()
I think this is as simple as... Sage Weil
04:04 PM Feature #4277 (Closed): Move built hadoop artificats to download URL
Anonymous
04:03 PM Feature #4277: Move built hadoop artificats to download URL
For now, we're manually posting Hadoop bindings to http://ceph.com/download/. I'll close this for now and we can revi... Anonymous
03:13 PM Bug #4530 (Resolved): client: Assert failure on session close
Sage Weil
10:19 AM Bug #4530: client: Assert failure on session close
I went to file another bug for the client reconnect triggering a session close, but the log indicates that it's not ac... Sam Lang
02:23 PM Fix #4191 (In Progress): qa: mulitiple mds in nightly (non-failure case)
Sage Weil
12:09 PM Bug #4555 (Resolved): The CephFileSystem class is missing the createNonRecursive method
This is needed by HBase
There is a pull request here: https://github.com/ceph/hadoop-common/pull/1
Mike Bryant
11:45 AM Feature #2144: mon: improve mds health checks
commit:1baf66b Sage Weil
11:45 AM Feature #2144 (Resolved): mon: improve mds health checks
Sage Weil
11:11 AM Bug #4545 (Can't reproduce): error creating empty object store. Invalid argument.
Sage Weil
09:33 AM Bug #4545: error creating empty object store. Invalid argument.
i've seen this regularly in the qa runs over the last week or so Sage Weil
09:41 AM Bug #4537 (Resolved): mds: hang on rmdir, unlink
Sam Lang
07:04 AM Bug #4537 (Fix Under Review): mds: hang on rmdir, unlink
Fix pushed to wip-4537. Sam Lang

03/25/2013

07:23 PM Bug #4405: MDCache::populate_mydir can loop forever
Ok, I did
ceph mds tell 0 dumpcache /tmp/dump.txt
http://91.226.13.93/dump.txt.gz
Ivan Kudryavtsev
07:16 PM Bug #4405: MDCache::populate_mydir can loop forever
... Ivan Kudryavtsev
09:20 AM Bug #4405: MDCache::populate_mydir can loop forever
If you run "ceph mds tell 0 dumpcache <filename>" then the MDS will dump everything it has in cache to the filename you sp... Greg Farnum
01:37 PM Bug #4537 (In Progress): mds: hang on rmdir, unlink
Sam Lang
01:06 PM Bug #4530 (Fix Under Review): client: Assert failure on session close
I pushed some fixes to wip-4530 for the client side part of this. Needs review. Sam Lang
09:49 AM Bug #4530 (In Progress): client: Assert failure on session close
Ian Colle
12:57 PM Bug #4545: error creating empty object store. Invalid argument.
Alright, I no longer think the apache2 signature is related. This seems like a proper bug in its own right. Anonymous
12:04 PM Bug #4545: error creating empty object store. Invalid argument.
This may be failing due to a package signing issue that I thought had been resolved. I'll hold onto this ticket until... Anonymous
12:02 PM Bug #4545: error creating empty object store. Invalid argument.
Added the yaml file I was using (needs 3 locked hosts) and the teuthology output as attachments. Anonymous
12:00 PM Bug #4545 (Can't reproduce): error creating empty object store. Invalid argument.
While running a teuthology test, mkcephfs failed with this error:
INFO:teuthology.task.ceph:Running mkfs on osd node...
Anonymous
09:47 AM Bug #4517 (Resolved): ceph_rename fails success case
Sage Weil

03/24/2013

05:34 PM Bug #4537: mds: hang on rmdir, unlink
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-24_08:45:56-kernel-master-testing-basic/2501
cro...
Sage Weil
05:33 PM Bug #4537: mds: hang on rmdir, unlink
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-24_08:45:56-kernel-master-testing-basic/2503
<pr...
Sage Weil
03:18 PM Bug #4537: mds: hang on rmdir, unlink
similar hang:... Sage Weil
02:41 PM Bug #4537 (Resolved): mds: hang on rmdir, unlink
... Sage Weil
03:22 PM Bug #4539: include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::standby_trim_segments()
ubuntu@teuthology:/a/sage-2013-03-24_08:29:36-fs-master-testing-basic/2410 Sage Weil
03:22 PM Bug #4539: include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::standby_trim_segments()
also ubuntu@teuthology:/a/sage-2013-03-24_08:29:36-fs-master-testing-basic/2414 Sage Weil
03:22 PM Bug #4539 (Resolved): include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::standby_trim_...
... Sage Weil
11:16 AM Bug #4536 (Resolved): hadoop: receiving unexpected filenotfound exceptions
Fixed by 150e914c7549f7197eff9fe980abd17a921799ce Noah Watkins

03/23/2013

12:39 PM Bug #4517: ceph_rename fails success case
All working fine now. Thanks Noah Watkins
12:19 PM Bug #4517: ceph_rename fails success case
Thanks for testing/reporting that Noah. That commit last night was bogus. Pushed wip-4517b. Sam Lang
11:04 AM Bug #4517: ceph_rename fails success case
I'm testing this branch, and I'm getting a segfault running the LibCephFS.Rename test.... Noah Watkins
11:53 AM Bug #4536 (Resolved): hadoop: receiving unexpected filenotfound exceptions
Jobs have started failing with the following trace.... Noah Watkins

03/22/2013

03:17 PM Feature #4535 (New): mds: add group usage statistics gathering to the MemoryModel
Once we've updated our MemoryModel (#4502, #4503) and have selected groups of in-memory data that we believe we can s... Greg Farnum
02:09 PM Bug #4517: ceph_rename fails success case
Indeed. Updated wip-4517. Sam Lang
01:49 PM Bug #4517: ceph_rename fails success case
I'm just skimming this in the middle of a meeting, but it looks like we're now failing the rename if the destination ... Greg Farnum
01:31 PM Bug #4517 (Fix Under Review): ceph_rename fails success case
Sam Lang
12:21 PM Bug #4517 (In Progress): ceph_rename fails success case
Sam Lang
01:27 PM Feature #4442 (Fix Under Review): java: add topology API support
Noah Watkins
09:42 AM Bug #4530 (Resolved): client: Assert failure on session close

During testing of #4451:
../../src/common/Cond.h: In function 'int Cond::Signal()' thread 7fe04c36f700 time 2013...
Sam Lang

03/21/2013

12:52 PM Bug #4517 (Resolved): ceph_rename fails success case
ceph_rename has started returning -ENOENT in the common case (source path exists, dest path doesn't exist). In the cl... Noah Watkins

03/20/2013

10:50 AM Bug #4405: MDCache::populate_mydir can loop forever
1) I don't use filesystem snapshots at all.
2) I really have 3 big directories with 40000 files total
3) Some days ...
Ivan Kudryavtsev
09:40 AM Bug #4405: MDCache::populate_mydir can loop forever
Sorry this got dropped on the floor. I found the problems.
The MDS never finishes the "populate_mydir()" function ...
Greg Farnum
09:45 AM Bug #4451: client: Ceph client not releasing cap
Uploaded an annotated log with only the lines related to the inode exhibiting the problem. The problem occurs from t... Sam Lang

03/19/2013

11:35 PM Bug #4489: ceph fs hangs on file stat
And an MDS reload didn't fix the problem until I rebooted one of the FS clients. Ivan Kudryavtsev
11:34 PM Bug #4489: ceph fs hangs on file stat
Oh, sorry about that. It seems I attached the wrong log. I will attach the correct log the next time the problem occurs. But the problem... Ivan Kudryavtsev
02:05 PM Bug #4489 (Need More Info): ceph fs hangs on file stat
That log is from a standby MDS. You'll need to provide the log of the active MDS for us to do anything with it. :) Greg Farnum
10:38 AM Feature #4504 (Resolved): mds: trim based on total memory usage
Right now the MDS only trims based on the count of the dentry cache. We should, based on a config option, optionally ... Greg Farnum
10:21 AM Feature #4503 (New): mds: MemoryModel: include the different boost::pools we use
We use a different boost::pool for each of CDir, CDentry, CInode, Capability. Include these pools, and any others we'... Greg Farnum
10:19 AM Feature #4502 (New): mds: Make the MemoryModel useful
Right now the MDCache's MemoryModel is trying to parse out usage from /proc/self/status. Switch it to use tcmalloc's ... Greg Farnum
10:08 AM Tasks #4499: Identify fields in CInode which aren't permanently necessary
Also, a small one but one that's everywhere: each of the classes in this sequence of bugs has an MDCache pointer. Pro... Greg Farnum
09:43 AM Tasks #4499 (Resolved): Identify fields in CInode which aren't permanently necessary
There are a number of fields in CInode that we don't always need. Examples include everything involved with projectio... Greg Farnum
10:01 AM Cleanup #89 (Closed): mds: put inode dirty fields in dirty_bits_t to reduce memory footprint
This is a less-specific duplicate of #4499 now. Greg Farnum
10:01 AM Feature #4501 (Resolved): Identify fields in CDir which aren't permanently necessary
The CDir has some machinery for handling things like dirty data that isn't always necessary. Audit it for these membe... Greg Farnum
09:57 AM Tasks #4500 (New): Identify fields in CDentry which aren't permanently necessary
CDentry is in far better shape than CInode in this regard, but audit it for things which we don't always need in memo... Greg Farnum
09:16 AM Feature #4484: Enable Hadoop bindings to pull configuration options from the monitor
cdca0babf9145a8f6e7613ab7026cf0968b3bc91 Noah Watkins
09:00 AM Feature #3540 (Resolved): mds: maintain per-file backpointers on first file object
8b798867731d298c05d9f93b0c207a541d2b5e90 merged to master Ian Colle
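
Feature #4502 above notes that the MDCache's MemoryModel currently parses usage out of /proc/self/status. As a rough illustration of that style of parsing (not the MDS code itself), the Python sketch below pulls the Vm* lines out of a status dump. The function name parse_proc_status and the SAMPLE values are hypothetical; only the "Key: value kB" line layout is taken from the Linux procfs format.

```python
def parse_proc_status(text):
    """Return {field: kilobytes} for the Vm* lines of a /proc/<pid>/status dump.

    Hypothetical helper for illustration; it is not part of Ceph.
    """
    usage = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        # Skip lines without a colon and fields that are not memory counters.
        if not sep or not key.startswith("Vm"):
            continue
        fields = rest.split()
        # Memory fields look like "VmRSS:     51200 kB".
        if len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            usage[key] = int(fields[0])
    return usage


# Illustrative sample only; the numbers are made up.
SAMPLE = (
    "Name:\tceph-mds\n"
    "VmPeak:\t  204800 kB\n"
    "VmSize:\t  198532 kB\n"
    "VmRSS:\t   51200 kB\n"
    "Threads:\t12\n"
)

for name, kb in sorted(parse_proc_status(SAMPLE).items()):
    print(f"{name}: {kb} kB")
```

On a Linux host the same function could be fed the contents of /proc/self/status instead of SAMPLE. Text scraping of this kind depends on the kernel's output layout, which may be part of why the ticket proposes moving to allocator-provided statistics.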

03/18/2013

09:07 PM Bug #4491 (Resolved): mds: assert failure on _purge_forward_pointers
Sage Weil
03:37 PM Bug #4491 (Fix Under Review): mds: assert failure on _purge_forward_pointers
I pushed a proposed fix to wip-4491. Basically we just need to handle the case that the osd returns ENODATA. Sam Lang
03:03 PM Bug #4491: mds: assert failure on _purge_forward_pointers
This happens soon after ceph-fuse mount. I hit this when trying to run blogbench test. Tamilarasi muthamizhan
02:47 PM Bug #4491 (Resolved): mds: assert failure on _purge_forward_pointers

Joe Buck reported a bug with master:
INFO:teuthology.task.ceph.mds.0.err:mds/MDCache.cc: In function 'void MDCac...
Sam Lang
06:03 PM Bug #4489: ceph fs hangs on file stat
And there were no other specific events at that moment (like scrubbing, osd/mds/mon failures). Ivan Kudryavtsev
06:02 PM Bug #4489: ceph fs hangs on file stat
No, it started earlier than #4486, so the cause is something else. Ivan Kudryavtsev
06:00 PM Bug #4489: ceph fs hangs on file stat
I think it could be connected with #4486, because I found about 150 launched cron tasks and every task is launched t... Ivan Kudryavtsev
05:46 PM Bug #4489: ceph fs hangs on file stat
Wrapping cron.d code.... Ivan Kudryavtsev
05:44 PM Bug #4489: ceph fs hangs on file stat
Code that caused the hang, running on two hosts:... Ivan Kudryavtsev
10:20 AM Bug #4489: ceph fs hangs on file stat
I can provide shell access to one of the servers, but I don't know if it can be reproduced easily. Ivan Kudryavtsev
10:17 AM Bug #4489 (Can't reproduce): ceph fs hangs on file stat
hi. I have cephfs (kernel client) mounted from two hosts at /var/www.
I'm trying to do...
Ivan Kudryavtsev
05:46 PM Feature #1448: test hadoop on sepia
Are nodes available for scale testing? Issdm cluster is withering away.. Noah Watkins
05:42 PM Feature #4484 (Resolved): Enable Hadoop bindings to pull configuration options from the monitor
Noah Watkins
04:28 PM Feature #4494 (New): qa: exercise recovery from migration points
In #4493 we checked recovery in an MDS cluster. Now we need to check recovery following each kill point involved in m... Greg Farnum
04:23 PM Feature #4493 (New): qa: trigger each kill_at point related to clustered recovery
Write a workunit using the restart teuthology task interface that handles running several MDS daemons and fully exerc... Greg Farnum
04:18 PM Tasks #4492 (New): mds: Define kill points involved in clustered migration and recovery
We need to define all the separate points at which a break in 1) clustered recovery and 2) migration leaves a differe... Greg Farnum
09:55 AM Bug #4434: looping waiting for quorum after upgrade
Yep! This says that you ran a branch which included an unreleased set of encoding rules on the MDS which would have c... Greg Farnum

03/17/2013

12:12 PM Feature #4484 (Fix Under Review): Enable Hadoop bindings to pull configuration options from the m...
Noah Watkins
12:12 PM Feature #4484: Enable Hadoop bindings to pull configuration options from the monitor
ceph.git wip-4484
hadoop-common.git cephfs/wip-4484
Noah Watkins
11:10 AM Feature #4485 (New): Improve "needsrecover" handling
Jim Schutt reported issues on the mailing list[1] with slow stats that turned out to be due to inodes with the "needs... Greg Farnum

03/16/2013

03:50 PM Feature #4484: Enable Hadoop bindings to pull configuration options from the monitor
I'd lean towards keyring files, but we may want to float this on Monday's stand-up. Anonymous
03:18 PM Feature #4484: Enable Hadoop bindings to pull configuration options from the monitor
Seems like there are keyring files, secret strings, client usernames, etc… Which one(s) should we use?
http://ceph...
Noah Watkins
03:10 PM Feature #4484: Enable Hadoop bindings to pull configuration options from the monitor
This replaces the current approach, which assumes that the host with the JobTracker (likely not an OSD and possibly n... Anonymous
02:59 PM Feature #4484 (Resolved): Enable Hadoop bindings to pull configuration options from the monitor
At present, the Hadoop bindings require several options be specified in xml files.
It would be easier for users if ...
Anonymous

03/15/2013

02:54 PM Feature #4326: qa: add samba + (kclient|ceph-fuse) to suite
I'm going to need to dig into why it doesn't seem to be finishing, but I think it might be exposing some (more) file ... Greg Farnum
10:51 AM Feature #4326: qa: add samba + (kclient|ceph-fuse) to suite
The wip-samba-on-ceph branch has "samba", "cifs-mount", and "smbtorture" tasks.
I notice that smbtorture on ceph-f...
Greg Farnum
02:23 PM Bug #4451: client: Ceph client not releasing cap
Looked at this again briefly. I notice:
1) the inode was previously in the stray directory (before MDS restart)
2) ...
Greg Farnum
09:31 AM Bug #4451: client: Ceph client not releasing cap
For some reason the MDS is sending back an "export" on the caps for that inode (timestamp 2013-03-15 09:07:38.098273)... Greg Farnum
08:59 AM Bug #4451 (Resolved): client: Ceph client not releasing cap

I'm occasionally hitting a hang in my backtrace testing, where unmount never completes. The client log shows a dis...
Sam Lang
01:56 PM Tasks #4467 (New): qa: make ior tasks work
Sage Weil
01:42 PM Fix #3630 (Resolved): mds: broken closed connection cleanup
Sage Weil
01:38 PM Fix #4286: SLES 11 - cfuse: disable 'big_writes'and 'atomic_o_trunc
also, the invalidate callback code probably needs to be conditional, too! Sage Weil
01:35 PM Fix #2215 (Resolved): ceph-fuse does not invalidate page cache
Sage is turning it on by default now following weeks of testing in the nightlies! Greg Farnum
01:33 PM Feature #2903 (Resolved): ceph-fuse: Support -o noallow_other
Sage Weil

03/14/2013

03:51 PM Bug #4434: looping waiting for quorum after upgrade
This is what was captured at the time the test was run successfully: Ceph Version: 0.57-667-g6a9cda7
The next inst...
Ken Franklin
10:12 AM Bug #4434: looping waiting for quorum after upgrade
Just to make sure I'm tracking these upgrades correctly:
It was created on v0.56.3? (Not a branch.) Then it moved to...
Greg Farnum
09:30 AM Bug #4434: looping waiting for quorum after upgrade
It's quite possible the upgrade was corrupted somewhere along the line. Prior to the issues the system was on 0.56.3... Ken Franklin
01:30 PM Feature #4441 (Resolved): libcephfs: add ceph_get_osd_addr()
Noah Watkins
01:19 PM Feature #4441 (Resolved): libcephfs: add ceph_get_osd_addr()
Noah Watkins
01:20 PM Feature #4442 (Resolved): java: add topology API support
Noah Watkins
09:53 AM Bug #4358 (Resolved): kclient: ENOENT during kernel build on kclient
Sage Weil
08:54 AM Bug #4358: kclient: ENOENT during kernel build on kclient
passed another 100 iterations (modulo a machine lockup on the server side) Sage Weil

03/13/2013

08:21 PM Bug #4405: MDCache::populate_mydir can loop forever
And what's interesting is that the MDS server has incoming traffic of ~40MB/s all the time, but no active clients. I found it aft... Ivan Kudryavtsev
08:14 PM Bug #4405: MDCache::populate_mydir can loop forever
OK, I don't know what you mean by the term "start". But actually, the MDS ran the whole time with
debug ms =1 and debug ...
Ivan Kudryavtsev
06:52 PM Bug #4405: MDCache::populate_mydir can loop forever
Hi Ivan-
Looking at the log, it looks like all 3 times the MDS started up it came up within 5 seconds or so. Do y...
Sage Weil
06:14 PM Bug #4358: kclient: ENOENT during kernel build on kclient
20 iterations on testing branch. i ran a bunch on master to make sure i could trigger the old bug, but then couldn't... Sage Weil
04:40 PM Bug #4390 (Resolved): mds: zapping named mds causes client assertion
commit:f67596a44739e8071cc97fb0463f37203502faaa Sage Weil
04:39 PM Bug #4385 (Resolved): mds: refusing connections with high open socket count
commit:8b713371447f9761597457af2c81f0b870d3c4ba Sage Weil
03:03 PM Bug #4434: looping waiting for quorum after upgrade
More details? I'm not sure how the title relates to the bug description or MDS log. The log is crashing on the Sessio... Greg Farnum
02:52 PM Bug #4434: looping waiting for quorum after upgrade
changed the project Ken Franklin
02:41 PM Bug #4434: looping waiting for quorum after upgrade

Part of the bug appears to be in ceph, where the following returns an error, causing an infinite loop in get_key():...
Sam Lang
02:36 PM Bug #4434 (Resolved): looping waiting for quorum after upgrade
How we got here:
Bobtail .56 installed on burnupi60 failed daily upgrade due to new gitbuilder keys.
updated key.
...
Ken Franklin
02:25 PM Bug #3640 (Duplicate): kclient: hang and kernel panic
dup of #3088 Sage Weil
02:24 PM Bug #3088: NULL pointer dereference at ceph_d_prune
this code may be gone now with yan's d_prune changes... Sage Weil
02:06 PM Bug #1945: blogbench hang on caps
Yan, would you mind taking a look at this when you have time? Ian Colle
02:05 PM Bug #3637: client: not issuing caps for with clients doing shared writes
Sage Weil

03/12/2013

11:18 PM Feature #4277: Move built hadoop artificats to download URL
I have a documentation branch pushed up that is waiting for the URLs. Let me know what those are and I can integrate ... Noah Watkins
11:22 AM Feature #4277 (In Progress): Move built hadoop artificats to download URL
As a starting point, let's post this on the download page as stand-alone jar files. I'll take ownership of doing that... Anonymous
08:57 PM Bug #4385 (Fix Under Review): mds: refusing connections with high open socket count
sounds right. thanks for testing! Sage Weil
08:25 PM Bug #4385: mds: refusing connections with high open socket count
Err, "unclean mounts" = "exiting without unmounting" Noah Watkins
08:23 PM Bug #4385: mds: refusing connections with high open socket count
Well hot damn. That branch seems to solve two problems. First, clients that do a clean unmount don't leave lots of FD... Noah Watkins
07:55 PM Bug #4385: mds: refusing connections with high open socket count
Noah, do you want to try wip-mds-con? Sage Weil
07:45 PM Bug #4385 (In Progress): mds: refusing connections with high open socket count
Sage Weil
01:53 PM Bug #4385: mds: refusing connections with high open socket count
Although the high counts were because of double counting by lsof, the sockets still are not being closed. Without any... Noah Watkins
12:32 PM Bug #4385: mds: refusing connections with high open socket count
Hmm, did we screw up our refactoring work so that replaced sockets are no longer actually closed? That might explain ... Greg Farnum
12:25 PM Bug #4385: mds: refusing connections with high open socket count
Here's some more info after investigating this a bit further.
Open socket counts by category after a fresh MDS reb...
Noah Watkins
11:52 AM Documentation #4422: Typo on Release Process webpage
Make that one fewer "are". Got to love making a typo on a ticket about a typo. Anonymous
11:38 AM Documentation #4422 (Resolved): Typo on Release Process webpage
This sentence (in section 1) needs one less instance of "and":
"The RPM based packages are are built natively, so on...
Anonymous

03/11/2013

09:34 PM Feature #4393 (Resolved): Add apache-hadoop gitbuilder to master gitbuilder webpage
gitbuilder.sepia.com just needed the new gitbuilder added to the proxy config file. Anonymous
05:55 PM Bug #4398: fix kclient_workunit_misc.yaml in the nightlies
ubuntu@teuthology:/a/teuthology-2013-03-11_01:00:04-regression-master-testing-gcov/21326 Tamilarasi muthamizhan
09:38 AM Bug #4398 (Duplicate): fix kclient_workunit_misc.yaml in the nightlies
Ian Colle
03:22 PM Feature #4073 (Resolved): qa: add message delay injection to test suite
Sage Weil
03:20 PM Feature #4190 (Resolved): qa: add mds thrashing to nightly
Sage Weil
09:47 AM Feature #4326 (In Progress): qa: add samba + (kclient|ceph-fuse) to suite
Ian Colle
04:42 AM Bug #4405: MDCache::populate_mydir can loop forever
The log was taken when it was stuck last time. I stopped the MDS, increased the log level and started it again. Ivan Kudryavtsev

03/10/2013

11:40 PM Bug #4405: MDCache::populate_mydir can loop forever
Log download link: http://pixeltram.com/ceph-mds.1.log.1.gz
Ivan Kudryavtsev
07:43 AM Bug #4405: MDCache::populate_mydir can loop forever
If the stuck startup is reproducible now (by lowering the cache size and restarting), a log with debug ms =1 and debu... Sage Weil
12:31 AM Bug #4405: MDCache::populate_mydir can loop forever
I mounted the ceph root and counted the number of files, and it's less than the default cache size of 100000... Ivan Kudryavtsev
12:05 AM Bug #4405: MDCache::populate_mydir can loop forever
Actually, regarding initial ticket message. I think MDS goes in some kind of LOOP during start, when cache size is sm... Ivan Kudryavtsev

03/09/2013

11:59 PM Bug #4405: MDCache::populate_mydir can loop forever
I think it's important to specify some kind of metrics so everyone could calculate memory utilization of specific cac... Ivan Kudryavtsev
11:49 PM Bug #4405: MDCache::populate_mydir can loop forever
regarding q2:
I increased mds cache size to
mds cache size = 100000000
and it started in seconds.
I don't...
Ivan Kudryavtsev
11:27 PM Bug #4405 (Resolved): MDCache::populate_mydir can loop forever
I had an unusual MDS failure. My server NIC started to flap and as a result (finally)
my CEPH FS started to recover an...
Ivan Kudryavtsev
10:35 PM Bug #4390: mds: zapping named mds causes client assertion
ran this through the fs suite and it passed. i would expect breakage in mds thrashing and multimds situations, thoug... Sage Weil
07:24 AM Bug #4358: kclient: ENOENT during kernel build on kclient
That might work, as long as we don't need to update the flags and i_release_count atomically... that'd have to become... Sage Weil
06:12 AM Bug #4358: kclient: ENOENT during kernel build on kclient
Any idea how to fix the locking issue? Use atomic bit operations to modify i_ceph_flags? Zheng Yan

03/08/2013

05:27 PM Bug #4398: fix kclient_workunit_misc.yaml in the nightlies
looks like the test failed due to,
2013-03-06T06:38:55.270 INFO:teuthology.task.workunit.client.0.out:
2013-03-06...
Tamilarasi muthamizhan
05:16 PM Bug #4398 (Duplicate): fix kclient_workunit_misc.yaml in the nightlies
log: ubuntu@teuthology:/a/teuthology-2013-03-06_01:00:04-regression-master-testing-gcov/16995... Tamilarasi muthamizhan
04:34 PM Bug #4385: mds: refusing connections with high open socket count
Log file fun. Here is the MDS log up until it stopped accepting connections.
http://piha.soe.ucsc.edu/ceph-mds.a.l...
Noah Watkins
10:57 AM Bug #4385: mds: refusing connections with high open socket count
Would you like the logs up to the point that the MDS stops accepting connections, or just a snapshot after the FD list ... Noah Watkins
10:34 AM Bug #4385: mds: refusing connections with high open socket count
can you reproduce with debug ms = 20 and debug mds = 20 ? those logs would be helpful Sage Weil
10:30 AM Bug #4385: mds: refusing connections with high open socket count
To Greg's question, it seems as though the connections were not timing out. I'd toss out a rough estimate of about 45... Noah Watkins
10:29 AM Bug #4385: mds: refusing connections with high open socket count
Is there anything I can do to get more information for this ticket? Noah Watkins
09:56 AM Bug #4385: mds: refusing connections with high open socket count
It might be contributing, but I believe the sockets should still be getting closed after a timeout period, right? Greg Farnum
09:45 AM Bug #4385: mds: refusing connections with high open socket count
I bet #3630 is contributing here. Sage Weil
07:11 AM Bug #4385: mds: refusing connections with high open socket count
I had this thought that the set of FDs in the logs would be >> than the set shown in lsof, and that we'd want to cros... Noah Watkins
02:11 PM Bug #4390: mds: zapping named mds causes client assertion
pushed wip-4390-b, which solves this on the client side.
i don't really want to delay the mark-down/failing in the...
Sage Weil
08:48 AM Bug #4390: mds: zapping named mds causes client assertion
That approach was breaking the monitor. Just pushed a new approach that queues the zap for later. Sam Lang
06:37 AM Bug #4390 (Fix Under Review): mds: zapping named mds causes client assertion
Sam Lang
06:37 AM Bug #4390: mds: zapping named mds causes client assertion
Proposed fix in wip-4390. Should we also clean up the client code to wait till the mdsmap contains up members?  Separ... Sam Lang
06:31 AM Bug #4390 (Resolved): mds: zapping named mds causes client assertion

Hit the following assertion on the client with backtrace testing:
../../src/mds/MDSMap.h: In function 'const ent...
Sam Lang
09:29 AM Feature #4393 (Resolved): Add apache-hadoop gitbuilder to master gitbuilder webpage
I brought a new gitbuilder online at gitbuilder-precise-apache-hadoop-amd64.front.sepia.ceph.com and ran the command ... Anonymous
09:01 AM Bug #4358: kclient: ENOENT during kernel build on kclient
I hit this today while testing. Sorry, I don't remember
which test but Sage says he knows what happened.
http://pa...
Alex Elder
08:46 AM Fix #2215: ceph-fuse does not invalidate page cache
Those tests are part of the full regression test suite. Sam Lang

03/07/2013

10:32 PM Fix #4286 (In Progress): SLES 11 - cfuse: disable 'big_writes' and 'atomic_o_trunc'
big_write was added in fuse 2.8, sles has fuse version 2.7.2
atomic_o_trunc requires fuse > 2.2 and kernel > 2.6.2...
Anonymous
09:48 PM Bug #4385: mds: refusing connections with high open socket count
Doesn't /proc tell you whether the fd is a socket or not? Or do you mean correlate activity?
In any case, all the ...
Greg Farnum
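The /proc question above can be checked directly. Below is a minimal sketch, assuming a Linux host and permission to read the target process's /proc/<pid>/fd directory; the function name classify_fds and its category labels are hypothetical and not part of Ceph.

```python
import os
import stat
from collections import Counter


def classify_fds(pid="self"):
    """Count a process's open file descriptors by type via /proc (Linux only).

    Hypothetical helper for illustration; pass the MDS pid instead of "self".
    """
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            # stat() follows the /proc link and reports the type of the
            # underlying object (socket, regular file, pipe, ...).
            mode = os.stat(os.path.join(fd_dir, name)).st_mode
        except OSError:
            # The descriptor was closed between listdir() and stat().
            continue
        if stat.S_ISSOCK(mode):
            counts["socket"] += 1
        elif stat.S_ISREG(mode):
            counts["file"] += 1
        elif stat.S_ISFIFO(mode):
            counts["pipe"] += 1
        else:
            counts["other"] += 1
    return counts
```

Each descriptor is counted once, so the totals should not suffer from the per-thread double counting that some lsof invocations produce.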
07:05 PM Bug #4385: mds: refusing connections with high open socket count
Err, bump up the level on the MDS... Noah Watkins
07:04 PM Bug #4385: mds: refusing connections with high open socket count
I'll test out the ulimit as a workaround, and presumably to verify the open fd limit theory.
I checked all my clie...
Noah Watkins
06:53 PM Bug #4385: mds: refusing connections with high open socket count
The direct cause of this is almost certainly an open fd limit coming from the OS, which you can probably work around ... Greg Farnum
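To test the open fd limit theory, it helps to compare the number of descriptors in use against the limit. Below is a rough sketch for the current process only, assuming Linux; fd_headroom is a hypothetical name, and checking the MDS itself would mean reading /proc/<pid>/limits for its pid instead.

```python
import os
import resource


def fd_headroom():
    """Return (open_fds, soft_limit, hard_limit) for the current process.

    Hypothetical helper for illustration. The open count is approximate:
    it includes the descriptor that listdir() opens briefly to read the
    directory.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard
```

If the open count sits at or near the soft limit, new connections will fail, which is consistent with raising the limit via ulimit as a workaround.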
06:04 PM Bug #4385 (Resolved): mds: refusing connections with high open socket count
My MDS has become unresponsive after a long period of map-reduce jobs. The MDS process is idle, but is eating up 16 G... Noah Watkins
09:06 PM Fix #4034 (Resolved): mds: fix replayed ino creation extra_bl
commit:3a7233bc8b199c97fbde9c1e44370353f0504af8 Sage Weil
05:46 PM Fix #4034: mds: fix replayed ino creation extra_bl
There's still a bad comment in 0c0313c6f6d4e2733fcf972b49456bf1faad9255, but the rest looks good! Greg Farnum
03:49 PM Fix #4034: mds: fix replayed ino creation extra_bl
Reviewed on Github. Greg Farnum
09:06 PM Feature #4074 (Resolved): qa: add traceless reply test to fs suite
commit:de62a79589fc4feed4243ac278d365b6363bfa2b fixed ceph.git bugs. added tests to fs suite. Sage Weil
03:49 PM Feature #4074: qa: add traceless reply test to fs suite
Edit; wrong bug, sort of. Greg Farnum
08:41 PM Cleanup #4387 (Resolved): mds: EMetaBlob::client_reqs doesn't need to be a list
It is either set or not set at all, currently. Sage Weil
07:21 PM Feature #4386 (Resolved): kclient: Mount error message when no MDS present
Right now you either get an input/output error or a message about not being able to find the superblock when trying t... Mark Nelson
01:09 PM Bug #4358: kclient: ENOENT during kernel build on kclient
An initial patch from Yan is in our testing branch and should fix this issue. (Or at least fixes one cause.) It may g... Greg Farnum
09:35 AM Bug #4358: kclient: ENOENT during kernel build on kclient
Let's see if this happens in testing branch after Yan's patches are all applied. Ian Colle
01:01 AM Bug #4358: kclient: ENOENT during kernel build on kclient
I got the following message for the kernel build error: "find: `./include/generated': No such file or directory".
It's strange ...
Zheng Yan
12:38 PM Bug #4370 (New): mds: high-cpu utilization in memorymodel:_sample
Shortly after running some fs workloads on a 1-mds/16-osd cluster, cpu utilization spikes and never returns to normal... Noah Watkins

03/06/2013

05:16 PM Feature #4361 (Resolved): Setup another gitbuilder VM for building external Hadoop git repo(s)
One of the first things I did today was create it but it was taking a while and I started working on some other stuff... Sandon Van Ness
11:55 AM Feature #4361 (Resolved): Setup another gitbuilder VM for building external Hadoop git repo(s)
We're moving from building our own monolithic Hadoop packages to building a Hadoop/Ceph library and then running that... Anonymous
03:34 PM Feature #4356 (Closed): libcephfs: expose osd topology
Noah Watkins
07:31 AM Bug #4358 (Resolved): kclient: ENOENT during kernel build on kclient
... Sage Weil

03/05/2013

07:47 PM Feature #4356 (Fix Under Review): libcephfs: expose osd topology
Noah Watkins
07:46 PM Feature #4356 (Closed): libcephfs: expose osd topology
wip-expose-topo
e8da4bf and 6b3fce1
Noah Watkins
07:14 PM Feature #4074 (Fix Under Review): qa: add traceless reply test to fs suite
Sage Weil
07:14 PM Fix #4034 (Fix Under Review): mds: fix replayed ino creation extra_bl
Sage Weil
03:47 PM Feature #4355 (New): uclient: add perfcounters
The client currently has 3 perfcounters: average latency of replies, of processing a request, and of a file write.
...
Greg Farnum
03:46 PM Feature #4354 (Resolved): mds: add an equivalent to the OSD OpTracker
Like it says — we want to be able to get information about ops-in-flight and their current status in a lot of differe... Greg Farnum
01:29 PM Cleanup #4166 (Resolved): ceph: simplify ceph_sync_write() page_align
This has been committed to the ceph-client testing branch.
038832c ceph: simplify ceph_sync_write() page_align cal...
Alex Elder
11:12 AM Bug #4350 (Rejected): ceph-fuse: lockup from 40g loopback mkfs.ext3
The underlying RADOS cluster in this report isn't fully healthy. I'm pretty sure that's all there is. Unless we hear ... Greg Farnum
09:00 AM Bug #4350 (Rejected): ceph-fuse: lockup from 40g loopback mkfs.ext3
... Sage Weil

03/04/2013

02:10 PM Feature #4326 (Resolved): qa: add samba + (kclient|ceph-fuse) to suite
Sage Weil
11:01 AM Documentation #3796 (Resolved): FUSE mount documentation needs some corrections for v0,56x
Page has been updated with instructions, and a hyperlink to Cephx Configuration Reference. See http://ceph.com/docs/m... John Wilkins
10:06 AM Documentation #3796 (In Progress): FUSE mount documentation needs some corrections for v0,56x
John Wilkins
10:38 AM Cleanup #4166: ceph: simplify ceph_sync_write() page_align
This patch (2/3 below) has been posted for review, along with
a few others I include here for context. Marking for ...
Alex Elder
08:50 AM Cleanup #4166 (In Progress): ceph: simplify ceph_sync_write() page_align
I'm reopening this after all.
It turns out that the original patch was fine. The only part
that was bad was due ...
Alex Elder
07:14 AM Feature #4277: Move built hadoop artifacts to download URL
Thanks for the info Gary! Let me do a little bit more research on how users want to obtain the artifacts. I think tha... Noah Watkins
 
