Activity

From 08/09/2011 to 09/07/2011

09/07/2011

04:58 PM Bug #1509 (Resolved): cfuse sometimes hangs after unmount
recent regression, now fixed by commit:fc587d6caa2376f95fe15567bd632a2d4b8bb81f Sage Weil
09:54 AM Bug #1425 (Resolved): mds: stuck in prexlock
Sage Weil

09/06/2011

10:23 PM Bug #1509: cfuse sometimes hangs after unmount
This is usually caused by leaked inode references. A full client log (debug ms = 1, debug client = 20, debug objectc... Sage Weil
12:32 PM Bug #1509 (Can't reproduce): cfuse sometimes hangs after unmount
After fusermount completes successfully, cfuse did not exit in these runs:
teuthology:~teuthworker/archive/nightly...
Josh Durgin
10:01 PM Bug #1472: cfuse hangs with v0.34
Any update on this? Were you able to reproduce? Sage Weil
12:44 PM Bug #1511 (Closed): fsstress failure with 3 active mds
Logs are in teuthology:~teuthworker/archive/nightly_coverage_2011-09-05/653... Josh Durgin
12:40 PM Bug #1510 (Resolved): fsx failure on cfuse
Logs are in teuthology:~teuthworker/archive/nightly_coverage_2011-09-05/623:... Josh Durgin

09/05/2011

10:45 AM Bug #1108: Large number of files in a directory makes things grind to a halt
I've just re-created the cluster I was testing this on, and given a 50G lv to store the ceph logs on, so running ever... Damien Churchill

09/02/2011

09:04 PM Cleanup #1499 (Resolved): mds: clean up directory layouts
Rip out all the default_layout stuff and just stick this in the inode_t::layout value. This should remove a lot of a... Sage Weil
03:16 PM Bug #1437: cfuse can't change permissions of a file
Sage Weil

09/01/2011

09:40 PM Bug #1435: mds: loss of layout policies upon mds restart
Seriously, if we just put it in the layout field, this... Sage Weil
09:36 PM Bug #1435: mds: loss of layout policies upon mds restart
Okay, I see at least one problem.. the IFILE lock state isn't sharing the default_file_layout with other nodes. CIno... Sage Weil
03:04 PM Bug #1460 (Resolved): mds: file locks don't work right with 0-length locks
I updated locktest.c; it now fails before my fix and succeeds afterwards. Hurray!
(Also the task that runs it now ac...
Greg Farnum
10:59 AM Bug #1460 (In Progress): mds: file locks don't work right with 0-length locks
Oh, I pushed this patch yesterday since it seems to be working. But I'm leaving this bug open until I can clean up th... Greg Farnum
02:29 PM Bug #1425 (In Progress): mds: stuck in prexlock
Sage Weil
01:15 PM Bug #1467 (Resolved): cfuse crash during fsx workunit
This is just a bad assert, fixed by commit:c8c205fa73078c1ee46152ed860084a272867f5e Sage Weil
11:52 AM Bug #1467 (In Progress): cfuse crash during fsx workunit
This wasn't the OSD reply bug - got this crash again today:... Josh Durgin
12:40 PM Bug #1464 (In Progress): mds crash during shutdown (after trivial_sync workunit on kclient)
Sage Weil
09:36 AM Bug #1472: cfuse hangs with v0.34
I was able to run the active mds in debug mode when a hang occurred. This is the log a few seconds before and after ... Sam Lang
09:29 AM Bug #1472: cfuse hangs with v0.34
Can you 'ceph mds tell 0 dumpcache /tmp/foo' and grep out the inode that the open is blocked on? Sage Weil
09:06 AM Bug #1472: cfuse hangs with v0.34
Again stuck in an open:
(gdb) bt
#0 0x00007f7407a33bac in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/x86_...
Sam Lang
09:04 AM Feature #626: qa: add IOR, rompio, or other parallel workloads suite
IOR depends on mpi. mpich2 is pretty easy to set up (there's a package).
I think an ior task would need to:
- t...
Sage Weil

08/31/2011

09:48 PM Bug #1447 (Resolved): mds: does not validate pool IDs in handle_client_set[dir]layout
Sage Weil
09:45 PM Bug #1437: cfuse can't change permissions of a file
Sam, is this something you can reproduce? All we should need is a client log.. something like '--log-file foo --log-... Sage Weil
06:51 PM Bug #1108 (Closed): Large number of files in a directory makes things grind to a halt
Anything new here? Large directories aren't a part of our qa yet, but when they are this'll come up... Sage Weil
06:23 PM Cleanup #431 (Resolved): mds: clean up inode journaling internal interfaces
Sage Weil
05:41 PM Bug #1318: directories disappear across multiple rsyncs
added a workunit misc/multiple_rsyncs.sh to do a couple rsyncs and make sure no additional files are transferred. src... Sage Weil
04:58 PM Bug #1467 (Closed): cfuse crash during fsx workunit
Same; I think this was the MOSDOpReply bug. Sage Weil
03:52 PM Bug #1472: cfuse hangs with v0.34

At about the time of that last client hang (_open), I do see these messages in the active mds log:
2011-08-31 17...
Sam Lang
03:28 PM Bug #1472: cfuse hangs with v0.34
At most 20 processes running at any given time (different instances of the same application) from a single client, re... Sam Lang
03:15 PM Bug #1472: cfuse hangs with v0.34
What does the workload look like? Sage Weil
03:12 PM Bug #1472 (In Progress): cfuse hangs with v0.34
Well, so much for that then.
Are these actually new hangs compared to v0.33? Newly-noticed but possibly present be...
Greg Farnum
03:05 PM Bug #1472: cfuse hangs with v0.34
I have verified that this hang is not due to osds crashing. With all osds running, and all pgs active+clean, I still... Sam Lang
02:31 PM Bug #1472: cfuse hangs with v0.34
Well with 3 OSDs down you probably lost access to some objects?
It probably shouldn't hang all other requests on t...
Greg Farnum
02:26 PM Bug #1472: cfuse hangs with v0.34
Only 3 osds crashed though. It seems like there should be other PGs on other osds that are still accessible, unless ... Sam Lang
02:14 PM Bug #1472 (Duplicate): cfuse hangs with v0.34
Yeah, this is probably due to dead OSDs, so the client's unable to find anywhere to read the data from and is just wa... Greg Farnum
01:34 PM Bug #1472: cfuse hangs with v0.34
FYI: These hangs may have just been caused by osd failures (see #1473). I will update if this issue persists. Sam Lang
12:10 PM Bug #1472 (Can't reproduce): cfuse hangs with v0.34
I see hangs with cfuse that appear to be at random (random requests to servers). Here are the backtraces of some cfu... Sam Lang
09:19 AM Bug #1367 (Resolved): cfuse and mon crash after dbench
Sage Weil

08/30/2011

11:20 PM Bug #1467 (Resolved): cfuse crash during fsx workunit
Logs are in teuthology:~teuthworker/archive/nightly_coverage_2011-08-30/276... Josh Durgin
11:04 PM Bug #1464 (Can't reproduce): mds crash during shutdown (after trivial_sync workunit on kclient)
Logs are in teuthology:~teuthworker/archive/nightly_coverage_2011-08-30/293... Josh Durgin
02:32 PM Bug #1460 (Resolved): mds: file locks don't work right with 0-length locks
Right now it just doesn't handle them properly. See, eg ... Greg Farnum
01:25 PM Bug #1456 (Resolved): cfuse: crash in snaptest2 during full snaps run
Sage Weil

08/29/2011

01:51 PM Bug #1456 (Resolved): cfuse: crash in snaptest2 during full snaps run
cfuse seems to be failing on master with the following config:
roles:
- - mon.0
- mds.0
- osd.0
- - mon.1
...
Samuel Just
09:17 AM Bug #1367 (In Progress): cfuse and mon crash after dbench
ok, just hit the top one after 35 runs. Sage Weil

08/25/2011

09:19 PM Bug #1444 (Resolved): client: crash on flush completion under blogbench
Sage Weil
08:48 AM Bug #1444 (Resolved): client: crash on flush completion under blogbench
... Sage Weil
09:19 PM Bug #1391 (Resolved): client: crash on std::string in insert_trace()
Sage Weil
03:57 PM Bug #1318: directories disappear across multiple rsyncs
Looking at these symptoms again, I wonder if this could have been a result of the path_traverse changes we were makin... Greg Farnum
01:08 PM Bug #1446 (Resolved): cephfs: pool option doesn't work
Fixed in commit:65b30507590e9ef47623b7bfe1e672aba01ce823 Greg Farnum
09:14 AM Bug #1446 (Resolved): cephfs: pool option doesn't work
While testing the pool layout option, it's accepted, but reading back the pool it's still located in pool 0.
This ...
Greg Farnum
01:01 PM Bug #1405 (Resolved): cephfs: shouldn't have to specify all layout options
Fixed in userspace commit:b8267492551f1adc5e0079a670b20f6180de18f0
and kernel client commit:7c296cadd05d28329e595b...
Greg Farnum
08:14 AM Bug #1405: cephfs: shouldn't have to specify all layout options
The in-kernel code rejects any layout that doesn't set the stripe unit (and if you set the object_size it makes sure ... Greg Farnum
12:51 PM Feature #1448 (Resolved): test hadoop on sepia
- set it up on some sepia nodes (8?)
- do some basic testing of ceph vs hdfs
from doug cutting:...
Sage Weil
12:45 PM Bug #1368 (Can't reproduce): mds crash after blogbench on cfuse
Sage Weil
12:45 PM Bug #1367 (Can't reproduce): cfuse and mon crash after dbench
Sage Weil
10:01 AM Bug #1447 (Resolved): mds: does not validate pool IDs in handle_client_set[dir]layout
Yep, there's no checking that they're valid mds data pools or even that they exist! Greg Farnum

08/24/2011

10:27 PM Bug #1405: cephfs: shouldn't have to specify all layout options
This is going to be related to #1446, obviously. I'll take both. Greg Farnum
05:43 PM Bug #1391: client: crash on std::string in insert_trace()
Sage Weil
03:11 PM Bug #1391: client: crash on std::string in insert_trace()
Hmm, any other hints on what workloads might trigger this? I'm not getting anything from valgrind or my workloads.
...
Sage Weil
11:23 AM Bug #1391 (New): client: crash on std::string in insert_trace()
Reopening this... Sage Weil
03:10 PM Bug #1442 (Resolved): client: non-empty ObjectSet on last inode->put()
fixed by commit:f2381f97dea9f3563897857c0a0482281b449b61 Sage Weil
12:56 PM Bug #1442 (Resolved): client: non-empty ObjectSet on last inode->put()
commit:e9b739f8dd39f3373dd0869a0fd5436350e1e3f3... Sage Weil
01:12 PM Bug #1429 (Resolved): cfuse assert failed assert(diri->dn_set.size() < 2)
fixed by commit:6c6fa6dffddb6f388d03ca59e95844ddf845f491 Sage Weil

08/23/2011

11:33 AM Bug #1437 (Can't reproduce): cfuse can't change permissions of a file
I've hit a case where I cannot change the permissions of a script to 755.
> chmod 777 ./extract_full.sh
> ls -...
Sam Lang
10:33 AM Bug #1391: client: crash on std::string in insert_trace()

I've been seeing a segfault in a similar spot regularly, but it's been hard to reproduce. The segfault is always in...
Sam Lang

08/22/2011

04:12 PM Bug #1435 (Resolved): mds: loss of layout policies upon mds restart
Cluster running ceph 0.33 + patch to add support for “ceph mds add_data_pool”.
I set up layout policies for variou...
Alexandre Oliva
04:09 PM Bug #1433 (Resolved): mds: assert in path_traverse
Fixed by commit:b03a1841b4b08c82fa37a45dc31a0c0255949235 Greg Farnum
03:10 PM Bug #1433: mds: assert in path_traverse
Yep! Need it to wait in that case. Pushing as soon as I write some documentation for path_traverse. Greg Farnum
02:46 PM Bug #1433: mds: assert in path_traverse
Looks like it's bailing out because another client is holding a lock, so the (existing) null dentry isn't readable. R... Greg Farnum
01:44 PM Bug #1433 (Resolved): mds: assert in path_traverse
While testing my teuthology lock test: ... Greg Farnum
03:16 PM Bug #1366 (Can't reproduce): mds segfault
Sage Weil
02:05 PM Bug #1428 (Resolved): MDS: Load and pin stray dirs in memory
Sage Weil
10:28 AM Bug #1428 (Resolved): MDS: Load and pin stray dirs in memory
MDCache::populate_mydir() already does some of this; we also need it to load each frag and set the STICKY flag on the... Greg Farnum
01:26 PM Bug #1429: cfuse assert failed assert(diri->dn_set.size() < 2)
Hopefully -- we'll have to reproduce with logging and check it out in more detail. My concern is that it may be revea... Greg Farnum
01:23 PM Bug #1429: cfuse assert failed assert(diri->dn_set.size() < 2)
Sounds like an easy fix. For dirs it should just unlink the old link in insert_trace (or whatever it is). Sage Weil
01:19 PM Bug #1429: cfuse assert failed assert(diri->dn_set.size() < 2)
There's probably something wonky going on with the way the client is handling moved directories -- that assert is bec... Greg Farnum
11:28 AM Bug #1429 (Resolved): cfuse assert failed assert(diri->dn_set.size() < 2)
This assertion happens when a directory is moved on one client, and then the other client changes to that directory. ... Sam Lang

08/21/2011

09:34 PM Bug #1425 (Resolved): mds: stuck in prexlock
See mds.a.log on sepia78.
- setattr request starts locking
- auth_pins auth stuff
- rdlocks parent dirs, does no...
Sage Weil

08/19/2011

09:14 PM Bug #1367: cfuse and mon crash after dbench
nuked and unlocked nodes, nothing useful there. Sage Weil
04:38 PM Bug #1417 (Resolved): mds: failed assert on xlock
Well, I hit a path_traverse bug instead. I'm going to mark this particular one as resolved unless it pops up again. Greg Farnum
04:26 PM Bug #1417: mds: failed assert on xlock
Testing that fix I worked out with Sage. Greg Farnum
02:33 PM Bug #1417: mds: failed assert on xlock
Okay:
1) dispatch client1 request, gets xlock on filelock (lock_xlock)
2) early_reply to client1 request, which cal...
Greg Farnum
09:17 AM Bug #1417 (Resolved): mds: failed assert on xlock
... Greg Farnum

08/18/2011

11:30 AM Bug #1405: cephfs: shouldn't have to specify all layout options
And you also need to specify these even if you only want to set the pool. :( Greg Farnum
11:02 AM Bug #1405 (Resolved): cephfs: shouldn't have to specify all layout options
Right now, you need to specify all layout options in cephfs (of the stripe unit, stripe count, and block size, anyway... Greg Farnum

08/17/2011

03:58 PM Bug #1389 (Resolved): re-created snapshot gets removed by mds journal replay
fixed by commit:d60d5319ad5d6674488cab96b4a452ff553e779b Sage Weil
03:26 PM Bug #1399: mds crash
replay crash looks like the one fixed in commit:8c5e7dcf8cf7f3daa65eb9905, yay! Sage Weil
02:17 PM Bug #1399 (Resolved): mds crash
I'm not sure I can reproduce the second (replay) crash. Sam, next time you see one of these, please capture a replay... Sage Weil
09:12 AM Bug #1399 (In Progress): mds crash
original crash is fixed by commit:e98669ea69059e26e0c4aa72c46e0be5bfc96386 Sage Weil
08:07 AM Bug #1399: mds crash
Hmmm. If ... Greg Farnum
07:53 AM Bug #1399: mds crash
As for the original error, it does seem reproducible by creating a snapshot of a directory using the mkdir system cal... Sam Lang
07:28 AM Bug #1399: mds crash
I removed the assertion: assert(in->is_head());
That allowed the mds servers to restart and complete recovery, and...
Sam Lang
02:16 PM Bug #1393 (Resolved): cfuse failed 3 pjd tests
Sage Weil

08/16/2011

09:58 PM Bug #1399: mds crash
Sam, do you still have this cluster? Can you restart the mds with debug mds = 20 and attach the resulting log? There... Sage Weil
03:11 PM Bug #1399 (Resolved): mds crash
After running successfully with one active mds and two standbys, the active mds has crashed, and on restart, it crash... Sam Lang

08/15/2011

01:31 PM Cleanup #1307 (Closed): client cleanup
Sage Weil
09:58 AM Feature #1398 (New): qa: multiclient file io test
test read/write consistency across clients.
I'm thinking:
- teuthology task gets list of client names (or uses all...
Sage Weil
09:49 AM Bug #1391 (Can't reproduce): client: crash on std::string in insert_trace()
It's not clear from code inspection where this might be coming from, unless there is general heap corruption. If you... Sage Weil

08/12/2011

10:34 AM Bug #1390 (Resolved): MDS crash in function 'bool Locker::issue_caps(CInode*, Capability*)', in t...
pushed to next/master, will be in v0.33. Sage Weil
04:01 AM Bug #1390: MDS crash in function 'bool Locker::issue_caps(CInode*, Capability*)', in thread '0x7f...
It's looking good, cluster has started up okay, no metadata crashes :-) Damien Churchill

08/11/2011

04:36 PM Bug #1393 (Resolved): cfuse failed 3 pjd tests
Teuthology results are in teuthology:~teuthworker/archive/full_suite_coverage_20110811/35/teuthology.log
Nodes sepia...
Josh Durgin
03:51 PM Bug #1390: MDS crash in function 'bool Locker::issue_caps(CInode*, Capability*)', in thread '0x7f...
Will it be safe to cherry-pick this onto 0.32? Else I can try packaging up that branch and deploying it! Damien Churchill
03:23 PM Bug #1390: MDS crash in function 'bool Locker::issue_caps(CInode*, Capability*)', in thread '0x7f...
Pushed commit:26871eff1740d6ec5b9b287bf47e098db913fb27 (branch wip-needissue) that should fix this. Can you let me k... Sage Weil
03:36 AM Bug #1390 (Resolved): MDS crash in function 'bool Locker::issue_caps(CInode*, Capability*)', in t...
Yesterday I was playing about with snapshots, which seemed to expose a bug in btrfs (delayed_inode.c) which I have si... Damien Churchill
12:54 PM Bug #1391 (Resolved): client: crash on std::string in insert_trace()
Random cfuse client crash. Sorry I don't have a core file for this. It only happened on
*** Caught signal (Segm...
Sam Lang
08:24 AM Bug #1389: re-created snapshot gets removed by mds journal replay
I'm pretty sure I got it with both, before I understood why my snapshots were disappearing. Once I did, I only teste... Alexandre Oliva

08/10/2011

06:15 PM Bug #1389: re-created snapshot gets removed by mds journal replay
Is this with the kernel or fuse client? Sage Weil
06:10 PM Bug #1389 (Resolved): re-created snapshot gets removed by mds journal replay
Say you enter a snapshot pseudo-dir then run:
mkdir test
rmdir test
mkdir test
then restart the mds, and run:...
Alexandre Oliva
11:27 AM Bug #1360: mds crash during pjd workunit on cfuse
Machines are nuked and unlocked. Josh Durgin

08/09/2011

12:40 PM Bug #1368: mds crash after blogbench on cfuse
unlocked the nodes Sage Weil
12:38 PM Bug #1368: mds crash after blogbench on cfuse
The crash was on shutdown. Have core file but gitbuilder binaries were expired.
Running in a loop to reproduce.
Sage Weil
 
