Activity

From 12/28/2014 to 01/26/2015

01/26/2015

04:44 PM Bug #4920: client: does not respect O_NOFOLLOW
Hmm, I don't remember where this came from but it might have been somebody cross-compiling for OS X. As marked, it's ... Greg Farnum
03:42 PM Bug #4920: client: does not respect O_NOFOLLOW
O_SYMLINK sounds like a BSD extension to me. Do we really want it?
Anyway, the Client::open() lacks support for O_NOFOLLO...
Radoslaw Zarzynski
04:41 PM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
Is this in the kernel tree, and is there something we should do on the server side as well? Greg Farnum
09:08 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
I had our cluster running under full load for some days and the bug still did not appear. Markus Blank-Burian

01/23/2015

11:27 PM Bug #10620: TestFlush fails on formatting mistake
ceph-qa-suite commit:41a99f58ccbbc09ee7b5355e598c426cef1d8775 Greg Farnum
09:22 PM Bug #10620 (Resolved): TestFlush fails on formatting mistake
... Greg Farnum
11:04 PM Feature #10627: teuthology: qa: enable Samba runs on RHEL
Apparently Fedora *just* enabled the Ceph VFS and RHEL doesn't, so not until we push on those.
And even then we'll...
Greg Farnum
10:48 PM Feature #10627: teuthology: qa: enable Samba runs on RHEL
I wonder if we can run upstream Samba now? Sage Weil
10:43 PM Feature #10627 (New): teuthology: qa: enable Samba runs on RHEL
We need a gitbuilder and some teuthology installation changes. We can get other people to make those, but this umbrel... Greg Farnum
10:05 PM Bug #10624 (Resolved): hung samba tests after lsof failure
http://pulpito.ceph.com/teuthology-2015-01-21_23:14:02-samba-master-testing-basic-multi/717177/, and in one other rec... Greg Farnum
09:54 PM Bug #9994: ceph-qa-suite: nfs mount timeouts
http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-21_23:10:01-knfs-master-testing-basic-multi/717164/
Ubuntu ...
Greg Farnum

01/22/2015

09:42 AM Bug #10550: Assertion in ceph-mds on failures in ::init
Proposal of the patch attached. Radoslaw Zarzynski

01/21/2015

03:09 PM Bug #10579 (Resolved): ceph-qa-suite: can't run quota tests on kernel client
Thanks! Greg Farnum
02:41 PM Bug #10579 (Fix Under Review): ceph-qa-suite: can't run quota tests on kernel client
https://github.com/ceph/ceph/pull/3436
https://github.com/ceph/ceph-qa-suite/pull/308
John Spray
02:19 PM Bug #10550: Assertion in ceph-mds on failures in ::init
I am working on this. Radoslaw Zarzynski

01/20/2015

08:07 PM Bug #10416 (Resolved): quota test failures
That appears to have been the issue. Greg Farnum
07:29 PM Feature #10498: ObjectCacher: order wakeups when write calls block on throttling
ObjectCacher::file_write did hang for about 3 hours... Zheng Yan
01:54 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
Up until now, I could not reproduce the bug with all the patches applied. But I stumbled across bug #5429 which is ma... Markus Blank-Burian

01/19/2015

10:26 PM Feature #10498 (Fix Under Review): ObjectCacher: order wakeups when write calls block on throttling
Zheng Yan
10:31 AM Feature #10498: ObjectCacher: order wakeups when write calls block on throttling
http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-14_23:04:01-fs-master-testing-basic-multi/704465/
We've see...
Greg Farnum
02:20 PM Bug #10361: teuthology: only direct admin socket commands to active MDS
Filesystem.get_active_names is in master now! John Spray
10:49 AM Feature #9883 (Resolved): journal-tool: smarter scavenge (conditionally update dir objects)
Merged in commit:c15f2d5056dedabfb7761b74c8622c61ab1f8477! :) Greg Farnum
10:36 AM Bug #10579 (Resolved): ceph-qa-suite: can't run quota tests on kernel client
The kclient doesn't support fs quotas. Right now we do so, because the quota test is in ceph/qa/workunits/misc, and t... Greg Farnum
07:26 AM Bug #10413: samba: coredumps after tests run
Sent to Samba upstream here: https://lists.samba.org/archive/samba-technical/2015-January/104851.html Ken Dreyer
07:25 AM Bug #10412: samba: failed smbtorture run
-Sent to samba upstream here: https://lists.samba.org/archive/samba-technical/2015-January/104851.html-
(EDIT: sor...
Ken Dreyer
05:58 AM Feature #10390 (In Progress): mds: throttle deletes in flight
John Spray
04:17 AM Feature #10388 (In Progress): Add MDS perf counters for stray/purge status
John Spray

01/15/2015

10:26 PM Bug #10368: Assertion in _trim_expired_segments
One possible cause:
there is more than one context in mdlog's wait_for_safe list. There is a context executed af...
Zheng Yan
07:52 AM Bug #10368: Assertion in _trim_expired_segments

Had no joy reproducing with mds_auto_repair more recently (on diff. test nodes), but did manage on a vstart cluster...
John Spray
09:53 PM Bug #10552 (Resolved): ceph-fuse: failed bufferlist assert when reading xattrs
Thanks Zheng. Merged to master in commit:a60b815c85f07d7ac10b6d23fd1618c6334ddcf7. Greg Farnum
06:25 PM Bug #10552 (Fix Under Review): ceph-fuse: failed bufferlist assert when reading xattrs
Triggered by a new test case. Zheng Yan
10:53 AM Bug #10552: ceph-fuse: failed bufferlist assert when reading xattrs
That assert is from 2013, by the way; I'm not sure what we might have changed to break it. :/ Greg Farnum
10:52 AM Bug #10552 (Resolved): ceph-fuse: failed bufferlist assert when reading xattrs
... Greg Farnum
07:00 PM Bug #10415 (Resolved): libcephfs: test failures
Zheng Yan
06:53 PM Feature #10504: kclient: include client version in client_metadata
Pushed a kernel patch to the testing branch. Zheng Yan
05:54 PM Bug #10413: samba: coredumps after tests run
The Samba guys proposed a fix on their side; the new fix should go upstream soon. Zheng Yan
11:04 AM Bug #9994: ceph-qa-suite: nfs mount timeouts
This continues to be a problem; e.g. http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-11_23:10:01-knfs-next-tes... Greg Farnum
10:20 AM Bug #10550 (Resolved): Assertion in ceph-mds on failures in ::init

init() can call suicide in various error handling paths before it starts progress thread, but in suicide() it tries...
John Spray

01/14/2015

03:32 PM Bug #10539: Error EINVAL: all MDS daemons must be inactive before removing filesystem
Thanks for the quick fix! Loïc Dachary
02:54 PM Bug #10539 (Resolved): Error EINVAL: all MDS daemons must be inactive before removing filesystem
Merged to ceph-qa-suite master in commit:6fa29f6f19331df59d15e7470b81f4cf5f0e9689 Greg Farnum
02:07 PM Bug #10539 (Fix Under Review): Error EINVAL: all MDS daemons must be inactive before removing fil...

My fault!
https://github.com/ceph/ceph/pull/3372
John Spray
11:50 AM Bug #10539 (Resolved): Error EINVAL: all MDS daemons must be inactive before removing filesystem
*make check* sometimes fails on "test/cephtool-test-mds.sh":http://workbench.dachary.org/ceph/ceph/blob/master/src/te... Loïc Dachary
01:54 PM Bug #10542 (Resolved): ceph-fuse cap trimming fails with: mount: only root can use "--options" op...

This appears to be from:...
John Spray
02:47 AM Bug #10382: mds/MDS.cc: In function 'void MDS::heartbeat_reset()
This will need backport to giant John Spray
02:44 AM Bug #10382 (Fix Under Review): mds/MDS.cc: In function 'void MDS::heartbeat_reset()
https://github.com/ceph/ceph/pull/3370 John Spray

01/13/2015

01:28 PM Bug #10381 (Resolved): health HEALTH_WARN mds ceph239 is laggy
Whoops, this fell through the cracks.
Anyway, the MDS map has pool 0 set to use for data, but the OSDMap doesn't h...
Greg Farnum
07:00 AM Bug #10387 (Resolved): ceph-fuse doesn't release capability on deleted directory
Zheng Yan
06:43 AM Bug #10387: ceph-fuse doesn't release capability on deleted directory
I wouldn't bother backporting: the symptom is just some dir objects that don't get cleaned up until next client unmou... John Spray

01/12/2015

08:06 PM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
... Zheng Yan
07:30 PM Bug #10387: ceph-fuse doesn't release capability on deleted directory
Do we need to backport this? The directory will be deleted when all of its child dentries get trimmed from the cache. Zheng Yan
03:26 PM Bug #10387 (Pending Backport): ceph-fuse doesn't release capability on deleted directory
This was merged to master in commit:e9a29c58b6d40933afbf711afcf7b872b4005585; does it need a backport? Greg Farnum
03:14 PM Bug #10413: samba: coredumps after tests run
We talked about this in standup; Zheng is going to send it to the Samba guys. Greg Farnum
11:31 AM Bug #10503 (Resolved): ceph-fuse: quota code is not 32-bit safe for vxattr output
Fix merged to master in commit:c219c43cc2943c794378214d77566e3f0d3f394a Greg Farnum
11:27 AM Bug #10416: quota test failures
Pushed a config change for the cfuse_workunit_misc yaml fragment to master in commit:c3eee83fb09a8f37faa71fa8fa78b632... Greg Farnum
11:09 AM Bug #10416 (In Progress): quota test failures
....no. I'll fix that and we'll see if things work better, thanks! Greg Farnum

01/11/2015

11:35 PM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
Please remove the sudo before /home/ubuntu/cephtest/fsx-mpi. Otherwise the rank of all processes will be zero. Zheng Yan
10:38 PM Bug #10416: quota test failures
The configure log looks like this:
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
...
Yunchuan Wen

01/09/2015

07:58 PM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
The direct I/O failure is fixed by "ceph: fix reading inline data when i_size > PAGE_SIZE". Zheng Yan
04:18 PM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
the directio failure is caused by the 'reading inline data' changes Zheng Yan
10:47 AM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
Zheng? Greg Farnum
02:51 PM Feature #10504: kclient: include client version in client_metadata
That userspace bit got merged. Greg Farnum
02:48 PM Feature #10504: kclient: include client version in client_metadata
https://github.com/ceph/ceph/pull/3341 for the userspace side! Sage Weil
01:53 PM Feature #10504 (Resolved): kclient: include client version in client_metadata
It would be helpful for debugging. Probably just put it in Client::populate_metadata() for userspace; I imagine the k... Greg Farnum
01:48 PM Bug #10503 (Resolved): ceph-fuse: quota code is not 32-bit safe for vxattr output
From the gitbuilders:... Greg Farnum
11:02 AM Feature #10498 (New): ObjectCacher: order wakeups when write calls block on throttling
The ObjectCacher can block write calls if the dirty data limits are exceeded by sleeping on a cond. Unfortunately, we... Greg Farnum
09:56 AM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
If I'm understanding my quick skim correctly, this is not "MPI tests are failing" but "this mpi-fsx" test is failing,... Greg Farnum
03:39 AM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
Is this a new failure on the existing cluster, or something that's come up on the new cluster? John Spray

01/08/2015

05:52 PM Feature #1398: qa: multiclient file io test
See #10489 for a related issue. Anonymous
05:50 PM Feature #1398 (Fix Under Review): qa: multiclient file io test
pull request #284 for wip-1398-multiclientio-wusui has been made.
commit: f93e531de6fa69a3ad32117c613094fc0aa0283e
Anonymous
01:16 PM Feature #1398: qa: multiclient file io test
changes look ok, although you can probably just use mnt.0 instead of the gmnt symlink.
also, make sure the -dev p...
Sage Weil
05:18 PM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
Running teuthology using the following yaml demonstrates this problem:... Anonymous
05:15 PM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
Running teuthology using the following yaml demonstrates this problem:... Anonymous
05:05 PM Subtask #10489 (New): Mpi tests fail on both ceph-fuse and kclient
Mpi tests fail on ceph-fuse and kclient. I will post some tests and results that demonstrate this failure. Anonymous
10:40 AM Bug #10041 (Resolved): ceph-fuse: never exit when no MDS server is available
Merged into master as of commit:1b22857f2bc945ad2f5e60a2cc6f9be24d977079. Greg Farnum
06:14 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
OK, I now have a test running with all three kernel patches plus the one mentioned in #10450. Markus Blank-Burian
04:39 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
Looks like a kernel bug; please try the attached kernel patches. Zheng Yan
02:07 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
Out of 8 crashes, there was only one incident with slow requests in the logs. I added the mds/mon logs for further in... Markus Blank-Burian

01/07/2015

06:59 PM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
There are some file-lock-related fixes after 3.14. Did you see something like the below in ceph.log?
slow request 121.3...
Zheng Yan
12:28 AM Bug #10041 (Fix Under Review): ceph-fuse: never exit when no MDS server is available
Zheng Yan

01/06/2015

07:18 PM Feature #1398: qa: multiclient file io test
Here's the story so far.
The teuthology tests for mpi testing used a yaml file with the following tasks:...
Anonymous
07:08 PM Bug #10416: quota test failures
The nondeterministic design is a nightmare. Zheng Yan
11:00 AM Bug #10416: quota test failures
It is not already fixed:
http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-26_23:04:02-fs-master-testing-basic-...
Greg Farnum
06:36 PM Bug #10412 (Resolved): samba: failed smbtorture run
this bug was introduced after giant Zheng Yan
08:18 AM Bug #10412 (Pending Backport): samba: failed smbtorture run
Sage Weil
04:26 AM Bug #10412 (Fix Under Review): samba: failed smbtorture run
Zheng Yan
11:26 AM Bug #10465: Audit and fix ceph-qa-suite exec tasks
Also, I noticed this because http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-26_23:04:02-fs-master-testing-bas... Greg Farnum
11:26 AM Bug #10465 (Resolved): Audit and fix ceph-qa-suite exec tasks
suites/fs/basic/tasks/cfuse_workunit_suites_truncate_delay.yaml executes a dd and a truncate on a file in the local n... Greg Farnum
11:03 AM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
Zheng, we've got a new failing directio test at http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-28_23:08:01-kc... Greg Farnum
10:42 AM Bug #10413: samba: coredumps after tests run
That looks good to me (I guess?), but I don't think putting it into our own samba repo is the right place. IIRC we're... Greg Farnum
07:46 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
Regarding the stuck requests .. I have definitely seen one or two stuck mds requests. But I had then thought they wer... Markus Blank-Burian
05:31 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
I think there was a stuck request, which prevented mds from trimming completed_requests Zheng Yan

01/05/2015

11:35 PM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
Pushed a fix to the testing branch. Zheng Yan
03:44 PM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
I don't think rbd uses anything under fs/ceph, so here's a category where Zheng and Sage are more likely to notice it. Greg Farnum
09:03 PM Feature #1398: qa: multiclient file io test
Here's what appears to be a bunch of problems:... Anonymous
04:05 PM Feature #1398: qa: multiclient file io test
Here's what looks to be the last issue:... Anonymous
06:20 PM Bug #10413 (Fix Under Review): samba: coredumps after tests run
https://github.com/ceph/samba/pull/1 Zheng Yan
05:48 AM Bug #10413 (In Progress): samba: coredumps after tests run
smbd can call cephwrap_{getcwd,chdir,stat} after umount Zheng Yan
03:49 PM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
If this is two separate bugs I'd like to figure out if we can prevent the session_info_t from growing so large pretty... Greg Farnum
03:54 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
Funnily enough, I was just noticing this in the code while working on #9883 -- all the MDS table persistence code use... John Spray
01:30 AM Bug #10449 (Resolved): MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message...
The relevant part of the log is the following:... Markus Blank-Burian
03:28 PM Bug #10436: ceph-fuse: snapshot flushing from page cache to Client is not coherent
Just to be clear, the problem here is that
1) there is dirty data in the page cache
2) a snapshot happens
3) ceph-...
Greg Farnum
03:26 PM Bug #10323: lock get stuck in snap->sync state
https://github.com/ceph/ceph/pull/3270 Greg Farnum
03:26 PM Bug #10343: qa/workunits/snaps/snaptest-xattrwb.sh fails
https://github.com/ceph/ceph/pull/3270 Greg Farnum

01/04/2015

11:38 PM Bug #10448 (Resolved): wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() ...
Hi, all
A bug is found in striped_read() of fs/ceph/file.c.
striped_read() calls ceph_zero_pape_vector_range() at...
caifeng zhu

12/28/2014

07:09 PM Bug #10436 (Resolved): ceph-fuse: snapshot flushing from page cache to Client is not coherent
The fuse kernel module has no understanding of snapshots. It does not flush data when a snapshot is created. Besides the fus... Zheng Yan
06:57 PM Bug #10312 (Fix Under Review): creating snapshot makes parent snapshot lost
Zheng Yan
06:55 PM Bug #10315 (Fix Under Review): set last snapid according to removed snaps in data pools
Zheng Yan
06:54 PM Bug #10323 (Fix Under Review): lock get stuck in snap->sync state
Zheng Yan
06:54 PM Bug #10343 (Fix Under Review): qa/workunits/snaps/snaptest-xattrwb.sh fails
Zheng Yan
06:11 PM Bug #10387 (Fix Under Review): ceph-fuse doesn't release capability on deleted directory
Zheng Yan
 
