Activity

From 12/13/2014 to 01/11/2015

01/11/2015

11:35 PM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
Please remove the sudo before /home/ubuntu/cephtest/fsx-mpi. Otherwise the rank of all processes will be zero. Zheng Yan
10:38 PM Bug #10416: quota test failures
The configure log looks like this:
overrides:
admin_socket:
branch: master
ceph:
conf:
...
Yunchuan Wen

01/09/2015

07:58 PM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
direct io failure is fixed by "ceph: fix reading inline data when i_size > PAGE_SIZE" Zheng Yan
04:18 PM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
the directio failure is caused by the 'reading inline data' changes Zheng Yan
10:47 AM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
Zheng? Greg Farnum
02:51 PM Feature #10504: kclient: include client version in client_metadata
That userspace bit got merged. Greg Farnum
02:48 PM Feature #10504: kclient: include client version in client_metadata
https://github.com/ceph/ceph/pull/3341 for the userspace side! Sage Weil
01:53 PM Feature #10504 (Resolved): kclient: include client version in client_metadata
It would be helpful for debugging. Probably just put it in Client::populate_metadata() for userspace; I imagine the k... Greg Farnum
01:48 PM Bug #10503 (Resolved): ceph-fuse: quota code is not 32-bit safe for vxattr output
From the gitbuilders:... Greg Farnum
11:02 AM Feature #10498 (New): ObjectCacher: order wakeups when write calls block on throttling
The ObjectCacher can block write calls if the dirty data limits are exceeded by sleeping on a cond. Unfortunately, we... Greg Farnum
09:56 AM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
If I'm understanding my quick skim correctly, this is not "MPI tests are failing" but "this mpi-fsx" test is failing,... Greg Farnum
03:39 AM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
Is this a new failure on the existing cluster, or something that's come up on the new cluster? John Spray

01/08/2015

05:52 PM Feature #1398: qa: multiclient file io test
See #10489 for a related issue. Anonymous
05:50 PM Feature #1398 (Fix Under Review): qa: multiclient file io test
pull request #284 for wip-1398-multiclientio-wusui has been made.
commit: f93e531de6fa69a3ad32117c613094fc0aa0283e
Anonymous
01:16 PM Feature #1398: qa: multiclient file io test
changes look ok, although you can probably just use mnt.0 instead of the gmnt symlink.
also, make sure the -dev p...
Sage Weil
05:18 PM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
Running teuthology using the following yaml demonstrates this problem:... Anonymous
05:15 PM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
Running teuthology using the following yaml demonstrates this problem:... Anonymous
05:05 PM Subtask #10489 (New): Mpi tests fail on both ceph-fuse and kclient
Mpi tests fail on ceph-fuse and kclient. I will post some tests and results that demonstrate this failure. Anonymous
10:40 AM Bug #10041 (Resolved): ceph-fuse: never exit when no MDS server is available
Merged into master as of commit:1b22857f2bc945ad2f5e60a2cc6f9be24d977079. Greg Farnum
06:14 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
OK, I now have a test running with all three kernel patches plus the one mentioned in #10450. Markus Blank-Burian
04:39 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
Looks like a kernel bug; please try the attached kernel patches. Zheng Yan
02:07 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
Out of 8 crashes, there was only one incident with slow requests in the logs. I added the mds/mon logs for further in... Markus Blank-Burian

01/07/2015

06:59 PM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
There are some file-lock-related fixes after 3.14. Did you see something like the following in ceph.log?
slow request 121.3...
Zheng Yan
12:28 AM Bug #10041 (Fix Under Review): ceph-fuse: never exit when no MDS server is available
Zheng Yan

01/06/2015

07:18 PM Feature #1398: qa: multiclient file io test
Here's the story so far.
The teuthology tests for mpi testing used a yaml file with the following tasks:...
Anonymous
07:08 PM Bug #10416: quota test failures
The nondeterministic design is a nightmare. Zheng Yan
11:00 AM Bug #10416: quota test failures
It is not fixed yet:
http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-26_23:04:02-fs-master-testing-basic-...
Greg Farnum
06:36 PM Bug #10412 (Resolved): samba: failed smbtorture run
this bug was introduced after giant Zheng Yan
08:18 AM Bug #10412 (Pending Backport): samba: failed smbtorture run
Sage Weil
04:26 AM Bug #10412 (Fix Under Review): samba: failed smbtorture run
Zheng Yan
11:26 AM Bug #10465: Audit and fix ceph-qa-suite exec tasks
Also, I noticed this because http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-26_23:04:02-fs-master-testing-bas... Greg Farnum
11:26 AM Bug #10465 (Resolved): Audit and fix ceph-qa-suite exec tasks
suites/fs/basic/tasks/cfuse_workunit_suites_truncate_delay.yaml executes a dd and a truncate on a file in the local n... Greg Farnum
11:03 AM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
Zheng, we've got a new failing directio test at http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-28_23:08:01-kc... Greg Farnum
10:42 AM Bug #10413: samba: coredumps after tests run
That looks good to me (I guess?), but I don't think putting it into our own samba repo is the right place. IIRC we're... Greg Farnum
07:46 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
Regarding the stuck requests .. I have definitely seen one or two stuck mds requests. But I had then thought they wer... Markus Blank-Burian
05:31 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
I think there was a stuck request, which prevented mds from trimming completed_requests Zheng Yan

01/05/2015

11:35 PM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
Pushed a fix to the testing branch. Zheng Yan
03:44 PM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
I don't think rbd uses anything under fs/ceph, so here's a category where Zheng and Sage are more likely to notice it. Greg Farnum
09:03 PM Feature #1398: qa: multiclient file io test
Here's what appears to be a bunch of problems:... Anonymous
04:05 PM Feature #1398: qa: multiclient file io test
Here's what looks to be the last issue:... Anonymous
06:20 PM Bug #10413 (Fix Under Review): samba: coredumps after tests run
https://github.com/ceph/samba/pull/1 Zheng Yan
05:48 AM Bug #10413 (In Progress): samba: coredumps after tests run
smbd can call cephwrap_{getcwd,chdir,stat} after umount Zheng Yan
03:49 PM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
If this is two separate bugs I'd like to figure out if we can prevent the session_info_t from growing so large pretty... Greg Farnum
03:54 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
Funnily enough, I was just noticing this in the code while working on #9883 -- all the MDS table persistence code use... John Spray
01:30 AM Bug #10449 (Resolved): MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message...
The relevant part of the log is the following:... Markus Blank-Burian
03:28 PM Bug #10436: ceph-fuse: snapshot flushing from page cache to Client is not coherent
Just to be clear, the problem here is that
1) there is dirty data in the page cache
2) a snapshot happens
3) ceph-...
Greg Farnum
03:26 PM Bug #10323: lock get stuck in snap->sync state
https://github.com/ceph/ceph/pull/3270 Greg Farnum
03:26 PM Bug #10343: qa/workunits/snaps/snaptest-xattrwb.sh fails
https://github.com/ceph/ceph/pull/3270 Greg Farnum

01/04/2015

11:38 PM Bug #10448 (Resolved): wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() ...
Hi, all
A bug is found in striped_read() of fs/ceph/file.c.
striped_read() calls ceph_zero_pape_vector_range() at...
caifeng zhu

12/28/2014

07:09 PM Bug #10436 (Resolved): ceph-fuse: snapshot flushing from page cache to Client is not coherent
The fuse kernel module has no understanding of snapshots. It does not flush data when a snapshot is created. Besides the fus... Zheng Yan
06:57 PM Bug #10312 (Fix Under Review): creating snapshot makes parent snapshot lost
Zheng Yan
06:55 PM Bug #10315 (Fix Under Review): set last snapid according to removed snaps in data pools
Zheng Yan
06:54 PM Bug #10323 (Fix Under Review): lock get stuck in snap->sync state
Zheng Yan
06:54 PM Bug #10343 (Fix Under Review): qa/workunits/snaps/snaptest-xattrwb.sh fails
Zheng Yan
06:11 PM Bug #10387 (Fix Under Review): ceph-fuse doesn't release capability on deleted directory
Zheng Yan

12/26/2014

05:01 AM Bug #10336 (Need More Info): hung ffsb test
Need logs for diagnosis. Zheng Yan
04:23 AM Bug #10387 (In Progress): ceph-fuse doesn't release capability on deleted directory
Zheng Yan

12/23/2014

08:16 AM Bug #10415 (Pending Backport): libcephfs: test failures
? Sage Weil

12/22/2014

06:25 PM Bug #10415 (Fix Under Review): libcephfs: test failures
It hung on umount... Zheng Yan
09:38 AM Bug #10415 (Resolved): libcephfs: test failures
hang on GetOsdCrushLocation test, with no core dumps:
http://pulpito.ceph.com/teuthology-2014-12-17_23:04:01-fs-mast...
Greg Farnum
05:34 PM Bug #10417 (In Progress): snaptest-2.sh is failing
Zheng Yan
09:54 AM Bug #10417 (Resolved): snaptest-2.sh is failing
All we've got is the teuthology log, whose last line is when it's deleting the directory tree (but not the snapshots)... Greg Farnum
02:25 PM Bug #10423 (Closed): update hadoop gitbuilders
They're building "branch-1", whatever that is, and so I suspect they aren't pointing at cephfs-hadoop. :( Greg Farnum
09:58 AM Bug #10414 (Resolved): client: valgrind found uninit value used in conditional
merged fix to master in 547ee783d44b8598bdf30aad191252aaecca0754 Greg Farnum
09:36 AM Bug #10414: client: valgrind found uninit value used in conditional
http://pulpito.ceph.com/teuthology-2014-12-17_23:04:01-fs-master-testing-basic-multi/667065/ Greg Farnum
09:34 AM Bug #10414 (Resolved): client: valgrind found uninit value used in conditional
http://pulpito.ceph.com/teuthology-2014-12-17_23:04:01-fs-master-testing-basic-multi/667063/... Greg Farnum
09:49 AM Bug #10416 (Resolved): quota test failures
The quota tests failed. Maybe this is fixed already?
http://pulpito.ceph.com/teuthology-2014-12-19_23:04:06-fs-mas...
Greg Farnum
09:20 AM Bug #9994: ceph-qa-suite: nfs mount timeouts
teuthology-2014-12-17_23:10:01-knfs-master-testing-basic-multi/667121/
http://pulpito.ceph.com/teuthology-2014-12-17...
Greg Farnum
09:16 AM Bug #10413 (Resolved): samba: coredumps after tests run
These tests had coredumps without corresponding failures in teuthology.log or backtraces in any of the logs. The clie... Greg Farnum
09:08 AM Bug #10412 (Resolved): samba: failed smbtorture run
http://pulpito.ceph.com/teuthology-2014-12-17_23:14:01-samba-master-testing-basic-multi/667136/
I'm not even sure ...
Greg Farnum

12/21/2014

04:53 PM Bug #10381: health HEALTH_WARN mds ceph239 is laggy
Greg Farnum wrote:
> Can you provide the output of "ceph mds dump" and "ceph osd dump"?
>
> It looks like the MDS...
science luo

12/19/2014

02:44 PM Bug #10277 (Pending Backport): ceph-fuse: Consistent pjd failure in getcwd
Let it cook for a bit in master before doing the giant backport. Greg Farnum
11:37 AM Bug #10381: health HEALTH_WARN mds ceph239 is laggy
Can you provide the output of "ceph mds dump" and "ceph osd dump"?
It looks like the MDS is trying to access a poo...
Greg Farnum
09:31 AM Bug #10382: mds/MDS.cc: In function 'void MDS::heartbeat_reset()
I tried to reproduce this today with debug_mds set to 10 and 20, but I wasn't able to reproduce it at that moment.
...
Wido den Hollander
07:00 AM Bug #10382 (In Progress): mds/MDS.cc: In function 'void MDS::heartbeat_reset()
John Spray
07:28 AM Feature #10393 (Rejected): client: remove mount prefix shenanigans for quota
Once we have a better subvolume abstraction in place, replace the ancestor opens that are needed by snaps with someth... Sage Weil
07:26 AM Feature #10392 (Rejected): mds: refactor subvolume vs snaprealm, capture quota trees
Unify the subvolume abstraction that is used by snapshots and quotas. This will make the ownership of files by a quo... Sage Weil
06:53 AM Feature #10390 (Resolved): mds: throttle deletes in flight
Make sure the number of deletes due to purging strays is bounded in some way. Sage Weil
05:30 AM Feature #10388 (Resolved): Add MDS perf counters for stray/purge status
It would be useful to know how many inodes are currently in the stray folders, waiting to be deleted, and also how ma... John Spray
05:25 AM Bug #10387 (Resolved): ceph-fuse doesn't release capability on deleted directory

I thought this was a repeat of #10164, but then noticed that the dirfrag objects were deleted once the client unmou...
John Spray

12/18/2014

11:56 PM Bug #10382 (Resolved): mds/MDS.cc: In function 'void MDS::heartbeat_reset()
While running an Active/Standby set of MDSes I see this happen quite often when stopping the Active MDS:... Wido den Hollander
10:52 PM Bug #10381 (Resolved): health HEALTH_WARN mds ceph239 is laggy
Hi there.
Today I ran a script to do some tests on my ceph cluster via a cephfs client, including dd/rm/cp of files less...
science luo
11:04 AM Fix #10377 (New): Impose lower bound on "mds events per segment"

Currently, if the MDS is started with a small mds_events_per_segment, it can go into a trim/rejournal cycle when fi...
John Spray
07:10 AM Feature #10369 (Resolved): qa-suite: detect unexpected MDS failovers and daemon crashes
Currently some of our tests can be run with standby MDSs, and a failover event might occur without our tests noticing... John Spray
06:05 AM Bug #10368: Assertion in _trim_expired_segments
Reproduce (took two tries to see the crash again, so it's intermittent)... John Spray
05:43 AM Bug #10368 (Resolved): Assertion in _trim_expired_segments
... John Spray
04:18 AM Bug #10361: teuthology: only direct admin socket commands to active MDS
Filesystem::mds_asok would be a good place to put this John Spray

12/17/2014

10:33 PM Bug #10344 (In Progress): qa/workunits/snaps/snaptest-git-ceph.sh fails
Zheng Yan
06:58 PM Bug #10336: hung ffsb test
Logs of passed tests also contain "opendir: No such file or directory" Zheng Yan
06:49 PM Bug #10335 (Resolved): MDS: disallow flush_path and related commands if not active
Zheng Yan
03:34 PM Bug #10335 (Fix Under Review): MDS: disallow flush_path and related commands if not active
https://github.com/ceph/ceph/pull/3198 Greg Farnum
03:19 PM Bug #10335 (In Progress): MDS: disallow flush_path and related commands if not active
Oh duh, right, this is us hitting the standby MDS with a flush request. (Thus the mds.-1 up there.) I guess the admin... Greg Farnum
03:18 PM Bug #10361 (Resolved): teuthology: only direct admin socket commands to active MDS
Right now our flush and scrub asok command tests are directed at mds.a, but sometimes that's not the active MDS. Find... Greg Farnum
02:15 PM Bug #10277: ceph-fuse: Consistent pjd failure in getcwd
Needs test (or at least test description) and some performance numbers. Greg Farnum
02:14 PM Bug #9997 (Resolved): test_client_pin case is failing
Hum, I was thinking that we could backport the simple fix since most users will be on older kernels where it behaves ... Greg Farnum
12:10 AM Bug #9997: test_client_pin case is failing
The fix is buggy; we shouldn't backport it. We should use the patches for #10277 instead. Zheng Yan
12:07 AM Bug #9997: test_client_pin case is failing
https://github.com/ceph/ceph/pull/3191 Zheng Yan
02:11 PM Feature #7317 (Resolved): mds: behave with fs fills (e.g., allow deletion)
Merged to master in commit:be7b2f8a30b12841d260c4ce486a59ed28953a4f Greg Farnum
05:47 AM Bug #10343: qa/workunits/snaps/snaptest-xattrwb.sh fails
only fuse-client fails Zheng Yan

12/16/2014

11:27 PM Bug #10344 (Resolved): qa/workunits/snaps/snaptest-git-ceph.sh fails
Zheng Yan
11:26 PM Bug #10343 (Resolved): qa/workunits/snaps/snaptest-xattrwb.sh fails
Zheng Yan
01:43 PM Bug #10302 (Resolved): "fsync-tester.sh: line 10: lsof: command not found"
Sage Weil
12:22 PM Bug #10316 (Won't Fix): pool quota no offect for cephfs client
Oh, you mean you got to 1050MB instead of exactly 1GB.
Yeah, that's expected behavior; these are all soft limits. ...
Greg Farnum
12:09 PM Bug #9997 (Pending Backport): test_client_pin case is failing
Can we get a giant backport for this, please? Greg Farnum
11:02 AM Bug #10336 (Can't reproduce): hung ffsb test
http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-14_23:08:01-kcephfs-next-testing-basic-multi/
It looks like...
Greg Farnum
10:56 AM Bug #10335 (Resolved): MDS: disallow flush_path and related commands if not active
... Greg Farnum
06:53 AM Bug #8576 (Resolved): teuthology: nfs tests failing on umount
Haven't seen this since we made those changes! Greg Farnum
01:43 AM Bug #10323 (Resolved): lock get stuck in snap->sync state
... Zheng Yan

12/15/2014

07:19 PM Bug #10316: pool quota no offect for cephfs client
I think the data has already been written to RADOS, because my ceph cluster and client were rebooted but my data is not lo... wei qiaomiao
11:32 AM Bug #10316: pool quota no offect for cephfs client
I suspect this is just that you're writing into the local page cache (and ceph client userspace cache), rather than t... Greg Farnum
06:48 AM Feature #9883 (In Progress): journal-tool: smarter scavenge (conditionally update dir objects)
John Spray

12/14/2014

11:58 PM Bug #10316 (Won't Fix): pool quota no offect for cephfs client
ceph version:0.87
ceph cluster os:redhat 6
mds num: 1
step:
(1) set the max-byte limit of pool 'data' to 1G
[root...
wei qiaomiao
11:47 PM Bug #10315 (Resolved): set last snapid according to removed snaps in data pools
Handle the case where we create cephfs using existing pools and the existing pools' removed snaps are not empty. Zheng Yan
07:24 PM Bug #10312 (Resolved): creating snapshot makes parent snapshot lost
[root@zhyan-kvm1 testdir]# mkdir dir1
[root@zhyan-kvm1 testdir]# cd dir1
[root@zhyan-kvm1 dir1]# mkdir dir2
[root@...
Zheng Yan
 
