Activity
From 01/09/2015 to 02/07/2015
02/07/2015
- 10:30 PM Feature #10792 (Resolved): qa: enable thrasher for MDS cluster size (vary max_mds)
Rather than just killing MDSs within a fixed size cluster, have a thrasher that varies max_mds and deactivates MDSs...
- 08:38 PM Bug #10744 (In Progress): MDS gets stuck in 'stopping' when strays exist
- have a working patch for this, PR in due course...
- 02:01 PM Bug #10791 (Resolved): ceph mds deactivate sometimes fails silently
I suspect this is due to the targeted MDS sending a beacon at the same moment as the deactivate, which flips the pe...
02/06/2015
- 06:13 AM Bug #10744: MDS gets stuck in 'stopping' when strays exist
- Is it possible your hack to avoid purging anything is busting more than you realize?
- 06:12 AM Bug #10743: mds/MDLog.cc: 259: FAILED assert(!capped) on MDS rank shutdown
- Probably means there's some point where we run shutdown_pass() and assume it succeeds. (That's what calls MDLog::cap().)
02/05/2015
- 03:03 PM Feature #10764: optimize memory usage of MDSCacheObject
- We've scoped this out some before:
#4499, #4500, #4501, #4502, #4503, #4504, #4535 may be of interest. At least keep...
- 02:52 PM Feature #10764 (In Progress): optimize memory usage of MDSCacheObject
- the attached file is dump of layouts of various classes.
- 05:32 AM Bug #10620 (Resolved): TestFlush fails on formatting mistake
- Haven't seen any issues.
02/04/2015
- 06:05 PM Bug #10368: Assertion in _trim_expired_segments
- We saw this on our internal lab install as well.
[ubuntu@magna125 ~]$ zless /var/log/ceph/ceph-mds.magna125.log-20...
- 05:30 PM Bug #4920 (Resolved): client: does not respect O_NOFOLLOW
- Thanks!
- 09:32 AM Bug #4920: client: does not respect O_NOFOLLOW
- With pleasure. :-)
I corrected the typo in comment and squashed wip-4920 branch into two commits.
PR is available h...
- 05:41 AM Bug #4920: client: does not respect O_NOFOLLOW
- Looks good! Can you submit it as a PR against the main Ceph repo? :)
- 01:02 PM Bug #10744: MDS gets stuck in 'stopping' when strays exist
- Yes, I know -- multiple MDSs is what I'm testing here :-)
- 12:36 PM Bug #10744: MDS gets stuck in 'stopping' when strays exist
- Looks like you are running multiple MDSs. When an MDS gets stuck in 'stopping' state, restart the whole MDS cluster, th...
- 11:13 AM Bug #10744 (Resolved): MDS gets stuck in 'stopping' when strays exist
- The migrate_stray part happens, but we end up stuck like this:...
- 12:32 PM Bug #10737 (Fix Under Review): ceph-fuse permits layout changes to files with data
- https://github.com/ceph/ceph/pull/3617
- 11:10 AM Bug #10743 (Resolved): mds/MDLog.cc: 259: FAILED assert(!capped) on MDS rank shutdown
Hacked MDCache::eval-stray to never purge if my rank >0
Vstart cluster with 2 MDSs
mkdir ALPHA
dmesg> ALPHA/t...
- 10:10 AM Bug #10720: MDS: valgrind leaks
- This is a global string constant, which valgrind apparently doesn't understand:...
- 04:12 AM Bug #10740 (Rejected): teuthology: nfs test getting EBUSY on umount
- ubuntu@teuthology:/a/teuthology-2015-02-01_23:10:01-knfs-next-testing-basic-multi/735724
This was previously #8576...
02/03/2015
- 10:17 PM Feature #1398: qa: multiclient file io test
- Hmm. I have a branch with this commit: f93e531de6fa69a3ad32117c613094fc0aa0283e
(same as the one mentioned a few u...
- 09:39 PM Bug #10737 (Resolved): ceph-fuse permits layout changes to files with data
- With two data pools having IDs 3 and 4:...
- 06:44 PM Bug #10720: MDS: valgrind leaks
- Merged. Do you think there might be more leaks, John? Your comment was ambiguous. :)
- 11:22 AM Bug #10720: MDS: valgrind leaks
- Here's one: https://github.com/ceph/ceph/pull/3598
- 04:58 AM Bug #10720: MDS: valgrind leaks
- I know there are some OSD leaks, but there are records in the MDS too. Unless I mis-parsed the valgrind logs and they...
- 03:32 AM Bug #10720: MDS: valgrind leaks
- both failures are leaks in OSD
- 06:39 PM Feature #10498: ObjectCacher: order wakeups when write calls block on throttling
- It requires some dev work and testing, which might need to be scheduled. It goes in the feature tracker even if it's ...
- 12:18 PM Feature #10498: ObjectCacher: order wakeups when write calls block on throttling
- I vote that this is restored as "Bug" instead of "Feature". http://pulpito.ceph.com/loic-2015-01-29_15:41:06-rados-du...
- 03:49 PM Bug #10702: ceph-qa-suite: hung client-recovery task in nightlies
- It's weird that the fusermount -u is failing -- usually if you have an offline MDS and an idle fuse mount and you fus...
- 02:51 AM Bug #10702: ceph-qa-suite: hung client-recovery task in nightlies
- No, the first umount is supposed to succeed, but it fails with error "Device or resource busy".
I think the simplest ...
- 02:44 AM Bug #10703: failures in libcephfs-java tests
- ...
02/02/2015
- 10:37 PM Bug #10703: failures in libcephfs-java tests
- Are we supposed to provide any sort of ordering on MClientCap versus MClientRequest messages? Truncate and length iss...
- 01:44 PM Bug #10703: failures in libcephfs-java tests
- looks like the ftruncate failure is caused by out-of-order cap message (mds processes setattr request and cap message...
- 10:33 PM Bug #10702: ceph-qa-suite: hung client-recovery task in nightlies
- Isn't the first umount supposed to fail, and then the "except run.CommandFailedError:" block forcibly cleans it up?
...
- 06:53 AM Bug #10702: ceph-qa-suite: hung client-recovery task in nightlies
- it hung at...
- 07:27 PM Bug #10361 (Resolved): teuthology: only direct admin socket commands to active MDS
- 07:18 PM Bug #10720 (Rejected): MDS: valgrind leaks
- http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-30_23:04:02-fs-master-testing-basic-multi/733168/
http://qa-p...
- 03:14 PM Bug #4920: client: does not respect O_NOFOLLOW
- I added the support for O_PATH in several places as well as unit test. The commit is on my github. Feel free to comme...
- 12:11 PM Bug #10710: fuse_ll warning
- NB error handling code here should consider #10542
- 10:12 AM Bug #10710 (Resolved): fuse_ll warning
- client/fuse_ll.cc: In function 'void remount_cb(void*)':
warning: client/fuse_ll.cc:767:14: ignoring return value of...
- 11:10 AM Bug #10712: TestFlush intermittent failure on scatter_writebehind event
- urgent by default since this causes qa tests to fail
- 11:04 AM Bug #10712: TestFlush intermittent failure on scatter_writebehind event
- http://pulpito.ceph.com/sage-2015-02-01_05:11:51-fs-master-distro-basic-multi/734532/
- 11:04 AM Bug #10712 (Resolved): TestFlush intermittent failure on scatter_writebehind event
The flush code isn't fully deterministic, sometimes the single log segment left behind just has a subtreemap event,...
01/30/2015
- 07:36 PM Bug #10703 (Duplicate): failures in libcephfs-java tests
- http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-28_23:04:02-fs-master-testing-basic-multi/729688/...
- 07:33 PM Bug #10702 (Resolved): ceph-qa-suite: hung client-recovery task in nightlies
- http://pulpito.ceph.com/teuthology-2015-01-28_23:04:02-fs-master-testing-basic-multi/729695/
I gathered the logs f...
01/28/2015
- 10:05 PM Bug #10550 (Resolved): Assertion in ceph-mds on failures in ::init
- Merged to master.
- 05:02 PM Bug #10550: Assertion in ceph-mds on failures in ::init
- I made a pull request via github. Commit log has been improved as well. Direct link: https://github.com/ceph/ceph/pul...
- 12:06 PM Bug #10550: Assertion in ceph-mds on failures in ::init
- Thanks for the patch. Please could you resend that as either a github pull request or if that is not possible, a git...
- 09:02 PM Feature #10679 (New): Add support for the chattr +i command (immutable file)
- To add an additional layer of protection for files I would like to see
support for the chattr +i command added for ...
- 05:54 PM Bug #4920: client: does not respect O_NOFOLLOW
- I think it's probably best to merge it all as one pull request; there's nothing urgent here and that way we'll know w...
- 05:41 PM Bug #4920: client: does not respect O_NOFOLLOW
- Thank you for your review. You are right, on Linux the call should not fail with ELOOP if there was O_PATH present in...
- 01:52 AM Bug #10448 (Resolved): wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() ...
01/27/2015
- 11:28 PM Feature #10498: ObjectCacher: order wakeups when write calls block on throttling
- Let's do ordered wakeups!
- 02:43 PM Feature #10498: ObjectCacher: order wakeups when write calls block on throttling
- 11:13 PM Bug #10449 (Resolved): MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message...
- #10657 is a feature ticket for the warning.
The libceph bug reference to #5429 is being tracked in #8087 and doesn...
- 11:06 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
- Created #10649 to track the need to overcome sessiontable size limit, to avoid overloading this ticket.
I agree th...
- 04:50 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
- It's a kernel bug. A hung request prevents the MDS from trimming completed_requests in the sessionmap. There is nothing to do w...
- 11:11 PM Feature #10657 (Resolved): MDS: warn when clients are not advancing their oldest_client_tid
- There was a bug in the kernel client which prevented it from updating the oldest_client_tid shared with the MDS. This...
- 11:04 PM Bug #10465: Audit and fix ceph-qa-suite exec tasks
- commit:8a5f76d10bc8bf0358f5215ac2aadfb210691507 in master fixes the dd location; there were no other yaml fragments p...
- 10:44 PM Bug #10423 (In Progress): update hadoop gitbuilders
- 10:27 PM Bug #10302 (Resolved): "fsync-tester.sh: line 10: lsof: command not found"
- I actually created another ticket for this a few days ago but forgot, let's keep discussion there: #10600
- 07:28 AM Bug #10302: "fsync-tester.sh: line 10: lsof: command not found"
- This turned up in a test on next branch.
http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-24_23:04:03-fs-next-...
- 07:20 PM Bug #10361 (Fix Under Review): teuthology: only direct admin socket commands to active MDS
- https://github.com/ceph/ceph-qa-suite/pull/316
Passed a local test, just running the scrub task against a cluster.
- 06:42 PM Bug #10361 (In Progress): teuthology: only direct admin socket commands to active MDS
- Testing a quick fix.
- 05:55 PM Bug #4920: client: does not respect O_NOFOLLOW
- Hmm, just looking at the man page for open() aren't you supposed to perform an open on a symlink with O_NOFOLLOW if y...
- 01:56 PM Bug #4920: client: does not respect O_NOFOLLOW
- I have added support for O_NOFOLLOW in Client::open and extended unittest as well. It works for me. Patch in attachme...
- 05:40 PM Bug #10382 (Resolved): mds/MDS.cc: In function 'void MDS::heartbeat_reset()
- Thanks!
- 05:15 PM Bug #10382: mds/MDS.cc: In function 'void MDS::heartbeat_reset()
- https://github.com/ceph/ceph/pull/3502
- 02:54 PM Bug #10382 (Pending Backport): mds/MDS.cc: In function 'void MDS::heartbeat_reset()
- Can you prepare a backport branch please?
- 11:04 AM Feature #10649 (Resolved): Store MDS tables without size limit
- Created from #10449
Currently MDS tables are written as single objects, in a single writefull, implicitly limited ...
01/26/2015
- 04:44 PM Bug #4920: client: does not respect O_NOFOLLOW
- Hmm, I don't remember where this came from but it might have been somebody cross-compiling for OS X. As marked, it's ...
- 03:42 PM Bug #4920: client: does not respect O_NOFOLLOW
- O_SYMLINK sounds like BSD extension to me. We really want it?
Anyway, the Client::open() lacks support for O_NOFOLLO...
- 04:41 PM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
- Is this in the kernel tree, and is there something we should do on the server side as well?
- 09:08 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
- I had our cluster running under full load for some days and the bug still did not appear.
01/23/2015
- 11:27 PM Bug #10620: TestFlush fails on formatting mistake
- ceph-qa-suite commit:41a99f58ccbbc09ee7b5355e598c426cef1d8775
- 09:22 PM Bug #10620 (Resolved): TestFlush fails on formatting mistake
- ...
- 11:04 PM Feature #10627: teuthology: qa: enable Samba runs on RHEL
- Apparently Fedora *just* enabled the Ceph VFS and RHEL doesn't, so not until we push on those.
And even then we'll...
- 10:48 PM Feature #10627: teuthology: qa: enable Samba runs on RHEL
- i wonder if we can run upstream samba now?
- 10:43 PM Feature #10627 (New): teuthology: qa: enable Samba runs on RHEL
- We need a gitbuilder and some teuthology installation changes. We can get other people to make those, but this umbrel...
- 10:05 PM Bug #10624 (Resolved): hung samba tests after lsof failure
- http://pulpito.ceph.com/teuthology-2015-01-21_23:14:02-samba-master-testing-basic-multi/717177/, and in one other rec...
- 09:54 PM Bug #9994: ceph-qa-suite: nfs mount timeouts
- http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-21_23:10:01-knfs-master-testing-basic-multi/717164/
Ubuntu ...
01/22/2015
- 09:42 AM Bug #10550: Assertion in ceph-mds on failures in ::init
- Proposal of the patch attached.
01/21/2015
- 03:09 PM Bug #10579 (Resolved): ceph-qa-suite: can't run quota tests on kernel client
- Thanks!
- 02:41 PM Bug #10579 (Fix Under Review): ceph-qa-suite: can't run quota tests on kernel client
- https://github.com/ceph/ceph/pull/3436
https://github.com/ceph/ceph-qa-suite/pull/308
- 02:19 PM Bug #10550: Assertion in ceph-mds on failures in ::init
- I am working on this.
01/20/2015
- 08:07 PM Bug #10416 (Resolved): quota test failures
- That appears to have been the issue.
- 07:29 PM Feature #10498: ObjectCacher: order wakeups when write calls block on throttling
- ObjectCacher::file_write did hang about 3 hours...
- 01:54 AM Bug #10449: MDS crashing in "C_SM_Save::finish" due to OSD response "-90 ((90) Message too long"
- Up until now, I could not reproduce the bug with all the patches applied. But I stumbled across bug #5429 which is ma...
01/19/2015
- 10:26 PM Feature #10498 (Fix Under Review): ObjectCacher: order wakeups when write calls block on throttling
- 10:31 AM Feature #10498: ObjectCacher: order wakeups when write calls block on throttling
- http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-14_23:04:01-fs-master-testing-basic-multi/704465/
We've see...
- 02:20 PM Bug #10361: teuthology: only direct admin socket commands to active MDS
- Filesystem.get_active_names is in master now!
- 10:49 AM Feature #9883 (Resolved): journal-tool: smarter scavenge (conditionally update dir objects)
- Merged in commit:c15f2d5056dedabfb7761b74c8622c61ab1f8477! :)
- 10:36 AM Bug #10579 (Resolved): ceph-qa-suite: can't run quota tests on kernel client
- The kclient doesn't support fs quotas. Right now we do so, because the quota test is in ceph/qa/workunits/misc, and t...
- 07:26 AM Bug #10413: samba: coredumps after tests run
- Sent to Samba upstream here: https://lists.samba.org/archive/samba-technical/2015-January/104851.html
- 07:25 AM Bug #10412: samba: failed smbtorture run
- -Sent to samba upstream here: https://lists.samba.org/archive/samba-technical/2015-January/104851.html-
(EDIT: sor...
- 05:58 AM Feature #10390 (In Progress): mds: throttle deletes in flight
- 04:17 AM Feature #10388 (In Progress): Add MDS perf counters for stray/purge status
01/15/2015
- 10:26 PM Bug #10368: Assertion in _trim_expired_segments
- one possible cause:
there are more than one contexts in mdlog's wait_for_safe list. there is a context executed af...
- 07:52 AM Bug #10368: Assertion in _trim_expired_segments
Had no joy reproducing with mds_auto_repair more recently (on diff. test nodes), but did manage on a vstart cluster...
- 09:53 PM Bug #10552 (Resolved): ceph-fuse: failed bufferlist assert when reading xattrs
- Thanks Zheng. Merged to master in commit:a60b815c85f07d7ac10b6d23fd1618c6334ddcf7.
- 06:25 PM Bug #10552 (Fix Under Review): ceph-fuse: failed bufferlist assert when reading xattrs
- triggered by new test case
- 10:53 AM Bug #10552: ceph-fuse: failed bufferlist assert when reading xattrs
- That assert is from 2013, by the way; I'm not sure what we might have changed to break it. :/
- 10:52 AM Bug #10552 (Resolved): ceph-fuse: failed bufferlist assert when reading xattrs
- ...
- 07:00 PM Bug #10415 (Resolved): libcephfs: test failures
- 06:53 PM Feature #10504: kclient: include client version in client_metadata
- pushed a kernel patch to testing branch
- 05:54 PM Bug #10413: samba: coredumps after tests run
- The Samba guys have proposed a fix on their side; the new fix should go into upstream soon.
- 11:04 AM Bug #9994: ceph-qa-suite: nfs mount timeouts
- This continues to be a problem; e.g. http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-11_23:10:01-knfs-next-tes...
- 10:20 AM Bug #10550 (Resolved): Assertion in ceph-mds on failures in ::init
init() can call suicide in various error handling paths before it starts progress thread, but in suicide() it tries...
01/14/2015
- 03:32 PM Bug #10539: Error EINVAL: all MDS daemons must be inactive before removing filesystem
- thanks for the quick fix !
- 02:54 PM Bug #10539 (Resolved): Error EINVAL: all MDS daemons must be inactive before removing filesystem
- Merged to ceph-qa-suite master in commit:6fa29f6f19331df59d15e7470b81f4cf5f0e9689
- 02:07 PM Bug #10539 (Fix Under Review): Error EINVAL: all MDS daemons must be inactive before removing fil...
My fault!
https://github.com/ceph/ceph/pull/3372
- 11:50 AM Bug #10539 (Resolved): Error EINVAL: all MDS daemons must be inactive before removing filesystem
- *make check* sometimes fails on "test/cephtool-test-mds.sh":http://workbench.dachary.org/ceph/ceph/blob/master/src/te...
- 01:54 PM Bug #10542 (Resolved): ceph-fuse cap trimming fails with: mount: only root can use "--options" op...
This appears to be from:...
- 02:47 AM Bug #10382: mds/MDS.cc: In function 'void MDS::heartbeat_reset()
- This will need backport to giant
- 02:44 AM Bug #10382 (Fix Under Review): mds/MDS.cc: In function 'void MDS::heartbeat_reset()
- https://github.com/ceph/ceph/pull/3370
01/13/2015
- 01:28 PM Bug #10381 (Resolved): health HEALTH_WARN mds ceph239 is laggy
- Whoops, this fell through the cracks.
Anyway, the MDS map has pool 0 set to use for data, but the OSDMap doesn't h...
- 07:00 AM Bug #10387 (Resolved): ceph-fuse doesn't release capability on deleted directory
- 06:43 AM Bug #10387: ceph-fuse doesn't release capability on deleted directory
- I wouldn't bother backporting: the symptom is just some dir objects that don't get cleaned up until next client unmou...
01/12/2015
- 08:06 PM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
- ...
- 07:30 PM Bug #10387: ceph-fuse doesn't release capability on deleted directory
- Do we need to backport this? The directory will be deleted when all of its child dentries get trimmed from the cache.
- 03:26 PM Bug #10387 (Pending Backport): ceph-fuse doesn't release capability on deleted directory
- This was merged to master in commit:e9a29c58b6d40933afbf711afcf7b872b4005585; does it need a backport?
- 03:14 PM Bug #10413: samba: coredumps after tests run
- We talked about this in standup; Zheng is going to send it to the Samba guys.
- 11:31 AM Bug #10503 (Resolved): ceph-fuse: quota code is not 32-bit safe for vxattr output
- Fix merged to master in commit:c219c43cc2943c794378214d77566e3f0d3f394a
- 11:27 AM Bug #10416: quota test failures
- Pushed a config change for the cfuse_workunit_misc yaml fragment to master in commit:c3eee83fb09a8f37faa71fa8fa78b632...
- 11:09 AM Bug #10416 (In Progress): quota test failures
- ....no. I'll fix that and we'll see if things work better, thanks!
01/11/2015
- 11:35 PM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
- please remove the sudo before /home/ubuntu/cephtest/fsx-mpi. Otherwise rank of all processes will be zero
- 10:38 PM Bug #10416: quota test failures
- The configure log looks like this:
overrides:
admin_socket:
branch: master
ceph:
conf:
...
01/09/2015
- 07:58 PM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
- direct io failure is fixed by "ceph: fix reading inline data when i_size > PAGE_SIZE"
- 04:18 PM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
- the directio failure is caused by the 'reading inline data' changes
- 10:47 AM Bug #10448: wrong parameter passed to ceph_zero_pape_vector_range() in striped_read() of fs/ceph/...
- Zheng?
- 02:51 PM Feature #10504: kclient: include client version in client_metadata
- That userspace bit got merged.
- 02:48 PM Feature #10504: kclient: include client version in client_metadata
- https://github.com/ceph/ceph/pull/3341 for the userspace side!
- 01:53 PM Feature #10504 (Resolved): kclient: include client version in client_metadata
- It would be helpful for debugging. Probably just put it in Client::populate_metadata() for userspace; I imagine the k...
- 01:48 PM Bug #10503 (Resolved): ceph-fuse: quota code is not 32-bit safe for vxattr output
- From the gitbuilders:...
- 11:02 AM Feature #10498 (New): ObjectCacher: order wakeups when write calls block on throttling
- The ObjectCacher can block write calls if the dirty data limits are exceeded by sleeping on a cond. Unfortunately, we...
- 09:56 AM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
- If I'm understanding my quick skim correctly, this is not "MPI tests are failing" but "this mpi-fsx" test is failing,...
- 03:39 AM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
- Is this a new failure on the existing cluster, or something that's come up on the new cluster?