Activity
From 09/09/2014 to 10/08/2014
10/08/2014
- 11:08 PM Bug #9679: Ceph hadoop terasort job failure
- missing one of these?...
- 10:46 PM Bug #9679: Ceph hadoop terasort job failure
- My bet at this point is on the generation of the input data set. Teragen creates a file with X 100-byte entries. When ...
- 07:28 AM Feature #9437 (Resolved): make 'ceph tell mds.* ...' work, deprecate 'ceph mds tell * ...'
- ...
10/07/2014
- 07:28 PM Bug #9692 (Resolved): ACL workunit syntax error
- http://pulpito.ceph.com/gregf-2014-10-06_19:59:42-kcephfs-wip-9628-testing-basic-multi/531900...
- 07:26 PM Bug #9628 (Resolved): mds: race between ms_handle_accept() and ms_handle_reset()
- Merged to master in commit:1b7fae7b2953649564a9e226b4abedad0ce652cc
- 09:54 AM Bug #9679: Ceph hadoop terasort job failure
- https://issues.apache.org/jira/browse/MAPREDUCE-2018
- 09:53 AM Bug #9679: Ceph hadoop terasort job failure
- https://svn.apache.org/repos/asf/hadoop/common/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/terasor...
- 09:39 AM Bug #9679: Ceph hadoop terasort job failure
- Teragen command:
./hadoop/bin/hadoop jar ./hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar t...
- 09:22 AM Bug #9679: Ceph hadoop terasort job failure
- Thanks for adding this. What command did you use to generate the input?
- 09:04 AM Bug #9679 (Closed): Ceph hadoop terasort job failure
- Hadoop version: 2.4.1
Ceph version:
ceph --version
ceph version 0.85-986-g031ef05 (031ef0551ebc98d824075558e884...
- 07:03 AM Bug #9636 (Duplicate): segfault in CInode::get_caps_allowed_for_client
- 07:02 AM Bug #9562 (Resolved): Lockdep assertion in Filer purge
- Backported to giant:...
- 07:02 AM Bug #8576 (Need More Info): teuthology: nfs tests failing on umount
10/06/2014
- 06:27 PM Bug #9674: nightly failed multiple_rsync.sh
- rsync return codes aren't standard error codes. The man page says that 23 means...
- 05:59 PM Bug #9674: nightly failed multiple_rsync.sh
- #define ENFILE 23 /* File table overflow */
maybe we should adjust ulimit
- 02:23 PM Bug #9674 (Resolved): nightly failed multiple_rsync.sh
- http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-03_23:04:01-fs-giant-distro-basic-multi/527949/...
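The exit-code confusion in #9674 above can be illustrated with a short shell sketch (a hedged example, not taken from the failing run; the 4096 figure is an arbitrary illustration): rsync defines its own exit statuses in rsync(1), so 23 means "partial transfer due to error" there and only coincidentally matches the kernel's ENFILE errno.

```shell
# rsync exit codes come from rsync(1), not <errno.h>: 23 means
# "partial transfer due to error", and only by coincidence has the
# same number as ENFILE (23, "file table overflow") in errno.h.
# Before blaming the filesystem, inspect the per-process open-file
# limits that the ulimit suggestion above would adjust:
soft=$(ulimit -Sn)   # current soft open-file limit
hard=$(ulimit -Hn)   # hard limit; the soft limit can be raised up to this
echo "open files: soft=$soft hard=$hard"
# Raising the soft limit for the shell that launches the workload
# (4096 is an example value, not a recommendation):
# ulimit -n 4096
```

If the soft limit turns out to be low, raising it before launching the rsync workload is a one-line change, though the right fix depends on which of rsync's "partial transfer" causes actually fired in the failing run.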
10/03/2014
- 02:50 PM Feature #9659 (Duplicate): MDS: support cache eviction
- It would be really useful when writing certain kinds of tests (eg, for scrubbing) to be able to know that a particula...
- 06:52 AM Bug #9636: segfault in CInode::get_caps_allowed_for_client
- looks like it's the same as #9628
10/02/2014
- 02:15 PM Bug #9514 (Resolved): ceph-fuse pjd test is failing in giant nightlies
- Dumpling commit:5f601f099be98c2b061cc94fb06917e7543f3efe
Firefly commit:9fee8de25ab5c155cd6a3d32a71e45630a5ded15
10/01/2014
- 10:42 AM Bug #9636 (Duplicate): segfault in CInode::get_caps_allowed_for_client
While doing ad-hoc killing of clients stuck on full cluster: unchecked dereference of session connection....
- 06:18 AM Feature #7317 (In Progress): mds: behave with fs fills (e.g., allow deletion)
- 06:15 AM Feature #9437 (Fix Under Review): make 'ceph tell mds.* ...' work, deprecate 'ceph mds tell * ...'
09/30/2014
- 10:29 AM Bug #9562 (Pending Backport): Lockdep assertion in Filer purge
- This is popping up in Giant as well, which I believe has the new code that was the proximate cause. :)
- 10:27 AM Bug #9514 (Pending Backport): ceph-fuse pjd test is failing in giant nightlies
- In giant as commit:0ea20a668cf859881c49b33d1b6db4e636eda18a.
Needs to go to firefly as well.
- 12:08 AM Bug #9628: mds: race between ms_handle_accept() and ms_handle_reset()
- https://github.com/ceph/ceph/pull/2596
- 12:08 AM Bug #9628 (Resolved): mds: race between ms_handle_accept() and ms_handle_reset()
- ceph version 0.85-1003-g3ae673c (3ae673c764a4fac6e554e05722f0179566ed3fb3)
1: (ceph::BackTrace::BackTrace(int)+0x2...
09/29/2014
- 08:18 PM Bug #9562 (Resolved): Lockdep assertion in Filer purge
- 04:43 PM Bug #9341: MDS: very slow rejoin
- John Spray wrote:
> The userspace change and test for this are merged into master. Is the kernel side all done too?...
- 01:07 PM Bug #9341: MDS: very slow rejoin
- The userspace change and test for this are merged into master. Is the kernel side all done too?
- 04:33 PM Bug #9514: ceph-fuse pjd test is failing in giant nightlies
- 03:49 PM Bug #9514: ceph-fuse pjd test is failing in giant nightlies
- So here's a question: why does the client (temporarily) remember its ctime as being 2014-09-26 19:22:06.889397, but n...
- 02:58 PM Bug #9514 (In Progress): ceph-fuse pjd test is failing in giant nightlies
- Hah, we got the failure with logs in /a/sage-2014-09-26_17:51:11-smoke-giant-distro-basic-multi/513914
All of the ...
- 01:15 PM Bug #8576: teuthology: nfs tests failing on umount
- Trying the sync on Sage's go-ahead. :)
commit:56223ce98b659fe7b25b55161ef8163495f438fc in teuthology.
- 10:45 AM Bug #8576: teuthology: nfs tests failing on umount
- Is there any chance that just running a sync on the node prior to trying to "exportfs -au" might prevent this? I'm he...
09/26/2014
- 03:32 PM Bug #8427: ceph-fuse: Dumpling "cache still has 0+1 items, waiting (for caps to release?)" on shu...
- Sage believes this is a bug with readahead that got fixed in subsequent releases.
- 06:51 AM Bug #8427 (Won't Fix): ceph-fuse: Dumpling "cache still has 0+1 items, waiting (for caps to relea...
09/25/2014
- 06:23 PM Feature #541 (Resolved): mds: tempsync
- this is implemented... TSYN and related states
- 05:47 PM Feature #630 (Resolved): release caps on inodes unlinked by other clients
- 05:47 PM Feature #630: release caps on inodes unlinked by other clients
- dup of #5039. already fixed by commit f8a947d92 client: trim deleted inode
- 04:34 PM Bug #9514: ceph-fuse pjd test is failing in giant nightlies
- This hasn't reproduced since we turned on debug logging. :(
But I did see it on a run without any logging: /a/gregf-...
- 03:31 AM Bug #9562 (Fix Under Review): Lockdep assertion in Filer purge
- https://github.com/ceph/ceph/pull/2572
- 12:56 AM Bug #9563 (Resolved): kcephfs crash in ceph_mdsc_do_request
- 12:55 AM Bug #9564 (Resolved): kcephfs crash in _nfs4_do_open
- the bug is fixed upstream commit f39c0104 (NFS: remove BUG possibility in nfs4_open_and_get_state). I rebased the tes...
09/24/2014
- 07:47 PM Bug #6613: samba is crashing in teuthology
- Still happening
/a/teuthology-2014-09-22_23:14:01-samba-giant-testing-basic-multi/50607
- 07:43 PM Bug #8427: ceph-fuse: Dumpling "cache still has 0+1 items, waiting (for caps to release?)" on shu...
- /a/teuthology-2014-09-22_19:06:01-fs-dumpling-testing-basic-multi/505408
Grabbed all the logs out of /var/log/ceph...
- 02:18 PM Bug #8576: teuthology: nfs tests failing on umount
- https://github.com/ceph/teuthology/pull/336
- 10:00 AM Cleanup #2378 (Resolved): "ceph -s" MDS output is confusing
- We don't print mds status if there's no FS any more.
09/23/2014
- 06:16 PM Bug #9562: Lockdep assertion in Filer purge
- can we just unlock the PurgeRange/Probe locks before using the objecter?
- 06:21 AM Bug #9562 (In Progress): Lockdep assertion in Filer purge
- 06:21 AM Bug #9562: Lockdep assertion in Filer purge
So I think this bug already existed with the Probe lock, but it was triggered by the new PurgeRange lock, because t...
- 02:31 PM Bug #9564: kcephfs crash in _nfs4_do_open
- /a/teuthology-2014-09-22_23:10:02-knfs-giant-testing-basic-multi/506055/teuthology.log
09/22/2014
- 10:43 PM Bug #9563: kcephfs crash in ceph_mdsc_do_request
- the bug came from "ceph: use pagelist to present MDS request data". I force updated the testing branch, please test it.
- 05:04 AM Bug #9563 (Resolved): kcephfs crash in ceph_mdsc_do_request
From serial console:...
- 05:08 AM Bug #9564: kcephfs crash in _nfs4_do_open
- http://qa-proxy.ceph.com/teuthology/teuthology-2014-09-19_23:10:01-knfs-giant-testing-basic-multi/500158/...
- 05:07 AM Bug #9564 (Resolved): kcephfs crash in _nfs4_do_open
- 04:47 AM Bug #9562: Lockdep assertion in Filer purge
- ...
- 04:46 AM Bug #9562 (Resolved): Lockdep assertion in Filer purge
09/21/2014
- 04:26 PM Feature #9557 (Resolved): mds: verify backtrace on fetch_dir
- Verify that the backtrace is valid when we finish fetch_dir. That is, that we would have been able to locate the dir...
09/19/2014
- 09:02 PM Bug #9178 (Resolved): samba: ENOTEMPTY on "rm -rf"
- 03:34 PM Bug #9539 (Resolved): struct PurgeRange in Filer.cc needs lock to protect
- 06:32 AM Bug #9539 (Resolved): struct PurgeRange in Filer.cc needs lock to protect
- send two requests to delete 1000026dfe3.00000067, but no request to 1000026dfe3.00000068...
- 02:47 PM Bug #8576: teuthology: nfs tests failing on umount
- Been playing around with this some.
- 02:47 PM Bug #9177 (Resolved): ceph-fuse: failing MPI mdtest runs
- John fixed this by updating mdtest in ceph-qa-suite as of commit:b1365a80982dba4160e861c28d887b066ca451b6.
- 12:16 PM Feature #9284 (Resolved): mds: warn when clients are not responding to cache pressure
- Merged in giant...
- 09:05 AM Bug #9540 (Rejected): Crash during FS upgrade: assert(o->get_num_ref() == 0)
- Never mind, seems like this was just another manifestation of the original segment reference bug -- giant HEAD is OK.
- 06:37 AM Bug #9540: Crash during FS upgrade: assert(o->get_num_ref() == 0)
- The crash hits at the last ceph.restart (after upgrade from firefly to 83bd3430e3a17b77265e696095904b7a9032d2ee).
...
- 06:33 AM Bug #9540 (Rejected): Crash during FS upgrade: assert(o->get_num_ref() == 0)
- ...
09/18/2014
- 06:42 PM Feature #9189 (Resolved): Expose client identifying metadata to MDS, e.g. hostname
- 07:58 AM Feature #9437 (In Progress): make 'ceph tell mds.* ...' work, deprecate 'ceph mds tell * ...'
- 06:16 AM Feature #9477: Handle kclient shutdown with dead network more gracefully
- In the general case (e.g. root filesystem is cephfs) there's nothing we can do: the system can't shut down until the ...
- 05:56 AM Bug #9518 (Resolved): client metadata get lost after mds restart
- ...
- 02:30 AM Bug #9518 (Fix Under Review): client metadata get lost after mds restart
- Well, I also shouldn't have missed it while writing the code :-)
https://github.com/ceph/ceph/pull/2515
09/17/2014
- 11:59 PM Bug #9504 (Duplicate): failed to decode message of type 24 v2: buffer::end_of_buffer
- looks like this is duplicate of #9458
- 08:23 AM Bug #9504 (Duplicate): failed to decode message of type 24 v2: buffer::end_of_buffer
- root@burnupi21:~# less /var/log/upstart/ceph-mds-ceph_burnupi21.log
... - 08:46 PM Bug #9518: client metadata get lost after mds restart
- Dur, shouldn't have missed that in review. :(
- 07:44 PM Bug #9518 (Resolved): client metadata get lost after mds restart
- 03:05 PM Bug #9514 (Resolved): ceph-fuse pjd test is failing in giant nightlies
- commit:0ea20a668cf859881c49b33d1b6db4e636eda18a
http://qa-proxy.ceph.com/teuthology/sage-2014-09-14_18:23:49-smoke...
- 05:48 AM Feature #9189: Expose client identifying metadata to MDS, e.g. hostname
- Userspace part merged:...
- 05:47 AM Feature #9375 (Resolved): Send single 'many clients' health warning instead of N warnings for N c...
- ...
09/16/2014
- 05:37 PM Fix #9435 (Resolved): prevent use of cache pools as metadata or data pools
- Merged into giant branch in commit:eb1b2e0072bf605095f4104c2b6c2abfba216dbe
- 02:57 AM Fix #9435 (Fix Under Review): prevent use of cache pools as metadata or data pools
- https://github.com/ceph/ceph/pull/2507
- 02:16 PM Feature #9466: kclient: Extend CephFSTestCase tests to cover kclient
- Got these passing at least once by hand using IPMI to work around #9477, suite scheduled:
http://pulpito.front.sep...
09/15/2014
- 02:02 PM Bug #9444 (Resolved): "unmatched rstat" exception after firefly->master upgrade
- if mds_verify_scatter isn't enabled, the MDS will fix rstat mismatches automatically.
- 10:45 AM Bug #9444: "unmatched rstat" exception after firefly->master upgrade
- I think you're right, John. I'm not sure why we never saw this before though — Zheng, what changed that we're looking...
- 02:45 AM Bug #9444: "unmatched rstat" exception after firefly->master upgrade
- Is this actually fixed, in the case of filesystems created using old code? It seems like the patch prevents creating...
- 01:04 PM Fix #9435: prevent use of cache pools as metadata or data pools
- First half here: https://github.com/ceph/ceph/tree/wip-9435 (no handling of tiering updates yet)
- 12:47 PM Fix #9435 (In Progress): prevent use of cache pools as metadata or data pools
- 12:32 PM Feature #9477: Handle kclient shutdown with dead network more gracefully
Ah, this *only* happens if I have some dirty state from userspace at the time. In this instance it's my Mount.open...
- 11:59 AM Feature #9477 (Closed): Handle kclient shutdown with dead network more gracefully
- ...
- 10:14 AM Bug #9423 (Resolved): failure in client_recovery task
- 10:14 AM Bug #9423: failure in client_recovery task
Fix merged to giant....
- 08:07 AM Bug #9423: failure in client_recovery task
- 09:50 AM Feature #9466 (In Progress): kclient: Extend CephFSTestCase tests to cover kclient
- 03:43 AM Feature #9466: kclient: Extend CephFSTestCase tests to cover kclient
- kclient instrumentation to enable implementing KernelClient::get_global_id (mapping local mount to the ID we see on t...
- 03:38 AM Feature #9466 (Resolved): kclient: Extend CephFSTestCase tests to cover kclient
Currently the mds_client_recovery and mds_client_limits tasks in ceph-qa-suite only run against the fuse client, be...
- 08:06 AM Bug #9177: ceph-fuse: failing MPI mdtest runs
- https://github.com/ceph/ceph-qa-suite/pull/140
- 08:04 AM Bug #9177: ceph-fuse: failing MPI mdtest runs
09/12/2014
- 05:40 PM Bug #9427 (Resolved): osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_writte...
- Merged to master in commit:e06f4251ac36503d33f203567ada1b096119ab80.
Immediately cherry-picked to giant in commit:c3...
- 11:35 AM Bug #9427: osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos)
- For posterity, the manual test procedure for the changes to rewrite that fix this issue:...
- 06:57 AM Bug #9427 (Fix Under Review): osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= las...
- https://github.com/ceph/ceph/pull/2469
- 05:31 PM Bug #9444 (Resolved): "unmatched rstat" exception after firefly->master upgrade
- aha, yep! thanks
- 05:29 PM Bug #9444 (Fix Under Review): "unmatched rstat" exception after firefly->master upgrade
- git cherry-pick da17394941386dab88ddbfed4af2c8cb6b5eb72f
https://github.com/ceph/ceph/pull/2479
- 06:45 AM Bug #9444 (Resolved): "unmatched rstat" exception after firefly->master upgrade
Create filesystem with firefly, then restart system with master binaries plus wip-9427-rewrite....
- 01:57 PM Bug #9280 (Resolved): valgrind failures in ceph-fuse
- commit:46bbe30e6895311e4ce5f9cf2dea3438db99188e
- 01:53 PM Fix #9435: prevent use of cache pools as metadata or data pools
- Yes, that's what I'm hoping as well. That's what _check_remove_pool() is; we'd need to add an equivalent for tiering....
- 01:48 PM Fix #9435: prevent use of cache pools as metadata or data pools
- I lean toward setting data pool to the base pool too. I worry about having to stand up so many guard rails, though. ...
- 01:43 PM Fix #9435: prevent use of cache pools as metadata or data pools
- The user pointed out that right now we prevent assigning EC pools to CephFS. I believe this is the result of a user w...
- 01:24 PM Bug #9423: failure in client_recovery task
- Can we pull out the fix so we can merge it and have it run against giant going forward?
09/11/2014
- 01:32 PM Fix #9435: prevent use of cache pools as metadata or data pools
- This conversation is getting split across several mediums, but this shouldn't prevent specifying the use of a base po...
- 01:13 PM Fix #9435: prevent use of cache pools as metadata or data pools
- My vote is NAK on this. This is exactly what I want to do on my cluster and this is the only way EC can be used fo...
- 01:07 PM Fix #9435: prevent use of cache pools as metadata or data pools
- Yeah, that's the simple solution. I was also wondering though if we wanted to do something more sophisticated trying ...
- 12:48 PM Fix #9435: prevent use of cache pools as metadata or data pools
- I mean something like this (although I'm not positive I got all the requirements right):...
- 12:39 PM Fix #9435: prevent use of cache pools as metadata or data pools
- would checking the nature of pools during 'fs new' on the monitor and failing if any of the specified pools (data or ...
- 11:30 AM Fix #9435 (Resolved): prevent use of cache pools as metadata or data pools
- From the mailing list...
- 11:50 AM Feature #9437 (Resolved): make 'ceph tell mds.* ...' work, deprecate 'ceph mds tell * ...'
- 09:26 AM Bug #9428 (Resolved): mds: tight mon reconnect loop
- 08:35 AM Bug #9427: osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos)
- It doesn't need to be an absolute offset that gets fed to the standby-replay MDS, as long as it can use the informati...
- 06:55 AM Bug #9427: osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos)
- So rewriting the truncate_finish part isn't too hard if we want to do that:
https://github.com/ceph/ceph/commit/4ae6...
- 03:40 AM Bug #9427: osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos)
The history:...
- 07:10 AM Bug #9341: MDS: very slow rejoin
- I re-built and re-deployed ceph with fuse patch; re-configured all kernel clients to use fuse client; re-mounted Ceph...
- 06:56 AM Feature #9375 (Fix Under Review): Send single 'many clients' health warning instead of N warnings...
- 06:56 AM Feature #9189 (Fix Under Review): Expose client identifying metadata to MDS, e.g. hostname
09/10/2014
- 10:09 PM Bug #9428 (Resolved): mds: tight mon reconnect loop
- ...
- 10:08 PM Bug #9427: osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos)
- wip-mds has hacky workaround
- 09:38 PM Bug #9427: osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos)
- ESubtreeMap has an expire_pos field, and we set it in ESubtreeMap::replay() if it is > the current expire pos. I thi...
- 09:30 PM Bug #9427: osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos)
- ...
- 08:52 PM Bug #9427 (Resolved): osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_writte...
- ...
- 07:02 PM Bug #9341: MDS: very slow rejoin
- that patch is for kernel client. here is the patch for ceph-fuse
- 05:23 PM Bug #9341: MDS: very slow rejoin
- Zheng Yan wrote:
> are you using kernel client? If you are, please try the attached patch. I hope it will improve re...
- 06:19 AM Bug #9341: MDS: very slow rejoin
- are you using kernel client? If you are, please try the attached patch. I hope it will improve rejoin speed.
- 03:56 PM Bug #9178: samba: ENOTEMPTY on "rm -rf"
- the fix https://github.com/ceph/ceph/pull/2431 hasn't been merged yet
- 01:54 PM Bug #9178: samba: ENOTEMPTY on "rm -rf"
- /a/teuthology-2014-09-08_23:14:02-samba-master-testing-basic-multi/474551/
- 03:15 PM Bug #9423: failure in client_recovery task
- Had seen this previously at http://pulpito.ceph.com/teuthology-2014-09-05_23:04:02-fs-master-testing-basic-multi/4701...
- 01:27 PM Bug #9423: failure in client_recovery task
- http://pulpito.ceph.com/teuthology-2014-09-08_23:04:01-fs-master-testing-basic-multi/474441/
- 01:27 PM Bug #9423 (Resolved): failure in client_recovery task
- ...
- 02:01 PM Bug #8427: ceph-fuse: Dumpling "cache still has 0+1 items, waiting (for caps to release?)" on shu...
- /a/teuthology-2014-09-09_19:06:01-fs-dumpling-testing-basic-multi/475752
I copied the server logs to it - 01:54 PM Bug #9177: ceph-fuse: failing MPI mdtest runs
- /teuthology-2014-09-08_19:06:01-fs-dumpling-testing-basic-multi/473897/
/teuthology-2014-09-08_19:06:01-fs-dumpling-...
- 01:54 PM Bug #9280: valgrind failures in ceph-fuse
- /teuthology-2014-09-08_23:04:01-fs-master-testing-basic-multi/474458/
/teuthology-2014-09-08_23:04:01-fs-master-test...
- 01:46 PM Bug #8576: teuthology: nfs tests failing on umount
- ...
- 05:58 AM Feature #7316: improve mds state dumps (memory usage, completeness)
- NB as follow up to our new health checks (9282, 9284) we should ensure we add admin socket commands for dumping the s...
09/09/2014
- 07:18 AM Bug #8055 (Can't reproduce): knfs: NFS: nfs4_discover_server_trunking unhandled error -5. Exiting...
- 07:17 AM Bug #7613 (Can't reproduce): mds/MDCache.cc: 216: FAILED assert(inode_map.count(in->vino()) == 0)
- 07:12 AM Bug #8757 (Won't Fix): no need to hold write lock on hardlink's dir while creating anchortable entry
- the anchor table is no more, yay!
- 07:08 AM Bug #8576 (Need More Info): teuthology: nfs tests failing on umount
- 07:07 AM Bug #9280: valgrind failures in ceph-fuse
- 07:07 AM Bug #9341 (Need More Info): MDS: very slow rejoin
- 07:04 AM Bug #5382 (Can't reproduce): mds: failed objecter assert on shutdown
- 03:30 AM Bug #9178: samba: ENOTEMPTY on "rm -rf"
- ...