Activity

From 07/26/2014 to 08/24/2014

08/24/2014

04:10 PM Feature #8343 (Closed): please enable data integrity checking (by default) / silent data corruption
Loïc Dachary
04:06 PM Bug #8349 (Resolved): env-vs-args unittest is racy
Fixed by https://github.com/ceph/ceph/commit/3230060f07c738383cc1034a99d60d2ad369560f Loïc Dachary
03:32 PM Support #8462: related to integrity of objects
Loïc Dachary
03:12 PM Feature #7238 (Fix Under Review): erasure code : implement LRC plugin
The rados tests work (no thrashing). Loïc Dachary
02:57 PM Support #8310 (Closed): Most pgs stuck stale, no osds reporting them, repair ineffective
Loïc Dachary
09:25 AM CephFS Bug #9212 (Won't Fix): mon election delays mds beacon
ubuntu@teuthology:/a/teuthology-2014-08-22_23:04:01-fs-master-testing-basic-multi/444359... Sage Weil
08:36 AM Bug #9211 (Resolved): osdmap blacklist encoding order is nondeterministic
... Sage Weil

08/23/2014

05:00 PM Bug #9203: ceph_test_rados: ObjectDesc::iterator::advance(bool): Assertion `pos < limit' failed.
fwiw the reproducer hits a crash on firefly, but not emperor or dumpling. A fair bit changed in ceph_test_rados for ... John Spray
03:13 PM Bug #9203: ceph_test_rados: ObjectDesc::iterator::advance(bool): Assertion `pos < limit' failed.

So it turns out that ceph_test_rados is also crashy on master, as I found when I took my reproducer for this issue ...
John Spray
03:53 PM rbd Bug #9210 (Resolved): osdc/ObjectCacher.cc: 529: FAILED assert(i->empty()) on fencing test shutdown
... Sage Weil
11:50 AM Feature #7238: erasure code : implement LRC plugin
Loïc Dachary
11:25 AM Feature #7238 (Fix Under Review): erasure code : implement LRC plugin
Although thrashing tests using an LRC pool fail, I believe this is due to the size of the pool rather than the plugin... Loïc Dachary
11:29 AM Bug #9209: osd/ECUtil.h: 66: FAILED assert(offset % stripe_width == 0)
The same YAML file run against firefly 0.80.5-171-gca3ac90-1trusty instead of master succeeds. Loïc Dachary
11:23 AM Bug #9209 (Resolved): osd/ECUtil.h: 66: FAILED assert(offset % stripe_width == 0)
Using ... Loïc Dachary

08/22/2014

06:26 PM rgw Bug #9208 (Resolved): rgw: civetweb does not drain request buffer correctly
When radosgw returns an early error without reading the request content, we need civetweb to drain the buffer so that... Yehuda Sadeh
05:24 PM Subtask #6478 (Rejected): ErasureCode : XOR plugin
This has been obsoleted by the work on the ISA plugin. Loïc Dachary
05:22 PM Feature #7238: erasure code : implement LRC plugin
Fixed a bug that made the plugin incorrectly claim it could not recover when the last OSD was out, running tests a... Loïc Dachary
03:09 PM Bug #9207 (Resolved): osdc/Objecter.cc: 1074: FAILED assert(op->get_nref() > 1)
ubuntu@teuthology:/var/lib/teuthworker/archive/john-2014-08-22_10:24:47-rados-wip-objecter-testing-basic-multi/441988... Sage Weil
03:04 PM rgw Bug #9206 (Resolved): rgw: cross rgw message headers filtered by apache 2.4
apache 2.4 filters out header fields that have underscores in them. Need to convert underscores into dashes. Yehuda Sadeh
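As a rough illustration of the conversion described in #9206, the Python sketch below rewrites underscores in header field names to dashes. It is not the rgw implementation: the function name and the sample header names are hypothetical and chosen only for the example.

```python
def dash_header_names(headers):
    """Return a copy of headers with '_' in each field name replaced by '-'.

    Only the field names are rewritten; the values are left untouched.
    """
    return {name.replace("_", "-"): value for name, value in headers.items()}


# Hypothetical header names, used only for illustration.
example = {"X_EXAMPLE_SOURCE_ZONE": "zone_a", "Content-Type": "text/plain"}
converted = dash_header_names(example)
```

Names that already use dashes pass through unchanged, so the conversion can be applied to every outgoing header without special cases.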
02:52 PM Bug #9205 (Resolved): osd: notify ops reordered
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-21_11:40:02-upgrade:dumpling-x:stress-split-master... Yuri Weinstein
01:23 PM devops Feature #9136 (Resolved): ceph-deploy: use pre-existing ceph.conf
merged commit 2781538 into ceph:master Alfredo Deza
12:44 PM devops Feature #9118 (Fix Under Review): ceph-deploy: Add pre-generated keys to a Monitor
Pull request opened https://github.com/ceph/ceph-deploy/pull/235 Alfredo Deza
12:02 PM Bug #9203: ceph_test_rados: ObjectDesc::iterator::advance(bool): Assertion `pos < limit' failed.
Does not reproduce very often, but eventually caught in the act with debug turned up.
The oid in the asserting ope...
John Spray
06:39 AM Bug #9203 (Resolved): ceph_test_rados: ObjectDesc::iterator::advance(bool): Assertion `pos < limi...

http://pulpito.front.sepia.ceph.com/john-2014-08-22_02:21:21-rados-wip-objecter-testing-basic-multi/440722/
http:/...
John Spray
11:28 AM Bug #7995: osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
added patches to master that will dump the weak_refs on shutdown Sage Weil
06:32 AM Bug #7995: osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
http://pulpito.front.sepia.ceph.com/john-2014-08-22_02:21:21-rados-wip-objecter-testing-basic-multi/440850/
http://p...
John Spray
06:24 AM Bug #7995 (New): osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
This is happening again:
http://pulpito.front.sepia.ceph.com/john-2014-08-22_02:21:21-rados-wip-objecter-testing-b...
John Spray
11:15 AM Bug #8736: thrash and scrub combination lead to error
This needs to be prioritized.
Confirmed, logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-21_11:...
Yuri Weinstein
10:19 AM Bug #8985: "[WRN] map e9 wrongly marked me down" in upgrade:dumpling-x-firefly---basic-vps suite
Yuri Weinstein
06:36 AM Fix #8914: osd crashed at assert ReplicatedBackend::build_push_op
The stack trace created by the minimal script is different from the one reported above, but it fails at the same poin... Loïc Dachary
05:51 AM Fix #8914: osd crashed at assert ReplicatedBackend::build_push_op
The problem does not show if waiting after the object is inserted. It is a race condition.... Loïc Dachary
05:25 AM Fix #8914: osd crashed at assert ReplicatedBackend::build_push_op
For the problem to show the file being removed has to be the primary. Loïc Dachary
05:06 AM Fix #8914: osd crashed at assert ReplicatedBackend::build_push_op
Even simpler and does not require root privileges... Loïc Dachary
04:56 AM Fix #8914: osd crashed at assert ReplicatedBackend::build_push_op
The following reproduces it reliably on my laptop:... Loïc Dachary
03:47 AM Fix #8914 (In Progress): osd crashed at assert ReplicatedBackend::build_push_op
Thanks for the update, will try again :-) Loïc Dachary
02:57 AM CephFS Bug #4545: error creating empty object store. Invalid argument.
I may have found the problem.
Before you run mkcephfs, you should ensure the directory (/var/lib/ceph/osd/ceph-0) is empty.
once i wr...
cache china
02:32 AM Bug #9202 (Can't reproduce): Performance degradation during recovering and backfilling
From recent tests and analysis, we find that slow requests mainly follow 2 patterns during recovery and backfilling.
...
Zhi Zhang

08/21/2014

11:12 PM rgw Feature #8911: RGW doesn't return 'x-timestamp' in header which is used by 'View Details' of Open...
Thanks Luis... actually it's a new feature request, not a bug. Since we want one-to-one header mapping between Swift a... Ashish Chandra
09:11 PM rgw Bug #9201 (Resolved): rgw: bad object with different pool alignment
http://qa-proxy.ceph.com/teuthology/sage-2014-08-21_17:03:27-rgw-master-testing-basic-multi/440046/teuthology.log
...
Yehuda Sadeh
05:28 PM Bug #9153 (Resolved): erasure-code: jerasure_matrix_dotprod segmentation fault due to package upg...
Loïc Dachary
04:55 PM Feature #8147 (Resolved): osd: make split automatically trigger scrub
Sage Weil
04:49 PM Bug #8998 (Resolved): osd: SEGV in OSD::heartbeat()
no backport needed; this happened because update_osd_stats() was in OSDService but still using the other dout macro, but f... Sage Weil
04:49 PM rgw Feature #9200 (Resolved): rgw: log civetweb access
Apache has an access log and civetweb has one too; however, we need to incorporate it into our logging system. Yehuda Sadeh
04:44 PM CephFS Bug #5762 (Resolved): teuthology: Failed MPI runs lead to a hung test instead of a failure
Sage Weil
03:29 PM Feature #8639: mon: dispatch messages while blocked waiting for IO
Sage Weil
03:29 PM Feature #7516 (Resolved): mon: reweight-by-pg
Sage Weil
03:27 PM Fix #9199 (Resolved): librados: watch linger pings need to verify pg mapping hasn't changed
at the same time, osds might want to push osdmap incrementals to client sessions with watchers to expedite things ... Sage Weil
03:22 PM Feature #9198 (Resolved): librados: notify callback includes gid of notifier
Sage Weil
03:21 PM Feature #9197 (Resolved): librados/osd: notify reply payload
Sage Weil
03:21 PM Fix #9196 (Resolved): librados: watch_check() to synchronous verify we haven't missed notifies
Sage Weil
03:21 PM Fix #9195 (Resolved): librados: issue watch callback on (possibly) missed notifies
Sage Weil
03:20 PM Fix #9194 (Resolved): librados/osd: watch reconnect needs to be exclusive to detect possibly miss...
Sage Weil
03:18 PM Linux kernel client Bug #8806: libceph: must use new tid when watch is resent
the watch resend needs to use a new tid to avoid the dup op detection in the osd. this is how librbd avoids this pro... Sage Weil
02:55 PM Bug #9176 (Pending Backport): mon: leaked MMonGetVersion
Sage Weil
01:08 PM Bug #9176 (Fix Under Review): mon: leaked MMonGetVersion
https://github.com/ceph/ceph/pull/2301 Sage Weil
02:49 PM rgw Bug #9160: rgw failures with 'NoneType' object has no attribute 'get_contents_as_string'
http://pulpito.front.sepia.ceph.com/sage-2014-08-19_15:19:41-rgw-master-testing-basic-multi/435812/
http://pulpito.f...
John Spray
02:43 PM rgw Bug #9160: rgw failures with 'NoneType' object has no attribute 'get_contents_as_string'
http://pulpito.front.sepia.ceph.com/john-2014-08-20_19:21:46-rgw-wip-objecter-testing-basic-plana/438545/ John Spray
01:56 PM Bug #9144 (Pending Backport): filestore: commit triggered during journal replay
Sage Weil
01:21 PM Bug #9193: notify does not return an error code on timeout
https://github.com/ceph/ceph/pull/2302 Sage Weil
01:20 PM Bug #9193 (Resolved): notify does not return an error code on timeout
commit:7c7bf5fee7be397ef141b947f532a2a0b3567b42
There is simply no error code passed back to the caller; the API c...
Sage Weil
01:10 PM Bug #9150: osd/ECBackend.cc: 529: FAILED assert(pop.data.length() == sinfo.aligned_logical_offset...
suspect this and #9135 to be a ghost due to misbehaving underlying fs Sage Weil
01:09 PM Bug #9145 (Resolved): recursive lock of CollectionIndex::access_lock (52)
Sage Weil
12:51 PM Bug #9182 (Need More Info): osd deadlock after ms_handle_reset
Sage Weil
12:50 PM Bug #9181 (Need More Info): Osd: segv in OpTracker::unregister_inflight_op
no log, core isn't giving me good info :( Sage Weil
12:34 PM Bug #8885 (Can't reproduce): SIGABRT in TrackedOp::dump() via dump_ops_in_flight()
Sage Weil
12:09 PM devops Feature #9136 (Fix Under Review): ceph-deploy: use pre-existing ceph.conf
Pull request opened https://github.com/ceph/ceph-deploy/pull/234 Alfredo Deza
12:07 PM devops Bug #9185: incorrect Centos 6.5 fastcgi package
ok, the idle timeout is working fine.. i can pause the radosgw process (kill -STOP) and curl will block for well over... Sage Weil
10:27 AM devops Bug #9185 (In Progress): incorrect Centos 6.5 fastcgi package
Sage Weil
09:52 AM devops Bug #9185: incorrect Centos 6.5 fastcgi package
(09:51:57 AM) sagehm@newdream.net/montreal: mod_fastcgi-2.4.7-1.ceph.el6.x86_64
(09:52:15 AM) sagehm@newdream.net/mo...
Sage Weil
11:43 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
Does fio complete eventually? Are there any other hung tasks in dmesg? A task blocking for more than 120 seconds is... Ilya Dryomov
11:38 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
I apply http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/wip-request-fn/linux-image-3.16.0-ceph-00037-g... German Anders
11:37 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
Ok, I've applied the "..." with Kernel 3.16.0 and the error continues:
...
Aug 21 14:38:45 mail02-old kernel: [ 7...
German Anders
10:19 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
Eric is correct, the fix isn't in 3.16 stable yet, and unfortunately won't be in 3.15 at all - Linus pulled it into h... Ilya Dryomov
10:10 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
The fix looks like it made it into 3.17rc1. I have been testing this kernel since Sunday, and have not triggered the ... Eric Eastman
09:31 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
Upgraded to kernel 3.16.0 and got the same problem:
...
[ 70.858716] Key type ceph registered
[ 70.858800] l...
German Anders
11:18 AM Linux kernel client Bug #9192 (New): krbd: poor read (about 10%) vs write performance
We started testing the 3.17rc1 kernel over the weekend, as it is the only Linus
released kernel that has the fix fo...
Eric Eastman
10:05 AM devops Feature #5773 (In Progress): ceph-deploy: should add more tests to ceph-deploy task
Tamilarasi muthamizhan
09:55 AM CephFS Bug #9152 (In Progress): mds: beacon needs to not take mds_lock
wip-9152 John Spray
09:50 AM CephFS Bug #9177: ceph-fuse: failing MPI mdtest runs
The compiler is spitting out a warning about getcwd -- no evidence that that's what it's actually hitting in this ins... John Spray
08:53 AM CephFS Bug #9177: ceph-fuse: failing MPI mdtest runs
http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-20_23:04:01-fs-next-testing-basic-multi/439228/ Greg Farnum
08:29 AM CephFS Bug #9177: ceph-fuse: failing MPI mdtest runs
How did you track it down to getcwd? If that is the issue there are a bunch of avenues of attack here, and we should ... Greg Farnum
06:31 AM CephFS Bug #9177: ceph-fuse: failing MPI mdtest runs
mdtest has a getcwd call into an unzeroed buffer that it doesn't check the error of. If fuse is failing the getcwd f... John Spray
09:46 AM devops Bug #9190 (Resolved): idle times out do not work on ubuntu precise
This may be similar to #9185
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-21_08:29:18-upgrade...
Yuri Weinstein
08:26 AM Bug #9188: make check fails for setmaxosd
"make check" is passing on our gitbuilders (http://ceph.com/gitbuilder.cgi). Try updating and running it again? If th... Greg Farnum
02:28 AM Bug #9188 (Rejected): make check fails for setmaxosd
make check fails for setmaxosd. This is after a recent change in setmaxosd behavior to disallow shrinking of OSDs. He... Anand Bhat
06:56 AM CephFS Bug #9151 (In Progress): mds should log/error/warn when segments are NOT getting trimmed
John Spray
05:56 AM CephFS Feature #9189 (Resolved): Expose client identifying metadata to MDS, e.g. hostname

Currently, when doing e.g. a "session ls" on an MDS's admin socket, we get client IDs and IP addresses. It would b...
John Spray
05:35 AM CephFS Bug #9173 (Fix Under Review): Crash in Server::_session_logged

https://github.com/ceph/ceph/pull/2297
John Spray
03:28 AM Fix #8914: osd crashed at assert ReplicatedBackend::build_push_op
I missed a step.
Before I did a repair on the primary osd, I also did a scrub
#:/build/ceph-firefly84/sr...
Dhiraj Kamble
03:17 AM Fix #8914: osd crashed at assert ReplicatedBackend::build_push_op
Hi Loic,
please find below the steps to reproduce the issue.
@*#:/build/ceph-firefly84/src# ./ceph -v
*** DEVE...
Dhiraj Kamble
01:09 AM rgw Bug #9155: Swift Subuser - 403 Forbidden - during upload/post
made a comment on your proposed fix. Dhiraj Kamble

08/20/2014

09:02 PM devops Bug #9187 (Resolved): osds down after fresh deploy in master branch of ceph
Sage Weil
09:02 PM devops Bug #9187: osds down after fresh deploy in master branch of ceph
this was fixed later today. it was the isa preload thing:
2014-08-20 21:04:58.845739 7f7369af2780 -1 load: jerasur...
Sage Weil
04:37 PM devops Bug #9187 (Resolved): osds down after fresh deploy in master branch of ceph
ceph version 0.84-367-gf71c889
test setup: mira023
ceph-deploy version: 1.5.11
created 4 osds, with a combi...
Tamilarasi muthamizhan
08:48 PM Bug #9180 (Resolved): keyvaluestore: bad op 2563
done, commit:fdbab46852e74d405b5c747da98564a5866ec8a7 . thanks!! Sage Weil
08:07 PM Bug #9180: keyvaluestore: bad op 2563
We need to backport commit c08adbc98ff5f380ecd215f8bd9cf3cab214913c(https://github.com/ceph/ceph/commit/c08adbc98ff5f... Haomai Wang
10:39 AM Bug #9180 (Resolved): keyvaluestore: bad op 2563
... Sage Weil
05:33 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Plugging one of the 520s into a 3Gbit sata port makes no difference either. Mark Kirkwood
04:58 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Updated the bios on the work machine. No difference. Mark Kirkwood
04:08 PM Bug #9153 (In Progress): erasure-code: jerasure_matrix_dotprod segmentation fault due to package ...
preloading jerasure is not enough : the plugin selects another plugin to be loaded depending on the CPU features (jer... Loïc Dachary
03:29 PM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
I still see this error in today's run http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-20_13:52:13-upgrade:dump... Yuri Weinstein
10:07 AM Bug #9153 (Resolved): erasure-code: jerasure_matrix_dotprod segmentation fault due to package upg...
Sage Weil
03:27 PM devops Bug #9185: incorrect Centos 6.5 fastcgi package
fcgi? how does that even enter into it? I thought our work was only with fastcgi?
Is this on teuthology, or cust...
Dan Mick
03:26 PM devops Bug #9185: incorrect Centos 6.5 fastcgi package
So this problem is with the fcgi package not mod_fastcgi? Sandon Van Ness
02:07 PM devops Bug #9185: incorrect Centos 6.5 fastcgi package
This should fix #9169 Yuri Weinstein
01:54 PM devops Bug #9185 (Rejected): incorrect Centos 6.5 fastcgi package
The fastcgi package that is being installed is, or is based on, fcgi-2.4.0-10.el6.x86_64. Not 100% sure that it ... Yehuda Sadeh
02:33 PM Feature #9031: List RADOS namespaces and list all objects in all namespaces
David Zafman
02:31 PM Bug #9186 (Duplicate): erasure-code: conditionally preload isa plugin
The isa plugin is only built on some platforms. When the OSD preloads plugins, it should not try to load plugins that... Loïc Dachary
02:05 PM rgw Bug #9169: 100-continue broken for centos/rhel
This seems to be because the idle timeout is not working; it should be fixed by #9185 Yuri Weinstein
01:27 PM devops Feature #9136 (In Progress): ceph-deploy: use pre-existing ceph.conf
Alfredo Deza
10:54 AM Bug #9182: osd deadlock after ms_handle_reset
..and when i detached gdb the osd saw it was marked down, and came back to life after that. :/ Sage Weil
10:52 AM Bug #9182: osd deadlock after ms_handle_reset
... Sage Weil
10:51 AM Bug #9182 (Can't reproduce): osd deadlock after ms_handle_reset
ubuntu@teuthology:/a/teuthology-2014-08-19_02:30:02-rados-firefly-distro-basic-multi/435572... Sage Weil
10:47 AM CephFS Bug #9173: Crash in Server::_session_logged
Better log. John Spray
06:30 AM CephFS Bug #9173 (Resolved): Crash in Server::_session_logged

Hit by mds_client_recovery task...
John Spray
10:43 AM Bug #9181 (Resolved): Osd: segv in OpTracker::unregister_inflight_op
... Sage Weil
10:38 AM Bug #9179 (Resolved): unfound objects, recovery timeout
402/7722 unfound (
all osds up
ubuntu@teuthology:/a/teuthology-2014-08-19_02:30:02-rados-firefly-distro-basic-m...
Sage Weil
10:33 AM CephFS Bug #9178: samba: ENOTEMPTY on "rm -rf"
http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-10_23:14:02-samba-next-testing-basic-plana/415869/
Greg Farnum
10:30 AM CephFS Bug #9178 (Resolved): samba: ENOTEMPTY on "rm -rf"
... Greg Farnum
10:14 AM CephFS Bug #9177 (Resolved): ceph-fuse: failing MPI mdtest runs
... Greg Farnum
09:40 AM Bug #9176 (Resolved): mon: leaked MMonGetVersion
ubuntu@teuthology:/a/teuthology-2014-08-19_02:30:02-rados-firefly-distro-basic-multi/435589 Sage Weil
09:38 AM Bug #9175 (Duplicate): osd: stuck recovery
ubuntu@teuthology:/a/teuthology-2014-08-19_02:30:02-rados-firefly-distro-basic-multi/435529
pgs stuck recovery, ne...
Sage Weil
09:33 AM Feature #7238: erasure code : implement LRC plugin
Reserved three machines and ran the following job on them:... Loïc Dachary
09:32 AM rgw Subtask #9068 (In Progress): rgw: add rgw setup to vstart
Pull request: https://github.com/ceph/ceph/pull/2292 Luis Pabon
09:31 AM rgw Documentation #9003: rgw: document development setup for rgw
Abhishek L wrote:
> Luis Pabon wrote:
> > I have edited vstart.sh so that it can setup rgw automatically. I have a...
Luis Pabon
09:30 AM rgw Documentation #9003: rgw: document development setup for rgw
patch has been submitted: https://github.com/ceph/ceph/pull/2292 Luis Pabon
05:21 AM rgw Documentation #9003: rgw: document development setup for rgw
Luis Pabon wrote:
> I have edited vstart.sh so that it can setup rgw automatically. I have also documented most of ...
Abhishek Lekshmanan
09:19 AM Bug #9128: Newly-restarted OSD may suicide itself after hitting suicide time out value because it...
sounds like we need to use the TPHandle and tp.reset_tp_handle() inside the search_for_missing loop Sage Weil
07:53 AM Documentation #9174: wrong picture on http://ceph.com/docs/master/cephfs/
... Dieter Kasper
07:46 AM Documentation #9174 (Closed): wrong picture on http://ceph.com/docs/master/cephfs/
The picture on page http://ceph.com/docs/master/cephfs/
is not correct.
ceph.ko is not on top of libcephfs / librad...
Dieter Kasper
03:11 AM devops Feature #8868 (Resolved): Update Fedora to 0.80.5 packages with ceph-common
The updated packages with spec file synced up with the upstream spec file were pushed to epel 7, fedora 22, fedora 21... Boris Ranto

08/19/2014

09:31 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
A related thought is that the Intel 520s are plugged into the sata 6Gbit ports on the motherboard, so if there are an... Mark Kirkwood
06:52 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
It might be worth trying an Intel 530 if that is dramatically easier to source - as it is similar to the 520 in the m... Mark Kirkwood
06:26 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
That should have said unpatched wip-9073. Mark Kirkwood
06:25 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Doing a little more digging for the cause of 2/ (invalid argument error). Using unpatched ipw-0973 and changing the jo... Mark Kirkwood
09:07 PM rgw Bug #9125 (Resolved): rgw: swift tests fail with civetweb
Sage Weil
05:44 PM Feature #7238: erasure code : implement LRC plugin
There is no need to test upgrade on a plugin that does not exist in LRC. Loïc Dachary
02:34 PM Feature #7238: erasure code : implement LRC plugin
canceled the previous job because it did not have enough OSDs to complete (the LRC rule requires a minimum of 8 for ea... Loïc Dachary
12:22 PM Feature #7238: erasure code : implement LRC plugin
Cancel the "teuthology run that did not contain any LRC workload":http://pulpito.ceph.com/loic-2014-08-19_20:27:09-up... Loïc Dachary
11:27 AM Feature #7238: erasure code : implement LRC plugin
Fixed a few problems and running "a firefly upgrade suite":http://pulpito.ceph.com/loic-2014-08-19_20:27:09-upgrade:f... Loïc Dachary
03:08 PM Bug #9156: SWIFT tests failed in upgrade:dumpling:rgw-dumpling-distro-basic-vps suite
Further analysis and chats with Loic and Yehuda revealed that in the apache access log we indeed have 30 sec not 1200 se... Yuri Weinstein
03:02 PM Bug #9156: SWIFT tests failed in upgrade:dumpling:rgw-dumpling-distro-basic-vps suite
Suspected backport apache 2.4 issue, test branch wip-rgw-dumpling for ceph-qa-suite
Running now ...
Yuri Weinstein
02:15 PM Fix #8914 (Need More Info): osd crashed at assert ReplicatedBackend::build_push_op
I'm not able to reproduce the problem on *ceph version 0.84-343-g92b227e (92b227e1c0b1533c359e74c81de58140b483ee8e)* ... Loïc Dachary
01:15 PM rgw Bug #9155: Swift Subuser - 403 Forbidden - during upload/post
I pushed a different fix to wip-8587, please take a look and see if you think it makes sense. Yehuda Sadeh
01:10 PM Feature #8155: Disallow changing cache_mode in nonsensical ways
c3f403293c7f8d946f66a871aa015a558120ce78 Samuel Just
01:10 PM Feature #8155 (Resolved): Disallow changing cache_mode in nonsensical ways
Samuel Just
01:09 PM devops Feature #9050: Calamari builds for ceph.com
Asking Ian and Neil, they confirm that what this means is "repos". The hard choice is going to be figuring out what ... Dan Mick
12:15 PM Bug #9170 (Resolved): erasure-code: preload erasure code plugins
Whitelist the plugins to be preloaded. Loïc Dachary
11:19 AM devops Feature #3019 (Closed): juju: modernize ceph charm, mon & osd bootstrap
Neil Levine
11:11 AM rgw Bug #9169 (Resolved): 100-continue broken for centos/rhel
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-18_16:07:27-upgrade:dumpling-firefly-x-firefly-dis... Yuri Weinstein
11:10 AM devops Feature #8868 (In Progress): Update Fedora to 0.80.5 packages with ceph-common
Ian Colle
09:20 AM rgw Feature #8911: RGW doesn't return 'x-timestamp' in header which is used by 'View Details' of Open...
I'll take a look. Seems like this is new functionality in RGW, not a bug, right? Luis Pabon
09:13 AM CephFS Bug #9152: mds: beacon needs to not take mds_lock
Hmm, the beacon send code doesn't need to hold the lock on its own, but it's triggered by the SafeTimer, which is jus... Greg Farnum
09:07 AM rgw Documentation #9003: rgw: document development setup for rgw
I have edited vstart.sh so that it can setup rgw automatically. I have also documented most of the steps needed by n... Luis Pabon
09:02 AM rgw Documentation #9003 (In Progress): rgw: document development setup for rgw
Luis Pabon
09:05 AM CephFS Bug #9151: mds should log/error/warn when segments are NOT getting trimmed
What kind of logging do we want? I assume you mean journal segments, and this is a bog standard operation...
If it's...
Greg Farnum
09:04 AM rgw Feature #8945: rgw: support swift /info api
After spending some time on this call, I am going to have to break it down to smaller tasks. I am currently investig... Luis Pabon
09:02 AM Bug #9143: Incorrect key sequence in encoding object name to key for GenericObjectMap
How did you run across this? Is it feasible to fix it by typing the escaped strings and writing a custom comparator? Greg Farnum
07:47 AM Bug #9079: osd: bad learned_addr during send_boot
"pending pull request":https://github.com/ceph/ceph/pull/2275 Loïc Dachary
07:41 AM Feature #9167 (Resolved): erasure-code: check plugin version when loading it
When loading the erasure code plugin, check the Ceph version against which it was built and fail if it does not match... Loïc Dachary
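The check described in #9167 could look roughly like the Python sketch below. It is only an illustration: the real plugin loader is C++, and the exception name, the function name, and the exact-match policy are assumptions rather than details from the ticket.

```python
class PluginVersionMismatch(Exception):
    """Raised when a plugin was built against a different version (hypothetical name)."""


def check_plugin_version(plugin_name, built_against, running):
    """Refuse to load a plugin whose build version differs from the running one.

    Assumes an exact string match is required; the ticket text is truncated,
    so the real comparison rule may differ.
    """
    if built_against != running:
        raise PluginVersionMismatch(
            "%s was built against %s but %s is running"
            % (plugin_name, built_against, running)
        )
    return True
```

Failing at load time turns a version skew into an explicit error instead of a later crash inside the plugin.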
07:22 AM devops Bug #9166 (Closed): activate dmcrypt volumes via init script
Hi,
I don't know if this is more a bug or a feature request.
I think it would be helpful if the activation of ceph ...
Manuel Lausch
07:16 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
"firefly backport":https://github.com/ceph/ceph/pull/2286 Loïc Dachary
07:10 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
The teuthology upgrade tests fail consistently with the same problem. Backporting to firefly seems to be the only way... Loïc Dachary
05:21 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
"Running upgrade:dumpling-firefly-x with the proposed fix":http://pulpito.ceph.com/loic-2014-08-19_14:23:09-upgrade:d... Loïc Dachary
06:49 AM CephFS Fix #4286: SLES 11 - cfuse: disable 'big_writes'and 'atomic_o_trunc
Ian Colle
04:17 AM rbd Bug #9076: Can't completely remove a version 1 image on RHEL 7
Ok it's better with ceph.com packages. You can close this :)
Thanks!
Sébastien Han
04:16 AM rbd Bug #9075: Can't create a version 2 images on RHEL 7
Ok it's better with ceph.com packages. You can close this :)
Thanks!
Sébastien Han

08/18/2014

11:21 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
FWIW - checked this myself on my home machine (which was *not* seeing this last issue recall, only the hang) by reboo... Mark Kirkwood
07:48 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
My Linux versions are 3.2 and 3.5. I'll test on 3.13.0-32-generic to find out whether the kernel causes this bug. jianpeng ma
07:00 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Excellent. Purely out of interest, any idea (now) why we only saw this bug on one particular system? Mark Kirkwood
04:04 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Hmm, thanks very much! I'll send the patch.
Thanks again, Mark!
jianpeng ma
03:44 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Retested with only debug-journal-header-3.diff on wip-9073. I did 200 test runs, good journal every time. Mark Kirkwood
02:39 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
I think you should retest only using debug-journal-header-3.diff on wip-9073. And test more times to avoid the bug r... jianpeng ma
02:36 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
I had your last debugging diff on there as well (I can retest without that if needed). Mark Kirkwood
02:34 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Did you apply only debug-journal-header-3.diff on wip-9073 to test?
jianpeng ma
02:32 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Heh - sorry, means 'really fixed it well'! Mark Kirkwood
02:30 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
What does 'nail it' mean? Sorry, I don't know. jianpeng ma
02:21 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Lol, you certainly have - been a pleasure debugging this with you!
I actually applied the patch attached in this n...
Mark Kirkwood
02:01 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
growl, make that 3.13.0-32-generic, typed 'uname -a' in wrong (x)window before! Mark Kirkwood
02:01 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
I have a thought. It's strange.
Using aio, the kernel use user-space to write. But if before write to journal, the u...
jianpeng ma
01:58 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
...oh and kernel is 3.13.0-34-generic (sorry)! Mark Kirkwood
01:52 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Yeah, disabling dio seems to get a consistently good header (10 consecutive runs) Mark Kirkwood
01:22 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
From the latest ceph-osd.o.log. Before io_submit, the content is ok.
I found another issue.
2014-08-18 20:10:09.7...
jianpeng ma
01:10 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Seems I spoke too soon - a few more runs showed up:
$ hexdump -n8 journalblk-prestart--20864.txt
0000000 7000 033...
Mark Kirkwood
12:38 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
With *only* your latest patch applied to wip-9073 I'm seeing a good journal header:
$ hexdump -n8 journalblk-prest...
Mark Kirkwood
12:12 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Using my latest patch, is journal-header corrupt?
From my debug info, before io_submit and after aio completed, the ...
jianpeng ma
09:44 PM rgw Bug #9155: Swift Subuser - 403 Forbidden - during upload/post
Here's the pull request:
https://github.com/ceph/ceph/pull/2281
Dhiraj Kamble
08:20 AM rgw Bug #9155: Swift Subuser - 403 Forbidden - during upload/post
That's a duplicate of #8587; a pull request for your fix would be great. Yehuda Sadeh
07:49 AM rgw Bug #9155 (Resolved): Swift Subuser - 403 Forbidden - during upload/post
Swift upload fails with HTTP error 403 for a subuser that was created with the required permissions. This happens ge... Dhiraj Kamble
06:26 PM Bug #9062: Mon segfault in waitlist_or_zap_client
the fix was merged in commit:321d4defd4a0f5a53a41276e6dc048479cb3084a Greg Farnum
05:14 PM Bug #9145: recursive lock of CollectionIndex::access_lock (52)
The fix Sam suggested is to name the CollectionIndex lock based on the collection names. This will make lockdep happy... Somnath Roy
01:58 PM Bug #9145: recursive lock of CollectionIndex::access_lock (52)
Sage,
Yes, I am able to reproduce this following the steps you suggested. But, this time I am hitting the issue in _...
Somnath Roy
04:51 PM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
"minimal fix":https://github.com/ceph/ceph/pull/2282 Loïc Dachary
09:05 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
Stopping the daemons may not be the brightest idea because of http://tracker.ceph.com/issues/8849 . Pre-loading the p... Loïc Dachary
08:09 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
"proposed fix":https://github.com/ceph/ceph/pull/2278 Loïc Dachary
07:27 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
Here is a possible scenario:
* ceph-osd-0.80.5 is running but did not load jerasure
* ceph-osd-0.83 is installed ...
Loïc Dachary
07:09 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
Here is the part of the teuthology log dealing with the upgrade, which is immediately followed by a core dump from os... Loïc Dachary
06:43 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
Trying a manual upgrade... Loïc Dachary
06:25 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
The ceph-libs package is obsolete and the jerasure plugin now lives in the ceph package. The problem does not come fr... Loïc Dachary
06:18 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
It looks like the ceph-libs package is not upgraded, which explains the core dump : master cannot successfully load a... Loïc Dachary
05:31 AM Bug #9153 (Fix Under Review): erasure-code: jerasure_matrix_dotprod segmentation fault due to pac...
"proposed fix":https://github.com/ceph/ceph/pull/2276 Loïc Dachary
05:22 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
If the ceph-libs package is upgraded before the ceph package, it is entirely possible that the shared library is repl... Loïc Dachary
04:47 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
The upgrade sequence
* dumpling
* firefly -> installs and loads the jerasure plugin
* master -> installs an updat...
Loïc Dachary
04:41 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
The stack trace is bizarre. ECUtil::decode calls ErasureCodeJerasure::encode_chunks which makes no sense because a) de... Loïc Dachary
04:29 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
Got three VPS with rhel 6.5 installed, running the job on them with no "nuke-on-error" Loïc Dachary
03:43 AM Bug #9153 (In Progress): erasure-code: jerasure_matrix_dotprod segmentation fault due to package ...
As soon as VPS are available, lock three and run the job again hoping to repeat it... Loïc Dachary
01:22 AM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
Ack Loïc Dachary
02:42 PM Feature #9161 (New): Cache warmup and ejection
Initial access of an object in a high performance cache tier can have high latency as the object is fetched from the ... Neil Levine
02:20 PM rgw Bug #9160 (Closed): rgw failures with 'NoneType' object has no attribute 'get_contents_as_string'

Several jobs in this suite failed with this error:
http://pulpito.ceph.com/john-2014-08-18_16:28:28-rgw-wip-object...
John Spray
01:56 PM rgw Bug #9125: rgw: swift tests fail with civetweb
looks like the fix is merged to master, tested it on master branch and it worked fine.
will mark it as "Resolved"...
Tamilarasi muthamizhan
10:45 AM Bug #9158 (Duplicate): osd crashed in upgrade:dumpling-x:stress-split-master-distro-basic-vps suite
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-17_11:40:01-upgrade:dumpling-x:stress-split-master... Yuri Weinstein
10:24 AM Bug #9072 (Resolved): error setting 'mon_pg_warn_min_objects' to '10K': (22) Invalid argument
Sage Weil
09:23 AM Bug #9072: error setting 'mon_pg_warn_min_objects' to '10K': (22) Invalid argument
I checked the firefly branch and Sage cherry-picked the required patches to it.
That ought to fix all issues with ...
Joao Eduardo Luis
09:08 AM devops Feature #9118: ceph-deploy: Add pre-generated keys to a Monitor
Keith Schincke wrote:
> Can the precreated/populated keyring be propagated with the ceph-deploy command when the clu...
Sage Weil
09:04 AM devops Feature #9118: ceph-deploy: Add pre-generated keys to a Monitor
Can the precreated/populated keyring be propagated with the ceph-deploy command when the cluster is created? Keith Schincke
08:23 AM Bug #9156 (Resolved): SWIFT tests failed in upgrade:dumpling:rgw-dumpling-distro-basic-vps suite
12 tests total failed in http://pulpito.front.sepia.ceph.com/teuthology-2014-08-17_12:05:01-upgrade:dumpling:rgw-dump... Yuri Weinstein
05:17 AM Bug #9112 (Resolved): (wip-objecter) librados notify calls freezing
No longer occurring after reinstating _recalc_linger_op_target and updating related bits of code John Spray

08/17/2014

11:52 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Meanwhile, I have been doing a little digging of my own: if I disable dio or aio via
[osd]
journal [d,a]io = fals...
Mark Kirkwood
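A minimal sketch of checking which journal I/O modes such a config fragment disables. It assumes the options are spelled `journal dio` and `journal aio` under `[osd]` (the excerpt above is truncated, so the exact spelling is an assumption), and `journal_io_modes` is a hypothetical helper, not part of Ceph. Note that Python's `configparser` does not treat spaces and underscores in option names as equivalent.

```python
import configparser

# Example fragment; the option names are assumed from the truncated excerpt.
EXAMPLE_CONF = """
[osd]
journal dio = false
journal aio = true
"""


def journal_io_modes(conf_text):
    """Return {'dio': bool, 'aio': bool} as read from the [osd] section.

    Options that are absent are reported as True; that default is an
    assumption of this sketch, not a statement about Ceph's defaults.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    modes = {}
    for mode in ("dio", "aio"):
        modes[mode] = parser.getboolean("osd", f"journal {mode}", fallback=True)
    return modes


if __name__ == "__main__":
    print(journal_io_modes(EXAMPLE_CONF))
```

Running the helper before each test run makes it easy to record which combination of dio/aio a given result was obtained with.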
11:40 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Here's the log with that patch applied. Mark Kirkwood
07:27 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Hi Mark,
Could you test again? I added more debug messages this time.
Thanks!
jianpeng ma
08:53 PM rbd Bug #8919 (Resolved): qemu-iotests fails to find common.env
Sage Weil
05:15 PM Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race
Loic, can you take a look? Sage Weil
04:38 PM Bug #9153 (Resolved): erasure-code: jerasure_matrix_dotprod segmentation fault due to package upg...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-17_11:30:03-upgrade:dumpling-firefly-x-master-dist... Yuri Weinstein
01:00 PM CephFS Bug #9152 (Resolved): mds: beacon needs to not take mds_lock
any random task that holds the mds lock for a long time prevents beacons, which will trigger a failover Sage Weil
12:48 PM CephFS Bug #9151 (Resolved): mds should log/error/warn when segments are NOT getting trimmed
Sage Weil

08/16/2014

10:01 PM rgw Bug #8621 (Pending Backport): civetweb frontend fails authentication if URL has special chars
Sage Weil
09:55 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Sage's comment suggested I check something - reverting 4eb18dd487da4cb621dcbecfc475fc0871b356ac from wip-9073 and run... Mark Kirkwood
08:59 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
I've reverted commit:4eb18dd487da4cb621dcbecfc475fc0871b356ac on next so we can release v0.84. once we sort this out... Sage Weil
12:47 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
jianpeng ma wrote:
> I read #6003. I think they are not the same.
> You can see those two files (patch.diff ...
Sage Weil
09:53 PM Feature #9030 (Resolved): mon: quickly identify 'problem'  osds
Sage Weil
09:26 PM Bug #9150 (Can't reproduce): osd/ECBackend.cc: 529: FAILED assert(pop.data.length() == sinfo.alig...
... Sage Weil
08:57 PM rgw Bug #9137 (Resolved): AH00534: apache2: Configuration error: No MPM loaded. (rpm distros)
Sage Weil
04:56 PM rgw Bug #9137: AH00534: apache2: Configuration error: No MPM loaded. (rpm distros)
works on el6 and el7. fc20 fails the ceph-qa-chef because of tiobench. Sage Weil
02:16 PM rgw Bug #9137: AH00534: apache2: Configuration error: No MPM loaded. (rpm distros)
verified to work on precise and trusty.
still need to test on el6, el7, and fedora.
Sage Weil
08:52 PM rgw Bug #9148 (Resolved): rgw: multiregion tests failing, s3tests.functional.test_s3.test_region_copy...
... Sage Weil
03:42 PM CephFS Bug #8574 (Resolved): teuthology: NFS mounts on trusty are failing
chef adds a dummy export and restarts nfs-kernel-server now Sage Weil
02:41 PM CephFS Bug #8574: teuthology: NFS mounts on trusty are failing
root@mira055:~# service nfs-kernel-server restart
* Stopping NFS kernel daemon ...
Sage Weil
02:08 PM Linux kernel client Bug #9147 (Closed): krbd: run_xfstests.sh fails
... Sage Weil
02:07 PM rbd Bug #9146 (Can't reproduce): EPERM from image_read.sh
... Sage Weil
01:54 PM rgw Bug #9039: Using COPY on radosgw to copy object from one bucket to another that's in another pool...
The restriping tool never made it to dumpling. It actually isn't even in firefly. Yehuda Sadeh
01:39 PM rgw Bug #9039 (Pending Backport): Using COPY on radosgw to copy object from one bucket to another tha...
the restriping fix patches also need to go to dumpling... Sage Weil
01:46 PM Bug #8997: ceph_test_rados_watch_notify hangs
ubuntu@teuthology:/a/sage-2014-08-15_21:44:35-rados-master-testing-basic-multi/427533 (probably) Sage Weil
01:43 PM Bug #9145 (Resolved): recursive lock of CollectionIndex::access_lock (52)
... Sage Weil
01:17 PM Feature #7238: erasure code : implement LRC plugin
"running teuthology test run":http://pulpito.ceph.com/loic-2014-08-16_22:17:50-upgrade:firefly-x:stress-split-wip-723... Loïc Dachary
12:41 PM Bug #9144 (Fix Under Review): filestore: commit triggered during journal replay
https://github.com/ceph/ceph/pull/2274 Sage Weil
09:26 AM Bug #9144 (Resolved): filestore: commit triggered during journal replay
... Sage Weil
09:38 AM Feature #9033 (Resolved): erasure-code: simplified LRC
"part of a larger pull request":https://github.com/dachary/ceph/commit/43b8f66797184b1138560184708573aa6930e8c4 Loïc Dachary
09:15 AM Bug #9053 (Pending Backport): mon/Paxos.cc: 628: FAILED assert(begin->last_committed == last_comm...
Sage Weil
07:47 AM Bug #9143 (Rejected): Incorrect key sequence in encoding object name to key for GenericObjectMap
For example, two oids have the same hash and their names are:
A: "rb.data.123"
B: "rb-123"
In ghobject_t compare level, ...
Haomai Wang
06:02 AM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
"all green !":http://pulpito.ceph.com/loic-2014-08-16_10:42:43-upgrade:firefly-x:stress-split-wip-9025-chunk-remappin... Loïc Dachary

08/15/2014

07:50 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
I read #6003. I think they are not the same.
You can see those two files (patch.diff (571 Bytes) ji...
jianpeng ma
06:19 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
This is starting to sound a lot like #6003! Sage Weil
01:56 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
hexdump -n8 journalblk-prestart.txt
0000000 3000 021d 0000 0000
Mark Kirkwood
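For comparing the first 8 bytes across runs without eyeballing `hexdump` output, a small Python sketch follows. It assumes a little-endian host (so that default `hexdump` prints 16-bit little-endian words, as in the output above); the helper names are hypothetical, and the sketch makes no claim about what the journal header fields mean.

```python
import struct


def hexdump_words(data, offset=0):
    """Format bytes like one line of default `hexdump` output.

    Assumes an even number of bytes and a little-endian host, so each
    pair of bytes is shown as one 16-bit little-endian word.
    """
    words = struct.unpack(f"<{len(data) // 2}H", data)
    return f"{offset:07x} " + " ".join(f"{w:04x}" for w in words)


def first_u64(data):
    """Interpret the first 8 bytes as one little-endian unsigned 64-bit value."""
    return struct.unpack_from("<Q", data)[0]


def read_prefix(path, n=8):
    """Read the first n bytes of a dump file such as journalblk-prestart.txt."""
    with open(path, "rb") as f:
        return f.read(n)


if __name__ == "__main__":
    # The 8 bytes corresponding to the hexdump line quoted above.
    sample = bytes.fromhex("00301d0200000000")
    print(hexdump_words(sample))
    print(hex(first_u64(sample)))
```

Calling `hexdump_words(read_prefix(path))` on each saved dump gives one comparable line per run.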
12:09 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Can you paste the journal header after this command? Only the first 8 bytes. jianpeng ma
04:41 PM rgw Bug #9137: AH00534: apache2: Configuration error: No MPM loaded. (rpm distros)
The MPM selection is supposed to be made in the default config. Dan Mick
01:23 PM rgw Bug #9137: AH00534: apache2: Configuration error: No MPM loaded. (rpm distros)
Looking into this; my theory is two problems: 1) package structure changed in 2.4 and we might need to explicitly in... Dan Mick
11:46 AM rgw Bug #9137 (Resolved): AH00534: apache2: Configuration error: No MPM loaded. (rpm distros)
... Sage Weil
03:49 PM Bug #9130 (Resolved): (wip-objecter) FAILED assert(cur_con) in MonClient
fix in wip-objecter Sage Weil
06:42 AM Bug #9130 (Resolved): (wip-objecter) FAILED assert(cur_con) in MonClient

http://pulpito.front.sepia.ceph.com/john-2014-08-15_03:34:51-rbd-wip-mds-contexts-testing-basic-multi/425519/
<p...
John Spray
02:08 PM Bug #9119 (Pending Backport): READFORWARD ordering bug
Sage Weil
02:03 PM RADOS Bug #8963 (Resolved): erasure coding crush rulset breaks rbd kernel clients on non-ec pools on Ub...
backported to firefly Sage Weil
01:34 PM Bug #9142 (Can't reproduce): [ RUN ] LibRadosTwoPoolsPP.PromoteSnapScrub hang
ubuntu@teuthology:/a/samuelj-2014-08-14_18:41:07-rados-wip-sam-testing-testing-basic-multi/425498 Samuel Just
01:33 PM Bug #9140: [ FAILED ] LibRadosTwoPoolsPP.PromoteOn2ndRead (9913 ms)
ubuntu@teuthology:/a/samuelj-2014-08-14_18:41:07-rados-wip-sam-testing-testing-basic-multi/425458 Samuel Just
01:30 PM Bug #9140 (Duplicate): [ FAILED ] LibRadosTwoPoolsPP.PromoteOn2ndRead (9913 ms)
2014-08-15T05:48:20.619 INFO:tasks.workunit.client.0.plana16.stdout:[ OK ] LibRadosTwoPoolsPP.HitSetWrite (2908... Samuel Just
01:32 PM Bug #9141 (Can't reproduce): [ RUN ] LibRadosAio.IsCompletePP hang
ubuntu@teuthology:/a/samuelj-2014-08-14_18:41:07-rados-wip-sam-testing-testing-basic-multi/425497 Samuel Just
01:01 PM Bug #9139 (Rejected): ceph_test_rados reports incorrectly missing object
ORDERSNAPS was fixing something important:
1) cache-primary send DELETE on object we are flushing
2) base-primary q...
Samuel Just
11:28 AM devops Feature #9134 (Duplicate): ceph-deploy: add pre-generated client keys to MON
9118 Neil Levine
11:22 AM devops Feature #9134 (Duplicate): ceph-deploy: add pre-generated client keys to MON
User story: As an admin, I have already generated Ceph client keys and would like to add them to the cluster during t... Neil Levine
11:27 AM devops Feature #9136 (Resolved): ceph-deploy: use pre-existing ceph.conf
User story: As an admin, I have already generated a ceph.conf file and would like to use it for a new cluster install... Neil Levine
11:26 AM Bug #9135 (Can't reproduce): ENOENT on collection_add
... Sage Weil
11:08 AM CephFS Feature #8869 (Resolved): MDS: support standby-replay on old-format journals
This merged a couple of weeks ago in https://github.com/ceph/ceph/commit/440c820cce2c262570ab78e352bed8a630d41be5 John Spray
10:49 AM devops Feature #9133 (Rejected): create ceph user/group; run daemons as ceph (non-root)
this will involve lots of updates to packaging. Sage Weil
05:33 AM Feature #7238: erasure code : implement LRC plugin
Teuthology job description:... Loïc Dachary
04:45 AM CephFS Bug #9105: ~ObjectCacher behaves poorly on EBLACKLISTED
Punting on a general purpose fix for ObjectCacher for the time being, and just fixing this in librbd teardown. John Spray
04:44 AM CephFS Bug #9105 (Fix Under Review): ~ObjectCacher behaves poorly on EBLACKLISTED
https://github.com/ceph/ceph/pull/2263 John Spray
03:53 AM Bug #9128 (Resolved): Newly-restarted OSD may suicide itself after hitting suicide time out value...
Stop one OSD daemon for a long time (many hours, even up to a day) without marking it as out. During this time, ther... Zhi Zhang
03:40 AM Feature #9025 (Resolved): erasure-code: chunk remapping
Loïc Dachary
03:38 AM Feature #9025: erasure-code: chunk remapping
Teuthology job passes. Loïc Dachary

08/14/2014

11:25 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
The strace attached. So this is the mkfs...and wip-9073 with *just* the last patch applied. Mark Kirkwood
11:20 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Can you use strace to trace the ceph-osd command? Please use strace -f to catch all child processes.
Thanks!
jianpeng ma
11:14 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Back to seeing the same error (invalid argument) with this latest patch :-( Mark Kirkwood
10:58 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Mark, I haven't found the reason. But I think this bug may be caused by the patch. So I modified my patch and hope the bug doesn't ... jianpeng ma
10:58 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
...suggests a memory overwrite problem - we really need to get the binaries running under valgrind! Mark Kirkwood
08:11 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
With that last patch applied, journal header looks good every mkfs and osd is starting every time. Mark Kirkwood
07:47 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Yes. It's a great step. A strange bug.
The attachment is a patch which adds read_header in some places. Can you try t...
jianpeng ma
07:41 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Very interesting: *sometimes* after the mkfs the header looks like:
0000000 b000 02b5 0000 0000 0001 0000 0000 00...
Mark Kirkwood
07:12 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Will do. Mark Kirkwood
06:57 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
But from the code, when the osd starts, reading the journal header is the first thing done for the journal.
I don't know the command 's...
jianpeng ma
06:54 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Hmmm - just checked again and got:
$ hexdump journalblk-prestart.txt|head -1
0000000 3000 02a0 0000 0000 0001 000...
Mark Kirkwood
06:45 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Funny you should mention that, I had just checked that myself:
So, just after the mkfs, journal header is:
$ hexd...
Mark Kirkwood
06:30 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Hi Mark,
I used something different on my side, but I can't reproduce this.
From deploy.sh, for the osd operation
1:ceph-osd ...
jianpeng ma
03:33 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Can you use "strace -f ceph-osd .." to trace all syscalls?
We may find some clues from that info.
jianpeng ma
03:20 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
No, sorry,
$ sudo dd if=/dev/zero of=/dev/sdc1 bs=512
$ sudo ./deploy.sh
is the prescription. The result is os...
Mark Kirkwood
03:08 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Do you mean that if you zero the journal disk then the osd can start, and otherwise it hits this bug? jianpeng ma
03:04 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Sure - I'm running the script attached initially - now using a minor variation thereof (attached again).
The only ot...
Mark Kirkwood
02:40 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
The first 8 bytes of the journal header are destroyed. But the debug info shows the content of the journal header is right.
Now ...
jianpeng ma
02:06 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Also, I note that running
$ sudo ceph-osd -i 0 --mkjournal
results in a journal state that lets the osd start, ...
Mark Kirkwood
01:38 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Shame about no 520, but here are the files. Mark Kirkwood
06:06 PM rgw Bug #9125 (Resolved): rgw: swift tests fail with civetweb
logs are copied to ubuntu@mira042.front.sepia.ceph.com:/home/ubuntu/civetweb_swift... Tamilarasi muthamizhan
05:57 PM rgw Bug #8971 (Duplicate): rgw: s3 test failures with civetweb
Tamilarasi muthamizhan
05:56 PM rgw Bug #8971: rgw: s3 test failures with civetweb
s3tests now pass on wip-8621 branch. Tamilarasi muthamizhan
05:55 PM rgw Bug #8621: civetweb frontend fails authentication if URL has special chars
s3tests passed with recent changes to wip-8621.
Tamilarasi muthamizhan
05:39 PM Bug #9058 (Need More Info): rest-api: long-running process may fail 'tell osd...' due to stale os...
ok, my theory doesn't seem right.. Objecter is checking for a new map if it gets ENXIO or similar. enabled logging i... Sage Weil
05:36 PM devops Bug #8330 (Resolved): repodata on rpm repos do not list latest ceph-deploy (1.5.2)
Thanks for verifying. Sandon Van Ness
05:33 PM devops Bug #8976 (Fix Under Review): httpd on RHEL7 (RHEL repo) incompatible with mod_fastcgi (ceph repo)
We have a new version available out at:
http://gitbuilder.ceph.com/apache2-rpm-rhel7-x86_64-basic/ref/master/
A...
Sandon Van Ness
05:13 PM Bug #8895: ceph osd pool stats (displayed incorrect values)
Can probably close this as dupe of #5884? John Spray
04:14 PM CephFS Bug #9101: multimds: unlinked file is not pruned from replica mds caches
Sage Weil
03:20 PM CephFS Bug #9123 (Can't reproduce): kceph: had 130k+ inodes with write caps
in #9121 the client had more than 130k inodes open for write, resulting in a huge file recovery queue. there definit... Sage Weil
02:37 PM CephFS Bug #9121 (In Progress): mds: inode stuck recovering after client restart
recovery is working.. there are just a lot of inodes queued:
2014-08-14 14:40:06.695087 7fd45f757700 10 mds.0.cach...
Sage Weil
02:10 PM CephFS Bug #9121 (Resolved): mds: inode stuck recovering after client restart
... Sage Weil
01:51 PM CephFS Bug #9105: ~ObjectCacher behaves poorly on EBLACKLISTED
John Spray wrote:
> This is happening when the librbd-using client is blacklisted, ObjectCacher fails to flush when ...
Sage Weil
10:16 AM CephFS Bug #9105: ~ObjectCacher behaves poorly on EBLACKLISTED
This is happening when the librbd-using client is blacklisted, ObjectCacher fails to flush when requested, and ImageC... John Spray
09:44 AM CephFS Bug #9105: ~ObjectCacher behaves poorly on EBLACKLISTED
Started failing in 061c8e93f76dc4fd6290d6d15723d76e73267444 where rbd_cache and rbd_cache_writethrough_until_flush we... John Spray
01:17 PM rgw Bug #8988 (Resolved): AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
Sage Weil
12:33 PM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
"the suite runs ok":http://pulpito.ceph.com/loic-2014-08-14_14:25:55-upgrade:firefly-x:stress-split-wip-9025-chunk-re... Loïc Dachary
05:55 AM rgw Bug #8988 (Fix Under Review): AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
"need review":https://github.com/ceph/ceph-qa-suite/pull/87 Loïc Dachary
05:36 AM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
The reason why "the suite fails":http://pulpito.ceph.com/loic-2014-08-14_09:47:05-upgrade:firefly-x:stress-split-wip-... Loïc Dachary
12:53 AM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
It failed for the same reason. "Rescheduled once more, hoping the problem has been fixed":http://pulpito.ceph.com/loi... Loïc Dachary
01:13 PM Bug #8865 (Resolved): cep osd setmaxosd doesn't check if osds exist
Sage Weil
12:37 PM Feature #9025: erasure-code: chunk remapping
Now that the teuthology + MDS bugs are fixed, the following job will be scheduled to exercise remapping:... Loïc Dachary
11:10 AM Bug #9119 (Resolved): READFORWARD ordering bug
READFORWARD is forwarding RWORDERED reads. Samuel Just
11:06 AM devops Feature #9118: ceph-deploy: Add pre-generated keys to a Monitor
Any keys (client.admin or otherwise) in the keyring file passed to "ceph-mon --mkfs --keyring <foo>" will get seeded ... Sage Weil
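A rough sketch of the workflow described above, written as Python helpers that only build the command lines. The helper names, paths, and monitor id are hypothetical; the flags shown (`--create-keyring`, `--gen-key`, `-n`, `--cap` for `ceph-authtool`, and `--mkfs`, `-i`, `--keyring` for `ceph-mon`) are the commonly documented ones. A real `ceph-mon --mkfs` run typically needs further arguments (for example a monmap or fsid), which are omitted here.

```python
def authtool_create_argv(keyring_path, entity, caps):
    """Build a ceph-authtool command that creates a keyring with one new key.

    `caps` is a list of (subsystem, capability) pairs, e.g. ("mon", "allow *").
    """
    argv = ["ceph-authtool", "--create-keyring", keyring_path,
            "--gen-key", "-n", entity]
    for subsystem, capability in caps:
        argv += ["--cap", subsystem, capability]
    return argv


def mon_mkfs_argv(mon_id, keyring_path):
    """Build a ceph-mon --mkfs command that seeds the monitor from a keyring.

    Other arguments a real deployment needs are left out of this sketch.
    """
    return ["ceph-mon", "--mkfs", "-i", mon_id, "--keyring", keyring_path]


if __name__ == "__main__":
    # Hypothetical path and names, for illustration only.
    keyring = "/tmp/pregenerated.keyring"
    print(authtool_create_argv(keyring, "client.admin", [("mon", "allow *")]))
    print(mon_mkfs_argv("a", keyring))
```

The returned lists can be passed to `subprocess.run(argv, check=True)` on a host where the Ceph tools are installed.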
10:56 AM devops Feature #9118 (Resolved): ceph-deploy: Add pre-generated keys to a Monitor
ceph-authtool can be used to generate a key and keyring before a Ceph cluster is running, if a user has access to the... Neil Levine
10:54 AM Feature #9083 (Closed): Standalone script to generate Ceph keys
Feature already exists in ceph-authtool Neil Levine
09:34 AM Bug #9113: osd: snap trimming eats memory, linearly
a few notes:... Sage Weil
06:40 AM Bug #9113 (Resolved): osd: snap trimming eats memory, linearly
- rados pool snapshot taken weekly
- trimmed when >30 days old
- trimming makes some osds consume memory linearly
...
Sage Weil
09:06 AM Bug #9054: ceph_test_rados: FAILED assert(!old_value.deleted())
ubuntu@teuthology:/a/sage-2014-08-13_15:28:18-rados-next-testing-basic-multi/422862 Sage Weil
09:05 AM Bug #9114: osd: segv in build_push_op
note: i manually killed ceph_test_rados to make teuthology clean up Sage Weil
07:09 AM Bug #9114 (Duplicate): osd: segv in build_push_op
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-08-13_15:28:18-rados-next-testing-basic-multi/422759... Sage Weil
08:33 AM Bug #9102 (Resolved): ceph-disk has undefined variables
Sage Weil
07:51 AM Bug #9102 (Fix Under Review): ceph-disk has undefined variables
PR opened https://github.com/ceph/ceph/pull/2251 Alfredo Deza
07:58 AM rgw Documentation #9116 (Resolved): rgw: broken link
From Luis Pabon:... Yehuda Sadeh
07:21 AM devops Bug #9066 (Rejected): Need ceph-deploy to be able to run to JUST generate ceph.conf and keyring w...
The initial issue was misunderstood, ceph-deploy already is able to create a ceph.conf and a mon keyring. Other requi... Alfredo Deza
06:47 AM Bug #9062 (Resolved): Mon segfault in waitlist_or_zap_client
Sage Weil
06:40 AM Bug #9112 (In Progress): (wip-objecter) librados notify calls freezing
John Spray
06:39 AM Bug #9112: (wip-objecter) librados notify calls freezing
Client log with objecter and librados debug logging at 20 in teuthology:~/jcsp/9112 John Spray
06:28 AM Bug #9112 (Resolved): (wip-objecter) librados notify calls freezing

Hitting this in rbd tests, periodically the ceph_test_rados_fsx process gets stuck inside IoCtxImpl::notify
<pre...
John Spray
06:34 AM CephFS Bug #8725 (Resolved): mds crashed in upgrade:dumpling-x:stress-split-master-testing-basic-plana
Sage Weil
06:16 AM devops Feature #9103: create a (generic) webservice to handle Sphinx documentation versions
1.- Adding something to the Sphinx build is non-trivial. Sphinx extensions (the right way to do this) are very comple... Alfredo Deza
02:48 AM Bug #9111: PG stuck with 'active+remapped' forever with cluster wide change (add/remove OSDs)
Right after I filed this bug, I got a clue: I found the problem came from those removed OSDs (which have status DNE... Guang Yang
02:01 AM Bug #9111 (Won't Fix): PG stuck with 'active+remapped' forever with cluster wide change (add/remo...
After adding/removing OSDs, some PGs are stuck in 'active+remapped' forever.
1. ceph -s
-bash-4.1$ ceph -s...
Guang Yang
01:35 AM Bug #9082: Ceph Firefly 0.80.5 : PG has invalid (post-split) stats; must scrub before tier agent ...
Thanks Sage , the issue has been resolved, cluster is Healthy now. karan singh

08/13/2014

11:49 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Mark, I can't find the SSD in the lab.
And I also can't find the code. But in my two patches, I don't modify code which c...
jianpeng ma
07:08 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
hexdump journalblk.txt
0000000 1000 03ce 0000 0000 0001 0000 0000 0000
0000010 bdb9 29ac 51d7 a343 3bbf 1114 622e...
jianpeng ma
06:51 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Here's the 4096 bytes of sdc1 Mark Kirkwood
06:41 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
In the code, there is a logic error.
int r = ::pread(fd, bp.c_str(), bp.length(), 0);
bl.push_back(bp);
try ...
jianpeng ma
06:21 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Can you read the first 4096 bytes of /dev/sdc1 and send them to me?
The journal header is in the first 4096 bytes.
jianpeng ma
06:12 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
The info for the Intel 520:
Re more journal debugging - sure, I already have the following set:
[osd]
debug os...
Mark Kirkwood
06:09 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
The script puts in symlinks (also note slightly different osd data path on the work machine):
$ ls -l /var/lib/cep...
Mark Kirkwood
06:04 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
From your message, I found:
14-08-14 10:58:01.735317 7f944f5e4800 20 journal _check_disk_write_cache: disk write cach...
jianpeng ma
05:36 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Can you send me the 520's disk info from hdparm?
I'll search the lab to try to find this SSD.
Thanks!
jianpeng ma
05:13 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Can you print more debug info about the journal?
From the messages:
journal read_header error decoding journal header
...
jianpeng ma
03:58 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Doing a secure erase of the 520's changes nothing. Still seeing problem 2/ 'invalid argument' opening the journal. Mark Kirkwood
01:55 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
An aside thought - to rule out weird ssd related stuff I had performed a secure erase on the Crucial m4's while inves... Mark Kirkwood
01:40 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
I'm happy to report that wip-9073 definitely fixes problem 1/ (the hang). Mark Kirkwood
01:04 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
to (hopefully) clarify the errors:
- Home machine: osd mkfs hangs (which I've called 1/)
- work machine: osd mkfs...
Mark Kirkwood
12:56 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Interesting... I'm just building wip-9073 on my home machine now, will update you with what I find.
The issue *mig...
Mark Kirkwood
09:17 PM rgw Feature #8473: rgw: Shard bucket index objects to improve single bucket PUT throughput
Here is the first patch - https://github.com/ceph/ceph/pull/2187 Guang Yang
09:16 PM Bug #7521 (Won't Fix): Add more events (hold object context) to OpTracker to better analyze perfo...
With a better understanding of the tracker, I found that the issue tracked by this bug can actually be achieved by the c... Guang Yang
09:14 PM Bug #7710 (Resolved): Multiple rados bench instance will overwrite the metadata object
Guang Yang
09:10 PM Documentation #6142: Ceph needs mor than 32k pids
John, not sure where this should go in the doc structure... Sage Weil
06:20 PM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
Loic, we had a disk fail and the suite possibly failed because of that (guessing); I restarted it http://pulpito.front.sepia... Yuri Weinstein
04:11 PM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
Waiting for "Shipping apache config":https://github.com/ceph/ceph-qa-suite/blob/master/tasks/rgw.py#L82 with... Loïc Dachary
04:04 PM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
"running a suite using the new VPS.yaml":http://pulpito.ceph.com/loic-2014-08-14_01:02:11-upgrade:firefly-x:stress-sp... Loïc Dachary
03:47 PM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
"fix indentation of rgw override":https://github.com/ceph/ceph-qa-suite/pull/85 Loïc Dachary
03:35 PM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
To confirm there is a large delay requiring a large idle_timeout:... Loïc Dachary
03:33 PM rgw Bug #8988 (In Progress): AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
Sage Weil
04:30 PM Bug #9109 (New): ceph CLI: Help is missing -k keyring option
The ceph command line should provide a -k keyring argument. "ceph --help" does not appear to list the -k option for t... John Wilkins
04:28 PM Bug #9087 (Need More Info): ceph_test_rados_list_parallel hang
Sage Weil
02:21 PM Bug #9087: ceph_test_rados_list_parallel hang
added some debugging. Samuel Just
12:47 PM Bug #9087: ceph_test_rados_list_parallel hang
Looking Samuel Just
04:22 PM Bug #9053: mon/Paxos.cc: 628: FAILED assert(begin->last_committed == last_committed)
Paxos::handle_last() bug.
the peon:...
Sage Weil
04:17 PM Bug #9053: mon/Paxos.cc: 628: FAILED assert(begin->last_committed == last_committed)
Sage Weil
03:35 PM CephFS Bug #8964 (Resolved): kcephfs: client does not resend requests on mds restart
Sage Weil
03:13 PM CephFS Bug #8725 (Fix Under Review): mds crashed in upgrade:dumpling-x:stress-split-master-testing-basic...
https://github.com/ceph/ceph/pull/2254 Sage Weil
02:46 PM Cleanup #9106: ceph-authtool: Modifying user without --gen-key overwrites the key
Wasn't able to reproduce this after retrying. Maybe just a usage issue. John Wilkins
02:24 PM Cleanup #9106 (Resolved): ceph-authtool: Modifying user without --gen-key overwrites the key
If you are trying to modify a user's caps/permissions using ceph-authtool, and the user has an existing key, specifyi... John Wilkins
02:37 PM RADOS Feature #9108 (New): ceph auth get: Get multiple users
The "ceph auth get <user>" command with the -o option is an ideal way to create a keyring for an individual user. How... John Wilkins
02:37 PM Fix #8914: osd crashed at assert ReplicatedBackend::build_push_op
Hmm, most likely a bug in repair. We should start by creating a teuthology task which reproduces the bug. Once we h... Samuel Just
02:27 PM RADOS Feature #9107 (New): ceph-authtool: Delete a user.
Currently, there is no corresponding "delete" feature that allows a user to delete a user from a keyring. We should h... John Wilkins
02:25 PM Feature #8389 (Resolved): osd: clean up old ec objects more aggressively
Samuel Just
02:25 PM Feature #8480 (Resolved): modify scrub to detect/repair obsolete rollback objects
Samuel Just
02:15 PM CephFS Bug #9105 (New): ~ObjectCacher behaves poorly on EBLACKLISTED

In ceph master 78dc4df
http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-11_23:00:01-rbd-master-testing-bas...
John Spray
01:59 PM devops Feature #9103: create a (generic) webservice to handle Sphinx documentation versions

The calamari docs already include a version (albeit a rather verbose one including the git hash). I guess with a l...
John Spray
01:06 PM devops Feature #9103 (Resolved): create a (generic) webservice to handle Sphinx documentation versions
None of our docs allow a user to:
* Have a visual cue of what version of the docs they are seeing.
* be warned ...
Alfredo Deza
01:44 PM CephFS Bug #8962: kcephfs: client does not release revoked cap
... Sage Weil
01:19 PM CephFS Bug #8962: kcephfs: client does not release revoked cap
... Sage Weil
01:39 PM CephFS Bug #9101: multimds: unlinked file is not pruned from replica mds caches
looks like the problem is that another mds has the inode in its cache and isn't trimming it (or being asked to trim i... Sage Weil
01:13 PM CephFS Bug #9101 (Fix Under Review): multimds: unlinked file is not pruned from replica mds caches
https://github.com/ceph/ceph/pull/2250 Sage Weil
11:36 AM CephFS Bug #9101: multimds: unlinked file is not pruned from replica mds caches
Here is the debug data when using a ceph-fuse client.
We did reproduce the problem
Stephane Boisvert
11:15 AM CephFS Bug #9101 (New): multimds: unlinked file is not pruned from replica mds caches
as a result, deleted files stay pinned for a long time and space does not get removed. Sage Weil
01:35 PM Bug #9055 (Resolved): LibRadosTwoPoolsPP.HitSetWrite (and others) fail on remove of whiteout
Sage Weil
01:30 PM Bug #9052 (Resolved): ceph-mon crashes with *** Caught signal (Floating point exception) **
Sage Weil
12:38 PM CephFS Feature #9029 (Resolved): min/max uid for snapshot creation
Sage Weil
11:59 AM Bug #9102 (Resolved): ceph-disk has undefined variables
We fail to track them because the build doesn't yell at us. In the meantime, those should be fixed.... Alfredo Deza
10:46 AM Bug #9096 (Resolved): OSD::require_same_peer_instance fails to acquire lock
Sage Weil
10:23 AM Bug #9096 (Fix Under Review): OSD::require_same_peer_instance fails to acquire lock
https://github.com/ceph/ceph/pull/2249 Samuel Just
03:38 AM Bug #9096: OSD::require_same_peer_instance fails to acquire lock
It is the cause of http://tracker.ceph.com/issues/9074 Loïc Dachary
03:37 AM Bug #9096 (Resolved): OSD::require_same_peer_instance fails to acquire lock
It can be reproduced by running a few times (less than 5) *qa/workunits/cephtool/test.sh -t mon_osd*. It will eventua... Loïc Dachary
10:33 AM Bug #9082 (Resolved): Ceph Firefly 0.80.5 : PG has invalid (post-split) stats; must scrub before ...
Sage Weil
09:11 AM Bug #9082: Ceph Firefly 0.80.5 : PG has invalid (post-split) stats; must scrub before tier agent ...
i've pushed wip-9082-firefly... can you please try this and see if it avoids the crash? i was looking for a divide b... Sage Weil
08:34 AM Bug #9082: Ceph Firefly 0.80.5 : PG has invalid (post-split) stats; must scrub before tier agent ...
Hello Sage
Thanks for your time checking this bug. As requested, I have found some PGs and 3 OSDs which are making...
karan singh
08:24 AM Bug #9082: Ceph Firefly 0.80.5 : PG has invalid (post-split) stats; must scrub before tier agent ...
Hello Sage
I have found some PGs/OSDs that make agent_choose_mode() unhappy. I am attaching logs of 2 differen...
karan singh
09:22 AM Feature #9097 (New): request for tools/commands to see hits/misses on cache pools
request for tools/commands to see hits/misses on cache pools Sheldon Mustard
07:23 AM Bug #9085 (Resolved): erasure-code: ISA plugin does not load
The isa plugin "wip-firefly-isa":https://github.com/ceph/ceph/tree/wip-firefly-isa does not have the bug. It was intr... Loïc Dachary
03:39 AM devops Bug #9074 (Duplicate): gitbuilder: make check does not complete, sometimes
It happens because of http://tracker.ceph.com/issues/9096 Loïc Dachary
01:57 AM devops Bug #9074: gitbuilder: make check does not complete, sometimes
Wrong diagnosis; the error is not from here. It loops while waiting for osds to come back up "a few lines below":htt... Loïc Dachary
01:02 AM devops Bug #9074: gitbuilder: make check does not complete, sometimes
"test.sh":https://github.com/ceph/ceph/blob/ea731ae14216bb479eff1f86ed6bd4a7cb71fb56/qa/workunits/cephtool/test.sh fa... Loïc Dachary
03:17 AM rbd Bug #9078: Removing an RBD is very slow whenever there is write's in other RBD which also belongs...
The RBDs are created with different order parameters
Ramakrishnan P
02:00 AM rbd Bug #9078: Removing an RBD is very slow whenever there is write's in other RBD which also belongs...
The setup is not available, so I am unable to check "ceph -w"; below is information based on the I/O tool (fio)
before rbd remove: io...
Ramakrishnan P
12:27 AM Bug #9077: Cluster is up in MON node even if Ceph is uninstalled in OSD node
Mon logs and dmesg logs of mon node are attached Ramakrishnan P
12:14 AM rbd Bug #9075: Can't create a version 2 images on RHEL 7
Ok will do :). Sébastien Han

08/12/2014

10:51 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
I can't reproduce.
From your messages, I can't find any error info.
Or am I missing something?
jianpeng ma
10:28 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Yeah I'm using that command.
Sorry - messed up the commit hash : 4eb18dd487da4cb621dcbecfc475fc0871b356ac
Mark Kirkwood
10:23 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Are you using this command "ceph-osd --id 0 --mkjournal --mkfs --osd-data /data1/cephdata --osd-journal /dev/sdc1"?
...
jianpeng ma
10:10 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Building wip-9073. Hmmm still getting the invalid argument error and osd down. I'm guessing this means there are two ... Mark Kirkwood
09:01 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Very quick work! Will test... Mark Kirkwood
08:47 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Mark, I've pushed this as wip-9073.. can you please test?
Thanks, Jianpeng! Sorry I missed the pull request earlier!
Sage Weil
08:36 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Yes, I already found this bug. It occurs if the journal uses aio mode.
The https://github.com/ceph/ceph/pull/2185 c...
jianpeng ma
08:28 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
When you say that reverting fixes it, do you mean that it allows an OSD that was erroring out on start to then start,... Sage Weil
06:31 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
...or maybe the ::open() Mark Kirkwood
06:14 PM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
On a different machine instead of a hang I am reliably getting:
2014-08-13 12:50:28.253439 7ffc701bb8c0 -1 ** ERR...
Mark Kirkwood
01:40 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
A correction - the 'stuck on a mutex' comment is completely wrong - sorry - I'd attached strace to the ceph-osd proces... Mark Kirkwood
09:03 PM Feature #8560 (Pending Backport): mon: instrument paxos
Sage Weil
06:27 PM Bug #8886: Miss some folders in PG's folder
I see. Thank you for your reply~ Jingjing Zhao
01:43 PM Bug #8886 (Closed): Miss some folders in PG's folder
./default.4281.322\u\ushadow\u.Ndfi3nAmRHjph\uXyzjJQutltgGi1Dkd\u1__head_17F630A2__1b_ffffffffffffffff_7
appears t...
Samuel Just
06:18 PM Bug #9067 (Resolved): (wip-objecter) Objecter assertion in SIGINT handler
... John Spray
04:43 PM Bug #8894: osd/ReplicatedPG.cc: 9281: FAILED assert(object_contexts.empty())
Samuel Just
04:20 PM Bug #8894 (Resolved): osd/ReplicatedPG.cc: 9281: FAILED assert(object_contexts.empty())
Samuel Just
12:19 PM Bug #8894: osd/ReplicatedPG.cc: 9281: FAILED assert(object_contexts.empty())
Samuel Just
12:19 PM Bug #8894: osd/ReplicatedPG.cc: 9281: FAILED assert(object_contexts.empty())
wip-9054 Samuel Just
11:25 AM Bug #8894: osd/ReplicatedPG.cc: 9281: FAILED assert(object_contexts.empty())
I think it's the C_Copyfrom which we gave the objecter in _copy_some. It's got a CopyOpRef. Samuel Just
04:34 PM Bug #9054: ceph_test_rados: FAILED assert(!old_value.deleted())
This sounds right to me! Sage Weil
03:58 PM Bug #9082 (Need More Info): Ceph Firefly 0.80.5 : PG has invalid (post-split) stats; must scrub b...
Sage Weil
10:57 AM Bug #9082: Ceph Firefly 0.80.5 : PG has invalid (post-split) stats; must scrub before tier agent ...
I have injected debug osd 20 into one OSD, and then tried to initiate rados bench on an EC pool which is tiered with c... karan singh
09:31 AM Bug #9082: Ceph Firefly 0.80.5 : PG has invalid (post-split) stats; must scrub before tier agent ...
can you reproduce this with debug osd = 20 and attach the log? thanks! Sage Weil
08:27 AM Bug #9082: Ceph Firefly 0.80.5 : PG has invalid (post-split) stats; must scrub before tier agent ...

I have sent an email to the ceph mailing list today, which is related to a problem with a Ceph pool. ...
karan singh
07:59 AM Bug #9082 (Resolved): Ceph Firefly 0.80.5 : PG has invalid (post-split) stats; must scrub before ...
Hello
Ceph version : 0.80.5
Centos 6.5
Features in use : erasure coding and cache tiering
A few hours back m...
karan singh
03:48 PM Bug #9064 (Resolved): RadosModel assertion failure
Samuel Just
03:48 PM Bug #9064 (Pending Backport): RadosModel assertion failure
Sage Weil
03:26 PM Bug #9064: RadosModel assertion failure
Samuel Just
03:26 PM Bug #9064: RadosModel assertion failure
wip-9064 Samuel Just
03:25 PM Bug #9064: RadosModel assertion failure
Got it: 0ed3adc1e0a74bf9548d1d956aece11f019afee0
We're redirecting RW ordered reads due to the second read promote...
Samuel Just
02:00 PM Bug #9064: RadosModel assertion failure

I've now seen this in a case where the client wasn't in the process of handling a new OSD map (but the server was),...
John Spray
05:17 AM Bug #9064: RadosModel assertion failure
This just reproduced on master 78dc4df, so looks like it's not wip-objecter specific. John Spray
03:24 PM Messengers Bug #8880 (Resolved): msg/Pipe.cc: 1538: FAILED assert(0 == "old msgs despite reconnect_seq featu...
Sage Weil
03:18 PM Bug #8860 (Resolved): ceph-disk issues with custom cluster name
Sage Weil
12:21 PM Bug #8860 (Pending Backport): ceph-disk issues with custom cluster name
Sage Weil
03:16 PM Bug #8625 (Resolved): EC pool - OSD creates an empty file for op with 'create 0~0, writefull 0~xx...
Sage Weil
03:11 PM rgw Bug #8539 (Resolved): civetweb backend responds with a body when a HEAD request yields an error
Sage Weil
03:02 PM Bug #8982 (Resolved): cache pool osds crashing when data is evicting to underlying storage pool
Sage Weil
03:02 PM Bug #8714 (Resolved): we do not block old clients from breaking cache pools
Sage Weil
03:01 PM Bug #8944 (Resolved): Ceph daemon bad asok used in connection with cluster
Sage Weil
02:59 PM Bug #9080 (Resolved): LogClient: sends dup messages, misses some
Sage Weil
01:15 PM Bug #9080 (Pending Backport): LogClient: sends dup messages, misses some
Sage Weil
07:02 AM Bug #9080 (Resolved): LogClient: sends dup messages, misses some
noticed that 'ceph -s' wouldn't show the most recent log message. tracing things, it turns out that it was alw... Sage Weil
02:58 PM Bug #9022 (Resolved): Potential lock leaks in RadosClient
Sage Weil
02:57 PM Bug #7999 (Resolved): osd: pgs share info that hasn't been persisted
Sage Weil
02:57 PM rgw Bug #8169 (Resolved): rgw: swift user manifest does not compute etag
Sage Weil
02:56 PM rgw Bug #8269 (Resolved): rgw: corrupted multipart object
Sage Weil
02:56 PM Bug #8438 (Resolved): erasure code: object are not cleanup
Sage Weil
02:56 PM rgw Bug #8442 (Resolved): rgw: does not detect/adapt to erasure pool stripe size
Sage Weil
02:56 PM rgw Bug #8586 (Resolved): Missing Swift API Header causes RadosGW to segfault
Sage Weil
02:55 PM rbd Bug #8912 (Resolved): librbd segfaults when creating new image (rbd-ephemeral-clone-stable-icehouse)
Sage Weil
12:38 PM rbd Bug #8912 (Pending Backport): librbd segfaults when creating new image (rbd-ephemeral-clone-stabl...
Sage Weil
02:54 PM Bug #8670 (Resolved): Cache tiering parameters can not be displayed for a pool
Sage Weil
02:48 PM Bug #8696 (Resolved): mon: 'osd pool set' must take into account pool's nature when setting some ...
Sage Weil
02:48 PM Bug #8701 (Resolved): osd: scrub found obsolete rollback obj
Sage Weil
02:47 PM rgw Bug #8702 (Resolved): RadosGW incorrectly converting + to space in URLs
Sage Weil
02:46 PM Bug #8733 (Resolved): OSD crashed at void ECBackend::handle_sub_read
Sage Weil
02:39 PM Bug #8882 (Resolved): osd: osd tier remove ... leaves incomplete clones behind, confusing scrub
Sage Weil
02:39 PM Bug #8889 (Resolved): osd/ReplicatedPG.cc: 5162: FAILED assert(got)
Sage Weil
02:38 PM rbd Bug #8920 (Resolved): rbd/singleton/{all/formatted-output.yaml} fails on trusty due to whitespace
Sage Weil
02:38 PM rgw Bug #8928 (Resolved): rgw: bad object created if stripe size is not a multiple of chunk size
Sage Weil
02:38 PM Bug #8931 (Resolved): failed write reply order from ceph_test_rados
Sage Weil
02:37 PM rgw Bug #8937 (Resolved): rgw: broken large(-ish) objects
Sage Weil
02:37 PM Bug #8943 (Resolved): "ceph df" cannot show pool available space correctly
Sage Weil
02:37 PM Bug #8969 (Resolved): PerfCounters.SinglePerfCounters failure on i386
Sage Weil
02:37 PM rgw Bug #8972 (Resolved): rgw: bucket index log wrong object name in multipart completion
Sage Weil
02:34 PM Bug #9085 (Pending Backport): erasure-code: ISA plugin does not load
Sage Weil
09:46 AM Bug #9085 (Fix Under Review): erasure-code: ISA plugin does not load
"need review":https://github.com/ceph/ceph/pull/2245 Loïc Dachary
09:20 AM Bug #9085 (Resolved): erasure-code: ISA plugin does not load
Because the plugin was not compiled with ErasureCode.cc Loïc Dachary
02:07 PM devops Bug #8160 (Duplicate): multipath-tools does not co-exist with ceph
If/when we implement multipath support in ceph-deploy, this should be resolved. Ian Colle
01:43 PM rgw Bug #9089 (Resolved): rgw: copy_obj_data() does not stripe target object
copy_obj_data() as it is now is reminiscent of a very old architecture. It should be modified to create a striped o... Yehuda Sadeh
01:36 PM Bug #8591 (Resolved): ceph-disk incorrectly colocates journal when using dm-crypt
wip-ceph-disk Sage Weil
01:35 PM Bug #8922: ceph-deploy mon create fails to create additional monitoring nodes.
does 'hostname' on those machines return the same string, or does it include a domain name, or something different? Sage Weil
01:34 PM Bug #8985: "[WRN] map e9 wrongly marked me down" in upgrade:dumpling-x-firefly---basic-vps suite
change the vps.yaml timeout to 90 seconds instead of 40.. these should go away then Sage Weil
01:33 PM Bug #8986 (Duplicate): "[WRN] map e62 wrongly marked me down" in upgrade:dumpling-x-firefly---bas...
Samuel Just
01:33 PM Bug #9012 (Duplicate): "[WRN] map e277 wrongly marked me down" in upgrade:dumpling-x-firefly---ba...
Samuel Just
01:32 PM Bug #9011 (Duplicate): osd memory leaks on next
#9023 Sage Weil
01:27 PM devops Bug #9061 (Resolved): dumpling to firefly upgrade on RH6 restarts the daemons
Sage Weil
01:26 PM Bug #8974 (Need More Info): osd crashed with merge_log assert due to removal of isds
Samuel Just
01:25 PM Bug #8974: osd crashed with merge_log assert due to removal of isds
We can probably make some progress if you reproduce with
debug ms = 1
debug osd = 20
debug filestore = 20
on ...
Samuel Just
01:14 PM Bug #8505 (Resolved): OSD osd/OSD.cc: 6222: FAILED assert(p->second.empty())
Samuel Just
01:13 PM Bug #8691 (Resolved): osd: PG::_lock, OSD::pg_map_lock lock cycle
Samuel Just
01:10 PM Bug #8939 (Duplicate): stalled LibRadosTwoPoolsPP.TryFlushReadRace; client failed to reconnect?
#8891 Sage Weil
01:09 PM Bug #8940 (Duplicate): 3.22s1 shard 0(2) missing ad166f62/benchmark_data_plana57_30491_object1036...
Sage Weil
01:06 PM Bug #9069 (Resolved): rgw tests reported as failed in teuthology-2014-08-11_10:35:04-upgrade:dump...
Sage Weil
12:43 PM rgw Bug #8784: rgw: completion leak
Note that all the failures are at the copy object across regions path. I did find a missing cleanup at the error hand... Yehuda Sadeh
10:53 AM Bug #9058: rest-api: long-running process may fail 'tell osd...' due to stale osdmap
ubuntu@teuthology:/a/teuthology-2014-08-10_02:30:01-rados-next-testing-basic-plana/412468 Sage Weil
10:08 AM Bug #9087 (Can't reproduce): ceph_test_rados_list_parallel hang
... Sage Weil
09:09 AM rbd Bug #6631 (Need More Info): disabling writethrough until flush appears to disable RBD cache
Amit Vijairania wrote:
> More repetition of tests..
>
> // IOPS for Sequential 4KB Write _with_ "rbd cache writet...
Sage Weil
09:07 AM rbd Bug #9078 (Need More Info): Removing an RBD is very slow whenever there is write's in other RBD w...
it sounds like the cluster is just under heavy load. can you confirm how many ops ceph -w shows before and during th... Sage Weil
05:09 AM rbd Bug #9078 (Rejected): Removing an RBD is very slow whenever there is write's in other RBD which a...
Configuration:
3 nodes with mons and 3 nodes with OSDs connected via enclosure/JBOD, 15 OSDs in total
Steps followed:
...
Ramakrishnan P
09:07 AM Feature #9083 (Closed): Standalone script to generate Ceph keys
Goal: To allow 3rd party products which will be acting as Ceph clients to be able to install & configure all Ceph-cli... Neil Levine
09:04 AM Bug #9077 (Need More Info): Cluster is up in MON node even if Ceph is uninstalled in OSD node
can you turn up mon logging (if it isn't up already) and attach the log from the leader? these should get marked dow... Sage Weil
04:49 AM Bug #9077 (Can't reproduce): Cluster is up in MON node even if Ceph is uninstalled in OSD node
Configuration:
1 mon node and 1 osd node, number of OSDs: 7
Steps followed:
1. Make Cluster up in single node and e...
Ramakrishnan P
09:00 AM rbd Bug #8845 (Resolved): Flattening Clones of clone, results in command failure
Josh Durgin
09:00 AM rbd Bug #9075 (Need More Info): Can't create a version 2 images on RHEL 7
can you retry with the ceph.com package? the 0.81 from fedora is all kinds of busted. Sage Weil
02:45 AM rbd Bug #9075 (Resolved): Can't create a version 2 images on RHEL 7
Hi,
I can't create version 2 images, version 1 works though.
# rbd create -s 10240 --image-format 2 lesebb
20...
Sébastien Han
08:56 AM Bug #8595 (In Progress): osd: client op blocks until backfill starts (dumpling)
with this patch, i see filestore tripping over ENOENT on clone:
ubuntu@teuthology:/a/teuthology-2014-08-11_19:00:0...
Sage Weil
07:35 AM rgw Bug #9002: Creating swift key with --gen-secret in separate step from subuser creation fails
I have hit this on Wheezy and Ubuntu with Ceph 0.80.5 too.
It succeeds when using:
radosgw-admin user create --su...
only debian
07:31 AM CephFS Bug #9056: fuse kmod + ceph-fuse triggers "BUG: sleeping function called from invalid context"
... John Spray
06:51 AM CephFS Bug #9056 (Resolved): fuse kmod + ceph-fuse triggers "BUG: sleeping function called from invalid ...
Sage Weil
05:10 AM CephFS Bug #9056: fuse kmod + ceph-fuse triggers "BUG: sleeping function called from invalid context"
This is supposed to be fixed upstream in v3.16-rc6 by commit c55a01d360af, will close this when we've seen a clean fs... John Spray
07:20 AM Bug #9044: erasure-code: use ruleset instead of ruleid
"backport to firefly":https://github.com/ceph/ceph/pull/2244 Loïc Dachary
05:58 AM Bug #9044 (Pending Backport): erasure-code: use ruleset instead of ruleid
Loïc Dachary
05:57 AM Bug #9044 (Resolved): erasure-code: use ruleset instead of ruleid
Loïc Dachary
05:55 AM Bug #9044: erasure-code: use ruleset instead of ruleid
Works. The problems of this run are
* "unrelated MDS decode bug":http://pulpito.ceph.com/loic-2014-08-12_10:00:07-...
Loïc Dachary
12:58 AM Bug #9044: erasure-code: use ruleset instead of ruleid
"scheduled upgrade:firefly-x:stress-split":http://pulpito.ceph.com/loic-2014-08-12_10:00:07-upgrade:firefly-x:stress-... Loïc Dachary
06:56 AM CephFS Bug #8648: Standby MDS leaks memory over time
Any chance you can run one of these in standby under massif for a while? that will tell us what is leaking! Sage Weil
06:55 AM CephFS Bug #8651 (Won't Fix): crashing mds in an active-active mds setup
this MDS got blacklisted. there is an open issue somewhere to make the shutdown more friendly, but the behavior is ... Sage Weil
06:52 AM Bug #9023: valgrind failures in OSD
The leaks in the init stuff seem likely also to be present on master John Spray
06:50 AM CephFS Bug #8725: mds crashed in upgrade:dumpling-x:stress-split-master-testing-basic-plana
we probably have to do a reencoding trick like we do in MOSDMap? Sage Weil
06:48 AM CephFS Bug #8876 (Resolved): kcephfs: hang on read of length 0
Sage Weil
06:22 AM Bug #9079 (Resolved): osd: bad learned_addr during send_boot
... Sage Weil
06:10 AM Bug #8520: osd: segv in PushOp::print()
... Sage Weil
03:27 AM rbd Bug #8385: RBD / QEMU Crash: Invalid fastbin entry (free)
Any interest in a lookalike bug from Cuttlefish?
/lib/x86_64-linux-gnu/libc.so.6(+0x7e566)[0x7f7cd15ad566]
/usr/...
Andrey Korolyov
02:55 AM rbd Bug #9076 (Resolved): Can't completely remove a version 1 image on RHEL 7
I can create a version 1 image; however, the deletion is not complete.
# rbd create -s 10240 --image-format 1 leseb
...
Sébastien Han
12:54 AM devops Bug #9074: gitbuilder: make check does not complete, sometimes
"re-run the build to check if it fails always or sometimes":http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-... Loïc Dachary
12:52 AM devops Bug #9074 (Duplicate): gitbuilder: make check does not complete, sometimes
It looks like the i386 build fails because a timeout interrupts it before it gets a chance to complete.
It could be t...
Loïc Dachary

08/11/2014

09:15 PM Bug #9073 (Resolved): OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Using a src build (and the packages built from it) on Ubuntu 14.04 x86_64. Ceph version is 0.83-399-gf77449c.
In ...
Mark Kirkwood
08:53 PM rbd Bug #9071 (Duplicate): mkfs.ext4 stuck in D state on RBD with kernel client
This is a bug in 3.15; it is not present in 3.14. The fix will make it into the next stable 3.15 release soon.
Sage Weil
07:32 PM rbd Bug #9071: mkfs.ext4 stuck in D state on RBD with kernel client
Please, mark this issue as duplicate of http://tracker.ceph.com/issues/8818 Ivan Mironov
06:06 PM rbd Bug #9071: mkfs.ext4 stuck in D state on RBD with kernel client
Reproducible on all my ceph hosts (all with the same kernel), with any image format (1 or 2). But only with mkfs.ext4... Ivan Mironov
05:47 PM rbd Bug #9071 (Duplicate): mkfs.ext4 stuck in D state on RBD with kernel client
I tried to create ext4 on a newly created and mapped RBD image, but mkfs.ext4 got stuck:
# mkfs.ext4 /dev/rbd/docker.rbd...
Ivan Mironov
06:15 PM Documentation #8955 (Resolved): doc refers to [default] section, don't think it exists
's/[default]/[global]/' John Wilkins
06:10 PM Documentation #8955 (In Progress): doc refers to [default] section, don't think it exists
John Wilkins
06:05 PM devops Bug #8734 (Resolved): EPEL / Ceph.com package priority issues
I added priority=2 to the get packages document example for ceph.repo. I also added an install yum-priorities series o... John Wilkins
05:56 PM devops Bug #8734 (In Progress): EPEL / Ceph.com package priority issues
John Wilkins
05:51 PM Bug #9072: error setting 'mon_pg_warn_min_objects' to '10K': (22) Invalid argument
ubuntu@teuthology:/a/sage-2014-08-10_18:40:12-rados-firefly-next-distro-basic-multi/414556 Sage Weil
05:50 PM Bug #9072 (Resolved): error setting 'mon_pg_warn_min_objects' to '10K': (22) Invalid argument
... Sage Weil
05:25 PM Bug #9069: rgw tests reported as failed in teuthology-2014-08-11_10:35:04-upgrade:dumpling:rgw-du...
oh.. it's not running as root.. or with daemon-helper. Sage Weil
05:24 PM Bug #9069: rgw tests reported as failed in teuthology-2014-08-11_10:35:04-upgrade:dumpling:rgw-du...
7585 ? Sl 0:05 radosgw -n client.0 -k /etc/ceph/ceph.client.0.keyring --rgw-socket-path /home/ubuntu/ceph... Sage Weil
03:57 PM Bug #9069 (Resolved): rgw tests reported as failed in teuthology-2014-08-11_10:35:04-upgrade:dump...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-11_12:05:02-upgrade:dumpling-dumpling---basic-vps/... Yuri Weinstein
04:58 PM rbd Bug #8912 (Fix Under Review): librbd segfaults when creating new image (rbd-ephemeral-clone-stabl...
https://github.com/ceph/ceph/pull/2239 Josh Durgin
01:39 PM rbd Bug #8912: librbd segfaults when creating new image (rbd-ephemeral-clone-stable-icehouse)
Looks like it was a race condition in a previously little-used error path. Josh Durgin
01:04 PM rbd Bug #8912 (In Progress): librbd segfaults when creating new image (rbd-ephemeral-clone-stable-ice...
Excellent report, your reproducer causes the same crash for me. Josh Durgin
04:14 PM Bug #9044: erasure-code: use ruleset instead of ruleid
gitbuilder is running Loïc Dachary
03:32 PM Bug #9054: ceph_test_rados: FAILED assert(!old_value.deleted())
wip-9054 Samuel Just
03:07 PM Bug #9054: ceph_test_rados: FAILED assert(!old_value.deleted())
When we go to flush clone 22, all we know is that 22 is dirty, has snaps
[21], and 4 is clean. As part of fl...
Samuel Just
02:24 PM Bug #9054: ceph_test_rados: FAILED assert(!old_value.deleted())
Ok, we start with the following configuration in the cache (all dirty):
30:[29,21,20,15,10,4]:[22(21), 15(15,10), ...
Samuel Just
12:45 PM Bug #9054: ceph_test_rados: FAILED assert(!old_value.deleted())
Actually, looks like this might already be handled correctly, re-consulting the log. Samuel Just
12:00 PM Bug #9054: ceph_test_rados: FAILED assert(!old_value.deleted())
Thinking Samuel Just
11:52 AM Bug #9054: ceph_test_rados: FAILED assert(!old_value.deleted())
Hmm, I think the bug is like this:
Normally, if we get the following op sequence:
- write 1:[]
- delete 10:[3] (...
Samuel Just
03:16 PM Bug #9040: clients can SEGV during package upgrade
I see no segmentation errors in the latest run: /a/teuthology-2014-08-11_12:05:02-upgrade:dumpling-dumpling---basic-v... Yuri Weinstein
12:34 PM rgw Bug #8539: civetweb backend responds with a body when a HEAD request yields an error
Merged, commit:0a2b4c25541bbd15776d3d35986518e37166910f Yehuda Sadeh
12:34 PM rgw Bug #8539 (Pending Backport): civetweb backend responds with a body when a HEAD request yields an...
Yehuda Sadeh
12:24 PM Bug #9064: RadosModel assertion failure

The bug is happening when a new OSD map is received in the middle of the series of transactions. The read transact...
John Spray
11:40 AM Bug #9064: RadosModel assertion failure
Got an even more specific backtrace ... John Spray
09:57 AM Bug #9064: RadosModel assertion failure
trying to reproduce locally with objecter logging turned up and ``ms inject socket failures`` enabled as it is in the... John Spray
09:29 AM Bug #9064: RadosModel assertion failure
I understand this a little better now: the operations in this WriteOp are 1,2 (writes), 4 (setxattr), 5 (read). So t... John Spray
07:23 AM Bug #9064: RadosModel assertion failure
http://qa-proxy.ceph.com/teuthology/john-2014-08-10_02:14:59-rados-wip-mds-contexts-testing-basic-plana/411119/teutho... John Spray
07:22 AM Bug #9064 (Resolved): RadosModel assertion failure

http://qa-proxy.ceph.com/teuthology/john-2014-08-10_02:14:59-rados-wip-mds-contexts-testing-basic-plana/411119/teut...
John Spray
10:41 AM Bug #9057 (Resolved): mark_down from fast dispatch can deadlock
Sage Weil
09:57 AM rgw Subtask #9068 (Closed): rgw: add rgw setup to vstart
As part of the development documentation we need to update vstart to create a RadosGW development environment. Luis Pabon
09:53 AM Bug #9067 (Resolved): (wip-objecter) Objecter assertion in SIGINT handler

@ wip-mds-contexts 2550fc51f30a8a1e581dd9a90511732a3b70ad2a
When I start a "ceph status" while no mon is running...
John Spray
09:01 AM devops Bug #9066 (Rejected): Need ceph-deploy to be able to run to JUST generate ceph.conf and keyring w...
Mirror of issue: https://bugzilla.redhat.com/show_bug.cgi?id=1127852 Alfredo Deza
08:37 AM Bug #9065 (Resolved): LibRados* tests failed in upgrade:dumpling-x-firefly---basic-vps
This should be fixed by https://github.com/ceph/ceph/pull/2236 (in review)
Logs are in http://qa-proxy.ceph.com/te...
Yuri Weinstein
08:33 AM devops Bug #9032 (Rejected): ceph-deploy over proxy
The `--gpg-url` is only valid if you are pointing to a custom repo.
What you need to do is create a custom repo se...
Alfredo Deza
08:28 AM Feature #8580: Decrease disk thread's IO priority and/or make it configurable
Hi,
The backport to dumpling is missing the commit which provides the new configurable: https://github.com/ceph/ce...
Dan van der Ster
05:04 AM Bug #9062: Mon segfault in waitlist_or_zap_client
Note that this was wip-mds-clients which doesn't have any messenger changes and doesn't have any mon changes other th... John Spray
05:01 AM Bug #9062 (Resolved): Mon segfault in waitlist_or_zap_client

http://pulpito.front.sepia.ceph.com/john-2014-08-10_02:14:59-rados-wip-mds-contexts-testing-basic-plana/411054/
...
John Spray
04:37 AM Bug #9023: valgrind failures in OSD

Haven't seen the "new Session" one since rebasing on master, so I'm optimistic that it was the same thing as the le...
John Spray
04:09 AM CephFS Bug #8878 (In Progress): mds lock cycle (wip-objecter)
I think all these are OK now in wip-mds-contexts: remaining failures on that branch are all outside MDS. John Spray
04:09 AM Bug #9009 (Resolved): (wip-objecter) ObjectCacher assert in fs client
This is all good now in wip-mds-contexts (http://pulpito.ceph.com/john-2014-08-09_14:56:53-fs-wip-mds-contexts-testin... John Spray

08/10/2014

11:43 PM devops Bug #9061 (Resolved): dumpling to firefly upgrade on RH6 restarts the daemons
Hi,
When I upgrade the RPMs on a RH6 server from 0.67.9 to 0.80.5, the daemons are (cond)restarted. I believe these ...
Dan van der Ster
07:20 PM Linux kernel client Bug #8806: libceph: must use new tid when watch is resent
meanwhile, the MWatchNotify message now has a return value encoded at the end (s32) when header.version >= 0. See wi... Sage Weil
07:19 PM Linux kernel client Bug #8806: libceph: must use new tid when watch is resent
the bug is with the kernel client: it needs to use a new tid when resending the watch. this was partially fixed on t... Sage Weil
05:04 PM Bug #9057 (Fix Under Review): mark_down from fast dispatch can deadlock
https://github.com/ceph/ceph/pull/2238 Sage Weil
10:45 AM Bug #9057: mark_down from fast dispatch can deadlock
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-08-09_14:13:44-rados-next-testing-basic-multi/410713
3 (!...
Sage Weil
08:41 AM Bug #9057 (Resolved): mark_down from fast dispatch can deadlock
... Sage Weil
04:13 PM Feature #8639 (In Progress): mon: dispatch messages while blocked waiting for IO
Sage Weil
03:45 PM Bug #8620: rest/test.py occasional failure (dumpling)
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-08-10_13:22:17-rados-dumpling-distro-basic-multi/413788 Sage Weil
02:07 PM Feature #8560 (Fix Under Review): mon: instrument paxos
Sage Weil
12:51 PM rgw Bug #8988 (Fix Under Review): AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
Two consecutive runs with the increased timeout do not show the bug ("one":http://pulpito.ceph.com/loic-2014-08-10_15:... Loïc Dachary
02:03 AM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
In a few tickets it is suggested that this may be an idle timeout problem. I "rescheduled a suite":http://pulpito.cep... Loïc Dachary
01:31 AM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
In the attached file, each part separated with *-----------------------------* is the output between the last success... Loïc Dachary
01:09 AM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
The errors for each failure are different and suggest the tests are failing for an independent reason such as the cl... Loïc Dachary
01:03 AM rgw Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
* http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-ba... Loïc Dachary
12:46 PM Bug #9055 (Fix Under Review): LibRadosTwoPoolsPP.HitSetWrite (and others) fail on remove of whiteout
https://github.com/ceph/ceph/pull/2236 Sage Weil
11:05 AM Feature #9059 (Resolved): osd: store opportunistic whole-object checksum
when we deep scrub, we have whole-object checksums that cover data and omap. store a copy in object_info_t, along ... Sage Weil
10:52 AM Bug #8935: operations not idempotent when enabling cache
sage-2014-08-09_14:13:44-rados-next-testing-basic-multi/410527 and 410528 Sage Weil
10:51 AM Bug #9058 (Can't reproduce): rest-api: long-running process may fail 'tell osd...' due to stale o...
sage-2014-08-09_14:13:44-rados-next-testing-basic-multi/410524 Sage Weil
10:48 AM Bug #8894: osd/ReplicatedPG.cc: 9281: FAILED assert(object_contexts.empty())
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-08-09_14:13:44-rados-next-testing-basic-multi/410806
alwa...
Sage Weil
02:16 AM CephFS Bug #8725: mds crashed in upgrade:dumpling-x:stress-split-master-testing-basic-plana
"same error":http://pulpito.ceph.com/loic-2014-08-10_09:59:49-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping... Loïc Dachary
12:53 AM CephFS Bug #8725: mds crashed in upgrade:dumpling-x:stress-split-master-testing-basic-plana
Another "similar crash":http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chun... Loïc Dachary
12:39 AM CephFS Bug #8725: mds crashed in upgrade:dumpling-x:stress-split-master-testing-basic-plana
And the same trace at "upgrade:firefly-x:stress-split":http://pulpito.ceph.com/loic-2014-08-08_12:13:20-upgrade:firef... Loïc Dachary
12:33 AM CephFS Bug #8725: mds crashed in upgrade:dumpling-x:stress-split-master-testing-basic-plana
Looks like a similar problem at "upgrade:firefly-x:stress-split":http://pulpito.ceph.com/loic-2014-08-08_12:13:20-upg... Loïc Dachary
01:04 AM Feature #9025: erasure-code: chunk remapping
The upgrade suite from firefly had one error related to an independent "MDS problem":http://pulpito.ceph.com/loic-201... Loïc Dachary
12:49 AM Feature #8496 (Resolved): erasure-code: ErasureCode base class
Loïc Dachary
12:41 AM Feature #8496: erasure-code: ErasureCode base class
The "upgrade:firefly-x:stress-split":http://pulpito.ceph.com/loic-2014-08-08_12:13:20-upgrade:firefly-x:stress-split-... Loïc Dachary
12:16 AM Bug #8978: ceph ping not working as expected
I'm experiencing the same (on newly installed ceph-cluster via Ubuntu server 14.04.1):
ceph status
cluster b6...
Kees Boogert

08/09/2014

10:55 PM rbd Bug #8000: SLAB: Unable to allocate memory on node 0
Unfortunately converting RBD to image format 2 did not fix it. User returned after being away for a week and her syst... Dmitry Smirnov
05:50 PM CephFS Bug #9056: fuse kmod + ceph-fuse triggers "BUG: sleeping function called from invalid context"

http://pulpito.front.sepia.ceph.com/john-2014-08-09_14:56:53-fs-wip-mds-contexts-testing-basic-plana/409236/
http:...
John Spray
05:48 PM CephFS Bug #9056 (Resolved): fuse kmod + ceph-fuse triggers "BUG: sleeping function called from invalid ...

kernel 5f740d7e1531099b888410e6bab13f68da9b1a4d
wip-mds-contexts (aka wip-objecter) 7be59771bff09e2b46b5467627cb...
John Spray
12:53 PM Bug #9055 (Resolved): LibRadosTwoPoolsPP.HitSetWrite (and others) fail on remove of whiteout
2014-08-09T09:03:14.670 INFO:tasks.workunit.client.0.plana70.stdout:test/librados/TestCase.cc:93: Failure
2014-08-09...
Sage Weil
12:26 PM Bug #9054: ceph_test_rados: FAILED assert(!old_value.deleted())
2014-08-08 10:55:12.312751 7f1237847700 10 osd.0 pg_epoch: 462 pg[2.1( v 462'2839 (0'0,462'2839] local-les=422 n=53 e... Sage Weil
10:04 AM Bug #9054: ceph_test_rados: FAILED assert(!old_value.deleted())
almost there. on osd.0, we finish trimming 14a here:
2014-08-08 10:55:12.311901 7f1237847700 10 osd.0 pg_epoch: 4...
Sage Weil
11:43 AM Bug #8894: osd/ReplicatedPG.cc: 9281: FAILED assert(object_contexts.empty())
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-08-08_22:30:19-rados-wip-sage-testing-testing-basic-burnupi/... Sage Weil
01:39 AM Bug #9044 (Fix Under Review): erasure-code: use ruleset instead of ruleid
"associated pull request":https://github.com/ceph/ceph/pull/2232 Loïc Dachary

08/08/2014

11:00 PM Bug #9054 (Resolved): ceph_test_rados: FAILED assert(!old_value.deleted())
ubuntu@teuthology:/a/teuthology-2014-08-06_02:30:01-rados-next-testing-basic-plana/403383... Sage Weil
10:58 PM Bug #8997: ceph_test_rados_watch_notify hangs
ubuntu@teuthology:/a/teuthology-2014-08-06_02:30:01-rados-next-testing-basic-plana/402968 Sage Weil
10:55 AM Bug #8997: ceph_test_rados_watch_notify hangs
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-08-06_02:30:01-rados-next-testing-basic-plana/402968 Sage Weil
10:54 PM Bug #9053 (Resolved): mon/Paxos.cc: 628: FAILED assert(begin->last_committed == last_committed)
ubuntu@teuthology:/a/teuthology-2014-08-06_02:30:01-rados-next-testing-basic-plana/402965
description: rados/monthra...
Sage Weil
07:36 PM Bug #9052: ceph-mon crashes with *** Caught signal (Floating point exception) **
With no OSDs in the cluster, the calculations for @pgs_per_osd@ can divide by zero (integer, but that still causes th... Dan Mick
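The failure mode described in this comment can be sketched as follows. This is only an illustration, not the monitor's code: the function name `pgs_per_osd` and its parameters are hypothetical, and the choice to return 0 for an empty cluster is an assumption made for the example.

```python
def pgs_per_osd(total_pgs: int, num_osds: int) -> int:
    """Average number of placement groups per OSD (hypothetical helper).

    In C or C++ an integer division by zero typically raises SIGFPE,
    which is reported as a "Floating point exception" even though no
    floating point arithmetic is involved. Python would raise
    ZeroDivisionError instead, so the guard is needed in both cases.
    """
    if num_osds == 0:
        # Assumption: with no OSDs there is nothing to average over.
        return 0
    return total_pgs // num_osds
```

Checking the OSD count before dividing keeps pool settings such as pg_num adjustable on a cluster that has no OSDs yet.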
07:29 PM Bug #9052 (Resolved): ceph-mon crashes with *** Caught signal (Floating point exception) **
I've found that I can crash ceph-mon by attempting to change pool values (such as pg_num) before adding OSDs to the c... Jamin Collins
06:59 PM rgw Documentation #9051 (Closed): Document rgw_defer_to_bucket_acls option
It appears that the only documentation right now is the commit message of 1d7c2041. Benjamin Gilbert
06:16 PM Bug #7576: osd: large skew in pg epochs (dumpling)
..and when we do, include commit:a52a855f6c92b03dd84cd0cc1759084f070a98c2 !! Sage Weil
06:16 PM Bug #7576 (Pending Backport): osd: large skew in pg epochs (dumpling)
still want to backport this to firefly ... Sage Weil
06:04 PM rgw Bug #8621: civetweb frontend fails authentication if URL has special chars
tested wip-8621 by executing s3tests, there are still a few failures,
logs are copied to ubuntu@mira042.front.sepi...
Tamilarasi muthamizhan
04:42 PM Fix #4205: librados: Improve Watch-notify semantics
http://pad.ceph.com/p/watch-notify Sage Weil
03:55 PM devops Feature #9050 (Rejected): Calamari builds for ceph.com
Neil Levine
03:24 PM devops Feature #6310 (Closed): Get Dumpling into CentOS Ceph repo
Neil Levine
10:31 AM Bug #9046 (Resolved): Limiting the pool object quota stops the IO, however IO does not restart if...
Issue Title: Limiting the pool object quota stops the IO, however IO does not restart if we reset the pool object quot... Hirak Mazumder
09:37 AM Bug #9040: clients can SEGV during package upgrade
Ian Colle
09:03 AM Bug #9023: valgrind failures in OSD
Another `new Session` at OSD.cc:3704
http://qa-proxy.ceph.com/teuthology/john-2014-08-07_18:44:20-fs-wip-mds-context...
John Spray
06:43 AM Bug #9044 (Resolved): erasure-code: use ruleset instead of ruleid
When "ruleset is looked up by name":https://github.com/ceph/ceph/blob/firefly/src/mon/OSDMonitor.cc#L2928 when creati... Loïc Dachary
03:15 AM Feature #9025: erasure-code: chunk remapping
"requeued, for ubuntu 14.04 to get quicker results":http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-... Loïc Dachary
03:13 AM Feature #8496: erasure-code: ErasureCode base class
"requeued, for ubuntu 14.04 to get quicker results":http://pulpito.ceph.com/loic-2014-08-08_12:13:20-upgrade:firefly-... Loïc Dachary
02:43 AM rgw Bug #9043 (Duplicate): rgw:Cannot add object to Ceph using Openstack Dashboard(Horizon) in firefly
Uploading a new object fails with message "Error: Unable to upload object".
While adding an object using Horizon w...
Ashish Chandra

08/07/2014

03:56 PM Feature #8276: ceph-filestore-dump import-rados -p <pool> <archive>
Implemented syntax:
ceph_objectstore_tool import-rados pool [import_file|-]
Import into the specified pool on r...
David Zafman
03:54 PM Bug #8396 (Resolved): osd: message delayed in Session misdirected after split
Samuel Just
03:39 PM Bug #8625 (Pending Backport): EC pool - OSD creates an empty file for op with 'create 0~0, writef...
Sage Weil
02:34 PM Bug #9040: clients can SEGV during package upgrade
https://github.com/ceph/ceph-qa-suite/pull/77 seems to fix this.
Testing now.
Yuri Weinstein
01:56 PM Bug #9040 (Won't Fix): clients can SEGV during package upgrade
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-06_16:30:35-upgrade:dumpling-dumpling---basic-vps/... Yuri Weinstein
12:37 PM rgw Bug #9039: Using COPY on radosgw to copy object from one bucket to another that's in another pool...
Well, I think data copy is the right thing to do. If I put buckets in different pools, it is because they're configured dif... Sylvain Munaut
10:40 AM rgw Bug #9039: Using COPY on radosgw to copy object from one bucket to another that's in another pool...
The problem is that it is implicitly assumed with the new manifest that the tail is going to reside in the same pool ... Yehuda Sadeh
10:07 AM rgw Bug #9039: Using COPY on radosgw to copy object from one bucket to another that's in another pool...
Really? I didn't see anything in the code that checked whether the destination bucket was in the same pool or not an... Sylvain Munaut
09:59 AM rgw Bug #9039: Using COPY on radosgw to copy object from one bucket to another that's in another pool...
That sounds like an issue with the new (firefly) manifest. Yehuda Sadeh
07:21 AM rgw Bug #9039 (Resolved): Using COPY on radosgw to copy object from one bucket to another that's in a...
Currently if you copy an object from a bucket to another one which is in another rados pool, things will just break. ... Sylvain Munaut
09:34 AM Bug #9035 (Closed): ceph cluster is using more space than actual data after replication
the used is simply summing the statfs(2) results on all the OSDs. you can see this by doing a df on the osd volumes,... Sage Weil
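The accounting described in this comment can be approximated with a short sketch. It assumes each OSD data directory is a mounted filesystem and uses Python's `os.statvfs`; the helper names and the example paths are hypothetical, and this is not how the cluster itself gathers the numbers.

```python
import os
from typing import Iterable


def used_bytes(path: str) -> int:
    """Bytes in use on the filesystem holding `path`, as df would report."""
    st = os.statvfs(path)
    # Total blocks minus free blocks, scaled by the fragment size.
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def cluster_used_bytes(osd_paths: Iterable[str]) -> int:
    """Sum the per-filesystem usage over all OSD data directories."""
    return sum(used_bytes(p) for p in osd_paths)
```

For example, `cluster_used_bytes(["/var/lib/ceph/osd/ceph-0", "/var/lib/ceph/osd/ceph-1"])` would add up the usage of two OSD volumes (paths shown only as an illustration). Because the figure counts everything stored on those filesystems, it can exceed the object data multiplied by the replica count.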
02:24 AM Bug #9035 (Closed): ceph cluster is using more space than actual data after replication
Ceph cluster is using more space than estimated space to store data after replication.
Total cluster capacity is 5...
Srinivasula Reddy Maram
07:52 AM rgw Bug #9037 (Duplicate): civetweb: error HEAD responses return body
Ian Colle
07:40 AM rgw Bug #9037: civetweb: error HEAD responses return body
Ah, sorry, somehow managed to miss it when I looked through the issue list. Please close this then. Valtteri Vuorikoski
07:34 AM rgw Bug #9037: civetweb: error HEAD responses return body
See #8539 Sylvain Munaut
02:59 AM rgw Bug #9037 (Duplicate): civetweb: error HEAD responses return body
0.80.5 radosgw with civetweb frontend returns body data when sending an error response to a HEAD request. This breaks... Valtteri Vuorikoski
06:41 AM CephFS Feature #9029: min/max uid for snapshot creation
Wido den Hollander
06:00 AM Bug #4254: osd: failure to recover before timeout on rados bench and thrashing; negative stats
I am seeing this issue again on v0.80.4. I stopped 3 osd processes and marked them as out to trigger data migration (... Zhi Zhang
03:08 AM Feature #8496: erasure-code: ErasureCode base class
"requeued on vps because plana are very busy":http://pulpito.ceph.com/loic-2014-08-07_12:09:48-upgrade:firefly-x:stre... Loïc Dachary
03:06 AM Feature #9025: erasure-code: chunk remapping
"queued the suite on vps because plana are very busy":http://pulpito.ceph.com/loic-2014-08-07_12:06:56-upgrade:firefl... Loïc Dachary
12:54 AM Feature #9025: erasure-code: chunk remapping
"upgrade:firefly-x:stress-split":http://pulpito.ceph.com/loic-2014-08-07_09:56:17-upgrade:firefly-x:stress-split-wip-... Loïc Dachary
01:23 AM Feature #9034 (New): erasure-code: better LRC strategy
The current LRC recovery strategy does not take advantage of all possibilities and may fail to discover a scenario th... Loïc Dachary
01:17 AM Feature #9033 (Resolved): erasure-code: simplified LRC
Add implicit parity and simplified LRC as "described by Andreas":https://www.mail-archive.com/ceph-devel@vger.kernel.... Loïc Dachary

08/06/2014

06:40 PM Bug #9022 (Pending Backport): Potential lock leaks in RadosClient
Sage Weil
02:58 AM Bug #9022: Potential lock leaks in RadosClient
Pull request on the way. Pavan Rallabhandi
02:58 AM Bug #9022 (Resolved): Potential lock leaks in RadosClient
While going through RadosClient, I identified a couple of interfaces, librados::RadosClient::lookup_pool() and librados::R... Pavan Rallabhandi
03:39 PM Feature #9031: List RADOS namespaces and list all objects in all namespaces

A way to implement this is to enhance the pg_ls_response_t to include the namespace (or change object_t to hobject_...
David Zafman
02:30 PM Feature #9031 (Resolved): List RADOS namespaces and list all objects in all namespaces
We can currently create namespaces, but cannot easily view those that have been created. A method of listing namespac... Brian Andrus
03:23 PM devops Bug #9032 (Rejected): ceph-deploy over proxy
I have my servers working behind a proxy. When I run the ceph-deploy install command I get an error:
[ceph01][INFO ...
TJ Walker
02:05 PM Feature #9030 (Fix Under Review): mon: quickly identify 'problem'  osds
Sage Weil
02:05 PM Feature #9030 (Resolved): mon: quickly identify 'problem'  osds
Sage Weil
12:55 PM Bug #8860 (Fix Under Review): ceph-disk issues with custom cluster name
PR opened https://github.com/ceph/ceph/pull/2216 Alfredo Deza
12:25 PM CephFS Feature #9029 (Resolved): min/max uid for snapshot creation
On shared systems like shared hosting it might be useful to prevent regular users from creating snapshots on CephFS.
...
Wido den Hollander
12:20 PM rgw Feature #6747: PowerDNS backend for RGW bucket directing
Wido den Hollander
11:06 AM rbd Bug #8845 (Pending Backport): Flattening Clones of clone, results in command failure
Sage Weil
09:41 AM Bug #9019 (Resolved): Makefile.am: error: required file './README' not found
fixed it up with a symlink.. other solutions seemed more annoying :( Sage Weil
08:39 AM Linux kernel client Bug #8818 (Resolved): IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
OK, thanks everybody.... Ilya Dryomov
08:09 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
I switched to the good kernel (3.16.0-ceph-00037-g0532581) yesterday and re-ran my scripts overnight. The scripts co... Greg Wilson
08:39 AM Linux kernel client Bug #8464 (Resolved): krbd: deadlock
OK, thanks everybody.... Ilya Dryomov
08:06 AM Feature #8496: erasure-code: ErasureCode base class
"scheduled upgrade:firefly-x:stress-split":http://pulpito.ceph.com/loic-2014-08-06_17:07:04-upgrade:firefly-x:stress-... Loïc Dachary
06:22 AM Feature #8496: erasure-code: ErasureCode base class
The test "only had one job":http://pulpito.ceph.com/loic-2014-08-05_13:45:56-upgrade:firefly-x:stress-split-wip-8496-... Loïc Dachary
07:12 AM Feature #9025 (Fix Under Review): erasure-code: chunk remapping
"need review":https://github.com/ceph/ceph/pull/2213 Loïc Dachary
06:28 AM Feature #9025 (Resolved): erasure-code: chunk remapping
Interpret the *mapping* parameter and remap the chunks accordingly. For instance mapping=_DD means the data chunks ar... Loïc Dachary
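A minimal sketch of how such a mapping string could be interpreted. The description above is truncated, so the rule used here is an assumption: a `D` marks a position that holds a data chunk, and any other character marks a position that holds a coding chunk. The function name `chunk_mapping` is hypothetical.

```python
from typing import List


def chunk_mapping(mapping: str) -> List[int]:
    """Return chunk positions, data chunks first, then coding chunks.

    Assumption: 'D' marks a data position; every other character
    (for example '_') marks a coding position.
    """
    data = [i for i, c in enumerate(mapping) if c == "D"]
    coding = [i for i, c in enumerate(mapping) if c != "D"]
    return data + coding
```

Under that assumption, `chunk_mapping("_DD")` places the two data chunks at positions 1 and 2 and the coding chunk at position 0.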
07:11 AM CephFS Feature #9026 (Resolved): client: vxattr support for rctime, rsize, etc.
Sage Weil
05:44 AM Bug #9023 (Can't reproduce): valgrind failures in OSD

osd.2 from OSD.cc:462 (SafeTimer::init, pthread_create)
http://pulpito.front.sepia.ceph.com/john-2014-08-01_11:0...
John Spray

08/05/2014

11:22 PM Feature #9021 (Resolved): librbd: shared flag, object map
we need to consider making a tradeoff between multi-client support and single-client support for librbd. In practice... Haomai Wang
10:43 PM Bug #8797: "ceph status" do not exit with python_2.7.8
For the moment, the Python maintainer in Debian kindly fixed this issue for us by adding a patch to revert the problematic change ... Dmitry Smirnov
07:34 PM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
The 00036 "bad" kernel started showing the problem in the /var/log/kern.log file within minutes of starting my test s... Eric Eastman
12:49 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
Eric, Greg,
The fix on top of 3.16 + testing is in wip-request-fn.
http://gitbuilder.ceph.com/kernel-deb-precis...
Ilya Dryomov
06:16 PM Bug #9019 (Resolved): Makefile.am: error: required file './README' not found
Commit a923e2c9eb16823fa484c renamed README to README.md to render in markdown. After that, I can't generate Makefil... jianpeng ma
06:15 PM Bug #9008: Objecter: pg listing can deadlock when throttling is in use
> I'm guessing the request is hung on the OSD side of things...
Thanks Sage. Sadly after radosgw daemon restarting, ...
Guang Yang
08:28 AM Bug #9008 (Need More Info): Objecter: pg listing can deadlock when throttling is in use
Sage Weil
08:28 AM Bug #9008: Objecter: pg listing can deadlock when throttling is in use
please query the admin socket for the process like so:
ceph daemon /var/run/ceph/ceph-client.*.asok objecter_requ...
Sage Weil
02:44 AM Bug #9008 (Resolved): Objecter: pg listing can deadlock when throttling is in use
In our Ceph cluster (with radosgw), we found that occasionally the processing threads hang forever and eventually ha... Guang Yang
02:24 PM Bug #9018 (Resolved): "LibRadosTwoPoolsPP*" failed in upgrade:dumpling-x-firefly---basic-vps
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-05_09:22:33-upgrade:dumpling-x-firefly---basic-vps... Yuri Weinstein
02:13 PM devops Feature #8868: Update Fedora to 0.80.5 packages with ceph-common
So, there's a PR open for some restructuring of the .spec file now that we need to get in soon to make this more sane... Dan Mick
01:21 PM Fix #6278 (Resolved): osd: throttle snap trimming
Sage Weil
01:20 PM devops Fix #9017 (Rejected): [paddles] implement validation across all controller methods
paddles has a lot of boilerplate in controllers that looks like:... Alfredo Deza
01:15 PM Feature #9015 (Resolved): msgr refactoring to support xio work
Sage Weil
01:09 PM Feature #9015 (Resolved): msgr refactoring to support xio work
Sage Weil
01:14 PM Fix #8905 (In Progress): msgr: encode osd epoch in nonce to avoid misc OSD reconnect races
Sage Weil
01:10 PM Feature #7516 (Fix Under Review): mon: reweight-by-pg
Sage Weil
01:06 PM Feature #7238 (In Progress): erasure code : implement LRC plugin
Samuel Just
12:55 PM Bug #8083: erasure-code: fix static code analysis errors found in gf-complete
Loïc Dachary
12:28 PM Documentation #8875 (Resolved): `ceph-deploy new` needs to be called for every node, not just the...
PR https://github.com/ceph/ceph/pull/2206
and merged commit e6935dd into master
Alfredo Deza
09:37 AM Documentation #8875 (In Progress): `ceph-deploy new` needs to be called for every node, not just ...
I noted the problem in the docs and will fix that shortly.
You are right, you need to run `ceph-deploy new {NODES}...
Alfredo Deza
11:19 AM Bug #9011: osd memory leaks on next
gonna see if this happens on plana too Sage Weil
11:13 AM Bug #9011: osd memory leaks on next
these look like static std::strings. and some other weird leaks that don't make sense... Sage Weil
08:00 AM Bug #9011 (Duplicate): osd memory leaks on next
ubuntu@teuthology:/a/sage-2014-08-04_11:34:19-rgw-next-testing-basic-vps/397606
need to clean these up
Sage Weil
09:26 AM rgw Feature #9013 (Resolved): rgw: set civetweb as a default frontend
Should add civetweb to the default frontends. Yehuda Sadeh
09:13 AM Messengers Bug #8880 (Pending Backport): msg/Pipe.cc: 1538: FAILED assert(0 == "old msgs despite reconnect_s...
Sage Weil
09:11 AM Bug #9012 (Duplicate): "[WRN] map e277 wrongly marked me down" in upgrade:dumpling-x-firefly---ba...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-04_14:18:17-upgrade:dumpling-x-firefly---basic-vps... Yuri Weinstein
09:05 AM rgw Feature #8218 (In Progress): rgw: object versioning manifest changes
Ian Colle
09:05 AM rgw Feature #8217 (In Progress): rgw: object versioning object overwrite / delete changes
Ian Colle
09:05 AM rgw Feature #8216 (In Progress): rgw: object versioning objclass support
Ian Colle
09:05 AM rgw Feature #8473 (In Progress): rgw: Shard bucket index objects to improve single bucket PUT throughput
Ian Colle
08:54 AM rbd Bug #8845 (Fix Under Review): Flattening Clones of clone, results in command failure
https://github.com/ceph/ceph/pull/2205 Josh Durgin
08:52 AM Fix #8914: osd crashed at assert ReplicatedBackend::build_push_op
btw, the steps to reproduce this issue are mentioned by Sahana above & it can be reproduced on a single node too.
...
Dhiraj Kamble
08:47 AM Fix #8914: osd crashed at assert ReplicatedBackend::build_push_op
Hi Greg,
No, I did not intend to add any comments.
The reason I thought we should assert is so that we can serv...
Dhiraj Kamble
08:25 AM Bug #9007 (Duplicate): Ceph Firefly 0.80.4 : Unable to get some pool values
you're right. this is fixed in master, and backported to firefly-next.. will be in next firefly point release. Sage Weil
01:50 AM Bug #9007 (Duplicate): Ceph Firefly 0.80.4 : Unable to get some pool values
h1. Hello Developers
I am curious to know if there is something missing from the code for Ceph pool values.
As...
karan singh
07:56 AM rgw Bug #8676: md5sum check failed during readwrite.py
ubuntu@teuthology:/a/sage-2014-08-04_11:34:19-rgw-next-testing-basic-vps/397522 Sage Weil
04:46 AM Feature #8496: erasure-code: ErasureCode base class
"upgrade:firefly-x:stress-split":http://pulpito.ceph.com/loic-2014-08-05_13:45:56-upgrade:firefly-x:stress-split-wip-... Loïc Dachary
12:48 AM Feature #8496 (Fix Under Review): erasure-code: ErasureCode base class
"pull request":https://github.com/ceph/ceph/pull/2201 Loïc Dachary
04:22 AM Bug #9009 (In Progress): (wip-objecter) ObjectCacher assert in fs client
OK, no big deal, just that there are contexts in the Client, like the MDS, which need updating to take client_lock wh... John Spray
03:49 AM Bug #9009 (Resolved): (wip-objecter) ObjectCacher assert in fs client

From branch wip-mds-contexts, which is a derivative of wip-objecter.
http://qa-proxy.ceph.com/teuthology/john-20...
John Spray
03:21 AM rgw Feature #8911: RGW doesn't return 'x-timestamp' in header which is used by 'View Details' of Open...
It also does not return the "Content-type" header. Swift does return this header as well. So I would love to see r... Ashish Chandra
12:45 AM rgw Documentation #9003: rgw: document development setup for rgw
Much needed. Great! Abhishek Lekshmanan

08/04/2014

11:33 PM Feature #8496 (In Progress): erasure-code: ErasureCode base class
Because it needs work to adapt the isa plugin, it deserves a separate patch. Otherwise it mixes two unrelated topics. Loïc Dachary
05:12 AM Feature #8496 (Rejected): erasure-code: ErasureCode base class
It is part of a "larger pull request":https://github.com/ceph/ceph/pull/1911 Loïc Dachary
11:21 PM Bug #8736: thrash and scrub combination lead to error
http://pulpito.ceph.com/loic-2014-08-04_15:06:02-upgrade:firefly-x:stress-split-wip-8475-testing-basic-plana/396887/
...
Loïc Dachary
11:02 PM Feature #8475 (Resolved): erasure-code: oversized objects when using the Cauchy technique
Loïc Dachary
06:05 AM Feature #8475: erasure-code: oversized objects when using the Cauchy technique
"scheduled upgrade:firefly-x:stress-split":http://pulpito.ceph.com/loic-2014-08-04_15:06:02-upgrade:firefly-x:stress-... Loïc Dachary
02:07 AM Feature #8475: erasure-code: oversized objects when using the Cauchy technique
"Rebased and repushed":https://github.com/ceph/ceph/pull/1890 , running gitbuilder Loïc Dachary
08:00 PM rgw Feature #3454: Support temp URLs for Swift API
This should be documented somewhere too, at least in the table at http://ceph.com/docs/master/radosgw/swift/ Blair Bethwaite
03:09 PM Bug #8998 (Pending Backport): osd: SEGV in OSD::heartbeat()
Sage Weil
03:00 PM Bug #8998 (Fix Under Review): osd: SEGV in OSD::heartbeat()
https://github.com/ceph/ceph/pull/2198 Sage Weil
09:14 AM Bug #8998: osd: SEGV in OSD::heartbeat()
ubuntu@teuthology:/a/teuthology-2014-08-03_02:30:01-rados-next-testing-basic-plana/394893 Sage Weil
02:18 PM rgw Feature #9004 (New): rgw: multi-site: multi-master
As a user, I want to be able to write to any available RGW and have that file available on other RGWs for read and wr... Neil Levine
02:06 PM Bug #8891 (Resolved): rados bench hang during thrashing
Sage Weil
09:17 AM Bug #8891 (Fix Under Review): rados bench hang during thrashing
Sage Weil
01:53 PM rgw Documentation #9003: rgw: document development setup for rgw
While we're at it, beefing up the rgw support in vstart.sh would be great. right now you can pass RGW=1 and it will ... Sage Weil
01:49 PM rgw Documentation #9003 (Closed): rgw: document development setup for rgw
Yehuda Sadeh
11:20 AM rgw Bug #9002 (Duplicate): Creating swift key with --gen-secret in separate step from subuser creatio...
Customer reported on CentOS with Ceph v0.80.4
Steps to reproduce:
radosgw-admin user create --uid=testuser1 --dis...
Brian Andrus
11:00 AM rgw Bug #9001 (Won't Fix): Starting gateway with radosgw init script fails to create socket
Ceph Version: v0.80.4
Distro: CentOS
Customer reported, unable to reproduce.
/var/run/ceph directory owned by ...
Brian Andrus
09:16 AM Bug #7986: 3.1s0 scrub stat mismatch, got 2041/2044 objects, 0/0 clones, 2041/2044 dirty, 0/0
ubuntu@teuthology:/a/teuthology-2014-08-03_02:30:01-rados-next-testing-basic-plana/395219 Sage Weil
07:07 AM Linux kernel client Bug #8979: GPF kernel panics - auth?
pushed wip-8979 which removes the fixed buffer size. but, we still need to make things not crash when the auth reply... Sage Weil
06:57 AM Linux kernel client Bug #8979: GPF kernel panics - auth?
yeah:
#define TEMP_TICKET_BUF_LEN 256
Sage Weil
06:48 AM Linux kernel client Bug #8979: GPF kernel panics - auth?
... Sage Weil
06:36 AM Documentation #8875: `ceph-deploy new` needs to be called for every node, not just the admin one
I was able to complete install.
The first step above granted sudo rights on each node.
The way I was able to get it...
Bobby Yakov
05:56 AM Documentation #8875: `ceph-deploy new` needs to be called for every node, not just the admin one
You still need a user that can call sudo without a password prompt on remote nodes.
And it looks like you only pas...
Alfredo Deza
05:47 AM devops Bug #8893 (Resolved): ceph-deploy install command on centos 6.5 reports exception
merged commit eb9ea33 into ceph:master Alfredo Deza
01:47 AM Bug #8601 (Resolved): erasure-code: default profile does not exist after upgrade
Loïc Dachary

08/03/2014

09:48 PM rgw Bug #8864: radosgw help doesn't seem to display some debug options
I pushed a couple of commits to fix most of the undocumented options in man pages & help for #8112. Can you let me know w... Abhishek Lekshmanan
09:35 PM rbd Bug #8000: SLAB: Unable to allocate memory on node 0
Finally I've isolated the issue.
Something was wrong with a particular RBD image (format 1) that was created on Ceph...
Dmitry Smirnov
09:11 PM CephFS Bug #8962: kcephfs: client does not release revoked cap
another similar hang:... Sage Weil
06:27 PM Bug #8891: rados bench hang during thrashing
i think this was the same reaper vs fast dispatch that i tracked down in wip-msgr. Sage Weil
02:48 PM devops Bug #8330: repodata on rpm repos do not list latest ceph-deploy (1.5.2)
Agreed, this is fixed. Current repodata works perfectly with all packages showing correctly (on the same host btw, I'... Simon Ironside
08:40 AM rgw Bug #8784: rgw: completion leak
ubuntu@teuthology:/a/teuthology-2014-08-01_23:02:01-rgw-master-testing-basic-plana/394054 Sage Weil
08:39 AM Bug #8996 (Resolved): "Segmentation fault" in upgrade:dumpling-x-firefly---basic-vps suite
botched (double) backport, fixed by commit:4e03d5b512c8d2f7fa51dda95c6132e676529f9b Sage Weil

08/02/2014

05:01 PM Bug #8998 (Resolved): osd: SEGV in OSD::heartbeat()
... Sage Weil
04:58 PM Bug #8997 (Can't reproduce): ceph_test_rados_watch_notify hangs
... Sage Weil
04:55 PM Bug #8996 (Resolved): "Segmentation fault" in upgrade:dumpling-x-firefly---basic-vps suite
There are lots of these errors in:
http://pulpito.front.sepia.ceph.com/teuthology-2014-08-02_08:50:33-upgrade:dumpli...
Yuri Weinstein
04:31 PM Messengers Bug #8880: msg/Pipe.cc: 1538: FAILED assert(0 == "old msgs despite reconnect_seq feature")
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392461 Sage Weil
08:14 AM Bug #8396: osd: message delayed in Session misdirected after split
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392256 Sage Weil
08:07 AM Bug #6003: journal Unable to read past sequence 406 ...
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392342... Sage Weil

08/01/2014

09:25 PM Bug #8776 (Won't Fix): osd: runaway memory on dumpling
this is a result of a very large omap object and us building a transaction to delete the keys. the problem is the bi... Sage Weil
09:57 AM Bug #8776: osd: runaway memory on dumpling
Argh, it's building up a leveldb operation to atomically remove all of the keys associated with the object. I *think... Samuel Just
06:26 PM Bug #8930 (Resolved): osd: test unable to produce unfound objects
David Zafman
04:07 PM Bug #8930 (Fix Under Review): osd: test unable to produce unfound objects
David Zafman
09:41 AM Bug #8930: osd: test unable to produce unfound objects
David Zafman
03:56 PM devops Bug #8849 (Resolved): rpm restarts daemons on upgrade
already backported, commit:e75dd2e4b7adb65c2de84e633efcd6c19a6e457b and ^ Sage Weil
03:55 PM Bug #8728 (Resolved): rest/test.py osd create not idempotent
Sage Weil
03:54 PM Bug #8670: Cache tiering parameters can not be displayed for a pool
Non-trivial to backport; we would need to get all the rados test refactoring, too! Sage Weil
03:51 PM CephFS Bug #8622 (Resolved): erasure-code: rados command does not enforce alignement constraints
commit:7a58da53ebfcaaf385c21403b654d1d2f1508e1a Sage Weil
03:48 PM Bug #6789 (Resolved): cannot remove the leader when there only are two monitors
Sage Weil
03:39 PM Bug #8944 (Pending Backport): Ceph daemon bad asok used in connection with cluster
Sage Weil
03:37 PM Bug #8714 (Pending Backport): we do not block old clients from breaking cache pools
Sage Weil
03:35 PM Feature #8674 (Pending Backport): osd: cache tier: avoid promotion on first read
commit:79d1aff1821bc9f21477636df4d0d4e57f2cd008 Sage Weil
03:32 PM rgw Bug #8937 (Pending Backport): rgw: broken large(-ish) objects
Sage Weil
03:05 PM Documentation #8995 (Resolved): Preflight Checklist Clarifications
There are several small clarifications that can be made to the Ceph Preflight Checklist to help new users try out Cep... Christopher Hertel
02:44 PM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
No need to do that just yet. I now fully understand the problem and am working on a proper fix that I'd like you to tes... Ilya Dryomov
02:37 PM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
I have done some testing and I am seeing the same thing as Eric. With the deadlock-bad kernel I hit the deadlock iss... Greg Wilson
02:06 PM Bug #8625: EC pool - OSD creates an empty file for op with 'create 0~0, writefull 0~xxx, setxattr...
Making it not an rgw bug. Samuel Just
02:06 PM Bug #8625: EC pool - OSD creates an empty file for op with 'create 0~0, writefull 0~xxx, setxattr...
wip-8625, versioning should never be necessary after a create (it will be necessary before the create if the object a... Samuel Just
09:53 AM Bug #8625: EC pool - OSD creates an empty file for op with 'create 0~0, writefull 0~xxx, setxattr...
It's the create 0~0 followed by a writefull. Arguably, we still shouldn't version the object, I'll take a look. Samuel Just
01:02 PM Fix #8993 (Closed): osd_pool_default_pgp_num woes
When setting osd_pool_default_pgp_num and not osd_pool_default_pg_num you can create pools with more pgp than pg.
...
Alexandre Marangone
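The constraint behind #8993 can be illustrated with a small check. This is a hypothetical sketch, not Ceph's own validation code: the helper name `check_pool_defaults` and the choice to clamp (rather than reject) are assumptions made for illustration, based only on the report that a pool should not end up with more pgp than pg.

```python
def check_pool_defaults(pg_num: int, pgp_num: int) -> int:
    """Return a pgp_num that is safe to use together with pg_num.

    Hypothetical helper: clamps pgp_num so it never exceeds pg_num.
    """
    if pg_num <= 0 or pgp_num <= 0:
        raise ValueError("pg_num and pgp_num must be positive")
    return min(pgp_num, pg_num)


if __name__ == "__main__":
    # Only the pgp default was raised; the pg default is still small.
    print(check_pool_defaults(pg_num=8, pgp_num=128))
```

Clamping keeps pool creation working when only one of the two defaults is set; raising an error instead would be an equally reasonable design.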
12:57 PM devops Bug #8893 (Fix Under Review): ceph-deploy install command on centos 6.5 reports exception
PR opened https://github.com/ceph/ceph-deploy/pull/226 Alfredo Deza
06:51 AM devops Bug #8893 (In Progress): ceph-deploy install command on centos 6.5 reports exception
Alfredo Deza
09:15 AM rbd Bug #8416 (Closed): Client Crash when try to map a volume (ubuntu)
OK, I'm going to assume this was indeed the missing features handling bug. I looked into it, it was introduced in 3.... Ilya Dryomov
08:23 AM Bug #8989 (Rejected): Failed running iogen.sh in upgrade:firefly-firefly-testing-basic-vps suite
It was a test mis-configuration. When we added a new client to run workload on, we had to be more specific about on ... Yuri Weinstein
07:10 AM Bug #8717 (Resolved): teuthology: valgrind leak checks broken for osd (at least)
Sage Weil
05:57 AM Bug #8601: erasure-code: default profile does not exist after upgrade
... Loïc Dachary
02:23 AM Feature #8992 (New): Uniqueness between two or more CRUSH ruleset choose statements
Assuming that ceph-node1 is in default root, when we define and assign following crush rule:... Szymon Zacher
01:44 AM Bug #8641: Cache tiering agent cannot flush or evict objects during the benchmark
In my opinion the problem also affects cache_min_evict_age, cache_min_flush_age and others. It's impossible to force ceph c... Szymon Zacher
12:40 AM CephFS Bug #8962: kcephfs: client does not release revoked cap
... Zheng Yan

07/31/2014

09:04 PM rgw Bug #8972 (Pending Backport): rgw: bucket index log wrong object name in multipart completion
Sage Weil
09:31 AM rgw Bug #8972 (Fix Under Review): rgw: bucket index log wrong object name in multipart completion
Sage Weil
08:54 PM CephFS Bug #8962: kcephfs: client does not release revoked cap
Zheng Yan wrote:
> Sage Weil wrote:
> > Zheng Yan wrote:
> > > no clue what happened. please dump the mds cache wh...
Sage Weil
07:32 PM CephFS Bug #8962: kcephfs: client does not release revoked cap
Sage Weil wrote:
> Zheng Yan wrote:
> > no clue what happened. please dump the mds cache when it happens next time
...
Zheng Yan
10:11 AM CephFS Bug #8962: kcephfs: client does not release revoked cap
Zheng Yan wrote:
> no clue what happened. please dump the mds cache when it happens next time
We have a dump, act...
Sage Weil
08:48 PM rgw Bug #8991 (Resolved): rgw: RGWRados::list_bi_log_entries() doesn't clear list
... Yehuda Sadeh
03:52 PM Bug #8977: osd: didn't discard sub_op_reply from previous interval?
Added some debugging to dump the OpWQ queue information if there are stale ops, running in loop. Samuel Just
12:53 PM Bug #8977: osd: didn't discard sub_op_reply from previous interval?
2014-07-30 10:40:58.317063 7fc2164da700 0 log [WRN] : slow request 960.196157 seconds old, received at 2014-07-30 10... Samuel Just
02:35 PM Bug #8989 (Rejected): Failed running iogen.sh in upgrade:firefly-firefly-testing-basic-vps suite
The majority of failures in this run are related to this: http://pulpito.front.sepia.ceph.com/teuthology-2014-07-30_12:... Yuri Weinstein
12:52 PM Feature #131 (In Progress): bring wireshark plugin is up to date
Sage Weil
12:51 PM Documentation #7 (Resolved): Document Monitor Commands
ceph -h Sage Weil
11:29 AM rgw Bug #8988 (Resolved): AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana
"Related issue":http://tracker.ceph.com/issues/9100
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-201...
Yuri Weinstein
11:25 AM Bug #8982 (Pending Backport): cache pool osds crashing when data is evicting to underlying storag...
Sage Weil
11:14 AM Bug #8982 (Fix Under Review): cache pool osds crashing when data is evicting to underlying storag...
Sage Weil
08:47 AM Bug #8982 (In Progress): cache pool osds crashing when data is evicting to underlying storage pool
Sage Weil
07:36 AM Bug #8982 (Resolved): cache pool osds crashing when data is evicting to underlying storage pool
We have an erasure-coded pool 'ecdata' and a replicated (size=3) pool 'cache' acting as a writeback cache on top of it.
When...
Kenneth Waegeman
11:17 AM Bug #8969 (Pending Backport): PerfCounters.SinglePerfCounters failure on i386
Sage Weil
09:48 AM rgw Feature #8987 (New): rgw: data sync for multipart upload
Yehuda Sadeh
09:46 AM Bug #8986 (Duplicate): "[WRN] map e62 wrongly marked me down" in upgrade:dumpling-x-firefly---bas...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-07-30_13:00:44-upgrade:dumpling-x-firefly---basic-vps... Yuri Weinstein
09:43 AM Bug #8985: "[WRN] map e9 wrongly marked me down" in upgrade:dumpling-x-firefly---basic-vps suite
... Yuri Weinstein
09:42 AM Bug #8985 (Resolved): "[WRN] map e9 wrongly marked me down" in upgrade:dumpling-x-firefly---basic...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-07-30_13:00:44-upgrade:dumpling-x-firefly---basic-vps... Yuri Weinstein
09:35 AM Bug #8970 (Won't Fix): Injectargs - inconsistent parsing of bool values
these will also work:
--my-boolean-option=0
--my-boolean-option=false
but you're right, the others won't, be...
Sage Weil
09:33 AM Feature #8973: Add support for collecting usage information by namespace
We decided not to do this when designing namespaces because we wanted namespaces to scale independently of the size o... Sage Weil
08:49 AM Bug #8947 (Duplicate): Writing rados objects with max objects set for cache pool crashed osd
Oh, I see it now. This is a dup of #8982. Sage Weil
08:29 AM RADOS Support #8600: MON crashes on new crushmap injection
In addition to the choose vs. chooseleaf issue that Joao is mentioning here, we have also seen problems when min_size... Henning Stener
08:13 AM Bug #8966: ceph.conf "osd pool default size = 2" not working
Then the documentation (http://ceph.com/docs/master/start/quick-ceph-deploy/) on point 2 should be updated.... Christoph Pedro
07:58 AM RADOS Bug #8984 (Won't Fix): creating erasure-code pool when not having a root item default
When creating an EC pool:
> ceph osd pool create poolio 128 128 erasure profile15
It returns
> Error ENOENT: root ...
Kenneth Waegeman
07:46 AM Bug #8983 (Resolved): rados bench -b option does not take orders of magnitude (k,M,..) but also d...
When running this:
> rados -p <pool> bench 1000 write -t 10 -b 4M
It runs with -b 4 instead of expected
> rados -...
Kenneth Waegeman
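The behaviour expected in #8983 (treating `4M` as four mebibytes rather than silently reading it as `4`) can be sketched as follows. This is an illustrative parser, not the one used by `rados bench`; the function name `parse_size`, the accepted suffixes and the use of binary (1024-based) multiples are all assumptions.

```python
import re

# Assumed binary multipliers for each suffix (hypothetical choice).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text: str) -> int:
    """Parse strings such as '4096', '4k' or '4M' into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([kKmMgGtT]?)[bB]?\s*", text)
    if match is None:
        # Reject unknown suffixes instead of silently dropping them.
        raise ValueError(f"not a valid size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


if __name__ == "__main__":
    print(parse_size("4M"))
```

Raising on an unrecognised suffix avoids the failure mode in the report, where the suffix was dropped without any warning.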
06:04 AM Bug #8601: erasure-code: default profile does not exist after upgrade
Apparently having an EC pool is still sufficient to prevent kernel clients from mounting, so I don't think we can bac... Greg Farnum
05:52 AM Bug #8601: erasure-code: default profile does not exist after upgrade
"firefly backport":https://github.com/ceph/ceph/pull/2178 Loïc Dachary
05:16 AM Bug #8601 (Pending Backport): erasure-code: default profile does not exist after upgrade
Loïc Dachary
02:53 AM Linux kernel client Bug #8979 (Resolved): GPF kernel panics - auth?
From James Eckersall, "GPF kernel panics" on ceph-users.
I've had a fun time with ceph this week.
We have a clust...
Ilya Dryomov

07/30/2014

10:59 PM CephFS Bug #8962: kcephfs: client does not release revoked cap
no clue what happened. please dump the mds cache when it happens next time Zheng Yan
07:01 AM CephFS Bug #8962: kcephfs: client does not release revoked cap
and the code that did it is in teuthology.git/teuthology/misc.py:... Sage Weil
07:00 AM CephFS Bug #8962: kcephfs: client does not release revoked cap
here is the final state of the directory:... Sage Weil
10:25 PM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
The deadlock-bad kernel showed the error after a few minutes of running multiple dd writes to rbd device. Here is one... Eric Eastman
11:33 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
All,
Can you try and confirm that deadlock-bad fails and deadlock-good works for you?
deadlock-bad:
http://g...
Ilya Dryomov
05:18 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
Update: At this point I'm almost certain this is not an rbd/ceph problem. Trying to track down the exact culprit. Ilya Dryomov
04:59 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
I can reproduce this with 100% certainty now on Trusty, 3.15.6-031506-generic.
Running:
bonnie++ -n 512
agai...
Karl Austin
09:57 PM Bug #8752 (New): firefly: scrub/repair stat mismatch
This problem manifests only on caching pools.
I have two EC pools with the following settings:...
Dmitry Smirnov
09:44 PM Bug #8229 (Closed): 0.80~rc1: OSD crash (domino effect)
Closing: nothing left to track here; did not have this problem with 0.80.4. Dmitry Smirnov
09:42 PM Bug #8978 (Can't reproduce): ceph ping not working as expected
Reading the doc: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/
I came across command: cep...
Eric Eastman
09:26 PM Bug #8977 (Can't reproduce): osd: didn't discard sub_op_reply from previous interval?
/a/teuthology-2014-07-29_02:30:02-rados-firefly-distro-basic-plana/384397
an op gets stuck in limbo because we are...
Sage Weil
08:54 PM rgw Bug #8586 (Pending Backport): Missing Swift API Header causes RadosGW to segfault
Sage Weil
05:57 PM devops Bug #8976: httpd on RHEL7 (RHEL repo) incompatible with mod_fastcgi (ceph repo)
Also, when trying to enable the httpd ceph pkg with systemctl:
systemctl enable httpd
httpd.service is not a nat...
Marcelo Giles
05:22 PM devops Bug #8976 (Resolved): httpd on RHEL7 (RHEL repo) incompatible with mod_fastcgi (ceph repo)
On a RHEL7 system
yum install httpd mod_fastcgi
systemctl start httpd
Apache fails to start with the followin...
Marcelo Giles
05:12 PM Bug #8947 (Need More Info): Writing rados objects with max objects set for cache pool crashed osd
can you attach the complete logs? all three osds claim to have hit an assert, but the assert message isn't in the lo... Sage Weil
04:59 PM rbd Bug #8920 (Pending Backport): rbd/singleton/{all/formatted-output.yaml} fails on trusty due to wh...
Sage Weil
01:43 PM rbd Bug #8920 (Fix Under Review): rbd/singleton/{all/formatted-output.yaml} fails on trusty due to wh...
Sage Weil
04:36 PM Bug #8776: osd: runaway memory on dumpling
it's all here:... Sage Weil
02:49 PM Bug #8969 (Fix Under Review): PerfCounters.SinglePerfCounters failure on i386
Sage Weil
10:31 AM Bug #8969 (Resolved): PerfCounters.SinglePerfCounters failure on i386
[ RUN ] PerfCounters.SinglePerfCounters
test/perf_counters.cc:111: Failure
Value of: msg
Actual: "{"test_perfcount...
Sage Weil
02:29 PM Bug #8628 (Resolved): Bad ceph_osd_op.extent union access in ReplicatedPG::do_osd_ops
commit:58212b1245373b6f015cbff11844d33a900bf3cb Sage Weil
02:19 PM Bug #8628 (Rejected): Bad ceph_osd_op.extent union access in ReplicatedPG::do_osd_ops
ceph_osd_op_uses_extent(op.op) guards the references to the extent view of the union Sage Weil
02:13 PM Bug #8717: teuthology: valgrind leak checks broken for osd (at least)
Sage Weil
02:12 PM Bug #8717 (Resolved): teuthology: valgrind leak checks broken for osd (at least)
Sage Weil
02:12 PM Bug #8777 (Can't reproduce): osd/PGLog.h: 88: FAILED assert(rollback_info_trimmed_to_riter == log...
Sage Weil
02:11 PM Bug #8595: osd: client op blocks until backfill starts (dumpling)
Sage Weil
02:02 PM Bug #8595 (In Progress): osd: client op blocks until backfill starts (dumpling)
Sage Weil
01:59 PM Bug #8714 (Fix Under Review): we do not block old clients from breaking cache pools
https://github.com/ceph/ceph/pull/2172 Sage Weil
01:46 PM Bug #8974 (Can't reproduce): osd crashed with merge_log assert due to removal of isds
I also got the same asserts in one of the OSDs when I removed one OSD from each node in a Ceph cluster of 3 OSD nodes ( 5 ... Sahana Lokeshappa
01:31 PM devops Bug #8850: ceph-deploy tests fail during tar due to file changed; incomplete shutdown?
an initial take on getting more information on what is going on:

https://github.com/ceph/teuthology/pull/302/files
Alfredo Deza
12:47 PM devops Bug #8850: ceph-deploy tests fail during tar due to file changed; incomplete shutdown?
I initially thought that the ceph daemon was still running but according to upstart docs, this output:... Alfredo Deza
11:53 AM Feature #8973 (New): Add support for collecting usage information by namespace
As of now there is no simple way to determine how much data is being used by a particular namespace. Customers curren... Tyler Brekke
11:36 AM rgw Bug #8972 (Resolved): rgw: bucket index log wrong object name in multipart completion
When completing a multipart upload operation, when removing the parts from the index the entries that are logged in t... Yehuda Sadeh
11:27 AM rgw Bug #8971 (Duplicate): rgw: s3 test failures with civetweb
teuthology logs are copied to ubuntu@mira023.front.sepia.ceph.com:/home/ubuntu/civetweb_s3
config.yaml:...
Tamilarasi muthamizhan
10:35 AM Bug #8970 (Won't Fix): Injectargs - inconsistent parsing of bool values
Hi all,
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74) on Ubuntu 14.04 LTS
This is how I am able ...
Peter Vilhan
10:19 AM Feature #8960 (Fix Under Review): filestore: store backend type persisently
https://github.com/ceph/ceph/pull/2163 Sage Weil
10:17 AM Bug #8601: erasure-code: default profile does not exist after upgrade
"rebased and repushed":https://github.com/ceph/ceph/pull/1990 Loïc Dachary
09:37 AM Bug #8966 (Closed): ceph.conf "osd pool default size = 2" not working
the config option needs to go in the [global] section, not [default] (which is never used for anything) Sage Weil
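The point about section placement can be illustrated with a short sketch. It uses Python's standard `configparser`, which is not the parser Ceph itself uses, so it only demonstrates the idea that a reader consulting `[global]` never sees options placed under `[default]`; the helper name `read_global_option` is hypothetical.

```python
import configparser

# Option placed in [global], as the comment above recommends.
CONF_TEXT = """
[global]
osd pool default size = 2
"""


def read_global_option(conf_text: str, option: str):
    """Return the option's value from [global], or None if it is not there."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("global"):
        return None
    return parser.get("global", option, fallback=None)


if __name__ == "__main__":
    print(read_global_option(CONF_TEXT, "osd pool default size"))
```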
04:31 AM Bug #8966: ceph.conf "osd pool default size = 2" not working
I noticed the failure using the command "ceph osd dump": the pools always had size 3 (the default). Christoph Pedro
04:29 AM Bug #8966 (Closed): ceph.conf "osd pool default size = 2" not working
Version
ceph-deploy: 1.5.9
ceph 0.80.5
Ceph.config:...
Christoph Pedro
09:03 AM Documentation #8875: `ceph-deploy new` needs to be called for every node, not just the admin one
It appears I was able to get further this time, the steps are below.
Key difference is, when I did ceph-deploy new I...
Bobby Yakov
06:20 AM Documentation #8875: `ceph-deploy new` needs to be called for every node, not just the admin one
Hi Alfredo,
Nodes were cleaned out, will re-run install today and get you the log files.
In the meantime, it appea...
Bobby Yakov
06:17 AM Bug #8922: ceph-deploy mon create fails to create additional monitoring nodes.
ceph-deploy new cwtcph001
ceph-deploy install cwtcph001 cwtcph002 cwtcph003
ceph-deploy mon create cwtcph001 cwtcph...
Bobby Yakov
05:32 AM rbd Bug #8000: SLAB: Unable to allocate memory on node 0
Ilya Dryomov wrote:
> What do you mean by "I can't explain why only one machine is affected" above? Do you have oth...
Dmitry Smirnov
12:27 AM rbd Bug #8000: SLAB: Unable to allocate memory on node 0
What do you mean by "I can't explain why only one machine is affected" above? Do you have other similar boxes/setups... Ilya Dryomov
02:01 AM rgw Bug #8383: Upload part of one object passed with incorrect upload id or incorrect object id in S3...
Hi,sage,
Sure!
I use S3 API to do this test....
Jingjing Zhao
01:28 AM CephFS Bug #8961 (Won't Fix): du [directory] vs du -b [directory] size doubles
cephfs tracks recursive directory stats. A directory's size is space used by files underneath the directory. If you d... Zheng Yan
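The idea of a recursive directory statistic can be sketched with an in-memory tree. This is purely illustrative and says nothing about how CephFS stores or updates its stats; the nested-dict layout and the name `recursive_size` are assumptions.

```python
def recursive_size(tree: dict) -> int:
    """Sum the sizes of all files underneath a directory.

    Directories are dicts mapping names to entries; files are byte counts.
    """
    total = 0
    for entry in tree.values():
        if isinstance(entry, dict):
            total += recursive_size(entry)  # descend into a subdirectory
        else:
            total += entry  # plain file: add its size in bytes
    return total


if __name__ == "__main__":
    example = {"a.txt": 100, "sub": {"b.txt": 50, "deep": {"c.bin": 25}}}
    print(recursive_size(example))
```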

07/29/2014

09:41 PM rbd Bug #8000: SLAB: Unable to allocate memory on node 0
This problem remains very painful... Average frequency is one crash per day. Less than 24 hours ago I had two c... Dmitry Smirnov
09:38 PM Bug #8863: osd: second reservation rejection -> crash
I used this command to reimport the crushmap, but the OSD still crashes. shaojun ruan
01:19 PM Bug #8863: osd: second reservation rejection -> crash
try this:
ceph osd getcrushmap -o cm
ceph osd setcrushmap -i cm
and then see if you can reproduce it after t...
Sage Weil
03:41 AM Bug #8863: osd: second reservation rejection -> crash
The OSD rejected the other OSD's backfill request twice, probably because the space is full; the requesting OSD then crashed. shaojun ruan
03:27 AM Bug #8863: osd: second reservation rejection -> crash
*scenario:*
1. 3-replica
2. space is nearly full (some OSDs >96%)
We guess the reason is osd continuously receivi...
shaojun ruan
07:52 PM Bug #8886: Miss some folders in PG's folder
Hi, Samuel,
First, let me correct my statement "it should be stored in the DIR_3 at third level": actually it is missing the DIR_...
Jingjing Zhao
01:43 PM Bug #8886: Miss some folders in PG's folder
Can you add a find . on that pg directory? Also, does this happen reliably? Also, on what version did you reproduce... Samuel Just
07:30 PM CephFS Bug #8962: kcephfs: client does not release revoked cap
I saw a similar hang a few weeks ago. In that case, all OSDs were down, so the MDS couldn't submit log events. Zheng Yan
03:05 PM CephFS Bug #8962 (Resolved): kcephfs: client does not release revoked cap
several instances where the mds tries to revoke a cap (Ls and Fs have been observed so far) and the client doesn't re... Sage Weil
07:18 PM CephFS Bug #8964: kcephfs: client does not resend requests on mds restart
Zheng Yan
07:18 PM CephFS Bug #8964: kcephfs: client does not resend requests on mds restart
probably fixed by https://github.com/ceph/ceph-client/commit/967166011221589288348b893720d358150176b9 Zheng Yan
05:40 PM CephFS Bug #8964: kcephfs: client does not resend requests on mds restart
mds log and the client kern.log with debug cranked up:... Sage Weil
05:39 PM CephFS Bug #8964 (Resolved): kcephfs: client does not resend requests on mds restart
i have a bunch of hung requests,... Sage Weil
06:47 PM Feature #8965 (New): Improve threading for ObjectCacher
The ObjectCacher currently uses a single global lock for all state. Break this down to improve multithread performanc... Haomai Wang
03:55 PM Feature #8960: filestore: store backend type persisently
Sage Weil
10:27 AM Feature #8960 (Resolved): filestore: store backend type persisently
Sage Weil
03:32 PM rgw Bug #8586 (Fix Under Review): Missing Swift API Header causes RadosGW to segfault
Yehuda Sadeh
03:06 PM RADOS Bug #8963 (Resolved): erasure coding crush rulset breaks rbd kernel clients on non-ec pools on Ub...
On a fresh install using ceph-deploy on Ubuntu 14.04 creating any erasure coded pool breaks rbd clients on linux 3.13... Greg Dahlman
03:02 PM Bug #8726 (Resolved): (firefly command on dumpling issue?) Error "'adjust-ulimits ceph-coverage /...
commit:fcc0b2451b47793a64fc4cd4675fef667a4a5b45 in ceph-qa-suite.git Josh Durgin
02:31 PM Bug #8628: Bad ceph_osd_op.extent union access in ReplicatedPG::do_osd_ops
This was fixed in 58212b1. Adam Crume
02:28 PM devops Bug #6091 (Won't Fix): centos build should use redhat-rpm-config for debuginfo packages
Sage Weil
02:28 PM devops Bug #5819 (Won't Fix): redhat-rpm-config package needed for debuginfo packages
Sage Weil
02:26 PM devops Bug #7181 (Rejected): debian 7 wheezy init.d script will not start OSDs not corresponding to a mo...
touch /var/lib/ceph/osd/*/sysvinit Sage Weil
02:26 PM devops Bug #6937 (Resolved): udev: OSD using dmcrypt aren't automatically started
Sage Weil
02:25 PM devops Bug #6453 (Won't Fix): libapache2-mod-fastcgi Packages for Debian Squeeze have incorrect dependen...
Sage Weil
02:25 PM devops Bug #6158: selective sync of ceph precise dependencies from havana cloud archive
Note: Talk to neil about this. Sandon Van Ness
02:22 PM devops Bug #8602 (Rejected): ceph fedora package is missing erasure code libraries
redoing (redid?) these packages Sage Weil
02:22 PM Bug #8711 (Resolved): Error "ceph --format=json-pretty osd lspools" is "unrecognized command" in ...

Oops, this should have been closed already...
John Spray
01:51 PM Bug #8711: Error "ceph --format=json-pretty osd lspools" is "unrecognized command" in cuttlefish
Probably best to change the test to cope? Samuel Just
02:21 PM devops Bug #7598 (Can't reproduce): ceph-disk-activate error with ceph-deploy
Sage Weil
02:19 PM devops Bug #8581 (Can't reproduce): DNS issues when resolving hosts
Sage Weil
02:17 PM devops Bug #8734: EPEL / Ceph.com package priority issues
ceph-deploy sets the priority; other users will need to do so themselves.
perhaps that can be mentioned in the doc...
Sage Weil
02:15 PM devops Bug #5283 (Won't Fix): Ceph-deploy can't handle /dev/disk/by-* device paths
Sage Weil
02:06 PM devops Bug #7627 (Resolved): ceph-disk: does not start daemons properly under systemd
commit:3e0d9800767018625f0e7d797c812aa44c426dab Sage Weil
02:01 PM Documentation #8875: `ceph-deploy new` needs to be called for every node, not just the admin one
Can you paste the whole output of ceph-deploy? Alfredo Deza
01:58 PM Bug #6141 (Can't reproduce): OSDs crash on recovery
Samuel Just
01:52 PM Bug #8673 (Resolved): s3tests.functional.test_s3.test_multipart_upload failed in teuthology-2014-...
Sage Weil
01:50 PM Bug #8654 (Resolved): Parsing /etc/lsb-release for OSD metadata is not portable
Sage Weil
01:49 PM Bug #8644 (Rejected): 624ae21833 breaks ceph-disk
Sage Weil
01:48 PM Bug #8852 (Won't Fix): submodules not cecking out the right branch, jerasure does not compile
workaround is to remove the dir then rerun the submodule command. we blame git! Sage Weil
01:47 PM Bug #8801 (Can't reproduce): Ceph monitors do not start after server restart
From the logs, the ceph-mon process was never started. I would look in your /var/log/upstart logs. Sage Weil
01:37 PM Bug #8943 (Pending Backport): "ceph df" cannot show pool available space correctly
commit:04d0526718ccfc220b4fe0c9046ac58899d9dafc Sage Weil
01:34 PM Bug #8495 (Duplicate): osd: bad state machine event on backfill request
Sage Weil
01:29 PM Bug #8694 (Duplicate): OSD crashed (assertion failure) at FileStore::_collection_move_rename
#8733 Sage Weil
01:28 PM rgw Bug #8676: md5sum check failed during readwrite.py
I don't see anything wrong in the logs other than this:... Yehuda Sadeh
01:27 PM Bug #8753: PG::activate assert failed when recover finished
Has this happened since? Samuel Just
01:26 PM Bug #8865: cep osd setmaxosd doesn't check if osds exist
agreed Samuel Just
01:26 PM Bug #8752 (Can't reproduce): firefly: scrub/repair stat mismatch
Sage Weil
01:25 PM Bug #8752 (Resolved): firefly: scrub/repair stat mismatch
Sage Weil
01:06 PM CephFS Bug #8961 (Won't Fix): du [directory] vs du -b [directory] size doubles
Under cephfs using the kernel client, du -b shows an incorrect size.
I've also found that du --apparent-size shows...
Matt Hook
01:04 PM Bug #8717 (In Progress): teuthology: valgrind leak checks broken for osd (at least)
Sage Weil
01:03 PM Bug #8717 (Resolved): teuthology: valgrind leak checks broken for osd (at least)
Sage Weil
01:03 PM Bug #8926 (Resolved): osd: invalid Message* deref in C_SendMap
Sage Weil
01:03 PM Bug #8924 (Resolved): osd: leaking local_connection under valgrind
Sage Weil
12:59 PM Messengers Bug #8880: msg/Pipe.cc: 1538: FAILED assert(0 == "old msgs despite reconnect_seq feature")
Sage Weil
10:42 AM rgw Bug #8632 (Resolved): rgw: bucket listing with delimiter doesn't scale well
backported to dumpling commit:9604425b86f5839a109faa1f396b0d114e9b9391 Yehuda Sadeh
09:36 AM rgw Bug #8632 (Pending Backport): rgw: bucket listing with delimiter doesn't scale well
in firefly, not dumpling yet Sage Weil
10:31 AM rgw Bug #8846 (Resolved): radosgw on 0.80.4 crashes when doing a multi-part upload
Yehuda Sadeh
10:11 AM Bug #8532 (Can't reproduce): 0.80.1: OSD crash (domino effect), same as BUG #8229
Let us know if anything interesting comes up. Samuel Just
10:10 AM Bug #8229: 0.80~rc1: OSD crash (domino effect)
This bug described a whole bunch of unrelated problems, can you open a fresh bug? Samuel Just
10:01 AM Bug #8959: osd crashed in upgrade:dumpling-x-firefly---basic-vps suite
this sounds a bit like a problem we had a while back with hung IOs from the VMs? Sage Weil
08:40 AM Bug #8959: osd crashed in upgrade:dumpling-x-firefly---basic-vps suite
Seems to be the same crash in another test, logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-07-28_11:48:15... Yuri Weinstein
08:36 AM Bug #8959 (Can't reproduce): osd crashed in upgrade:dumpling-x-firefly---basic-vps suite
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-07-28_11:48:15-upgrade:dumpling-x-firefly---basic-vps... Yuri Weinstein
09:41 AM CephFS Bug #8574: teuthology: NFS mounts on trusty are failing
I'm not sure if this is a different issue or a different system:... Greg Farnum
09:40 AM devops Support #8861: Deploying additional monitors fails.
I am also seeing this error when trying to add a new monitor. Same version of Ubuntu and Ceph. James Devine
09:38 AM rgw Bug #8735 (Can't reproduce): TestAccountNoContainers fail in Firefly upgrade:firefly-x:stress-split
Sage Weil
09:38 AM rgw Bug #8766: multipart minimum size error should be EntityTooSmall
Sage Weil
09:37 AM rgw Bug #8848 (Resolved): "adjust-ulimits: command not found" in upgrade:firefly-firefly-testing-basi...
Sage Weil
09:37 AM rgw Bug #8847 (Can't reproduce): "Error initializing cluster client" in upgrade:firefly-firefly-testi...
Sage Weil
09:34 AM Bug #8921 (Won't Fix): ceph pg dump <{summary|sum|delta|pools|osds|pgs|pgs_brief}> only work corr...
Sage Weil
09:33 AM rgw Bug #8864: radosgw help doesn't seem to display some debug options
there are others that we could add Sage Weil
09:32 AM rgw Bug #8864 (Resolved): radosgw help doesn't seem to display some debug options
Sage Weil
09:32 AM rgw Bug #6911 (Won't Fix): rgw test failure on the arm set up
Sage Weil
09:31 AM rgw Bug #8111 (Need More Info): /etc/init.d/ceph-radosgw for RHEL needs QA
isn't it /etc/init.d/radosgw?
Sage Weil
09:30 AM rgw Bug #8383 (Need More Info): Upload part of one object passed with incorrect upload id or incorrec...
Can you provide more detailed steps to reproduce? ideally, a new test in s3-tests.... :) Sage Weil
09:29 AM rgw Bug #7799 (Can't reproduce): Errors in upgrade:dumpling-x:stress-split-firefly---basic-plana suite
Sage Weil
09:25 AM rgw Bug #8311 (Resolved): No pool name error in ubuntu-2014-05-06_21:02:54-upgrade:dumpling-dumpling-...
Sage Weil
09:25 AM rgw Bug #8784: rgw: completion leak
Sage Weil
09:23 AM rbd Bug #6695 (Won't Fix): Upgrade rbd failure in nightly tests. (mkdir --p ..)
Sage Weil
09:22 AM rbd Bug #5480 (Can't reproduce): libceph: unexpected old state in con_sock_state_change
Sage Weil
09:21 AM rbd Bug #8845: Flattening Clones of clone, results in command failure
fsx is now able to catch this one. Ilya Dryomov
09:19 AM rbd Bug #8845: Flattening Clones of clone, results in command failure
Josh Durgin
09:15 AM rbd Bug #8845: Flattening Clones of clone, results in command failure
Ilya Dryomov
09:21 AM rbd Bug #7693: virsh domblkinfo fails with 'Bad file descriptor'
https://bugzilla.redhat.com/show_bug.cgi?id=1124508 Sage Weil
09:17 AM rbd Bug #7620 (Can't reproduce): BUG: soft lockup - CPU#0 stuck for 23s!
Sage Weil
09:15 AM Linux kernel client Bug #8568 (New): libceph: kernel BUG at net/ceph/osd_client.c:885
Ilya Dryomov
09:10 AM Linux kernel client Bug #8568: libceph: kernel BUG at net/ceph/osd_client.c:885
Ilya Dryomov
09:14 AM rbd Bug #8709: stale size reported by ioctl(BLKGETSIZE64) after librbd_resize() returns
The problem has been traced to http://tracker.ceph.com/issues/8806. Keeping this around to re-test after it gets fixed. Ilya Dryomov
09:11 AM Bug #8439 (Won't Fix): ceph-osd crashing often
see 0.80.x Sage Weil
09:10 AM Bug #8445 (Won't Fix): osd not starting anymore
0.78 had lots of issues; see 0.80.x Sage Weil
09:01 AM rbd Bug #8318 (Can't reproduce): "rbd: create error" in upgrade:dumpling-dumpling-testing-basic-plana...
Sage Weil
09:01 AM rbd Bug #8715 (Can't reproduce): "ceph_test_librbd_fsx: invalid option -- 'h'" error in teuthology-20...
Sage Weil
06:57 AM CephFS Feature #7759 (Resolved): journal-tool: roll in resetter/dumper from MDS
... John Spray
06:56 AM CephFS Feature #7761 (Resolved): journal-tool: forwards-search through corrupt regions
... John Spray
06:55 AM CephFS Feature #7763: journal-tool: import
... John Spray
06:54 AM CephFS Feature #7763 (Resolved): journal-tool: import
This was done when undump was merged into cephfs-journal-tool John Spray
06:51 AM CephFS Bug #8773 (Resolved): failing cephfs set_layout tests
The test is retired and the unsafe behaviour (data pool defaulting to 0) is disabled in master. John Spray
06:07 AM CephFS Bug #8811 (Resolved): Journal corruption during upgrade to 0.82 with standby-replay daemons
This got fixed 11 days ago, but was never marked closed. Merged in commit:b9463e3497cc1f2a1bab0838430a4402d8c88af0 Greg Farnum
05:59 AM Bug #8932 (Resolved): rados api test hang on HitSetWrite
Merged to master in commit:37eba045ec78f2ea8f9000c6b158e20808d29fb2 Greg Farnum
05:56 AM Bug #8931 (Pending Backport): failed write reply order from ceph_test_rados
Merged to master in commit:050ac87530c2637f097e07b5373115721303f07c Greg Farnum

07/28/2014

10:47 PM Bug #8944: Ceph daemon bad asok used in connection with cluster
wip-8944 created, but gitbuilders are having enough problems I'm not submitting a PR yet Dan Mick
02:11 PM Bug #8944 (Fix Under Review): Ceph daemon bad asok used in connection with cluster
Adding the global args to the invocation of ceph-conf seems to resolve this. Dan Mick
12:41 PM Bug #8944: Ceph daemon bad asok used in connection with cluster
oh....because --cluster on the cli ... yeah.
Dan Mick
12:40 PM Bug #8944: Ceph daemon bad asok used in connection with cluster
ceph uses ceph-conf --show-config-value admin_socket -n <name> and believes it; wonder why that's not working? Dan Mick
09:58 AM Bug #8944: Ceph daemon bad asok used in connection with cluster
Sage Weil
05:01 AM Bug #8944 (Resolved): Ceph daemon bad asok used in connection with cluster
Using @ceph --cluster clustername daemon mon.host1 config@ causes ... Szymon Zacher
10:46 PM Bug #8947: Writing rados objects with max objects set for cache pool crashed osd
Uploading crash dump Mallikarjun Biradar
01:45 PM Bug #8947: Writing rados objects with max objects set for cache pool crashed osd
Could not reproduce using vstart.sh on current master branch. I never saw a crash or bug report with that stack trace. David Zafman
10:08 AM Bug #8947: Writing rados objects with max objects set for cache pool crashed osd
I don't remember the details, but we were previously crashing with a 10-object limit anyway due to hit sets and such.... Greg Farnum
08:16 AM Bug #8947: Writing rados objects with max objects set for cache pool crashed osd
Test configuration:
No of osd nodes: 3
No of osd's : 4
No of monitors: 2
Kernel versions: 3.13.0-24-generic
No o...
Mallikarjun Biradar
08:15 AM Bug #8947 (Duplicate): Writing rados objects with max objects set for cache pool crashed osd
Setting the target_max_objects parameter and writing rados objects onto the cache pool crashed the osd.
History of operations o...
Mallikarjun Biradar
06:41 PM Messengers Bug #8880 (Fix Under Review): msg/Pipe.cc: 1538: FAILED assert(0 == "old msgs despite reconnect_s...
New patches to split up the code more, as requested. :) Greg Farnum
10:56 AM Messengers Bug #8880 (In Progress): msg/Pipe.cc: 1538: FAILED assert(0 == "old msgs despite reconnect_seq fe...
Greg Farnum
02:12 PM rgw Bug #8937 (Fix Under Review): rgw: broken large(-ish) objects
Yehuda Sadeh
02:10 PM rgw Feature #7774 (Resolved): rgw: cache decoded user and bucket info
This one was merged a while ago, at commit:82c547952dc9e7a3e9fab1264f5fdd903ab6973e. Yehuda Sadeh
01:07 PM Bug #8941 (Can't reproduce): DaemonConfig.SubstitutionLoop unit test goes haywire
Never mind, the most recent occurrence was in February, so ignoring this. Sage Weil
01:02 PM rgw Feature #8956 (Resolved): rgw: support bucket notification
Yehuda Sadeh
11:32 AM Documentation #8955: doc refers to [default] section, don't think it exists
http://ceph.com/docs/master/start/quick-ceph-deploy/#create-a-cluster refers to the [default] section in the ceph.con... Dan Mick
11:31 AM Documentation #8955 (Resolved): doc refers to [default] section, don't think it exists
Dan Mick
09:21 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
I'm pretty sure it's the disabled lockdep that affects this. Our testing kernel is built with lockdep enabled, Ubunt... Ilya Dryomov
08:50 AM Linux kernel client Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
Hi Ilya,
I can reliably reproduce the error when running this generic kernel with no changes:
http://kernel.ubu...
Greg Wilson
08:39 AM Bug #8935: operations not idempotent when enabling cache
I think you're right that a per-object log would be needed to solve this problem — and I think that means we shouldn'... Greg Farnum
08:02 AM rgw Feature #8945 (Resolved): rgw: support swift /info api
Yehuda Sadeh
06:55 AM Bug #8938 (Resolved): OSD memory leak seen with fs-master-testing-basic/kernel_untar_build.sh
This was fixed at about the same time:... John Spray
06:42 AM CephFS Feature #7810 (In Progress): libcephfs: add a test that freezes + unfreezes a client, and then ve...
John Spray
05:27 AM Bug #8895: ceph osd pool stats (displayed incorrect values)
Negative and undefined values in object counts:
*-5/0 objects degraded (-inf%)*
*-32/12 objects degraded (-266...
Andrey Matyashov
03:06 AM rgw Bug #8864: radosgw help doesn't seem to display some debug options
This should be closed with #8112 Abhishek Lekshmanan
02:48 AM Bug #8943 (Resolved): "ceph df" cannot show pool available space correctly
Currently, when a user has 2 pools with different rulesets and different roots, basically they will use differen... Xiaoxi Chen
12:37 AM Bug #8863: osd: second reservation rejection -> crash
Last week we created a new cluster (all components use v0.80.4), continuously writing data until space was full, the... shaojun ruan

07/27/2014

11:45 PM Bug #8942 (Resolved): Bad JSON output in ceph osd tree
Hi,
JSON output for @ceph osd tree@ has a bad format for the stray array: every osd is printed in the same array element....
Szymon Zacher
10:41 PM Bug #8941 (Can't reproduce): DaemonConfig.SubstitutionLoop unit test goes haywire
... Sage Weil
10:31 PM Bug #8822: osd: hang on shutdown, spinlocks
saw this again, ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-07-27_02:30:01-rados-next-testing-basi... Sage Weil
10:28 PM Bug #8396: osd: message delayed in Session misdirected after split
very likely another instance, but i didn't look closely.... Sage Weil
10:20 PM Bug #8940 (Duplicate): 3.22s1 shard 0(2) missing ad166f62/benchmark_data_plana57_30491_object1036...
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-07-27_02:30:01-rados-next-testing-basic-plana/380335
...
Sage Weil
09:47 PM Bug #6003: journal Unable to read past sequence 406 ...
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-07-27_02:30:01-rados-next-testing-basic-plana/380261
...
Sage Weil
02:32 PM Bug #8758: PGs get stuck in “replay”, but drop it upon osd restarts
As for the issue of losing replay states upon member osd restarts... Could the fix be as simple as not setting inter... Alexandre Oliva
01:44 PM Bug #8758: PGs get stuck in “replay”, but drop it upon osd restarts
Here's a patch that addresses the “stuck in replay” problem (but not the “replay is dropped after osd re-peering” one). Alexandre Oliva
11:21 AM Bug #8863 (Need More Info): osd: second reservation rejection -> crash
Sage Weil
11:20 AM Bug #8922 (Need More Info): ceph-deploy mon create fails to create additional monitoring nodes.
It sounds like the monitor names don't match the host names or something similar. Can you post the full sequence of ... Sage Weil

07/26/2014

10:14 PM Bug #8939 (In Progress): stalled LibRadosTwoPoolsPP.TryFlushReadRace; client failed to reconnect?
Sage Weil
10:10 PM Bug #8939: stalled LibRadosTwoPoolsPP.TryFlushReadRace; client failed to reconnect?
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-07-25_22:40:14-rados-wip-sage-testing-testing-basic-plana/37... Sage Weil
10:05 PM Bug #8939 (Duplicate): stalled LibRadosTwoPoolsPP.TryFlushReadRace; client failed to reconnect?
It appears the OSD was behaving properly, but things stalled because one of the stat replies got... Sage Weil
02:06 PM Bug #8938 (Resolved): OSD memory leak seen with fs-master-testing-basic/kernel_untar_build.sh

http://pulpito.front.sepia.ceph.com/teuthology-2014-07-25_23:04:01-fs-master-testing-basic-plana/378947/
Initial...
John Spray
 
