Activity

From 10/16/2014 to 11/14/2014

11/14/2014

10:46 PM Bug #10115: mon not running. osd is dead
My ceph version is 0.80.1. I installed it on Ubuntu 12.04.4.
uname -a : Linux controller 3.11.0-26-generic #45~preci...
? ??
10:22 PM Bug #10115: mon not running. osd is dead
This is the log file from one of my ceph nodes. ? ??
10:10 PM Bug #10115 (Can't reproduce): mon not running. osd is dead
My ceph cluster didn't have cephx configured. I solved one problem earlier as described in this issue: http://tracker.ceph.com/issues/8851.
...
? ??
06:05 PM Bug #10114 (Fix Under Review): assembly files need annotation to assert that stack should not be ...
seeming workaround in wip-execstack
Dan Mick
05:58 PM Bug #10114: assembly files need annotation to assert that stack should not be executable

References:
https://bugzilla.redhat.com/show_bug.cgi?id=1118504 the original bug that noticed the problem on Fe...
Dan Mick
05:30 PM Bug #10114 (Resolved): assembly files need annotation to assert that stack should not be executable
Dan Mick
05:10 PM Bug #10113: --log-to-stderr with -f/-d sends a lot of things to logfile
on a vstart cluster with 3 osds, if I stop osd.2 and restart like:
./ceph-osd -i 2 -c ./ceph.conf --log-to-stderr ...
Dan Mick
05:10 PM Bug #10113 (Duplicate): --log-to-stderr with -f/-d sends a lot of things to logfile
Dan Mick
03:45 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
Samuel Just
03:12 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
This is almost certainly unrelated to those two bugs. This is a specific edge case in divergent write recovery. Samuel Just
11:43 AM devops Cleanup #7722 (Resolved): Make /admin/build-doc distro independent
John Wilkins
11:41 AM devops Cleanup #7722: Make /admin/build-doc distro independent
Updated the procedure doc with all dependencies. John Wilkins
11:43 AM Bug #9788 (New): "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" is...
Logs are in http://pulpito.front.sepia.ceph.com/teuthology-2014-11-13_17:33:44-upgrade:giant-x-next-distro-basic-vps/... Yuri Weinstein
10:22 AM Cleanup #10110 (New): librados: mark old objects_begin interface deprecated
There is some minor refactoring needed since the new methods call the old ones when ns == "". The fix is probably to... Sage Weil
10:18 AM devops Tasks #8366: Update ceph.com/docs to default to the latest major release (0.80)
Can we update it to the latest major release with the backports--e.g., v0.80.7? I finally have someone to help with t... John Wilkins
10:12 AM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
I think that's an annoying special case for snaps purged on an empty pg. Both the old primary which did the trim and... Samuel Just
08:09 AM Bug #10107: Coredump in upgrade:giant-x-next-distro-basic-multi run
... Sage Weil
07:40 AM Bug #10107 (Duplicate): Coredump in upgrade:giant-x-next-distro-basic-multi run
(Maybe related to #8733)
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-13_17:04:11-upgrade:gi...
Yuri Weinstein
08:03 AM Bug #10109 (Duplicate): "LibRadosTwoPoolsECPP.PromoteSnap" test failed in upgrade:dumpling-firefl...
3 tests failed in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-13_17:15:02-upgrade:dumpling-firefly-x:p... Yuri Weinstein
07:55 AM rgw Bug #10108 (Duplicate): s3tests fail in upgrade:dumpling-firefly-x:parallel-next-distro-basic-mul...
All tests failed in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-13_17:10:02-upgrade:dumpling-firefly-x... Yuri Weinstein
07:47 AM Bug #10105: crash in PG::peek_map_epoch on upgrade from 0.80.4 to 0.80.7
the upgrade from 0.80.1 to 0.80.7 case was a bad disk. Sage Weil
07:32 AM Bug #9727: 0.86 EC+ KV OSDs crashing
Hi,
I tried this again on the new 0.88 release.
After about 30 minutes of testing, the EC-KV OSDs started crashin...
Kenneth Waegeman
04:51 AM Messengers Feature #10029: Retry binding on IPv6 address if not available
I started playing with this a bit (no commits yet), I simply loop in SimpleMessenger's Accepter.cc and retry to bind ... Wido den Hollander
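The retry-on-bind idea described in this comment can be sketched roughly as follows. This is a minimal illustration in Python, not the Accepter.cc change itself (no commits existed at the time of the comment). The helper name `bind_with_retry` and its parameters are hypothetical.

```python
import time


def bind_with_retry(bind, attempts=5, delay=1.0, sleep=time.sleep):
    """Call bind() until it succeeds or the attempts run out.

    `bind` is any zero-argument callable that raises OSError on failure,
    for example a closure around socket.bind(). Hypothetical helper,
    not Ceph code.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return bind()
        except OSError as err:
            # The address may simply not be available yet; remember the
            # error and try again after a pause.
            last_error = err
            if attempt < attempts:
                sleep(delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

Passing the bind operation in as a callable keeps the loop independent of the socket type, and the injectable `sleep` makes the retry logic easy to exercise without real delays.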
03:26 AM Feature #9979 (In Progress): osd: cache: proxy reads (instead of redirect)
https://github.com/ceph/ceph/pull/2927 Loïc Dachary
02:17 AM rgw Bug #10106 (Resolved): rgw acl response should start with <?xml version="1.0" ?>
I encountered some surprising behaviour when playing with radosgw and s3cmd.
You can probably make a convincing case...
Jon Kåre Hellan
02:10 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
Loïc Dachary

11/13/2014

10:32 PM Bug #10052 (Fix Under Review): LibRadosTwoPools[EC]PP.PromoteSnap failure
https://github.com/ceph/ceph/pull/2926 Sage Weil
10:19 PM Bug #10052 (In Progress): LibRadosTwoPools[EC]PP.PromoteSnap failure
// read baz
{
  bufferlist bl;
  ASSERT_EQ(-ENOENT, ioctx.read("baz", bl, 1, 0));
}
I think this usu...
Sage Weil
05:44 PM Bug #10052: LibRadosTwoPools[EC]PP.PromoteSnap failure
ubuntu@teuthology:/a/sage-2014-11-12_13:30:37-smoke-wip-warn-max-pg-distro-basic-multi/598501 Sage Weil
08:49 PM Bug #10105 (Can't reproduce): crash in PG::peek_map_epoch on upgrade from 0.80.4 to 0.80.7
... Sage Weil
05:48 PM Bug #10104 (Resolved): rados.py: wait_for_* don't wait; should have poll, wait, and wait+cb versions
Completion.wait_for_{safe, complete} are using the poll functions "is_{safe,complete}"; the comments indicate that's ... Dan Mick
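The distinction this report draws between poll, wait, and wait+cb can be illustrated with a toy completion object. This is a sketch only: the `Completion` class below is hypothetical and built on `threading.Event`. It is not the rados.py class and makes no claim about how the fix was implemented.

```python
import threading


class Completion:
    """Toy completion object (hypothetical, not the rados.py class)."""

    def __init__(self, on_complete=None):
        self._done = threading.Event()      # set when the operation finishes
        self._cb_done = threading.Event()   # set after the callback returns
        self._on_complete = on_complete

    def is_complete(self):
        # Poll: returns immediately and never blocks.
        return self._done.is_set()

    def wait_for_complete(self, timeout=None):
        # Wait: blocks until finish() is called or the timeout expires.
        return self._done.wait(timeout)

    def wait_for_complete_and_cb(self, timeout=None):
        # Wait + cb: blocks until the callback has also returned.
        return self._cb_done.wait(timeout)

    def finish(self):
        # Called by whoever completes the operation.
        self._done.set()
        if self._on_complete is not None:
            self._on_complete(self)
        self._cb_done.set()
```

A caller that uses `is_complete()` where it meant `wait_for_complete()` gets an answer straight away instead of blocking, which is the behaviour the bug title describes.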
05:47 PM rgw Bug #10103 (Resolved): swift tests failing
ubuntu@teuthology:/a/dzafman-2014-11-13_10:42:58-rgw-wip-10082-testing-basic-multi$ teuthology-ls . | grep FAIL
5996...
Sage Weil
05:02 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
Any progress? Dmitry Smirnov
04:36 PM rgw Bug #10082 (Resolved): Segmentation fault in upgrade:dumpling-firefly-x:parallel-next-distro-basi...
Sage Weil
04:28 PM Feature #10064 (Fix Under Review): add ceph_objectstore_tool tests to make check
https://github.com/ceph/ceph/pull/2915 Loïc Dachary
04:28 PM Bug #10063 (Fix Under Review): ceph_objectstore_tool does not support getting attributes for eras...
https://github.com/ceph/ceph/pull/2915 Loïc Dachary
03:48 PM rgw Bug #10102 (Resolved): sync agent: does not handle gracefully transient errors
on a copy operation, rgw sent back 400 and the sync agent got stuck in the following loop:... Yehuda Sadeh
12:58 PM rgw Bug #9587 (Pending Backport): ceph-radosgw sysvinit script on EL6 cannot set ulimit
Loïc Dachary
12:25 PM rgw Bug #10099 (Duplicate): radosgw-agent - error geting op state: list index out of range
radosgw-agent logs the following, and objects are not synced to the secondary gateway.
INFO:urllib3.connectionpool...
Brian Andrus
12:25 PM Bug #10096: ceph-disk prepare fails to unmount temp file successfully
Notes:
- Issuing a short delay before 'umount' fixes the issue - this is a terrible workaround
- Issuing 'sync' b...
Blaine Gardner
07:52 AM Bug #10096 (Resolved): ceph-disk prepare fails to unmount temp file successfully
I have been testing on a virtual machine for ease of testing, and 'ceph-disk prepare' kept forwarding an error from '... Blaine Gardner
11:07 AM Bug #10095 (Resolved): (crush_bucket_adjust_item_weight()+0) [0x7d1540] crash
Sage Weil
11:02 AM Bug #10095 (Fix Under Review): (crush_bucket_adjust_item_weight()+0) [0x7d1540] crash
https://github.com/ceph/ceph/pull/2920 Sage Weil
07:37 AM Bug #10095 (Resolved): (crush_bucket_adjust_item_weight()+0) [0x7d1540] crash
ubuntu@teuthology:/a/samuelj-2014-11-11_22:08:30-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/597458
...
Samuel Just
10:36 AM Bug #9835 (Resolved): osd: bug in misdirected op checks (firefly)
Sage Weil
10:25 AM Messengers Feature #10079 (Resolved): AsyncMessenger: Support select for other OS
Haomai Wang
09:49 AM Feature #10098 (Resolved): wanted: command to clear 'incomplete' PGs
Hello,
Please create a command that would clear 'incomplete' PGs.
Perhaps ceph pg force_create_pg could be extend...
c sights
08:32 AM rbd Bug #9854 (Pending Backport): librbd: reads contending for cache space can cause livelock
Jason Dillaman
08:28 AM Bug #10097 (Resolved): failed: mon_thrash
debian 7.0
logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-12_17:15:01-upgrade:giant-giant-dist...
Yuri Weinstein
07:17 AM Support #10024: Cluster unreachable after restart
Hi,
Have I missed anything?
Did I do something wrong?
Because I didn't get any answer after more than 1 week.
Thank...
Luca Mazzaferro
06:59 AM Cleanup #10094 (New): Create new git repo for json_spirit
json_spirit is currently part of the ceph code tree, but it's external code. There was also no update within a long... Danny Al-Gaaf
06:58 AM CephFS Bug #10092 (Resolved): multiple_rsync.sh + ceph-fuse timing out on firefly
greg is right, these time out semi-regularly. increased the timeout on master, giant, firefly. Sage Weil
06:38 AM Bug #10093 (Fix Under Review): ceph-monstore-tool: FAILED assert(!is_open)
https://github.com/ceph/ceph/pull/2914 Loïc Dachary
06:35 AM Bug #10093 (Resolved): ceph-monstore-tool: FAILED assert(!is_open)
Using a vstart cluster + stop.sh:... Loïc Dachary
04:17 AM Bug #9916: osd: crash in check_ops_in_flight
Hi Yehuda,
After taking a look at the rgw code, I failed to find which (http) request would need CEPH_OSD_OP_SRC_CMP...
Guang Yang
12:14 AM Feature #9943 (In Progress): osd: mark pg and use replica on EIO from client read
Currently the OSD checks the PG map, gets only k items, and sends sub-read requests. So if one read fails, it asserts and core du... Wei Luo

11/12/2014

09:21 PM Bug #10077: ceph_objectstore_tool: sets SHARDS feature on export it doesn't need to
How do we tell the difference between (2) and (3)? In both cases, ceph_objectstore_tool will see there is no SHARDS ... Sage Weil
09:06 PM Bug #10077: ceph_objectstore_tool: sets SHARDS feature on export it doesn't need to

I see from the code that there are a couple of scenarios that need to be handled or at least documented:
1. Expo...
David Zafman
08:59 PM CephFS Bug #10092 (Resolved): multiple_rsync.sh + ceph-fuse timing out on firefly
teuthology-2014-11-11_23:04:01-fs-firefly-distro-basic-multi/598145
teuthology-2014-11-11_23:04:01-fs-firefly-distro...
Sage Weil
08:25 PM Bug #8588: In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size...
Wei is working on this along with http://tracker.ceph.com/issues/9943 . Guang Yang
06:52 PM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
Greg Farnum wrote:
> What version are you running? This looks like one of a couple of bugs that have been resolved i...
Wenjun Huang
10:47 AM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
What version are you running? This looks like one of a couple of bugs that have been resolved in the latest point rel... Greg Farnum
04:26 AM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
And the peer OSD's log is as below:... Wenjun Huang
03:40 AM Messengers Bug #10080 (Resolved): Pipe::connect() cause osd crash when osd reconnect to its peer
When our cluster load is heavy, the osd sometimes crashes. The critical log is as below:
-278> 2014-08-20 11:04:28...
Wenjun Huang
05:15 PM rbd Bug #9771: Segmentation fault after upgrade v0.80.5 -> v0.80.6
Jason Dillaman
05:13 PM rbd Bug #9771: Segmentation fault after upgrade v0.80.5 -> v0.80.6
Commit b75f85a2 added new elements to the _Thread_ class, breaking ABI. In this (and several other upgrade tests fro... Jason Dillaman
05:08 PM Feature #9957: librados: add fadvise op
See the pull request: https://github.com/ceph/ceph/pull/2905 jianpeng ma
04:09 PM rgw Bug #10090 (Resolved): ceph_objectstore_tool import broken
Sage Weil
03:27 PM rgw Bug #10090 (Fix Under Review): ceph_objectstore_tool import broken
David Zafman
02:15 PM rgw Bug #10090 (Resolved): ceph_objectstore_tool import broken

The tool can't import because it finds that the recently removed collection still exists.
It may be because fini...
David Zafman
12:37 PM rbd Bug #10002 (Resolved): Errors during import_export test in upgrade:firefly-x-next-distro-basic-vp...
commit:e94d3c11edb9c9cbcf108463fdff8404df79be33 Josh Durgin
11:38 AM Bug #10083 (Resolved): cephtool/test.sh: osd create w/o uuid test is noisy
Sage Weil
10:09 AM Bug #10083: cephtool/test.sh: osd create w/o uuid test is noisy
Verified to work with... Loïc Dachary
09:53 AM Bug #10083 (Fix Under Review): cephtool/test.sh: osd create w/o uuid test is noisy
https://github.com/ceph/ceph/pull/2902 Loïc Dachary
09:29 AM Bug #10083 (Resolved): cephtool/test.sh: osd create w/o uuid test is noisy
... Sage Weil
10:56 AM Bug #10085 (Resolved): dirty exit ("Illegal instruction") on pthread_rwlock_unlock()
After upgrading to glibc 2.20, the "ceph" and "rbd" commands exit with an "Illegal instruction" message and a non-zero exit cod... Denis kaganovich
10:00 AM Feature #9598 (Pending Backport): re-enable Objecter fast dispatch
sage-2014-11-11_08:26:01-rados-wip-sage-testing-distro-basic-multi Sage Weil
08:42 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
Same issue http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-11_17:03:01-upgrade:firefly:older-firefly-distro-ba... Yuri Weinstein
08:29 AM rgw Bug #10082 (Resolved): Segmentation fault in upgrade:dumpling-firefly-x:parallel-next-distro-basi...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-11_17:10:01-upgrade:dumpling-firefly-x:parallel-ne... Yuri Weinstein
06:53 AM rbd Feature #2467 (Resolved): qemu: implement bdrv_invalidate_cache
Merged upstream: http://git.qemu.org/?p=qemu.git;a=commitdiff;h=be21788495fdc8251b04dd4bfd0cdce95c49d75b Jason Dillaman
01:23 AM Messengers Feature #10079 (Resolved): AsyncMessenger: Support select for other OS
AsyncMessenger already supports epoll and kqueue, but for other legacy OSes or Windows, we need to use select for the wo... Haomai Wang
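The fallback order requested here (epoll, then kqueue, then select) can be sketched with Python's standard `selectors` module. This only illustrates the selection logic; it is not AsyncMessenger code, and the function name `pick_event_driver` is hypothetical.

```python
import selectors


def pick_event_driver():
    """Return the best selector class available on this platform.

    Preference order mirrors the feature request: epoll, then kqueue,
    then plain select as the portable fallback.
    """
    for name in ("EpollSelector", "KqueueSelector", "SelectSelector"):
        # The selectors module only defines a class when the underlying
        # system call exists, so a missing attribute means "unsupported".
        driver = getattr(selectors, name, None)
        if driver is not None:
            return driver
    raise RuntimeError("no usable event driver found")
```

Because every driver exposes the same `register`/`select` interface, the calling code does not need to know which one was picked.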

11/11/2014

06:17 PM rbd Bug #10002 (Fix Under Review): Errors during import_export test in upgrade:firefly-x-next-distro-...
https://github.com/ceph/ceph/pull/2899 Josh Durgin
08:23 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
Same issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_17:15:02-upgrade:dumpling-firefly-x:parallel-... Yuri Weinstein
08:17 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
Seems to be a similar issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_17:05:02-upgrade:firefly:singleton-f... Yuri Weinstein
05:20 PM Bug #10052: LibRadosTwoPools[EC]PP.PromoteSnap failure
ubuntu@teuthology:/a/sage-2014-11-11_14:57:42-smoke-wip-warn-max-pg-distro-basic-multi/596722 Sage Weil
02:59 PM CephFS Bug #8090: multimds: mds crash in check_rstats
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-11-10_23:18:02-multimds-giant-testing-basic-multi/595393 Sage Weil
02:54 PM Bug #10077 (Resolved): ceph_objectstore_tool: sets SHARDS feature on export it doesn't need to
user on 0.87 exported a replicated pg and couldn't import it because the shards feature wasn't set on the osd.
w...
Sage Weil
02:14 PM rgw Feature #9933: rgw: implement S3 RR (reduced redundancy) API
Hmm, was looking just now at the S3 api, and it seems that you can set RR per object, not per bucket. This complicate... Yehuda Sadeh
11:01 AM Bug #10069 (Rejected): SyncEntryTimeout::finish() timeout

The ceph_objectstore_tool aborted in FileStore code.
On my wip-9780 branch which is rebased on current master ru...
David Zafman
10:31 AM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
Replying to my own post for posterity:
I figured out why those Git hashes don't align. It's a bug in log.cgi. Appare...
Ken Dreyer
08:50 AM devops Bug #10049 (Resolved): "Failed to fetch package" "rhel7_0-x86_64-basic"
Looks fixed Yuri Weinstein
09:53 AM Bug #10067 (Can't reproduce): ::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_19:13:02-upgrade:dumpling-x-firefly-distro-basi... Yuri Weinstein
09:01 AM rgw Feature #9013 (Resolved): rgw: set civetweb as a default frontend
Sage Weil
08:48 AM rgw Bug #10066: rgw: failed md5sum on s3tests-test-readwrite
Same problem in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_18:11:17-upgrade:firefly:newer-firefly-dist... Yuri Weinstein
07:22 AM rgw Bug #10066 (Resolved): rgw: failed md5sum on s3tests-test-readwrite
... Sage Weil
08:16 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
Same issues in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-10_17:18:01-upgrade:firefly-x-next-distro-b... Yuri Weinstein
08:02 AM Bug #10016 (Resolved): "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
tests passed. Yuri Weinstein
07:25 AM rgw Bug #9917 (Won't Fix): RADOSGW: Not able to create Swift objects with erasure coded pool
Sage Weil
03:51 AM rgw Bug #9917: RADOSGW: Not able to create Swift objects with erasure coded pool
OK, I was not aware of this; it seems like sane behaviour to me. pushpesh sharma
07:21 AM rgw Bug #10062: s3-test failures using keystone authentication
It looks like a few of them, e.g. the date ones, occur because radosgw doesn't consider checking the date head... Abhishek Lekshmanan
05:02 AM rgw Bug #10062 (Resolved): s3-test failures using keystone authentication
* "rgw: check for timestamp for s3 keystone auth":https://github.com/ceph/ceph/pull/2993
* "wip: rgw: check keystone...
Abhishek Lekshmanan
07:20 AM Bug #10065 (Duplicate): hung ec-lost-unfound.yaml, failed of osd.{0,2,3}
this pattern keeps popping up:... Sage Weil
07:16 AM Bug #7995: osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
ubuntu@teuthology:/a/teuthology-2014-11-10_02:32:01-rados-giant-distro-basic-multi/594038 Sage Weil
06:40 AM Feature #10064 (Resolved): add ceph_objectstore_tool tests to make check
The "ceph_objectstore_tool.py":https://github.com/ceph/ceph/blob/giant/src/test/ceph_objectstore_tool.py tests can be... Loïc Dachary
06:35 AM Bug #10063: ceph_objectstore_tool does not support getting attributes for erasure coded objects
... Loïc Dachary
06:33 AM Bug #10063 (Resolved): ceph_objectstore_tool does not support getting attributes for erasure code...
... Loïc Dachary
04:37 AM Bug #9554: "FAILED assert(0 == "hit suicide timeout")" in upgrade:firefly-firefly-testing-basic-v...
Yes it reproduced in giant too. Sahana Lokeshappa

11/10/2014

11:46 PM CephFS Bug #10041: ceph-fuse: never exit when no MDS server is available
Just wanted to add that the lack of a timeout causes havoc all over the place... Autofs, backup scripts mounting CephFS on d... Dmitry Smirnov
04:05 PM CephFS Bug #10041: ceph-fuse: never exit when no MDS server is available
Although it terminates on "Ctrl+C", a timeout would be _very_ useful because it would prevent the system from hanging on b... Dmitry Smirnov
11:11 AM CephFS Bug #10041: ceph-fuse: never exit when no MDS server is available
Was it blocking in the foreground? Did SIGINT (i.e., Control-C) work on it?
We can add a configurable timeout but I...
Greg Farnum
01:07 AM CephFS Bug #10041 (Resolved): ceph-fuse: never exit when no MDS server is available
I'm attempting to mount CephFS using the FUSE client (i.e. _ceph-fuse_), which does not exit if all MDS servers are down (I ... Dmitry Smirnov
10:57 PM CephFS Bug #10061 (New): uclient: MDS: output cap data in messages
MClientCaps messages don't dump the caps they're updating, and generally neither does anything else. We need to optio... Greg Farnum
10:55 PM CephFS Feature #10060 (New): uclient: warn about stuck cap flushes
It can be hard to diagnose issues that involve cap state. To help with that, the client should keep track of its cap ... Greg Farnum
10:40 PM CephFS Bug #9977 (Resolved): cephfs-journal-tool falsely reports invalid start_ptr
In next branch as commit:65c33503c83ff8d88781c5c3ae81d88d84c8b3e4 and in giant as commit:fc5354dec55248724f8f6b795e3a... Greg Farnum
09:36 PM CephFS Bug #9341: MDS: very slow rejoin
Thanks. Dmitry Smirnov
09:27 PM CephFS Bug #9341 (Resolved): MDS: very slow rejoin
This is backported to giant as of commit:97e423f52155e2902bf265bac0b1b9ed137f8aa0. The test for it also got backporte... Greg Farnum
09:26 PM CephFS Bug #9800 (Resolved): client-limits test is not passing
Backported in commit:387efc5fe1fb148ec135a6d8585a3b8f8d97dbf8 Greg Farnum
06:15 PM Bug #10042: OSD crash doing object recovery with EC pool
I'm not sure either, investigating. Loïc Dachary
05:15 PM Bug #10042: OSD crash doing object recovery with EC pool
Hi Loic,
I am still a little bit confused in terms of what happened behind the crash (and what is the relation betwe...
Guang Yang
05:30 AM Bug #10042: OSD crash doing object recovery with EC pool
Loïc Dachary
03:49 AM Bug #10042 (Duplicate): OSD crash doing object recovery with EC pool
We observed one OSD crash with the following assertion failure:... Guang Yang
06:10 PM rbd Bug #10045 (Resolved): common/Cond.h: 52: FAILED assert(mutex.is_locked()) in close_image()
Sage Weil
06:45 AM rbd Bug #10045 (Resolved): common/Cond.h: 52: FAILED assert(mutex.is_locked()) in close_image()
... Sage Weil
05:44 PM Bug #9921: msgr/osd/pg dead lock giant
Giving Sage this ticket since he took the PR. Greg Farnum
05:35 PM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
testing this PR https://github.com/ceph/ceph-qa-suite/pull/233
http://pulpito.front.sepia.ceph.com/teuthology-2014...
Yuri Weinstein
03:06 PM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
- install.upgrade:
    all:
      branch: giant
is upgrading all roles
Sage Weil
02:29 PM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
Still failed - http://pulpito.front.sepia.ceph.com/teuthology-2014-11-10_10:56:16-upgrade:giant-giant-distro-basic-mu... Yuri Weinstein
10:48 AM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
Moved client.0 to a separate node, testing now
https://github.com/ceph/ceph-qa-suite/pull/232
Yuri Weinstein
09:57 AM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
... Sage Weil
05:20 PM CephFS Bug #10025 (Resolved): Journal undump causes MDS to crash when start pos is not on object boundary
Merged into next in commit:69be8e9b30c18e47c17ff7dafc4ac8fbe00d48e7, and the appropriate backport bits were merged la... Greg Farnum
04:34 PM rgw Feature #9359 (Resolved): rgw: Export user stats in get-user-info Adminops API
Yehuda Sadeh
04:21 PM rgw Bug #9907 (Pending Backport): radosgw-admin: can't disable max_size quota
Sage Weil
04:13 PM rgw Feature #8911 (Pending Backport): RGW doesn't return 'x-timestamp' in header which is used by 'Vi...
Sage Weil
04:09 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
This bug makes me cry as it is the reason for my cluster to be _completely down_ for over 10 days now... Duplicate ad... Dmitry Smirnov
03:20 PM Bug #10059 (Resolved): osd/ECBackend.cc: 876: FAILED assert(0)
-1> 2014-11-09 14:13:01.334410 7f8b93c8b700 10 filestore(/var/lib/ceph/osd/ceph-3) FileStore::read(1.1ds0_head/78... Samuel Just
03:59 PM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
When I look at the log for http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-rpm-rhel7-amd64-basic/log.cgi?log=6977d02... Ken Dreyer
03:29 PM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
Disk space looks ok to me:... Ken Dreyer
10:28 AM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
From http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-rpm-rhel7-amd64-basic/log.cgi?log=6977d02f0d31c453cdf554a8f1796... Ken Dreyer
10:03 AM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
Needs a link:
http://pulpito.front.sepia.ceph.com/teuthology-2014-11-09_17:18:01-upgrade:firefly-x-next-distro-basic...
Zack Cerza
09:12 AM devops Bug #10049 (Resolved): "Failed to fetch package" "rhel7_0-x86_64-basic"
Seems wide spread on next run using rhel 7
Run teuthology-2014-11-09_17:18:01-upgrade:firefly-x-next-distro-basic-...
Yuri Weinstein
03:40 PM Bug #10057 (In Progress): msgr: skipped message on peer reconnect
... Sage Weil
01:42 PM Bug #10057 (Can't reproduce): msgr: skipped message on peer reconnect
ubuntu@teuthology:/a/teuthology-2014-11-09_23:06:01-krbd-next-testing-basic-multi/593102... Sage Weil
03:36 PM Feature #9420: erasure-code: tools and archive to check for non regression of encoding
the backport is needed to generate the content of https://github.com/ceph/ceph-erasure-code-corpus/tree/master/v0.80.... Loïc Dachary
03:32 PM Feature #9420 (Pending Backport): erasure-code: tools and archive to check for non regression of ...
Loïc Dachary
02:57 PM Feature #9420 (Resolved): erasure-code: tools and archive to check for non regression of encoding
I don't think this needs to be backported. Samuel Just
03:06 PM Bug #10058 (Can't reproduce): next stuck in recovery, no progress
/a/sage-2014-11-09_07:49:57-rados-next-testing-basic-multi/591906
/a/sage-2014-11-09_07:49:57-rados-next-testing-bas...
Samuel Just
02:59 PM Bug #9986 (Pending Backport): objecter: map epoch skipping broken
Samuel Just
02:56 PM Feature #9262 (Resolved): Additional namespace issues
Samuel Just
02:55 PM Feature #9031 (Resolved): List RADOS namespaces and list all objects in all namespaces
Samuel Just
02:53 PM Bug #6756 (Pending Backport): journal full hang on startup
Samuel Just
02:51 PM Bug #9852 (Resolved): mon: monitor asserts on 'ceph mds add_data_pool X' if X is an ID that DNE
Samuel Just
02:49 PM Bug #9987 (Pending Backport): mon: min_last_epoch_complete tracking broken
Samuel Just
02:12 PM Bug #10053 (Resolved): ./ceph tell osd.0 injectargs --no-osd_debug_op_order failure
Sage Weil
11:18 AM Bug #10053 (In Progress): ./ceph tell osd.0 injectargs --no-osd_debug_op_order failure
ubuntu@teuthology:/a/sage-2014-11-09_07:49:57-rados-next-testing-basic-multi$ teuthology-ls . | grep FAIL
591648 FAI...
Sage Weil
11:14 AM Bug #10053 (Resolved): ./ceph tell osd.0 injectargs --no-osd_debug_op_order failure
ubuntu@teuthology:/a/samuelj-2014-11-07_21:48:36-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/590242
...
Samuel Just
01:40 PM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
* how to use ceph_objectstore_tool https://github.com/ceph/ceph-qa-suite/blob/giant/tasks/ceph_objectstore_tool.py
*...
Loïc Dachary
06:20 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
The tests should use the same as #9887 which requires https://github.com/ceph/ceph-qa-suite/compare/wip-dzaddscrub Loïc Dachary
01:27 PM Feature #10056 (New): Object metadata mismatch detection and handling
Possible things we may want to address:
- clone vs head snapshot metadata mismatches
- object metadata vs ondis...
Samuel Just
01:23 PM Feature #10055 (New): PG metadata corruption detection and handling
Possible problems we might want to handle:
- missing pg info
- missing pg epoch
- missing pg log
Correct ...
Samuel Just
01:21 PM Feature #10054 (New): OSD level metadata mismatch handling
Meta feature for detecting and handling OSD metadata.
Possible directions:
- full osdmap vs incremental mismatch?
Samuel Just
11:57 AM devops Feature #10046: run make check on every pull request
Removing myself and clarifying the scope. I would be happy to help with the implementation but I'm not equipped to ta... Loïc Dachary
07:48 AM devops Feature #10046 (Resolved): run make check on every pull request
And report back on the success / failure, with the logs attached for debugging. The suggested approach is to define a... Loïc Dachary
11:24 AM CephFS Bug #9997: test_client_pin case is failing
http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_23:04:01-fs-next-testing-basic-multi/593068/ Greg Farnum
11:23 AM CephFS Bug #6613: samba is crashing in teuthology
Still happening: http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_23:14:01-samba-next-testing-basic-multi/59... Greg Farnum
11:13 AM Bug #10052 (Resolved): LibRadosTwoPools[EC]PP.PromoteSnap failure
ubuntu@teuthology:/a/samuelj-2014-11-07_21:48:36-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/590439
...
Samuel Just
09:53 AM rbd Bug #10026 (Duplicate): "Assertion: common/Cond.h" in rbd-master-testing-basic-multi run
#10045 Sage Weil
09:52 AM Bug #10033 (Won't Fix): ceph pg <pg> query hangs when OSD down, EC PG
In this case the osd seems to be up (the pg state isn't 'stale'), so this is expected behavior (the osd hasn't respon... Sage Weil
09:51 AM rbd Bug #10051 (Won't Fix): kernel-mounted RBD image may block shutdown
init-rbdmap fails to unmap an RBD image when the latter is still in use.
As a consequence, system shutdown hangs dead w...
Dmitry Smirnov
09:46 AM rgw Bug #9899 (Fix Under Review): Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-du...
Per Sage - removed mon_thrash tests from the rgw/ section, https://github.com/ceph/ceph-qa-suite/pull/230 Yuri Weinstein
09:30 AM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
this bug was fixed in 0.80.3 or 0.80.4. i think we need to make the 'older' tests skip the mon_thrash tests. Sage Weil
09:23 AM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
Same issue in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-09_10:00:02-upgrade:dumpling-dumpling-distro... Yuri Weinstein
09:19 AM devops Bug #10050 (Rejected): "Segmentation fault" (radosgw-admin) in upgrade:firefly:singleton-firefly-...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_17:05:02-upgrade:firefly:singleton-firefly-dist... Yuri Weinstein
09:05 AM Bug #10013: "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vps run
Same issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_19:13:01-upgrade:dumpling-x-firefly-distro-ba... Yuri Weinstein
08:43 AM Bug #9913: mon: audit log entires for forwarded requests lack info
session is with the monitor that forwarded the request. there's no auth handler for the session as it is a monitor. ... Joao Eduardo Luis
08:41 AM rbd Bug #10030 (Pending Backport): Crash when attempting to open non-existent parent image
Sage Weil
08:40 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
Same issue in job http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_18:13:01-upgrade:firefly-x-giant-distro-b... Yuri Weinstein
08:24 AM Bug #9864 (Can't reproduce): osd doesn't report new stats for 3 hours when running test LibCephFS...
not enough info to tell why the client test hung. let's see if it happens again! Sage Weil
08:08 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
Looking into the osd logs shows that the osds don't report new stats for the ~3 hours because no pgs are updated in tha... Joao Eduardo Luis
07:47 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
Joao Eduardo Luis
07:44 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
Not so weird after all.
Log shows that last log is created because we had some stats to report:...
Joao Eduardo Luis
07:30 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
this is not the monitor taking 2 hours to commit. The log snippets above refer to two different proposals: the first... Joao Eduardo Luis
06:08 AM Feature #10044 (New): ECUtil::HashInfoRef should have a NONE value
So that "ECBackend::get_hash_info":https://github.com/ceph/ceph/blob/giant/src/osd/ECBackend.cc#L1435 can return it i... Loïc Dachary
05:10 AM Bug #10040 (Rejected): install ceph packages broken for firefly
The problem here is that the machine needs to be properly cleaned up from newer Ceph packages.
It is always proble...
Alfredo Deza
04:13 AM Bug #8588: In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size...
Hi Sam,
Any suggestion in terms of how to fix this issue?
One potential solution is to validate the digest for ea...
Guang Yang

11/09/2014

10:41 PM CephFS Bug #9995 (Resolved): failing test_filelock
Zheng Yan
10:39 PM Bug #10040 (Rejected): install ceph packages broken for firefly
hitting the following error, when trying to install ceph packages for firefly on rhel7.0 using ceph-deploy.
test m...
Tamilarasi muthamizhan
08:32 PM Bug #10039 (Resolved): osd cann't entry up status with cpu 100%, when osd restart from out status.
backported in commit:0804deeab293e09123d1b58825051ccc4dddbc0e Sage Weil
07:40 PM Bug #10039: osd cann't entry up status with cpu 100%, when osd restart from out status.
I have fixed this problem by merging this patch, thanks.
osd: fix map advance limit to handle map gaps
The recent ...
qiu shanggao
06:03 PM Bug #10039 (Resolved): osd cann't entry up status with cpu 100%, when osd restart from out status.
ceph version: 0.80.6
platform : Redhat 6.5
Host: 3
osd node: 15 (5 per host)
operator:
1 start the ceph cluste...
qiu shanggao
05:52 PM Bug #10038 (Rejected): osd cannwhen osd outed restart
Haomai Wang
05:51 PM Bug #10038: osd cannwhen osd outed restart
operator error, pls delete it thanks. qiu shanggao
05:50 PM Bug #10038 (Rejected): osd cannwhen osd outed restart
qiu shanggao
10:09 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Thanks and sorry for lack of details. I was curious about KV-based OSDs for a long time and after upgrading to 0.87 I... Dmitry Smirnov
05:45 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
It's a pity to see it.
Thanks to Dmitry Smirnov. Could you give a detailed summary of what you have done and suggestio...
Haomai Wang
05:02 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Haomai Wang wrote:
> EC+KeyValueStore is a good match.
Ironically that was the worst thing I ever tried in Ceph....
Dmitry Smirnov
09:19 AM CephFS Bug #9341: MDS: very slow rejoin
Greg Farnum wrote:
> Hmm, we didn't put this in Giant initially because we were trying not to perturb it. Master has...
Dmitry Smirnov
05:17 AM Bug #8752: firefly: scrub/repair stat mismatch
After upgrade to 0.87 I've noticed inconsistencies on all PGs of all caching pools again, during and after deleting o... Dmitry Smirnov
05:12 AM rbd Feature #10037 (Resolved): cache-tier: Optimise RBD image removal
While removing an RBD image from EC pool I've noticed that it bubbles-up to caching pool hence removal is very slow. ... Dmitry Smirnov
05:07 AM Feature #10036 (Resolved): osd tree to show primary-affinity value
It would be nice (and useful) if "primary-affinity" value could be shown in "ceph osd tree" view. Dmitry Smirnov
01:41 AM Documentation #10035 (Resolved): explain the semantic of pgp num
As of Giant the "documentation":http://ceph.com/docs/giant/rados/operations/placement-groups/#set-the-number-of-place... Loïc Dachary

11/08/2014

06:29 PM Bug #9970 (Fix Under Review): document erasure coded pool simple operations
https://github.com/ceph/ceph/pull/2888 Loïc Dachary
08:07 AM CephFS Bug #9977 (Fix Under Review): cephfs-journal-tool falsely reports invalid start_ptr
Backport to giant PR at:
https://github.com/ceph/ceph/pull/2887
John Spray
04:22 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Hmm, thanks.
I'm not sure who is working on it or diving into it. I need more free time to diagnose it. EC+KeyValueStore i...
Haomai Wang
03:31 AM devops Feature #7475: ceph-disk: prepare should be idempotent
Checking the arguments you listed with a * would be fine. Loïc Dachary

11/07/2014

07:41 PM Bug #10033 (Won't Fix): ceph pg <pg> query hangs when OSD down, EC PG
EC PGs with a down OSD cause the PG query command to hang.
-bash-4.1$ sudo ceph pg 3.1352 query
^CError EINTR: pr...
c3 cleveland
04:30 PM Linux kernel client Bug #9894 (Resolved): kcephfs: rm -r left files behind
Greg Farnum
02:35 AM Linux kernel client Bug #9894 (Pending Backport): kcephfs: rm -r left files behind
https://github.com/ceph/ceph/pull/2876 Zheng Yan
04:27 PM CephFS Bug #10011: Journaler: failed on shutdown or EBLACKLISTED
Should be resolved by commit:6977d02f0d31c453cdf554a8f1796f290c1a3b89. We may want to backport once it's been through... Greg Farnum
04:16 PM CephFS Feature #4138 (Resolved): MDS: forward scrub: add functionality to verify disk data is consistent
This one ticket at least is definitely fulfilled by commit:daa9f9ffe82a811b5e0e69ef52241c4e0b7556bc Greg Farnum
02:58 PM devops Feature #7475: ceph-disk: prepare should be idempotent
'ceph-disk prepare' takes the following arguments:
I have marked arguments with [*] that I believe should match to r...
Blaine Gardner
12:36 PM devops Feature #7475: ceph-disk: prepare should be idempotent
I agree with your assessment. Blaine Gardner
09:40 AM devops Feature #7475: ceph-disk: prepare should be idempotent
Fewer assumptions is better indeed. When given an existing partition (or a device that is already partitioned) ceph-d... Loïc Dachary
09:26 AM devops Feature #7475: ceph-disk: prepare should be idempotent
I spoke with my coworker and have a few more thoughts:
Something else to consider is that the FSID should be check...
Blaine Gardner
08:59 AM devops Feature #7475: ceph-disk: prepare should be idempotent
I have looked at the 'ceph-disk prepare' scripts and have a few thoughts:
1) A '--force' option could be added to ...
Blaine Gardner
02:54 PM Bug #7679 (Resolved): mds: stuck on TMAP2OMAP check incorrectly
Added a new section for upgrading from Dumpling to Firefly. Reviewed by Tamil.
http://ceph.com/docs/master/instal...
John Wilkins
01:03 PM Bug #7679 (In Progress): mds: stuck on TMAP2OMAP check incorrectly
John Wilkins
12:10 PM Bug #7679: mds: stuck on TMAP2OMAP check incorrectly
https://github.com/ceph/ceph-qa-suite/pull/229 - fixed by Yuri
assigning this to John Wilkins, to make sure we alr...
Tamilarasi muthamizhan
09:59 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
Samuel Just wrote:
> Loic: you'll want to cover this in the same test as the hinfo one.
Ack :-)
Loïc Dachary
07:48 AM rbd Bug #10030 (Fix Under Review): Crash when attempting to open non-existent parent image
Jason Dillaman
07:09 AM rbd Bug #10030 (Resolved): Crash when attempting to open non-existent parent image
If a child image is not able to open a parent image, librbd will incorrectly attempt to close the parent image handle... Jason Dillaman
07:25 AM Bug #9987: mon: min_last_epoch_complete tracking broken
I don't think this will help much with your case. This patch will allow the monitor to delete data that should be re... Joao Eduardo Luis
07:12 AM Bug #10021 (Can't reproduce): ceph auth caps doesn't show in the CLI help / commands list
Running on "pretty-close to master with a few unrelated patches":... Joao Eduardo Luis
06:52 AM Linux kernel client Feature #9906 (In Progress): Inline data support
(setting assignee because you mentioned you were working on it) John Spray
06:18 AM Bug #9876: failed pull needs to allow mark_unfound_lost revert eventually
A customer has requested this be backported to firefly.
Tupper Cole
03:32 AM Bug #9554: "FAILED assert(0 == "hit suicide timeout")" in upgrade:firefly-firefly-testing-basic-v...
It's reproducible in ceph 0.84 (customized build)
2014-11-04 15:26:39.388499 7f377cac3700 0 -- 10.242.42.172:7241/...
Sahana Lokeshappa
03:01 AM Messengers Feature #10029 (Resolved): Retry binding on IPv6 address if not available
On systems with IPv6 it might be that the IPv6 address is not yet available when a MON or OSD boots.
This can have...
Wido den Hollander
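A minimal sketch of the retry idea in this request, not the Ceph messenger code. It assumes the failure shows up as EADDRNOTAVAIL; the helper `bind_with_retry` and the stand-in `fake_bind` are hypothetical names used only for illustration.

```python
import errno
import time


def bind_with_retry(bind_once, attempts=5, delay=0.0):
    """Call bind_once() until it succeeds or attempts run out.

    Only EADDRNOTAVAIL is retried; any other OSError propagates at once.
    """
    for attempt in range(1, attempts + 1):
        try:
            return bind_once()
        except OSError as exc:
            if exc.errno != errno.EADDRNOTAVAIL or attempt == attempts:
                raise
            time.sleep(delay)


# Hypothetical stand-in for a real socket bind: fails twice, then succeeds.
calls = {"n": 0}


def fake_bind():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError(errno.EADDRNOTAVAIL, "Cannot assign requested address")
    return "bound"


result = bind_with_retry(fake_bind, attempts=5, delay=0.0)
```

In a real daemon the callable would wrap the actual bind call, and the delay would be non-zero so the address has time to appear.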

11/06/2014

11:43 PM CephFS Bug #9995: failing test_filelock
Zheng Yan
12:16 AM CephFS Bug #9995: failing test_filelock
https://github.com/ceph/ceph-qa-suite/pull/228 Zheng Yan
09:46 PM CephFS Bug #9977 (Pending Backport): cephfs-journal-tool falsely reports invalid start_ptr
Merged to next in commit:574c1d4bad37514ba941e3ae83e33a7d926697d9
Yes, let's please backport.
Greg Farnum
07:27 PM devops Feature #7475: ceph-disk: prepare should be idempotent
No update yet, it never was enough of a problem to get in front of the bug queue. Would you have time to work on it ?... Loïc Dachary
01:06 PM devops Feature #7475: ceph-disk: prepare should be idempotent
I am encountering this issue using puppet-ceph for automated deployments. Is there a status update on this bug? Blaine Gardner
05:49 PM CephFS Bug #9674: nightly failed multiple_rsync.sh
I messed up (didn't set sudo everywhere), newer commits will hopefully make it all good. giant:f66bf31b6743246fb1c882... Greg Farnum
05:35 PM Linux kernel client Bug #9894: kcephfs: rm -r left files behind
I'm saying I think we want to backport all of the flag changes to giant (for userspace) because we're seeing failures... Greg Farnum
05:16 PM Linux kernel client Bug #9894 (Resolved): kcephfs: rm -r left files behind
Zheng Yan
05:16 PM Linux kernel client Bug #9894: kcephfs: rm -r left files behind
giant does not contain the commit that introduced the ORDERED flag Zheng Yan
04:48 PM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
Actually, the marking down thing won't work. Samuel Just
04:29 PM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
That all looks right. I'd mark the osd down, get the object, re-put it, and mark the osd back up. Should cause reco... Samuel Just
02:10 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
Besides the code fix, I am wondering what is the right way to fix the PG state (and object)? Bringing the OSD out mig... Guang Yang
04:43 PM Bug #10028 (Duplicate): ec_lost_unfound failing on giant
ubuntu@teuthology:/a/teuthology-2014-11-03_02:32:01-rados-giant-distro-basic-multi/584089
2014-11-03T10:46:32.795 ...
Samuel Just
02:16 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Samuel Just wrote:
> Type of osd probably does matter if the KV osds are distributing faulty information.
Earlier...
Dmitry Smirnov
10:48 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Type of osd probably does matter if the KV osds are distributing faulty information. Samuel Just
05:08 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Please let me know if you need more logs.
Due to this error all OSDs (except few) in my cluster are down or crashing...
Dmitry Smirnov
01:55 AM Bug #9978 (New): keyvaluestore: void ECBackend::handle_sub_read
Sorry. Mistake Haomai Wang
01:49 AM Bug #9978 (Duplicate): keyvaluestore: void ECBackend::handle_sub_read
http://tracker.ceph.com/issues/9978#change-44091 Haomai Wang
12:32 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Here is another log from filestore-based OSD, just crashed. Dmitry Smirnov
01:37 PM rbd Bug #10026 (Duplicate): "Assertion: common/Cond.h" in rbd-master-testing-basic-multi run
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-05_23:00:03-rbd-master-testing-basic-multi/588155/... Yuri Weinstein
11:35 AM Bug #7679: mds: stuck on TMAP2OMAP check incorrectly
Okay, so we had a better smoking gun, logs in teuthology:~/jcsp/7679. The OSDs all have features set to 0 in the OSD... John Spray
08:39 AM Bug #7679: mds: stuck on TMAP2OMAP check incorrectly
I'm confused by the order of operations in the tests, it seems like there is an upgrade, then a restart, then an upgr... John Spray
11:16 AM CephFS Bug #10025 (Resolved): Journal undump causes MDS to crash when start pos is not on object boundary

Related ML thread from Jasper Siero, who first encountered the issue on firefly (http://lists.ceph.com/pipermail/ce...
John Spray
10:42 AM devops Bug #9747: ceph.spec.in will always use 95-ceph-osd-alt.rules
Backported to firefly. Samuel Just
10:41 AM Bug #9875 (Resolved): stuck recovering due to unfound hit_set object
backported to firefly Samuel Just
10:41 AM Bug #9821 (Resolved): failed to recover before timeout expired
Backported to firefly Samuel Just
10:40 AM Bug #9718 (Resolved): osd_types: check_new_interval: min_size check needs to consider CRUSH_ITEM_...
backported to firefly Samuel Just
10:39 AM Bug #9113: osd: snap trimming eats memory, linearly
Backported to firefly. Samuel Just
10:39 AM Bug #9626 (Resolved): PG: cancel backfill reservations if we get a cancel during backfill
Don't really want to backport to dumpling. Samuel Just
10:38 AM Bug #9574 (Resolved): Backfill: recheck full status once reservation is granted
backported to firefly, don't really want to backport to dumpling Samuel Just
10:38 AM Feature #9262: Additional namespace issues
David Zafman
10:38 AM Bug #9293 (Resolved): _collection_move_rename EEXIST
firefly Samuel Just
10:37 AM Bug #8315 (Resolved): osd: watch callback vs callback funky
firefly Samuel Just
10:29 AM Bug #8629 (Resolved): cache_evict needs to prevent make_writeable from creating a snapdir
Merged to firefly. Samuel Just
10:27 AM Bug #9301 (Resolved): paxos: off by one w/ versions in forming quorum
merged to firefly Samuel Just
10:26 AM Bug #9053 (Resolved): mon/Paxos.cc: 628: FAILED assert(begin->last_committed == last_committed)
Merged to firefly. Samuel Just
10:21 AM Bug #9502 (Resolved): mon: does not verify disk is not full on startup
Samuel Just
10:17 AM Bug #9851 (Resolved): crash on journal/filestore shutdown on firefly
Samuel Just
12:05 AM Bug #9851: crash on journal/filestore shutdown on firefly
It has been added to wip-sam-testing ( https://github.com/ceph/ceph/pull/2764#issuecomment-61167705 ) which is anothe... Loïc Dachary
10:15 AM Bug #9675: splitting a pool doesn't start when rule_id != ruleset_id
Backported to firefly. Samuel Just
09:29 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
Same issues in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-05_17:18:01-upgrade:firefly-x-next-distro-b... Yuri Weinstein
08:53 AM Support #10024 (New): Cluster unreachable after restart
Dear Support,
I'm quite a new user, I already asked for this question to the users lists without any solution.
I se...
Luca Mazzaferro
08:42 AM rados-java Bug #10023 (Resolved): method Rados.shutdown() is missing for closing the connection to the clust...
Hi,
if you call the sample code (1) 3000 times, you will get an error -24 (EMFILE, Too many open files). Why? Beca...
Daniel Schwager
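The leak described in this report can be sketched generically. `FakeCluster` below is a hypothetical stand-in for a cluster handle and is not the rados-java API; it only counts open handles to show why an explicit shutdown step matters when the same code runs thousands of times.

```python
class FakeCluster:
    """Hypothetical cluster handle that tracks how many are still open."""

    open_handles = 0

    def __init__(self):
        FakeCluster.open_handles += 1
        self.closed = False

    def shutdown(self):
        # Release the handle exactly once, even if called repeatedly.
        if not self.closed:
            self.closed = True
            FakeCluster.open_handles -= 1


def run_once():
    cluster = FakeCluster()
    try:
        return "ok"  # work with the handle would go here
    finally:
        cluster.shutdown()  # always release, even if the work raises


for _ in range(3000):
    run_once()
```

Without the `finally` step each call would leave one handle open, which is the pattern that eventually runs a process into its open-file limit.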
07:00 AM Bug #10018 (Fix Under Review): OSD assertion failure if the hinfo_key xattr is not there (corrupt...
https://github.com/ceph/ceph/pull/2872 Loïc Dachary
06:14 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
Actually it happens on master, my test was incorrect and is now fixed : https://github.com/dachary/ceph/commit/312cda... Loïc Dachary
05:37 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
Also on 0.80.7... Loïc Dachary
05:22 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
Test case that reproduces the problem: https://github.com/dachary/ceph/commit/5639303646418913ba0929ce73e8a5c61190191... Loïc Dachary
02:03 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
Sorry, we don't have a verbose log from the crash, but the following is the code leading to the crash:... Guang Yang
03:18 AM Feature #9943: osd: mark pg and use replica on EIO from client read
Wei will work on this one. Guang Yang
01:49 AM Bug #9727 (Duplicate): 0.86 EC+ KV OSDs crashing
http://tracker.ceph.com/issues/9978#change-44091 Haomai Wang
01:36 AM Messengers Bug #10022 (Resolved): AsyncMessenger: Wrong newly_acked_seq when replacing existing connection
Here is the output. (monitor ips are 10.11.1.27,10.11.1.28,10.11.1.29)
# ceph -w --debug-ms=10/10
2014-11-04 10:38:...
Haomai Wang
12:06 AM Bug #9485: Monitor crash due to wrong crush rule set
Did not forget about it, just busy with other things (the OpenStack summit after the Giant release). Loïc Dachary
12:02 AM Support #9901: libgoogle-perftools4: tcmalloc performance regression on armhf
Loïc Dachary
12:00 AM Bug #10021 (Can't reproduce): ceph auth caps doesn't show in the CLI help / commands list
When trying to reset a client's permissions I tried to use the 'ceph auth add' command, and it failed. When se... Yogev Rabl

11/05/2014

11:51 PM Linux kernel client Bug #9928: kernel BUG at fs/ceph/caps.c:2307!
fixed by "ceph: introduce global empty snap context" Zheng Yan
11:41 PM Bug #10018 (Need More Info): OSD assertion failure if the hinfo_key xattr is not there (corrupted...
Could you please attach the logs of the crashed OSD (the last 20,000 lines would be enough) ? Loïc Dachary
08:02 PM Bug #10018 (Resolved): OSD assertion failure if the hinfo_key xattr is not there (corrupted?) dur...
We observed an OSD crash during scrubbing on EC pool, the crash happened if the hinfo_key xattr of the file is absent... Guang Yang
11:26 PM Bug #10020 (Closed): bloom filter unit tests fail (power8)
As of ac3c1cb5d0e17250fa147c11e42ed93e15b2184a unittest_bloom_filter fails with:... Loïc Dachary
09:55 PM Tasks #10019 (Closed): rbd
hello all
I'm deploying openstack with ceph.
Compute node used the rbd device to created disk.
I have a problem...
lion cui
08:30 PM Bug #9927: RHEL: selinux-policy-targeted rpm update triggers slow requests
I would strongly recommend limiting it to the subdirectories where large mounts are, not on the parent directory.  Th... Wade Mealing
06:32 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Sorry for the confusion -- I have the impression that it may not be related to store type.
Attaching more detailed log...
Dmitry Smirnov
06:09 PM Bug #10017 (Resolved): OSD wrongly marks object as unfound if only the primary is corrupted for E...
Recently we observed there was one PG stuck at recovering with one object marked as lost, the scrubbing log showed th... Guang Yang
04:35 PM Bug #10016 (Resolved): "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
In a new suite all jobs failed.
http://pulpito.front.sepia.ceph.com/teuthology-2014-11-05_11:35:26-upgrade:giant-gia...
Yuri Weinstein
04:19 PM Bug #9788 (Rejected): "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeou...
2014-11-05 09:29:31.507827 7fa236d5b700 10 filestore(/var/lib/ceph/osd/ceph-3) sync_entry commit took 150.696754, int... Samuel Just
04:15 PM Bug #9788: "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" issues
584644 and 584647 both stuck in sync, probably environmental.
Samuel Just
04:14 PM Bug #9788: "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" issues
Also seeing in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-04_19:00:01-rados-dumpling-distro-basic-mul... Yuri Weinstein
02:32 PM rgw Bug #9918 (Fix Under Review): RGW-Swift: SubUser access permissions, does not seems to work
Yehuda Sadeh
01:41 PM rgw Bug #9917: RADOSGW: Not able to create Swift objects with erasure coded pool
The bucket index cannot reside on EC pools. Yehuda Sadeh
01:37 PM rgw Bug #9973 (Fix Under Review): Validation of Swift DLO manifest object ETag doesn't match OpenStac...
Yehuda Sadeh
01:23 PM rgw Bug #8766 (Resolved): multipart minimum size error should be EntityTooSmall
Tested on firefly, seems to work. Yehuda Sadeh
01:22 PM rgw Bug #9479 (Fix Under Review): ETag is not included in the XML response to put object copy operation
Yehuda Sadeh
12:28 PM rgw Bug #9478 (Fix Under Review): Incorrect content type in response header
Yehuda Sadeh
10:44 AM rgw Bug #10015 (Resolved): rgw sync agent: 403 when syncing object that has tilde in its name
The culprit is the python requests module that the sync agent uses in order to send the http requests. Wittily the mod... Yehuda Sadeh
09:18 AM Bug #10014 (Pending Backport): osd: spurious memmove on data payload
Sage Weil
09:13 AM Bug #10014 (Resolved): osd: spurious memmove on data payload
see commit:a1aa70f2f21339feabfe9c1b3c9c9f97fbd53c9d Sage Weil
09:00 AM Bug #10013 (Rejected): "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vps run
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-04_19:13:01-upgrade:dumpling-x-firefly-distro-basi... Yuri Weinstein
08:58 AM CephFS Bug #9995: failing test_filelock
We'll need to update the test then so that it detects this situation and aborts quietly instead of raising an error. Greg Farnum
08:39 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
suite:upgrade:dumpling-firefly-x
Run http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-04_17:25:01-upgrade:dump...
Yuri Weinstein
08:32 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
suite:upgrade:firefly:singleton
Run http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-04_18:41:21-upgrade:firef...
Yuri Weinstein
05:42 AM CephFS Bug #10011: Journaler: failed on shutdown or EBLACKLISTED
Ah... I've just realised why the "respawn on blacklist" thing I put in a while back isn't kicking in here: because Jo... John Spray
04:32 AM CephFS Bug #10011: Journaler: failed on shutdown or EBLACKLISTED

mon.a says:...
John Spray
05:19 AM Feature #9962 (Fix Under Review): osd: kill 'category' in stats and public API
Sage Weil
12:47 AM Feature #9957: librados: add fadvise op
From the posix_fadvise man page & related kernel code:
In kernels before 2.6.18, POSIX_FADV_NOREUSE had the same se...
jianpeng ma
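For reference, this is roughly what issuing the hint looks like at the POSIX level. It is a sketch of the system call only, not of the proposed librados op; the helper name `advise_noreuse` is hypothetical, and the hint is advisory, so the kernel may ignore it.

```python
import os
import tempfile


def advise_noreuse(path):
    """Hint that data in `path` will be accessed only once.

    Returns True if the hint was issued, False if posix_fadvise is
    unavailable on this platform or the kernel rejects the call.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 with length=0 covers the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_NOREUSE)
    except OSError:
        return False
    finally:
        os.close(fd)
    return True


with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 4096)
    tmp.flush()
    issued = advise_noreuse(tmp.name)
```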
12:10 AM Bug #9909: lost_unfound test/rados tool flawed, EEXIST when putting empty object
http://tracker.ceph.com/issues/9387 addresses the ceph-qa-suite part of this problem by using the /etc/group file ins... Loïc Dachary

11/04/2014

11:53 PM Bug #9909 (Resolved): lost_unfound test/rados tool flawed, EEXIST when putting empty object
https://github.com/ceph/ceph/pull/2858 Loïc Dachary
08:13 PM Bug #9909 (Fix Under Review): lost_unfound test/rados tool flawed, EEXIST when putting empty object
Already fixed by 50e80407f3c2f74d77ba876d01e7313c3544ea4d. Creating pull request for backport to giant. David Zafman
11:48 PM rgw Bug #9907: radosgw-admin: can't disable max_size quota
Hi, can you help merge this fix?
https://github.com/ceph/ceph/pull/2782
Dong Lei
10:59 PM CephFS Bug #9995: failing test_filelock
... Zheng Yan
08:54 PM CephFS Bug #9995: failing test_filelock
Is there something we can do as a workaround to prevent this blocking things? I expect people are going to use new ce... Greg Farnum
07:36 PM CephFS Bug #9995 (Won't Fix): failing test_filelock
it's a bug in old versions of libfuse: it calls our setlk callback for both fcntl setlk and flock requests Zheng Yan
10:15 PM rgw Bug #9877 (Fix Under Review): In some cases it's possible for rgw to segfault on http COPY
Yehuda Sadeh
09:54 PM rgw Bug #9877: In some cases it's possible for rgw to segfault on http COPY
Ah, moreover, the issue is fixed already in the firefly branch but didn't make it to a dot release (will be in the ne... Yehuda Sadeh
09:44 PM rgw Bug #9877: In some cases it's possible for rgw to segfault on http COPY
Ok, I was able to reproduce it using this script. It seems that there are a few things that don't work as required. Th... Yehuda Sadeh
02:37 AM rgw Bug #9877: In some cases it's possible for rgw to segfault on http COPY
reproduces on ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
#!/bin/bash
base=test
source=${...
Anonymous
07:54 PM Linux kernel client Bug #9432 (Resolved): kcephfs: null pointer deref in posix_acl_create
fixed by commit b1ee94aa593abd03634bc3887b8e189840e42c12 Zheng Yan
07:53 PM Linux kernel client Bug #9505 (Duplicate): kcephfs: client gets stuck in reconnect loop?
Zheng Yan
07:53 PM Linux kernel client Bug #9505: kcephfs: client gets stuck in reconnect loop?
dup of #9458 Zheng Yan
07:50 PM Linux kernel client Bug #9426 (Resolved): kcephfs: soft lockup in handle mds map
Zheng Yan
05:46 PM CephFS Bug #9994: ceph-qa-suite: nfs mount timeouts
teuthology-2014-11-03_23:10:01-knfs-giant-testing-basic-multi/585658/ Greg Farnum
05:40 PM Bug #10012 (Can't reproduce): Configuration parameters not picked up outside of the [global] sect...

Certain osd* and radosgw* parameters are not picked up outside of the [global] section in the ceph.conf file, as pe...
Christian Balzer
05:40 PM CephFS Bug #10011 (Resolved): Journaler: failed on shutdown or EBLACKLISTED
http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_23:08:01-kcephfs-giant-testing-basic-multi/585648/
teuth...
Greg Farnum
05:13 PM Linux kernel client Bug #9894: kcephfs: rm -r left files behind
We'll also want to do a backport to Giant of this and the prior series. Greg Farnum
04:53 PM Bug #10010 (Resolved): ceph_osd.cc calls global_init_shutdown_stderr even when running with -f or...
ceph-osd is difficult to debug in operation when running under systemd or docker, or any other system that expects to... Scott Laird
04:48 PM devops Documentation #10009 (Rejected): Ceph build requirements are incomplete
The build instructions at http://ceph.com/docs/giant/install/build-ceph/ have a list of Ubuntu packages that are requ... Scott Laird
04:12 PM Bug #10008 (Resolved): "obsolete rollback obj" error in upgrade:firefly-x-giant-distro-basic-vps run
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_18:18:02-upgrade:firefly-x-giant-distro-basic-v... Yuri Weinstein
04:02 PM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
Yehuda, can you take a look pls? Yuri Weinstein
03:56 PM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
It's probably a duplicate of #8311, but affects other releases:... Yuri Weinstein
03:44 PM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
Yes, Tamil said we have such a case with empty pool name. Yuri Weinstein
01:36 PM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
Is this that bug where radosgw can create a pool with an empty name? Samuel Just
01:35 PM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
It does appear to be trying to get pg_num for the empty name pool. Is that deliberate? Samuel Just
11:15 AM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
Same results in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-02_10:00:02-upgrade:dumpling-dumpling-dist... Yuri Weinstein
03:15 PM Feature #10007 (New): option to disable erasure code plugin version check
An option such as... Loïc Dachary
03:04 PM Bug #9939: "giant" no longer log scrub errors
I am getting something like ... Dmitry Smirnov
01:19 PM Bug #9939: "giant" no longer log scrub errors
Ok, pick a known-bad pg. On the primary, set debug osd = 20, debug ms = 1, debug filestore = 20. Scrub. Attach cep... Samuel Just
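The three debug levels named above could be written as a ceph.conf fragment. The sketch below only renders that text; placing the settings under an `[osd]` section is an assumption, since the comment does not say where they should go.

```python
import configparser
import io

# Values taken from the comment above; the section name is an assumption.
settings = {"debug osd": "20", "debug ms": "1", "debug filestore": "20"}

config = configparser.ConfigParser()
config["osd"] = settings

buf = io.StringIO()
config.write(buf)
fragment = buf.getvalue()
print(fragment)
```

The printed fragment is meant to be pasted into the primary OSD's configuration before scrubbing and reverted afterwards, since these levels produce very large logs.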
02:59 PM Bug #10006 (Resolved): osd cache full mode still skips young objects
commit f4ee949
Sage Weil
01:17 PM Bug #10003 (Duplicate): "found obsolete rollback obj" error in upgrade:firefly-x-giant-distro-bas...
Samuel Just
10:29 AM Bug #10003 (Duplicate): "found obsolete rollback obj" error in upgrade:firefly-x-giant-distro-bas...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_18:18:02-upgrade:firefly-x-giant-distro-basic-v... Yuri Weinstein
01:06 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Because the bug you say this duplicates is about KV osds. Samuel Just
01:06 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Are you saying these osds are using the KV backend? Samuel Just
12:09 PM rbd Bug #9854: librbd: reads contending for cache space can cause livelock
PR: https://github.com/ceph/ceph/pull/2820 Jason Dillaman
09:12 AM rbd Bug #9854 (Fix Under Review): librbd: reads contending for cache space can cause livelock
Jason Dillaman
11:48 AM Bug #10004: ceph osd find does not correctly report crush locations
Moreover, it's no longer reporting the entire crush branch, but only the immediate parent; that's a change in behavio... Dan Mick
10:38 AM Bug #10004 (Can't reproduce): ceph osd find does not correctly report crush locations
... Christina Meno
10:59 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
suite:upgrade:dumpling-firefly-x
Same issue in job http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_17:25...
Yuri Weinstein
09:29 AM rbd Bug #10002 (Resolved): Errors during import_export test in upgrade:firefly-x-next-distro-basic-vp...
Two jobs failed ['584634', '584648']
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_17:18:0...
Yuri Weinstein
10:24 AM Linux kernel client Bug #5429: libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
And if it hasn't, the same (or at least a full dmesg) from the previous crash won't hurt, if you still have it around. Ilya Dryomov
10:21 AM Linux kernel client Bug #5429: libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
If it's crashed again, a full dmesg and a tail (say, last 5-10 minutes before the crash) of osd/messenger logs would ... Ilya Dryomov
10:17 AM Linux kernel client Bug #5429: libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
Is there anything which needs to be gathered from the cluster currently displaying this issue which could help out? JuanJose Galvez
09:29 AM rbd Bug #9742: `rbd map lun` fails with: (2) No such file or directory on kernel 3.14.14 w/ udev-216 ...
I'm guessing CRYPTO_CBC kernel config option is not enabled - -ENOENT is most likely because crypto core can't find a... Ilya Dryomov
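One way to check the guess above is to look for `CONFIG_CRYPTO_CBC` in the kernel's build configuration. This is a small sketch under the assumption that the config text is available as a string; the helper name `kernel_option_state` and the sample text are hypothetical.

```python
import re


def kernel_option_state(config_text, option):
    """Return the value set for `option` ('y', 'm', ...), or None if unset."""
    pattern = re.compile(r"^" + re.escape(option) + r"=(\S+)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Made-up sample in the usual kernel .config format.
sample = """\
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_CBC is not set
CONFIG_CRYPTO_ECB=m
"""

cbc = kernel_option_state(sample, "CONFIG_CRYPTO_CBC")
ecb = kernel_option_state(sample, "CONFIG_CRYPTO_ECB")
```

On many distributions the real text can be read from a `config-<version>` file under `/boot`, or from `/proc/config.gz` when the kernel exposes it.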
09:20 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
Run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-03_17:18:01-upgrade:firefly-x-next-distro-basic-vps/
St...
Yuri Weinstein
09:07 AM Bug #9788 (New): "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" is...
suite:upgrade:firefly-x
next
Run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-03_17:18:01-upgrade:firef...
Yuri Weinstein
09:04 AM rgw Bug #9587: ceph-radosgw sysvinit script on EL6 cannot set ulimit
The pull request is closed, anything new with this one? Yehuda Sadeh
09:02 AM rgw Bug #9148 (Resolved): rgw: multiregion tests failing, s3tests.functional.test_s3.test_region_copy...
Yehuda Sadeh
09:01 AM rbd Bug #9936 (Pending Backport): Exporting images larger than 2GB fails
Jason Dillaman
08:39 AM Bug #9998: Replaced OSD weight below 0
We see this often in our dumpling cluster. Kinda annoying. Dan van der Ster
06:00 AM Bug #9998: Replaced OSD weight below 0
This bug might be related to this part of code (use of one variable in two nested loops):... Pawel Sadowski
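The code referred to above is elided here, so the following is only a generic illustration of the bug class named in the comment (one variable shared by two nested loops). It is not the Ceph code; both function names are hypothetical.

```python
def labels_buggy(outer, inner):
    """Inner loop reuses `i`, so the outer index is clobbered."""
    seen = []
    for i in range(outer):
        for i in range(inner):  # BUG: same name as the outer loop variable
            pass
        seen.append(i)  # records the inner loop's last value
    return seen


def labels_fixed(outer, inner):
    """Distinct loop variables keep the outer index intact."""
    seen = []
    for i in range(outer):
        for j in range(inner):
            pass
        seen.append(i)
    return seen


buggy = labels_buggy(3, 2)
fixed = labels_fixed(3, 2)
```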
04:53 AM Bug #9998 (Resolved): Replaced OSD weight below 0
I've hit a bug when replacing OSDs. Under specific conditions replaced OSD gets weight of @-3.052e-05@.
h4. How to...
Pawel Sadowski
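One possible reading of the odd value, offered as an assumption rather than something the report states: CRUSH stores weights as 16.16 fixed-point integers (0x10000 is 1.0), and a raw value of -2 prints as the weight quoted above.

```python
FIXED_ONE = 0x10000  # 1.0 in 16.16 fixed point


def fixed_to_float(raw):
    """Convert a 16.16 fixed-point weight to a float."""
    return raw / FIXED_ONE


# A raw weight of -2, shown with four significant digits.
shown = "%.4g" % fixed_to_float(-2)
```

If that reading is right, the negative weight would correspond to a small integer underflow in the raw value rather than a floating-point rounding error.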
06:54 AM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
Looking more, I noticed that the pool 35 PGs are not entering the backfilling state -- only recovery. I'm bringing os... Dan van der Ster
02:31 AM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
Hi Sage and Sam,
I've just tried wip-9113-9487-dumpling on our test cluster. (Using this build: http://gitbuilder.ce...
Dan van der Ster
06:53 AM CephFS Bug #9869 (Resolved): Client: not handling cap_flush_ack messages properly
Greg Farnum
02:09 AM Bug #9727: 0.86 EC+ KV OSDs crashing
OK, it seems this is not a simple test problem; I just misunderstood. So do you have more logs? And I find there exists... Haomai Wang
01:43 AM Bug #9727: 0.86 EC+ KV OSDs crashing
Well, I can apply selective patches on top of 0.87 but I'd be reluctant to deploy master branch cluster-wide...
All ...
Dmitry Smirnov
01:33 AM Bug #9727: 0.86 EC+ KV OSDs crashing
Hmm, KeyValueStore still isn't suitable for large version upgrades, so I'm not sure which problem you hit.
I'm ...
Haomai Wang
01:46 AM rgw Bug #9766: s3tests: test_100_continue failing
Yes, sorry it was due to using apache from the ubuntu repos Abhishek Lekshmanan

11/03/2014

11:43 PM Bug #9727: 0.86 EC+ KV OSDs crashing
My cluster, prematurely upgraded to "Giant", is practically wrecked by this problem.
Haomai, is there any additional i...
Dmitry Smirnov
07:55 PM CephFS Feature #1398: qa: multiclient file io test
A first pass of this is in origin/wip-multiclientio-wusui Anonymous
04:22 PM Bug #7679 (New): mds: stuck on TMAP2OMAP check incorrectly
I see similar problem on latest runs:
http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-28_17:00:01-upgrade:f...
Yuri Weinstein
03:30 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Any news on this please? I can barely use my cluster since upgrade to Giant -- OSDs are crashing during backfill all ... Dmitry Smirnov
12:10 PM CephFS Bug #9997 (Resolved): test_client_pin case is failing
http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-02_23:04:01-fs-next-testing-basic-multi/583588/
RuntimeErro...
Greg Farnum
12:08 PM devops Bug #9996 (Won't Fix): SyntaxError in Chef run
... Alfredo Deza
12:05 PM CephFS Bug #9995 (Resolved): failing test_filelock
http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-02_23:04:01-fs-next-testing-basic-multi/583589/
It's gettin...
Greg Farnum
11:43 AM CephFS Bug #9994: ceph-qa-suite: nfs mount timeouts
http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-31_23:10:01-knfs-giant-testing-basic-multi/582459/
http://q...
Greg Farnum
11:34 AM CephFS Bug #9994 (Resolved): ceph-qa-suite: nfs mount timeouts
... Greg Farnum
11:27 AM CephFS Bug #9977: cephfs-journal-tool falsely reports invalid start_ptr
https://github.com/ceph/ceph/pull/2853 John Spray
11:27 AM CephFS Bug #9977 (Fix Under Review): cephfs-journal-tool falsely reports invalid start_ptr
PR up for next, probably also worth backporting to giant as without it journal-tool is pretty useless on filesystems ... John Spray
10:00 AM devops Bug #9992 (Resolved): git configuration issues in Jenkins slaves
It looks like it was just one host that needed this, copied the settings from ~/.gitconfig on the Jenkins server. Alfredo Deza
08:31 AM devops Bug #9992: git configuration issues in Jenkins slaves
It looks like it is this on the Jenkins server itself:... Alfredo Deza
08:16 AM devops Bug #9992 (Resolved): git configuration issues in Jenkins slaves
... Alfredo Deza
09:06 AM rgw Bug #7796 (Resolved): RGW Keystone token auth fails with '411 Length Required' when Keystone usin...
Yehuda Sadeh
09:02 AM rgw Bug #8587 (Resolved): rgw: subuser object not created correctly
Yehuda Sadeh
08:44 AM Bug #9987: mon: min_last_epoch_complete tracking broken
BTW, we compact our dumpling mon leveldb _without_ restarting. We do
ceph tell mon.0 compact
and that can s...
Dan van der Ster
04:17 AM devops Bug #9697: exitcode of gatherkeys has changed the latests versions
I placed a new issue in the correct project: #9991 Kenneth Waegeman
12:03 AM rados-java Bug #9990 (Resolved): Rbd.list() / JVM crashes
If I call Rbd.list() to get all the available images, my JVM mostly crashes (3).
Sometimes, if it does not crash...
Daniel Schwager
12:00 AM rados-java Bug #9989 (Resolved): Rbd.list() / more than 1024 images in pool?
Is your list()-implementation limited to 1024 images because of
IntByReference size = new IntByReference(1024);
...
Daniel Schwager

11/02/2014

11:58 PM rados-java Bug #9988 (Resolved): Rbd.list() / list contains one element, if pool is empty
If no image is in the pool, the list contains one empty ("") element but should contain 0 elements. Daniel Schwager
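One plausible cause of this symptom, not confirmed in the report, is splitting a buffer of NUL-terminated names: splitting an empty buffer yields a single empty string rather than an empty list. A minimal Python sketch of the pitfall and a guard against it follows; `parse_names` is a hypothetical helper for illustration and is not part of rados-java.

```python
from typing import List


def parse_names(buf: bytes) -> List[str]:
    """Split a buffer of NUL-terminated names, dropping empty fragments.

    Hypothetical helper: the buffer layout is an assumption.
    """
    return [part.decode("utf-8") for part in buf.split(b"\0") if part]


# Naive split of an empty buffer still produces one (empty) element.
naive = b"".split(b"\0")
print(len(naive))

# Filtering out empty fragments gives an empty list for an empty pool.
print(parse_names(b""))
print(parse_names(b"img1\0img2\0"))
```

Filtering empty fragments also drops the trailing empty piece that a final NUL terminator would otherwise leave behind.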
03:16 PM Feature #9954 (Resolved): buffer: method to ensure an extent is contiguous
Sage Weil
07:29 AM Feature #9954: buffer: method to ensure an extent is contiguous
how about wip-buffer? can certainly optimize the rebuild case, but i think here we expect to never hit it. https://... Sage Weil
02:53 PM Bug #9986: objecter: map epoch skipping broken
https://github.com/ceph/ceph/pull/2851 Sage Weil
02:47 PM Bug #9986 (Resolved): objecter: map epoch skipping broken
Sage Weil
02:52 PM Bug #9987: mon: min_last_epoch_complete tracking broken
https://github.com/ceph/ceph/pull/2850 Sage Weil
02:49 PM Bug #9987 (Resolved): mon: min_last_epoch_complete tracking broken
When we moved to pulling pgmap values out of keys we broke the min_last_epoch_clean invalidation code.
I suspect t...
Sage Weil
07:35 AM Feature #9926 (Resolved): AsyncMessenger: Support kqueue interface for BSD and mac osx OS
Haomai Wang
02:18 AM Messengers Bug #9898: osd: fast dispatch deadlock in mark_down (giant)
Sage, everyone - approximately when will this exact fix land in master? It effectively blocks our testing progress ... Andrey Korolyov

11/01/2014

02:00 PM Bug #9976: ceph cli injectargs parsing broken
I did not see this one, sorry about that. Loïc Dachary

10/31/2014

05:27 PM Feature #9981: osd: cache: proxy writes (instead of unconditionally promoting)
One thing we'll need to be careful about when not promoting is how we handle snapshots. I don't remember exactly how ... Greg Farnum
01:55 PM Feature #9981 (Resolved): osd: cache: proxy writes (instead of unconditionally promoting)
This should work similar to the read recency checks that don't always promote on first read, but give the cache osd a... Sage Weil
05:16 PM Bug #9985 (Resolved): osd: incorrect atime calculation
https://github.com/ceph/ceph/pull/2816 should be backported Sage Weil
05:10 PM CephFS Tasks #3680 (Rejected): deduplication in ceph
we should discuss this on the email list Sage Weil
04:38 PM RADOS Bug #9984: lttng_probe_unregister hangs on shutdown
maybe related to dynamic+static linking of lttng? Josh Durgin
04:16 PM RADOS Bug #9984 (New): lttng_probe_unregister hangs on shutdown
... Sage Weil
04:31 PM Bug #9976 (Resolved): ceph cli injectargs parsing broken
Sage Weil
02:25 PM Bug #9976: ceph cli injectargs parsing broken
Dan Mick
02:25 PM Bug #9976: ceph cli injectargs parsing broken
close; needs "if injectargs and ..", but that seems good Dan Mick
02:17 PM Bug #9976: ceph cli injectargs parsing broken
Maybe as simple as... Dan Mick
09:11 AM Bug #9976 (Resolved): ceph cli injectargs parsing broken
looks like it was the recent -- handling that broke?... Sage Weil
02:21 PM phprados Feature #424: Stream wrappers
Charles du Jeu wrote:
> Hi! Maybe I'm totally at the wrong place, if so, sorry for that.
> Was there any work done...
Wido den Hollander
08:27 AM phprados Feature #424: Stream wrappers
Hi! Maybe I'm totally at the wrong place, if so, sorry for that.
Was there any work done on that (streamwrapper imp...
Charles du Jeu
02:14 PM Bug #9983 (Resolved): Cleanup boost optionals for boost 1.56
This patch cleans up fatal errors with boost 1.56 when implicitly converting optionals to non-optional values.
It ...
William Kennington
01:58 PM RADOS Feature #9982 (New): osd: cache: make writes in readonly mode invalidate and then forward 
Sage Weil
01:36 PM Feature #9980 (Resolved): osd: cache: proxy reads during promote
wip-promote-forward may be a useful base, although I think it is not quite correct (we should proxy reads, not forwar... Sage Weil
01:35 PM Feature #9979 (Resolved): osd: cache: proxy reads (instead of redirect)
Sage Weil
01:12 PM Bug #9974 (Won't Fix): Osd-s bind only to 1st network in "public network"
OSDs bind and listen on only a single IP by design. Changing it would require major changes to how we handle identity... Greg Farnum
04:58 AM Bug #9974 (Won't Fix): Osd-s bind only to 1st network in "public network"
OSD daemons bind only to first network in ceph.conf "public network" parameter.
ceph.conf:
[global]
cluster netw...
elder one
10:57 AM Bug #9978 (Closed): keyvaluestore: void ECBackend::handle_sub_read
On 0.87 "Giant" I'm repeatedly hit by the following assert, typically crashing 4 OSDs at once:... Dmitry Smirnov
10:48 AM CephFS Bug #9977 (Resolved): cephfs-journal-tool falsely reports invalid start_ptr

This is happening when the journal expire_pos isn't at an object boundary. The expected start_ptr counter is being...
John Spray
10:03 AM CephFS Feature #1398: qa: multiclient file io test
... Anonymous
09:20 AM RADOS Bug #9911 (Rejected): ceph not placing replicas to OSDs on same host as down/out OSD
ah, it's because the vary_r tunable is false. we fixed this bug in firefly. switching to firefly tunables will reso... Sage Weil
08:35 AM Bug #9752 (Resolved): acting in past intervals contains primary and up_primary (looks like duplic...
Sage Weil
02:27 AM Bug #9752 (Fix Under Review): acting in past intervals contains primary and up_primary (looks lik...
* firefly https://github.com/ceph/ceph/pull/2847
* giant https://github.com/ceph/ceph/pull/2846
Loïc Dachary
08:28 AM devops Tasks #8366: Update ceph.com/docs to default to the latest major release (0.80)
John any updates on this? It is a bummer that we have all the infrastructure/services ready to deal with the redirect... Alfredo Deza
08:26 AM Bug #9503: Dumpling: removing many snapshots in a short time makes OSDs go berserk
Any news on the backport? Christopher Thorjussen
08:19 AM Linux kernel client Bug #9894: kcephfs: rm -r left files behind
Merged the userspace version of this; is there a separate ticket for that? Greg Farnum
06:19 AM devops Feature #8303: ceph-extra packages for newer Ubuntu versions
Bump.. ceph-extras still not available for Trusty 14.04. David Moreau Simard
04:32 AM rgw Bug #9973 (Resolved): Validation of Swift DLO manifest object ETag doesn't match OpenStack Swift ...
The way the RGW Swift API validates the ETag on DLO manifest objects does not match the way the OpenStack Swift imple... Mike Dorman

10/30/2014

09:32 PM Bug #9971 (Duplicate): OSD crashes again after restarting due to op thread time out at writing pg...
This crashes observed when one OSD was restarted after being down for a long time, it crashed again
because its op t...
Zhi Zhang
08:39 PM Feature #9954: buffer: method to ensure an extent is contiguous
Haomai Wang wrote:
> Hmm, just another approach.
> Maybe we can use another interface called "get_range" for the ...
Sage Weil
06:49 PM Feature #9954: buffer: method to ensure an extent is contiguous
Hmm, just another approach.
Maybe we can use another interface called "get_range" for the same goal.
| 1M byte...
Haomai Wang
01:34 PM Feature #9954 (Resolved): buffer: method to ensure an extent is contiguous
Add a method to ensure that an extent in a bufferlist is contiguous. Something like
bufferlist bl;
...
char *...
Sage Weil
08:33 PM Feature #9966: librados: set user_version operation
My recollection is that we preserve them when moving objects in/out of the cache tier. I assume we want them to also... Sage Weil
06:38 PM Feature #9966: librados: set user_version operation
What's the purpose of this? User versions are "user" only in the sense that they're the versions we expose to them as... Greg Farnum
02:08 PM Feature #9966 (New): librados: set user_version operation
Sage Weil
08:26 PM Feature #9953: osd: efficient ObjectStore::Transaction encoding
haomai's slides Sage Weil
01:31 PM Feature #9953 (Resolved): osd: efficient ObjectStore::Transaction encoding
Haomai and Dong proposed a vastly improved Transaction encoding during CDS. Video is here:
https://www.youtube.co...
Sage Weil
06:33 PM Bug #9752 (Pending Backport): acting in past intervals contains primary and up_primary (looks lik...
Greg Farnum
04:53 PM Bug #9752 (Fix Under Review): acting in past intervals contains primary and up_primary (looks lik...
https://github.com/ceph/ceph/pull/2843 Loïc Dachary
02:36 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
On a cluster running from sources with... Loïc Dachary
01:52 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
... Loïc Dachary
01:44 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
Fortunately I saved an entire osd directory from which I was able to extract osdmaps with duplicates related to attac... Loïc Dachary
10:36 AM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
... Loïc Dachary
10:33 AM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
Could it be that the "the acting vector":https://github.com/ceph/ceph/blob/giant/src/osd/osd_types.h#L1391 size is no... Loïc Dachary
03:26 PM Bug #9970 (Resolved): document erasure coded pool simple operations
Move part of http://ceph.com/docs/master/dev/erasure-coded-pool/#interface to the rados operation guide and fix the i... Loïc Dachary
03:01 PM Bug #9969 (Can't reproduce): osd: crash in delete, tcmalloc, PGLog::write_log (dumpling)
... Sage Weil
02:17 PM Feature #8633 (Duplicate): allow writes before recovering a replica
see #7861 Sage Weil
02:10 PM RADOS Feature #9967 (New): rados: pool rollback
roll back an entire pool to a previous snapshot. this is O(n): we enumerate objects and call rollback() on each one. Sage Weil
02:07 PM Feature #9965 (New): rados: new import from pipe/file
- use file format from ceph_objectstore_tool and new export (#9964)
- take care to preserve snapshot state
- preser...
Sage Weil
02:05 PM Feature #9964 (Resolved): rados: new export [range] to pipe/file
- export a range of hash values (or the entire pool) to stdout (or a file).
- use the same format that ceph_objectst...
Sage Weil
02:03 PM Feature #9963 (Fix Under Review): librados: improve get_objects and get_position interfaces
The requirement is that export (or some other user) needs to be able to
#. partition the hash space into N segment...
Sage Weil
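The first requirement above, splitting the hash space into N segments, can be sketched as follows. This is only an illustration under the assumption of a 32-bit hash space; `partition_hash_space` is a hypothetical name and does not correspond to any librados call.

```python
from typing import List, Tuple

HASH_SPACE = 2**32  # assumption: hash positions are 32-bit values


def partition_hash_space(n: int) -> List[Tuple[int, int]]:
    """Return n half-open [start, end) ranges covering the hash space once."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Integer arithmetic keeps the boundaries exact and the ranges contiguous.
    bounds = [HASH_SPACE * i // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


for start, end in partition_hash_space(4):
    print(hex(start), hex(end))
```

Half-open ranges mean each position belongs to exactly one segment, so N workers could each take one range without overlap.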
01:59 PM Feature #9962 (In Progress): osd: kill 'category' in stats and public API
Sage Weil
01:58 PM Feature #9962 (Resolved): osd: kill 'category' in stats and public API
Sage Weil
01:56 PM Feature #9961 (Resolved): osd: new MOSDClientSubOp and Reply
Discussed during CDS here:
http://pad.ceph.com/p/hammer-fixed_memory_layout
http://youtu.be/CTp4eP9kPok
Create...
Sage Weil
01:48 PM Feature #9960 (Resolved): osd: adjust hint(s) for replica vs primary writes
We should generally DONTNEED on replicas, regardless of what the client asked us to do. Sage Weil
01:48 PM Feature #9959 (Resolved): osd: pass client fadvise hints through to objecstore
Sage Weil
01:47 PM Feature #9958 (Resolved): osd: add fadvise op to Objectstore::Transaction
Add fadvise op to ObjectStore::Transaction. Mirror posix_fadvise(2).
See #9957.
Sage Weil
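For reference, the POSIX call being mirrored takes a file descriptor, an offset, a length, and an advice constant. The sketch below uses Python's standard `os.posix_fadvise` wrapper (Unix only). `advise_dontneed` is a hypothetical helper for illustration and says nothing about how the Transaction op itself would be encoded.

```python
import os
import tempfile


def advise_dontneed(path: str) -> None:
    """Tell the kernel that cached pages of this file are not needed soon."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and length=0 cover the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


# Demonstrate on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
    demo_path = tmp.name
advise_dontneed(demo_path)
os.remove(demo_path)
```

The advice is only a hint: the kernel may ignore it, and the call does not change file contents.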
01:45 PM Feature #9957 (Resolved): librados: add fadvise op
Add an fadvise operation to ObjectOperation. Mirror posix_fadvise(2).
Add it right around here: https://github.co...
Sage Weil
01:45 PM Feature #9956 (Resolved): osd: reenable alloc hints if kernel is known to be safe
Sage Weil
01:42 PM Bug #9480 (Resolved): OSD is crashing while object deletion
Samuel Just
01:37 PM Feature #9955 (Resolved): osd: allow encoded bufferlist to be used in place of map<K,V> for kv APIs
This will avoid encode/reencode overhead to convert things to an STL structure. Eventually, once we pass through the... Sage Weil
01:31 PM RADOS Feature #9952: osd: smarter choice of primary to minimize recovery disruption
We currently choose the first up osd as the primary unless it is impossible to do so. But, we can do better: other o... Sage Weil
01:30 PM RADOS Feature #9952 (New): osd: smarter choice of primary to minimize recovery disruption
Sage Weil
01:29 PM Feature #7862 (In Progress): allow backfill/recovery while below min_size
Sage Weil
01:28 PM Feature #9951 (New): librados, osd: per-object scrub operation
librados operation to scrub a single object. Sage Weil
01:27 PM Feature #9950 (New): rados: add ability to read a specific replica/shard from CLI
Sage Weil
01:25 PM Feature #9949 (New): librados: add ability to read a specific replica or shard
Part of making scrub/repair work is being able to explicitly fetch any copy or shard of an object. Extend librados to ... Sage Weil
01:24 PM Feature #9948 (New): osd: add scrub result query interface
This will use the admin interface (ceph tell <pgid> ...), similar to 'ceph tell <pgid> query'. results in json. see... Sage Weil
01:24 PM Feature #9947 (New): osd: store scrub error state in kv store; clear on peering event 
Sage Weil
11:39 AM Bug #9944 (Pending Backport): objecter: pool dne checks not correct
Sage Weil
10:56 AM Bug #9944 (Fix Under Review): objecter: pool dne checks not correct
https://github.com/ceph/ceph/pull/2839
Sage Weil
09:06 AM Bug #9944 (Resolved): objecter: pool dne checks not correct
... Sage Weil
11:08 AM Bug #9942 (Won't Fix): Debian armhf packages are missing in latest repo updates for Debian in Fir...
we don't build (and never have built) armhf packages for ceph.com.
we do have a bunch of armv7l hardware and did build...
Sage Weil
04:26 AM Bug #9942 (Won't Fix): Debian armhf packages are missing in latest repo updates for Debian in Fir...
I'm trying to install Ceph with ceph-deploy on an armhf cluster but it failed:
[MS0][ERROR ] RuntimeError: command ...
Jasper Siero
10:35 AM Bug #9750: pg incomplete
... Loïc Dachary
10:11 AM CephFS Feature #1398: qa: multiclient file io test
A task that implements this could be useful for testing calamari as well (I manually did some of the things needed he... Anonymous
10:08 AM CephFS Feature #1398 (In Progress): qa: multiclient file io test
Anonymous
10:04 AM Bug #9945 (Resolved): giant: MClientSession COMPAT_VERSION is 2, should be 1
yup! Sage Weil
09:52 AM Bug #9945 (Fix Under Review): giant: MClientSession COMPAT_VERSION is 2, should be 1
https://github.com/ceph/ceph/pull/2837
https://github.com/ceph/ceph/pull/2838
John Spray
09:41 AM Bug #9945 (Resolved): giant: MClientSession COMPAT_VERSION is 2, should be 1
John Spray
09:37 AM CephFS Feature #9881 (In Progress): mds: admin command to flush the mds journal
John Spray
07:55 AM Bug #9916: osd: crash in check_ops_in_flight
The crash happened with radosgw as the client, so I guess it is formed by objecter - https://github.com/ceph/ceph/blo... Guang Yang
04:36 AM Feature #9943 (In Progress): osd: mark pg and use replica on EIO from client read
Copy the below email thread and open an issue to track the enhancement.... Guang Yang
02:56 AM Bug #9941 (Rejected): rados command line crashes when trying to copy pool snapshot
We are exploring options to regularly preserve the contents of the pools backing our rados gateways. For that we crea... Daniel Schneller
12:53 AM Bug #8797: "ceph status" do not exit with python_2.7.8
Just a note that people are hitting this in fedora 21, now:
https://bugzilla.redhat.com/show_bug.cgi?id=1155335
Boris Ranto

10/29/2014

09:34 PM CephFS Feature #9940: uclient: be more robust when dealing with outstanding RADOS IO and stale caps
While in the general case it is necessary to fence clients that have become unresponsive to the MDS, this type of "so... John Spray
09:23 PM CephFS Feature #9940 (New): uclient: be more robust when dealing with outstanding RADOS IO and stale caps
If we've given IO to the Objecter and our caps go stale, we need to do something to handle it. Greg Farnum
09:06 PM CephFS Bug #1666 (Resolved): hadoop: time-related meta-data problems
We now take client timestamps for almost everything, so this should no longer be a problem and I'm closing it unless ... Greg Farnum
07:13 PM Bug #9939 (Resolved): "giant" no longer log scrub errors
Scrubbing problematic PGs no longer reports found errors: there are no more records of discovered errors in ... Dmitry Smirnov
02:49 PM Bug #9916 (Need More Info): osd: crash in check_ops_in_flight
how is the OSDOp being formed? this looks like a bug on the client side to me. the attr ops should have name_len by... Sage Weil
02:45 PM Bug #9910 (Pending Backport): ceph_test_rados: out of order, probably due to message delay logic
Sage Weil
11:22 AM Bug #9910 (Fix Under Review): ceph_test_rados: out of order, probably due to message delay logic
https://github.com/ceph/ceph/pull/2832 Sage Weil
01:16 PM Feature #9776: try to make address sanitizer work
Ok, so the gcc version required to make this work is only a month or two old (dynamic linking bug fix). So, we're go... Samuel Just
01:11 PM Bug #9875 (Pending Backport): stuck recovering due to unfound hit_set object
Samuel Just
11:44 AM rbd Bug #9936: Exporting images larger than 2GB fails
PR: https://github.com/ceph/ceph/pull/2828 Jason Dillaman
11:43 AM rbd Bug #9936 (Resolved): Exporting images larger than 2GB fails
An lseek64 result code is copied into an int32, causing an overflow for large images. Jason Dillaman
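The overflow described here can be reproduced in a few lines. This Python sketch uses `ctypes.c_int32` to mimic storing a 64-bit offset in a 32-bit signed integer; `to_int32` is a hypothetical helper and this is not the rbd code itself.

```python
import ctypes


def to_int32(value: int) -> int:
    """Mimic assigning a 64-bit offset to a 32-bit signed int."""
    return ctypes.c_int32(value).value


two_gib = 2 * 1024**3  # one past the largest 32-bit signed value

print(to_int32(two_gib - 1))  # still fits
print(to_int32(two_gib))      # wraps to a negative number
```

A negative value is easily mistaken for an error return, which is consistent with exports failing once an image exceeds 2GB.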
11:37 AM RADOS Bug #9911: ceph not placing replicas to OSDs on same host as down/out OSD
Sorry, forgot that the majority agreement does not work with two replicas. Everything is ok now. Andrey Korolyov
10:44 AM RADOS Bug #9911: ceph not placing replicas to OSDs on same host as down/out OSD
Andrey Korolyov wrote:
> Can confirm placement mess on giant: I am backfilling one node from another one within two-...
Sage Weil
10:41 AM RADOS Bug #9911: ceph not placing replicas to OSDs on same host as down/out OSD
Can confirm placement mess on giant: I am backfilling one node from another one within two-node cluster. After today`... Andrey Korolyov
11:17 AM Linux kernel client Bug #9928: kernel BUG at fs/ceph/caps.c:2307!
The very first error message is:... Zheng Yan
10:37 AM Linux kernel client Bug #9928: kernel BUG at fs/ceph/caps.c:2307!
... Sage Weil
08:30 AM Linux kernel client Bug #9928: kernel BUG at fs/ceph/caps.c:2307!
MDS cache dump at ~/jcsp/9928/cachedump.1870.mds0 on teuthology.
This was taken at around 0800 local, long after t...
John Spray
07:55 AM Linux kernel client Bug #9928 (Resolved): kernel BUG at fs/ceph/caps.c:2307!

Client's view of its operations:...
John Spray
11:04 AM CephFS Bug #9935: client: segfault on ceph_rmdir path "/"
Yes, EBUSY is what a local filesystem gives you, so that sounds right to me. John Spray
10:48 AM CephFS Bug #9935 (Resolved): client: segfault on ceph_rmdir path "/"
A segfault occurs when removing the root directory. What is the expected behavior? I think -EBUSY is what makes sense. Noah Watkins
10:00 AM Bug #9891: "Assertion: os/DBObjectMap.cc: 1214: FAILED assert(0)" in upgrade:firefly-x-giant-dist...
does not appear to be a ceph issue.. either bad disk or leveldb corruption or something. lowering priority. Sage Weil
09:54 AM rgw Documentation #9934 (Closed): rgw: document backing pool capabilities and API usage
Document what RGW is capable of in terms of defining multiple backing RADOS pools and how they can be used via the S3... Sage Weil
09:52 AM rgw Feature #9933 (New): rgw: implement S3 RR (reduced redundancy) API
- mark a particular backing pool as the 'rr' one
- make RGW understand the S3 API for RR and use that backing pool f...
Sage Weil
09:51 AM rgw Feature #9932 (Resolved): rgw: map swift X-Storage-Policy header to rgw pools
This will let people use the new Swift "storage policies" API to use the preexisting RGW functionality Sage Weil
09:29 AM Subtask #9931 (New): create selinux policies for ceph-mon, ceph-osd, ceph-mds
From an internal red hat discussion:
There are probably three distinct things we need to do to get cephs and
SELi...
Sage Weil
09:27 AM Cleanup #9930 (New): gtest: update, move to submodule
the version we have is very old. update to a newer version, and possibly/probably move to a submodule. Sage Weil
05:25 AM Bug #9927: RHEL: selinux-policy-targeted rpm update triggers slow requests
Here's a solution:... Dan van der Ster
03:46 AM Bug #9927: RHEL: selinux-policy-targeted rpm update triggers slow requests
It is triggered by fixfiles -C /etc/selinux/targeted/contexts/files/file_contexts.pre restore... Dan van der Ster
03:35 AM Bug #9927 (Can't reproduce): RHEL: selinux-policy-targeted rpm update triggers slow requests
We observe slow requests while updating a server to RHEL6.6. The upgrade includes selinux-policy-targeted, which runs... Dan van der Ster
12:11 AM Bug #9919 (Resolved): tests: qa/workunits/cephtool/test.sh injectargs instability
Loïc Dachary

10/28/2014

10:52 PM Feature #9926 (Resolved): AsyncMessenger: Support kqueue interface for BSD and mac osx OS
Haomai Wang
09:14 PM Bug #9910: ceph_test_rados: out of order, probably due to message delay logic
... Sage Weil
09:08 PM Bug #9910: ceph_test_rados: out of order, probably due to message delay logic
wip-9910 Sage Weil
09:00 PM Bug #9910: ceph_test_rados: out of order, probably due to message delay logic
yeah, almost certain this is a bug with delayed messages. testing a fix.
ubuntu@teuthology:/a/sage-bug-9910-a/576723
Sage Weil
04:25 PM Bug #9910 (In Progress): ceph_test_rados: out of order, probably due to message delay logic
reproducing with client ms logs Sage Weil
05:35 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...

I happened to notice the issue because I happened to look at this guy's pastebin. I didn't interact with him at all. N...
David Zafman
03:46 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
It is unfortunately gone... Loïc Dachary
03:00 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
First thing we want to get is an osdmap from the misbehaving epoch.
Loic: you can get the osdmap for a particular ...
Samuel Just
02:21 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
Actually, that thread is the same instance as david's. Samuel Just
02:10 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
See the thread "[ceph-users] Troubleshooting Incomplete PGs" for another instance of this (and there are several more... Greg Farnum
05:08 PM Bug #9921: msgr/osd/pg dead lock giant
https://github.com/ceph/ceph/pull/2825 Greg Farnum
04:56 PM Bug #9921 (Fix Under Review): msgr/osd/pg dead lock giant
wip-9921, totally untested. Greg Farnum
02:51 PM Bug #9921: msgr/osd/pg dead lock giant
From what I recall, none of these are simple locks to get rid of. I'm not actually sure how to go about it; even some... Greg Farnum
02:14 PM Bug #9921: msgr/osd/pg dead lock giant
SimpleMessenger lock is held by an accepting Pipe trying to replace an old Pipe:... Greg Farnum
01:50 PM Bug #9921: msgr/osd/pg dead lock giant
nvm, different deadlock Samuel Just
01:49 PM Bug #9921 (Duplicate): msgr/osd/pg dead lock giant
just kidding, this appears to be 9898 Samuel Just
11:03 AM Bug #9921 (Resolved): msgr/osd/pg dead lock giant
commit:2d6980570af2226fdee0edfcfe5a8e7f60fae615
/a/teuthology-2014-10-27_02:32:02-rados-giant-distro-basic-multi/5...
Samuel Just
03:42 PM Bug #9750: pg incomplete
I'm afraid these maps are lost... Loïc Dachary
03:22 PM Bug #9750: pg incomplete
Yeah, you'll want maps from back when the acting set was wonky. Might want to look into the past intervals code perh... Samuel Just
02:25 PM Bug #9919 (Fix Under Review): tests: qa/workunits/cephtool/test.sh injectargs instability
https://github.com/ceph/ceph/pull/2823 Loïc Dachary
09:42 AM Bug #9919 (Resolved): tests: qa/workunits/cephtool/test.sh injectargs instability
By modifying *osd_debug_drop_ping_probability = '444'* it introduces a side effect on the cluster that can create pro... Loïc Dachary
12:43 PM CephFS Bug #9900 (Duplicate): Failure in multiple_rsync (directories wrongly appear changed)
I imagine this is a dup of #9894? Greg Farnum
12:24 PM Linux kernel client Bug #5429: libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
I bet there is another trace of this somewhere, no rcu stall, just plain NULL deref in rb_erase(). Will try to inves... Ilya Dryomov
11:36 AM Linux kernel client Bug #5429: libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
Got reports of the 2nd trace (http://tracker.ceph.com/issues/5429#note-7) occurring on a kernel with the notify fixes. Josh Durgin
12:18 PM CephFS Bug #9800 (Pending Backport): client-limits test is not passing
I don't know that we need/want to try and push this in before release (although since it's all guarded inside of a br... Greg Farnum
05:29 AM CephFS Bug #9800 (Resolved): client-limits test is not passing
... John Spray
11:21 AM Bug #9920: admin socket check hang, osd appears fine
Hmm, osd.4 seems fine, not sure why the admin socket check didn't work. Samuel Just
10:00 AM Bug #9920 (Can't reproduce): admin socket check hang, osd appears fine
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-27_17:18:01-upgrade:firefly-x-giant-distro-basic-v... Yuri Weinstein
11:12 AM CephFS Bug #8255 (Fix Under Review): mds: directory with missing object cannot be removed
https://github.com/ceph/ceph/pull/2821 Zheng Yan
08:54 AM Bug #9288 (Resolved): "Assertion `nlock == 0' failed" in upgrade:firefly-firefly-testing-basic-vp...
Yuri Weinstein
08:52 AM rgw Bug #9866 (Resolved): "test_s3.test_multipart_upload ... ERROR" in upgrade:firefly:older-firefly-...
Yuri Weinstein
04:23 AM rgw Bug #9918: RGW-Swift: SubUser access permissions, does not seems to work
2014-10-28 16:43:28.776693 7f5cd87c0700 1 civetweb: 0x7f5d2c0093f0: 127.0.0.1 - - [28/Oct/2014:16:43:28 +0530] "GET ... pushpesh sharma
04:18 AM rgw Bug #9918 (Resolved): RGW-Swift: SubUser access permissions, does not seems to work
Create users and sub-users in generic development env:-
This is relevant json DS:-
{ "user_id": "user1",
"disp...
pushpesh sharma
03:58 AM rgw Bug #9917: RADOSGW: Not able to create Swift objects with erasure coded pool
2014-10-28 15:59:41.468515 7f0863fef700 20 RGWEnv::set(): HTTP_HOST: localhost:8000
2014-10-28 15:59:41.468583 7f086...
pushpesh sharma
03:58 AM rgw Bug #9917: RADOSGW: Not able to create Swift objects with erasure coded pool
able to create rados object:-
#./ceph osd pool create mypool 20 20 erasure
DEVELOPER MODE: setting PATH, PYTHONPA...
pushpesh sharma
03:56 AM rgw Bug #9917 (Won't Fix): RADOSGW: Not able to create Swift objects with erasure coded pool
ceph@Ubuntu14:~/ceph-0.86/src$ MON=3 MDS=0 RGW=1 OSD=3 ./vstart.sh -d -n -x -r
going verbose **
[./fetch_config /tm...
pushpesh sharma

10/27/2014

10:21 PM Bug #9916 (Resolved): osd: crash in check_ops_in_flight
Assertion failure:... Guang Yang
07:44 PM Bug #9915 (Resolved): osd: eviction logic reversed
commit:622c5ac Sage Weil
06:17 PM CephFS Feature #4138 (Fix Under Review): MDS: forward scrub: add functionality to verify disk data is co...
This bit at least has been isolated and put into a PR:
https://github.com/ceph/ceph/pull/2814
Greg Farnum
04:56 PM Bug #9910: ceph_test_rados: out of order, probably due to message delay logic
another one: ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-10-27_02:32:02-rados-giant-distro-basic-m... Sage Weil
01:15 PM Bug #9910 (Resolved): ceph_test_rados: out of order, probably due to message delay logic
/a/samuelj-2014-10-24_23:51:24-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/571220
* commit:f7431cc...
Samuel Just
04:23 PM CephFS Bug #9870 (Resolved): kernel: not handling cap_flush_ack messages properly
Zheng Yan
03:33 PM Linux kernel client Bug #9894: kcephfs: rm -r left files behind
Zheng Yan
03:25 PM rbd Bug #9391: fio rbd driver rewrites same blocks
@Mark: I have to take a look at fio for that. Is this all about sequential writes only? Do you see a different behavi... Danny Al-Gaaf
02:44 PM Bug #9891: "Assertion: os/DBObjectMap.cc: 1214: FAILED assert(0)" in upgrade:firefly-x-giant-dist...
2014-10-25 18:56:23.243456 7fefdcc3e700 20 filestore dbobjectmap: seq is 485
2014-10-25 18:56:23.243559 7fefdd43f700...
Samuel Just
02:31 PM Bug #9913 (Resolved): mon: audit log entires for forwarded requests lack info
... Sage Weil
02:27 PM Bug #9912 (Won't Fix): ceph osd up # not a valid command in 0.80.7
There is no way to administratively make an osd 'up'. The daemon needs to go through its startup procedure and join... Sage Weil
02:24 PM Bug #9912 (Won't Fix): ceph osd up # not a valid command in 0.80.7
There is a valid command for setting an osd down:... Mark Nelson
02:16 PM RADOS Bug #9911: ceph not placing replicas to OSDs on same host as down/out OSD
ceph -s output with an OSD down and type host:... Mark Nelson
02:11 PM RADOS Bug #9911 (Rejected): ceph not placing replicas to OSDs on same host as down/out OSD
On a 3 node firefly cluster with 6 OSDs per host and 3x replication, when noup is set and 1 OSD is marked down/out, a... Mark Nelson
01:38 PM Feature #9598: re-enable Objecter fast dispatch
Sage Weil
01:13 PM Bug #9909 (Resolved): lost_unfound test/rados tool flawed, EEXIST when putting empty object
ubuntu@teuthology:/a/samuelj-2014-10-24_23:51:24-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/571037
...
Samuel Just
01:09 PM Bug #7995: osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
ubuntu@teuthology:/a/samuelj-2014-10-24_23:51:24-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/571474/r... Samuel Just
11:31 AM rgw Bug #9877: In some cases it's possible for rgw to segfault on http COPY
You mean #9226 ? Anonymous
11:16 AM rgw Bug #9907 (Resolved): radosgw-admin: can't disable max_size quota
From pull request, by Dong Lei:... Yehuda Sadeh
11:05 AM Linux kernel client Feature #9906 (Resolved): Inline data support

Currently the fuse client supports CEPH_FEATURE_MDS_INLINE_DATA but the kernel client does not.
John Spray
10:28 AM CephFS Bug #9904 (Resolved): Don't crash MDS on clients sending messages with bad seq
Currently in Server::handle_client_session, we do this:... John Spray
10:14 AM CephFS Feature #9903 (Resolved): Recover lost dirfrag via data pool

[While the MDS cluster is offline and journal has been flushed if necessary]
Given that a particular dirfrag obj...
John Spray
10:10 AM Bug #9731: Ceph 0.80.6 OSD crashes
Ok, let me know what happens. Samuel Just
10:09 AM Bug #9731: Ceph 0.80.6 OSD crashes
Nothing reported from valgrind. Also haven't seen crashes lately. At this point I'm thinking the issues were corrup... Brad House
10:06 AM Feature #9902 (Duplicate): Tool for RADOS import/export pool to file

To assist with CephFS disaster recovery, provide the ability to dump an entire pool (the cephfs metadata pool) to a...
John Spray
10:00 AM Support #9901 (New): libgoogle-perftools4: tcmalloc performance regression on armhf
Just to keep track of https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=766986 Loïc Dachary
09:36 AM CephFS Bug #9900 (Duplicate): Failure in multiple_rsync (directories wrongly appear changed)

http://pulpito.ceph.com/teuthology-2014-10-24_23:08:01-kcephfs-giant-testing-basic-multi/570840/
http://pulpito.ce...
John Spray
09:34 AM rgw Bug #9892: radosgw_admin.py: failed len(out['entries']) == 0 on usage show
Seems like a broken test. We write an object here:... Yehuda Sadeh
09:09 AM rgw Bug #9148: rgw: multiregion tests failing, s3tests.functional.test_s3.test_region_copy_object
Seems that the slow_backend param has not been applied on the s3tests giant branch. Yehuda Sadeh
09:08 AM rgw Bug #9148: rgw: multiregion tests failing, s3tests.functional.test_s3.test_region_copy_object
in latest run, still trying to copy the 100M:... Yehuda Sadeh
09:09 AM devops Bug #9747 (Resolved): ceph.spec.in will always use 95-ceph-osd-alt.rules
Loïc Dachary
08:24 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
Update in run http://pulpito.front.sepia.ceph.com/teuthology-2014-10-26_18:13:01-upgrade:firefly-x-giant-distro-basic... Yuri Weinstein
06:05 AM CephFS Bug #9800: client-limits test is not passing
https://github.com/ceph/ceph/pull/2809
http://pulpito.front.sepia.ceph.com/john-2014-10-27_13:05:29-fs:recovery-wip-...
John Spray

10/26/2014

07:54 PM Bug #9895 (Duplicate): Master/giant branch: OSD deadlock during recovery
#9898 Sage Weil
11:24 AM Bug #9895 (Duplicate): Master/giant branch: OSD deadlock during recovery
Given eight-OSD, two-node cluster (node01 and node04), three mons (node01, node04, twin2). OSDs placed on node04 acts... Andrey Korolyov
04:51 PM rbd Bug #9391: fio rbd driver rewrites same blocks
Hi Guys,
This is all on the fio side. From what I remember, when you are doing sequential writes and specify mult...
Mark Nelson
03:33 PM rgw Bug #9899 (Resolved): Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-d...
Seems related to rgw and 3-upgrade-sequence/upgrade-osd-mon-mds.yaml configurations... Yuri Weinstein
02:33 PM Messengers Bug #9898: osd: fast dispatch deadlock in mark_down (giant)
Looks like the same as I reported some hours before: #9895. Please close mine or this one as a duplicate. Andrey Korolyov
12:19 PM Messengers Bug #9898: osd: fast dispatch deadlock in mark_down (giant)
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-10-24_21:12:40-rados-wip-sam-testing-distro-basic-multi/570144 Sage Weil
12:18 PM Messengers Bug #9898: osd: fast dispatch deadlock in mark_down (giant)
full backtrace Sage Weil
12:17 PM Messengers Bug #9898 (Resolved): osd: fast dispatch deadlock in mark_down (giant)
This is basically a dup of the issue we saw with fast dispatch in the objecter, but with the osd.... Sage Weil
11:49 AM rbd Bug #9855 (Resolved): rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-...
fixed test Sage Weil
11:48 AM Linux kernel client Bug #9896: krbd: EPERM from map-snapshot-io.sh
ubuntu@teuthology:/a/teuthology-2014-10-24_23:06:01-krbd-giant-testing-basic-multi/570827 too Sage Weil
11:48 AM Linux kernel client Bug #9896 (Resolved): krbd: EPERM from map-snapshot-io.sh
... Sage Weil
11:24 AM Linux kernel client Bug #9894 (Resolved): kcephfs: rm -r left files behind
... Sage Weil
11:21 AM rgw Bug #9148: rgw: multiregion tests failing, s3tests.functional.test_s3.test_region_copy_object
also
ubuntu@teuthology:/a/teuthology-2014-10-24_23:02:01-rgw-giant-distro-basic-multi/570719
ubuntu@teuthology:/a/t...
Sage Weil
11:16 AM rgw Bug #9148: rgw: multiregion tests failing, s3tests.functional.test_s3.test_region_copy_object
teuthology-2014-10-24_23:02:01-rgw-giant-distro-basic-multi/570701 fails with slow_backend:true on giant.... Sage Weil
11:19 AM rgw Bug #9892 (Resolved): radosgw_admin.py: failed len(out['entries']) == 0 on usage show
... Sage Weil
08:42 AM Bug #9891 (Resolved): "Assertion: os/DBObjectMap.cc: 1214: FAILED assert(0)" in upgrade:firefly-x...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-25_18:13:01-upgrade:firefly-x-giant-distro-basic-m... Yuri Weinstein
05:03 AM Subtask #9890: mon: VIRT usage 2.4G larger than tcmalloc's VIRT stats (dumpling, centos6.3)
forgot to mention that leveldb stores for all mons are several GB large, even after compaction:... Joao Eduardo Luis
04:57 AM Subtask #9890: mon: VIRT usage 2.4G larger than tcmalloc's VIRT stats (dumpling, centos6.3)
mon.c (in quorum) is being the synchronization provider for mon.b (restarted with valgrind memcheck).
mon.c's spik...
Joao Eduardo Luis
04:39 AM Subtask #9890 (Can't reproduce): mon: VIRT usage 2.4G larger than tcmalloc's VIRT stats (dumpling...

* centos 6.3
* ceph version 0.67.11 (bc8b67bef6309a32361be76cd11fb56b057ea9d2)
* Stressing the monitors with qa...
Joao Eduardo Luis
04:26 AM Bug #9889 (Closed): mon: leveldb weirdness
Inquiries on leveldb on the monitors and weirdness sometimes associated.
This ticket is being used to track severa...
Joao Eduardo Luis

10/25/2014

11:04 AM Feature #9888: AsyncMessenger: Async event threads can shared by all AsyncMessenger
+1 Sage Weil
07:32 AM Feature #9888 (Resolved): AsyncMessenger: Async event threads can shared by all AsyncMessenger
Currently, each AsyncMessenger creates "ms_async_op_threads" threads, which process incoming/outgoing connections.... Haomai Wang

10/24/2014

09:27 PM Bug #9727: 0.86 EC+ KV OSDs crashing
Not sure, I'm still waiting for a crash on the master branch. Haomai Wang
06:10 AM Bug #9727: 0.86 EC+ KV OSDs crashing
Will this issue be solved in Giant? Kenneth Waegeman
02:38 PM Bug #9746: reconcile upstream ceph.spec.in with other ceph.spec (SuSE, EPEL, etc)
https://build.opensuse.org/package/show/home:netsroth/ceph Loïc Dachary
02:06 PM Bug #9731: Ceph 0.80.6 OSD crashes
Still no crashes under valgrind? How many osds are running under valgrind? We should probably leave it running for ... Samuel Just
01:20 PM Bug #9480 (Pending Backport): OSD is crashing while object deletion
Sage Weil
11:40 AM rgw Bug #9866: "test_s3.test_multipart_upload ... ERROR" in upgrade:firefly:older-firefly-distro-basi...
yuri, please close it when we get a pass on the nightlies. Tamilarasi muthamizhan
11:35 AM rgw Bug #9886 (Resolved): rgw: apache 2.4 does not send http status reason string
There's an issue with certain apache 2.4 versions, where it doesn't send back the http status reason in the response.... Yehuda Sadeh
11:34 AM rgw Bug #9878 (Pending Backport): rhel7 s3-tests fail due to missing reason
commit:a9dd4af Sage Weil
11:26 AM rbd Bug #8912: librbd segfaults when creating new image (rbd-ephemeral-clone-stable-icehouse)
For better searchability, the backtrace for this crash is:... Josh Durgin
11:24 AM rbd Bug #9513 (Pending Backport): rbd_cache=true default setting is degading librbd performance ~10X ...
reverted the backport for now as fully fixing the ObjectCacher is too large a change close to the giant release Josh Durgin
11:14 AM CephFS Bug #9884: too many files in /usr for multiple_rsync.sh
Yeah, just cutting it down to a more predictable/smaller directory sounds good to me. Greg Farnum
10:50 AM CephFS Bug #9884: too many files in /usr for multiple_rsync.sh
one failure http://pulpito.ceph.com/teuthology-2014-10-20_23:04:01-fs-giant-distro-basic-multi/562537/ Zheng Yan
10:49 AM CephFS Bug #9884 (Closed): too many files in /usr for multiple_rsync.sh
for example, plana81 has 60k files in /usr, but plana90 has 90k files in /usr. perhaps multiple_rsync should /usr/src... Zheng Yan
10:51 AM Bug #9873 (Resolved): rados bench crash
Sage Weil
10:15 AM Bug #9873: rados bench crash
ubuntu@teuthology:/a/samuelj-2014-10-23_17:44:53-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/567665 Samuel Just
09:38 AM Bug #9873 (Fix Under Review): rados bench crash
https://github.com/ceph/ceph/pull/2795 Sage Weil
09:07 AM Bug #9873 (In Progress): rados bench crash
Sage Weil
09:53 AM CephFS Feature #3882 (Rejected): Hide snapshot directory name in mount/mtab
we can now restrict snap access by uid... Sage Weil
09:49 AM CephFS Feature #9883 (Resolved): journal-tool: smarter scavenge (conditionally update dir objects)
Sage Weil
09:42 AM CephFS Feature #9881 (Resolved): mds: admin command to flush the mds journal
Sage Weil
09:41 AM CephFS Feature #9880 (Resolved): mds: more gracefully handle EIO on missing dir object
Sage Weil
08:53 AM rgw Bug #9877: In some cases it's possible for rgw to segfault on http COPY
looks like #9266. Yehuda Sadeh

10/23/2014

09:35 PM rgw Bug #9878: rhel7 s3-tests fail due to missing reason
Sage Weil
06:10 PM rgw Bug #9878 (Resolved): rhel7 s3-tests fail due to missing reason
commit:a9dd4af401328e8f9071dee52470a0685ceb296b Sage Weil
06:08 PM rgw Bug #9169 (Resolved): 100-continue broken for centos/rhel
Sage Weil
04:58 PM rgw Bug #9877 (Resolved): In some cases it's possible for rgw to segfault on http COPY

on 0.80.4
-81> 2014-10-23 22:22:05.586898 7f83547f8700 1 ====== starting new request req=0x7f8368013400 ==...
Anonymous
03:03 PM Bug #9876 (Resolved): failed pull needs to allow mark_unfound_lost revert eventually
Samuel Just
01:50 PM rgw Bug #9616 (Resolved): upgrade test restarts rgw, test gets 500
Sage Weil
01:47 PM CephFS Bug #9869 (Pending Backport): Client: not handling cap_flush_ack messages properly
I tested this manually with a patch that sets the starting tid value to 65535 and looking at the logs. That causes im... Greg Farnum
01:47 PM rbd Bug #9854: librbd: reads contending for cache space can cause livelock
Reads thrashing the cache can be reproduced with:... Josh Durgin
01:44 PM Bug #9821 (Pending Backport): failed to recover before timeout expired
Sage Weil
09:41 AM Bug #9821 (Fix Under Review): failed to recover before timeout expired
Samuel Just
12:47 PM CephFS Bug #9870: kernel: not handling cap_flush_ack messages properly
Zheng Yan
12:43 PM Bug #9372: injectarg boolean option is discarded
There is a workaround (using --), not sure it deserves backporting. Loïc Dachary
12:41 PM Bug #9372 (Resolved): injectarg boolean option is discarded
Loïc Dachary
11:38 AM rbd Feature #9733: Separate rbd listing into CAP
Is the list of OSD class methods documented somewhere? Robert LeBlanc
11:37 AM Bug #9731: Ceph 0.80.6 OSD crashes
Other details as per sjustwork on irc:
* 3-node ceph cluster, 2 OSDs per node (1ssd 1hdd). All ssds are assigned ...
Brad House
10:34 AM Bug #9731: Ceph 0.80.6 OSD crashes
backtrace from last core... Brad House
10:19 AM Bug #9731: Ceph 0.80.6 OSD crashes
Forgot to attach latest core file from the crash prior to testing with valgrind when running wip-9731 Brad House
11:30 AM Bug #9836: mon unit tests use the wrong id
Although it could be backported to giant and firefly, it does not create actual problems. Only some tests use the mon... Loïc Dachary
11:28 AM Bug #9836 (Resolved): mon unit tests use the wrong id
Loïc Dachary
09:59 AM Bug #9408 (Pending Backport): erasure-code: misalignment
It can't be easily cherry picked because the code has changed. That can happen on firefly too. Backporting would make... Loïc Dachary
09:44 AM Bug #9874: ceph_test_rados, out of order ops
- exec:
client.0:
- ceph osd pool create base 4
- ceph osd pool create cache 4
- ceph osd tier ad...
Samuel Just
08:54 AM Bug #9874 (Duplicate): ceph_test_rados, out of order ops
2014-10-22T17:06:21.115 INFO:tasks.rados.rados.0.burnupi60.stderr:Error: finished tid 3 when last_acked_tid was 7
20...
Samuel Just
09:21 AM Bug #7995: osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
ubuntu@teuthology:/a/samuelj-2014-10-22_14:27:22-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/566853/r... Samuel Just
09:07 AM Bug #9875 (Resolved): stuck recovering due to unfound hit_set object
The hitset creation log entries have the same version for version and prior_version. This causes divergent entry det... Samuel Just
08:50 AM Bug #9873 (Resolved): rados bench crash
2014-10-23T00:25:06.570 INFO:tasks.radosbench.radosbench.0.mira034.stderr:osdc/Objecter.cc: 3971: FAILED assert(!tick... Samuel Just
08:49 AM devops Fix #5900: Create a Python package for ceph Python bindings
https://github.com/ceph/ceph/compare/wip-5900 Loïc Dachary
04:01 AM rgw Feature #8562 (Fix Under Review): rgw: Conditional PUT on ETag
Xiangyu Lv

10/22/2014

09:15 PM Documentation #9872 (Closed): erasure-code: document the LRC per layer plugin configuration
It is possible to set the profile on a per layer basis using the low level configuration http://ceph.com/docs/master/... Loïc Dachary
06:16 PM Bug #9731: Ceph 0.80.6 OSD crashes
We don't really want leak-check, it is likely slowing down the osds more than necessary. Samuel Just
05:23 PM Bug #9731: Ceph 0.80.6 OSD crashes
so far no luck replicating this with... Sage Weil
04:45 PM Bug #9731: Ceph 0.80.6 OSD crashes
We probably want to let them run under valgrind overnight if possible. Samuel Just
03:32 PM Bug #9731: Ceph 0.80.6 OSD crashes
Right, I couldn't get 3/3 under valgrind to ever come up to a good health, probably because of the load on it. Howev... Brad House
03:27 PM Bug #9731: Ceph 0.80.6 OSD crashes
(Last I heard, 2/3 were running valgrind, cluster is healthy)
Question: what version are the clients?
Samuel Just
08:16 AM Bug #9731: Ceph 0.80.6 OSD crashes
the 3rd OSD won't join, it is now always aborting at startup. log attached. Perhaps all the starting/stopping has c... Brad House
08:01 AM Bug #9731: Ceph 0.80.6 OSD crashes
after installing wip-9731 but before running under valgrind, I received a crash at 2014-10-22 10:44:42.326583 log at... Brad House
07:51 AM Bug #9731: Ceph 0.80.6 OSD crashes
I've got ceph updated to the wip-9731, and am attempting to start the OSDs under valgrind. However, the first one ap... Brad House
05:34 PM CephFS Bug #9870 (Resolved): kernel: not handling cap_flush_ack messages properly
This is the analogue to #9869, which Zheng tells me is also a problem in the kernel. We need to downcast the message ... Greg Farnum
05:30 PM CephFS Bug #9869: Client: not handling cap_flush_ack messages properly
Waiting for this to build so it can be tested. Greg Farnum
05:28 PM CephFS Bug #9869 (Resolved): Client: not handling cap_flush_ack messages properly
We saw a log segment that contained this:... Greg Farnum
04:47 PM Fix #9566 (Fix Under Review): osd: prioritize recovery of OSDs with most work to do
Here is a draft for review: https://github.com/ceph/ceph/pull/2778 if this sounds reasonable I'll write tests. Otherw... Loïc Dachary
02:47 PM Documentation #9867 (Closed): PGs per OSD documentation needs clarification
Documentation in question:
http://ceph.com/docs/master/rados/operations/placement-groups/
http://ceph.com/docs/mast...
Michael Kidd
02:37 PM rgw Bug #9169: 100-continue broken for centos/rhel
The problem seems to be unrelated to the fastcgi module. The actual issue is that we're running the apache with mpm co... Yehuda Sadeh
01:15 PM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
I think it's monitor bug. It took about two hours to commit an update... Zheng Yan
11:06 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
Let's add debug to the test yaml so that we have logs next time? Sage Weil
10:59 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
there is no mds log or client log. but ceph.log on both burnupi58 and burnupi58 look strange... Zheng Yan
09:31 AM Bug #9864 (Can't reproduce): osd doesn't report new stats for 3 hours when running test LibCephFS...
... Sage Weil
12:53 PM Bug #9480: OSD is crashing while object deletion
Samuel Just
12:32 PM rgw Bug #9866 (Fix Under Review): "test_s3.test_multipart_upload ... ERROR" in upgrade:firefly:older-...
https://github.com/ceph/ceph-qa-suite/pull/209 Sage Weil
10:30 AM rgw Bug #9866 (Resolved): "test_s3.test_multipart_upload ... ERROR" in upgrade:firefly:older-firefly-...
Run http://pulpito.front.sepia.ceph.com/teuthology-2014-10-21_18:40:01-upgrade:firefly:older-firefly-distro-basic-vps... Yuri Weinstein
12:17 PM rbd Bug #9854 (In Progress): librbd: reads contending for cache space can cause livelock
Jason Dillaman
11:41 AM rbd Bug #9854: librbd: reads contending for cache space can cause livelock
Update:
Run teuthology-2014-10-21_23:17:01-upgrade:firefly:newer-firefly-distro-basic-vps
Job: ['565380']
Logs...
Yuri Weinstein
11:35 AM Bug #9859 (Resolved): Commit 2ac2a96 appears to break OSD creation
Sage Weil
10:43 AM Bug #9859: Commit 2ac2a96 appears to break OSD creation
Problem has been identified.
This went unnoticed as vstart.sh, even with cephx disabled, always creates a keyring,...
Joao Eduardo Luis
10:18 AM Bug #9859: Commit 2ac2a96 appears to break OSD creation
also, 2ac2a96 is the merge commit for the branch of c0e3bc9a Joao Eduardo Luis
10:11 AM Bug #9859: Commit 2ac2a96 appears to break OSD creation
Yesterday I figured as far as the monitor not handling 'MMonGetMap' messages from the OSD during mkfs because the OSD... Joao Eduardo Luis
09:59 AM Bug #9859 (In Progress): Commit 2ac2a96 appears to break OSD creation
Joao Eduardo Luis
11:29 AM rgw Bug #9865 (Resolved): "Assertion: osdc/ObjectCacher.cc" in upgrade:firefly:older-firefly-distro-b...
pushed fix to giant and firefly branches of ceph-qa-suite Sage Weil
11:19 AM rgw Bug #9865: "Assertion: osdc/ObjectCacher.cc" in upgrade:firefly:older-firefly-distro-basic-vps run
thrasher needs to not thrash primary affinity in this case. client connects before the primary-affinity is set so th... Sage Weil
11:10 AM rgw Bug #9865 (In Progress): "Assertion: osdc/ObjectCacher.cc" in upgrade:firefly:older-firefly-distr...
Sage Weil
10:22 AM rgw Bug #9865 (Resolved): "Assertion: osdc/ObjectCacher.cc" in upgrade:firefly:older-firefly-distro-b...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-21_18:40:01-upgrade:firefly:older-firefly-distro-b... Yuri Weinstein
10:49 AM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
Full logs from pastebin to survive expiration. Loïc Dachary
10:32 AM Bug #8885: SIGABRT in TrackedOp::dump() via dump_ops_in_flight()
/a/samuelj-2014-10-21_16:45:57-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/564093/remote Samuel Just
10:29 AM Bug #9675: splitting a pool doesn't start when rule_id != ruleset_id
Note that this patch will not change existing crushmaps, it will just make new rules using matching ruleset_id == rul... Loïc Dachary
10:00 AM Bug #9675 (Pending Backport): splitting a pool doesn't start when rule_id != ruleset_id
pending backports awaiting review/merge on
* dumpling: https://github.com/ceph/ceph/pull/2775
* emperor: https://...
Joao Eduardo Luis
10:05 AM Bug #9851: crash on journal/filestore shutdown on firefly
Running http://pulpito.ceph.com/loic-2014-10-22_10:04:57-upgrade:firefly-x-giant-testing-basic-vps/ which is s/branch... Loïc Dachary
09:14 AM Bug #9851: crash on journal/filestore shutdown on firefly
Running http://pulpito.ceph.com/loic-2014-10-22_20:28:41-rados:thrash-wip-9851-testing-basic-vps/ Loïc Dachary
09:09 AM Bug #9851: crash on journal/filestore shutdown on firefly
Loïc Dachary
09:05 AM Bug #9851: crash on journal/filestore shutdown on firefly
I wonder how to re-run http://pulpito.ceph.com/teuthology-2014-10-18_19:22:02-upgrade:firefly-x-giant-distro-basic-mu... Loïc Dachary
10:01 AM Bug #9852 (Fix Under Review): mon: monitor asserts on 'ceph mds add_data_pool X' if X is an ID th...
https://github.com/ceph/ceph/pull/2773 Joao Eduardo Luis
09:46 AM rbd Bug #9857 (Resolved): rbd readahead division by zero exception
Jason Dillaman
09:45 AM rbd Bug #9857: rbd readahead division by zero exception
PR: https://github.com/ceph/ceph/pull/2770 Jason Dillaman
08:53 AM devops Bug #9860: grub/os-prober launch kills most ceph OSD
And sda1, which is the ext4 mounted disk of osd.13
Oct 22 07:42:00 stri os-prober: debug: running /usr/lib/os-probe...
Laurent GUERBY
08:45 AM devops Bug #9860: grub/os-prober launch kills most ceph OSD
Logs detailing what os-prober was doing when one of the OSD crashed, sda2 is the journal partition of osd.13 who got ... Laurent GUERBY
08:42 AM devops Bug #9860: grub/os-prober launch kills most ceph OSD
Loïc Dachary
08:25 AM devops Bug #9860: grub/os-prober launch kills most ceph OSD
Adding more complete log lines with ASSERT references
<guerby> 2014-10-22 07:42:07.369785 7f6edf0b5700 0 -- 192.1...
Laurent GUERBY
12:12 AM devops Bug #9860 (Fix Under Review): grub/os-prober launch kills most ceph OSD
h3. Workaround
Disable os-probe with ...
Laurent GUERBY
08:09 AM Bug #9858 (Rejected): osd crush rule create-erasure idempotency failure
This was a side effect of process being killed at random. It was possible to reproduce it consistently until https://... Loïc Dachary
04:10 AM Bug #5925: hung ceph_test_rados_delete_pools_parallel
this was fun though.
I'll stop with the noise now and test this with the patch from #9845.
Joao Eduardo Luis
04:08 AM Bug #5925 (Can't reproduce): hung ceph_test_rados_delete_pools_parallel
and then I read David's comments on this ticket and I felt dumb. Joao Eduardo Luis
04:06 AM Bug #5925: hung ceph_test_rados_delete_pools_parallel
My last statement about the tick event was inaccurate.
gdb tells me that 'tick_event' is still set by the time we i...
Joao Eduardo Luis
03:48 AM Bug #5925: hung ceph_test_rados_delete_pools_parallel
Hit this again while testing a mon patch. Setting this to 'Verified' again until I check with David or Sam on what t... Joao Eduardo Luis
02:58 AM Bug #9585: ceph assertion using rocksdb store in master branch
Hi Tamilarasi, is it still broken on the master branch? Could you give a link to the corresponding job on pulpito.ceph.com? Haomai Wang
02:56 AM Bug #9814 (Resolved): FAILED assert(0) In function 'GenericObjectMap::Header GenericObjectMap::lo...
Haomai Wang
01:58 AM Bug #9761: ceph-osd: segfault at 654c30 ip 00007f00dc5f1f07 sp 00007f00c5642e00 error 7 in ld-2.1...
No. Pavel Veretennikov

10/21/2014

09:32 PM Bug #9859: Commit 2ac2a96 appears to break OSD creation
Specifically, this is with osd creation where the monmap isn't specified (similar to how vstart does it, but not ceph... Mark Nelson
09:09 PM Bug #9859 (Resolved): Commit 2ac2a96 appears to break OSD creation
Narrowed this down through Joao's comments and bisecting to hit this commit. Not sure if this only happens under spec... Mark Nelson
06:18 PM rgw Bug #9169: 100-continue broken for centos/rhel
Running a simplified yaml, see https://gist.github.com/yuriw/1603e536ee33a28f93a4
Note: Moved clients to separate ...
Yuri Weinstein
04:53 PM rgw Bug #9169: 100-continue broken for centos/rhel
Running a simplified yaml, see https://gist.github.com/yuriw/1603e536ee33a28f93a4
Note: Moved clients to separate ...
Yuri Weinstein
10:16 AM rgw Bug #9169: 100-continue broken for centos/rhel
See the dupe #9825 for latest run info Yuri Weinstein
10:07 AM rgw Bug #9169: 100-continue broken for centos/rhel
yuri to make a minimal test case Sage Weil
05:50 PM rgw Bug #9587 (Fix Under Review): ceph-radosgw sysvinit script on EL6 cannot set ulimit
https://github.com/ceph/ceph/pull/2771
This could use a manual test as well to ensure the limit is properly set on...
Sage Weil
05:23 PM Bug #9858: osd crush rule create-erasure idempotency failure
reproduced with *while make -j8 check ; do : ; done* after ~30 minutes (i.e. ~15 runs). Loïc Dachary
05:03 PM Bug #9858 (Rejected): osd crush rule create-erasure idempotency failure
The *./ceph osd crush rule create-erasure ruleset3* command run by test/mon/osd-crush.sh sometimes fails to notice the... Loïc Dachary
05:20 PM Bug #9837 (Duplicate): rbd crash when upgrading from v0.80.5 to firefly
this could be same as bug # 9288, modified the upgrade:firefly suite to NOT upgrade clients when workload is in progr... Tamilarasi muthamizhan
05:19 PM rbd Feature #9733: Separate rbd listing into CAP
It sounds like Nova is configured to use RBD as the backing store for its ephemeral disk images instead of the local ... Jason Dillaman
11:51 AM rbd Feature #9733: Separate rbd listing into CAP
OK, putting the pool argument first does work. We have consequently found out that Nova does require list permissions... Robert LeBlanc
10:54 AM rbd Feature #9733: Separate rbd listing into CAP
Try placing the "pool=test" argument before the "object_prefix XYZ" portion of the cap:... Jason Dillaman
05:16 PM Bug #9610 (Resolved): Crash "RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::Cal...
fixed in multi-version suite already - commit b966da7b71c8aee22ff8e58b3b0c105b1d7ca4bf
fixed in upgrade:firefly/ol...
Tamilarasi muthamizhan
02:06 PM Bug #9610: Crash "RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)...
New ceph_test_rados is too picky for dumpling osds. We only want to use dumpling ceph_test_rados against clusters wi... Samuel Just
02:06 PM Bug #9610: Crash "RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)...
also: ubuntu@teuthology:/a/teuthology-2014-10-20_18:40:02-upgrade:firefly:older-firefly-distro-basic-vps/561550
<...
Tamilarasi muthamizhan
12:53 PM Bug #9610: Crash "RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)...
seeing this on the upgrade test from v0.67.11 to firefly [v0.80.7]... Tamilarasi muthamizhan
04:54 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...

"kit" on #ceph was in a situation of having incomplete pg. They sent the pg query output and it showed strange pas...
David Zafman
04:44 PM rbd Bug #9857 (Fix Under Review): rbd readahead division by zero exception
Jason Dillaman
03:53 PM rbd Bug #9857 (In Progress): rbd readahead division by zero exception
Jason Dillaman
02:42 PM rbd Bug #9857 (Resolved): rbd readahead division by zero exception
When using old-format RBD images, the RBD readahead block alignments are initialized to zero because the stripe param... Jason Dillaman
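The failure mode described in this entry (an alignment value left at zero and later used as a divisor) can be illustrated with a small sketch. This is not the librbd code or the fix from the linked PR; the function name and parameters are hypothetical, and it only assumes that readahead rounds an end offset up to an alignment boundary.

```python
def align_readahead_end(offset: int, length: int, alignment: int) -> int:
    """Return the readahead end offset, rounded up to `alignment` when possible.

    Hypothetical helper for illustration only; not taken from librbd.
    """
    end = offset + length
    if alignment <= 0:
        # A zero alignment would make the modulo below raise
        # ZeroDivisionError, so fall back to the unaligned end offset.
        return end
    remainder = end % alignment
    if remainder == 0:
        return end
    return end + (alignment - remainder)
```

Without the `alignment <= 0` check, the call with a zero alignment would fail on the modulo, which is the kind of exception the bug title refers to.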
04:07 PM rbd Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
Tamilarasi muthamizhan wrote:
> I think this issue could be related to bug # 9288, upgrading clients when workload i...
Sage Weil
02:04 PM rbd Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
I think this issue could be related to bug # 9288, upgrading clients when workload is in progress.
Tamilarasi muthamizhan
02:02 PM rbd Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
more logs:
ubuntu@teuthology:/a/teuthology-2014-10-20_18:40:02-upgrade:firefly:older-firefly-distro-basic-vps/561562
Tamilarasi muthamizhan
11:11 AM rbd Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
logs: ubuntu@teuthology:/a/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561993 Tamilarasi muthamizhan
11:07 AM rbd Bug #9855 (Resolved): rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-...
On:
os_type: rhel
os_version: '6.4'
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-20_20:50:...
Yuri Weinstein
03:32 PM rgw Bug #9612: "ERROR: test suite for <module 's3tests.functional'" in multi-version-giant-testing-ba...
... Tamilarasi muthamizhan
03:24 PM rgw Bug #9612 (Rejected): "ERROR: test suite for <module 's3tests.functional'" in multi-version-giant...
that's giant rgw and dumpling osds, shouldn't work. Yehuda Sadeh
03:22 PM CephFS Feature #9557 (Fix Under Review): mds: verify backtrace on fetch_dir
Zheng Yan
10:44 AM CephFS Feature #9557 (In Progress): mds: verify backtrace on fetch_dir
Greg Farnum
03:19 PM Feature #7104: rest-api: support commands requiring 'w' cap without 'rw' cap
Please stop throwing this bug in the FS tracker just because it has the word MDS in it... Greg Farnum
02:58 PM rgw Bug #9616: upgrade test restarts rgw, test gets 500
fixed... Tamilarasi muthamizhan
02:33 PM Bug #9823 (Won't Fix): ceph-osd mkfs or ceph auth add : exit -9
It did not show up after the fix. The tests use a lot more than the default 1024 file descriptors allowed. Marking wo... Loïc Dachary
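As a rough illustration of the file descriptor limit mentioned in this entry, the sketch below reads the soft limit for open files with Python's standard `resource` module (Unix only) and reports how far a requested count exceeds it. The helper name and the `needed` parameter are assumptions made for the example and are not part of the Ceph test suite.

```python
import resource


def fd_limit_shortfall(needed: int) -> int:
    """Return how many descriptors `needed` exceeds the soft limit by (0 if none).

    Hypothetical helper for illustration only.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # No soft limit is configured, so nothing can be exceeded.
        return 0
    return max(0, needed - soft)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft={soft} hard={hard} shortfall for 4096: {fd_limit_shortfall(4096)}")
```

A non-zero shortfall suggests raising the limit (for example with `ulimit -n` in the shell) before running tests that open many files.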
02:25 PM devops Bug #9665: ceph-disk zap should call partprobe
Running http://pulpito.ceph.com/loic-2014-10-21_14:25:31-ceph-deploy:singleton-wip-9665-ceph-disk-partprobe-testing-b... Loïc Dachary
02:19 PM Bug #9731: Ceph 0.80.6 OSD crashes
Sorry, still out sick today. Hoping to be in tomorrow. Brad House
11:10 AM Bug #9731: Ceph 0.80.6 OSD crashes
Brad House wrote:
> Sorry, I only have access during the week to the test system, and I'm out sick today. Hopefully...
Sage Weil
02:17 PM devops Bug #9807 (Duplicate): Missing radosgw packages in various upgrade suites
This was basically the same issue that was thought to be fixed and centos still had issues in issue #9824 but should ... Sandon Van Ness
02:13 PM devops Bug #9747: ceph.spec.in will always use 95-ceph-osd-alt.rules
gitbuilder was clean but got trimmed Loïc Dachary
01:53 PM Bug #9288: "Assertion `nlock == 0' failed" in upgrade:firefly-firefly-testing-basic-vps suite
fixed it ... Tamilarasi muthamizhan
11:17 AM Bug #9288: "Assertion `nlock == 0' failed" in upgrade:firefly-firefly-testing-basic-vps suite
the job is upgrading client.0 (vpm072) in that test too.
i think
- install.upgrade:
all:
bran...
Sage Weil
01:47 PM Bug #9408: erasure-code: misalignment
Samuel Just
01:34 PM Bug #9485 (In Progress): Monitor crash due to wrong crush rule set
Did not forget about it, just busy with other things. Loïc Dachary
01:33 PM Bug #9684 (Can't reproduce): "Scrubbing terminated" in upgrade:firefly-firefly-distro-basic-multi...
no log or core Sage Weil
01:32 PM Bug #9434 (Can't reproduce): rbd rm hangs
Samuel Just
01:30 PM Bug #9702 (Duplicate): "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:fire...
probably dup of #9835 Sage Weil
01:29 PM Bug #9703 (Resolved): "Segmentation fault" in upgrade:firefly-x-giant-distro-basic-multi run
Samuel Just
01:27 PM Bug #9739 (Won't Fix): rados cli: listsnaps does not list snaps
because you haven't written to it yet! Samuel Just
01:19 PM Bug #9761: ceph-osd: segfault at 654c30 ip 00007f00dc5f1f07 sp 00007f00c5642e00 error 7 in ld-2.1...
Has this happened more than once? Samuel Just
01:18 PM Bug #9794 (Resolved): vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
checked firefly; no need to backport. Joao Eduardo Luis
01:15 PM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
Samuel Just
01:15 PM Bug #9794 (Resolved): vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
Samuel Just
01:11 PM Bug #9419 (Resolved): dumpling->firefly upgrade, sending setallochint?
Samuel Just
01:09 PM Bug #9649 (Can't reproduce): OSD hang in op_tp
Samuel Just
01:07 PM Bug #9559 (Resolved): ?off-by-one vulnerability?ceph-0.80.5/src/common/fd.cc dump_open_fds() func...
Samuel Just
11:43 AM CephFS Bug #8809 (Can't reproduce): uclient: memory leak
maybe fixed by 2313ce1d024361fd7f4d2cbca789010f0fe0faad Zheng Yan
11:34 AM Bug #9856 (Duplicate): osd crashed after upgrade from v0.80.5 to firefly
#9851 Sage Weil
11:26 AM Bug #9856: osd crashed after upgrade from v0.80.5 to firefly
more jobs: ubuntu@teuthology:/a/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561999
...
Tamilarasi muthamizhan
11:23 AM Bug #9856 (Duplicate): osd crashed after upgrade from v0.80.5 to firefly
osd crashed after upgrading from ceph v0.80.5 to firefly and during thrashing,
logs: ubuntu@teuthology:/a/teutholo...
Tamilarasi muthamizhan
11:10 AM Linux kernel client Bug #9507 (Resolved): calling llistxattr(2) on a symlink crashes the client
Zheng Yan
10:55 AM CephFS Bug #9674: nightly failed multiple_rsync.sh
commit:477073aba1da880dfd0b8c82f4792788579f28b9 in master and commit:44ce33c12443909b02c7ee451ad45400f55d53c9 in giant Greg Farnum
10:38 AM Bug #9845 (Resolved): hung ceph_test_rados_delete_pools_parallel
Sage Weil
12:59 AM Bug #9845 (Fix Under Review): hung ceph_test_rados_delete_pools_parallel
David Zafman
12:48 AM Bug #9845 (Resolved): hung ceph_test_rados_delete_pools_parallel
... David Zafman
09:57 AM rgw Bug #9575 (Duplicate): s3tests.functional.test_s3.test_region_copy_object fails (races with rados...
Sage Weil
09:43 AM rgw Bug #3896 (Resolved): rest-bench common/WorkQueue.cc: 54: FAILED assert(_threads.empty())
Sage Weil
09:42 AM rgw Bug #1673 (Won't Fix): rgw: mod_fastcgi needs to be backward compatible
Sage Weil
09:41 AM rgw Bug #8251: radosgw-agent does not sync objects uploaded to recreated buckets
closed and obsolete : https://github.com/ceph/ceph/pull/2765 Sage Weil
09:40 AM rgw Bug #8550 (Resolved): rgw: need to reduce calls to rgw_obj.set_obj()
Sage Weil
09:38 AM rgw Bug #9043 (Duplicate): rgw:Cannot add object to Ceph using Openstack Dashboard(Horizon) in firefly
Sage Weil
09:31 AM rgw Bug #9525 (Duplicate): Deleted object shows in object listing
Sage Weil
09:29 AM rgw Bug #9576 (Fix Under Review): rgw: update object content-length doesn't work correctly
Sage Weil
09:27 AM rgw Bug #9500 (Duplicate): 0.80.5 on CentOS 6.5: radosgw-admin fails to correctly name subuser object
Sage Weil
09:27 AM rgw Bug #9500: 0.80.5 on CentOS 6.5: radosgw-admin fails to correctly name subuser object
unlikely to be ubuntu vs centos. this looks like #8587 or related issues (pending backport to firefly) Sage Weil
09:25 AM rgw Bug #9469 (Rejected): RadosGW performance degrades with high concurrency workload.
please send an email about this to ceph-devel; that is a better forum to discuss performance issues. Sage Weil
09:23 AM rgw Bug #9543 (Rejected): AssertionError(s) in upgrade:dumpling-dumpling-distro-basic-vps run
Sage Weil
09:23 AM rgw Bug #9588 (Rejected): Keystone s3 auth integration lacking access_key = tenant:user ability suppo...
thanks Mark! Sage Weil
09:21 AM rgw Bug #9766 (Rejected): s3tests: test_100_continue failing
this is almost certainly a configuration error. need rgw print continue = true and patched mod_fastcgi Sage Weil
09:20 AM rgw Bug #9002 (Duplicate): Creating swift key with --gen-secret in separate step from subuser creatio...
#8587 Sage Weil
09:19 AM rgw Bug #8676 (Duplicate): md5sum check failed during readwrite.py
this appears to be resolved by #9307 Sage Weil
09:17 AM rgw Bug #9307 (Resolved): "s3.test_multipart_upload_multiple_sizes ... ERROR" in upgrade:dumpling-fir...
Sage Weil
09:17 AM rgw Bug #9307: "s3.test_multipart_upload_multiple_sizes ... ERROR" in upgrade:dumpling-firefly-x-mast...
Sage Weil
09:16 AM rbd Bug #9854 (Resolved): librbd: reads contending for cache space can cause livelock
As a result of accounting for reads properly with #9513. Using qemu-io (a test program) is one way to trigger this - ... Josh Durgin
09:13 AM rgw Bug #9039 (Resolved): Using COPY on radosgw to copy object from one bucket to another that's in a...
Yehuda Sadeh
09:07 AM Bug #9675 (In Progress): splitting a pool doesn't start when rule_id != ruleset_id
Joao Eduardo Luis
09:06 AM rbd Bug #9513 (Resolved): rbd_cache=true default setting is degading librbd performance ~10X in Giant
backported in commit:65be257e9295619b960b49f6aa80ecdf8ea4d16a Josh Durgin
09:04 AM Bug #9813 (Resolved): cryptopp dependency missing for deb-based systems
Sage Weil
08:45 AM Bug #9813: cryptopp dependency missing for deb-based systems
Already addressed by [1], cheers!
[1] https://github.com/ceph/ceph/pull/2761
Federico Gimenez Nieto
08:54 AM Bug #9853 (Duplicate): coredump in upgrade:firefly-x-giant-distro-basic-vps run
#9851 Sage Weil
08:21 AM Bug #9853 (Duplicate): coredump in upgrade:firefly-x-giant-distro-basic-vps run
On:
os_type: rhel
os_version: '6.5'
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-20_15...
Yuri Weinstein
08:52 AM rgw Bug #9825 (Duplicate): s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel...
this is #9169 Sage Weil
08:33 AM rgw Bug #9825: s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel-giant-distr...
In the run teuthology/teuthology-2014-10-20_15:01:14-upgrade:firefly-x-giant-distro-basic-vps , jobs ['560024', '5600... Yuri Weinstein
08:12 AM Bug #9073 (Pending Backport): OSD with device/partition journals down after fresh deploy or upgra...
Loïc Dachary
07:50 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
https://github.com/ceph/ceph/pull/2764 is a better fix. The isolated patch made sense to me at the time but it looks ... Loïc Dachary
08:09 AM Linux kernel client Bug #9561 (Rejected): libceph: do not crash if auth reply is not understood
I believe the code is correct as is and I misdiagnosed the original issue. :( Sage Weil
08:09 AM Linux kernel client Bug #9560 (Rejected): libceph: msg kmalloc failure handling on the reply path
I believe the code is correct as is and I misdiagnosed the original issue. Sage Weil
07:35 AM Bug #9852 (Resolved): mon: monitor asserts on 'ceph mds add_data_pool X' if X is an ID that DNE
... Joao Eduardo Luis
06:58 AM Bug #9851 (Fix Under Review): crash on journal/filestore shutdown on firefly
https://github.com/ceph/ceph/pull/2764 Sage Weil
06:42 AM Bug #9851 (Resolved): crash on journal/filestore shutdown on firefly
saw this on several runs, e.g. /var/lib/teuthworker/archive/teuthology-2014-10-18_19:22:02-upgrade:firefly-x-giant-di... Sage Weil
01:48 AM devops Bug #9840: Monitor hung when add new osd
Errors found using valgrind:
==17554== Thread 7:
==17554== Invalid read of size 4
==17554== at 0x3168A0C380: ...
qiu shanggao
12:50 AM Bug #5925 (Can't reproduce): hung ceph_test_rados_delete_pools_parallel

Filed #9845 to describe the recent occurrence. This bug was probably something else, so I'm setting it back to "Can...
David Zafman

10/20/2014

11:51 PM Bug #5925: hung ceph_test_rados_delete_pools_parallel
I don't think this would have happened when safe_callbacks was true. It was set to false in a fix for #9582. See al... David Zafman
07:02 PM Bug #5925: hung ceph_test_rados_delete_pools_parallel

Yup, there is a shutdown race. Thread 1 is waiting for the timer thread while holding the Objecter::rwlock in writ...
David Zafman
11:22 PM Fix #9834 (Rejected): osd_scrub_load_threshold should be checked during scrubbing
Loïc Dachary
10:14 AM Fix #9834: osd_scrub_load_threshold should be checked during scrubbing
I'm not sure this is something we should do. We attempt to schedule scrubs during periods of low disk usage, but if t... Greg Farnum
07:25 AM Fix #9834 (Rejected): osd_scrub_load_threshold should be checked during scrubbing
"osd_scrub_load_threshold":https://github.com/ceph/ceph/blob/firefly/src/common/config_opts.h#L515 is "considered":ht... Loïc Dachary
11:20 PM Bug #9844 (Won't Fix): "initiating reconnect" (log) race; crash of multiple OSDs (domino effect)
On 0.87 I watch "ceph osd tree" and notice that one OSD (leveldb/keyvaluestore-dev) is "down".
In its log I see
...
Dmitry Smirnov
10:45 PM Bug #9356 (Resolved): ceph_test_rados_striper_api_aio Segmentation faults
https://github.com/ceph/ceph/pull/2419 Loïc Dachary
09:38 PM Bug #9839 (Rejected): ErasureCodePluginSelectJerasure: generic plugin : abort
... Loïc Dachary
03:23 PM Bug #9839 (Need More Info): ErasureCodePluginSelectJerasure: generic plugin : abort
Loïc Dachary
03:23 PM Bug #9839: ErasureCodePluginSelectJerasure: generic plugin : abort
When trying to run manually *ceph-osd -i 0* it hangs at the same point. Loïc Dachary
03:02 PM Bug #9839 (Rejected): ErasureCodePluginSelectJerasure: generic plugin : abort
It fails when pre-loading the plugin in a context where erasure-code is not used.
http://pulpito.ceph.com/teutholo...
Loïc Dachary
06:27 PM devops Bug #9840: Monitor hung when add new osd
Tried again; the monitor still hangs.
Thread 25 (Thread 0x7f93e5ec0700 (LWP 13652)):
#0 0x0000003168a0b5bc in pthread_co...
qiu shanggao
06:17 PM devops Bug #9840 (Rejected): Monitor hung when add new osd
ceph version: 0.80.6
Platform: Redhat RHLS 6.5
We want to test the disk replacement case.
Operator steps:
1. ceph o...
qiu shanggao
06:02 PM Bug #9419: dumpling->firefly upgrade, sending setallochint?
This is done and a new case was added - PR https://github.com/ceph/ceph-qa-suite/pull/198 Yuri Weinstein
06:01 PM Feature #9568: Add test case to test #9419 (ceph wip-9419)
This is done and a new case was added - PR https://github.com/ceph/ceph-qa-suite/pull/198 Yuri Weinstein
02:14 PM Feature #9568: Add test case to test #9419 (ceph wip-9419)
this seems to require clients upgraded first running workloads against upgraded monitors and mixed versions of osds, ... Tamilarasi muthamizhan
04:14 PM rbd Feature #9733: Separate rbd listing into CAP
OK, so one more question. This looks like it allows access to any pool. Is there a way to limit this to a particular ... Robert LeBlanc
03:04 PM Bug #9389 (Duplicate): ec pg stuck peering, did not send query for one shard
Samuel Just
03:04 PM Bug #9822 (Resolved): failed to become clean before timeout expired
Samuel Just
02:29 PM Bug #9822: failed to become clean before timeout expired
Samuel Just
02:18 PM Bug #9821: failed to recover before timeout expired
in wip-sam-testing Samuel Just
02:09 PM Bug #9821: failed to recover before timeout expired
working on patch Samuel Just
01:55 PM Bug #9835 (Fix Under Review): osd: bug in misdirected op checks (firefly)
https://github.com/ceph/ceph/pull/2760 Sage Weil
10:12 AM Bug #9835: osd: bug in misdirected op checks (firefly)
Maybe we need to adjust how we're handling waiting_for_pg, but I don't think that this particular check is a bug — th... Greg Farnum
09:33 AM Bug #9835 (Resolved): osd: bug in misdirected op checks (firefly)
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-10-18_19:22:02-upgrade:firefly-x-giant-distro-basic-mu... Sage Weil
01:32 PM Bug #9806: Objecter: resend linger ops on split
Josh Durgin
01:23 PM CephFS Feature #414 (Resolved): ceph-fuse: implement file locking
Zheng Yan
01:22 PM CephFS Bug #8576: teuthology: nfs tests failing on umount
teuthology commit:4f2957c42d0f76a399cb26c660ede9243c095779 runs those commands as well as the previous ones. Greg Farnum
01:02 PM CephFS Bug #9679 (Closed): Ceph hadoop terasort job failure
Fixed in cephfs-hadoop repo. Noah Watkins
11:31 AM Bug #9288: "Assertion `nlock == 0' failed" in upgrade:firefly-firefly-testing-basic-vps suite
logs: ubuntu@teuthology:/a/teuthology-2014-10-17_23:30:01-upgrade:firefly:newer-firefly-distro-basic-vps/555356
...
Tamilarasi muthamizhan
11:20 AM Bug #9288 (New): "Assertion `nlock == 0' failed" in upgrade:firefly-firefly-testing-basic-vps suite
this looks different from bug #9040. Tamilarasi muthamizhan
11:29 AM Bug #9837: rbd crash when upgrading from v0.80.5 to firefly
... Tamilarasi muthamizhan
11:27 AM Bug #9837 (Duplicate): rbd crash when upgrading from v0.80.5 to firefly
logs: ubuntu@teuthology:/a/teuthology-2014-10-17_23:30:01-upgrade:firefly:newer-firefly-distro-basic-vps/555359
...
Tamilarasi muthamizhan
11:22 AM Bug #9836 (Fix Under Review): mon unit tests use the wrong id
https://github.com/ceph/ceph/pull/2759 Loïc Dachary
11:19 AM Bug #9836: mon unit tests use the wrong id
It impacts
* "osd-erasure-code-profile.sh":https://github.com/ceph/ceph/blob/giant/src/test/mon/osd-erasure-code-...
Loïc Dachary
11:13 AM Bug #9836 (Resolved): mon unit tests use the wrong id
the mon id is incorrect for mon tests using "the call_TEST_functions":https://github.com/ceph/ceph/blob/firefly/src/t... Loïc Dachary
11:15 AM CephFS Bug #9800: client-limits test is not passing

Same failure:
http://pulpito.front.sepia.ceph.com/teuthology-2014-10-17_23:04:02-fs-giant-distro-basic-multi/555...
John Spray
11:07 AM Linux kernel client Bug #9458 (Resolved): client wrongly fenced
Zheng Yan
11:06 AM Linux kernel client Bug #1513 (Resolved): kclient: cap migration can race with cap addition on client
now cap import/export are ordered.
(commit 186e4f7a4b1883f3f46aa15366c0bcebc28fdda7, 4ee6a914edbbd2543884f0ad7d58ea4...
Zheng Yan
10:46 AM Bug #9820 (Resolved): mon connection hang on cephtool/test.sh
Sage Weil
10:38 AM Bug #9372: injectarg boolean option is discarded
"fails on precise":http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-precise-i386-basic/log.cgi?log=14ed21f9ad... Loïc Dachary
10:06 AM Bug #9826 (Rejected): ceph osd crush rule ls should use the pending crush, if any
Loïc Dachary
08:59 AM Bug #9826: ceph osd crush rule ls should use the pending crush, if any
... Loïc Dachary
09:18 AM rgw Bug #9825: s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel-giant-distr...
I am wondering if it's related to changes in s3tests? Yuri Weinstein
08:41 AM Bug #9819 (Won't Fix): EBUSY during scrub
this is expected and harmless. we just report the failure and move on. it happens when paxos is busy when we reques... Sage Weil
08:38 AM Bug #9731: Ceph 0.80.6 OSD crashes
Sorry, I only have access during the week to the test system, and I'm out sick today. Hopefully I'll be able to cont... Brad House
04:02 AM rgw Feature #8562: rgw: Conditional PUT on ETag
Closed the previous out-of-sync PR and submitted a new one: https://github.com/ceph/ceph/pull/2756 Xiangyu Lv
01:38 AM rgw Feature #8562: rgw: Conditional PUT on ETag
Here is a PR for discussion purposes: https://github.com/ceph/ceph/pull/2755
We may need to elaborate a bit on it aft...
Xiangyu Lv
03:46 AM Bug #9816: mon exits unexpectedly and gracefully
just a hunch: feels like you're capturing only stdout from the monitor, and the monitor may have hit the 'mon data av... Joao Eduardo Luis
01:42 AM Linux kernel client Bug #9749: kcephfs: kernel divide-by-zero crash in __validate_layout (fs/ceph/ioctl.c)
I guess we are just not used to doing it - I think we haven't filed any CVEs for ceph kernel bits (and kcephfs in par... Ilya Dryomov

10/19/2014

08:29 PM Bug #9731: Ceph 0.80.6 OSD crashes
Any update? Sage Weil
07:20 PM CephFS Bug #9341 (Pending Backport): MDS: very slow rejoin
Hmm, we didn't put this in Giant initially because we were trying not to perturb it. Master hasn't been run through t... Greg Farnum
06:45 PM CephFS Bug #9341 (Fix Under Review): MDS: very slow rejoin
Please include this fix in 0.87, which is affected just as badly as 0.80.x.
On 0.87 MDS stuck in "rejoin" for hours a...
Dmitry Smirnov
07:13 PM Bug #9826 (Fix Under Review): ceph osd crush rule ls should use the pending crush, if any
Loïc Dachary
07:13 PM Bug #9826: ceph osd crush rule ls should use the pending crush, if any
https://github.com/ceph/ceph/pull/2754 Loïc Dachary
07:07 PM Bug #9826 (Rejected): ceph osd crush rule ls should use the pending crush, if any
The following is racy:... Loïc Dachary
05:03 PM Bug #9823: ceph-osd mkfs or ceph auth add : exit -9
Maybe it runs out of file descriptors because of the parallel runs. Since the erasure code test is the one using the more d... Loïc Dachary
03:12 PM Bug #9823: ceph-osd mkfs or ceph auth add : exit -9
The error matching the mon log is different: auth add exits with -9 instead of mkfs.... Loïc Dachary
03:02 PM Bug #9823: ceph-osd mkfs or ceph auth add : exit -9
it was reproduced with a change to the script to keep the logs. Loïc Dachary
12:53 PM Bug #9823 (Won't Fix): ceph-osd mkfs or ceph auth add : exit -9
While running src/test/erasure-code/test-erasure-code.sh in a loop, the following happened. The -9 exit code suggests... Loïc Dachary
04:27 PM Bug #9796: osd: crash on blacklisted watcher reconnect (dumpling)
Observed similar crash in suite:upgrade:dumpling
Run http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-18_17:00...
Yuri Weinstein
04:09 PM rgw Bug #9825: s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel-giant-distr...
Same problems:
suite:upgrade:dumpling-x
Run http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-17_19:13:01-up...
Yuri Weinstein
03:19 PM rgw Bug #9825: s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel-giant-distr...
Log for rhel 6.5 job - http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-18_17:15:02-upgrade:dumpling-firefly-x:... Yuri Weinstein
03:18 PM rgw Bug #9825 (Duplicate): s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel...
Looks similar to #9763
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-18_17:15:02-upgrade:dump...
Yuri Weinstein
02:26 PM Linux kernel client Bug #9749: kcephfs: kernel divide-by-zero crash in __validate_layout (fs/ceph/ioctl.c)
This bug appears to be exploitable by unprivileged local users and will cause a machine-wide DoS. Is there some reaso... David Ramos
10:17 AM Bug #9822 (Resolved): failed to become clean before timeout expired
logs: ubuntu@teuthology:/a/teuthology-2014-10-17_02:32:01-rados-giant-distro-basic-multi/553345... Tamilarasi muthamizhan
10:15 AM Bug #9821: failed to recover before timeout expired
ubuntu@teuthology:/a/teuthology-2014-10-17_02:32:01-rados-giant-distro-basic-multi/553255 Tamilarasi muthamizhan
10:11 AM Bug #9821 (Resolved): failed to recover before timeout expired
logs: ubuntu@teuthology:/a/teuthology-2014-10-17_02:32:01-rados-giant-distro-basic-multi/553125... Tamilarasi muthamizhan
09:44 AM Feature #9817 (Fix Under Review): display X.XX deep-scrub starts
https://github.com/ceph/ceph/pull/2752 Loïc Dachary
08:15 AM Feature #9817 (Resolved): display X.XX deep-scrub starts
It would be convenient to have a message in the logs when deep-scrub starts... Loïc Dachary
09:40 AM Bug #9820 (Resolved): mon connection hang on cephtool/test.sh
log: ubuntu@teuthology:/a/teuthology-2014-10-17_02:32:01-rados-giant-distro-basic-multi/553035... Tamilarasi muthamizhan
09:28 AM Bug #9819 (Won't Fix): EBUSY during scrub
logs: ubuntu@teuthology:/a/teuthology-2014-10-17_02:32:01-rados-giant-distro-basic-multi/552986... Tamilarasi muthamizhan
08:40 AM Bug #9818 (Resolved): ENXIO qa/workunits/cephtool/test.sh:test_osd_bench
It looks like the OSD crashed but there is no more information than the following log at the moment. It was created w... Loïc Dachary
08:12 AM Bug #9816 (Can't reproduce): mon exits unexpectedly and gracefully
... Loïc Dachary

10/18/2014

09:25 PM Bug #9814 (Fix Under Review): FAILED assert(0) In function 'GenericObjectMap::Header GenericObjec...
https://github.com/ceph/ceph/pull/2710 Haomai Wang
05:10 PM Bug #9814 (Resolved): FAILED assert(0) In function 'GenericObjectMap::Header GenericObjectMap::lo...
LevelDB-based OSD (i.e. "keyvaluestore-dev") crashed as follows on 0.87 during backfill:... Dmitry Smirnov
07:54 PM Feature #9815 (Fix Under Review): run make check in parallel
https://github.com/ceph/ceph/pull/2750 Loïc Dachary
05:46 PM Feature #9815 (Resolved): run make check in parallel
Individual tests run by make check may bind fixed ports or use identical files or subdirectories to store temporary d... Loïc Dachary
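The collision risk described above (fixed ports, shared files or subdirectories) can be illustrated with a short sketch. This is not the change made in the linked pull request; it only shows one common way to give each test its own port and scratch directory. The function names are hypothetical, and the port helper has a known caveat: the port is released before the caller binds it, so another process could grab it in between.

```python
import shutil
import socket
import tempfile


def pick_free_port():
    """Ask the kernel for an unused TCP port on loopback by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def make_private_dir(prefix="test-"):
    """Create a unique scratch directory so parallel tests do not share files."""
    return tempfile.mkdtemp(prefix=prefix)


def cleanup(path):
    """Remove a scratch directory created by make_private_dir."""
    shutil.rmtree(path, ignore_errors=True)
```

A test would call `pick_free_port()` and `make_private_dir()` once at setup and pass both values to the daemon it starts, then call `cleanup()` at teardown.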
05:06 PM Bug #9744: cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
Sage Weil wrote:
> this happens when clocks are very skewed.
Are we OK with such a vulnerability that allows to brin...
Dmitry Smirnov
03:27 AM Bug #9813 (Resolved): cryptopp dependency missing for deb-based systems
Hi, when following [1] from a trusty64 box I've noticed that the libcrypto++-dev entry is missing from deps.deb.txt. ... Federico Gimenez Nieto

10/17/2014

08:30 PM Bug #9810 (Duplicate): dout_emergency is silenced in ceph-osd
"ceph-osd closes stderr":https://github.com/ceph/ceph/blob/giant/src/ceph_osd.cc#L499 and this may be the reason why ... Loïc Dachary
05:03 PM Bug #9809 (Rejected): common/perf_counters.cc: 105: FAILED assert(idx < m_upper_bound)
I changed the code and introduced the problem and then forgot I changed the code. Reverting the change fixes the prob... Loïc Dachary
04:50 PM Bug #9809 (Rejected): common/perf_counters.cc: 105: FAILED assert(idx < m_upper_bound)
Steps to reproduce
* modify vstart.sh with ...
Loïc Dachary
04:27 PM Bug #9808 (Rejected): PG stuck in active+undersized+degraded+remapped+backfill_toofull
The disk was 90% full ... hence the block. Loïc Dachary
04:21 PM Bug #9808: PG stuck in active+undersized+degraded+remapped+backfill_toofull
The "scheduled":https://github.com/ceph/ceph/blob/giant/src/osd/PG.cc#L5674 "RequestBackfill":https://github.com/cep... Loïc Dachary
04:13 PM Bug #9808 (Rejected): PG stuck in active+undersized+degraded+remapped+backfill_toofull
Steps to reproduce
* modify vstart.sh with ...
Loïc Dachary
04:21 PM Bug #9731: Ceph 0.80.6 OSD crashes
Just to check, there isn't anything interesting in dmesg, right? Samuel Just
03:07 PM Bug #9731: Ceph 0.80.6 OSD crashes
Oh, and the
--00:00:06:05.108 2312-- WARNING: unhandled syscall: 306
--00:00:06:05.108 2312-- You may be able to ...
Samuel Just
02:45 PM Bug #9731: Ceph 0.80.6 OSD crashes
Looks like in our testing we invoke valgrind as:
valgrind --suppressions=<suppression_file> --num-callers=50 --xml...
Samuel Just
02:14 PM Bug #9731: Ceph 0.80.6 OSD crashes
wheezy gitbuilder should be working. Samuel Just
02:13 PM Bug #9731: Ceph 0.80.6 OSD crashes
yeah, -f I think. Samuel Just
01:58 PM Bug #9731: Ceph 0.80.6 OSD crashes
valgrind appears to detach from the console when running with ceph-osd, is there some other flag I need to pass to ce... Brad House
01:55 PM Bug #9731: Ceph 0.80.6 OSD crashes
Sage also pushed a wip-9731 based on 0.80.7 with a piece of debugging which would be handy. Reproducing with that wo... Samuel Just
09:23 AM Bug #9731: Ceph 0.80.6 OSD crashes
Brad House wrote:
> sure, just tell me the best command line to use as I haven't ever tried to run ceph-osd outside o...
Sage Weil
03:22 PM Bug #9788 (Rejected): "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeou...
Two osds, both on mira076 timed out:
osd5: a stat in the op_tp took 3 minutes (completed, surprisingly, right before...
Samuel Just
03:03 PM devops Bug #9807: Missing radosgw packages in various upgrade suites
looks like we are hitting a lot of failures in upgrade tests because of this issue. Tamilarasi muthamizhan
03:01 PM devops Bug #9807 (Duplicate): Missing radosgw packages in various upgrade suites
In teuthology-2014-10-16_19:00:01-upgrade:dumpling-x-firefly-distro-basic-vps... Yuri Weinstein
02:57 PM Bug #9220 (Resolved): objecter doesn't reconnect watch on interval change w/ same primary
This did not need backporting to dumpling after all, since it was broken after dumpling by commit:860d72770cdf092c027... Josh Durgin
11:13 AM Bug #9220 (Pending Backport): objecter doesn't reconnect watch on interval change w/ same primary
Josh Durgin
11:20 AM Bug #9806 (Resolved): Objecter: resend linger ops on split
Otherwise, we can lose notifies.
commit:cb9262abd7fd5f0a9f583bd34e4c425a049e56ce
Samuel Just
10:50 AM Bug #9419: dumpling->firefly upgrade, sending setallochint?
Samuel Just
10:49 AM Bug #9419: dumpling->firefly upgrade, sending setallochint?
next step is to add a test for this to the upgrade suites. Samuel Just
10:43 AM Bug #9073 (Resolved): OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Samuel Just
10:39 AM Bug #9614 (Pending Backport): PG stuck with remapped
Samuel Just
10:38 AM Bug #9718 (Pending Backport): osd_types: check_new_interval: min_size check needs to consider CRU...
Samuel Just
10:32 AM Bug #7995: osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
ubuntu@teuthology:/a/samuelj-2014-10-15_20:19:09-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/551397/r... Samuel Just
09:33 AM Documentation #9804 (Resolved): kvm and qemu do not document ceph/rbd support
* looking for ceph or rbd in http://www.linux-kvm.org/page/Special:Search?search=ceph&go=Go : zero match
* on qemu.o...
Loïc Dachary
09:10 AM Bug #6756 (Fix Under Review): journal full hang on startup
https://github.com/ceph/ceph/pull/2745
(rebased and retested old patch)
Sage Weil
07:48 AM Bug #9729 (Resolved): "LibRadosMisc.Operate1PP" test failed in upgrade:dumpling-firefly-x:paralle...
Yuri Weinstein
07:47 AM rbd Bug #9642 (Resolved): Errors in test_rbd.test_* tests in upgrade:dumpling-firefly-x:parallel-gian...
Yuri Weinstein
07:46 AM rbd Bug #9642: Errors in test_rbd.test_* tests in upgrade:dumpling-firefly-x:parallel-giant-distro-ba...
Fixed, tests passed on bare metal.
Last results - http://pulpito.front.sepia.ceph.com/teuthology-2014-10-16_17:10:01...
Yuri Weinstein
05:16 AM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
I confirm that
* the problem can be reproduced 100% of the time on my laptop,
* that cherry-pick c84a13ae87eed555...
Loïc Dachary
04:17 AM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
Loic, try this patch with the same conditions in which you triggered it: c84a13ae87eed5550bafda394d983a8e843cc08c
...
Joao Eduardo Luis
01:52 AM Feature #9802 (New): When replaced a disk, the CRUSH weight of the related host changed
In a disk replacement test, after adding a disk to the cluster, the osd tree looks like
this:...
Jingjing Zhao

10/16/2014

10:27 PM Bug #9801 (Won't Fix): ceph 0.80.7 build rpm packages in centos 7 error
ceph 0.80.7 build rpm packages in centos 7 error... wei li
06:30 PM Bug #8629: cache_evict needs to prevent make_writeable from creating a snapdir
https://github.com/ceph/ceph/pull/2737 Sage Weil
05:24 PM Fix #9566 (In Progress): osd: prioritize recovery of OSDs with most work to do
Loïc Dachary
05:11 PM Fix #9566: osd: prioritize recovery of OSDs with most work to do
Related commits:
* "osd: prioritize backfill based on *how* degraded":https://github.com/ceph/ceph/commit/0985ae71bc...
Loïc Dachary
05:04 PM Bug #9769 (Resolved): upgrade/firefly: latest_dumpling_release.yaml always fails
Sage Weil
10:56 AM Bug #9769: upgrade/firefly: latest_dumpling_release.yaml always fails
It is fixed; testing now. Here is the run that passed:... Yuri Weinstein
04:59 PM Bug #9765 (Duplicate): CachePool flush -> OSD Failed
I'm pretty sure this is because #8629 has not yet been backported to firefly. It should be in 0.80.8. I'll prepare ... Sage Weil
05:48 AM Bug #9765: CachePool flush -> OSD Failed
The 'forward' mode means we will modify cached objects in place but forward any 'misses'. It is also possible that t... Sage Weil
04:58 PM Bug #9731: Ceph 0.80.6 OSD crashes
sure, just tell me the best command line to use as I haven't ever tried to run ceph-osd outside of the standard init s... Brad House
04:52 PM Bug #9731: Ceph 0.80.6 OSD crashes
Would it be possible to run the osds in question under valgrind? Samuel Just
01:49 PM Bug #9731: Ceph 0.80.6 OSD crashes
core file for last crash as requested by Samuel Just Brad House
01:38 PM Bug #9731: Ceph 0.80.6 OSD crashes
Samuel Just
12:48 PM Bug #9731: Ceph 0.80.6 OSD crashes
Another crash from another node, this time with debug increased. Will attach log, here is the backtrace from gdb:
...
Brad House
10:42 AM Bug #9731: Ceph 0.80.6 OSD crashes
Another backtrace from a different machine, definitely different:... Brad House
10:33 AM Bug #9731: Ceph 0.80.6 OSD crashes
backtrace from last core file:... Brad House
10:02 AM Bug #9731: Ceph 0.80.6 OSD crashes
Can you reproduce with
debug osd = 20
debug ms = 1
debug filestore = 20
?
Samuel Just
06:47 AM Bug #9731: Ceph 0.80.6 OSD crashes
0.80.7 segfault core file and log. Happened immediately at startup after rebooting after update. Brad House
06:42 AM Bug #9731: Ceph 0.80.6 OSD crashes
I just upgraded to 0.80.7, and got a crash on startup of one of my OSDs. I'll grab the log and core dump and attach ... Brad House
04:04 PM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
Reverting to 128 PG on master makes the problem disappear. 92 PG also works. 64 PG fails. Loïc Dachary
03:43 PM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
... Loïc Dachary
01:54 PM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
It works on v0.85, bisecting Loïc Dachary
01:46 PM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
reproduced on a fresh ubuntu 14.04 with v0.86-408-gad2514d Loïc Dachary
02:59 PM Feature #9799: ceph tell {daemon}.{id} config set etc.
Two things to consider:
The authentication model is pretty different for a network connection to the daemon vs. a ...
Dan Mick
01:19 PM Feature #9799 (Resolved): ceph tell {daemon}.{id} config set etc.
It would be nice to be able to send asok commands to a daemon using ceph tell instead of logging in to the machine and usi... Loïc Dachary
02:19 PM Bug #9729 (Fix Under Review): "LibRadosMisc.Operate1PP" test failed in upgrade:dumpling-firefly-x...
Yuri Weinstein
02:19 PM Bug #9729: "LibRadosMisc.Operate1PP" test failed in upgrade:dumpling-firefly-x:parallel-giant-dis...
Backport to master - https://github.com/ceph/ceph-qa-suite/pull/195 Yuri Weinstein
10:59 AM Bug #9729: "LibRadosMisc.Operate1PP" test failed in upgrade:dumpling-firefly-x:parallel-giant-dis...
Passed on nightlies:
http://pulpito.front.sepia.ceph.com/teuthology-2014-10-15_17:10:01-upgrade:dumpling-firefly-x...
Yuri Weinstein
01:54 PM CephFS Bug #9800 (Resolved): client-limits test is not passing
/a/teuthology-2014-10-13_23:04:01-fs-giant-distro-basic-multi/547170
The client isn't dropping its caps:...
Greg Farnum
01:15 PM rbd Bug #9595 (Resolved): librbd: internal methods can operate on extra objects when non-default stri...
commit:7b66ee4928d934d684b361602de783b927988503 Josh Durgin
10:50 AM CephFS Feature #4137: MDS: Implement a forward-scrubbing mechanism.
I realized today that we probably want to optionally scrub directories that were renamed into place following a scrub... Greg Farnum
09:11 AM Bug #9675: splitting a pool doesn't start when rule_id != ruleset_id
See also the ceph-user thread "NO pg created for erasure-coded pool" where rule_id != ruleset on firefly. Loïc Dachary
05:57 AM Bug #9796 (Won't Fix): osd: crash on blacklisted watcher reconnect (dumpling)
... Sage Weil
 
