Activity
From 10/16/2014 to 11/14/2014
11/14/2014
- 10:46 PM Bug #10115: mon not running. osd is dead
- My ceph version is 0.80.1. I installed it on Ubuntu 12.04.4.
uname -a : Linux controller 3.11.0-26-generic #45~preci...
- 10:22 PM Bug #10115: mon not running. osd is dead
- This is the log file from one of my ceph nodes.
- 10:10 PM Bug #10115 (Can't reproduce): mon not running. osd is dead
- My ceph is not configured with cephx. I solved one problem before as described in this issue: http://tracker.ceph.com/issues/8851.
...
- 06:05 PM Bug #10114 (Fix Under Review): assembly files need annotation to assert that stack should not be ...
- seeming workaround in wip-execstack
- 05:58 PM Bug #10114: assembly files need annotation to assert that stack should not be executable
References:
https://bugzilla.redhat.com/show_bug.cgi?id=1118504 the original bug that noticed the problem on Fe...
- 05:30 PM Bug #10114 (Resolved): assembly files need annotation to assert that stack should not be executable
- 05:10 PM Bug #10113: --log-to-stderr with -f/-d sends a lot of things to logfile
- on a vstart cluster with 3 osds, if I stop osd.2 and restart like:
./ceph-osd -i 2 -c ./ceph.conf --log-to-stderr ...
- 05:10 PM Bug #10113 (Duplicate): --log-to-stderr with -f/-d sends a lot of things to logfile
- 03:45 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
- 03:12 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
- This is almost certainly unrelated to those two bugs. This is a specific edge case in divergent write recovery.
- 11:43 AM devops Cleanup #7722 (Resolved): Make /admin/build-doc distro independent
- 11:41 AM devops Cleanup #7722: Make /admin/build-doc distro independent
- Updated the procedure doc with all dependencies.
- 11:43 AM Bug #9788 (New): "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" is...
- Logs are in http://pulpito.front.sepia.ceph.com/teuthology-2014-11-13_17:33:44-upgrade:giant-x-next-distro-basic-vps/...
- 10:22 AM Cleanup #10110 (New): librados: mark old objects_begin interface deprecated
- There is some minor refactoring needed since the new methods call the old ones when ns == "". The fix is probably to...
- 10:18 AM devops Tasks #8366: Update ceph.com/docs to default to the latest major release (0.80)
- Can we update it to the latest major release with the backports--e.g., v0.80.7? I finally have someone to help with t...
- 10:12 AM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
- I think that's an annoying special case for snaps purged on an empty pg. Both the old primary which did the trim and...
- 08:09 AM Bug #10107: Coredump in upgrade:giant-x-next-distro-basic-multi run
- ...
- 07:40 AM Bug #10107 (Duplicate): Coredump in upgrade:giant-x-next-distro-basic-multi run
- (Maybe related to #8733)
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-13_17:04:11-upgrade:gi...
- 08:03 AM Bug #10109 (Duplicate): "LibRadosTwoPoolsECPP.PromoteSnap" test failed in upgrade:dumpling-firefl...
- 3 tests failed in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-13_17:15:02-upgrade:dumpling-firefly-x:p...
- 07:55 AM rgw Bug #10108 (Duplicate): s3tests fail in upgrade:dumpling-firefly-x:parallel-next-distro-basic-mul...
- All tests failed in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-13_17:10:02-upgrade:dumpling-firefly-x...
- 07:47 AM Bug #10105: crash in PG::peek_map_epoch on upgrade from 0.80.4 to 0.80.7
- the upgrade from 0.80.1 to 0.80.7 case was a bad disk.
- 07:32 AM Bug #9727: 0.86 EC+ KV OSDs crashing
- Hi,
I tried this again on the new 0.88 release.
After about 30 minutes of testing, the EC-KV OSDs started crashin...
- 04:51 AM Messengers Feature #10029: Retry binding on IPv6 address if not available
- I started playing with this a bit (no commits yet). I simply loop in SimpleMessenger's Accepter.cc and retry the bind ...
- 03:26 AM Feature #9979 (In Progress): osd: cache: proxy reads (instead of redirect)
- https://github.com/ceph/ceph/pull/2927
- 02:17 AM rgw Bug #10106 (Resolved): rgw acl response should start with <?xml version="1.0" ?>
- I encountered some surprising behaviour when playing with radosgw and s3cmd.
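The title's complaint is that the ACL response body lacks the XML declaration clients expect. A minimal sketch of the kind of check an S3 client performs; the response body below is an illustrative stand-in, not real radosgw output:

```shell
# Check whether a (stand-in) ACL response body begins with an XML
# declaration, as clients like s3cmd expect. The body is illustrative.
body='<AccessControlPolicy>...</AccessControlPolicy>'
case "$body" in
    '<?xml'*) echo "has XML declaration" ;;
    *)        echo "missing XML declaration" ;;
esac
```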
You can probably make a convincing case...
- 02:10 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
11/13/2014
- 10:32 PM Bug #10052 (Fix Under Review): LibRadosTwoPools[EC]PP.PromoteSnap failure
- https://github.com/ceph/ceph/pull/2926
- 10:19 PM Bug #10052 (In Progress): LibRadosTwoPools[EC]PP.PromoteSnap failure
- // read baz
  {
    bufferlist bl;
    ASSERT_EQ(-ENOENT, ioctx.read("baz", bl, 1, 0));
  }
I think this usu...
- 05:44 PM Bug #10052: LibRadosTwoPools[EC]PP.PromoteSnap failure
- ubuntu@teuthology:/a/sage-2014-11-12_13:30:37-smoke-wip-warn-max-pg-distro-basic-multi/598501
- 08:49 PM Bug #10105 (Can't reproduce): crash in PG::peek_map_epoch on upgrade from 0.80.4 to 0.80.7
- ...
- 05:48 PM Bug #10104 (Resolved): rados.py: wait_for_* don't wait; should have poll, wait, and wait+cb versions
- Completion.wait_for_{safe, complete} are using the poll functions "is_{safe,complete}"; the comments indicate that's ...
- 05:47 PM rgw Bug #10103 (Resolved): swift tests failing
- ubuntu@teuthology:/a/dzafman-2014-11-13_10:42:58-rgw-wip-10082-testing-basic-multi$ teuthology-ls . | grep FAIL
5996...
- 05:02 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
- Any progress?
- 04:36 PM rgw Bug #10082 (Resolved): Segmentation fault in upgrade:dumpling-firefly-x:parallel-next-distro-basi...
- 04:28 PM Feature #10064 (Fix Under Review): add ceph_objectstore_tool tests to make check
- https://github.com/ceph/ceph/pull/2915
- 04:28 PM Bug #10063 (Fix Under Review): ceph_objectstore_tool does not support getting attributes for eras...
- https://github.com/ceph/ceph/pull/2915
- 03:48 PM rgw Bug #10102 (Resolved): sync agent: does not handle gracefully transient errors
- on a copy operation, rgw sent back 400 and the sync agent got stuck in the following loop:...
- 12:58 PM rgw Bug #9587 (Pending Backport): ceph-radosgw sysvinit script on EL6 cannot set ulimit
- 12:25 PM rgw Bug #10099 (Duplicate): radosgw-agent - error geting op state: list index out of range
- radosgw-agent logs the following, and objects are not synced to the secondary gateway.
INFO:urllib3.connectionpool...
- 12:25 PM Bug #10096: ceph-disk prepare fails to unmount temp file successfully
- Notes:
- Issuing a short delay before 'umount' fixes the issue - this is a terrible workaround
- Issuing 'sync' b...
- 07:52 AM Bug #10096 (Resolved): ceph-disk prepare fails to unmount temp file successfully
- I have been testing on a virtual machine for ease of testing, and 'ceph-disk prepare' kept forwarding an error from '...
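The workaround noted above (sync plus a short delay before unmounting) could be sketched as a small retry helper; the function name, retry count, and delay are illustrative and not part of ceph-disk:

```shell
# Sketch of the workaround above: flush buffers, then retry the
# unmount a few times with a short delay. Names and counts are
# illustrative, not actual ceph-disk behavior.
retry_umount() {
    mnt="$1"
    tries="${2:-3}"
    sync                      # flush pending writes first
    i=0
    while [ "$i" -lt "$tries" ]; do
        umount "$mnt" 2>/dev/null && return 0
        sleep 1               # brief delay before retrying
        i=$((i + 1))
    done
    return 1
}
```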
- 11:07 AM Bug #10095 (Resolved): (crush_bucket_adjust_item_weight()+0) [0x7d1540] crash
- 11:02 AM Bug #10095 (Fix Under Review): (crush_bucket_adjust_item_weight()+0) [0x7d1540] crash
- https://github.com/ceph/ceph/pull/2920
- 07:37 AM Bug #10095 (Resolved): (crush_bucket_adjust_item_weight()+0) [0x7d1540] crash
- ubuntu@teuthology:/a/samuelj-2014-11-11_22:08:30-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/597458
...
- 10:36 AM Bug #9835 (Resolved): osd: bug in misdirected op checks (firefly)
- 10:25 AM Messengers Feature #10079 (Resolved): AsyncMessenger: Support select for other OS
- 09:49 AM Feature #10098 (Resolved): wanted: command to clear 'incomplete' PGs
- Hello,
Please create a command that would clear 'incomplete' PGs.
Perhaps ceph pg force_create_pg could be extend...
- 08:32 AM rbd Bug #9854 (Pending Backport): librbd: reads contending for cache space can cause livelock
- 08:28 AM Bug #10097 (Resolved): failed: mon_thrash
- debian 7.0
logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-12_17:15:01-upgrade:giant-giant-dist...
- 07:17 AM Support #10024: Cluster unreachable after restart
- Hi,
Have I missed anything?
Did I do something wrong?
I ask because I didn't get any answer after more than 1 week.
Thank...
- 06:59 AM Cleanup #10094 (New): Create new git repo for json_spirit
- json_spirit is currently part of the ceph code tree, but it's external code. There has also been no update in a long...
- 06:58 AM CephFS Bug #10092 (Resolved): multiple_rsync.sh + ceph-fuse timing out on firefly
- Greg is right, these time out semi-regularly. Increased the timeout on master, giant, firefly.
- 06:38 AM Bug #10093 (Fix Under Review): ceph-monstore-tool: FAILED assert(!is_open)
- https://github.com/ceph/ceph/pull/2914
- 06:35 AM Bug #10093 (Resolved): ceph-monstore-tool: FAILED assert(!is_open)
- Using a vstart cluster + stoph.sh:...
- 04:17 AM Bug #9916: osd: crash in check_ops_in_flight
- Hi Yehuda,
After taking a look at the rgw code, I failed to find which (http) request would need CEPH_OSD_OP_SRC_CMP...
- 12:14 AM Feature #9943 (In Progress): osd: mark pg and use replica on EIO from client read
- Currently the OSD checks the PG map, gets only k items, and sends sub-read requests. So if one read fails, it asserts and core du...
11/12/2014
- 09:21 PM Bug #10077: ceph_objectstore_tool: sets SHARDS feature on export it doesn't need to
- How do we tell the difference between (2) and (3)? In both cases, ceph_objectstore_tool will see there is no SHARDS ...
- 09:06 PM Bug #10077: ceph_objectstore_tool: sets SHARDS feature on export it doesn't need to
I see from the code that there are a couple of scenarios that need to be handled or at least documented:
1. Expo...
- 08:59 PM CephFS Bug #10092 (Resolved): multiple_rsync.sh + ceph-fuse timing out on firefly
- teuthology-2014-11-11_23:04:01-fs-firefly-distro-basic-multi/598145
teuthology-2014-11-11_23:04:01-fs-firefly-distro... - 08:25 PM Bug #8588: In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size...
- Wei is working on this along with http://tracker.ceph.com/issues/9943 .
- 06:52 PM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
- Greg Farnum wrote:
> What version are you running? This looks like one of a couple of bugs that have been resolved i...
- 10:47 AM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
- What version are you running? This looks like one of a couple of bugs that have been resolved in the latest point rel...
- 04:26 AM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
- And the peer OSD's log is as below:...
- 03:40 AM Messengers Bug #10080 (Resolved): Pipe::connect() cause osd crash when osd reconnect to its peer
- When our cluster load is heavy, the osd sometimes crashes. The critical log is as below:
-278> 2014-08-20 11:04:28...
- 05:15 PM rbd Bug #9771: Segmentation fault after upgrade v0.80.5 -> v0.80.6
- 05:13 PM rbd Bug #9771: Segmentation fault after upgrade v0.80.5 -> v0.80.6
- Commit b75f85a2 added new elements to the _Thread_ class, breaking ABI. In this (and several other upgrade tests fro...
- 05:08 PM Feature #9957: librados: add fadvise op
- See the pull request: https://github.com/ceph/ceph/pull/2905
- 04:09 PM rgw Bug #10090 (Resolved): ceph_objectstore_tool import broken
- 03:27 PM rgw Bug #10090 (Fix Under Review): ceph_objectstore_tool import broken
- 02:15 PM rgw Bug #10090 (Resolved): ceph_objectstore_tool import broken
The tool can't import because it finds that the recently removed collection still exists.
It may be because fini...
- 12:37 PM rbd Bug #10002 (Resolved): Errors during import_export test in upgrade:firefly-x-next-distro-basic-vp...
- commit:e94d3c11edb9c9cbcf108463fdff8404df79be33
- 11:38 AM Bug #10083 (Resolved): cephtool/test.sh: osd create w/o uuid test is noisy
- 10:09 AM Bug #10083: cephtool/test.sh: osd create w/o uuid test is noisy
- Verified to work with...
- 09:53 AM Bug #10083 (Fix Under Review): cephtool/test.sh: osd create w/o uuid test is noisy
- https://github.com/ceph/ceph/pull/2902
- 09:29 AM Bug #10083 (Resolved): cephtool/test.sh: osd create w/o uuid test is noisy
- ...
- 10:56 AM Bug #10085 (Resolved): dirty exit ("Illegal instruction") on pthread_rwlock_unlock()
- After upgrading to glibc 2.20, the "ceph" & "rbd" commands exit with an "Illegal instruction" message and a non-zero exit cod...
- 10:00 AM Feature #9598 (Pending Backport): re-enable Objecter fast dispatch
- sage-2014-11-11_08:26:01-rados-wip-sage-testing-distro-basic-multi
- 08:42 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
- Same issue http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-11_17:03:01-upgrade:firefly:older-firefly-distro-ba...
- 08:29 AM rgw Bug #10082 (Resolved): Segmentation fault in upgrade:dumpling-firefly-x:parallel-next-distro-basi...
- Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-11_17:10:01-upgrade:dumpling-firefly-x:parallel-ne...
- 06:53 AM rbd Feature #2467 (Resolved): qemu: implement bdrv_invalidate_cache
- Merged upstream: http://git.qemu.org/?p=qemu.git;a=commitdiff;h=be21788495fdc8251b04dd4bfd0cdce95c49d75b
- 01:23 AM Messengers Feature #10079 (Resolved): AsyncMessenger: Support select for other OS
- AsyncMessenger already support epoll and kqueue, but for other legacy OS or windows, we need to use select for the wo...
11/11/2014
- 06:17 PM rbd Bug #10002 (Fix Under Review): Errors during import_export test in upgrade:firefly-x-next-distro-...
- https://github.com/ceph/ceph/pull/2899
- 08:23 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
- Same issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_17:15:02-upgrade:dumpling-firefly-x:parallel-...
- 08:17 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
- Seems similar issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_17:05:02-upgrade:firefly:singleton-f...
- 05:20 PM Bug #10052: LibRadosTwoPools[EC]PP.PromoteSnap failure
- ubuntu@teuthology:/a/sage-2014-11-11_14:57:42-smoke-wip-warn-max-pg-distro-basic-multi/596722
- 02:59 PM CephFS Bug #8090: multimds: mds crash in check_rstats
- ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-11-10_23:18:02-multimds-giant-testing-basic-multi/595393
- 02:54 PM Bug #10077 (Resolved): ceph_objectstore_tool: sets SHARDS feature on export it doesn't need to
- A user on 0.87 exported a replicated pg and couldn't import it because the shards feature wasn't set on the osd.
w...
- 02:14 PM rgw Feature #9933: rgw: implement S3 RR (reduced redundancy) API
- Hmm, was looking just now at the S3 api, and it seems that you can set RR per object, not per bucket. This complicate...
- 11:01 AM Bug #10069 (Rejected): SyncEntryTimeout::finish() timeout
The ceph_objectstore_tool aborted in FileStore code.
On my wip-9780 branch which is rebased on current master ru...
- 10:31 AM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
- Replying to my own post for posterity:
I figured out why those Git hashes don't align. It's a bug in log.cgi. Appare...
- 08:50 AM devops Bug #10049 (Resolved): "Failed to fetch package" "rhel7_0-x86_64-basic"
- Looks fixed
- 09:53 AM Bug #10067 (Can't reproduce): ::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7
- Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_19:13:02-upgrade:dumpling-x-firefly-distro-basi...
- 09:01 AM rgw Feature #9013 (Resolved): rgw: set civetweb as a default frontend
- 08:48 AM rgw Bug #10066: rgw: failed md5sum on s3tests-test-readwrite
- Same problem in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_18:11:17-upgrade:firefly:newer-firefly-dist...
- 07:22 AM rgw Bug #10066 (Resolved): rgw: failed md5sum on s3tests-test-readwrite
- ...
- 08:16 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
- Same issues in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-10_17:18:01-upgrade:firefly-x-next-distro-b...
- 08:02 AM Bug #10016 (Resolved): "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
- tests passed.
- 07:25 AM rgw Bug #9917 (Won't Fix): RADOSGW: Not able to create Swift objects with erasure coded pool
- 03:51 AM rgw Bug #9917: RADOSGW: Not able to create Swift objects with erasure coded pool
- OK, I was not aware of this; seems sane behaviour to me.
- 07:21 AM rgw Bug #10062: s3-test failures using keystone authentication
- It looks like a few of them, e.g. the date ones, occur because radosgw doesn't consider checking the date head...
- 05:02 AM rgw Bug #10062 (Resolved): s3-test failures using keystone authentication
- * "rgw: check for timestamp for s3 keystone auth":https://github.com/ceph/ceph/pull/2993
* "wip: rgw: check keystone... - 07:20 AM Bug #10065 (Duplicate): hung ec-lost-unfound.yaml, failed of osd.{0,2,3}
- this pattern keeps popping up:...
- 07:16 AM Bug #7995: osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
- ubuntu@teuthology:/a/teuthology-2014-11-10_02:32:01-rados-giant-distro-basic-multi/594038
- 06:40 AM Feature #10064 (Resolved): add ceph_objectstore_tool tests to make check
- The "ceph_objectstore_tool.py":https://github.com/ceph/ceph/blob/giant/src/test/ceph_objectstore_tool.py tests can be...
- 06:35 AM Bug #10063: ceph_objectstore_tool does not support getting attributes for erasure coded objects
- ...
- 06:33 AM Bug #10063 (Resolved): ceph_objectstore_tool does not support getting attributes for erasure code...
- ...
- 04:37 AM Bug #9554: "FAILED assert(0 == "hit suicide timeout")" in upgrade:firefly-firefly-testing-basic-v...
- Yes it reproduced in giant too.
11/10/2014
- 11:46 PM CephFS Bug #10041: ceph-fuse: never exit when no MDS server is available
- Just wanted to add that the lack of a timeout causes havoc all over the place... Autofs, backup scripts mounting CephFS on d...
- 04:05 PM CephFS Bug #10041: ceph-fuse: never exit when no MDS server is available
- Although it terminates on "Ctrl+C", a timeout would be _very_ useful because it would prevent the system from hanging on b...
- 11:11 AM CephFS Bug #10041: ceph-fuse: never exit when no MDS server is available
- Was it blocking in the foreground? Did SIGKILL (ie, control-C) work on it?
We can add a configurable timeout but I...
- 01:07 AM CephFS Bug #10041 (Resolved): ceph-fuse: never exit when no MDS server is available
- I'm attempting to mount CephFS using the FUSE client (i.e. _ceph-fuse_), which does not exit if all MDS servers are down (I ...
- 10:57 PM CephFS Bug #10061 (New): uclient: MDS: output cap data in messages
- MClientCaps messages don't dump the caps they're updating, and generally neither does anything else. We need to optio...
- 10:55 PM CephFS Feature #10060 (New): uclient: warn about stuck cap flushes
- It can be hard to diagnose issues that involve cap state. To help with that, the client should keep track of its cap ...
- 10:40 PM CephFS Bug #9977 (Resolved): cephfs-journal-tool falsely reports invalid start_ptr
- In next branch as commit:65c33503c83ff8d88781c5c3ae81d88d84c8b3e4 and in giant as commit:fc5354dec55248724f8f6b795e3a...
- 09:36 PM CephFS Bug #9341: MDS: very slow rejoin
- Thanks.
- 09:27 PM CephFS Bug #9341 (Resolved): MDS: very slow rejoin
- This is backported to giant as of commit:97e423f52155e2902bf265bac0b1b9ed137f8aa0. The test for it also got backporte...
- 09:26 PM CephFS Bug #9800 (Resolved): client-limits test is not passing
- Backported in commit:387efc5fe1fb148ec135a6d8585a3b8f8d97dbf8
- 06:15 PM Bug #10042: OSD crash doing object recovery with EC pool
- I'm not sure either, investigating.
- 05:15 PM Bug #10042: OSD crash doing object recovery with EC pool
- Hi Loic,
I am still a little bit confused in terms of what happened behind the crash (and what is the relation betwe...
- 05:30 AM Bug #10042: OSD crash doing object recovery with EC pool
- 03:49 AM Bug #10042 (Duplicate): OSD crash doing object recovery with EC pool
- We observed one OSD crash with the following assertion failure:...
- 06:10 PM rbd Bug #10045 (Resolved): common/Cond.h: 52: FAILED assert(mutex.is_locked()) in close_image()
- 06:45 AM rbd Bug #10045 (Resolved): common/Cond.h: 52: FAILED assert(mutex.is_locked()) in close_image()
- ...
- 05:44 PM Bug #9921: msgr/osd/pg dead lock giant
- Giving Sage this ticket since he took the PR.
- 05:35 PM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
- testing this PR https://github.com/ceph/ceph-qa-suite/pull/233
http://pulpito.front.sepia.ceph.com/teuthology-2014...
- 03:06 PM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
- install.upgrade:
    all:
      branch: giant
is upgrading all roles
- 02:29 PM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
- Still failed - http://pulpito.front.sepia.ceph.com/teuthology-2014-11-10_10:56:16-upgrade:giant-giant-distro-basic-mu...
- 10:48 AM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
- Moved client.0 to a separate node, testing now
https://github.com/ceph/ceph-qa-suite/pull/232
- 09:57 AM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
- ...
- 05:20 PM CephFS Bug #10025 (Resolved): Journal undump causes MDS to crash when start pos is not on object boundary
- Merged into next in commit:69be8e9b30c18e47c17ff7dafc4ac8fbe00d48e7, and the appropriate backport bits were merged la...
- 04:34 PM rgw Feature #9359 (Resolved): rgw: Export user stats in get-user-info Adminops API
- 04:21 PM rgw Bug #9907 (Pending Backport): radosgw-admin: can't disable max_size quota
- 04:13 PM rgw Feature #8911 (Pending Backport): RGW doesn't return 'x-timestamp' in header which is used by 'Vi...
- 04:09 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
- This bug makes me cry, as it is the reason my cluster has been _completely down_ for over 10 days now... Duplicate ad...
- 03:20 PM Bug #10059 (Resolved): osd/ECBackend.cc: 876: FAILED assert(0)
- -1> 2014-11-09 14:13:01.334410 7f8b93c8b700 10 filestore(/var/lib/ceph/osd/ceph-3) FileStore::read(1.1ds0_head/78...
- 03:59 PM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
- When I look at the log for http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-rpm-rhel7-amd64-basic/log.cgi?log=6977d02...
- 03:29 PM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
- Disk space looks ok to me:...
- 10:28 AM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
- From http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-rpm-rhel7-amd64-basic/log.cgi?log=6977d02f0d31c453cdf554a8f1796...
- 10:03 AM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
- Needs a link:
http://pulpito.front.sepia.ceph.com/teuthology-2014-11-09_17:18:01-upgrade:firefly-x-next-distro-basic...
- 09:12 AM devops Bug #10049 (Resolved): "Failed to fetch package" "rhel7_0-x86_64-basic"
- Seems widespread on the next run using rhel 7.
Run teuthology-2014-11-09_17:18:01-upgrade:firefly-x-next-distro-basic-...
- 03:40 PM Bug #10057 (In Progress): msgr: skipped message on peer reconnect
- ...
- 01:42 PM Bug #10057 (Can't reproduce): msgr: skipped message on peer reconnect
- ubuntu@teuthology:/a/teuthology-2014-11-09_23:06:01-krbd-next-testing-basic-multi/593102...
- 03:36 PM Feature #9420: erasure-code: tools and archive to check for non regression of encoding
- the backport is needed to generate the content of https://github.com/ceph/ceph-erasure-code-corpus/tree/master/v0.80....
- 03:32 PM Feature #9420 (Pending Backport): erasure-code: tools and archive to check for non regression of ...
- 02:57 PM Feature #9420 (Resolved): erasure-code: tools and archive to check for non regression of encoding
- I don't think this needs to be backported.
- 03:06 PM Bug #10058 (Can't reproduce): next stuck in recovery, no progress
- /a/sage-2014-11-09_07:49:57-rados-next-testing-basic-multi/591906
/a/sage-2014-11-09_07:49:57-rados-next-testing-bas... - 02:59 PM Bug #9986 (Pending Backport): objecter: map epoch skipping broken
- 02:56 PM Feature #9262 (Resolved): Additional namespace issues
- 02:55 PM Feature #9031 (Resolved): List RADOS namespaces and list all objects in all namespaces
- 02:53 PM Bug #6756 (Pending Backport): journal full hang on startup
- 02:51 PM Bug #9852 (Resolved): mon: monitor asserts on 'ceph mds add_data_pool X' if X is an ID that DNE
- 02:49 PM Bug #9987 (Pending Backport): mon: min_last_epoch_complete tracking broken
- 02:12 PM Bug #10053 (Resolved): ./ceph tell osd.0 injectargs --no-osd_debug_op_order failure
- 11:18 AM Bug #10053 (In Progress): ./ceph tell osd.0 injectargs --no-osd_debug_op_order failure
- ubuntu@teuthology:/a/sage-2014-11-09_07:49:57-rados-next-testing-basic-multi$ teuthology-ls . | grep FAIL
591648 FAI...
- 11:14 AM Bug #10053 (Resolved): ./ceph tell osd.0 injectargs --no-osd_debug_op_order failure
- ubuntu@teuthology:/a/samuelj-2014-11-07_21:48:36-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/590242
...
- 01:40 PM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
- * how to use ceph_objectstore_tool https://github.com/ceph/ceph-qa-suite/blob/giant/tasks/ceph_objectstore_tool.py
*...
- 06:20 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
- The tests should use the same as #9887 which requires https://github.com/ceph/ceph-qa-suite/compare/wip-dzaddscrub
- 01:27 PM Feature #10056 (New): Object metadata mismatch detection and handling
- Possible things we may want to address:
- clone vs head snapshot metadata mismatches
- object metadata vs ondis...
- 01:23 PM Feature #10055 (New): PG metadata corruption detection and handling
- Possible problems we might want to handle:
- missing pg info
- missing pg epoch
- missing pg log
Correct ...
- 01:21 PM Feature #10054 (New): OSD level metadata mismatch handling
- Meta feature for detecting and handling OSD metadata mismatches.
Possible directions:
- full osdmap vs incremental mismatch?
- 11:57 AM devops Feature #10046: run make check on every pull request
- Removing myself and clarifying the scope. I would be happy to help with the implementation but I'm not equipped to ta...
- 07:48 AM devops Feature #10046 (Resolved): run make check on every pull request
- And report back on the success / failure, with the logs attached for debugging. The suggested approach is to define a...
- 11:24 AM CephFS Bug #9997: test_client_pin case is failing
- http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_23:04:01-fs-next-testing-basic-multi/593068/
- 11:23 AM CephFS Bug #6613: samba is crashing in teuthology
- Still happening: http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_23:14:01-samba-next-testing-basic-multi/59...
- 11:13 AM Bug #10052 (Resolved): LibRadosTwoPools[EC]PP.PromoteSnap failure
- ubuntu@teuthology:/a/samuelj-2014-11-07_21:48:36-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/590439
...
- 09:53 AM rbd Bug #10026 (Duplicate): "Assertion: common/Cond.h" in rbd-master-testing-basic-multi run
- #10045
- 09:52 AM Bug #10033 (Won't Fix): ceph pg <pg> query hangs when OSD down, EC PG
- In this case the osd seems to be up (the pg state isn't 'stale'), so this is expected behavior (the osd hasn't respon...
- 09:51 AM rbd Bug #10051 (Won't Fix): kernel-mounted RBD image may block shutdown
- init-rbdmap fails to unmap an RBD image when the latter is still in use.
As a consequence, system shutdown hangs dead w...
- 09:46 AM rgw Bug #9899 (Fix Under Review): Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-du...
- Per Sage - removed mon_thrash tests from the rgw/ section, https://github.com/ceph/ceph-qa-suite/pull/230
- 09:30 AM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
- This bug was fixed in 0.80.3 or 0.80.4. I think we need to make the 'older' tests skip the mon_thrash tests.
- 09:23 AM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
- Same issue in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-09_10:00:02-upgrade:dumpling-dumpling-distro...
- 09:19 AM devops Bug #10050 (Rejected): "Segmentation fault" (radosgw-admin) in upgrade:firefly:singleton-firefly-...
- Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_17:05:02-upgrade:firefly:singleton-firefly-dist...
- 09:05 AM Bug #10013: "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vps run
- Same issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_19:13:01-upgrade:dumpling-x-firefly-distro-ba...
- 08:43 AM Bug #9913: mon: audit log entires for forwarded requests lack info
- The session is with the monitor that forwarded the request. There's no auth handler for the session as it is a monitor. ...
- 08:41 AM rbd Bug #10030 (Pending Backport): Crash when attempting to open non-existent parent image
- 08:40 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
- Same issue in job http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_18:13:01-upgrade:firefly-x-giant-distro-b...
- 08:24 AM Bug #9864 (Can't reproduce): osd doesn't report new stats for 3 hours when running test LibCephFS...
- Not enough info to tell why the client test hung. Let's see if it happens again!
- 08:08 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
- Looking into the osd logs shows that the osds don't report new stats for the ~3 hours because no pgs are updated in tha...
- 07:47 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
- 07:44 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
- Not so weird after all.
Log shows that the last log is created because we had some stats to report:...
- 07:30 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
- This is not the monitor taking 2 hours to commit. The log snippets above refer to two different proposals: the first...
- 06:08 AM Feature #10044 (New): ECUtil::HashInfoRef should have a NONE value
- So that "ECBackend::get_hash_info":https://github.com/ceph/ceph/blob/giant/src/osd/ECBackend.cc#L1435 can return it i...
- 05:10 AM Bug #10040 (Rejected): install ceph packages broken for firefly
- The problem here is that the machine needs to be properly cleaned up from newer Ceph packages.
It is always proble...
- 04:13 AM Bug #8588: In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size...
- Hi Sam,
Any suggestion in terms of how to fix this issue?
One potential solution is to validate the digest for ea...
11/09/2014
- 10:41 PM CephFS Bug #9995 (Resolved): failing test_filelock
- 10:39 PM Bug #10040 (Rejected): install ceph packages broken for firefly
- Hitting the following error when trying to install ceph packages for firefly on rhel7.0 using ceph-deploy.
test m... - 08:32 PM Bug #10039 (Resolved): osd cann't entry up status with cpu 100%, when osd restart from out status.
- backported in commit:0804deeab293e09123d1b58825051ccc4dddbc0e
- 07:40 PM Bug #10039: osd cann't entry up status with cpu 100%, when osd restart from out status.
- I have fixed this problem by merging this patch, thanks.
osd: fix map advance limit to handle map gaps
The recent ...
- 06:03 PM Bug #10039 (Resolved): osd cann't entry up status with cpu 100%, when osd restart from out status.
- ceph version: 0.80.6
platform: Redhat 6.5
hosts: 3
osd nodes: 15 (5 per host)
steps:
1. start the ceph cluste...
- 05:52 PM Bug #10038 (Rejected): osd cannwhen osd outed restart
- 05:51 PM Bug #10038: osd cannwhen osd outed restart
- Operator error, please delete it, thanks.
- 05:50 PM Bug #10038 (Rejected): osd cannwhen osd outed restart
- 10:09 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- Thanks, and sorry for the lack of details. I was curious about KV-based OSDs for a long time and after upgrading to 0.87 I...
- 05:45 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- It's a pity to see this.
Thanks, Dmitry Smirnov. Could you give a detailed summary of what you did and suggestio... - 05:02 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- Haomai Wang wrote:
> EC+KeyValueStore is a good match.
Ironically that was the worst thing I ever tried in Ceph.... - 09:19 AM CephFS Bug #9341: MDS: very slow rejoin
- Greg Farnum wrote:
> Hmm, we didn't put this in Giant initially because we were trying not to perturb it. Master has... - 05:17 AM Bug #8752: firefly: scrub/repair stat mismatch
- After upgrading to 0.87 I've noticed inconsistencies on all PGs of all caching pools again, during and after deleting o...
- 05:12 AM rbd Feature #10037 (Resolved): cache-tier: Optimise RBD image removal
- While removing an RBD image from an EC pool I've noticed that it bubbles up to the caching pool, hence removal is very slow. ...
- 05:07 AM Feature #10036 (Resolved): osd tree to show primary-affinity value
- It would be nice (and useful) if "primary-affinity" value could be shown in "ceph osd tree" view.
- 01:41 AM Documentation #10035 (Resolved): explain the semantic of pgp num
- As of Giant the "documentation":http://ceph.com/docs/giant/rados/operations/placement-groups/#set-the-number-of-place...
11/08/2014
- 06:29 PM Bug #9970 (Fix Under Review): document erasure coded pool simple operations
- https://github.com/ceph/ceph/pull/2888
- 08:07 AM CephFS Bug #9977 (Fix Under Review): cephfs-journal-tool falsely reports invalid start_ptr
- Backport to giant PR at:
https://github.com/ceph/ceph/pull/2887 - 04:22 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- Hmm, thanks.
I'm not sure who is working on or diving into it. I need more free time to diagnose it. EC+KeyValueStore i... - 03:31 AM devops Feature #7475: ceph-disk: prepare should be idempotent
- Checking the arguments you listed with a * would be fine.
11/07/2014
- 07:41 PM Bug #10033 (Won't Fix): ceph pg <pg> query hangs when OSD down, EC PG
- EC PGs with a down OSD cause the PG query command to hang.
-bash-4.1$ sudo ceph pg 3.1352 query
^CError EINTR: pr... - 04:30 PM Linux kernel client Bug #9894 (Resolved): kcephfs: rm -r left files behind
- 02:35 AM Linux kernel client Bug #9894 (Pending Backport): kcephfs: rm -r left files behind
- https://github.com/ceph/ceph/pull/2876
- 04:27 PM CephFS Bug #10011: Journaler: failed on shutdown or EBLACKLISTED
- Should be resolved by commit:6977d02f0d31c453cdf554a8f1796f290c1a3b89. We may want to backport once it's been through...
- 04:16 PM CephFS Feature #4138 (Resolved): MDS: forward scrub: add functionality to verify disk data is consistent
- This one ticket at least is definitely fulfilled by commit:daa9f9ffe82a811b5e0e69ef52241c4e0b7556bc
- 02:58 PM devops Feature #7475: ceph-disk: prepare should be idempotent
- 'ceph-disk prepare' takes the following arguments:
I have marked arguments with [*] that I believe should match to r... - 12:36 PM devops Feature #7475: ceph-disk: prepare should be idempotent
- I agree with your assessment.
- 09:40 AM devops Feature #7475: ceph-disk: prepare should be idempotent
- Fewer assumptions are better, indeed. When given an existing partition (or a device that is already partitioned) ceph-d...
- 09:26 AM devops Feature #7475: ceph-disk: prepare should be idempotent
- I spoke with my coworker and have a few more thoughts:
Something else to consider is that the FSID should be check... - 08:59 AM devops Feature #7475: ceph-disk: prepare should be idempotent
- I have looked at the 'ceph-disk prepare' scripts and have a few thoughts:
1) A '--force' option could be added to ... - 02:54 PM Bug #7679 (Resolved): mds: stuck on TMAP2OMAP check incorrectly
- Added a new section for upgrading from Dumpling to Firefly. Reviewed by Tamil.
http://ceph.com/docs/master/instal... - 01:03 PM Bug #7679 (In Progress): mds: stuck on TMAP2OMAP check incorrectly
- 12:10 PM Bug #7679: mds: stuck on TMAP2OMAP check incorrectly
- https://github.com/ceph/ceph-qa-suite/pull/229 - fixed by Yuri
assigning this to John Wilkins, to make sure we alr... - 09:59 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
- Samuel Just wrote:
> Loic: you'll want to cover this in the same test as the hinfo one.
Ack :-) - 07:48 AM rbd Bug #10030 (Fix Under Review): Crash when attempting to open non-existent parent image
- 07:09 AM rbd Bug #10030 (Resolved): Crash when attempting to open non-existent parent image
- If a child image is not able to open a parent image, librbd will incorrectly attempt to close the parent image handle...
- 07:25 AM Bug #9987: mon: min_last_epoch_complete tracking broken
- I don't think this will help much with your case. This patch will allow the monitor to delete data that should be re...
- 07:12 AM Bug #10021 (Can't reproduce): ceph auth caps doesn't show in the CLI help / commands list
- Running on "pretty-close to master with a few unrelated patches":...
- 06:52 AM Linux kernel client Feature #9906 (In Progress): Inline data support
- (setting assignee because you mentioned you were working on it)
- 06:18 AM Bug #9876: failed pull needs to allow mark_unfound_lost revert eventually
- A customer has requested this be backported to firefly.
- 03:32 AM Bug #9554: "FAILED assert(0 == "hit suicide timeout")" in upgrade:firefly-firefly-testing-basic-v...
- It's reproducible in ceph 0.84 (customized build)
2014-11-04 15:26:39.388499 7f377cac3700 0 -- 10.242.42.172:7241/... - 03:01 AM Messengers Feature #10029 (Resolved): Retry binding on IPv6 address if not available
- On systems with IPv6 it might be that the IPv6 address is not yet available when a MON or OSD boots.
This can have...
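The actual fix belongs in the C++ messenger bind path, but the retry-on-EADDRNOTAVAIL idea behind this feature can be sketched in a few lines of Python. This is an illustrative sketch only: the helper name bind_with_retry and the retry parameters are assumptions, not Ceph code.

```python
import errno
import socket
import time

def bind_with_retry(addr, port, attempts=5, delay=0.5):
    """Bind a socket, retrying while the address is not yet available.

    While an IPv6 address is still being configured (e.g. during duplicate
    address detection) bind() fails with EADDRNOTAVAIL; retrying briefly
    lets a daemon that started "too early" still come up.
    """
    family = socket.AF_INET6 if ":" in addr else socket.AF_INET
    last_err = None
    for _ in range(attempts):
        sock = socket.socket(family, socket.SOCK_STREAM)
        try:
            sock.bind((addr, port))
            return sock  # success: the caller owns the bound socket
        except OSError as err:
            sock.close()
            if err.errno != errno.EADDRNOTAVAIL:
                raise  # a different failure: do not mask it
            last_err = err
            time.sleep(delay)
    raise last_err

# Loopback is always configured, so this binds on the first attempt:
sock = bind_with_retry("127.0.0.1", 0)
sock.close()
```

Only EADDRNOTAVAIL is retried; any other bind error (e.g. EADDRINUSE) is still reported immediately, which matches the intent of failing fast on real misconfiguration.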
11/06/2014
- 11:43 PM CephFS Bug #9995: failing test_filelock
- 12:16 AM CephFS Bug #9995: failing test_filelock
- https://github.com/ceph/ceph-qa-suite/pull/228
- 09:46 PM CephFS Bug #9977 (Pending Backport): cephfs-journal-tool falsely reports invalid start_ptr
- Merged to next in commit:574c1d4bad37514ba941e3ae83e33a7d926697d9
Yes, let's please backport. - 07:27 PM devops Feature #7475: ceph-disk: prepare should be idempotent
- No update yet; it never was enough of a problem to get to the front of the bug queue. Would you have time to work on it?...
- 01:06 PM devops Feature #7475: ceph-disk: prepare should be idempotent
- I am encountering this issue using puppet-ceph for automated deployments. Is there a status update on this bug?
- 05:49 PM CephFS Bug #9674: nightly failed multiple_rsync.sh
- I messed up (didn't set sudo everywhere), newer commits will hopefully make it all good. giant:f66bf31b6743246fb1c882...
- 05:35 PM Linux kernel client Bug #9894: kcephfs: rm -r left files behind
- I'm saying I think we want to backport all of the flag changes to giant (for userspace) because we're seeing failures...
- 05:16 PM Linux kernel client Bug #9894 (Resolved): kcephfs: rm -r left files behind
- 05:16 PM Linux kernel client Bug #9894: kcephfs: rm -r left files behind
- Giant does not contain the commit that introduced the ORDERED flag.
- 04:48 PM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
- Actually, the marking down thing won't work.
- 04:29 PM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
- That all looks right. I'd mark the osd down, get the object, re-put it, and mark the osd back up. Should cause reco...
- 02:10 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
- Besides the code fix, I am wondering: what is the right way to fix the PG state (and the object)? Bringing the OSD out mig...
- 04:43 PM Bug #10028 (Duplicate): ec_lost_unfound failing on giant
- ubuntu@teuthology:/a/teuthology-2014-11-03_02:32:01-rados-giant-distro-basic-multi/584089
2014-11-03T10:46:32.795 ... - 02:16 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- Samuel Just wrote:
> Type of osd probably does matter if the KV osds are distributing faulty information.
Earlier... - 10:48 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- Type of osd probably does matter if the KV osds are distributing faulty information.
- 05:08 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- Please let me know if you need more logs.
Due to this error all OSDs (except a few) in my cluster are down or crashing... - 01:55 AM Bug #9978 (New): keyvaluestore: void ECBackend::handle_sub_read
- Sorry, my mistake.
- 01:49 AM Bug #9978 (Duplicate): keyvaluestore: void ECBackend::handle_sub_read
- http://tracker.ceph.com/issues/9978#change-44091
- 12:32 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- Here is another log from filestore-based OSD, just crashed.
- 01:37 PM rbd Bug #10026 (Duplicate): "Assertion: common/Cond.h" in rbd-master-testing-basic-multi run
- Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-05_23:00:03-rbd-master-testing-basic-multi/588155/...
- 11:35 AM Bug #7679: mds: stuck on TMAP2OMAP check incorrectly
- Okay, so we had a better smoking gun, logs in teuthology:~/jcsp/7679. The OSDs all have features set to 0 in the OSD...
- 08:39 AM Bug #7679: mds: stuck on TMAP2OMAP check incorrectly
- I'm confused by the order of operations in the tests, it seems like there is an upgrade, then a restart, then an upgr...
- 11:16 AM CephFS Bug #10025 (Resolved): Journal undump causes MDS to crash when start pos is not on object boundary
Related ML thread from Jasper Siero, who first encountered the issue on firefly (http://lists.ceph.com/pipermail/ce...- 10:42 AM devops Bug #9747: ceph.spec.in will always use 95-ceph-osd-alt.rules
- Backported to firefly.
- 10:41 AM Bug #9875 (Resolved): stuck recovering due to unfound hit_set object
- backported to firefly
- 10:41 AM Bug #9821 (Resolved): failed to recover before timeout expired
- Backported to firefly
- 10:40 AM Bug #9718 (Resolved): osd_types: check_new_interval: min_size check needs to consider CRUSH_ITEM_...
- backported to firefly
- 10:39 AM Bug #9113: osd: snap trimming eats memory, linearly
- Backported to firefly.
- 10:39 AM Bug #9626 (Resolved): PG: cancel backfill reservations if we get a cancel during backfill
- Don't really want to backport to dumpling.
- 10:38 AM Bug #9574 (Resolved): Backfill: recheck full status once reservation is granted
- backported to firefly, don't really want to backport to dumpling
- 10:38 AM Feature #9262: Additional namespace issues
- 10:38 AM Bug #9293 (Resolved): _collection_move_rename EEXIST
- firefly
- 10:37 AM Bug #8315 (Resolved): osd: watch callback vs callback funky
- firefly
- 10:29 AM Bug #8629 (Resolved): cache_evict needs to prevent make_writeable from creating a snapdir
- Merged to firefly.
- 10:27 AM Bug #9301 (Resolved): paxos: off by one w/ versions in forming quorum
- merged to firefly
- 10:26 AM Bug #9053 (Resolved): mon/Paxos.cc: 628: FAILED assert(begin->last_committed == last_committed)
- Merged to firefly.
- 10:21 AM Bug #9502 (Resolved): mon: does not verify disk is not full on startup
- 10:17 AM Bug #9851 (Resolved): crash on journal/filestore shutdown on firefly
- 12:05 AM Bug #9851: crash on journal/filestore shutdown on firefly
- It has been added to wip-sam-testing ( https://github.com/ceph/ceph/pull/2764#issuecomment-61167705 ) which is anothe...
- 10:15 AM Bug #9675: splitting a pool doesn't start when rule_id != ruleset_id
- Backported to firefly.
- 09:29 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
- Same issues in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-05_17:18:01-upgrade:firefly-x-next-distro-b...
- 08:53 AM Support #10024 (New): Cluster unreachable after restart
- Dear Support,
I'm quite a new user; I already asked this question on the users list without getting a solution.
I se... - 08:42 AM rados-java Bug #10023 (Resolved): method Rados.shutdown() is missing for closing the connection to the clust...
- Hi,
if you call the sample code (1) 3000 times, you will get an error -24 (EMFILE, Too many open files). Why? Beca... - 07:00 AM Bug #10018 (Fix Under Review): OSD assertion failure if the hinfo_key xattr is not there (corrupt...
- https://github.com/ceph/ceph/pull/2872
- 06:14 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
- Actually it happens on master, my test was incorrect and is now fixed : https://github.com/dachary/ceph/commit/312cda...
- 05:37 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
- Also on 0.80.7...
- 05:22 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
- Test case that reproduces the problem: https://github.com/dachary/ceph/commit/5639303646418913ba0929ce73e8a5c61190191...
- 02:03 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
- Sorry, we don't have a verbose log from the crash, but the following is the code that leads to the crash:...
- 03:18 AM Feature #9943: osd: mark pg and use replica on EIO from client read
- Wei will work on this one.
- 01:49 AM Bug #9727 (Duplicate): 0.86 EC+ KV OSDs crashing
- http://tracker.ceph.com/issues/9978#change-44091
- 01:36 AM Messengers Bug #10022 (Resolved): AsyncMessenger: Wrong newly_acked_seq when replacing existing connection
- Here is the output. (Monitor IPs are 10.11.1.27, 10.11.1.28, 10.11.1.29.)
# ceph -w --debug-ms=10/10
2014-11-04 10:38:... - 12:06 AM Bug #9485: Monitor crash due to wrong crush rule set
- Did not forget about it, just busy with other things (the OpenStack summit after the Giant release).
- 12:02 AM Support #9901: libgoogle-perftools4: tcmalloc performance regression on armhf
- 12:00 AM Bug #10021 (Can't reproduce): ceph auth caps doesn't show in the CLI help / commands list
- When trying to resetting a client's permissions I've tried to use the 'ceph auth add' command, and it failed. When se...
11/05/2014
- 11:51 PM Linux kernel client Bug #9928: kernel BUG at fs/ceph/caps.c:2307!
- fixed by "ceph: introduce global empty snap context"
- 11:41 PM Bug #10018 (Need More Info): OSD assertion failure if the hinfo_key xattr is not there (corrupted...
- Could you please attach the logs of the crashed OSD (the last 20,000 lines would be enough) ?
- 08:02 PM Bug #10018 (Resolved): OSD assertion failure if the hinfo_key xattr is not there (corrupted?) dur...
- We observed an OSD crash during scrubbing on EC pool, the crash happened if the hinfo_key xattr of the file is absent...
- 11:26 PM Bug #10020 (Closed): bloom filter unit tests fail (power8)
- As of ac3c1cb5d0e17250fa147c11e42ed93e15b2184a unittest_bloom_filter fails with:...
- 09:55 PM Tasks #10019 (Closed): rbd
- Hello all,
I'm deploying OpenStack with Ceph.
The compute node uses an RBD device to create disks.
I have a problem... - 08:30 PM Bug #9927: RHEL: selinux-policy-targeted rpm update triggers slow requests
- I would strongly recommend limiting it to the subdirectories where the large mounts are, not the parent directory. Th...
- 06:32 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- Sorry for the confusion -- I have the impression that it may not be related to the store type.
Attaching more detailed log... - 06:09 PM Bug #10017 (Resolved): OSD wrongly marks object as unfound if only the primary is corrupted for E...
- Recently we observed one PG stuck in recovering with one object marked as lost; the scrubbing log showed th...
- 04:35 PM Bug #10016 (Resolved): "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
- In a new suite all jobs failed.
http://pulpito.front.sepia.ceph.com/teuthology-2014-11-05_11:35:26-upgrade:giant-gia... - 04:19 PM Bug #9788 (Rejected): "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeou...
- 2014-11-05 09:29:31.507827 7fa236d5b700 10 filestore(/var/lib/ceph/osd/ceph-3) sync_entry commit took 150.696754, int...
- 04:15 PM Bug #9788: "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" issues
- 584644 and 584647 both stuck in sync, probably environmental.
- 04:14 PM Bug #9788: "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" issues
- Also seeing in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-04_19:00:01-rados-dumpling-distro-basic-mul...
- 02:32 PM rgw Bug #9918 (Fix Under Review): RGW-Swift: SubUser access permissions, does not seems to work
- 01:41 PM rgw Bug #9917: RADOSGW: Not able to create Swift objects with erasure coded pool
- The bucket index cannot reside on EC pools.
- 01:37 PM rgw Bug #9973 (Fix Under Review): Validation of Swift DLO manifest object ETag doesn't match OpenStac...
- 01:23 PM rgw Bug #8766 (Resolved): multipart minimum size error should be EntityTooSmall
- Tested on firefly, seems to work.
- 01:22 PM rgw Bug #9479 (Fix Under Review): ETag is not included in the XML response to put object copy operation
- 12:28 PM rgw Bug #9478 (Fix Under Review): Incorrect content type in response header
- 10:44 AM rgw Bug #10015 (Resolved): rgw sync agent: 403 when syncing object that has tilde in its name
- The culprit is the python requests module that the sync agent uses to send the http requests. Wittily the mod...
- 09:18 AM Bug #10014 (Pending Backport): osd: spurious memmove on data payload
- 09:13 AM Bug #10014 (Resolved): osd: spurious memmove on data payload
- see commit:a1aa70f2f21339feabfe9c1b3c9c9f97fbd53c9d
- 09:00 AM Bug #10013 (Rejected): "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vps run
- Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-04_19:13:01-upgrade:dumpling-x-firefly-distro-basi...
- 08:58 AM CephFS Bug #9995: failing test_filelock
- We'll need to update the test then so that it detects this situation and aborts quietly instead of raising an error.
- 08:39 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
- suite:upgrade:dumpling-firefly-x
Run http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-04_17:25:01-upgrade:dump... - 08:32 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
- suite:upgrade:firefly:singleton
Run http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-04_18:41:21-upgrade:firef... - 05:42 AM CephFS Bug #10011: Journaler: failed on shutdown or EBLACKLISTED
- Ah... I've just realised why the "respawn on blacklist" thing I put in a while back isn't kicking in here: because Jo...
- 04:32 AM CephFS Bug #10011: Journaler: failed on shutdown or EBLACKLISTED
mon.a says:...- 05:19 AM Feature #9962 (Fix Under Review): osd: kill 'category' in stats and public API
- 12:47 AM Feature #9957: librados: add fadvise op
- From the posix_fadvise man page & related kernel code:
In kernels before 2.6.18, POSIX_FADV_NOREUSE had the same se... - 12:10 AM Bug #9909: lost_unfound test/rados tool flawed, EEXIST when putting empty object
- http://tracker.ceph.com/issues/9387 addresses the ceph-qa-suite part of this problem by using the /etc/group file ins...
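Returning to the fadvise discussion under #9957 above: the client hints being proposed would eventually map onto the posix_fadvise(2) syscall. A minimal sketch of that call via Python's os module follows; the guard is there because the wrapper only exists on Linux, and POSIX_FADV_NOREUSE (mentioned in the comment) is a no-op on many kernels, so DONTNEED is used to actually drop cached pages.

```python
import os
import tempfile

# Advise the kernel about the expected access pattern of a file range.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"x" * 4096)
    os.fsync(fd)
    if hasattr(os, "posix_fadvise"):  # Linux; absent on e.g. macOS
        # Sequential one-pass read...
        os.posix_fadvise(fd, 0, 4096, os.POSIX_FADV_SEQUENTIAL)
        # ...and drop the cached pages once we are done with them.
        os.posix_fadvise(fd, 0, 4096, os.POSIX_FADV_DONTNEED)
finally:
    os.close(fd)
    os.remove(path)
```

The advice is purely a hint: the call returns successfully whether or not the kernel acts on it, which is why pre-2.6.18 kernels could silently treat NOREUSE like WILLNEED.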
11/04/2014
- 11:53 PM Bug #9909 (Resolved): lost_unfound test/rados tool flawed, EEXIST when putting empty object
- https://github.com/ceph/ceph/pull/2858
- 08:13 PM Bug #9909 (Fix Under Review): lost_unfound test/rados tool flawed, EEXIST when putting empty object
- Already fixed by 50e80407f3c2f74d77ba876d01e7313c3544ea4d. Creating pull request for backport to giant.
- 11:48 PM rgw Bug #9907: radosgw-admin: can't disable max_size quota
- Hi, can you help merge this fix?
https://github.com/ceph/ceph/pull/2782 - 10:59 PM CephFS Bug #9995: failing test_filelock
- ...
- 08:54 PM CephFS Bug #9995: failing test_filelock
- Is there something we can do as a workaround to prevent this blocking things? I expect people are going to use new ce...
- 07:36 PM CephFS Bug #9995 (Won't Fix): failing test_filelock
- It's a bug in old versions of libfuse: it calls our setlk callback for both fcntl setlk and flock requests.
- 10:15 PM rgw Bug #9877 (Fix Under Review): In some cases it's possible for rgw to segfault on http COPY
- 09:54 PM rgw Bug #9877: In some cases it's possible for rgw to segfault on http COPY
- Ah, moreover, the issue is fixed already in the firefly branch but didn't make it to a dot release (will be in the ne...
- 09:44 PM rgw Bug #9877: In some cases it's possible for rgw to segfault on http COPY
- OK, I was able to reproduce it using this script. It seems that there are a few things that don't work as required. Th...
- 02:37 AM rgw Bug #9877: In some cases it's possible for rgw to segfault on http COPY
- reproduces on ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
#!/bin/bash
base=test
source=${... - 07:54 PM Linux kernel client Bug #9432 (Resolved): kcephfs: null pointer deref in posix_acl_create
- fixed by commit b1ee94aa593abd03634bc3887b8e189840e42c12
- 07:53 PM Linux kernel client Bug #9505 (Duplicate): kcephfs: client gets stuck in reconnect loop?
- 07:53 PM Linux kernel client Bug #9505: kcephfs: client gets stuck in reconnect loop?
- dup of #9458
- 07:50 PM Linux kernel client Bug #9426 (Resolved): kcephfs: soft lockup in handle mds map
- 05:46 PM CephFS Bug #9994: ceph-qa-suite: nfs mount timeouts
- teuthology-2014-11-03_23:10:01-knfs-giant-testing-basic-multi/585658/
- 05:40 PM Bug #10012 (Can't reproduce): Configuration parameters not picked up outside of the [global] sect...
Certain osd* and radosgw* parameters are not picked up outside of the [global] section in the ceph.conf file, as pe...- 05:40 PM CephFS Bug #10011 (Resolved): Journaler: failed on shutdown or EBLACKLISTED
- http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_23:08:01-kcephfs-giant-testing-basic-multi/585648/
teuth... - 05:13 PM Linux kernel client Bug #9894: kcephfs: rm -r left files behind
- We'll also want to do a backport to Giant of this and the prior series.
- 04:53 PM Bug #10010 (Resolved): ceph_osd.cc calls global_init_shutdown_stderr even when running with -f or...
- ceph-osd is difficult to debug in operation when running under systemd or docker, or any other system that expects to...
- 04:48 PM devops Documentation #10009 (Rejected): Ceph build requirements are incomplete
- The build instructions at http://ceph.com/docs/giant/install/build-ceph/ have a list of Ubuntu packages that are requ...
- 04:12 PM Bug #10008 (Resolved): "obsolete rollback obj" error in upgrade:firefly-x-giant-distro-basic-vps run
- Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_18:18:02-upgrade:firefly-x-giant-distro-basic-v...
- 04:02 PM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
- Yehuda, can you take a look, please?
- 03:56 PM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
- It's probably a duplicate of #8311, but affects other releases:...
- 03:44 PM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
- Yes, Tamil said we have such a case with empty pool name.
- 01:36 PM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
- Is this that bug where radosgw can create a pool with an empty name?
- 01:35 PM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
- It does appear to be trying to get pg_num for the empty name pool. Is that deliberate?
- 11:15 AM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
- Same results in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-02_10:00:02-upgrade:dumpling-dumpling-dist...
- 03:15 PM Feature #10007 (New): option to disable erasure code plugin version check
- An option such as...
- 03:04 PM Bug #9939: "giant" no longer log scrub errors
- I am getting something like ...
- 01:19 PM Bug #9939: "giant" no longer log scrub errors
- Ok, pick a known-bad pg. On the primary, set debug osd = 20, debug ms = 1, debug filestore = 20. Scrub. Attach cep...
- 02:59 PM Bug #10006 (Resolved): osd cache full mode still skips young objects
- commit f4ee949
- 01:17 PM Bug #10003 (Duplicate): "found obsolete rollback obj" error in upgrade:firefly-x-giant-distro-bas...
- 10:29 AM Bug #10003 (Duplicate): "found obsolete rollback obj" error in upgrade:firefly-x-giant-distro-bas...
- Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_18:18:02-upgrade:firefly-x-giant-distro-basic-v...
- 01:06 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- Because the bug you say this duplicates is about KV osds.
- 01:06 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- Are you saying these osds are using the KV backend?
- 12:09 PM rbd Bug #9854: librbd: reads contending for cache space can cause livelock
- PR: https://github.com/ceph/ceph/pull/2820
- 09:12 AM rbd Bug #9854 (Fix Under Review): librbd: reads contending for cache space can cause livelock
- 11:48 AM Bug #10004: ceph osd find does not correctly report crush locations
- Moreover, it's no longer reporting the entire crush branch, but only the immediate parent; that's a change in behavio...
- 10:38 AM Bug #10004 (Can't reproduce): ceph osd find does not correctly report crush locations
- ...
- 10:59 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
- suite:upgrade:dumpling-firefly-x
Same issue in job http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_17:25... - 09:29 AM rbd Bug #10002 (Resolved): Errors during import_export test in upgrade:firefly-x-next-distro-basic-vp...
- Two jobs failed ['584634', '584648']
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_17:18:0... - 10:24 AM Linux kernel client Bug #5429: libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
- And if it hasn't, the same (or at least a full dmesg) from the previous crash won't hurt, if you still have it around.
- 10:21 AM Linux kernel client Bug #5429: libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
- If it's crashed again, a full dmesg and a tail (say, last 5-10 minutes before the crash) of osd/messenger logs would ...
- 10:17 AM Linux kernel client Bug #5429: libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
- Is there anything which needs to be gathered from the cluster currently displaying this issue which could help out?
- 09:29 AM rbd Bug #9742: `rbd map lun` fails with: (2) No such file or directory on kernel 3.14.14 w/ udev-216 ...
- I'm guessing CRYPTO_CBC kernel config option is not enabled - -ENOENT is most likely because crypto core can't find a...
- 09:20 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
- Run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-03_17:18:01-upgrade:firefly-x-next-distro-basic-vps/
St... - 09:07 AM Bug #9788 (New): "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" is...
- suite:upgrade:firefly-x
next
Run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-03_17:18:01-upgrade:firef... - 09:04 AM rgw Bug #9587: ceph-radosgw sysvinit script on EL6 cannot set ulimit
- The pull request is closed, anything new with this one?
- 09:02 AM rgw Bug #9148 (Resolved): rgw: multiregion tests failing, s3tests.functional.test_s3.test_region_copy...
- 09:01 AM rbd Bug #9936 (Pending Backport): Exporting images larger than 2GB fails
- 08:39 AM Bug #9998: Replaced OSD weight below 0
- We see this often in our dumpling cluster. Kinda annoying.
- 06:00 AM Bug #9998: Replaced OSD weight below 0
- This bug might be related to this part of code (use of one variable in two nested loops):...
- 04:53 AM Bug #9998 (Resolved): Replaced OSD weight below 0
- I've hit a bug when replacing OSDs. Under specific conditions the replaced OSD gets a weight of @-3.052e-05@.
h4. How to... - 06:54 AM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
- Looking more, I noticed that the pool 35 PGs are not entering the backfilling state -- only recovery. I'm bringing os...
- 02:31 AM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
- Hi Sage and Sam,
I've just tried wip-9113-9487-dumpling on our test cluster. (Using this build: http://gitbuilder.ce... - 06:53 AM CephFS Bug #9869 (Resolved): Client: not handling cap_flush_ack messages properly
- 02:09 AM Bug #9727: 0.86 EC+ KV OSDs crashing
- OK, it seems this is not a simple test problem that I just misunderstood. Do you have more logs? I find there exists...
- 01:43 AM Bug #9727: 0.86 EC+ KV OSDs crashing
- Well, I can apply selective patches on top of 0.87 but I'd be reluctant to deploy master branch cluster-wide...
All ... - 01:33 AM Bug #9727: 0.86 EC+ KV OSDs crashing
- Hmm, KeyValueStore still isn't suitable for large version upgrades, so I'm not sure which problem you hit.
I'm ... - 01:46 AM rgw Bug #9766: s3tests: test_100_continue failing
- Yes, sorry it was due to using apache from the ubuntu repos
11/03/2014
- 11:43 PM Bug #9727: 0.86 EC+ KV OSDs crashing
- My cluster, prematurely upgraded to "Giant", is practically wrecked by this problem.
Haomai, is there any additional i... - 07:55 PM CephFS Feature #1398: qa: multiclient file io test
- A first pass of this is in origin/wip-multiclientio-wusui
- 04:22 PM Bug #7679 (New): mds: stuck on TMAP2OMAP check incorrectly
- I see a similar problem in the latest runs:
http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-28_17:00:01-upgrade:f... - 03:30 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
- Any news on this, please? I can barely use my cluster since upgrading to Giant -- OSDs are crashing during backfill all ...
- 12:10 PM CephFS Bug #9997 (Resolved): test_client_pin case is failing
- http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-02_23:04:01-fs-next-testing-basic-multi/583588/
RuntimeErro... - 12:08 PM devops Bug #9996 (Won't Fix): SyntaxError in Chef run
- ...
- 12:05 PM CephFS Bug #9995 (Resolved): failing test_filelock
- http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-02_23:04:01-fs-next-testing-basic-multi/583589/
It's gettin... - 11:43 AM CephFS Bug #9994: ceph-qa-suite: nfs mount timeouts
- http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-31_23:10:01-knfs-giant-testing-basic-multi/582459/
http://q... - 11:34 AM CephFS Bug #9994 (Resolved): ceph-qa-suite: nfs mount timeouts
- ...
- 11:27 AM CephFS Bug #9977: cephfs-journal-tool falsely reports invalid start_ptr
- https://github.com/ceph/ceph/pull/2853
- 11:27 AM CephFS Bug #9977 (Fix Under Review): cephfs-journal-tool falsely reports invalid start_ptr
- PR up for next, probably also worth backporting to giant as without it journal-tool is pretty useless on filesystems ...
- 10:00 AM devops Bug #9992 (Resolved): git configuration issues in Jenkins slaves
- It looks like it was just one host that needed this, copied the settings from ~/.gitconfig on the Jenkins server.
- 08:31 AM devops Bug #9992: git configuration issues in Jenkins slaves
- It looks like it is this on the Jenkins server itself:...
- 08:16 AM devops Bug #9992 (Resolved): git configuration issues in Jenkins slaves
- ...
- 09:06 AM rgw Bug #7796 (Resolved): RGW Keystone token auth fails with '411 Length Required' when Keystone usin...
- 09:02 AM rgw Bug #8587 (Resolved): rgw: subuser object not created correctly
- 08:44 AM Bug #9987: mon: min_last_epoch_complete tracking broken
- BTW, we compact our dumpling mon leveldb _without_ restarting. We do
ceph tell mon.0 compact
and that can s... - 04:17 AM devops Bug #9697: exitcode of gatherkeys has changed the latests versions
- I placed a new issue in the correct project: #9991
- 12:03 AM rados-java Bug #9990 (Resolved): Rbd.list() / JVM crashes
- If I call Rbd.list() to get all the available images, my JVM mostly crashes (3).
Sometimes, if it does not crash... - 12:00 AM rados-java Bug #9989 (Resolved): Rbd.list() / more than 1024 images in pool?
- Is your list()-implementation limited to 1024 images because of
IntByReference size = new IntByReference(1024);
...
11/02/2014
- 11:58 PM rados-java Bug #9988 (Resolved): Rbd.list() / list contains one element, if pool is empty
- If no image is in the pool, the list contains one empty ("") element but should contain 0 elements.
- 03:16 PM Feature #9954 (Resolved): buffer: method to ensure an extent is contiguous
- 07:29 AM Feature #9954: buffer: method to ensure an extent is contiguous
- how about wip-buffer? can certainly optimize the rebuild case, but i think here we expect to never hit it. https://...
- 02:53 PM Bug #9986: objecter: map epoch skipping broken
- https://github.com/ceph/ceph/pull/2851
- 02:47 PM Bug #9986 (Resolved): objecter: map epoch skipping broken
- 02:52 PM Bug #9987: mon: min_last_epoch_complete tracking broken
- https://github.com/ceph/ceph/pull/2850
- 02:49 PM Bug #9987 (Resolved): mon: min_last_epoch_complete tracking broken
- When we moved to pulling pgmap values out of keys we broke the min_last_epoch_clean invalidation code.
I suspect t... - 07:35 AM Feature #9926 (Resolved): AsyncMessenger: Support kqueue interface for BSD and mac osx OS
- 02:18 AM Messengers Bug #9898: osd: fast dispatch deadlock in mark_down (giant)
- Sage, everyone - approximately when will this exact fix land in master? It effectively blocks our testing progress ...
11/01/2014
- 02:00 PM Bug #9976: ceph cli injectargs parsing broken
- I did not see this one, sorry about that.
10/31/2014
- 05:27 PM Feature #9981: osd: cache: proxy writes (instead of unconditionally promoting)
- One thing we'll need to be careful about when not promoting is how we handle snapshots. I don't remember exactly how ...
- 01:55 PM Feature #9981 (Resolved): osd: cache: proxy writes (instead of unconditionally promoting)
- This should work similarly to the read recency checks that don't always promote on first read, but give the cache osd a...
- 05:16 PM Bug #9985 (Resolved): osd: incorrect atime calculation
- https://github.com/ceph/ceph/pull/2816 should be backported
- 05:10 PM CephFS Tasks #3680 (Rejected): deduplication in ceph
- we should discuss this on the email list
- 04:38 PM RADOS Bug #9984: lttng_probe_unregister hangs on shutdown
- maybe related to dynamic+static linking of lttng?
- 04:16 PM RADOS Bug #9984 (New): lttng_probe_unregister hangs on shutdown
- ...
- 04:31 PM Bug #9976 (Resolved): ceph cli injectargs parsing broken
- 02:25 PM Bug #9976: ceph cli injectargs parsing broken
- 02:25 PM Bug #9976: ceph cli injectargs parsing broken
- close; needs "if injectargs and ..", but that seems good
- 02:17 PM Bug #9976: ceph cli injectargs parsing broken
- Maybe as simple as...
- 09:11 AM Bug #9976 (Resolved): ceph cli injectargs parsing broken
- looks like it was the recent -- handling that broke?...
- 02:21 PM phprados Feature #424: Stream wrappers
- Charles du Jeu wrote:
> Hi! Maybe I'm totally at the wrong place, if so, sorry for that.
> Was there any work done... - 08:27 AM phprados Feature #424: Stream wrappers
- Hi! Maybe I'm totally at the wrong place, if so, sorry for that.
Was there any work done on that (streamwrapper imp... - 02:14 PM Bug #9983 (Resolved): Cleanup boost optionals for boost 1.56
- This patch cleans up fatal errors with boost 1.56 when implicitly converting optionals to non-optional values.
It ... - 01:58 PM RADOS Feature #9982 (New): osd: cache: make writes in readonly mode invalidate and then forward
- 01:36 PM Feature #9980 (Resolved): osd: cache: proxy reads during promote
- wip-promote-forward may be a useful base, although I think it is not quite correct (we should proxy reads, not forwar...
- 01:35 PM Feature #9979 (Resolved): osd: cache: proxy reads (instead of redirect)
- 01:12 PM Bug #9974 (Won't Fix): Osd-s bind only to 1st network in "public network"
- OSDs bind and listen on only a single IP by design. Changing it would require major changes to how we handle identity...
- 04:58 AM Bug #9974 (Won't Fix): Osd-s bind only to 1st network in "public network"
- OSD daemons bind only to the first network in the ceph.conf "public network" parameter.
ceph.conf:
[global]
cluster netw... - 10:57 AM Bug #9978 (Closed): keyvaluestore: void ECBackend::handle_sub_read
- On 0.87 "Giant" I'm repeatedly hit by the following assert, typically crashing 4 OSDs at once:...
- 10:48 AM CephFS Bug #9977 (Resolved): cephfs-journal-tool falsely reports invalid start_ptr
This is happening when the journal expire_pos isn't at an object boundary. The expected start_ptr counter is being...- 10:03 AM CephFS Feature #1398: qa: multiclient file io test
- ...
- 09:20 AM RADOS Bug #9911 (Rejected): ceph not placing replicas to OSDs on same host as down/out OSD
- ah, it's because the vary_r tunable is false. we fixed this bug in firefly. switching to firefly tunables will reso...
- 08:35 AM Bug #9752 (Resolved): acting in past intervals contains primary and up_primary (looks like duplic...
- 02:27 AM Bug #9752 (Fix Under Review): acting in past intervals contains primary and up_primary (looks lik...
- * firefly https://github.com/ceph/ceph/pull/2847
* giant https://github.com/ceph/ceph/pull/2846
- 08:28 AM devops Tasks #8366: Update ceph.com/docs to default to the latest major release (0.80)
- John any updates on this? It is a bummer that we have all the infrastructure/services ready to deal with the redirect...
- 08:26 AM Bug #9503: Dumpling: removing many snapshots in a short time makes OSDs go berserk
- Any news on the backport?
- 08:19 AM Linux kernel client Bug #9894: kcephfs: rm -r left files behind
- Merged the userspace version of this; is there a separate ticket for that?
- 06:19 AM devops Feature #8303: ceph-extra packages for newer Ubuntu versions
- Bump.. ceph-extras still not available for Trusty 14.04.
- 04:32 AM rgw Bug #9973 (Resolved): Validation of Swift DLO manifest object ETag doesn't match OpenStack Swift ...
- The way the RGW Swift API validates the ETag on DLO manifest objects does not match the way the OpenStack Swift imple...
10/30/2014
- 09:32 PM Bug #9971 (Duplicate): OSD crashes again after restarting due to op thread time out at writing pg...
- This crash was observed when one OSD was restarted after being down for a long time; it crashed again
because its op t... - 08:39 PM Feature #9954: buffer: method to ensure an extent is contiguous
- Haomai Wang wrote:
> Hmm, just a another approach.
> Maybe we can use another interface called "get_range" for the ... - 06:49 PM Feature #9954: buffer: method to ensure an extent is contiguous
- Hmm, just another approach.
Maybe we can use another interface called "get_range" for the same goal.
| 1M byte... - 01:34 PM Feature #9954 (Resolved): buffer: method to ensure an extent is contiguous
- Add a method to ensure that an extent in a bufferlist is contiguous. Something like
bufferlist bl;
...
char *... - 08:33 PM Feature #9966: librados: set user_version operation
- My recollection is that we preserve them when moving objects in/out of the cache tier. I assume we want them to also...
- 06:38 PM Feature #9966: librados: set user_version operation
- What's the purpose of this? User versions are "user" only in the sense that they're the versions we expose to them as...
- 02:08 PM Feature #9966 (New): librados: set user_version operation
- 08:26 PM Feature #9953: osd: efficient ObjectStore::Transaction encoding
- haomai's slides
- 01:31 PM Feature #9953 (Resolved): osd: efficient ObjectStore::Transaction encoding
- Haomai and Dong proposed a vastly improved Transaction encoding during CDS. Video is here:
https://www.youtube.co... - 06:33 PM Bug #9752 (Pending Backport): acting in past intervals contains primary and up_primary (looks lik...
- 04:53 PM Bug #9752 (Fix Under Review): acting in past intervals contains primary and up_primary (looks lik...
- https://github.com/ceph/ceph/pull/2843
- 02:36 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
- On a cluster running from sources with...
- 01:52 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
- ...
- 01:44 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
- Fortunately I saved an entire osd directory from which I was able to extract osdmaps with duplicates related to attac...
- 10:36 AM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
- ...
- 10:33 AM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
- Could it be that the size of "the acting vector":https://github.com/ceph/ceph/blob/giant/src/osd/osd_types.h#L1391 is no...
- 03:26 PM Bug #9970 (Resolved): document erasure coded pool simple operations
- Move part of http://ceph.com/docs/master/dev/erasure-coded-pool/#interface to the rados operation guide and fix the i...
- 03:01 PM Bug #9969 (Can't reproduce): osd: crash in delete, tcmalloc, PGLog::write_log (dumpling)
- ...
- 02:17 PM Feature #8633 (Duplicate): allow writes before recovering a replica
- see #7861
- 02:10 PM RADOS Feature #9967 (New): rados: pool rollback
- roll back an entire pool to a previous snapshot. this is O(n): we enumerate objects and call rollback() on each one.
- 02:07 PM Feature #9965 (New): rados: new import from pipe/file
- - use file format from ceph_objectstore_tool and new export (#9964)
- take care to preserve snapshot state
- preser... - 02:05 PM Feature #9964 (Resolved): rados: new export [range] to pipe/file
- - export a range of hash values (or the entire pool) to stdout (or a file).
- use the same format that ceph_objectst... - 02:03 PM Feature #9963 (Fix Under Review): librados: improve get_objects and get_position interfaces
- The requirement is that export (or some other user) needs to be able to
#. partition the hash space into N segment... - 01:59 PM Feature #9962 (In Progress): osd: kill 'category' in stats and public API
- 01:58 PM Feature #9962 (Resolved): osd: kill 'category' in stats and public API
- 01:56 PM Feature #9961 (Resolved): osd: new MOSDClientSubOp and Reply
- Discussed during CDS here:
http://pad.ceph.com/p/hammer-fixed_memory_layout
http://youtu.be/CTp4eP9kPok
Create... - 01:48 PM Feature #9960 (Resolved): osd: adjust hint(s) for replica vs primary writes
- We should generally DONTNEED on replicas, regardless of what the client asked us to do.
- 01:48 PM Feature #9959 (Resolved): osd: pass client fadvise hints through to objectstore
- 01:47 PM Feature #9958 (Resolved): osd: add fadvise op to Objectstore::Transaction
- Add fadvise op to ObjectStore::Transaction. Mirror posix_fadvise(2).
See #9957. - 01:45 PM Feature #9957 (Resolved): librados: add fadvise op
- Add an fadvise operation to ObjectOperation. Mirror posix_fadvise(2).
Add it right around here: https://github.co... - 01:45 PM Feature #9956 (Resolved): osd: reenable alloc hints if kernel is known to be safe
- 01:42 PM Bug #9480 (Resolved): OSD is crashing while object deletion
- 01:37 PM Feature #9955 (Resolved): osd: allow encoded bufferlist to be used in place of map<K,V> for kv APIs
- This will avoid encode/reencode overhead to convert things to an STL structure. Eventually, once we pass through the...
- 01:31 PM RADOS Feature #9952: osd: smarter choice of primary to minimize recovery disruption
- We currently choose the first up osd as the primary unless it is impossible to do so. But, we can do better: other o...
- 01:30 PM RADOS Feature #9952 (New): osd: smarter choice of primary to minimize recovery disruption
- 01:29 PM Feature #7862 (In Progress): allow backfill/recovery while below min_size
- 01:28 PM Feature #9951 (New): librados, osd: per-object scrub operation
- librados operation to scrub a single object.
- 01:27 PM Feature #9950 (New): rados: add ability to read a specific replica/shard from CLI
- 01:25 PM Feature #9949 (New): librados: add ability to read a specific replica or shard
- Part of make scrub/repair work is being able to explicitly fetch any copy or shard of an object. Extend librados to ...
- 01:24 PM Feature #9948 (New): osd: add scrub result query interface
- This will use the admin interface (ceph tell <pgid> ...), similar to 'ceph tell <pgid> query'. results in json. see...
- 01:24 PM Feature #9947 (New): osd: store scrub error state in kv store; clear on peering event
- 11:39 AM Bug #9944 (Pending Backport): objecter: pool dne checks not correct
- 10:56 AM Bug #9944 (Fix Under Review): objecter: pool dne checks not correct
- https://github.com/ceph/ceph/pull/2839
- 09:06 AM Bug #9944 (Resolved): objecter: pool dne checks not correct
- ...
- 11:08 AM Bug #9942 (Won't Fix): Debian armhf packages are missing in latest repo updates for Debian in Fir...
- we don't (and never have) built armhf packages for ceph.com.
we do have a bunch of armv7l hardware and did build... - 04:26 AM Bug #9942 (Won't Fix): Debian armhf packages are missing in latest repo updates for Debian in Fir...
- I'm trying to install Ceph with ceph-deploy on an armhf cluster but it failed:
[MS0][ERROR ] RuntimeError: command ... - 10:35 AM Bug #9750: pg incomplete
- ...
- 10:11 AM CephFS Feature #1398: qa: multiclient file io test
- A task that implements this could be useful for testing calamari as well (I manually did some of the things needed he...
- 10:08 AM CephFS Feature #1398 (In Progress): qa: multiclient file io test
- 10:04 AM Bug #9945 (Resolved): giant: MClientSession COMPAT_VERSION is 2, should be 1
- yup!
- 09:52 AM Bug #9945 (Fix Under Review): giant: MClientSession COMPAT_VERSION is 2, should be 1
- https://github.com/ceph/ceph/pull/2837
https://github.com/ceph/ceph/pull/2838 - 09:41 AM Bug #9945 (Resolved): giant: MClientSession COMPAT_VERSION is 2, should be 1
- 09:37 AM CephFS Feature #9881 (In Progress): mds: admin command to flush the mds journal
- 07:55 AM Bug #9916: osd: crash in check_ops_in_flight
- The crash happened with radosgw as the client, so I guess it is formed by objecter - https://github.com/ceph/ceph/blo...
- 04:36 AM Feature #9943 (In Progress): osd: mark pg and use replica on EIO from client read
- Copy the below email thread and open an issue to track the enhancement....
- 02:56 AM Bug #9941 (Rejected): rados command line crashes when trying to copy pool snapshot
- We are exploring options to regularly preserve the contents of the pools backing our rados gateways. For that we crea...
- 12:53 AM Bug #8797: "ceph status" do not exit with python_2.7.8
- Just a note that people are hitting this in fedora 21, now:
https://bugzilla.redhat.com/show_bug.cgi?id=1155335
10/29/2014
- 09:34 PM CephFS Feature #9940: uclient: be more robust when dealing with outstanding RADOS IO and stale caps
- While in the general case it is necessary to fence clients that have become unresponsive to the MDS, this type of "so...
- 09:23 PM CephFS Feature #9940 (New): uclient: be more robust when dealing with outstanding RADOS IO and stale caps
- If we've given IO to the Objecter and our caps go stale, we need to do something to handle it.
- 09:06 PM CephFS Bug #1666 (Resolved): hadoop: time-related meta-data problems
- We now take client timestamps for almost everything, so this should no longer be a problem and I'm closing it unless ...
- 07:13 PM Bug #9939 (Resolved): "giant" no longer logs scrub errors
- Scrubbing problematic PGs no longer reports found errors: there are no more records of discovered errors in ...
- 02:49 PM Bug #9916 (Need More Info): osd: crash in check_ops_in_flight
- how is the OSDOp being formed? this looks like a bug on the client side to me. the attr ops should have name_len by...
- 02:45 PM Bug #9910 (Pending Backport): ceph_test_rados: out of order, probably due to message delay logic
- 11:22 AM Bug #9910 (Fix Under Review): ceph_test_rados: out of order, probably due to message delay logic
- https://github.com/ceph/ceph/pull/2832
- 01:16 PM Feature #9776: try to make address sanitizer work
- Ok, so the gcc version required to make this work is only a month or two old (dynamic linking bug fix). So, we're go...
- 01:11 PM Bug #9875 (Pending Backport): stuck recovering due to unfound hit_set object
- 11:44 AM rbd Bug #9936: Exporting images larger than 2GB fails
- PR: https://github.com/ceph/ceph/pull/2828
- 11:43 AM rbd Bug #9936 (Resolved): Exporting images larger than 2GB fails
- An lseek64 result code is copied into an int32, causing an overflow for large images.
- 11:37 AM RADOS Bug #9911: ceph not placing replicas to OSDs on same host as down/out OSD
- Sorry, forgot that the majority agreement does not work with two replicas. Everything is ok now.
- 10:44 AM RADOS Bug #9911: ceph not placing replicas to OSDs on same host as down/out OSD
- Andrey Korolyov wrote:
> Can confirm placement mess on giant: I am backfilling one node from another one within two-... - 10:41 AM RADOS Bug #9911: ceph not placing replicas to OSDs on same host as down/out OSD
- Can confirm placement mess on giant: I am backfilling one node from another one within two-node cluster. After today`...
- 11:17 AM Linux kernel client Bug #9928: kernel BUG at fs/ceph/caps.c:2307!
- The very first error message is:...
- 10:37 AM Linux kernel client Bug #9928: kernel BUG at fs/ceph/caps.c:2307!
- ...
- 08:30 AM Linux kernel client Bug #9928: kernel BUG at fs/ceph/caps.c:2307!
- MDS cache dump at ~/jcsp/9928/cachedump.1870.mds0 on teuthology.
This was taken at around 0800 local, long after t... - 07:55 AM Linux kernel client Bug #9928 (Resolved): kernel BUG at fs/ceph/caps.c:2307!
Client's view of its operations:...- 11:04 AM CephFS Bug #9935: client: segfault on ceph_rmdir path "/"
- Yes, EBUSY is what a local filesystem gives you, so that sounds right to me.
- 10:48 AM CephFS Bug #9935 (Resolved): client: segfault on ceph_rmdir path "/"
- A segfault occurs when removing the root directory. What is the expected behavior? I think -EBUSY is what makes sense.
- 10:00 AM Bug #9891: "Assertion: os/DBObjectMap.cc: 1214: FAILED assert(0)" in upgrade:firefly-x-giant-dist...
- does not appear to be a ceph issue.. either bad disk or leveldb corruption or something. lowering priority.
- 09:54 AM rgw Documentation #9934 (Closed): rgw: document backing pool capabilities and API usage
- Document what RGW is capable of in terms of defining multiple backing RADOS pools and how they can be used via the S3...
- 09:52 AM rgw Feature #9933 (New): rgw: implement S3 RR (reduced redundancy) API
- - mark a particular backing pool as the 'rr' one
- make RGW understand the S3 API for RR and use that backing pool f... - 09:51 AM rgw Feature #9932 (Resolved): rgw: map swift X-Storage-Policy header to rgw pools
- This will let people use the new Swift "storage policies" API to use the preexisting RGW functionality
- 09:29 AM Subtask #9931 (New): create selinux policies for ceph-mon, ceph-osd, ceph-mds
- From an internal red hat discussion:
There are probably three distinct things we need to do to get ceph and
SELi... - 09:27 AM Cleanup #9930 (New): gtest: update, move to submodule
- the version we have is very old. update to a newer version, and possibly/probably move to a submodule.
- 05:25 AM Bug #9927: RHEL: selinux-policy-targeted rpm update triggers slow requests
- Here's a solution:...
- 03:46 AM Bug #9927: RHEL: selinux-policy-targeted rpm update triggers slow requests
- It is triggered by fixfiles -C /etc/selinux/targeted/contexts/files/file_contexts.pre restore...
- 03:35 AM Bug #9927 (Can't reproduce): RHEL: selinux-policy-targeted rpm update triggers slow requests
- We observe slow requests while updating a server to RHEL6.6. The upgrade includes selinux-policy-targeted, which runs...
- 12:11 AM Bug #9919 (Resolved): tests: qa/workunits/cephtool/test.sh injectargs instability
10/28/2014
- 10:52 PM Feature #9926 (Resolved): AsyncMessenger: Support kqueue interface for BSD and mac osx OS
- 09:14 PM Bug #9910: ceph_test_rados: out of order, probably due to message delay logic
- ...
- 09:08 PM Bug #9910: ceph_test_rados: out of order, probably due to message delay logic
- wip-9910
- 09:00 PM Bug #9910: ceph_test_rados: out of order, probably due to message delay logic
- yeah, almost certain this is a bug with delayed messages. testing a fix.
ubuntu@teuthology:/a/sage-bug-9910-a/576723 - 04:25 PM Bug #9910 (In Progress): ceph_test_rados: out of order, probably due to message delay logic
- reproducing with client ms logs
- 05:35 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
I happen to notice the issue because I happen to look at this guys pastebin. I didn't interact with him at all. N...- 03:46 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
- It is unfortunately gone...
- 03:00 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
- First thing we want to get is an osdmap from the misbehaving epoch.
Loic: you can get the osdmap for a particular ... - 02:21 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
- Actually, that thread is the same instance as david's.
- 02:10 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
- See the thread "[ceph-users] Troubleshooting Incomplete PGs" for another instance of this (and there are several more...
- 05:08 PM Bug #9921: msgr/osd/pg dead lock giant
- https://github.com/ceph/ceph/pull/2825
- 04:56 PM Bug #9921 (Fix Under Review): msgr/osd/pg dead lock giant
- wip-9921, totally untested.
- 02:51 PM Bug #9921: msgr/osd/pg dead lock giant
- From what I recall, none of these are simple locks to get rid of. I'm not actually sure how to go about it; even some...
- 02:14 PM Bug #9921: msgr/osd/pg dead lock giant
- SimpleMessenger lock is held by an accepting Pipe trying to replace an old Pipe:...
- 01:50 PM Bug #9921: msgr/osd/pg dead lock giant
- nvm, different deadlock
- 01:49 PM Bug #9921 (Duplicate): msgr/osd/pg dead lock giant
- just kidding, this appears to be 9898
- 11:03 AM Bug #9921 (Resolved): msgr/osd/pg dead lock giant
- commit:2d6980570af2226fdee0edfcfe5a8e7f60fae615
/a/teuthology-2014-10-27_02:32:02-rados-giant-distro-basic-multi/5... - 03:42 PM Bug #9750: pg incomplete
- I'm afraid these maps are lost...
- 03:22 PM Bug #9750: pg incomplete
- Yeah, you'll want maps from back when the acting set was wonky. Might want to look into the past intervals code perh...
- 02:25 PM Bug #9919 (Fix Under Review): tests: qa/workunits/cephtool/test.sh injectargs instability
- https://github.com/ceph/ceph/pull/2823
- 09:42 AM Bug #9919 (Resolved): tests: qa/workunits/cephtool/test.sh injectargs instability
- Modifying *osd_debug_drop_ping_probability = '444'* introduces a side effect on the cluster that can create pro...
- 12:43 PM CephFS Bug #9900 (Duplicate): Failure in multiple_rsync (directories wrongly appear changed)
- I imagine this is a dup of #9894?
- 12:24 PM Linux kernel client Bug #5429: libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
- I bet there is another trace of this somewhere, no rcu stall, just plain NULL deref in rb_erase(). Will try to inves...
- 11:36 AM Linux kernel client Bug #5429: libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
- Got reports of the 2nd trace (http://tracker.ceph.com/issues/5429#note-7) occurring on a kernel with the notify fixes.
- 12:18 PM CephFS Bug #9800 (Pending Backport): client-limits test is not passing
- I don't know that we need/want to try and push this in before release (although since it's all guarded inside of a br...
- 05:29 AM CephFS Bug #9800 (Resolved): client-limits test is not passing
- ...
- 11:21 AM Bug #9920: admin socket check hang, osd appears fine
- Hmm, osd.4 seems fine, not sure why the admin socket check didn't work.
- 10:00 AM Bug #9920 (Can't reproduce): admin socket check hang, osd appears fine
- Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-27_17:18:01-upgrade:firefly-x-giant-distro-basic-v...
- 11:12 AM CephFS Bug #8255 (Fix Under Review): mds: directory with missing object cannot be removed
- https://github.com/ceph/ceph/pull/2821
- 08:54 AM Bug #9288 (Resolved): "Assertion `nlock == 0' failed" in upgrade:firefly-firefly-testing-basic-vp...
- 08:52 AM rgw Bug #9866 (Resolved): "test_s3.test_multipart_upload ... ERROR" in upgrade:firefly:older-firefly-...
- 04:23 AM rgw Bug #9918: RGW-Swift: SubUser access permissions do not seem to work
- 2014-10-28 16:43:28.776693 7f5cd87c0700 1 civetweb: 0x7f5d2c0093f0: 127.0.0.1 - - [28/Oct/2014:16:43:28 +0530] "GET ...
- 04:18 AM rgw Bug #9918 (Resolved): RGW-Swift: SubUser access permissions do not seem to work
- Create users and sub-users in generic development env:-
This is relevant json DS:-
{ "user_id": "user1",
"disp... - 03:58 AM rgw Bug #9917: RADOSGW: Not able to create Swift objects with erasure coded pool
- 2014-10-28 15:59:41.468515 7f0863fef700 20 RGWEnv::set(): HTTP_HOST: localhost:8000
2014-10-28 15:59:41.468583 7f086... - 03:58 AM rgw Bug #9917: RADOSGW: Not able to create Swift objects with erasure coded pool
- able to create rados object:-
#./ceph osd pool create mypool 20 20 erasure
DEVELOPER MODE: setting PATH, PYTHONPA... - 03:56 AM rgw Bug #9917 (Won't Fix): RADOSGW: Not able to create Swift objects with erasure coded pool
- ceph@Ubuntu14:~/ceph-0.86/src$ MON=3 MDS=0 RGW=1 OSD=3 ./vstart.sh -d -n -x -r
going verbose **
[./fetch_config /tm...
10/27/2014
- 10:21 PM Bug #9916 (Resolved): osd: crash in check_ops_in_flight
- Assertion failure:...
- 07:44 PM Bug #9915 (Resolved): osd: eviction logic reversed
- commit:622c5ac
- 06:17 PM CephFS Feature #4138 (Fix Under Review): MDS: forward scrub: add functionality to verify disk data is co...
- This bit at least has been isolated and put into a PR:
https://github.com/ceph/ceph/pull/2814 - 04:56 PM Bug #9910: ceph_test_rados: out of order, probably due to message delay logic
- another one: ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-10-27_02:32:02-rados-giant-distro-basic-m...
- 01:15 PM Bug #9910 (Resolved): ceph_test_rados: out of order, probably due to message delay logic
- /a/samuelj-2014-10-24_23:51:24-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/571220
* commit:f7431cc... - 04:23 PM CephFS Bug #9870 (Resolved): kernel: not handling cap_flush_ack messages properly
- 03:33 PM Linux kernel client Bug #9894: kcephfs: rm -r left files behind
- 03:25 PM rbd Bug #9391: fio rbd driver rewrites same blocks
- @Mark: I have to take a look at fio for that. Is this all about sequential writes only? Do you see a different behavi...
- 02:44 PM Bug #9891: "Assertion: os/DBObjectMap.cc: 1214: FAILED assert(0)" in upgrade:firefly-x-giant-dist...
- 2014-10-25 18:56:23.243456 7fefdcc3e700 20 filestore dbobjectmap: seq is 485
2014-10-25 18:56:23.243559 7fefdd43f700... - 02:31 PM Bug #9913 (Resolved): mon: audit log entires for forwarded requests lack info
- ...
- 02:27 PM Bug #9912 (Won't Fix): ceph osd up # not a valid command in 0.80.7
- there is no way to administratively make an osd 'up'. the daemon needs to go through its startup procedure and join...
- 02:24 PM Bug #9912 (Won't Fix): ceph osd up # not a valid command in 0.80.7
- There is a valid command for setting an osd down:...
- 02:16 PM RADOS Bug #9911: ceph not placing replicas to OSDs on same host as down/out OSD
- ceph -s output with an OSD down and type host:...
- 02:11 PM RADOS Bug #9911 (Rejected): ceph not placing replicas to OSDs on same host as down/out OSD
- On a 3 node firefly cluster with 6 OSDs per host and 3x replication, when noup is set and 1 OSD is marked down/out, a...
- 01:38 PM Feature #9598: re-enable Objecter fast dispatch
- 01:13 PM Bug #9909 (Resolved): lost_unfound test/rados tool flawed, EEXIST when putting empty object
- ubuntu@teuthology:/a/samuelj-2014-10-24_23:51:24-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/571037
... - 01:09 PM Bug #7995: osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
- ubuntu@teuthology:/a/samuelj-2014-10-24_23:51:24-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/571474/r...
- 11:31 AM rgw Bug #9877: In some cases it's possible for rgw to segfault on http COPY
- You mean #9226 ?
- 11:16 AM rgw Bug #9907 (Resolved): radosgw-admin: can't disable max_size quota
- From pull request, by Dong Lei:...
- 11:05 AM Linux kernel client Feature #9906 (Resolved): Inline data support
Currently the fuse client supports CEPH_FEATURE_MDS_INLINE_DATA but the kernel client does not.
- 10:28 AM CephFS Bug #9904 (Resolved): Don't crash MDS on clients sending messages with bad seq
- Currently in Server::handle_client_session, we do this:...
- 10:14 AM CephFS Feature #9903 (Resolved): Recover lost dirfrag via data pool
[While the MDS cluster is offline and journal has been flushed if necessary]
Given that a particular dirfrag obj...- 10:10 AM Bug #9731: Ceph 0.80.6 OSD crashes
- Ok, let me know what happens.
- 10:09 AM Bug #9731: Ceph 0.80.6 OSD crashes
- Nothing reported from valgrind. Also haven't seen crashes lately. At this point I'm thinking the issues were corrup...
- 10:06 AM Feature #9902 (Duplicate): Tool for RADOS import/export pool to file
To assist with CephFS disaster recovery, provide the ability to dump an entire pool (the cephfs metadata pool) to a...- 10:00 AM Support #9901 (New): libgoogle-perftools4: tcmalloc performance regression on armhf
- Just to keep track of https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=766986
- 09:36 AM CephFS Bug #9900 (Duplicate): Failure in multiple_rsync (directories wrongly appear changed)
http://pulpito.ceph.com/teuthology-2014-10-24_23:08:01-kcephfs-giant-testing-basic-multi/570840/
http://pulpito.ce...- 09:34 AM rgw Bug #9892: radosgw_admin.py: failed len(out['entries']) == 0 on usage show
- seems like a broken test. We write an object here:...
- 09:09 AM rgw Bug #9148: rgw: multiregion tests failing, s3tests.functional.test_s3.test_region_copy_object
- Seems that the slow_backend param has not been applied on the s3tests giant branch.
- 09:08 AM rgw Bug #9148: rgw: multiregion tests failing, s3tests.functional.test_s3.test_region_copy_object
- in latest run, still trying to copy the 100M:...
- 09:09 AM devops Bug #9747 (Resolved): ceph.spec.in will always use 95-ceph-osd-alt.rules
- 08:24 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
- Update in run http://pulpito.front.sepia.ceph.com/teuthology-2014-10-26_18:13:01-upgrade:firefly-x-giant-distro-basic...
- 06:05 AM CephFS Bug #9800: client-limits test is not passing
- https://github.com/ceph/ceph/pull/2809
http://pulpito.front.sepia.ceph.com/john-2014-10-27_13:05:29-fs:recovery-wip-...
10/26/2014
- 07:54 PM Bug #9895 (Duplicate): Master/giant branch: OSD deadlock during recovery
- #9898
- 11:24 AM Bug #9895 (Duplicate): Master/giant branch: OSD deadlock during recovery
- Given an eight-OSD, two-node cluster (node01 and node04) with three mons (node01, node04, twin2). OSDs placed on node04 acts...
- 04:51 PM rbd Bug #9391: fio rbd driver rewrites same blocks
- Hi Guys,
This is all on the fio side. From what I remember, when you are doing sequential writes and specify mult... - 03:33 PM rgw Bug #9899 (Resolved): Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-d...
- Seems related to rgw and 3-upgrade-sequence/upgrade-osd-mon-mds.yaml configurations...
- 02:33 PM Messengers Bug #9898: osd: fast dispatch deadlock in mark_down (giant)
- Looks like the same issue I reported a few hours earlier: #9895. Please close mine or this one as a duplicate.
- 12:19 PM Messengers Bug #9898: osd: fast dispatch deadlock in mark_down (giant)
- ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-10-24_21:12:40-rados-wip-sam-testing-distro-basic-multi/570144
- 12:18 PM Messengers Bug #9898: osd: fast dispatch deadlock in mark_down (giant)
- full backtrace
- 12:17 PM Messengers Bug #9898 (Resolved): osd: fast dispatch deadlock in mark_down (giant)
- this is basically a dup of the issue we saw with fast dispatch in the objecter, but with the osd....
- 11:49 AM rbd Bug #9855 (Resolved): rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-...
- fixed test
- 11:48 AM Linux kernel client Bug #9896: krbd: EPERM from map-snapshot-io.sh
- ubuntu@teuthology:/a/teuthology-2014-10-24_23:06:01-krbd-giant-testing-basic-multi/570827 too
- 11:48 AM Linux kernel client Bug #9896 (Resolved): krbd: EPERM from map-snapshot-io.sh
- ...
- 11:24 AM Linux kernel client Bug #9894 (Resolved): kcephfs: rm -r left files behind
- ...
- 11:21 AM rgw Bug #9148: rgw: multiregion tests failing, s3tests.functional.test_s3.test_region_copy_object
- also
ubuntu@teuthology:/a/teuthology-2014-10-24_23:02:01-rgw-giant-distro-basic-multi/570719
ubuntu@teuthology:/a/t...
- 11:16 AM rgw Bug #9148: rgw: multiregion tests failing, s3tests.functional.test_s3.test_region_copy_object
- teuthology-2014-10-24_23:02:01-rgw-giant-distro-basic-multi/570701 fails with slow_backend:true on giant....
- 11:19 AM rgw Bug #9892 (Resolved): radosgw_admin.py: failed len(out['entries']) == 0 on usage show
- ...
- 08:42 AM Bug #9891 (Resolved): "Assertion: os/DBObjectMap.cc: 1214: FAILED assert(0)" in upgrade:firefly-x...
- Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-25_18:13:01-upgrade:firefly-x-giant-distro-basic-m...
- 05:03 AM Subtask #9890: mon: VIRT usage 2.4G larger than tcmalloc's VIRT stats (dumpling, centos6.3)
- forgot to mention that leveldb stores for all mons are several GB large, even after compaction:...
- 04:57 AM Subtask #9890: mon: VIRT usage 2.4G larger than tcmalloc's VIRT stats (dumpling, centos6.3)
- mon.c (in quorum) is being the synchronization provider for mon.b (restarted with valgrind memcheck).
mon.c's spik...
- 04:39 AM Subtask #9890 (Can't reproduce): mon: VIRT usage 2.4G larger than tcmalloc's VIRT stats (dumpling...
* centos 6.3
* ceph version 0.67.11 (bc8b67bef6309a32361be76cd11fb56b057ea9d2)
* Stressing the monitors with qa...
- 04:26 AM Bug #9889 (Closed): mon: leveldb weirdness
- Inquiries into leveldb on the monitors and the weirdness sometimes associated with it.
This ticket is being used to track severa...
10/25/2014
- 11:04 AM Feature #9888: AsyncMessenger: Async event threads can shared by all AsyncMessenger
- +1
- 07:32 AM Feature #9888 (Resolved): AsyncMessenger: Async event threads can shared by all AsyncMessenger
- Currently, each AsyncMessenger creates "ms_async_op_threads" threads, which process incoming/outgoing connections....
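A minimal Python stand-in for the proposed sharing (a sketch only, not the Ceph C++ implementation; class and pool names are mine): instead of each messenger owning its own event threads, all instances borrow from one process-wide pool.

```python
from concurrent.futures import ThreadPoolExecutor

# One process-wide worker pool (stand-in for the proposed shared async event
# threads), sized once instead of once per messenger instance.
_SHARED_POOL = ThreadPoolExecutor(max_workers=3)

class AsyncMessengerSketch:
    """Hypothetical messenger that submits connection work to the shared pool."""
    def __init__(self, pool=_SHARED_POOL):
        self.pool = pool

    def dispatch(self, fn, *args):
        # Connection processing runs on the shared threads, not private ones.
        return self.pool.submit(fn, *args)

m1 = AsyncMessengerSketch()
m2 = AsyncMessengerSketch()
assert m1.pool is m2.pool  # both messengers reuse the same threads
```

The point of the feature request is exactly this sharing: thread count stops scaling with the number of messenger instances.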
10/24/2014
- 09:27 PM Bug #9727: 0.86 EC+ KV OSDs crashing
- Not sure; I'm still waiting for a crash on the master branch
- 06:10 AM Bug #9727: 0.86 EC+ KV OSDs crashing
- Will this issue be solved in Giant?
- 02:38 PM Bug #9746: reconcile upstream ceph.spec.in with other ceph.spec (SuSE, EPEL, etc)
- https://build.opensuse.org/package/show/home:netsroth/ceph
- 02:06 PM Bug #9731: Ceph 0.80.6 OSD crashes
- Still no crashes under valgrind? How many osds are running under valgrind? We should probably leave it running for ...
- 01:20 PM Bug #9480 (Pending Backport): OSD is crashing while object deletion
- 11:40 AM rgw Bug #9866: "test_s3.test_multipart_upload ... ERROR" in upgrade:firefly:older-firefly-distro-basi...
- yuri, please close it when we get a pass on the nightlies.
- 11:35 AM rgw Bug #9886 (Resolved): rgw: apache 2.4 does not send http status reason string
- There's an issue with certain apache 2.4 versions, where it doesn't send back the http status reason in the response....
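The failure mode can be illustrated with a small status-line parser (a Python sketch, not rgw or Apache code; names are mine): a client that assumes a reason phrase is always present breaks on such responses, while a tolerant parser does not.

```python
# Hypothetical sketch: parse an HTTP/1.x status line, tolerating the missing
# reason phrase that some Apache 2.4 versions send (e.g. "HTTP/1.1 403").
def parse_status_line(line):
    """Return (status_code, reason); reason may be an empty string."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError("malformed status line: %r" % line)
    # Don't require parts[2]: the reason phrase may simply be absent.
    reason = parts[2] if len(parts) == 3 else ""
    return int(parts[1]), reason
```

A strict parser that unconditionally indexes the third field would raise on the Apache responses described above.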
- 11:34 AM rgw Bug #9878 (Pending Backport): rhel7 s3-tests fail due to missing reason
- commit:a9dd4af
- 11:26 AM rbd Bug #8912: librbd segfaults when creating new image (rbd-ephemeral-clone-stable-icehouse)
- For better searchability, the backtrace for this crash is:...
- 11:24 AM rbd Bug #9513 (Pending Backport): rbd_cache=true default setting is degrading librbd performance ~10X ...
- reverted the backport for now as fully fixing the ObjectCacher is too large a change close to the giant release
- 11:14 AM CephFS Bug #9884: too many files in /usr for multiple_rsync.sh
- Yeah, just cutting it down to a more predictable/smaller directory sounds good to me.
- 10:50 AM CephFS Bug #9884: too many files in /usr for multiple_rsync.sh
- one failure http://pulpito.ceph.com/teuthology-2014-10-20_23:04:01-fs-giant-distro-basic-multi/562537/
- 10:49 AM CephFS Bug #9884 (Closed): too many files in /usr for multiple_rsync.sh
- for example, plana81 has 60k files in /usr, but plana90 has 90k files in /usr. perhaps multiple_rsync should /usr/src...
- 10:51 AM Bug #9873 (Resolved): rados bench crash
- 10:15 AM Bug #9873: rados bench crash
- ubuntu@teuthology:/a/samuelj-2014-10-23_17:44:53-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/567665
- 09:38 AM Bug #9873 (Fix Under Review): rados bench crash
- https://github.com/ceph/ceph/pull/2795
- 09:07 AM Bug #9873 (In Progress): rados bench crash
- 09:53 AM CephFS Feature #3882 (Rejected): Hide snapshot directory name in mount/mtab
- we can now restrict snap access by uid...
- 09:49 AM CephFS Feature #9883 (Resolved): journal-tool: smarter scavenge (conditionally update dir objects)
- 09:42 AM CephFS Feature #9881 (Resolved): mds: admin command to flush the mds journal
- 09:41 AM CephFS Feature #9880 (Resolved): mds: more gracefully handle EIO on missing dir object
- 08:53 AM rgw Bug #9877: In some cases it's possible for rgw to segfault on http COPY
- looks like #9266.
10/23/2014
- 09:35 PM rgw Bug #9878: rhel7 s3-tests fail due to missing reason
- 06:10 PM rgw Bug #9878 (Resolved): rhel7 s3-tests fail due to missing reason
- commit:a9dd4af401328e8f9071dee52470a0685ceb296b
- 06:08 PM rgw Bug #9169 (Resolved): 100-continue broken for centos/rhel
- 04:58 PM rgw Bug #9877 (Resolved): In some cases it's possible for rgw to segfault on http COPY
on 0.80.4
-81> 2014-10-23 22:22:05.586898 7f83547f8700 1 ====== starting new request req=0x7f8368013400 ==...
- 03:03 PM Bug #9876 (Resolved): failed pull needs to allow mark_unfound_lost revert eventually
- 01:50 PM rgw Bug #9616 (Resolved): upgrade test restarts rgw, test gets 500
- 01:47 PM CephFS Bug #9869 (Pending Backport): Client: not handling cap_flush_ack messages properly
- I tested this manually with a patch that sets the starting tid value to 65535 and looked at the logs. That causes im...
- 01:47 PM rbd Bug #9854: librbd: reads contending for cache space can cause livelock
- 01:47 PM rbd Bug #9854: librbd: reads contending for cache space can cause livelock
- Reads thrashing the cache can be reproduced with:...
- 01:44 PM Bug #9821 (Pending Backport): failed to recover before timeout expired
- 09:41 AM Bug #9821 (Fix Under Review): failed to recover before timeout expired
- 12:47 PM CephFS Bug #9870: kernel: not handling cap_flush_ack messages properly
- 12:43 PM Bug #9372: injectarg boolean option is discarded
- There is a workaround (using --); not sure it deserves backporting.
- 12:41 PM Bug #9372 (Resolved): injectarg boolean option is discarded
- 11:38 AM rbd Feature #9733: Separate rbd listing into CAP
- Is the list of OSD class methods documented somewhere?
- 11:37 AM Bug #9731: Ceph 0.80.6 OSD crashes
- Other details as per sjustwork on irc:
* 3-node ceph cluster, 2 OSDs per node (1ssd 1hdd). All ssds are assigned ...
- 10:34 AM Bug #9731: Ceph 0.80.6 OSD crashes
- backtrace from last core...
- 10:19 AM Bug #9731: Ceph 0.80.6 OSD crashes
- Forgot to attach latest core file from the crash prior to testing with valgrind when running wip-9731
- 11:30 AM Bug #9836: mon unit tests use the wrong id
- Although it could be backported to giant and firefly, it does not create actual problems. Only some tests use the mon...
- 11:28 AM Bug #9836 (Resolved): mon unit tests use the wrong id
- 09:59 AM Bug #9408 (Pending Backport): erasure-code: misalignment
- It can't be easily cherry picked because the code has changed. That can happen on firefly too. Backporting would make...
- 09:44 AM Bug #9874: ceph_test_rados, out of order ops
- - exec:
client.0:
- ceph osd pool create base 4
- ceph osd pool create cache 4
- ceph osd tier ad...
- 08:54 AM Bug #9874 (Duplicate): ceph_test_rados, out of order ops
- 2014-10-22T17:06:21.115 INFO:tasks.rados.rados.0.burnupi60.stderr:Error: finished tid 3 when last_acked_tid was 7
20...
- 09:21 AM Bug #7995: osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
- ubuntu@teuthology:/a/samuelj-2014-10-22_14:27:22-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/566853/r...
- 09:07 AM Bug #9875 (Resolved): stuck recovering due to unfound hit_set object
- The hitset creation log entries have the same version for version and prior_version. This causes divergent entry det...
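A sketch of the invariant at stake (assumed pg-log semantics, not actual Ceph code): each log entry's prior_version should point at the object's previous version, so an entry whose version equals its own prior_version is self-referential and confuses chain-walking logic such as divergent-entry detection.

```python
# Hypothetical sketch: check that no pg-log entry for an object has
# version == prior_version, the condition described in this bug for
# hitset creation entries.
def chain_is_sane(entries):
    """entries: list of (version, prior_version) tuples for one object."""
    return all(v != pv for v, pv in entries)

assert chain_is_sane([(1, 0), (2, 1)])      # normal chain
assert not chain_is_sane([(2, 2), (3, 2)])  # self-referential hitset entry
```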
- 08:50 AM Bug #9873 (Resolved): rados bench crash
- 2014-10-23T00:25:06.570 INFO:tasks.radosbench.radosbench.0.mira034.stderr:osdc/Objecter.cc: 3971: FAILED assert(!tick...
- 08:49 AM devops Fix #5900: Create a Python package for ceph Python bindings
- https://github.com/ceph/ceph/compare/wip-5900
- 04:01 AM rgw Feature #8562 (Fix Under Review): rgw: Conditional PUT on ETag
10/22/2014
- 09:15 PM Documentation #9872 (Closed): erasure-code: document the LRC per layer plugin configuration
- It is possible to set the profile on a per layer basis using the low level configuration http://ceph.com/docs/master/...
- 06:16 PM Bug #9731: Ceph 0.80.6 OSD crashes
- We don't really want leak-check, it is likely slowing down the osds more than necessary.
- 05:23 PM Bug #9731: Ceph 0.80.6 OSD crashes
- so far no luck replicating this with...
- 04:45 PM Bug #9731: Ceph 0.80.6 OSD crashes
- We probably want to let them run under valgrind overnight if possible.
- 03:32 PM Bug #9731: Ceph 0.80.6 OSD crashes
- Right, I couldn't get 3/3 under valgrind to ever come up to a good health, probably because of the load on it. Howev...
- 03:27 PM Bug #9731: Ceph 0.80.6 OSD crashes
- (Last I heard, 2/3 were running valgrind, cluster is healthy)
Question: what version are the clients?
- 08:16 AM Bug #9731: Ceph 0.80.6 OSD crashes
- the 3rd OSD won't join, it is now always aborting at startup. log attached. Perhaps all the starting/stopping has c...
- 08:01 AM Bug #9731: Ceph 0.80.6 OSD crashes
- after installing wip-9731 but before running under valgrind, I received a crash at 2014-10-22 10:44:42.326583 log at...
- 07:51 AM Bug #9731: Ceph 0.80.6 OSD crashes
- I've got ceph updated to the wip-9731, and am attempting to start the OSDs under valgrind. However, the first one ap...
- 05:34 PM CephFS Bug #9870 (Resolved): kernel: not handling cap_flush_ack messages properly
- This is the analogue to #9869, which Zheng tells me is also a problem in the kernel. We need to downcast the message ...
- 05:30 PM CephFS Bug #9869: Client: not handling cap_flush_ack messages properly
- Waiting for this to build so it can be tested.
- 05:28 PM CephFS Bug #9869 (Resolved): Client: not handling cap_flush_ack messages properly
- We saw a log segment that contained this:...
- 04:47 PM Fix #9566 (Fix Under Review): osd: prioritize recovery of OSDs with most work to do
- Here is a draft for review: https://github.com/ceph/ceph/pull/2778 if this sounds reasonable I'll write tests. Otherw...
- 02:47 PM Documentation #9867 (Closed): PGs per OSD documentation needs clarification
- Documentation in question:
http://ceph.com/docs/master/rados/operations/placement-groups/
http://ceph.com/docs/mast...
- 02:37 PM rgw Bug #9169: 100-continue broken for centos/rhel
- The problem seems to be unrelated to the fastcgi module. The actual issue is that we're running apache with mpm co...
- 01:15 PM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
- I think it's a monitor bug. It took about two hours to commit an update...
- 11:06 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
- Let's add debug to the test yaml so that we have logs next time?
- 10:59 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
- There is no mds log or client log, but ceph.log on both burnupi58 and burnupi58 looks strange...
- 09:31 AM Bug #9864 (Can't reproduce): osd doesn't report new stats for 3 hours when running test LibCephFS...
- ...
- 12:53 PM Bug #9480: OSD is crashing while object deletion
- 12:32 PM rgw Bug #9866 (Fix Under Review): "test_s3.test_multipart_upload ... ERROR" in upgrade:firefly:older-...
- https://github.com/ceph/ceph-qa-suite/pull/209
- 10:30 AM rgw Bug #9866 (Resolved): "test_s3.test_multipart_upload ... ERROR" in upgrade:firefly:older-firefly-...
- Run http://pulpito.front.sepia.ceph.com/teuthology-2014-10-21_18:40:01-upgrade:firefly:older-firefly-distro-basic-vps...
- 12:17 PM rbd Bug #9854 (In Progress): librbd: reads contending for cache space can cause livelock
- 11:41 AM rbd Bug #9854: librbd: reads contending for cache space can cause livelock
- Update:
Run teuthology-2014-10-21_23:17:01-upgrade:firefly:newer-firefly-distro-basic-vps
Job: ['565380']
Logs...
- 11:35 AM Bug #9859 (Resolved): Commit 2ac2a96 appears to break OSD creation
- 10:43 AM Bug #9859: Commit 2ac2a96 appears to break OSD creation
- Problem has been identified.
This went unnoticed as vstart.sh, even with cephx disabled, always creates a keyring,...
- 10:18 AM Bug #9859: Commit 2ac2a96 appears to break OSD creation
- also, 2ac2a96 is the merge commit for the branch of c0e3bc9a
- 10:11 AM Bug #9859: Commit 2ac2a96 appears to break OSD creation
- Yesterday I got as far as figuring out that the monitor is not handling 'MMonGetMap' messages from the OSD during mkfs because the OSD...
- 09:59 AM Bug #9859 (In Progress): Commit 2ac2a96 appears to break OSD creation
- 11:29 AM rgw Bug #9865 (Resolved): "Assertion: osdc/ObjectCacher.cc" in upgrade:firefly:older-firefly-distro-b...
- pushed fix to giant and firefly branches of ceph-qa-suite
- 11:19 AM rgw Bug #9865: "Assertion: osdc/ObjectCacher.cc" in upgrade:firefly:older-firefly-distro-basic-vps run
- thrasher needs to not thrash primary affinity in this case. client connects before the primary-affinity is set so th...
- 11:10 AM rgw Bug #9865 (In Progress): "Assertion: osdc/ObjectCacher.cc" in upgrade:firefly:older-firefly-distr...
- 10:22 AM rgw Bug #9865 (Resolved): "Assertion: osdc/ObjectCacher.cc" in upgrade:firefly:older-firefly-distro-b...
- Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-21_18:40:01-upgrade:firefly:older-firefly-distro-b...
- 10:49 AM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
- Full logs from pastebin to survive expiration.
- 10:32 AM Bug #8885: SIGABRT in TrackedOp::dump() via dump_ops_in_flight()
- /a/samuelj-2014-10-21_16:45:57-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/564093/remote
- 10:29 AM Bug #9675: splitting a pool doesn't start when rule_id != ruleset_id
- Note that this patch will not change existing crushmaps, it will just make new rules using matching ruleset_id == rul...
- 10:00 AM Bug #9675 (Pending Backport): splitting a pool doesn't start when rule_id != ruleset_id
- pending backports awaiting review/merge on
* dumpling: https://github.com/ceph/ceph/pull/2775
* emperor: https://...
- 10:05 AM Bug #9851: crash on journal/filestore shutdown on firefly
- Running http://pulpito.ceph.com/loic-2014-10-22_10:04:57-upgrade:firefly-x-giant-testing-basic-vps/ which is s/branch...
- 09:14 AM Bug #9851: crash on journal/filestore shutdown on firefly
- Running http://pulpito.ceph.com/loic-2014-10-22_20:28:41-rados:thrash-wip-9851-testing-basic-vps/
- 09:09 AM Bug #9851: crash on journal/filestore shutdown on firefly
- 09:05 AM Bug #9851: crash on journal/filestore shutdown on firefly
- I wonder how to re-run http://pulpito.ceph.com/teuthology-2014-10-18_19:22:02-upgrade:firefly-x-giant-distro-basic-mu...
- 10:01 AM Bug #9852 (Fix Under Review): mon: monitor asserts on 'ceph mds add_data_pool X' if X is an ID th...
- https://github.com/ceph/ceph/pull/2773
- 09:46 AM rbd Bug #9857 (Resolved): rbd readahead division by zero exception
- 09:45 AM rbd Bug #9857: rbd readahead division by zero exception
- PR: https://github.com/ceph/ceph/pull/2770
- 08:53 AM devops Bug #9860: grub/os-prober launch kills most ceph OSD
- And sda1, which is the ext4-mounted disk of osd.13
Oct 22 07:42:00 stri os-prober: debug: running /usr/lib/os-probe...
- 08:45 AM devops Bug #9860: grub/os-prober launch kills most ceph OSD
- Logs detailing what os-prober was doing when one of the OSD crashed, sda2 is the journal partition of osd.13 who got ...
- 08:42 AM devops Bug #9860: grub/os-prober launch kills most ceph OSD
- 08:25 AM devops Bug #9860: grub/os-prober launch kills most ceph OSD
- Adding more complete log lines with ASSERT references
<guerby> 2014-10-22 07:42:07.369785 7f6edf0b5700 0 -- 192.1...
- 12:12 AM devops Bug #9860 (Fix Under Review): grub/os-prober launch kills most ceph OSD
- h3. Workaround
Disable os-probe with ...
- 08:09 AM Bug #9858 (Rejected): osd crush rule create-erasure idempotency failure
- This was a side effect of process being killed at random. It was possible to reproduce it consistently until https://...
- 04:10 AM Bug #5925: hung ceph_test_rados_delete_pools_parallel
- this was fun though.
I'll stop with the noise now and test this with the patch from #9845.
- 04:08 AM Bug #5925 (Can't reproduce): hung ceph_test_rados_delete_pools_parallel
- and then I read David's comments on this ticket and I felt dumb.
- 04:06 AM Bug #5925: hung ceph_test_rados_delete_pools_parallel
- My last statement about the tick event was inaccurate.
gdb tells me that 'tick_event' is still set by the time we i...
- 03:48 AM Bug #5925: hung ceph_test_rados_delete_pools_parallel
- Hit this again while testing a mon patch. Setting to this 'Verified' again until I check with David or Sam on what t...
- 02:58 AM Bug #9585: ceph assertion using rocksdb store in master branch
- Hi Tamilarasi, is it still broken on the master branch? Can you give a link to the corresponding job on pulpito.ceph.com?
- 02:56 AM Bug #9814 (Resolved): FAILED assert(0) In function 'GenericObjectMap::Header GenericObjectMap::lo...
- 01:58 AM Bug #9761: ceph-osd: segfault at 654c30 ip 00007f00dc5f1f07 sp 00007f00c5642e00 error 7 in ld-2.1...
- No.
10/21/2014
- 09:32 PM Bug #9859: Commit 2ac2a96 appears to break OSD creation
- Specifically, this is with osd creation where the monmap isn't specified (similar to how vstart does it, but not ceph...
- 09:09 PM Bug #9859 (Resolved): Commit 2ac2a96 appears to break OSD creation
- Narrowed this down through Joao's comments and bisecting to hit this commit. Not sure if this only happens under spec...
- 06:18 PM rgw Bug #9169: 100-continue broken for centos/rhel
- Running a simplified yaml, see https://gist.github.com/yuriw/1603e536ee33a28f93a4
Note: Moved clients to separate ...
- 04:53 PM rgw Bug #9169: 100-continue broken for centos/rhel
- Running a simplified yaml, see https://gist.github.com/yuriw/1603e536ee33a28f93a4
Note: Moved clients to separate ...
- 10:16 AM rgw Bug #9169: 100-continue broken for centos/rhel
- See the dupe #9825 for latest run info
- 10:07 AM rgw Bug #9169: 100-continue broken for centos/rhel
- yuri to make a minimal test case
- 05:50 PM rgw Bug #9587 (Fix Under Review): ceph-radosgw sysvinit script on EL6 cannot set ulimit
- https://github.com/ceph/ceph/pull/2771
This could use a manual test as well to ensure the limit is properly set on...
- 05:23 PM Bug #9858: osd crush rule create-erasure idempotency failure
- reproduced with *while make -j8 check ; do : ; done* after ~30 minutes (i.e. ~15 runs).
- 05:03 PM Bug #9858 (Rejected): osd crush rule create-erasure idempotency failure
- The *./ceph osd crush rule create-erasure ruleset3* command run by test/mon/osd-crush.sh sometimes fails to notice the...
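The idempotency contract under test can be sketched in a few lines (assumed semantics, not the actual monitor code; return values are illustrative): re-creating a rule with identical parameters is a no-op success, while re-creating it with different parameters is an error.

```python
# Hypothetical sketch of an idempotent create operation, as exercised by
# test/mon/osd-crush.sh: the second identical create must succeed.
def create_rule(rules, name, profile):
    if name in rules:
        if rules[name] == profile:
            return 0     # idempotent: already exists with same parameters
        return -17       # EEXIST: exists with different parameters
    rules[name] = profile
    return 0
```

The bug was that the command sometimes failed to recognize the existing identical rule, breaking the first branch above.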
- 05:20 PM Bug #9837 (Duplicate): rbd crash when upgrading from v0.80.5 to firefly
- this could be the same as bug #9288; modified the upgrade:firefly suite to NOT upgrade clients when a workload is in progr...
- 05:19 PM rbd Feature #9733: Separate rbd listing into CAP
- It sounds like Nova is configured to use RBD as the backing store for its ephemeral disk images instead of the local ...
- 11:51 AM rbd Feature #9733: Separate rbd listing into CAP
- OK, putting the pool argument first does work. We have consequently found out that Nova does require list permissions...
- 10:54 AM rbd Feature #9733: Separate rbd listing into CAP
- Try placing the "pool=test" argument before the "object_prefix XYZ" portion of the cap:...
- 05:16 PM Bug #9610 (Resolved): Crash "RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::Cal...
- fixed in multi-version suite already - commit b966da7b71c8aee22ff8e58b3b0c105b1d7ca4bf
fixed in upgrade:firefly/ol...
- 02:06 PM Bug #9610: Crash "RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)...
- New ceph_test_rados is too picky for dumpling osds. We only want to use dumpling ceph_test_rados against clusters wi...
- 02:06 PM Bug #9610: Crash "RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)...
- also: ubuntu@teuthology:/a/teuthology-2014-10-20_18:40:02-upgrade:firefly:older-firefly-distro-basic-vps/561550
<...
- 12:53 PM Bug #9610: Crash "RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)...
- seeing this on the upgrade test from v0.67.11 to firefly [v0.80.7]...
- 04:54 PM Bug #9752: acting in past intervals contains primary and up_primary (looks like duplicates but is...
"kit" on #ceph was in a situation of having incomplete pg. They sent the pg query output and it showed strange pas...- 04:44 PM rbd Bug #9857 (Fix Under Review): rbd readahead division by zero exception
- 03:53 PM rbd Bug #9857 (In Progress): rbd readahead division by zero exception
- 02:42 PM rbd Bug #9857 (Resolved): rbd readahead division by zero exception
- When using old-format RBD images, the RBD readahead block alignments are initialized to zero because the stripe param...
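A minimal Python sketch of the guard such a fix needs (assumed logic, not the actual librbd change; function name is mine): alignment arithmetic must treat a zero alignment as "no alignment" rather than dividing by it.

```python
# Sketch only: old-format images can leave stripe parameters at zero, so any
# readahead alignment derived from them can be zero.
def align_down(offset, alignment):
    if alignment == 0:
        # Guard: offset % 0 would raise a division-by-zero, the crash above.
        return offset
    return offset - (offset % alignment)
```

Without the guard, the first readahead on an old-format image hits the zero divisor immediately.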
- 04:07 PM rbd Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
- Tamilarasi muthamizhan wrote:
> I think this issue could be related to bug # 9288, upgrading clients when workload i...
- 02:04 PM rbd Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
- I think this issue could be related to bug # 9288, upgrading clients when workload is in progress.
- 02:02 PM rbd Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
- more logs:
ubuntu@teuthology:/a/teuthology-2014-10-20_18:40:02-upgrade:firefly:older-firefly-distro-basic-vps/561562
- 11:11 AM rbd Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
- logs: ubuntu@teuthology:/a/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561993
- 11:07 AM rbd Bug #9855 (Resolved): rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-...
- On:
os_type: rhel
os_version: '6.4'
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-20_20:50:...
- 03:32 PM rgw Bug #9612: "ERROR: test suite for <module 's3tests.functional'" in multi-version-giant-testing-ba...
- ...
- 03:24 PM rgw Bug #9612 (Rejected): "ERROR: test suite for <module 's3tests.functional'" in multi-version-giant...
- that's giant rgw and dumpling osds, shouldn't work.
- 03:22 PM CephFS Feature #9557 (Fix Under Review): mds: verify backtrace on fetch_dir
- 10:44 AM CephFS Feature #9557 (In Progress): mds: verify backtrace on fetch_dir
- 03:19 PM Feature #7104: rest-api: support commands requiring 'w' cap without 'rw' cap
- Please stop throwing this bug in the FS tracker just because it has the word MDS in it...
- 02:58 PM rgw Bug #9616: upgrade test restarts rgw, test gets 500
- fixed...
- 02:33 PM Bug #9823 (Won't Fix): ceph-osd mkfs or ceph auth add : exit -9
- It did not show up after the fix. The tests use a lot more than the default 1024 file descriptors allowed. Marking wo...
- 02:25 PM devops Bug #9665: ceph-disk zap should call partprobe
- Running http://pulpito.ceph.com/loic-2014-10-21_14:25:31-ceph-deploy:singleton-wip-9665-ceph-disk-partprobe-testing-b...
- 02:19 PM Bug #9731: Ceph 0.80.6 OSD crashes
- Sorry, still out sick today. Hoping to be in tomorrow.
- 11:10 AM Bug #9731: Ceph 0.80.6 OSD crashes
- Brad House wrote:
> Sorry, I only have access during the week to the test system, and I'm out sick today. Hopefully...
- 02:17 PM devops Bug #9807 (Duplicate): Missing radosgw packages in various upgrade suites
- This is basically the same issue as #9824, which was thought to be fixed but still occurred on centos; it should ...
- 02:13 PM devops Bug #9747: ceph.spec.in will always use 95-ceph-osd-alt.rules
- gitbuilder was clean but got trimmed
- 01:53 PM Bug #9288: "Assertion `nlock == 0' failed" in upgrade:firefly-firefly-testing-basic-vps suite
- fixed it ...
- 11:17 AM Bug #9288: "Assertion `nlock == 0' failed" in upgrade:firefly-firefly-testing-basic-vps suite
- the job is upgrading client.0 (vpm072) in that test too.
i think
- install.upgrade:
all:
bran...
- 01:47 PM Bug #9408: erasure-code: misalignment
- 01:34 PM Bug #9485 (In Progress): Monitor crash due to wrong crush rule set
- Did not forget about it, just busy with other things.
- 01:33 PM Bug #9684 (Can't reproduce): "Scrubbing terminated" in upgrade:firefly-firefly-distro-basic-multi...
- no log or core
- 01:32 PM Bug #9434 (Can't reproduce): rbd rm hangs
- 01:30 PM Bug #9702 (Duplicate): "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:fire...
- probably dup of #9835
- 01:29 PM Bug #9703 (Resolved): "Segmentation fault" in upgrade:firefly-x-giant-distro-basic-multi run
- 01:27 PM Bug #9739 (Won't Fix): rados cli: listsnaps does not list snaps
- because you haven't written to it yet!
- 01:19 PM Bug #9761: ceph-osd: segfault at 654c30 ip 00007f00dc5f1f07 sp 00007f00c5642e00 error 7 in ld-2.1...
- Has this happened more than once?
- 01:18 PM Bug #9794 (Resolved): vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
- checked firefly; no need to backport.
- 01:15 PM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
- 01:15 PM Bug #9794 (Resolved): vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
- 01:11 PM Bug #9419 (Resolved): dumpling->firefly upgrade, sending setallochint?
- 01:09 PM Bug #9649 (Can't reproduce): OSD hang in op_tp
- 01:07 PM Bug #9559 (Resolved): ?off-by-one vulnerability?ceph-0.80.5/src/common/fd.cc dump_open_fds() func...
- 11:43 AM CephFS Bug #8809 (Can't reproduce): uclient: memory leak
- maybe fixed by 2313ce1d024361fd7f4d2cbca789010f0fe0faad
- 11:34 AM Bug #9856 (Duplicate): osd crashed after upgrade from v0.80.5 to firefly
- #9851
- 11:26 AM Bug #9856: osd crashed after upgrade from v0.80.5 to firefly
- more jobs: ubuntu@teuthology:/a/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561999
...
- 11:23 AM Bug #9856 (Duplicate): osd crashed after upgrade from v0.80.5 to firefly
- osd crashed after upgrading from ceph v0.80.5 to firefly and during thrashing,
logs: ubuntu@teuthology:/a/teutholo...
- 11:10 AM Linux kernel client Bug #9507 (Resolved): calling llistxattr(2) on a symlink crashes the client
- 10:55 AM CephFS Bug #9674: nightly failed multiple_rsync.sh
- commit:477073aba1da880dfd0b8c82f4792788579f28b9 in master and commit:44ce33c12443909b02c7ee451ad45400f55d53c9 in giant
- 10:38 AM Bug #9845 (Resolved): hung ceph_test_rados_delete_pools_parallel
- 12:59 AM Bug #9845 (Fix Under Review): hung ceph_test_rados_delete_pools_parallel
- 12:48 AM Bug #9845 (Resolved): hung ceph_test_rados_delete_pools_parallel
- ...
- 09:57 AM rgw Bug #9575 (Duplicate): s3tests.functional.test_s3.test_region_copy_object fails (races with rados...
- 09:43 AM rgw Bug #3896 (Resolved): rest-bench common/WorkQueue.cc: 54: FAILED assert(_threads.empty())
- 09:42 AM rgw Bug #1673 (Won't Fix): rgw: mod_fastcgi needs to be backward compatible
- 09:41 AM rgw Bug #8251: radosgw-agent does not sync objects uploaded to recreated buckets
- closed and obsolete : https://github.com/ceph/ceph/pull/2765
- 09:40 AM rgw Bug #8550 (Resolved): rgw: need to reduce calls to rgw_obj.set_obj()
- 09:38 AM rgw Bug #9043 (Duplicate): rgw:Cannot add object to Ceph using Openstack Dashboard(Horizon) in firefly
- 09:31 AM rgw Bug #9525 (Duplicate): Deleted object shows in object listing
- 09:29 AM rgw Bug #9576 (Fix Under Review): rgw: update object content-length doesn't work correctly
- 09:27 AM rgw Bug #9500 (Duplicate): 0.80.5 on CentOS 6.5: radosgw-admin fails to correctly name subuser object
- 09:27 AM rgw Bug #9500: 0.80.5 on CentOS 6.5: radosgw-admin fails to correctly name subuser object
- Unlikely to be ubuntu vs centos; this looks like #8587 or related issues (pending backport to firefly).
- 09:25 AM rgw Bug #9469 (Rejected): RadosGW performance degrades with high concurrency workload.
- please send an email about this to ceph-devel; that is a better forum to discuss performance issues.
- 09:23 AM rgw Bug #9543 (Rejected): AssertionError(s) in upgrade:dumpling-dumpling-distro-basic-vps run
- 09:23 AM rgw Bug #9588 (Rejected): Keystone s3 auth integration lacking access_key = tenant:user ability suppo...
- thanks Mark!
- 09:21 AM rgw Bug #9766 (Rejected): s3tests: test_100_continue failing
- This is almost certainly a configuration error; it needs "rgw print continue = true" and a patched mod_fastcgi.
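For reference, that setting lives in ceph.conf in the rgw client section (the section name below is a typical example, not taken from this ticket):

```
[client.radosgw.gateway]
    rgw print continue = true
```

As the comment notes, this only works together with a fastcgi module patched for 100-continue support; with a stock mod_fastcgi the option should stay off.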
- 09:20 AM rgw Bug #9002 (Duplicate): Creating swift key with --gen-secret in separate step from subuser creatio...
- #8587
- 09:19 AM rgw Bug #8676 (Duplicate): md5sum check failed during readwrite.py
- this appears to be resolved by #9307
- 09:17 AM rgw Bug #9307 (Resolved): "s3.test_multipart_upload_multiple_sizes ... ERROR" in upgrade:dumpling-fir...
- 09:17 AM rgw Bug #9307: "s3.test_multipart_upload_multiple_sizes ... ERROR" in upgrade:dumpling-firefly-x-mast...
- 09:16 AM rbd Bug #9854 (Resolved): librbd: reads contending for cache space can cause livelock
- As a result of accounting for reads properly with #9513. Using qemu-io (a test program) is one way to trigger this - ...
- 09:13 AM rgw Bug #9039 (Resolved): Using COPY on radosgw to copy object from one bucket to another that's in a...
- 09:07 AM Bug #9675 (In Progress): splitting a pool doesn't start when rule_id != ruleset_id
- 09:06 AM rbd Bug #9513 (Resolved): rbd_cache=true default setting is degrading librbd performance ~10X in Giant
- backported in commit:65be257e9295619b960b49f6aa80ecdf8ea4d16a
- 09:04 AM Bug #9813 (Resolved): cryptopp dependency missing for deb-based systems
- 08:45 AM Bug #9813: cryptopp dependency missing for deb-based systems
- Already addressed by [1], cheers!
[1] https://github.com/ceph/ceph/pull/2761 - 08:54 AM Bug #9853 (Duplicate): coredump in upgrade:firefly-x-giant-distro-basic-vps run
- #9851
- 08:21 AM Bug #9853 (Duplicate): coredump in upgrade:firefly-x-giant-distro-basic-vps run
- On:
os_type: rhel
os_version: '6.5'
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-20_15...
- 08:52 AM rgw Bug #9825 (Duplicate): s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel...
- this is #9169
- 08:33 AM rgw Bug #9825: s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel-giant-distr...
- In the run teuthology/teuthology-2014-10-20_15:01:14-upgrade:firefly-x-giant-distro-basic-vps , jobs ['560024', '5600...
- 08:12 AM Bug #9073 (Pending Backport): OSD with device/partition journals down after fresh deploy or upgra...
- 07:50 AM Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83
- https://github.com/ceph/ceph/pull/2764 is a better fix. The isolated patch made sense to me at the time but it looks ...
- 08:09 AM Linux kernel client Bug #9561 (Rejected): libceph: do not crash if auth reply is not understood
- I believe the code is correct as is and I misdiagnosed the original issue. :(
- 08:09 AM Linux kernel client Bug #9560 (Rejected): libceph: msg kmalloc failure handling on the reply path
- I believe the code is correct as is and I misdiagnosed the original issue.
- 07:35 AM Bug #9852 (Resolved): mon: monitor asserts on 'ceph mds add_data_pool X' if X is an ID that DNE
- ...
- 06:58 AM Bug #9851 (Fix Under Review): crash on journal/filestore shutdown on firefly
- https://github.com/ceph/ceph/pull/2764
- 06:42 AM Bug #9851 (Resolved): crash on journal/filestore shutdown on firefly
- saw this on several runs, e.g. /var/lib/teuthworker/archive/teuthology-2014-10-18_19:22:02-upgrade:firefly-x-giant-di...
- 01:48 AM devops Bug #9840: Monitor hung when add new osd
- using valgrind, I found these errors:
==17554== Thread 7:
==17554== Invalid read of size 4
==17554== at 0x3168A0C380: ...
- 12:50 AM Bug #5925 (Can't reproduce): hung ceph_test_rados_delete_pools_parallel
Filed #9845 to describe the recent occurrence. This bug was probably something else, so I'm setting it back to "Can...
10/20/2014
- 11:51 PM Bug #5925: hung ceph_test_rados_delete_pools_parallel
- I don't think this would have happened when safe_callbacks was true. It was set to false in a fix for #9582. See al...
- 07:02 PM Bug #5925: hung ceph_test_rados_delete_pools_parallel
Yup, there is a shutdown race. Thread 1 is waiting for the timer thread while holding the Objecter::rwlock in writ...
- 11:22 PM Fix #9834 (Rejected): osd_scrub_load_threshold should be checked during scrubbing
- 10:14 AM Fix #9834: osd_scrub_load_threshold should be checked during scrubbing
- I'm not sure this is something we should do. We attempt to schedule scrubs during periods of low disk usage, but if t...
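The proposal here is to keep checking osd_scrub_load_threshold between scrub chunks rather than only before the scrub begins. A minimal Python sketch of that idea (the helper names and chunk model are hypothetical, not Ceph's actual scrub scheduler):

```python
import os


def load_below_threshold(threshold, loadavg=None):
    """Return True if the 1-minute load average is under the threshold."""
    one_min = loadavg if loadavg is not None else os.getloadavg()[0]
    return one_min < threshold


def scrub_chunks(chunks, threshold, get_load):
    """Scrub chunk by chunk, stopping as soon as load rises too high.

    Re-checking per chunk (instead of once at scrub start) is the
    behavior the ticket asks for.
    """
    done = []
    for chunk in chunks:
        if not load_below_threshold(threshold, get_load()):
            break  # load climbed mid-scrub; yield to client I/O
        done.append(chunk)
    return done
```

With a load sampler that returns 0.2, 0.3, then 5.0 against a 0.5 threshold, only the first two chunks get scrubbed before the loop backs off.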
- 07:25 AM Fix #9834 (Rejected): osd_scrub_load_threshold should be checked during scrubbing
- "osd_scrub_load_threshold":https://github.com/ceph/ceph/blob/firefly/src/common/config_opts.h#L515 is "considered":ht...
- 11:20 PM Bug #9844 (Won't Fix): "initiating reconnect" (log) race; crash of multiple OSDs (domino effect)
- On 0.87 I watch "ceph osd tree" and notice that one OSD (leveldb/keyvaluestore-dev) is "down".
In its log I see
... - 10:45 PM Bug #9356 (Resolved): ceph_test_rados_striper_api_aio Segmentation faults
- https://github.com/ceph/ceph/pull/2419
- 09:38 PM Bug #9839 (Rejected): ErasureCodePluginSelectJerasure: generic plugin : abort
- ...
- 03:23 PM Bug #9839 (Need More Info): ErasureCodePluginSelectJerasure: generic plugin : abort
- 03:23 PM Bug #9839: ErasureCodePluginSelectJerasure: generic plugin : abort
- When trying to run manually *ceph-osd -i 0* it hangs at the same point.
- 03:02 PM Bug #9839 (Rejected): ErasureCodePluginSelectJerasure: generic plugin : abort
- It fails when pre-loading the plugin in a context where erasure-code is not used.
http://pulpito.ceph.com/teutholo...
- 06:27 PM devops Bug #9840: Monitor hung when add new osd
- try again, monitor hung still
Thread 25 (Thread 0x7f93e5ec0700 (LWP 13652)):
#0 0x0000003168a0b5bc in pthread_co...
- 06:17 PM devops Bug #9840 (Rejected): Monitor hung when add new osd
- ceph version: 0.80.6
Platform: Redhat RHLS 6.5
we want to test the disk replacement case,
operator step:
1. ceph o...
- 06:02 PM Bug #9419: dumpling->firefly upgrade, sending setallochint?
- This is done and a new test case was added - PR https://github.com/ceph/ceph-qa-suite/pull/198
- 06:01 PM Feature #9568: Add test case to test #9419 (ceph wip-9419)
- This is done and a new test case was added - PR https://github.com/ceph/ceph-qa-suite/pull/198
- 02:14 PM Feature #9568: Add test case to test #9419 (ceph wip-9419)
- this seems to require clients upgraded first running workloads against upgraded monitors and mixed versions of osds, ...
- 04:14 PM rbd Feature #9733: Separate rbd listing into CAP
- OK, so one more question. This looks like it allows access to any pool. Is there a way to limit this to a particular ...
- 03:04 PM Bug #9389 (Duplicate): ec pg stuck peering, did not send query for one shard
- 03:04 PM Bug #9822 (Resolved): failed to become clean before timeout expired
- 02:29 PM Bug #9822: failed to become clean before timeout expired
- 02:18 PM Bug #9821: failed to recover before timeout expired
- in wip-sam-testing
- 02:09 PM Bug #9821: failed to recover before timeout expired
- working on patch
- 01:55 PM Bug #9835 (Fix Under Review): osd: bug in misdirected op checks (firefly)
- https://github.com/ceph/ceph/pull/2760
- 10:12 AM Bug #9835: osd: bug in misdirected op checks (firefly)
- Maybe we need to adjust how we're handling waiting_for_pg, but I don't think that this particular check is a bug — th...
- 09:33 AM Bug #9835 (Resolved): osd: bug in misdirected op checks (firefly)
- ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-10-18_19:22:02-upgrade:firefly-x-giant-distro-basic-mu...
- 01:32 PM Bug #9806: Objecter: resend linger ops on split
- 01:23 PM CephFS Feature #414 (Resolved): ceph-fuse: implement file locking
- 01:22 PM CephFS Bug #8576: teuthology: nfs tests failing on umount
- teuthology commit:4f2957c42d0f76a399cb26c660ede9243c095779 runs those commands as well as the previous ones.
- 01:02 PM CephFS Bug #9679 (Closed): Ceph hadoop terasort job failure
- Fixed in cephfs-hadoop repo.
- 11:31 AM Bug #9288: "Assertion `nlock == 0' failed" in upgrade:firefly-firefly-testing-basic-vps suite
- logs: ubuntu@teuthology:/a/teuthology-2014-10-17_23:30:01-upgrade:firefly:newer-firefly-distro-basic-vps/555356
<p...
- 11:20 AM Bug #9288 (New): "Assertion `nlock == 0' failed" in upgrade:firefly-firefly-testing-basic-vps suite
- this looks different from bug #9040.
- 11:29 AM Bug #9837: rbd crash when upgrading from v0.80.5 to firefly
- ...
- 11:27 AM Bug #9837 (Duplicate): rbd crash when upgrading from v0.80.5 to firefly
- logs: ubuntu@teuthology:/a/teuthology-2014-10-17_23:30:01-upgrade:firefly:newer-firefly-distro-basic-vps/555359
<p...
- 11:22 AM Bug #9836 (Fix Under Review): mon unit tests use the wrong id
- https://github.com/ceph/ceph/pull/2759
- 11:19 AM Bug #9836: mon unit tests use the wrong id
- It impacts
* "osd-erasure-code-profile.sh":https://github.com/ceph/ceph/blob/giant/src/test/mon/osd-erasure-code-...
- 11:13 AM Bug #9836 (Resolved): mon unit tests use the wrong id
- the mon id is incorrect for mon tests using "the call_TEST_functions":https://github.com/ceph/ceph/blob/firefly/src/t...
- 11:15 AM CephFS Bug #9800: client-limits test is not passing
Same failure:
http://pulpito.front.sepia.ceph.com/teuthology-2014-10-17_23:04:02-fs-giant-distro-basic-multi/555...
- 11:07 AM Linux kernel client Bug #9458 (Resolved): client wrongly fenced
- 11:06 AM Linux kernel client Bug #1513 (Resolved): kclient: cap migration can race with cap addition on client
- now cap import/export are ordered.
(commit 186e4f7a4b1883f3f46aa15366c0bcebc28fdda7, 4ee6a914edbbd2543884f0ad7d58ea4...
- 10:46 AM Bug #9820 (Resolved): mon connection hang on cephtool/test.sh
- 10:38 AM Bug #9372: injectarg boolean option is discarded
- "fails on precise":http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-precise-i386-basic/log.cgi?log=14ed21f9ad...
- 10:06 AM Bug #9826 (Rejected): ceph osd crush rule ls should use the pending crush, if any
- 08:59 AM Bug #9826: ceph osd crush rule ls should use the pending crush, if any
- ...
- 09:18 AM rgw Bug #9825: s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel-giant-distr...
- I am wondering if it's related to changes in s3tests?
- 08:41 AM Bug #9819 (Won't Fix): EBUSY during scrub
- this is expected and harmless. we just report the failure and move on. it happens when paxos is busy when we reques...
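Since the EBUSY here is reported as expected and harmless, a caller that actually wants the scrub to happen can simply retry. A hedged sketch of that pattern, where `request_scrub` is a hypothetical callable standing in for the request that can fail while paxos is busy:

```python
import errno
import time


def request_with_retry(request_scrub, retries=5, delay=0.0):
    """Retry a request that can transiently fail with -EBUSY.

    Returns 0 on success, or the last error code after `retries` attempts.
    """
    for _ in range(retries):
        rc = request_scrub()
        if rc != -errno.EBUSY:
            return rc  # success (0) or a different, real error
        time.sleep(delay)  # give paxos a moment before retrying
    return -errno.EBUSY
```

A request that returns EBUSY twice and then succeeds completes on the third attempt; one that stays busy for all attempts still surfaces the error to the caller.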
- 08:38 AM Bug #9731: Ceph 0.80.6 OSD crashes
- Sorry, I only have access during the week to the test system, and I'm out sick today. Hopefully I'll be able to cont...
- 04:02 AM rgw Feature #8562: rgw: Conditional PUT on ETag
- Closed the previous out-of-sync PR and submitted a new one: https://github.com/ceph/ceph/pull/2756
- 01:38 AM rgw Feature #8562: rgw: Conditional PUT on ETag
- Here is a PR for discussion purpose: https://github.com/ceph/ceph/pull/2755
We may need to elaborate a bit on it aft...
- 03:46 AM Bug #9816: mon exits unexpectedly and gracefully
- just a hunch: feels like you're capturing only stdout from the monitor, and the monitor may have hit the 'mon data av...
- 01:42 AM Linux kernel client Bug #9749: kcephfs: kernel divide-by-zero crash in __validate_layout (fs/ceph/ioctl.c)
- I guess we are just not used to doing it - I think we haven't filed any CVEs for ceph kernel bits (and kcephfs in par...
10/19/2014
- 08:29 PM Bug #9731: Ceph 0.80.6 OSD crashes
- Any update?
- 07:20 PM CephFS Bug #9341 (Pending Backport): MDS: very slow rejoin
- Hmm, we didn't put this in Giant initially because we were trying not to perturb it. Master hasn't been run through t...
- 06:45 PM CephFS Bug #9341 (Fix Under Review): MDS: very slow rejoin
- Please include this fix to 0.87 which is affected just as badly as 0.80.x.
On 0.87 MDS stuck in "rejoin" for hours a...
- 07:13 PM Bug #9826 (Fix Under Review): ceph osd crush rule ls should use the pending crush, if any
- 07:13 PM Bug #9826: ceph osd crush rule ls should use the pending crush, if any
- https://github.com/ceph/ceph/pull/2754
- 07:07 PM Bug #9826 (Rejected): ceph osd crush rule ls should use the pending crush, if any
- The following is racy:...
- 05:03 PM Bug #9823: ceph-osd mkfs or ceph auth add : exit -9
- Maybe it runs out of file descriptors because of the parallel runs. Since the erasure code test is the one using the more d...
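One way to check the file-descriptor theory is to compare the process's soft RLIMIT_NOFILE against what the parallel runs would need. A small standard-library sketch (the numbers are illustrative, not measured from the test):

```python
import resource


def fd_headroom(fds_needed):
    """How many file descriptors remain under the soft RLIMIT_NOFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft - fds_needed

# If headroom goes negative, open()/socket() calls start failing with
# EMFILE and child processes can exit abnormally.
```

Raising the soft limit (up to the hard limit) with `resource.setrlimit`, or `ulimit -n` in the shell that launches the tests, is the usual workaround when the headroom is too small.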
- 03:12 PM Bug #9823: ceph-osd mkfs or ceph auth add : exit -9
- The error matching the mon log is different: auth add exits with -9 instead of mkfs....
- 03:02 PM Bug #9823: ceph-osd mkfs or ceph auth add : exit -9
- it was reproduced with a change to the script to keep the logs.
- 12:53 PM Bug #9823 (Won't Fix): ceph-osd mkfs or ceph auth add : exit -9
- While running src/test/erasure-code/test-erasure-code.sh in a loop, the following happened. The -9 exit code suggests...
- 04:27 PM Bug #9796: osd: crash on blacklisted watcher reconnect (dumpling)
- Observed similar crash in suite:upgrade:dumpling
Run http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-18_17:00...
- 04:09 PM rgw Bug #9825: s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel-giant-distr...
- Same problems:
suite:upgrade:dumpling-x
Run http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-17_19:13:01-up...
- 03:19 PM rgw Bug #9825: s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel-giant-distr...
- Log for rhel 6.5 job - http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-18_17:15:02-upgrade:dumpling-firefly-x:...
- 03:18 PM rgw Bug #9825 (Duplicate): s3tests failing on rhel 6.4 and 6.5 in upgrade:dumpling-firefly-x:parallel...
- Looks similar to #9763
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-18_17:15:02-upgrade:dump...
- 02:26 PM Linux kernel client Bug #9749: kcephfs: kernel divide-by-zero crash in __validate_layout (fs/ceph/ioctl.c)
- This bug appears to be exploitable by unprivileged local users and will cause a machine-wide DoS. Is there some reaso...
- 10:17 AM Bug #9822 (Resolved): failed to become clean before timeout expired
- logs: ubuntu@teuthology:/a/teuthology-2014-10-17_02:32:01-rados-giant-distro-basic-multi/553345...
- 10:15 AM Bug #9821: failed to recover before timeout expired
- ubuntu@teuthology:/a/teuthology-2014-10-17_02:32:01-rados-giant-distro-basic-multi/553255
- 10:11 AM Bug #9821 (Resolved): failed to recover before timeout expired
- logs: ubuntu@teuthology:/a/teuthology-2014-10-17_02:32:01-rados-giant-distro-basic-multi/553125...
- 09:44 AM Feature #9817 (Fix Under Review): display X.XX deep-scrub starts
- https://github.com/ceph/ceph/pull/2752
- 08:15 AM Feature #9817 (Resolved): display X.XX deep-scrub starts
- It would be convenient to have a message in the logs when deep-scrub starts...
- 09:40 AM Bug #9820 (Resolved): mon connection hang on cephtool/test.sh
- log: ubuntu@teuthology:/a/teuthology-2014-10-17_02:32:01-rados-giant-distro-basic-multi/553035...
- 09:28 AM Bug #9819 (Won't Fix): EBUSY during scrub
- logs: ubuntu@teuthology:/a/teuthology-2014-10-17_02:32:01-rados-giant-distro-basic-multi/552986...
- 08:40 AM Bug #9818 (Resolved): ENXIO qa/workunits/cephtool/test.sh:test_osd_bench
- It looks like the OSD crashed but there is no more information than the following log at the moment. It was created w...
- 08:12 AM Bug #9816 (Can't reproduce): mon exits unexpectedly and gracefully
- ...
10/18/2014
- 09:25 PM Bug #9814 (Fix Under Review): FAILED assert(0) In function 'GenericObjectMap::Header GenericObjec...
- https://github.com/ceph/ceph/pull/2710
- 05:10 PM Bug #9814 (Resolved): FAILED assert(0) In function 'GenericObjectMap::Header GenericObjectMap::lo...
- LevelDB-based OSD (i.e. "keyvaluestore-dev") crashed as follows on 0.87 during backfill:...
- 07:54 PM Feature #9815 (Fix Under Review): run make check in parallel
- https://github.com/ceph/ceph/pull/2750
- 05:46 PM Feature #9815 (Resolved): run make check in parallel
- Individual tests run by make check may bind fixed ports or use identical files or subdirectories to store temporary d...
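The collisions described here (fixed ports, shared temp directories) are the classic obstacles to parallelizing a test suite. A small illustration of the usual fix, giving each test a private directory and a kernel-chosen free port; this is a sketch of the technique, not the actual make check change:

```python
import socket
import tempfile


def unique_test_env():
    """Allocate a private temp dir and a free TCP port for one test."""
    workdir = tempfile.mkdtemp(prefix="test-")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))  # port 0: kernel picks an unused port
    port = sock.getsockname()[1]
    return workdir, port, sock  # keep sock open to hold the port


# Two tests prepared this way cannot collide on paths or ports.
a_dir, a_port, a_sock = unique_test_env()
b_dir, b_port, b_sock = unique_test_env()
```

Since both sockets stay open, the two ports are guaranteed to differ, and `mkdtemp` guarantees distinct directories, so the tests can run concurrently.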
- 05:06 PM Bug #9744: cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
- Sage Weil wrote:
> this happens when clocks are very skewed.
Are we OK with such a vulnerability that allows one to brin...
- 03:27 AM Bug #9813 (Resolved): cryptopp dependency missing for deb-based systems
- Hi, when following [1] from a trusty64 box I've noticed that the libcrypto++-dev entry is missing from deps.deb.txt. ...
10/17/2014
- 08:30 PM Bug #9810 (Duplicate): dout_emergency is silenced in ceph-osd
- "ceph-osd closes stderr":https://github.com/ceph/ceph/blob/giant/src/ceph_osd.cc#L499 and this may be the reason why ...
- 05:03 PM Bug #9809 (Rejected): common/perf_counters.cc: 105: FAILED assert(idx < m_upper_bound)
- I changed the code and introduced the problem and then forgot I changed the code. Reverting the change fixes the prob...
- 04:50 PM Bug #9809 (Rejected): common/perf_counters.cc: 105: FAILED assert(idx < m_upper_bound)
- Steps to reproduce
* modify vstart.sh with ... - 04:27 PM Bug #9808 (Rejected): PG stuck in active+undersized+degraded+remapped+backfill_toofull
- The disk was 90% full ... hence the block.
- 04:21 PM Bug #9808: PG stuck in active+undersized+degraded+remapped+backfill_toofull
- The "scheduled":https://github.com/ceph/ceph/blob/giant/src/osd/PG.cc#L5674 "RequestBackfill":https://github.com/cep...
- 04:13 PM Bug #9808 (Rejected): PG stuck in active+undersized+degraded+remapped+backfill_toofull
- Steps to reproduce
* modify vstart.sh with ... - 04:21 PM Bug #9731: Ceph 0.80.6 OSD crashes
- Just to check, there isn't anything interesting in dmesg, right?
- 03:07 PM Bug #9731: Ceph 0.80.6 OSD crashes
- Oh, and the
--00:00:06:05.108 2312-- WARNING: unhandled syscall: 306
--00:00:06:05.108 2312-- You may be able to ... - 02:45 PM Bug #9731: Ceph 0.80.6 OSD crashes
- Looks like in our testing we invoke valgrind as:
valgrind --suppressions=<suppression_file> --num-callers=50 --xml...
- 02:14 PM Bug #9731: Ceph 0.80.6 OSD crashes
- wheezy gitbuilder should be working.
- 02:13 PM Bug #9731: Ceph 0.80.6 OSD crashes
- yeah, -f I think.
- 01:58 PM Bug #9731: Ceph 0.80.6 OSD crashes
- valgrind appears to detach from the console when running with ceph-osd, is there some other flag I need to pass to ce...
- 01:55 PM Bug #9731: Ceph 0.80.6 OSD crashes
- Sage also pushed a wip-9731 based on 0.80.7 with a piece of debugging which would be handy. Reproducing with that wo...
- 09:23 AM Bug #9731: Ceph 0.80.6 OSD crashes
- Brad House wrote:
> sure, just tell me the best command line to use as I haven't ever tried to run ceph-osd outside o...
- 03:22 PM Bug #9788 (Rejected): "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeou...
- Two osds, both on mira076 timed out:
osd5: a stat in the op_tp took 3 minutes (completed, surprisingly, right before... - 03:03 PM devops Bug #9807: Missing radosgw packages in various upgrade suites
- looks like we are hitting a lot of failures in upgrade tests because of this issue.
- 03:01 PM devops Bug #9807 (Duplicate): Missing radosgw packages in various upgrade suites
- In teuthology-2014-10-16_19:00:01-upgrade:dumpling-x-firefly-distro-basic-vps...
- 02:57 PM Bug #9220 (Resolved): objecter doesn't reconnect watch on interval change w/ same primary
- This did not need backporting to dumpling after all, since it was broken after dumpling by commit:860d72770cdf092c027...
- 11:13 AM Bug #9220 (Pending Backport): objecter doesn't reconnect watch on interval change w/ same primary
- 11:20 AM Bug #9806 (Resolved): Objecter: resend linger ops on split
- Otherwise, we can lose notifies.
commit:cb9262abd7fd5f0a9f583bd34e4c425a049e56ce
- 10:50 AM Bug #9419: dumpling->firefly upgrade, sending setallochint?
- 10:49 AM Bug #9419: dumpling->firefly upgrade, sending setallochint?
- next step is to add tests for this to the upgrade suites.
- 10:43 AM Bug #9073 (Resolved): OSD with device/partition journals down after fresh deploy or upgrade to 0.83
- 10:39 AM Bug #9614 (Pending Backport): PG stuck with remapped
- 10:38 AM Bug #9718 (Pending Backport): osd_types: check_new_interval: min_size check needs to consider CRU...
- 10:32 AM Bug #7995: osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
- ubuntu@teuthology:/a/samuelj-2014-10-15_20:19:09-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/551397/r...
- 09:33 AM Documentation #9804 (Resolved): kvm and qemu do not document ceph/rbd support
- * looking for ceph or rbd in http://www.linux-kvm.org/page/Special:Search?search=ceph&go=Go : zero match
* on qemu.o... - 09:10 AM Bug #6756 (Fix Under Review): journal full hang on startup
- https://github.com/ceph/ceph/pull/2745
(rebased and retested old patch)
- 07:48 AM Bug #9729 (Resolved): "LibRadosMisc.Operate1PP" test failed in upgrade:dumpling-firefly-x:paralle...
- 07:47 AM rbd Bug #9642 (Resolved): Errors in test_rbd.test_* tests in upgrade:dumpling-firefly-x:parallel-gian...
- 07:46 AM rbd Bug #9642: Errors in test_rbd.test_* tests in upgrade:dumpling-firefly-x:parallel-giant-distro-ba...
- Fixed, tests passed on bare metal.
Last results - http://pulpito.front.sepia.ceph.com/teuthology-2014-10-16_17:10:01...
- 05:16 AM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
- I confirm that
* the problem can be reproduced 100% of the time on my laptop,
* that cherry-pick c84a13ae87eed555...
- 04:17 AM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
- Loic, try this patch with the same conditions in which you triggered it: c84a13ae87eed5550bafda394d983a8e843cc08c
... - 01:52 AM Feature #9802 (New): When replaced a disk, the CRUSH weight of the related host changed
- In a disk replacement test, when adding a disk into the cluster, the osd tree looks like
below:...
10/16/2014
- 10:27 PM Bug #9801 (Won't Fix): ceph 0.80.7 build rpm packages in centos 7 error
- ceph 0.80.7 build rpm packages in centos 7 error...
- 06:30 PM Bug #8629: cache_evict needs to prevent make_writeable from creating a snapdir
- https://github.com/ceph/ceph/pull/2737
- 05:24 PM Fix #9566 (In Progress): osd: prioritize recovery of OSDs with most work to do
- 05:11 PM Fix #9566: osd: prioritize recovery of OSDs with most work to do
- Related commits:
* "osd: prioritize backfill based on *how* degraded":https://github.com/ceph/ceph/commit/0985ae71bc...
- 05:04 PM Bug #9769 (Resolved): upgrade/firefly: latest_dumpling_release.yaml always fails
- 10:56 AM Bug #9769: upgrade/firefly: latest_dumpling_release.yaml always fails
- It's fixed, testing now; here is the run that passed:...
- 04:59 PM Bug #9765 (Duplicate): CachePool flush -> OSD Failed
- I'm pretty sure this is because #8629 has not yet been backported to firefly. It should be in 0.80.8. I'll prepare ...
- 05:48 AM Bug #9765: CachePool flush -> OSD Failed
- The 'forward' mode means we will modify cached objects in place but forward any 'misses'. It is also possible that t...
- 04:58 PM Bug #9731: Ceph 0.80.6 OSD crashes
- sure, just tell me the best command line to use as I haven't ever tried to run ceph-osd outside of the standard init s...
- 04:52 PM Bug #9731: Ceph 0.80.6 OSD crashes
- Would it be possible to run the osds in question under valgrind?
- 01:49 PM Bug #9731: Ceph 0.80.6 OSD crashes
- core file for last crash as requested by Samuel Just
- 01:38 PM Bug #9731: Ceph 0.80.6 OSD crashes
- 12:48 PM Bug #9731: Ceph 0.80.6 OSD crashes
- Another crash from another node, this time with debug increased. Will attach log, here is the backtrace from gdb:
<...
- 10:42 AM Bug #9731: Ceph 0.80.6 OSD crashes
- Another backtrace from a different machine, definitely different:...
- 10:33 AM Bug #9731: Ceph 0.80.6 OSD crashes
- backtrace from last core file:...
- 10:02 AM Bug #9731: Ceph 0.80.6 OSD crashes
- Can you reproduce with
debug osd = 20
debug ms = 1
debug filestore = 20
?
- 06:47 AM Bug #9731: Ceph 0.80.6 OSD crashes
- 0.80.7 segfault core file and log. Happened immediately at startup after rebooting after update.
- 06:42 AM Bug #9731: Ceph 0.80.6 OSD crashes
- I just upgraded to 0.80.7, and got a crash on startup of one of my OSDs. I'll grab the log and core dump and attach ...
- 04:04 PM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
- Reverting to 128 PGs on master makes the problem disappear. 92 PGs also works. 64 PGs fails.
- 03:43 PM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
- ...
- 01:54 PM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
- It works on v0.85, bisecting
- 01:46 PM Bug #9794: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS
- reproduced on a fresh ubuntu 14.04 with v0.86-408-gad2514d
- 02:59 PM Feature #9799: ceph tell {daemon}.{id} config set etc.
- Two things to consider:
The authentication model is pretty different for a network connection to the daemon vs. a ...
- 01:19 PM Feature #9799 (Resolved): ceph tell {daemon}.{id} config set etc.
- It would be nice to be able to send asok commands to a daemon using ceph tell instead of logging into the machine and usi...
- 02:19 PM Bug #9729 (Fix Under Review): "LibRadosMisc.Operate1PP" test failed in upgrade:dumpling-firefly-x...
- 02:19 PM Bug #9729: "LibRadosMisc.Operate1PP" test failed in upgrade:dumpling-firefly-x:parallel-giant-dis...
- Backport to master - https://github.com/ceph/ceph-qa-suite/pull/195
- 10:59 AM Bug #9729: "LibRadosMisc.Operate1PP" test failed in upgrade:dumpling-firefly-x:parallel-giant-dis...
- Passed on nightlies:
http://pulpito.front.sepia.ceph.com/teuthology-2014-10-15_17:10:01-upgrade:dumpling-firefly-x...
- 01:54 PM CephFS Bug #9800 (Resolved): client-limits test is not passing
- /a/teuthology-2014-10-13_23:04:01-fs-giant-distro-basic-multi/547170
The client isn't dropping its caps:... - 01:15 PM rbd Bug #9595 (Resolved): librbd: internal methods can operate on extra objects when non-default stri...
- commit:7b66ee4928d934d684b361602de783b927988503
- 10:50 AM CephFS Feature #4137: MDS: Implement a forward-scrubbing mechanism.
- I realized today that we probably want to optionally scrub directories that were renamed into place following a scrub...
- 09:11 AM Bug #9675: splitting a pool doesn't start when rule_id != ruleset_id
- See also the ceph-user thread "NO pg created for erasure-coded pool" where rule_id != ruleset on firefly.
- 05:57 AM Bug #9796 (Won't Fix): osd: crash on blacklisted watcher reconnect (dumpling)
- ...