Activity
From 05/02/2017 to 05/31/2017
05/31/2017
- 11:06 PM Bug #19943: osd: enoent on snaptrimmer
- /a/sage-2017-05-31_18:45:30-rados-wip-sage-testing---basic-smithi/1248735
- 04:38 PM Bug #18043: ceph-mon prioritizes public_network over mon_host address
- And to elaborate on the fact that I have a branch and no PR: I do intend to finish this up soon, but likely only afte...
- 04:36 PM Bug #18043: ceph-mon prioritizes public_network over mon_host address
- fwiw, i've got a branch handling this from earlier this year: https://github.com/jecluis/ceph/commits/wip-mon-host
...
- 04:04 PM Support #18508 (Closed): PGs of EC pool stuck in peering state
- There was clearly a lot going on here and none of it was clear. If switching to SimpleMessenger fixed it, I presume t...
- 03:14 PM Bug #17138: crush: inconsistent ruleset/ruled_id are difficult to figure out
- Some work in progress on this here: https://github.com/ceph/ceph/pull/13683
- 03:21 AM Bug #20117 (Rejected): BlueStore.cc: 8585: FAILED assert(0 == "unexpected error")
- version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+rbd+ec+o...
- 03:19 AM Bug #20116 (Can't reproduce): osds abort on shutdown with assert(ceph/src/osd/OSD.cc: 4324: FAILE...
- version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+rbd+ec+o...
05/30/2017
- 01:46 PM Support #20108 (Resolved): PGs are not remapped correctly when one host fails
- I have run into the following problem:
in a 6 node cluster we have 2 nodes/chassis, and the crush rule set to distri...
- 01:45 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Logs available on teuthology:/home/jdillaman/osd.23.log_try_rados_rm.gz
05/29/2017
- 11:30 PM Bug #19790 (In Progress): rados ls on pool with no access returns no error
- 11:28 PM Bug #19790: rados ls on pool with no access returns no error
- https://github.com/ceph/ceph/pull/15354
Greg, will talk to you about the per-object cap semantics separately.
- 07:45 PM Bug #19964: occasional crushtool timeouts
- /a/sage-2017-05-28_05:00:18-rados-wip-sage-testing---basic-smithi/1238511
description: rados/singleton-bluestore/{...
- 02:51 PM Bug #17968: Ceph:OSD can't finish recovery+backfill process due to assertion failure
- https://github.com/ceph/ceph/pull/15349
05/28/2017
- 09:17 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- I've no idea of the repercussions (thinking I'll back up and recreate the cluster), but if you write an osdmap into all of...
- 03:09 AM Bug #19943: osd: enoent on snaptrimmer
- /a/sage-2017-05-27_03:43:09-rados-wip-sage-testing2---basic-smithi/1235222
- 03:07 AM Bug #19943: osd: enoent on snaptrimmer
- /a/sage-2017-05-27_03:43:09-rados-wip-sage-testing2---basic-smithi/1235419
- 02:03 AM Bug #19964: occasional crushtool timeouts
- /a/sage-2017-05-27_03:43:09-rados-wip-sage-testing2---basic-smithi/1235225
- 01:59 AM Bug #19964: occasional crushtool timeouts
- /a/sage-2017-05-27_01:05:11-rados-wip-sage-testing---basic-smithi/1233483
- 01:57 AM Bug #20105 (Resolved): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
- ...
05/27/2017
- 08:06 AM Bug #17968: Ceph:OSD can't finish recovery+backfill process due to assertion failure
- I have a document that provides the details of our analysis of this problem, but it's written in Chinese. If needed, I...
- 08:03 AM Bug #17968: Ceph:OSD can't finish recovery+backfill process due to assertion failure
- Hi, everyone.
Sorry, I forgot to watch my issues.
We found that the problem is due to "librados::OPERATION_BALA...
- 07:59 AM Bug #19983: osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/KernelDevice.c...
- I pulled out a disk, and then this problem occurred.
- 03:06 AM Bug #20099: osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.version < e.version.ve...
- fang yuxiang wrote:
> i think this is not functional issue of ceph, maybe your local fs data is corrupted.
>
> ar...
- 03:01 AM Bug #20099: osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.version < e.version.ve...
- `read_log 406'6529418` and `read_log 346'6529418` have the same seq.
Also, ceph-kvstore-tool shows the following:
...
- 02:46 AM Bug #20099: osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.version < e.version.ve...
- I think this is not a functional issue of Ceph; maybe your local fs data is corrupted.
Are you using any block cache...
- 02:41 AM Bug #20099 (Need More Info): osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.versi...
- My Ceph cluster went down when the server was powered off,
and when I restarted my osd, it failed in read_log,
as follows:...
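The assert named in the #20099 entries above enforces strictly increasing pg log versions. The following standalone sketch (not Ceph source; only the eversion values come from the `read_log` note above, all other names are illustrative) shows how two entries sharing a sequence number trip that check:

```python
# Illustrative sketch of the check behind
# FAILED assert(last_e.version.version < e.version.version):
# pg log entries carry an eversion of the form epoch'version, and
# read_log expects the version component to strictly increase.
from collections import namedtuple

Eversion = namedtuple("Eversion", ["epoch", "version"])

def check_log_order(entries):
    """Return True only if every entry's version strictly increases."""
    last = None
    for e in entries:
        if last is not None and not (last.version < e.version):
            return False  # this is where the OSD would assert out
        last = e
    return True

# The duplicate sequence numbers reported above (406'6529418 and
# 346'6529418): different epochs, same version -> check fails.
log = [Eversion(406, 6529418), Eversion(346, 6529418)]
assert check_log_order(log) is False
```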
05/26/2017
- 09:44 PM Bug #19943: osd: enoent on snaptrimmer
- http://pulpito.ceph.com/gregf-2017-05-26_06:45:56-rados-wip-19931-snaptrim-pgs---basic-smithi/1231020/
- 03:36 PM Bug #20068 (Need More Info): osd valgrind error in CrushWrapper::has_incompat_choose_args
- https://github.com/ceph/ceph/pull/15244 was merged recently and modified how things are handled. Let's see if it happen...
- 12:40 PM Bug #20092 (Duplicate): ceph-osd: FileStore::_do_transaction: assert(0 == "unexpected error")
- http://pulpito.ceph.com/jdillaman-2017-05-25_16:48:38-rbd-wip-jd-testing-distro-basic-smithi/1229611...
05/25/2017
- 10:07 PM Bug #20086 (Can't reproduce): LibRadosLockECPP.LockSharedDurPP gets EEXIST
- ...
- 06:11 AM Bug #19983: osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/KernelDevice.c...
- /a/bhubbard-2017-05-24_05:25:43-rados-wip-badone-testing---basic-smithi/1224591/teuthology.log...
- 05:56 AM Bug #19943: osd: enoent on snaptrimmer
- /a/bhubbard-2017-05-24_05:25:43-rados-wip-badone-testing---basic-smithi/1224546/teuthology.log
- 02:27 AM Bug #19964: occasional crushtool timeouts
- /a/sage-2017-05-24_22:20:09-rados-wip-sage-testing---basic-smithi/1225182
- 12:16 AM Bug #19790: rados ls on pool with no access returns no error
- Looking into this
05/24/2017
- 11:13 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- Kefu, could you take a look at this one? Not sure if it's related to recent denc changes, or perhaps https://github.c...
- 10:26 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- More instances from last night's master:
- http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic...
- 10:01 PM Bug #19943: osd: enoent on snaptrimmer
- /a/sage-2017-05-24_18:40:38-rados-wip-sage-testing2---basic-smithi/1224933
- 03:44 PM Bug #16890 (Fix Under Review): rbd diff outputs nothing when the image is layered and with a writ...
- 03:43 PM Feature #16883: omap not supported by ec pools
- This is due to erasure coded pools not supporting omap operations. It's a limitation for the current cache pool code,...
- 03:25 PM Bug #17170 (Can't reproduce): mon/monclient: update "unable to obtain rotating service keys when ...
- 03:22 PM Bug #17929: rados tool should bail out if you combine listing and setting the snap ID
- There is discussion on that (closed) PR. We just don't want to do snap listing as it's even more expensive than norma...
- 03:13 PM Bug #17968 (Need More Info): Ceph:OSD can't finish recovery+backfill process due to assertion fai...
- 03:13 PM Bug #17968 (Can't reproduce): Ceph:OSD can't finish recovery+backfill process due to assertion fa...
- 12:05 PM Bug #20068 (In Progress): osd valgrind error in CrushWrapper::has_incompat_choose_args
- 10:34 AM Bug #20068: osd valgrind error in CrushWrapper::has_incompat_choose_args
- Oops, left off the actual link:
http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic-smithi/122...
- 10:33 AM Bug #20068 (Resolved): osd valgrind error in CrushWrapper::has_incompat_choose_args
- Loic: assigning to you because it looks like you were working in this function recently....
- 10:47 AM Bug #20069 (New): PGs failing to create at start of test, REQUIRE_LUMINOUS not set?
- http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic-smithi/1222407...
- 08:52 AM Bug #19790: rados ls on pool with no access returns no error
- For what it's worth, this is a regression. In Hammer, the appropriate EPERM is raised:...
05/23/2017
- 08:24 PM Bug #18165 (In Progress): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_tar...
- 07:37 PM Bug #19790: rados ls on pool with no access returns no error
- Well, it's obvious enough, we go into PrimaryLogPG::do_pg_op() before we check op_has_sufficient_caps().
I think t...
- 06:57 PM Bug #20059 (Resolved): miscounting degraded objects
- on bigbang,...
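The #19790 note above pins the missing-EPERM behavior on ordering: the pg-wide op path runs before the capability check. A minimal sketch of that ordering bug and its obvious fix (not Ceph source; all names here are hypothetical stand-ins for PrimaryLogPG::do_pg_op() and op_has_sufficient_caps()):

```python
# Illustrative sketch of the #19790 ordering bug: pg-wide ops such as
# object listing were dispatched before the caps check ran, so an
# unauthorized "rados ls" got an empty listing instead of EPERM.
import errno

class Op:
    def __init__(self, pg_op):
        self.pg_op = pg_op       # True for pg-wide ops like listing

class Caps:
    def __init__(self, allowed):
        self.allowed = allowed
    def allows(self, op):
        return self.allowed

def do_op_buggy(op, caps):
    if op.pg_op:                 # listing served before the caps check
        return []                # empty listing, no error reported
    if not caps.allows(op):
        return -errno.EPERM
    return "executed"

def do_op_fixed(op, caps):
    if not caps.allows(op):      # verify caps before any dispatch
        return -errno.EPERM
    if op.pg_op:
        return []
    return "executed"

listing, no_access = Op(pg_op=True), Caps(allowed=False)
assert do_op_buggy(listing, no_access) == []            # silent empty result
assert do_op_fixed(listing, no_access) == -errno.EPERM  # proper error
```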
- 09:50 AM Bug #20053 (New): crush compile / decompile loses precision on weight
- The weight of an item is displayed with %.3f and loses precision, which makes a difference in mapping.
Steps to rep...
- 03:39 AM Bug #20050: osd: very old pg creates take a long time to build past_intervals
- partially addressed by patch in wip-bigbang.
- 03:33 AM Bug #20050 (Resolved): osd: very old pg creates take a long time to build past_intervals
- (bigbang)
osds were down for a long time and pgs never got created. When the osds finally come up, they have to go...
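The precision loss described in Bug #20053 above can be reproduced with a short sketch, assuming CRUSH's 16.16 fixed-point weight storage; the helper below is illustrative, not crushtool code:

```python
# Illustrative sketch of #20053: CRUSH stores item weights internally
# as 16.16 fixed-point, but decompile prints them with %.3f, so a
# compile/decompile round trip can change the stored weight.
def to_fixed(w):
    """Hypothetical helper mirroring 16.16 fixed-point conversion."""
    return int(w * 0x10000)

orig = to_fixed(1.0 / 3.0)            # weight as stored: 21845
printed = "%.3f" % (orig / 0x10000)   # what decompile emits: "0.333"
recompiled = to_fixed(float(printed)) # weight after recompile: 21823
assert recompiled != orig             # precision lost in the round trip
```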
05/22/2017
- 11:05 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- Still happens in 12.0.3, with the patch [[https://github.com/ceph/ceph/pull/15046]] applied.
- 08:35 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- ...
- 05:22 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- I've seen this on scrub as well.
- 03:55 PM Bug #20041 (Resolved): ceph-osd: PGs getting stuck in scrub state, stalling RBD
- See the attached logs for the remove op against rbd_data.21aafa6b8b4567.0000000000000aaa...
- 04:34 PM Bug #19964: occasional crushtool timeouts
- See this log as well:
http://qa-proxy.ceph.com/teuthology/yuriw-2017-05-20_04:20:14-rados-master_2017_5_20---basic...
- 06:51 AM Bug #20000 (Can't reproduce): osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+ec+overw...
05/20/2017
- 06:41 AM Bug #19964: occasional crushtool timeouts
- This is not new; I've been spotting this occasionally in our Jenkins runs.
05/19/2017
- 04:38 PM Bug #19991 (New): dmclock-tests fail on my build VM
On my build machine, which is a VM, these tests fail; they pass on Jenkins.
[ RUN ] test_client.full_bore_timing
/home/dz...
- 03:18 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- I have the same error with 12.0.3...
- 02:52 PM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
- After switching to writeback cache mode, this error didn't occur again. So I'm confident the proxy mode of the cache ...
- 08:30 AM Bug #19983 (Closed): osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/Kerne...
- version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+rbd+ec+o...
05/17/2017
- 11:09 PM Bug #19971 (In Progress): osd: deletes are performed inline during pg log processing
- 11:09 PM Bug #19971 (Resolved): osd: deletes are performed inline during pg log processing
- With a large number of deletes in a client workload, this can easily saturate a disk and cause very high latency, sin...
- 09:42 PM Bug #19700: OSD remained up despite cluster network being inactive?
- Was the cluster performing IO while this happened? Do your public and private networks perhaps route to each other?
...
- 07:34 PM Bug #19790: rados ls on pool with no access returns no error
- Same issue even with just @rw@:...
- 06:38 PM Bug #19790: rados ls on pool with no access returns no error
- I'm not at a computer to check, but I'm pretty sure the "allow *" is short-circuiting other security checks here and ...
- 04:10 PM Bug #19790: rados ls on pool with no access returns no error
- 09:12 AM Bug #19790: rados ls on pool with no access returns no error
- Just checking: is anyone looking at this? It's arguably a security issue, after all.
- 03:53 PM Bug #16567: Ceph raises scrub errors on pgs with objects being overwritten
- Hmm, similar reports have popped up (although with on-disk size 0) on the mailing list. Those involved cache tiers th...
- 03:51 PM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
- xfs corruption means your setup was not safe for power failure, or your disk is dying. Neither is something that ceph...
- 03:38 PM Bug #15936: Osd-s on cache pool crash after upgrade from Hammer to Jewel
- Ping Joao? This looks to have been a crash in persisting/trimming HitSets, which I know underwent a bunch of changes/...
- 03:19 PM Bug #15741: librados get_last_version() doesn't return correct result after aio completion
- Any update on this, David? :)
- 02:23 PM Bug #19964 (Resolved): occasional crushtool timeouts
- ...
- 11:21 AM Bug #19960 (Resolved): overflow in client_io_rate in ceph osd pool stats
- luminous branch, v12.0.2
Output of ceph osd pool stats -f json contains overflowed values in client_io_rate sectio...
05/16/2017
- 07:59 PM Bug #19943: osd: enoent on snaptrimmer
- /a/yuriw-2017-05-15_22:59:10-rados-wip-yuri-testing_2017_5_16-distro-basic-smithi/1181575 (bluestore)
- 05:55 PM Bug #19943: osd: enoent on snaptrimmer
- Clone 269 was trimmed but it corresponds to a lot of other snapshots, so the object shouldn't be removed until all th...
- 04:04 PM Bug #19943 (Resolved): osd: enoent on snaptrimmer
- ...
- 04:54 PM Feature #19944 (Rejected): [RFE]: add option/support config persistence with ceph tell command
- We should have support in Ceph itself to make the conf changes persist; ceph tell has a good error-checking mechanism a...
- 11:08 AM Bug #19939 (Resolved): OSD crash in MOSDRepOpReply::decode_payload
Seen on kcephfs suite, running against a test branch based on Monday's master....
- 02:39 AM Bug #19936 (New): filestore ENOTEMPTY
- ...
05/12/2017
- 06:32 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- Or Sage/Zheng can confirm if this failure mode matches that error...
- 06:31 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- I'm just pattern-matching from going through my email, but https://github.com/ceph/ceph/pull/15046 is about OSDMap de...
05/11/2017
- 03:14 PM Bug #19911 (Can't reproduce): osd: out of order op
- ...
- 01:01 PM Bug #19909 (Won't Fix): PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid...
- After updating osds from 12.0.1 to 12.0.1-2248-g745902a, we get all osds failing like this:...
05/10/2017
- 02:03 PM Bug #19639: mon crash on shutdown
- Is it reproducing? Wouldn't surprise me if these were linked.
- 10:42 AM Bug #19639: mon crash on shutdown
- Sorry, I made the history confusing by editing the description. The "other one" is the one that is now the only one ...
05/09/2017
- 03:50 PM Bug #19895 (Can't reproduce): test/osd/RadosModel.h: 1169: FAILED assert(version == old_value.ver...
- ...
- 08:42 AM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
- Hi,
in fact, we are still (after some days) having this issue after upgrading to Luminous 12.0.2.
Same errors in th...
05/08/2017
- 09:50 PM Bug #19882 (Resolved): rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0....
- /a/sage-2017-05-08_20:50:21-rbd:qemu-wip-19863---basic-smithi/1114854
/a/sage-2017-05-08_20:50:21-rbd:qemu-wip-19863...
- 06:54 PM Bug #19881 (Can't reproduce): ceph-osd: pg_update_log_missing(1.20 epoch 66/11 rep_tid 1493 entri...
- OSD assertion failure during rbd-mirror test:
http://qa-proxy.ceph.com/teuthology/jdillaman-2017-05-08_11:56:19-rbd-...
05/07/2017
- 05:15 PM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
- Unfortunately my colleague already fixed the MDS (recover_dentries, journal reset) - now the op reply contains data.
...
05/04/2017
- 02:35 AM Bug #17945: ceph_test_rados_api_tier: failed to decode hitset in HitSetWrite test
- saw it again, ...
- 01:05 AM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
- osd op replies for 'stat' do not contain data. (140+0+0 in these lines) ...
05/03/2017
- 11:21 PM Bug #19849 (New): cls ops do not consistently get ENOENT on whiteouts
- The cls glue objclass.cc directly calls do_osd_ops, which inconsistently checks for !exists || is_whiteout(). Instea...
- 03:58 PM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
- Thanks for looking into this.
Here is the output with debug_mds=20 and debug_ms=1:...
05/02/2017
- 04:25 PM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
- Hi,
- I tested with kraken v11.2.0 again: deactivated the bluefs_allocator = stupid setting and restarted all my OSDs; issu...
- 10:06 AM Bug #18749: OSD: allow EC PGs to do recovery below min_size
- https://trello.com/c/5q8YSNtu
I am willing to solve this problem.
- 09:16 AM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
- The crash is strange; it happened when decoding the on-wire message from the osd. Please add 'debug ms = 1' to the mds config and ...
- 08:10 AM Cleanup #18875: osd: give deletion ops a cost when performing backfill
- Working on this issue
- 07:06 AM Bug #19400: add more info during pool delete error
- It's resolved.