Activity

From 05/02/2017 to 05/31/2017

05/31/2017

11:06 PM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-05-31_18:45:30-rados-wip-sage-testing---basic-smithi/1248735 Sage Weil
04:38 PM Bug #18043: ceph-mon prioritizes public_network over mon_host address
and to elaborate on the fact that i have a branch and no pr, i do intend to finish this up soon, but likely only afte... Joao Eduardo Luis
04:36 PM Bug #18043: ceph-mon prioritizes public_network over mon_host address
fwiw, i've got a branch handling this from earlier this year: https://github.com/jecluis/ceph/commits/wip-mon-host
...
Joao Eduardo Luis
04:04 PM Support #18508 (Closed): PGs of EC pool stuck in peering state
There was clearly a lot going on here and none of it was clear. If switching to SimpleMessenger fixed it, I presume t... Greg Farnum
03:14 PM Bug #17138: crush: inconsistent ruleset/ruled_id are difficult to figure out
Some work in progress on this here: https://github.com/ceph/ceph/pull/13683 Josh Durgin
03:21 AM Bug #20117 (Rejected): BlueStore.cc: 8585: FAILED assert(0 == "unexpected error")
version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+rbd+ec+o...
xw zhang
03:19 AM Bug #20116 (Can't reproduce): osds abort on shutdown with assert(ceph/src/osd/OSD.cc: 4324: FAILE...
version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+rbd+ec+o...
xw zhang

05/30/2017

01:46 PM Support #20108 (Resolved): PGs are not remapped correctly when one host fails
I have run into the following problem:
in a 6 node cluster we have 2 nodes/chassis, and the crush rule set to distri...
Laszlo Budai
01:45 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
Logs available on teuthology:/home/jdillaman/osd.23.log_try_rados_rm.gz
Jason Dillaman

05/29/2017

11:30 PM Bug #19790 (In Progress): rados ls on pool with no access returns no error
Brad Hubbard
11:28 PM Bug #19790: rados ls on pool with no access returns no error
https://github.com/ceph/ceph/pull/15354
Greg, will talk to you about the per-object cap semantics separately.
Brad Hubbard
07:45 PM Bug #19964: occasional crushtool timeouts
/a/sage-2017-05-28_05:00:18-rados-wip-sage-testing---basic-smithi/1238511
description: rados/singleton-bluestore/{...
Sage Weil
02:51 PM Bug #17968: Ceph:OSD can't finish recovery+backfill process due to assertion failure
https://github.com/ceph/ceph/pull/15349 Xuehan Xu

05/28/2017

09:17 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
I've no idea the repercussions (thinking I'll backup and recreate the cluster) but if you write an osdmap into all of... Jason McNeil
03:09 AM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-05-27_03:43:09-rados-wip-sage-testing2---basic-smithi/1235222 Sage Weil
03:07 AM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-05-27_03:43:09-rados-wip-sage-testing2---basic-smithi/1235419 Sage Weil
02:03 AM Bug #19964: occasional crushtool timeouts
/a/sage-2017-05-27_03:43:09-rados-wip-sage-testing2---basic-smithi/1235225 Sage Weil
01:59 AM Bug #19964: occasional crushtool timeouts
/a/sage-2017-05-27_01:05:11-rados-wip-sage-testing---basic-smithi/1233483 Sage Weil
01:57 AM Bug #20105 (Resolved): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
... Sage Weil

05/27/2017

08:06 AM Bug #17968: Ceph:OSD can't finish recovery+backfill process due to assertion failure
I have a document that provides the detail of our analysis of this problem, but it's written in Chinese. If needed, I... Xuehan Xu
08:03 AM Bug #17968: Ceph:OSD can't finish recovery+backfill process due to assertion failure
Hi, everyone.
Sorry, I forgot to watch my issues.
We found that the problem is due to "librados::OPERATION_BALA...
Xuehan Xu
07:59 AM Bug #19983: osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/KernelDevice.c...
I pulled out a disk, and then there was the problem. xw zhang
03:06 AM Bug #20099: osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.version < e.version.ve...
fang yuxiang wrote:
> I think this is not a functional issue in Ceph; maybe your local fs data is corrupted.
>
> ar...
huanwen ren
03:01 AM Bug #20099: osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.version < e.version.ve...
`read_log 406'6529418` and `read_log 346'6529418` have the same seq.
Also, using ceph-kvstore-tool I can show:
...
huanwen ren
02:46 AM Bug #20099: osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.version < e.version.ve...
I think this is not a functional issue in Ceph; maybe your local fs data is corrupted.
are you using any block cache...
fang yuxiang
02:41 AM Bug #20099 (Need More Info): osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.versi...
My Ceph cluster went down when the server was powered off,
and when I restarted my OSD, it failed in read_log.
As follows:...
huanwen ren

05/26/2017

09:44 PM Bug #19943: osd: enoent on snaptrimmer
http://pulpito.ceph.com/gregf-2017-05-26_06:45:56-rados-wip-19931-snaptrim-pgs---basic-smithi/1231020/ Greg Farnum
03:36 PM Bug #20068 (Need More Info): osd valgrind error in CrushWrapper::has_incompat_choose_args
https://github.com/ceph/ceph/pull/15244 was merged recently and modified how things are handled. Let's see if it happen... Loïc Dachary
12:40 PM Bug #20092 (Duplicate): ceph-osd: FileStore::_do_transaction: assert(0 == "unexpected error")
http://pulpito.ceph.com/jdillaman-2017-05-25_16:48:38-rbd-wip-jd-testing-distro-basic-smithi/1229611... Jason Dillaman

05/25/2017

10:07 PM Bug #20086 (Can't reproduce): LibRadosLockECPP.LockSharedDurPP gets EEXIST
... Sage Weil
06:11 AM Bug #19983: osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/KernelDevice.c...
/a/bhubbard-2017-05-24_05:25:43-rados-wip-badone-testing---basic-smithi/1224591/teuthology.log... Brad Hubbard
05:56 AM Bug #19943: osd: enoent on snaptrimmer
/a/bhubbard-2017-05-24_05:25:43-rados-wip-badone-testing---basic-smithi/1224546/teuthology.log Brad Hubbard
02:27 AM Bug #19964: occasional crushtool timeouts
/a/sage-2017-05-24_22:20:09-rados-wip-sage-testing---basic-smithi/1225182 Sage Weil
12:16 AM Bug #19790: rados ls on pool with no access returns no error
Looking into this Brad Hubbard

05/24/2017

11:13 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
Kefu, could you take a look at this one? Not sure if it's related to recent denc changes, or perhaps https://github.c... Josh Durgin
10:26 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
More instances from last night's master:
- http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic...
John Spray
10:01 PM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-05-24_18:40:38-rados-wip-sage-testing2---basic-smithi/1224933 Sage Weil
03:44 PM Bug #16890 (Fix Under Review): rbd diff outputs nothing when the image is layered and with a writ...
Josh Durgin
03:43 PM Feature #16883: omap not supported by ec pools
This is due to erasure coded pools not supporting omap operations. It's a limitation for the current cache pool code,... Josh Durgin
03:25 PM Bug #17170 (Can't reproduce): mon/monclient: update "unable to obtain rotating service keys when ...
Sage Weil
03:22 PM Bug #17929: rados tool should bail out if you combine listing and setting the snap ID
There is discussion on that (closed) PR. We just don't want to do snap listing as it's even more expensive than norma... Greg Farnum
03:13 PM Bug #17968 (Need More Info): Ceph:OSD can't finish recovery+backfill process due to assertion fai...
Greg Farnum
03:13 PM Bug #17968 (Can't reproduce): Ceph:OSD can't finish recovery+backfill process due to assertion fa...
Greg Farnum
12:05 PM Bug #20068 (In Progress): osd valgrind error in CrushWrapper::has_incompat_choose_args
Loïc Dachary
10:34 AM Bug #20068: osd valgrind error in CrushWrapper::has_incompat_choose_args
Oops, left off the actual link:
http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic-smithi/122...
John Spray
10:33 AM Bug #20068 (Resolved): osd valgrind error in CrushWrapper::has_incompat_choose_args
Loic: assigning to you because it looks like you were working in this function recently.... John Spray
10:47 AM Bug #20069 (New): PGs failing to create at start of test, REQUIRE_LUMINOUS not set?
http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic-smithi/1222407... John Spray
08:52 AM Bug #19790: rados ls on pool with no access returns no error
For what it's worth, this is a regression. In Hammer, the appropriate EPERM is raised:... Florian Haas

05/23/2017

08:24 PM Bug #18165 (In Progress): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_tar...
David Zafman
07:37 PM Bug #19790: rados ls on pool with no access returns no error
Well, it's obvious enough, we go into PrimaryLogPG::do_pg_op() before we check op_has_sufficient_caps().
I think t...
Greg Farnum
06:57 PM Bug #20059 (Resolved): miscounting degraded objects
on bigbang,... Sage Weil
09:50 AM Bug #20053 (New): crush compile / decompile looses precision on weight
The weight of an item is displayed with %.3f and loses precision, which makes a difference in mapping.
Steps to rep...
Loïc Dachary
03:39 AM Bug #20050: osd: very old pg creates take a long time to build past_intervals
partially addressed by patch in wip-bigbang. Sage Weil
03:33 AM Bug #20050 (Resolved): osd: very old pg creates take a long time to build past_intervals
(bigbang)
osds were down for a long time and pgs never got created. when the osds finally are up, they have to go...
Sage Weil

05/22/2017

11:05 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
Still happens in 12.0.3, with the patch https://github.com/ceph/ceph/pull/15046 applied. WANG Guoqin
08:35 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
... Jason Dillaman
05:22 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
I've seen this on scrub as well. Stefan Priebe
03:55 PM Bug #20041 (Resolved): ceph-osd: PGs getting stuck in scrub state, stalling RBD
See the attached logs for the remove op against rbd_data.21aafa6b8b4567.0000000000000aaa... Jason Dillaman
04:34 PM Bug #19964: occasional crushtool timeouts
See this log as well:
http://qa-proxy.ceph.com/teuthology/yuriw-2017-05-20_04:20:14-rados-master_2017_5_20---basic...
Yuri Weinstein
06:51 AM Bug #20000 (Can't reproduce): osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+ec+overw...
xw zhang

05/20/2017

06:41 AM Bug #19964: occasional crushtool timeouts
this is not new, i've been spotting this occasionally in our jenkins run. Kefu Chai

05/19/2017

04:38 PM Bug #19991 (New): dmclock-tests fail on my build VM

This fails on my build machine, which is a VM. It passes on Jenkins.
[ RUN ] test_client.full_bore_timing
/home/dz...
David Zafman
03:18 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))

I have the same error with 12.0.3...
Bertrand Gouny
02:52 PM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
After switching to writeback cache mode, this error didn't occur again. So I'm confident the proxy mode of the cache ... Andreas Gerstmayr
08:30 AM Bug #19983 (Closed): osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/Kerne...
version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+rbd+ec+o...
xw zhang

05/17/2017

11:09 PM Bug #19971 (In Progress): osd: deletes are performed inline during pg log processing
Josh Durgin
11:09 PM Bug #19971 (Resolved): osd: deletes are performed inline during pg log processing
With a large number of deletes in a client workload, this can easily saturate a disk and cause very high latency, sin... Josh Durgin
09:42 PM Bug #19700: OSD remained up despite cluster network being inactive?
Was the cluster performing IO while this happened? Do your public and private networks perhaps route to each other?
...
Greg Farnum
07:34 PM Bug #19790: rados ls on pool with no access returns no error
Same issue even with just `rw`:... Florian Haas
06:38 PM Bug #19790: rados ls on pool with no access returns no error
I'm not at a computer to check, but I'm pretty sure the "allow *" is short-circuiting other security checks here and ... Greg Farnum
04:10 PM Bug #19790: rados ls on pool with no access returns no error
Xiaoxi Chen
09:12 AM Bug #19790: rados ls on pool with no access returns no error
Just checking: is anyone looking at this? It's arguably a security issue, after all. Florian Haas
03:53 PM Bug #16567: Ceph raises scrub errors on pgs with objects being overwritten
Hmm, similar reports have popped up (although with on-disk size 0) on the mailing list. Those involved cache tiers th... Greg Farnum
03:51 PM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
xfs corruption means your setup was not safe for power failure, or your disk is dying. Neither is something that ceph... Josh Durgin
03:38 PM Bug #15936: Osd-s on cache pool crash after upgrade from Hammer to Jewel
Ping Joao? This looks to have been a crash in persisting/trimming HitSets, which I know underwent a bunch of changes/... Greg Farnum
03:19 PM Bug #15741: librados get_last_version() doesn't return correct result after aio completion
Any update on this, David? :) Greg Farnum
02:23 PM Bug #19964 (Resolved): occasional crushtool timeouts
... Sage Weil
11:21 AM Bug #19960 (Resolved): overflow in client_io_rate in ceph osd pool stats
luminous branch, v12.0.2
Output of ceph osd pool stats -f json contains overflowed values in client_io_rate sectio...
Aleksei Gutikov

05/16/2017

07:59 PM Bug #19943: osd: enoent on snaptrimmer
/a/yuriw-2017-05-15_22:59:10-rados-wip-yuri-testing_2017_5_16-distro-basic-smithi/1181575 (bluestore) Sage Weil
05:55 PM Bug #19943: osd: enoent on snaptrimmer
Clone 269 was trimmed but it corresponds to a lot of other snapshots, so the object shouldn't be removed until all th... Greg Farnum
04:04 PM Bug #19943 (Resolved): osd: enoent on snaptrimmer
... Sage Weil
04:54 PM Feature #19944 (Rejected): [RFE]: add option/support config persistence with ceph tell command
we should have support in ceph itself to make the conf changes persist, ceph tell has good error checking mechanism a... Vasu Kulkarni
11:08 AM Bug #19939 (Resolved): OSD crash in MOSDRepOpReply::decode_payload

Seen on kcephfs suite, running against test branch based on Monday's master....
John Spray
02:39 AM Bug #19936 (New): filestore ENOTEMPTY
... Sage Weil

05/12/2017

06:32 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
Or Sage/Zheng can confirm if this failure mode matches that error... Greg Farnum
06:31 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
I'm just pattern-matching from going through my email, but https://github.com/ceph/ceph/pull/15046 is about OSDMap de... Greg Farnum

05/11/2017

03:14 PM Bug #19911 (Can't reproduce): osd: out of order op
... Sage Weil
01:01 PM Bug #19909 (Won't Fix): PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid...
After updating osds from 12.0.1 to 12.0.1-2248-g745902a, we get all osd's failing like this:... Dan van der Ster

05/10/2017

02:03 PM Bug #19639: mon crash on shutdown
Is it reproducing? Wouldn't surprise me if these were linked. Greg Farnum
10:42 AM Bug #19639: mon crash on shutdown
Sorry, I made the history confusing by editing the description. The "other one" is the one that is now the only one ... John Spray

05/09/2017

03:50 PM Bug #19895 (Can't reproduce): test/osd/RadosModel.h: 1169: FAILED assert(version == old_value.ver...
... Sage Weil
08:42 AM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
Hi,
in fact, we are still (after some days) having this issue after upgrading to Luminous 12.0.2.
Same errors in th...
François Blondel

05/08/2017

09:50 PM Bug #19882 (Resolved): rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0....
/a/sage-2017-05-08_20:50:21-rbd:qemu-wip-19863---basic-smithi/1114854
/a/sage-2017-05-08_20:50:21-rbd:qemu-wip-19863...
Sage Weil
06:54 PM Bug #19881 (Can't reproduce): ceph-osd: pg_update_log_missing(1.20 epoch 66/11 rep_tid 1493 entri...
OSD assertion failure during rbd-mirror test:
http://qa-proxy.ceph.com/teuthology/jdillaman-2017-05-08_11:56:19-rbd-...
Jason Dillaman

05/07/2017

05:15 PM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
Unfortunately my colleague already fixed the MDS (recover_dentries, journal reset) - now the op reply contains data.
...
Andreas Gerstmayr

05/04/2017

02:35 AM Bug #17945: ceph_test_rados_api_tier: failed to decode hitset in HitSetWrite test
saw it again, ... Sage Weil
01:05 AM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
osd op replies for 'stat' do not contain data. (140+0+0 in these lines) ... Zheng Yan

05/03/2017

11:21 PM Bug #19849 (New): cls ops do not consistently get ENOENT on whiteouts
the cls glue objclass.cc directly calls do_osd_ops, which inconsistently checks for !exists || is_whiteout(). instea... Sage Weil
03:58 PM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
Thanks for looking into this.
Here is the output with debug_mds=20 and debug_ms=1:...
Andreas Gerstmayr

05/02/2017

04:25 PM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
Hi,
I tested with kraken v11.2.0 again: deactivated the bluefs_allocator = stupid and restarted all my OSDs, issu...
François Blondel
10:06 AM Bug #18749: OSD: allow EC PGs to do recovery below min_size
https://trello.com/c/5q8YSNtu
I am willing to solve this problem
Chang Liu
09:16 AM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
The crash is strange; it happened when decoding an on-wire message from the OSD. Please add 'debug ms = 1' to mds config and ... Zheng Yan
08:10 AM Cleanup #18875: osd: give deletion ops a cost when performing backfill
Working on this issue Chang Liu
07:06 AM Bug #19400: add more info during pool delete error
It's resolved. Chang Liu
 
