Activity

From 04/27/2017 to 05/26/2017

05/26/2017

09:44 PM Bug #19943: osd: enoent on snaptrimmer
http://pulpito.ceph.com/gregf-2017-05-26_06:45:56-rados-wip-19931-snaptrim-pgs---basic-smithi/1231020/ Greg Farnum
03:36 PM Bug #20068 (Need More Info): osd valgrind error in CrushWrapper::has_incompat_choose_args
https://github.com/ceph/ceph/pull/15244 was merged recently and modified how things are handled. Let's see if it happen... Loïc Dachary
12:40 PM Bug #20092 (Duplicate): ceph-osd: FileStore::_do_transaction: assert(0 == "unexpected error")
http://pulpito.ceph.com/jdillaman-2017-05-25_16:48:38-rbd-wip-jd-testing-distro-basic-smithi/1229611... Jason Dillaman

05/25/2017

10:07 PM Bug #20086 (Can't reproduce): LibRadosLockECPP.LockSharedDurPP gets EEXIST
... Sage Weil
06:11 AM Bug #19983: osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/KernelDevice.c...
/a/bhubbard-2017-05-24_05:25:43-rados-wip-badone-testing---basic-smithi/1224591/teuthology.log... Brad Hubbard
05:56 AM Bug #19943: osd: enoent on snaptrimmer
/a/bhubbard-2017-05-24_05:25:43-rados-wip-badone-testing---basic-smithi/1224546/teuthology.log Brad Hubbard
02:27 AM Bug #19964: occasional crushtool timeouts
/a/sage-2017-05-24_22:20:09-rados-wip-sage-testing---basic-smithi/1225182 Sage Weil
12:16 AM Bug #19790: rados ls on pool with no access returns no error
Looking into this Brad Hubbard

05/24/2017

11:13 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
Kefu, could you take a look at this one? Not sure if it's related to recent denc changes, or perhaps https://github.c... Josh Durgin
10:26 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
More instances from last night's master:
- http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic...
John Spray
10:01 PM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-05-24_18:40:38-rados-wip-sage-testing2---basic-smithi/1224933 Sage Weil
03:44 PM Bug #16890 (Fix Under Review): rbd diff outputs nothing when the image is layered and with a writ...
Josh Durgin
03:43 PM Feature #16883: omap not supported by ec pools
This is due to erasure coded pools not supporting omap operations. It's a limitation for the current cache pool code,... Josh Durgin
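The limitation is easy to see from a client. A hedged librados (C++) sketch follows; the pool name "ecpool" and object name "obj" are assumptions, and the exact error code may vary by release (historically an omap write against an erasure-coded pool fails with -EOPNOTSUPP):

```cpp
// Hedged sketch, not from the ticket: "ecpool" and "obj" are assumed names.
#include <rados/librados.hpp>
#include <iostream>
#include <map>
#include <string>

int main() {
  librados::Rados cluster;
  cluster.init(nullptr);            // default client id
  cluster.conf_read_file(nullptr);  // default ceph.conf search path
  if (cluster.connect() < 0) return 1;

  librados::IoCtx io;
  if (cluster.ioctx_create("ecpool", io) < 0) return 1;

  std::map<std::string, librados::bufferlist> kv;
  librados::bufferlist v;
  v.append("value");
  kv["key"] = v;

  librados::ObjectWriteOperation op;
  op.omap_set(kv);                  // omap write: unsupported on EC pools
  int r = io.operate("obj", &op);
  std::cout << "omap_set on EC pool returned " << r << std::endl;
  cluster.shutdown();
  return 0;
}
```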
03:25 PM Bug #17170 (Can't reproduce): mon/monclient: update "unable to obtain rotating service keys when ...
Sage Weil
03:22 PM Bug #17929: rados tool should bail out if you combine listing and setting the snap ID
There is discussion on that (closed) PR. We just don't want to do snap listing as it's even more expensive than norma... Greg Farnum
03:13 PM Bug #17968 (Need More Info): Ceph:OSD can't finish recovery+backfill process due to assertion fai...
Greg Farnum
03:13 PM Bug #17968 (Can't reproduce): Ceph:OSD can't finish recovery+backfill process due to assertion fa...
Greg Farnum
12:05 PM Bug #20068 (In Progress): osd valgrind error in CrushWrapper::has_incompat_choose_args
Loïc Dachary
10:34 AM Bug #20068: osd valgrind error in CrushWrapper::has_incompat_choose_args
Oops, left off the actual link:
http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic-smithi/122...
John Spray
10:33 AM Bug #20068 (Resolved): osd valgrind error in CrushWrapper::has_incompat_choose_args
Loic: assigning to you because it looks like you were working in this function recently.... John Spray
10:47 AM Bug #20069 (New): PGs failing to create at start of test, REQUIRE_LUMINOUS not set?
http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic-smithi/1222407... John Spray
08:52 AM Bug #19790: rados ls on pool with no access returns no error
For what it's worth, this is a regression. In Hammer, the appropriate EPERM is raised:... Florian Haas

05/23/2017

08:24 PM Bug #18165 (In Progress): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_tar...
David Zafman
07:37 PM Bug #19790: rados ls on pool with no access returns no error
Well, it's obvious enough: we go into PrimaryLogPG::do_pg_op() before we check op_has_sufficient_caps().
I think t...
Greg Farnum
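A self-contained toy sketch of the ordering problem Greg points at; every name below is invented for illustration rather than taken from the real OSD dispatch code:

```cpp
// Toy model: validate caps before any op-specific branching, so pg ops
// (e.g. pgls, which backs "rados ls") cannot bypass the check.
#include <cerrno>
#include <cstdio>

struct OpRequest {
  bool pg_op;    // hypothetical flag: is this a pg op like pgls?
  bool caps_ok;  // hypothetical result of a caps check
};

static void handle_op(const OpRequest& op) {
  if (!op.caps_ok) {                 // check caps first...
    std::printf("reply %d (EPERM)\n", -EPERM);
    return;
  }
  if (op.pg_op) {                    // ...so pg ops can no longer skip it
    std::printf("do_pg_op\n");
    return;
  }
  std::printf("do_object_op\n");
}

int main() {
  handle_op({/*pg_op=*/true, /*caps_ok=*/false});  // "rados ls", no access
  return 0;
}
```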
06:57 PM Bug #20059 (Resolved): miscounting degraded objects
on bigbang,... Sage Weil
09:50 AM Bug #20053 (New): crush compile / decompile loses precision on weight
The weight of an item is displayed with %.3f and loses precision that makes a difference in mapping.
Steps to rep...
Loïc Dachary
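A small C++ illustration of how a %.3f round trip loses weight precision; the weight value here is made up, and CRUSH itself stores weights as 16.16 fixed-point, so the real numbers differ:

```cpp
#include <cstdio>
#include <cstdlib>

int main() {
  double w = 2.0 / 3.0;                        // hypothetical item weight
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%.3f", w);  // what decompile prints: "0.667"
  double round_trip = std::atof(buf);          // what a recompile reads back
  std::printf("original:   %.6f\n", w);
  std::printf("round trip: %.6f\n", round_trip);
  std::printf("difference: %.6f\n", w - round_trip);  // nonzero -> mapping drift
  return 0;
}
```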
03:39 AM Bug #20050: osd: very old pg creates take a long time to build past_intervals
Partially addressed by a patch in wip-bigbang. Sage Weil
03:33 AM Bug #20050 (Resolved): osd: very old pg creates take a long time to build past_intervals
(bigbang)
osds were down for a long time and pgs never got created. When the osds finally are up, they have to go...
Sage Weil

05/22/2017

11:05 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
Still happens in 12.0.3, with the patch [[https://github.com/ceph/ceph/pull/15046]] applied. WANG Guoqin
08:35 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
... Jason Dillaman
05:22 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
I've seen this on scrub as well. Stefan Priebe
03:55 PM Bug #20041 (Resolved): ceph-osd: PGs getting stuck in scrub state, stalling RBD
See the attached logs for the remove op against rbd_data.21aafa6b8b4567.0000000000000aaa... Jason Dillaman
04:34 PM Bug #19964: occasional crushtool timeouts
See this log as well:
http://qa-proxy.ceph.com/teuthology/yuriw-2017-05-20_04:20:14-rados-master_2017_5_20---basic...
Yuri Weinstein
06:51 AM Bug #20000 (Can't reproduce): osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+ec+overw...
xw zhang

05/20/2017

06:41 AM Bug #19964: occasional crushtool timeouts
This is not new; I've been spotting this occasionally in our Jenkins runs. Kefu Chai

05/19/2017

04:38 PM Bug #19991 (New): dmclock-tests fail on my build VM

This fails on my build machine, which is a VM; it passes on Jenkins.
[ RUN ] test_client.full_bore_timing
/home/dz...
David Zafman
03:18 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))

I have the same error with 12.0.3...
Bertrand Gouny
02:52 PM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
After switching to writeback cache mode, this error didn't occur again. So I'm confident the proxy mode of the cache ... Andreas Gerstmayr
08:30 AM Bug #19983 (Closed): osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/Kerne...
version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+rbd+ec+o...
xw zhang

05/17/2017

11:09 PM Bug #19971 (In Progress): osd: deletes are performed inline during pg log processing
Josh Durgin
11:09 PM Bug #19971 (Resolved): osd: deletes are performed inline during pg log processing
With a large number of deletes in a client workload, this can easily saturate a disk and cause very high latency, sin... Josh Durgin
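A minimal sketch of the direction this ticket implies; all types below are hypothetical stand-ins for the much richer pg log structures. The idea is to collect deletes found during log processing into a queue that background work drains at a bounded rate, instead of issuing them inline:

```cpp
// Hypothetical sketch: defer deletes seen during pg log processing.
#include <cstdio>
#include <deque>
#include <vector>

struct LogEntry { int object_id; bool is_delete; };  // toy pg log entry

int main() {
  std::vector<LogEntry> log = {{1, true}, {2, false}, {3, true}};
  std::deque<int> pending_deletes;
  for (const auto& e : log)
    if (e.is_delete)
      pending_deletes.push_back(e.object_id);  // defer: no disk I/O here
  // Later, background work drains the queue subject to normal throttling.
  while (!pending_deletes.empty()) {
    std::printf("background delete of object %d\n", pending_deletes.front());
    pending_deletes.pop_front();
  }
  return 0;
}
```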
09:42 PM Bug #19700: OSD remained up despite cluster network being inactive?
Was the cluster performing IO while this happened? Do your public and private networks perhaps route to each other?
...
Greg Farnum
07:34 PM Bug #19790: rados ls on pool with no access returns no error
Same issue even with just @rw@:... Florian Haas
06:38 PM Bug #19790: rados ls on pool with no access returns no error
I'm not at a computer to check, but I'm pretty sure the "allow *" is short-circuiting other security checks here and ... Greg Farnum
04:10 PM Bug #19790: rados ls on pool with no access returns no error
Xiaoxi Chen
09:12 AM Bug #19790: rados ls on pool with no access returns no error
Just checking: is anyone looking at this? It's arguably a security issue, after all. Florian Haas
03:53 PM Bug #16567: Ceph raises scrub errors on pgs with objects being overwritten
Hmm, similar reports have popped up (although with on-disk size 0) on the mailing list. Those involved cache tiers th... Greg Farnum
03:51 PM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
xfs corruption means your setup was not safe for power failure, or your disk is dying. Neither is something that ceph... Josh Durgin
03:38 PM Bug #15936: Osd-s on cache pool crash after upgrade from Hammer to Jewel
Ping Joao? This looks to have been a crash in persisting/trimming HitSets, which I know underwent a bunch of changes/... Greg Farnum
03:19 PM Bug #15741: librados get_last_version() doesn't return correct result after aio completion
Any update on this, David? :) Greg Farnum
02:23 PM Bug #19964 (Resolved): occasional crushtool timeouts
... Sage Weil
11:21 AM Bug #19960 (Resolved): overflow in client_io_rate in ceph osd pool stats
luminous branch, v12.0.2
Output of ceph osd pool stats -f json contains overflowed values in client_io_rate sectio...
Aleksei Gutikov
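As a hedged illustration only (the ticket does not state the cause): overflowed JSON values of this shape often come from unsigned subtraction wrapping around when a counter moves backwards, e.g.:

```cpp
// Toy demonstration of the failure shape; an assumed cause, not a diagnosis.
#include <cstdint>
#include <cstdio>

int main() {
  uint64_t prev = 1000, curr = 990;  // counter went backwards
  uint64_t rate = curr - prev;       // wraps to 18446744073709551606
  std::printf("bogus rate: %llu\n", (unsigned long long)rate);
  return 0;
}
```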

05/16/2017

07:59 PM Bug #19943: osd: enoent on snaptrimmer
/a/yuriw-2017-05-15_22:59:10-rados-wip-yuri-testing_2017_5_16-distro-basic-smithi/1181575 (bluestore) Sage Weil
05:55 PM Bug #19943: osd: enoent on snaptrimmer
Clone 269 was trimmed but it corresponds to a lot of other snapshots, so the object shouldn't be removed until all th... Greg Farnum
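A toy model of the invariant Greg describes; the snap ids below are invented for illustration:

```cpp
// A clone is only removable once no remaining snapshot still maps to it.
#include <cstdio>
#include <set>

int main() {
  std::set<int> clone_snaps = {266, 267, 268, 269};  // snaps this clone serves
  std::set<int> trimmed     = {269};                 // only snap 269 trimmed
  bool removable = true;
  for (int s : clone_snaps)
    if (!trimmed.count(s)) { removable = false; break; }
  std::printf("clone removable: %s\n", removable ? "yes" : "no");  // "no"
  return 0;
}
```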
04:04 PM Bug #19943 (Resolved): osd: enoent on snaptrimmer
... Sage Weil
04:54 PM Feature #19944 (Rejected): [RFE]: add option/support config persistence with ceph tell command
We should have support in Ceph itself to make the conf changes persist; ceph tell has a good error checking mechanism a... Vasu Kulkarni
11:08 AM Bug #19939 (Resolved): OSD crash in MOSDRepOpReply::decode_payload

Seen on kcephfs suite, running against test branch based on Monday's master....
John Spray
02:39 AM Bug #19936 (New): filestore ENOTEMPTY
... Sage Weil

05/12/2017

06:32 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
Or Sage/Zheng can confirm if this failure mode matches that error... Greg Farnum
06:31 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
I'm just pattern-matching from going through my email, but https://github.com/ceph/ceph/pull/15046 is about OSDMap de... Greg Farnum

05/11/2017

03:14 PM Bug #19911 (Can't reproduce): osd: out of order op
... Sage Weil
01:01 PM Bug #19909 (Won't Fix): PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid...
After updating osds from 12.0.1 to 12.0.1-2248-g745902a, we get all osds failing like this:... Dan van der Ster

05/10/2017

02:03 PM Bug #19639: mon crash on shutdown
Is it reproducing? Wouldn't surprise me if these were linked. Greg Farnum
10:42 AM Bug #19639: mon crash on shutdown
Sorry, I made the history confusing by editing the description. The "other one" is the one that is now the only one ... John Spray

05/09/2017

03:50 PM Bug #19895 (Can't reproduce): test/osd/RadosModel.h: 1169: FAILED assert(version == old_value.ver...
... Sage Weil
08:42 AM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
Hi,
in fact, we are still (after some days) having this issue after upgrading to Luminous 12.0.2.
Same errors in th...
François Blondel

05/08/2017

09:50 PM Bug #19882 (Resolved): rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0....
/a/sage-2017-05-08_20:50:21-rbd:qemu-wip-19863---basic-smithi/1114854
/a/sage-2017-05-08_20:50:21-rbd:qemu-wip-19863...
Sage Weil
06:54 PM Bug #19881 (Can't reproduce): ceph-osd: pg_update_log_missing(1.20 epoch 66/11 rep_tid 1493 entri...
OSD assertion failure during rbd-mirror test:
http://qa-proxy.ceph.com/teuthology/jdillaman-2017-05-08_11:56:19-rbd-...
Jason Dillaman

05/07/2017

05:15 PM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
Unfortunately my colleague already fixed the MDS (recover_dentries, journal reset) - now the op reply contains data.
...
Andreas Gerstmayr

05/04/2017

02:35 AM Bug #17945: ceph_test_rados_api_tier: failed to decode hitset in HitSetWrite test
saw it again, ... Sage Weil
01:05 AM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
osd op replies for 'stat' do not contain data. (140+0+0 in these lines) ... Zheng Yan

05/03/2017

11:21 PM Bug #19849 (New): cls ops do not consistently get ENOENT on whiteouts
The cls glue objclass.cc directly calls do_osd_ops, which inconsistently checks for !exists || is_whiteout(). Instea... Sage Weil
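A hypothetical helper sketch, not the real objclass glue: the shape of a fix is to put the existence test in one place so cls paths and regular op paths agree on whiteouts:

```cpp
#include <cerrno>
#include <cstdio>

struct ObjectState {
  bool exists;
  bool whiteout;  // cache-tier tombstone
};

// Single, shared test: a whiteout reads as "not there".
static int check_exists(const ObjectState& obs) {
  if (!obs.exists || obs.whiteout)
    return -ENOENT;
  return 0;
}

int main() {
  ObjectState whiteout{/*exists=*/true, /*whiteout=*/true};
  std::printf("cls read on whiteout: %d\n", check_exists(whiteout));  // -2
  return 0;
}
```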
03:58 PM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
Thanks for looking into this.
Here is the output with debug_mds=20 and debug_ms=1:...
Andreas Gerstmayr

05/02/2017

04:25 PM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
Hi,
I tested with Kraken v11.2.0 again: deactivated the bluefs_allocator = stupid setting and restarted all my OSDs; issu...
François Blondel
10:06 AM Bug #18749: OSD: allow EC PGs to do recovery below min_size
https://trello.com/c/5q8YSNtu
I am willing to solve this problem.
Chang Liu
09:16 AM Bug #19803: osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled buffer::...
The crash is strange; it happened while decoding an on-wire message from an osd. Please add 'debug ms = 1' to the mds config and ... Zheng Yan
08:10 AM Cleanup #18875: osd: give deletion ops a cost when performing backfill
Working on this issue Chang Liu
07:06 AM Bug #19400: add more info during pool delete error
It's resolved. Chang Liu

05/01/2017

10:26 PM Bug #19818 (New): crush: get_rule_weight_osd_map does not factor in pool size, rule
The get_rule_weight_osd_map function assumes that every osd reachable by the TAKE ops is used once. This isn't true in gener... Sage Weil
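A toy illustration of the double-counting issue (made-up rule data, not the CrushWrapper API): if a rule TAKEs overlapping subtrees, an osd can be reached more than once, so its weight share must be accumulated per TAKE rather than counted a single time:

```cpp
#include <cstdio>
#include <map>
#include <vector>

int main() {
  // osds reachable from each TAKE step of a hypothetical rule.
  std::vector<std::vector<int>> takes = {{0, 1}, {0, 2}};  // osd.0 appears twice
  std::map<int, float> weight;
  for (const auto& take : takes)
    for (int osd : take)
      weight[osd] += 1.0f;  // accumulate; "used once" would undercount osd.0
  for (const auto& [osd, w] : weight)
    std::printf("osd.%d -> %.1f\n", osd, w);
  return 0;
}
```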
05:33 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
Sage Weil
04:24 PM Bug #18698 (Can't reproduce): BlueFS FAILED assert(0 == "allocate failed... wtf")
I haven't seen this in any of our qa... is it still happening for you? Which versions? Sage Weil
03:55 PM Bug #19639 (Need More Info): mon crash on shutdown
what is the "other one" (besides probe_timeout #19738)? Sage Weil
02:34 PM Bug #19815 (New): Rollback/EC log entries take gratuitous amounts of memory
Each osd consumed too much memory when I tested EC overwrite, so I watched heap memory with google-perftools.
I fou...
hongpeng lu

04/28/2017

09:38 PM Feature #19810 (New): qa: test that we are trimming maps
We merged https://github.com/ceph/ceph/pull/14504 without noticing that it prevented *all* OSD map trimming, because ... Greg Farnum
03:29 PM Bug #19803 (New): osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled bu...
Hi,
our MDS crashes reproducibly after some hours when we're extracting lots of zip archives (with many small file...
Andreas Gerstmayr
05:42 AM Bug #19800: some osds are down when create a new pool and a new image of the pool (bluestore)
In this case, the bug occurs if we first remove the pool, then create the pool and image.
When it occurs, the most intui...
Tang Jin
02:19 AM Bug #19800: some osds are down when create a new pool and a new image of the pool (bluestore)
(gdb) bt
#0 0x00002b0ff06d4cc3 in pread64 () at ../sysdeps/unix/syscall-template.S:81
#1 0x00005591f7cecd35 in pr...
Tang Jin
02:18 AM Bug #19800 (Resolved): some osds are down when create a new pool and a new image of the pool (blu...
After many write IOs, such as snapshot writes and PG splitting for the cluster, creating a new pool and a new image of t... Tang Jin

04/27/2017

06:57 PM Bug #18329 (Can't reproduce): pure virtual method called in rocksdb from bluestore
haven't seen this since then. Sage Weil
06:19 PM Bug #19737: EAGAIN encountered during pg scrub (jewel)
Ran 4 times - 50% failure rate: http://pulpito.ceph.com/smithfarm-2017-04-27_17:35:57-rados-wip-jewel-backports-distr... Nathan Cutler
05:40 PM Bug #19737: EAGAIN encountered during pg scrub (jewel)
http://pulpito.ceph.com/smithfarm-2017-04-27_16:56:17-rados-wip-jewel-backports---basic-smithi/1074069/ Nathan Cutler
08:58 AM Bug #19790 (Resolved): rados ls on pool with no access returns no error
Given the following auth capabilities:... Florian Haas
 
