Activity

From 05/22/2017 to 06/20/2017

06/20/2017

10:39 PM Bug #18681: ceph-disk prepare/activate misses steps and fails on [Bluestore]
If you don't use the GPT partition labels/types that ceph-disk uses then the device ownership won't be changed to cep... Sage Weil
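A minimal sketch of how this can be checked by hand (device names are placeholders, not from the report): ceph-disk's udev rules key off the GPT partition type GUID, so a partition created without the ceph type codes keeps its default ownership until it is chowned manually.
    sgdisk --info=1 /dev/sdb      # shows the partition type GUID the udev rules match on
    ls -l /dev/sdb1               # ownership should become ceph:ceph once the rules fire
    chown ceph:ceph /dev/sdb1     # manual workaround if the ceph type codes were not used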
10:35 PM Bug #19983 (Need More Info): osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluesto...
Do you mean you pulled out the disk, and then ceph-osd crashed? That is normal--the disk is gone!
Or, do you mean...
Sage Weil
09:15 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
https://github.com/ceph/ceph/pull/15791
Sage Weil
09:07 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
related? also started seeing these:... Sage Weil
08:32 PM Bug #20360 (New): rados/verify valgrind tests: osds fail to start (xenial valgrind)
... Sage Weil
08:55 PM Bug #19299 (New): Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
Ping Sage, you got that subprocess strace data. Greg Farnum
06:45 PM Bug #19299: Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
Same problem here (fresh 12.0.3). Got OSDs behind by > 5000 maps; it took ~8 hours to get them booted.
Looking in...
red ref
08:52 PM Bug #19700: OSD remained up despite cluster network being inactive?
Sounds like we messed up the way cluster network heartbeating and the monitor's public network connection to the OSDs... Greg Farnum
06:35 PM Bug #19700: OSD remained up despite cluster network being inactive?
The cluster does not need to be performing any IO, other than normal peering and checking, and this will still happen... Patrick McLean
08:50 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
red ref, are you saying you created a brand-new cluster with 12.0.3 and saw this on first boot?
Sage, do you think...
Greg Farnum
06:30 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
I can confirm the second behavior ("failed to load OSD map for epoch 1") in native installed 12.0.3 (not in productio... red ref
06:20 PM Bug #19909 (Won't Fix): PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid...
What Greg said! :) Sage Weil
04:52 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
N.0.Y releases such as 12.0.2 are dev releases; you should not run them if you can't afford to rebuild them. Upgrades... Greg Farnum
08:24 PM Bug #20227 (Resolved): os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded s...
Sage Weil
02:56 AM Bug #20227 (Fix Under Review): os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark un...
https://github.com/ceph/ceph/pull/15766 Sage Weil
02:54 AM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
... Sage Weil
02:50 AM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
/a/sage-2017-06-19_18:44:38-rbd:qemu-master---basic-smithi/1301319 Sage Weil
08:16 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
/a/sage-2017-06-20_16:21:45-rados-wip-sage-testing2-distro-basic-smithi/1305525
rados/thrash/{0-size-min-size-overri...
Sage Weil
06:27 PM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
Sage Weil
06:15 PM Bug #20343: Jewel: OSD Thread time outs in XFS
The filestore-level splitting and merging isn't in the logs - the best way to tell is examining a pg's directory - e.... Josh Durgin
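A rough sketch of that kind of check (OSD id and PG id are placeholders): on a filestore OSD the PG's collection lives under current/<pgid>_head and gets hash-split into nested DIR_* subdirectories, so the nesting depth and object counts show whether splitting has happened.
    PGDIR=/var/lib/ceph/osd/ceph-2/current/1.7f_head
    find "$PGDIR" -type d -name 'DIR_*' | awk -F/ '{print NF}' | sort -n | tail -1   # deepest split level
    find "$PGDIR" -type f | wc -l                                                    # objects in the PG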
05:32 PM Bug #20343: Jewel: OSD Thread time outs in XFS
We looked through the mon logs and we can't really find any splitting (or merging) pg states in there. Do we need to... Eric Choi
12:34 AM Bug #20343: Jewel: OSD Thread time outs in XFS
This could be filestore splitting directories into multiple subdirectories when there are many objects, then merging ... Josh Durgin
06:12 PM Bug #19943 (Fix Under Review): osd: enoent on snaptrimmer
https://github.com/ceph/ceph/pull/15787 Sage Weil
06:02 PM Bug #19943: osd: enoent on snaptrimmer
no, i'm an idiot, ceph-objectstore-tool is doing it and it's noted in a different log file. sheesh. Sage Weil
01:43 PM Bug #19943: osd: enoent on snaptrimmer
confirmed same thing in another run. on osd startup, fsck shows the key that was deleted.... Sage Weil
04:33 PM Bug #20301: "/src/osd/SnapMapper.cc: 231: FAILED assert(r == -2)" in rados
also in http://qa-proxy.ceph.com/teuthology/yuriw-2017-06-20_00:37:23-rados-master-2017_6_20-distro-basic-smithi/1302... Yuri Weinstein
03:56 PM Bug #20358 (Fix Under Review): bluestore: sharedblob not moved during split
https://github.com/ceph/ceph/pull/15783 Sage Weil
03:54 PM Bug #20358 (Resolved): bluestore: sharedblob not moved during split
... Sage Weil
01:22 PM Bug #19960: overflow in client_io_rate in ceph osd pool stats
The bug is not reproducible after this commit (not sure that this one alone contains the fix):
commit d6d1db62edeb4c40a774fcb56e...
Aleksei Gutikov

06/19/2017

11:05 PM Bug #20273 (Resolved): osd/OSD.h: 1957: FAILED assert(peering_queue.empty())
Sage Weil
10:47 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
from thread: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-May/017869.html
[15:41:40] <jdillaman> greg...
Greg Farnum
10:25 PM Bug #20343: Jewel: OSD Thread time outs in XFS
That IO pattern may just be killing the OSD on its own, but I'm not sure what RGW is turning it into or if there's st... Greg Farnum
07:16 PM Bug #20343 (New): Jewel: OSD Thread time outs in XFS
Creating a tracker ticket following suggestion from mailing list:
"
We've been having this ongoing problem with...
Eric Choi
09:12 PM Bug #19960 (Resolved): overflow in client_io_rate in ceph osd pool stats
If it's just one or two commits, we could backport (please fill in the Backport field in that case). But 131 commits? Nathan Cutler
09:11 PM Bug #19960: overflow in client_io_rate in ceph osd pool stats
Aleksei: Please be more specific. PR#15073 has 131 commits - see https://github.com/ceph/ceph/pull/15073/commits Nathan Cutler
07:55 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
http://pulpito.ceph.com/jdillaman-2017-05-25_16:48:38-rbd-wip-jd-testing-distro-basic-smithi/1229611 Greg Farnum
07:55 PM Bug #20092 (Duplicate): ceph-osd: FileStore::_do_transaction: assert(0 == "unexpected error")
Oh, that's probably the new thing where btrfs is giving us ENOENT (Sage guessing it's about rocksdb and snapshots). T... Greg Farnum
12:26 PM Bug #20092 (Rejected): ceph-osd: FileStore::_do_transaction: assert(0 == "unexpected error")
The osd.1 log showed that rocksdb encountered a full disk:
-17> 2017-05-25 22:14:28.664403 7fb70cd9b700 -1 rocks...
Jason Dillaman
07:51 PM Bug #20326 (Resolved): Scrubbing terminated -- not all pgs were active and clean.
Nathan Cutler
06:45 PM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
reliably triggered, it seems, by rbd/qemu xfstests workload Sage Weil
06:45 PM Bug #19882 (Resolved): rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0....
Sage Weil
05:43 PM Bug #19943: osd: enoent on snaptrimmer
... Sage Weil
03:30 PM Bug #18681: ceph-disk prepare/activate misses steps and fails on [Bluestore]
Moving this to the RADOS bluestore tracker since it's probably owned by that team. Greg Farnum
11:55 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
... Kefu Chai
10:54 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
Unless there was a patch, I wouldn't be too sure this is fixed -- it was an intermittent failure. John Spray
10:48 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
all passed modulo a valgrind error in ceph-mds, see /a/kchai-2017-06-19_09:40:27-fs-master---basic-smithi/1300881/rem... Kefu Chai
09:41 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
rerunning at http://pulpito.ceph.com/kchai-2017-06-19_09:40:27-fs-master---basic-smithi/ Kefu Chai
08:14 AM Feature #15835 (Fix Under Review): filestore: randomize split threshold
Nathan Cutler

06/18/2017

08:36 AM Bug #20332: rados bench seq option doesn't work
Did you actually write out some data for it to read first? "seq" is just pulling back whatever was written down in th... Greg Farnum
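For reference, the usual sequence (pool name is a placeholder): the write phase has to keep its objects around for seq to have anything to read back.
    rados bench -p testpool 60 write --no-cleanup
    rados bench -p testpool 60 seq
    rados -p testpool cleanup        # remove the benchmark objects afterwards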
08:28 AM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
Bumping this priority up since it's an assert on read of committed data, rather than a simple disk write error. Greg Farnum
08:24 AM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
Sounds like we need some way of more reliably accounting for the extra cost of EC overwrites in our throttle limits. Greg Farnum

06/17/2017

09:19 PM Bug #20188: filestore: os/filestore/FileStore.h: 357: FAILED assert(q.empty()) from ceph_test_obj...
This testing branch didn't include any of the filestore improvements we've been getting, did it? Greg Farnum
09:18 PM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-06-17_13:41:40-rados-wip-sage-testing-distro-basic-smithi/1297478 Sage Weil
09:16 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
Do we have any idea why it hasn't popped up in leveldb? Is the multi-threading stuff less conducive to being snapshot... Greg Farnum
09:14 PM Bug #20134 (Rejected): test_rados.TestIoctx.test_aio_read AssertionError: 5 != 2
5 is EIO. That's not an error code we produce, but it's a possibility until David's stuff preventing us from returning... Greg Farnum
09:10 PM Bug #20326: Scrubbing terminated -- not all pgs were active and clean.
https://github.com/ceph/ceph/pull/15747 Sage Weil
09:09 PM Bug #20116: osds abort on shutdown with assert(ceph/src/osd/OSD.cc: 4324: FAILED assert(curmap))
Are there more logs or core dumps available around this? That backtrace looks serious but doesn't contain enough info... Greg Farnum
09:05 PM Support #20108 (Resolved): PGs are not remapped correctly when one host fails
Okay, as described (and especially since it's better in jewel) this is almost certainly about CRUSH max_retries. I'm ... Greg Farnum
06:18 PM Bug #20242: Make osd-scrub-repair.sh unit test run faster
I'm looking into making this test run faster as well as a couple of the other slow ones by splitting them up into sma... Caleb Boylan
06:18 PM Bug #19639 (Can't reproduce): mon crash on shutdown
Greg Farnum
05:52 PM Bug #19639: mon crash on shutdown
I haven't seen this happen again in recent memory. John Spray
05:25 AM Bug #19639: mon crash on shutdown
Turning this down; should close if we don't get it happening again. Greg Farnum
02:59 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
A month has passed and I'm still not able to figure out where the problem was, nor am I able to recover my cluster. Trying ... WANG Guoqin
01:47 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
I presume this was a bug in the older dev releases, but we should verify that before release. Greg Farnum
02:26 PM Bug #20099 (Need More Info): osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.versi...
Does this still exist or is it all cleaned up now? The repeating versions are a little weird but that's not enough dat... Greg Farnum
02:22 PM Bug #20092: ceph-osd: FileStore::_do_transaction: assert(0 == "unexpected error")
Do you have any evidence this *wasn't* an unexpected error given to us by the Filesystems, Jason? That does happen in... Greg Farnum
02:15 PM Bug #20059: miscounting degraded objects
Maybe we count each instance of an object when it's degraded (i.e., 3x for replicated pools), but the non-degraded on... Greg Farnum
01:43 PM Bug #19882: rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0.10251ca0c5f...
Is this the read of partially-written EC extents? Need some context if it's in Testing... Greg Farnum
01:36 PM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
http://pulpito.ceph.com/sage-2017-06-16_19:23:03-rbd:qemu-wip-19882---basic-smithi/
reliably reproduced by rbd/qemu
Sage Weil
05:50 AM Bug #19737: EAGAIN encountered during pg scrub (jewel)
(Optimistically sorting it as a test issue.) Greg Farnum
05:50 AM Bug #19737: EAGAIN encountered during pg scrub (jewel)
Is the message that the primary OSD is down incorrect? We've seen a few things like this that are test bugs around ha... Greg Farnum
05:45 AM Bug #19700 (Need More Info): OSD remained up despite cluster network being inactive?
Greg Farnum
05:42 AM Bug #19695: mon: leaked session
Has this reproduced? I thought valgrind was clean enough that we'd notice new leaks. Greg Farnum
05:19 AM Bug #19518: log entry does not include per-op rvals?
Have we *ever* filled in the per-op rvalues on retry? That sounds distressingly like returning read data on a write o... Greg Farnum
05:15 AM Bug #19487 (In Progress): "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
Based on PR comments we expect this to be fixed up by one of David's disk handling branches. Or did that one already... Greg Farnum
03:52 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
John, sorry. i missed this. will take a look at it next monday. Kefu Chai
02:34 AM Bug #19486: Rebalancing can propagate corrupt copy of replicated object
That is an interesting point about BlueStore; it will detect corruption but not manual edits... Greg Farnum
02:23 AM Bug #19400 (Resolved): add more info during pool delete error
Greg Farnum
12:26 AM Bug #20332 (Won't Fix): rados bench seq option doesn't work

For some reason the "seq" option finishes too quickly....
David Zafman

06/16/2017

09:10 PM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
/a/sage-2017-06-16_18:45:23-rados-wip-sage-testing-distro-basic-smithi/1293630 Sage Weil
01:40 PM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
... Sage Weil
09:10 PM Bug #20331 (Rejected): osd/PGLog.h: 770: FAILED assert(i->prior_version == last)
... Sage Weil
07:44 PM Bug #20000 (Need More Info): osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
Sage Weil
07:44 PM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
Could be... maybe also #20273? Sage Weil
02:56 AM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
we found that the msg threads are still working after the `delete osd` in the asyncmsg env; it's because the asyncmsg::wait() ... Zengran Zhang
07:41 PM Bug #20274: rewind divergent deletes head whiteout
Sage Weil
01:39 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
/a/sage-2017-06-16_00:46:50-rados-wip-sage-testing-distro-basic-smithi/1292433
rados/thrash-erasure-code/{ceph.yam...
Sage Weil
01:49 AM Bug #20169: filestore+btrfs occasionally returns ENOSPC
/a/kchai-2017-06-15_17:39:27-rados-wip-kefu-testing---basic-smithi/1291475 also with rocksdb + btrfs Kefu Chai
06:39 AM Bug #14088 (In Progress): mon: nothing logged when ENOSPC encountered during start up
https://github.com/ceph/ceph/pull/15723 - merged Brad Hubbard
05:54 AM Bug #19320: Pg inconsistent make ceph osd down
Hmm, did one of our official releases have the broken snapshot trimming backport semantics? I didn't think so bu... Greg Farnum
04:05 AM Bug #20256 (Resolved): "ceph osd df" is broken; asserts out on Luminous-enabled clusters
Nathan Cutler
02:30 AM Bug #20326 (In Progress): Scrubbing terminated -- not all pgs were active and clean.
... Sage Weil
01:29 AM Bug #20326 (New): Scrubbing terminated -- not all pgs were active and clean.
Kefu Chai
01:03 AM Bug #20326 (Resolved): Scrubbing terminated -- not all pgs were active and clean.
... Kefu Chai
12:42 AM Bug #20105: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
/a//kchai-2017-06-15_17:39:27-rados-wip-kefu-testing---basic-smithi/1291451 Kefu Chai

06/15/2017

09:42 PM Bug #19882: rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0.10251ca0c5f...
Sage Weil
06:04 PM Bug #19882: rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0.10251ca0c5f...
/a/teuthology-2017-06-15_02:01:02-rbd-master-distro-basic-smithi/1287766
rbd/qemu/{cache/writeback.yaml clusters/{fi...
Sage Weil
05:59 PM Bug #20273 (Fix Under Review): osd/OSD.h: 1957: FAILED assert(peering_queue.empty())
https://github.com/ceph/ceph/pull/15710 Sage Weil
05:53 PM Bug #20273: osd/OSD.h: 1957: FAILED assert(peering_queue.empty())
- handle_osd_map queued a write, with _write_committed as callback
- thread pools all shut down, including peering_w...
Sage Weil

06/14/2017

08:36 PM Bug #20256: "ceph osd df" is broken; asserts out on Luminous-enabled clusters
Greg Farnum
08:20 PM Bug #20303 (Can't reproduce): filejournal: Unable to read past sequence ... journal is corrupt
Run: http://pulpito.ceph.com/teuthology-2017-06-14_15:26:27-powercycle-master-distro-basic-smithi/
Job: 1285933
Log...
Yuri Weinstein
08:18 PM Bug #20302 (Resolved): "BlueStore.cc: 9023: FAILED assert(0 == "unexpected error")" in powercycle...
Run: http://pulpito.ceph.com/teuthology-2017-06-14_15:26:27-powercycle-master-distro-basic-smithi/
Job: 1285969
Log...
Yuri Weinstein
07:52 PM Bug #20301 (Can't reproduce): "/src/osd/SnapMapper.cc: 231: FAILED assert(r == -2)" in rados
Run: http://pulpito.ceph.com/yuriw-2017-06-14_15:02:07-rados-master_2017_6_14-distro-basic-smithi/
Job: 1285768
Log...
Yuri Weinstein
06:46 PM Bug #19943 (In Progress): osd: enoent on snaptrimmer
Sage Weil
02:12 PM Bug #19943: osd: enoent on snaptrimmer
log with more debugging at /a/sage-2017-06-14_03:38:53-rados:thrash-wip-19943---basic-smithi/1284145/ceph-osd.5.log Sage Weil
03:38 AM Bug #19943: osd: enoent on snaptrimmer
WTH. I've seen two cases where the object exists in snapmapper a different pool (cache tiering), but I think this is... Sage Weil
04:26 PM Bug #17806 (Resolved): OSD: do not open pgs when the pg is not in pg_map
Greg Farnum
10:01 AM Bug #17806: OSD: do not open pgs when the pg is not in pg_map
The PR is merged to upstream. https://github.com/ceph/ceph/pull/11803. So please close it. Thanks. Xinze Chi
03:54 AM Bug #17806: OSD: do not open pgs when the pg is not in pg_map
Without more details I'm not sure this assessment is actually correct... Greg Farnum
02:34 PM Bug #20295 (Resolved): bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool ...
When running "rbd bench-write" using an RBD image stored in an EC pool, some OSD threads start to time out and eve... Ricardo Dias
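A hedged sketch of how such a setup is typically created (pool/image names and sizes are placeholders, not taken from the report): an EC pool with overwrites enabled serves as the data pool of an RBD image, which is then driven with rbd bench-write.
    ceph osd pool create ec-data 64 64 erasure
    ceph osd pool set ec-data allow_ec_overwrites true
    ceph osd pool create rbd-meta 64
    rbd create --size 10G --data-pool ec-data rbd-meta/bench-img
    rbd bench-write rbd-meta/bench-img --io-size 4K --io-threads 16 --io-total 1G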
01:44 PM Bug #16890: rbd diff outputs nothing when the image is layered and with a writeback cache tier
RBD isn't doing anything special with regard to cache tiering. It sounds like the whiteout in the cache tier is not r... Jason Dillaman
03:35 AM Bug #16890: rbd diff outputs nothing when the image is layered and with a writeback cache tier
Jason, can you make sure you expect this to work from an RBD perspective and throw it into the RADOS project if so? :) Greg Farnum
01:32 PM Feature #15835: filestore: randomize split threshold
https://github.com/ceph/ceph/pull/15689 Josh Durgin
09:01 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
Greg Farnum wrote:
> Note the second reporter confirms this is with cache tiering. Rather suspect that's got more to...
Bart Vanbrabant
03:46 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
Note the second reporter confirms this is with cache tiering. Rather suspect that's got more to do with it than snaps... Greg Farnum
05:27 AM Bug #18930: received Segmentation fault in PGLog::IndexedLog::add
Don't suppose there's still a log or core dump associated with this? Greg Farnum
04:46 AM Bug #14088: mon: nothing logged when ENOSPC encountered during start up
No, just scrubbing and trying to get things in a realistic state. Greg Farnum
04:08 AM Bug #14088: mon: nothing logged when ENOSPC encountered during start up
Greg, No, but I can try and take a look in the next few days if you'd like? Brad Hubbard
12:46 AM Bug #14088: mon: nothing logged when ENOSPC encountered during start up
Brad, did you do any work on this? Greg Farnum
04:35 AM Bug #18752: LibRadosList.EnumerateObjects failure
Hasn't reproduced yet. Greg Farnum
04:27 AM Bug #18328 (Closed): crush: flaky unitest:
Greg Farnum
04:13 AM Bug #18021 (Duplicate): Assertion "needs_recovery" fails when balance_read reaches a replica OSD ...
These are the same thing, right? Greg Farnum
04:11 AM Bug #17968: Ceph:OSD can't finish recovery+backfill process due to assertion failure
https://github.com/ceph/ceph/pull/15489#issuecomment-308152157 Greg Farnum
04:09 AM Bug #17949 (Resolved): make check: unittest_bit_alloc get_used_blocks() >= 0
The linked PR is not merged but has a comment that the race condition fix was merged. Greg Farnum
04:03 AM Bug #17830: osd-scrub-repair.sh is failing (intermittently?) on Jenkins
David, do we have any idea why this is failing? I'm not getting any idea from what's in the comments here. Greg Farnum
03:51 AM Bug #17718: EC Overwrites: update ceph-objectstore-tool export/import to handle rollforward/rollback
Josh, is this still outstanding? I presume we need it for testing... Greg Farnum
03:02 AM Bug #16385 (Fix Under Review): rados bench seq and rand tests do not work if op_size != object_size
One of the stuck PRs:
https://github.com/ceph/ceph/pull/12203
Greg Farnum
02:59 AM Bug #16379 (Closed): [ERROR ] "ceph auth get-or-create for keytype admin returned -1
It's been a year without updates and tests are more or less working, so this must be fixed. Greg Farnum
02:56 AM Bug #16365 (Resolved): Better network partition detection
We're switching to 2KB heartbeat packets now for other reasons. I don't think there's much else we can do here, pract... Greg Farnum
01:37 AM Bug #16177 (Closed): leveldb horrendously slow
Adam's cluster got cleaned up; the MDS doesn't allow you to generate directory omaps that large anymore; RGW is doing... Greg Farnum
12:43 AM Bug #13493: osd: for ec, cascading crash during recovery if one shard is corrupted
I suspect this is being resolved by David's work on EIO handling? Greg Farnum
12:02 AM Bug #20283 (New): qa: missing even trivial tests for many commands
I wrote a trivial script to look for missing commands in tests (https://github.com/ceph/ceph/pull/15675/commits/3aad0... Greg Farnum

06/13/2017

11:38 PM Bug #20256: "ceph osd df" is broken; asserts out on Luminous-enabled clusters
https://github.com/ceph/ceph/pull/15675 Greg Farnum
10:00 PM Bug #13111: replicatedPG: the assert occurs in the function ReplicatedPG::on_local_recover.
I don't really get how the AsyncMessenger could have caused this issue...? Greg Farnum
09:50 PM Bug #12659 (Closed): Can't delete cache pool
Closing due to lack of updates and various changes in cache pools since .94. Greg Farnum
09:48 PM Bug #12615: Repair of Erasure Coded pool with an unrepairable object causes pg state to lose clea...
David, is this still an issue? Greg Farnum
08:53 AM Bug #20277: bluestore crashed while performing scrub
What happened (twice) was:
* the osd had a crc error inconsistent pg
* set debug-bluestore and debug-osd to 20
* t...
Peter Gervai
08:21 AM Bug #20277 (Can't reproduce): bluestore crashed while performing scrub
... Kefu Chai
03:07 AM Bug #20274: rewind divergent deletes head whiteout
https://github.com/ceph/ceph/pull/15649 Sage Weil
02:54 AM Bug #20274 (Resolved): rewind divergent deletes head whiteout
... Sage Weil
03:00 AM Bug #19943: osd: enoent on snaptrimmer
with snap trim whiteout fix applied,
/a/sage-2017-06-12_20:56:37-rados-wip-sage-testing-distro-basic-smithi/128066...
Sage Weil
02:59 AM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
/a/sage-2017-06-12_20:56:37-rados-wip-sage-testing-distro-basic-smithi/1280581
has full log...
Sage Weil
02:33 AM Bug #20169: filestore+btrfs occasionally returns ENOSPC
... Sage Weil
02:28 AM Bug #20273 (Resolved): osd/OSD.h: 1957: FAILED assert(peering_queue.empty())
... Sage Weil

06/12/2017

04:35 PM Bug #20256: "ceph osd df" is broken; asserts out on Luminous-enabled clusters
So obviously what happened is I thought we had moved the osd df command into the monitor, but that didn't actually ha... Greg Farnum
04:33 PM Bug #20256 (Resolved): "ceph osd df" is broken; asserts out on Luminous-enabled clusters
I got a private email report:
When doing ‘ceph osd df’, ceph-mon always crashes. The stack info is as follows:...
Greg Farnum
08:46 AM Bug #18043: ceph-mon prioritizes public_network over mon_host address
Thanks for the update, I look forward to seeing your PR :). Sébastien Han

06/11/2017

07:52 PM Bug #13146 (Resolved): mon: creating a huge pool triggers a mon election
We're throttling PG creates now. Greg Farnum
07:28 PM Bug #11907: crushmap validation must not block the monitor
Don't we internally time out crush map testing now? Does it behave sensibly if things take too long? Greg Farnum
07:21 PM Bug #9523 (Closed): Both op threads and dispatcher threads could be stuck at acquiring the budget...
Based on the PR discussion it seems the diagnosed issue wasn't the cause of the slowness. Closing since it hasn't (kn... Greg Farnum

06/09/2017

07:51 PM Bug #20243 (Resolved): Improve size scrub error handling and ignore system attrs in xattr checking

Something similar to this was seen on a production system. If all the object_info_t matched there would be no erro...
David Zafman
06:39 PM Bug #20242 (Resolved): Make osd-scrub-repair.sh unit test run faster

Most likely move some tests to the rados suite.
David Zafman
01:26 AM Bug #20169: filestore+btrfs occasionally returns ENOSPC
ugh just saw this on xenial too. hrm.
/a/sage-2017-06-08_20:27:41-rados-wip-sage-testing2-distro-basic-smithi/127...
Sage Weil

06/08/2017

06:52 PM Bug #20227 (Need More Info): os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unlo...
Hmm, I see the fault_range call (it's in the new ec unclone code), but it's only dirtying the range including extents... Sage Weil
06:18 PM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
/a/sage-2017-06-08_02:04:29-rados-wip-sage-testing-distro-basic-smithi/1269367 too Sage Weil
06:14 PM Bug #20227 (Resolved): os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded s...
... Sage Weil
06:44 PM Bug #20221: kill osd + osd out leads to stale PGs
@Greg the original bug description was updated with a simpler reproducer which does not involve copying objects. I be... Loïc Dachary
06:34 PM Bug #20221: kill osd + osd out leads to stale PGs
Right, but what you've said here is that if you have pool size one, and kill the only OSD hosting it, then no other O... Greg Farnum
02:58 PM Bug #20221: kill osd + osd out leads to stale PGs
FWIW it was reproduced by badone. Loïc Dachary
12:20 PM Bug #20221: kill osd + osd out leads to stale PGs
@Greg the first reproducer was not trying to rados put the same object. It was trying to rados put another object. I ... Loïc Dachary
12:18 PM Bug #20221: kill osd + osd out leads to stale PGs
The reproducer works as expected on 12.0.3. The behavior changed somewhere in master after 12.0.3 was released. Loïc Dachary
12:17 PM Bug #20221: kill osd + osd out leads to stale PGs
I don't understand what behavior you're looking for. Hanging is the expected behavior when data is unavailable. Greg Farnum
10:07 AM Bug #20221 (New): kill osd + osd out leads to stale PGs
h3. description
When the OSD is killed before ceph osd out, the PGs stay in stale state.
h3. reproducer
From...
Loïc Dachary
05:53 PM Bug #19960 (Pending Backport): overflow in client_io_rate in ceph osd pool stats
Matt Benjamin
03:14 PM Bug #19960: overflow in client_io_rate in ceph osd pool stats
> By which commit/PR?
554cf8394a9ac4f845c1fce03dd1a7f551a414a9
Merge pull request #15073 from liewegas/wip-mgr-stats
Aleksei Gutikov
11:00 AM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
Hi Greg,
Thank you for taking the time to look into this.
Following the incident in this ticket, the clus...
Yiorgos Stamoulis

06/07/2017

08:57 PM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-06-07_16:25:35-rados-wip-sage-testing2-distro-basic-smithi/1268182
rados/thrash-erasure-code/{ceph.ya...
Sage Weil
02:03 AM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-06-06_21:54:14-rados-wip-sage-testing-distro-basic-smithi/1265627
rados/thrash/{0-size-min-size-overr...
Sage Weil
08:11 PM Documentation #20215 (New): librados documentation improvement for the use cases
librados documentation improvement for the use cases including the tradeoffs of object size, i/o rate, and omap vs re... Vikhyat Umrao
04:44 PM Bug #18696: OSD might assert when LTTNG tracing is enabled
Wonder if this PR https://github.com/ceph/ceph/pull/14304 fixes this issue as well. Ganesh Mahalingam
04:01 PM Bug #18750: handle_pg_remove: pg_map_lock held for write when taking pg_lock
I think I remember this one and it wasn't really feasible to fix (at the time). If doing code inspection you'll want ... Greg Farnum
03:59 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
Pretty weird, that assert appears to be an internal interval_set consistency thing: https://github.com/ceph/ceph/blob... Greg Farnum
03:58 PM Bug #19198: Bluestore doubles mem usage when caching object content
Sage Weil
03:50 PM Bug #18667: [cache tiering] omap data time-traveled to stale version
Jason says this "seems to pop up randomly every few weeks or so", so it's definitely a live, going concern. :( Greg Farnum
03:40 PM Bug #19086 (Rejected): BlockDevice::create should add check for readlink result instead of raise ...
Sage Weil
03:36 PM Bug #18647: ceph df output with erasure coded pools
Let's verify this prior to Luminous and write a test for it! Greg Farnum
03:29 PM Bug #19023 (Fix Under Review): ceph_test_rados invalid read caused apparently by lost intervals d...
https://github.com/ceph/ceph/pull/15555 Sage Weil
01:23 PM Bug #19960: overflow in client_io_rate in ceph osd pool stats
Aleksei Gutikov wrote:
> fixed in master
By which commit/PR?
Nathan Cutler
12:04 PM Bug #19960: overflow in client_io_rate in ceph osd pool stats
fixed in master Aleksei Gutikov
09:28 AM Bug #19783 (New): upgrade tests failing with "AssertionError: failed to complete snap trimming be...
Nathan Cutler
06:34 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
Zengran Zhang wrote:
> 2017-05-19 22:48:23.854608 7f14f1c1e700 0 -- 10.10.133.1:6823/2019 >> 10.10.133.1:6819/19544...
Zengran Zhang
02:04 AM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
... Sage Weil
02:02 AM Bug #20169: filestore+btrfs occasionally returns ENOSPC
/a/sage-2017-06-06_21:54:14-rados-wip-sage-testing-distro-basic-smithi/1265467
rados/thrash/{0-size-min-size-overrid...
Sage Weil
02:02 AM Bug #20169: filestore+btrfs occasionally returns ENOSPC
/a/sage-2017-06-06_21:54:14-rados-wip-sage-testing-distro-basic-smithi/1265435
rados/thrash/{0-size-min-size-overr...
Sage Weil

06/06/2017

07:02 PM Bug #20068 (Resolved): osd valgrind error in CrushWrapper::has_incompat_choose_args
Sage Weil
01:19 PM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-06-05_22:19:51-rados-wip-sage-testing-distro-basic-smithi/1262663
rados/thrash/{0-size-min-size-overr...
Sage Weil
01:16 PM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-06-05_22:19:51-rados-wip-sage-testing-distro-basic-smithi/1262583
rados/thrash/{0-size-min-size-overr...
Sage Weil
01:13 PM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
/a/sage-2017-06-05_22:19:51-rados-wip-sage-testing-distro-basic-smithi/1262365 Sage Weil
12:40 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
2017-05-19 22:58:05.142834 7f14de2a2700 0 osd.0 pg_epoch: 78440 pg[9.10cs0( v 78440'6350 (78438'4241,78440'6350] loc... Zengran Zhang

06/05/2017

09:32 PM Bug #19518: log entry does not include per-op rvals?
/a/sage-2017-06-05_18:36:01-rados-wip-sage-testing2-distro-basic-smithi/1261843... Sage Weil
06:27 PM Bug #20188 (New): filestore: os/filestore/FileStore.h: 357: FAILED assert(q.empty()) from ceph_te...
... Sage Weil
06:25 PM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-06-05_14:47:27-rados-wip-sage-testing-distro-basic-smithi/1260424
teuthology:1260424 06:25 PM $ cat s...
Sage Weil
06:24 PM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-06-05_14:47:27-rados-wip-sage-testing-distro-basic-smithi/1260344
teuthology:1260344 06:24 PM $ cat su...
Sage Weil
09:24 AM Backport #16239 (Fix Under Review): 'ceph tell osd.0 flush_pg_stats' fails in rados qa run
https://github.com/ceph/ceph/pull/15475 Kefu Chai

06/03/2017

06:46 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
It's less likely on Centos, but I think we've seen this before and it's usually been a btrfs kernel bug that got reso... Greg Farnum

06/02/2017

03:29 PM Bug #20169 (New): filestore+btrfs occasionally returns ENOSPC
... Sage Weil
03:14 PM Bug #19964: occasional crushtool timeouts
/a/sage-2017-06-02_08:32:01-rados-wip-sage-testing-distro-basic-smithi/1255514
teuthology:1255514 03:14 PM $ cat su...
Sage Weil
02:23 AM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-06-01_21:44:07-rados-wip-sage-testing---basic-smithi/1253654
teuthology:1253654 02:23 AM $ cat summary...
Sage Weil
02:20 AM Bug #20134 (Rejected): test_rados.TestIoctx.test_aio_read AssertionError: 5 != 2
<Pre>
2017-06-01T22:57:09.649 INFO:tasks.workunit.client.0.smithi084.stderr:========================================...
Sage Weil

06/01/2017

04:50 PM Bug #20133 (Can't reproduce): EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksd...
... Sage Weil
04:43 PM Bug #19964: occasional crushtool timeouts
/a/sage-2017-06-01_02:27:12-rados-wip-sage-testing2---basic-smithi/1249759
description: rados/singleton-bluestore/{a...
Sage Weil

05/31/2017

11:06 PM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-05-31_18:45:30-rados-wip-sage-testing---basic-smithi/1248735 Sage Weil
04:38 PM Bug #18043: ceph-mon prioritizes public_network over mon_host address
and to elaborate on the fact that i have a branch and no pr, i do intend to finish this up soon, but likely only afte... Joao Eduardo Luis
04:36 PM Bug #18043: ceph-mon prioritizes public_network over mon_host address
fwiw, i've got a branch handling this from earlier this year: https://github.com/jecluis/ceph/commits/wip-mon-host
...
Joao Eduardo Luis
04:04 PM Support #18508 (Closed): PGs of EC pool stuck in peering state
There was clearly a lot going on here and none of it was clear. If switching to SimpleMessenger fixed it, I presume t... Greg Farnum
03:14 PM Bug #17138: crush: inconsistent ruleset/ruled_id are difficult to figure out
Some work in progress on this here: https://github.com/ceph/ceph/pull/13683 Josh Durgin
03:21 AM Bug #20117 (Rejected): BlueStore.cc: 8585: FAILED assert(0 == "unexpected error")
version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+rbd+ec+o...
xw zhang
03:19 AM Bug #20116 (Can't reproduce): osds abort on shutdown with assert(ceph/src/osd/OSD.cc: 4324: FAILE...
version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+rbd+ec+o...
xw zhang

05/30/2017

01:46 PM Support #20108 (Resolved): PGs are not remapped correctly when one host fails
I have run into the following problem:
in a 6 node cluster we have 2 nodes/chassis, and the crush rule set to distri...
Laszlo Budai
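A hedged sketch of the kind of layout and check involved (bucket, rule, and node names are placeholders): chassis buckets hold the hosts, a rule picks the failure domain at the chassis level, and crushtool --test shows whether some mappings fail until the retry tunable is raised.
    ceph osd crush add-bucket chassis1 chassis
    ceph osd crush move node1 chassis=chassis1
    ceph osd crush move node2 chassis=chassis1
    ceph osd crush rule create-simple by-chassis default chassis
    # check for bad mappings with the default and with a larger choose_total_tries:
    ceph osd getcrushmap -o /tmp/cm
    crushtool -i /tmp/cm --test --rule 1 --num-rep 3 --show-bad-mappings
    crushtool -i /tmp/cm --test --rule 1 --num-rep 3 --show-bad-mappings --set-choose-total-tries 100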
01:45 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
Logs available on teuthology:/home/jdillaman/osd.23.log_try_rados_rm.gz
Jason Dillaman

05/29/2017

11:30 PM Bug #19790 (In Progress): rados ls on pool with no access returns no error
Brad Hubbard
11:28 PM Bug #19790: rados ls on pool with no access returns no error
https://github.com/ceph/ceph/pull/15354
Greg, will talk to you about the per-object cap semantics separately.
Brad Hubbard
07:45 PM Bug #19964: occasional crushtool timeouts
/a/sage-2017-05-28_05:00:18-rados-wip-sage-testing---basic-smithi/1238511
description: rados/singleton-bluestore/{...
Sage Weil
02:51 PM Bug #17968: Ceph:OSD can't finish recovery+backfill process due to assertion failure
https://github.com/ceph/ceph/pull/15349 Xuehan Xu

05/28/2017

09:17 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
I've no idea of the repercussions (thinking I'll back up and recreate the cluster), but if you write an osdmap into all of... Jason McNeil
03:09 AM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-05-27_03:43:09-rados-wip-sage-testing2---basic-smithi/1235222 Sage Weil
03:07 AM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-05-27_03:43:09-rados-wip-sage-testing2---basic-smithi/1235419 Sage Weil
02:03 AM Bug #19964: occasional crushtool timeouts
/a/sage-2017-05-27_03:43:09-rados-wip-sage-testing2---basic-smithi/1235225 Sage Weil
01:59 AM Bug #19964: occasional crushtool timeouts
/a/sage-2017-05-27_01:05:11-rados-wip-sage-testing---basic-smithi/1233483 Sage Weil
01:57 AM Bug #20105 (Resolved): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
... Sage Weil

05/27/2017

08:06 AM Bug #17968: Ceph:OSD can't finish recovery+backfill process due to assertion failure
I have a document that provides the details of our analysis of this problem, but it's written in Chinese. If needed, I... Xuehan Xu
08:03 AM Bug #17968: Ceph:OSD can't finish recovery+backfill process due to assertion failure
Hi, everyone.
Sorry, I forgot to watch my issues.
We found that the problem is due to "librados::OPERATION_BALA...
Xuehan Xu
07:59 AM Bug #19983: osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/KernelDevice.c...
I pulled out a disk, and then there was the problem. xw zhang
03:06 AM Bug #20099: osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.version < e.version.ve...
fang yuxiang wrote:
> i think this is not functional issue of ceph, maybe your local fs data is corrupted.
>
> ar...
huanwen ren
03:01 AM Bug #20099: osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.version < e.version.ve...
`read_log 406'6529418` and `read_log 346'6529418` have the same seq.
Also, ceph-kvstore-tool shows:
...
huanwen ren
02:46 AM Bug #20099: osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.version < e.version.ve...
i think this is not functional issue of ceph, maybe your local fs data is corrupted.
are you using any block cache...
fang yuxiang
02:41 AM Bug #20099 (Need More Info): osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.versi...
My Ceph cluster went down when the server was powered off,
and when I restarted my osd, it failed in read_log.
As follows:...
huanwen ren

05/26/2017

09:44 PM Bug #19943: osd: enoent on snaptrimmer
http://pulpito.ceph.com/gregf-2017-05-26_06:45:56-rados-wip-19931-snaptrim-pgs---basic-smithi/1231020/ Greg Farnum
03:36 PM Bug #20068 (Need More Info): osd valgrind error in CrushWrapper::has_incompat_choose_args
https://github.com/ceph/ceph/pull/15244 was merged recently and modified how things are handled. Let's see if it happen... Loïc Dachary
12:40 PM Bug #20092 (Duplicate): ceph-osd: FileStore::_do_transaction: assert(0 == "unexpected error")
http://pulpito.ceph.com/jdillaman-2017-05-25_16:48:38-rbd-wip-jd-testing-distro-basic-smithi/1229611... Jason Dillaman

05/25/2017

10:07 PM Bug #20086 (Can't reproduce): LibRadosLockECPP.LockSharedDurPP gets EEXIST
... Sage Weil
06:11 AM Bug #19983: osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/KernelDevice.c...
/a/bhubbard-2017-05-24_05:25:43-rados-wip-badone-testing---basic-smithi/1224591/teuthology.log... Brad Hubbard
05:56 AM Bug #19943: osd: enoent on snaptrimmer
/a/bhubbard-2017-05-24_05:25:43-rados-wip-badone-testing---basic-smithi/1224546/teuthology.log Brad Hubbard
02:27 AM Bug #19964: occasional crushtool timeouts
/a/sage-2017-05-24_22:20:09-rados-wip-sage-testing---basic-smithi/1225182 Sage Weil
12:16 AM Bug #19790: rados ls on pool with no access returns no error
Looking into this Brad Hubbard

05/24/2017

11:13 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
Kefu, could you take a look at this one? Not sure if it's related to recent denc changes, or perhaps https://github.c... Josh Durgin
10:26 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
More instances from last night's master:
- http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic...
John Spray
10:01 PM Bug #19943: osd: enoent on snaptrimmer
/a/sage-2017-05-24_18:40:38-rados-wip-sage-testing2---basic-smithi/1224933 Sage Weil
03:44 PM Bug #16890 (Fix Under Review): rbd diff outputs nothing when the image is layered and with a writ...
Josh Durgin
03:43 PM Feature #16883: omap not supported by ec pools
This is due to erasure coded pools not supporting omap operations. It's a limitation for the current cache pool code,... Josh Durgin
03:25 PM Bug #17170 (Can't reproduce): mon/monclient: update "unable to obtain rotating service keys when ...
Sage Weil
03:22 PM Bug #17929: rados tool should bail out if you combine listing and setting the snap ID
There is discussion on that (closed) PR. We just don't want to do snap listing as it's even more expensive than norma... Greg Farnum
03:13 PM Bug #17968 (Need More Info): Ceph:OSD can't finish recovery+backfill process due to assertion fai...
Greg Farnum
03:13 PM Bug #17968 (Can't reproduce): Ceph:OSD can't finish recovery+backfill process due to assertion fa...
Greg Farnum
12:05 PM Bug #20068 (In Progress): osd valgrind error in CrushWrapper::has_incompat_choose_args
Loïc Dachary
10:34 AM Bug #20068: osd valgrind error in CrushWrapper::has_incompat_choose_args
Oops, left off the actual link:
http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic-smithi/122...
John Spray
10:33 AM Bug #20068 (Resolved): osd valgrind error in CrushWrapper::has_incompat_choose_args
Loic: assigning to you because it looks like you were working in this function recently.... John Spray
10:47 AM Bug #20069 (New): PGs failing to create at start of test, REQUIRE_LUMINOUS not set?
http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic-smithi/1222407... John Spray
08:52 AM Bug #19790: rados ls on pool with no access returns no error
For what it's worth, this is a regression. In Hammer, the appropriate EPERM is raised:... Florian Haas

05/23/2017

08:24 PM Bug #18165 (In Progress): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_tar...
David Zafman
07:37 PM Bug #19790: rados ls on pool with no access returns no error
Well, it's obvious enough, we go into PrimaryLogPG::do_pg_op() before we check op_has_sufficient_caps().
I think t...
Greg Farnum
06:57 PM Bug #20059 (Resolved): miscounting degraded objects
on bigbang,... Sage Weil
09:50 AM Bug #20053 (New): crush compile / decompile loses precision on weight
The weight of an item is displayed with %.3f and loses precision, which makes a difference in mapping.
Steps to rep...
Loïc Dachary
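One way to see the effect, as a hedged round-trip check (paths and rule/replica numbers are placeholders): decompile and recompile the map, then compare the mappings crushtool computes from the original and the rebuilt map.
    ceph osd getcrushmap -o /tmp/cm.orig
    crushtool -d /tmp/cm.orig -o /tmp/cm.txt            # weights are printed with %.3f here
    crushtool -c /tmp/cm.txt -o /tmp/cm.rebuilt
    crushtool -i /tmp/cm.orig    --test --show-mappings --rule 0 --num-rep 3 > /tmp/map.orig
    crushtool -i /tmp/cm.rebuilt --test --show-mappings --rule 0 --num-rep 3 > /tmp/map.rebuilt
    diff /tmp/map.orig /tmp/map.rebuilt || echo "mappings changed after the round trip"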
03:39 AM Bug #20050: osd: very old pg creates take a long time to build past_intervals
partially addressed by patch in wip-bigbang. Sage Weil
03:33 AM Bug #20050 (Resolved): osd: very old pg creates take a long time to build past_intervals
(bigbang)
osds were down for a long time and pgs never got created. when the osds finally come up, they have to go...
Sage Weil

05/22/2017

11:05 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
Still happens in 12.0.3, with the patch [[https://github.com/ceph/ceph/pull/15046]] applied. WANG Guoqin
08:35 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
... Jason Dillaman
05:22 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
I've seen this on scrub as well. Stefan Priebe
03:55 PM Bug #20041 (Resolved): ceph-osd: PGs getting stuck in scrub state, stalling RBD
See the attached logs for the remove op against rbd_data.21aafa6b8b4567.0000000000000aaa... Jason Dillaman
04:34 PM Bug #19964: occasional crushtool timeouts
See this log as well:
http://qa-proxy.ceph.com/teuthology/yuriw-2017-05-20_04:20:14-rados-master_2017_5_20---basic...
Yuri Weinstein
06:51 AM Bug #20000 (Can't reproduce): osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
bluestore+ec+overw...
xw zhang
 
