Activity
From 04/02/2017 to 05/01/2017
05/01/2017
- 10:26 PM Bug #19818 (New): crush: get_rule_weight_osd_map does not factor in pool size, rule
- The get_rule_weight_osd_map assumes that every OSD reachable by the TAKE ops is used once. This isn't true in gener...
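  One way to see the mismatch with stock tools (the map path and rule number are placeholders, not taken from this report): a take step can reach far more OSDs than a small pool size ever selects per placement, so counting every reachable OSD once overstates its weight.
  <pre>
  # Decompile the map to inspect the rule's take/choose steps,
  # then sample mappings for a 3-copy pool against that rule.
  crushtool -d /path/to/crushmap -o crushmap.txt
  crushtool -i /path/to/crushmap --test --rule 0 --num-rep 3 --show-utilization
  </pre>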
- 05:33 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
- 04:24 PM Bug #18698 (Can't reproduce): BlueFS FAILED assert(0 == "allocate failed... wtf")
- I haven't seen this in any of our qa... is it still happening for you? Which versions?
- 03:55 PM Bug #19639 (Need More Info): mon crash on shutdown
- what is the "other one" (besides probe_timeout #19738)?
- 02:34 PM Bug #19815 (New): Rollback/EC log entries take gratuitous amounts of memory
- Each OSD consumed too much memory when I tested EC overwrite, so I watched heap memory with google-perftools.
I fou...
04/28/2017
- 09:38 PM Feature #19810 (New): qa: test that we are trimming maps
- We merged https://github.com/ceph/ceph/pull/14504 without noticing that it prevented *all* OSD map trimming, because ...
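  A minimal sketch of the kind of assertion such a test could make, assuming ceph report keeps exposing osdmap_first_committed (the churn step and timeout below are placeholders):
  <pre>
  first=$(ceph report 2>/dev/null | jq .osdmap_first_committed)
  # ... generate osdmap churn here (pool create/delete, osd in/out) ...
  sleep 600
  last=$(ceph report 2>/dev/null | jq .osdmap_first_committed)
  test "$last" -gt "$first" || echo "OSD maps are not being trimmed"
  </pre>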
- 03:29 PM Bug #19803 (New): osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled bu...
- Hi,
our MDS crashes reproducibly after some hours when we're extracting lots of zip archives (with many small file...
- 05:42 AM Bug #19800: some osds are down when creating a new pool and a new image of the pool (bluestore)
- In this case, the bug occurs if we first remove the pool and then create a pool and an image.
When it occurs, the most intui...
- 02:19 AM Bug #19800: some osds are down when creating a new pool and a new image of the pool (bluestore)
- (gdb) bt
#0 0x00002b0ff06d4cc3 in pread64 () at ../sysdeps/unix/syscall-template.S:81
#1 0x00005591f7cecd35 in pr...
- 02:18 AM Bug #19800 (Resolved): some osds are down when creating a new pool and a new image of the pool (blu...
- After many write IOs, such as snapshot writes and PG splitting on the cluster, creating a new pool and a new image of t...
04/27/2017
- 06:57 PM Bug #18329 (Can't reproduce): pure virtual method called in rocksdb from bluestore
- haven't seen this since then.
- 06:19 PM Bug #19737: EAGAIN encountered during pg scrub (jewel)
- Ran 4 times - 50% failure rate: http://pulpito.ceph.com/smithfarm-2017-04-27_17:35:57-rados-wip-jewel-backports-distr...
- 05:40 PM Bug #19737: EAGAIN encountered during pg scrub (jewel)
- http://pulpito.ceph.com/smithfarm-2017-04-27_16:56:17-rados-wip-jewel-backports---basic-smithi/1074069/
- 08:58 AM Bug #19790 (Resolved): rados ls on pool with no access returns no error
- Given the following auth capabilities:...
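  For illustration only (the client and pool names are made up, not the caps from this report), the surprising part is that listing a pool the client has no caps for succeeds with an empty result instead of failing:
  <pre>
  ceph auth get-or-create client.limited mon 'allow r' osd 'allow rwx pool=allowed-pool'
  rados --id limited -p allowed-pool ls   # works as expected
  rados --id limited -p other-pool ls     # returns nothing, exit status 0, no error
  </pre>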
04/26/2017
- 07:15 PM Bug #19783: upgrade tests failing with "AssertionError: failed to complete snap trimming before t...
- 06:21 PM Bug #19783 (Fix Under Review): upgrade tests failing with "AssertionError: failed to complete sna...
- 06:17 PM Bug #19783 (Pending Backport): upgrade tests failing with "AssertionError: failed to complete sna...
- -*master PR*: https://github.com/ceph/ceph/pull/14811-
- 06:15 PM Bug #19783 (New): upgrade tests failing with "AssertionError: failed to complete snap trimming be...
- After recent snap trimming changes, upgrade tests have been failing on slow machines (VPS) with "AssertionError: fail...
04/25/2017
- 06:05 PM Bug #18647: ceph df output with erasure coded pools
- This seems to still be an issue in Jewel. I am able to use the N and K settings of the EC crush rule to determine th...
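  As a rough worked example (the profile name and k/m values are hypothetical), the usable capacity of an EC pool follows from the profile rather than from ceph df directly:
  <pre>
  ceph osd erasure-code-profile get myprofile   # say k=4, m=2
  # Each logical byte costs (k+m)/k bytes raw, so:
  #   usable ≈ raw_avail * k / (k + m) = raw_avail * 4 / 6 ≈ 0.67 * raw_avail
  </pre>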
- 01:41 PM Bug #19299: Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
- It looks like the core of the problem is related to processing increases in the osd map epoch. Any change requires u...
- 12:49 AM Bug #19753 (Resolved): Deny reservation if expected backfill size would put us over backfill_full...
We currently just check the full status based on disk usage. We need to adjust for the amount of data expected l...
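  A sketch of the intended condition, not the actual OSD code (variable names are illustrative): the backfill target should deny the reservation when its current usage plus the primary's advertised PG size would cross the threshold.
  <pre>
  # deny if (bytes_used + expected_backfill_bytes) / bytes_total > osd_backfill_full_ratio
  ceph daemon osd.0 config get osd_backfill_full_ratio
  ceph osd df    # per-OSD used/total figures the check would be based on
  </pre>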
04/24/2017
- 03:23 PM Bug #19750: osd-scrub-repair.sh:2214: corrupt_scrub_erasure: test no = yes
- David, in case you want to take a look before the log is nuked by jenkins (I downloaded a copy, but bzip2 fails to co...
- 03:21 PM Bug #19750 (Can't reproduce): osd-scrub-repair.sh:2214: corrupt_scrub_erasure: test no = yes
- ...
- 02:40 PM Feature #17043: [RFE] filestore merge threadhold and split multiple defaults may not be ideal
- Downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=1219974
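  For reference, these are the two settings in question; the split point they imply (roughly filestore_split_multiple * abs(filestore_merge_threshold) * 16 files per subdirectory, if I recall the formula correctly) is easy to hit with the shipped defaults. The values below are only an example, not a recommendation:
  <pre>
  # ceph.conf, [osd] section - example values only
  #   filestore merge threshold = 40
  #   filestore split multiple = 8
  ceph daemon osd.0 config get filestore_merge_threshold
  ceph daemon osd.0 config get filestore_split_multiple
  </pre>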
04/21/2017
- 10:45 AM Bug #19639: mon crash on shutdown
- Split out the propose_pending one into http://tracker.ceph.com/issues/19738 with a candidate fix; the other one is sti...
- 09:08 AM Backport #16239: 'ceph tell osd.0 flush_pg_stats' fails in rados qa run
- This is showing in jewel 10.2.8 integration testing.
description: rados/singleton/{all/ec-lost-unfound-upgrade.yam...
- 08:56 AM Bug #19737 (Resolved): EAGAIN encountered during pg scrub (jewel)
- test description: rados/singleton-nomsgr/{all/lfn-upgrade-infernalis.yaml rados.yaml}
http://qa-proxy.ceph.com/teu...
04/20/2017
- 11:38 AM Bug #19486: Rebalancing can propagate corrupt copy of replicated object
- Thanks. I thought it might be the case that Bluestore would fix or improve this, but I haven't found a way to test th...
04/19/2017
- 11:45 PM Bug #19700: OSD remained up despite cluster network being inactive?
- Just for clarification, it was ceph2.r2 that was down, "chassis" is the physical node, and "host" is the subgroup on ...
- 11:42 PM Bug #19700: OSD remained up despite cluster network being inactive?
- Here is the output of "ip addr", note that the "internal" interface is DOWN with NO-CARRIER...
- 11:37 PM Bug #19700: OSD remained up despite cluster network being inactive?
- Here is the output of "ceph osd tree"...
- 11:32 PM Bug #19700: OSD remained up despite cluster network being inactive?
- One OSD was unable to communicate, and ceph osd tree showed the OSD as up.
- 11:25 PM Bug #19700 (Closed): OSD remained up despite cluster network being inactive?
- We have a ceph cluster with segregated cluster network for the OSDs to communicate with each other, and a "public" ne...
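  For context, this is the usual split-network layout (the addresses are examples): OSDs use the cluster network for replication and for heartbeats between each other, so a dead cluster interface would normally be expected to get the OSD reported down by its peers, which is what makes this report surprising.
  <pre>
  # ceph.conf - example addressing only
  [global]
      public network  = 192.168.1.0/24
      cluster network = 10.0.0.0/24
  </pre>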
- 09:09 PM Bug #19697 (New): make PG state names intelligible
- A question came up on the mailing list about whether PGs go degraded on CRUSH map changes, and how to distinguish tha...
- 05:20 PM Bug #19695 (New): mon: leaked session
- ...
- 05:19 PM Bug #18925 (Can't reproduce): Leak_DefinitelyLost in KernelDevice::aio_write
- 03:55 PM Bug #19444 (Can't reproduce): BlueStore::read() asserts in rados qa run
- 03:55 PM Bug #19486: Rebalancing can propagate corrupt copy of replicated object
- Yes. The new scrub tools (in progress) will give you more control over which copy is propagated. And bluestore's ch...
- 03:10 PM Bug #14115: crypto: race in nss init
- Seems harder to hit in our test environment now, but I did see this in one recent run.
04/17/2017
- 10:21 AM Bug #19639 (Can't reproduce): mon crash on shutdown
- Mon crash happening during shutdown in a cephfs test run....
04/15/2017
- 10:00 AM Bug #17743: ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
- This failure is plaguing kraken backports - see e.g.
* https://github.com/ceph/ceph/pull/14517
* https://github.c...
- 08:09 AM Bug #18599 (Resolved): bluestore: full osd will not start. _do_alloc_write failed to reserve 0x1...
04/13/2017
- 01:03 PM Bug #19606 (Can't reproduce): monitors crash on incorrect OSD UUID (and bad uuid following reboot?)
- I had restarted a host with a few OSDs, and all three monitors crashed.
version: 10.2.6-0ubuntu0.16.04.1
Trace:
...
- 10:24 AM Bug #19605 (Resolved): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- Seen in master multimds test run here:
http://pulpito.ceph.com/jspray-2017-04-12_23:38:47-multimds-master-testing-ba...
- 07:43 AM Bug #10348 (Won't Fix): crushtool --show-choose-tries overflows
- The conditions that create the statistic lossage do not make logical sense (i.e. the total_tries is lower than the lo...
04/12/2017
- 05:42 PM Bug #13111: replicatedPG: the assert occurs in the function ReplicatedPG::on_local_recover.
- Is this still a problem?
- 05:34 PM Bug #12659: Can't delete cache pool
- Just found this bug. Is this still causing problems?
- 04:24 PM Bug #8675: Unnecessary remapping/backfilling?
- CRUSH improvements are an ongoing discussion, and CRUSH is being improved right now.
- 04:23 PM Bug #8675 (Won't Fix): Unnecessary remapping/backfilling?
- 03:03 PM Bug #18926: Why osds do not release memory?
- I also see memory leakage on v12.0.1 w/ bluestore. My test is 1000 clients writing at 1MB/s into a CephFS. The OSDs s...
- 01:11 AM Bug #19487: "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
- I've updated my comment in https://github.com/ceph/ceph/pull/14318.
04/11/2017
- 08:37 PM Bug #19487: "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
Let's say I set osd failsafe full ratio = .90. Below I made up these numbers to show how
these percentages won't...
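  To illustrate with made-up numbers of my own (independent of the figures elided above): the global ratio averages across OSDs, while the full/failsafe checks are per OSD, so one nearly full OSD can trip the check while %RAW USED still looks modest.
  <pre>
  # Two 1 TB OSDs, osd failsafe full ratio = .90:
  #   osd.0 used 0.92 TB  -> 92%, over the failsafe ratio
  #   osd.1 used 0.10 TB  -> 10%
  #   GLOBAL %RAW USED = (0.92 + 0.10) / 2.0 = 51%  -> looks nowhere near full
  </pre>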
04/10/2017
- 11:11 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
- 1. We got the same problem when the data center lost power (the electricity was cut off). There are two OSDs...
- 04:38 AM Bug #19487 (Fix Under Review): "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_...
- https://github.com/ceph/ceph/pull/14318
04/06/2017
- 01:55 PM Bug #19518: log entry does not include per-op rvals?
- ...
- 01:52 PM Bug #19518 (New): log entry does not include per-op rvals?
- ...
- 10:21 AM Bug #19512 (Won't Fix): Sparse file info in filestore not propagated to other OSDs
- We recently had an interesting issue with RBD images and filestore on Jewel 10.2.5:
We have a pool with RBD images, ...
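  A quick way to see the effect on a filestore OSD (the path below is illustrative): compare the apparent size with the allocated size of the same object file on the original OSD and on an OSD that received it through recovery or backfill; on the latter the sparse regions come back fully allocated.
  <pre>
  du -h --apparent-size /var/lib/ceph/osd/ceph-0/current/<pgid>_head/<object-file>
  du -h                 /var/lib/ceph/osd/ceph-0/current/<pgid>_head/<object-file>
  </pre>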
04/05/2017
- 01:56 PM Bug #19379 (Resolved): bluestore: crc mismatch after recent overwrite
- 09:46 AM Bug #18467: ceph ping mon.* can fail
- @Nathan
Its value should be lower so that the fault is easier to reproduce. Now, a ping socket error will reconnect auto...
04/04/2017
- 10:26 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
- Totally misdiagnosed this one; closing the PR.
The problem looks like it's related to map skipping. Here:
<pre...
- 10:09 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
- /a/sage-2017-03-31_02:07:33-rados:thrash-wip-kill-subop-reordered---basic-smithi/968193
- 03:31 PM Bug #19449: 10.2.3->10.2.6 upgrade switched crush tunables, generated crc errors while processing?
- seems that my tunables jumped (for some reason) from firefly (jewel defaults, right?) to hammer, if it really happene...
- 02:09 PM Bug #19487 (Closed): "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
- 1) Use vstart.sh to create a cluster, with option: osd failsafe full ratio = .46
2) Input "ceph df":
GLOBAL:
SIZ...
- 01:31 PM Bug #19486 (New): Rebalancing can propagate corrupt copy of replicated object
- With 4 OSDs in a replication pool, with the replication count set to 3, I stored an object and found copies on osd0, ...
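  A rough outline of the reproduction as I read it (pool and object names are placeholders): find the replicas, corrupt one copy on disk, trigger rebalancing, and check which version the new replica ends up with.
  <pre>
  rados -p rep-pool put test-obj ./payload
  ceph osd map rep-pool test-obj          # shows the acting set, e.g. [0,1,2]
  # corrupt the copy on one of those OSDs directly on disk (filestore object file),
  # then take another replica out so the PG backfills onto the spare OSD:
  ceph osd out 1
  rados -p rep-pool get test-obj ./readback && cmp ./payload ./readback
  </pre>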
04/03/2017
- 10:52 AM Bug #19449 (Won't Fix): 10.2.3->10.2.6 upgrade switched crush tunables, generated crc errors whil...
- Hi,
when upgrading my cluster from 10.2.3 to 10.2.6 I faced a major failure, and I think it could(?) be a bug.
...
04/02/2017
- 04:18 AM Bug #19444: BlueStore::read() asserts in rados qa run
- not reproducible on master: http://pulpito.ceph.com/kchai-2017-04-02_03:46:37-rados-master---basic-mira/
not repro...
- 03:48 AM Bug #19444: BlueStore::read() asserts in rados qa run
- see also https://github.com/rook/rook/issues/374
- 03:44 AM Bug #19444 (Can't reproduce): BlueStore::read() asserts in rados qa run
- ...