
Activity

From 04/02/2017 to 05/01/2017

05/01/2017

10:26 PM Bug #19818 (New): crush: get_rule_weight_osd_map does not factor in pool size, rule
The get_rule_weight_osd_map function assumes that every OSD reachable by the TAKE ops is used once. This isn't true in gener... Sage Weil
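For illustration, a hypothetical rule shape (not taken from the ticket) where that assumption breaks: if rack1 sits under default, its OSDs are reachable from both take steps, so a weight map that counts each reachable OSD once would be wrong.

rule overlapping_takes {
	ruleset 1
	type replicated
	min_size 1
	max_size 10
	step take rack1
	step chooseleaf firstn 2 type host
	step emit
	step take default
	step chooseleaf firstn -2 type host
	step emit
}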
05:33 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
Sage Weil
04:24 PM Bug #18698 (Can't reproduce): BlueFS FAILED assert(0 == "allocate failed... wtf")
I haven't seen this in any of our qa runs... is it still happening for you? Which versions? Sage Weil
03:55 PM Bug #19639 (Need More Info): mon crash on shutdown
what is the "other one" (besides probe_timeout #19738)? Sage Weil
02:34 PM Bug #19815 (New): Rollback/EC log entries take gratuitous amounts of memory
Each OSD consumed too much memory when I tested EC overwrite, so I watched heap memory with google-perftools.
I fou...
hongpeng lu

04/28/2017

09:38 PM Feature #19810 (New): qa: test that we are trimming maps
We merged https://github.com/ceph/ceph/pull/14504 without noticing that it prevented *all* OSD map trimming, because ... Greg Farnum
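One possible shape for such a qa check (a sketch; osdmap_first_committed is a field in the ceph report JSON, everything else here is made up):

# churn some osdmap epochs, then verify the oldest committed map advances
first_before=$(ceph report | jq .osdmap_first_committed)
for i in $(seq 1 50); do
    ceph osd pool create trim-test-$i 8
    ceph osd pool delete trim-test-$i trim-test-$i --yes-i-really-really-mean-it
done
sleep 60
first_after=$(ceph report | jq .osdmap_first_committed)
test "$first_after" -gt "$first_before"    # fails if trimming never happens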
03:29 PM Bug #19803 (New): osd_op_reply for stat does not contain data (ceph-mds crashes with unhandled bu...
Hi,
our MDS crashes reproducibly after some hours when we're extracting lots of zip archives (with many small file...
Andreas Gerstmayr
05:42 AM Bug #19800: some osds are down when creating a new pool and a new image in the pool (bluestore)
In this case, the bug occurs if a pool is first removed and then a new pool and image are created.
When it occurs, the most intui...
Tang Jin
02:19 AM Bug #19800: some osds are down when creating a new pool and a new image in the pool (bluestore)
(gdb) bt
#0 0x00002b0ff06d4cc3 in pread64 () at ../sysdeps/unix/syscall-template.S:81
#1 0x00005591f7cecd35 in pr...
Tang Jin
02:18 AM Bug #19800 (Resolved): some osds are down when creating a new pool and a new image in the pool (blu...
After many write IOs, such as snapshot writing and PG splitting across the cluster, creating a new pool and a new image of t... Tang Jin

04/27/2017

06:57 PM Bug #18329 (Can't reproduce): pure virtual method called in rocksdb from bluestore
Haven't seen this since then. Sage Weil
06:19 PM Bug #19737: EAGAIN encountered during pg scrub (jewel)
Ran 4 times - 50% failure rate: http://pulpito.ceph.com/smithfarm-2017-04-27_17:35:57-rados-wip-jewel-backports-distr... Nathan Cutler
05:40 PM Bug #19737: EAGAIN encountered during pg scrub (jewel)
http://pulpito.ceph.com/smithfarm-2017-04-27_16:56:17-rados-wip-jewel-backports---basic-smithi/1074069/ Nathan Cutler
08:58 AM Bug #19790 (Resolved): rados ls on pool with no access returns no error
Given the following auth capabilities:... Florian Haas
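The capabilities in the report are truncated above; a hypothetical setup of the same shape (client name and pool names invented here):

ceph auth get-or-create client.restricted mon 'allow r' osd 'allow rwx pool=allowed'
rados --id restricted -p other-pool ls    # exits 0 with empty output instead of failing with EPERM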

04/26/2017

07:15 PM Bug #19783: upgrade tests failing with "AssertionError: failed to complete snap trimming before t...
Nathan Cutler
06:21 PM Bug #19783 (Fix Under Review): upgrade tests failing with "AssertionError: failed to complete sna...
Nathan Cutler
06:17 PM Bug #19783 (Pending Backport): upgrade tests failing with "AssertionError: failed to complete sna...
-*master PR*: https://github.com/ceph/ceph/pull/14811- Nathan Cutler
06:15 PM Bug #19783 (New): upgrade tests failing with "AssertionError: failed to complete snap trimming be...
After recent snap trimming changes, upgrade tests have been failing on slow machines (VPS) with "AssertionError: fail... Nathan Cutler

04/25/2017

06:05 PM Bug #18647: ceph df output with erasure coded pools
This seems to still be an issue in Jewel. I am able to use the N and K settings of the EC crush rule to determine th... David Turner
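For reference, with the usual k/m naming an EC pool stores (k+m)/k raw bytes per usable byte, so usable capacity is raw capacity scaled by k/(k+m); made-up numbers:

raw available: 60 TB, profile k=4 m=2  ->  usable = 60 TB * 4/(4+2) = 40 TB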
01:41 PM Bug #19299: Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
It looks like the core of the problem is related to processing increases in the osd map epoch. Any change requires u... Ben Meekhof
12:49 AM Bug #19753 (Resolved): Deny reservation if expected backfill size would put us over backfill_full...

We currently just check the full status based on disk usage. We need to adjust for the amount of data expected l...
David Zafman
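A minimal sketch of the adjusted check (hypothetical helper, not the actual Ceph code):

#include <cstdint>

// Deny a backfill reservation when projected usage, including the data we
// expect to receive during backfill, would cross the backfill-full threshold.
bool would_exceed_backfill_full(uint64_t used_bytes,
                                uint64_t total_bytes,
                                uint64_t expected_backfill_bytes,
                                double backfill_full_ratio) {
  double projected =
      static_cast<double>(used_bytes + expected_backfill_bytes) / total_bytes;
  return projected >= backfill_full_ratio;
}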

04/24/2017

03:23 PM Bug #19750: osd-scrub-repair.sh:2214: corrupt_scrub_erasure: test no = yes
David, in case you want to take a look before the log is nuked by jenkins (I downloaded a copy, but bzip2 fails to co... Kefu Chai
03:21 PM Bug #19750 (Can't reproduce): osd-scrub-repair.sh:2214: corrupt_scrub_erasure: test no = yes
... Kefu Chai
02:40 PM Feature #17043: [RFE] filestore merge threadhold and split multiple defaults may not be ideal
Downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=1219974 Vikhyat Umrao

04/21/2017

10:45 AM Bug #19639: mon crash on shutdown
Split out the propose_pending on into http://tracker.ceph.com/issues/19738 with a candidate fix, the other one is sti... John Spray
09:08 AM Backport #16239: 'ceph tell osd.0 flush_pg_stats' fails in rados qa run
This is showing in jewel 10.2.8 integration testing.
description: rados/singleton/{all/ec-lost-unfound-upgrade.yam...
Nathan Cutler
08:56 AM Bug #19737 (Resolved): EAGAIN encountered during pg scrub (jewel)
test description: rados/singleton-nomsgr/{all/lfn-upgrade-infernalis.yaml rados.yaml}
http://qa-proxy.ceph.com/teu...
Nathan Cutler

04/20/2017

11:38 AM Bug #19486: Rebalancing can propagate corrupt copy of replicated object
Thanks. I thought it might be the case that Bluestore would fix or improve this, but I haven't found a way to test th... Mark Houghton

04/19/2017

11:45 PM Bug #19700: OSD remained up despite cluster network being inactive?
Just for clarification, it was ceph2.r2 that was down, "chassis" is the physical node, and "host" is the subgroup on ... Patrick McLean
11:42 PM Bug #19700: OSD remained up despite cluster network being inactive?
Here is the output of "ip addr", note that the "internal" interface is DOWN with NO-CARRIER... Patrick McLean
11:37 PM Bug #19700: OSD remained up despite cluster network being inactive?
Here is the output of "ceph osd tree"... Patrick McLean
11:32 PM Bug #19700: OSD remained up despite cluster network being inactive?
One OSD was unable to communicate and ceph osd tree showed the OSD as up Vasu Kulkarni
11:25 PM Bug #19700 (Closed): OSD remained up despite cluster network being inactive?
We have a ceph cluster with segregated cluster network for the OSDs to communicate with each other, and a "public" ne... Patrick McLean
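For context, a split-network layout of the kind described is configured roughly like this (addresses invented):

[global]
    public network  = 192.0.2.0/24     # clients and monitors
    cluster network = 198.51.100.0/24  # OSD replication and backend heartbeats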
09:09 PM Bug #19697 (New): make PG state names intelligible
A question came up on the mailing list about whether PGs go degraded on CRUSH map changes, and how to distinguish tha... Greg Farnum
05:20 PM Bug #19695 (New): mon: leaked session
... Sage Weil
05:19 PM Bug #18925 (Can't reproduce): Leak_DefinitelyLost in KernelDevice::aio_write
Sage Weil
03:55 PM Bug #19444 (Can't reproduce): BlueStore::read() asserts in rados qa run
Sage Weil
03:55 PM Bug #19486: Rebalancing can propagate corrupt copy of replicated object
Yes. The new scrub tools (in progress) will give you more control over which copy is propagated. And bluestore's ch... Sage Weil
03:10 PM Bug #14115: crypto: race in nss init
Seems harder to hit in our test environment now, but I did see this in one recent run. Josh Durgin

04/17/2017

10:21 AM Bug #19639 (Can't reproduce): mon crash on shutdown
Mon crash happening during shutdown in a cephfs test run.... John Spray

04/15/2017

10:00 AM Bug #17743: ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
This failure is plaguing kraken backports - see e.g.
* https://github.com/ceph/ceph/pull/14517
* https://github.c...
Nathan Cutler
08:09 AM Bug #18599 (Resolved): bluestore: full osd will not start. _do_alloc_write failed to reserve 0x1...
Nathan Cutler

04/13/2017

01:03 PM Bug #19606 (Can't reproduce): monitors crash on incorrect OSD UUID (and bad uuid following reboot?)
I restarted a host with a few OSDs and all three monitors crashed.
version: 10.2.6-0ubuntu0.16.04.1
Trace:
...
George Shuklin
10:24 AM Bug #19605 (Resolved): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
Seen in master multimds test run here:
http://pulpito.ceph.com/jspray-2017-04-12_23:38:47-multimds-master-testing-ba...
John Spray
07:43 AM Bug #10348 (Won't Fix): crushtool --show-choose-tries overflows
The conditions that create the statistic lossage do not make logical sense (i.e. the total_tries is lower than the lo... Loïc Dachary

04/12/2017

05:42 PM Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover.
Is this still a problem? Sage Weil
05:34 PM Bug #12659: Can't delete cache pool
Just found this bug. Is this still causing problems? Sage Weil
04:24 PM Bug #8675: Unnecessary remapping/backfilling?
CRUSH improvements are a continuously ongoing discussion, and it's being improved right now. Greg Farnum
04:23 PM Bug #8675 (Won't Fix): Unnecessary remapping/backfilling?
Sage Weil
03:03 PM Bug #18926: Why osds do not release memory?
I also see memory leakage on v12.0.1 w/ bluestore. My test is 1000 clients writing at 1MB/s into a CephFS. The OSDs s... Dan van der Ster
01:11 AM Bug #19487: "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
I've updated my comment in https://github.com/ceph/ceph/pull/14318. Pan Liu

04/11/2017

08:37 PM Bug #19487: "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status

Let's say I set osd failsafe full ratio = .90. Below I made up these numbers to show how
these percentages won't...
David Zafman
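Made-up numbers in the same spirit: the global figure averages over the whole cluster, while the full check looks at each OSD individually.

cluster: 10 OSDs x 100 GB = 1000 GB raw, 460 GB used
    -> ceph df GLOBAL %RAW USED = 460 / 1000 = 46%
one OSD: 92 GB of its 100 GB used = 92% >= failsafe full ratio (.90)
    -> that OSD is treated as full even though %RAW USED shows only 46%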

04/10/2017

11:11 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
1. We hit the same problem when the data center lost power (the electricity was cut off). There are two osds... davinci yin
04:38 AM Bug #19487 (Fix Under Review): "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_...
https://github.com/ceph/ceph/pull/14318 Kefu Chai

04/06/2017

01:55 PM Bug #19518: log entry does not include per-op rvals?
... Sage Weil
01:52 PM Bug #19518 (New): log entry does not include per-op rvals?
... Sage Weil
10:21 AM Bug #19512 (Won't Fix): Sparse file info in filestore not propagated to other OSDs
We recently had an interesting issue with RBD images and filestore on Jewel 10.2.5:
We have a pool with RBD images, ...
Piotr Dalek
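One way to observe the symptom described is to compare logical size against allocated size for the same object file on different OSDs (paths omitted, commands approximate):

du -h --apparent-size <object-file>   # logical size, e.g. 4.0M
du -h <object-file>                   # allocated size, e.g. 52K on the sparse original
# on a replica that received the object via recovery, the two match:
# the file arrived fully allocated and the sparseness was lost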

04/05/2017

01:56 PM Bug #19379 (Resolved): bluestore: crc mismatch after recent overwrite
Sage Weil
09:46 AM Bug #18467: ceph ping mon.* can fail
@Nathan
Its value should be lower so that the fault is easier to reproduce. Currently, a ping socket error will reconnect auto...
Chang Liu

04/04/2017

10:26 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
Totally misdiagnosed this one; closing the PR.
The problem looks like it's related to map skipping. Here:
<pre...
Sage Weil
10:09 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
/a/sage-2017-03-31_02:07:33-rados:thrash-wip-kill-subop-reordered---basic-smithi/968193 Sage Weil
03:31 PM Bug #19449: 10.2.3->10.2.6 upgrade switched crush tunables, generated crc errors while processing?
Seems that my tunables jumped (for some reason) from firefly (jewel defaults, right?) to hammer, if it really happene... Herbert Faleiros
02:09 PM Bug #19487 (Closed): "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
1) Use vstart.sh to create a cluster, with option: osd failsafe full ratio = .46
2) Run "ceph df":
GLOBAL:
SIZ...
Pan Liu
01:31 PM Bug #19486 (New): Rebalancing can propagate corrupt copy of replicated object
With 4 OSDs in a replication pool, with the replication count set to 3, I stored an object and found copies on osd0, ... Mark Houghton
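A hedged outline of a reproduction along those lines (pool and object names invented; the corruption step is manual):

rados -p rep3pool put testobj /tmp/payload
# stop one OSD, flip a byte in its on-disk copy of testobj, restart it;
# filestore computes no per-object data checksum on read, so this goes unnoticed
ceph osd out <some-osd-id>                # trigger rebalancing/backfill
rados -p rep3pool get testobj /tmp/out    # may now return the corrupted copy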

04/03/2017

10:52 AM Bug #19449 (Won't Fix): 10.2.3->10.2.6 upgrade switched crush tunables, generated crc errors whil...
Hi,
when upgrading my cluster from 10.2.3 to 10.2.6 I faced a major failure, and I think it could(?) be a bug.
...
Herbert Faleiros

04/02/2017

04:18 AM Bug #19444: BlueStore::read() asserts in rados qa run
not reproducible on master: http://pulpito.ceph.com/kchai-2017-04-02_03:46:37-rados-master---basic-mira/
not repro...
Kefu Chai
03:48 AM Bug #19444: BlueStore::read() asserts in rados qa run
see also https://github.com/rook/rook/issues/374 Kefu Chai
03:44 AM Bug #19444 (Can't reproduce): BlueStore::read() asserts in rados qa run
... Kefu Chai
 
