Activity

From 03/17/2017 to 04/15/2017

04/15/2017

10:00 AM Bug #17743: ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
This failure is plaguing kraken backports - see e.g.
* https://github.com/ceph/ceph/pull/14517
* https://github.c...
Nathan Cutler
08:09 AM Bug #18599 (Resolved): bluestore: full osd will not start. _do_alloc_write failed to reserve 0x1...
Nathan Cutler

04/13/2017

01:03 PM Bug #19606 (Can't reproduce): monitors crash on incorrect OSD UUID (and bad uuid following reboot?)
I had restarted a host with a few OSDs and all three monitors crashed.
version: 10.2.6-0ubuntu0.16.04.1
Trace:
...
George Shuklin
10:24 AM Bug #19605 (Resolved): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
Seen in master multimds test run here:
http://pulpito.ceph.com/jspray-2017-04-12_23:38:47-multimds-master-testing-ba...
John Spray
07:43 AM Bug #10348 (Won't Fix): crushtool --show-choose-tries overflows
The conditions that create the statistic lossage do not make logical sense (i.e. the total_tries is lower than the lo... Loïc Dachary

04/12/2017

05:42 PM Bug #13111: replicatedPG: the assert occurs in the function ReplicatedPG::on_local_recover.
Is this still a problem? Sage Weil
05:34 PM Bug #12659: Can't delete cache pool
Just found this bug. Is this still causing problems? Sage Weil
04:24 PM Bug #8675: Unnecessary remapping/backfilling?
CRUSH improvements are a continuously ongoing discussion, and it's being improved right now. Greg Farnum
04:23 PM Bug #8675 (Won't Fix): Unnecessary remapping/backfilling?
Sage Weil
03:03 PM Bug #18926: Why osds do not release memory?
I also see memory leakage on v12.0.1 w/ bluestore. My test is 1000 clients writing at 1MB/s into a CephFS. The OSDs s... Dan van der Ster
01:11 AM Bug #19487: "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
I've updated my comment in https://github.com/ceph/ceph/pull/14318. Pan Liu

04/11/2017

08:37 PM Bug #19487: "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status

Let's say I set osd failsafe full ratio = .90. Below I made up these numbers to show how
these percentages won't...
David Zafman
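The inconsistency this ticket describes can be sketched with made-up numbers (purely illustrative, like the ones in the comment above): a cluster-wide %RAW USED aggregates usage across all OSDs, while the full-status check considers each OSD's own ratio, so the two can disagree. All values below are hypothetical.

```python
# Illustrative sketch (hypothetical numbers, not from a real cluster):
# GLOBAL %RAW USED averages across OSDs, while the failsafe check
# looks at each OSD individually, so the two can disagree.
FAILSAFE_FULL_RATIO = 0.90  # mirrors "osd failsafe full ratio = .90"

# (used, total) per OSD, in arbitrary units
osds = [(95, 100), (10, 100), (10, 100)]

raw_used_pct = 100.0 * sum(u for u, _ in osds) / sum(t for _, t in osds)
osd_over_failsafe = any(u / t >= FAILSAFE_FULL_RATIO for u, t in osds)

print(f"GLOBAL %RAW USED:   {raw_used_pct:.2f}%")  # ~38.33%: looks healthy
print(f"failsafe triggered: {osd_over_failsafe}")  # True: one OSD is 95% full
```

Whether "ceph df" should reflect the per-OSD view is the question discussed in the linked PR.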

04/10/2017

11:11 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
1. We got the same problem when the power of the data center was shut down (the electricity was cut off). There are two osds... davinci yin
04:38 AM Bug #19487 (Fix Under Review): "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_...
https://github.com/ceph/ceph/pull/14318 Kefu Chai

04/06/2017

01:55 PM Bug #19518: log entry does not include per-op rvals?
... Sage Weil
01:52 PM Bug #19518 (New): log entry does not include per-op rvals?
... Sage Weil
10:21 AM Bug #19512 (Won't Fix): Sparse file info in filestore not propagated to other OSDs
We recently had an interesting issue with RBD images and filestore on Jewel 10.2.5:
We have a pool with RBD images, ...
Piotr Dalek

04/05/2017

01:56 PM Bug #19379 (Resolved): bluestore: crc mismatch after recent overwrite
Sage Weil
09:46 AM Bug #18467: ceph ping mon.* can fail
@Nathan
Its value should be lower so that the fault is easier to reproduce. Now, a ping socket error will reconnect auto...
Chang Liu

04/04/2017

10:26 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
Totally misdiagnosed this one; closing the PR.
The problem looks like it's related to map skipping. Here:
<pre...
Sage Weil
10:09 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
/a/sage-2017-03-31_02:07:33-rados:thrash-wip-kill-subop-reordered---basic-smithi/968193 Sage Weil
03:31 PM Bug #19449: 10.2.3->10.2.6 upgrade switched crush tunables, generated crc errors while processing?
seems that my tunables jumped (for some reason) from firefly (jewel defaults, right?) to hammer, if it really happene... Herbert Faleiros
02:09 PM Bug #19487 (Closed): "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
1) Use vstart.sh to create a cluster, with the option: osd failsafe full ratio = .46
2) Input "ceph df":
GLOBAL:
SIZ...
Pan Liu
01:31 PM Bug #19486 (New): Rebalancing can propagate corrupt copy of replicated object
With 4 OSDs in a replication pool, with the replication count set to 3, I stored an object and found copies on osd0, ... Mark Houghton

04/03/2017

10:52 AM Bug #19449 (Won't Fix): 10.2.3->10.2.6 upgrade switched crush tunables, generated crc errors whil...
Hi,
When upgrading my cluster from 10.2.3 to 10.2.6 I faced a major failure, and I think it could(?) be a bug.
...
Herbert Faleiros

04/02/2017

04:18 AM Bug #19444: BlueStore::read() asserts in rados qa run
not reproducible on master: http://pulpito.ceph.com/kchai-2017-04-02_03:46:37-rados-master---basic-mira/
not repro...
Kefu Chai
03:48 AM Bug #19444: BlueStore::read() asserts in rados qa run
see also https://github.com/rook/rook/issues/374 Kefu Chai
03:44 AM Bug #19444 (Can't reproduce): BlueStore::read() asserts in rados qa run
... Kefu Chai

03/31/2017

08:09 PM Bug #13385: cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round fa...
Just saw this bug; the cluster was working normally one minute, and the next it's doing this on most of the OSDs. Never... Ben England
03:41 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
I'm experiencing this runaway memory issue as well. It only appeared a couple of days ago. I tried setting the bluest... Aaron T
01:56 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
https://github.com/ceph/ceph/pull/14270 Sage Weil
01:56 PM Bug #19440 (Fix Under Review): osd: trims maps that pgs haven't consumed yet when there are gaps
Sage Weil
01:52 PM Bug #19440 (New): osd: trims maps that pgs haven't consumed yet when there are gaps
... Sage Weil
01:21 PM Feature #19384: ceph_objectstore_tool (set|clear)-missing-item command
Chang Liu wrote:
> Hi, Sam
>
> I looked at this problem and found another problem.
>
> [...]
>
> We don't use...
Chang Liu
12:59 PM Feature #19384: ceph_objectstore_tool (set|clear)-missing-item command
Hi, Sam
I looked at this problem and found another problem....
Chang Liu

03/29/2017

01:07 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
Sorry, wrong window; please ignore my previous comment. Nokia ceph-users
01:00 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue

This is a bug in the ceph-mgr service (see http://tracker.ceph.com/issues/19407), currently set to need review ...
Nokia ceph-users
06:42 AM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
Hi Jaime,
The issue is not fixed with this workaround, and we will address it in another issue related t...
Muthusamy Muthiah
12:34 AM Feature #15835: filestore: randomize split threshold
This one is more about performance testing, and at this point I think effort there is better spent on bluestore than ... Josh Durgin

03/28/2017

04:39 PM Bug #19400 (Resolved): add more info during pool delete error
In luminous, mon_allow_pool_delete defaults to false, which may be confusing for any admin who
tries to delete ...
Vasu Kulkarni
02:55 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue

We decided to stop the ceph-mgr service on all the nodes because it is using a lot of CPU, and we understood that this se...
Jaime Ruiz
02:51 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue

Fixed with the following commands:
The memory is released by applying the following commands in a content no...
Jaime Ruiz
09:40 AM Documentation #18986: Need to document monitor health configuration values
The description of "mon warn osd usage percent" and "mon_osd_min_in_ratio" can be found at https://github.com/ceph/... Kefu Chai

03/27/2017

09:09 PM Bug #19267: rados list-inconsistent-obj sometimes doesn't flag that all 3 copies are bad
Oh I see, it's missing the error string.
I'm not sure if in this case it's just taking one of them as authoritativ...
Greg Farnum
08:59 AM Bug #19320: Pg inconsistent make ceph osd down
backtrace in the attached log_inconsistent.txt... Kefu Chai

03/24/2017

10:01 PM Feature #19384 (New): ceph_objectstore_tool (set|clear)-missing-item command
This one is only relevant for kraken and later. It would be good to have a command for directly manipulating a pg's ... Samuel Just
09:59 PM Feature #19383 (New): ceph_objectstore_tool: set-version op to allow setting the prior_version an...
The motivation for this one is to be able to manually do part of what mark_unfound_lost revert does automatically and... Samuel Just
09:49 PM Bug #19380 (New): only sort of a bug: it's possible to get an unfound object without losing min_s...
Fundamentally, ReplicatedBackend does destructive updates. That makes the following sequence possible. Assume that ... Samuel Just
09:33 PM Bug #19379 (Resolved): bluestore: crc mismatch after recent overwrite
... Sage Weil
08:37 PM Bug #19377: mark_unfound_lost revert won't actually recover the objects unless there are some fou...
There is a very clumsy workaround to this issue. Once the mark_unfound_lost revert command claims to have completed... Samuel Just
08:34 PM Bug #19377 (Duplicate): mark_unfound_lost revert won't actually recover the objects unless there ...
See ReplicatedPG::start_recovery_ops. If num_missing == num_unfound, we don't try to do recovery. This is problem... Samuel Just

03/23/2017

01:36 PM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
Hi!
I also hit this issue. Adding "bluefs_allocator = stupid" to my /etc/ceph/ceph.conf worked.
Issue happe...
François Blondel

03/22/2017

10:14 PM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
I've created https://github.com/ceph/ceph/pull/14054 to track Alexandre's changes.
I'm working on handling out of ...
David Zafman
09:10 AM Bug #19348 (Can't reproduce): "ceph ping mon.c" cli prints assertion failure on timeout
# start a cluster with 3 monitors: mon.a, mon.b and mon.c
# stop mon.c
# ceph ping mon.c --connect-timeout=5
it ...
Kefu Chai
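A generic sketch of the behavior this report expects: a timed-out or refused connection should surface as a readable error rather than an assertion failure. This is not Ceph's implementation; the helper name, address, and port below are hypothetical.

```python
import socket

def ping_mon(addr, timeout=5.0):
    """Hypothetical helper: return a clean error string on timeout
    instead of crashing, which is the behavior #19348 asks for."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return "pong"
    except (socket.timeout, OSError) as exc:
        return f"error: mon unreachable ({exc})"

# 203.0.113.1 is a documentation address (TEST-NET-3); the connect
# either times out or is rejected, and the caller gets a message.
print(ping_mon(("203.0.113.1", 6789), timeout=1.0))
```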

03/21/2017

10:09 AM Bug #19320 (New): Pg inconsistent make ceph osd down
Hi all.
I am running a ceph cluster.
There is an inconsistent pg:
pg 3.aff is active+recovery_wait+degraded+incon...
hoan nv

03/20/2017

04:26 AM Feature #15835: filestore: randomize split threshold
Hi! I am an undergrad student wishing to contribute to Ceph, and I would like to work on this issue. Please let me kn... Peng Chen

03/18/2017

06:03 AM Bug #19267: rados list-inconsistent-obj sometimes doesn't flag that all 3 copies are bad
Greg Farnum wrote:
> I don't understand. What about this output says that two copies are bad and one isn't?
Thank...
cheng li

03/17/2017

09:33 PM Bug #19300 (Can't reproduce): "Segmentation fault ceph_test_objectstore --gtest_filter=-*/3"
Run: http://pulpito.ceph.com/yuriw-2017-03-16_15:10:12-rados-wip-yuri-testing_2017_3_16-distro-basic-smithi/
Logs: h...
Yuri Weinstein
09:11 PM Bug #19267: rados list-inconsistent-obj sometimes doesn't flag that all 3 copies are bad
I don't understand. What about this output says that two copies are bad and one isn't? Greg Farnum
08:11 PM Bug #19299: Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
As far as I know that was with -f passed. Maybe it is relevant that I grepped out 'madvise' calls because they occur at... Ben Meekhof
07:59 PM Bug #19299 (Need More Info): Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
Sage Weil
07:58 PM Bug #19299: Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
the strace doesn't include child processes.. can you repeat with -f passed to strace? Sage Weil
07:48 PM Bug #19299 (Can't reproduce): Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
Since upgrading to Kraken we've had severe problems with OSD startup. Though this ticket mentions bootup specificall... Ben Meekhof
 
