Activity
From 03/17/2017 to 04/15/2017
04/15/2017
- 10:00 AM Bug #17743: ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
- This failure is plaguing kraken backports - see e.g.
* https://github.com/ceph/ceph/pull/14517
* https://github.c...
- 08:09 AM Bug #18599 (Resolved): bluestore: full osd will not start. _do_alloc_write failed to reserve 0x1...
04/13/2017
- 01:03 PM Bug #19606 (Can't reproduce): monitors crash on incorrect OSD UUID (and bad uuid following reboot?)
- I had restarted a host with a few OSDs, and all three monitors crashed.
version: 10.2.6-0ubuntu0.16.04.1
Trace:
... - 10:24 AM Bug #19605 (Resolved): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- Seen in master multimds test run here:
http://pulpito.ceph.com/jspray-2017-04-12_23:38:47-multimds-master-testing-ba...
- 07:43 AM Bug #10348 (Won't Fix): crushtool --show-choose-tries overflows
- The conditions that create the statistic lossage do not make logical sense (i.e. the total_tries is lower than the lo...
04/12/2017
- 05:42 PM Bug #13111: ReplicatedPG: the assert occurs in the function ReplicatedPG::on_local_recover.
- Is this still a problem?
- 05:34 PM Bug #12659: Can't delete cache pool
- Just found this bug. Is this still causing problems?
- 04:24 PM Bug #8675: Unnecessary remapping/backfilling?
- CRUSH improvements are an ongoing discussion, and CRUSH is being improved right now.
- 04:23 PM Bug #8675 (Won't Fix): Unnecessary remapping/backfilling?
- 03:03 PM Bug #18926: Why osds do not release memory?
- I also see memory leakage on v12.0.1 w/ bluestore. My test is 1000 clients writing at 1MB/s into a CephFS. The OSDs s...
- 01:11 AM Bug #19487: "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
- I've updated my comment in https://github.com/ceph/ceph/pull/14318.
04/11/2017
- 08:37 PM Bug #19487: "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
- Let's say I set osd failsafe full ratio = .90. Below I made up these numbers to show how
these percentages won't...
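For reference, a hedged repro sketch of the mismatch on a vstart.sh dev cluster (the ratio value echoes the report; vstart options can vary by branch):
<pre>
# Bring up a dev cluster with an explicit failsafe ratio.
MON=1 OSD=3 MDS=0 ../src/vstart.sh -n -o 'osd failsafe full ratio = .90'
./bin/ceph df              # GLOBAL %RAW USED as printed by 'ceph df'
./bin/ceph health detail   # full/nearfull state as computed by check_full_status
</pre>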
04/10/2017
- 11:11 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
- 1. We got the same problem when the data center power was shut down (the electricity was cut off). There are two osds...
- 04:38 AM Bug #19487 (Fix Under Review): "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_...
- https://github.com/ceph/ceph/pull/14318
04/06/2017
- 01:55 PM Bug #19518: log entry does not include per-op rvals?
- ...
- 01:52 PM Bug #19518 (New): log entry does not include per-op rvals?
- ...
- 10:21 AM Bug #19512 (Won't Fix): Sparse file info in filestore not propagated to other OSDs
- We recently had an interesting issue with RBD images and filestore on Jewel 10.2.5:
We have a pool with RBD images, ...
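A hedged sketch of how the sparseness difference can be observed on disk (paths are illustrative; real filestore object paths encode pool, PG, and hash):
<pre>
# Apparent size (%s) matches across replicas, but allocated blocks (%b) do not
# when one copy is sparse and another was fully written out during recovery.
stat -c 'apparent=%s bytes, allocated=%b blocks' \
    /var/lib/ceph/osd/ceph-0/current/PG_head/OBJECT_FILE
</pre>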
04/05/2017
- 01:56 PM Bug #19379 (Resolved): bluestore: crc mismatch after recent overwrite
- 09:46 AM Bug #18467: ceph ping mon.* can fail
- @Nathan
Its value should be lower so that the fault is easier to reproduce. Now, a ping socket error will reconnect auto...
04/04/2017
- 10:26 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
- Totally misdiagnosed this one; closing the PR.
The problem looks like it's related to map skipping. Here:
<pre...
- 10:09 PM Bug #19440: osd: trims maps that pgs haven't consumed yet when there are gaps
- /a/sage-2017-03-31_02:07:33-rados:thrash-wip-kill-subop-reordered---basic-smithi/968193
- 03:31 PM Bug #19449: 10.2.3->10.2.6 upgrade switched crush tunables, generated crc errors while processing?
- It seems that my tunables jumped (for some reason) from firefly (the jewel defaults, right?) to hammer, if it really happene...
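For anyone checking their own cluster, a hedged sketch of inspecting and pinning tunables around an upgrade (changing the profile triggers data movement):
<pre>
ceph osd crush show-tunables    # dump the profile currently in effect
ceph osd crush tunables hammer  # pin a profile explicitly; expect rebalancing if it differs
</pre>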
- 02:09 PM Bug #19487 (Closed): "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
- 1) Use vstart.sh to create a cluster, with option: osd failsafe full ratio = .46
2) Input "ceph df":
GLOBAL:
SIZ...
- 01:31 PM Bug #19486 (New): Rebalancing can propagate corrupt copy of replicated object
- With 4 OSDs in a replication pool, with the replication count set to 3, I stored an object and found copies on osd0, ...
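A hedged sketch of one way to compare the replicas directly (hostnames and filestore paths are illustrative, not from the report):
<pre>
# Checksum the on-disk copy on each OSD host; with size 3, the one
# mismatching digest identifies the corrupt replica.
for osd in 0 1 2; do
  ssh osd-host-$osd "md5sum /var/lib/ceph/osd/ceph-$osd/current/PG_head/OBJECT_FILE"
done
</pre>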
04/03/2017
- 10:52 AM Bug #19449 (Won't Fix): 10.2.3->10.2.6 upgrade switched crush tunables, generated crc errors whil...
- Hi,
when upgrading my cluster from 10.2.3 to 10.2.6 I faced a major failure, and I think it could(?) be a bug.
...
04/02/2017
- 04:18 AM Bug #19444: BlueStore::read() asserts in rados qa run
- not reproducible on master: http://pulpito.ceph.com/kchai-2017-04-02_03:46:37-rados-master---basic-mira/
not repro...
- 03:48 AM Bug #19444: BlueStore::read() asserts in rados qa run
- see also https://github.com/rook/rook/issues/374
- 03:44 AM Bug #19444 (Can't reproduce): BlueStore::read() asserts in rados qa run
- ...
03/31/2017
- 08:09 PM Bug #13385: cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round fa...
- Just saw this bug; the cluster was working normally one minute, and the next it's doing this on most of the OSDs. Never...
- 03:41 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
- I'm experiencing this runaway memory issue as well. It only appeared a couple of days ago. I tried setting the bluest...
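For triage, a hedged sketch of the tcmalloc heap commands often used to tell a real leak from allocator retention (OSD id illustrative; requires a tcmalloc build):
<pre>
ceph tell osd.0 heap stats     # dump tcmalloc heap statistics
ceph tell osd.0 heap release   # return freed-but-retained pages to the OS
</pre>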
- 01:56 PM Bug #19440: osd: trims maps taht pgs haven't consumed yet when there are gaps
- https://github.com/ceph/ceph/pull/14270
- 01:56 PM Bug #19440 (Fix Under Review): osd: trims maps that pgs haven't consumed yet when there are gaps
- 01:52 PM Bug #19440 (New): osd: trims maps that pgs haven't consumed yet when there are gaps
- ...
- 01:21 PM Feature #19384: ceph_objectstore_tool (set|clear)-missing-item command
- Chang Liu wrote:
> Hi, Sam
>
> I looked at this problem, and found another problem.
>
> [...]
>
> We don't use...
- 12:59 PM Feature #19384: ceph_objectstore_tool (set|clear)-missing-item command
- Hi, Sam
I looked at this problem, and found another problem....
03/29/2017
- 01:07 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
- Sorry, wrong window. Ignore my previous comment.
- 01:00 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
- This is a bug with the ceph-mgr service, see http://tracker.ceph.com/issues/19407, currently set to Need Review ...
- 06:42 AM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
- Hi Jaime,
The issue is not fixed with this workaround, and we will address this workaround in another issue related t...
- 12:34 AM Feature #15835: filestore: randomize split threshold
- This one is more about performance testing, and at this point I think effort there is better spent on bluestore than ...
03/28/2017
- 04:39 PM Bug #19400 (Resolved): add more info during pool delete error
- In Luminous, mon_allow_pool_delete defaults to false, and it may be confusing for any admin who
tries to delete ...
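For context, a hedged sketch of the flow that hits the confusing error (pool name illustrative):
<pre>
# With mon_allow_pool_delete=false (the new default), deletion is refused.
ceph osd pool delete mypool mypool --yes-i-really-really-mean-it
# Temporarily allow deletion, then retry.
ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
ceph osd pool delete mypool mypool --yes-i-really-really-mean-it
</pre>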
- 02:55 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
- We decided to stop the ceph-mgr service on all the nodes because it is using a lot of CPU, and we understood that this se...
- 02:51 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
- Fixed with the following commands:
The memory is released by applying the following commands in a content no...
- 09:40 AM Documentation #18986: Need to document monitor health configuration values
- The description of "mon warn osd usage percent" and "mon_osd_min_in_ratio" can be found at https://github.com/ceph/...
03/27/2017
- 09:09 PM Bug #19267: rados list-inconsistent-obj sometimes doesn't flag that all 3 copies are bad
- Oh I see, it's missing the error string.
I'm not sure if in this case it's just taking one of them as authoritativ...
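For reference, a hedged sketch of the command under discussion (pgid illustrative):
<pre>
rados list-inconsistent-obj 3.aff --format=json-pretty
# Each shard entry normally carries an "errors" list; this report is about
# bad shards coming back without any error string set.
</pre>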
- 08:59 AM Bug #19320: Pg inconsistent makes ceph osd down
- Backtrace in the attached log_inconsistent.txt...
03/24/2017
- 10:01 PM Feature #19384 (New): ceph_objectstore_tool (set|clear)-missing-item command
- This one is only relevant for kraken and later. It would be good to have a command for directly manipulating a pg's ...
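A hedged sketch of what the requested command might look like, modeled on existing ceph-objectstore-tool ops; the op name comes from this request and is hypothetical, not implemented:
<pre>
# Hypothetical: mark an object missing in the pg's missing set, offline.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --pgid 1.7 OBJECT set-missing-item
</pre>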
- 09:59 PM Feature #19383 (New): ceph_objectstore_tool: set-version op to allow setting the prior_version an...
- The motivation for this one is to be able to manually do part of what mark_unfound_lost revert does automatically and...
- 09:49 PM Bug #19380 (New): only sort of a bug: it's possible to get an unfound object without losing min_s...
- Fundamentally, ReplicatedBackend does destructive updates. That makes the following sequence possible. Assume that ...
- 09:33 PM Bug #19379 (Resolved): bluestore: crc mismatch after recent overwrite
- ...
- 08:37 PM Bug #19377: mark_unfound_lost revert won't actually recover the objects unless there are some fou...
- There is a very clumsy workaround to this issue. Once the mark_unfound_lost revert command claims to have completed...
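For reference, the command in question, as a hedged sketch (pgid illustrative; the truncated workaround above is not reconstructed here):
<pre>
ceph pg 2.5 mark_unfound_lost revert   # reports completion even when, per this
                                       # bug, the objects are not recovered
</pre>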
- 08:34 PM Bug #19377 (Duplicate): mark_unfound_lost revert won't actually recover the objects unless there ...
- See ReplicatedPG::start_recovery_ops. If num_missing == num_unfound, we don't try to do recovery. This is problem...
03/23/2017
- 01:36 PM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
- Hi!
I also got that issue. I added "bluefs_allocator = stupid" to my /etc/ceph/ceph.conf, and it worked.
Issue happe...
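A hedged sketch of applying that workaround (OSD id illustrative):
<pre>
# Add the reporter's workaround to ceph.conf, then restart the affected OSD.
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
bluefs_allocator = stupid
EOF
systemctl restart ceph-osd@0
</pre>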
03/22/2017
- 10:14 PM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
- I've created https://github.com/ceph/ceph/pull/14054 to track Alexandre's changes.
I'm working on handling out of ...
- 09:10 AM Bug #19348 (Can't reproduce): "ceph ping mon.c" cli prints assertion failure on timeout
- # start a cluster with 3 monitors: mon.a, mon.b and mon.c
# stop mon.c
# ceph ping mon.c --connect-timeout=5
it ...
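A hedged expansion of those steps into runnable form on a vstart.sh dev cluster (the mon.c kill command is illustrative):
<pre>
MON=3 OSD=1 MDS=0 ../src/vstart.sh -n       # 1. cluster with mon.a, mon.b, mon.c
pkill -f 'ceph-mon.*-i c'                   # 2. stop mon.c
./bin/ceph ping mon.c --connect-timeout=5   # 3. expect a timeout error, not an assert
</pre>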
03/21/2017
- 10:09 AM Bug #19320 (New): Pg inconsistent makes ceph osd down
- Hi all.
I am running a ceph cluster.
There is an inconsistent pg:
pg 3.aff is active+recovery_wait+degraded+incon...
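For anyone in the same state, a hedged sketch of the usual first-response commands (pgid from the report; repair only once the bad copy is understood):
<pre>
ceph health detail | grep 3.aff   # which OSDs hold the pg and the error counts
ceph pg deep-scrub 3.aff          # refresh the inconsistency information
ceph pg repair 3.aff              # last resort, after identifying the bad replica
</pre>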
03/20/2017
- 04:26 AM Feature #15835: filestore: randomize split threshold
- Hi! I am an undergrad student wishing to contribute to Ceph, and I would like to work on this issue. Please let me kn...
03/18/2017
- 06:03 AM Bug #19267: rados list-inconsistent-obj sometimes doesn't flag that all 3 copies are bad
- Greg Farnum wrote:
> I don't understand. What about this output says that two copies are bad and one isn't?
Thank...
03/17/2017
- 09:33 PM Bug #19300 (Can't reproduce): "Segmentation fault ceph_test_objectstore --gtest_filter=-*/3"
- Run: http://pulpito.ceph.com/yuriw-2017-03-16_15:10:12-rados-wip-yuri-testing_2017_3_16-distro-basic-smithi/
Logs: h...
- 09:11 PM Bug #19267: rados list-inconsistent-obj sometimes doesn't flag that all 3 copies are bad
- I don't understand. What about this output says that two copies are bad and one isn't?
- 08:11 PM Bug #19299: Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
- As far as I know that was with -f passed. Maybe it is relevant that I grepped out 'madvise' calls because they occur at...
- 07:59 PM Bug #19299 (Need More Info): Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
- 07:58 PM Bug #19299: Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
- The strace doesn't include child processes. Can you repeat with -f passed to strace?
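A hedged sketch of the requested trace (OSD id and output path illustrative):
<pre>
# strace -f follows child processes; run the OSD in the foreground with its own -f.
strace -f -tt -o /tmp/osd0.strace ceph-osd -f -i 0
</pre>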
- 07:48 PM Bug #19299 (Can't reproduce): Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
- Since upgrading to Kraken we've had severe problems with OSD startup. Though this ticket mentions bootup specificall...