Activity
From 06/21/2017 to 07/20/2017
07/20/2017
- 11:47 PM Bug #20545: erasure coding = crashes
- Trying to reproduce this issue in my lab
- 11:20 PM Bug #18209 (Need More Info): src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue....
- Zheng, what's the source for this bug? Any updates?
- 10:52 PM Bug #19790: rados ls on pool with no access returns no error
- Looks like we may have set the wrong state on this tracker and therefore overlooked it for the purposes of backportin...
- 08:26 PM Bug #19790 (Pending Backport): rados ls on pool with no access returns no error
- 08:03 PM Bug #19790: rados ls on pool with no access returns no error
- Thanks a lot for the fix in master/luminous, taking the liberty to follow up on this one — looks like the backport to...
- 08:52 PM Bug #20730: need new OSD_SKEWED_USAGE implementation
- see https://github.com/ceph/ceph/pull/16461
- 08:51 PM Bug #20730 (New): need new OSD_SKEWED_USAGE implementation
- I've removed the OSD_SKEWED_USAGE implementation because it isn't smart enough:
1. It doesn't understand different...
- 08:30 PM Bug #20704 (Fix Under Review): osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- https://github.com/ceph/ceph/pull/16459
- 08:08 PM Bug #20704: osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- This was a bug in persisting the missing state during split. Building a fix.
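The entry above names the bug class: during a PG split, the missing set has to be partitioned between the resulting PGs and both halves persisted. A minimal illustrative sketch of the partitioning step (plain Python, not Ceph's actual code; `split_missing` and the crc32-based placement are assumptions for illustration):

```python
import zlib

def split_missing(missing, split_bits, child_id):
    """Partition a PG's missing map between the parent and a new child PG.

    An object moves to the child when the low `split_bits` bits of its
    hash equal `child_id`. The bug class described above amounts to
    failing to persist one of the two resulting sets afterwards.
    """
    mask = (1 << split_bits) - 1
    parent, child = {}, {}
    for obj, need_version in missing.items():
        h = zlib.crc32(obj.encode())  # stand-in for the object hash
        target = child if (h & mask) == child_id else parent
        target[obj] = need_version
    return parent, child
```

Both returned maps would then have to be written to stable storage; splitting correctly in memory but persisting only one side reproduces exactly this kind of assert on restart.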
- 07:48 PM Bug #20704 (In Progress): osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- Found a bug in my ceph-objectstore-tool change that could cause this, seeing if it did in this case.
- 03:26 PM Bug #20704 (Resolved): osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- ...
- 08:28 PM Backport #20723 (Resolved): jewel: rados ls on pool with no access returns no error
- https://github.com/ceph/ceph/pull/16473
- 08:28 PM Backport #20722 (Rejected): kraken: rados ls on pool with no access returns no error
- 03:58 PM Bug #20667 (Fix Under Review): segv in cephx_verify_authorizing during monc init
- https://github.com/ceph/ceph/pull/16455
I think we *also* need to fix the root cause, though, in commit bf49385679...
- 03:25 PM Bug #20667: segv in cephx_verify_authorizing during monc init
- this time with a core...
- 02:52 AM Bug #20667: segv in cephx_verify_authorizing during monc init
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419306
/a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419306
/a/sage-2017-07-19_15:27:16-rados-wi...
- 03:42 PM Bug #20705 (Fix Under Review): repair_test fails due to race with osd start
- https://github.com/ceph/ceph/pull/16454
- 03:40 PM Bug #20705 (Resolved): repair_test fails due to race with osd start
- ...
- 03:40 PM Feature #15835: filestore: randomize split threshold
- I spoke too soon, there is significantly improved latency and throughput in longer running tests on several osds.
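For context, the feature under test here staggers collection splits: if every OSD splits at the same fixed object count, they all pay the split cost at the same moment. A toy sketch of the randomization idea (hypothetical names, not Ceph's actual option handling):

```python
import random

def effective_split_threshold(base_threshold, rand_factor, rng=None):
    """Pick a per-collection split threshold in the range
    [base_threshold, base_threshold * (1 + rand_factor)], so collections
    on different OSDs split at different sizes instead of in lockstep."""
    rng = rng or random.Random()
    return base_threshold + rng.randint(0, int(base_threshold * rand_factor))
```

With `rand_factor = 0` this degenerates to the old fixed-threshold behavior; a non-zero factor spreads the split-induced latency spikes out over time.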
- 02:54 PM Bug #19939 (Resolved): OSD crash in MOSDRepOpReply::decode_payload
- 02:34 PM Bug #20694: osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_log().get_log().obje...
- /a/kchai-2017-07-20_03:05:27-rados-wip-kefu-testing-distro-basic-mira/1422161
$ zless remote/mira104/log/ceph-osd....
- 02:53 AM Bug #20694 (Can't reproduce): osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_lo...
- ...
- 10:09 AM Bug #20690: Cluster status is HEALTH_OK even though PGs are in unknown state
- This log excerpt illustrates the problem: https://paste2.org/cne4IzG1
The log starts immediately after cephfs dep...
- 04:54 AM Bug #20645: bluefs wal failed to allocate (assert(0 == "allocate failed... wtf"))
- Sorry for not posting the version; the assert occurred in v12.0.2. Maybe it's similar to #18054, but I think they are di...
- 03:02 AM Bug #20105 (Resolved): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
- 03:01 AM Bug #20371: mgr: occasional fails to send beacons (monc reconnect backoff too aggressive?)
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419525
- 02:51 AM Bug #20693 (Resolved): monthrash has spurious PG_AVAILABILITY etc warnings
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419393
no osd thrashing, but not fully pe...
- 02:49 AM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419390
07/19/2017
- 09:29 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Updated two of my clusters - will report back. Thanks again.
- 06:11 AM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Yes, I am - building right now. But it will take some time to publish that one to the clusters.
- 07:59 PM Bug #19971 (Resolved): osd: deletes are performed inline during pg log processing
- 07:53 PM Bug #19971: osd: deletes are performed inline during pg log processing
- merged https://github.com/ceph/ceph/pull/15952
- 06:32 PM Bug #20667: segv in cephx_verify_authorizing during monc init
- /a/yuriw-2017-07-18_19:38:33-rados-wip-yuri-testing3_2017_7_19-distro-basic-smithi/1413393
/a/yuriw-2017-07-18_19:38...
- 03:46 PM Bug #20667: segv in cephx_verify_authorizing during monc init
- Another instance, this time jewel:...
- 05:55 PM Bug #20684: pg refs leaked when osd shutdown
- Nice debugging and presentation of your analysis! That's my favorite kind of bug report!
- 03:11 PM Bug #20684 (Fix Under Review): pg refs leaked when osd shutdown
- 03:12 AM Bug #20684: pg refs leaked when osd shutdown
- https://github.com/ceph/ceph/pull/16408
- 03:08 AM Bug #20684 (Resolved): pg refs leaked when osd shutdown
- h1. 1. summary
When kicking a pg, its ref count is greater than 1, which causes the assert to fail.
When osd is in proce...
- 04:54 PM Bug #20690 (Need More Info): Cluster status is HEALTH_OK even though PGs are in unknown state
- In an automated test, we see PGs in unknown state, yet "ceph -s" reports HEALTH_OK. The test sees HEALTH_OK and proce...
- 03:16 PM Bug #20645 (Closed): bluefs wal failed to allocate (assert(0 == "allocate failed... wtf"))
- Can you retest on current master? This is pretty old code. Please reopen if the bug is still present.
- 03:16 PM Support #20648 (Closed): odd osd acting set
- You have three hosts and want to replicate across those domains. It can't do that when one host goes down, so it's do...
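The reply above can be illustrated with a toy placement model (not CRUSH itself; all names are hypothetical): with size=3 and host as the failure domain, three hosts are exactly enough, so when one host goes down there is nowhere to place the third replica and the acting set stays short:

```python
def place_replicas(hosts_up, size):
    """Pick up to `size` OSDs, at most one per host (host = failure domain)."""
    acting = []
    for host, osds in hosts_up.items():
        if osds:
            acting.append(osds[0])
        if len(acting) == size:
            break
    return acting

hosts = {"host-a": ["osd.0", "osd.3"],
         "host-b": ["osd.1", "osd.4"],
         "host-c": ["osd.2", "osd.5"]}
full = place_replicas(hosts, 3)   # one replica per host: a full acting set
hosts["host-c"] = []              # host-c goes down
short = place_replicas(hosts, 3)  # only two replicas can now be placed
```

The PGs stay degraded with a two-member acting set until the third host returns; the remaining OSDs on the surviving hosts cannot be used without violating the per-host rule.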
- 03:02 PM Bug #20666 (Resolved): jewel -> luminous upgrade doesn't update client.admin mgr cap
- 01:28 PM Bug #19939 (Fix Under Review): OSD crash in MOSDRepOpReply::decode_payload
- https://github.com/ceph/ceph/pull/16421
- 11:55 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- occasionally, i see ...
- 11:15 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- MOSDRepOpReply is always sent by the OSD.
core dump from osd.1...
- 12:49 PM Bug #19605 (New): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- i can reproduce this...
- 03:04 AM Bug #20243 (Fix Under Review): Improve size scrub error handling and ignore system attrs in xattr...
- 02:39 AM Bug #20646: run_seed_to_range.sh: segv, tp_fstore_op timeout
- http://pulpito.ceph.com/sage-2017-07-18_16:17:27-rados-master-distro-basic-smithi/
hmm, i think this got fixed in ...
- 02:36 AM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- http://pulpito.ceph.com/sage-2017-07-18_19:06:10-rados-master-distro-basic-smithi/
failed 19/90
- 01:18 AM Feature #15835 (Resolved): filestore: randomize split threshold
- Perf testing is not indicating much benefit, so I'd hold off on backporting this.
07/18/2017
- 10:34 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @Stefan A patch for Jewel (on the current jewel branch) can be found here:
https://github.com/ceph/ceph/pul...
- 10:20 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
Analysis:
Secondary got scrub map request with scrub_to 1748'25608...
- 06:19 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @David
That would be so great! I'm happy to test any patch ;-)
- 04:54 PM Bug #20041 (In Progress): ceph-osd: PGs getting stuck in scrub state, stalling RBD
I think I've reproduced this, examining logs.
- 09:43 PM Bug #20105 (Fix Under Review): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 fa...
- https://github.com/ceph/ceph/pull/16402
- 08:37 PM Feature #20664 (Closed): compact OSD's omap before active
- This exists as leveldb_compact_on_mount. It may not have functioned in all releases but has been present since Januar...
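For reference, enabling the option named above is a one-line config change; a minimal sketch, assuming it belongs in the `[osd]` section of ceph.conf:

```ini
[osd]
# compact the OSD's leveldb omap store when it is mounted,
# i.e. before the OSD goes active
leveldb_compact_on_mount = true
```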
- 12:03 PM Feature #20664 (Closed): compact OSD's omap before active
- Currently we support mon_compact_on_start. Does it make sense to add this feature to the OSD?
like:...
- 08:14 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- We set it to 1 if the MOSDRepOpReply is encoded with features that do not contain SERVER_LUMINOUS.
...which I thin...
- 09:07 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- i found that the header.version of the MOSDRepOpReply message being decoded was 1. but i am using a vstart cluster fo...
- 05:44 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- i am able to reproduce this issue using qa/workunits/fs/snaps/untar_snap_rm.sh. but not always...
- 06:04 PM Bug #20666: jewel -> luminous upgrade doesn't update client.admin mgr cap
- 03:34 PM Bug #20666 (Fix Under Review): jewel -> luminous upgrade doesn't update client.admin mgr cap
- https://github.com/ceph/ceph/pull/16395
- 01:23 PM Bug #20666: jewel -> luminous upgrade doesn't update client.admin mgr cap
- Hmm, I suspect the issue is with the bootstrap-mgr keyring. I notice
that when trying a "mgr create" on an upgraded...
- 01:22 PM Bug #20666 (Resolved): jewel -> luminous upgrade doesn't update client.admin mgr cap
- ...
- 01:40 PM Bug #20605 (Resolved): luminous mon lacks force_create_pg equivalent
- 01:38 PM Bug #20667 (Resolved): segv in cephx_verify_authorizing during monc init
- ...
- 08:23 AM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- Lowering the priority since we haven't spotted it for a while.
- 05:33 AM Bug #20625 (Duplicate): ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
07/17/2017
- 08:10 PM Bug #20653: bluestore: aios don't complete on very large writes on xenial
- ...
- 08:08 PM Bug #20653 (Can't reproduce): bluestore: aios don't complete on very large writes on xenial
- ...
- 03:05 PM Bug #20631 (Resolved): OSD needs restart after upgrade to luminous IF upgraded before a luminous ...
- 02:05 PM Bug #20631: OSD needs restart after upgrade to luminous IF upgraded before a luminous quorum
- 02:05 PM Bug #20605: luminous mon lacks force_create_pg equivalent
- 12:15 PM Bug #20602 (Resolved): mon crush smoke test can time out under valgrind
- 11:12 AM Bug #20625: ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
- tried to reproduce on btrfs locally, no luck.
- 03:00 AM Bug #20625: ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
- ...
- 02:41 AM Support #20648 (Closed): odd osd acting set
- I have three hosts. When I set one of them down, I got something like this....
- 02:21 AM Bug #20646 (New): run_seed_to_range.sh: segv, tp_fstore_op timeout
- ...
07/16/2017
- 09:41 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Uh, I don't think the master branch has this problem, since "list-snaps"'s result has been moved from ObjectContext::obs....
- 09:24 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- But I'm working on it.
- 08:54 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Sorry, as the related source code has been restructured and I haven't tested this on the master branch, I can't judge...
- 08:03 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Thanks for the jewel-specific fix. Has the bug been declared fixed in master, though?
- 07:26 AM Backport #17445 (Fix Under Review): jewel: list-snap cache tier missing promotion logic (was: rbd...
- 06:34 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- It seems that ReplicatedPG::do_op's code on the "master" branch has been totally restructured, so I submitted a pull req...
- 08:09 AM Bug #20645 (Closed): bluefs wal failed to allocate (assert(0 == "allocate failed... wtf"))
It seems the alloc hint equals the end of the wal-bdev, but the beginning of the wal-bdev is still in use...
my wal-bdev si...
07/15/2017
- 07:49 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @Jason: *argh* yes this seems to be correct.
So it seems I didn't have any logs. Currently no idea how to generate...
- 07:31 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @Stefan: just for clarification, I believe the gpg-encrypted ceph-post-file dump was the gcore of the OSD and a Debia...
- 06:38 AM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Hello @david,
the best logs I could produce with level 20 I sent to @Jason Dillaman 2 months ago (pgp encrypted). R...
- 07:32 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Definitely sounds like it could be the root-cause to me. Thanks for the investigation help.
- 02:48 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- I encountered the same problem.
I debugged a little, and found that this might have something to do with the "cache...
- 02:34 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- I encountered the same problem.
I debugged a little, and found that this might have something to do with the "cache... - 08:27 AM Bug #20605 (Fix Under Review): luminous mon lacks force_create_pg equivalent
- https://github.com/ceph/ceph/pull/16353
07/14/2017
- 11:01 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Stefan Priebe wrote:
> Anything i could provide or test? VMs are still crashing every night...
Can you reproduce ... - 09:51 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
Based on the earlier information:
subset_last_update = {
version = 20796861,
epoch = 453051,
...
- 08:32 PM Backport #20638 (In Progress): kraken: EPERM: cannot set require_min_compat_client to luminous: 6...
- 08:22 PM Backport #20638 (Need More Info): kraken: EPERM: cannot set require_min_compat_client to luminous...
- Now I'm not sure
- 08:11 PM Backport #20638 (In Progress): kraken: EPERM: cannot set require_min_compat_client to luminous: 6...
- 08:10 PM Backport #20638 (Resolved): kraken: EPERM: cannot set require_min_compat_client to luminous: 6 co...
- https://github.com/ceph/ceph/pull/16342
- 08:31 PM Backport #20639 (In Progress): jewel: EPERM: cannot set require_min_compat_client to luminous: 6 ...
- 08:23 PM Backport #20639 (Need More Info): jewel: EPERM: cannot set require_min_compat_client to luminous:...
- Not sure if the PR really fixes this bug
- 08:12 PM Backport #20639 (In Progress): jewel: EPERM: cannot set require_min_compat_client to luminous: 6 ...
- 08:10 PM Backport #20639 (Resolved): jewel: EPERM: cannot set require_min_compat_client to luminous: 6 con...
- https://github.com/ceph/ceph/pull/16343
- 08:09 PM Bug #20546 (Resolved): buggy osd down warnings by subtree vs crush device classes
- 03:57 PM Bug #20602 (Fix Under Review): mon crush smoke test can time out under valgrind
- 03:52 PM Bug #20602: mon crush smoke test can time out under valgrind
- Valgrind is slow to do the fork and cleanup; that's why we keep timing out. Blame e189f11fcde6829cc7f86894b913bc1a3f...
- 03:31 PM Bug #20602: mon crush smoke test can time out under valgrind
- Valgrind is slow to do the fork and cleanup; that's why we keep timing out. Blame e189f11fcde6829cc7f86894b913bc1a3f...
- 01:57 PM Bug #20602: mon crush smoke test can time out under valgrind
- A simple workaround would be to make a 'mon smoke test crush changes' option and turn it off when using valgrind.. wh...
- 02:55 AM Bug #20602: mon crush smoke test can time out under valgrind
- /a/kchai-2017-07-13_18:13:10-rados-wip-kefu-testing-distro-basic-smithi/1396642
rados/singleton-nomsgr/{all/valgri... - 02:51 AM Bug #20602: mon crush smoke test can time out under valgrind
- /a/sage-2017-07-13_20:38:15-rados-wip-sage-testing-distro-basic-smithi/1397207
that's two consecutive runs for me.. - 03:31 PM Bug #20601 (Duplicate): mon comamnds time out due to pool create backlog w/ valgrind
- ok, the problem is that the fork-based crushtool test is very slow under valgrind (valgrind has to do init/cleanup on...
- 03:23 PM Bug #20601: mon commands time out due to pool create backlog w/ valgrind
- It isn't that pool creations are serialized, actually; they are already batched. Maybe valgrind is just making it sl...
- 02:51 AM Bug #20601: mon commands time out due to pool create backlog w/ valgrind
- another failure with same cause, different symptom: this time a 'osd out 0' timed out due to a bunch of pool creates....
- 03:19 PM Bug #20475 (Pending Backport): EPERM: cannot set require_min_compat_client to luminous: 6 connect...
- https://github.com/ceph/ceph/pull/16340 merged to master
backports for kraken and jewel:
https://github.com/ceph/...
- 02:05 PM Bug #20475 (In Progress): EPERM: cannot set require_min_compat_client to luminous: 6 connected cl...
- 03:04 AM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- ok, smithi083 was (is!) locked by
/home/teuthworker/archive/teuthology-2017-07-13_05:10:02-fs-kraken-distro-basic-...
- 02:57 AM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- baddy is...
- 02:56 PM Bug #20631 (Fix Under Review): OSD needs restart after upgrade to luminous IF upgraded before a l...
- https://github.com/ceph/ceph/pull/16341
- 02:42 PM Bug #20631 (Resolved): OSD needs restart after upgrade to luminous IF upgraded before a luminous ...
- If an OSD is upgraded to luminous before the monmap has the luminous feature, it will need to be restarted before ...
- 09:51 AM Fix #20627 (New): Clean config special cases out of common_preinit
- Post-https://github.com/ceph/ceph/pull/16211, we should use set_daemon_default for this:...
- 03:16 AM Bug #20600 (Resolved): 'ceph pg set_full_ratio ...' blocks on luminous
- 03:15 AM Bug #20617 (Resolved): Exception: timed out waiting for mon to be updated with osd.0: 0 < 4724464...
- 03:14 AM Bug #20626 (Can't reproduce): failed to become clean before timeout expired, pgs stuck unknown
- ...
- 02:50 AM Bug #20625 (Duplicate): ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
- ...
- 02:30 AM Bug #20624 (Duplicate): cluster [WRN] Health check failed: no active mgr (MGR_DOWN)" in cluster log
- mgr.x...
07/13/2017
- 04:40 PM Bug #20546: buggy osd down warnings by subtree vs crush device classes
- https://github.com/ceph/ceph/pull/16221
- 02:30 PM Bug #20602: mon crush smoke test can time out under valgrind
- /a/sage-2017-07-12_19:30:01-rados-wip-sage-testing-distro-basic-smithi/1392270
rados/singleton-nomsgr/{all/valgrind-...
- 02:17 PM Bug #20617 (Fix Under Review): Exception: timed out waiting for mon to be updated with osd.0: 0 <...
- https://github.com/ceph/ceph/pull/16322
- 02:14 PM Bug #20617 (Resolved): Exception: timed out waiting for mon to be updated with osd.0: 0 < 4724464...
- ...
- 02:16 PM Bug #20616: pre-luminous: aio_read returns erroneous data when rados_osd_op_timeout is set but no...
- This can't be reproduced with 12.1.0, so it must have been fixed in the meantime.
- 01:48 PM Bug #20616 (Resolved): pre-luminous: aio_read returns erroneous data when rados_osd_op_timeout is...
- Hi,
In Gnocchi, we use the python-rados API and recently encountered some data corruption when "rados_osd_op_ti...
07/12/2017
- 10:30 PM Bug #20605 (Resolved): luminous mon lacks force_create_pg equivalent
- This was part of the now-defunct PGMonitor. Also, pg creation is totally different now.
Create new 'osd force-cre...
- 09:19 PM Bug #20332: rados bench seq option doesn't work
- Yeah so IIRC it will stop after the specified time, but if it runs out of data that's it. I suppose it could loop? No...
- 06:54 PM Bug #20332: rados bench seq option doesn't work
- I think the bug here is that the specified seconds isn't honored in the "seq" case. It probably reads every object o...
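The behavior discussed above can be sketched with a toy time-bounded sequential reader (illustrative Python, not rados bench itself; all names are made up): without looping, the pass ends when the pre-written objects run out, regardless of the requested seconds, and a `loop` flag models the suggested wrap-around.

```python
import time

def seq_read(objects, seconds, read_one, loop=False):
    """Read objects in order until `seconds` elapse or the data runs out."""
    deadline = time.monotonic() + seconds
    reads = 0
    i = 0
    while time.monotonic() < deadline:
        if i >= len(objects):
            if not loop:
                break  # out of data: stop early, like the current seq mode
            i = 0      # loop mode: wrap around and keep reading until time is up
        read_one(objects[i])
        i += 1
        reads += 1
    return reads
```

The open question in the comments is essentially whether the real benchmark should take the `loop=True` branch or simply report that it exhausted the data before the duration elapsed.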
- 03:24 PM Bug #20332 (Need More Info): rados bench seq option doesn't work
- 07:16 PM Bug #20600 (Fix Under Review): 'ceph pg set_full_ratio ...' blocks on luminous
- https://github.com/ceph/ceph/pull/16300
- 03:09 PM Bug #20600 (In Progress): 'ceph pg set_full_ratio ...' blocks on luminous
- 01:32 PM Bug #20600: 'ceph pg set_full_ratio ...' blocks on luminous
- This actually affects any command that remains in mon, not just "pg set_full_ratio".
- 01:22 PM Bug #20600 (Resolved): 'ceph pg set_full_ratio ...' blocks on luminous
- 06:55 PM Bug #20041 (Need More Info): ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 06:27 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Was the osd log saved for the primary of a stuck PG in this state? Can this be reproduced and an osd log provided?
- 05:57 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- Another one: /ceph/teuthology-archive/pdonnell-2017-07-07_20:24:01-fs-wip-pdonnell-20170706-distro-basic-smithi/13723...
- 05:14 PM Bug #20546 (Fix Under Review): buggy osd down warnings by subtree vs crush device classes
- 03:43 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Since this flag is set all the time now, it (and the require_x_osds flags) aren't shown by default. Does it appear in...
- 03:33 PM Bug #20545: erasure coding = crashes
- So this looks like you're just killing the cluster by overflowing it with infinite IO. The crash is distressing, though.
- 03:32 PM Bug #20545: erasure coding = crashes
- From the log the backtrace is:...
- 03:31 PM Bug #20552: "Scrubbing terminated -- not all pgs were active and clean." error in rados
- this was fixed a few days ago (there were too few osds in the test yaml)
- 03:30 PM Bug #20552 (Resolved): "Scrubbing terminated -- not all pgs were active and clean." error in rados
- 03:21 PM Bug #20507 (Duplicate): "[WRN] Manager daemon x is unresponsive. No standby daemons available." i...
- 03:18 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
- The fact that the error stopped when cinder was stopped makes me think this was related to in-flight requests from th...
- 03:18 PM Bug #18746 (Need More Info): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (j...
- 03:12 PM Bug #20562 (Resolved): Monitor's "perf dump cluster" values are no longer maintained
- 01:41 PM Bug #20371: mgr: occasional fails to send beacons (monc reconnect backoff too aggressive?)
- /a/sage-2017-07-12_02:31:06-rbd-wip-health-distro-basic-smithi/1389750
this is about to trigger more test failures...
- 01:28 PM Bug #20602 (Resolved): mon crush smoke test can time out under valgrind
- /a/sage-2017-07-12_02:32:14-rados-wip-sage-testing-distro-basic-smithi/1390174
rados/singleton-nomsgr/{all/valgrind-...
- 01:27 PM Bug #20601 (Duplicate): mon commands time out due to pool create backlog w/ valgrind
- This isn't wrong per se, but it does mean workloads with lots of pool creates (parallel rados api tests) and slow mon...
07/11/2017
- 09:53 PM Bug #20590 (Duplicate): 'sudo ceph --cluster ceph osd new xx" no valid command in upgrade:jewel-x...
- 09:41 PM Bug #20590 (Duplicate): 'sudo ceph --cluster ceph osd new xx" no valid command in upgrade:jewel-x...
Run: http://pulpito.ceph.com/teuthology-2017-07-11_04:23:04-upgrade:jewel-x-master-distro-basic-smithi/
Jobs: '13854...
- 06:48 PM Bug #20470: rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
Jobs: '13854... - 06:48 PM Bug #20470: rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- https://github.com/ceph/ceph/pull/16265
- 06:46 PM Bug #20470 (Resolved): rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- 05:09 PM Bug #20470: rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
For some reason pg 2.0 is created on osd.0 which never happened previously....
- 02:42 PM Bug #20561: bluestore: segv in _deferred_submit_unlock from deferred_try_submit, _txc_finish
- This might be related to a failure reported on the list:...
- 02:16 PM Feature #12195 (Resolved): 'ceph osd version' to print OSD versions
- We now have a 'ceph osd versions' that will return the versions of osds in the cluster. At first sight it seems it do...
- 01:57 PM Feature #5657 (Resolved): monitor: deal with bad crush maps more gracefully
- 01:57 PM Feature #5657 (Resolved): monitor: deal with bad crush maps more gracefully
- Resolved at some point by using external crushtool to validate crushmaps.
- 01:54 PM Feature #6325 (New): mon: mon_status should make it clear when the mon has connection issues
- 01:52 PM Feature #4835 (Resolved): Monitor: better handle aborted synchronizations
- The synchronization code has been overhauled a few times in the past few years. I believe this to have been resolved ...
- 01:50 PM Cleanup #10506: mon: get rid of QuorumServices
- I don't think the QuorumService interface is bringing enough to the table to keep it around.
What we are achieving...
- 04:25 AM Bug #20504 (Resolved): FileJournal: fd leak lead to FileJournal::~FileJourna() assert failed
- 04:23 AM Bug #20105: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
- not (always) reproducible with a single try: http://pulpito.ceph.com/kchai-2017-07-11_03:53:32-rados-master-distro-ba...
- 03:49 AM Bug #20105: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
- partial bt of /a/sage-2017-07-10_16:55:37-rados-wip-sage-testing-distro-basic-smithi/1383143:...
- 03:58 AM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- filed #20566 against fs for "Behind on trimming" warnings from MDS
- 02:09 AM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- http://pulpito.ceph.com/kchai-2017-07-10_10:29:54-powercycle-master-distro-basic-smithi/ failed with...
07/10/2017
- 09:50 PM Bug #20433 (Resolved): 'mon features' does not update properly for mons
- 09:47 PM Bug #20105: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
- /a/sage-2017-07-10_16:55:37-rados-wip-sage-testing-distro-basic-smithi/1383143
similar, but a seg fault!...
- 08:23 PM Bug #20562 (Fix Under Review): Monitor's "perf dump cluster" values are no longer maintained
- https://github.com/ceph/ceph/pull/16249
- 08:11 PM Bug #20562 (In Progress): Monitor's "perf dump cluster" values are no longer maintained
- ...
- 05:20 PM Bug #20562 (Resolved): Monitor's "perf dump cluster" values are no longer maintained
- We have a PerfCounters collection in the monitor which maintains cluster aggregate data like storage space available,...
- 08:11 PM Bug #20563 (Duplicate): mon: fix cluster-level perfcounters to pull from PGMapDigest
- Sage Weil wrote:
> [...]
- 08:00 PM Bug #20563 (Duplicate): mon: fix cluster-level perfcounters to pull from PGMapDigest
- ...
- 05:29 PM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- I don't think we run the powercycle tests very often — they're hard on the hardware. This may not really be an immedi...
- 10:19 AM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- have we spotted this problem recently after the first occurrence?
rerunning at http://pulpito.ceph.com/kchai-2017-...
- 04:51 PM Bug #20561 (Can't reproduce): bluestore: segv in _deferred_submit_unlock from deferred_try_submit...
- ...
- 10:14 AM Bug #20525 (Need More Info): ceph osd replace problem with osd out
- 10:14 AM Bug #20525: ceph osd replace problem with osd out
- peng,
I don't follow you. Could you rephrase your problem? What is the expected behavior?
- 05:27 AM Bug #20470: rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- /a//sage-2017-07-09_19:14:46-rados-wip-sage-testing-distro-basic-smithi/1379319
- 03:02 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- I will look at this issue again later if no progress has been made before then.
07/09/2017
- 06:34 PM Bug #20545: erasure coding = crashes
- Sorry, forgot a line of the code. Here's the exact process I'm using to do this:
Shell:...
- 06:27 PM Bug #20545: erasure coding = crashes
- I ran Rados bench on the same cluster and it seems to be working fine, so it seems that something about my Python cod...
- 05:49 PM Bug #20545: erasure coding = crashes
- Actually I thought to test this with filestore on BTRFS and it fails there in the same way as well. This seems to be ...
- 06:14 PM Bug #20446 (Resolved): mon does not let you create crush rules using device classes
- 11:39 AM Bug #20433: 'mon features' does not update properly for mons
- https://github.com/ceph/ceph/pull/16230
- 11:38 AM Bug #20433 (Fix Under Review): 'mon features' does not update properly for mons
- 02:40 AM Bug #17743 (Won't Fix): ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (k...
- see https://github.com/ceph/ceph/pull/16215 (disabled the memstore tests on kraken)
07/08/2017
- 09:15 PM Bug #20543: osd/PGLog.h: 1257: FAILED assert(0 == "invalid missing set entry found") in PGLog::re...
- also in yuriw-2017-07-07_22:19:55-rados-wip-yuri-testing2_2017_7_9-distro-basic-smithi
job: 1373063
- 02:01 PM Bug #19964 (Resolved): occasional crushtool timeouts
- 02:21 AM Bug #19815: Rollback/EC log entries take gratuitous amounts of memory
- It seems that this bug was fixed in version 12.1.0.
https://github.com/ceph/ceph/commit/9da684316630ac1c087e...
07/07/2017
- 10:13 PM Bug #20552 (Resolved): "Scrubbing terminated -- not all pgs were active and clean." error in rados
- Run: http://pulpito.ceph.com/yuriw-2017-07-06_20:01:14-rados-wip-yuri-testing3_2017_7_8-distro-basic-smithi/
Job: 13...
- 10:11 PM Bug #20551 (Duplicate): LOST_REVERT assert during rados bench+thrash in ReplicatedBackend::prepar...
- From osd.0 in:
http://pulpito.ceph.com/yuriw-2017-07-06_20:01:14-rados-wip-yuri-testing3_2017_7_8-distro-basic-smi...
- 09:44 PM Bug #20471 (Resolved): Can't repair corrupt object info due to bad oid on all replicas
- 04:22 PM Bug #20471: Can't repair corrupt object info due to bad oid on all replicas
- 08:39 PM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- Hmm, seems like that might slow stuff down enough to make it an unrealistic model, so probably not something we shoul...
- 03:50 AM Bug #20303 (Need More Info): filejournal: Unable to read past sequence ... journal is corrupt
- The logs end long before the event in question. I think in order for us to gather more useful logs for the powercycl...
- 08:37 PM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- What info do we need if this is reproducing with nightly logging?
- 03:45 AM Bug #20475 (Need More Info): EPERM: cannot set require_min_compat_client to luminous: 6 connected...
- 06:42 PM Bug #20546 (Resolved): buggy osd down warnings by subtree vs crush device classes
- The subtree-based down (host down etc) messages appear to be confused by the shadow hierarchy from crush device clas...
- 05:43 PM Bug #20545 (Duplicate): erasure coding = crashes
- Steps to reproduce:
* Create 4 OSDs and a mon on a machine (4TB disk per OSD, Bluestore, using dm-crypt too), usi...
- 03:39 PM Bug #19964: occasional crushtool timeouts
- 03:38 PM Bug #17743: ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
- https://github.com/ceph/ceph/pull/16215 ?
- 03:36 PM Bug #20454 (Resolved): bluestore: leaked aios from internal log
- 03:35 PM Bug #20434 (Resolved): mon metadata does not include ceph_version
- 03:13 PM Bug #20543 (Can't reproduce): osd/PGLog.h: 1257: FAILED assert(0 == "invalid missing set entry fo...
- ...
- 03:08 PM Bug #20534 (Resolved): unittest_direct_messenger segv
- 08:08 AM Bug #20534 (Fix Under Review): unittest_direct_messenger segv
- 02:42 PM Bug #20432 (Resolved): pgid 0.7 has ref count of 2
- 05:49 AM Bug #20432 (Fix Under Review): pgid 0.7 has ref count of 2
- https://github.com/ceph/ceph/pull/16201
I swear: this is the last PR for this ticket! - 02:22 AM Bug #20432 (Resolved): pgid 0.7 has ref count of 2
- 03:46 AM Bug #20381 (Resolved): bluestore: deferred aio submission can deadlock with completion
- https://github.com/ceph/ceph/pull/16051 merged
- 02:35 AM Bug #19518: log entry does not include per-op rvals?
- https://github.com/ceph/ceph/pull/16196 disables the assertion until we fix this bug.
07/06/2017
- 09:54 PM Bug #20326: Scrubbing terminated -- not all pgs were active and clean.
- Saw this error here:
/ceph/teuthology-archive/pdonnell-2017-07-01_01:07:39-fs-wip-pdonnell-20170630-distro-basic-s... - 09:19 PM Bug #20534: unittest_direct_messenger segv
- was able to reproduce with:...
- 07:37 PM Bug #20534 (Resolved): unittest_direct_messenger segv
- ...
- 02:34 PM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 09:20 AM Bug #20432 (Fix Under Review): pgid 0.7 has ref count of 2
- https://github.com/ceph/ceph/pull/16159
- 06:36 AM Bug #20432: pgid 0.7 has ref count of 2
- at the end of @OSD::process_peering_events()@, @dispatch_context(rctx, 0, curmap, &handle)@ is called, which just del...
- 10:30 AM Backport #20511 (In Progress): jewel: cache tier osd memory high memory consumption
- 10:19 AM Backport #20492 (In Progress): jewel: osd: omap threadpool heartbeat is only reset every 100 values
- 04:27 AM Feature #20526: swap-bucket can save the crushweight and osd weight?
- It's not a bug, just a feature request.
- 04:25 AM Feature #20526 (New): swap-bucket can save the crushweight and osd weight?
- I tested the swap-bucket function and have some advice
when using swap-bucket, the dst bucket will be in the old crush tre... - 03:20 AM Bug #20525 (Need More Info): ceph osd replace problem with osd out
- I tried the new function of replacing the OSD with the new command; it works, but I have some problems, and I don't know if it'...
- 02:30 AM Bug #20434 (Fix Under Review): mon metadata does not include ceph_version
- https://github.com/ceph/ceph/pull/16148 ?
07/05/2017
- 08:05 PM Bug #18924 (Resolved): kraken-bluestore 11.2.0 memory leak issue
- 08:05 PM Backport #20366 (Resolved): kraken: kraken-bluestore 11.2.0 memory leak issue
- 07:48 PM Bug #20434: mon metadata does not include ceph_version
- ...
- 05:42 PM Backport #20512 (Rejected): kraken: cache tier osd memory high memory consumption
- 05:42 PM Backport #20511 (Resolved): jewel: cache tier osd memory high memory consumption
- https://github.com/ceph/ceph/pull/16169
- 04:15 PM Bug #20454: bluestore: leaked aios from internal log
- 03:34 PM Bug #20507 (Duplicate): "[WRN] Manager daemon x is unresponsive. No standby daemons available." i...
- /a/sage-2017-07-03_15:41:59-rados-wip-sage-testing-distro-basic-smithi/1356209
rados/monthrash/{ceph.yaml clusters... - 03:33 PM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- /a/sage-2017-07-03_15:41:59-rados-wip-sage-testing-distro-basic-smithi/1356174
rados/singleton-bluestore/{all/ceph... - 11:33 AM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 08:08 AM Bug #20432: pgid 0.7 has ref count of 2
- /a/kchai-2017-07-05_04:38:56-rados-wip-kefu-testing2-distro-basic-mira/1363113...
- 10:52 AM Feature #5249 (Resolved): mon: support leader election configuration
- 07:04 AM Bug #20464 (Pending Backport): cache tier osd memory high memory consumption
- 07:02 AM Bug #20464 (Resolved): cache tier osd memory high memory consumption
- 06:45 AM Bug #20504 (Fix Under Review): FileJournal: fd leak lead to FileJournal::~FileJourna() assert failed
- https://github.com/ceph/ceph/pull/16120
- 06:23 AM Bug #20504 (Resolved): FileJournal: fd leak lead to FileJournal::~FileJourna() assert failed
- h1. 1. description
[root@yhg-1 work]# file 1498638564.27426.core ...
07/04/2017
- 05:51 PM Backport #20497 (In Progress): kraken: MaxWhileTries: reached maximum tries (105) after waiting f...
- 05:34 PM Backport #20497 (Resolved): kraken: MaxWhileTries: reached maximum tries (105) after waiting for ...
- https://github.com/ceph/ceph/pull/16111
- 05:34 PM Bug #20397 (Pending Backport): MaxWhileTries: reached maximum tries (105) after waiting for 630 s...
- 05:09 PM Bug #20433 (In Progress): 'mon features' does not update properly for mons
- 04:46 PM Bug #17743: ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
- Happened on another kraken backport: https://github.com/ceph/ceph/pull/16108
- 08:33 AM Backport #20493 (Rejected): kraken: osd: omap threadpool heartbeat is only reset every 100 values
- 08:33 AM Backport #20492 (Resolved): jewel: osd: omap threadpool heartbeat is only reset every 100 values
- https://github.com/ceph/ceph/pull/16167
- 07:50 AM Bug #20491: objecter leaked OSDMap in handle_osd_map
- * /a/kchai-2017-07-04_06:08:32-rados-wip-20432-kefu-distro-basic-mira/1359525/remote/mira038/log/valgrind/osd.0.log.g...
- 05:46 AM Bug #20491 (Resolved): objecter leaked OSDMap in handle_osd_map
- ...
- 07:07 AM Bug #20432 (Resolved): pgid 0.7 has ref count of 2
- 05:49 AM Bug #20432 (Fix Under Review): pgid 0.7 has ref count of 2
- https://github.com/ceph/ceph/pull/16093
- 06:46 AM Bug #20375 (Pending Backport): osd: omap threadpool heartbeat is only reset every 100 values
- 05:35 AM Bug #19695: mon: leaked session
- /a/kchai-2017-07-04_04:14:45-rados-wip-20432-kefu-distro-basic-mira/1357985/remote/mira112/log/valgrind/mon.a.log.gz
- 02:59 AM Bug #20434: mon metadata does not include ceph_version
- Here it is the new output I get from a brand new installed cluster: ...
07/03/2017
- 03:58 PM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 10:51 AM Bug #20432: pgid 0.7 has ref count of 2
- seems @PG::recovery_queued@ is reset somehow after being set in @PG::queue_recovery()@, but the PG is not removed fro...
- 05:12 AM Bug #20432: pgid 0.7 has ref count of 2
- @Sage,
i reverted the changes introduced by 0780f9e67801f400d78ac704c65caaa98e968bbc and tested the verify test at... - 02:20 AM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 03:29 PM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- Those look to be 22 and 60, which are DEFINE_CEPH_FEATURE_RETIRED(22, 1, BACKFILL_RESERVATION, JEWEL, LUMINOUS) and D...
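The feature-bit check above can be illustrated generically (this is not Ceph's code; the mask below is invented): the client's 64-bit feature mask is decoded into its set bit positions, which here would surface 22 and 60:

```python
# Illustrative only -- not Ceph's actual feature handling. Given a 64-bit
# feature mask, list the set bit positions (e.g. the retired bits 22 and 60
# mentioned above).
def set_bits(mask: int) -> list[int]:
    return [b for b in range(64) if mask & (1 << b)]

# Hypothetical mask with only bits 22 and 60 set:
mask = (1 << 22) | (1 << 60)
print(set_bits(mask))  # → [22, 60]
```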
- 01:44 PM Documentation #20486: Document how to use bluestore compression
- Joao Luis wrote:
> The bits I found out were through skimming the code, and that did not provide too much insight ... - 01:05 PM Documentation #20486 (Resolved): Document how to use bluestore compression
- Bluestore is becoming the de facto default, and I haven't found any docs on how to configure compression.
The bits...
07/02/2017
- 06:52 PM Bug #20432: pgid 0.7 has ref count of 2
- I suspect 0780f9e67801f400d78ac704c65caaa98e968bbc, which changed when the CLEAN flag was set at the end of recovery.
- 06:51 PM Bug #20432: pgid 0.7 has ref count of 2
- bisecting this... so far i've narrowed it down to something between f43c5fa055386455a263802b0908ddc96a95b1b0 and e972...
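The bisection above boils down to a binary search for the first bad commit, assuming badness is monotonic once introduced; the commit ids and `is_bad` predicate below are invented for illustration:

```python
# Sketch of bisection: find the first bad commit in an ordered list,
# assuming the last commit is bad and badness never reverts.
def first_bad(commits, is_bad):
    lo, hi = 0, len(commits) - 1      # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid                  # first bad commit is at mid or earlier
        else:
            lo = mid + 1              # first bad commit is after mid
    return commits[lo]

commits = ["a1", "b2", "c3", "d4", "e5"]   # made-up ids
bad = {"c3", "d4", "e5"}                   # regression introduced at c3
print(first_bad(commits, lambda c: c in bad))  # → c3
```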
- 01:04 PM Bug #20432: pgid 0.7 has ref count of 2
- ...
07/01/2017
- 03:06 PM Bug #20432: pgid 0.7 has ref count of 2
- http://pulpito.ceph.com/kchai-2017-06-30_10:58:17-rados-wip-20432-kefu-distro-basic-smithi/
- 02:52 PM Bug #20470: rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- This test confuses me. It seems like the PG is always going to exist on the target osd... why was it passing before?
- 02:17 PM Bug #20476: ops stuck waiting_for_map
- Trying to reproduce with same commit, more debugging, at http://pulpito.ceph.com/sage-2017-07-01_14:16:23-rados-wip-s...
- 02:08 PM Bug #20476 (Can't reproduce): ops stuck waiting_for_map
- observed many ops hung with waiting_for_map
made a dummy map update ('ceph osd unset nodown')
ops unblocked
... - 01:47 PM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- I've seen this at least twice now. It is not an upgrade test, so either unauthenticated clients that are strays in t...
- 01:46 PM Bug #20475 (Resolved): EPERM: cannot set require_min_compat_client to luminous: 6 connected clien...
- ...
- 06:35 AM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- WANG Guoqin wrote:
> Which IRC was that and do you have a chatting log on that?
https://gist.githubusercontent.co... - 06:10 AM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- sean redmond wrote:
> https://pastebin.com/raw/xmDPg84a was talked about in IRC by @mguz it seems it maybe related b... - 02:16 AM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-06-30_18:42:09-rados-wip-sage-testing-distro-basic-smithi/1345981
06/30/2017
- 11:28 PM Bug #20471 (Fix Under Review): Can't repair corrupt object info due to bad oid on all replicas
- https://github.com/ceph/ceph/pull/16052
- 11:03 PM Bug #20471 (In Progress): Can't repair corrupt object info due to bad oid on all replicas
- ...
- 05:24 PM Bug #20471 (Resolved): Can't repair corrupt object info due to bad oid on all replicas
We detect a kind of corruption where the oid in the object info doesn't match the oid of the object. This was adde...- 10:34 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- https://pastebin.com/raw/xmDPg84a was talked about in IRC by @mguz it seems it maybe related but this was kraken, jus...
- 03:25 PM Bug #19909 (Won't Fix): PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid...
- There was a lot of code churn around the 12.0.3 time period so this isn't too surprising to me. I'm not sure it's wo...
- 09:24 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- 09:03 PM Bug #20454: bluestore: leaked aios from internal log
- https://github.com/ceph/ceph/pull/16051 is a better fix
- 09:01 PM Bug #20397 (Resolved): MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds f...
- failure seems to be gone with the timeout change.
- 03:35 PM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- https://github.com/ceph/ceph/pull/16047
- 03:35 PM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- Easy workaround is to make the aio queue really big.
Harder fix to do some complicated locking juggling. I worry ... - 03:31 PM Bug #20277 (Can't reproduce): bluestore crashed while performing scrub
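The deadlock pattern described in Bug #20381 above can be sketched generically (names and structure are illustrative, not BlueStore's actual code): if the submitter blocks on a full aio queue while holding a lock that the completion thread also needs, the queue never drains. The "easy workaround" is a generous queue bound; a stricter fix never blocks on the queue while holding the lock:

```python
# Hedged sketch of the hazard and its avoidance; not real BlueStore code.
import queue
import threading

aio_queue = queue.Queue(maxsize=1024)  # workaround: make the queue big
state_lock = threading.Lock()
completed = []

def submit(io):
    with state_lock:
        pass               # update shared bookkeeping under the lock...
    aio_queue.put(io)      # ...but only block on the full queue *outside* it

def completion_worker():
    while True:
        io = aio_queue.get()
        if io is None:     # sentinel: shut the worker down
            return
        with state_lock:   # the completion path takes the same lock
            completed.append(io)

t = threading.Thread(target=completion_worker)
t.start()
for i in range(10):
    submit(i)
aio_queue.put(None)
t.join()
print(len(completed))      # → 10
```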
- 03:30 PM Cleanup #18734 (Resolved): crush: transparently deprecated ruleset/ruleid difference
- 03:30 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- 03:29 PM Bug #20446: mon does not let you create crush rules using device classes
- see https://github.com/ceph/ceph/pull/16027
- 02:06 PM Bug #20470 (Resolved): rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- ...
- 01:51 PM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-06-30_05:44:03-rados-wip-sage-testing-distro-basic-smithi/1344959...
- 06:54 AM Bug #20432: pgid 0.7 has ref count of 2
- rerunning at http://pulpito.ceph.com/kchai-2017-06-30_06:49:46-rados-master-distro-basic-smithi/, if we can consisten...
- 02:22 AM Bug #17968 (Resolved): Ceph:OSD can't finish recovery+backfill process due to assertion failure
06/29/2017
- 09:19 PM Bug #18165 (Resolved): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_target...
- 09:18 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
- https://github.com/ceph/ceph/pull/14760
- 07:33 PM Bug #12615: Repair of Erasure Coded pool with an unrepairable object causes pg state to lose clea...
- This will be fixed when we move repair out of the OSD. We shouldn't be using recovery to do repair anyway.
- 07:32 PM Bug #13493 (Duplicate): osd: for ec, cascading crash during recovery if one shard is corrupted
- 07:18 PM Bug #19964 (Fix Under Review): occasional crushtool timeouts
- https://github.com/ceph/ceph/pull/16025
- 06:17 PM Bug #19750 (Can't reproduce): osd-scrub-repair.sh:2214: corrupt_scrub_erasure: test no = yes
This isn't happening anymore from what I've seen. If it does let's get the full log. From the lines I'm being sho...- 06:09 PM Bug #17830 (Can't reproduce): osd-scrub-repair.sh is failing (intermittently?) on Jenkins
- Haven't been seeing this at all, so I'm closing for now.
- 05:45 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- 10:07 AM Bug #19939 (Fix Under Review): OSD crash in MOSDRepOpReply::decode_payload
- https://github.com/ceph/ceph/pull/16008
- 04:40 PM Bug #20454 (Fix Under Review): bluestore: leaked aios from internal log
- 04:40 PM Bug #20454 (Rejected): bluestore: leaked aios from internal log
- see #20385
- 03:16 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Anthony D'Atri wrote:
> We've experienced at least three distinct cases of ops stuck for long periods of time on a s... - 03:15 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- We've experienced at least three distinct cases of ops stuck for long periods of time on a scrub. The attached file ...
- 08:14 AM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @josh is this related to #19497?
- 11:11 AM Bug #20464 (Fix Under Review): cache tier osd memory high memory consumption
- 10:59 AM Bug #20464: cache tier osd memory high memory consumption
- https://github.com/ceph/ceph/pull/16011
This is my pull request; please help review it. - 07:13 AM Bug #20464 (Resolved): cache tier osd memory high memory consumption
- The OSD used as the cache tier in our EC cluster suffers from high memory usage (5GB~6GB consumption per OSD)
wh... - 08:42 AM Bug #20434: mon metadata does not include ceph_version
- Also just noticed this on a cluster updated from 12.0.3:...
- 03:07 AM Bug #20397: MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds from radosbe...
- http://pulpito.ceph.com/sage-2017-06-27_15:03:40-rados:thrash-master-distro-basic-smithi/
baseline on master... 5 ...
06/28/2017
- 10:09 PM Bug #14088 (Resolved): mon: nothing logged when ENOSPC encountered during start up
- 09:31 PM Bug #20434: mon metadata does not include ceph_version
- Assigning the issue to me as a placeholder to remove the ticket from the pool of unassigned tickets. Daniel is worki...
- 07:08 PM Bug #20434: mon metadata does not include ceph_version
- Daniel Oliveira wrote:
> Just talked to Sage and looking into this.
I just tested with Luminous branch (and also ... - 05:32 PM Bug #18647: ceph df output with erasure coded pools
- First I would need to know the PR numbers or SHA1 hashes of the commits that fix the issue in master.
- 04:58 PM Bug #18647: ceph df output with erasure coded pools
- Is it possible to backport this into Jewel?
- 03:49 PM Bug #18647 (Resolved): ceph df output with erasure coded pools
- fixed in luminous
- 04:42 PM Bug #20454 (Resolved): bluestore: leaked aios from internal log
- Reported and diagnosed by Igor; opening a ticket so we don't forget.
- 04:06 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- 04:05 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- Kefu, any new updates or should this be unassigned from you?
- 12:51 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- Here's another one:
/a/pdonnell-2017-06-27_19:50:40-fs-wip-pdonnell-20170627---basic-smithi/1333648
fs/snaps/{b... - 03:57 PM Bug #18926 (Duplicate): Why osds do not release memory?
- see #18924
- 03:43 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
- David, anything up with this? Is it an urgent bug?
- 03:41 PM Bug #18204 (Can't reproduce): jewel: finish_promote unexpected promote error (34) Numerical resul...
- 03:40 PM Bug #18467 (Resolved): ceph ping mon.* can fail
- 03:39 PM Bug #19067 (Need More Info): missing set not persisted
- 03:32 PM Bug #19605 (Can't reproduce): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front()...
- If you can reproduce this on master or luminous rc, please reopen!
- 03:31 PM Bug #19790 (Resolved): rados ls on pool with no access returns no error
- 03:30 PM Bug #19911 (Can't reproduce): osd: out of order op
- 03:29 PM Bug #20133 (Can't reproduce): EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksd...
- 03:28 PM Bug #19191: osd/ReplicatedBackend.cc: 1109: FAILED assert(!parent->get_log().get_missing().is_mis...
- https://github.com/ceph/ceph/pull/14053
- 03:17 PM Bug #19191: osd/ReplicatedBackend.cc: 1109: FAILED assert(!parent->get_log().get_missing().is_mis...
- 03:27 PM Bug #19983 (Closed): osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/Kerne...
- 03:27 PM Bug #18681 (Won't Fix): ceph-disk prepare/activate misses steps and fails on [Bluestore]
- 03:22 PM Bug #19964 (In Progress): occasional crushtool timeouts
- 03:21 PM Bug #20446 (Fix Under Review): mon does not let you create crush rules using device classes
- 02:36 PM Bug #20446: mon does not let you create crush rules using device classes
- https://github.com/ceph/ceph/pull/15975
- 11:49 AM Bug #20446: mon does not let you create crush rules using device classes
- I tested in my env; it does exist in the master branch. It seems easy to fix; I will create a PR.
- 11:42 AM Bug #20446: mon does not let you create crush rules using device classes
- I will try to verify it.
- 07:20 AM Bug #20446 (Resolved): mon does not let you create crush rules using device classes
- I ran Ceph 12.1.0, tried the crush class function, and found a problem with the name
step:
1.ceph osd cru... - 03:18 PM Bug #20086 (Can't reproduce): LibRadosLockECPP.LockSharedDurPP gets EEXIST
- 03:17 PM Bug #19895 (Can't reproduce): test/osd/RadosModel.h: 1169: FAILED assert(version == old_value.ver...
- 03:08 PM Bug #20419 (Duplicate): OSD aborts when shutting down
- 02:56 PM Bug #20419: OSD aborts when shutting down
- sage suspects that it could be a regression: we switched the shutdown order recently.
- 10:42 AM Bug #20419: OSD aborts when shutting down
- so somebody was still holding a reference to pg 0.50 when OSD was trying to kick it.
- 02:15 PM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- aio completion thread blocking on deferred_lock:...
- 12:18 PM Bug #20451 (Can't reproduce): osd Segmentation fault after upgrade from jewel (10.2.5) to kraken ...
- Hi,
after the upgrade, some OSDs are down
*** Caught signal (Segmentation fault) **
in thread 7f0237441700 thread... - 10:31 AM Feature #5249: mon: support leader election configuration
- https://github.com/ceph/ceph/pull/15964 enables the MonClient to have preference to the closer monitors.
- 07:00 AM Feature #5249 (Fix Under Review): mon: support leader election configuration
- https://github.com/ceph/ceph/pull/15964
- 08:03 AM Bug #20445 (Need More Info): fio stalls, scrubbing doesn't stop when repeatedly creating/deleting...
- Question for the original reporter of this bug: why do you expect the scrub to stop?
Please provide more details. - 07:13 AM Bug #20445 (Need More Info): fio stalls, scrubbing doesn't stop when repeatedly creating/deleting...
- This happens on latest jewel and is possibly related to (recently merged) https://github.com/ceph/ceph/pull/15529
... - 12:47 AM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- /a/pdonnell-2017-06-27_19:50:40-fs-wip-pdonnell-20170627---basic-smithi/1333726
/a/pdonnell-2017-06-27_19:50:40-fs-w... - 12:07 AM Bug #20439 (Resolved): PG never finishes getting created
dzafman-2017-06-26_14:07:20-rados-wip-13837-distro-basic-smithi/1328370
description: rados/singleton/{all/diverg...
06/27/2017
- 08:13 PM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- 07:12 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- I didn't do any digging through what patches were in the centos or xenial kernels. Happy if someone wants to chase t...
- 06:22 PM Bug #20434: mon metadata does not include ceph_version
- Just talked to Sage and looking into this.
- 04:46 PM Bug #20434 (Resolved): mon metadata does not include ceph_version
- on lab cluster, after kraken -> luminous 12.1.0 upgrade,...
- 04:45 PM Bug #20433 (Resolved): 'mon features' does not update properly for mons
- on lab cluster, after upgrade from kraken -> luminous 12.1.0,...
- 04:06 PM Bug #20397: MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds from radosbe...
- 02:44 PM Bug #20397: MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds from radosbe...
- /a/sage-2017-06-27_05:44:05-rados-wip-sage-testing-distro-basic-smithi/1331664
rados/thrash/{0-size-min-size-overrid... - 04:05 PM Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
- That fix (d24a8886658c2d8882275d69c6409717a62701be and 31d3ae8a878f7ede6357f602852d586e0621c73f) was not quite comple...
- 03:18 PM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- /a/sage-2017-06-27_05:44:05-rados-wip-sage-testing-distro-basic-smithi/1331957
- 03:17 PM Bug #20432 (Resolved): pgid 0.7 has ref count of 2
- ...
- 03:00 PM Bug #20419: OSD aborts when shutting down
- http://pulpito.ceph.com/yuriw-2017-06-27_03:16:16-rados-master_2017_6_27-distro-basic-smithi/1329613
http://pulpit...
06/26/2017
- 10:45 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- 1:3.12.0-1.1ubuntu1
xenial on smithi107
/a/sage-2017-06-26_14:37:54-rados-wip-sage-testing2-distro-basic-smithi/132... - 10:43 PM Bug #20397: MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds from radosbe...
- /a/sage-2017-06-26_14:37:54-rados-wip-sage-testing2-distro-basic-smithi/1327079
rados/thrash/{0-size-min-size-overri... - 10:42 PM Bug #19964: occasional crushtool timeouts
- /a/sage-2017-06-26_14:37:54-rados-wip-sage-testing2-distro-basic-smithi/1327058
rados/thrash/{0-size-min-size-overri... - 04:08 PM Bug #19023 (Resolved): ceph_test_rados invalid read caused apparently by lost intervals due to mo...
- 04:04 PM Bug #20419 (Duplicate): OSD aborts when shutting down
- /a/kchai-2017-06-25_17:19:05-rados-wip-kefu-testing---basic-smithi/1324712/remote/smithi006/log/ceph-osd.3.log.gz
<p... - 02:53 PM Feature #5249: mon: support leader election configuration
- 12:12 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- Has this been reproduced with the following kernel fix applied?
commit 70e7af244f24c94604ef6eca32ad297632018583
A... - 10:11 AM Bug #20416 (Resolved): "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Hello,
I've upgraded a Jewel cluster to Luminous 12.1.0 (RC), restarted the monitors, mgr is active, but I can't r...
06/23/2017
- 07:47 PM Bug #20302: "BlueStore.cc: 9023: FAILED assert(0 == "unexpected error")" in powercycle-master-dis...
- https://github.com/ceph/ceph/pull/15821
- 07:46 PM Bug #20302 (Resolved): "BlueStore.cc: 9023: FAILED assert(0 == "unexpected error")" in powercycle...
- 03:47 PM Bug #20302: "BlueStore.cc: 9023: FAILED assert(0 == "unexpected error")" in powercycle-master-dis...
- merged
- 03:10 PM Bug #20389 (Won't Fix): "Error EPERM: min_compat_client jewel < luminous, which is required for p...
- this is actually fine; we're ignoring errors from these commands (so the thrasher can work when the feature is unavai...
- 03:24 AM Bug #20389: "Error EPERM: min_compat_client jewel < luminous, which is required for pg-upmap" in ...
- Also in http://qa-proxy.ceph.com/teuthology/yuriw-2017-06-22_20:54:27-powercycle-wip-yuri-testing2_2017_7_22-distro-b...
- 03:23 AM Bug #20389 (Won't Fix): "Error EPERM: min_compat_client jewel < luminous, which is required for p...
- Run: http://pulpito.ceph.com/yuriw-2017-06-22_23:59:13-powercycle-wip-yuri-testing2_2017_7_22-distro-basic-smithi/
J... - 03:02 PM Bug #20397 (Resolved): MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds f...
- ...
- 02:57 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Anything i could provide or test? VMs are still crashing every night...
- 11:32 AM Bug #19800: some osds are down when create a new pool and a new image of the pool (bluestore)
@sage weil, could you show me the PR referring to readahead, please?
06/22/2017
- 07:57 PM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- These osd assertion failures reproduce consistently on shutdown in the rgw:multisite suite.
- 06:30 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Anecdotally, it looks like I may be running into this very same issue (or something similar) -- occasionally I have s...
- 05:46 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- Basically yes.
In src/mon/Session.h -> Subscription->next = -1 or 0.
I am learning C++ standard all the way but... - 11:01 AM Bug #20381 (New): bluestore: deferred aio submission can deadlock with completion
- Turns out when something is marked as a duplicate in redmine, it automatically closes this one when I close the other...
- 11:00 AM Bug #20381 (Duplicate): bluestore: deferred aio submission can deadlock with completion
- This ticket was opened first, but let's close it in favour of 20381 because that one has the integration test logs.
- 10:53 AM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- The backtrace looks exactly like the one in #20379 - duplicate?
- 10:41 AM Bug #20381 (Resolved): bluestore: deferred aio submission can deadlock with completion
- ...
- 11:00 AM Bug #20379 (Duplicate): bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0))
- This ticket was opened first, but let's close it in favour of 20381 because that one has the integration test logs.
- 10:58 AM Bug #20379: bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0))
- Updated title to make it clear that this isn't specific to vstart
- 10:52 AM Bug #20379: bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0))
- Looks like the integration tests are hitting this as well.
- 09:25 AM Bug #20379 (Duplicate): bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0))
- There's already a bug (with lots of dups) that seems to be what I'm seeing in a vstart.sh cluster. Since this bug is...
- 02:13 AM Bug #20274 (Resolved): rewind divergent deletes head whiteout
- 12:50 AM Bug #20375 (Fix Under Review): osd: omap threadpool heartbeat is only reset every 100 values
- https://github.com/ceph/ceph/pull/15823
06/21/2017
- 10:26 PM Bug #20331 (Rejected): osd/PGLog.h: 770: FAILED assert(i->prior_version == last)
- #20274 isn't merged yet, fixing it there.
- 10:20 PM Bug #20331: osd/PGLog.h: 770: FAILED assert(i->prior_version == last)
- This is fallout from 986a31f02e11d915a630cab17234ec4b8040609c, the #20274 fix. When we skip error entries the prior_...
- 10:06 PM Bug #20375 (Resolved): osd: omap threadpool heartbeat is only reset every 100 values
- This could potentially be after 100MB of reads. There's little cost to resetting the heartbeat timeout, so simple do ...
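A minimal sketch of the change being described, with invented names (not the OSD's actual API): reset the worker heartbeat on every omap value instead of only every 100th, since a reset is cheap:

```python
# Hedged sketch: per-value heartbeat resets vs. the old every-100 behavior,
# so a slow iteration over large values cannot starve the timeout check.
import time

class Heartbeat:
    def __init__(self):
        self.resets = 0
        self.last_reset = time.monotonic()

    def reset(self):
        self.resets += 1
        self.last_reset = time.monotonic()

def iterate_omap(values, hb, reset_every=1):
    # Old behavior was effectively reset_every=100.
    for i, _ in enumerate(values, start=1):
        if i % reset_every == 0:
            hb.reset()
        # ... process the value ...

hb = Heartbeat()
iterate_omap(range(250), hb)                     # per-value: 250 resets
old = Heartbeat()
iterate_omap(range(250), old, reset_every=100)   # old style: only 2 resets
print(hb.resets, old.resets)  # → 250 2
```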
- 09:02 PM Bug #20358 (Resolved): bluestore: sharedblob not moved during split
- 08:42 PM Bug #19909 (New): PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- So I didn't follow it all the way through but it sure looks to me like our acting_primary input to the crashing seque...
- 09:13 AM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- Yes, i'm pretty sure it was 12.0.3. But, not on first boot, only after massive failures got me to stale+down PG statu...
- 07:52 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Second reported case from mailing list of VMs locking up -- they also have VMs issuing periodic discards.
- 11:57 AM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Shouldn't this one be flagged as a regression? It was working fine under firefly and hammer.
- 07:31 PM Bug #19943 (Resolved): osd: enoent on snaptrimmer
- 04:34 PM Bug #20169 (Fix Under Review): filestore+btrfs occasionally returns ENOSPC
- https://github.com/ceph/ceph/pull/15814
- 04:09 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- I've seen xenial and centos failures now, no trusty yet.
- 04:07 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- ...
- 04:09 PM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- Also in http://qa-proxy.ceph.com/teuthology/yuriw-2017-06-21_01:02:43-rgw-master_2017_6_21-distro-basic-smithi/130726...
- 03:55 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- ok, valgrind is now restricted to centos again.
- 02:49 AM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- 03:46 PM Bug #20371: mgr: occasional fails to send beacons (monc reconnect backoff too aggressive?)
- It looks like it wasn't aggressive enough about reconnection to the mon:...
- 02:17 PM Bug #20371 (Resolved): mgr: occasional fails to send beacons (monc reconnect backoff too aggressi...
- for a while,...
- 01:48 PM Bug #20370 (New): leaked MOSDOp via PrimaryLogPG::_copy_some and PrimaryLogPG::do_proxy_write
- ...
- 01:43 PM Bug #20369 (New): segv in OSD::ShardedOpWQ::_process
- ...
- 12:01 PM Backport #20366 (In Progress): kraken: kraken-bluestore 11.2.0 memory leak issue
- 11:50 AM Backport #20366 (Resolved): kraken: kraken-bluestore 11.2.0 memory leak issue
- https://github.com/ceph/ceph/pull/15792
- 08:44 AM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
- *master PR*: https://github.com/ceph/ceph/pull/15295
*kraken backport PR*: https://github.com/ceph/ceph/pull/15792 - 02:22 AM Bug #18924 (Pending Backport): kraken-bluestore 11.2.0 memory leak issue
- 02:21 AM Bug #18924 (Fix Under Review): kraken-bluestore 11.2.0 memory leak issue
- https://github.com/ceph/ceph/pull/15792
should help - 02:34 AM Bug #20302 (Fix Under Review): "BlueStore.cc: 9023: FAILED assert(0 == "unexpected error")" in po...
- ...
- 02:31 AM Bug #20277 (Need More Info): bluestore crashed while performing scrub
- A bug was just fixed in the spanning blob code, see https://github.com/ceph/ceph/pull/15654. Are you able to reprodu...
- 02:23 AM Bug #20117 (Rejected): BlueStore.cc: 8585: FAILED assert(0 == "unexpected error")
- you need more log info to see what the actual error was. usually when i see this it's enospc...
- 02:12 AM Bug #19800 (Resolved): some osds are down when create a new pool and a new image of the pool (blu...
- This looks like rocksdb compaction, probably triggered in part by a big deletion. There was a recent fix to do reada...