Activity
From 06/28/2017 to 07/27/2017
07/27/2017
- 10:40 PM Bug #20808: osd deadlock: forced recovery
- thread 3 has pg lock, tries to take recovery lock. this is old code
thread 87 has recovery lock, trying to take pg...
- 10:37 PM Bug #20808 (Resolved): osd deadlock: forced recovery
- ...
- 09:25 PM Bug #20744 (Resolved): monthrash: WRN Manager daemon x is unresponsive. No standby daemons available
- 09:24 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- So is this a timing issue where the lossy connection is dead and a message gets thrown out, but then the second reply...
- 08:02 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- i think the root cause is in the messenger layer. in my case, osd.1 is the primary osd. and it expects that its peer ...
- 09:00 PM Bug #20804 (Fix Under Review): CancelRecovery event in NotRecovering state
- https://github.com/ceph/ceph/pull/16638
- 08:56 PM Bug #20804: CancelRecovery event in NotRecovering state
- Easy fix is to make CancelRecovery from NotRecovering a no-op.
Unsure whether this could happen in other states be...
- 08:56 PM Bug #20804 (Resolved): CancelRecovery event in NotRecovering state
- ...
- 08:52 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- Finally I got some clues about the situation I'm facing. Don't know if anyone's still watching this thread.
After ...
- 07:52 PM Bug #20784: rados/standalone/erasure-code.yaml failure
- Interestingly, test-erasure-eio.sh passes when run on my build machine using qa/run-standalone.sh
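For illustration, a hedged sketch of running that one standalone test locally; the exact invocation is an assumption (run-standalone.sh is expected to be launched from a finished build directory and given the test script name):
$ cd ceph/build
$ ../qa/run-standalone.sh test-erasure-eio.sh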
- 01:35 PM Bug #20784: rados/standalone/erasure-code.yaml failure
- /a/sage-2017-07-26_14:40:34-rados-wip-sage-testing-distro-basic-smithi/1447168
- 07:11 PM Bug #20793 (Fix Under Review): osd: segv in CopyFromFinisher::execute in ec cache tiering test
- Appears to be resolved under tracker ticket #20783 [1]
*PR*: https://github.com/ceph/ceph/pull/16617
[1] http:/...
- 05:06 PM Bug #20793: osd: segv in CopyFromFinisher::execute in ec cache tiering test
- Perhaps fixed under tracker # 20783 since it didn't repeat under a single run locally nor under teuthology. Going to ...
- 01:26 PM Bug #20793: osd: segv in CopyFromFinisher::execute in ec cache tiering test
- /a/sage-2017-07-26_19:43:32-rados-wip-sage-testing2-distro-basic-smithi/1448238
/a/sage-2017-07-26_19:43:32-rados-wi...
- 01:19 PM Bug #20793: osd: segv in CopyFromFinisher::execute in ec cache tiering test
- similar:...
- 01:17 PM Bug #20793 (Resolved): osd: segv in CopyFromFinisher::execute in ec cache tiering test
- ...
- 06:47 PM Bug #20653 (Need More Info): bluestore: aios don't complete on very large writes on xenial
- 03:18 PM Bug #20653: bluestore: aios don't complete on very large writes on xenial
- Those last two failures are due to #20771 fixed by dfab9d9b5d75d0f87053b1a3727f62da72af6c91
I haven't been able to...
- 07:39 AM Bug #20653: bluestore: aios don't complete on very large writes on xenial
- This may be a different bug, but it appears to be bluestore causing a rados aio test to time out (with full logs save...
- 07:31 AM Bug #20653: bluestore: aios don't complete on very large writes on xenial
- Seeing the same thing in many jobs in these runs, but not just on xenial. The first one I looked at was trusty - osd....
- 06:37 PM Bug #20803 (Resolved): ceph tell osd.N config set osd_max_backfill does not work
- ...
- 04:34 PM Bug #20798 (Can't reproduce): LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- ...
- 03:23 PM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/yuriw-2017-07-26_16:46:49-rados-wip-yuri-testing3_2017_7_27-distro-basic-smithi/1447634
- 01:32 PM Bug #20693 (Resolved): monthrash has spurious PG_AVAILABILITY etc warnings
- 01:15 PM Bug #20783: osd: leak from do_extent_cmp
- coverity sez...
- 04:46 AM Bug #20783 (Fix Under Review): osd: leak from do_extent_cmp
- *PR*: https://github.com/ceph/ceph/pull/16617
- 07:50 AM Bug #20791 (Duplicate): crash in operator<< in PrimaryLogPG::finish_copyfrom
- OSD logs and coredump are manually saved in /a/joshd-2017-07-26_22:34:59-rados-wip-dup-ops-debug-distro-basic-smithi/...
07/26/2017
- 11:02 PM Bug #20775 (In Progress): ceph_test_rados parameter error
- 12:22 PM Bug #20775: ceph_test_rados parameter error
- https://github.com/ceph/ceph/pull/16590
- 12:21 PM Bug #20775 (Resolved): ceph_test_rados parameter error
- ...
- 06:04 PM Bug #20785: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool()))
- problem appears to be the message the mon sent,...
- 06:03 PM Bug #20785 (Resolved): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool...
- ...
- 05:28 PM Bug #20783 (In Progress): osd: leak from do_extent_cmp
- 04:49 PM Bug #20783 (Resolved): osd: leak from do_extent_cmp
- ...
- 05:01 PM Bug #20371 (Resolved): mgr: occasional fails to send beacons (monc reconnect backoff too aggressi...
- 02:28 AM Bug #20371: mgr: occasional fails to send beacons (monc reconnect backoff too aggressive?)
- /a/sage-2017-07-25_20:28:21-rados-wip-sage-testing2-distro-basic-smithi/1443641
- 04:51 PM Bug #20784 (Duplicate): rados/standalone/erasure-code.yaml failure
- /a/sage-2017-07-26_14:40:34-rados-wip-sage-testing-distro-basic-smithi/1447168...
- 03:08 PM Backport #20780 (In Progress): jewel: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 03:06 PM Backport #20780: jewel: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- https://github.com/ceph/ceph/pull/16405
The master version is going through a test run, but I'm confident it won't...
- 03:04 PM Backport #20780 (Resolved): jewel: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- https://github.com/ceph/ceph/pull/16405
- 03:07 PM Backport #20781 (Rejected): kraken: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 03:03 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- https://github.com/ceph/ceph/pull/16404
- 03:02 PM Bug #20041 (Pending Backport): ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 02:55 PM Bug #20770: test_pidfile.sh test is failing 2 places
- https://github.com/ceph/ceph/pull/16587
- 01:03 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- /me has a core dump now, /me looking.
- 02:37 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- i reproduced it by running
fs/snaps/{begin.yaml clusters/fixed-2-ucephfs.yaml mount/fuse.yaml objectstore/filesto...
- 09:17 AM Bug #20754 (Resolved): osd/PrimaryLogPG.cc: 1845: FAILED assert(!cct->_conf->osd_debug_misdirecte...
- 02:32 AM Bug #20751 (Resolved): osd_state not updated properly during osd-reuse-id.sh
07/25/2017
- 10:51 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- How do you reproduce it?
- 10:49 PM Bug #20371 (Fix Under Review): mgr: occasional fails to send beacons (monc reconnect backoff too ...
- https://github.com/ceph/ceph/pull/16576
- 10:30 PM Bug #20744: monthrash: WRN Manager daemon x is unresponsive. No standby daemons available
- 10:29 PM Bug #20693 (Fix Under Review): monthrash has spurious PG_AVAILABILITY etc warnings
- https://github.com/ceph/ceph/pull/16575
- 10:21 PM Bug #20751 (Fix Under Review): osd_state not updated properly during osd-reuse-id.sh
- follow-up defensive change: https://github.com/ceph/ceph/pull/16534
- 08:39 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Still everything fine. No new hanging scrub but getting a lot of scrub pg errors which i need to repair manually. Not...
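As context, a hedged example of the usual manual repair flow for those scrub errors (the pg id 2.1f is purely illustrative):
$ ceph health detail | grep inconsistent   # list the inconsistent pgs
$ ceph pg repair 2.1f                      # ask the primary osd to repair that pg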
- 07:05 PM Bug #20747 (Resolved): leaked context from handle_recovery_delete
- 07:04 PM Bug #20753 (Resolved): osd/PGLog.h: 1310: FAILED assert(0 == "invalid missing set entry found")
- 05:55 PM Bug #20770 (New): test_pidfile.sh test is failing 2 places
I've seen both of these on Jenkins make check runs.
test_pidfile.sh line 55...
- 10:05 AM Bug #19198 (Need More Info): Bluestore doubles mem usage when caching object content
- 10:05 AM Bug #19198: Bluestore doubles mem usage when caching object content
- Update: the unit test in attachment does show that twice the memory is used due to page-alignment inefficiencies. How...
07/24/2017
- 05:50 PM Bug #20734 (Duplicate): mon: leaks caught by valgrind
- Closing this one since it doesn't have the actual allocation traceback.
- 05:04 PM Bug #20739 (Resolved): missing deletes not excluded from pgnls results?
- https://github.com/ceph/ceph/pull/16490
- 04:56 PM Bug #20753 (Fix Under Review): osd/PGLog.h: 1310: FAILED assert(0 == "invalid missing set entry f...
- This is just a bad assert - the missing entry was added by repair....
- 03:08 PM Bug #20759 (Can't reproduce): mon: valgrind detects a few leaks
- From /a/joshd-2017-07-23_23:56:38-rados:verify-wip-20747-distro-basic-smithi/1435050/remote/smithi036/log/valgrind/mo...
- 03:04 PM Bug #20747 (Fix Under Review): leaked context from handle_recovery_delete
- https://github.com/ceph/ceph/pull/16536
- 01:58 PM Bug #20751 (In Progress): osd_state not updated properly during osd-reuse-id.sh
- Hmm, we should also ensure that UP is cleared when doing the destroy, since existing clusters may have osds that !EXI...
- 01:57 PM Bug #20751 (Resolved): osd_state not updated properly during osd-reuse-id.sh
- 02:04 AM Bug #20751 (Fix Under Review): osd_state not updated properly during osd-reuse-id.sh
- https://github.com/ceph/ceph/pull/16518
- 01:43 PM Bug #20693: monthrash has spurious PG_AVAILABILITY etc warnings
- Ok, I've addressed one source of this, but there is another, see
/a/sage-2017-07-24_03:44:49-rados-wip-sage-testin...
- 11:41 AM Bug #20750 (Resolved): ceph tell mgr fs status: Row has incorrect number of values, (actual) 5!=6...
- 02:37 AM Bug #20754 (Fix Under Review): osd/PrimaryLogPG.cc: 1845: FAILED assert(!cct->_conf->osd_debug_mi...
- https://github.com/ceph/ceph/pull/16519
- 02:35 AM Bug #20754: osd/PrimaryLogPG.cc: 1845: FAILED assert(!cct->_conf->osd_debug_misdirected_ops)
- the pg was split in e80:...
- 02:35 AM Bug #20754 (Resolved): osd/PrimaryLogPG.cc: 1845: FAILED assert(!cct->_conf->osd_debug_misdirecte...
- ...
07/23/2017
- 07:08 PM Bug #20753 (Resolved): osd/PGLog.h: 1310: FAILED assert(0 == "invalid missing set entry found")
- ...
- 02:27 AM Bug #20751 (Resolved): osd_state not updated properly during osd-reuse-id.sh
- when running osd-reuse-id.sh via teuthology i reliably hit an assert that all osds support the stateful mon subscri...
- 02:12 AM Bug #20750 (Resolved): ceph tell mgr fs status: Row has incorrect number of values, (actual) 5!=6...
- ...
07/22/2017
- 06:06 PM Bug #20747 (Resolved): leaked context from handle_recovery_delete
- ...
- 03:22 AM Bug #20744 (Resolved): monthrash: WRN Manager daemon x is unresponsive. No standby daemons available
- /a/sage-2017-07-21_21:27:50-rados-wip-sage-testing-distro-basic-smithi/1427732 for latest example.
The problem app...
07/21/2017
- 08:23 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Currently it looks good. Will wait until monday to be sure.
- 08:13 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 05:20 PM Bug #20684 (Resolved): pg refs leaked when osd shutdown
- 04:43 PM Bug #20684: pg refs leaked when osd shutdown
- Honggang Yang wrote:
> https://github.com/ceph/ceph/pull/16408
merged
- 04:27 PM Bug #20739 (Resolved): missing deletes not excluded from pgnls results?
- ...
- 04:00 PM Bug #20667 (Resolved): segv in cephx_verify_authorizing during monc init
- 03:59 PM Bug #20704 (Resolved): osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- 02:38 PM Bug #20371 (Need More Info): mgr: occasional fails to send beacons (monc reconnect backoff too ag...
- all suites end up getting stuck for quite a while (enough to trigger the cutoff for a laggy/down mgr) somewhere durin...
- 02:35 PM Bug #20624 (Duplicate): cluster [WRN] Health check failed: no active mgr (MGR_DOWN)" in cluster log
- 02:10 PM Bug #19790: rados ls on pool with no access returns no error
- No worries, thanks for the update!
- 11:31 AM Bug #20705 (Resolved): repair_test fails due to race with osd start
- 07:37 AM Backport #20723 (In Progress): jewel: rados ls on pool with no access returns no error
- 06:22 AM Bug #20397 (Resolved): MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds f...
- 06:22 AM Backport #20497 (Resolved): kraken: MaxWhileTries: reached maximum tries (105) after waiting for ...
- 03:50 AM Bug #20734 (Duplicate): mon: leaks caught by valgrind
- ...
07/20/2017
- 11:47 PM Bug #20545: erasure coding = crashes
- Trying to reproduce this issue in my lab
- 11:20 PM Bug #18209 (Need More Info): src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue....
- Zheng, what's the source for this bug? Any updates?
- 10:52 PM Bug #19790: rados ls on pool with no access returns no error
- Looks like we may have set the wrong state on this tracker and therefore overlooked it for the purposes of backportin...
- 08:26 PM Bug #19790 (Pending Backport): rados ls on pool with no access returns no error
- 08:03 PM Bug #19790: rados ls on pool with no access returns no error
- Thanks a lot for the fix in master/luminous, taking the liberty to follow up on this one — looks like the backport to...
- 08:52 PM Bug #20730: need new OSD_SKEWED_USAGE implementation
- see https://github.com/ceph/ceph/pull/16461
- 08:51 PM Bug #20730 (New): need new OSD_SKEWED_USAGE implementation
- I've removed the OSD_SKEWED_USAGE implementation because it isn't smart enough:
1. It doesn't understand different...
- 08:30 PM Bug #20704 (Fix Under Review): osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- https://github.com/ceph/ceph/pull/16459
- 08:08 PM Bug #20704: osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- This was a bug in persisting the missing state during split. Building a fix.
- 07:48 PM Bug #20704 (In Progress): osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- Found a bug in my ceph-objectstore-tool change that could cause this, seeing if it did in this case.
- 03:26 PM Bug #20704 (Resolved): osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- ...
- 08:28 PM Backport #20723 (Resolved): jewel: rados ls on pool with no access returns no error
- https://github.com/ceph/ceph/pull/16473
- 08:28 PM Backport #20722 (Rejected): kraken: rados ls on pool with no access returns no error
- 03:58 PM Bug #20667 (Fix Under Review): segv in cephx_verify_authorizing during monc init
- https://github.com/ceph/ceph/pull/16455
I think we *also* need to fix the root cause, though, in commit bf49385679...
- 03:25 PM Bug #20667: segv in cephx_verify_authorizing during monc init
- this time with a core...
- 02:52 AM Bug #20667: segv in cephx_verify_authorizing during monc init
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419306
/a/sage-2017-07-19_15:27:16-rados-wi...
- 03:42 PM Bug #20705 (Fix Under Review): repair_test fails due to race with osd start
- https://github.com/ceph/ceph/pull/16454
- 03:40 PM Bug #20705 (Resolved): repair_test fails due to race with osd start
- ...
- 03:40 PM Feature #15835: filestore: randomize split threshold
- I spoke too soon, there is significantly improved latency and throughput in longer running tests on several osds.
- 02:54 PM Bug #19939 (Resolved): OSD crash in MOSDRepOpReply::decode_payload
- 02:34 PM Bug #20694: osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_log().get_log().obje...
- /a/kchai-2017-07-20_03:05:27-rados-wip-kefu-testing-distro-basic-mira/1422161
$ zless remote/mira104/log/ceph-osd....
- 02:53 AM Bug #20694 (Can't reproduce): osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_lo...
- ...
- 10:09 AM Bug #20690: Cluster status is HEALTH_OK even though PGs are in unknown state
- This log excerpt illustrates the problem: https://paste2.org/cne4IzG1
The log starts immediately after cephfs dep...
- 04:54 AM Bug #20645: bluesfs wal failed to allocate (assert(0 == "allocate failed... wtf"))
- sorry for not posting the version; the assert occurred in v12.0.2. maybe it's similar to #18054, but i think they are di...
- 03:02 AM Bug #20105 (Resolved): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
- 03:01 AM Bug #20371: mgr: occasional fails to send beacons (monc reconnect backoff too aggressive?)
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419525
- 02:51 AM Bug #20693 (Resolved): monthrash has spurious PG_AVAILABILITY etc warnings
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419393
no osd thrashing, but not fully pe...
- 02:49 AM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419390
07/19/2017
- 09:29 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Updated two of my clusters - will report back. Thanks again.
- 06:11 AM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Yes i am - building right now. But it will take some time to publish that one to the clusters.
- 07:59 PM Bug #19971 (Resolved): osd: deletes are performed inline during pg log processing
- 07:53 PM Bug #19971: osd: deletes are performed inline during pg log processing
- merged https://github.com/ceph/ceph/pull/15952
- 06:32 PM Bug #20667: segv in cephx_verify_authorizing during monc init
- /a/yuriw-2017-07-18_19:38:33-rados-wip-yuri-testing3_2017_7_19-distro-basic-smithi/1413393
/a/yuriw-2017-07-18_19:38...
- 03:46 PM Bug #20667: segv in cephx_verify_authorizing during monc init
- Another instance, this time jewel:...
- 05:55 PM Bug #20684: pg refs leaked when osd shutdown
- Nice debugging and presentation of your analysis! That's my favorite kind of bug report!
- 03:11 PM Bug #20684 (Fix Under Review): pg refs leaked when osd shutdown
- 03:12 AM Bug #20684: pg refs leaked when osd shutdown
- https://github.com/ceph/ceph/pull/16408
- 03:08 AM Bug #20684 (Resolved): pg refs leaked when osd shutdown
- h1. 1. summary
When kicking a pg, its ref count is greater than 1, which causes the assert to fail.
When osd is in proce...
- 04:54 PM Bug #20690 (Need More Info): Cluster status is HEALTH_OK even though PGs are in unknown state
- In an automated test, we see PGs in unknown state, yet "ceph -s" reports HEALTH_OK. The test sees HEALTH_OK and proce...
- 03:16 PM Bug #20645 (Closed): bluesfs wal failed to allocate (assert(0 == "allocate failed... wtf"))
- can you retest on current master? this is pretty old code. please reopen if the bug is still present.
- 03:16 PM Support #20648 (Closed): odd osd acting set
- You have three hosts and want to replicate across those domains. It can't do that when one host goes down, so it's do...
- 03:02 PM Bug #20666 (Resolved): jewel -> luminous upgrade doesn't update client.admin mgr cap
- 01:28 PM Bug #19939 (Fix Under Review): OSD crash in MOSDRepOpReply::decode_payload
- https://github.com/ceph/ceph/pull/16421
- 11:55 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- occasionally, i see ...
- 11:15 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- MOSDRepOpReply is always sent by an OSD.
core dump from osd.1...
- 12:49 PM Bug #19605 (New): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- i can reproduce this...
- 03:04 AM Bug #20243 (Fix Under Review): Improve size scrub error handling and ignore system attrs in xattr...
- 02:39 AM Bug #20646: run_seed_to_range.sh: segv, tp_fstore_op timeout
- http://pulpito.ceph.com/sage-2017-07-18_16:17:27-rados-master-distro-basic-smithi/
hmm, i think this got fixed in ...
- 02:36 AM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- http://pulpito.ceph.com/sage-2017-07-18_19:06:10-rados-master-distro-basic-smithi/
failed 19/90
- 01:18 AM Feature #15835 (Resolved): filestore: randomize split threshold
- Perf testing is not indicating much benefit, so I'd hold off on backporting this.
07/18/2017
- 10:34 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @Stefan A patch for Jewel (based on the current jewel branch) can be found here:
https://github.com/ceph/ceph/pul...
- 10:20 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
Analysis:
Secondary got scrub map request with scrub_to 1748'25608...
- 06:19 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @David
That would be so great! I'm happy to test any patch ;-)
- 04:54 PM Bug #20041 (In Progress): ceph-osd: PGs getting stuck in scrub state, stalling RBD
I think I've reproduced this, examining logs.
- 09:43 PM Bug #20105 (Fix Under Review): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 fa...
- https://github.com/ceph/ceph/pull/16402
- 08:37 PM Feature #20664 (Closed): compact OSD's omap before active
- This exists as leveldb_compact_on_mount. It may not have functioned in all releases but has been present since Januar...
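A hedged sketch of enabling that existing option for OSDs, assuming it is set in ceph.conf and picked up on the next daemon restart:
$ cat >> /etc/ceph/ceph.conf <<EOF
[osd]
    leveldb_compact_on_mount = true
EOF
$ systemctl restart ceph-osd.target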
- 12:03 PM Feature #20664 (Closed): compact OSD's omap before active
- current, we have supported mon_compact_on_start. does it make sense to add this feature to OSD.
likes:...
- 08:14 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- We set it to 1 if the MOSDRepOpReply is encoded with features that do not contain SERVER_LUMINOUS.
...which I thin...
- 09:07 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- i found that the header.version of the MOSDRepOpReply message being decoded was 1. but i am using a vstart cluster fo...
- 05:44 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- i am able to reproduce this issue using qa/workunits/fs/snaps/untar_snap_rm.sh. but not always...
- 06:04 PM Bug #20666: jewel -> luminous upgrade doesn't update client.admin mgr cap
- 03:34 PM Bug #20666 (Fix Under Review): jewel -> luminous upgrade doesn't update client.admin mgr cap
- https://github.com/ceph/ceph/pull/16395
- 01:23 PM Bug #20666: jewel -> luminous upgrade doesn't update client.admin mgr cap
- Hmm, I suspect the issue is with the bootstrap-mgr keyring. I notice
that when trying a "mgr create" on an upgraded...
- 01:22 PM Bug #20666 (Resolved): jewel -> luminous upgrade doesn't update client.admin mgr cap
- ...
- 01:40 PM Bug #20605 (Resolved): luminous mon lacks force_create_pg equivalent
- 01:38 PM Bug #20667 (Resolved): segv in cephx_verify_authorizing during monc init
- ...
- 08:23 AM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- lower the priority since we haven't spotted it for a while.
- 05:33 AM Bug #20625 (Duplicate): ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
07/17/2017
- 08:10 PM Bug #20653: bluestore: aios don't complete on very large writes on xenial
- ...
- 08:08 PM Bug #20653 (Can't reproduce): bluestore: aios don't complete on very large writes on xenial
- ...
- 03:05 PM Bug #20631 (Resolved): OSD needs restart after upgrade to luminous IF upgraded before a luminous ...
- 02:05 PM Bug #20631: OSD needs restart after upgrade to luminous IF upgraded before a luminous quorum
- 02:05 PM Bug #20605: luminous mon lacks force_create_pg equivalent
- 12:15 PM Bug #20602 (Resolved): mon crush smoke test can time out under valgrind
- 11:12 AM Bug #20625: ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
- tried to reproduce on btrfs locally, no luck.
- 03:00 AM Bug #20625: ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
- ...
- 02:41 AM Support #20648 (Closed): odd osd acting set
- I have three hosts.
When I set one of them down,
I got something like this....
- 02:21 AM Bug #20646 (New): run_seed_to_range.sh: segv, tp_fstore_op timeout
- ...
07/16/2017
- 09:41 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Uh, I don't think master branch has this problem. Since "list-snaps"'s result has been moved from ObjectContext::obs....
- 09:24 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- But I'm working on it.
- 08:54 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Sorry, as the related source code has been reconstructed and I haven't tested this on the master branch, I can't judge...
- 08:03 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Thanks for the jewel-specific fix. Has the bug been declared fixed in master, though?
- 07:26 AM Backport #17445 (Fix Under Review): jewel: list-snap cache tier missing promotion logic (was: rbd...
- 06:34 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- It seems that ReplicatedPG::do_op's code on the "master" branch has been totally reconstructed, so I submitted a pull req...
- 08:09 AM Bug #20645 (Closed): bluesfs wal failed to allocate (assert(0 == "allocate failed... wtf"))
- 08:09 AM Bug #20645 (Closed): bluesfs wal failed to allocate (assert(0 == "allocate failed... wtf"))
it seems like the alloc hint equals the end of the wal-bdev, but the beginning of the wal-bdev is still in use...
my wal-bdev si...
07/15/2017
- 07:49 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @Jason: *argh* yes this seems to be correct.
So it seems i didn't have any logs. Currently no idea how to generate...
- 07:31 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @Stefan: just for clarification, I believe the gpg-encrypted ceph-post-file dump was the gcore of the OSD and a Debia...
- 06:38 AM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Hello @david,
the best logs i could produce with level 20 i sent to @Jason Dillaman 2 months ago (pgp encrypted). R...
- 07:32 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Definitely sounds like it could be the root-cause to me. Thanks for the investigation help.
- 02:48 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- I encountered the same problem.
I debugged a little, and found that this might have something to do with the "cache...
- 02:34 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- I encountered the same problem.
I debugged a little, and found that this might have something to do with the "cache...
- 08:27 AM Bug #20605 (Fix Under Review): luminous mon lacks force_create_pg equivalent
- https://github.com/ceph/ceph/pull/16353
07/14/2017
- 11:01 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Stefan Priebe wrote:
> Anything i could provide or test? VMs are still crashing every night...
Can you reproduce ...
- 09:51 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
Based on the earlier information:
subset_last_update = {
version = 20796861,
epoch = 453051,
...
- 08:32 PM Backport #20638 (In Progress): kraken: EPERM: cannot set require_min_compat_client to luminous: 6...
- 08:22 PM Backport #20638 (Need More Info): kraken: EPERM: cannot set require_min_compat_client to luminous...
- Now I'm not sure
- 08:11 PM Backport #20638 (In Progress): kraken: EPERM: cannot set require_min_compat_client to luminous: 6...
- 08:10 PM Backport #20638 (Resolved): kraken: EPERM: cannot set require_min_compat_client to luminous: 6 co...
- https://github.com/ceph/ceph/pull/16342
- 08:31 PM Backport #20639 (In Progress): jewel: EPERM: cannot set require_min_compat_client to luminous: 6 ...
- 08:23 PM Backport #20639 (Need More Info): jewel: EPERM: cannot set require_min_compat_client to luminous:...
- Not sure if the PR really fixes this bug
- 08:12 PM Backport #20639 (In Progress): jewel: EPERM: cannot set require_min_compat_client to luminous: 6 ...
- 08:10 PM Backport #20639 (Resolved): jewel: EPERM: cannot set require_min_compat_client to luminous: 6 con...
- https://github.com/ceph/ceph/pull/16343
- 08:09 PM Bug #20546 (Resolved): buggy osd down warnings by subtree vs crush device classes
- 03:57 PM Bug #20602 (Fix Under Review): mon crush smoke test can time out under valgrind
- 03:52 PM Bug #20602: mon crush smoke test can time out under valgrind
- Valgrind is slow to do the fork and cleanup; that's why we keep timing out. Blame e189f11fcde6829cc7f86894b913bc1a3f...
- 03:31 PM Bug #20602: mon crush smoke test can time out under valgrind
- Valgrind is slow to do the fork and cleanup; that's why we keep timing out. Blame e189f11fcde6829cc7f86894b913bc1a3f...
- 01:57 PM Bug #20602: mon crush smoke test can time out under valgrind
- A simple workaround would be to make a 'mon smoke test crush changes' option and turn it off when using valgrind.. wh...
- 02:55 AM Bug #20602: mon crush smoke test can time out under valgrind
- /a/kchai-2017-07-13_18:13:10-rados-wip-kefu-testing-distro-basic-smithi/1396642
rados/singleton-nomsgr/{all/valgri...
- 02:51 AM Bug #20602: mon crush smoke test can time out under valgrind
- /a/sage-2017-07-13_20:38:15-rados-wip-sage-testing-distro-basic-smithi/1397207
that's two consecutive runs for me..
- 03:31 PM Bug #20601 (Duplicate): mon comamnds time out due to pool create backlog w/ valgrind
- ok, the problem is that the fork-based crushtool test is very slow under valgrind (valgrind has to do init/cleanup on...
- 03:23 PM Bug #20601: mon comamnds time out due to pool create backlog w/ valgrind
- It isn't that pool creations are serialized, actually; they are already batched. Maybe valgrind is just making it sl...
- 02:51 AM Bug #20601: mon comamnds time out due to pool create backlog w/ valgrind
- another failure with same cause, different symptom: this time a 'osd out 0' timed out due to a bunch of pool creates....
- 03:19 PM Bug #20475 (Pending Backport): EPERM: cannot set require_min_compat_client to luminous: 6 connect...
- https://github.com/ceph/ceph/pull/16340 merged to master
backports for kraken and jewel:
https://github.com/ceph/...
- 02:05 PM Bug #20475 (In Progress): EPERM: cannot set require_min_compat_client to luminous: 6 connected cl...
- 03:04 AM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- ok, smithi083 was (is!) locked by
/home/teuthworker/archive/teuthology-2017-07-13_05:10:02-fs-kraken-distro-basic-...
- 02:57 AM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
baddy is...
- 02:56 PM Bug #20631 (Fix Under Review): OSD needs restart after upgrade to luminous IF upgraded before a l...
- 02:56 PM Bug #20631 (Fix Under Review): OSD needs restart after upgrade to luminous IF upgraded before a l...
- https://github.com/ceph/ceph/pull/16341
- 02:42 PM Bug #20631 (Resolved): OSD needs restart after upgrade to luminous IF upgraded before a luminous ...
- If an OSD is upgraded to luminous before the monmap has the luminous feature, it will need to be restarted before ...
- 09:51 AM Fix #20627 (New): Clean config special cases out of common_preinit
- Post-https://github.com/ceph/ceph/pull/16211, we should use set_daemon_default for this:...
- 03:16 AM Bug #20600 (Resolved): 'ceph pg set_full_ratio ...' blocks on luminous
- 03:15 AM Bug #20617 (Resolved): Exception: timed out waiting for mon to be updated with osd.0: 0 < 4724464...
- 03:14 AM Bug #20626 (Can't reproduce): failed to become clean before timeout expired, pgs stuck unknown
- ...
- 02:50 AM Bug #20625 (Duplicate): ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
- ...
- 02:30 AM Bug #20624 (Duplicate): cluster [WRN] Health check failed: no active mgr (MGR_DOWN)" in cluster log
- mgr.x...
07/13/2017
- 04:40 PM Bug #20546: buggy osd down warnings by subtree vs crush device classes
- https://github.com/ceph/ceph/pull/16221
- 02:30 PM Bug #20602: mon crush smoke test can time out under valgrind
- /a/sage-2017-07-12_19:30:01-rados-wip-sage-testing-distro-basic-smithi/1392270
rados/singleton-nomsgr/{all/valgrind-...
- 02:17 PM Bug #20617 (Fix Under Review): Exception: timed out waiting for mon to be updated with osd.0: 0 <...
- https://github.com/ceph/ceph/pull/16322
- 02:14 PM Bug #20617 (Resolved): Exception: timed out waiting for mon to be updated with osd.0: 0 < 4724464...
- ...
- 02:16 PM Bug #20616: pre-luminous: aio_read returns erroneous data when rados_osd_op_timeout is set but no...
- This can't be reproduced with 12.1.0. So this has been fixed in the meantime.
- 01:48 PM Bug #20616 (Resolved): pre-luminous: aio_read returns erroneous data when rados_osd_op_timeout is...
- Hi,
In Gnocchi, we use the python-rados API and we recently encountered some data corruption when "rados_osd_op_ti...
07/12/2017
- 10:30 PM Bug #20605 (Resolved): luminous mon lacks force_create_pg equivalent
- This was part of the now-defunct PGMonitor. Also, pg creation is totally different now.
Create new 'osd force-cre...
- 09:19 PM Bug #20332: rados bench seq option doesn't work
- Yeah so IIRC it will stop after the specified time, but if it runs out of data that's it. I suppose it could loop? No...
- 06:54 PM Bug #20332: rados bench seq option doesn't work
- I think the bug here is that the specified seconds isn't honored in the "seq" case. It probably reads every object o...
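For reference, a minimal example of the write-then-seq workflow being discussed (pool name is illustrative); the seq pass can only read back what the preceding write pass left behind, so it may finish before the requested seconds elapse:
$ rados bench -p rbd 60 write --no-cleanup   # keep the benchmark objects around
$ rados bench -p rbd 60 seq                  # sequential reads over those objects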
- 03:24 PM Bug #20332 (Need More Info): rados bench seq option doesn't work
- 07:16 PM Bug #20600 (Fix Under Review): 'ceph pg set_full_ratio ...' blocks on luminous
- https://github.com/ceph/ceph/pull/16300
- 03:09 PM Bug #20600 (In Progress): 'ceph pg set_full_ratio ...' blocks on luminous
- 01:32 PM Bug #20600: 'ceph pg set_full_ratio ...' blocks on luminous
- This actually affects any command that remains in mon, not just "pg set_full_ratio".
- 01:22 PM Bug #20600 (Resolved): 'ceph pg set_full_ratio ...' blocks on luminous
- 06:55 PM Bug #20041 (Need More Info): ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 06:27 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Was the osd log saved for the primary of a stuck PG in this state? Can this be reproduced and provide an osd log?
- 05:57 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- Another one: /ceph/teuthology-archive/pdonnell-2017-07-07_20:24:01-fs-wip-pdonnell-20170706-distro-basic-smithi/13723...
- 05:14 PM Bug #20546 (Fix Under Review): buggy osd down warnings by subtree vs crush device classes
- 03:43 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Since this flag is set all the time now, it (and the require_x_osds flags) aren't shown by default. Does it appear in...
- 03:33 PM Bug #20545: erasure coding = crashes
- 03:33 PM Bug #20545: erasure coding = crashes
- So this looks like you're just killing the cluster by overflowing it with infinite IO. The crash is distressing, though.
- 03:32 PM Bug #20545: erasure coding = crashes
- From the log the backtrace is:...
- 03:31 PM Bug #20552: "Scrubbing terminated -- not all pgs were active and clean." error in rados
- this was fixed a few days ago (there were too few osds in the test yaml)
- 03:30 PM Bug #20552 (Resolved): "Scrubbing terminated -- not all pgs were active and clean." error in rados
- 03:21 PM Bug #20507 (Duplicate): "[WRN] Manager daemon x is unresponsive. No standby daemons available." i...
- 03:18 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
- The fact that the error stopped when cinder was stopped makes me think this was related to in-flight requests from th...
- 03:18 PM Bug #18746 (Need More Info): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (j...
- 03:12 PM Bug #20562 (Resolved): Monitor's "perf dump cluster" values are no longer maintained
- 01:41 PM Bug #20371: mgr: occasional fails to send beacons (monc reconnect backoff too aggressive?)
- /a/sage-2017-07-12_02:31:06-rbd-wip-health-distro-basic-smithi/1389750
this is about to trigger more test failures...
- 01:28 PM Bug #20602 (Resolved): mon crush smoke test can time out under valgrind
- /a/sage-2017-07-12_02:32:14-rados-wip-sage-testing-distro-basic-smithi/1390174
rados/singleton-nomsgr/{all/valgrind-...
- 01:27 PM Bug #20601 (Duplicate): mon comamnds time out due to pool create backlog w/ valgrind
- This isn't wrong per se, but it does mean workloads with lots of pool creates (parallel rados api tests) and slow mon...
07/11/2017
- 09:53 PM Bug #20590 (Duplicate): 'sudo ceph --cluster ceph osd new xx" no valid command in upgrade:jewel-x...
- 09:41 PM Bug #20590 (Duplicate): 'sudo ceph --cluster ceph osd new xx" no valid command in upgrade:jewel-x...
- Run: http://pulpito.ceph.com/teuthology-2017-07-11_04:23:04-upgrade:jewel-x-master-distro-basic-smithi/
Jobs: '13854...
- 06:48 PM Bug #20470: rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- https://github.com/ceph/ceph/pull/16265
- 06:46 PM Bug #20470 (Resolved): rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- 05:09 PM Bug #20470: rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
For some reason pg 2.0 is created on osd.0 which never happened previously....
- 02:42 PM Bug #20561: bluestore: segv in _deferred_submit_unlock from deferred_try_submit, _txc_finish
- This might be related to a failure reported on the list:...
- 02:16 PM Feature #12195 (Resolved): 'ceph osd version' to print OSD versions
- We now have a 'ceph osd versions' that will return the versions of osds in the cluster. At first sight it seems it do...
- 01:57 PM Feature #5657 (Resolved): monitor: deal with bad crush maps more gracefully
- 01:57 PM Feature #5657 (Resolved): monitor: deal with bad crush maps more gracefully
- Resolved at some point by using external crushtool to validate crushmaps.
- 01:54 PM Feature #6325 (New): mon: mon_status should make it clear when the mon has connection issues
- 01:52 PM Feature #4835 (Resolved): Monitor: better handle aborted synchronizations
- The synchronization code has been overhauled a few times in the past few years. I believe this to have been resolved ...
- 01:50 PM Cleanup #10506: mon: get rid of QuorumServices
- 01:50 PM Cleanup #10506: mon: get rid of QuorumServices
- I don't think the QuorumService interface is bringing enough to the table to keep it around.
What we are achieving...
- 04:25 AM Bug #20504 (Resolved): FileJournal: fd leak lead to FileJournal::~FileJourna() assert failed
- 04:23 AM Bug #20105: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
- not (always) reproducible with a single try: http://pulpito.ceph.com/kchai-2017-07-11_03:53:32-rados-master-distro-ba...
- 03:49 AM Bug #20105: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
- partial bt of /a/sage-2017-07-10_16:55:37-rados-wip-sage-testing-distro-basic-smithi/1383143:...
- 03:58 AM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- filed #20566 against fs for "Behind on trimming" warnings from MDS
- 02:09 AM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- http://pulpito.ceph.com/kchai-2017-07-10_10:29:54-powercycle-master-distro-basic-smithi/ failed with...
07/10/2017
- 09:50 PM Bug #20433 (Resolved): 'mon features' does not update properly for mons
- 09:47 PM Bug #20105: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
- /a/sage-2017-07-10_16:55:37-rados-wip-sage-testing-distro-basic-smithi/1383143
similar, but a seg fault!...
- 08:23 PM Bug #20562 (Fix Under Review): Monitor's "perf dump cluster" values are no longer maintained
- https://github.com/ceph/ceph/pull/16249
- 08:11 PM Bug #20562 (In Progress): Monitor's "perf dump cluster" values are no longer maintained
- ...
- 05:20 PM Bug #20562 (Resolved): Monitor's "perf dump cluster" values are no longer maintained
- We have a PerfCounters collection in the monitor which maintains cluster aggregate data like storage space available,...
- 08:11 PM Bug #20563 (Duplicate): mon: fix clsuter-level perfcounters to pull from PGMapDigest
- Sage Weil wrote:
> [...]
- 08:00 PM Bug #20563 (Duplicate): mon: fix clsuter-level perfcounters to pull from PGMapDigest
- ...
- 05:29 PM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- I don't think we run the powercycle tests very often — they're hard on the hardware. This may not really be an immedi...
- 10:19 AM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- have we spotted this problem recently after the first occurrence?
rerunning at http://pulpito.ceph.com/kchai-2017-...
- 04:51 PM Bug #20561 (Can't reproduce): bluestore: segv in _deferred_submit_unlock from deferred_try_submit...
- ...
- 10:14 AM Bug #20525 (Need More Info): ceph osd replace problem with osd out
- 10:14 AM Bug #20525: ceph osd replace problem with osd out
- peng,
i don't follow you. could you rephrase your problem? what is the expected behavior?
- 05:27 AM Bug #20470: rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- /a//sage-2017-07-09_19:14:46-rados-wip-sage-testing-distro-basic-smithi/1379319
- 03:02 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- i will look at this issue again later on if no progress has been made before then.
07/09/2017
- 06:34 PM Bug #20545: erasure coding = crashes
- Sorry, forgot a line of the code. Here's the exact process I'm using to do this:
Shell:...
- 06:27 PM Bug #20545: erasure coding = crashes
- I ran Rados bench on the same cluster and it seems to be working fine, so it seems that something about my Python cod...
- 05:49 PM Bug #20545: erasure coding = crashes
- Actually I thought to test this with filestore on BTRFS and it fails there in the same way as well. This seems to be ...
- 06:14 PM Bug #20446 (Resolved): mon does not let you create crush rules using device classes
- 06:14 PM Bug #20446 (Resolved): mon does not let you create crush rules using device classes
- 11:39 AM Bug #20433: 'mon features' does not update properly for mons
- https://github.com/ceph/ceph/pull/16230
- 11:38 AM Bug #20433 (Fix Under Review): 'mon features' does not update properly for mons
- 02:40 AM Bug #17743 (Won't Fix): ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (k...
- see https://github.com/ceph/ceph/pull/16215 (disabled the memstore tests on kraken)
07/08/2017
- 09:15 PM Bug #20543: osd/PGLog.h: 1257: FAILED assert(0 == "invalid missing set entry found") in PGLog::re...
- also in yuriw-2017-07-07_22:19:55-rados-wip-yuri-testing2_2017_7_9-distro-basic-smithi
job: 1373063
- 02:01 PM Bug #19964 (Resolved): occasional crushtool timeouts
- 02:21 AM Bug #19815: Rollback/EC log entries take gratuitous amounts of memory
- It seems that this bug has been fixed in version 12.1.0.
https://github.com/ceph/ceph/commit/9da684316630ac1c087e...
07/07/2017
- 10:13 PM Bug #20552 (Resolved): "Scrubbing terminated -- not all pgs were active and clean." error in rados
- Run: http://pulpito.ceph.com/yuriw-2017-07-06_20:01:14-rados-wip-yuri-testing3_2017_7_8-distro-basic-smithi/
Job: 13...
- 10:11 PM Bug #20551 (Duplicate): LOST_REVERT assert during rados bench+thrash in ReplicatedBackend::prepar...
- From osd.0 in:
http://pulpito.ceph.com/yuriw-2017-07-06_20:01:14-rados-wip-yuri-testing3_2017_7_8-distro-basic-smi...
- 09:44 PM Bug #20471 (Resolved): Can't repair corrupt object info due to bad oid on all replicas
- 04:22 PM Bug #20471: Can't repair corrupt object info due to bad oid on all replicas
- 08:39 PM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- Hmm, seems like that might slow stuff down enough to make it an unrealistic model, so probably not something we shoul...
- 03:50 AM Bug #20303 (Need More Info): filejournal: Unable to read past sequence ... journal is corrupt
- The logs end long before the event in question. I think in order for us to gather more useful logs for the powercycl...
- 08:37 PM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- What info do we need if this is reproducing with nightly logging?
- 03:45 AM Bug #20475 (Need More Info): EPERM: cannot set require_min_compat_client to luminous: 6 connected...
- 06:42 PM Bug #20546 (Resolved): buggy osd down warnings by subtree vs crush device classes
- The subtree-based down (host down etc) messages appear to be confused by the shadow hierarchy from crush device clas...
- 05:43 PM Bug #20545 (Duplicate): erasure coding = crashes
- Steps to reproduce:
* Create 4 OSDs and a mon on a machine (4TB disk per OSD, Bluestore, using dm-crypt too), usi...
- 03:39 PM Bug #19964: occasional crushtool timeouts
- 03:38 PM Bug #17743: ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
- https://github.com/ceph/ceph/pull/16215 ?
- 03:36 PM Bug #20454 (Resolved): bluestore: leaked aios from internal log
- 03:35 PM Bug #20434 (Resolved): mon metadata does not include ceph_version
- 03:13 PM Bug #20543 (Can't reproduce): osd/PGLog.h: 1257: FAILED assert(0 == "invalid missing set entry fo...
- ...
- 03:08 PM Bug #20534 (Resolved): unittest_direct_messenger segv
- 08:08 AM Bug #20534 (Fix Under Review): unittest_direct_messenger segv
- 02:42 PM Bug #20432 (Resolved): pgid 0.7 has ref count of 2
- 05:49 AM Bug #20432 (Fix Under Review): pgid 0.7 has ref count of 2
- https://github.com/ceph/ceph/pull/16201
i swear: this is the last PR for this ticket! - 02:22 AM Bug #20432 (Resolved): pgid 0.7 has ref count of 2
- 03:46 AM Bug #20381 (Resolved): bluestore: deferred aio submission can deadlock with completion
- https://github.com/ceph/ceph/pull/16051 merged
- 02:35 AM Bug #19518: log entry does not include per-op rvals?
- https://github.com/ceph/ceph/pull/16196 disables the assertion until we fix this bug.
07/06/2017
- 09:54 PM Bug #20326: Scrubbing terminated -- not all pgs were active and clean.
- Saw this error here:
/ceph/teuthology-archive/pdonnell-2017-07-01_01:07:39-fs-wip-pdonnell-20170630-distro-basic-s...
- 09:19 PM Bug #20534: unittest_direct_messenger segv
- was able to reproduce with:...
- 07:37 PM Bug #20534 (Resolved): unittest_direct_messenger segv
- ...
- 02:34 PM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 09:20 AM Bug #20432 (Fix Under Review): pgid 0.7 has ref count of 2
- https://github.com/ceph/ceph/pull/16159
- 06:36 AM Bug #20432: pgid 0.7 has ref count of 2
- at the end of @OSD::process_peering_events()@, @dispatch_context(rctx, 0, curmap, &handle)@ is called, which just del...
- 10:30 AM Backport #20511 (In Progress): jewel: cache tier osd memory high memory consumption
- 10:19 AM Backport #20492 (In Progress): jewel: osd: omap threadpool heartbeat is only reset every 100 values
- 04:27 AM Feature #20526: swap-bucket can save the crushweight and osd weight?
- it is not a bug, just a feature request
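For context, a hedged example of the command this feature request is about (bucket names are illustrative and the exact syntax is an assumption):
$ ceph osd crush swap-bucket host1 host1-new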
- 04:25 AM Feature #20526 (New): swap-bucket can save the crushweight and osd weight?
- i tested the swap-bucket function, and have some advice:
when using swap-bucket, the dst bucket will be in the old crush tre...
- 03:20 AM Bug #20525 (Need More Info): ceph osd replace problem with osd out
- i have tried the new function of replacing an osd with the new command; it works, but i have some problems, i don't know if it'...
- 02:30 AM Bug #20434 (Fix Under Review): mon metadata does not include ceph_version
- https://github.com/ceph/ceph/pull/16148 ?
07/05/2017
- 08:05 PM Bug #18924 (Resolved): kraken-bluestore 11.2.0 memory leak issue
- 08:05 PM Backport #20366 (Resolved): kraken: kraken-bluestore 11.2.0 memory leak issue
- 07:48 PM Bug #20434: mon metadata does not include ceph_version
- ...
- 05:42 PM Backport #20512 (Rejected): kraken: cache tier osd memory high memory consumption
- 05:42 PM Backport #20511 (Resolved): jewel: cache tier osd memory high memory consumption
- https://github.com/ceph/ceph/pull/16169
- 04:15 PM Bug #20454: bluestore: leaked aios from internal log
- 03:34 PM Bug #20507 (Duplicate): "[WRN] Manager daemon x is unresponsive. No standby daemons available." i...
- /a/sage-2017-07-03_15:41:59-rados-wip-sage-testing-distro-basic-smithi/1356209
rados/monthrash/{ceph.yaml clusters...
- 03:33 PM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- /a/sage-2017-07-03_15:41:59-rados-wip-sage-testing-distro-basic-smithi/1356174
rados/singleton-bluestore/{all/ceph...
- 11:33 AM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 08:08 AM Bug #20432: pgid 0.7 has ref count of 2
- /a/kchai-2017-07-05_04:38:56-rados-wip-kefu-testing2-distro-basic-mira/1363113...
- 10:52 AM Feature #5249 (Resolved): mon: support leader election configuration
- 07:04 AM Bug #20464 (Pending Backport): cache tier osd memory high memory consumption
- 07:02 AM Bug #20464 (Resolved): cache tier osd memory high memory consumption
- 06:45 AM Bug #20504 (Fix Under Review): FileJournal: fd leak lead to FileJournal::~FileJourna() assert failed
- https://github.com/ceph/ceph/pull/16120
- 06:23 AM Bug #20504 (Resolved): FileJournal: fd leak lead to FileJournal::~FileJourna() assert failed
- h1. 1. description
[root@yhg-1 work]# file 1498638564.27426.core ...
07/04/2017
- 05:51 PM Backport #20497 (In Progress): kraken: MaxWhileTries: reached maximum tries (105) after waiting f...
- 05:34 PM Backport #20497 (Resolved): kraken: MaxWhileTries: reached maximum tries (105) after waiting for ...
- https://github.com/ceph/ceph/pull/16111
- 05:34 PM Bug #20397 (Pending Backport): MaxWhileTries: reached maximum tries (105) after waiting for 630 s...
- 05:09 PM Bug #20433 (In Progress): 'mon features' does not update properly for mons
- 04:46 PM Bug #17743: ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
- Happened on another kraken backport: https://github.com/ceph/ceph/pull/16108
- 08:33 AM Backport #20493 (Rejected): kraken: osd: omap threadpool heartbeat is only reset every 100 values
- 08:33 AM Backport #20492 (Resolved): jewel: osd: omap threadpool heartbeat is only reset every 100 values
- https://github.com/ceph/ceph/pull/16167
- 07:50 AM Bug #20491: objecter leaked OSDMap in handle_osd_map
- * /a/kchai-2017-07-04_06:08:32-rados-wip-20432-kefu-distro-basic-mira/1359525/remote/mira038/log/valgrind/osd.0.log.g...
- 05:46 AM Bug #20491 (Resolved): objecter leaked OSDMap in handle_osd_map
- ...
- 07:07 AM Bug #20432 (Resolved): pgid 0.7 has ref count of 2
- 05:49 AM Bug #20432 (Fix Under Review): pgid 0.7 has ref count of 2
- https://github.com/ceph/ceph/pull/16093
- 06:46 AM Bug #20375 (Pending Backport): osd: omap threadpool heartbeat is only reset every 100 values
- 05:35 AM Bug #19695: mon: leaked session
- /a/kchai-2017-07-04_04:14:45-rados-wip-20432-kefu-distro-basic-mira/1357985/remote/mira112/log/valgrind/mon.a.log.gz
- 02:59 AM Bug #20434: mon metadata does not include ceph_version
- Here is the new output I get from a freshly installed cluster: ...
07/03/2017
- 03:58 PM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 10:51 AM Bug #20432: pgid 0.7 has ref count of 2
- seems @PG::recovery_queued@ is reset somehow after being set in @PG::queue_recovery()@, but the PG is not removed fro...
- 05:12 AM Bug #20432: pgid 0.7 has ref count of 2
- @Sage,
i reverted the changes introduced by 0780f9e67801f400d78ac704c65caaa98e968bbc and tested the verify test at...
- 02:20 AM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 03:29 PM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- Those look to be 22 and 60, which are DEFINE_CEPH_FEATURE_RETIRED(22, 1, BACKFILL_RESERVATION, JEWEL, LUMINOUS) and D...
- 01:44 PM Documentation #20486: Document how to use bluestore compression
- Joao Luis wrote:
> The bits I found out were through skimming the code, and that did not provide too much insight ...
- 01:05 PM Documentation #20486 (Resolved): Document how to use bluestore compression
- Bluestore is becoming the de facto default, and I haven't found any docs on how to configure compression.
The bits...
07/02/2017
- 06:52 PM Bug #20432: pgid 0.7 has ref count of 2
- I suspect 0780f9e67801f400d78ac704c65caaa98e968bbc, which changed when the CLEAN flag was set at the end of recovery.
- 06:51 PM Bug #20432: pgid 0.7 has ref count of 2
- bisecting this... so far i've narrowed it down to something between f43c5fa055386455a263802b0908ddc96a95b1b0 and e972...
- 01:04 PM Bug #20432: pgid 0.7 has ref count of 2
- ...
07/01/2017
- 03:06 PM Bug #20432: pgid 0.7 has ref count of 2
- http://pulpito.ceph.com/kchai-2017-06-30_10:58:17-rados-wip-20432-kefu-distro-basic-smithi/
- 02:52 PM Bug #20470: rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- This test confuses me. It seems like the PG is always going to exist on the target osd.. why was it passing before?
- 02:17 PM Bug #20476: ops stuck waiting_for_map
- Trying to reproduce with same commit, more debugging, at http://pulpito.ceph.com/sage-2017-07-01_14:16:23-rados-wip-s...
- 02:08 PM Bug #20476 (Can't reproduce): ops stuck waiting_for_map
- observed many ops hung with waiting_for_map
made a dummy map update ('ceph osd unset nodown')
ops unblocked
...
- 01:47 PM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- I've seen this at least twice now. It is not an upgrade test, so either unauthenticated clients that are strays in t...
- 01:46 PM Bug #20475 (Resolved): EPERM: cannot set require_min_compat_client to luminous: 6 connected clien...
- ...
- 06:35 AM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- WANG Guoqin wrote:
> Which IRC was that and do you have a chatting log on that?
https://gist.githubusercontent.co...
- 06:10 AM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- sean redmond wrote:
> https://pastebin.com/raw/xmDPg84a was talked about in IRC by @mguz it seems it maybe related b...
- 02:16 AM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-06-30_18:42:09-rados-wip-sage-testing-distro-basic-smithi/1345981
06/30/2017
- 11:28 PM Bug #20471 (Fix Under Review): Can't repair corrupt object info due to bad oid on all replicas
- https://github.com/ceph/ceph/pull/16052
- 11:03 PM Bug #20471 (In Progress): Can't repair corrupt object info due to bad oid on all replicas
- ...
- 05:24 PM Bug #20471 (Resolved): Can't repair corrupt object info due to bad oid on all replicas
We detect a kind of corruption where the oid in the object info doesn't match the oid of the object. This was adde...
- 10:34 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- https://pastebin.com/raw/xmDPg84a was talked about in IRC by @mguz it seems it maybe related but this was kraken, jus...
- 03:25 PM Bug #19909 (Won't Fix): PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid...
- There was a lot of code churn around the 12.0.3 time period so this isn't too surprising to me. I'm not sure it's wo...
- 09:24 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- 09:03 PM Bug #20454: bluestore: leaked aios from internal log
- https://github.com/ceph/ceph/pull/16051 is a better fix
- 09:01 PM Bug #20397 (Resolved): MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds f...
- failure seems to be gone with the timeout change.
- 03:35 PM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- https://github.com/ceph/ceph/pull/16047
- 03:35 PM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- Easy workaround is to make the aio queue really big.
Harder fix is to do some complicated locking juggling. I worry ...
- 03:31 PM Bug #20277 (Can't reproduce): bluestore crashed while performing scrub
- 03:30 PM Cleanup #18734 (Resolved): crush: transparently deprecated ruleset/ruleid difference
- 03:30 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- 03:29 PM Bug #20446: mon does not let you create crush rules using device classes
- see https://github.com/ceph/ceph/pull/16027
- 02:06 PM Bug #20470 (Resolved): rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- ...
- 01:51 PM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-06-30_05:44:03-rados-wip-sage-testing-distro-basic-smithi/1344959...
- 06:54 AM Bug #20432: pgid 0.7 has ref count of 2
- rerunning at http://pulpito.ceph.com/kchai-2017-06-30_06:49:46-rados-master-distro-basic-smithi/, if we can consisten...
- 02:22 AM Bug #17968 (Resolved): Ceph:OSD can't finish recovery+backfill process due to assertion failure
06/29/2017
- 09:19 PM Bug #18165 (Resolved): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_target...
- 09:18 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
- https://github.com/ceph/ceph/pull/14760
- 07:33 PM Bug #12615: Repair of Erasure Coded pool with an unrepairable object causes pg state to lose clea...
- This will be fixed when we move repair out of the OSD. We shouldn't be using recovery to do repair anyway.
- 07:32 PM Bug #13493 (Duplicate): osd: for ec, cascading crash during recovery if one shard is corrupted
- 07:18 PM Bug #19964 (Fix Under Review): occasional crushtool timeouts
- https://github.com/ceph/ceph/pull/16025
- 06:17 PM Bug #19750 (Can't reproduce): osd-scrub-repair.sh:2214: corrupt_scrub_erasure: test no = yes
This isn't happening anymore from what I've seen. If it does, let's get the full log. From the lines I'm being sho...
- 06:09 PM Bug #17830 (Can't reproduce): osd-scrub-repair.sh is failing (intermittently?) on Jenkins
- Haven't been seeing this at all, so I'm closing for now.
- 05:45 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- 10:07 AM Bug #19939 (Fix Under Review): OSD crash in MOSDRepOpReply::decode_payload
- https://github.com/ceph/ceph/pull/16008
- 04:40 PM Bug #20454 (Fix Under Review): bluestore: leaked aios from internal log
- 04:40 PM Bug #20454 (Rejected): bluestore: leaked aios from internal log
- see #20385
- 03:16 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Anthony D'Atri wrote:
> We've experienced at least three distinct cases of ops stuck for long periods of time on a s...
- 03:15 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- We've experienced at least three distinct cases of ops stuck for long periods of time on a scrub. The attached file ...
- 08:14 AM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @josh is this related to #19497?
- 11:11 AM Bug #20464 (Fix Under Review): cache tier osd memory high memory consumption
- 10:59 AM Bug #20464: cache tier osd memory high memory consumption
- https://github.com/ceph/ceph/pull/16011
This is my pull request; please help review it.
- 07:13 AM Bug #20464 (Resolved): cache tier osd memory high memory consumption
- The OSD used as the cache tier in our EC cluster suffers from high memory usage (5GB~6GB per OSD)
wh...
- 08:42 AM Bug #20434: mon metadata does not include ceph_version
- Also just noticed this on a cluster updated from 12.0.3:...
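For anyone checking their own cluster, a generic way to look for the field (not from this ticket):
  ceph mon metadata | grep ceph_version   # dumps all mons' metadata; ceph_version is the field reported missing here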
- 03:07 AM Bug #20397: MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds from radosbe...
- http://pulpito.ceph.com/sage-2017-06-27_15:03:40-rados:thrash-master-distro-basic-smithi/
baseline on master... 5 ...
06/28/2017
- 10:09 PM Bug #14088 (Resolved): mon: nothing logged when ENOSPC encountered during start up
- 09:31 PM Bug #20434: mon metadata does not include ceph_version
- Assigning the issue to me as a placeholder to remove the ticket from the pool of unassigned tickets. Daniel is worki...
- 07:08 PM Bug #20434: mon metadata does not include ceph_version
- Daniel Oliveira wrote:
> Just talked to Sage and looking into this.
I just tested with the Luminous branch (and also ...
- 05:32 PM Bug #18647: ceph df output with erasure coded pools
- First I would need to know the PR numbers or SHA1 hashes of the commits that fix the issue in master.
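Not from this thread, but roughly the kind of lookup meant here (the grep pattern and remote name are illustrative):
  git log --oneline --grep=18647 upstream/master   # find the fix commits in master
  git cherry-pick -x <sha1>                        # apply each one to the jewel branch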
- 04:58 PM Bug #18647: ceph df output with erasure coded pools
- Is it possible to backport this into Jewel?
- 03:49 PM Bug #18647 (Resolved): ceph df output with erasure coded pools
- fixed in luminous
- 04:42 PM Bug #20454 (Resolved): bluestore: leaked aios from internal log
- Reported and diagnosed by Igor; opening a ticket so we don't forget.
- 04:06 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- 04:05 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- Kefu, any new updates or should this be unassigned from you?
- 12:51 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- Here's another one:
/a/pdonnell-2017-06-27_19:50:40-fs-wip-pdonnell-20170627---basic-smithi/1333648
fs/snaps/{b...
- 03:57 PM Bug #18926 (Duplicate): Why osds do not release memory?
- see #18924
- 03:43 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
- David, anything up with this? Is it an urgent bug?
- 03:41 PM Bug #18204 (Can't reproduce): jewel: finish_promote unexpected promote error (34) Numerical resul...
- 03:40 PM Bug #18467 (Resolved): ceph ping mon.* can fail
- 03:39 PM Bug #19067 (Need More Info): missing set not persisted
- 03:32 PM Bug #19605 (Can't reproduce): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front()...
- If you can reproduce this on master or luminous rc, please reopen!
- 03:31 PM Bug #19790 (Resolved): rados ls on pool with no access returns no error
- 03:30 PM Bug #19911 (Can't reproduce): osd: out of order op
- 03:29 PM Bug #20133 (Can't reproduce): EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksd...
- 03:28 PM Bug #19191: osd/ReplicatedBackend.cc: 1109: FAILED assert(!parent->get_log().get_missing().is_mis...
- https://github.com/ceph/ceph/pull/14053
- 03:17 PM Bug #19191: osd/ReplicatedBackend.cc: 1109: FAILED assert(!parent->get_log().get_missing().is_mis...
- 03:27 PM Bug #19983 (Closed): osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/Kerne...
- 03:27 PM Bug #18681 (Won't Fix): ceph-disk prepare/activate misses steps and fails on [Bluestore]
- 03:22 PM Bug #19964 (In Progress): occasional crushtool timeouts
- 03:21 PM Bug #20446 (Fix Under Review): mon does not let you create crush rules using device classes
- 02:36 PM Bug #20446: mon does not let you create crush rules using device classes
- https://github.com/ceph/ceph/pull/15975
- 11:49 AM Bug #20446: mon does not let you create crush rules using device classes
- I tested in my env; it does exist in the master branch. It seems easy to fix this problem; I will create a PR.
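For context, a sketch of the kind of command this bug is about (the rule name and the ssd device class are illustrative; on an affected build the create step reportedly fails on the name handling):
  ceph osd crush rule create-replicated fast_rule default host ssd   # replicated rule limited to the ssd class
  ceph osd crush rule ls                                             # check that the rule was created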
- 11:42 AM Bug #20446: mon does not let you create crush rules using device classes
- I will try to verify it.
- 07:20 AM Bug #20446 (Resolved): mon does not let you create crush rules using device classes
- I ran Ceph 12.1.0, tried the crush class function, and found a problem with the name.
Steps:
1.ceph osd cru...
- 03:18 PM Bug #20086 (Can't reproduce): LibRadosLockECPP.LockSharedDurPP gets EEXIST
- 03:17 PM Bug #19895 (Can't reproduce): test/osd/RadosModel.h: 1169: FAILED assert(version == old_value.ver...
- 03:08 PM Bug #20419 (Duplicate): OSD aborts when shutting down
- 02:56 PM Bug #20419: OSD aborts when shutting down
- Sage suspects that it could be a regression: we switched the shutdown order recently.
- 10:42 AM Bug #20419: OSD aborts when shutting down
- So somebody was still holding a reference to pg 0.50 when the OSD was trying to kick it.
- 02:15 PM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- aio completion thread blocking on deferred_lock:...
- 12:18 PM Bug #20451 (Can't reproduce): osd Segmentation fault after upgrade from jewel (10.2.5) to kraken ...
- Hi,
after the upgrade, some OSDs are down:
*** Caught signal (Segmentation fault) **
in thread 7f0237441700 thread...
- 10:31 AM Feature #5249: mon: support leader election configuration
- https://github.com/ceph/ceph/pull/15964 enables the MonClient to prefer the closer monitors.
- 07:00 AM Feature #5249 (Fix Under Review): mon: support leader election configuration
- https://github.com/ceph/ceph/pull/15964
- 08:03 AM Bug #20445 (Need More Info): fio stalls, scrubbing doesn't stop when repeatedly creating/deleting...
- Question for the original reporter of this bug: why do you expect the scrub to stop?
Please provide more details.
- 07:13 AM Bug #20445 (Need More Info): fio stalls, scrubbing doesn't stop when repeatedly creating/deleting...
- This happens on latest jewel and is possibly related to (recently merged) https://github.com/ceph/ceph/pull/15529
...
- 12:47 AM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- /a/pdonnell-2017-06-27_19:50:40-fs-wip-pdonnell-20170627---basic-smithi/1333726
/a/pdonnell-2017-06-27_19:50:40-fs-w...
- 12:07 AM Bug #20439 (Resolved): PG never finishes getting created
dzafman-2017-06-26_14:07:20-rados-wip-13837-distro-basic-smithi/1328370
description: rados/singleton/{all/diverg...