Activity
From 06/07/2017 to 07/06/2017
07/06/2017
- 09:54 PM Bug #20326: Scrubbing terminated -- not all pgs were active and clean.
- Saw this error here:
/ceph/teuthology-archive/pdonnell-2017-07-01_01:07:39-fs-wip-pdonnell-20170630-distro-basic-s...
- 09:19 PM Bug #20534: unittest_direct_messenger segv
- was able to reproduce with:...
- 07:37 PM Bug #20534 (Resolved): unittest_direct_messenger segv
- ...
- 02:34 PM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 09:20 AM Bug #20432 (Fix Under Review): pgid 0.7 has ref count of 2
- https://github.com/ceph/ceph/pull/16159
- 06:36 AM Bug #20432: pgid 0.7 has ref count of 2
- at the end of @OSD::process_peering_events()@, @dispatch_context(rctx, 0, curmap, &handle)@ is called, which just del...
- 10:30 AM Backport #20511 (In Progress): jewel: cache tier osd memory high memory consumption
- 10:19 AM Backport #20492 (In Progress): jewel: osd: omap threadpool heartbeat is only reset every 100 values
- 04:27 AM Feature #20526: swap-bucket can save the crushweight and osd weight?
- It's not a bug, just a feature request.
- 04:25 AM Feature #20526 (New): swap-bucket can save the crushweight and osd weight?
- I tested the swap-bucket function and have some advice:
when using swap-bucket, the dst bucket will be in the old crush tre...
- 03:20 AM Bug #20525 (Need More Info): ceph osd replace problem with osd out
- I have tried the new function of replacing the OSD with the new command; it works, but I have some problems and I don't know if it'...
- 02:30 AM Bug #20434 (Fix Under Review): mon metadata does not include ceph_version
- https://github.com/ceph/ceph/pull/16148 ?
07/05/2017
- 08:05 PM Bug #18924 (Resolved): kraken-bluestore 11.2.0 memory leak issue
- 08:05 PM Backport #20366 (Resolved): kraken: kraken-bluestore 11.2.0 memory leak issue
- 07:48 PM Bug #20434: mon metadata does not include ceph_version
- ...
- 05:42 PM Backport #20512 (Rejected): kraken: cache tier osd memory high memory consumption
- 05:42 PM Backport #20511 (Resolved): jewel: cache tier osd memory high memory consumption
- https://github.com/ceph/ceph/pull/16169
- 04:15 PM Bug #20454: bluestore: leaked aios from internal log
- 03:34 PM Bug #20507 (Duplicate): "[WRN] Manager daemon x is unresponsive. No standby daemons available." i...
- /a/sage-2017-07-03_15:41:59-rados-wip-sage-testing-distro-basic-smithi/1356209
rados/monthrash/{ceph.yaml clusters...
- 03:33 PM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- /a/sage-2017-07-03_15:41:59-rados-wip-sage-testing-distro-basic-smithi/1356174
rados/singleton-bluestore/{all/ceph...
- 11:33 AM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 08:08 AM Bug #20432: pgid 0.7 has ref count of 2
- /a/kchai-2017-07-05_04:38:56-rados-wip-kefu-testing2-distro-basic-mira/1363113...
- 10:52 AM Feature #5249 (Resolved): mon: support leader election configuration
- 07:04 AM Bug #20464 (Pending Backport): cache tier osd memory high memory consumption
- 07:02 AM Bug #20464 (Resolved): cache tier osd memory high memory consumption
- 06:45 AM Bug #20504 (Fix Under Review): FileJournal: fd leak lead to FileJournal::~FileJourna() assert failed
- https://github.com/ceph/ceph/pull/16120
- 06:23 AM Bug #20504 (Resolved): FileJournal: fd leak lead to FileJournal::~FileJourna() assert failed
- h1. 1. description
[root@yhg-1 work]# file 1498638564.27426.core ...
07/04/2017
- 05:51 PM Backport #20497 (In Progress): kraken: MaxWhileTries: reached maximum tries (105) after waiting f...
- 05:34 PM Backport #20497 (Resolved): kraken: MaxWhileTries: reached maximum tries (105) after waiting for ...
- https://github.com/ceph/ceph/pull/16111
- 05:34 PM Bug #20397 (Pending Backport): MaxWhileTries: reached maximum tries (105) after waiting for 630 s...
- 05:09 PM Bug #20433 (In Progress): 'mon features' does not update properly for mons
- 04:46 PM Bug #17743: ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
- Happened on another kraken backport: https://github.com/ceph/ceph/pull/16108
- 08:33 AM Backport #20493 (Rejected): kraken: osd: omap threadpool heartbeat is only reset every 100 values
- 08:33 AM Backport #20492 (Resolved): jewel: osd: omap threadpool heartbeat is only reset every 100 values
- https://github.com/ceph/ceph/pull/16167
- 07:50 AM Bug #20491: objecter leaked OSDMap in handle_osd_map
- * /a/kchai-2017-07-04_06:08:32-rados-wip-20432-kefu-distro-basic-mira/1359525/remote/mira038/log/valgrind/osd.0.log.g...
- 05:46 AM Bug #20491 (Resolved): objecter leaked OSDMap in handle_osd_map
- ...
- 07:07 AM Bug #20432 (Resolved): pgid 0.7 has ref count of 2
- 05:49 AM Bug #20432 (Fix Under Review): pgid 0.7 has ref count of 2
- https://github.com/ceph/ceph/pull/16093
- 06:46 AM Bug #20375 (Pending Backport): osd: omap threadpool heartbeat is only reset every 100 values
- 05:35 AM Bug #19695: mon: leaked session
- /a/kchai-2017-07-04_04:14:45-rados-wip-20432-kefu-distro-basic-mira/1357985/remote/mira112/log/valgrind/mon.a.log.gz
- 02:59 AM Bug #20434: mon metadata does not include ceph_version
- Here it is the new output I get from a brand new installed cluster: ...
07/03/2017
- 03:58 PM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 10:51 AM Bug #20432: pgid 0.7 has ref count of 2
- seems @PG::recovery_queued@ is reset somehow after being set in @PG::queue_recovery()@, but the PG is not removed fro...
- 05:12 AM Bug #20432: pgid 0.7 has ref count of 2
- @Sage,
i reverted the changes introduced by 0780f9e67801f400d78ac704c65caaa98e968bbc and tested the verify test at...
- 02:20 AM Bug #20432: pgid 0.7 has ref count of 2
- ...
- 03:29 PM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- Those look to be 22 and 60, which are DEFINE_CEPH_FEATURE_RETIRED(22, 1, BACKFILL_RESERVATION, JEWEL, LUMINOUS) and D...
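For reference, those two bit positions combine into the hex feature mask seen in the client output; a quick, purely illustrative shell check:
    printf '0x%x\n' $(( (1 << 22) | (1 << 60) ))   # -> 0x1000000000400000, i.e. just the two bits noted above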
- 01:44 PM Documentation #20486: Document how to use bluestore compression
- Joao Luis wrote:
> The bits I found out were through skimming the code, and that did not provide too much insight ...
- 01:05 PM Documentation #20486 (Resolved): Document how to use bluestore compression
- Bluestore is becoming the de facto default, and I haven't found any docs on how to configure compression.
The bits...
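As a rough sketch of what such documentation could cover, assuming the luminous-era option and pool-property names (values are examples only, not recommendations):
    # cluster-wide defaults in ceph.conf, [osd] section:
    #   bluestore_compression_mode = aggressive      # none | passive | aggressive | force
    #   bluestore_compression_algorithm = snappy     # snappy | zlib | lz4 | zstd (if compiled in)
    # per-pool overrides; "mypool" is a hypothetical pool name:
    ceph osd pool set mypool compression_mode aggressive
    ceph osd pool set mypool compression_algorithm snappy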
07/02/2017
- 06:52 PM Bug #20432: pgid 0.7 has ref count of 2
- I suspect 0780f9e67801f400d78ac704c65caaa98e968bbc, which changed when the CLEAN flag was set at the end of recovery.
- 06:51 PM Bug #20432: pgid 0.7 has ref count of 2
- bisecting this... so far i've narrowed it down to something between f43c5fa055386455a263802b0908ddc96a95b1b0 and e972...
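The narrowing described above follows the usual git bisect workflow; a minimal sketch (the second endpoint is truncated above, so a placeholder stands in for it):
    git bisect start
    git bisect good f43c5fa055386455a263802b0908ddc96a95b1b0   # one end of the range above, assumed good here
    git bisect bad <bad-commit-sha>                            # the other end; its full sha is truncated above
    # build, run the failing test, then mark the result and repeat until git names the first bad commit:
    git bisect good   # or: git bisect bad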
- 01:04 PM Bug #20432: pgid 0.7 has ref count of 2
- ...
07/01/2017
- 03:06 PM Bug #20432: pgid 0.7 has ref count of 2
- http://pulpito.ceph.com/kchai-2017-06-30_10:58:17-rados-wip-20432-kefu-distro-basic-smithi/
- 02:52 PM Bug #20470: rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- This test confuses me. It seems like the PG is always going to exist on the target osd.. why was it passing before?
- 02:17 PM Bug #20476: ops stuck waiting_for_map
- Trying to reproduce with same commit, more debugging, at http://pulpito.ceph.com/sage-2017-07-01_14:16:23-rados-wip-s...
- 02:08 PM Bug #20476 (Can't reproduce): ops stuck waiting_for_map
- observed many ops hung with waiting_for_map
made a dummy map update ('ceph osd unset nodown')
ops unblocked
...
- 01:47 PM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- I've seen this at least twice now. It is not an upgrade test, so either unauthenticated clients that are strays in t...
- 01:46 PM Bug #20475 (Resolved): EPERM: cannot set require_min_compat_client to luminous: 6 connected clien...
- ...
- 06:35 AM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- WANG Guoqin wrote:
> Which IRC was that and do you have a chatting log on that?
https://gist.githubusercontent.co...
- 06:10 AM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- sean redmond wrote:
> https://pastebin.com/raw/xmDPg84a was talked about in IRC by @mguz it seems it maybe related b...
- 02:16 AM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-06-30_18:42:09-rados-wip-sage-testing-distro-basic-smithi/1345981
06/30/2017
- 11:28 PM Bug #20471 (Fix Under Review): Can't repair corrupt object info due to bad oid on all replicas
- https://github.com/ceph/ceph/pull/16052
- 11:03 PM Bug #20471 (In Progress): Can't repair corrupt object info due to bad oid on all replicas
- ...
- 05:24 PM Bug #20471 (Resolved): Can't repair corrupt object info due to bad oid on all replicas
We detect a kind of corruption where the oid in the object info doesn't match the oid of the object. This was adde...
- 10:34 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- https://pastebin.com/raw/xmDPg84a was talked about in IRC by @mguz it seems it maybe related but this was kraken, jus...
- 03:25 PM Bug #19909 (Won't Fix): PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid...
- There was a lot of code churn around the 12.0.3 time period so this isn't too surprising to me. I'm not sure it's wo...
- 09:24 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- 09:03 PM Bug #20454: bluestore: leaked aios from internal log
- https://github.com/ceph/ceph/pull/16051 is a better fix
- 09:01 PM Bug #20397 (Resolved): MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds f...
- failure seems to be gone with the timeout change.
- 03:35 PM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- https://github.com/ceph/ceph/pull/16047
- 03:35 PM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- Easy workaround is to make the aio queue really big.
Harder fix to do some complicated locking juggling. I worry ...
- 03:31 PM Bug #20277 (Can't reproduce): bluestore crashed while performing scrub
- 03:30 PM Cleanup #18734 (Resolved): crush: transparently deprecated ruleset/ruleid difference
- 03:30 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- 03:29 PM Bug #20446: mon does not let you create crush rules using device classes
- see https://github.com/ceph/ceph/pull/16027
- 02:06 PM Bug #20470 (Resolved): rados/singleton/all/reg11184.yaml: assert proc.exitstatus == 0
- ...
- 01:51 PM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-06-30_05:44:03-rados-wip-sage-testing-distro-basic-smithi/1344959...
- 06:54 AM Bug #20432: pgid 0.7 has ref count of 2
- rerunning at http://pulpito.ceph.com/kchai-2017-06-30_06:49:46-rados-master-distro-basic-smithi/, if we can consisten...
- 02:22 AM Bug #17968 (Resolved): Ceph:OSD can't finish recovery+backfill process due to assertion failure
06/29/2017
- 09:19 PM Bug #18165 (Resolved): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_target...
- 09:18 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
- https://github.com/ceph/ceph/pull/14760
- 07:33 PM Bug #12615: Repair of Erasure Coded pool with an unrepairable object causes pg state to lose clea...
- This will be fixed when we move repair out of the OSD. We shouldn't be using recovery to do repair anyway.
- 07:32 PM Bug #13493 (Duplicate): osd: for ec, cascading crash during recovery if one shard is corrupted
- 07:18 PM Bug #19964 (Fix Under Review): occasional crushtool timeouts
- https://github.com/ceph/ceph/pull/16025
- 06:17 PM Bug #19750 (Can't reproduce): osd-scrub-repair.sh:2214: corrupt_scrub_erasure: test no = yes
This isn't happening anymore from what I've seen. If it does let's get the full log. From the lines I'm being sho...
- 06:09 PM Bug #17830 (Can't reproduce): osd-scrub-repair.sh is failing (intermittently?) on Jenkins
- Haven't been seeing this at all, so I'm closing for now.
- 05:45 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- 10:07 AM Bug #19939 (Fix Under Review): OSD crash in MOSDRepOpReply::decode_payload
- https://github.com/ceph/ceph/pull/16008
- 04:40 PM Bug #20454 (Fix Under Review): bluestore: leaked aios from internal log
- 04:40 PM Bug #20454 (Rejected): bluestore: leaked aios from internal log
- see #20385
- 03:16 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Anthony D'Atri wrote:
> We've experienced at least three distinct cases of ops stuck for long periods of time on a s...
- 03:15 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- We've experienced at least three distinct cases of ops stuck for long periods of time on a scrub. The attached file ...
- 08:14 AM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @josh is this related to #19497?
- 11:11 AM Bug #20464 (Fix Under Review): cache tier osd memory high memory consumption
- 10:59 AM Bug #20464: cache tier osd memory high memory consumption
- https://github.com/ceph/ceph/pull/16011
this is my pull request , please help to review it - 07:13 AM Bug #20464 (Resolved): cache tier osd memory high memory consumption
- the osd used as the cache tier in our EC cluster suffers from the high memory usage (5GB~6GB consumption per osd)
wh...
- 08:42 AM Bug #20434: mon metadata does not include ceph_version
- Also just noticed this on a cluster updated from 12.0.3:...
- 03:07 AM Bug #20397: MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds from radosbe...
- http://pulpito.ceph.com/sage-2017-06-27_15:03:40-rados:thrash-master-distro-basic-smithi/
baseline on master... 5 ...
06/28/2017
- 10:09 PM Bug #14088 (Resolved): mon: nothing logged when ENOSPC encountered during start up
- 09:31 PM Bug #20434: mon metadata does not include ceph_version
- Assigning the issue to me as a placeholder to remove the ticket from the pool of unassigned tickets. Daniel is worki...
- 07:08 PM Bug #20434: mon metadata does not include ceph_version
- Daniel Oliveira wrote:
> Just talked to Sage and looking into this.
I just tested with Luminous branch (and also ...
- 05:32 PM Bug #18647: ceph df output with erasure coded pools
- First I would need to know the PR numbers or SHA1 hashes of the commits that fix the issue in master.
- 04:58 PM Bug #18647: ceph df output with erasure coded pools
- Is it possible to backport this into Jewel?
- 03:49 PM Bug #18647 (Resolved): ceph df output with erasure coded pools
- fixed in luminous
- 04:42 PM Bug #20454 (Resolved): bluestore: leaked aios from internal log
- Reported and diagnosed by Igor; opening a ticket so we don't forget.
- 04:06 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- 04:05 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- Kefu, any new updates or should this be unassigned from you?
- 12:51 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- Here's another one:
/a/pdonnell-2017-06-27_19:50:40-fs-wip-pdonnell-20170627---basic-smithi/1333648
fs/snaps/{b...
- 03:57 PM Bug #18926 (Duplicate): Why osds do not release memory?
- see #18924
- 03:43 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
- David, anything up with this? Is it an urgent bug?
- 03:41 PM Bug #18204 (Can't reproduce): jewel: finish_promote unexpected promote error (34) Numerical resul...
- 03:40 PM Bug #18467 (Resolved): ceph ping mon.* can fail
- 03:39 PM Bug #19067 (Need More Info): missing set not persisted
- 03:32 PM Bug #19605 (Can't reproduce): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front()...
- If you can reproduce this on master or luminous rc, please reopen!
- 03:31 PM Bug #19790 (Resolved): rados ls on pool with no access returns no error
- 03:30 PM Bug #19911 (Can't reproduce): osd: out of order op
- 03:29 PM Bug #20133 (Can't reproduce): EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksd...
- 03:28 PM Bug #19191: osd/ReplicatedBackend.cc: 1109: FAILED assert(!parent->get_log().get_missing().is_mis...
- https://github.com/ceph/ceph/pull/14053
- 03:17 PM Bug #19191: osd/ReplicatedBackend.cc: 1109: FAILED assert(!parent->get_log().get_missing().is_mis...
- 03:27 PM Bug #19983 (Closed): osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/Kerne...
- 03:27 PM Bug #18681 (Won't Fix): ceph-disk prepare/activate misses steps and fails on [Bluestore]
- 03:22 PM Bug #19964 (In Progress): occasional crushtool timeouts
- 03:21 PM Bug #20446 (Fix Under Review): mon does not let you create crush rules using device classes
- 02:36 PM Bug #20446: mon does not let you create crush rules using device classes
- https://github.com/ceph/ceph/pull/15975
- 11:49 AM Bug #20446: mon does not let you create crush rules using device classes
- I tested in my env; it does exist in the master branch. It seems easy to fix this problem. I will create a PR.
- 11:42 AM Bug #20446: mon does not let you create crush rules using device classes
- I will try to verify it.
- 07:20 AM Bug #20446 (Resolved): mon does not let you create crush rules using device classes
- I ran ceph version 12.1.0, tried the crush class function, and found a problem with the name.
step:
1.ceph osd cru...
- 03:18 PM Bug #20086 (Can't reproduce): LibRadosLockECPP.LockSharedDurPP gets EEXIST
- 03:17 PM Bug #19895 (Can't reproduce): test/osd/RadosModel.h: 1169: FAILED assert(version == old_value.ver...
- 03:08 PM Bug #20419 (Duplicate): OSD aborts when shutting down
- 02:56 PM Bug #20419: OSD aborts when shutting down
- Sage suspects that it could be a regression: we switched the order of shutting down recently.
- 10:42 AM Bug #20419: OSD aborts when shutting down
- so somebody was still holding a reference to pg 0.50 when OSD was trying to kick it.
- 02:15 PM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- aio completion thread blocking on deferred_lock:...
- 12:18 PM Bug #20451 (Can't reproduce): osd Segmentation fault after upgrade from jewel (10.2.5) to kraken ...
- hi,
after upgrade, some osds are down
*** Caught signal (Segmentation fault) **
in thread 7f0237441700 thread...
- 10:31 AM Feature #5249: mon: support leader election configuration
- https://github.com/ceph/ceph/pull/15964 enables the MonClient to have preference to the closer monitors.
- 07:00 AM Feature #5249 (Fix Under Review): mon: support leader election configuration
- https://github.com/ceph/ceph/pull/15964
- 08:03 AM Bug #20445 (Need More Info): fio stalls, scrubbing doesn't stop when repeatedly creating/deleting...
- Question for the original reporter of this bug: why do you expect the scrub to stop?
Please provide more details.
- 07:13 AM Bug #20445 (Need More Info): fio stalls, scrubbing doesn't stop when repeatedly creating/deleting...
- This happens on latest jewel and is possibly related to (recently merged) https://github.com/ceph/ceph/pull/15529
... - 12:47 AM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- /a/pdonnell-2017-06-27_19:50:40-fs-wip-pdonnell-20170627---basic-smithi/1333726
/a/pdonnell-2017-06-27_19:50:40-fs-w...
- 12:07 AM Bug #20439 (Resolved): PG never finishes getting created
dzafman-2017-06-26_14:07:20-rados-wip-13837-distro-basic-smithi/1328370
description: rados/singleton/{all/diverg...
06/27/2017
- 08:13 PM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- 07:12 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- I didn't do any digging through what patches were in the centos or xenial kernels. Happy if someone wants to chase t...
- 06:22 PM Bug #20434: mon metadata does not include ceph_version
- Just talked to Sage and looking into this.
- 04:46 PM Bug #20434 (Resolved): mon metadata does not include ceph_version
- on lab cluster, after kraken -> luminous 12.1.0 upgrade,...
- 04:45 PM Bug #20433 (Resolved): 'mon features' does not update properly for mons
- on lab cluster, after upgrade from kraken -> luminous 12.1.0,...
- 04:06 PM Bug #20397: MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds from radosbe...
- 02:44 PM Bug #20397: MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds from radosbe...
- /a/sage-2017-06-27_05:44:05-rados-wip-sage-testing-distro-basic-smithi/1331664
rados/thrash/{0-size-min-size-overrid...
- 04:05 PM Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
- That fix (d24a8886658c2d8882275d69c6409717a62701be and 31d3ae8a878f7ede6357f602852d586e0621c73f) was not quite comple...
- 03:18 PM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- /a/sage-2017-06-27_05:44:05-rados-wip-sage-testing-distro-basic-smithi/1331957
- 03:17 PM Bug #20432 (Resolved): pgid 0.7 has ref count of 2
- ...
- 03:00 PM Bug #20419: OSD aborts when shutting down
- http://pulpito.ceph.com/yuriw-2017-06-27_03:16:16-rados-master_2017_6_27-distro-basic-smithi/1329613
http://pulpit...
06/26/2017
- 10:45 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- 1:3.12.0-1.1ubuntu1
xenial on smithi107
/a/sage-2017-06-26_14:37:54-rados-wip-sage-testing2-distro-basic-smithi/132...
- 10:43 PM Bug #20397: MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds from radosbe...
- /a/sage-2017-06-26_14:37:54-rados-wip-sage-testing2-distro-basic-smithi/1327079
rados/thrash/{0-size-min-size-overri...
- 10:42 PM Bug #19964: occasional crushtool timeouts
- /a/sage-2017-06-26_14:37:54-rados-wip-sage-testing2-distro-basic-smithi/1327058
rados/thrash/{0-size-min-size-overri...
- 04:08 PM Bug #19023 (Resolved): ceph_test_rados invalid read caused apparently by lost intervals due to mo...
- 04:04 PM Bug #20419 (Duplicate): OSD aborts when shutting down
- /a/kchai-2017-06-25_17:19:05-rados-wip-kefu-testing---basic-smithi/1324712/remote/smithi006/log/ceph-osd.3.log.gz
<p...
- 02:53 PM Feature #5249: mon: support leader election configuration
- 12:12 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- Has this been reproduced with the following kernel fix applied?
commit 70e7af244f24c94604ef6eca32ad297632018583
A...
- 10:11 AM Bug #20416 (Resolved): "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Hello,
I've upgraded a Jewel cluster to Luminous 12.1.0 (RC), restarted the monitors, mgr is active, but I can't r...
06/23/2017
- 07:47 PM Bug #20302: "BlueStore.cc: 9023: FAILED assert(0 == "unexpected error")" in powercycle-master-dis...
- https://github.com/ceph/ceph/pull/15821
- 07:46 PM Bug #20302 (Resolved): "BlueStore.cc: 9023: FAILED assert(0 == "unexpected error")" in powercycle...
- 03:47 PM Bug #20302: "BlueStore.cc: 9023: FAILED assert(0 == "unexpected error")" in powercycle-master-dis...
- merged
- 03:10 PM Bug #20389 (Won't Fix): "Error EPERM: min_compat_client jewel < luminous, which is required for p...
- this is actually fine; we're ignoring errors from these commands (so the thrasher can work when the feature is unavai...
- 03:24 AM Bug #20389: "Error EPERM: min_compat_client jewel < luminous, which is required for pg-upmap" in ...
- Also in http://qa-proxy.ceph.com/teuthology/yuriw-2017-06-22_20:54:27-powercycle-wip-yuri-testing2_2017_7_22-distro-b...
- 03:23 AM Bug #20389 (Won't Fix): "Error EPERM: min_compat_client jewel < luminous, which is required for p...
- Run: http://pulpito.ceph.com/yuriw-2017-06-22_23:59:13-powercycle-wip-yuri-testing2_2017_7_22-distro-basic-smithi/
J...
- 03:02 PM Bug #20397 (Resolved): MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds f...
- ...
- 02:57 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Anything i could provide or test? VMs are still crashing every night...
- 11:32 AM Bug #19800: some osds are down when create a new pool and a new image of the pool (bluestore)
@sage weil, could you show me the PR that refers to readahead, please?
06/22/2017
- 07:57 PM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- These osd assertion failures reproduce consistently on shutdown in the rgw:multisite suite.
- 06:30 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Anecdotally, it looks like I may be running into this very same issue (or something similar) -- occasionally I have s...
- 05:46 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- Basically yes.
In src/mon/Session.h -> Subscription->next = -1 or 0.
I am learning C++ standard all the way but...
- 11:01 AM Bug #20381 (New): bluestore: deferred aio submission can deadlock with completion
- Turns out when something is marked as a duplicate in redmine, it automatically closes this one when I close the other...
- 11:00 AM Bug #20381 (Duplicate): bluestore: deferred aio submission can deadlock with completion
- This ticket was opened first, but let's close it in favour of 20381 because that one has the integration test logs.
- 10:53 AM Bug #20381: bluestore: deferred aio submission can deadlock with completion
- The backtrace looks exactly like the one in #20379 - duplicate?
- 10:41 AM Bug #20381 (Resolved): bluestore: deferred aio submission can deadlock with completion
- ...
- 11:00 AM Bug #20379 (Duplicate): bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0))
- This ticket was opened first, but let's close it in favour of 20381 because that one has the integration test logs.
- 10:58 AM Bug #20379: bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0))
- Updated title to make it clear that this isn't specific to vstart
- 10:52 AM Bug #20379: bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0))
- Looks like the integration tests are hitting this as well.
- 09:25 AM Bug #20379 (Duplicate): bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0))
- There's already a bug (with lots of dups) that seems to be what I'm seeing in a vstart.sh cluster. Since this bug is...
- 02:13 AM Bug #20274 (Resolved): rewind divergent deletes head whiteout
- 12:50 AM Bug #20375 (Fix Under Review): osd: omap threadpool heartbeat is only reset every 100 values
- https://github.com/ceph/ceph/pull/15823
06/21/2017
- 10:26 PM Bug #20331 (Rejected): osd/PGLog.h: 770: FAILED assert(i->prior_version == last)
- #20274 isn't merged yet, fixing it there.
- 10:20 PM Bug #20331: osd/PGLog.h: 770: FAILED assert(i->prior_version == last)
- This is fallout from 986a31f02e11d915a630cab17234ec4b8040609c, the #20274 fix. When we skip error entries the prior_...
- 10:06 PM Bug #20375 (Resolved): osd: omap threadpool heartbeat is only reset every 100 values
- This could potentially be after 100MB of reads. There's little cost to resetting the heartbeat timeout, so simple do ...
- 09:02 PM Bug #20358 (Resolved): bluestore: sharedblob not moved during split
- 08:42 PM Bug #19909 (New): PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- So I didn't follow it all the way through but it sure looks to me like our acting_primary input to the crashing seque...
- 09:13 AM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- Yes, i'm pretty sure it was 12.0.3. But, not on first boot, only after massive failures got me to stale+down PG statu...
- 07:52 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Second reported case from mailing list of VMs locking up -- they also have VMs issuing periodic discards.
- 11:57 AM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Shouldn't this one be flagged as a regression? It was working fine under firefly and hammer.
- 07:31 PM Bug #19943 (Resolved): osd: enoent on snaptrimmer
- 04:34 PM Bug #20169 (Fix Under Review): filestore+btrfs occasionally returns ENOSPC
- https://github.com/ceph/ceph/pull/15814
- 04:09 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- I've seen xenial and centos failures now, no trusty yet.
- 04:07 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- ...
- 04:09 PM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- Also in http://qa-proxy.ceph.com/teuthology/yuriw-2017-06-21_01:02:43-rgw-master_2017_6_21-distro-basic-smithi/130726...
- 03:55 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- 03:55 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- ok, valgrind is now restricted to centos again.
- 02:49 AM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- 03:46 PM Bug #20371: mgr: occasional fails to send beacons (monc reconnect backoff too aggressive?)
- It looks like it wasn't aggressive enough about reconnection to the mon:...
- 02:17 PM Bug #20371 (Resolved): mgr: occasional fails to send beacons (monc reconnect backoff too aggressi...
- for a while,...
- 01:48 PM Bug #20370 (New): leaked MOSDOp via PrimaryLogPG::_copy_some and PrimaryLogPG::do_proxy_write
- ...
- 01:43 PM Bug #20369 (New): segv in OSD::ShardedOpWQ::_process
- ...
- 12:01 PM Backport #20366 (In Progress): kraken: kraken-bluestore 11.2.0 memory leak issue
- 11:50 AM Backport #20366 (Resolved): kraken: kraken-bluestore 11.2.0 memory leak issue
- https://github.com/ceph/ceph/pull/15792
- 08:44 AM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
- *master PR*: https://github.com/ceph/ceph/pull/15295
*kraken backport PR*: https://github.com/ceph/ceph/pull/15792
- 02:22 AM Bug #18924 (Pending Backport): kraken-bluestore 11.2.0 memory leak issue
- 02:21 AM Bug #18924 (Fix Under Review): kraken-bluestore 11.2.0 memory leak issue
- https://github.com/ceph/ceph/pull/15792
should help
- 02:34 AM Bug #20302 (Fix Under Review): "BlueStore.cc: 9023: FAILED assert(0 == "unexpected error")" in po...
- ...
- 02:31 AM Bug #20277 (Need More Info): bluestore crashed while performing scrub
- A bug was just fixed in the spanning blob code, see https://github.com/ceph/ceph/pull/15654. Are you able to reprodu...
- 02:23 AM Bug #20117 (Rejected): BlueStore.cc: 8585: FAILED assert(0 == "unexpected error")
- you need more log info to see what the actual error was. usually when i see this it's enospc...
- 02:12 AM Bug #19800 (Resolved): some osds are down when create a new pool and a new image of the pool (blu...
- This looks like rocksdb compaction, probably triggered in part by a big deletion. There was a recent fix to do reada...
06/20/2017
- 10:39 PM Bug #18681: ceph-disk prepare/activate misses steps and fails on [Bluestore]
- If you don't use the GPT partition labels/types that ceph-disk uses then the device ownership won't be changed to cep...
- 10:35 PM Bug #19983 (Need More Info): osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluesto...
- Do you mean you pulled out the disk, and then ceph-osd crashed? That is normal--the disk is gone!
Or, do you mean...
- 09:15 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- https://github.com/ceph/ceph/pull/15791
- 09:07 PM Bug #20360: rados/verify valgrind tests: osds fail to start (xenial valgrind)
- related? also started seeing these:...
- 08:32 PM Bug #20360 (New): rados/verify valgrind tests: osds fail to start (xenial valgrind)
- ...
- 08:55 PM Bug #19299 (New): Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
- Ping Sage, you got that subprocess strace data.
- 06:45 PM Bug #19299: Jewel -> Kraken: OSD boot takes 1+ hours, unusually high CPU
- Same problem here (fresh 12.0.3). Got OSD's behind by > 5000 maps, it took ~8 hours to get them booted.
Looking in...
- 08:52 PM Bug #19700: OSD remained up despite cluster network being inactive?
- Sounds like we messed up the way cluster network heartbeating and the monitor's public network connection to the OSDs...
- 06:35 PM Bug #19700: OSD remained up despite cluster network being inactive?
- The cluster does not need to be performing any IO, other than normal peering and checking, and this will still happen...
- 08:50 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- red ref, are you saying you created a brand-new cluster with 12.0.3 and saw this on first boot?
Sage, do you think...
- 06:30 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- I can confirm the second behavior ("failed to load OSD map for epoch 1") in native installed 12.0.3 (not in productio...
- 06:20 PM Bug #19909 (Won't Fix): PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid...
- What Greg said! :)
- 04:52 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- N.0.Y releases such as 12.0.2 are dev releases; you should not run them if you can't afford to rebuild them. Upgrades...
- 08:24 PM Bug #20227 (Resolved): os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded s...
- 02:56 AM Bug #20227 (Fix Under Review): os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark un...
- https://github.com/ceph/ceph/pull/15766
- 02:54 AM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
- ...
- 02:50 AM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
- /a/sage-2017-06-19_18:44:38-rbd:qemu-master---basic-smithi/1301319
- 08:16 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- /a/sage-2017-06-20_16:21:45-rados-wip-sage-testing2-distro-basic-smithi/1305525
rados/thrash/{0-size-min-size-overri...
- 06:27 PM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- 06:15 PM Bug #20343: Jewel: OSD Thread time outs in XFS
- The filestore-level splitting and merging isn't in the logs - the best way to tell is examining a pg's directory - e....
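A hedged illustration of that directory check on a filestore OSD (default paths; the osd id and pgid here are made up for the example):
    cd /var/lib/ceph/osd/ceph-12/current/7.1a_head   # one PG's collection under the filestore data dir
    find . -maxdepth 3 -type d                       # nested DIR_* subdirectories mean the collection has been split
    find . -type f | wc -l                           # rough object count in the collection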
- 05:32 PM Bug #20343: Jewel: OSD Thread time outs in XFS
- We looked through the mon logs and we can't really find any splitting (or merging) pg states in there. Do we need to...
- 12:34 AM Bug #20343: Jewel: OSD Thread time outs in XFS
- This could be filestore splitting directories into multiple subdirectories when there are many objects, then merging ...
- 06:12 PM Bug #19943 (Fix Under Review): osd: enoent on snaptrimmer
- https://github.com/ceph/ceph/pull/15787
- 06:02 PM Bug #19943: osd: enoent on snaptrimmer
- no, i'm an idiot, ceph-objectstore-tool is doing it and it's noted in a different log file. sheesh.
- 01:43 PM Bug #19943: osd: enoent on snaptrimmer
- confirmed same thing in another run. on osd startup, fsck shows the key that was deleted....
- 04:33 PM Bug #20301: "/src/osd/SnapMapper.cc: 231: FAILED assert(r == -2)" in rados
- also in http://qa-proxy.ceph.com/teuthology/yuriw-2017-06-20_00:37:23-rados-master-2017_6_20-distro-basic-smithi/1302...
- 03:56 PM Bug #20358 (Fix Under Review): bluestore: sharedblob not moved during split
- https://github.com/ceph/ceph/pull/15783
- 03:54 PM Bug #20358 (Resolved): bluestore: sharedblob not moved during split
- ...
- 01:22 PM Bug #19960: overflow in client_io_rate in ceph osd pool stats
- Bug is not reproducible after this commit (not sure that only one contains fix):
commit d6d1db62edeb4c40a774fcb56e...
06/19/2017
- 11:05 PM Bug #20273 (Resolved): osd/OSD.h: 1957: FAILED assert(peerin g_queue.empty())
- 10:47 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- from thread: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-May/017869.html
[15:41:40] <jdillaman> greg...
- 10:25 PM Bug #20343: Jewel: OSD Thread time outs in XFS
- That IO pattern may just be killing the OSD on its own, but I'm not sure what RGW is turning it into or if there's st...
- 07:16 PM Bug #20343 (New): Jewel: OSD Thread time outs in XFS
- Creating a tracker ticket following suggestion from mailing list:
"
We've been having this ongoing problem with...
- 09:12 PM Bug #19960 (Resolved): overflow in client_io_rate in ceph osd pool stats
- If it's just one or two commits, we could backport (please fill in the Backport field in that case). But 131 commits?
- 09:11 PM Bug #19960: overflow in client_io_rate in ceph osd pool stats
- Aleksei: Please be more specific. PR#15073 has 131 commits - see https://github.com/ceph/ceph/pull/15073/commits
- 07:55 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- http://pulpito.ceph.com/jdillaman-2017-05-25_16:48:38-rbd-wip-jd-testing-distro-basic-smithi/1229611
- 07:55 PM Bug #20092 (Duplicate): ceph-osd: FileStore::_do_transaction: assert(0 == "unexpected error")
- Oh, that's probably the new thing where btrfs is giving us ENOENT (Sage guessing it's about rocksdb and snapshots). T...
- 12:26 PM Bug #20092 (Rejected): ceph-osd: FileStore::_do_transaction: assert(0 == "unexpected error")
- The osd.1 log showed the rocksdb encountered a full disk:
-17> 2017-05-25 22:14:28.664403 7fb70cd9b700 -1 rocks...
- 07:51 PM Bug #20326 (Resolved): Scrubbing terminated -- not all pgs were active and clean.
- 06:45 PM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
- reliably triggered, it seems, by rbd/qemu xfstests workload
- 06:45 PM Bug #19882 (Resolved): rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0....
- 05:43 PM Bug #19943: osd: enoent on snaptrimmer
- ...
- 03:30 PM Bug #18681: ceph-disk prepare/activate misses steps and fails on [Bluestore]
- Moving this to the RADOS bluestore tracker since it's probably owned by that team.
- 11:55 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- ...
- 10:54 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- Unless there was a patch, I wouldn't be too sure this is fixed -- it was an intermittent failure.
- 10:48 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- all passed modulo a valgrind error in ceph-mds, see /a/kchai-2017-06-19_09:40:27-fs-master---basic-smithi/1300881/rem...
- 09:41 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- rerunning at http://pulpito.ceph.com/kchai-2017-06-19_09:40:27-fs-master---basic-smithi/
- 08:14 AM Feature #15835 (Fix Under Review): filestore: randomize split threshold
06/18/2017
- 08:36 AM Bug #20332: rados bench seq option doesn't work
- Did you actually write out some data for it to read first? "seq" is just pulling back whatever was written down in th...
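A minimal sketch of the intended usage (pool name is arbitrary): write a data set first and keep it, then read it back with seq.
    rados -p testpool bench 30 write --no-cleanup   # write benchmark objects and leave them in place
    rados -p testpool bench 30 seq                  # sequential reads of the objects written above
    rados -p testpool cleanup                       # remove the benchmark objects afterwards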
- 08:28 AM Bug #20303: filejournal: Unable to read past sequence ... journal is corrupt
- Bumping this priority up since it's an assert on read of committed data, rather than a simple disk write error.
- 08:24 AM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- Sounds like we need some way of more reliably accounting for the extra cost of EC overwrites in our throttle limits.
06/17/2017
- 09:19 PM Bug #20188: filestore: os/filestore/FileStore.h: 357: FAILED assert(q.empty()) from ceph_test_obj...
- This testing branch didn't include any of the filestore improvements we've been getting, did it?
- 09:18 PM Bug #19943: osd: enoent on snaptrimmer
- /a/sage-2017-06-17_13:41:40-rados-wip-sage-testing-distro-basic-smithi/1297478
- 09:16 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- Do we have any idea why it hasn't popped up in leveldb? Is the multi-threading stuff less conducive to being snapshot...
- 09:14 PM Bug #20134 (Rejected): test_rados.TestIoctx.test_aio_read AssertionError: 5 != 2
- 5 is EIO. That's not an error code we produce, but it's a possibility until David's stuff preventing us from returning...
- 09:10 PM Bug #20326: Scrubbing terminated -- not all pgs were active and clean.
- https://github.com/ceph/ceph/pull/15747
- 09:09 PM Bug #20116: osds abort on shutdown with assert(ceph/src/osd/OSD.cc: 4324: FAILED assert(curmap))
- Are there more logs or core dumps available around this? That backtrace looks serious but doesn't contain enough info...
- 09:05 PM Support #20108 (Resolved): PGs are not remapped correctly when one host fails
- Okay, as described (and especially since it's better in jewel) this is almost certainly about CRUSH max_retries. I'm ...
- 06:18 PM Bug #20242: Make osd-scrub-repair.sh unit test run faster
- I'm looking into making this test run faster as well as a couple of the other slow ones by splitting them up into sma...
- 06:18 PM Bug #19639 (Can't reproduce): mon crash on shutdown
- 05:52 PM Bug #19639: mon crash on shutdown
- I haven't seen this happen again in recent memory.
- 05:25 AM Bug #19639: mon crash on shutdown
- Turning this down; should close if we don't get it happening again.
- 02:59 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- A month has passed and I'm still not able to figure out where the problem was, nor am I able to recover my cluster. Trying ...
- 01:47 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- I presume this was a bug in the older dev releases, but we should verify that before release.
- 02:26 PM Bug #20099 (Need More Info): osd/filestore: osd/PGLog.cc: 911: FAILED assert(last_e.version.versi...
- Does this still exist or is it all cleaned up now? The repeating versions is a little weird but that's not enough dat...
- 02:22 PM Bug #20092: ceph-osd: FileStore::_do_transaction: assert(0 == "unexpected error")
- Do you have any evidence this *wasn't* an unexpected error given to us by the Filesystems, Jason? That does happen in...
- 02:15 PM Bug #20059: miscounting degraded objects
- Maybe we count each instance of an object when it's degraded (i.e., 3x for replicated pools), but the non-degraded on...
- 01:43 PM Bug #19882: rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0.10251ca0c5f...
- Is this the read of partially-written EC extents? Need some context if it's in Testing...
- 01:36 PM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
- http://pulpito.ceph.com/sage-2017-06-16_19:23:03-rbd:qemu-wip-19882---basic-smithi/
reliably reproduced by rbd/qemu
- 05:50 AM Bug #19737: EAGAIN encountered during pg scrub (jewel)
- (Optimistically sorting it as a test issue.)
- 05:50 AM Bug #19737: EAGAIN encountered during pg scrub (jewel)
- Is the message that the primary OSD is down incorrect? We've seen a few things like this that are test bugs around ha...
- 05:45 AM Bug #19700 (Need More Info): OSD remained up despite cluster network being inactive?
- 05:42 AM Bug #19695: mon: leaked session
- Has this reproduced? I thought valgrind was clean enough we notice new leaks.
- 05:19 AM Bug #19518: log entry does not include per-op rvals?
- Have we *ever* filled in the per-op rvalues on retry? That sounds distressingly like returning read data on a write o...
- 05:15 AM Bug #19487 (In Progress): "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
- Based on PR comments we expect this to be fixed up by one of David's disk handling branches. Or did that one already...
- 03:52 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- John, sorry. i missed this. will take a look at it next monday.
- 02:34 AM Bug #19486: Rebalancing can propagate corrupt copy of replicated object
- That is an interesting point about BlueStore; it will detect corruption but not manual edits...
- 02:23 AM Bug #19400 (Resolved): add more info during pool delete error
- 12:26 AM Bug #20332 (Won't Fix): rados bench seq option doesn't work
For some reason "seq" option finishes too quickly....
06/16/2017
- 09:10 PM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
- /a/sage-2017-06-16_18:45:23-rados-wip-sage-testing-distro-basic-smithi/1293630
- 01:40 PM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
- ...
- 09:10 PM Bug #20331 (Rejected): osd/PGLog.h: 770: FAILED assert(i->prior_version == last)
- ...
- 07:44 PM Bug #20000 (Need More Info): osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- 07:44 PM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- Could be... maybe also #20273?
- 02:56 AM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- we found that the msg threads are still working after the `delete osd` in the asyncmsg env; it's because the asyncmsg::wait() ...
- 07:41 PM Bug #20274: rewind divergent deletes head whiteout
- 01:39 PM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- /a/sage-2017-06-16_00:46:50-rados-wip-sage-testing-distro-basic-smithi/1292433
rados/thrash-erasure-code/{ceph.yam...
- 01:49 AM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- /a/kchai-2017-06-15_17:39:27-rados-wip-kefu-testing---basic-smithi/1291475 also with rocksdb + btrfs
- 06:39 AM Bug #14088 (In Progress): mon: nothing logged when ENOSPC encountered during start up
- https://github.com/ceph/ceph/pull/15723 - merged
- 05:54 AM Bug #19320: Pg inconsistent make ceph osd down
- Hmm, did one of our official releases have the broken snapshot trimming backport semantics? I didn't think so bu...
- 04:05 AM Bug #20256 (Resolved): "ceph osd df" is broken; asserts out on Luminous-enabled clusters
- 02:30 AM Bug #20326 (In Progress): Scrubbing terminated -- not all pgs were active and clean.
- ...
- 01:29 AM Bug #20326 (New): Scrubbing terminated -- not all pgs were active and clean.
- 01:03 AM Bug #20326 (Resolved): Scrubbing terminated -- not all pgs were active and clean.
- ...
- 12:42 AM Bug #20105: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
- /a//kchai-2017-06-15_17:39:27-rados-wip-kefu-testing---basic-smithi/1291451
06/15/2017
- 09:42 PM Bug #19882: rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0.10251ca0c5f...
- 06:04 PM Bug #19882: rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0.10251ca0c5f...
- /a/teuthology-2017-06-15_02:01:02-rbd-master-distro-basic-smithi/1287766
rbd/qemu/{cache/writeback.yaml clusters/{fi...
- 05:59 PM Bug #20273 (Fix Under Review): osd/OSD.h: 1957: FAILED assert(peerin g_queue.empty())
- https://github.com/ceph/ceph/pull/15710
- 05:53 PM Bug #20273: osd/OSD.h: 1957: FAILED assert(peerin g_queue.empty())
- - handle_osd_map queued a write, with _write_committed as callback
- thread pools all shut down, including peering_w...
- 02:34 AM Bug #20302 (Fix Under Review): "BlueStore.cc: 9023: FAILED assert(0 == "unexpected error")" in po...
06/14/2017
- 08:36 PM Bug #20256: "ceph osd df" is broken; asserts out on Luminous-enabled clusters
- 08:20 PM Bug #20303 (Can't reproduce): filejournal: Unable to read past sequence ... journal is corrupt
- Run: http://pulpito.ceph.com/teuthology-2017-06-14_15:26:27-powercycle-master-distro-basic-smithi/
Job: 1285933
Log...
- 08:18 PM Bug #20302 (Resolved): "BlueStore.cc: 9023: FAILED assert(0 == "unexpected error")" in powercycle...
- Run: http://pulpito.ceph.com/teuthology-2017-06-14_15:26:27-powercycle-master-distro-basic-smithi/
Job: 1285969
Log...
- 07:52 PM Bug #20301 (Can't reproduce): "/src/osd/SnapMapper.cc: 231: FAILED assert(r == -2)" in rados
- Run: http://pulpito.ceph.com/yuriw-2017-06-14_15:02:07-rados-master_2017_6_14-distro-basic-smithi/
Job: 1285768
Log...
- 06:46 PM Bug #19943 (In Progress): osd: enoent on snaptrimmer
- 02:12 PM Bug #19943: osd: enoent on snaptrimmer
- log with more debugging at /a/sage-2017-06-14_03:38:53-rados:thrash-wip-19943---basic-smithi/1284145/ceph-osd.5.log
- 03:38 AM Bug #19943: osd: enoent on snaptrimmer
- WTH. I've seen two cases where the object exists in snapmapper in a different pool (cache tiering), but I think this is...
- 04:26 PM Bug #17806 (Resolved): OSD: do not open pgs when the pg is not in pg_map
- 10:01 AM Bug #17806: OSD: do not open pgs when the pg is not in pg_map
- The PR is merged to upstream. https://github.com/ceph/ceph/pull/11803. So please close it. Thanks.
- 03:54 AM Bug #17806: OSD: do not open pgs when the pg is not in pg_map
- Without more details I'm not sure this assessment is actually correct...
- 02:34 PM Bug #20295 (Resolved): bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool ...
- When running "rbd bench-write" using an RBD image stored in an EC pool, the some OSD threads start to timeout and eve...
- 01:44 PM Bug #16890: rbd diff outputs nothing when the image is layered and with a writeback cache tier
- RBD isn't doing anything special with regard to cache tiering. It sounds like the whiteout in the cache tier is not r...
- 03:35 AM Bug #16890: rbd diff outputs nothing when the image is layered and with a writeback cache tier
- Jason, can you make sure you expect this to work from an RBD perspective and throw it into the RADOS project if so? :)
- 01:32 PM Feature #15835: filestore: randomize split threshold
- https://github.com/ceph/ceph/pull/15689
- 09:01 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Greg Farnum wrote:
> Note the second reporter confirms this is with cache tiering. Rather suspect that's got more to...
- 03:46 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Note the second reporter confirms this is with cache tiering. Rather suspect that's got more to do with it than snaps...
- 05:27 AM Bug #18930: received Segmentation fault in PGLog::IndexedLog::add
- Don't suppose there's still a log or core dump associated with this?
- 04:46 AM Bug #14088: mon: nothing logged when ENOSPC encountered during start up
- No, just scrubbing and trying to get things in a realistic state.
- 04:08 AM Bug #14088: mon: nothing logged when ENOSPC encountered during start up
- Greg, No, but I can try and take a look in the next few days if you'd like?
- 12:46 AM Bug #14088: mon: nothing logged when ENOSPC encountered during start up
- Brad, did you do any work on this?
- 04:35 AM Bug #18752: LibRadosList.EnumerateObjects failure
- Hasn't reproduced yet.
- 04:27 AM Bug #18328 (Closed): crush: flaky unitest:
- 04:13 AM Bug #18021 (Duplicate): Assertion "needs_recovery" fails when balance_read reaches a replica OSD ...
- These are the same thing, right?
- 04:11 AM Bug #17968: Ceph:OSD can't finish recovery+backfill process due to assertion failure
- https://github.com/ceph/ceph/pull/15489#issuecomment-308152157
- 04:09 AM Bug #17949 (Resolved): make check: unittest_bit_alloc get_used_blocks() >= 0
- Linked PR is not merged but has a comment the race condition fix was merged.
- 04:03 AM Bug #17830: osd-scrub-repair.sh is failing (intermittently?) on Jenkins
- David, do we have any idea why this is failing? I'm not getting any idea from what's in the comments here.
- 03:51 AM Bug #17718: EC Overwrites: update ceph-objectstore-tool export/import to handle rollforward/rollback
- Josh, is this still outstanding? I presume we need it for testing...
- 03:02 AM Bug #16385 (Fix Under Review): rados bench seq and rand tests do not work if op_size != object_size
- One of the stuck PRs:
https://github.com/ceph/ceph/pull/12203
- 02:59 AM Bug #16379 (Closed): [ERROR ] "ceph auth get-or-create for keytype admin returned -1
- It's been a year without updates and tests are more or less working, so this must be fixed.
- 02:56 AM Bug #16365 (Resolved): Better network partition detection
- We're switching to 2KB heartbeat packets now for other reasons. I don't think there's much else we can do here, pract...
- 01:37 AM Bug #16177 (Closed): leveldb horrendously slow
- Adam's cluster got cleaned up; the MDS doesn't allow you to generate directory omaps that large anymore; RGW is doing...
- 12:43 AM Bug #13493: osd: for ec, cascading crash during recovery if one shard is corrupted
- I suspect this is being resolved by David's work on EIO handling?
- 12:02 AM Bug #20283 (New): qa: missing even trivial tests for many commands
- I wrote a trivial script to look for missing commands in tests (https://github.com/ceph/ceph/pull/15675/commits/3aad0...
06/13/2017
- 11:38 PM Bug #20256: "ceph osd df" is broken; asserts out on Luminous-enabled clusters
- https://github.com/ceph/ceph/pull/15675
- 10:00 PM Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover.
- I don't really get how the AsyncMessenger could have caused this issue...?
- 09:50 PM Bug #12659 (Closed): Can't delete cache pool
- Closing due to lack of updates and various changes in cache pools since .94.
- 09:48 PM Bug #12615: Repair of Erasure Coded pool with an unrepairable object causes pg state to lose clea...
- David, is this still an issue?
- 08:53 AM Bug #20277: bluestore crashed while performing scrub
- What happened (twice) was:
* the osd had a crc error inconsistent pg
* set debug-bluestore and debug-osd to 20
* t...
- 08:21 AM Bug #20277 (Can't reproduce): bluestore crashed while performing scrub
- ...
- 03:07 AM Bug #20274: rewind divergent deletes head whiteout
- https://github.com/ceph/ceph/pull/15649
- 02:54 AM Bug #20274 (Resolved): rewind divergent deletes head whiteout
- ...
- 03:00 AM Bug #19943: osd: enoent on snaptrimmer
- with snap trim whiteout fix applied,
/a/sage-2017-06-12_20:56:37-rados-wip-sage-testing-distro-basic-smithi/128066...
- 02:59 AM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
- /a/sage-2017-06-12_20:56:37-rados-wip-sage-testing-distro-basic-smithi/1280581
has full log...
- 02:33 AM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- ...
- 02:28 AM Bug #20273 (Resolved): osd/OSD.h: 1957: FAILED assert(peerin g_queue.empty())
- ...
06/12/2017
- 04:35 PM Bug #20256: "ceph osd df" is broken; asserts out on Luminous-enabled clusters
- So obviously what happened is I thought we had moved the osd df command into the monitor, but that didn't actually ha...
- 04:33 PM Bug #20256 (Resolved): "ceph osd df" is broken; asserts out on Luminous-enabled clusters
- I got a private email report:
When doing ‘ceph osd df’, ceph-mon always crashes. The stack info is as follows:...
- 08:46 AM Bug #18043: ceph-mon prioritizes public_network over mon_host address
- Thanks for the update, I look forward to seeing your PR :).
06/11/2017
- 07:52 PM Bug #13146 (Resolved): mon: creating a huge pool triggers a mon election
- We're throttling PG creates now.
- 07:28 PM Bug #11907: crushmap validation must not block the monitor
- Don't we internally time out crush map testing now? Does it behave sensibly if things take too long?
- 07:21 PM Bug #9523 (Closed): Both op threads and dispatcher threads could be stuck at acquiring the budget...
- Based on the PR discussion it seems the diagnosed issue wasn't the cause of the slowness. Closing since it hasn't (kn...
06/09/2017
- 07:51 PM Bug #20243 (Resolved): Improve size scrub error handling and ignore system attrs in xattr checking
Something similar to this was seen on a production system. If all the object_info_t matched there would be no erro...
- 06:39 PM Bug #20242 (Resolved): Make osd-scrub-repair.sh unit test run faster
Most likely move some tests to the rados suite.
- 01:26 AM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- ugh just saw this on xenial too. hrm.
/a/sage-2017-06-08_20:27:41-rados-wip-sage-testing2-distro-basic-smithi/127...
06/08/2017
- 06:52 PM Bug #20227 (Need More Info): os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unlo...
- Hmm, I see the fault_range call (it's in the new ec unclone code), but it's only dirtying the range including extents...
- 06:18 PM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
- /a/sage-2017-06-08_02:04:29-rados-wip-sage-testing-distro-basic-smithi/1269367 too
- 06:14 PM Bug #20227 (Resolved): os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded s...
- ...
- 06:44 PM Bug #20221: kill osd + osd out leads to stale PGs
- @Greg the original bug description was updated with a simpler reproducer which does not involve copying objects. I be...
- 06:34 PM Bug #20221: kill osd + osd out leads to stale PGs
- Right, but what you've said here is that if you have pool size one, and kill the only OSD hosting it, then no other O...
- 02:58 PM Bug #20221: kill osd + osd out leads to stale PGs
- FWIW it was reproduced by badone.
- 12:20 PM Bug #20221: kill osd + osd out leads to stale PGs
- @Greg the first reproducer was not trying to rados put the same object. It was trying to rados put another object. I ...
- 12:18 PM Bug #20221: kill osd + osd out leads to stale PGs
- The reproducer works as expected on 12.0.3. The behavior changed somewhere in master after 12.0.3 was released.
- 12:17 PM Bug #20221: kill osd + osd out leads to stale PGs
- I don't understand what behavior you're looking for. Hanging is the expected behavior when data is unavailable.
- 10:07 AM Bug #20221 (New): kill osd + osd out leads to stale PGs
- h3. description
When the OSD is killed before ceph osd out, the PGs stay in stale state.
h3. reproducer
From...
- 05:53 PM Bug #19960 (Pending Backport): overflow in client_io_rate in ceph osd pool stats
- 03:14 PM Bug #19960: overflow in client_io_rate in ceph osd pool stats
- > By which commit/PR?
554cf8394a9ac4f845c1fce03dd1a7f551a414a9
Merge pull request #15073 from liewegas/wip-mgr-stats
- 11:00 AM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
- Hi Greg,
Thank you for taking the time to look into this.
Following the incident of the present ticket the clus...
06/07/2017
- 08:57 PM Bug #19943: osd: enoent on snaptrimmer
- /a/sage-2017-06-07_16:25:35-rados-wip-sage-testing2-distro-basic-smithi/1268182
rados/thrash-erasure-code/{ceph.ya...
- 02:03 AM Bug #19943: osd: enoent on snaptrimmer
- /a/sage-2017-06-06_21:54:14-rados-wip-sage-testing-distro-basic-smithi/1265627
rados/thrash/{0-size-min-size-overr...
- 08:11 PM Documentation #20215 (New): librados documentation improvement for the use cases
- librados documentation improvement for the use cases including the tradeoffs of object size, i/o rate, and omap vs re...
- 04:44 PM Bug #18696: OSD might assert when LTTNG tracing is enabled
- Wonder if this PR https://github.com/ceph/ceph/pull/14304 fixes this issue as well.
- 04:01 PM Bug #18750: handle_pg_remove: pg_map_lock held for write when taking pg_lock
- I think I remember this one and it wasn't really feasible to fix (at the time). If doing code inspection you'll want ...
- 03:59 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
- Pretty weird, that assert appears to be an internal interval_set consistency thing: https://github.com/ceph/ceph/blob...
- 03:58 PM Bug #19198: Bluestore doubles mem usage when caching object content
- 03:50 PM Bug #18667: [cache tiering] omap data time-traveled to stale version
- Jason says this "seems to pop up randomly every few weeks or so", so it's definitely a live, going concern. :(
- 03:40 PM Bug #19086 (Rejected): BlockDevice::create should add check for readlink result instead of raise ...
- 03:36 PM Bug #18647: ceph df output with erasure coded pools
- Let's verify this prior to Luminous and write a test for it!
- 03:29 PM Bug #19023 (Fix Under Review): ceph_test_rados invalid read caused apparently by lost intervals d...
- https://github.com/ceph/ceph/pull/15555
- 01:23 PM Bug #19960: overflow in client_io_rate in ceph osd pool stats
- Aleksei Gutikov wrote:
> fixed in master
By which commit/PR?
- 12:04 PM Bug #19960: overflow in client_io_rate in ceph osd pool stats
- fixed in master
- 09:28 AM Bug #19783 (New): upgrade tests failing with "AssertionError: failed to complete snap trimming be...
- 06:34 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- Zengran Zhang wrote:
> 2017-05-19 22:48:23.854608 7f14f1c1e700 0 -- 10.10.133.1:6823/2019 >> 10.10.133.1:6819/19544... - 02:04 AM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- ...
- 02:02 AM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- /a/sage-2017-06-06_21:54:14-rados-wip-sage-testing-distro-basic-smithi/1265467
rados/thrash/{0-size-min-size-overrid...
- 02:02 AM Bug #20169: filestore+btrfs occasionally returns ENOSPC
- /a/sage-2017-06-06_21:54:14-rados-wip-sage-testing-distro-basic-smithi/1265435
rados/thrash/{0-size-min-size-overr...