Activity
From 07/14/2017 to 08/12/2017
08/12/2017
- 06:08 PM Bug #20910: spurious MON_DOWN, apparently slow/laggy mon
- /a/sage-2017-08-11_21:54:20-rados-luminous-distro-basic-smithi/1512264
I'm going to whitelist this on luminous bra...
- 05:31 PM Bug #20985: PG which marks divergent_priors causes crash on startup
- If anyone wants to validate that the fix packages at https://shaman.ceph.com/repos/ceph/wip-20985-divergent-handling-...
- 09:19 AM Bug #20985: PG which marks divergent_priors causes crash on startup
- Facing the same issue upgrading from jewel 10.2.9 -> luminous 12.1.3 (RC)
- 02:55 AM Bug #20923 (Resolved): ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)
- 02:35 AM Bug #20983 (Resolved): bluestore: failure to dirty src onode on clone with 1-byte logical extent
08/11/2017
- 10:49 PM Bug #20986 (Can't reproduce): segv in crush_destroy_bucket_straw2 on rados/standalone/misc.yaml
- ...
- 10:45 PM Bug #20909: Error ETIMEDOUT: crush test failed with -110: timed out during smoke test (5 seconds)
- ...
- 10:43 PM Bug #20985: PG which marks divergent_priors causes crash on startup
- Luminous at https://github.com/ceph/ceph/pull/17001
- 10:20 PM Bug #20985: PG which marks divergent_priors causes crash on startup
- https://github.com/ceph/ceph/pull/17000
Still compiling, testing, etc.
- 10:16 PM Bug #20985 (Resolved): PG which marks divergent_priors causes crash on startup
- This was noticed in the course of somebody upgrading from 12.1.1 to 12.1.2:...
- 10:14 PM Bug #20910: spurious MON_DOWN, apparently slow/laggy mon
- /a/sage-2017-08-11_17:22:37-rados-wip-sage-testing-20170811a-distro-basic-smithi/1511996
- 10:12 PM Bug #20959: cephfs application metdata not set by ceph.py
- https://github.com/ceph/ceph/pull/16954
- 02:29 AM Bug #20959 (Resolved): cephfs application metdata not set by ceph.py
- 05:36 PM Bug #20770: test_pidfile.sh test is failing 2 places
- 05:34 AM Bug #20770 (In Progress): test_pidfile.sh test is failing 2 places
- 04:46 PM Bug #20983: bluestore: failure to dirty src onode on clone with 1-byte logical extent
- https://github.com/ceph/ceph/pull/16994
- 04:45 PM Bug #20983 (Resolved): bluestore: failure to dirty src onode on clone with 1-byte logical extent
- symptom is...
- 04:27 PM Bug #20981: ./run_seed_to_range.sh errored out
- Super weird.. looks like a race between heartbeat timeout and a failure injection maybe?...
- 01:26 PM Bug #20981 (Can't reproduce): ./run_seed_to_range.sh errored out
- ...
- 01:00 PM Bug #20974 (Fix Under Review): osd/PG.cc: 3377: FAILED assert(r == 0) (update_snap_map remove fails)
- https://github.com/ceph/ceph/pull/16982
08/10/2017
- 07:59 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- Yes, but osd.0 doing that is very incorrect. We've had some problems in this area before with marking stuff down not ...
- 10:20 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- greg, osd.0 failed to send the reply of tid 5386 over the wire because it was disconnected. but it managed to send th...
- 07:41 PM Bug #20975: test_pidfile.sh is flaky
- https://github.com/ceph/ceph/pull/16977
- 07:41 PM Bug #20975 (Resolved): test_pidfile.sh is flaky
- fails regularly on make check. disabling it for now.
- 04:41 PM Bug #20939: crush weight-set + rm-device-class segv
- 04:15 PM Feature #20956 (Pending Backport): Include front/back interface names in OSD metadata
- 04:12 PM Bug #20949 (Resolved): mon: quorum incorrectly believes mon has kraken (not jewel) features
- 03:49 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- Moving this back to RADOS -- changing librbd to force a full object diff if an object exists in the cache tier seems ...
- 02:16 PM Bug #20974 (Can't reproduce): osd/PG.cc: 3377: FAILED assert(r == 0) (update_snap_map remove fails)
- ...
- 01:33 PM Bug #20958 (Resolved): missing set lost during upgrade
- also backported
- 01:23 PM Bug #20973 (Can't reproduce): src/osdc/ Objecter.cc: 3106: FAILED assert(check_latest_map_ops.fin...
- ...
- 07:04 AM Bug #20970 (Resolved): bug in funciton reweight_by_utilization
- There is one bug in function OSDMonitor::reweight_by_utilization ...
08/09/2017
- 09:34 PM Bug #20798 (Need More Info): LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- Logs from the ClsLock unittest clearly show that there is a race in the test and it tries to take the lock again befo...
- 09:15 PM Bug #20959 (In Progress): cephfs application metdata not set by ceph.py
- So far I've identified three problems in the source:
1) we don't check that we're in luminous mode before the MDS se...
- 07:57 PM Bug #20959: cephfs application metdata not set by ceph.py
- As I reported in #20891 I am seeing this on fresh luminous clusters.
- 07:56 PM Bug #20959: cephfs application metdata not set by ceph.py
- Okay, unlike the previous log I looked at, the "fs new" command is clearly *not* triggering a new osd map commit. We ...
- 07:53 PM Bug #20959: cephfs application metdata not set by ceph.py
- Hmm, this still doesn't make sense. The cluster started out as luminous and so the maps would always have the luminou...
- 04:19 PM Bug #20959: cephfs application metdata not set by ceph.py
- The bug I hit before was doing the right checks on encoding, *but* the pending_inc was applied to the in-memory mon c...
- 03:29 PM Bug #20959: cephfs application metdata not set by ceph.py
- We're encoding with the quorum features, though, so I don't think that could actually cause a problem. Maybe, though.
- 03:23 PM Bug #20959: cephfs application metdata not set by ceph.py
- Sage was right, the MDSMonitor unconditionally calls do_application_enable() and that unconditionally sets applicatio...
- 03:06 PM Bug #20959 (Resolved): cephfs application metdata not set by ceph.py
- "2017-08-09 06:52:11.115593 mon.a mon.0 172.21.15.12:6789/0 154 : cluster [WRN] Health check failed: application not ...
- 07:54 PM Bug #20920 (Resolved): pg dump fails during point-to-point upgrade
- 07:26 PM Bug #20920: pg dump fails during point-to-point upgrade
- https://github.com/ceph/ceph/pull/16871
- 07:54 PM Backport #20963 (Resolved): luminous: pg dump fails during point-to-point upgrade
- Manually cherry-picked to luminous ahead of the 12.2.0 release.
- 06:32 PM Backport #20963 (Resolved): luminous: pg dump fails during point-to-point upgrade
- 07:33 PM Bug #20960: ceph_test_rados: mismatched version (due to pg import/export)
- I'm not really sure how we could reasonably handle this scenario on the Ceph side. Seems like we should adjust the te...
- 07:06 PM Bug #20960: ceph_test_rados: mismatched version (due to pg import/export)
- meanwhile on osd.2, start is...
- 06:46 PM Bug #20960: ceph_test_rados: mismatched version (due to pg import/export)
- second write to the object sets uv482...
- 06:09 PM Bug #20960 (Can't reproduce): ceph_test_rados: mismatched version (due to pg import/export)
- ...
- 07:20 PM Bug #20947 (Resolved): OSD and mon scrub cluster log messages are too verbose
- 09:48 AM Bug #20947 (Pending Backport): OSD and mon scrub cluster log messages are too verbose
- 07:20 PM Backport #20961 (Resolved): luminous: OSD and mon scrub cluster log messages are too verbose
- Manually cherry-picked to luminous branch.
- 06:32 PM Backport #20961 (Resolved): luminous: OSD and mon scrub cluster log messages are too verbose
- 06:34 PM Backport #20965 (Resolved): luminous: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= l...
- https://github.com/ceph/ceph/pull/17197
- 06:19 PM Bug #20958: missing set lost during upgrade
- 06:14 PM Bug #20958: missing set lost during upgrade
- 05:47 PM Bug #20958: missing set lost during upgrade
- 04:17 PM Bug #20958: missing set lost during upgrade
- It looks like a bug in the jewel->luminous conversion:
* jewel doesn't save the missing set
* luminous detects th...
- 02:12 PM Bug #20958: missing set lost during upgrade
- osd.3 sends an empty missing set to the primary at...
- 01:50 PM Bug #20958 (Resolved): missing set lost during upgrade
- pg 4.3...
- 05:46 PM Bug #18209 (Pending Backport): src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queu...
- 12:00 PM Bug #20888 (Fix Under Review): "Health check update" log spam
- https://github.com/ceph/ceph/pull/16942
- 11:54 AM Feature #20956: Include front/back interface names in OSD metadata
- https://github.com/ceph/ceph/pull/16941
- 11:52 AM Feature #20956 (Resolved): Include front/back interface names in OSD metadata
- This information is needed by anyone who has a TSDB/dashboard that wants to correlate their NIC statistics with the u...
- 05:28 AM Bug #20952 (Can't reproduce): Glitchy monitor quorum causes spurious test failure
qa/standalone/mon/misc.sh failed in TEST_mon_features()
http://qa-proxy.ceph.com/teuthology/dzafman-2017-08-08_1...
- 02:34 AM Bug #20925 (Resolved): bluestore: bad csum during fsck
08/08/2017
- 10:43 PM Bug #20949 (Resolved): mon: quorum incorrectly believes mon has kraken (not jewel) features
- mon.2 is the last mon to restart:...
- 10:13 PM Bug #20923 (Fix Under Review): ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(las...
- https://github.com/ceph/ceph/pull/16924
- 09:10 PM Bug #20863 (Duplicate): CRC error does not mark PG as inconsistent or queue for repair
- 06:37 PM Bug #20863: CRC error does not mark PG as inconsistent or queue for repair
- This will be available in Luminous, see http://tracker.ceph.com/issues/19657
- 06:57 PM Bug #20947: OSD and mon scrub cluster log messages are too verbose
- https://github.com/ceph/ceph/pull/16916
- 06:56 PM Bug #20947 (Resolved): OSD and mon scrub cluster log messages are too verbose
- ...
- 06:43 PM Bug #20875 (Duplicate): mon segv during shutdown
- 06:16 PM Bug #20645: bluesfs wal failed to allocate (assert(0 == "allocate failed... wtf"))
- 06:00 PM Bug #20944 (Fix Under Review): OSD metadata 'backend_filestore_dev_node' is "unknown" even for si...
- https://github.com/ceph/ceph/pull/16913
- 01:17 PM Bug #20944: OSD metadata 'backend_filestore_dev_node' is "unknown" even for simple deployment
- Should have also said: bluestore was populating its bluestore_bdev_dev_node correctly on the same server and drive --...
- 01:16 PM Bug #20944 (Resolved): OSD metadata 'backend_filestore_dev_node' is "unknown" even for simple dep...
OSD created using ceph-deploy ("ceph-deploy osd create --filestore"); metadata after starting up is:...
- 03:41 PM Bug #19881 (Can't reproduce): ceph-osd: pg_update_log_missing(1.20 epoch 66/11 rep_tid 1493 entri...
- 03:39 PM Bug #20116 (Can't reproduce): osds abort on shutdown with assert(ceph/src/osd/OSD.cc: 4324: FAILE...
- 03:39 PM Bug #20188 (Can't reproduce): filestore: os/filestore/FileStore.h: 357: FAILED assert(q.empty()) ...
- 03:39 PM Bug #15653: crush: low weight devices get too many objects for num_rep > 1
- 03:35 PM Bug #20543: osd/PGLog.h: 1257: FAILED assert(0 == "invalid missing set entry found") in PGLog::re...
- Probably the incorrectly-assessed "out-of-order" op numbers.
- 03:35 PM Bug #20543 (Can't reproduce): osd/PGLog.h: 1257: FAILED assert(0 == "invalid missing set entry fo...
- 03:33 PM Bug #20626 (Can't reproduce): failed to become clean before timeout expired, pgs stuck unknown
- 01:58 PM Bug #20925: bluestore: bad csum during fsck
- https://github.com/ceph/ceph/pull/16900
- 01:19 PM Bug #20925: bluestore: bad csum during fsck
- deferred writes are completing out of order. this is fallout from ca32d575eb2673737198a63643d5d1923151eba3.
08/07/2017
- 10:43 PM Bug #20919 (Fix Under Review): osd: replica read can trigger cache promotion
- https://github.com/ceph/ceph/pull/16884
- 10:32 PM Bug #20939 (Fix Under Review): crush weight-set + rm-device-class segv
- https://github.com/ceph/ceph/pull/16883
- 08:49 PM Bug #20939 (Resolved): crush weight-set + rm-device-class segv
- Although that is probably just one of many problems; weight-set and device classes don't play well together.
- 07:49 PM Bug #20920 (Pending Backport): pg dump fails during point-to-point upgrade
- 07:02 PM Bug #20933 (Closed): All mon nodes down when i use ceph-disk prepare a new osd.
- Sage thinks this has been fixed ("[12:02:12] <sage> oh, it was a problem with the reusing osd ids"). Please update t...
- 07:00 PM Bug #20933: All mon nodes down when i use ceph-disk prepare a new osd.
- Apparently this is the result of a typo: https://www.spinics.net/lists/ceph-users/msg37317.html
But I'm not sure t...
- 09:07 AM Bug #20933 (Closed): All mon nodes down when i use ceph-disk prepare a new osd.
- ceph version 12.1.0 (262617c9f16c55e863693258061c5b25dea5b086) luminous (dev)
when "ceph-disk prepare --bluestore ... - 04:51 PM Bug #20923: ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)
- Sage Weil wrote:
> [...]
> This object is larger than 32bits (4gb), which bluestore does not allow/support. Why ar...
- 04:36 PM Bug #20923: ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)
- ...
- 01:44 PM Bug #20923: ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)
- Sage Weil wrote:
> can you reproduce with debug bluestore = 1/30 and attach the resulting log?
Here it comes (obj...
- 01:21 AM Bug #20923 (Need More Info): ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last ...
- can you reproduce with debug bluestore = 1/30 and attach the resulting log?
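A quick sketch of how to capture that level of logging; the osd scope is just a placeholder:
    # in ceph.conf on the affected OSD host, then restart the OSD
    [osd]
        debug bluestore = 1/30
    # or at runtime, without restarting
    ceph tell osd.* injectargs '--debug-bluestore 1/30'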
- 03:19 PM Bug #20922: misdirected op with localize_reads set
- Well, the issue is not immediately apparent, but _calc_target() is pretty complicated and we're feeding in a not-tota...
- 02:28 PM Bug #20475 (Resolved): EPERM: cannot set require_min_compat_client to luminous: 6 connected clien...
- 02:27 PM Backport #20639 (Resolved): jewel: EPERM: cannot set require_min_compat_client to luminous: 6 con...
- 08:22 AM Tasks #20932 (New): run rocksdb's env_test with our BlueRocksEnv
- 07:41 AM Backport #20930 (Rejected): kraken: assert(i->prior_version == last) when a MODIFY entry follows ...
- 01:16 AM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-08-06_16:51:13-rados-wip-sage-testing2-20170806a-distro-basic-smithi/1490528
08/06/2017
- 07:08 PM Bug #19191 (Resolved): osd/ReplicatedBackend.cc: 1109: FAILED assert(!parent->get_log().get_missi...
- 07:06 PM Bug #20925 (Resolved): bluestore: bad csum during fsck
- ...
- 07:05 PM Bug #20924 (Resolved): osd: leaked Session on osd.7
- ...
- 07:03 PM Bug #20910: spurious MON_DOWN, apparently slow/laggy mon
- /a/sage-2017-08-06_13:59:55-rados-wip-sage-testing-20170805a-distro-basic-smithi/1490103
Seeing a lot of these.
- 09:36 AM Bug #20923 (Resolved): ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)
- Running 12.1.1 RC1 OSDs, currently doing inline migration to BlueStore (ceph osd destroy procedure). Getting these a...
08/05/2017
- 06:23 PM Bug #20922 (New): misdirected op with localize_reads set
- ...
- 05:47 PM Bug #20770: test_pidfile.sh test is failing 2 places
- 05:47 PM Bug #20770: test_pidfile.sh test is failing 2 places
- This is still failing sometimes in TEST_without_pidfile() even after adding a sleep 1.
- 03:32 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- I did another test: I did some writes to an object "rbd_data.1ebc6238e1f29.0000000000000000" to raise its "HEAD" obje...
- 03:30 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- I did another test: I did some writes to an object "rbd_data.1ebc6238e1f29.0000000000000000" to raise its "HEAD" obje...
- 03:34 AM Bug #20874: osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end() || (miter->second...
- This may be a bluestore bug - the log is so large from bluestore debugging that I haven't had time to properly read i...
- 02:32 AM Bug #20843 (Pending Backport): assert(i->prior_version == last) when a MODIFY entry follows an ER...
- Backport only needed for kraken, jewel does not have error log entries.
- 12:03 AM Bug #20920: pg dump fails during point-to-point upgrade
- Do we have a "legacy" command map that matches the pre-luminous ones? I think we just need to use that for the comman...
08/04/2017
- 10:25 PM Bug #20920 (Resolved): pg dump fails during point-to-point upgrade
- Command failed on smithi021 with status 22: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage...
- 09:03 PM Bug #20919: osd: replica read can trigger cache promotion
- a replica was servicing a read and tried to do a cache promotion:...
- 08:53 PM Bug #20919 (Resolved): osd: replica read can trigger cache promotion
- ...
- 07:23 PM Bug #20561 (Can't reproduce): bluestore: segv in _deferred_submit_unlock from deferred_try_submit...
- 06:20 PM Bug #20904 (Resolved): cluster [ERR] 2.e shard 2 missing 2:70b3bf12:::existing_4:head on lost-unf...
- 06:40 AM Bug #20904 (Fix Under Review): cluster [ERR] 2.e shard 2 missing 2:70b3bf12:::existing_4:head on ...
- https://github.com/ceph/ceph/pull/16809
- 12:40 AM Bug #20904 (In Progress): cluster [ERR] 2.e shard 2 missing 2:70b3bf12:::existing_4:head on lost-...
- Think I found the problem, testing a fix.
- 06:17 PM Bug #20913 (Resolved): osd: leak from osd/PGBackend.cc:136 PGBackend::handle_recovery_delete()
- ...
- 06:00 PM Bug #18209 (Fix Under Review): src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queu...
- https://github.com/ceph/ceph/pull/16828
- 03:56 PM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
- /a/sage-2017-08-04_13:49:55-rbd:singleton-bluestore-wip-sage-testing2-20170803b-distro-basic-mira/1482623...
- 04:04 PM Bug #20295 (Resolved): bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool ...
- 01:59 PM Bug #20910 (Resolved): spurious MON_DOWN, apparently slow/laggy mon
- mon shows very slow progress for ~10 seconds, failing to send lease renewals etc, and triggering an election...
- 01:50 PM Bug #20845 (Resolved): Error ENOENT: cannot link item id -16 name 'host2' to location {root=bar}
- 01:46 PM Bug #20909 (Can't reproduce): Error ETIMEDOUT: crush test failed with -110: timed out during smok...
- ...
- 01:37 PM Bug #20908 (Resolved): qa/standalone/misc failure in TEST_mon_features
- ...
- 01:35 PM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-08-04_05:23:06-rados-wip-sage-testing-20170803-distro-basic-smithi/1481973
- 08:41 AM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
- Hit the same assert in http://qa-proxy.ceph.com/teuthology/joshd-2017-08-04_06:16:52-rados-wip-20904-distro-basic-smi...
- 07:15 AM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- I mean I think it's the condition check "is_present_clone" that
prevents the clone overlap from recording the client write...
- 04:54 AM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- Hi, grep:-)
I finally got what you mean in https://github.com/ceph/ceph/pull/16790..
I agree with you in that "...
- 12:58 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- osd.1 in the posted log has pg 1.4 in epoch 26 from the time it first dequeues those operations right up until it cra...
08/03/2017
- 11:52 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- from irc:
<joshd>:
> I'd suggest making rbd diff conservative when it's used with cache pools (if necessary, repo...
- 11:40 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- > the reason we are submitting the PR is that, when we do export-diff to an rbd image in a pool with a cache tier poo...
- 11:31 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- The reason we are submitting the PR is that, when we do export-diff to an rbd image in a pool with a cache tier pool,...
- 03:00 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- I submitted a pr for this: https://github.com/ceph/ceph/pull/16790
- 02:46 PM Bug #20896 (New): export_diff relies on clone_overlap, which is lost when cache tier is enabled
- Recently, we find that, under some circumstance, in the cache tier, the "HEAD" object's clone_overlap can lose some O...
- 11:44 PM Bug #20798 (In Progress): LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- 08:47 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- ...
- 11:28 PM Bug #20871 (In Progress): core dump when bluefs's mkdir returns -EEXIST
- 02:42 PM Bug #20871: core dump when bluefs's mkdir returns -EEXIST
- https://github.com/ceph/ceph/pull/16745/commits/6bb89702c1cae44558480f72c2723f564308f822
- 06:57 PM Bug #20904 (Resolved): cluster [ERR] 2.e shard 2 missing 2:70b3bf12:::existing_4:head on lost-unf...
- ...
- 06:22 PM Bug #20810 (Resolved): fsck finish with 29 errors in 47.732275 seconds
- 06:22 PM Bug #20844 (Resolved): peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-ove...
- 02:49 PM Bug #20844 (Fix Under Review): peering_blocked_by_history_les_bound on workloads/ec-snaps-few-obj...
- https://github.com/ceph/ceph/pull/16789
- 01:51 PM Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml
- This appears to be a test problem:
- the thrashosds task has 'chance_test_map_discontinuity: 0.5', which will mark an o...
- 09:59 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- mon.a.log...
- 09:42 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- ...
- 09:05 AM Documentation #20894 (Resolved): rados manpage does not document "cleanup"
- A user writes:...
- 02:46 AM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- https://github.com/ceph/ceph/pull/16769
08/02/2017
- 10:46 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
txn Z queues deferred io,...
- 09:46 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- Kefu Chai wrote:
> checked the actingset and actingbackfill of the PG of the crashed osd using gdb, they are not cha...
- 06:23 PM Bug #20888 (Resolved): "Health check update" log spam
- (We've known about this for a while, just need to fix it!)
The health checks for PG related stuff get updated when...
- 03:32 PM Bug #20301 (Can't reproduce): "/src/osd/SnapMapper.cc: 231: FAILED assert(r == -2)" in rados
- 03:31 PM Bug #20416 (Need More Info): "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgrade...
- 03:29 PM Bug #20616: pre-luminous: aio_read returns erroneous data when rados_osd_op_timeout is set but no...
- 03:28 PM Bug #20690 (Need More Info): Cluster status is HEALTH_OK even though PGs are in unknown state
- why can't cephfs be mounted when pgs are unknown?
- 03:25 PM Bug #20791 (Duplicate): crash in operator<< in PrimaryLogPG::finish_copyfrom
- 03:21 PM Bug #20843 (Fix Under Review): assert(i->prior_version == last) when a MODIFY entry follows an ER...
- https://github.com/ceph/ceph/pull/16675
- 03:14 PM Bug #20551 (Duplicate): LOST_REVERT assert during rados bench+thrash in ReplicatedBackend::prepar...
- 03:12 PM Bug #20545 (Duplicate): erasure coding = crashes
- I think this is the same as #20295, which we can now reproduce.
- 03:02 PM Bug #20785 (Need More Info): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgi...
- 02:40 PM Bug #18595 (Resolved): bluestore: allocator fails for 0x80000000 allocations
- 02:31 PM Bug #18595 (Pending Backport): bluestore: allocator fails for 0x80000000 allocations
- 02:40 PM Backport #20884 (Resolved): kraken: bluestore: allocator fails for 0x80000000 allocations
- 02:33 PM Backport #20884 (Resolved): kraken: bluestore: allocator fails for 0x80000000 allocations
- https://github.com/ceph/ceph/pull/13011
- 02:11 PM Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml
- /a/sage-2017-08-02_01:58:49-rados-wip-sage-testing-distro-basic-smithi/1470073
pg 2.d on [5,1,4]
- 01:57 PM Bug #20876: BADAUTHORIZER on mgr, hung ceph tell mon.*
- /a/sage-2017-08-02_01:58:49-rados-wip-sage-testing-distro-basic-smithi/1469949
- 01:57 PM Bug #20876 (Can't reproduce): BADAUTHORIZER on mgr, hung ceph tell mon.*
- ...
- 01:18 PM Bug #20875 (Duplicate): mon segv during shutdown
- ...
- 01:14 PM Bug #20874 (Can't reproduce): osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end()...
- ...
08/01/2017
- 07:47 PM Bug #20810 (Fix Under Review): fsck finish with 29 errors in 47.732275 seconds
- https://github.com/ceph/ceph/pull/16738
- 07:14 PM Bug #20793 (Resolved): osd: segv in CopyFromFinisher::execute in ec cache tiering test
- 07:13 PM Bug #20803 (Resolved): ceph tell osd.N config set osd_max_backfill does not work
- 07:12 PM Bug #20850 (Resolved): osd: luminous osd crashes when older monitor doesn't support set-device-class
- 07:11 PM Bug #20808 (Resolved): osd deadlock: forced recovery
- 07:03 PM Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml
- ...
- 07:02 PM Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml
- /a/sage-2017-08-01_15:32:10-rados-wip-sage-testing-distro-basic-smithi/1469176
rados/thrash-erasure-code/{ceph.yam...
- 03:03 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- New (hopefully more "mergeable") reproducer: https://github.com/ceph/ceph/pull/16731
- 02:02 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- This job reproduces the issue: http://pulpito.ceph.com/smithfarm-2017-08-01_13:28:09-rbd:singleton-master-distro-basi...
- 01:41 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- Nathan has a teuthology unit to, hopefully, flush this out: https://github.com/ceph/ceph/pull/16728
He also has a ...
- 01:38 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- As far as I can tell, the differences seem to simply be the `--io-total`, and in most cases the `--io-size` or number...
- 01:16 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- Any idea how your test case varies from what's in the rbd suite?
- 11:35 AM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- For clarity's sake: the previous comment lacked the version. This is a recent master build (fa70335); from yesterday,...
- 11:26 AM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- We've been reproducing this reliably on one of our test clusters.
This is a cluster composed of mostly hdds, 32G R...
- 02:53 PM Bug #20845 (In Progress): Error ENOENT: cannot link item id -16 name 'host2' to location {root=bar}
- 02:39 PM Bug #20871 (Resolved): core dump when bluefs's mkdir returns -EEXIST
- ...
- 02:13 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- if osd.1 is down, osd.2 should have started a peering. and repop_queue should be flushed by on_change() in start_peer...
- 12:44 PM Documentation #20867 (Closed): OSD::build_past_intervals_parallel()'s comment is stale
- PG::generate_past_intervals() was removed in 065bb89ca6d85cdab49db1d06c858456c9bbd2c8
- 12:14 PM Backport #20638 (Resolved): kraken: EPERM: cannot set require_min_compat_client to luminous: 6 co...
- 02:35 AM Bug #20242 (Resolved): Make osd-scrub-repair.sh unit test run faster
- https://github.com/ceph/ceph/pull/16513
Moved long running tests into qa/standalone to be run by teuthology instea...
07/31/2017
- 11:18 PM Bug #20784 (Duplicate): rados/standalone/erasure-code.yaml failure
- 09:47 PM Bug #20808 (Fix Under Review): osd deadlock: forced recovery
- https://github.com/ceph/ceph/pull/16712
- 09:03 PM Bug #20808: osd deadlock: forced recovery
- We're holding the pg_map_lock the whole time too, which I don't think is gonna work either (we certainly want to avoi...
- 03:50 PM Bug #20808: osd deadlock: forced recovery
- We use the pg_lock to protect the state field - so looking at this code more closely, the pg lock should be taken in ...
- 07:20 AM Bug #20808: osd deadlock: forced recovery
- Possible fix: https://github.com/ovh/ceph/commit/d92ce63b0f1953852bd1d520f6ad55acc6ce1c07
Does it look reasonable? I...
- 08:54 PM Bug #20854 (Duplicate): (small-scoped) recovery_lock being blocked by pg lock holders
- 08:43 PM Bug #20854: (small-scoped) recovery_lock being blocked by pg lock holders
- That's from https://github.com/ceph/ceph/pull/13723, which was 7 days ago.
- 08:43 PM Bug #20854: (small-scoped) recovery_lock being blocked by pg lock holders
- Naively this looks like something else was blocked while holding the recovery_lock, which is a bit scary since that s...
- 03:48 PM Bug #20863 (Duplicate): CRC error does not mark PG as inconsistent or queue for repair
- While testing bitrot detection it was found that even when OSD process has detected CRC mismatch and returned an erro...
- 03:32 PM Bug #20845: Error ENOENT: cannot link item id -16 name 'host2' to location {root=bar}
- http://qa-proxy.ceph.com/teuthology/kchai-2017-07-31_14:22:05-rados-wip-kefu-testing-distro-basic-mira/1465207/teutho...
- 01:22 PM Bug #20845: Error ENOENT: cannot link item id -16 name 'host2' to location {root=bar}
- https://github.com/ceph/ceph/pull/16805
- 01:29 PM Bug #20803 (Fix Under Review): ceph tell osd.N config set osd_max_backfill does not work
- https://github.com/ceph/ceph/pull/16700
- 09:37 AM Bug #20803 (In Progress): ceph tell osd.N config set osd_max_backfill does not work
- OK, looks like this is setting the option (visible in "config show") but not calling the handlers properly (not refle...
- 07:18 AM Bug #19512: Sparse file info in filestore not propagated to other OSDs
- Enabled FIEMAP/SEEK_HOLE in QA here: https://github.com/ceph/ceph/pull/15939
- 02:26 AM Bug #20785: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool()))
- https://github.com/ceph/ceph/pull/16677 is posted to help debug this issue.
07/30/2017
07/29/2017
- 06:12 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- osd.1: the osd who sent the out of order reply.4205 without sending the reply.4198 first.
osd.2: the primary osd who...
- 02:49 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- Greg, i think the "fault on lossy channel, failing" lines are from heartbeat connections, and they are misleading. i ...
- 12:26 AM Bug #20850 (Resolved): osd: luminous osd crashes when older monitor doesn't support set-device-class
- See e.g.:
http://pulpito.ceph.com/joshd-2017-07-28_23:13:34-upgrade:jewel-x-master-distro-basic-smithi/1456505/
...
07/28/2017
- 10:51 PM Bug #20783 (Resolved): osd: leak from do_extent_cmp
- 10:08 PM Bug #20783: osd: leak from do_extent_cmp
- Jason Dillaman wrote:
> *PR*: https://github.com/ceph/ceph/pull/16617
merged
- 09:30 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- The line "fault on lossy channel, failing" suggests that the connection you're looking at is lossy. So either it's ta...
- 03:12 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- Greg, yeah, that's what it seems to be. but the osd-osd connection is not lossy. so the root cause of this issue is s...
- 01:59 PM Bug #20804 (Resolved): CancelRecovery event in NotRecovering state
- 01:58 PM Bug #20846: ceph_test_rados_list_parallel: options dtor racing with DispatchQueue lockdep -> segv
- all threads:...
- 01:57 PM Bug #20846 (New): ceph_test_rados_list_parallel: options dtor racing with DispatchQueue lockdep -...
- The interesting threads seem to be...
- 01:36 PM Bug #20845 (Resolved): Error ENOENT: cannot link item id -16 name 'host2' to location {root=bar}
- ...
- 01:35 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- /a/sage-2017-07-28_04:13:20-rados-wip-sage-testing-distro-basic-smithi/1455364...
- 01:32 PM Bug #20808: osd deadlock: forced recovery
- /a/sage-2017-07-28_04:13:20-rados-wip-sage-testing-distro-basic-smithi/1455266
- 01:21 PM Bug #20844 (Resolved): peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-ove...
- ...
- 11:14 AM Bug #20843 (Resolved): assert(i->prior_version == last) when a MODIFY entry follows an ERROR entry
- We encountered a core dump of ceph-osd. According to the following information from gdb, the problem was that the pri...
- 08:50 AM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- Yes and that doesn't help. None of the osds can start up steadily.
Anyone familiar with the trimming algo of osdma...
- 07:11 AM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- Can you upgrade to 12.1.1 - that's the latest version?
- 06:38 AM Backport #20781: kraken: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- h3. description
See the attached logs for the remove op against rbd_data.21aafa6b8b4567.0000000000000aaa...
- 06:37 AM Backport #20780: jewel: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- h3. description
See the attached logs for the remove op against rbd_data.21aafa6b8b4567.0000000000000aaa...
- 04:15 AM Bug #20810 (Resolved): fsck finish with 29 errors in 47.732275 seconds
- ...
07/27/2017
- 10:40 PM Bug #20808: osd deadlock: forced recovery
- thread 3 has pg lock, tries to take recovery lock. this is old code
thread 87 has recovery lock, trying to take pg...
- 10:37 PM Bug #20808 (Resolved): osd deadlock: forced recovery
- ...
- 09:25 PM Bug #20744 (Resolved): monthrash: WRN Manager daemon x is unresponsive. No standby daemons available
- 09:24 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- So is this a timing issue where the lossy connection is dead and a message gets thrown out, but then the second reply...
- 08:02 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- i think the root cause is in the messenger layer. in my case, osd.1 is the primary osd. and it expects that its peer ...
- 09:00 PM Bug #20804 (Fix Under Review): CancelRecovery event in NotRecovering state
- https://github.com/ceph/ceph/pull/16638
- 08:56 PM Bug #20804: CancelRecovery event in NotRecovering state
- Easy fix is to make CancelRecovery from NotRecovering a no-op.
Unsure whether this could happen in other states be...
- 08:56 PM Bug #20804 (Resolved): CancelRecovery event in NotRecovering state
- ...
- 08:52 PM Bug #19909: PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
- Finally I got some clues about the situation I'm facing. Don't know if anyone's still watching this thread.
After ...
- 07:52 PM Bug #20784: rados/standalone/erasure-code.yaml failure
- Interestingly, test-erasure-eio.sh passes when run on my build machine using qa/run-standalone.sh
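For anyone else trying it locally, the invocation is roughly the following, run from a built source tree (the exact argument form is an assumption):
    cd build
    ../qa/run-standalone.sh test-erasure-eio.sh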
- 01:35 PM Bug #20784: rados/standalone/erasure-code.yaml failure
- /a/sage-2017-07-26_14:40:34-rados-wip-sage-testing-distro-basic-smithi/1447168
- 07:11 PM Bug #20793 (Fix Under Review): osd: segv in CopyFromFinisher::execute in ec cache tiering test
- Appears to be resolved under tracker ticket #20783 [1]
*PR*: https://github.com/ceph/ceph/pull/16617
[1] http:/...
- 05:06 PM Bug #20793: osd: segv in CopyFromFinisher::execute in ec cache tiering test
- Perhaps fixed under tracker # 20783 since it didn't repeat under a single run locally nor under teuthology. Going to ...
- 01:26 PM Bug #20793: osd: segv in CopyFromFinisher::execute in ec cache tiering test
- /a/sage-2017-07-26_19:43:32-rados-wip-sage-testing2-distro-basic-smithi/1448238
/a/sage-2017-07-26_19:43:32-rados-wi...
- 01:19 PM Bug #20793: osd: segv in CopyFromFinisher::execute in ec cache tiering test
- similar:...
- 01:17 PM Bug #20793 (Resolved): osd: segv in CopyFromFinisher::execute in ec cache tiering test
- ...
- 06:47 PM Bug #20653 (Need More Info): bluestore: aios don't complete on very large writes on xenial
- 03:18 PM Bug #20653: bluestore: aios don't complete on very large writes on xenial
- Those last two failures are due to #20771 fixed by dfab9d9b5d75d0f87053b1a3727f62da72af6c91
I haven't been able to...
- 07:39 AM Bug #20653: bluestore: aios don't complete on very large writes on xenial
- This may be a different bug, but it appears to be bluestore causing a rados aio test to time out (with full logs save...
- 07:31 AM Bug #20653: bluestore: aios don't complete on very large writes on xenial
- Seeing the same thing in many jobs in these runs, but not just on xenial. The first one I looked at was trusty - osd....
- 06:37 PM Bug #20803 (Resolved): ceph tell osd.N config set osd_max_backfill does not work
- ...
- 04:34 PM Bug #20798 (Can't reproduce): LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- ...
- 03:23 PM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/yuriw-2017-07-26_16:46:49-rados-wip-yuri-testing3_2017_7_27-distro-basic-smithi/1447634
- 01:32 PM Bug #20693 (Resolved): monthrash has spurious PG_AVAILABILITY etc warnings
- 01:15 PM Bug #20783: osd: leak from do_extent_cmp
- coverity sez...
- 04:46 AM Bug #20783 (Fix Under Review): osd: leak from do_extent_cmp
- *PR*: https://github.com/ceph/ceph/pull/16617
- 07:50 AM Bug #20791 (Duplicate): crash in operator<< in PrimaryLogPG::finish_copyfrom
- OSD logs and coredump are manually saved in /a/joshd-2017-07-26_22:34:59-rados-wip-dup-ops-debug-distro-basic-smithi/...
07/26/2017
- 11:02 PM Bug #20775 (In Progress): ceph_test_rados parameter error
- 12:22 PM Bug #20775: ceph_test_rados parameter error
- https://github.com/ceph/ceph/pull/16590
- 12:21 PM Bug #20775 (Resolved): ceph_test_rados parameter error
- ...
- 06:04 PM Bug #20785: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool()))
- problem appears to be the message the mon sent,...
- 06:03 PM Bug #20785 (Resolved): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool...
- ...
- 05:28 PM Bug #20783 (In Progress): osd: leak from do_extent_cmp
- 04:49 PM Bug #20783 (Resolved): osd: leak from do_extent_cmp
- ...
- 05:01 PM Bug #20371 (Resolved): mgr: occasional fails to send beacons (monc reconnect backoff too aggressi...
- 02:28 AM Bug #20371: mgr: occasional fails to send beacons (monc reconnect backoff too aggressive?)
- /a/sage-2017-07-25_20:28:21-rados-wip-sage-testing2-distro-basic-smithi/1443641
- 04:51 PM Bug #20784 (Duplicate): rados/standalone/erasure-code.yaml failure
- /a/sage-2017-07-26_14:40:34-rados-wip-sage-testing-distro-basic-smithi/1447168...
- 03:08 PM Backport #20780 (In Progress): jewel: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 03:06 PM Backport #20780: jewel: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- https://github.com/ceph/ceph/pull/16405
The master version is going through a test run, but I'm confident it won't...
- 03:04 PM Backport #20780 (Resolved): jewel: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- https://github.com/ceph/ceph/pull/16405
- 03:07 PM Backport #20781 (Rejected): kraken: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 03:03 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- https://github.com/ceph/ceph/pull/16404
- 03:02 PM Bug #20041 (Pending Backport): ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 02:55 PM Bug #20770: test_pidfile.sh test is failing 2 places
- https://github.com/ceph/ceph/pull/16587
- 01:03 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- /me has a core dump now, /me looking.
- 02:37 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- i reproduced it by running
fs/snaps/{begin.yaml clusters/fixed-2-ucephfs.yaml mount/fuse.yaml objectstore/filesto...
- 09:17 AM Bug #20754 (Resolved): osd/PrimaryLogPG.cc: 1845: FAILED assert(!cct->_conf->osd_debug_misdirecte...
- 02:32 AM Bug #20751 (Resolved): osd_state not updated properly during osd-reuse-id.sh
07/25/2017
- 10:51 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- How do you reproduce it?
- 10:49 PM Bug #20371 (Fix Under Review): mgr: occasional fails to send beacons (monc reconnect backoff too ...
- https://github.com/ceph/ceph/pull/16576
- 10:30 PM Bug #20744: monthrash: WRN Manager daemon x is unresponsive. No standby daemons available
- 10:29 PM Bug #20693 (Fix Under Review): monthrash has spurious PG_AVAILABILITY etc warnings
- https://github.com/ceph/ceph/pull/16575
- 10:21 PM Bug #20751 (Fix Under Review): osd_state not updated properly during osd-reuse-id.sh
- follow-up defensive change: https://github.com/ceph/ceph/pull/16534
- 08:39 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Still everything fine. No new hanging scrub but getting a lot of scrub pg errors which i need to repair manually. Not...
- 07:05 PM Bug #20747 (Resolved): leaked context from handle_recovery_delete
- 07:04 PM Bug #20753 (Resolved): osd/PGLog.h: 1310: FAILED assert(0 == "invalid missing set entry found")
- 05:55 PM Bug #20770 (New): test_pidfile.sh test is failing 2 places
I've seen both of these on Jenkins make check runs.
test_pidfile.sh line 55...
- 10:05 AM Bug #19198 (Need More Info): Bluestore doubles mem usage when caching object content
- 10:05 AM Bug #19198: Bluestore doubles mem usage when caching object content
- Update: the unit test in attachment does show that twice the memory is used due to page-alignment inefficiencies. How...
07/24/2017
- 05:50 PM Bug #20734 (Duplicate): mon: leaks caught by valgrind
- Closing this one since it doesn't have the actual allocation traceback.
- 05:04 PM Bug #20739 (Resolved): missing deletes not excluded from pgnls results?
- https://github.com/ceph/ceph/pull/16490
- 04:56 PM Bug #20753 (Fix Under Review): osd/PGLog.h: 1310: FAILED assert(0 == "invalid missing set entry f...
- This is just a bad assert - the missing entry was added by repair....
- 03:08 PM Bug #20759 (Can't reproduce): mon: valgrind detects a few leaks
- From /a/joshd-2017-07-23_23:56:38-rados:verify-wip-20747-distro-basic-smithi/1435050/remote/smithi036/log/valgrind/mo...
- 03:04 PM Bug #20747 (Fix Under Review): leaked context from handle_recovery_delete
- https://github.com/ceph/ceph/pull/16536
- 01:58 PM Bug #20751 (In Progress): osd_state not updated properly during osd-reuse-id.sh
- Hmm, we should also ensure that UP is cleared when doing the destroy, since existing clusters may have osds that !EXI...
- 01:57 PM Bug #20751 (Resolved): osd_state not updated properly during osd-reuse-id.sh
- 02:04 AM Bug #20751 (Fix Under Review): osd_state not updated properly during osd-reuse-id.sh
- https://github.com/ceph/ceph/pull/16518
- 01:43 PM Bug #20693: monthrash has spurious PG_AVAILABILITY etc warnings
- Ok, I've addressed one source of this, but there is another, see
/a/sage-2017-07-24_03:44:49-rados-wip-sage-testin...
- 11:41 AM Bug #20750 (Resolved): ceph tell mgr fs status: Row has incorrect number of values, (actual) 5!=6...
- 02:37 AM Bug #20754 (Fix Under Review): osd/PrimaryLogPG.cc: 1845: FAILED assert(!cct->_conf->osd_debug_mi...
- https://github.com/ceph/ceph/pull/16519
- 02:35 AM Bug #20754: osd/PrimaryLogPG.cc: 1845: FAILED assert(!cct->_conf->osd_debug_misdirected_ops)
- the pg was split in e80:...
- 02:35 AM Bug #20754 (Resolved): osd/PrimaryLogPG.cc: 1845: FAILED assert(!cct->_conf->osd_debug_misdirecte...
- ...
07/23/2017
- 07:08 PM Bug #20753 (Resolved): osd/PGLog.h: 1310: FAILED assert(0 == "invalid missing set entry found")
- ...
- 02:27 AM Bug #20751 (Resolved): osd_state not updated properly during osd-reuse-id.sh
- when running osd-reuse-id.sh via teuthology i reliably fail an assert about all osds support the stateful mon subscri...
- 02:12 AM Bug #20750 (Resolved): ceph tell mgr fs status: Row has incorrect number of values, (actual) 5!=6...
- ...
07/22/2017
- 06:06 PM Bug #20747 (Resolved): leaked context from handle_recovery_delete
- ...
- 03:22 AM Bug #20744 (Resolved): monthrash: WRN Manager daemon x is unresponsive. No standby daemons available
- /a/sage-2017-07-21_21:27:50-rados-wip-sage-testing-distro-basic-smithi/1427732 for latest example.
The problem app...
07/21/2017
- 08:23 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Currently it looks good. Will wait until monday to be sure.
- 08:13 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 05:20 PM Bug #20684 (Resolved): pg refs leaked when osd shutdown
- 04:43 PM Bug #20684: pg refs leaked when osd shutdown
- Honggang Yang wrote:
> https://github.com/ceph/ceph/pull/16408
merged
- 04:27 PM Bug #20739 (Resolved): missing deletes not excluded from pgnls results?
- ...
- 04:00 PM Bug #20667 (Resolved): segv in cephx_verify_authorizing during monc init
- 03:59 PM Bug #20704 (Resolved): osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- 02:38 PM Bug #20371 (Need More Info): mgr: occasional fails to send beacons (monc reconnect backoff too ag...
- all suites end up getting stuck for quite a while (enough to trigger the cutoff for a laggy/down mgr) somewhere durin...
- 02:35 PM Bug #20624 (Duplicate): cluster [WRN] Health check failed: no active mgr (MGR_DOWN)" in cluster log
- 02:10 PM Bug #19790: rados ls on pool with no access returns no error
- No worries, thanks for the update!
- 11:31 AM Bug #20705 (Resolved): repair_test fails due to race with osd start
- 07:37 AM Backport #20723 (In Progress): jewel: rados ls on pool with no access returns no error
- 06:22 AM Bug #20397 (Resolved): MaxWhileTries: reached maximum tries (105) after waiting for 630 seconds f...
- 06:22 AM Backport #20497 (Resolved): kraken: MaxWhileTries: reached maximum tries (105) after waiting for ...
- 03:50 AM Bug #20734 (Duplicate): mon: leaks caught by valgrind
- ...
07/20/2017
- 11:47 PM Bug #20545: erasure coding = crashes
- Trying to reproduce this issue in my lab
- 11:20 PM Bug #18209 (Need More Info): src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue....
- Zheng, what's the source for this bug? Any updates?
- 10:52 PM Bug #19790: rados ls on pool with no access returns no error
- Looks like we may have set the wrong state on this tracker and therefore overlooked it for the purposes of backportin...
- 08:26 PM Bug #19790 (Pending Backport): rados ls on pool with no access returns no error
- 08:03 PM Bug #19790: rados ls on pool with no access returns no error
- Thanks a lot for the fix in master/luminous, taking the liberty to follow up on this one — looks like the backport to...
- 08:52 PM Bug #20730: need new OSD_SKEWED_USAGE implementation
- see https://github.com/ceph/ceph/pull/16461
- 08:51 PM Bug #20730 (New): need new OSD_SKEWED_USAGE implementation
- I've removed the OSD_SKEWED_USAGE implementation because it isn't smart enough:
1. It doesn't understand different...
- 08:30 PM Bug #20704 (Fix Under Review): osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- https://github.com/ceph/ceph/pull/16459
- 08:08 PM Bug #20704: osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- This was a bug in persisting the missing state during split. Building a fix.
- 07:48 PM Bug #20704 (In Progress): osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- Found a bug in my ceph-objectstore-tool change that could cause this, seeing if it did in this case.
- 03:26 PM Bug #20704 (Resolved): osd/PGLog.h: 1204: FAILED assert(missing.may_include_deletes)
- ...
- 08:28 PM Backport #20723 (Resolved): jewel: rados ls on pool with no access returns no error
- https://github.com/ceph/ceph/pull/16473
- 08:28 PM Backport #20722 (Rejected): kraken: rados ls on pool with no access returns no error
- 03:58 PM Bug #20667 (Fix Under Review): segv in cephx_verify_authorizing during monc init
- https://github.com/ceph/ceph/pull/16455
I think we *also* need to fix the root cause, though, in commit bf49385679...
- 03:25 PM Bug #20667: segv in cephx_verify_authorizing during monc init
- this time with a core...
- 02:52 AM Bug #20667: segv in cephx_verify_authorizing during monc init
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419306
/a/sage-2017-07-19_15:27:16-rados-wi...
- 03:42 PM Bug #20705 (Fix Under Review): repair_test fails due to race with osd start
- https://github.com/ceph/ceph/pull/16454
- 03:40 PM Bug #20705 (Resolved): repair_test fails due to race with osd start
- ...
- 03:40 PM Feature #15835: filestore: randomize split threshold
- I spoke too soon, there is significantly improved latency and throughput in longer running tests on several osds.
- 02:54 PM Bug #19939 (Resolved): OSD crash in MOSDRepOpReply::decode_payload
- 02:34 PM Bug #20694: osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_log().get_log().obje...
- /a/kchai-2017-07-20_03:05:27-rados-wip-kefu-testing-distro-basic-mira/1422161
$ zless remote/mira104/log/ceph-osd....
- 02:53 AM Bug #20694 (Can't reproduce): osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_lo...
- ...
- 10:09 AM Bug #20690: Cluster status is HEALTH_OK even though PGs are in unknown state
- This log excerpt illustrates the problem: https://paste2.org/cne4IzG1
The log starts immediately after cephfs dep...
- 04:54 AM Bug #20645: bluesfs wal failed to allocate (assert(0 == "allocate failed... wtf"))
- sorry for not posting the version, the assert occurred in v12.0.2. maybe it's similar to #18054, but i think they are di...
- 03:02 AM Bug #20105 (Resolved): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 failure
- 03:01 AM Bug #20371: mgr: occasional fails to send beacons (monc reconnect backoff too aggressive?)
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419525
- 02:51 AM Bug #20693 (Resolved): monthrash has spurious PG_AVAILABILITY etc warnings
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419393
no osd thrashing, but not fully pe...
- 02:49 AM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419390
07/19/2017
- 09:29 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Updated two of my clusters - will report back. Thanks again.
- 06:11 AM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Yes i'm - building right now. But it will take some time to publish that one to the clusters.
- 07:59 PM Bug #19971 (Resolved): osd: deletes are performed inline during pg log processing
- 07:53 PM Bug #19971: osd: deletes are performed inline during pg log processing
- merged https://github.com/ceph/ceph/pull/15952
- 06:32 PM Bug #20667: segv in cephx_verify_authorizing during monc init
- /a/yuriw-2017-07-18_19:38:33-rados-wip-yuri-testing3_2017_7_19-distro-basic-smithi/1413393
/a/yuriw-2017-07-18_19:38...
- 03:46 PM Bug #20667: segv in cephx_verify_authorizing during monc init
- Another instance, this time jewel:...
- 05:55 PM Bug #20684: pg refs leaked when osd shutdown
- Nice debugging and presentation of your analysis! That's my favorite kind of bug report!
- 03:11 PM Bug #20684 (Fix Under Review): pg refs leaked when osd shutdown
- 03:12 AM Bug #20684: pg refs leaked when osd shutdown
- https://github.com/ceph/ceph/pull/16408
- 03:08 AM Bug #20684 (Resolved): pg refs leaked when osd shutdown
- h1. 1. summary
When kicking a pg, its ref count is greater than 1, which causes the assert to fail.
When the osd is in proce...
- 04:54 PM Bug #20690 (Need More Info): Cluster status is HEALTH_OK even though PGs are in unknown state
- In an automated test, we see PGs in unknown state, yet "ceph -s" reports HEALTH_OK. The test sees HEALTH_OK and proce...
- 03:16 PM Bug #20645 (Closed): bluesfs wal failed to allocate (assert(0 == "allocate failed... wtf"))
- can you retest on current master? this is pretty old code. please reopen if the bug is still present.
- 03:16 PM Support #20648 (Closed): odd osd acting set
- You have three hosts and want to replicate across those domains. It can't do that when one host goes down, so it's do...
- 03:02 PM Bug #20666 (Resolved): jewel -> luminous upgrade doesn't update client.admin mgr cap
- 01:28 PM Bug #19939 (Fix Under Review): OSD crash in MOSDRepOpReply::decode_payload
- https://github.com/ceph/ceph/pull/16421
- 11:55 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- occasionally, i see ...
- 11:15 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- MOSDRepOpReply is always sent by an OSD.
core dump from osd.1...
- 12:49 PM Bug #19605 (New): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- i can reproduce this...
- 03:04 AM Bug #20243 (Fix Under Review): Improve size scrub error handling and ignore system attrs in xattr...
- 02:39 AM Bug #20646: run_seed_to_range.sh: segv, tp_fstore_op timeout
- http://pulpito.ceph.com/sage-2017-07-18_16:17:27-rados-master-distro-basic-smithi/
hmm, i think this got fixed in ...
- 02:36 AM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- http://pulpito.ceph.com/sage-2017-07-18_19:06:10-rados-master-distro-basic-smithi/
failed 19/90
- 01:18 AM Feature #15835 (Resolved): filestore: randomize split threshold
- Perf testing is not indicating much benefit, so I'd hold off on backporting this.
07/18/2017
- 10:34 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @Stefan A patch for Jewel (on the current jewel branch) can be found here:
https://github.com/ceph/ceph/pul...
- 10:20 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
Analysis:
Secondary got scrub map request with scrub_to 1748'25608...
- 06:19 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @David
That would be so great! I'm happy to test any patch ;-)
- 04:54 PM Bug #20041 (In Progress): ceph-osd: PGs getting stuck in scrub state, stalling RBD
I think I've reproduced this, examining logs.
- 09:43 PM Bug #20105 (Fix Under Review): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/0 fa...
- https://github.com/ceph/ceph/pull/16402
- 08:37 PM Feature #20664 (Closed): compact OSD's omap before active
- This exists as leveldb_compact_on_mount. It may not have functioned in all releases but has been present since Januar...
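If that option is what's being asked for, compaction at OSD start would be enabled with something like the following in ceph.conf (option name taken from the comment above; whether it takes effect depends on the release):
    [osd]
        leveldb compact on mount = true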
- 12:03 PM Feature #20664 (Closed): compact OSD's omap before active
- currently, we support mon_compact_on_start. does it make sense to add this feature to the OSD?
like:...
- 08:14 PM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- We set it to 1 if the MOSDRepOpReply is encoded with features that do not contain SERVER_LUMINOUS.
...which I thin...
- 09:07 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- i found that the header.version of the MOSDRepOpReply message being decoded was 1. but i am using a vstart cluster fo...
- 05:44 AM Bug #19939: OSD crash in MOSDRepOpReply::decode_payload
- i am able to reproduce this issue using qa/workunits/fs/snaps/untar_snap_rm.sh. but not always...
- 06:04 PM Bug #20666: jewel -> luminous upgrade doesn't update client.admin mgr cap
- 03:34 PM Bug #20666 (Fix Under Review): jewel -> luminous upgrade doesn't update client.admin mgr cap
- https://github.com/ceph/ceph/pull/16395
- 01:23 PM Bug #20666: jewel -> luminous upgrade doesn't update client.admin mgr cap
- Hmm, I suspect the issue is with the bootstrap-mgr keyring. I notice
that when trying a "mgr create" on an upgraded...
- 01:22 PM Bug #20666 (Resolved): jewel -> luminous upgrade doesn't update client.admin mgr cap
- ...
- 01:40 PM Bug #20605 (Resolved): luminous mon lacks force_create_pg equivalent
- 01:38 PM Bug #20667 (Resolved): segv in cephx_verify_authorizing during monc init
- ...
- 08:23 AM Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- lowering the priority since we haven't spotted it for a while.
- 05:33 AM Bug #20625 (Duplicate): ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
07/17/2017
- 08:10 PM Bug #20653: bluestore: aios don't complete on very large writes on xenial
- ...
- 08:08 PM Bug #20653 (Can't reproduce): bluestore: aios don't complete on very large writes on xenial
- ...
- 03:05 PM Bug #20631 (Resolved): OSD needs restart after upgrade to luminous IF upgraded before a luminous ...
- 02:05 PM Bug #20631: OSD needs restart after upgrade to luminous IF upgraded before a luminous quorum
- 02:05 PM Bug #20605: luminous mon lacks force_create_pg equivalent
- 12:15 PM Bug #20602 (Resolved): mon crush smoke test can time out under valgrind
- 11:12 AM Bug #20625: ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
- tried to reproduce on btrfs locally, no luck.
- 03:00 AM Bug #20625: ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
- ...
- 02:41 AM Support #20648 (Closed): odd osd acting set
- I have three hosts.
When I set one of them down,
I got something like this.... - 02:21 AM Bug #20646 (New): run_seed_to_range.sh: segv, tp_fstore_op timeout
- ...
07/16/2017
- 09:41 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Uh, I don't think the master branch has this problem, since "list-snaps"'s result has been moved from ObjectContext::obs....
- 09:24 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- But I'm working on it.
- 08:54 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Sorry, as the related source code has been restructured and I haven't tested this on the master branch, I can't judge...
- 08:03 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Thanks for the jewel-specific fix. Has the bug been declared fixed in master, though?
- 07:26 AM Backport #17445 (Fix Under Review): jewel: list-snap cache tier missing promotion logic (was: rbd...
- 06:34 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- It seems that the code of ReplicatedPG::do_op on the "master" branch has been totally restructured, so I submitted a pull req...
- 08:09 AM Bug #20645 (Closed): bluefs wal failed to allocate (assert(0 == "allocate failed... wtf"))
it seems like the alloc hint equals the end of the wal-bdev, but the beginning of the wal-bdev is still in use...
my wal-bdev si...
07/15/2017
- 07:49 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @Jason: *argh* yes this seems to be correct.
So it seems i didn't have any logs. Currently no idea how to generate... - 07:31 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- @Stefan: just for clarification, I believe the gpg-encrypted ceph-post-file dump was the gcore of the OSD and a Debia...
- 06:38 AM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Hello @david,
the best logs i could produce with level 20 i sent to @Jason Dillaman 2 months ago (pgp encrypted). R... - 07:32 PM Backport #17445: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Definitely sounds like it could be the root-cause to me. Thanks for the investigation help.
- 02:48 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- I encountered the same problem.
I debugged a little, and found that this might have something to do with the "cache... - 02:34 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- I encountered the same problem.
I debugged a little, and found that this might have something to do with the "cache... - 08:27 AM Bug #20605 (Fix Under Review): luminous mon lacks force_create_pg equivalent
- https://github.com/ceph/ceph/pull/16353
07/14/2017
- 11:01 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- Stefan Priebe wrote:
> Anything i could provide or test? VMs are still crashing every night...
Can you reproduce ... - 09:51 PM Bug #20041: ceph-osd: PGs getting stuck in scrub state, stalling RBD
Based on the earlier information:
subset_last_update = {
version = 20796861,
epoch = 453051,
...- 08:32 PM Backport #20638 (In Progress): kraken: EPERM: cannot set require_min_compat_client to luminous: 6...
- 08:22 PM Backport #20638 (Need More Info): kraken: EPERM: cannot set require_min_compat_client to luminous...
- Now I'm not sure
- 08:11 PM Backport #20638 (In Progress): kraken: EPERM: cannot set require_min_compat_client to luminous: 6...
- 08:10 PM Backport #20638 (Resolved): kraken: EPERM: cannot set require_min_compat_client to luminous: 6 co...
- https://github.com/ceph/ceph/pull/16342
- 08:31 PM Backport #20639 (In Progress): jewel: EPERM: cannot set require_min_compat_client to luminous: 6 ...
- 08:23 PM Backport #20639 (Need More Info): jewel: EPERM: cannot set require_min_compat_client to luminous:...
- Not sure if the PR really fixes this bug
- 08:12 PM Backport #20639 (In Progress): jewel: EPERM: cannot set require_min_compat_client to luminous: 6 ...
- 08:10 PM Backport #20639 (Resolved): jewel: EPERM: cannot set require_min_compat_client to luminous: 6 con...
- https://github.com/ceph/ceph/pull/16343
- 08:09 PM Bug #20546 (Resolved): buggy osd down warnings by subtree vs crush device classes
- 03:57 PM Bug #20602 (Fix Under Review): mon crush smoke test can time out under valgrind
- 03:52 PM Bug #20602: mon crush smoke test can time out under valgrind
- Valgrind is slow to do the fork and cleanup; that's why we keep timing out. Blame e189f11fcde6829cc7f86894b913bc1a3f...
- 03:31 PM Bug #20602: mon crush smoke test can time out under valgrind
- Valgrind is slow to do the fork and cleanup; that's why we keep timing out. Blame e189f11fcde6829cc7f86894b913bc1a3f...
- 01:57 PM Bug #20602: mon crush smoke test can time out under valgrind
- A simple workaround (sketched below) would be to make a 'mon smoke test crush changes' option and turn it off when using valgrind.. wh...
- 02:55 AM Bug #20602: mon crush smoke test can time out under valgrind
- /a/kchai-2017-07-13_18:13:10-rados-wip-kefu-testing-distro-basic-smithi/1396642
rados/singleton-nomsgr/{all/valgri... - 02:51 AM Bug #20602: mon crush smoke test can time out under valgrind
- /a/sage-2017-07-13_20:38:15-rados-wip-sage-testing-distro-basic-smithi/1397207
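To make the proposed workaround concrete, a rough ceph.conf sketch of what such an off switch might look like; the option name mon_crush_smoke_test is purely hypothetical (the comments above only propose adding one), so treat this as illustration, not a real knob:
    [mon]
    # hypothetical option: skip the fork-based crushtool smoke test, which is very slow under valgrind
    mon_crush_smoke_test = false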
that's two consecutive runs for me.. - 03:31 PM Bug #20601 (Duplicate): mon comamnds time out due to pool create backlog w/ valgrind
- ok, the problem is that the fork-based crushtool test is very slow under valgrind (valgrind has to do init/cleanup on...
- 03:23 PM Bug #20601: mon commands time out due to pool create backlog w/ valgrind
- It isn't that pool creations are serialized, actually; they are already batched. Maybe valgrind is just making it sl...
- 02:51 AM Bug #20601: mon commands time out due to pool create backlog w/ valgrind
- another failure with same cause, different symptom: this time a 'osd out 0' timed out due to a bunch of pool creates....
- 03:19 PM Bug #20475 (Pending Backport): EPERM: cannot set require_min_compat_client to luminous: 6 connect...
- https://github.com/ceph/ceph/pull/16340 merged to master
backports for kraken and jewel:
https://github.com/ceph/... - 02:05 PM Bug #20475 (In Progress): EPERM: cannot set require_min_compat_client to luminous: 6 connected cl...
- 03:04 AM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- ok, smithi083 was (is!) locked by
/home/teuthworker/archive/teuthology-2017-07-13_05:10:02-fs-kraken-distro-basic-... - 02:57 AM Bug #20475: EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look l...
- baddy is...
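As background on the EPERM in this ticket, a short sketch of the CLI sequence involved; the commands exist on luminous, but the ordering and the forcing flag shown here are illustrative, not taken from the ticket:
    # see which feature releases the connected clients report before raising the bar
    ceph features
    # this returns EPERM while pre-luminous (e.g. jewel-looking) clients are still connected
    ceph osd set-require-min-compat-client luminous
    # forcing it anyway (use with care)
    ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it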
- 02:56 PM Bug #20631 (Fix Under Review): OSD needs restart after upgrade to luminous IF upgraded before a l...
- https://github.com/ceph/ceph/pull/16341
- 02:42 PM Bug #20631 (Resolved): OSD needs restart after upgrade to luminous IF upgraded before a luminous ...
- If an OSD is upgraded to luminous before the monmap has the luminous feature, it will need to be restarted before ...
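As a rough operational sketch of the situation described above (standard commands, but the sequence is illustrative and not part of the ticket):
    # check whether the monmap has the luminous feature yet
    ceph mon feature ls
    # once the luminous feature is persistent in the monmap, restart the affected OSDs,
    # e.g. on a systemd host (osd id 0 used as an example)
    systemctl restart ceph-osd@0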
- 09:51 AM Fix #20627 (New): Clean config special cases out of common_preinit
- Post-https://github.com/ceph/ceph/pull/16211, we should use set_daemon_default for this:...
- 03:16 AM Bug #20600 (Resolved): 'ceph pg set_full_ratio ...' blocks on luminous
- 03:15 AM Bug #20617 (Resolved): Exception: timed out waiting for mon to be updated with osd.0: 0 < 4724464...
- 03:14 AM Bug #20626 (Can't reproduce): failed to become clean before timeout expired, pgs stuck unknown
- ...
- 02:50 AM Bug #20625 (Duplicate): ceph_test_filestore_idempotent_sequence aborts in run_seed_to_range.sh
- ...
- 02:30 AM Bug #20624 (Duplicate): cluster [WRN] Health check failed: no active mgr (MGR_DOWN)" in cluster log
- mgr.x...