Activity
From 08/17/2019 to 09/15/2019
09/15/2019
- 01:59 PM Bug #41716 (Resolved): LibRadosTwoPoolsPP.ManifestUnset fails
- 01:51 PM Bug #41716: LibRadosTwoPoolsPP.ManifestUnset fails
- This issue is fixed by https://github.com/ceph/ceph/pull/29985
When the error occurs, the following ops are executed...
- 03:05 AM Bug #41834 (Resolved): qa: EC Pool configuration and slow op warnings for OSDs caused by recent m...
- See: http://pulpito.ceph.com/pdonnell-2019-09-14_22:39:31-fs-master-distro-basic-smithi/
Recent run of fs suite on...
09/13/2019
- 10:29 PM Feature #41831 (Resolved): tools/rados: allow list objects in a specific pg in a pool
- This one is already present in nautilus.
- 04:41 PM Bug #41817: qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
- David, can you please take a look at this whenever you get a chance.
- 01:31 PM Bug #41817 (Closed): qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
- ...
- 04:40 PM Bug #41816: Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert info.last_comp...
- I'll try to see if I can reproduce this.
- 01:30 PM Bug #41816 (Resolved): Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert inf...
- ...
- 04:37 PM Bug #41735 (Resolved): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
- See https://tracker.ceph.com/issues/41735#note-3 and https://github.com/rook/rook/pull/3847/commits/11d3831d742639148...
- 04:29 PM Bug #24531 (Pending Backport): Mimic MONs have slow/long running ops
- 09:09 AM Backport #40993 (Rejected): mimic: Ceph status in some cases does not report slow ops
- backports will be pursued in https://tracker.ceph.com/issues/41741
- 07:54 AM Bug #41758 (Duplicate): Ceph status in some cases does not report slow ops
- 05:13 AM Feature #40420: Introduce an ceph.conf option to disable HEALTH_WARN when nodeep-scrub/scrub flag...
- What are the backport targets for this? I don't see a health mute tracker referenced by any of the commits, but this...
- 01:55 AM Backport #41712 (In Progress): nautilus: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::reg...
- https://github.com/ceph/ceph/pull/30371
09/12/2019
- 10:38 PM Backport #40993: mimic: Ceph status in some cases does not report slow ops
- Nathan Cutler wrote:
> backport ticket opened prematurely - setting "Need More Info" pending:
>
> 1. opening of P...
- 08:19 PM Backport #40993 (Need More Info): mimic: Ceph status in some cases does not report slow ops
- backport ticket opened prematurely - setting "Need More Info" pending:
1. opening of PR fixing the issue in master...
- 08:18 PM Backport #40993 (New): mimic: Ceph status in some cases does not report slow ops
- 11:58 AM Backport #40993: mimic: Ceph status in some cases does not report slow ops
- Converting this to track backport from master where the fix is under review.
- 02:03 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
- We had to scrap the idea of changing the backend and went for upgrading the OSDs to Bluestore. Our backfilling issue ...
- 01:58 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
- David:
Did you run into a solution for this? We're seeing similar issues but the only possible alternative seems ...
- 08:32 AM Backport #41785 (Resolved): nautilus: Make dumping of reservation info congruent between scrub an...
- https://github.com/ceph/ceph/pull/31444
- 05:41 AM Backport #41764 (In Progress): nautilus: TestClsRbd.sparsify fails when using filestore
- https://github.com/ceph/ceph/pull/30354
- 02:24 AM Bug #23647 (In Progress): thrash-eio test can prevent recovery
- http://pulpito.ceph.com/nojha-2019-09-06_14:33:54-rados:singleton-wip-41385-3-distro-basic-smithi/ - this is where I ...
- 01:22 AM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...
- Reproduced several times with debug_ms = 20
http://pulpito.ceph.com/dzafman-2019-09-11_15:28:37-rados-wip-zafman...
- 01:21 AM Bug #41735: pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
- sorry I missed that...
09/11/2019
- 10:28 PM Bug #41735 (Fix Under Review): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
- Rook should probably set this option explicitly, since it is working with nautilus and we won't backport this (or the...
- 09:29 PM Bug #41735 (Need More Info): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
- Can you attach the 'ceph health detail' output so I can see which warning it's throwing?
- 09:33 PM Bug #41669 (Pending Backport): Make dumping of reservation info congruent between scrub and recovery
- 09:11 PM Bug #41680 (Won't Fix): Removed OSDs with outstanding peer failure reports crash the monitor
- OSD failure reports will die out on their own eventually and there's no general reason to expect a removed OSD was in...
- 09:11 PM Bug #41639 (Rejected): mon/MgrMonitor: enable pg_autoscaler by default for nautilus
- 09:10 PM Bug #41693 (Need More Info): a accidental problems with osd detection algorithm in monitor
- Can you explain in more detail exactly what happened here?
It sounds like you have three hosts with colocated OSDs...
- 09:08 PM Bug #41718 (Fix Under Review): ceph osd stat JSON output incomplete
- 03:28 PM Bug #41758 (Fix Under Review): Ceph status in some cases does not report slow ops
- 01:13 PM Bug #41758: Ceph status in some cases does not report slow ops
- After applying the fix, health warning pertaining to slow ops show up as shown below,...
- 12:57 PM Bug #41758: Ceph status in some cases does not report slow ops
- PR https://github.com/ceph/ceph/pull/30337 addresses this issue.
- 09:29 AM Bug #41758 (Duplicate): Ceph status in some cases does not report slow ops
- In cases when only osds report slow ops, it is observed that ceph summary status doesn't report the same. This issue ...
- 01:28 PM Backport #41764 (Resolved): nautilus: TestClsRbd.sparsify fails when using filestore
- https://github.com/ceph/ceph/pull/30354
- 09:14 AM Backport #40993: mimic: Ceph status in some cases does not report slow ops
- Further to my findings earlier, I confirmed that the "reported" flag is being reset in case ONLY an osd daemon report...
- 04:08 AM Bug #41754 (New): Use dump_stream() instead of dump_float() for floats where max precision isn't ...
- Some examples from osd dump are below. The full_ratio is .95, backfill_ratio .90 and nearfull_ratio .85.
...
- 01:25 AM Bug #41661 (Resolved): radosbench_omap_write cleanup slow/stuck
- 12:25 AM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...
- 12:24 AM Bug #41743 (In Progress): Long heartbeat ping times on front interface seen, longest is 2237.999 ...
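The precision concern raised in Bug #41754 can be illustrated with a minimal sketch. It assumes the ratios are held as single-precision floats and then printed with more digits than that type carries; this is a Python illustration of the general effect, not Ceph's formatter code, and the helper name `as_float32` is hypothetical.

```python
import struct


def as_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Ratio names and values as quoted in the tracker note.
ratios = {"full_ratio": 0.95, "backfill_ratio": 0.90, "nearfull_ratio": 0.85}

for name, ratio in ratios.items():
    stored = as_float32(ratio)
    # repr() shows the rounding error; a short format hides it again.
    print(f"{name}: full precision {stored!r}, two decimals {stored:.2f}")
```

Under that assumption, printing at full precision exposes the rounding error of the stored value, while a shorter format recovers the configured figure.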
09/10/2019
- 10:42 PM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...
- The only OSDs involved are osd.6 and osd.0.
Slow heartbeat ping on front interface from osd.6 to osd.0 2237.999 ms...
- 12:12 PM Bug #41743 (Resolved): Long heartbeat ping times on front interface seen, longest is 2237.999 mse...
- "2019-09-09T22:25:11.794749+0000 mon.b (mon.0) 389 : cluster [WRN] Health check failed: Long heartbeat ping times on ...
- 08:21 PM Bug #41661 (Fix Under Review): radosbench_omap_write cleanup slow/stuck
- 07:54 PM Bug #41661: radosbench_omap_write cleanup slow/stuck
- Clearly, filestore-xfs.yaml is the one failing consistently.
See http://pulpito.ceph.com/nojha-2019-09-09_23:22:30...
- 05:03 PM Backport #40082 (In Progress): luminous: osd: Better error message when OSD count is less than os...
- 02:59 PM Bug #41748 (Can't reproduce): log [ERR] : 7.19 caller_ops.size 62 > log size 61
- ...
- 08:27 AM Bug #41721 (Pending Backport): TestClsRbd.sparsify fails when using filestore
- 06:45 AM Backport #41640 (In Progress): nautilus: FAILED ceph_assert(info.history.same_interval_since != 0...
- 06:36 AM Backport #41530 (Resolved): mimic: doc: mon_health_to_clog_* values flipped
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30227
m...
- 06:34 AM Backport #41532 (Resolved): luminous: Move bluefs alloc size initialization log message to log le...
- 06:32 AM Backport #38551: luminous: core: lazy omap stat collection
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29190
m...
- 05:42 AM Backport #41703 (In Progress): nautilus: oi(object_info_t).size does not match on disk size
- https://github.com/ceph/ceph/pull/30278
- 03:55 AM Backport #41704 (In Progress): mimic: oi(object_info_t).size does not match on disk size
- https://github.com/ceph/ceph/pull/30275
- 01:01 AM Bug #41735 (Resolved): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
- Old pools have auto_scale on and ceph health still shows HEALTH_WARN (20 < 30)...
09/09/2019
- 11:38 PM Bug #41661: radosbench_omap_write cleanup slow/stuck
- The current timeout (config.get('time', 360) * 30 + 300 = 300*30 + 300) of 9300 seconds is not enough to clean up the...
- 10:25 PM Feature #38136 (Resolved): core: lazy omap stat collection
- 10:25 PM Backport #38551 (Resolved): luminous: core: lazy omap stat collection
- 09:45 PM Bug #41601: oi(object_info_t).size does not match on disk size
- Greg Farnum wrote:
> Hmm I was going to move this into the RADOS project tracker but now I'm leaving it because I'm ...
- 08:20 PM Bug #41601: oi(object_info_t).size does not match on disk size
- Hmm I was going to move this into the RADOS project tracker but now I'm leaving it because I'm not sure if that will ...
- 09:35 PM Backport #41731 (Need More Info): nautilus: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(pe...
- note that the backport of https://github.com/ceph/ceph/pull/30059 should happen after https://github.com/ceph/ceph/pu...
- 07:39 PM Backport #41731 (Rejected): nautilus: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_mis...
- 09:34 PM Backport #41732 (Need More Info): mimic: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_...
- 09:33 PM Backport #41732: mimic: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fro...
- note that the backport of https://github.com/ceph/ceph/pull/30059 should happen after https://github.com/ceph/ceph/pu...
- 07:39 PM Backport #41732 (Rejected): mimic: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missin...
- 09:33 PM Backport #41730 (Need More Info): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(pe...
- note that the backport of https://github.com/ceph/ceph/pull/30059 should happen after https://github.com/ceph/ceph/pu...
- 07:39 PM Backport #41730 (Resolved): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_mis...
- https://github.com/ceph/ceph/pull/31855
- 09:03 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
- Nathan Cutler wrote:
> @Neha - backport all three PRs?
Yes, note that the backport of https://github.com/ceph/cep...
- 07:41 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
- @Neha - backport all three PRs?
- 04:53 PM Bug #41385 (Pending Backport): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.co...
- 08:51 PM Bug #41065 (Closed): new osd added to cluster upgraded from 13 to 14 will down after some days
- It's not clear from these snippets what issue you're actually experiencing. The "bad authorizer" suggests either a cl...
- 08:37 PM Bug #41406: common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient bootstrap
- That's a weird one; perhaps the MonClient should behave differently instead.
(Note that this is a problem only on ...
- 04:20 PM Bug #41689 (Resolved): Network ping test fails in TEST_network_ping_test2
- This is a follow on fix for the feature https://tracker.ceph.com/issues/40640. The backport is included as part of t...
- 10:50 AM Bug #41721 (Fix Under Review): TestClsRbd.sparsify fails when using filestore
- 10:24 AM Bug #41721 (Resolved): TestClsRbd.sparsify fails when using filestore
- it's a regression introduced by https://github.com/ceph/ceph/pull/30061
see http://pulpito.ceph.com/kchai-2019-09-...
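The timeout arithmetic quoted in the 11:38 PM note on Bug #41661 can be checked with a short sketch. The `config` dict and its `time` value of 300 are assumptions read off the quoted expression, and `cleanup_timeout` is a hypothetical helper name; the actual task code is not shown in this feed.

```python
def cleanup_timeout(config: dict) -> int:
    """Evaluate the expression quoted in the note: time * 30 + 300 seconds."""
    return config.get("time", 360) * 30 + 300


# Assumed job setting implied by "300*30 + 300" in the note.
print(cleanup_timeout({"time": 300}))

# With no 'time' key, the default of 360 from the quoted expression applies.
print(cleanup_timeout({}))
```

With `time` set to 300 the expression evaluates to the 9300 seconds mentioned in the note.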
09/08/2019
- 06:16 PM Bug #41718 (Resolved): ceph osd stat JSON output incomplete
- ...
- 09:22 AM Bug #40583 (Resolved): Lower the default value of osd_deep_scrub_large_omap_object_key_threshold
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:20 AM Backport #40653 (Resolved): luminous: Lower the default value of osd_deep_scrub_large_omap_object...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29175
m...
09/07/2019
- 08:07 PM Bug #41716 (Resolved): LibRadosTwoPoolsPP.ManifestUnset fails
- ...
- 09:29 AM Backport #41712 (Resolved): nautilus: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::regist...
- https://github.com/ceph/ceph/pull/30371
- 09:23 AM Backport #41705 (Resolved): nautilus: Incorrect logical operator in Monitor::handle_auth_request()
- https://github.com/ceph/ceph/pull/31038
- 09:23 AM Backport #41704 (Resolved): mimic: oi(object_info_t).size does not match on disk size
- https://github.com/ceph/ceph/pull/30275
- 09:23 AM Backport #41703 (Resolved): nautilus: oi(object_info_t).size does not match on disk size
- https://github.com/ceph/ceph/pull/30278
- 09:23 AM Backport #41702 (Rejected): luminous: oi(object_info_t).size does not match on disk size
- 07:45 AM Backport #41697 (In Progress): luminous: Network ping monitoring
- 07:31 AM Backport #41697 (Resolved): luminous: Network ping monitoring
- https://github.com/ceph/ceph/pull/30230
- 07:43 AM Backport #41696 (In Progress): mimic: Network ping monitoring
- 07:31 AM Backport #41696 (Resolved): mimic: Network ping monitoring
- https://github.com/ceph/ceph/pull/30225
- 07:34 AM Backport #41695 (In Progress): nautilus: Network ping monitoring
- 07:31 AM Backport #41695 (Resolved): nautilus: Network ping monitoring
- https://github.com/ceph/ceph/pull/30195
- 02:35 AM Bug #41693 (Need More Info): a accidental problems with osd detection algorithm in monitor
- There is an accidental problem with the osd detection algorithm in the monitor. In a three-cluster environment, HostA/HostB/Ho...
09/06/2019
- 11:49 PM Backport #41531 (In Progress): nautilus: Move bluefs alloc size initialization log message to log...
- 10:15 PM Backport #41531 (Need More Info): nautilus: Move bluefs alloc size initialization log message to ...
- non-trivial backport - needs https://github.com/ceph/ceph/pull/29537 at least
- 11:38 PM Bug #41385 (Fix Under Review): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.co...
- https://github.com/ceph/ceph/pull/30119 (merged September 4, 2019)
https://github.com/ceph/ceph/pull/30059 (merged S...
- 10:21 PM Backport #41533 (In Progress): mimic: Move bluefs alloc size initialization log message to log le...
- 10:14 PM Backport #41533 (Need More Info): mimic: Move bluefs alloc size initialization log message to log...
- non-trivial backport - needs https://github.com/ceph/ceph/pull/29537 at least
- 10:03 PM Backport #41530 (In Progress): mimic: doc: mon_health_to_clog_* values flipped
- 08:01 PM Backport #41499 (Need More Info): mimic: backfill_toofull while OSDs are not full (Unneccessary H...
- The backport needs 3b8f86c8b09b9143d3e25ab34b51057581b48114 to be cherry-picked, first, for it to make sense, but tha...
- 03:34 PM Backport #41499 (In Progress): mimic: backfill_toofull while OSDs are not full (Unneccessary HEAL...
- 07:42 PM Backport #41502 (In Progress): mimic: Warning about past_interval bounds on deleting pg
- 07:03 PM Bug #41689: Network ping test fails in TEST_network_ping_test2
- ...
- 06:37 PM Bug #41689 (Fix Under Review): Network ping test fails in TEST_network_ping_test2
- 06:18 PM Bug #41689 (Resolved): Network ping test fails in TEST_network_ping_test2
- http://pulpito.ceph.com/kchai-2019-09-06_15:05:18-rados-wip-kefu-testing-2019-09-06-1807-distro-basic-smithi/4283774/...
- 05:27 PM Bug #41429 (Pending Backport): Incorrect logical operator in Monitor::handle_auth_request()
- 05:08 PM Bug #38513: luminous: "AsyncReserver.h: 190: FAILED assert(!queue_pointers.count(item) && !in_pro...
- /a/nojha-2019-09-05_23:53:20-rados-wip-40769-luminous-distro-basic-smithi/4279855/
- 03:29 PM Backport #41490 (In Progress): mimic: OSDCap.PoolClassRNS test aborts
- 03:28 PM Backport #41449 (In Progress): mimic: mon: C_AckMarkedDown has not handled the Callback Arguments
- 01:29 PM Backport #40993 (In Progress): mimic: Ceph status in some cases does not report slow ops
- The logs relating to this tracker didn't indicate anything obvious upon analysis. The issue was reproduced locally on...
- 10:04 AM Bug #41680 (Resolved): Removed OSDs with outstanding peer failure reports crash the monitor
- The OSDs have been removed, but they previously reported anomaly information for a partner OSD. However, reporters of failure...
- 09:50 AM Bug #41677: Cephmon:fix mon crash
- shuguang wang wrote:
> Reduction num of osd in primary mon of three node cluster, the primary mon crash of occasiona...
- 08:45 AM Bug #41677 (Fix Under Review): Cephmon:fix mon crash
- 08:43 AM Bug #41677: Cephmon:fix mon crash
- shuguang wang wrote:
> Reduction num of osd in primary mon of three node cluster, the primary mon crash of occasiona...
- 08:42 AM Bug #41677: Cephmon:fix mon crash
- The OSDs have been removed, but they previously reported anomaly information for a partner OSD. However, failure_info of this...
- 05:34 AM Bug #41677 (Resolved): Cephmon:fix mon crash
- When the number of OSDs is reduced on the primary mon of a three-node cluster, the primary mon occasionally crashes.
- 05:53 AM Bug #41427 (Resolved): set-chunk raced with deep-scrub
- 05:52 AM Bug #41514 (Resolved): in-flight manifest ops not properly cancelled on interval changing
- 03:17 AM Bug #41601 (Pending Backport): oi(object_info_t).size does not match on disk size
09/05/2019
- 10:30 PM Bug #41657 (Rejected): osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find_best_in...
- this is caused by a bug in my test branch
- 09:19 PM Bug #41669: Make dumping of reservation info congruent between scrub and recovery
- 05:47 PM Bug #41669 (Resolved): Make dumping of reservation info congruent between scrub and recovery
- Rename dump_reservations to dump_recovery_reservations
Add dump_scrub_reservations
- 06:59 PM Feature #40640 (Pending Backport): Network ping monitoring
- 01:50 PM Backport #41447 (In Progress): mimic: osd/PrimaryLogPG: Access destroyed references in finish_deg...
- 01:01 PM Backport #41351 (In Progress): mimic: hidden corei7 requirement in binary packages
- 12:49 PM Backport #41291 (In Progress): mimic: filestore pre-split may not split enough directories
- 12:48 PM Backport #40732 (In Progress): mimic: mon: auth mon isn't loading full KeyServerData after restart
- 12:36 PM Backport #40083 (In Progress): mimic: osd: Better error message when OSD count is less than osd_p...
- 07:25 AM Feature #41666 (Resolved): Issue a HEALTH_WARN when a Pool is configured with [min_]size == 1
- To prevent the user from experiencing data loss, Ceph should issue a health warning if any Pool is configured with a ...
- 12:00 AM Bug #41661 (Resolved): radosbench_omap_write cleanup slow/stuck
- ...
09/04/2019
- 09:34 PM Feature #38458: Ceph does not have command to show current osd primary-affinity
- "ceph osd dump", perhaps with a detail or json formatting, includes that information.
I don't think we have any qu...
- 05:49 AM Feature #38458: Ceph does not have command to show current osd primary-affinity
- Greg, what is the exact command?
- 05:58 PM Bug #41657 (Fix Under Review): osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find...
- 05:53 PM Bug #41657: osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find_best_info_ignore_h...
- The find_best_info process excludes getting a master log from an osd with an old(er) last_epoch_started. However, th...
- 05:51 PM Bug #41657 (Rejected): osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find_best_in...
- ...
- 01:27 PM Feature #41650 (New): Convert between EC profiles online
- Users have repeatedly voiced the need to convert/modify an EC profile while the cluster was running, in response to c...
- 09:06 AM Feature #41647 (Resolved): pg_autoscaler should show a warning if pg_num isn't a power of two
- As the pg_autoscaler will be automatically turned on with the 14.2.4 release and future releases I would like to enha...
- 03:36 AM Bug #40646 (Resolved): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-8-libstd...
- 01:23 AM Bug #40646 (Fix Under Review): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-...
- 12:43 AM Bug #40646 (Resolved): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-8-libstd...
- 12:32 AM Bug #38483 (Pending Backport): FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_...
09/03/2019
- 09:19 PM Bug #20283: qa: missing even trivial tests for many commands
- Updated script run
'cache drop' has no apparent tests
'cache status' has no apparent tests
'config ls' has no ap...
- 09:04 PM Bug #41610 (Rejected): python Rados library does not support mon_host bracketed syntax
- Your version of librados is too old - 0.69 is cuttlefish. For v1/v2 addresses like that, you need nautilus (v14.2.0+)
- 07:26 AM Bug #41610 (Rejected): python Rados library does not support mon_host bracketed syntax
- Ceph Nautilus deployed using ceph-ansible
By default it generates ceph.conf with bracketed mon_host syntax (see ht...
- 08:48 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
- Addressing backport-create-issue script complaint:...
- 06:25 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
- Luminous doesn't have the issue!
- 08:40 PM Backport #41640 (Resolved): nautilus: FAILED ceph_assert(info.history.same_interval_since != 0) i...
- https://github.com/ceph/ceph/pull/30280
- 08:37 PM Bug #41639 (Rejected): mon/MgrMonitor: enable pg_autoscaler by default for nautilus
- Only https://github.com/ceph/ceph/pull/30112/commits/23edfd202ec1d98cc8c3d52aaaae1d985417aacf needs to be backported ...
- 08:08 PM Backport #41350: nautilus: hidden corei7 requirement in binary packages
- @Harry - since this is a backport ticket (just for tracking the nautilus backport), I copied your comment to the pare...
- 08:07 PM Bug #41330: hidden corei7 requirement in binary packages
- Hi Harry. I don't know about any distros other than openSUSE and SUSE Linux Enterprise. In those distros, there isn't...
- 08:01 PM Bug #41330: hidden corei7 requirement in binary packages
- At https://tracker.ceph.com/issues/41350#note-3 (i.e. in the nautilus backport ticket), Harry Coin wrote:
"The 'si...
- 07:16 PM Bug #39152 (Duplicate): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
- yep, dup of #39693
- 06:25 PM Backport #41582 (Rejected): luminous: backfill_toofull seen on cluster where the most full OSD is...
- 02:19 PM Bug #37654 (Pending Backport): FAILED ceph_assert(info.history.same_interval_since != 0) in PG::s...
- 02:04 PM Bug #41601 (Fix Under Review): oi(object_info_t).size does not match on disk size
- 02:33 AM Bug #41601: oi(object_info_t).size does not match on disk size
- https://github.com/ceph/ceph/pull/30085
- 06:22 AM Bug #40646 (Fix Under Review): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-...
- * https://github.com/ceph/ceph-build/pull/1387
* https://github.com/ceph/ceph/pull/30088
* https://github.com/ceph/...
- 01:19 AM Backport #41595 (In Progress): mimic: ceph-objectstore-tool can't remove head with bad snapset
- https://github.com/ceph/ceph/pull/30081
- 01:16 AM Backport #41596 (In Progress): nautilus: ceph-objectstore-tool can't remove head with bad snapset
- https://github.com/ceph/ceph/pull/30080
09/02/2019
- 02:04 PM Backport #41350: nautilus: hidden corei7 requirement in binary packages
- Thanks! The 'silent' requirement that ceph run only on -march=corei7 capable servers killed two ubuntu eoan based sy...
- 01:29 PM Bug #41601 (Resolved): oi(object_info_t).size does not match on disk size
- In our test environment(ceph version 14.2.1(nautilus) + replicated pool), we found scrub error like bug23701. We use ...
- 10:09 AM Backport #41597 (Rejected): luminous: ceph-objectstore-tool can't remove head with bad snapset
- 10:09 AM Backport #41596 (Resolved): nautilus: ceph-objectstore-tool can't remove head with bad snapset
- https://github.com/ceph/ceph/pull/30080
- 10:09 AM Backport #41595 (Resolved): mimic: ceph-objectstore-tool can't remove head with bad snapset
- https://github.com/ceph/ceph/pull/30081
- 10:07 AM Backport #41582 (Need More Info): luminous: backfill_toofull seen on cluster where the most full ...
08/31/2019
- 03:03 AM Bug #38238 (Duplicate): rados/test.sh: api_aio_pp doesn't seem to start
- 02:12 AM Bug #38238: rados/test.sh: api_aio_pp doesn't seem to start
- ...
- 12:19 AM Bug #41517 (Resolved): Missing head object at primary with snapshots crashes primary
- 12:14 AM Bug #41522 (Pending Backport): ceph-objectstore-tool can't remove head with bad snapset
08/30/2019
- 10:41 PM Bug #41156: dump_float() poor output
- Looking at osd dump output in teuthology.log on a test run, I see this output, which is ugly:
"full_ratio": ...
- 08:34 PM Bug #40522: on_local_recover doesn't touch?
- Failed multiple times: http://pulpito.ceph.com/dzafman-2019-08-28_09:11:55-rados-wip-zafman-testing-distro-basic-smit...
- 06:36 PM Backport #41582: luminous: backfill_toofull seen on cluster where the most full OSD is at 1%
- This bug doesn't exist on Luminous as far as I can tell; I've only ever seen it since Mimic.
- 08:00 AM Backport #41582 (Rejected): luminous: backfill_toofull seen on cluster where the most full OSD is...
- 06:15 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
- Have been able to reproduce it here: http://pulpito.ceph.com/nojha-2019-08-28_19:12:09-rados:singleton-master-distro-...
- 04:43 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
- We didn't see this problem on any of our clusters with the 12.2.12 release, so maybe this isn't the fix if a backport...
- 08:01 AM Bug #41200 (Resolved): osd: fix ceph_assert(mem_avail >= 0) caused by the unset cgroup memory limit
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:01 AM Backport #41584 (Resolved): mimic: backfill_toofull seen on cluster where the most full OSD is at 1%
- https://github.com/ceph/ceph/pull/32361
- 08:00 AM Backport #41583 (Resolved): nautilus: backfill_toofull seen on cluster where the most full OSD is...
- https://github.com/ceph/ceph/pull/29999
08/29/2019
- 11:16 PM Bug #23647: thrash-eio test can prevent recovery
- Several proposals that might improve things:
* from Josh, just turn down the odds
* from Greg, is it plausible to...
- 09:24 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
- Here's the chain of events that causes this:
Two objects go missing on the primary, and we want to recover them fr...
- 06:54 PM Bug #41577: Erasure-Coded storage in bluestore has larger disk usage than expected
- The issue of small objects using more space seems related to https://tracker.ceph.com/issues/41417
- 06:53 PM Bug #41577 (New): Erasure-Coded storage in bluestore has larger disk usage than expected
- The test is done in ceph 14.2.1
We've tested Erasure Coded storage with the same amount of data, which is 800 GiB....
- 05:55 PM Bug #41429 (Fix Under Review): Incorrect logical operator in Monitor::handle_auth_request()
- 03:36 PM Bug #41526 (Rejected): Choosing the next PG for a deep scrubs wrong.
- 02:43 PM Bug #37775 (Resolved): some pg_created messages not sent to mon
- 02:40 PM Bug #41517: Missing head object at primary with snapshots crashes primary
- Backporting note:
cherry pick https://github.com/ceph/ceph/pull/27575 first, and then https://github.com/ceph/ceph...
- 02:39 PM Bug #39286: primary recovery local missing object did not update obc
- Backports to luminous, mimic, and nautilus are being handled via #41517
- 02:38 PM Bug #39286 (Resolved): primary recovery local missing object did not update obc
- Since this introduced a regression in master, I propose to refrain from backporting it separately, but instead backpo...
- 12:35 PM Backport #41568 (In Progress): nautilus: doc: pg_num should always be a power of two
- 08:14 AM Backport #41568 (Resolved): nautilus: doc: pg_num should always be a power of two
- https://github.com/ceph/ceph/pull/30004
- 12:29 PM Backport #41529 (In Progress): nautilus: doc: mon_health_to_clog_* values flipped
- 12:26 PM Bug #39152 (New): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
- This is problematic to backport because the "Pull request ID" field is not populated and none of the notes mention a ...
- 11:28 AM Backport #41503 (In Progress): nautilus: Warning about past_interval bounds on deleting pg
- 11:21 AM Backport #41501 (In Progress): nautilus: backfill_toofull while OSDs are not full (Unneccessary H...
- 11:17 AM Backport #41491 (In Progress): nautilus: OSDCap.PoolClassRNS test aborts
- 11:15 AM Backport #41455: nautilus: osd: fix ceph_assert(mem_avail >= 0) caused by the unset cgroup memory...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29745
m...
- 11:14 AM Backport #41455 (Resolved): nautilus: osd: fix ceph_assert(mem_avail >= 0) caused by the unset cg...
- 10:47 AM Backport #41453 (In Progress): nautilus: mon: C_AckMarkedDown has not handled the Callback Arguments
- 10:24 AM Backport #41448 (In Progress): nautilus: osd/PrimaryLogPG: Access destroyed references in finish_...
- 10:20 AM Backport #40889 (Need More Info): luminous: Pool settings aren't populated to OSD after restart.
- non-trivial backport
- 10:20 AM Backport #40890 (Need More Info): mimic: Pool settings aren't populated to OSD after restart.
- non-trivial backport
- 10:20 AM Backport #40891 (Need More Info): nautilus: Pool settings aren't populated to OSD after restart.
- non-trivial backport
- 10:10 AM Bug #40112 (Resolved): mon: rados/multimon tests fail with clock skew
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:09 AM Backport #40228 (Resolved): nautilus: mon: rados/multimon tests fail with clock skew
- backport PR https://github.com/ceph/ceph/pull/28576
merge commit 1bc3cc4aa2588bef0acadcf6ba2703df0312b9b4 (v14.2.2-2...
- 10:03 AM Backport #40084 (In Progress): nautilus: osd: Better error message when OSD count is less than os...
- 09:56 AM Backport #39700 (In Progress): nautilus: [RFE] If the nodeep-scrub/noscrub flags are set in pools...
- 09:31 AM Bug #41255 (Pending Backport): backfill_toofull seen on cluster where the most full OSD is at 1%
- 08:59 AM Backport #39682 (In Progress): nautilus: filestore pre-split may not split enough directories
- 08:50 AM Backport #39517 (In Progress): nautilus: Improvements to standalone tests.
- 03:51 AM Bug #38155 (Duplicate): PG stuck in undersized+degraded+remapped+backfill_toofull+peered
- I'm assuming that the fix for 24452 also fixed this issue. So marking duplicate.
- 03:27 AM Bug #39115 (Duplicate): ceph pg repair doesn't fix itself if osd is bluestore
- OSD crashes are the underlying issue here and we can't say anything about repair until there aren't any more crashes.
- 03:09 AM Documentation #41004 (Pending Backport): doc: pg_num should always be a power of two
08/28/2019
- 09:20 PM Bug #41313: PG distribution completely messed up since Nautilus
- Can you reach out on the ceph-users mailing list to see if others have seen similar issues? We've not seen a specific...
- 09:19 PM Bug #40522: on_local_recover doesn't touch?
- I see this as a hang in running standalone tests in particular qa/standalone/osd/divergent-priors.sh. The test han...
- 09:13 PM Bug #41336 (Resolved): All OSD Faild after Reboot.
- 09:13 PM Bug #41336: All OSD Faild after Reboot.
- This is fixed in later versions - the monitor makes sure stripe_unit is a valid value when the pool is created. With ...
- 09:12 PM Bug #41336: All OSD Faild after Reboot.
- ...
- 09:03 PM Bug #41526: Choosing the next PG for a deep scrubs wrong.
- You never know what scrubs can run with osd_max_scrubs (especially defaulting to 1). Without looking at which...
- 08:44 PM Bug #41385 (In Progress): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(f...
- 08:36 PM Feature #41564 (Resolved): Issue health status warning if num_shards_repaired exceeds some threshold
- Now that num_shards_repaired has been added, we can assist in noticing disk, controller, software or other issues b...
- 08:27 PM Feature #41563: Add connection reset tracking to Network ping monitoring
- Experimental code: https://github.com/dzafman/ceph/tree/wip-network-resets
- 08:24 PM Feature #41563 (New): Add connection reset tracking to Network ping monitoring
- Record connection resets on front and back interfaces and report with ping times
- 08:25 PM Backport #41341 (In Progress): nautilus: "CMake Error" in test_envlibrados_for_rocksdb.sh
- 08:20 PM Bug #41517: Missing head object at primary with snapshots crashes primary
- This was caused by https://github.com/ceph/ceph/pull/27575
- 05:26 PM Bug #41517 (In Progress): Missing head object at primary with snapshots crashes primary
- 06:42 PM Bug #41522 (In Progress): ceph-objectstore-tool can't remove head with bad snapset
- 06:42 PM Backport #38450 (In Progress): mimic: src/osd/OSDMap.h: 1065: FAILED assert(__null != pool)
- 06:36 PM Bug #37775: some pg_created messages not sent to mon
- This patch does not make sense for mimic and luminous.
@Nathan can we please resolve this issue and close the corre...
- 06:34 PM Bug #36498 (New): failed to recover before timeout expired due to pg stuck in creating+peering
- I don't think this is a duplicate of https://tracker.ceph.com/issues/37752 or https://tracker.ceph.com/issues/37775 f...
- 06:09 PM Bug #39286: primary recovery local missing object did not update obc
- https://tracker.ceph.com/issues/41517 is a follow on fix for this.
- 11:46 AM Bug #41550 (Fix Under Review): os/bluestore: fadvise_flag leak in generate_transaction
- 08:09 AM Bug #41550: os/bluestore: fadvise_flag leak in generate_transaction
- https://github.com/ceph/ceph/pull/29944
- 08:06 AM Bug #41550 (Resolved): os/bluestore: fadvise_flag leak in generate_transaction
- In generate_transaction when creating ceph::os::Transaction, ObjectOperation::BufferUpdate::Write::fadvise_flag is no...
- 11:13 AM Backport #38442: luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- ...
- 11:11 AM Backport #38567: luminous: osd_recovery_priority is not documented (but osd_recovery_op_priority is)
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27471
m...
- 10:11 AM Backport #40638: luminous: osd: report omap/data/metadata usage
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28851
m...
- 07:43 AM Backport #41548 (Resolved): nautilus: monc: send_command to specific down mon breaks other mon msgs
- https://github.com/ceph/ceph/pull/31037
- 07:43 AM Backport #41547 (Rejected): luminous: monc: send_command to specific down mon breaks other mon msgs
- 07:42 AM Backport #41546 (Rejected): mimic: monc: send_command to specific down mon breaks other mon msgs
08/27/2019
- 10:23 PM Bug #38416: crc cache should be invalidated when posting preallocated rx buffers
- This is causing lots of failures in luminous/mimic, marking it urgent to get the backports expedited.
- 09:16 PM Backport #38880: luminous: ENOENT in collection_move_rename on EC backfill target
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28110
m...
- 09:16 PM Backport #39373: luminous: ceph tell osd.xx bench help : gives wrong help
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28112
m...
- 09:14 PM Bug #40765 (Duplicate): mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
- 09:07 PM Backport #38902: luminous: Minor rados related documentation fixes
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27185
m...
- 08:18 PM Backport #41532 (In Progress): luminous: Move bluefs alloc size initialization log message to log...
- 08:46 AM Backport #41532 (Resolved): luminous: Move bluefs alloc size initialization log message to log le...
- https://github.com/ceph/ceph/pull/29910
- 06:19 PM Bug #41522 (Fix Under Review): ceph-objectstore-tool can't remove head with bad snapset
- 04:49 AM Bug #41522 (Resolved): ceph-objectstore-tool can't remove head with bad snapset
- We should allow a --force remove of a head object with a bad snapset to remove the object instead of failing.
- 05:26 PM Bug #20924: osd: leaked Session on osd.7
- https://github.com/ceph/ceph/pull/29859
- 05:16 PM Bug #41539 (New): luminous: TEST_backfill_remapped fails in above_margin
- ...
- 05:04 PM Bug #38513: luminous: "AsyncReserver.h: 190: FAILED assert(!queue_pointers.count(item) && !in_pro...
- /a/nojha-2019-08-26_20:27:46-rados-wip-bluefs-shared-alloc-luminous-2019-08-26-distro-basic-smithi/4255358/
- 03:04 PM Feature #41537: MON DNS Lookup for messenger V2
- Jason Dillaman wrote:
> I think v2 over DNS SRV is already handled here [1] and [2].
>
Great, in that case it's...
- 03:00 PM Feature #41537: MON DNS Lookup for messenger V2
- I think v2 over DNS SRV is already handled here [1] and [2].
[1] https://github.com/ceph/ceph/blob/master/src/mon/...
- 02:43 PM Feature #41537 (New): MON DNS Lookup for messenger V2
- Currently it is possible for a client to use DNS SRV records to find the MONs' addresses to connect to. But these address...
- 01:20 PM Backport #40650: luminous: os/bluestore: fix >2GB writes
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28965
m...
- 01:19 PM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
- Should add one more thing: the only clusters bitten by this issue would be those that, *at any time,* ran the @balanc...
- 01:18 PM Backport #38276: luminous: osd_map_message_max default is too high?
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28640
m...
- 01:14 PM Backport #38750: luminous: should report EINVAL in ErasureCode::parse() if m<=0
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28111
m...
- 10:52 AM Backport #38719: luminous: crush: choose_args array size mis-sized when weight-sets are enabled
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27085
m...
- 10:52 AM Backport #39343: luminous: ceph-objectstore-tool rename dump-import to dump-export
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27636
m...
- 10:51 AM Backport #38873: luminous: Rados.get_fsid() returning bytes in python3
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27674
m...
- 10:51 AM Backport #39042: luminous: osd/PGLog: preserve original_crt to check rollbackability
- backport PR https://github.com/ceph/ceph/pull/27715
merge commit f7c528dbafcf540ab046de2cd29010113055da5a (v12.2.12-...
- 10:51 AM Backport #38905: luminous: osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27715
m...
- 10:51 AM Backport #39431: luminous: Degraded PG does not discover remapped data on originating OSD
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27751
m...
- 10:50 AM Backport #39204: luminous: osd: leaked pg refs on shutdown
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27810
m...
- 10:50 AM Backport #39218: luminous: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(soid...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27878
m...
- 10:50 AM Backport #39563: luminous: Error message displayed when mon_osd_max_split_count would be exceeded...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27908
m...
- 10:50 AM Backport #39719: luminous: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when la...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28185
m...
- 10:14 AM Backport #39239: luminous: "sudo yum -y install python34-cephfs" fails on mimic
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28493
m...
- 10:12 AM Backport #39420: luminous: Don't mark removed osds in when running "ceph osd in any|all|*"
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27728
m...
- 09:50 AM Backport #41534 (In Progress): nautilus: valgrind: UninitCondition in ceph::crypto::onwire::AES12...
- 08:49 AM Backport #41534 (Resolved): nautilus: valgrind: UninitCondition in ceph::crypto::onwire::AES128GC...
- https://github.com/ceph/ceph/pull/29928
- 09:34 AM Bug #40792 (Pending Backport): monc: send_command to specific down mon breaks other mon msgs
- 09:18 AM Bug #41424 (Resolved): readable.sh test fails
- 08:52 AM Bug #22266 (Resolved): mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:46 AM Backport #41533 (Resolved): mimic: Move bluefs alloc size initialization log message to log level 1
- https://github.com/ceph/ceph/pull/30219
- 08:46 AM Backport #41531 (Resolved): nautilus: Move bluefs alloc size initialization log message to log le...
- https://github.com/ceph/ceph/pull/30229
- 08:46 AM Backport #41530 (Resolved): mimic: doc: mon_health_to_clog_* values flipped
- https://github.com/ceph/ceph/pull/30227
- 08:46 AM Backport #41529 (Resolved): nautilus: doc: mon_health_to_clog_* values flipped
- https://github.com/ceph/ceph/pull/30003
- 08:32 AM Bug #41526 (Rejected): Choosing the next PG for a deep scrubs wrong.
- I have ceph cluster in this state:...
- 07:33 AM Backport #40943: mimic: mon/OSDMonitor.cc: better error message about min_size
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29618
m...
- 07:33 AM Backport #41086: mimic: Change default for bluestore_fsck_on_mount_deep as false
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29699
m...
- 07:25 AM Backport #39692 (Resolved): mimic: _txc_add_transaction error (39) Directory not empty not handle...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29217
m...
- 07:18 AM Backport #40654: mimic: Lower the default value of osd_deep_scrub_large_omap_object_key_threshold
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29174
m...
- 07:18 AM Backport #38552: mimic: core: lazy omap stat collection
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29189
m... - 03:25 AM Bug #41517 (Resolved): Missing head object at primary with snapshots crashes primary
- This script crashes osd.1 when it wants to recover to osd.3 after osd.2 is marked out. When it sees the missing "o...
- 01:22 AM Bug #41406: common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient bootstrap
- Patrick Donnelly wrote:
> Does any code actually do that sequence of events. I would think a SafeTimer should not be...
- 12:47 AM Bug #41514: in-flight manifest ops not properly cancelled on interval changing
- http://pulpito.ceph.com/xxg-2019-08-25_02:12:25-rados:thrash-wip-inc-recovery-5-distro-basic-smithi/4250539/
- 12:36 AM Bug #41514 (Resolved): in-flight manifest ops not properly cancelled on interval changing
- which as a result makes PrimaryLogPG::on_flushed() unhappy:...
08/26/2019
- 10:52 PM Bug #40721 (Need More Info): backfill caught in loop from block
- 10:51 PM Bug #40721: backfill caught in loop from block
- I don't think I can make further progress without more logs, I'm marking this need more info for the time being. As ...
- 09:29 PM Bug #40721: backfill caught in loop from block
- Based on the snapcontext, make_writeable should have created a clone.
- 09:29 PM Bug #40721: backfill caught in loop from block
- The copy_from on that object lasted until the end of the test. It did succeed, but presumably during shutdown once t...
- 08:36 PM Bug #40721: backfill caught in loop from block
- Or, I guess the directory is probably correct in that the teuthology.log output is consistent with the above, but the...
- 08:01 PM Bug #40721: backfill caught in loop from block
- Unfortunately, I think the job number is wrong -- I don't see that object in the log (smithi19817795-* objects are in...
- 09:48 PM Bug #41362 (Fix Under Review): Rados bench sequential and random read: not behaving as expected w...
- 09:18 PM Bug #24057 (Can't reproduce): cbt fails to copy results to the archive dir
- 09:10 PM Support #41402 (Rejected): OSD's memory are beyound controlled
- Please seek help on ceph-users mailing list. This is not the correct forum to seek support.
- 09:10 PM Documentation #41403 (Pending Backport): doc: mon_health_to_clog_* values flipped
- 09:09 PM Documentation #41403 (Resolved): doc: mon_health_to_clog_* values flipped
- 09:08 PM Documentation #41403 (Fix Under Review): doc: mon_health_to_clog_* values flipped
- 09:06 PM Bug #41406 (Need More Info): common: SafeTimer reinit doesn't fix up "stopping" bool, used in Mon...
- Does any code actually do that sequence of events. I would think a SafeTimer should not be re-inited after shutdown.
- 08:41 PM Bug #37775: some pg_created messages not sent to mon
- The original bug is about a pool level flag - "FLAG_CREATING", which was introduced in 0e526b467af2699e389e7f28a6d709...
- 08:40 PM Backport #39475: mimic: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28206
m...
- 08:38 PM Backport #40651: mimic: os/bluestore: fix >2GB writes
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28967
m...
- 08:37 PM Bug #40720 (Resolved): mimic, nautilus: make bitmap allocator the default allocator for bluestore
- 08:35 PM Backport #38751: mimic: should report EINVAL in ErasureCode::parse() if m<=0
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28995
m...
- 08:35 PM Backport #39513: mimic: osd: segv in _preboot -> heartbeat
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28220
m...
- 08:28 PM Backport #39311: mimic: crushtool crash on Fedora 28 and newer
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27986
m...
- 08:28 PM Backport #39720: mimic: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28089
m...
- 08:28 PM Backport #39374: mimic: ceph tell osd.xx bench help : gives wrong help
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28097
m...
- 08:28 PM Backport #39422: mimic: Don't mark removed osds in when running "ceph osd in any|all|*"
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28142
m...
- 08:27 PM Backport #38341: mimic: pg stuck in backfill_wait with plenty of disk space
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28201
m...
- 08:17 PM Backport #40639: mimic: osd: report omap/data/metadata usage
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28852
m...
- 07:58 PM Bug #41399 (Pending Backport): Move bluefs alloc size initialization log message to log level 1
- 07:33 PM Bug #38827 (Pending Backport): valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWir...
- seeing this in the rgw suite for nautilus runs, so tagging for backport of https://github.com/ceph/ceph/pull/28305
- 02:59 PM Backport #41503 (Resolved): nautilus: Warning about past_interval bounds on deleting pg
- https://github.com/ceph/ceph/pull/30000
- 02:59 PM Backport #41502 (Resolved): mimic: Warning about past_interval bounds on deleting pg
- https://github.com/ceph/ceph/pull/30222
- 02:59 PM Backport #41501 (Resolved): nautilus: backfill_toofull while OSDs are not full (Unneccessary HEAL...
- https://github.com/ceph/ceph/pull/29999
- 02:58 PM Backport #41500 (Rejected): luminous: backfill_toofull while OSDs are not full (Unneccessary HEAL...
- 02:58 PM Backport #41499 (Rejected): mimic: backfill_toofull while OSDs are not full (Unneccessary HEALTH_...
- 02:51 PM Backport #41491 (Resolved): nautilus: OSDCap.PoolClassRNS test aborts
- https://github.com/ceph/ceph/pull/29998
- 02:50 PM Backport #41490 (Resolved): mimic: OSDCap.PoolClassRNS test aborts
- https://github.com/ceph/ceph/pull/30214
- 02:42 PM Backport #41455 (Resolved): nautilus: osd: fix ceph_assert(mem_avail >= 0) caused by the unset cg...
- https://github.com/ceph/ceph/pull/29745
- 02:41 PM Backport #41453 (Resolved): nautilus: mon: C_AckMarkedDown has not handled the Callback Arguments
- https://github.com/ceph/ceph/pull/29997
- 02:25 PM Backport #41449 (Resolved): mimic: mon: C_AckMarkedDown has not handled the Callback Arguments
- https://github.com/ceph/ceph/pull/30213
- 02:25 PM Backport #41448 (Resolved): nautilus: osd/PrimaryLogPG: Access destroyed references in finish_deg...
- https://github.com/ceph/ceph/pull/29994
- 02:25 PM Backport #41447 (Resolved): mimic: osd/PrimaryLogPG: Access destroyed references in finish_degrad...
- https://github.com/ceph/ceph/pull/30291
- 11:23 AM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
- With thanks to Paul Emmerich in https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QGY75UVQEAT2SUHHKZC2K...
- 10:59 AM Backport #39698: mimic: OSD down on snaptrim.
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28202
m...
- 10:57 AM Backport #39518: mimic: snaps missing in mapper, should be: ca was r -2...repaired
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28232
m...
- 10:56 AM Backport #39538: mimic: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28259
m...
- 10:56 AM Backport #39737: mimic: Binary data in OSD log from "CRC header" message
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28503
m...
- 10:56 AM Backport #39744: mimic: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28540
m...
- 09:06 AM Bug #41429: Incorrect logical operator in Monitor::handle_auth_request()
- “&&” in the following code snippet:...
- 08:48 AM Bug #41429 (Resolved): Incorrect logical operator in Monitor::handle_auth_request()
- When checking auth_mode against AUTH_MODE_MON and AUTH_MODE_MON_MAX in Monitor::handle_auth_request(),
a logical AND...
- 08:59 AM Backport #40948 (Resolved): nautilus: Better default value for osd_snap_trim_sleep
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29678
m...
- 08:48 AM Backport #40885 (Resolved): nautilus: ceph mgr module ls -f plain crashes mon
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29566
m...
- 08:33 AM Backport #40322: nautilus: nautilus with requrie_osd_release < nautilus cannot increase pg_num
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29671
m...
- 07:21 AM Bug #41427 (Resolved): set-chunk raced with deep-scrub
- which as a result causes object info inconsistency:
"2019-08-25T04:04:19.571852+0000 osd.1 (osd.1) 253 : cluster ...
08/25/2019
- 01:44 PM Bug #41424 (Fix Under Review): readable.sh test fails
- 10:38 AM Bug #41424 (Resolved): readable.sh test fails
- ...
- 03:18 AM Documentation #41403: doc: mon_health_to_clog_* values flipped
- Verified on nautilus (ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)) that the defa...
08/24/2019
08/23/2019
- 11:45 PM Backport #24360: luminous: osd: leaked Session on osd.7
- https://github.com/ceph/ceph/pull/29859
- 08:31 AM Bug #41406 (New): common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient boot...
- 1, Create a new SafeTimer() object.
2, call init.
3, call add_event_after.
4, call shutdown.
5, call init again.
6, ...
- 05:30 AM Bug #39546 (Pending Backport): Warning about past_interval bounds on deleting pg
- 05:24 AM Bug #41217 (Pending Backport): mon: C_AckMarkedDown has not handled the Callback Arguments
- 05:17 AM Bug #40835 (Pending Backport): OSDCap.PoolClassRNS test aborts
- 02:39 AM Documentation #41403 (Resolved): doc: mon_health_to_clog_* values flipped
- On my Luminous cluster (ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)), the defau...
- 01:34 AM Support #41402 (Rejected): OSD's memory are beyound controlled
- My env :
[store@server01 ~]$ ceph -v
ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
...
08/22/2019
- 07:31 PM Backport #40840 (In Progress): nautilus: Explicitly requested repair of an inconsistent PG cannot...
- 05:27 PM Bug #41353 (Resolved): scrub/osd-scrub-snaps.sh fails
- 05:02 PM Bug #41399 (Fix Under Review): Move bluefs alloc size initialization log message to log level 1
- 04:07 PM Bug #41399: Move bluefs alloc size initialization log message to log level 1
- - At present, from a shared BlueStore OSD which has wal, db and block all in one it is being set as 64K we can see in...
- 04:05 PM Bug #41399 (Resolved): Move bluefs alloc size initialization log message to log level 1
- - https://github.com/ceph/ceph/pull/29537...
- 04:57 PM Bug #41255 (In Progress): backfill_toofull seen on cluster where the most full OSD is at 1%
- 02:48 PM Bug #20050 (Resolved): osd: very old pg creates take a long time to build past_intervals
- All of this code went away by mimic.
- 02:23 PM Backport #41238: nautilus: Implement mon_memory_target
- Sridhar Seshasayee wrote:
> https://github.com/ceph/ceph/pull/29652
The above PR is dependent on the backport of ...
- 12:55 PM Bug #41236 (Fix Under Review): cosbench failures in rados/perf
- https://github.com/ceph/cbt/pull/191
- 09:53 AM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
- We just bumped into this, on Luminous (12.2.12). It actually caused us momentary loss of quorum.
Sequence of event...
- 08:45 AM Documentation #41389 (Resolved): wrong datatype describing crush_rule
- current documentation for luminous https://docs.ceph.com/docs/luminous/rados/operations/pools/ is wrong regarding cru...
- 05:47 AM Bug #37654: FAILED ceph_assert(info.history.same_interval_since != 0) in PG::start_peering_interv...
- http://pulpito.ceph.com/xxg-2019-08-21_09:03:35-rados:thrash-wip-scrub-omap-error-distro-basic-smithi/4236636/
- 05:41 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
- I had a chance to get back to this.
I fuse mounted the uploaded image and copied the osdmap data for epoch 80890 o...
08/21/2019
- 10:25 PM Bug #40792 (Fix Under Review): monc: send_command to specific down mon breaks other mon msgs
- Updated for a few issues and marked the PR for testing again.
- 08:21 PM Bug #40792 (In Progress): monc: send_command to specific down mon breaks other mon msgs
- 09:47 PM Bug #24531: Mimic MONs have slow/long running ops
- 09:20 PM Bug #40073 (Resolved): PG scrub stamps reset to 0.000000
- 09:18 PM Bug #39570 (Resolved): nautilus with requrie_osd_release < nautilus cannot increase pg_num
- 09:18 PM Backport #40322 (Resolved): nautilus: nautilus with requrie_osd_release < nautilus cannot increas...
- 09:18 PM Bug #39972 (Resolved): librados 'buffer::create' and related functions are not exported in C++ API
- 09:17 PM Backport #24360 (In Progress): luminous: osd: leaked Session on osd.7
- 08:45 PM Backport #24360 (New): luminous: osd: leaked Session on osd.7
- Meh, actually probably is.
- 08:40 PM Backport #24360 (Rejected): luminous: osd: leaked Session on osd.7
- Not worth backporting to luminous.
- 09:16 PM Backport #39506 (Rejected): mimic: Give recovery for inactive PGs a higher priority
- 09:16 PM Backport #39505 (Rejected): luminous: Give recovery for inactive PGs a higher priority
- 09:16 PM Bug #39484 (Resolved): mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- 09:16 PM Bug #39099 (Resolved): Give recovery for inactive PGs a higher priority
- 09:13 PM Backport #39518 (Resolved): mimic: snaps missing in mapper, should be: ca was r -2...repaired
- 09:12 PM Bug #39333 (Resolved): osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
- 09:10 PM Bug #37439 (Resolved): Degraded PG does not discover remapped data on originating OSD
- 09:10 PM Backport #39431 (Resolved): luminous: Degraded PG does not discover remapped data on originating OSD
- 09:08 PM Bug #38359 (Resolved): osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 09:08 PM Backport #38442 (Resolved): luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 09:00 PM Documentation #23999 (Resolved): osd_recovery_priority is not documented (but osd_recovery_op_pri...
- 09:00 PM Backport #38567 (Resolved): luminous: osd_recovery_priority is not documented (but osd_recovery_o...
- 08:58 PM Bug #38432 (Resolved): ENOENT on setattrs (obj was recently deleted)
- 08:57 PM Backport #38507 (Resolved): mimic: ENOENT on setattrs (obj was recently deleted)
- 08:53 PM Bug #21142 (Won't Fix): OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- If this pops up and causes more trouble we may try again but given the efforts so far it seems like we aren't going t...
- 08:52 PM Backport #38256 (Duplicate): luminous: OSD crashes when loading pgs with "FAILED assert(interval....
- The original issue #21142 is a luminous-only bug report and there's no code fixing it yet.
- 08:44 PM Bug #24174 (Resolved): PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
- 08:39 PM Backport #23926: luminous: disable bluestore cache caused a rocksdb error
- We need to discuss if this is worth backporting any more; it may not be but Kefu can probably talk to the right people?
- 08:37 PM Bug #18746 (Resolved): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+k...
- Already backported to luminous.
- 08:33 PM Bug #21629 (Resolved): interval_map.h: 161: FAILED assert(len > 0)
- 08:32 PM Bug #21127 (Resolved): qa/standalone/scrub/osd-scrub-repair.sh timeout
- 08:18 PM Bug #41383 (Need More Info): scrub object count mismatch on device_health_metrics pool
- 08:18 PM Bug #41383: scrub object count mismatch on device_health_metrics pool
- This may be the empty object names that the device health manager was inappropriately creating? See the thread "[ceph...
- 07:04 PM Bug #41383 (Resolved): scrub object count mismatch on device_health_metrics pool
- jenglisch on irc reports multiple scrub errors (error, repaired, reappeared a few days later) on metrics pool.
<pr...
- 08:11 PM Bug #41200 (Pending Backport): osd: fix ceph_assert(mem_avail >= 0) caused by the unset cgroup me...
- 07:56 PM Bug #39286 (Pending Backport): primary recovery local missing object did not update obc
- 07:52 PM Bug #38649 (Can't reproduce): [ERR] full status failsafe engaged, dropping updates, now -21474836...
- 07:51 PM Bug #38402: ceph-objectstore-tool on down osd w/ not enough in osds
- We think it just needs test fixing. Those in the rados suite test review group can see https://docs.google.com/docume...
- 07:49 PM Bug #41385 (Resolved): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(from...
- ...
- 07:45 PM Bug #38322 (Fix Under Review): luminous: mons do not trim maps until restarted
- 07:44 PM Bug #40367: "*** Caught signal (Segmentation fault) **" in upgrade:luminous-x-nautilus
- same thing upgrading from mimic:
/a/sage-2019-08-21_15:17:39-rados-wip-sage2-testing-2019-08-20-0935-distro-basic-...
- 07:31 PM Bug #38023 (Closed): segv on FileJournal::prepare_entry in bufferlist
- Seems to have been resolved alongside those related tickets?
- 07:30 PM Bug #37808 (Can't reproduce): osd: osdmap cache weak_refs assert during shutdown
- 07:28 PM Bug #37798 (Can't reproduce): ceph-objectstore-tool crash from finisher
- 07:27 PM Bug #37786 (Can't reproduce): test fails in mon/crush_ops.sh
- 05:06 PM Backport #41084: nautilus: Change default for bluestore_fsck_on_mount_deep as false
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29697
m...
- 05:04 PM Backport #40537: nautilus: osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued)
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29372
m...
- 04:59 PM Backport #40942: nautilus: mon/OSDMonitor.cc: better error message about min_size
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29617
m...
- 04:58 PM Backport #40940: nautilus: Update rocksdb to v6.1.2
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29440
m...
- 04:57 PM Backport #41092: nautilus: rocksdb: enable rocksdb_rmrange=true by default and make delete range ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29439
m...
- 02:53 PM Bug #41353: scrub/osd-scrub-snaps.sh fails
- 02:39 PM Backport #39516 (Resolved): nautilus: osd-backfill-space.sh test failed in TEST_backfill_multi_pa...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28187
m...
- 02:38 PM Backport #40625: nautilus: OSDs get killed by OOM due to a broken switch
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29391
m...
- 02:37 PM Bug #41052: nautilus: cbt cosbench workloads failing in rados/perf suite
- https://github.com/ceph/ceph/pull/29453
merge commit 59177f780c5be0e6530df2fdba1abfa6e3187569 (v14.2.2-230-g59177f780c)
- 02:36 PM Backport #40180 (Resolved): nautilus: qa/standalone/scrub/osd-scrub-snaps.sh sometimes fails
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29252
m...
- 02:35 PM Backport #40465 (Resolved): nautilus: osd beacon sometimes has empty pg list
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29254
m...
- 02:35 PM Backport #39743 (Resolved): nautilus: mon: "FAILED assert(pending_finishers.empty())" when paxos ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28528
m...
- 02:34 PM Backport #40382: nautilus: RuntimeError: expected MON_CLOCK_SKEW but got none
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28576
m...
- 02:32 PM Backport #40274 (Resolved): nautilus: librados 'buffer::create' and related functions are not exp...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29244
m...
- 02:25 PM Backport #40667: nautilus: PG scrub stamps reset to 0.000000
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28869
m...
- 02:24 PM Backport #40730 (Resolved): nautilus: mon: auth mon isn't loading full KeyServerData after restart
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28993
m...
- 02:24 PM Backport #39693 (Resolved): nautilus: _txc_add_transaction error (39) Directory not empty not han...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29115
m...
- 07:53 AM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
- Not to my knowledge, but I haven't checked in a while.
- 07:33 AM Bug #22233: prime_pg_temp breaks on uncreated pgs
- > the mon should see the pg mapping change from [3,6] to [4,6] and send the create to osd.4
exactly. that's why i ...
- 01:16 AM Feature #41363 (New): Allow user to cancel scrub requests
- If a user requests multiple scrubs or deep-scrubs, they should be able to cancel the requests. It may be that they...
- 01:02 AM Bug #41362 (Resolved): Rados bench sequential and random read: not behaving as expected when op s...
- ObjBencher::seq_read_bench() is using "num_objects > data.started" to make sure
we don't issue more reads than what ...
08/20/2019
- 11:25 PM Bug #35974: Apparent export-diff/import-diff corruption
- @Josh: AFAIK, the diff calculations do not set the LOCALIZED/BALANCED read flags. Those are only (optionally) set on ...
- 11:07 PM Bug #35974: Apparent export-diff/import-diff corruption
- This sounds like it may be due to balanced reads not behaving properly for the diff rados op, since it's not operating...
- 11:17 PM Bug #37656 (Can't reproduce): FileStore::_do_transaction() crashed with error 17 (merge collectio...
- 11:15 PM Bug #37654 (Can't reproduce): FAILED ceph_assert(info.history.same_interval_since != 0) in PG::st...
- 11:13 PM Bug #36746: Ignore osd_find_best_info_ignore_history_les for erasure-coded PGs
- Maybe change this from true/false to specify a PG, so only that PG is affected.
- 11:11 PM Bug #36388 (Resolved): osd: "out of order op"
- 11:01 PM Bug #35810 (Can't reproduce): FAILED assert(entries.begin()->version > info.last_update)
- 11:01 PM Bug #35542 (Won't Fix): Backfill and recovery should validate all checksums
- Bluestore makes this unnecessary and it is only possible on a pull of the complete object.
- 10:58 PM Bug #26947 (Resolved): ENOENT on collection_move_rename from divergent activate
- Neha thinks this is the same as a merge divergent object bug that was fixed.
- 10:57 PM Bug #25155 (Can't reproduce): mon crash from 'ceph osd erasure-code-profile set lrcprofile name=l...
- 10:56 PM Bug #24678 (Can't reproduce): ceph-mon segmentation fault after setting pool size to 1 on degrade...
- 10:53 PM Bug #24531 (Fix Under Review): Mimic MONs have slow/long running ops
- 10:52 PM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
- Eek what did you end up doing, Ilya? Anything happen here?
- 10:49 PM Bug #24320 (Resolved): out of order reply and/or osd assert with set-chunks-read.yaml
- 10:49 PM Bug #24148 (Duplicate): Segmentation fault out of ObcLockManager::get_lock_type()
- 10:47 PM Bug #23892 (Can't reproduce): luminous->mimic: mon segv in ~MonOpRequest from OpHistoryServiceThread
- Believe we've made some fixes to OpHistory since April last year...
- 10:43 PM Bug #23830 (Can't reproduce): rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
- 10:40 PM Bug #23760: mon: `config get <who>` does not allow `who` as 'mon'/'osd'
- Is this still an issue Joao, Josh?
- 10:13 PM Bug #23511 (Can't reproduce): forwarded osd_failure leak in mon
- I don't think we've seen this again and may have made even more no_reply fixes?
- 10:12 PM Bug #23402 (Duplicate): objecter: does not resend op on split interval
- Happily fixed now.
- 10:08 PM Bug #22624 (Duplicate): filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No suc...
- 10:04 PM Bug #22656 (Can't reproduce): scrub mismatch on bytes (cache pools)
- 10:01 PM Bug #22408 (Can't reproduce): objecter: sent out of order ops
- 10:00 PM Bug #22233 (In Progress): prime_pg_temp breaks on uncreated pgs
- 09:57 PM Bug #21965 (Can't reproduce): mon/MonClient.cc: 478: FAILED assert(authenticate_err == 0)
- 09:57 PM Bug #21823 (Can't reproduce): on_flushed: object ... obc still alive (ec + cache tiering)
- 09:56 PM Bug #21686 (Can't reproduce): osd/PrimaryLogPG.cc: 10195: FAILED assert(i->second == obc) in fini...
- 09:55 PM Bug #21557 (Can't reproduce): osd.6 found snap mapper error on pg 2.0 oid 2:0e781f33:::smithi1443...
- 09:55 PM Bug #20909: Error ETIMEDOUT: crush test failed with -110: timed out during smoke test (5 seconds)
- We've made some improvements and fixed some bad inefficiencies in the CRUSH code and updates.
- 09:54 PM Bug #20909 (Can't reproduce): Error ETIMEDOUT: crush test failed with -110: timed out during smok...
- 09:54 PM Bug #21130 (Can't reproduce): "FAILED assert(bh->last_write_tid > tid)" in powercycle-master-test...
- 09:54 PM Bug #20874 (Can't reproduce): osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end()...
- 09:52 PM Bug #20798 (Can't reproduce): LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- 09:52 PM Bug #20759: mon: valgrind detects a few leaks
- Maybe this was some of the leaked MForwards we weren't marking as no_reply?
- 09:51 PM Bug #20759 (Can't reproduce): mon: valgrind detects a few leaks
- 09:51 PM Bug #20694 (Can't reproduce): osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_lo...
- Sam changed this with his PeeringStateMachine refactor. :D
- 09:50 PM Bug #20283: qa: missing even trivial tests for many commands
- ceph command tests can go in qa/workunits/cephtool/test.sh
- 09:48 PM Bug #20303 (Can't reproduce): filejournal: Unable to read past sequence ... journal is corrupt
- 09:45 PM Bug #20133 (Can't reproduce): EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksd...
- 09:44 PM Bug #20086 (Can't reproduce): LibRadosLockECPP.LockSharedDurPP gets EEXIST
- 09:42 PM Bug #20000 (Can't reproduce): osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
- Re-open if this still occurs.
- 09:39 PM Bug #41318 (Resolved): per-pool omap broken with temp recovery objects
- 09:38 PM Bug #19512 (Won't Fix): Sparse file info in filestore not propagated to other OSDs
- If this is still an issue in bluestore, let's fix it there.
- 09:37 PM Bug #18643 (Closed): SnapTrimmer: inconsistencies may lead to snaptrimmer hang
- This no longer seems to be the case. If trim_object() returns an error to its sole caller, PrimaryLogPG::AwaitAsyncWo...
- 09:37 PM Feature #41360 (New): snaptrim_error condition should allow repair and resume snaptrim
- 09:37 PM Bug #18667 (Can't reproduce): [cache tiering] omap data time-traveled to stale version
- 09:32 PM Bug #18209 (Duplicate): src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
- Looks the same as the Safetimer crash that Brad fixed last year.
- 09:30 PM Bug #17252 (Can't reproduce): [Librados] Deadlock on RadosClient::watch_flush
- This hasn't come up again and got fixed in the only user.
- 09:24 PM Bug #16236 (Won't Fix): cache/proxied ops from different primaries (cache interval change) don't ...
- 09:21 PM Bug #15653 (Resolved): crush: low weight devices get too many objects for num_rep > 1
- Closed since upmap fixes this.
- 09:12 PM Bug #38483: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- Sage says the PR is buggy, and this case is very hard to hit, so moving to normal priority.
- 09:08 PM Bug #40245 (Won't Fix): filestore::read() does not assert on EIO
- 09:06 PM Bug #40530 (Resolved): Scrub reserves from actingbackfill put waits for acting
- The fix for this was included in https://github.com/ceph/ceph/pull/28334 for tracker #40073
- 09:02 PM Bug #40576 (Closed): src/osd/PrimaryLogPG.cc: 10513: FAILED assert(head_obc)
- 08:58 PM Backport #40667 (Resolved): nautilus: PG scrub stamps reset to 0.000000
- 08:57 PM Bug #39555 (Pending Backport): backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- 08:47 PM Bug #40522: on_local_recover doesn't touch?
- Ping Sage, it sounds like you know what the issue is?
jianping, does your comment have anything to do with this st...
- 08:40 PM Bug #40000: osds do not bound xattrs and/or aggregate xattr data in pg log
- 08:36 PM Bug #39978 (Duplicate): Adding OSD to Luminous Cluster will crash the active mon
- Closing in favor of the other since we've lost all the pastebins. :(
- 08:33 PM Bug #39152 (Pending Backport): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
- 08:30 PM Bug #38555: scrub error on ec pg, got 6579891/0 or 7569408/6832128 bytes
- This report has been obviated by the PeeringState refactor/extraction.
- 08:30 PM Bug #38555 (Can't reproduce): scrub error on ec pg, got 6579891/0 or 7569408/6832128 bytes
- 08:24 PM Bug #38219 (In Progress): rebuild-mondb hangs
- Demoting as if you're running this you already need manual intervention anyway.
- 08:22 PM Bug #37969 (Can't reproduce): ENOENT on setattrs
- FileStore, only seen once.
- 08:22 PM Bug #37915 (Can't reproduce): osd: Segmentation fault in OpRequest::_unregistered
- There have been changes to TrackedOps since then.
- 08:21 PM Bug #37911 (Can't reproduce): osd dequeue misorder
- There have been pg merge fixes since then...
- 08:15 PM Bug #23879: test_mon_osdmap_prune.sh fails
- Are we really only seeing this about once a month? Is it just a probabilistic failure based on load of the monitor cl...
- 08:14 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- We can bump this priority up if it reappears.
- 07:41 PM Feature #41359 (Resolved): Adding Placement Group id in Large omap log message
- ...
- 03:05 PM Bug #41353 (Fix Under Review): scrub/osd-scrub-snaps.sh fails
- https://github.com/ceph/ceph/pull/29774
- 03:00 PM Bug #41353 (Resolved): scrub/osd-scrub-snaps.sh fails
- ...
- 02:51 PM Backport #41350 (In Progress): nautilus: hidden corei7 requirement in binary packages
- 02:32 PM Backport #41350 (Resolved): nautilus: hidden corei7 requirement in binary packages
- https://github.com/ceph/ceph/pull/29772
- 02:32 PM Backport #41351 (Resolved): mimic: hidden corei7 requirement in binary packages
- https://github.com/ceph/ceph/pull/30183
- 02:32 PM Bug #41330 (Pending Backport): hidden corei7 requirement in binary packages
- 01:08 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
- Brad Hubbard wrote:
> Hi Troy,
>
> Before we close this I'll take a look at the image you uploaded to see if I ca...
- 12:31 AM Bug #41240 (Triaged): All of the cluster SSDs aborted at around the same time and will not start.
- Reducing severity since the cluster is currently healthy.
- 12:24 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
- Hi Troy,
Before we close this I'll take a look at the image you uploaded to see if I can work out the nature of th...
- 12:20 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
- With guidance from badone on IRC, all of the osds are running and all of the pgs are active.
http://lists.ceph.com...
08/19/2019
- 11:04 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- Joao Eduardo Luis wrote:
> The pull request provided with the fix has been merged (https://github.com/ceph/ceph/pull...
- 09:27 PM Bug #39546: Warning about past_interval bounds on deleting pg
- /a/sage-2019-08-19_13:35:06-rados-wip-sage-testing-2019-08-17-1023-distro-basic-smithi/4230273
- 09:13 PM Bug #41190 (Fix Under Review): osd: pg stuck in waitactingchange when new acting set doesn't change
- 04:13 PM Backport #40948: nautilus: Better default value for osd_snap_trim_sleep
- Prashant D wrote:
> https://github.com/ceph/ceph/pull/29678
merged
- 03:03 PM Backport #41341 (Resolved): nautilus: "CMake Error" in test_envlibrados_for_rocksdb.sh
- https://github.com/ceph/ceph/pull/29979
- 02:38 PM Bug #40451 (Resolved): osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued)
- 02:38 PM Backport #40537 (Resolved): nautilus: osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued)
- 01:08 PM Bug #41336 (Resolved): All OSD Faild after Reboot.
- We have faced an issue with a Writeback-Cache + EC-Pool.
* our ec-pool creation in the first place https://pastebin....
- 06:40 AM Backport #40949 (In Progress): mimic: Better default value for osd_snap_trim_sleep
- https://github.com/ceph/ceph/pull/29732
- 05:03 AM Bug #41330 (Fix Under Review): hidden corei7 requirement in binary packages
- 04:53 AM Bug #41330 (Resolved): hidden corei7 requirement in binary packages
- quote from Alexandre Oliva's mail
> After upgrading some old Phenom servers from Fedora/Freed-ora 29 to
> 30's, t...
- 02:42 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
- output of `ceph report` as requested by badone on #ceph IRC...
08/18/2019
- 03:22 PM Backport #40885: nautilus: ceph mgr module ls -f plain crashes mon
- Prashant D wrote:
> https://github.com/ceph/ceph/pull/29566
merged
- 03:22 PM Backport #40322: nautilus: nautilus with requrie_osd_release < nautilus cannot increase pg_num
- Neha Ojha wrote:
> https://github.com/ceph/ceph/pull/29671
merged
- 07:00 AM Bug #41253 (Pending Backport): "CMake Error" in test_envlibrados_for_rocksdb.sh
- 04:04 AM Bug #41253 (Fix Under Review): "CMake Error" in test_envlibrados_for_rocksdb.sh
08/17/2019