Activity
From 10/22/2020 to 11/20/2020
11/20/2020
- 10:07 PM Bug #48219 (Resolved): qa/standalone/scrub/osd-scrub-test.sh: TEST_scrub_extented_sleep: return 1
- 10:07 PM Bug #48220 (Resolved): qa/standalone/misc/ver-health.sh: TEST_check_version_health_1: return 1
- 09:31 PM Backport #48244: nautilus: collection_list_legacy: pg inconsistent
- Mykola Golub wrote:
> https://github.com/ceph/ceph/pull/38100
merged
- 12:45 PM Bug #48298: hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly
- Now, about 18 hours later, the @num_pg@ already has dropped quite a bit. These are the exact same OSDs. The balancer ...
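A minimal sketch of watching the per-OSD PG counts settle while the balancer works (standard CLI, assuming admin access; not from the original report):
  watch -n 60 'ceph osd df tree'   # the PGS column shows the current PG count per OSD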
- 12:38 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- aah, I was working alongside the octopus batch and might have gotten confused, sorry
11/19/2020
- 08:41 PM Documentation #7386: librados: document rados_osd_op_timeout and rados_mon_op_timeout options
- Josh and @zdover23 : Both of these are marked in @options.cc@ as @LEVEL_ADVANCED@. There are >1500 options now, too ...
- 06:53 PM Bug #48298: hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly
- Another observation: @num_pgs@ is highest on the OSD that was created first on the same host. Later-created devices (hi...
- 06:37 PM Bug #48298 (New): hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly
- I just added OSDs to my cluster running 14.2.13....
- 05:33 PM Documentation #22843: [doc][luminous] the configuration guide still contains osd_op_threads and d...
- @zdover23 given Nathan's observation that upstream != RHCS, that some of the new options are in that document now, an...
- 05:23 PM Documentation #23354: doc: osd_op_queue & osd_op_queue_cut_off
- The default value for `osd_op_queue_cut_off` changed to `high` with Octopus and was documented as such.
https://g...
- 03:51 PM Bug #48297 (New): OSD process using up complete available memory after pg_num change / autoscaler on
- we did the following change on our cluster (cephadm octopus 15.2.5):
ceph osd pool set one pg_num 512
after some ti...
- 06:01 AM Documentation #23612: doc: add description of new auth profiles
- @zdover23 I think #23442 is a superset of this
- 04:24 AM Documentation #23777: doc: description of OSD_OUT_OF_ORDER_FULL problem
- @zdover23 I believe that https://github.com/ceph/ceph/pull/31588 fixed this already.
- 04:13 AM Documentation #35968: [doc][jewel] sync documentation "OSD Config Reference" default values with ...
- @zdover23 This is another one that I fear is moot at this late date. We won't see another Jewel release.
- 04:09 AM Documentation #35967: [doc] sync documentation "OSD Config Reference" default values with code de...
- @zdover The @options.cc@ values listed here appear to be current in _master_, and as with #38429 the @mimic@ and @lum...
- 12:37 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- Deepika Upadhyay wrote:
> seeing on octopus as well:
> https://pulpito.ceph.com/yuriw-2020-11-10_19:24:45-rados-wip...
11/18/2020
- 09:55 PM Bug #46224 (Resolved): Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:58 PM Documentation #38558: doc: osd [test-]reweight-by-utilization is not properly documented in ceph cli
- @zdover23 I think https://github.com/ceph/ceph/pull/37268 fulfills this and thus it can be marked as completed.
- 12:39 PM Bug #48274 (New): mon_osd_adjust_heartbeat_grace blocks OSDs from being marked as down
- I encountered a situation which I also posted on the users list: https://lists.ceph.io/hyperkitty/list/ceph-users@cep...
- 05:11 AM Documentation #40579: doc: POOL_NEAR_FULL on OSD_NEAR_FULL
- This should be addressed by https://github.com/ceph/ceph/pull/38145
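For context, a hedged sketch of inspecting the ratios behind these warnings with standard commands:
  ceph osd dump | grep ratio   # full_ratio, backfillfull_ratio, nearfull_ratio
  ceph df                      # per-pool usage, including MAX AVAIL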
11/17/2020
- 07:20 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- /a/ksirivad-2020-11-16_07:16:50-rados-wip-mgr-progress-turn-off-option-distro-basic-smithi/5630402 - no logs
- 02:58 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
- ...
- 12:31 PM Backport #48233 (Resolved): nautilus: Health check failed: 4 mgr modules have failed (MGR_MODULE_...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38069
m...
- 03:26 AM Bug #47767: octopus: setting noscrub crashed osd process
- The function PG::abort_scrub() probably races with messages in flight. It might help if we called scrub_unreser...
11/16/2020
- 09:21 PM Backport #48227 (In Progress): nautilus: Log "ceph health detail" periodically in cluster log
- 07:51 PM Bug #48230 (Resolved): nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
- 06:53 PM Bug #47440: nautilus: valgrind caught leak in Messenger::ms_deliver_verify_authorizer
- http://qa-proxy.ceph.com/teuthology/yuriw-2020-11-11_16:17:30-rados-wip-yuri-testing-2020-11-09-0849-nautilus-distro-...
- 06:49 PM Bug #38219: rebuild-mondb hangs
- description: ...
- 12:09 PM Bug #48172: Nautilus 14.2.13 osdmap not trimming on clean cluster
- So, we managed to find the reason, and it's weird.
Cluster is not trimming osdmaps because it thinks that all PGs...
- 07:44 AM Backport #48244 (In Progress): nautilus: collection_list_legacy: pg inconsistent
- 07:41 AM Backport #48244 (Resolved): nautilus: collection_list_legacy: pg inconsistent
- https://github.com/ceph/ceph/pull/38100
- 07:42 AM Backport #48243 (In Progress): octopus: collection_list_legacy: pg inconsistent
- 07:41 AM Backport #48243 (Resolved): octopus: collection_list_legacy: pg inconsistent
- https://github.com/ceph/ceph/pull/38098
- 07:38 AM Bug #48153 (Pending Backport): collection_list_legacy: pg inconsistent
- 12:08 AM Documentation #47163: document the difference between disk commit and apply time
- Suggestions as to which document this should go in?
- 12:07 AM Documentation #47176: creating pool doc is very out-of-date
- FWIW, some admins (including me) have found the PG autoscaler to decrease `pgp_num` far too aggressively, and have se...
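A minimal sketch of how such admins pin PG counts (the pool name is a placeholder):
  ceph osd pool autoscale-status                    # current mode and target pg_num per pool
  ceph osd pool set mypool pg_autoscale_mode off    # stop automatic pg_num/pgp_num changes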
11/15/2020
- 05:01 PM Bug #47930 (Resolved): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
11/14/2020
- 11:27 PM Backport #48233: nautilus: Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/38069
merged
- 05:10 AM Bug #38846 (Resolved): dump_pgstate_history doesn't really produce useful json output, needs an a...
- 01:43 AM Bug #38846 (Pending Backport): dump_pgstate_history doesn't really produce useful json output, ne...
11/13/2020
- 10:22 PM Bug #48163 (Closed): osd: osd crash due to FAILED ceph_assert(current_best)
- 10:03 PM Bug #47930 (Fix Under Review): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background:...
- 09:48 PM Backport #48233 (In Progress): nautilus: Health check failed: 4 mgr modules have failed (MGR_MODU...
- 09:45 PM Backport #48233 (Resolved): nautilus: Health check failed: 4 mgr modules have failed (MGR_MODULE_...
- https://github.com/ceph/ceph/pull/38069
- 09:38 PM Bug #46224 (Pending Backport): Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)
- 09:30 PM Bug #48230: nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
- Could be related to https://github.com/ceph/ceph/pull/37844. Also appeared in its test run https://trello.com/c/Nwckv...
- 08:56 PM Bug #48230: nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
- This seems to be due to those 3 modules not being present in "modules" when get_health_checks() is called....
- 08:24 PM Bug #48230: nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
- It's odd, because the mgr log for the job cited above shows a lot of what look like normal status messages from rbd_s...
- 07:53 PM Bug #48230 (Resolved): nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
- ...
- 07:39 PM Bug #47617: rebuild_mondb: daemon-helper: command failed with exit status 1
- Deepika:
/a/yuriw-2020-09-16_23:57:37-rados-wip-yuri8-testing-2020-09-16-2220-octopus-distro-basic-smithi/5441511 a...
- 05:35 PM Backport #48228 (Resolved): octopus: Log "ceph health detail" periodically in cluster log
- https://github.com/ceph/ceph/pull/38345
- 05:35 PM Backport #48227 (Resolved): nautilus: Log "ceph health detail" periodically in cluster log
- https://github.com/ceph/ceph/pull/38118
- 05:26 PM Bug #48219 (Fix Under Review): qa/standalone/scrub/osd-scrub-test.sh: TEST_scrub_extented_sleep: ...
- 05:25 PM Bug #48220 (Fix Under Review): qa/standalone/misc/ver-health.sh: TEST_check_version_health_1: re...
11/12/2020
- 11:18 PM Bug #48220 (Resolved): qa/standalone/misc/ver-health.sh: TEST_check_version_health_1: return 1
- ...
- 11:17 PM Bug #48219 (Resolved): qa/standalone/scrub/osd-scrub-test.sh: TEST_scrub_extented_sleep: return 1
- ...
- 11:03 PM Bug #48042 (Pending Backport): Log "ceph health detail" periodically in cluster log
- 10:40 AM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
- I made a copy-and-paste mistake -- the ceph report had
"min_last_epoch_clean": 163735,
- 10:38 AM Bug #48212 (Resolved): pool last_epoch_clean floor is stuck after pg merging
- We just merged a pool (id 36) from 1024 to 64 PGs, and after this was done the cluster osdmaps were no longer trimmed...
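A sketch of how the stuck floor can be observed; the jq paths assume the nautilus "ceph report" layout:
  ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'
  ceph report 2>/dev/null | jq '.osdmap_clean_epochs.min_last_epoch_clean'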
11/11/2020
- 08:46 PM Bug #45947: ceph_test_rados_watch_notify hang seen in nautilus
- Agreed, it looks related. Let's leave it here for now.
- 08:24 PM Bug #45947: ceph_test_rados_watch_notify hang seen in nautilus
- ...
- 04:06 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- https://pulpito.ceph.com/yuriw-2020-11-10_19:24:45-rados-wip-yuri4-testing-2020-11-10-0959-distro-basic-smithi/
ht...
- 01:27 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- Fails deterministically: https://pulpito.ceph.com/nojha-2020-11-09_22:09:31-rados:monthrash-master-distro-basic-smith...
- 03:59 PM Documentation #47523 (Resolved): ceph df documentation is outdated
- 02:20 PM Bug #47697 (Resolved): mon: set session_timeout when adding to session_map
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 02:19 PM Bug #47951 (Resolved): MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 02:18 PM Backport #47748 (Resolved): nautilus: mon: set session_timeout when adding to session_map
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37554
m...
- 01:05 AM Backport #47748: nautilus: mon: set session_timeout when adding to session_map
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37554
merged
- 02:04 PM Bug #48183 (New): monmap::build_initial returns different error val on FreeBSD
- 3/3 Test #132: unittest_mon_monmap ..............***Failed 0.14 sec
Running main() from gmock_main.cc
[=========...
- 01:56 PM Backport #47747 (Resolved): octopus: mon: set session_timeout when adding to session_map
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37553
m...
- 11:22 AM Bug #47044: PG::_delete_some isn't optimal iterating objects
- Can confirm this makes replacing hw for an S3 cluster quite intrusive, due to the block-db devs getting overloaded (i...
- 10:14 AM Feature #48182 (Resolved): osd: allow remote read by calling cls method from within cls context
- Currently, a cls method can only access an object's data and metadata.
However, in some cases, it would be useful if...
- 06:37 AM Bug #48163: osd: osd crash due to FAILED ceph_assert(current_best)
- Please close this issue.
- 06:36 AM Bug #48163: osd: osd crash due to FAILED ceph_assert(current_best)
- Perf dump shows the crashed osd has too many active connections. Finally, we found some ceph-fuse clients on other ho...
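The connection counts can be read from the messenger perf counters; a hedged sketch (osd.0 and the worker name are placeholders):
  ceph daemon osd.0 perf dump | jq '."AsyncMessenger::Worker-0".msgr_active_connections'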
11/10/2020
- 10:34 PM Bug #48077 (Resolved): Allowing scrub configs begin_day/end_day to include 7 and begin_hour/end_h...
- 10:04 PM Bug #48173 (New): Additional test cases for osd-recovery-scrub.sh
- 3. Remote reservation non-overlapping PGs start recovery on PG that has a replica
4. failed local (need sleep in O...
- 08:13 PM Bug #48172 (New): Nautilus 14.2.13 osdmap not trimming on clean cluster
- We have a cluster running on 14.2.13 (some OSDs are on 14.2.9; they are on Debian). The cluster is in active+clean state, ...
- 07:32 PM Backport #47747: octopus: mon: set session_timeout when adding to session_map
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37553
merged
- 06:57 PM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
- http://pulpito.front.sepia.ceph.com/gregf-2020-11-06_17:45:44-rados-wip-stretch-fixes-116-2-distro-basic-smithi/ has ...
- 06:57 PM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
- http://pulpito.front.sepia.ceph.com/gregf-2020-11-06_17:45:44-rados-wip-stretch-fixes-116-2-distro-basic-smithi/5597065
- 11:25 AM Bug #48163 (Closed): osd: osd crash due to FAILED ceph_assert(current_best)
- More than 90 osds crashed in my cluster; the crash info is as follows:
"os_version_id": "7",
"assert_condition...
- 09:52 AM Bug #48065: "ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree
- Actually, the problem with the weight not being updated on the class subtree is easily reproducible on a vstart cluster (se...
- 12:23 AM Bug #18445 (Won't Fix): ceph: ping <mon.id> doesn't connect to cluster
- Long since fixed differently
11/09/2020
- 11:57 PM Bug #48065 (Need More Info): "ceph osd crush set|reweight-subtree" commands do not set weight on ...
- This does sound like a bug. Can you please share the osdmap?
- 10:11 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- Fails on every run
https://pulpito.ceph.com/teuthology-2020-11-08_07:01:02-rados-master-distro-basic-smithi/
http...
- 09:52 PM Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failu...
- rados/singleton/{all/max-pg-per-osd.from-replica mon_election/connectivity msgr-failures/many msgr/async-v2only objec...
- 08:35 PM Bug #48153 (Fix Under Review): collection_list_legacy: pg inconsistent
- 06:35 PM Bug #48153: collection_list_legacy: pg inconsistent
- And just as a note: the problem is observed only when the scrub is run for a pg that has both old and new versions o...
- 06:23 PM Bug #48153 (In Progress): collection_list_legacy: pg inconsistent
- > And if you see "collection_list_legacy" in the log, then I am interested to see more details about object names you...
- 05:08 PM Bug #48153: collection_list_legacy: pg inconsistent
- > ceph tell osd.430 injectargs '--debug-osd=10'
sorry, it should be '--debug-bluestore=10'.
And if you see "c...
- 04:17 PM Bug #48153: collection_list_legacy: pg inconsistent
- Alexander, I am building a cluster to reproduce the issue.
Meantime, could you please increase debug_osd level on ...
- 03:38 PM Bug #48153: collection_list_legacy: pg inconsistent
- Mykola Golub wrote:
> Alexander, did you have osds of different versions in your acting set? If you did, what exactl...
- 03:31 PM Bug #48153: collection_list_legacy: pg inconsistent
- > missing replica happens only on osd with 14.2.12 or 14.2.13, on 14.2.11 all ok.
Can I assume that in [206,430,41...
- 03:26 PM Bug #48153: collection_list_legacy: pg inconsistent
- Alexander, did you have osds of different versions in your acting set? If you did, what exactly were they?
Also is i...
- 03:07 PM Bug #48153 (Resolved): collection_list_legacy: pg inconsistent
- hello ppl,
i have a problem with OSDs on Nautilus 14.2.12 and 14.2.13;
after the update, some OSDs on 14.2.12 or 14.2.13 sta...
- 02:09 PM Backport #47362: nautilus: pgs inconsistent, union_shard_errors=missing
- @Alexander -
This issue is closed and it is only by chance that I saw your comment on it and decided to respond.
...
- 11:20 AM Backport #47362: nautilus: pgs inconsistent, union_shard_errors=missing
- The bug happens on 14.2.13:
ceph pg 21.4c1 query | jq .state...
- 12:50 PM Feature #48151: osd: allow remote read by calling cls method from within cls context
- I should have created this ticket under RADOS project, not Ceph project.
- 12:43 PM Feature #48151 (Closed): osd: allow remote read by calling cls method from within cls context
- Currently, a cls method can only access an object's data and metadata.
However, in some cases, it would be useful if...
11/08/2020
- 04:25 PM Bug #46318: mon_recovery: quorum_status times out
- /a/kchai-2020-11-08_14:53:34-rados-wip-kefu-testing-2020-11-07-2116-distro-basic-smithi/5602229/
11/07/2020
- 02:34 AM Bug #45706: Memory usage in buffer_anon showing unbounded growth in osds on EC pool. (14.2.9)
- There's this buffer::list::rebuild buffer_anon leak fix in the master branch that may solve the issue:
https://git...
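The buffer_anon growth itself can be tracked per OSD via the mempool stats (a sketch; osd.0 is a placeholder):
  ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.buffer_anon'   # items and bytes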
11/06/2020
- 11:03 PM Bug #48033 (Need More Info): mon: after unrelated crash: handle_auth_request failed to assign glo...
- https://tracker.ceph.com/issues/47654#note-7 and https://tracker.ceph.com/issues/47654#note-8 may help to understand ...
11/05/2020
- 09:09 PM Bug #47654: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- Greg, I am assigning this bug to you, let me know if you need anything from me.
- 05:59 PM Backport #47993 (Resolved): nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37818
m...
- 05:17 PM Backport #47993: nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37818
merged
- 05:59 PM Backport #47825: nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37815
m...
- 05:28 PM Backport #47825 (Resolved): nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
- 05:16 PM Backport #47825: nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37815
merged
- 05:57 PM Backport #47826: octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37853
m...
- 05:28 PM Backport #47826 (Resolved): octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
- 04:27 PM Backport #47826: octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37853
merged
- 05:56 PM Backport #47994 (Resolved): octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37819
m...
- 04:22 PM Backport #47994: octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37819
merged
- 05:56 PM Backport #47987 (Resolved): octopus: MonClient: mon_host with DNS Round Robin results in 'unable ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37817
m...
- 04:22 PM Backport #47987: octopus: MonClient: mon_host with DNS Round Robin results in 'unable to parse ad...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37817
merged
- 05:29 PM Bug #46405 (Resolved): osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
- 04:15 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
- ...
- 12:46 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- /a/teuthology-2020-11-04_07:01:02-rados-master-distro-basic-smithi/5590078
- 12:44 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- /a/teuthology-2020-11-04_07:01:02-rados-master-distro-basic-smithi/5590040 looks similar
- 12:41 AM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
- /a/teuthology-2020-11-04_07:01:02-rados-master-distro-basic-smithi/5590019
- 12:39 AM Bug #48029: Exiting scrub checking -- not all pgs scrubbed.
- rados/singleton-nomsgr/{all/osd_stale_reads mon_election/connectivity rados supported-random-distro$/{ubuntu_latest}}...
11/04/2020
- 11:39 PM Bug #47654: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- Neha Ojha wrote:
> Greg Farnum wrote:
> > Oh the bug does occur while executing commands to change the strategy. Bu...
- 11:10 PM Bug #47654 (Triaged): test_mon_pg: mon fails to join quorum due to election strategy mismatch
- Greg Farnum wrote:
> Oh the bug does occur while executing commands to change the strategy. But this all still looks...
- 07:46 PM Bug #46264: mon: check for mismatched daemon versions
- Delay 7 days by default (via a config value) before raising the health warning/error when detected.
- 07:46 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
- Enhancement:
Possible test cases to replace existing test
1. Simple test for "not scheduling scrubs due to active...
11/03/2020
- 11:02 PM Bug #48077 (Fix Under Review): Allowing scrub configs begin_day/end_day to include 7 and begin_ho...
- 11:51 AM Bug #47654: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- Oh the bug does occur while executing commands to change the strategy. But this all still looks fine to me and certai...
- 11:39 AM Bug #47654: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- Hmm, I'm confused about the "^C" output as the incoming message strategy. Going over things, as best I can tell those...
- 12:46 AM Bug #47654: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- When mon.c calls for election...
- 07:45 AM Bug #48060: data loss in EC pool
- osd.22 crashed every minute with '/build/ceph-15.2.5/src/osd/osd_types.cc: 5698: FAILED ceph_assert(clone_size.count(...
11/02/2020
- 11:34 PM Bug #48077 (Resolved): Allowing scrub configs begin_day/end_day to include 7 and begin_hour/end_h...
- Make the range of "osd scrub begin/end week day" 0-6 in options.cc
- Handle code in [1] to deal with this and te...
- 07:26 PM Bug #48060: data loss in EC pool
- During the day the unfound object list grows. The current state is:
[ERR] PG_DAMAGED: Possible data damage: 5 pgs recovery_u...
- 06:04 AM Bug #48060: data loss in EC pool
- When osd.22, osd.34 and osd.43 were down, ceph -w gave us endless lines of:
2020-11-02T07:18:00.464219+0200 osd.45 ...
- 05:18 AM Bug #48060: data loss in EC pool
- We accepted the data loss and executed the command:
root@ik01:~# ceph pg 30.17 mark_unfound_lost delete
pg has 1 object...
- 11:02 AM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
- observed failing test case in interactive on error mode, with this config for yaml: ...
- 07:12 AM Bug #48065 (Resolved): "ceph osd crush set|reweight-subtree" commands do not set weight on device...
- We noticed that if one sets an osd crush weight using the command...
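A hedged sketch of reproducing the check on a test cluster (bucket name and weight are placeholders):
  ceph osd crush reweight-subtree host1 1.0   # set the weight on the plain subtree
  ceph osd crush tree --show-shadow           # compare the per-device-class shadow buckets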
- 01:12 AM Bug #45441: rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
- /a/teuthology-2020-10-28_07:01:02-rados-master-distro-basic-smithi/5567279 shows the following at the time the failur...
11/01/2020
- 04:09 PM Bug #48060 (New): data loss in EC pool
- We have data LOSS in our EC pool k4m2.
The pool is used for RBD volumes. 15 RBD volumes have broken objects.
Broken ob...
- 02:14 PM Bug #48059 (New): core dump running osdmaptool
- I have an Octopus (15.2.4) cluster with degraded and unfound objects, and PGs that have been stuck in the degraded an...
10/31/2020
- 05:13 AM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
- I can reproduce this issue locally on a vstart cluster using master HEAD at 038750c78afd56c7becf744cf7dc4f8d115793...
- 03:12 AM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
- ...
10/30/2020
- 09:47 PM Bug #45243: nautilus: qa/standalone/scrub/osd-scrub-repair.sh fails with osd-scrub-repair.sh:698:...
- Haven't seen it recently but there might still be a race somewhere which causes this.
- 09:45 PM Bug #44945 (Need More Info): Mon High CPU usage when another mon syncing from it
- Which ceph version is this? I'd be curious to know if this is still an issue with Octopus since we improved removed s...
- 09:40 PM Bug #44694 (Duplicate): MON_DOWN during cluster setup
- https://tracker.ceph.com/issues/45441 seems to be the same issue.
- 09:38 PM Bug #44643 (Can't reproduce): leaked buffer (alloc from MonClient::handle_auth_request)
- 09:37 PM Bug #44243 (Can't reproduce): memstore make check test fails
- 09:36 PM Bug #44217 (Can't reproduce): Leaked connection (alloc from AsyncMessenger::add_accept)
- 09:27 PM Bug #43915 (Can't reproduce): leaked Session (alloc from OSD::ms_handle_authentication)
- 09:24 PM Bug #43591: /sbin/fstrim can interfere with umount
- This might still be a problem, just haven't seen it recently.
- 09:16 PM Bug #43185 (Resolved): ceph -s not showing client activity
- 09:15 PM Bug #42921 (Can't reproduce): osd: segmentation fault in PGLog::check
- 09:14 PM Bug #42706 (Can't reproduce): LibRadosList.EnumerateObjectsSplit fails
- 09:13 PM Bug #42186 (Can't reproduce): "2019-10-04T19:31:51.053283+0000 osd.7 (osd.7) 108 : cluster [ERR] ...
- 09:12 PM Bug #42175 (Can't reproduce): _txc_add_transaction error (2) No such file or directory not handl...
- 09:11 PM Bug #41943 (Closed): ceph-mgr fails to report OSD status correctly
- Closing for lack of information and also luminous is EOL now. Please feel free to reopen if this reproduces on a rece...
- 09:10 PM Bug #41748 (Can't reproduce): log [ERR] : 7.19 caller_ops.size 62 > log size 61
- 09:09 PM Bug #40820 (Closed): standalone/scrub/osd-scrub-test.sh +3 day failed assert
- Haven't seen this in a while.
- 09:08 PM Bug #40721 (Can't reproduce): backfill caught in loop from block
- 09:07 PM Bug #40522 (Can't reproduce): on_local_recover doesn't touch?
- 09:05 PM Bug #40454 (Can't reproduce): snap_mapper error, scrub gets r -2..repaired
- 09:04 PM Bug #41183 (Resolved): pg autoscale on EC pools
- 06:28 PM Bug #47930 (In Progress): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: ret...
- 04:16 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
- to reproduce, we just need to change `s/mon client directed command retry: 5/mon client directed command retry: 2 ru...
- 10:43 AM Bug #48042 (Fix Under Review): Log "ceph health detail" periodically in cluster log
- 10:24 AM Bug #47673: cephfs 4k randwrite + EC pool(2+1) + single node all OSDs OOM
- 鑫 王 wrote:
> *A slow IO will occur during execution.*
> I have another question why is the field buffer_anon also g...
10/29/2020
- 11:39 PM Bug #47654: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- Based on f7099f72faccb09aea5054c0b428bf89be67141c, "failed to assign global_id" is expected when we are not in quorum. T...
- 07:55 PM Bug #47654: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- Looking at logs from /a/nojha-2020-10-28_21:12:45-rados:singleton-bluestore-master-distro-basic-smithi/5569512/
We...
- 05:34 PM Bug #47654: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- I am able to reproduce this without msgr failure injection.
rados:singleton-bluestore/{all/cephtool mon_election/...
- 08:08 PM Bug #43193 (Fix Under Review): "ceph ping mon.<id>" cannot work
- I can confirm this. There is a detailed explanation at https://github.com/ceph/ceph/pull/37716 but the briefest summa...
- 04:45 PM Bug #48042: Log "ceph health detail" periodically in cluster log
- Neha Ojha wrote:
> This will help us spot things like obvious network issues which can lead to racks/hosts down in a...
- 04:39 PM Bug #48042 (Resolved): Log "ceph health detail" periodically in cluster log
- This will help us spot things like obvious network issues which can lead to racks/hosts down in a cluster. Also gives...
- 03:57 PM Documentation #18986: Need to document monitor health configuration values
- This got outdated; we will discuss which metrics are still relevant and need to be documented, and update soon.
- 08:45 AM Backport #47986: nautilus: MonClient: mon_host with DNS Round Robin results in 'unable to parse a...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37816
m...
- 03:31 AM Backport #47986 (Resolved): nautilus: MonClient: mon_host with DNS Round Robin results in 'unable...
- 04:42 AM Bug #47673: cephfs 4k randwrite + EC pool(2+1) + single node all OSDs OOM
- hi Igor,
I can't explain why the OSD handles 850K (writing 7426528761/8704), but when the load is very low (a client iodep...
- 02:43 AM Bug #48028: ceph-mon always suffers lots of slow ops from v14.2.9
- Yao Ning wrote:
> root@worker-2:~# docker exec ceph-mon-worker-2 ceph -s
> cluster:
> id: 299a04ba-dd3e-...
10/28/2020
- 06:42 PM Bug #48033 (Closed): mon: after unrelated crash: handle_auth_request failed to assign global_id; ...
- ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94) nautilus (stable)
I have tried to extensively se...
- 06:17 PM Backport #47986: nautilus: MonClient: mon_host with DNS Round Robin results in 'unable to parse a...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37816
merged
- 05:12 PM Bug #45190: osd dump times out
- ...
- 05:00 PM Bug #47654: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- /a/teuthology-2020-10-28_07:01:02-rados-master-distro-basic-smithi/5567283
- 04:59 PM Bug #45441: rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
- rados/singleton/{all/mon-config mon_election/connectivity msgr-failures/many msgr/async objectstore/bluestore-comp-lz...
- 04:57 PM Bug #48030 (Resolved): mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeou...
- ...
- 04:52 PM Bug #48029 (New): Exiting scrub checking -- not all pgs scrubbed.
- ...
- 03:28 PM Bug #48028: ceph-mon always suffers lots of slow ops from v14.2.9
- Yao Ning wrote:
> root@worker-2:~# docker exec ceph-mon-worker-2 ceph -s
> cluster:
> id: 299a04ba-dd3e-...
- 03:21 PM Bug #48028 (Won't Fix - EOL): ceph-mon always suffers lots of slow ops from v14.2.9
- root@worker-2:~# docker exec ceph-mon-worker-2 ceph -s
cluster:
id: 299a04ba-dd3e-43a7-af17-628190cf742f
...
- 01:16 PM Bug #48026 (New): Mon crashes when adding 4th OSD
- *Context*: I'm running Ceph Octopus 15.2.5 (the latest as of this bug) using Rook on a toy Kubernetes cluster of two ...
10/27/2020
- 10:49 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
- We only need 1 pool with 1 pg, if we orchestrate carefully. The existing test is more like a shotgun, sending lots...
- 08:14 PM Bug #47930 (Triaged): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
- /a/teuthology-2020-10-21_07:01:02-rados-master-distro-basic-smithi/5544900 - here the failure occurred because the la...
- 09:06 PM Bug #47952: Replicated pool creation fails Nautilus 14.2.12 build when cluster runs with filestor...
- Neha Ojha wrote:
> 14.2.12 introduced the following change in https://github.com/ceph/ceph/pull/37474, which is prob...
- 09:34 AM Backport #47826 (In Progress): octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: ret...
- 09:34 AM Backport #47741 (Duplicate): octopus: mon: set session_timeout when adding to session_map
10/26/2020
- 09:11 PM Backport #47994 (In Progress): octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- 10:34 AM Backport #47994 (Resolved): octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- https://github.com/ceph/ceph/pull/37819
- 09:05 PM Backport #47993 (In Progress): nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- 10:34 AM Backport #47993 (Resolved): nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- https://github.com/ceph/ceph/pull/37818
- 09:04 PM Backport #47987 (In Progress): octopus: MonClient: mon_host with DNS Round Robin results in 'unab...
- 10:32 AM Backport #47987 (Resolved): octopus: MonClient: mon_host with DNS Round Robin results in 'unable ...
- https://github.com/ceph/ceph/pull/37817
- 09:02 PM Backport #47986 (In Progress): nautilus: MonClient: mon_host with DNS Round Robin results in 'una...
- 10:32 AM Backport #47986 (Resolved): nautilus: MonClient: mon_host with DNS Round Robin results in 'unable...
- https://github.com/ceph/ceph/pull/37816
- 08:39 PM Backport #47825 (In Progress): nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: re...
- 05:28 PM Bug #44981 (Resolved): rados/test_envlibrados_for_rocksdb.sh build failure (seen in nautilus)
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 05:27 PM Bug #47654: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- sentry event: https://sentry.ceph.com/organizations/ceph/issues/10/events/2bdb1a2346cf4325b1bfaa7adf609f15/?project=2...
- 11:21 AM Bug #47974: Slow requests due to unhealthy heartbeat - 'OSD::osd_op_tp thread 0x7f7f85903700' had ...
- Did you perform any large pool/PG removals recently? Or maybe some data rebalancing that could result in PG migratio...
- 11:08 AM Backport #45781 (Rejected): mimic: rados/test_envlibrados_for_rocksdb.sh build failure (seen in n...
- mimic EOL
- 10:46 AM Backport #47898 (Resolved): octopus: mon stat prints plain text with -f json
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37705
m...
- 10:33 AM Backport #47992 (Rejected): mimic: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- 07:00 AM Bug #47951 (Pending Backport): MonClient: mon_host with DNS Round Robin results in 'unable to par...
10/25/2020
- 01:56 PM Bug #47328 (Pending Backport): nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- 04:26 AM Bug #47929: Huge RAM Usage on OSD recovery
- Neha Ojha wrote:
> Can you export and upload a copy of the problematic PG via ceph-post-file?
ceph-post-file: 7639cc...
10/24/2020
- 04:05 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
- ...
- 07:40 AM Bug #47974 (New): Slow requests due to unhealthy heartbeat - 'OSD::osd_op_tp thread 0x7f7f85903700...
- Slow requests observed due to unhealthy heartbeats on osd.2.
/a/sseshasa-2020-10-23_12:25:57-rados-wip-sseshasa-tes... - 02:23 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
- Alex Litvak wrote:
> Will the fix to it be posted soon? I am building ceph in containers from existing releases, is... - 02:23 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
- Will the fix to it be posted soon? I am building ceph in containers from existing releases, is there a tag I can use...
10/23/2020
- 10:08 PM Bug #47929: Huge RAM Usage on OSD recovery
- Neha Ojha wrote:
> Can you export and upload a copy of the problematic PG via ceph-post-file?
there are different P...
- 08:11 PM Bug #47929: Huge RAM Usage on OSD recovery
- Can you export and upload a copy of the problematic PG via ceph-post-file?
- 05:42 PM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
- This appears to break any sort of resolution of IPv6 addresses from hostnames. This affects qemu's usage of rbd, in ...
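For reference, a minimal sketch of the kind of configuration that regressed (the hostname is a placeholder):
  grep mon_host /etc/ceph/ceph.conf
  # e.g. mon_host = mons.example.com   (one round-robin DNS name with several A/AAAA records)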
- 11:30 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
- The fix is probably:...
- 07:20 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
- Seems like this commit broke this functionality: https://github.com/ceph/ceph/commit/2f075704073ff80f94c70cf79516028d...
- 03:48 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
- /a/teuthology-2020-10-23_07:01:02-rados-master-distro-basic-smithi/5550707
- 03:40 PM Bug #47654: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- /a/teuthology-2020-10-23_07:01:02-rados-master-distro-basic-smithi/5550826
- 03:17 PM Bug #38783: Changing mon_pg_warn_max_object_skew has no effect.
- Andrew Mitroshin wrote:
> Injecting into mgr has solved the issue, thanks!
What command did you use to inject int...
- 02:03 PM Backport #47898: octopus: mon stat prints plain text with -f json
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37705
merged
- 01:51 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
- http://qa-proxy.ceph.com/teuthology/yuriw-2020-10-20_19:54:27-rados-wip-yuri-testing-2020-10-20-0934-octopus-distro-b...
- 10:17 AM Bug #43893: lingering osd_failure ops (due to failure_info holding references?)
- This still exists in 14.2.11. When you have some network issues, you end up with SLOW_OPS with osd_f...
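The lingering ops should be visible on the mon admin socket; a sketch (mon.a and the jq filter are illustrative):
  ceph daemon mon.a ops | jq '.ops[] | select(.description | contains("osd_failure"))'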
10/22/2020
- 10:11 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
- /a/kchai-2020-10-21_07:01:44-rados-wip-kefu-testing-2020-10-21-1144-distro-basic-smithi/5545065
- 07:52 PM Bug #47952: Replicated pool creation fails Nautilus 14.2.12 build when cluster runs with filestor...
- 14.2.12 introduced the following change in https://github.com/ceph/ceph/pull/37474, which is probably the case you ar...
- 07:08 PM Bug #47952 (New): Replicated pool creation fails Nautilus 14.2.12 build when cluster runs with fi...
- Tried pool creation using ceph-ansible 4.0 and replicated pool creation failed with the following error:
Build: Nautilus 1...
- 06:10 PM Bug #47929: Huge RAM Usage on OSD recovery
- Nope, the export-import approach does not work: on recovery, when that PG needs to be recovered, the OSD gets OOM-killed.
- 01:52 PM Bug #47929: Huge RAM Usage on OSD recovery
- There is some strange behavior: now on another failing OSD it does not work at all, and I execute the export-remove a...
- 03:47 AM Bug #47929: Huge RAM Usage on OSD recovery
- Changed and used the ...
- 05:10 PM Bug #47951 (Fix Under Review): MonClient: mon_host with DNS Round Robin results in 'unable to par...
- 05:06 PM Bug #47951 (In Progress): MonClient: mon_host with DNS Round Robin results in 'unable to parse ad...
- 04:34 PM Bug #47951 (Resolved): MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
- I performed a test upgrade to 14.2.12 today on a cluster using IPv6 with Round Robin DNS for mon_host...
- 04:33 PM Bug #47949: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
- ...
- 02:54 PM Bug #47949: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
- Deepika Upadhyay wrote:
http://qa-proxy.ceph.com/teuthology/yuriw-2020-10-20_15:30:01-rados-wip-yuri5-testing-2020...
- 01:06 PM Bug #47949 (New): scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
- ...
- 03:53 PM Bug #40777 (New): hit assert in AuthMonitor::update_from_paxos
- 03:22 PM Bug #47767: octopus: setting noscrub crashed osd process
- It happened again moments after setting nodeep-scrub:...
- 11:00 AM Bug #46732: teuthology.exceptions.MaxWhileTries: 'check for active or peered' reached maximum tri...
- saw this recently, with the same configuration description:
/a/yuriw-2020-10-20_15:30:01-rados-wip-yuri5-testing-2020-...
- 07:41 AM Bug #47945 (Duplicate): scrubbing failure
- description: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/short_
2-recovery-overri...
- 04:59 AM Bug #46845 (Resolved): Newly orchestrated OSD fails with 'unable to find any IPv4 address in netw...
- https://github.com/ceph/ceph/pull/37709