Activity
From 11/09/2020 to 12/08/2020
12/08/2020
- 01:23 PM Bug #38345 (Resolved): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 01:17 PM Backport #48496 (Resolved): octopus: Paxos::restart() and Paxos::shutdown() can race leading to u...
- https://github.com/ceph/ceph/pull/39161
- 01:17 PM Backport #48495 (Resolved): nautilus: Paxos::restart() and Paxos::shutdown() can race leading to ...
- https://github.com/ceph/ceph/pull/39160
- 07:59 AM Bug #48489 (New): osdmap cluster-only data is interpreted by the userspace client and causes cras...
- We have a cluster-only portion of the OSDMap, which is carefully encoded separately from the portion which clients ne...
12/07/2020
- 09:42 PM Bug #48386 (Pending Backport): Paxos::restart() and Paxos::shutdown() can race leading to use-aft...
- 09:17 PM Backport #43880 (Rejected): luminous: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- 07:45 PM Bug #48485 (New): osd thrasher timeout
- One of my test runs failed with this:...
- 06:16 PM Backport #48480 (In Progress): octopus: PG::_delete_some isn't optimal iterating objects
- https://github.com/ceph/ceph/pull/38477
- 02:11 PM Backport #48480 (Resolved): octopus: PG::_delete_some isn't optimal iterating objects
- https://github.com/ceph/ceph/pull/38477
- 06:15 PM Backport #48482 (In Progress): nautilus: PG::_delete_some isn't optimal iterating objects
- https://github.com/ceph/ceph/pull/38478
- 02:11 PM Backport #48482 (Resolved): nautilus: PG::_delete_some isn't optimal iterating objects
- https://github.com/ceph/ceph/pull/38478
- 02:11 PM Backport #48481 (Rejected): mimic: PG::_delete_some isn't optimal iterating objects
- 01:44 PM Bug #47044 (Pending Backport): PG::_delete_some isn't optimal iterating objects
12/05/2020
- 02:13 PM Bug #48432 (Resolved): test/lazy-omap-stats fails to compile on alpine linux
- 01:51 PM Bug #47044 (Resolved): PG::_delete_some isn't optimal iterating objects
12/04/2020
- 10:18 PM Bug #48323 (Fix Under Review): "size 0 != clone_size 10" in clog - clone size mismatch when dedup...
- 09:11 PM Bug #48468: ceph-osd crash before being up again
- Igor Fedotov wrote:
> Just to mention - telemetry reports show multiple crashes inside HeartbeatMap::_check for diff...
- 09:08 PM Bug #48468: ceph-osd crash before being up again
- Just to mention - telemetry reports show multiple crashes inside HeartbeatMap::_check for different clusters.
Hence ...
- 09:06 PM Bug #48468: ceph-osd crash before being up again
- Igor Fedotov wrote:
> I believe this isn't ceph-deploy issue...
Probably not indeed, my bad
- 09:05 PM Bug #48468: ceph-osd crash before being up again
- I believe this isn't ceph-deploy issue...
- 08:59 PM Bug #48468: ceph-osd crash before being up again
- I'm adding the crash report as well
- 08:42 PM Bug #48468 (Need More Info): ceph-osd crash before being up again
- Hi hi,
I'm in trouble with 3 OSDs never able to be up again inside the cluster after having manually marked them as "ou...
- 11:32 AM Backport #48444: nautilus: octopus: setting noscrub crashed osd process
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38411
m...
- 12:14 AM Backport #48444 (Resolved): nautilus: octopus: setting noscrub crashed osd process
- 05:35 AM Bug #48030 (Resolved): mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeou...
- 12:15 AM Bug #47767 (Resolved): octopus: setting noscrub crashed osd process
12/03/2020
- 10:14 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- epoch 100 [0, 1, 2, 3 , 4, 5 ] (100'950, 100'1000]
epoch 110 [0, 1, 2, 3 , 4, N ] 0,1,2,3 (100'960, 110'1020] 4 (100...
- 09:41 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- proc_master_log -> merge_log -> log.roll_forward_to -> advance_can_rollback_to
I think recovery below min_size (pe...
- 12:14 AM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- -Ok, with osd.31 we have 4 copies of the correct version of this object. However, min_size is 5, so one would assume...
- 06:12 PM Bug #48452 (New): pg merge explodes osdmap mempool size
- We have one cluster with several osds having >500MB osdmap mempools.
Here is one example from today:...
- 03:10 PM Bug #46847: Loss of placement information on OSD reboot
- Ok, that's the same state I see our PGs in when they become degraded due to remappings (might_have_unfound: already_p...
- 01:44 AM Backport #48444 (In Progress): nautilus: octopus: setting noscrub crashed osd process
- 01:39 AM Backport #48444 (Resolved): nautilus: octopus: setting noscrub crashed osd process
- https://github.com/ceph/ceph/pull/38411
- 01:38 AM Bug #47767 (Pending Backport): octopus: setting noscrub crashed osd process
- 12:35 AM Bug #48033 (Closed): mon: after unrelated crash: handle_auth_request failed to assign global_id; ...
12/02/2020
- 11:09 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- Yeah, osd.31 has a copy:
root@mira093:/var/lib/ceph/osd/ceph-31# ceph-objectstore-tool --data-path . '["119.385s4"...
- 10:56 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- If this guess is correct one or more of the following 8 osds should have a copy of {"oid":"10006fc22f8.00000000","key...
- 10:40 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- My current working assumption is that PeeringState::activate is willing to backfill a non-contiguous replica which ha...
- 09:46 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- 122 doesn't seem to have a 119.385 instance.
- 09:36 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- [Wrong, used the wrong shard field]
- 08:57 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- Unfortunately, it's present but an older version...
- 08:49 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- I spot checked an object in 119.385
sjust@reesi002:~$ sudo ceph pg 119.385 list_unfound | head -n 30...
- 08:39 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- [NVM, used the command wrong]
- 08:03 PM Bug #48440 (Need More Info): log [ERR] : scrub mismatch
- ...
- 12:27 PM Bug #48432 (Resolved): test/lazy-omap-stats fails to compile on alpine linux
- lazy_omap_stats_test.h uses uint which alpine linux lacks
- 11:30 AM Feature #48430 (New): Add memory consumption of nodes to health checks
- During some tests using a (very small) virtual cluster I noticed that Ceph doesn't seem to 'notice' when a node runs ...
12/01/2020
- 11:07 PM Documentation #48420 (In Progress): Add guidelines regarding introducing new dependencies to Ceph
- This should help developers better understand the consequences of introducing new dependencies to the project.
- 10:41 PM Feature #48419 (In Progress): Add support for balance by space utilization and evenly spread prim...
- Replace pg count balancing with utilization with deviation %
followed by primary balancing for replicated that onl...
- 09:53 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
- shard 122(5) looks suspicious, it seems to not have been queried from the pg query output but all_unfound_are_queried...
- 08:59 PM Bug #48417 (Duplicate): unfound EC objects in sepia's LRC after upgrade
- ...
- 07:42 PM Backport #48227 (Resolved): nautilus: Log "ceph health detail" periodically in cluster log
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38118
m... - 05:53 PM Feature #39362: ignore osd_max_scrubs for forced repair
- We probably should only do this if #41363 has been implemented together or beforehand. Since too many repairs could ...
- 05:08 PM Bug #47380 (Fix Under Review): mon: slow ops due to osd_failure
11/30/2020
- 11:14 PM Bug #48320 (Resolved): mon/mon-last-epoch-clean.sh fails
- 07:55 PM Bug #48030 (Fix Under Review): mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_o...
- 07:46 PM Bug #48030 (In Progress): mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_tim...
- 06:32 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- Looking at the constructor of RadosClient class, the "add_observer()" call is missing due to which
the config option...
- 06:51 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- The logs did not show any clues as to why the "pg dump" command hung forever.
The suspicion is that the "rados_m...
- 06:06 PM Bug #35808 (Rejected): ceph osd ok-to-stop result doesn't match the real situation
- Marking rejected because the reporter hasn't responded to the request.
- 06:03 PM Bug #23875 (Resolved): Removal of snapshot with corrupt replica crashes osd
- Marking resolved since no issue has been seen since the supposed partial fix was added.
- 06:00 PM Bug #27988 (Rejected): Warn if queue of scrubs ready to run exceeds some threshold
- This was already handled in a different but reasonable way by https://github.com/ceph/ceph/pull/15643 and refined b...
- 05:43 PM Bug #46264 (Resolved): mon: check for mismatched daemon versions
- 05:40 PM Backport #48227: nautilus: Log "ceph health detail" periodically in cluster log
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/38118
merged
- 03:15 PM Bug #48385 (Fix Under Review): nautilus: statfs: a cluster with any up but out osd will report by...
- 02:52 PM Bug #48385 (In Progress): nautilus: statfs: a cluster with any up but out osd will report bytes_u...
- I can reproduce the issue on a vstart cluster with both latest Nautilus and Octopus, but not on master.
Looks like th...
- 12:24 PM Backport #48378 (In Progress): octopus: invalid values of crush-failure-domain should not be allo...
- 12:22 PM Backport #48228 (In Progress): octopus: Log "ceph health detail" periodically in cluster log
11/29/2020
- 03:01 PM Feature #48392 (New): ceph ignores --keyring?
- I'm trying to set up a new OSD. I'm having some issues with the rollback not performing properly.
When "ceph-volume ...
11/27/2020
- 06:13 PM Bug #48033: mon: after unrelated crash: handle_auth_request failed to assign global_id; probing, ...
- It seems that this can be closed: there is something deeply cursed on one of the machines in that cluster and seeming...
- 09:39 AM Bug #42341: OSD PGs are not being purged
- Has long been resolved, don't even remember the details anymore.
- 04:26 AM Bug #48386 (Fix Under Review): Paxos::restart() and Paxos::shutdown() can race leading to use-aft...
- 12:15 AM Bug #48386 (Resolved): Paxos::restart() and Paxos::shutdown() can race leading to use-after-free ...
- ...
11/26/2020
- 07:52 PM Bug #48385: nautilus: statfs: a cluster with any up but out osd will report bytes_used == stored
- My best guess is it's related to this in PGMap.h:...
- 07:18 PM Bug #48385 (Resolved): nautilus: statfs: a cluster with any up but out osd will report bytes_used...
- The pool df stats are supposed to present user bytes as "stored" and raw used as "bytes_used".
But if any osd is up...
- 11:18 AM Bug #46816 (Resolved): mon stat prints plain text with -f json
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 11:17 AM Backport #48379 (Resolved): nautilus: invalid values of crush-failure-domain should not be allowe...
- https://github.com/ceph/ceph/pull/39124
- 11:17 AM Backport #48378 (Resolved): octopus: invalid values of crush-failure-domain should not be allowed...
- https://github.com/ceph/ceph/pull/38347
- 10:34 AM Bug #48060: data loss in EC pool
- We ended up decommissioning ceph. For unlucky 'incomplete PG' googlers - here are steps how to migrate openstack VM s...
- 10:27 AM Bug #42341: OSD PGs are not being purged
- Does the workaround mentioned in #43948 help?
- 08:18 AM Bug #43948: Remapped PGs are sometimes not deleted from previous OSDs
- I can report the same in 14.2.11.
We set some osds to crush weight 0, they were draining. But due to #47044 some of ...
11/25/2020
- 10:42 PM Bug #47452 (Pending Backport): invalid values of crush-failure-domain should not be allowed while...
- 07:50 PM Backport #47899 (Resolved): nautilus: mon stat prints plain text with -f json
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37706
m...
- 04:48 PM Feature #48361 (New): Parallel querying of buckets.index listomapkeys
- "Querying listomapkeys for rgw.buckets.index takes 45 min in our test env (5 buckets * 521 shards). Each non-zero ind...
- 01:09 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- Updating the findings so far from logs under https://pulpito.ceph.com/nojha-2020-11-10_20:16:13-rados:monthrash-maste...
11/24/2020
- 11:43 PM Bug #47767 (In Progress): octopus: setting noscrub crashed osd process
- 08:47 PM Documentation #40579: doc: POOL_NEAR_FULL on OSD_NEAR_FULL
- This should be set to "resolved".
Anthony D'Atri fixed this one.
- 05:47 PM Backport #47899: nautilus: mon stat prints plain text with -f json
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37706
merged
- 05:22 PM Bug #47419: make check: src/test/smoke.sh: TEST_multimon: timeout 8 rados -p foo bench 4 write -b...
- https://jenkins.ceph.com/job/ceph-pull-requests/64337/consoleFull#10356408840526d21-3511-427d-909c-dd086c0d1034
- 03:15 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- I am assigning this to myself. Looking into the logs.
- 08:53 AM Bug #48065 (New): "ceph osd crush set|reweight-subtree" commands do not set weight on device clas...
- 08:53 AM Bug #48065: "ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree
- I eventually got approval from the customer to publish their data.
I have attached a tarball that includes `ce...
- 06:06 AM Bug #48336 (Fix Under Review): monmaptool --create --add nodeA --clobber monmap aborts in entity_...
- 05:01 AM Bug #48336 (Resolved): monmaptool --create --add nodeA --clobber monmap aborts in entity_addr_t::...
- It's incorrect usage of the command since an IP address is required but we should not abort IMHO....
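For reference, a hedged illustration of the failing vs. accepted invocations (address and map file name are placeholders); the complaint is that the name-only form aborts instead of printing a usage error:
monmaptool --create --add nodeA --clobber monmap
monmaptool --create --add nodeA 192.168.0.10:6789 --clobber monmap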
- 03:10 AM Bug #48334 (Rejected): Check of osd_scrub_auto_repair_num_errors happens before PG stat error is ...
- 03:06 AM Bug #48334 (Rejected): Check of osd_scrub_auto_repair_num_errors happens before PG stat error is ...
11/23/2020
- 01:06 PM Backport #48244 (Resolved): nautilus: collection_list_legacy: pg inconsistent
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38100
m...
- 08:02 AM Bug #48320 (Fix Under Review): mon/mon-last-epoch-clean.sh fails
- 07:39 AM Bug #48323: "size 0 != clone_size 10" in clog - clone size mismatch when deduped object is evicted
- https://github.com/ceph/ceph/pull/38237
- 07:38 AM Bug #48323 (Resolved): "size 0 != clone_size 10" in clog - clone size mismatch when deduped objec...
- When evicting a deduped object which is cloned,
we try to shrink its size to zero if all chunks are in chunk_map.
How...
11/22/2020
- 02:44 PM Bug #48320 (Resolved): mon/mon-last-epoch-clean.sh fails
- ...
- 06:10 AM Documentation #40579 (Fix Under Review): doc: POOL_NEAR_FULL on OSD_NEAR_FULL
- 05:47 AM Bug #45761 (Fix Under Review): mon_thrasher: "Error ENXIO: mon unavailable" during sync_force com...
11/20/2020
- 10:07 PM Bug #48219 (Resolved): qa/standalone/scrub/osd-scrub-test.sh: TEST_scrub_extented_sleep: return 1
- 10:07 PM Bug #48220 (Resolved): qa/standalone/misc/ver-health.sh: TEST_check_version_health_1: return 1
- 09:31 PM Backport #48244: nautilus: collection_list_legacy: pg inconsistent
- Mykola Golub wrote:
> https://github.com/ceph/ceph/pull/38100
merged
- 12:45 PM Bug #48298: hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly
- Now, about 18 hours later, the @num_pg@ already has dropped quite a bit. These are the exact same OSDs. The balancer ...
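For anyone watching this, an illustrative way to compare the per-OSD PG counts against the configured ceiling while things settle:
ceph osd df tree        # the PGS column shows the current per-OSD placement group count
ceph config get mon mon_max_pg_per_osd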
- 12:38 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- aah, I was working alongside the octopus batch, might have gotten confused, sorry
11/19/2020
- 08:41 PM Documentation #7386: librados: document rados_osd_op_timeout and rados_mon_op_timeout options
- Josh and @zdover23 : Both of these are marked in @options.cc@ as @LEVEL_ADVANCED@. There are >1500 options now, too ...
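Until the reference docs catch up, the current descriptions can be pulled straight from options.cc (illustrative):
ceph config help rados_mon_op_timeout
ceph config help rados_osd_op_timeout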
- 06:53 PM Bug #48298: hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly
- Another observation: The @num_pgs@ is the highest if it was created first on the same host. Later-created devices (hi...
- 06:37 PM Bug #48298 (New): hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly
- I just added OSDs to my cluster running 14.2.13....
- 05:33 PM Documentation #22843: [doc][luminous] the configuration guide still contains osd_op_threads and d...
- @zdover23 given Nathan's observation that upstream != RHCS, that some of the new options are in that document now, an...
- 05:23 PM Documentation #23354: doc: osd_op_queue & osd_op_queue_cut_off
- The default value for `osd_op_queue_cut_off` changed to `high` with Octopus and was documented as such.
https://g...
- 03:51 PM Bug #48297 (New): OSD process using up complete available memory after pg_num change / autoscaler on
- we made the following change on our cluster (cephadm octopus 15.2.5):
ceph osd pool set one pg_num 512
after some ti...
- 06:01 AM Documentation #23612: doc: add description of new auth profiles
- @zdover23 I think #23442 is a superset of this
- 04:24 AM Documentation #23777: doc: description of OSD_OUT_OF_ORDER_FULL problem
- @zdover23 I believe that https://github.com/ceph/ceph/pull/31588 fixed this already.
- 04:13 AM Documentation #35968: [doc][jewel] sync documentation "OSD Config Reference" default values with ...
- @zdover23 This is another one that I fear is moot at this late date. We won't see another Jewel release.
- 04:09 AM Documentation #35967: [doc] sync documentation "OSD Config Reference" default values with code de...
- @zdover The @options.cc@ values listed here appear to be current in _master_, and as with #38429 the @mimic@ and @lum...
- 12:37 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- Deepika Upadhyay wrote:
> seeing on octopus as well:
> https://pulpito.ceph.com/yuriw-2020-11-10_19:24:45-rados-wip...
11/18/2020
- 09:55 PM Bug #46224 (Resolved): Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:58 PM Documentation #38558: doc: osd [test-]reweight-by-utilization is not properly documented in ceph cli
- @zdover23 I think https://github.com/ceph/ceph/pull/37268 fulfills this and thus it can be marked as completed.
- 12:39 PM Bug #48274 (New): mon_osd_adjust_heartbeat_grace blocks OSDs from being marked as down
- I encountered a situation which I also posted on the users list: https://lists.ceph.io/hyperkitty/list/ceph-users@cep...
- 05:11 AM Documentation #40579: doc: POOL_NEAR_FULL on OSD_NEAR_FULL
- This should be addressed by https://github.com/ceph/ceph/pull/38145
11/17/2020
- 07:20 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- /a/ksirivad-2020-11-16_07:16:50-rados-wip-mgr-progress-turn-off-option-distro-basic-smithi/5630402 - no logs
- 02:58 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
- ...
- 12:31 PM Backport #48233 (Resolved): nautilus: Health check failed: 4 mgr modules have failed (MGR_MODULE_...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38069
m...
- 03:26 AM Bug #47767: octopus: setting noscrub crashed osd process
- The function PG::abort_scrub() probably races with messages in flight. It might help if we called scrub_unreser...
11/16/2020
- 09:21 PM Backport #48227 (In Progress): nautilus: Log "ceph health detail" periodically in cluster log
- 07:51 PM Bug #48230 (Resolved): nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
- 06:53 PM Bug #47440: nautilus: valgrind caught leak in Messenger::ms_deliver_verify_authorizer
- http://qa-proxy.ceph.com/teuthology/yuriw-2020-11-11_16:17:30-rados-wip-yuri-testing-2020-11-09-0849-nautilus-distro-...
- 06:49 PM Bug #38219: rebuild-mondb hangs
- description: ...
- 12:09 PM Bug #48172: Nautilus 14.2.13 osdmap not trimming on clean cluster
- So, we managed to find the reason, and it's weird.
Cluster is not trimming osdmaps because it thinks that all PGs...
- 07:44 AM Backport #48244 (In Progress): nautilus: collection_list_legacy: pg inconsistent
- 07:41 AM Backport #48244 (Resolved): nautilus: collection_list_legacy: pg inconsistent
- https://github.com/ceph/ceph/pull/38100
- 07:42 AM Backport #48243 (In Progress): octopus: collection_list_legacy: pg inconsistent
- 07:41 AM Backport #48243 (Resolved): octopus: collection_list_legacy: pg inconsistent
- https://github.com/ceph/ceph/pull/38098
- 07:38 AM Bug #48153 (Pending Backport): collection_list_legacy: pg inconsistent
- 12:08 AM Documentation #47163: document the difference between disk commit and apply time
- Any suggestions as to which document this should go in?
- 12:07 AM Documentation #47176: creating pool doc is very out-of-date
- FWIW, some admins (including me) have found the PG autoscaler to decrease `pgp_num` far too aggressively, and have se...
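For admins in that situation the usual knobs look roughly like this (pool name and floor are placeholders):
ceph osd pool set mypool pg_autoscale_mode off   # opt the pool out of autoscaling entirely
ceph osd pool set mypool pg_num_min 64           # or give the autoscaler a floor it will not shrink below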
11/15/2020
- 05:01 PM Bug #47930 (Resolved): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
11/14/2020
- 11:27 PM Backport #48233: nautilus: Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/38069
merged
- 05:10 AM Bug #38846 (Resolved): dump_pgstate_history doesn't really produce useful json output, needs an a...
- 01:43 AM Bug #38846 (Pending Backport): dump_pgstate_history doesn't really produce useful json output, ne...
11/13/2020
- 10:22 PM Bug #48163 (Closed): osd: osd crash due to FAILED ceph_assert(current_best)
- 10:03 PM Bug #47930 (Fix Under Review): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background:...
- 09:48 PM Backport #48233 (In Progress): nautilus: Health check failed: 4 mgr modules have failed (MGR_MODU...
- 09:45 PM Backport #48233 (Resolved): nautilus: Health check failed: 4 mgr modules have failed (MGR_MODULE_...
- https://github.com/ceph/ceph/pull/38069
- 09:38 PM Bug #46224 (Pending Backport): Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)
- 09:30 PM Bug #48230: nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
- Could be related to https://github.com/ceph/ceph/pull/37844. Also appeared in its test run https://trello.com/c/Nwckv...
- 08:56 PM Bug #48230: nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
- This seems to be due to those 3 modules not being present in "modules" when get_health_checks() is called....
- 08:24 PM Bug #48230: nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
- It's odd, because the mgr log for the job cited above shows a lot of what look like normal status messages from rbd_s...
- 07:53 PM Bug #48230 (Resolved): nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
- ...
- 07:39 PM Bug #47617: rebuild_mondb: daemon-helper: command failed with exit status 1
- Deepika:
/a/yuriw-2020-09-16_23:57:37-rados-wip-yuri8-testing-2020-09-16-2220-octopus-distro-basic-smithi/5441511 a...
- 05:35 PM Backport #48228 (Resolved): octopus: Log "ceph health detail" periodically in cluster log
- https://github.com/ceph/ceph/pull/38345
- 05:35 PM Backport #48227 (Resolved): nautilus: Log "ceph health detail" periodically in cluster log
- https://github.com/ceph/ceph/pull/38118
- 05:26 PM Bug #48219 (Fix Under Review): qa/standalone/scrub/osd-scrub-test.sh: TEST_scrub_extented_sleep: ...
- 05:25 PM Bug #48220 (Fix Under Review): qa/standalone/misc/ver-health.sh: TEST_check_version_health_1: re...
11/12/2020
- 11:18 PM Bug #48220 (Resolved): qa/standalone/misc/ver-health.sh: TEST_check_version_health_1: return 1
- ...
- 11:17 PM Bug #48219 (Resolved): qa/standalone/scrub/osd-scrub-test.sh: TEST_scrub_extented_sleep: return 1
- ...
- 11:03 PM Bug #48042 (Pending Backport): Log "ceph health detail" periodically in cluster log
- 10:40 AM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
- I did a wrong copy and paste -- the ceph report had
"min_last_epoch_clean": 163735,
- 10:38 AM Bug #48212 (Resolved): pool last_epoch_clean floor is stuck after pg merging
- We just merged a pool (id 36) from 1024 to 64 PGs, and after this was done the cluster osdmaps were no longer trimmed...
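A rough way to check whether the floor is advancing again (the field name comes from the report quoted above; its exact JSON path varies between releases, so search for it recursively):
ceph report 2>/dev/null | jq '.. | .min_last_epoch_clean? // empty'
ceph osd dump -f json | jq .epoch    # compare against the current osdmap epoch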
11/11/2020
- 08:46 PM Bug #45947: ceph_test_rados_watch_notify hang seen in nautilus
- Agreed, it looks related. Let's leave it here for now.
- 08:24 PM Bug #45947: ceph_test_rados_watch_notify hang seen in nautilus
- ...
- 04:06 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- https://pulpito.ceph.com/yuriw-2020-11-10_19:24:45-rados-wip-yuri4-testing-2020-11-10-0959-distro-basic-smithi/
ht...
- 01:27 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- Fails deterministically: https://pulpito.ceph.com/nojha-2020-11-09_22:09:31-rados:monthrash-master-distro-basic-smith...
- 03:59 PM Documentation #47523 (Resolved): ceph df documentation is outdated
- 02:20 PM Bug #47697 (Resolved): mon: set session_timeout when adding to session_map
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 02:19 PM Bug #47951 (Resolved): MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 02:18 PM Backport #47748 (Resolved): nautilus: mon: set session_timeout when adding to session_map
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37554
m...
- 01:05 AM Backport #47748: nautilus: mon: set session_timeout when adding to session_map
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37554
merged
- 02:04 PM Bug #48183 (New): monmap::build_initial returns different error val on FreeBSD
- 3/3 Test #132: unittest_mon_monmap ..............***Failed 0.14 sec
Running main() from gmock_main.cc
[=========...
- 01:56 PM Backport #47747 (Resolved): octopus: mon: set session_timeout when adding to session_map
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37553
m...
- 11:22 AM Bug #47044: PG::_delete_some isn't optimal iterating objects
- Can confirm this makes replacing hw for an S3 cluster quite intrusive, due to the block-db devs getting overloaded (i...
- 10:14 AM Feature #48182 (Resolved): osd: allow remote read by calling cls method from within cls context
- Currently, a cls method can only access an object's data and metadata.
However, in some cases, it would be useful if...
- 06:37 AM Bug #48163: osd: osd crash due to FAILED ceph_assert(current_best)
- Please close this issue.
- 06:36 AM Bug #48163: osd: osd crash due to FAILED ceph_assert(current_best)
- Perf dump shows the crashed osd has too many active connections. Finally, we found some ceph-fuse clients on other ho...
11/10/2020
- 10:34 PM Bug #48077 (Resolved): Allowing scrub configs begin_day/end_day to include 7 and begin_hour/end_h...
- 10:04 PM Bug #48173 (New): Additional test cases for osd-recovery-scrub.sh
- 3. Remote reservation non-overlapping PGs start recovery on PG that has a replica
4. failed local (need sleep in O...
- 08:13 PM Bug #48172 (New): Nautilus 14.2.13 osdmap not trimming on clean cluster
- We have cluster running on 14.2.13 (some osd's are on 14.2.9, they are on Debian). Cluster is in active+clean state, ...
- 07:32 PM Backport #47747: octopus: mon: set session_timeout when adding to session_map
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37553
merged
- 06:57 PM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
- http://pulpito.front.sepia.ceph.com/gregf-2020-11-06_17:45:44-rados-wip-stretch-fixes-116-2-distro-basic-smithi/ has ...
- 06:57 PM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
- http://pulpito.front.sepia.ceph.com/gregf-2020-11-06_17:45:44-rados-wip-stretch-fixes-116-2-distro-basic-smithi/5597065
- 11:25 AM Bug #48163 (Closed): osd: osd crash due to FAILED ceph_assert(current_best)
- More than 90 osds crash in my cluster, crash info is as follows:
"os_version_id": "7",
"assert_condition... - 09:52 AM Bug #48065: "ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree
- Actually, the problem with the weight not updated on the class subtree is easily reproducible on a vstart cluster (se...
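A hedged vstart-style illustration of what to compare (bucket name and weight are placeholders): reweight a subtree, then look at both the plain hierarchy and the per-device-class shadow trees.
ceph osd crush reweight-subtree host1 0.5
ceph osd crush tree                  # the default hierarchy picks up the new weight
ceph osd crush tree --show-shadow    # the ~hdd/~ssd shadow buckets are where the stale weight shows up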
- 12:23 AM Bug #18445 (Won't Fix): ceph: ping <mon.id> doesn't connect to cluster
- Long since fixed differently
11/09/2020
- 11:57 PM Bug #48065 (Need More Info): "ceph osd crush set|reweight-subtree" commands do not set weight on ...
- This does sound like a bug. Can you please share the osdmap?
- 10:11 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
- Fails on every run
https://pulpito.ceph.com/teuthology-2020-11-08_07:01:02-rados-master-distro-basic-smithi/
http...
- 09:52 PM Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failu...
- rados/singleton/{all/max-pg-per-osd.from-replica mon_election/connectivity msgr-failures/many msgr/async-v2only objec...
- 08:35 PM Bug #48153 (Fix Under Review): collection_list_legacy: pg inconsistent
- 06:35 PM Bug #48153: collection_list_legacy: pg inconsistent
- And just as a note: the problem is observed only when the scrub is run for a pg that has both old and new versions o...
- 06:23 PM Bug #48153 (In Progress): collection_list_legacy: pg inconsistent
- > And if you see "collection_list_legacy" in the log, then I am interested to see more details about object names you...
- 05:08 PM Bug #48153: collection_list_legacy: pg inconsistent
- > ceph tell osd.430 injectargs '--debug-osd=10'
sorry, it should be '--debug-bluestore=10'.
And if you see "c...
- 04:17 PM Bug #48153: collection_list_legacy: pg inconsistent
- Alexander, I am building a cluster to reproduce the issue.
Meantime, could you please increase debug_osd level on ...
- 03:38 PM Bug #48153: collection_list_legacy: pg inconsistent
- Mykola Golub wrote:
> Alexander, did you have osds of different versions in your acting set? If you did, what exactl...
- 03:31 PM Bug #48153: collection_list_legacy: pg inconsistent
- > missing replica happens only on osd with 14.2.12 or 14.2.13, on 14.2.11 all ok.
Can I assume that in [206,430,41...
- 03:26 PM Bug #48153: collection_list_legacy: pg inconsistent
- Alexander, did you have osds of different versions in your acting set? If you did, what exactly were they?
Also is i...
- 03:07 PM Bug #48153 (Resolved): collection_list_legacy: pg inconsistent
- hello ppl.
I have a problem with osds on nautilus 14.2.12 and 14.2.13.
After updating, some osds on 14.2.12 or 14.2.13 sta...
- 02:09 PM Backport #47362: nautilus: pgs inconsistent, union_shard_errors=missing
- @Alexander -
This issue is closed and it is only by chance that I saw your comment on it and decided to respond.
...
- 11:20 AM Backport #47362: nautilus: pgs inconsistent, union_shard_errors=missing
- 14.2.13 bug happens
ceph pg 21.4c1 query | jq .state...
- 12:50 PM Feature #48151: osd: allow remote read by calling cls method from within cls context
- I should have created this ticket under RADOS project, not Ceph project.
- 12:43 PM Feature #48151 (Closed): osd: allow remote read by calling cls method from within cls context
- Currently, a cls method can only access an object's data and metadata.
However, in some cases, it would be useful if...