Activity
From 11/21/2022 to 12/20/2022
12/20/2022
- 09:57 PM Bug #47025: rados/test.sh: api_watch_notify_pp LibRadosWatchNotifyECPP.WatchNotify failed
- https://github.com/ceph/ceph/pull/49109/commits/31750d5e8ae5f64edf934e2350dfa3c98df68b5a
- 09:56 PM Bug #47025 (Fix Under Review): rados/test.sh: api_watch_notify_pp LibRadosWatchNotifyECPP.WatchNo...
- 12:06 PM Backport #58315 (In Progress): quincy: Valgrind reports memory "Leak_DefinitelyLost" errors.
- 12:04 PM Backport #58314 (In Progress): pacific: Valgrind reports memory "Leak_DefinitelyLost" errors.
- 11:56 AM Bug #58305: src/mon/AuthMonitor.cc: FAILED ceph_assert(version > keys_ver)
- Radoslaw Zarzynski wrote:
> Thanks for the report! Do you have a corresponding log or coredump by any chance?
This lo...
- 09:23 AM Bug #58316: Ceph health metric Scraping still broken
- BTW this is the output of @smartctl -a --json@ on the device:...
- 09:17 AM Bug #58316 (New): Ceph health metric Scraping still broken
- This was brought up in #46285 already, but the issue has been marked as rejected.
When I run @ceph device scrape-h...
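For reference, a minimal sketch of the scrape-and-inspect flow being discussed, assuming the mgr devicehealth module is enabled; the device id and block device below are illustrative placeholders:
```
# ask the mgr to scrape SMART health metrics for a single device
ceph device scrape-health-metrics SEAGATE_ST4000NM0023_Z1Z0ABC
# raw SMART output, as referenced in the report above
smartctl -a --json /dev/sdb
```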
12/19/2022
- 07:08 PM Backport #58315 (Resolved): quincy: Valgrind reports memory "Leak_DefinitelyLost" errors.
- https://github.com/ceph/ceph/pull/49522
- 07:08 PM Backport #58314 (Resolved): pacific: Valgrind reports memory "Leak_DefinitelyLost" errors.
- https://github.com/ceph/ceph/pull/49521
- 07:00 PM Bug #58218 (Duplicate): osd
- 06:59 PM Bug #58178 (Need More Info): FAILED ceph_assert(last_e.version.version < e.version.version)
- 06:59 PM Bug #52136 (Pending Backport): Valgrind reports memory "Leak_DefinitelyLost" errors.
- 06:58 PM Bug #57751 (Resolved): LibRadosAio.SimpleWritePP hang and pkill
- 06:56 PM Bug #58288 (In Progress): quincy: mon: pg_num_check() according to crush rule
- Just updating the tracker's state to fit the reality.
- 06:47 PM Bug #51652: heartbeat timeouts on filestore OSDs while deleting objects in upgrade:pacific-p2p-pa...
- Lowered the priority as @FileStore@ is not only deprecated but also being removed right now.
- 06:40 PM Bug #58305 (Need More Info): src/mon/AuthMonitor.cc: FAILED ceph_assert(version > keys_ver)
- Thanks for the report! Do you have a corresponding log or coredump by any chance?
- 05:06 PM Documentation #46126: RGW docs lack an explanation of how permissions management works, especiall...
- Sure, very much appreciated.
Matt
- 05:03 PM Documentation #46126: RGW docs lack an explanation of how permissions management works, especiall...
- Matt,
I don't mean to endorse dirtwash's rudeness. I mean to capture an impassioned--if inelegant and abusive--req...
- 10:01 AM Bug #58281 (Rejected): osd:memory usage exceeds the osd_memory_target
- 03:56 AM Bug #58281: osd:memory usage exceeds the osd_memory_target
- Igor Fedotov wrote:
> Please note that osd_memory_target is not a hard limit. It's just 'target' OSD usage that OSD ...
12/17/2022
- 07:47 PM Bug #58305 (Need More Info): src/mon/AuthMonitor.cc: FAILED ceph_assert(version > keys_ver)
- ...
12/16/2022
- 04:44 PM Bug #58304 (Fix Under Review): pybind: ioctx.get_omap_keys asserts if start_after parameter is no...
- 04:30 PM Bug #58304 (In Progress): pybind: ioctx.get_omap_keys asserts if start_after parameter is non-empty
- 04:29 PM Bug #58304 (Pending Backport): pybind: ioctx.get_omap_keys asserts if start_after parameter is no...
12/15/2022
- 10:51 PM Bug #51652: heartbeat timeouts on filestore OSDs while deleting objects in upgrade:pacific-p2p-pa...
- /a/yuriw-2022-12-14_15:40:37-upgrade:pacific-p2p-pacific_16.2.11_RC-distro-default-smithi/7116495
- 10:45 PM Bug #58289 (New): "AssertionError: wait_for_recovery: failed before timeout expired" from down pg...
- /a/yuriw-2022-12-13_15:58:24-upgrade:pacific-p2p-pacific_16.2.11_RC-distro-default-smithi/7114849...
- 09:49 PM Bug #56034: qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()
- /a/ksirivad-2022-12-15_06:28:05-rados-wip-ksirivad-testing-main-distro-default-smithi/7118004/
- 08:45 PM Bug #58288 (Resolved): quincy: mon: pg_num_check() according to crush rule
- Corresponding BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2153654
Introduced here in Q: https://github.com/cep...
- 05:33 PM Bug #53789 (Fix Under Review): CommandFailedError (rados/test_python.sh): "RADOS object not found...
- 05:04 PM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
- Hypothesis no 1: the issue is a fallout from 65d05fdd579d21dd57b72b1d9148380bc6074269 (PR https://github.com/ceph/cep...
- 04:24 PM Bug #53575 (Resolved): Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64
- 04:14 PM Bug #53575: Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64
- https://github.com/ceph/ceph/pull/48641 merged
- 04:14 PM Bug #57751: LibRadosAio.SimpleWritePP hang and pkill
- https://github.com/ceph/ceph/pull/48641 merged
- 04:14 PM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
- https://github.com/ceph/ceph/pull/48641 merged
- 02:39 PM Fix #57963 (Pending Backport): osd: Misleading information displayed for the running configuratio...
- 02:37 PM Fix #57963 (Resolved): osd: Misleading information displayed for the running configuration of osd...
- 01:58 PM Bug #58281: osd:memory usage exceeds the osd_memory_target
- Please note that osd_memory_target is not a hard limit. It's just 'target' OSD usage that OSD attempts to align with....
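For reference, a minimal sketch of setting the target and comparing it with actual mempool usage; the OSD id and value are illustrative:
```
# set the (soft) memory target for a single OSD; it steers cache trimming, it is not a hard cap
ceph config set osd.1 osd_memory_target 3758096384
# compare against what the daemon actually accounts for in its mempools
ceph daemon osd.1 dump_mempools
```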
- 11:21 AM Bug #58281: osd:memory usage exceeds the osd_memory_target
- ceph daemon osd.1 perf dump...
- 11:16 AM Bug #58281: osd:memory usage exceeds the osd_memory_target
- ...
- 11:06 AM Bug #58281 (Rejected): osd:memory usage exceeds the osd_memory_target
- I want to limit osd_memory_target to 3758096384 bytes...
- 11:17 AM Bug #57529 (Resolved): mclock backfill is getting higher priority than WPQ
- 11:16 AM Backport #58273 (Resolved): quincy: mclock backfill is getting higher priority than WPQ
12/14/2022
- 07:12 PM Backport #58273 (In Progress): quincy: mclock backfill is getting higher priority than WPQ
- 07:08 PM Backport #58273 (Resolved): quincy: mclock backfill is getting higher priority than WPQ
- https://github.com/ceph/ceph/pull/49437
- 06:59 PM Bug #57529 (Pending Backport): mclock backfill is getting higher priority than WPQ
- 04:52 PM Bug #58178: FAILED ceph_assert(last_e.version.version < e.version.version)
- I cannot, sorry. I reported the issue as soon as I saw it, waited a day after it showed up, then reformatted the dri...
- 03:20 PM Bug #58178: FAILED ceph_assert(last_e.version.version < e.version.version)
- @Kevin Fox, can you please share the failing osd logs with osd debug 20?
We are supposed to print the previous log e...
- 02:21 AM Bug #58218: osd
- https://github.com/ceph/ceph/pull/40441
12/13/2022
- 09:14 PM Bug #56785: crash: void OSDShard::register_and_wake_split_child(PG*): assert(!slot->waiting_for_s...
- Only 4 occurrences of this crash in the wild, but let's keep an eye on this since now we have a test that reproduced it.
- 09:13 PM Bug #56785: crash: void OSDShard::register_and_wake_split_child(PG*): assert(!slot->waiting_for_s...
- /a/yuriw-2022-12-10_00:03:28-rados-wip-yuri7-testing-2022-12-09-1107-quincy-distro-default-smithi/7111159
- 03:59 PM Bug #51729: Upmap verification fails for multi-level crush rule
- Update on this Tracker: I am discussing this scenario with Josh Salomon, someone who is very knowledgeable about bala...
- 05:10 AM Backport #58260 (In Progress): pacific: rados: fix extra tabs on warning for pool copy
- 04:45 AM Backport #58260 (In Progress): pacific: rados: fix extra tabs on warning for pool copy
- https://github.com/ceph/ceph/pull/49400
- 05:09 AM Backport #58259 (In Progress): quincy: rados: fix extra tabs on warning for pool copy
- 04:45 AM Backport #58259 (In Progress): quincy: rados: fix extra tabs on warning for pool copy
- https://github.com/ceph/ceph/pull/49399
- 04:40 AM Bug #58165 (Pending Backport): rados: fix extra tabs on warning for pool copy
12/12/2022
- 11:35 PM Bug #58239: pacific: src/mon/Monitor.cc: FAILED ceph_assert(osdmon()->is_writeable())
- Analyzing the coredump:
Looking at the backtrace (same as above, but here, the frames are numbered)......
- 04:08 PM Bug #58239: pacific: src/mon/Monitor.cc: FAILED ceph_assert(osdmon()->is_writeable())
- The failure reproduced 7 times out of 50.
- 07:07 PM Bug #52129: LibRadosWatchNotify.AioWatchDelete failed
- /a/yuriw-2022-12-07_15:47:33-rados-wip-yuri-testing-2022-12-06-1204-distro-default-smithi/7106771
- 07:01 PM Bug #58165: rados: fix extra tabs on warning for pool copy
- https://github.com/ceph/ceph/pull/49251 merged
- 05:33 PM Bug #57989: test-erasure-eio.sh fails since pg is not in unfound
- Looks related,
/a/yuriw-2022-12-09_22:27:10-rados-main-distro-default-smithi/7110655/...
- 05:27 PM Cleanup #58149 (Resolved): Clarify pool creation failure message due to exceeding max_pgs_per_osd
- 05:26 PM Bug #58173 (Resolved): api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosAio.PoolE...
- 05:01 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
- @Radek it's also still in main
- 05:00 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
- /a/yuriw-2022-12-07_15:48:38-rados-wip-yuri3-testing-2022-12-06-1211-distro-default-smithi/7106890
12/11/2022
- 03:08 PM Bug #58240 (Fix Under Review): osd/scrub: modifying osd_deep_scrub_stride while pg is doing deep ...
- Modify osd_deep_scrub_stride (e.g., 512KiB to 1MiB) while some pgs are doing deep scrub and then check the status of p...
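A rough sketch of the reproduction steps described above, with illustrative pool/PG ids and values:
```
# kick off a deep scrub, then change the stride while it is running
ceph pg deep-scrub 1.0
ceph config set osd osd_deep_scrub_stride 1048576    # e.g. 512KiB -> 1MiB
# watch whether the deep scrub ever completes
ceph pg dump pgs_brief | grep scrub
```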
12/09/2022
- 11:04 PM Bug #58239: pacific: src/mon/Monitor.cc: FAILED ceph_assert(osdmon()->is_writeable())
- First seen in https://github.com/ceph/ceph/pull/48803
I scheduled some tests to run over the weekend so we can see...
- 10:36 PM Bug #58239 (Resolved): pacific: src/mon/Monitor.cc: FAILED ceph_assert(osdmon()->is_writeable())
- This is not deterministic, but ``run_osd`` in qa/standalone/ceph-helpers.sh can result in a timeout when trying to...
- 08:34 PM Bug #58052: Empty Pool (zero objects) shows usage.
- Alright, please let me know when you have the file so I can remove it from my drive:
https://drive.google.com/file/d...
- 08:26 PM Bug #55851 (Resolved): Assert in Ceph messenger
- 08:25 PM Backport #57258 (Resolved): pacific: Assert in Ceph messenger
- 06:49 PM Backport #57258: pacific: Assert in Ceph messenger
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/48255
merged
- 08:03 PM Bug #55355 (Resolved): osd thread deadlock
- 08:00 PM Backport #56722 (Resolved): pacific: osd thread deadlock
- 06:49 PM Backport #56722: pacific: osd thread deadlock
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/48254
merged
- 05:16 PM Bug #56028: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
- /a/yuriw-2022-12-08_15:36:34-rados-wip-yuri2-testing-2022-12-07-0821-pacific-distro-default-smithi/7108597...
- 05:05 PM Bug #56770: crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_slots.end())
- We should address this crash, as it's been seen in 7 clusters over 100 times.
- 05:03 PM Bug #56770: crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_slots.end())
- /a/yuriw-2022-12-08_15:36:34-rados-wip-yuri2-testing-2022-12-07-0821-pacific-distro-default-smithi/7108558...
- 04:48 PM Backport #58116 (Resolved): pacific: qa/workunits/rados/test_librados_build.sh: specify redirect ...
- 08:35 AM Bug #56772: crash: uint64_t SnapSet::get_clone_bytes(snapid_t) const: assert(clone_overlap.count(...
- Hi, would it be possible to raise the priority of this bug to High (as well as #57940), as this prevents the incomplet...
12/08/2022
- 10:43 PM Backport #58039 (Resolved): pacific: osd: add created_at and ceph_version_when_created metadata
- 08:59 PM Backport #58039: pacific: osd: add created_at and ceph_version_when_created metadata
- https://github.com/ceph/ceph/pull/49144 merged
- 06:57 PM Bug #48896: osd/OSDMap.cc: FAILED ceph_assert(osd_weight.count(i.first))
- /a/yuriw-2022-12-02_14:50:43-rados-wip-yuri8-testing-2022-12-01-0905-pacific-distro-default-smithi/7101371
- 04:48 PM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
- https://github.com/ceph/ceph/pull/48318 merged
- 04:31 PM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
- /a/yuriw-2022-12-06_15:43:07-rados-wip-yuri8-testing-2022-12-05-1031-pacific-distro-default-smithi/7105473$
- 04:07 PM Bug #58098 (Fix Under Review): qa/workunits/rados/test_crash.sh: crashes are never posted
- It seems reasonable to me!
- 01:10 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Laura Flores wrote:
> Can we make the default behavior a ceph user, and then provide --setgroup and --setuser option...
- 09:27 AM Bug #58218: osd
- OSD: crash...
- 09:26 AM Bug #58218 (Duplicate): osd
- 07:55 AM Backport #58214 (In Progress): quincy: osd: Improve osd bench accuracy by using buffers with rand...
- 06:15 AM Backport #58214 (Resolved): quincy: osd: Improve osd bench accuracy by using buffers with random ...
- https://github.com/ceph/ceph/pull/49323
- 06:03 AM Fix #57577 (Pending Backport): osd: Improve osd bench accuracy by using buffers with random patterns
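For context, a minimal sketch of the osd bench invocation this fix concerns; the OSD id and sizes are illustrative. The fix fills the bench buffers with random patterns, presumably because easily compressible buffers can inflate results on some devices:
```
# write 1 GiB in 4 MiB blocks to estimate raw OSD throughput
ceph tell osd.0 bench 1073741824 4194304
```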
- 12:56 AM Bug #58052: Empty Pool (zero objects) shows usage.
- Alright, I found the logs can be accessed from docker itself. I'm in the process of pulling them, but I am already at 5G...
12/07/2022
- 10:37 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Can we make the default behavior a ceph user, and then provide --setgroup and --setuser options in case we need to re...
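For reference, a hedged sketch of how one can check whether crashes were actually picked up and posted; paths are the defaults:
```
# crash dumps collected by ceph-crash live here (posted ones move to posted/)
ls -l /var/lib/ceph/crash /var/lib/ceph/crash/posted
# cluster-side view of crashes that have not been archived yet
ceph crash ls-new
```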
- 11:54 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Still waiting for that build (debuginfo seems to take an unbelievably long time to publish...)
Meanwhile, I did a ...
- 05:51 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- I've just rerun "rados/singleton/{all/test-crash mon_election/connectivity msgr-failures/few msgr/async objectstore/b...
- 05:30 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- (Sorry, I didn't mean to update any of those fields with my previous comment)
- 02:34 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Thanks Laura, I'll try to figure out what's going on. So far, looking at the journal log, the keyring must be OK, or...
- 03:00 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
- Sent a PR for quincy: https://github.com/ceph/ceph/pull/49304.
- 01:47 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- I'm having the same BT in my tests:
/a/nmordech-2022-12-06_13:26:40-rados:thrash-erasure-code-wip-nitzan-peering-aut...
- 01:34 PM Bug #56371 (Duplicate): crash: MOSDPGLog::encode_payload(unsigned long)
- 12:04 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- Laura, I think it is different from that bug (57751); in that case all the osds are still up.
We can see that we nev...
- 09:57 AM Backport #58006 (In Progress): quincy: bail from handle_command() if _generate_command_map() fails
- 09:57 AM Backport #58007 (In Progress): pacific: bail from handle_command() if _generate_command_map() fails
12/06/2022
- 05:30 PM Bug #58098 (New): qa/workunits/rados/test_crash.sh: crashes are never posted
- Laura Flores wrote:
> I scheduled some tests here with the reverts committed to see if they pass: http://pulpito.fro...
- 03:48 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- I scheduled some tests here with the reverts committed to see if they pass: http://pulpito.front.sepia.ceph.com/lflor...
- 03:41 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Yes, there's one available at /a/yuriw-2022-11-23_15:09:06-rados-wip-yuri10-testing-2022-11-22-1711-distro-default-sm...
- 06:11 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Is there a way to view the journalctl-b0.gz archive from the failed runs? Because if ceph-crash can't post crashes o...
- 11:53 AM Backport #58186 (In Progress): quincy: osd: Misleading information displayed for the running conf...
- 11:45 AM Backport #58186 (Resolved): quincy: osd: Misleading information displayed for the running configu...
- https://github.com/ceph/ceph/pull/49281
- 11:43 AM Fix #57963 (Pending Backport): osd: Misleading information displayed for the running configuratio...
- 10:02 AM Bug #58173 (Fix Under Review): api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosA...
- 05:09 AM Bug #57937: pg autoscaler of rgw pools doesn't work after creating otp pool
- This problem was fixed in Rook v1.10.2. I updated my Rook/Ceph cluster to v1.10.5 and confirmed that this problem dis...
- 03:07 AM Bug #58182 (Fix Under Review): Suicide when osd bootup timeout
- When the osd starts, if a message is lost, the OSD gets stuck in the startup phase.
Restart the osd node through t...
- 01:37 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
- Radoslaw Zarzynski wrote:
> Hello!
>
> what is on disk is actually serialized from the in-memory representati...
- 12:09 AM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
- /a/yuriw-2022-11-28_16:10:10-rados-wip-yuri6-testing-2022-11-23-1348-distro-default-smithi/7093588
12/05/2022
- 11:37 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- https://shaman.ceph.com/builds/ceph/wip-revert-pr-48713/2b583578473c82604cfdab2faef9f161dc2fb0b9/
- 11:20 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- The bug reproduced on Yuri's test branch. The difference between the test branch and the main SHA is that the test br...
- 07:23 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Laura Flores wrote:
> Scheduled 50x tests to run here: http://pulpito.front.sepia.ceph.com/lflores-2022-12-05_17:05:...
- 07:22 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- I have a feeling that the tests I scheduled earlier on the main branch all passed since the SHA it picked up is older...
- 07:14 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Wondering if there could have been a regression caused by https://github.com/ceph/ceph/pull/48713.
- 06:38 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- /a/yuriw-2022-11-28_21:26:12-rados-wip-yuri7-testing-2022-11-18-1548-distro-default-smithi/7095988
/a/lflores-2022-1...
- 04:17 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Scheduled 50x tests to run here: http://pulpito.front.sepia.ceph.com/lflores-2022-12-05_17:05:59-rados-wip-yuri10-tes...
- 04:10 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Three recent instances of this bug in the main branch point to a regression. My next steps here will be to schedule m...
- 10:46 PM Bug #58052: Empty Pool (zero objects) shows usage.
- That is every log file from every node. There are no ceph-mgr* logs. :/
Even from inside the docker on the adm n...
- 06:33 PM Bug #58052: Empty Pool (zero objects) shows usage.
- Hello. Thanks for response and the files....
- 09:11 PM Bug #58173: api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosAio.PoolEIOFlag
- Building a branch here with https://github.com/ceph/ceph/pull/49029 reverted, which can be used to verify whether it ...
- 09:03 PM Bug #58173: api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosAio.PoolEIOFlag
- Excuse my update Sam, I see you already added it as a duplicate.
- 08:55 PM Bug #58173: api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosAio.PoolEIOFlag
- Matan added that test within the last two weeks: https://github.com/ceph/ceph/pull/49029
- 07:10 PM Bug #58173 (Resolved): api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosAio.PoolE...
- The workunits/rados/test.sh script is run in the orch suite on some tests. In a few of them, these two tests were fai...
- 08:06 PM Bug #58178: FAILED ceph_assert(last_e.version.version < e.version.version)
- Noticed an osd doing this on a cluster over the weekend. It's been crashing consistently since.
- 08:05 PM Bug #58178 (Need More Info): FAILED ceph_assert(last_e.version.version < e.version.version)
- debug -4> 2022-12-05T19:14:03.556+0000 7fe51028a200 5 osd.57 pg_epoch: 261349 pg[1.573( v 261349'617978754 (2613...
- 07:07 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- I've just let Mark and Ronen know about this issue.
- 07:05 PM Bug #58156: Monitors do not permit OSD to join after upgrading to Quincy
- Radoslaw Zarzynski wrote:
> Hi Igor! What was the intermediary version during the upgrade? We merged https://github....
- 06:40 PM Bug #58156: Monitors do not permit OSD to join after upgrading to Quincy
- Hi Igor! What was the intermediary version during the upgrade? We merged https://github.com/ceph/ceph/pull/44090 but ...
- 07:00 PM Bug #58142 (In Progress): rbd-python snaps-many-objects: deep-scrub : stat mismatch
- Moving to @In progress@ based on the core standup on 1 Dec.
- 06:56 PM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
- Hello!
what is on disk is actually serialized from the in-memory representation. We don't see huge numbers of ...
- 06:24 PM Bug #58166 (Need More Info): mon:DAEMON_OLD_VERSION newer versions is considered older than earlier
- If your cluster is in the same state, can you please share mon logs with debug_mon=20? The following code snippet in ...
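A minimal sketch of gathering the requested data; standard commands, nothing specific to this tracker:
```
# which versions each daemon currently reports
ceph versions
# raise mon verbosity so the DAEMON_OLD_VERSION check is logged in detail
ceph config set mon debug_mon 20
```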
- 02:53 PM Bug #58166: mon:DAEMON_OLD_VERSION newer versions is considered older than earlier
- This was probably introduced in https://github.com/ceph/ceph/pull/36759
- 02:52 PM Bug #58166 (Need More Info): mon:DAEMON_OLD_VERSION newer versions is considered older than earlier
- We have a cluster where most mon/mgr/osd daemons are running 16.2.10 and some OSDs are running 16.2.9
The healthcheck does ...
- 06:24 PM Backport #58169 (Resolved): quincy: extra debugs for: [mon] high cpu usage by fn_monstore thread
- https://github.com/ceph/ceph/pull/50406
- 06:16 PM Feature #58168 (Pending Backport): extra debugs for: [mon] high cpu usage by fn_monstore thread
- 06:10 PM Bug #53806: unessesarily long laggy PG state
- > I think as long as `acting` does not have duplicate entries, the logic is exactly the same as before.
Yeah. I'm ...
- 05:51 PM Backport #55768: pacific: rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46499
merged
- 05:34 PM Backport #56648: quincy: [Progress] Do not show NEW PG_NUM value for pool if autoscaler is set to...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47925
merged
- 05:15 PM Fix #57963: osd: Misleading information displayed for the running configuration of osd_mclock_max...
- https://github.com/ceph/ceph/pull/48708 merged
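For context, a hedged sketch of the two views involved in the misleading output this fix addresses; the truncated title refers to one of the osd_mclock_max_capacity_iops_* options, and the OSD id is illustrative:
```
# value reported for the running daemon
ceph config show osd.0 osd_mclock_max_capacity_iops_hdd
# value stored in the cluster configuration database
ceph config get osd.0 osd_mclock_max_capacity_iops_hdd
```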
- 05:12 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
- Radoslaw Zarzynski wrote:
> NOT A FIX (extra debugs): https://github.com/ceph/ceph/pull/48513
merged
- 04:02 PM Bug #58165 (Fix Under Review): rados: fix extra tabs on warning for pool copy
- 12:57 PM Bug #58165 (Pending Backport): rados: fix extra tabs on warning for pool copy
- BZ link: https://bugzilla.redhat.com/show_bug.cgi?id=2148242
- 03:52 PM Bug #57632 (Fix Under Review): test_envlibrados_for_rocksdb: free(): invalid pointer
- 07:37 AM Bug #57940: ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when nobackfill ...
- Thomas Le Gentil wrote:
> I could avoid this crash by removing all pg for which ceph could not get the clone_bytes, ...
12/04/2022
- 11:56 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- /a/yuriw-2022-11-28_21:13:47-rados-wip-yuri11-testing-2022-11-18-1506-distro-default-smithi/7095031/
- 11:46 AM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
- /a/yuriw-2022-11-23_21:36:17-rados-wip-yuri11-testing-2022-11-18-1506-distro-default-smithi/7089814/
- 09:41 AM Backport #58144 (In Progress): pacific: mon/MonCommands: Support dump_historic_slow_ops
- 09:37 AM Backport #58143 (In Progress): quincy: mon/MonCommands: Support dump_historic_slow_ops
12/02/2022
- 09:49 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- In a job that passed, the crashes are posted:...
- 09:33 PM Bug #58098 (In Progress): qa/workunits/rados/test_crash.sh: crashes are never posted
- In the job that passed, the mgr.server reports a recent crash:
/a/lflores-2022-11-30_22:53:49-rados-main-distro-de...
- 09:06 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- In one of the jobs that passed, the OSDs were also failed for 31 seconds, but this time, the crashes were detected. S...
- 09:02 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Didn't reproduce in the 20x run above, but it did reproduce a second time here:
/a/yuriw-2022-11-28_21:09:37-rados...
- 06:09 PM Bug #58052: Empty Pool (zero objects) shows usage.
- Attaching server2 to this message.
- 06:09 PM Bug #58052: Empty Pool (zero objects) shows usage.
- I am realizing those logs are from a single host (server4).
server3 got removed today.
Attaching server1 to this me...
- 05:42 PM Bug #58052: Empty Pool (zero objects) shows usage.
- Radoslaw Zarzynski wrote:
> Well, I think the command you mentioned took effect for RGW, not MGR. I'm providing the c...
- 03:28 PM Bug #58156 (In Progress): Monitors do not permit OSD to join after upgrading to Quincy
- 03:28 PM Bug #58156 (Resolved): Monitors do not permit OSD to join after upgrading to Quincy
- The Nautilus cluster was eventually upgraded to Quincy, and at the end OSDs stopped joining the cluster.
The i...
- 03:24 PM Bug #58155 (Resolved): mon:ceph_assert(m < ranks.size()) `different code path than tracker 50089`
- Same problem with https://tracker.ceph.com/issues/50089, but it is a different code path.
We opened a new tracker ...
- 01:31 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
- Nitzan Mordechai wrote:
> 王子敬 wang wrote:
> > Nitzan Mordechai wrote:
> > > Since you attached part of the pglog, I can't see how many entries yo...
- 01:06 AM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
- Linked a possible solution for skipping Ubuntu with this test. I scheduled a teuthology test for it, which I will use...
12/01/2022
- 09:44 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- Thanks for your observations, Brad! I'm going to dedicate this Tracker to `LibRadosAio.SimpleWrite` and mark it as re...
- 09:20 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- The issue appears to be in the api_aio test as it gets started but doesn't complete....
- 08:04 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- Ran into another instance of this here:
/a/yuriw-2022-11-30_23:13:27-rados-wip-yuri2-testing-2022-11-30-0724-pacif... - 09:43 PM Bug #57618: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)
- /a/yuriw-2022-11-29_22:29:58-rados-wip-yuri10-testing-2022-11-29-1005-pacific-distro-default-smithi/7097464/
- 09:23 PM Bug #57751: LibRadosAio.SimpleWritePP hang and pkill
- Possibly #58130 is related
- 07:30 PM Cleanup #58149 (Resolved): Clarify pool creation failure message due to exceeding max_pgs_per_osd
- This was inspired by the "Re: [ceph-users] proxmox hyperconverged pg calculations in ceph pacific, pve 7.2" thread.
- 07:30 PM Bug #50089 (Resolved): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of...
- 06:59 PM Bug #50089 (New): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of moni...
- 04:12 PM Backport #58144 (Resolved): pacific: mon/MonCommands: Support dump_historic_slow_ops
- https://github.com/ceph/ceph/pull/49233
- 04:12 PM Backport #58143 (Resolved): quincy: mon/MonCommands: Support dump_historic_slow_ops
- https://github.com/ceph/ceph/pull/49232
- 04:02 PM Bug #58141 (Pending Backport): mon/MonCommands: Support dump_historic_slow_ops
- 12:42 PM Bug #58141 (Resolved): mon/MonCommands: Support dump_historic_slow_ops
- Slow ops are being tracked in the mon while `dump_historic_slow_ops` command is not registered:
```
$ ceph daemon ....
```
- 03:56 PM Bug #58142 (In Progress): rbd-python snaps-many-objects: deep-scrub : stat mismatch
- ...
- 03:45 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- It seems more like a generic RADOS issue.
- 12:27 PM Bug #57757 (Fix Under Review): ECUtil: terminate called after throwing an instance of 'ceph::buff...
- 08:18 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
- 王子敬 wang wrote:
> Nitzan Mordechai wrote:
> > Since you attached part of the pglog, I can't see how many entries yo...
- 01:50 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
- Nitzan Mordechai wrote:
> Since you attached part of the pglog, I can't see how many entries you have for log and ho...
- 03:41 AM Bug #53806: unessesarily long laggy PG state
- Radoslaw Zarzynski wrote:
> OK, Aishwarya has found in testing that the @break@-related commit (https://github.com/c...
- 12:51 AM Backport #58040: quincy: osd: add created_at and ceph_version_when_created metadata
- please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/49159
ceph-backport.sh versi...
11/30/2022
- 11:15 PM Bug #58132 (In Progress): qa/standalone/mon: --mon-initial-members setting causes us to populate ...
- 11:08 PM Bug #58132 (Resolved): qa/standalone/mon: --mon-initial-members setting causes us to populate rem...
- Problem:
--mon-initial-members does nothing but cause monmap
to populate ``removed_ranks`` because of the way we sta...
- 10:57 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Neha suggested we see how reproducible this is, so as not to mask any underlying problems by sleeping longer. I sched...
- 10:34 PM Bug #58130 (In Progress): LibRadosAio.SimpleWrite hang and pkill
- A rados api test experienced a failure after the last global tests had successfully run.
/a/yuriw-2022-11-29_22:29...
- 07:31 PM Bug #58052: Empty Pool (zero objects) shows usage.
- Well, I think the command you mentioned took effect for RGW, not MGR. I'm providing the commands increasing log verbos...
- 07:25 PM Bug #57977: osd:tick checking mon for new map
- The issue during the upgrade looks awfully similar to a downstream issue Prashant has been working on.
Prashant, would find som...
- 07:09 PM Bug #58106 (Need More Info): when a large number of error ops appear in the OSDs,pglog does not t...
- 10:43 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
- Since you attached part of the pglog, I can't see how many entries you have for log and how many for dups
can you pl...
- 08:38 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
- 王子敬 wang wrote:
> Nitzan Mordechai wrote:
> > @王子敬 wang, can you please send us the output for one of the pgs from ...
- 08:32 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
- Nitzan Mordechai wrote:
> @王子敬 wang, can you please send us the output for one of the pgs from ceph-objectstore-tool... - 07:30 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
- @王子敬 wang, can you please send us the output for one of the pgs from ceph-objectstore-tool?...
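A hedged sketch of the requested ceph-objectstore-tool invocation; the OSD data path and PG id are illustrative, and the OSD must be stopped first:
```
# dump the PG log (entries and dups) for one PG from an offline OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 1.0 --op log > pg_1.0_log.json
```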
- 02:16 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
- Nitzan Mordechai wrote:
> @王子敬 wang can you please provide the output of 'ceph pg dump' ?
ok, the output in the pg_...
- 07:07 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
- I think the invariant here is that the @acting@ container should not have duplicates. If it is broken, we have a more...
- 01:55 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
- If there are indeed duplicated entries in the acting set, should there be a 'break' at all in this loop? It seems lik...
- 07:00 PM Bug #53806: unessesarily long laggy PG state
- OK, Aishwarya has found in testing that the @break@-related commit (https://github.com/ceph/ceph/pull/44499/commits/9...
- 02:02 PM Bug #53806: unessesarily long laggy PG state
- FWIW, we've seen this happen very frequently during Nautilus->{Octopus,Pacific} upgrades. I had just tracked down the...
- 03:36 PM Bug #58114 (Closed): mon: FAILED ceph_assert(rank == new_rank)
- Closed because this issue was found in pre-merge testing of PR: https://github.com/ceph/ceph/pull/48698/
- 04:14 AM Backport #58039: pacific: osd: add created_at and ceph_version_when_created metadata
- please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/49144
ceph-backport.sh versi...
11/29/2022
- 11:18 PM Bug #54438: test/objectstore/store_test.cc: FAILED ceph_assert(bl_eq(state->contents[noid].data, ...
- /a/yuriw-2022-11-28_16:28:53-rados-wip-yuri-testing-2022-11-18-1500-pacific-distro-default-smithi/7094026
- 07:14 PM Backport #58117 (In Progress): quincy: qa/workunits/rados/test_librados_build.sh: specify redirec...
- https://github.com/ceph/ceph/pull/49140
- 06:58 PM Backport #58117 (In Progress): quincy: qa/workunits/rados/test_librados_build.sh: specify redirec...
- 07:11 PM Backport #58116 (In Progress): pacific: qa/workunits/rados/test_librados_build.sh: specify redire...
- https://github.com/ceph/ceph/pull/49139
- 06:58 PM Backport #58116 (Resolved): pacific: qa/workunits/rados/test_librados_build.sh: specify redirect ...
- 06:52 PM Bug #58046 (Pending Backport): qa/workunits/rados/test_librados_build.sh: specify redirect in cur...
- 05:37 PM Bug #58046: qa/workunits/rados/test_librados_build.sh: specify redirect in curl command
- Seen in Pacific run: /a/yuriw-2022-11-28_21:10:48-rados-wip-yuri10-testing-2022-11-28-1042-pacific-distro-default-smi...
- 05:52 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
- We discussed this tracker in the RADOS meeting. Sam pointed out that this set of tests doesn't have any actual users,...
- 05:24 PM Bug #58114 (Closed): mon: FAILED ceph_assert(rank == new_rank)
- /a/yuriw-2022-11-28_21:10:48-rados-wip-yuri10-testing-2022-11-28-1042-pacific-distro-default-smithi/7095280/remote/sm...
- 04:59 PM Bug #44595: cache tiering: Error: oid 48 copy_from 493 returned error code -2
- ...
- 03:05 PM Bug #58107: mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
- Therefore, there is nothing we can do but wait for the other site to come back up, so pgs can complete peering and th...
- 03:04 PM Bug #58107 (Closed): mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
- Closed because this is not a corner case; quoting Greg Farnum:
``it’s that electing those two monitors means ...
- 04:15 AM Bug #58107 (In Progress): mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
- 04:14 AM Bug #58107 (Closed): mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
- h1. How to reproduce the issue
h2. Set up:
mon.a (zone 1) rank=0
mon.b (zone 1) rank=1
mon.c (zone 2) rank=2
...
- 01:07 PM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
- @王子敬 wang can you please provide the output of 'ceph pg dump' ?
- 01:42 AM Bug #58106 (Need More Info): when a large number of error ops appear in the OSDs,pglog does not t...
- When we use the S3 append and copy interfaces of the object gateway, a large number of error ops appear in the OSDs wh...
- 11:12 AM Bug #57940: ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when nobackfill ...
- I could avoid this crash by removing all pg for which ceph could not get the clone_bytes, except the one I was sure t...
- 09:02 AM Backport #57496 (Resolved): quincy: Invalid read of size 8 in handle_recovery_delete()
- 07:05 AM Bug #50042 (Fix Under Review): rados/test.sh: api_watch_notify failures
11/28/2022
- 10:24 PM Bug #58098 (Fix Under Review): qa/workunits/rados/test_crash.sh: crashes are never posted
- 05:34 PM Bug #58098 (Resolved): qa/workunits/rados/test_crash.sh: crashes are never posted
- /a/yuriw-2022-11-23_15:09:06-rados-wip-yuri10-testing-2022-11-22-1711-distro-default-smithi/7087281...
- 09:43 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- Just a follow-up.
Finally, what's helping us the best is increasing osd_scrub_sleep to 0.4.
- 02:47 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- Aishwarya Mathuria wrote:
> We suspect that this assert failure is hit in cases when we try to encode a message befo...
- 05:05 AM Support #58091 (New): osd: reduce default value of osd_heartbeat_grace
- Client IO hangs for 20s when a peer osd ping fails; 20s is too long. In case of network jitter, it generally does not exce...
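A minimal sketch of the tuning being proposed; the value is illustrative, and lowering it increases the risk of flapping under transient network jitter:
```
# default grace is 20 seconds before a silent peer is reported down
ceph config set osd osd_heartbeat_grace 10
```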
11/24/2022
- 03:54 AM Bug #57977: osd:tick checking mon for new map
- The more I dig, the more I'm thinking that this might be some race to do with noup, and probably has nothing to do wi...
- 03:42 AM Bug #57977: osd:tick checking mon for new map
- Something that's probably worth mentioning - we had noup set in the cluster for each upgrade, and we wait until all O...
- 03:12 AM Bug #57977: osd:tick checking mon for new map
- We saw this happen to roughly a dozen OSDs (1-2 per host for some hosts) during a recent upgrade from Nautilus to Pac...
11/22/2022
- 06:17 PM Bug #57977: osd:tick checking mon for new map
- I already restarted the osd daemon, but could not reproduce it. If it happens again, I will collect more logs
- 03:54 PM Bug #58052: Empty Pool (zero objects) shows usage.
- Radoslaw Zarzynski wrote:
> Could you please provide a log from an active mgr with @debug_ms=1@ and @debug_mgr=20@?
...
11/21/2022
- 06:35 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
- @Radek I have been trying to reproduce this locally with no luck. I'll try your suggestion and update if I'm successful.
- 06:34 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
- Thanks for the link, Matan! I'm a bit worried the experiment there involved changing 2 parameters at the same time: compiler ...
- 06:29 PM Bug #58044 (Need More Info): ceph-osd: osd numa affinity setting doesn't work
- How do you check the affinity?
Have you rebooted the OSD after injecting the setting?
Could you please provide ...
- 06:22 PM Bug #58046 (Resolved): qa/workunits/rados/test_librados_build.sh: specify redirect in curl command
- 06:21 PM Bug #58052 (Need More Info): Empty Pool (zero objects) shows usage.
- Could you please provide a log from an active mgr with @debug_ms=1@ and @debug_mgr=20@? We would like to see which OS...
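A minimal sketch of the requested log collection; the settings can be reverted once the log is captured:
```
# raise mgr verbosity, reproduce the issue, then grab the active mgr log
ceph config set mgr debug_ms 1
ceph config set mgr debug_mgr 20
```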
- 07:18 AM Bug #58027: op slow from throttled to header_read
- Radoslaw Zarzynski wrote:
> Hello! The most important thing is that Octopus is EOL. Second, I'm also not sure whether thi...