Activity
From 07/11/2022 to 08/09/2022
08/09/2022
- 07:49 PM Bug #57074: common: Latest version of main experiences build failures
- Per https://en.cppreference.com/w/cpp/compiler_support/20 (found by Mark Nelson), only some features were enabled in ...
- 05:23 PM Bug #57074: common: Latest version of main experiences build failures
- '-std=c++2a' seems to be the way that gcc versions < 9 add support for C++20, per https://gcc.gnu.org/projects/cxx-st...
- 05:08 PM Bug #57074: common: Latest version of main experiences build failures
- The gcc version running here is 8.5.0:
$ gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-15)
For the out...
- 04:52 PM Bug #57074: common: Latest version of main experiences build failures
- One thing that stands out from the command line is '-std=c++2a' instead of '-std=c++20'. What compiler version is run...
- 03:07 PM Bug #57074 (Duplicate): common: Latest version of main experiences build failures
- Built on:...
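A minimal sketch of the flag difference discussed in the comments above (foo.cc is a placeholder file, not part of the report): gcc 8.x and 9.x only accept the pre-release spelling of the C++20 flag, while '-std=c++20' itself was added in gcc 10, which fits the gcc 8.5.0 builder shown above.
$ g++ --version              # gcc (GCC) 8.5.0 on the builder in question
$ g++ -std=c++2a -c foo.cc   # pre-release spelling, accepted by gcc 8/9
$ g++ -std=c++20 -c foo.cc   # only accepted by gcc 10 and newer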
- 07:22 PM Backport #56663 (Resolved): pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should...
- 04:52 PM Backport #56663: pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be gap >= ...
- Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/47211
merged
- 07:22 PM Backport #56664 (Resolved): quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should ...
- 04:43 PM Backport #56664: quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be gap >= m...
- Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/47210
merged
- 05:59 PM Backport #57025 (Resolved): quincy: test_pool_min_size:AssertionError:wait_for_clean:failed befor...
- 05:57 PM Backport #57024 (Resolved): quincy: test_pool_min_size: 'check for active or peered' reached maxi...
- 05:56 PM Backport #57019 (Resolved): quincy: test_pool_min_size: AssertionError: not clean before minsize ...
- 05:43 PM Backport #57076 (Resolved): pacific: Invalid read of size 8 in handle_recovery_delete()
- https://github.com/ceph/ceph/pull/47525
- 04:51 PM Backport #56099: pacific: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
- Laura Flores wrote:
> https://github.com/ceph/ceph/pull/46748
merged
- 04:44 PM Bug #55153 (Resolved): Make the mClock config options related to [res, wgt, lim] modifiable durin...
- 04:44 PM Backport #56498 (Resolved): quincy: Make the mClock config options related to [res, wgt, lim] mod...
- 04:40 PM Backport #56498: quincy: Make the mClock config options related to [res, wgt, lim] modifiable dur...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47020
merged
- 04:41 PM Bug #55435: mon/Elector: notify_ranked_removed() does not properly erase dead_ping in the case of...
- https://github.com/ceph/ceph/pull/47086 merged
- 04:06 PM Bug #52124 (Pending Backport): Invalid read of size 8 in handle_recovery_delete()
- 02:36 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958376
- 03:32 PM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
- /a/yuriw-2022-08-08_22:19:17-rados-wip-yuri-testing-2022-08-08-1230-quincy-distro-default-smithi/6962388/
- 01:53 PM Bug #45318: Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log running...
- /a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958138
- 12:35 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- Hi,
Another longer one.
OSD.25, data on sdh, db on sdb
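As a hedged illustration of the per-OSD evidence that usually helps with such latency reports (osd.25 is taken from the comment above; run on the host where that OSD lives):
$ ceph daemon osd.25 dump_historic_ops   # slowest recent ops with their per-event timestamps
$ ceph daemon osd.25 perf dump           # op and BlueStore latency counters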
- 11:13 AM Bug #56530 (Resolved): Quincy: High CPU and slow progress during backfill
- 11:13 AM Backport #57052 (Resolved): quincy: Quincy: High CPU and slow progress during backfill
- 09:11 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Seen in recent quincy run https://pulpito.ceph.com/yuriw-2022-08-02_21:20:37-fs-wip-yuri7-testing-2022-07-27-0808-qui...
- 08:08 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- Seeing this in a Quincy run:
/a/yuriw-2022-08-08_22:19:32-rados-wip-yuri4-testing-2022-08-08-1009-quincy-distro-defa...
- 06:58 AM Bug #47589: radosbench times out "reached maximum tries (800) after waiting for 4800 seconds"
- Seeing this on a Quincy run:
/a/yuriw-2022-08-08_22:19:32-rados-wip-yuri4-testing-2022-08-08-1009-quincy-distro-defa...
- 06:34 AM Backport #49775 (Rejected): nautilus: Get more parallel scrubs within osd_max_scrubs limits
- Nautilus is EOL
08/08/2022
- 03:53 PM Bug #57061 (Fix Under Review): Use single cluster log level (mon_cluster_log_level) config to con...
- 02:56 PM Bug #57061 (Fix Under Review): Use single cluster log level (mon_cluster_log_level) config to con...
- We do not control the verbosity of the cluster logs which are getting logged to stderr, graylog and journald. Each Log...
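A hedged sketch of how the options this report is about can be inspected or overridden (the mon name and the chosen level are placeholders):
$ ceph config get mon.a mon_cluster_log_file_level
$ ceph config set mon mon_cluster_log_file_level info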
- 12:19 PM Bug #49231: MONs unresponsive over extended periods of time
- We are planning to upgrade to Octopus. However, I do not believe we can reproduce the issue here. The above config ha...
- 08:51 AM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
- Dan van der Ster wrote:
> Good point. In fact it is sufficient to just create some files in the cephfs after taking ...
- 08:43 AM Backport #51497 (Rejected): nautilus: mgr spamming with repeated set pgp_num_actual while merging
- Nautilus is EOL
- 08:42 AM Bug #48212 (Resolved): pool last_epoch_clean floor is stuck after pg merging
- Nautilus is EOL
- 08:41 AM Backport #52644 (Rejected): nautilus: pool last_epoch_clean floor is stuck after pg merging
- Nautilus EOL
- 04:50 AM Backport #57052 (In Progress): quincy: Quincy: High CPU and slow progress during backfill
- 04:30 AM Backport #57052 (Resolved): quincy: Quincy: High CPU and slow progress during backfill
- https://github.com/ceph/ceph/pull/47490
- 04:27 AM Bug #56530 (Pending Backport): Quincy: High CPU and slow progress during backfill
08/07/2022
- 09:38 AM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
- http://pulpito.front.sepia.ceph.com/rfriedma-2022-08-06_12:17:03-rados-wip-rf-snprefix-distro-default-smithi/6960416/...
08/06/2022
- 10:06 PM Backport #55631: pacific: ceph-osd takes all memory before oom on boot
- Who is in charge of this one? Is there any progress?
- 10:03 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Wow, I'm quite surprised to see this is taking so much time to be resolved.
Can someone do a small recap on what's ...
08/05/2022
- 04:13 PM Bug #57049 (Duplicate): cluster logging does not adhere to mon_cluster_log_file_level
- Even after setting mon_cluster_log_file_level to info or less verbose level, we are still seeing debug logs are getti...
- 01:26 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- Hello,
We don't have as many stalled moments these last days, only ~5 min.
I've taken some logs but really at the ...
08/04/2022
- 03:29 PM Bug #55809: "Leak_IndirectlyLost" valgrind report on mon.c
- /a/yuriw-2022-08-03_20:33:43-rados-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default-smithi/6957591
- 11:36 AM Fix #57040 (Fix Under Review): osd: Update osd's IOPS capacity using async Context completion ins...
- 10:47 AM Fix #57040 (Resolved): osd: Update osd's IOPS capacity using async Context completion instead of ...
- The method, OSD::mon_cmd_set_config(), sets a config option related to
mClock during OSD boot-up. The method waits o...
- 05:12 AM Backport #57030 (In Progress): quincy: rados/test.sh: Early exit right after LibRados global test...
- 05:10 AM Backport #57029 (In Progress): pacific: rados/test.sh: Early exit right after LibRados global tes...
08/03/2022
- 07:26 PM Backport #57020: pacific: test_pool_min_size: AssertionError: not clean before minsize thrashing ...
- https://github.com/ceph/ceph/pull/47446
- 03:10 PM Backport #57020 (Resolved): pacific: test_pool_min_size: AssertionError: not clean before minsize...
- 07:25 PM Backport #57022: pacific: test_pool_min_size: 'check for active or peered' reached maximum tries ...
- https://github.com/ceph/ceph/pull/47446
- 03:12 PM Backport #57022 (Resolved): pacific: test_pool_min_size: 'check for active or peered' reached max...
- 07:23 PM Backport #57019: quincy: test_pool_min_size: AssertionError: not clean before minsize thrashing s...
- https://github.com/ceph/ceph/pull/47445
- 03:10 PM Backport #57019 (Resolved): quincy: test_pool_min_size: AssertionError: not clean before minsize ...
- 07:22 PM Backport #57024: quincy: test_pool_min_size: 'check for active or peered' reached maximum tries (...
- https://github.com/ceph/ceph/pull/47445
- 03:12 PM Backport #57024 (Resolved): quincy: test_pool_min_size: 'check for active or peered' reached maxi...
- 07:22 PM Backport #57023 (Rejected): octopus: test_pool_min_size: 'check for active or peered' reached max...
- 03:12 PM Backport #57023 (Rejected): octopus: test_pool_min_size: 'check for active or peered' reached max...
- 07:13 PM Backport #57026: pacific: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout ...
- https://github.com/ceph/ceph/pull/47446/
- 03:16 PM Backport #57026 (Resolved): pacific: test_pool_min_size:AssertionError:wait_for_clean:failed befo...
- 07:04 PM Backport #57025: quincy: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout e...
- https://github.com/ceph/ceph/pull/47445
- 03:16 PM Backport #57025 (Resolved): quincy: test_pool_min_size:AssertionError:wait_for_clean:failed befor...
- 05:30 PM Backport #57030 (Resolved): quincy: rados/test.sh: Early exit right after LibRados global tests c...
- https://github.com/ceph/ceph/pull/47452
- 05:30 PM Backport #57029 (Resolved): pacific: rados/test.sh: Early exit right after LibRados global tests ...
- https://github.com/ceph/ceph/pull/47451
- 05:27 PM Bug #55001 (Pending Backport): rados/test.sh: Early exit right after LibRados global tests complete
- 03:05 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
- https://github.com/ceph/ceph/pull/47165 merged
- 03:10 PM Bug #51904 (Pending Backport): test_pool_min_size:AssertionError:wait_for_clean:failed before tim...
- 03:09 PM Bug #54511 (Pending Backport): test_pool_min_size: AssertionError: not clean before minsize thras...
- 03:09 PM Bug #49777 (Pending Backport): test_pool_min_size: 'check for active or peered' reached maximum t...
- 02:58 PM Bug #57017 (Pending Backport): mon-stretched_cluster: degraded stretched mode lead to Monitor crash
- There are certain scenarios in a degraded stretched cluster where it will try to go into the function Monitor::go_recov...
- 02:33 PM Feature #23493 (Resolved): config: strip/escape single-quotes in values when setting them via con...
- 11:50 AM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- To get some more insight on the issue I would suggest to do the following once the issue is faced again:
1) For OSD-...
- 06:02 AM Bug #55773 (Resolved): Assertion failure (ceph_assert(have_pending)) when creating new OSDs durin...
- 06:01 AM Backport #56060 (Resolved): quincy: Assertion failure (ceph_assert(have_pending)) when creating n...
08/01/2022
- 11:32 PM Bug #37808: osd: osdmap cache weak_refs assert during shutdown
- /a/yuriw-2022-07-27_22:35:53-rados-wip-yuri8-testing-2022-07-27-1303-pacific-distro-default-smithi/6950918
- 08:13 PM Tasks #56952 (In Progress): Set mgr_pool to true for a handful of tests in the rados qa suite
- 17.2.2 had the libcephsqlite failure. I am scheduling some rados/thrash tests here to see the current results. Since ...
- 05:54 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- I was able to try the patch on Pacific this morning. Running one OSD with the patch, getting 500s from RGW when I pre...
- 03:05 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- I've just had a latency plateau. No scrub/deep-scrub on the impacted OSD during that time...
At least, no message in...
- 12:42 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- Zero occurrence of "timed out" in all my ceph-osds logs for 2 days. But, as I have increased bluestore_prefer_deferre...
- 12:10 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- Gilles Mocellin wrote:
> This morning, I have :
> PG_NOT_DEEP_SCRUBBED: 11 pgs not deep-scrubbed in time
> Never h...
- 08:36 AM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- This morning, I have :
PG_NOT_DEEP_SCRUBBED: 11 pgs not deep-scrubbed in time
Never had before Pacific.
Could it...
- 06:29 AM Backport #55157 (In Progress): quincy: mon: config commands do not accept whitespace style config...
- 06:28 AM Backport #55156 (In Progress): pacific: mon: config commands do not accept whitespace style confi...
- 06:03 AM Bug #52124 (Fix Under Review): Invalid read of size 8 in handle_recovery_delete()
07/29/2022
- 02:31 PM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
- ...
- 10:52 AM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
- The wrong return code is just an echo of a failure with an auth entity deletion:...
- 05:57 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
- As I said, I don't have any more logs, as I had to bring the cluster back into a working state.
As this issue is comi...
- 04:24 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
- Hm... at first glance, OSD calls stop_block() on a head object, which is already stopped, in kick_object_context_bloc...
07/28/2022
- 09:54 PM Feature #56956 (Fix Under Review): osdc: Add objecter fastfail
- There is no point in waiting indefinitely when the PG of an object is inactive. It is appropriate to cancel the op in suc...
- 07:12 PM Tasks #56952 (In Progress): Set mgr_pool to true for a handful of tests in the rados qa suite
- In most places in the rados suite we use `sudo ceph config set mgr mgr_pool false --force` (see https://github.com/ce...
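A minimal sketch of the change this task describes, reusing the quoted command with the value flipped for the selected tests:
$ sudo ceph config set mgr mgr_pool true --force
$ ceph config get mgr mgr_pool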
- 01:37 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- That is very strange. I've been able to reproduce 100% of the time with this:...
- 12:41 PM Bug #56707 (Fix Under Review): pglog growing unbounded on EC with copy by ref
- 12:41 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- Alex, thanks for the information. Unfortunately, I couldn't recreate the issue, but I did find some issue with refco...
- 02:23 AM Bug #56926 (New): crash: int BlueFS::_flush_range_F(BlueFS::FileWriter*, uint64_t, uint64_t): abort
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=97c9a15c7262222fd841813a...
- 02:22 AM Bug #56903 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8749e9b5d1fac718fbbb96fb...
- 02:22 AM Bug #56901 (New): crash: LogMonitor::log_external_backlog()
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=64ca4b6b04c168da450a852a...
- 02:22 AM Bug #56896 (New): crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=50bf2266e28cc1764b47775b...
- 02:22 AM Bug #56895 (New): crash: void MissingLoc::add_active_missing(const pg_missing_t&): assert(0 == "u...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=f96348a2ae0d2c754de01fc7...
- 02:22 AM Bug #56892 (New): crash: StackStringBuf<4096ul>::xsputn(char const*, long)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3a3287f5eaa9fbb99295b2b7...
- 02:22 AM Bug #56890 (New): crash: MOSDRepOp::encode_payload(unsigned long)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=9be8aeab4dd246c5baf1f1c7...
- 02:22 AM Bug #56889 (New): crash: MOSDRepOp::encode_payload(unsigned long)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=fce79f2ea6c1a34825a23dd9...
- 02:22 AM Bug #56888 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8df8f5fbb1ef85f0956e0f78...
- 02:22 AM Bug #56887 (New): crash: void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::Col...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=625223857a28a74eae75273a...
- 02:22 AM Bug #56883 (New): crash: rocksdb::BlockBasedTableBuilder::Add(rocksdb::Slice const&, rocksdb::Sli...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ae08527e7a8d310b5740fbf6...
- 02:21 AM Bug #56878 (New): crash: MonitorDBStore::get_synchronizer(std::pair<std::basic_string<char, std::...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5cacc7785f8a352e3cd86982...
- 02:21 AM Bug #56873 (New): crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=210d418989a6bc9fdb60989c...
- 02:21 AM Bug #56872 (New): crash: __cxa_rethrow()
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5ce84c33423abe42eac8cc98...
- 02:21 AM Bug #56871 (New): crash: __cxa_rethrow()
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3c6c9906c46f7979e39f2a3d...
- 02:21 AM Bug #56867 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e151a6a9ae5a0a079dad1ca4...
- 02:21 AM Bug #56863 (New): crash: void RDMAConnectedSocketImpl::handle_connection(): assert(!r)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d1c8198db9a116b38c161a79...
- 02:21 AM Bug #56856 (New): crash: ceph::buffer::list::iterator_impl<true>::copy(unsigned int, std::basic_s...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=03d7803d6cda8b31445b5fa2...
- 02:21 AM Bug #56855 (New): crash: rocksdb::CompactionJob::Run()
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=b79a082186434ab8becebddb...
- 02:21 AM Bug #56850 (Resolved): crash: void PaxosService::propose_pending(): assert(have_pending)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=106ff764dfe8a5f766a511a1...
- 02:21 AM Bug #56849 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5ff0cd923e0b4beb646ae133...
- 02:20 AM Bug #56848 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=0dcd9dfbff0c25591d64a41a...
- 02:20 AM Bug #56847 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7a53cbc0bcdeffa2f26d71d0...
- 02:20 AM Bug #56843 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=339539062c280c5c4e5e605c...
- 02:20 AM Bug #56837 (New): crash: __assert_perror_fail()
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8b423fcbfb14f36724d15462...
- 02:20 AM Bug #56835 (New): crash: ceph::logging::detail::JournaldClient::JournaldClient(): assert(fd > 0)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e226e4ce8be4c94d64dd6104...
- 02:20 AM Bug #56833 (New): crash: __assert_perror_fail()
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e0d06d29c57064910751db9d...
- 02:20 AM Bug #56826 (New): crash: MOSDPGLog::encode_payload(unsigned long)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ee3ed1408924d926185a65e3...
- 02:20 AM Bug #56821 (New): crash: MOSDRepOp::encode_payload(unsigned long)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=6d21b2c78bcc5092dac5bcc9...
- 02:19 AM Bug #56816 (New): crash: unsigned long const md_config_t::get_val<unsigned long>(ConfigValues con...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=66ff3f43b85f15283932865d...
- 02:19 AM Bug #56814 (New): crash: rocksdb::MemTableIterator::key() const
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7329bea2aaafb66aa5060938...
- 02:19 AM Bug #56813 (New): crash: MOSDPGLog::encode_payload(unsigned long)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e4eeb1a3b34df8062d7d1788...
- 02:19 AM Bug #56809 (New): crash: MOSDPGScan::encode_payload(unsigned long)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=2fe9b06ce88dccd8c9fe8f41...
- 02:18 AM Bug #56797 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3c3fa597eda743682305f64b...
- 02:18 AM Bug #56796 (New): crash: void ECBackend::handle_recovery_push(const PushOp&, RecoveryMessages*, b...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=dbf2120428a133c3689fa508...
- 02:18 AM Bug #56794 (New): crash: void LogMonitor::_create_sub_incremental(MLog*, int, version_t): assert(...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=1f3b5497ed0df042120d8ff7...
- 02:17 AM Bug #56793 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=54876dfe5b7062de7d1d3ee5...
- 02:17 AM Bug #56789 (New): crash: void RDMAConnectedSocketImpl::handle_connection(): assert(!r)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=a87f94f67786787071927f90...
- 02:17 AM Bug #56787 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=f58b099fd24ce33032cf74bd...
- 02:17 AM Bug #56785 (New): crash: void OSDShard::register_and_wake_split_child(PG*): assert(!slot->waiting...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d44ea277d2ae53e186d6b488...
- 02:16 AM Bug #56781 (New): crash: virtual void OSDMonitor::update_from_paxos(bool*): assert(version > osdm...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=4aed07fd08164fe65fe7c6e0...
- 02:16 AM Bug #56780 (New): crash: virtual void AuthMonitor::update_from_paxos(bool*): assert(version > key...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=11756492895a3349dfb227aa...
- 02:16 AM Bug #56779 (New): crash: void MissingLoc::add_active_missing(const pg_missing_t&): assert(0 == "u...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=83d5be7b2d08c79f23a10dba...
- 02:16 AM Bug #56778 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=803b4a91fd84c3d26353cb47...
- 02:16 AM Bug #56776 (New): crash: std::string MonMap::get_name(unsigned int) const: assert(n < ranks.size())
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7464294c2c2ac69856297e37...
- 02:16 AM Bug #56773 (New): crash: int64_t BlueFS::_read_random(BlueFS::FileReader*, uint64_t, uint64_t, ch...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ba26d388e9213afb18b683ee...
- 02:16 AM Bug #56772 (New): crash: uint64_t SnapSet::get_clone_bytes(snapid_t) const: assert(clone_overlap....
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=62b8a9e7f0bb7fc1fc81b2dc...
- 02:16 AM Bug #56770 (New): crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_slots....
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d9289f1067de7f0cc0e374ff...
- 02:15 AM Bug #56764 (New): crash: uint64_t SnapSet::get_clone_bytes(snapid_t) const: assert(clone_size.cou...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3969752632dfdff2c710083a...
- 02:14 AM Bug #56756 (New): crash: long const md_config_t::get_val<long>(ConfigValues const&, std::basic_st...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=a4792692d74b82c4590d9b51...
- 02:14 AM Bug #56755 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
- *New crash events were reported via Telemetry with newer versions (['16.2.6', '16.2.7', '16.2.9']) than encountered...
- 02:14 AM Bug #56754 (New): crash: DeviceList::DeviceList(ceph::common::CephContext*): assert(num)
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=17b0ccd87cab46177149698e...
- 02:14 AM Bug #56752 (New): crash: void pg_missing_set<TrackChanges>::got(const hobject_t&, eversion_t) [wi...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=34f05776defb000d033885b3...
- 02:14 AM Bug #56750 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=eb1729ae63d80bd79b6ea92b...
- 02:13 AM Bug #56749 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=9bfe9728f3e90e92bcab42f9...
- 02:13 AM Bug #56748 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
- *New crash events were reported via Telemetry with newer versions (['16.2.0', '16.2.1', '16.2.2', '16.2.5', '16.2.6...
- 02:13 AM Bug #56747 (New): crash: std::__cxx11::string MonMap::get_name(unsigned int) const: assert(n < ra...
- http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=0846d215ecad4c78633623e5...
07/27/2022
- 11:46 PM Backport #56736 (In Progress): quincy: unessesarily long laggy PG state
- https://github.com/ceph/ceph/pull/47901
- 11:46 PM Backport #56735 (Resolved): octopus: unessesarily long laggy PG state
- 11:46 PM Backport #56734 (In Progress): pacific: unessesarily long laggy PG state
- https://github.com/ceph/ceph/pull/47899
- 11:40 PM Bug #53806 (Pending Backport): unessesarily long laggy PG state
- 06:23 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943721/remote/smithi042/l...
- 05:58 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- Moving to next week's bug scrub.
- 05:59 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
- Tried that a few times for different PGs on different OSDs, but it doesn't help
- 05:47 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
- Pascal Ehlert wrote:
> This indeed happened during an upgrade from Octopus to Pacific.
> I had forgotten to reduce ... - 12:24 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
- This indeed happened during an upgrade from Octopus to Pacific.
I had forgotten to reduce the number of ranks in Cep... - 05:54 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
- Nitzan, could it be a different issue?
- 04:27 PM Bug #56733 (New): Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- Hello,
Since our upgrade to Pacific, we suffer from sporadic latencies on disks, not always the same.
The cluster... - 02:09 PM Bug #55851 (Fix Under Review): Assert in Ceph messenger
- 01:37 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- >1. "dumping the refcount" - how did you dump the refcount?
I extracted it with rados getxattr refcont and used the... - 10:50 AM Bug #56707: pglog growing unbounded on EC with copy by ref
- Alex
few more question, so i'll be able to recreate the scenario as you got it
1. "dumping the refcount" - how did ... - 07:20 AM Backport #56723 (Resolved): quincy: osd thread deadlock
- https://github.com/ceph/ceph/pull/47930
- 07:20 AM Backport #56722 (Resolved): pacific: osd thread deadlock
- https://github.com/ceph/ceph/pull/48254
- 07:16 AM Bug #55355 (Pending Backport): osd thread deadlock
07/26/2022
- 03:14 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
- All the tests that this has failed on involve thrashing. Specifically, they all use thrashosds-health.yaml (https://g...
- 03:09 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- That was faster than I thought. Attached massif outfile (let me know if that's what you expect not super familiar wit...
- 02:41 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- I don't have one handy everything is in prometheus and sharing a screen of all the mempools isn't very legible. Valgr...
- 02:04 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- Alexandre, can you please send us the dump_mempools and if you can also run valgrind massif ?
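A hedged example of how the requested data is typically gathered (the OSD id is a placeholder; the exact valgrind invocation may differ per environment):
$ ceph daemon osd.0 dump_mempools                     # per-pool memory accounting, including osd_pglog
$ valgrind --tool=massif /usr/bin/ceph-osd -f -i 0    # heap profile, written to massif.out.<pid>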
- 02:57 PM Backport #51287 (Resolved): pacific: LibRadosService.StatusFormat failed, Expected: (0) != (retry...
- 02:32 PM Bug #55851: Assert in Ceph messenger
- Perhaps we should move into @deactivate_existing@ part of @reuse_connection()@ where we hold both locks the same time.
- 02:28 PM Bug #55851 (In Progress): Assert in Ceph messenger
- 02:27 PM Bug #55851: Assert in Ceph messenger
- It looks @reuse_connection()@ holds the ...
- 02:18 PM Bug #55851: Assert in Ceph messenger
- The number of elements in @FrameAssembler::m_desc@ can be altered only by:
1. ... - 03:17 AM Fix #56709 (Resolved): test/osd/TestPGLog: Fix confusing description between log and olog.
- https://github.com/ceph/ceph/pull/47272
test/osd/TestPGLog.cc has a mistake description between log and olog in ...
07/25/2022
- 11:21 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- Attached a pglog at the peak of one prod issue. I had to redact the object names since it's prod but let me know if y...
- 11:07 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- Can you dump the pg log using the ceph-objectstore-tool when the OSD is consuming high memory and share it with us?
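A hedged sketch of the requested dump (data path and pgid are placeholders; the OSD must be stopped before running the tool):
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op log --pgid 2.1bs0 > pg_2.1bs0_log.json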
- 10:51 PM Bug #56707 (Pending Backport): pglog growing unbounded on EC with copy by ref
*How to reproduce*
- create a 10GB object in bucket1 using multipart upload
- copy object 200x via s3:Objec...
- 10:46 PM Bug #56700: MGR pod on CLBO on rook deployment
- I am hitting a bunch of these failures on a recent teuthology run I scheduled. The ceph version is 17.2.0:
http://...
- 05:34 PM Bug #56700: MGR pod on CLBO on rook deployment
- Quoting from a chat group:
@Travis Nielsen I think the issue you are seeing was first seen in https://tracker.ceph...
- 04:59 PM Bug #56700 (Duplicate): MGR pod on CLBO on rook deployment
- 04:51 PM Bug #56700: MGR pod on CLBO on rook deployment
- Parth Arora wrote:
> MGR pod is failing for the new ceph version v17.2.2, till v17.2.1 it was working fine.
>
> ...
- 04:48 PM Bug #56700 (Duplicate): MGR pod on CLBO on rook deployment
- MGR pod is failing for the new ceph version v17.2.2, till v17.2.1 it was working fine.
```
29: PyObject_Call()
3...
- 07:03 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
- /a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943763/
- 02:04 PM Bug #55435: mon/Elector: notify_ranked_removed() does not properly erase dead_ping in the case of...
- https://github.com/ceph/ceph/pull/47087 merged
07/24/2022
- 08:09 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
- Myoungwon Oh, can you please take a look?
- 07:36 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
- Sadly I don't have any logs anymore, as I had to destroy the Ceph cluster - getting it back in working order was top prio...
- 05:28 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
- @Chris Kul, I'm trying to understand the sequence of failing osd's, can you please upload the osds logs that failed?
...
07/21/2022
- 08:29 PM Bug #55836: add an asok command for pg log investigations
- https://github.com/ceph/ceph/pull/46561 merged
- 07:19 PM Bug #56530 (Fix Under Review): Quincy: High CPU and slow progress during backfill
- 06:58 PM Bug #56530: Quincy: High CPU and slow progress during backfill
- The issue is addressed currently in Ceph's main branch. Please see the linked PR. This will be back-ported to Quincy ...
- 02:59 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
- Just a note, I was able to recreate it with vstart, without error injection but with valgrind
as soon as we step in...
- 02:00 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
- Ah, thanks Sridhar. I will compare the two Trackers and mark this one as a duplicate if needed.
- 02:57 AM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
- This looks similar to https://tracker.ceph.com/issues/52948. See comment https://tracker.ceph.com/issues/52948#note-5...
- 02:57 PM Backport #56664 (In Progress): quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change shou...
- https://github.com/ceph/ceph/pull/47210
- 02:45 PM Backport #56664 (Resolved): quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should ...
- 02:49 PM Backport #56663: pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be gap >= ...
- https://github.com/ceph/ceph/pull/47211
- 02:45 PM Backport #56663 (Resolved): pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should...
- 02:40 PM Bug #56151 (Pending Backport): mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be ga...
- 01:34 PM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
- BTW the initial version was 17.2.0; we tried to update to 17.2.1 in the hope this bug got fixed, sadly without luck.
- 01:33 PM Bug #56661 (Need More Info): Quincy: OSD crashing one after another with data loss with ceph_asse...
- After two weeks after an upgrade to quincy from a octopus setup, the SSD pool reported one OSD down in the middle of ...
- 09:05 AM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- Looks like a race condition. Does our @Context@ make a dependency on @RefCountedObj@ (e.g. @TrackedOp@) but forget...
07/20/2022
- 11:33 PM Bug #44089 (New): mon: --format=json does not work for config get or show
- This would be a good issue for Open Source Day if someone would be willing to take over the closed PR: https://github...
- 09:40 PM Bug #56530: Quincy: High CPU and slow progress during backfill
- ceph-users discussion - https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/Z7AILAXZDBIT6IIF2E6M3BLUE6B7L...
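A hedged illustration for the backfill-throughput discussion above; the options exist in Quincy, but whether switching the profile is the right mitigation is not claimed here:
$ ceph config get osd osd_mclock_profile                     # high_client_ops is the Quincy default
$ ceph config set osd osd_mclock_profile high_recovery_ops   # favours recovery/backfill over client I/O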
- 07:45 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
- Found another occurrence here: /a/yuriw-2022-07-18_18:20:02-rados-wip-yuri8-testing-2022-07-18-0918-distro-default-sm...
- 06:11 PM Bug #56574 (Need More Info): rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down...
- Watching for more reoccurances.
- 10:25 AM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
- osd.0 is still down.
The valgrind for osd.0 shows:...
- 06:25 PM Bug #51168: ceph-osd state machine crash during peering process
- Yao Ning wrote:
> Radoslaw Zarzynski wrote:
> > The PG was in @ReplicaActive@ so we shouldn't see any backfill acti...
- 06:06 PM Backport #56656 (New): pacific: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDur...
- 06:06 PM Backport #56655 (In Progress): quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierF...
- https://github.com/ceph/ceph/pull/47929
- 06:03 PM Bug #53294 (Pending Backport): rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuri...
- 03:20 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
- /a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939431...
- 06:02 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
- Notes from the scrub:
1. It looks like this happens mostly (only?) on pacific.
2. In at least two of the replications Valg...
- 05:56 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
- ...
- 03:58 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
- /a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939660
- 04:42 PM Backport #56408: quincy: ceph version 16.2.7 PG scrubs not progressing
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46844
merged
- 04:40 PM Backport #56060: quincy: Assertion failure (ceph_assert(have_pending)) when creating n...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46689
merged
- 04:40 PM Bug #49525: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10121515-14:e4 snaps missi...
- https://github.com/ceph/ceph/pull/46498 merged
- 04:08 PM Bug #55809: "Leak_IndirectlyLost" valgrind report on mon.c
- /a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939513
- 04:07 PM Bug #53767 (Duplicate): qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing c...
- Same failure on test_cls_2pc_queue.sh, but this one came with remote logs. I suspect this is a duplicate of #55809.
...
- 03:43 PM Bug #43584: MON_DOWN during mon_join process
- /a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939512
- 02:50 PM Bug #56650: ceph df reports invalid MAX AVAIL value for stretch mode crush rule
- Before applying PR#47189, MAX AVAIL for stretch_rule pools is incorrect:...
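For reference, the value in question is the per-pool figure reported by the command below (a hedged pointer, not output from the reporter's cluster):
$ ceph df detail   # check the MAX AVAIL column for pools using the stretch rule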
- 02:07 PM Bug #56650 (Fix Under Review): ceph df reports invalid MAX AVAIL value for stretch mode crush rule
- 01:26 PM Bug #56650 (Fix Under Review): ceph df reports invalid MAX AVAIL value for stretch mode crush rule
- If we define a crush rule for a stretch mode cluster with multiple 'take' steps, then MAX AVAIL for pools associated with the crush ru...
- 01:15 PM Backport #56649 (New): pacific: [Progress] Do not show NEW PG_NUM value for pool if autoscaler is...
- 01:15 PM Backport #56648 (In Progress): quincy: [Progress] Do not show NEW PG_NUM value for pool if autosc...
- https://github.com/ceph/ceph/pull/47925
- 01:14 PM Bug #56136 (Pending Backport): [Progress] Do not show NEW PG_NUM value for pool if autoscaler is ...
07/19/2022
- 09:20 PM Backport #56642 (Resolved): pacific: Log at 1 when Throttle::get_or_fail() fails
- 09:20 PM Backport #56641 (Resolved): quincy: Log at 1 when Throttle::get_or_fail() fails
- 09:18 PM Bug #56495 (Pending Backport): Log at 1 when Throttle::get_or_fail() fails
- 02:07 PM Bug #56495: Log at 1 when Throttle::get_or_fail() fails
- https://github.com/ceph/ceph/pull/47019 merged
- 04:24 PM Bug #50222 (In Progress): osd: 5.2s0 deep-scrub : stat mismatch
- Thanks Rishabh, I am having a look into this.
- 04:11 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
- This error showed up in QA runs -
http://pulpito.front.sepia.ceph.com/rishabh-2022-07-08_23:53:34-fs-wip-rishabh-tes...
- 10:25 AM Bug #55001 (Fix Under Review): rados/test.sh: Early exit right after LibRados global tests complete
- 08:28 AM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
- the core dump showing:...
- 08:28 AM Bug #49689 (Fix Under Review): osd/PeeringState.cc: ceph_abort_msg("past_interval start interval ...
- PR is marked as draft for now.
- 08:26 AM Backport #56580 (Resolved): octopus: snapshots will not be deleted after upgrade from nautilus to...
- 12:48 AM Bug #50853 (Can't reproduce): libcephsqlite: Core dump while running test_libcephsqlite.sh.
07/18/2022
- 08:43 PM Backport #56580: octopus: snapshots will not be deleted after upgrade from nautilus to pacific
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47108
merged
- 01:52 PM Bug #49777: test_pool_min_size: 'check for active or peered' reached maximum tries (5) after wait...
- I was able to reproduce the problem after modifying qa/tasks/ceph_manager.py: https://github.com/ceph/ceph/pull/46931...
- 12:44 PM Bug #49777 (Fix Under Review): test_pool_min_size: 'check for active or peered' reached maximum t...
- 01:50 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-07-13_19:41:18-rados-wip-yuri7-testing-2022-07-11-1631-distro-default-smithi/6929396/remote/smithi204/...
- 01:47 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
- We have coredump and the console_log showing:
smithi042.log:[ 852.382596] ceph_test_rados[110223]: segfault at 0 ip...
- 01:42 PM Backport #56604 (Resolved): pacific: ceph report missing osdmap_clean_epochs if answered by peon
- https://github.com/ceph/ceph/pull/51258
- 01:42 PM Backport #56603 (Rejected): octopus: ceph report missing osdmap_clean_epochs if answered by peon
- 01:42 PM Backport #56602 (Resolved): quincy: ceph report missing osdmap_clean_epochs if answered by peon
- https://github.com/ceph/ceph/pull/47928
- 01:37 PM Bug #47273 (Pending Backport): ceph report missing osdmap_clean_epochs if answered by peon
- 01:34 PM Bug #54511: test_pool_min_size: AssertionError: not clean before minsize thrashing starts
- I was able to reproduce the problem after modifying qa/tasks/ceph_manager.py: https://github.com/ceph/ceph/pull/46931...
- 12:44 PM Bug #54511 (Fix Under Review): test_pool_min_size: AssertionError: not clean before minsize thras...
- 01:16 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- I was able to reproduce the problem after modifying qa/tasks/ceph_manager.py: https://github.com/ceph/ceph/pull/46931...
- 12:44 PM Bug #51904 (Fix Under Review): test_pool_min_size:AssertionError:wait_for_clean:failed before tim...
- 10:18 AM Bug #56575 (Fix Under Review): test_cls_lock.sh: ClsLock.TestExclusiveEphemeralStealEphemeral fai...
07/17/2022
- 01:16 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
- /a/yuriw-2022-07-15_19:06:53-rados-wip-yuri-testing-2022-07-15-0950-octopus-distro-default-smithi/6932690
- 01:04 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
- /a/yuriw-2022-07-15_19:06:53-rados-wip-yuri-testing-2022-07-15-0950-octopus-distro-default-smithi/6932687
- 09:03 AM Backport #56579 (In Progress): pacific: snapshots will not be deleted after upgrade from nautilus...
- 09:02 AM Backport #56578 (In Progress): quincy: snapshots will not be deleted after upgrade from nautilus ...
- 06:51 AM Bug #56575: test_cls_lock.sh: ClsLock.TestExclusiveEphemeralStealEphemeral fails from "method loc...
- The lock expired, so the next ioctx.stat won't return -2 (-ENOENT); we need to change that as well, based on r1 that re...
07/16/2022
- 03:18 PM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
- This issue is fixed (including a unit test) and will be backported in order to prevent future cluster upgrades from ...
07/15/2022
- 09:17 PM Cleanup #56581 (Fix Under Review): mon: fix ElectionLogic warnings
- 09:06 PM Cleanup #56581 (Resolved): mon: fix ElectionLogic warnings
- h3. Problem: compilation warnings in the ElectionLogic code...
- 08:58 PM Backport #56580 (In Progress): octopus: snapshots will not be deleted after upgrade from nautilus...
- 08:55 PM Backport #56580 (Resolved): octopus: snapshots will not be deleted after upgrade from nautilus to...
- https://github.com/ceph/ceph/pull/47108
- 08:55 PM Backport #56579 (Resolved): pacific: snapshots will not be deleted after upgrade from nautilus to...
- https://github.com/ceph/ceph/pull/47134
- 08:55 PM Backport #56578 (Resolved): quincy: snapshots will not be deleted after upgrade from nautilus to ...
- https://github.com/ceph/ceph/pull/47133
- 08:51 PM Bug #56147 (Pending Backport): snapshots will not be deleted after upgrade from nautilus to pacific
- 07:31 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
- /a/nojha-2022-07-15_14:45:04-rados-snapshot_key_conversion-distro-default-smithi/6932156
- 07:23 PM Bug #56574 (Need More Info): rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down...
- Description: rados/valgrind-leaks/{1-start 2-inject-leak/osd centos_latest}
/a/nojha-2022-07-14_20:32:09-rados-sn...
- 07:29 PM Bug #56575 (Fix Under Review): test_cls_lock.sh: ClsLock.TestExclusiveEphemeralStealEphemeral fai...
- /a/nojha-2022-07-14_20:32:09-rados-snapshot_key_conversion-distro-default-smithi/6930848...
- 12:09 PM Bug #56565 (Won't Fix): Not upgraded nautilus mons crash if upgraded pacific mon updates fsmap
- I was just told there is a step in the upgrade documentation to set mon_mds_skip_sanity param before upgrade [1], whi...
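A minimal sketch of the documented pre-upgrade step referred to above (to be reverted once all mons run the new release):
$ ceph config set mon mon_mds_skip_sanity true
$ ceph config set mon mon_mds_skip_sanity false   # after the mon upgrade completes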
- 10:07 AM Bug #51168: ceph-osd state machine crash during peering process
- Radoslaw Zarzynski wrote:
> The PG was in @ReplicaActive@ so we shouldn't see any backfill activity. A delayed event...
07/14/2022
- 12:19 PM Bug #56565 (Won't Fix): Not upgraded nautilus mons crash if upgraded pacific mon updates fsmap
- I have no idea if this needs to be fixed but at least the case looks worth reporting.
We faced the issue when upgr...
07/13/2022
- 07:48 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- Noticed that this PR was newly included in 17.2.1, and it makes a change to GetApproximateSizes: https://github.com/c...
- 07:10 PM Backport #56551: quincy: mon/Elector: notify_ranked_removed() does not properly erase dead_ping i...
- https://github.com/ceph/ceph/pull/47086
- 06:55 PM Backport #56551 (Resolved): quincy: mon/Elector: notify_ranked_removed() does not properly erase ...
- 07:09 PM Backport #56550 (In Progress): pacific: mon/Elector: notify_ranked_removed() does not properly er...
- https://github.com/ceph/ceph/pull/47087
- 06:55 PM Backport #56550 (Resolved): pacific: mon/Elector: notify_ranked_removed() does not properly erase...
- 06:51 PM Bug #56034: qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()
- This looks like a test failure, so not terribly high priority.
- 06:49 PM Bug #53342: Exiting scrub checking -- not all pgs scrubbed
- Sridhar Seshasayee wrote:
> /a/yuriw-2022-06-29_18:22:37-rados-wip-yuri2-testing-2022-06-29-0820-distro-default-smit... - 06:42 PM Bug #56438 (Need More Info): found snap mapper error on pg 3.bs0> oid 3:d81a0fb3:::smithi10749189...
- Waiting for reoccurrences.
- 06:38 PM Bug #56439: mon/crush_ops.sh: Error ENOENT: no backward-compatible weight-set
- Let's observe whether there will be reoccurrences.
- 06:33 PM Bug #55450 (Resolved): [DOC] stretch_rule defined in the doc needs updation
- 06:33 PM Bug #55450: [DOC] stretch_rule defined in the doc needs updation
- Open source contributor, github username: elacunza. Created https://github.com/ceph/ceph/pull/46170 and has resolved ...
- 06:33 PM Bug #56147 (Fix Under Review): snapshots will not be deleted after upgrade from nautilus to pacific
- 06:31 PM Bug #56463 (Triaged): osd nodes with NVME try to run `smartctl` and `nvme` even when the tools ar...
- They are called from @block_device_get_metrics()@ in @common/blkdev.cc@.
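The external tools in question can be exercised by hand roughly like this (device path and flags are illustrative, not necessarily the exact arguments Ceph passes):
$ sudo smartctl -x --json /dev/nvme0n1
$ sudo nvme smart-log /dev/nvme0n1 --output-format=json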
- 06:26 PM Bug #54485 (Resolved): doc/rados/operations/placement-groups/#automated-scaling: --bulk invalid c...
- 06:23 PM Backport #54505 (Resolved): pacific: doc/rados/operations/placement-groups/#automated-scaling: --...
- 06:22 PM Backport #54506 (Resolved): quincy: doc/rados/operations/placement-groups/#automated-scaling: --b...
- 06:22 PM Bug #54576 (Resolved): cache tier set proxy faild
- Fix merged.
- 06:19 PM Bug #55665 (Fix Under Review): osd: osd_fast_fail_on_connection_refused will cause the mon to con...
- 06:11 PM Bug #51168: ceph-osd state machine crash during peering process
- Nautilus is EOL now and it is also possible that we may have fixed such a bug after 14.2.18.
Can you tell me the P... - 06:08 PM Bug #51168: ceph-osd state machine crash during peering process
- The PG was in @ReplicaActive@ so we shouldn't see any backfill activity. A delayed event maybe?
- 06:04 PM Bug #51168: ceph-osd state machine crash during peering process
- ...
- 06:02 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- /a/ksirivad-2022-07-01_21:00:49-rados:thrash-erasure-code-main-distro-default-smithi/6910169/ first timed out and the...
- 06:01 PM Bug #56192: crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
- Reoccurrence reported in https://tracker.ceph.com/issues/51904#note-21. See also the replies:
* https://tracker.cep...
- 05:57 PM Bug #49777 (In Progress): test_pool_min_size: 'check for active or peered' reached maximum tries ...
- 05:46 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- Hello Aishwarya! How about coworking on this? Ping me when you have time.
- 05:42 PM Bug #50242: test_repair_corrupted_obj fails with assert not inconsistent
- Hello Ronen. It looks to be somehow scrub-related. Mind taking a look? Nothing urgent.
- 05:38 PM Bug #56392 (Resolved): ceph build warning: comparison of integer expressions of different signedness
- 12:07 PM Feature #56543 (New): About the performance improvement of ceph's erasure code storage pool
- Hello everyone:
Although I know that the erasure code storage pool is not suitable for use in scenarios with many ran...
07/12/2022
- 10:29 PM Bug #56495 (Fix Under Review): Log at 1 when Throttle::get_or_fail() fails
- 01:57 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
- Greg Farnum wrote:
> That said, I wouldn’t expect anything useful from running this — pool snaps are hard to use wel...
- 01:06 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
- That said, I wouldn’t expect anything useful from running this — pool snaps are hard to use well. What were you tryin...
- 12:59 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
- AFAICT this is just a RADOS issue?
- 01:30 PM Backport #53339: pacific: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46767
merged
- 12:41 PM Bug #56530: Quincy: High CPU and slow progress during backfill
- Thanks for looking at this. Answers to your questions:
1. Backfill started at around 4-5 objects per second, and t...
- 11:56 AM Bug #56530: Quincy: High CPU and slow progress during backfill
- While we look into this, I have a couple of questions:
1. Did the recovery rate stay at 1 object/sec throughout? I...
- 11:16 AM Bug #56530 (Resolved): Quincy: High CPU and slow progress during backfill
- I'm seeing a similar problem on a small cluster just upgraded from Pacific 16.2.9 to Quincy 17.2.1 (non-cephadm). The...
07/11/2022
- 09:18 PM Bug #54396 (Resolved): Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the snapt...
- 09:17 PM Feature #55982 (Resolved): log the numbers of dups in PG Log
- 09:17 PM Backport #55985 (Resolved): octopus: log the numbers of dups in PG Log
- 01:35 PM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- https://github.com/ceph/ceph/pull/46845 merged
- 01:31 PM Backport #51287: pacific: LibRadosService.StatusFormat failed, Expected: (0) != (retry), actual: ...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46677
merged