Project

General

Profile

Activity

From 06/29/2022 to 07/28/2022

07/28/2022

09:54 PM Feature #56956 (Fix Under Review): osdc: Add objecter fastfail
Vikhyat Umrao
09:54 PM Feature #56956 (Fix Under Review): osdc: Add objecter fastfail
There is no point in waiting indefinitely when the PG of an object is inactive. It is appropriate to cancel the op in suc... Vikhyat Umrao
07:12 PM Tasks #56952 (Closed): Set mgr_pool to true for a handful of tests in the rados qa suite
In most places in the rados suite we use `sudo ceph config set mgr mgr_pool false --force` (see https://github.com/ce... Laura Flores
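The convention described above can be sketched as a pair of CLI commands; this is a hedged illustration of flipping the same option the other way for the handful of tests in question (which tests, and whether the change lands in yaml overrides or in the workunit itself, is not specified here):

```shell
# Sketch only: mirror of the rados-suite convention quoted above,
# but setting mgr_pool to true instead of false.
sudo ceph config set mgr mgr_pool true --force

# Verify the effective value afterwards:
sudo ceph config get mgr mgr_pool
```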
01:37 PM Bug #56707: pglog growing unbounded on EC with copy by ref
That is very strange. I've been able to reproduce 100% of the time with this:... Alexandre Marangone
12:41 PM Bug #56707 (Fix Under Review): pglog growing unbounded on EC with copy by ref
Nitzan Mordechai
12:41 PM Bug #56707: pglog growing unbounded on EC with copy by ref
Alex, thanks for the information. Unfortunately, I couldn't recreate the issue, but I did find an issue with refco... Nitzan Mordechai
02:23 AM Bug #56926 (New): crash: int BlueFS::_flush_range_F(BlueFS::FileWriter*, uint64_t, uint64_t): abort

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=97c9a15c7262222fd841813a...
Telemetry Bot
02:22 AM Bug #56903 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8749e9b5d1fac718fbbb96fb...
Telemetry Bot
02:22 AM Bug #56901 (New): crash: LogMonitor::log_external_backlog()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=64ca4b6b04c168da450a852a...
Telemetry Bot
02:22 AM Bug #56896 (New): crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=50bf2266e28cc1764b47775b...
Telemetry Bot
02:22 AM Bug #56895 (New): crash: void MissingLoc::add_active_missing(const pg_missing_t&): assert(0 == "u...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=f96348a2ae0d2c754de01fc7...
Telemetry Bot
02:22 AM Bug #56892 (New): crash: StackStringBuf<4096ul>::xsputn(char const*, long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3a3287f5eaa9fbb99295b2b7...
Telemetry Bot
02:22 AM Bug #56890 (New): crash: MOSDRepOp::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=9be8aeab4dd246c5baf1f1c7...
Telemetry Bot
02:22 AM Bug #56889 (New): crash: MOSDRepOp::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=fce79f2ea6c1a34825a23dd9...
Telemetry Bot
02:22 AM Bug #56888 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8df8f5fbb1ef85f0956e0f78...
Telemetry Bot
02:22 AM Bug #56887 (New): crash: void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::Col...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=625223857a28a74eae75273a...
Telemetry Bot
02:22 AM Bug #56883 (New): crash: rocksdb::BlockBasedTableBuilder::Add(rocksdb::Slice const&, rocksdb::Sli...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ae08527e7a8d310b5740fbf6...
Telemetry Bot
02:21 AM Bug #56878 (New): crash: MonitorDBStore::get_synchronizer(std::pair<std::basic_string<char, std::...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5cacc7785f8a352e3cd86982...
Telemetry Bot
02:21 AM Bug #56873 (New): crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=210d418989a6bc9fdb60989c...
Telemetry Bot
02:21 AM Bug #56872 (New): crash: __cxa_rethrow()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5ce84c33423abe42eac8cc98...
Telemetry Bot
02:21 AM Bug #56871 (New): crash: __cxa_rethrow()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3c6c9906c46f7979e39f2a3d...
Telemetry Bot
02:21 AM Bug #56867 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e151a6a9ae5a0a079dad1ca4...
Telemetry Bot
02:21 AM Bug #56863 (New): crash: void RDMAConnectedSocketImpl::handle_connection(): assert(!r)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d1c8198db9a116b38c161a79...
Telemetry Bot
02:21 AM Bug #56856 (New): crash: ceph::buffer::list::iterator_impl<true>::copy(unsigned int, std::basic_s...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=03d7803d6cda8b31445b5fa2...
Telemetry Bot
02:21 AM Bug #56855 (New): crash: rocksdb::CompactionJob::Run()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=b79a082186434ab8becebddb...
Telemetry Bot
02:21 AM Bug #56850 (Resolved): crash: void PaxosService::propose_pending(): assert(have_pending)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=106ff764dfe8a5f766a511a1...
Telemetry Bot
02:21 AM Bug #56849 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5ff0cd923e0b4beb646ae133...
Telemetry Bot
02:20 AM Bug #56848 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=0dcd9dfbff0c25591d64a41a...
Telemetry Bot
02:20 AM Bug #56847 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7a53cbc0bcdeffa2f26d71d0...
Telemetry Bot
02:20 AM Bug #56843 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=339539062c280c5c4e5e605c...
Telemetry Bot
02:20 AM Bug #56837 (New): crash: __assert_perror_fail()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8b423fcbfb14f36724d15462...
Telemetry Bot
02:20 AM Bug #56835 (New): crash: ceph::logging::detail::JournaldClient::JournaldClient(): assert(fd > 0)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e226e4ce8be4c94d64dd6104...
Telemetry Bot
02:20 AM Bug #56833 (New): crash: __assert_perror_fail()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e0d06d29c57064910751db9d...
Telemetry Bot
02:20 AM Bug #56826 (New): crash: MOSDPGLog::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ee3ed1408924d926185a65e3...
Telemetry Bot
02:20 AM Bug #56821 (New): crash: MOSDRepOp::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=6d21b2c78bcc5092dac5bcc9...
Telemetry Bot
02:19 AM Bug #56816 (New): crash: unsigned long const md_config_t::get_val<unsigned long>(ConfigValues con...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=66ff3f43b85f15283932865d...
Telemetry Bot
02:19 AM Bug #56814 (New): crash: rocksdb::MemTableIterator::key() const

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7329bea2aaafb66aa5060938...
Telemetry Bot
02:19 AM Bug #56813 (New): crash: MOSDPGLog::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e4eeb1a3b34df8062d7d1788...
Telemetry Bot
02:19 AM Bug #56809 (New): crash: MOSDPGScan::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=2fe9b06ce88dccd8c9fe8f41...
Telemetry Bot
02:18 AM Bug #56797 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3c3fa597eda743682305f64b...
Telemetry Bot
02:18 AM Bug #56796 (New): crash: void ECBackend::handle_recovery_push(const PushOp&, RecoveryMessages*, b...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=dbf2120428a133c3689fa508...
Telemetry Bot
02:18 AM Bug #56794 (New): crash: void LogMonitor::_create_sub_incremental(MLog*, int, version_t): assert(...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=1f3b5497ed0df042120d8ff7...
Telemetry Bot
02:17 AM Bug #56793 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=54876dfe5b7062de7d1d3ee5...
Telemetry Bot
02:17 AM Bug #56789 (New): crash: void RDMAConnectedSocketImpl::handle_connection(): assert(!r)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=a87f94f67786787071927f90...
Telemetry Bot
02:17 AM Bug #56787 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=f58b099fd24ce33032cf74bd...
Telemetry Bot
02:17 AM Bug #56785 (New): crash: void OSDShard::register_and_wake_split_child(PG*): assert(!slot->waiting...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d44ea277d2ae53e186d6b488...
Telemetry Bot
02:16 AM Bug #56781 (New): crash: virtual void OSDMonitor::update_from_paxos(bool*): assert(version > osdm...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=4aed07fd08164fe65fe7c6e0...
Telemetry Bot
02:16 AM Bug #56780 (New): crash: virtual void AuthMonitor::update_from_paxos(bool*): assert(version > key...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=11756492895a3349dfb227aa...
Telemetry Bot
02:16 AM Bug #56779 (New): crash: void MissingLoc::add_active_missing(const pg_missing_t&): assert(0 == "u...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=83d5be7b2d08c79f23a10dba...
Telemetry Bot
02:16 AM Bug #56778 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=803b4a91fd84c3d26353cb47...
Telemetry Bot
02:16 AM Bug #56776 (New): crash: std::string MonMap::get_name(unsigned int) const: assert(n < ranks.size())

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7464294c2c2ac69856297e37...
Telemetry Bot
02:16 AM Bug #56773 (New): crash: int64_t BlueFS::_read_random(BlueFS::FileReader*, uint64_t, uint64_t, ch...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ba26d388e9213afb18b683ee...
Telemetry Bot
02:16 AM Bug #56772 (New): crash: uint64_t SnapSet::get_clone_bytes(snapid_t) const: assert(clone_overlap....

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=62b8a9e7f0bb7fc1fc81b2dc...
Telemetry Bot
02:16 AM Bug #56770 (New): crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_slots....

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d9289f1067de7f0cc0e374ff...
Telemetry Bot
02:15 AM Bug #56764 (New): crash: uint64_t SnapSet::get_clone_bytes(snapid_t) const: assert(clone_size.cou...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3969752632dfdff2c710083a...
Telemetry Bot
02:14 AM Bug #56756 (New): crash: long const md_config_t::get_val<long>(ConfigValues const&, std::basic_st...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=a4792692d74b82c4590d9b51...
Telemetry Bot
02:14 AM Bug #56755 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

*New crash events were reported via Telemetry with newer versions (['16.2.6', '16.2.7', '16.2.9']) than encountered...
Telemetry Bot
02:14 AM Bug #56754 (New): crash: DeviceList::DeviceList(ceph::common::CephContext*): assert(num)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=17b0ccd87cab46177149698e...
Telemetry Bot
02:14 AM Bug #56752 (New): crash: void pg_missing_set<TrackChanges>::got(const hobject_t&, eversion_t) [wi...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=34f05776defb000d033885b3...
Telemetry Bot
02:14 AM Bug #56750 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=eb1729ae63d80bd79b6ea92b...
Telemetry Bot
02:14 AM Bug #56749 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=9bfe9728f3e90e92bcab42f9...
Telemetry Bot
02:13 AM Bug #56748 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

*New crash events were reported via Telemetry with newer versions (['16.2.0', '16.2.1', '16.2.2', '16.2.5', '16.2.6...
Telemetry Bot
02:13 AM Bug #56747 (New): crash: std::__cxx11::string MonMap::get_name(unsigned int) const: assert(n < ra...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=0846d215ecad4c78633623e5...
Telemetry Bot

07/27/2022

11:46 PM Backport #56736 (Resolved): quincy: unnecessarily long laggy PG state
https://github.com/ceph/ceph/pull/47901 Backport Bot
11:46 PM Backport #56735 (Resolved): octopus: unnecessarily long laggy PG state
Backport Bot
11:46 PM Backport #56734 (Rejected): pacific: unnecessarily long laggy PG state
https://github.com/ceph/ceph/pull/47899 Backport Bot
11:40 PM Bug #53806 (Pending Backport): unnecessarily long laggy PG state
Kefu Chai
06:23 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943721/remote/smithi042/l... Kamoltat (Junior) Sirivadhna
05:58 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
Moving to next week's bug scrub. Radoslaw Zarzynski
05:59 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
Tried that a few times for different PGs on different OSDs, but it doesn't help. Pascal Ehlert
05:47 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
Pascal Ehlert wrote:
> This indeed happened during an upgrade from Octopus to Pacific.
> I had forgotten to reduce ...
Neha Ojha
12:24 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
This indeed happened during an upgrade from Octopus to Pacific.
I had forgotten to reduce the number of ranks in Cep...
Pascal Ehlert
05:54 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
Nitzan, could it be a different issue? Radoslaw Zarzynski
04:27 PM Bug #56733 (New): Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
Hello,
Since our upgrade to Pacific, we have suffered from sporadic latencies on disks, not always the same ones.
The cluster...
Gilles Mocellin
02:09 PM Bug #55851 (Fix Under Review): Assert in Ceph messenger
Radoslaw Zarzynski
01:37 PM Bug #56707: pglog growing unbounded on EC with copy by ref
>1. "dumping the refcount" - how did you dump the refcount?
I extracted it with rados getxattr refcont and used the...
Alexandre Marangone
10:50 AM Bug #56707: pglog growing unbounded on EC with copy by ref
Alex
A few more questions, so I'll be able to recreate the scenario as you got it:
1. "dumping the refcount" - how did ...
Nitzan Mordechai
07:20 AM Backport #56723 (Resolved): quincy: osd thread deadlock
https://github.com/ceph/ceph/pull/47930 Backport Bot
07:20 AM Backport #56722 (Resolved): pacific: osd thread deadlock
https://github.com/ceph/ceph/pull/48254 Backport Bot
07:16 AM Bug #55355 (Pending Backport): osd thread deadlock
Radoslaw Zarzynski

07/26/2022

03:14 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
All the tests that this has failed on involve thrashing. Specifically, they all use thrashosds-health.yaml (https://g... Laura Flores
03:09 PM Bug #56707: pglog growing unbounded on EC with copy by ref
That was faster than I thought. Attached massif outfile (let me know if that's what you expect; not super familiar wit... Alexandre Marangone
02:41 PM Bug #56707: pglog growing unbounded on EC with copy by ref
I don't have one handy; everything is in Prometheus, and sharing a screenshot of all the mempools isn't very legible. Valgr... Alexandre Marangone
02:04 PM Bug #56707: pglog growing unbounded on EC with copy by ref
Alexandre, can you please send us the dump_mempools output, and if you can, also run valgrind massif? Nitzan Mordechai
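The two data points requested here can be gathered roughly as follows; this is a hedged sketch, and the OSD id, binary path, and massif output filename are placeholders, not values from this tracker:

```shell
# Mempool stats via the OSD admin socket:
ceph daemon osd.0 dump_mempools

# Heap profiling with valgrind massif requires starting the OSD
# process itself under valgrind, e.g.:
valgrind --tool=massif /usr/bin/ceph-osd -f --cluster ceph --id 0

# Afterwards, render the massif.out.<pid> file it produced:
ms_print massif.out.12345
```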
02:57 PM Backport #51287 (Resolved): pacific: LibRadosService.StatusFormat failed, Expected: (0) != (retry...
Laura Flores
02:32 PM Bug #55851: Assert in Ceph messenger
Perhaps we should move into the @deactivate_existing@ part of @reuse_connection()@, where we hold both locks at the same time. Radoslaw Zarzynski
02:28 PM Bug #55851 (In Progress): Assert in Ceph messenger
Radoslaw Zarzynski
02:27 PM Bug #55851: Assert in Ceph messenger
It looks like @reuse_connection()@ holds the ... Radoslaw Zarzynski
02:18 PM Bug #55851: Assert in Ceph messenger
The number of elements in @FrameAssembler::m_desc@ can be altered only by:
1. ...
Radoslaw Zarzynski
03:17 AM Fix #56709 (Resolved): test/osd/TestPGLog: Fix confusing description between log and olog.
https://github.com/ceph/ceph/pull/47272
test/osd/TestPGLog.cc has a confusing description between log and olog in ...
dheart joe

07/25/2022

11:21 PM Bug #56707: pglog growing unbounded on EC with copy by ref
Attached a pglog at the peak of one prod issue. I had to redact the object names since it's prod but let me know if y... Alexandre Marangone
11:07 PM Bug #56707: pglog growing unbounded on EC with copy by ref
Can you dump the pg log using the ceph-objectstore-tool when the OSD is consuming high memory and share it with us? Neha Ojha
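The pg log dump asked for above can be produced roughly like this; a hedged sketch in which the OSD id, data path, and pgid are placeholders (the tool only works while the OSD is stopped):

```shell
# Stop the OSD that is consuming high memory (id is a placeholder):
systemctl stop ceph-osd@0

# Dump the pg log for the suspect PG (pgid is a placeholder):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --pgid 2.7 --op log > pg_2.7_log.json
```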
10:51 PM Bug #56707 (Resolved): pglog growing unbounded on EC with copy by ref

*How to reproduce*
- create a 10GB object in bucket1 using multipart upload
- copy object 200x via s3:Objec...
Alexandre Marangone
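The reproduction steps above can be sketched with the AWS CLI against an RGW endpoint; this is an illustration only, and the endpoint URL, bucket, and object names are assumptions. `aws s3 cp` uses multipart upload automatically for large files, and CopyObject within the same bucket is a server-side copy by reference:

```shell
# Create a ~10GB object and upload it (multipart happens automatically):
dd if=/dev/urandom of=big.obj bs=1M count=10240
aws --endpoint-url http://rgw.example.com:8080 s3 cp big.obj s3://bucket1/big.obj

# Copy the object 200x via server-side CopyObject:
for i in $(seq 1 200); do
    aws --endpoint-url http://rgw.example.com:8080 s3api copy-object \
        --copy-source bucket1/big.obj --bucket bucket1 --key copy-$i
done
```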
10:46 PM Bug #56700: MGR pod on CLBO on rook deployment
I am hitting a bunch of these failures on a recent teuthology run I scheduled. The ceph version is 17.2.0:
http://...
Laura Flores
05:34 PM Bug #56700: MGR pod on CLBO on rook deployment
Quoting from a chat group:
@Travis Nielsen I think the issue you are seeing was first seen in https://tracker.ceph...
Neha Ojha
04:59 PM Bug #56700 (Duplicate): MGR pod on CLBO on rook deployment
Vikhyat Umrao
04:51 PM Bug #56700: MGR pod on CLBO on rook deployment
Parth Arora wrote:
> MGR pod is failing for the new ceph version v17.2.2, till v17.2.1 it was working fine.
>
> ...
Parth Arora
04:48 PM Bug #56700 (Duplicate): MGR pod on CLBO on rook deployment
MGR pod is failing for the new ceph version v17.2.2; till v17.2.1 it was working fine.
```
29: PyObject_Call()
3...
Parth Arora
10:19 PM Bug #53768 (New): timed out waiting for admin_socket to appear after osd.2 restart in thrasher/de...
Joseph Sawaya
08:55 PM Bug #53768: timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults...
job dead hit max timeout but trace back suggests:... Kamoltat (Junior) Sirivadhna
08:32 PM Bug #53768: timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults...
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6944338/ Kamoltat (Junior) Sirivadhna
07:30 PM Bug #53768: timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults...
Hey Joseph what's the status on this? Kamoltat (Junior) Sirivadhna
07:28 PM Bug #53768: timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults...
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943791/ Kamoltat (Junior) Sirivadhna
07:03 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943763/ Kamoltat (Junior) Sirivadhna
02:04 PM Bug #55435: mon/Elector: notify_ranked_removed() does not properly erase dead_ping in the case of...
https://github.com/ceph/ceph/pull/47087 merged Yuri Weinstein

07/24/2022

08:09 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
Myoungwon Oh, can you please take a look? Nitzan Mordechai
07:36 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
Sadly I don't have any logs anymore, as I had to destroy the ceph cluster; getting it back in working order was top prio... Chris Kul
05:28 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
@Chris Kul, I'm trying to understand the sequence of failing OSDs; can you please upload the logs of the OSDs that failed?
...
Nitzan Mordechai

07/21/2022

08:29 PM Bug #55836: add an asok command for pg log investigations
https://github.com/ceph/ceph/pull/46561 merged Yuri Weinstein
07:19 PM Bug #56530 (Fix Under Review): Quincy: High CPU and slow progress during backfill
Sridhar Seshasayee
06:58 PM Bug #56530: Quincy: High CPU and slow progress during backfill
The issue is currently addressed in Ceph's main branch. Please see the linked PR. This will be backported to Quincy ... Sridhar Seshasayee
02:59 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
Just a note: I was able to recreate it with vstart, without error injection but with valgrind;
as soon as we step in...
Nitzan Mordechai
02:00 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
Ah, thanks Sridhar. I will compare the two Trackers and mark this one as a duplicate if needed. Laura Flores
02:57 AM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
This looks similar to https://tracker.ceph.com/issues/52948. See comment https://tracker.ceph.com/issues/52948#note-5... Sridhar Seshasayee
02:57 PM Backport #56664 (In Progress): quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change shou...
https://github.com/ceph/ceph/pull/47210 Kamoltat (Junior) Sirivadhna
02:45 PM Backport #56664 (Resolved): quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should ...
Backport Bot
02:49 PM Backport #56663: pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be gap >= ...
https://github.com/ceph/ceph/pull/47211 Kamoltat (Junior) Sirivadhna
02:45 PM Backport #56663 (Resolved): pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should...
Backport Bot
02:40 PM Bug #56151 (Pending Backport): mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be ga...
Kamoltat (Junior) Sirivadhna
01:34 PM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
BTW the initial version was 17.2.0; we tried to update to 17.2.1 in the hope this bug was fixed, sadly without luck. Chris Kul
01:33 PM Bug #56661 (Need More Info): Quincy: OSD crashing one after another with data loss with ceph_asse...
Two weeks after an upgrade to quincy from an octopus setup, the SSD pool reported one OSD down in the middle of ... Chris Kul
09:05 AM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
Looks like a race condition. Does a @Context@ make a dependency on @RefCountedObj@ (e.g. @TrackedOp@) but forget... Radoslaw Zarzynski

07/20/2022

11:33 PM Bug #44089 (New): mon: --format=json does not work for config get or show
This would be a good issue for Open Source Day if someone would be willing to take over the closed PR: https://github... Laura Flores
09:40 PM Bug #56530: Quincy: High CPU and slow progress during backfill
ceph-users discussion - https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/Z7AILAXZDBIT6IIF2E6M3BLUE6B7L... Vikhyat Umrao
07:45 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
Found another occurrence here: /a/yuriw-2022-07-18_18:20:02-rados-wip-yuri8-testing-2022-07-18-0918-distro-default-sm... Laura Flores
06:11 PM Bug #56574 (Need More Info): rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down...
Watching for more reoccurrences. Radoslaw Zarzynski
10:25 AM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
osd.0 is still down.
The valgrind output for osd.0 shows:...
Nitzan Mordechai
06:25 PM Bug #51168: ceph-osd state machine crash during peering process
Yao Ning wrote:
> Radoslaw Zarzynski wrote:
> > The PG was in @ReplicaActive@ so we shouldn't see any backfill acti...
Neha Ojha
06:06 PM Backport #56656 (New): pacific: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDur...
Backport Bot
06:06 PM Backport #56655 (Resolved): quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlus...
https://github.com/ceph/ceph/pull/47929 Backport Bot
06:03 PM Bug #53294 (Pending Backport): rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuri...
Neha Ojha
03:20 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
/a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939431... Laura Flores
06:02 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
Notes from the scrub:
1. It looks like this happens mostly (only?) on pacific.
2. In at least two replications, Valg...
Radoslaw Zarzynski
05:56 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
... Radoslaw Zarzynski
03:58 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
/a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939660 Laura Flores
04:42 PM Backport #56408: quincy: ceph version 16.2.7 PG scrubs not progressing
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46844
merged
Yuri Weinstein
04:40 PM Backport #56060: quincy: Assertion failure (ceph_assert(have_pending)) when creating new OSDs dur...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46689
merged
Yuri Weinstein
04:40 PM Bug #49525: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10121515-14:e4 snaps missi...
https://github.com/ceph/ceph/pull/46498 merged Yuri Weinstein
04:08 PM Bug #55809: "Leak_IndirectlyLost" valgrind report on mon.c
/a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939513 Laura Flores
04:07 PM Bug #53767 (Duplicate): qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing c...
Same failure on test_cls_2pc_queue.sh, but this one came with remote logs. I suspect this is a duplicate of #55809.
...
Laura Flores
03:43 PM Bug #43584: MON_DOWN during mon_join process
/a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939512 Laura Flores
02:50 PM Bug #56650: ceph df reports invalid MAX AVAIL value for stretch mode crush rule
Before applying PR#47189, MAX AVAIL for stretch_rule pools is incorrect :... Prashant D
02:07 PM Bug #56650 (Fix Under Review): ceph df reports invalid MAX AVAIL value for stretch mode crush rule
Prashant D
01:26 PM Bug #56650 (Fix Under Review): ceph df reports invalid MAX AVAIL value for stretch mode crush rule
If we define crush rule for stretch mode cluster with multiple take then MAX AVAIL for pools associated with crush ru... Prashant D
01:15 PM Backport #56649 (Resolved): pacific: [Progress] Do not show NEW PG_NUM value for pool if autoscal...
https://github.com/ceph/ceph/pull/53464 Backport Bot
01:15 PM Backport #56648 (Resolved): quincy: [Progress] Do not show NEW PG_NUM value for pool if autoscale...
https://github.com/ceph/ceph/pull/47925 Backport Bot
01:14 PM Bug #56136 (Pending Backport): [Progress] Do not show NEW PG_NUM value for pool if autoscaler is ...
Prashant D

07/19/2022

09:20 PM Backport #56642 (Resolved): pacific: Log at 1 when Throttle::get_or_fail() fails
Backport Bot
09:20 PM Backport #56641 (Resolved): quincy: Log at 1 when Throttle::get_or_fail() fails
Backport Bot
09:18 PM Bug #56495 (Pending Backport): Log at 1 when Throttle::get_or_fail() fails
Brad Hubbard
02:07 PM Bug #56495: Log at 1 when Throttle::get_or_fail() fails
https://github.com/ceph/ceph/pull/47019 merged Yuri Weinstein
04:24 PM Bug #50222 (In Progress): osd: 5.2s0 deep-scrub : stat mismatch
Thanks Rishabh, I am having a look into this. Laura Flores
04:11 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
This error showed up in QA runs -
http://pulpito.front.sepia.ceph.com/rishabh-2022-07-08_23:53:34-fs-wip-rishabh-tes...
Rishabh Dave
10:25 AM Bug #55001 (Fix Under Review): rados/test.sh: Early exit right after LibRados global tests complete
Nitzan Mordechai
08:28 AM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
The core dump shows:... Nitzan Mordechai
08:28 AM Bug #49689 (Fix Under Review): osd/PeeringState.cc: ceph_abort_msg("past_interval start interval ...
PR is marked as draft for now. Matan Breizman
08:26 AM Backport #56580 (Resolved): octopus: snapshots will not be deleted after upgrade from nautilus to...
Matan Breizman
12:48 AM Bug #50853 (Can't reproduce): libcephsqlite: Core dump while running test_libcephsqlite.sh.
Patrick Donnelly

07/18/2022

08:43 PM Backport #56580: octopus: snapshots will not be deleted after upgrade from nautilus to pacific
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47108
merged
Yuri Weinstein
01:52 PM Bug #49777: test_pool_min_size: 'check for active or peered' reached maximum tries (5) after wait...
I was able to reproduce the problem after modifying qa/tasks/ceph_manager.py: https://github.com/ceph/ceph/pull/46931... Kamoltat (Junior) Sirivadhna
12:44 PM Bug #49777 (Fix Under Review): test_pool_min_size: 'check for active or peered' reached maximum t...
Kamoltat (Junior) Sirivadhna
01:50 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2022-07-13_19:41:18-rados-wip-yuri7-testing-2022-07-11-1631-distro-default-smithi/6929396/remote/smithi204/... Aishwarya Mathuria
01:47 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
We have a coredump, and the console_log showing:
smithi042.log:[ 852.382596] ceph_test_rados[110223]: segfault at 0 ip...
Nitzan Mordechai
01:42 PM Backport #56604 (Resolved): pacific: ceph report missing osdmap_clean_epochs if answered by peon
https://github.com/ceph/ceph/pull/51258 Backport Bot
01:42 PM Backport #56603 (Rejected): octopus: ceph report missing osdmap_clean_epochs if answered by peon
Backport Bot
01:42 PM Backport #56602 (Resolved): quincy: ceph report missing osdmap_clean_epochs if answered by peon
https://github.com/ceph/ceph/pull/47928 Backport Bot
01:37 PM Bug #47273 (Pending Backport): ceph report missing osdmap_clean_epochs if answered by peon
Dan van der Ster
01:34 PM Bug #54511: test_pool_min_size: AssertionError: not clean before minsize thrashing starts
I was able to reproduce the problem after modifying qa/tasks/ceph_manager.py: https://github.com/ceph/ceph/pull/46931... Kamoltat (Junior) Sirivadhna
12:44 PM Bug #54511 (Fix Under Review): test_pool_min_size: AssertionError: not clean before minsize thras...
Kamoltat (Junior) Sirivadhna
01:16 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
I was able to reproduce the problem after modifying qa/tasks/ceph_manager.py: https://github.com/ceph/ceph/pull/46931... Kamoltat (Junior) Sirivadhna
12:44 PM Bug #51904 (Fix Under Review): test_pool_min_size:AssertionError:wait_for_clean:failed before tim...
Kamoltat (Junior) Sirivadhna
10:18 AM Bug #56575 (Fix Under Review): test_cls_lock.sh: ClsLock.TestExclusiveEphemeralStealEphemeral fai...
Nitzan Mordechai

07/17/2022

01:16 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
/a/yuriw-2022-07-15_19:06:53-rados-wip-yuri-testing-2022-07-15-0950-octopus-distro-default-smithi/6932690 Matan Breizman
01:04 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
/a/yuriw-2022-07-15_19:06:53-rados-wip-yuri-testing-2022-07-15-0950-octopus-distro-default-smithi/6932687 Matan Breizman
09:03 AM Backport #56579 (In Progress): pacific: snapshots will not be deleted after upgrade from nautilus...
Matan Breizman
09:02 AM Backport #56578 (In Progress): quincy: snapshots will not be deleted after upgrade from nautilus ...
Matan Breizman
06:51 AM Bug #56575: test_cls_lock.sh: ClsLock.TestExclusiveEphemeralStealEphemeral fails from "method loc...
The lock expired, so the next ioctx.stat won't return -2 (-ENOENT); we need to change that as well based on r1 that re... Nitzan Mordechai

07/16/2022

03:18 PM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
This issue is fixed (including a unit test) and will be backported in order to prevent future cluster upgrades from ... Matan Breizman

07/15/2022

09:17 PM Cleanup #56581 (Fix Under Review): mon: fix ElectionLogic warnings
Laura Flores
09:06 PM Cleanup #56581 (Resolved): mon: fix ElectionLogic warnings
h3. Problem: compilation warnings in the ElectionLogic code... Laura Flores
08:58 PM Backport #56580 (In Progress): octopus: snapshots will not be deleted after upgrade from nautilus...
Neha Ojha
08:55 PM Backport #56580 (Resolved): octopus: snapshots will not be deleted after upgrade from nautilus to...
https://github.com/ceph/ceph/pull/47108 Backport Bot
08:55 PM Backport #56579 (Resolved): pacific: snapshots will not be deleted after upgrade from nautilus to...
https://github.com/ceph/ceph/pull/47134 Backport Bot
08:55 PM Backport #56578 (Resolved): quincy: snapshots will not be deleted after upgrade from nautilus to ...
https://github.com/ceph/ceph/pull/47133 Backport Bot
08:51 PM Bug #56147 (Pending Backport): snapshots will not be deleted after upgrade from nautilus to pacific
Neha Ojha
07:31 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
/a/nojha-2022-07-15_14:45:04-rados-snapshot_key_conversion-distro-default-smithi/6932156 Laura Flores
07:23 PM Bug #56574 (Need More Info): rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down...
Description: rados/valgrind-leaks/{1-start 2-inject-leak/osd centos_latest}
/a/nojha-2022-07-14_20:32:09-rados-sn...
Laura Flores
07:29 PM Bug #56575 (Pending Backport): test_cls_lock.sh: ClsLock.TestExclusiveEphemeralStealEphemeral fai...
/a/nojha-2022-07-14_20:32:09-rados-snapshot_key_conversion-distro-default-smithi/6930848... Laura Flores
12:09 PM Bug #56565 (Won't Fix): Not upgraded nautilus mons crash if upgraded pacific mon updates fsmap
I was just told there is a step in the upgrade documentation to set mon_mds_skip_sanity param before upgrade [1], whi... Mykola Golub
10:07 AM Bug #51168: ceph-osd state machine crash during peering process
Radoslaw Zarzynski wrote:
> The PG was in @ReplicaActive@ so we shouldn't see any backfill activity. A delayed event...
Yao Ning

07/14/2022

12:19 PM Bug #56565 (Won't Fix): Not upgraded nautilus mons crash if upgraded pacific mon updates fsmap
I have no idea if this needs to be fixed but at least the case looks worth reporting.
We faced the issue when upgr...
Mykola Golub

07/13/2022

07:48 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Noticed that this PR was newly included in 17.2.1, and it makes a change to GetApproximateSizes: https://github.com/c... Laura Flores
07:10 PM Backport #56551: quincy: mon/Elector: notify_ranked_removed() does not properly erase dead_ping i...
https://github.com/ceph/ceph/pull/47086 Kamoltat (Junior) Sirivadhna
06:55 PM Backport #56551 (Resolved): quincy: mon/Elector: notify_ranked_removed() does not properly erase ...
Backport Bot
07:09 PM Backport #56550 (In Progress): pacific: mon/Elector: notify_ranked_removed() does not properly er...
https://github.com/ceph/ceph/pull/47087 Kamoltat (Junior) Sirivadhna
06:55 PM Backport #56550 (Resolved): pacific: mon/Elector: notify_ranked_removed() does not properly erase...
Backport Bot
06:51 PM Bug #56034: qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()
This looks like a test failure, so not terribly high priority. Radoslaw Zarzynski
06:49 PM Bug #53342: Exiting scrub checking -- not all pgs scrubbed
Sridhar Seshasayee wrote:
> /a/yuriw-2022-06-29_18:22:37-rados-wip-yuri2-testing-2022-06-29-0820-distro-default-smit...
Neha Ojha
06:42 PM Bug #56438 (Need More Info): found snap mapper error on pg 3.bs0> oid 3:d81a0fb3:::smithi10749189...
Waiting for recurrences. Radoslaw Zarzynski
06:38 PM Bug #56439: mon/crush_ops.sh: Error ENOENT: no backward-compatible weight-set
Let's observe whether there are any recurrences. Radoslaw Zarzynski
06:33 PM Bug #55450 (Resolved): [DOC] stretch_rule defined in the doc needs updation
Kamoltat (Junior) Sirivadhna
06:33 PM Bug #55450: [DOC] stretch_rule defined in the doc needs updation
Open source contributor, github username: elacunza. Created https://github.com/ceph/ceph/pull/46170 and has resolved ... Kamoltat (Junior) Sirivadhna
06:33 PM Bug #56147 (Fix Under Review): snapshots will not be deleted after upgrade from nautilus to pacific
Radoslaw Zarzynski
06:31 PM Bug #56463 (Triaged): osd nodes with NVME try to run `smartctl` and `nvme` even when the tools ar...
They are called from @block_device_get_metrics()@ in @common/blkdev.cc@. Radoslaw Zarzynski
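The fix being hinted at is to probe for the external tools before shelling out to them. A minimal sketch of that guard pattern (hypothetical Python, not Ceph's actual C++ code in @common/blkdev.cc@; the function name and return shape are illustrative assumptions):

```python
import shutil
import subprocess

def collect_device_health(dev, tool="smartctl"):
    # Hypothetical guard: only invoke the external tool when it is actually
    # installed, instead of failing noisily on every scheduled run.
    if shutil.which(tool) is None:
        return None  # tool absent: skip the probe gracefully
    result = subprocess.run([tool, "-a", dev], capture_output=True, text=True)
    return {"exit_code": result.returncode, "output": result.stdout}

# A nonexistent tool is skipped silently rather than producing an error:
print(collect_device_health("/dev/sda", tool="no-such-tool-xyz"))  # -> None
```

The same `shutil.which`-style existence check would apply to the `nvme` binary before querying NVMe-specific metrics.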
06:26 PM Bug #54485 (Resolved): doc/rados/operations/placement-groups/#automated-scaling: --bulk invalid c...
Kamoltat (Junior) Sirivadhna
06:23 PM Backport #54505 (Resolved): pacific: doc/rados/operations/placement-groups/#automated-scaling: --...
Kamoltat (Junior) Sirivadhna
06:22 PM Backport #54506 (Resolved): quincy: doc/rados/operations/placement-groups/#automated-scaling: --b...
Kamoltat (Junior) Sirivadhna
06:22 PM Bug #54576 (Resolved): cache tier set proxy faild
Fix merged. Radoslaw Zarzynski
06:19 PM Bug #55665 (Fix Under Review): osd: osd_fast_fail_on_connection_refused will cause the mon to con...
Radoslaw Zarzynski
06:11 PM Bug #51168: ceph-osd state machine crash during peering process
Nautilus is EOL now and it is also possible that we may have fixed such a bug after 14.2.18.
Can you tell me the P...
Neha Ojha
06:08 PM Bug #51168: ceph-osd state machine crash during peering process
The PG was in @ReplicaActive@ so we shouldn't see any backfill activity. A delayed event maybe? Radoslaw Zarzynski
06:04 PM Bug #51168: ceph-osd state machine crash during peering process
... Radoslaw Zarzynski
06:02 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
/a/ksirivad-2022-07-01_21:00:49-rados:thrash-erasure-code-main-distro-default-smithi/6910169/ first timed out and the... Neha Ojha
06:01 PM Bug #56192: crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
Recurrence reported in https://tracker.ceph.com/issues/51904#note-21. See also the replies:
* https://tracker.cep...
Radoslaw Zarzynski
05:57 PM Bug #49777 (In Progress): test_pool_min_size: 'check for active or peered' reached maximum tries ...
Kamoltat (Junior) Sirivadhna
05:46 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Hello Aishwarya! How about coworking on this? Ping me when you have time. Radoslaw Zarzynski
05:42 PM Bug #50242: test_repair_corrupted_obj fails with assert not inconsistent
Hello Ronen. It looks to be somehow scrub-related. Mind taking a look? Nothing urgent. Radoslaw Zarzynski
05:38 PM Bug #56392 (Resolved): ceph build warning: comparison of integer expressions of different signedness
Kamoltat (Junior) Sirivadhna
12:07 PM Feature #56543 (New): About the performance improvement of ceph's erasure code storage pool
Hello everyone:
Although I know that the erasure code storage pool is not suitable for use in scenarios with many ran...
Sheng Xie

07/12/2022

10:29 PM Bug #56495 (Fix Under Review): Log at 1 when Throttle::get_or_fail() fails
Neha Ojha
01:57 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
Greg Farnum wrote:
> That said, I wouldn’t expect anything useful from running this — pool snaps are hard to use wel...
Dan van der Ster
01:06 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
That said, I wouldn’t expect anything useful from running this — pool snaps are hard to use well. What were you tryin... Greg Farnum
12:59 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
AFAICT this is just a RADOS issue? Greg Farnum
01:30 PM Backport #53339: pacific: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46767
merged
Yuri Weinstein
12:41 PM Bug #56530: Quincy: High CPU and slow progress during backfill
Thanks for looking at this. Answers to your questions:
1. Backfill started at around 4-5 objects per second, and t...
Chris Palmer
11:56 AM Bug #56530: Quincy: High CPU and slow progress during backfill
While we look into this, I have a couple of questions:
1. Did the recovery rate stay at 1 object/sec throughout? I...
Sridhar Seshasayee
11:16 AM Bug #56530 (Resolved): Quincy: High CPU and slow progress during backfill
I'm seeing a similar problem on a small cluster just upgraded from Pacific 16.2.9 to Quincy 17.2.1 (non-cephadm). The... Chris Palmer

07/11/2022

09:18 PM Bug #54396 (Resolved): Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the snapt...
Neha Ojha
09:17 PM Feature #55982 (Resolved): log the numbers of dups in PG Log
Neha Ojha
09:17 PM Backport #55985 (Resolved): octopus: log the numbers of dups in PG Log
Neha Ojha
01:35 PM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
https://github.com/ceph/ceph/pull/46845 merged Yuri Weinstein
01:31 PM Backport #51287: pacific: LibRadosService.StatusFormat failed, Expected: (0) != (retry), actual: ...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46677
merged
Yuri Weinstein

07/08/2022

05:27 AM Backport #56498 (In Progress): quincy: Make the mClock config options related to [res, wgt, lim] ...
Sridhar Seshasayee
04:50 AM Backport #56498 (Resolved): quincy: Make the mClock config options related to [res, wgt, lim] mod...
https://github.com/ceph/ceph/pull/47020 Backport Bot
04:46 AM Bug #55153 (Pending Backport): Make the mClock config options related to [res, wgt, lim] modifiab...
Sridhar Seshasayee
01:48 AM Bug #56495 (Resolved): Log at 1 when Throttle::get_or_fail() fails
When trying to debug a throttle failure we currently need to set debug_ms=20 which can delay troubleshooting due to t... Brad Hubbard
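The request above is to log throttle failures at a level that is visible by default, rather than only at high messenger debug verbosity. A toy sketch of that pattern (illustrative Python, not Ceph's actual C++ Throttle class; names and the warning-level choice are assumptions):

```python
import logging

logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")
log = logging.getLogger("throttle")

class Throttle:
    """Minimal counting throttle with a get-or-fail acquire."""
    def __init__(self, name, limit):
        self.name, self.limit, self.current = name, limit, 0

    def get_or_fail(self, count):
        if self.current + count > self.limit:
            # The point of the tracker entry: surface the failure at a
            # low, always-on log level so troubleshooting does not require
            # raising debug verbosity (e.g. debug_ms=20) first.
            log.warning("throttle %s: get_or_fail failed (%d + %d > %d)",
                        self.name, self.current, count, self.limit)
            return False
        self.current += count
        return True

    def put(self, count):
        self.current -= count

t = Throttle("msgr-dispatch", limit=10)
print(t.get_or_fail(8))   # -> True
print(t.get_or_fail(5))   # -> False, and a warning is logged immediately
```

The design point is that the failure path is cheap and rare, so logging it unconditionally costs little while saving a redeploy-with-debug cycle.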
01:00 AM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
I'll take a look Myoungwon Oh

07/07/2022

08:51 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
Potential Pacific occurrence? Although this one is catching on LibRadosTwoPoolsPP.CachePin rather than LibRadosTwoPoo... Laura Flores
03:44 PM Bug #55153: Make the mClock config options related to [res, wgt, lim] modifiable during runtime f...
https://github.com/ceph/ceph/pull/46700 merged Yuri Weinstein
01:21 AM Bug #56487: Error EPERM: problem getting command descriptions from mon, when execute "ceph -s".
Similar issue to:
https://tracker.ceph.com/issues/36300
liqun zhang
01:20 AM Bug #56487: Error EPERM: problem getting command descriptions from mon, when execute "ceph -s".
In this case, cephx is disabled. Test script as below:
#!/usr/bin/bash
while true
do
echo `date` >> /tmp/o.log
r...
liqun zhang
01:18 AM Bug #56487 (New): Error EPERM: problem getting command descriptions from mon, when execute "ceph ...
Version 15.2.13.
Disable cephx and execute "ceph -s" every 1 second;
there is a great chance to reproduce this error. Log as ...
liqun zhang

07/06/2022

11:19 PM Bug #36300: Clients receive "wrong fsid" error when CephX is disabled
Can you make a new ticket with your details and link to this one? We may have recreated a similar issue but the detai... Greg Farnum
03:14 PM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
@Myoungwon Oh - can you take a look at
http://pulpito.front.sepia.ceph.com/rfriedma-2022-07-05_18:14:55-rados-wip-...
Ronen Friedman
11:12 AM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
Ronen Friedman wrote:
> Kamoltat Sirivadhna wrote:
> > /a/ksirivad-2022-07-01_21:00:49-rados:thrash-erasure-code-ma...
Ronen Friedman
11:10 AM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
Kamoltat Sirivadhna wrote:
> /a/ksirivad-2022-07-01_21:00:49-rados:thrash-erasure-code-main-distro-default-smithi/69...
Ronen Friedman
10:47 AM Bug #51168: ceph-osd state machine crash during peering process
ceph-osd log on crashed osd uploaded Yao Ning

07/05/2022

02:32 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
/a/ksirivad-2022-07-01_21:00:49-rados:thrash-erasure-code-main-distro-default-smithi/6910169/ Kamoltat (Junior) Sirivadhna
02:06 PM Bug #54511: test_pool_min_size: AssertionError: not clean before minsize thrashing starts
/a/ksirivad-2022-07-01_21:00:49-rados:thrash-erasure-code-main-distro-default-smithi/6910103/ Kamoltat (Junior) Sirivadhna
09:17 AM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
Dan van der Ster wrote:
> Venky Shankar wrote:
> > Hi Dan,
> >
> > I need to check, but does the inconsistent ob...
Venky Shankar
07:24 AM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
Looks like we don't have the correct primary (was osd.1, changed to osd.4, and after the wait_for_clean was back to o... Nitzan Mordechai
01:51 AM Bug #36300: Clients receive "wrong fsid" error when CephX is disabled
#!/usr/bin/bash
while true
do
echo `date` >> /tmp/o.log
ret=`ceph -s >> /tmp/o.log 2>&1 `
sleep 1
echo '' >> /t...
liqun zhang
01:28 AM Bug #36300: Clients receive "wrong fsid" error when CephX is disabled
Version 15.2.13.
Disable cephx and execute "ceph -s" every 1 second;
there is a great chance to reproduce this error. Log as ...
liqun zhang
01:24 AM Bug #36300: Clients receive "wrong fsid" error when CephX is disabled
Mon Jul 4 15:31:19 CST 2022
2022-07-04T15:31:20.219+0800 7f8595551700 10 monclient: get_monmap_and_config
2022-07-0...
liqun zhang

07/04/2022

08:54 PM Backport #55981 (Resolved): quincy: don't trim excessive PGLog::IndexedLog::dups entries on-line
Ilya Dryomov
08:18 PM Bug #56463 (Triaged): osd nodes with NVME try to run `smartctl` and `nvme` even when the tools ar...
Using debian packages:
ceph-osd 17.2.1-1~bpo11+1
ceph-volume 17.2.1-1~bpo11+1
Every day some job runs wh...
Matthew Darwin
07:53 PM Backport #55746 (Resolved): quincy: Support blocklisting a CIDR range
Ilya Dryomov
05:48 PM Feature #55693 (Fix Under Review): Limit the Health Detail MSG log size in cluster logs
Prashant D

07/03/2022

12:49 PM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
Radoslaw Zarzynski wrote:
> Hello Matan! Does this snapshot issue ring a bell?
Introduced here:
https://github.c...
Matan Breizman

07/01/2022

05:36 PM Backport #54386: octopus: [RFE] Limit slow request details to mgr log
Ponnuvel P wrote:
> please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/45154
...
Yuri Weinstein
04:17 PM Bug #56439 (New): mon/crush_ops.sh: Error ENOENT: no backward-compatible weight-set
/a/yuriw-2022-06-23_16:06:40-rados-wip-yuri7-testing-2022-06-23-0725-octopus-distro-default-smithi/6894952... Laura Flores
01:51 PM Bug #56392: ceph build warning: comparison of integer expressions of different signedness
Note: this warning was caused by merging https://github.com/ceph/ceph/pull/46029/ Kamoltat (Junior) Sirivadhna
01:40 PM Bug #55435 (Pending Backport): mon/Elector: notify_ranked_removed() does not properly erase dead_...
Kamoltat (Junior) Sirivadhna
01:17 PM Bug #55435 (Resolved): mon/Elector: notify_ranked_removed() does not properly erase dead_ping in ...
Kamoltat (Junior) Sirivadhna
01:16 PM Bug #55708 (Fix Under Review): Reducing 2 Monitors Causes Stray Daemon
Kamoltat (Junior) Sirivadhna
12:55 PM Bug #56438 (Need More Info): found snap mapper error on pg 3.bs0> oid 3:d81a0fb3:::smithi10749189...
/a/yuriw-2022-06-29_18:22:37-rados-wip-yuri2-testing-2022-06-29-0820-distro-default-smithi/6906226
The error looks...
Sridhar Seshasayee
12:29 PM Bug #53342: Exiting scrub checking -- not all pgs scrubbed
/a/yuriw-2022-06-29_18:22:37-rados-wip-yuri2-testing-2022-06-29-0820-distro-default-smithi/6906076
/a/yuriw-2022-06-...
Sridhar Seshasayee
09:21 AM Cleanup #52753 (Rejected): rbd cls : centos 8 warning
Ilya Dryomov
09:20 AM Cleanup #52753: rbd cls : centos 8 warning
Looks like this warning is no longer there with a newer g++:
https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=...
Ilya Dryomov
12:12 AM Backport #55983 (Resolved): quincy: log the numbers of dups in PG Log
Neha Ojha

06/30/2022

07:53 PM Bug #56034: qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()
/a/yuriw-2022-06-29_13:30:16-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/6905537 Kamoltat (Junior) Sirivadhna
07:41 PM Bug #50242 (New): test_repair_corrupted_obj fails with assert not inconsistent
Kamoltat (Junior) Sirivadhna
07:41 PM Bug #50242: test_repair_corrupted_obj fails with assert not inconsistent
/a/yuriw-2022-06-29_13:30:16-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/6905523/ Kamoltat (Junior) Sirivadhna
07:30 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
/a/yuriw-2022-06-29_13:30:16-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/6905499 Kamoltat (Junior) Sirivadhna
12:44 PM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
Here is a PR which should fix the conversion on update:
https://github.com/ceph/ceph/pull/46908
But what is w...
Manuel Lausch

06/29/2022

06:29 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
Not a terribly high priority. Radoslaw Zarzynski
06:24 PM Bug #48029: Exiting scrub checking -- not all pgs scrubbed.
The code that generated the exception is (from the @main@ branch):... Radoslaw Zarzynski
06:13 PM Bug #56392 (Fix Under Review): ceph build warning: comparison of integer expressions of different...
Neha Ojha
06:12 PM Bug #56393: failed to complete snap trimming before timeout
Could it be scrub related? Radoslaw Zarzynski
06:08 PM Bug #56147 (New): snapshots will not be deleted after upgrade from nautilus to pacific
Hello Matan! Does this snapshot issue ring a bell? Radoslaw Zarzynski
06:03 PM Bug #46889: librados: crashed in service_daemon_update_status
Lowering the priority to match the BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2101415#c9. Radoslaw Zarzynski
05:55 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Yeah, this clearly looks like a race condition (likely around lifetime management).
Lowering to High as it happen...
Radoslaw Zarzynski
05:50 PM Bug #56101 (Need More Info): Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function saf...
Well, it seems the logs on @dell-per320-4.gsslab.pnq.redhat.com:/home/core/tracker56101@ are on the default levels. S... Radoslaw Zarzynski
03:37 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
A Telemetry contact was able to provide their OSD log. There was not a coredump available anymore, but they were able... Laura Flores
03:48 PM Bug #56420 (New): ceph-object-store: there is no chunking in --op log
The current implementation assumes that a huge amount of memory is always available.... Radoslaw Zarzynski
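The memory concern above is the classic load-everything-then-emit pattern. A chunked-iteration sketch illustrating the proposed alternative (illustrative Python only; the real ceph-objectstore-tool is C++ and its internals are not shown here):

```python
def iter_log_chunks(entries, chunk_size=1000):
    """Yield the log in bounded batches instead of materializing the
    entire thing in memory before emitting any output."""
    batch = []
    for entry in entries:
        batch.append(entry)
        if len(batch) >= chunk_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

# A 2500-entry "log" is processed as three bounded batches:
chunks = list(iter_log_chunks(range(2500), chunk_size=1000))
print([len(c) for c in chunks])  # -> [1000, 1000, 500]
```

Peak memory is then bounded by `chunk_size` rather than by the total log length, which matters for PGs with very large (or dup-bloated) logs.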
 
