Activity
From 08/17/2021 to 09/15/2021
09/15/2021
- 09:23 PM Bug #52605 (Fix Under Review): osd: add scrub duration to pg dump
- 06:39 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
- I reran the test 10 times in https://pulpito.ceph.com/nojha-2021-09-14_18:44:41-rados:singleton-pacific-distro-basic-...
- 06:05 PM Bug #52621 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: bad ...
- ...
- 03:20 PM Backport #52620 (Resolved): pacific: partial recovery become whole object recovery after restart osd
- https://github.com/ceph/ceph/pull/43513
- 03:18 PM Bug #52583 (Pending Backport): partial recovery become whole object recovery after restart osd
- 02:50 PM Backport #52337: octopus: ceph df detail reports dirty objects without a cache tier
- Deepika Upadhyay wrote:
> https://github.com/ceph/ceph/pull/42862
merged
- 02:43 PM Bug #50393: CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client...
- https://github.com/ceph/ceph/pull/42498 merged
- 02:18 PM Bug #52618 (Won't Fix - EOL): Ceph Luminous 12.2.13 OSD assert message
- 2021-09-02 14:25:37.173453 7f2235baf700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/...
09/14/2021
- 05:53 PM Feature #52609 (Fix Under Review): New PG states for pending scrubs / repairs
- Request to add new PG states to provide feedback to the admin when a PG scrub/repair is scheduled (via command line,...
- 01:14 PM Bug #52605 (Resolved): osd: add scrub duration to pg dump
- We would like to add a new column to the pg dump which would give us the time it took for a pg to get scrubbed.
09/13/2021
- 10:48 PM Bug #52562 (Triaged): Thrashosds read error injection failed with error ENXIO
- Looking at the osd and mon logs
Here's when the osd was restarted in revive_osd()...
- 09:05 PM Backport #52596 (Rejected): octopus: make bufferlist::c_str() skip rebuild when it isn't necessary
- 09:05 PM Backport #52595 (Rejected): pacific: make bufferlist::c_str() skip rebuild when it isn't necessary
- 09:03 PM Bug #51725 (Pending Backport): make bufferlist::c_str() skip rebuild when it isn't necessary
- This spares a heap allocation on every received message. Marking for backporting to both octopus and pacific as a lo...
- 02:21 PM Bug #45871: Incorrect (0) number of slow requests in health check
- On a...
- 10:35 AM Backport #52586 (Resolved): pacific: src/vstart: The command "set config key osd_mclock_max_capac...
- https://github.com/ceph/ceph/pull/41731
- 10:24 AM Bug #52583: partial recovery become whole object recovery after restart osd
- FIX URL:
https://github.com/ceph/ceph/pull/43146
https://github.com/ceph/ceph/pull/42904
- 09:43 AM Bug #52583 (Resolved): partial recovery become whole object recovery after restart osd
- Problem: After the osd that is undergoing partial recovery is restarted, the data recovery is rolled back from the pa...
- 05:32 AM Bug #52578 (Fix Under Review): CLI - osd pool rm --help message is wrong or misleading
- CLI - osd pool rm --help message is wrong or misleading
Version-Release number of selected component (if applicabl...
- 04:19 AM Bug #52445: OSD asserts on starting too many pushes
- Neha Ojha wrote:
> Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of cep...
09/09/2021
- 11:09 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
- I got the same assert for 14.2.22 when the scenario is replayed
1. make rbd pool write-back cache layer
2. restar...
- 10:00 PM Bug #52140 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
- 09:57 PM Bug #52141 (Need More Info): crash: void OSD::load_pgs(): abort
- 09:53 PM Bug #52142 (Duplicate): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
- 09:52 PM Bug #52145 (Duplicate): crash: OSDMapRef OSDService::get_map(epoch_t): assert(ret)
- 09:49 PM Bug #52147 (Duplicate): crash: rocksdb::InstrumentedMutex::Lock()
- 09:48 PM Bug #52148 (Duplicate): crash: pthread_getname_np()
- 09:47 PM Bug #52149 (Duplicate): crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_...
- 09:46 PM Bug #52150 (Won't Fix): crash: bool HealthMonitor::check_member_health(): assert(store_size > 0)
- 09:37 PM Bug #52152 (Duplicate): crash: pthread_getname_np()
- 09:37 PM Bug #52154 (Won't Fix): crash: Infiniband::MemoryManager::Chunk::write(char*, unsigned int)
- RDMA is not being actively worked on.
- 09:32 PM Bug #52155 (Need More Info): crash: pthread_rwlock_rdlock() in queue_want_up_thru
- 09:30 PM Bug #52156 (Duplicate): crash: virtual void OSDMonitor::update_from_paxos(bool*): assert(err == 0)
- 09:28 PM Bug #52158 (Need More Info): crash: ceph::common::PerfCounters::set(int, unsigned long)
- 09:25 PM Bug #52159 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 09:25 PM Bug #52160 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 09:24 PM Bug #52153 (Won't Fix): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:22 PM Bug #52161 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:22 PM Bug #52163 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:20 PM Bug #52165 (Rejected): crash: void MonitorDBStore::clear(std::set<std::__cxx11::basic_string<char...
- A non-zero return value could possibly be due to a rocksdb corruption, and there are just 2 clusters reporting this.
- 09:17 PM Bug #52166 (Won't Fix): crash: void Device::binding_port(ceph::common::CephContext*, int): assert...
- RDMA is not being actively worked on; this is one cluster reporting all the crashes.
- 09:15 PM Bug #52167 (Won't Fix): crash: RDMAConnectedSocketImpl::RDMAConnectedSocketImpl(ceph::common::Cep...
- RDMA is not being actively worked on; this is one cluster reporting all the crashes.
- 09:14 PM Bug #52162 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:14 PM Bug #52164 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:13 PM Bug #52168 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:06 PM Bug #52170 (Duplicate): crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: ass...
- 09:05 PM Bug #52171 (Triaged): crash: virtual int RocksDBStore::get(const string&, const string&, ceph::bu...
- Seen on 2 clusters, could be related to some sort of rocksdb corruption.
- 09:01 PM Bug #52173 (Need More Info): crash in ProtocolV2::send_message()
- Seen on 2 octopus clusters.
- 08:45 PM Bug #52189 (Need More Info): crash in AsyncConnection::maybe_start_delay_thread()
- We'll need more information to debug a crash like this.
- 05:25 PM Backport #51605: pacific: bufferlist::splice() may cause stack corruption in bufferlist::rebuild_...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42976
merged
- 05:20 PM Backport #52564 (Resolved): pacific: osd: Add config option to skip running the OSD benchmark on ...
- https://github.com/ceph/ceph/pull/41731
- 05:19 PM Fix #52025 (Pending Backport): osd: Add config option to skip running the OSD benchmark on init.
- Merged https://github.com/ceph/ceph/pull/42604
- 05:17 PM Fix #52329 (Pending Backport): src/vstart: The command "set config key osd_mclock_max_capacity_io...
- 05:10 PM Fix #52329: src/vstart: The command "set config key osd_mclock_max_capacity_iops_ssd" fails with ...
- https://github.com/ceph/ceph/pull/42853 merged
- 04:17 PM Bug #52562 (Closed): Thrashosds read error injection failed with error ENXIO
- /a/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379886
As part of the th...
- 02:24 PM Bug #52523 (Duplicate): Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- 10:15 AM Backport #52557 (Resolved): pacific: pybind: rados.RadosStateError raised when closed watch objec...
- https://github.com/ceph/ceph/pull/51259
- 10:15 AM Backport #52556 (Rejected): octopus: pybind: rados.RadosStateError raised when closed watch objec...
- 10:11 AM Bug #52553 (Pending Backport): pybind: rados.RadosStateError raised when closed watch object goes...
- 07:10 AM Bug #52553 (Fix Under Review): pybind: rados.RadosStateError raised when closed watch object goes...
- 07:06 AM Bug #52553 (Resolved): pybind: rados.RadosStateError raised when closed watch object goes out of ...
- This one is easiest to demonstrate by example. Here's some code:...
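The snippet itself is elided above. As an illustrative sketch only (assuming the pybind rados Ioctx.watch() API, a hypothetical pool name 'test-pool', and an assumed notify callback signature; none of this is taken from the report), the scenario in the title looks roughly like this:

    import rados

    def on_notify(notify_id, notifier_id, watch_id, data):
        # Callback signature assumed for illustration; the bug does not depend on it.
        pass

    def main():
        cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
        cluster.connect()
        ioctx = cluster.open_ioctx('test-pool')          # hypothetical pool name
        ioctx.write_full('watched-object', b'payload')

        watch = ioctx.watch('watched-object', on_notify)
        watch.close()                                    # watch is explicitly closed here

        ioctx.close()
        cluster.shutdown()
        # The local 'watch' only goes out of scope when main() returns, i.e. after
        # shutdown(). Per the bug title, the destructor of the already-closed watch
        # then raised rados.RadosStateError instead of being a no-op.

    if __name__ == '__main__':
        main()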
09/08/2021
- 09:57 PM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- I got the logs of osd.{3,11,13} during their boot. This data was collected with the log level tuned up.
https://...
- 06:57 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- The ticket can be closed from our side - and it may be a duplicate, but I'm not able to say this for sure. But I have...
- 05:30 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- Roland Sommer wrote:
> The cluster is running without any problems since we rolled out the latest dev release from t...
- 01:17 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- The cluster is running without any problems since we rolled out the latest dev release from the pacific branch to all...
- 09:21 AM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- We started rolling out 16.2.5-522-gde2ff323-1bionic from the dev repos on the osd nodes, as there is no release/tag v...
- 05:35 AM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- This could be related to https://tracker.ceph.com/issues/52089
@Roland, could you please upgrade to 16.2.6 and update...
- 04:42 PM Bug #52408: osds not peering correctly after startup
- Erm, in fact, right after doing cephadm bootstrap, before rebooting anything:...
- 04:17 PM Bug #52408: osds not peering correctly after startup
- Odd. The hosts in question are all KVM nodes on the same physical host, so I wouldn't expect networking issues.
I ...
- 03:42 PM Backport #51952 (In Progress): pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log()....
- 09:11 AM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
- Increasing priority, as this happens pretty often in the ceph-volume jenkins jobs recently
- 09:02 AM Bug #52535 (Need More Info): monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ...
- seeing failures in ceph-volume CI because of monitor crashing after an OSD gets destroyed....
09/07/2021
- 02:34 PM Backport #50792 (Rejected): nautilus: osd: FAILED ceph_assert(recovering.count(*i)) after non-pri...
- Nautilus is EOL and the backport is too intrusive.
- 01:23 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- I attached another graph showing the increased amount of written data.
- 10:01 AM Bug #52523 (Duplicate): Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- After having run pacific in our low volume staging system for 2 months, yesterday we upgraded our production cluster ...
- 10:55 AM Bug #52513: BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
- PG was actually inconsistent...
09/06/2021
- 06:42 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- Hey Ilya, nope, since the issue was seen in pacific I thought it might be something we backported to recent versions....
- 06:34 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- On osd0 and osd1:...
- 04:20 PM Bug #51799 (Resolved): osd: snaptrim logs to derr at every tick
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:18 PM Bug #52421 (Resolved): test tracker
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:04 PM Backport #52336 (Resolved): pacific: ceph df detail reports dirty objects without a cache tier
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42860
m...
- 04:03 PM Backport #51830 (Resolved): pacific: set a non-zero default value for osd_client_message_cap
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42615
m...
- 04:02 PM Backport #51290: pacific: mon: stretch mode clusters do not sanely set default crush rules
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42909
m...
- 09:20 AM Bug #52513 (New): BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
- We got a simultaneous crash of two OSDs serving PG 17.7ff [684,768,760] ...
09/05/2021
- 02:34 PM Bug #52509 (Can't reproduce): PG merge: PG stuck in premerge+peered state
- Hi, we got a couple of outages with two PGs stuck in the premerge+peered state:...
09/03/2021
- 05:25 PM Bug #52503 (New): cli_generic.sh: slow ops when trying rand write on cache pools
- failing: leads to slow ops: http://qa-proxy.ceph.com/teuthology/yuriw-2021-09-01_19:04:25-rbd-wip-yuri-testing-2021-0...
- 02:13 PM Bug #52445: OSD asserts on starting too many pushes
- Hi,
I have managed to set the debug log level using the ceph config set command and captured the log output.
# options before changi...
- 06:44 AM Bug #52445: OSD asserts on starting too many pushes
- Neha Ojha wrote:
> Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of cep...
09/02/2021
- 10:12 PM Bug #44715 (Fix Under Review): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_li...
- 10:10 PM Bug #52408: osds not peering correctly after startup
- Jeff Layton wrote:
> Ok. I wasn't clear on whether I needed to run "ceph config set debug_osd 20" on all the hosts o...
- 09:53 PM Backport #52498 (Rejected): nautilus: test tracker: please ignore
- 09:50 PM Bug #52445 (Need More Info): OSD asserts on starting too many pushes
- Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of ceph -s?
Is this crash...
- 09:42 PM Backport #52497 (Rejected): octopus: test tracker: please ignore
- 08:34 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- Ronen Friedman wrote:
> Some possibly helpful hints:
> 1. In "my" specific instance, the pg address handed over to ...
- 08:34 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- more useful debug logging being added in https://github.com/ceph/ceph/pull/42965
- 05:48 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- I dug into this more today and I am wondering if it has something to do with `_conf->cluster` not being set right (to...
- 01:26 PM Backport #52495 (Rejected): pacific: test tracker: please ignore
- 01:26 PM Bug #52486 (Pending Backport): test tracker: please ignore
- 06:25 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- /a/yuriw-2021-08-27_21:20:08-rados-wip-yuri2-testing-2021-08-27-1207-distro-basic-smithi/6363835
- 01:22 AM Bug #52489 (New): Adding a Pacific MON to an Octopus cluster: All PGs inactive
- I'm in the midst of an upgrade from Octopus to Pacific. Due to issues during the upgrade, rather than simply upgradin...
- 12:42 AM Bug #52488 (New): Pacific mon won't join Octopus mons
- I'm in the midst of an upgrade from Octopus to Pacific. Due to issues with the version of docker available on Debian ...
09/01/2021
- 11:55 PM Bug #52421 (Pending Backport): test tracker
- 07:24 PM Bug #52486 (Closed): test tracker: please ignore
- please ignore
- 05:18 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2021-08-31_22:30:47-rados-wip-yuri8-testing-2021-08-30-0930-pacific-distro-basic-smithi/6369129/remote/smith...
- 03:55 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- Some possibly helpful hints:
1. In "my" specific instance, the pg address handed over to register_and_wake_split_chi...
08/31/2021
- 09:57 PM Bug #50587 (Resolved): mon election storm following osd recreation: huge tcmalloc and ceph::msgr:...
- 09:41 PM Bug #52421 (Resolved): test tracker
- 09:41 PM Backport #52475 (Resolved): octopus: test tracker
- 09:20 PM Backport #52475 (Resolved): octopus: test tracker
- 09:40 PM Backport #52474 (Resolved): nautilus: test tracker
- 08:52 PM Backport #52474 (Resolved): nautilus: test tracker
- 09:40 PM Backport #52466 (Resolved): pacific: test tracker
- 03:44 PM Backport #52466 (Resolved): pacific: test tracker
- 08:59 AM Bug #49697: prime pg temp: unexpected optimization
- Recently, I found that the patch "https://github.com/ceph/ceph/commit/023524a26d7e12e7ddfc3537582b1a1cb03af69e" can solve my is...
- 03:44 AM Bug #52255: The pgs state are degraded, but all the osds is up and there is no recovering and bac...
- Neha Ojha wrote:
> can you share your osdmap? are all your osds up and in? the crushmap looks fine.
wish to get y...
08/30/2021
- 04:59 PM Bug #52408: osds not peering correctly after startup
- Other requested info from this rebuild of the cluster:...
- 04:57 PM Bug #52408: osds not peering correctly after startup
- Ok. I wasn't clear on whether I needed to run "ceph config set debug_osd 20" on all the hosts or just 1. I ran it on ...
- 02:28 PM Bug #50657: smart query on monitors
- Yaarit Hatuka wrote:
> Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?
We hav...
- 08:56 AM Bug #50657 (Pending Backport): smart query on monitors
- 01:44 PM Backport #51605 (In Progress): pacific: bufferlist::splice() may cause stack corruption in buffer...
- 01:44 PM Backport #51604 (In Progress): octopus: bufferlist::splice() may cause stack corruption in buffer...
- 09:00 AM Backport #52451 (Resolved): octopus: smart query on monitors
- https://github.com/ceph/ceph/pull/44177
- 09:00 AM Backport #52450 (Resolved): pacific: smart query on monitors
- https://github.com/ceph/ceph/pull/44164
- 07:01 AM Bug #52448 (Fix Under Review): osd: pg may get stuck in backfill_toofull after backfill is interr...
- 06:51 AM Bug #52448 (Resolved): osd: pg may get stuck in backfill_toofull after backfill is interrupted du...
- Consider a scenario:
- Data is written to a pool so one osd X is close to full but still lower than nearfull/toofu...
08/28/2021
- 02:59 PM Bug #52445 (New): OSD asserts on starting too many pushes
- I am running a ceph version 15.2.5 cluster. In recent days, scrub reported errors and a few PGs failed due to OSDs rando...
08/27/2021
- 04:28 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2021-08-26_18:40:53-rados-wip-yuri7-testing-2021-08-26-0841-distro-basic-smithi/6360450/remote/smithi052/log...
- 01:21 PM Bug #52421 (Pending Backport): test tracker
08/26/2021
- 09:55 PM Bug #52172 (Triaged): crash: ceph::buffer::v15_2_0::create_aligned_in_mempool(unsigned int, unsig...
- 09:51 PM Bug #52174 (Triaged): crash: ceph::buffer::v15_2_0::create_aligned_in_mempool(unsigned int, unsig...
- 09:46 PM Bug #52176 (Duplicate): crash: std::_Rb_tree<boost::intrusive_ptr<AsyncConnection>, boost::intrus...
- 09:41 PM Bug #52178 (Duplicate): crash: virtual void AuthMonitor::update_from_paxos(bool*): assert(ret == 0)
- 09:37 PM Bug #52180 (Duplicate): crash: void pg_missing_set<TrackChanges>::got(const hobject_t&, eversion_...
- 09:37 PM Bug #47299 (New): Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
- 09:33 PM Bug #52183 (Duplicate): crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: ass...
- 09:31 PM Bug #52186 (Duplicate): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
- 09:29 PM Bug #52195 (Duplicate): crash: /lib64/libpthread.so.0(
- 09:26 PM Bug #52190 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:26 PM Bug #52191 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:25 PM Bug #52192 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:25 PM Bug #52193 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:25 PM Bug #52197 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:23 PM Bug #52198 (Duplicate): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
- 09:22 PM Bug #52199 (Duplicate): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
- 09:21 PM Bug #52200 (Duplicate): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
- 09:18 PM Bug #52207 (Duplicate): crash: std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<ch...
- 09:17 PM Bug #52210 (Closed): crash: CrushWrapper::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)
- One cluster reporting all the crashes, likely failing to decode due to a corrupted on disk state.
- 09:15 PM Bug #52211 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:13 PM Bug #52212 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 09:11 PM Bug #52213 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
- 09:10 PM Bug #52214 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
- 09:10 PM Bug #52217 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
- 09:10 PM Bug #52218 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
- 09:09 PM Bug #44715 (New): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back())->o...
- 09:07 PM Bug #52220: crash: void ECUtil::HashInfo::append(uint64_t, std::map<int, ceph::buffer::v15_2_0::l...
- One cluster reporting all the crashes.
- 09:06 PM Bug #52221 (Triaged): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
- 09:04 PM Bug #52143 (Duplicate): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
- 09:00 PM Bug #52225: crash: void Thread::create(const char*, size_t): assert(ret == 0)
- One cluster is reporting all the crashes.
- 08:59 PM Bug #52226: crash: PosixNetworkStack::spawn_worker(unsigned int, std::function<void ()>&&)
- One cluster reporting all the crashes.
- 08:58 PM Bug #52231: crash: std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::al...
- One cluster is reporting all the crashes.
- 08:56 PM Bug #52233: crash: void Infiniband::init(): assert(device)
- One cluster is reporting all the crashes.
- 08:19 PM Feature #52424 (Resolved): [RFE] Limit slow request details to mgr log
- Slow requests can overwhelm a cluster log with too many details, filling up the monitor DB.
There's no need to log... - 08:04 PM Feature #51984: [RFE] Provide warning when the 'require-osd-release' flag does not match current ...
- Please check - https://tracker.ceph.com/issues/52423
- 08:02 PM Feature #52423 (New): Do not allow running enable-msgr2 if cluster don't have osd release set to ...
- Do not allow running enable-msgr2 if the cluster doesn't have the osd release set to nautilus.
See also - https://tracker.ceph.... - 07:53 PM Bug #50657: smart query on monitors
- Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?
> Do you have a bug number for... - 07:30 PM Bug #50657: smart query on monitors
- > > Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?
>
> Yes, bare metal d...
- 11:00 AM Bug #50657: smart query on monitors
- Yaarit Hatuka wrote:
> This fixes the missing sudoers file in mon nodes:
> https://github.com/ceph/ceph/pull/42913
...
- 07:49 PM Bug #52408: osds not peering correctly after startup
- Thanks for providing these logs, but they don't have debug_osd=20 (we need it on all the osds). The pg query for 1.7c...
- 10:46 AM Bug #52408: osds not peering correctly after startup
- Tore down and rebuilt the cluster again using my quincy-based image. This time, I didn't create any filesystems. ceph...
- 06:14 PM Bug #52421 (Resolved): test tracker
- please ignore
- 05:58 PM Bug #52418 (New): workloads/dedup-io-snaps: ceph_assert(!context->check_oldest_snap_flushed(oid, ...
- /a/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356797...
- 05:12 PM Bug #52416 (Resolved): devices: mon devices appear empty when scraping SMART metrics
- When invoking smartctl on mon devices, the device name is empty:...
- 03:56 PM Bug #52415 (Closed): rocksdb: build error with rocksdb-6.22.x
- https://github.com/ceph/ceph/pull/42815
- 03:11 PM Bug #52415: rocksdb: build error with rocksdb-6.22.x
- possibly fixed by https://github.com/ceph/ceph/pull/42815?
- 01:58 PM Bug #52415 (Resolved): rocksdb: build error with rocksdb-6.22.x
- Fedora rawhide (f35, f36) has recently upgraded to rocksdb-6.22.1.
Now ceph's rocksdb integration fails to compile...
- 04:10 AM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- ...
08/25/2021
- 08:39 PM Bug #52408: osds not peering correctly after startup
- Tore down the old cluster and built a Pacific one (v16.2.5). That one doesn't have the same issue. I'll do a clean te...
- 07:44 PM Bug #52408: osds not peering correctly after startup
- peering info:...
- 06:53 PM Bug #52408: osds not peering correctly after startup
- Nothing in the logs for crashed osd.0. I think the last thing in the logs was a rocksdb dump. coredumpctl also didn't...
- 06:38 PM Bug #52408: osds not peering correctly after startup
- Jeff Layton wrote:
> This time when I brought it up, one osd didn't go "up". First two bits of info you asked for:
...
- 06:17 PM Bug #52408: osds not peering correctly after startup
- This time when I brought it up, one osd didn't go "up". First two bits of info you asked for:...
- 05:33 PM Bug #52408: osds not peering correctly after startup
- 1. Can you try to reproduce this with 1 pool containing few pgs?
2. Turn the autoscaler off (ceph osd pool set foo p...
- 01:46 PM Bug #52408: osds not peering correctly after startup
- My current build is based on upstream commit a49f10e760b4. It has some MDS patches on top, but nothing that should af...
- 01:45 PM Bug #52408 (Can't reproduce): osds not peering correctly after startup
- I might not have the right terminology here. I have a host that I run 3 VMs on that act as cephadm cluster nodes (mos...
- 04:00 AM Bug #50657 (Fix Under Review): smart query on monitors
- This fixes the missing sudoers file in mon nodes:
https://github.com/ceph/ceph/pull/42913
We'll address the fix f...
08/24/2021
- 09:54 PM Backport #52336: pacific: ceph df detail reports dirty objects without a cache tier
- Deepika Upadhyay wrote:
> https://github.com/ceph/ceph/pull/42860
merged
- 09:53 PM Backport #51830: pacific: set a non-zero default value for osd_client_message_cap
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42615
merged
- 08:12 PM Backport #51290 (Resolved): pacific: mon: stretch mode clusters do not sanely set default crush r...
- 05:54 PM Backport #51290 (In Progress): pacific: mon: stretch mode clusters do not sanely set default crus...
- 06:31 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2021-08-23_19:24:05-rados-wip-yuri4-testing-2021-08-23-0812-pacific-distro-basic-smithi/6353883
- 06:16 PM Backport #51952: pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing()....
- Causing failures in pacific: /a/yuriw-2021-08-23_19:24:05-rados-wip-yuri4-testing-2021-08-23-0812-pacific-distro-basi...
- 10:45 AM Bug #50441 (Resolved): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
- Dan Mick wrote:
> Deepika, was that the reason why?
yep Dan, Neha marked needs info because of MB's comment, mark...
- 12:40 AM Bug #52385 (Closed): a possible data loss due to recovery_unfound PG after restarting all nodes
- Related to the discussion in ceph-users ML.
https://marc.info/?l=ceph-users&m=162947327817532&w=2
I encountered a...
08/23/2021
- 09:53 PM Bug #50441: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
- Deepika, was that the reason why?
- 08:08 PM Backport #51549 (Resolved): pacific: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42211
m...
- 08:03 PM Backport #51568 (Resolved): pacific: pool last_epoch_clean floor is stuck after pg merging
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42224
m...
- 05:54 PM Fix #52329: src/vstart: The command "set config key osd_mclock_max_capacity_iops_ssd" fails with ...
- Mon logs showing that the command succeeds after the fix is applied:...
- 04:51 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Just wanted to note that we recently encountered what appears to be the same issue on some Luminous (12.2.12) cluster...
08/20/2021
- 04:14 PM Bug #52255: The pgs state are degraded, but all the osds is up and there is no recovering and bac...
- Neha Ojha wrote:
> can you share your osdmap? are all your osds up and in? the crushmap looks fine.
all the osds ...
- 05:30 AM Backport #52337 (In Progress): octopus: ceph df detail reports dirty objects without a cache tier
- 02:36 AM Backport #52337 (Resolved): octopus: ceph df detail reports dirty objects without a cache tier
- https://github.com/ceph/ceph/pull/42862
- 03:02 AM Backport #52336: pacific: ceph df detail reports dirty objects without a cache tier
- https://github.com/ceph/ceph/pull/42860
- 02:36 AM Backport #52336 (Resolved): pacific: ceph df detail reports dirty objects without a cache tier
- https://github.com/ceph/ceph/pull/42860
- 02:36 AM Bug #52335 (Pending Backport): ceph df detail reports dirty objects without a cache tier
- 02:32 AM Bug #52335 (Resolved): ceph df detail reports dirty objects without a cache tier
- Description of problem:
'ceph df detail' reports a column for DIRTY objects under POOLS even though cache tiers are ...
08/19/2021
- 10:48 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- I don't have the logs right now, but it prints the state of the PG, so if you search for `snaptrim` in `f0208568-fbf4-48...
- 08:48 PM Bug #52026 (New): osd: pgs went back into snaptrim state after osd restart
- Thanks for providing the logs, is there a particular PG we should look at in the logs?
- 09:19 PM Bug #50441: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
- I assume because of MB's comment, but that seems now to be historical
- 09:17 PM Bug #50441: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
- Deepika: why is this issue in need-more-info? Looks like the original fix and pacific backport https://github.com/cep...
- 09:12 PM Bug #48844 (Duplicate): api_watch_notify: LibRadosWatchNotify.AioWatchDelete failed
- 09:08 PM Bug #52261 (Need More Info): OSD takes all memory and crashes, after pg_num increase
- 09:08 PM Bug #52255 (Need More Info): The pgs state are degraded, but all the osds is up and there is no r...
- 09:08 PM Bug #52255: The pgs state are degraded, but all the osds is up and there is no recovering and bac...
- can you share your osdmap? are all your osds up and in? the crushmap looks fine.
- 08:54 PM Bug #52319: LibRadosWatchNotify.WatchNotify2 fails
- Brad, are you aware of this one?
- 03:54 AM Bug #52319 (New): LibRadosWatchNotify.WatchNotify2 fails
- 2021-08-17T01:34:43.023 INFO:tasks.workunit.client.0.smithi111.stdout: api_watch_notify: [ RUN ] LibRado...
- 08:51 PM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
- Adam Kupczyk wrote:
> This leak is from internals of RocksDB.
> We have no access to FileMetaData objects, we canno...
- 07:34 AM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
- This leak is from internals of RocksDB.
We have no access to FileMetaData objects, we cannot be responsible for this... - 08:48 PM Backport #51549: pacific: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
- Deepika Upadhyay wrote:
> https://github.com/ceph/ceph/pull/42211
merged - 08:45 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- Adam, can you start taking a look at this?
- 03:24 PM Fix #52329 (Fix Under Review): src/vstart: The command "set config key osd_mclock_max_capacity_io...
- 02:28 PM Fix #52329 (Resolved): src/vstart: The command "set config key osd_mclock_max_capacity_iops_ssd" ...
- The following was observed when bringing up a vstart cluster:...
- 07:45 AM Backport #52322 (Resolved): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
- https://github.com/ceph/ceph/pull/43306
- 07:42 AM Bug #51000 (Pending Backport): LibRadosTwoPoolsPP.ManifestSnapRefcount failure
- 04:47 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- I see the same assertion error in this dead job - https://pulpito.ceph.com/yuriw-2021-08-16_21:15:00-rados-wip-yuri-t...
08/18/2021
- 11:19 PM Backport #51569 (In Progress): octopus: pool last_epoch_clean floor is stuck after pg merging
- 09:03 PM Backport #51569: octopus: pool last_epoch_clean floor is stuck after pg merging
- https://github.com/ceph/ceph/pull/42837
- 09:53 PM Bug #52316: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum']) == len(mons)
- ...
- 07:18 PM Bug #52316 (Resolved): qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum']) == len(...
- 2021-08-17T03:12:45.055 INFO:tasks.workunit.client.0.smithi135.stderr:2021-08-17T03:12:45.052+0000 7f27d941a700 1 --...
- 03:50 AM Backport #52307 (Resolved): pacific: doc: clarify use of `rados rm` command
- https://github.com/ceph/ceph/pull/51260
- 03:50 AM Backport #52306 (Rejected): octopus: doc: clarify use of `rados rm` command
- 03:47 AM Bug #52288 (Pending Backport): doc: clarify use of `rados rm` command
08/17/2021
- 04:40 PM Bug #52012 (Fix Under Review): osd/scrub: src/osd/scrub_machine.cc: 55: FAILED ceph_assert(state_...
- 01:35 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- I searched a bit through the log I sent and I don't see any traces of a pg in the snaptrim state, probably because ...
- 07:12 AM Fix #51116: osd: Run osd bench test to override default max osd capacity for mclock.
- Removed the classification of the tracker as a "Feature". This is better classified as a "Fix" with the aim of improv...
- 04:09 AM Bug #52255: The pgs state are degraded, but all the osds is up and there is no recovering and bac...
- This is my crushmap