Activity

From 08/17/2021 to 09/15/2021

09/15/2021

09:23 PM Bug #52605 (Fix Under Review): osd: add scrub duration to pg dump
Neha Ojha
06:39 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
I reran the test 10 times in https://pulpito.ceph.com/nojha-2021-09-14_18:44:41-rados:singleton-pacific-distro-basic-... Neha Ojha
06:05 PM Bug #52621 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: bad ...
... Neha Ojha
03:20 PM Backport #52620 (Resolved): pacific: partial recovery becomes whole-object recovery after an osd restart
https://github.com/ceph/ceph/pull/43513 Backport Bot
03:18 PM Bug #52583 (Pending Backport): partial recovery becomes whole-object recovery after an osd restart
Kefu Chai
02:50 PM Backport #52337: octopus: ceph df detail reports dirty objects without a cache tier
Deepika Upadhyay wrote:
> https://github.com/ceph/ceph/pull/42862
merged
Yuri Weinstein
02:43 PM Bug #50393: CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client...
https://github.com/ceph/ceph/pull/42498 merged Yuri Weinstein
02:18 PM Bug #52618 (Won't Fix - EOL): Ceph Luminous 12.2.13 OSD assert message
2021-09-02 14:25:37.173453 7f2235baf700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/... ceph ceph

09/14/2021

05:53 PM Feature #52609 (Fix Under Review): New PG states for pending scrubs / repairs
Request to add new PG states to provide feedback to the admin when a PG scrub/repair is scheduled (via command line,... Michael Kidd
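
For context, this is the kind of workflow the feature covers; today the reported PG state only reflects an active scrub or repair, not a queued one (the pg id below is a placeholder):

    # ask for a deep scrub and a repair of one PG
    ceph pg deep-scrub 1.2f
    ceph pg repair 1.2f
    # the state shows scrubbing/repair only once it actually starts
    ceph pg 1.2f query | jq -r '.state'
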
01:14 PM Bug #52605 (Resolved): osd: add scrub duration to pg dump
We would like to add a new column to the pg dump which would give us the time it took for a pg to get scrubbed. Aishwarya Mathuria
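
A minimal sketch of how such a column could be consumed from the JSON dump, assuming the field ends up being named scrub_duration (the exact name and JSON path are not confirmed in this entry and may differ by release):

    # print pgid and the assumed scrub_duration field for every PG
    ceph pg dump --format=json 2>/dev/null \
      | jq -r '.pg_map.pg_stats[] | "\(.pgid) \(.scrub_duration // "n/a")"'
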

09/13/2021

10:48 PM Bug #52562 (Triaged): Thrashosds read error injection failed with error ENXIO
Looking at the osd and mon logs
Here's when the osd was restarted in revive_osd()...
Neha Ojha
09:05 PM Backport #52596 (Rejected): octopus: make bufferlist::c_str() skip rebuild when it isn't necessary
Backport Bot
09:05 PM Backport #52595 (Rejected): pacific: make bufferlist::c_str() skip rebuild when it isn't necessary
Backport Bot
09:03 PM Bug #51725 (Pending Backport): make bufferlist::c_str() skip rebuild when it isn't necessary
This spares a heap allocation on every received message. Marking for backporting to both octopus and pacific as a lo... Ilya Dryomov
02:21 PM Bug #45871: Incorrect (0) number of slow requests in health check
On a... Nico Schottelius
10:35 AM Backport #52586 (Resolved): pacific: src/vstart: The command "set config key osd_mclock_max_capac...
https://github.com/ceph/ceph/pull/41731 Backport Bot
10:24 AM Bug #52583: partial recovery becomes whole-object recovery after an osd restart
FIX URL:
https://github.com/ceph/ceph/pull/43146
https://github.com/ceph/ceph/pull/42904
jianwei zhang
09:43 AM Bug #52583 (Resolved): partial recovery becomes whole-object recovery after an osd restart
Problem: After the osd that is undergoing partial recovery is restarted, the data recovery is rolled back from the pa... jianwei zhang
05:32 AM Bug #52578 (Fix Under Review): CLI - osd pool rm --help message is wrong or misleading
CLI - osd pool rm --help message is wrong or misleading
Version-Release number of selected component (if applicabl...
Vasishta Shastry
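
For reference, the command the help text describes takes the pool name twice plus a confirmation flag, and pool deletion has to be enabled on the monitors first:

    # pool deletion is disabled by default
    ceph config set mon mon_allow_pool_delete true
    # the pool name is repeated as a safety measure
    ceph osd pool rm mypool mypool --yes-i-really-really-mean-it
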
04:19 AM Bug #52445: OSD asserts on starting too many pushes
Neha Ojha wrote:
> Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of cep...
Amudhan Pandia n

09/09/2021

11:09 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
I got the same assert on 14.2.22 when the scenario is replayed:
1. set up a write-back cache layer for an rbd pool
2. restar...
Pawel Stefanski
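
The write-back cache layer mentioned in step 1 is usually built along these lines (pool names and pg count are placeholders); hit sets, which hit_set_trim() later prunes, are tracked per cache pool:

    # create a cache pool and put it in front of the rbd pool in write-back mode
    ceph osd pool create rbd-cache 32
    ceph osd tier add rbd rbd-cache
    ceph osd tier cache-mode rbd-cache writeback
    ceph osd tier set-overlay rbd rbd-cache
    # enable hit-set tracking on the cache pool
    ceph osd pool set rbd-cache hit_set_type bloom
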
10:00 PM Bug #52140 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
Neha Ojha
09:57 PM Bug #52141 (Need More Info): crash: void OSD::load_pgs(): abort
Neha Ojha
09:53 PM Bug #52142 (Duplicate): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
Neha Ojha
09:52 PM Bug #52145 (Duplicate): crash: OSDMapRef OSDService::get_map(epoch_t): assert(ret)
Neha Ojha
09:49 PM Bug #52147 (Duplicate): crash: rocksdb::InstrumentedMutex::Lock()
Neha Ojha
09:48 PM Bug #52148 (Duplicate): crash: pthread_getname_np()
Neha Ojha
09:47 PM Bug #52149 (Duplicate): crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_...
Neha Ojha
09:46 PM Bug #52150 (Won't Fix): crash: bool HealthMonitor::check_member_health(): assert(store_size > 0)
Neha Ojha
09:37 PM Bug #52152 (Duplicate): crash: pthread_getname_np()
Neha Ojha
09:37 PM Bug #52154 (Won't Fix): crash: Infiniband::MemoryManager::Chunk::write(char*, unsigned int)
RDMA is not being actively worked on. Neha Ojha
09:32 PM Bug #52155 (Need More Info): crash: pthread_rwlock_rdlock() in queue_want_up_thru
Neha Ojha
09:30 PM Bug #52156 (Duplicate): crash: virtual void OSDMonitor::update_from_paxos(bool*): assert(err == 0)
Neha Ojha
09:28 PM Bug #52158 (Need More Info): crash: ceph::common::PerfCounters::set(int, unsigned long)
Neha Ojha
09:25 PM Bug #52159 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
Neha Ojha
09:25 PM Bug #52160 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
Neha Ojha
09:24 PM Bug #52153 (Won't Fix): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
Josh Durgin
09:22 PM Bug #52161 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
Not a ceph bug, most likely failed to write to rocksdb. Neha Ojha
09:22 PM Bug #52163 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
Not a ceph bug, most likely failed to write to rocksdb. Neha Ojha
09:20 PM Bug #52165 (Rejected): crash: void MonitorDBStore::clear(std::set<std::__cxx11::basic_string<char...
A non-zero return value could possibly be due to rocksdb corruption, and there are just 2 clusters reporting this. Neha Ojha
09:17 PM Bug #52166 (Won't Fix): crash: void Device::binding_port(ceph::common::CephContext*, int): assert...
RDMA is not being actively worked on, and a single cluster is reporting all of these crashes. Neha Ojha
09:15 PM Bug #52167 (Won't Fix): crash: RDMAConnectedSocketImpl::RDMAConnectedSocketImpl(ceph::common::Cep...
RDMA is not being actively worked on, and a single cluster is reporting all of these crashes. Neha Ojha
09:14 PM Bug #52162 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
Josh Durgin
09:14 PM Bug #52164 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
Josh Durgin
09:13 PM Bug #52168 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
Neha Ojha
09:06 PM Bug #52170 (Duplicate): crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: ass...
Neha Ojha
09:05 PM Bug #52171 (Triaged): crash: virtual int RocksDBStore::get(const string&, const string&, ceph::bu...
Seen on 2 clusters, could be related to some sort of rocksdb corruption. Neha Ojha
09:01 PM Bug #52173 (Need More Info): crash in ProtocolV2::send_message()
Seen on 2 octopus clusters. Neha Ojha
08:45 PM Bug #52189 (Need More Info): crash in AsyncConnection::maybe_start_delay_thread()
We'll need more information to debug a crash like this. Neha Ojha
05:25 PM Backport #51605: pacific: bufferlist::splice() may cause stack corruption in bufferlist::rebuild_...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42976
merged
Yuri Weinstein
05:20 PM Backport #52564 (Resolved): pacific: osd: Add config option to skip running the OSD benchmark on ...
https://github.com/ceph/ceph/pull/41731 Backport Bot
05:19 PM Fix #52025 (Pending Backport): osd: Add config option to skip running the OSD benchmark on init.
Merged https://github.com/ceph/ceph/pull/42604 Sridhar Seshasayee
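
A sketch of how the option would be used; the name osd_mclock_skip_benchmark is an assumption and is not spelled out in this entry:

    # assumed option name: skip the automatic osd bench run at OSD start-up
    ceph config set osd osd_mclock_skip_benchmark true
    # when the benchmark is skipped, the capacity override
    # (osd_mclock_max_capacity_iops_hdd/ssd) has to be set by hand instead
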
05:17 PM Fix #52329 (Pending Backport): src/vstart: The command "set config key osd_mclock_max_capacity_io...
Sridhar Seshasayee
05:10 PM Fix #52329: src/vstart: The command "set config key osd_mclock_max_capacity_iops_ssd" fails with ...
https://github.com/ceph/ceph/pull/42853 merged Yuri Weinstein
04:17 PM Bug #52562 (Closed): Thrashosds read error injection failed with error ENXIO
/a/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379886
As part of the th...
Sridhar Seshasayee
02:24 PM Bug #52523 (Duplicate): Latency spikes causing timeouts after upgrade to pacific (16.2.5)
Igor Fedotov
10:15 AM Backport #52557 (Resolved): pacific: pybind: rados.RadosStateError raised when closed watch objec...
https://github.com/ceph/ceph/pull/51259 Backport Bot
10:15 AM Backport #52556 (Rejected): octopus: pybind: rados.RadosStateError raised when closed watch objec...
Backport Bot
10:11 AM Bug #52553 (Pending Backport): pybind: rados.RadosStateError raised when closed watch object goes...
Mykola Golub
07:10 AM Bug #52553 (Fix Under Review): pybind: rados.RadosStateError raised when closed watch object goes...
Tim Serong
07:06 AM Bug #52553 (Resolved): pybind: rados.RadosStateError raised when closed watch object goes out of ...
This one is easiest to demonstrate by example. Here's some code:... Tim Serong

09/08/2021

09:57 PM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
I got the logs of osd.{3,11,13} during their boot. This data was collected with the log level tuned up.
https://...
Satoru Takeuchi
06:57 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
The ticket can be closed from our side - and it may be a duplicate, but I'm not able to say this for sure. But I have... Roland Sommer
05:30 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
Roland Sommer wrote:
> The cluster is running without any problems since we rolled out the latest dev release from t...
Igor Fedotov
01:17 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
The cluster is running without any problems since we rolled out the latest dev release from the pacific branch to all... Roland Sommer
09:21 AM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
We started rolling out 16.2.5-522-gde2ff323-1bionic from the dev repos on the osd nodes, as there is no release/tag v... Roland Sommer
05:35 AM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
This could be related to https://tracker.ceph.com/issues/52089
@Roland, could yu please upgrade to 16.2.6 and update...
Igor Fedotov
04:42 PM Bug #52408: osds not peering correctly after startup
Erm, in fact, right after doing cephadm bootstrap, before rebooting anything:... Jeff Layton
04:17 PM Bug #52408: osds not peering correctly after startup
Odd. The hosts in question are all KVM nodes on the same physical host, so I wouldn't expect networking issues.
I ...
Jeff Layton
03:42 PM Backport #51952 (In Progress): pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log()....
Cory Snyder
09:11 AM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
Increasing priority, as this has been happening pretty often in the ceph-volume Jenkins jobs recently. Sebastian Wagner
09:02 AM Bug #52535 (Need More Info): monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ...
Seeing failures in ceph-volume CI because of the monitor crashing after an OSD gets destroyed.... Guillaume Abrioux

09/07/2021

02:34 PM Backport #50792 (Rejected): nautilus: osd: FAILED ceph_assert(recovering.count(*i)) after non-pri...
Nautilus is EOL and the backport is too intrusive. Mykola Golub
01:23 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
I attached another graph showing the increased amount of written data. Roland Sommer
10:01 AM Bug #52523 (Duplicate): Latency spikes causing timeouts after upgrade to pacific (16.2.5)
After having run pacific in our low volume staging system for 2 months, yesterday we upgraded our production cluster ... Roland Sommer
10:55 AM Bug #52513: BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
PG was actually inconsistent... Konstantin Shalygin

09/06/2021

06:42 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
Hey Ilya, nope, since the issue was seen in pacific I thought it might be something we backported to recent versions.... Deepika Upadhyay
06:34 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
On osd0 and osd1:... Ilya Dryomov
04:20 PM Bug #51799 (Resolved): osd: snaptrim logs to derr at every tick
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:18 PM Bug #52421 (Resolved): test tracker
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:04 PM Backport #52336 (Resolved): pacific: ceph df detail reports dirty objects without a cache tier
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42860
m...
Loïc Dachary
04:03 PM Backport #51830 (Resolved): pacific: set a non-zero default value for osd_client_message_cap
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42615
m...
Loïc Dachary
04:02 PM Backport #51290: pacific: mon: stretch mode clusters do not sanely set default crush rules
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42909
m...
Loïc Dachary
09:20 AM Bug #52513 (New): BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
We got a crash of two OSDs simultaneously serving PG 17.7ff [684,768,760] ... Konstantin Shalygin

09/05/2021

02:34 PM Bug #52509 (Can't reproduce): PG merge: PG stuck in premerge+peered state
Hi, we got a couple of outages with two PGs stuck in the premerge+peered state:... Konstantin Shalygin

09/03/2021

05:25 PM Bug #52503 (New): cli_generic.sh: slow ops when trying rand write on cache pools
failing: leads to slow ops: http://qa-proxy.ceph.com/teuthology/yuriw-2021-09-01_19:04:25-rbd-wip-yuri-testing-2021-0... Deepika Upadhyay
02:13 PM Bug #52445: OSD asserts on starting too many pushes
Hi,
I have managed to set the debug log level using the ceph config set command and captured the log output.
# options before changi...
Amudhan Pandia n
06:44 AM Bug #52445: OSD asserts on starting too many pushes
Neha Ojha wrote:
> Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of cep...
Amudhan Pandia n

09/02/2021

10:12 PM Bug #44715 (Fix Under Review): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_li...
Neha Ojha
10:10 PM Bug #52408: osds not peering correctly after startup
Jeff Layton wrote:
> Ok. I wasn't clear on whether I needed to run "ceph config set debug_osd 20" on all the hosts o...
Neha Ojha
09:53 PM Backport #52498 (Rejected): nautilus: test tracker: please ignore
Deepika Upadhyay
09:50 PM Bug #52445 (Need More Info): OSD asserts on starting too many pushes
Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of ceph -s?
Is this crash...
Neha Ojha
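
The requested artifacts can be gathered roughly like this (paths and IDs are illustrative):

    # 1) raise OSD logging before reproducing the crash
    ceph config set osd debug_osd 20
    ceph config set osd debug_ms 1
    #    logs land on each OSD host, e.g. /var/log/ceph/ceph-osd.<id>.log
    # 2) ceph.conf from an affected node
    # 3) cluster status
    ceph -s
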
09:42 PM Backport #52497 (Rejected): octopus: test tracker: please ignore
Deepika Upadhyay
08:34 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
Ronen Friedman wrote:
> Some possibly helpful hints:
> 1. In "my" specific instance, the pg address handed over to ...
Neha Ojha
08:34 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
more useful debug logging being added in https://github.com/ceph/ceph/pull/42965 Neha Ojha
05:48 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
I dug into this more today and I am wondering if it has something to do with `_conf->cluster` not being set right (to... Andrew Davidoff
01:26 PM Backport #52495 (Rejected): pacific: test tracker: please ignore
Deepika Upadhyay
01:26 PM Bug #52486 (Pending Backport): test tracker: please ignore
Deepika Upadhyay
06:25 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
/a/yuriw-2021-08-27_21:20:08-rados-wip-yuri2-testing-2021-08-27-1207-distro-basic-smithi/6363835 Aishwarya Mathuria
01:22 AM Bug #52489 (New): Adding a Pacific MON to an Octopus cluster: All PGs inactive
I'm in the midst of an upgrade from Octopus to Pacific. Due to issues during the upgrade, rather than simply upgradin... Chris Dunlop
12:42 AM Bug #52488 (New): Pacific mon won't join Octopus mons
I'm in the midst of an upgrade from Octopus to Pacific. Due to issues with the version of docker available on Debian ... Chris Dunlop

09/01/2021

11:55 PM Bug #52421 (Pending Backport): test tracker
Deepika Upadhyay
07:24 PM Bug #52486 (Closed): test tracker: please ignore
please ignore Deepika Upadhyay
05:18 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2021-08-31_22:30:47-rados-wip-yuri8-testing-2021-08-30-0930-pacific-distro-basic-smithi/6369129/remote/smith... Neha Ojha
03:55 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
Some possibly helpful hints:
1. In "my" specific instance, the pg address handed over to register_and_wake_split_chi...
Ronen Friedman

08/31/2021

09:57 PM Bug #50587 (Resolved): mon election storm following osd recreation: huge tcmalloc and ceph::msgr:...
Sage Weil
09:41 PM Bug #52421 (Resolved): test tracker
Deepika Upadhyay
09:41 PM Backport #52475 (Resolved): octopus: test tracker
Deepika Upadhyay
09:20 PM Backport #52475 (Resolved): octopus: test tracker
Backport Bot
09:40 PM Backport #52474 (Resolved): nautilus: test tracker
Deepika Upadhyay
08:52 PM Backport #52474 (Resolved): nautilus: test tracker
Deepika Upadhyay
09:40 PM Backport #52466 (Resolved): pacific: test tracker
Deepika Upadhyay
03:44 PM Backport #52466 (Resolved): pacific: test tracker
Deepika Upadhyay
08:59 AM Bug #49697: prime pg temp: unexpected optimization
Recently, I find patch "https://github.com/ceph/ceph/commit/023524a26d7e12e7ddfc3537582b1a1cb03af69e" can solve my is... fan chen
03:44 AM Bug #52255: The pgs state are degraded, but all the osds is up and there is no recovering and bac...
Neha Ojha wrote:
> can you share your osdmap? are all your osds up and in? the crushmap looks fine.
wish to get y...
Ke Xiao

08/30/2021

04:59 PM Bug #52408: osds not peering correctly after startup
Other requested info from this rebuild of the cluster:... Jeff Layton
04:57 PM Bug #52408: osds not peering correctly after startup
Ok. I wasn't clear on whether I needed to run "ceph config set debug_osd 20" on all the hosts or just 1. I ran it on ... Jeff Layton
02:28 PM Bug #50657: smart query on monitors
Yaarit Hatuka wrote:
> Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?
We hav...
Hannes von Haugwitz
08:56 AM Bug #50657 (Pending Backport): smart query on monitors
Deepika Upadhyay
01:44 PM Backport #51605 (In Progress): pacific: bufferlist::splice() may cause stack corruption in buffer...
Ilya Dryomov
01:44 PM Backport #51604 (In Progress): octopus: bufferlist::splice() may cause stack corruption in buffer...
Ilya Dryomov
09:00 AM Backport #52451 (Resolved): octopus: smart query on monitors
https://github.com/ceph/ceph/pull/44177 Backport Bot
09:00 AM Backport #52450 (Resolved): pacific: smart query on monitors
https://github.com/ceph/ceph/pull/44164 Backport Bot
07:01 AM Bug #52448 (Fix Under Review): osd: pg may get stuck in backfill_toofull after backfill is interr...
Mykola Golub
06:51 AM Bug #52448 (Resolved): osd: pg may get stuck in backfill_toofull after backfill is interrupted du...
Consider a scenario:
- Data is written to a pool so one osd X is close to full but still lower than nearfull/toofu...
Mykola Golub
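
The thresholds involved in this scenario can be checked and adjusted as follows (the values shown are the usual defaults):

    # show the currently configured ratios
    ceph osd dump | grep -E 'full_ratio'
    # nearfull < backfillfull < full
    ceph osd set-nearfull-ratio 0.85
    ceph osd set-backfillfull-ratio 0.90
    ceph osd set-full-ratio 0.95
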

08/28/2021

02:59 PM Bug #52445 (New): OSD asserts on starting too many pushes
I am running a ceph version 15.2.5 cluster. In recent days scrub reported errors and a few PGs failed due to OSDs rando... Amudhan Pandia n

08/27/2021

04:28 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2021-08-26_18:40:53-rados-wip-yuri7-testing-2021-08-26-0841-distro-basic-smithi/6360450/remote/smithi052/log... Neha Ojha
01:21 PM Bug #52421 (Pending Backport): test tracker
Deepika Upadhyay

08/26/2021

09:55 PM Bug #52172 (Triaged): crash: ceph::buffer::v15_2_0::create_aligned_in_mempool(unsigned int, unsig...
Neha Ojha
09:51 PM Bug #52174 (Triaged): crash: ceph::buffer::v15_2_0::create_aligned_in_mempool(unsigned int, unsig...
Neha Ojha
09:46 PM Bug #52176 (Duplicate): crash: std::_Rb_tree<boost::intrusive_ptr<AsyncConnection>, boost::intrus...
Neha Ojha
09:41 PM Bug #52178 (Duplicate): crash: virtual void AuthMonitor::update_from_paxos(bool*): assert(ret == 0)
Neha Ojha
09:37 PM Bug #52180 (Duplicate): crash: void pg_missing_set<TrackChanges>::got(const hobject_t&, eversion_...
Neha Ojha
09:37 PM Bug #47299 (New): Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
Neha Ojha
09:33 PM Bug #52183 (Duplicate): crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: ass...
Neha Ojha
09:31 PM Bug #52186 (Duplicate): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
Neha Ojha
09:29 PM Bug #52195 (Duplicate): crash: /lib64/libpthread.so.0(
Neha Ojha
09:26 PM Bug #52190 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
Not a ceph bug, most likely failed to write to rocksdb.
Neha Ojha
09:26 PM Bug #52191 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
Not a ceph bug, most likely failed to write to rocksdb.
Neha Ojha
09:25 PM Bug #52192 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
Not a ceph bug, most likely failed to write to rocksdb.
Neha Ojha
09:25 PM Bug #52193 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
Not a ceph bug, most likely failed to write to rocksdb.
Neha Ojha
09:25 PM Bug #52197 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
Not a ceph bug, most likely failed to write to rocksdb.
Neha Ojha
09:23 PM Bug #52198 (Duplicate): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
Neha Ojha
09:22 PM Bug #52199 (Duplicate): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
Neha Ojha
09:21 PM Bug #52200 (Duplicate): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
Neha Ojha
09:18 PM Bug #52207 (Duplicate): crash: std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<ch...
Neha Ojha
09:17 PM Bug #52210 (Closed): crash: CrushWrapper::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)
One cluster is reporting all the crashes, likely failing to decode due to a corrupted on-disk state. Neha Ojha
09:15 PM Bug #52211 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
Not a ceph bug, most likely failed to write to rocksdb. Neha Ojha
09:13 PM Bug #52212 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
Neha Ojha
09:11 PM Bug #52213 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
Neha Ojha
09:10 PM Bug #52214 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
Neha Ojha
09:10 PM Bug #52217 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
Neha Ojha
09:10 PM Bug #52218 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
Neha Ojha
09:09 PM Bug #44715 (New): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back())->o...
Neha Ojha
09:07 PM Bug #52220: crash: void ECUtil::HashInfo::append(uint64_t, std::map<int, ceph::buffer::v15_2_0::l...
One cluster reporting all the crashes. Neha Ojha
09:06 PM Bug #52221 (Triaged): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
Josh Durgin
09:04 PM Bug #52143 (Duplicate): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
Neha Ojha
09:00 PM Bug #52225: crash: void Thread::create(const char*, size_t): assert(ret == 0)
One cluster is reporting all the crashes. Neha Ojha
08:59 PM Bug #52226: crash: PosixNetworkStack::spawn_worker(unsigned int, std::function<void ()>&&)
One cluster reporting all the crashes. Neha Ojha
08:58 PM Bug #52231: crash: std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::al...
One cluster is reporting all the crashes. Neha Ojha
08:56 PM Bug #52233: crash: void Infiniband::init(): assert(device)
One cluster is reporting all the crashes. Neha Ojha
08:19 PM Feature #52424 (Resolved): [RFE] Limit slow request details to mgr log
Slow requests can overwhelm a cluster log with too many details, filling up the monitor DB.
There's no need to log...
Vikhyat Umrao
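
For context, the per-op detail does not have to live in the cluster log to stay reachable; it can be pulled on demand, e.g.:

    # summary of slow ops and which OSDs report them
    ceph health detail
    # full op detail from a single OSD's admin socket (osd.0 as an example)
    ceph daemon osd.0 dump_ops_in_flight
    ceph daemon osd.0 dump_historic_ops
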
08:04 PM Feature #51984: [RFE] Provide warning when the 'require-osd-release' flag does not match current ...
Please check - https://tracker.ceph.com/issues/52423 Vikhyat Umrao
08:02 PM Feature #52423 (New): Do not allow running enable-msgr2 if cluster don't have osd release set to ...
Do not allow running enable-msgr2 if the cluster doesn't have the osd release set to nautilus
See also - https://tracker.ceph....
Vikhyat Umrao
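
The proposed guard corresponds to this ordering of commands, with the release flag raised before msgr2 is enabled:

    # check what the cluster currently requires
    ceph osd dump | grep require_osd_release
    # raise the flag once every OSD runs nautilus or newer
    ceph osd require-osd-release nautilus
    # only then switch on the v2 protocol
    ceph mon enable-msgr2
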
07:53 PM Bug #50657: smart query on monitors
Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?
> Do you have a bug number for...
Yaarit Hatuka
07:30 PM Bug #50657: smart query on monitors
> > Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?
>
> Yes, bare metal d...
Jan-Philipp Litza
11:00 AM Bug #50657: smart query on monitors
Yaarit Hatuka wrote:
> This fixes the missing sudoers file in mon nodes:
> https://github.com/ceph/ceph/pull/42913
...
Hannes von Haugwitz
07:49 PM Bug #52408: osds not peering correctly after startup
Thanks for providing these logs, but they don't have debug_osd=20 (we need it on all the osds). The pg query for 1.7c... Neha Ojha
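
On the "all the hosts or just one" question: with the centralized config the osd mask applies to every OSD, while ceph tell targets a single daemon:

    # stored in the mon config database, picked up by every OSD
    ceph config set osd debug_osd 20
    # one running daemon only, not persisted across restarts
    ceph tell osd.3 config set debug_osd 20
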
10:46 AM Bug #52408: osds not peering correctly after startup
Tore down and rebuilt the cluster again using my quincy-based image. This time, I didn't create any filesystems. ceph... Jeff Layton
06:14 PM Bug #52421 (Resolved): test tracker
please ignore Deepika Upadhyay
05:58 PM Bug #52418 (New): workloads/dedup-io-snaps: ceph_assert(!context->check_oldest_snap_flushed(oid, ...
/a/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356797... Sridhar Seshasayee
05:12 PM Bug #52416 (Resolved): devices: mon devices appear empty when scraping SMART metrics
When invoking smartctl on mon devices, the device name is empty:... Yaarit Hatuka
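
The scraping path in question can be exercised by hand; device IDs come from the device listing (devid below is a placeholder):

    # list known devices, including the ones backing mon daemons
    ceph device ls
    # trigger a scrape, then read back the stored SMART data for one device
    ceph device scrape-health-metrics
    ceph device get-health-metrics <devid>
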
03:56 PM Bug #52415 (Closed): rocksdb: build error with rocksdb-6.22.x
https://github.com/ceph/ceph/pull/42815 Kaleb KEITHLEY
03:11 PM Bug #52415: rocksdb: build error with rocksdb-6.22.x
possibly fixed by https://github.com/ceph/ceph/pull/42815? Casey Bodley
01:58 PM Bug #52415 (Resolved): rocksdb: build error with rocksdb-6.22.x
Fedora rawhide (f35, f36) has recently upgraded to rocksdb-6.22.1
Now ceph's rocksdb integration fails to compile...
Kaleb KEITHLEY
04:10 AM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
... jianwei zhang

08/25/2021

08:39 PM Bug #52408: osds not peering correctly after startup
Tore down the old cluster and built a Pacific one (v16.2.5). That one doesn't have the same issue. I'll do a clean te... Jeff Layton
07:44 PM Bug #52408: osds not peering correctly after startup
peering info:... Jeff Layton
06:53 PM Bug #52408: osds not peering correctly after startup
Nothing in the logs for crashed osd.0. I think the last thing in the logs was a rocksdb dump. coredumpctl also didn't... Jeff Layton
06:38 PM Bug #52408: osds not peering correctly after startup
Jeff Layton wrote:
> This time when I brought it up, one osd didn't go "up". First two bits of info you asked for:
...
Neha Ojha
06:17 PM Bug #52408: osds not peering correctly after startup
This time when I brought it up, one osd didn't go "up". First two bits of info you asked for:... Jeff Layton
05:33 PM Bug #52408: osds not peering correctly after startup
1. Can you try to reproduce this with 1 pool containing a few pgs?
2. Turn the autoscaler off (ceph osd pool set foo p...
Neha Ojha
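
The suggested repro setup would look roughly like this (pool name and pg count are placeholders):

    # one small pool with a fixed pg count, autoscaler off so pg_num stays put
    ceph osd pool create foo 8 8
    ceph osd pool set foo pg_autoscale_mode off
    # watch the pool's PGs peer
    ceph pg ls-by-pool foo
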
01:46 PM Bug #52408: osds not peering correctly after startup
My current build is based on upstream commit a49f10e760b4. It has some MDS patches on top, but nothing that should af... Jeff Layton
01:45 PM Bug #52408 (Can't reproduce): osds not peering correctly after startup
I might not have the right terminology here. I have a host that I run 3 VMs on that act as cephadm cluster nodes (mos... Jeff Layton
04:00 AM Bug #50657 (Fix Under Review): smart query on monitors
This fixes the missing sudoers file in mon nodes:
https://github.com/ceph/ceph/pull/42913
We'll address the fix f...
Yaarit Hatuka

08/24/2021

09:54 PM Backport #52336: pacific: ceph df detail reports dirty objects without a cache tier
Deepika Upadhyay wrote:
> https://github.com/ceph/ceph/pull/42860
merged
Yuri Weinstein
09:53 PM Backport #51830: pacific: set a non-zero default value for osd_client_message_cap
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42615
merged
Yuri Weinstein
08:12 PM Backport #51290 (Resolved): pacific: mon: stretch mode clusters do not sanely set default crush r...
Greg Farnum
05:54 PM Backport #51290 (In Progress): pacific: mon: stretch mode clusters do not sanely set default crus...
Greg Farnum
06:31 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2021-08-23_19:24:05-rados-wip-yuri4-testing-2021-08-23-0812-pacific-distro-basic-smithi/6353883 Neha Ojha
06:16 PM Backport #51952: pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing()....
Causing failures in pacific: /a/yuriw-2021-08-23_19:24:05-rados-wip-yuri4-testing-2021-08-23-0812-pacific-distro-basi... Neha Ojha
10:45 AM Bug #50441 (Resolved): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
Dan Mick wrote:
> Deepika, was that the reason why?
Yep Dan, Neha marked it as needs-info because of MB's comment, mark...
Deepika Upadhyay
12:40 AM Bug #52385 (Closed): a possible data loss due to recovery_unfound PG after restarting all nodes
Related to the discussion in ceph-users ML.
https://marc.info/?l=ceph-users&m=162947327817532&w=2
I encountered a...
Satoru Takeuchi

08/23/2021

09:53 PM Bug #50441: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
Deepika, was that the reason why? Dan Mick
08:08 PM Backport #51549 (Resolved): pacific: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42211
m...
Loïc Dachary
08:03 PM Backport #51568 (Resolved): pacific: pool last_epoch_clean floor is stuck after pg merging
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42224
m...
Loïc Dachary
05:54 PM Fix #52329: src/vstart: The command "set config key osd_mclock_max_capacity_iops_ssd" fails with ...
Mon logs showing that the command is capable after the fix is applied:... Sridhar Seshasayee
04:51 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
Just wanted to note that we recently encountered what appears to be the same issue on some Luminous (12.2.12) cluster... Joshua Baergen

08/20/2021

04:14 PM Bug #52255: The pgs state are degraded, but all the osds is up and there is no recovering and bac...
Neha Ojha wrote:
> can you share your osdmap? are all your osds up and in? the crushmap looks fine.
all the osds ...
Ke Xiao
05:30 AM Backport #52337 (In Progress): octopus: ceph df detail reports dirty objects without a cache tier
Deepika Upadhyay
02:36 AM Backport #52337 (Resolved): octopus: ceph df detail reports dirty objects without a cache tier
https://github.com/ceph/ceph/pull/42862 Deepika Upadhyay
03:02 AM Backport #52336: pacific: ceph df detail reports dirty objects without a cache tier
https://github.com/ceph/ceph/pull/42860 Deepika Upadhyay
02:36 AM Backport #52336 (Resolved): pacific: ceph df detail reports dirty objects without a cache tier
https://github.com/ceph/ceph/pull/42860 Deepika Upadhyay
02:36 AM Bug #52335 (Pending Backport): ceph df detail reports dirty objects without a cache tier
Deepika Upadhyay
02:32 AM Bug #52335 (Resolved): ceph df detail reports dirty objects without a cache tier
Description of problem:
'ceph df detail' reports a column for DIRTY objects under POOLS even though cache tiers are ...
Deepika Upadhyay

08/19/2021

10:48 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
I don't have the logs right now, but it prints the state of the PG, so if you search for `snaptrim` in `f0208568-fbf4-48... Arthur Outhenin-Chalandre
08:48 PM Bug #52026 (New): osd: pgs went back into snaptrim state after osd restart
Thanks for providing the logs. Is there a particular PG we should look at in the logs? Neha Ojha
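
PGs that are trimming (or queued to trim) snapshots can be listed directly on releases that expose these states, which narrows down what to grep for:

    # list PGs in the snaptrim / snaptrim_wait states
    ceph pg ls snaptrim
    ceph pg ls snaptrim_wait
    # then follow those pgids in the OSD log
    grep -n snaptrim /var/log/ceph/ceph-osd.0.log
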
09:19 PM Bug #50441: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
I assume because of MB's comment, but that seems now to be historical Dan Mick
09:17 PM Bug #50441: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
Deepika: why is this issue in need-more-info? Looks like the original fix and pacific backport https://github.com/cep... Neha Ojha
09:12 PM Bug #48844 (Duplicate): api_watch_notify: LibRadosWatchNotify.AioWatchDelete failed
Neha Ojha
09:08 PM Bug #52261 (Need More Info): OSD takes all memory and crashes, after pg_num increase
Neha Ojha
09:08 PM Bug #52255 (Need More Info): The pgs state are degraded, but all the osds is up and there is no r...
Neha Ojha
09:08 PM Bug #52255: The pgs state are degraded, but all the osds is up and there is no recovering and bac...
can you share your osdmap? are all your osds up and in? the crushmap looks fine. Neha Ojha
08:54 PM Bug #52319: LibRadosWatchNotify.WatchNotify2 fails
Brad, are you aware of this one? Neha Ojha
03:54 AM Bug #52319 (New): LibRadosWatchNotify.WatchNotify2 fails
2021-08-17T01:34:43.023 INFO:tasks.workunit.client.0.smithi111.stdout: api_watch_notify: [ RUN ] LibRado... Aishwarya Mathuria
08:51 PM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
Adam Kupczyk wrote:
> This leak is from internals of RocksDB.
> We have no access to FileMetaData objects, we canno...
Neha Ojha
07:34 AM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
This leak is from the internals of RocksDB.
We have no access to FileMetaData objects, so we cannot be responsible for this...
Adam Kupczyk
08:48 PM Backport #51549: pacific: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
Deepika Upadhyay wrote:
> https://github.com/ceph/ceph/pull/42211
merged
Yuri Weinstein
08:45 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
Adam, can you start talking a look at this? Neha Ojha
03:24 PM Fix #52329 (Fix Under Review): src/vstart: The command "set config key osd_mclock_max_capacity_io...
Sridhar Seshasayee
02:28 PM Fix #52329 (Resolved): src/vstart: The command "set config key osd_mclock_max_capacity_iops_ssd" ...
The following was observed when bringing up a vstart cluster:... Sridhar Seshasayee
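
Outside the vstart helper, the same option is set with the plain config command (the value is just an example):

    # capacity override for all SSD-backed OSDs
    ceph config set osd osd_mclock_max_capacity_iops_ssd 21500
    # confirm what an individual OSD resolves it to
    ceph config get osd.0 osd_mclock_max_capacity_iops_ssd
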
07:45 AM Backport #52322 (Resolved): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
https://github.com/ceph/ceph/pull/43306 Backport Bot
07:42 AM Bug #51000 (Pending Backport): LibRadosTwoPoolsPP.ManifestSnapRefcount failure
Kefu Chai
04:47 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
I see the same assertion error in this dead job - https://pulpito.ceph.com/yuriw-2021-08-16_21:15:00-rados-wip-yuri-t... Aishwarya Mathuria

08/18/2021

11:19 PM Backport #51569 (In Progress): octopus: pool last_epoch_clean floor is stuck after pg merging
Neha Ojha
09:03 PM Backport #51569: octopus: pool last_epoch_clean floor is stuck after pg merging
https://github.com/ceph/ceph/pull/42837 Steve Taylor
09:53 PM Bug #52316: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum']) == len(mons)
... Neha Ojha
07:18 PM Bug #52316 (Resolved): qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum']) == len(...
2021-08-17T03:12:45.055 INFO:tasks.workunit.client.0.smithi135.stderr:2021-08-17T03:12:45.052+0000 7f27d941a700 1 --... Aishwarya Mathuria
03:50 AM Backport #52307 (Resolved): pacific: doc: clarify use of `rados rm` command
https://github.com/ceph/ceph/pull/51260 Backport Bot
03:50 AM Backport #52306 (Rejected): octopus: doc: clarify use of `rados rm` command
Backport Bot
03:47 AM Bug #52288 (Pending Backport): doc: clarify use of `rados rm` command
Kefu Chai

08/17/2021

04:40 PM Bug #52012 (Fix Under Review): osd/scrub: src/osd/scrub_machine.cc: 55: FAILED ceph_assert(state_...
Neha Ojha
01:35 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
I searched a bit through the log I sent and I don't see any traces of a pg in the snaptrim state, probably because ... Arthur Outhenin-Chalandre
07:12 AM Fix #51116: osd: Run osd bench test to override default max osd capacity for mclock.
Removed the classification of the tracker as a "Feature". This is better classified as a "Fix" with the aim of improv... Sridhar Seshasayee
04:09 AM Bug #52255: The pgs state are degraded, but all the osds is up and there is no recovering and bac...
This is my crushmap Ke Xiao
 
