Activity
From 08/24/2021 to 09/22/2021
09/22/2021
- 10:45 PM Backport #52710 (Resolved): octopus: partial recovery become whole object recovery after restart osd
- https://github.com/ceph/ceph/pull/44165
- 10:35 PM Backport #43623 (Rejected): nautilus: pg: fastinfo incorrect when last_update moves backward in time
- Nautilus is EOL.
- 10:33 PM Backport #44835 (Rejected): nautilus: librados mon_command (mgr) command hang
- Nautilus is EOL.
- 10:32 PM Backport #51523 (Resolved): octopus: osd: Delay sending info to new backfill peer resetting last_...
- 06:18 PM Backport #51555 (In Progress): octopus: mon: return -EINVAL when handling unknown option in 'ceph...
- 06:02 PM Backport #51552 (In Progress): octopus: rebuild-mondb hangs
- 05:39 PM Bug #51527: Ceph osd crashed due to segfault
- I too have run into this segfault in the two latest versions of Octopus -- 15.2.14 and 15.2.13.
I can reproduce it...
- 05:24 PM Bug #52707 (New): mixed pool types reported via telemetry
- In telemetry reports there are clusters with pools of type `replicated` with erasure_code_profile defined, for exampl...
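The inconsistency described above can be flagged mechanically. Below is a minimal Python sketch, assuming each pool entry carries `type` and `erasure_code_profile` keys; the field names are illustrative, not the exact telemetry report schema.

```python
# Sketch: flag pools that report type "replicated" while also carrying an
# erasure_code_profile -- the mixed-type condition described above.
# Field names are assumptions for illustration, not the telemetry schema.

def find_mixed_pools(pools):
    """Return pools whose declared type conflicts with their EC profile."""
    return [
        p for p in pools
        if p.get("type") == "replicated" and p.get("erasure_code_profile")
    ]

pools = [
    {"name": "rbd", "type": "replicated", "erasure_code_profile": ""},
    {"name": "odd", "type": "replicated", "erasure_code_profile": "default"},
    {"name": "ecpool", "type": "erasure", "erasure_code_profile": "default"},
]
print([p["name"] for p in find_mixed_pools(pools)])  # ['odd']
```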
- 09:15 AM Support #52700 (New): OSDs wont start
- We have a three node ceph cluster with 3 baremetal nodes running 4 OSDs each (In total 12 OSDs). Recently, my collegu...
09/21/2021
- 11:51 PM Bug #52694 (Duplicate): src/messages/MOSDPGLog.h: virtual void MOSDPGLog::encode_payload(uint64_t...
- ...
- 08:47 PM Bug #52686 (Fix Under Review): scrub: deep-scrub command does not initiate a scrub
- 11:23 AM Bug #52686 (Fix Under Review): scrub: deep-scrub command does not initiate a scrub
- Following an old change that made operator-initiated scrubs into a type of 'scheduled scrubs':
the operator command ...
- 03:30 PM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
- I think I found an issue in `ECBackend::get_hash_info` that might be responsible for introducing the inconsistency in...
- 12:13 PM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
- Just for the record. In our customer case it was a mix of bluestore and filestore osds. The primary osd was the fails...
- 11:31 AM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
- That's awesome, thanks. The behaviour you suggest sounds sensible to me.
Since it's been a while, I should probabl...
- 10:44 AM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
- We have a customer who experienced the same issue. In our case the hash info was corrupted only on two shards. I have...
- 10:38 AM Bug #48959 (Fix Under Review): Primary OSD crash caused corrupted object and further crashes duri...
- 01:45 PM Bug #52553: pybind: rados.RadosStateError raised when closed watch object goes out of scope after...
- ...
- 11:43 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- ...
- 09:00 AM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- Thank you for your reply.
My cluster has another trouble now. So I'll take these logs after resolving this problem.
09/20/2021
- 04:29 PM Bug #38931 (Resolved): osd does not proactively remove leftover PGs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:24 PM Bug #52335 (Resolved): ceph df detail reports dirty objects without a cache tier
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:23 PM Backport #51966 (Resolved): nautilus: set a non-zero default value for osd_client_message_cap
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42617
m...
- 04:23 PM Backport #51583 (Resolved): nautilus: osd does not proactively remove leftover PGs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42240
m...
- 04:21 PM Backport #52337 (Resolved): octopus: ceph df detail reports dirty objects without a cache tier
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42862
m...
- 04:19 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Patrick Donnelly wrote:
> Neha Ojha wrote:
> > [...]
> >
> > Looks like peering induced by mapping change by the...
- 07:17 AM Bug #52445: OSD asserts on starting too many pushes
- Neha Ojha wrote:
> Thanks, is it possible for you to share the logs using ceph-post-file (https://docs.ceph.com/en/p...
09/19/2021
09/18/2021
- 12:23 PM Bug #52657 (Fix Under Review): MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(featu...
Test: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overr...
- 11:21 AM Backport #51497 (In Progress): nautilus: mgr spamming with repeated set pgp_num_actual while merging
- 11:07 AM Bug #52509: PG merge: PG stuck in premerge+peered state
- We can plan and spent time to setup staging cluster for this and try to reproduce it, if this a bug. With debug_mgr "...
- 11:04 AM Bug #52509: PG merge: PG stuck in premerge+peered state
- Neha, sorry, the logs were already rotated by logrotate.
When I removed all upmaps from this pool, all PG merges passed ...
- 01:08 AM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
- Can global params like mon_max_pg_per_osd be updated at runtime? It seems that mon_max_pg_per_osd is not observed by ...
- 12:38 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Neha Ojha wrote:
> [...]
>
> > Looks like peering induced by mapping change by the balancer. How often does this ha...
- 12:03 AM Bug #52489 (New): Adding a Pacific MON to an Octopus cluster: All PGs inactive
- Chris Dunlop wrote:
> Neha Ojha wrote:
> > This is expected when mons don't form quorum, here it was caused by http...
09/17/2021
- 11:29 PM Bug #52489: Adding a Pacific MON to an Octopus cluster: All PGs inactive
- Neha Ojha wrote:
> This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issu...
- 10:26 PM Bug #52489 (Duplicate): Adding a Pacific MON to an Octopus cluster: All PGs inactive
- This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issues/52488. Let's use ...
- 10:52 PM Bug #52509 (Need More Info): PG merge: PG stuck in premerge+peered state
- This is similar to https://tracker.ceph.com/issues/44684, which has already been fixed. It seems like the pgs are in ...
- 10:30 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- Most likely caused by failure injections, might be worth looking at what was going in the osd logs when we started se...
- 10:13 PM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- Can you share the full set of logs using ceph-post-file (https://docs.ceph.com/en/pacific/man/8/ceph-post-file/)?
- 09:58 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
- Is this reproducible on octopus or pacific, which are not EOL?
- 09:55 PM Bug #52445 (New): OSD asserts on starting too many pushes
- Thanks, is it possible for you to share the logs using ceph-post-file (https://docs.ceph.com/en/pacific/man/8/ceph-po...
- 09:47 PM Bug #52618 (Won't Fix - EOL): Ceph Luminous 12.2.13 OSD assert message
- Please re-open if you happen to see the same issue on a recent release.
- 09:46 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- ...
- 09:35 PM Bug #52640 (Need More Info): when osds out,reduce pool size reports a error "Error ERANGE: pool i...
- We can workaround this by temporarily increasing mon_max_pg_per_osd, right?
- 03:41 AM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
- https://github.com/ceph/ceph/pull/43201
- 03:15 AM Bug #52640 (Need More Info): when osds out,reduce pool size reports a error "Error ERANGE: pool i...
- At first,my cluster have 6 osds,3 pools which pgnum are all 256 and size are 2,mon_max_pg_per_osd is 300.So ,we need ...
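The ERANGE error above can be reasoned about arithmetically. This is a rough Python sketch of the kind of limit check involved (not the actual monitor code), using the reporter's numbers: 3 pools with pg_num 256 and size 2, 6 OSDs, and mon_max_pg_per_osd 300.

```python
# Rough sketch of the kind of limit check behind the ERANGE error above
# (illustrative, not the actual mon code): the projected number of PG
# instances per OSD must stay at or below mon_max_pg_per_osd.

def projected_pgs_per_osd(pools, num_in_osds):
    """pools: list of (pg_num, size) tuples."""
    total_pg_instances = sum(pg_num * size for pg_num, size in pools)
    return total_pg_instances / num_in_osds

mon_max_pg_per_osd = 300
pools = [(256, 2)] * 3        # reporter's setup: 3 pools, pg_num=256, size=2

print(projected_pgs_per_osd(pools, 6))  # 256.0 -> within the limit with 6 osds
print(projected_pgs_per_osd(pools, 4))  # 384.0 -> exceeds 300 once 2 osds are out
```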
- 09:30 PM Bug #52562: Thrashosds read error injection failed with error ENXIO
- Deepika Upadhyay wrote:
> [...]
> /ceph/teuthology-archive/yuriw-2021-09-13_19:12:32-rados-wip-yuri6-testing-2021-0...
- 12:32 PM Bug #52562: Thrashosds read error injection failed with error ENXIO
- ...
- 09:24 PM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
- The log attached has a sha1 ca906d0d7a65c8a598d397b764dd262cce645fe3, is this the first time you encountered this iss...
- 05:56 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- /a/sage-2021-09-16_18:04:19-rados-wip-sage-testing-2021-09-16-1020-distro-basic-smithi/6393058
note that this is m...
- 04:27 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
- /a/yuriw-2021-09-16_18:23:18-rados-wip-yuri2-testing-2021-09-16-0923-distro-basic-smithi/6393474
- 03:46 PM Documentation #35968 (Won't Fix): [doc][jewel] sync documentation "OSD Config Reference" default ...
- 09:41 AM Bug #50351 (Resolved): osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd restar...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:35 AM Backport #51605 (Resolved): pacific: bufferlist::splice() may cause stack corruption in bufferlis...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42976
m...
- 08:35 AM Backport #52644 (In Progress): nautilus: pool last_epoch_clean floor is stuck after pg merging
- 08:34 AM Backport #52644 (Rejected): nautilus: pool last_epoch_clean floor is stuck after pg merging
- https://github.com/ceph/ceph/pull/43204
- 06:09 AM Bug #48508 (Resolved): Donot roundoff the bucket weight while decompiling crush map to source
09/16/2021
- 10:14 PM Backport #51966: nautilus: set a non-zero default value for osd_client_message_cap
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42617
merged
- 10:13 PM Backport #51583: nautilus: osd does not proactively remove leftover PGs
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42240
merged
- 07:01 PM Bug #49697 (Resolved): prime pg temp: unexpected optimization
- fan chen wrote:
> Recently, I find patch "https://github.com/ceph/ceph/commit/023524a26d7e12e7ddfc3537582b1a1cb03af6...
- 11:16 AM Bug #52408 (Can't reproduce): osds not peering correctly after startup
- I rebuilt my cluster yesterday using a container image based on commit d906f946e845, and I'm not able to reproduce th...
- 02:08 AM Bug #52624 (New): qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABIL...
- /ceph/teuthology-archive/pdonnell-2021-09-14_01:17:08-fs-wip-pdonnell-testing-20210910.181451-distro-basic-smithi/638...
09/15/2021
- 09:23 PM Bug #52605 (Fix Under Review): osd: add scrub duration to pg dump
- 06:39 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
- I reran the test 10 times in https://pulpito.ceph.com/nojha-2021-09-14_18:44:41-rados:singleton-pacific-distro-basic-...
- 06:05 PM Bug #52621 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: bad ...
- ...
- 03:20 PM Backport #52620 (Resolved): pacific: partial recovery become whole object recovery after restart osd
- https://github.com/ceph/ceph/pull/43513
- 03:18 PM Bug #52583 (Pending Backport): partial recovery become whole object recovery after restart osd
- 02:50 PM Backport #52337: octopus: ceph df detail reports dirty objects without a cache tier
- Deepika Upadhyay wrote:
> https://github.com/ceph/ceph/pull/42862
merged
- 02:43 PM Bug #50393: CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client...
- https://github.com/ceph/ceph/pull/42498 merged
- 02:18 PM Bug #52618 (Won't Fix - EOL): Ceph Luminous 12.2.13 OSD assert message
- 2021-09-02 14:25:37.173453 7f2235baf700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/...
09/14/2021
- 05:53 PM Feature #52609 (Fix Under Review): New PG states for pending scrubs / repairs
- Request to add new PG states to provide feedback to the admin when a PG scrub/repair is scheduled ( via command line,...
- 01:14 PM Bug #52605 (Resolved): osd: add scrub duration to pg dump
- We would like to add a new column to the pg dump which would give us the time it took for a pg to get scrubbed.
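The value the new column would report is just the delta between a scrub's start and end times. A small Python sketch of that computation follows; the timestamp format mirrors what pg dump uses elsewhere, but the function and field names here are hypothetical, not the implemented interface.

```python
from datetime import datetime

# Sketch of what the proposed "scrub duration" column would report: the
# delta between a scrub's start and end timestamps. The format string
# matches pg dump style timestamps; the helper itself is hypothetical.

FMT = "%Y-%m-%dT%H:%M:%S.%f"

def scrub_duration(scrub_begin, scrub_end):
    """Return the scrub duration in seconds."""
    begin = datetime.strptime(scrub_begin, FMT)
    end = datetime.strptime(scrub_end, FMT)
    return (end - begin).total_seconds()

print(scrub_duration("2021-09-14T12:00:00.000000",
                     "2021-09-14T12:03:30.500000"))  # 210.5
```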
09/13/2021
- 10:48 PM Bug #52562 (Triaged): Thrashosds read error injection failed with error ENXIO
- Looking at the osd and mon logs
Here's when the osd was restarted in revive_osd()...
- 09:05 PM Backport #52596 (Rejected): octopus: make bufferlist::c_str() skip rebuild when it isn't necessary
- 09:05 PM Backport #52595 (Rejected): pacific: make bufferlist::c_str() skip rebuild when it isn't necessary
- 09:03 PM Bug #51725 (Pending Backport): make bufferlist::c_str() skip rebuild when it isn't necessary
- This spares a heap allocation on every received message. Marking for backporting to both octopus and pacific as a lo...
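The optimization above hinges on one observation: flattening is only needed when the list is actually fragmented. This is a minimal Python sketch of that idea (the real change is in Ceph's C++ bufferlist; this toy class is purely illustrative).

```python
# Toy sketch of the c_str() optimization: producing contiguous memory
# requires a rebuild (fresh allocation + copy) only when the list holds
# more than one segment; a single-segment list is already contiguous.

class BufferList:
    """Illustrative stand-in for ceph::bufferlist, not the real API."""

    def __init__(self, segments):
        self.segments = list(segments)
        self.rebuilds = 0           # count how often we had to flatten

    def c_str(self):
        if len(self.segments) > 1:  # only flatten when actually fragmented
            self.segments = [b"".join(self.segments)]
            self.rebuilds += 1
        return self.segments[0] if self.segments else b""

bl = BufferList([b"already contiguous"])
bl.c_str()
print(bl.rebuilds)    # 0 -- rebuild skipped, no extra allocation

frag = BufferList([b"two ", b"parts"])
print(frag.c_str())   # b'two parts'
print(frag.rebuilds)  # 1 -- fragmented list had to be flattened once
```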
- 02:21 PM Bug #45871: Incorrect (0) number of slow requests in health check
- On a...
- 10:35 AM Backport #52586 (Resolved): pacific: src/vstart: The command "set config key osd_mclock_max_capac...
- https://github.com/ceph/ceph/pull/41731
- 10:24 AM Bug #52583: partial recovery become whole object recovery after restart osd
- FIX URL:
https://github.com/ceph/ceph/pull/43146
https://github.com/ceph/ceph/pull/42904
- 09:43 AM Bug #52583 (Resolved): partial recovery become whole object recovery after restart osd
- Problem: After the osd that is undergoing partial recovery is restarted, the data recovery is rolled back from the pa...
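To see why this rollback matters, compare the bytes moved by each mode: partial recovery transfers only the modified extents, while whole-object recovery re-copies every byte of each dirty object. A small illustrative Python sketch (made-up numbers, not Ceph code):

```python
# Illustration of the cost difference behind the bug above: partial
# recovery moves only the modified extents, whole-object recovery
# re-copies entire objects. Object size and write sizes are made up.

OBJECT_SIZE = 4 * 1024 * 1024          # assume 4 MiB RADOS objects

def recovery_bytes(dirty_extents, partial):
    """dirty_extents: {object_id: modified_bytes}."""
    if partial:
        return sum(dirty_extents.values())
    return len(dirty_extents) * OBJECT_SIZE

dirty = {"obj1": 64 * 1024, "obj2": 4096}    # two objects, small writes
print(recovery_bytes(dirty, partial=True))   # 69632  (~68 KiB)
print(recovery_bytes(dirty, partial=False))  # 8388608 (8 MiB after the rollback)
```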
- 05:32 AM Bug #52578 (Fix Under Review): CLI - osd pool rm --help message is wrong or misleading
- CLI - osd pool rm --help message is wrong or misleading
Version-Release number of selected component (if applicabl...
- 04:19 AM Bug #52445: OSD asserts on starting too many pushes
- Neha Ojha wrote:
> Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of cep...
09/09/2021
- 11:09 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
- I got the same assert on 14.2.22 when this scenario is replayed:
1. make rbd pool write-back cache layer
2. restar...
- 10:00 PM Bug #52140 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
- 09:57 PM Bug #52141 (Need More Info): crash: void OSD::load_pgs(): abort
- 09:53 PM Bug #52142 (Duplicate): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
- 09:52 PM Bug #52145 (Duplicate): crash: OSDMapRef OSDService::get_map(epoch_t): assert(ret)
- 09:49 PM Bug #52147 (Duplicate): crash: rocksdb::InstrumentedMutex::Lock()
- 09:48 PM Bug #52148 (Duplicate): crash: pthread_getname_np()
- 09:47 PM Bug #52149 (Duplicate): crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_...
- 09:46 PM Bug #52150 (Won't Fix): crash: bool HealthMonitor::check_member_health(): assert(store_size > 0)
- 09:37 PM Bug #52152 (Duplicate): crash: pthread_getname_np()
- 09:37 PM Bug #52154 (Won't Fix): crash: Infiniband::MemoryManager::Chunk::write(char*, unsigned int)
- RDMA is not being actively worked on.
- 09:32 PM Bug #52155 (Need More Info): crash: pthread_rwlock_rdlock() in queue_want_up_thru
- 09:30 PM Bug #52156 (Duplicate): crash: virtual void OSDMonitor::update_from_paxos(bool*): assert(err == 0)
- 09:28 PM Bug #52158 (Need More Info): crash: ceph::common::PerfCounters::set(int, unsigned long)
- 09:25 PM Bug #52159 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 09:25 PM Bug #52160 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 09:24 PM Bug #52153 (Won't Fix): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:22 PM Bug #52161 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:22 PM Bug #52163 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:20 PM Bug #52165 (Rejected): crash: void MonitorDBStore::clear(std::set<std::__cxx11::basic_string<char...
- A non-zero return value could possibly be due to rocksdb corruption, and there are just 2 clusters reporting this.
- 09:17 PM Bug #52166 (Won't Fix): crash: void Device::binding_port(ceph::common::CephContext*, int): assert...
- RDMA is not being actively worked on, this is one cluster reporting all the crashes.
- 09:15 PM Bug #52167 (Won't Fix): crash: RDMAConnectedSocketImpl::RDMAConnectedSocketImpl(ceph::common::Cep...
- RDMA is not being actively worked on, this is one cluster reporting all the crashes.
- 09:14 PM Bug #52162 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:14 PM Bug #52164 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:13 PM Bug #52168 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:06 PM Bug #52170 (Duplicate): crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: ass...
- 09:05 PM Bug #52171 (Triaged): crash: virtual int RocksDBStore::get(const string&, const string&, ceph::bu...
- Seen on 2 clusters, could be related to some sort of rocksdb corruption.
- 09:01 PM Bug #52173 (Need More Info): crash in ProtocolV2::send_message()
- Seen on 2 octopus clusters.
- 08:45 PM Bug #52189 (Need More Info): crash in AsyncConnection::maybe_start_delay_thread()
- We'll need more information to debug a crash like this.
- 05:25 PM Backport #51605: pacific: bufferlist::splice() may cause stack corruption in bufferlist::rebuild_...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42976
merged
- 05:20 PM Backport #52564 (Resolved): pacific: osd: Add config option to skip running the OSD benchmark on ...
- https://github.com/ceph/ceph/pull/41731
- 05:19 PM Fix #52025 (Pending Backport): osd: Add config option to skip running the OSD benchmark on init.
- Merged https://github.com/ceph/ceph/pull/42604
- 05:17 PM Fix #52329 (Pending Backport): src/vstart: The command "set config key osd_mclock_max_capacity_io...
- 05:10 PM Fix #52329: src/vstart: The command "set config key osd_mclock_max_capacity_iops_ssd" fails with ...
- https://github.com/ceph/ceph/pull/42853 merged
- 04:17 PM Bug #52562 (Closed): Thrashosds read error injection failed with error ENXIO
- /a/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379886
As part of the th...
- 02:24 PM Bug #52523 (Duplicate): Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- 10:15 AM Backport #52557 (Resolved): pacific: pybind: rados.RadosStateError raised when closed watch objec...
- https://github.com/ceph/ceph/pull/51259
- 10:15 AM Backport #52556 (Rejected): octopus: pybind: rados.RadosStateError raised when closed watch objec...
- 10:11 AM Bug #52553 (Pending Backport): pybind: rados.RadosStateError raised when closed watch object goes...
- 07:10 AM Bug #52553 (Fix Under Review): pybind: rados.RadosStateError raised when closed watch object goes...
- 07:06 AM Bug #52553 (Resolved): pybind: rados.RadosStateError raised when closed watch object goes out of ...
- This one is easiest to demonstrate by example. Here's some code:...
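The rados-specific reproducer above is truncated. As a stand-in, here is a generic Python sketch (deliberately not using librados) of the reported hazard: a wrapper object is finalized after its parent client has been shut down, so the failure surfaces at garbage-collection time rather than at the call site. All class and method names below are invented for illustration.

```python
# Generic sketch of the bug pattern: a Watch-like wrapper's finalizer
# calls back into a client that was already shut down. Names are
# hypothetical stand-ins, not the pybind rados API.

errors = []

class Client:
    def __init__(self):
        self.open = True

    def shutdown(self):
        self.open = False

    def unregister_watch(self):
        if not self.open:
            raise RuntimeError("client already shut down")

class Watch:
    """Stand-in for the watch wrapper object."""
    def __init__(self, client):
        self.client = client

    def __del__(self):
        # The finalizer calls back into the client; if the client was torn
        # down first, this fails during GC, far from the original call site.
        try:
            self.client.unregister_watch()
        except RuntimeError as e:
            errors.append(str(e))

client = Client()
watch = Watch(client)
client.shutdown()   # client torn down while the watch still exists
del watch           # watch goes out of scope afterwards (CPython finalizes now)
print(errors)       # ['client already shut down']
```

The fix direction implied by the tracker entry is to make the wrapper's cleanup tolerant of (or ordered before) client shutdown, so finalization order no longer matters.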
09/08/2021
- 09:57 PM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- I got the logs of osd.{3,11,13} during their boot. This data was collected with the log level turned up.
https://...
- 06:57 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- The ticket can be closed from our side - and it may be a duplicate, but I'm not able to say this for sure. But I have...
- 05:30 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- Roland Sommer wrote:
> The cluster is running without any problems since we rolled out the latest dev release from t...
- 01:17 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- The cluster is running without any problems since we rolled out the latest dev release from the pacific branch to all...
- 09:21 AM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- We started rolling out 16.2.5-522-gde2ff323-1bionic from the dev repos on the osd nodes, as there is no release/tag v...
- 05:35 AM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- This could be related to https://tracker.ceph.com/issues/52089
@Roland, could you please upgrade to 16.2.6 and update...
- 04:42 PM Bug #52408: osds not peering correctly after startup
- Erm, in fact, right after doing cephadm bootstrap, before rebooting anything:...
- 04:17 PM Bug #52408: osds not peering correctly after startup
- Odd. The hosts in question are all KVM nodes on the same physical host, so I wouldn't expect networking issues.
I ...
- 03:42 PM Backport #51952 (In Progress): pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log()....
- 09:11 AM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
- Increasing priority, as this has been happening pretty often in the ceph-volume Jenkins jobs recently.
- 09:02 AM Bug #52535 (Need More Info): monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ...
- Seeing failures in ceph-volume CI because the monitor crashes after an OSD gets destroyed....
09/07/2021
- 02:34 PM Backport #50792 (Rejected): nautilus: osd: FAILED ceph_assert(recovering.count(*i)) after non-pri...
- Nautilus is EOL and the backport is too intrusive.
- 01:23 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- I attached another graph showing the increased amount of written data.
- 10:01 AM Bug #52523 (Duplicate): Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- After having run pacific in our low volume staging system for 2 months, yesterday we upgraded our production cluster ...
- 10:55 AM Bug #52513: BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
- PG was actually inconsistent...
09/06/2021
- 06:42 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- Hey Ilya, nope, since the issue was seen in pacific I thought it might be something we backported to recent versions....
- 06:34 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- On osd0 and osd1:...
- 04:20 PM Bug #51799 (Resolved): osd: snaptrim logs to derr at every tick
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:18 PM Bug #52421 (Resolved): test tracker
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:04 PM Backport #52336 (Resolved): pacific: ceph df detail reports dirty objects without a cache tier
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42860
m... - 04:03 PM Backport #51830 (Resolved): pacific: set a non-zero default value for osd_client_message_cap
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42615
m... - 04:02 PM Backport #51290: pacific: mon: stretch mode clusters do not sanely set default crush rules
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42909
m... - 09:20 AM Bug #52513 (New): BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
- We got a crash of two OSDs simultaneously serving 17.7ff [684,768,760] ...
09/05/2021
- 02:34 PM Bug #52509 (Can't reproduce): PG merge: PG stuck in premerge+peered state
- Hi, we got a couple of outages with two PGs stuck in premerge+peered state:...
09/03/2021
- 05:25 PM Bug #52503 (New): cli_generic.sh: slow ops when trying rand write on cache pools
- failing: leads to slow ops: http://qa-proxy.ceph.com/teuthology/yuriw-2021-09-01_19:04:25-rbd-wip-yuri-testing-2021-0...
- 02:13 PM Bug #52445: OSD asserts on starting too many pushes
- Hi,
I managed to set the debug log level using the ceph config set command and captured the log output.
# options before changi...
- 06:44 AM Bug #52445: OSD asserts on starting too many pushes
- Neha Ojha wrote:
> Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of cep...
09/02/2021
- 10:12 PM Bug #44715 (Fix Under Review): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_li...
- 10:10 PM Bug #52408: osds not peering correctly after startup
- Jeff Layton wrote:
> Ok. I wasn't clear on whether I needed to run "ceph config set debug_osd 20" on all the hosts o...
- 09:53 PM Backport #52498 (Rejected): nautilus: test tracker: please ignore
- 09:50 PM Bug #52445 (Need More Info): OSD asserts on starting too many pushes
- Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of ceph -s?
Is this crash...
- 09:42 PM Backport #52497 (Rejected): octopus: test tracker: please ignore
- 08:34 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- Ronen Friedman wrote:
> Some possibly helpful hints:
> 1. In "my" specific instance, the pg address handed over to ...
- 08:34 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- more useful debug logging being added in https://github.com/ceph/ceph/pull/42965
- 05:48 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- I dug into this more today and I am wondering if it has something to do with `_conf->cluster` not being set right (to...
- 01:26 PM Backport #52495 (Rejected): pacific: test tracker: please ignore
- 01:26 PM Bug #52486 (Pending Backport): test tracker: please ignore
- 06:25 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- /a/yuriw-2021-08-27_21:20:08-rados-wip-yuri2-testing-2021-08-27-1207-distro-basic-smithi/6363835
- 01:22 AM Bug #52489 (New): Adding a Pacific MON to an Octopus cluster: All PGs inactive
- I'm in the midst of an upgrade from Octopus to Pacific. Due to issues during the upgrade, rather than simply upgradin...
- 12:42 AM Bug #52488 (New): Pacific mon won't join Octopus mons
- I'm in the midst of an upgrade from Octopus to Pacific. Due to issues with the version of docker available on Debian ...
09/01/2021
- 11:55 PM Bug #52421 (Pending Backport): test tracker
- 07:24 PM Bug #52486 (Closed): test tracker: please ignore
- please ignore
- 05:18 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2021-08-31_22:30:47-rados-wip-yuri8-testing-2021-08-30-0930-pacific-distro-basic-smithi/6369129/remote/smith...
- 03:55 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- Some possibly helpful hints:
1. In "my" specific instance, the pg address handed over to register_and_wake_split_chi...
08/31/2021
- 09:57 PM Bug #50587 (Resolved): mon election storm following osd recreation: huge tcmalloc and ceph::msgr:...
- 09:41 PM Bug #52421 (Resolved): test tracker
- 09:41 PM Backport #52475 (Resolved): octopus: test tracker
- 09:20 PM Backport #52475 (Resolved): octopus: test tracker
- 09:40 PM Backport #52474 (Resolved): nautilus: test tracker
- 08:52 PM Backport #52474 (Resolved): nautilus: test tracker
- 09:40 PM Backport #52466 (Resolved): pacific: test tracker
- 03:44 PM Backport #52466 (Resolved): pacific: test tracker
- 08:59 AM Bug #49697: prime pg temp: unexpected optimization
- Recently, I find patch "https://github.com/ceph/ceph/commit/023524a26d7e12e7ddfc3537582b1a1cb03af69e" can solve my is...
- 03:44 AM Bug #52255: The pgs state are degraded, but all the osds is up and there is no recovering and bac...
- Neha Ojha wrote:
> can you share your osdmap? are all your osds up and in? the crushmap looks fine.
wish to get y...
08/30/2021
- 04:59 PM Bug #52408: osds not peering correctly after startup
- Other requested info from this rebuild of the cluster:...
- 04:57 PM Bug #52408: osds not peering correctly after startup
- Ok. I wasn't clear on whether I needed to run "ceph config set debug_osd 20" on all the hosts or just 1. I ran it on ...
- 02:28 PM Bug #50657: smart query on monitors
- Yaarit Hatuka wrote:
> Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?
We hav...
- 08:56 AM Bug #50657 (Pending Backport): smart query on monitors
- 01:44 PM Backport #51605 (In Progress): pacific: bufferlist::splice() may cause stack corruption in buffer...
- 01:44 PM Backport #51604 (In Progress): octopus: bufferlist::splice() may cause stack corruption in buffer...
- 09:00 AM Backport #52451 (Resolved): octopus: smart query on monitors
- https://github.com/ceph/ceph/pull/44177
- 09:00 AM Backport #52450 (Resolved): pacific: smart query on monitors
- https://github.com/ceph/ceph/pull/44164
- 07:01 AM Bug #52448 (Fix Under Review): osd: pg may get stuck in backfill_toofull after backfill is interr...
- 06:51 AM Bug #52448 (Resolved): osd: pg may get stuck in backfill_toofull after backfill is interrupted du...
- Consider a scenario:
- Data is written to a pool so one osd X is close to full but still lower than nearfull/toofu...
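The scenario above turns on an OSD's usage ratio crossing the backfill threshold mid-backfill. Below is an illustrative Python sketch of that classification, using what are (to my understanding) the default fullness ratios in recent releases; the helper is not actual OSD code.

```python
# Sketch of the fullness classification that gates backfill in the
# scenario above. Ratios are the commonly cited defaults (assumption);
# the function itself is illustrative, not OSD code.

NEARFULL_RATIO = 0.85      # warn threshold
BACKFILLFULL_RATIO = 0.90  # backfill reservations start being refused
FULL_RATIO = 0.95          # writes blocked

def backfill_state(used_ratio):
    """Classify an OSD's fullness as seen by backfill reservation logic."""
    if used_ratio >= FULL_RATIO:
        return "full"
    if used_ratio >= BACKFILLFULL_RATIO:
        return "backfill_toofull"
    if used_ratio >= NEARFULL_RATIO:
        return "nearfull"
    return "ok"

print(backfill_state(0.84))  # ok
print(backfill_state(0.86))  # nearfull
print(backfill_state(0.92))  # backfill_toofull -- the stuck state reported
```

The bug is that once a PG enters backfill_toofull this way, it can fail to leave that state after the interruption, even when space is freed.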
08/28/2021
- 02:59 PM Bug #52445 (New): OSD asserts on starting too many pushes
- I am running a ceph version 15.2.5 cluster. In recent days scrub reported errors and a few pgs failed due to OSDs rando...
08/27/2021
- 04:28 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2021-08-26_18:40:53-rados-wip-yuri7-testing-2021-08-26-0841-distro-basic-smithi/6360450/remote/smithi052/log...
- 01:21 PM Bug #52421 (Pending Backport): test tracker
08/26/2021
- 09:55 PM Bug #52172 (Triaged): crash: ceph::buffer::v15_2_0::create_aligned_in_mempool(unsigned int, unsig...
- 09:51 PM Bug #52174 (Triaged): crash: ceph::buffer::v15_2_0::create_aligned_in_mempool(unsigned int, unsig...
- 09:46 PM Bug #52176 (Duplicate): crash: std::_Rb_tree<boost::intrusive_ptr<AsyncConnection>, boost::intrus...
- 09:41 PM Bug #52178 (Duplicate): crash: virtual void AuthMonitor::update_from_paxos(bool*): assert(ret == 0)
- 09:37 PM Bug #52180 (Duplicate): crash: void pg_missing_set<TrackChanges>::got(const hobject_t&, eversion_...
- 09:37 PM Bug #47299 (New): Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
- 09:33 PM Bug #52183 (Duplicate): crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: ass...
- 09:31 PM Bug #52186 (Duplicate): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
- 09:29 PM Bug #52195 (Duplicate): crash: /lib64/libpthread.so.0(
- 09:26 PM Bug #52190 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:26 PM Bug #52191 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:25 PM Bug #52192 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:25 PM Bug #52193 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug; most likely a failed write to rocksdb.
- 09:25 PM Bug #52197 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug; most likely a failed write to rocksdb.
- 09:23 PM Bug #52198 (Duplicate): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
- 09:22 PM Bug #52199 (Duplicate): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
- 09:21 PM Bug #52200 (Duplicate): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
- 09:18 PM Bug #52207 (Duplicate): crash: std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<ch...
- 09:17 PM Bug #52210 (Closed): crash: CrushWrapper::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)
- One cluster is reporting all the crashes; the decode likely fails due to corrupted on-disk state.
- 09:15 PM Bug #52211 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug; most likely a failed write to rocksdb.
- 09:13 PM Bug #52212 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 09:11 PM Bug #52213 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
- 09:10 PM Bug #52214 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
- 09:10 PM Bug #52217 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
- 09:10 PM Bug #52218 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
- 09:09 PM Bug #44715 (New): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back())->o...
- 09:07 PM Bug #52220: crash: void ECUtil::HashInfo::append(uint64_t, std::map<int, ceph::buffer::v15_2_0::l...
- One cluster is reporting all the crashes.
- 09:06 PM Bug #52221 (Triaged): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
- 09:04 PM Bug #52143 (Duplicate): crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end())
- 09:00 PM Bug #52225: crash: void Thread::create(const char*, size_t): assert(ret == 0)
- One cluster is reporting all the crashes.
- 08:59 PM Bug #52226: crash: PosixNetworkStack::spawn_worker(unsigned int, std::function<void ()>&&)
- One cluster is reporting all the crashes.
- 08:58 PM Bug #52231: crash: std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::al...
- One cluster is reporting all the crashes.
- 08:56 PM Bug #52233: crash: void Infiniband::init(): assert(device)
- One cluster is reporting all the crashes.
- 08:19 PM Feature #52424 (Resolved): [RFE] Limit slow request details to mgr log
- Slow requests can overwhelm the cluster log with too many details, filling up the monitor DB.
There's no need to log... - 08:04 PM Feature #51984: [RFE] Provide warning when the 'require-osd-release' flag does not match current ...
- Please check - https://tracker.ceph.com/issues/52423
- 08:02 PM Feature #52423 (New): Do not allow running enable-msgr2 if cluster don't have osd release set to ...
- Do not allow running enable-msgr2 if the cluster doesn't have the osd release set to nautilus
See also - https://tracker.ceph.... - 07:53 PM Bug #50657: smart query on monitors
- Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?
> Do you have a bug number for... - 07:30 PM Bug #50657: smart query on monitors
- > > Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?
>
> Yes, bare metal d... - 11:00 AM Bug #50657: smart query on monitors
- Yaarit Hatuka wrote:
> This fixes the missing sudoers file in mon nodes:
> https://github.com/ceph/ceph/pull/42913
... - 07:49 PM Bug #52408: osds not peering correctly after startup
- Thanks for providing these logs, but they don't have debug_osd=20 (we need it on all the osds). The pg query for 1.7c...
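For reference, one way to raise the requested debug level on all OSDs and capture a pg query is sketched below (the pg id is the one mentioned in this report; the output filename is illustrative):

```shell
# Raise osd debug logging on all running OSDs immediately
ceph tell osd.\* config set debug_osd 20/20

# Persist the setting across OSD restarts as well
ceph config set osd debug_osd 20/20

# Capture the query output for the pg in question
ceph pg 1.7c query > pg-1.7c-query.json
```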
- 10:46 AM Bug #52408: osds not peering correctly after startup
- Tore down and rebuilt the cluster again using my quincy-based image. This time, I didn't create any filesystems. ceph...
- 06:14 PM Bug #52421 (Resolved): test tracker
- please ignore
- 05:58 PM Bug #52418 (New): workloads/dedup-io-snaps: ceph_assert(!context->check_oldest_snap_flushed(oid, ...
- /a/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356797...
- 05:12 PM Bug #52416 (Resolved): devices: mon devices appear empty when scraping SMART metrics
- When invoking smartctl on mon devices, the device name is empty:...
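For comparison, invoking smartctl by hand (roughly what the scraper does; the device path is an example) shows what a populated device name looks like in the JSON output:

```shell
# Query SMART data as JSON, as the device scraper roughly does
sudo smartctl -x --json=o /dev/sda | jq '.device'
# A healthy scrape reports e.g. {"name": "/dev/sda", "type": "sat", ...}
```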
- 03:56 PM Bug #52415 (Closed): rocksdb: build error with rocksdb-6.22.x
- https://github.com/ceph/ceph/pull/42815
- 03:11 PM Bug #52415: rocksdb: build error with rocksdb-6.22.x
- possibly fixed by https://github.com/ceph/ceph/pull/42815?
- 01:58 PM Bug #52415 (Resolved): rocksdb: build error with rocksdb-6.22.x
- Fedora rawhide (f35, f36) has recently upgraded to rocksdb-6.22.1
Now ceph's rocksdb integration fails to compile... - 04:10 AM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- ...
08/25/2021
- 08:39 PM Bug #52408: osds not peering correctly after startup
- Tore down the old cluster and built a Pacific one (v16.2.5). That one doesn't have the same issue. I'll do a clean te...
- 07:44 PM Bug #52408: osds not peering correctly after startup
- peering info:...
- 06:53 PM Bug #52408: osds not peering correctly after startup
- Nothing in the logs for the crashed osd.0. I think the last thing in the logs was a rocksdb dump. coredumpctl also didn't...
- 06:38 PM Bug #52408: osds not peering correctly after startup
- Jeff Layton wrote:
> This time when I brought it up, one osd didn't go "up". First two bits of info you asked for:
... - 06:17 PM Bug #52408: osds not peering correctly after startup
- This time when I brought it up, one osd didn't go "up". First two bits of info you asked for:...
- 05:33 PM Bug #52408: osds not peering correctly after startup
- 1. Can you try to reproduce this with one pool containing a few PGs?
2. Turn the autoscaler off (ceph osd pool set foo p... - 01:46 PM Bug #52408: osds not peering correctly after startup
- My current build is based on upstream commit a49f10e760b4. It has some MDS patches on top, but nothing that should af...
- 01:45 PM Bug #52408 (Can't reproduce): osds not peering correctly after startup
- I might not have the right terminology here. I have a host that I run 3 VMs on that act as cephadm cluster nodes (mos...
- 04:00 AM Bug #50657 (Fix Under Review): smart query on monitors
- This fixes the missing sudoers file in mon nodes:
https://github.com/ceph/ceph/pull/42913
We'll address the fix f...
08/24/2021
- 09:54 PM Backport #52336: pacific: ceph df detail reports dirty objects without a cache tier
- Deepika Upadhyay wrote:
> https://github.com/ceph/ceph/pull/42860
merged - 09:53 PM Backport #51830: pacific: set a non-zero default value for osd_client_message_cap
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42615
merged - 08:12 PM Backport #51290 (Resolved): pacific: mon: stretch mode clusters do not sanely set default crush r...
- 05:54 PM Backport #51290 (In Progress): pacific: mon: stretch mode clusters do not sanely set default crus...
- 06:31 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2021-08-23_19:24:05-rados-wip-yuri4-testing-2021-08-23-0812-pacific-distro-basic-smithi/6353883
- 06:16 PM Backport #51952: pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing()....
- Causing failures in pacific: /a/yuriw-2021-08-23_19:24:05-rados-wip-yuri4-testing-2021-08-23-0812-pacific-distro-basi...
- 10:45 AM Bug #50441 (Resolved): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
- Dan Mick wrote:
> Deepika, was that the reason why?
yep Dan, Neha marked needs info because of MB's comment, mark... - 12:40 AM Bug #52385 (Closed): a possible data loss due to recovery_unfound PG after restarting all nodes
- Related to the discussion on the ceph-users ML.
https://marc.info/?l=ceph-users&m=162947327817532&w=2
I encountered a...