Activity
From 05/19/2021 to 06/17/2021
06/17/2021
- 11:07 PM Bug #51083: Raw space filling up faster than used space
- Jan-Philipp Litza wrote:
> Yesterday evening we finally managed to upgrade the MDS daemons as well, and that seems t...
- 09:13 PM Bug #51083: Raw space filling up faster than used space
- Patrick: do you understand how upgrading the MDS daemons helped in this case? There is nothing in the osd/bluestore s...
- 09:03 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
- We definitely do not use cache tiering on any of our clusters. On the cluster above, we do use snapshots (via cephfs...
- 08:48 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
- It seems like you are using cache tiering, and there have been similar bugs reported like this. I don't understand why...
- 09:01 PM Bug #51234 (Fix Under Review): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
- 08:57 PM Bug #50842 (Need More Info): pacific: recovery does not complete because of rw_manager lock not ...
- 08:53 PM Backport #51269 (In Progress): octopus: rados/perf: cosbench workloads hang forever
- 07:14 PM Backport #51269 (Resolved): octopus: rados/perf: cosbench workloads hang forever
- https://github.com/ceph/ceph/pull/41922
- 08:42 PM Bug #51074 (Pending Backport): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with...
- marking Pending Backport, needs to be included with https://github.com/ceph/ceph/pull/41731
- 08:40 PM Bug #51168 (Need More Info): ceph-osd state machine crash during peering process
- Can you please attach the osd log for this crash?
- 08:07 PM Bug #51270 (Fix Under Review): mon: stretch mode clusters do not sanely set default crush rules
- 08:03 PM Bug #51270 (Pending Backport): mon: stretch mode clusters do not sanely set default crush rules
- If you do not specify a crush rule when creating a pool, the OSDMonitor picks the default one for you out of the conf...
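Independent of the fix, the default can be sidestepped by naming a rule explicitly; a minimal sketch (rule and pool names are illustrative, not from the ticket):

```shell
# Create a replicated rule explicitly (root "default", failure domain "host").
ceph osd crush rule create-replicated stretch_rule default host

# Name the rule at pool creation instead of relying on the monitor's default.
ceph osd pool create mypool 64 64 replicated stretch_rule

# Alternatively, point the monitor's default at a specific rule id.
ceph config set mon osd_pool_default_crush_rule 1
```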
- 07:12 PM Bug #49139 (Pending Backport): rados/perf: cosbench workloads hang forever
- 02:32 PM Backport #51237: nautilus: rebuild-mondb hangs
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41874
merged
06/16/2021
- 10:40 PM Bug #51254 (New): deep-scrub stat mismatch on last PG in pool
- In the past few weeks, we got inconsistent PGs in deep-scrub a few times, always on the very last PG in the pool:
...
- 07:25 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
- ...
- 07:22 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- /ceph/teuthology-archive/yuriw-2021-06-14_19:20:57-rados-wip-yuri6-testing-2021-06-14-1106-octopus-distro-basic-smith...
- 06:48 PM Bug #50042: rados/test.sh: api_watch_notify failures
- ...
- 02:49 PM Bug #51246 (New): error in open_pools_parallel: rados_write(0.obj) failed with error: -2
- ...
- 01:22 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- > Will this patch be released in 14.2.22?
Yes, the PR has been merged to the nautilus branch, so it will be in the ...
- 12:59 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- We hit this bug yesterday in a nautilus 14.2.18 cluster.
All monitors went down and started crashing on restart.
...
- 02:18 AM Backport #51237 (In Progress): nautilus: rebuild-mondb hangs
- 02:16 AM Backport #51237 (Resolved): nautilus: rebuild-mondb hangs
- https://github.com/ceph/ceph/pull/41874
- 02:13 AM Bug #38219 (Pending Backport): rebuild-mondb hangs
06/15/2021
- 08:05 PM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
- Just to note:
IMO the ceph-bluestore-tool crash is caused by a bug in AvlAllocator and is a duplicate of https://tracker...
- 06:59 PM Backport #51215 (In Progress): pacific: Global Recovery Event never completes
- 12:55 AM Backport #51215 (Resolved): pacific: Global Recovery Event never completes
- Backport PR https://github.com/ceph/ceph/pull/41872
- 06:50 PM Backport #50706: pacific: _delete_some additional unexpected onode list
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41680
merged
- 06:47 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
- @Neha: I did, but am afraid they are lost, the test was from https://pulpito.ceph.com/ideepika-2021-05-17_10:16:28-ra...
- 06:31 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
- @Deepika, do you happen to have the logs saved somewhere?
- 06:41 PM Bug #51234 (Pending Backport): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
- ...
- 06:30 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/osd-dispatch-...
- 06:26 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- Looks very similar...
- 04:12 PM Backport #50750: octopus: max_misplaced was replaced by target_max_misplaced_ratio
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41624
merged
- 12:20 PM Bug #51223 (New): statfs: a cluster with filestore and bluestore OSD's will report bytes_used == ...
- Cluster migrated from Luminous mixed bluestore+filestore OSD's to Nautilus 14.2.21
After last filestore OSD purged f...
- 10:47 AM Bug #49677 (Resolved): debian ceph-common package post-inst clobbers ownership of cephadm log dirs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:47 AM Bug #49781 (Resolved): unittest_mempool.check_shard_select failed
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:44 AM Bug #50501 (Resolved): osd/scheduler/mClockScheduler: Async reservers are not updated with the ov...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:44 AM Bug #50558 (Resolved): Data loss propagation after backfill
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:42 AM Backport #50795 (Resolved): nautilus: mon: spawn loop after mon reinstalled
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41762
m...
- 10:41 AM Backport #50704 (Resolved): nautilus: _delete_some additional unexpected onode list
- 10:36 AM Backport #50153 (Resolved): nautilus: Reproduce https://tracker.ceph.com/issues/48417
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41611
m...
- 10:36 AM Backport #49729 (Resolved): nautilus: debian ceph-common package post-inst clobbers ownership of ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40698
m...
- 10:32 AM Backport #50988: nautilus: mon: slow ops due to osd_failure
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41519
m...
- 09:05 AM Backport #50406: pacific: mon: new monitors may direct MMonJoin to a peon instead of the leader
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41131
m...
- 09:04 AM Backport #50344: pacific: mon: stretch state is inconsistently-maintained on peons, preventing pr...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41130
m...
- 09:04 AM Backport #50794 (Resolved): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41320
m...
- 09:03 AM Backport #50702 (Resolved): pacific: Data loss propagation after backfill
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41236
m...
- 09:03 AM Backport #50606 (Resolved): pacific: osd/scheduler/mClockScheduler: Async reservers are not updat...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41125
m...
- 09:02 AM Backport #49992 (Resolved): pacific: unittest_mempool.check_shard_select failed
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40566
m...
- 12:54 AM Bug #49988 (Pending Backport): Global Recovery Event never completes
06/14/2021
- 03:24 PM Feature #51213 (Resolved): [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
- For now, we do not have a global flag, like `ceph osd set noout` for the pg autoscale feature. We have pool flags[1] ...
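Until such a global flag exists, the autoscaler can only be switched off pool by pool, or via the default applied to new pools; a sketch (pool name illustrative):

```shell
# Disable the autoscaler on a single existing pool.
ceph osd pool set mypool pg_autoscale_mode off

# Change the default mode that newly created pools inherit.
ceph config set global osd_pool_default_pg_autoscale_mode off
```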
- 09:24 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
A related crash happened when I disabled scrubbing:
-1> 2021-06-14T11:17:15.373+0200 7fb9916f5700 -1 /home/...
06/13/2021
- 03:34 PM Bug #51194: PG recovery_unfound after scrub repair failed on primary
- To prevent the user IO from being blocked, we took this action:
1. First, we queried the unfound objects. osd.951 ...
- 01:36 PM Bug #51194 (New): PG recovery_unfound after scrub repair failed on primary
- This comes from a mail I sent to the ceph-users ML: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3...
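The usual commands for inspecting and, as a last resort, clearing unfound objects look roughly like this (a generic sketch; the PG id is illustrative):

```shell
# List the objects a PG reports as unfound (PG id is an example).
ceph pg 2.5 list_unfound

# Inspect peering and recovery state for more detail.
ceph pg 2.5 query

# Last resort: revert unfound objects to their previous version
# (or delete them) so that blocked client IO can proceed.
ceph pg 2.5 mark_unfound_lost revert
```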
- 03:30 PM Backport #51195 (Resolved): pacific: [rfe] increase osd_max_write_op_reply_len default value to 6...
- https://github.com/ceph/ceph/pull/53470
- 03:28 PM Bug #51166 (Pending Backport): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
06/12/2021
06/10/2021
- 08:35 PM Backport #51173 (Rejected): nautilus: regression in ceph daemonperf command output, osd columns a...
- 08:35 PM Backport #51172 (Resolved): pacific: regression in ceph daemonperf command output, osd columns ar...
- https://github.com/ceph/ceph/pull/44175
- 08:35 PM Backport #51171 (Resolved): octopus: regression in ceph daemonperf command output, osd columns ar...
- https://github.com/ceph/ceph/pull/44176
- 08:32 PM Bug #51002 (Pending Backport): regression in ceph daemonperf command output, osd columns aren't v...
- 06:48 PM Backport #50795: nautilus: mon: spawn loop after mon reinstalled
- Dan van der Ster wrote:
> https://github.com/ceph/ceph/pull/41762
merged
- 03:58 PM Bug #51168 (New): ceph-osd state machine crash during peering process
- ...
- 02:22 PM Bug #51166 (Fix Under Review): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
- 02:16 PM Bug #51166 (Resolved): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
- As agreed in #ceph-devel, Sage, Josh, Neha concurring.
- 01:53 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- For the dead jobs, relevant logs have been uploaded to senta02 under /home/sseshasa/recovery_timeout.
Please let me ...
- 10:25 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
- On a 60-node, 1500-HDD cluster running the 16.2.4 release, this issue has become very frequent, especially when RBD writes excee...
06/09/2021
- 01:32 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- From the logs of 6161181, snapshot recovery is not able to proceed since a rwlock on the head version
(3:cb63772d:::...
- 08:07 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- Ran the same test repeatedly (5 times) on master by setting osd_op_queue to 'wpq' and 'mclock_scheduler' on different...
- 10:39 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
- Kefu, yes I did read the update and your effort to find the commit(s) that caused the regression in the standalone te...
- 10:25 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
- Sridhar, please read https://tracker.ceph.com/issues/51074#note-3; that's my finding from the last 3 days.
- 09:44 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
- Raised PR https://github.com/ceph/ceph/pull/41782 to address the test failure.
Please see latest update to https:/...
- 09:05 AM Backport #51151 (Rejected): nautilus: When read failed, ret can not take as data len, in FillInVe...
- 09:05 AM Backport #51150 (Resolved): pacific: When read failed, ret can not take as data len, in FillInVer...
- https://github.com/ceph/ceph/pull/44173
- 09:05 AM Backport #51149 (Resolved): octopus: When read failed, ret can not take as data len, in FillInVer...
- https://github.com/ceph/ceph/pull/44174
- 09:02 AM Bug #51115 (Pending Backport): When read failed, ret can not take as data len, in FillInVerifyExtent
06/08/2021
- 11:10 PM Bug #38219: rebuild-mondb hangs
- http://qa-proxy.ceph.com/teuthology/yuriw-2021-06-08_20:53:36-rados-wip-yuri-testing-2021-06-04-0753-nautilus-distro-...
- 06:39 PM Bug #38219: rebuild-mondb hangs
- 2021-06-04T23:05:38.775 INFO:tasks.ceph.mon.a.smithi071.stderr:/build/ceph-14.2.21-305-gac8fcfa6/src/mon/OSDMonitor.c...
- 07:52 PM Backport #50797 (In Progress): pacific: mon: spawn loop after mon reinstalled
- 05:58 PM Backport #50795: nautilus: mon: spawn loop after mon reinstalled
- https://github.com/ceph/ceph/pull/41762
- 04:50 PM Bug #50681: memstore: apparent memory leak when removing objects
- Sven Anderson wrote:
> Greg Farnum wrote:
> > How long did you wait to see if memory usage dropped? Did you look at...
- 08:48 AM Bug #51074 (Triaged): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad dat...
06/07/2021
- 11:37 AM Backport #51117 (In Progress): pacific: osd: Run osd bench test to override default max osd capac...
- 10:25 AM Backport #51117 (Resolved): pacific: osd: Run osd bench test to override default max osd capacity...
- https://github.com/ceph/ceph/pull/41731
- 10:22 AM Fix #51116 (Resolved): osd: Run osd bench test to override default max osd capacity for mclock.
- 08:28 AM Bug #51115: When read failed, ret can not take as data len, in FillInVerifyExtent
- https://github.com/ceph/ceph/pull/41727
- 08:12 AM Bug #51115 (Fix Under Review): When read failed, ret can not take as data len, in FillInVerifyExtent
- 07:42 AM Bug #51115 (Resolved): When read failed, ret can not take as data len, in FillInVerifyExtent
- When a read fails, such as returning -EIO, FillInVerifyExtent takes ret as the data length.
- 06:59 AM Bug #51083: Raw space filling up faster than used space
- Yesterday evening we finally managed to upgrade the MDS daemons as well, and that seems to have stopped the space was...
06/06/2021
- 11:22 AM Feature #51110 (New): invalidate crc in buffer::ptr::c_str()
- h3. what:
*buffer::ptr* (or more precisely, *buffer::raw*) has the ability to cache CRC codes that are calculated ...
06/05/2021
- 04:14 PM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
- Not able to reproduce this issue locally. Bisecting:
|0331281e8a74d0b744cdcede1db24e7fea4656fc | https://pulpito.c...
- 04:07 PM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
- /a/kchai-2021-06-05_13:57:48-rados-master-distro-basic-smithi/6154221/
- 05:05 AM Bug #50441 (Pending Backport): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
06/04/2021
- 11:01 PM Bug #51030 (Fix Under Review): osd crush during writing to EC pool when enabling jaeger tracing
- 10:44 PM Bug #50943 (Closed): mon crash due to assert failed
- Luminous is EOL; can you please redeploy the monitor and upgrade to a supported version of Ceph? Please reopen this t...
- 09:56 PM Bug #50308 (Resolved): mon: stretch state is inconsistently-maintained on peons, preventing prope...
- 09:55 PM Backport #50344 (Resolved): pacific: mon: stretch state is inconsistently-maintained on peons, pr...
- 09:55 PM Bug #50345 (Resolved): mon: new monitors may direct MMonJoin to a peon instead of the leader
- 09:54 PM Backport #50406 (Resolved): pacific: mon: new monitors may direct MMonJoin to a peon instead of t...
- https://github.com/ceph/ceph/pull/41131
- 09:41 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- https://pulpito.ceph.com/gregf-2021-06-03_20:03:04-rados-pacific-mmonjoin-leader-testing-distro-basic-smithi/6150351/
- 08:56 PM Bug #50853: libcephsqlite: Core dump while running test_libcephsqlite.sh.
- Also a little more information about what the test is doing: this stage is testing that libcephsqlite kills all I/O i...
- 08:41 PM Bug #50853 (Need More Info): libcephsqlite: Core dump while running test_libcephsqlite.sh.
- So, unfortunately I've been unable to get the correct debugging symbols for the core file so I haven't been able to g...
- 07:13 PM Bug #51101 (Resolved): rados/test_envlibrados_for_rocksdb.sh: cmake: symbol lookup error: cmake: ...
- ...
- 06:27 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- /a/yuriw-2021-06-02_18:33:05-rados-wip-yuri3-testing-2021-06-02-0826-pacific-distro-basic-smithi/6147408
- 06:25 PM Bug #48997: rados/singleton/all/recovery-preemption: defer backfill|defer recovery not found in logs
- /a/yuriw-2021-06-02_18:33:05-rados-wip-yuri3-testing-2021-06-02-0826-pacific-distro-basic-smithi/6147404
- 06:24 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- /a/yuriw-2021-06-02_18:33:05-rados-wip-yuri3-testing-2021-06-02-0826-pacific-distro-basic-smithi/6147462 - with logs!
- 04:08 PM Bug #47440: nautilus: valgrind caught leak in Messenger::ms_deliver_verify_authorizer
- ...
- 06:46 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> Yes, "debug paxos = 30" would definitely help! Sorry, I missed it because the previous set of...
- 02:08 AM Bug #50813 (Duplicate): mon/OSDMonitor: should clear new flag when do destroy
- the issue was fixed by https://github.com/ceph/ceph/commit/13393f6108a89973e0415caa61c6025c760a3930
06/03/2021
- 09:49 PM Bug #50775: mds and osd unable to obtain rotating service keys
- Yes, "debug paxos = 30" would definitely help! Sorry, I missed it because the previous set of logs that you shared h...
- 06:12 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> These logs are still weird. Now there is plenty of update_from_paxos log messages but virtual...
- 07:48 PM Bug #51083 (Need More Info): Raw space filling up faster than used space
- We're seeing something strange currently. Our cluster is filling up faster than it should, and I assume it has someth...
- 07:48 PM Backport #50153: nautilus: Reproduce https://tracker.ceph.com/issues/48417
- Dan van der Ster wrote:
> Nautilus still has the buggy code in PG.cc (it was factored out to PeeringState.cc in octo...
- 07:28 PM Backport #49729: nautilus: debian ceph-common package post-inst clobbers ownership of cephadm log...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40698
merged
- 05:52 PM Backport #50704 (In Progress): nautilus: _delete_some additional unexpected onode list
- 04:11 PM Backport #50706 (In Progress): pacific: _delete_some additional unexpected onode list
- 10:34 AM Bug #51076 (Resolved): "wait_for_recovery: failed before timeout expired" during thrashosd test w...
- /a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/6145021
Unfortunately t...
- 09:17 AM Bug #46847: Loss of placement information on OSD reboot
- I had a look at the reproducer and am not entirely sure if it is equivalent to the problem discussed here. It might b...
- 09:07 AM Bug #51074 (Resolved): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad da...
- Observed on Master:
/a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/61450...
- 12:54 AM Bug #47654 (Resolved): test_mon_pg: mon fails to join quorum to due election strategy mismatch
- 12:54 AM Backport #50087 (Resolved): pacific: test_mon_pg: mon fails to join quorum to due election strate...
06/02/2021
- 08:40 PM Backport #50794: pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd res...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41320
merged
- 06:54 PM Backport #50702: pacific: Data loss propagation after backfill
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41236
merged
- 06:53 PM Backport #50606: pacific: osd/scheduler/mClockScheduler: Async reservers are not updated with the...
- Sridhar Seshasayee wrote:
> https://github.com/ceph/ceph/pull/41125
merged
- 06:51 PM Backport #49992: pacific: unittest_mempool.check_shard_select failed
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40566
merged
- 06:46 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2021-05-25_19:21:19-rados-wip-yuri2-testing-2021-05-25-0940-pacific-distro-basic-smithi/6134490
- 06:38 PM Bug #50042: rados/test.sh: api_watch_notify failures
- /a/yuriw-2021-05-25_19:21:19-rados-wip-yuri2-testing-2021-05-25-0940-pacific-distro-basic-smithi/6134471
- 11:46 AM Bug #50903 (Closed): ceph_objectstore_tool: Slow ops reported during the test.
- Closing this since the issue was hit during teuthology testing of my PR: https://github.com/ceph/ceph/pull/41308. Thi...
- 06:57 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
- 06:57 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
- Observed on master:
/a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/61450...
- 06:54 AM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- Observed on master:
/a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/61450...
- 04:14 AM Bug #49962 (Resolved): 'sudo ceph --cluster ceph osd crush tunables default' fails due to valgrin...
- Thanks Radoslaw!
- 03:47 AM Bug #50853 (In Progress): libcephsqlite: Core dump while running test_libcephsqlite.sh.
06/01/2021
- 05:37 PM Bug #50853: libcephsqlite: Core dump while running test_libcephsqlite.sh.
- /a/sage-2021-05-29_16:04:00-rados-wip-sage3-testing-2021-05-29-1009-distro-basic-smithi/6142109
- 05:24 PM Bug #50743: *: crash in pthread_getname_np
- ...
- 11:51 AM Backport #50750 (In Progress): octopus: max_misplaced was replaced by target_max_misplaced_ratio
- 11:50 AM Backport #50705 (In Progress): octopus: _delete_some additional unexpected onode list
- 11:49 AM Backport #50987 (In Progress): octopus: unaligned access to member variables of crush_work_bucket
- 11:48 AM Backport #50796 (In Progress): octopus: mon: spawn loop after mon reinstalled
- 11:47 AM Backport #50790 (In Progress): octopus: osd: write_trunc omitted to clear data digest
- 11:41 AM Backport #50990 (In Progress): octopus: mon: slow ops due to osd_failure
- 09:43 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
- > I set the cluster into "maintenance mode", noout, norebalance, nobackfill, norecover. And then proceeded to reboot ...
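The "maintenance mode" mentioned above is typically entered and left with the cluster-wide OSD flags; a sketch:

```shell
# Before planned maintenance: prevent out-marking, rebalancing,
# backfill and recovery while hosts reboot.
ceph osd set noout
ceph osd set norebalance
ceph osd set nobackfill
ceph osd set norecover

# Once all OSDs are back up, clear the flags again.
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset noout
```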
- 09:24 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
- Could be related to https://github.com/ceph/ceph/pull/40572
- 07:57 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
- https://tracker.ceph.com/issues/48060 is the same
- 09:19 AM Backport #50153: nautilus: Reproduce https://tracker.ceph.com/issues/48417
- Nautilus still has the buggy code in PG.cc (it was factored out to PeeringState.cc in octopus and newer).
I backpo...
- 08:54 AM Backport #50152 (In Progress): octopus: Reproduce https://tracker.ceph.com/issues/48417
- Nathan I've done the manual backport here: https://github.com/ceph/ceph/pull/41609
Copy it to something with backpor...
- 07:58 AM Bug #48060: data loss in EC pool
- I have had exactly the same issue with my cluster - https://tracker.ceph.com/issues/51024 while not even having any d...
- 06:03 AM Bug #51030: osd crush during writing to EC pool when enabling jaeger tracing
- PR: https://github.com/ceph/ceph/pull/41604
- 05:48 AM Bug #51030 (Fix Under Review): osd crush during writing to EC pool when enabling jaeger tracing
- On CentOS 8 (x86_64):
1. compile with -DWITH_JAEGER=ON
2. start a vstart cluster
3. write to the EC pool (i.e. rados benc...
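Those reproduction steps correspond roughly to the following (a sketch against a vstart development build; pool parameters are illustrative):

```shell
# 1. Configure the build with Jaeger tracing enabled.
./do_cmake.sh -DWITH_JAEGER=ON
cd build && ninja

# 2. Start a small development cluster.
MON=1 OSD=3 MDS=0 ../src/vstart.sh -n -d

# 3. Create an EC pool and write to it with rados bench.
./bin/ceph osd pool create ecpool 8 8 erasure
./bin/rados -p ecpool bench 10 write
```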
05/31/2021
- 02:46 PM Bug #50903: ceph_objectstore_tool: Slow ops reported during the test.
- This issue is related to the changes currently under review: https://github.com/ceph/ceph/pull/41308
The ceph_obje...
- 02:01 PM Bug #50688 (Duplicate): Ceph can't be deployed using cephadm on nodes with /32 ip addresses
- 02:01 PM Bug #50688: Ceph can't be deployed using cephadm on nodes with /32 ip addresses
- should have been fixed by https://github.com/ceph/ceph/pull/40961
- 07:56 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
- I forgot to add.
I pulled v15.2.12 on the affected host, and also tried running the OSD in that version. It didn't m...
- 07:55 AM Bug #51024 (New): OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one...
- Good day
I'm currently experiencing the same issue as this gentleman: https://www.mail-archive.com/ceph-users...
05/29/2021
- 08:04 AM Bug #50775: mds and osd unable to obtain rotating service keys
- These logs are still weird. Now there is plenty of update_from_paxos log messages but virtually no paxosservice log ...
- 02:25 AM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- /a/kchai-2021-05-28_13:33:45-rados-wip-kefu-testing-2021-05-28-1806-distro-basic-smithi/6140866
05/27/2021
- 03:38 PM Backport #50988 (Resolved): nautilus: mon: slow ops due to osd_failure
- 03:16 PM Backport #50988: nautilus: mon: slow ops due to osd_failure
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41519
merged
- 07:24 AM Backport #50988 (In Progress): nautilus: mon: slow ops due to osd_failure
- 07:20 AM Backport #50988 (Resolved): nautilus: mon: slow ops due to osd_failure
- https://github.com/ceph/ceph/pull/41519
- 03:06 PM Bug #51002 (Fix Under Review): regression in ceph daemonperf command output, osd columns aren't v...
- 03:03 PM Bug #51002 (Resolved): regression in ceph daemonperf command output, osd columns aren't visible a...
- See the original list of columns from v14.2.11:
------bluefs------- ---------------bluestore--------------- ------...
- 02:11 PM Bug #50950 (Won't Fix): MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causin...
- Mimic is EOL; can you please upgrade to a newer version and re-open this ticket if you continue to see this issue.
- 02:06 PM Bug #51000 (Resolved): LibRadosTwoPoolsPP.ManifestSnapRefcount failure
- ...
- 11:07 AM Bug #43915: leaked Session (alloc from OSD::ms_handle_authentication)
- remote/*/log/valgrind/osd.6.log.gz...
- 09:51 AM Bug #50441 (Fix Under Review): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
- 07:20 AM Backport #50990 (Resolved): octopus: mon: slow ops due to osd_failure
- https://github.com/ceph/ceph/pull/41618
- 07:20 AM Backport #50989 (Resolved): pacific: mon: slow ops due to osd_failure
- https://github.com/ceph/ceph/pull/41982
- 07:20 AM Backport #50987 (Resolved): octopus: unaligned access to member variables of crush_work_bucket
- https://github.com/ceph/ceph/pull/41622
- 07:20 AM Backport #50986 (Resolved): pacific: unaligned access to member variables of crush_work_bucket
- https://github.com/ceph/ceph/pull/41983
- 07:20 AM Backport #50985 (Rejected): nautilus: unaligned access to member variables of crush_work_bucket
- 07:19 AM Bug #50964 (Pending Backport): mon: slow ops due to osd_failure
- 07:18 AM Bug #50978 (Pending Backport): unaligned access to member variables of crush_work_bucket
05/26/2021
- 10:41 AM Bug #50978 (Resolved): unaligned access to member variables of crush_work_bucket
- when compiled with ASan, it complains like...
- 06:55 AM Bug #50775: mds and osd unable to obtain rotating service keys
- I reproduced with "debug mon = 30", "debug monc = 30", "debug auth = 30" and "debug ms = 1" on all daemons; mon.a is l...
- 01:51 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> Out of curiosity, how many iterations of bugshell does it take to reproduce? I might try it o...
05/25/2021
- 10:24 PM Bug #50775: mds and osd unable to obtain rotating service keys
- Out of curiosity, how many iterations of bugshell does it take to reproduce? I might try it on the weekend, but it w...
- 10:21 PM Bug #50775: mds and osd unable to obtain rotating service keys
- The logs that you provided are weird. Some log messages that should be there are not there. For example, I don't se...
- 08:20 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- Here's a bit more info that may be useful. Only because it's a volume already exported to the container out of the bo...
- 04:06 PM Bug #46847: Loss of placement information on OSD reboot
- It seems it does find the data when I issue @ceph pg repeer $pgid@. Observed on MON 14.2.21 with all OSDs 14.2.15.
- 12:30 PM Cleanup #50925 (Fix Under Review): add backfill_unfound test
- 10:24 AM Bug #50950: MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causing PG stuck a...
- And this is what it looks like from top:...
- 10:18 AM Bug #50950: MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causing PG stuck a...
- Finally, I got the cpu killer stack:...
- 09:37 AM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
- Neha, can you draw any conclusions from the above debug_osd=30 log with this issue?
- 09:36 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- Neha Ojha wrote:
> Do the OSDs hitting this assert come up fine on restarting? or are they repeatedly hitting this a...
- 06:46 AM Bug #50657: smart query on monitors
- Sorry, I meant version 16.2.1 (Ubuntu packages), by now 16.2.4 of course
@ceph device ls@ doesn't list any devices... - 06:38 AM Bug #47380: mon: slow ops due to osd_failure
- https://github.com/ceph/ceph/pull/40033 failed to address this issue, i am creating another issue #50964 to track thi...
- 06:37 AM Bug #50964 (Resolved): mon: slow ops due to osd_failure
- ...
05/24/2021
- 09:46 PM Bug #49052 (Resolved): pick_a_shard() always select shard 0
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:42 PM Backport #50701 (Resolved): nautilus: Data loss propagation after backfill
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41238
m...
- 09:35 PM Backport #50793 (Resolved): octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41321
m...
- 09:30 PM Backport #50703 (Resolved): octopus: Data loss propagation after backfill
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41237
m...
- 09:25 PM Backport #49993 (Resolved): octopus: unittest_mempool.check_shard_select failed
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39978
m...
- 09:25 PM Backport #49053 (Resolved): octopus: pick_a_shard() always select shard 0
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39978
m...
- 03:34 PM Bug #50657: smart query on monitors
- Thanks, Jan-Philipp.
I tried to reproduce this issue and get the empty device name, while not having a sudoer perm...
- 11:54 AM Bug #50775: mds and osd unable to obtain rotating service keys
- bugshell is my test case; mon.b is a peon monitor
- 11:25 AM Bug #50775: mds and osd unable to obtain rotating service keys
- wenge song wrote:
> mds.b unable to obtain rotating service keys,this is mds.b log
2021-05-24T18:48:22.934+0800 7...
- 11:18 AM Bug #50775: mds and osd unable to obtain rotating service keys
- mds.b is unable to obtain rotating service keys; this is the mds.b log
- 11:15 AM Bug #50775: mds and osd unable to obtain rotating service keys
- mon leader log
- 07:48 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> Just to be clear, are you saying that if the proposal with the new keys doesn't get sent becau...
- 07:30 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> Can you share the full monitor logs? Specifically, I'm interested in the log where the follow...
- 10:03 AM Bug #50950 (Won't Fix): MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causin...
- I have been using this Mimic cluster (about 530 OSDs) for over a year; recently I found that some particular OSDs randomly run int...
- 02:34 AM Bug #50943 (Closed): mon crash due to assert failed
- Ceph version 12.2.11
3 mons; 1 mon can't start up due to a failed assert
-6> 2021-05-20 16:11:32.755959 7fffd...
05/23/2021
- 08:53 PM Bug #50775: mds and osd unable to obtain rotating service keys
- Just to be clear, are you saying that if the proposal with the new keys doesn't get sent because trigger_propose() re...
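The race being asked about can be pictured with a toy model. This is a sketch only: all names are hypothetical, and the real logic lives in the monitor's C++ Paxos code, where trigger_propose() returns false while another proposal round is already in flight.

```python
# Toy model of the hazard discussed above: staged values only reach
# daemons if something actually (re)triggers a proposal. All names are
# hypothetical; the real logic is in the monitor's C++ Paxos code.

class MiniPaxos:
    def __init__(self):
        self.in_flight = False   # a proposal round is active
        self.pending = []        # values staged for the next round
        self.committed = []      # values daemons can observe

    def stage(self, value):
        self.pending.append(value)

    def trigger_propose(self):
        # Returns False when a round is already in flight; the caller
        # then relies on a later trigger to pick up staged values.
        if self.in_flight:
            return False
        self.in_flight = True
        self.committed.extend(self.pending)
        self.pending.clear()
        return True

    def finish_round(self):
        self.in_flight = False


paxos = MiniPaxos()
paxos.trigger_propose()                  # unrelated round in flight
paxos.stage("new rotating keys")
assert paxos.trigger_propose() is False  # keys are not proposed now
paxos.finish_round()
# Unless something re-triggers a proposal afterwards, the staged keys
# stay pending and daemons time out obtaining rotating service keys.
assert "new rotating keys" not in paxos.committed
paxos.trigger_propose()                  # an explicit retry delivers them
assert "new rotating keys" in paxos.committed
```

In this toy, the lost-update window is exactly the gap between the failed trigger_propose() and whatever next happens to propose; if nothing does, the keys never commit.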
- 08:36 PM Bug #50775: mds and osd unable to obtain rotating service keys
- Can you share the full monitor logs? Specifically, I'm interested in the log where the following excerpt came from
...
05/21/2021
- 09:18 PM Bug #50829: nautilus: valgrind leak in SimpleMessenger
- ...
- 03:11 PM Bug #50681: memstore: apparent memory leak when removing objects
- The ceph-osd had a RES memory footprint of 2.6 GB while I created the files above.
- 03:08 PM Bug #50681: memstore: apparent memory leak when removing objects
- Greg Farnum wrote:
> How long did you wait to see if memory usage dropped? Did you look at any logs or dump any pool...
- 01:57 PM Backport #50793: octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd res...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41321
merged
- 09:22 AM Cleanup #50925 (Fix Under Review): add backfill_unfound test
- Add a teuthology test that uses a scenario similar to the one described in [1].
[1] https://tracker.ceph.com/issues/...
05/20/2021
- 06:07 PM Bug #48385: nautilus: statfs: a cluster with any up but out osd will report bytes_used == stored
- Fixed starting with 14.2.16
- 05:04 PM Backport #50701: nautilus: Data loss propagation after backfill
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41238
merged
- 04:50 PM Backport #50911 (Rejected): nautilus: PGs always go into active+clean+scrubbing+deep+repair in th...
- 04:50 PM Backport #50910 (Rejected): octopus: PGs always go into active+clean+scrubbing+deep+repair in the...
- 04:46 PM Bug #50446: PGs always go into active+clean+scrubbing+deep+repair in the LRC
- This issue exists in nautilus and octopus as well. We might want to take a less intrusive approach for the backports.
- 06:29 AM Bug #50446 (Pending Backport): PGs always go into active+clean+scrubbing+deep+repair in the LRC
- 12:12 PM Bug #50903: ceph_objectstore_tool: Slow ops reported during the test.
- JobId:
/a/sseshasa-2021-05-17_11:08:21-rados-wip-sseshasa-testing-2021-05-17-1504-distro-basic-smithi/6118306
Obs...
- 11:58 AM Bug #50903 (Closed): ceph_objectstore_tool: Slow ops reported during the test.
- 10:05 AM Bug #50775 (Fix Under Review): mds and osd unable to obtain rotating service keys
- 09:57 AM Bug #50775: mds and osd unable to obtain rotating service keys
- wenge song wrote:
> Ilya Dryomov wrote:
> > I posted https://github.com/ceph/ceph/pull/41368, please take a look. ...
- 06:30 AM Backport #50900 (Resolved): pacific: PGs always go into active+clean+scrubbing+deep+repair in the...
- https://github.com/ceph/ceph/pull/42398
- 06:20 AM Backport #50893 (Resolved): pacific: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_s...
- https://github.com/ceph/ceph/pull/46120
- 06:17 AM Bug #50806 (Pending Backport): osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.g...
- 12:40 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
- I think pacific.
- 01:39 AM Bug #50743: *: crash in pthread_getname_np
- oh, I mean in general, not necessarily in this case.
This was opened automatically by a telemetry-to-redmine bot t...
05/19/2021
- 11:32 PM Bug #50813 (Fix Under Review): mon/OSDMonitor: should clear new flag when do destroy
- 03:10 AM Bug #47025: rados/test.sh: api_watch_notify_pp LibRadosWatchNotifyECPP.WatchNotify failed
- There are, at this time, three different versions of this problem as seen in https://tracker.ceph.com/issues/50042#no...
- 03:07 AM Bug #50042: rados/test.sh: api_watch_notify failures
- /a/yuriw-2021-04-30_12:58:14-rados-wip-yuri2-testing-2021-04-29-1501-pacific-distro-basic-smithi/6086155 is the same ...
- 01:37 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> I posted https://github.com/ceph/ceph/pull/41368, please take a look. It's probably not going...
- 12:19 AM Bug #50510: OSD will return -EAGAIN on balance_reads although it can return the data
- If the replica has a log entry for a write on the object more recent than last_complete_ondisk (iirc), it will bounce t...
- 12:05 AM Bug #50510: OSD will return -EAGAIN on balance_reads although it can return the data
- The following indicates that it is not safe to do a balanced read from the secondary at this time. Making the "c...
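The replica-side check discussed in the two comments above can be sketched as follows. This is illustrative only: the names and data layout are hypothetical, the real check lives in the OSD's C++ code, and EAGAIN is the bounce the client sees before retrying on the primary.

```python
# Illustrative sketch (hypothetical names) of the balanced-read check
# described above: a replica refuses to serve a read if its PG log has
# a write to the object newer than what it has durably completed.

import errno

def replica_can_serve_read(obj, pg_log, last_complete_ondisk):
    """Return True if this replica may serve a balanced read on obj."""
    # A logged write past last_complete_ondisk means the replica's
    # on-disk copy of the object may be stale: bounce the read.
    for entry in pg_log:
        if entry["oid"] == obj and entry["version"] > last_complete_ondisk:
            return False
    return True

def handle_balanced_read(obj, pg_log, last_complete_ondisk):
    if not replica_can_serve_read(obj, pg_log, last_complete_ondisk):
        return -errno.EAGAIN   # client retries on the primary
    return 0                   # safe to serve locally

log = [{"oid": "foo", "version": 12}, {"oid": "bar", "version": 9}]
assert handle_balanced_read("foo", log, 10) == -errno.EAGAIN  # newer write to foo
assert handle_balanced_read("bar", log, 10) == 0              # bar fully on disk
```

The design point of the bug report is that this check is conservative: the replica could in some cases return the data anyway, but the current behavior errs toward bouncing with -EAGAIN.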