Activity

From 05/19/2021 to 06/17/2021

06/17/2021

11:07 PM Bug #51083: Raw space filling up faster than used space
Jan-Philipp Litza wrote:
> Yesterday evening we finally managed to upgrade the MDS daemons as well, and that seems t...
Patrick Donnelly
09:13 PM Bug #51083: Raw space filling up faster than used space
Patrick: do you understand how upgrading the MDS daemons helped in this case? There is nothing in the osd/bluestore s... Neha Ojha
09:03 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
We definitely do not use cache tiering on any of our clusters. On the cluster above, we do use snapshots (via cephfs... Andras Pataki
08:48 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
It seems like you are using cache tiering, and there have been similar bugs reported like this. I don't understand why... Neha Ojha
09:01 PM Bug #51234 (Fix Under Review): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
Sage Weil
08:57 PM Bug #50842 (Need More Info): pacific: recovery does not complete because of rw_manager lock not ...
Neha Ojha
08:53 PM Backport #51269 (In Progress): octopus: rados/perf: cosbench workloads hang forever
Deepika Upadhyay
07:14 PM Backport #51269 (Resolved): octopus: rados/perf: cosbench workloads hang forever
https://github.com/ceph/ceph/pull/41922 Deepika Upadhyay
08:42 PM Bug #51074 (Pending Backport): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with...
marking Pending Backport, needs to be included with https://github.com/ceph/ceph/pull/41731 Neha Ojha
08:40 PM Bug #51168 (Need More Info): ceph-osd state machine crash during peering process
Can you please attach the osd log for this crash? Neha Ojha
08:07 PM Bug #51270 (Fix Under Review): mon: stretch mode clusters do not sanely set default crush rules
Greg Farnum
08:03 PM Bug #51270 (Pending Backport): mon: stretch mode clusters do not sanely set default crush rules
If you do not specify a crush rule when creating a pool, the OSDMonitor picks the default one for you out of the conf... Greg Farnum
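The behavior Greg describes can be pictured with a short, purely illustrative C++ sketch (the function and parameter names here are hypothetical, not the actual OSDMonitor code): when a pool is created without an explicit crush rule, the monitor falls back to a configured default, and a stretch-mode cluster needs a stretch-aware rule as that fallback.
<pre>
// Hypothetical illustration only -- not the real OSDMonitor implementation.
// If no crush rule is given at pool creation, pick a default; in stretch
// mode the ordinary default is not a sane choice, because it does not
// spread replicas across both stretch sites.
int pick_default_crush_rule(bool stretch_mode_enabled,
                            int configured_default_rule,  // e.g. from config
                            int stretch_rule)             // stretch-aware rule
{
  if (stretch_mode_enabled) {
    return stretch_rule;
  }
  return configured_default_rule;
}
</pre>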
07:12 PM Bug #49139 (Pending Backport): rados/perf: cosbench workloads hang forever
Deepika Upadhyay
02:32 PM Backport #51237: nautilus: rebuild-mondb hangs
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41874
merged
Yuri Weinstein

06/16/2021

10:40 PM Bug #51254 (New): deep-scrub stat mismatch on last PG in pool
In the past few weeks, we got inconsistent PGs in deep-scrub a few times, always on the very last PG in the pool:
...
Andras Pataki
07:25 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
... Deepika Upadhyay
07:22 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
/ceph/teuthology-archive/yuriw-2021-06-14_19:20:57-rados-wip-yuri6-testing-2021-06-14-1106-octopus-distro-basic-smith... Deepika Upadhyay
06:48 PM Bug #50042: rados/test.sh: api_watch_notify failures
... Deepika Upadhyay
02:49 PM Bug #51246 (New): error in open_pools_parallel: rados_write(0.obj) failed with error: -2
... Deepika Upadhyay
01:22 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
> Will this patch be released in 14.2.22?
Yes, the PR has been merged to the nautilus branch, so it will be in the ...
Dan van der Ster
12:59 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
We hit this bug yesterday in a nautilus 14.2.18 cluster.
All monitors went down and started crashing on restart.
...
Rob Haverkamp
02:18 AM Backport #51237 (In Progress): nautilus: rebuild-mondb hangs
Kefu Chai
02:16 AM Backport #51237 (Resolved): nautilus: rebuild-mondb hangs
https://github.com/ceph/ceph/pull/41874 Backport Bot
02:13 AM Bug #38219 (Pending Backport): rebuild-mondb hangs
Kefu Chai

06/15/2021

08:05 PM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
Just to note:
IMO the ceph-bluestore-tool crash is caused by a bug in AvlAllocator and is a duplicate of https://tracker...
Igor Fedotov
06:59 PM Backport #51215 (In Progress): pacific: Global Recovery Event never completes
Kamoltat (Junior) Sirivadhna
12:55 AM Backport #51215 (Resolved): pacific: Global Recovery Event never completes
Backport PR https://github.com/ceph/ceph/pull/41872 Backport Bot
06:50 PM Backport #50706: pacific: _delete_some additional unexpected onode list
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41680
merged
Yuri Weinstein
06:47 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
@Neha: I did, but I'm afraid they are lost; the test was from https://pulpito.ceph.com/ideepika-2021-05-17_10:16:28-ra... Deepika Upadhyay
06:31 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
@Deepika, do you happen to have the logs saved somewhere? Neha Ojha
06:41 PM Bug #51234 (Pending Backport): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
... Neha Ojha
06:30 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/osd-dispatch-... Neha Ojha
06:26 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
Looks very similar... Neha Ojha
04:12 PM Backport #50750: octopus: max_misplaced was replaced by target_max_misplaced_ratio
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41624
merged
Yuri Weinstein
12:20 PM Bug #51223 (New): statfs: a cluster with filestore and bluestore OSD's will report bytes_used == ...
Cluster migrated from Luminous with mixed bluestore+filestore OSDs to Nautilus 14.2.21.
After the last filestore OSD was purged f...
Konstantin Shalygin
10:47 AM Bug #49677 (Resolved): debian ceph-common package post-inst clobbers ownership of cephadm log dirs
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:47 AM Bug #49781 (Resolved): unittest_mempool.check_shard_select failed
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:44 AM Bug #50501 (Resolved): osd/scheduler/mClockScheduler: Async reservers are not updated with the ov...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:44 AM Bug #50558 (Resolved): Data loss propagation after backfill
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:42 AM Backport #50795 (Resolved): nautilus: mon: spawn loop after mon reinstalled
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41762
m...
Loïc Dachary
10:41 AM Backport #50704 (Resolved): nautilus: _delete_some additional unexpected onode list
Loïc Dachary
10:36 AM Backport #50153 (Resolved): nautilus: Reproduce https://tracker.ceph.com/issues/48417
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41611
m...
Loïc Dachary
10:36 AM Backport #49729 (Resolved): nautilus: debian ceph-common package post-inst clobbers ownership of ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40698
m...
Loïc Dachary
10:32 AM Backport #50988: nautilus: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41519
m...
Loïc Dachary
09:05 AM Backport #50406: pacific: mon: new monitors may direct MMonJoin to a peon instead of the leader
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41131
m...
Loïc Dachary
09:04 AM Backport #50344: pacific: mon: stretch state is inconsistently-maintained on peons, preventing pr...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41130
m...
Loïc Dachary
09:04 AM Backport #50794 (Resolved): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41320
m...
Loïc Dachary
09:03 AM Backport #50702 (Resolved): pacific: Data loss propagation after backfill
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41236
m...
Loïc Dachary
09:03 AM Backport #50606 (Resolved): pacific: osd/scheduler/mClockScheduler: Async reservers are not updat...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41125
m...
Loïc Dachary
09:02 AM Backport #49992 (Resolved): pacific: unittest_mempool.check_shard_select failed
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40566
m...
Loïc Dachary
12:54 AM Bug #49988 (Pending Backport): Global Recovery Event never completes
Neha Ojha

06/14/2021

03:24 PM Feature #51213 (Resolved): [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
For now, we do not have a global flag like `ceph osd set noout` for the pg autoscale feature. We have pool flags[1] ... Vikhyat Umrao
09:24 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())

A related crash happened when I disabled scrubbing:
-1> 2021-06-14T11:17:15.373+0200 7fb9916f5700 -1 /home/...
Andrej Filipcic

06/13/2021

03:34 PM Bug #51194: PG recovery_unfound after scrub repair failed on primary
To prevent the user IO from being blocked, we took this action:
1. First, we queried the unfound objects. osd.951 ...
Dan van der Ster
01:36 PM Bug #51194 (New): PG recovery_unfound after scrub repair failed on primary
This comes from a mail I sent to the ceph-users ML: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3... Dan van der Ster
03:30 PM Backport #51195 (Resolved): pacific: [rfe] increase osd_max_write_op_reply_len default value to 6...
https://github.com/ceph/ceph/pull/53470 Backport Bot
03:28 PM Bug #51166 (Pending Backport): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
Kefu Chai

06/12/2021

12:38 AM Bug #49988 (Resolved): Global Recovery Event never completes
Kefu Chai

06/10/2021

08:35 PM Backport #51173 (Rejected): nautilus: regression in ceph daemonperf command output, osd columns a...
Backport Bot
08:35 PM Backport #51172 (Resolved): pacific: regression in ceph daemonperf command output, osd columns ar...
https://github.com/ceph/ceph/pull/44175 Backport Bot
08:35 PM Backport #51171 (Resolved): octopus: regression in ceph daemonperf command output, osd columns ar...
https://github.com/ceph/ceph/pull/44176 Backport Bot
08:32 PM Bug #51002 (Pending Backport): regression in ceph daemonperf command output, osd columns aren't v...
Igor Fedotov
06:48 PM Backport #50795: nautilus: mon: spawn loop after mon reinstalled
Dan van der Ster wrote:
> https://github.com/ceph/ceph/pull/41762
merged
Yuri Weinstein
03:58 PM Bug #51168 (New): ceph-osd state machine crash during peering process
... Yao Ning
02:22 PM Bug #51166 (Fix Under Review): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
Matt Benjamin
02:16 PM Bug #51166 (Resolved): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
As agreed in #ceph-devel, Sage, Josh, Neha concurring.
Matt Benjamin
01:53 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
For the dead jobs, relevant logs have been uploaded to senta02 under /home/sseshasa/recovery_timeout.
Please let me ...
Sridhar Seshasayee
10:25 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
On a 60-node, 1500-HDD cluster running the 16.2.4 release, this issue becomes very frequent, especially when RBD writes excee... Andrej Filipcic

06/09/2021

01:32 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
From the logs of 6161181, snapshot recovery is not able to proceed since an rwlock on the head version
(3:cb63772d:::...
Sridhar Seshasayee
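The stall Sridhar describes is easier to picture with a small C++ sketch (the type and function names are illustrative, not the actual OSD rw_manager code): recovery of an object has to take its read/write lock, so if an earlier op never releases that lock, every later recovery attempt on the object fails to acquire it and recovery never completes.
<pre>
// Simplified sketch of the failure mode, under assumed names -- not Ceph code.
#include <mutex>

struct ObjectRWState {
  std::mutex rwlock;   // stands in for the per-object read/write lock
};

bool try_recover_object(ObjectRWState& obj) {
  // If a previous holder "leaked" the lock, this acquisition never succeeds
  // and the object stays unrecovered, which from the outside looks like the
  // stuck snapshot recovery described above.
  std::unique_lock<std::mutex> l(obj.rwlock, std::try_to_lock);
  if (!l.owns_lock()) {
    return false;      // lock still held elsewhere; recovery must be retried
  }
  // ... copy the object from a peer; the lock is released here by RAII ...
  return true;
}
</pre>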
08:07 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
Ran the same test repeatedly (5 times) on master by setting osd_op_queue to 'wpq' and 'mclock_scheduler' on different... Sridhar Seshasayee
10:39 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
Kefu, yes I did read the update and your effort to find the commit(s) that caused the regression in the standalone te... Sridhar Seshasayee
10:25 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
Sridhar, please read https://tracker.ceph.com/issues/51074#note-3. That's my finding from the last 3 days. Kefu Chai
09:44 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
Raised PR https://github.com/ceph/ceph/pull/41782 to address the test failure.
Please see latest update to https:/...
Sridhar Seshasayee
09:05 AM Backport #51151 (Rejected): nautilus: When read failed, ret can not take as data len, in FillInVe...
Backport Bot
09:05 AM Backport #51150 (Resolved): pacific: When read failed, ret can not take as data len, in FillInVer...
https://github.com/ceph/ceph/pull/44173 Backport Bot
09:05 AM Backport #51149 (Resolved): octopus: When read failed, ret can not take as data len, in FillInVer...
https://github.com/ceph/ceph/pull/44174 Backport Bot
09:02 AM Bug #51115 (Pending Backport): When read failed, ret can not take as data len, in FillInVerifyExtent
Kefu Chai

06/08/2021

11:10 PM Bug #38219: rebuild-mondb hangs
http://qa-proxy.ceph.com/teuthology/yuriw-2021-06-08_20:53:36-rados-wip-yuri-testing-2021-06-04-0753-nautilus-distro-... Deepika Upadhyay
06:39 PM Bug #38219: rebuild-mondb hangs
2021-06-04T23:05:38.775 INFO:tasks.ceph.mon.a.smithi071.stderr:/build/ceph-14.2.21-305-gac8fcfa6/src/mon/OSDMonitor.c... Deepika Upadhyay
07:52 PM Backport #50797 (In Progress): pacific: mon: spawn loop after mon reinstalled
Neha Ojha
05:58 PM Backport #50795: nautilus: mon: spawn loop after mon reinstalled
https://github.com/ceph/ceph/pull/41762 Dan van der Ster
04:50 PM Bug #50681: memstore: apparent memory leak when removing objects
Sven Anderson wrote:
> Greg Farnum wrote:
> > How long did you wait to see if memory usage dropped? Did you look at...
Greg Farnum
08:48 AM Bug #51074 (Triaged): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad dat...
Kefu Chai

06/07/2021

11:37 AM Backport #51117 (In Progress): pacific: osd: Run osd bench test to override default max osd capac...
Sridhar Seshasayee
10:25 AM Backport #51117 (Resolved): pacific: osd: Run osd bench test to override default max osd capacity...
https://github.com/ceph/ceph/pull/41731 Backport Bot
10:22 AM Fix #51116 (Resolved): osd: Run osd bench test to override default max osd capacity for mclock.
Sridhar Seshasayee
08:28 AM Bug #51115: When read failed, ret can not take as data len, in FillInVerifyExtent
https://github.com/ceph/ceph/pull/41727 yanqiang sun
08:12 AM Bug #51115 (Fix Under Review): When read failed, ret can not take as data len, in FillInVerifyExtent
Kefu Chai
07:42 AM Bug #51115 (Resolved): When read failed, ret can not take as data len, in FillInVerifyExtent
When a read fails, e.g. returns -EIO, FillInVerifyExtent takes ret as the data length. yanqiang sun
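A minimal C++ sketch of the problem, assuming a simplified read-completion path rather than the actual FillInVerifyExtent code (the real fix is in the PR referenced earlier in this thread):
<pre>
// Illustrative sketch only -- not the actual Ceph code. The bug is that a
// failed read returns a negative error code, and using that return value
// directly as the extent length corrupts the result.
#include <cstdint>
#include <map>

void fill_read_extent(int64_t read_ret,            // bytes read, or -errno
                      uint64_t offset,
                      std::map<uint64_t, uint64_t>& extents,
                      int& out_result)
{
  if (read_ret < 0) {
    out_result = static_cast<int>(read_ret);       // propagate the error
    return;                                        // record no length
  }
  out_result = 0;
  extents[offset] = static_cast<uint64_t>(read_ret);  // length only on success
}
</pre>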
06:59 AM Bug #51083: Raw space filling up faster than used space
Yesterday evening we finally managed to upgrade the MDS daemons as well, and that seems to have stopped the space was... Jan-Philipp Litza

06/06/2021

11:22 AM Feature #51110 (New): invalidate crc in buffer::ptr::c_str()
h3. what:
*buffer::ptr* (or more precisely, *buffer::raw*) has the ability to cache CRC codes that are calculated ...
Wenjun Huang
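The feature request above is easier to follow with a small, self-contained C++ sketch (a toy buffer, not Ceph's buffer::ptr/buffer::raw): a cached CRC is only valid while nobody can mutate the underlying bytes, so handing out a writable pointer via c_str() should invalidate the cache.
<pre>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

class CachedCrcBuffer {
  std::vector<uint8_t> data_;
  mutable std::optional<uint32_t> crc_cache_;

public:
  explicit CachedCrcBuffer(std::vector<uint8_t> d) : data_(std::move(d)) {}

  // Compute the CRC once and cache it; reuse the cache on later calls.
  uint32_t crc32() const {
    if (!crc_cache_) {
      crc_cache_ = compute_crc(data_);
    }
    return *crc_cache_;
  }

  // Handing out a writable pointer lets the caller change the bytes, so the
  // cached CRC can no longer be trusted and must be dropped here.
  uint8_t* c_str() {
    crc_cache_.reset();
    return data_.data();
  }

private:
  static uint32_t compute_crc(const std::vector<uint8_t>& d) {
    uint32_t crc = 0xFFFFFFFFu;  // plain CRC-32, for illustration only
    for (uint8_t b : d) {
      crc ^= b;
      for (int i = 0; i < 8; ++i) {
        crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
      }
    }
    return ~crc;
  }
};
</pre>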

06/05/2021

04:14 PM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
Not able to reproduce this issue locally. Bisecting:
|0331281e8a74d0b744cdcede1db24e7fea4656fc | https://pulpito.c...
Kefu Chai
04:07 PM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
/a/kchai-2021-06-05_13:57:48-rados-master-distro-basic-smithi/6154221/ Kefu Chai
05:05 AM Bug #50441 (Pending Backport): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
Kefu Chai

06/04/2021

11:01 PM Bug #51030 (Fix Under Review): osd crush during writing to EC pool when enabling jaeger tracing
Neha Ojha
10:44 PM Bug #50943 (Closed): mon crash due to assert failed
Luminous is EOL; can you please redeploy the monitor and upgrade to a supported version of Ceph? Please reopen this t... Neha Ojha
09:56 PM Bug #50308 (Resolved): mon: stretch state is inconsistently-maintained on peons, preventing prope...
Greg Farnum
09:55 PM Backport #50344 (Resolved): pacific: mon: stretch state is inconsistently-maintained on peons, pr...
Greg Farnum
09:55 PM Bug #50345 (Resolved): mon: new monitors may direct MMonJoin to a peon instead of the leader
Greg Farnum
09:54 PM Backport #50406 (Resolved): pacific: mon: new monitors may direct MMonJoin to a peon instead of t...
https://github.com/ceph/ceph/pull/41131 Greg Farnum
09:41 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://pulpito.ceph.com/gregf-2021-06-03_20:03:04-rados-pacific-mmonjoin-leader-testing-distro-basic-smithi/6150351/ Greg Farnum
08:56 PM Bug #50853: libcephsqlite: Core dump while running test_libcephsqlite.sh.
Also a little more information about what the test is doing: this stage is testing that libcephsqlite kills all I/O i... Patrick Donnelly
08:41 PM Bug #50853 (Need More Info): libcephsqlite: Core dump while running test_libcephsqlite.sh.
So, unfortunately I've been unable to get the correct debugging symbols for the core file so I haven't been able to g... Patrick Donnelly
07:13 PM Bug #51101 (Resolved): rados/test_envlibrados_for_rocksdb.sh: cmake: symbol lookup error: cmake: ...
... Neha Ojha
06:27 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
/a/yuriw-2021-06-02_18:33:05-rados-wip-yuri3-testing-2021-06-02-0826-pacific-distro-basic-smithi/6147408 Neha Ojha
06:25 PM Bug #48997: rados/singleton/all/recovery-preemption: defer backfill|defer recovery not found in logs
/a/yuriw-2021-06-02_18:33:05-rados-wip-yuri3-testing-2021-06-02-0826-pacific-distro-basic-smithi/6147404 Neha Ojha
06:24 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
/a/yuriw-2021-06-02_18:33:05-rados-wip-yuri3-testing-2021-06-02-0826-pacific-distro-basic-smithi/6147462 - with logs! Neha Ojha
04:08 PM Bug #47440: nautilus: valgrind caught leak in Messenger::ms_deliver_verify_authorizer
... Deepika Upadhyay
06:46 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> Yes, "debug paxos = 30" would definitely help! Sorry, I missed it because the previous set of...
wenge song
02:08 AM Bug #50813 (Duplicate): mon/OSDMonitor: should clear new flag when do destroy
The issue has been fixed by https://github.com/ceph/ceph/commit/13393f6108a89973e0415caa61c6025c760a3930 Zengran Zhang

06/03/2021

09:49 PM Bug #50775: mds and osd unable to obtain rotating service keys
Yes, "debug paxos = 30" would definitely help! Sorry, I missed it because the previous set of logs that you shared h... Ilya Dryomov
06:12 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> These logs are still weird. Now there is plenty of update_from_paxos log messages but virtual...
wenge song
07:48 PM Bug #51083 (Need More Info): Raw space filling up faster than used space
We're seeing something strange currently. Our cluster is filling up faster than it should, and I assume it has someth... Jan-Philipp Litza
07:48 PM Backport #50153: nautilus: Reproduce https://tracker.ceph.com/issues/48417
Dan van der Ster wrote:
> Nautilus still has the buggy code in PG.cc (it was factored out to PeeringState.cc in octo...
Yuri Weinstein
07:28 PM Backport #49729: nautilus: debian ceph-common package post-inst clobbers ownership of cephadm log...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40698
merged
Yuri Weinstein
05:52 PM Backport #50704 (In Progress): nautilus: _delete_some additional unexpected onode list
Neha Ojha
04:11 PM Backport #50706 (In Progress): pacific: _delete_some additional unexpected onode list
Neha Ojha
10:34 AM Bug #51076 (Resolved): "wait_for_recovery: failed before timeout expired" during thrashosd test w...
/a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/6145021
Unfortunately t...
Sridhar Seshasayee
09:17 AM Bug #46847: Loss of placement information on OSD reboot
I had a look at the reproducer and am not entirely sure if it is equivalent to the problem discussed here. It might b... Frank Schilder
09:07 AM Bug #51074 (Resolved): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad da...
Observed on Master:
/a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/61450...
Sridhar Seshasayee
12:54 AM Bug #47654 (Resolved): test_mon_pg: mon fails to join quorum to due election strategy mismatch
Greg Farnum
12:54 AM Backport #50087 (Resolved): pacific: test_mon_pg: mon fails to join quorum to due election strate...
Greg Farnum

06/02/2021

08:40 PM Backport #50794: pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd res...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41320
merged
Yuri Weinstein
06:54 PM Backport #50702: pacific: Data loss propagation after backfill
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41236
merged
Yuri Weinstein
06:53 PM Backport #50606: pacific: osd/scheduler/mClockScheduler: Async reservers are not updated with the...
Sridhar Seshasayee wrote:
> https://github.com/ceph/ceph/pull/41125
merged
Yuri Weinstein
06:51 PM Backport #49992: pacific: unittest_mempool.check_shard_select failed
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40566
merged
Yuri Weinstein
06:46 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2021-05-25_19:21:19-rados-wip-yuri2-testing-2021-05-25-0940-pacific-distro-basic-smithi/6134490 Neha Ojha
06:38 PM Bug #50042: rados/test.sh: api_watch_notify failures
/a/yuriw-2021-05-25_19:21:19-rados-wip-yuri2-testing-2021-05-25-0940-pacific-distro-basic-smithi/6134471 Neha Ojha
11:46 AM Bug #50903 (Closed): ceph_objectstore_tool: Slow ops reported during the test.
Closing this since the issue was hit during teuthology testing of my PR: https://github.com/ceph/ceph/pull/41308. Thi... Sridhar Seshasayee
06:57 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
Observed on master:
/a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/61450...
Sridhar Seshasayee
06:54 AM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
Observed on master:
/a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/61450...
Sridhar Seshasayee
04:14 AM Bug #49962 (Resolved): 'sudo ceph --cluster ceph osd crush tunables default' fails due to valgrin...
Thanks Radoslaw! Patrick Donnelly
03:47 AM Bug #50853 (In Progress): libcephsqlite: Core dump while running test_libcephsqlite.sh.
Patrick Donnelly

06/01/2021

05:37 PM Bug #50853: libcephsqlite: Core dump while running test_libcephsqlite.sh.
/a/sage-2021-05-29_16:04:00-rados-wip-sage3-testing-2021-05-29-1009-distro-basic-smithi/6142109
Sage Weil
05:24 PM Bug #50743: *: crash in pthread_getname_np
... Patrick Donnelly
11:51 AM Backport #50750 (In Progress): octopus: max_misplaced was replaced by target_max_misplaced_ratio
Cory Snyder
11:50 AM Backport #50705 (In Progress): octopus: _delete_some additional unexpected onode list
Cory Snyder
11:49 AM Backport #50987 (In Progress): octopus: unaligned access to member variables of crush_work_bucket
Cory Snyder
11:48 AM Backport #50796 (In Progress): octopus: mon: spawn loop after mon reinstalled
Cory Snyder
11:47 AM Backport #50790 (In Progress): octopus: osd: write_trunc omitted to clear data digest
Cory Snyder
11:41 AM Backport #50990 (In Progress): octopus: mon: slow ops due to osd_failure
Cory Snyder
09:43 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
> I set the cluster into "maintenance mode", noout, norebalance, nobackfill, norecover. And then proceeded to reboot ... Dan van der Ster
09:24 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
Could be related to https://github.com/ceph/ceph/pull/40572 Dan van der Ster
07:57 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
https://tracker.ceph.com/issues/48060 is the same Jeremi A
09:19 AM Backport #50153: nautilus: Reproduce https://tracker.ceph.com/issues/48417
Nautilus still has the buggy code in PG.cc (it was factored out to PeeringState.cc in octopus and newer).
I backpo...
Dan van der Ster
08:54 AM Backport #50152 (In Progress): octopus: Reproduce https://tracker.ceph.com/issues/48417
Nathan, I've done the manual backport here: https://github.com/ceph/ceph/pull/41609
Copy it to something with backpor...
Dan van der Ster
07:58 AM Bug #48060: data loss in EC pool
I have had exactly the same issue with my cluster - https://tracker.ceph.com/issues/51024 while not even having any d... Jeremi A
06:03 AM Bug #51030: osd crush during writing to EC pool when enabling jaeger tracing
PR: https://github.com/ceph/ceph/pull/41604 Tomohiro Misono
05:48 AM Bug #51030 (Fix Under Review): osd crush during writing to EC pool when enabling jaeger tracing
On CentOS 8 (x86_64):
1. compile with -DWITH_JAEGER=ON
2. start a vstart cluster
3. write to an EC pool (e.g. rados benc...
Tomohiro Misono

05/31/2021

02:46 PM Bug #50903: ceph_objectstore_tool: Slow ops reported during the test.
This issue is related to the changes currently under review: https://github.com/ceph/ceph/pull/41308
The ceph_obje...
Sridhar Seshasayee
02:01 PM Bug #50688 (Duplicate): Ceph can't be deployed using cephadm on nodes with /32 ip addresses
Kefu Chai
02:01 PM Bug #50688: Ceph can't be deployed using cephadm on nodes with /32 ip addresses
should have been fixed by https://github.com/ceph/ceph/pull/40961 Kefu Chai
07:56 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
I forgot to add:
I pulled v15.2.12 on the affected host and also tried running the OSD with that version. It didn't m...
Jeremi A
07:55 AM Bug #51024 (New): OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one...
Good day,
I'm currently experiencing the same issue as this gentleman: https://www.mail-archive.com/ceph-users...
Jeremi A

05/29/2021

08:04 AM Bug #50775: mds and osd unable to obtain rotating service keys
These logs are still weird. Now there is plenty of update_from_paxos log messages but virtually no paxosservice log ... Ilya Dryomov
02:25 AM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
/a/kchai-2021-05-28_13:33:45-rados-wip-kefu-testing-2021-05-28-1806-distro-basic-smithi/6140866 Kefu Chai

05/27/2021

03:38 PM Backport #50988 (Resolved): nautilus: mon: slow ops due to osd_failure
Kefu Chai
03:16 PM Backport #50988: nautilus: mon: slow ops due to osd_failure
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41519
merged
Yuri Weinstein
07:24 AM Backport #50988 (In Progress): nautilus: mon: slow ops due to osd_failure
Kefu Chai
07:20 AM Backport #50988 (Resolved): nautilus: mon: slow ops due to osd_failure
https://github.com/ceph/ceph/pull/41519 Backport Bot
03:06 PM Bug #51002 (Fix Under Review): regression in ceph daemonperf command output, osd columns aren't v...
Igor Fedotov
03:03 PM Bug #51002 (Resolved): regression in ceph daemonperf command output, osd columns aren't visible a...
See the original list of columns from v14.2.11:
------bluefs------- ---------------bluestore--------------- ------...
Igor Fedotov
02:11 PM Bug #50950 (Won't Fix): MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causin...
Mimic is EOL; can you please upgrade to a newer version and re-open this ticket if you continue to see this issue? Neha Ojha
02:06 PM Bug #51000 (Resolved): LibRadosTwoPoolsPP.ManifestSnapRefcount failure
... Kefu Chai
11:07 AM Bug #43915: leaked Session (alloc from OSD::ms_handle_authentication)
remote/*/log/valgrind/osd.6.log.gz... Deepika Upadhyay
09:51 AM Bug #50441 (Fix Under Review): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
Sebastian Wagner
07:20 AM Backport #50990 (Resolved): octopus: mon: slow ops due to osd_failure
https://github.com/ceph/ceph/pull/41618 Backport Bot
07:20 AM Backport #50989 (Resolved): pacific: mon: slow ops due to osd_failure
https://github.com/ceph/ceph/pull/41982 Backport Bot
07:20 AM Backport #50987 (Resolved): octopus: unaligned access to member variables of crush_work_bucket
https://github.com/ceph/ceph/pull/41622 Backport Bot
07:20 AM Backport #50986 (Resolved): pacific: unaligned access to member variables of crush_work_bucket
https://github.com/ceph/ceph/pull/41983 Backport Bot
07:20 AM Backport #50985 (Rejected): nautilus: unaligned access to member variables of crush_work_bucket
Backport Bot
07:19 AM Bug #50964 (Pending Backport): mon: slow ops due to osd_failure
Kefu Chai
07:18 AM Bug #50978 (Pending Backport): unaligned access to member variables of crush_work_bucket
Kefu Chai

05/26/2021

10:41 AM Bug #50978 (Resolved): unaligned access to member variables of crush_work_bucket
when compiled with ASan, it complains like... Kefu Chai
06:55 AM Bug #50775: mds and osd unable to obtain rotating service keys
I reproduced with "debug mon = 30", "debug monc = 30", "debug auth = 30" and "debug ms = 1" on all daemons; mon.a is l... wenge song
01:51 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> Out of curiosity, how many iterations of bugshell does it take to reproduce? I might try it o...
wenge song

05/25/2021

10:24 PM Bug #50775: mds and osd unable to obtain rotating service keys
Out of curiosity, how many iterations of bugshell does it take to reproduce? I might try it on the weekend, but it w... Ilya Dryomov
10:21 PM Bug #50775: mds and osd unable to obtain rotating service keys
The logs that you provided are weird. Some log messages that should be there are not there. For example, I don't se... Ilya Dryomov
08:20 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
Here's a bit more info that may be useful. Only because it's a volume already exported to the container out of the bo... Andrew Davidoff
04:06 PM Bug #46847: Loss of placement information on OSD reboot
It seems it does find the data when I issue @ceph pg repeer $pgid@. Observed on MON 14.2.21 with all OSDs 14.2.15. Jonas Jelten
12:30 PM Cleanup #50925 (Fix Under Review): add backfill_unfound test
Mykola Golub
10:24 AM Bug #50950: MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causing PG stuck a...
And this is what it looks like from top:... Bin Guo
10:18 AM Bug #50950: MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causing PG stuck a...
Finally, I got the cpu killer stack:... Bin Guo
09:37 AM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
Neha, can you draw any conclusions from the above debug_osd=30 log with this issue? Tobias Urdin
09:36 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
Neha Ojha wrote:
> Do the OSDs hitting this assert come up fine on restarting? or are they repeatedly hitting this a...
Tobias Urdin
06:46 AM Bug #50657: smart query on monitors
Sorry, I meant version 16.2.1 (Ubuntu packages), by now 16.2.4 of course
@ceph device ls@ doesn't list any devices...
Jan-Philipp Litza
06:38 AM Bug #47380: mon: slow ops due to osd_failure
https://github.com/ceph/ceph/pull/40033 failed to address this issue; I am creating another issue #50964 to track thi... Kefu Chai
06:37 AM Bug #50964 (Resolved): mon: slow ops due to osd_failure
... Kefu Chai

05/24/2021

09:46 PM Bug #49052 (Resolved): pick_a_shard() always select shard 0
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
09:42 PM Backport #50701 (Resolved): nautilus: Data loss propagation after backfill
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41238
m...
Loïc Dachary
09:35 PM Backport #50793 (Resolved): octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41321
m...
Loïc Dachary
09:30 PM Backport #50703 (Resolved): octopus: Data loss propagation after backfill
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41237
m...
Loïc Dachary
09:25 PM Backport #49993 (Resolved): octopus: unittest_mempool.check_shard_select failed
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39978
m...
Loïc Dachary
09:25 PM Backport #49053 (Resolved): octopus: pick_a_shard() always select shard 0
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39978
m...
Loïc Dachary
03:34 PM Bug #50657: smart query on monitors
Thanks, Jan-Philipp.
I tried to reproduce this issue and get the empty device name, while not having a sudoer perm...
Yaarit Hatuka
11:54 AM Bug #50775: mds and osd unable to obtain rotating service keys
bugshell is my test case; mon.b is a peon monitor wenge song
11:25 AM Bug #50775: mds and osd unable to obtain rotating service keys
wenge song wrote:
> mds.b unable to obtain rotating service keys,this is mds.b log
2021-05-24T18:48:22.934+0800 7...
wenge song
11:18 AM Bug #50775: mds and osd unable to obtain rotating service keys
mds.b unable to obtain rotating service keys, this is the mds.b log wenge song
11:15 AM Bug #50775: mds and osd unable to obtain rotating service keys
mon leader log wenge song
07:48 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> Just to be clear, are you saying that if the proposal with the new keys doesn't get sent becau...
wenge song
07:30 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> Can you share the full monitor logs? Specifically, I'm interested in the log where the follow...
wenge song
10:03 AM Bug #50950 (Won't Fix): MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causin...
I've been using this mimic cluster (about 530 OSDs) for over a year; recently I found some particular OSDs randomly run int... Bin Guo
02:34 AM Bug #50943 (Closed): mon crash due to assert failed
Ceph version 12.2.11
3 mons; 1 mon can't start up due to an assert failure
-6> 2021-05-20 16:11:32.755959 7fffd...
wencong wan

05/23/2021

08:53 PM Bug #50775: mds and osd unable to obtain rotating service keys
Just to be clear, are you saying that if the proposal with the new keys doesn't get sent because trigger_propose() re... Ilya Dryomov
08:36 PM Bug #50775: mds and osd unable to obtain rotating service keys
Can you share the full monitor logs? Specifically, I'm interested in the log where the following excerpt came from
...
Ilya Dryomov

05/21/2021

09:18 PM Bug #50829: nautilus: valgrind leak in SimpleMessenger
... Neha Ojha
03:11 PM Bug #50681: memstore: apparent memory leak when removing objects
The ceph-osd had a RES memory footprint of 2.6 GB while I created the above files. Sven Anderson
03:08 PM Bug #50681: memstore: apparent memory leak when removing objects
Greg Farnum wrote:
> How long did you wait to see if memory usage dropped? Did you look at any logs or dump any pool...
Sven Anderson
01:57 PM Backport #50793: octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd res...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41321
merged
Yuri Weinstein
09:22 AM Cleanup #50925 (Fix Under Review): add backfill_unfound test
Add a teuthology test that would use a scenario similar to the one described in [1].
[1] https://tracker.ceph.com/issues/...
Mykola Golub

05/20/2021

06:07 PM Bug #48385: nautilus: statfs: a cluster with any up but out osd will report bytes_used == stored
Fixed starting with 14.2.16 Igor Fedotov
05:04 PM Backport #50701: nautilus: Data loss propagation after backfill
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41238
merged
Yuri Weinstein
04:50 PM Backport #50911 (Rejected): nautilus: PGs always go into active+clean+scrubbing+deep+repair in th...
Backport Bot
04:50 PM Backport #50910 (Rejected): octopus: PGs always go into active+clean+scrubbing+deep+repair in the...
Backport Bot
04:46 PM Bug #50446: PGs always go into active+clean+scrubbing+deep+repair in the LRC
This issue exists in nautilus and octopus as well. We might want to take a less intrusive approach for the backports. Neha Ojha
06:29 AM Bug #50446 (Pending Backport): PGs always go into active+clean+scrubbing+deep+repair in the LRC
Kefu Chai
12:12 PM Bug #50903: ceph_objectstore_tool: Slow ops reported during the test.
JobId:
/a/sseshasa-2021-05-17_11:08:21-rados-wip-sseshasa-testing-2021-05-17-1504-distro-basic-smithi/6118306
Obs...
Sridhar Seshasayee
11:58 AM Bug #50903 (Closed): ceph_objectstore_tool: Slow ops reported during the test.
Sridhar Seshasayee
10:05 AM Bug #50775 (Fix Under Review): mds and osd unable to obtain rotating service keys
Kefu Chai
09:57 AM Bug #50775: mds and osd unable to obtain rotating service keys
wenge song wrote:
> Ilya Dryomov wrote:
> > I posted https://github.com/ceph/ceph/pull/41368, please take a look. ...
wenge song
06:30 AM Backport #50900 (Resolved): pacific: PGs always go into active+clean+scrubbing+deep+repair in the...
https://github.com/ceph/ceph/pull/42398 Backport Bot
06:20 AM Backport #50893 (Resolved): pacific: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_s...
https://github.com/ceph/ceph/pull/46120 Backport Bot
06:17 AM Bug #50806 (Pending Backport): osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.g...
Kefu Chai
12:40 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
I think pacific. Myoungwon Oh
01:39 AM Bug #50743: *: crash in pthread_getname_np
Oh, I mean in general, not necessarily in this case.
This was opened automatically by a telemetry-to-redmine bot t...
Yaarit Hatuka

05/19/2021

11:32 PM Bug #50813 (Fix Under Review): mon/OSDMonitor: should clear new flag when do destroy
Neha Ojha
03:10 AM Bug #47025: rados/test.sh: api_watch_notify_pp LibRadosWatchNotifyECPP.WatchNotify failed
There are, at this time, three different versions of this problem as seen in https://tracker.ceph.com/issues/50042#no... Brad Hubbard
03:07 AM Bug #50042: rados/test.sh: api_watch_notify failures
/a/yuriw-2021-04-30_12:58:14-rados-wip-yuri2-testing-2021-04-29-1501-pacific-distro-basic-smithi/6086155 is the same ... Brad Hubbard
01:37 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> I posted https://github.com/ceph/ceph/pull/41368, please take a look. It's probably not going...
wenge song
12:19 AM Bug #50510: OSD will return -EAGAIN on balance_reads although it can return the data
If the replica has a log entry for a write on the object more recent than last_complete_ondisk (iirc), it will bounce t... Samuel Just
12:05 AM Bug #50510: OSD will return -EAGAIN on balance_reads although it can return the data
The following indicates that it is not safe to do a balanced read from the secondary at this time. Making the "c... Neha Ojha
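A rough C++ sketch of the check Samuel describes, simplified and with assumed names rather than the actual PrimaryLogPG logic: the replica compares the newest log entry touching the object against its last_complete_ondisk and bounces the read back to the primary with -EAGAIN if it cannot prove it has the latest data.
<pre>
// Hypothetical illustration of the balanced-read safety check -- not Ceph code.
#include <cerrno>
#include <cstdint>

struct eversion_t {               // simplified stand-in for Ceph's eversion_t
  uint64_t epoch = 0;
  uint64_t version = 0;
  bool operator>(const eversion_t& o) const {
    return epoch != o.epoch ? epoch > o.epoch : version > o.version;
  }
};

int check_balanced_read(const eversion_t& newest_update_for_object,
                        const eversion_t& last_complete_ondisk)
{
  if (newest_update_for_object > last_complete_ondisk) {
    // A write newer than what is known complete on disk exists, so the
    // replica cannot prove it holds the latest data; redirect to the primary.
    return -EAGAIN;
  }
  return 0;  // safe to serve the read from this replica
}
</pre>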
 
