Activity

From 05/28/2021 to 06/26/2021

06/26/2021

02:27 PM Backport #50986: pacific: unaligned access to member variables of crush_work_bucket
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41983
merged
Yuri Weinstein
02:26 PM Backport #50989: pacific: mon: slow ops due to osd_failure
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41982
merged
Yuri Weinstein

06/25/2021

09:43 PM Bug #51101: rados/test_envlibrados_for_rocksdb.sh: cmake: symbol lookup error: cmake: undefined s...
/a/yuriw-2021-06-24_16:54:31-rados-wip-yuri-testing-2021-06-24-0708-pacific-distro-basic-smithi/6190738 Neha Ojha
03:50 PM Backport #51371 (Resolved): pacific: OSD crash FAILED ceph_assert(!is_scrubbing())
https://github.com/ceph/ceph/pull/41944 Backport Bot
03:48 PM Bug #50346 (Pending Backport): OSD crash FAILED ceph_assert(!is_scrubbing())
Neha Ojha
06:48 AM Backport #50990 (Resolved): octopus: mon: slow ops due to osd_failure
Kefu Chai

06/24/2021

11:22 PM Backport #50791 (In Progress): pacific: osd: write_trunc omitted to clear data digest
Neha Ojha

06/23/2021

10:43 PM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
Andrej Filipcic wrote:
> A related crash happened when I disabled scrubbing:
>
> -1> 2021-06-14T11:17:15.373...
Neha Ojha
10:42 PM Bug #51338 (Duplicate): osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>...
Originally reported in https://tracker.ceph.com/issues/50346#note-6... Neha Ojha
06:16 PM Backport #50796: octopus: mon: spawn loop after mon reinstalled
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41621
merged
Yuri Weinstein
03:31 PM Backport #50797: pacific: mon: spawn loop after mon reinstalled
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41768
merged
Yuri Weinstein
03:30 PM Bug #49988: Global Recovery Event never completes
https://github.com/ceph/ceph/pull/41872 merged Yuri Weinstein

06/22/2021

10:47 PM Backport #50986 (In Progress): pacific: unaligned access to member variables of crush_work_bucket
Neha Ojha
10:44 PM Backport #50989 (In Progress): pacific: mon: slow ops due to osd_failure
Neha Ojha
05:21 PM Backport #50705: octopus: _delete_some additional unexpected onode list
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41623
merged
Yuri Weinstein
05:21 PM Backport #50152: octopus: Reproduce https://tracker.ceph.com/issues/48417
Dan van der Ster wrote:
> Nathan I've done the manual backport here: https://github.com/ceph/ceph/pull/41609
> Copy...
Yuri Weinstein
11:23 AM Backport #51315 (In Progress): nautilus: osd:scrub skip some pg
Mykola Golub
10:55 AM Backport #51315 (Resolved): nautilus: osd:scrub skip some pg
https://github.com/ceph/ceph/pull/41973 Mykola Golub
11:11 AM Backport #51314 (In Progress): octopus: osd:scrub skip some pg
Mykola Golub
10:55 AM Backport #51314 (Resolved): octopus: osd:scrub skip some pg
https://github.com/ceph/ceph/pull/41972 Mykola Golub
10:56 AM Backport #51313 (In Progress): pacific: osd:scrub skip some pg
Mykola Golub
10:55 AM Backport #51313 (Resolved): pacific: osd:scrub skip some pg
https://github.com/ceph/ceph/pull/41971 Mykola Golub
10:55 AM Backport #51316 (Duplicate): nautilus: osd:scrub skip some pg
Backport Bot
10:53 AM Bug #49487 (Pending Backport): osd:scrub skip some pg
Mykola Golub
10:28 AM Bug #50346 (Fix Under Review): OSD crash FAILED ceph_assert(!is_scrubbing())
Ronen Friedman
06:59 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
Andrej Filipcic wrote:
> On a 60-node, 1500 HDD cluster, and 16.2.4 release, this issue became very frequent, especi...
玮文 胡

06/21/2021

08:50 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
FYI I tried with ceph/daemon-base:master-24e1f91-pacific-centos-8-x86_64 (the latest non-devel build at this time) ju... Andrew Davidoff
06:07 PM Bug #51307 (Resolved): LibRadosWatchNotify.Watch2Delete fails
... Sage Weil
03:40 PM Bug #51270: mon: stretch mode clusters do not sanely set default crush rules
Accidentally requested backports to Octopus/Nautilus, so nuking those. Greg Farnum
03:40 PM Backport #51289 (Rejected): octopus: mon: stretch mode clusters do not sanely set default crush r...
Accidental backport request Greg Farnum
03:40 PM Backport #51288 (Rejected): nautilus: mon: stretch mode clusters do not sanely set default crush ...
Accidental backport request Greg Farnum

06/20/2021

11:58 AM Bug #50346 (In Progress): OSD crash FAILED ceph_assert(!is_scrubbing())
Ronen Friedman

06/19/2021

03:00 PM Backport #51290 (Resolved): pacific: mon: stretch mode clusters do not sanely set default crush r...
https://github.com/ceph/ceph/pull/42909 Backport Bot
03:00 PM Backport #51289 (Rejected): octopus: mon: stretch mode clusters do not sanely set default crush r...
Backport Bot
03:00 PM Backport #51288 (Rejected): nautilus: mon: stretch mode clusters do not sanely set default crush ...
Backport Bot
02:58 PM Bug #51270 (Pending Backport): mon: stretch mode clusters do not sanely set default crush rules
Kefu Chai
01:15 PM Backport #51287 (Resolved): pacific: LibRadosService.StatusFormat failed, Expected: (0) != (retry...
https://github.com/ceph/ceph/pull/46677 Backport Bot
01:12 PM Bug #51234 (Pending Backport): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
Kefu Chai
01:11 PM Bug #51234 (Resolved): LibRadosService.StatusFormat failed, Expected: (0) != (retry), actual: 0 vs 0
Kefu Chai
02:19 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
I think this one is related?
/ceph/teuthology-archive/pdonnell-2021-06-16_21:26:55-fs-wip-pdonnell-testing-2021061...
Patrick Donnelly

06/18/2021

09:15 PM Bug #51083: Raw space filling up faster than used space
I don't have any ideas from the logs. Moving this back to RADOS. I doubt it has anything to do with CephFS. Patrick Donnelly
11:56 AM Bug #51083: Raw space filling up faster than used space
Patrick Donnelly wrote:
> Scrub is unlikely to help.
I came to the same conclusion after reading the documentatio...
Jan-Philipp Litza

06/17/2021

11:07 PM Bug #51083: Raw space filling up faster than used space
Jan-Philipp Litza wrote:
> Yesterday evening we finally managed to upgrade the MDS daemons as well, and that seems t...
Patrick Donnelly
09:13 PM Bug #51083: Raw space filling up faster than used space
Patrick: do you understand how upgrading the MDS daemons helped in this case? There is nothing in the osd/bluestore s... Neha Ojha
09:03 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
We definitely do not use cache tiering on any of our clusters. On the cluster above, we do use snapshots (via cephfs... Andras Pataki
08:48 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
It seems like you are using cache tiering, and there have been similar bugs reported like this. I don't understand why... Neha Ojha
09:01 PM Bug #51234 (Fix Under Review): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
Sage Weil
08:57 PM Bug #50842 (Need More Info): pacific: recovery does not complete because of rw_manager lock not ...
Neha Ojha
08:53 PM Backport #51269 (In Progress): octopus: rados/perf: cosbench workloads hang forever
Deepika Upadhyay
07:14 PM Backport #51269 (Resolved): octopus: rados/perf: cosbench workloads hang forever
https://github.com/ceph/ceph/pull/41922 Deepika Upadhyay
08:42 PM Bug #51074 (Pending Backport): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with...
marking Pending Backport, needs to be included with https://github.com/ceph/ceph/pull/41731 Neha Ojha
08:40 PM Bug #51168 (Need More Info): ceph-osd state machine crash during peering process
Can you please attach the osd log for this crash? Neha Ojha
08:07 PM Bug #51270 (Fix Under Review): mon: stretch mode clusters do not sanely set default crush rules
Greg Farnum
08:03 PM Bug #51270 (Pending Backport): mon: stretch mode clusters do not sanely set default crush rules
If you do not specify a crush rule when creating a pool, the OSDMonitor picks the default one for you out of the conf... Greg Farnum
07:12 PM Bug #49139 (Pending Backport): rados/perf: cosbench workloads hang forever
Deepika Upadhyay
02:32 PM Backport #51237: nautilus: rebuild-mondb hangs
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41874
merged
Yuri Weinstein

06/16/2021

10:40 PM Bug #51254 (New): deep-scrub stat mismatch on last PG in pool
In the past few weeks, we got inconsistent PGs in deep-scrub a few times, always on the very last PG in the pool:
...
Andras Pataki
07:25 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
... Deepika Upadhyay
07:22 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
/ceph/teuthology-archive/yuriw-2021-06-14_19:20:57-rados-wip-yuri6-testing-2021-06-14-1106-octopus-distro-basic-smith... Deepika Upadhyay
06:48 PM Bug #50042: rados/test.sh: api_watch_notify failures
... Deepika Upadhyay
02:49 PM Bug #51246 (New): error in open_pools_parallel: rados_write(0.obj) failed with error: -2
... Deepika Upadhyay
01:22 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
> Will this patch be released in 14.2.22?
yes the PR has been merged to the nautilus branch, so it will be in the ...
Dan van der Ster
12:59 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
We hit this bug yesterday in a nautilus 14.2.18 cluster.
All monitors went down and started crashing on restart.
...
Rob Haverkamp
02:18 AM Backport #51237 (In Progress): nautilus: rebuild-mondb hangs
Kefu Chai
02:16 AM Backport #51237 (Resolved): nautilus: rebuild-mondb hangs
https://github.com/ceph/ceph/pull/41874 Backport Bot
02:13 AM Bug #38219 (Pending Backport): rebuild-mondb hangs
Kefu Chai

06/15/2021

08:05 PM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
Just to note:
IMO the ceph-bluestore-tool crash is caused by a bug in AvlAllocator and is a duplicate of https://tracker...
Igor Fedotov
06:59 PM Backport #51215 (In Progress): pacific: Global Recovery Event never completes
Kamoltat (Junior) Sirivadhna
12:55 AM Backport #51215 (Resolved): pacific: Global Recovery Event never completes
Backport PR https://github.com/ceph/ceph/pull/41872 Backport Bot
06:50 PM Backport #50706: pacific: _delete_some additional unexpected onode list
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41680
merged
Yuri Weinstein
06:47 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
@Neha: I did, but am afraid they are lost, the test was from https://pulpito.ceph.com/ideepika-2021-05-17_10:16:28-ra... Deepika Upadhyay
06:31 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
@Deepika, do you happen to have the logs saved somewhere? Neha Ojha
06:41 PM Bug #51234 (Pending Backport): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
... Neha Ojha
06:30 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/osd-dispatch-... Neha Ojha
06:26 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
Looks very similar... Neha Ojha
04:12 PM Backport #50750: octopus: max_misplaced was replaced by target_max_misplaced_ratio
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41624
merged
Yuri Weinstein
12:20 PM Bug #51223 (New): statfs: a cluster with filestore and bluestore OSD's will report bytes_used == ...
Cluster migrated from Luminous mixed bluestore+filestore OSD's to Nautilus 14.2.21
After last filestore OSD purged f...
Konstantin Shalygin
10:47 AM Bug #49677 (Resolved): debian ceph-common package post-inst clobbers ownership of cephadm log dirs
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:47 AM Bug #49781 (Resolved): unittest_mempool.check_shard_select failed
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:44 AM Bug #50501 (Resolved): osd/scheduler/mClockScheduler: Async reservers are not updated with the ov...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:44 AM Bug #50558 (Resolved): Data loss propagation after backfill
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:42 AM Backport #50795 (Resolved): nautilus: mon: spawn loop after mon reinstalled
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41762
m...
Loïc Dachary
10:41 AM Backport #50704 (Resolved): nautilus: _delete_some additional unexpected onode list
Loïc Dachary
10:36 AM Backport #50153 (Resolved): nautilus: Reproduce https://tracker.ceph.com/issues/48417
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41611
m...
Loïc Dachary
10:36 AM Backport #49729 (Resolved): nautilus: debian ceph-common package post-inst clobbers ownership of ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40698
m...
Loïc Dachary
10:32 AM Backport #50988: nautilus: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41519
m...
Loïc Dachary
09:05 AM Backport #50406: pacific: mon: new monitors may direct MMonJoin to a peon instead of the leader
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41131
m...
Loïc Dachary
09:04 AM Backport #50344: pacific: mon: stretch state is inconsistently-maintained on peons, preventing pr...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41130
m...
Loïc Dachary
09:04 AM Backport #50794 (Resolved): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41320
m...
Loïc Dachary
09:03 AM Backport #50702 (Resolved): pacific: Data loss propagation after backfill
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41236
m...
Loïc Dachary
09:03 AM Backport #50606 (Resolved): pacific: osd/scheduler/mClockScheduler: Async reservers are not updat...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41125
m...
Loïc Dachary
09:02 AM Backport #49992 (Resolved): pacific: unittest_mempool.check_shard_select failed
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40566
m...
Loïc Dachary
12:54 AM Bug #49988 (Pending Backport): Global Recovery Event never completes
Neha Ojha

06/14/2021

03:24 PM Feature #51213 (Resolved): [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
For now, we do not have a global flag for the PG autoscale feature, like `ceph osd set noout`. We have pool flags[1] ... Vikhyat Umrao
09:24 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())

A related crash happened when I disabled scrubbing:
-1> 2021-06-14T11:17:15.373+0200 7fb9916f5700 -1 /home/...
Andrej Filipcic

06/13/2021

03:34 PM Bug #51194: PG recovery_unfound after scrub repair failed on primary
To prevent the user IO from being blocked, we took this action:
1. First, we queried the unfound objects. osd.951 ...
Dan van der Ster
01:36 PM Bug #51194 (New): PG recovery_unfound after scrub repair failed on primary
This comes from a mail I send to the ceph-users ML: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3... Dan van der Ster
03:30 PM Backport #51195 (Resolved): pacific: [rfe] increase osd_max_write_op_reply_len default value to 6...
https://github.com/ceph/ceph/pull/53470 Backport Bot
03:28 PM Bug #51166 (Pending Backport): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
Kefu Chai

06/12/2021

12:38 AM Bug #49988 (Resolved): Global Recovery Event never completes
Kefu Chai

06/10/2021

08:35 PM Backport #51173 (Rejected): nautilus: regression in ceph daemonperf command output, osd columns a...
Backport Bot
08:35 PM Backport #51172 (Resolved): pacific: regression in ceph daemonperf command output, osd columns ar...
https://github.com/ceph/ceph/pull/44175 Backport Bot
08:35 PM Backport #51171 (Resolved): octopus: regression in ceph daemonperf command output, osd columns ar...
https://github.com/ceph/ceph/pull/44176 Backport Bot
08:32 PM Bug #51002 (Pending Backport): regression in ceph daemonperf command output, osd columns aren't v...
Igor Fedotov
06:48 PM Backport #50795: nautilus: mon: spawn loop after mon reinstalled
Dan van der Ster wrote:
> https://github.com/ceph/ceph/pull/41762
merged
Yuri Weinstein
03:58 PM Bug #51168 (New): ceph-osd state machine crash during peering process
... Yao Ning
02:22 PM Bug #51166 (Fix Under Review): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
Matt Benjamin
02:16 PM Bug #51166 (Resolved): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
As agreed in #ceph-devel, Sage, Josh, Neha concurring.
Matt Benjamin
01:53 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
For the dead jobs, relevant logs have been uploaded to senta02 under /home/sseshasa/recovery_timeout.
Please let me ...
Sridhar Seshasayee
10:25 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
On a 60-node, 1500 HDD cluster, and 16.2.4 release, this issue became very frequent, especially when RBD writes excee... Andrej Filipcic

06/09/2021

01:32 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
From the logs of 6161181 snapshot recovery is not able to proceed since a rwlock on the head version
(3:cb63772d:::...
Sridhar Seshasayee
08:07 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
Ran the same test repeatedly (5 times) on master by setting osd_op_queue to 'wpq' and 'mclock_scheduler' on different... Sridhar Seshasayee
10:39 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
Kefu, yes I did read the update and your effort to find the commit(s) that caused the regression in the standalone te... Sridhar Seshasayee
10:25 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
Sridhar, please read https://tracker.ceph.com/issues/51074#note-3. That's my finding from the last 3 days. Kefu Chai
09:44 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
Raised PR https://github.com/ceph/ceph/pull/41782 to address the test failure.
Please see latest update to https:/...
Sridhar Seshasayee
09:05 AM Backport #51151 (Rejected): nautilus: When read failed, ret can not take as data len, in FillInVe...
Backport Bot
09:05 AM Backport #51150 (Resolved): pacific: When read failed, ret can not take as data len, in FillInVer...
https://github.com/ceph/ceph/pull/44173 Backport Bot
09:05 AM Backport #51149 (Resolved): octopus: When read failed, ret can not take as data len, in FillInVer...
https://github.com/ceph/ceph/pull/44174 Backport Bot
09:02 AM Bug #51115 (Pending Backport): When read failed, ret can not take as data len, in FillInVerifyExtent
Kefu Chai

06/08/2021

11:10 PM Bug #38219: rebuild-mondb hangs
http://qa-proxy.ceph.com/teuthology/yuriw-2021-06-08_20:53:36-rados-wip-yuri-testing-2021-06-04-0753-nautilus-distro-... Deepika Upadhyay
06:39 PM Bug #38219: rebuild-mondb hangs
2021-06-04T23:05:38.775 INFO:tasks.ceph.mon.a.smithi071.stderr:/build/ceph-14.2.21-305-gac8fcfa6/src/mon/OSDMonitor.c... Deepika Upadhyay
07:52 PM Backport #50797 (In Progress): pacific: mon: spawn loop after mon reinstalled
Neha Ojha
05:58 PM Backport #50795: nautilus: mon: spawn loop after mon reinstalled
https://github.com/ceph/ceph/pull/41762 Dan van der Ster
04:50 PM Bug #50681: memstore: apparent memory leak when removing objects
Sven Anderson wrote:
> Greg Farnum wrote:
> > How long did you wait to see if memory usage dropped? Did you look at...
Greg Farnum
08:48 AM Bug #51074 (Triaged): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad dat...
Kefu Chai

06/07/2021

11:37 AM Backport #51117 (In Progress): pacific: osd: Run osd bench test to override default max osd capac...
Sridhar Seshasayee
10:25 AM Backport #51117 (Resolved): pacific: osd: Run osd bench test to override default max osd capacity...
https://github.com/ceph/ceph/pull/41731 Backport Bot
10:22 AM Fix #51116 (Resolved): osd: Run osd bench test to override default max osd capacity for mclock.
Sridhar Seshasayee
08:28 AM Bug #51115: When read failed, ret can not take as data len, in FillInVerifyExtent
https://github.com/ceph/ceph/pull/41727 yanqiang sun
08:12 AM Bug #51115 (Fix Under Review): When read failed, ret can not take as data len, in FillInVerifyExtent
Kefu Chai
07:42 AM Bug #51115 (Resolved): When read failed, ret can not take as data len, in FillInVerifyExtent
When a read fails, e.g. returns -EIO, FillInVerifyExtent takes ret as the data length. yanqiang sun
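A minimal sketch of the pattern the report describes, using hypothetical names (do_read, fill_in_extent) rather than the actual OSD code: a read return value has to be checked for a negative errno before it is stored as an extent length.
<pre>
// Illustrative only; names are hypothetical stand-ins, not the real FillInVerifyExtent code.
#include <cerrno>
#include <cstdint>
#include <iostream>

struct Extent { uint64_t offset = 0; uint64_t length = 0; };

// Stub read that fails the way the report describes.
int do_read(uint64_t /*off*/, uint64_t /*len*/) { return -EIO; }

int fill_in_extent(uint64_t off, uint64_t len, Extent* out) {
  int ret = do_read(off, len);
  out->offset = off;
  if (ret < 0) {
    return ret;                              // propagate the errno, never store it as a length
  }
  out->length = static_cast<uint64_t>(ret);  // only valid once ret >= 0
  return 0;
}

int main() {
  Extent e;
  int r = fill_in_extent(0, 4096, &e);
  std::cout << "result=" << r << " length=" << e.length << "\n";  // negative errno, length stays 0
}
</pre>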
06:59 AM Bug #51083: Raw space filling up faster than used space
Yesterday evening we finally managed to upgrade the MDS daemons as well, and that seems to have stopped the space was... Jan-Philipp Litza

06/06/2021

11:22 AM Feature #51110 (New): invalidate crc in buffer::ptr::c_str()
h3. what:
*buffer::ptr* (or more precisely, *buffer::raw*) has the ability to cache CRC codes that are calculated ...
Wenjun Huang
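A minimal sketch of the proposal, using a simplified hypothetical type (CachedBuffer) rather than the real ceph::buffer classes: a cached CRC has to be dropped whenever a caller obtains a writable pointer into the underlying memory, because the bytes the CRC was computed from may then change.
<pre>
// Illustrative only; a simplified stand-in for CRC caching with invalidation on mutable access.
#include <cstdint>
#include <cstring>
#include <iostream>
#include <optional>
#include <string>

class CachedBuffer {
  std::string data_;
  mutable std::optional<uint32_t> crc_;      // cached checksum, if still valid

  static uint32_t cheap_crc(const std::string& s) {  // placeholder checksum, not a real CRC
    uint32_t h = 0;
    for (unsigned char c : s) h = h * 131 + c;
    return h;
  }

public:
  explicit CachedBuffer(std::string s) : data_(std::move(s)) {}

  uint32_t crc() const {
    if (!crc_) crc_ = cheap_crc(data_);      // compute once, reuse until invalidated
    return *crc_;
  }

  // Writable access: the caller may mutate the bytes, so the cached CRC
  // can no longer be trusted and is invalidated here.
  char* c_str() {
    crc_.reset();
    return data_.data();
  }
};

int main() {
  CachedBuffer b("hello");
  std::cout << b.crc() << "\n";
  std::memcpy(b.c_str(), "jello", 5);        // mutation through the raw pointer
  std::cout << b.crc() << "\n";              // recomputed, not the stale cached value
}
</pre>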

06/05/2021

04:14 PM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
Not able to reproduce this issue locally. Bisecting:
|0331281e8a74d0b744cdcede1db24e7fea4656fc | https://pulpito.c...
Kefu Chai
04:07 PM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
/a/kchai-2021-06-05_13:57:48-rados-master-distro-basic-smithi/6154221/ Kefu Chai
05:05 AM Bug #50441 (Pending Backport): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
Kefu Chai

06/04/2021

11:01 PM Bug #51030 (Fix Under Review): osd crash during writing to EC pool when enabling jaeger tracing
Neha Ojha
10:44 PM Bug #50943 (Closed): mon crash due to assert failed
Luminous is EOL; can you please redeploy the monitor and upgrade to a supported version of Ceph? Please reopen this t... Neha Ojha
09:56 PM Bug #50308 (Resolved): mon: stretch state is inconsistently-maintained on peons, preventing prope...
Greg Farnum
09:55 PM Backport #50344 (Resolved): pacific: mon: stretch state is inconsistently-maintained on peons, pr...
Greg Farnum
09:55 PM Bug #50345 (Resolved): mon: new monitors may direct MMonJoin to a peon instead of the leader
Greg Farnum
09:54 PM Backport #50406 (Resolved): pacific: mon: new monitors may direct MMonJoin to a peon instead of t...
https://github.com/ceph/ceph/pull/41131 Greg Farnum
09:41 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://pulpito.ceph.com/gregf-2021-06-03_20:03:04-rados-pacific-mmonjoin-leader-testing-distro-basic-smithi/6150351/ Greg Farnum
08:56 PM Bug #50853: libcephsqlite: Core dump while running test_libcephsqlite.sh.
Also a little more information about what the test is doing: this stage is testing that libcephsqlite kills all I/O i... Patrick Donnelly
08:41 PM Bug #50853 (Need More Info): libcephsqlite: Core dump while running test_libcephsqlite.sh.
So, unfortunately I've been unable to get the correct debugging symbols for the core file so I haven't been able to g... Patrick Donnelly
07:13 PM Bug #51101 (Resolved): rados/test_envlibrados_for_rocksdb.sh: cmake: symbol lookup error: cmake: ...
... Neha Ojha
06:27 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
/a/yuriw-2021-06-02_18:33:05-rados-wip-yuri3-testing-2021-06-02-0826-pacific-distro-basic-smithi/6147408 Neha Ojha
06:25 PM Bug #48997: rados/singleton/all/recovery-preemption: defer backfill|defer recovery not found in logs
/a/yuriw-2021-06-02_18:33:05-rados-wip-yuri3-testing-2021-06-02-0826-pacific-distro-basic-smithi/6147404 Neha Ojha
06:24 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
/a/yuriw-2021-06-02_18:33:05-rados-wip-yuri3-testing-2021-06-02-0826-pacific-distro-basic-smithi/6147462 - with logs! Neha Ojha
04:08 PM Bug #47440: nautilus: valgrind caught leak in Messenger::ms_deliver_verify_authorizer
... Deepika Upadhyay
06:46 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> Yes, "debug paxos = 30" would definitely help! Sorry, I missed it because the previous set of...
wenge song
02:08 AM Bug #50813 (Duplicate): mon/OSDMonitor: should clear new flag when do destroy
the issue was fixed by https://github.com/ceph/ceph/commit/13393f6108a89973e0415caa61c6025c760a3930 Zengran Zhang

06/03/2021

09:49 PM Bug #50775: mds and osd unable to obtain rotating service keys
Yes, "debug paxos = 30" would definitely help! Sorry, I missed it because the previous set of logs that you shared h... Ilya Dryomov
06:12 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> These logs are still weird. Now there are plenty of update_from_paxos log messages but virtual...
wenge song
07:48 PM Bug #51083 (Need More Info): Raw space filling up faster than used space
We're seeing something strange currently. Our cluster is filling up faster than it should, and I assume it has someth... Jan-Philipp Litza
07:48 PM Backport #50153: nautilus: Reproduce https://tracker.ceph.com/issues/48417
Dan van der Ster wrote:
> Nautilus still has the buggy code in PG.cc (it was factored out to PeeringState.cc in octo...
Yuri Weinstein
07:28 PM Backport #49729: nautilus: debian ceph-common package post-inst clobbers ownership of cephadm log...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40698
merged
Yuri Weinstein
05:52 PM Backport #50704 (In Progress): nautilus: _delete_some additional unexpected onode list
Neha Ojha
04:11 PM Backport #50706 (In Progress): pacific: _delete_some additional unexpected onode list
Neha Ojha
10:34 AM Bug #51076 (Resolved): "wait_for_recovery: failed before timeout expired" during thrashosd test w...
/a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/6145021
Unfortunately t...
Sridhar Seshasayee
09:17 AM Bug #46847: Loss of placement information on OSD reboot
I had a look at the reproducer and am not entirely sure if it is equivalent to the problem discussed here. It might b... Frank Schilder
09:07 AM Bug #51074 (Resolved): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad da...
Observed on master:
/a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/61450...
Sridhar Seshasayee
12:54 AM Bug #47654 (Resolved): test_mon_pg: mon fails to join quorum to due election strategy mismatch
Greg Farnum
12:54 AM Backport #50087 (Resolved): pacific: test_mon_pg: mon fails to join quorum to due election strate...
Greg Farnum

06/02/2021

08:40 PM Backport #50794: pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd res...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41320
merged
Yuri Weinstein
06:54 PM Backport #50702: pacific: Data loss propagation after backfill
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41236
merged
Yuri Weinstein
06:53 PM Backport #50606: pacific: osd/scheduler/mClockScheduler: Async reservers are not updated with the...
Sridhar Seshasayee wrote:
> https://github.com/ceph/ceph/pull/41125
merged
Yuri Weinstein
06:51 PM Backport #49992: pacific: unittest_mempool.check_shard_select failed
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40566
merged
Yuri Weinstein
06:46 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2021-05-25_19:21:19-rados-wip-yuri2-testing-2021-05-25-0940-pacific-distro-basic-smithi/6134490 Neha Ojha
06:38 PM Bug #50042: rados/test.sh: api_watch_notify failures
/a/yuriw-2021-05-25_19:21:19-rados-wip-yuri2-testing-2021-05-25-0940-pacific-distro-basic-smithi/6134471 Neha Ojha
11:46 AM Bug #50903 (Closed): ceph_objectstore_tool: Slow ops reported during the test.
Closing this since the issue was hit during teuthology testing of my PR: https://github.com/ceph/ceph/pull/41308. Thi... Sridhar Seshasayee
06:57 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
Observed on master:
/a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/61450...
Sridhar Seshasayee
06:54 AM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
Observed on master:
/a/sseshasa-2021-06-01_08:27:04-rados-wip-sseshasa-testing-objs-test-2-distro-basic-smithi/61450...
Sridhar Seshasayee
04:14 AM Bug #49962 (Resolved): 'sudo ceph --cluster ceph osd crush tunables default' fails due to valgrin...
Thanks Radoslaw! Patrick Donnelly
03:47 AM Bug #50853 (In Progress): libcephsqlite: Core dump while running test_libcephsqlite.sh.
Patrick Donnelly

06/01/2021

05:37 PM Bug #50853: libcephsqlite: Core dump while running test_libcephsqlite.sh.
/a/sage-2021-05-29_16:04:00-rados-wip-sage3-testing-2021-05-29-1009-distro-basic-smithi/6142109
Sage Weil
05:24 PM Bug #50743: *: crash in pthread_getname_np
... Patrick Donnelly
11:51 AM Backport #50750 (In Progress): octopus: max_misplaced was replaced by target_max_misplaced_ratio
Cory Snyder
11:50 AM Backport #50705 (In Progress): octopus: _delete_some additional unexpected onode list
Cory Snyder
11:49 AM Backport #50987 (In Progress): octopus: unaligned access to member variables of crush_work_bucket
Cory Snyder
11:48 AM Backport #50796 (In Progress): octopus: mon: spawn loop after mon reinstalled
Cory Snyder
11:47 AM Backport #50790 (In Progress): octopus: osd: write_trunc omitted to clear data digest
Cory Snyder
11:41 AM Backport #50990 (In Progress): octopus: mon: slow ops due to osd_failure
Cory Snyder
09:43 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
> I set the cluster into "maintenance mode", noout, norebalance, nobackfill, norecover. And then proceeded to reboot ... Dan van der Ster
09:24 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
Could be related to https://github.com/ceph/ceph/pull/40572 Dan van der Ster
07:57 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
https://tracker.ceph.com/issues/48060 is the same Jeremi A
09:19 AM Backport #50153: nautilus: Reproduce https://tracker.ceph.com/issues/48417
Nautilus still has the buggy code in PG.cc (it was factored out to PeeringState.cc in octopus and newer).
I backpo...
Dan van der Ster
08:54 AM Backport #50152 (In Progress): octopus: Reproduce https://tracker.ceph.com/issues/48417
Nathan I've done the manual backport here: https://github.com/ceph/ceph/pull/41609
Copy it to something with backpor...
Dan van der Ster
07:58 AM Bug #48060: data loss in EC pool
I have had exactly the same issue with my cluster - https://tracker.ceph.com/issues/51024 while not even having any d... Jeremi A
06:03 AM Bug #51030: osd crush during writing to EC pool when enabling jaeger tracing
PR: https://github.com/ceph/ceph/pull/41604 Tomohiro Misono
05:48 AM Bug #51030 (Fix Under Review): osd crash during writing to EC pool when enabling jaeger tracing
On CentOS 8 (x86_64):
1. compile with -DWITH_JAEGER=ON
2. start a vstart cluster
3. write to an EC pool (i.e. rados benc...
Tomohiro Misono

05/31/2021

02:46 PM Bug #50903: ceph_objectstore_tool: Slow ops reported during the test.
This issue is related to the changes currently under review: https://github.com/ceph/ceph/pull/41308
The ceph_obje...
Sridhar Seshasayee
02:01 PM Bug #50688 (Duplicate): Ceph can't be deployed using cephadm on nodes with /32 ip addresses
Kefu Chai
02:01 PM Bug #50688: Ceph can't be deployed using cephadm on nodes with /32 ip addresses
should have been fixed by https://github.com/ceph/ceph/pull/40961 Kefu Chai
07:56 AM Bug #51024: OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one host ...
I forgot to add.
I pulled v15.2.12 on the affected host, and also tried running the OSD in that version. It didn't m...
Jeremi A
07:55 AM Bug #51024 (New): OSD - FAILED ceph_assert(clone_size.count(clone), keeps on restarting after one...
Good day
I'm currently experiencing the same issue as this gentleman: https://www.mail-archive.com/ceph-users...
Jeremi A

05/29/2021

08:04 AM Bug #50775: mds and osd unable to obtain rotating service keys
These logs are still weird. Now there are plenty of update_from_paxos log messages but virtually no paxosservice log ... Ilya Dryomov
02:25 AM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
/a/kchai-2021-05-28_13:33:45-rados-wip-kefu-testing-2021-05-28-1806-distro-basic-smithi/6140866 Kefu Chai
 
