From 06/06/2021 to 07/05/2021

07/05/2021

01:27 PM Bug #46847: Loss of placement information on OSD reboot
Last week we had a power outage affecting all OSD machines in a 14.2.20 cluster. A small percentage of PGs didn't act... Dan van der Ster
01:18 PM Bug #51527 (Resolved): Ceph osd crashed due to segfault
Hi everyone,
We have 9 OSD nodes with 12 daemons on each node.
Ceph is used for S3 objects and RBD images.
ceph ...
Evgeny Zakharov
12:06 PM Bug #48965: qa/standalone/osd/osd-force-create-pg.sh: TEST_reuse_id: return 1
/a/sseshasa-2021-07-05_10:18:42-rados:standalone-wip-test-stdalone-mclk-1-distro-basic-smithi/6253062 Sridhar Seshasayee
11:49 AM Bug #45761 (Need More Info): mon_thrasher: "Error ENXIO: mon unavailable" during sync_force comma...
Stopped reproducing; please reopen if you hit another instance Deepika Upadhyay
11:47 AM Bug #48609 (Closed): osd/PGLog: don’t fast-forward can_rollback_to during merge_log if the log is...
Root cause resolved Deepika Upadhyay
11:46 AM Backport #51522 (Resolved): pacific: osd: Delay sending info to new backfill peer resetting last_...
https://github.com/ceph/ceph/pull/41136 Deepika Upadhyay
11:35 AM Backport #51522 (Resolved): pacific: osd: Delay sending info to new backfill peer resetting last_...
Deepika Upadhyay
11:45 AM Backport #51523: octopus: osd: Delay sending info to new backfill peer resetting last_backfill un...
https://github.com/ceph/ceph/pull/40593/ Deepika Upadhyay
11:35 AM Backport #51523 (Resolved): octopus: osd: Delay sending info to new backfill peer resetting last_...
Deepika Upadhyay
11:36 AM Backport #51525 (Rejected): octopus: osd: Delay sending info to new backfill peer resetting last_...
Backport Bot
11:35 AM Bug #48611: osd: Delay sending info to new backfill peer resetting last_backfill until backfill a...
Since nautilus has reached EOL, removed it Deepika Upadhyay
11:34 AM Bug #48611 (Pending Backport): osd: Delay sending info to new backfill peer resetting last_backfi...
Deepika Upadhyay
04:19 AM Bug #45457 (Fix Under Review): CEPH Graylog Logging Missing "host" Field
Kefu Chai

07/03/2021

06:16 AM Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
Another OSD crash after the scrub assert bug; log attached. Corrupted rocksdb. Andrej Filipcic

07/02/2021

06:55 PM Bug #50866: osd: stat mismatch on objects
/ceph/teuthology-archive/pdonnell-2021-07-02_10:08:50-fs-wip-pdonnell-testing-20210701.192056-distro-basic-smithi/624... Patrick Donnelly
05:00 PM Backport #51498 (Resolved): pacific: mgr spamming with repeated set pgp_num_actual while merging
https://github.com/ceph/ceph/pull/42223 Backport Bot
05:00 PM Backport #51497 (Rejected): nautilus: mgr spamming with repeated set pgp_num_actual while merging
https://github.com/ceph/ceph/pull/43218 Backport Bot
05:00 PM Backport #51496 (Resolved): octopus: mgr spamming with repeated set pgp_num_actual while merging
https://github.com/ceph/ceph/pull/42420 Backport Bot
04:59 PM Bug #51433 (Pending Backport): mgr spamming with repeated set pgp_num_actual while merging
Kefu Chai

07/01/2021

09:39 PM Bug #51307: LibRadosWatchNotify.Watch2Delete fails
Seems very similar to https://tracker.ceph.com/issues/50042#note-2 Neha Ojha
09:05 PM Bug #48417 (Duplicate): unfound EC objects in sepia's LRC after upgrade
Neha Ojha
06:57 PM Backport #51453: pacific: Add simultaneous scrubs to rados/thrash
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42120
merged
Yuri Weinstein
04:42 PM Bug #48212 (Fix Under Review): pool last_epoch_clean floor is stuck after pg merging
Dan van der Ster
02:13 PM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
Dan van der Ster wrote:
> I suspect the cause is that there's a leftover epoch value for the now-deleted PG in `epoc...
Dan van der Ster
01:28 PM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
I suspect the cause is that there's a leftover epoch value for the now-deleted PG in `epoch_by_pg` in `void LastEpoch... Dan van der Ster
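The suspected failure mode can be sketched with a toy model (hypothetical class and method names; not Ceph's actual LastEpochClean code): if the floor is computed as the minimum over a per-PG epoch map, a merged-away PG whose entry is never erased pins the floor forever.

```python
# Toy model of a last_epoch_clean floor (hypothetical, not Ceph's code).
class LastEpochCleanFloor:
    def __init__(self):
        self.epoch_by_pg = {}  # pg_id -> last epoch the PG was clean

    def report_clean(self, pg_id, epoch):
        self.epoch_by_pg[pg_id] = epoch

    def remove_pg(self, pg_id):
        # The suspected missing cleanup: without this, a merged-away
        # PG's stale entry pins the floor forever.
        self.epoch_by_pg.pop(pg_id, None)

    def floor(self):
        return min(self.epoch_by_pg.values(), default=0)

lec = LastEpochCleanFloor()
lec.report_clean("1.0", 100)
lec.report_clean("1.1", 100)
lec.report_clean("1.0", 250)   # pg 1.1 has since been merged into 1.0
print(lec.floor())  # 100: the stale entry for the deleted PG pins the floor
lec.remove_pg("1.1")
print(lec.floor())  # 250
```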
04:38 PM Bug #38931 (Fix Under Review): osd does not proactively remove leftover PGs
Our customer reported a similar case, providing an easy way to reproduce the issue: if, when purging a pg, the osd is m... Mykola Golub
11:03 AM Fix #51464 (Fix Under Review): osd: Add mechanism to avoid running osd benchmark on osd init when...
Sridhar Seshasayee
08:54 AM Fix #51464 (Resolved): osd: Add mechanism to avoid running osd benchmark on osd init when using m...
The current behavior is to let the osd benchmark run on each osd
init, which is not necessary. If the underlying dev...
Sridhar Seshasayee
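The fix described above amounts to caching the benchmark result across inits. A minimal sketch of that pattern, with a hypothetical cache path and function names (not the actual Ceph change):

```python
import json
import os
import tempfile

# Sketch of the idea in Fix #51464 (hypothetical, not Ceph code):
# persist the OSD benchmark result on first init and reuse it on
# subsequent inits instead of re-running the expensive benchmark.
def get_osd_iops(cache_path, run_benchmark):
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)["iops"]   # reuse the cached measurement
    iops = run_benchmark()                # expensive: only on first init
    with open(cache_path, "w") as f:
        json.dump({"iops": iops}, f)
    return iops

calls = []
def fake_benchmark():
    calls.append(1)
    return 31500  # pretend IOPS measured by something like 'osd bench'

path = os.path.join(tempfile.mkdtemp(), "osd_bench.json")
print(get_osd_iops(path, fake_benchmark))  # runs the benchmark
print(get_osd_iops(path, fake_benchmark))  # served from the cache
print(len(calls))  # the benchmark ran exactly once
```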
07:44 AM Bug #51463 (Resolved): blocked requests while stopping/starting OSDs
Hi,
We run into a lot of slow requests (IO blocked for several seconds) while stopping or starting one or more OS...
Manuel Lausch
07:13 AM Bug #51419: bufferlist::splice() may cause stack corruption in bufferlist::rebuild_aligned_size_a...
Initially triggered with fio when testing rbd persistent write-back cache in ssd mode:... Ilya Dryomov

06/30/2021

09:50 PM Bug #49894 (In Progress): set a non-zero default value for osd_client_message_cap
Neha Ojha wrote:
> Neha Ojha wrote:
> > The current default of 0 doesn't help and we've tried setting it to 5000 fo...
Neha Ojha
06:53 PM Backport #50790: octopus: osd: write_trunc omitted to clear data digest
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41620
merged
Yuri Weinstein
06:50 PM Backport #50791: pacific: osd: write_trunc omitted to clear data digest
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42019
merged
Yuri Weinstein
06:49 PM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
https://github.com/ceph/ceph/pull/41944 merged Yuri Weinstein
06:44 PM Bug #51457 (New): qa/standalone/scrub/osd-scrub-test.sh: TEST_interval_changes: date check failed
... Neha Ojha
04:44 PM Backport #51453 (In Progress): pacific: Add simultaneous scrubs to rados/thrash
Neha Ojha
04:30 PM Backport #51453 (Resolved): pacific: Add simultaneous scrubs to rados/thrash
https://github.com/ceph/ceph/pull/42120 Backport Bot
04:39 PM Bug #45868: rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
/a/yuriw-2021-06-29_19:12:08-rados-wip-yuri2-testing-2021-06-28-0858-pacific-distro-basic-smithi/6243653 Neha Ojha
04:35 PM Bug #51454 (New): Simultaneous OSD's crash with tp_osd_tp on rocksdb::MergingIterator::Next()
Ceph v14.2.15
Main use case is RGW.
Bucket indexes on SSD OSDs.
The majority of SSD OSDs under bucket indexes are FileS...
Aleksandr Rudenko
04:30 PM Backport #51452 (Resolved): octopus: Add simultaneous scrubs to rados/thrash
https://github.com/ceph/ceph/pull/42422 Backport Bot
04:28 PM Bug #51451 (Resolved): Add simultaneous scrubs to rados/thrash
Motivated by https://tracker.ceph.com/issues/50346. Neha Ojha
09:18 AM Bug #51419 (Fix Under Review): bufferlist::splice() may cause stack corruption in bufferlist::reb...
Kefu Chai
06:55 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
http://qa-proxy.ceph.com/teuthology/ideepika-2021-06-30_04:28:06-rados-wip-yuri7-testing-2021-06-28-1224-octopus-dist... Deepika Upadhyay

06/29/2021

09:16 PM Bug #51433 (Fix Under Review): mgr spamming with repeated set pgp_num_actual while merging
Neha Ojha
08:33 PM Bug #51433 (Resolved): mgr spamming with repeated set pgp_num_actual while merging
While merging PGs our osdmaps are churning through ~2000 epochs per hour.
The osdmap diffs are empty:...
Dan van der Ster
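One plausible fix for this class of churn, sketched below with hypothetical names (not the actual ceph-mgr code), is to skip issuing the command when it would be a no-op, so that no new osdmap epoch is generated:

```python
# Sketch: suppress repeated, unchanged pgp_num_actual commands
# (hypothetical helper, not the actual ceph-mgr implementation).
def maybe_set_pgp_num_actual(pool, target, sent):
    """Return the command to issue, or None if it would be a no-op."""
    if target == pool["pgp_num_actual"] or sent.get(pool["name"]) == target:
        return None  # nothing to do: no new osdmap epoch needed
    sent[pool["name"]] = target  # remember what we already requested
    return {"prefix": "osd pool set", "pool": pool["name"],
            "var": "pgp_num_actual", "val": str(target)}

sent = {}
pool = {"name": "data", "pgp_num_actual": 128}
print(maybe_set_pgp_num_actual(pool, 128, sent))            # None: already there
print(maybe_set_pgp_num_actual(pool, 64, sent) is not None) # issue once
print(maybe_set_pgp_num_actual(pool, 64, sent))             # None: already sent
```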
08:03 PM Bug #49525: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10121515-14:e4 snaps missi...
The sequence looks a little different this time.
/a/rfriedma-2021-06-26_19:32:15-rados-wip-ronenf-scrubs-config-distr...
Neha Ojha
05:03 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
/a/yuriw-2021-06-28_17:32:48-rados-wip-yuri2-testing-2021-06-28-0858-pacific-distro-basic-smithi/6239590 Neha Ojha
04:59 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2021-06-28_17:32:48-rados-wip-yuri2-testing-2021-06-28-0858-pacific-distro-basic-smithi/6239575 Neha Ojha
08:13 AM Bug #48613 (Resolved): Reproduce https://tracker.ceph.com/issues/48417
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:12 AM Bug #49139 (Resolved): rados/perf: cosbench workloads hang forever
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:11 AM Bug #49988 (Resolved): Global Recovery Event never completes
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:11 AM Bug #50230 (Resolved): mon: spawn loop after mon reinstalled
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:10 AM Bug #50466 (Resolved): _delete_some additional unexpected onode list
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:10 AM Bug #50477 (Resolved): mon/MonClient: reset authenticate_err in _reopen_session()
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:08 AM Bug #50964 (Resolved): mon: slow ops due to osd_failure
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:03 AM Backport #51237 (Resolved): nautilus: rebuild-mondb hangs
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41874
m...
Loïc Dachary
08:03 AM Bug #50245: TEST_recovery_scrub_2: Not enough recovery started simultaneously
/a//kchai-2021-06-27_13:33:07-rados-wip-kefu-testing-2021-06-27-1907-distro-basic-smithi/6238237 Kefu Chai
08:00 AM Backport #50987 (Resolved): octopus: unaligned access to member variables of crush_work_bucket
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41622
m...
Loïc Dachary
08:00 AM Backport #50796 (Resolved): octopus: mon: spawn loop after mon reinstalled
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41621
m...
Loïc Dachary
07:56 AM Backport #51269 (Resolved): octopus: rados/perf: cosbench workloads hang forever
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41922
m...
Loïc Dachary
07:56 AM Backport #50990: octopus: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41618
m...
Loïc Dachary
07:53 AM Backport #50705 (Resolved): octopus: _delete_some additional unexpected onode list
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41623
m...
Loïc Dachary
07:53 AM Backport #50152 (Resolved): octopus: Reproduce https://tracker.ceph.com/issues/48417
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41609
m...
Loïc Dachary
07:52 AM Backport #50750 (Resolved): octopus: max_misplaced was replaced by target_max_misplaced_ratio
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41624
m...
Loïc Dachary
07:38 AM Backport #51313 (Resolved): pacific: osd:scrub skip some pg
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41971
m...
Loïc Dachary
07:37 AM Backport #50505 (Resolved): pacific: mon/MonClient: reset authenticate_err in _reopen_session()
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41019
m...
Loïc Dachary
07:37 AM Backport #50986 (Resolved): pacific: unaligned access to member variables of crush_work_bucket
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41983
m...
Loïc Dachary
07:36 AM Backport #50989 (Resolved): pacific: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41982
m...
Loïc Dachary
07:32 AM Backport #50797 (Resolved): pacific: mon: spawn loop after mon reinstalled
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41768
m...
Loïc Dachary
07:28 AM Backport #51215: pacific: Global Recovery Event never completes
Nathan, would you be so kind as to add a link to this issue in https://github.com/ceph/ceph/pull/41872 ? Loïc Dachary
07:27 AM Backport #51215 (Resolved): pacific: Global Recovery Event never completes
Loïc Dachary
07:19 AM Backport #50706 (Resolved): pacific: _delete_some additional unexpected onode list
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41680
m...
Loïc Dachary
05:40 AM Bug #51419 (Resolved): bufferlist::splice() may cause stack corruption in bufferlist::rebuild_ali...
*** stack smashing detected ***: terminated2073 IOPS][eta 02h:59m:36s]
--Type <RET> for more, q to quit, c to contin...
CONGMIN YIN

06/28/2021

07:29 PM Backport #50987: octopus: unaligned access to member variables of crush_work_bucket
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41622
merged
Yuri Weinstein
07:29 PM Backport #50796: octopus: mon: spawn loop after mon reinstalled
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41621
merged
Yuri Weinstein
04:25 PM Backport #50505: pacific: mon/MonClient: reset authenticate_err in _reopen_session()
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41019
merged
Reviewed-by: Kefu Chai <kchai@redhat.com>
Yuri Weinstein
06:54 AM Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors i...
I see a similar crash on quincy; I suspect it's seen when I try to add mons from 1 to 3.
/]# ceph crash info 2021-06...
Tejas C

06/26/2021

02:27 PM Backport #50986: pacific: unaligned access to member variables of crush_work_bucket
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41983
merged
Yuri Weinstein
02:26 PM Backport #50989: pacific: mon: slow ops due to osd_failure
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41982
merged
Yuri Weinstein

06/25/2021

09:43 PM Bug #51101: rados/test_envlibrados_for_rocksdb.sh: cmake: symbol lookup error: cmake: undefined s...
/a/yuriw-2021-06-24_16:54:31-rados-wip-yuri-testing-2021-06-24-0708-pacific-distro-basic-smithi/6190738 Neha Ojha
03:50 PM Backport #51371 (Resolved): pacific: OSD crash FAILED ceph_assert(!is_scrubbing())
https://github.com/ceph/ceph/pull/41944 Backport Bot
03:48 PM Bug #50346 (Pending Backport): OSD crash FAILED ceph_assert(!is_scrubbing())
Neha Ojha
06:48 AM Backport #50990 (Resolved): octopus: mon: slow ops due to osd_failure
Kefu Chai

06/24/2021

11:22 PM Backport #50791 (In Progress): pacific: osd: write_trunc omitted to clear data digest
Neha Ojha

06/23/2021

10:43 PM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
Andrej Filipcic wrote:
> A related crash happened when I disabled scrubbing:
>
> -1> 2021-06-14T11:17:15.373...
Neha Ojha
10:42 PM Bug #51338 (Duplicate): osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>...
Originally reported in https://tracker.ceph.com/issues/50346#note-6... Neha Ojha
06:16 PM Backport #50796: octopus: mon: spawn loop after mon reinstalled
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41621
merged
Yuri Weinstein
03:31 PM Backport #50797: pacific: mon: spawn loop after mon reinstalled
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41768
merged
Yuri Weinstein
03:30 PM Bug #49988: Global Recovery Event never completes
https://github.com/ceph/ceph/pull/41872 merged Yuri Weinstein

06/22/2021

10:47 PM Backport #50986 (In Progress): pacific: unaligned access to member variables of crush_work_bucket
Neha Ojha
10:44 PM Backport #50989 (In Progress): pacific: mon: slow ops due to osd_failure
Neha Ojha
05:21 PM Backport #50705: octopus: _delete_some additional unexpected onode list
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41623
merged
Yuri Weinstein
05:21 PM Backport #50152: octopus: Reproduce https://tracker.ceph.com/issues/48417
Dan van der Ster wrote:
> Nathan I've done the manual backport here: https://github.com/ceph/ceph/pull/41609
> Copy...
Yuri Weinstein
11:23 AM Backport #51315 (In Progress): nautilus: osd:scrub skip some pg
Mykola Golub
10:55 AM Backport #51315 (Resolved): nautilus: osd:scrub skip some pg
https://github.com/ceph/ceph/pull/41973 Mykola Golub
11:11 AM Backport #51314 (In Progress): octopus: osd:scrub skip some pg
Mykola Golub
10:55 AM Backport #51314 (Resolved): octopus: osd:scrub skip some pg
https://github.com/ceph/ceph/pull/41972 Mykola Golub
10:56 AM Backport #51313 (In Progress): pacific: osd:scrub skip some pg
Mykola Golub
10:55 AM Backport #51313 (Resolved): pacific: osd:scrub skip some pg
https://github.com/ceph/ceph/pull/41971 Mykola Golub
10:55 AM Backport #51316 (Duplicate): nautilus: osd:scrub skip some pg
Backport Bot
10:53 AM Bug #49487 (Pending Backport): osd:scrub skip some pg
Mykola Golub
10:28 AM Bug #50346 (Fix Under Review): OSD crash FAILED ceph_assert(!is_scrubbing())
Ronen Friedman
06:59 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
Andrej Filipcic wrote:
> On a 60-node, 1500 HDD cluster, and the 16.2.4 release, this issue became very frequent, especi...
玮文 胡

06/21/2021

08:50 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
FYI I tried with ceph/daemon-base:master-24e1f91-pacific-centos-8-x86_64 (the latest non-devel build at this time) ju... Andrew Davidoff
06:07 PM Bug #51307 (Resolved): LibRadosWatchNotify.Watch2Delete fails
... Sage Weil
03:40 PM Bug #51270: mon: stretch mode clusters do not sanely set default crush rules
Accidentally requested backports to Octopus/Nautilus, so nuking those. Greg Farnum
03:40 PM Backport #51289 (Rejected): octopus: mon: stretch mode clusters do not sanely set default crush r...
Accidental backport request Greg Farnum
03:40 PM Backport #51288 (Rejected): nautilus: mon: stretch mode clusters do not sanely set default crush ...
Accidental backport request Greg Farnum

06/20/2021

11:58 AM Bug #50346 (In Progress): OSD crash FAILED ceph_assert(!is_scrubbing())
Ronen Friedman

06/19/2021

03:00 PM Backport #51290 (Resolved): pacific: mon: stretch mode clusters do not sanely set default crush r...
https://github.com/ceph/ceph/pull/42909 Backport Bot
03:00 PM Backport #51289 (Rejected): octopus: mon: stretch mode clusters do not sanely set default crush r...
Backport Bot
03:00 PM Backport #51288 (Rejected): nautilus: mon: stretch mode clusters do not sanely set default crush ...
Backport Bot
02:58 PM Bug #51270 (Pending Backport): mon: stretch mode clusters do not sanely set default crush rules
Kefu Chai
01:15 PM Backport #51287 (Resolved): pacific: LibRadosService.StatusFormat failed, Expected: (0) != (retry...
https://github.com/ceph/ceph/pull/46677 Backport Bot
01:12 PM Bug #51234 (Pending Backport): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
Kefu Chai
01:11 PM Bug #51234 (Resolved): LibRadosService.StatusFormat failed, Expected: (0) != (retry), actual: 0 vs 0
Kefu Chai
02:19 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
I think this one is related?
/ceph/teuthology-archive/pdonnell-2021-06-16_21:26:55-fs-wip-pdonnell-testing-2021061...
Patrick Donnelly

06/18/2021

09:15 PM Bug #51083: Raw space filling up faster than used space
I don't have any ideas from the logs. Moving this back to RADOS. I doubt it has anything to do with CephFS. Patrick Donnelly
11:56 AM Bug #51083: Raw space filling up faster than used space
Patrick Donnelly wrote:
> Scrub is unlikely to help.
I came to the same conclusion after reading the documentatio...
Jan-Philipp Litza

06/17/2021

11:07 PM Bug #51083: Raw space filling up faster than used space
Jan-Philipp Litza wrote:
> Yesterday evening we finally managed to upgrade the MDS daemons as well, and that seems t...
Patrick Donnelly
09:13 PM Bug #51083: Raw space filling up faster than used space
Patrick: do you understand how upgrading the MDS daemons helped in this case? There is nothing in the osd/bluestore s... Neha Ojha
09:03 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
We definitely do not use cache tiering on any of our clusters. On the cluster above, we do use snapshots (via cephfs... Andras Pataki
08:48 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
It seems like you are using cache tiering, and there has been similar bugs reported like this. I don't understand why... Neha Ojha
09:01 PM Bug #51234 (Fix Under Review): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
Sage Weil
08:57 PM Bug #50842 (Need More Info): pacific: recovery does not complete because of rw_manager lock not ...
Neha Ojha
08:53 PM Backport #51269 (In Progress): octopus: rados/perf: cosbench workloads hang forever
Deepika Upadhyay
07:14 PM Backport #51269 (Resolved): octopus: rados/perf: cosbench workloads hang forever
https://github.com/ceph/ceph/pull/41922 Deepika Upadhyay
08:42 PM Bug #51074 (Pending Backport): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with...
Marking Pending Backport; needs to be included with https://github.com/ceph/ceph/pull/41731 Neha Ojha
08:40 PM Bug #51168 (Need More Info): ceph-osd state machine crash during peering process
Can you please attach the osd log for this crash? Neha Ojha
08:07 PM Bug #51270 (Fix Under Review): mon: stretch mode clusters do not sanely set default crush rules
Greg Farnum
08:03 PM Bug #51270 (Pending Backport): mon: stretch mode clusters do not sanely set default crush rules
If you do not specify a crush rule when creating a pool, the OSDMonitor picks the default one for you out of the conf... Greg Farnum
07:12 PM Bug #49139 (Pending Backport): rados/perf: cosbench workloads hang forever
Deepika Upadhyay
02:32 PM Backport #51237: nautilus: rebuild-mondb hangs
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41874
merged
Yuri Weinstein

06/16/2021

10:40 PM Bug #51254 (New): deep-scrub stat mismatch on last PG in pool
In the past few weeks, we got inconsistent PGs in deep-scrub a few times, always on the very last PG in the pool:
...
Andras Pataki
07:25 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
... Deepika Upadhyay
07:22 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
/ceph/teuthology-archive/yuriw-2021-06-14_19:20:57-rados-wip-yuri6-testing-2021-06-14-1106-octopus-distro-basic-smith... Deepika Upadhyay
06:48 PM Bug #50042: rados/test.sh: api_watch_notify failures
... Deepika Upadhyay
02:49 PM Bug #51246 (New): error in open_pools_parallel: rados_write(0.obj) failed with error: -2
... Deepika Upadhyay
01:22 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
> Will this patch be released in 14.2.22?
Yes, the PR has been merged to the nautilus branch, so it will be in the ...
Dan van der Ster
12:59 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
We hit this bug yesterday in a nautilus 14.2.18 cluster.
All monitors went down and started crashing on restart.
...
Rob Haverkamp
02:18 AM Backport #51237 (In Progress): nautilus: rebuild-mondb hangs
Kefu Chai
02:16 AM Backport #51237 (Resolved): nautilus: rebuild-mondb hangs
https://github.com/ceph/ceph/pull/41874 Backport Bot
02:13 AM Bug #38219 (Pending Backport): rebuild-mondb hangs
Kefu Chai

06/15/2021

08:05 PM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
Just to note:
IMO the ceph-bluestore-tool crash is caused by a bug in AvlAllocator and is a duplicate of https://tracker...
Igor Fedotov
06:59 PM Backport #51215 (In Progress): pacific: Global Recovery Event never completes
Kamoltat (Junior) Sirivadhna
12:55 AM Backport #51215 (Resolved): pacific: Global Recovery Event never completes
Backport PR https://github.com/ceph/ceph/pull/41872 Backport Bot
06:50 PM Backport #50706: pacific: _delete_some additional unexpected onode list
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41680
merged
Yuri Weinstein
06:47 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
@Neha: I did, but I'm afraid they are lost; the test was from https://pulpito.ceph.com/ideepika-2021-05-17_10:16:28-ra... Deepika Upadhyay
06:31 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
@Deepika, do you happen to have the logs saved somewhere? Neha Ojha
06:41 PM Bug #51234 (Pending Backport): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
... Neha Ojha
06:30 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/osd-dispatch-... Neha Ojha
06:26 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
Looks very similar... Neha Ojha
04:12 PM Backport #50750: octopus: max_misplaced was replaced by target_max_misplaced_ratio
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41624
merged
Yuri Weinstein
12:20 PM Bug #51223 (New): statfs: a cluster with filestore and bluestore OSD's will report bytes_used == ...
Cluster migrated from Luminous mixed bluestore+filestore OSDs to Nautilus 14.2.21.
After the last filestore OSD was purged f...
Konstantin Shalygin
10:47 AM Bug #49677 (Resolved): debian ceph-common package post-inst clobbers ownership of cephadm log dirs
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:47 AM Bug #49781 (Resolved): unittest_mempool.check_shard_select failed
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:44 AM Bug #50501 (Resolved): osd/scheduler/mClockScheduler: Async reservers are not updated with the ov...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:44 AM Bug #50558 (Resolved): Data loss propagation after backfill
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:42 AM Backport #50795 (Resolved): nautilus: mon: spawn loop after mon reinstalled
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41762
m...
Loïc Dachary
10:41 AM Backport #50704 (Resolved): nautilus: _delete_some additional unexpected onode list
Loïc Dachary
10:36 AM Backport #50153 (Resolved): nautilus: Reproduce https://tracker.ceph.com/issues/48417
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41611
m...
Loïc Dachary
10:36 AM Backport #49729 (Resolved): nautilus: debian ceph-common package post-inst clobbers ownership of ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40698
m...
Loïc Dachary
10:32 AM Backport #50988: nautilus: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41519
m...
Loïc Dachary
09:05 AM Backport #50406: pacific: mon: new monitors may direct MMonJoin to a peon instead of the leader
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41131
m...
Loïc Dachary
09:04 AM Backport #50344: pacific: mon: stretch state is inconsistently-maintained on peons, preventing pr...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41130
m...
Loïc Dachary
09:04 AM Backport #50794 (Resolved): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41320
m...
Loïc Dachary
09:03 AM Backport #50702 (Resolved): pacific: Data loss propagation after backfill
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41236
m...
Loïc Dachary
09:03 AM Backport #50606 (Resolved): pacific: osd/scheduler/mClockScheduler: Async reservers are not updat...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41125
m...
Loïc Dachary
09:02 AM Backport #49992 (Resolved): pacific: unittest_mempool.check_shard_select failed
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40566
m...
Loïc Dachary
12:54 AM Bug #49988 (Pending Backport): Global Recovery Event never completes
Neha Ojha

06/14/2021

03:24 PM Feature #51213 (Resolved): [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
For now, we do not have a global flag, like `ceph osd set noout` for the pg autoscale feature. We have pool flags[1] ... Vikhyat Umrao
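For reference, the existing per-pool and pool-default controls, alongside the proposed flag (the last command is this feature request, not a command that existed at the time of writing):

```
# Existing per-pool control:
ceph osd pool set mypool pg_autoscale_mode off

# Existing default for newly created pools:
ceph config set global osd_pool_default_pg_autoscale_mode off

# Proposed global on/off flag (this feature request):
ceph osd set noautoscale
```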
09:24 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())

A related crash happened when I disabled scrubbing:
-1> 2021-06-14T11:17:15.373+0200 7fb9916f5700 -1 /home/...
Andrej Filipcic

06/13/2021

03:34 PM Bug #51194: PG recovery_unfound after scrub repair failed on primary
To prevent the user IO from being blocked, we took this action:
1. First, we queried the unfound objects. osd.951 ...
Dan van der Ster
01:36 PM Bug #51194 (New): PG recovery_unfound after scrub repair failed on primary
This comes from a mail I send to the ceph-users ML: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3... Dan van der Ster
03:30 PM Backport #51195 (In Progress): pacific: [rfe] increase osd_max_write_op_reply_len default value t...
https://github.com/ceph/ceph/pull/53470 Backport Bot
03:28 PM Bug #51166 (Pending Backport): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
Kefu Chai

06/12/2021

12:38 AM Bug #49988 (Resolved): Global Recovery Event never completes
Kefu Chai

06/10/2021

08:35 PM Backport #51173 (Rejected): nautilus: regression in ceph daemonperf command output, osd columns a...
Backport Bot
08:35 PM Backport #51172 (Resolved): pacific: regression in ceph daemonperf command output, osd columns ar...
https://github.com/ceph/ceph/pull/44175 Backport Bot
08:35 PM Backport #51171 (Resolved): octopus: regression in ceph daemonperf command output, osd columns ar...
https://github.com/ceph/ceph/pull/44176 Backport Bot
08:32 PM Bug #51002 (Pending Backport): regression in ceph daemonperf command output, osd columns aren't v...
Igor Fedotov
06:48 PM Backport #50795: nautilus: mon: spawn loop after mon reinstalled
Dan van der Ster wrote:
> https://github.com/ceph/ceph/pull/41762
merged
Yuri Weinstein
03:58 PM Bug #51168 (New): ceph-osd state machine crash during peering process
... Yao Ning
02:22 PM Bug #51166 (Fix Under Review): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
Matt Benjamin
02:16 PM Bug #51166 (Pending Backport): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
As agreed in #ceph-devel, with Sage, Josh, and Neha concurring.
Matt Benjamin
01:53 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
For the dead jobs, relevant logs have been uploaded to senta02 under /home/sseshasa/recovery_timeout.
Please let me ...
Sridhar Seshasayee
10:25 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
On a 60-node, 1500 HDD cluster running the 16.2.4 release, this issue has become very frequent, especially when RBD writes excee... Andrej Filipcic

06/09/2021

01:32 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
From the logs of 6161181, snapshot recovery is unable to proceed since an rwlock on the head version
(3:cb63772d:::...
Sridhar Seshasayee
08:07 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
Ran the same test repeatedly (5 times) on master by setting osd_op_queue to 'wpq' and 'mclock_scheduler' on different... Sridhar Seshasayee
10:39 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
Kefu, yes I did read the update and your effort to find the commit(s) that caused the regression in the standalone te... Sridhar Seshasayee
10:25 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
Sridhar, please read https://tracker.ceph.com/issues/51074#note-3. That's my finding from the last 3 days. Kefu Chai
09:44 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
Raised PR https://github.com/ceph/ceph/pull/41782 to address the test failure.
Please see latest update to https:/...
Sridhar Seshasayee
09:05 AM Backport #51151 (Rejected): nautilus: When read failed, ret can not take as data len, in FillInVe...
Backport Bot
09:05 AM Backport #51150 (Resolved): pacific: When read failed, ret can not take as data len, in FillInVer...
https://github.com/ceph/ceph/pull/44173 Backport Bot
09:05 AM Backport #51149 (Resolved): octopus: When read failed, ret can not take as data len, in FillInVer...
https://github.com/ceph/ceph/pull/44174 Backport Bot
09:02 AM Bug #51115 (Pending Backport): When read failed, ret can not take as data len, in FillInVerifyExtent
Kefu Chai

06/08/2021

11:10 PM Bug #38219: rebuild-mondb hangs
http://qa-proxy.ceph.com/teuthology/yuriw-2021-06-08_20:53:36-rados-wip-yuri-testing-2021-06-04-0753-nautilus-distro-... Deepika Upadhyay
06:39 PM Bug #38219: rebuild-mondb hangs
2021-06-04T23:05:38.775 INFO:tasks.ceph.mon.a.smithi071.stderr:/build/ceph-14.2.21-305-gac8fcfa6/src/mon/OSDMonitor.c... Deepika Upadhyay
07:52 PM Backport #50797 (In Progress): pacific: mon: spawn loop after mon reinstalled
Neha Ojha
05:58 PM Backport #50795: nautilus: mon: spawn loop after mon reinstalled
https://github.com/ceph/ceph/pull/41762 Dan van der Ster
04:50 PM Bug #50681: memstore: apparent memory leak when removing objects
Sven Anderson wrote:
> Greg Farnum wrote:
> > How long did you wait to see if memory usage dropped? Did you look at...
Greg Farnum
08:48 AM Bug #51074 (Triaged): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad dat...
Kefu Chai

06/07/2021

11:37 AM Backport #51117 (In Progress): pacific: osd: Run osd bench test to override default max osd capac...
Sridhar Seshasayee
10:25 AM Backport #51117 (Resolved): pacific: osd: Run osd bench test to override default max osd capacity...
https://github.com/ceph/ceph/pull/41731 Backport Bot
10:22 AM Fix #51116 (Resolved): osd: Run osd bench test to override default max osd capacity for mclock.
Sridhar Seshasayee
08:28 AM Bug #51115: When read failed, ret can not take as data len, in FillInVerifyExtent
https://github.com/ceph/ceph/pull/41727 yanqiang sun
08:12 AM Bug #51115 (Fix Under Review): When read failed, ret can not take as data len, in FillInVerifyExtent
Kefu Chai
07:42 AM Bug #51115 (Resolved): When read failed, ret can not take as data len, in FillInVerifyExtent
When a read fails, e.g. returns -EIO, FillInVerifyExtent takes ret as the data length. yanqiang sun
06:59 AM Bug #51083: Raw space filling up faster than used space
Yesterday evening we finally managed to upgrade the MDS daemons as well, and that seems to have stopped the space was... Jan-Philipp Litza

06/06/2021

11:22 AM Feature #51110 (New): invalidate crc in buffer::ptr::c_str()
h3. what:
*buffer::ptr* (or more precisely, *buffer::raw*) has the ability to cache CRC codes that are calculated ...
Wenjun Huang
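The idea behind this feature request can be sketched in a few lines. The class below is a hypothetical toy, not Ceph's actual `buffer::ptr`/`buffer::raw`, and `toy_crc` is a placeholder rather than CRC32C: it only illustrates why a cached checksum must be invalidated the moment `c_str()` hands out a mutable pointer, since the caller may then modify the bytes behind the cache's back.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch: a buffer that caches a checksum of its contents.
class CachedBuffer {
  std::vector<char> data_;
  mutable std::optional<uint32_t> crc_cache_;

  static uint32_t toy_crc(const std::vector<char> &d) {
    // Placeholder checksum; real code would use CRC32C.
    return std::accumulate(d.begin(), d.end(), 0u,
        [](uint32_t a, char c) { return a * 31 + static_cast<unsigned char>(c); });
  }

public:
  explicit CachedBuffer(const std::string &s) : data_(s.begin(), s.end()) {}

  // Compute the checksum lazily and cache it.
  uint32_t crc() const {
    if (!crc_cache_)
      crc_cache_ = toy_crc(data_);
    return *crc_cache_;
  }

  // Mutable access: the cached checksum can no longer be trusted, so
  // drop it here -- the behavior the feature asks c_str() to have.
  char *c_str() {
    crc_cache_.reset();
    return data_.data();
  }
};
```

With this invalidation in place, a write through the pointer returned by `c_str()` is followed by a fresh checksum computation on the next `crc()` call instead of a stale cached value.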