Activity

From 06/13/2021 to 07/12/2021

07/12/2021

11:11 PM Bug #51101: rados/test_envlibrados_for_rocksdb.sh: cmake: symbol lookup error: cmake: undefined s...
Neha Ojha wrote:
> fails differently with centos 8.stream
> ...
Tracked in https://tracker.ceph.com/issues/51638
Neha Ojha
05:56 PM Bug #51101: rados/test_envlibrados_for_rocksdb.sh: cmake: symbol lookup error: cmake: undefined s...
fails differently with centos 8.stream... Neha Ojha
05:38 PM Bug #51101: rados/test_envlibrados_for_rocksdb.sh: cmake: symbol lookup error: cmake: undefined s...
/a/yuriw-2021-07-12_16:33:44-rados-wip-yuriw-master-7.8.21-distro-basic-smithi/6265227/
This seems to be an issue ...
Neha Ojha
09:25 PM Bug #51627: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
spotted again at ksirivad-2021-07-11_01:45:00-rados-wip-pg-autoscaler-overlap-distro-basic-smithi/6262857/ Kamoltat (Junior) Sirivadhna
01:20 PM Bug #51627 (Fix Under Review): FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missi...
Kefu Chai
08:16 AM Bug #51627: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
https://github.com/ceph/ceph/pull/42279 Myoungwon Oh
02:31 AM Bug #51627 (Resolved): FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_...
spotted again,... Kefu Chai
09:24 PM Bug #49525: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10121515-14:e4 snaps missi...
spotted again /a/ksirivad-2021-07-11_01:45:00-rados-wip-pg-autoscaler-overlap-distro-basic-smithi/6262966/ Kamoltat (Junior) Sirivadhna
08:51 PM Bug #51638 (Resolved): rados/test_envlibrados_for_rocksdb.sh: No match for argument: snappy-devel...
... Neha Ojha
08:00 PM Backport #50748 (Resolved): pacific: max_misplaced was replaced by target_max_misplaced_ratio
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42250
m...
Loïc Dachary
07:54 PM Backport #50790 (Resolved): octopus: osd: write_trunc omitted to clear data digest
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41620
m...
Loïc Dachary
02:31 AM Bug #50192 (Pending Backport): FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missi...
Kefu Chai
02:22 AM Bug #50192 (New): FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missi...
Kefu Chai

07/11/2021

04:02 PM Bug #51581: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed
The bug is triggered when scrubbing is not initiated on the first tick-timer after being requested. That happens if t... Ronen Friedman
03:55 PM Bug #51581 (In Progress): scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed
Ronen Friedman
09:13 AM Bug #51626 (New): OSD uses all host memory (80g) on startup due to pg_split
After upgrading from 15.2.4 to 15.2.13, _some_ OSDs fail to start.
The OSDs which are failing to start seem to be...
Tor Martin Ølberg
08:41 AM Support #51609: OSD refuses to start (OOMK) due to pg split
Tor Martin Ølberg wrote:
> After an upgrade to 15.2.13 from 15.2.4 my small home lab cluster ran into issues with OS...
Tor Martin Ølberg

07/09/2021

10:32 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
reducing priority based on https://tracker.ceph.com/issues/45761#note-26 Neha Ojha
10:19 PM Bug #50659 (New): Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
Neha Ojha
01:04 PM Support #51609 (New): OSD refuses to start (OOMK) due to pg split
After an upgrade to 15.2.13 from 15.2.4 my small home lab cluster ran into issues with OSDs failing on all four hosts... Tor Martin Ølberg
11:35 AM Backport #51605 (Resolved): pacific: bufferlist::splice() may cause stack corruption in bufferlis...
https://github.com/ceph/ceph/pull/42976 Backport Bot
11:35 AM Backport #51604 (Resolved): octopus: bufferlist::splice() may cause stack corruption in bufferlis...
https://github.com/ceph/ceph/pull/42975 Backport Bot
11:32 AM Bug #51419 (Pending Backport): bufferlist::splice() may cause stack corruption in bufferlist::reb...
Ilya Dryomov
09:18 AM Backport #51603 (In Progress): pacific: qa/standalone: Add missing teardowns at the end of a subs...
Sridhar Seshasayee
08:25 AM Backport #51603 (Resolved): pacific: qa/standalone: Add missing teardowns at the end of a subset ...
https://github.com/ceph/ceph/pull/42258 Backport Bot
08:24 AM Fix #51580 (Pending Backport): qa/standalone: Add missing teardowns at the end of a subset of osd...
Sridhar Seshasayee

07/08/2021

11:48 PM Backport #50748 (In Progress): pacific: max_misplaced was replaced by target_max_misplaced_ratio
https://github.com/ceph/ceph/pull/42250 Neha Ojha
09:46 PM Bug #50346 (Resolved): OSD crash FAILED ceph_assert(!is_scrubbing())
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
09:36 PM Backport #51453 (Resolved): pacific: Add simultaneous scrubs to rados/thrash
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42120
m...
Loïc Dachary
09:33 PM Backport #50791 (Resolved): pacific: osd: write_trunc omitted to clear data digest
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42019
m...
Loïc Dachary
06:37 PM Backport #51316 (Duplicate): nautilus: osd:scrub skip some pg
Konstantin Shalygin
05:16 PM Bug #51581: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed
Looks like an issue with the test that was added in d6eb3e3a3c29a02d6c7c088ef7c8c668a872d16e. Ronen, can you please t... Neha Ojha
04:53 PM Bug #51581: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed
/a/sage-2021-06-12_13:06:29-rados-master-distro-basic-smithi/6168272 Neha Ojha
02:43 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
I just wanted to note that I see the status is listed as "Need More Info", but I think I have provided everything I h... Andrew Davidoff
05:58 AM Backport #51583 (In Progress): nautilus: osd does not proactively remove leftover PGs
Mykola Golub
01:11 AM Backport #51583 (Resolved): nautilus: osd does not proactively remove leftover PGs
https://github.com/ceph/ceph/pull/42240 Backport Bot
05:48 AM Backport #51582 (In Progress): octopus: osd does not proactively remove leftover PGs
Mykola Golub
01:11 AM Backport #51582 (Resolved): octopus: osd does not proactively remove leftover PGs
https://github.com/ceph/ceph/pull/42239 Backport Bot
05:47 AM Backport #51584 (In Progress): pacific: osd does not proactively remove leftover PGs
Mykola Golub
01:11 AM Backport #51584 (Resolved): pacific: osd does not proactively remove leftover PGs
https://github.com/ceph/ceph/pull/42238 Backport Bot
05:39 AM Fix #51580 (Fix Under Review): qa/standalone: Add missing teardowns at the end of a subset of osd...
Sridhar Seshasayee
01:06 AM Bug #38931 (Pending Backport): osd does not proactively remove leftover PGs
Kefu Chai
12:27 AM Bug #51000: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
Ok, I'll take a look. Myoungwon Oh

07/07/2021

10:24 PM Bug #51581 (Resolved): scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed
... Sridhar Seshasayee
10:18 PM Bug #50245: TEST_recovery_scrub_2: Not enough recovery started simultaneously
/a/sseshasa-2021-07-07_19:22:19-rados:standalone-master-distro-basic-smithi/6258022 Sridhar Seshasayee
10:16 PM Bug #49961: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub_1 failed
/a/sseshasa-2021-07-07_19:22:19-rados:standalone-master-distro-basic-smithi/6258018
/a/sseshasa-2021-07-14_10:37:09-...
Sridhar Seshasayee
09:49 PM Fix #51580 (Resolved): qa/standalone: Add missing teardowns at the end of a subset of osd and scr...
A subset of osd and scrub standalone tests are not properly cleaning up after
completion.

# osd/osd-force-cre...
Sridhar Seshasayee
05:44 PM Backport #51556 (In Progress): pacific: mon: return -EINVAL when handling unknown option in 'ceph...
Cory Snyder
08:20 AM Backport #51556 (Resolved): pacific: mon: return -EINVAL when handling unknown option in 'ceph os...
https://github.com/ceph/ceph/pull/42229 Backport Bot
05:17 PM Backport #51568 (In Progress): pacific: pool last_epoch_clean floor is stuck after pg merging
Cory Snyder
01:26 PM Backport #51568 (Resolved): pacific: pool last_epoch_clean floor is stuck after pg merging
https://github.com/ceph/ceph/pull/42224 Backport Bot
05:13 PM Backport #51498 (In Progress): pacific: mgr spamming with repeated set pgp_num_actual while merging
Cory Snyder
05:11 PM Backport #51371 (Resolved): pacific: OSD crash FAILED ceph_assert(!is_scrubbing())
https://github.com/ceph/ceph/pull/41944 Cory Snyder
03:53 PM Bug #51000: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
Myoungwon Oh, can you please help take a look at this. Neha Ojha
03:47 PM Bug #51576 (New): qa/tasks/radosbench.py times out
... Neha Ojha
02:56 PM Backport #51549 (In Progress): pacific: cephadm bootstrap on arm64 fails to start ceph/ceph-grafa...
https://github.com/ceph/ceph/pull/42211 Deepika Upadhyay
07:58 AM Backport #51549 (Resolved): pacific: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana ...
https://github.com/ceph/ceph/pull/42211 Kefu Chai
02:36 PM Backport #51570 (In Progress): pacific: CommandCrashedError: Command crashed: 'mkdir -p -- /home/...
https://github.com/ceph/ceph/pull/42221 Neha Ojha
02:35 PM Backport #51570 (Resolved): pacific: CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubu...
https://github.com/ceph/ceph/pull/42221 Backport Bot
02:30 PM Bug #50393 (Pending Backport): CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubuntu/ce...
... Neha Ojha
01:26 PM Backport #51569 (Resolved): octopus: pool last_epoch_clean floor is stuck after pg merging
https://github.com/ceph/ceph/pull/42837 Backport Bot
01:23 PM Bug #48212 (Pending Backport): pool last_epoch_clean floor is stuck after pg merging
Kefu Chai
11:02 AM Bug #50441 (Resolved): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
Deepika Upadhyay
10:16 AM Bug #50441: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
moved temporarily to RADOS so that we can use the backport scripts Deepika Upadhyay
10:15 AM Bug #50441 (Pending Backport): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
Deepika Upadhyay
09:33 AM Bug #50441 (Resolved): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
Sebastian Wagner
09:36 AM Bug #42884: OSDMapTest.CleanPGUpmaps failure
https://jenkins.ceph.com/job/ceph-pull-requests/78813/consoleFull#-4535647526733401c-e9d0-4737-9832-6594c5da0afa Deepika Upadhyay
08:30 AM Bug #45457 (Pending Backport): CEPH Graylog Logging Missing "host" Field
Kefu Chai
08:20 AM Backport #51555 (Resolved): octopus: mon: return -EINVAL when handling unknown option in 'ceph os...
https://github.com/ceph/ceph/pull/43266 Backport Bot
08:03 AM Backport #51553 (Resolved): pacific: rebuild-mondb hangs
https://github.com/ceph/ceph/pull/42411 Kefu Chai
08:02 AM Backport #51552 (Resolved): octopus: rebuild-mondb hangs
https://github.com/ceph/ceph/pull/43263 Kefu Chai
07:59 AM Backport #51551 (Rejected): octopus: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana ...
Kefu Chai

07/06/2021

05:38 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2021-07-02_17:35:44-rados-pacific-distro-basic-smithi/6249971 Neha Ojha
05:35 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
/a/yuriw-2021-07-02_17:35:44-rados-pacific-distro-basic-smithi/6250131 Neha Ojha
05:32 PM Bug #51000: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
similar failure seen in pacific... Neha Ojha
09:27 AM Bug #23565 (Fix Under Review): Inactive PGs don't seem to cause HEALTH_ERR
Dan van der Ster

07/05/2021

01:27 PM Bug #46847: Loss of placement information on OSD reboot
Last week we had a power outage affecting all OSD machines in a 14.2.20 cluster. A small percentage of PGs didn't act... Dan van der Ster
01:18 PM Bug #51527 (Resolved): Ceph osd crashed due to segfault
Hi everyone,
We have 9 OSD nodes with 12 daemons on each node.
Ceph is used for S3 objects and RBD images.
ceph ...
Evgeny Zakharov
12:06 PM Bug #48965: qa/standalone/osd/osd-force-create-pg.sh: TEST_reuse_id: return 1
/a/sseshasa-2021-07-05_10:18:42-rados:standalone-wip-test-stdalone-mclk-1-distro-basic-smithi/6253062 Sridhar Seshasayee
11:49 AM Bug #45761 (Need More Info): mon_thrasher: "Error ENXIO: mon unavailable" during sync_force comma...
Stopped reproducing; please reopen if you hit another instance Deepika Upadhyay
11:47 AM Bug #48609 (Closed): osd/PGLog: don’t fast-forward can_rollback_to during merge_log if the log is...
Root cause resolved Deepika Upadhyay
11:46 AM Backport #51522 (Resolved): pacific: osd: Delay sending info to new backfill peer resetting last_...
https://github.com/ceph/ceph/pull/41136 Deepika Upadhyay
11:35 AM Backport #51522 (Resolved): pacific: osd: Delay sending info to new backfill peer resetting last_...
Deepika Upadhyay
11:45 AM Backport #51523: octopus: osd: Delay sending info to new backfill peer resetting last_backfill un...
https://github.com/ceph/ceph/pull/40593/ Deepika Upadhyay
11:35 AM Backport #51523 (Resolved): octopus: osd: Delay sending info to new backfill peer resetting last_...
Deepika Upadhyay
11:36 AM Backport #51525 (Rejected): octopus: osd: Delay sending info to new backfill peer resetting last_...
Backport Bot
11:35 AM Bug #48611: osd: Delay sending info to new backfill peer resetting last_backfill until backfill a...
since Nautilus has reached EOL, removed it Deepika Upadhyay
11:34 AM Bug #48611 (Pending Backport): osd: Delay sending info to new backfill peer resetting last_backfi...
Deepika Upadhyay
04:19 AM Bug #45457 (Fix Under Review): CEPH Graylog Logging Missing "host" Field
Kefu Chai

07/03/2021

06:16 AM Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
Another OSD crash after the scrub assert bug, log attached; corrupted rocksdb. Andrej Filipcic

07/02/2021

06:55 PM Bug #50866: osd: stat mismatch on objects
/ceph/teuthology-archive/pdonnell-2021-07-02_10:08:50-fs-wip-pdonnell-testing-20210701.192056-distro-basic-smithi/624... Patrick Donnelly
05:00 PM Backport #51498 (Resolved): pacific: mgr spamming with repeated set pgp_num_actual while merging
https://github.com/ceph/ceph/pull/42223 Backport Bot
05:00 PM Backport #51497 (Rejected): nautilus: mgr spamming with repeated set pgp_num_actual while merging
https://github.com/ceph/ceph/pull/43218 Backport Bot
05:00 PM Backport #51496 (Resolved): octopus: mgr spamming with repeated set pgp_num_actual while merging
https://github.com/ceph/ceph/pull/42420 Backport Bot
04:59 PM Bug #51433 (Pending Backport): mgr spamming with repeated set pgp_num_actual while merging
Kefu Chai

07/01/2021

09:39 PM Bug #51307: LibRadosWatchNotify.Watch2Delete fails
Seems very similar to https://tracker.ceph.com/issues/50042#note-2 Neha Ojha
09:05 PM Bug #48417 (Duplicate): unfound EC objects in sepia's LRC after upgrade
Neha Ojha
06:57 PM Backport #51453: pacific: Add simultaneous scrubs to rados/thrash
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42120
merged
Yuri Weinstein
04:42 PM Bug #48212 (Fix Under Review): pool last_epoch_clean floor is stuck after pg merging
Dan van der Ster
02:13 PM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
Dan van der Ster wrote:
> I suspect the cause is that there's a leftover epoch value for the now-deleted PG in `epoc...
Dan van der Ster
01:28 PM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
I suspect the cause is that there's a leftover epoch value for the now-deleted PG in `epoch_by_pg` in `void LastEpoch... Dan van der Ster
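Dan's hypothesis above can be illustrated with a simplified model. The sketch below is illustrative Python, not Ceph's actual C++ `LastEpochClean` code: if the per-pool floor is taken as the minimum over every PG ever recorded in `epoch_by_pg`, a leftover entry for a PG deleted by merging pins the floor indefinitely.

```python
# Simplified model (assumption: illustrative only, not the real Ceph code) of a
# per-pool last_epoch_clean floor pinned by a stale epoch_by_pg entry.
epoch_by_pg = {"1.0": 500, "1.1": 500, "1.2": 120}  # "1.2" was merged away at epoch 120

def floor(epochs):
    # Floor = minimum clean epoch over all recorded PGs, leftovers included.
    return min(epochs.values())

assert floor(epoch_by_pg) == 120   # stuck at the deleted PG's stale epoch

# The fix direction: drop entries for PGs that no longer exist after merging.
epoch_by_pg.pop("1.2")
assert floor(epoch_by_pg) == 500   # the floor can advance again
```

Removing the stale entry lets the pool floor track the surviving PGs again, which matches the "stuck after pg merging" symptom.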
04:38 PM Bug #38931 (Fix Under Review): osd does not proactively remove leftover PGs
Our customer reported a similar case, providing an easy way to reproduce the issue: if when purging a pg the osd is m... Mykola Golub
11:03 AM Fix #51464 (Fix Under Review): osd: Add mechanism to avoid running osd benchmark on osd init when...
Sridhar Seshasayee
08:54 AM Fix #51464 (Resolved): osd: Add mechanism to avoid running osd benchmark on osd init when using m...
The current behavior is to let the osd benchmark run on each osd
init, which is not necessary. If the underlying dev...
Sridhar Seshasayee
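The mechanism Sridhar describes can be sketched as follows. All names here are hypothetical and the real change lives in Ceph's C++ OSD/mClock code; the idea is simply to reuse a previously measured IOPS capacity for the same device instead of re-running the benchmark on every OSD init.

```python
# Hedged sketch (hypothetical names, not the actual Ceph implementation):
# skip the osd bench on init when a prior measurement exists for the device.
stored = {}  # device -> measured IOPS; persisted across restarts in reality

def osd_init(device, run_bench):
    if device in stored:
        return stored[device]      # reuse the prior measurement, no benchmark
    stored[device] = run_bench()   # first init: measure once and remember
    return stored[device]

calls = []
bench = lambda: calls.append(1) or 3200   # pretend benchmark: 3200 IOPS
assert osd_init("sdb", bench) == 3200
assert osd_init("sdb", bench) == 3200     # second init skips the benchmark
assert len(calls) == 1                    # benchmark ran exactly once
```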
07:44 AM Bug #51463 (Resolved): blocked requests while stopping/starting OSDs
Hi,
we run into a lot of slow requests (I/O blocked for several seconds) while stopping or starting one or more OS...
Manuel Lausch
07:13 AM Bug #51419: bufferlist::splice() may cause stack corruption in bufferlist::rebuild_aligned_size_a...
Initially triggered with fio when testing rbd persistent write-back cache in ssd mode:... Ilya Dryomov

06/30/2021

09:50 PM Bug #49894 (In Progress): set a non-zero default value for osd_client_message_cap
Neha Ojha wrote:
> Neha Ojha wrote:
> > The current default of 0 doesn't help and we've tried setting it to 5000 fo...
Neha Ojha
06:53 PM Backport #50790: octopus: osd: write_trunc omitted to clear data digest
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41620
merged
Yuri Weinstein
06:50 PM Backport #50791: pacific: osd: write_trunc omitted to clear data digest
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42019
merged
Yuri Weinstein
06:49 PM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
https://github.com/ceph/ceph/pull/41944 merged Yuri Weinstein
06:44 PM Bug #51457 (New): qa/standalone/scrub/osd-scrub-test.sh: TEST_interval_changes: date check failed
... Neha Ojha
04:44 PM Backport #51453 (In Progress): pacific: Add simultaneous scrubs to rados/thrash
Neha Ojha
04:30 PM Backport #51453 (Resolved): pacific: Add simultaneous scrubs to rados/thrash
https://github.com/ceph/ceph/pull/42120 Backport Bot
04:39 PM Bug #45868: rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
/a/yuriw-2021-06-29_19:12:08-rados-wip-yuri2-testing-2021-06-28-0858-pacific-distro-basic-smithi/6243653 Neha Ojha
04:35 PM Bug #51454 (New): Simultaneous OSD's crash with tp_osd_tp on rocksdb::MergingIterator::Next()
Ceph v14.2.15
Main use case is RGW.
Bucket indexes on SSD OSDs.
Majority of SSD OSDs under bucket indexes are FileS...
Aleksandr Rudenko
04:30 PM Backport #51452 (Resolved): octopus: Add simultaneous scrubs to rados/thrash
https://github.com/ceph/ceph/pull/42422 Backport Bot
04:28 PM Bug #51451 (Resolved): Add simultaneous scrubs to rados/thrash
Motivated by https://tracker.ceph.com/issues/50346. Neha Ojha
09:18 AM Bug #51419 (Fix Under Review): bufferlist::splice() may cause stack corruption in bufferlist::reb...
Kefu Chai
06:55 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
http://qa-proxy.ceph.com/teuthology/ideepika-2021-06-30_04:28:06-rados-wip-yuri7-testing-2021-06-28-1224-octopus-dist... Deepika Upadhyay

06/29/2021

09:16 PM Bug #51433 (Fix Under Review): mgr spamming with repeated set pgp_num_actual while merging
Neha Ojha
08:33 PM Bug #51433 (Resolved): mgr spamming with repeated set pgp_num_actual while merging
While merging PGs our osdmaps are churning through ~2000 epochs per hour.
The osdmap diffs are empty:...
Dan van der Ster
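An illustrative model of the symptom reported above (assumption: simplified Python, not the actual mgr/mon code): if the mgr keeps re-issuing an unchanged `pgp_num_actual` and every command commits a new osdmap epoch, the map churns through epochs whose diffs are empty.

```python
# Simplified model (assumption: illustrative only) of osdmap epoch churn from
# repeatedly setting pgp_num_actual to its current value.
class OSDMap:
    def __init__(self):
        self.epoch = 0
        self.pgp_num_actual = 64

    def set_pgp_num_actual(self, v):
        self.epoch += 1                       # every command commits a new epoch...
        changed = (v != self.pgp_num_actual)  # ...even when nothing changes
        self.pgp_num_actual = v
        return changed

m = OSDMap()
for _ in range(5):
    m.set_pgp_num_actual(64)   # mgr resends the same value on each tick
assert m.epoch == 5            # five new epochs, all with empty diffs
```

In this model the fix is for the caller to compare against the current map value and skip the no-op command, which stops the epoch churn.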
08:03 PM Bug #49525: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10121515-14:e4 snaps missi...
The sequence looks a little different this time.
/a/rfriedma-2021-06-26_19:32:15-rados-wip-ronenf-scrubs-config-distr...
Neha Ojha
05:03 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
/a/yuriw-2021-06-28_17:32:48-rados-wip-yuri2-testing-2021-06-28-0858-pacific-distro-basic-smithi/6239590 Neha Ojha
04:59 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2021-06-28_17:32:48-rados-wip-yuri2-testing-2021-06-28-0858-pacific-distro-basic-smithi/6239575 Neha Ojha
08:13 AM Bug #48613 (Resolved): Reproduce https://tracker.ceph.com/issues/48417
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:12 AM Bug #49139 (Resolved): rados/perf: cosbench workloads hang forever
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:11 AM Bug #49988 (Resolved): Global Recovery Event never completes
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:11 AM Bug #50230 (Resolved): mon: spawn loop after mon reinstalled
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:10 AM Bug #50466 (Resolved): _delete_some additional unexpected onode list
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:10 AM Bug #50477 (Resolved): mon/MonClient: reset authenticate_err in _reopen_session()
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:08 AM Bug #50964 (Resolved): mon: slow ops due to osd_failure
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:03 AM Backport #51237 (Resolved): nautilus: rebuild-mondb hangs
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41874
m...
Loïc Dachary
08:03 AM Bug #50245: TEST_recovery_scrub_2: Not enough recovery started simultaneously
/a/kchai-2021-06-27_13:33:07-rados-wip-kefu-testing-2021-06-27-1907-distro-basic-smithi/6238237 Kefu Chai
08:00 AM Backport #50987 (Resolved): octopus: unaligned access to member variables of crush_work_bucket
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41622
m...
Loïc Dachary
08:00 AM Backport #50796 (Resolved): octopus: mon: spawn loop after mon reinstalled
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41621
m...
Loïc Dachary
07:56 AM Backport #51269 (Resolved): octopus: rados/perf: cosbench workloads hang forever
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41922
m...
Loïc Dachary
07:56 AM Backport #50990: octopus: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41618
m...
Loïc Dachary
07:53 AM Backport #50705 (Resolved): octopus: _delete_some additional unexpected onode list
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41623
m...
Loïc Dachary
07:53 AM Backport #50152 (Resolved): octopus: Reproduce https://tracker.ceph.com/issues/48417
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41609
m...
Loïc Dachary
07:52 AM Backport #50750 (Resolved): octopus: max_misplaced was replaced by target_max_misplaced_ratio
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41624
m...
Loïc Dachary
07:38 AM Backport #51313 (Resolved): pacific: osd:scrub skip some pg
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41971
m...
Loïc Dachary
07:37 AM Backport #50505 (Resolved): pacific: mon/MonClient: reset authenticate_err in _reopen_session()
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41019
m...
Loïc Dachary
07:37 AM Backport #50986 (Resolved): pacific: unaligned access to member variables of crush_work_bucket
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41983
m...
Loïc Dachary
07:36 AM Backport #50989 (Resolved): pacific: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41982
m...
Loïc Dachary
07:32 AM Backport #50797 (Resolved): pacific: mon: spawn loop after mon reinstalled
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41768
m...
Loïc Dachary
07:28 AM Backport #51215: pacific: Global Recovery Event never completes
Nathan, would you be so kind as to add a link to this issue in https://github.com/ceph/ceph/pull/41872 ? Loïc Dachary
07:27 AM Backport #51215 (Resolved): pacific: Global Recovery Event never completes
Loïc Dachary
07:19 AM Backport #50706 (Resolved): pacific: _delete_some additional unexpected onode list
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41680
m...
Loïc Dachary
05:40 AM Bug #51419 (Resolved): bufferlist::splice() may cause stack corruption in bufferlist::rebuild_ali...
*** stack smashing detected ***: terminated2073 IOPS][eta 02h:59m:36s]
--Type <RET> for more, q to quit, c to contin...
CONGMIN YIN

06/28/2021

07:29 PM Backport #50987: octopus: unaligned access to member variables of crush_work_bucket
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41622
merged
Yuri Weinstein
07:29 PM Backport #50796: octopus: mon: spawn loop after mon reinstalled
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41621
merged
Yuri Weinstein
04:25 PM Backport #50505: pacific: mon/MonClient: reset authenticate_err in _reopen_session()
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41019
merged
Reviewed-by: Kefu Chai <kchai@redhat.com>
Yuri Weinstein
06:54 AM Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors i...
I see a similar crash on Quincy; suspect it's seen when I try to grow the mons from 1 to 3.
/]# ceph crash info 2021-06...
Tejas C

06/26/2021

02:27 PM Backport #50986: pacific: unaligned access to member variables of crush_work_bucket
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41983
merged
Yuri Weinstein
02:26 PM Backport #50989: pacific: mon: slow ops due to osd_failure
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41982
merged
Yuri Weinstein

06/25/2021

09:43 PM Bug #51101: rados/test_envlibrados_for_rocksdb.sh: cmake: symbol lookup error: cmake: undefined s...
/a/yuriw-2021-06-24_16:54:31-rados-wip-yuri-testing-2021-06-24-0708-pacific-distro-basic-smithi/6190738 Neha Ojha
03:50 PM Backport #51371 (Resolved): pacific: OSD crash FAILED ceph_assert(!is_scrubbing())
https://github.com/ceph/ceph/pull/41944 Backport Bot
03:48 PM Bug #50346 (Pending Backport): OSD crash FAILED ceph_assert(!is_scrubbing())
Neha Ojha
06:48 AM Backport #50990 (Resolved): octopus: mon: slow ops due to osd_failure
Kefu Chai

06/24/2021

11:22 PM Backport #50791 (In Progress): pacific: osd: write_trunc omitted to clear data digest
Neha Ojha

06/23/2021

10:43 PM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
Andrej Filipcic wrote:
> A related crash, happened when I disabled scrubbing:
>
> -1> 2021-06-14T11:17:15.373...
Neha Ojha
10:42 PM Bug #51338 (Duplicate): osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>...
Originally reported in https://tracker.ceph.com/issues/50346#note-6... Neha Ojha
06:16 PM Backport #50796: octopus: mon: spawn loop after mon reinstalled
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41621
merged
Yuri Weinstein
03:31 PM Backport #50797: pacific: mon: spawn loop after mon reinstalled
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41768
merged
Yuri Weinstein
03:30 PM Bug #49988: Global Recovery Event never completes
https://github.com/ceph/ceph/pull/41872 merged Yuri Weinstein

06/22/2021

10:47 PM Backport #50986 (In Progress): pacific: unaligned access to member variables of crush_work_bucket
Neha Ojha
10:44 PM Backport #50989 (In Progress): pacific: mon: slow ops due to osd_failure
Neha Ojha
05:21 PM Backport #50705: octopus: _delete_some additional unexpected onode list
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41623
merged
Yuri Weinstein
05:21 PM Backport #50152: octopus: Reproduce https://tracker.ceph.com/issues/48417
Dan van der Ster wrote:
> Nathan I've done the manual backport here: https://github.com/ceph/ceph/pull/41609
> Copy...
Yuri Weinstein
11:23 AM Backport #51315 (In Progress): nautilus: osd:scrub skip some pg
Mykola Golub
10:55 AM Backport #51315 (Resolved): nautilus: osd:scrub skip some pg
https://github.com/ceph/ceph/pull/41973 Mykola Golub
11:11 AM Backport #51314 (In Progress): octopus: osd:scrub skip some pg
Mykola Golub
10:55 AM Backport #51314 (Resolved): octopus: osd:scrub skip some pg
https://github.com/ceph/ceph/pull/41972 Mykola Golub
10:56 AM Backport #51313 (In Progress): pacific: osd:scrub skip some pg
Mykola Golub
10:55 AM Backport #51313 (Resolved): pacific: osd:scrub skip some pg
https://github.com/ceph/ceph/pull/41971 Mykola Golub
10:55 AM Backport #51316 (Duplicate): nautilus: osd:scrub skip some pg
Backport Bot
10:53 AM Bug #49487 (Pending Backport): osd:scrub skip some pg
Mykola Golub
10:28 AM Bug #50346 (Fix Under Review): OSD crash FAILED ceph_assert(!is_scrubbing())
Ronen Friedman
06:59 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
Andrej Filipcic wrote:
> On a 60-node, 1500 HDD cluster, and 16.2.4 release, this issue become very frequent, especi...
玮文 胡

06/21/2021

08:50 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
FYI I tried with ceph/daemon-base:master-24e1f91-pacific-centos-8-x86_64 (the latest non-devel build at this time) ju... Andrew Davidoff
06:07 PM Bug #51307 (Resolved): LibRadosWatchNotify.Watch2Delete fails
... Sage Weil
03:40 PM Bug #51270: mon: stretch mode clusters do not sanely set default crush rules
Accidentally requested backports to Octopus/Nautilus, so nuking those. Greg Farnum
03:40 PM Backport #51289 (Rejected): octopus: mon: stretch mode clusters do not sanely set default crush r...
Accidental backport request Greg Farnum
03:40 PM Backport #51288 (Rejected): nautilus: mon: stretch mode clusters do not sanely set default crush ...
Accidental backport request Greg Farnum

06/20/2021

11:58 AM Bug #50346 (In Progress): OSD crash FAILED ceph_assert(!is_scrubbing())
Ronen Friedman

06/19/2021

03:00 PM Backport #51290 (Resolved): pacific: mon: stretch mode clusters do not sanely set default crush r...
https://github.com/ceph/ceph/pull/42909 Backport Bot
03:00 PM Backport #51289 (Rejected): octopus: mon: stretch mode clusters do not sanely set default crush r...
Backport Bot
03:00 PM Backport #51288 (Rejected): nautilus: mon: stretch mode clusters do not sanely set default crush ...
Backport Bot
02:58 PM Bug #51270 (Pending Backport): mon: stretch mode clusters do not sanely set default crush rules
Kefu Chai
01:15 PM Backport #51287 (Resolved): pacific: LibRadosService.StatusFormat failed, Expected: (0) != (retry...
https://github.com/ceph/ceph/pull/46677 Backport Bot
01:12 PM Bug #51234 (Pending Backport): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
Kefu Chai
01:11 PM Bug #51234 (Resolved): LibRadosService.StatusFormat failed, Expected: (0) != (retry), actual: 0 vs 0
Kefu Chai
02:19 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
I think this one is related?
/ceph/teuthology-archive/pdonnell-2021-06-16_21:26:55-fs-wip-pdonnell-testing-2021061...
Patrick Donnelly

06/18/2021

09:15 PM Bug #51083: Raw space filling up faster than used space
I don't have any ideas from the logs. Moving this back to RADOS. I doubt it has anything to do with CephFS. Patrick Donnelly
11:56 AM Bug #51083: Raw space filling up faster than used space
Patrick Donnelly wrote:
> Scrub is unlikely to help.
I came to the same conclusion after reading the documentatio...
Jan-Philipp Litza

06/17/2021

11:07 PM Bug #51083: Raw space filling up faster than used space
Jan-Philipp Litza wrote:
> Yesterday evening we finally managed to upgrade the MDS daemons as well, and that seems t...
Patrick Donnelly
09:13 PM Bug #51083: Raw space filling up faster than used space
Patrick: do you understand how upgrading the MDS daemons helped in this case? There is nothing in the osd/bluestore s... Neha Ojha
09:03 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
We definitely do not use cache tiering on any of our clusters. On the cluster above, we do use snapshots (via cephfs... Andras Pataki
08:48 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
It seems like you are using cache tiering, and there have been similar bugs reported like this. I don't understand why... Neha Ojha
09:01 PM Bug #51234 (Fix Under Review): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
Sage Weil
08:57 PM Bug #50842 (Need More Info): pacific: recovery does not complete because of rw_manager lock not ...
Neha Ojha
08:53 PM Backport #51269 (In Progress): octopus: rados/perf: cosbench workloads hang forever
Deepika Upadhyay
07:14 PM Backport #51269 (Resolved): octopus: rados/perf: cosbench workloads hang forever
https://github.com/ceph/ceph/pull/41922 Deepika Upadhyay
08:42 PM Bug #51074 (Pending Backport): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with...
marking Pending Backport, needs to be included with https://github.com/ceph/ceph/pull/41731 Neha Ojha
08:40 PM Bug #51168 (Need More Info): ceph-osd state machine crash during peering process
Can you please attach the osd log for this crash? Neha Ojha
08:07 PM Bug #51270 (Fix Under Review): mon: stretch mode clusters do not sanely set default crush rules
Greg Farnum
08:03 PM Bug #51270 (Pending Backport): mon: stretch mode clusters do not sanely set default crush rules
If you do not specify a crush rule when creating a pool, the OSDMonitor picks the default one for you out of the conf... Greg Farnum
07:12 PM Bug #49139 (Pending Backport): rados/perf: cosbench workloads hang forever
Deepika Upadhyay
02:32 PM Backport #51237: nautilus: rebuild-mondb hangs
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41874
merged
Yuri Weinstein

06/16/2021

10:40 PM Bug #51254 (New): deep-scrub stat mismatch on last PG in pool
In the past few weeks, we got inconsistent PGs in deep-scrub a few times, always on the very last PG in the pool:
...
Andras Pataki
07:25 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
... Deepika Upadhyay
07:22 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
/ceph/teuthology-archive/yuriw-2021-06-14_19:20:57-rados-wip-yuri6-testing-2021-06-14-1106-octopus-distro-basic-smith... Deepika Upadhyay
06:48 PM Bug #50042: rados/test.sh: api_watch_notify failures
... Deepika Upadhyay
02:49 PM Bug #51246 (New): error in open_pools_parallel: rados_write(0.obj) failed with error: -2
... Deepika Upadhyay
01:22 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
> Will this patch be released in 14.2.22?
Yes, the PR has been merged to the nautilus branch, so it will be in the ...
Dan van der Ster
12:59 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
We hit this bug yesterday in a nautilus 14.2.18 cluster.
All monitors went down and started crashing on restart.
...
Rob Haverkamp
02:18 AM Backport #51237 (In Progress): nautilus: rebuild-mondb hangs
Kefu Chai
02:16 AM Backport #51237 (Resolved): nautilus: rebuild-mondb hangs
https://github.com/ceph/ceph/pull/41874 Backport Bot
02:13 AM Bug #38219 (Pending Backport): rebuild-mondb hangs
Kefu Chai

06/15/2021

08:05 PM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
Just to note:
IMO the ceph-bluestore-tool crash is caused by a bug in AvlAllocator and is a duplicate of https://tracker...
Igor Fedotov
06:59 PM Backport #51215 (In Progress): pacific: Global Recovery Event never completes
Kamoltat (Junior) Sirivadhna
12:55 AM Backport #51215 (Resolved): pacific: Global Recovery Event never completes
Backport PR https://github.com/ceph/ceph/pull/41872 Backport Bot
06:50 PM Backport #50706: pacific: _delete_some additional unexpected onode list
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41680
merged
Yuri Weinstein
06:47 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
@Neha: I did, but am afraid they are lost, the test was from https://pulpito.ceph.com/ideepika-2021-05-17_10:16:28-ra... Deepika Upadhyay
06:31 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
@Deepika, do you happen to have the logs saved somewhere? Neha Ojha
06:41 PM Bug #51234 (Pending Backport): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
... Neha Ojha
06:30 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/osd-dispatch-... Neha Ojha
06:26 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
Looks very similar... Neha Ojha
04:12 PM Backport #50750: octopus: max_misplaced was replaced by target_max_misplaced_ratio
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41624
merged
Yuri Weinstein
12:20 PM Bug #51223 (New): statfs: a cluster with filestore and bluestore OSD's will report bytes_used == ...
Cluster migrated from Luminous mixed bluestore+filestore OSD's to Nautilus 14.2.21
After last filestore OSD purged f...
Konstantin Shalygin
10:47 AM Bug #49677 (Resolved): debian ceph-common package post-inst clobbers ownership of cephadm log dirs
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:47 AM Bug #49781 (Resolved): unittest_mempool.check_shard_select failed
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:44 AM Bug #50501 (Resolved): osd/scheduler/mClockScheduler: Async reservers are not updated with the ov...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:44 AM Bug #50558 (Resolved): Data loss propagation after backfill
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
10:42 AM Backport #50795 (Resolved): nautilus: mon: spawn loop after mon reinstalled
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41762
m...
Loïc Dachary
10:41 AM Backport #50704 (Resolved): nautilus: _delete_some additional unexpected onode list
Loïc Dachary
10:36 AM Backport #50153 (Resolved): nautilus: Reproduce https://tracker.ceph.com/issues/48417
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41611
m...
Loïc Dachary
10:36 AM Backport #49729 (Resolved): nautilus: debian ceph-common package post-inst clobbers ownership of ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40698
m...
Loïc Dachary
10:32 AM Backport #50988: nautilus: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41519
m...
Loïc Dachary
09:05 AM Backport #50406: pacific: mon: new monitors may direct MMonJoin to a peon instead of the leader
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41131
m...
Loïc Dachary
09:04 AM Backport #50344: pacific: mon: stretch state is inconsistently-maintained on peons, preventing pr...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41130
m...
Loïc Dachary
09:04 AM Backport #50794 (Resolved): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41320
m...
Loïc Dachary
09:03 AM Backport #50702 (Resolved): pacific: Data loss propagation after backfill
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41236
m...
Loïc Dachary
09:03 AM Backport #50606 (Resolved): pacific: osd/scheduler/mClockScheduler: Async reservers are not updat...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41125
m...
Loïc Dachary
09:02 AM Backport #49992 (Resolved): pacific: unittest_mempool.check_shard_select failed
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40566
m...
Loïc Dachary
12:54 AM Bug #49988 (Pending Backport): Global Recovery Event never completes
Neha Ojha

06/14/2021

03:24 PM Feature #51213 (Resolved): [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
For now, we do not have a global flag, like `ceph osd set noout` for the pg autoscale feature. We have pool flags[1] ... Vikhyat Umrao
09:24 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())

A related crash, happened when I disabled scrubbing:
-1> 2021-06-14T11:17:15.373+0200 7fb9916f5700 -1 /home/...
Andrej Filipcic

06/13/2021

03:34 PM Bug #51194: PG recovery_unfound after scrub repair failed on primary
To prevent the user IO from being blocked, we took this action:
1. First, we queried the unfound objects. osd.951 ...
Dan van der Ster
01:36 PM Bug #51194 (New): PG recovery_unfound after scrub repair failed on primary
This comes from a mail I send to the ceph-users ML: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3... Dan van der Ster
03:30 PM Backport #51195 (Resolved): pacific: [rfe] increase osd_max_write_op_reply_len default value to 6...
https://github.com/ceph/ceph/pull/53470 Backport Bot
03:28 PM Bug #51166 (Pending Backport): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
Kefu Chai
 