Activity

From 04/26/2022 to 05/25/2022

05/25/2022

08:37 PM Bug #55750: mon: slow request of very long time
Radoslaw Zarzynski wrote:
> Could you please provide info on which version of Ceph this issue happened?
# ceph -...
yite gu
06:19 PM Bug #55750 (Need More Info): mon: slow request of very long time
Could you please provide info on which version of Ceph this issue happened? Radoslaw Zarzynski
08:17 PM Bug #53895 (Resolved): Unable to format `ceph config dump` command output in yaml using `-f yaml`
Laura Flores
06:46 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
Not urgent, perhaps not low-hanging-fruit but still good as a training issue. Radoslaw Zarzynski
06:41 PM Bug #55726 (Need More Info): Drained OSDs are still ACTIVE_PRIMARY - casuing high IO latency on c...
It would be really helpful to compare logs around @choose_acting@ from Nautilus vs Octopus. Radoslaw Zarzynski
06:32 PM Backport #55768 (Resolved): pacific: rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
https://github.com/ceph/ceph/pull/46499 Backport Bot
06:32 PM Backport #55767 (Rejected): octopus: rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
https://github.com/ceph/ceph/pull/46500 Backport Bot
06:28 PM Bug #45868 (Pending Backport): rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
Neha Ojha
06:27 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
Let me paste Laura's comment from https://github.com/ceph/ceph/pull/45825:
> @NitzanMordhai perhaps similar logi...
Radoslaw Zarzynski
06:11 PM Bug #46847: Loss of placement information on OSD reboot
Notes from the bug scrub:
1. There is a theoretical way to enter backfill instead of recovery in such a scenario.
...
Radoslaw Zarzynski
05:57 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
https://tracker.ceph.com/issues/53685 shows the issue is not restricted just to @MOSDPGLog@. Radoslaw Zarzynski
05:56 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
The investigation doc: https://docs.google.com/document/d/1s-Vzv3yLTMSO8Hz_MHMg5ix1v53P4jlN6dX1L06yyls/edit#. Radoslaw Zarzynski
03:38 PM Backport #55743 (In Progress): octopus: "wait_for_recovery: failed before timeout expired" during...
Laura Flores
03:37 PM Backport #55745 (In Progress): pacific: "wait_for_recovery: failed before timeout expired" during...
Laura Flores
03:33 PM Backport #55744 (Resolved): quincy: "wait_for_recovery: failed before timeout expired" during thr...
Laura Flores
03:01 PM Backport #55624 (Resolved): quincy: Unable to format `ceph config dump` command output in yaml us...
Laura Flores
12:12 PM Feature #55764 (New): Adaptive mon_warn_pg_not_deep_scrubbed_ratio according to actual scrub thro...
This request comes from the Science Users Working Group https://pad.ceph.com/p/Ceph_Science_User_Group_20220524
Fo...
Dan van der Ster
06:54 AM Bug #55355: osd thread deadlock
... jianwei zhang
06:53 AM Bug #55355: osd thread deadlock
910583--wait-->910587: (gdb) t 32 wait MgrClient::lock (owner=910587)
910587--wait-->910429: (gdb) t 62 hold MgrClie...
jianwei zhang
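The wait chain above (one thread waiting on a lock whose owner is itself waiting on another thread) can be checked mechanically for a cycle, which is what would confirm a deadlock. The following is a minimal sketch, not taken from the ticket: the function name `find_wait_cycle` and the thread IDs in the usage note are hypothetical, and it assumes each blocked thread waits on exactly one lock owner, as read from the `owner=` field in gdb.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_on: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is none.

    waits_on maps a thread ID to the thread ID that owns the lock it is
    blocked on (assumption: a blocked thread waits on one owner at a time).
    """
    for start in waits_on:
        path: List[str] = []          # threads visited from this start, in order
        seen_at: Dict[str, int] = {}  # thread ID -> its index in path
        node: Optional[str] = start
        # Follow the chain until it ends or revisits a thread on this path.
        while node is not None and node not in seen_at:
            seen_at[node] = len(path)
            path.append(node)
            node = waits_on.get(node)
        if node is not None:
            # The chain came back to a thread already on the path: a cycle.
            return path[seen_at[node]:]
    return None
```

For example, with hypothetical edges `{"t1": "t2", "t2": "t3", "t3": "t1"}` the function reports the three threads as a cycle, while a chain that ends at a running thread yields `None`.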
03:43 AM Bug #55355: osd thread deadlock
thread-35 : holding AsyncMessenger::lock, waiting AsyncConnection::lock
thread-3: holding AsyncConnection::lock, wai...
jianwei zhang
03:12 AM Bug #55355: osd thread deadlock
... jianwei zhang
03:06 AM Bug #55355: osd thread deadlock
ceph v15.2.13
I found an almost identical stack waiting for a lock...
jianwei zhang

05/24/2022

03:51 PM Backport #55744 (In Progress): quincy: "wait_for_recovery: failed before timeout expired" during ...
Laura Flores
10:40 AM Bug #55662 (Rejected): EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop....
Nitzan Mordechai
10:39 AM Bug #55662: EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop.data.length...
The test needed osd_read_ec_check_for_errors to be set to true. When it is set, the EIO error is ignored and we can g... Nitzan Mordechai
03:04 AM Bug #55750: mon: slow request of very long time
It appears that this mon request has completed, but it has not been erased from ops_in_flight_sharded?
yite gu
02:47 AM Bug #55750: mon: slow request of very long time
... yite gu
02:45 AM Bug #55750 (Need More Info): mon: slow request of very long time
... yite gu
02:38 AM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
We are seeing this bug in Nautilus 14.2.15 to 14.2.22 replicated pool.
Two of our osds are stuck in a crash loop ...
Justin Mammarella

05/23/2022

11:56 PM Backport #55747 (Resolved): pacific: Support blocklisting a CIDR range
https://github.com/ceph/ceph/pull/46470 Backport Bot
11:56 PM Backport #55746 (Resolved): quincy: Support blocklisting a CIDR range
https://github.com/ceph/ceph/pull/46469 Backport Bot
11:52 PM Feature #53050: Support blocklisting a CIDR range
The Backport field was empty, therefore no backport tickets were created. Neha Ojha
11:32 PM Backport #55745 (Resolved): pacific: "wait_for_recovery: failed before timeout expired" during th...
https://github.com/ceph/ceph/pull/46391 Backport Bot
11:32 PM Backport #55744 (Resolved): quincy: "wait_for_recovery: failed before timeout expired" during thr...
https://github.com/ceph/ceph/pull/46384 Backport Bot
11:31 PM Backport #55743 (Resolved): octopus: "wait_for_recovery: failed before timeout expired" during th...
https://github.com/ceph/ceph/pull/46392 Backport Bot
11:26 PM Bug #51076 (Pending Backport): "wait_for_recovery: failed before timeout expired" during thrashos...
Neha Ojha
09:24 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
/a/yuriw-2022-05-19_18:50:25-rados-wip-yuri4-testing-2022-05-19-0831-quincy-distro-default-smithi/6841763
Descriptio...
Laura Flores
08:16 PM Backport #55742 (In Progress): quincy: monitor cluster logs(ceph.log) appear empty until rotated
Vikhyat Umrao
07:55 PM Backport #55742 (Resolved): quincy: monitor cluster logs(ceph.log) appear empty until rotated
https://github.com/ceph/ceph/pull/46374 Backport Bot
07:50 PM Bug #55383 (Pending Backport): monitor cluster logs(ceph.log) appear empty until rotated
Vikhyat Umrao

05/21/2022

11:45 PM Backport #55306 (In Progress): quincy: prometheus metrics shows incorrect ceph version for upgrad...
Adam King
11:41 PM Backport #55306: quincy: prometheus metrics shows incorrect ceph version for upgraded ceph daemon
including this in https://github.com/ceph/ceph/pull/46360 Adam King

05/20/2022

03:33 PM Bug #55726: Drained OSDs are still ACTIVE_PRIMARY - casuing high IO latency on clients
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NQKDCBJ2SH3DTUCMV6KU4T3EGKOSCGJV/ Ilya Dryomov
02:14 PM Bug #55726 (Need More Info): Drained OSDs are still ACTIVE_PRIMARY - casuing high IO latency on c...
Hi
I observed high latencies and mount points hanging since Octopus release
and it's still observed on Pacific l...
Denis Polom
03:24 PM Bug #46847: Loss of placement information on OSD reboot
We also encountered a similar issue with an EC pool during rebalance; sometimes (osd overload or pg peering crash), th... Yao Ning

05/19/2022

10:01 PM Bug #51076 (Fix Under Review): "wait_for_recovery: failed before timeout expired" during thrashos...
Laura Flores
09:19 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
Neha Ojha wrote:
> Laura Flores wrote:
> > /a/yuriw-2022-03-25_18:42:52-rados-wip-yuri7-testing-2022-03-24-1341-pac...
Laura Flores
01:55 PM Bug #55711 (Fix Under Review): mon: race condition between `mgr fail` and MgrMonitor::prepare_bea...
Radoslaw Zarzynski
01:51 PM Bug #55711 (Resolved): mon: race condition between `mgr fail` and MgrMonitor::prepare_beacon()
https://gist.github.com/rzarzynski/25ac59c8422e9ad0b1710a765a77f19a#the-race-condition Radoslaw Zarzynski
06:01 AM Bug #55708 (Fix Under Review): Reducing 2 Monitors Causes Stray Daemon
Example of the problem:
Roles:
smithi001: mon.a
smithi002: mon.b
smithi070: mon.c
smithi100 : mon.d
smithi2...
Kamoltat (Junior) Sirivadhna
05:27 AM Bug #55662: EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop.data.length...
I used /qa/standalone/erasure-code/test-erasure-eio.sh; the test that failed is TEST_ec_object_attr_read_error when I... Nitzan Mordechai

05/18/2022

09:09 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
/a/yuriw-2022-05-13_14:13:55-rados-wip-yuri3-testing-2022-05-12-1609-octopus-distro-default-smithi/6832699 Laura Flores
08:56 PM Bug #52316: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum']) == len(mons)
/a/yuriw-2022-05-13_14:13:55-rados-wip-yuri3-testing-2022-05-12-1609-octopus-distro-default-smithi/6832711... Laura Flores
07:37 PM Bug #53485 (Fix Under Review): monstore: logm entries are not garbage collected
Neha Ojha
01:37 PM Bug #53485: monstore: logm entries are not garbage collected
PR https://github.com/ceph/ceph/pull/44511 Daniel Poelzleithner
06:26 PM Bug #55662: EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop.data.length...
Can you please add the test that helped you discover this issue? I believe the same test was passing with other EC pl... Neha Ojha
06:17 PM Bug #55407: quincy osd's fail to boot and crash
This looks like something new and unrelated to other crashes in this ticket, so created a new one: https://tracker.ce... Radoslaw Zarzynski
06:17 PM Bug #51858: octopus: rados/test_crash.sh failure
/a/nojha-2022-05-17_22:38:06-rados-wip-lrc-fix-pacific-distro-basic-smithi/6839177 Laura Flores
06:17 PM Bug #55698 (New): osd: segfault at boot up
In the https://tracker.ceph.com/issues/55407#note-14 an OSD crash during early boot up is reported:... Radoslaw Zarzynski
06:09 PM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
The common theme between these failures (this one and #47026) is the @check()@ function of @qa/standalone/osd-backfill/os... Radoslaw Zarzynski
03:02 PM Bug #55695: Shutting down a monitor forces Paxos to restart and sometimes disregard subsequent co...
https://docs.google.com/document/d/1ucVz54vMlm26oiqQoqJ2upUPmiROd4AmwSwbVM_s2A0/edit# Kamoltat (Junior) Sirivadhna
03:01 PM Bug #55695 (Fix Under Review): Shutting down a monitor forces Paxos to restart and sometimes disr...
*Problem:*
mon.a
mon.b
mon.c
mon.d
mon.e
ceph -a stop mon.d
ceph mon remove d
.
.
mon.d is down...
Kamoltat (Junior) Sirivadhna

05/17/2022

11:21 PM Feature #55693 (Fix Under Review): Limit the Health Detail MSG log size in cluster logs
RHBZ# https://bugzilla.redhat.com/show_bug.cgi?id=2087527
Version-Release number of selected component (if applica...
Vikhyat Umrao
10:58 PM Backport #55513: quincy: mount.ceph fails to understand AAAA records from SRV record
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46113
merged
Yuri Weinstein
10:56 PM Backport #55280: quincy: mon/OSDMonitor: properly set last_force_op_resend in stretch mode
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45871
merged
Yuri Weinstein
09:17 PM Bug #53485: monstore: logm entries are not garbage collected
Sorry, forgot to add - we faced this issue on v15.2.13 and on v14.2.22 as well. Peter Razumovsky
09:17 PM Bug #53485: monstore: logm entries are not garbage collected
We observed this several times on the customer side. Each of the 3 mon store.db instances was rapidly growing and had tons of logm keys ... Peter Razumovsky
06:05 PM Backport #52077 (Resolved): octopus: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
Laura Flores
09:15 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
... jianwei zhang
01:38 AM Bug #52884: osd: optimize pg peering latency when add new osd that need backfill
... jianwei zhang

05/16/2022

05:07 PM Bug #55670 (Fix Under Review): osdmaptool is not mapping child pgs to the target OSDs
Laura Flores
09:01 AM Bug #55670 (Fix Under Review): osdmaptool is not mapping child pgs to the target OSDs
Steps to reproduce the issue:
1. ceph osd getmap > osdmap.bin
2. ./bin/osdmaptool --test-map-pgs-dump --pool <pool...
dongdong tao
02:44 PM Bug #55559 (Duplicate): osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
Laura Flores
02:41 PM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
I opened a new issue since a different test failed this time. The failure does look the same though, so maybe the one... Laura Flores
02:43 PM Bug #47026: osd-backfill-stats.sh fails in TEST_backfill_ec_down_all_out
This was originally tracked in #55559. A different test was affected (TEST_backfill_ec_prim_out), but the failure loo... Laura Flores
09:40 AM Bug #52884: osd: optimize pg peering latency when add new osd that need backfill
https://github.com/ceph/ceph/pull/46281
Added code for the master branch.
jianwei zhang
09:17 AM Bug #55407: quincy osd's fail to boot and crash
Hi! Today I've just built master again.
Rebuilt osd and on first boot:
...
-79> 2022-05-16T09:10:21.707+0000...
Gonzalo Aguilar Delgado
08:51 AM Bug #55669: osd: add log for pg peering and activiting complete
https://github.com/ceph/ceph/pull/46279 jianwei zhang
08:51 AM Bug #55669 (New): osd: add log for pg peering and activiting complete
... jianwei zhang
08:22 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
Since I was developing on Ceph 15.2.13, I did not adapt it to master.
If there is no problem with the review, consi...
jianwei zhang
08:21 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
... jianwei zhang
08:16 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
... jianwei zhang
08:07 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
... jianwei zhang
08:06 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
... jianwei zhang
08:05 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
The problem was first described at this link:
https://tracker.ceph.com/issues/54966?next_issue_id=54965
jianwei zhang
08:03 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
https://github.com/ceph/ceph/pull/46276 jianwei zhang
07:56 AM Bug #55668 (New): osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recover...
problem:... jianwei zhang
06:34 AM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
Manuel Lausch wrote:
> Hi Nitzan,
> I checked your patch on the current pacific branch.
>
> unfortunately I stil...
jianwei zhang
06:30 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
pull_request: https://github.com/ceph/ceph/pull/46273 jianwei zhang
04:09 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
For scenarios where a single node has both mon and osd, and the number of osd on a single node is large, or the mon e... jianwei zhang
03:12 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
My test result:... jianwei zhang
03:11 AM Bug #55665 (Fix Under Review): osd: osd_fast_fail_on_connection_refused will cause the mon to con...
The first issue is described at https://tracker.ceph.com/issues/55067
Problem Description:...
jianwei zhang
04:10 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
https://tracker.ceph.com/issues/55665?next_issue_id=55662 jianwei zhang

05/15/2022

06:23 AM Bug #55662 (Rejected): EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop....
... Nitzan Mordechai

05/13/2022

10:49 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
I reproduced the symptoms of this bug locally by incrementing the notify count before an eq check. The extra incremen... Laura Flores
09:29 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
My test result:... jianwei zhang

05/12/2022

11:10 PM Backport #55633 (In Progress): octopus: ceph-osd takes all memory before oom on boot
https://github.com/ceph/ceph/pull/46253 Radoslaw Zarzynski
06:07 PM Backport #55633 (Rejected): octopus: ceph-osd takes all memory before oom on boot
Backport Bot
10:56 PM Backport #52077: octopus: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45320
merged
Yuri Weinstein
10:42 PM Backport #55631 (In Progress): pacific: ceph-osd takes all memory before oom on boot
https://github.com/ceph/ceph/pull/46252 Radoslaw Zarzynski
06:06 PM Backport #55631 (Resolved): pacific: ceph-osd takes all memory before oom on boot
Backport Bot
10:39 PM Backport #55632 (In Progress): quincy: ceph-osd takes all memory before oom on boot
https://github.com/ceph/ceph/pull/46251 Radoslaw Zarzynski
06:06 PM Backport #55632 (Resolved): quincy: ceph-osd takes all memory before oom on boot
Backport Bot
06:52 PM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
Hello Laura! Is there a thing that makes you think this isn't a duplicate of #47026? Radoslaw Zarzynski
06:48 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
If more data is necessary, it might be worth contacting Richard Bateman who replicated something awfully similar to t... Radoslaw Zarzynski
06:29 PM Bug #55582: octopus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails because `rados_w...
Yet another one in the family of Watch / Notify ENOENT -> ENOTCONN bugs. Radoslaw Zarzynski
06:28 PM Bug #44229: monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early
... Radoslaw Zarzynski
06:25 PM Bug #44229 (New): monclient: _check_auth_rotating possible clock skew, rotating keys expired way ...
Perhaps this replicated in:
/home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28...
Radoslaw Zarzynski
06:28 PM Bug #49591: no active mgr (MGR_DOWN)" in cluster log
I can't find @Degraded data redundancy@ in the mgr's log but I can find messages about expired cephx keys:... Radoslaw Zarzynski
06:09 PM Bug #52993: upgrade:octopus-x Test: Upgrade test failed due to timeout of the "ceph pg dump" command
We haven't backported the fix for https://tracker.ceph.com/issues/51815 to Octopus (per Neha's explanation). Radoslaw Zarzynski
06:02 PM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
Hello! A note from a bug scrub:
1. This issue looks to be caused by particular data stored in the OSD which
2....
Radoslaw Zarzynski
09:33 AM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
... Tobias Urdin
05:53 PM Bug #48440 (Need More Info): log [ERR] : scrub mismatch
We would need to ensure the latest recurrence is about the OSD scrub (we haven't seen too many mon scrubbing issues ... Radoslaw Zarzynski
05:45 PM Bug #53729 (Pending Backport): ceph-osd takes all memory before oom on boot
Neha Ojha
02:29 PM Backport #55624 (In Progress): quincy: Unable to format `ceph config dump` command output in yaml...
Laura Flores
02:26 PM Backport #55624 (Resolved): quincy: Unable to format `ceph config dump` command output in yaml us...
https://github.com/ceph/ceph/pull/46246 Laura Flores
02:25 PM Bug #53895 (Pending Backport): Unable to format `ceph config dump` command output in yaml using `...
Laura Flores
10:12 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
I think we should also use failure_pending queue like send_failures to avoid one osd sending target osd to mon multip... jianwei zhang
09:35 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
... jianwei zhang
06:08 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default

When a node is actively shut down for operation and maintenance,
the osd/mon/mds process on it will automatically...
jianwei zhang
05:55 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
Nitzan Mordechai wrote:
> jianwei zhang wrote:
> > osd_fast_shutdown(true)
> > osd_fast_shutdown_notify_mon(false)...
jianwei zhang

05/11/2022

08:59 PM Backport #54568: octopus: mon/MonCommands.h: target_size_ratio range is incorrect
Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/45398
merged
Yuri Weinstein
08:46 PM Backport #55012: octopus: librados: check latest osdmap on ENOENT in pool_reverse_lookup()
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45587
merged
Yuri Weinstein
08:12 PM Backport #53550: octopus: [RFE] Provide warning when the 'require-osd-release' flag does not matc...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44260
merged
Yuri Weinstein
04:58 PM Backport #52078 (Resolved): pacific: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
Laura Flores
04:58 PM Backport #55047 (Resolved): quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.Manifest...
Laura Flores
04:58 PM Backport #55439 (Resolved): quincy: FAILED ceph_assert due to issue manifest API to the original ...
Laura Flores
04:56 PM Backport #54468 (Resolved): octopus: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely cl...
Laura Flores
04:15 PM Backport #54468: octopus: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the sn...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45324
merged
Yuri Weinstein
04:56 PM Backport #55074 (Resolved): octopus: osd: osd_fast_shutdown_notify_mon not quite right
Laura Flores
04:13 PM Backport #55074: octopus: osd: osd_fast_shutdown_notify_mon not quite right
Laura Flores wrote:
> https://github.com/ceph/ceph/pull/45655
merged
Yuri Weinstein
04:15 PM Bug #54592: partial recovery: CEPH_OSD_OP_OMAPRMKEYRANGE should mark omap dirty
https://github.com/ceph/ceph/pull/45593 merged Yuri Weinstein
03:53 PM Bug #52993: upgrade:octopus-x Test: Upgrade test failed due to timeout of the "ceph pg dump" command
Similar problem happened on a rados/singleton test for Octopus:
/a/yuriw-2022-04-26_20:58:55-rados-wip-yuri2-testi...
Laura Flores
05:22 AM Bug #48440: log [ERR] : scrub mismatch
/home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/681... Nitzan Mordechai
05:20 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/681... Nitzan Mordechai
05:18 AM Bug #49591: no active mgr (MGR_DOWN)" in cluster log
/home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/681... Nitzan Mordechai

05/10/2022

12:38 PM Backport #53971 (In Progress): octopus: BufferList.rebuild_aligned_size_and_memory failure
https://github.com/ceph/ceph/pull/46216 Radoslaw Zarzynski
12:32 PM Backport #53972 (In Progress): pacific: BufferList.rebuild_aligned_size_and_memory failure
https://github.com/ceph/ceph/pull/46215 Radoslaw Zarzynski
04:38 AM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
they did: https://tracker.ceph.com/issues/55074 Nitzan Mordechai
01:59 AM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
octopus: osd/OSD: osd_fast_shutdown_notify_mon not quite right #45655
https://github.com/ceph/ceph/pull/45655/commit...
jianwei zhang
04:31 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
jianwei zhang wrote:
> osd_fast_shutdown(true)
> osd_fast_shutdown_notify_mon(false)
> osd_mon_shutdown_timeout(5...
Nitzan Mordechai
12:55 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
osd_fast_shutdown(true)
osd_fast_shutdown_notify_mon(false)
osd_mon_shutdown_timeout(5s) --> cannot send MOSDMar...
jianwei zhang
12:49 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
... jianwei zhang
12:44 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
mon.a/c has millions of osd_failure (immediate+timeout). There should be messages forwarded by mon.c. jianwei zhang
12:41 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
ceph version: v15.2.13
I found a problem with the mon election, which should be related to it.
Test steps when ...
jianwei zhang
01:02 AM Bug #53328 (Duplicate): osd_fast_shutdown_notify_mon option should be true by default
Neha Ojha

05/09/2022

04:47 PM Bug #55582 (New): octopus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails because `r...
/a/lflores-2022-05-09_14:54:06-rados-wip-55077-octopus-distro-default-smithi/6828789... Laura Flores
04:10 PM Bug #48793: out of order op
@Neha @Ronen this Octopus failure looks a lot like this Tracker. Was the revised scrub code backported to Octopus, or... Laura Flores
03:37 PM Backport #55581 (Rejected): octopus: api_list: LibRadosList.EnumerateObjects and LibRadosList.Enu...
Backport Bot
03:35 PM Bug #52553: pybind: rados.RadosStateError raised when closed watch object goes out of scope after...
/a/lflores-2022-05-04_18:59:38-rados-wip-55077-octopus-distro-default-smithi/6821227... Laura Flores
03:31 PM Bug #48899 (Pending Backport): api_list: LibRadosList.EnumerateObjects and LibRadosList.Enumerate...
/a/lflores-2022-05-04_18:59:38-rados-wip-55077-octopus-distro-default-smithi/6820998... Laura Flores
11:43 AM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
I just observed this issue once more and forgot to drop the info that a restart of an OSD actually resets this counte... Christian Rohmann
10:08 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
(upgrade and restart OSDs is probably more accurate wording). If I upgrade node #2 and OSD on node #1 would die with ... Tobias Urdin
10:07 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
Always happens when you upgrade nodes, probably some timing issue with PGs going or flapping primary. I never have de... Tobias Urdin
10:06 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
'virtual void PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool, Obj... Tobias Urdin
07:32 AM Bug #55573: stretch mode: be more sane about changing different size/min_size
Realized my suggestion/formula in the mailing list wasn't good :)
This is what I intended originally:
- degraded ...
Eneko Lacunza

05/06/2022

04:46 PM Bug #55573 (New): stretch mode: be more sane about changing different size/min_size
From the mailing list:
I created 2 additional pools each with a matching stretch rule:
- size=2/min=1 (not advised...
Greg Farnum
01:01 AM Bug #55549: OSDs crashing
After days of fighting this (it's on a production cluster) I finally gave up on the least important of the pools -- t... Richard Bateman

05/05/2022

04:23 PM Bug #47025: rados/test.sh: api_watch_notify_pp LibRadosWatchNotifyECPP.WatchNotify failed
This is from the 16.2.8 run.
/a/yuriw-2022-05-04_20:09:21-rados-pacific-distro-default-smithi/6821705...
Laura Flores
03:22 PM Backport #55439: quincy: FAILED ceph_assert due to issue manifest API to the original object
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46061
merged
Yuri Weinstein
03:20 PM Backport #55047: quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45624
merged
Yuri Weinstein
03:13 PM Bug #55559 (Duplicate): osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
/a/yuriw-2022-04-28_14:23:18-rados-wip-yuri-testing-2022-04-27-1456-quincy-distro-default-smithi/6811107... Laura Flores
01:37 PM Bug #53729: ceph-osd takes all memory before oom on boot
Can I kindly ask if there's an estimate when this will be fixed and backported? We have customers that have been in t... Ruben Kerkhof

05/04/2022

10:20 PM Bug #55549 (Resolved): OSDs crashing
My apologies if this is the wrong project; I'm so lost on this particular issue that I'm not even sure where to ask f... Richard Bateman
08:11 PM Bug #55407: quincy osd's fail to boot and crash
It seems I messed up everything... Let me start over.
I have a ceph cluster running since looooong time ago. Recent...
Gonzalo Aguilar Delgado
05:56 PM Bug #55407: quincy osd's fail to boot and crash
Gonzalo Aguilar Delgado wrote:
> It doesn't matter. This is just a side effect. I mean... The bug is not caused by t...
Neha Ojha
05:52 PM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
I think the lack of @-2@ (@ENOENT@) **might** be caused the errno normalization @Objecter@ has. Radoslaw Zarzynski
07:42 AM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
I hit another issue when we have socket failure injection active when running the tests. I think this is not only the... Nitzan Mordechai
05:43 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
To judge how severe the problem really is we need the information whether the stall is permanent (PG gets stuck and t... Radoslaw Zarzynski
01:35 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
"These PG_AVAILABILITY warnings are frequently seen with snap-schedule teuthology jobs.":https://pulpito.ceph.com/mcha... Milind Changire
05:37 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Just for the record: we suspect the issue is related to the error injection in async-msgr. Some runs without them are sup... Radoslaw Zarzynski
11:30 AM Backport #55543 (Resolved): quincy: should use TCMalloc for better performance
https://github.com/ceph/ceph/pull/47927 Backport Bot
11:30 AM Backport #55542 (Rejected): octopus: should use TCMalloc for better performance
Backport Bot
11:30 AM Backport #55541 (Rejected): pacific: should use TCMalloc for better performance
https://github.com/ceph/ceph/pull/51282 Backport Bot
11:29 AM Bug #55519 (Pending Backport): should use TCMalloc for better performance
Kefu Chai
06:13 AM Documentation #46120 (Resolved): Improve ceph-objectstore-tool documentation
This issue has been resolved. The ceph-objectstore-tool documentation now exists, and there's even a good manpage.
...
Zac Dover

05/03/2022

07:48 PM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
Myoungwon Oh wrote:
> https://github.com/ceph/ceph/pull/46120
Thanks for looking into it and creating the backport!
Neha Ojha
02:18 AM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
https://github.com/ceph/ceph/pull/46120 Myoungwon Oh
01:40 AM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
I think this is the same issue as https://tracker.ceph.com/issues/50806.
This issue was already fixed, but not backp...
Myoungwon Oh
01:07 AM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
Sure. Myoungwon Oh
07:46 PM Backport #50893 (In Progress): pacific: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recover...
Neha Ojha
06:26 PM Backport #55019: octopus: partial recovery: CEPH_OSD_OP_OMAPRMKEYRANGE should mark omap dirty
Christian Rohmann wrote:
> Sorry for being a nag ... I initially reported https://tracker.ceph.com/issues/53663 and ...
Vikhyat Umrao
04:41 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
玮文 胡 wrote:
> Maybe we should fix the release note (https://docs.ceph.com/en/latest/releases/quincy/) first? The work...
Vikhyat Umrao
04:32 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
玮文 胡 wrote:
> https://github.com/ceph/ceph/pull/46124
>
> Tested locally with
>
> [...]
Thank you.
Vikhyat Umrao
04:27 PM Bug #55383 (Fix Under Review): monitor cluster logs(ceph.log) appear empty until rotated
Vikhyat Umrao
09:55 AM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
Maybe we should fix the release note (https://docs.ceph.com/en/latest/releases/quincy/) first? The workaround there is... 玮文 胡
09:44 AM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
https://github.com/ceph/ceph/pull/46124
Tested locally with...
玮文 胡
03:49 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
Ran 100 thrash-erasure-code-big tests in octopus, and the `wait_for_recovery` assertion occurred 18/100 times, with 1... Laura Flores
11:19 AM Bug #55407: quincy osd's fail to boot and crash
It doesn't matter. This is just a side effect. I mean... The bug is not caused by the tool.
The bug is caused becau...
Gonzalo Aguilar Delgado
07:23 AM Bug #55519 (Fix Under Review): should use TCMalloc for better performance
Kefu Chai
07:22 AM Bug #55519 (Resolved): should use TCMalloc for better performance
We had been using TCMalloc in older releases, but somehow we stopped doing so. Let's bring it back. Kefu Chai

05/02/2022

06:50 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
To me, this looks like the problem?
https://github.com/ceph/ceph/commit/7c84e06e6f846f6b4b6fd959218b4d474520f429...
Vikhyat Umrao
05:29 PM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
Myoungwon Oh: Seeing this in pacific as well, can you confirm if it is the same issue?
/a/yuriw-2022-04-30_17:01:...
Neha Ojha
02:12 PM Bug #53789 (In Progress): CommandFailedError (rados/test_python.sh): "RADOS object not found" cau...
Nitzan Mordechai
01:11 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Scheduled another run with just the rados/verify test that failed and I can see this happen frequently:
/a/amathu...
Aishwarya Mathuria
01:07 PM Backport #55513 (In Progress): quincy: mount.ceph fails to understand AAAA records from SRV record
Matan Breizman
12:57 PM Backport #55513 (Resolved): quincy: mount.ceph fails to understand AAAA records from SRV record
https://github.com/ceph/ceph/pull/46113 Backport Bot
01:03 PM Backport #55514 (In Progress): pacific: mount.ceph fails to understand AAAA records from SRV record
Matan Breizman
12:57 PM Backport #55514 (Resolved): pacific: mount.ceph fails to understand AAAA records from SRV record
https://github.com/ceph/ceph/pull/46112 Backport Bot
12:52 PM Bug #47300 (Pending Backport): mount.ceph fails to understand AAAA records from SRV record
Matan Breizman
08:34 AM Bug #47300 (Resolved): mount.ceph fails to understand AAAA records from SRV record
Kefu Chai

05/01/2022

05:40 AM Bug #43887 (Fix Under Review): ceph_test_rados_delete_pools_parallel failure
https://github.com/ceph/ceph/pull/46099 Nitzan Mordechai

04/28/2022

09:29 PM Bug #55488 (New): ENOENT on clone on EC non-primary shard
... Neha Ojha
06:59 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2022-04-27_02:52:22-rados-pacific-distro-default-smithi/6807766... Laura Flores

04/27/2022

09:51 PM Backport #55439 (In Progress): quincy: FAILED ceph_assert due to issue manifest API to the origin...
Laura Flores
09:25 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Laura Flores wrote:
> This one looks somewhat different from the other reported failures. First of all, it failed on...
Laura Flores
05:37 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Let's discuss this on the next RADOS Team Meeting. Radoslaw Zarzynski
06:07 PM Bug #55424 (Won't Fix): ceph-mon process exit in dead status , which backtrace displayed has bloc...
Sorry, the version is EOL :-(. Radoslaw Zarzynski
06:06 PM Bug #55419 (Resolved): cephtool/test.sh: failure on blocklist testing
Neha Ojha
05:58 PM Bug #55440: osd-scrub-test.sh: TEST_scrub_test failed due to inconsistent PG
... Neha Ojha
05:56 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
Laura Flores wrote:
> /a/yuriw-2022-04-26_00:11:14-rados-wip-55324-pacific-backport-distro-default-smithi/6805265/re...
Neha Ojha
05:49 PM Bug #55407: quincy osd's fail to boot and crash
Hello Gonzalo!
Just a quick note from a bug scrub: we don't support mixing the tool from a newer release with OSDs fr...
Radoslaw Zarzynski
05:39 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
This was discussed in the rados meeting this week. Laura is trying to check if the bug exists in Octopus or not, to h... Neha Ojha
05:34 PM Bug #55433 (Closed): common: FAILED ceph_assert(((lock).is_locked()))
The fix will be merged with the original PR. Neha Ojha
10:24 AM Bug #47300 (Fix Under Review): mount.ceph fails to understand AAAA records from SRV record
Matan Breizman

04/26/2022

03:58 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
This one looks somewhat different from the other reported failures. First of all, it failed on a rados/verify test, n... Laura Flores
02:33 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
/a/yuriw-2022-04-26_00:11:14-rados-wip-55324-pacific-backport-distro-default-smithi/6805265/remote/smithi061/crash/20... Laura Flores
10:27 AM Bug #55450 (Resolved): [DOC] stretch_rule defined in the doc needs updation
In section [1], the stretch_rule defined to be added to the crush map needs to be updated.
min size and max size par...
Pawan Dhiran
 
