Activity

From 04/24/2022 to 05/23/2022

05/23/2022

11:56 PM Backport #55747 (Resolved): pacific: Support blocklisting a CIDR range
https://github.com/ceph/ceph/pull/46470 Backport Bot
11:56 PM Backport #55746 (Resolved): quincy: Support blocklisting a CIDR range
https://github.com/ceph/ceph/pull/46469 Backport Bot
11:52 PM Feature #53050: Support blocklisting a CIDR range
The Backport field was empty, therefore no backport tickets were created. Neha Ojha
11:32 PM Backport #55745 (Resolved): pacific: "wait_for_recovery: failed before timeout expired" during th...
https://github.com/ceph/ceph/pull/46391 Backport Bot
11:32 PM Backport #55744 (Resolved): quincy: "wait_for_recovery: failed before timeout expired" during thr...
https://github.com/ceph/ceph/pull/46384 Backport Bot
11:31 PM Backport #55743 (Resolved): octopus: "wait_for_recovery: failed before timeout expired" during th...
https://github.com/ceph/ceph/pull/46392 Backport Bot
11:26 PM Bug #51076 (Pending Backport): "wait_for_recovery: failed before timeout expired" during thrashos...
Neha Ojha
09:24 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
/a/yuriw-2022-05-19_18:50:25-rados-wip-yuri4-testing-2022-05-19-0831-quincy-distro-default-smithi/6841763
Descriptio...
Laura Flores
08:16 PM Backport #55742 (In Progress): quincy: monitor cluster logs(ceph.log) appear empty until rotated
Vikhyat Umrao
07:55 PM Backport #55742 (Resolved): quincy: monitor cluster logs(ceph.log) appear empty until rotated
https://github.com/ceph/ceph/pull/46374 Backport Bot
07:50 PM Bug #55383 (Pending Backport): monitor cluster logs(ceph.log) appear empty until rotated
Vikhyat Umrao

05/21/2022

11:45 PM Backport #55306 (In Progress): quincy: prometheus metrics shows incorrect ceph version for upgrad...
Adam King
11:41 PM Backport #55306: quincy: prometheus metrics shows incorrect ceph version for upgraded ceph daemon
including this in https://github.com/ceph/ceph/pull/46360 Adam King

05/20/2022

03:33 PM Bug #55726: Drained OSDs are still ACTIVE_PRIMARY - casuing high IO latency on clients
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NQKDCBJ2SH3DTUCMV6KU4T3EGKOSCGJV/ Ilya Dryomov
02:14 PM Bug #55726 (Need More Info): Drained OSDs are still ACTIVE_PRIMARY - casuing high IO latency on c...
Hi
I observed high latencies and mount points hanging since Octopus release
and it's still observed on Pacific l...
Denis Polom
03:24 PM Bug #46847: Loss of placement information on OSD reboot
We also encountered a similar issue with an EC pool during rebalance; sometimes (OSD overload or PG peering crash), th... Yao Ning

05/19/2022

10:01 PM Bug #51076 (Fix Under Review): "wait_for_recovery: failed before timeout expired" during thrashos...
Laura Flores
09:19 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
Neha Ojha wrote:
> Laura Flores wrote:
> > /a/yuriw-2022-03-25_18:42:52-rados-wip-yuri7-testing-2022-03-24-1341-pac...
Laura Flores
01:55 PM Bug #55711 (Fix Under Review): mon: race condition between `mgr fail` and MgrMonitor::prepare_bea...
Radoslaw Zarzynski
01:51 PM Bug #55711 (Resolved): mon: race condition between `mgr fail` and MgrMonitor::prepare_beacon()
https://gist.github.com/rzarzynski/25ac59c8422e9ad0b1710a765a77f19a#the-race-condition Radoslaw Zarzynski
06:01 AM Bug #55708 (Fix Under Review): Reducing 2 Monitors Causes Stray Daemon
Example of the problem:
Roles:
smithi001: mon.a
smithi002: mon.b
smithi070: mon.c
smithi100 : mon.d
smithi2...
Kamoltat (Junior) Sirivadhna
05:27 AM Bug #55662: EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop.data.length...
I used /qa/standalone/erasure-code/test-erasure-eio.sh; the test that failed is TEST_ec_object_attr_read_error when I... Nitzan Mordechai

05/18/2022

09:09 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
/a/yuriw-2022-05-13_14:13:55-rados-wip-yuri3-testing-2022-05-12-1609-octopus-distro-default-smithi/6832699 Laura Flores
08:56 PM Bug #52316: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum']) == len(mons)
/a/yuriw-2022-05-13_14:13:55-rados-wip-yuri3-testing-2022-05-12-1609-octopus-distro-default-smithi/6832711... Laura Flores
07:37 PM Bug #53485 (Fix Under Review): monstore: logm entries are not garbage collected
Neha Ojha
01:37 PM Bug #53485: monstore: logm entries are not garbage collected
PR https://github.com/ceph/ceph/pull/44511 Daniel Poelzleithner
06:26 PM Bug #55662: EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop.data.length...
Can you please add the test that helped you discover this issue? I believe the same test was passing with other EC pl... Neha Ojha
06:17 PM Bug #55407: quincy osd's fail to boot and crash
This looks like something new and unrelated to other crashes in this ticket, so created a new one: https://tracker.ce... Radoslaw Zarzynski
06:17 PM Bug #51858: octopus: rados/test_crash.sh failure
/a/nojha-2022-05-17_22:38:06-rados-wip-lrc-fix-pacific-distro-basic-smithi/6839177 Laura Flores
06:17 PM Bug #55698 (New): osd: segfault at boot up
In the https://tracker.ceph.com/issues/55407#note-14 an OSD crash during early boot up is reported:... Radoslaw Zarzynski
06:09 PM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
The common theme between these failures (this one and #47026) is @check()@ function of @qa/standalone/osd-backfill/os... Radoslaw Zarzynski
03:02 PM Bug #55695: Shutting down a monitor forces Paxos to restart and sometimes disregard subsequent co...
https://docs.google.com/document/d/1ucVz54vMlm26oiqQoqJ2upUPmiROd4AmwSwbVM_s2A0/edit# Kamoltat (Junior) Sirivadhna
03:01 PM Bug #55695 (Fix Under Review): Shutting down a monitor forces Paxos to restart and sometimes disr...
*Problem:*
mon.a
mon.b
mon.c
mon.d
mon.e
ceph -a stop mon.d
ceph mon remove d
.
.
mon.d is down...
Kamoltat (Junior) Sirivadhna

05/17/2022

11:21 PM Feature #55693 (Fix Under Review): Limit the Health Detail MSG log size in cluster logs
RHBZ# https://bugzilla.redhat.com/show_bug.cgi?id=2087527
Version-Release number of selected component (if applica...
Vikhyat Umrao
10:58 PM Backport #55513: quincy: mount.ceph fails to understand AAAA records from SRV record
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46113
merged
Yuri Weinstein
10:56 PM Backport #55280: quincy: mon/OSDMonitor: properly set last_force_op_resend in stretch mode
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45871
merged
Yuri Weinstein
09:17 PM Bug #53485: monstore: logm entries are not garbage collected
Sorry, forgot to add - we faced this issue on v15.2.13 and on v14.2.22 as well. Peter Razumovsky
09:17 PM Bug #53485: monstore: logm entries are not garbage collected
We observed this several times on the customer side. Each of the 3 mon store.db instances was growing rapidly and had tons of logm keys ... Peter Razumovsky
06:05 PM Backport #52077 (Resolved): octopus: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
Laura Flores
09:15 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
... jianwei zhang
01:38 AM Bug #52884: osd: optimize pg peering latency when add new osd that need backfill
... jianwei zhang

05/16/2022

05:07 PM Bug #55670 (Fix Under Review): osdmaptool is not mapping child pgs to the target OSDs
Laura Flores
09:01 AM Bug #55670 (Fix Under Review): osdmaptool is not mapping child pgs to the target OSDs
Steps to reproduce the issue:
1. ceph osd getmap > osdmap.bin
2. ./bin/osdmaptool --test-map-pgs-dump --pool <pool...
dongdong tao
02:44 PM Bug #55559 (Duplicate): osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
Laura Flores
02:41 PM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
I opened a new issue since a different test failed this time. The failure does look the same though, so maybe the one... Laura Flores
02:43 PM Bug #47026: osd-backfill-stats.sh fails in TEST_backfill_ec_down_all_out
This was originally tracked in #55559. A different test was affected (TEST_backfill_ec_prim_out), but the failure loo... Laura Flores
09:40 AM Bug #52884: osd: optimize pg peering latency when add new osd that need backfill
https://github.com/ceph/ceph/pull/46281
Added code for the master branch.
jianwei zhang
09:17 AM Bug #55407: quincy osd's fail to boot and crash
Hi! Today I've just built master again.
Rebuilt osd and on first boot:
...
-79> 2022-05-16T09:10:21.707+0000...
Gonzalo Aguilar Delgado
08:51 AM Bug #55669: osd: add log for pg peering and activiting complete
https://github.com/ceph/ceph/pull/46279 jianwei zhang
08:51 AM Bug #55669 (New): osd: add log for pg peering and activiting complete
... jianwei zhang
08:22 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
Since I was developing on Ceph 15.2.13, I did not adapt it to master.
If there is no problem with the review, consi...
jianwei zhang
08:21 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
... jianwei zhang
08:16 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
... jianwei zhang
08:07 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
... jianwei zhang
08:06 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
... jianwei zhang
08:05 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
The problem was first described at this link:
https://tracker.ceph.com/issues/54966?next_issue_id=54965
jianwei zhang
08:03 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
https://github.com/ceph/ceph/pull/46276 jianwei zhang
07:56 AM Bug #55668 (New): osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recover...
problem:... jianwei zhang
06:34 AM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
Manuel Lausch wrote:
> Hi Nitzan,
> I checked your patch on the current pacific branch.
>
> unfortunately I stil...
jianwei zhang
06:30 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
pull_request: https://github.com/ceph/ceph/pull/46273 jianwei zhang
04:09 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
For scenarios where a single node has both mon and osd, and the number of osd on a single node is large, or the mon e... jianwei zhang
03:12 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
My test result:... jianwei zhang
03:11 AM Bug #55665 (Fix Under Review): osd: osd_fast_fail_on_connection_refused will cause the mon to con...
The first issue is described at https://tracker.ceph.com/issues/55067
Problem Description:...
jianwei zhang
04:10 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
https://tracker.ceph.com/issues/55665?next_issue_id=55662 jianwei zhang

05/15/2022

06:23 AM Bug #55662 (Rejected): EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop....
... Nitzan Mordechai

05/13/2022

10:49 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
I reproduced the symptoms of this bug locally by incrementing the notify count before an eq check. The extra incremen... Laura Flores
09:29 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
My test result:... jianwei zhang

05/12/2022

11:10 PM Backport #55633 (In Progress): octopus: ceph-osd takes all memory before oom on boot
https://github.com/ceph/ceph/pull/46253 Radoslaw Zarzynski
06:07 PM Backport #55633 (Rejected): octopus: ceph-osd takes all memory before oom on boot
Backport Bot
10:56 PM Backport #52077: octopus: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45320
merged
Yuri Weinstein
10:42 PM Backport #55631 (In Progress): pacific: ceph-osd takes all memory before oom on boot
https://github.com/ceph/ceph/pull/46252 Radoslaw Zarzynski
06:06 PM Backport #55631 (Resolved): pacific: ceph-osd takes all memory before oom on boot
Backport Bot
10:39 PM Backport #55632 (In Progress): quincy: ceph-osd takes all memory before oom on boot
https://github.com/ceph/ceph/pull/46251 Radoslaw Zarzynski
06:06 PM Backport #55632 (Resolved): quincy: ceph-osd takes all memory before oom on boot
Backport Bot
06:52 PM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
Hello Laura! Is there a thing that makes you think this isn't a duplicate of #47026? Radoslaw Zarzynski
06:48 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
If more data is necessary, it might be worth contacting Richard Bateman, who replicated something awfully similar to t... Radoslaw Zarzynski
06:29 PM Bug #55582: octopus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails because `rados_w...
Yet another one in the family of Watch / Notify ENOENT -> ENOTCONN bugs. Radoslaw Zarzynski
06:28 PM Bug #44229: monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early
... Radoslaw Zarzynski
06:25 PM Bug #44229 (New): monclient: _check_auth_rotating possible clock skew, rotating keys expired way ...
Perhaps this replicated in:
/home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28...
Radoslaw Zarzynski
06:28 PM Bug #49591: no active mgr (MGR_DOWN)" in cluster log
I can't find @Degraded data redundancy@ in the mgr's log but I can find messages about expired cephx keys:... Radoslaw Zarzynski
06:09 PM Bug #52993: upgrade:octopus-x Test: Upgrade test failed due to timeout of the "ceph pg dump" command
We haven't backported the fix for https://tracker.ceph.com/issues/51815 to Octopus (per Neha's explanation). Radoslaw Zarzynski
06:02 PM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
Hello! A note from a bug scrub:
1. This issue looks like it is caused by particular data stored in the OSD which
2....
Radoslaw Zarzynski
09:33 AM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
... Tobias Urdin
05:53 PM Bug #48440 (Need More Info): log [ERR] : scrub mismatch
We would need to ensure the latest recurrence is about the OSD scrub (we haven't seen too many mon scrubbing issues ... Radoslaw Zarzynski
05:45 PM Bug #53729 (Pending Backport): ceph-osd takes all memory before oom on boot
Neha Ojha
02:29 PM Backport #55624 (In Progress): quincy: Unable to format `ceph config dump` command output in yaml...
Laura Flores
02:26 PM Backport #55624 (Resolved): quincy: Unable to format `ceph config dump` command output in yaml us...
https://github.com/ceph/ceph/pull/46246 Laura Flores
02:25 PM Bug #53895 (Pending Backport): Unable to format `ceph config dump` command output in yaml using `...
Laura Flores
10:12 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
I think we should also use failure_pending queue like send_failures to avoid one osd sending target osd to mon multip... jianwei zhang
09:35 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
... jianwei zhang
06:08 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
When a node is actively shut down for operation and maintenance,
the osd/mon/mds process on it will automatically...
jianwei zhang
05:55 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
Nitzan Mordechai wrote:
> jianwei zhang wrote:
> > osd_fast_shutdown(true)
> > osd_fast_shutdown_notify_mon(false)...
jianwei zhang

05/11/2022

08:59 PM Backport #54568: octopus: mon/MonCommands.h: target_size_ratio range is incorrect
Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/45398
merged
Yuri Weinstein
08:46 PM Backport #55012: octopus: librados: check latest osdmap on ENOENT in pool_reverse_lookup()
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45587
merged
Yuri Weinstein
08:12 PM Backport #53550: octopus: [RFE] Provide warning when the 'require-osd-release' flag does not matc...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44260
merged
Yuri Weinstein
04:58 PM Backport #52078 (Resolved): pacific: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
Laura Flores
04:58 PM Backport #55047 (Resolved): quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.Manifest...
Laura Flores
04:58 PM Backport #55439 (Resolved): quincy: FAILED ceph_assert due to issue manifest API to the original ...
Laura Flores
04:56 PM Backport #54468 (Resolved): octopus: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely cl...
Laura Flores
04:15 PM Backport #54468: octopus: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the sn...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45324
merged
Yuri Weinstein
04:56 PM Backport #55074 (Resolved): octopus: osd: osd_fast_shutdown_notify_mon not quite right
Laura Flores
04:13 PM Backport #55074: octopus: osd: osd_fast_shutdown_notify_mon not quite right
Laura Flores wrote:
> https://github.com/ceph/ceph/pull/45655
merged
Yuri Weinstein
04:15 PM Bug #54592: partial recovery: CEPH_OSD_OP_OMAPRMKEYRANGE should mark omap dirty
https://github.com/ceph/ceph/pull/45593 merged Yuri Weinstein
03:53 PM Bug #52993: upgrade:octopus-x Test: Upgrade test failed due to timeout of the "ceph pg dump" command
Similar problem happened on a rados/singleton test for Octopus:
/a/yuriw-2022-04-26_20:58:55-rados-wip-yuri2-testi...
Laura Flores
05:22 AM Bug #48440: log [ERR] : scrub mismatch
/home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/681... Nitzan Mordechai
05:20 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/681... Nitzan Mordechai
05:18 AM Bug #49591: no active mgr (MGR_DOWN)" in cluster log
/home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/681... Nitzan Mordechai

05/10/2022

12:38 PM Backport #53971 (In Progress): octopus: BufferList.rebuild_aligned_size_and_memory failure
https://github.com/ceph/ceph/pull/46216 Radoslaw Zarzynski
12:32 PM Backport #53972 (In Progress): pacific: BufferList.rebuild_aligned_size_and_memory failure
https://github.com/ceph/ceph/pull/46215 Radoslaw Zarzynski
04:38 AM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
they did: https://tracker.ceph.com/issues/55074 Nitzan Mordechai
01:59 AM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
octopus: osd/OSD: osd_fast_shutdown_notify_mon not quite right #45655
https://github.com/ceph/ceph/pull/45655/commit...
jianwei zhang
04:31 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
jianwei zhang wrote:
> osd_fast_shutdown(true)
> osd_fast_shutdown_notify_mon(false)
> osd_mon_shutdown_timeout(5...
Nitzan Mordechai
12:55 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
osd_fast_shutdown(true)
osd_fast_shutdown_notify_mon(false)
osd_mon_shutdown_timeout(5s) --> cannot send MOSDMar...
jianwei zhang
12:49 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
... jianwei zhang
12:44 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
mon.a/c has millions of osd_failure (immediate+timeout). There should be messages forwarded by mon.c. jianwei zhang
12:41 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
ceph version: v15.2.13
I found a problem with the mon election, which should be related to it.
Test steps when ...
jianwei zhang
01:02 AM Bug #53328 (Duplicate): osd_fast_shutdown_notify_mon option should be true by default
Neha Ojha

05/09/2022

04:47 PM Bug #55582 (New): octopus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails because `r...
/a/lflores-2022-05-09_14:54:06-rados-wip-55077-octopus-distro-default-smithi/6828789... Laura Flores
04:10 PM Bug #48793: out of order op
@Neha @Ronen this Octopus failure looks a lot like this Tracker. Was the revised scrub code backported to Octopus, or... Laura Flores
03:37 PM Backport #55581 (Rejected): octopus: api_list: LibRadosList.EnumerateObjects and LibRadosList.Enu...
Backport Bot
03:35 PM Bug #52553: pybind: rados.RadosStateError raised when closed watch object goes out of scope after...
/a/lflores-2022-05-04_18:59:38-rados-wip-55077-octopus-distro-default-smithi/6821227... Laura Flores
03:31 PM Bug #48899 (Pending Backport): api_list: LibRadosList.EnumerateObjects and LibRadosList.Enumerate...
/a/lflores-2022-05-04_18:59:38-rados-wip-55077-octopus-distro-default-smithi/6820998... Laura Flores
11:43 AM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
I just observed this issue once more and forgot to drop the info that a restart of an OSD actually resets this counte... Christian Rohmann
10:08 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
(upgrade and restart OSDs is probably more accurate wording). If I upgrade node #2 and OSD on node #1 would die with ... Tobias Urdin
10:07 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
Always happens when you upgrade nodes, probably some timing issue with PGs going or flapping primary. I never have de... Tobias Urdin
10:06 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
'virtual void PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool, Obj... Tobias Urdin
07:32 AM Bug #55573: stretch mode: be more sane about changing different size/min_size
Realized my suggestion/formula in the mailing list wasn't good :)
This is what I intended originally:
- degraded ...
Eneko Lacunza

05/06/2022

04:46 PM Bug #55573 (New): stretch mode: be more sane about changing different size/min_size
From the mailing list:
I created 2 additional pools each with a matching stretch rule:
- size=2/min=1 (not advised...
Greg Farnum
01:01 AM Bug #55549: OSDs crashing
After days of fighting this (it's on a production cluster) I finally gave up on the least important of the pools -- t... Richard Bateman

05/05/2022

04:23 PM Bug #47025: rados/test.sh: api_watch_notify_pp LibRadosWatchNotifyECPP.WatchNotify failed
This is from the 16.2.8 run.
/a/yuriw-2022-05-04_20:09:21-rados-pacific-distro-default-smithi/6821705...
Laura Flores
03:22 PM Backport #55439: quincy: FAILED ceph_assert due to issue manifest API to the original object
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46061
merged
Yuri Weinstein
03:20 PM Backport #55047: quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45624
merged
Yuri Weinstein
03:13 PM Bug #55559 (Duplicate): osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
/a/yuriw-2022-04-28_14:23:18-rados-wip-yuri-testing-2022-04-27-1456-quincy-distro-default-smithi/6811107... Laura Flores
01:37 PM Bug #53729: ceph-osd takes all memory before oom on boot
Can I kindly ask if there's an estimate when this will be fixed and backported? We have customers that have been in t... Ruben Kerkhof

05/04/2022

10:20 PM Bug #55549 (Resolved): OSDs crashing
My apologies if this is the wrong project; I'm so lost on this particular issue that I'm not even sure where to ask f... Richard Bateman
08:11 PM Bug #55407: quincy osd's fail to boot and crash
It seems I messed up everything... Let me start over.
I have a ceph cluster running since looooong time ago. Recent...
Gonzalo Aguilar Delgado
05:56 PM Bug #55407: quincy osd's fail to boot and crash
Gonzalo Aguilar Delgado wrote:
> It doesn't matter. This is just a side effect. I mean... The bug is not caused by t...
Neha Ojha
05:52 PM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
I think the lack of @-2@ (@ENOENT@) **might** be caused by the errno normalization @Objecter@ has. Radoslaw Zarzynski
07:42 AM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
I hit another issue when we have socket failure injection active when running the tests. I think this is not only the... Nitzan Mordechai
05:43 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
To judge how severe the problem really is we need the information whether the stall is permanent (PG gets stuck and t... Radoslaw Zarzynski
01:35 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
"These PG_AVAILBILITY warnings are frequently seen with snap-schedule teuthology jobs.":https://pulpito.ceph.com/mcha... Milind Changire
05:37 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Just for the record: we suspect the issue is related to the error injection in async-msgr. Some runs without them are sup... Radoslaw Zarzynski
11:30 AM Backport #55543 (Resolved): quincy: should use TCMalloc for better performance
https://github.com/ceph/ceph/pull/47927 Backport Bot
11:30 AM Backport #55542 (Rejected): octopus: should use TCMalloc for better performance
Backport Bot
11:30 AM Backport #55541 (Rejected): pacific: should use TCMalloc for better performance
https://github.com/ceph/ceph/pull/51282 Backport Bot
11:29 AM Bug #55519 (Pending Backport): should use TCMalloc for better performance
Kefu Chai
06:13 AM Documentation #46120 (Resolved): Improve ceph-objectstore-tool documentation
This issue has been resolved. The ceph-objectstore-tool documentation now exists, and there's even a good manpage.
...
Zac Dover

05/03/2022

07:48 PM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
Myoungwon Oh wrote:
> https://github.com/ceph/ceph/pull/46120
Thanks for looking into it and creating the backport!
Neha Ojha
02:18 AM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
https://github.com/ceph/ceph/pull/46120 Myoungwon Oh
01:40 AM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
I think this is the same issue as https://tracker.ceph.com/issues/50806.
This issue was already fixed, but not backp...
Myoungwon Oh
01:07 AM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
Sure. Myoungwon Oh
07:46 PM Backport #50893 (In Progress): pacific: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recover...
Neha Ojha
06:26 PM Backport #55019: octopus: partial recovery: CEPH_OSD_OP_OMAPRMKEYRANGE should mark omap dirty
Christian Rohmann wrote:
> Sorry for being a nag ... I initially reported https://tracker.ceph.com/issues/53663 and ...
Vikhyat Umrao
04:41 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
玮文 胡 wrote:
> Maybe we should fix the release note(https://docs.ceph.com/en/latest/releases/quincy/) first? The work...
Vikhyat Umrao
04:32 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
玮文 胡 wrote:
> https://github.com/ceph/ceph/pull/46124
>
> Tested locally with
>
> [...]
Thank you.
Vikhyat Umrao
04:27 PM Bug #55383 (Fix Under Review): monitor cluster logs(ceph.log) appear empty until rotated
Vikhyat Umrao
09:55 AM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
Maybe we should fix the release note(https://docs.ceph.com/en/latest/releases/quincy/) first? The workaround there is... 玮文 胡
09:44 AM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
https://github.com/ceph/ceph/pull/46124
Tested locally with...
玮文 胡
03:49 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
Ran 100 thrash-erasure-code-big tests in octopus, and the `wait_for_recovery` assertion occurred 18/100 times, with 1... Laura Flores
11:19 AM Bug #55407: quincy osd's fail to boot and crash
It doesn't matter. This is just a side effect. I mean... The bug is not caused by the tool.
The bug is caused becau...
Gonzalo Aguilar Delgado
07:23 AM Bug #55519 (Fix Under Review): should use TCMalloc for better performance
Kefu Chai
07:22 AM Bug #55519 (Resolved): should use TCMalloc for better performance
We had been using TCMalloc in older releases, but somehow we stopped doing so. Let's bring it back. Kefu Chai

05/02/2022

06:50 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
To me, it looks like this is the problem?
https://github.com/ceph/ceph/commit/7c84e06e6f846f6b4b6fd959218b4d474520f429...
Vikhyat Umrao
05:29 PM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
Myoungwon Oh: Seeing this in pacific as well, can you confirm if it is the same issue?
/a/yuriw-2022-04-30_17:01:...
Neha Ojha
02:12 PM Bug #53789 (In Progress): CommandFailedError (rados/test_python.sh): "RADOS object not found" cau...
Nitzan Mordechai
01:11 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Scheduled another run with just the rados/verify test that failed and I can see this happen frequently:
/a/amathu...
Aishwarya Mathuria
01:07 PM Backport #55513 (In Progress): quincy: mount.ceph fails to understand AAAA records from SRV record
Matan Breizman
12:57 PM Backport #55513 (Resolved): quincy: mount.ceph fails to understand AAAA records from SRV record
https://github.com/ceph/ceph/pull/46113 Backport Bot
01:03 PM Backport #55514 (In Progress): pacific: mount.ceph fails to understand AAAA records from SRV record
Matan Breizman
12:57 PM Backport #55514 (Resolved): pacific: mount.ceph fails to understand AAAA records from SRV record
https://github.com/ceph/ceph/pull/46112 Backport Bot
12:52 PM Bug #47300 (Pending Backport): mount.ceph fails to understand AAAA records from SRV record
Matan Breizman
08:34 AM Bug #47300 (Resolved): mount.ceph fails to understand AAAA records from SRV record
Kefu Chai

05/01/2022

05:40 AM Bug #43887 (Fix Under Review): ceph_test_rados_delete_pools_parallel failure
https://github.com/ceph/ceph/pull/46099 Nitzan Mordechai

04/28/2022

09:29 PM Bug #55488 (New): ENOENT on clone on EC non-primary shard
... Neha Ojha
06:59 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2022-04-27_02:52:22-rados-pacific-distro-default-smithi/6807766... Laura Flores

04/27/2022

09:51 PM Backport #55439 (In Progress): quincy: FAILED ceph_assert due to issue manifest API to the origin...
Laura Flores
09:25 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Laura Flores wrote:
> This one looks somewhat different from the other reported failures. First of all, it failed on...
Laura Flores
05:37 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Let's discuss this on the next RADOS Team Meeting. Radoslaw Zarzynski
06:07 PM Bug #55424 (Won't Fix): ceph-mon process exit in dead status , which backtrace displayed has bloc...
Sorry, the version is EOL :-(. Radoslaw Zarzynski
06:06 PM Bug #55419 (Resolved): cephtool/test.sh: failure on blocklist testing
Neha Ojha
05:58 PM Bug #55440: osd-scrub-test.sh: TEST_scrub_test failed due to inconsistent PG
... Neha Ojha
05:56 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
Laura Flores wrote:
> /a/yuriw-2022-04-26_00:11:14-rados-wip-55324-pacific-backport-distro-default-smithi/6805265/re...
Neha Ojha
05:49 PM Bug #55407: quincy osd's fail to boot and crash
Hello Gonzalo!
Just a quick note from a bug scrub: we don't support mixing the tool from a newer release with OSDs fr...
Radoslaw Zarzynski
05:39 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
This was discussed in the rados meeting this week. Laura is trying to check if the bug exists in Octopus or not, to h... Neha Ojha
05:34 PM Bug #55433 (Closed): common: FAILED ceph_assert(((lock).is_locked()))
The fix will be merged with the original PR. Neha Ojha
10:24 AM Bug #47300 (Fix Under Review): mount.ceph fails to understand AAAA records from SRV record
Matan Breizman

04/26/2022

03:58 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
This one looks somewhat different from the other reported failures. First of all, it failed on a rados/verify test, n... Laura Flores
02:33 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
/a/yuriw-2022-04-26_00:11:14-rados-wip-55324-pacific-backport-distro-default-smithi/6805265/remote/smithi061/crash/20... Laura Flores
10:27 AM Bug #55450 (Resolved): [DOC] stretch_rule defined in the doc needs updation
In section [1], the stretch_rule defined to be added to the crush map needs to be updated.
min size and max size par...
Pawan Dhiran

04/25/2022

10:11 PM Bug #55433: common: FAILED ceph_assert(((lock).is_locked()))
https://github.com/ceph/ceph/pull/46028 has been merged to unblock other master PR merges. Neha Ojha
06:28 PM Bug #55433 (Fix Under Review): common: FAILED ceph_assert(((lock).is_locked()))
Neha Ojha
05:36 PM Bug #55433 (Closed): common: FAILED ceph_assert(((lock).is_locked()))
Seen in jenkins make check tests, i.e. https://jenkins.ceph.com/job/ceph-pull-requests/94227/console... Laura Flores
09:59 PM Bug #55440 (New): osd-scrub-test.sh: TEST_scrub_test failed due to inconsistent PG
/a/yuriw-2022-04-22_13:56:48-rados-wip-yuri2-testing-2022-04-22-0500-distro-default-smithi/6800338... Laura Flores
07:31 PM Bug #44595: cache tiering: Error: oid 48 copy_from 493 returned error code -2
/a/yuriw-2022-04-25_14:14:44-rados-wip-yuri3-testing-2022-04-22-0534-quincy-distro-default-smithi/6805186... Laura Flores
07:07 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
/a/yuriw-2022-04-22_21:06:04-rados-wip-yuri3-testing-2022-04-22-0534-quincy-distro-default-smithi/6802072 Laura Flores
07:00 PM Backport #55439 (Resolved): quincy: FAILED ceph_assert due to issue manifest API to the original ...
https://github.com/ceph/ceph/pull/46061 Backport Bot
06:59 PM Bug #54509 (Pending Backport): FAILED ceph_assert due to issue manifest API to the original object
Laura Flores
06:58 PM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
/a/yuriw-2022-04-22_21:06:04-rados-wip-yuri3-testing-2022-04-22-0534-quincy-distro-default-smithi/6802065/remote/smit... Laura Flores
06:56 PM Bug #55435: mon/Elector: notify_ranked_removed() does not properly erase dead_ping in the case of...
In an example scenario where we have 5 monitors:
rank_size = 5
mon.a (rank 0)
mon.b (rank 1)
mon.c (rank 2)
mo...
Kamoltat (Junior) Sirivadhna
06:29 PM Bug #55435 (Resolved): mon/Elector: notify_ranked_removed() does not properly erase dead_ping in ...
Kamoltat (Junior) Sirivadhna
05:54 PM Bug #55407: quincy osd's fail to boot and crash
Gonzalo Aguilar Delgado wrote:
> Neha Ojha wrote:
> > Did you see the same segmentation fault in quincy and pacific...
Gonzalo Aguilar Delgado
05:50 PM Bug #55407: quincy osd's fail to boot and crash
The situation is even worse. Any osd created with ceph version 17.1.0 (c675060073a05d40ef404d5921c81178a52af6e0) quin... Gonzalo Aguilar Delgado
06:23 AM Bug #55407: quincy osd's fail to boot and crash
I managed to reproduce...
I install an OSD with the pacific version. Then I let it run for a while (10 min or so)...
Gonzalo Aguilar Delgado
05:51 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
Looks like it is happening because of mon/LogMonitor changing it back to RADOS. Vikhyat Umrao
05:49 PM Bug #55383 (Triaged): monitor cluster logs(ceph.log) appear empty until rotated
Vikhyat Umrao
05:49 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
If you are okay with it, can you please send a quick fix? Vikhyat Umrao
05:48 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
玮文 胡 wrote:
> I suspect this issue is due to https://github.com/ceph/ceph/commit/7c84e06e6f846f6b4b6fd959218b4d47452...
Vikhyat Umrao
05:19 PM Bug #55419 (Fix Under Review): cephtool/test.sh: failure on blocklist testing
Neha Ojha
03:23 PM Bug #54458: osd-scrub-snaps.sh: TEST_scrub_snaps failed due to malformed log message
Perhaps this has resurfaced?
/a/yuriw-2022-04-22_13:56:48-rados-wip-yuri2-testing-2022-04-22-0500-distro-default-s...
Laura Flores
09:42 AM Bug #55424 (Won't Fix): ceph-mon process exit in dead status , which backtrace displayed has bloc...
Please see abc.png
LevelDBstore::close
set thread quit flag, compact_queue_stop = true.
then send signal ...
Yong Wang
 
