Activity

From 11/08/2022 to 12/07/2022

12/07/2022

10:37 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Can we make the default behavior a ceph user, and then provide --setgroup and --setuser options in case we need to re... Laura Flores
11:54 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Still waiting for that build (debuginfo seems to take an unbelievably long time to publish...)
Meanwhile, I did a ...
Tim Serong
05:51 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
I've just rerun "rados/singleton/{all/test-crash mon_election/connectivity msgr-failures/few msgr/async objectstore/b... Tim Serong
05:30 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
(Sorry, I didn't mean to update any of those fields with my previous comment) Tim Serong
02:34 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Thanks Laura, I'll try to figure out what's going on. So far, looking at the journal log, the keyring must be OK, or... Tim Serong
03:00 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
Sent a PR for quincy: https://github.com/ceph/ceph/pull/49304. Radoslaw Zarzynski
01:47 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
I'm having the same BT in my tests:
/a/nmordech-2022-12-06_13:26:40-rados:thrash-erasure-code-wip-nitzan-peering-aut...
Nitzan Mordechai
01:34 PM Bug #56371 (Duplicate): crash: MOSDPGLog::encode_payload(unsigned long)
Radoslaw Zarzynski
12:04 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
Laura, I think it is different from that bug (57751); in that case all the OSDs are still up.
We can see that we nev...
Nitzan Mordechai
09:57 AM Backport #58006 (In Progress): quincy: bail from handle_command() if _generate_command_map() fails
Ilya Dryomov
09:57 AM Backport #58007 (In Progress): pacific: bail from handle_command() if _generate_command_map() fails
Ilya Dryomov

12/06/2022

05:30 PM Bug #58098 (New): qa/workunits/rados/test_crash.sh: crashes are never posted
Laura Flores wrote:
> I scheduled some tests here with the reverts committed to see if they pass: http://pulpito.fro...
Laura Flores
03:48 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
I scheduled some tests here with the reverts committed to see if they pass: http://pulpito.front.sepia.ceph.com/lflor... Laura Flores
03:41 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Yes, there's one available at /a/yuriw-2022-11-23_15:09:06-rados-wip-yuri10-testing-2022-11-22-1711-distro-default-sm... Laura Flores
06:11 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Is there a way to view the journalctl-b0.gz archive from the failed runs? Because if ceph-crash can't post crashes o... Tim Serong
11:53 AM Backport #58186 (In Progress): quincy: osd: Misleading information displayed for the running conf...
Sridhar Seshasayee
11:45 AM Backport #58186 (Resolved): quincy: osd: Misleading information displayed for the running configu...
https://github.com/ceph/ceph/pull/49281 Backport Bot
11:43 AM Fix #57963 (Pending Backport): osd: Misleading information displayed for the running configuratio...
Sridhar Seshasayee
10:02 AM Bug #58173 (Fix Under Review): api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosA...
Matan Breizman
05:09 AM Bug #57937: pg autoscaler of rgw pools doesn't work after creating otp pool
This problem was fixed in Rook v1.10.2. I updated my Rook/Ceph cluster to v1.10.5 and confirmed that this problem dis... Satoru Takeuchi
03:07 AM Bug #58182 (Fix Under Review): Suicide when osd bootup timeout
When the osd is started, if a message is lost, the OSD is stuck in the startup phase.
Restart the osd node through t...
Yao Wu
01:37 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
Radoslaw Zarzynski wrote:
> Hello!
>
> what is on disk is actually serialized from the in-memory representati...
王子敬 wang
12:09 AM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
/a/yuriw-2022-11-28_16:10:10-rados-wip-yuri6-testing-2022-11-23-1348-distro-default-smithi/7093588 Laura Flores

12/05/2022

11:37 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
https://shaman.ceph.com/builds/ceph/wip-revert-pr-48713/2b583578473c82604cfdab2faef9f161dc2fb0b9/ Laura Flores
11:20 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
The bug reproduced on Yuri's test branch. The difference between the test branch and the main SHA is that the test br... Laura Flores
07:23 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Laura Flores wrote:
> Scheduled 50x tests to run here: http://pulpito.front.sepia.ceph.com/lflores-2022-12-05_17:05:...
Laura Flores
07:22 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
I have a feeling that the tests I scheduled earlier on the main branch all passed since the SHA it picked up is older... Laura Flores
07:14 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Wondering if there could have been a regression caused by https://github.com/ceph/ceph/pull/48713. Laura Flores
06:38 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
/a/yuriw-2022-11-28_21:26:12-rados-wip-yuri7-testing-2022-11-18-1548-distro-default-smithi/7095988
/a/lflores-2022-1...
Laura Flores
04:17 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Scheduled 50x tests to run here: http://pulpito.front.sepia.ceph.com/lflores-2022-12-05_17:05:59-rados-wip-yuri10-tes... Laura Flores
04:10 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Three recent instances of this bug in the main branch point to a regression. My next steps here will be to schedule m... Laura Flores
10:46 PM Bug #58052: Empty Pool (zero objects) shows usage.
That is every log file from every node. There are no ceph-mgr* logs. :/
Even from inside the docker on the adm n...
Brian Woods
06:33 PM Bug #58052: Empty Pool (zero objects) shows usage.
Hello. Thanks for the response and the files.... Radoslaw Zarzynski
09:11 PM Bug #58173: api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosAio.PoolEIOFlag
Building a branch here with https://github.com/ceph/ceph/pull/49029 reverted, which can be used to verify whether it ... Laura Flores
09:03 PM Bug #58173: api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosAio.PoolEIOFlag
Excuse my update Sam, I see you already added it as a duplicate. Laura Flores
08:55 PM Bug #58173: api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosAio.PoolEIOFlag
Matan added that test within the last two weeks: https://github.com/ceph/ceph/pull/49029 Samuel Just
07:10 PM Bug #58173 (Resolved): api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosAio.PoolE...
The workunits/rados/test.sh script is run in the orch suite on some tests. In a few of them, these two tests were fai... Adam King
08:06 PM Bug #58178: FAILED ceph_assert(last_e.version.version < e.version.version)
Noticed an OSD doing this on a cluster over the weekend. It's been crashing consistently since. Kevin Fox
08:05 PM Bug #58178 (Need More Info): FAILED ceph_assert(last_e.version.version < e.version.version)
debug -4> 2022-12-05T19:14:03.556+0000 7fe51028a200 5 osd.57 pg_epoch: 261349 pg[1.573( v 261349'617978754 (2613... Kevin Fox
07:07 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
I've just let Mark and Ronen know about this issue. Radoslaw Zarzynski
07:05 PM Bug #58156: Monitors do not permit OSD to join after upgrading to Quincy
Radoslaw Zarzynski wrote:
> Hi Igor! What was the intermediary version during the upgrade? We merged https://github....
Igor Fedotov
06:40 PM Bug #58156: Monitors do not permit OSD to join after upgrading to Quincy
Hi Igor! What was the intermediary version during the upgrade? We merged https://github.com/ceph/ceph/pull/44090 but ... Radoslaw Zarzynski
07:00 PM Bug #58142 (In Progress): rbd-python snaps-many-objects: deep-scrub : stat mismatch
Moving to @In progress@ based on the core standup on 1 Dec. Radoslaw Zarzynski
06:56 PM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
Hello!
what is on disk is actually serialized from the in-memory representation. We don't see huge numbers of ...
Radoslaw Zarzynski
06:24 PM Bug #58166 (Need More Info): mon:DAEMON_OLD_VERSION newer versions is considered older than earlier
If your cluster is in the same state, can you please share mon logs with debug_mon=20? The following code snippet in ... Neha Ojha
02:53 PM Bug #58166: mon:DAEMON_OLD_VERSION newer versions is considered older than earlier
This was probably introduced in https://github.com/ceph/ceph/pull/36759 Tobias Urdin
02:52 PM Bug #58166 (Need More Info): mon:DAEMON_OLD_VERSION newer versions is considered older than earlier
We have a cluster where most mon/mgr/osd daemons are running 16.2.10 and some OSDs are running 16.2.9.
The healthcheck does ...
Tobias Urdin
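For context on the report above: 16.2.10 is a later point release than 16.2.9, yet the two sort the other way round when compared as plain strings. The sketch below only illustrates the expected ordering. `parse_version` and `is_older` are hypothetical helpers, and nothing here is taken from the monitor's actual DAEMON_OLD_VERSION check, whose root cause is not established in this thread.

```python
def parse_version(text):
    """Split a dotted version such as '16.2.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_older(a, b):
    """Return True if version a is strictly older than version b."""
    return parse_version(a) < parse_version(b)


# Versions reported by the daemons of a hypothetical mixed cluster.
running = ["16.2.10", "16.2.9", "16.2.10"]

# Component-wise comparison picks 16.2.10 as the newest version.
newest = max(running, key=parse_version)
outdated = [v for v in running if is_older(v, newest)]

# Plain string comparison orders the same two versions the other way round.
string_order_is_reversed = "16.2.10" < "16.2.9"
```

With the values above, only the 16.2.9 daemons end up in `outdated`, which is the ordering the reporter expects from the health check.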
06:24 PM Backport #58169 (Resolved): quincy: extra debugs for: [mon] high cpu usage by fn_monstore thread
https://github.com/ceph/ceph/pull/50406 Backport Bot
06:16 PM Feature #58168 (Pending Backport): extra debugs for: [mon] high cpu usage by fn_monstore thread
Radoslaw Zarzynski
06:10 PM Bug #53806: unessesarily long laggy PG state
> I think as long as `acting` does not have duplicate entries, the logic is exactly the same as before.
Yeah. I'm ...
Radoslaw Zarzynski
05:51 PM Backport #55768: pacific: rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46499
merged
Yuri Weinstein
05:34 PM Backport #56648: quincy: [Progress] Do not show NEW PG_NUM value for pool if autoscaler is set to...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47925
merged
Yuri Weinstein
05:15 PM Fix #57963: osd: Misleading information displayed for the running configuration of osd_mclock_max...
https://github.com/ceph/ceph/pull/48708 merged Yuri Weinstein
05:12 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
Radoslaw Zarzynski wrote:
> NOT A FIX (extra debugs): https://github.com/ceph/ceph/pull/48513
merged
Yuri Weinstein
04:02 PM Bug #58165 (Fix Under Review): rados: fix extra tabs on warning for pool copy
Laura Flores
12:57 PM Bug #58165 (Resolved): rados: fix extra tabs on warning for pool copy
BZ link: https://bugzilla.redhat.com/show_bug.cgi?id=2148242 Shreyansh Sancheti
03:52 PM Bug #57632 (Fix Under Review): test_envlibrados_for_rocksdb: free(): invalid pointer
Laura Flores
07:37 AM Bug #57940: ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when nobackfill ...
Thomas Le Gentil wrote:
> I could avoid this crash by removing all pg for which ceph could not get the clone_bytes, ...
Thomas Le Gentil

12/04/2022

11:56 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
/a/yuriw-2022-11-28_21:13:47-rados-wip-yuri11-testing-2022-11-18-1506-distro-default-smithi/7095031/ Matan Breizman
11:46 AM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
/a/yuriw-2022-11-23_21:36:17-rados-wip-yuri11-testing-2022-11-18-1506-distro-default-smithi/7089814/ Matan Breizman
09:41 AM Backport #58144 (In Progress): pacific: mon/MonCommands: Support dump_historic_slow_ops
Matan Breizman
09:37 AM Backport #58143 (In Progress): quincy: mon/MonCommands: Support dump_historic_slow_ops
Matan Breizman

12/02/2022

09:49 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
In a passed job, the crashes are posted:... Laura Flores
09:33 PM Bug #58098 (In Progress): qa/workunits/rados/test_crash.sh: crashes are never posted
In the job that passed, the mgr.server reports a recent crash:
/a/lflores-2022-11-30_22:53:49-rados-main-distro-de...
Laura Flores
09:06 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
In one of the jobs that passed, the OSDs were also failed for 31 seconds, but this time, the crashes were detected. S... Laura Flores
09:02 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Didn't reproduce in the 20x run above, but it did reproduce a second time here:
/a/yuriw-2022-11-28_21:09:37-rados...
Laura Flores
06:09 PM Bug #58052: Empty Pool (zero objects) shows usage.
Attaching server2 to this message.
Brian Woods
06:09 PM Bug #58052: Empty Pool (zero objects) shows usage.
I am realizing those logs are from a single host (server4).
server3 got removed today.
Attaching server1 to this me...
Brian Woods
05:42 PM Bug #58052: Empty Pool (zero objects) shows usage.
Radoslaw Zarzynski wrote:
> Well, I think the command you mentioned took effect for RGW, not MGR. I'm providing the c...
Brian Woods
03:28 PM Bug #58156 (In Progress): Monitors do not permit OSD to join after upgrading to Quincy
Igor Fedotov
03:28 PM Bug #58156 (Resolved): Monitors do not permit OSD to join after upgrading to Quincy
The Nautilus cluster was eventually upgraded to Quincy, and at the end OSDs stopped joining the cluster.
The i...
Igor Fedotov
03:24 PM Bug #58155 (Resolved): mon:ceph_assert(m < ranks.size()) `different code path than tracker 50089`
Same problem as https://tracker.ceph.com/issues/50089, but it is a different code path.
We opened a new tracker ...
Kamoltat (Junior) Sirivadhna
01:31 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
Nitzan Mordechai wrote:
> 王子敬 wang wrote:
> > Nitzan Mordechai wrote:
> > > Since you attached part of the pglog, ...
王子敬 wang
01:06 AM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Linked a possible solution for skipping ubuntu with this test. I scheduled a teuthology test for it, which I will use... Laura Flores

12/01/2022

09:44 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
Thanks for your observations, Brad! I'm going to dedicate this Tracker to `LibRadosAio.SimpleWrite` and mark it as re... Laura Flores
09:20 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
The issue appears to be in the api_aio test as it gets started but doesn't complete.... Brad Hubbard
08:04 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
Ran into another instance of this here:
/a/yuriw-2022-11-30_23:13:27-rados-wip-yuri2-testing-2022-11-30-0724-pacif...
Laura Flores
09:43 PM Bug #57618: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)
/a/yuriw-2022-11-29_22:29:58-rados-wip-yuri10-testing-2022-11-29-1005-pacific-distro-default-smithi/7097464/ Laura Flores
09:23 PM Bug #57751: LibRadosAio.SimpleWritePP hang and pkill
possibly 58130 is related Brad Hubbard
07:30 PM Cleanup #58149 (Resolved): Clarify pool creation failure message due to exceeding max_pgs_per_osd
This was inspired by the Re: [ceph-users] proxmox hyperconverged pg calculations in ceph pacific, pve 7.2 thread.
Anthony D'Atri
07:30 PM Bug #50089 (Resolved): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of...
Kamoltat (Junior) Sirivadhna
06:59 PM Bug #50089 (New): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of moni...
Kamoltat (Junior) Sirivadhna
04:12 PM Backport #58144 (Resolved): pacific: mon/MonCommands: Support dump_historic_slow_ops
https://github.com/ceph/ceph/pull/49233 Backport Bot
04:12 PM Backport #58143 (Resolved): quincy: mon/MonCommands: Support dump_historic_slow_ops
https://github.com/ceph/ceph/pull/49232 Backport Bot
04:02 PM Bug #58141 (Pending Backport): mon/MonCommands: Support dump_historic_slow_ops
Matan Breizman
12:42 PM Bug #58141 (Resolved): mon/MonCommands: Support dump_historic_slow_ops
Slow ops are being tracked in the mon while `dump_historic_slow_ops` command is not registered:
```
$ ceph daemon ....
```
Matan Breizman
03:56 PM Bug #58142 (In Progress): rbd-python snaps-many-objects: deep-scrub : stat mismatch
... Matan Breizman
03:45 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
It seems more like a generic RADOS issue. Adam Kupczyk
12:27 PM Bug #57757 (Fix Under Review): ECUtil: terminate called after throwing an instance of 'ceph::buff...
Nitzan Mordechai
08:18 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
王子敬 wang wrote:
> Nitzan Mordechai wrote:
> > Since you attached part of the pglog, I can't see how many entries yo...
Nitzan Mordechai
01:50 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
Nitzan Mordechai wrote:
> Since you attached part of the pglog, I can't see how many entries you have for log and ho...
王子敬 wang
03:41 AM Bug #53806: unessesarily long laggy PG state
Radoslaw Zarzynski wrote:
> OK, Aishwarya has found in testing that the @break@-related commit (https://github.com/c...
玮文 胡
12:51 AM Backport #58040: quincy: osd: add created_at and ceph_version_when_created metadata
Please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/49159
ceph-backport.sh versi...
Kaoru Esashika

11/30/2022

11:15 PM Bug #58132 (In Progress): qa/standalone/mon: --mon-initial-members setting causes us to populate ...
Kamoltat (Junior) Sirivadhna
11:08 PM Bug #58132 (Resolved): qa/standalone/mon: --mon-initial-members setting causes us to populate rem...
Problem:
--mon-initial-members does nothing but cause monmap
to populate ``removed_ranks`` because the way we sta...
Kamoltat (Junior) Sirivadhna
10:57 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Neha suggested we see how reproducible this is, so as not to mask any underlying problems by sleeping longer. I sched... Laura Flores
10:34 PM Bug #58130 (In Progress): LibRadosAio.SimpleWrite hang and pkill
A rados api test experienced a failure after the last global tests had successfully run.
/a/yuriw-2022-11-29_22:29...
Laura Flores
07:31 PM Bug #58052: Empty Pool (zero objects) shows usage.
Well, I think the command you mentioned took effect for RGW, not MGR. I'm providing the commands increasing log verbos... Radoslaw Zarzynski
07:25 PM Bug #57977: osd:tick checking mon for new map
The issue during the upgrade looks awfully similar to a downstream issue Prashant has been working on.
Prashant, would find som...
Radoslaw Zarzynski
07:09 PM Bug #58106 (Need More Info): when a large number of error ops appear in the OSDs,pglog does not t...
Radoslaw Zarzynski
10:43 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
Since you attached part of the pglog, I can't see how many entries you have for log and how many for dups
can you pl...
Nitzan Mordechai
08:38 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
王子敬 wang wrote:
> Nitzan Mordechai wrote:
> > @王子敬 wang, can you please send us the output for one of the pgs from ...
王子敬 wang
08:32 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
Nitzan Mordechai wrote:
> @王子敬 wang, can you please send us the output for one of the pgs from ceph-objectstore-tool...
王子敬 wang
07:30 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
@王子敬 wang, can you please send us the output for one of the pgs from ceph-objectstore-tool?... Nitzan Mordechai
02:16 AM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
Nitzan Mordechai wrote:
> @王子敬 wang can you please provide the output of 'ceph pg dump' ?
ok, the output in the pg_...
王子敬 wang
07:07 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
I think the invariant here is that the @acting@ container should not have duplicates. If it is broken, we have a more... Radoslaw Zarzynski
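The invariant mentioned above (no duplicate entries in the acting set) can be sketched as a small check. This is an illustration only, not Ceph's implementation: `duplicate_osds` is a hypothetical helper, the acting set is assumed to be a plain list of integer OSD ids, and the `ignore` parameter is an assumed way to skip placeholder ids if a caller needs that.

```python
from collections import Counter


def duplicate_osds(acting, ignore=()):
    """Return the sorted OSD ids that occur more than once in `acting`.

    Ids listed in `ignore` are skipped (hypothetical escape hatch for
    placeholder values that may legitimately repeat).
    """
    counts = Counter(osd for osd in acting if osd not in ignore)
    return sorted(osd for osd, n in counts.items() if n > 1)


# An acting set that violates the invariant: OSD 3 appears twice.
broken = duplicate_osds([3, 1, 3, 5])

# An acting set that satisfies it.
clean = duplicate_osds([0, 1, 2])
```

An empty result means the invariant holds for that acting set; any returned id points at the kind of duplicate the comment describes.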
01:55 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
If there are indeed duplicated entries in the acting set, should there be a 'break' at all in this loop? It seems lik... Joshua Baergen
07:00 PM Bug #53806: unessesarily long laggy PG state
OK, Aishwarya has found in testing that the @break@-related commit (https://github.com/ceph/ceph/pull/44499/commits/9... Radoslaw Zarzynski
02:02 PM Bug #53806: unessesarily long laggy PG state
FWIW, we've seen this happen very frequently during Nautilus->{Octopus,Pacific} upgrades. I had just tracked down the... Joshua Baergen
03:36 PM Bug #58114 (Closed): mon: FAILED ceph_assert(rank == new_rank)
Closed because this issue was found in pre-merge testing of PR: https://github.com/ceph/ceph/pull/48698/ Kamoltat (Junior) Sirivadhna
04:14 AM Backport #58039: pacific: osd: add created_at and ceph_version_when_created metadata
Please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/49144
ceph-backport.sh versi...
Kaoru Esashika

11/29/2022

11:18 PM Bug #54438: test/objectstore/store_test.cc: FAILED ceph_assert(bl_eq(state->contents[noid].data, ...
/a/yuriw-2022-11-28_16:28:53-rados-wip-yuri-testing-2022-11-18-1500-pacific-distro-default-smithi/7094026 Laura Flores
07:14 PM Backport #58117 (In Progress): quincy: qa/workunits/rados/test_librados_build.sh: specify redirec...
https://github.com/ceph/ceph/pull/49140 Laura Flores
06:58 PM Backport #58117 (In Progress): quincy: qa/workunits/rados/test_librados_build.sh: specify redirec...
Backport Bot
07:11 PM Backport #58116 (In Progress): pacific: qa/workunits/rados/test_librados_build.sh: specify redire...
https://github.com/ceph/ceph/pull/49139 Laura Flores
06:58 PM Backport #58116 (Resolved): pacific: qa/workunits/rados/test_librados_build.sh: specify redirect ...
Backport Bot
06:52 PM Bug #58046 (Pending Backport): qa/workunits/rados/test_librados_build.sh: specify redirect in cur...
Laura Flores
05:37 PM Bug #58046: qa/workunits/rados/test_librados_build.sh: specify redirect in curl command
Seen in Pacific run: /a/yuriw-2022-11-28_21:10:48-rados-wip-yuri10-testing-2022-11-28-1042-pacific-distro-default-smi... Aishwarya Mathuria
05:52 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
We discussed this tracker in the RADOS meeting. Sam pointed out that this set of tests doesn't have any actual users,... Laura Flores
05:24 PM Bug #58114 (Closed): mon: FAILED ceph_assert(rank == new_rank)
/a/yuriw-2022-11-28_21:10:48-rados-wip-yuri10-testing-2022-11-28-1042-pacific-distro-default-smithi/7095280/remote/sm... Aishwarya Mathuria
04:59 PM Bug #44595: cache tiering: Error: oid 48 copy_from 493 returned error code -2
... Aishwarya Mathuria
03:05 PM Bug #58107: mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
Therefore, there is nothing we can do but wait for the other site to come back up, so pgs can complete peering and th... Kamoltat (Junior) Sirivadhna
03:04 PM Bug #58107 (Closed): mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
Closed because this is not a corner case; to quote Greg Farnum:
``it’s that electing those two monitors means ...
Kamoltat (Junior) Sirivadhna
04:15 AM Bug #58107 (In Progress): mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
Kamoltat (Junior) Sirivadhna
04:14 AM Bug #58107 (Closed): mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
h1. How to reproduce the issue
h2. Set up:
mon.a (zone 1) rank=0
mon.b (zone 1) rank=1
mon.c (zone 2) rank=2
...
Kamoltat (Junior) Sirivadhna
01:07 PM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
@王子敬 wang can you please provide the output of 'ceph pg dump' ? Nitzan Mordechai
01:42 AM Bug #58106 (Need More Info): when a large number of error ops appear in the OSDs,pglog does not t...
When we use the S3 append and copy interfaces of the object gateway, a large number of error ops appear in the OSDs wh... 王子敬 wang
11:12 AM Bug #57940: ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when nobackfill ...
I could avoid this crash by removing all pg for which ceph could not get the clone_bytes, except the one I was sure t... Thomas Le Gentil
09:02 AM Backport #57496 (Resolved): quincy: Invalid read of size 8 in handle_recovery_delete()
Nitzan Mordechai
07:05 AM Bug #50042 (Fix Under Review): rados/test.sh: api_watch_notify failures
Nitzan Mordechai

11/28/2022

10:24 PM Bug #58098 (Fix Under Review): qa/workunits/rados/test_crash.sh: crashes are never posted
Laura Flores
05:34 PM Bug #58098 (Resolved): qa/workunits/rados/test_crash.sh: crashes are never posted
/a/yuriw-2022-11-23_15:09:06-rados-wip-yuri10-testing-2022-11-22-1711-distro-default-smithi/7087281... Laura Flores
09:43 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
Just a follow-up.
In the end, what's helping us the most is increasing osd_scrub_sleep to 0.4.
Gilles Mocellin
02:47 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Aishwarya Mathuria wrote:
> We suspect that this assert failure is hit in cases when we try to encode a message befo...
Ben Gao
05:05 AM Support #58091 (New): osd: reduce default value of osd_heartbeat_grace
Client I/O hangs for 20s when a peer OSD ping fails; 20s is too long. In case of network jitter, it generally does not exce... yite gu

11/24/2022

03:54 AM Bug #57977: osd:tick checking mon for new map
The more I dig, the more I'm thinking that this might be some race to do with noup, and probably has nothing to do wi... Joshua Baergen
03:42 AM Bug #57977: osd:tick checking mon for new map
Something that's probably worth mentioning - we had noup set in the cluster for each upgrade, and we wait until all O... Joshua Baergen
03:12 AM Bug #57977: osd:tick checking mon for new map
We saw this happen to roughly a dozen OSDs (1-2 per host for some hosts) during a recent upgrade from Nautilus to Pac... Joshua Baergen

11/22/2022

06:17 PM Bug #57977: osd:tick checking mon for new map
I have already restarted the OSD daemon, but could not reproduce it. If it happens again, I will collect more logs. yite gu
03:54 PM Bug #58052: Empty Pool (zero objects) shows usage.
Radoslaw Zarzynski wrote:
> Could you please provide a log from an active mgr with @debug_ms=1@ and @debug_mgr=20@?
...
Brian Woods

11/21/2022

06:35 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
@Radek I have been trying to reproduce this locally with no luck. I'll try your suggestion and update if I'm successful. Laura Flores
06:34 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Thanks for the link, Matan! I'm a bit worried the experiment there involved changing 2 parameters at the same time: compiler ... Radoslaw Zarzynski
06:29 PM Bug #58044 (Need More Info): ceph-osd: osd numa affinity setting doesn't work
How do you check the affinity?
Have you rebooted the OSD after injecting the setting?
Could you please provide ...
Radoslaw Zarzynski
06:22 PM Bug #58046 (Resolved): qa/workunits/rados/test_librados_build.sh: specify redirect in curl command
Radoslaw Zarzynski
06:21 PM Bug #58052 (Need More Info): Empty Pool (zero objects) shows usage.
Could you please provide a log from an active mgr with @debug_ms=1@ and @debug_mgr=20@? We would like to see which OS... Radoslaw Zarzynski
07:18 AM Bug #58027: op slow from throttled to header_read
Radoslaw Zarzynski wrote:
> Hello! The most important thing is Octopus is EOL. Second, I'm also not sure whether thi...
yite gu

11/20/2022

05:23 PM Bug #58052 (Need More Info): Empty Pool (zero objects) shows usage.
I have a pool that was/is being used in a CephFS. I have migrated all of the files off of the pool and was preparing... Brian Woods

11/18/2022

03:29 PM Bug #58049 (Resolved): mon:stretch-cluster: mishandled removed_ranks -> inconsistent peer_tracker...
First encountered in the downstream: https://bugzilla.redhat.com/show_bug.cgi?id=2142674
When we failover monitors...
Kamoltat (Junior) Sirivadhna
12:40 AM Bug #58046 (Fix Under Review): qa/workunits/rados/test_librados_build.sh: specify redirect in cur...
Laura Flores
12:36 AM Bug #58046 (Pending Backport): qa/workunits/rados/test_librados_build.sh: specify redirect in cur...
The workunit currently grabs files with:... Laura Flores

11/17/2022

05:07 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
We suspect that this assert failure is hit in cases when we try to encode a message before the connection is in a sta... Aishwarya Mathuria
03:30 PM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
> For already-converted clusters: Separate PR will be issued to remove/update the malformed SnapMapper keys.
https...
Matan Breizman
02:09 PM Bug #58044 (Need More Info): ceph-osd: osd numa affinity setting doesn't work
After setting the osd_numa_node parameter, the OSD NUMA affinity is not as expected.

xu wang
01:20 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Radoslaw Zarzynski wrote:
> Do we know the reason why switching g++11 helps? Is it a known compiler bug?
See Br...
Matan Breizman
12:15 PM Bug #57940: ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when nobackfill ...
Thomas Le Gentil wrote:
> the osd process does not crash if it is marked 'out'
Sorry, this is false. The OSD cras...
Thomas Le Gentil
09:42 AM Backport #58040 (Resolved): quincy: osd: add created_at and ceph_version_when_created metadata
Backport Bot
09:42 AM Backport #58039 (Resolved): pacific: osd: add created_at and ceph_version_when_created metadata
Backport Bot
09:34 AM Feature #58038 (Pending Backport): osd: add created_at and ceph_version_when_created metadata
Igor Fedotov
07:24 AM Feature #58038: osd: add created_at and ceph_version_when_created metadata
PR#48298 has already been merged. Could you change the status of this issue to "Pending Backport"?
I'll create backp...
Kaoru Esashika
07:15 AM Feature #58038 (Resolved): osd: add created_at and ceph_version_when_created metadata
Add the following two OSD metadata fields.
- created_at: the timestamp when OSD was created. It's useful when getting som...
Kaoru Esashika

11/16/2022

07:11 PM Bug #57977: osd:tick checking mon for new map
Thanks for the update! Yeah, it might be stuck there. To confirm we would need logs with increased debug levels (maybe @debug_mon =... Radoslaw Zarzynski
07:06 PM Bug #51729: Upmap verification fails for multi-level crush rule
Thanks for formulating the hypothesis!
Just updating to keep this ticket in the front of the tracker.
Radoslaw Zarzynski
07:02 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
Yeah, worth looking at, but the msgr encode issue has priority. Radoslaw Zarzynski
07:00 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Discussed during the RADOS Team Meeting on 15 Nov.
Linking Nitzan's gist: https://gist.github.com/NitzanMordhai/...
Radoslaw Zarzynski
06:58 PM Bug #57989: test-erasure-eio.sh fails since pg is not in unfound
Definitely a low priority. Radoslaw Zarzynski
06:52 PM Bug #58027 (Closed): op slow from throttled to header_read
Hello! The most important thing is Octopus is EOL. Second, I'm also not sure whether this is really a bug. Seeing 0,5... Radoslaw Zarzynski
06:48 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Do we know the reason why switching g++11 helps? Is it a known compiler bug? Radoslaw Zarzynski
05:47 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
I was able to schedule a teuthology run: http://pulpito.front.sepia.ceph.com/lflores-2022-11-16_15:49:13-rados:single... Laura Flores
01:11 PM Bug #57940: ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when nobackfill ...
The OSD process does not crash if it is marked 'out'. Thomas Le Gentil

11/15/2022

08:44 AM Bug #56772: crash: uint64_t SnapSet::get_clone_bytes(snapid_t) const: assert(clone_overlap.count(...
This bug is present in v17.2.5 Thomas Le Gentil
07:32 AM Bug #58027 (Closed): op slow from throttled to header_read
ceph version 15.2.7
Op spends 500ms from throttled to header_read...
yite gu
12:24 AM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
There is also a coredump located at `/a/matan-2022-09-08_11:12:20-rados:singleton-main-distro-default-smithi/7020422/... Laura Flores
12:01 AM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Some relevant frames:... Laura Flores

11/14/2022

11:39 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
I followed Brad's Ubuntu 20.04 coredump tutorial: https://source.redhat.com/personal_blogs/debugging_a_ceph_osd_cored... Laura Flores
08:20 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
The original build has expired by now, so I'm rebuilding it here: https://shaman.ceph.com/builds/ceph/wip-kefu-testing... Laura Flores
08:14 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Ran the test locally in an Ubuntu 20.04 environment, and the test ran fine.
There is a coredump located under /a/k...
Laura Flores
11:37 AM Bug #55750: mon: slow request of very long time
{
"description": "osd_failure(failed timeout osd.6 [v2:10.172.98.151:6800/39,v1:10.172.98.151:68...
yite gu

11/11/2022

08:31 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Also to note: We set `ceph config set mgr mgr_stats_period 1` on the gibba cluster to reproduce this bug. (This occur... Laura Flores
06:27 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
I think https://tracker.ceph.com/issues/49689#note-31 makes sense and the following logs also show what max_oldest_ma... Neha Ojha
10:08 AM Backport #58007: pacific: bail from handle_command() if _generate_command_map() fails
Please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/48846
ceph-backport.sh versi...
nikhil kshirsagar
09:07 AM Backport #58007 (Resolved): pacific: bail from handle_command() if _generate_command_map() fails
https://github.com/ceph/ceph/pull/48846 Backport Bot
10:03 AM Backport #58006: quincy: bail from handle_command() if _generate_command_map() fails
Please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/48845
ceph-backport.sh versi...
nikhil kshirsagar
09:07 AM Backport #58006 (Resolved): quincy: bail from handle_command() if _generate_command_map() fails
https://github.com/ceph/ceph/pull/48845 Backport Bot
09:01 AM Bug #57859 (Pending Backport): bail from handle_command() if _generate_command_map() fails
PR https://github.com/ceph/ceph/pull/48044 has been merged in main. Ponnuvel P

11/10/2022

11:37 PM Bug #56101 (Fix Under Review): Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function s...
Laura Flores
11:21 PM Bug #56101 (In Progress): Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_t...
Laura Flores
04:52 AM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Thanks for your work in capturing the core, Laura.
I had a look at the coredump and it shows exactly what we had sp...
Brad Hubbard
07:14 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
/a/yuriw-2022-10-17_17:31:25-rados-wip-yuri7-testing-2022-10-17-0814-distro-default-smithi/7071031 Laura Flores
11:50 AM Bug #57989: test-erasure-eio.sh fails since pg is not in unfound
For some reason, the pool already exists... Nitzan Mordechai
08:44 AM Bug #57757 (In Progress): ECUtil: terminate called after throwing an instance of 'ceph::buffer::v...
Nitzan Mordechai
08:42 AM Bug #57618 (Fix Under Review): rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)
Nitzan Mordechai
08:34 AM Bug #57618: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)
Some of the OSDs stopped due to valgrind errors. This is a duplicate of another bug. Nitzan Mordechai
08:39 AM Bug #57751 (Fix Under Review): LibRadosAio.SimpleWritePP hang and pkill
Nitzan Mordechai
07:38 AM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
Thanks for taking a look, Radek! That's a good point since we are seeing this issue with rados/thrash-erasure-code tes... Aishwarya Mathuria

11/09/2022

10:56 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Managed to reproduce this on the Gibba cluster and produce a coredump!
The core file is located on gibba001 under ...
Laura Flores
08:18 PM Backport #57704 (Resolved): quincy: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reduc...
https://github.com/ceph/ceph/pull/48321 Kamoltat (Junior) Sirivadhna
08:17 PM Backport #57705 (Resolved): pacific: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when redu...
https://github.com/ceph/ceph/pull/48320 Kamoltat (Junior) Sirivadhna
08:17 PM Bug #50089 (Resolved): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of...
Kamoltat (Junior) Sirivadhna
04:34 PM Bug #51729: Upmap verification fails for multi-level crush rule
Thanks again for looking at this.
I haven't looked further, but I suspect the issue will come down to the variable...
Chris Durham

11/08/2022

09:23 PM Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash
pacific backport: https://github.com/ceph/ceph/pull/48803 Kamoltat (Junior) Sirivadhna
08:59 PM Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash
quincy backport: https://github.com/ceph/ceph/pull/48802 Kamoltat (Junior) Sirivadhna
07:23 PM Bug #51729: Upmap verification fails for multi-level crush rule
I believe I've reproduced the issue using the osdmaps that Chris provided.
First, I used the osdmaptool to run the...
Laura Flores
02:08 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
After rechecking the logs, it looks like we are taking 2 different versions of smithi01231941-9:head
All chunks with ...
Nitzan Mordechai
05:44 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@Laura, thanks for confirming that in the coredump. Yes, shard0 is also showing that when it gets the chunk from bluestore:
...
Nitzan Mordechai
12:07 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Brad and I did some more debugging today.
Here is the end of the log associated with the coredump:...
Laura Flores
 
