Activity

From 11/05/2022 to 12/04/2022

12/04/2022

11:56 AM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
/a/yuriw-2022-11-28_21:13:47-rados-wip-yuri11-testing-2022-11-18-1506-distro-default-smithi/7095031/ Matan Breizman
11:46 AM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
/a/yuriw-2022-11-23_21:36:17-rados-wip-yuri11-testing-2022-11-18-1506-distro-default-smithi/7089814/ Matan Breizman
09:41 AM Backport #58144 (In Progress): pacific: mon/MonCommands: Support dump_historic_slow_ops
Matan Breizman
09:37 AM Backport #58143 (In Progress): quincy: mon/MonCommands: Support dump_historic_slow_ops
Matan Breizman

12/02/2022

09:49 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
In a passed job, the crashes are posted:... Laura Flores
09:33 PM Bug #58098 (In Progress): qa/workunits/rados/test_crash.sh: crashes are never posted
In the job that passed, the mgr.server reports a recent crash:
/a/lflores-2022-11-30_22:53:49-rados-main-distro-de...
Laura Flores
09:06 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
In one of the jobs that passed, the OSDs also failed for 31 seconds, but this time the crashes were detected. S... Laura Flores
09:02 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Didn't reproduce in the 20x run above, but it did reproduce a second time here:
/a/yuriw-2022-11-28_21:09:37-rados...
Laura Flores
06:09 PM Bug #58052: Empty Pool (zero objects) shows usage.
Attaching server2 to this message.
Brian Woods
06:09 PM Bug #58052: Empty Pool (zero objects) shows usage.
I realized those logs are from a single host (server4).
server3 was removed today.
Attaching server1 to this me...
Brian Woods
05:42 PM Bug #58052: Empty Pool (zero objects) shows usage.
Radoslaw Zarzynski wrote:
> Well, I think the command you mentioned took effect for RGW, not MGR. I'm providing the c...
Brian Woods
03:28 PM Bug #58156 (In Progress): Monitors do not permit OSD to join after upgrading to Quincy
Igor Fedotov
03:28 PM Bug #58156 (Resolved): Monitors do not permit OSD to join after upgrading to Quincy
The Nautilus cluster was eventually upgraded to Quincy, and at the end the OSDs stopped joining the cluster.
The i...
Igor Fedotov
03:24 PM Bug #58155 (Resolved): mon:ceph_assert(m < ranks.size()) `different code path than tracker 50089`
Same problem as https://tracker.ceph.com/issues/50089, but on a different code path.
We opened a new tracker ...
Kamoltat (Junior) Sirivadhna
01:31 AM Bug #58106: when a large number of error ops appear in the OSDs, pglog does not trim.
Nitzan Mordechai wrote:
> 王子敬 wang wrote:
> > Nitzan Mordechai wrote:
> > > Since you attached part of the pglog, ...
王子敬 wang
01:06 AM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Linked a possible solution that skips this test on Ubuntu. I scheduled a teuthology test for it, which I will use... Laura Flores

12/01/2022

09:44 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
Thanks for your observations, Brad! I'm going to dedicate this Tracker to `LibRadosAio.SimpleWrite` and mark it as re... Laura Flores
09:20 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
The issue appears to be in the api_aio test as it gets started but doesn't complete.... Brad Hubbard
08:04 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
Ran into another instance of this here:
/a/yuriw-2022-11-30_23:13:27-rados-wip-yuri2-testing-2022-11-30-0724-pacif...
Laura Flores
09:43 PM Bug #57618: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)
/a/yuriw-2022-11-29_22:29:58-rados-wip-yuri10-testing-2022-11-29-1005-pacific-distro-default-smithi/7097464/ Laura Flores
09:23 PM Bug #57751: LibRadosAio.SimpleWritePP hang and pkill
Possibly #58130 is related. Brad Hubbard
07:30 PM Cleanup #58149 (Resolved): Clarify pool creation failure message due to exceeding max_pgs_per_osd
This was inspired by the "Re: [ceph-users] proxmox hyperconverged pg calculations in ceph pacific, pve 7.2" thread.
Anthony D'Atri
07:30 PM Bug #50089 (Resolved): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of...
Kamoltat (Junior) Sirivadhna
06:59 PM Bug #50089 (New): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of moni...
Kamoltat (Junior) Sirivadhna
04:12 PM Backport #58144 (Resolved): pacific: mon/MonCommands: Support dump_historic_slow_ops
https://github.com/ceph/ceph/pull/49233 Backport Bot
04:12 PM Backport #58143 (Resolved): quincy: mon/MonCommands: Support dump_historic_slow_ops
https://github.com/ceph/ceph/pull/49232 Backport Bot
04:02 PM Bug #58141 (Pending Backport): mon/MonCommands: Support dump_historic_slow_ops
Matan Breizman
12:42 PM Bug #58141 (Resolved): mon/MonCommands: Support dump_historic_slow_ops
Slow ops are tracked in the mon, but the `dump_historic_slow_ops` command is not registered:
```
$ ceph daemon ....
```
Matan Breizman
03:56 PM Bug #58142 (In Progress): rbd-python snaps-many-objects: deep-scrub : stat mismatch
... Matan Breizman
03:45 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
It seems more like a generic RADOS issue. Adam Kupczyk
12:27 PM Bug #57757 (Fix Under Review): ECUtil: terminate called after throwing an instance of 'ceph::buff...
Nitzan Mordechai
08:18 AM Bug #58106: when a large number of error ops appear in the OSDs, pglog does not trim.
王子敬 wang wrote:
> Nitzan Mordechai wrote:
> > Since you attached part of the pglog, I can't see how many entries yo...
Nitzan Mordechai
01:50 AM Bug #58106: when a large number of error ops appear in the OSDs, pglog does not trim.
Nitzan Mordechai wrote:
> Since you attached part of the pglog, I can't see how many entries you have for log and ho...
王子敬 wang
03:41 AM Bug #53806: unnecessarily long laggy PG state
Radoslaw Zarzynski wrote:
> OK, Aishwarya has found in testing that the @break@-related commit (https://github.com/c...
玮文 胡
12:51 AM Backport #58040: quincy: osd: add created_at and ceph_version_when_created metadata
please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/49159
ceph-backport.sh versi...
Kaoru Esashika

11/30/2022

11:15 PM Bug #58132 (In Progress): qa/standalone/mon: --mon-initial-members setting causes us to populate ...
Kamoltat (Junior) Sirivadhna
11:08 PM Bug #58132 (Resolved): qa/standalone/mon: --mon-initial-members setting causes us to populate rem...
Problem:
--mon-initial-members does nothing but cause monmap
to populate ``removed_ranks`` because the way we sta...
Kamoltat (Junior) Sirivadhna
10:57 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Neha suggested we see how reproducible this is, so as not to mask any underlying problems by sleeping longer. I sched... Laura Flores
10:34 PM Bug #58130 (In Progress): LibRadosAio.SimpleWrite hang and pkill
A rados api test experienced a failure after the last global tests had successfully run.
/a/yuriw-2022-11-29_22:29...
Laura Flores
07:31 PM Bug #58052: Empty Pool (zero objects) shows usage.
Well, I think the command you mentioned took effect for RGW, not MGR. I'm providing the commands for increasing log verbos... Radoslaw Zarzynski
07:25 PM Bug #57977: osd:tick checking mon for new map
The issue during the upgrade looks awfully similar to a downstream issue Prashant has been working on.
Prashant, would find som...
Radoslaw Zarzynski
07:09 PM Bug #58106 (Need More Info): when a large number of error ops appear in the OSDs, pglog does not t...
Radoslaw Zarzynski
10:43 AM Bug #58106: when a large number of error ops appear in the OSDs, pglog does not trim.
Since you attached only part of the pglog, I can't see how many entries you have for the log and how many for dups.
Can you pl...
Nitzan Mordechai
08:38 AM Bug #58106: when a large number of error ops appear in the OSDs, pglog does not trim.
王子敬 wang wrote:
> Nitzan Mordechai wrote:
> > @王子敬 wang, can you please send us the output for one of the pgs from ...
王子敬 wang
08:32 AM Bug #58106: when a large number of error ops appear in the OSDs, pglog does not trim.
Nitzan Mordechai wrote:
> @王子敬 wang, can you please send us the output for one of the pgs from ceph-objectstore-tool...
王子敬 wang
07:30 AM Bug #58106: when a large number of error ops appear in the OSDs, pglog does not trim.
@王子敬 wang, can you please send us the output for one of the pgs from ceph-objectstore-tool?... Nitzan Mordechai
02:16 AM Bug #58106: when a large number of error ops appear in the OSDs, pglog does not trim.
Nitzan Mordechai wrote:
> @王子敬 wang can you please provide the output of 'ceph pg dump'?
OK, the output in the pg_...
王子敬 wang
07:07 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
I think the invariant here is that the @acting@ container should not have duplicates. If it is broken, we have a more... Radoslaw Zarzynski
01:55 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
If there are indeed duplicated entries in the acting set, should there be a 'break' at all in this loop? It seems lik... Joshua Baergen
07:00 PM Bug #53806: unnecessarily long laggy PG state
OK, Aishwarya has found in testing that the @break@-related commit (https://github.com/ceph/ceph/pull/44499/commits/9... Radoslaw Zarzynski
02:02 PM Bug #53806: unnecessarily long laggy PG state
FWIW, we've seen this happen very frequently during Nautilus->{Octopus,Pacific} upgrades. I had just tracked down the... Joshua Baergen
03:36 PM Bug #58114 (Closed): mon: FAILED ceph_assert(rank == new_rank)
Closing because this issue was found during pre-merge testing of PR https://github.com/ceph/ceph/pull/48698/ Kamoltat (Junior) Sirivadhna
04:14 AM Backport #58039: pacific: osd: add created_at and ceph_version_when_created metadata
please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/49144
ceph-backport.sh versi...
Kaoru Esashika

11/29/2022

11:18 PM Bug #54438: test/objectstore/store_test.cc: FAILED ceph_assert(bl_eq(state->contents[noid].data, ...
/a/yuriw-2022-11-28_16:28:53-rados-wip-yuri-testing-2022-11-18-1500-pacific-distro-default-smithi/7094026 Laura Flores
07:14 PM Backport #58117 (In Progress): quincy: qa/workunits/rados/test_librados_build.sh: specify redirec...
https://github.com/ceph/ceph/pull/49140 Laura Flores
06:58 PM Backport #58117 (In Progress): quincy: qa/workunits/rados/test_librados_build.sh: specify redirec...
Backport Bot
07:11 PM Backport #58116 (In Progress): pacific: qa/workunits/rados/test_librados_build.sh: specify redire...
https://github.com/ceph/ceph/pull/49139 Laura Flores
06:58 PM Backport #58116 (Resolved): pacific: qa/workunits/rados/test_librados_build.sh: specify redirect ...
Backport Bot
06:52 PM Bug #58046 (Pending Backport): qa/workunits/rados/test_librados_build.sh: specify redirect in cur...
Laura Flores
05:37 PM Bug #58046: qa/workunits/rados/test_librados_build.sh: specify redirect in curl command
Seen in Pacific run: /a/yuriw-2022-11-28_21:10:48-rados-wip-yuri10-testing-2022-11-28-1042-pacific-distro-default-smi... Aishwarya Mathuria
05:52 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
We discussed this tracker in the RADOS meeting. Sam pointed out that this set of tests doesn't have any actual users,... Laura Flores
05:24 PM Bug #58114 (Closed): mon: FAILED ceph_assert(rank == new_rank)
/a/yuriw-2022-11-28_21:10:48-rados-wip-yuri10-testing-2022-11-28-1042-pacific-distro-default-smithi/7095280/remote/sm... Aishwarya Mathuria
04:59 PM Bug #44595: cache tiering: Error: oid 48 copy_from 493 returned error code -2
... Aishwarya Mathuria
03:05 PM Bug #58107: mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
Therefore, there is nothing we can do but wait for the other site to come back up, so pgs can complete peering and th... Kamoltat (Junior) Sirivadhna
03:04 PM Bug #58107 (Closed): mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
Closed because this is not a corner case; quoting Greg Farnum:
``it’s that electing those two monitors means ...
Kamoltat (Junior) Sirivadhna
04:15 AM Bug #58107 (In Progress): mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
Kamoltat (Junior) Sirivadhna
04:14 AM Bug #58107 (Closed): mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
h1. How to reproduce the issue
h2. Set up:
mon.a (zone 1) rank=0
mon.b (zone 1) rank=1
mon.c (zone 2) rank=2
...
Kamoltat (Junior) Sirivadhna
01:07 PM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
@王子敬 wang can you please provide the output of 'ceph pg dump'? Nitzan Mordechai
01:42 AM Bug #58106 (Need More Info): when a large number of error ops appear in the OSDs, pglog does not t...
When we use the S3 append and copy interfaces of the object gateway, a large number of error ops appear in the OSDs wh... 王子敬 wang
11:12 AM Bug #57940: ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when nobackfill ...
I could avoid this crash by removing all PGs for which Ceph could not get the clone_bytes, except the one I was sure t... Thomas Le Gentil
09:02 AM Backport #57496 (Resolved): quincy: Invalid read of size 8 in handle_recovery_delete()
Nitzan Mordechai
07:05 AM Bug #50042 (Fix Under Review): rados/test.sh: api_watch_notify failures
Nitzan Mordechai

11/28/2022

10:24 PM Bug #58098 (Fix Under Review): qa/workunits/rados/test_crash.sh: crashes are never posted
Laura Flores
05:34 PM Bug #58098 (Resolved): qa/workunits/rados/test_crash.sh: crashes are never posted
/a/yuriw-2022-11-23_15:09:06-rados-wip-yuri10-testing-2022-11-22-1711-distro-default-smithi/7087281... Laura Flores
09:43 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
Just a follow-up.
Finally, what helped us most was increasing osd_scrub_sleep to 0.4.
Gilles Mocellin
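A minimal sketch of how the osd_scrub_sleep change described above could be applied cluster-wide, assuming a release with the centralized configuration database (Mimic or later) and a node with admin access to the ceph CLI. The 0.4 value is the one from the comment; the read-back step is only for verification.

```shell
# Store the override for every OSD in the monitor config database
# (assumes the centralized config database, Mimic or later)
ceph config set osd osd_scrub_sleep 0.4
# Read back the value that now applies to the osd section
ceph config get osd osd_scrub_sleep
```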
02:47 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Aishwarya Mathuria wrote:
> We suspect that this assert failure is hit in cases when we try to encode a message befo...
Ben Gao
05:05 AM Support #58091 (New): osd: reduce default value of osd_heartbeat_grace
Client I/O hangs for 20s when a peer OSD ping fails; 20s is too long. In case of network jitter, it generally does not exce... yite gu

11/24/2022

03:54 AM Bug #57977: osd:tick checking mon for new map
The more I dig, the more I'm thinking that this might be some race to do with noup, and probably has nothing to do wi... Joshua Baergen
03:42 AM Bug #57977: osd:tick checking mon for new map
Something that's probably worth mentioning - we had noup set in the cluster for each upgrade, and we wait until all O... Joshua Baergen
03:12 AM Bug #57977: osd:tick checking mon for new map
We saw this happen to roughly a dozen OSDs (1-2 per host for some hosts) during a recent upgrade from Nautilus to Pac... Joshua Baergen

11/22/2022

06:17 PM Bug #57977: osd:tick checking mon for new map
I already restarted the OSD daemon, but have not reproduced it. If it happens again, I will collect more logs. yite gu
03:54 PM Bug #58052: Empty Pool (zero objects) shows usage.
Radoslaw Zarzynski wrote:
> Could you please provide a log from an active mgr with @debug_ms=1@ and @debug_mgr=20@?
...
Brian Woods

11/21/2022

06:35 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
@Radek I have been trying to reproduce this locally with no luck. I'll try your suggestion and update if I'm successful. Laura Flores
06:34 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Thanks for the link, Matan! I'm a bit worried the experiment there involved changing 2 parameters at the same time: compiler ... Radoslaw Zarzynski
06:29 PM Bug #58044 (Need More Info): ceph-osd: osd numa affinity setting doesn't work
How do you check the affinity?
Have you restarted the OSD after injecting the setting?
Could you please provide ...
Radoslaw Zarzynski
06:22 PM Bug #58046 (Resolved): qa/workunits/rados/test_librados_build.sh: specify redirect in curl command
Radoslaw Zarzynski
06:21 PM Bug #58052 (Need More Info): Empty Pool (zero objects) shows usage.
Could you please provide a log from an active mgr with @debug_ms=1@ and @debug_mgr=20@? We would like to see which OS... Radoslaw Zarzynski
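A minimal sketch of raising the mgr log levels requested above, assuming the centralized configuration database is in use. Setting them on the generic mgr section covers whichever mgr is active, and removing the overrides afterwards restores the defaults.

```shell
# Show which mgr is currently active (its log is the one requested)
ceph mgr stat
# Raise messenger and mgr verbosity for all mgr daemons
ceph config set mgr debug_ms 1
ceph config set mgr debug_mgr 20
# After reproducing the issue and collecting the log, drop the overrides
ceph config rm mgr debug_ms
ceph config rm mgr debug_mgr
```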
07:18 AM Bug #58027: op slow from throttled to header_read
Radoslaw Zarzynski wrote:
> Hello! The most important thing is that Octopus is EOL. Second, I'm also not sure whether thi...
yite gu

11/20/2022

05:23 PM Bug #58052 (Need More Info): Empty Pool (zero objects) shows usage.
I have a pool that was/is being used in a CephFS. I have migrated all of the files off of the pool and was preparing... Brian Woods

11/18/2022

03:29 PM Bug #58049 (Resolved): mon:stretch-cluster: mishandled removed_ranks -> inconsistent peer_tracker...
First encountered in the downstream: https://bugzilla.redhat.com/show_bug.cgi?id=2142674
When we failover monitors...
Kamoltat (Junior) Sirivadhna
12:40 AM Bug #58046 (Fix Under Review): qa/workunits/rados/test_librados_build.sh: specify redirect in cur...
Laura Flores
12:36 AM Bug #58046 (Pending Backport): qa/workunits/rados/test_librados_build.sh: specify redirect in cur...
The workunit currently grabs files with:... Laura Flores

11/17/2022

05:07 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
We suspect that this assert failure is hit in cases when we try to encode a message before the connection is in a sta... Aishwarya Mathuria
03:30 PM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
> For already-converted clusters: Separate PR will be issued to remove/update the malformed SnapMapper keys.
https...
Matan Breizman
02:09 PM Bug #58044 (Need More Info): ceph-osd: osd numa affinity setting doesn't work
After setting the osd_numa_node parameter, the OSD's NUMA affinity is not as expected.

xu wang
01:20 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Radoslaw Zarzynski wrote:
> Do we know the reason why switching to g++11 helps? Is it a known compiler bug?
See Br...
Matan Breizman
12:15 PM Bug #57940: ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when nobackfill ...
Thomas Le Gentil wrote:
> The OSD process does not crash if it is marked 'out'.
Sorry, this is false. The OSD cras...
Thomas Le Gentil
09:42 AM Backport #58040 (Resolved): quincy: osd: add created_at and ceph_version_when_created metadata
Backport Bot
09:42 AM Backport #58039 (Resolved): pacific: osd: add created_at and ceph_version_when_created metadata
Backport Bot
09:34 AM Feature #58038 (Pending Backport): osd: add created_at and ceph_version_when_created metadata
Igor Fedotov
07:24 AM Feature #58038: osd: add created_at and ceph_version_when_created metadata
PR#48298 has already been merged. Could you change the status of this issue to "Pending Backport"?
I'll create backp...
Kaoru Esashika
07:15 AM Feature #58038 (Resolved): osd: add created_at and ceph_version_when_created metadata
Add the following two OSD metadata fields.
- created_at: the timestamp when the OSD was created. It's useful when getting som...
Kaoru Esashika

11/16/2022

07:11 PM Bug #57977: osd:tick checking mon for new map
Thanks for the update! Yeah, it might be stuck there. To confirm, we would need logs with increased debug levels (maybe @debug_mon =... Radoslaw Zarzynski
07:06 PM Bug #51729: Upmap verification fails for multi-level crush rule
Thanks for formulating the hypothesis!
Just updating to keep this ticket in the front of the tracker.
Radoslaw Zarzynski
07:02 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
Yeah, worth looking into, but the msgr encode issue has priority. Radoslaw Zarzynski
07:00 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Discussed during the RADOS Team Meeting on 15 Nov.
Linking Nitzan's gist: https://gist.github.com/NitzanMordhai/...
Radoslaw Zarzynski
06:58 PM Bug #57989: test-erasure-eio.sh fails since pg is not in unfound
Definitely a low priority. Radoslaw Zarzynski
06:52 PM Bug #58027 (Closed): op slow from throttled to header_read
Hello! The most important thing is that Octopus is EOL. Second, I'm also not sure whether this is really a bug. Seeing 0,5... Radoslaw Zarzynski
06:48 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Do we know the reason why switching to g++11 helps? Is it a known compiler bug? Radoslaw Zarzynski
05:47 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
I was able to schedule a teuthology run: http://pulpito.front.sepia.ceph.com/lflores-2022-11-16_15:49:13-rados:single... Laura Flores
01:11 PM Bug #57940: ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when nobackfill ...
The OSD process does not crash if it is marked 'out'. Thomas Le Gentil

11/15/2022

08:44 AM Bug #56772: crash: uint64_t SnapSet::get_clone_bytes(snapid_t) const: assert(clone_overlap.count(...
This bug is present in v17.2.5 Thomas Le Gentil
07:32 AM Bug #58027 (Closed): op slow from throttled to header_read
ceph version 15.2.7
An op spends 500ms going from throttled to header_read...
yite gu
12:24 AM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
There is also a coredump located at `/a/matan-2022-09-08_11:12:20-rados:singleton-main-distro-default-smithi/7020422/... Laura Flores
12:01 AM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Some relevant frames:... Laura Flores

11/14/2022

11:39 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
I followed Brad's ubuntu 20.04 coredump tutorial: https://source.redhat.com/personal_blogs/debugging_a_ceph_osd_cored... Laura Flores
08:20 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
The original build is by now expired, so I'm rebuilding it here: https://shaman.ceph.com/builds/ceph/wip-kefu-testing... Laura Flores
08:14 PM Bug #57632: test_envlibrados_for_rocksdb: free(): invalid pointer
Ran the test locally in an ubuntu 20.04 environment, and the test ran fine.
There is a coredump located under /a/k...
Laura Flores
11:37 AM Bug #55750: mon: slow request of very long time
{
"description": "osd_failure(failed timeout osd.6 [v2:10.172.98.151:6800/39,v1:10.172.98.151:68...
yite gu

11/11/2022

08:31 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Also to note: We set `ceph config set mgr mgr_stats_period 1` on the gibba cluster to reproduce this bug. (This occur... Laura Flores
06:27 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
I think https://tracker.ceph.com/issues/49689#note-31 makes sense and the following logs also show what max_oldest_ma... Neha Ojha
10:08 AM Backport #58007: pacific: bail from handle_command() if _generate_command_map() fails
please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/48846
ceph-backport.sh versi...
nikhil kshirsagar
09:07 AM Backport #58007 (Resolved): pacific: bail from handle_command() if _generate_command_map() fails
https://github.com/ceph/ceph/pull/48846 Backport Bot
10:03 AM Backport #58006: quincy: bail from handle_command() if _generate_command_map() fails
please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/48845
ceph-backport.sh versi...
nikhil kshirsagar
09:07 AM Backport #58006 (Resolved): quincy: bail from handle_command() if _generate_command_map() fails
https://github.com/ceph/ceph/pull/48845 Backport Bot
09:01 AM Bug #57859 (Pending Backport): bail from handle_command() if _generate_command_map() fails
PR https://github.com/ceph/ceph/pull/48044 has been merged in main. Ponnuvel P

11/10/2022

11:37 PM Bug #56101 (Fix Under Review): Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function s...
Laura Flores
11:21 PM Bug #56101 (In Progress): Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_t...
Laura Flores
04:52 AM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Thanks for your work in capturing the core, Laura.
I had a look at the coredump and it shows exactly what we had sp...
Brad Hubbard
07:14 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
/a/yuriw-2022-10-17_17:31:25-rados-wip-yuri7-testing-2022-10-17-0814-distro-default-smithi/7071031 Laura Flores
11:50 AM Bug #57989: test-erasure-eio.sh fails since pg is not in unfound
For some reason, the pool already exist... Nitzan Mordechai
08:44 AM Bug #57757 (In Progress): ECUtil: terminate called after throwing an instance of 'ceph::buffer::v...
Nitzan Mordechai
08:42 AM Bug #57618 (Fix Under Review): rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)
Nitzan Mordechai
08:34 AM Bug #57618: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)
Some of the OSDs stopped due to valgrind errors. This is a duplicate of another bug. Nitzan Mordechai
08:39 AM Bug #57751 (Fix Under Review): LibRadosAio.SimpleWritePP hang and pkill
Nitzan Mordechai
07:38 AM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
Thanks for taking a look, Radek! That's a good point since we are seeing this issue with rados/thrash-erasure-code tes... Aishwarya Mathuria

11/09/2022

10:56 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Managed to reproduce this on the Gibba cluster and produce a coredump!
The core file is located on gibba001 under ...
Laura Flores
08:18 PM Backport #57704 (Resolved): quincy: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reduc...
https://github.com/ceph/ceph/pull/48321 Kamoltat (Junior) Sirivadhna
08:17 PM Backport #57705 (Resolved): pacific: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when redu...
https://github.com/ceph/ceph/pull/48320 Kamoltat (Junior) Sirivadhna
08:17 PM Bug #50089 (Resolved): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of...
Kamoltat (Junior) Sirivadhna
04:34 PM Bug #51729: Upmap verification fails for multi-level crush rule
Thanks again for looking at this.
I haven't looked further, but I suspect the issue will come down to the variable...
Chris Durham

11/08/2022

09:23 PM Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash
pacific backport: https://github.com/ceph/ceph/pull/48803 Kamoltat (Junior) Sirivadhna
08:59 PM Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash
quincy backport: https://github.com/ceph/ceph/pull/48802 Kamoltat (Junior) Sirivadhna
07:23 PM Bug #51729: Upmap verification fails for multi-level crush rule
I believe I've reproduced the issue using the osdmaps that Chris provided.
First, I used the osdmaptool to run the...
Laura Flores
02:08 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
After rechecking the logs, it looks like we are taking 2 different versions of smithi01231941-9:head.
All chunks with ...
Nitzan Mordechai
05:44 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@Laura, thanks for confirming that in the coredump. Yes, shard0 also shows that when it gets the chunk from bluestore:
...
Nitzan Mordechai
12:07 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Brad and I did some more debugging today.
Here is the end of the log associated with the coredump:...
Laura Flores

11/07/2022

09:27 PM Bug #57977: osd:tick checking mon for new map
Radoslaw Zarzynski wrote:
> Octopus is EOL. Does it happen on a supported release?
>
> Regardless of that, could ...
yite gu
06:13 PM Bug #57977 (Need More Info): osd:tick checking mon for new map
Octopus is EOL. Does it happen on a supported release?
Regardless of that, could you please provide logs from this...
Radoslaw Zarzynski
07:30 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Also to note, we can see information about argument `to_read` here:... Laura Flores
07:27 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@Nitzan, what do you think about this analysis? Or are there any other frames/locals you'd like me to check? Laura Flores
07:12 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Looking at frame 12, I can see that the incorrect length (262144) for shard 0 is evident in the local variable "from"... Laura Flores
06:02 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Got it to detect the right symbols with the new build!
I will attempt to analyze this coredump at a deeper level, ...
Laura Flores
03:16 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
According to Brad, the build needs to be as close to the test branch that originally experienced the crash as possibl... Laura Flores
07:18 PM Bug #51729: Upmap verification fails for multi-level crush rule
Thanks Chris! @Radek I have been taking some time to analyze this scenario, and will post updates soon. Laura Flores
06:36 PM Bug #51729: Upmap verification fails for multi-level crush rule
Thanks for the info! Laura, would you mind taking another look? Radoslaw Zarzynski
06:36 PM Bug #51729 (New): Upmap verification fails for multi-level crush rule
Radoslaw Zarzynski
06:43 PM Bug #50219 (Closed): qa/standalone/erasure-code/test-erasure-eio.sh fails since pg is not in reco...
The original issue was caused by a commit in a wip branch being tested, so it's highly improbable that this is a recurrence.... Radoslaw Zarzynski
06:42 PM Bug #57989 (New): test-erasure-eio.sh fails since pg is not in unfound
/a/lflores-2022-10-17_18:19:55-rados:standalone-main-distro-default-smithi/7071287... Radoslaw Zarzynski
06:35 PM Bug #57845: MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_OCTOPUS...
Likely it's even a duplicate of https://tracker.ceph.com/issues/52657. Radoslaw Zarzynski
06:28 PM Bug #52136 (Fix Under Review): Valgrind reports memory "Leak_DefinitelyLost" errors.
Neha Ojha
06:26 PM Bug #57940 (Duplicate): ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when...
Looks like a duplicate of 56772. Radoslaw Zarzynski
06:24 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
Nitzan Mordechai wrote:
> Radoslaw Zarzynski wrote:
> > Well, just found a new occurrence.
> Where can I find it?
...
Radoslaw Zarzynski
06:12 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Brad and I ran a reproducer on the gibba cluster (restarting OSDs with `for osd in $(systemctl -l |grep osd|gawk '{pr... Laura Flores
06:01 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Is there any news on that? Radoslaw Zarzynski
05:59 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
Updated the PR link. Radoslaw Zarzynski
01:08 AM Bug #57937: pg autoscaler of rgw pools doesn't work after creating otp pool
Are there any updates? Please let me know if I can do something. Satoru Takeuchi

11/06/2022

05:47 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@brad, maybe it's a good candidate for another blog post about upstream core dump analysis, which you talked about (Ubuntu 20.04). Nitzan Mordechai
 
