Activity
From 02/06/2020 to 03/06/2020
03/06/2020
- 11:39 PM Bug #43865: osd-scrub-test.sh fails date check
- reproducing this here: http://pulpito.ceph.com/sage-2020-03-06_22:05:09-rados:standalone-wip-sage4-testing-2020-03-05...
- 10:52 PM Bug #43862 (Can't reproduce): mkfs fsck found fatal error: (2) No such file or directory during c...
- 06:15 PM Feature #43377: Make Zstandard compression level a configurable option
- *PR*: https://github.com/ceph/ceph/pull/33790
- 05:39 PM Bug #44362: osd: uninitialized memory in sendmsg
- Merged https://github.com/ceph/ceph/pull/33757 ... should we keep this open or close it?
- 03:01 PM Bug #43882 (Need More Info): osd to mon connection lost, osd stuck down
- I thought I reproduced this, but it was a bug in another PR I was testing.
- 02:40 PM Bug #43882 (In Progress): osd to mon connection lost, osd stuck down
- 12:28 PM Bug #43150 (Resolved): osd-scrub-snaps.sh fails
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 12:25 PM Backport #44070 (Resolved): luminous: Add builtin functionality in ceph-kvstore-tool to repair co...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33195
m...
- 12:05 PM Backport #43852 (Resolved): nautilus: osd-scrub-snaps.sh fails
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33274
m...
- 10:37 AM Backport #44490 (Resolved): nautilus: lz4 compressor corrupts data when buffers are unaligned
- https://github.com/ceph/ceph/pull/35004
- 10:36 AM Backport #44489 (Rejected): mimic: lz4 compressor corrupts data when buffers are unaligned
- https://github.com/ceph/ceph/pull/35054
- 10:33 AM Backport #44486 (Resolved): nautilus: Nautilus: Random mon crashes in failed assertion at ceph::t...
- https://github.com/ceph/ceph/pull/34542
- 10:30 AM Backport #44468 (Resolved): nautilus: mon: Get session_map_lock before remove_session
- https://github.com/ceph/ceph/pull/34677
- 10:30 AM Backport #44467 (Rejected): mimic: mon: Get session_map_lock before remove_session
- 10:30 AM Backport #44464 (Resolved): nautilus: mon: fix/improve mon sync over small keys
- https://github.com/ceph/ceph/pull/33765
- 07:03 AM Bug #44454 (Resolved): expected valgrind issues and found none
- 03:31 AM Bug #44454 (In Progress): expected valgrind issues and found none
- running with suite-repo pointing to the commit just *before* the py3 task merge faf701d33aeb6e1657c969a41223b37a6972b...
- 04:34 AM Bug #44439 (Fix Under Review): osd/osd-scrub-repair.sh fails: scrub/osd-scrub-repair.sh:698: TEST...
- 02:47 AM Bug #44439: osd/osd-scrub-repair.sh fails: scrub/osd-scrub-repair.sh:698: TEST_repair_stats_ec: ...
- This did reproduce after multiple runs. I added a flush_pg_stats and ran it many times without seeing the failure.
- 03:37 AM Bug #44373 (Fix Under Review): objecter: invalid read
- Fix at https://github.com/ceph/ceph/pull/33771
03/05/2020
- 10:13 PM Bug #44454 (Resolved): expected valgrind issues and found none
- http://pulpito.ceph.com/sage-2020-03-05_19:46:30-rados:valgrind-leaks-wip-sage4-testing-2020-03-05-0754-distro-basic-...
- 09:06 PM Bug #44439: osd/osd-scrub-repair.sh fails: scrub/osd-scrub-repair.sh:698: TEST_repair_stats_ec: ...
- hmm, does not reproduce locally for me.
- 01:20 PM Bug #44439 (Resolved): osd/osd-scrub-repair.sh fails: scrub/osd-scrub-repair.sh:698: TEST_repair_...
- ...
- 08:19 PM Bug #44453: mon: fix/improve mon sync over small keys
- Nautilus backport: https://github.com/ceph/ceph/pull/33765
- 07:18 PM Bug #44453 (Resolved): mon: fix/improve mon sync over small keys
- Background: [ceph-users] Can't add a ceph-mon to existing large cluster
- 07:32 PM Bug #42830: problem returning mon to cluster
- Workaround in our case is: `ceph config set mon mon_sync_max_payload_size 4096`
We have 5 mons again!
- 02:35 PM Bug #42830: problem returning mon to cluster
- I also posted this on the mailing list, but let me post it here as well:...
- 04:45 PM Backport #44070: luminous: Add builtin functionality in ceph-kvstore-tool to repair corrupted key...
> https://github.com/ceph/ceph/pull/33195
merged
- 02:02 PM Bug #44385 (Resolved): ClsHello.WriteReturnData failure
- 01:19 PM Bug #41923 (Can't reproduce): 3 different ceph-osd asserts caused by enabling auto-scaler
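Bug #44453 and the workaround on Bug #42830 above both concern how monitor sync splits the store into payloads. A minimal sketch, assuming a chunker that caps each payload by total bytes and, optionally, by key count; `chunk_keys` and `max_keys` are hypothetical names for illustration, not Ceph's monitor code. It shows why a byte cap alone lets a store of many small keys collapse into a few payloads holding huge numbers of keys, and how lowering the byte cap (as with `mon_sync_max_payload_size 4096`) changes the chunk count:

```python
from typing import Iterable, List, Optional, Tuple

Item = Tuple[str, bytes]


def chunk_keys(items: Iterable[Item], max_bytes: int,
               max_keys: Optional[int] = None) -> List[List[Item]]:
    """Group (key, value) pairs into payloads (hypothetical helper).

    Size is approximated as len(key) + len(value). A payload is closed
    when adding the next item would exceed max_bytes, or when it already
    holds max_keys items (if a key cap is given).
    """
    chunks: List[List[Item]] = []
    current: List[Item] = []
    size = 0
    for key, value in items:
        item_size = len(key) + len(value)
        over_bytes = bool(current) and size + item_size > max_bytes
        over_keys = (max_keys is not None and bool(current)
                     and len(current) >= max_keys)
        if over_bytes or over_keys:
            chunks.append(current)
            current, size = [], 0
        current.append((key, value))
        size += item_size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # 10,000 small keys of 20 bytes each (5-char key + 15-byte value).
    items = [(f"k{i:04d}", b"x" * 15) for i in range(10_000)]
    print(len(chunk_keys(items, 1 << 20)))                 # 1
    print(len(chunk_keys(items, 1 << 20, max_keys=2000)))  # 5
    print(len(chunk_keys(items, 4096)))                    # 50
```

Bounding by key count as well as bytes keeps per-payload work predictable regardless of key size; whether the actual fix takes this approach is not stated in this log.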
03/04/2020
- 10:25 PM Bug #44311 (New): crash in Objecter and CRUSH map lookup
- 01:42 PM Bug #44311: crash in Objecter and CRUSH map lookup
- Scratch that. If you replace qa/workunits/rbd/read-flags.sh with this script https://gist.github.com/MahatiC/a4bf4310...
- 01:35 PM Bug #44311: crash in Objecter and CRUSH map lookup
- Neha Ojha wrote:
> Is this something that started appearing recently? Do you have a commit or version that works for...
- 10:11 PM Bug #44400: Marking OSD out causes primary-affinity 0 to be ignored when up_set has no common OSD...
- This is worth investigating, currently nothing in the choose_acting() function looks at primary-affinity.
- 10:07 PM Bug #44348 (Resolved): thrasher can trigger osd shutdown
- 09:40 PM Bug #44427 (New): osd: stuck during shutdown
- ...
- 08:55 PM Bug #44362: osd: uninitialized memory in sendmsg
- The hole represented by @filler@ is supposed to carry two things:
* zero-byte long ciphertext's fragment acquired fr...
- 07:58 PM Bug #37656 (Triaged): FileStore::_do_transaction() crashed with error 17 (merge collection vs osd...
- 07:56 PM Bug #37656: FileStore::_do_transaction() crashed with error 17 (merge collection vs osd restart)
- the merge happens right before we shut down:...
- 07:40 PM Bug #37656: FileStore::_do_transaction() crashed with error 17 (merge collection vs osd restart)
- ...
- 02:30 PM Bug #44420 (Fix Under Review): cephadm cluster: "ceph ping mon.*" works fine, but "ceph ping mon....
- $SUBJ says it all, almost - The error is:...
- 12:52 PM Bug #43365 (Pending Backport): Nautilus: Random mon crashes in failed assertion at ceph::time_det...
- 05:44 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- ...
- 12:43 PM Bug #44407 (Pending Backport): mon: Get session_map_lock before remove_session
- 10:33 AM Bug #44407 (Fix Under Review): mon: Get session_map_lock before remove_session
- 06:08 AM Bug #44407 (Resolved): mon: Get session_map_lock before remove_session
- We should protect session_map with session_map_lock.
- 10:59 AM Backport #44413 (In Progress): nautilus: FTBFS on s390x in openSUSE Build Service due to presence...
- 10:58 AM Backport #44413 (Resolved): nautilus: FTBFS on s390x in openSUSE Build Service due to presence of...
- https://github.com/ceph/ceph/pull/33716
- 04:45 AM Bug #39525 (Pending Backport): lz4 compressor corrupts data when buffers are unaligned
- 12:16 AM Bug #44385 (Fix Under Review): ClsHello.WriteReturnData failure
- reproduced locally by making the test loop and setting ms_inject_socket_failures=500 on the osd. confirmed this fixe...
- 12:00 AM Bug #44385 (In Progress): ClsHello.WriteReturnData failure
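Bug #44407 states that `session_map` should be protected with `session_map_lock` before `remove_session`. A minimal Python sketch of that locking pattern, assuming a simple dict-backed map; the `SessionMap` class and its methods are hypothetical stand-ins for the monitor's C++ types, not their actual interface:

```python
import threading
from typing import Dict


class SessionMap:
    """Hypothetical stand-in: every access to `sessions` holds `lock`."""

    def __init__(self) -> None:
        self.lock = threading.Lock()  # plays the role of session_map_lock
        self.sessions: Dict[int, str] = {}

    def add_session(self, sid: int, name: str) -> None:
        with self.lock:
            self.sessions[sid] = name

    def remove_session(self, sid: int) -> bool:
        # Take the lock before mutating, so a concurrent reader never
        # observes the map mid-update. Returns True if sid was present.
        with self.lock:
            return self.sessions.pop(sid, None) is not None

    def count(self) -> int:
        with self.lock:
            return len(self.sessions)
```

The point is only that removal takes the same lock as every other access; an unlocked `remove_session` racing with iteration elsewhere is the kind of hazard the bug title describes.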
03/03/2020
- 10:13 PM Bug #44362 (In Progress): osd: uninitialized memory in sendmsg
- 03:15 AM Bug #44362: osd: uninitialized memory in sendmsg
- It seems to me that the specific commit just exposed an existing issue that for some reason didn't show up before (lik...
- 07:23 PM Bug #44400 (Won't Fix): Marking OSD out causes primary-affinity 0 to be ignored when up_set has n...
- Process:
Set primary-affinity 0 on osd.0
Watch 'ceph osd ls-by-primary osd.0' until it has 0 PGs listed.
Mark os...
- 07:12 PM Bug #43150: osd-scrub-snaps.sh fails
- https://github.com/ceph/ceph/pull/33274 merged
- 04:09 PM Bug #43365 (Fix Under Review): Nautilus: Random mon crashes in failed assertion at ceph::time_det...
- 12:58 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- Hi,
same behaviour for us: one of the 3 mons crashes randomly, nearly once per day.
We are using Ceph 14.2.6 PVE ...
- 04:09 PM Bug #44311: crash in Objecter and CRUSH map lookup
- Is this something that started appearing recently? Do you have a commit or version that works for this same command? ...
- 02:48 PM Bug #44184: Slow / Hanging Ops after pool creation
- We've got a similar case, with plenty of slow op indications; many of them are osd_op_create ones.
Which eventually g...
- 12:34 AM Bug #44388 (New): osd: valgrind: Invalid read of size 8
- ...
03/02/2020
- 10:29 PM Bug #44362: osd: uninitialized memory in sendmsg
- the takeaway from http://pulpito.ceph.com/sage-2020-03-02_17:19:00-rados:verify-master-distro-basic-smithi/ is that t...
- 09:08 PM Bug #44362: osd: uninitialized memory in sendmsg
- The regression is between these commits: d27f512d1731988cf7f369559f2fc324f1592047..7b0e18c09eb6060ee23f00c06dac4203a2...
- 08:39 PM Bug #44385 (Resolved): ClsHello.WriteReturnData failure
- ...
- 06:57 PM Bug #44311: crash in Objecter and CRUSH map lookup
- To give more context, this issue is blocking progress on rbd op threads config change -> https://github.com/ceph/ceph...
- 06:04 PM Bug #44358 (Resolved): messenger addr nonces aren't unique with cephadm
- 02:06 PM Bug #44373 (Resolved): objecter: invalid read
- ...
- 12:36 PM Backport #44370 (Resolved): nautilus: msg/async: the event center is blocked by rdma construct co...
- https://github.com/ceph/ceph/pull/34780
- 12:36 PM Backport #44369 (Rejected): mimic: msg/async: the event center is blocked by rdma construct conec...
- 12:36 PM Backport #44368 (Rejected): mimic: Rados should use the '-o outfile' convention
03/01/2020
- 11:00 PM Bug #44362 (Can't reproduce): osd: uninitialized memory in sendmsg
- ...
- 10:55 PM Bug #44358 (Fix Under Review): messenger addr nonces aren't unique with cephadm
- 07:47 AM Bug #42452 (Pending Backport): msg/async: the event center is blocked by rdma construct conection...
- 04:18 AM Backport #44360 (In Progress): nautilus: Rados should use the '-o outfile' convention
- 04:18 AM Backport #44360: nautilus: Rados should use the '-o outfile' convention
- https://github.com/ceph/ceph/pull/33641
- 04:17 AM Backport #44360 (Resolved): nautilus: Rados should use the '-o outfile' convention
- https://github.com/ceph/ceph/pull/33641
- 04:08 AM Bug #42477 (Pending Backport): Rados should use the '-o outfile' convention
- we have to backport this change, otherwise we have ...
02/29/2020
- 06:24 AM Bug #43185: ceph -s not showing client activity
- ...
- 06:19 AM Bug #43185: ceph -s not showing client activity
- ...
- 12:18 AM Bug #44314: osd-backfill-stats.sh failing intermittently in TEST_backfill_sizeup_out() (degraded ...
- The kick_recovery_wq didn't get backfill restarted on the failed run. Or a recovery attempt (periodic?) was someho...
02/28/2020
- 11:01 PM Bug #44314: osd-backfill-stats.sh failing intermittently in TEST_backfill_sizeup_out() (degraded ...
- The unset of nobackfill happened after an attempt to start backfill was initiated and it deferred due to the fl...
- 09:37 PM Bug #43126 (Resolved): OSD_SLOW_PING_TIME_BACK nits
- 08:30 PM Bug #44358 (Resolved): messenger addr nonces aren't unique with cephadm
- we use the pid for the nonce all over the place, but with cephadm the pid of daemons is always 1.
- 03:45 PM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
- For future reference.
If I understand right: it seems that this happens during recovery when pg gets trim command a...
- 12:56 PM Bug #44352 (New): pool listings are slow after deleting objects
- I'm seeing a weird problem on a system where the following was done:
* multi-site setup with two zones
* primary ...
- 12:14 PM Bug #41016 (Resolved): Improve upmap change reporting in logs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 12:14 PM Bug #41317 (Resolved): PeeringState::GoClean will call purge_strays unconditionally
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 12:12 PM Bug #42387 (Resolved): ceph_test_admin_socket_output fails in rados qa suite
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 12:12 PM Bug #42501 (Resolved): format error: ceph osd stat --format=json
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 12:11 PM Backport #43992 (Need More Info): nautilus: objecter doesn't send osd_op
- first attempted backport - https://github.com/ceph/ceph/pull/33144 - was closed
marking non-trivial
- 12:11 PM Bug #43308 (Resolved): negative num_objects can set PG_STATE_DEGRADED
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 12:11 PM Backport #43991 (Need More Info): mimic: objecter doesn't send osd_op
- first attempted backport - https://github.com/ceph/ceph/pull/33143 - was closed
marking non-trivial
- 09:17 AM Bug #44297 (Resolved): mon/Monitor.cc: 3924: FAILED ceph_assert(!"send_message on anonymous conne...
- 09:07 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- /a/sage-2020-02-27_05:12:04-rados-wip-sage2-testing-2020-02-26-1925-distro-basic-smithi/4806157
- 09:06 AM Bug #44348 (Fix Under Review): thrasher can trigger osd shutdown
- 08:56 AM Bug #44348 (Resolved): thrasher can trigger osd shutdown
- ...
- 03:42 AM Bug #44296 (Resolved): qa/standalone/mgr/balancer.sh fails due to test error and not waiting for ...
- 02:39 AM Bug #44296 (Fix Under Review): qa/standalone/mgr/balancer.sh fails due to test error and not wait...
- 02:32 AM Bug #44022 (Fix Under Review): mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes...
- 02:10 AM Bug #44022 (In Progress): mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an o...
- 02:00 AM Bug #44022: mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash
- Ah, this is why: 168e20ab8b8da3a5aed41b73f9627d10971be67b...
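Bug #44358 explains that messenger address nonces come from the pid, and under cephadm every daemon's pid is 1, so nonces collide. A minimal sketch of one way around that, assuming a fallback to a random 32-bit value when the pid is not distinctive; `pick_nonce` is a hypothetical helper, and this is not necessarily what the fix under review does:

```python
import os
import secrets
from typing import Optional


def pick_nonce(pid: Optional[int] = None) -> int:
    """Return a 32-bit messenger nonce (hypothetical helper).

    Uses the pid when it is likely unique on the host. Inside a
    container the daemon typically runs as pid 1, so fall back to a
    random value in 1..0xFFFFFFFF there.
    """
    if pid is None:
        pid = os.getpid()
    if pid > 1:
        return pid & 0xFFFFFFFF
    return secrets.randbelow(0xFFFFFFFF) + 1
```

A random nonce can still collide, but with 32 bits the odds for a handful of daemons on one host are small, whereas a constant pid collides every time.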
02/27/2020
- 11:30 PM Bug #44022 (Fix Under Review): mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes...
- In any case, https://github.com/ceph/ceph/pull/33590 will just prevent it from crashing.
- 11:22 PM Bug #44022: mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash
- The part that I don't understand is when osd.6 responded, the epoch_sent/epoch_requested(4115/4100) seem correct
<...
- 03:45 AM Bug #44022: mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash
- On osd.8(mimic)
This is when we request the log from osd.6...
- 02:29 AM Bug #44022: mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash
- ...
- 09:50 PM Support #22749 (Closed): dmClock OP classification
- 09:11 PM Bug #42328 (Resolved): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- 06:22 PM Backport #43472 (Resolved): mimic: negative num_objects can set PG_STATE_DEGRADED
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33331
m...
- 06:21 PM Backport #43320 (Resolved): mimic: PeeringState::GoClean will call purge_strays unconditionally
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33329
m...
- 06:20 PM Backport #42998 (Resolved): mimic: acting_recovery_backfill won't catch all up peers
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33324
m...
- 06:20 PM Backport #42852 (Resolved): mimic: format error: ceph osd stat --format=json
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33322
m...
- 06:18 PM Backport #43881 (Resolved): mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33154
m...
- 06:18 PM Backport #43987 (Resolved): mimic: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33145
m...
- 06:17 PM Backport #43652 (Resolved): mimic: Improve upmap change reporting in logs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32717
m...
- 06:17 PM Backport #40890 (Resolved): mimic: Pool settings aren't populated to OSD after restart.
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32125
m...
- 06:16 PM Backport #42879 (Resolved): mimic: ceph_test_admin_socket_output fails in rados qa suite
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33323
m...
- 06:16 PM Backport #43630 (Resolved): mimic: segv in collect_sys_info
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32902
m... - 04:16 PM Bug #39525 (Fix Under Review): lz4 compressor corrupts data when buffers are unaligned
- Thanks to Dan we have a reproducer! I cleaned it up a bit, rebased on master, added a workaround to the LZ4 plugin, ...
- 03:44 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- No, not positive. Very early on we did play around with compression also for the metadata, but in the end decided lat...
- 03:26 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- Erik Lindahl wrote:
> Hi,
>
> Oops; Sorry Dan, but I just realised I misled you. While we do have aggressive LZ4 ...
- 10:57 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- Hi,
Oops; Sorry Dan, but I just realised I misled you. While we do have aggressive LZ4 enabled by *default* (in pa...
- 10:34 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- I got confirmation from Troy and Erik -- both are using lz4 compression like us.
I'm trying to reproduce using uni...
- 12:55 PM Backport #44324 (Resolved): nautilus: Receiving RemoteBackfillReserved in WaitLocalBackfillReserv...
- https://github.com/ceph/ceph/pull/34512
02/26/2020
- 11:57 PM Backport #43472: mimic: negative num_objects can set PG_STATE_DEGRADED
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33331
merged
- 11:56 PM Backport #43320: mimic: PeeringState::GoClean will call purge_strays unconditionally
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33329
merged
- 11:55 PM Backport #42998: mimic: acting_recovery_backfill won't catch all up peers
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33324
merged
- 11:55 PM Backport #42852: mimic: format error: ceph osd stat --format=json
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33322
merged
- 11:54 PM Backport #43881: mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33154
merged
- 11:52 PM Backport #43881: mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33154
merged
- 11:51 PM Backport #43987: mimic: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33145
merged
- 11:51 PM Backport #43652: mimic: Improve upmap change reporting in logs
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/32717
merged
- 11:49 PM Backport #40890: mimic: Pool settings aren't populated to OSD after restart.
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32125
merged
- 11:47 PM Backport #42879: mimic: ceph_test_admin_socket_output fails in rados qa suite
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33323
merged
- 11:44 PM Backport #43630: mimic: segv in collect_sys_info
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32902
merged
- 11:08 PM Bug #44311: crash in Objecter and CRUSH map lookup
- Mahati Chamarthy wrote:
> Neha Ojha wrote:
> > Which version is this on?
>
Current master. To reproduce set rbd_...
- 10:13 PM Bug #44311: crash in Objecter and CRUSH map lookup
- Neha Ojha wrote:
> Which version is this on?
Current master
- 10:09 PM Bug #44311 (Need More Info): crash in Objecter and CRUSH map lookup
- Which version is this on?
- 05:45 PM Bug #44311 (Resolved): crash in Objecter and CRUSH map lookup
- When concurrent reads are issued with the below rbd command, it results in failure due to crash in Objecter and CRUSH...
- 09:54 PM Bug #44296 (In Progress): qa/standalone/mgr/balancer.sh fails due to test error and not waiting f...
- 09:50 PM Feature #44107: mon: produce stable election results when netsplits and other errors happen
- Marking anything we need for octopus as "Urgent".
- 08:51 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- ...
- 08:48 PM Bug #44314 (Resolved): osd-backfill-stats.sh failing intermittently in TEST_backfill_sizeup_out()...
- ...
- 07:47 PM Bug #43914 (Fix Under Review): nautilus: ceph tell command times out
- 06:48 PM Bug #43914: nautilus: ceph tell command times out
- okay yeah, it's because the command wq uses osd_lock...
- 06:41 PM Bug #43914: nautilus: ceph tell command times out
- so, this was fixed in nautilus, in the sense that https://github.com/ceph/ceph/pull/27696 went into nautilus.
- 06:37 PM Bug #43914: nautilus: ceph tell command times out
- The thread (or lock?) is busy with...
- 05:13 PM Bug #43914: nautilus: ceph tell command times out
- This run has more relevant information: /a/nojha-2020-02-26_03:20:34-upgrade:mimic-x:stress-split-nautilus-distro-bas...
- 07:33 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- follow-up fix: https://github.com/ceph/ceph/pull/33559 (typo in original commit)
- 05:25 PM Bug #41183: pg autoscale on EC pools
- Looks like a fix is going in: https://github.com/ceph/ceph/pull/33170
- 04:54 PM Bug #41183: pg autoscale on EC pools
- We seem to have the same issue here.
158 OSDs with 1 main pool, an EC 5+2 pool with a 2048 pg_num, but the autoscaler...
- 02:28 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- We're seeing this a couple of times a day on Debian 10.1, using croit's repo:
kernel 4.19.67-2+deb10u1
ceph version 14...
- 10:54 AM Cleanup #44309 (New): auth: remove deprecated 'auid' field from pool metadata
- As per https://github.com/ceph/ceph/pull/23540#issuecomment-413589557, 'auid' field was deprecated but never removed ...
- 12:09 AM Bug #44297 (Fix Under Review): mon/Monitor.cc: 3924: FAILED ceph_assert(!"send_message on anonymo...
- 12:02 AM Bug #44297: mon/Monitor.cc: 3924: FAILED ceph_assert(!"send_message on anonymous connection")
- The command is passed from a nautilus monitor:...
02/25/2020
- 11:51 PM Bug #44275 (Resolved): NameError: name 'retval' is not defined
- 11:50 PM Bug #44248 (Pending Backport): Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can ...
- 03:16 AM Bug #44248 (Fix Under Review): Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can ...
- 02:37 AM Bug #44248: Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can cause the osd to crash
- The problem is that though osd.1 sent a RELEASE to osd.8, we still ended up de-queueing "4184 RemoteBackfillReserved"...
- 12:49 AM Bug #44248: Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can cause the osd to crash
- This is when 4184 RemoteBackfillReserved was enqueued...
- 11:43 PM Bug #44297 (Resolved): mon/Monitor.cc: 3924: FAILED ceph_assert(!"send_message on anonymous conne...
- on nautilus->octopus/master upgrade...
- 11:39 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- ...
- 11:32 PM Bug #44296 (Resolved): qa/standalone/mgr/balancer.sh fails due to test error and not waiting for ...
- http://pulpito.ceph.com/dzafman-2020-02-08_20:24:49-rados-wip-zafman-testing-distro-basic-smithi/4746333
With 2 ...
- 09:36 PM Bug #43914: nautilus: ceph tell command times out
- First observation from teuthology.log for /a/nojha-2020-02-21_20:34:10-upgrade:mimic-x:stress-split-nautilus-distro-b...
- 06:34 PM Bug #38219: rebuild-mondb hangs
- Seen in nautilus: /a/yuriw-2020-02-15_16:49:25-rados-nautilus-distro-basic-smithi/4767419/
- 04:40 PM Backport #43650: nautilus: Improve upmap change reporting in logs
- 250a778fe8bd6eadf16fa1988403e0410c528543 will be in v14.2.8
- 03:46 PM Backport #44289 (Resolved): nautilus: mon: update + monmap update triggers spawn loop
- https://github.com/ceph/ceph/pull/34500
- 02:28 PM Backport #44206 (In Progress): nautilus: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempM...
- 02:23 PM Bug #44286 (New): Cache tiering shows unfound objects after OSD reboots
- We've got a cluster with a 3/2 size/min_size replicated cache pool in front of an erasure coded pool used for RBD.
...
- 01:11 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- Might be a stretch, but I just noticed that our bits are flipped nearby the 128k boundary, which is ?coincidentally? ...
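Several comments on Bug #39525 compare good and corrupted osdmap copies, and the one above notes that the flipped bits sit near a 128k boundary. A hypothetical diagnostic sketch (not part of Ceph) that lists each differing byte offset together with its signed distance to the nearest 128 KiB boundary, assuming both copies are available as byte strings of equal length and that "128k" means 128 KiB:

```python
from typing import List, Tuple

BOUNDARY = 128 * 1024  # assumed reading of the "128k boundary" in the comment


def diff_offsets(good: bytes, bad: bytes) -> List[Tuple[int, int]]:
    """Return (offset, signed distance to nearest 128 KiB boundary) per differing byte."""
    if len(good) != len(bad):
        raise ValueError("copies differ in length")
    result: List[Tuple[int, int]] = []
    for off, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            # Round to the nearest multiple of BOUNDARY using integer math.
            nearest = (off + BOUNDARY // 2) // BOUNDARY * BOUNDARY
            result.append((off, off - nearest))
    return result


if __name__ == "__main__":
    good = bytes(300 * 1024)
    bad = bytearray(good)
    bad[BOUNDARY - 2] ^= 0x80  # flip one bit just before the boundary
    bad[BOUNDARY + 3] ^= 0x01  # and one just after it
    print(diff_offsets(good, bytes(bad)))  # [(131070, -2), (131075, 3)]
```

Small distances clustered around zero would support the boundary observation; a spread of offsets would argue against it.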
02/24/2020
- 10:38 PM Bug #24835 (Can't reproduce): osd daemon spontaneous segfault
- 07:42 PM Bug #44076 (Pending Backport): mon: update + monmap update triggers spawn loop
- 07:36 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- https://github.com/ceph/ceph/pull/33470 - fixing the order of msgr2 vs nautilus install is the first step here.
- 05:48 PM Bug #44248: Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can cause the osd to crash
- ...
- 04:25 PM Bug #44275 (Fix Under Review): NameError: name 'retval' is not defined
- 04:17 PM Bug #44275 (Resolved): NameError: name 'retval' is not defined
- ...
- 03:50 PM Bug #42830: problem returning mon to cluster
- I noticed there is very little osdmap caching in the leader mon -- here we see only 1 single osdmap in the mempool.
...
- 05:45 AM Backport #44259 (In Progress): nautilus: Slow Requests/OP's types not getting logged
- 05:03 AM Backport #44259 (Resolved): nautilus: Slow Requests/OP's types not getting logged
- https://github.com/ceph/ceph/pull/33503
- 05:24 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- More ftr: the corruption occurs in the crush part of the osdmap:...
- 05:16 AM Bug #43975 (Pending Backport): Slow Requests/OP's types not getting logged
02/23/2020
- 10:08 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- Likely related....
- 09:29 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- Adding crash signature (cf2864eb1281dffc3340730dc2caae163b4c0170132bcbd3dcbd6147d8f29fa8) for the crash described in ...
- 09:05 PM Bug #43861: ceph_test_rados_watch_notify hang
- ...
- 02:29 PM Bug #41313: PG distribution completely messed up since Nautilus
- ...
- 12:13 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- A bit more about our incident ftr.
The cluster has 1301 osds in total: 752 filestore and 549 bluestore. The filest...
02/22/2020
- 04:47 PM Bug #44248 (Resolved): Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can cause th...
- ...
- 01:25 PM Backport #44206: nautilus: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- Started a backport here https://github.com/ceph/ceph/pull/33483
- 09:49 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- > o->decode(obl); <------ HERE
I have gdb working now on a coredump so can confirm that:...
- 01:00 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- ^^ Is a weird red herring. The FFFFFFFF is because the osdmap contains the crc32c in the last 4 bytes, so that cancel...
- 01:23 AM Bug #43914: nautilus: ceph tell command times out
- This is on nautilus: /a/nojha-2020-02-21_20:34:10-upgrade:mimic-x:stress-split-nautilus-distro-basic-smithi/4788575/
...
- 01:14 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- /a/sage-2020-02-21_21:08:33-rados-wip-sage3-testing-2020-02-21-1218-distro-basic-smithi/4788714...
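The 01:00 AM comment on Bug #39525 explains the FFFFFFFF values: the osdmap carries its crc32c in its last 4 bytes, so a checksum over the whole buffer cancels out. A minimal sketch of that residue property, assuming the stored value is the raw CRC-32C (Castagnoli) register, seeded with 0xFFFFFFFF and without a final XOR, written little-endian; this sketch does not verify that assumption against OSDMap's actual encoding:

```python
CASTAGNOLI_REFLECTED = 0x82F63B78  # reflected CRC-32C polynomial


def crc32c_raw(data: bytes, crc: int = 0xFFFFFFFF) -> int:
    """Bitwise CRC-32C register update, with no final XOR."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CASTAGNOLI_REFLECTED if crc & 1 else crc >> 1
    return crc


def crc32c_standard(data: bytes) -> int:
    """Conventional CRC-32C: seed 0xFFFFFFFF, final XOR 0xFFFFFFFF."""
    return crc32c_raw(data) ^ 0xFFFFFFFF


if __name__ == "__main__":
    payload = b"osdmap payload"  # arbitrary stand-in bytes
    stored = crc32c_raw(payload)
    blob = payload + stored.to_bytes(4, "little")  # CRC appended at the end
    print(hex(crc32c_raw(blob)))       # 0x0: the appended CRC cancels the register
    print(hex(crc32c_standard(blob)))  # 0xffffffff for any intact blob
```

Under this assumption every intact map yields FFFFFFFF from a conventional CRC-32C tool, so that value only says a map is self-consistent; it cannot tell two good maps apart.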
02/21/2020
- 10:48 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- Found something. The crc32c for all my *good* maps is FFFFFFFF (and I assure you they are different maps.. gsutil out...
- 10:16 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- Just to provide the same update I gave to Dan van der Ster over email:
IIRC, we saw this 1-2 times more after the ...
- 09:42 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- This is continuing to happen for us. Log file here.
ceph-post-file: 589aa7aa-7a80-49a2-ba55-376e467c4550
- 10:19 PM Bug #42830: problem returning mon to cluster
- Seeing the same here in 13.2.8 starting a new empty mon. Leader's CPU goes to 100%, until an election is called then ...
- 09:03 PM Bug #44243 (Can't reproduce): memstore make check test fails
- ...
- 01:29 PM Bug #42328 (New): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- It looks like this is still occurring even with a branch that included 8182f52149: http://qa-proxy.ceph.com/teutholo...
- 01:21 PM Bug #42347: nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_flight_list.back(...
- Bastian Mäuser wrote:
> This is still an issue on 14.2.6 (at least the one shipped with proxmox)
It will appear i...
- 12:49 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
- FTR this looks identical to https://tracker.ceph.com/issues/39525#note-6
- 12:25 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- So the timeout, as previously mentioned, was 10 seconds although osd_default_notify_timeout is 30 seconds by default....
02/20/2020
- 07:02 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- ok, the first crash isn't because we just got bad data. It's because we just read bad data off of disk. see:...
- 04:09 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- Notes from CERN incident:
- identical corruption, different OSDmaps on different OSDs:...
- 05:40 PM Bug #44229 (New): monclient: _check_auth_rotating possible clock skew, rotating keys expired way ...
- seems to affect cephadm bootstrap tests
first, the error message doesn't make sense, since the bound 2020-02-20T16...
- 12:20 PM Bug #44184: Slow / Hanging Ops after pool creation
- Neha Ojha wrote:
> Hi Wido,
>
> I did come across something like this while investigating https://tracker.ceph.co...
- 12:42 AM Bug #44217 (Can't reproduce): Leaked connection (alloc from AsyncMessenger::add_accept)
- ...
02/19/2020
- 11:42 PM Bug #44076 (Fix Under Review): mon: update + monmap update triggers spawn loop
- 10:45 PM Bug #44157 (Resolved): cli throws bad exceptoin on control-c
- 10:11 PM Bug #44120 (Need More Info): NVMEDevice failed in certain NVMe Disk
- Can you attach logs from the crash? Which version are you using?
- 10:08 PM Bug #44184 (Need More Info): Slow / Hanging Ops after pool creation
- Hi Wido,
I did come across something like this while investigating https://tracker.ceph.com/issues/43048. It was a...
- 07:18 PM Bug #44184: Slow / Hanging Ops after pool creation
- On the Ceph users list there are multiple reports of people experiencing this:
- https://www.spinics.net/lists/cep...
- 04:55 PM Bug #37656 (New): FileStore::_do_transaction() crashed with error 17 (merge collection vs osd res...
- /a/teuthology-2020-02-11_02:30:03-upgrade:mimic-x-nautilus-distro-basic-smithi/4753470/
upgrade:mimic-x/stress-spl...
- 11:00 AM Bug #43151 (Resolved): ok-to-stop incorrect for some ec pgs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 11:00 AM Bug #43721 (Resolved): qa/standalone/misc/ok-to-stop.sh occasionally fails
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:59 AM Backport #44206 (Resolved): nautilus: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap:...
- https://github.com/ceph/ceph/pull/33530
02/18/2020
- 07:55 PM Bug #43903 (Pending Backport): osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- 07:52 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- ...
- 04:43 PM Bug #44184 (Need More Info): Slow / Hanging Ops after pool creation
- On a cluster with 1405 OSDs I've run into a situation for the second time now where a pool creation resulted in mas...
- 10:28 AM Backport #44085 (Resolved): nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33278
m...
- 10:28 AM Backport #44082 (Resolved): nautilus: expected MON_CLOCK_SKEW but got none
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33276
m...
- 10:27 AM Backport #43772 (Resolved): nautilus: qa/standalone/misc/ok-to-stop.sh occasionally fails
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32844
m...
- 10:27 AM Backport #43239 (Resolved): nautilus: ok-to-stop incorrect for some ec pgs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32844
m...
02/17/2020
- 11:45 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- /ceph/teuthology-archive/pdonnell-2020-02-15_16:51:06-fs-wip-pdonnell-testing-20200215.033325-distro-basic-smithi/476...
- 07:08 PM Backport #42662 (In Progress): nautilus:Issue a HEALTH_WARN when a Pool is configured with [min_]...
- 02:16 PM Bug #42347: nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_flight_list.back(...
- This is still an issue on 14.2.6 (at least the one shipped with Proxmox)
02/16/2020
02/15/2020
- 03:11 PM Bug #44157 (Fix Under Review): cli throws bad exception on control-c
- 02:37 PM Bug #44041 (Resolved): osd: MLease in stray state -> Crashed
- 02:37 PM Bug #42328 (Resolved): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- 02:36 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- /a/sage-2020-02-15_04:59:38-rados-wip-sage3-testing-2020-02-14-1951-distro-basic-smithi/4765960
- 02:56 AM Bug #43975: Slow Requests/OP's types not getting logged
- Before and after logs to show the extra information relating to slow op/types:
Before:
--------...
02/14/2020
- 09:58 PM Bug #43975 (Fix Under Review): Slow Requests/OP's types not getting logged
- 08:22 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- Neha Ojha wrote:
> pg3.4, which is stuck in "peering" shows similar behavior as https://tracker.ceph.com/issues/4304...
- 03:28 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- pg3.4, which is stuck in "peering" shows similar behavior as https://tracker.ceph.com/issues/43048#note-15
osd.10 ...
- 07:24 PM Bug #44156 (Fix Under Review): RenewLease sent to pre-octopus osds during upgrade
- 05:20 PM Bug #44156 (Resolved): RenewLease sent to pre-octopus osds during upgrade
- ...
- 05:35 PM Bug #44157 (Resolved): cli throws bad exception on control-c
- ...
- 05:23 PM Backport #44085: nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33278
merged
- 05:23 PM Backport #44082: nautilus: expected MON_CLOCK_SKEW but got none
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33276
merged
- 05:22 PM Backport #43772: nautilus: qa/standalone/misc/ok-to-stop.sh occasionally fails
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32844
merged
- 05:22 PM Backport #43239: nautilus: ok-to-stop incorrect for some ec pgs
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32844
merged
- 02:48 PM Backport #43996 (Need More Info): mimic: Ceph tools utilizing "global_[pre_]init" no longer proce...
- should be based on the nautilus backport
- 02:37 PM Backport #42662 (New): nautilus:Issue a HEALTH_WARN when a Pool is configured with [min_]size == 1
- Changed status to re-attempt the backport.
- 02:23 PM Backport #43621: luminous: pg: fastinfo incorrect when last_update moves backward in time
- nautilus backport is marked non-trivial, so this one is also non-trivial
- 02:23 PM Backport #43622 (Need More Info): mimic: pg: fastinfo incorrect when last_update moves backward i...
- nautilus backport is marked non-trivial, so this one is also non-trivial
- 02:21 PM Backport #43472 (In Progress): mimic: negative num_objects can set PG_STATE_DEGRADED
- 02:19 PM Backport #43470 (In Progress): mimic: asynchronous recovery + backfill might spin pg undersized f...
- 02:18 PM Backport #43320 (In Progress): mimic: PeeringState::GoClean will call purge_strays unconditionally
- 12:56 PM Backport #43257 (In Progress): mimic: monitor config store: Deleting logging config settings does...
- 12:50 PM Backport #42996 (In Progress): luminous: acting_recovery_backfill won't catch all up peers
- 12:40 PM Backport #42998 (In Progress): mimic: acting_recovery_backfill won't catch all up peers
- 12:28 PM Backport #42879 (In Progress): mimic: ceph_test_admin_socket_output fails in rados qa suite
- 12:26 PM Backport #42852 (In Progress): mimic: format error: ceph osd stat --format=json
- 09:29 AM Bug #43296 (Resolved): Ceph assimilate-conf results in config entries which can not be removed
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:29 AM Bug #43404 (Resolved): mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:28 AM Bug #43552 (Resolved): nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:28 AM Bug #43892 (Resolved): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during n->o upg...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:26 AM Backport #43879 (Resolved): nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33152
m...
- 09:26 AM Backport #43821 (Resolved): nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32908
m...
- 09:25 AM Backport #43916 (Resolved): nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33155
m...
- 09:25 AM Backport #43989 (Resolved): nautilus: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33147
m...
- 09:24 AM Backport #43928 (Resolved): nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33007
m...
- 09:23 AM Backport #43731 (Resolved): nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32905
m...
- 09:23 AM Backport #43822 (Resolved): nautilus: Ceph assimilate-conf results in config entries which can no...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32856
m...
- 06:16 AM Bug #44120: NVMEDevice failed in certain NVMe Disk
- I tested that my NVMe card can create at most 6 queue pairs.
- 02:48 AM Bug #43861: ceph_test_rados_watch_notify hang
- Almost certainly the same issue as #44062
02/13/2020
- 11:19 PM Bug #43124 (Resolved): Probably legal crush rules cause upmaps to be cleaned
- 11:05 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- Reproduced, and I see #43808 while doing so, so I'm going to treat them as related, for now at least.
I think we can ... - 02:53 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- I can't reproduce this so far. If anyone can reproduce it reliably maybe we could try increasing the notify timeout i...
- 02:22 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- Ah, that's right, from memory these Warnings are related to valgrind. Valgrind is also notorious for slowing things d...
- 02:02 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- When trying to reproduce I am seeing a *lot* of these which may, or may not, be related....
- 10:48 PM Feature #44131 (New): Add AAAA DNS record for drop.ceph.com
- drop.ceph.com is only reachable through IPv4 because it lacks an IPv6 DNS record (AAAA). For IPv6 only clusters th...
- 08:15 PM Backport #43879: nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33152
merged
- 08:12 PM Backport #43821: nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32908
merged
- 08:09 PM Backport #43916: nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during n->o...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33155
merged
- 08:08 PM Backport #43989: nautilus: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33147
merged
- 07:35 PM Backport #43928: nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33007
merged
- 07:30 PM Backport #43731: nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32905
merged
- 07:29 PM Backport #43822: nautilus: Ceph assimilate-conf results in config entries which can not be removed
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32856
merged
- 06:08 PM Feature #44025 (In Progress): Make it harder to set pool replica size to 1
- 03:43 PM Bug #43975: Slow Requests/OP's types not getting logged
- The logging to cluster logs was removed as part of re-factoring effort in mimic. Here are the commits of interest:
...
- 02:21 PM Backport #44085 (In Progress): nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump E...
- 02:19 PM Bug #44120 (Need More Info): NVMEDevice failed in certain NVMe Disk
- I got the following error:
nvme_ctrlr.c: 308:spdk_nvme_ctrlr_alloc_io_qpair: *ERROR*: No free I/O queue IDs
Th...
- 02:15 PM Backport #44082 (In Progress): nautilus: expected MON_CLOCK_SKEW but got none
- 02:09 PM Backport #44081 (In Progress): nautilus: ceph -s does not show >32bit pg states
- 02:06 PM Backport #43346: nautilus: short pg log + cache tier ceph_test_rados out of order reply
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32848
m...
- 02:06 PM Backport #43346 (Resolved): nautilus: short pg log + cache tier ceph_test_rados out of order reply
- 11:54 AM Backport #43346 (In Progress): nautilus: short pg log + cache tier ceph_test_rados out of order r...
- 02:03 PM Backport #43852 (In Progress): nautilus: osd-scrub-snaps.sh fails
- 11:31 AM Backport #43997 (In Progress): nautilus: Ceph tools utilizing "global_[pre_]init" no longer proce...
- 10:27 AM Bug #44089 (Fix Under Review): mon: --format=json does not work for config get or show
- 08:37 AM Bug #44072: Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_errors...
- grep for checking ASCII-only names:...
- 08:35 AM Bug #44072: Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_errors...
- Hi, David
> Do all the objects with missing copies have names that included multi-byte characters?
yes, most of...
- 12:29 AM Bug #44072: Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_errors...
- Two questions:
Do all the objects with missing copies have names that included multi-byte characters?
Are the...
- 07:09 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- I'd like to reopen this, since there are now reports about crashes on CentOS (see possible duplicate linked to this i...
- 01:43 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- In the failure that sage observed on master, I looked at pg4.7, which is stuck in creating+peering.
osd.10(mimic) ...
02/12/2020
- 11:58 PM Feature #44108 (In Progress): mon: osd: handle 2-(main-)site stretch clusters explicitly, so no a...
- People have hacked together stretch clusters on top of Ceph using 3 sites for years, or even using 2 sites and interv...
- 11:56 PM Feature #44107 (Fix Under Review): mon: produce stable election results when netsplits and other ...
- 11:53 PM Feature #44107 (Resolved): mon: produce stable election results when netsplits and other errors h...
- Right now, in netsplits and similar error conditions the monitors do not produce a stable quorum: whichever monitors ...
- 10:42 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- Sure, Neha
- 10:16 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- Brad, can you please take a look at this?
- 12:18 AM Bug #44062 (Triaged): LibRadosWatchNotify.WatchNotify failure
- /a/sage-2020-02-11_20:49:48-rados-wip-sage-testing-2020-02-11-1121-distro-basic-smithi/4755080
- 10:35 PM Bug #44004 (Can't reproduce): "ceph" command crashes
- 10:34 PM Bug #44015: Can't compile src/tools/rados/rados.cc on 32 bit systems
- Following is the explanation for why it was done....
- 04:33 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- Runs:
* http://pulpito.ceph.com/rzarzynski_bug43903,
* http://pulpito.ceph.com/rzarzynski_bug43903_more_pgnum_c...
- 03:58 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- Saw something similar but on CentOS 8: https://tracker.ceph.com/issues/44078.
Marking as related for now, possibly i...
- 07:18 AM Feature #44025: Make it harder to set pool replica size to 1
- Deepika Upadhyay wrote:
> I assume you are talking about:
>
> > To remove a pool the mon_allow_pool_delete flag ...
- 05:13 AM Feature #44025: Make it harder to set pool replica size to 1
- Greg Farnum wrote:
> Pool deletion also requires a config option to be set on the monitor before it's allowed throug...
- 03:44 AM Bug #44092 (Resolved): mon: config commands do not accept whitespace style config name
- e.g....
02/11/2020
- 10:31 PM Backport #44070 (In Progress): luminous: Add builtin functionality in ceph-kvstore-tool to repair...
- https://github.com/ceph/ceph/pull/33195
- 10:28 AM Backport #44070: luminous: Add builtin functionality in ceph-kvstore-tool to repair corrupted key...
- We need backporting of PR 16745 and subsequent PRs. Refer to the original tracker #17730 for adding support to repair leveld...
- 03:07 AM Backport #44070 (New): luminous: Add builtin functionality in ceph-kvstore-tool to repair corrupt...
- We seem to have it in ceph-kvstore-tool as the "destructive-repair" option? Does this option do a leveldb/rocksdb repair?...
- 03:01 AM Backport #44070 (Closed): luminous: Add builtin functionality in ceph-kvstore-tool to repair corr...
- 02:57 AM Backport #44070 (Resolved): luminous: Add builtin functionality in ceph-kvstore-tool to repair co...
- In some cases like ceph cluster upgrade or due to filesystem issue, the leveldb/rocksdb gets corrupted which can caus...
- 10:20 PM Bug #44089 (Fix Under Review): mon: --format=json does not work for config get or show
- In addition to the json output not working, when giving either these commands a specific key to fetch:...
- 09:58 PM Bug #38358 (Resolved): short pg log + cache tier ceph_test_rados out of order reply
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:55 PM Feature #41647 (Resolved): pg_autoscaler should show a warning if pg_num isn't a power of two
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:54 PM Bug #42346 (Resolved): Nearfull warnings are incorrect
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:54 PM Bug #42411 (Resolved): nautilus:osd: network numa affinity not supporting subnet port
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:54 PM Bug #42566 (Resolved): mgr commands fail when using non-client auth
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:53 PM Bug #42780 (Resolved): recursive lock of OpTracker::lock (70)
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:53 PM Bug #42961 (Resolved): osd: increase priority in certain OSD perf counters
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:53 PM Backport #44088 (Rejected): mimic: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- 09:53 PM Backport #44087 (Rejected): luminous: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- 09:51 PM Backport #44086 (Rejected): mimic: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- 09:51 PM Backport #44085 (Resolved): nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- https://github.com/ceph/ceph/pull/33278
- 09:51 PM Backport #44084 (Rejected): luminous: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- 09:51 PM Bug #43587 (Resolved): mon shutdown timeout (race with async compaction)
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:51 PM Bug #43592 (Resolved): osd-recovery-space.sh has a race
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:51 PM Backport #44083 (Resolved): mimic: expected MON_CLOCK_SKEW but got none
- https://github.com/ceph/ceph/pull/34370
- 09:50 PM Backport #44082 (Resolved): nautilus: expected MON_CLOCK_SKEW but got none
- https://github.com/ceph/ceph/pull/33276
- 09:50 PM Backport #44081 (Resolved): nautilus: ceph -s does not show >32bit pg states
- https://github.com/ceph/ceph/pull/33275
- 09:38 PM Backport #43256 (Resolved): nautilus: monitor config store: Deleting logging config settings does...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32846
m...
- 09:37 PM Backport #43631 (Resolved): nautilus: segv in collect_sys_info
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32901
m...
- 09:37 PM Backport #43473 (Resolved): nautilus: recursive lock of OpTracker::lock (70)
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32858
m...
- 09:36 PM Backport #43245 (Resolved): nautilus: osd: increase priority in certain OSD perf counters
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32845
m...
- 09:36 PM Backport #43726 (Resolved): nautilus: osd-recovery-space.sh has a race
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32774
m...
- 04:26 PM Bug #43795: Ceph tools utilizing "global_[pre_]init" no longer process "early" environment options
- Backport also requires -https://github.com/ceph/ceph/pull/33213- https://github.com/ceph/ceph/pull/33243
- 04:03 PM Bug #43795: Ceph tools utilizing "global_[pre_]init" no longer process "early" environment options
- There will be a second fix for this issue since CLI optionals are no longer overriding the environment.
- 04:03 PM Backport #43997: nautilus: Ceph tools utilizing "global_[pre_]init" no longer process "early" env...
- There will be a second fix for this issue since CLI optionals are no longer overriding the environment.
- 04:02 PM Backport #43996: mimic: Ceph tools utilizing "global_[pre_]init" no longer process "early" enviro...
- There will be a second fix for this issue since CLI optionals are no longer overriding the environment.
- 02:28 PM Bug #44067 (Resolved): cephtool/test.sh test fails to scrub all pools
- 02:27 PM Bug #44076 (Resolved): mon: update + monmap update triggers spawn loop
- - upgrade monitors from mimic to octopus
- quorum of 2/3 monitors
- enable msgr2
then
- third monitor probes...
- 09:57 AM Bug #44072 (New): Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_...
- Hi,
I set severity=Critical to grab attention because I think this is a serious problem!
We have two different Lu... - 06:18 AM Bug #43582 (Pending Backport): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- 02:57 AM Bug #44050 (Resolved): mon tell command args don't work
- 02:34 AM Bug #43885 (Can't reproduce): failed to reach quorum size 9 before timeout expired
- This hasn't shown up in master for a while and Sridhar has also not been able to reproduce this, hence reducing prior...
- 12:50 AM Bug #44053 (Resolved): test_envlibrados_for_rocksdb.sh fails on master
- 12:50 AM Bug #44053 (Rejected): test_envlibrados_for_rocksdb.sh fails on master
- 12:49 AM Bug #43833 (Resolved): shaman on bionic/cromson: cmake error: undefined reference to `pthread_cre...
- the error message is misleading. the root cause is...
02/10/2020
- 11:14 PM Bug #43889 (Pending Backport): expected MON_CLOCK_SKEW but got none
- 02:41 PM Bug #43889 (Fix Under Review): expected MON_CLOCK_SKEW but got none
- 09:37 PM Backport #43256: nautilus: monitor config store: Deleting logging config settings does not decrea...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32846
merged
- 08:44 PM Backport #43631: nautilus: segv in collect_sys_info
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32901
merged
- 08:43 PM Backport #43473: nautilus: recursive lock of OpTracker::lock (70)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32858
merged
- 08:41 PM Backport #43245: nautilus: osd: increase priority in certain OSD perf counters
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32845
merged
- 08:38 PM Backport #43726: nautilus: osd-recovery-space.sh has a race
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32774
merged
- 06:50 PM Feature #44025: Make it harder to set pool replica size to 1
- Pool deletion also requires a config option to be set on the monitor before it's allowed through.
I think we should ...
- 05:27 PM Bug #44067 (Fix Under Review): cephtool/test.sh test fails to scrub all pools
- 05:14 PM Bug #44067 (Resolved): cephtool/test.sh test fails to scrub all pools
- ...
- 02:55 PM Bug #44052 (Pending Backport): ceph -s does not show >32bit pg states
- 02:42 PM Bug #44062 (Resolved): LibRadosWatchNotify.WatchNotify failure
- ...
- 02:37 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- /a/sage-2020-02-09_21:18:03-rados-wip-sage2-testing-2020-02-09-1152-distro-basic-smithi/4749175...
- 10:37 AM Backport #42120 (Resolved): nautilus: pg_autoscaler should show a warning if pg_num isn't a power...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30689
m...
- 10:37 AM Backport #43471 (Resolved): nautilus: negative num_objects can set PG_STATE_DEGRADED
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32857
m...
- 10:37 AM Backport #43346 (Resolved): nautilus: short pg log + cache tier ceph_test_rados out of order reply
- 10:36 AM Backport #43319 (Resolved): nautilus: PeeringState::GoClean will call purge_strays unconditionally
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32847
m...
- 10:36 AM Backport #43099 (Resolved): nautilus: nautilus:osd: network numa affinity not supporting subnet port
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32843
m...
- 10:36 AM Backport #43246 (Resolved): nautilus: Nearfull warnings are incorrect
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32773
m...
- 10:00 AM Backport #43650 (Resolved): nautilus: Improve upmap change reporting in logs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32716
m...
- 10:00 AM Backport #43620 (Resolved): nautilus: mon shutdown timeout (race with async compaction)
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32715
m...
- 09:59 AM Backport #43783 (Resolved): nautilus: mgr commands fail when using non-client auth
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32769
m...
- 09:41 AM Bug #43582 (Fix Under Review): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- 05:19 AM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Nathan Cutler wrote:
> But if the issue was introduced in 2008, then we'd need to backport further than nautilus...
...
02/09/2020
- 05:40 PM Bug #43889 (In Progress): expected MON_CLOCK_SKEW but got none
- 01:31 PM Bug #44053 (Fix Under Review): test_envlibrados_for_rocksdb.sh fails on master
- 01:30 PM Bug #44053 (Resolved): test_envlibrados_for_rocksdb.sh fails on master
- see https://github.com/ceph/ceph/commit/c724369010a753bd44e11a534d1f42156c4fc12d
should be fixed by https://github...
- 12:45 AM Bug #42328 (Fix Under Review): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- 12:25 AM Bug #43903 (In Progress): osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
02/08/2020
- 09:55 PM Backport #43919 (In Progress): nautilus: osd stuck down
- 09:47 PM Backport #43916 (In Progress): nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pendin...
- 09:43 PM Backport #43881 (In Progress): mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- 09:42 PM Backport #43880 (In Progress): luminous: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- 09:41 PM Backport #43879 (In Progress): nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- 09:12 PM Backport #43989 (In Progress): nautilus: osd: Allow 64-char hostname to be added as the "host" in...
- 09:11 PM Backport #43988 (In Progress): luminous: osd: Allow 64-char hostname to be added as the "host" in...
- 09:10 PM Backport #43987 (In Progress): mimic: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- 09:08 PM Backport #43992 (In Progress): nautilus: objecter doesn't send osd_op
- 09:05 PM Backport #43991 (In Progress): mimic: objecter doesn't send osd_op
- 06:11 PM Bug #44052 (Fix Under Review): ceph -s does not show >32bit pg states
- 06:07 PM Bug #44052 (Resolved): ceph -s does not show >32bit pg states
- ceph -s does not show newer pg states, like repair_failed
- 03:26 PM Bug #44050 (Fix Under Review): mon tell command args don't work
- 02:37 PM Bug #44050: mon tell command args don't work
- 'ceph tell mon.a help' works, but '-h' does not.
- 02:07 PM Bug #44050 (Resolved): mon tell command args don't work
- Also, 'ceph tell mon.a force-sync --yes-i-really-mean-it' seems to be broken:...
- 02:11 PM Feature #42638 (Resolved): Allow specifying pg_autoscale_mode when creating a new pool
- 01:53 PM Bug #43889: expected MON_CLOCK_SKEW but got none
- /a/sage-2020-02-07_23:51:30-rados-wip-sage2-testing-2020-02-07-1439-distro-basic-smithi/4742672
- 01:34 PM Bug #44024 (Resolved): change in utime_t rendering ('T' separator) conflicts with cache tiering h...
- 08:18 AM Bug #43885: failed to reach quorum size 9 before timeout expired
- Since I could not reproduce the issue, I analyzed logs from the original run:
/a/sage-2020-01-29_20:14:58-rados-wip-... - 01:13 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- David Zafman wrote:
> For all log entries of all OSDs at 2020-01-28 03:18 with pg[ information and osd primary these...
- 12:27 AM Backport #42120: nautilus: pg_autoscaler should show a warning if pg_num isn't a power of two
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30689
merged
02/07/2020
- 10:31 PM Backport #43471: nautilus: negative num_objects can set PG_STATE_DEGRADED
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32857
merged
- 10:31 PM Backport #43346: nautilus: short pg log + cache tier ceph_test_rados out of order reply
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32848
merged
- 10:30 PM Backport #43319: nautilus: PeeringState::GoClean will call purge_strays unconditionally
- Nathan Cutler wrote:
Reviewed-by: Neha Ojha <nojha@redhat.com>
> https://github.com/ceph/ceph/pull/32847
merged
- 10:29 PM Backport #43099: nautilus: nautilus:osd: network numa affinity not supporting subnet port
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32843
merged
- 10:29 PM Backport #43246: nautilus: Nearfull warnings are incorrect
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32773
merged
- 10:11 PM Backport #43650: nautilus: Improve upmap change reporting in logs
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/32716
merged
- 10:09 PM Backport #43620: nautilus: mon shutdown timeout (race with async compaction)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32715
merged
- 10:03 PM Backport #43783: nautilus: mgr commands fail when using non-client auth
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32769
merged
- 08:34 PM Bug #44022: mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash
- For whatever reason we do not have complete osd logs for this, but from nojha-2020-02-06_01:27:32-upgrade:mimic-x:str...
- 05:28 PM Bug #44022: mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash
- ...
- 04:34 PM Bug #44041 (Fix Under Review): osd: MLease in stray state -> Crashed
- 04:03 PM Bug #44041 (Resolved): osd: MLease in stray state -> Crashed
- ...
02/06/2020
- 11:55 PM Feature #44025: Make it harder to set pool replica size to 1
- Neha Ojha wrote:
> Setting pool size to 1 is dangerous. Add an option like yes_i_really_really_mean_it, similar to w...
- 11:50 PM Feature #44025 (Resolved): Make it harder to set pool replica size to 1
- Setting pool size to 1 is dangerous. Add an option like yes_i_really_really_mean_it, similar to what we have for pool...
- 11:53 PM Bug #44024 (Fix Under Review): change in utime_t rendering ('T' separator) conflicts with cache t...
- 11:26 PM Bug #44024 (Resolved): change in utime_t rendering ('T' separator) conflicts with cache tiering h...
- crash like...
- 06:15 PM Bug #44022 (Resolved): mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd ...
- The crash happens on a mimic OSD. Telemetry crash reports have been reporting similar crashes in 14.2.4 (may or may no...
- 12:50 PM Bug #44015 (New): Can't compile src/tools/rados/rados.cc on 32 bit systems
- On my machine size_t is unsigned int. This causes an overflow in src/tools/rados/rados.cc:776: max_obj_len = 5ull * 1...
- 04:48 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- For all log entries of all OSDs at 2020-01-28 03:18 with pg[ information and osd primary these are the log lines th...
- 03:26 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- From mgr.x's log after the last time pg_stats are received we see ...
- 03:42 AM Bug #44004: "ceph" command crashes
- not reproducible in my testbed.
- 03:36 AM Bug #44004: "ceph" command crashes
- Sometimes the "ceph" command fails with a segmentation fault, here is the core_backtrace. It seems that it has someth...
- 02:59 AM Bug #44004 (Can't reproduce): "ceph" command crashes
- On the most recent master, after building, I ran the command "./bin/ceph -s --connect-timeout 1 -c /home/xuxuehan/cep...
- 01:37 AM Feature #42638 (Fix Under Review): Allow specifying pg_autoscale_mode when creating a new pool