Activity
From 08/28/2021 to 09/26/2021
09/26/2021
- 10:15 AM Backport #52736 (Resolved): octopus: bogus "unable to find a keyring" warning on "rbd mirroring b...
- https://github.com/ceph/ceph/pull/43312
- 10:15 AM Backport #52735 (Resolved): pacific: bogus "unable to find a keyring" warning on "rbd mirroring b...
- https://github.com/ceph/ceph/pull/43313
- 10:15 AM Backport #52734 (Resolved): pacific: full cluster handling regressions in pacific
- https://github.com/ceph/ceph/pull/43311
- 10:10 AM Bug #51628 (Pending Backport): bogus "unable to find a keyring" warning on "rbd mirroring bootstr...
- 10:10 AM Bug #52648 (Pending Backport): full cluster handling regressions in pacific
09/25/2021
- 09:00 AM Backport #52733 (Resolved): pacific: [rbd-mirror] unbreak one-way snapshot-based mirroring
- https://github.com/ceph/ceph/pull/43315
- 09:00 AM Backport #52732 (Resolved): octopus: [rbd-mirror] unbreak one-way snapshot-based mirroring
- https://github.com/ceph/ceph/pull/43314
- 08:59 AM Bug #52675 (Pending Backport): [rbd-mirror] unbreak one-way snapshot-based mirroring
09/24/2021
- 03:55 PM Bug #52690: rbd: failed to remove snapshot: (1) Operation not permitted -- but the volume is removed
- Still seeing this on an almost daily basis.
- 01:09 PM Bug #52637: IO cannot return to normal even though qos is removed
- wb song wrote:
> It is some sort of ordering/prioritization issue.
So it doesn't always get "stuck"? Can you des...
09/22/2021
- 05:32 AM Bug #49876: [luks] sporadic failure in TestLibRBD.TestEncryptionLUKS2
- - does not reproduce without caching...
- 12:05 AM Bug #52637: IO cannot return to normal even though qos is removed
- Ilya Dryomov wrote:
> This should tell whether it is some sort of ordering/prioritization issue or a permanent locku...
09/21/2021
- 03:30 PM Bug #52690 (Need More Info): rbd: failed to remove snapshot: (1) Operation not permitted -- but t...
- Since upgrading to 15.2.10 I sometimes receive the following error when attempting to remove a snapshot:
```
rbd:...
```
- 11:59 AM Bug #52637: IO cannot return to normal even though qos is removed
- It could be that the header update notification gets "behind" all the pending I/O. You set an extremely low limit (j...
- 09:05 AM Bug #52637: IO cannot return to normal even though qos is removed
- Ilya Dryomov wrote:
> After the first "rbd image-meta set" command, client A clearly picked up the header update not...
09/20/2021
- 09:19 PM Bug #52637: IO cannot return to normal even though qos is removed
- After the first "rbd image-meta set" command, client A clearly picked up the header update notification (send_v2_get_...
- 09:10 PM Bug #52675 (Fix Under Review): [rbd-mirror] unbreak one-way snapshot-based mirroring
- 08:34 PM Bug #52675 (Resolved): [rbd-mirror] unbreak one-way snapshot-based mirroring
- ...
- 04:25 PM Bug #51867 (Resolved): [pybind] mirror_image_get_status() throws TypeError if remote_statuses isn...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:24 PM Bug #52063 (Resolved): rbd-mirror: potential hang on shutdown
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:19 PM Backport #52105 (Resolved): octopus: rbd-mirror: potential hang on shutdown
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42978
m...
- 04:19 PM Backport #52005 (Resolved): octopus: [pybind] mirror_image_get_status() throws TypeError if remot...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42971
m...
09/19/2021
- 06:40 PM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- > Indeed bootstrap does some things, but the documentation doesn't say what.
> I compared omap values and after bootstrap...
- 06:20 PM Bug #51628 (Fix Under Review): bogus "unable to find a keyring" warning on "rbd mirroring bootstr...
09/17/2021
- 03:18 PM Bug #52648 (Fix Under Review): full cluster handling regressions in pacific
- 03:04 PM Bug #52648 (Resolved): full cluster handling regressions in pacific
- https://github.com/ceph/ceph/pull/39282 introduced at least three regressions:
- extra_op_flags isn't applied to w...
- 09:40 AM Bug #50522 (Resolved): [rbd-nbd] default pool isn't picked up
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:32 AM Backport #52461 (Resolved): pacific: rbd-mirrror: expose perf dump info for snapshot replayer
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42987
m...
- 09:32 AM Backport #50907 (Resolved): pacific: [rbd-nbd] default pool isn't picked up
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42980
m...
- 09:32 AM Backport #52106 (Resolved): pacific: rbd-mirror: potential hang on shutdown
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42979
m...
- 09:31 AM Backport #52006 (Resolved): pacific: [pybind] mirror_image_get_status() throws TypeError if remot...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42972
m...
09/16/2021
- 11:32 AM Bug #52637 (Won't Fix): IO cannot return to normal even though qos is removed
- After setting rbd_qos_bps_limit to a small value and then removing it, IO cannot return to normal.
Steps to reproduce:
...
09/15/2021
09/14/2021
- 08:25 AM Bug #52277 (New): [pwl] IO hang when the single IO size * io_depth > cache size
- 08:17 AM Bug #52599 (Resolved): [pwl] flush requests are dispatched in advance
- If the user sends _write1 write2 write3 write4 flush write5_, and the depth is 2, it may become _write1 write2 write3...
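The reordering described in #52599 can be sketched with a toy dispatcher (this is not the rbd/pwl code; the request names, latencies, and dispatcher logic below are all illustrative assumptions): with a fixed in-flight depth, a flush that is dispatched as soon as any slot frees can complete before an earlier, slower write.

```python
def completion_order(requests, depth, hold_flush, latency):
    """Toy event-driven dispatcher with `depth` in-flight slots.

    hold_flush=False models the bug: a "flush" is dispatched as soon
    as any slot frees, so it can complete before an earlier, slower
    write. hold_flush=True models the fix: the flush is held until
    every previously dispatched request has completed.
    """
    t = 0
    pending = list(requests)
    inflight = []  # (finish_time, name)
    done = []
    while pending or inflight:
        # dispatch while free slots remain
        while pending and len(inflight) < depth:
            nxt = pending[0]
            if nxt == "flush" and hold_flush and inflight:
                break  # fixed behavior: drain in-flight writes first
            pending.pop(0)
            inflight.append((t + latency[nxt], nxt))
        inflight.sort()
        t, name = inflight.pop(0)  # advance to the next completion
        done.append(name)
    return done

reqs = ["write1", "write2", "write3", "write4", "flush", "write5"]
lat = {"write1": 2, "write2": 2, "write3": 2,
       "write4": 5, "write5": 2, "flush": 1}  # write4 is slow

buggy = completion_order(reqs, depth=2, hold_flush=False, latency=lat)
fixed = completion_order(reqs, depth=2, hold_flush=True, latency=lat)
# buggy completes the flush ahead of write4; fixed preserves the order
```

With these hypothetical latencies, the buggy run yields write1 write2 write3 flush write4 write5, matching the "flush dispatched in advance" symptom in the entry above.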
09/13/2021
- 10:05 AM Backport #52585 (Resolved): pacific: [pwl ssd] m_blocks_to_log_entires not cleared on retiring
- https://github.com/ceph/ceph/pull/43772
- 10:05 AM Bug #52579 (Pending Backport): [pwl ssd] m_blocks_to_log_entires not cleared on retiring
- 08:47 AM Bug #52579 (Resolved): [pwl ssd] m_blocks_to_log_entires not cleared on retiring
- 09:55 AM Backport #52584 (Resolved): pacific: [pwl] possible data corruption in TEST_F(TestLibRBD, TestFUA)
- https://github.com/ceph/ceph/pull/44199
- 09:51 AM Bug #51438 (Pending Backport): [pwl] possible data corruption in TEST_F(TestLibRBD, TestFUA)
09/10/2021
- 09:40 AM Backport #52570 (Resolved): pacific: [pwl ssd] memory corruption (shared_ptr related?)
- https://github.com/ceph/ceph/pull/43772
- 09:37 AM Bug #52400 (Pending Backport): [pwl ssd] memory corruption (shared_ptr related?)
- 07:45 AM Backport #52569 (Resolved): pacific: [pwl] m_bytes_allocated is calculated incorrectly on reopen
- https://github.com/ceph/ceph/pull/43772
- 07:41 AM Bug #52341 (Pending Backport): [pwl] m_bytes_allocated is calculated incorrectly on reopen
- 04:15 AM Bug #52567 (Duplicate): rbd images will be disappeared on dashboard if i enable image mirror.
- Problem reproduction:
1. Set up two Ceph clusters and configure the rbd pool in image mirror mode; data is only mirrored ...
- 02:00 AM Bug #52566 (Resolved): [pwl ssd] assert in _aio_stop() during shutdown
- After fixing https://tracker.ceph.com/issues/52235 by moving finish_op to the end of the callback function, if aio_thread calls ...
09/09/2021
- 01:37 PM Backport #52559 (In Progress): pacific: pool validation lockup on selfmanaged_snap_create()
- 01:25 PM Backport #52559 (Resolved): pacific: pool validation lockup on selfmanaged_snap_create()
- https://github.com/ceph/ceph/pull/43113
- 01:23 PM Bug #52537 (Pending Backport): pool validation lockup on selfmanaged_snap_create()
09/08/2021
- 03:15 PM Backport #52105: octopus: rbd-mirror: potential hang on shutdown
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42978
merged
- 03:15 PM Backport #52005: octopus: [pybind] mirror_image_get_status() throws TypeError if remote_statuses ...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42971
merged
- 10:44 AM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- Indeed bootstrap does some things, but the documentation doesn't say what.
I compared omap values and after bootstrap rbd...
- 10:35 AM Bug #52537 (Fix Under Review): pool validation lockup on selfmanaged_snap_create()
- 10:25 AM Bug #52537 (In Progress): pool validation lockup on selfmanaged_snap_create()
- 10:21 AM Bug #52537 (Resolved): pool validation lockup on selfmanaged_snap_create()
- Concurrent rbd_pool_init() or rbd_create() operations on an unvalidated (uninitialized) pool trigger a lockup in Vali...
- 09:20 AM Backport #52536 (Resolved): pacific: [pwl ssd] assert in AbstractWriteLog::handle_flushed_sync_po...
- https://github.com/ceph/ceph/pull/43772
- 09:15 AM Bug #52465 (Pending Backport): [pwl ssd] assert in AbstractWriteLog::handle_flushed_sync_point()
- 08:55 AM Backport #52534 (Resolved): pacific: rbd children:logging crashes after open or close fails.
- https://github.com/ceph/ceph/pull/44999
- 08:55 AM Backport #52533 (Resolved): octopus: rbd children:logging crashes after open or close fails.
- https://github.com/ceph/ceph/pull/45000
- 08:50 AM Bug #52522 (Pending Backport): rbd children:logging crashes after open or close fails.
09/07/2021
- 04:02 PM Backport #52461: pacific: rbd-mirrror: expose perf dump info for snapshot replayer
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42987
merged
- 04:01 PM Backport #50907: pacific: [rbd-nbd] default pool isn't picked up
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42980
merged
- 03:58 PM Backport #52106: pacific: rbd-mirror: potential hang on shutdown
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42979
merged
- 03:58 PM Backport #52006: pacific: [pybind] mirror_image_get_status() throws TypeError if remote_statuses ...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42972
merged
- 12:20 PM Bug #52522 (Fix Under Review): rbd children:logging crashes after open or close fails.
- 08:36 AM Bug #52522 (Resolved): rbd children:logging crashes after open or close fails.
- In method: librbd::api::Image<I>::list_descendants().
Ictx is deleted when "ictx->state->open()" and "ictx->state->c...
- 03:43 AM Bug #52511: [pwl ssd] flush cause io re-oreder to writeback layer
- Also, it will reorder write_entry and non-write_entry (discard) requests.
09/06/2021
- 07:06 PM Bug #52400 (Fix Under Review): [pwl ssd] memory corruption (shared_ptr related?)
- 04:18 PM Bug #52425 (Resolved): "rbd unmap" misbehaves if the image name contains glob metacharacters
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:15 PM Backport #52452 (Resolved): pacific: "rbd unmap" misbehaves if the image name contains glob metac...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42969
m...
- 04:01 PM Backport #52453 (Resolved): octopus: "rbd unmap" misbehaves if the image name contains glob metac...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42968
m...
- 08:12 AM Bug #50970 (Resolved): replication: local site naming in manual mode and snapshot mirroring
- 08:12 AM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- Marking this as Resolved because the documentation fixup is now merged but feel free to continue commenting here. If...
- 06:15 AM Bug #49876: [luks] sporadic failure in TestLibRBD.TestEncryptionLUKS2
- I can reproduce this bug with pwl_cache and found the root cause. I created a new issue (https://tracker.ceph.com/issue...
- 02:48 AM Bug #49876: [luks] sporadic failure in TestLibRBD.TestEncryptionLUKS2
- @Deepika Upadhyay, when you hit the problem, did you have pwl_cache enabled or not? I want to know if this problem is rela...
- 06:13 AM Bug #52511 (Resolved): [pwl ssd] flush cause io re-oreder to writeback layer
- Consider this workload:
writeA(0,4K)
writeB(0,512)
SSD mode can ensure the writeA, writeB order in the cache file, but w...
- 03:05 AM Bug #52235: [pwl] deadlock on AbstractWriteLog::m_lock during shutdown
> Does the fix work? Were you able to reproduce the deadlock or the above segfault with https://github.com/ceph/ce...
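The stakes of the #52511 workload above (writeA(0,4K) followed by writeB(0,512) to overlapping extents) can be illustrated with a toy byte-level replay; the buffer size and payload bytes are illustrative, not taken from the pwl code:

```python
def replay(writes, size=4096):
    """Apply (offset, payload) writes in order to a zeroed buffer."""
    data = bytearray(size)
    for off, payload in writes:
        data[off:off + len(payload)] = payload
    return bytes(data)

writeA = (0, b"A" * 4096)  # writeA(0,4K)
writeB = (0, b"B" * 512)   # writeB(0,512), overlaps writeA's head

cache_order = replay([writeA, writeB])  # order preserved in the cache
reordered = replay([writeB, writeA])    # writeback applies them flipped

# On the overlapping first 512 bytes, the later write (writeB) must
# win; if writeback reorders them, writeB's data is silently lost.
```

This is why preserving write order across the flush boundary matters: the two replay orders leave different bytes on the backing image.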
09/05/2021
- 10:16 PM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- No, "rbd mirror pool peer bootstrap import" adds the remote cluster details and key to the config store on the local ...
- 10:05 PM Bug #52088: Stuck rbd-nbd processes.
- The patch is probably working as intended. The problem is that it added a synchronization point to the "rbd-nbd map"...
09/03/2021
- 05:21 PM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- Hi Ilya,
OK, but I can't find any keyring file anywhere. Isn't bootstrap supposed to create it? rbd-mirror after...
- 04:30 PM Bug #50970 (Fix Under Review): replication: local site naming in manual mode and snapshot mirroring
- 04:30 PM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- This warning is harmless, "rbd mirror pool peer bootstrap import" doesn't actually need that keyring.
- 10:46 AM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- Hi Arthur and Ilya,
Indeed changing rbd-mirror user mon profile from "rbd" to "rbd-mirror-peer" seems to be the pr...
- 09:22 AM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- Hi Lubo,
Why are you attempting to add a peer manually with "rbd mirror pool peer add"? If you have two clusters,...
- 03:21 PM Bug #50905: [rbd-nbd] kernel lockup during rbd_fsx_nbd
- /ceph/teuthology-archive/yuriw-2021-09-01_19:04:25-rbd-wip-yuri-testing-2021-09-01-0844-pacific-distro-basic-smithi/6...
- 02:53 PM Bug #49504: qemu_dynamic_features.sh times out
- http://qa-proxy.ceph.com/teuthology/yuriw-2021-09-01_19:04:25-rbd-wip-yuri-testing-2021-09-01-0844-pacific-distro-bas...
- 01:06 PM Bug #52465 (Fix Under Review): [pwl ssd] assert in AbstractWriteLog::handle_flushed_sync_point()
- Appears to be a regression caused by https://github.com/ceph/ceph/pull/42149 (the workaround for https://tracker.ceph...
- 09:29 AM Bug #52235: [pwl] deadlock on AbstractWriteLog::m_lock during shutdown
- CONGMIN YIN wrote:
> I wrote a script to loop the SSD unit test, ran it dozens of times, and finally reproduced the deadlo...
09/02/2021
- 10:43 AM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- In my setup I use slightly different cephx caps:...
- 10:17 AM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- Maybe a regression then? Client name doesn't change anything:...
- 09:39 AM Bug #52235: [pwl] deadlock on AbstractWriteLog::m_lock during shutdown
- I found a segmentation fault (core dump). I think it has the same root cause as the deadlock issue. The deadlock is caused by ...
09/01/2021
- 03:00 PM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- Hmm I no longer have my test clusters but on my end it was working without the trick you did. Could you try using the...
- 02:15 PM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- Forgot Ceph version : ...
- 02:12 PM Bug #50970: replication: local site naming in manual mode and snapshot mirroring
- Hi,
From my tests, site-name is not sufficient to make snapshot-based mirroring work out of the box:...
- 02:44 PM Bug #52485: rbd-mirror: trashed and linked secondary image cannot be restored
- If some admin can replace ...
- 02:29 PM Bug #52485 (New): rbd-mirror: trashed and linked secondary image cannot be restored
- Following discussion in "PR #41696":https://github.com/ceph/ceph/pull/41696
When secondary image has cloned snapsh...
- 08:51 AM Bug #51031 (Fix Under Review): rbd-mirror: metadata of mirrored image are not properly cleaned up...
- 08:16 AM Bug #52235: [pwl] deadlock on AbstractWriteLog::m_lock during shutdown
- I wrote a script to loop the SSD unit test, ran it dozens of times, and finally reproduced the deadlock problem. On my mach...
- 02:53 AM Bug #52235: [pwl] deadlock on AbstractWriteLog::m_lock during shutdown
- Ilya Dryomov wrote:
> FWIW while trying to reproduce this deadlock I also hit https://tracker.ceph.com/issues/52465....
- 03:16 AM Bug #52478 (Closed): [pwl ssd] Aborted in load_existing_entries: Decoded root:
- This issue was triggered when I tested my PR....
- 02:15 AM Bug #52258: [pwl] The write back time of cache is too long
- > Linking https://github.com/ceph/ceph/pull/42775 that was supposed to address this for posterity.
In RWL mode, I ...
08/31/2021
- 08:10 PM Bug #52277 (Closed): [pwl] IO hang when the single IO size * io_depth > cache size
- Changing to Closed as there is no code change associated with this ticket. https://github.com/ceph/ceph/pull/40208 w...
- 02:57 AM Bug #52277 (Resolved): [pwl] IO hang when the single IO size * io_depth > cache size
- There is no problem with https://github.com/ceph/ceph/pull/40208, and this solution is still adopted.
- 06:43 PM Bug #52088: Stuck rbd-nbd processes.
- Thanks for the detailed reply!
So this definitely sounds like a kernel issue and related to this patch. Is there ...
- 04:56 PM Backport #52461 (In Progress): pacific: rbd-mirrror: expose perf dump info for snapshot replayer
- 07:44 AM Backport #52461: pacific: rbd-mirrror: expose perf dump info for snapshot replayer
- please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/42987
ceph-backport.sh versi...
- 07:40 AM Backport #52461 (Resolved): pacific: rbd-mirrror: expose perf dump info for snapshot replayer
- https://github.com/ceph/ceph/pull/42987
- 04:55 PM Backport #52460 (In Progress): octopus: rbd-mirrror: expose perf dump info for snapshot replayer
- 07:43 AM Backport #52460: octopus: rbd-mirrror: expose perf dump info for snapshot replayer
- please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/42986
ceph-backport.sh versi...
- 07:40 AM Backport #52460 (Resolved): octopus: rbd-mirrror: expose perf dump info for snapshot replayer
- https://github.com/ceph/ceph/pull/42986
- 02:25 PM Bug #52235: [pwl] deadlock on AbstractWriteLog::m_lock during shutdown
- FWIW while trying to reproduce this deadlock I also hit https://tracker.ceph.com/issues/52465.
- 02:16 PM Bug #52235: [pwl] deadlock on AbstractWriteLog::m_lock during shutdown
- Just hit it on master (980cf670ed7cb8afe9f58c26782e34e559848051), TestMockCacheSSDWriteLog.read_hit_part_ssd_cache te...
- 10:00 AM Bug #52235: [pwl] deadlock on AbstractWriteLog::m_lock during shutdown
- Since it happens on shutdown, I don't think the test case particularly matters. I saw it hang on TestMockCacheSSDWri...
- 02:23 PM Bug #52465 (Resolved): [pwl ssd] assert in AbstractWriteLog::handle_flushed_sync_point()
- On today's master (980cf670ed7cb8afe9f58c26782e34e559848051):...
- 10:13 AM Bug #52258: [pwl] The write back time of cache is too long
- Linking https://github.com/ceph/ceph/pull/42775 that was supposed to address this for posterity.
- 10:08 AM Bug #52258: [pwl] The write back time of cache is too long
- Yes, most of my tests were performed on the ssd mode. But if you are saying that the rwl mode is also sometimes slow...
- 07:01 AM Bug #52258: [pwl] The write back time of cache is too long
- > Is that always the case? I saw "write back 1G from cache" take a lot longer than "write 1G directly to the cluster...
- 07:39 AM Feature #50973 (Pending Backport): rbd-mirrror: expose perf dump info for snapshot replayer
08/30/2021
- 09:48 PM Bug #52088: Stuck rbd-nbd processes.
- 15.y.z is Octopus, not Nautilus.
> We are getting packages from here: https://download.ceph.com/debian-nautilus/di...
- 09:42 PM Bug #52088: Stuck rbd-nbd processes.
- > Nautilus (14.2.22) - Bionic (5.4.0-1037) - Test Passed - Stable when deployed..
> Nautilus (14.2.22) - Bionic (5.4...
- 02:53 PM Backport #50716 (Rejected): pacific: Global config overrides do not apply to in-use images
- Already in pacific (landed before pacific was branched off). The parent was tagged for pacific by mistake.
- 02:35 PM Backport #50907 (In Progress): pacific: [rbd-nbd] default pool isn't picked up
- 02:29 PM Backport #52106 (In Progress): pacific: rbd-mirror: potential hang on shutdown
- 02:29 PM Backport #52105 (In Progress): octopus: rbd-mirror: potential hang on shutdown
- 12:27 PM Backport #52006 (In Progress): pacific: [pybind] mirror_image_get_status() throws TypeError if re...
- 12:26 PM Backport #52005 (In Progress): octopus: [pybind] mirror_image_get_status() throws TypeError if re...
- 10:50 AM Bug #52118 (Won't Fix): RBD qos causes io to be out of order
- 10:44 AM Backport #52452 (In Progress): pacific: "rbd unmap" misbehaves if the image name contains glob me...
- 09:30 AM Backport #52452 (Resolved): pacific: "rbd unmap" misbehaves if the image name contains glob metac...
- https://github.com/ceph/ceph/pull/42969
- 10:43 AM Backport #52453 (In Progress): octopus: "rbd unmap" misbehaves if the image name contains glob me...
- 09:30 AM Backport #52453 (Resolved): octopus: "rbd unmap" misbehaves if the image name contains glob metac...
- https://github.com/ceph/ceph/pull/42968
- 09:53 AM Bug #52258 (New): [pwl] The write back time of cache is too long
- > write end: write back time = cache_cap / bw_write_back_to_cluster
Is that always the case? I saw "write back 1G...
- 07:15 AM Bug #52258 (Closed): [pwl] The write back time of cache is too long
- If the bandwidth of writes to the cache is larger than that of writeback to the cluster (the general case):
cache not full: fil...
- 09:27 AM Bug #52425 (Pending Backport): "rbd unmap" misbehaves if the image name contains glob metacharacters
- 07:26 AM Bug #52235: [pwl] deadlock on AbstractWriteLog::m_lock during shutdown
- I can't reproduce this problem through unit tests on master. Is it reproducible every time for you? Could you please provide your co...
08/28/2021