Ceph : Issues
https://tracker.ceph.com/
CephFS - Bug #58196 (Rejected): CephFS mirroring fails when used with Rook
https://tracker.ceph.com/issues/58196
2022-12-07T14:58:01Z
Deepika Upadhyay
<p>Reproducer:<br />Configure CephFS mirroring in Rook Ceph, following <a class="external" href="https://rook.io/docs/rook/v1.10/Storage-Configuration/Shared-Filesystem-CephFS/filesystem-mirroring/">https://rook.io/docs/rook/v1.10/Storage-Configuration/Shared-Filesystem-CephFS/filesystem-mirroring/</a>.</p>
<p>I used 1 hr as the snapshot schedule interval (see the schedule sketch after the log below).</p>
<pre>
[root@centos-8gb-hel1-1 ~]# kubectl logs -nrook-ceph rook-ceph-fs-mirror-dc5b66b74-f8bb4
Defaulted container "fs-mirror" out of: fs-mirror, log-collector, chown-container-data-dir (init)
debug 2022-12-05T15:12:29.819+0000 7f8c9cabd100 0 set uid:gid to 167:167 (ceph:ceph)
debug 2022-12-05T15:12:29.819+0000 7f8c9cabd100 0 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process cephfs-mirror, pid 1634
debug 2022-12-05T15:12:29.819+0000 7f8c9cabd100 0 pidfile_write: ignore empty --pid-file
debug 2022-12-05T15:12:29.832+0000 7f8c9cabd100 1 mgrc service_daemon_register cephfs-mirror.2179192 metadata {arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable),ceph_version_short=17.2.5,container_hostname=centos-8gb-hel1-1,container_image=quay.io/ceph/ceph:v17.2.5,cpu=Intel Xeon Processor (Skylake, IBRS),distro=centos,distro_description=CentOS Stream 8,distro_version=8,hostname=centos-8gb-hel1-1,id=fs-mirror,instance_id=2179192,kernel_description=#1 SMP Mon Jul 18 17:42:52 UTC 2022,kernel_version=4.18.0-408.el8.x86_64,mem_swap_kb=0,mem_total_kb=7755460,os=Linux,pod_name=rook-ceph-fs-mirror-dc5b66b74-f8bb4,pod_namespace=rook-ceph}
debug 2022-12-05T15:12:33.956+0000 7f8c97649700 0 cephfs::mirror::PeerReplayer(4c35ed9f-d8e0-49b7-8525-d58cf02b61ee) init: remote monitor host=[v2:65.109.139.111:3300,v1:65.109.139.111:6789],[v2:95.217.219.154:3300,v1:95.217.219.154:6789],[v2:65.21.2.149:3300,v1:65.21.2.149:6789]
debug 2022-12-05T15:12:35.102+0000 7f8c7a60f700 -1 cephfs::mirror::PeerReplayer(4c35ed9f-d8e0-49b7-8525-d58cf02b61ee) build_snap_map: failed to open local snap directory=/tmp/registry/.snap: (2) No such file or directory
debug 2022-12-05T15:12:35.102+0000 7f8c7a60f700 -1 cephfs::mirror::PeerReplayer(4c35ed9f-d8e0-49b7-8525-d58cf02b61ee) do_sync_snaps: failed to build local snap map
debug 2022-12-05T15:12:35.102+0000 7f8c7a60f700 -1 cephfs::mirror::PeerReplayer(4c35ed9f-d8e0-49b7-8525-d58cf02b61ee) sync_snaps: failed to sync snapshots for dir_root=/tmp/registry
debug 2022-12-05T15:12:36.076+0000 7f8c97649700 0 cephfs::mirror::PeerReplayer(f63ff9e0-3ecb-4406-a001-1bcf81e53bee) init: remote monitor host=[v2:65.109.139.111:3300,v1:65.109.139.111:6789],[v2:95.217.219.154:3300,v1:95.217.219.154:6789],[v2:65.21.2.149:3300,v1:65.21.2.149:6789]
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/tools/cephfs_mirror/FSMirror.cc: In function 'void cephfs::mirror::FSMirror::add_peer(const Peer&)' thread 7f8c97649700 time 2022-12-05T15:12:36.218409+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/tools/cephfs_mirror/FSMirror.cc: 396: FAILED ceph_assert(m_peer_replayers.size() == 1)
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f8ca006743f]
2: /usr/lib64/ceph/libceph-common.so.2(+0x269605) [0x7f8ca0067605]
3: (cephfs::mirror::FSMirror::add_peer(Peer const&)+0x527) [0x5634e525edb7]
4: (Context::complete(int)+0xd) [0x5634e5258bad]
5: (cephfs::mirror::Mirror::update_fs_mirrors()+0x6b8) [0x5634e5257118]
6: (Context::complete(int)+0xd) [0x5634e5258bad]
7: (CommonSafeTimer<std::mutex>::timer_thread()+0x12f) [0x7f8ca0166c6f]
8: (CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x7f8ca0167d61]
9: /lib64/libpthread.so.0(+0x81ca) [0x7f8c9f2921ca]
10: clone()
*** Caught signal (Aborted) **
in thread 7f8c97649700 thread_name:safe_timer
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
1: /lib64/libpthread.so.0(+0x12cf0) [0x7f8c9f29ccf0]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f8ca0067499]
5: /usr/lib64/ceph/libceph-common.so.2(+0x269605) [0x7f8ca0067605]
6: (cephfs::mirror::FSMirror::add_peer(Peer const&)+0x527) [0x5634e525edb7]
7: (Context::complete(int)+0xd) [0x5634e5258bad]
8: (cephfs::mirror::Mirror::update_fs_mirrors()+0x6b8) [0x5634e5257118]
9: (Context::complete(int)+0xd) [0x5634e5258bad]
10: (CommonSafeTimer<std::mutex>::timer_thread()+0x12f) [0x7f8ca0166c6f]
11: (CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x7f8ca0167d61]
12: /lib64/libpthread.so.0(+0x81ca) [0x7f8c9f2921ca]
13: clone()
debug 2022-12-05T15:12:36.219+0000 7f8c97649700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/tools/cephfs_mirror/FSMirror.cc: In function 'void cephfs::mirror::FSMirror::add_peer(const Peer&)' thread 7f8c97649700 time 2022-12-05T15:12:36.218409+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/tools/cephfs_mirror/FSMirror.cc: 396: FAILED ceph_assert(m_peer_replayers.size() == 1)
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f8ca006743f]
2: /usr/lib64/ceph/libceph-common.so.2(+0x269605) [0x7f8ca0067605]
3: (cephfs::mirror::FSMirror::add_peer(Peer const&)+0x527) [0x5634e525edb7]
4: (Context::complete(int)+0xd) [0x5634e5258bad]
5: (cephfs::mirror::Mirror::update_fs_mirrors()+0x6b8) [0x5634e5257118]
6: (Context::complete(int)+0xd) [0x5634e5258bad]
7: (CommonSafeTimer<std::mutex>::timer_thread()+0x12f) [0x7f8ca0166c6f]
8: (CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x7f8ca0167d61]
9: /lib64/libpthread.so.0(+0x81ca) [0x7f8c9f2921ca]
10: clone()
</pre>
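<p>For reference, a minimal sketch of the snapshot schedule setup for the reproducer, run from the Rook toolbox (the path and module enablement are assumptions for illustration, not from the original report):</p>
<pre>
# Enable the snap_schedule mgr module (needed for scheduled snapshots)
ceph mgr module enable snap_schedule
# Schedule a snapshot every hour on the mirrored directory
ceph fs snap-schedule add / 1h
# Verify the schedule
ceph fs snap-schedule list /
</pre>
<p>The crash itself is the assert at FSMirror.cc:396, <code>ceph_assert(m_peer_replayers.size() == 1)</code>, hit from add_peer when a second peer replayer gets registered.</p>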
RADOS - Bug #57782 (Fix Under Review): [mon] high cpu usage by fn_monstore thread
https://tracker.ceph.com/issues/57782
2022-10-06T14:08:15Z
Deepika Upadhyay
<p>We observed high CPU usage by the ms_dispatch and fn_monstore threads (amounting to 99-100% in top). The Ceph deployment was with Rook.</p>
<p>We ran gdb and perf to better understand the origin, and found the following (a sketch of the perf capture follows the call graph below):</p>
<pre>
100.00% 0.00% ms_dispatch ceph-mon [.] Monitor::handle_command
|
---Monitor::handle_command
PaxosService::dispatch
OSDMonitor::prepare_update
OSDMonitor::prepare_command
OSDMonitor::prepare_command_impl
OSDMonitor::prepare_new_pool
CrushTester::test_with_fork
|
--99.81%--__libc_close (inlined)
|
|--33.42%--entry_SYSCALL_64_after_hwframe
| do_syscall_64
| |
| |--22.22%--syscall_enter_from_user_mode
| |
| |--6.06%--__x64_sys_close
| | |
| | --4.57%--close_fd
| | |
| | --3.56%--pick_file
| | |
| | --2.72%--_raw_spin_lock
| | |
| | --1.38%--preempt_count_add
| |
| |--2.48%--syscall_exit_to_user_mode
| | |
| | --2.11%--syscall_exit_to_user_mode_prepare
| | |
| | --0.62%--__audit_syscall_exit
| |
| --1.84%--syscall_trace_enter.constprop.0
| |
| --0.76%--__audit_syscall_entry
|
|--1.79%--__pthread_enable_asynccancel
|
--1.24%--__pthread_disable_asynccancel
</pre>
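<p>For reference, a sketch of how such a profile can be captured (assuming perf is available alongside the mon; the PID lookup and duration are illustrative):</p>
<pre>
# Sample on-CPU call stacks of the ceph-mon process for 30 seconds
perf record -g -p "$(pidof ceph-mon)" -- sleep 30
# Render the folded call graph shown above
perf report --stdio
</pre>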
<p>CrushTester leads to __libc_close consuming 99% of the CPU instead of<br />creating a fork:</p>
<pre>
---Monitor::handle_command
PaxosService::dispatch
OSDMonitor::prepare_update
OSDMonitor::prepare_command
OSDMonitor::prepare_command_impl
OSDMonitor::prepare_new_pool
CrushTester::test_with_fork
__libc_close (inlined)
entry_SYSCALL_64_after_hwframe
do_syscall_64
syscall_enter_from_user_mode
</pre>
<p>From the mon logs, the termination can be observed:<br /><pre>
2022-09-15T14:29:06.155+0000 7fe451263700 10 -- [v2:10.6.125.48:3300/0,v1:10.6.125.48:6789/0] >> 10.6.137.218:0/260641845 conn(0x5638020d9400 msgr2=0x5637fa16f600 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1)._try_send sent bytes 96 remaining bytes 0
2022-09-15T14:29:06.209+0000 7fe458271700 -1 received signal: Terminated from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-09-15T14:29:06.209+0000 7fe458271700 -1 mon.a@0(leader) e1 *** Got Signal Terminated ***
</pre></p>
<p>There has been no assert failure or anything of the sort; adding it here to see if anyone has feedback or has hit this issue so far.</p>
rbd - Bug #56490 (New): [rbd-mirror] failure to unlink peer when image copy in progress
https://tracker.ceph.com/issues/56490
2022-07-07T15:32:18Z
Deepika Upadhyay
<pre>
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 update_non_primary_snapshot:
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15402, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15403, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15404, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15405, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6913, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15406, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6914, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15407, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6915, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15408, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6916, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15409, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6917, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15410, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6918, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15411, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6919, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6920, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6921, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15412, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15413, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15414, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15415, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6922, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15416, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15417, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15418, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6923, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15419, object_count=30720
2022-07-06T12:26:57.361+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6924, object_count=30720
2022-07-06T12:26:57.361+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15420, object_count=30720
2022-07-06T12:26:57.362+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6925, object_count=30720
2022-07-06T12:26:57.362+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d8644dc00 handle_copy_image_progress: object_number=15421, object_count=30720
2022-07-06T12:26:57.362+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6926, object_count=30720
2022-07-06T12:26:57.362+0000 7f091c535700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d86f32800 handle_copy_image_progress: object_number=6927, object_count=30720
2022-07-06T12:26:57.362+0000 7f091cd36700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555d861fb800 handle_unlink_peer: r=-108
</pre>
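<p>For reference, r=-108 in the last line maps to -ESHUTDOWN on Linux; a quick, generic way to confirm errno values (not from the original report):</p>
<pre>
# Translate the errno seen in handle_unlink_peer (-108) to a description
python3 -c 'import os; print(os.strerror(108))'
# prints: Cannot send after transport endpoint shutdown
</pre>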
rbd - Bug #55852 (Duplicate): [rbd-mirror] remote got demoted in non-primary without actually per...
https://tracker.ceph.com/issues/55852
2022-06-03T15:31:00Z
Deepika Upadhyay
<p>The image is kept up+unknown:</p>
<pre>
2022-06-03T11:31:20.003+0000 7f3bb6018700 15 rbd::mirror::ImageReplayer: 0x55b43dafe300 [2/421faf66-0be0-41e4-88a6-697d395a76e6] set_mirror_image_status_update: status={state=up+unknown, description=remote image demoted, last_update=0.000000]}
</pre>
<p>Some images report demotion on both sides, when we only performed demotion on cepheast:</p>
<pre>
22 test22:
global_id: a44cc01b-7ef9-4ab8-8584-9fddb6e41ce3
state: up+unknown
description: remote image demoted
service: admin on vossi03
last_update: 2022-06-03 15:19:48
peer_sites:
name: cephwest
state: up+unknown
description: remote image demoted
last_update: 2022-06-03 15:19:48
</pre>
<p>The status persists on restart, and prevents us from performing a force promote/removal as the image has become read-only.</p>
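<p>A sketch of the demotion and status checks described above (image name from the status output; the pool name is an assumption):</p>
<pre>
# Demote the primary image on the east cluster only
rbd --cluster cepheast mirror image demote replicapool/test22
# Cross-check the resulting status on both sites
rbd --cluster cepheast mirror image status replicapool/test22
rbd --cluster cephwest mirror image status replicapool/test22
</pre>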
rbd - Bug #54613 (New): rbd-mirror: unable to disable mirroring and remove non-primary mirror image
https://tracker.ceph.com/issues/54613
2022-03-17T20:51:54Z
Deepika Upadhyay
<pre>
[ideepika@senta03 hack]$ ./tbox.sh cepheast rbd mirror image disable replicapool/test-demote-sb --force --debug-rbd 0 --debug-ms 0
2022-03-17T20:13:45.379+0000 7fb767fff700 -1 librbd::managed_lock::BreakRequest: 0x7fb754001bf0 handle_blocklist: failed to blocklist lock owner: (22) Invalid argument
2022-03-17T20:13:45.379+0000 7fb767fff700 -1 librbd::managed_lock::AcquireRequest: 0x7fb7540016d0 handle_break_lock: failed to break lock : (22) Invalid argument
2022-03-17T20:13:45.379+0000 7fb767fff700 -1 librbd::ManagedLock: 0x7fb758016658 handle_acquire_lock: failed to acquire exclusive lock: (22) Invalid argument
2022-03-17T20:13:45.379+0000 7fb767fff700 -1 librbd::mirror::snapshot::CreateNonPrimaryRequest: 0x7fb758015fc0 handle_create_snapshot: failed to create mirror snapshot: (30) Read-only file system
2022-03-17T20:13:45.379+0000 7fb767fff700 -1 librbd::mirror::snapshot::PromoteRequest: 0x7fb758016900 handle_create_orphan_snapshot: failed to create orphan snapshot: (30) Read-only file system
2022-03-17T20:13:45.379+0000 7fb767fff700 -1 librbd::mirror::DisableRequest: 0x5555d4ddd8b0 handle_promote_image: failed to promote image: (30) Read-only file system
2022-03-17T20:13:45.379+0000 7fb786857200 -1 librbd::api::Mirror: image_disable: cannot disable mirroring: (30) Read-only file system
command terminated with exit code 30
[ideepika@senta03 hack]$ ./tbox.sh cepheast rbd rm --image replicapool/test-demote-sb --debug-rbd 0 --debug-ms 0
2022-03-17T20:13:59.917+0000 7f13b7fff700 -1 librbd::mirror::DisableRequest: 0x56241e0878b0 handle_get_mirror_info: mirrored image is not primary, add force option to disable mirroring
2022-03-17T20:13:59.917+0000 7f13d6603200 -1 librbd::api::Trash: disable_mirroring: failed to disable mirroring: (22) Invalid argument
Removing image: 0% complete...failed.
rbd: delete error: (22) Invalid argument
command terminated with exit code 22
</pre>
<p>The image had no watchers; disabling mirroring by force was not possible because the image is read-only, and hence it could not be removed.</p>
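<p>For reference, a hedged sketch of how the watcher check can be done (standard rbd CLI, not from the original report):</p>
<pre>
# List watchers on the image; an empty "Watchers:" section means none
rbd status replicapool/test-demote-sb
# Inspect the image's mirroring fields
rbd info replicapool/test-demote-sb
</pre>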
rbd - Bug #54448 (Resolved): [rbd-mirror] "failed to unlink local peer from remote image" due to ...
https://tracker.ceph.com/issues/54448
2022-03-02T14:48:05Z
Deepika Upadhyay
<p>This issue is reproducible using 150 images with the reproducer script from <a class="external" href="https://tracker.ceph.com/issues/55852">https://tracker.ceph.com/issues/55852</a>:<br /><pre>
debug 2022-03-01T17:05:22.217+0000 7fa13ac4d700 5 librbd::SnapshotRemoveRequest: 0x564785038d80 trash_snap:
debug 2022-03-01T17:05:22.363+0000 7fa12dc33700 5 librbd::Watcher: 0x564784d9e300 notifications_blocked: blocked=0
debug 2022-03-01T17:05:22.363+0000 7fa12dc33700 10 librbd::Watcher::C_NotifyAck 0x564787a0e2d0 C_NotifyAck: id=25460566169347, handle=94865195372928
debug 2022-03-01T17:05:22.363+0000 7fa12dc33700 10 librbd::ImageWatcher: 0x564784d9e300 remote snap_remove request: .mirror.primary.6e69a78f-b13b-4202-adf2-db6c669df6a9.35fcedad-b1d6-4f7b-848e-443639f0f057
debug 2022-03-01T17:05:22.363+0000 7fa12dc33700 20 librbd::ExclusiveLock: 0x564780fa2820 accept_request=0 (request_type=0)
debug 2022-03-01T17:05:22.363+0000 7fa12dc33700 10 librbd::Watcher::C_NotifyAck 0x564787a0e2d0 finish: r=0
debug 2022-03-01T17:05:22.366+0000 7fa14145a700 20 librbd::watcher::Notifier: 0x564781b94a00 handle_notify: r=-110
debug 2022-03-01T17:05:22.366+0000 7fa14145a700 20 librbd::watcher::Notifier: 0x564781b94a00 handle_notify: pending=0
debug 2022-03-01T17:05:22.366+0000 7fa148468700 20 librbd::watcher::Notifier: 0x564781b94a00 notify: pending=1
debug 2022-03-01T17:05:22.366+0000 7fa148468700 10 rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest: 0x564787c00d80 finish: r=-110
debug 2022-03-01T17:05:22.366+0000 7fa148468700 -1 rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest: 0x564787c00d80 finish: resending after timeout
debug 2022-03-01T17:05:22.366+0000 7fa148468700 10 rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest: 0x564787c00d80 send
debug 2022-03-01T17:05:22.366+0000 7fa148468700 10 rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest: 0x564787c00d80 send: sending to 11037
debug 2022-03-01T17:05:22.367+0000 7fa141c5b700 5 librbd::Watcher: 0x564781029c00 notifications_blocked: blocked=0
debug 2022-03-01T17:05:22.367+0000 7fa141c5b700 10 librbd::Watcher::C_NotifyAck 0x56479abb6d70 C_NotifyAck: id=43748537016369, handle=94865132570880
debug 2022-03-01T17:05:22.367+0000 7fa141c5b700 10 rbd::mirror::InstanceWatcher: 0x564781029c00 handle_notify: notify_id=43748537016369, handle=94865132570880, notifier_id=11037
debug 2022-03-01T17:05:22.367+0000 7fa141c5b700 10 rbd::mirror::InstanceWatcher: 0x564781029c00 handle_payload: sync_start: instance_id=11037, request_id=3438
debug 2022-03-01T17:05:22.367+0000 7fa141c5b700 10 rbd::mirror::InstanceWatcher: 0x564781029c00 prepare_request: instance_id=11037, request_id=3438
debug 2022-03-01T17:05:22.367+0000 7fa141c5b700 10 rbd::mirror::InstanceWatcher: 0x564781029c00 prepare_request: duplicate for in-progress request
debug 2022-03-01T17:05:22.385+0000 7fa12d432700 5 librbd::Watcher: 0x56478534e300 notifications_blocked: blocked=0
debug 2022-03-01T17:05:22.385+0000 7fa12d432700 10 librbd::Watcher::C_NotifyAck 0x5647855bde00 C_NotifyAck: id=25460566169348, handle=94865161705088
debug 2022-03-01T17:05:22.385+0000 7fa12d432700 10 librbd::ImageWatcher: 0x56478534e300 remote snap_remove request: .mirror.primary.0251e40e-7a21-4b82-b7ba-d3daa42a08f5.ea71c53d-2759-4e0b-bf92-e25d5d76a6a6
debug 2022-03-01T17:05:22.385+0000 7fa12d432700 20 librbd::ExclusiveLock: 0x564780f9fa00 accept_request=0 (request_type=0)
debug 2022-03-01T17:05:22.385+0000 7fa12d432700 10 librbd::Watcher::C_NotifyAck 0x5647855bde00 finish: r=0
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 20 librbd::watcher::Notifier: 0x564784d9e380 handle_notify: r=0
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 20 librbd::watcher::Notifier: 0x564784d9e380 handle_notify: pending=0
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 20 librbd::image_watcher::NotifyLockOwner: 0x564785107b80 handle_notify: r=0
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 20 librbd::Operations: handle_remote_request: r=-30
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 20 librbd::ImageState: 0x564786eabe80 handle_update_notification: refresh_seq = 1, last_refresh = 0
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 20 librbd::ImageState: 0x564784c220e0 ImageUpdateWatchers::notify
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 20 librbd::ImageState: 0x564784c220e0 ImageUpdateWatchers::send_notify: handle=0, watcher=0x56479acd0760
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 15 librbd::mirror::snapshot::UnlinkPeerRequest: 0x5647864b6140 handle_remove_snapshot: r=-30
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 -1 librbd::mirror::snapshot::UnlinkPeerRequest: 0x5647864b6140 handle_remove_snapshot: failed to remove snapshot: (30) Read-only file system
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 15 librbd::mirror::snapshot::UnlinkPeerRequest: 0x5647864b6140 finish: r=-30
debug 2022-03-01T17:05:22.598+0000 7fa121c1b700 20 librbd::ImageState: 0x564784c220e0 ImageUpdateWatchers::handle_notify: handle=0, watcher=0x56479acd0760
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x56479af89800 handle_unlink_peer: r=-30
debug 2022-03-01T17:05:22.598+0000 7fa12dc33700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x56479af89800 handle_unlink_peer: failed to unlink local peer from remote image: (30) Read-only file system
</pre></p>
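<p>For reference, r=-30 is -EROFS, as the log spells out. A hedged sketch for inspecting the mirror snapshots that fail to unlink (pool/image names hypothetical):</p>
<pre>
# List all snapshots including mirror-internal ones; the .mirror.primary.*
# snapshots from the log show up in the NAMESPACE column
rbd snap ls --all replicapool/image1
</pre>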
rbd - Bug #54319 (New): rbd_mirror: TestImageSync.SnapshotStress: FAILED ceph_assert(object_no <...
https://tracker.ceph.com/issues/54319
2022-02-17T18:41:15Z
Deepika Upadhyay
<pre>
[ RUN ] TestImageSync.SnapshotStress
../src/librbd/ObjectMap.cc: In function 'uint8_t librbd::ObjectMap<ImageCtxT>::operator[](uint64_t) const [with ImageCtxT = librbd::ImageCtx; uint8_t = unsigned char; uint64_t = long unsigned int]' thread 7fac9bbea700 time 2022-02-17T10:40:34.963062+0000
../src/librbd/ObjectMap.cc: 79: FAILED ceph_assert(object_no < m_object_map.size())
ceph version 17.0.0-10768-g74b7fe82641 (74b7fe826412ae29748bfccbdb23f42c33ac42d6) quincy (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12e) [0x7face359981d]
2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7face3599a4f]
3: (librbd::ObjectMap<librbd::ImageCtx>::operator[](unsigned long) const+0x8e) [0x55e08ffc9984]
4: (librbd::ObjectMap<librbd::ImageCtx>::object_may_exist(unsigned long) const+0x9e) [0x55e08ffcb238]
5: (librbd::io::AbstractObjectWriteRequest<librbd::ImageCtx>::send()+0x2ba) [0x55e09036564a]
6: (librbd::io::ObjectDispatch<librbd::ImageCtx>::write(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&&, std::shared_ptr<neorados::IOContext>, int, int, std::optional<unsigned long>, ZTracer::Trace const&, int*, unsigned long*, librbd::io::DispatchResult*, Context**, Context*)+0xa6b) [0x55e0903585d1]
7: (librbd::io::ObjectDispatcher<librbd::ImageCtx>::send_dispatch(librbd::io::ObjectDispatchInterface*, librbd::io::ObjectDispatchSpec*)+0x322) [0x55e090166678]
8: (librbd::io::Dispatcher<librbd::ImageCtx, librbd::io::ObjectDispatcherInterface>::send(librbd::io::ObjectDispatchSpec*)+0xce) [0x55e09016b154]
9: (librbd::io::ObjectDispatchSpec::send()+0x11) [0x55e09035c40b]
10: (librbd::io::AbstractImageWriteRequest<librbd::ImageCtx>::send_object_requests(boost::container::small_vector<striper::LightweightObjectExtent, 4ul, void, void> const&, std::shared_ptr<neorados::IOContext>, unsigned long)+0x3f1) [0x55e090355ec3]
11: (librbd::io::AbstractImageWriteRequest<librbd::ImageCtx>::send_request()+0x1b6) [0x55e0903561ac]
12: (librbd::io::ImageRequest<librbd::ImageCtx>::send()+0x127) [0x55e0903528fb]
13: (librbd::io::ImageRequest<librbd::ImageCtx>::aio_write(librbd::ImageCtx*, librbd::io::AioCompletion*, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > >&&, ceph::buffer::v15_2_0::list&&, std::shared_ptr<neorados::IOContext>, int, ZTracer::Trace const&)+0x1a5) [0x55e090352e15]
14: (librbd::journal::Replay<librbd::ImageCtx>::handle_event(librbd::journal::AioWriteEvent const&, Context*, Context*)+0x3e9) [0x55e0901c6589]
15: (librbd::journal::Replay<librbd::ImageCtx>::EventVisitor const::result_type boost::variant<boost::detail::variant::over_sequence<boost::mpl::l_item<mpl_::long_<21l>, librbd::journal::AioDiscardEvent, boost::mpl::l_item<mpl_::long_<20l>, librbd::journal::AioWriteEvent, boost::mpl::l_item<mpl_::long_<19l>, librbd::journal::AioFlushEvent, boost::mpl::l_item<mpl_::long_<18l>, librbd::journal::OpFinishEvent, boost::mpl::l_item<mpl_::long_<17l>, librbd::journal::SnapCreateEvent, boost::mpl::l_item<mpl_::long_<16l>, librbd::journal::SnapRemoveEvent, boost::mpl::l_item<mpl_::long_<15l>, librbd::journal::SnapRenameEvent, boost::mpl::l_item<mpl_::long_<14l>, librbd::journal::SnapProtectEvent, boost::mpl::l_item<mpl_::long_<13l>, librbd::journal::SnapUnprotectEvent, boost::mpl::l_item<mpl_::long_<12l>, librbd::journal::SnapRollbackEvent, boost::mpl::l_item<mpl_::long_<11l>, librbd::journal::RenameEvent, boost::mpl::l_item<mpl_::long_<10l>, librbd::journal::ResizeEvent, boost::mpl::l_item<mpl_::long_<9l>, librbd::journal::FlattenEvent, boost::mpl::l_item<mpl_::long_<8l>, librbd::journal::DemotePromoteEvent, boost::mpl::l_item<mpl_::long_<7l>, librbd::journal::SnapLimitEvent, boost::mpl::l_item<mpl_::long_<6l>, librbd::journal::UpdateFeaturesEvent, boost::mpl::l_item<mpl_::long_<5l>, librbd::journal::MetadataSetEvent, boost::mpl::l_item<mpl_::long_<4l>, librbd::journal::MetadataRemoveEvent, boost::mpl::l_item<mpl_::long_<3l>, librbd::journal::AioWriteSameEvent, boost::mpl::l_item<mpl_::long_<2l>, librbd::journal::AioCompareAndWriteEvent, boost::mpl::l_item<mpl_::long_<1l>, librbd::journal::UnknownEvent, boost::mpl::l_end> > > > > > > > > > > > > > > > > > > > > >>::apply_visitor<librbd::journal::Replay<librbd::ImageCtx>::EventVisitor const>(librbd::journal::Replay<librbd::ImageCtx>::EventVisitor const&) const &+0x55) [0x55e0901c7137]
16: (librbd::journal::Replay<librbd::ImageCtx>::process(librbd::journal::EventEntry const&, Context*, Context*)+0x297) [0x55e0901c75d3]
17: (librbd::Journal<librbd::ImageCtx>::handle_replay_ready()+0x5c7) [0x55e08ffb895d]
18: (librbd::Journal<librbd::ImageCtx>::handle_replay_process_ready(int)+0x85) [0x55e08ffb8a4f]
19: (librbd::util::detail::C_CallbackAdapter<librbd::Journal<librbd::ImageCtx>, &librbd::Journal<librbd::ImageCtx>::handle_replay_process_ready>::finish(int)+0xd) [0x55e08ffb8bd5]
20: (Context::complete(int)+0x9) [0x55e08f9554cb]
21: (boost::asio::detail::completion_handler<librbd::asio::ContextWQ::queue(Context*, int)::{lambda()#1}, boost::asio::io_context::basic_executor_type<std::allocator<void>, 0u> >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0x9b) [0x55e08f962a5e]
22: (boost::asio::detail::strand_service::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0x90) [0x55e08ff5965a]
23: (boost::asio::detail::scheduler::run(boost::system::error_code&)+0x49c) [0x55e09036fa00]
24: (std::thread::_State_impl<std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1, auto:2&&)#1}, {lambda()#1}> > >::_M_run()+0x59) [0x55e0905bc7d7]
25: /lib64/libstdc++.so.6(+0xc2ba3) [0x7face0972ba3]
26: /lib64/libpthread.so.0(+0x817f) [0x7face4d4917f]
</pre>
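<p>A sketch for re-running just the failing test (binary name assumed from the ceph build tree; it needs a running cluster, e.g. vstart):</p>
<pre>
cd build
# Run only TestImageSync.SnapshotStress from the rbd-mirror test suite
./bin/ceph_test_rbd_mirror --gtest_filter=TestImageSync.SnapshotStress
</pre>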
rbd - Tasks #54312 (Resolved): combine the journal and snapshot test scripts
https://tracker.ceph.com/issues/54312
2022-02-17T06:48:30Z
Deepika Upadhyay
<p>Combine the journal and snapshot test scripts into one generic script that tests either journal- or snapshot-based mirroring depending on the value of an env variable, to avoid code duplication. A minimal sketch of the intended dispatch is below.</p>
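<p>A minimal sketch of the env-variable dispatch (the variable, pool, and image names are hypothetical):</p>
<pre>
#!/usr/bin/env bash
# MIRROR_IMAGE_MODE selects which mirroring flavor the shared script exercises
MIRROR_IMAGE_MODE=${MIRROR_IMAGE_MODE:-journal}
POOL=${POOL:-rbd}
IMAGE=${IMAGE:-image1}

case "$MIRROR_IMAGE_MODE" in
    journal)  rbd mirror image enable "$POOL/$IMAGE" journal ;;
    snapshot) rbd mirror image enable "$POOL/$IMAGE" snapshot ;;
    *) echo "unknown MIRROR_IMAGE_MODE: $MIRROR_IMAGE_MODE" >&2; exit 1 ;;
esac
</pre>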
rbd - Bug #54259 (New): rbd_csi: mirroring state lost(unknown) in csi rbd env
https://tracker.ceph.com/issues/54259
2022-02-11T14:51:03Z
Deepika Upadhyay
<pre>
rbd image 'test-demote-sb':
size 1 GiB in 256 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 691f95312d38
block_name_prefix: rbd_data.691f95312d38
format: 2
features: layering, exclusive-lock, object-map, fast-diff, non-primary
op_features:
flags:
create_timestamp: Fri Feb 11 13:33:19 2022
access_timestamp: Fri Feb 11 13:33:19 2022
modify_timestamp: Fri Feb 11 13:33:19 2022
mirroring state: unknown
mirroring mode: snapshot
mirroring global id: 3c20e812-aeb2-40b3-ad3b-8dd421d370b5
mirroring primary: false
</pre>
<p>vanilla ceph:</p>
<pre>
rbd image 'a1':
size 1 GiB in 256 objects
order 22 (4 MiB objects)
snapshot_count: 1
id: 10ecdc486a57
block_name_prefix: rbd_data.10ecdc486a57
format: 2
features: layering, exclusive-lock, object-map, fast-diff, non-primary
op_features:
flags:
create_timestamp: Fri Feb 11 14:29:16 2022
access_timestamp: Fri Feb 11 14:29:16 2022
modify_timestamp: Fri Feb 11 14:29:16 2022
2022-02-11T14:50:31.531+0000 7fed807a1400 20 librbd::api::Image: get_parent: image_ctx=0x55ec5fb4e650
mirroring state: enabled
mirroring mode: snapshot
mirroring global id: c91fc91e-786f-49a2-a16e-69e664c5bf57
mirroring primary: false
</pre>
<p>Restarting doesn't help; the mirroring state remains unknown.<br />The same test with the same branch works fine on vanilla Ceph. A sketch of the checks follows.</p>
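<p>A sketch of the image- and pool-level checks in the CSI environment (pool name assumed from the related trackers):</p>
<pre>
# Image-level mirroring fields (the "mirroring state: unknown" line above)
rbd info replicapool/test-demote-sb
# Pool-level mirror status for cross-checking
rbd mirror pool status replicapool --verbose
</pre>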
rbd - Bug #54158 (Closed): jenkins api tests: RBD permission error (error deleting image from trash)
https://tracker.ceph.com/issues/54158
2022-02-04T21:00:42Z
Deepika Upadhyay
<pre>
2022-02-04T18:24:42.355+0000 7fc429176700 -1 librbd::api::Trash: remove: error: deferment time has not expired.
2022-02-04T18:24:42.359+0000 7fc429176700 0 [dashboard ERROR taskexec] Error while calling Task(ns=rbd/trash/remove, md={'image_id_spec': 'rbd/6e0f356c0547'})
Traceback (most recent call last):
File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/dashboard/services/exception.py", line 61, in handle_rbd_error
yield
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/dashboard/controllers/rbd.py", line 445, in delete
return rbd_call(pool_name, namespace, self.rbd_inst.trash_remove, image_id,
File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/dashboard/services/rbd.py", line 87, in rbd_call
func(ioctx, *args, **kwargs)
File "rbd.pyx", line 876, in rbd.RBD.trash_remove
rbd.PermissionError: [errno 1] RBD permission error (error deleting image from trash)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/dashboard/tools.py", line 550, in _run
val = self.task.fn(*self.task.fn_args, **self.task.fn_kwargs) # type: ignore
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/dashboard/services/exception.py", line 63, in handle_rbd_error
raise DashboardException(e, component='rbd')
dashboard.exceptions.DashboardException: [errno 1] RBD permission error (error deleting image from trash)
2022-02-04T18:24:42.359+0000 7fc417dd4700 0 [dashboard INFO taskmgr] finished Task(ns=rbd/trash/remove, md={'
</pre>
<p><a class="external" href="https://jenkins.ceph.com/job/ceph-api/31358/consoleFull#-811452663e4bfde06-44c1-4b3c-8379-d9ee175fb257">https://jenkins.ceph.com/job/ceph-api/31358/consoleFull#-811452663e4bfde06-44c1-4b3c-8379-d9ee175fb257</a></p>
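<p>For reference, the underlying failure is the trash deferment window; a hedged CLI equivalent of what the API test attempts (pool/id taken from the log above):</p>
<pre>
# Removing a trashed image before its deferment time expires is refused
# with "deferment time has not expired"
rbd trash rm --pool rbd 6e0f356c0547
</pre>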
rbd - Bug #54098 (Duplicate): [rbd-mirror] hangs forever after split brain, while searching for d...
https://tracker.ceph.com/issues/54098
2022-02-01T07:21:39Z
Deepika Upadhyay
<p>This is not readily reproducible in a Ceph dev env, but these are the operations that lead to this condition:</p>
<p>Perform an IO workload for a substantial time (~1 hr) on the primary cluster c1.<br />While the workload is running, fail over to the secondary cluster c2; demote the image.</p>
<p>This leads to split-brain:<br /><pre>
2021-11-19T13:45:06.549+0000 7f1eccd83700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55dd936b6000 scan_remote_mirror_snapshots: split-brain detected: failed to find matching non-primary snapshot in remote image: local_snap_id_start=9339, local_snap_ns=[mirror state=primary (demoted), complete=1, mirror_peer_uuids=c3249fb1-852f-4253-8ffa-cc4117d17a21, primary_mirror_uuid=, primary_snap_id=head, last_copied_object_number=0, snap_seqs={}]
586581 2021-11-19T13:45:06.549+0000 7f1eccd83700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55dd936b6000 notify_status_updated:
</pre></p>
<p>This leads to missing demotion snapshots that are never found:<br /><pre>
2021-11-19T15:19:06.108+0000 7f1eccd83700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55dd95eb6400 scan_remote_mirror_snapshots: remote mirror snapshot: id=7451, mirror_ns=[mirror state=non-primary (demoted), complete=1, mirror_peer_uuids=e0e53715-2be7-46e3-8001-d8048321babc, primary_mirror_uuid=ad4d4f8b-4b43-49e3-8d83-768c763e9a4b, primary_snap_id=1359, last_copied_object_number=12800, snap_seqs={4953=18446744073709551614}]
2021-11-19T15:19:06.108+0000 7f1eccd83700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55dd95eb6400 scan_remote_mirror_snapshots: skipping remote snapshot 7451 while searching for demotion
</pre></p>
<p>complete logs: <a class="external" href="https://drive.google.com/file/d/1FfVT84daBKZgXJkwuWbhDUZQzEiaI8pU/view?usp=sharing">https://drive.google.com/file/d/1FfVT84daBKZgXJkwuWbhDUZQzEiaI8pU/view?usp=sharing</a></p>
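<p>A sketch of the failover sequence described above (cluster names from the description; pool/image names hypothetical):</p>
<pre>
# On the primary cluster c1: demote while the IO workload is still running
rbd --cluster c1 mirror image demote replicapool/image1
# On the secondary cluster c2: promote to take over
rbd --cluster c2 mirror image promote replicapool/image1
</pre>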
rbd - Bug #53854 (Resolved): test_librbd: possible memory leak in getting group_info for image
https://tracker.ceph.com/issues/53854
2022-01-12T15:50:20Z
Deepika Upadhyay
<p>Not sure whether this is a real bug or a false alarm, but recording the failure just in case:</p>
<pre>
2022-01-07T10:41:24.579 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== HEAP SUMMARY:
2022-01-07T10:41:24.580 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== in use at exit: 6,636 bytes in 24 blocks
2022-01-07T10:41:24.580 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== total heap usage: 73,736,674 allocs, 73,736,650 frees, 126,212,815,186 bytes allocated
2022-01-07T10:41:24.580 INFO:tasks.workunit.client.0.smithi123.stderr:==37500==
2022-01-07T10:41:24.600 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== 1 bytes in 1 blocks are definitely lost in loss record 1 of 14
2022-01-07T10:41:24.600 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== at 0x4C34F0B: malloc (vg_replace_malloc.c:307)
2022-01-07T10:41:24.601 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0x876B80D: strdup (in /usr/lib64/libc-2.28.so)
2022-01-07T10:41:24.601 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0x8D9AA0: group_info_cpp_to_c (librbd.cc:221)
2022-01-07T10:41:24.601 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0x8D9AA0: rbd_get_group (librbd.cc:5264)
2022-01-07T10:41:24.601 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0x870B18: TestGroup_add_image_Test::TestBody() (test_Groups.cc:101)
2022-01-07T10:41:24.601 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFD43ED: HandleSehExceptionsInMethodIfSupported<testing::Test, void> (gtest.cc:2605)
2022-01-07T10:41:24.602 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFD43ED: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2641)
2022-01-07T10:41:24.602 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFC825A: Run (gtest.cc:2680)
2022-01-07T10:41:24.602 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFC825A: testing::Test::Run() (gtest.cc:2670)
2022-01-07T10:41:24.603 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFC83C4: Run (gtest.cc:2858)
2022-01-07T10:41:24.603 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFC83C4: testing::TestInfo::Run() (gtest.cc:2831)
2022-01-07T10:41:24.603 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFC8920: Run (gtest.cc:3012)
2022-01-07T10:41:24.603 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFC8920: testing::TestSuite::Run() (gtest.cc:2991)
2022-01-07T10:41:24.604 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFCA534: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5723)
2022-01-07T10:41:24.604 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFD495D: HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (gtest.cc:2605)
2022-01-07T10:41:24.604 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFD495D: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2641)
2022-01-07T10:41:24.604 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0xFC848F: testing::UnitTest::Run() (gtest.cc:5306)
2022-01-07T10:41:24.605 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0x6A698C: RUN_ALL_TESTS (gtest.h:2486)
2022-01-07T10:41:24.605 INFO:tasks.workunit.client.0.smithi123.stderr:==37500== by 0x6A698C: main (test_main.cc:70)
</pre>
<p>Seen in: <a class="external" href="https://trello.com/c/ZochLpMG/1413-wip-yuri6-testing-2022-01-05-1255">https://trello.com/c/ZochLpMG/1413-wip-yuri6-testing-2022-01-05-1255</a>,<br />which is mostly a filestore change and seems unrelated.</p>
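<p>A sketch for re-running the leaking test under valgrind as in the failure above (binary and filter names assumed from the ceph build tree and the gtest frames in the trace):</p>
<pre>
cd build
# Needs a running (e.g. vstart) cluster; reproduces the strdup leak in
# group_info_cpp_to_c reported by valgrind above
valgrind --leak-check=full --error-exitcode=1 \
    ./bin/ceph_test_librbd --gtest_filter=TestGroup.add_image
</pre>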
rbd - Bug #53826 (Closed): rbd-mirror: captures debug log in rbd mirror status last update
https://tracker.ceph.com/issues/53826
2022-01-11T08:28:59Z
Deepika Upadhyay
<pre>
2022-01-11T08:26:07.413+0000 7f8f88f25700 10 librbd::mirror::GetInfoRequest: 0x560af48fbbd0 calc_promotion_state: promotion_state=1, primary_mirror_uuid=a1:
global_id: 612080f8-b205-4679-bca4-8ab2644b672f
state: up+stopped
description: local image is primary
service: admin on vossi03
last_update: 2022-01-11T08:26:07.413+0000 7f8f88f25700 20 librbd::mirror::GetInfoRequest: 0x560af48fbbd0 finish: r=02022-01-11 08:25:44
peer_sites:
name: primary
state: up+replaying
description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"remote_snapshot_timestamp":1641889534,"replay_state":"idle"}
last_update: 2022-01-11 08:25:44
snapshots:
</pre>
<pre>
last_update: 2022-01-11T08:26:07.413+0000 7f8f88f25700 20 librbd::mirror::GetInfoRequest: 0x560af48fbbd0 finish: r=02022-01-11 08:25:44
</pre>
<p>It should be just:</p>
<pre>
last_update: 2022-01-11 08:25:44
</pre>
<p>A similar instance elsewhere:<br /><pre>
state: up+replaying2022-01-11T08:40:44.841+0000 7f02592af380 20 librbd::api::Snapshot: list: snap_list 0x557449d925c0
</pre></p>
<p>Snap listing also has this issue:<br /><pre>
SNAPID 2022-01-11T15:45:39.722+0000 7f3729ffb700 5 librbd::io::Dispatcher: 0x55fea5765fc0 register_dispatch: dispatch_layer=1NAME
SIZE PROTECTED TIMESTAMP 2022-01-11T15:45:39.722+0000 7f3729ffb700 10 librbd::ImageState: 0x55fea5765f40 0x55fea5765f40 handle_open: r=0
NAMESPACE
183 .mirror.primary.ce3375e2-b6e8-41c3-b959-d3ae6f517bb0.af63dda5-994c-418f-a1b2-13197fa5cf03 1 GiB 2022-01-11T15:45:39.722+0000 7f3738329380 20 librbd::api::Snapshot: list: snap_list 0x55fea571e840Tue Jan 11 15:41:12 2022 mirror (demoted peer_uuids:[4ab979b6-f1b9-437f-a3a1-a99afa321acf])
</pre></p>
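<p>A hedged sketch of how the interleaving can be reproduced (image spec hypothetical; any command that prints status fields while client-side debug logging is enabled should do):</p>
<pre>
# Debug lines emitted by librbd get mixed into the formatted status output
rbd mirror image status replicapool/test1 --debug-rbd 20
</pre>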
rbd - Bug #53713 (Closed): [pwl] persistent cache hits abort when running rbd bench
https://tracker.ceph.com/issues/53713
2021-12-23T12:01:42Z
Deepika Upadhyay
<p>1. Clone the latest pacific upstream branch.<br />2. Compile Ceph with -DWITH_RBD_SSD_CACHE=ON -DWITH_RBD_RWL=ON.<br />3. Start vstart with the pwl cache:<br /><pre>
MON=1 OSD=1 MGR=0 MDS=0 RGW=0 ../src/vstart.sh -n -d -o "debug librbd = 20
debug rbd = 20
rbd_default_features = 61
rbd_persistent_cache_mode = rwl
rbd_persistent_cache_path = /home/ideepika/ceph2/ceph/build/write_back_cache
rbd_persistent_cache_size = 1073741824
rbd_plugins = pwl_cache"
</pre></p>
<p>Cache status during the rbd bench run:<br /><pre>
Watchers:
watcher=172.21.10.3:0/3672603674 client.4113 cookie=140392213283296
Image cache state: {"present":"true","empty":"true","clean":"true","cache_type":"rwl","pwl_host":"vossi03","pwl_path":"/home/ideepika/ceph2/ceph/build/write_back_cache/rbd-pwl.test1.100d7102aa61.pool","pwl_size":1073741824}
</pre></p>
<p>4. Create a pool and an image, and run rbd bench:<br /><pre>
bin/ceph osd pool create test1 128;bin/rbd create --size 1024 test1/image1;bin/rbd ls test1;bin/rbd bench --io-type write image1 --pool=test1 --io-threads=16
</pre></p>
<p>The IOs error out, hitting ceph_abort with the stack trace below; I have yet to verify it on master.</p>
<pre>
/home/ideepika/ceph2/ceph/src/common/lockdep.cc: 171: ceph_abort_msg("abort() called")
ceph version 16.2.4-2006-gbb6237f728a (bb6237f728a28d0a16efe3ac8107d7882d53d03c) pacific (stable)
1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xfe) [0x7fafe347b926]
2: /home/ideepika/ceph2/ceph/build/lib/libceph-common.so.2(+0x17f0e46) [0x7fafe3580e46]
3: (lockdep_register(char const*)+0x24) [0x7fafe358123d]
4: (ceph::mutex_debug_detail::mutex_debugging_base::_register()+0x20) [0x7fafe358e48a]
5: (ceph::mutex_debug_detail::mutex_debugging_base::mutex_debugging_base(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool)+0x9a) [0x7fafe358e3ec]
6: (ceph::mutex_debug_detail::mutex_debug_impl<false>::mutex_debug_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool)+0x66) [0x55825221f108]
7: (ceph::mutex_debug_detail::mutex_debug_impl<false> ceph::make_mutex<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&)+0x58) [0x558252405e62]
8: (librbd::cache::pwl::WriteLogEntry::WriteLogEntry(std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>, unsigned long, unsigned long)+0x11d) [0x7faf9af9530b]
9: (librbd::cache::pwl::rwl::WriteLogEntry::WriteLogEntry(std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>, unsigned long, unsigned long)+0x56) [0x7faf9afb63b8]
10: (void __gnu_cxx::new_allocator<librbd::cache::pwl::rwl::WriteLogEntry>::construct<librbd::cache::pwl::rwl::WriteLogEntry, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&>(librbd::cache::pwl::rwl::WriteLogEntry*, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&)+0x99) [0x7faf9afc5e1d]
11: (void std::allocator_traits<std::allocator<librbd::cache::pwl::rwl::WriteLogEntry> >::construct<librbd::cache::pwl::rwl::WriteLogEntry, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&>(std::allocator<librbd::cache::pwl::rwl::WriteLogEntry>&, librbd::cache::pwl::rwl::WriteLogEntry*, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&)+0x62) [0x7faf9afc561e]
12: (std::_Sp_counted_ptr_inplace<librbd::cache::pwl::rwl::WriteLogEntry, std::allocator<librbd::cache::pwl::rwl::WriteLogEntry>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&>(std::allocator<librbd::cache::pwl::rwl::WriteLogEntry>, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&)+0xd5) [0x7faf9afc48f5]
13: (std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<librbd::cache::pwl::rwl::WriteLogEntry, std::allocator<librbd::cache::pwl::rwl::WriteLogEntry>, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&>(librbd::cache::pwl::rwl::WriteLogEntry*&, std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::rwl::WriteLogEntry> >, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&)+0x100) [0x7faf9afc3a38]
14: (std::__shared_ptr<librbd::cache::pwl::rwl::WriteLogEntry, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<librbd::cache::pwl::rwl::WriteLogEntry>, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&>(std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::rwl::WriteLogEntry> >, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&)+0x7a) [0x7faf9afc336e]
15: (std::shared_ptr<librbd::cache::pwl::rwl::WriteLogEntry>::shared_ptr<std::allocator<librbd::cache::pwl::rwl::WriteLogEntry>, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&>(std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::rwl::WriteLogEntry> >, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&)+0x67) [0x7faf9afc2eef]
16: (std::shared_ptr<librbd::cache::pwl::rwl::WriteLogEntry> std::allocate_shared<librbd::cache::pwl::rwl::WriteLogEntry, std::allocator<librbd::cache::pwl::rwl::WriteLogEntry>, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&>(std::allocator<librbd::cache::pwl::rwl::WriteLogEntry> const&, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&)+0x62) [0x7faf9afc289d]
17: (std::shared_ptr<librbd::cache::pwl::rwl::WriteLogEntry> std::make_shared<librbd::cache::pwl::rwl::WriteLogEntry, std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&>(std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>&, unsigned long&, unsigned long&)+0x7e) [0x7faf9afc1f10]
18: (librbd::cache::pwl::rwl::Builder<librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx> >::create_write_log_entry(std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>, unsigned long, unsigned long)+0x43) [0x7faf9afc1693]
19: (librbd::cache::pwl::C_WriteRequest<librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx> >::create_operation(unsigned long, unsigned long)+0xbf) [0x7faf9af80849]
20: (librbd::cache::pwl::C_WriteRequest<librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx> >::setup_log_operations(librbd::cache::pwl::DeferredContexts&)+0x4bf) [0x7faf9af80dc7]
21: (librbd::cache::pwl::C_WriteRequest<librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx> >::dispatch()+0x1d1) [0x7faf9af8057f]
22: (librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx>::alloc_and_dispatch_io_req(librbd::cache::pwl::C_BlockIORequest<librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx> >*)+0x205) [0x7faf9af3386d]
23: (librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx>::write(std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > >&&, ceph::buffer::v15_2_0::list&&, int, Context*)::{lambda(librbd::cache::pwl::GuardedRequestFunctionContext&)#2}::operator()(librbd::cache::pwl::GuardedRequestFunctionContext&) const+0x41) [0x7faf9af3acbb]
24: (boost::detail::function::void_function_obj_invoker1<librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx>::write(std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > >&&, ceph::buffer::v15_2_0::list&&, int, Context*)::{lambda(librbd::cache::pwl::GuardedRequestFunctionContext&)#2}, void, librbd::cache::pwl::GuardedRequestFunctionContext&>::invoke(boost::detail::function::function_buffer&, librbd::cache::pwl::GuardedRequestFunctionContext&)+0x2b) [0x7faf9af51002]
25: (boost::function1<void, librbd::cache::pwl::GuardedRequestFunctionContext&>::operator()(librbd::cache::pwl::GuardedRequestFunctionContext&) const+0x6c) [0x7faf9af3a288]
26: (librbd::cache::pwl::GuardedRequestFunctionContext::finish(int)+0x42) [0x7faf9af2856c]
27: (Context::complete(int)+0x27) [0x5582522535e1]
28: (librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx>::detain_guarded_request(librbd::cache::pwl::C_BlockIORequest<librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx> >*, librbd::cache::pwl::GuardedRequestFunctionContext*, bool)+0x249) [0x7faf9af2e827]
29: (librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx>::write(std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > >&&, ceph::buffer::v15_2_0::list&&, int, Context*)+0x515) [0x7faf9af2b2b5]
30: (librbd::cache::WriteLogImageDispatch<librbd::ImageCtx>::write(librbd::io::AioCompletion*, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > >&&, ceph::buffer::v15_2_0::list&&, std::shared_ptr<neorados::IOContext>, int, ZTracer::Trace const&, unsigned long, std::atomic<unsigned int>*, librbd::io::DispatchResult*, Context**, Context*)+0x282) [0x7faf9af1890e]
</pre>
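<p>One hedged diagnostic: frames 3-5 of the trace sit inside lockdep's mutex registration, so disabling lockdep in the vstart options from step 3 can help confirm whether the abort is lockdep id exhaustion rather than a pwl bug (an assumption, not a verified root cause):</p>
<pre>
# Append to the -o option string used in step 3
lockdep = false
</pre>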
mgr - Bug #53569 (Rejected): zipfile.BadZipFile: Bad magic number for central directory
https://tracker.ceph.com/issues/53569
2021-12-09T18:52:52Z
Deepika Upadhyay
<pre>
zipfile.BadZipFile: Bad magic number for central directory
Traceback (most recent call last):
File "/home/jenkins-build/build/workspace/ceph-pull-requests/build/mgr-dashboard-py3-virtualenv/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 164, in exc_logging_wrapper
status = run_func(*args)
File "/home/jenkins-build/build/workspace/ceph-pull-requests/build/mgr-dashboard-py3-virtualenv/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
return func(self, options, args)
File "/home/jenkins-build/build/workspace/ceph-pull-requests/build/mgr-dashboard-py3-virtualenv/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 360, in run
_, build_failures = build(
File "/home/jenkins-build/build/workspace/ceph-pull-requests/build/mgr-dashboard-py3-virtualenv/lib/python3.8/site-packages/pip/_internal/wheel_builder.py", line 346, in build
wheel_file = _build_one(
File "/home/jenkins-build/build/workspace/ceph-pull-requests/build/mgr-dashboard-py3-virtualenv/lib/python3.8/site-packages/pip/_internal/wheel_builder.py", line 227, in _build_one
_verify_one(req, wheel_path)
File "/home/jenkins-build/build/workspace/ceph-pull-requests/build/mgr-dashboard-py3-virtualenv/lib/python3.8/site-packages/pip/_internal/wheel_builder.py", line 174, in _verify_one
dist = get_wheel_distribution(FilesystemWheel(wheel_path), canonical_name)
File "/home/jenkins-build/build/workspace/ceph-pull-requests/build/mgr-dashboard-py3-virtualenv/lib/python3.8/site-packages/pip/_internal/metadata/__init__.py", line 51, in get_wheel_distribution
return Distribution.from_wheel(wheel, canonical_name)
File "/home/jenkins-build/build/workspace/ceph-pull-requests/build/mgr-dashboard-py3-virtualenv/lib/python3.8/site-packages/pip/_internal/metadata/pkg_resources.py", line 37, in from_wheel
with wheel.as_zipfile() as zf:
File "/home/jenkins-build/build/workspace/ceph-pull-requests/build/mgr-dashboard-py3-virtualenv/lib/python3.8/site-packages/pip/_internal/metadata/base.py", line 321, in as_zipfile
return zipfile.ZipFile(self.location, allowZip64=True)
File "/usr/lib/python3.8/zipfile.py", line 1269, in __init__
self._RealGetContents()
File "/usr/lib/python3.8/zipfile.py", line 1364, in _RealGetContents
raise BadZipFile("Bad magic number for central directory")
zipfile.BadZipFile: Bad magic number for central directory
</pre>
<p><a class="external" href="https://jenkins.ceph.com/job/ceph-pull-requests/86953/">https://jenkins.ceph.com/job/ceph-pull-requests/86953/</a></p>
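<p>A hedged guess at remediation for corrupt-wheel failures like this (generic pip hygiene, not specific to this job):</p>
<pre>
# A corrupted cached wheel can raise BadZipFile during install; purging the
# pip cache and retrying the build is a common first step
pip cache purge
</pre>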