Ceph : Issues
https://tracker.ceph.com/
2024-03-28T18:45:10Z
Ceph
Redmine
CephFS - Backport #65214 (In Progress): squid: mds: quiesce_inode op waiting on remote auth pins ...
https://tracker.ceph.com/issues/65214
2024-03-28T18:45:10Z
Backport Bot
<p><a class="external" href="https://github.com/ceph/ceph/pull/56564">https://github.com/ceph/ceph/pull/56564</a></p>
CephFS - Bug #65182 (Pending Backport): mds: quiesce_inode op waiting on remote auth pins is not ...
https://tracker.ceph.com/issues/65182
2024-03-27T16:01:30Z
Patrick Donnelly
pdonnell@redhat.com
<pre>
{
"description": "internal op quiesce_path:mds.1:1048 fp=#0x1/volumes/_nogroup/sv_new_1_def_11/0d61d4d2-d869-46f0-93a0-d9b9e74401c2",
"initiated_at": "2024-03-26T10:06:14.974850+0000",
"age": 101818.022728012,
"duration": 101818.025116246,
"continuous": true,
"type_data": {
"result": -2147483648,
"flag_point": "cleaned up request",
"reqid": {
"entity": {
"type": "mds",
"num": 1
},
"tid": 1048
},
"op_type": "internal_op",
"internal_op": 5384,
"op_name": "quiesce_path",
"events": [
{
"time": "2024-03-26T10:06:14.974850+0000",
"event": "initiated"
},
{
"time": "2024-03-26T10:06:14.974850+0000",
"event": "throttled"
},
{
"time": "2024-03-26T10:06:14.974850+0000",
"event": "header_read"
},
{
"time": "2024-03-26T10:06:14.974850+0000",
"event": "all_read"
},
{
"time": "2024-03-26T10:06:14.974850+0000",
"event": "dispatched"
},
{
"time": "2024-03-26T10:06:14.974869+0000",
"event": "acquired locks"
},
{
"time": "2024-03-26T10:06:14.974879+0000",
"event": "acquired locks"
},
{
"time": "2024-03-26T10:06:14.974888+0000",
"event": "acquired locks"
},
{
"time": "2024-03-26T10:06:14.974898+0000",
"event": "acquired locks"
},
{
"time": "2024-03-26T10:06:21.501232+0000",
"event": "killing request"
},
{
"time": "2024-03-26T10:06:21.501253+0000",
"event": "cleaned up request"
}
],
"locks": []
}
},
...
{
"description": "internal op quiesce_inode:mds.1:1049 fp=#0x100008e255a fp2=#0x100008e255a",
"initiated_at": "2024-03-26T10:06:14.974908+0000",
"age": 101818.022670109,
"duration": 101818.02511086701,
"continuous": true,
"type_data": {
"result": -2147483648,
"flag_point": "quiesce complete for non-auth inode",
"reqid": {
"entity": {
"type": "mds",
"num": 1
},
"tid": 1049
},
"op_type": "internal_op",
"internal_op": 5385,
"op_name": "quiesce_inode",
"events": [
{
"time": "2024-03-26T10:06:14.974908+0000",
"event": "initiated"
},
{
"time": "2024-03-26T10:06:14.974908+0000",
"event": "throttled"
},
{
"time": "2024-03-26T10:06:14.974908+0000",
"event": "header_read"
},
{
"time": "2024-03-26T10:06:14.974908+0000",
"event": "all_read"
},
{
"time": "2024-03-26T10:06:14.974908+0000",
"event": "dispatched"
},
{
"time": "2024-03-26T10:06:14.974977+0000",
"event": "requesting remote authpins"
},
{
"time": "2024-03-26T10:06:21.615411+0000",
"event": "acquired locks"
},
{
"time": "2024-03-26T10:06:21.615458+0000",
"event": "quiesce complete for non-auth inode"
}
],
"locks": [
{
"object": {
"is_auth": false,
"auth_state": {
"replicas": {}
},
"replica_state": {
"authority": [
0,
-2
],
"replica_nonce": 1
},
"auth_pins": 0,
"is_frozen": false,
"is_freezing": false,
"pins": {
"request": 1,
"lock": 1
},
"nref": 2
},
"object_string": "[inode 0x100008e255a [...2ae,head] /volumes/_nogroup/sv_new_1_def_11/0d61d4d2-d869-46f0-93a0-d9b9e74401c2/ rep@0.1 v1696 snaprealm=0x55b78d09f440 f(v0 m2024-03-26T10:05:13.326074+0000 10=2+8) n(v56 rc2024-03-26T10:17:04.624239+0000 b2670077140 31541=28967+2574)/n(v0 rc2024-03-26T09:40:15.892764+0000 b1027604480 138=3+135) (inest mix) (iquiesce lock x=1 by request(mds.1:1049 nref=3)) | request=1 lock=1 0x55b78d1b4580]",
"lock": {
"gather_set": [],
"state": "lock",
"type": "iquiesce",
"is_leased": false,
"num_rdlocks": 0,
"num_wrlocks": 0,
"num_xlocks": 1,
"xlock_by": {
"reqid": {
"entity": {
"type": "mds",
"num": 1
},
"tid": 1049
}
}
},
"flags": 4,
"wrlock_target": -1
}
]
}
},
</pre>
<p>This is an op dump from a QE test cluster. The quiesce_path op was killed, and shortly afterward the quiesce_inode op received its remote authpins, allowing it to proceed. However, MDCache::request_kill does not actually kill a request that is waiting on remote authpins, so the request is allowed to complete its quiesce.</p>
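<p>For reference, the situation above can be spotted mechanically in such a dump. The sketch below walks the JSON and pairs killed quiesce_path ops with quiesce_inode ops that nevertheless reached "quiesce complete"; the field names match the excerpt, the "ops" wrapper key is assumed from the usual in-flight ops dump, and the script is a diagnostic aid, not part of the fix.</p>
<pre>
#!/usr/bin/env python3
# Hedged sketch: scan an MDS op dump (such as the excerpt above) for quiesce
# ops, reporting quiesce_path ops that were killed alongside quiesce_inode ops
# that still ran to completion.
import json
import sys

def load_ops(path):
    with open(path) as f:
        data = json.load(f)
    # A full dump usually wraps the list in an "ops" key; a bare list also works.
    return data["ops"] if isinstance(data, dict) else data

def main(path):
    killed, completed = [], []
    for op in load_ops(path):
        td = op.get("type_data", {})
        events = {e["event"] for e in td.get("events", [])}
        if td.get("op_name") == "quiesce_path" and "killing request" in events:
            killed.append(op["description"])
        elif td.get("op_name") == "quiesce_inode" and \
                td.get("flag_point", "").startswith("quiesce complete"):
            completed.append(op["description"])
    for desc in killed:
        print("killed:           ", desc)
    for desc in completed:
        print("completed anyway: ", desc)

if __name__ == "__main__":
    main(sys.argv[1])
</pre>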
mgr - Backport #65154 (In Progress): quincy: pybind/mgr/devicehealth: "rados.ObjectNotFound: [err...
https://tracker.ceph.com/issues/65154
2024-03-26T13:33:13Z
Backport Bot
<p><a class="external" href="https://github.com/ceph/ceph/pull/56480">https://github.com/ceph/ceph/pull/56480</a></p>
mgr - Backport #65153 (In Progress): reef: pybind/mgr/devicehealth: "rados.ObjectNotFound: [errno...
https://tracker.ceph.com/issues/65153
2024-03-26T13:33:06Z
Backport Bot
<p><a class="external" href="https://github.com/ceph/ceph/pull/56479">https://github.com/ceph/ceph/pull/56479</a></p>
CephFS - Backport #65107 (New): quincy: qa: probabilistically ignore PG_AVAILABILITY/PG_DEGRADED
https://tracker.ceph.com/issues/65107
2024-03-25T05:00:35Z
Backport Bot
CephFS - Backport #65106 (New): squid: qa: probabilistically ignore PG_AVAILABILITY/PG_DEGRADED
https://tracker.ceph.com/issues/65106
2024-03-25T05:00:28Z
Backport Bot
CephFS - Bug #65018 (Fix Under Review): PG_DEGRADED warnings during cluster creation via cephadm:...
https://tracker.ceph.com/issues/65018
2024-03-21T00:27:43Z
Patrick Donnelly
pdonnell@redhat.com
<pre>
2024-03-20T19:01:35.938 DEBUG:teuthology.orchestra.run.smithi043:> sudo /home/ubuntu/cephtest/cephadm --image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:360516069d9393362c4cc6eb9371680fe16d66ab shell --fsid b40d606c-e6ea-11ee-95c9-87774f69a715 -- ceph osd last-stat-seq osd.1
2024-03-20T19:01:36.042 INFO:journalctl@ceph.mon.a.smithi043.stdout:Mar 20 19:01:35 smithi043 ceph-mon[31664]: osdmap e88: 12 total, 12 up, 12 in
2024-03-20T19:01:36.250 INFO:journalctl@ceph.mon.b.smithi118.stdout:Mar 20 19:01:35 smithi118 ceph-mon[36322]: osdmap e88: 12 total, 12 up, 12 in
2024-03-20T19:01:36.261 INFO:journalctl@ceph.mon.c.smithi151.stdout:Mar 20 19:01:35 smithi151 ceph-mon[36452]: osdmap e88: 12 total, 12 up, 12 in
2024-03-20T19:01:36.479 INFO:teuthology.orchestra.run.smithi043.stdout:223338299439
2024-03-20T19:01:36.479 DEBUG:teuthology.orchestra.run.smithi043:> sudo /home/ubuntu/cephtest/cephadm --image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:360516069d9393362c4cc6eb9371680fe16d66ab shell --fsid b40d606c-e6ea-11ee-95c9-87774f69a715 -- ceph osd last-stat-seq osd.7
2024-03-20T19:01:36.513 INFO:teuthology.orchestra.run.smithi043.stderr:Inferring config /var/lib/ceph/b40d606c-e6ea-11ee-95c9-87774f69a715/mon.a/config
2024-03-20T19:01:37.010 INFO:journalctl@ceph.mon.a.smithi043.stdout:Mar 20 19:01:36 smithi043 ceph-mon[31664]: pgmap v349: 97 pgs: 1 activating+degraded, 2 activating, 1 peering, 93 active+clean; 639 KiB data, 360 MiB used, 1.0 TiB / 1.0 TiB avail; 3.1 KiB/s rd, 5 op/s; 2/192 objects degraded (1.042%)
2024-03-20T19:01:37.011 INFO:journalctl@ceph.mon.a.smithi043.stdout:Mar 20 19:01:36 smithi043 ceph-mon[31664]: Health check failed: Degraded data redundancy: 2/192 objects degraded (1.042%), 1 pg degraded (PG_DEGRADED)
</pre>
<p><a class="external" href="https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612919/">https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612919/</a></p>
<p>Many other jobs also fail in a similar fashion.</p>
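<p>The follow-up fix (#64984 below) makes the qa suite probabilistically ignore these warnings. A minimal illustration of that idea follows; the helper name and config layout are assumptions for the sketch, not the actual patch.</p>
<pre>
# Hypothetical helper: with some probability, let a job tolerate the transient
# PG health warnings seen during cluster creation instead of failing on them.
import random

IGNORABLE = ["\\(PG_AVAILABILITY\\)", "\\(PG_DEGRADED\\)"]

def maybe_ignore_pg_warnings(job_config, probability=0.5, rng=random):
    """With the given probability, extend the job's log-ignorelist so the
    transient PG warnings do not fail the run."""
    if rng.random() < probability:
        ignorelist = job_config.setdefault("log-ignorelist", [])
        ignorelist.extend(w for w in IGNORABLE if w not in ignorelist)
    return job_config

# Roughly half of the scheduled jobs would then tolerate the warnings.
print(maybe_ignore_pg_warnings({"log-ignorelist": []}))
</pre>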
CephFS - Fix #64984 (Pending Backport): qa: probabilistically ignore PG_AVAILABILITY/PG_DEGRADED
https://tracker.ceph.com/issues/64984
2024-03-19T14:24:01Z
Patrick Donnelly
pdonnell@redhat.com
mgr - Bug #63882 (Pending Backport): pybind/mgr/devicehealth: "rados.ObjectNotFound: [errno 2] RA...
https://tracker.ceph.com/issues/63882
2023-12-21T13:47:37Z
Patrick Donnelly
pdonnell@redhat.com
<p>We got a report of this failure:</p>
<pre>
"backtrace": [
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 342, in serve\n finished_loading_legacy = self.check_legacy_pool()",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 321, in check_legacy_pool\n if self._load_legacy_object(ioctx, obj.key):",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 295, in _load_legacy_object\n ioctx.operate_read_op(op, oid)",
" File \"rados.pyx\", line 3720, in rados.Ioctx.operate_read_op",
"rados.ObjectNotFound: [errno 2] RADOS object not found (Failed to operate read op for oid HGST_<redacted>)"
],
</pre>
<p>Strangely, the object appears in the pool listing but cannot be operated on. Make the module handle this case robustly.</p>
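<p>A minimal sketch of the kind of hardening intended, assuming the call chain from the backtrace (check_legacy_pool -> _load_legacy_object -> operate_read_op); the wrapper below is illustrative, not the merged change.</p>
<pre>
# Hedged sketch: tolerate a pool entry that lists but cannot be read, so one
# vanished legacy object does not kill the devicehealth serve() loop.
# "module" stands for the devicehealth module instance; the wrapper name is
# hypothetical.
import rados

def load_legacy_object_safely(module, ioctx, oid):
    try:
        return module._load_legacy_object(ioctx, oid)
    except rados.ObjectNotFound:
        # Listed by the pool scan but already gone (or unreadable): warn and
        # treat it as handled rather than propagating the exception.
        module.log.warning("legacy object %s vanished during scan; skipping", oid)
        return True
</pre>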
CephFS - Backport #63676 (In Progress): reef: High memory usage on standby replay MDS
https://tracker.ceph.com/issues/63676
2023-11-29T10:24:45Z
Backport Bot
<p><a class="external" href="https://github.com/ceph/ceph/pull/54716">https://github.com/ceph/ceph/pull/54716</a></p>
Ceph - Backport #63277 (In Progress): reef: cmake: dependency ordering error for liburing and lib...
https://tracker.ceph.com/issues/63277
2023-10-20T13:11:18Z
Backport Bot
<p><a class="external" href="https://github.com/ceph/ceph/pull/54122">https://github.com/ceph/ceph/pull/54122</a></p>
Ceph - Backport #63276 (In Progress): quincy: cmake: dependency ordering error for liburing and l...
https://tracker.ceph.com/issues/63276
2023-10-20T13:11:11Z
Backport Bot
<p><a class="external" href="https://github.com/ceph/ceph/pull/54123">https://github.com/ceph/ceph/pull/54123</a></p>
Ceph - Bug #63218 (Pending Backport): cmake: dependency ordering error for liburing and librocksdb
https://tracker.ceph.com/issues/63218
2023-10-16T15:01:52Z
Patrick Donnelly
pdonnell@redhat.com
<pre>
./do_cmake.sh -DWITH_PYTHON3=3.6 -DWITH_BABELTRACE=OFF -DWITH_MANPAGE=OFF -DWITH_RBD=OFF -DWITH_KRBD=OFF -DWITH_RADOSGW=OFF -DWITH_LTTNG=OFF -DWITH_RDMA=OFF -DWITH_SEASTAR=OFF -DWITH_CEPH_DEBUG_MUTEX=ON
...
$ cmake --build build/ --verbose
...
FAILED: bin/ceph_test_keyvaluedb_iterators
: && /opt/rh/gcc-toolset-11/root/usr/bin/g++ -Og -g -rdynamic -pie src/test/ObjectMap/CMakeFiles/ceph_test_keyvaluedb_iterators.dir/test_keyvaluedb_iterators.cc.o src/test/ObjectMap/CMakeFiles/ceph_test_keyvaluedb_iterators.dir/KeyValueDBMemory.cc.o -o bin/ceph_test_keyvaluedb_iterators -Wl,-rpath,/home/pdonnell/scratch/build/lib lib/libos.a lib/libgmock_maind.a lib/libgmockd.a lib/libgtestd.a -lpthread -ldl lib/libglobal.a -ldl /usr/lib64/librt.so -lresolv -ldl lib/libblk.a /lib64/libaio.so src/liburing/src/liburing.a lib/libkv.a lib/libheap_profiler.a /lib64/libtcmalloc.so src/rocksdb/librocksdb.a /lib64/libsnappy.so /usr/lib64/liblz4.so /usr/lib64/libz.so /usr/lib64/libfuse.so lib/libceph-common.so.2 src/opentelemetry-cpp/sdk/src/trace/libopentelemetry_trace.a src/opentelemetry-cpp/sdk/src/resource/libopentelemetry_resources.a src/opentelemetry-cpp/sdk/src/common/libopentelemetry_common.a src/opentelemetry-cpp/exporters/jaeger/libopentelemetry_exporter_jaeger_trace.a src/opentelemetry-cpp/ext/src/http/client/curl/libopentelemetry_http_client_curl.a /usr/lib64/libcurl.so /usr/lib64/libthrift.so lib/libjson_spirit.a lib/libcommon_utf8.a lib/liberasure_code.a lib/libextblkdev.a -lcap boost/lib/libboost_thread.a boost/lib/libboost_chrono.a boost/lib/libboost_atomic.a boost/lib/libboost_system.a boost/lib/libboost_random.a boost/lib/libboost_program_options.a boost/lib/libboost_date_time.a boost/lib/libboost_iostreams.a boost/lib/libboost_regex.a lib/libfmtd.a /usr/lib64/libblkid.so -lpthread /usr/lib64/libcrypto.so /usr/lib64/libudev.so /usr/lib64/libz.so -ldl -lresolv -Wl,--as-needed -latomic && :
/opt/rh/gcc-toolset-11/root/usr/bin/ld: src/rocksdb/librocksdb.a(fs_posix.cc.o): in function `io_uring_wait_cqe_nr':
/home/pdonnell/scratch/build/src/liburing/src/include/liburing.h:494: undefined reference to `__io_uring_get_cqe'
/opt/rh/gcc-toolset-11/root/usr/bin/ld: src/rocksdb/librocksdb.a(fs_posix.cc.o): in function `rocksdb::(anonymous namespace)::PosixFileSystem::AbortIO(std::vector<void*, std::allocator<void*> >&)':
/home/pdonnell/ceph/src/rocksdb/env/fs_posix.cc:1125: undefined reference to `io_uring_get_sqe'
/opt/rh/gcc-toolset-11/root/usr/bin/ld: /home/pdonnell/ceph/src/rocksdb/env/fs_posix.cc:1134: undefined reference to `io_uring_submit'
/opt/rh/gcc-toolset-11/root/usr/bin/ld: src/rocksdb/librocksdb.a(fs_posix.cc.o): in function `rocksdb::CreateIOUring()':
/home/pdonnell/ceph/src/rocksdb/env/io_posix.h:272: undefined reference to `io_uring_queue_init'
/opt/rh/gcc-toolset-11/root/usr/bin/ld: src/rocksdb/librocksdb.a(io_posix.cc.o): in function `io_uring_wait_cqe_nr':
/home/pdonnell/scratch/build/src/liburing/src/include/liburing.h:494: undefined reference to `__io_uring_get_cqe'
/opt/rh/gcc-toolset-11/root/usr/bin/ld: src/rocksdb/librocksdb.a(io_posix.cc.o): in function `rocksdb::PosixRandomAccessFile::MultiRead(rocksdb::FSReadRequest*, unsigned long, rocksdb::IOOptions const&, rocksdb::IODebugContext*)':
/home/pdonnell/ceph/src/rocksdb/env/io_posix.cc:674: undefined reference to `io_uring_get_sqe'
/opt/rh/gcc-toolset-11/root/usr/bin/ld: /home/pdonnell/ceph/src/rocksdb/env/io_posix.cc:684: undefined reference to `io_uring_submit_and_wait'
/opt/rh/gcc-toolset-11/root/usr/bin/ld: src/rocksdb/librocksdb.a(io_posix.cc.o): in function `rocksdb::PosixRandomAccessFile::ReadAsync(rocksdb::FSReadRequest&, rocksdb::IOOptions const&, std::function<void (rocksdb::FSReadRequest const&, void*)>, void*, void**, std::function<void (void*)>*, rocksdb::IODebugContext*)':
/home/pdonnell/ceph/src/rocksdb/env/io_posix.cc:901: undefined reference to `io_uring_get_sqe'
/opt/rh/gcc-toolset-11/root/usr/bin/ld: /home/pdonnell/ceph/src/rocksdb/env/io_posix.cc:910: undefined reference to `io_uring_submit'
collect2: error: ld returned 1 exit status
</pre>
<p>This is on vossi04. I'm not sure what change to the system caused this error, but the linking order is clearly wrong: src/liburing/src/liburing.a is passed to the linker before src/rocksdb/librocksdb.a, and since librocksdb.a is the archive that references the io_uring symbols, a single-pass linker has already passed over liburing by the time those references appear. liburing needs to be ordered after (or declared a dependency of) librocksdb.</p>
CephFS - Backport #59558 (In Progress): quincy: qa: RuntimeError: more than one file system avail...
https://tracker.ceph.com/issues/59558
2023-04-26T14:20:32Z
Rishabh Dave
<p><a class="external" href="https://github.com/ceph/ceph/pull/52241">https://github.com/ceph/ceph/pull/52241</a></p>
CephFS - Bug #59425 (Pending Backport): qa: RuntimeError: more than one file system available
https://tracker.ceph.com/issues/59425
2023-04-11T14:10:25Z
Patrick Donnelly
pdonnell@redhat.com
<pre>Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_8d156aede5efdae00b53d8d3b8d127082980e7ec/teuthology/run_tasks.py", line 109, in run_tasks
manager.__enter__()
File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/teuthworker/src/git.ceph.com_ceph-c_687b814f6a8c4db74daf97d825bbfa90b5560fa3/qa/tasks/ceph.py", line 1893, in task
healthy(ctx=ctx, config=dict(cluster=config['cluster']))
File "/home/teuthworker/src/git.ceph.com_ceph-c_687b814f6a8c4db74daf97d825bbfa90b5560fa3/qa/tasks/ceph.py", line 1474, in healthy
ceph_fs.wait_for_daemons(timeout=300)
File "/home/teuthworker/src/git.ceph.com_ceph-c_687b814f6a8c4db74daf97d825bbfa90b5560fa3/qa/tasks/cephfs/filesystem.py", line 1097, in wait_for_daemons
status = self.getinfo(refresh=True)
File "/home/teuthworker/src/git.ceph.com_ceph-c_687b814f6a8c4db74daf97d825bbfa90b5560fa3/qa/tasks/cephfs/filesystem.py", line 545, in getinfo
raise RuntimeError("more than one file system available")
RuntimeError: more than one file system available
2023-04-11T00:43:27.591 ERROR:teuthology.run_tasks: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=e97063c90400452b932c9b98d75f076a
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_8d156aede5efdae00b53d8d3b8d127082980e7ec/teuthology/run_tasks.py", line 109, in run_tasks
manager.__enter__()
File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/teuthworker/src/git.ceph.com_ceph-c_687b814f6a8c4db74daf97d825bbfa90b5560fa3/qa/tasks/ceph.py", line 1893, in task
healthy(ctx=ctx, config=dict(cluster=config['cluster']))
File "/home/teuthworker/src/git.ceph.com_ceph-c_687b814f6a8c4db74daf97d825bbfa90b5560fa3/qa/tasks/ceph.py", line 1474, in healthy
ceph_fs.wait_for_daemons(timeout=300)
File "/home/teuthworker/src/git.ceph.com_ceph-c_687b814f6a8c4db74daf97d825bbfa90b5560fa3/qa/tasks/cephfs/filesystem.py", line 1097, in wait_for_daemons
status = self.getinfo(refresh=True)
File "/home/teuthworker/src/git.ceph.com_ceph-c_687b814f6a8c4db74daf97d825bbfa90b5560fa3/qa/tasks/cephfs/filesystem.py", line 545, in getinfo
raise RuntimeError("more than one file system available")
RuntimeError: more than one file system available
</pre>
<p>/ceph/teuthology-archive/pdonnell-2023-04-11_00:14:25-fs-wip-pdonnell-testing-20230410.205400-quincy-distro-default-smithi/7237945/teuthology.log</p>
<p>Problem also exists on main. Caused by: <a class="external" href="https://github.com/ceph/ceph/pull/50896">https://github.com/ceph/ceph/pull/50896</a></p>
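<p>For context, the qa helper in the traceback (Filesystem.getinfo) refuses to guess when several file systems exist. Below is a hedged sketch of the disambiguation it lacks, selecting a file system by name from "ceph fs ls" output; the function name and signature are illustrative, not the qa/tasks/cephfs API.</p>
<pre>
# Hypothetical helper: pick the file system to test against by name instead of
# assuming exactly one exists.
import json
import subprocess

def pick_filesystem(name=None):
    out = subprocess.check_output(["ceph", "fs", "ls", "--format=json"])
    filesystems = json.loads(out)
    if name is not None:
        for fs in filesystems:
            if fs["name"] == name:
                return fs
        raise RuntimeError(f"no file system named {name!r}")
    if len(filesystems) != 1:
        # Same failure mode as Filesystem.getinfo() in the traceback above.
        raise RuntimeError("more than one file system available")
    return filesystems[0]
</pre>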