Ceph : Issues
https://tracker.ceph.com/
2023-09-11T16:48:07Z
Ceph
Redmine
Ceph - Bug #62804 (New): Jaegertracing compile failure
https://tracker.ceph.com/issues/62804
2023-09-11T16:48:07Z
Adam Emerson
aemerson@redhat.com
<p>Compilation fails with:</p>
<pre><code class="text syntaxhl">In file included from /home/aemerson/work/ceph/main/src/jaegertracing/opentelemetry-cpp/exporters/jaeger/src/TUDPTransport.h:16,
from /home/aemerson/work/ceph/main/src/jaegertracing/opentelemetry-cpp/exporters/jaeger/src/TUDPTransport.cc:6:
/home/aemerson/work/ceph/main/src/jaegertracing/opentelemetry-cpp/exporters/jaeger/src/TUDPTransport.cc: In member function ‘virtual void opentelemetry::v1::exporter::jaeger::TUDPTransport::close()’:
/home/aemerson/work/ceph/main/src/jaegertracing/opentelemetry-cpp/exporters/jaeger/src/TUDPTransport.cc:71:7: error: ‘::close’ has not been declared; did you mean ‘pclose’?
71 | ::THRIFT_CLOSESOCKET(socket_);
| ^~~~~~~~~~~~~~~~~~
</code></pre>
<p>It can be fixed with:</p>
<pre><code class="cpp syntaxhl">diff --git a/exporters/jaeger/src/TUDPTransport.cc b/exporters/jaeger/src/TUDPTransport.cc
index e4111273..0ea86288 100644
--- a/exporters/jaeger/src/TUDPTransport.cc
+++ b/exporters/jaeger/src/TUDPTransport.cc
@@ -3,6 +3,8 @@
 #include &lt;sstream&gt; // std::stringstream
+#include &lt;unistd.h&gt;
+
 #include "TUDPTransport.h"
 #include "opentelemetry/sdk_config.h"
</code></pre>
<p>The patch is applied to jaegertracing/opentelemetry-cpp.</p>
mgr - Bug #52658 (Need More Info): mgr: After upgrade to 16.2.6 crash on startup with python error
https://tracker.ceph.com/issues/52658
2021-09-19T06:47:18Z
Igor Savlook
<p>After upgrading to 16.2.6, ceph-mgr crashes on startup with Python module errors.</p>
<a name="Environment"></a>
<h3 >Environment<a href="#Environment" class="wiki-anchor">¶</a></h3>
<ul>
<li>Ceph version: 16.2.6</li>
<li>Platform (OS/distro/release): Fedora 36 (rawhide) aarch64</li>
</ul>
<a name="How-reproducible"></a>
<h3 >How reproducible<a href="#How-reproducible" class="wiki-anchor">¶</a></h3>
<p>1. Upgrade to 16.2.6 on Fedora rawhide.</p>
<p>2. ceph-mgr core modules crash.</p>
<a name="Actual-results"></a>
<h3 >Actual results<a href="#Actual-results" class="wiki-anchor">¶</a></h3>
<p>All Python modules fail to load with this error in the journal:</p>
<pre>
Sep 18 15:11:35 ceph-master0 ceph-mgr[71463]: 2021-09-18T15:11:35.086+0300 ffff6adb4530 -1 mgr load Failed to construct class in 'balancer'
Sep 18 15:11:35 ceph-master0 ceph-mgr[71463]: 2021-09-18T15:11:35.086+0300 ffff6adb4530 -1 mgr load Traceback (most recent call last):
File "/usr/share/ceph/mgr/balancer/module.py", line 427, in __init__
super(Module, self).__init__(*args, **kwargs)
File "/usr/share/ceph/mgr/mgr_module.py", line 882, in __init__
self._configure_logging(mgr_level, log_level, cluster_level,
File "/usr/share/ceph/mgr/mgr_module.py", line 581, in _configure_logging
self._cluster_log_handler = ClusterLogHandler(self)
File "/usr/share/ceph/mgr/mgr_module.py", line 535, in __init__
super().__init__()
RuntimeError: super(): __class__ cell not found
</pre>
<pre>
Sep 18 15:11:35 ceph-master0 ceph-mgr[71463]: 2021-09-18T15:11:35.090+0300 ffff6adb4530 -1 mgr operator() Failed to run module in active mode ('balancer')
Sep 18 15:11:35 ceph-master0 ceph-mgr[71463]: 2021-09-18T15:11:35.090+0300 ffff6adb4530 -1 mgr load Failed to construct class in 'crash'
Sep 18 15:11:35 ceph-master0 ceph-mgr[71463]: 2021-09-18T15:11:35.090+0300 ffff6adb4530 -1 mgr load Traceback (most recent call last):
File "/usr/share/ceph/mgr/crash/module.py", line 37, in __init__
super(Module, self).__init__(*args, **kwargs)
File "/usr/share/ceph/mgr/mgr_module.py", line 882, in __init__
self._configure_logging(mgr_level, log_level, cluster_level,
File "/usr/share/ceph/mgr/mgr_module.py", line 581, in _configure_logging
self._cluster_log_handler = ClusterLogHandler(self)
File "/usr/share/ceph/mgr/mgr_module.py", line 535, in __init__
super().__init__()
RuntimeError: super(): __class__ cell not found
</pre>
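<p>A minimal reproduction of this RuntimeError outside Ceph (the class names below are illustrative, not from mgr_module.py): zero-argument <code>super()</code> relies on a <code>__class__</code> closure cell that the compiler only creates for functions defined directly inside a class body, so a method attached to a class after the fact raises exactly this error.</p>

```python
class Base:
    def __init__(self):
        self.ready = True

class Works(Base):
    def __init__(self):
        # __class__ cell is injected by the compiler for methods
        # defined inside a class body, so zero-arg super() works.
        super().__init__()

# A function created outside any class body has no __class__ cell;
# attaching it as a method triggers the same RuntimeError as above.
def _init(self):
    super().__init__()

class Broken(Base):
    __init__ = _init

assert Works().ready

try:
    Broken()
except RuntimeError as e:
    print(e)   # super(): __class__ cell not found
```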
<a name="Expected-results"></a>
<h3 >Expected results<a href="#Expected-results" class="wiki-anchor">¶</a></h3>
<p>Successful start of ceph-mgr.</p>
RADOS - Bug #50208 (New): [ FAILED ] CephSQLiteTest.InsertBulk4096 [with slow ops during this t...
https://tracker.ceph.com/issues/50208
2021-04-07T12:46:20Z
Deepika Upadhyay
<pre>
2021-04-06T15:02:12.602 INFO:tasks.ceph.osd.0.smithi016.stderr:2021-04-06T15:02:12.600+0000 7f6791369700 -1 osd.0 27 get_health_metrics reporting 29 slow ops, oldest is osd_op(client.4419.0:2172 3.17 3.aefb40d7 (undecoded) ondisk+retry+write+known_if_redirected e24)
2021-04-06T15:02:13.610 INFO:tasks.ceph.osd.0.smithi016.stderr:2021-04-06T15:02:13.608+0000 7f6791369700 -1 osd.0 27 get_health_metrics reporting 28 slow ops, oldest is osd_op(client.4419.0:1212 3.1f 3.c717541f (undecoded) ondisk+retry+write+known_if_redirected e27)
2021-04-06T15:02:14.191 INFO:teuthology.orchestra.run.smithi016.stdout:[/build/ceph-17.0.0-2790-g80aaef5b/src/test/libcephsqlite/main.cc:188] sqlite3 error: 10 `disk I/O error': disk I/O error
2021-04-06T15:02:14.192 INFO:teuthology.orchestra.run.smithi016.stdout:/build/ceph-17.0.0-2790-g80aaef5b/src/test/libcephsqlite/main.cc:199: Failure
2021-04-06T15:02:14.192 INFO:teuthology.orchestra.run.smithi016.stdout:Expected equality of these values:
2021-04-06T15:02:14.192 INFO:teuthology.orchestra.run.smithi016.stdout: 0
2021-04-06T15:02:14.193 INFO:teuthology.orchestra.run.smithi016.stdout: rc
2021-04-06T15:02:14.193 INFO:teuthology.orchestra.run.smithi016.stdout: Which is: 10
2021-04-06T15:02:14.195 INFO:teuthology.orchestra.run.smithi016.stdout:[ FAILED ] CephSQLiteTest.InsertBulk4096 (315826 ms)
2021-04-06T15:02:14.195 INFO:teuthology.orchestra.run.smithi016.stdout:[ RUN ] CephSQLiteTest.InsertBulk
</pre>
<p>/ceph/teuthology-archive/ideepika-2021-04-06_14:26:42-rados-wip-deepika-testing-2021-04-05-0643-distro-basic-smithi/6024579/teuthology.log</p>
RADOS - Bug #47813 (Fix Under Review): osd op age is 4294967296
https://tracker.ceph.com/issues/47813
2020-10-10T08:54:27Z
Dan van der Ster
<p>Every few seconds on a 14.2.11 OSD I see an op with age close to 2^32:</p>
<pre>
{
"description": "MOSDECSubOpRead(35.fa2s4 158887/158791 ECSubRead(tid=786379, to_read={35:45f0772e:::61c59385-085d-4caa-9070-63a3868dccb6.135305794.1328_data%2ffd%2ffd55f6a6f49dedbfbc27fe782deae2c25230b16a70d2989bd2cab3913ae2bbbf:head=0,2097152,0}, subchunks={35:45f0772e:::61c59385-085d-4caa-9070-63a3868dccb6.135305794.1328_data%2ffd%2ffd55f6a6f49dedbfbc27fe782deae2c25230b16a70d2989bd2cab3913ae2bbbf:head=[0,1]}, attrs_to_read=))",
"initiated_at": "2020-10-10 10:50:34.302604",
"age": 4294967295.999867,
"duration": 0.00047244300000000002,
"type_data": {
"flag_point": "queued for pg",
"events": [
{
"time": "2020-10-10 10:50:34.302604",
"event": "initiated"
},
{
"time": "2020-10-10 10:50:34.302604",
"event": "header_read"
},
{
"time": "2020-10-10 10:50:34.302600",
"event": "throttled"
},
{
"time": "2020-10-10 10:50:34.302612",
"event": "all_read"
},
{
"time": "2020-10-10 10:50:34.302613",
"event": "dispatched"
},
{
"time": "2020-10-10 10:50:34.302616",
"event": "queued_for_pg"
}
]
}
},
</pre>
<p>The op is fine and completes quickly, so perhaps this is an initialization bug?</p>
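<p>An age within a fraction of a millisecond of 2^32 is the classic signature of unsigned wraparound: if <code>initiated_at</code> is sampled slightly after the "now" timestamp used for the subtraction, the tiny negative duration wraps to just under 2^32. A minimal sketch (the wrap width and the reported value are the only inputs; this is not Ceph code):</p>

```python
# Hedged sketch: a slightly negative op age reduced modulo 2**32
# reproduces the reported value almost exactly.
reported_age = 4294967295.999867       # "age" from the op dump above
negative_age = reported_age - 2**32    # ~ -0.000133 s: initiated_at just
                                       # ahead of the now timestamp
wrapped = negative_age % 2**32         # unsigned-wrap behaviour
assert abs(wrapped - reported_age) < 1e-6
```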
RADOS - Bug #46508 (New): Health check failed: Reduced data availability: 1 pg inactive (PG_AVAIL...
https://tracker.ceph.com/issues/46508
2020-07-13T19:02:35Z
Neha Ojha
nojha@redhat.com
<p>rados/basic/{ceph clusters/{fixed-2 openstack} msgr-failures/many msgr/async objectstore/bluestore-comp-snappy rados supported-random-distro$/{rhel_8} tasks/rados_workunit_loadgen_mix</p>
<p>/a/teuthology-2020-07-12_07:01:02-rados-master-distro-basic-smithi/5217609</p>
RADOS - Bug #45761 (Need More Info): mon_thrasher: "Error ENXIO: mon unavailable" during sync_for...
https://tracker.ceph.com/issues/45761
2020-05-29T05:16:56Z
Brad Hubbard
bhubbard@redhat.com
<p>/a/yuriw-2020-05-28_02:23:45-rados-wip-yuri-master_5.27.20-distro-basic-smithi/5097794</p>
<pre>
2020-05-28T06:13:04.288 INFO:tasks.mon_thrash.mon_thrasher:thrashing mon.g
2020-05-28T06:13:04.289 INFO:tasks.mon_thrash.mon_thrasher:killing mon.g
2020-05-28T06:13:04.289 DEBUG:tasks.ceph.mon.g:waiting for process to exit
2020-05-28T06:13:04.289 INFO:teuthology.orchestra.run:waiting for 300
2020-05-28T06:13:04.349 INFO:tasks.ceph.mon.g:Stopped
2020-05-28T06:13:04.349 INFO:tasks.mon_thrash.mon_thrasher:thrashing mon.f
2020-05-28T06:13:04.350 INFO:tasks.mon_thrash.mon_thrasher:thrashing mon.f store
2020-05-28T06:13:04.350 INFO:teuthology.orchestra.run.smithi078:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early tell mon.f sync_force --yes-i-really-mean-it
2020-05-28T06:13:20.481 INFO:teuthology.orchestra.run.smithi078.stderr:Error ENXIO: mon unavailable
2020-05-28T06:13:20.485 DEBUG:teuthology.orchestra.run:got remote process result: 6
2020-05-28T06:13:20.486 ERROR:tasks.mon_thrash.mon_thrasher:exception:
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_5.27.20/qa/tasks/mon_thrash.py", line 232, in do_thrash
self._do_thrash()
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_5.27.20/qa/tasks/mon_thrash.py", line 285, in _do_thrash
self.thrash_store(mon)
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_5.27.20/qa/tasks/mon_thrash.py", line 175, in thrash_store
'--yes-i-really-mean-it')
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_5.27.20/qa/tasks/ceph_manager.py", line 1363, in raw_cluster_cmd
stdout=BytesIO(),
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 206, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 473, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 162, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 184, in _raise_for_status
node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi078 with status 6: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early tell mon.f sync_force --yes-i-really-mean-it'
2020-05-28T06:13:24.555 INFO:tasks.workunit.client.0.smithi139.stderr:noup is unset
2020-05-28T06:13:24.568 INFO:tasks.workunit.client.0.smithi139.stderr:+ (( ++i ))
2020-05-28T06:13:24.568 INFO:tasks.workunit.client.0.smithi139.stderr:+ (( i < num ))
2020-05-28T06:13:24.568 INFO:tasks.workunit.client.0.smithi139.stderr:+ ceph osd set noup
2020-05-28T06:13:25.825 INFO:tasks.daemonwatchdog.daemon_watchdog:MonitorThrasher failed
2020-05-28T06:13:25.825 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
2020-05-28T06:13:25.825 ERROR:tasks.daemonwatchdog.daemon_watchdog:exception:
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_5.27.20/qa/tasks/daemonwatchdog.py", line 37, in _run
self.watch()
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_5.27.20/qa/tasks/daemonwatchdog.py", line 113, in watch
self.bark()
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_5.27.20/qa/tasks/daemonwatchdog.py", line 53, in bark
for mount in self.ctx.mounts.values():
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/config.py", line 254, in __getattr__
raise AttributeError(name)
AttributeError: mounts
</pre>
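<p>The secondary failure in <code>bark()</code> comes from teuthology's config object raising AttributeError for any unset attribute: when the job mounts no clients, <code>ctx.mounts</code> simply does not exist. A defensive sketch of the lookup (the <code>Ctx</code> class is a stand-in for teuthology's config object, and <code>umount_wait</code> is a hypothetical per-mount cleanup call, not the real fix):</p>

```python
# Stand-in for teuthology.config's attribute behaviour: unset keys
# raise AttributeError rather than returning None.
class Ctx:
    def __getattr__(self, name):
        raise AttributeError(name)

ctx = Ctx()

# Guarding the lookup with a default dict lets bark() proceed when the
# job created no client mounts:
mounts = getattr(ctx, "mounts", {})
for mount in mounts.values():     # no-op when nothing is mounted
    mount.umount_wait()           # hypothetical cleanup call
print(len(mounts))                # 0
```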
<p>Several of the MONs are never restarted, so we end up with:</p>
<pre>
2020-05-28T06:47:14.331 INFO:teuthology.run:Summary data:
description: rados/monthrash/{ceph clusters/9-mons msgr-failures/mon-delay msgr/async-v1only
objectstore/bluestore-low-osd-mem-target rados supported-random-distro$/{ubuntu_latest}
thrashers/force-sync-many workloads/rados_mon_osdmap_prune}
duration: 2537.953857898712
failure_reason: failed to reach quorum size 9 before timeout expired
</pre>
<p>/a/yuriw-2020-05-22_19:55:53-rados-wip-yuri-master_5.22.20-distro-basic-smithi/5083290</p>
<p>This looks like the same issue but with a different error.</p>
<pre>
2020-05-22T23:38:28.429 INFO:tasks.mon_thrash.mon_thrasher:monitors to thrash: ['d', 'a', 'h']
2020-05-22T23:38:28.429 INFO:tasks.mon_thrash.mon_thrasher:monitors to freeze: []
2020-05-22T23:38:28.430 INFO:tasks.mon_thrash.mon_thrasher:thrashing mon.d
2020-05-22T23:38:28.430 INFO:tasks.mon_thrash.mon_thrasher:killing mon.d
2020-05-22T23:38:28.430 DEBUG:tasks.ceph.mon.d:waiting for process to exit
2020-05-22T23:38:28.430 INFO:teuthology.orchestra.run:waiting for 300
2020-05-22T23:38:28.440 INFO:tasks.ceph.mon.d:Stopped
2020-05-22T23:38:28.441 INFO:tasks.mon_thrash.mon_thrasher:thrashing mon.a
2020-05-22T23:38:28.441 INFO:tasks.mon_thrash.mon_thrasher:thrashing mon.a store
2020-05-22T23:38:28.441 INFO:teuthology.orchestra.run.smithi204:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early tell mon.a sync_force --yes-i-really-mean-it
2020-05-22T23:38:45.598 INFO:teuthology.orchestra.run.smithi204.stderr:Error ENXIO: problem getting command descriptions from mon.a
2020-05-22T23:38:45.610 DEBUG:teuthology.orchestra.run:got remote process result: 6
2020-05-22T23:38:45.610 ERROR:tasks.mon_thrash.mon_thrasher:exception:
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_5.22.20/qa/tasks/mon_thrash.py", line 232, in do_thrash
self._do_thrash()
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_5.22.20/qa/tasks/mon_thrash.py", line 285, in _do_thrash
self.thrash_store(mon)
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_5.22.20/qa/tasks/mon_thrash.py", line 175, in thrash_store
'--yes-i-really-mean-it')
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_5.22.20/qa/tasks/ceph_manager.py", line 1363, in raw_cluster_cmd
stdout=BytesIO(),
File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/orchestra/remote.py", line 206, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/orchestra/run.py", line 473, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/orchestra/run.py", line 162, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/orchestra/run.py", line 184, in _raise_for_status
node=self.hostname, label=self.label
CommandFailedError: Command failed on smithi204 with status 6: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early tell mon.a sync_force --yes-i-really-mean-it'
</pre>