Ceph : Issues
https://tracker.ceph.com/
2022-04-12T19:52:59Z
Ceph
Redmine
mgr - Feature #55303 (New): pg autoscaler: only warn about changes that will take many days
https://tracker.ceph.com/issues/55303
2022-04-12T19:52:59Z
Dan van der Ster
<p>Motivation: <a class="external" href="https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/Z2UPETNARNVEPTYYA5Q6J5QBCUWKWTZ2/">https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/Z2UPETNARNVEPTYYA5Q6J5QBCUWKWTZ2/</a></p>
<blockquote>
<p>We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting rebalancing of<br />misplaced objects is overwhelming the cluster and impacting MON DB compaction, deep scrub<br />repairs and us upgrading legacy bluestore OSDs. We have to pause the rebalancing of<br />misplaced objects or we're going to fall over.</p>
<p>Autoscaler-status tells us that we are reducing our PGs by 700-ish, which will take us<br />over 100 days to complete at our current recovery speed.</p>
</blockquote>
<p>The autoscaler should not trigger such changes behind the operator's back.<br />I propose that it estimate how long a split or merge would take, and only "warn" if the operation would take longer than a day (configurable), even if autoscale_mode is on.<br />Seeing the HEALTH_WARN, the operator can then schedule and carry out the pg split or merge at a time that suits their operations.</p>
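<p>Roughly what I have in mind (an illustrative Python sketch only, not the actual pg_autoscaler code; the recovery-rate input and the option name are assumptions):</p>
<pre>
from collections import namedtuple

Pool = namedtuple("Pool", "name bytes_used pg_num")

def estimate_days(pool, target_pg_num, recovery_bytes_per_sec):
    """Very rough estimate of how long a pg split/merge would take."""
    pgs_changed = abs(target_pg_num - pool.pg_num)
    bytes_to_move = pool.bytes_used * pgs_changed / max(pool.pg_num, 1)
    return bytes_to_move / max(recovery_bytes_per_sec, 1.0) / 86400.0

def apply_or_warn(pool, target_pg_num, recovery_bytes_per_sec, max_days=1.0):
    # max_days would be a new configurable threshold (name hypothetical)
    days = estimate_days(pool, target_pg_num, recovery_bytes_per_sec)
    if days > max_days:
        return ("warn", "adjusting pg_num of '%s' would take ~%.0f days" % (pool.name, days))
    return ("apply", "setting pg_num of '%s' to %d" % (pool.name, target_pg_num))

# e.g. a 255 TiB pool merging 4190 -> 3490 PGs at 200 MB/s of recovery throughput
print(apply_or_warn(Pool("volumes", 255 * 2**40, 4190), 3490, 200e6))
</pre>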
RADOS - Bug #51433 (Resolved): mgr spamming with repeated set pgp_num_actual while merging
https://tracker.ceph.com/issues/51433
2021-06-29T20:33:55Z
Dan van der Ster
<p>While merging PGs, our osdmaps are churning through ~2000 epochs per hour.<br />The osdmap diffs show no real change (only the epoch, modification time, and last_change move):</p>
<pre>
# diff 302192.txt 302193.txt
1c1
< epoch 302192
---
> epoch 302193
4c4
< modified 2021-06-29 16:43:16.224508
---
> modified 2021-06-29 16:43:18.285394
25c25
< pool 35 'default.rgw.buckets.data' erasure size 6 min_size 5 crush_rule 3 object_hash rjenkins pg_num 4190 pgp_num 4170 pg_num_target 4096 pgp_num_target 4096 last_change 302192 lfor 0/300438/300436 flags hashpspool,nodelete,nosizechange stripe_width 4096 fast_read 1 application rgw
---
> pool 35 'default.rgw.buckets.data' erasure size 6 min_size 5 crush_rule 3 object_hash rjenkins pg_num 4190 pgp_num 4170 pg_num_target 4096 pgp_num_target 4096 last_change 302193 lfor 0/300438/300436 flags hashpspool,nodelete,nosizechange stripe_width 4096 fast_read 1 application rgw
</pre>
<p>I found that the mgr is spamming:<br /><pre>
2021-06-29 16:59:15.756021 mon.cephgabe-mon-7bb8b9e3d5 (mon.2) 216166 : audit [INF] from='mgr.490803962 [xxx]:0/2660941' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 16:59:15.757416 mon.cephgabe0 (mon.0) 1134697 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 16:59:16.188506 mon.cephgabe0 (mon.0) 1134698 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd='[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]': finished
2021-06-29 16:59:17.819641 mon.cephgabe-mon-7bb8b9e3d5 (mon.2) 216167 : audit [INF] from='mgr.490803962 [xxx]:0/2660941' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 16:59:17.820362 mon.cephgabe0 (mon.0) 1134700 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 16:59:18.249184 mon.cephgabe0 (mon.0) 1134701 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd='[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]': finished
2021-06-29 16:59:19.166836 mon.cephgabe0 (mon.0) 1134703 : audit [DBG] from='client.? [xxx]:0/1791927115' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
2021-06-29 16:59:19.831892 mon.cephgabe-mon-7bb8b9e3d5 (mon.2) 216168 : audit [INF] from='mgr.490803962 [xxx]:0/2660941' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 16:59:19.832719 mon.cephgabe0 (mon.0) 1134704 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 16:59:20.330562 mon.cephgabe0 (mon.0) 1134705 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd='[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]': finished
2021-06-29 16:59:20.590091 mon.cephgabe0 (mon.0) 1134707 : audit [DBG] from='client.? [xxx]:0/3061431364' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
2021-06-29 16:59:21.868946 mon.cephgabe-mon-7bb8b9e3d5 (mon.2) 216169 : audit [INF] from='mgr.490803962 [xxx]:0/2660941' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 16:59:21.869712 mon.cephgabe0 (mon.0) 1134708 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 16:59:22.014923 mon.cephgabe0 (mon.0) 1134709 : audit [DBG] from='client.? [xxx]:0/1894286236' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
2021-06-29 16:59:22.383554 mon.cephgabe0 (mon.0) 1134710 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd='[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]': finished
2021-06-29 16:59:23.441387 mon.cephgabe0 (mon.0) 1134712 : audit [DBG] from='client.? [xxx]:0/1242539268' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
2021-06-29 16:59:23.891402 mon.cephgabe-mon-7bb8b9e3d5 (mon.2) 216170 : audit [INF] from='mgr.490803962 [xxx]:0/2660941' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 16:59:23.892950 mon.cephgabe0 (mon.0) 1134713 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
...
2021-06-29 17:17:24.082373 mon.cephgabe0 (mon.0) 1136634 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 17:17:24.578802 mon.cephgabe0 (mon.0) 1136635 : audit [DBG] from='client.? [xxx]:0/3837647218' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
2021-06-29 17:17:25.012672 mon.cephgabe0 (mon.0) 1136636 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd='[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]': finished
2021-06-29 17:17:26.007230 mon.cephgabe0 (mon.0) 1136638 : audit [DBG] from='client.? [xxx]:0/4094418758' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
2021-06-29 17:17:26.093214 mon.cephgabe-mon-7bb8b9e3d5 (mon.2) 216759 : audit [INF] from='mgr.490803962 [xxx]:0/2660941' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 17:17:26.093977 mon.cephgabe0 (mon.0) 1136639 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 17:17:27.044498 mon.cephgabe0 (mon.0) 1136640 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd='[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]': finished
2021-06-29 17:17:27.431356 mon.cephgabe0 (mon.0) 1136642 : audit [DBG] from='client.? [xxx]:0/2297050664' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
2021-06-29 17:17:28.125837 mon.cephgabe-mon-7bb8b9e3d5 (mon.2) 216760 : audit [INF] from='mgr.490803962 [xxx]:0/2660941' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 17:17:28.126546 mon.cephgabe0 (mon.0) 1136643 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 17:17:28.861731 mon.cephgabe0 (mon.0) 1136644 : audit [DBG] from='client.? [xxx]:0/2133339951' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
2021-06-29 17:17:29.125072 mon.cephgabe0 (mon.0) 1136645 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd='[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]': finished
2021-06-29 17:17:30.141884 mon.cephgabe-mon-7bb8b9e3d5 (mon.2) 216761 : audit [INF] from='mgr.490803962 [xxx]:0/2660941' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
2021-06-29 17:17:30.142422 mon.cephgabe0 (mon.0) 1136647 : audit [INF] from='mgr.490803962 ' entity='mgr.cephgabe-mon-7bb8b9e3d5' cmd=[{"prefix": "osd pool set", "pool": "default.rgw.buckets.data", "var": "pgp_num_actual", "val": "4170"}]: dispatch
</pre></p>
<p>It's a bug in DaemonServer::adjust_pgs</p>
RADOS - Bug #47767 (Resolved): octopus: setting noscrub crashed osd process
https://tracker.ceph.com/issues/47767
2020-10-06T14:20:54Z
Dan van der Ster
<p>We just had a crash of one osd (out of ~1200) moments after we set noscrub and nodeep-scrub on the cluster.</p>
<p>Here's the setting of noscrub and the reporting of osd.762 failing:<br /><pre>
2020-10-06 15:54:45.474 7f60f532e700 0 mon.cephbeesly-mon-2a00f134e5@0(leader) e40 handle_command mon_command({"prefix": "osd set", "key": "noscrub"} v 0) v1
2020-10-06 15:54:45.474 7f60f532e700 0 log_channel(audit) log [INF] : from='client.504140151 ' entity='client.admin' cmd=[{"prefix": "osd set", "key": "noscrub"}]: dispatch
2020-10-06 15:54:45.956 7f60f7b33700 1 mon.cephbeesly-mon-2a00f134e5@0(leader).osd e3446634 do_prune osdmap full prune enabled
2020-10-06 15:54:46.249 7f60f3b2b700 1 mon.cephbeesly-mon-2a00f134e5@0(leader).osd e3446635 e3446635: 1222 total, 1219 up, 1216 in
2020-10-06 15:54:46.254 7f60f3b2b700 0 log_channel(audit) log [INF] : from='client.504140151 ' entity='client.admin' cmd='[{"prefix": "osd set", "key": "noscrub"}]': finished
2020-10-06 15:54:46.254 7f60f3b2b700 0 log_channel(cluster) log [DBG] : osdmap e3446635: 1222 total, 1219 up, 1216 in
2020-10-06 15:54:47.855 7f60f7b33700 0 log_channel(cluster) log [WRN] : Health check update: Degraded data redundancy: 558438/1087980300 objects degraded (0.051%), 29 pgs degraded, 29 pgs undersized (PG_DEGRADED)
2020-10-06 15:54:47.856 7f60f7b33700 1 mon.cephbeesly-mon-2a00f134e5@0(leader).osd e3446635 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 125829120 full_alloc: 520093696 kv_alloc: 369098752
2020-10-06 15:54:50.096 7f60f532e700 1 mon.cephbeesly-mon-2a00f134e5@0(leader).osd e3446635 prepare_failure osd.762 [v2:128.142.162.24:6800/201371,v1:128.142.162.24:6801/201371] from osd.929 is reporting failure:1
2020-10-06 15:54:50.096 7f60f532e700 0 log_channel(cluster) log [DBG] : osd.762 reported immediately failed by osd.929
2020-10-06 15:54:50.096 7f60f532e700 1 mon.cephbeesly-mon-2a00f134e5@0(leader).osd e3446635 we're forcing failure of osd.762
2020-10-06 15:54:50.096 7f60f532e700 0 log_channel(cluster) log [INF] : osd.762 failed (root=default,room=0513-R-0050,rack=RA17,host=p06253939h49320) (connection refused reported by osd.929)
</pre></p>
<p>The osdmap e3446634 contains the noscrub and nodeep-scrub flags.</p>
<p>Here's osd.762 crashing:<br /><pre>
-25> 2020-10-06 15:54:44.719 7f57ba0c4700 2 osd.762 pg_epoch: 3446633 pg[4.4b1( v 3446633'286971848 (3446612'286968763,3446633'286971848] local-lis/les=3399272/3399273 n=36055 ec=355/355 lis/c 3399272/3399272 les/c/f 3399273/3399299/0 3399272/3399272/3399220) [762,247,793] r=0 lpr=3399272 crt=3446633'286971848 lcod 3446633'286971847 mlcod 3446633'286971847 active+clean+scrubbing+deep trimq=[1984d~1] ps=119]
-24> 2020-10-06 15:54:45.093 7f57ba0c4700 10 monclient: _renew_subs
-23> 2020-10-06 15:54:45.093 7f57ba0c4700 10 monclient: _send_mon_message to mon.p05517715y01595 at v2:188.184.36.166:3300/0
-22> 2020-10-06 15:54:45.093 7f57c58db700 3 osd.762 3446633 handle_osd_map epochs [3446634,3446634], i have 3446633, src has [3442954,3446634]
-21> 2020-10-06 15:54:45.280 7f57c30d6700 3 osd.762 3446634 handle_osd_map epochs [3446634,3446634], i have 3446634, src has [3442954,3446634]
-20> 2020-10-06 15:54:45.280 7f57c30d6700 3 osd.762 3446634 handle_osd_map epochs [3446634,3446634], i have 3446634, src has [3442954,3446634]
-19> 2020-10-06 15:54:45.322 7f57c30d6700 3 osd.762 3446634 handle_osd_map epochs [3446634,3446634], i have 3446634, src has [3442954,3446634]
-18> 2020-10-06 15:54:45.322 7f57c30d6700 3 osd.762 3446634 handle_osd_map epochs [3446634,3446634], i have 3446634, src has [3442954,3446634]
-17> 2020-10-06 15:54:45.324 7f57ba0c4700 0 log_channel(cluster) log [DBG] : 4.4b1 scrub starts
-16> 2020-10-06 15:54:45.328 7f57c30d6700 3 osd.762 3446634 handle_osd_map epochs [3446634,3446634], i have 3446634, src has [3442954,3446634]
-15> 2020-10-06 15:54:45.372 7f57c30d6700 3 osd.762 3446634 handle_osd_map epochs [3446634,3446634], i have 3446634, src has [3442954,3446634]
-14> 2020-10-06 15:54:45.389 7f57ba0c4700 2 osd.762 pg_epoch: 3446634 pg[4.4b1( v 3446633'286971848 (3446612'286968763,3446633'286971848] local-lis/les=3399272/3399273 n=36055 ec=355/355 lis/c 3399272/3399272 les/c/f 3399273/3399299/0 3399272/3399272/3399220) [762,247,793] r=0 lpr=3399272 crt=3446633'286971848 lcod 3446633'286971847 mlcod 3446633'286971847 active+clean+scrubbing trimq=[1984d~1] ps=119] scrub_compare_maps replica 247 has 3 items
-13> 2020-10-06 15:54:45.389 7f57ba0c4700 2 osd.762 pg_epoch: 3446634 pg[4.4b1( v 3446633'286971848 (3446612'286968763,3446633'286971848] local-lis/les=3399272/3399273 n=36055 ec=355/355 lis/c 3399272/3399272 les/c/f 3399273/3399299/0 3399272/3399272/3399220) [762,247,793] r=0 lpr=3399272 crt=3446633'286971848 lcod 3446633'286971847 mlcod 3446633'286971847 active+clean+scrubbing trimq=[1984d~1] ps=119] scrub_compare_maps replica 793 has 3 items
-12> 2020-10-06 15:54:45.389 7f57ba0c4700 2 osd.762 pg_epoch: 3446634 pg[4.4b1( v 3446633'286971848 (3446612'286968763,3446633'286971848] local-lis/les=3399272/3399273 n=36055 ec=355/355 lis/c 3399272/3399272 les/c/f 3399273/3399299/0 3399272/3399272/3399220) [762,247,793] r=0 lpr=3399272 crt=3446633'286971848 lcod 3446633'286971847 mlcod 3446633'286971847 active+clean+scrubbing trimq=[1984d~1] ps=119] scrub_compare_maps osd.762 has 3 items
-11> 2020-10-06 15:54:45.389 7f57ba0c4700 2 osd.762 pg_epoch: 3446634 pg[4.4b1( v 3446633'286971848 (3446612'286968763,3446633'286971848] local-lis/les=3399272/3399273 n=36055 ec=355/355 lis/c 3399272/3399272 les/c/f 3399273/3399299/0 3399272/3399272/3399220) [762,247,793] r=0 lpr=3399272 crt=3446633'286971848 lcod 3446633'286971847 mlcod 3446633'286971847 active+clean+scrubbing trimq=[1984d~1] ps=119] 4.4b1 shard 247 4:8d20000d:::rbd_data.afa6497c52e435.000000000001adf5:head : missing
4.4b1 shard 247 4:8d20001b:::rbd_data.e28da6777550e9.00000000004025c5:head : missing
4.4b1 shard 247 4:8d20007c:::rbd_data.9e6d808a383234.000000000005eb0f:head : missing
4.4b1 shard 762 4:8d22820b:::rbd_data.0e08f963b0c4e9.0000000000041e4a:head : missing
4.4b1 shard 793 4:8d22820b:::rbd_data.0e08f963b0c4e9.0000000000041e4a:head : missing
4.4b1 shard 762 4:8d228210:::rbd_data.6db2f7c4d97bf74.00000000000155ee:head : missing
4.4b1 shard 793 4:8d228210:::rbd_data.6db2f7c4d97bf74.00000000000155ee:head : missing
4.4b1 shard 762 4:8d228217:::rbd_data.67f02bc2a30fea8.0000000000000e59:head : missing
4.4b1 shard 793 4:8d228217:::rbd_data.67f02bc2a30fea8.0000000000000e59:head : missing
-10> 2020-10-06 15:54:45.389 7f57ba0c4700 -1 log_channel(cluster) log [ERR] : 4.4b1 shard 247 4:8d20000d:::rbd_data.afa6497c52e435.000000000001adf5:head : missing
-9> 2020-10-06 15:54:45.389 7f57ba0c4700 -1 log_channel(cluster) log [ERR] : 4.4b1 shard 247 4:8d20001b:::rbd_data.e28da6777550e9.00000000004025c5:head : missing
-8> 2020-10-06 15:54:45.389 7f57ba0c4700 -1 log_channel(cluster) log [ERR] : 4.4b1 shard 247 4:8d20007c:::rbd_data.9e6d808a383234.000000000005eb0f:head : missing
-7> 2020-10-06 15:54:45.389 7f57ba0c4700 -1 log_channel(cluster) log [ERR] : 4.4b1 shard 762 4:8d22820b:::rbd_data.0e08f963b0c4e9.0000000000041e4a:head : missing
-6> 2020-10-06 15:54:45.389 7f57ba0c4700 -1 log_channel(cluster) log [ERR] : 4.4b1 shard 793 4:8d22820b:::rbd_data.0e08f963b0c4e9.0000000000041e4a:head : missing
-5> 2020-10-06 15:54:45.389 7f57ba0c4700 -1 log_channel(cluster) log [ERR] : 4.4b1 shard 762 4:8d228210:::rbd_data.6db2f7c4d97bf74.00000000000155ee:head : missing
-4> 2020-10-06 15:54:45.389 7f57ba0c4700 -1 log_channel(cluster) log [ERR] : 4.4b1 shard 793 4:8d228210:::rbd_data.6db2f7c4d97bf74.00000000000155ee:head : missing
-3> 2020-10-06 15:54:45.389 7f57ba0c4700 -1 log_channel(cluster) log [ERR] : 4.4b1 shard 762 4:8d228217:::rbd_data.67f02bc2a30fea8.0000000000000e59:head : missing
-2> 2020-10-06 15:54:45.389 7f57ba0c4700 -1 log_channel(cluster) log [ERR] : 4.4b1 shard 793 4:8d228217:::rbd_data.67f02bc2a30fea8.0000000000000e59:head : missing
-1> 2020-10-06 15:54:45.406 7f57ba0c4700 -1 /builddir/build/BUILD/ceph-14.2.11/src/osd/PG.cc: In function 'void PG::do_replica_scrub_map(OpRequestRef)' thread 7f57ba0c4700 time 2020-10-06 15:54:45.394594
/builddir/build/BUILD/ceph-14.2.11/src/osd/PG.cc: 4632: FAILED ceph_assert(scrubber.waiting_on_whom.count(m->from))
ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55b3a44150e1]
2: (()+0x4d72a9) [0x55b3a44152a9]
3: (PG::do_replica_scrub_map(boost::intrusive_ptr<OpRequest>)+0x634) [0x55b3a45ad404]
4: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x692) [0x55b3a46b4d42]
5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) [0x55b3a44f4da2]
6: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x55b3a47837c2]
7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x55b3a450fd3f]
8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x55b3a4ac3c26]
9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55b3a4ac6740]
10: (()+0x7ea5) [0x7f57dc0faea5]
11: (clone()+0x6d) [0x7f57dafbe8dd]
0> 2020-10-06 15:54:45.412 7f57ba0c4700 -1 *** Caught signal (Aborted) **
in thread 7f57ba0c4700 thread_name:tp_osd_tp
</pre></p>
<p>I have the coredump; maybe this is useful:<br /><pre>
(gdb) up
#8 0x000055b3a45ad404 in PG::do_replica_scrub_map(boost::intrusive_ptr<OpRequest>) ()
at /usr/src/debug/ceph-14.2.11/src/osd/PG.cc:4635
4635 dout(10) << __func__ << " replica was preempted, setting flag" << dendl;
(gdb) list
4630 dout(10) << __func__ << " waiting_on_whom was " << scrubber.waiting_on_whom
4631 << dendl;
4632 ceph_assert(scrubber.waiting_on_whom.count(m->from));
4633 scrubber.waiting_on_whom.erase(m->from);
4634 if (m->preempted) {
4635 dout(10) << __func__ << " replica was preempted, setting flag" << dendl;
4636 scrub_preempted = true;
4637 }
4638 if (scrubber.waiting_on_whom.empty()) {
4639 requeue_scrub(ops_blocked_by_scrub());
(gdb) p m->from
$1 = {static NO_OSD = 2147483647, osd = 247, shard = {id = -1 '\377', static NO_SHARD = {id = -1 '\377',
static NO_SHARD = <same as static member of an already seen type>}}}
</pre></p>
<p>I suppose this is related to #46275, which just landed in this release, so I marked this as a regression.</p>
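<p>For reference, a defensive version of that check would drop a stale replica scrub map instead of asserting (a hypothetical Python sketch of the logic only, not the actual C++ fix):</p>
<pre>
from dataclasses import dataclass, field

@dataclass
class Scrubber:
    waiting_on_whom: set = field(default_factory=set)
    preempted: bool = False

def do_replica_scrub_map(scrubber, from_osd, preempted=False):
    # If the scrub was aborted/restarted (e.g. because noscrub was just set),
    # a late scrub map can arrive from an OSD we are no longer waiting on.
    if from_osd not in scrubber.waiting_on_whom:
        print("ignoring stale scrub map from osd.%d" % from_osd)
        return
    scrubber.waiting_on_whom.discard(from_osd)
    scrubber.preempted = scrubber.preempted or preempted
</pre>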
mgr - Bug #47238 (Resolved): mgr becomes unresponsive when the progress bar is shown
https://tracker.ceph.com/issues/47238
2020-09-01T14:43:20Z
Dan van der Ster
<p><a class="issue tracker-7 status-3 priority-4 priority-default closed" title="Fix: mgr progress module causes needless load (Resolved)" href="https://tracker.ceph.com/issues/46416">#46416</a> works well in that when there is no progress bar in `ceph -s`, the mgr cpu usage is low and the mgr is responsive.</p>
<p>However, when a progress bar is active, e.g. after `ceph osd out 722`, the mgr becomes unbearably slow, never responding to commands like `ceph balancer status`. The `finisher-Mgr` queue also grows to tens of thousands of entries.</p>
<p>I suspect that `progress` needs further optimizations, or at least an option to disable it.</p>
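<p>For example, one cheap optimization would be to cache the expensive per-event work and refresh it at most every few seconds (an illustrative Python sketch only, not the actual progress module code):</p>
<pre>
import time

class CachedPgSummary:
    """Throttle an expensive refresh so a burst of events (e.g. right after
    `ceph osd out`) doesn't keep the mgr busy recomputing the same data."""
    def __init__(self, fetch, min_interval=5.0):
        self._fetch = fetch          # callable that does the expensive work
        self._min_interval = min_interval
        self._last_time = 0.0
        self._cached = None

    def get(self):
        now = time.monotonic()
        if self._cached is None or now - self._last_time >= self._min_interval:
            self._cached = self._fetch()
            self._last_time = now
        return self._cached

# usage: summary = CachedPgSummary(expensive_pg_dump); call summary.get() per event
</pre>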
RADOS - Bug #43893 (Duplicate): lingering osd_failure ops (due to failure_info holding references?)
https://tracker.ceph.com/issues/43893
2020-01-30T07:21:19Z
Dan van der Ster
<p>On Nautilus v14.2.6 we see osd_failure ops which linger:</p>
<pre>
# ceph --cluster=erin health detail
HEALTH_WARN 1 slow ops, oldest one blocked for 25202 sec, mon.cepherin-mon-7cb9b591e1 has slow ops
SLOW_OPS 1 slow ops, oldest one blocked for 25202 sec, mon.cepherin-mon-7cb9b591e1 has slow ops
</pre>
<p>The lingering op is:<br /><pre>
{
"description": "osd_failure(failed timeout osd.170 [v2:128.142.162.136:6973/1335903,v1:128.142.162.136:6976/1335903] for 23sec e728185 v728185)",
"initiated_at": "2020-01-30 01:15:34.939494",
"age": 25454.252783782002,
"duration": 25454.252815929001,
"type_data": {
"events": [
{
"time": "2020-01-30 01:15:34.939494",
"event": "initiated"
},
{
"time": "2020-01-30 01:15:34.939494",
"event": "header_read"
},
{
"time": "0.000000",
"event": "throttled"
},
{
"time": "0.000000",
"event": "all_read"
},
{
"time": "0.000000",
"event": "dispatched"
},
{
"time": "2020-01-30 01:15:34.939641",
"event": "mon:_ms_dispatch"
},
{
"time": "2020-01-30 01:15:34.939641",
"event": "mon:dispatch_op"
},
{
"time": "2020-01-30 01:15:34.939642",
"event": "psvc:dispatch"
},
{
"time": "2020-01-30 01:15:34.939676",
"event": "osdmap:preprocess_query"
},
{
"time": "2020-01-30 01:15:34.939678",
"event": "osdmap:preprocess_failure"
},
{
"time": "2020-01-30 01:15:34.939701",
"event": "osdmap:prepare_update"
},
{
"time": "2020-01-30 01:15:34.939702",
"event": "osdmap:prepare_failure"
},
{
"time": "2020-01-30 01:15:34.939739",
"event": "no_reply: send routed request"
}
],
"info": {
"seq": 10919358,
"src_is_mon": false,
"source": "osd.297 v2:128.142.25.100:6898/3120619",
"forwarded_to_leader": false
}
}
},
</pre></p>
<p>We can clear that slow op either by restarting mon.cepherin-mon-7cb9b591e1 or with `ceph osd fail osd.170`.</p>
<p>This issue is identical to <a class="external" href="https://tracker.ceph.com/issues/24531">https://tracker.ceph.com/issues/24531</a>, so presumably that fix was incomplete.</p>
CephFS - Bug #40608 (Duplicate): mds: assert after `delete gather` in C_Drop_Cache::recall_client...
https://tracker.ceph.com/issues/40608
2019-07-01T14:57:42Z
Dan van der Ster
<p>While performing an mds cache drop I had an MDS assert.</p>
<p>Command was:<br /><pre>
> ceph daemon mds.`hostname -s` cache drop 10
</pre></p>
<p>Crash line was:<br /><pre>
0> 2019-07-01 15:25:20.880887 7f5957186700 -1 /builddir/build/BUILD/ceph-12.2.12/src/include/Context.h: In function 'C_GatherBuilderBase<ContextType, GatherType>::~C_GatherBuilderBase() [with ContextType = MDSInternalContextBase; GatherType = MDSGather]' thread 7f5957186700 time 2019-07-01 15:25:20.877570
/builddir/build/BUILD/ceph-12.2.12/src/include/Context.h: 432: FAILED assert(activated)
</pre></p>
<p>Backtrace:<br /><pre>
(gdb) bt
#0 0x00007f595e9dc49b in raise () from /lib64/libpthread.so.0
#1 0x000055a7faafcb86 in reraise_fatal (signum=6) at /usr/src/debug/ceph-12.2.12/src/global/signal_handler.cc:74
#2 handle_fatal_signal (signum=6) at /usr/src/debug/ceph-12.2.12/src/global/signal_handler.cc:138
#3 <signal handler called>
#4 0x00007f595d9ea207 in raise () from /lib64/libc.so.6
#5 0x00007f595d9eb8f8 in abort () from /lib64/libc.so.6
#6 0x000055a7fab3e6d4 in ceph::__ceph_assert_fail (assertion=assertion@entry=0x55a7faf20f90 "activated",
file=file@entry=0x55a7faf1ed98 "/builddir/build/BUILD/ceph-12.2.12/src/include/Context.h", line=line@entry=432,
func=func@entry=0x55a7faf22540 <C_GatherBuilderBase<MDSInternalContextBase, MDSGather>::~C_GatherBuilderBase()::__PRETTY_FUNCTION__> "C_GatherBuilderBase<ContextType, GatherType>::~C_GatherBuilderBase() [with ContextType = MDSInternalContextBase; GatherType = MDSGather]") at /usr/src/debug/ceph-12.2.12/src/common/assert.cc:66
#7 0x000055a7fa8196d2 in C_GatherBuilderBase<MDSInternalContextBase, MDSGather>::~C_GatherBuilderBase (this=<optimized out>, __in_chrg=<optimized out>)
at /usr/src/debug/ceph-12.2.12/src/include/Context.h:432
#8 0x000055a7fa82092b in C_Drop_Cache::recall_client_state (this=0x55a9106c7680) at /usr/src/debug/ceph-12.2.12/src/mds/MDSRank.cc:343
#9 0x000055a7fa7efe1a in operator() (a0=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-12.2.12/build/boost/include/boost/function/function_template.hpp:760
#10 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /usr/src/debug/ceph-12.2.12/src/include/Context.h:493
#11 0x000055a7fa7ecfe9 in Context::complete (this=0x55a8f031db60, r=<optimized out>) at /usr/src/debug/ceph-12.2.12/src/include/Context.h:70
#12 0x000055a7fab3ae04 in SafeTimer::timer_thread (this=0x55a80596a078) at /usr/src/debug/ceph-12.2.12/src/common/Timer.cc:97
#13 0x000055a7fab3c82d in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-12.2.12/src/common/Timer.cc:30
#14 0x00007f595e9d4dd5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f595dab1ead in clone () from /lib64/libc.so.6
</pre></p>
<p>Some gdb:<br /><pre>
(gdb) up
#8 0x000055a7fa82092b in C_Drop_Cache::recall_client_state (this=0x55a9106c7680) at /usr/src/debug/ceph-12.2.12/src/mds/MDSRank.cc:343
343 delete gather;
(gdb) p gather
$1 = (MDSGatherBuilder *) 0x55a8a8587820
(gdb) p *gather
$2 = {cct = 0x55a8057081c0, c_gather = 0x55a811b2ad80, finisher = 0x0, activated = false}
(gdb)
</pre></p>
CephFS - Bug #38203 (Can't reproduce): ceph-mds segfault during migrator nicely exporting
https://tracker.ceph.com/issues/38203
2019-02-06T10:34:16Z
Dan van der Ster
<p>We had a v12.2.10 mds crash during metadata balancing:</p>
<pre>
-22> 2019-02-06 03:49:51.932215 7f7be2bc9700 0 mds.3.migrator nicely exporting to mds.0 [dir 0x500006410dd /volumes/_nogroup/69d07521-b740-46d1-a545-7057160a8fb2/user/atlact1/www/jobs/2019-02-04/ [2,head] auth{0=1,1=1,2=1,4=1} v=744446 cv=744381/744381 REP dir_auth=3 state=1610612738|complete f(v1 m2019-02-05 10:16:58.337992 57=0+57) n(v9947 rc2019-02-06 03:49:44.347839 b110305046691 180971=180914+57)/n(v9947 rc2019-02-06 03:49:37.147924 b110298302472 180966=180909+57) hs=57+0,ss=0+0 dirty=1 | dnwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=0 authpin=0 0x561f5f294e00]
-21> 2019-02-06 03:49:52.583911 7f7be63d0700 2 mds.3.cache check_memory_usage total 14002256, rss 11375468, heap 315424, baseline 315424, buffers 0, 55325 / 2141345 inodes have caps, 83015 caps, 0.0387677 caps per inode
-20> 2019-02-06 03:49:54.692285 7f7be5bcf700 10 monclient: _send_mon_message to mon.cephflax-mon-9b406e0261 at 137.138.121.135:6789/0
-19> 2019-02-06 03:49:56.929179 7f7be7bd3700 10 monclient: tick
-18> 2019-02-06 03:49:56.929232 7f7be7bd3700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-02-06 03:49:26.929206)
-17> 2019-02-06 03:49:57.635358 7f7be63d0700 2 mds.3.cache check_memory_usage total 14002256, rss 11375172, heap 315424, baseline 315424, buffers 0, 55334 / 2133234 inodes have caps, 83029 caps, 0.0389217 caps per inode
-16> 2019-02-06 03:49:58.692466 7f7be5bcf700 10 monclient: _send_mon_message to mon.cephflax-mon-9b406e0261 at 137.138.121.135:6789/0
-15> 2019-02-06 03:50:01.357112 7f7be2bc9700 0 mds.3.migrator nicely exporting to mds.0 [dir 0x100093748b9.101* /volumes/_nogroup/f51e1a94-b1ff-4f2c-9686-c7e9835e0f6c/work/cli-build/ [2,head] auth{0=1} v=437725 cv=437725/437725 REP dir_auth=3 state=1073741826|complete f(v48 m2019-02-05 15:22:45.539666 8788=0+8788) n(v5346 rc2019-02-05 15:23:10.481882 b105348841048 24087=15299+8788) hs=8788+0,ss=0+0 | dnwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=0 waiter=0 authpin=0 tempexporting=0 0x561f77cfa000]
-14> 2019-02-06 03:50:01.523391 7f7be8bd5700 0 mds.3.migrator nicely exporting to mds.0 [dir 0x100093748b9.010* /volumes/_nogroup/f51e1a94-b1ff-4f2c-9686-c7e9835e0f6c/work/cli-build/ [2,head] auth{0=1,2=1} v=426986 cv=426986/426986 REP dir_auth=3 state=1073741826|complete f(v48 m2019-02-06 00:38:56.760195 8769=0+8769) n(v5346 rc2019-02-06 00:39:12.952094 b100212238080 24118=15349+8769) hs=8769+0,ss=0+0 | dnwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 0x561f875d1800]
-13> 2019-02-06 03:50:02.190203 7f7be93d6700 5 asok(0x561da983e1c0) AdminSocket: request 'get_command_descriptions' '' to 0x561da98201e0 returned 4774 bytes
-12> 2019-02-06 03:50:02.208436 7f7be93d6700 1 do_command 'perf dump' '
-11> 2019-02-06 03:50:02.208745 7f7be93d6700 1 do_command 'perf dump' 'result is 13156 bytes
-10> 2019-02-06 03:50:02.208755 7f7be93d6700 5 asok(0x561da983e1c0) AdminSocket: request 'perf dump' '' to 0x561da9820040 returned 13156 bytes
-9> 2019-02-06 03:50:02.209381 7f7be93d6700 5 asok(0x561da983e1c0) AdminSocket: request 'get_command_descriptions' '' to 0x561da98201e0 returned 4774 bytes
-8> 2019-02-06 03:50:02.227158 7f7be93d6700 1 mds.cephflax-mds-2a4cfd0e2c asok_command: session ls (starting...)
-7> 2019-02-06 03:50:02.234906 7f7be93d6700 1 mds.cephflax-mds-2a4cfd0e2c asok_command: session ls (complete)
-6> 2019-02-06 03:50:02.236787 7f7be93d6700 5 asok(0x561da983e1c0) AdminSocket: request 'session ls' '' to 0x561da9820220 returned 420758 bytes
-5> 2019-02-06 03:50:02.302902 7f7beb355700 0 -- 137.138.52.159:6800/1556042223 >> 10.32.3.223:0/1782848130 conn(0x561dd03b7800 :6800 s=STATE_OPEN pgs=1008711 cs=1 l=0).process bad tag 50
-4> 2019-02-06 03:50:02.569959 7f7beb355700 0 -- 137.138.52.159:6800/1556042223 >> 10.32.3.223:0/1782848130 conn(0x561dbecb3800 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 2 vs existing csq=1 existing_state=STATE_STANDBY
-3> 2019-02-06 03:50:02.641677 7f7be63d0700 2 mds.3.cache check_memory_usage total 14002256, rss 11374236, heap 315424, baseline 315424, buffers 0, 55459 / 2107149 inodes have caps, 83164 caps, 0.0394675 caps per inode
-2> 2019-02-06 03:50:02.692685 7f7be5bcf700 10 monclient: _send_mon_message to mon.cephflax-mon-9b406e0261 at 137.138.121.135:6789/0
-1> 2019-02-06 03:50:04.494387 7f7beb355700 0 -- 137.138.52.159:6800/1556042223 >> 188.184.64.209:0/3134808329 conn(0x561dd0419000 :6800 s=STATE_OPEN pgs=1128257 cs=1 l=0).process bad tag 50
0> 2019-02-06 03:50:05.914673 7f7be8bd5700 -1 *** Caught signal (Segmentation fault) **
in thread 7f7be8bd5700 thread_name:ms_dispatch
ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94) luminous (stable)
1: (()+0x5ca6c1) [0x561d9f4036c1]
2: (()+0xf5d0) [0x7f7beda135d0]
3: (tcmalloc::CentralFreeList::ReleaseListToSpans(void*)+0x13) [0x7f7bef2c0cb3]
4: (tcmalloc::CentralFreeList::ShrinkCache(int, bool)+0x124) [0x7f7bef2c0df4]
5: (tcmalloc::CentralFreeList::MakeCacheSpace()+0x5d) [0x7f7bef2c0f1d]
6: (tcmalloc::CentralFreeList::InsertRange(void*, void*, int)+0x68) [0x7f7bef2c0f98]
7: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int)+0xb8) [0x7f7bef2c4408]
8: (tcmalloc::ThreadCache::Scavenge()+0x48) [0x7f7bef2c4798]
9: (MDRequestImpl::~MDRequestImpl()+0x224) [0x561d9f1b5b64]
10: (MDRequestImpl::~MDRequestImpl()+0x9) [0x561d9f1b5ce9]
11: (OpTracker::unregister_inflight_op(TrackedOp*)+0xf2) [0x561d9f3a88e2]
12: (Migrator::handle_export_discover_ack(MExportDirDiscoverAck*)+0x2e6) [0x561d9f2c9196]
13: (Migrator::dispatch(Message*)+0xbb) [0x561d9f2d370b]
14: (MDSRank::handle_deferrable_message(Message*)+0x613) [0x561d9f1079d3]
15: (MDSRank::_dispatch(Message*, bool)+0x1e3) [0x561d9f115e13]
16: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x561d9f116d55]
17: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x561d9f0ff093]
18: (DispatchQueue::entry()+0x792) [0x561d9f73ba22]
19: (DispatchQueue::DispatchThread::entry()+0xd) [0x561d9f4cc9cd]
20: (()+0x7dd5) [0x7f7beda0bdd5]
21: (clone()+0x6d) [0x7f7becae8ead]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre>
<p>Log is in ceph-post-file: 3933388c-0fe6-4095-a122-11d4a4c4c6e8</p>
<p>A core dump is available, but it's 14GB, so it might be more convenient if you tell me what to grab from gdb.</p>
Ceph - Bug #14695 (Resolved): osd: rados cppool omap to ec pool crashes osd
https://tracker.ceph.com/issues/14695
2016-02-08T16:40:01Z
Dan van der Ster
<p>If we run `rados cppool` from a replicated pool to an erasure-coded pool on hammer 0.94.5, the destination OSD hits an assert when it tries to write the omap object.</p>
<p>User side:<br /><pre>
# rados cppool .rgw.buckets.index test-ec
.rgw.buckets.index:.dir.default.39904722.872 => test-ec:.dir.default.39904722.872
2016-02-08 17:31:08.652540 7f21aea12700 0 -- 128.142.36.227:0/3721633 >> 188.184.18.39:6851/2788949 pipe(0x7f21a405ffa0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f21a4065c00).fault
^C
</pre></p>
<p>ceph-osd assert failure:<br /><pre>
-1> 2016-02-08 17:30:35.042187 7f66ecab6700 1 -- 188.184.18.39:0/2789844 <== osd.237 128.142.23.40:6891/842231 1 ==== osd_op_reply(1 .dir.default.39904722.872 [copy-get max 8388608] v0'0 uv1 ondisk = 0) v6 ==== 192+0+211 (2786091916 0 4072082031) 0x1dd6c940 con 0x2061e3c0
0> 2016-02-08 17:30:35.076894 7f66f5118700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::_write_copy_chunk(ReplicatedPG::CopyOpRef, PGBackend::PGTransaction*)' thread 7f66f5118700 time 2016-02-08 17:30:35.042276
osd/ReplicatedPG.cc: 6431: FAILED assert(cop->omap_header.length() == 0)
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: (ReplicatedPG::_write_copy_chunk(boost::shared_ptr<ReplicatedPG::CopyOp>, PGBackend::PGTransaction*)+0xb43) [0x895cd3]
2: (ReplicatedPG::_build_finish_copy_transaction(boost::shared_ptr<ReplicatedPG::CopyOp>, PGBackend::PGTransaction*)+0x114) [0x895e74]
3: (ReplicatedPG::process_copy_chunk(hobject_t, unsigned long, int)+0x56a) [0x8d4baa]
4: (C_Copyfrom::finish(int)+0xb7) [0x914517]
5: (Context::complete(int)+0x9) [0x69a4d9]
6: (Finisher::finisher_thread_entry()+0x188) [0xa4fc78]
7: /lib64/libpthread.so.0() [0x3358a07a51]
8: (clone()+0x6d) [0x33586e893d]
</pre></p>
<p>Probably this needs to be backported to hammer?<br /> <a class="external" href="https://github.com/ceph/ceph/commit/2b4acfb1b808b98132a771cf1089063c0f7a75b5">https://github.com/ceph/ceph/commit/2b4acfb1b808b98132a771cf1089063c0f7a75b5</a></p>
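<p>In the meantime, a pre-flight check on the destination pool avoids triggering the assert. A minimal sketch (it assumes the "type" field in `ceph osd dump -f json`, where 3 means erasure-coded and 1 replicated):</p>
<pre>
import json
import subprocess
import sys

def pool_is_erasure(pool_name):
    osdmap = json.loads(subprocess.check_output(["ceph", "osd", "dump", "-f", "json"]))
    for pool in osdmap["pools"]:
        if pool["pool_name"] == pool_name:
            return pool["type"] == 3   # assumed: 3 = erasure, 1 = replicated
    raise SystemExit("pool %s not found" % pool_name)

if __name__ == "__main__":
    dst = sys.argv[1]
    if pool_is_erasure(dst):
        sys.exit("refusing: %s is erasure-coded and cannot store omap data" % dst)
    print("%s is replicated; cppool of omap objects should be safe" % dst)
</pre>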
CephFS - Backport #14668 (Resolved): hammer: Wrong ceph get mdsmap assertion
https://tracker.ceph.com/issues/14668
2016-02-05T08:51:17Z
Dan van der Ster
<p><a class="external" href="https://github.com/ceph/ceph/pull/7542">https://github.com/ceph/ceph/pull/7542</a></p>
Ceph - Bug #13844 (Rejected): ceph df MAX AVAIL is incorrect for simple replicated pool
https://tracker.ceph.com/issues/13844
2015-11-20T10:25:28Z
Dan van der Ster
<p>Since we upgraded from firefly to hammer the MAX AVAIL column is quite wrong for most of our pools.</p>
<pre>
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
3564T 2544T 1020T 28.61
</pre>
<p>Example pool:<br /><pre>
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
volumes 4 255T 7.17 272T 67556416
</pre></p>
<p>By my calculations MAX AVAIL should be 2184T/3 = 728T.</p>
<pre>
pool 4 'volumes' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 296326 min_read_recency_for_promote 1 stripe_width 0
</pre>
<p>crush_ruleset 0:</p>
<pre>
{
"rule_id": 0,
"rule_name": "data",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -2,
"item_name": "0513-R-0050"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "rack"
},
{
"op": "emit"
}
]
},
</pre>
<p>The osd tree:</p>
<pre>
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-92 0 root drain
-18 218.39999 root os
-22 218.39999 room 0513-R-0050-os
-30 54.59999 rack RJ55-os
-34 54.59999 host p05151113748698-os
-44 54.59999 rack RJ47-os
-26 54.59999 host p05151113613837-os
-62 54.59999 rack RJ45-os
-50 54.59999 host p05151113587529-os
-64 54.59999 rack RJ41-os
-63 54.59999 host p05151113561997-os
-1 3349.22803 root default
-2 2184.42822 room 0513-R-0050
-3 163.80000 rack RJ35
-15 54.59999 host p05151113471870
-16 54.59999 host p05151113489275
-17 54.59999 host p05151113479552
-4 163.80000 rack RJ37
-23 54.59999 host p05151113507373
-24 54.59999 host p05151113508409
-25 54.59999 host p05151113521447
-5 163.80000 rack RJ39
-19 54.59999 host p05151113538756
-20 54.59999 host p05151113535271
-21 54.59999 host p05151113534235
-6 163.80000 rack RJ41
-27 54.59999 host p05151113558761
-28 54.59999 host p05151113544113
-29 54.59999 host p05151113551146
-7 163.79990 rack RJ43
-31 54.59999 host p05151113573587
-32 54.59999 host p05151113578124
-33 54.59991 host p05151113568206
-8 163.79990 rack RJ45
-35 54.59999 host p05151113578807
-41 54.59999 host p05151113585107
-47 54.59991 host p05151113590997
-9 163.80000 rack RJ47
-38 54.59999 host p05151113599377
-39 54.59999 host p05151113598352
-40 54.59999 host p05151113619324
-10 219.30989 rack RJ49
-36 55.50992 host p05151113636272
-43 54.59999 host p05151113640230
-48 54.59999 host p05151113642826
-49 54.59999 host p05151113633314
-11 218.39981 rack RJ51
-37 54.59991 host p05151113676458
-42 54.59999 host p05151113674062
-45 54.59991 host p05151113669359
-46 54.59999 host p05151113654107
-12 219.30989 rack RJ53
-53 54.59999 host p05151113723693
-54 54.59999 host p05151113706163
-56 54.59999 host p05151113719408
-58 55.50992 host p05151113677609
-13 163.20972 rack RJ55
-59 54.39999 host p05151113760120
-60 54.40973 host p05151113725483
-61 54.39999 host p05151113751590
-14 217.59900 rack RJ57
-51 54.39999 host p05151113781242
-52 54.39999 host p05151113782262
-55 54.39999 host p05151113778539
-57 54.39999 host p05151113777233
-65 1164.79980 room 0513-R-0060
-71 582.39990 ipservice S513-A-IP37
-70 291.19995 rack BA09
-69 72.79999 host p05798818a82857
-73 72.79999 host p05798818b00047
-83 72.79999 host p05798818b00174
-86 72.79999 host p05798818b04322
-80 291.19995 rack BA10
-79 72.79999 host p05798818v47100
-88 72.79999 host p05798818v64334
-90 72.79999 host p05798818w03166
-91 72.79999 host p05798818v51559
-76 582.39990 ipservice S513-A-IP62
-75 291.19995 rack BA11
-74 72.79999 host p05798818s98313
-84 72.79999 host p05798818s63747
-85 72.79999 host p05798818s49204
-89 72.79999 host p05798818s40185
-78 291.19995 rack BA12
-77 72.79999 host p05798818b12431
-81 72.79999 host p05798818b37327
-82 72.79999 host p05798818b78429
-87 72.79999 host p05798818b40951
</pre>
Ceph - Bug #13843 (Resolved): ceph osd pool stats broken in hammer
https://tracker.ceph.com/issues/13843
2015-11-20T08:30:20Z
Dan van der Ster
<p>Since we updated from firefly to hammer, the output of `ceph osd pool stats` is wrong when the cluster is degraded:</p>
<pre>
# ceph status --cluster beesly
cluster b4f463a0-c671-43a8-bd36-e40ab8d233d2
health HEALTH_WARN
3 pgs backfill
23 pgs backfilling
26 pgs degraded
26 pgs stuck degraded
26 pgs stuck unclean
26 pgs stuck undersized
26 pgs undersized
recovery 385262/264401079 objects degraded (0.146%)
recovery 859811/264401079 objects misplaced (0.325%)
monmap e23: 5 mons at {p01001532021656=128.142.35.220:6789/0,p01001532971954=128.142.36.229:6789/0,p05517715d82373=188.184.36.206:6789/0,p05517715y01595=188.184.36.166:6789/0,p05517715y58557=188.184.36.164:6789/0}
election epoch 5302, quorum 0,1,2,3,4 p01001532021656,p01001532971954,p05517715y58557,p05517715y01595,p05517715d82373
mdsmap e224: 1/1/1 up {0=0=up:active}
osdmap e296250: 1200 osds: 1199 up, 1199 in; 26 remapped pgs
pgmap v60417303: 20000 pgs, 22 pools, 339 TB data, 85969 kobjects
1018 TB used, 2543 TB / 3562 TB avail
385262/264401079 objects degraded (0.146%)
859811/264401079 objects misplaced (0.325%)
19974 active+clean
23 active+undersized+degraded+remapped+backfilling
3 active+undersized+degraded+remapped+wait_backfill
recovery io 395 MB/s, 98 objects/s
client io 101 MB/s rd, 342 MB/s wr, 7794 op/s
# ceph osd pool stats --cluster=beesly
pool data id 0
nothing is going on
pool metadata id 1
nothing is going on
pool rbd id 2
nothing is going on
pool volumes id 4
-517/54 objects misplaced (-957.407%)
recovery io 997 MB/s, 249 objects/s
client io 64512 kB/s rd, 188 MB/s wr, 4820 op/s
pool images id 5
-94/0 objects misplaced (-inf%)
recovery io 118 MB/s, 14 objects/s
client io 2617 kB/s rd, 0 op/s
pool .rgw.root id 17
nothing is going on
pool .rgw.control id 18
nothing is going on
pool .users.uid id 21
nothing is going on
pool .users.email id 22
nothing is going on
pool .users id 23
nothing is going on
pool .rgw.buckets.index id 24
client io 1312 B/s rd, 1 op/s
pool .usage id 56
nothing is going on
pool .log id 57
nothing is going on
pool .intent-log id 58
nothing is going on
pool .rgw.gc id 59
nothing is going on
pool .rgw id 60
nothing is going on
pool .rgw.buckets id 63
client io 988 kB/s wr, 1 op/s
pool test id 64
nothing is going on
pool test.os id 65
nothing is going on
pool .users.swift id 73
nothing is going on
pool cinder-critical id 75
client io 3256 kB/s rd, 10051 kB/s wr, 142 op/s
pool test.critical id 77
nothing is going on
</pre>
Ceph - Bug #13061 (Resolved): systemd: daemons restart when package is upgraded
https://tracker.ceph.com/issues/13061
2015-09-11T14:48:31Z
Dan van der Ster
<p>I just updated some ceph-mon and ceph-osd hosts from ceph-9.0.3-1460.g4290d68.el7.x86_64 to ceph-9.0.3-1572.g90cce11.el7.x86_64 and all the daemons restarted at the time of the yum updates.</p>
<p>From yum.log:</p>
<pre><code>Sep 11 16:27:43 Updated: 1:ceph-9.0.3-1572.g90cce11.el7.x86_64</code></pre>
<p>From journalctl -u ceph:</p>
<pre>
Sep 11 16:27:43 lxfsrd37a01.cern.ch systemd[1]: Stopping Ceph object storage daemon...
Sep 11 16:27:43 lxfsrd37a01.cern.ch ceph-osd[131530]: 2015-09-11 16:27:43.492935 7f01699c8700 -1 osd.0 248 *** Got signal Terminated ***
Sep 11 16:27:43 lxfsrd37a01.cern.ch ceph-osd[131530]: 2015-09-11 16:27:43.843512 7f01699c8700 -1 osd.0 248 shutdown
Sep 11 16:27:46 lxfsrd37a01.cern.ch systemd[1]: Stopped Ceph object storage daemon.
Sep 11 16:28:04 lxfsrd37a01.cern.ch systemd[1]: Starting Ceph object storage daemon...
Sep 11 16:28:04 lxfsrd37a01.cern.ch ceph-osd-prestart.sh[250880]: getopt: unrecognized option '--setuser'
Sep 11 16:28:04 lxfsrd37a01.cern.ch ceph-osd-prestart.sh[250880]: getopt: unrecognized option '--setgroup'
Sep 11 16:28:05 lxfsrd37a01.cern.ch ceph-osd-prestart.sh[250880]: create-or-move updated item name 'osd.0' weight 1.6816 at location {host=lxfsrd37a01,rack=R
Sep 11 16:28:05 lxfsrd37a01.cern.ch systemd[1]: Started Ceph object storage daemon.
Sep 11 16:28:05 lxfsrd37a01.cern.ch ceph-osd[251987]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
</pre>
<p>This differs from the current (hammer) behaviour, which IIRC was agreed upon so that package auto-upgrades don't trigger daemon restarts.</p>
Ceph - Bug #11080 (Duplicate): bucket_straw2_choose div by zero when item_weight is 0
https://tracker.ceph.com/issues/11080
2015-03-10T10:57:05Z
Dan van der Ster
<p>Following <a class="issue tracker-1 status-3 priority-6 priority-high2 closed" title="Bug: crushtool -d zeroes the osd weights in straw2 buckets (Resolved)" href="https://tracker.ceph.com/issues/11079">#11079</a>, we found that zero-weighted items cause a floating point exception in straw2 buckets:</p>
<pre>$ gdb /usr/bin/crushtool
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/crushtool...Reading symbols from /usr/lib/debug/usr/bin/crushtool.debug...done.
done.
(gdb) run -i crush.test --test
Starting program: /usr/bin/crushtool -i crush.test --test
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff7fe8700 (LWP 101803)]
[New Thread 0x7ffff712e700 (LWP 101804)]
WARNING: no output selected; use --output-csv or --show-X
Program received signal SIGFPE, Arithmetic exception.
0x000000000064b24e in bucket_straw2_choose (in=0x2a66f40, x=9, r=0) at crush/mapper.c:331
331 draw = ln / bucket->item_weights[i];
Missing separate debuginfos, use: debuginfo-install sqlite-3.6.20-1.el6.x86_64
(gdb) bt
#0 0x000000000064b24e in bucket_straw2_choose (in=0x2a66f40, x=9, r=0) at crush/mapper.c:331
#1 crush_bucket_choose (in=0x2a66f40, x=9, r=0) at crush/mapper.c:360
#2 0x000000000064b53f in crush_choose_firstn (map=0x2a2fb50, bucket=0x2a66f40, weight=0x2b3f690, weight_max=7310, x=9, numrep=1, type=0, out=0x7fffffffcdd8, outpos=0, out_size=1, tries=1, recurse_tries=0, local_retries=0, local_fallback_retries=0, recurse_to_leaf=0,
vary_r=1, out2=0x0, parent_r=0) at crush/mapper.c:467
#3 0x000000000064b8d1 in crush_choose_firstn (map=0x2a2fb50, bucket=0x2a4d680, weight=0x2b3f690, weight_max=7310, x=9, numrep=1, type=2, out=0x7fffffffcdd4, outpos=0, out_size=1, tries=51, recurse_tries=1, local_retries=0, local_fallback_retries=0, recurse_to_leaf=1,
vary_r=1, out2=0x7fffffffcdd8, parent_r=0) at crush/mapper.c:510
#4 0x000000000064c1d5 in crush_do_rule (map=0x2a2fb50, ruleno=<value optimized out>, x=9, result=0x7fffffffcdf0, result_max=1, weight=0x2b3f690, weight_max=7310, scratch=0x7fffffffcdd0) at crush/mapper.c:901
#5 0x000000000057c1b5 in CrushWrapper::do_rule (this=<value optimized out>, rule=5, x=9, out=std::vector of length 0, capacity 0, maxout=1, weight=std::vector of length 7310, capacity 8192 = {...}) at crush/CrushWrapper.h:1025
#6 0x000000000059727c in CrushTester::test (this=0x7fffffffd830) at crush/CrushTester.cc:575
#7 0x00000000004ef775 in main (argc=<value optimized out>, argv=<value optimized out>) at tools/crushtool.cc:811
(gdb) p bucket->item_weights[i]
value has been optimized out
(gdb) p bucket->h.size
$2 = 4
(gdb) p bucket->item_weights[0]
$3 = 0
(gdb) p bucket->item_weights[1]
$4 = 0
(gdb) p bucket->item_weights[2]
$5 = 0
(gdb) p bucket->item_weights[3]
$6 = 0
(gdb) p bucket->h.weight
$8 = 0
(gdb) p bucket->h.id
$9 = -200
</pre>
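<p>For illustration, here is the selection logic with the zero-weight guard it needs (a simplified Python sketch of straw2, not the actual crush/mapper.c code):</p>
<pre>
import math
import random

def straw2_choose(item_weights):
    """Each item draws ln(u)/weight and the largest draw wins. Zero-weight
    items must be skipped, otherwise the division faults -- exactly what
    crashes bucket_straw2_choose above."""
    best_item, best_draw = None, None
    for item, weight in item_weights.items():
        if weight == 0:
            continue  # never choose a zero-weighted item
        draw = math.log(1.0 - random.random()) / weight
        if best_draw is None or draw > best_draw:
            best_item, best_draw = item, draw
    return best_item  # None if every item has zero weight

print(straw2_choose({"osd.0": 0, "osd.1": 1.0, "osd.2": 2.0}))
</pre>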
Ceph - Bug #11079 (Resolved): crushtool -d zeroes the osd weights in straw2 buckets
https://tracker.ceph.com/issues/11079
2015-03-10T10:43:02Z
Dan van der Ster
<p>Using: ceph-0.93-59.g3878878.el6.x86_64.rpm</p>
<p>Starting with crush.txt, which has only straw buckets, we change the first host to straw2 (see crush.txt2). Then we do:</p>
<pre><code>crushtool -c crush.txt2 -o crush.map2<br />crushtool -d crush.map2 -o crush.txt3</code></pre>
<p>In crush.txt3 you see that all the OSDs in the straw2 host become 0.000.</p>
Ceph - Bug #9487 (Resolved): dumpling: snaptrimmer causes slow requests while backfilling. osd_sn...
https://tracker.ceph.com/issues/9487
2014-09-16T01:44:44Z
Dan van der Ster
<p>Hi,<br />using dumpling 0.67.10...</p>
<p>We are doing quite a bit of backfilling these days in order to make room for some new SSD journals. I am removing 2 OSDs at a time using ceph osd crush reweight osd.N 0, and each time I do this I get slow requests which start a few minutes after the backfilling begins and end around 3 minutes later. Otherwise the backfilling completes without incident. I was able to isolate the cause of the slow requests to a single OSD which is busy with snap trimming. Here are some logs of osd.11 from this morning.</p>
<p>Backfilling starts at 2014-09-16 09:03:04.623202</p>
<p>First slow request is <br /> 2014-09-16 09:06:42.413217 osd.94 xxx:6920/89989 108 : [WRN] slow request 30.169698 seconds old, received at 2014-09-16 09:06:12.243490: osd_op(client.36356481.0:203616535 rbd_data.22cd9436e995f3.0000000000000f8f [write 2899968~20480] 4.9e398fd7 e95006) v4 currently waiting for subops from [11,1169]</p>
<p>Here is ceph-osd.11.log with debug_osd=20</p>
<p>2014-09-16 09:06:12.275675 7ff0ac575700 20 osd.11 pg_epoch: 95006 pg[5.318( v 93844'1471 (1587'13,93844'1471] local-les=95006 n=489 ec=357 les/c 95006/95006 94971/95005/95005) [11,1212,30] r=0 lpr=95005 mlcod 0'0 active+clean snaptrimq=[...] snap_trimmer slept for 0.100000<br />2014-09-16 09:06:12.435938 7ff0ac575700 10 osd.11 pg_epoch: 95006 pg[5.318( v 93844'1471 (1587'13,93844'1471] local-les=95006 n=489 ec=357 les/c 95006/95006 94971/95005/95005) [11,1212,30] r=0 lpr=95005 mlcod 0'0 active+clean snaptrimq=[...] snap_trimmer entry<br />2014-09-16 09:06:12.436803 7ff0ac575700 10 osd.11 pg_epoch: 95006 pg[5.318( v 93844'1471 (1587'13,93844'1471] local-les=95006 n=489 ec=357 les/c 95006/95006 94971/95005/95005) [11,1212,30] r=0 lpr=95005 mlcod 0'0 active+clean snaptrimq=[...] snap_trimmer posting<br />...<br />2014-09-16 09:06:12.439147 7ff0ac575700 10 osd.11 pg_epoch: 95006 pg[5.318( v 93844'1471 (1587'13,93844'1471] local-les=95006 n=489 ec=357 les/c 95006/95006 94971/95005/95005) [11,1212,30] r=0 lpr=95005 mlcod 0'0 active+clean snaptrimq=[...]] SnapTrimmer state<Trimming/TrimmingObjects>: TrimmingObjects: trimming snap 1<br />...<br />2014-09-16 09:06:12.446790 7ff0ac575700 10 osd.11 pg_epoch: 95006 pg[5.318( v 93844'1471 (1587'13,93844'1471] local-les=95006 n=489 ec=357 les/c 95006/95006 94971/95005/95005) [11,1212,30] r=0 lpr=95005 mlcod 0'0 active+clean snaptrimq=[...]] SnapTrimmer state<Trimming/TrimmingObjects>: TrimmingObjects: trimming snap 2</p>
<p>then eventually the last one three minutes later:</p>
<p>2014-09-16 09:09:04.621188 7ff0ac575700 10 osd.11 pg_epoch: 95006 pg[5.318( v 93844'1471 (1587'13,93844'1471] local-les=95006 n=489 ec=357 les/c 95006/95006 94971/95005/95005) [11,1212,30] r=0 lpr=95005 m<br />lcod 0'0 active+clean snaptrimq=[7275~1]] SnapTrimmer state<Trimming/TrimmingObjects>: TrimmingObjects: trimming snap 7275</p>
<p>During those three minutes, the osd was <em>only</em> snap trimming; there were no other types of ops getting through. I even tried (re)injecting --debug_osd=20 while the osd was trimming, just to see exactly when osd.11 was responsive again. The injectargs hung until just after trimming completed:</p>
<p>2014-09-16 09:09:04.820524 7ff0b6f86700 20 osd.11 95020 _dispatch 0x1218ec80 command(tid 24: {"injected_args": ["--debug_osd 20"], "prefix": "injectargs"}) v1</p>
<p>Obviously the slow requests also disappeared just after the snap trim on osd.11 completed, and during this backfilling exercise there were no other slow requests.</p>
<p>As you can also see, the osd_snap_trim_sleep is not effective. We have it set to 0.1 but it's useless in this case anyway because the sleep only happens once at the start of trimming all of PG 5.318.</p>
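<p>In other words, the sleep would need to happen per trimmed snap rather than once per PG; roughly (illustrative Python, not the OSD code):</p>
<pre>
import time

def trim_pg_current(snaps, trim_one, sleep=0.1):
    time.sleep(sleep)        # today: one sleep, then a tight trimming loop
    for snap in snaps:
        trim_one(snap)

def trim_pg_per_snap(snaps, trim_one, sleep=0.1):
    for snap in snaps:
        trim_one(snap)
        time.sleep(sleep)    # yield between snaps so client ops get through
</pre>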
<p>Full log of osd.11 is attached.</p>
<p>Do you have any suggestions for how to make this more transparent? We have ~150 more OSDs to drain, so I'll have plenty of opportunities to troubleshoot this.</p>
<p>Best Regards, Dan</p>