Bug #20222

closed

v12.0.3 Luminous bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

Added by Nokia ceph-users almost 7 years ago. Updated over 6 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-disk
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

Env: 5 nodes, v12.0.3, EC 4+1 bluestore, RHEL 7.3 (kernel 3.10.0-514.el7.x86_64)

Problem description: bluestore OSDs go down on 12.0.3 when the cluster is under moderate (~200 MB/s) write load.

OSD log excerpt after enabling debug_bluestore = 20:

2017-05-30 08:57:15.707908 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfsstore_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:17.410365 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfsstore_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:18.512773 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfsstore_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:20.005812 7f9d8f712700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.006648 7f9d8ff13700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.015585 7f9d8f712700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.015677 7f9d8f712700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60

..<snip>..

2017-05-30 08:57:37.506457 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) _do_write perform garbage collection, expected benefit = 0 AUs
2017-05-30 08:57:37.506459 7f9d875fd700 20 bluestore.extentmap(0x7f9da0ed50b0) dirty_range mark inline shard dirty
2017-05-30 08:57:37.506461 7f9d875fd700 10 bluestore(/var/lib/ceph/osd/ceph-19) _write meta #-1:7b3f43c4:::osd_superblock:0# 0x0~1f8 = 0
2017-05-30 08:57:37.506463 7f9d875fd700 10 bluestore(/var/lib/ceph/osd/ceph-19) _txc_calc_cost 0x7f9da2d22680 cost 3104677 (2 ios * 1500000 + 104677 bytes)
2017-05-30 08:57:37.506479 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) _txc_write_nodes txc 0x7f9da2d22680 onodes 0x7f9da0ed4fc0,0x7f9da5bd8000,0x7f9da72dad80 shared_blobs
2017-05-30 08:57:37.506482 7f9d875fd700 20 bluestore.extentmap(0x7f9da0ed50b0) update #-1:7b3f43c4:::osd_superblock:0#
2017-05-30 08:57:37.506485 7f9d875fd700 20 bluestore.extentmap(0x7f9da0ed50b0) update inline shard 81 bytes from 1 extents
2017-05-30 08:57:37.506491 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) onode #-1:7b3f43c4:::osd_superblock:0# is 108 (21 bytes onode + 2 bytes spanning blobs + 85 bytes inline extents)
2017-05-30 08:57:37.506495 7f9d875fd700 20 bluestore.extentmap(0x7f9da5bd80f0) update #-1:a51ca935:::osdmap.848:0#
2017-05-30 08:57:37.506497 7f9d875fd700 20 bluestore.extentmap(0x7f9da5bd80f0) update inline shard 235 bytes from 2 extents
2017-05-30 08:57:37.506499 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) onode #-1:a51ca935:::osdmap.848:0# is 264 (23 bytes onode + 2 bytes spanning blobs + 239 bytes inline extents)
2017-05-30 08:57:37.506502 7f9d875fd700 20 bluestore.extentmap(0x7f9da72dae70) update #-1:48acce63:::inc_osdmap.848:0#
2017-05-30 08:57:37.506504 7f9d875fd700 20 bluestore.extentmap(0x7f9da72dae70) update inline shard 82 bytes from 1 extents
2017-05-30 08:57:37.506506 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) onode #-1:48acce63:::inc_osdmap.848:0# is 110 (22 bytes onode + 2 bytes spanning blobs + 86 bytes inline extents)
2017-05-30 08:57:37.506558 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) _txc_finalize_kv txc 0x7f9da2d22680 allocated 0x[388afa0000~30000] released 0x[]
2017-05-30 08:57:37.506562 7f9d875fd700 10 freelist allocate 0x388afa0000~30000
2017-05-30 08:57:37.506563 7f9d875fd700 20 freelist _xor first_key 0x388af80000 last_key 0x388af80000
2017-05-30 08:57:37.510446 7f9d8ff13700 0 -- 10.50.63.152:6857/1940715 >> 10.50.63.153:6869/1572972 conn(0x7f9da698a800 :6857 s=STATE_OPEN pgs=243 cs=1 l=0).fault with nothing to send, going to standby
2017-05-30 08:57:37.514366 7f9d8ff13700 0 -- 10.50.63.152:6857/1940715 >> 10.50.63.153:6863/1572982 conn(0x7f9da620f800 :6857 s=STATE_OPEN pgs=149 cs=1 l=0).fault with nothing to send, going to standby
2017-05-30 08:57:37.517351 7f9d8ef11700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:37.517354 7f9d8ef11700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60 ====

Config value: threadpool_default_timeout = 60

// default timeout while calling WaitInterval on an empty queue
OPTION(threadpool_default_timeout, OPT_INT, 60)

I raised this value to 300; the OSD then ran for a while longer but eventually hit the same timeout:
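
For reference, this is roughly how the override was applied (a sketch only; it merely lengthens the reporting window and does not fix the underlying stall, and we assume the OSDs need a restart to pick up the new value):

# /etc/ceph/ceph.conf on the OSD nodes (illustrative excerpt)
[osd]
threadpool_default_timeout = 300    # default is 60 seconds

# restart the affected OSD so the new value takes effect, e.g.
systemctl restart ceph-osd@19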

2017-05-30 09:44:32.027168 7efc1f2b9700 20 bluestore(/var/lib/ceph/osd/ceph-41) statfsstore_statfs(0x574f2860000/0x5753b990000, stored 0x2e65ed1/0x6730000, compress 0x0/0x0/0x0)
2017-05-30 09:44:35.529164 7efc1f2b9700 20 bluestore(/var/lib/ceph/osd/ceph-41) statfsstore_statfs(0x574f2860000/0x5753b990000, stored 0x2e65ed1/0x6730000, compress 0x0/0x0/0x0)
2017-05-30 09:44:40.831130 7efc1f2b9700 20 bluestore(/var/lib/ceph/osd/ceph-41) statfsstore_statfs(0x574f2860000/0x5753b990000, stored 0x2e65ed1/0x6730000, compress 0x0/0x0/0x0)
2017-05-30 09:44:41.634056 7efc3fafa700 -1 osd.41 878 heartbeat_check: no reply from 10.50.62.152:6853 osd.30 since back 2017-05-30 09:44:21.521517 front 2017-05-30 09:44:21.521517 (cutoff 2017-05-30 09:44:21.634053)
2017-05-30 09:44:42.128280 7efc442ac700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 300
2017-05-30 09:44:42.128282 7efc442ac700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 300
2017-05-30 09:44:42.128284 7efc442ac700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 300

Ref: ceph-users thread "Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60" for more details.

Findings

Disk I/O is not backing up (the device is mostly idle while the thread pool times out), so we suspect a bluestore bug rather than slow media:

awk 'NR==5 || /sdo/' diskstat-PL6-CN4.txt
<snip>
#Time Name KBytes Merged IOs Size Wait KBytes Merged IOs Size Wait RWSize QLen Wait SvcTim Util
14:17:12 sdo 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14:17:13 sdo 0 0 0 0 0 220 47 4 55 12 55 1 12 7 2
14:17:14 sdo 0 0 0 0 0 56 67 10 6 17 5 1 17 10 10
14:17:15 sdo 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14:17:16 sdo 0 0 0 0 0 740 102 19 39 25 38 5 25 4 8
14:17:17 sdo 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14:17:18 sdo 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
<snip>

Thanks


Files

osd.12.ops (136 KB) - Output of 'ceph daemon osd.12 ops' - Thomas Coelho, 06/28/2017 12:30 PM
ops_14-osd.zip (516 KB) - osd-14 ops report - Nokia ceph-users, 09/10/2017 04:39 PM

Related issues: 1 (0 open, 1 closed)

Related to RADOS - Bug #21171: bluestore: aio submission deadlock (Resolved, Sage Weil, 08/29/2017)

Actions #1

Updated by Nokia ceph-users almost 7 years ago

Note: we only hit this issue when generating write load via the librados C API (http://docs.ceph.com/docs/master/rados/api/librados/). We ran the same test with the rados command-line utility and could not reproduce the behaviour.
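
For context, our load generator is essentially a loop of object writes through the C API. A minimal sketch of that kind of client (the pool name, object size, and object count are placeholders, not our actual tool):

/* build with: cc loadgen.c -o loadgen -lrados */
#include <rados/librados.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    size_t obj_size = 4 * 1024 * 1024;   /* 4 MB per object (placeholder) */
    char *buf = malloc(obj_size);
    char name[64];
    int i;

    memset(buf, 'x', obj_size);

    if (rados_create(&cluster, NULL) < 0)                 /* connect as client.admin */
        return 1;
    rados_conf_read_file(cluster, NULL);                  /* read the default ceph.conf */
    if (rados_connect(cluster) < 0)
        return 1;
    if (rados_ioctx_create(cluster, "testpool", &io) < 0) /* "testpool" is a placeholder pool name */
        return 1;

    /* sustained write load: many full-object writes in a tight loop */
    for (i = 0; i < 10000; i++) {
        snprintf(name, sizeof(name), "obj-%d", i);
        if (rados_write_full(io, name, buf, obj_size) < 0)
            fprintf(stderr, "write %s failed\n", name);
    }

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    free(buf);
    return 0;
}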

Actions #2

Updated by Sage Weil almost 7 years ago

  • Status changed from New to Need More Info

What is the workload? Is this situation easy to reproduce?

What does 'ceph daemon osd.NNN ops' report?

Actions #3

Updated by Thomas Coelho almost 7 years ago

Hello,

I have the same error with a newly deployed test cluster and a different workload, but it is also an erasure-coded pool on bluestore OSDs.

3 nodes with 10 OSDs each (4-6 TB disks). Ubuntu 16.04. Installed with 12.0.3 and upgraded to 12.1.0.
EC pool with the default profile (2+1).
35 TB RBD device with a 100 GB cache tier, mapped via the kernel client (kernel 4.8) on a separate machine.
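
A sketch of roughly equivalent commands for this kind of setup (pool names, PG counts, and the cache sizing are illustrative, not the exact values used):

ceph osd erasure-code-profile set profile21 k=2 m=1
ceph osd pool create ecpool 1024 1024 erasure profile21     # 2+1 EC data pool
ceph osd pool create cachepool 128                          # replicated cache pool
ceph osd tier add ecpool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay ecpool cachepool
ceph osd pool set cachepool target_max_bytes 107374182400   # ~100 GB cache
rbd create ecpool/testimg --size 35T
rbd map ecpool/testimg                                       # on the client node (kernel 4.8)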

I started importing data with rsync. After some time (approx. 30 min) a random OSD gets stuck and goes down. I/O stops because of min_size 3. These are the first messages:

2017-06-28 12:18:54.310519 7f47f4497700 0 log_channel(cluster) log [WRN] : 11 slow requests, 5 included below; oldest blocked for > 30.384919 secs
2017-06-28 12:18:54.310548 7f47f4497700 0 log_channel(cluster) log [WRN] : slow request 30.066817 seconds old, received at 2017-06-28 12:18:24.243510: MOSDECSubOpWrite(13.c6s1 1587/1574 ECSubWrite(tid=210918, reqid=osd.4.1276:411831, at_version=1587'17069, trim_to=1587'14047, roll_forward_to=1587'17067)) currently started

I attached the output of 'ceph daemon osd.NN ops'.

Before this I tried a different EC profile: 3+2, with the failure domain at the OSD level instead of hosts, but with the same result.

Hope this helps.

Actions #4

Updated by Nokia ceph-users over 6 years ago

Sage,

Able to reproduce the issue in v12.2.0

Env: 5 nodes, EC 4+1, 120 OSDs
----

2017-09-10 15:39:14.154986 7f46f3e04700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f46dee1c700' had timed out after 180
2017-09-10 15:39:14.172555 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172565 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172568 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172570 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172572 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.201733 7f46ef63d700 0 log_channel(cluster) log [WRN] : 5304 slow requests, 5 included below; oldest blocked for > 179.924775 secs
2017-09-10 15:39:14.201746 7f46ef63d700 0 log_channel(cluster) log [WRN] : slow request 120.323422 seconds old, received at 2017-09-10 15:37:13.877416: pg_update_log_missing(3.11fs3 epoch 4484/3630 rep_tid 391429 entries 4484'23206 (0'0) error 3:f88a2e73:::%2fc17%2fvx0000031%2fpot%2fchannel120%2fhls%2fh_961cb33375df003a8a5219f98f7286f3%2fvar1070000%2fseg8838993_w1504680546.ts:head by client.6592717.0:232418 0.000000 -2) currently queued_for_pg
2017-09-10 15:39:14.201749 7f46ef63d700 0 log_channel(cluster) log [WRN] : slow request 60.248614 seconds old, received at 2017-09-10 15:38:13.952224: MOSDECSubOpWrite(3.4c5s2 4484/3630 ECSubWrite(tid=310904, reqid=client.6590513.0:114620, at_version=4484'23083, trim_to=3662'20000, roll_forward_to=4484'23048)) currently queued_for_pg

Load -
~~
io:
client: 9705 B/s rd, 450 MB/s wr, 6 op/s rd, 2515 op/s wr
~~

Attaching the output of "ceph daemon osd.14 ops".

Log snippet with debug_bluestore = 20:

<snip>
...

2017-09-10 15:39:13.200738 7f46ef63d700 0 log_channel(cluster) log [WRN] : slow request 120.575572 seconds old, received at 2017-09-10 15:37:12.624208: pg_update_log_missing(3.273s3 epoch 4484/3630 rep_tid 329951 entries 4484'23086 (0'0) error 3:ce54d089:::%2fc17%2fvx0000031%2fpot%2fchannel12%2fhls%2fh_7b5961234862649569afc8a55620930b%2fvar3640000%2fseg8837300_w1504680725.ts:head by client.6595265.0:113704 0.000000 -2) currently queued_for_pg
2017-09-10 15:39:13.372163 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.372173 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.445600 7f46d9611700 20 bluestore(/var/lib/ceph/osd/ceph-14) statfsstore_statfs(0x2e168720000/0x3a3792d1000, stored 0xb352296d7a/0xc082ae0000, compress 0x0/0x0/0x0)
2017-09-10 15:39:13.572269 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.572281 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.572283 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.572285 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.572287 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.772376 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.972465 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.056494 7f46f3e04700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f46dc617700' had timed out after 180
2017-09-10 15:39:14.056497 7f46f3e04700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f46dee1c700' had timed out after 180
2017-09-10 15:39:14.056595 7f46f3603700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f46dc617700' had timed out after 180
2017-09-10 15:39:14.154986 7f46f3e04700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f46dee1c700' had timed out after 180
2017-09-10 15:39:14.172555 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172565 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172568 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172570 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172572 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.201733 7f46ef63d700 0 log_channel(cluster) log [WRN] : 5304 slow requests, 5 included below; oldest blocked for > 179.924775 secs
2017-09-10 15:39:14.201746 7f46ef63d700 0 log_channel(cluster) log [WRN] : slow request 120.323422 seconds old, received at 2017-09-10 15:37:13.877416: pg_update_log_missing(3.11fs3 epoch 4484/3630 rep_tid 391429 entries 4484'23206 (0'0) error 3:f88a2e73:::%2fc17%2fvx0000031%2fpot%2fchannel120%2fhls%2fh_961cb33375df003a8a5219f98f7286f3%2fvar1070000%2fseg8838993_w1504680546.ts:head by client.6592717.0:232418 0.000000 -2) currently queued_for_pg
2017-09-10 15:39:14.201749 7f46ef63d700 0 log_channel(cluster) log [WRN] : slow request 60.248614 seconds old, received at 2017-09-10 15:38:13.952224: MOSDECSubOpWrite(3.4c5s2 4484/3630 ECSubWrite(tid=310904, reqid=client.6590513.0:114620, at_version=4484'23083, trim_to=3662'20000, roll_forward_to=4484'23048)) currently queued_for_pg

12:00:01 UTC DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
02:00:01 UTC dev8-48 0.60 5.35 0.01 8.93 0.00 0.28 0.28 0.02
02:10:01 UTC dev8-48 7.38 65.19 0.00 8.83 0.00 0.08 0.08 0.06
02:20:01 UTC dev8-48 3.17 27.88 0.01 8.79 0.00 0.04 0.04 0.01
02:30:01 UTC dev8-48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
02:40:01 UTC dev8-48 36.79 41.69 4993.68 136.86 0.05 1.28 0.81 3.00
02:50:01 UTC dev8-48 61.09 85.62 8384.76 138.65 0.09 1.50 0.99 6.07
03:00:01 UTC dev8-48 59.98 90.67 8409.14 141.70 0.09 1.58 1.05 6.32
03:10:01 UTC dev8-48 59.31 97.59 8243.29 140.62 0.09 1.59 1.07 6.36
03:20:01 UTC dev8-48 64.68 107.92 8500.18 133.10 0.10 1.59 1.00 6.45
03:30:01 UTC dev8-48 68.70 179.72 8420.77 125.19 0.11 1.61 1.01 6.94
03:40:01 UTC dev8-48 83.93 4413.78 11902.07 194.41 0.40 4.76 0.83 6.94 -->> started here ->>
03:50:01 UTC dev8-48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04:00:01 UTC dev8-48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04:10:01 UTC dev8-48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Actions #5

Updated by Sage Weil over 6 years ago

  • Status changed from Need More Info to Duplicate

Highly likely this is #21171; please try latest luminous branch or wait for 12.2.1 (out very very soon now!)

Actions #6

Updated by Sage Weil over 6 years ago

  • Related to Bug #21171: bluestore: aio submission deadlock added
