Bug #20222

closed

v12.0.3 Luminous bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

Added by Nokia ceph-users almost 7 years ago. Updated over 6 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-disk
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

Env: 5 nodes, v12.0.3, EC 4+1 bluestore, RHEL 7.3 (kernel 3.10.0-514.el7.x86_64)

Problem description: bluestore OSDs go down on 12.0.3 when the cluster is under moderate (~200 MB/s) write load.

OSD log excerpt after enabling debug_bluestore = 20:

2017-05-30 08:57:15.707908 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfsstore_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:17.410365 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfsstore_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:18.512773 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfsstore_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:20.005812 7f9d8f712700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.006648 7f9d8ff13700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.015585 7f9d8f712700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.015677 7f9d8f712700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60

..<snip>..

2017-05-30 08:57:37.506457 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) _do_write perform garbage collection, expected benefit = 0 AUs
2017-05-30 08:57:37.506459 7f9d875fd700 20 bluestore.extentmap(0x7f9da0ed50b0) dirty_range mark inline shard dirty
2017-05-30 08:57:37.506461 7f9d875fd700 10 bluestore(/var/lib/ceph/osd/ceph-19) _write meta #-1:7b3f43c4:::osd_superblock:0# 0x0~1f8 = 0
2017-05-30 08:57:37.506463 7f9d875fd700 10 bluestore(/var/lib/ceph/osd/ceph-19) _txc_calc_cost 0x7f9da2d22680 cost 3104677 (2 ios * 1500000 + 104677 bytes)
2017-05-30 08:57:37.506479 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) _txc_write_nodes txc 0x7f9da2d22680 onodes 0x7f9da0ed4fc0,0x7f9da5bd8000,0x7f9da72dad80 shared_blobs
2017-05-30 08:57:37.506482 7f9d875fd700 20 bluestore.extentmap(0x7f9da0ed50b0) update #-1:7b3f43c4:::osd_superblock:0#
2017-05-30 08:57:37.506485 7f9d875fd700 20 bluestore.extentmap(0x7f9da0ed50b0) update inline shard 81 bytes from 1 extents
2017-05-30 08:57:37.506491 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) onode #-1:7b3f43c4:::osd_superblock:0# is 108 (21 bytes onode + 2 bytes spanning blobs + 85 bytes inline extents)
2017-05-30 08:57:37.506495 7f9d875fd700 20 bluestore.extentmap(0x7f9da5bd80f0) update #-1:a51ca935:::osdmap.848:0#
2017-05-30 08:57:37.506497 7f9d875fd700 20 bluestore.extentmap(0x7f9da5bd80f0) update inline shard 235 bytes from 2 extents
2017-05-30 08:57:37.506499 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) onode #-1:a51ca935:::osdmap.848:0# is 264 (23 bytes onode + 2 bytes spanning blobs + 239 bytes inline extents)
2017-05-30 08:57:37.506502 7f9d875fd700 20 bluestore.extentmap(0x7f9da72dae70) update #-1:48acce63:::inc_osdmap.848:0#
2017-05-30 08:57:37.506504 7f9d875fd700 20 bluestore.extentmap(0x7f9da72dae70) update inline shard 82 bytes from 1 extents
2017-05-30 08:57:37.506506 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) onode #-1:48acce63:::inc_osdmap.848:0# is 110 (22 bytes onode + 2 bytes spanning blobs + 86 bytes inline extents)
2017-05-30 08:57:37.506558 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) _txc_finalize_kv txc 0x7f9da2d22680 allocated 0x[388afa0000~30000] released 0x[]
2017-05-30 08:57:37.506562 7f9d875fd700 10 freelist allocate 0x388afa0000~30000
2017-05-30 08:57:37.506563 7f9d875fd700 20 freelist _xor first_key 0x388af80000 last_key 0x388af80000
2017-05-30 08:57:37.510446 7f9d8ff13700 0 -- 10.50.63.152:6857/1940715 >> 10.50.63.153:6869/1572972 conn(0x7f9da698a800 :6857 s=STATE_OPEN pgs=243 cs=1 l=0).fault with nothing to send, going to standby
2017-05-30 08:57:37.514366 7f9d8ff13700 0 -- 10.50.63.152:6857/1940715 >> 10.50.63.153:6863/1572982 conn(0x7f9da620f800 :6857 s=STATE_OPEN pgs=149 cs=1 l=0).fault with nothing to send, going to standby
2017-05-30 08:57:37.517351 7f9d8ef11700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:37.517354 7f9d8ef11700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60 ====

Config value: threadpool_default_timeout = 60

// default timeout while calling WaitInterval on an empty queue
OPTION(threadpool_default_timeout, OPT_INT, 60)

I raised this value to 300; the OSD then ran for a while longer but eventually hit the same timeout:
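
For reference, this is roughly how the override was applied (a sketch only; it merely lengthens the reporting window and does not fix the underlying stall, and we assume the OSDs need a restart to pick up the new value):

# /etc/ceph/ceph.conf on the OSD nodes (illustrative excerpt)
[osd]
threadpool_default_timeout = 300    # default is 60 seconds

# restart the affected OSD so the new value takes effect, e.g.
systemctl restart ceph-osd@19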

2017-05-30 09:44:32.027168 7efc1f2b9700 20 bluestore(/var/lib/ceph/osd/ceph-41) statfsstore_statfs(0x574f2860000/0x5753b990000, stored 0x2e65ed1/0x6730000, compress 0x0/0x0/0x0)
2017-05-30 09:44:35.529164 7efc1f2b9700 20 bluestore(/var/lib/ceph/osd/ceph-41) statfsstore_statfs(0x574f2860000/0x5753b990000, stored 0x2e65ed1/0x6730000, compress 0x0/0x0/0x0)
2017-05-30 09:44:40.831130 7efc1f2b9700 20 bluestore(/var/lib/ceph/osd/ceph-41) statfsstore_statfs(0x574f2860000/0x5753b990000, stored 0x2e65ed1/0x6730000, compress 0x0/0x0/0x0)
2017-05-30 09:44:41.634056 7efc3fafa700 -1 osd.41 878 heartbeat_check: no reply from 10.50.62.152:6853 osd.30 since back 2017-05-30 09:44:21.521517 front 2017-05-30 09:44:21.521517 (cutoff 2017-05-30 09:44:21.634053)
2017-05-30 09:44:42.128280 7efc442ac700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 300
2017-05-30 09:44:42.128282 7efc442ac700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 300
2017-05-30 09:44:42.128284 7efc442ac700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 300

Ref: ceph-users thread "Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60" for more details.

Findings

Disk I/O is not backing up (the device is mostly idle while the thread pool times out), so we suspect a bluestore bug rather than slow media:

awk 'NR==5 || /sdo/' diskstat-PL6-CN4.txt
<snip>
#Time Name KBytes Merged IOs Size Wait KBytes Merged IOs Size Wait RWSize QLen Wait SvcTim Util
14:17:12 sdo 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14:17:13 sdo 0 0 0 0 0 220 47 4 55 12 55 1 12 7 2
14:17:14 sdo 0 0 0 0 0 56 67 10 6 17 5 1 17 10 10
14:17:15 sdo 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14:17:16 sdo 0 0 0 0 0 740 102 19 39 25 38 5 25 4 8
14:17:17 sdo 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14:17:18 sdo 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
<snip>

Thanks


Files

osd.12.ops (136 KB) - Output of 'ceph daemon osd.12 ops' - Thomas Coelho, 06/28/2017 12:30 PM
ops_14-osd.zip (516 KB) - osd-14 ops report - Nokia ceph-users, 09/10/2017 04:39 PM

Related issues: 1 (0 open, 1 closed)

Related to RADOS - Bug #21171: bluestore: aio submission deadlock (Resolved, Sage Weil, 08/29/2017)

Actions #1

Updated by Nokia ceph-users almost 7 years ago

Note: we only hit this issue when generating write load via the librados C API (http://docs.ceph.com/docs/master/rados/api/librados/). We ran the same test with the rados command-line utility and could not reproduce the behaviour.
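
For context, our load generator is essentially a loop of object writes through the C API. A minimal sketch of that kind of client (the pool name, object size, and object count are placeholders, not our actual tool):

/* build with: cc loadgen.c -o loadgen -lrados */
#include <rados/librados.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    size_t obj_size = 4 * 1024 * 1024;   /* 4 MB per object (placeholder) */
    char *buf = malloc(obj_size);
    char name[64];
    int i;

    memset(buf, 'x', obj_size);

    if (rados_create(&cluster, NULL) < 0)                 /* connect as client.admin */
        return 1;
    rados_conf_read_file(cluster, NULL);                  /* read the default ceph.conf */
    if (rados_connect(cluster) < 0)
        return 1;
    if (rados_ioctx_create(cluster, "testpool", &io) < 0) /* "testpool" is a placeholder pool name */
        return 1;

    /* sustained write load: many full-object writes in a tight loop */
    for (i = 0; i < 10000; i++) {
        snprintf(name, sizeof(name), "obj-%d", i);
        if (rados_write_full(io, name, buf, obj_size) < 0)
            fprintf(stderr, "write %s failed\n", name);
    }

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    free(buf);
    return 0;
}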

Actions #2

Updated by Sage Weil almost 7 years ago

  • Status changed from New to Need More Info

What is the workload? Is this situation easy to reproduce?

What does 'ceph daemon osd.NNN ops' report?

Actions #3

Updated by Thomas Coelho almost 7 years ago

Hello,

I have the same error with a newly deployed test cluster and a different workload, but it is also an erasure-coded pool on bluestore OSDs.

3 nodes with 10 OSDs each (4-6 TB disks). Ubuntu 16.04. Installed with 12.0.3 and upgraded to 12.1.0.
EC pool with the default profile (2+1).
35 TB RBD device with a 100 GB cache tier, mapped via the kernel client (kernel 4.8) on a separate machine.
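
A sketch of roughly equivalent commands for this kind of setup (pool names, PG counts, and the cache sizing are illustrative, not the exact values used):

ceph osd erasure-code-profile set profile21 k=2 m=1
ceph osd pool create ecpool 1024 1024 erasure profile21     # 2+1 EC data pool
ceph osd pool create cachepool 128                          # replicated cache pool
ceph osd tier add ecpool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay ecpool cachepool
ceph osd pool set cachepool target_max_bytes 107374182400   # ~100 GB cache
rbd create ecpool/testimg --size 35T
rbd map ecpool/testimg                                       # on the client node (kernel 4.8)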

I started importing data with rsync. After some time (approx. 30 min) a random OSD gets stuck and goes down. I/O stops because of min_size 3. These are the first messages:

2017-06-28 12:18:54.310519 7f47f4497700 0 log_channel(cluster) log [WRN] : 11 slow requests, 5 included below; oldest blocked for > 30.384919 secs
2017-06-28 12:18:54.310548 7f47f4497700 0 log_channel(cluster) log [WRN] : slow request 30.066817 seconds old, received at 2017-06-28 12:18:24.243510: MOSDECSubOpWrite(13.c6s1 1587/1574 ECSubWrite(tid=210918, reqid=osd.4.1276:411831, at_version=1587'17069, trim_to=1587'14047, roll_forward_to=1587'17067)) currently started

I attached the output of 'ceph daemon osd.NN ops'.

Before this I tried a different EC profile: 3+2, with the failure domain at the OSD level instead of hosts, but with the same result.

Hope this helps.

Actions #4

Updated by Nokia ceph-users over 6 years ago

Sage,

Able to reproduce the issue in v12.2.0

Env: 5 nodes, EC 4+1, 120 OSDs
----

2017-09-10 15:39:14.154986 7f46f3e04700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f46dee1c700' had timed out after 180
2017-09-10 15:39:14.172555 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172565 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172568 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172570 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172572 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.201733 7f46ef63d700 0 log_channel(cluster) log [WRN] : 5304 slow requests, 5 included below; oldest blocked for > 179.924775 secs
2017-09-10 15:39:14.201746 7f46ef63d700 0 log_channel(cluster) log [WRN] : slow request 120.323422 seconds old, received at 2017-09-10 15:37:13.877416: pg_update_log_missing(3.11fs3 epoch 4484/3630 rep_tid 391429 entries 4484'23206 (0'0) error 3:f88a2e73:::%2fc17%2fvx0000031%2fpot%2fchannel120%2fhls%2fh_961cb33375df003a8a5219f98f7286f3%2fvar1070000%2fseg8838993_w1504680546.ts:head by client.6592717.0:232418 0.000000 -2) currently queued_for_pg
2017-09-10 15:39:14.201749 7f46ef63d700 0 log_channel(cluster) log [WRN] : slow request 60.248614 seconds old, received at 2017-09-10 15:38:13.952224: MOSDECSubOpWrite(3.4c5s2 4484/3630 ECSubWrite(tid=310904, reqid=client.6590513.0:114620, at_version=4484'23083, trim_to=3662'20000, roll_forward_to=4484'23048)) currently queued_for_pg

Load -
~~
io:
client: 9705 B/s rd, 450 MB/s wr, 6 op/s rd, 2515 op/s wr
~~

Attaching the output of "ceph daemon osd.14 ops".

Log snippet with debug_bluestore = 20:

<snip>
...

2017-09-10 15:39:13.200738 7f46ef63d700 0 log_channel(cluster) log [WRN] : slow request 120.575572 seconds old, received at 2017-09-10 15:37:12.624208: pg_update_log_missing(3.273s3 epoch 4484/3630 rep_tid 329951 entries 4484'23086 (0'0) error 3:ce54d089:::%2fc17%2fvx0000031%2fpot%2fchannel12%2fhls%2fh_7b5961234862649569afc8a55620930b%2fvar3640000%2fseg8837300_w1504680725.ts:head by client.6595265.0:113704 0.000000 -2) currently queued_for_pg
2017-09-10 15:39:13.372163 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.372173 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.445600 7f46d9611700 20 bluestore(/var/lib/ceph/osd/ceph-14) statfsstore_statfs(0x2e168720000/0x3a3792d1000, stored 0xb352296d7a/0xc082ae0000, compress 0x0/0x0/0x0)
2017-09-10 15:39:13.572269 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.572281 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.572283 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.572285 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.572287 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.772376 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:13.972465 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.056494 7f46f3e04700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f46dc617700' had timed out after 180
2017-09-10 15:39:14.056497 7f46f3e04700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f46dee1c700' had timed out after 180
2017-09-10 15:39:14.056595 7f46f3603700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f46dc617700' had timed out after 180
2017-09-10 15:39:14.154986 7f46f3e04700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f46dee1c700' had timed out after 180
2017-09-10 15:39:14.172555 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172565 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172568 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172570 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.172572 7f46e6e2c700 10 trim shard target 209 k meta/data ratios 0.01 + 0 (2147 + 0 ), current 209 k (209 k + 0 )
2017-09-10 15:39:14.201733 7f46ef63d700 0 log_channel(cluster) log [WRN] : 5304 slow requests, 5 included below; oldest blocked for > 179.924775 secs
2017-09-10 15:39:14.201746 7f46ef63d700 0 log_channel(cluster) log [WRN] : slow request 120.323422 seconds old, received at 2017-09-10 15:37:13.877416: pg_update_log_missing(3.11fs3 epoch 4484/3630 rep_tid 391429 entries 4484'23206 (0'0) error 3:f88a2e73:::%2fc17%2fvx0000031%2fpot%2fchannel120%2fhls%2fh_961cb33375df003a8a5219f98f7286f3%2fvar1070000%2fseg8838993_w1504680546.ts:head by client.6592717.0:232418 0.000000 -2) currently queued_for_pg
2017-09-10 15:39:14.201749 7f46ef63d700 0 log_channel(cluster) log [WRN] : slow request 60.248614 seconds old, received at 2017-09-10 15:38:13.952224: MOSDECSubOpWrite(3.4c5s2 4484/3630 ECSubWrite(tid=310904, reqid=client.6590513.0:114620, at_version=4484'23083, trim_to=3662'20000, roll_forward_to=4484'23048)) currently queued_for_pg

12:00:01 UTC DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
02:00:01 UTC dev8-48 0.60 5.35 0.01 8.93 0.00 0.28 0.28 0.02
02:10:01 UTC dev8-48 7.38 65.19 0.00 8.83 0.00 0.08 0.08 0.06
02:20:01 UTC dev8-48 3.17 27.88 0.01 8.79 0.00 0.04 0.04 0.01
02:30:01 UTC dev8-48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
02:40:01 UTC dev8-48 36.79 41.69 4993.68 136.86 0.05 1.28 0.81 3.00
02:50:01 UTC dev8-48 61.09 85.62 8384.76 138.65 0.09 1.50 0.99 6.07
03:00:01 UTC dev8-48 59.98 90.67 8409.14 141.70 0.09 1.58 1.05 6.32
03:10:01 UTC dev8-48 59.31 97.59 8243.29 140.62 0.09 1.59 1.07 6.36
03:20:01 UTC dev8-48 64.68 107.92 8500.18 133.10 0.10 1.59 1.00 6.45
03:30:01 UTC dev8-48 68.70 179.72 8420.77 125.19 0.11 1.61 1.01 6.94
03:40:01 UTC dev8-48 83.93 4413.78 11902.07 194.41 0.40 4.76 0.83 6.94 -->> started here ->>
03:50:01 UTC dev8-48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04:00:01 UTC dev8-48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04:10:01 UTC dev8-48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Actions #5

Updated by Sage Weil over 6 years ago

  • Status changed from Need More Info to Duplicate

Highly likely this is #21171; please try latest luminous branch or wait for 12.2.1 (out very very soon now!)

Actions #6

Updated by Sage Weil over 6 years ago

  • Related to Bug #21171: bluestore: aio submission deadlock added
