Bug #50466

_delete_some additional unexpected onode list

Added by Dan van der Ster about 2 months ago. Updated about 1 month ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
nautilus, octopus, pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After updating to 14.2.19 and then moving some PGs around, we see a few warnings related to the new efficient PG removal code, e.g.

/var/log/ceph/ceph-osd.792.log:2021-04-14 20:34:34.353 7fb2439d4700  0
osd.792 pg_epoch: 40906 pg[10.14b2s0( v 40734'290069
(33782'287000,40734'290069] lb MIN (bitwise) local-lis/les=33990/33991
n=36272 ec=4951/4937 lis/c 33990/33716 les/c/f 33991/33747/0
40813/40813/37166) [933,626,260,804,503,491]p933(0) r=-1 lpr=40813
DELETING pi=[33716,40813)/4 crt=40734'290069 unknown NOTIFY mbc={}]
_delete_some additional unexpected onode list (new onodes has appeared
since PG removal started[0#10:4d280000::::head#]

/var/log/ceph/ceph-osd.851.log:2021-04-14 18:40:13.312 7fd87bded700  0
osd.851 pg_epoch: 40671 pg[10.133fs5( v 40662'288967
(33782'285900,40662'288967] lb MIN (bitwise) local-lis/les=33786/33787
n=13 ec=4947/4937 lis/c 40498/33714 les/c/f 40499/33747/0
40670/40670/33432) [859,199,913,329,439,79]p859(0) r=-1 lpr=40670
DELETING pi=[33714,40670)/4 crt=40662'288967 unknown NOTIFY mbc={}]
_delete_some additional unexpected onode list (new onodes has appeared
since PG removal started[5#10:fcc80000::::head#]

/var/log/ceph/ceph-osd.851.log:2021-04-14 20:58:14.393 7fd87adeb700  0
osd.851 pg_epoch: 40906 pg[10.2e8s3( v 40610'288991
(33782'285900,40610'288991] lb MIN (bitwise) local-lis/les=33786/33787
n=161220 ec=4937/4937 lis/c 39826/33716 les/c/f 39827/33747/0
40617/40617/39225) [717,933,727,792,607,129]p717(0) r=-1 lpr=40617
DELETING pi=[33716,40617)/3 crt=40610'288991 unknown NOTIFY mbc={}]
_delete_some additional unexpected onode list (new onodes has appeared
since PG removal started[3#10:17400000::::head#]

/var/log/ceph/ceph-osd.883.log:2021-04-14 18:55:16.822 7f78c485d700  0
osd.883 pg_epoch: 40857 pg[7.d4( v 40804'9911289
(35835'9908201,40804'9911289] lb MIN (bitwise)
local-lis/les=40782/40783 n=195 ec=2063/1989 lis/c 40782/40782 les/c/f
40783/40844/0 40781/40845/40845) [877,870,894] r=-1 lpr=40845 DELETING
pi=[40782,40845)/1 crt=40804'9911289 lcod 40804'9911288 unknown NOTIFY
mbc={}] _delete_some additional unexpected onode list (new onodes has
appeared since PG removal started[#7:2b000000::::head#]

This can also lead to flapping if the OSD is particularly slow, e.g.

2021-04-21 15:23:37.003 7f51c273c700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
2021-04-21 15:23:41.595 7f51a3e81700  0
bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
observed for _collection_list, latency = 67.7234s, lat = 67s cid
=10.14aes4_head start GHMAX end GHMAX max 30
2021-04-21 15:23:41.595 7f51a3e81700  0 osd.941 pg_epoch: 43004
pg[10.14aes4( v 42754'296580 (40999'293500,42754'296580] lb MIN
(bitwise) local-lis/les=41331/41332 n=159058 ec=4951/4937 lis/c
41331/41331 les/c/f 41332/42758/0 41330/42759/33461)
[171,903,106,27,395,773]p171(0) r=-1 lpr=42759 DELETING
pi=[41331,42759)/1 crt=42754'296580 unknown NOTIFY mbc={}]
_delete_some additional unexpected onode list (new onodes has appeared
since PG removal started[4#10:75280000::::head#]
2021-04-21 15:23:50.061 7f51a3e81700  0
bluestore(/var/lib/ceph/osd/ceph-941) log_latency slow operation
observed for submit_transact, latency = 8.46584s
2021-04-21 15:23:50.062 7f51a3e81700  1 heartbeat_map reset_timeout
'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
2021-04-21 15:23:50.115 7f51b6ca1700  0
bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
observed for _txc_committed_kv, latency = 8.51916s, txc =
0x5573928a7340
2021-04-21 15:23:50.473 7f51b2498700  0 log_channel(cluster) log [WRN]
: Monitor daemon marked osd.941 down, but it is still running

A workaround to avoid flapping is to increase osd_heartbeat_grace.
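For reference, a minimal sketch of that workaround using the cluster config database; the value 30 is an illustrative example (the default grace is 20 seconds), not a recommendation, and should be tuned to how long these collection listings actually stall on your hardware:

```shell
# Raise the heartbeat grace so slow _collection_list calls during PG
# removal do not get the OSD wrongly marked down by the monitors.
# Set it globally so both the OSDs and the mons observe the new value.
ceph config set global osd_heartbeat_grace 30

# Verify the setting took effect.
ceph config get osd osd_heartbeat_grace
```

This only papers over the symptom (flapping); it does not reduce the latency spike itself.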

bluefs_buffered_io is enabled on the above cluster. I suspect it is not helping much in this case because the collection list is not fully cached by the end of a PG removal.


Related issues

Related to bluestore - Bug #50297: long osd online compaction: mon wrongly mark osd down New
Copied to RADOS - Backport #50704: nautilus: _delete_some additional unexpected onode list Resolved
Copied to RADOS - Backport #50705: octopus: _delete_some additional unexpected onode list In Progress
Copied to RADOS - Backport #50706: pacific: _delete_some additional unexpected onode list In Progress

History

#1 Updated by Dan van der Ster about 2 months ago

osd log with debugging:

ceph-post-file: 09094430-abdb-4248-812c-47b7babae06c

#2 Updated by Konstantin Shalygin about 2 months ago

  • Source set to Community (user)
  • Affected Versions v14.2.19, v14.2.20 added

#3 Updated by Neha Ojha about 2 months ago

I think #178:ec000000::::head# is just the pgmeta object, which is skipped until all other objects are removed. Note how _delete_some reports only 3 objects while there are 4 in the list below. I'm not entirely sure why we wait until the end, but I don't think this behavior is new (based on a0f48db5cbfc48761eeac0daa259fecbaa573969). Maybe this is not really an unexpected situation, Igor? How this can lead to flapping is not clear to me, but we did change a delete sleep value with the original patch.

2021-04-15 10:30:03.657 7fc550eb9700 20 osd.0 pg_epoch: 364385 pg[178.37( v 330198'282 (0'0,330198'282] lb MIN (bitwise) local-lis/les=364264/364265 n=3 ec=167514/167514 lis/c 364381/364264 les/c/f 364382/364265/0 364380/364384/360316) [128,16,34] r=-1 lpr=364384 DELETING pi=[364264,364384)/1 crt=330198'282 unknown NOTIFY mbc={}] _delete_some [#178:ec000000::::head#,#178:ec213ad1:::5001080715@cnsdev.9822087.000000000000007a:head#,#178:ec54a3f5:::5001080720@cnsdev.9822317.0000000000000074:head#,#178:ec84aff7:::5001080716@cnsdev.9822117.0000000000000042:head#]
2021-04-15 10:30:03.657 7fc550eb9700 20 osd.0 pg_epoch: 364385 pg[178.37( v 330198'282 (0'0,330198'282] lb MIN (bitwise) local-lis/les=364264/364265 n=3 ec=167514/167514 lis/c 364381/364264 les/c/f 364382/364265/0 364380/364384/360316) [128,16,34] r=-1 lpr=364384 DELETING pi=[364264,364384)/1 crt=330198'282 unknown NOTIFY mbc={}] _delete_some deleting 3 objects
...
2021-04-15 10:30:04.768 7fc550eb9700 20 osd.0 pg_epoch: 364385 pg[178.37( v 330198'282 (0'0,330198'282] lb MIN (bitwise) local-lis/les=364264/364265 n=3 ec=167514/167514 lis/c 364381/364264 les/c/f 364382/364265/0 364380/364384/360316) [128,16,34] r=-1 lpr=364384 DELETING pi=[364264,364384)/1 crt=330198'282 unknown NOTIFY mbc={}] _delete_some []

2021-04-15 10:30:04.768 7fc550eb9700  0 osd.0 pg_epoch: 364385 pg[178.37( v 330198'282 (0'0,330198'282] lb MIN (bitwise) local-lis/les=364264/364265 n=3 ec=167514/167514 lis/c 364381/364264 les/c/f 364382/364265/0 364380/364384/360316) [128,16,34] r=-1 lpr=364384 DELETING pi=[364264,364384)/1 crt=330198'282 unknown NOTIFY mbc={}] _delete_some additional unexpected onode list (new onodes has appeared since PG removal started[#178:ec000000::::head#]
..
2021-04-15 10:30:04.770 7fc550eb9700 10 bluestore(/var/lib/ceph/osd/ceph-0) _remove_collection oid #178:ec000000::::head#
2021-04-15 10:30:04.770 7fc550eb9700 10 bluestore(/var/lib/ceph/osd/ceph-0) _remove_collection 178.37_head = 0

#5 Updated by Konstantin Shalygin about 2 months ago

  • Backport set to nautilus, octopus, pacific

#6 Updated by Konstantin Shalygin about 2 months ago

  • Pull request ID set to 40993

#7 Updated by Konstantin Shalygin about 2 months ago

Actually, when PG objects are deleted with the default sleep of 1, the NVMe is not heavily loaded, but when the pg header is deleted there is a huge spike in client op latency and a possibility of the OSD being marked down by the mon.

2021-04-23 11:37:23.300 7f4254dc1700  0 osd.656 pg_epoch: 272109 pg[17.68a( v 270607'95012848 (270592'95009764,270607'95012848] lb MIN (bitwise) local-lis/les=269692/269693 n=3487424 ec=51990/51987 lis/c 271642/271615 les/c/f 271643/271616/0 271644/272093/270608) [775,702,708] r=-1 lpr=272093 DELETING pi=[269692,272093)/3 crt=270607'95012848 lcod 0'0 unknown NOTIFY mbc={}] _delete_some additional unexpected onode list (new onodes has appeared since PG removal started[#17:51600000::::head#]
2021-04-23 11:37:30.609 7f42545c0700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 5.8606s
2021-04-23 11:37:30.609 7f4254dc1700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 7.30805s
2021-04-23 11:37:30.609 7f42525bc700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 7.18567s
2021-04-23 11:37:30.609 7f4257dc7700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 7.19751s
2021-04-23 11:37:30.609 7f4256dc5700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 7.07916s
2021-04-23 11:37:30.609 7f42535be700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 7.16178s
2021-04-23 11:37:30.609 7f4258dc9700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 6.67811s
2021-04-23 11:37:30.609 7f42585c8700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 5.14165s
2021-04-23 11:37:30.609 7f42555c2700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 7.05395s
2021-04-23 11:37:30.609 7f4252dbd700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 7.12029s
2021-04-23 11:37:30.609 7f4255dc3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 6.81781s
2021-04-23 11:37:30.613 7f42575c6700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency slow operation observed for submit_transact, latency = 6.6591s
2021-04-23 11:37:30.613 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 7.31097s, txc = 0x55e3533c7200
2021-04-23 11:37:30.613 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 7.12324s, txc = 0x55e339263800
2021-04-23 11:37:30.613 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 7.08212s, txc = 0x55e27435ad00
2021-04-23 11:37:30.649 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 7.20212s, txc = 0x55e2ab441b00
2021-04-23 11:37:30.649 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 5.90095s, txc = 0x55e3660b8700
2021-04-23 11:37:30.649 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 5.18201s, txc = 0x55e370724c00
2021-04-23 11:37:30.649 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 7.09432s, txc = 0x55e35ee3f600
2021-04-23 11:37:30.649 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 7.23788s, txc = 0x55e33e3a2100
2021-04-23 11:37:30.649 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 7.22604s, txc = 0x55e3c9bd5e00
2021-04-23 11:37:30.649 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 6.71846s, txc = 0x55e366104d00
2021-04-23 11:37:30.649 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 6.85729s, txc = 0x55e2b468fc00
2021-04-23 11:37:30.649 7f426baa3700  0 bluestore(/var/lib/ceph/osd/ceph-656) log_latency_fn slow operation observed for _txc_committed_kv, latency = 6.69883s, txc = 0x55e2764cad00
2021-04-23 11:38:08.953 7f426c2a4700  4 rocksdb: [db/db_impl_write.cc:1436] reusing log 376455 from recycle list
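Since the delete sleep is the knob the original patch changed, one mitigation to experiment with is raising it so removal batches are spread out further. A minimal sketch; the values shown are illustrative starting points, not validated recommendations, and osd_delete_sleep_ssd applies only when the OSD's main device is flash:

```shell
# Throttle PG removal by sleeping longer between _delete_some batches.
# osd_delete_sleep is the generic override; the _hdd/_ssd/_hybrid
# variants apply per device class when the generic one is 0.
ceph config set osd osd_delete_sleep_ssd 2

# Inspect the current values before/after changing them.
ceph config get osd osd_delete_sleep
ceph config get osd osd_delete_sleep_ssd
```

Note this slows the per-object deletions; it does not by itself shorten the final collection listing that produces the latency spike above.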

#8 Updated by Dan van der Ster about 2 months ago

Konstantin Shalygin wrote:

Actually, when PG objects are deleted with the default sleep of 1, the NVMe is not heavily loaded, but when the pg header is deleted there is a huge spike in client op latency and a possibility of the OSD being marked down by the mon.

[...]

That observation is consistent with the root cause being this collection list: https://github.com/ceph/ceph/blob/v14.2.20/src/osd/PG.cc#L7150-L7156

#9 Updated by Konstantin Shalygin about 1 month ago

  • Status changed from New to Pending Backport

#10 Updated by Backport Bot about 1 month ago

  • Copied to Backport #50704: nautilus: _delete_some additional unexpected onode list added

#11 Updated by Backport Bot about 1 month ago

  • Copied to Backport #50705: octopus: _delete_some additional unexpected onode list added

#12 Updated by Backport Bot about 1 month ago

  • Copied to Backport #50706: pacific: _delete_some additional unexpected onode list added

#13 Updated by Konstantin Shalygin 7 days ago

  • Related to Bug #50297: long osd online compaction: mon wrongly mark osd down added
