Bug #65806

open

IO hangs when issuing balanced/localized reads to replica crimson osds while the pg is still peering

Added by Xuehan Xu 14 days ago. Updated 14 days ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

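The IO request in the following log never completes. It is waiting for the PG to become active, but the current implementation only triggers the wait_for_active blocker on primary OSDs. On a replica OSD (r=2 in the log below) the blocker is never triggered, so the request stays in the wait_for_active stage indefinitely.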

DEBUG 2024-05-04 08:00:51,983 [shard 0:main] osd -  pg_epoch 974 pg[3.0( v 956'503 lc 955'501 (0'0,956'503] local-lis/les=958/959 n=7 ec=15/15 lis/c=958/946 les/c/f=959/947/0 sis=974) [3,2,1] r=2 lpr=974 pi=[946,974)/1 crt=956'503 mlcod 0'0 unknown NOTIFY  ClientRequest::with_pg_process: client_request(id=7763, detail=m=[osd_op(client.4240.0:7755 3.0 3:03eb76b7:::scephqa02.cpp.bjat.qianxin-inc.cn279848-2:head {sparse-read 0~3445732, omap-get-vals-by-keys in=4b, omap-get-keys in=12b, omap-get-vals in=16b, omap-get-header, getxattrs} snapc 0={} RETRY=5 ondisk+retry+read+localize_reads+known_if_redirected+supports_pool_eio e974)]): same_interval_since: 974
DEBUG 2024-05-04 08:00:51,983 [shard 0:main] osd -  pg_epoch 974 pg[3.0( v 956'503 lc 955'501 (0'0,956'503] local-lis/les=958/959 n=7 ec=15/15 lis/c=958/946 les/c/f=959/947/0 sis=974) [3,2,1] r=2 lpr=974 pi=[946,974)/1 crt=956'503 mlcod 0'0 unknown NOTIFY  ClientRequest::with_pg_process: client_request(id=7763, detail=m=[osd_op(client.4240.0:7755 3.0 3:03eb76b7:::scephqa02.cpp.bjat.qianxin-inc.cn279848-2:head {sparse-read 0~3445732, omap-get-vals-by-keys in=4b, omap-get-keys in=12b, omap-get-vals in=16b, omap-get-header, getxattrs} snapc 0={} RETRY=5 ondisk+retry+read+localize_reads+known_if_redirected+supports_pool_eio e974)]): same_interval_since: 974
DEBUG 2024-05-04 08:00:51,983 [shard 0:main] osd -  pg_epoch 974 pg[3.0( v 956'503 lc 955'501 (0'0,956'503] local-lis/les=958/959 n=7 ec=15/15 lis/c=958/946 les/c/f=959/947/0 sis=974) [3,2,1] r=2 lpr=974 pi=[946,974)/1 crt=956'503 mlcod 0'0 unknown NOTIFY  ClientRequest::with_pg_process: client_request(id=7763, detail=m=[osd_op(client.4240.0:7755 3.0 3:03eb76b7:::scephqa02.cpp.bjat.qianxin-inc.cn279848-2:head {sparse-read 0~3445732, omap-get-vals-by-keys in=4b, omap-get-keys in=12b, omap-get-vals in=16b, omap-get-header, getxattrs} snapc 0={} RETRY=5 ondisk+retry+read+localize_reads+known_if_redirected+supports_pool_eio e974)]) start
DEBUG 2024-05-04 08:00:51,983 [shard 0:main] osd -  pg_epoch 974 pg[3.0( v 956'503 lc 955'501 (0'0,956'503] local-lis/les=958/959 n=7 ec=15/15 lis/c=958/946 les/c/f=959/947/0 sis=974) [3,2,1] r=2 lpr=974 pi=[946,974)/1 crt=956'503 mlcod 0'0 unknown NOTIFY  ClientRequest::with_pg_process: client_request(id=7763, detail=m=[osd_op(client.4240.0:7755 3.0 3:03eb76b7:::scephqa02.cpp.bjat.qianxin-inc.cn279848-2:head {sparse-read 0~3445732, omap-get-vals-by-keys in=4b, omap-get-keys in=12b, omap-get-vals in=16b, omap-get-header, getxattrs} snapc 0={} RETRY=5 ondisk+retry+read+localize_reads+known_if_redirected+supports_pool_eio e974)]).0: entering await_map stage
DEBUG 2024-05-04 08:00:51,983 [shard 0:main] osd -  pg_epoch 974 pg[3.0( v 956'503 lc 955'501 (0'0,956'503] local-lis/les=958/959 n=7 ec=15/15 lis/c=958/946 les/c/f=959/947/0 sis=974) [3,2,1] r=2 lpr=974 pi=[946,974)/1 crt=956'503 mlcod 0'0 unknown NOTIFY  ClientRequest::with_pg_process: client_request(id=7763, detail=m=[osd_op(client.4240.0:7755 3.0 3:03eb76b7:::scephqa02.cpp.bjat.qianxin-inc.cn279848-2:head {sparse-read 0~3445732, omap-get-vals-by-keys in=4b, omap-get-keys in=12b, omap-get-vals in=16b, omap-get-header, getxattrs} snapc 0={} RETRY=5 ondisk+retry+read+localize_reads+known_if_redirected+supports_pool_eio e974)]).0: entered await_map stage, waiting for map
DEBUG 2024-05-04 08:00:51,983 [shard 0:main] osd -  pg_epoch 974 pg[3.0( v 956'503 lc 955'501 (0'0,956'503] local-lis/les=958/959 n=7 ec=15/15 lis/c=958/946 les/c/f=959/947/0 sis=974) [3,2,1] r=2 lpr=974 pi=[946,974)/1 crt=956'503 mlcod 0'0 unknown NOTIFY  ClientRequest::with_pg_process: client_request(id=7763, detail=m=[osd_op(client.4240.0:7755 3.0 3:03eb76b7:::scephqa02.cpp.bjat.qianxin-inc.cn279848-2:head {sparse-read 0~3445732, omap-get-vals-by-keys in=4b, omap-get-keys in=12b, omap-get-vals in=16b, omap-get-header, getxattrs} snapc 0={} RETRY=5 ondisk+retry+read+localize_reads+known_if_redirected+supports_pool_eio e974)]).0: map epoch got 974, entering wait_for_active
DEBUG 2024-05-04 08:00:51,983 [shard 0:main] osd -  pg_epoch 974 pg[3.0( v 956'503 lc 955'501 (0'0,956'503] local-lis/les=958/959 n=7 ec=15/15 lis/c=958/946 les/c/f=959/947/0 sis=974) [3,2,1] r=2 lpr=974 pi=[946,974)/1 crt=956'503 mlcod 0'0 unknown NOTIFY  ClientRequest::with_pg_process: client_request(id=7763, detail=m=[osd_op(client.4240.0:7755 3.0 3:03eb76b7:::scephqa02.cpp.bjat.qianxin-inc.cn279848-2:head {sparse-read 0~3445732, omap-get-vals-by-keys in=4b, omap-get-keys in=12b, omap-get-vals in=16b, omap-get-header, getxattrs} snapc 0={} RETRY=5 ondisk+retry+read+localize_reads+known_if_redirected+supports_pool_eio e974)]).0: entered wait_for_active stage, waiting for active
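
The hang described above can be illustrated with a toy model. This is only a sketch and not crimson code: `ActiveBlocker`, `ToyPG`, `hung_reads` and the `trigger_on_replica` flag are hypothetical names invented for illustration. The flag shows one possible remedy and is not necessarily what the linked pull request does.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a "wait for active" blocker (hypothetical, not crimson's
// type): requests park a callback here and run only once it is triggered.
struct ActiveBlocker {
  std::vector<std::function<void()>> waiters;

  void wait(std::function<void()> cb) { waiters.push_back(std::move(cb)); }

  void trigger() {
    auto ready = std::move(waiters);
    waiters.clear();  // leave the moved-from vector in a known empty state
    for (auto& cb : ready) cb();
  }
};

// Toy PG (hypothetical): reads wait on the blocker until the PG is active.
struct ToyPG {
  bool is_primary;
  ActiveBlocker wait_for_active;
  int completed = 0;

  // A balanced/localized read arriving while the PG is still peering.
  void submit_read() { wait_for_active.wait([this] { ++completed; }); }

  // Called when peering finishes. With trigger_on_replica == false this
  // mimics the behavior reported in this issue: only primaries release
  // the parked requests.
  void on_activate(bool trigger_on_replica) {
    if (is_primary || trigger_on_replica) wait_for_active.trigger();
  }
};

// Returns how many reads are still parked after the PG has activated.
inline int hung_reads(bool is_primary, bool trigger_on_replica) {
  ToyPG pg{is_primary};
  pg.submit_read();
  pg.on_activate(trigger_on_replica);
  const std::size_t parked = pg.wait_for_active.waiters.size();
  // Invariant: the single submitted read is either completed or still parked.
  assert(static_cast<std::size_t>(pg.completed) + parked == 1);
  return static_cast<int>(parked);
}
```

In this model, `hung_reads(false, false)` leaves one read parked forever, which corresponds to the replica case in the log, while `hung_reads(true, false)` and `hung_reads(false, true)` leave none.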
#1

Updated by Xuehan Xu 14 days ago

  • Pull request ID set to 57279