Bug #22123
closedosd: objecter sends out of sync with pg epochs for proxied ops
0%
Description
2017-11-14 00:37:04.496 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7810 3.0 3:0b332f33:::smithi1441403-241 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [sparse-read 0~2033680,omap-get-vals-by-keys,omap-get-keys,omap-get-vals,omap-get-header,getxattrs] snapc 0=[] RETRY=3 ondisk+retry+read+known_if_redirected e95) v8) prio 63 cost 32 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7812 3.0 3:0b332f33:::smithi1441403-241 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [read 0~2033680,checksum] snapc 0=[] RETRY=3 ondisk+retry+read+known_if_redirected e95) v8) prio 63 cost 4 e115) 2017-11-14 00:37:04.496 7f251983e700 10 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] cancel_proxy_ops 3:1288273c:::smithi1441403-126:head requeuing 1 requests 2017-11-14 00:37:04.496 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c7800 2017-11-14 00:37:04.496 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7827 3.0 3:1288273c:::smithi1441403-126:head [stat,delete] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 0 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7811 3.0 3:0b332f33:::smithi1441403-241 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [read 0~2033680,checksum] snapc 0=[] RETRY=3 ondisk+retry+read+known_if_redirected e95) v8) prio 63 cost 4 e115) 2017-11-14 00:37:04.496 7f251983e700 10 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] cancel_proxy_ops 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head requeuing 4 requests 2017-11-14 00:37:04.496 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c9d80 2017-11-14 00:37:04.496 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7826 3.0 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [read 0~1] snapc 0=[] RETRY=3 ondisk+retry+read+rwordered+known_if_redirected e95) v8) prio 63 cost 0 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7810 3.0 3:0b332f33:::smithi1441403-241 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [sparse-read 0~2033680,omap-get-vals-by-keys,omap-get-keys,omap-get-vals,omap-get-header,getxattrs] snapc 0=[] RETRY=3 ondisk+retry+read+known_if_redirected e95) v8) prio 63 cost 32 e115) 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c7b00 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7825 3.0 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [setxattr _header (356),truncate 3193660] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 363 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7827 3.0 3:1288273c:::smithi1441403-126:head [stat,delete] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 0 e115) 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c8b80 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7824 3.0 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [write 2037739~583136] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 583136 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7826 3.0 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [read 0~1] snapc 0=[] RETRY=3 ondisk+retry+read+rwordered+known_if_redirected e95) v8) prio 63 cost 0 e115) 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c9c00 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7823 3.0 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [write 793660~503513,stat] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 503513 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7825 3.0 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [setxattr _header (356),truncate 3193660] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 363 e115) 2017-11-14 00:37:04.497 7f251983e700 10 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] cancel_proxy_ops 3:27587c4d:::smithi1441403-404:head requeuing 4 requests 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c9480 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7809 3.0 3:27587c4d:::smithi1441403-404:head [read 0~1] snapc 0=[] RETRY=3 ondisk+retry+read+rwordered+known_if_redirected e95) v8) prio 63 cost 0 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7824 3.0 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [write 2037739~583136] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 583136 e115) 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c8580 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7808 3.0 3:27587c4d:::smithi1441403-404:head [setxattr _header (56),truncate 3160046] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 63 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7823 3.0 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [write 793660~503513,stat] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 503513 e115) 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea9a9aa80 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7807 3.0 3:27587c4d:::smithi1441403-404:head [write 2037128~695820] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 695820 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7809 3.0 3:27587c4d:::smithi1441403-404:head [read 0~1] snapc 0=[] RETRY=3 ondisk+retry+read+rwordered+known_if_redirected e95) v8) prio 63 cost 0 e115) 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea9a9a780 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7806 3.0 3:27587c4d:::smithi1441403-404:head [write 760046~753015,stat] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 753015 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7808 3.0 3:27587c4d:::smithi1441403-404:head [setxattr _header (56),truncate 3160046] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 63 e115) 2017-11-14 00:37:04.497 7f251983e700 10 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] cancel_proxy_ops 3:3d6e4248:::smithi1441403-422:head requeuing 5 requests 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c6c00 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7822 3.0 3:3d6e4248:::smithi1441403-422:head [read 0~1] snapc 0=[] RETRY=3 ondisk+retry+read+rwordered+known_if_redirected e95) v8) prio 63 cost 0 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7807 3.0 3:27587c4d:::smithi1441403-404:head [write 2037128~695820] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 695820 e115) 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c8e80 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7821 3.0 3:3d6e4248:::smithi1441403-422:head [setxattr _header (56),truncate 3176853] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 63 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7806 3.0 3:27587c4d:::smithi1441403-404:head [write 760046~753015,stat] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 753015 e115) 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c8a00 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7820 3.0 3:3d6e4248:::smithi1441403-422:head [write 2836307~340546] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 340546 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7822 3.0 3:3d6e4248:::smithi1441403-422:head [read 0~1] snapc 0=[] RETRY=3 ondisk+retry+read+rwordered+known_if_redirected e95) v8) prio 63 cost 0 e115) 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c7c80 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7819 3.0 3:3d6e4248:::smithi1441403-422:head [write 1979257~439478] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 439478 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7821 3.0 3:3d6e4248:::smithi1441403-422:head [setxattr _header (56),truncate 3176853] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 63 e115) 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 pg_epoch: 115 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=96/97 n=51 ec=20/20 lis/c 96/96 les/c/f 97/97/0 115/115/95) [7]/[2,7] r=0 lpr=115 pi=[96,115)/1 crt=97'367 mlcod 0'0 remapped] requeue_op 0x55fea90c9600 2017-11-14 00:37:04.497 7f251983e700 20 osd.2 op_wq(0) _enqueue_front OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7818 3.0 3:3d6e4248:::smithi1441403-422:head [write 776853~428264,stat] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 428264 e115) shuffled w/ OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7820 3.0 3:3d6e4248:::smithi1441403-422:head [write 2836307~340546] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 340546 e115) ... 2017-11-14 00:37:04.499 7f2514834700 20 osd.2 op_wq(0) _process OpQueueItem(3.0 PGOpItem(op=osd_op(client.4264.0:7823 3.0 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [write 793660~503513,stat] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected e95) v8) prio 63 cost 503513 e115) queued
/a/sage-2017-11-13_22:25:20-rados-wip-sage-testing-2017-11-13-0823-distro-basic-smithi/1844914
Updated by Sage Weil over 6 years ago
enqueue_front
client.4264.0:7826
client.4264.0:7825
client.4264.0:7824
client.4264.0:7823
enqueue_front shuffle
client.4264.0:7809 with client.4264.0:7824 (!!! this should be 7823 !!!)
client.4264.0:7808 with client.4264.0:7823
...
Updated by Sage Weil over 6 years ago
- Subject changed from osd: misordered ops after enqueue_front shuffling to osd: objecter sends out of sync with pg epochs for proxied ops
- Priority changed from Urgent to High
Okay, the shuffling is actually fine.
Real problem appears to be that objecter osdmap updates race with pg map updates, and the requeueing of proxied ops can race with teh objecter atcually sending those ops:
- start proxy op A (epoch E)
- start proxy op B (epoch E)
- start proxy op C (epoch E)
- objecter sends none of these because target is down in epoch E
- osdmap update to E+1
- pg cancels requeues A, B
- objecter updates to E+1
- objecter sends C
- pg cancels/requeues C
actually, it's not even that, it's harder:
- pg cancels A, B
- object receives backoff release, sends C
- pg cancels C
2017-11-14 00:37:44.817 7fbba2be1700 10 osd.7 pg_epoch: 134 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=116/117 n=51 ec=20/20 lis/c 116/116 les/c/f 117/117/0 134/134/116) [5,7]/[7,2] r=0 lpr=134 pi=[116,134)/1 crt=97'367 lcod 97'366 mlcod 0'0 remapped] cancel_proxy_write 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head 2017-11-14 00:37:44.817 7fbba2be1700 10 osd.7 pg_epoch: 134 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=116/117 n=51 ec=20/20 lis/c 116/116 les/c/f 117/117/0 134/134/116) [5,7]/[7,2] r=0 lpr=134 pi=[116,134)/1 crt=97'367 lcod 97'366 mlcod 0'0 remapped] cancel_proxy_write 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head 2017-11-14 00:37:44.817 7fbba53e6700 1 -- 172.21.15.1:0/14841 <== osd.3 172.21.15.144:6812/27580 2 ==== osd_backoff(2.0 unblock id 1 [2:00000000::::head,2:40000000::::head) e134) v1 ==== 115+0+0 (1476801449 0 0) 0x55bfc5eae780 con 0x55bfc52a6800 2017-11-14 00:37:44.817 7fbba2be1700 10 osd.7 pg_epoch: 134 pg[3.0( v 97'367 (0'0,97'367] local-lis/les=116/117 n=51 ec=20/20 lis/c 116/116 les/c/f 117/117/0 134/134/116) [5,7]/[7,2] r=0 lpr=134 pi=[116,134)/1 crt=97'367 lcod 97'366 mlcod 0'0 remapped] cancel_proxy_write 3:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head 2017-11-14 00:37:44.818 7fbba53e6700 1 -- 172.21.15.1:0/14841 --> 172.21.15.144:6812/27580 -- osd_op(client.4264.0:7825 2.0 2:14993ae1:::smithi1441403-11 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [setxattr _header (356),truncate 3193660] snapc 0=[] RETRY=1 ondisk+retry+write+ignore_cache+ignore_overlay+known_if_redirected e134) v8 -- ?+0 0x55bfc8c0c680 con 0x55bfc52a6800
one fix might be to simply not cancel proxy ops since we can't cancel them all atomically.
another fix might be to generate a list of all ops to cancel and cancel them all atomically in one objecter call.
Updated by Sage Weil over 6 years ago
- Project changed from Ceph to RADOS
- Category set to Tiering
Updated by Greg Farnum over 6 years ago
We could also cancel them in reverse order; that won't break ordering and doesn't require new interfaces.
Updated by David Zafman about 6 years ago
- Related to Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi added
Updated by David Zafman about 6 years ago
- Related to deleted (Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi)
Updated by David Zafman about 6 years ago
- Has duplicate Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi added
Updated by David Zafman about 6 years ago
- Has duplicate Bug #22063: "RadosModel.h: 1703: FAILED assert(!version || comp->get_version64() == version)" inrados-jewel-distro-basic-smithi added
Updated by Nathan Cutler about 6 years ago
- Related to Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi added
Updated by Sage Weil about 6 years ago
- Status changed from 12 to Fix Under Review
https://github.com/ceph/ceph/pull/20484
I opted for the marginally more complex solution of cancelling multiple ops atomically. The objecter change to do it is trivial.
Updated by Sage Weil about 6 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to luminous
Updated by Nathan Cutler about 6 years ago
- Backport changed from luminous to luminous, jewel
Updated by Nathan Cutler about 6 years ago
- Copied to Backport #23075: luminous: osd: objecter sends out of sync with pg epochs for proxied ops added
Updated by Nathan Cutler about 6 years ago
- Copied to Backport #23076: jewel: osd: objecter sends out of sync with pg epochs for proxied ops added
Updated by Nathan Cutler about 6 years ago
- Status changed from Pending Backport to Resolved
Updated by Nathan Cutler about 6 years ago
Despite the jewel backport of this fix being merged, this problem has reappeared in jewel 10.2.11 integration testing. See #23598
Updated by Nathan Cutler about 6 years ago
- Related to Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade test added