Bug #20041

ceph-osd: PGs getting stuck in scrub state, stalling RBD

Added by Jason Dillaman 5 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Scrub/Repair
Target version:
-
Start date:
05/22/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
jewel, kraken
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No
Component(RADOS):
OSD

Description

See the attached logs for the remove op against rbd_data.21aafa6b8b4567.0000000000000aaa

2017-05-17 20:12:06.943616 7fea1a7ff700 10 osd.23 pg_epoch: 181936 pg[2.aa( v 181936'11193583 (181936'11190568,181936'11193583] local-les=171705 n=4036 ec=114 les/c/f 171705/171705/0 171703/171704/171649) [23,41,9] r=0 lpr=171704 crt=181936'11193581 lcod 181936'11193582 mlcod 181936'11193582 active+clean+scrubbing snaptrimq=[...7b05~1,27b07~1,27b09~1,27b0b~1,27b0d~1,27b0f~1,27b11~2,27b15~2,27b18~1,27b1a~1,27b1c~2,27b22~1,27b24~1,27b29~2,27b2c~1,27b2e~1,27b30~1,27b32~1,27b34~1,27b36~1,27b38~1,27b3a~1,27b3f~1,27b41~1,27b43~1,27b48~1,27b4a~1,27b4c~1,27b4e~1,27b53~4,27b58~1,27b5b~1,27b5d~1,27b61~2,27b64~1,27b67~2,27b6b~1,27b6d~2,27b71~2,27b76~1,27b7d~2,27b83~1,27b86~1,27b88~1,27b8a~1,27b8c~2,27b8f~1,27b96~2,27b99~1,27b9b~1,27b9d~2,27ba0~1,27ba3~1,27ba6~1,27ba9~1,27bab~2,27bae~2,27bb2~2,27bb5~1,27bb8~2,27bbb~1,27bbe~2,27bc1~1,27bc4~1,27bc6~1,27bc8~1,27bcb~2,27bce~1,27bd0~1,27bd2~2,27bd6~2,27bd9~1,27bdc~1,27be0~1,27be3~2,27be9~1,27beb~2,27bef~1,27bf1~2,27bf6~1,27bf8~2,27c02~3,27c06~1,27c08~1,27c0a~1,27c0c~1,27c0e~1,27c10~2,27c13~2,27c16~1,27c19~2,27c1c~3,27c21~1,27c23~1,27c28~1,27c2c~1,27c2e~1,27c32~1,27c37~2,27c3d~1,27c42~1,27c45~1,27c48~3,27c52~1,27c54~1,27c56~1,27c5b~1,27c60~2,27c64~1,27c68~1,27c76~2,27c79~2,27c7c~1,27c7e~1,27c80~1,27c82~1,27c8b~2,27c8f~2,27c92~1,27c94~3,27c98~1,27c9a~1,27c9c~1,27c9f~1,27ca1~1,27ca4~3,27ca9~1,27cab~2,27cb2~3,27cb7~1,27cbb~1,27cbe~2,27cc2~2,27cc8~1,27cca~1,27ccf~1,27cd4~2,27cd8~1,27cdb~2,27cde~1,27ce0~3,27ce5~2,27ce8~2,27cec~1,27cf3~3,27cf8~1,27cfb~2,27cfe~1,27d00~1,27d06~1,27d08~1,27d0f~2,27d12~1,27d14~1,27d17~2,27d1a~1,27d20~2,27d26~2,27d2a~1,27d2e~2,27d32~1,27d34~3,27d3b~3,27d44~1,27d48~2,27d4c~2,27d50~2,27d55~f7b,28cd1~4,28cd7~1,28cd9~1,28cdb~1,28cdd~1,28cdf~1,28ce1~1,28ce3~2,28ce6~3,28ceb~1,28ced~1,28cef~2,28cf2~1,28cf5~1,28cf7~1,28cfa~1,28cfc~7,28d04~2,28d08~2,28d0c~1,28d0e~1,28d10~1,28d12~2,28d16~2,28d19~3,28d1d~1,28d1f~1,28d21~4,28d26~4,28d2b~1,28d2d~1,28d2f~1,28d31~1,28d33~1,28d35~1,28d37~4,28d3d~1,28d3f~1,28d41~1,28d43~1,28d45
~1,28d47~1,28d4a~1,28d4c~1,28d4e~1,28d50~1,28d52~1,28d55~4,28d5a~1,28d5d~1,28d5f~1,28d63~1,28d65~3,28d69~1,28d6b~1,28d6d~1,28d6f~1,28d71~1,28d73~1,28d75~1,28d77~3,28d7b~1,28d7d~1,28d7f~1,28d81~1,28d83~1,28d86~1,28d88~4,28d8d~4,28d92~1,28d94~1,28d96~1,28d98~1,28d9a~1,28d9d~1,28d9f~1,28da1~1,28da3~1,28da5~1,28da7~1,28da9~1,28dad~1,28daf~1,28db1~4,28db7~1,28db9~1,28dbb~1,28dbd~2,28dc1~2,28dc5~1,28dc7~1,28dc9~1,28dcb~1,28dcd~2,28dd0~2,28dd3~1,28dd5~1,28dd7~1,28dd9~1,28ddb~4,28de0~3,28de5~3,28de9~3,28ded~1,28def~4,28df4~4,28df9~4,28dfe~4,28e03~1,28e06~1,28e08~1,28e0b~1,28e0d~1,28e0f~1,28e11~1,28e15~1,28e17~4,28e1c~1,28e1e~1,28e20~1,28e22~1,28e24~1,28e26~1,28e28~1,28e2a~1,28e2c~1,28e2e~1,28e30~1,28e32~1,28e34~1,28e36~4,28e3b~1,28e3d~1,28e3f~1,28e41~4,28e46~1,28e48~1,28e4a~1,28e4c~1,28e4e~1,28e50~3,28e54~1,28e56~1,28e58~1,28e5a~4,28e5f~1,28e61~3,28e65~1,28e68~1,28e6a~4,28e70~1,28e72~4,28e77~1,28e79~4,28e7f~1,28e81~1,28e83~1,28e85~1,28e87~1,28e8a~1,28e8c~1,28e8e~1,28e90~1,28e92~4,28e97~1,28e99~1,28e9b~1,28e9d~1,28e9f~1,28ea1~1,28ea3~1,28ea5~1,28ea7~1,28ea9~4,28eae~1,28eb0~1,28eb2~1,28eb4~1,28eb6~1,28eb9~1,28ebb~1,28ebd~1,28ec0~1,28ec5~2,28ec9~1,28ecb~1,28ece~2,28ed1~1,28ed3~2,28ed6~2,28ed9~1,28edb~1,28ede~1,28ee0~1,28ee2~1,28ee4~1,28ee6~1,28ee8~1,28eea~1,28eed~2,28ef1~2,28ef5~3,28ef9~3,28efe~1,28f00~2,28f04~2,28f07~4,28f0d~2,28f11~1,28f16~1,28f18~1,28f1b~2,28f1f~1,28f21~1,28f23~1,28f25~1,28f27~1,28f29~1,28f2b~1,28f2e~2,28f32~1,28f34~2,28f38~2,28f3c~2,28f40~2,28f44~1,28f46~1,28f48~4,28f4e~1,28f50~1,28f52~1,28f54~1,28f57~1,28f5a~1,28f5c~1,28f5e~1,28f61~2,28f64~1,28f67~3,28f6b~2,28f6e~3]] do_op osd_op(client.29745941.0:76103 2.cebed0aa rbd_data.21aafa6b8b4567.0000000000000aaa [delete] snapc 28a43=[] ack+ondisk+write+known_if_redirected e181936) v7 may_write -> write-ordered flags ack+ondisk+write+known_if_redirected

2017-05-17 20:12:06.943670 7fea1a7ff700 20 osd.23 pg_epoch: 181936 pg[2.aa( v 181936'11193583 (181936'11190568,181936'11193583] local-les=171705 n=4036 ec=114 les/c/f 171705/171705/0 171703/171704/171649) [23,41,9] r=0 lpr=171704 crt=181936'11193581 lcod 181936'11193582 mlcod 181936'11193582 active+clean+scrubbing snaptrimq=[...db~1,27add~1,27adf~1,27ae1~2,27ae6~1,27ae8~1,27aea~1,27aec~1,27aee~1,27af0~1,27af2~1,27af4~1,27af6~1,27af8~1,27afa~1,27afc~1,27afe~1,27b00~1,27b02~2,27b05~1,27b07~1,27b09~1,27b0b~1,27b0d~1,27b0f~1,27b11~2,27b15~2,27b18~1,27b1a~1,27b1c~2,27b22~1,27b24~1,27b29~2,27b2c~1,27b2e~1,27b30~1,27b32~1,27b34~1,27b36~1,27b38~1,27b3a~1,27b3f~1,27b41~1,27b43~1,27b48~1,27b4a~1,27b4c~1,27b4e~1,27b53~4,27b58~1,27b5b~1,27b5d~1,27b61~2,27b64~1,27b67~2,27b6b~1,27b6d~2,27b71~2,27b76~1,27b7d~2,27b83~1,27b86~1,27b88~1,27b8a~1,27b8c~2,27b8f~1,27b96~2,27b99~1,27b9b~1,27b9d~2,27ba0~1,27ba3~1,27ba6~1,27ba9~1,27bab~2,27bae~2,27bb2~2,27bb5~1,27bb8~2,27bbb~1,27bbe~2,27bc1~1,27bc4~1,27bc6~1,27bc8~1,27bcb~2,27bce~1,27bd0~1,27bd2~2,27bd6~2,27bd9~1,27bdc~1,27be0~1,27be3~2,27be9~1,27beb~2,27bef~1,27bf1~2,27bf6~1,27bf8~2,27c02~3,27c06~1,27c08~1,27c0a~1,27c0c~1,27c0e~1,27c10~2,27c13~2,27c16~1,27c19~2,27c1c~3,27c21~1,27c23~1,27c28~1,27c2c~1,27c2e~1,27c32~1,27c37~2,27c3d~1,27c42~1,27c45~1,27c48~3,27c52~1,27c54~1,27c56~1,27c5b~1,27c60~2,27c64~1,27c68~1,27c76~2,27c79~2,27c7c~1,27c7e~1,27c80~1,27c82~1,27c8b~2,27c8f~2,27c92~1,27c94~3,27c98~1,27c9a~1,27c9c~1,27c9f~1,27ca1~1,27ca4~3,27ca9~1,27cab~2,27cb2~3,27cb7~1,27cbb~1,27cbe~2,27cc2~2,27cc8~1,27cca~1,27ccf~1,27cd4~2,27cd8~1,27cdb~2,27cde~1,27ce0~3,27ce5~2,27ce8~2,27cec~1,27cf3~3,27cf8~1,27cfb~2,27cfe~1,27d00~1,27d06~1,27d08~1,27d0f~2,27d12~1,27d14~1,27d17~2,27d1a~1,27d20~2,27d26~2,27d2a~1,27d2e~2,27d32~1,27d34~3,27d3b~3,27d44~1,27d48~2,27d4c~2,27d50~2,27d55~f7b,28cd1~4,28cd7~1,28cd9~1,28cdb~1,28cdd~1,28cdf~1,28ce1~1,28ce3~2,28ce6~3,28ceb~1,28ced~1,28cef~2,28cf2~1,28cf5~1,28cf7~1,28cfa~1,28cfc~7,28d04~2,28d08~2,28d0c~1,28d0e~1,28d10~1
,28d12~2,28d16~2,28d19~3,28d1d~1,28d1f~1,28d21~4,28d26~4,28d2b~1,28d2d~1,28d2f~1,28d31~1,28d33~1,28d35~1,28d37~4,28d3d~1,28d3f~1,28d41~1,28d43~1,28d45~1,28d47~1,28d4a~1,28d4c~1,28d4e~1,28d50~1,28d52~1,28d55~4,28d5a~1,28d5d~1,28d5f~1,28d63~1,28d65~3,28d69~1,28d6b~1,28d6d~1,28d6f~1,28d71~1,28d73~1,28d75~1,28d77~3,28d7b~1,28d7d~1,28d7f~1,28d81~1,28d83~1,28d86~1,28d88~4,28d8d~4,28d92~1,28d94~1,28d96~1,28d98~1,28d9a~1,28d9d~1,28d9f~1,28da1~1,28da3~1,28da5~1,28da7~1,28da9~1,28dad~1,28daf~1,28db1~4,28db7~1,28db9~1,28dbb~1,28dbd~2,28dc1~2,28dc5~1,28dc7~1,28dc9~1,28dcb~1,28dcd~2,28dd0~2,28dd3~1,28dd5~1,28dd7~1,28dd9~1,28ddb~4,28de0~3,28de5~3,28de9~3,28ded~1,28def~4,28df4~4,28df9~4,28dfe~4,28e03~1,28e06~1,28e08~1,28e0b~1,28e0d~1,28e0f~1,28e11~1,28e15~1,28e17~4,28e1c~1,28e1e~1,28e20~1,28e22~1,28e24~1,28e26~1,28e28~1,28e2a~1,28e2c~1,28e2e~1,28e30~1,28e32~1,28e34~1,28e36~4,28e3b~1,28e3d~1,28e3f~1,28e41~4,28e46~1,28e48~1,28e4a~1,28e4c~1,28e4e~1,28e50~3,28e54~1,28e56~1,28e58~1,28e5a~4,28e5f~1,28e61~3,28e65~1,28e68~1,28e6a~4,28e70~1,28e72~4,28e77~1,28e79~4,28e7f~1,28e81~1,28e83~1,28e85~1,28e87~1,28e8a~1,28e8c~1,28e8e~1,28e90~1,28e92~4,28e97~1,28e99~1,28e9b~1,28e9d~1,28e9f~1,28ea1~1,28ea3~1,28ea5~1,28ea7~1,28ea9~4,28eae~1,28eb0~1,28eb2~1,28eb4~1,28eb6~1,28eb9~1,28ebb~1,28ebd~1,28ec0~1,28ec5~2,28ec9~1,28ecb~1,28ece~2,28ed1~1,28ed3~2,28ed6~2,28ed9~1,28edb~1,28ede~1,28ee0~1,28ee2~1,28ee4~1,28ee6~1,28ee8~1,28eea~1,28eed~2,28ef1~2,28ef5~3,28ef9~3,28efe~1,28f00~2,28f04~2,28f07~4,28f0d~2,28f11~1,28f16~1,28f18~1,28f1b~2,28f1f~1,28f21~1,28f23~1,28f25~1,28f27~1,28f29~1,28f2b~1,28f2e~2,28f32~1,28f34~2,28f38~2,28f3c~2,28f40~2,28f44~1,28f46~1,28f48~4,28f4e~1,28f50~1,28f52~1,28f54~1,28f57~1,28f5a~1,28f5c~1,28f5e~1,28f61~2,28f64~1,28f67~3,28f6b~2,28f6e~3]] do_op: waiting for scrub

stuck-scrub-snap.txt View - ops, ops in flight etc. (74.9 KB) Anthony D'Atri, 06/29/2017 03:14 PM


Related issues

Copied to RADOS - Backport #20780: jewel: ceph-osd: PGs getting stuck in scrub state, stalling RBD Resolved
Copied to RADOS - Backport #20781: kraken: ceph-osd: PGs getting stuck in scrub state, stalling RBD Rejected

History

#1 Updated by Jason Dillaman 5 months ago

  • Description updated (diff)

#2 Updated by Stefan Priebe 5 months ago

I've seen this on scrub as well.

#3 Updated by Jason Dillaman 5 months ago

$19 = {
  reserved_peers = std::set with 0 elements, 
  reserved = false, 
  reserve_failed = false, 
  epoch_start = 444147, 
  active = true, 
  queue_snap_trim = true, 
  waiting_on = 2, 
  waiting_on_whom = std::set with 2 elements = {
    [0] = {
      osd = 37, 
      shard = {
        id = -1 '\377', 
        static NO_SHARD = {
          id = -1 '\377', 
          static NO_SHARD = <same as static member of an already seen type>
        }
      }
    },
    [1] = {
      osd = 116, 
      shard = {
        id = -1 '\377', 
        static NO_SHARD = {
          id = -1 '\377', 
          static NO_SHARD = <same as static member of an already seen type>
        }
      }
    }
  }, 
  shallow_errors = 0, 
  deep_errors = 0, 
  fixed = 0, 
  primary_scrubmap = {
    bitwise = true, 
    objects = std::map with 0 elements, 
    valid_through = {
      version = 0, 
      epoch = 0, 
      __pad = 0
    }, 
    incr_since = {
      version = 0, 
      epoch = 0, 
      __pad = 0
    }
  }, 
  received_maps = std::map with 1 elements = {
    [{
      osd = 85, 
      shard = {
        id = -1 '\377', 
        static NO_SHARD = {
          id = -1 '\377', 
          static NO_SHARD = <same as static member of an already seen type>
        }
      }
    }] = {
      bitwise = true, 
      objects = std::map with 4 elements = {
        [{
          oid = {
            name = "rbd_data.38f9f96b8b4567.", '0' <repeats 12 times>, "2478" 
          }, 
          snap = {
            val = 18446744073709551614
          }, 
          hash = 2852311351, 
          max = false, 
          nibblewise_key_cache = 1931288746, 
          hash_reverse_bits = 3968548949, 
          static POOL_META = -1, 
          static POOL_TEMP_START = -2, 
          pool = 3, 
          nspace = "", 
          key = "" 
        }] = {
          attrs = std::map with 2 elements = {
            ["_"] = {
              _raw = 0x7f19a7c72f48, 
              _off = 117, 
              _len = 275
            },
            ["snapset"] = {
              _raw = 0x7f19a7c72f48, 
              _off = 407, 
              _len = 39
            }
          }, 
          snapcolls = std::set with 0 elements, 
          size = 4194304, 
          omap_digest = 0, 
          digest = 0, 
          nlinks = 1, 
          negative = false, 
          digest_present = false, 
          omap_digest_present = false, 
          read_error = false, 
          stat_error = false, 
          ec_hash_mismatch = false, 
          ec_size_mismatch = false
        },
        [{
          oid = {
            name = "rbd_data.452eaf6b8b4567.", '0' <repeats 12 times>, "c7a0" 
          }, 
          snap = {
            val = 18446744073709551614
          }, 
          hash = 3145912631, 
          max = false, 
          nibblewise_key_cache = 1931290811, 
          hash_reverse_bits = 3968549341, 
          static POOL_META = -1, 
          static POOL_TEMP_START = -2, 
          pool = 3, 
          nspace = "", 
          key = "" 
        }] = {
          attrs = std::map with 2 elements = {
            ["_"] = {
              _raw = 0x7f19a7c72f48, 
              _off = 576, 
              _len = 275
            },
            ["snapset"] = {
              _raw = 0x7f19a7c72f48, 
              _off = 866, 
              _len = 31
            }
          }, 
          snapcolls = std::set with 0 elements, 
          size = 4194304, 
          omap_digest = 0, 
          digest = 0, 
          nlinks = 1, 
          negative = false, 
          digest_present = false, 
          omap_digest_present = false, 
          read_error = false, 
          stat_error = false, 
          ec_hash_mismatch = false, 
          ec_size_mismatch = false
        },
        [{
          oid = {
            name = "rbd_data.67de5e6b8b4567.", '0' <repeats 12 times>, "841c" 
          }, 
          snap = {
            val = 18446744073709551614
          }, 
          hash = 1271058743, 
          max = false, 
          nibblewise_key_cache = 1931291828, 
          hash_reverse_bits = 3968549842, 
          static POOL_META = -1, 
          static POOL_TEMP_START = -2, 
          pool = 3, 
          nspace = "", 
          key = "" 
        }] = {
          attrs = std::map with 2 elements = {
            ["_"] = {
              _raw = 0x7f19a7c72f48, 
              _off = 1027, 
              _len = 275
            },
            ["snapset"] = {
              _raw = 0x7f19a7c72f48, 
              _off = 1317, 
              _len = 39
            }
          }, 
          snapcolls = std::set with 0 elements, 
          size = 4194304, 
          omap_digest = 0, 
          digest = 0, 
          nlinks = 1, 
          negative = false, 
          digest_present = false, 
          omap_digest_present = false, 
          read_error = false, 
          stat_error = false, 
          ec_hash_mismatch = false, 
          ec_size_mismatch = false
        },
        [{
          oid = {
            name = "rbd_data.126ac06b8b4567.", '0' <repeats 12 times>, "1748" 
          }, 
          snap = {
            val = 18446744073709551614
          }, 
          hash = 3647131959, 
          max = false, 
          nibblewise_key_cache = 1931290269, 
          hash_reverse_bits = 3968550555, 
          static POOL_META = -1, 
          static POOL_TEMP_START = -2, 
          pool = 3, 
          nspace = "", 
          key = "" 
        }] = {
          attrs = std::map with 2 elements = {
            ["_"] = {
              _raw = 0x7f19a7c72f48, 
              _off = 1486, 
              _len = 275
            },
            ["snapset"] = {
              _raw = 0x7f19a7c72f48, 
              _off = 1776, 
              _len = 31
            }
          }, 
          snapcolls = std::set with 0 elements, 
          size = 4194304, 
          omap_digest = 0, 
          digest = 0, 
          nlinks = 1, 
          negative = false, 
          digest_present = false, 
          omap_digest_present = false, 
          read_error = false, 
          stat_error = false, 
          ec_hash_mismatch = false, 
          ec_size_mismatch = false
        }
      }, 
      valid_through = {
        version = 20796862, 
        epoch = 453051, 
        __pad = 0
      }, 
      incr_since = {
        version = 0, 
        epoch = 0, 
        __pad = 0
      }
    }
  }, 
  active_rep_scrub = std::shared_ptr (empty) 0x0, 
  scrub_reg_stamp = {
    tv = {
      tv_sec = 1495334726, 
      tv_nsec = 424813646
    }
  }, 
  must_scrub = false, 
  must_deep_scrub = false, 
  must_repair = false, 
  auto_repair = false, 
  missing = std::map with 0 elements, 
  inconsistent = std::map with 0 elements, 
  authoritative = std::map with 0 elements, 
  cleaned_meta_map = {
    bitwise = true, 
    objects = std::map with 1 elements = {
      [{
        oid = {
          name = "rbd_data.5f640d6b8b4567.", '0' <repeats 12 times>, "69a2" 
        }, 
        snap = {
          val = 18446744073709551614
        }, 
        hash = 1342361911, 
        max = false, 
        nibblewise_key_cache = 1931288581, 
        hash_reverse_bits = 3968548874, 
        static POOL_META = -1, 
        static POOL_TEMP_START = -2, 
        pool = 3, 
        nspace = "", 
        key = "" 
      }] = {
        attrs = std::map with 2 elements = {
          ["_"] = {
            _raw = 0x7f196a0a4098, 
            _off = 0, 
            _len = 275
          },
          ["snapset"] = {
            _raw = 0x7f19615ba7a0, 
            _off = 0, 
            _len = 31
          }
        }, 
        snapcolls = std::set with 0 elements, 
        size = 4194304, 
        omap_digest = 0, 
        digest = 0, 
        nlinks = 1, 
        negative = false, 
        digest_present = false, 
        omap_digest_present = false, 
        read_error = false, 
        stat_error = false, 
        ec_hash_mismatch = false, 
        ec_size_mismatch = false
      }
    }, 
    valid_through = {
      version = 0, 
      epoch = 0, 
      __pad = 0
    }, 
    incr_since = {
      version = 0, 
      epoch = 0, 
      __pad = 0
    }
  }, 
  num_digest_updates_pending = 0, 
  start = {
    oid = {
      name = "rbd_data.38f9f96b8b4567.", '0' <repeats 12 times>, "2478" 
    }, 
    snap = {
      val = 0
    }, 
    hash = 2852311351, 
    max = false, 
    nibblewise_key_cache = 1931288746, 
    hash_reverse_bits = 3968548949, 
    static POOL_META = -1, 
    static POOL_TEMP_START = -2, 
    pool = 3, 
    nspace = "", 
    key = "" 
  }, 
  end = {
    oid = {
      name = "rbd_data.34bbd06b8b4567.", '0' <repeats 12 times>, "5895" 
    }, 
    snap = {
      val = 388576
    }, 
    hash = 3711095095, 
    max = false, 
    nibblewise_key_cache = 1931289565, 
    hash_reverse_bits = 3968552123, 
    static POOL_META = -1, 
    static POOL_TEMP_START = -2, 
    pool = 3, 
    nspace = "", 
    key = "" 
  }, 
  subset_last_update = {
    version = 20796861, 
    epoch = 453051, 
    __pad = 0
  }, 
  state = PG::Scrubber::WAIT_LAST_UPDATE, 
  store = std::unique_ptr<Scrub::Store> containing 0x7f196be8f200, 
  deep = false, 
  seed = 4294967295, 
  callbacks = empty std::list
}

#4 Updated by Jason Dillaman 5 months ago

  • Subject changed from ceph-osd: PGs getting stuck in deep-scrub state, stalling RBD to ceph-osd: PGs getting stuck in scrub state, stalling RBD

#5 Updated by Jason Dillaman 5 months ago

Logs available on teuthology:/home/jdillaman/osd.23.log_try_rados_rm.gz

#6 Updated by Greg Farnum 4 months ago

  • Project changed from Ceph to RADOS
  • Category set to Scrub/Repair
  • Component(RADOS) OSD added

#7 Updated by Greg Farnum 4 months ago

from thread: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-May/017869.html

[15:41:40] <jdillaman> gregsfortytwo: that thread at least
[15:42:40] <jdillaman> gregsfortytwo: Stefan provided a gcore from one of his stuck OSDs -- best I can figure from what little I know is that "ReplicatedPG::op_applied" failed to re-schedule the scrub
[15:43:10] <gregsfortytwo> yeah, getting lost somewhere in the restart machinery is not exactly a surprise
[15:43:23] <gregsfortytwo> from what that thread says, is this an rbd image with the exclusive-lock stuff?
[15:43:36] <jdillaman> gregsfortytwo: yeah -- but that's a red herring
[15:43:57] <gregsfortytwo> yeah, exclusive-lock probably doesn't matter for anything except the header?
[15:43:50] <jdillaman> gregsfortytwo: correct
[15:44:15] <gregsfortytwo> bummer, it would be an obvious way they differ that could have caused a leak
[15:44:25] <jdillaman> gregsfortytwo: their workload involves nightly fstrims, which cause GBs of discards on the OSDs
[15:44:51] <jdillaman> gregsfortytwo: we somehow missed optimizing away the discards using the object map (fixed now), so the OSDs are getting hammered w/ lots of non-existent object deletions
[15:45:15] <gregsfortytwo> ah
[15:45:21] <gregsfortytwo> shouldn't be the problem here though
[15:45:17] <jdillaman> "shouldn't"
[15:45:34] <gregsfortytwo> although actually
[15:45:41] <gregsfortytwo> that's another place to explore
[15:45:49] <gregsfortytwo> if they are getting deletes on non-existent objects
[15:45:51] <gregsfortytwo> but the range is being scrubbed
[15:45:55] <gregsfortytwo> we might not be waking them up
[15:45:54] <jdillaman> yes -- that object that is stuck is 100% non-existent
[15:46:11] <gregsfortytwo> okay

#8 Updated by Stefan Priebe 4 months ago

Shouldn't this one be flagged as a regression? It was working fine under firefly and hammer.

#9 Updated by Jason Dillaman 4 months ago

Second reported case from the mailing list of VMs locking up -- they also have VMs issuing periodic discards.

#10 Updated by Kenneth Van Alstyne 4 months ago

Anecdotally, it looks like I may be running into this very same issue (or something similar) -- occasionally I have some guest OSs in a small KVM cluster that crash due to what appears to be I/O timeouts. I can restart the guest after killing and restarting the offending OSD. If I just let it sit, the scrub appears to never complete. See below log snippets for an example of this behavior:

Jun 18 00:10:40 ceph-storage-1 ceph-status: 2017-06-18 00:10:36.698603 osd.46 [WRN] slow request 61440.279601 seconds old, received at 2017-06-17 07:06:36.418924: replica scrub(pg: 0.17b,from:0'0,to:448426'3686567,epoch:448426,start:0:dea9007c:::rbd_data.876fde6b8b4567.0000000000000b03:623a7,end:0:dea9007c:::rbd_data.876fde6b8b4567.0000000000000b03:631a7,chunky:1,deep:0,seed:4294967295,version:6) currently reached_pg
Jun 18 17:14:41 ceph-storage-1 ceph-status: 2017-06-18 17:14:37.298747 osd.46 [WRN] slow request 122880.879746 seconds old, received at 2017-06-17 07:06:36.418924: replica scrub(pg: 0.17b,from:0'0,to:448426'3686567,epoch:448426,start:0:dea9007c:::rbd_data.876fde6b8b4567.0000000000000b03:623a7,end:0:dea9007c:::rbd_data.876fde6b8b4567.0000000000000b03:631a7,chunky:1,deep:0,seed:4294967295,version:6) currently reached_pg
Jun 20 03:22:41 ceph-storage-1 ceph-status: 2017-06-20 03:22:36.552952 osd.46 [WRN] slow request 245760.133962 seconds old, received at 2017-06-17 07:06:36.418924: replica scrub(pg: 0.17b,from:0'0,to:448426'3686567,epoch:448426,start:0:dea9007c:::rbd_data.876fde6b8b4567.0000000000000b03:623a7,end:0:dea9007c:::rbd_data.876fde6b8b4567.0000000000000b03:631a7,chunky:1,deep:0,seed:4294967295,version:6) currently reached_pg

  # ceph health detail
    HEALTH_WARN 1 requests are blocked > 32 sec; 1 osds have slow requests
    1 ops are blocked > 536871 sec on osd.46
    1 osds have slow requests
  # ceph -s
    cluster <f-s-i-d>
    health HEALTH_WARN
    1 requests are blocked > 32 sec
    monmap e3: 3 mons at {ceph-storage-1=255.255.255.1:6789/0,ceph-storage-2=255.255.255.2:6789/0,ceph-storage-3=255.255.255.3:6789/0}
    election epoch 600, quorum 0,1,2 ceph-storage-1,ceph-storage-2,ceph-storage-3
    fsmap e116: 1/1/1 up {0=ceph-storage-3=up:active}, 2 up:standby
    osdmap e448502: 50 osds: 50 up, 50 in
    flags require_jewel_osds
    pgmap v54174730: 2880 pgs, 3 pools, 3999 GB data, 2670 kobjects
    32083 GB used, 55420 GB / 87504 GB avail
    2879 active+clean
    1 active+clean+scrubbing
    client io 0 B/s rd, 4380 kB/s wr, 36 op/s rd, 162 op/s wr
  # ceph -v
    ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

#11 Updated by Stefan Priebe 4 months ago

Anything I could provide or test? VMs are still crashing every night...

#12 Updated by Josh Durgin 4 months ago

  • Priority changed from High to Urgent

#13 Updated by Stefan Priebe 4 months ago

@Josh: is this related to #19497?

#14 Updated by Anthony D'Atri 4 months ago

We've experienced at least three distinct cases of ops stuck for long periods of time on a scrub. The attached file shows op details (didn't want to inline a thousand lines). Restarting the OSD resolved the immediate situation.

  # ceph health detail
    HEALTH_WARN 8 requests are blocked > 32 sec; 1 osds have slow requests
    2 ops are blocked > 67108.9 sec on osd.397
    3 ops are blocked > 33554.4 sec on osd.397
    1 ops are blocked > 16777.2 sec on osd.397
    1 ops are blocked > 8388.61 sec on osd.397
    1 ops are blocked > 4194.3 sec on osd.397
    1 osds have slow requests
  # ceph -v ; dpkg -l | grep ceph
    ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
    ii ceph 10.2.6-1trusty amd64 distributed storage and file system
    ii ceph-base 10.2.6-1trusty amd64 common ceph daemon libraries and management tools
    ii ceph-common 10.2.6-1trusty amd64 common utilities to mount and interact with a ceph storage cluster
    ii ceph-mds 10.2.6-1trusty amd64 metadata server for the ceph distributed file system
    ii ceph-mon 10.2.6-1trusty amd64 monitor server for the ceph storage system
    ii ceph-osd 10.2.6-1trusty amd64 OSD server for the ceph storage system
    ii libcephfs1 10.2.6-1trusty amd64 Ceph distributed file system client library
    ii python-cephfs 10.2.6-1trusty amd64 Python libraries for the Ceph libcephfs library
  # uname -r
    3.19.0-49-generic

#15 Updated by Anthony D'Atri 4 months ago

Anthony D'Atri wrote:

We've experienced at least three distinct cases of ops stuck for long periods of time on a scrub; on at least two of them snap activity has been associated. The attached file shows op details (didn't want to inline a thousand lines). Restarting the OSD resolved the immediate situation.

  # ceph health detail
    HEALTH_WARN 8 requests are blocked > 32 sec; 1 osds have slow requests
    2 ops are blocked > 67108.9 sec on osd.397
    3 ops are blocked > 33554.4 sec on osd.397
    1 ops are blocked > 16777.2 sec on osd.397
    1 ops are blocked > 8388.61 sec on osd.397
    1 ops are blocked > 4194.3 sec on osd.397
    1 osds have slow requests
  # ceph -v ; dpkg -l | grep ceph
    ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
    ii ceph 10.2.6-1trusty amd64 distributed storage and file system
    ii ceph-base 10.2.6-1trusty amd64 common ceph daemon libraries and management tools
    ii ceph-common 10.2.6-1trusty amd64 common utilities to mount and interact with a ceph storage cluster
    ii ceph-mds 10.2.6-1trusty amd64 metadata server for the ceph distributed file system
    ii ceph-mon 10.2.6-1trusty amd64 monitor server for the ceph storage system
    ii ceph-osd 10.2.6-1trusty amd64 OSD server for the ceph storage system
    ii libcephfs1 10.2.6-1trusty amd64 Ceph distributed file system client library
    ii python-cephfs 10.2.6-1trusty amd64 Python libraries for the Ceph libcephfs library
  # uname -r
    3.19.0-49-generic

#16 Updated by David Zafman 3 months ago

Was the osd log saved for the primary of a stuck PG in this state? Can this be reproduced to provide an osd log?

#17 Updated by David Zafman 3 months ago

  • Status changed from New to Need More Info
  • Assignee set to David Zafman

#18 Updated by David Zafman 3 months ago

Based on the earlier information:

  subset_last_update = {
    version = 20796861,
    epoch = 453051,
    __pad = 0
  },
  state = PG::Scrubber::WAIT_LAST_UPDATE,

It can be inferred that last_update_applied < scrubber.subset_last_update, so the scrub is waiting. If log levels had been at 15 or above, this message would have been logged:

"wait for writes to flush"

I don't find any logs that correspond to that dump of scrubber info.

#19 Updated by David Zafman 3 months ago

Stefan Priebe wrote:

Anything I could provide or test? VMs are still crashing every night...

Can you reproduce this with osd log level set to 20?
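
For reference, OSD debug logging can be raised at runtime without a restart; a sketch (the osd id is just an example, and level 20 is extremely verbose, so revert afterwards):

```shell
# Inject the higher level from any node with admin access
ceph tell osd.23 injectargs '--debug-osd 20/20'

# ... or via the admin socket on the host where osd.23 runs
ceph daemon osd.23 config set debug_osd 20/20

# Drop it back down once the hang has been captured
# (adjust to your cluster's usual default)
ceph tell osd.23 injectargs '--debug-osd 1/5'
```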

#20 Updated by Stefan Priebe 3 months ago

Hello @david,

The best logs I could produce with level 20 I sent to @Jason Dillaman 2 months ago (PGP encrypted). Reproducing is not easy, as it happens once or twice a night out of 2500 disk images. I can't run the OSDs with level 20 the whole night.

We're doing the following in a row:
- purge all snapshots
- fstrim
- sync in VM
- create new snapshot
- export snapshot
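
That nightly sequence could be sketched roughly as follows (the pool/image name, VM hostname, and backup path are made-up placeholders; the real setup surely differs):

```shell
#!/bin/sh
# Nightly snapshot/export cycle -- illustrative names throughout
IMAGE="rbd/vm-100-disk-1"
SNAP="nightly-$(date +%F)"

rbd snap purge "$IMAGE"                           # 1. purge all snapshots
ssh vm-100 "fstrim -av && sync"                   # 2+3. fstrim and sync inside the VM
rbd snap create "$IMAGE@$SNAP"                    # 4. create a new snapshot
rbd export "$IMAGE@$SNAP" "/backup/${SNAP}.img"   # 5. export the snapshot
```

The fstrim step is what generates the bulk discards (and hence the deletes on non-existent objects) mentioned earlier in this ticket.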

#21 Updated by Jason Dillaman 3 months ago

@Stefan: just for clarification, I believe the gpg-encrypted ceph-post-file dump was the gcore of the OSD and a Debian package list for recreating the gdb environment:

ceph-post-file: e3c22f4a-2570-4b86-b641-cdd7971b0227

You can find the Debian jessie packages here:

ceph-post-file: 401f2ed4-05e3-4db3-a67b-33fb8a42ad89

#22 Updated by Stefan Priebe 3 months ago

@Jason: argh, yes, that seems to be correct.

So it seems I don't have any logs. Currently I have no idea how to generate them.

#23 Updated by David Zafman 3 months ago

  • Status changed from Need More Info to In Progress

I think I've reproduced this; examining logs now.

#24 Updated by Stefan Priebe 3 months ago

@David
That would be so great! I'm happy to test any patch ;-)

#25 Updated by David Zafman 3 months ago

Analysis:

The secondary got a scrub map request with scrub_to 1748'26508:

2017-07-17 19:34:19.517667 7f8ef4dd4700 10 osd.2 1749 dequeue_op 0x7f8f1b50c100 prio 127 cost 0 latency 0.000104 replica scrub(pg: 3.0,from:0'0,to:1748'26508,epoch:1749/22,start:3:32dea4f3:::ROBJ147:0,end:3:3ee60b2f:::ROBJ379:6bd,chunky:1,deep:0,seed:4294967295,version:7) v7 pg pg[3.0( v 1749'26511 (1497'23504,1749'26511] local-lis/les=22/23 n=506 ec=22/22 lis/c 22/22 les/c/f 23/23/0 22/22/22) [1,2] r=1 lpr=23 luod=0'0 lua=1748'26507 crt=1749'26511 lcod 1748'26509 active]
2017-07-17 19:34:19.517682 7f8ef4dd4700 20 osd.2 1749 share_map osd.1 127.0.0.1:6806/97634 1749
2017-07-17 19:34:19.517685 7f8ef4dd4700 20 osd.2 1749 should_share_map osd.1 127.0.0.1:6806/97634 1749
2017-07-17 19:34:19.517692 7f8ef4dd4700 10 osd.2 pg_epoch: 1749 pg[3.0( v 1749'26511 (1497'23504,1749'26511] local-lis/les=22/23 n=506 ec=22/22 lis/c 22/22 les/c/f 23/23/0 22/22/22) [1,2] r=1 lpr=23 luod=0'0 lua=1748'26507 crt=1749'26511 lcod 1748'26509 active] handle_message: 0x7f8f1b50c100
2017-07-17 19:34:19.517701 7f8ef4dd4700  7 osd.2 pg_epoch: 1749 pg[3.0( v 1749'26511 (1497'23504,1749'26511] local-lis/les=22/23 n=506 ec=22/22 lis/c 22/22 les/c/f 23/23/0 22/22/22) [1,2] r=1 lpr=23 luod=0'0 lua=1748'26507 crt=1749'26511 lcod 1748'26509 active] replica_scrub
2017-07-17 19:34:19.517707 7f8ef4dd4700 10 osd.2 pg_epoch: 1749 pg[3.0( v 1749'26511 (1497'23504,1749'26511] local-lis/les=22/23 n=506 ec=22/22 lis/c 22/22 les/c/f 23/23/0 22/22/22) [1,2] r=1 lpr=23 luod=0'0 lua=1748'26507 crt=1749'26511 lcod 1748'26509 active] waiting for last_update_applied to catch up

But the transaction that created clone 6bd included the write to the head object, which applied as 1748'26509:

2017-07-17 19:34:19.836747 7f8f035f1700 10 osd.2 pg_epoch: 1749 pg[3.0( v 1749'26516 (1497'23504,1749'26516] local-lis/les=22/23 n=508 ec=22/22 lis/c 22/22 les/c/f 23/23/0 22/22/22) [1,2] r=1 lpr=23 luod=0'0 lua=1748'26507 crt=1749'26516 lcod 1749'26514 active] repop_applied on 0x7f8f216fe718 op osd_repop(client.12963.0:1 3.0 e1748/22 3:3ee60b2f:::ROBJ379:head v 1748'26509) v

The chunky_scrub() code has (last_update_applied >= scrubber.subset_last_update) for the primary, so op_applied() should also have >= tests.

#26 Updated by David Zafman 3 months ago

@Stefan: A patch for jewel (based on the current jewel branch) can be found here:

https://github.com/ceph/ceph/pull/16405/commits/483be4136d4497aad30d659cdcecf6bd4d6b452a

Are you able to build a version with this change based on what you are already running?

#27 Updated by Stefan Priebe 3 months ago

Yes I am - building right now. But it will take some time to roll that one out to the clusters.

#28 Updated by David Zafman 3 months ago

  • Backport set to jewel, kraken

#29 Updated by Stefan Priebe 3 months ago

Updated two of my clusters - will report back. Thanks again.

#30 Updated by David Zafman 3 months ago

  • Status changed from In Progress to Testing

#31 Updated by Stefan Priebe 3 months ago

Currently it looks good. Will wait until Monday to be sure.

#32 Updated by Stefan Priebe 3 months ago

Still everything fine. No new hanging scrubs, but I'm getting a lot of scrub PG errors which I need to repair manually. Not sure whether this is related.

#33 Updated by David Zafman 3 months ago

  • Status changed from Testing to Pending Backport

#35 Updated by David Zafman 3 months ago

  • Copied to Backport #20780: jewel: ceph-osd: PGs getting stuck in scrub state, stalling RBD added

#36 Updated by David Zafman 3 months ago

  • Copied to Backport #20781: kraken: ceph-osd: PGs getting stuck in scrub state, stalling RBD added

#39 Updated by Nathan Cutler about 1 month ago

  • Status changed from Pending Backport to Resolved
