Bug #10347

closed

duplicate OSD in acting set

Added by Loïc Dachary over 9 years ago. Updated over 9 years ago.

Status:
Rejected
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In the map-for-loicd osdmap extracted from a ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578) cluster, osd 7 shows up twice in the acting set. This happened after changing pg_num from 12 to 128 on an erasure coded pool with k=7, m=2.

./osdmaptool --test-map-pg 19.dd /tmp/map-for-loicd 
./osdmaptool: osdmap file '/tmp/map-for-loicd'
 parsed '19.dd' -> 19.dd
19.dd raw ([2,1,6,0,5,8,2147483647,7,4], p2) up ([2,1,6,0,5,8,2147483647,7,4], p2) acting ([2,1,6,0,5,8,7,7,4], p2)
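
The duplicate can also be spotted mechanically. The following is a minimal sketch, not part of the original report: it parses one line of `osdmaptool --test-map-pg` output and lists OSD ids that repeat within a set. The helper names `parse_test_map_pg` and `duplicate_osds` are hypothetical, and the regular expression assumes the output format shown above. The value 2147483647 (0x7fffffff) is treated as an unmapped slot and ignored.

```python
import re
from collections import Counter

# 0x7fffffff marks a slot for which CRUSH did not find an OSD.
CRUSH_ITEM_NONE = 2147483647

def parse_test_map_pg(line):
    """Return {'raw': [...], 'up': [...], 'acting': [...]} from one output line."""
    sets = {}
    for name, body in re.findall(r"(raw|up|acting) \(\[([0-9,]*)\]", line):
        sets[name] = [int(x) for x in body.split(",") if x]
    return sets

def duplicate_osds(osds):
    """OSD ids that appear more than once, ignoring unmapped slots."""
    counts = Counter(o for o in osds if o != CRUSH_ITEM_NONE)
    return sorted(o for o, n in counts.items() if n > 1)

line = ("19.dd raw ([2,1,6,0,5,8,2147483647,7,4], p2) "
        "up ([2,1,6,0,5,8,2147483647,7,4], p2) "
        "acting ([2,1,6,0,5,8,7,7,4], p2)")
sets = parse_test_map_pg(line)
print("up duplicates:", duplicate_osds(sets["up"]))
print("acting duplicates:", duplicate_osds(sets["acting"]))
```

For the line above, the up set has a hole but no duplicate, while the acting set repeats osd 7.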


Files

map-for-loicd (11 KB) OSD map, Loïc Dachary, 12/17/2014 03:11 AM
Actions #1

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #2

Updated by Loïc Dachary over 9 years ago

./osdmaptool --print /tmp/map-for-loicd 
./osdmaptool: osdmap file '/tmp/map-for-loicd'
epoch 5541
fsid eb26d697-03f5-4122-9f81-8c08ec680fe4
created 2014-03-19 12:32:51.564046
modified 2014-12-17 11:57:59.766630
flags 

pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 crash_replay_interval 45 min_read_recency_for_promote 1 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 min_read_recency_for_promote 1 stripe_width 0
pool 17 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 5313 flags hashpspool stripe_width 0
pool 19 'ECpool7et2' erasure size 9 min_size 7 crush_ruleset 8 object_hash rjenkins pg_num 256 pgp_num 256 last_change 5337 lfor 5326 flags hashpspool tiers 20 read_tier 20 write_tier 20 stripe_width 4256
pool 20 'cacheFor7et2' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 5349 flags hashpspool,incomplete_clones tier_of 19 cache_mode writeback target_bytes 25000000000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0

max_osd 9
osd.0 up   in  weight 1 up_from 5523 up_thru 5540 down_at 5519 last_clean_interval [3593,5518) 172.20.107.161:6800/3669 172.20.113.161:6800/3669 172.20.113.161:6801/3669 172.20.107.161:6801/3669 exists,up ebcc7f5a-d77c-4e43-a69e-acbeebe55338
osd.1 up   in  weight 1 up_from 5509 up_thru 5540 down_at 5507 last_clean_interval [5311,5506) 172.20.106.161:6800/9232 172.20.112.161:6800/9232 172.20.112.161:6801/9232 172.20.106.161:6801/9232 exists,up 93561c85-1ce7-4465-8da9-da8493cb323a
osd.2 up   in  weight 1 up_from 5497 up_thru 5540 down_at 5495 last_clean_interval [3580,5495) 172.20.108.161:6800/2824 172.20.114.161:6800/2824 172.20.114.161:6801/2824 172.20.108.161:6801/2824 exists,up e5451dc2-d294-416b-9d4a-7ffa453cb985
osd.3 up   in  weight 1 up_from 5526 up_thru 5540 down_at 5521 last_clean_interval [4859,5520) 172.20.107.162:6800/13703 172.20.113.162:6800/13703 172.20.113.162:6801/13703 172.20.107.162:6801/13703 exists,up b7d1efb1-0dc0-4ab5-bcbd-b5cfc444c4c0
osd.4 up   in  weight 1 up_from 5513 up_thru 5540 down_at 5511 last_clean_interval [3577,5510) 172.20.106.162:6800/22491 172.20.112.162:6801/22491 172.20.112.162:6802/22491 172.20.106.162:6801/22491 exists,up 9ddb6c30-2a54-4278-88d4-25c4668fc072
osd.5 up   in  weight 1 up_from 5503 up_thru 5540 down_at 5499 last_clean_interval [4686,5498) 172.20.108.162:6800/32616 172.20.114.162:6800/32616 172.20.114.162:6801/32616 172.20.108.162:6801/32616 exists,up 8dca3ace-0d45-4406-8b39-ecadbd6940b3
osd.6 up   in  weight 1 up_from 5528 up_thru 5540 down_at 5525 last_clean_interval [5002,5524) 172.20.107.163:6800/13204 172.20.113.163:6801/13204 172.20.113.163:6802/13204 172.20.107.163:6801/13204 exists,up 519f9b0d-71e8-49d1-b271-5473c0871587
osd.7 up   in  weight 1 up_from 5540 up_thru 5540 down_at 5538 last_clean_interval [5536,5537) 172.20.106.163:6800/15101 172.20.112.163:6800/15101 172.20.112.163:6801/15101 172.20.106.163:6801/15101 exists,up c7a5850d-fccb-482d-ada7-84075ab5968a
osd.8 up   in  weight 1 up_from 5505 up_thru 5540 down_at 5501 last_clean_interval [3587,5500) 172.20.108.163:6800/8386 172.20.114.163:6800/8386 172.20.114.163:6802/8386 172.20.108.163:6801/8386 exists,up ba4c2135-e859-43ee-a8bb-7f8d9874ed53

pg_temp 19.68 [2,7,8,5,1,0,6,4,1]
pg_temp 19.9a [1,0,1,2,6,8,7,4,3]
pg_temp 19.a8 [2,6,8,3,1,4,7,7,0]
pg_temp 19.c1 [2,0,5,8,1,8,6,7,3]
pg_temp 19.dd [2,1,6,0,5,8,7,7,4]
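
The pg_temp entries above can be checked for repeated OSDs in the same way. This is a minimal sketch under the assumption that each entry is printed as `pg_temp <pgid> [<osd>,...]` on its own line, as in the output above. The function name `pg_temp_duplicates` is hypothetical.

```python
from collections import Counter

def pg_temp_duplicates(osdmap_print_text):
    """Map pgid -> sorted duplicated OSD ids for each pg_temp line with a repeat."""
    found = {}
    for line in osdmap_print_text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "pg_temp":
            continue  # not a pg_temp entry
        osds = [int(x) for x in parts[2].strip("[]").split(",")]
        dups = sorted(o for o, n in Counter(osds).items() if n > 1)
        if dups:
            found[parts[1]] = dups
    return found

sample = """\
pg_temp 19.68 [2,7,8,5,1,0,6,4,1]
pg_temp 19.9a [1,0,1,2,6,8,7,4,3]
pg_temp 19.a8 [2,6,8,3,1,4,7,7,0]
pg_temp 19.c1 [2,0,5,8,1,8,6,7,3]
pg_temp 19.dd [2,1,6,0,5,8,7,7,4]
"""
result = pg_temp_duplicates(sample)
for pgid, dups in result.items():
    print(pgid, "duplicated osd(s):", dups)
```

Run on the five entries listed here, every one of them contains a repeated OSD, not only 19.dd.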
Actions #3

Updated by Loïc Dachary over 9 years ago

  • Status changed from 12 to Need More Info

Increasing choose tries as follows will ask CRUSH to try harder to find OSDs to map to the PG.

rule ECpool7et2 {
    ruleset 8
    type erasure
    min_size 3
    max_size 20
    step set_chooseleaf_tries 5
    step set_choose_tries 200
    step take default
    step choose indep 0 type osd
    step emit
}

Before changing this, the mapping sometimes fails:
$ ./osdmaptool --export-crush /tmp/crush /tmp/map-for-loicd 
./osdmaptool: osdmap file '/tmp/map-for-loicd'
./osdmaptool: exported crush map to /tmp/crush
$ ./crushtool -i /tmp/crush --test --show-bad-mappings --rule 8 --num-rep 9 --min-x 1 --max-x 128
bad mapping rule 8 x 43 num_rep 9 result [3,2,7,1,2147483647,8,5,6,0]
bad mapping rule 8 x 79 num_rep 9 result [6,0,2,1,4,7,2147483647,5,8]
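
To see which position of each mapping was left empty, the `bad mapping` lines can be parsed. This is a minimal sketch that assumes the line format shown above. The name `unmapped_positions` is hypothetical, and 2147483647 (0x7fffffff) is taken as the marker for a slot CRUSH could not fill.

```python
import re

# 0x7fffffff marks a slot for which CRUSH did not find an OSD.
CRUSH_ITEM_NONE = 2147483647

BAD_MAPPING = re.compile(
    r"bad mapping rule (\d+) x (\d+) num_rep (\d+) result \[([0-9,]+)\]"
)

def unmapped_positions(output):
    """Map each failing input x to the indexes of its unfilled slots."""
    holes = {}
    for m in BAD_MAPPING.finditer(output):
        result = [int(v) for v in m.group(4).split(",")]
        holes[int(m.group(2))] = [
            i for i, osd in enumerate(result) if osd == CRUSH_ITEM_NONE
        ]
    return holes

output = """\
bad mapping rule 8 x 43 num_rep 9 result [3,2,7,1,2147483647,8,5,6,0]
bad mapping rule 8 x 79 num_rep 9 result [6,0,2,1,4,7,2147483647,5,8]
"""
holes = unmapped_positions(output)
for x, positions in holes.items():
    print("x =", x, "unfilled positions:", positions)
```

In both failing cases exactly one of the nine positions is unfilled, while the other eight hold distinct OSDs.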

After decompiling the crush map, setting choose_tries to 200 in the rule, and compiling it again, testing the mapping over 1 million values does not show any mapping failure:
$ crushtool -i /tmp/crushfixed --test --show-bad-mappings --rule 8 --num-rep 9 --min-x 1 --max-x $((1024 * 1024))
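
The edit made between decompiling and recompiling can be scripted. The following is only a sketch of the text modification, under the assumption that the decompiled rule looks like the one quoted in this comment. The function `set_choose_tries` is hypothetical, and the sample input is that rule with the `set_choose_tries` line removed to stand in for the unmodified map. Decompiling and recompiling with crushtool are not covered here.

```python
def set_choose_tries(crush_text, rule_name, tries):
    """Insert or update 'step set_choose_tries N' in the named rule."""
    out, in_rule, done = [], False, False
    for line in crush_text.splitlines():
        stripped = line.strip()
        if stripped == f"rule {rule_name} {{":
            in_rule = True
        elif in_rule and stripped == "}":
            in_rule = False
        elif in_rule and not done:
            indent = line[: len(line) - len(line.lstrip())]
            if stripped.startswith("step set_choose_tries"):
                # Update an existing setting in place.
                line = f"{indent}step set_choose_tries {tries}"
                done = True
            elif stripped.startswith("step take"):
                # No setting yet: add one just before 'step take'.
                out.append(f"{indent}step set_choose_tries {tries}")
                done = True
        out.append(line)
    return "\n".join(out) + "\n"

original = """\
rule ECpool7et2 {
    ruleset 8
    type erasure
    min_size 3
    max_size 20
    step set_chooseleaf_tries 5
    step take default
    step choose indep 0 type osd
    step emit
}
"""
fixed = set_choose_tries(original, "ECpool7et2", 200)
print(fixed)
```

Applying the function a second time with a different value updates the existing line instead of adding another one.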

Actions #4

Updated by Loïc Dachary over 9 years ago

  • Status changed from Need More Info to In Progress
ceph-mon-lmb-E-1:~# ceph pg dump | grep 19.dd
dumped all in format plain
19.dd    222    0    0    444    0    924292480    117    117    active+remapped    2014-12-17 11:57:59.926205    5346'3398    5541:6735    [2,1,6,0,5,8,2147483647,7,4]    2    [2,1,6,0,5,8,7,7,4]    2    0'0    2014-12-16 10:20:27.249480    0'0    2014-12-16 10:20:27.249480
ceph-mon-lmb-E-1:~# ceph --version
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
ceph-mon-lmb-E-1:~# ceph pg 19.dd query
{ "state": "active+remapped",
  "snap_trimq": "[]",
  "epoch": 5541,
  "up": [
        2,
        1,
        6,
        0,
        5,
        8,
        2147483647,
        7,
        4],
  "acting": [
        2,
        1,
        6,
        0,
        5,
        8,
        7,
        7,
        4],
  "actingbackfill": [
        "0(3)",
        "1(1)",
        "2(0)",
        "4(8)",
        "5(4)",
        "6(2)",
        "7(6)",
        "7(7)",
        "8(5)"],
  "info": { "pgid": "19.dds0",
      "last_update": "5346'3398",
      "last_complete": "5346'3398",
      "log_tail": "5346'3281",
      "last_user_version": 6870,
      "last_backfill": "MAX",
      "purged_snaps": "[]",
      "history": { "epoch_created": 5320,
          "last_epoch_started": 5541,
          "last_epoch_clean": 5541,
          "last_epoch_split": 0,
          "same_up_since": 5540,
          "same_interval_since": 5540,
          "same_primary_since": 5497,
          "last_scrub": "0'0",
          "last_scrub_stamp": "2014-12-16 10:20:27.249480",
          "last_deep_scrub": "0'0",
          "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
          "last_clean_scrub_stamp": "0.000000"},
      "stats": { "version": "5346'3398",
          "reported_seq": "6735",
          "reported_epoch": "5541",
          "state": "active+remapped",
          "last_fresh": "2014-12-17 11:57:59.926977",
          "last_change": "2014-12-17 11:57:59.926205",
          "last_active": "2014-12-17 11:57:59.926977",
          "last_clean": "2014-12-16 17:43:39.416657",
          "last_became_active": "0.000000",
          "last_unstale": "2014-12-17 11:57:59.926977",
          "last_undegraded": "2014-12-17 11:57:59.926977",
          "last_fullsized": "2014-12-17 11:57:59.926977",
          "mapping_epoch": 5538,
          "log_start": "5346'3281",
          "ondisk_log_start": "5346'3281",
          "created": 5320,
          "last_epoch_clean": 5541,
          "parent": "0.0",
          "parent_split_bits": 0,
          "last_scrub": "0'0",
          "last_scrub_stamp": "2014-12-16 10:20:27.249480",
          "last_deep_scrub": "0'0",
          "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
          "last_clean_scrub_stamp": "0.000000",
          "log_size": 117,
          "ondisk_log_size": 117,
          "stats_invalid": "1",
          "stat_sum": { "num_bytes": 924292480,
              "num_objects": 222,
              "num_object_clones": 0,
              "num_object_copies": 1998,
              "num_objects_missing_on_primary": 0,
              "num_objects_degraded": 0,
              "num_objects_misplaced": 444,
              "num_objects_unfound": 0,
              "num_objects_dirty": 222,
              "num_whiteouts": 0,
              "num_read": 0,
              "num_read_kb": 0,
              "num_write": 222,
              "num_write_kb": 902629,
              "num_scrub_errors": 0,
              "num_shallow_scrub_errors": 0,
              "num_deep_scrub_errors": 0,
              "num_objects_recovered": 0,
              "num_bytes_recovered": 0,
              "num_keys_recovered": 0,
              "num_objects_omap": 0,
              "num_objects_hit_set_archive": 0,
              "num_bytes_hit_set_archive": 0},
          "stat_cat_sum": {},
          "up": [
                2,
                1,
                6,
                0,
                5,
                8,
                2147483647,
                7,
                4],
          "acting": [
                2,
                1,
                6,
                0,
                5,
                8,
                7,
                7,
                4],
          "blocked_by": [],
          "up_primary": 2,
          "acting_primary": 2},
      "empty": 0,
      "dne": 0,
      "incomplete": 0,
      "last_epoch_started": 5541,
      "hit_set_history": { "current_last_update": "0'0",
          "current_last_stamp": "0.000000",
          "current_info": { "begin": "0.000000",
              "end": "0.000000",
              "version": "0'0"},
          "history": []}},
  "peer_info": [
        { "peer": "0(3)",
          "pgid": "19.dds3",
          "last_update": "5346'3398",
          "last_complete": "5346'3398",
          "log_tail": "5346'3281",
          "last_user_version": 7728,
          "last_backfill": "MAX",
          "purged_snaps": "[]",
          "history": { "epoch_created": 5320,
              "last_epoch_started": 5541,
              "last_epoch_clean": 5541,
              "last_epoch_split": 0,
              "same_up_since": 5540,
              "same_interval_since": 5540,
              "same_primary_since": 5497,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000"},
          "stats": { "version": "5346'3397",
              "reported_seq": "6392",
              "reported_epoch": "5346",
              "state": "active+remapped",
              "last_fresh": "2014-12-16 17:58:41.455840",
              "last_change": "2014-12-16 17:44:21.999907",
              "last_active": "2014-12-16 17:58:41.455840",
              "last_clean": "2014-12-16 17:43:39.416657",
              "last_became_active": "0.000000",
              "last_unstale": "2014-12-16 17:58:41.455840",
              "last_undegraded": "2014-12-16 17:58:41.455840",
              "last_fullsized": "2014-12-16 17:58:41.455840",
              "mapping_epoch": 5538,
              "log_start": "5346'3281",
              "ondisk_log_start": "5346'3281",
              "created": 5320,
              "last_epoch_clean": 5342,
              "parent": "0.0",
              "parent_split_bits": 0,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000",
              "log_size": 116,
              "ondisk_log_size": 116,
              "stats_invalid": "1",
              "stat_sum": { "num_bytes": 924292480,
                  "num_objects": 222,
                  "num_object_clones": 0,
                  "num_object_copies": 1989,
                  "num_objects_missing_on_primary": 0,
                  "num_objects_degraded": 952,
                  "num_objects_misplaced": 442,
                  "num_objects_unfound": 0,
                  "num_objects_dirty": 222,
                  "num_whiteouts": 0,
                  "num_read": 0,
                  "num_read_kb": 0,
                  "num_write": 222,
                  "num_write_kb": 902629,
                  "num_scrub_errors": 0,
                  "num_shallow_scrub_errors": 0,
                  "num_deep_scrub_errors": 0,
                  "num_objects_recovered": 0,
                  "num_bytes_recovered": 0,
                  "num_keys_recovered": 0,
                  "num_objects_omap": 0,
                  "num_objects_hit_set_archive": 0,
                  "num_bytes_hit_set_archive": 0},
              "stat_cat_sum": {},
              "up": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    2147483647,
                    7,
                    4],
              "acting": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    7,
                    7,
                    4],
              "blocked_by": [],
              "up_primary": 2,
              "acting_primary": 2},
          "empty": 0,
          "dne": 0,
          "incomplete": 0,
          "last_epoch_started": 5541,
          "hit_set_history": { "current_last_update": "0'0",
              "current_last_stamp": "0.000000",
              "current_info": { "begin": "0.000000",
                  "end": "0.000000",
                  "version": "0'0"},
              "history": []}},
        { "peer": "1(1)",
          "pgid": "19.dds1",
          "last_update": "5346'3398",
          "last_complete": "5346'3398",
          "log_tail": "5346'3281",
          "last_user_version": 6870,
          "last_backfill": "MAX",
          "purged_snaps": "[]",
          "history": { "epoch_created": 5320,
              "last_epoch_started": 5541,
              "last_epoch_clean": 5541,
              "last_epoch_split": 0,
              "same_up_since": 5540,
              "same_interval_since": 5540,
              "same_primary_since": 5497,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000"},
          "stats": { "version": "5346'3398",
              "reported_seq": "6401",
              "reported_epoch": "5496",
              "state": "active+undersized+degraded+remapped",
              "last_fresh": "2014-12-16 21:01:03.365617",
              "last_change": "2014-12-16 21:01:03.363852",
              "last_active": "2014-12-16 21:01:03.365617",
              "last_clean": "2014-12-16 17:43:39.416657",
              "last_became_active": "0.000000",
              "last_unstale": "2014-12-16 21:01:03.365617",
              "last_undegraded": "2014-12-16 21:01:02.654014",
              "last_fullsized": "2014-12-16 21:01:02.654014",
              "mapping_epoch": 5538,
              "log_start": "5346'3281",
              "ondisk_log_start": "5346'3281",
              "created": 5320,
              "last_epoch_clean": 5496,
              "parent": "0.0",
              "parent_split_bits": 0,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000",
              "log_size": 117,
              "ondisk_log_size": 117,
              "stats_invalid": "1",
              "stat_sum": { "num_bytes": 924292480,
                  "num_objects": 222,
                  "num_object_clones": 0,
                  "num_object_copies": 1998,
                  "num_objects_missing_on_primary": 0,
                  "num_objects_degraded": 0,
                  "num_objects_misplaced": 666,
                  "num_objects_unfound": 0,
                  "num_objects_dirty": 222,
                  "num_whiteouts": 0,
                  "num_read": 0,
                  "num_read_kb": 0,
                  "num_write": 222,
                  "num_write_kb": 902629,
                  "num_scrub_errors": 0,
                  "num_shallow_scrub_errors": 0,
                  "num_deep_scrub_errors": 0,
                  "num_objects_recovered": 0,
                  "num_bytes_recovered": 0,
                  "num_keys_recovered": 0,
                  "num_objects_omap": 0,
                  "num_objects_hit_set_archive": 0,
                  "num_bytes_hit_set_archive": 0},
              "stat_cat_sum": {},
              "up": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    2147483647,
                    7,
                    4],
              "acting": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    7,
                    7,
                    4],
              "blocked_by": [],
              "up_primary": 2,
              "acting_primary": 2},
          "empty": 0,
          "dne": 0,
          "incomplete": 0,
          "last_epoch_started": 5541,
          "hit_set_history": { "current_last_update": "0'0",
              "current_last_stamp": "0.000000",
              "current_info": { "begin": "0.000000",
                  "end": "0.000000",
                  "version": "0'0"},
              "history": []}},
        { "peer": "4(8)",
          "pgid": "19.dds8",
          "last_update": "5346'3398",
          "last_complete": "5346'3398",
          "log_tail": "5346'3281",
          "last_user_version": 6870,
          "last_backfill": "MAX",
          "purged_snaps": "[]",
          "history": { "epoch_created": 5320,
              "last_epoch_started": 5541,
              "last_epoch_clean": 5541,
              "last_epoch_split": 0,
              "same_up_since": 5540,
              "same_interval_since": 5540,
              "same_primary_since": 5497,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000"},
          "stats": { "version": "5346'3397",
              "reported_seq": "6392",
              "reported_epoch": "5346",
              "state": "active+remapped",
              "last_fresh": "2014-12-16 17:58:41.455840",
              "last_change": "2014-12-16 17:44:21.999907",
              "last_active": "2014-12-16 17:58:41.455840",
              "last_clean": "2014-12-16 17:43:39.416657",
              "last_became_active": "0.000000",
              "last_unstale": "2014-12-16 17:58:41.455840",
              "last_undegraded": "2014-12-16 17:58:41.455840",
              "last_fullsized": "2014-12-16 17:58:41.455840",
              "mapping_epoch": 5538,
              "log_start": "5346'3281",
              "ondisk_log_start": "5346'3281",
              "created": 5320,
              "last_epoch_clean": 5342,
              "parent": "0.0",
              "parent_split_bits": 0,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000",
              "log_size": 116,
              "ondisk_log_size": 116,
              "stats_invalid": "1",
              "stat_sum": { "num_bytes": 924292480,
                  "num_objects": 222,
                  "num_object_clones": 0,
                  "num_object_copies": 1989,
                  "num_objects_missing_on_primary": 0,
                  "num_objects_degraded": 952,
                  "num_objects_misplaced": 442,
                  "num_objects_unfound": 0,
                  "num_objects_dirty": 222,
                  "num_whiteouts": 0,
                  "num_read": 0,
                  "num_read_kb": 0,
                  "num_write": 222,
                  "num_write_kb": 902629,
                  "num_scrub_errors": 0,
                  "num_shallow_scrub_errors": 0,
                  "num_deep_scrub_errors": 0,
                  "num_objects_recovered": 0,
                  "num_bytes_recovered": 0,
                  "num_keys_recovered": 0,
                  "num_objects_omap": 0,
                  "num_objects_hit_set_archive": 0,
                  "num_bytes_hit_set_archive": 0},
              "stat_cat_sum": {},
              "up": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    2147483647,
                    7,
                    4],
              "acting": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    7,
                    7,
                    4],
              "blocked_by": [],
              "up_primary": 2,
              "acting_primary": 2},
          "empty": 0,
          "dne": 0,
          "incomplete": 0,
          "last_epoch_started": 5541,
          "hit_set_history": { "current_last_update": "0'0",
              "current_last_stamp": "0.000000",
              "current_info": { "begin": "0.000000",
                  "end": "0.000000",
                  "version": "0'0"},
              "history": []}},
        { "peer": "5(4)",
          "pgid": "19.dds4",
          "last_update": "5346'3398",
          "last_complete": "5346'3398",
          "log_tail": "5346'3281",
          "last_user_version": 6870,
          "last_backfill": "MAX",
          "purged_snaps": "[]",
          "history": { "epoch_created": 5320,
              "last_epoch_started": 5541,
              "last_epoch_clean": 5541,
              "last_epoch_split": 0,
              "same_up_since": 5540,
              "same_interval_since": 5540,
              "same_primary_since": 5497,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000"},
          "stats": { "version": "5346'3397",
              "reported_seq": "6392",
              "reported_epoch": "5346",
              "state": "active+remapped",
              "last_fresh": "2014-12-16 17:58:41.455840",
              "last_change": "2014-12-16 17:44:21.999907",
              "last_active": "2014-12-16 17:58:41.455840",
              "last_clean": "2014-12-16 17:43:39.416657",
              "last_became_active": "0.000000",
              "last_unstale": "2014-12-16 17:58:41.455840",
              "last_undegraded": "2014-12-16 17:58:41.455840",
              "last_fullsized": "2014-12-16 17:58:41.455840",
              "mapping_epoch": 5538,
              "log_start": "5346'3281",
              "ondisk_log_start": "5346'3281",
              "created": 5320,
              "last_epoch_clean": 5342,
              "parent": "0.0",
              "parent_split_bits": 0,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000",
              "log_size": 116,
              "ondisk_log_size": 116,
              "stats_invalid": "1",
              "stat_sum": { "num_bytes": 924292480,
                  "num_objects": 222,
                  "num_object_clones": 0,
                  "num_object_copies": 1989,
                  "num_objects_missing_on_primary": 0,
                  "num_objects_degraded": 952,
                  "num_objects_misplaced": 442,
                  "num_objects_unfound": 0,
                  "num_objects_dirty": 222,
                  "num_whiteouts": 0,
                  "num_read": 0,
                  "num_read_kb": 0,
                  "num_write": 222,
                  "num_write_kb": 902629,
                  "num_scrub_errors": 0,
                  "num_shallow_scrub_errors": 0,
                  "num_deep_scrub_errors": 0,
                  "num_objects_recovered": 0,
                  "num_bytes_recovered": 0,
                  "num_keys_recovered": 0,
                  "num_objects_omap": 0,
                  "num_objects_hit_set_archive": 0,
                  "num_bytes_hit_set_archive": 0},
              "stat_cat_sum": {},
              "up": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    2147483647,
                    7,
                    4],
              "acting": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    7,
                    7,
                    4],
              "blocked_by": [],
              "up_primary": 2,
              "acting_primary": 2},
          "empty": 0,
          "dne": 0,
          "incomplete": 0,
          "last_epoch_started": 5541,
          "hit_set_history": { "current_last_update": "0'0",
              "current_last_stamp": "0.000000",
              "current_info": { "begin": "0.000000",
                  "end": "0.000000",
                  "version": "0'0"},
              "history": []}},
        { "peer": "6(2)",
          "pgid": "19.dds2",
          "last_update": "5346'3398",
          "last_complete": "5346'3398",
          "log_tail": "5346'3281",
          "last_user_version": 6870,
          "last_backfill": "MAX",
          "purged_snaps": "[]",
          "history": { "epoch_created": 5320,
              "last_epoch_started": 5541,
              "last_epoch_clean": 5541,
              "last_epoch_split": 0,
              "same_up_since": 5540,
              "same_interval_since": 5540,
              "same_primary_since": 5497,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000"},
          "stats": { "version": "5346'3397",
              "reported_seq": "6392",
              "reported_epoch": "5346",
              "state": "active+remapped",
              "last_fresh": "2014-12-16 17:58:41.455840",
              "last_change": "2014-12-16 17:44:21.999907",
              "last_active": "2014-12-16 17:58:41.455840",
              "last_clean": "2014-12-16 17:43:39.416657",
              "last_became_active": "0.000000",
              "last_unstale": "2014-12-16 17:58:41.455840",
              "last_undegraded": "2014-12-16 17:58:41.455840",
              "last_fullsized": "2014-12-16 17:58:41.455840",
              "mapping_epoch": 5538,
              "log_start": "5346'3281",
              "ondisk_log_start": "5346'3281",
              "created": 5320,
              "last_epoch_clean": 5342,
              "parent": "0.0",
              "parent_split_bits": 0,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000",
              "log_size": 116,
              "ondisk_log_size": 116,
              "stats_invalid": "1",
              "stat_sum": { "num_bytes": 924292480,
                  "num_objects": 222,
                  "num_object_clones": 0,
                  "num_object_copies": 1989,
                  "num_objects_missing_on_primary": 0,
                  "num_objects_degraded": 952,
                  "num_objects_misplaced": 442,
                  "num_objects_unfound": 0,
                  "num_objects_dirty": 222,
                  "num_whiteouts": 0,
                  "num_read": 0,
                  "num_read_kb": 0,
                  "num_write": 222,
                  "num_write_kb": 902629,
                  "num_scrub_errors": 0,
                  "num_shallow_scrub_errors": 0,
                  "num_deep_scrub_errors": 0,
                  "num_objects_recovered": 0,
                  "num_bytes_recovered": 0,
                  "num_keys_recovered": 0,
                  "num_objects_omap": 0,
                  "num_objects_hit_set_archive": 0,
                  "num_bytes_hit_set_archive": 0},
              "stat_cat_sum": {},
              "up": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    2147483647,
                    7,
                    4],
              "acting": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    7,
                    7,
                    4],
              "blocked_by": [],
              "up_primary": 2,
              "acting_primary": 2},
          "empty": 0,
          "dne": 0,
          "incomplete": 0,
          "last_epoch_started": 5541,
          "hit_set_history": { "current_last_update": "0'0",
              "current_last_stamp": "0.000000",
              "current_info": { "begin": "0.000000",
                  "end": "0.000000",
                  "version": "0'0"},
              "history": []}},
        { "peer": "7(6)",
          "pgid": "19.dds6",
          "last_update": "5346'3398",
          "last_complete": "5346'3398",
          "log_tail": "5346'3281",
          "last_user_version": 7728,
          "last_backfill": "MAX",
          "purged_snaps": "[]",
          "history": { "epoch_created": 5320,
              "last_epoch_started": 5541,
              "last_epoch_clean": 5541,
              "last_epoch_split": 0,
              "same_up_since": 5540,
              "same_interval_since": 5540,
              "same_primary_since": 5497,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000"},
          "stats": { "version": "5346'3397",
              "reported_seq": "6392",
              "reported_epoch": "5346",
              "state": "active+remapped",
              "last_fresh": "2014-12-16 17:58:41.455840",
              "last_change": "2014-12-16 17:44:21.999907",
              "last_active": "2014-12-16 17:58:41.455840",
              "last_clean": "2014-12-16 17:43:39.416657",
              "last_became_active": "0.000000",
              "last_unstale": "2014-12-16 17:58:41.455840",
              "last_undegraded": "2014-12-16 17:58:41.455840",
              "last_fullsized": "2014-12-16 17:58:41.455840",
              "mapping_epoch": 5538,
              "log_start": "5346'3281",
              "ondisk_log_start": "5346'3281",
              "created": 5320,
              "last_epoch_clean": 5342,
              "parent": "0.0",
              "parent_split_bits": 0,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000",
              "log_size": 116,
              "ondisk_log_size": 116,
              "stats_invalid": "1",
              "stat_sum": { "num_bytes": 924292480,
                  "num_objects": 222,
                  "num_object_clones": 0,
                  "num_object_copies": 1989,
                  "num_objects_missing_on_primary": 0,
                  "num_objects_degraded": 952,
                  "num_objects_misplaced": 442,
                  "num_objects_unfound": 0,
                  "num_objects_dirty": 222,
                  "num_whiteouts": 0,
                  "num_read": 0,
                  "num_read_kb": 0,
                  "num_write": 222,
                  "num_write_kb": 902629,
                  "num_scrub_errors": 0,
                  "num_shallow_scrub_errors": 0,
                  "num_deep_scrub_errors": 0,
                  "num_objects_recovered": 0,
                  "num_bytes_recovered": 0,
                  "num_keys_recovered": 0,
                  "num_objects_omap": 0,
                  "num_objects_hit_set_archive": 0,
                  "num_bytes_hit_set_archive": 0},
              "stat_cat_sum": {},
              "up": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    2147483647,
                    7,
                    4],
              "acting": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    7,
                    7,
                    4],
              "blocked_by": [],
              "up_primary": 2,
              "acting_primary": 2},
          "empty": 0,
          "dne": 0,
          "incomplete": 0,
          "last_epoch_started": 5541,
          "hit_set_history": { "current_last_update": "0'0",
              "current_last_stamp": "0.000000",
              "current_info": { "begin": "0.000000",
                  "end": "0.000000",
                  "version": "0'0"},
              "history": []}},
        { "peer": "7(7)",
          "pgid": "19.dds7",
          "last_update": "5346'3398",
          "last_complete": "5346'3398",
          "log_tail": "5346'3281",
          "last_user_version": 6870,
          "last_backfill": "MAX",
          "purged_snaps": "[]",
          "history": { "epoch_created": 5320,
              "last_epoch_started": 5541,
              "last_epoch_clean": 5541,
              "last_epoch_split": 0,
              "same_up_since": 5540,
              "same_interval_since": 5540,
              "same_primary_since": 5497,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000"},
          "stats": { "version": "5346'3397",
              "reported_seq": "6392",
              "reported_epoch": "5346",
              "state": "active+remapped",
              "last_fresh": "2014-12-16 17:58:41.455840",
              "last_change": "2014-12-16 17:44:21.999907",
              "last_active": "2014-12-16 17:58:41.455840",
              "last_clean": "2014-12-16 17:43:39.416657",
              "last_became_active": "0.000000",
              "last_unstale": "2014-12-16 17:58:41.455840",
              "last_undegraded": "2014-12-16 17:58:41.455840",
              "last_fullsized": "2014-12-16 17:58:41.455840",
              "mapping_epoch": 5538,
              "log_start": "5346'3281",
              "ondisk_log_start": "5346'3281",
              "created": 5320,
              "last_epoch_clean": 5342,
              "parent": "0.0",
              "parent_split_bits": 0,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000",
              "log_size": 116,
              "ondisk_log_size": 116,
              "stats_invalid": "1",
              "stat_sum": { "num_bytes": 924292480,
                  "num_objects": 222,
                  "num_object_clones": 0,
                  "num_object_copies": 1989,
                  "num_objects_missing_on_primary": 0,
                  "num_objects_degraded": 952,
                  "num_objects_misplaced": 442,
                  "num_objects_unfound": 0,
                  "num_objects_dirty": 222,
                  "num_whiteouts": 0,
                  "num_read": 0,
                  "num_read_kb": 0,
                  "num_write": 222,
                  "num_write_kb": 902629,
                  "num_scrub_errors": 0,
                  "num_shallow_scrub_errors": 0,
                  "num_deep_scrub_errors": 0,
                  "num_objects_recovered": 0,
                  "num_bytes_recovered": 0,
                  "num_keys_recovered": 0,
                  "num_objects_omap": 0,
                  "num_objects_hit_set_archive": 0,
                  "num_bytes_hit_set_archive": 0},
              "stat_cat_sum": {},
              "up": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    2147483647,
                    7,
                    4],
              "acting": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    7,
                    7,
                    4],
              "blocked_by": [],
              "up_primary": 2,
              "acting_primary": 2},
          "empty": 0,
          "dne": 0,
          "incomplete": 0,
          "last_epoch_started": 5541,
          "hit_set_history": { "current_last_update": "0'0",
              "current_last_stamp": "0.000000",
              "current_info": { "begin": "0.000000",
                  "end": "0.000000",
                  "version": "0'0"},
              "history": []}},
        { "peer": "8(5)",
          "pgid": "19.dds5",
          "last_update": "5346'3398",
          "last_complete": "5346'3398",
          "log_tail": "5346'3281",
          "last_user_version": 7728,
          "last_backfill": "MAX",
          "purged_snaps": "[]",
          "history": { "epoch_created": 5320,
              "last_epoch_started": 5541,
              "last_epoch_clean": 5541,
              "last_epoch_split": 0,
              "same_up_since": 5540,
              "same_interval_since": 5540,
              "same_primary_since": 5497,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000"},
          "stats": { "version": "5346'3397",
              "reported_seq": "6392",
              "reported_epoch": "5346",
              "state": "active+remapped",
              "last_fresh": "2014-12-16 17:58:41.455840",
              "last_change": "2014-12-16 17:44:21.999907",
              "last_active": "2014-12-16 17:58:41.455840",
              "last_clean": "2014-12-16 17:43:39.416657",
              "last_became_active": "0.000000",
              "last_unstale": "2014-12-16 17:58:41.455840",
              "last_undegraded": "2014-12-16 17:58:41.455840",
              "last_fullsized": "2014-12-16 17:58:41.455840",
              "mapping_epoch": 5538,
              "log_start": "5346'3281",
              "ondisk_log_start": "5346'3281",
              "created": 5320,
              "last_epoch_clean": 5342,
              "parent": "0.0",
              "parent_split_bits": 0,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2014-12-16 10:20:27.249480",
              "last_clean_scrub_stamp": "0.000000",
              "log_size": 116,
              "ondisk_log_size": 116,
              "stats_invalid": "1",
              "stat_sum": { "num_bytes": 924292480,
                  "num_objects": 222,
                  "num_object_clones": 0,
                  "num_object_copies": 1989,
                  "num_objects_missing_on_primary": 0,
                  "num_objects_degraded": 952,
                  "num_objects_misplaced": 442,
                  "num_objects_unfound": 0,
                  "num_objects_dirty": 222,
                  "num_whiteouts": 0,
                  "num_read": 0,
                  "num_read_kb": 0,
                  "num_write": 222,
                  "num_write_kb": 902629,
                  "num_scrub_errors": 0,
                  "num_shallow_scrub_errors": 0,
                  "num_deep_scrub_errors": 0,
                  "num_objects_recovered": 0,
                  "num_bytes_recovered": 0,
                  "num_keys_recovered": 0,
                  "num_objects_omap": 0,
                  "num_objects_hit_set_archive": 0,
                  "num_bytes_hit_set_archive": 0},
              "stat_cat_sum": {},
              "up": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    2147483647,
                    7,
                    4],
              "acting": [
                    2,
                    1,
                    6,
                    0,
                    5,
                    8,
                    7,
                    7,
                    4],
              "blocked_by": [],
              "up_primary": 2,
              "acting_primary": 2},
          "empty": 0,
          "dne": 0,
          "incomplete": 0,
          "last_epoch_started": 5541,
          "hit_set_history": { "current_last_update": "0'0",
              "current_last_stamp": "0.000000",
              "current_info": { "begin": "0.000000",
                  "end": "0.000000",
                  "version": "0'0"},
              "history": []}}],
  "recovery_state": [
        { "name": "Started\/Primary\/Active",
          "enter_time": "2014-12-17 11:57:59.853375",
          "might_have_unfound": [],
          "recovery_progress": { "backfill_targets": [],
              "waiting_on_backfill": [],
              "last_backfill_started": "0\/\/0\/\/-1",
              "backfill_info": { "begin": "0\/\/0\/\/-1",
                  "end": "0\/\/0\/\/-1",
                  "objects": []},
              "peer_backfill_info": [],
              "backfills_in_flight": [],
              "recovering": [],
              "pg_backend": { "recovery_ops": [],
                  "read_ops": []}},
          "scrub": { "scrubber.epoch_start": "0",
              "scrubber.active": 0,
              "scrubber.block_writes": 0,
              "scrubber.waiting_on": 0,
              "scrubber.waiting_on_whom": []}},
        { "name": "Started",
          "enter_time": "2014-12-17 11:57:58.771996"}],
Actions #5

Updated by Loïc Dachary over 9 years ago

  • Status changed from In Progress to Rejected
<sjusthm> loicd: an appropriately odd crush map could result in an acting set with a repeated OSD, but only for an EC pool
<sjusthm> crush failed to fill in the acting set spot
<sjusthm> and the primary found an acceptable replacement
<sjusthm> which happened to be on the same osd as another shard
<sjusthm> so the new primary happened to find an existing shard which would fit, and requested a pg temp
<sjusthm> which happened to have a duplicate
<sjusthm> it's not really a duplicate, it just means that the osd actually has two copies of the pg
<sjusthm> one for each shard
<sjusthm> remember: with an EC pool, the different positions in the acting set actually have different data
<sjusthm> loicd: that's why most of the OSD internals deal with PG's as pg_shard_t (pair<pg_t, shard_id_t>) so that an OSD can have two copies of the same pg and have it work
<sjusthm> loicd: it's also why ghobjects exist: operations on the filestore need the shard_id_t to avoid different pg shards on the same osd clobbering each other's objects
<sjusthm> by contrast, any replicated pg has the same value for the shard: NO_SHARD
<sjusthm> so regardless of the acting set position, it maps to the same pg
<sjusthm> shard
<sjusthm> and there can only be one
<sjusthm> imagine an EC pg with acting set [0,1,2]
<sjusthm> which then changes to [2,1,0]
<sjusthm> osd 2 must create a pg temp with [0,1,2] until the new shard on 2 is backfilled
<sjusthm> then the temp becomes [2,1,2]
<sjusthm> and then [2,1,0]
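
The explanation above can be illustrated with a small sketch. It is not Ceph code: it only models the idea that an EC shard is identified by the pair (pgid, shard position), so one OSD can hold two shards of the same PG without conflict, while a replicated PG uses a single "no shard" value for every position. The function names and the numeric value chosen for NO_SHARD are assumptions made for illustration. The acting set is the one reported for 19.dd, and 2147483647 is the placeholder osdmaptool prints for a slot CRUSH did not fill.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Placeholder printed in the up set above when CRUSH leaves a slot unfilled.
CRUSH_ITEM_NONE = 2147483647

# Stand-in for the "no shard" value of replicated PGs.
# The concrete value is an assumption for this sketch.
NO_SHARD = -1

ShardKey = Tuple[str, int]  # (pgid, shard position in the acting set)


def shards_by_osd(pgid: str, acting: List[int]) -> Dict[int, List[ShardKey]]:
    """For an EC pool, list the (pgid, shard) keys each OSD holds."""
    held: Dict[int, List[ShardKey]] = {}
    for shard, osd in enumerate(acting):
        if osd == CRUSH_ITEM_NONE:
            continue  # unfilled slot, no OSD holds this shard
        held.setdefault(osd, []).append((pgid, shard))
    return held


def repeated_osds(acting: List[int]) -> List[int]:
    """Return the OSDs that appear more than once in an acting set."""
    counts = Counter(osd for osd in acting if osd != CRUSH_ITEM_NONE)
    return sorted(osd for osd, n in counts.items() if n > 1)


def replicated_keys(pgid: str, acting: List[int]) -> Set[ShardKey]:
    """For a replicated pool every position maps to the same key."""
    return {(pgid, NO_SHARD) for _ in acting}


acting = [2, 1, 6, 0, 5, 8, 7, 7, 4]
held = shards_by_osd("19.dd", acting)

print(repeated_osds(acting))                 # [7]
print(held[7])                               # [('19.dd', 6), ('19.dd', 7)]
print(replicated_keys("0.1", [0, 1, 2]))     # {('0.1', -1)}
```

The two keys held by osd 7 correspond to the peers "7(6)" and "7(7)" (pgids 19.dds6 and 19.dds7) in the query output above: distinct keys, hence distinct data. In the replicated case all positions collapse to one key, which is why a repeated OSD would be meaningless there.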