Bug #21550

PG errors reappearing after OSD node rebooted on Luminous

Added by Eric Eastman over 6 years ago. Updated about 6 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have two PGs with inconsistent errors. I can repair them and re-run a deep-scrub to confirm they are clean, but if I reboot the server node hosting the OSDs, the inconsistent errors come back. The PGs are on two different OSDs on the same server node. I am using BlueStore.
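For reference, here is roughly the cycle I have been going through, as a minimal shell sketch (the PG IDs and host name are the ones from this report; the repair and deep-scrub commands only queue work on the primary OSD, so in practice I wait for each pass to finish in the OSD log before moving on):

for pg in 1.26d 1.337; do
    ceph pg repair $pg          # OSD log reports "1 errors, 1 fixed"
    ceph pg deep-scrub $pg      # comes back "deep-scrub ok"
done
ceph health detail              # HEALTH_OK at this point

# Reboot the node hosting osd.1 and osd.22, then wait for the OSDs to rejoin.
ssh ede-c1-osd05 reboot

for pg in 1.26d 1.337; do
    ceph pg deep-scrub $pg      # the stat mismatch errors are back
done
ceph health detail              # HEALTH_ERR, 2 pgs inconsistent again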

Before fixing the PGs


ceph -s
  cluster:
    id:     85a91bbe-b287-11e4-889f-001517987704
    health: HEALTH_ERR
            4 scrub errors
            Possible data damage: 2 pgs inconsistent

  services:
    mon: 3 daemons, quorum ede-c1-mon01,ede-c1-mon02,ede-c1-mon03
    mgr: ede-c1-mon01(active), standbys: ede-c1-mon03, ede-c1-mon02
    mds: cephfs-1/1/1 up  {0=ede-c1-mon01=up:active}, 1 up:standby-replay, 1 up:standby
    osd: 24 osds: 24 up, 24 in

  data:
    pools:   2 pools, 1280 pgs
    objects: 725k objects, 285 GB
    usage:   961 GB used, 450 GB / 1412 GB avail
    pgs:     1278 active+clean
             2    active+clean+inconsistent

  io:
    client:   1278 B/s rd, 2 op/s rd, 0 op/s wr

ceph health detail
HEALTH_ERR 4 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 4 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
    pg 1.26d is active+clean+inconsistent, acting [1,0,4]
    pg 1.337 is active+clean+inconsistent, acting [22,18,14]
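As a side note, the rados inconsistency helpers can also be queried for object-level detail, though for a pure stat mismatch the object list may well come back empty. A sketch, with <pool> standing in for the pool behind pool id 1 (not named in this report):

rados list-inconsistent-pg <pool>                        # PGs flagged inconsistent in that pool
rados list-inconsistent-obj 1.26d --format=json-pretty   # per-object scrub errors, if any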

ceph pg deep-scrub 1.26d
instructing pg 1.26d on osd.1 to deep-scrub

Looking at ceph-osd.1.log:

2017-09-25 16:11:01.984718 7fe9947da700  0 log_channel(cluster) log [DBG] : 1.26d deep-scrub starts
2017-09-25 16:12:17.787279 7fe9947da700 -1 log_channel(cluster) log [ERR] : 1.26d deep-scrub stat mismatch, got 643/641 objects, 1/0 clones, 643/641 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 1/0 whiteouts, 318584502/317968556 bytes, 0/0 hit_set_archive bytes.
2017-09-25 16:12:17.788521 7fe9947da700 -1 log_channel(cluster) log [ERR] : 1.26d deep-scrub 1 errors

ceph pg repair 1.26d
instructing pg 1.26d on osd.1 to repair

Log from ceph-osd.1.log:
2017-09-25 16:14:36.012750 7fe9947da700  0 log_channel(cluster) log [DBG] : 1.26d repair starts
2017-09-25 16:16:12.344894 7fe9947da700 -1 log_channel(cluster) log [ERR] : 1.26d repair stat mismatch, got 643/641 objects, 1/0 clones, 643/641 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 1/0 whiteouts, 318584502/317968556 bytes, 0/0 hit_set_archive bytes.
2017-09-25 16:16:12.346489 7fe9947da700 -1 log_channel(cluster) log [ERR] : 1.26d repair 1 errors, 1 fixed

ceph -s
  cluster:
    id:     85a91bbe-b287-11e4-889f-001517987704
    health: HEALTH_ERR
            3 scrub errors
            Possible data damage: 1 pg inconsistent

  services:
    mon: 3 daemons, quorum ede-c1-mon01,ede-c1-mon02,ede-c1-mon03
    mgr: ede-c1-mon01(active), standbys: ede-c1-mon03, ede-c1-mon02
    mds: cephfs-1/1/1 up  {0=ede-c1-mon01=up:active}, 1 up:standby-replay, 1 up:standby
    osd: 24 osds: 24 up, 24 in

  data:
    pools:   2 pools, 1280 pgs
    objects: 725k objects, 285 GB
    usage:   961 GB used, 450 GB / 1412 GB avail
    pgs:     1279 active+clean
             1    active+clean+inconsistent

  io:
    client:   1278 B/s rd, 2 op/s rd, 0 op/s wr

The repair log shows it was fixed, so try a deep-scrub:

ceph pg deep-scrub 1.26d
2017-09-25 16:20:06.057207 7fe9947da700  0 log_channel(cluster) log [DBG] : 1.26d deep-scrub starts
2017-09-25 16:21:02.572256 7fe9947da700  0 log_channel(cluster) log [DBG] : 1.26d deep-scrub ok

Fix the next PG:
ceph pg deep-scrub 1.337
instructing pg 1.337 on osd.22 to deep-scrub

2017-09-25 16:25:25.100316 7f51136ef700  0 log_channel(cluster) log [DBG] : 1.337 deep-scrub starts
2017-09-25 16:26:21.394606 7f51136ef700 -1 log_channel(cluster) log [ERR] : 1.337 deep-scrub stat mismatch, got 626/624 objects, 1/0 clones, 626/624 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 1/0 whiteouts, 300162500/299593458 bytes, 0/0 hit_set_archive bytes.
2017-09-25 16:26:21.395723 7f51136ef700 -1 log_channel(cluster) log [ERR] : 1.337 deep-scrub 1 errors

ceph pg repair 1.337
instructing pg 1.337 on osd.22 to repair

2017-09-25 16:28:43.124010 7f51136ef700  0 log_channel(cluster) log [DBG] : 1.337 repair starts
2017-09-25 16:29:28.562689 7f51136ef700 -1 log_channel(cluster) log [ERR] : 1.337 repair stat mismatch, got 626/624 objects, 1/0 clones, 626/624 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 1/0 whiteouts, 300162500/299593458 bytes, 0/0 hit_set_archive bytes.
2017-09-25 16:29:28.563884 7f51136ef700 -1 log_channel(cluster) log [ERR] : 1.337 repair 1 errors, 1 fixed

ceph health detail
HEALTH_OK

Both of these OSDs are on host ede-c1-osd05. Reboot ede-c1-osd05 and wait for it to come back up.
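To double-check the OSD-to-host mapping before rebooting, something like the following should do; ceph osd find prints the crush location, host included, for a given OSD id:

ceph osd find 1      # expect "host": "ede-c1-osd05" in the crush_location
ceph osd find 22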

ceph -s
  cluster:
    id:     85a91bbe-b287-11e4-889f-001517987704
    health: HEALTH_ERR
            6 scrub errors
            Possible data damage: 2 pgs inconsistent

  services:
    mon: 3 daemons, quorum ede-c1-mon01,ede-c1-mon02,ede-c1-mon03
    mgr: ede-c1-mon01(active), standbys: ede-c1-mon03, ede-c1-mon02
    mds: cephfs-1/1/1 up  {0=ede-c1-mon01=up:active}, 1 up:standby-replay, 1 up:standby
    osd: 24 osds: 24 up, 24 in

  data:
    pools:   2 pools, 1280 pgs
    objects: 725k objects, 285 GB
    usage:   959 GB used, 452 GB / 1412 GB avail
    pgs:     1278 active+clean
             2    active+clean+inconsistent

  io:
    client:   1278 B/s rd, 2 op/s rd, 0 op/s wr

root@ede-c1-adm01:~# ceph health detail
HEALTH_ERR 6 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 6 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
    pg 1.26d is active+clean+inconsistent, acting [1,0,4]
    pg 1.337 is active+clean+inconsistent, acting [22,18,14]

Rerun the deep-scrub:
ceph pg deep-scrub 1.26d

Log:
2017-09-25 16:38:32.291517 7fe4580b0700  0 log_channel(cluster) log [DBG] : 1.26d deep-scrub starts
2017-09-25 16:39:27.278895 7fe4580b0700 -1 log_channel(cluster) log [ERR] : 1.26d deep-scrub stat mismatch, got 643/641 objects, 1/0 clones, 643/641 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 1/0 whiteouts, 318584502/317968556 bytes, 0/0 hit_set_archive bytes.
2017-09-25 16:39:27.278943 7fe4580b0700 -1 log_channel(cluster) log [ERR] : 1.26d deep-scrub 1 errors

ceph pg deep-scrub 1.337

Log:
2017-09-25 16:40:35.526689 7fb701f28700  0 log_channel(cluster) log [DBG] : 1.337 deep-scrub starts
2017-09-25 16:41:31.742312 7fb701f28700 -1 log_channel(cluster) log [ERR] : 1.337 deep-scrub stat mismatch, got 626/624 objects, 1/0 clones, 626/624 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 1/0 whiteouts, 300162500/299593458 bytes, 0/0 hit_set_archive bytes.
2017-09-25 16:41:31.743710 7fb701f28700 -1 log_channel(cluster) log [ERR] : 1.337 deep-scrub 1 errors

Info on the system:

ceph -v
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

OS: Ubuntu 16.04
Kernel (uname -a):
Linux ede-c1-adm01 4.13.0-041300-generic #201709031731 SMP Sun Sep 3 21:33:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

ceph osd df tree
ID  CLASS WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS TYPE NAME                     
 -1       1.37933        -  1412G   961G   450G 68.08 1.00   - root default                  
-15       0.46011        -   471G   325G   146G 68.97 1.01   -     rack rack-1               
-16       0.23006        -   235G   166G 70503M 70.78 1.04   -         chassis chassis-1-101 
 -5       0.23006        -   235G   166G 70503M 70.78 1.04   -             host ede-c1-osd01 
  3   hdd 0.05849  1.00000 61338M 42149M 19189M 68.72 1.01 160                 osd.3         
  9   hdd 0.05849  1.00000 61338M 44204M 17134M 72.07 1.06 169                 osd.9         
 16   hdd 0.05849  1.00000 61338M 42183M 19155M 68.77 1.01 168                 osd.16        
 18   hdd 0.05460  1.00000 57242M 42218M 15024M 73.75 1.08 159                 osd.18        
-27       0.23006        -   235G   158G 79202M 67.17 0.99   -         chassis chassis-1-104 
-13       0.23006        -   235G   158G 79202M 67.17 0.99   -             host ede-c1-osd04 
  5   hdd 0.05849  1.00000 61338M 40030M 21308M 65.26 0.96 159                 osd.5         
 11   hdd 0.05849  1.00000 61338M 42602M 18736M 69.45 1.02 159                 osd.11        
 17   hdd 0.05849  1.00000 61338M 46415M 14923M 75.67 1.11 183                 osd.17        
 23   hdd 0.05460  1.00000 57242M 33009M 24233M 57.67 0.85 139                 osd.23        
-19       0.22905        -   234G   158G 78407M 67.36 0.99   -     rack rack-2               
-20       0.22905        -   234G   158G 78407M 67.36 0.99   -         chassis chassis-2-102 
 -9       0.22905        -   234G   158G 78407M 67.36 0.99   -             host ede-c1-osd02 
  2   hdd 0.05849  1.00000 61338M 40667M 20671M 66.30 0.97 165                 osd.2         
 10   hdd 0.05849  1.00000 61338M 45892M 15446M 74.82 1.10 179                 osd.10        
 14   hdd 0.05849  1.00000 61338M 40969M 20369M 66.79 0.98 162                 osd.14        
 19   hdd 0.05359  1.00000 56218M 34299M 21919M 61.01 0.90 138                 osd.19        
-23       0.69017        -   706G   478G   228G 67.72 0.99   -     rack rack-3               
-24       0.23006        -   235G   161G 75796M 68.58 1.01   -         chassis chassis-3-103 
-11       0.23006        -   235G   161G 75796M 68.58 1.01   -             host ede-c1-osd03 
  4   hdd 0.05849  1.00000 61338M 42776M 18562M 69.74 1.02 165                 osd.4         
  8   hdd 0.05849  1.00000 61338M 42429M 18909M 69.17 1.02 158                 osd.8         
 13   hdd 0.05849  1.00000 61338M 44390M 16948M 72.37 1.06 173                 osd.13        
 20   hdd 0.05460  1.00000 57242M 35867M 21375M 62.66 0.92 140                 osd.20        
-29       0.23006        -   235G   159G 78292M 67.55 0.99   -         chassis chassis-3-105 
 -7       0.23006        -   235G   159G 78292M 67.55 0.99   -             host ede-c1-osd05 
  1   hdd 0.05849  1.00000 61338M 36952M 24386M 60.24 0.88 141                 osd.1         
  7   hdd 0.05849  1.00000 61338M 39981M 21357M 65.18 0.96 153                 osd.7         
 15   hdd 0.05849  1.00000 61338M 44403M 16935M 72.39 1.06 173                 osd.15        
 22   hdd 0.05460  1.00000 57242M 41630M 15612M 72.73 1.07 165                 osd.22        
-31       0.23006        -   235G   157G 79513M 67.04 0.98   -         chassis chassis-3-106 
 -3       0.23006        -   235G   157G 79513M 67.04 0.98   -             host ede-c1-osd06 
  0   hdd 0.05849  1.00000 61338M 42302M 19036M 68.96 1.01 166                 osd.0         
  6   hdd 0.05849  1.00000 61338M 39553M 21785M 64.48 0.95 153                 osd.6         
 12   hdd 0.05849  1.00000 61338M 39856M 21482M 64.98 0.95 157                 osd.12        
 21   hdd 0.05460  1.00000 57242M 40034M 17208M 69.94 1.03 156                 osd.21        
                     TOTAL  1412G   961G   450G 68.08                                        
MIN/MAX VAR: 0.85/1.11  STDDEV: 4.63

ceph daemon osd.1 config diff
{
    "diff": {
        "current": {
            "admin_socket": "/var/run/ceph/ceph-osd.1.asok",
            "auth_client_required": "cephx",
            "auth_supported": "cephx",
            "bluefs_allocator": "stupid",
            "bluestore_allocator": "stupid",
            "bluestore_cache_size_hdd": "134217728",
            "bluestore_cache_size_ssd": "134217728",
            "cephx_cluster_require_signatures": "true",
            "clog_to_syslog": "true",
            "cluster_addr": "172.16.2.18:0/0",
            "cluster_network": "172.16.2.0/24",
            "err_to_stderr": "true",
            "err_to_syslog": "true",
            "filestore_merge_threshold": "40",
            "filestore_split_multiple": "8",
            "fsid": "85a91bbe-b287-11e4-889f-001517987704",
            "internal_safe_to_start_threads": "true",
            "keyring": "/var/lib/ceph/osd/ceph-1/keyring",
            "leveldb_log": "",
            "log_file": "/var/log/ceph/ceph-osd.1.log",
            "log_max_recent": "10000",
            "log_to_stderr": "false",
            "log_to_syslog": "true",
            "mds_data": "/var/lib/ceph/mds/ceph-1",
            "mds_standby_replay": "true",
            "mgr_data": "/var/lib/ceph/mgr/ceph-1",
            "mon_allow_pool_delete": "true",
            "mon_cluster_log_file": "default=/var/log/ceph/ceph.$channel.log cluster=/var/log/ceph/ceph.log",
            "mon_data": "/var/lib/ceph/mon/ceph-1",
            "mon_debug_dump_location": "/var/log/ceph/ceph-osd.1.tdump",
            "mon_host": "10.14.2.11, 10.14.2.12, 10.14.2.13",
            "mon_initial_members": "ede-c1-mon01, ede-c1-mon02, ede-c1-mon03",
            "mon_pg_warn_max_per_osd": "1000",
            "mon_pg_warn_min_per_osd": "1",
            "osd_data": "/var/lib/ceph/osd/ceph-1",
            "osd_journal": "/var/lib/ceph/osd/ceph-1/journal",
            "osd_journal_size": "1024",
            "osd_max_backfills": "2",
            "osd_objectstore": "bluestore",
            "osd_recovery_max_active": "5",
            "osd_recovery_op_priority": "2",
            "osd_scrub_end_hour": "6",
            "public_addr": "10.14.2.18:0/0",
            "public_network": "10.14.0.0/16",
            "rgw_data": "/var/lib/ceph/radosgw/ceph-1",
            "setgroup": "ceph",
            "setuser": "ceph" 
        },
        "defaults": {
            "admin_socket": "",
            "auth_client_required": "cephx, none",
            "auth_supported": "",
            "bluefs_allocator": "bitmap",
            "bluestore_allocator": "bitmap",
            "bluestore_cache_size_hdd": "1073741824",
            "bluestore_cache_size_ssd": "3221225472",
            "cephx_cluster_require_signatures": "false",
            "clog_to_syslog": "false",
            "cluster_addr": "-",
            "cluster_network": "",
            "err_to_stderr": "false",
            "err_to_syslog": "false",
            "filestore_merge_threshold": "10",
            "filestore_split_multiple": "2",
            "fsid": "00000000-0000-0000-0000-000000000000",
            "internal_safe_to_start_threads": "false",
            "keyring": "/etc/ceph/$cluster.$name.keyring,/etc/ceph/$cluster.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,",
            "leveldb_log": "/dev/null",
            "log_file": "",
            "log_max_recent": "500",
            "log_to_stderr": "true",
            "log_to_syslog": "false",
            "mds_data": "/var/lib/ceph/mds/$cluster-$id",
            "mds_standby_replay": "false",
            "mgr_data": "/var/lib/ceph/mgr/$cluster-$id",
            "mon_allow_pool_delete": "false",
            "mon_cluster_log_file": "default=/var/log/ceph/$cluster.$channel.log cluster=/var/log/ceph/$cluster.log",
            "mon_data": "/var/lib/ceph/mon/$cluster-$id",
            "mon_debug_dump_location": "/var/log/ceph/$cluster-$name.tdump",
            "mon_host": "",
            "mon_initial_members": "",
            "mon_pg_warn_max_per_osd": "300",
            "mon_pg_warn_min_per_osd": "30",
            "osd_data": "/var/lib/ceph/osd/$cluster-$id",
            "osd_journal": "/var/lib/ceph/osd/$cluster-$id/journal",
            "osd_journal_size": "5120",
            "osd_max_backfills": "1",
            "osd_objectstore": "filestore",
            "osd_recovery_max_active": "3",
            "osd_recovery_op_priority": "3",
            "osd_scrub_end_hour": "24",
            "public_addr": "-",
            "public_network": "",
            "rgw_data": "/var/lib/ceph/radosgw/$cluster-$id",
            "setgroup": "",
            "setuser": "" 
        }
    },
    "unknown": []
}
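The diff above was taken over the OSD admin socket; individual options can be read back the same way if that is useful. A quick example using options that appear in the diff:

ceph daemon osd.1 config get osd_objectstore
ceph daemon osd.1 config get bluestore_allocator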

I have attached the ceph-osd logs for both OSDs. I have gone through the repair-and-reboot cycle 3 times now.

ceph-osd.22.log-25Sep17.gz (291 KB) Eric Eastman, 09/25/2017 09:07 PM

ceph-osd.1.log-25Sep17.gz (81.1 KB) Eric Eastman, 09/25/2017 09:07 PM

History

#1 Updated by Greg Farnum over 6 years ago

  • Project changed from RADOS to bluestore
  • Priority changed from Normal to High

Not sure if this is actually a bluestore issue, but Sage or somebody should look at it...

#2 Updated by Sage Weil about 6 years ago

  • Status changed from New to Need More Info

Hi Eric,

Do you still see this problem? I haven't seen anything like it, so I'm hoping this is an artifact of 12.2.0.

#3 Updated by Eric Eastman about 6 years ago

Hi Sage,

No, I have not seen the problem on this test cluster since rebuilding it with 12.2.2, and the system has been in constant use for a couple of months now running 12.2.2. Feel free to close the ticket if you feel it may have been an issue with 12.2.0.

#4 Updated by Sage Weil about 6 years ago

  • Status changed from Need More Info to Can't reproduce
