Project

General

Profile

Actions

Bug #55668

open

osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound

Added by jianwei zhang almost 2 years ago. Updated almost 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

problem:

ec(4+2) The original 3 nodes, open ec folding, the ratio is 2, expand 2 nodes to the cluster
ec(2+1) The original 3 nodes, open ec folding, the ratio is 2, expand 2 nodes to the cluster
all osds status of ceph cluster are exists&up&in
but some PGs have been stuck in recovery_unfound state and cannot be recovered

root cause:

The pg acting set has missing objects that needs to recover.
But the source osd of the missing objects is in the backfill_targets(up) set,
Since the source osd in the backfill_targets(up) set needs to restart backfill, set the pg info.last_backfill to MIN,
Since the pg info.last_backfill of source osd is MIN, when MissingLoc::add_source_info, filter the source osd that has missing object,
These missing objects cannot recover and the pg status changes to recovery_unfound.

fix description:

pgid : pg.id
objname : ceph pg list_unfound
example:
ceph pg 3.20 list_unfound
{
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
        {
            "oid": {
                "oid": "202000000000095.00000005",
                "key": "",
                "snapid": -2,
                "hash": 1807777312,
                "max": 0,
                "pool": 3,
                "namespace": "" 
            },
            "need": "303'1222",
            "have": "0'0",
            "flags": "none",
            "clean_regions": "clean_offsets: [], clean_omap: 0, new_object: 1",
            "locations": [
                "1(5)",
                "46(1)" 
            ]
        }
    ],
    "more": false
}

# ceph pg 3.20 find_unfound_object 202000000000095.00000005
pg has 3:047e03d6:::202000000000095.00000005:head object unfound  missing 303'1222 flags = none clean_offsets: [], clean_omap: 0, new_object: 1 old_location 1(5),46(1) new_location 1(5),32(2),34(0),42(4),46(1) and find all might have unfound source

Actions #2

Updated by jianwei zhang almost 2 years ago

The problem was first described at this link:
https://tracker.ceph.com/issues/54966?next_issue_id=54965

Actions #3

Updated by jianwei zhang almost 2 years ago

reproduce:
1. 5 hosts
2. cephfs (replica pool + ec pool (allow_ec_overwrite))
3. 3 hosts create ec(2+1) data pool  and 3 replicas meta pool for base_cluster
4. running vdbench test file case
5. expand remain 2 hosts to cluster, data pool and meta pool, tips: osd one by one join up&in to cluster
6. all osds is up && in
7. backfill and recovery over

result:
ceph -s
11 active+recovery_unfound+degraded+remapped

analyse:
I found a reason, and I need everyone to discuss it together

Through the ceph log,pg and object information on the disk, 
The 3 ec shard data of the desired eversion of ec(2+1) are storaged on the disks, and are not lost. 
The problem lies in the missing object after the peering is completed. In the process of adding the source osd required for recovery, it will be skipped the source osd that because the named serial number of the object in the pg is greater than the last_backfill(MIN), and it is not added to the source osd set, source osds not enough ec(2) for recover, resulting in recovery_unfound.

problem code:

MissingLoc::add_source_info
{
    for ()
    ...
        if (p->first >= oinfo.last_backfill) {
            // FIXME: this is _probably_ true, although it could conceivably
            // be in the undefined region!  Hmm!
            ldout(cct, 10) << "search_for_missing " << soid << " " << need << " also missing on osd." << fromosd
                           << " (past last_backfill " << oinfo.last_backfill << ")" << dendl;
            continue;
        }
     ...
}

commit b99e135848ca5666308344cf5ecc9c7f95f30137
Author: Sage Weil <sage@inktank.com>
Date:   Mon Dec 5 17:25:21 2011 -0800

    osd: make backfill (basically) work again

    Still need to handle concurrent updates, log recovery vs backfill, etc.

    Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Actions #4

Updated by jianwei zhang almost 2 years ago

evidence :
(Take an recovery_unfound object as an example) : 101000000000133.0000001d

# ceph pg 3.2f2 list_unfound
        {
            "oid": {
                "oid": "101000000000133.0000001d",
                "key": "",
                "snapid": -2,
                "hash": 1021279986,
                "max": 0,
                "pool": 3,
                "namespace": "" 
            },
            "need": "374'872",
            "have": "0'0",
            "flags": "none",
            "clean_regions": "clean_offsets: [], clean_omap: 0, new_object: 1",
            "locations": [
                "36(0)" 
            ]
        }

# /etc/ceph/ceph.conf  debug_osd = 30/30
# filter add source osd for missing object after peering
[root@node124 ceph]# grep -e Primary -e 101000000000133.0000001d ceph-osd.57.log|grep -E "3\.2f2.*transitioning to Primary|search_for_missing.*101000000000133.0000001d" 
2022-03-19T14:04:42.223+0800 7f82f2028700  1 osd.57 pg_epoch: 5030 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5003/5004 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 remapped m=18 mbc={}] state<Start>: transitioning to Primary
2022-03-19T14:04:43.303+0800 7f82f2028700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 remapped m=18 u=18 mbc={}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.57(0)
2022-03-19T14:04:43.303+0800 7f82f2028700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 remapped m=18 u=18 mbc={}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.37(2)
2022-03-19T14:04:43.304+0800 7f82f2028700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 remapped m=18 u=18 mbc={}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.41(1)                                                                                                        
2022-03-19T14:04:43.305+0800 7f82f2028700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 remapped m=18 u=18 mbc={}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.41(2) (past last_backfill MIN)                                                                                ///3.2f2s2.101000000000133.0000001d.osd41.object_info_t(on disk  object_info)            //However, because the serial number of the object is greater than the last_backfill of osd.41(2), it was skipped and not added to the source osd

2022-03-19T14:04:43.306+0800 7f82f2028700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 remapped m=18 u=18 mbc={}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.59(1) (past last_backfill MIN)                                                                                ///3.2f2s1.101000000000133.0000001d.osd59.object_info_t(on disk object_info)            ///However, because the serial number of the object is greater than the last_backfill of osd.59(1), it was skipped and not added to the source osd

2022-03-19T14:04:43.307+0800 7f82f2028700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 remapped m=18 u=18 mbc={}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.76(0) (past last_backfill MIN)
2022-03-19T14:04:43.344+0800 7f82f2028700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 activating+degraded+remapped m=18 u=18 mbc={}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.44(1) (last_update 346'424 < needed 374'872)
2022-03-19T14:04:43.348+0800 7f82f2028700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 activating+degraded+remapped m=18 u=18 mbc={0={(0+1)=17},1={(0+0)=17},2={(0+0)=17}}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 is on osd.36(0)                                                                             ///3.2f2s0.101000000000133.0000001d.osd36.object_info_t(on disk object_info)             //Only osd.36(0) meets all the conditions and is added to the source osd

2022-03-19T14:04:43.362+0800 7f82f2028700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 activating+degraded+remapped m=18 u=18 mbc={0={(0+1)=18},1={(0+0)=18},2={(0+0)=18}}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.57(1) (last_update 362'664 < needed 374'872)
2022-03-19T14:04:43.365+0800 7f82ef823700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 activating+degraded+remapped m=18 u=18 mbc={0={(0+1)=18},1={(0+0)=18},2={(0+0)=18}}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.75(0) (past last_backfill MIN)
2022-03-19T14:04:43.378+0800 7f82f2028700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 activating+degraded+remapped m=18 u=18 mbc={0={(0+1)=18},1={(0+0)=18},2={(0+0)=18}}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.39(2)
2022-03-19T14:04:43.384+0800 7f82f482d700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 activating+degraded+remapped m=18 u=18 mbc={0={(0+1)=18},1={(0+0)=18},2={(0+0)=18}}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.48(1) (last_update 351'468 < needed 374'872)
2022-03-19T14:04:43.387+0800 7f82f2028700 10 osd.57 pg_epoch: 5031 pg[3.2f2s0( v 785'2826 lc 0'0 (503'2242,785'2826] local-lis/les=5030/5031 n=230 ec=198/198 lis/c=4985/282 les/c/f=4986/283/0 sis=5030) [76,59,41]/[57,41,37]p57(0) backfill=[41(2),59(1),76(0)] r=0 lpr=5030 pi=[282,5030)/9 crt=785'2826 mlcod 0'0 activating+degraded+remapped m=18 u=18 mbc={0={(0+1)=18},1={(0+0)=18},2={(0+0)=18}}] search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.73(0) (past last_backfill MIN)

ceph-object-tool: (object_info_t and check bluestore)

 3.2f2]# for i in $(ls -tlhr|grep 101000000000133.0000001d|grep -v -e '0 Mar 19'|grep -e osd|awk '{print $NF}'); do echo $i; ceph-dencoder import $i type object_info_t decode dump_json;done
3.2f2s0.101000000000133.0000001d.osd36.object_info_t
{
    "oid": {
        "oid": "101000000000133.0000001d",
        "key": "",
        "snapid": -2,
        "hash": 1021279986,
        "max": 0,
        "pool": 3,
        "namespace": "" 
    },
    "version": "374'872",
    "prior_version": "374'871",
    "last_reqid": "client.133562.0:359556",
    "user_version": 872,
    "size": 4194304,
    "mtime": "2022-03-18T11:41:39.433611+0800",
    "local_mtime": "2022-03-18T11:41:39.436761+0800",
    "lost": 0,
    "flags": [
        "dirty" 
    ],
    "truncate_seq": 0,
    "truncate_size": 0,
    "data_digest": "0xffffffff",
    "omap_digest": "0xffffffff",
    "expected_object_size": 0,
    "expected_write_size": 0,
    "alloc_hint_flags": 0,
    "manifest": {
        "type": 0
    },
    "watchers": {}
}

3.2f2s2.101000000000133.0000001d.osd41.object_info_t
{
    "oid": {
        "oid": "101000000000133.0000001d",
        "key": "",
        "snapid": -2,
        "hash": 1021279986,
        "max": 0,
        "pool": 3,
        "namespace": "" 
    },
    "version": "374'872",
    "prior_version": "374'871",
    "last_reqid": "client.133562.0:359556",
    "user_version": 872,
    "size": 4194304,
    "mtime": "2022-03-18T11:41:39.433611+0800",
    "local_mtime": "2022-03-18T11:41:39.436761+0800",
    "lost": 0,
    "flags": [
        "dirty" 
    ],
    "truncate_seq": 0,
    "truncate_size": 0,
    "data_digest": "0xffffffff",
    "omap_digest": "0xffffffff",
    "expected_object_size": 0,
    "expected_write_size": 0,
    "alloc_hint_flags": 0,
    "manifest": {
        "type": 0
    },
    "watchers": {}
}

3.2f2s1.101000000000133.0000001d.osd59.object_info_t
{
    "oid": {
        "oid": "101000000000133.0000001d",
        "key": "",
        "snapid": -2,
        "hash": 1021279986,
        "max": 0,
        "pool": 3,
        "namespace": "" 
    },
    "version": "374'872",
    "prior_version": "374'871",
    "last_reqid": "client.133562.0:359556",
    "user_version": 872,
    "size": 4194304,
    "mtime": "2022-03-18T11:41:39.433611+0800",
    "local_mtime": "2022-03-18T11:41:39.436761+0800",
    "lost": 0,
    "flags": [
        "dirty" 
    ],
    "truncate_seq": 0,
    "truncate_size": 0,
    "data_digest": "0xffffffff",
    "omap_digest": "0xffffffff",
    "expected_object_size": 0,
    "expected_write_size": 0,
    "alloc_hint_flags": 0,
    "manifest": {
        "type": 0
    },
    "watchers": {}
}

node121 ceph-objectstore-tool --op fuse --data-path /var/lib/ceph/osd/ceph-36/ --mountpoint /mnt/fuse-osd36 &
node122 ceph-objectstore-tool --op fuse --data-path /var/lib/ceph/osd/ceph-41/ --mountpoint /mnt/fuse-osd41    &
node124 ceph-objectstore-tool --op fuse --data-path /var/lib/ceph/osd/ceph-59/ --mountpoint /mnt/fuse-osd59    &

[root@node121 0#3:4f7efb3c:::101000000000133.0000001d:head#]# md5sum /mnt/fuse-osd36/3.2f2s0_head/all/0#3:4f7efb3c:::101000000000133.0000001d:head#/data
ea003c1e3b13d63d6ed67d0595e8ff1b  /mnt/fuse-osd36/3.2f2s0_head/all/0#3:4f7efb3c:::101000000000133.0000001d:head#/data

[root@node122 2#3:4f7efb3c:::101000000000133.0000001d:head#]# md5sum /mnt/fuse-osd41/3.2f2s2_head/all/2#3:4f7efb3c:::101000000000133.0000001d:head#/data
52ed9a463dd46982113bf748e7f0c4fe  /mnt/fuse-osd41/3.2f2s2_head/all/2#3:4f7efb3c:::101000000000133.0000001d:head#/data

[root@node124 1#3:4f7efb3c:::101000000000133.0000001d:head#]# md5sum /mnt/fuse-osd59/3.2f2s1_head/all/1#3:4f7efb3c:::101000000000133.0000001d:head#/data
f339660aa3a182d89be8110d67024a04  /mnt/fuse-osd59/3.2f2s1_head/all/1#3:4f7efb3c:::101000000000133.0000001d:head#/data

ceph-object-tool: dump pg info and pglog, then parse

[root@node121 3.2f2]# python3 parse_data.py 101000000000133.0000001d *.log
['3.2f2s0.osd36.log', '3.2f2s0.osd57.log', '3.2f2s0.osd73.log', '3.2f2s0.osd75.log', '3.2f2s0.osd76.log', '3.2f2s1.osd41.log', '3.2f2s1.osd44.log', '3.2f2s1.osd48.log', '3.2f2s1.osd57.log', '3.2f2s1.osd59.log', '3.2f2s2.osd37.log', '3.2f2s2.osd39.log', '3.2f2s2.osd41.log']
========================missing object===========================
3.2f2s0.osd36.log
3.2f2s0.osd57.log
{'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'need': "374'872", 'have': "0'0", 'flags': 'none', 'clean_regions': 'clean_offsets: [], clean_omap: 0, new_object: 1'}
3.2f2s0.osd73.log
3.2f2s0.osd75.log
3.2f2s0.osd76.log
3.2f2s1.osd41.log
{'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'need': "374'872", 'have': "0'0", 'flags': 'none', 'clean_regions': 'clean_offsets: [], clean_omap: 0, new_object: 1'}
3.2f2s1.osd44.log
3.2f2s1.osd48.log
3.2f2s1.osd57.log
3.2f2s1.osd59.log
3.2f2s2.osd37.log
{'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'need': "374'872", 'have': "0'0", 'flags': 'none', 'clean_regions': 'clean_offsets: [], clean_omap: 0, new_object: 1'}
3.2f2s2.osd39.log
{'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'need': "374'872", 'have': "0'0", 'flags': 'none', 'clean_regions': 'clean_offsets: [], clean_omap: 0, new_object: 1'}
3.2f2s2.osd41.log
========================pg log object==========================
3.2f2s0.osd36.log [0'0, 375'884]
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'833", 'prior_version': "0'0", 'reqid': 'client.133721.0:366904', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:03.843215+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': True, 'ops': [{'code': 'CREATE'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[1048576~18446744073708503039]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'834", 'prior_version': "374'833", 'reqid': 'client.133721.0:366972', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:03.967153+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'APPEND', 'old_size': 1048576}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~1048576,2097152~18446744073707454463]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'835", 'prior_version': "374'834", 'reqid': 'client.133721.0:367041', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:04.090498+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'APPEND', 'old_size': 2097152}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~2097152,3145728~18446744073706405887]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'836", 'prior_version': "374'835", 'reqid': 'client.133721.0:367114', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:04.210199+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'APPEND', 'old_size': 3145728}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~3145728,4194304~18446744073705357311]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'853", 'prior_version': "374'836", 'reqid': 'client.114246.0:360904', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:25.208243+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 853, 'snaps': '[0,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[1048576~18446744073708503039]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'854", 'prior_version': "374'853", 'reqid': 'client.114246.0:360926', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:25.256403+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 854, 'snaps': '[524288,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~1048576,2097152~18446744073707454463]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'855", 'prior_version': "374'854", 'reqid': 'client.114246.0:360958', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:25.295928+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 855, 'snaps': '[1048576,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~2097152,3145728~18446744073706405887]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'856", 'prior_version': "374'855", 'reqid': 'client.114246.0:360973', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:25.323967+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 856, 'snaps': '[1572864,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~3145728,4194304~18446744073705357311]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'869", 'prior_version': "374'856", 'reqid': 'client.133562.0:359426', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:39.277007+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 869, 'snaps': '[0,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[1048576~18446744073708503039]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'870", 'prior_version': "374'869", 'reqid': 'client.133562.0:359461', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:39.334504+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 870, 'snaps': '[524288,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~1048576,2097152~18446744073707454463]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'871", 'prior_version': "374'870", 'reqid': 'client.133562.0:359504', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:39.385649+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 871, 'snaps': '[1048576,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~2097152,3145728~18446744073706405887]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'872", 'prior_version': "374'871", 'reqid': 'client.133562.0:359556', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:39.433611+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 872, 'snaps': '[1572864,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~3145728,4194304~18446744073705357311]', 'clean_omap': True, 'new_object': False}}}
3.2f2s0.osd57.log [503'2242, 785'2826]
3.2f2s0.osd73.log [466'1540, 488'2076]
3.2f2s0.osd75.log [466'1540, 491'2080]
3.2f2s0.osd76.log [503'2242, 785'2826]
3.2f2s1.osd41.log [503'2242, 785'2826]
3.2f2s1.osd44.log [0'0, 346'424]
3.2f2s1.osd48.log [0'0, 351'468]
3.2f2s1.osd57.log [0'0, 362'664]
3.2f2s1.osd59.log [503'2242, 785'2826]
3.2f2s2.osd37.log [503'2242, 785'2826]
3.2f2s2.osd39.log [291'200, 386'1120]
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'833", 'prior_version': "0'0", 'reqid': 'client.133721.0:366904', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:03.843215+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': True, 'ops': [{'code': 'CREATE'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[1048576~18446744073708503039]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'834", 'prior_version': "374'833", 'reqid': 'client.133721.0:366972', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:03.967153+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'APPEND', 'old_size': 1048576}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~1048576,2097152~18446744073707454463]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'835", 'prior_version': "374'834", 'reqid': 'client.133721.0:367041', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:04.090498+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'APPEND', 'old_size': 2097152}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~2097152,3145728~18446744073706405887]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'836", 'prior_version': "374'835", 'reqid': 'client.133721.0:367114', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:04.210199+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'APPEND', 'old_size': 3145728}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~3145728,4194304~18446744073705357311]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'853", 'prior_version': "374'836", 'reqid': 'client.114246.0:360904', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:25.208243+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 853, 'snaps': '[0,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[1048576~18446744073708503039]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'854", 'prior_version': "374'853", 'reqid': 'client.114246.0:360926', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:25.256403+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 854, 'snaps': '[524288,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~1048576,2097152~18446744073707454463]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'855", 'prior_version': "374'854", 'reqid': 'client.114246.0:360958', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:25.295928+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 855, 'snaps': '[1048576,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~2097152,3145728~18446744073706405887]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'856", 'prior_version': "374'855", 'reqid': 'client.114246.0:360973', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:25.323967+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 856, 'snaps': '[1572864,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~3145728,4194304~18446744073705357311]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'869", 'prior_version': "374'856", 'reqid': 'client.133562.0:359426', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:39.277007+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 869, 'snaps': '[0,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[1048576~18446744073708503039]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'870", 'prior_version': "374'869", 'reqid': 'client.133562.0:359461', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:39.334504+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 870, 'snaps': '[524288,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~1048576,2097152~18446744073707454463]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'871", 'prior_version': "374'870", 'reqid': 'client.133562.0:359504', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:39.385649+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 871, 'snaps': '[1048576,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~2097152,3145728~18446744073706405887]', 'clean_omap': True, 'new_object': False}}}
{'op': 'modify', 'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'version': "374'872", 'prior_version': "374'871", 'reqid': 'client.133562.0:359556', 'extra_reqids': [], 'mtime': '2022-03-18T11:41:39.433611+0800', 'return_code': 0, 'mod_desc': {'object_mod_desc': {'can_local_rollback': True, 'rollback_info_completed': False, 'ops': [{'code': 'SETATTRS', 'attrs': ['_', 'hinfo_key', 'snapset']}, {'code': 'ROLLBACK_EXTENTS', 'gen': 872, 'snaps': '[1572864,524288]'}]}}, 'clean_regions': {'object_clean_regions': {'clean_offsets': '[0~3145728,4194304~18446744073705357311]', 'clean_omap': True, 'new_object': False}}}
3.2f2s2.osd41.log [503'2242, 785'2826]
Actions #5

Updated by jianwei zhang almost 2 years ago

///Take 3.2f2 as an example

[root@node121 objects-unfound]# grep 'start_peering_interval up' ceph-osd.36.log-3.2f2-node121 |awk -Fstart_peering_interval '{print $2}'
2022-03-18T10:45:13.683    epoch 199 up [36,44,28] acting [36,44,28]
2022-03-18T11:00:41.974 epoch 233 up [36,44,28] -> [38,34,27], acting [36,44,28] -> [38,34,27], acting_primary 36(0) -> 38, up_primary 36(0) -> 38, role 0 -> -1, features acting 4540138292840890367 upacting 4540138292840890367
///Add new osd begins
2022-03-18T11:32:54.836 epoch 348 up [36,44,37] -> [36,41,37], acting [36,44,37] -> [36,41,37], acting_primary 36(0) -> 36, up_primary 36(0) -> 36, role 0 -> 0, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:32:55.833 epoch 349 up [36,41,37] -> [36,48,41], acting [36,41,37] -> [36,-1,37], acting_primary 36(0) -> 36, up_primary 36(0) -> 36, role 0 -> 0, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:32:56.881 epoch 350 up [36,48,41] -> [36,48,41], acting [36,-1,37] -> [36,-1,41], acting_primary 36(0) -> 36, up_primary 36(0) -> 36, role 0 -> 0, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:34:36.477 epoch 354 up [36,48,41] -> [36,57,41], acting [36,-1,41] -> [36,-1,41], acting_primary 36(0) -> 36, up_primary 36(0) -> 36, role 0 -> 0, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:38:09.782 epoch 366 up [36,57,41] -> [36,59,41], acting [36,-1,41] -> [36,-1,41], acting_primary 36(0) -> 36, up_primary 36(0) -> 36, role 0 -> 0, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:41:49.350 epoch 376 up [36,59,41] -> [57,41,39], acting [36,-1,41] -> [36,-1,41], acting_primary 36(0) -> 36, up_primary 36(0) -> 57, role 0 -> 0, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:41:49.896 epoch 377 up [57,41,39] -> [57,41,39], acting [36,-1,41] -> [-1,41,39], acting_primary 36(0) -> 41, up_primary 57(0) -> 57, role 0 -> -1, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:45:33.471 epoch 390 up [57,41,39] -> [57,41,37], acting [-1,41,39] -> [-1,41,39], acting_primary 41(1) -> 41, up_primary 57(0) -> 57, role -1 -> -1, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:45:34.461 epoch 391 up [57,41,37] -> [57,41,37], acting [-1,41,39] -> [57,41,-1], acting_primary 41(1) -> 57, up_primary 57(0) -> 57, role -1 -> -1, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:55:29.770 epoch 483 up [57,41,37] -> [73,59,41], acting [57,41,-1] -> [57,41,-1], acting_primary 57(0) -> 57, up_primary 57(0) -> 73, role -1 -> -1, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:55:30.707 epoch 484 up [73,59,41] -> [73,59,41], acting [57,41,-1] -> [57,41,37], acting_primary 57(0) -> 57, up_primary 73(0) -> 73, role -1 -> -1, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:57:14.897 epoch 490 up [73,59,41] -> [75,59,41], acting [57,41,37] -> [57,41,37], acting_primary 57(0) -> 57, up_primary 73(0) -> 75, role -1 -> -1, features acting 4540138292840890367 upacting 4540138292840890367
2022-03-18T11:57:17.609 epoch 493 up [75,59,41] -> [76,59,41], acting [57,41,37] -> [57,41,37], acting_primary 57(0) -> 57, up_primary 75(0) -> 76, role -1 -> -1, features acting 4540138292840890367 upacting 4540138292840890367
///Add new osd end

Problem sorting, why does recovery_unfound appear?

1. After the expansion is completed, after the last peering is completed, the corresponding up/acting, at this time we know that osd.37 belongs to the acting set
    2022-03-18T11:57:17.609 epoch 493 up [75,59,41] -> [76,59,41], acting [57,41,37] -> [57,41,37], acting_primary 57(0) -> 57, up_primary 75(0) -> 76, role -1 -> -1, features acting 4540138292840890367 upacting 4540138292840890367

2. View missing object on 57/41/37
    acting=[57,41,37]
        3.2f2s0.osd57 {'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'need': "374'872", 'have': "0'0", 'flags': 'none', 'clean_regions': 'clean_offsets: [], clean_omap: 0, new_object: 1'}
        3.2f2s1.osd41 {'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'need': "374'872", 'have': "0'0", 'flags': 'none', 'clean_regions': 'clean_offsets: [], clean_omap: 0, new_object: 1'}
        3.2f2s2.osd37 {'object': '3:4f7efb3c:::101000000000133.0000001d:head', 'need': "374'872", 'have': "0'0", 'flags': 'none', 'clean_regions': 'clean_offsets: [], clean_omap: 0, new_object: 1'}
        This means that acting=[57,41,37] all 3 ec shards are missing 101000000000133.0000001d objects
        Question: Under what conditions did osd57(0)/osd41(1)/osd37(2) add 101000000000133.0000001d to the missing list?
            osd.57(0) has historically intersected with osd.36(0) and osd.41(2)
            osd.41(1) has historically intersected with osd.36(0) and osd.41(2)
            osd.37(2) has historically intersected with osd.36(0)

    up=[76,59,41]
        3.2f2s0.osd76 empty
        3.2f2s1.osd59 empty
        3.2f2s2.osd41 empty

3. For the same PG, ceph must first perform recovery based on missing objects before backfilling

4. Object version exists on disk osd
    search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.41(2) (past last_backfill MIN) ///3.2f2s2.101000000000133.0000001d.osd41.object_info_t(disk exists), unchecked
    search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 also missing on osd.59(1) (past last_backfill MIN) ///3.2f2s1.101000000000133.0000001d.osd59.object_info_t(disk exists), unchecked    
                                                                                                                             /// Should be backfilled from osd.41(2)+osd.36(0) during epoch 366-376 history
    search_for_missing 3:4f7efb3c:::101000000000133.0000001d:head 374'872 is on osd.36(0)                                    ///3.2f2s0.101000000000133.0000001d.osd36.object_info_t(disk exists), selected

5. Up/acting of osd.41(2) and osd.59(1) in history
    2022-03-18T11:34:36.477 epoch 354 up [36,48,41] -> [36,57,41], acting [36,-1,41] -> [36,-1,41], acting_primary 36(0) -> 36, up_primary 36(0) -> 36, role 0 -> 0, features acting 4540138292840890367 upacting 4540138292840890367
    2022-03-18T11:38:09.782 epoch 366 up [36,57,41] -> [36,59,41], acting [36,-1,41] -> [36,-1,41], acting_primary 36(0) -> 36, up_primary 36(0) -> 36, role 0 -> 0, features acting 4540138292840890367 upacting 4540138292840890367
    2022-03-18T11:41:49.350 epoch 376 up [36,59,41] -> [57,41,39], acting [36,-1,41] -> [36,-1,41], acting_primary 36(0) -> 36, up_primary 36(0) -> 57, role 0 -> 0, features acting 4540138292840890367 upacting 4540138292840890367

6. After the last peering is completed, at this time we know that when epoch=493, osd.59(1)/osd.41(2), these two osd, when epoch=366, are also osd.59(1)/ osd.41(2)
    2022-03-18T11:57:17.609 epoch 493 up [75,59,41] -> [76,59,41], acting [57,41,37] -> [57,41,37], acting_primary 57(0) -> 57, up_primary 75(0) -> 76, role -1 -> -1, features acting 4540138292840890367 upacting 4540138292840890367

7. Why is the last_backfill of osd.59(1) and osd.41(2) set to MIN?
    [root@jianwei ceph]# grep 'starting backfill to osd.' ceph.log |grep 3.2f2
    2022-03-18T11:55:31.749062+0800 osd.57 (osd.57) 340 : cluster [DBG] 3.2f2s0 starting backfill to osd.41(2) from (0'0,375'884] MAX to 482'1907
    2022-03-18T11:55:31.758747+0800 osd.57 (osd.57) 341 : cluster [DBG] 3.2f2s0 starting backfill to osd.59(1) from (0'0,375'884] MAX to 482'1907
    2022-03-18T11:57:18.399993+0800 osd.57 (osd.57) 394 : cluster [DBG] 3.2f2s0 starting backfill to osd.76(0) from (0'0,0'0] MAX to 491'2080

    acting=[57,41,37]
    3.2f2s0.osd57.log [503'2242, 785'2826]
    3.2f2s1.osd41.log [503'2242, 785'2826]
    3.2f2s2.osd37.log [503'2242, 785'2826]

    up=[76,59,41]
    3.2f2s0.osd76.log [503'2242, 785'2826]
    3.2f2s1.osd59.log [503'2242, 785'2826]
    3.2f2s2.osd41.log [503'2242, 785'2826]

    Since in the historical period of epoch=366, osd.59(1) and osd.41(2) are already shard 1 and 2, and their log range at that time is
        osd.59(1) (0'0,375'884]
        osd.41(2) (0'0,375'884]
        In fact, this further shows that in the up=[76,59,41] set, only osd.76 is missing the 101000000000133.0000001d object

    find_best_info jianwei.zhang best is osd.37(2)          [503'2242, 785'2826]

    Since the range of the authoritative log is [503'2242, 785'2826], therefore,
        osd.59(1) (0'0,375'884]     Its head (375'884) is much smaller than the tail (503'2242) of the authoritative log, that is, the logs have no intersection and can only be restored through backfill
        osd.41(2) (0'0,375'884]     Its head (375'884) is much smaller than the tail (503'2242) of the authoritative log, that is, the logs have no intersection and can only be restored through backfill

    Finally, after the PeeringState::activate() function is the end of peering, activating activates the processing function from the osd

    After PG Peering is completed, when activating each replica osd, the last_backfill will be reset to MIN for the osd that needs to be backfilled from scratch.

    void PeeringState::activate(ObjectStore::Transaction &t, epoch_t activation_epoch, PeeringCtxWrapper &ctx)
    {
            ...
            else if (pg_log.get_tail() > pi.last_update ||                         ///①The tail of the primary osd log is larger than the head of the replica osd log, indicating that the logs have no intersection, and data recovery can only be performed through backfill.
                    pi.last_backfill == hobject_t() ||                            ///②The last_backfill pointer of the replica osd is MIN, which means that after the last peering is completed, after the replica osd is set to MIN, it has not had time to backfill, died, and then got up again.
                   (backfill_targets.count(*i) && pi.last_backfill.is_max())) {    ///③The replica osd exists in the backfill_targets target list, and the last_backfill pointer of the replica osd is MAX, which means that the replica osd belongs to the up set and needs backfill
                /* ^ This last case covers a situation where a replica is not contiguous
                 * with the auth_log, but is contiguous with this replica.  Reshuffling
                 * the active set to handle this would be tricky, so instead we just go
                 * ahead and backfill it anyway.  This is probably preferrable in any
                 * case since the replica in question would have to be significantly
                 * behind.
                 */
                // backfill
                pl->get_clog_debug() << info.pgid << " starting backfill to osd." << peer << " from (" << pi.log_tail << "," << pi.last_update << "] " << pi.last_backfill << " to " << info.last_update;

                pi.last_update = info.last_update;
                pi.last_complete = info.last_update;
                pi.set_last_backfill(hobject_t());                                 ///reset last_backfill to MIN
                pi.last_epoch_started = info.last_epoch_started;
                pi.last_interval_started = info.last_interval_started;
                pi.history = info.history;
                pi.hit_set = info.hit_set;
                // Save num_bytes for reservation request, can't be negative
                peer_bytes[peer] = std::max<int64_t>(0, pi.stats.stats.sum.num_bytes);
                pi.stats.stats.clear();
                pi.stats.stats.sum.num_bytes = peer_bytes[peer];

                // initialize peer with our purged_snaps.
                pi.purged_snaps = info.purged_snaps;

                m = new MOSDPGLog(i->shard, pg_whoami.shard, get_osdmap_epoch(), pi,
                                  last_peering_reset /* epoch to create pg at */);

                // send some recent log, so that op dup detection works well.
                m->log.copy_up_to(cct, pg_log.get_log(), cct->_conf->osd_max_pg_log_entries);
                m->info.log_tail = m->log.tail;
                pi.log_tail = m->log.tail;  // sigh...

                pm.clear();
            }
            ...
    }

Problem logic:
1. The acting set has a missing object that needs to recover.
2. But the source osds of these missing objects is the osd in the up set to be backfill,
3. Since the osd in the up set needs to be backfilled, the last_backfill of the osd of the backfill should be set to MIN,
4. Since the last_backfill of source osd is MIN, the source osd with missing object is filtered out during MissingLoc::add_source_info.

The reason for the high recurrence frequency is our expansion process, which starts from 11:32 to 11:57. The expansion of two nodes is completed, and 13 up/acting are generated.
Actions #6

Updated by jianwei zhang almost 2 years ago

There is a modification idea: refer to ceph pg 2.5 mark_unfound_lost revert|delete the command line

For ec, when it is found in this scenario, it also encapsulates a ceph command line to add source osd for recovery_unfound missing object

Specific method: Use the osd in the might_have_unfound+acting_recovery_backfill list, scan the disk once, find the consistent version prior_version last_reqid mtime, etc., identify it as the same op, then add it to the source osd, and trigger data recovery.

            "acting_recovery_backfill": []
            "might_have_unfound": [
                {
                    "osd": "36(0)",
                    "status": "already probed" 
                },
                {
                    "osd": "37(2)",
                    "status": "already probed" 
                },
                {
                    "osd": "39(2)",
                    "status": "already probed" 
                },
                {
                    "osd": "41(1)",
                    "status": "already probed" 
                },
                {
                    "osd": "41(2)",
                    "status": "already probed" 
                },
                {
                    "osd": "44(1)",
                    "status": "already probed" 
                },
                {
                    "osd": "48(1)",
                    "status": "already probed" 
                },
                {
                    "osd": "57(1)",
                    "status": "already probed" 
                },
                {
                    "osd": "59(1)",
                    "status": "already probed" 
                },
                {
                    "osd": "73(0)",
                    "status": "already probed" 
                },
                {
                    "osd": "75(0)",
                    "status": "already probed" 
                },
                {
                    "osd": "76(0)",
                    "status": "already probed" 
                }
            ],

Actions #7

Updated by jianwei zhang almost 2 years ago

Since I was developing on ceph15.2.13, I did not adapt to the master.
If there is no problem with the review, consider porting to the master branch.

Actions

Also available in: Atom PDF