Bug #22804
Closed: multisite Synchronization failed when read and write delete at the same time
Description
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="s3-sample" description="sample benchmark for s3">
  <storage type="s3" config="accesskey=test1;secretkey=test1;endpoint=http://s3.z1.tt;path_style_access=true" />
  <workflow>
    <workstage name="init">
      <work type="init" workers="1" config="cprefix=mix01;containers=r(1,2)" />
    </workstage>
    <workstage name="prepare">
      <work type="prepare" workers="1" config="cprefix=mix01;containers=r(1,2);objects=r(1,10);sizes=c(100)KB" />
    </workstage>
    <workstage name="main">
      <work name="main" workers="100" runtime="30">
        <operation type="read" ratio="30" config="cprefix=mix01;containers=u(1,2);objects=u(1,10)" />
        <operation type="write" ratio="40" config="cprefix=mix01;containers=u(1,2);objects=u(11,100);sizes=c(100)KB" />
        <operation type="delete" ratio="30" config="cprefix=mix01;containers=u(1,2);objects=u(11,100)" />
      </work>
    </workstage>
  </workflow>
</workload>
After the test, the objects on the two zones differ. The log shows that these objects were `PUT` and `DELETE`d at the same time.
But when testing a mix of only PUT and GET, sync works fine.
Updated by John Spray about 6 years ago
- Project changed from Ceph to rgw
- Description updated (diff)
Updated by Amine Liu about 6 years ago
#!/bin/bash
# Reproducer: for each of 100 keys, issue concurrent PUTs and a concurrent
# DELETE so the operations race on the bucket index.
usercfg="Ymliu.cfg"
bucket="s3://testmix"
file="s-mix011.txt"
for i in {1..100}; do
    {
        s3cmd -c "$usercfg" put "$file" "$bucket/$i" &
        sleep 5
        #s3cmd -c "$usercfg" rm "$bucket/$i"
        s3cmd -c "$usercfg" put "$file" "$bucket/$i" &
        s3cmd -c "$usercfg" rm "$bucket/$i" &
    } &
done
wait
This script also reproduces the problem. On the master, `s3cmd ls` shows 99 objects while bucket stats reports `"num_objects": 96`; on the slave, `s3cmd ls` shows 89 objects and bucket stats reports `"num_objects": 89`.
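A quick way to pin down exactly which keys diverged is to diff sorted listings from the two zones. This is an illustrative sketch, not from the ticket: the listings below are hard-coded stand-ins for `s3cmd ls` output, and the file paths are arbitrary.

```shell
# Stand-ins for object listings pulled from each zone; in practice:
#   s3cmd -c <zone>.cfg ls s3://testmix | awk '{print $NF}' | sort
printf '%s\n' 1 2 3 47 | sort > /tmp/master.list   # master kept key "47"
printf '%s\n' 1 2 3    | sort > /tmp/slave.list

# Keys present on the master but missing on the slave:
comm -23 /tmp/master.list /tmp/slave.list
```

Swapping `-23` for `-13` would show keys present only on the slave instead.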
Updated by Amine Liu about 6 years ago
[root@sx-3f3r-ceph-s3-c1-03 my-cluster]# radosgw-admin bucket stats --bucket=testmix
{
"bucket": "testmix",
"zonegroup": "e1d5f39f-70f6-443e-98e8-3dc0b3b312f8",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "d49d824f-76c0-4d15-9219-ca7acf5c31b3.1514569.3",
"marker": "d49d824f-76c0-4d15-9219-ca7acf5c31b3.1514569.3",
"index_type": "Normal",
"owner": "Ymliu",
"ver": "0#163,1#74,2#265,3#246,4#161,5#164,6#179,7#244,8#166,9#220,10#179,11#180,12#134,13#180,14#194,15#103,16#78,17#154,18#170,19#214",
"master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0",
"mtime": "2018-02-08 15:11:52.323452",
"max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#",
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 3
},
"rgw.main": {
"size": 4785792,
"size_actual": 5111808,
"size_utilized": 4785792,
"size_kb": 4674,
"size_kb_actual": 4992,
"size_kb_utilized": 4674,
"num_objects": 96
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}
[root@sx-3f3r-ceph-s3-c1-03 my-cluster]# s3cmd -c Ymliu.cfg ls s3://testmix|wc -l
99
[root@sx-3f3r-ceph-s3-c1-03 my-cluster]#
Updated by Amine Liu about 6 years ago
[root@sx-3f3r-ceph-s3-c1-03 my-cluster]# radosgw-admin bucket check --fix --check-objects --bucket=testmix
[]
{
"object": "1",
"object": "10",
"object": "100",
"object": "11",
"object": "12",
"object": "13",
"object": "14",
"object": "15",
"object": "16",
"object": "17",
"object": "18",
"object": "19",
"object": "2",
"object": "20",
"object": "21",
"object": "22",
"object": "23",
"object": "24",
"object": "25",
"object": "26",
"object": "27",
"object": "28",
"object": "29",
"object": "3",
"object": "30",
"object": "31",
"object": "32",
"object": "33",
"object": "34",
"object": "35",
"object": "36",
"object": "37",
"object": "38",
"object": "39",
"object": "4",
"object": "40",
"object": "41",
"object": "42",
"object": "43",
"object": "44",
"object": "45",
"object": "46",
"object": "48",
"object": "49",
"object": "5",
"object": "50",
"object": "51",
"object": "52",
"object": "53",
"object": "54",
"object": "55",
"object": "56",
"object": "57",
"object": "58",
"object": "59",
"object": "6",
"object": "60",
"object": "61",
"object": "62",
"object": "63",
"object": "64",
"object": "65",
"object": "66",
"object": "67",
"object": "68",
"object": "69",
"object": "7",
"object": "70",
"object": "71",
"object": "72",
"object": "73",
"object": "74",
"object": "75",
"object": "76",
"object": "77",
"object": "78",
"object": "79",
"object": "8",
"object": "80",
"object": "81",
"object": "82",
"object": "83",
"object": "84",
"object": "85",
"object": "86",
"object": "87",
"object": "88",
"object": "89",
"object": "9",
"object": "90",
"object": "91",
"object": "92",
"object": "93",
"object": "94",
"object": "95",
"object": "96",
"object": "97",
"object": "98",
"object": "99"
}
{
"existing_header": {
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 3
},
"rgw.main": {
"size": 4785792,
"size_actual": 5111808,
"size_utilized": 4785792,
"size_kb": 4674,
"size_kb_actual": 4992,
"size_kb_utilized": 4674,
"num_objects": 96
}
}
},
"calculated_header": {
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 3
},
"rgw.main": {
"size": 4785792,
"size_actual": 5111808,
"size_utilized": 4785792,
"size_kb": 4674,
"size_kb_actual": 4992,
"size_kb_utilized": 4674,
"num_objects": 96
}
}
}
}
Updated by Amine Liu about 6 years ago
- File slave-bilog.txt slave-bilog.txt added
- File master-bilog.txt master-bilog.txt added
Another test; bilog dumps are attached.
The master's bilog records the merged ops (write, delete), but the slave misses some of them.
Updated by Casey Bodley about 6 years ago
A couple of PRs that are potentially related: https://github.com/ceph/ceph/pull/20396 https://github.com/ceph/ceph/pull/19895
Updated by Amine Liu about 6 years ago
Casey Bodley wrote:
A couple of PRs that are potentially related: https://github.com/ceph/ceph/pull/20396 https://github.com/ceph/ceph/pull/19895
Yes, "rgw: fix index cancel op miss update header #20396" explains why the index count does not match `ls`;
but "rgw: do not add cancel op in squash_map #19895" does not seem to cover the op-merge issue, such as the missed `DELETE` on the slave zone.
Updated by Yehuda Sadeh about 6 years ago
- Subject changed from multipsite Synchronization failed when read and write delete at the same time to multisite Synchronization failed when read and write delete at the same time
- Assignee set to Casey Bodley
- Priority changed from Normal to High
Updated by Pengju Niu about 6 years ago
It's because the master tries to delete the first write of objA. That delete is canceled, but the bilog records the del op behind the second write, so the master RGW still has objA while the slave RGW can't find it. PR: https://github.com/ceph/ceph/pull/20814.
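The race described above can be sketched with a toy bilog replay (an illustrative simulation, not Ceph code): the slave applies log entries strictly in order, so a DELETE that was actually canceled on the master's index, yet logged behind the second write, still removes the object on the slave.

```shell
# Toy simulation of slave-side bilog replay (illustrative, not Ceph code).
# The slave applies log entries strictly in order; the last op wins.
slave_has_objA=no

apply_entry() {
    case $1 in
        write)  slave_has_objA=yes ;;
        delete) slave_has_objA=no ;;
    esac
}

# What the master's bilog recorded for objA: the DELETE raced with the
# second write and was canceled on the master's index (so the master
# keeps objA), but it was still logged *behind* the second write.
apply_entry write    # first write
apply_entry write    # second write (won the race on the index)
apply_entry delete   # canceled on the master, yet logged last

echo "master has objA: yes"
echo "slave has objA: $slave_has_objA"
```

Replaying the same log on the master's index has no effect there because the cancel already suppressed the delete, which is exactly why the two zones diverge.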
Updated by Amine Liu about 6 years ago
Pengju Niu wrote:
It's because the master tries to delete the first write of objA. That delete is canceled, but the bilog records the del op behind the second write, so the master RGW still has objA while the slave RGW can't find it. PR: https://github.com/ceph/ceph/pull/20814.
Yes, your pull solved my problem, thank you!
Updated by Amine Liu about 6 years ago
Yehuda Sadeh wrote:
I commented on the PR.
Sorry: two days after applying this pull request, the bug reappeared.
Updated by Casey Bodley about 6 years ago
- Status changed from New to Fix Under Review
- Tags changed from multipsite sync failed to multisite sync failed
- Backport set to jewel luminous
Updated by Amine Liu about 6 years ago
Updated by Casey Bodley about 6 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Abhishek Lekshmanan about 6 years ago
master pr : https://github.com/ceph/ceph/pull/20814
Updated by Abhishek Lekshmanan about 6 years ago
- Copied to Backport #23690: luminous: multisite Synchronization failed when read and write delete at the same time added
Updated by Nathan Cutler about 6 years ago
- Copied to Backport #23692: jewel: multisite Synchronization failed when read and write delete at the same time added
Updated by Amine Liu about 6 years ago
With cosbench running read, write, and delete at the same time, the master's object count and index were still larger than the slave's.
Updated by Nathan Cutler over 5 years ago
- Status changed from Pending Backport to Resolved