Bug #22556

failed to take a lock on datalog.sync-status.f608ce2b-5584-45af-b0c5-f4896995bd22

Added by Amine Liu about 6 years ago. Updated almost 6 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rgw
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-01-03 15:53:54.081272 7ffae4959700 0 -- 172.18.216.129:0/3441061439 submit_message mon_subscribe({osdmap=202}) v2 remote, 172.18.216.113:6789/0, failed lossy con, dropping message 0x7ffacc016320
2018-01-03 15:53:54.081304 7ffacbfff700 0 -- 172.18.216.129:0/1830369741 submit_message mon_subscribe({osdmap=202}) v2 remote, 172.18.216.113:6789/0, failed lossy con, dropping message 0x7ffabc0130f0
2018-01-03 15:53:54.084664 7ffacbfff700 0 monclient: hunting for new mon
2018-01-03 15:53:54.098603 7ffae4959700 0 monclient: hunting for new mon
2018-01-03 15:53:54.157355 7ff7b2e79700 1 rgw meta sync: epoch=0 in sync status comes before remote's oldest mdlog epoch=1, restarting sync
2018-01-03 15:53:54.167100 7ff7b1c75700 0 ERROR: failed to take a lock on datalog.sync-status.f608ce2b-5584-45af-b0c5-f4896995bd22
2018-01-03 15:53:54.167116 7ff7b1c75700 0 ERROR: failed to init sync, retcode=-16
2018-01-03 15:53:55.257017 7ff7b1c75700 0 ERROR: failed to take a lock on datalog.sync-status.f608ce2b-5584-45af-b0c5-f4896995bd22
2018-01-03 15:53:55.257032 7ff7b1c75700 0 ERROR: failed to init sync, retcode=-16
2018-01-03 15:53:57.336909 7ff7b1c75700 0 ERROR: failed to take a lock on datalog.sync-status.f608ce2b-5584-45af-b0c5-f4896995bd22
2018-01-03 15:53:57.336925 7ff7b1c75700 0 ERROR: failed to init sync, retcode=-16
2018-01-03 15:54:01.404930 7ff7b1c75700 0 ERROR: failed to take a lock on datalog.sync-status.f608ce2b-5584-45af-b0c5-f4896995bd22
2018-01-03 15:54:01.404946 7ff7b1c75700 0 ERROR: failed to init sync, retcode=-16
2018-01-03 15:54:02.123630 7ff7aba67700 1 ====== starting new request req=0x7ff7aba61710 =====
2018-01-03 15:54:03.285785 7ff7ab266700 1 ====== starting new request req=0x7ff7ab260710 =====
2018-01-03 15:54:03.786137 7ff7ab266700 1 ====== req done req=0x7ff7ab260710 op status=0 http_status=403 ======
2018-01-03 15:54:03.786178 7ff7aba67700 1 ====== req done req=0x7ff7aba61710 op status=0 http_status=403 ======
2018-01-03 15:54:03.786195 7ff7ab266700 1 civetweb: 0x7ffa3000b370: 172.18.52.241 - - [03/Jan/2018:15:54:03 +0800] "GET /admin/log HTTP/1.1" 403 0 - -
2018-01-03 15:54:03.786272 7ff7aba67700 1 civetweb: 0x7ffaac01de80: 172.18.52.241 - - [03/Jan/2018:15:54:02 +0800] "GET /admin/log HTTP/1.1" 403 0 - -

2018-01-03 15:54:09.456163 7ff7b1c75700 0 ERROR: failed to take a lock on datalog.sync-status.f608ce2b-5584-45af-b0c5-f4896995bd22
2018-01-03 15:54:09.456177 7ff7b1c75700 0 ERROR: failed to init sync, retcode=-16
2018-01-03 15:54:22.011654 7ff7a725e700 1 ====== starting new request req=0x7ff7a7258710 =====
2018-01-03 15:54:22.015628 7ff7a725e700 1 ====== req done req=0x7ff7a7258710 op status=0 http_status=403 ======
2018-01-03 15:54:22.015697 7ff7a725e700 1 civetweb: 0x7ffaac0223c0: 172.18.52.242 - - [03/Jan/2018:15:54:22 +0800] "GET /admin/log HTTP/1.1" 403 0 - -
2018-01-03 15:54:25.508925 7ff7b1c75700 0 ERROR: failed to take a lock on datalog.sync-status.f608ce2b-5584-45af-b0c5-f4896995bd22
2018-01-03 15:54:25.508941 7ff7b1c75700 0 ERROR: failed to init sync, retcode=-16

2018-01-03 15:54:41.801518 7ff7a2a55700 1 ====== starting new request req=0x7ff7a2a4f710 =====
2018-01-03 15:54:41.805426 7ff7a2a55700 1 ====== req done req=0x7ff7a2a4f710 op status=0 http_status=403 ======
2018-01-03 15:54:41.806707 7ff7a2254700 1 ====== starting new request req=0x7ff7a224e710 =====
2018-01-03 15:54:41.810124 7ff7a2254700 1 ====== req done req=0x7ff7a224e710 op status=0 http_status=403 ======
2018-01-03 15:54:41.834157 7ff7a2a55700 1 civetweb: 0x7ffaa40219c0: 172.18.52.242 - - [03/Jan/2018:15:54:41 +0800] "POST /admin/realm/period HTTP/1.1" 403 0 - -
2018-01-03 15:54:41.838756 7ff7a2254700 1 civetweb: 0x7ffaac0358e0: 172.18.52.241 - - [03/Jan/2018:15:54:41 +0800] "POST /admin/realm/period HTTP/1.1" 403 0 - -
2018-01-03 15:54:55.562206 7ff7b1c75700 0 ERROR: failed to take a lock on datalog.sync-status.f608ce2b-5584-45af-b0c5-f4896995bd22
2018-01-03 15:54:55.562224 7ff7b1c75700 0 ERROR: failed to init sync, retcode=-16
2018-01-03 15:55:02.011254 7ff79da4b700 1 ====== starting new request req=0x7ff79da45710 =====
2018-01-03 15:55:02.015640 7ff79da4b700 1 ====== req done req=0x7ff79da45710 op status=0 http_status=403 ======
2018-01-03 15:55:02.015709 7ff79da4b700 1 civetweb: 0x7ffaac048d90: 172.18.52.242 - - [03/Jan/2018:15:55:02 +0800] "GET /admin/log HTTP/1.1" 403 0 - -

2018-01-03 15:55:11.864108 7ff79aa45700 1 ====== starting new request req=0x7ff79aa3f710 =====
2018-01-03 15:55:11.868045 7ff79aa45700 1 ====== req done req=0x7ff79aa3f710 op status=0 http_status=403 ======
2018-01-03 15:55:11.868798 7ff79a244700 1 ====== starting new request req=0x7ff79a23e710 =====
2018-01-03 15:55:11.872069 7ff79a244700 1 ====== req done req=0x7ff79a23e710 op status=0 http_status=403 ======
2018-01-03 15:55:11.897109 7ff79aa45700 1 civetweb: 0x7ffaa404dfd0: 172.18.52.241 - - [03/Jan/2018:15:55:11 +0800] "POST /admin/realm/period HTTP/1.1" 403 0 - -
2018-01-03 15:55:11.900796 7ff79a244700 1 civetweb: 0x7ffaac053830: 172.18.52.241 - - [03/Jan/2018:15:55:11 +0800] "POST /admin/realm/period HTTP/1.1" 403 0 - -
2018-01-03 15:55:25.606730 7ff7b1c75700 0 ERROR: failed to take a lock on datalog.sync-status.f608ce2b-5584-45af-b0c5-f4896995bd22
2018-01-03 15:55:25.606744 7ff7b1c75700 0 ERROR: failed to init sync, retcode=-16
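
For reference, retcode=-16 is -EBUSY ("Device or resource busy"): the lock on the datalog.sync-status object is already held, typically by another RGW instance, as the comments below discuss. A minimal Python sketch (illustrative only, not part of RGW) mapping the return code to its errno name:

import errno
import os

retcode = -16  # as reported by "failed to init sync, retcode=-16"
print(errno.errorcode[-retcode])  # 'EBUSY'
print(os.strerror(-retcode))      # 'Device or resource busy'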

gz-sh-s3-install.txt (23.8 KB) Amine Liu, 01/03/2018 08:07 AM

History

#1 Updated by Josh Durgin about 6 years ago

  • Project changed from RADOS to rgw

#2 Updated by Amine Liu about 6 years ago

If rgw_override_bucket_index_max_shards is set (e.g. rgw_override_bucket_index_max_shards = 3, or any other integer), the problem reappears.

I had to purge and purgedata all nodes, reinstall the cluster, and leave rgw_override_bucket_index_max_shards unset in ceph.conf; only then would multisite sync work.

#3 Updated by Amine Liu about 6 years ago

Tave liu wrote:

If rgw_override_bucket_index_max_shards is set (e.g. rgw_override_bucket_index_max_shards = 3, or any other integer), the problem reappears.

I had to purge and purgedata all nodes, reinstall the cluster, and leave rgw_override_bucket_index_max_shards unset in ceph.conf; only then would multisite sync work.

I was mistaken about the index shards; the issue is really with multiple RGWs. With a single RGW, sync works fine, but it fails with multiple RGWs.

#4 Updated by Amine Liu about 6 years ago

Sync fails when multiple RGWs are initialized at the same time.

#5 Updated by Amine Liu about 6 years ago

Tave liu wrote:

Sync fails when multiple RGWs are initialized at the same time.

If rgw_cache_enabled = false is set, the datalog lock failure can be avoided when multiple RGWs are initialized at the same time.
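
The contention itself can be reproduced outside RGW: an exclusive RADOS (cls_lock) lock can only be held by one client/cookie pair, so a second locker racing for the same object gets EBUSY, which appears to be what each additional RGW hits on datalog.sync-status during simultaneous startup. Below is a hedged sketch using the librados Python bindings; the pool name, object name, lock name and cookies are made up for illustration, and it assumes a reachable cluster with the usual /etc/ceph/ceph.conf and client keyring:

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')  # hypothetical test pool

obj = 'lock-demo-object'  # stand-in for datalog.sync-status.<instance id>
# first "RGW" takes the exclusive lock
ioctx.lock_exclusive(obj, 'sync_lock', 'cookie-rgw-a', desc='first holder', duration=30)
try:
    # a second locker (different cookie) racing for the same exclusive lock
    ioctx.lock_exclusive(obj, 'sync_lock', 'cookie-rgw-b', desc='second holder', duration=30)
except rados.Error as e:
    # expected to fail with EBUSY, mirroring the ERROR lines in the log above
    print('second lock attempt failed:', e)
finally:
    ioctx.unlock(obj, 'sync_lock', 'cookie-rgw-a')
    ioctx.close()
    cluster.shutdown()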

#6 Updated by Amine Liu about 6 years ago

Tave liu wrote:

Tave liu wrote:

Sync fails when multiple RGWs are initialized at the same time.

If rgw_cache_enabled = false is set, the datalog lock failure can be avoided when multiple RGWs are initialized at the same time.

These are the ceph.conf settings in use:

objecter_inflight_ops = 10240 # default: 1024
objecter_inflight_op_bytes = 1048576000 # default: 100 MB

rgw_thread_pool_size = 1000
rgw_num_rados_handles = 100
rgw_max_chunk_size = 1048576 # default: 512 * 1024
rgw_override_bucket_index_max_shards = 3
rgw_cache_enabled = true
mon_allow_pool_delete = true

[mon]
mon_allow_pool_delete = true

[osd]
osd_journal_size = 10240
osd_recovery_op_priority = 3
osd_recovery_max_active = 3
osd_recovery_delay_start = 2
osd_max_backfills = 3
osd_deep_scrub_stride = 131072

#7 Updated by Amine Liu about 6 years ago

It has been confirmed that this is caused by the number of RADOS handles:

rgw_num_rados_handles = 100

#8 Updated by Orit Wasserman almost 6 years ago

  • Status changed from New to Closed

#9 Updated by Orit Wasserman almost 6 years ago

The EBUSY error is a transient error, not a real issue.
