
Bug #22072

One degraded object causes all Ceph RGW requests to hang

Added by stefan bo over 6 years ago. Updated almost 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a 20-node cluster (3 mons, 3 rgw, 14 osd) running Ceph 10.2.9 (jewel) on CentOS 7.3 with XFS.
13 OSDs are in service and 1 OSD is out because of a disk problem. Relevant `ceph osd dump` output:


pool 45 '.rgw.root' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 491 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 46 'default.rgw.control' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 493 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 47 'default.rgw.data.root' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 495 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 48 'default.rgw.gc' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 496 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 49 'default.rgw.log' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 497 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 50 'default.rgw.users.uid' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 499 flags hashpspool stripe_width 0
pool 51 'default.rgw.users.keys' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 501 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 52 'default.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 1837 flags hashpspool stripe_width 0
pool 53 'default.rgw.buckets.data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 758 flags hashpspool stripe_width 0
pool 54 'default.rgw.users.email' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 507 owner 18446744073709551615 flags hashpspool stripe_width 0
max_osd 14
osd.0 up in weight 1 up_from 1348 up_thru 2051 down_at 1333 last_clean_interval [808,1332) 10.11.0.67:6800/3880287 10.11.0.67:6801/3880287 10.11.0.67:6802/3880287 10.11.0.67:6803/3880287 exists,up cd255b23-3558-41e3-917b-edad06337348
osd.1 up in weight 1 up_from 2029 up_thru 2051 down_at 2023 last_clean_interval [1821,2028) 10.11.0.68:6800/4095381 10.11.0.68:6801/10095381 10.11.0.68:6802/10095381 10.11.0.68:6803/10095381 exists,up 5db711d9-4d33-4ecb-b510-cd4de486576b
osd.2 up in weight 1 up_from 840 up_thru 2051 down_at 836 last_clean_interval [521,835) 10.11.0.69:6800/3514110 10.11.0.69:6801/3514110 10.11.0.69:6802/3514110 10.11.0.69:6803/3514110 exists,up cb1b349d-2a07-462b-b30a-2544aabb2481
osd.3 up in weight 1 up_from 2050 up_thru 2051 down_at 2048 last_clean_interval [1681,2049) 10.11.0.70:6800/3885324 10.11.0.70:6801/19885324 10.11.0.70:6802/19885324 10.11.0.70:6803/19885324 exists,up e3aa5559-e5d0-4404-b95a-368eec8edc05
osd.4 up in weight 1 up_from 1392 up_thru 2051 down_at 1390 last_clean_interval [867,1389) 10.11.0.71:6800/3873236 10.11.0.71:6801/3873236 10.11.0.71:6802/3873236 10.11.0.71:6803/3873236 exists,up d4c72369-dfe0-4263-b828-0e14b38fce53
osd.6 up in weight 1 up_from 1408 up_thru 2051 down_at 1404 last_clean_interval [541,1403) 10.11.0.73:6800/3898089 10.11.0.73:6801/3898089 10.11.0.73:6804/3898089 10.11.0.73:6805/3898089 exists,up 511d138d-e7f8-4bee-834d-7c876bda13c2
osd.7 up in weight 1 up_from 1415 up_thru 2051 down_at 1413 last_clean_interval [546,1412) 10.11.0.74:6800/4052759 10.11.0.74:6801/4052759 10.11.0.74:6802/4052759 10.11.0.74:6803/4052759 exists,up 43a4ef26-952e-4ea6-b24a-965d598982db
osd.8 up in weight 1 up_from 2045 up_thru 2051 down_at 2042 last_clean_interval [1864,2043) 10.11.0.75:6800/3989550 10.11.0.75:6804/14989550 10.11.0.75:6805/14989550 10.11.0.75:6806/14989550 exists,up e4ced560-c92c-46cf-8cf4-95a7889f8c53
osd.9 up in weight 1 up_from 1445 up_thru 2050 down_at 1443 last_clean_interval [556,1442) 10.11.0.76:6800/4054911 10.11.0.76:6801/4054911 10.11.0.76:6802/4054911 10.11.0.76:6803/4054911 exists,up e967038a-9413-4ae7-85bd-3b5ca1689c3f
osd.10 up in weight 1 up_from 1463 up_thru 2051 down_at 1460 last_clean_interval [561,1459) 10.11.0.77:6800/3936981 10.11.0.77:6801/3936981 10.11.0.77:6802/3936981 10.11.0.77:6803/3936981 exists,up a0129276-0ae1-4a27-a1b8-b89841fcb751
osd.11 up in weight 1 up_from 1474 up_thru 2051 down_at 1471 last_clean_interval [566,1470) 10.11.0.78:6800/3990918 10.11.0.78:6801/3990918 10.11.0.78:6802/3990918 10.11.0.78:6803/3990918 exists,up df2cb894-5a77-4265-8efa-3d2036909a29
osd.12 up in weight 1 up_from 1481 up_thru 2051 down_at 1479 last_clean_interval [802,1478) 10.11.0.79:6800/1737036 10.11.0.79:6801/1737036 10.11.0.79:6802/1737036 10.11.0.79:6803/1737036 exists,up 80b1d672-0f82-48f9-9481-1c3807373451
osd.13 up in weight 1 up_from 1499 up_thru 2051 down_at 1497 last_clean_interval [576,1496) 10.11.0.80:6800/3853864 10.11.0.80:6801/3853864 10.11.0.80:6802/3853864 10.11.0.80:6803/3853864 exists,up 637d385a-a13f-42b2-92a9-d39f8f1aba67

The monitor log looks like this once PGs go active+recovering+degraded:
2017-11-07 21:14:24.624054 mon.0 10.11.0.61:6789/0 11617599 : cluster [INF] pgmap v8305837: 416 pgs: 1 active+clean+scrubbing, 2 active+recovering+degraded, 2 remapped, 344 active+clean, 4 peering, 3 active+clean+scrubbing+deep, 6 active+undersized+degraded+remapped+backfilling, 1 active+undersized+remapped, 25 active+undersized+degraded, 25 remapped+peering, 3 stale+active+clean; 1523 GB data, 3782 GB used, 8606 GB / 12389 GB avail; 2543 B/s rd, 2 op/s; 4924642/114683470 objects degraded (4.294%)
2017-11-07 21:14:23.648844 osd.1 10.11.0.68:6800/3585971 9281 : cluster [WRN] 5 slow requests, 5 included below; oldest blocked for > 30.139556 secs

At the same time there are lots of log entries like this on the monitor:
2017-11-07 21:14:23.648854 osd.1 10.11.0.68:6800/3585971 9282 : cluster [WRN] slow request 30.019878 seconds old, received at 2017-11-07 21:13:53.628887: osd_op(client.7005.0:158809617 52.9adcb6dd .dir.8bde99df-01c8-4a5a-b033-f55aff24d9c0.7002.11 [call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e1124) currently waiting for degraded object

After a while, `ceph health detail` shows:

HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 1 pgs stuck degraded; 1 pgs stuck unclean; 100 requests are blocked > 32 sec; 1 osds have slow requests; recovery 2/114689878 objects degraded (0.000%)
pg 52.1d is stuck unclean for 8651.216692, current state active+recovering+degraded, last acting [1,8,3]
pg 52.1d is stuck degraded for 663.356421, current state active+recovering+degraded, last acting [1,8,3]
pg 52.1d is active+recovering+degraded, acting [1,8,3]

pg 52.1d always hangs with `2/114689878 objects degraded (0.000%)`, no matter how many times osd.1 or osd.8 is restarted. After restarting osd.1 or osd.8, ceph rgw serves requests for a while, but then hangs again.
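For reference, a minimal sketch of that restart-and-check cycle, assuming the default systemd unit names for jewel on CentOS 7 (the OSD ids are the ones from this cluster):

systemctl restart ceph-osd@1      # or ceph-osd@8, another member of the acting set
ceph -s                           # rgw recovers briefly...
ceph health detail | grep 52.1d   # ...then the same PG shows up stuck again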

`ceph pg dump_stuck` shows osd.1 as the primary of the acting set (the commands are sketched after the log excerpt), and the osd.1 log shows:
2017-11-07 21:15:13.657765 7fd6c51a5700 0 log_channel(cluster) log [WRN] : slow request 32.830642 seconds old, received at 2017-11-07 21:14:40.826978: osd_op(client.8157.0:9512149 52.9adcb6dd .dir.8bde99df-01c8-4a5a-b033-f55aff24d9c0.7002.11 [call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e1141) currently waiting for degraded object
2017-11-07 21:15:14.657997 7fd6c51a5700 0 log_channel(cluster) log [WRN] : 62 slow requests, 5 included below; oldest blocked for > 34.253927 secs
2017-11-07 21:15:14.658012 7fd6c51a5700 0 log_channel(cluster) log [WRN] : slow request 30.917050 seconds old, received at 2017-11-07 21:14:43.740814: osd_op(client.7008.0:158820368 52.9adcb6dd .dir.8bde99df-01c8-4a5a-b033-f55aff24d9c0.7002.11 [call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e1143) currently waiting for degraded object
2017-11-07 21:15:14.658017 7fd6c51a5700 0 log_channel(cluster) log [WRN] : slow request 34.099198 seconds old, received at 2017-11-07 21:14:40.558667: osd_op(client.7005.0:158810919 52.9adcb6dd .dir.8bde99df-01c8-4a5a-b033-f55aff24d9c0.7002.11 [call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e1141) currently waiting for degraded object
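The checks behind that statement were roughly the following (a sketch; both are standard jewel commands):

ceph pg dump_stuck unclean   # lists stuck PGs with their acting sets
ceph pg map 52.1d            # up/acting [1,8,3]; the first entry, osd.1, is the primary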

It is always the same object, `.dir.8bde99df-01c8-4a5a-b033-f55aff24d9c0.7002.11`, and there are a lot of `waiting for degraded object` entries.

[root@Dy-JXQ-Ceph-A-68 ~]# ceph osd map default.rgw.buckets.index .dir.8bde99df-01c8-4a5a-b033-f55aff24d9c0.7002.11
osdmap e1636 pool 'default.rgw.buckets.index' (52) object '.dir.8bde99df-01c8-4a5a-b033-f55aff24d9c0.7002.11' -> pg 52.9adcb6dd (52.1d) -> up ([1,8,3], p1) acting ([1,8,3], p1)

At this point I think the object .dir.8bde99df-01c8-4a5a-b033-f55aff24d9c0.7002.11 is damaged or corrupted, and it makes the whole Ceph storage unusable by blocking all radosgw requests.
I tried three things (commands sketched below):
1. `ceph pg 52.1d mark_unfound_lost revert|delete` gives no response, and neither does `ceph pg 52.1d query`.
2. `ceph pg repair 52.1d` does not work.
3. Taking the OSD down and deleting the object with ceph-objectstore-tool, but that run takes very long (about 4 hours).
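For completeness, a rough sketch of the commands behind those three attempts; the ceph-objectstore-tool part is only an outline, and the data/journal paths are the filestore defaults, not verified against this cluster:

ceph pg 52.1d mark_unfound_lost revert   # also tried "delete"; neither returned
ceph pg 52.1d query                      # no response either
ceph pg repair 52.1d                     # did not help

# offline attempt (very slow here, ~4 hours):
systemctl stop ceph-osd@1
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --journal-path /var/lib/ceph/osd/ceph-1/journal --op list .dir.8bde99df-01c8-4a5a-b033-f55aff24d9c0.7002.11
# ...then feed the JSON emitted above to the tool's "remove" operation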

While working on this problem I found a similar report via Google: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008503.html
That thread contains more debugging detail than I collected, but it is the same problem. (I forgot to record some of my debug output; only the log files and a few debug records remain.)

After Ceph had been hanging for 8 hours, I decided to delete the object with `rados rm` inside the restart window of ceph-osd@1,
and IT WORKED!
Ceph became HEALTH_OK and rgw worked again.
Only one bucket can no longer accept uploads, though its old data is still accessible. I told the developers to create a new bucket; uploads work fine with the new bucket.
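In case it helps someone else, a sketch of the workaround; the pool and object names come from the `ceph osd map` output above, and the `rados rm` has to land in the window while ceph-osd@1 is restarting:

systemctl restart ceph-osd@1
rados -p default.rgw.buckets.index rm .dir.8bde99df-01c8-4a5a-b033-f55aff24d9c0.7002.11
ceph health detail    # went back to HEALTH_OK shortly afterwards

Note that this removes the bucket's index object, which matches what happened next: that one bucket could not take new uploads, but reads of its old data still worked.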

After this emergency, my problem is that Ceph, as a distributed system, can be hung entirely by a single degraded object; that hurts a lot (I debugged it all night without sleep).

I hope the developers can fix this.


Related issues

Related to RADOS - Bug #24645: Upload to radosgw fails when there are degraded objects New 06/25/2018

History

#1 Updated by Patrick Donnelly over 6 years ago

  • Project changed from Ceph to rgw

#2 Updated by Michal Cila almost 6 years ago

Having the same issue on Luminous, see: http://tracker.ceph.com/issues/24645

Can someone have a look at it, please?

#3 Updated by Nathan Cutler over 5 years ago

  • Related to Bug #24645: Upload to radosgw fails when there are degraded objects added
