Project

General

Profile

Actions

Bug #22895

closed

radosgw not work after all rgw `RGWAsyncRadosProcessor had timed out

Added by Amine Liu about 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rgw
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Read and write continuously, and this error occurs after `logrotate's killall -q -1 ...` every day.

net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

2018-02-02 03:30:01.282755 7ef30f786700 1 civetweb: 0x7f0316fcc000: 172.26.99.236 - - [01/Feb/2018:17:01:20 +0800] "GET /mix4mbrwx0211/myobjects8 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:30:01.344941 7ef329fbb700 1 ====== starting new request req=0x7ef329fb5190 =====
2018-02-02 03:30:01.376386 7ef329fbb700 1 ====== req done req=0x7ef329fb5190 op status=0 http_status=200 ======
2018-02-02 03:30:01.376962 7ef329fbb700 1 civetweb: 0x7f0316eb1000: 172.26.99.233 - - [01/Feb/2018:17:01:20 +0800] "GET /mix4mbrwx0212/myobjects2 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:30:01.385675 7ef327fb7700 1 ====== req done req=0x7ef327fb1190 op status=0 http_status=200 ======
2018-02-02 03:30:01.385848 7ef327fb7700 1 civetweb: 0x7f0316ecf000: 172.26.99.236 - - [01/Feb/2018:17:01:20 +0800] "PUT /mix4mbrwx0212/myobjects5026 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:30:01.394224 7ef327fb7700 1 ====== starting new request req=0x7ef327fb1190 =====
2018-02-02 03:30:01.435574 7ef31bf9f700 1 ====== req done req=0x7ef31bf99190 op status=0 http_status=200 ======
2018-02-02 03:30:01.435653 7ef31bf9f700 1 civetweb: 0x7f0316f4b000: 172.26.99.233 - - [01/Feb/2018:17:01:20 +0800] "PUT /mix4mbrwx0212/myobjects6717 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:30:01.441087 7ef31bf9f700 1 ====== starting new request req=0x7ef31bf99190 =====
2018-02-02 03:30:01.457514 7ef30f786700 1 ====== starting new request req=0x7ef30f780190 =====
2018-02-02 03:30:01.464434 7ef31bf9f700 1 ====== req done req=0x7ef31bf99190 op status=0 http_status=200 ======
2018-02-02 03:30:01.465004 7ef31bf9f700 1 civetweb: 0x7f0316f4b000: 172.26.99.233 - - [01/Feb/2018:17:01:20 +0800] "GET /mix4mbrwx0212/myobjects8 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:30:01.474509 7ef30f786700 1 ====== req done req=0x7ef30f780190 op status=0 http_status=200 ======
2018-02-02 03:30:01.475065 7ef30f786700 1 civetweb: 0x7f0316fcc000: 172.26.99.236 - - [01/Feb/2018:17:01:20 +0800] "GET /mix4mbrwx0211/myobjects9 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:30:01.519088 7ef30c780700 1 ====== starting new request req=0x7ef30c77a190 =====
2018-02-02 03:30:01.544811 7ef30c780700 1 ====== req done req=0x7ef30c77a190 op status=0 http_status=200 ======
2018-02-02 03:30:01.545573 7ef30c780700 1 civetweb: 0x7f0316fea000: 172.26.99.236 - - [01/Feb/2018:17:01:20 +0800] "GET /mix4mbrwx0211/myobjects8 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:30:01.625288 7ef30f786700 1 ====== starting new request req=0x7ef30f780190 =====
2018-02-02 03:30:01.629175 7ef332fcd700 -1 received signal: Hangup from PID: 1082590 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw UID: 0

2018-02-02 03:30:01.629285 7ef2fef65700 1 rgw realm reloader: Pausing frontends for realm update...
2018-02-02 03:30:01.631614 7ef329fbb700 1 ====== starting new request req=0x7ef329fb5190 =====
2018-02-02 03:30:01.639825 7ef31bf9f700 1 ====== starting new request req=0x7ef31bf99190 =====
2018-02-02 03:30:01.642634 7ef31bf9f700 1 ====== req done req=0x7ef31bf99190 op status=-2 http_status=204 ======
2018-02-02 03:30:01.642726 7ef31bf9f700 1 civetweb: 0x7f0316f4b000: 172.26.99.233 - - [01/Feb/2018:17:01:20 +0800] "DELETE /mix4mbrwx0212/myobjects8962 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:30:01.643210 7ef30f786700 1 ====== req done req=0x7ef30f780190 op status=0 http_status=200 ======
2018-02-02 03:30:01.644189 7ef30f786700 1 civetweb: 0x7f0316fcc000: 172.26.99.236 - - [01/Feb/2018:17:01:20 +0800] "GET /mix4mbrwx0211/myobjects4 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:30:01.649796 7ef31bf9f700 1 ====== starting new request req=0x7ef31bf99190 =====
2018-02-02 03:30:01.663238 7ef329fbb700 1 ====== req done req=0x7ef329fb5190 op status=0 http_status=200 ======
2018-02-02 03:30:01.663784 7ef329fbb700 1 civetweb: 0x7f0316eb1000: 172.26.99.233 - - [01/Feb/2018:17:01:20 +0800] "GET /mix4mbrwx0212/myobjects2 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:30:01.667091 7ef31bf9f700 1 ====== req done req=0x7ef31bf99190 op status=0 http_status=200 ======
2018-02-02 03:30:01.667487 7ef31bf9f700 1 civetweb: 0x7f0316f4b000: 172.26.99.233 - - [01/Feb/2018:17:01:20 +0800] "GET /mix4mbrwx0211/myobjects8 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:30:01.698675 7ef327fb7700 1 ====== req done req=0x7ef327fb1190 op status=0 http_status=200 ======
2018-02-02 03:30:01.698840 7ef2fef65700 1 rgw realm reloader: Frontends paused
2018-02-02 03:30:01.698819 7ef327fb7700 1 civetweb: 0x7f0316ecf000: 172.26.99.236 - - [01/Feb/2018:17:01:20 +0800] "PUT /mix4mbrwx0211/myobjects9767 HTTP/1.1" 1 0 - aws-sdk-java/1.4.0.1 Linux/3.10.0-327.22.2.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02
2018-02-02 03:34:16.634502 7f02f314f700 1 heartbeat_map is_healthy 'RGWAsyncRadosProcessor::m_tp thread 0x7ef3419f0700' had timed out after 600
2018-02-02 03:34:16.634531 7f02f314f700 1 heartbeat_map is_healthy 'RGWAsyncRadosProcessor::m_tp thread 0x7ef349a00700' had timed out after 600
2018-02-02 03:34:21.634628 7f02f314f700 1 heartbeat_map is_healthy 'RGWAsyncRadosProcessor::m_tp thread 0x7ef3419f0700' had timed out after 600
2018-02-02 03:34:21.634655 7f02f314f700 1 heartbeat_map is_healthy 'RGWAsyncRadosProcessor::m_tp thread 0x7ef349a00700' had timed out after 600
2018-02-02 03:34:26.634797 7f02f314f700 1 heartbeat_map is_healthy 'RGWAsyncRadosProcessor::m_tp thread 0x7ef3419f0700' had timed out after 600


Files

master-ceph.conf (1.47 KB) master-ceph.conf the rgw of master zone Amine Liu, 02/02/2018 02:04 AM
slave-ceph.conf (1.31 KB) slave-ceph.conf the rgw of slave zone Amine Liu, 02/02/2018 02:05 AM
slave-node1-rgw-log.tar.bz2 (71.5 KB) slave-node1-rgw-log.tar.bz2 after sighup1, all slave nodes rgw was in crash. Amine Liu, 02/06/2018 03:21 AM
master-node1-rgw-log.tar.bz2 (578 KB) master-node1-rgw-log.tar.bz2 master nodes was not in crash. Amine Liu, 02/06/2018 03:22 AM

Related issues 2 (1 open1 closed)

Related to rgw - Bug #22775: rgw: multisite: the huge object sync is slow in unstable network environmentFix Under ReviewYehuda Sadeh01/23/2018

Actions
Related to rgw - Backport #22365: luminous: log rotate causes rgw realm reloadResolvedCasey BodleyActions
Actions

Also available in: Atom PDF