Actions
Bug #19584
closedpg creates not sent when initial mapping empty
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
Yes
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
0'0 2017-04-11 05:57:37.032115 2.89 0 0 0 0 0 0 0 0 active+clean 2017-04-11 07:00:32.277227 0'0 2727:391 [2,4] 2 [2,4] 2 0'0 2017-04-11 05:57:38.369475 0'0 2017-04-11 05:57:38.369475 1.89 0 0 0 0 0 0 0 0 active+clean 2017-04-11 07:00:32.263358 0'0 2666:130 [5,2] 5 [5,2] 5 0'0 2017-04-11 05:57:37.032114 0'0 2017-04-11 05:57:37.032114 2.8a 0 0 0 0 0 0 0 0 active+clean 2017-04-11 07:00:32.256955 0'0 2666:31 [5,2] 5 [5,2] 5 0'0 2017-04-11 05:57:38.369481 0'0 2017-04-11 05:57:38.369481 1.88 0 0 0 0 0 0 0 0 active+clean 2017-04-11 07:00:16.098638 0'0 2660:20 [4,3] 4 [4,3] 4 0'0 2017-04-11 05:57:37.032106 0'0 2017-04-11 05:57:37.032106 2.8b 0 0 0 0 0 0 0 0 active+clean 2017-04-11 07:00:33.127248 0'0 2666:14 [3,2] 3 [3,2] 3 0'0 2017-04-11 05:57:38.369486 0'0 2017-04-11 05:57:38.369486 1.8f 0 0 0 0 0 17 5 5 active+clean 2017-04-11 07:00:32.288803 5'5 2727:395 [2,0] 2 [2,0] 2 0'0 2017-04-11 05:57:37.032104 0'0 2017-04-11 05:57:37.032104 2.8c 0 0 0 0 0 0 0 0 active+clean 2017-04-11 07:00:41.824837 0'0 2669:14 [3,4] 3 [3,4] 3 0'0 2017-04-11 05:57:38.369439 0'0 2017-04-11 05:57:38.369439 1.8e 0 0 0 0 0 0 2 2 active+clean 2017-04-11 07:00:41.889656 5'2 2669:13 [4,1] 4 [4,1] 4 0'0 2017-04-11 05:57:37.032102 0'0 2017-04-11 05:57:37.032102 2.8d 0 0 0 0 0 0 0 0 active+clean 2017-04-11 07:00:04.361054 0'0 2727:680 [1,0] 1 [1,0] 1 0'0 2017-04-11 05:57:38.369446 0'0 2017-04-11 05:57:38.369446 21 4 0 0 0 0 389 12 12 20 4 0 0 0 0 389 12 12 19 2 0 0 0 0 135 8 8 18 2 0 0 0 0 135 8 8 17 37 0 0 0 0 8401671 232 232 2 0 0 0 0 0 0 0 0 23 0 0 0 0 0 0 0 0 1 20 0 0 0 0 2068 470 470 22 0 0 0 0 0 0 0 0 0 2 0 0 0 0 4194304 1552 1552 sum 71 0 0 0 0 12599091 2294 2294 OSD_STAT USED AVAIL TOTAL HB_PEERS PG_SUM 5 160M 95180M 95341M [0,1,2,3,4] 151 4 158M 95183M 95342M [0,1,2,3,5] 172 3 158M 95182M 95341M [0,1,2,4,5] 166 2 156M 95185M 95341M [0,1,3,4,5] 154 1 154M 95187M 95342M [0,2,3,4,5] 165 0 158M 95182M 95341M [1,2,3,4,5] 180 sum 947M 557G 558G 2017-04-11T07:20:57.742 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last): File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 841, in wrapper return func(self) File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 951, in do_thrash timeout=self.config.get('timeout') File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 2025, in wait_for_recovery 'failed to recover before timeout expired' AssertionError: failed to recover before timeout expired 2017-04-11T07:20:57.744 ERROR:teuthology:Uncaught exception (Hub) Traceback (most recent call last): File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 536, in run result = self._run(*self.args, **self.kwargs) File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 841, in wrapper return func(self) File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 951, in do_thrash timeout=self.config.get('timeout') File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 2025, in wait_for_recovery 'failed to recover before timeout expired' AssertionError: failed to recover before timeout expired 2017-04-11T07:20:57.744 ERROR:teuthology:Uncaught exception (Hub) Traceback (most recent call last): File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 608, in _notify_links link(self) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 538, in run self._report_error(sys.exc_info()) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 526, in _report_error self.parent.handle_error(self, *exc_info) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/__init__.py", line 106, in custom_handle_error self._origin_handle_error(context, type, value, tb) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/hub.py", line 558, in handle_error self.print_exception(context, type, value, tb) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/hub.py", line 605, in print_exception traceback.print_exception(type, value, tb, file=errstream) File "/usr/lib/python2.7/traceback.py", line 124, in print_exception _print(file, 'Traceback (most recent call last):') File "/usr/lib/python2.7/traceback.py", line 13, in _print file.write(str+terminator) IOError: [Errno 5] Input/output error
Updated by Josh Durgin about 7 years ago
- Subject changed from "AssertionError: failed to recover before timeout expired" in rados (upgrade) run to pg creates not sent when initial mapping empty
- Regression changed from No to Yes
3 pgs were stuck creating - no trace of them in the osd logs - all three were mapped to osds 0 and 1.
The monitor logs mon.a was upgraded to master already, and then did not send creates for the 3 pgs mapped to no osds:
2017-04-11 05:58:46.450243 7fb41a54d7c0 0 ceph version 12.0.0-2453-g6e97b99 (6e97b998e9c4c6eaad030dea63841281835d10ba), process ceph-mon, pid 29108 ... 2017-04-11 06:58:11.966483 7fb413bac700 10 register_pg will create 22.0 primary 3 acting [3] 2017-04-11 06:58:11.966490 7fb413bac700 10 register_pg will create 22.1 primary 2 acting [2,5] 2017-04-11 06:58:11.966497 7fb413bac700 10 register_pg will create 22.2 primary 4 acting [4] 2017-04-11 06:58:11.966502 7fb413bac700 10 register_pg will create 22.3 primary -1 acting [] 2017-04-11 06:58:11.966509 7fb413bac700 10 register_pg will create 22.4 primary 2 acting [2] 2017-04-11 06:58:11.966515 7fb413bac700 10 register_pg will create 22.5 primary 4 acting [4] 2017-04-11 06:58:11.966522 7fb413bac700 10 register_pg will create 22.6 primary 2 acting [2] 2017-04-11 06:58:11.966528 7fb413bac700 10 register_pg will create 22.7 primary 4 acting [4,3] ... 2017-04-11 06:58:13.387653 7fb413bac700 10 register_pg will create 23.0 primary 5 acting [5] 2017-04-11 06:58:13.387658 7fb413bac700 10 register_pg will create 23.1 primary 2 acting [2,5] 2017-04-11 06:58:13.387663 7fb413bac700 10 register_pg will create 23.2 primary 2 acting [2] 2017-04-11 06:58:13.387670 7fb413bac700 10 register_pg will create 23.3 primary 2 acting [2] 2017-04-11 06:58:13.387680 7fb413bac700 10 register_pg will create 23.4 primary 5 acting [5] 2017-04-11 06:58:13.387684 7fb413bac700 10 register_pg will create 23.5 primary -1 acting [] 2017-04-11 06:58:13.387690 7fb413bac700 10 register_pg will create 23.6 primary 3 acting [3] 2017-04-11 06:58:13.387695 7fb413bac700 10 register_pg will create 23.7 primary -1 acting []
22.3, 23.5, and 23.7 are the pgs stuck creating.
Updated by Kefu Chai about 7 years ago
- Status changed from New to Fix Under Review
Updated by Kefu Chai about 7 years ago
- Status changed from Fix Under Review to Resolved
Actions