Project

General

Profile

Bug #19584

pg creates not sent when initial mapping empty

Added by Yuri Weinstein about 2 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
04/11/2017
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Yes
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Release:
master
Needs Doc:
No

Description

Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2017-04-11_05:50:06-rados-wip-yuri-testing2_2017_4_11---basic-smithi/1012910/teuthology.log

   0'0 2017-04-11 05:57:37.032115 
2.89          0                  0        0         0       0       0   0        0 active+clean 2017-04-11 07:00:32.277227     0'0  2727:391 [2,4]          2  [2,4]              2        0'0 2017-04-11 05:57:38.369475          
   0'0 2017-04-11 05:57:38.369475 
1.89          0                  0        0         0       0       0   0        0 active+clean 2017-04-11 07:00:32.263358     0'0  2666:130 [5,2]          5  [5,2]              5        0'0 2017-04-11 05:57:37.032114          
   0'0 2017-04-11 05:57:37.032114 
2.8a          0                  0        0         0       0       0   0        0 active+clean 2017-04-11 07:00:32.256955     0'0   2666:31 [5,2]          5  [5,2]              5        0'0 2017-04-11 05:57:38.369481          
   0'0 2017-04-11 05:57:38.369481 
1.88          0                  0        0         0       0       0   0        0 active+clean 2017-04-11 07:00:16.098638     0'0   2660:20 [4,3]          4  [4,3]              4        0'0 2017-04-11 05:57:37.032106          
   0'0 2017-04-11 05:57:37.032106 
2.8b          0                  0        0         0       0       0   0        0 active+clean 2017-04-11 07:00:33.127248     0'0   2666:14 [3,2]          3  [3,2]              3        0'0 2017-04-11 05:57:38.369486          
   0'0 2017-04-11 05:57:38.369486 
1.8f          0                  0        0         0       0      17   5        5 active+clean 2017-04-11 07:00:32.288803     5'5  2727:395 [2,0]          2  [2,0]              2        0'0 2017-04-11 05:57:37.032104          
   0'0 2017-04-11 05:57:37.032104 
2.8c          0                  0        0         0       0       0   0        0 active+clean 2017-04-11 07:00:41.824837     0'0   2669:14 [3,4]          3  [3,4]              3        0'0 2017-04-11 05:57:38.369439          
   0'0 2017-04-11 05:57:38.369439 
1.8e          0                  0        0         0       0       0   2        2 active+clean 2017-04-11 07:00:41.889656     5'2   2669:13 [4,1]          4  [4,1]              4        0'0 2017-04-11 05:57:37.032102          
   0'0 2017-04-11 05:57:37.032102 
2.8d          0                  0        0         0       0       0   0        0 active+clean 2017-04-11 07:00:04.361054     0'0  2727:680 [1,0]          1  [1,0]              1        0'0 2017-04-11 05:57:38.369446          
   0'0 2017-04-11 05:57:38.369446 

21  4 0 0 0 0     389   12   12 
20  4 0 0 0 0     389   12   12 
19  2 0 0 0 0     135    8    8 
18  2 0 0 0 0     135    8    8 
17 37 0 0 0 0 8401671  232  232 
2   0 0 0 0 0       0    0    0 
23  0 0 0 0 0       0    0    0 
1  20 0 0 0 0    2068  470  470 
22  0 0 0 0 0       0    0    0 
0   2 0 0 0 0 4194304 1552 1552 

sum 71 0 0 0 0 12599091 2294 2294 
OSD_STAT USED AVAIL  TOTAL  HB_PEERS    PG_SUM 
5        160M 95180M 95341M [0,1,2,3,4]    151 
4        158M 95183M 95342M [0,1,2,3,5]    172 
3        158M 95182M 95341M [0,1,2,4,5]    166 
2        156M 95185M 95341M [0,1,3,4,5]    154 
1        154M 95187M 95342M [0,2,3,4,5]    165 
0        158M 95182M 95341M [1,2,3,4,5]    180 
sum      947M   557G   558G                    

2017-04-11T07:20:57.742 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 841, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 951, in do_thrash
    timeout=self.config.get('timeout')
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 2025, in wait_for_recovery
    'failed to recover before timeout expired'
AssertionError: failed to recover before timeout expired

2017-04-11T07:20:57.744 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 536, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 841, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 951, in do_thrash
    timeout=self.config.get('timeout')
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing2_2017_4_11/qa/tasks/ceph_manager.py", line 2025, in wait_for_recovery
    'failed to recover before timeout expired'
AssertionError: failed to recover before timeout expired
2017-04-11T07:20:57.744 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 608, in _notify_links
    link(self)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 538, in run
    self._report_error(sys.exc_info())
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 526, in _report_error
    self.parent.handle_error(self, *exc_info)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/__init__.py", line 106, in custom_handle_error
    self._origin_handle_error(context, type, value, tb)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/hub.py", line 558, in handle_error
    self.print_exception(context, type, value, tb)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/hub.py", line 605, in print_exception
    traceback.print_exception(type, value, tb, file=errstream)
  File "/usr/lib/python2.7/traceback.py", line 124, in print_exception
    _print(file, 'Traceback (most recent call last):')
  File "/usr/lib/python2.7/traceback.py", line 13, in _print
    file.write(str+terminator)
IOError: [Errno 5] Input/output error

History

#1 Updated by Josh Durgin about 2 months ago

  • Subject changed from "AssertionError: failed to recover before timeout expired" in rados (upgrade) run to pg creates not sent when initial mapping empty
  • Regression changed from No to Yes

3 pgs were stuck creating - no trace of them in the osd logs - all three were mapped to osds 0 and 1.

The monitor logs mon.a was upgraded to master already, and then did not send creates for the 3 pgs mapped to no osds:

2017-04-11 05:58:46.450243 7fb41a54d7c0  0 ceph version 12.0.0-2453-g6e97b99 (6e97b998e9c4c6eaad030dea63841281835d10ba), process ceph-mon, pid 29108
...
2017-04-11 06:58:11.966483 7fb413bac700 10 register_pg  will create 22.0 primary 3 acting [3]
2017-04-11 06:58:11.966490 7fb413bac700 10 register_pg  will create 22.1 primary 2 acting [2,5]
2017-04-11 06:58:11.966497 7fb413bac700 10 register_pg  will create 22.2 primary 4 acting [4]
2017-04-11 06:58:11.966502 7fb413bac700 10 register_pg  will create 22.3 primary -1 acting []
2017-04-11 06:58:11.966509 7fb413bac700 10 register_pg  will create 22.4 primary 2 acting [2]
2017-04-11 06:58:11.966515 7fb413bac700 10 register_pg  will create 22.5 primary 4 acting [4]
2017-04-11 06:58:11.966522 7fb413bac700 10 register_pg  will create 22.6 primary 2 acting [2]
2017-04-11 06:58:11.966528 7fb413bac700 10 register_pg  will create 22.7 primary 4 acting [4,3]
...
2017-04-11 06:58:13.387653 7fb413bac700 10 register_pg  will create 23.0 primary 5 acting [5]
2017-04-11 06:58:13.387658 7fb413bac700 10 register_pg  will create 23.1 primary 2 acting [2,5]
2017-04-11 06:58:13.387663 7fb413bac700 10 register_pg  will create 23.2 primary 2 acting [2]
2017-04-11 06:58:13.387670 7fb413bac700 10 register_pg  will create 23.3 primary 2 acting [2]
2017-04-11 06:58:13.387680 7fb413bac700 10 register_pg  will create 23.4 primary 5 acting [5]
2017-04-11 06:58:13.387684 7fb413bac700 10 register_pg  will create 23.5 primary -1 acting []
2017-04-11 06:58:13.387690 7fb413bac700 10 register_pg  will create 23.6 primary 3 acting [3]
2017-04-11 06:58:13.387695 7fb413bac700 10 register_pg  will create 23.7 primary -1 acting []

22.3, 23.5, and 23.7 are the pgs stuck creating.

#2 Updated by Kefu Chai about 2 months ago

  • Assignee set to Kefu Chai

#3 Updated by Kefu Chai about 1 month ago

  • Status changed from New to Need Review

#4 Updated by Kefu Chai about 1 month ago

  • Status changed from Need Review to Resolved

Also available in: Atom PDF