Bug #4813 (closed)

pgs stuck creating

Added by Samuel Just about 11 years ago. Updated almost 11 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-04-25T02:36:57.292 DEBUG:teuthology.misc:with jobid basedir: 584
2013-04-25T02:36:57.292 DEBUG:teuthology.orchestra.run:Running [10.214.132.9]: '/home/ubuntu/cephtest/584/enable-coredump ceph-coverage /home/ubuntu/cephtest/584/archive/coverage ceph --concise -- pg dump --format=json'
2013-04-25T02:37:00.402 DEBUG:teuthology.misc:with jobid basedir: 584
2013-04-25T02:37:00.402 DEBUG:teuthology.orchestra.run:Running [10.214.132.9]: '/home/ubuntu/cephtest/584/enable-coredump ceph-coverage /home/ubuntu/cephtest/584/archive/coverage ceph --concise -- pg dump --format=json'
2013-04-25T02:37:00.459 DEBUG:teuthology.misc:with jobid basedir: 584
2013-04-25T02:37:00.459 DEBUG:teuthology.orchestra.run:Running [10.214.132.9]: '/home/ubuntu/cephtest/584/enable-coredump ceph-coverage /home/ubuntu/cephtest/584/archive/coverage ceph --concise -s'
2013-04-25T02:37:00.553 INFO:teuthology.task.thrashosds.ceph_manager: health HEALTH_WARN 7 pgs stuck inactive; 7 pgs stuck unclean
monmap e1: 3 mons at {a=10.214.132.9:6789/0,b=10.214.132.10:6789/0,c=10.214.132.9:6790/0}, election epoch 6, quorum 0,1,2 a,b,c
osdmap e87: 6 osds: 6 up, 5 in
pgmap v130: 90 pgs: 7 creating, 83 active+clean; 16284 bytes data, 817 MB used, 2792 GB / 2793 GB avail; 6B/s wr, 0op/s
mdsmap e5: 1/1/1 up {0=a=up:active}


Related issues 3 (0 open, 3 closed)

Related to Ceph - Bug #4748: mon: failed assert in OSDMonitor::build_incremental (Resolved, Sage Weil, 04/18/2013)
Related to Ceph - Bug #4675: mon: pg creations don't get queued on mon startup (Resolved, Sage Weil, 04/06/2013)
Related to Ceph - Bug #4849: pg stuck peering (Resolved, Samuel Just, 04/28/2013)

Actions #1

Updated by Samuel Just about 11 years ago

ubuntu@teuthology:/a/teuthology-2013-04-25_01:00:08-rados-next-testing-basic/584

Actions #2

Updated by Samuel Just about 11 years ago

  • Status changed from New to Resolved

This was probably fixed by the fix for #4748.

Actions #3

Updated by Samuel Just almost 11 years ago

  • Status changed from Resolved to 12

ubuntu@teuthology:/a/teuthology-2013-04-27_20:54:49-rados-next-testing-basic/2087

Actions #4

Updated by Ian Colle almost 11 years ago

  • Source changed from Development to Q/A
Actions #5

Updated by Samuel Just almost 11 years ago

  • Priority changed from Urgent to High
Actions #6

Updated by Samuel Just almost 11 years ago

  • Status changed from 12 to Resolved
Actions #7

Updated by Sage Weil almost 11 years ago

  • Status changed from Resolved to 12

This happened again on latest master: ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-05-10_01:00:06-rados-master-testing-basic/10324

Actions #8

Updated by Sage Weil almost 11 years ago

  • Priority changed from High to Urgent
ubuntu@teuthology:/a/teuthology-2013-05-11_01:00:07-rados-next-testing-basic/11175$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: b5b09be30cf99f9c699e825629f02e3bce555d44
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: fd901056831586e8135e28c8f4ba9c2ec44dfcf6
  s3tests:
    branch: next
  workunit:
    sha1: fd901056831586e8135e28c8f4ba9c2ec44dfcf6
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/test.sh
Actions #9

Updated by Sage Weil almost 11 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-05-21_01:00:05-rados-next-testing-basic/18454

Actions #10

Updated by Samuel Just almost 11 years ago

ubuntu@plana15:~$ ceph pg dump | grep creating
34.5 0 0 0 0 0 0 0 creating 0.000000 0'0 0'0 [] [2,1] 0'0 0.000000 0'0 0.000000
34.4 0 0 0 0 0 0 0 creating 0.000000 0'0 0'0 [] [2,1] 0'0 0.000000 0'0 0.000000
34.7 0 0 0 0 0 0 0 creating 0.000000 0'0 0'0 [] [2,1] 0'0 0.000000 0'0 0.000000
34.6 0 0 0 0 0 0 0 creating 0.000000 0'0 0'0 [] [2,1] 0'0 0.000000 0'0 0.000000
34.1 0 0 0 0 0 0 0 creating 0.000000 0'0 0'0 [] [2,1] 0'0 0.000000 0'0 0.000000
34.3 0 0 0 0 0 0 0 creating 0.000000 0'0 0'0 [] [2,1] 0'0 0.000000 0'0 0.000000
34.2 0 0 0 0 0 0 0 creating 0.000000 0'0 0'0 [] [2,1] 0'0 0.000000 0'0 0.000000
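The same check can be run against the JSON dump the teuthology job already collects, instead of grepping the plain-text output. A minimal sketch; the `pg_stats`/`pgid`/`state` field names are assumed from this era's `ceph pg dump --format=json` output and may differ between Ceph versions:

```python
import json

def stuck_creating(pg_dump_json):
    """Return pgids whose state still contains 'creating', given the raw
    output of `ceph pg dump --format=json`.

    Assumes the dump is an object with a `pg_stats` list whose entries
    carry `pgid` and `state` keys (field names vary across versions).
    """
    dump = json.loads(pg_dump_json)
    return [pg["pgid"] for pg in dump.get("pg_stats", [])
            if "creating" in pg["state"]]
```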

Actions #11

Updated by Samuel Just almost 11 years ago

2013-05-21 02:59:29.070891 mon.0 10.214.131.24:6789/0 44 : [INF] osdmap e8: 6 osds: 6 up, 6 in
2013-05-21 02:59:28.067712 mon.0 10.214.131.24:6789/0 43 : [INF] pgmap v15: 72 pgs: 72 active+clean; 9716 bytes data, 503 MB used, 2323 GB / 2328 GB avail
2013-05-21 02:59:29.070891 mon.0 10.214.131.24:6789/0 44 : [INF] osdmap e8: 6 osds: 6 up, 6 in
2013-05-21 02:59:29.136219 mon.0 10.214.131.24:6789/0 45 : [INF] pgmap v16: 80 pgs: 8 creating, 72 active+clean; 9716 bytes data, 503 MB used, 2323 GB / 2328 GB avail

Apparently not caused by split, interesting.

Actions #12

Updated by Samuel Just almost 11 years ago

2013-05-21 03:01:38.686052 7f53bdb97700 10 mon.b@0(leader).pg v133 check_osd_map applying osdmap e102 to pg_map
2013-05-21 03:01:38.686054 7f53bb38b700 20 -- 10.214.131.24:6789/0 >> 10.214.131.25:6806/17124 pipe(0x32fc780 sd=12 :6789 s=2 pgs=11 cs=1 l=1).writer signed seq # 87): sig = 0
2013-05-21 03:01:38.686069 7f53bb38b700 20 -- 10.214.131.24:6789/0 >> 10.214.131.25:6806/17124 pipe(0x32fc780 sd=12 :6789 s=2 pgs=11 cs=1 l=1).writer sending 87 0x3635a80
2013-05-21 03:01:38.686092 7f53bdb97700 10 mon.b@0(leader).pg v133 register_new_pgs checking pg pools for osdmap epoch 102, last_pg_scan 101
2013-05-21 03:01:38.686100 7f53bdb97700 10 mon.b@0(leader).pg v133 no change in pool 0 rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 34 pgp_num 24 last_change 73 owner 0 crash_replay_interval 45
2013-05-21 03:01:38.686108 7f53bdb97700 10 mon.b@0(leader).pg v133 no change in pool 1 rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 24 pgp_num 24 last_change 1 owner 0
2013-05-21 03:01:38.686114 7f53bdb97700 10 mon.b@0(leader).pg v133 no change in pool 2 rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 24 pgp_num 24 last_change 1 owner 0
2013-05-21 03:01:38.686111 7f53bb38b700 10 -- 10.214.131.24:6789/0 >> 10.214.131.25:6806/17124 pipe(0x32fc780 sd=12 :6789 s=2 pgs=11 cs=1 l=1).writer: state = open policy.server=1
2013-05-21 03:01:38.686120 7f53bdb97700 10 mon.b@0(leader).pg v133 no change in pool 29 rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 88 owner 0
2013-05-21 03:01:38.686126 7f53bdb97700 10 mon.b@0(leader).pg v133 register_new_pgs scanning pool 34 rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 102 owner 0
2013-05-21 03:01:38.686124 7f53bb38b700 20 -- 10.214.131.24:6789/0 >> 10.214.131.25:6806/17124 pipe(0x32fc780 sd=12 :6789 s=2 pgs=11 cs=1 l=1).writer sleeping
2013-05-21 03:01:38.686134 7f53bdb97700 10 mon.b@0(leader).pg v133 register_new_pgs will create 34.0
2013-05-21 03:01:38.686138 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.1 parent 34.0 ?
2013-05-21 03:01:38.686144 7f53bdb97700 10 mon.b@0(leader).pg v133 register_new_pgs will create 34.1 parent 34.0 by 1 bits
2013-05-21 03:01:38.686149 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.2 parent 34.0 ?
2013-05-21 03:01:38.686154 7f53bdb97700 10 mon.b@0(leader).pg v133 register_new_pgs will create 34.2 parent 34.0 by 1 bits
2013-05-21 03:01:38.686159 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.3 parent 34.1 ?
2013-05-21 03:01:38.686161 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.3 parent 34.0 ?
2013-05-21 03:01:38.686164 7f53bdb97700 10 mon.b@0(leader).pg v133 register_new_pgs will create 34.3 parent 34.0 by 2 bits
2013-05-21 03:01:38.686167 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.4 parent 34.0 ?
2013-05-21 03:01:38.686169 7f53bdb97700 10 mon.b@0(leader).pg v133 register_new_pgs will create 34.4 parent 34.0 by 1 bits
2013-05-21 03:01:38.686172 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.5 parent 34.1 ?
2013-05-21 03:01:38.686174 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.5 parent 34.0 ?
2013-05-21 03:01:38.686177 7f53bdb97700 10 mon.b@0(leader).pg v133 register_new_pgs will create 34.5 parent 34.0 by 2 bits
2013-05-21 03:01:38.686179 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.6 parent 34.2 ?
2013-05-21 03:01:38.686182 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.6 parent 34.0 ?
2013-05-21 03:01:38.686184 7f53bdb97700 10 mon.b@0(leader).pg v133 register_new_pgs will create 34.6 parent 34.0 by 2 bits
2013-05-21 03:01:38.686187 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.7 parent 34.3 ?
2013-05-21 03:01:38.686189 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.7 parent 34.1 ?
2013-05-21 03:01:38.686192 7f53bdb97700 10 mon.b@0(leader).pg v133 is 34.7 parent 34.0 ?
2013-05-21 03:01:38.686195 7f53bdb97700 10 mon.b@0(leader).pg v133 register_new_pgs will create 34.7 parent 34.0 by 3 bits
2013-05-21 03:01:38.686204 7f53bdb97700 10 mon.b@0(leader).pg v133 register_new_pgs registered 8 new pgs, removed 0 uncreated pg
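The probe sequence in that log ("is 34.7 parent 34.3 ?", then 34.1, then 34.0) follows from repeatedly clearing the highest set bit of the pg number until an already-created pg is reached. A rough sketch of just that arithmetic, not the actual monitor code:

```python
def parent_chain(ps):
    """Candidate parent pg numbers for pg number `ps`, in probe order,
    found by repeatedly clearing the highest set bit. The number of steps
    to the chosen parent is the 'by N bits' count in the log.
    """
    chain = []
    while ps:
        ps &= ~(1 << (ps.bit_length() - 1))  # clear the top bit
        chain.append(ps)
    return chain
```

For example, `parent_chain(7)` gives `[3, 1, 0]`, matching the three probes for 34.7 and "will create 34.7 parent 34.0 by 3 bits" above.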

Actions #13

Updated by Samuel Just almost 11 years ago

  • Status changed from 12 to Pending Backport
Actions #14

Updated by Samuel Just almost 11 years ago

  • Priority changed from Urgent to High
Actions #15

Updated by Samuel Just almost 11 years ago

  • Status changed from Pending Backport to Resolved