Bug #8507
closedOSD: failing to create PG (when splitting?) in rados/test.sh
0%
Description
ubuntu@teuthology:/a/gregf-2014-05-29_16:19:17-rados-wip-client-fast-dispatch-testing-basic-plana/280674/remote/plana15/log/ceph-osd.0.log.gz
2014-05-30 12:54:31.864524 7f67089d9700 1 -- 10.214.131.25:6809/22687 <== mon.1 10.214.131.28:6789/0 5 ==== osd_pg_create(pg107.5,903; ) v2 ==== 60+0+0 (3774340542 0 0) 0x1d54400 con 0x2f35160 2014-05-30 12:54:31.864538 7f67089d9700 10 osd.0 926 do_waiters -- start 2014-05-30 12:54:31.864542 7f67089d9700 10 osd.0 926 do_waiters -- finish 2014-05-30 12:54:31.864544 7f67089d9700 20 osd.0 926 _dispatch 0x1d54400 osd_pg_create(pg107.5,903; ) v2 2014-05-30 12:54:31.864606 7f67089d9700 10 osd.0 926 handle_pg_create osd_pg_create(pg107.5,903; ) v2 2014-05-30 12:54:31.864611 7f67089d9700 15 osd.0 926 require_same_or_newer_map 926 (i am 926) 0x1d54400 2014-05-30 12:54:31.864627 7f67089d9700 20 osd.0 926 mkpg 107.5 e903 2014-05-30 12:54:31.864766 7f67089d9700 15 osd.0 926 project_pg_history 107.5s1 from 903 to 926, start ec=903 les/c 0/903 0/0/0 2014-05-30 12:54:31.865465 7f67089d9700 15 osd.0 926 project_pg_history 107.5s1 acting|up changed in 916 from [2147483647,2147483647,2147483647]/[2147483647,2147483647,2147483647] -1/-1 -> [2147483647,0,2147483647]/[2147483647,0,2147483647] 0/0 2014-05-30 12:54:31.865472 7f67089d9700 15 osd.0 926 project_pg_history 107.5s1 up changed in 916 from [2147483647,2147483647,2147483647] -1 -> [2147483647,0,2147483647] 0 2014-05-30 12:54:31.865476 7f67089d9700 15 osd.0 926 project_pg_history 107.5s1 primary changed in 916 2014-05-30 12:54:31.865479 7f67089d9700 15 osd.0 926 project_pg_history end ec=903 les/c 0/903 916/916/916 2014-05-30 12:54:31.865482 7f67089d9700 15 osd.0 926 calc_priors_during 107.5s1 [903,916) 2014-05-30 12:54:31.865634 7f67089d9700 20 osd.0 926 107.5s1 in epoch 903 was [2147483647,2147483647,2147483647] 2014-05-30 12:54:31.865787 7f67089d9700 20 osd.0 926 107.5s1 in epoch 904 was [2147483647,2147483647,2147483647] 2014-05-30 12:54:31.865893 7f67089d9700 20 osd.0 926 107.5s1 in epoch 905 was [2147483647,5,2147483647] 2014-05-30 12:54:31.866001 7f67089d9700 20 osd.0 926 107.5s1 in epoch 906 was [2147483647,5,2147483647] 2014-05-30 12:54:31.866107 7f67089d9700 20 osd.0 926 107.5s1 in epoch 907 was [2147483647,5,2147483647] 2014-05-30 12:54:31.866214 7f67089d9700 20 osd.0 926 107.5s1 in epoch 908 was [2147483647,5,2147483647] 2014-05-30 12:54:31.866320 7f67089d9700 20 osd.0 926 107.5s1 in epoch 909 was [2147483647,5,2147483647] 2014-05-30 12:54:31.866427 7f67089d9700 20 osd.0 926 107.5s1 in epoch 910 was [2147483647,5,2147483647] 2014-05-30 12:54:31.866534 7f67089d9700 20 osd.0 926 107.5s1 in epoch 911 was [2147483647,5,2147483647] 2014-05-30 12:54:31.866640 7f67089d9700 20 osd.0 926 107.5s1 in epoch 912 was [2147483647,5,2147483647] 2014-05-30 12:54:31.866746 7f67089d9700 20 osd.0 926 107.5s1 in epoch 913 was [2147483647,2147483647,2147483647] 2014-05-30 12:54:31.866853 7f67089d9700 20 osd.0 926 107.5s1 in epoch 914 was [2147483647,2147483647,2147483647] 2014-05-30 12:54:31.866959 7f67089d9700 20 osd.0 926 107.5s1 in epoch 915 was [2147483647,2147483647,2147483647] 2014-05-30 12:54:31.866963 7f67089d9700 10 osd.0 926 calc_priors_during 107.5s1 [903,916) = 5(1),2147483647(0),2147483647(1),2147483647(2) 2014-05-30 12:54:31.866968 7f67089d9700 10 osd.0 926 mkpg 107.5s1 e903 h ec=903 les/c 0/903 916/916/916 : querying priors 5(1),2147483647(0),2147483647(1),2147483647(2) 2014-05-30 12:54:31.866973 7f67089d9700 10 osd.0 926 can_create_pg 107.5s1 - waiting for priors 5(1),2147483647(0),2147483647(1),2147483647(2)
This is running in teuthology, so I believe all the OSDs are running; it should be getting responses. The test timed out waiting for this PG to get created; something might be wrong on the peer it's querying, but this OSD got the request maaany times (starting much earlier than this one).
I believe ubuntu@teuthology:/a/gregf-2014-05-29_16:19:17-rados-wip-client-fast-dispatch-testing-basic-plana/280738 has more examples (6 PGs are stuck creating for it).
Updated by Samuel Just almost 10 years ago
Hmm, waiting for priors probably shouldn't be waiting on OSD -1 (Greg: yep, you were right). Looks like the priors code doesn't think to skip empty osd slots.
Updated by Samuel Just almost 10 years ago
- Status changed from New to In Progress
- Assignee set to Samuel Just
Updated by Samuel Just almost 10 years ago
- Status changed from 7 to Pending Backport
Updated by Samuel Just almost 10 years ago
- Status changed from Pending Backport to Resolved