Project

General

Profile

Bug #8507

OSD: failing to create PG (when splitting?) in rados/test.sh

Added by Greg Farnum about 5 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
Start date:
06/02/2014
Due date:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

ubuntu@teuthology:/a/gregf-2014-05-29_16:19:17-rados-wip-client-fast-dispatch-testing-basic-plana/280674/remote/plana15/log/ceph-osd.0.log.gz

2014-05-30 12:54:31.864524 7f67089d9700  1 -- 10.214.131.25:6809/22687 <== mon.1 10.214.131.28:6789/0 5 ==== osd_pg_create(pg107.5,903; ) v2 ==== 60+0+0 (3774340542 0 0) 0x1d54400 con 0x2f35160
2014-05-30 12:54:31.864538 7f67089d9700 10 osd.0 926 do_waiters -- start
2014-05-30 12:54:31.864542 7f67089d9700 10 osd.0 926 do_waiters -- finish
2014-05-30 12:54:31.864544 7f67089d9700 20 osd.0 926 _dispatch 0x1d54400 osd_pg_create(pg107.5,903; ) v2
2014-05-30 12:54:31.864606 7f67089d9700 10 osd.0 926 handle_pg_create osd_pg_create(pg107.5,903; ) v2
2014-05-30 12:54:31.864611 7f67089d9700 15 osd.0 926 require_same_or_newer_map 926 (i am 926) 0x1d54400
2014-05-30 12:54:31.864627 7f67089d9700 20 osd.0 926 mkpg 107.5 e903
2014-05-30 12:54:31.864766 7f67089d9700 15 osd.0 926 project_pg_history 107.5s1 from 903 to 926, start ec=903 les/c 0/903 0/0/0
2014-05-30 12:54:31.865465 7f67089d9700 15 osd.0 926 project_pg_history 107.5s1 acting|up changed in 916 from [2147483647,2147483647,2147483647]/[2147483647,2147483647,2147483647] -1/-1 -> [2147483647,0,2147483647]/[2147483647,0,2147483647] 0/0
2014-05-30 12:54:31.865472 7f67089d9700 15 osd.0 926 project_pg_history 107.5s1 up changed in 916 from [2147483647,2147483647,2147483647] -1 -> [2147483647,0,2147483647] 0
2014-05-30 12:54:31.865476 7f67089d9700 15 osd.0 926 project_pg_history 107.5s1 primary changed in 916
2014-05-30 12:54:31.865479 7f67089d9700 15 osd.0 926 project_pg_history end ec=903 les/c 0/903 916/916/916
2014-05-30 12:54:31.865482 7f67089d9700 15 osd.0 926 calc_priors_during 107.5s1 [903,916)
2014-05-30 12:54:31.865634 7f67089d9700 20 osd.0 926   107.5s1 in epoch 903 was [2147483647,2147483647,2147483647]
2014-05-30 12:54:31.865787 7f67089d9700 20 osd.0 926   107.5s1 in epoch 904 was [2147483647,2147483647,2147483647]
2014-05-30 12:54:31.865893 7f67089d9700 20 osd.0 926   107.5s1 in epoch 905 was [2147483647,5,2147483647]
2014-05-30 12:54:31.866001 7f67089d9700 20 osd.0 926   107.5s1 in epoch 906 was [2147483647,5,2147483647]
2014-05-30 12:54:31.866107 7f67089d9700 20 osd.0 926   107.5s1 in epoch 907 was [2147483647,5,2147483647]
2014-05-30 12:54:31.866214 7f67089d9700 20 osd.0 926   107.5s1 in epoch 908 was [2147483647,5,2147483647]
2014-05-30 12:54:31.866320 7f67089d9700 20 osd.0 926   107.5s1 in epoch 909 was [2147483647,5,2147483647]
2014-05-30 12:54:31.866427 7f67089d9700 20 osd.0 926   107.5s1 in epoch 910 was [2147483647,5,2147483647]
2014-05-30 12:54:31.866534 7f67089d9700 20 osd.0 926   107.5s1 in epoch 911 was [2147483647,5,2147483647]
2014-05-30 12:54:31.866640 7f67089d9700 20 osd.0 926   107.5s1 in epoch 912 was [2147483647,5,2147483647]
2014-05-30 12:54:31.866746 7f67089d9700 20 osd.0 926   107.5s1 in epoch 913 was [2147483647,2147483647,2147483647]
2014-05-30 12:54:31.866853 7f67089d9700 20 osd.0 926   107.5s1 in epoch 914 was [2147483647,2147483647,2147483647]
2014-05-30 12:54:31.866959 7f67089d9700 20 osd.0 926   107.5s1 in epoch 915 was [2147483647,2147483647,2147483647]
2014-05-30 12:54:31.866963 7f67089d9700 10 osd.0 926 calc_priors_during 107.5s1 [903,916) = 5(1),2147483647(0),2147483647(1),2147483647(2)
2014-05-30 12:54:31.866968 7f67089d9700 10 osd.0 926 mkpg 107.5s1 e903 h ec=903 les/c 0/903 916/916/916 : querying priors 5(1),2147483647(0),2147483647(1),2147483647(2)
2014-05-30 12:54:31.866973 7f67089d9700 10 osd.0 926 can_create_pg 107.5s1 - waiting for priors 5(1),2147483647(0),2147483647(1),2147483647(2)

This is running in teuthology, so I believe all the OSDs are running; it should be getting responses. The test timed out waiting for this PG to get created; something might be wrong on the peer it's querying, but this OSD got the request maaany times (starting much earlier than this one).

I believe ubuntu@teuthology:/a/gregf-2014-05-29_16:19:17-rados-wip-client-fast-dispatch-testing-basic-plana/280738 has more examples (6 PGs are stuck creating for it).


Related issues

Duplicated by Ceph - Bug #8610: osd: calc_priors_during doesn't ignore CRUSH_ITEM_NONE Duplicate 06/16/2014

Associated revisions

Revision 0bd6f679 (diff)
Added by Samuel Just about 5 years ago

OSD::calc_priors_during: handle CRUSH_ITEM_NONE correctly

Fixes: #8507
Signed-off-by: Samuel Just <>

Revision 50dc2462 (diff)
Added by Samuel Just about 5 years ago

OSD::calc_priors_during: handle CRUSH_ITEM_NONE correctly

Fixes: #8507
Signed-off-by: Samuel Just <>
(cherry picked from commit 0bd6f6797c69af9aff851033c57c42121671c684)

Conflicts:
src/osd/OSD.cc

History

#1 Updated by Samuel Just about 5 years ago

  • Priority changed from High to Urgent

#2 Updated by Samuel Just about 5 years ago

Hmm, waiting for priors probably shouldn't be waiting on OSD -1 (Greg: yep, you were right). Looks like the priors code doesn't think to skip empty osd slots.

#3 Updated by Samuel Just about 5 years ago

  • Status changed from New to In Progress
  • Assignee set to Samuel Just

#4 Updated by Samuel Just about 5 years ago

  • Status changed from In Progress to Testing

#5 Updated by Samuel Just about 5 years ago

  • Status changed from Testing to Pending Backport

#6 Updated by Samuel Just about 5 years ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF