Bug #10039

closed

OSD can't enter the up state and uses 100% CPU when restarted from the out state.

Added by qiu shanggao over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph version: 0.80.6
Platform: Red Hat 6.5
Hosts: 3
OSDs: 15 (5 per host)

Steps to reproduce:
1. Start the Ceph cluster.
2. Shut down one host.
3. After one day, restart the host.
4. The OSDs start but cannot enter the up state, and every OSD uses 100% CPU.

I then attached gdb to an OSD and found that FileStore::lfn_open returns an error:
int FileStore::lfn_open(coll_t cid,
                        const ghobject_t& oid,
                        bool create,
                        FDRef *outfd,
                        IndexedPath *path,
                        Index *index) {
  ....

  r = (*index)->lookup(oid, path, &exist);
  if (r < 0) {
    derr << "could not find " << oid << " in index: "
         << cpp_strerror(-r) << dendl;
    goto fail;
  }
  r = ::open((*path)->path(), flags, 0644);

The ::open call returns -1.
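When ::open returns -1, the reason is in errno. The following is a minimal sketch, not Ceph code, of capturing that value for a given path; the helper name open_errno is hypothetical and introduced only for illustration.

```cpp
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <iostream>
#include <unistd.h>

// Hypothetical helper (not part of Ceph): try to open `path` read-only and
// return 0 on success, or the errno value that ::open() set on failure.
int open_errno(const char* path) {
  int fd = ::open(path, O_RDONLY);
  if (fd < 0) {
    int err = errno;  // save immediately; later calls may overwrite errno
    std::cerr << "open(" << path << ") failed: " << std::strerror(err) << "\n";
    return err;
  }
  ::close(fd);
  return 0;
}
```

For a file that does not exist, as with the osdmap file shown later in this report, the value would be expected to be ENOENT.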

I then printed path and found that it is:
(gdb) p *(CollectionIndex::Path *) 0x7f881c844160
$6 = {
  full_path = "/ceph-data/ceph-1/current/meta/DIR_5/osdmap.168__0_AC977195__none",
  parent_ref = std::tr1::shared_ptr (count 2) 0x7f881c8464a0,
  parent_coll = {
    static META_COLL = {
      static META_COLL = <same as static member of an already seen type>,
      str = "meta"
    },
    str = "meta"
  }
}

But when I list "/ceph-data/ceph-1/current/meta/DIR_5/", that file is not there:

[root@sandstone0002 DIR_5]# ll osdmap*
-rw-r--r-- 1 sdsadmin sdsadmin 17490 Nov 8 17:34 osdmap.883__0_AC94C795__none

Actions #1

Updated by qiu shanggao over 9 years ago

I fixed this problem by merging the following patch. Thanks.

osd: fix map advance limit to handle map gaps
The recent change in cf25bdf would stop advancing after some number of
epochs, but did not take into consideration the possibility that there
are missing maps. In that case, it is impossible to advance past the gap.

Fix this by increasing the max epoch as we go so that we can always get
beyond the gap.

Signed-off-by: Sage Weil <>
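The idea in the commit message can be illustrated with a small standalone sketch. This is not the actual Ceph patch: the function advance, its parameters, and the use of a std::set to stand in for the maps stored on disk are all assumptions made for illustration. It compares a fixed advance limit with one that is raised whenever a missing map is encountered.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>

using epoch_t = std::uint32_t;

// Hypothetical model of map advancing (not the real OSD code).
// `available` is the set of epochs whose maps exist on disk.
// Starting after `start`, walk forward toward `target`, but no further than
// a limit of `max_advance` epochs. Returns the last epoch actually applied.
epoch_t advance(const std::set<epoch_t>& available,
                epoch_t start, epoch_t target,
                epoch_t max_advance, bool bump_on_gap) {
  epoch_t last_applied = start;
  epoch_t max = start + max_advance;
  for (epoch_t e = start + 1; e <= target && e <= max; ++e) {
    if (available.count(e) == 0) {
      // Missing map: with the fix, raise the limit so the loop can
      // keep going until it gets beyond the gap.
      if (bump_on_gap)
        max = std::max(max, e + max_advance);
      continue;
    }
    last_applied = e;
  }
  return last_applied;
}
```

With maps present only for epochs 1 to 2 and 101 to 110, a start of 2, a target of 110, and a limit of 5, the fixed-limit variant never leaves epoch 2 no matter how often it is retried, while the variant that raises the limit reaches epoch 105 in one pass. Repeated retries that make no progress would be consistent with the 100% CPU symptom reported here, though the report itself does not confirm that mechanism.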

Actions #2

Updated by Sage Weil over 9 years ago

  • Status changed from New to Resolved