Bug #1862

closed

filestore: EINVAL on replay

Added by Marco Aroldi over 12 years ago. Updated over 12 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: OSD
% Done: 0%

Description

Hello,
I'm testing ceph 0.39 on two VMs (Ubuntu 11.10) on Hyper-V with all Linux Integration Components installed.
Each VM has 2 OSDs on 2 btrfs-formatted devices (/dev/sdb and /dev/sdc).
First machine: Mon, Mds, Osd.1 and Osd.2
Second machine: Osd.3 and Osd.4

After 1 hour and 20 minutes I experienced some network trouble, so I rebooted the machines.
At 12:10 osd.1 and osd.2 started to crash. See log attached.
I'm not able to bring those OSDs back online.

Any help is appreciated.
Marco


Files

osd.1.log.zip (1.07 MB) osd.1.log.zip Marco Aroldi, 12/28/2011 07:25 AM
osd.1.log-0.39-195-ge18b1c9 (27.1 KB) osd.1.log-0.39-195-ge18b1c9 Marco Aroldi, 12/29/2011 03:33 AM
osd.2.log-0.39-195-ge18b1c9 (10.8 KB) osd.2.log-0.39-195-ge18b1c9 Marco Aroldi, 12/29/2011 03:33 AM

Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #1759: mds/client: truncate size overflow, fails with EINVAL (Resolved, 11/29/2011)

Actions #1

Updated by Sage Weil over 12 years ago

  • Category set to OSD
  • Status changed from New to Need More Info
  • Target version set to v0.40

Can you try running the latest master code and restarting ceph-osd? Specifically, commit 7133a2faf0ae0710b7cbd9801c64767172d48faf will dump the contents of the transaction that failed to replay to the log.

Thanks!

Actions #2

Updated by Sage Weil over 12 years ago

  • Subject changed from "Osd crashes after a while" to "filestore: EINVAL on replay"

Actions #3

Updated by Marco Aroldi over 12 years ago

Hi Sage,
I'm sorry but I don't understand the steps requested.
Please, could you explain a little bit more?

Actions #4

Updated by Marco Aroldi over 12 years ago

Hi,
I've downloaded and compiled the latest code from the git repository.
I ran "ceph-osd -i 1 --debug_ms 20" and "/etc/init.d/ceph -a restart", but nothing has changed.
See the attached logs.

Actions #5

Updated by Sage Weil over 12 years ago

  • Status changed from Need More Info to Duplicate

Aha, this is actually #1759. If you apply the patch in that bug report it'll get your OSDs up and running again. The master branch also has some additional checks and asserts now that will catch the MDS bug on the MDS side (instead of clobbering the OSD), which should help us find the actual problem. Thanks!

Actions #6

Updated by Marco Aroldi over 12 years ago

Hmmm
I have another problem: I've tried the patch in #1759 but I get an error at compile time:

CXX    libos_la-FileStore.lo
os/FileStore.cc: In member function 'int FileStore::_truncate(coll_t, const hobject_t&, uint64_t)':
os/FileStore.cc:2608:21: error: 'int64' was not declared in this scope
os/FileStore.cc:2608:27: error: expected ')' before 'size'
make[3]: *** [libos_la-FileStore.lo] Error 1
make[3]: Leaving directory `/root/ceph/src'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/root/ceph/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/root/ceph/src'
make: *** [all-recursive] Error 1

This is my git diff:

diff --git a/src/os/FileStore.cc b/src/os/FileStore.cc
index d25d974..910b52c 100644
--- a/src/os/FileStore.cc
+++ b/src/os/FileStore.cc
@@ -2605,6 +2605,8 @@ int FileStore::_remove(coll_t cid, const hobject_t& oid)
int FileStore::_truncate(coll_t cid, const hobject_t& oid, uint64_t size) {
dout(15) << "truncate " << cid << "/" << oid << " size " << size << dendl;
+ if (replaying && (int64)size < 0)
+ return 0;
int r = lfn_truncate(cid, oid, size);
dout(10) << "truncate " << cid << "/" << oid << " size " << size << " = " << r << dendl;
return r;

Actions #7

Updated by Sage Weil over 12 years ago

Marco Aroldi wrote:

Hmmm
I have another problem: I've tried the patch in #1759 but I get an error at compile time:

CXX libos_la-FileStore.lo
os/FileStore.cc: In member function 'int FileStore::_truncate(coll_t, const hobject_t&, uint64_t)':
os/FileStore.cc:2608:21: error: 'int64' was not declared in this scope

Whoops, that should be int64_t, not int64.
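
For anyone hitting the same compile error: `int64` is not a standard C++ type, whereas `int64_t` is the fixed-width signed type from `<cstdint>` (or `<stdint.h>`). The check in the patch reinterprets the unsigned `size` as signed, so a value with its top bit set (for example, an overflowed truncate size) shows up as negative. Below is a minimal standalone sketch of that check. It is not Ceph code, and the helper names `size_looks_negative` and `self_check` are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Ceph): returns true when an unsigned
// 64-bit size has its top bit set, i.e. it reads as negative once
// converted to int64_t. The conversion is modular on common platforms
// and guaranteed to be so since C++20.
constexpr bool size_looks_negative(uint64_t size) {
  return static_cast<int64_t>(size) < 0;
}

// Small self-check exercising the helper with ordinary and overflowed sizes.
inline void self_check() {
  assert(!size_looks_negative(0));
  assert(!size_looks_negative(4096));
  assert(size_looks_negative(static_cast<uint64_t>(-1)));  // 2^64 - 1
  assert(size_looks_negative(uint64_t{1} << 63));          // smallest such value
}
```

In the patch, the same cast is combined with the `replaying` flag, so a bogus truncate is skipped only during journal replay.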

Actions #8

Updated by Marco Aroldi over 12 years ago

Hello Sage,
int64_t did the trick and now the OSDs are back online again!

Thank you
Marco
