Bug #1759

mds/client: truncate size overflow, fails with EINVAL

Added by Sam Lang over 12 years ago. Updated about 12 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My version of ceph is a minor variant of 0.38, running with ext4 and ceph-fuse. It looks like my fs has gotten corrupted somehow. I've seen this assert failure on two of the osds I have running, and it hits the same assertion on restart of the osd. It looks like the EINVAL is actually coming from a truncate of an object, due to the size passed to the truncate being extremely large (18446744073709551615). Any way to debug this or correct it?
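
As an aside, a minimal standalone C++ sketch (not Ceph code, just an illustration) of why that size fails: 18446744073709551615 is (uint64_t)-1, and truncate(2)/ftruncate(2) take a signed off_t, so the kernel sees a negative length and returns EINVAL.

#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

int main() {
  uint64_t size = (uint64_t)-1;   // wraps to 18446744073709551615
  printf("size = %llu\n", (unsigned long long)size);

  int fd = open("/tmp/truncate-demo", O_CREAT | O_WRONLY, 0644);
  if (fd < 0)
    return 1;
  // ftruncate takes a signed off_t; 2^64-1 is seen as -1 and rejected.
  if (ftruncate(fd, (off_t)size) < 0)
    printf("ftruncate: %s\n", strerror(errno));  // Invalid argument
  close(fd);
  return 0;
}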

Log from one of the failed osds:

Nov 29 18:52:29 sug-chifj21 osd.5919711: 7f1dfa92b700 filestore(/srv/ceph/osd.59) error error 22: Invalid argument not handled
Nov 29 18:52:29 sug-chifj21 osd.5919711: ../../src/os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '7f1dfa92b700'#012../../src/os/FileStore.cc: 2407: FAILED assert(0 == "unexpected error")
Nov 29 18:52:29 sug-chifj21 osd.5919711: ceph version (commit:)#012 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x89) [0x9142d9]#012 2: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x199c) [0xa7921e]#012 3: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x105) [0xa76cc1]#012 4: (FileStore::_do_op(FileStore::OpSequencer*)+0x1b9) [0xa7545f]#012 5: (FileStore::OpWQ::_process(FileStore::OpSequencer*)+0x27) [0xa8d445]#012 6: (ThreadPool::WorkQueue<FileStore::OpSequencer>::_void_process(void*)+0x2e) [0xa9c5a6]#012 7: (ThreadPool::worker()+0x42c) [0x914b54]#012 8: (ThreadPool::WorkThread::entry()+0x1c) [0x8ae79e]#012 9: (Thread::_entry_func(void*)+0x23) [0x97cd09]#012 10: (()+0x6d8c) [0x7f1e05a10d8c]#012 11: (clone()+0x6d) [0x7f1e0425204d]
Nov 29 18:52:29 sug-chifj21 osd.5919711: ceph version (commit:)#012 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x89) [0x9142d9]#012 2: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x199c) [0xa7921e]#012 3: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x105) [0xa76cc1]#012 4: (FileStore::_do_op(FileStore::OpSequencer*)+0x1b9) [0xa7545f]#012 5: (FileStore::OpWQ::_process(FileStore::OpSequencer*)+0x27) [0xa8d445]#012 6: (ThreadPool::WorkQueue<FileStore::OpSequencer>::_void_process(void*)+0x2e) [0xa9c5a6]#012 7: (ThreadPool::worker()+0x42c) [0x914b54]#012 8: (ThreadPool::WorkThread::entry()+0x1c) [0x8ae79e]#012 9: (Thread::_entry_func(void*)+0x23) [0x97cd09]#012 10: (()+0x6d8c) [0x7f1e05a10d8c]#012 11: (clone()+0x6d) [0x7f1e0425204d]
Nov 29 18:52:29 sug-chifj21 osd.5919711: ** Caught signal (Aborted) *#012 in thread 7f1dfa92b700
Nov 29 18:52:29 sug-chifj21 osd.5919711: ceph version (commit:)#012 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x914655]#012 2: /usr/bin/ceph-osd() [0xa650ff]#012 3: (()+0xfc60) [0x7f1e05a19c60]#012 4: (gsignal()+0x35) [0x7f1e0419fd05]#012 5: (abort()+0x186) [0x7f1e041a3ab6]#012 6: (_gnu_cxx::_verbose_terminate_handler()+0x11d) [0x7f1e04a566dd]#012 7: (()+0xb9926) [0x7f1e04a54926]#012 8: (()+0xb9953) [0x7f1e04a54953]#012 9: (()+0xb9a5e) [0x7f1e04a54a5e]#012 10: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f3) [0x914443]#012 11: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x199c) [0xa7921e]#012 12: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x105) [0xa76cc1]#012 13: (FileStore::_do_op(FileStore::OpSequencer*)+0x1b9) [0xa7545f]#012 14: (FileStore::OpWQ::_process(FileStore::OpSequencer*)+0x27) [0xa8d445]#012 15: (ThreadPool::WorkQueue<FileStore::OpSequencer>::_void_process(void*)+0x2e) [0xa9c5a6]#012 16: (ThreadPool::worker()+0x42c) [0x914b54]#012 17: (ThreadPool::WorkThread::entry()+0x1c) [0x8ae79e]#012 18: (Thread::_entry_func(void*)+0x23) [0x97cd09]#012 19: (()+0x6d8c) [0x7f1e05a10d8c]#012 20: (clone()+0x6d) [0x7f1e0425204d]

strace truncate error for osd.59:

truncate("/srv/ceph/osd.59/current/0.274_head/1000000001a.00000002__head_9A0B7274", 18446744073709551615) = -1 EINVAL (Invalid argument)

strace truncate error for osd.65:

truncate("/srv/ceph/osd.65/current/0.274_head/1000000001a.00000002__head_9A0B7274", 18446744073709551615) = -1 EINVAL (Invalid argument)

osd.0-crash.log - Syslog-formatted crash log of osd.0 replay (256 KB) David McBride, 12/14/2011 01:53 PM


Related issues

Duplicated by Ceph - Bug #1862: filestore: EINVAL on replay Duplicate 12/28/2011

History

#1 Updated by Sage Weil over 12 years ago

  • Description updated (diff)

Do you by chance have the log preceding the first crash?

Working around this is probably a matter of patching with something like


diff --git a/src/os/FileStore.cc b/src/os/FileStore.cc
index 3c64620..122e155 100644
--- a/src/os/FileStore.cc
+++ b/src/os/FileStore.cc
@@ -2573,6 +2573,8 @@ int FileStore::_remove(coll_t cid, const hobject_t& oid)
 int FileStore::_truncate(coll_t cid, const hobject_t& oid, uint64_t size)
 {
   dout(15) << "truncate " << cid << "/" << oid << " size " << size << dendl;
+  if (replaying && (int64_t)size < 0)
+    return 0;
   int r = lfn_truncate(cid, oid, size);
   dout(10) << "truncate " << cid << "/" << oid << " size " << size << " = " << r << dendl;
   return r;

That will avoid the bug on osd restart, but it would be extremely helpful to see where the toxic truncate request is coming from in the first place. So if you have the old logs, or see this crashing the osd again, crank up the osd logging and let's see where it comes from.

#2 Updated by Sage Weil over 12 years ago

  • Assignee set to Sage Weil
  • Target version set to v0.40

Actually, maybe you can run the wip-truncate branch on the mds and see if it triggers a failed assertion on the MDS? That is probably where the bug is.

Thanks!

#3 Updated by Sage Weil over 12 years ago

Sam, did you get a chance to try this?

#4 Updated by Sage Weil over 12 years ago

  • Position set to 8

#5 Updated by Sam Lang over 12 years ago

Sorry - haven't had a chance yet. I'll try it on Monday.

#6 Updated by Sam Lang over 12 years ago

I lost my setup over the weekend, so I'm not going to be able to try the wip-truncate branch on the deployment to see if it fixes the issue. If it happens again (after a reset of the filesystem), I'll try the wip-truncate branch then.

#7 Updated by Sage Weil over 12 years ago

  • Subject changed from truncate object fails with EINVAL to mds/client: truncate size overflow, fails with EINVAL
  • Position deleted (13)
  • Position set to 13

#8 Updated by Sage Weil over 12 years ago

  • Position deleted (13)
  • Position set to 30

#9 Updated by David McBride over 12 years ago

Hi,

I've run into this precise problem on a small testing cluster that I'm running -- down to the large 64-bit truncate size being exactly the same value, 18446744073709551615.

The main difference is how the error came about: my OSD didn't crash; rather, I deliberately restarted it to perform a kernel upgrade.

Cluster is configured as:
vm-cephhead: mon.0
terra15: osd.0, mds.0
terra16: osd.1
terra17: osd.2
terra18: osd.3
terra19: osd.4
terra20: osd.5

Each terra server has two disks; /dev/sda for OS+configuration+daemon-state+journal, /dev/sdb mounted as a btrfs filesystem on /data/osd for object storage.

Attached is the osd log from terra15, with the various debug logging settings in the default ceph.conf uncommented. Note that I'm using syslog output, hence the formatting.

Attempting to restart the ceph-osd instance running on osd.5 caused the same failure; perhaps this is happenstance, but I suspect I might see similar results if I restart any of my other OSDs.

Finally: I've also had a chance to try running the wip-truncate branch ceph-mds on osd.0; however, it did not produce any assert()s.

Hope this helps. I'm not in a hurry to get my cluster back up and running, and none of the contents are at all important -- so I can leave it down and answer requests for further debug steps if desired.

Cheers,
David

#10 Updated by Sage Weil about 12 years ago

  • Status changed from New to Need More Info
  • Assignee deleted (Sage Weil)

The OSD now returns EINVAL, the MDS asserts if it gets EINVAL, and we have some MDS-side assertions that should catch this before the bad truncate request is sent. Now we wait until we see the MDS crash to find the original bug...
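
To illustrate the idea (a sketch only, not the actual Ceph MDS code; submit_truncate() is a hypothetical helper): the MDS-side guard amounts to refusing to send a truncate whose size has wrapped to (uint64_t)-1, so the assertion fires at the producer of the bad value rather than down at the OSD.

#include <cassert>
#include <cstdint>
#include <cstdio>

void submit_truncate(uint64_t truncate_size) {
  // A size with the top bit set can only come from signed underflow or -1;
  // asserting here points at whoever produced the bad value (bug #1759).
  assert((int64_t)truncate_size >= 0 && "truncate size overflow");
  printf("would send OSD truncate to size %llu\n",
         (unsigned long long)truncate_size);
}

int main() {
  submit_truncate(4096);              // fine
  // submit_truncate((uint64_t)-1);   // would trip the assertion
  return 0;
}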

#11 Updated by Sage Weil about 12 years ago

  • Position deleted (39)
  • Position set to 30

#12 Updated by David McBride about 12 years ago

Unfortunately, my Ceph cluster which was presenting these symptoms has now died -- the remaining set of OSDs didn't have enough space to maintain the requisite number of replicas, and so they all sequentially filled up and fell over.

Working on the assumption that that's not going to be addressed in the immediate future, I'm going to re-initialize it and see if the error returns.

If it does, I'll update this bug.

Cheers,
David

#13 Updated by Sage Weil about 12 years ago

David McBride wrote:

Unfortunately, my Ceph cluster which was presenting these symptoms has now died -- the remaining set of OSDs didn't have enough space to maintain the requisite number of replicas, and so they all sequentially filled up and fell over.

Working on the assumption that that's not going to be addressed in the immediate future, I'm going to re-initialize it and see if the error returns.

Right. Fixing this is a few releases out.

If it does, I'll update this bug.

Great. We should now assert in the MDS before the badness hits the osd, so we can catch the bug in the right place. Hopefully your workload will trigger it. Thanks!

#14 Updated by Sage Weil about 12 years ago

  • Target version deleted (v0.40)
  • Position deleted (53)
  • Position set to 103

#15 Updated by David McBride about 12 years ago

Hi,

I've triggered this failure mode again with a later snapshot of master (commit: a1252463055e2d6816407bd6465e74dea87a0955) without any asserts in the MDS. While running a test workload (running a fairly large rsync into the Ceph filesystem), I triggered a restart of all of the Ceph OSDs (and the MDS). All but one of the OSDs failed to restart, experiencing the same failure mode seen before.

This time, I had logging turned up high on the OSDs. While it makes wading through the log data difficult, a quick grep for our big 2^64-1 value showed many entries which might indicate some bugs:

[...]
2012/01/04 10:47:40 warning     user    terra16 osd.1[11779]: 7f3663cfd700 osd.1 333 OSD::ms_verify_authorizer name=mds.terra15 auid=18446744073709551615
2012/01/04 10:47:49 warning     user    terra17 osd.2[21384]: 7f7515dca700 osd.2 334 OSD::ms_verify_authorizer name=osd.1 auid=18446744073709551615
2012/01/04 10:47:52 warning     user    terra17 osd.2[21384]: 7f75159c6700 osd.2 337 OSD::ms_verify_authorizer name=osd.0 auid=18446744073709551615
2012/01/04 10:47:57 warning     user    terra18 osd.3[32132]: 7fbed0b99700 osd.3 341 OSD::ms_verify_authorizer name=osd.0 auid=18446744073709551615
2012/01/04 10:48:02 warning     user    terra17 osd.2[21384]: 7f7514dba700 osd.2 341 OSD::ms_verify_authorizer name=osd.3 auid=18446744073709551615
[...]

Not obviously related to the issue at hand, but those auids look somewhat suspicious. (Perhaps that's typical?)

2012/01/04 10:50:50 warning     user    terra18 osd.3[32132]: 7fbed36a1700 osd.3 367 pg[0.a1( v 59'1425 (59'1423,59'1425] n=1425 ec=1 les/c 367/367 342/342/342) [3,0,1] r=0 lpr=342 lcod 0'0 mlcod 0'0 !hml active+clean]  truncate_seq 1 > current 0, truncating to 18446744073709551615
2012/01/04 10:50:50 warning     user    terra15 osd.0[3831]: 7f0989469700 filestore(/data/osd) truncate 0.a1_head/10000062024.00000000/head/68a014a1 size 18446744073709551615
2012/01/04 10:50:50 warning     user    terra15 osd.0[3831]: 7f0989469700 filestore(/data/osd) truncate 0.a1_head/10000062024.00000000/head/68a014a1 size 18446744073709551615 = -2
2012/01/04 10:50:50 warning     user    terra16 osd.1[11779]: 7f366cf11700 filestore(/data/osd) truncate 0.a1_head/10000062024.00000000/head/68a014a1 size 18446744073709551615
2012/01/04 10:50:50 warning     user    terra16 osd.1[11779]: 7f366cf11700 filestore(/data/osd) truncate 0.a1_head/10000062024.00000000/head/68a014a1 size 18446744073709551615 = -2
2012/01/04 10:50:53 warning     user    terra18 osd.3[32132]: 7fbed96ad700 filestore(/data/osd) truncate 0.a1_head/10000062024.00000000/head/68a014a1 size 18446744073709551615
2012/01/04 10:50:53 warning     user    terra18 osd.3[32132]: 7fbed96ad700 filestore(/data/osd) truncate 0.a1_head/10000062024.00000000/head/68a014a1 size 18446744073709551615 = -2
[...]
2012/01/04 14:46:28 warning     user    terra15 osd.0[3831]: 7f0989c6a700 filestore(/data/osd) truncate 0.f3_head/1000006392f.0000003c/head/b4c8c7f3 size 18446744073709551615
2012/01/04 14:46:28 warning     user    terra15 osd.0[3831]: 7f0989c6a700 filestore(/data/osd) truncate 0.f3_head/1000006392f.0000003c/head/b4c8c7f3 size 18446744073709551615 = -2
2012/01/04 14:46:28 warning     user    terra18 osd.3[32132]: 7fbed96ad700 filestore(/data/osd) truncate 0.b9_head/1000006392f.00000044/head/410fb9 size 18446744073709551615 = -2
2012/01/04 14:46:28 warning     user    terra16 osd.1[11779]: 7f366c710700 filestore(/data/osd) truncate 0.d7_head/1000006392f.0000005e/head/481efd7 size 18446744073709551615
2012/01/04 14:46:28 warning     user    terra16 osd.1[11779]: 7f366c710700 filestore(/data/osd) truncate 0.d7_head/1000006392f.0000005e/head/481efd7 size 18446744073709551615 = -2
2012/01/04 14:46:28 warning     user    terra18 osd.3[32132]: 7fbed9eae700 filestore(/data/osd) truncate 0.e0_head/1000006392f.0000004f/head/df94b6e0 size 18446744073709551615
2012/01/04 14:46:28 warning     user    terra16 osd.1[11779]: 7f3666f05700 osd.1 369 pg[0.e3( v 369'1449 (369'1447,369'1449] n=1447 ec=1 les/c 367/369 359/359/334) [1,0,5] r=0 lpr=359 mlcod 369'1448 !hml active+clean]  truncate_seq 1 > current 0, truncating to 18446744073709551615
2012/01/04 14:46:28 warning     user    terra16 osd.1[11779]: 7f366cf11700 filestore(/data/osd) truncate 0.36_head/1000006392f.00000051/head/256b8e36 size 18446744073709551615
2012/01/04 14:46:28 warning     user    terra16 osd.1[11779]: 7f366cf11700 filestore(/data/osd) truncate 0.36_head/1000006392f.00000051/head/256b8e36 size 18446744073709551615 = -2
2012/01/04 14:46:28 warning     user    terra16 osd.1[11779]: 7f366cf11700 filestore(/data/osd) truncate 0.e0_head/1000006392f.00000053/head/2af487e0 size 18446744073709551615
2012/01/04 14:46:28 warning     user    terra15 osd.0[3831]: 7f0989469700 filestore(/data/osd) truncate 0.f2_head/1000006392f.0000004e/head/650ba5f2 size 18446744073709551615
2012/01/04 14:46:29 warning     user    terra19 osd.4[21897]: 7fd7af5a0700 osd.4 369 pg[0.22( v 369'737 (369'735,369'737] n=737 ec=1 les/c 367/369 359/359/358) [4,2,5] r=0 lpr=359 mlcod 369'736 !hml active+clean]  truncate_seq 1 > current 0, truncating to 18446744073709551615
2012/01/04 14:46:29 warning     user    terra15 osd.0[3831]: 7f0989469700 filestore(/data/osd) truncate 0.f2_head/1000006392f.0000004e/head/650ba5f2 size 18446744073709551615 = -2
2012/01/04 14:46:29 warning     user    terra18 osd.3[32132]: 7fbed9eae700 filestore(/data/osd) truncate 0.e0_head/1000006392f.0000004f/head/df94b6e0 size 18446744073709551615 = -2
2012/01/04 14:46:29 warning     user    terra18 osd.3[32132]: 7fbed96ad700 filestore(/data/osd) truncate 0.5b_head/1000006392f.00000050/head/b4a4745b size 18446744073709551615
2012/01/04 14:46:29 warning     user    terra18 osd.3[32132]: 7fbed96ad700 filestore(/data/osd) truncate 0.5b_head/1000006392f.00000050/head/b4a4745b size 18446744073709551615 = -2
2012/01/04 14:46:29 warning     user    terra16 osd.1[11779]: 7f366c710700 filestore(/data/osd) truncate 0.10e_head/1000006392f.00000055/head/2ae8c90e size 18446744073709551615
2012/01/04 14:46:29 warning     user    terra16 osd.1[11779]: 7f366c710700 filestore(/data/osd) truncate 0.10e_head/1000006392f.00000055/head/2ae8c90e size 18446744073709551615 = -2
[...]

There's a lot of these. -2 maps to -ENOENT, i.e. an object being truncated that doesn't exist on disk -- so I'm guessing that this doesn't cause any fatal errors at runtime, but causes a restarted OSD to error out when it replays the journal of events to ensure consistency. (Or something like that.)
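
(As a quick sanity check of those return codes -- a tiny standalone C++ snippet, not Ceph code -- the negated errno values map as expected:)

#include <cerrno>
#include <cstdio>
#include <cstring>

int main() {
  // -2 and -22 in the log lines above are negated errno values.
  printf("-2  -> %s\n", strerror(ENOENT));   // No such file or directory
  printf("-22 -> %s\n", strerror(EINVAL));   // Invalid argument
  return 0;
}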

Is this helpful for tracking the problem down? Or is this perfectly normal behaviour?

Cheers,
David

#16 Updated by Samuel Just about 12 years ago

Ok, I reproduced this with osd debugging, but not mds debugging, unfortunately. The logs are at slider:/home/samuelj/archived_logs/negative_one_truncate_bug.

#17 Updated by Sage Weil about 12 years ago

  • Status changed from Need More Info to Resolved

This is a problem with weird truncate_seq/size values in requests, which the osd now cleans up: commit 0ded7e4dac9fb9357afe6bd2fa9f02d0a96ed06c.
