Project

General

Profile

Bug #1943

osd: bad clone transaction on journal replay

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
Category:
OSD
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Martin got this with v0.40:


2012-01-14 15:30:23.530993 7f5dac7307a0 journal read_entry 186952585216 : seq 5669866 13905 bytes
2012-01-14 15:30:23.530993 7f5dac7307a0 filestore(/data/osd0) _do_transaction on 0x292f000
2012-01-14 15:30:23.530993 7f5dac7307a0 filestore(/data/osd0) remove 2.44_head/rb.0.3.000000000814/head/2fc5a244
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) remove 2.44_head/rb.0.3.000000000814/head/2fc5a244 = 0
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) write 2.44_head/rb.0.3.000000000814/head/2fc5a244 610304~12288
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) queue_flusher ep 0 fd 524 610304~12288 qlen 512 hit flusher_max_fds 512, skipp
ing async flush
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) write 2.44_head/rb.0.3.000000000814/head/2fc5a244 610304~12288 = 12288
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) clone_range 2.44_head/rb.0.3.000000000814/2/2fc5a244 -> 2.44_head/rb.0.3.00000
0000814/head/2fc5a244 0~610304 to 0
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) _do_clone_range 0~610304 to 0
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) _do_clone_range cloning 0~610304 to 0 = 0
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) _do_clone_range failed CLONE_RANGE call with -EINVAL, using copy
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) _do_copy_range 0~610304 to 0
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) FileStore::_do_copy_range got short read result at 524 of 524~610304
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) _do_copy_range 0~610304 to 0 = -34
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) clone_range 2.44_head/rb.0.3.000000000814/2/2fc5a244 -> 2.44_head/rb.0.3.00000
0000814/head/2fc5a244 0~610304 to 0 = -34
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0)  error (34) Numerical result out of range not handled on operation 30 (op num 
3, counting from 1)
2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0)  transaction dump:
1: remove 2.44_head rb.0.3.000000000814/head/2fc5a244
2: write 2.44_head rb.0.3.000000000814/head/2fc5a244 610304~12288 (12288)
3: clone_range 2.44_head rb.0.3.000000000814/2/2fc5a244 -> rb.0.3.000000000814/head/2fc5a244 0~610304 -> 0
4: clone_range 2.44_head rb.0.3.000000000814/2/2fc5a244 -> rb.0.3.000000000814/head/2fc5a244 622592~3571712 -> 622592
5: setattrs 2.44_head rb.0.3.000000000814/head/2fc5a244 {_=buffer::ptr(12779~180 0x29e21eb in raw 0x29df000 len 13905 nref 4),snapset=buffer
::ptr(12974~102 0x29e22ae in raw 0x29df000 len 13905 nref 4)}
6: coll_setattr 2.44_head info (353)
7: truncate meta pginfo_2.44/0/28d2ff47 0
8: write meta pginfo_2.44/0/28d2ff47 0~326 (326)

os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t)', in thread '7f5dac7307a0'
os/FileStore.cc: 2438: FAILED assert(0 == "unexpected error")
 ceph version 0.40-6-g8d271f4 (commit:8d271f439ed9a2cb4e2dd9c3b22b50dde78f6bc0)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long)+0x2d73) [0x70ee63]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x77) [0x70f7f7]
 3: (JournalingObjectStore::journal_replay(unsigned long)+0x92a) [0x72641a]
 4: (FileStore::mount()+0x1dd9) [0x713389]
 5: /usr/bin/ceph-osd() [0x54835a]
 6: (OSD::convertfs(std::string const&, std::string const&)+0xbb) [0x54957b]
 7: (main()+0x3240) [0x4c1ea0]
 8: (__libc_start_main()+0xff) [0x7f5daa87ceff]
 9: /usr/bin/ceph-osd() [0x4be849]
 ceph version 0.40-6-g8d271f4 (commit:8d271f439ed9a2cb4e2dd9c3b22b50dde78f6bc0)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long)+0x2d73) [0x70ee63]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x77) [0x70f7f7]
 3: (JournalingObjectStore::journal_replay(unsigned long)+0x92a) [0x72641a]

It looks to me like the recovery code did the recover of head vs clone in the wrong order and issued a bad transaction.


Related issues

Duplicates Ceph - Bug #2002: osd: racy push/pull for clones Resolved 01/30/2012
Blocked by Ceph - Feature #1944: osd: dump journal Resolved 01/14/2012

History

#1 Updated by Sage Weil over 9 years ago

  • Status changed from New to Need More Info

#2 Updated by Sage Weil over 9 years ago

hit this again on ubuntu@teuthology:/var/lib/teuthworker/archive/nightly_coverage_2012-01-27-a/9261. thrashing.

added more btrfs debug to console

#3 Updated by Sage Weil over 9 years ago

Sage Weil wrote:

hit this again on ubuntu@teuthology:/var/lib/teuthworker/archive/nightly_coverage_2012-01-27-a/9261. thrashing.

added more btrfs debug to console

whoops, that was meant for #1975

#4 Updated by Sage Weil over 9 years ago

  • Assignee set to Sage Weil

I'm going to disable the clone stuff for snap objects until the push/pull code is rewritten (in a non-buggy way).

#5 Updated by Sage Weil over 9 years ago

  • Target version changed from v0.41 to v0.42

#6 Updated by Sage Weil over 9 years ago

  • Priority changed from High to Normal

#7 Updated by Sage Weil over 9 years ago

  • Status changed from Need More Info to Duplicate

Also available in: Atom PDF