Bug #1943
osd: bad clone transaction on journal replay
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Martin got this with v0.40:
2012-01-14 15:30:23.530993 7f5dac7307a0 journal read_entry 186952585216 : seq 5669866 13905 bytes 2012-01-14 15:30:23.530993 7f5dac7307a0 filestore(/data/osd0) _do_transaction on 0x292f000 2012-01-14 15:30:23.530993 7f5dac7307a0 filestore(/data/osd0) remove 2.44_head/rb.0.3.000000000814/head/2fc5a244 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) remove 2.44_head/rb.0.3.000000000814/head/2fc5a244 = 0 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) write 2.44_head/rb.0.3.000000000814/head/2fc5a244 610304~12288 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) queue_flusher ep 0 fd 524 610304~12288 qlen 512 hit flusher_max_fds 512, skipp ing async flush 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) write 2.44_head/rb.0.3.000000000814/head/2fc5a244 610304~12288 = 12288 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) clone_range 2.44_head/rb.0.3.000000000814/2/2fc5a244 -> 2.44_head/rb.0.3.00000 0000814/head/2fc5a244 0~610304 to 0 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) _do_clone_range 0~610304 to 0 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) _do_clone_range cloning 0~610304 to 0 = 0 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) _do_clone_range failed CLONE_RANGE call with -EINVAL, using copy 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) _do_copy_range 0~610304 to 0 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) FileStore::_do_copy_range got short read result at 524 of 524~610304 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) _do_copy_range 0~610304 to 0 = -34 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) clone_range 2.44_head/rb.0.3.000000000814/2/2fc5a244 -> 2.44_head/rb.0.3.00000 0000814/head/2fc5a244 0~610304 to 0 = -34 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) error (34) Numerical result out of range not handled on operation 30 (op num 3, counting from 1) 2012-01-14 15:30:23.540993 7f5dac7307a0 filestore(/data/osd0) transaction dump: 1: remove 2.44_head rb.0.3.000000000814/head/2fc5a244 2: write 2.44_head rb.0.3.000000000814/head/2fc5a244 610304~12288 (12288) 3: clone_range 2.44_head rb.0.3.000000000814/2/2fc5a244 -> rb.0.3.000000000814/head/2fc5a244 0~610304 -> 0 4: clone_range 2.44_head rb.0.3.000000000814/2/2fc5a244 -> rb.0.3.000000000814/head/2fc5a244 622592~3571712 -> 622592 5: setattrs 2.44_head rb.0.3.000000000814/head/2fc5a244 {_=buffer::ptr(12779~180 0x29e21eb in raw 0x29df000 len 13905 nref 4),snapset=buffer ::ptr(12974~102 0x29e22ae in raw 0x29df000 len 13905 nref 4)} 6: coll_setattr 2.44_head info (353) 7: truncate meta pginfo_2.44/0/28d2ff47 0 8: write meta pginfo_2.44/0/28d2ff47 0~326 (326) os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t)', in thread '7f5dac7307a0' os/FileStore.cc: 2438: FAILED assert(0 == "unexpected error") ceph version 0.40-6-g8d271f4 (commit:8d271f439ed9a2cb4e2dd9c3b22b50dde78f6bc0) 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long)+0x2d73) [0x70ee63] 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x77) [0x70f7f7] 3: (JournalingObjectStore::journal_replay(unsigned long)+0x92a) [0x72641a] 4: (FileStore::mount()+0x1dd9) [0x713389] 5: /usr/bin/ceph-osd() [0x54835a] 6: (OSD::convertfs(std::string const&, std::string const&)+0xbb) [0x54957b] 7: (main()+0x3240) [0x4c1ea0] 8: (__libc_start_main()+0xff) [0x7f5daa87ceff] 9: /usr/bin/ceph-osd() [0x4be849] ceph version 0.40-6-g8d271f4 (commit:8d271f439ed9a2cb4e2dd9c3b22b50dde78f6bc0) 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long)+0x2d73) [0x70ee63] 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x77) [0x70f7f7] 3: (JournalingObjectStore::journal_replay(unsigned long)+0x92a) [0x72641a]
It looks to me like the recovery code did the recover of head vs clone in the wrong order and issued a bad transaction.
Related issues
History
#1 Updated by Sage Weil over 11 years ago
- Status changed from New to Need More Info
#2 Updated by Sage Weil over 11 years ago
hit this again on ubuntu@teuthology:/var/lib/teuthworker/archive/nightly_coverage_2012-01-27-a/9261. thrashing.
added more btrfs debug to console
#3 Updated by Sage Weil over 11 years ago
Sage Weil wrote:
hit this again on ubuntu@teuthology:/var/lib/teuthworker/archive/nightly_coverage_2012-01-27-a/9261. thrashing.
added more btrfs debug to console
whoops, that was meant for #1975
#4 Updated by Sage Weil over 11 years ago
- Assignee set to Sage Weil
I'm going to disable the clone stuff for snap objects until the push/pull code is rewritten (in a non-buggy way).
#5 Updated by Sage Weil over 11 years ago
- Target version changed from v0.41 to v0.42
#6 Updated by Sage Weil over 11 years ago
- Priority changed from High to Normal
#7 Updated by Sage Weil over 11 years ago
- Status changed from Need More Info to Duplicate