Bug #2598

closed

filestore: error during upgrade

Added by Sage Weil almost 12 years ago. Updated almost 12 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)

Description

from ML:

I tried updating one of our OSDs from stable 0.47-2 to the latest next
branch, and it started updating the filestore and failed.
After that, neither the next-branch OSD nor the stable OSD would start with this
filestore anymore.
Is there something wrong with the filestore update?

Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134135 7ffed3e35780 0 filestore(/data/osd11) mount FIEMAP ioctl is supported and appears to work
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134163 7ffed3e35780 0 filestore(/data/osd11) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option 
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134476 7ffed3e35780 0 filestore(/data/osd11) mount did NOT detect btrfs 
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134485 7ffed3e35780 0 filestore(/data/osd11) mount syncfs(2) syscall not support by glibc 
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134513 7ffed3e35780 0 filestore(/data/osd11) mount no syncfs(2), must use sync(2). 
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134514 7ffed3e35780 0 filestore(/data/osd11) mount WARNING: multiple ceph-osd daemons on the same host will be slow
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134551 7ffed3e35780 -1 filestore(/data/osd11) FileStore::mount : stale version stamp detected: 2. Proceeding, o_update is set, DO NOT USE THIS OPTION IF YOU DO NOT KNOW WHAT IT DOES. More details can be found on the wiki. 
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134585 7ffed3e35780 0 filestore(/data/osd11) mount found snaps <> 
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.531974 7ffed3e35780 0 filestore(/data/osd11) mount: enabling WRITEAHEAD journal mode: btrfs not detected 
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.543721 7ffed3e35780 1 journal _open /dev/sdb1 fd 18: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 0 
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.588059 7ffed3e35780 1 journal _open /dev/sdb1 fd 18: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 0 
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.588905 7ffed3e35780 -1 FileStore is old at version 2. Updating... 
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.588914 7ffed3e35780 -1 Removing tmp pgs 
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.594362 7ffed3e35780 -1 Getting collections 
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.594369 7ffed3e35780 -1 597 to process. 
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.595195 7ffed3e35780 -1 0/597 processed 
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.595213 7ffed3e35780 -1 Updating collection omap current version is 0 
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.662274 7ffed3e35780 -1 os/FlatIndex.cc: In function 'virtual int FlatIndex::collection_list_partial(const hobject_t&, int, int, snapid_t, std::vector<hobject_t>*, hobject_t*)' thread 7ffed3e35780 time 2012-06-16 14:10:12.637479
os/FlatIndex.cc: 386: FAILED assert(0)

 ceph version 0.47.2-500-g1e899d0 (commit:1e899d08e61bbba0af6f3600b6bc9a5fc9e5c2e9)
 1: /usr/local/bin/ceph-osd() [0x6b337d]
 2: (FileStore::collection_list_partial(coll_t, hobject_t, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x9c) [0x67b24c]
 3: (OSD::convert_collection(ObjectStore*, coll_t)+0x529) [0x5b90e9]
 4: (OSD::do_convertfs(ObjectStore*)+0x46f) [0x5b9b9f]
 5: (OSD::convertfs(std::string const&, std::string const&)+0x47) [0x5ba127]
 6: (main()+0x967) [0x531d07]
 7: (__libc_start_main()+0xfd) [0x7ffed1d8aead]
 8: /usr/local/bin/ceph-osd() [0x5357b9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
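The upgrade steps in the log above ("Getting collections", "597 to process", "0/597 processed", "Updating collection omap ...", then the assert) suggest a conversion loop of roughly the following shape. This is a minimal, hypothetical C++ sketch, not Ceph's OSD::do_convertfs: `convert_all` and `convert_one` are illustrative names, the progress interval is arbitrary, and the failure on `omap` is modeled as a `false` return rather than the real `FAILED assert(0)`.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-collection conversion step.
// Returns false when handed something that is not a PG collection
// (here, the "omap" directory); in the real daemon this situation
// surfaced as "FAILED assert(0)" instead of a return value.
bool convert_one(const std::string& coll) {
  if (coll == "omap") {
    return false;
  }
  // ... rewrite this collection's on-disk format here ...
  return true;
}

// Hypothetical upgrade loop mirroring the log messages: announce how many
// collections need work, then convert them one by one with progress output.
// Returns how many were converted before the first failure.
std::size_t convert_all(const std::vector<std::string>& collections) {
  std::cerr << collections.size() << " to process." << std::endl;
  std::size_t done = 0;
  for (const auto& coll : collections) {
    if (done % 100 == 0) {  // progress interval chosen arbitrarily
      std::cerr << done << "/" << collections.size() << " processed" << std::endl;
    }
    std::cerr << "Updating collection " << coll << std::endl;
    if (!convert_one(coll)) {
      // Stopping here would leave the store only partly converted.
      std::cerr << "cannot convert " << coll << "; aborting" << std::endl;
      break;
    }
    ++done;
  }
  return done;
}
```

If the listing hands `omap` to the loop first, as in the log, nothing gets converted before the failure, which fits the report that neither daemon version could use the store afterwards.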

Actions #1

Updated by Sage Weil almost 12 years ago

  • Description updated
Actions #2

Updated by Samuel Just almost 12 years ago

That's odd: it's updating the omap directory as if it were a collection. list_collections (FileStore.cc:4413) should not have returned omap as a collection.
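The check described here can be illustrated with a small filter over directory entries. This is a hypothetical C++ sketch, not the code in FileStore.cc or in the commit that fixed the bug: the `DirEntry` type and the name `list_collections_sketch` are made up for illustration. It only shows the idea that an enumeration of collection directories must skip non-collection entries such as `omap` before anything tries to convert them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One directory entry as returned by a readdir-style scan (hypothetical type).
struct DirEntry {
  std::string name;
  bool is_dir;
};

// Hypothetical filter: keep only entries that can be PG collections.
// Skips regular files, "." and "..", and the "omap" directory, which holds
// object-map data rather than a collection.
std::vector<std::string> list_collections_sketch(const std::vector<DirEntry>& entries) {
  std::vector<std::string> out;
  for (const auto& e : entries) {
    if (!e.is_dir) continue;                          // plain files are never collections
    if (e.name == "." || e.name == "..") continue;    // directory self/parent links
    if (e.name == "omap") continue;                   // object map, not a collection
    out.push_back(e.name);
  }
  return out;
}
```

Filtering by name at listing time keeps every caller (including an upgrade pass) from having to special-case `omap` itself.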

Actions #3

Updated by Sage Weil almost 12 years ago

  • Status changed from New to 7

Oh, der... I'm pretty sure 82cb3d61ff4f200e0a9040e6381a9eed32db9de1 fixes this.

Actions #4

Updated by Samuel Just almost 12 years ago

Ah... should have tested on another filesystem.

Actions #5

Updated by Simon Frerichs almost 12 years ago

Thanks.
The bug seems to be fixed.

Actions #6

Updated by Simon Frerichs almost 12 years ago

Hi,

The filestore update completed.
When I start the "updated" OSD, the whole cluster starts lagging.
Is the next-branch OSD incompatible with 0.47-2 OSDs?

We have this error in the logfile:

2012-06-18 10:24:18.299444 osd.14 46.19.94.2:6800/29435 3298 : [ERR] 2.110 push d09d4910/rb.0.29.0000000001ab/head v 8390'3328971 to osd.2 failed because local copy is 8402'3329186

Actions #7

Updated by Sage Weil almost 12 years ago

  • Status changed from 7 to Resolved

Thanks!

Actions #8

Updated by Sage Weil almost 12 years ago

Simon Frerichs wrote:

[...]

That's something else; moved to #2602