Bug #6598

closed

osd crash after recreating pool with same name (cuttlefish + bobtail?)

Added by Sage Weil over 10 years ago. Updated over 10 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Date: Sat, 19 Oct 2013 23:53:57 +0400
From: Andrey Korolyov <>
To: "" <>
Subject: [ceph-users] Cuttlefish: pool recreation results in cluster crash

Hello,

I was able to reproduce the following on top of current cuttlefish:

- create pool,
- delete it after all pgs initialized,
- create new pool with same name after, say, ten seconds.

All OSDs die immediately with the attached trace. The problem exists in
bobtail as well.

Thread 1 (Thread 0x7f4fdf42b700 (LWP 28165)):
#0  0x00007f4fe78ee405 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f4fe78f1b5b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f4fe81ec875 in __gnu_cxx::__verbose_terminate_handler () at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:50
#3  0x00007f4fe81ea996 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:40
#4  0x00007f4fe81ea9c3 in std::terminate () at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:50
#5  0x00007f4fe81eabee in __cxxabiv1::__cxa_throw (obj=0x2ec91f0, tinfo=<optimized out>, dest=<optimized out>)
    at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:83
#6  0x000000000090d3fa in ceph::__ceph_assert_fail (assertion=0xa2c697 "0 == \"unexpected error\"", file=<optimized out>, line=2787, 
    func=0xa2f7a0 "unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)") at common/assert.cc:77
#7  0x00000000008211ab in FileStore::_do_transaction (this=this@entry=0x24a4000, t=..., op_seq=op_seq@entry=226873, 
    trans_num=trans_num@entry=0) at os/FileStore.cc:2787
#8  0x0000000000824829 in FileStore::_do_transactions (this=this@entry=0x24a4000, tls=std::list, op_seq=226873, 
    handle=handle@entry=0x7f4fdf42ab80) at os/FileStore.cc:2163
#9  0x00000000008249be in FileStore::_do_op (this=0x24a4000, osr=<optimized out>, handle=...) at os/FileStore.cc:1997
#10 0x0000000000903eca in ThreadPool::worker (this=0x24a4a10, wt=0x24617a0) at common/WorkQueue.cc:119
#11 0x0000000000905170 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:316
#12 0x00007f4fe9663e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007f4fe79aa3dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
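
The three reproduction steps in the report (create, delete, re-create under the same name) can be sketched as a small script. This is only an illustration, not something from the original report: the pool name `repro-6598`, the PG count, and the helper names are hypothetical, the fixed sleep is a crude stand-in for "after all pgs initialized", and the exact `ceph osd pool delete` arguments are an assumption that may differ between releases.

```python
import time
from typing import Callable, List, Sequence


def recreate_pool_commands(pool: str, pg_num: int = 8) -> List[List[str]]:
    """Return the CLI invocations for create / delete / re-create.

    The delete form (name given twice plus the confirmation flag) is an
    assumption about the ceph CLI and may not match every release.
    """
    create = ["ceph", "osd", "pool", "create", pool, str(pg_num)]
    delete = ["ceph", "osd", "pool", "delete", pool, pool,
              "--yes-i-really-really-mean-it"]
    return [create, delete, create]


def run_repro(run: Callable[[Sequence[str]], None],
              pool: str = "repro-6598",      # hypothetical pool name
              pg_num: int = 8,
              settle_seconds: float = 10.0,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Run the three steps, pausing between them."""
    create, delete, recreate = recreate_pool_commands(pool, pg_num)
    run(create)
    sleep(settle_seconds)   # wait for the PGs of the new pool to initialize
    run(delete)
    sleep(settle_seconds)   # "after, say, ten seconds"
    run(recreate)
```

On a test cluster this could be driven with something like `run_repro(lambda cmd: subprocess.run(cmd, check=True))`; passing the runner in as a parameter keeps the sketch usable without a live cluster.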

Actions #1

Updated by Samuel Just over 10 years ago

A quick attempt seems to succeed in current master; checking cuttlefish.

Actions #2

Updated by Samuel Just over 10 years ago

  • Status changed from New to Need More Info

Hmm, I was also unable to reproduce on cuttlefish head. What sha1 are you running? Can you reproduce with

debug osd = 20
debug filestore = 20
debug ms = 1

on the osds?

Actions #3

Updated by Sage Weil over 10 years ago

  • Priority changed from Urgent to High

Actions #4

Updated by Sage Weil over 10 years ago

occurs on cuttlefish, not dumpling.

Actions #5

Updated by Andrey Korolyov over 10 years ago

It seems we accidentally triggered an existing assert(), and in the wild this would be almost impossible to reproduce. Please hold on for a couple of days until the investigation ends.

Actions #6

Updated by Andrey Korolyov over 10 years ago

It was our mistake, which resulted in a race/duplication of the mkcoll() call and the resulting crash on EEXIST. Sorry for the noise.
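
The failure mode described here (a duplicated create-collection operation hitting EEXIST, which the transaction code treats as an unexpected error) can be modeled with a toy example. This is not Ceph code: directories stand in for collections, the function names and the collection name `1.0_head` are hypothetical, and the only detail taken from the trace is the assertion text `0 == "unexpected error"`.

```python
import errno
import os
import tempfile


def mkcoll(root, name):
    """Toy stand-in for collection creation: one directory per collection."""
    os.mkdir(os.path.join(root, name))


def do_transaction(root, ops):
    """Apply ops in order; any OS error is treated as fatal, as in the trace."""
    for op, name in ops:
        if op != "mkcoll":
            raise ValueError("unsupported op: %s" % op)
        try:
            mkcoll(root, name)
        except OSError as e:
            # Mirrors the assertion text seen in frame #6 of the backtrace.
            code = errno.errorcode.get(e.errno, str(e.errno))
            raise AssertionError('0 == "unexpected error" (%s)' % code) from e


def demo():
    """Apply the same mkcoll twice and return the resulting failure message."""
    with tempfile.TemporaryDirectory() as root:
        do_transaction(root, [("mkcoll", "1.0_head")])      # first create succeeds
        try:
            do_transaction(root, [("mkcoll", "1.0_head")])  # duplicate hits EEXIST
        except AssertionError as e:
            return str(e)
    return None
```

In this model the first transaction succeeds and the duplicate one fails, which is consistent with the report that the crash only appeared once the mkcoll() call was issued twice.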

Actions #7

Updated by Loïc Dachary over 10 years ago

  • Status changed from Need More Info to Can't reproduce