Bug #14533

closed

os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error") upgrade:hammer-x-wip-diag-14438-distro-basic-openstack

Added by David Zafman about 8 years ago. Updated over 7 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

teuthology.ovh.sepia.ceph.com:/a/teuthology-2016-01-27_16:15:46-upgrade:hammer-x-wip-diag-14438-distro-basic-openstack/12016

teuthology had not yet gathered information from the test nodes when this was filed, but I was able to connect briefly to the machines and collect the following:

The wip-diag-14438 branch is equivalent to jewel. The OSD was still running hammer.

ubuntu@target077206:/var/lib/ceph/osd/ceph-5/current$ ls -R 3.0s1_head
3.0s1_head:
temp\u3.0s0\u0\u34165\u3__head_00000000__fffffffffffffffb_ffffffffffffffff_1

Object named: temp_3.0s0_0_34165_3
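
The mapping from the on-disk file name to the object name above can be sketched as follows. This is a minimal illustration, not FileStore's actual decoder: it assumes literal underscores are escaped as \u (the only escape visible in this listing), that unescaped underscores separate the fields, and that the field order matches the log lines below (name, key, snap, hash, namespace, pool, generation, shard). The function name and the dictionary keys are hypothetical.

```python
def decode_filestore_name(fname: str) -> dict:
    """Split a FileStore-style file name into its fields (illustrative only)."""
    # Literal underscores inside a field appear as "\u", so a bare "_"
    # can be treated as a field separator.
    fields = fname.split("_")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")

    # Assumption: only the "\u" escape is handled here; other escapes that
    # the real encoder may emit are left untouched.
    def unescape(s: str) -> str:
        return s.replace("\\u", "_")

    # The pool is printed as an unsigned 64-bit hex value; reinterpret it
    # as signed so that fffffffffffffffb becomes -5, as in the log.
    pool = int(fields[5], 16)
    if pool >= 1 << 63:
        pool -= 1 << 64

    return {
        "name": unescape(fields[0]),
        "key": unescape(fields[1]),
        "snap": fields[2],
        "hash": fields[3],
        "namespace": unescape(fields[4]),
        "pool": pool,
        "generation": int(fields[6], 16),
        "shard": int(fields[7], 16),
    }


if __name__ == "__main__":
    listing = r"temp\u3.0s0\u0\u34165\u3__head_00000000__fffffffffffffffb_ffffffffffffffff_1"
    print(decode_filestore_name(listing))
```

Under these assumptions the decoded pool (-5), generation (18446744073709551615) and shard (1) line up with the oid 0/temp_3.0s0_0_34165_3/head//-5/18446744073709551615/1 in the write and setattrs log lines.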


2016-01-27 17:02:12.328833 7f06ab5978c0  0 ceph version 0.94.5-227-gd54840b (d54840bf4a70fc65285bbfdff0c7bf8f579643b1), process ceph-osd, pid 12493
...
2016-01-27 17:24:58.024848 7f069df65700 15 filestore(/var/lib/ceph/osd/ceph-5) write 3.0s1_head/0/temp_3.0s0_0_34165_3/head//-5/18446744073709551615/1 0~4194304
2016-01-27 17:24:58.029039 7f069df65700 10 filestore(/var/lib/ceph/osd/ceph-5) write 3.0s1_head/0/temp_3.0s0_0_34165_3/head//-5/18446744073709551615/1 0~4194304 = 4194304
2016-01-27 17:24:58.029091 7f069df65700 15 filestore(/var/lib/ceph/osd/ceph-5) setattrs 3.0s1_head/0/temp_3.0s0_0_34165_3/head//-5/18446744073709551615/1
2016-01-27 17:24:58.029124 7f069df65700 10 filestore oid: 0/temp_3.0s0_0_34165_3/head//-5/18446744073709551615/1 not skipping op, *spos 34858.0.19
2016-01-27 17:24:58.029150 7f069df65700 10 filestore(/var/lib/ceph/osd/ceph-5) setattrs 3.0s1_head/0/temp_3.0s0_0_34165_3/head//-5/18446744073709551615/1 = 0
2016-01-27 17:24:59.621712 7f06915ed700 10 osd.5 pg_epoch: 840 pg[3.0s1( v 828'531 lc 763'494 (0'0,828'531] local-les=835 n=6 ec=30 les/c 835/691 840/840/683) [1,5,4] r=1 lpr=840 pi=683-839/2 crt=828'531 inactive m=7] on_change_cleanup: Removing oid 0/temp_3.0s0_0_34165_3/head//-5 from the temp collection
2016-01-27 17:24:59.629366 7f069e766700 15 filestore(/var/lib/ceph/osd/ceph-5) remove 3.0s1_TEMP/0/temp_3.0s0_0_34165_3/head//-5/18446744073709551615/1
2016-01-27 17:24:59.629415 7f069e766700 10 filestore(/var/lib/ceph/osd/ceph-5) remove 3.0s1_TEMP/0/temp_3.0s0_0_34165_3/head//-5/18446744073709551615/1 = -2
....
2016-01-27 17:25:25.103061 7f069e766700  5 filestore(/var/lib/ceph/osd/ceph-5) _do_op 0x9026cd0 seq 35628 osr(3.0s1 0x83f2e00)/0x83f2e00 start
2016-01-27 17:25:25.103064 7f069e766700 10 filestore(/var/lib/ceph/osd/ceph-5) _do_transaction on 0xb748140
2016-01-27 17:25:25.103068 7f069e766700 15 filestore(/var/lib/ceph/osd/ceph-5) remove 3.0s1_head/0//head//3/18446744073709551615/1
2016-01-27 17:25:25.103086 7f069e766700 20 filestore(/var/lib/ceph/osd/ceph-5) lfn_unlink: clearing omap on 0//head//3/18446744073709551615/1 in cid 3.0s1_head
2016-01-27 17:25:25.103093 7f069e766700 10 filestore oid: 0//head//3/18446744073709551615/1 not skipping op, *spos 35628.0.0
2016-01-27 17:25:25.103095 7f069e766700 10 filestore  > header.spos 0.0.0
2016-01-27 17:25:25.103097 7f069e766700 20 filestore remove_map_header: removing 5711 oid 0//head//3/18446744073709551615/1
2016-01-27 17:25:25.103103 7f069e766700 20 filestore clear_header: clearing seq 5711
2016-01-27 17:25:25.104726 7f069e766700 10 filestore(/var/lib/ceph/osd/ceph-5) remove 3.0s1_head/0//head//3/18446744073709551615/1 = 0
2016-01-27 17:25:25.105583 7f069e766700  0 filestore(/var/lib/ceph/osd/ceph-5)  error (39) Directory not empty not handled on operation 0x91627a8 (35628.0.1, or op 1, counting from 0)
2016-01-27 17:25:25.105606 7f069e766700  0 filestore(/var/lib/ceph/osd/ceph-5) ENOTEMPTY suggests garbage data in osd data dir
2016-01-27 17:25:25.105609 7f069e766700  0 filestore(/var/lib/ceph/osd/ceph-5)  transaction dump:
{
   "ops": [
       {
           "op_num": 0,
           "op_name": "remove",
           "collection": "3.0s1_head",
           "oid": "0\/\/head\/\/3\/18446744073709551615\/1" 
       },
       {
           "op_num": 1,
           "op_name": "rmcoll",
           "collection": "3.0s1_head" 
       },
       {
           "op_num": 2,
           "op_name": "rmcoll",
           "collection": "3.0s1_TEMP" 
       }
   ]
}

2016-01-27 17:25:25.113874 7f069e766700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f069e766700 time 2016-01-27 17:25:25.105731
os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")
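
The failure mode in the transaction dump (op 0 removes one object, then op 1 removes the collection and gets ENOTEMPTY) can be reproduced outside Ceph with plain directory operations. The sketch below is only an analogy, assuming the collection is a directory and the leftover temp object is an ordinary file in it, as the ls -R output above suggests. The function and file names are hypothetical.

```python
import errno
import os
import tempfile


def replay_remove_then_rmcoll(root: str) -> int:
    """Mimic 'remove' followed by 'rmcoll' on a directory holding a stray file.

    Returns 0 on success, or the errno raised when removing the directory.
    """
    coll = os.path.join(root, "3.0s1_head")
    os.mkdir(coll)

    # One object the transaction knows about, one stray temp object it does not.
    known = os.path.join(coll, "known_object")
    stray = os.path.join(coll, "stray_temp_object")
    for path in (known, stray):
        with open(path, "w"):
            pass

    # op 0: remove the known object; this succeeds.
    os.unlink(known)

    # op 1: remove the collection; the stray file makes this fail.
    try:
        os.rmdir(coll)
    except OSError as e:
        return e.errno
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        err = replay_remove_then_rmcoll(tmp)
        print(err, errno.errorcode.get(err))
```

On Linux the returned value is errno.ENOTEMPTY, which is error 39, the same code reported in the log for op 1.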

Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #14810: "FileStore.cc: 2855: FAILED assert(0 == "unexpected error")" in powercycle-infernalis-testing-basic-smithi (Resolved, 02/18/2016)

Actions #1

Updated by David Zafman about 8 years ago

  • Description updated (diff)
Actions #2

Updated by David Zafman about 8 years ago

  • Assignee changed from David Zafman to Sage Weil
  • Priority changed from Normal to Urgent

Assigned to Sage. I think this will need a backport of the _kludge_temp_object_collection() code, and I don't expect that backport to be simple.

Actions #3

Updated by Yuri Weinstein about 8 years ago

  • Related to Bug #14810: "FileStore.cc: 2855: FAILED assert(0 == "unexpected error")" in powercycle-infernalis-testing-basic-smithi added
Actions #4

Updated by Samuel Just over 7 years ago

  • Status changed from New to Can't reproduce