Bug #5689


ceph-fuse crashed after upgrading from cuttlefish to dumpling

Added by Tamilarasi muthamizhan almost 11 years ago. Updated almost 11 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When trying to run a workload after upgrading from cuttlefish v0.61.4 to v0.66, ceph-fuse crashed, and the client side is also seeing "wrong node" messages.

Attaching the segfault backtrace and log messages from the client.

2013-07-19 17:13:16.081778 7effc5884780  0 ceph version 0.66-752-g8c5e1db (8c5e1db4fb76b5e1fcf6721ad210f143a571d7b8), process ceph-fuse, pid 627
2013-07-19 17:13:16.091001 7effbe1ac700  0 -- 10.214.134.136:0/627 >> 10.214.135.138:6806/13815 pipe(0x21ec280 sd=7 :44795 s=1 pgs=0 cs=0 l=0 c=0x21e9dc0).connect claims to be 10.214.135.138:6806/14589 not
 10.214.135.138:6806/13815 - wrong node!
2013-07-19 17:13:16.091103 7effbe1ac700  0 -- 10.214.134.136:0/627 >> 10.214.135.138:6806/13815 pipe(0x21ec280 sd=7 :44795 s=1 pgs=0 cs=0 l=0 c=0x21e9dc0).fault
2013-07-19 17:13:16.091821 7effbe1ac700  0 -- 10.214.134.136:0/627 >> 10.214.135.138:6806/13815 pipe(0x21ec280 sd=7 :44796 s=1 pgs=0 cs=0 l=0 c=0x21e9dc0).connect claims to be 10.214.135.138:6806/14589 not
 10.214.135.138:6806/13815 - wrong node!
2013-07-19 17:13:16.292890 7effbe1ac700  0 -- 10.214.134.136:0/627 >> 10.214.135.138:6806/13815 pipe(0x21ec280 sd=7 :44797 s=1 pgs=0 cs=0 l=0 c=0x21e9dc0).connect claims to be 10.214.135.138:6806/14589 not
 10.214.135.138:6806/13815 - wrong node!
2013-07-19 17:13:16.694001 7effbe1ac700  0 -- 10.214.134.136:0/627 >> 10.214.135.138:6806/13815 pipe(0x21ec280 sd=7 :44798 s=1 pgs=0 cs=0 l=0 c=0x21e9dc0).connect claims to be 10.214.135.138:6806/14589 not
 10.214.135.138:6806/13815 - wrong node!
2013-07-19 17:13:17.090136 7effbe1ac700  0 -- 10.214.134.136:0/627 >> 10.214.135.138:6806/13815 pipe(0x21ec280 sd=7 :44799 s=1 pgs=0 cs=0 l=0 c=0x21e9dc0).connect claims to be 10.214.135.138:6806/14589 not
 10.214.135.138:6806/13815 - wrong node!
2013-07-19 17:13:18.691207 7effbe1ac700  0 -- 10.214.134.136:0/627 >> 10.214.135.138:6806/13815 pipe(0x21ec280 sd=7 :44800 s=1 pgs=0 cs=0 l=0 c=0x21e9dc0).connect claims to be 10.214.135.138:6806/14589 not
 10.214.135.138:6806/13815 - wrong node!
2013-07-19 17:13:20.006180 7effc5884780 -1 *** Caught signal (Segmentation fault) **
 in thread 7effc5884780

 ceph version 0.66-752-g8c5e1db (8c5e1db4fb76b5e1fcf6721ad210f143a571d7b8)
 1: ceph-fuse() [0x79c6da]
 2: (()+0xfcb0) [0x7effc5233cb0]
 3: (std::__detail::_List_node_base::_M_unhook()+0xa) [0x7effc400a44a]
 4: (Client::wait_on_list(std::list<Cond*, std::allocator<Cond*> >&)+0x16a) [0x529caa]
 5: (Client::make_request(MetaRequest*, int, int, Inode**, bool*, int, ceph::buffer::list*)+0xd4f) [0x54ee1f]
 6: (Client::mount(std::string const&)+0x3b6) [0x567426]
 7: (main()+0x3e7) [0x523547]
 8: (__libc_start_main()+0xed) [0x7effc36f676d]
 9: ceph-fuse() [0x524e79]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

config file used:

roles:
- [mon.a, mds.a, osd.0, osd.1]
- [mon.b, mon.c, osd.2, osd.3]
- [client.0]

targets:
  ubuntu@mira057.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDJ3bi5U+cSruKhSrjJ4hgIjkEQ841ymk24S0eh+QnCW1jta7Pe26q5TE5B8862TsSqWNlo04gGTPhKtyVbMH75uzwZUBBnkVa4tGRx1cWR9IDxDutdyQBA028lbMjBd5upr1K+jFVx8TvFvD30UVdacaDH0h1V4CUf65ejb1phSjNhE2DrOdKNYBA9Fl5bqlco9EWi7P3NCShMjeXyloY4IdyIFVwFpqqeSXApXEesb/6wSh81uYtWhp8twOYdNxVu+TPFYhjkcSopC3hKdV+BEAkwzG0TBCoXZg7j42UBB8M6KWhYS1YZl0jPlijTFuNaqbpuq4FJYeNqorhUfqLn
  ubuntu@mira074.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDE/Z5eYFgmJuBkdqZDrCeJHRq3zdJ+Gp10qf9d3qrcGnylz5ulwqFQcbnXX7j1ThA0P1s/UVRM0DNIqn7SieqeFcRFj6ER15UfKvI3Gxk13LJSX68NmQHysNc0q7Gkr8EMuyb2gOVdU5Dt2Sg5nCLQ656F+mX0aBwQkbi7ddSkF1Me9kvgwPvbzjKWhjxGO9ffhVoUK/n9QOXhkREG23jvPNt8YP1eUZmRIUbA7Al8YsmkEFR/9GjE0J4E3dsYJY+Da+zJ3vYlSbeIIDnevhv47miYJW+1WFpfEWJPX5Y1JeFCp9CwEA0dywxWUYYzrmtj8M8yUUox82wXJ93T9toP
  ubuntu@mira080.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDGnNwnG4C8IU3gTmWaNB9YU0gbNeoBAcyD18JRuLJlLMKaZgvD2qvDgjis/4n1Fn0w7yY9YILNAI+fRlifaZRjg0nNyjIB3MpYbK/7oB12sO3R/fpNhA8FU6bt0V/9XQWrdWLe2s1PlTVgucMOVEJnp+3eFSR+3thfR9XXHqjdpOJ7Q1Ra/dLzjk/SP94i0EshvBlPl4kClyEWqKLlkMvZNGzKeJj+J9g8jagvsSJ65fdyi9qcaLVvicuOeL6T4ZGRypPaYfUNLRsPGRjTlqE2IZxC8RBtuyL4sVdFl2hBGT3HhOF4IQuFYTWgWzh9fUDJdwCVI7FH8O3id0QLirE7

tasks:
- install:
    branch: cuttlefish
- ceph:
- parallel:
   - workunit:
       clients:
         all: 
           - suites/blogbench.sh
   - sequential:
       - install.upgrade:
           all:
             branch: next
       - ceph.restart: [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3]
       - ceph-fuse: [client.0]
       - workunit:
           branch: next
           clients:
             client.0:
               - suites/dbench.sh

Actions #1

Updated by Tamilarasi muthamizhan almost 11 years ago

  • Category set to 11
Actions #2

Updated by Sage Weil almost 11 years ago

  • Priority changed from Normal to High
Actions #3

Updated by Tamilarasi muthamizhan almost 11 years ago

logs are copied to mira074.front.sepia.ceph.com:/home/ubuntu/bug_5689/teuthology_run

Actions #4

Updated by Sage Weil almost 11 years ago

The client sends a request_open and gets back closed from the MDS, and wait_on_list() can't handle the list&lt;Cond*&gt; being deallocated out from under it in _closed_mds_session(). :/

We could make it try to reopen, but I'm not sure we want to busy-loop here.

Actions #5

Updated by Sage Weil almost 11 years ago

  • Status changed from New to Fix Under Review
Actions #6

Updated by Sage Weil almost 11 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (11)
Actions #7

Updated by Greg Farnum almost 11 years ago

  • Status changed from Fix Under Review to 4
  • Assignee set to Sage Weil

A couple of comments on the wip-5689 branch.

Actions #8

Updated by Sage Weil almost 11 years ago

  • Status changed from 4 to Resolved