Bug #1947

closed

mds: SIGBUS during _mark_dirty

Added by Josh Durgin over 12 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%

Description

This happened on umount after ffsb with the kernel client.
From teuthology:~teut/log/mds.a.log.gz:

*** Caught signal (Bus error) **
 in thread 7fb59c359700
 ceph version 0.40-6-g8d271f4 (commit:8d271f439ed9a2cb4e2dd9c3b22b50dde78f6bc0)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x91cfe4]
 2: (()+0xfb40) [0x7fb59fdd6b40]
 3: (CDentry::_mark_dirty(LogSegment*)+0x8c) [0x6eb0ec]
 4: (CDentry::mark_dirty(unsigned long, LogSegment*)+0x6d) [0x6eecfd]
 5: (Server::_unlink_local_finish(MDRequest*, CDentry*, CDentry*, unsigned long)+0x307) [0x53ccf7]
 6: (C_MDS_unlink_local_finish::finish(int)+0x33) [0x596273]
 7: (Context::complete(int)+0x12) [0x4a0252]
 8: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e42ce]
 9: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7d82f6]
 10: (Journaler::C_Flush::finish(int)+0x1d) [0x7e44fd]
 11: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xeb3) [0x7b4393]
 12: (MDS::handle_core_message(Message*)+0xedf) [0x4c418f]
 13: (MDS::_dispatch(Message*)+0x3c) [0x4c6cdc]
 14: (MDS::ms_dispatch(Message*)+0xa9) [0x4c92c9]
 15: (SimpleMessenger::dispatch_entry()+0x99a) [0x81ceba]
 16: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4972ac]
 17: (Thread::_entry_func(void*)+0x12) [0x817a02]
 18: (()+0x7971) [0x7fb59fdce971]
 19: (clone()+0x6d) [0x7fb59e65d92d]


Related issues: 1 (0 open, 1 closed)

Related to CephFS - Bug #3210: mds crashed and segfault at unlink_local_finish (Resolved, 09/24/2012)

Actions #1

Updated by Sage Weil about 12 years ago

  • Status changed from New to Duplicate
Actions #2

Updated by Sage Weil almost 12 years ago

  • Status changed from Duplicate to Need More Info

It looks like this one still lives on:


2012-06-09T00:15:05.215 DEBUG:teuthology.orchestra.run:Running: 'rm -rf -- /tmp/cephtest/workunits.list /tmp/cephtest/workunit.client.0'
2012-06-09T00:15:05.239 DEBUG:teuthology.parallel:result is None
2012-06-09T00:15:05.239 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x308dc10>
2012-06-09T00:15:05.240 INFO:teuthology.task.kclient:Unmounting kernel clients...
2012-06-09T00:15:05.240 DEBUG:teuthology.task.kclient:Unmounting client client.0...
2012-06-09T00:15:05.240 DEBUG:teuthology.orchestra.run:Running: 'sudo umount /tmp/cephtest/mnt.0'
2012-06-09T00:16:12.619 INFO:teuthology.task.ceph.mds.a.err:*** Caught signal (Bus error) **
2012-06-09T00:16:12.619 INFO:teuthology.task.ceph.mds.a.err: in thread 7fac932b8700
2012-06-09T00:16:12.742 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.47.2-391-g4551808 (commit:4551808fa00b812fee6e0c196fd333eca0b06de9)
2012-06-09T00:16:12.742 INFO:teuthology.task.ceph.mds.a.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x97530a]
2012-06-09T00:16:12.742 INFO:teuthology.task.ceph.mds.a.err: 2: (()+0xfcb0) [0x7fac97e19cb0]
2012-06-09T00:16:12.742 INFO:teuthology.task.ceph.mds.a.err: 3: (CDentry::_mark_dirty(LogSegment*)+0x9c) [0x70ae1c]
2012-06-09T00:16:12.742 INFO:teuthology.task.ceph.mds.a.err: 4: (CDentry::mark_dirty(unsigned long, LogSegment*)+0x9e) [0x70af6e]
2012-06-09T00:16:12.742 INFO:teuthology.task.ceph.mds.a.err: 5: (Server::_unlink_local_finish(MDRequest*, CDentry*, CDentry*, unsigned long)+0x35b) [0x55c50b]
2012-06-09T00:16:12.742 INFO:teuthology.task.ceph.mds.a.err: 6: (C_MDS_unlink_local_finish::finish(int)+0x33) [0x5abca3]
2012-06-09T00:16:12.743 INFO:teuthology.task.ceph.mds.a.err: 7: (Context::complete(int)+0x12) [0x4b6322]
2012-06-09T00:16:12.743 INFO:teuthology.task.ceph.mds.a.err: 8: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x176) [0x4e9e76]
2012-06-09T00:16:12.743 INFO:teuthology.task.ceph.mds.a.err: 9: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x256) [0x7b51c6]
2012-06-09T00:16:12.743 INFO:teuthology.task.ceph.mds.a.err: 10: (Journaler::C_Flush::finish(int)+0x1d) [0x7bdd8d]
2012-06-09T00:16:12.743 INFO:teuthology.task.ceph.mds.a.err: 11: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1059) [0x7d9a79]
2012-06-09T00:16:12.743 INFO:teuthology.task.ceph.mds.a.err: 12: (MDS::handle_core_message(Message*)+0x5cf) [0x4e2dcf]
2012-06-09T00:16:12.743 INFO:teuthology.task.ceph.mds.a.err: 13: (MDS::_dispatch(Message*)+0xa2) [0x4e3e92]
2012-06-09T00:16:12.743 INFO:teuthology.task.ceph.mds.a.err: 14: (MDS::ms_dispatch(Message*)+0x10b) [0x4e6c4b]
2012-06-09T00:16:12.744 INFO:teuthology.task.ceph.mds.a.err: 15: (SimpleMessenger::dispatch_entry()+0xc5b) [0x8ae45b]
2012-06-09T00:16:12.744 INFO:teuthology.task.ceph.mds.a.err: 16: (SimpleMessenger::DispatchThread::entry()+0x15) [0x8588b5]
2012-06-09T00:16:12.744 INFO:teuthology.task.ceph.mds.a.err: 17: (Thread::_entry_func(void*)+0x12) [0x8ca942]
2012-06-09T00:16:12.744 INFO:teuthology.task.ceph.mds.a.err: 18: (()+0x7e9a) [0x7fac97e11e9a]
2012-06-09T00:16:12.744 INFO:teuthology.task.ceph.mds.a.err: 19: (clone()+0x6d) [0x7fac965ca4bd]
2012-06-09T00:16:12.746 INFO:teuthology.task.ceph.mds.a.err:2012-06-09 00:16:12.693925 7fac932b8700 -1 *** Caught signal (Bus error) **
2012-06-09T00:16:12.746 INFO:teuthology.task.ceph.mds.a.err: in thread 7fac932b8700

...

2012-06-09T00:16:12.991 INFO:teuthology.task.ceph.mds.a.err:daemon-helper: command crashed with signal 7
2012-06-09T00:17:05.381 DEBUG:teuthology.orchestra.run:Running: 'rmdir -- /tmp/cephtest/mnt.0'
2012-06-09T00:17:05.394 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x2fe1a50>
2012-06-09T00:17:05.394 INFO:teuthology.task.ceph:Shutting down mds daemons...
2012-06-09T00:17:05.394 DEBUG:teuthology.task.ceph.mds.a:waiting for process to exit

I.e., the crash happens before we send the daemon a signal to shut down.

ubuntu@teuthology:/a/nightly_coverage_2012-06-09-a/6581
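
The ordering described above can be checked directly from the teuthology timestamps. The following is a minimal sketch, assuming the log line layout shown in the excerpt (a timestamp, a space, then the message). The helpers `first_timestamp` and `signal_name` are illustrative names, not part of teuthology. The signal lookup uses Python's `signal` module; signal 7 corresponds to SIGBUS on Linux x86, but numbering differs on other platforms.

```python
import signal
from datetime import datetime

# Four lines copied from the log excerpt in this comment.
LOG = """\
2012-06-09T00:15:05.240 DEBUG:teuthology.orchestra.run:Running: 'sudo umount /tmp/cephtest/mnt.0'
2012-06-09T00:16:12.619 INFO:teuthology.task.ceph.mds.a.err:*** Caught signal (Bus error) **
2012-06-09T00:16:12.991 INFO:teuthology.task.ceph.mds.a.err:daemon-helper: command crashed with signal 7
2012-06-09T00:17:05.394 INFO:teuthology.task.ceph:Shutting down mds daemons...
"""


def first_timestamp(log, needle):
    """Return the timestamp of the first line containing needle, or None."""
    for line in log.splitlines():
        if needle in line:
            stamp = line.split(" ", 1)[0]
            return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")
    return None


def signal_name(number):
    """Map a signal number to its name on the current platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "signal %d" % number


umount = first_timestamp(LOG, "sudo umount")
crash = first_timestamp(LOG, "Caught signal")
shutdown = first_timestamp(LOG, "Shutting down mds daemons")

# True when the MDS died on its own, before teuthology began shutting it down.
crashed_before_shutdown = crash < shutdown
# Time between the umount command being issued and the crash being logged.
seconds_after_umount = (crash - umount).total_seconds()

print("crashed before shutdown:", crashed_before_shutdown)
print("seconds between umount and crash:", seconds_after_umount)
print("signal 7 on this platform:", signal_name(7))
```

On the excerpt, the crash lands roughly a minute after the umount is issued and well before the "Shutting down mds daemons" line, which matches the reading that the daemon was not killed by the test harness.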

Actions #3

Updated by Sage Weil almost 12 years ago

ubuntu@teuthology:/a/nightly_coverage_2012-06-13-a/7526

Actions #4

Updated by Tamilarasi muthamizhan almost 12 years ago

ubuntu@teuthology:/a/teuthology-2012-06-18_19:00:05-regression-master-testing-gcov/1579

Actions #5

Updated by Sage Weil almost 12 years ago

ubuntu@teuthology:/a/teuthology-2012-06-24_00:00:07-regression-next-testing-basic$

ubuntu@teuthology:/a/teuthology-2012-06-24_00:00:07-regression-next-testing-basic$ cat 2032/config.yaml 
kernel: &id001
  branch: testing
  kdb: true
nuke-on-error: true
overrides:
  ceph:
    branch: next
    fs: btrfs
    log-whitelist:
    - slow request
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana55.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXY+KFAAzWoJq5vLwy6PJHxNeqz3fHCisJDAbtdrnjhhxVyUQtQLlhIPqiQHi6PADNYNUS/4um0TNmDFYxJLJU9SxqmBQ3QTM9F56YQa9F/+98o4LyPLS5TXqq+nCDbU1vhMbpu0mv2MDZ9BVZAgdT/yYgYGErIQz2MnaCAbgp0SRSZOxq0/3KgMz4W0KxkagiNglZV3RvarYASdqZheYeQYtnIyEw+Hk/ZLHoxUirBthAuCu5RvYYTDptQDuOR0tjRaMS81kapD5VZhFbetSxJ9rJ21oepmLSY+0UoIufZS4CNJ/sP2HDDc1Pw1mjJhqClScxTOP1yUnNWhW1d0sP
  ubuntu@plana61.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDOTCMIScDTmD9NkfsWU7xeyZ+WOXai5izYeliiXDSjJC3bT6r8Fp+rhPfcHCVHiw++VsbvKZtkhjCSnJTVPWCdpRDghzJ3nZUBImWRo3PmHo1etQpCeimaOrIJ2q0ChN5jmSOqy5B+Z4om2vXBtBY6nkdTxDOr2+MH3NrSPkQSFB0zO+VPuwKXsemeUC6urb2IZZpxY3cxNq4fafTF9PROpgOnIA+o3igyU4duKEjnCzTHZjw/PL7Eph/7p6+UQgrUwe7pgVzT+2MM0zcBtBSXNqs3dCGmpvUapOkBlDoIX02EkWRNpkM3vfeFt1EFC17B5vd61Kg40bYUG8qWGR0T
  ubuntu@plana66.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9QpKzRhLR+H4oCXhZtMxFDAC7E9cmtqcEPmQiW1ycxHNpk+Vy/uNNtlSh8Ljz7/R595beZe4JKgAkxZCCDmmWIEcm01bJgxVq0cbFLz+9dvyiirmr+RxbWqayu2VDC6uuiVAuxz5RdAw6+5Y/4gCrSdQKfQ8dUJZb4e/4Kz/TLr/+R5z+WqCGeutwb9QvK2anvhPJy+wa/JWHwtpTBjZVa5RFOkz9lfNDYayw3j1rylKk0d39J4VB/ch/qIBqUfxD7Hc4exu9sbG9bt5VKdomqdQvTjQXRw083+Nlj6RqJxYZfdJTWG/gYV3MXC4pwqP/ovcjoqZ9cvrsicdSezKp
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh
ubuntu@teuthology:/a/teuthology-2012-06-24_00:00:07-regression-next-testing-basic$ cat 2032/summary.yaml 
ceph-sha1: 9fcc3dee9aca93f6069427f007befb8c06519aa7
description: collection:basic clusters:fixed-3.yaml fs:btrfs.yaml tasks:kclient_workunit_suites_ffsb.yaml
duration: 797.95303511619568
failure_reason: 'Command failed with status 1: ''/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage
  /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper kill /tmp/cephtest/binary/usr/local/bin/ceph-mds
  -f -i a -c /tmp/cephtest/ceph.conf'''
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
Actions #6

Updated by Sage Weil almost 12 years ago

moved test to marginal suite; move back to regression when this is resolved!

Actions #7

Updated by Tamilarasi muthamizhan almost 12 years ago

latest logs:
/a/teuthology-2012-07-03_00:00:09-regression-next-testing-basic/5019

config.yaml:
++++++++++

kernel: &id001
  branch: testing
  kdb: true
nuke-on-error: true
overrides:
  ceph:
    branch: next
    fs: btrfs
    log-whitelist:
    - slow request
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh

logs: /a/teuthology-2012-07-03_00:00:09-regression-next-testing-basic/5075

config file is,

kernel: &id001
  branch: testing
  kdb: true
nuke-on-error: true
overrides:
  ceph:
    branch: next
    fs: xfs
    log-whitelist:
    - slow request
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDuXajaQgHe9XnbLOzI8WWFYVz6+TnOiTzbkIJPGOZpzQEjnUtJraQIEt5ABSeovMjiEj+V4XvunfyuSmEd0H9giRSyjmCHTPGlpndfTeCdVtCBpNqf5GkUqHaEY1Hp57XPbya2rGlwtFm0NeIDYx6pfkejKnsTOUqwhgUb6950TRhjHQhMjFgyALSyfAm/4y6vGZfjm57+yyih6XgDkqWiiQ6Y/aJVR2n+iCzvqEzV7JSCU+Brn+k8IQLHho1fadYqc5PjYct5BaVlHcP6c+T8nJE/DvqGwZ4gQaVJcuWJiDfLOPPYo1g/0AFicxauLwVNJ6HFR9FjLLGtGU+2DcVN
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDx0US96hot7gygZ69W4nxJQ9myYnn3I22YtOaSPe+yFWOPJVQUuOST+aw5K6JDcjdO2Gq0aS6s01mgoWpZlO/FVDKss7vZ2KjMp3uPkGMpDZarNbR3QTe5YZYrl7Wfw4pMu4jh92hCWJEzy5nH0H3X2YJhOd5BdOYz0P97qsMSPQGxhlvDBYBhDl9MLgsS3lKm/Js/OPLO+Uf3/SZceCjUqO2m3WsrJSiQJKh8XUWUu3z+6C1Wg6TXSSlA/jdVCiokDg7WYwPN9zMwzzGkGv+GUGHKMZaPGRZb9LQJLTBf/OjwRSgclAVdDc3vnZeYAS5+sDnt2grnJnlBd1rBUj3n
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYE0eu9E8TQwtUy89Wldp54VbNBEoO9XQf77eXXzzmNwYUFRrNX0mZV/I8GqyRJuMrPG8V4aZBthBHTtnEmQ6RAS7fVdthi/hEgwnM9cAqY3KX9mR5xJnHBc/fa5KLrnSr3Wrztf42PpQNEN5Tk55K6wWUlZOTHU3vE0j3kF+YQ5FeBhQbghztHPKFR8bOmZJp9TpbXgbvEM2RWr9bYtro1KuQOgrairyVVNWdAuwZuxSQT4soyHoSkY9JmeXKsNRAOamxH9w57mDC3PXui7r6Fp8OCWSK+GmlLTtPaZtulSCcucaZtpVae7F4s9JNxaRl5RxuUtwMRfgAHGlL2BZv
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down or wrong addr
    - objects unfound and apparently lost
- thrashosds: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh

Actions #8

Updated by Tamilarasi muthamizhan almost 12 years ago


Actions #9

Updated by Tamilarasi muthamizhan almost 12 years ago

latest logs:
/a/teuthology-2012-07-06_00:00:03-regression-next-testing-basic/6655

Actions #10

Updated by Sage Weil over 11 years ago

added debugging to kernel ffsb task

Actions #11

Updated by Tamilarasi muthamizhan over 11 years ago

logs: ubuntu@teuthology:/a/teuthology-2012-08-21_02:00:04-regression-testing-testing-basic/5691

Actions #12

Updated by Sage Weil over 11 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
Actions #13

Updated by Tamilarasi muthamizhan over 11 years ago

Recent logs: ubuntu@teuthology:/a/teuthology-2012-10-22_00:00:20-regression-next-testing-basic/5426

ceph version 0.53-290-g233b0bd (commit:233b0bdf0bdbad457f70d22ce3bd48395d5779c3)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x813e11]
 2: (()+0xfcb0) [0x7fbad4a3fcb0]
 3: (CDentry::_mark_dirty(LogSegment*)+0x78) [0x63bd68]
 4: (CDentry::mark_dirty(unsigned long, LogSegment*)+0x79) [0x63bf19]
 5: (Server::_unlink_local_finish(MDRequest*, CDentry*, CDentry*, unsigned long)+0x2a1) [0x521b71]
 6: (Context::complete(int)+0x12) [0x4ae9f2]
 7: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x12b) [0x4d3c9b]
 8: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x1df) [0x6b671f]
 9: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xff8) [0x6d01f8]
 10: (MDS::handle_core_message(Message*)+0xadf) [0x4cefff]
 11: (MDS::_dispatch(Message*)+0x32) [0x4cf1c2]
 12: (MDS::ms_dispatch(Message*)+0x21b) [0x4d0ebb]
 13: (DispatchQueue::entry()+0x711) [0x7ec5f1]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7727ad]
 15: (()+0x7e9a) [0x7fbad4a37e9a]
 16: (clone()+0x6d) [0x7fbad2fdf4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
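
As a first step toward interpreting the trace, each frame can be split into symbol, offset, and return address before the addresses are handed to a symbolizer. The following is a minimal sketch, assuming the frame layout shown above (`N: (symbol+0xOFF) [0xADDR]`, or `N: binary() [0xADDR]` for the signal handler frame). `parse_frames` is a hypothetical helper, not part of Ceph or teuthology.

```python
import re

# One backtrace frame: either "(symbol+0xOFF)" or "binary()", then "[0xADDR]".
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+):\s+"
    r"(?:\((?P<symbol>.*)\+(?P<offset>0x[0-9a-f]+)\)|(?P<binary>\S+)\(\))"
    r"\s+\[(?P<addr>0x[0-9a-f]+)\]\s*$"
)


def parse_frames(trace):
    """Return one dict per recognizable frame line; other lines are skipped."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line)
        if m is None:
            continue
        offset = m.group("offset")
        frames.append({
            "index": int(m.group("index")),
            # Frames without a symbol carry only the binary path.
            "symbol": m.group("symbol") or m.group("binary"),
            "offset": int(offset, 16) if offset else None,
            "addr": int(m.group("addr"), 16),
        })
    return frames


# The top four frames from the 0.53-290-g233b0bd trace above.
TRACE = """\
 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x813e11]
 2: (()+0xfcb0) [0x7fbad4a3fcb0]
 3: (CDentry::_mark_dirty(LogSegment*)+0x78) [0x63bd68]
 4: (CDentry::mark_dirty(unsigned long, LogSegment*)+0x79) [0x63bf19]
"""

frames = parse_frames(TRACE)
for frame in frames:
    print(frame["index"], hex(frame["addr"]), frame["symbol"])
```

The low addresses (for example 0x63bd68) appear to fall inside the ceph-mds binary itself, so with a matching executable they could likely be resolved with something like `addr2line -C -f -e <executable> 0x63bd68`. The 0x7f... addresses belong to shared libraries and would need the library's load address to resolve.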
Actions #14

Updated by Sage Weil over 11 years ago

  • Status changed from Need More Info to Resolved