Bug #3431

closed

ceph fuse crashed during fsx test

Added by Tamilarasi muthamizhan over 11 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs: ubuntu@teuthology:/a/teuthology-2012-10-30_19:00:06-regression-master-testing-gcov/5947

2012-11-01T03:08:40.562 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:osdc/ObjectCacher.h: In function 'bool ObjectCacher::Object::can_close()' thread 7fab371f1700 time 2012-11-01 03:08:30.468866
2012-11-01T03:08:40.562 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:osdc/ObjectCacher.h: 221: FAILED assert(lru_is_expireable())
2012-11-01T03:08:40.562 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: ceph version 0.53-562-gfd4b839 (fd4b839d04525f5ca9069bad82b2f7f7f8ae8689)
2012-11-01T03:08:40.563 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 1: (ObjectCacher::release(ObjectCacher::Object*)+0x55f) [0x64a48f]
2012-11-01T03:08:40.563 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 2: (ObjectCacher::release_set(ObjectCacher::ObjectSet*)+0x14c) [0x64addc]
2012-11-01T03:08:40.563 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 3: (Client::_invalidate_inode_cache(Inode*, bool)+0xe3) [0x486cf3]
2012-11-01T03:08:40.563 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 4: (Client::_flushed(Inode*)+0x21e) [0x49e17e]
2012-11-01T03:08:40.563 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 5: (Client::flush_set_callback(ObjectCacher::ObjectSet*)+0x25) [0x49e255]
2012-11-01T03:08:40.563 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 6: (client_flush_set_callback(void*, ObjectCacher::ObjectSet*)+0x11) [0x49e2c1]
2012-11-01T03:08:40.564 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 7: (ObjectCacher::discard_set(ObjectCacher::ObjectSet*, std::vector<ObjectExtent, std::allocator<ObjectExtent> >&)+0x838) [0x653b98]
2012-11-01T03:08:40.564 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 8: (Client::_invalidate_inode_cache(Inode*, long, long, bool)+0x125) [0x4851f5]
2012-11-01T03:08:40.564 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 9: (Client::update_inode_file_bits(Inode*, unsigned long, unsigned long, unsigned long, unsigned long, utime_t, utime_t, utime_t, int)+0xd88) [0x486288]
2012-11-01T03:08:40.564 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 10: (Client::add_update_inode(InodeStat*, utime_t, int)+0x413) [0x4e8313]
2012-11-01T03:08:40.564 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 11: (Client::insert_trace(MetaRequest*, int)+0xbc7) [0x4ed787]
2012-11-01T03:08:40.565 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 12: (Client::handle_client_reply(MClientReply*)+0x463) [0x4ee9a3]
2012-11-01T03:08:40.565 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 13: (Client::ms_dispatch(Message*)+0x863) [0x4f62b3]
2012-11-01T03:08:40.565 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 14: (DispatchQueue::entry()+0x6b9) [0x6c9829]
2012-11-01T03:08:40.565 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 15: (DispatchQueue::DispatchThread::entry()+0x15) [0x5d3795]
2012-11-01T03:08:40.566 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 16: (Thread::_entry_func(void*)+0x12) [0x5d6fe2]
2012-11-01T03:08:40.566 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 17: (()+0x7e9a) [0x7fab3bf4ee9a]
2012-11-01T03:08:40.566 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 18: (clone()+0x6d) [0x7fab3a7074bd]
2012-11-01T03:08:40.566 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
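
For anyone comparing the traces in this ticket, here is a minimal sketch of a helper that strips the teuthology prefix and pulls out the frame number, symbol, offset and address. The regex is an assumption inferred only from the lines quoted here, and `Frame` and `parse_frame` are hypothetical names, not part of teuthology or Ceph.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    index: int              # frame number as printed in the trace
    symbol: str             # demangled symbol, or the raw text if none is given
    offset: Optional[int]   # offset into the symbol, None when absent
    address: int            # absolute address from the trailing [0x...]


# Assumed line shape (based on the traces in this ticket only):
#   "<optional teuthology prefix> 3: (Symbol(args)+0xe3) [0x486cf3]"
FRAME_RE = re.compile(r"(?:^|\s)(\d+): (.+) \[(0x[0-9a-fA-F]+)\]\s*$")


def parse_frame(line: str) -> Optional[Frame]:
    """Return a Frame for a backtrace line, or None for any other log line."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    index, body, address = m.groups()
    symbol, offset = body, None
    # Frames with symbol info look like "(symbol+0xoff)"; others are a bare path.
    if body.startswith("(") and body.endswith(")"):
        name, sep, off = body[1:-1].rpartition("+0x")
        if sep:
            symbol, offset = name, int(off, 16)
    return Frame(int(index), symbol, offset, int(address, 16))
```

Running `parse_frame` over each line of a saved log and dropping the `None` results leaves just the frames, which makes it easier to diff one failing run against another.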

ubuntu@teuthology:/a/teuthology-2012-10-30_19:00:06-regression-master-testing-gcov/5947$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 0d7dbfce9d6e3a57a6946fadf7f92b1792b8acc0
nuke-on-error: true
overrides:
  ceph:
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: fd4b839d04525f5ca9069bad82b2f7f7f8ae8689
  s3tests:
    branch: master
  workunit:
    sha1: fd4b839d04525f5ca9069bad82b2f7f7f8ae8689
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana02.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCtjMpSkaJhFqFtpo5AEe3KHygR+ueaWU+gYrrRzPa8YvmR0TCapw0kz77y1Fjcfh8rkTapnevpaYgQSMrMs0Yc34kF5XtNRuQXkpTwrhS8isZJBeNSc1W5XeKjj4KB/UuzBywJq0h/0KbH1DrMy72cGISOzdiP9CMA5KUvJo0m31wv1+MPcPn/5AhZgoWPStfaZdb4TaJUrNLrws0oRXa0yQbUa6WmUBsYhHsw4K1ukJAcJwVjcgAAv1N+GnyuWLVs+pvknBO3Whv1RhjY6EDGjun1MDPw+OE3wJsJX7BRr8eZv2Avi7pRlseWeWJwgsHMJ/j0yhf+SCy1+oSPrD2b
  ubuntu@plana22.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC6ZmsmnHcSY7O9viGtUzt5WebiPbwcXo9tg5qgWsaqn46DeegKbdsQ55ajysSUVVhvQA5hW6J9IYyZ5MjtlY2G/whyHYG85tNpAUiuedaQHmzARtL3URZmy2ZxwXgYyPHW3t1n0cu6KSb4pTv9vBjcaCouV2wgrinHAISzDOVuUeXdIhC8Tr3MB0nD1Gw6Xcak680XsQw6oYP6cM+yGCZ7sF15W8TR9IJGmphMIvtd8aTuBo9yet5rIxUfzpCM9Jiv+XgH2oT9h9WfacuL1uQ2C/dHUWoPynK36Uv2J785bfw/hVVtuSGu9Lb1n4o8p8Z88Ex4i8KaOxMiQAs3zqOx
  ubuntu@plana23.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDiwH5Qz5dXnbtYRiTk0QVNNyZQWYcardED+AqVLxoz5h/z/tPUyt6VTcrNvyFiKZcrz70vKy/1S1JNmt74gSc0KE8YhLjvuCaTwJDw1LOTNzc5b074zfnjeNGKqb0L3BefbFFOMh/ZuxGbTJWZXdD1DwP2VWxGdhtHAxglgLjt5541nxw41vT+dVMgQMt7Lv5P3MXl+IY58LUzYC9EkOvgZTPTfRx7IptkDSEmbYGL7dQE6H9VyoukOejj4jgg8ZWhPR9e39OhB/Vh7qtiCTRfapovh1zrfCM/b7O1/nMisUmfOK+nF2ruiTefEA14u59uxlfpaRLQDtyv6b5aPour
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/fsx.sh
ubuntu@teuthology:/a/teuthology-2012-10-30_19:00:06-regression-master-testing-gcov/5947$ cat summary.yaml
ceph-sha1: fd4b839d04525f5ca9069bad82b2f7f7f8ae8689
client.0-kernel-sha1: 0d7dbfce9d6e3a57a6946fadf7f92b1792b8acc0
description: collection:fs-basic clusters:fixed-3.yaml fs:btrfs.yaml tasks:cfuse_workunit_suites_fsx.yaml
duration: 1115.6926379203796
failure_reason: 'Command failed with status 160: ''mkdir -- /tmp/cephtest/mnt.0/client.0/tmp
  && cd -- /tmp/cephtest/mnt.0/client.0/tmp && CEPH_REF=fd4b839d04525f5ca9069bad82b2f7f7f8ae8689
  PATH="$PATH:/tmp/cephtest/binary/usr/local/bin" LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/tmp/cephtest/binary/usr/local/lib" 
  CEPH_CONF="/tmp/cephtest/ceph.conf" CEPH_SECRET_FILE="/tmp/cephtest/data/client.0.secret" 
  CEPH_ID="0" PYTHONPATH="$PYTHONPATH:/tmp/cephtest/binary/usr/local/lib/python2.7/dist-packages:/tmp/cephtest/binary/usr/local/lib/python2.6/dist-packages" 
  /tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage
  /tmp/cephtest/workunit.client.0/suites/fsx.sh && rm -rf -- /tmp/cephtest/mnt.0/client.0/tmp'''
flavor: gcov
mon.a-kernel-sha1: 0d7dbfce9d6e3a57a6946fadf7f92b1792b8acc0
mon.b-kernel-sha1: 0d7dbfce9d6e3a57a6946fadf7f92b1792b8acc0
owner: scheduled_teuthology@teuthology
success: false

Actions #1

Updated by Tamilarasi muthamizhan over 11 years ago

  • Source changed from Development to Q/A
Actions #2

Updated by Tamilarasi muthamizhan over 11 years ago

recent logs: ubuntu@teuthology:/a/teuthology-2012-11-04_19:00:03-regression-master-testing-gcov/9298

Actions #3

Updated by Tamilarasi muthamizhan over 11 years ago

recent log: ubuntu@teuthology:/a/teuthology-2012-11-05_19:00:02-regression-master-testing-gcov/10001

Actions #4

Updated by Sam Lang over 11 years ago

  • Status changed from New to Resolved

Fixed by caed0e917

Actions #5

Updated by Sam Lang over 11 years ago

  • Status changed from Resolved to Fix Under Review

With the fixes in place, we now hit an assertion failure on the readx path during LRU cache eviction. I've pushed proposed fixes to the wip-3431 branch.

2012-11-19 09:06:35.187910 7ff143e2f780 -1 osdc/ObjectCacher.cc: In function 'void ObjectCacher::close_object(ObjectCacher::Object*)' thread 7ff143e2f780 time 2012-11-19 09:06:35.186379
osdc/ObjectCacher.cc: 577: FAILED assert(ob->can_close())

ceph version 0.54-641-g4c69f86 (4c69f865ca79328c62635ae32c91bd32b3985613)
1: (ObjectCacher::close_object(ObjectCacher::Object*)+0x135) [0x5c78d5]
2: (ObjectCacher::trim(long, long)+0x820) [0x5c94d0]
3: (ObjectCacher::_readx(ObjectCacher::OSDRead*, ObjectCacher::ObjectSet*, Context*, bool)+0x21ad) [0x5d92dd]
4: (Client::_read_async(Fh*, unsigned long, unsigned long, ceph::buffer::list*)+0x3e9) [0x486c09]
5: (Client::_read(Fh*, long, unsigned long, ceph::buffer::list*)+0x265) [0x49bd65]
6: (Client::ll_read(Fh*, long, long, ceph::buffer::list*)+0x97) [0x49be87]
7: /tmp/cephtest/binary/usr/local/bin/ceph-fuse() [0x4733cf]
8: (()+0x12d5e) [0x7ff1439fdd5e]
9: (fuse_session_loop()+0x75) [0x7ff1439fbd65]
10: (ceph_fuse_ll_main(Client*, int, char const**, int)+0x225) [0x474245]
11: (main()+0x42f) [0x4716ef]
12: (__libc_start_main()+0xed) [0x7ff141ebd76d]
13: /tmp/cephtest/binary/usr/local/bin/ceph-fuse() [0x472e95]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #6

Updated by Tamilarasi muthamizhan over 11 years ago

recent logs: ubuntu@teuthology:/a/sage-2012-11-25_20:49:20-regression-next-master-basic/4176

2012-11-25T20:58:45.308 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:osdc/ObjectCacher.cc: In function 'void ObjectCacher::close_object(ObjectCacher::Object*)' thread 7f2cb9150780 time 2012-11-25 20:58:35.069442
2012-11-25T20:58:45.308 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:osdc/ObjectCacher.cc: 577: FAILED assert(ob->can_close())
2012-11-25T20:58:45.308 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: ceph version 0.54-682-gbc32fc4 (bc32fc42d2bb38c300f65d577ab105f02cc50571)
2012-11-25T20:58:45.309 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x87a4f9]
2012-11-25T20:58:45.309 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 2: (ObjectCacher::close_object(ObjectCacher::Object*)+0x176) [0x8db170]
2012-11-25T20:58:45.309 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 3: (ObjectCacher::trim(long, long)+0x63f) [0x8de8ef]
2012-11-25T20:58:45.309 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 4: (ObjectCacher::_readx(ObjectCacher::OSDRead*, ObjectCacher::ObjectSet*, Context*, bool)+0x2735) [0x8e15cd]
2012-11-25T20:58:45.309 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 5: (ObjectCacher::readx(ObjectCacher::OSDRead*, ObjectCacher::ObjectSet*, Context*)+0x36) [0x8dee96]
2012-11-25T20:58:45.311 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 6: (ObjectCacher::file_read(ObjectCacher::ObjectSet*, ceph_file_layout*, snapid_t, long, unsigned long, ceph::buffer::list*, int, Context*)+0x83) [0x74af25]
2012-11-25T20:58:45.311 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 7: (Client::_read_async(Fh*, unsigned long, unsigned long, ceph::buffer::list*)+0xd18) [0x729330]
2012-11-25T20:58:45.311 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 8: (Client::_read(Fh*, long, unsigned long, ceph::buffer::list*)+0xf6) [0x72831c]
2012-11-25T20:58:45.311 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 9: (Client::ll_read(Fh*, long, long, ceph::buffer::list*)+0x365) [0x739c45]
2012-11-25T20:58:45.311 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 10: /tmp/cephtest/binary/usr/local/bin/ceph-fuse() [0x6ee803]
2012-11-25T20:58:45.312 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 11: (()+0x12d5e) [0x7f2cb8d1ed5e]
2012-11-25T20:58:45.312 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 12: (fuse_session_loop()+0x75) [0x7f2cb8d1cd65]
2012-11-25T20:58:45.312 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 13: (ceph_fuse_ll_main(Client*, int, char const**, int)+0x736) [0x6ef815]
2012-11-25T20:58:45.312 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 14: (main()+0x820) [0x6ec542]
2012-11-25T20:58:45.312 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 15: (__libc_start_main()+0xed) [0x7f2cb71de76d]
2012-11-25T20:58:45.313 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 16: /tmp/cephtest/binary/usr/local/bin/ceph-fuse() [0x6ebc19]
2012-11-25T20:58:45.313 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2012-11-25T20:58:45.313 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:2012-11-25 20:58:35.076774 7f2cb9150780 -1 osdc/ObjectCacher.cc: In function 'void ObjectCacher::close_object(ObjectCacher::Object*)' thread 7f2cb9150780 time 2012-11-25 20:58:35.069442
2012-11-25T20:58:45.313 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:osdc/ObjectCacher.cc: 577: FAILED assert(ob->can_close())
2012-11-25T20:58:45.313 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:
2012-11-25T20:58:45.314 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: ceph version 0.54-682-gbc32fc4 (bc32fc42d2bb38c300f65d577ab105f02cc50571)
2012-11-25T20:58:45.314 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x87a4f9]
2012-11-25T20:58:45.314 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 2: (ObjectCacher::close_object(ObjectCacher::Object*)+0x176) [0x8db170]
2012-11-25T20:58:45.314 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 3: (ObjectCacher::trim(long, long)+0x63f) [0x8de8ef]
2012-11-25T20:58:45.314 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 4: (ObjectCacher::_readx(ObjectCacher::OSDRead*, ObjectCacher::ObjectSet*, Context*, bool)+0x2735) [0x8e15cd]
2012-11-25T20:58:45.314 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 5: (ObjectCacher::readx(ObjectCacher::OSDRead*, ObjectCacher::ObjectSet*, Context*)+0x36) [0x8dee96]
2012-11-25T20:58:45.315 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 6: (ObjectCacher::file_read(ObjectCacher::ObjectSet*, ceph_file_layout*, snapid_t, long, unsigned long, ceph::buffer::list*, int, Context*)+0x83) [0x74af25]
2012-11-25T20:58:45.315 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 7: (Client::_read_async(Fh*, unsigned long, unsigned long, ceph::buffer::list*)+0xd18) [0x729330]
2012-11-25T20:58:45.315 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 8: (Client::_read(Fh*, long, unsigned long, ceph::buffer::list*)+0xf6) [0x72831c]
2012-11-25T20:58:45.315 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 9: (Client::ll_read(Fh*, long, long, ceph::buffer::list*)+0x365) [0x739c45]
2012-11-25T20:58:45.316 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 10: /tmp/cephtest/binary/usr/local/bin/ceph-fuse() [0x6ee803]
2012-11-25T20:58:45.316 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 11: (()+0x12d5e) [0x7f2cb8d1ed5e]
2012-11-25T20:58:45.316 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 12: (fuse_session_loop()+0x75) [0x7f2cb8d1cd65]
2012-11-25T20:58:45.316 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 13: (ceph_fuse_ll_main(Client*, int, char const**, int)+0x736) [0x6ef815]
2012-11-25T20:58:45.316 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 14: (main()+0x820) [0x6ec542]
2012-11-25T20:58:45.316 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 15: (__libc_start_main()+0xed) [0x7f2cb71de76d]
2012-11-25T20:58:45.316 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 16: /tmp/cephtest/binary/usr/local/bin/ceph-fuse() [0x6ebc19]
2012-11-25T20:58:45.317 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #7

Updated by Sage Weil over 11 years ago

I have an alternate fix pushed to wip-mds-next. Sam, want to take a look?

Actions #8

Updated by Sage Weil over 11 years ago

  • Status changed from Fix Under Review to Resolved