Bug #6082

closed

vps machines getting stuck doing a syncfs(2)

Added by Tamilarasi muthamizhan over 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: High
Assignee: Sandon Van Ness
Category: -
Target version: -
% Done: 100%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

logs: ubuntu@teuthology:/a/teuthology-2013-08-21_01:35:03-upgrade-parallel-next-testing-basic-vps/5214

    -1> 2013-08-21 11:22:01.361557 7fb3a0d43700 -1 FileStore: sync_entry timed out after 600 seconds.
 ceph version 0.67.1-11-gf6fe74f (f6fe74ff51f679e7245b02462822d9ef1e15d28c)
 1: (Context::complete(int)+0xa) [0x733faa]
 2: (SafeTimer::timer_thread()+0x1af) [0x9bd42f]
 3: (SafeTimerThread::entry()+0xd) [0x9beabd]
 4: (()+0x6b50) [0x7fb3ac673b50]
 5: (clone()+0x6d) [0x7fb3aaa50a7d]

     0> 2013-08-21 11:22:01.942667 7fb3a0d43700 -1 os/FileStore.cc: In function 'virtual void SyncEntryTimeout::finish(int)' thread 7fb3a0d43700 time 2013-08-21 11:22:01.941529
os/FileStore.cc: 3377: FAILED assert(0)

 ceph version 0.67.1-11-gf6fe74f (f6fe74ff51f679e7245b02462822d9ef1e15d28c)
 1: (SyncEntryTimeout::finish(int)+0x9e) [0x8c712e]
 2: (Context::complete(int)+0xa) [0x733faa]
 3: (SafeTimer::timer_thread()+0x1af) [0x9bd42f]
 4: (SafeTimerThread::entry()+0xd) [0x9beabd]
 5: (()+0x6b50) [0x7fb3ac673b50]
 6: (clone()+0x6d) [0x7fb3aaa50a7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ubuntu@teuthology:/a/teuthology-2013-08-21_01:35:03-upgrade-parallel-next-testing-basic-vps/5214$ cat summary.yaml 
description: collection:rados 0-cluster:start.yaml 1-dumpling-install:dumpling.yaml
  2-workload:loadgenmix.yaml 3-upgrade-sequence:upgrade-all.yaml distro:debian_7.0.yaml
duration: 2749.6504888534546
failure_reason: 'Command failed on 10.214.138.165 with status 1: ''/home/ubuntu/cephtest/5214/adjust-ulimits
  ceph-coverage /home/ubuntu/cephtest/5214/archive/coverage sudo /home/ubuntu/cephtest/5214/daemon-helper
  kill ceph-osd -f -i 3'''
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
ubuntu@teuthology:/a/teuthology-2013-08-21_01:35:03-upgrade-parallel-next-testing-basic-vps/5214$ cat config.yaml 
kernel:
  kdb: true
  sha1: 546140dd51e9ec7e34fe0b0a5814240828f68f7d
machine_type: vps
nuke-on-error: true
os_type: debian
os_version: '7.0'
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
    log-whitelist:
    - slow request
    sha1: cf8dbd248b8792781394fe87db141ad5704dc3b3
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: cf8dbd248b8792781394fe87db141ad5704dc3b3
  s3tests:
    branch: next
  workunit:
    sha1: cf8dbd248b8792781394fe87db141ad5704dc3b3
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
targets:
  ubuntu@vpm080.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDI2oIesSv+L/FBUIlAElUD5mNd36Hbs+vbzoPc9recE+X9cMuHhLiAD0AKXRc29GHMKdF3OVHRXdfzjyW9IXxQU2aCsMFK2Hvy/fHSI8YvEhYAfufnSdJJPV26juZjGWliYfujphyCWhre2QeuJ4oqlCGK/IIoaTQaGW8IGSsMHI11XDV9EFRIM6fkTJgIYjLpTtxy+ZABWbXLFGDwFoZpt3e3o9o1RzZ0JcUFaAiZP4RfWtyF5zSiGv6V7i+L0/ZlbGbdtI9TffaBV50fTRq4h2v7sCosGir385e8kHQkHa+ntWhPiOwYJwWCOEd5BtM8pDjyQpd//bAIQ1fUE2Cb
  ubuntu@vpm081.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD0+QxR1qkAzKBYgoXcvmu7RcSgoDZT1oW7Cy5ny5vKOcj7V5Se4ip4JOk5LzWzfJyEA13GJgw6BWpMLo9DXPVb+MCxRnVfPT1SHAFN6y6KRLxyA5ePilprwGWhRQEEwLTrfH168gaU3t7+MnD60bJEMpL7xWd87VlI8s99xwO9G12TeejBkE2dicD2VxhonUy9OVlYzX+i/DwP3U3pcI6KfXJa/mJKeUaVoRm/q6kLUD15NWF3oRDBhbUzqCYCLhWdB2qIuAp1gkatwyftMKGndO9PL6qesrFCr80ROzuQV6DXY41Zcv/xVUZ+KZaEOTTeP2rN6bnvOecP4cb1vSNR
  ubuntu@vpm099.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDRqRj+Wmrt8j8E9vStfPEINFEOW6qIow3WaikCDxkSect3EwwtBTEjV4Q4o4ffMtZ2JKzj+3PTdmms3Kcb4vBm//qJRz16knCi7vTHuK6nZeTTRoQDJ65Za9fMjAZWrIdcvLc9jeiPt7+NRLmGju+7I7wGDRblflCK2RiT/A2tKzeZG3BZKNCAxllElFmbiLNVWU1iC4bgSqsb8yfY1kaV4dDd3UIZTGLHvLHPsnK7yGph1mycsDBIbGu46hSTNPKviKmyaIxfKm2escHJ9KNCcgsQlANK9fPzBiVBmv4Z2IooiswMmokvn464iMA4syxAeNUgftlAqjdT9kihsTyl
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph: null
- parallel:
  - workload
  - upgrade-sequence
teuthology_branch: next
upgrade-sequence:
  sequential:
  - install.upgrade:
      all:
        branch: next
  - ceph.restart:
    - mon.a
    - mon.b
    - mon.c
    - mds.a
    - osd.0
    - osd.1
    - osd.2
    - osd.3
workload:
  workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/load-gen-mix.sh

Actions #1

Updated by Tamilarasi muthamizhan over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-08-21_01:35:03-upgrade-parallel-next-testing-basic-vps/5217
This happens on Debian.

Actions #2

Updated by Sage Weil over 10 years ago

  • Status changed from New to Need More Info
  • Assignee set to Tamilarasi muthamizhan

Can you reproduce this with debug filestore = 20, debug osd = 20, and debug ms = 1? Thanks!
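
For a teuthology job like the one in the description, these debug levels could be set through the job's overrides. This is a minimal sketch, assuming they go under an `osd` section of `overrides: ceph: conf:`. The original config.yaml only shows a `mon` section there, so the `osd` key and its placement are assumptions rather than something taken from the failing run.

```yaml
overrides:
  ceph:
    conf:
      osd:
        # Assumed section: the config.yaml in the description has only "mon" here.
        # Values are the ones requested in this comment.
        debug filestore: 20
        debug osd: 20
        debug ms: 1
```

With settings like these, the OSD log should carry more detail leading up to the "sync_entry timed out" message, at the cost of much larger log files.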

Actions #3

Updated by Sage Weil over 10 years ago

  • Assignee changed from Tamilarasi muthamizhan to Sandon Van Ness

Oh, the VM got stuck trying to do a syncfs(). Not a Ceph bug but a problem with the VM...

Actions #4

Updated by Sage Weil over 10 years ago

  • Project changed from Ceph to sepia
  • Subject changed from "nightlies: osd segfault during restart on debian 7.0 after upgrading from dumpling to next" to "vps machines getting stuck doing a syncfs(2)"
  • Category deleted (OSD)
  • Status changed from Need More Info to 12

Actions #5

Updated by Tamilarasi muthamizhan over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-08-28_01:35:03-upgrade-parallel-next-testing-basic-vps/10754

Actions #6

Updated by Sage Weil over 10 years ago

  • Priority changed from Urgent to High

Actions #7

Updated by Tamilarasi muthamizhan over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-11-13_14:42:07-upgrade-parallel-next-testing-basic-vps/97200

Actions #8

Updated by Sandon Van Ness about 10 years ago

  • Status changed from 12 to Resolved
  • % Done changed from 0 to 100

Marking this as resolved: I haven't seen any issues like this for a while, now that the VPS machines have dedicated disks.
