Bug #3498

mds: mds assert failure during untar_kernel

Added by Sam Lang over 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Saw a few mds assertions during testing for #3490, running the untar_kernel script a bunch of times on teuthology. Sometimes the assertion occurs along with the ENOENT, sometimes it's just the assertion. The assertion happened in 4 of 20 runs. It never happens in the same place, but it's always during the compile (not the untar).

The config.yaml from one of the runs:

overrides:
  ceph:
    fs: btrfs
    log-whitelist:
    - slow request
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDy2BKPe+fe5jK0ziU8aKM0DzODSTaKWecQRwLjnZLbjDvTyHm8x8xX/JCts3bFfrc2ozFz7ILBIWU96JRZiFF2TtFZjtf1H19kyvR8PWCxiZ/lld+C7B6U8iiPSNiSlgo7mwkpk1JoSpHe4rK/Z7WQRWBMsCC7XJETu6rRX3i0ZYaKh8BoWWhpsBs1quSNxRXNUqJ6OKnDbB5Vuan1TK9b49RXmibx+oapXm8V0sHEVLYa+NTUs+wAEHAnjFgRe75Cik/rmgeE0m2Cff1rp9tFhEEDwZ5PUdnscOTY78BxImMRdkbZ8lJXOGcOOsD3Dj1jOr4pVrgxZqUdtWfJGkj
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCjq/dBds5eofgD5l4r9AcpVnq4tnZrPYUlDPcgRgzUSM36rUAOA9IPzjamtlb+kdmo5TkWve/QgqDt/6SI07L0EeULiRhPnYjsDsDvRINoL/7oUd6dqhsC+5OMrnuscOIQUVM40TR/J4F2pwUWSM5YKsc6VBYkmh+2JTvI9ZiwXsA0KY0a6qa5ScDDZtfGapo166cm/BkqmGPPudYqCrXbCqKLaXjSizU3e02RlFLwF4xw0wKCKQ5T/mh8mqUOEWnZZTVNx0Xrkit0q247KKc4sNZYX4sQVRx8MMedOr33VEhmyuOY8TgHLFp/aB0HzbatYYJb0HATxbjwJWFjGBIX
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDKGuvzfY8V0QDpwGmSa3Fz3ms2SM0syFb8GAnmivOpz29fV0m/kJMrbnobPWjZlZdcPE0Dn6bkcJ0htns9fn6n79nF0G7SGEQo8RFwbWUF2DHkcy8QxwF0HfQPn0OhrZdp3P2pN5lX2m5FwTLdZLprm7eQcW6V2gqD6ouZZKIUXfu2jy9B02DYWLPAG18n9Wq181IOQVoOQ0BP6gddwohnlGyipDgY8VBTiTV0uRV5PKVu3J2Ak5LCmwHRv9T0TyR3R6sRNNGvA85uaOgewk4/fPGsvPXRblILlKSi44ZMp0gFSA3flG9Coal3s0KTE8t8q35+MTGDfGd5o4Y4Hivv
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - kernel_untar_build.sh

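There are two stack traces here, mostly the same but slightly different: in the first, PrioritizedQueue::length() is reached through MDBalancer::get_load(); in the second, it is called directly from MDS::tick().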

2012-11-15T13:35:30.140 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: In function 'unsigned int PrioritizedQueue<T, K>::length() [with T = DispatchQueue::QueueItem, K = long unsigned int]' thread 7f0be6374700 time 2012-11-15 13:35:04.432436
2012-11-15T13:35:30.141 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: 203: FAILED assert(i->second.length())
2012-11-15T13:35:30.141 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.54-599-g2fd9f4d (2fd9f4d74fcc48e88c31e84e627d1504d86c6b4a)
2012-11-15T13:35:30.141 INFO:teuthology.task.ceph.mds.a.err: 1: (PrioritizedQueue<DispatchQueue::QueueItem, unsigned long>::length()+0xdb) [0x77bedb]
2012-11-15T13:35:30.141 INFO:teuthology.task.ceph.mds.a.err: 2: (MDBalancer::get_load(utime_t)+0x30b) [0x629d0b]
2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 3: (MDS::tick()+0xb3) [0x4ba8a3]
2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 4: (SafeTimer::timer_thread()+0x446) [0x782c26]
2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 5: (SafeTimerThread::entry()+0xd) [0x7838ad]
2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 6: (()+0x7e9a) [0x7f0becae7e9a]
2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 7: (clone()+0x6d) [0x7f0beb08ccbd]
2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err:2012-11-15 13:35:04.433622 7f0be6374700 -1 ./common/PrioritizedQueue.h: In function 'unsigned int PrioritizedQueue<T, K>::length() [with T = DispatchQueue::QueueItem, K = long unsigned int]' thread 7f0be6374700 time 2012-11-15 13:35:04.432436
2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: 203: FAILED assert(i->second.length())
2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err:
2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.54-599-g2fd9f4d (2fd9f4d74fcc48e88c31e84e627d1504d86c6b4a)
2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err: 1: (PrioritizedQueue<DispatchQueue::QueueItem, unsigned long>::length()+0xdb) [0x77bedb]
2012-11-15T13:35:30.144 INFO:teuthology.task.ceph.mds.a.err: 2: (MDBalancer::get_load(utime_t)+0x30b) [0x629d0b]
2012-11-15T13:35:30.144 INFO:teuthology.task.ceph.mds.a.err: 3: (MDS::tick()+0xb3) [0x4ba8a3]
2012-11-15T13:35:30.144 INFO:teuthology.task.ceph.mds.a.err: 4: (SafeTimer::timer_thread()+0x446) [0x782c26]
2012-11-15T13:35:30.144 INFO:teuthology.task.ceph.mds.a.err: 5: (SafeTimerThread::entry()+0xd) [0x7838ad]
2012-11-15T13:35:30.144 INFO:teuthology.task.ceph.mds.a.err: 6: (()+0x7e9a) [0x7f0becae7e9a]
2012-11-15T13:35:30.145 INFO:teuthology.task.ceph.mds.a.err: 7: (clone()+0x6d) [0x7f0beb08ccbd]
2012-11-15T13:35:30.145 INFO:teuthology.task.ceph.mds.a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2012-11-15T13:35:30.145 INFO:teuthology.task.ceph.mds.a.err:

2012-11-15T14:17:40.656 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: In function 'unsigned int PrioritizedQueue<T, K>::length() [with T = DispatchQueue::QueueItem, K = long unsigned int]' thread 7fe310343700 time 2012-11-15 14:17:14.909273
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: 203: FAILED assert(i->second.length())
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.54-599-g2fd9f4d (2fd9f4d74fcc48e88c31e84e627d1504d86c6b4a)
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 1: (PrioritizedQueue<DispatchQueue::QueueItem, unsigned long>::length()+0xdb) [0x77bedb]
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 2: (MDS::tick()+0xfe) [0x4ba8ee]
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 3: (SafeTimer::timer_thread()+0x446) [0x782c26]
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 4: (SafeTimerThread::entry()+0xd) [0x7838ad]
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 5: (()+0x7e9a) [0x7fe316ab2e9a]
2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err: 6: (clone()+0x6d) [0x7fe31505a4bd]
2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err:2012-11-15 14:17:14.910292 7fe310343700 -1 ./common/PrioritizedQueue.h: In function 'unsigned int PrioritizedQueue<T, K>::length() [with T = DispatchQueue::QueueItem, K = long unsigned int]' thread 7fe310343700 time 2012-11-15 14:17:14.909273
2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: 203: FAILED assert(i->second.length())
2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err:
2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.54-599-g2fd9f4d (2fd9f4d74fcc48e88c31e84e627d1504d86c6b4a)
2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err: 1: (PrioritizedQueue<DispatchQueue::QueueItem, unsigned long>::length()+0xdb) [0x77bedb]
2012-11-15T14:17:40.659 INFO:teuthology.task.ceph.mds.a.err: 2: (MDS::tick()+0xfe) [0x4ba8ee]

History

#1 Updated by Sage Weil about 11 years ago

  • Description updated (diff)
  • Status changed from New to Resolved

This was a msgr bug, long since fixed: commit:36c0fd220ef02b1ffd7a3ae0d98e0fdec6b55a5b or thereabouts.
