Bug #3498

Updated by Sage Weil over 11 years ago


 Saw a few MDS assertions while testing for #3490, running the untar_kernel script repeatedly on teuthology. Sometimes the assertion occurs along with the ENOENT; sometimes it's just the assertion. The assertion happened in 4 of 20 runs. It never happens in the same place, but it's always during the compile (not the untar).

 The config.yaml from one of the runs:

 overrides: 
   ceph: 
     fs: btrfs 
     log-whitelist: 
     - slow request 
 roles: 
 - - mon.a 
   - mon.c 
   - osd.0 
   - osd.1 
   - osd.2 
 - - mon.b 
   - mds.a 
   - osd.3 
   - osd.4 
   - osd.5 
 - - client.0 
 targets: 
   ubuntu@plana63.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDy2BKPe+fe5jK0ziU8aKM0DzODSTaKWecQRwLjnZLbjDvTyHm8x8xX/JCts3bFfrc2ozFz7ILBIWU96JRZiFF2TtFZjtf1H19kyvR8PWCxiZ/lld+C7B6U8iiPSNiSlgo7mwkpk1JoSpHe4rK/Z7WQRWBMsCC7XJETu6rRX3i0ZYaKh8BoWWhpsBs1quSNxRXNUqJ6OKnDbB5Vuan1TK9b49RXmibx+oapXm8V0sHEVLYa+NTUs+wAEHAnjFgRe75Cik/rmgeE0m2Cff1rp9tFhEEDwZ5PUdnscOTY78BxImMRdkbZ8lJXOGcOOsD3Dj1jOr4pVrgxZqUdtWfJGkj 
   ubuntu@plana68.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCjq/dBds5eofgD5l4r9AcpVnq4tnZrPYUlDPcgRgzUSM36rUAOA9IPzjamtlb+kdmo5TkWve/QgqDt/6SI07L0EeULiRhPnYjsDsDvRINoL/7oUd6dqhsC+5OMrnuscOIQUVM40TR/J4F2pwUWSM5YKsc6VBYkmh+2JTvI9ZiwXsA0KY0a6qa5ScDDZtfGapo166cm/BkqmGPPudYqCrXbCqKLaXjSizU3e02RlFLwF4xw0wKCKQ5T/mh8mqUOEWnZZTVNx0Xrkit0q247KKc4sNZYX4sQVRx8MMedOr33VEhmyuOY8TgHLFp/aB0HzbatYYJb0HATxbjwJWFjGBIX 
   ubuntu@plana69.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDKGuvzfY8V0QDpwGmSa3Fz3ms2SM0syFb8GAnmivOpz29fV0m/kJMrbnobPWjZlZdcPE0Dn6bkcJ0htns9fn6n79nF0G7SGEQo8RFwbWUF2DHkcy8QxwF0HfQPn0OhrZdp3P2pN5lX2m5FwTLdZLprm7eQcW6V2gqD6ouZZKIUXfu2jy9B02DYWLPAG18n9Wq181IOQVoOQ0BP6gddwohnlGyipDgY8VBTiTV0uRV5PKVu3J2Ak5LCmwHRv9T0TyR3R6sRNNGvA85uaOgewk4/fPGsvPXRblILlKSi44ZMp0gFSA3flG9Coal3s0KTE8t8q35+MTGDfGd5o4Y4Hivv 
 tasks: 
 - internal.lock_machines: 3 
 - internal.save_config: null 
 - internal.check_lock: null 
 - internal.connect: null 
 - internal.check_conflict: null 
 - internal.base: null 
 - internal.archive: null 
 - internal.coredump: null 
 - internal.syslog: null 
 - internal.timer: null 
 - ceph: null 
 - ceph-fuse: null 
 - workunit: 
     clients: 
       all: 
       - kernel_untar_build.sh 



 There are two stack traces here, mostly the same but slightly different:

 2012-11-15T13:35:30.140 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: In function 'unsigned int PrioritizedQueue<T, K>::length() [with T = DispatchQueue::QueueItem, K = long unsigned int]' thread 7f0be6374700 time 2012-11-15 13:35:04.432436 
 2012-11-15T13:35:30.141 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: 203: FAILED assert(i->second.length()) 
 2012-11-15T13:35:30.141 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.54-599-g2fd9f4d (2fd9f4d74fcc48e88c31e84e627d1504d86c6b4a) 
 2012-11-15T13:35:30.141 INFO:teuthology.task.ceph.mds.a.err: 1: (PrioritizedQueue<DispatchQueue::QueueItem, unsigned long>::length()+0xdb) [0x77bedb] 
 2012-11-15T13:35:30.141 INFO:teuthology.task.ceph.mds.a.err: 2: (MDBalancer::get_load(utime_t)+0x30b) [0x629d0b] 
 2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 3: (MDS::tick()+0xb3) [0x4ba8a3] 
 2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 4: (SafeTimer::timer_thread()+0x446) [0x782c26] 
 2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 5: (SafeTimerThread::entry()+0xd) [0x7838ad] 
 2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 6: (()+0x7e9a) [0x7f0becae7e9a] 
 2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 7: (clone()+0x6d) [0x7f0beb08ccbd] 
 2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
 2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err:2012-11-15 13:35:04.433622 7f0be6374700 -1 ./common/PrioritizedQueue.h: In function 'unsigned int PrioritizedQueue<T, K>::length() [with T = DispatchQueue::QueueItem, K = long unsigned int]' thread 7f0be6374700 time 2012-11-15 13:35:04.432436 
 2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: 203: FAILED assert(i->second.length()) 
 2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err: 
 2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.54-599-g2fd9f4d (2fd9f4d74fcc48e88c31e84e627d1504d86c6b4a) 
 2012-11-15T13:35:30.143 INFO:teuthology.task.ceph.mds.a.err: 1: (PrioritizedQueue<DispatchQueue::QueueItem, unsigned long>::length()+0xdb) [0x77bedb] 
 2012-11-15T13:35:30.144 INFO:teuthology.task.ceph.mds.a.err: 2: (MDBalancer::get_load(utime_t)+0x30b) [0x629d0b] 
 2012-11-15T13:35:30.144 INFO:teuthology.task.ceph.mds.a.err: 3: (MDS::tick()+0xb3) [0x4ba8a3] 
 2012-11-15T13:35:30.144 INFO:teuthology.task.ceph.mds.a.err: 4: (SafeTimer::timer_thread()+0x446) [0x782c26] 
 2012-11-15T13:35:30.144 INFO:teuthology.task.ceph.mds.a.err: 5: (SafeTimerThread::entry()+0xd) [0x7838ad] 
 2012-11-15T13:35:30.144 INFO:teuthology.task.ceph.mds.a.err: 6: (()+0x7e9a) [0x7f0becae7e9a] 
 2012-11-15T13:35:30.145 INFO:teuthology.task.ceph.mds.a.err: 7: (clone()+0x6d) [0x7f0beb08ccbd] 
 2012-11-15T13:35:30.145 INFO:teuthology.task.ceph.mds.a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
 2012-11-15T13:35:30.145 INFO:teuthology.task.ceph.mds.a.err: 



 2012-11-15T14:17:40.656 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: In function 'unsigned int PrioritizedQueue<T, K>::length() [with T = DispatchQueue::QueueItem, K = long unsigned int]' thread 7fe310343700 time 2012-11-15 14:17:14.909273 
 2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: 203: FAILED assert(i->second.length()) 
 2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.54-599-g2fd9f4d (2fd9f4d74fcc48e88c31e84e627d1504d86c6b4a) 
 2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 1: (PrioritizedQueue<DispatchQueue::QueueItem, unsigned long>::length()+0xdb) [0x77bedb] 
 2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 2: (MDS::tick()+0xfe) [0x4ba8ee] 
 2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 3: (SafeTimer::timer_thread()+0x446) [0x782c26] 
 2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 4: (SafeTimerThread::entry()+0xd) [0x7838ad] 
 2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 5: (()+0x7e9a) [0x7fe316ab2e9a] 
 2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err: 6: (clone()+0x6d) [0x7fe31505a4bd] 
 2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
 2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err:2012-11-15 14:17:14.910292 7fe310343700 -1 ./common/PrioritizedQueue.h: In function 'unsigned int PrioritizedQueue<T, K>::length() [with T = DispatchQueue::QueueItem, K = long unsigned int]' thread 7fe310343700 time 2012-11-15 14:17:14.909273 
 2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err:./common/PrioritizedQueue.h: 203: FAILED assert(i->second.length()) 
 2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err: 
 2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.54-599-g2fd9f4d (2fd9f4d74fcc48e88c31e84e627d1504d86c6b4a) 
 2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err: 1: (PrioritizedQueue<DispatchQueue::QueueItem, unsigned long>::length()+0xdb) [0x77bedb] 
 2012-11-15T14:17:40.659 INFO:teuthology.task.ceph.mds.a.err: 2: (MDS::tick()+0xfe) [0x4ba8ee] 
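
 To make the difference between the two traces easier to see, here is a minimal sketch that strips the teuthology prefix and the offsets from the frame lines and compares the remaining symbol names. The names FRAME_RE and frames are made up for this example, and the frame format is assumed from the log lines quoted above rather than from any teuthology or Ceph specification.

```python
import re

# Assumed frame format, taken from the log lines quoted in this report:
#   "<timestamp> INFO:teuthology.task.ceph.mds.a.err: N: (symbol+0xoff) [0xaddr]"
FRAME_RE = re.compile(
    r"INFO:teuthology\.task\.ceph\.\S+?\.err:\s*"    # harness prefix
    r"(\d+): \((.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]"  # frame number, symbol, offset, address
)


def frames(log_text):
    """Return the symbol names of the first backtrace found in log_text."""
    symbols = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if not m:
            continue  # header, version and NOTE lines are not frames
        index, symbol = int(m.group(1)), m.group(2)
        if index == 1 and symbols:
            break  # the log prints each trace twice; keep only the first copy
        symbols.append(symbol)
    return symbols


# Frame lines copied from the first trace above.
FIRST = """\
2012-11-15T13:35:30.141 INFO:teuthology.task.ceph.mds.a.err: 1: (PrioritizedQueue<DispatchQueue::QueueItem, unsigned long>::length()+0xdb) [0x77bedb]
2012-11-15T13:35:30.141 INFO:teuthology.task.ceph.mds.a.err: 2: (MDBalancer::get_load(utime_t)+0x30b) [0x629d0b]
2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 3: (MDS::tick()+0xb3) [0x4ba8a3]
2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 4: (SafeTimer::timer_thread()+0x446) [0x782c26]
2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 5: (SafeTimerThread::entry()+0xd) [0x7838ad]
2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 6: (()+0x7e9a) [0x7f0becae7e9a]
2012-11-15T13:35:30.142 INFO:teuthology.task.ceph.mds.a.err: 7: (clone()+0x6d) [0x7f0beb08ccbd]
"""

# Frame lines copied from the second trace above.
SECOND = """\
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 1: (PrioritizedQueue<DispatchQueue::QueueItem, unsigned long>::length()+0xdb) [0x77bedb]
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 2: (MDS::tick()+0xfe) [0x4ba8ee]
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 3: (SafeTimer::timer_thread()+0x446) [0x782c26]
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 4: (SafeTimerThread::entry()+0xd) [0x7838ad]
2012-11-15T14:17:40.657 INFO:teuthology.task.ceph.mds.a.err: 5: (()+0x7e9a) [0x7fe316ab2e9a]
2012-11-15T14:17:40.658 INFO:teuthology.task.ceph.mds.a.err: 6: (clone()+0x6d) [0x7fe31505a4bd]
"""

first, second = frames(FIRST), frames(SECOND)
only_in_first = [s for s in first if s not in second]
print(only_in_first)  # ['MDBalancer::get_load(utime_t)']
```

 Both traces fail the same assert in PrioritizedQueue::length() on the timer thread. In the first, length() is reached through MDBalancer::get_load() called from MDS::tick(); in the second, it is called from MDS::tick() directly, at a different offset (0xfe instead of 0xb3).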
 
