Activity

From 10/20/2014 to 11/18/2014

11/18/2014

02:21 PM Bug #10116: Ceph vm guest disk lockup when using fio
As per jdillaman's suggestion on IRC, I have backed off to the PVE 2.6.32-34-pve kernel from 3.10.0-5-pve and can no ... Brad House
12:49 PM Bug #10116: Ceph vm guest disk lockup when using fio
second batch dump from same locked process as requested Brad House
12:43 PM Bug #10116: Ceph vm guest disk lockup when using fio
and a 3rd for good measure Brad House
12:41 PM Bug #10116: Ceph vm guest disk lockup when using fio
another attempt and trace Brad House
12:30 PM Bug #10116: Ceph vm guest disk lockup when using fio
Attached gdb output with libc and qemu debug symbols. Brad House
09:31 AM Bug #10116: Ceph vm guest disk lockup when using fio
Greg, incidentally several of the attached backtraces show the Pipe reader thread waiting on the pipe lock:... Jason Dillaman
09:03 AM Bug #10116: Ceph vm guest disk lockup when using fio
Jason, exactly what information is making you think the Pipe is hung waiting on a lock? And what version is in use ri... Greg Farnum
07:48 AM Bug #10116: Ceph vm guest disk lockup when using fio
Logs show that the pipe reader to osd.0 is hung waiting for the pipe lock. The last message from that thread is:
<p...
Jason Dillaman
05:56 AM Bug #10116: Ceph vm guest disk lockup when using fio
Thanks, I'll start reviewing these this morning. Jason Dillaman
05:27 AM Bug #10116: Ceph vm guest disk lockup when using fio
Attached is the blktrace of the latest lockup.
Then the qemu output exceeded your max file size (by a couple of KB),...
Brad House
04:12 AM Bug #10116: Ceph vm guest disk lockup when using fio
Sure, I'll do that this morning first. Then I found the repo that proxmox is using to build qemu-kvm, so I'll rebuil... Brad House
09:07 AM Bug #10123: "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vps run
LibRBD.ListChildren was the last logged test Josh Durgin

11/17/2014

07:26 PM Bug #10116: Ceph vm guest disk lockup when using fio
I can see four outstanding read requests in the last set of logs that you provided. Any chance you can re-run the sa... Jason Dillaman
02:26 PM Bug #10116: Ceph vm guest disk lockup when using fio
Using krbd instead of librbd with qemu doesn't hang; however, in the guest with dd, the total sequential performance ... Brad House
11:05 AM Bug #10116: Ceph vm guest disk lockup when using fio
blktrace and qemu log attached as requested. I could not gracefully kill blktrace as the vm hardlocked so hopefully ... Brad House
10:33 AM Bug #10116: Ceph vm guest disk lockup when using fio
None of the Ceph threads in the provided backtraces appeared to be deadlocked. It's possible an IO completion is bein... Jason Dillaman
09:40 AM Bug #10116: Ceph vm guest disk lockup when using fio
logs from 3 runs back-to-back, forcibly killing the vm and restarting it between each attempt Brad House
09:28 AM Bug #10116: Ceph vm guest disk lockup when using fio
Brad, it would be helpful to see a few back-to-back GDB backtraces. In the full backtrace above, all blocked threads... Jason Dillaman
09:22 AM Bug #10116: Ceph vm guest disk lockup when using fio
CPU usage is 0 when the lock occurs, so I don't think it is due to excess cpu usage.
I can definitely try those ...
Brad House
08:35 AM Bug #10116: Ceph vm guest disk lockup when using fio
alexandre derumier wrote:
> Hi,
>
> >>kernel:BUG: soft lockup - CPU#0 stuck for 23s!
>
> by default share 1thr...
alexandre derumier
08:31 AM Bug #10116: Ceph vm guest disk lockup when using fio
Hi,
>>kernel:BUG: soft lockup - CPU#0 stuck for 23s!
by default share 1thread for many things (clock,io access,...
alexandre derumier
08:28 AM Bug #10116: Ceph vm guest disk lockup when using fio
During lockup:... Brad House
07:10 AM Bug #10116: Ceph vm guest disk lockup when using fio
I should also mention I am brad_mssw in the #ceph IRC channel on oftc if there are any suggestions or things to try. Brad House
05:47 AM Bug #10116: Ceph vm guest disk lockup when using fio
Realized I was missing the librados debug symbols, here it is again, and also backtraced all threads:... Brad House
05:36 AM Bug #10116: Ceph vm guest disk lockup when using fio
What is more interesting to me is if I break into it with GDB when it is hung, then tell it to continue, I get notifi... Brad House
09:49 AM Bug #10123 (Resolved): "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vps run
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-16_19:13:03-upgrade:dumpling-x-firefly-distro-basi... Yuri Weinstein
09:48 AM Bug #10122 (Resolved): "LibRBD.TestClone" FAILED in upgrade:dumpling-x-firefly-distro-basic-vps run
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-16_19:13:03-upgrade:dumpling-x-firefly-distro-basi... Yuri Weinstein

11/16/2014

06:20 PM Bug #10116: Ceph vm guest disk lockup when using fio
I wonder if this isn't the issue from #9854.
fio gets through writing the test files, and the lock occurs during t...
Brad House
02:31 PM Bug #10116 (Closed): Ceph vm guest disk lockup when using fio
When running a disk benchmark within a guest, I'm getting a disk lockup that doesn't ever appear to resolve itself. ... Brad House

11/13/2014

08:32 AM Bug #9854 (Pending Backport): librbd: reads contending for cache space can cause livelock
Jason Dillaman

11/12/2014

05:15 PM Bug #9771: Segmentation fault after upgrade v0.80.5 -> v0.80.6
Jason Dillaman
05:13 PM Bug #9771: Segmentation fault after upgrade v0.80.5 -> v0.80.6
Commit b75f85a2 added new elements to the _Thread_ class, breaking ABI. In this (and several other upgrade tests fro... Jason Dillaman
12:37 PM Bug #10002 (Resolved): Errors during import_export test in upgrade:firefly-x-next-distro-basic-vp...
commit:e94d3c11edb9c9cbcf108463fdff8404df79be33 Josh Durgin
06:53 AM Feature #2467 (Resolved): qemu: implement bdrv_invalidate_cache
Merged upstream: http://git.qemu.org/?p=qemu.git;a=commitdiff;h=be21788495fdc8251b04dd4bfd0cdce95c49d75b Jason Dillaman

11/11/2014

06:17 PM Bug #10002 (Fix Under Review): Errors during import_export test in upgrade:firefly-x-next-distro-...
https://github.com/ceph/ceph/pull/2899 Josh Durgin
08:23 AM Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
Same issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_17:15:02-upgrade:dumpling-firefly-x:parallel-... Yuri Weinstein
08:17 AM Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
Seems to be a similar issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_17:05:02-upgrade:firefly:singleton-f... Yuri Weinstein

11/10/2014

06:10 PM Bug #10045 (Resolved): common/Cond.h: 52: FAILED assert(mutex.is_locked()) in close_image()
Sage Weil
06:45 AM Bug #10045 (Resolved): common/Cond.h: 52: FAILED assert(mutex.is_locked()) in close_image()
... Sage Weil
09:53 AM Bug #10026 (Duplicate): "Assertion: common/Cond.h" in rbd-master-testing-basic-multi run
#10045 Sage Weil
09:51 AM Bug #10051 (Won't Fix): kernel-mounted RBD image may block shutdown
init-rbdmap fails to unmap an RBD image when the latter is still in use.
As a consequence, system shutdown hangs dead w...
Dmitry Smirnov
08:41 AM Bug #10030 (Pending Backport): Crash when attempting to open non-existent parent image
Sage Weil

11/09/2014

05:12 AM Feature #10037 (Resolved): cache-tier: Optimise RBD image removal
While removing an RBD image from an EC pool, I've noticed that it bubbles up to the caching pool, hence removal is very slow. ... Dmitry Smirnov

11/07/2014

07:48 AM Bug #10030 (Fix Under Review): Crash when attempting to open non-existent parent image
Jason Dillaman
07:09 AM Bug #10030 (Resolved): Crash when attempting to open non-existent parent image
If a child image is not able to open a parent image, librbd will incorrectly attempt to close the parent image handle... Jason Dillaman

11/06/2014

01:37 PM Bug #10026 (Duplicate): "Assertion: common/Cond.h" in rbd-master-testing-basic-multi run
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-05_23:00:03-rbd-master-testing-basic-multi/588155/... Yuri Weinstein
09:29 AM Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
Same issues in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-05_17:18:01-upgrade:firefly-x-next-distro-b... Yuri Weinstein

11/05/2014

08:39 AM Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
suite:upgrade:dumpling-firefly-x
Run http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-04_17:25:01-upgrade:dump...
Yuri Weinstein
08:32 AM Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
suite:upgrade:firefly:singleton
Run http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-04_18:41:21-upgrade:firef...
Yuri Weinstein

11/04/2014

12:09 PM Bug #9854: librbd: reads contending for cache space can cause livelock
PR: https://github.com/ceph/ceph/pull/2820 Jason Dillaman
09:12 AM Bug #9854 (Fix Under Review): librbd: reads contending for cache space can cause livelock
Jason Dillaman
10:59 AM Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
suite:upgrade:dumpling-firefly-x
Same issue in job http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_17:25...
Yuri Weinstein
09:29 AM Bug #10002 (Resolved): Errors during import_export test in upgrade:firefly-x-next-distro-basic-vp...
Two jobs failed ['584634', '584648']
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_17:18:0...
Yuri Weinstein
09:29 AM Bug #9742: `rbd map lun` fails with: (2) No such file or directory on kernel 3.14.14 w/ udev-216 ...
I'm guessing CRYPTO_CBC kernel config option is not enabled - -ENOENT is most likely because crypto core can't find a... Ilya Dryomov
09:01 AM Bug #9936 (Pending Backport): Exporting images larger than 2GB fails
Jason Dillaman

10/29/2014

11:44 AM Bug #9936: Exporting images larger than 2GB fails
PR: https://github.com/ceph/ceph/pull/2828 Jason Dillaman
11:43 AM Bug #9936 (Resolved): Exporting images larger than 2GB fails
An lseek64 result code is copied into an int32, causing an overflow for large images. Jason Dillaman
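
The overflow described in this entry can be illustrated with a minimal sketch. This is not the librbd code: the helper name store_in_int32 and the sample sizes are hypothetical, and Python's ctypes is used only to mimic what happens when a 64-bit offset is stored in a 32-bit signed integer.

```python
import ctypes

GIB = 1024 ** 3  # bytes in one GiB


def store_in_int32(offset: int) -> int:
    """Mimic assigning a 64-bit offset to a 32-bit signed int.

    ctypes integer types do no overflow checking, so the value wraps
    modulo 2**32, the same effect as a narrowing assignment in C.
    Hypothetical helper for illustration only.
    """
    return ctypes.c_int32(offset).value


if __name__ == "__main__":
    # An offset below 2 GiB survives the narrowing unchanged.
    print(store_in_int32(1 * GIB))
    # An offset at or above 2 GiB wraps to a negative number, which a
    # caller would typically misread as an error code.
    print(store_in_int32(3 * GIB))
```

Keeping the result in a 64-bit type (off_t or int64_t on the C side) avoids the wrap; whether the actual fix in the linked PR takes that form is not stated in this entry.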

10/27/2014

03:25 PM Bug #9391: fio rbd driver rewrites same blocks
@Mark: I have to take a look at fio for that. Is this all about sequential writes only? Do you see a different behavi... Danny Al-Gaaf

10/26/2014

04:51 PM Bug #9391: fio rbd driver rewrites same blocks
Hi Guys,
This is all on the fio side. From what I remember, when you are doing sequential writes and specify mult...
Mark Nelson
11:49 AM Bug #9855 (Resolved): rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-...
fixed test Sage Weil

10/24/2014

11:26 AM Bug #8912: librbd segfaults when creating new image (rbd-ephemeral-clone-stable-icehouse)
For better searchability, the backtrace for this crash is:... Josh Durgin
11:24 AM Bug #9513 (Pending Backport): rbd_cache=true default setting is degading librbd performance ~10X ...
reverted the backport for now as fully fixing the ObjectCacher is too large a change close to the giant release Josh Durgin

10/23/2014

01:47 PM Bug #9854: librbd: reads contending for cache space can cause livelock
Reads thrashing the cache can be reproduced with:... Josh Durgin
11:38 AM Feature #9733: Separate rbd listing into CAP
Is the list of OSD class methods documented somewhere? Robert LeBlanc

10/22/2014

12:17 PM Bug #9854 (In Progress): librbd: reads contending for cache space can cause livelock
Jason Dillaman
11:41 AM Bug #9854: librbd: reads contending for cache space can cause livelock
Update:
Run teuthology-2014-10-21_23:17:01-upgrade:firefly:newer-firefly-distro-basic-vps
Job: ['565380']
Logs...
Yuri Weinstein
09:46 AM Bug #9857 (Resolved): rbd readahead division by zero exception
Jason Dillaman
09:45 AM Bug #9857: rbd readahead division by zero exception
PR: https://github.com/ceph/ceph/pull/2770 Jason Dillaman

10/21/2014

05:19 PM Feature #9733: Separate rbd listing into CAP
It sounds like Nova is configured to use RBD as the backing store for its ephemeral disk images instead of the local ... Jason Dillaman
11:51 AM Feature #9733: Separate rbd listing into CAP
OK, putting the pool argument first does work. We have consequently found out that Nova does require list permissions... Robert LeBlanc
10:54 AM Feature #9733: Separate rbd listing into CAP
Try placing the "pool=test" argument before the "object_prefix XYZ" portion of the cap:... Jason Dillaman
04:44 PM Bug #9857 (Fix Under Review): rbd readahead division by zero exception
Jason Dillaman
03:53 PM Bug #9857 (In Progress): rbd readahead division by zero exception
Jason Dillaman
02:42 PM Bug #9857 (Resolved): rbd readahead division by zero exception
When using old-format RBD images, the RBD readahead block alignments are initialized to zero because the stripe param... Jason Dillaman
04:07 PM Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
Tamilarasi muthamizhan wrote:
> I think this issue could be related to bug # 9288, upgrading clients when workload i...
Sage Weil
02:04 PM Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
I think this issue could be related to bug # 9288, upgrading clients when workload is in progress.
Tamilarasi muthamizhan
02:02 PM Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
more logs:
ubuntu@teuthology:/a/teuthology-2014-10-20_18:40:02-upgrade:firefly:older-firefly-distro-basic-vps/561562
Tamilarasi muthamizhan
11:11 AM Bug #9855: rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-vps run
logs: ubuntu@teuthology:/a/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561993 Tamilarasi muthamizhan
11:07 AM Bug #9855 (Resolved): rbd "Segmentation fault" in upgrade:firefly:singleton-firefly-distro-basic-...
On:
os_type: rhel
os_version: '6.4'
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-20_20:50:...
Yuri Weinstein
09:16 AM Bug #9854 (Resolved): librbd: reads contending for cache space can cause livelock
As a result of accounting for reads properly with #9513. Using qemu-io (a test program) is one way to trigger this - ... Josh Durgin
09:06 AM Bug #9513 (Resolved): rbd_cache=true default setting is degading librbd performance ~10X in Giant
backported in commit:65be257e9295619b960b49f6aa80ecdf8ea4d16a Josh Durgin

10/20/2014

04:14 PM Feature #9733: Separate rbd listing into CAP
OK, so one more question. This looks like it allows access to any pool. Is there a way to limit this to a particular ... Robert LeBlanc
 
