Bug #3946

closed

rbd fsx failing in nightly

Added by Sage Weil about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

6376 FAIL scheduled_teuthology@teuthology collection:rbd-thrash clusters:6-osd-3-machine.yaml fs:btrfs.yaml msgr-failures:few.yaml thrashers:default.yaml workloads:rbd_fsx_cache_writeback.yaml 294s
6380 FAIL scheduled_teuthology@teuthology collection:rbd-thrash clusters:6-osd-3-machine.yaml fs:ext4.yaml msgr-failures:few.yaml thrashers:default.yaml workloads:rbd_fsx_cache_writeback.yaml 249s
6381 FAIL scheduled_teuthology@teuthology collection:rbd-thrash clusters:6-osd-3-machine.yaml fs:ext4.yaml msgr-failures:few.yaml thrashers:default.yaml workloads:rbd_fsx_cache_writethrough.yaml 234s
6384 FAIL scheduled_teuthology@teuthology collection:rbd-thrash clusters:6-osd-3-machine.yaml fs:xfs.yaml msgr-failures:few.yaml thrashers:default.yaml workloads:rbd_fsx_cache_writeback.yaml 284s
6385 FAIL scheduled_teuthology@teuthology collection:rbd-thrash clusters:6-osd-3-machine.yaml fs:xfs.yaml msgr-failures:few.yaml thrashers:default.yaml workloads:rbd_fsx_cache_writethrough.yaml 398s
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-01-27_01:00:03-regression-master-testing-gcov$
Actions #1

Updated by Sage Weil about 11 years ago

  • Project changed from Ceph to rbd
Actions #2

Updated by Ian Colle about 11 years ago

  • Assignee set to Josh Durgin
Actions #3

Updated by Josh Durgin about 11 years ago

I'm guessing these are related to recent objectcacher changes, since runs without caching weren't affected. The core files seem to be corrupt for some reason, and there's no trace of what happened in the logs.

Actions #4

Updated by Josh Durgin about 11 years ago

Reproducing locally seems to confirm this: the crash is in flush_set(), and there was a recent change to replace commit_set() with flush_set():

#0  0x0000000000000031 in ?? ()
#1  0x00007f52b94e8800 in ~C_GatherBuilder (this=0x7fff34b20c60, __in_chrg=<value optimized out>) at ./include/Context.h:273
#2  0x00007f52b94e34fe in ObjectCacher::flush_set (this=0x1558910, oset=0x1558dd0, onfinish=0x1675660) at osdc/ObjectCacher.cc:1501
#3  0x00007f52b94aa808 in librbd::ImageCtx::flush_cache (this=0x1558010) at librbd/ImageCtx.cc:517
#4  0x00007f52b94c887c in librbd::_flush (ictx=0x1558010) at librbd/internal.cc:2475
#5  0x00007f52b94c27b0 in librbd::ictx_refresh (ictx=0x1558010) at librbd/internal.cc:1725
#6  0x00007f52b94c0860 in librbd::ictx_check (ictx=0x1558010) at librbd/internal.cc:1530
#7  0x00007f52b94b3cfe in librbd::snap_protect (ictx=0x1558010, snap_name=0x40744a "snap") at librbd/internal.cc:528
#8  0x00007f52b948ef50 in rbd_snap_protect (image=0x1558010, snap_name=0x40744a "snap") at librbd/librbd.cc:841
#9  0x0000000000405006 in do_clone () at test/librbd/fsx.c:837
#10 0x0000000000405b31 in test () at test/librbd/fsx.c:1073
#11 0x00000000004067f1 in main (argc=2, argv=0x7fff34b231e8) at test/librbd/fsx.c:1551

This occurred after just 246 ops in test_librbd_fsx with rbd caching on.

Actions #5

Updated by Josh Durgin about 11 years ago

  • Status changed from New to Resolved

The cause was just an extra delete in a code path in flush_set that wasn't exercised before. Fixed by commit:3bc21143552b35698c9916c67494336de8964d2a
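
For readers unfamiliar with this class of bug, here is a minimal sketch of the general pattern. It is not the code from the commit above. It assumes a completion callback that frees itself when completed, so any further delete by the caller is a double free. `Completion`, `flush_nothing_dirty`, and `run_flush_demo` are hypothetical names made up for illustration and are not part of ObjectCacher or librbd. A call through an already-freed object is one plausible way to end up at a nonsense address like the one in frame #0, though the thread does not spell out the exact mechanism.

```cpp
#include <cassert>

// Hypothetical stand-in for a self-deleting completion callback.
// This is NOT Ceph's Context class; it only mimics the ownership rule
// "complete() consumes the object".
struct Completion {
    static int live;   // number of Completion objects currently allocated
    int *result;       // where the callback stores its return code

    explicit Completion(int *r) : result(r) { ++live; }
    ~Completion() { --live; }

    // Run the callback, then free the object. After this call the
    // pointer held by the caller is dangling.
    void complete(int r) {
        *result = r;
        delete this;
    }
};

int Completion::live = 0;

// Hypothetical flush routine taking an early-exit path (nothing dirty),
// where the callback is completed synchronously.
void flush_nothing_dirty(Completion *onfinish) {
    onfinish->complete(0);  // ownership handed off here
    // BUG PATTERN (assumed, for illustration only): adding
    //     delete onfinish;
    // at this point would free the object a second time. That is
    // undefined behavior and may only show up once this rarely used
    // path is finally exercised.
}

// Drives the sketch: allocates one callback, flushes, and checks that
// the object was freed exactly once. Returns the callback's result.
int run_flush_demo() {
    int result = -1;
    Completion *c = new Completion(&result);
    flush_nothing_dirty(c);
    assert(Completion::live == 0);  // freed once, no leak
    return result;
}
```

Calling `run_flush_demo()` from a test or from `main` exercises the early-exit path. The live-object counter is only there to make the single-owner rule checkable; tools such as Valgrind or AddressSanitizer would normally be used to catch a real double free.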
