Bug #11468

closed

ENOTEMPTY removing a pg

Added by Yuri Weinstein almost 9 years ago. Updated almost 9 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
upgrade/giant-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

These are new test runs in the typica lab.
Run: http://pulpito-rdu.front.sepia.ceph.com/teuthology-2015-04-23_15:05:02-upgrade:giant-x-hammer-distro-basic-typica/
Jobs: ['3232', '3234']
Logs: http://typica002.front.sepia.ceph.com/teuthology-2015-04-23_15:05:02-upgrade:giant-x-hammer-distro-basic-typica/3232/

Assertion: common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.94.1-6-g8a58d83 (8a58d83b0d039d2c2be353fee9c57c4e6181b662)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc275b]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2a9) [0xafef79]
 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xaff806]
 4: (ceph::HeartbeatMap::check_touch_file()+0x17) [0xaffee7]
 5: (CephContextServiceThread::entry()+0x154) [0xbd2694]
 6: (()+0x8182) [0x7fb2690c8182]
 7: (clone()+0x6d) [0x7fb267612efd]
Actions #1

Updated by Yuri Weinstein almost 9 years ago

And again in the same suite, with a different ceph version:
Run: http://pulpito-rdu.front.sepia.ceph.com/teuthology-2015-04-25_15:05:01-upgrade:giant-x-hammer-distro-basic-typica/
Jobs: ['4608', '4610']
Logs: http://typica002.front.sepia.ceph.com/teuthology-2015-04-25_15:05:01-upgrade:giant-x-hammer-distro-basic-typica/4608/

Assertion: common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.87.1-108-gc1301e8 (c1301e84aee0f399db85e2d37818a66147a0ce78)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb8061b]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2a9) [0xabf679]
 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xabff06]
 4: (ceph::HeartbeatMap::check_touch_file()+0x17) [0xac05e7]
 5: (CephContextServiceThread::entry()+0x154) [0xb94bf4]
 6: (()+0x8182) [0x7f6e3fd00182]
 7: (clone()+0x6d) [0x7f6e3e24a47d]
Actions #3

Updated by Samuel Just almost 9 years ago

  • Regression set to No

It seems to have hung on split. I tried to get at the corefile, but the version had been removed.

Actions #4

Updated by Samuel Just almost 9 years ago

We may have to simply reduce the number of osds/node in these upgrade tests. 4/disk may just be too many.

Actions #5

Updated by Samuel Just almost 9 years ago

The typica nodes appear to be timing out with 4 osds on a single typica node.

Actions #6

Updated by Samuel Just almost 9 years ago

We should install collectl: http://tracker.ceph.com/issues/11589

Actions #7

Updated by Sage Weil almost 9 years ago

  • Subject changed from "FAILED assert(0 == "hit suicide timeout")" in upgrade:giant-x-hammer-distro-basic-typica run to ENOTEMPTY removing a pg
Actions #8

Updated by Samuel Just almost 9 years ago

  • Status changed from New to Can't reproduce