Bug #14766

closed

OSD reporting ENOTEMPTY and crashing

Added by Jeffrey McDonald about 8 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source:
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Mail thread: "inconsistent PG -> unfound objects on an erasure coded system"

Sam's update: This bug results in both the ENOTEMPTY symptom described below and PGs erroneously reported as inconsistent (see above thread).

Hi,

I'm seeing a large number of these errors on my Ceph cluster. I have 310 OSDs (roughly 240 in this erasure-coded pool). The system is under load and rebalancing due to some failed OSDs, and I'm getting the errors below at a rate of 8-10 per day. The problem seems to be that there is unexpected data in the OSD data directory. I've noticed other tickets with the same issue, but I haven't found one where it is resolved. To repair this, I have been removing the empty directories and restarting the OSDs. Is this problem indicative of a larger issue? Is there a better way to repair it?
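For anyone applying the same workaround, here is a minimal read-only sketch of how the leftover empty directories could be listed before touching anything. It is not a Ceph tool: `find_empty_dirs` is a hypothetical helper, the on-disk location of the collection directory is assumed to be known to the operator, and the script only reports directories, it never deletes them.

```python
import os


def find_empty_dirs(root):
    """Return subdirectories of root that contain no files at any depth.

    Read-only: nothing is removed. Symlinks are treated as content so
    that a directory holding one is never reported as empty.
    """
    has_content = {}
    empty = []
    # Walk bottom-up so every child is classified before its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        child_has_content = False
        for name in dirnames:
            child = os.path.join(dirpath, name)
            if os.path.islink(child) or has_content.get(child, False):
                child_has_content = True
        has_content[dirpath] = bool(filenames) or child_has_content
        if dirpath != root and not has_content[dirpath]:
            empty.append(dirpath)
    return sorted(empty)


if __name__ == "__main__":
    import sys

    # The collection directory is passed on the command line; its path
    # depends on the OSD's layout and is not assumed here.
    for path in find_empty_dirs(sys.argv[1]):
        print(path)
```

Run it against the collection directory named in the failing transaction while the OSD is stopped, and review the output by hand before removing anything.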

    # uname -a
    Linux ceph07 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    # ceph --version
    ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)

Below is the OSD log with the failure.

Regards,
Jeff

    -4> 2016-02-15 01:44:14.976891 7fb567987700  1 -- 10.31.0.7:6811/858240 --> 10.31.0.67:0/857704 -- osd_ping(ping_reply e164491 stamp 2016-02-15 01:44:14.969356) v2 -- ?+0 0x33555400 con 0x131a8c60
    -3> 2016-02-15 01:44:14.994698 7fb576097700  0 filestore(/var/lib/ceph/osd/ceph-185)  error (39) Directory not empty not handled on operation 0x1c543098 (19395964.0.1, or op 1, counting from 0)
    -2> 2016-02-15 01:44:14.994853 7fb576097700  0 filestore(/var/lib/ceph/osd/ceph-185) ENOTEMPTY suggests garbage data in osd data dir
    -1> 2016-02-15 01:44:14.994868 7fb576097700  0 filestore(/var/lib/ceph/osd/ceph-185)  transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "remove",
            "collection": "70.53as4_head",
            "oid": "53a\/\/head\/\/70\/18446744073709551615\/4" 
        },
        {
            "op_num": 1,
            "op_name": "rmcoll",
            "collection": "70.53as4_head" 
        }
    ]
}

     0> 2016-02-15 01:44:15.002104 7fb576097700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7fb576097700 time 2016-02-15 01:44:14.996917
os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error")

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 6: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 7: (()+0x8182) [0x7fb582384182]
 8: (clone()+0x6d) [0x7fb5808ef47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.185.log
--- end dump of recent events ---
2016-02-15 01:44:15.087422 7fb576097700 -1 *** Caught signal (Aborted) **
 in thread 7fb576097700

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7fb58238c340]
 3: (gsignal()+0x39) [0x7fb58082bcc9]
 4: (abort()+0x148) [0x7fb58082f0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb581136535]
 6: (()+0x5e6d6) [0x7fb5811346d6]
 7: (()+0x5e703) [0x7fb581134703]
 8: (()+0x5e922) [0x7fb581134922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 15: (()+0x8182) [0x7fb582384182]
 16: (clone()+0x6d) [0x7fb5808ef47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2016-02-15 01:44:15.087422 7fb576097700 -1 *** Caught signal (Aborted) **
 in thread 7fb576097700

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7fb58238c340]
 3: (gsignal()+0x39) [0x7fb58082bcc9]
 4: (abort()+0x148) [0x7fb58082f0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb581136535]
 6: (()+0x5e6d6) [0x7fb5811346d6]
 7: (()+0x5e703) [0x7fb581134703]
 8: (()+0x5e922) [0x7fb581134922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 15: (()+0x8182) [0x7fb582384182]
 16: (clone()+0x6d) [0x7fb5808ef47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.185.log
--- end dump of recent events ---
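Since these crashes recur several times a day across many OSDs, it may help to pull the failing operation out of each OSD log automatically. The following is a rough sketch that matches only the "not handled on operation" line format shown in the log above; `find_unhandled_errors` and the regular expression are illustrative assumptions, not part of Ceph, and other Ceph versions may word the message differently.

```python
import re

# Matches lines such as:
#   filestore(/var/lib/ceph/osd/ceph-185)  error (39) Directory not empty
#   not handled on operation 0x1c543098 (19395964.0.1, or op 1, counting from 0)
ERR_RE = re.compile(
    r"filestore\((?P<path>[^)]+)\)\s+error \((?P<errno>\d+)\) (?P<msg>.+?) "
    r"not handled on operation \S+ "
    r"\((?P<seq>[\d.]+), or op (?P<op>\d+), counting from 0\)"
)


def find_unhandled_errors(lines):
    """Return one dict per 'not handled on operation' line found in lines."""
    hits = []
    for line in lines:
        match = ERR_RE.search(line)
        if match:
            hits.append({
                "osd_path": match.group("path"),
                "errno": int(match.group("errno")),
                "message": match.group("msg"),
                "op_num": int(match.group("op")),
            })
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python scan_osd_log.py /var/log/ceph/ceph-osd.185.log
    with open(sys.argv[1], errors="replace") as log:
        for hit in find_unhandled_errors(log):
            print(hit)
```

For the log above, the reported `op_num` is 1, which corresponds to the `rmcoll` on collection `70.53as4_head` in the transaction dump.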

Related issues 2 (0 open, 2 closed)

Has duplicate: Ceph - Bug #15003: [hammer, master?] ec pool deep scrub turns up inconsistent objects inconsistently (Duplicate, Samuel Just, 03/07/2016)
Copied to: Ceph - Backport #15149: hammer: OSD reporting ENOTEMPTY and crashing (Resolved, Samuel Just)