Bug #14766

OSD reporting ENOTEMPTY and crashing

Added by Jeffrey McDonald over 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
Start date: 02/15/2016
Due date:
% Done: 0%
Source:
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Mail thread: "inconsistent PG -> unfound objects on an erasure coded system"

Sam's update: This bug results both in the ENOTEMPTY symptom described below and in PGs being erroneously reported as inconsistent (see the thread above).

Hi,

I'm seeing a large number of these types of errors on my Ceph cluster. I have 310 OSDs (roughly 240 in this erasure coded pool). The system is under load and rebalancing because of some failed OSDs, and I'm getting the errors below at a rate of 8-10 per day. The problem seems to be that unexpected data is left behind in the OSD data directory. I've noticed other tickets with this same issue, but I haven't found any indication that it has been resolved. To repair it, I have been removing the empty directories and restarting the OSDs. Is this problem indicative of a larger issue? Is there a better way to repair it?
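The failing op in both transaction dumps in this report is rmcoll on the PG collection. At the filesystem level, ENOTEMPTY (errno 39 on Linux, matching "error (39) Directory not empty" in the log) is what rmdir(2) returns while a directory still has any entry, including empty subdirectories. The sketch below is plain Python on a throwaway directory, not FileStore code, and its directory and file names are hypothetical: an otherwise empty tree of DIR_* subdirectories can be cleared bottom-up, but a single zero-length file, like the stubs reported later in this thread, keeps the directory non-empty.

```python
import errno
import os
import tempfile


def try_rmdir(path):
    """Call rmdir(2) on path; return 0 on success, otherwise the errno."""
    try:
        os.rmdir(path)
        return 0
    except OSError as e:
        return e.errno


def remove_empty_dirs_bottom_up(top):
    """Try to remove every subdirectory of top, deepest first; top itself is kept."""
    for dirpath, _dirnames, _filenames in os.walk(top, topdown=False):
        if dirpath != top:
            try_rmdir(dirpath)  # fails quietly (ENOTEMPTY) if entries remain


def demo(with_stub):
    """Return (errno before cleanup, errno after cleanup) for rmdir on a fake collection."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-in for a PG collection with hashed DIR_* levels.
        coll = os.path.join(root, "70.9ees1_head")
        leaf = os.path.join(coll, "DIR_E", "DIR_E", "DIR_9", "DIR_2")
        os.makedirs(leaf)
        if with_stub:
            # One zero-length leftover file, like the stubs in this report.
            open(os.path.join(leaf, "leftover_stub_0_long"), "w").close()
        before = try_rmdir(coll)
        remove_empty_dirs_bottom_up(coll)
        after = try_rmdir(coll)
        return before, after


for with_stub in (False, True):
    before, after = demo(with_stub)
    print(with_stub, errno.errorcode.get(before, "OK"), errno.errorcode.get(after, "OK"))
```

Without the stub, rmdir fails before cleanup and succeeds after it; with the stub, it fails both times.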

  # uname -a
    Linux ceph07 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
  # ceph --version
    ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)

Below is the OSD log with the failure.

Regards,
Jeff

    -4> 2016-02-15 01:44:14.976891 7fb567987700  1 -- 10.31.0.7:6811/858240 --> 10.31.0.67:0/857704 -- osd_ping(ping_reply e164491 stamp 2016-02-15 01:44:14.969356) v2 -- ?+0 0x33555400 con 0x131a8c60
    -3> 2016-02-15 01:44:14.994698 7fb576097700  0 filestore(/var/lib/ceph/osd/ceph-185)  error (39) Directory not empty not handled on operation 0x1c543098 (19395964.0.1, or op 1, counting from 0)
    -2> 2016-02-15 01:44:14.994853 7fb576097700  0 filestore(/var/lib/ceph/osd/ceph-185) ENOTEMPTY suggests garbage data in osd data dir
    -1> 2016-02-15 01:44:14.994868 7fb576097700  0 filestore(/var/lib/ceph/osd/ceph-185)  transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "remove",
            "collection": "70.53as4_head",
            "oid": "53a\/\/head\/\/70\/18446744073709551615\/4" 
        },
        {
            "op_num": 1,
            "op_name": "rmcoll",
            "collection": "70.53as4_head" 
        }
    ]
}

     0> 2016-02-15 01:44:15.002104 7fb576097700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7fb576097700 time 2016-02-15 01:44:14.996917
os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error")

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 6: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 7: (()+0x8182) [0x7fb582384182]
 8: (clone()+0x6d) [0x7fb5808ef47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.185.log
--- end dump of recent events ---
2016-02-15 01:44:15.087422 7fb576097700 -1 *** Caught signal (Aborted) **
 in thread 7fb576097700

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7fb58238c340]
 3: (gsignal()+0x39) [0x7fb58082bcc9]
 4: (abort()+0x148) [0x7fb58082f0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb581136535]
 6: (()+0x5e6d6) [0x7fb5811346d6]
 7: (()+0x5e703) [0x7fb581134703]
 8: (()+0x5e922) [0x7fb581134922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 15: (()+0x8182) [0x7fb582384182]
 16: (clone()+0x6d) [0x7fb5808ef47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Related issues

Duplicated by Ceph - Bug #15003: [hammer, master?] ec pool deep scrub turns up inconsistent objects inconsistently Duplicate 03/07/2016
Copied to Ceph - Backport #15149: hammer: OSD reporting ENOTEMPTY and crashing Resolved

History

#1 Updated by Kefu Chai over 3 years ago

  • Description updated

#2 Updated by Samuel Just over 3 years ago

It would help if you could post the contents of one of the pg directories where this happened.

#3 Updated by Jeffrey McDonald over 3 years ago

Hi,
I'm not sure whether these are related. Here is an instance where the OSD crashes and then, on restart, fails with the error about the existing collection. Once I remove the empty collection, the OSD starts without error.

Under heavy remapping load, I'm seeing many of these events, on the order of 5 per hour across 370 OSDs.

Regards,
Jeff

   -31> 2016-02-20 15:40:31.403966 7f858345f700  1 -- 10.31.0.1:6883/3768668 <== osd.314 10.31.0.3:6910/5632023 51085 ==== MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)]) v2 ==== 150+0+0 (1807425228 0 0) 0x6dcd8200 con 0x41bd8000
   -30> 2016-02-20 15:40:31.403980 7f858345f700  5 -- op tracker -- seq: 2181493, time: 2016-02-20 15:40:31.403928, event: header_read, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
   -29> 2016-02-20 15:40:31.403985 7f858345f700  5 -- op tracker -- seq: 2181493, time: 2016-02-20 15:40:31.403929, event: throttled, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
   -28> 2016-02-20 15:40:31.403988 7f858345f700  5 -- op tracker -- seq: 2181493, time: 2016-02-20 15:40:31.403961, event: all_read, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
   -27> 2016-02-20 15:40:31.403991 7f858345f700  5 -- op tracker -- seq: 2181493, time: 0.000000, event: dispatched, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
   -26> 2016-02-20 15:40:31.404187 7f85b951c700  5 -- op tracker -- seq: 2181491, time: 2016-02-20 15:40:31.404187, event: reached_pg, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -25> 2016-02-20 15:40:31.404775 7f85b951c700  5 -- op tracker -- seq: 2181491, time: 2016-02-20 15:40:31.404775, event: done, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -24> 2016-02-20 15:40:31.404921 7f85b951c700  5 -- op tracker -- seq: 2181493, time: 2016-02-20 15:40:31.404921, event: reached_pg, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
   -23> 2016-02-20 15:40:31.405170 7f85b951c700  5 -- op tracker -- seq: 2181493, time: 2016-02-20 15:40:31.405170, event: done, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
   -22> 2016-02-20 15:40:31.406756 7f85890bb700  1 -- 10.31.0.1:6883/3768668 <== osd.187 10.31.0.7:6912/3477344 10441 ==== MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0)) v1 ==== 1048776+0+0 (3333256963 0 0) 0x54ae9900 con 0x4737d020
   -21> 2016-02-20 15:40:31.406788 7f85890bb700  5 -- op tracker -- seq: 2181494, time: 2016-02-20 15:40:31.402486, event: header_read, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -20> 2016-02-20 15:40:31.406850 7f85890bb700  5 -- op tracker -- seq: 2181494, time: 2016-02-20 15:40:31.402487, event: throttled, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -19> 2016-02-20 15:40:31.406853 7f85890bb700  5 -- op tracker -- seq: 2181494, time: 2016-02-20 15:40:31.406732, event: all_read, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -18> 2016-02-20 15:40:31.406873 7f85890bb700  5 -- op tracker -- seq: 2181494, time: 0.000000, event: dispatched, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -17> 2016-02-20 15:40:31.406954 7f85b951c700  5 -- op tracker -- seq: 2181494, time: 2016-02-20 15:40:31.406954, event: reached_pg, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -16> 2016-02-20 15:40:31.407253 7f85b951c700  5 -- op tracker -- seq: 2181494, time: 2016-02-20 15:40:31.407253, event: done, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -15> 2016-02-20 15:40:31.413944 7f855f320700  1 -- 10.31.0.1:6883/3768668 <== osd.318 10.31.0.3:6813/6613649 23696 ==== MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0)) v1 ==== 1048776+0+0 (876574210 0 0) 0x4dac2f80 con 0x47c77760
   -14> 2016-02-20 15:40:31.413982 7f855f320700  5 -- op tracker -- seq: 2181495, time: 2016-02-20 15:40:31.411037, event: header_read, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -13> 2016-02-20 15:40:31.413988 7f855f320700  5 -- op tracker -- seq: 2181495, time: 2016-02-20 15:40:31.411039, event: throttled, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -12> 2016-02-20 15:40:31.413991 7f855f320700  5 -- op tracker -- seq: 2181495, time: 2016-02-20 15:40:31.413925, event: all_read, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -11> 2016-02-20 15:40:31.413994 7f855f320700  5 -- op tracker -- seq: 2181495, time: 0.000000, event: dispatched, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
   -10> 2016-02-20 15:40:31.414051 7f85b951c700  5 -- op tracker -- seq: 2181495, time: 2016-02-20 15:40:31.414051, event: reached_pg, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
    -9> 2016-02-20 15:40:31.414792 7f85b951c700  5 -- op tracker -- seq: 2181495, time: 2016-02-20 15:40:31.414792, event: done, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
    -8> 2016-02-20 15:40:31.415648 7f85c4532700  1 -- 10.31.0.1:6883/3768668 <== osd.44 10.31.0.4:6895/3507238 36127 ==== pg_trim(70.8e8 to 146705'273589 e227332) v1 ==== 34+0+0 (2877468469 0 0) 0x6433ae00 con 0x48076940
    -7> 2016-02-20 15:40:31.415675 7f85c4532700  5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415537, event: header_read, op: pg_trim(70.8e8 to 146705'273589 e227332)
    -6> 2016-02-20 15:40:31.415680 7f85c4532700  5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415538, event: throttled, op: pg_trim(70.8e8 to 146705'273589 e227332)
    -5> 2016-02-20 15:40:31.415682 7f85c4532700  5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415588, event: all_read, op: pg_trim(70.8e8 to 146705'273589 e227332)
    -4> 2016-02-20 15:40:31.415689 7f85c4532700  5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415672, event: dispatched, op: pg_trim(70.8e8 to 146705'273589 e227332)
    -3> 2016-02-20 15:40:31.415692 7f85c4532700  5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415692, event: waiting_for_osdmap, op: pg_trim(70.8e8 to 146705'273589 e227332)
    -2> 2016-02-20 15:40:31.415700 7f85c4532700  5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415700, event: started, op: pg_trim(70.8e8 to 146705'273589 e227332)
    -1> 2016-02-20 15:40:31.415708 7f85c4532700  5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415708, event: done, op: pg_trim(70.8e8 to 146705'273589 e227332)
     0> 2016-02-20 15:40:31.426524 7f85d15c1700 -1 *** Caught signal (Aborted) **
 in thread 7f85d15c1700

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7f85de35c340]
 3: (gsignal()+0x39) [0x7f85dc7fbcc9]
 4: (abort()+0x148) [0x7f85dc7ff0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f85dd106535]
 6: (()+0x5e6d6) [0x7f85dd1046d6]
 7: (()+0x5e703) [0x7f85dd104703]
 8: (()+0x5e922) [0x7f85dd104922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 15: (()+0x8182) [0x7f85de354182]
 16: (clone()+0x6d) [0x7f85dc8bf47d]
  log_file /var/log/ceph/ceph-osd.227.log
--- end dump of recent events ---
----
OSD restarted: 
---

2016-02-20 15:45:44.409242 7f251eb10900  0 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 4037000
2016-02-20 15:45:44.509302 7f251eb10900  0 filestore(/var/lib/ceph/osd/ceph-227) backend xfs (magic 0x58465342)
2016-02-20 15:45:44.601867 7f251eb10900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-227) detect_features: FIEMAP ioctl is supported and appears to work
2016-02-20 15:45:44.601888 7f251eb10900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-227) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2016-02-20 15:45:44.619323 7f251eb10900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-227) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2016-02-20 15:45:44.619491 7f251eb10900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-227) detect_feature: extsize is supported and kernel 3.13.0-65-generic >= 3.5
2016-02-20 15:45:45.098361 7f251eb10900  0 filestore(/var/lib/ceph/osd/ceph-227) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
root@cephmon1:~# ssh ceph01 tail -200 /var/log/ceph/ceph-osd.227.log
   -20> 2016-02-20 15:45:52.638819 7fc309809900  2 journal read_entry 911970304 : seq 17104051 46 bytes
   -19> 2016-02-20 15:45:52.638824 7fc309809900  3 journal journal_replay: applying op seq 17104051
   -18> 2016-02-20 15:45:52.638826 7fc309809900  3 journal journal_replay: r = 0, op_seq now 17104051
   -17> 2016-02-20 15:45:52.638830 7fc309809900  2 journal read_entry 911974400 : seq 17104052 46 bytes
   -16> 2016-02-20 15:45:52.638835 7fc309809900  3 journal journal_replay: applying op seq 17104052
   -15> 2016-02-20 15:45:52.638837 7fc309809900  3 journal journal_replay: r = 0, op_seq now 17104052
   -14> 2016-02-20 15:45:52.639234 7fc309809900  2 journal read_entry 911978496 : seq 17104053 1050545 bytes
   -13> 2016-02-20 15:45:52.639242 7fc309809900  3 journal journal_replay: applying op seq 17104053
   -12> 2016-02-20 15:45:52.640747 7fc309809900  3 journal journal_replay: r = 0, op_seq now 17104053
   -11> 2016-02-20 15:45:52.640779 7fc309809900  2 journal read_entry 913035264 : seq 17104054 1007 bytes
   -10> 2016-02-20 15:45:52.640781 7fc309809900  3 journal journal_replay: applying op seq 17104054
    -9> 2016-02-20 15:45:52.640823 7fc309809900  3 journal journal_replay: r = 0, op_seq now 17104054
    -8> 2016-02-20 15:45:52.640836 7fc309809900  2 journal read_entry 913039360 : seq 17104055 46 bytes
    -7> 2016-02-20 15:45:52.640842 7fc309809900  3 journal journal_replay: applying op seq 17104055
    -6> 2016-02-20 15:45:52.640844 7fc309809900  3 journal journal_replay: r = 0, op_seq now 17104055
    -5> 2016-02-20 15:45:52.640848 7fc309809900  2 journal read_entry 913043456 : seq 17104056 264 bytes
    -4> 2016-02-20 15:45:52.640849 7fc309809900  3 journal journal_replay: applying op seq 17104056
    -3> 2016-02-20 15:45:52.641064 7fc309809900  0 filestore(/var/lib/ceph/osd/ceph-227)  error (39) Directory not empty not handled on operation 0x4cc8336 (17104056.0.1, or op 1, counting from 0)
    -2> 2016-02-20 15:45:52.641081 7fc309809900  0 filestore(/var/lib/ceph/osd/ceph-227) ENOTEMPTY suggests garbage data in osd data dir
    -1> 2016-02-20 15:45:52.641083 7fc309809900  0 filestore(/var/lib/ceph/osd/ceph-227)  transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "remove",
            "collection": "70.9ees1_head",
            "oid": "9ee\/\/head\/\/70\/18446744073709551615\/1" 
        },
        {
            "op_num": 1,
            "op_name": "rmcoll",
            "collection": "70.9ees1_head" 
        }
    ]
}

     0> 2016-02-20 15:45:52.643388 7fc309809900 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7fc309809900 time 2016-02-20 15:45:52.641117
os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error")

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 4: (JournalingObjectStore::journal_replay(unsigned long)+0x5cb) [0x94355b]
 5: (FileStore::mount()+0x3bb6) [0x9139f6]
 6: (OSD::init()+0x259) [0x6c59b9]
 7: (main()+0x2860) [0x6527e0]
 8: (__libc_start_main()+0xf5) [0x7fc306947ec5]
 9: /usr/bin/ceph-osd() [0x66b887]
  log_file /var/log/ceph/ceph-osd.227.log
--- end dump of recent events ---
2016-02-20 15:45:52.648074 7fc309809900 -1 *** Caught signal (Aborted) **
 in thread 7fc309809900

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7fc3084bd340]
 3: (gsignal()+0x39) [0x7fc30695ccc9]
 4: (abort()+0x148) [0x7fc3069600d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fc307267535]
 6: (()+0x5e6d6) [0x7fc3072656d6]
 7: (()+0x5e703) [0x7fc307265703]
 8: (()+0x5e922) [0x7fc307265922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 12: (JournalingObjectStore::journal_replay(unsigned long)+0x5cb) [0x94355b]
 13: (FileStore::mount()+0x3bb6) [0x9139f6]
 14: (OSD::init()+0x259) [0x6c59b9]
 15: (main()+0x2860) [0x6527e0]
 16: (__libc_start_main()+0xf5) [0x7fc306947ec5]
 17: /usr/bin/ceph-osd() [0x66b887]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Here is a listing of that directory:

ls -lR /var/lib/ceph/osd/ceph-227/current/70.9ees1_head
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head:
total 8
drwxr-xr-x 3 root root 4096 Dec 16 07:37 DIR_E

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E:
total 8
drwxr-xr-x 3 root root 4096 Dec 16 07:37 DIR_E

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E:
total 16
drwxr-xr-x 16 root root 28672 Feb 20 15:40 DIR_9

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9:
total 12
drwxr-xr-x 2 root root 53248 Feb 20 15:29 DIR_2
drwxr-xr-x 2 root root    10 Feb 20 15:30 DIR_3
drwxr-xr-x 2 root root    10 Feb 20 15:31 DIR_4
drwxr-xr-x 2 root root    10 Feb 20 15:31 DIR_5
drwxr-xr-x 2 root root    10 Feb 20 15:32 DIR_6
drwxr-xr-x 2 root root    10 Feb 20 15:33 DIR_7
drwxr-xr-x 2 root root    10 Feb 20 15:34 DIR_8
drwxr-xr-x 2 root root    10 Feb 20 15:35 DIR_9
drwxr-xr-x 2 root root    10 Feb 20 15:36 DIR_A
drwxr-xr-x 2 root root    10 Feb 20 15:37 DIR_B
drwxr-xr-x 2 root root    10 Feb 20 15:38 DIR_C
drwxr-xr-x 2 root root    10 Feb 20 15:39 DIR_D
drwxr-xr-x 2 root root    10 Feb 20 15:39 DIR_E
drwxr-xr-x 2 root root    10 Feb 20 15:40 DIR_F

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_2:
total 4
-rw-r--r-- 1 root root 0 Jan 23 20:20 default.724733.17\u\ushadow\uprostate\srnaseq\sd959d5dd-2454-4f07-b69e-9ead4a58b5f2\sUNCID\u2256596.bf46c30c-14fa-4e2a-a013-4e84f24eb63b.130722\uUNC9-SN296\u0385\uAD2F28ACXX\u8\uGTTTCG.tar.gz.2~RGMpBL1jBOB6Pa4ZQrdgVMxKHw0CIGu.6_0944d86844834ea5e09d_0_long

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_3:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_4:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_5:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_6:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_7:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_8:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_9:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_A:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_B:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_C:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_D:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_E:
total 0

/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_F:
total 0

#4 Updated by Jeffrey McDonald over 3 years ago

Apart from the directory structure, the collections typically seem to contain one type of leftover file. I can give the full paths, but aside from the empty directories, these files remain as empty, size-0 files:

-rw-r--r-- 1 root root 0 Jan 23 20:50 default.724733.17\u\ushadow\uprostate\srnaseq\saf72557b-f523-4c96-b304-b9fd075e1206\sUNCID\u2408222.f8fba04c-cd45-4f85-8b75-dcf5426b7637.140312\uUNC11-SN627\u0348\uAC3KRYACXX\u7\uGCCAAT.tar.gz.2~RZoXzKKooUIbmQsGhoa9iYNrE-pIwvK._a0fa4277d2d26da2174b_0_long

-rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s626dcf96-13f2-4eb0-a11d-e156bb81420e\sUNCID\u2189803.c1188eb0-1e8b-4451-b9bc-312f33bb9fd3.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uACAGTG.tar.gz.2~Hf1LtGNLxK\us0Xp6QgXp3BqTqJlRUWl_b74cdab5f3a7d4404c1e_0_long

-rw-r--r-- 1 root root 0 Jan 23 20:20 default.724733.17\u\ushadow\uprostate\srnaseq\sd959d5dd-2454-4f07-b69e-9ead4a58b5f2\sUNCID\u2256596.bf46c30c-14fa-4e2a-a013-4e84f24eb63b.130722\uUNC9-SN296\u0385\uAD2F28ACXX\u8\uGTTTCG.tar.gz.2~RGMpBL1jBOB6Pa4ZQrdgVMxKHw0CIGu.6_392587ace40e89b50fac_0_long

-rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s4e065fa4-4dfa-4631-94f3-9700ce313b1b\sUNCID\u2189801.8c9b95d4-ee46-4312-ba8c-8000c9988ee8.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uTAGCTT.tar.gz.2~2ZE0mFJ8NvxxMDBkxIGZYAxIa5haChH._22bd00f367f222b6422b_0_long

-rw-r--r-- 1 root root 0 Jan 23 20:16 default.724733.17\u\ushadow\uprostate\srnaseq\s34e7e2ba-7705-46f6-8cb9-0c09ed859637\sUNCID\u2190511.b8cc1fb1-5944-431c-aeec-a4301721f667.120502\uUNC14-SN744\u0235\uBD0YUTACXX\u5\uACTTGA.tar.gz.2~v9q5esz-UexY\uKY--1LD76vhto7lWmK_79004bd9aeb749d5b80e_0_long

-rw-r--r-- 1 root root 0 Jan 23 20:10 default.724733.17\u\ushadow\uprostate\srnaseq\s4b7bfbd9-eee2-4f3f-9723-3376e22f6841\sUNCID\u2190523.0cc5f629-5046-4d05-8a5f-923ce5c04b9e.120501\uUNC11-SN627\u0226\uAC0TGKACXX\u8\uACAGTG.tar.gz.2~3tDZznTfWBw-waCnVQhkvj4YKrBSNL0._85db4f3e6d52d82b62a9_0_long

Regards, 
Jeff
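The pattern described above (empty DIR_* directories plus zero-length files) can be confirmed for a given collection before anything is removed. A read-only sketch in Python follows; the function names are hypothetical and this is not a Ceph tool. Note that a zero-length file can still carry extended attributes, so size alone does not prove an object is garbage.

```python
import os
import tempfile


def summarize_collection(coll_dir):
    """Read-only scan of a FileStore-style collection directory.

    Returns (files, empty_dirs):
      files      -- sorted (relative path, size in bytes) for every non-directory entry
      empty_dirs -- sorted relative paths of directories that have no entries at all
    """
    files, empty_dirs = [], []
    for dirpath, dirnames, filenames in os.walk(coll_dir):
        rel = os.path.relpath(dirpath, coll_dir)
        if not dirnames and not filenames:
            empty_dirs.append(rel)
        for name in filenames:
            size = os.lstat(os.path.join(dirpath, name)).st_size
            files.append((os.path.join(rel, name), size))
    return sorted(files), sorted(empty_dirs)


def only_zero_length_leftovers(coll_dir):
    """True if the collection holds no files, or only zero-byte files."""
    files, _empty_dirs = summarize_collection(coll_dir)
    return all(size == 0 for _path, size in files)


def demo():
    """Scan a scratch tree shaped like the 70.9ees1_head listing (names hypothetical)."""
    with tempfile.TemporaryDirectory() as coll:
        base = os.path.join(coll, "DIR_E", "DIR_E", "DIR_9")
        os.makedirs(os.path.join(base, "DIR_2"))
        os.makedirs(os.path.join(base, "DIR_3"))
        open(os.path.join(base, "DIR_2", "stub_0_long"), "w").close()
        return summarize_collection(coll), only_zero_length_leftovers(coll)
```

Against a real OSD, the argument would be a collection path such as /var/lib/ceph/osd/ceph-227/current/70.9ees1_head; the scan only reads.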

#5 Updated by Samuel Just over 3 years ago

  • Assignee set to Loic Dachary

Loic, can you take a look at this one?

#6 Updated by Jeffrey McDonald over 3 years ago

With a reasonably large sample, I can now say that it is these files that are left behind in the collections, although they are empty stubs:

ceph3: -rw-r--r-- 1 root root 0 Jan 23 20:50 default.724733.17\u\ushadow\uprostate\srnaseq\saf72557b-f523-4c96-b304-b9fd075e1206\sUNCID\u2408222.f8fba04c-cd45-4f85-8b75-dcf5426b7637.140312\uUNC11-SN627\u0348\uAC3KRYACXX\u7\uGCCAAT.tar.gz.2~RZoXzKKooUIbmQsGhoa9iYNrE-pIwvK._9ddd3057d2eda52a749e_0_long
ceph3: -rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s4e065fa4-4dfa-4631-94f3-9700ce313b1b\sUNCID\u2189801.8c9b95d4-ee46-4312-ba8c-8000c9988ee8.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uTAGCTT.tar.gz.2~2ZE0mFJ8NvxxMDBkxIGZYAxIa5haChH._a50f02e2b292f9305ecf_0_long
ceph02: -rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s4e065fa4-4dfa-4631-94f3-9700ce313b1b\sUNCID\u2189801.8c9b95d4-ee46-4312-ba8c-8000c9988ee8.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uTAGCTT.tar.gz.2~2ZE0mFJ8NvxxMDBkxIGZYAxIa5haChH._22bd00f367f222b6422b_0_long
ceph02: -rw-r--r-- 1 root root 0 Jan 23 20:20 default.724733.17\u\ushadow\uprostate\srnaseq\sd959d5dd-2454-4f07-b69e-9ead4a58b5f2\sUNCID\u2256596.bf46c30c-14fa-4e2a-a013-4e84f24eb63b.130722\uUNC9-SN296\u0385\uAD2F28ACXX\u8\uGTTTCG.tar.gz.2~RGMpBL1jBOB6Pa4ZQrdgVMxKHw0CIGu.6_392587ace40e89b50fac_0_long
ceph1: -rw-r--r-- 1 root root 0 Jan 23 20:52 default.724733.17\u\ushadow\uprostate\srnaseq\s03e68f72-6d91-4d1a-b1c9-1749109c564f\sUNCID\u2409206.b31f1e9a-cc4d-4e17-ab88-d88e234df83d.140312\uUNC15-SN850\u0357\uAC3M7FACXX\u8\uTTAGGC.tar.gz.2~QHe8pyFLfdpqQGFNDfOCGDil5Bzt\um3_07f011aa8f850c6c2529_0_long
ceph1: -rw-r--r-- 1 root root 0 Jan 23 21:38 default.724733.17\u\ushadow\uprostate\srnaseq\s3b4d1b9f-8210-4c38-831c-dee85865dc08\sUNCID\u2190644.e67b3b01-fdd5-49de-8829-84298c131a5f.111216\uUNC10-SN254\u0314\uAD0JVAACXX\u4\uGATCAG.tar.gz.2~2lO7dCj5a5FrV781k-3HRPn5Xpn7G64._4ce240e0f6f77c738216_0_long
ceph1: -rw-r--r-- 1 root root 0 Jan 23 20:10 default.724733.17\u\ushadow\uprostate\srnaseq\s4b7bfbd9-eee2-4f3f-9723-3376e22f6841\sUNCID\u2190523.0cc5f629-5046-4d05-8a5f-923ce5c04b9e.120501\uUNC11-SN627\u0226\uAC0TGKACXX\u8\uACAGTG.tar.gz.2~3tDZznTfWBw-waCnVQhkvj4YKrBSNL0._85db4f3e6d52d82b62a9_0_long
ceph1: -rw-r--r-- 1 root root 0 Jan 23 20:16 default.724733.17\u\ushadow\uprostate\srnaseq\s34e7e2ba-7705-46f6-8cb9-0c09ed859637\sUNCID\u2190511.b8cc1fb1-5944-431c-aeec-a4301721f667.120502\uUNC14-SN744\u0235\uBD0YUTACXX\u5\uACTTGA.tar.gz.2~v9q5esz-UexY\uKY--1LD76vhto7lWmK_79004bd9aeb749d5b80e_0_long
ceph01: -rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s626dcf96-13f2-4eb0-a11d-e156bb81420e\sUNCID\u2189803.c1188eb0-1e8b-4451-b9bc-312f33bb9fd3.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uACAGTG.tar.gz.2~Hf1LtGNLxK\us0Xp6QgXp3BqTqJlRUWl_b039da33eb5aeda09a40_0_long
ceph01: drwxr-xr-x 3 root root 4096 Aug 17  2015 70.38es2_head
ceph01: -rw-r--r-- 1 root root 0 Jan 23 21:05 default.724733.17\u\ushadow\uprostate\srnaseq\sbe565de3-ef5f-4905-82e4-cc103a5be31f\sUNCID\u2479335.244e1006-d5b9-46cb-8478-8a4591fdb6be.140325\uUNC15-SN850\u0358\uAC3LP7ACXX\u3\uAGTCAA.tar.gz.2~nhURx9rHDXIYrlnMXm0SXYIpzJy0HH\u_c181530726bda5847812_0_long
ceph04: -rw-r--r-- 1 root root 0 Jan 23 20:50 default.724733.17\u\ushadow\uprostate\srnaseq\saf72557b-f523-4c96-b304-b9fd075e1206\sUNCID\u2408222.f8fba04c-cd45-4f85-8b75-dcf5426b7637.140312\uUNC11-SN627\u0348\uAC3KRYACXX\u7\uGCCAAT.tar.gz.2~RZoXzKKooUIbmQsGhoa9iYNrE-pIwvK._a0fa4277d2d26da2174b_0_long
ceph04: -rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s626dcf96-13f2-4eb0-a11d-e156bb81420e\sUNCID\u2189803.c1188eb0-1e8b-4451-b9bc-312f33bb9fd3.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uACAGTG.tar.gz.2~Hf1LtGNLxK\us0Xp6QgXp3BqTqJlRUWl_b74cdab5f3a7d4404c1e_0_long

They are (in S3-space object notation) from files like this one:

s3://yang4414-tcga/prostate/rnaseq/ffacf189-d2c4-4e28-af17-dcedf6fedebf/UNCID_2256550.79d5f928-67b2-4db0-b018-a52889200dd3.130723_UNC9-SN296_0386_BC2E4WACXX_1_GTGGCC.tar.gz

Regards,
Jeff
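FileStore's filename escaping explains how the on-disk names in the listing relate to the S3 key Jeff quotes. `\s` stands for `/` and `\u` for `_`, so undoing those escapes recovers the key. The minimal Python sketch below does that decoding. It assumes a literal backslash is escaped as `\\`. It also assumes the trailing `_<hex>_<n>_long` part is the marker FileStore appends to a truncated long filename, and strips it. Both points are our reading, not something stated in this ticket. `unescape_filestore_name`, `strip_long_suffix` and `decode_listing_name` are hypothetical helper names. Because the names were truncated on disk, the decoded string is only a prefix of the object's full name.

```python
import re

# Assumed FileStore escapes seen in the listing: "\s" -> "/", "\u" -> "_",
# plus (assumption) "\\" -> a single backslash.
_ESCAPES = {"s": "/", "u": "_", "\\": "\\"}

# Assumed long-filename marker "_<hex hash>_<index>_long"; the pattern is
# anchored at the end, so only the trailing marker is removed.
_LONG_SUFFIX = re.compile(r"_[0-9a-f]+_\d+_long$")


def unescape_filestore_name(name: str) -> str:
    """Undo the backslash escaping used in the on-disk names (hypothetical helper)."""
    out = []
    i = 0
    while i < len(name):
        if name[i] == "\\" and i + 1 < len(name) and name[i + 1] in _ESCAPES:
            out.append(_ESCAPES[name[i + 1]])
            i += 2
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


def strip_long_suffix(name: str) -> str:
    """Drop the assumed truncation marker, if present (hypothetical helper)."""
    return _LONG_SUFFIX.sub("", name)


def decode_listing_name(name: str) -> str:
    """Strip the suffix first (it uses literal underscores), then unescape."""
    return unescape_filestore_name(strip_long_suffix(name))


if __name__ == "__main__":
    # File name copied from the ceph01 listing above.
    raw = (r"default.724733.17\u\ushadow\uprostate\srnaseq"
           r"\s626dcf96-13f2-4eb0-a11d-e156bb81420e\sUNCID\u2189803"
           r".c1188eb0-1e8b-4451-b9bc-312f33bb9fd3.120507\uUNC10-SN254"
           r"\u0355\uAC0TR8ACXX\u7\uACAGTG.tar.gz.2~Hf1LtGNLxK"
           r"\us0Xp6QgXp3BqTqJlRUWl_b039da33eb5aeda09a40_0_long")
    print(decode_listing_name(raw))
```

Applied to the ceph01 and ceph04 entries ending in `Hf1LtGNLxK\us0Xp6QgXp3BqTqJlRUWl_..._0_long`, both names decode to the same prefix, `default.724733.17__shadow_prostate/rnaseq/626dcf96-13f2-4eb0-a11d-e156bb81420e/UNCID_2189803...`. They differ only in the hash part of the stripped suffix.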

#7 Updated by Loic Dachary over 3 years ago

  • Status changed from New to In Progress

#8 Updated by Samuel Just over 3 years ago

  • Subject changed from OSD reporting ENOTEMPTY and crashing to [hammer] OSD reporting ENOTEMPTY and crashing
  • Assignee changed from Loic Dachary to Samuel Just
  • Priority changed from Normal to Urgent

I'm fairly sure this is a bug in ObjectStore::collection_list_partial. The hobject_t start is stuffed without ceremony into a ghobject_t, with shard and gen set to NO_SHARD and NO_GEN respectively. This will tend to make it skip one object per chunk. I expect the reason this doesn't show up in testing is that PG removal scans the collection with a stride large enough that it never needs a second stride.

This is fixed in master as part of the wholesale restructuring of those interfaces, we'll have to do something else for hammer.
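Sam's first theory in #8 describes a general paged-listing hazard. He dropped it in #9 because OSD::remove_dir does the right thing. In the hazard, the resume key loses a field on its way back into the listing call. That field is refilled with a sentinel that sorts after the real value, so the object at each chunk boundary is skipped. The Python sketch below models only that hazard and is not Ceph's ObjectStore code. The (name, generation) ordering, the NO_GEN value and the `list_partial`/`list_all` helpers are all hypothetical simplifications.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical, simplified ordering: (object name, generation).
Key = Tuple[str, int]

# Stand-in for a "no generation" sentinel; in this model it sorts after
# every real generation (an assumption chosen to make the skip visible).
NO_GEN = 2**64 - 1


def list_partial(objs: List[Key], start: Key, max_count: int) -> Tuple[List[Key], Optional[Key]]:
    """Return up to max_count keys >= start and the key to resume from (None at the end)."""
    i = bisect_left(objs, start)
    chunk = objs[i:i + max_count]
    nxt = objs[i + max_count] if i + max_count < len(objs) else None
    return chunk, nxt


def list_all(objs: List[Key], stride: int, drop_gen: bool) -> List[Key]:
    """Page through a sorted list of keys, stride entries at a time."""
    out: List[Key] = []
    start: Key = ("", 0)  # "" sorts before every object name
    while True:
        chunk, nxt = list_partial(objs, start, stride)
        out.extend(chunk)
        if nxt is None:
            return out
        # Lossy variant: only the name survives the round trip, so the
        # generation is refilled with NO_GEN, which sorts after nxt
        # unless nxt already carries NO_GEN.
        start = (nxt[0], NO_GEN) if drop_gen else nxt


if __name__ == "__main__":
    objs = [("a", 0), ("b", 3), ("c", 1), ("d", 2), ("e", 0)]
    print(list_all(objs, 2, drop_gen=False))   # every object
    print(list_all(objs, 2, drop_gen=True))    # ("c", 1) is skipped
    print(list_all(objs, 10, drop_gen=True))   # one stride: nothing skipped
```

With five objects and a stride of 2, the lossy variant drops ("c", 1), the object at the chunk boundary. With a stride of 10, the whole collection fits in one chunk and nothing is skipped. This mirrors Sam's explanation of why a large removal stride would hide the problem in testing.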

#9 Updated by Samuel Just over 3 years ago

  • Subject changed from [hammer] OSD reporting ENOTEMPTY and crashing to OSD reporting ENOTEMPTY and crashing
  • Assignee changed from Samuel Just to Loic Dachary

Never mind, OSD::remove_dir does the right thing.

#10 Updated by xingyi wu over 3 years ago

We also encountered this bug several months ago. Our cluster runs giant 0.87.2 with an EC backend. When recovery or backfilling happens, some OSDs dump core with the same call trace as the one Jeffrey pasted above. We did the same as Jeffrey: we removed the files in the collection (though the files are not necessarily empty; some of them are 512KB in size).
We have now stopped expanding our EC cluster, because doing so triggers this bug and makes OSDs dump core, which is bad for service stability. @Samuel, is it possible and safe to backport it to giant?

#11 Updated by Samuel Just over 3 years ago

I'm now pretty confident that this is the same issue as the scrub bug described in the ceph-users thread '[ceph-users] inconsistent PG -> unfound objects on an erasure coded system'

#12 Updated by Loic Dachary over 3 years ago

  • Backport set to hammer

#13 Updated by Loic Dachary over 3 years ago

@xingyi wu giant is no longer backported or released. It is better to upgrade to hammer.

#14 Updated by Loic Dachary over 3 years ago

  • Description updated (diff)

#15 Updated by Loic Dachary over 3 years ago

Trying to figure out what causes the duplicate pairs to appear, using the logs provided at http://www.spinics.net/lists/ceph-users/msg26197.html.

#16 Updated by Loic Dachary over 3 years ago

  • Assignee changed from Loic Dachary to Samuel Just

#17 Updated by Samuel Just over 3 years ago

  • Description updated (diff)

#18 Updated by Samuel Just over 3 years ago

  • Status changed from In Progress to Need Review

#19 Updated by Samuel Just over 3 years ago

Note that the above patch will only prevent the bug which caused the orphaned files, it won't clean up any that already exist.

#20 Updated by Sage Weil over 3 years ago

  • Status changed from Need Review to Pending Backport
  • Backport changed from hammer to hammer,infernalis

#21 Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #15148: infernalis: OSD reporting ENOTEMPTY and crashing added

#22 Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #15149: hammer: OSD reporting ENOTEMPTY and crashing added

#23 Updated by Samuel Just over 3 years ago

  • Duplicated by Bug #15003: [hammer, master?] ec pool deep scrub turns up inconsistent objects inconsistently added

#24 Updated by Loic Dachary almost 3 years ago

  • Backport changed from hammer,infernalis to hammer

infernalis is EOL

#25 Updated by Loic Dachary almost 3 years ago

  • Copied to deleted (Backport #15148: infernalis: OSD reporting ENOTEMPTY and crashing)

#26 Updated by Nathan Cutler over 2 years ago

  • Status changed from Pending Backport to Resolved
