https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2016-02-16T13:15:37Z
Ceph
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=65959
2016-02-16T13:15:37Z
Kefu Chai
tchaikov@gmail.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/65959/diff?detail_id=63194">diff</a>)</li></ul>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=65961
2016-02-16T14:58:33Z
Samuel Just
sjust@redhat.com
<ul></ul><p>It would help if you could post the contents of one of the pg directories where this happened.</p>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=66237
2016-02-24T03:41:51Z
Jeffrey McDonald
jmcdonal@umn.edu
<ul></ul><p>Hi, <br />Not sure if these are related or not. Here's an instance where the OSD crashes and then, on restart, fails with an error while removing a collection. Once I remove the empty collection, the OSD starts without error.</p>
<p>Under heavy remapping load, I'm seeing many of these events, on the order of 5 per hour across 370 OSDs.</p>
<p>Regards,<br />Jeff</p>
<pre>
-31> 2016-02-20 15:40:31.403966 7f858345f700 1 -- 10.31.0.1:6883/3768668 <== osd.314 10.31.0.3:6910/5632023 51085 ==== MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)]) v2 ==== 150+0+0 (1807425228 0 0) 0x6dcd8200 con 0x41bd8000
-30> 2016-02-20 15:40:31.403980 7f858345f700 5 -- op tracker -- seq: 2181493, time: 2016-02-20 15:40:31.403928, event: header_read, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
-29> 2016-02-20 15:40:31.403985 7f858345f700 5 -- op tracker -- seq: 2181493, time: 2016-02-20 15:40:31.403929, event: throttled, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
-28> 2016-02-20 15:40:31.403988 7f858345f700 5 -- op tracker -- seq: 2181493, time: 2016-02-20 15:40:31.403961, event: all_read, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
-27> 2016-02-20 15:40:31.403991 7f858345f700 5 -- op tracker -- seq: 2181493, time: 0.000000, event: dispatched, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
-26> 2016-02-20 15:40:31.404187 7f85b951c700 5 -- op tracker -- seq: 2181491, time: 2016-02-20 15:40:31.404187, event: reached_pg, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-25> 2016-02-20 15:40:31.404775 7f85b951c700 5 -- op tracker -- seq: 2181491, time: 2016-02-20 15:40:31.404775, event: done, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-24> 2016-02-20 15:40:31.404921 7f85b951c700 5 -- op tracker -- seq: 2181493, time: 2016-02-20 15:40:31.404921, event: reached_pg, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
-23> 2016-02-20 15:40:31.405170 7f85b951c700 5 -- op tracker -- seq: 2181493, time: 2016-02-20 15:40:31.405170, event: done, op: MOSDPGPushReply(70.8e8s0 227332 [PushReplyOp(73cce8e8/default.539464.85__shadow_.KwIQl00AcWAz11K-sG-CJ108U1xwrNe_2/head//70)])
-22> 2016-02-20 15:40:31.406756 7f85890bb700 1 -- 10.31.0.1:6883/3768668 <== osd.187 10.31.0.7:6912/3477344 10441 ==== MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0)) v1 ==== 1048776+0+0 (3333256963 0 0) 0x54ae9900 con 0x4737d020
-21> 2016-02-20 15:40:31.406788 7f85890bb700 5 -- op tracker -- seq: 2181494, time: 2016-02-20 15:40:31.402486, event: header_read, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-20> 2016-02-20 15:40:31.406850 7f85890bb700 5 -- op tracker -- seq: 2181494, time: 2016-02-20 15:40:31.402487, event: throttled, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-19> 2016-02-20 15:40:31.406853 7f85890bb700 5 -- op tracker -- seq: 2181494, time: 2016-02-20 15:40:31.406732, event: all_read, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-18> 2016-02-20 15:40:31.406873 7f85890bb700 5 -- op tracker -- seq: 2181494, time: 0.000000, event: dispatched, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-17> 2016-02-20 15:40:31.406954 7f85b951c700 5 -- op tracker -- seq: 2181494, time: 2016-02-20 15:40:31.406954, event: reached_pg, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-16> 2016-02-20 15:40:31.407253 7f85b951c700 5 -- op tracker -- seq: 2181494, time: 2016-02-20 15:40:31.407253, event: done, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-15> 2016-02-20 15:40:31.413944 7f855f320700 1 -- 10.31.0.1:6883/3768668 <== osd.318 10.31.0.3:6813/6613649 23696 ==== MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0)) v1 ==== 1048776+0+0 (876574210 0 0) 0x4dac2f80 con 0x47c77760
-14> 2016-02-20 15:40:31.413982 7f855f320700 5 -- op tracker -- seq: 2181495, time: 2016-02-20 15:40:31.411037, event: header_read, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-13> 2016-02-20 15:40:31.413988 7f855f320700 5 -- op tracker -- seq: 2181495, time: 2016-02-20 15:40:31.411039, event: throttled, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-12> 2016-02-20 15:40:31.413991 7f855f320700 5 -- op tracker -- seq: 2181495, time: 2016-02-20 15:40:31.413925, event: all_read, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-11> 2016-02-20 15:40:31.413994 7f855f320700 5 -- op tracker -- seq: 2181495, time: 0.000000, event: dispatched, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-10> 2016-02-20 15:40:31.414051 7f85b951c700 5 -- op tracker -- seq: 2181495, time: 2016-02-20 15:40:31.414051, event: reached_pg, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-9> 2016-02-20 15:40:31.414792 7f85b951c700 5 -- op tracker -- seq: 2181495, time: 2016-02-20 15:40:31.414792, event: done, op: MOSDECSubOpReadReply(70.b04s0 227332 ECSubReadReply(tid=99115, attrs_read=0))
-8> 2016-02-20 15:40:31.415648 7f85c4532700 1 -- 10.31.0.1:6883/3768668 <== osd.44 10.31.0.4:6895/3507238 36127 ==== pg_trim(70.8e8 to 146705'273589 e227332) v1 ==== 34+0+0 (2877468469 0 0) 0x6433ae00 con 0x48076940
-7> 2016-02-20 15:40:31.415675 7f85c4532700 5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415537, event: header_read, op: pg_trim(70.8e8 to 146705'273589 e227332)
-6> 2016-02-20 15:40:31.415680 7f85c4532700 5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415538, event: throttled, op: pg_trim(70.8e8 to 146705'273589 e227332)
-5> 2016-02-20 15:40:31.415682 7f85c4532700 5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415588, event: all_read, op: pg_trim(70.8e8 to 146705'273589 e227332)
-4> 2016-02-20 15:40:31.415689 7f85c4532700 5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415672, event: dispatched, op: pg_trim(70.8e8 to 146705'273589 e227332)
-3> 2016-02-20 15:40:31.415692 7f85c4532700 5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415692, event: waiting_for_osdmap, op: pg_trim(70.8e8 to 146705'273589 e227332)
-2> 2016-02-20 15:40:31.415700 7f85c4532700 5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415700, event: started, op: pg_trim(70.8e8 to 146705'273589 e227332)
-1> 2016-02-20 15:40:31.415708 7f85c4532700 5 -- op tracker -- seq: 2181496, time: 2016-02-20 15:40:31.415708, event: done, op: pg_trim(70.8e8 to 146705'273589 e227332)
0> 2016-02-20 15:40:31.426524 7f85d15c1700 -1 *** Caught signal (Aborted) **
in thread 7f85d15c1700
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: /usr/bin/ceph-osd() [0xacd7ba]
2: (()+0x10340) [0x7f85de35c340]
3: (gsignal()+0x39) [0x7f85dc7fbcc9]
4: (abort()+0x148) [0x7f85dc7ff0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f85dd106535]
6: (()+0x5e6d6) [0x7f85dd1046d6]
7: (()+0x5e703) [0x7f85dd104703]
8: (()+0x5e922) [0x7f85dd104922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
15: (()+0x8182) [0x7f85de354182]
16: (clone()+0x6d) [0x7f85dc8bf47d]
log_file /var/log/ceph/ceph-osd.227.log
--- end dump of recent events ---
----
OSD restarted:
---
2016-02-20 15:45:44.409242 7f251eb10900 0 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 4037000
2016-02-20 15:45:44.509302 7f251eb10900 0 filestore(/var/lib/ceph/osd/ceph-227) backend xfs (magic 0x58465342)
2016-02-20 15:45:44.601867 7f251eb10900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-227) detect_features: FIEMAP ioctl is supported and appears to work
2016-02-20 15:45:44.601888 7f251eb10900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-227) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2016-02-20 15:45:44.619323 7f251eb10900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-227) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2016-02-20 15:45:44.619491 7f251eb10900 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-227) detect_feature: extsize is supported and kernel 3.13.0-65-generic >= 3.5
2016-02-20 15:45:45.098361 7f251eb10900 0 filestore(/var/lib/ceph/osd/ceph-227) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
root@cephmon1:~# ssh ceph01 tail -200 /var/log/ceph/ceph-osd.227.log
-20> 2016-02-20 15:45:52.638819 7fc309809900 2 journal read_entry 911970304 : seq 17104051 46 bytes
-19> 2016-02-20 15:45:52.638824 7fc309809900 3 journal journal_replay: applying op seq 17104051
-18> 2016-02-20 15:45:52.638826 7fc309809900 3 journal journal_replay: r = 0, op_seq now 17104051
-17> 2016-02-20 15:45:52.638830 7fc309809900 2 journal read_entry 911974400 : seq 17104052 46 bytes
-16> 2016-02-20 15:45:52.638835 7fc309809900 3 journal journal_replay: applying op seq 17104052
-15> 2016-02-20 15:45:52.638837 7fc309809900 3 journal journal_replay: r = 0, op_seq now 17104052
-14> 2016-02-20 15:45:52.639234 7fc309809900 2 journal read_entry 911978496 : seq 17104053 1050545 bytes
-13> 2016-02-20 15:45:52.639242 7fc309809900 3 journal journal_replay: applying op seq 17104053
-12> 2016-02-20 15:45:52.640747 7fc309809900 3 journal journal_replay: r = 0, op_seq now 17104053
-11> 2016-02-20 15:45:52.640779 7fc309809900 2 journal read_entry 913035264 : seq 17104054 1007 bytes
-10> 2016-02-20 15:45:52.640781 7fc309809900 3 journal journal_replay: applying op seq 17104054
-9> 2016-02-20 15:45:52.640823 7fc309809900 3 journal journal_replay: r = 0, op_seq now 17104054
-8> 2016-02-20 15:45:52.640836 7fc309809900 2 journal read_entry 913039360 : seq 17104055 46 bytes
-7> 2016-02-20 15:45:52.640842 7fc309809900 3 journal journal_replay: applying op seq 17104055
-6> 2016-02-20 15:45:52.640844 7fc309809900 3 journal journal_replay: r = 0, op_seq now 17104055
-5> 2016-02-20 15:45:52.640848 7fc309809900 2 journal read_entry 913043456 : seq 17104056 264 bytes
-4> 2016-02-20 15:45:52.640849 7fc309809900 3 journal journal_replay: applying op seq 17104056
-3> 2016-02-20 15:45:52.641064 7fc309809900 0 filestore(/var/lib/ceph/osd/ceph-227) error (39) Directory not empty not handled on operation 0x4cc8336 (17104056.0.1, or op 1, counting from 0)
-2> 2016-02-20 15:45:52.641081 7fc309809900 0 filestore(/var/lib/ceph/osd/ceph-227) ENOTEMPTY suggests garbage data in osd data dir
-1> 2016-02-20 15:45:52.641083 7fc309809900 0 filestore(/var/lib/ceph/osd/ceph-227) transaction dump:
{
"ops": [
{
"op_num": 0,
"op_name": "remove",
"collection": "70.9ees1_head",
"oid": "9ee\/\/head\/\/70\/18446744073709551615\/1"
},
{
"op_num": 1,
"op_name": "rmcoll",
"collection": "70.9ees1_head"
}
]
}
0> 2016-02-20 15:45:52.643388 7fc309809900 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7fc309809900 time 2016-02-20 15:45:52.641117
os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error")
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
4: (JournalingObjectStore::journal_replay(unsigned long)+0x5cb) [0x94355b]
5: (FileStore::mount()+0x3bb6) [0x9139f6]
6: (OSD::init()+0x259) [0x6c59b9]
7: (main()+0x2860) [0x6527e0]
8: (__libc_start_main()+0xf5) [0x7fc306947ec5]
9: /usr/bin/ceph-osd() [0x66b887]
log_file /var/log/ceph/ceph-osd.227.log
--- end dump of recent events ---
2016-02-20 15:45:52.648074 7fc309809900 -1 *** Caught signal (Aborted) **
in thread 7fc309809900
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: /usr/bin/ceph-osd() [0xacd7ba]
2: (()+0x10340) [0x7fc3084bd340]
3: (gsignal()+0x39) [0x7fc30695ccc9]
4: (abort()+0x148) [0x7fc3069600d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fc307267535]
6: (()+0x5e6d6) [0x7fc3072656d6]
7: (()+0x5e703) [0x7fc307265703]
8: (()+0x5e922) [0x7fc307265922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
12: (JournalingObjectStore::journal_replay(unsigned long)+0x5cb) [0x94355b]
13: (FileStore::mount()+0x3bb6) [0x9139f6]
14: (OSD::init()+0x259) [0x6c59b9]
15: (main()+0x2860) [0x6527e0]
16: (__libc_start_main()+0xf5) [0x7fc306947ec5]
17: /usr/bin/ceph-osd() [0x66b887]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
0> 2016-02-20 15:45:52.648074 7fc309809900 -1 *** Caught signal (Aborted) **
in thread 7fc309809900
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: /usr/bin/ceph-osd() [0xacd7ba]
2: (()+0x10340) [0x7fc3084bd340]
3: (gsignal()+0x39) [0x7fc30695ccc9]
4: (abort()+0x148) [0x7fc3069600d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fc307267535]
6: (()+0x5e6d6) [0x7fc3072656d6]
7: (()+0x5e703) [0x7fc307265703]
8: (()+0x5e922) [0x7fc307265922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
12: (JournalingObjectStore::journal_replay(unsigned long)+0x5cb) [0x94355b]
13: (FileStore::mount()+0x3bb6) [0x9139f6]
14: (OSD::init()+0x259) [0x6c59b9]
15: (main()+0x2860) [0x6527e0]
16: (__libc_start_main()+0xf5) [0x7fc306947ec5]
17: /usr/bin/ceph-osd() [0x66b887]
log_file /var/log/ceph/ceph-osd.227.log
--- end dump of recent events ---
</pre>
<p>Here is a listing of that directory:</p>
<pre>
ls -lR /var/lib/ceph/osd/ceph-227/current/70.9ees1_head
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head:
total 8
drwxr-xr-x 3 root root 4096 Dec 16 07:37 DIR_E
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E:
total 8
drwxr-xr-x 3 root root 4096 Dec 16 07:37 DIR_E
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E:
total 16
drwxr-xr-x 16 root root 28672 Feb 20 15:40 DIR_9
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9:
total 12
drwxr-xr-x 2 root root 53248 Feb 20 15:29 DIR_2
drwxr-xr-x 2 root root 10 Feb 20 15:30 DIR_3
drwxr-xr-x 2 root root 10 Feb 20 15:31 DIR_4
drwxr-xr-x 2 root root 10 Feb 20 15:31 DIR_5
drwxr-xr-x 2 root root 10 Feb 20 15:32 DIR_6
drwxr-xr-x 2 root root 10 Feb 20 15:33 DIR_7
drwxr-xr-x 2 root root 10 Feb 20 15:34 DIR_8
drwxr-xr-x 2 root root 10 Feb 20 15:35 DIR_9
drwxr-xr-x 2 root root 10 Feb 20 15:36 DIR_A
drwxr-xr-x 2 root root 10 Feb 20 15:37 DIR_B
drwxr-xr-x 2 root root 10 Feb 20 15:38 DIR_C
drwxr-xr-x 2 root root 10 Feb 20 15:39 DIR_D
drwxr-xr-x 2 root root 10 Feb 20 15:39 DIR_E
drwxr-xr-x 2 root root 10 Feb 20 15:40 DIR_F
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_2:
total 4
-rw-r--r-- 1 root root 0 Jan 23 20:20 default.724733.17\u\ushadow\uprostate\srnaseq\sd959d5dd-2454-4f07-b69e-9ead4a58b5f2\sUNCID\u2256596.bf46c30c-14fa-4e2a-a013-4e84f24eb63b.130722\uUNC9-SN296\u0385\uAD2F28ACXX\u8\uGTTTCG.tar.gz.2~RGMpBL1jBOB6Pa4ZQrdgVMxKHw0CIGu.6_0944d86844834ea5e09d_0_long
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_3:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_4:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_5:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_6:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_7:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_8:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_9:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_A:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_B:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_C:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_D:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_E:
total 0
/var/lib/ceph/osd/ceph-227/current/70.9ees1_head/DIR_E/DIR_E/DIR_9/DIR_F:
total 0
</pre>
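<p>A minimal sketch of how such a collection could be inspected before removing anything by hand. It walks a collection directory read-only and reports every non-directory entry with its size, then shows that <code>rmdir</code> on a tree that still holds a stub fails with ENOTEMPTY, which is the error (39) in the replay log above. The helper name <code>leftover_entries</code> and the throwaway file name are hypothetical; this is not part of Ceph.</p>

```python
import errno
import os
import tempfile


def leftover_entries(collection_dir):
    """Yield (path, size) for every non-directory entry below collection_dir.

    Read-only: nothing is modified or removed.
    """
    for root, _dirs, files in os.walk(collection_dir):
        for name in files:
            path = os.path.join(root, name)
            yield path, os.lstat(path).st_size


def demo():
    # Build a throwaway tree shaped like the listing above.
    # The stub file name is made up for illustration.
    with tempfile.TemporaryDirectory() as tmp:
        coll = os.path.join(tmp, "70.9ees1_head")
        leaf = os.path.join(coll, "DIR_E", "DIR_E", "DIR_9", "DIR_2")
        os.makedirs(leaf)
        stub = os.path.join(leaf, "stub_0_long")
        open(stub, "w").close()  # zero-byte file, like the orphaned stubs

        found = list(leftover_entries(coll))

        # Removing the collection directory fails while anything is inside it.
        try:
            os.rmdir(coll)
            err = None
        except OSError as exc:
            err = exc.errno
        return found, err, stub


if __name__ == "__main__":
    found, err, _stub = demo()
    for path, size in found:
        print(size, path)
    print("rmdir errno:", err, "ENOTEMPTY" if err == errno.ENOTEMPTY else "")
```

<p>On a real OSD one would point <code>leftover_entries</code> at the collection path under <code>current/</code> with the OSD stopped, and only look at the result; whether a given stub is safe to delete is a separate question.</p>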
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=66333
2016-02-25T16:12:25Z
Jeffrey McDonald
jmcdonal@umn.edu
<ul></ul><p>In the collections, there typically seems to be one type of file left behind apart from the directory structure. <br />I can give the full paths, but aside from the empty directories, these files remain as empty, size-0 files:</p>
<pre>
-rw-r--r-- 1 root root 0 Jan 23 20:50 default.724733.17\u\ushadow\uprostate\srnaseq\saf72557b-f523-4c96-b304-b9fd075e1206\sUNCID\u2408222.f8fba04c-cd45-4f85-8b75-dcf5426b7637.140312\uUNC11-SN627\u0348\uAC3KRYACXX\u7\uGCCAAT.tar.gz.2~RZoXzKKooUIbmQsGhoa9iYNrE-pIwvK._a0fa4277d2d26da2174b_0_long
-rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s626dcf96-13f2-4eb0-a11d-e156bb81420e\sUNCID\u2189803.c1188eb0-1e8b-4451-b9bc-312f33bb9fd3.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uACAGTG.tar.gz.2~Hf1LtGNLxK\us0Xp6QgXp3BqTqJlRUWl_b74cdab5f3a7d4404c1e_0_long
-rw-r--r-- 1 root root 0 Jan 23 20:20 default.724733.17\u\ushadow\uprostate\srnaseq\sd959d5dd-2454-4f07-b69e-9ead4a58b5f2\sUNCID\u2256596.bf46c30c-14fa-4e2a-a013-4e84f24eb63b.130722\uUNC9-SN296\u0385\uAD2F28ACXX\u8\uGTTTCG.tar.gz.2~RGMpBL1jBOB6Pa4ZQrdgVMxKHw0CIGu.6_392587ace40e89b50fac_0_long
-rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s4e065fa4-4dfa-4631-94f3-9700ce313b1b\sUNCID\u2189801.8c9b95d4-ee46-4312-ba8c-8000c9988ee8.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uTAGCTT.tar.gz.2~2ZE0mFJ8NvxxMDBkxIGZYAxIa5haChH._22bd00f367f222b6422b_0_long
-rw-r--r-- 1 root root 0 Jan 23 20:16 default.724733.17\u\ushadow\uprostate\srnaseq\s34e7e2ba-7705-46f6-8cb9-0c09ed859637\sUNCID\u2190511.b8cc1fb1-5944-431c-aeec-a4301721f667.120502\uUNC14-SN744\u0235\uBD0YUTACXX\u5\uACTTGA.tar.gz.2~v9q5esz-UexY\uKY--1LD76vhto7lWmK_79004bd9aeb749d5b80e_0_long
-rw-r--r-- 1 root root 0 Jan 23 20:10 default.724733.17\u\ushadow\uprostate\srnaseq\s4b7bfbd9-eee2-4f3f-9723-3376e22f6841\sUNCID\u2190523.0cc5f629-5046-4d05-8a5f-923ce5c04b9e.120501\uUNC11-SN627\u0226\uAC0TGKACXX\u8\uACAGTG.tar.gz.2~3tDZznTfWBw-waCnVQhkvj4YKrBSNL0._85db4f3e6d52d82b62a9_0_long
Regards,
Jeff
</pre>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=66334
2016-02-25T17:05:19Z
Samuel Just
sjust@redhat.com
<ul><li><strong>Assignee</strong> set to <i>Loïc Dachary</i></li></ul><p>Loic, can you take a look at this one?</p>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=66361
2016-02-25T23:24:35Z
Jeffrey McDonald
jmcdonal@umn.edu
<ul></ul><p>With quite reasonable statistics, I can now say that it's these files which are left behind in collections, although they are empty stubs.</p>
<pre>
ceph3: -rw-r--r-- 1 root root 0 Jan 23 20:50 default.724733.17\u\ushadow\uprostate\srnaseq\saf72557b-f523-4c96-b304-b9fd075e1206\sUNCID\u2408222.f8fba04c-cd45-4f85-8b75-dcf5426b7637.140312\uUNC11-SN627\u0348\uAC3KRYACXX\u7\uGCCAAT.tar.gz.2~RZoXzKKooUIbmQsGhoa9iYNrE-pIwvK._9ddd3057d2eda52a749e_0_long
ceph3: -rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s4e065fa4-4dfa-4631-94f3-9700ce313b1b\sUNCID\u2189801.8c9b95d4-ee46-4312-ba8c-8000c9988ee8.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uTAGCTT.tar.gz.2~2ZE0mFJ8NvxxMDBkxIGZYAxIa5haChH._a50f02e2b292f9305ecf_0_long
ceph02: -rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s4e065fa4-4dfa-4631-94f3-9700ce313b1b\sUNCID\u2189801.8c9b95d4-ee46-4312-ba8c-8000c9988ee8.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uTAGCTT.tar.gz.2~2ZE0mFJ8NvxxMDBkxIGZYAxIa5haChH._22bd00f367f222b6422b_0_long
ceph02: -rw-r--r-- 1 root root 0 Jan 23 20:20 default.724733.17\u\ushadow\uprostate\srnaseq\sd959d5dd-2454-4f07-b69e-9ead4a58b5f2\sUNCID\u2256596.bf46c30c-14fa-4e2a-a013-4e84f24eb63b.130722\uUNC9-SN296\u0385\uAD2F28ACXX\u8\uGTTTCG.tar.gz.2~RGMpBL1jBOB6Pa4ZQrdgVMxKHw0CIGu.6_392587ace40e89b50fac_0_long
ceph1: -rw-r--r-- 1 root root 0 Jan 23 20:52 default.724733.17\u\ushadow\uprostate\srnaseq\s03e68f72-6d91-4d1a-b1c9-1749109c564f\sUNCID\u2409206.b31f1e9a-cc4d-4e17-ab88-d88e234df83d.140312\uUNC15-SN850\u0357\uAC3M7FACXX\u8\uTTAGGC.tar.gz.2~QHe8pyFLfdpqQGFNDfOCGDil5Bzt\um3_07f011aa8f850c6c2529_0_long
ceph1: -rw-r--r-- 1 root root 0 Jan 23 21:38 default.724733.17\u\ushadow\uprostate\srnaseq\s3b4d1b9f-8210-4c38-831c-dee85865dc08\sUNCID\u2190644.e67b3b01-fdd5-49de-8829-84298c131a5f.111216\uUNC10-SN254\u0314\uAD0JVAACXX\u4\uGATCAG.tar.gz.2~2lO7dCj5a5FrV781k-3HRPn5Xpn7G64._4ce240e0f6f77c738216_0_long
ceph1: -rw-r--r-- 1 root root 0 Jan 23 20:10 default.724733.17\u\ushadow\uprostate\srnaseq\s4b7bfbd9-eee2-4f3f-9723-3376e22f6841\sUNCID\u2190523.0cc5f629-5046-4d05-8a5f-923ce5c04b9e.120501\uUNC11-SN627\u0226\uAC0TGKACXX\u8\uACAGTG.tar.gz.2~3tDZznTfWBw-waCnVQhkvj4YKrBSNL0._85db4f3e6d52d82b62a9_0_long
ceph1: -rw-r--r-- 1 root root 0 Jan 23 20:16 default.724733.17\u\ushadow\uprostate\srnaseq\s34e7e2ba-7705-46f6-8cb9-0c09ed859637\sUNCID\u2190511.b8cc1fb1-5944-431c-aeec-a4301721f667.120502\uUNC14-SN744\u0235\uBD0YUTACXX\u5\uACTTGA.tar.gz.2~v9q5esz-UexY\uKY--1LD76vhto7lWmK_79004bd9aeb749d5b80e_0_long
ceph01: -rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s626dcf96-13f2-4eb0-a11d-e156bb81420e\sUNCID\u2189803.c1188eb0-1e8b-4451-b9bc-312f33bb9fd3.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uACAGTG.tar.gz.2~Hf1LtGNLxK\us0Xp6QgXp3BqTqJlRUWl_b039da33eb5aeda09a40_0_long
ceph01: drwxr-xr-x 3 root root 4096 Aug 17 2015 70.38es2_head
ceph01: -rw-r--r-- 1 root root 0 Jan 23 21:05 default.724733.17\u\ushadow\uprostate\srnaseq\sbe565de3-ef5f-4905-82e4-cc103a5be31f\sUNCID\u2479335.244e1006-d5b9-46cb-8478-8a4591fdb6be.140325\uUNC15-SN850\u0358\uAC3LP7ACXX\u3\uAGTCAA.tar.gz.2~nhURx9rHDXIYrlnMXm0SXYIpzJy0HH\u_c181530726bda5847812_0_long
ceph04: -rw-r--r-- 1 root root 0 Jan 23 20:50 default.724733.17\u\ushadow\uprostate\srnaseq\saf72557b-f523-4c96-b304-b9fd075e1206\sUNCID\u2408222.f8fba04c-cd45-4f85-8b75-dcf5426b7637.140312\uUNC11-SN627\u0348\uAC3KRYACXX\u7\uGCCAAT.tar.gz.2~RZoXzKKooUIbmQsGhoa9iYNrE-pIwvK._a0fa4277d2d26da2174b_0_long
ceph04: -rw-r--r-- 1 root root 0 Jan 23 20:45 default.724733.17\u\ushadow\uprostate\srnaseq\s626dcf96-13f2-4eb0-a11d-e156bb81420e\sUNCID\u2189803.c1188eb0-1e8b-4451-b9bc-312f33bb9fd3.120507\uUNC10-SN254\u0355\uAC0TR8ACXX\u7\uACAGTG.tar.gz.2~Hf1LtGNLxK\us0Xp6QgXp3BqTqJlRUWl_b74cdab5f3a7d4404c1e_0_long
</pre>
<p>They are (in S3-space object notation) from files like this one: </p>
<pre><code>s3://yang4414-tcga/prostate/rnaseq/ffacf189-d2c4-4e28-af17-dcedf6fedebf/UNCID_2256550.79d5f928-67b2-4db0-b018-a52889200dd3.130723_UNC9-SN296_0386_BC2E4WACXX_1_GTGGCC.tar.gz</code></pre>
<p>Regards, <br />Jeff</p>
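<p>A small sketch for comparing the leftover names across hosts. It assumes, purely from the listings above, that each name ends in <code>_&lt;hex digest&gt;_&lt;index&gt;_long</code>, and groups names by everything before that tail, so that stubs sharing a prefix but differing in digest stand out. The pattern is an observation from this ticket, not a documented format, and the sample names are cut down to their tails for brevity.</p>

```python
import re
from collections import defaultdict

# Trailing pattern seen in the listings: _<hex digest>_<index>_long
# (inferred from the names in this ticket, not taken from a specification).
SHORT_NAME = re.compile(
    r"^(?P<prefix>.*)_(?P<digest>[0-9a-f]+)_(?P<index>\d+)_long$"
)


def group_by_prefix(names):
    """Map each name prefix to the set of digests seen with it."""
    groups = defaultdict(set)
    for name in names:
        match = SHORT_NAME.match(name)
        if match:
            groups[match.group("prefix")].add(match.group("digest"))
    return dict(groups)


# Tails of three names from the listings above (leading part omitted).
sample = [
    "GCCAAT.tar.gz.2~RZoXzKKooUIbmQsGhoa9iYNrE-pIwvK._a0fa4277d2d26da2174b_0_long",
    "GCCAAT.tar.gz.2~RZoXzKKooUIbmQsGhoa9iYNrE-pIwvK._9ddd3057d2eda52a749e_0_long",
    "TAGCTT.tar.gz.2~2ZE0mFJ8NvxxMDBkxIGZYAxIa5haChH._22bd00f367f222b6422b_0_long",
]

if __name__ == "__main__":
    for prefix, digests in group_by_prefix(sample).items():
        print(prefix, sorted(digests))
```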
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=66475
2016-02-27T09:03:52Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=66950
2016-03-08T01:10:20Z
Samuel Just
sjust@redhat.com
<ul><li><strong>Subject</strong> changed from <i>OSD reporting ENOTEMPTY and crashing</i> to <i>[hammer] OSD reporting ENOTEMPTY and crashing</i></li><li><strong>Assignee</strong> changed from <i>Loïc Dachary</i> to <i>Samuel Just</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>Urgent</i></li></ul><p>I'm fairly sure this is a bug in ObjectStore::collection_list_partial. hobject_t start is stuffed without ceremony into a ghobject setting shard and gen to NO_SHARD and NO_GEN respectively. This'll tend to cause it to skip one object per chunk. I expect the reason this doesn't show up in testing is due to pg removal collection scanning using a large enough stride size to never need a second stride.</p>
<p>This is fixed in master as part of the wholesale restructuring of those interfaces; we'll have to do something else for hammer.</p>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=66951
2016-03-08T01:37:22Z
Samuel Just
sjust@redhat.com
<ul><li><strong>Subject</strong> changed from <i>[hammer] OSD reporting ENOTEMPTY and crashing</i> to <i>OSD reporting ENOTEMPTY and crashing</i></li><li><strong>Assignee</strong> changed from <i>Samuel Just</i> to <i>Loïc Dachary</i></li></ul><p>Never mind, OSD::remove_dir does the right thing.</p>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=66967
2016-03-08T08:17:48Z
xingyi wu
wuxingyi@le.com
<ul></ul><p>We also encountered this bug several months ago. Our cluster runs giant-0.87.2 with an EC backend. When recovery or backfill happens, some OSDs core dump with the same call trace that Jeffrey pasted above. We did the same as Jeffrey: removed the files in the collection (though the files are not necessarily empty; some of them have a size of 512KB).<br />We have now stopped expanding our EC cluster because doing so would trigger this bug and cause OSD core dumps, which is bad for service stability. @Samuel, is it possible and safe to backport it to giant?</p>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67037
2016-03-08T19:27:13Z
Samuel Just
sjust@redhat.com
<ul></ul><p>I'm now pretty confident that this is the same issue as the scrub bug described in the ceph-users thread '[ceph-users] inconsistent PG -> unfound objects on an erasure coded system'</p>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67114
2016-03-09T04:11:43Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Backport</strong> set to <i>hammer</i></li></ul>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67115
2016-03-09T04:20:00Z
Loïc Dachary
loic@dachary.org
<ul></ul><p>@xingyi wu giant is no longer backported or released. It is better to upgrade to hammer.</p>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67116
2016-03-09T04:22:27Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/67116/diff?detail_id=64234">diff</a>)</li></ul>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67123
2016-03-09T05:57:39Z
Loïc Dachary
loic@dachary.org
<ul></ul><p>Trying to figure out what causes the duplicate pairs to appear using the logs provided at <a class="external" href="http://www.spinics.net/lists/ceph-users/msg26197.html">http://www.spinics.net/lists/ceph-users/msg26197.html</a>.</p>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67143
2016-03-09T15:05:53Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Assignee</strong> changed from <i>Loïc Dachary</i> to <i>Samuel Just</i></li></ul>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67255
2016-03-10T19:10:17Z
Samuel Just
sjust@redhat.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/67255/diff?detail_id=64347">diff</a>)</li></ul>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67562
2016-03-15T18:13:24Z
Samuel Just
sjust@redhat.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/8136">https://github.com/ceph/ceph/pull/8136</a></p>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67563
2016-03-15T18:15:23Z
Samuel Just
sjust@redhat.com
<ul></ul><p>Note that the above patch will only prevent the bug which caused the orphaned files; it won't clean up any that already exist.</p>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67564
2016-03-15T18:16:36Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li><li><strong>Backport</strong> changed from <i>hammer</i> to <i>hammer,infernalis</i></li></ul>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67578
2016-03-15T20:10:54Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-6 priority-4 priority-default closed" href="/issues/15148">Backport #15148</a>: infernalis: OSD reporting ENOTEMPTY and crashing</i> added</li></ul>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67580
2016-03-15T20:11:00Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/15149">Backport #15149</a>: hammer: OSD reporting ENOTEMPTY and crashing</i> added</li></ul>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=67816
2016-03-18T15:22:46Z
Samuel Just
sjust@redhat.com
<ul><li><strong>Duplicated by</strong> <i><a class="issue tracker-1 status-10 priority-6 priority-high2 closed" href="/issues/15003">Bug #15003</a>: [hammer, master?] ec pool deep scrub turns up inconsistent objects inconsistently</i> added</li></ul>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=76126
2016-08-08T08:33:53Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Backport</strong> changed from <i>hammer,infernalis</i> to <i>hammer</i></li></ul><p>infernalis is EOL</p>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=76521
2016-08-12T09:51:42Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Copied to</strong> deleted (<i><a class="issue tracker-9 status-6 priority-4 priority-default closed" href="/issues/15148">Backport #15148</a>: infernalis: OSD reporting ENOTEMPTY and crashing</i>)</li></ul>
Ceph - Bug #14766: OSD reporting ENOTEMPTY and crashing
https://tracker.ceph.com/issues/14766?journal_id=84995
2017-01-27T20:24:57Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>