Bug #14766

Updated by Kefu Chai almost 7 years ago

Hi,

I'm seeing a large number of errors of this type on my Ceph cluster. I have 310 OSDs (roughly 240 in this erasure-coded pool). The system is under load and rebalancing due to some failed OSDs, and I'm getting the errors below at a rate of 8-10 per day. The problem seems to be that there is unexpected data in the OSD data directory. I've noticed other tickets reporting the same issue, but I haven't found one where it is resolved. To repair this, I have been removing the empty directories and restarting the OSDs. Is this problem indicative of a larger issue? Is there a better way to repair it?
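For anyone reproducing the workaround described above, here is a minimal sketch of how the leftover empty directories could be listed before removing anything. It is an illustration and not part of the original report: the function name `find_empty_dirs` is hypothetical, and the example path assumes the usual FileStore layout of `current/<pgid>_head` under the OSD data directory. The sketch only reports directories; it deletes nothing.

```python
import os
from pathlib import Path


def find_empty_dirs(pg_dir):
    """Return the empty subdirectories under pg_dir without deleting anything.

    pg_dir itself is never reported, only directories below it.
    """
    empty = []
    # Walk bottom-up so the deepest directories are visited first.
    for root, dirs, files in os.walk(pg_dir, topdown=False):
        if root == str(pg_dir):
            continue  # skip the collection directory itself
        if not dirs and not files:
            empty.append(Path(root))
    return empty


if __name__ == "__main__":
    # Assumed path, built from the OSD id and collection name in the log;
    # adjust it to the actual data directory before use.
    for d in find_empty_dirs("/var/lib/ceph/osd/ceph-185/current/70.53as4_head"):
        print(d)
```

Running this with the OSD stopped and reviewing the output by hand is safer than removing directories directly, since a directory that merely looks stale may still be in use by a running daemon.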

# uname -a
Linux ceph07 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# ceph --version
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)

Below is the OSD log with the failure.

Regards,
Jeff

<pre>


-4> 2016-02-15 01:44:14.976891 7fb567987700 1 -- 10.31.0.7:6811/858240 --> 10.31.0.67:0/857704 -- osd_ping(ping_reply e164491 stamp 2016-02-15 01:44:14.969356) v2 -- ?+0 0x33555400 con 0x131a8c60
-3> 2016-02-15 01:44:14.994698 7fb576097700 0 filestore(/var/lib/ceph/osd/ceph-185) error (39) Directory not empty not handled on operation 0x1c543098 (19395964.0.1, or op 1, counting from 0)
-2> 2016-02-15 01:44:14.994853 7fb576097700 0 filestore(/var/lib/ceph/osd/ceph-185) ENOTEMPTY suggests garbage data in osd data dir
-1> 2016-02-15 01:44:14.994868 7fb576097700 0 filestore(/var/lib/ceph/osd/ceph-185) transaction dump:
{
"ops": [
{
"op_num": 0,
"op_name": "remove",
"collection": "70.53as4_head",
"oid": "53a\/\/head\/\/70\/18446744073709551615\/4"
},
{
"op_num": 1,
"op_name": "rmcoll",
"collection": "70.53as4_head"
}
]
}

0> 2016-02-15 01:44:15.002104 7fb576097700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7fb576097700 time 2016-02-15 01:44:14.996917
os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error")

ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
6: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
7: (()+0x8182) [0x7fb582384182]
8: (clone()+0x6d) [0x7fb5808ef47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.185.log
--- end dump of recent events ---
2016-02-15 01:44:15.087422 7fb576097700 -1 *** Caught signal (Aborted) **
in thread 7fb576097700

ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: /usr/bin/ceph-osd() [0xacd7ba]
2: (()+0x10340) [0x7fb58238c340]
3: (gsignal()+0x39) [0x7fb58082bcc9]
4: (abort()+0x148) [0x7fb58082f0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb581136535]
6: (()+0x5e6d6) [0x7fb5811346d6]
7: (()+0x5e703) [0x7fb581134703]
8: (()+0x5e922) [0x7fb581134922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
15: (()+0x8182) [0x7fb582384182]
16: (clone()+0x6d) [0x7fb5808ef47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
0> 2016-02-15 01:44:15.087422 7fb576097700 -1 *** Caught signal (Aborted) **
in thread 7fb576097700

ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: /usr/bin/ceph-osd() [0xacd7ba]
2: (()+0x10340) [0x7fb58238c340]
3: (gsignal()+0x39) [0x7fb58082bcc9]
4: (abort()+0x148) [0x7fb58082f0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb581136535]
6: (()+0x5e6d6) [0x7fb5811346d6]
7: (()+0x5e703) [0x7fb581134703]
8: (()+0x5e922) [0x7fb581134922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
15: (()+0x8182) [0x7fb582384182]
16: (clone()+0x6d) [0x7fb5808ef47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.185.log
--- end dump of recent events ---
</pre>
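With 8-10 of these crashes per day across many OSDs, it may help to pull the key fields out of each log automatically. The sketch below is an assumption-laden illustration and not a Ceph tool: the regular expression is written against the single `error (39) ... not handled on operation` line shown in the log above, and the names `ERROR_RE` and `parse_filestore_error` are hypothetical.

```python
import re

# Pattern modelled on the one filestore error line in the log above.
ERROR_RE = re.compile(
    r"filestore\((?P<path>[^)]+)\) error \((?P<errno>\d+)\) "
    r"(?P<desc>.+?) not handled on operation \S+ "
    r"\(\S+ or op (?P<op>\d+), counting from 0\)"
)


def parse_filestore_error(line):
    """Return the OSD path, errno, description and failed op index, or None."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    return {
        "osd_path": m["path"],
        "errno": int(m["errno"]),
        "description": m["desc"],
        "failed_op": int(m["op"]),
    }


if __name__ == "__main__":
    sample = (
        "-3> 2016-02-15 01:44:14.994698 7fb576097700 0 "
        "filestore(/var/lib/ceph/osd/ceph-185) error (39) Directory not empty "
        "not handled on operation 0x1c543098 (19395964.0.1, or op 1, counting from 0)"
    )
    print(parse_filestore_error(sample))
```

In the log above the failed op index is 1, which in the transaction dump is the `rmcoll` on collection `70.53as4_head`; the preceding `remove` (op 0) is not the operation that was reported as failing.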