Bug #5163
filestore: ENOTEMPTY on object removal
Description
I had an OSD crash during normal operation; this could possibly be related to #4927. I was able to restart the OSD and continue running. The underlying XFS filesystem appears to be normal and dmesg is clean.
May 24 10:14:37 gwioss1 ceph-osd: -22> 2013-05-24 10:14:36.868976 7ff0fdbc2700 0 filestore(/data/osd.9) error (39) Directory not empty not handled on operation 13 (28835393.1.0, or op 0, counting from 0)
May 24 10:14:37 gwioss1 ceph-osd: -21> 2013-05-24 10:14:36.884334 7ff0fdbc2700 0 filestore(/data/osd.9) ENOTEMPTY suggests garbage data in osd data dir
May 24 10:14:37 gwioss1 ceph-osd: -20> 2013-05-24 10:14:36.884340 7ff0fdbc2700 0 filestore(/data/osd.9) transaction dump:#012{ "ops": [#012 { "op_num": 0,#012 "op_name": "remove",#012 "collection": "2.363_head",#012 "oid": "8416ff63\/rb.0.2025.2ae8944a.000000005e78\/head\/\/2"}]}
May 24 10:14:37 gwioss1 ceph-osd: 0> 2013-05-24 10:14:37.095302 7ff0fdbc2700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)' thread 7ff0fdbc2700 time 2013-05-24 10:14:36.911403#012os/FileStore.cc: 2697: FAILED assert(0 == "unexpected error")#012#012 ceph version 0.61-4-ge99daaf (e99daaf76c63377a03c1078f2a22b31b1f617304)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int)+0x94c) [0x804fec]#012 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x69) [0x808479]#012 3: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x17e) [0x80860e]#012 4: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x8e646a]#012 5: (ThreadPool::WorkThread::entry()+0x10) [0x8e7710]#012 6: (()+0x7e9a) [0x7ff107dc5e9a]#012 7: (clone()+0x6d) [0x7ff105efbccd]#012 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
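The log lines above came through syslog, which appears to have replaced each embedded newline with `#012` (the octal code for a line feed). The following is a minimal sketch, not part of the original report, of how such lines could be decoded and the transaction dump parsed. The helper names are hypothetical, and the pattern assumes control characters are escaped as `#` followed by three octal digits.

```python
import json
import re

# Assumption: control characters were escaped as '#' plus three octal digits
# (000-037 or 177), which is how '#012' (line feed) shows up in the log above.
_ESCAPE = re.compile(r"#(0[0-3][0-7]|177)")


def decode_syslog_escapes(line):
    """Turn sequences such as '#012' back into the characters they stand for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


def extract_transaction_dump(line):
    """Parse the JSON that follows 'transaction dump:' in a log line."""
    decoded = decode_syslog_escapes(line)
    _, _, payload = decoded.partition("transaction dump:")
    return json.loads(payload)


# Message text copied from the third log line of the report.
sample = (
    r'filestore(/data/osd.9) transaction dump:#012{ "ops": [#012 { "op_num": 0,'
    r'#012 "op_name": "remove",#012 "collection": "2.363_head",'
    r'#012 "oid": "8416ff63\/rb.0.2025.2ae8944a.000000005e78\/head\/\/2"}]}'
)

dump = extract_transaction_dump(sample)
for op in dump["ops"]:
    print(op["op_num"], op["op_name"], op["collection"], op["oid"])
```

Decoded this way, the dump shows a single `remove` op against collection `2.363_head`, which is the operation that returned ENOTEMPTY.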
History
#1 Updated by Mike Lowe almost 11 years ago
The object in question was part of an RBD image that a VM was running fstrim on when the crash happened.
#2 Updated by Tamilarasi muthamizhan almost 11 years ago
- Priority changed from High to Urgent
This happened when running rados_python test on the arm test setup.
2013-05-31 16:13:02.587030 af3ff420 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)' thread af3ff420 time 2013-05-31 16:13:02.582218
os/FileStore.cc: 2698: FAILED assert(0 == "unexpected error")
 ceph version 0.61.2-30-g131dca7 (131dca7d10d5b0b7dca5c5cbe9b0b54938805015)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long long, int)+0x639) [0x337926]
 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long long, ThreadPool::TPHandle*)+0x43) [0x3396d4]
 3: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x10f) [0x3397fc]
 4: (ThreadPool::WorkQueue<FileStore::OpSequencer>::_void_process(void*, ThreadPool::TPHandle&)+0x7) [0x33aa44]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x3c3) [0x3bee88]
 6: (ThreadPool::WorkThread::entry()+0x9) [0x3bf9ea]
 7: (Thread::_entry_func(void*)+0x7) [0x3b92f4]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
... ...
2013-05-31 16:13:02.833444 a2cff420 0 -- 10.214.143.10:6801/19670 >> 10.214.143.10:6804/19671 pipe(0xac355b28 sd=34 :57310 s=2 pgs=1 cs=1 l=0).fault with nothing to send, going to standby
2013-05-31 16:13:02.833813 a25ff420 0 -- 10.214.143.10:0/19670 >> 10.214.143.10:6805/19671 pipe(0xb0d104d8 sd=30 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-31 16:13:02.846528 af3ff420 -1 *** Caught signal (Aborted) **
 in thread af3ff420
 ceph version 0.61.2-30-g131dca7 (131dca7d10d5b0b7dca5c5cbe9b0b54938805015)
 1: ceph-osd() [0x35bcee]
 2: (__default_sa_restorer_v2()+0) [0xb69e49d0]
 3: (()+0x171e6) [0xb69d61e6]
 4: (gsignal()+0x29) [0xb69e3db2]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
tamil@ubuntu:~/arm/teuthology$ cat rados.yaml
roles:
- [mon.a, mds.a, osd.0, osd.1]
- [mon.b, mon.c, osd.2, osd.3, client.0]
targets:
  ubuntu@tala007.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC77W/bX7Lxw7CYYGH0Qef09nebdo+RKD07QliMGVogjb/UItyn31uMTh3yOLm1VDD5jtAlDRNRNNyDyqYp4MFKyLJAOHIMdBQV+Q8nGd4wjpMkh2/aCPKEO73l64+vVKnNLWLlI3Lrgp1OTrz3r9Fk7pRDJb5K7EmYKacx4UeEENch9pr/f/FE8222e5aoLXJnifz9SrSI2ItqZlgNgGPOXCdvlEaefIaDU7MIlEBP+21jjJHzDlFWoQW0umL9JF6nXbH+dxQlbLNE9mOI+c8YnsolStRAMqziXOhmsK4cn4DrzcY48vuIOFQVjhwsuqE+f8oEGIkXv6OWRFo9JziH
  ubuntu@tala010.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAPzBE6fYTGlFJJ5fkmwsqmsIB3xs+ukNywGc048bSzrCghBQ5BKjVHh/cWt2AX5RfP5naZCCnOLftWQs/V+oC36J7pBzAt2hwDq88EWXRegEaiTjwjSzgBffI8wFeKJ6BVMMfMk0yXkuQYXiwXUnR5X57Gfar+vw8bIxDqiJE6i9JQ3Qo/exmTZMsPGLfAMUE7xmQqwhBpaUhA5BbSnekI70LshHQETNUtE3C1wDNziZ+C5X6CeiHGUUdTHwmdfFpsHT1/Wfnf0UMJi8QSA00D5Tb4IUXvTGtFZD+1zU9f76Zkb8Ff9cELtr/2KmVn1YogpZWkKYNkHAi41K5Ubu/
tasks:
#- chef:
- install:
    branch: cuttlefish
- ceph:
- workunit:
    clients:
      client.0:
      - rados/test_python.sh
#3 Updated by Tamilarasi muthamizhan almost 11 years ago
The teuthology logs are copied to ubuntu@burnupi24.front.sepia.ceph.com:/home/ubuntu/bug5163/testing_logs_rados_python
#4 Updated by Sage Weil almost 11 years ago
- Status changed from New to Need More Info
Tamil- Yours looks like a different (and easier) bug. What was the workload? It appears to just be a bad truncation value:
{ "ops": [
    { "op_num": 0,
      "op_name": "truncate",
      "collection": "17.3_head",
      "oid": "251e4793\/abc\/head\/\/17",
      "offset": 13152137403151941634},
    { "op_num": 1,
      "op_name": "setattr",
      "collection": "17.3_head",
      "oid": "251e4793\/abc\/head\/\/17",
      "name": "_",
      "length": 206},
    { "op_num": 2,
      "op_name": "setattr",
      "collection": "17.3_head",
      "oid": "251e4793\/abc\/head\/\/17",
      "name": "snapset",
      "length": 31}]}
Mike- do you have the full log with more lines leading up to the crash? Usually it tells you what the error code was and dumps the transaction.
thanks!
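As a quick illustration of why the truncate offset in the dump above looks wrong, the sketch below checks it against the signed 64-bit range. It assumes the offset eventually reaches a signed 64-bit file offset type (such as `off_t` on a build with 64-bit file offsets); the ticket does not confirm this, and the helper name is hypothetical.

```python
INT64_MAX = 2**63 - 1


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit value as a two's-complement signed one."""
    value &= 2**64 - 1
    return value - 2**64 if value > INT64_MAX else value


# "offset" from the truncate op in the transaction dump.
offset = 13152137403151941634

print("exceeds INT64_MAX:", offset > INT64_MAX)
print("as signed 64-bit: ", as_signed_64(offset))
print("as hex:           ", hex(offset))
```

The value is larger than 2^63 - 1, so under that assumption it would be read as a negative length, which is consistent with the truncate failing with an unexpected error.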
#5 Updated by Sage Weil almost 11 years ago
Tamil- I see, it's the python rados tests. Is this reproducible?
#6 Updated by Sage Weil almost 11 years ago
- Subject changed from OSD Crash to filestore: ENOTEMPTY on object removal
- Source changed from other to Community (user)
Moved Tamil's issue to #5233. Mike, I see the output now, but it doesn't make much sense. A more complete log would be great, thanks!
#7 Updated by Samuel Just almost 11 years ago
Can we get a recursive ls of 2.363_head on that osd?
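For anyone gathering this, a minimal sketch of a recursive listing in Python follows; plain `ls -R` on the directory does the same job. The path is an assumption: it combines the OSD data dir from the log (`/data/osd.9`) with the usual `current/<collection>` layout, which is not confirmed in this ticket. The function name is hypothetical.

```python
import os
from typing import List

# Assumed location of the collection; adjust to the actual OSD data dir.
ASSUMED_PG_DIR = "/data/osd.9/current/2.363_head"


def recursive_ls(root: str) -> List[str]:
    """Return every entry under root as a path relative to root.

    Directories are listed with a trailing separator so that an
    unexpected directory among the object files stands out.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in dirnames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            entries.append(rel + os.sep)
        for name in sorted(filenames):
            entries.append(os.path.relpath(os.path.join(dirpath, name), root))
    return entries


if __name__ == "__main__":
    # Prints nothing if the assumed directory does not exist.
    for entry in recursive_ls(ASSUMED_PG_DIR):
        print(entry)
```

The listing would show whether anything unexpected sits where the object being removed should be, which is what the ENOTEMPTY message hints at.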
#8 Updated by Sage Weil almost 11 years ago
- Priority changed from Urgent to High
#9 Updated by Samuel Just almost 11 years ago
- Status changed from Need More Info to Can't reproduce