rados: export caught in loop on 'buck' bucket (1.5M objects)
I dumped an object list, watched strace, and periodically checked the current file/object name against the list; it did not appear to be making progress. It looks like it is iterating repeatedly over the same block, somewhere around object 1.1M.
This is on raid3136, bucket buck.
#5 Updated by Sage Weil about 8 years ago
trying to reproduce this (with logs) and having a hard time. :/
rados -n client.dhobackup01 export --delete-after buck /mnt/backup/dhobjects/b/u/c/buck --log-file buck2.log --debug-ms 1 --debug-objecter 20 --log-to-stderr 0 &
#6 Updated by Colin McCabe about 8 years ago
This is something where a core file or a backtrace would be really, really helpful. I reviewed the code in librados::ObjectIterator and in rados_sync, and although they could use some optimization, there is nothing obviously wrong there.
Was someone else performing operations on the pool while this happened? One thing that I don't think we've tested very much is one user performing adds and deletes on a rados pool while another user lists the objects in that pool.
#7 Updated by Sage Weil about 8 years ago
The original process is still running (but suspended). Unfortunately the binary is an old build with no debug symbols, so it's hard to make much sense of it in gdb. I was able to tell from strace -p that it's caught in a loop, but it's difficult to get much more out of it.
I've run it a few more times and still can't reproduce it. I think you're right that the thing to do is write a test that exercises listing a large pool while another client makes concurrent modifications...
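For reference, the kind of test being proposed can be sketched without librados at all. The simulation below (names, page size, and the toy in-memory "pool" are all hypothetical, not the real librados API) shows the property a pool lister needs under concurrent adds and deletes: a cursor that strictly advances, so the listing terminates and never revisits a block the way the stuck export appears to.

```python
import threading
import random

# Hypothetical in-memory stand-in for a rados pool: object name -> data.
# A lock stands in for OSD-side consistency; librados is not used here.
pool = {"obj%07d" % i: b"" for i in range(10000)}
lock = threading.Lock()

def list_objects(page_size=100):
    """Cursor-based listing: each page contains only names strictly greater
    than the last name already seen, so progress is monotonic even while
    entries are being added and removed between pages."""
    cursor = ""
    seen = []
    while True:
        with lock:
            page = sorted(k for k in pool if k > cursor)[:page_size]
        if not page:
            return seen
        seen.extend(page)
        cursor = page[-1]  # strictly advances, so no infinite loop

def churn(stop):
    """Concurrent writer: random adds and deletes while the listing runs."""
    while not stop.is_set():
        with lock:
            pool["new%07d" % random.randrange(10**6)] = b""
            del pool[random.choice(list(pool))]

stop = threading.Event()
writer = threading.Thread(target=churn, args=(stop,))
writer.start()
names = list_objects()
stop.set()
writer.join()

# The listing terminated despite concurrent churn, and no name was
# returned twice (the cursor only moves forward).
assert names == sorted(names)
assert len(names) == len(set(names))
```

A real test would do the same thing against a live cluster: one client walking the object iterator while another creates and deletes objects, asserting that the walk terminates and yields no duplicates.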