Bug #1185
closed
rados: export caught in loop on 'buck' bucket (1.5M objects)
Added by Sage Weil almost 13 years ago.
Updated almost 13 years ago.
Description
Dumped an object list, watched strace, and periodically checked the current file/object name against the list. It did not appear to be making progress; it looks like it's reiterating over the same block somewhere around object 1.1M.
this is on raid3136, bucket buck.
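The manual check described above (comparing names seen in strace against the dumped list) could be automated along these lines. This is only a sketch: `find_rewind` and the sample names are hypothetical, and it assumes the dumped list and the export walk the bucket in the same order.

```python
def find_rewind(listing, observed):
    """Return (observed_index, name, position) for the first observed name
    whose position in `listing` is not past the highest position seen so
    far, or None if the observed names only ever move forward."""
    position = {name: i for i, name in enumerate(listing)}
    high = -1
    for k, name in enumerate(observed):
        pos = position.get(name)
        if pos is None:
            continue  # name not in the dump (e.g. created later); skip it
        if pos <= high:
            return k, name, pos  # the export went back over old ground
        high = pos
    return None


# Hypothetical data: the dumped list, and names sampled from strace output.
listing = ["obj%d" % i for i in range(10)]
observed = ["obj0", "obj3", "obj7", "obj8", "obj5", "obj7"]
print(find_rewind(listing, observed))  # → (4, 'obj5', 5)
```

A non-None result means the export revisited a name it had already passed, which is the "reiterating over the same block" symptom.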
- Target version changed from v0.30 to v0.31
- Assignee deleted (Sage Weil)
trying to reproduce this (with logs) and having a hard time. :/
cd /mnt/backup/dhobjects
rados -n client.dhobackup01 export --delete-after buck /mnt/backup/dhobjects/b/u/c/buck --log-file buck2.log --debug-ms 1 --debug-objecter 20 --log-to-stderr 0 &
on raid3136
This is something where a core file or a backtrace would be really, really helpful. I reviewed the code in librados::ObjectIterator and in rados_sync, and although they could use some optimization, there is nothing obviously wrong there.
Was someone else performing operations on the pool while this happened? One thing that I don't think we've tested very much is one user performing adds and deletes on a rados pool while another user lists the objects in that pool.
The original process is still running (but suspended). Unfortunately the binary is an old build so there are no debug symbols, making it hard to make much sense of in gdb. I was able to tell from strace -p that it's caught in a loop, but it's difficult to get much more out of it.
I've run it a few more times and still can't reproduce. I think you're right that the thing to do is write a test that tests listing large buckets with concurrent modifications...
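As a rough illustration of what such a test might check, here is an in-memory sketch. It does NOT use librados: `FakePool`, `list_with_churn`, and `run_check` are hypothetical, and the "resume after the last name returned" cursor is an assumption for the model, not a description of how rados actually enumerates a pool. The point is only the invariants: the listing terminates, returns no name twice, and returns every object that existed for the whole run.

```python
import bisect
import random


class FakePool:
    """In-memory stand-in for a pool (hypothetical; not librados)."""

    def __init__(self, names):
        self.names = sorted(names)

    def add(self, name):
        i = bisect.bisect_left(self.names, name)
        if i == len(self.names) or self.names[i] != name:
            self.names.insert(i, name)

    def remove(self, name):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            del self.names[i]

    def list_page(self, cursor, limit):
        """Return up to `limit` names strictly after `cursor` (None = start)."""
        start = 0 if cursor is None else bisect.bisect_right(self.names, cursor)
        return self.names[start:start + limit]


def list_with_churn(pool, page_size, mutate, max_pages):
    """List the pool page by page, calling mutate(pool) between pages."""
    seen = []
    cursor = None
    for _ in range(max_pages):
        page = pool.list_page(cursor, page_size)
        if not page:
            return seen
        seen.extend(page)
        cursor = page[-1]
        mutate(pool)  # simulate another client adding and deleting objects
    raise RuntimeError("listing did not finish within %d pages" % max_pages)


def run_check(seed=0, n=1000, page_size=50):
    rng = random.Random(seed)
    # Even-numbered objects exist for the whole run; odd ones come and go.
    stable = ["obj%06d" % (2 * i) for i in range(n)]
    pool = FakePool(stable)
    temps = []

    def mutate(p):
        for _ in range(5):
            name = "obj%06d" % (2 * rng.randrange(n) + 1)
            p.add(name)
            temps.append(name)
        for _ in range(3):
            p.remove(temps.pop(rng.randrange(len(temps))))

    # The cursor strictly advances over at most 2*n names, so 2*n + 1 pages
    # is a safe upper bound for this model.
    seen = list_with_churn(pool, page_size, mutate, max_pages=2 * n + 1)
    return seen, stable


seen, stable = run_check()
assert len(seen) == len(set(seen)), "a name was listed twice"
assert set(stable) <= set(seen), "a long-lived object was skipped"
```

A real test would replace `FakePool` with an actual pool and run the mutator from a second client, but the same three assertions would apply.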
- Target version changed from v0.31 to v0.32
Still having trouble hitting this. Running in a loop without any debugging to see if I can trigger it.
- Status changed from New to Can't reproduce