Bug #1185: rados: export caught in loop on 'buck' bucket (1.5M objects)

closed

Added by Sage Weil almost 13 years ago. Updated almost 13 years ago.

Status: Can't reproduce
Priority: Normal
Category: librados
Target version: v0.32

Description

dumped an object list, watched strace, and periodically checked the current file/object name against the list, and it did not appear to be making progress... looks like it's reiterating over the same block somewhere around object 1.1M.

this is on raid3136, bucket buck.
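
For anyone repeating the check described above, here is a minimal sketch of comparing observed object names against a dumped object list. It assumes the names have already been pulled out of the strace output by hand; `first_revisit` and `listing_positions` are hypothetical helpers written for illustration, not part of rados or librados, and the sample names are made up.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def first_revisit(observed: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (index, name) of the first name that shows up a second time, or None."""
    seen = set()
    for i, name in enumerate(observed):
        if name in seen:
            return i, name
        seen.add(name)
    return None


def listing_positions(observed: Iterable[str], listing: Sequence[str]) -> List[int]:
    """Map each observed name to its index in the dumped object list (-1 if absent)."""
    index = {name: pos for pos, name in enumerate(listing)}
    return [index.get(name, -1) for name in observed]


if __name__ == "__main__":
    # Made-up data: a 10-entry object list and a trace that wraps back to obj0003.
    listing = ["obj%04d" % i for i in range(10)]
    observed = ["obj0003", "obj0004", "obj0005", "obj0003", "obj0004"]
    print(first_revisit(observed))               # (3, 'obj0003')
    print(listing_positions(observed, listing))  # [3, 4, 5, 3, 4]
```

If the positions stop increasing and start repeating, the export is revisiting objects rather than advancing through the list.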

Updated by Sage Weil almost 13 years ago

Target version changed from v0.30 to v0.31

Updated by Sage Weil almost 13 years ago

Story points set to 5

Updated by Sage Weil almost 13 years ago

Assignee deleted (~~Sage Weil~~)

Updated by Sage Weil almost 13 years ago

trying to reproduce this (with logs) and having a hard time. :/

cd /mnt/backup/dhobjects
 rados -n client.dhobackup01 export --delete-after buck /mnt/backup/dhobjects/b/u/c/buck --log-file buck2.log --debug-ms 1 --debug-objecter 20 --log-to-stderr 0 &

on raid3136

Updated by Colin McCabe almost 13 years ago

This is something where a core file or a backtrace would be really, really helpful. I reviewed the code in librados::ObjectIterator and in rados_sync, and although they could use some optimization, there is nothing obviously wrong there.

Was someone else performing operations on the pool while this happened? One thing that I don't think we've tested very much is one user performing adds and deletes on a rados pool while another user lists the objects in that pool.

Updated by Sage Weil almost 13 years ago

The original process is still running (but suspended). Unfortunately the binary is an old build so there are no debug symbols, making it hard to make much sense of it in gdb. I was able to tell from strace -p that it's caught in a loop, but it's difficult to get much more out of it.

I've run it a few more times and still can't reproduce. I think you're right that the thing to do is write a test that tests listing large buckets with concurrent modifications...
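
A rough sketch of the shape such a test could take, using an in-memory toy pool instead of a real cluster. Everything here is an assumption for illustration: `ToyPool`, `list_page`, and `list_with_churn` are hypothetical names, not librados APIs, and the cursor-based paging is just a simple model, not a claim about how librados::ObjectIterator works. The invariants checked are the ones that matter for this bug: the listing terminates, no object is returned twice, and every object that existed for the whole run is returned.

```python
import random
from typing import Iterable, List, Optional, Set, Tuple


class ToyPool:
    """In-memory stand-in for an object pool; NOT the librados API."""

    def __init__(self, names: Iterable[str]) -> None:
        self.names: Set[str] = set(names)

    def list_page(self, cursor: Optional[str], limit: int) -> Tuple[List[str], Optional[str]]:
        """Return up to `limit` names sorting after `cursor`, plus the next cursor.

        A next cursor of None means the listing is complete.
        """
        candidates = sorted(n for n in self.names if cursor is None or n > cursor)
        page = candidates[:limit]
        next_cursor = page[-1] if len(candidates) > limit else None
        return page, next_cursor


def list_with_churn(pool: ToyPool, page_size: int, rng: random.Random,
                    max_pages: int) -> Tuple[List[str], Set[str]]:
    """List the whole pool while a simulated writer adds and deletes between pages.

    Returns the listed names and the set of names deleted during the run.
    Raises RuntimeError if the listing has not finished after `max_pages` pages.
    """
    listed: List[str] = []
    deleted: Set[str] = set()
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page, cursor = pool.list_page(cursor, page_size)
        listed.extend(page)
        if cursor is None:
            return listed, deleted
        # Simulated concurrent writer: delete one existing object, add one new one.
        victim = rng.choice(sorted(pool.names))
        pool.names.discard(victim)
        deleted.add(victim)
        pool.names.add("%06d" % (2 * rng.randrange(1000) + 1))
    raise RuntimeError("listing did not terminate")


if __name__ == "__main__":
    rng = random.Random(0)
    initial = ["%06d" % (2 * i) for i in range(1000)]
    pool = ToyPool(initial)
    listed, deleted = list_with_churn(pool, page_size=50, rng=rng, max_pages=1000)
    assert len(listed) == len(set(listed)), "an object was listed twice"
    assert set(initial) - deleted <= set(listed), "a stable object was skipped"
    print("listed %d objects, %d deleted during the run" % (len(listed), len(deleted)))
```

A real version would swap the toy pool for an actual pool and run the writer in a separate client, but the same three assertions would apply.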
