Bug #20505
closed
RGW: radosgw-admin bi list - ERROR: bi_list(): (4) Interrupted system call
Description
Trying to make a backup of a bucket index fails with:
- radosgw-admin -n client.radosgw.be-west-3 bi list --bucket=priv-prod-up-alex > /var/backup/priv-prod-up-alex.list.backup
2017-07-03 21:28:30.325613 7f07fb8bc9c0 0 System already converted
ERROR: bi_list(): (4) Interrupted system call
2294942
When I run bucket stats for that bucket, I get:
- radosgw-admin -n client.radosgw.be-west-3 bucket stats --bucket=priv-prod-up-alex | grep num_objects
2017-07-03 21:33:05.776499 7faca49b89c0 0 System already converted
"num_objects": 20148575
It looks like roughly 18 million objects are missing and the backup is
incomplete (I'm not sure whether that's a correct assumption). We're also
afraid that the resharding command will hit the same issue.
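For reference, the size of the gap can be worked out from the two figures quoted above. This is only a rough sketch: it assumes that 2294942 is the number of entries that reached the backup file before bi list failed, which the output does not state explicitly, and the function name is made up for illustration. Index entries and objects also need not match one-to-one, so the result is an estimate.

```python
# Figures quoted in the report above.
listed_entries = 2_294_942   # assumed: entries written before bi list failed
num_objects = 20_148_575     # "num_objects" from bucket stats


def listing_gap(listed, expected):
    """Return (missing count, fraction of expected entries that were listed)."""
    missing = expected - listed
    return missing, listed / expected


missing, coverage = listing_gap(listed_entries, num_objects)
print(f"missing: {missing}")          # missing: 17853633
print(f"listed:  {coverage:.1%} of num_objects")
```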
We're running Jewel 10.2.7.
I ran the same command with --debug-rgw=20 --debug-ms=1 and attached the complete output in a text file.
Regards,
Maarten
Updated by Maarten De Quick almost 7 years ago
Updated by Orit Wasserman almost 7 years ago
- Assignee set to Orit Wasserman
Hi,
I am not sure how you are using bi_list for backup. I would be happy to hear more details so I can understand.
Is the bucket being written to while you are running the command?
I would need the osd.2 logs with object class logging enabled. You can use this command to increase the log level:
ceph tell osd.2 injectargs --debug-objclass 5
After changing the log level, run the command again with --debug-rgw=20 --debug-ms=1
(just to make sure it still fails on osd.2 and not on a different OSD).
Please provide both the command output and the OSD log.
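The two steps requested above could be scripted roughly as follows. This is a minimal sketch that only builds and prints the command lines, it does not run them. The helper name diagnostic_commands is hypothetical, and the client name and bucket are taken from the original report.

```python
import shlex


def diagnostic_commands(osd_id, client_name, bucket):
    """Build (not run) the two commands requested above as argv lists."""
    # Step 1: raise object class logging on the OSD in question.
    raise_logging = [
        "ceph", "tell", f"osd.{osd_id}",
        "injectargs", "--debug-objclass", "5",
    ]
    # Step 2: repeat the failing listing with verbose RGW and messenger logs.
    rerun_bi_list = [
        "radosgw-admin", "-n", client_name,
        "bi", "list", f"--bucket={bucket}",
        "--debug-rgw=20", "--debug-ms=1",
    ]
    return [raise_logging, rerun_bi_list]


commands = diagnostic_commands(2, "client.radosgw.be-west-3", "priv-prod-up-alex")
for argv in commands:
    print(shlex.join(argv))
```

Keeping each command as an argv list avoids quoting mistakes if the commands are later passed to something like subprocess.run.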
Updated by Orit Wasserman almost 7 years ago
This issue seems to be fixed in 10.2.8;
see http://tracker.ceph.com/issues/20014
Updated by Orit Wasserman almost 7 years ago
- Related to Backport #20014: jewel: multisite: bi_list() decode failures added
Updated by Orit Wasserman almost 7 years ago
- Status changed from New to Resolved