Bug #20505

closed

RGW: radosgw-admin bi list - ERROR: bi_list(): (4) Interrupted system call

Added by Maarten De Quick almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When trying to make a backup of a bucket index, it fails with:

# radosgw-admin -n client.radosgw.be-west-3 bi list --bucket=priv-prod-up-alex > /var/backup/priv-prod-up-alex.list.backup
2017-07-03 21:28:30.325613 7f07fb8bc9c0 0 System already converted
ERROR: bi_list(): (4) Interrupted system call

When I grep for "idx" in the backup file and count the matches:

# grep idx priv-prod-up-alex.list.backup | wc -l
2294942

When I run bucket stats for that bucket I get:

# radosgw-admin -n client.radosgw.be-west-3 bucket stats --bucket=priv-prod-up-alex | grep num_objects
2017-07-03 21:33:05.776499 7faca49b89c0 0 System already converted
"num_objects": 20148575

It looks like there are 18 million objects missing and the backup is not
complete (not sure if that's a correct assumption?). We're also afraid that
the resharding command will face the same issue.
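The gap between the two counts can be worked out with a small shell sketch, shown here together with a way to avoid leaving a truncated backup file behind. Both `index_gap` and `run_to_file` are hypothetical helper names, not part of radosgw-admin, and the second helper assumes the failing command exits with a non-zero status. Note also that the number of "idx" lines is only a rough proxy for the number of objects.

```shell
#!/bin/sh
# Hypothetical helper: difference between the object count reported by
# "bucket stats" and the number of "idx" lines found in a bi list dump.
index_gap() {
    num_objects=$1
    idx_lines=$2
    echo $((num_objects - idx_lines))
}

# Hypothetical helper: run a command with stdout going to a temporary file
# and only move it to the destination if the command exits successfully.
# Assumption: the wrapped command returns non-zero when it fails.
run_to_file() {
    dest=$1
    shift
    tmp="$dest.partial"
    if "$@" > "$tmp"; then
        mv "$tmp" "$dest"
    else
        status=$?          # exit status of the wrapped command
        rm -f "$tmp"       # do not keep a truncated dump
        return "$status"
    fi
}

# Figures quoted in this report.
index_gap 20148575 2294942   # prints 17853633
```

With these helpers, the backup from this report could be invoked as `run_to_file /var/backup/priv-prod-up-alex.list.backup radosgw-admin -n client.radosgw.be-west-3 bi list --bucket=priv-prod-up-alex`, so that an interrupted listing does not leave a file that looks like a complete backup.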

We're running Jewel 10.2.7.
I ran the same command with --debug-rgw=20 --debug-ms=1 and attached the complete output as a text file.

Regards,
Maarten


Files

ceph_debug_radosgw_bi_list.txt.gz (126 KB), Maarten De Quick, 07/05/2017 09:13 AM

Related issues 1 (0 open, 1 closed)

Related to rgw - Backport #20014: jewel: multisite: bi_list() decode failures (Resolved, Alexey Sheplyakov)

#2

Updated by Orit Wasserman almost 7 years ago

  • Assignee set to Orit Wasserman

Hi,
I am not sure how you are using bi_list for backup; I would be happy to hear more details so I can understand.

Is the bucket being written to while you are running the command?

I would need the osd.2 logs with object class logging enabled. You can use this command to increase the log level:
ceph tell osd.2 injectargs --debug-objclass 5

After changing the log level, run the command again with --debug-rgw=20 --debug-ms=1
(just to make sure it still fails on osd.2 and not on a different OSD).

Please provide both the command output and the OSD log.
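The steps requested above can be sketched as a small shell wrapper. This is only an illustration: `run` and `collect_bi_list_debug` are hypothetical names, the OSD, client and bucket values are whatever applies to the affected cluster, and by default the commands are only printed (`DRY_RUN=1`) so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of the diagnostic steps: raise object class logging on one OSD,
# then rerun bi list with the extra debug flags.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper wrapping the two commands from this thread.
collect_bi_list_debug() {
    osd=$1       # e.g. osd.2
    client=$2    # e.g. client.radosgw.be-west-3
    bucket=$3    # e.g. priv-prod-up-alex
    run ceph tell "$osd" injectargs --debug-objclass 5
    run radosgw-admin -n "$client" bi list --bucket="$bucket" \
        --debug-rgw=20 --debug-ms=1
}
```

For example, `collect_bi_list_debug osd.2 client.radosgw.be-west-3 priv-prod-up-alex` prints the two commands; with `DRY_RUN=0` it runs them, and the output can then be redirected to a file to attach alongside the OSD log.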

#3

Updated by Orit Wasserman almost 7 years ago

This issue seems to be fixed in 10.2.8,
see http://tracker.ceph.com/issues/20014

#4

Updated by Orit Wasserman almost 7 years ago

  • Related to Backport #20014: jewel: multisite: bi_list() decode failures added

#5

Updated by Orit Wasserman almost 7 years ago

  • Status changed from New to Resolved