
Bug #22201

PG removal with ceph-objectstore-tool segfaulting

Added by David Turner over 6 years ago. Updated about 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Introspection/Control
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
objectstore-tool
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I am using ceph 10.2.10. ceph-objectstore-tool is failing to delete a PG, segfaulting as shown below. The initial attempt to delete the PG was accidentally cancelled when the SSH connection timed out, and subsequent attempts to run the delete on the PG have resulted in this segfault. The pool that the PG belongs to was deleted before any attempts to delete the PG with ceph-objectstore-tool; we were testing whether using the tool could speed up cleaning out the old PG folders.

The OSD makes no attempt to clean up this folder while it's running. The PG is only taking up 200 GB, so I can leave it here for testing if anyone would like to use it. Greg Farnum took a guess at this on the mailing list, and I suspect he's correct that the OSD no longer has any reference to the PG and it's just a folder sitting on the disk now. If that's the case, then I see no reason for ceph-objectstore-tool not to continue past its segfault and simply remove any existing folders on the disk that match the provided path and name.

"Well, this isn't supposed to happen, but backtraces like that generally mean the PG is trying to load an OSDMap that has already been trimmed. If I were to guess, in this case enough of the PG metadata got cleaned up that the OSD no longer knows it's there, and it removed the maps. But trying to remove the PG is pulling them in.
Or, alternatively, there's an issue with removing PGs that have lost their metadata and it's trying to pull in map epoch 0 or something..."
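That guess is consistent with the backtrace below: PG::read_info decodes the pg_info structure out of a bufferlist, and if the PG's metadata object is missing or truncated (peek_map_epoch already reported an error), the decode walks off the end of the buffer and throws ceph::buffer::end_of_buffer, which nothing catches, so the process aborts. A minimal stand-alone illustration of that failure mode (simplified types of my own, not the actual ceph classes):

```cpp
#include <cstring>
#include <stdexcept>
#include <string>

// Simplified stand-in for ceph::buffer::list::iterator::copy(): asking
// for more bytes than the buffer holds throws end_of_buffer, which is
// what frame 9 of the backtrace below reports.
struct end_of_buffer : std::runtime_error {
    end_of_buffer() : std::runtime_error("buffer::end_of_buffer") {}
};

struct buffer_iter {
    std::string data;   // the serialized pg_info blob
    size_t off = 0;

    void copy(size_t len, char* dest) {
        if (off + len > data.size())
            throw end_of_buffer();  // decode walked past the end
        std::memcpy(dest, data.data() + off, len);
        off += len;
    }
};
```

With the PG's metadata gone, read_info effectively gets an empty blob, so the very first copy() throws; uncaught, the C++ runtime calls terminate() and the tool dies with SIGABRT, matching the "terminate called after throwing" line in the output.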

[root@osd1 ~] # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --journal-path /var/lib/ceph/osd/ceph-0/journal --pgid 97.314s0 --op remove
SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
marking collection for removal
mark_pg_for_removal warning: peek_map_epoch reported error
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
what(): buffer::end_of_buffer
    *** Caught signal (Aborted) ***
    in thread 7f98ab2dc980 thread_name:ceph-objectstor
    ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
    1: (()+0x95209a) [0x7f98abc4b09a]
    2: (()+0xf100) [0x7f98a91d7100]
    3: (gsignal()+0x37) [0x7f98a7d825f7]
    4: (abort()+0x148) [0x7f98a7d83ce8]
    5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f98a86879d5]
    6: (()+0x5e946) [0x7f98a8685946]
    7: (()+0x5e973) [0x7f98a8685973]
    8: (()+0x5eb93) [0x7f98a8685b93]
    9: (ceph::buffer::list::iterator_impl<false>::copy(unsigned int, char*)+0xa5) [0x7f98abd498a5]
    10: (PG::read_info(ObjectStore*, spg_t, coll_t const&, ceph::buffer::list&, pg_info_t&, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > >&, unsigned char&)+0x324) [0x7f98ab6d3094]
    11: (mark_pg_for_removal(ObjectStore*, spg_t, ObjectStore::Transaction*)+0x87c) [0x7f98ab66615c]
    12: (initiate_new_remove_pg(ObjectStore*, spg_t, ObjectStore::Sequencer&)+0x131) [0x7f98ab666a51]
    13: (main()+0x39b7) [0x7f98ab610437]
    14: (__libc_start_main()+0xf5) [0x7f98a7d6eb15]
    15: (()+0x363a57) [0x7f98ab65ca57]
    Aborted
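If the directory really is orphaned as Greg suggests, the fallback the description asks for could treat a failed metadata read as non-fatal and just remove the on-disk collection directly. A hypothetical sketch of that behavior (the `current/<pgid>_head` layout is standard filestore; the function itself is my invention, not ceph code):

```cpp
#include <cstdint>
#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical fallback for ceph-objectstore-tool: when the PG's
// metadata can no longer be decoded, delete the on-disk collection
// directory instead of aborting. Filestore keeps each PG under
// <data-path>/current/<pgid>_head.
bool remove_orphaned_pg(const fs::path& data_path, const std::string& pgid) {
    fs::path pg_dir = data_path / "current" / (pgid + "_head");
    if (!fs::exists(pg_dir)) {
        std::cerr << "no on-disk collection for " << pgid << "\n";
        return false;
    }
    std::uintmax_t n = fs::remove_all(pg_dir);  // recursive delete
    std::cout << "removed " << n << " entries under " << pg_dir << "\n";
    return true;
}
```

This only makes sense when the OSD genuinely holds no reference to the PG anymore; otherwise deleting the directory out from under a running OSD would corrupt it, which is presumably why the tool tries to go through the normal removal path first.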

History

#1 Updated by Greg Farnum over 6 years ago

  • Project changed from Ceph to RADOS
  • Category set to Introspection/Control
  • Component(RADOS) objectstore-tool added

#2 Updated by David Turner about 6 years ago

We're getting close to converting the OSDs in this cluster to Bluestore. If you would like any tests run on this OSD/PG, let me know. There are probably about 2 weeks left before it is redeployed.
