Bug #50169

closed

Pacific - RadosGW not starting after upgrade

Added by Enrico Kern about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Using Ubuntu 20.04 without cephadm, with packages from the Ceph repositories.

Upgraded from the latest Octopus release to Pacific.

After the upgrade, neither radosgw nor the OSDs start; both fail with permission errors. Multiple users have reported the same issue on the ceph-users mailing list.

For OSDs:

2021-04-06T11:27:23.402+0200 7f527b714f00 0 osd.20:3.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2021-04-06T11:27:23.402+0200 7f527b714f00 1 bdev(0x55815696a400 /var/lib/ceph/osd/ceph-20/block) open path /var/lib/ceph/osd/ceph-20/block
2021-04-06T11:27:23.402+0200 7f527b714f00 -1 bdev(0x55815696a400 /var/lib/ceph/osd/ceph-20/block) open open got: (1) Operation not permitted
2021-04-06T11:27:23.402+0200 7f527b714f00 0 osd.20:4.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2021-04-06T11:27:23.402+0200 7f527b714f00 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-20/block: (1) Operation not permitted
2021-04-06T11:27:23.402+0200 7f527b714f00 1 bluestore(/var/lib/ceph/osd/ceph-20) _mount path /var/lib/ceph/osd/ceph-20
2021-04-06T11:27:23.402+0200 7f527b714f00 0 bluestore(/var/lib/ceph/osd/ceph-20) _open_db_and_around read-only:0 repair:0
2021-04-06T11:27:23.402+0200 7f527b714f00 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-20/block: (1) Operation not permitted
2021-04-06T11:27:23.402+0200 7f527b714f00 1 bdev(0x55815696a400 /var/lib/ceph/osd/ceph-20/block) open path /var/lib/ceph/osd/ceph-20/block
2021-04-06T11:27:23.402+0200 7f527b714f00 -1 bdev(0x55815696a400 /var/lib/ceph/osd/ceph-20/block) open open got: (1) Operation not permitted
2021-04-06T11:27:23.402+0200 7f527b714f00 -1 osd.20 0 OSD:init: unable to mount object store
2021-04-06T11:27:23.402+0200 7f527b714f00 -1  ** ERROR: osd init failed: (1) Operation not permitted

Permissions on the block device and related files are all correct. Starting the OSD manually, for example with:

/usr/bin/ceph-osd -f --cluster ceph --id 20 --setuser ceph --setgroup ceph

starts the OSD just fine, but starting it via systemctl does not work.

For RGW:

2021-04-06T15:39:14.923+0200 7efe51a9cb40 0 ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable), process radosgw, pid 2560132
2021-04-06T15:39:14.923+0200 7efe51a9cb40 0 framework: beast
2021-04-06T15:39:14.923+0200 7efe51a9cb40 0 framework conf key: port, val: 7480
2021-04-06T15:39:14.923+0200 7efe51a9cb40 1 radosgw_Main not setting numa affinity
2021-04-06T15:39:15.107+0200 7efe51a9cb40 -1 int rgw::cls::fifo::get_meta(librados::v14_2_0::IoCtx&, const string&, std::optional<rados::cls::fifo::objv>, rados::cls::fifo::info*, uint32_t*, uint32_t*, uint64_t, optional_yield, bool):105 fifo::op::GET_META failed r=-1 tid=0
2021-04-06T15:39:15.107+0200 7efe51a9cb40 -1 static int rgw::cls::fifo::FIFO::open(librados::v14_2_0::IoCtx, std::string, std::unique_ptr<rgw::cls::fifo::FIFO>*, optional_yield, std::optional<rados::cls::fifo::objv>, bool):884 get_meta failed: r=-1
2021-04-06T15:39:15.107+0200 7efe51a9cb40 -1 static int RGWDataChangesFIFO::exists(ceph::common::CephContext*, librados::v14_2_0::Rados*, const rgw_pool&, bool*, bool*): unable to open FIFO: default.rgw.log/data_log.0: (1) Operation not permitted
2021-04-06T15:39:15.107+0200 7efe51a9cb40 -1 int RGWDataChangesLog::start(const RGWZone*, const RGWZoneParams&, RGWSI_Cls*, librados::v14_2_0::Rados*): Error when checking for existing FIFO datalog backend: (1) Operation not permitted
2021-04-06T15:39:15.107+0200 7efe51a9cb40 -1 static int rgw::cls::fifo::FIFO::create(librados::v14_2_0::IoCtx, std::string, std::unique_ptr<rgw::cls::fifo::FIFO>*, optional_yield, std::optional<rados::cls::fifo::objv>, std::optional<std::basic_string_view<char> >, bool, uint64_t, uint64_t):925 create_meta failed: r=-1
2021-04-06T15:39:15.107+0200 7efe51a9cb40 -1 int RGWDataChangesLog::start(const RGWZone*, const RGWZoneParams&, RGWSI_Cls*, librados::v14_2_0::Rados*): Error when starting backend: Operation not permitted
2021-04-06T15:39:15.107+0200 7efe51a9cb40 0 ERROR: failed to start datalog_rados service ((1) Operation not permitted
2021-04-06T15:39:15.107+0200 7efe51a9cb40 0 ERROR: failed to init services (ret=(1) Operation not permitted)
2021-04-06T15:39:15.131+0200 7efe51a9cb40 -1 Couldn't init storage provider (RADOS)

RGW also fails to start when invoked manually.


Related issues: 2 (1 open, 1 closed)

Copied to RADOS - Bug #50682: Pacific - OSD not starting after upgrade (New)
Copied to rgw - Backport #50982: pacific: Pacific - RadosGW not starting after upgrade (Resolved, Adam Emerson)

#2

Updated by Enrico Kern about 3 years ago

I can confirm that setting

rgw_data_log_backing = omap

resolves the RadosGW startup issue. I reported both problems together here because both show the same symptom (permission denied). I could create a separate issue for the OSD problem.

#3

Updated by Greg Farnum almost 3 years ago

  • Copied to Bug #50682: Pacific - OSD not starting after upgrade added
#4

Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to rgw
  • Subject changed from Pacific - OSD and RadosGW not starting after upgrade to Pacific - RadosGW not starting after upgrade

Created a bug for the OSD issue at https://tracker.ceph.com/issues/50682

#5

Updated by Casey Bodley almost 3 years ago

  • Assignee set to Adam Emerson
#6

Updated by Adam Emerson almost 3 years ago

That error, -1, on a CLS call, usually means the OSD doesn't have the CLS module loaded. Are you specifying the list of CLS modules explicitly in your configuration?

#7

Updated by 玮文 胡 almost 3 years ago

My logs show r=-5 instead of -1.
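The `r=` values in these logs are negated POSIX errno codes, which is why r=-1 is printed as "(1) Operation not permitted" and r=-5 as "(5) Input/output error". A minimal Python sketch for decoding such values; the helper name `decode_rados_ret` is hypothetical, and the message text comes from the host C library (the strings match glibc on Linux):

```python
import errno
import os


def decode_rados_ret(r: int) -> str:
    """Decode a negative Ceph/librados return code such as r=-5.

    Hypothetical helper: assumes the value is a negated POSIX errno,
    as the log lines above show (r=-1 -> EPERM, r=-5 -> EIO).
    """
    if r >= 0:
        return f"r={r}: not an error"
    code = -r
    # errno.errorcode maps the numeric code to its symbolic name.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the platform's human-readable message.
    return f"r={r}: {name} ({code}) {os.strerror(code)}"


if __name__ == "__main__":
    for r in (-1, -5):
        print(decode_rados_ret(r))
```

So the two reports differ only in which error the OSD returned for the FIFO call: EPERM in the original report, EIO here.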

Casey mentioned on the mailing list that the fixes had already been merged (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/T2LPPKWHQZRWPQGGQPNPLUGI46SBYSKJ/), but I still see this after upgrading to 16.2.3.

debug 2021-05-08T17:36:40.027+0000 7f34f8595440  0 deferred set uid:gid to 167:167 (ceph:ceph)
debug 2021-05-08T17:36:40.027+0000 7f34f8595440  0 ceph version 16.2.3 (381b476cb3900f9a92eb95d03b4850b953cfd79a) pacific (stable), process radosgw, pid 7
debug 2021-05-08T17:36:40.027+0000 7f34f8595440  0 framework: beast
debug 2021-05-08T17:36:40.027+0000 7f34f8595440  0 framework conf key: port, val: 7480
debug 2021-05-08T17:36:40.027+0000 7f34f8595440  1 radosgw_Main not setting numa affinity
debug 2021-05-08T17:36:43.103+0000 7f34f8595440 -1 static int rgw::cls::fifo::FIFO::create(librados::v14_2_0::IoCtx, std::__cxx11::string, std::unique_ptr<rgw::cls::fifo::FIFO>*, optional_yield, std::optional<rados::cls::fifo::objv>, std::optional<std::basic_string_view<char> >, bool, uint64_t, uint64_t):1212 create_meta failed: r=-5
debug 2021-05-08T17:36:43.103+0000 7f34f8595440 -1 tl::expected<log_type, boost::system::error_code> {anonymous}::handle_dne(librados::v14_2_0::IoCtx&, log_type, std::__cxx11::string, bool, optional_yield):131 error creating FIFO: r=-5, oid=data_log.0
debug 2021-05-08T17:36:43.103+0000 7f34f8595440 -1 int RGWDataChangesLog::start(const RGWZone*, const RGWZoneParams&, librados::v14_2_0::Rados*): Error initializing backends: Input/output error
debug 2021-05-08T17:36:43.103+0000 7f34f8595440  0 ERROR: failed to start datalog_rados service ((5) Input/output error
debug 2021-05-08T17:36:43.103+0000 7f34f8595440  0 ERROR: failed to init services (ret=(5) Input/output error)
debug 2021-05-08T17:36:43.879+0000 7f34f8595440 -1 Couldn't init storage provider (RADOS)

So I changed "rgw_default_data_log_backing" to "omap", and then RGW could start. While writing this reply, I reset "rgw_default_data_log_backing" back to "fifo", and it still works. Did RGW perform some migration after it started?

#8

Updated by Adam Emerson almost 3 years ago

  • Status changed from New to Fix Under Review
  • Backport set to pacific,octopus

It tries to autodetect the backing of the zeroth generation; once it does so successfully, it records the result so it doesn't have to repeat the time-consuming detection every time it starts up.
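The detect-once-then-record behaviour described above can be sketched in Python. This is a hypothetical model, not RGW's actual code: the in-memory dict standing in for the log pool, the `MARKER` key, and the functions `probe_existing_backing` and `choose_backing` are all illustrative assumptions. It is consistent with the observation that resetting `rgw_default_data_log_backing` to "fifo" after a successful start changes nothing, because the recorded choice takes precedence over the configured default.

```python
from typing import Dict, Optional

# Hypothetical marker key; RGW's real bookkeeping is not modelled here.
MARKER = "datalog_generation0_backing"


def probe_existing_backing(pool: Dict[str, str]) -> Optional[str]:
    """Slow path (hypothetical): inspect existing shards to find their backing."""
    backings = {kind for key, kind in pool.items() if key.startswith("data_log.")}
    if len(backings) > 1:
        raise RuntimeError("mixed data log backings found")
    return backings.pop() if backings else None


def choose_backing(pool: Dict[str, str], default_backing: str, num_shards: int = 2) -> str:
    """Pick the generation-0 backing, probing only until a result is recorded."""
    recorded = pool.get(MARKER)
    if recorded is not None:
        # Fast path: detection already succeeded once, so the default is ignored.
        return recorded
    backing = probe_existing_backing(pool)
    if backing is None:
        # Nothing exists yet: create the shards with the configured default.
        backing = default_backing
        for i in range(num_shards):
            pool[f"data_log.{i}"] = backing
    # Record the result so later starts skip the probe.
    pool[MARKER] = backing
    return backing


if __name__ == "__main__":
    pool: Dict[str, str] = {}
    print(choose_backing(pool, "omap"))  # first start with the omap workaround
    print(choose_backing(pool, "fifo"))  # default reset to fifo; recorded omap wins
```

In this model, a leftover detection step that ran even after a result was recorded (the kind of bug described next) would make every startup depend on the probe succeeding again.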

The bug was a leftover bit of detection from the previous attempt at autodetection, introduced by accident as part of another fix.

There's a fix in: https://github.com/ceph/ceph/pull/41465

#9

Updated by Casey Bodley almost 3 years ago

  • Pull request ID set to 41465
#10

Updated by Adam Emerson almost 3 years ago

  • Backport changed from pacific,octopus to pacific
#11

Updated by Adam Emerson almost 3 years ago

  • Status changed from Fix Under Review to Pending Backport
#12

Updated by Backport Bot almost 3 years ago

  • Copied to Backport #50982: pacific: Pacific - RadosGW not starting after upgrade added
#13

Updated by Loïc Dachary almost 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
