Bug #50441


cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service

Added by M B about 3 years ago. Updated almost 1 year ago.

Status:
Rejected
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

100%

Source:
Tags:
Backport:
octopus, pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

I installed a new Ceph 15.2.10 cluster on Ubuntu 20.04 arm64 bare metal, starting with a first monitor/manager node bootstrapped with the new "cephadm bootstrap" tool and the following command:

cephadm bootstrap --mon-ip 192.168.1.11

but unfortunately the grafana service is not working at all. It tries to restart the ceph/ceph-grafana container every 10 minutes but fails each time, apparently because there is no arm64 version of this container image, as you can see from the logs below:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1021, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1168, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Deploy daemon grafana.ceph1a ...
Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint stat -e CONTAINER_IMAGE=docker.io/ceph/ceph-grafana:6.7.4 -e NODE_NAME=ceph1a docker.io/ceph/ceph-grafana:6.7.4 -c %u %g /var/lib/grafana
stat: stderr {"msg":"exec container process `/usr/bin/stat`: Exec format error","level":"error","time":"2021-04-09T06:17:54.000910863Z"}
Traceback (most recent call last):
  File "<stdin>", line 6153, in <module>
  File "<stdin>", line 1412, in _default_image
  File "<stdin>", line 3431, in command_deploy
  File "<stdin>", line 3362, in extract_uid_gid_monitoring
  File "<stdin>", line 2099, in extract_uid_gid
RuntimeError: uid/gid not found
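
For what it is worth, the "Exec format error" from stat is the classic symptom of running a binary built for a different CPU architecture. A quick way to check which architectures a tag actually ships (a sketch, assuming skopeo or podman is available on the host; output details vary with the image and tool version):

# raw manifest of the tag; for a multi-arch image this is a manifest
# list with one "architecture" entry per supported platform
skopeo inspect --raw docker://docker.io/ceph/ceph-grafana:6.7.4

# same check with podman; this only succeeds if the tag is a manifest list
podman manifest inspect docker.io/ceph/ceph-grafana:6.7.4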

So I see two options here:

1) provide an arm64 Docker image for the ceph/ceph-grafana container (preferred)
2) check for the arm64 arch and do not deploy the grafana service on this architecture until 1) is fixed (a workaround along these lines is sketched after this list)
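
Until one of those happens, a possible workaround on the bootstrap side (a sketch only; --skip-monitoring-stack is a documented cephadm bootstrap flag, the architecture test around it is just an illustration):

# skip grafana/prometheus/alertmanager on arm64 hosts for now
arch="$(uname -m)"   # prints aarch64 on arm64 machines
if [ "$arch" = "aarch64" ]; then
    cephadm bootstrap --mon-ip 192.168.1.11 --skip-monitoring-stack
else
    cephadm bootstrap --mon-ip 192.168.1.11
fi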

I think it is a real win for Ceph to fully work on the arm64 architecture, so it would be great if this could be taken care of. If you need more details or more log data, do not hesitate to contact me.

Thank you very much in advance.


Related issues 2 (0 open, 2 closed)

Copied to RADOS - Backport #51549: pacific: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service Resolved
Copied to RADOS - Backport #51551: octopus: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service Rejected
#1

Updated by Sebastian Wagner almost 3 years ago

  • Category changed from cephadm to cephadm/monitoring
#2

Updated by Sebastian Wagner almost 3 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Dan Mick
  • Pull request ID set to 41559
#3

Updated by Kefu Chai almost 3 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to octopus, pacific
#4

Updated by Kefu Chai almost 3 years ago

  • Copied to Backport #51549: pacific: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service added
#5

Updated by Kefu Chai almost 3 years ago

  • Copied to Backport #51551: octopus: cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service added
#6

Updated by Sebastian Wagner almost 3 years ago

  • Status changed from Pending Backport to Resolved
#7

Updated by Deepika Upadhyay almost 3 years ago

  • Status changed from Resolved to Pending Backport
#8

Updated by Deepika Upadhyay almost 3 years ago

  • Project changed from Orchestrator to RADOS
  • Category deleted (cephadm/monitoring)

moved temporarily to RADOS so that we can use the backport scripts

#9

Updated by Deepika Upadhyay almost 3 years ago

  • Status changed from Pending Backport to Resolved
#10

Updated by M B over 2 years ago

Unfortunately this issue does not seem to be resolved, or at least not with Pacific 16.2.5. I installed a fresh new cluster with "cephadm bootstrap --mon-ip <IP>" and it is stuck at "Updating prometheus deployment", as you can see below from the "ceph -s" output:


  cluster:
    id:     fb48d256-f43d-11eb-9f74-7fd39d4b232f
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum ceph1a (age 76m)
    mgr: no daemons active (since 64m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     

  progress:
    Updating prometheus deployment (+1 -> 1) (0s)
      [............................] 

The web admin interface was working for the first few minutes after bootstrapping but then stopped, and commands such as "ceph orch host ls" just stall and never return any output.

The host OS is Ubuntu 20.04 LTS on aarch64.

Let me know if you need any more details.
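
One detail that may matter: the "ceph -s" output above shows "mgr: no daemons active", and orchestrator commands are served by the active mgr, which would explain why "ceph orch host ls" stalls. A first thing to try on the node (a sketch; the exact unit name depends on the cluster fsid and hostname, so it is left as a placeholder):

systemctl list-units 'ceph-*@mgr*'    # find the mgr unit for this cluster
systemctl restart <unit-from-above>   # restart it, then re-check "ceph -s"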

#11

Updated by Dan Mick over 2 years ago

Can't reproduce the failure; I just started a mon-and-mgr bootstrapped cluster without incident:

# ceph orch ls
NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager   ?:9093,9094      1/1  40s ago    6m   count:1
crash                           1/1  40s ago    7m   *
grafana        ?:3000           1/1  40s ago    6m   count:1
mgr                             1/2  40s ago    7m   count:2
mon                             1/5  40s ago    7m   count:5
node-exporter  ?:9100           1/1  40s ago    6m   *
prometheus     ?:9095           1/1  40s ago    6m   count:1


The base OS was CentOS 8, but that shouldn't matter. I guess we need to know why the prometheus update was failing. Are there any hints in /var/log/ceph/cephadm.log?
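
A few places that usually show the failing step (commands from the cephadm troubleshooting docs; adjust to your setup):

tail -n 100 /var/log/ceph/cephadm.log   # host-side log of cephadm actions
ceph log last cephadm                   # recent cephadm events in the cluster log
ceph -W cephadm                         # follow cephadm activity live
ceph orch ps                            # daemon states as the orchestrator sees them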
#12

Updated by Loïc Dachary over 2 years ago

  • Status changed from Resolved to Pending Backport
#13

Updated by Loïc Dachary over 2 years ago

Deepika, you marked this issue resolved but I can't figure out why. Would you be so kind as to explain? Thanks in advance!

#14

Updated by Deepika Upadhyay over 2 years ago

@Loïc Dachary, sure: the PR addressing this issue was backported to pacific, and I spoke to Dan, who said an octopus backport is not necessary. So I marked it as resolved after the pacific and master merges.

#15

Updated by Deepika Upadhyay over 2 years ago

  • Status changed from Pending Backport to Need More Info

M B wrote:

Unfortunately this issue does not seem to be resolved, or at least not with Pacific 16.2.5. I installed a fresh new cluster [...]

#16

Updated by M B over 2 years ago

@Deepika, I now think the issue I mentioned last week regarding the prometheus deployment after a new cluster installation with Pacific is unrelated: I simply rebooted the node and prometheus finally got deployed. So getting stuck while bootstrapping the node is an issue, but it is not related to this specific one. I have been reading on the ceph-users list that some other users are having similar issues with services that should be deployed simply getting stuck; hopefully there is another tracker issue open for that. From my side, all good.

#17

Updated by Neha Ojha over 2 years ago

Deepika: why is this issue in need-more-info? Looks like the original fix and pacific backport https://github.com/ceph/ceph/pull/42211 have merged?

#18

Updated by Dan Mick over 2 years ago

I assume because of MB's comment, but that now seems to be historical.

#19

Updated by Dan Mick over 2 years ago

Deepika, was that the reason why?

#20

Updated by Deepika Upadhyay over 2 years ago

  • Status changed from Need More Info to Resolved

Dan Mick wrote:

Deepika, was that the reason why?

Yep Dan, Neha marked it needs-info because of MB's comment; marking it as resolved since that's no longer valid. Feel free to reopen if otherwise.

#21

Updated by Konstantin Shalygin almost 1 year ago

  • Status changed from Resolved to Rejected
  • % Done changed from 0 to 100