Bug #54120

mgr/dashboard: dashboard turns telemetry off when configuring report

Added by Yaarit Hatuka 8 months ago. Updated 7 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Component - Telemetry
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
octopus, pacific, quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description of problem

The Telemetry wizard on the Dashboard turns telemetry off when an opted-in user explores the various channels in the report.

Environment

  • ceph version: vstart cluster (pacific / quincy / master). Also seen on the gibba cluster.
  • Platform (OS/distro/release): centos8
  • Cluster details (nodes, monitors, OSDs): vstart cluster / gibba cluster
  • Did it happen on a stable environment or after a migration/upgrade?: both
  • Browser used (e.g.: Version 86.0.4240.198 (Official Build) (64-bit)): Firefox 78.13.0esr 64-bit

How reproducible

Steps:

  1. Start a vstart cluster (pacific / quincy / master)
  2. Opt-in to telemetry either via:
    1. CLI with `ceph telemetry on`
    2. Dashboard wizard:
      1. Go to the Telemetry Configuration page (Step 1 of 2: Telemetry report configuration) at https://127.0.0.1:<port>/#/telemetry
      2. Click “Next” at the bottom of the page
      3. On the next page (Step 2 of 2: Telemetry report preview) check the license box (I agree…).
      4. Click “Update”
  3. In the CLI, run `ceph telemetry status` to see that `"enabled": true`.
  4. Go back to the Telemetry Configuration page (Step 1 of 2: Telemetry report configuration).
  5. Uncheck one of the checked boxes of the available channels (e.g. uncheck the Crash channel checkbox).
  6. Click “Next”.
  7. Now a popup window should appear with the text:
    "Your settings have been applied successfully. Due to privacy/legal reasons the Telemetry module is now disabled until you complete the next step and accept the license."
  8. Go back to the CLI and run `ceph telemetry status`, which now shows `"enabled": false`.
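
Steps 3 and 8 can be checked with a script rather than by eye. This is a minimal sketch, assuming `ceph telemetry status` prints a single JSON object containing the `enabled` key shown in the steps. The helper name and the trimmed-down sample payloads are hypothetical.

```python
import json


def telemetry_enabled(status_json: str) -> bool:
    """Return the "enabled" flag from `ceph telemetry status` output.

    Assumes the output is one JSON object; a missing key counts as disabled.
    """
    status = json.loads(status_json)
    return bool(status.get("enabled", False))


# Hypothetical payloads for illustration only (real output has more keys).
status_at_step_3 = '{"enabled": true}'
status_at_step_8 = '{"enabled": false}'

# The bug is reproduced when the flag flips without the user clicking "Deactivate".
reproduced = telemetry_enabled(status_at_step_3) and not telemetry_enabled(status_at_step_8)
print("bug reproduced:", reproduced)
```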

Actual results

The telemetry module is being disabled when the user just explores the report, which is wrong.

Expected results

The wizard should not turn telemetry off, unless the user explicitly clicked the "Deactivate" button.
The user should be able to explore the report details without setting any new configuration.

Additional info

This might also happen in Octopus; this still needs to be checked.


Related issues

Related to mgr - Bug #54250: mgr/telemetry: telemetry module experiences an AssertionError when generating device metrics Resolved
Related to Dashboard - Bug #54133: mgr/dashboard: Contact Info should be visible only when Ident channel is checked Resolved
Related to Dashboard - Feature #53543: mgr/dashboard: expose new telemetry commands New
Related to Dashboard - Feature #51020: telemetry activate: only show ident fields when ident is check Resolved
Copied to Dashboard - Backport #54351: octopus: mgr/dashboard: dashboard turns telemetry off when configuring report Resolved
Copied to Dashboard - Backport #54352: pacific: mgr/dashboard: dashboard turns telemetry off when configuring report Resolved
Copied to Dashboard - Backport #54353: quincy: mgr/dashboard: dashboard turns telemetry off when configuring report Resolved

History

#1 Updated by Yaarit Hatuka 8 months ago

  • Priority changed from Normal to Urgent

#2 Updated by Ernesto Puerta 8 months ago

  • Status changed from New to In Progress
  • Assignee set to Ernesto Puerta

When I try to enable Telemetry (on a new vstart cluster) it fails and I get the following exception:

2022-02-04T10:27:14.677+0000 7f9994ea3700 -1 Traceback (most recent call last):
  File "/ceph/src/pybind/mgr/telemetry/module.py", line 1803, in serve
    self.send(self.last_report)
  File "/ceph/src/pybind/mgr/telemetry/module.py", line 1271, in send
    assert devices
AssertionError

#3 Updated by Ernesto Puerta 8 months ago

  • Status changed from In Progress to Need More Info

That seems to be an error inside the Telemetry module

#4 Updated by Yaarit Hatuka 8 months ago

Thanks for reporting this.
  1. What version are you running?
    I built master a couple of days ago and this did not happen.
  2. Did you try this on Pacific?
  3. Did you try this on the gibba cluster Dashboard? You don't need to build anything, just access the Dashboard and follow the steps from 2.2 onward in the description above.
  4. How did you enable telemetry when you tried to reproduce this?

#5 Updated by Laura Flores 8 months ago

Nizamudeen reported experiencing this failure on the ceph-dev environment: https://github.com/rhcs-dashboard/ceph-dev

[root@ceph ceph]# ceph telemetry on --license sharing-1-0
2022-02-04T07:02:44.216+0000 7f587dea4700 -1 WARNING: all dangerous and experimental features are enabled.
2022-02-04T07:02:44.222+0000 7f587dea4700 -1 WARNING: all dangerous and experimental features are enabled.
Telemetry is on.
Some channels are disabled, please enable with:
`ceph telemetry enable channel perf`
[root@ceph ceph]# ceph telemetry status
2022-02-04T07:02:51.761+0000 7f5768191700 -1 WARNING: all dangerous and experimental features are enabled.
2022-02-04T07:02:51.764+0000 7f5768191700 -1 WARNING: all dangerous and experimental features are enabled.
Error EIO: Module 'telemetry' has experienced an error and cannot handle commands:
2022-02-04T07:02:45.697+0000 7f8ead583700  0 [telemetry INFO root] Sent report to https://telemetry.ceph.com/report
2022-02-04T07:02:45.697+0000 7f8ead583700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'telemetry' while running on mgr.x: 
2022-02-04T07:02:45.698+0000 7f8ead583700 -1 telemetry.serve:
2022-02-04T07:02:45.698+0000 7f8ead583700 -1 Traceback (most recent call last):
  File "/ceph/src/pybind/mgr/telemetry/module.py", line 1803, in serve
    self.send(self.last_report)
  File "/ceph/src/pybind/mgr/telemetry/module.py", line 1271, in send
    assert devices
AssertionError

#6 Updated by Ernesto Puerta 8 months ago

Laura, Yaarit, ceph-dev basically runs vstart + master RPMs (usually the ones built last night). The issue happens when you start the cluster with vstart, log in to Dashboard and click "Accept" on the Telemetry notification to set it up from scratch.

I'll keep looking into the issue you reported.

#7 Updated by Ernesto Puerta 8 months ago

Hi,

I tried with ceph-dev and reproduced the behaviour described. In any case, if a user completes the wizard, telemetry will eventually be enabled. It'll only stay disabled if the user leaves the wizard before completion. IMHO, while not ideal, that makes the priority less urgent. If I'm not wrong, this is not a regression: this behaviour has always been there.

I assume that the issue was that the telemetry UI was mainly designed to gain adoption among new users (I cannot confirm because the contributor that worked on this is no longer with us), not with the upgrade path in mind.

In either case, the workflow could be like this:
  1. Wizard starts
  2. DASHBOARD fetches:
    • telemetry status (on or off),
    • enabled channels,
    • contact info,
  3. DASHBOARD displays the first Wizard page with the above info updated accordingly.
  4. The USER can modify:
    • telemetry status,
    • channels,
    • contact info
  5. USER clicks "next" (but these changes are NOT applied yet)
  6. USER is presented with the Review page:
    • the report will be displayed with the selected fields.
  7. If USER clicks "Update", the changes are applied (enable/disable telemetry, enable/disable channels, update contact info).
    • Otherwise, no changes are applied.
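
The workflow above amounts to editing a draft and committing it only on "Update". The following is a minimal sketch of that idea, not the Dashboard's actual code. `TelemetrySettings` and `TelemetryWizard` are hypothetical names, and the real wizard would read and write the settings through the Dashboard's REST layer rather than hold them in memory.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class TelemetrySettings:
    enabled: bool = False
    channels: set = field(default_factory=set)
    contact: str = ""


class TelemetryWizard:
    """Stages edits in a draft; the saved settings change only on update()."""

    def __init__(self, saved: TelemetrySettings):
        self.saved = saved            # step 2: what the cluster currently has
        self.draft = deepcopy(saved)  # step 4: what the user edits

    def set_channel(self, name: str, on: bool) -> None:
        if on:
            self.draft.channels.add(name)
        else:
            self.draft.channels.discard(name)

    def preview(self) -> TelemetrySettings:
        # Steps 5-6: "Next" only shows the draft; nothing is applied.
        return self.draft

    def update(self) -> TelemetrySettings:
        # Step 7: "Update" is the only place where the draft is committed.
        self.saved = deepcopy(self.draft)
        return self.saved


wizard = TelemetryWizard(TelemetrySettings(enabled=True, channels={"basic", "crash"}))
wizard.set_channel("crash", False)
wizard.preview()
print(wizard.saved.enabled, sorted(wizard.saved.channels))  # saved state is untouched
```

Leaving the wizard after `preview()` discards only the draft, so an opted-in cluster stays opted in.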

It seems we're not using the telemetry CLI API, but calling the internal Telemetry methods directly via the mgr.remote() interface.

The changes to the UI and REST shouldn't be too complex. Question: can you take this, or do you expect us to deal with it? I'm asking because we don't have plenty of hands lately...

#8 Updated by Ernesto Puerta 8 months ago

  • Assignee changed from Ernesto Puerta to Sarthak Gupta

#9 Updated by Yaarit Hatuka 8 months ago

Hi Ernesto,

Both Laura and I could not reproduce this issue.
I think we should avoid using 'assert devices': if something is wrong, the report is not generated, so nothing is reported and we will never know that it happened. We will have a fix for that.

Thanks for reporting this.
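
One way to avoid the bare assert is to log the problem and return an error instead of raising. This is a minimal sketch only, not the fix tracked in Bug #54250. `device_section` and the logger name are hypothetical, and the real `send()` in the telemetry module may be structured differently.

```python
import logging

log = logging.getLogger("telemetry-sketch")


def device_section(devices):
    """Build the device part of a report without asserting.

    Returns (payload, error). error is None on success; otherwise it is a
    message the caller can log or surface instead of crashing the serve loop.
    """
    if not devices:
        msg = "no device metrics available; skipping device report"
        log.warning(msg)
        return None, msg
    return {"devices": list(devices)}, None


payload, error = device_section([])
print(payload, error)
```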

#10 Updated by Yaarit Hatuka 8 months ago

  • Related to Bug #54250: mgr/telemetry: telemetry module experiences an AssertionError when generating device metrics added

#11 Updated by Sarthak Gupta 8 months ago

  • Status changed from Need More Info to In Progress
  • Pull request ID set to 44985

#12 Updated by Yaarit Hatuka 8 months ago

  • Related to Bug #54133: mgr/dashboard: Contact Info should be visible only when Ident channel is checked added

#13 Updated by Yaarit Hatuka 8 months ago

Thanks, Ernesto, Sarthak!
I have a few comments:

Ernesto Puerta wrote:

Hi,

I tried with ceph-dev and reproduced the behaviour described. In any case, if a user completes the wizard, telemetry will eventually be enabled. It'll only stay disabled if the user leaves the wizard before completion. IMHO, while not ideal, that makes the priority less urgent. If I'm not wrong, this is not a regression: this behaviour has always been there.

The thing is that we cannot rely on the user re-completing the wizard; users should be allowed just to look around, without applying any changes. It's a confusing and unintuitive behavior of the wizard. The "Next" button should not apply any changes to the configuration, just the "Update" button.
Thanks for the fix!

I assume that the issue was that the telemetry UI was mainly designed to gain adoption among new users (I cannot confirm because the contributor that worked on this is no longer with us), not with the upgrade path in mind.

In either case, the workflow could be like this:
  1. Wizard starts
  2. DASHBOARD fetches:
    • telemetry status (on or off),
    • enabled channels,
    • contact info,

The wizard currently displays additional settings (proxy, interval); they need to be included here too.
Leaderboard and organization (which is part of the Contact Info section) also need to be fetched.

We should also display the output of `ceph telemetry collection ls` (introduced in Quincy), which will give users more information about the available collections, whether they are opting in for the first time or upgrading. See https://docs.ceph.com/en/latest/mgr/telemetry/#collections for more details.

  3. DASHBOARD displays the first Wizard page with the above info updated accordingly.
  4. The USER can modify:
    • telemetry status,
    • channels,
    • contact info

- Contact info section (Contact, Description, Organization) should be visible only if the Ident channel is enabled. See related ticket (https://tracker.ceph.com/issues/54133).
- Leaderboard should also be displayed; the other Advanced Settings which are currently displayed should still be there.

  5. USER clicks "next" (but these changes are NOT applied yet)
  6. USER is presented with the Review page:
    • the report will be displayed with the selected fields.

In the CLI we use `ceph telemetry show` and `ceph telemetry preview` to differentiate between opted-in and opted-out states.
See https://docs.ceph.com/en/latest/mgr/telemetry/#sample-report for more details. Please let us know if you have any questions.
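
A minimal sketch of that distinction, assuming the Dashboard already knows the opt-in state. `report_command` is a hypothetical helper; only the two CLI commands come from the comment above.

```python
def report_command(opted_in: bool) -> list:
    """Pick the CLI command that displays the report for the current state."""
    # `show` is used when opted in, `preview` when opted out.
    return ["ceph", "telemetry", "show" if opted_in else "preview"]


print(" ".join(report_command(True)))
print(" ".join(report_command(False)))
```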

  7. If USER clicks "Update", the changes are applied (enable/disable telemetry, enable/disable channels, update contact info).

- Other settings should also be updated (proxy, interval, leaderboard)

  • Otherwise, no changes are applied.

It seems we're not using the telemetry CLI API, but calling the internal Telemetry methods directly via the mgr.remote() interface.

The changes to the UI and REST shouldn't be too complex. Question: can you take this, or do you expect us to deal with it? I'm asking because we don't have plenty of hands lately...

We are also currently very busy on our end... we will be happy to assist whenever we can. I feel like we need to have a solid definition of the ideal GUI design first.

Is it possible to see the planned GUI blueprint prior to its implementation to see that we're on the same page? What tools do you use for that?

Thanks!

#14 Updated by Yaarit Hatuka 8 months ago

  • Related to Feature #53543: mgr/dashboard: expose new telemetry commands added

#15 Updated by Yaarit Hatuka 8 months ago

  • Backport changed from pacific, quincy to octopus, pacific, quincy

#16 Updated by Yaarit Hatuka 8 months ago

  • Status changed from In Progress to Fix Under Review

#17 Updated by Laura Flores 8 months ago

  • Status changed from Fix Under Review to Pending Backport

#18 Updated by Backport Bot 8 months ago

  • Copied to Backport #54351: octopus: mgr/dashboard: dashboard turns telemetry off when configuring report added

#19 Updated by Backport Bot 8 months ago

  • Copied to Backport #54352: pacific: mgr/dashboard: dashboard turns telemetry off when configuring report added

#20 Updated by Backport Bot 8 months ago

  • Copied to Backport #54353: quincy: mgr/dashboard: dashboard turns telemetry off when configuring report added

#21 Updated by Yaarit Hatuka 7 months ago

  • Related to Feature #51020: telemetry activate: only show ident fields when ident is check added

#22 Updated by Yaarit Hatuka 7 months ago

  • Status changed from Pending Backport to Resolved

all backports are merged; resolving
