Bug #20891 (closed)

mon: mysterious "application not enabled on <N> pool(s)"

Added by Patrick Donnelly over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Monitor
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From: /ceph/teuthology-archive/pdonnell-2017-08-02_17:25:29-fs-wip-pdonnell-testing-20170802-distro-basic-smithi/1474672/teuthology.log

2017-08-02T19:46:30.014 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 32, in nested
    yield vars
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20170802/qa/tasks/ceph.py", line 1654, in task
    healthy(ctx=ctx, config=dict(cluster=config['cluster']))
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20170802/qa/tasks/ceph.py", line 1251, in healthy
    ceph_cluster=cluster_name,
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/misc.py", line 903, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 132, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy' reached maximum tries (150) after waiting for 900 seconds

and

2017-08-02T19:46:05.284 INFO:teuthology.misc.health.smithi193.stdout:HEALTH_WARN application not enabled on 1 pool(s)
2017-08-02T19:46:05.286 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN application not enabled on 1 pool(s)

This is mysterious for two reasons: (a) there are three pools at the time of the error, none of which has an application set, yet the warning counts only one; (b) the `fs new` command did not set the application. According to the `osd dump`:

"pools":[
   {
      "pool":1,
      "pool_name":"rbd",
      "flags":1,
      "flags_names":"hashpspool",
      "type":1,
      "size":2,
      "min_size":1,
      "crush_rule":0,
      "object_hash":2,
      "pg_num":8,
      "pg_placement_num":8,
      "crash_replay_interval":0,
      "last_change":"10",
      "last_force_op_resend":"0",
      "last_force_op_resend_preluminous":"0",
      "auid":0,
      "snap_mode":"selfmanaged",
      "snap_seq":0,
      "snap_epoch":0,
      "pool_snaps":[

      ],
      "removed_snaps":"[]",
      "quota_max_bytes":0,
      "quota_max_objects":0,
      "tiers":[

      ],
      "tier_of":-1,
      "read_tier":-1,
      "write_tier":-1,
      "cache_mode":"none",
      "target_max_bytes":0,
      "target_max_objects":0,
      "cache_target_dirty_ratio_micro":400000,
      "cache_target_dirty_high_ratio_micro":600000,
      "cache_target_full_ratio_micro":800000,
      "cache_min_flush_age":0,
      "cache_min_evict_age":0,
      "erasure_code_profile":"",
      "hit_set_params":{
         "type":"none" 
      },
      "hit_set_period":0,
      "hit_set_count":0,
      "use_gmt_hitset":true,
      "min_read_recency_for_promote":0,
      "min_write_recency_for_promote":0,
      "hit_set_grade_decay_rate":0,
      "hit_set_search_last_n":0,
      "grade_table":[

      ],
      "stripe_width":0,
      "expected_num_objects":0,
      "fast_read":false,
      "options":{

      },
      "application_metadata":{

      }
   },
   {
      "pool":2,
      "pool_name":"cephfs_metadata",
      "flags":1,
      "flags_names":"hashpspool",
      "type":1,
      "size":2,
      "min_size":1,
      "crush_rule":0,
      "object_hash":2,
      "pg_num":4,
      "pg_placement_num":4,
      "crash_replay_interval":0,
      "last_change":"12",
      "last_force_op_resend":"0",
      "last_force_op_resend_preluminous":"0",
      "auid":0,
      "snap_mode":"selfmanaged",
      "snap_seq":0,
      "snap_epoch":0,
      "pool_snaps":[

      ],
      "removed_snaps":"[]",
      "quota_max_bytes":0,
      "quota_max_objects":0,
      "tiers":[

      ],
      "tier_of":-1,
      "read_tier":-1,
      "write_tier":-1,
      "cache_mode":"none",
      "target_max_bytes":0,
      "target_max_objects":0,
      "cache_target_dirty_ratio_micro":400000,
      "cache_target_dirty_high_ratio_micro":600000,
      "cache_target_full_ratio_micro":800000,
      "cache_min_flush_age":0,
      "cache_min_evict_age":0,
      "erasure_code_profile":"",
      "hit_set_params":{
         "type":"none" 
      },
      "hit_set_period":0,
      "hit_set_count":0,
      "use_gmt_hitset":true,
      "min_read_recency_for_promote":0,
      "min_write_recency_for_promote":0,
      "hit_set_grade_decay_rate":0,
      "hit_set_search_last_n":0,
      "grade_table":[

      ],
      "stripe_width":0,
      "expected_num_objects":0,
      "fast_read":false,
      "options":{

      },
      "application_metadata":{

      }
   },
   {
      "pool":3,
      "pool_name":"cephfs_data",
      "flags":1,
      "flags_names":"hashpspool",
      "type":1,
      "size":2,
      "min_size":1,
      "crush_rule":0,
      "object_hash":2,
      "pg_num":4,
      "pg_placement_num":4,
      "crash_replay_interval":0,
      "last_change":"13",
      "last_force_op_resend":"0",
      "last_force_op_resend_preluminous":"0",
      "auid":0,
      "snap_mode":"selfmanaged",
      "snap_seq":0,
      "snap_epoch":0,
      "pool_snaps":[

      ],
      "removed_snaps":"[]",
      "quota_max_bytes":0,
      "quota_max_objects":0,
      "tiers":[

      ],
      "tier_of":-1,
      "read_tier":-1,
      "write_tier":-1,
      "cache_mode":"none",
      "target_max_bytes":0,
      "target_max_objects":0,
      "cache_target_dirty_ratio_micro":400000,
      "cache_target_dirty_high_ratio_micro":600000,
      "cache_target_full_ratio_micro":800000,
      "cache_min_flush_age":0,
      "cache_min_evict_age":0,
      "erasure_code_profile":"",
      "hit_set_params":{
         "type":"none" 
      },
      "hit_set_period":0,
      "hit_set_count":0,
      "use_gmt_hitset":true,
      "min_read_recency_for_promote":0,
      "min_write_recency_for_promote":0,
      "hit_set_grade_decay_rate":0,
      "hit_set_search_last_n":0,
      "grade_table":[

      ],
      "stripe_width":0,
      "expected_num_objects":0,
      "fast_read":false,
      "options":{

      },
      "application_metadata":{

      }
   }
]
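
Since the osd dump above shows an empty "application_metadata" object for every pool, one way to list the affected pools from the command line is a small loop (a minimal sketch; it assumes the Luminous "ceph osd pool application get" command, which is expected to print "{}" for a pool with no application enabled):

# List every pool and flag the ones with no application enabled.
# Assumes "ceph osd pool application get <pool>" prints "{}" in that case.
for POOL in $(ceph osd pool ls); do
    if [ "$(ceph osd pool application get "$POOL")" = "{}" ]; then
        echo "no application enabled on pool: $POOL"
    fi
done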

Related issues (2: 0 open, 2 closed)

Related to rbd - Bug #20916: HEALTH_WARN application not enabled on 1 pool(s) (Duplicate) - Josh Durgin, 08/04/2017
Related to RADOS - Bug #20959: cephfs application metdata not set by ceph.py (Resolved) - Greg Farnum, 08/09/2017

Actions #1

Updated by Martin Emrich over 6 years ago

Same here after upgrading from 12.1.1 to 12.1.2:

id: cfaf0f4e-3b09-49e8-875b-4b114b0c4842
health: HEALTH_WARN
application not enabled on 16 pool(s)

I have no idea what to do.

Actions #2

Updated by Nathan Cutler over 6 years ago

Yesterday I, too, found myself doing "ceph osd pool application enable $POOL_NAME cephfs" on cephfs pools. This is a recent change in Ceph behavior.

Actions #3

Updated by Greg Farnum over 6 years ago

  • Subject changed from mon: mysterious "application not enabled on 1 pool(s)" to mon: mysterious "application not enabled on <N> pool(s)"
  • Description updated (diff)
  • Status changed from New to 12
  • Assignee set to Greg Farnum

I'm not sure if we've got the same issue here with upgrades and "fs new", or if they're distinct. Possibly we're doing something like updating them in the wrong map (current instead of pending)?

I will poke around some today.

Actions #4

Updated by Josh Durgin over 6 years ago

  • Related to Bug #20916: HEALTH_WARN application not enabled on 1 pool(s) added
Actions #5

Updated by Greg Farnum over 6 years ago

  • Status changed from 12 to In Progress

Martin, your cluster won't get auto-updated because it's going from one Luminous RC to another, and we don't have a good way to apply the auto-update logic in that scenario.

Nathan, does that apply to your cluster as well?

Actions #6

Updated by Greg Farnum over 6 years ago

Hmm, the monitor log indicates it's setting the pool application:

2017-08-02 19:27:44.368500 7fdee23af700 20 mon.b@0(leader).osd e13 do_application_enable: pool_id=3, app_name=cephfs
2017-08-02 19:27:44.368517 7fdee23af700 20 mon.b@0(leader).osd e13 do_application_enable: pool_id=2, app_name=cephfs

There isn't anything obvious in do_application_enable that directly triggers a new OSDMap, but I can see it going from epoch 13 to 14 right away, continuing on to 15 about a second later, so that doesn't seem to be a problem.

The rbd pool, meanwhile, is created with "pool create", and I don't see anything that sets its application. This appears to be deliberate, per teuthology.log:

2017-08-02T19:27:40.418 INFO:tasks.ceph:Creating RBD pool
2017-08-02T19:27:40.418 INFO:teuthology.orchestra.run.smithi193:Running: 'sudo ceph --cluster ceph osd pool create rbd 8'
Actions #7

Updated by Greg Farnum over 6 years ago

I'm not really sure why the osd dump isn't including the cephfs tag. From osd_types.cc, pg_pool_t::dump:

  f->open_object_section("application_metadata");
  for (auto &app_pair : application_metadata) {
    f->open_object_section(app_pair.first.c_str());
    for (auto &kv_pair : app_pair.second) {
      f->dump_string(kv_pair.first.c_str(), kv_pair.second);
    }
    f->close_section(); // application
  }
  f->close_section(); // application_metadata

Do the object sections not get output if they don't have any contents, maybe? That doesn't seem right but is the only thing I can think of.
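
A quick way to cross-check what the formatter actually emits on a live cluster (a minimal sketch; it just greps the pretty-printed dump for the section shown in the code above):

ceph osd dump -f json-pretty | grep -A 2 '"application_metadata"'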

Actions #8

Updated by Greg Farnum over 6 years ago

Hmm, Dan has a ceph-docker cluster running recent code which is outputting a "cephfs" section as you'd expect.

Actions #9

Updated by Nathan Cutler over 6 years ago

> Nathan, does that apply to your cluster as well?

No - I'm only doing integration testing, so all my clusters are short-lived.

Actions #10

Updated by Gordon Wang over 6 years ago

Hello Nathan

Thanks so much!

My Ceph cluster encountered the same issue yesterday after upgrading from Jewel to Luminous (12.1.2).

Just now I ran a command like "ceph osd pool application enable rbd cephfs" to enable the application on all the pools I created.

Now the ceph health result is "HEALTH_OK".

Best regards,
Gordon

Actions #11

Updated by Nathan Cutler over 6 years ago

Hi Gordon

> Just now I ran a command like "ceph osd pool application enable rbd cephfs" to enable the application on all the pools I created.

Actually, the command for rbd pools should be: "ceph osd pool application enable $POOL_NAME rbd"

For cephfs pools it should be: "ceph osd pool application enable $POOL_NAME cephfs"

The command you wrote enables the application "cephfs" on the pool named "rbd", which doesn't make sense unless it really is a CephFS pool.
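
Applied to the three pools from the osd dump in the description, a minimal sketch of the corrected commands (pool name first, then application) would be:

ceph osd pool application enable rbd rbd
ceph osd pool application enable cephfs_metadata cephfs
ceph osd pool application enable cephfs_data cephfs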

One further note: by design, the upgrade from Jewel to Luminous is not supposed to require this manual intervention for standard pools like "rbd", "cephfs_metadata", etc.

Hope this helps,
Nathan

Actions #12

Updated by Martin Emrich over 6 years ago

Greg Farnum wrote:

> Martin, your cluster won't get auto-updated because it's going from one Luminous RC to another, and we don't have a good way to apply the auto-update logic in that scenario.

Thanks for the hint. I fixed all my rgw pools with:

for POOL in `rados lspools | grep rgw`; do ceph osd pool application enable $POOL rgw; done

Then I fixed my rbd pools, too.

But I have one pool that I created for our developers for native RADOS development (not used yet). What application should I set for this pool? Do the different strings like "rbd", "cephfs" or "rgw" have semantics behind them, or are they just labels?

Thanks
Martin

Actions #13

Updated by Gordon Wang over 6 years ago

Dear Nathan, thanks for the correction, you are right. I am going to fix it now.

Actions #14

Updated by Greg Farnum over 6 years ago

Martin Emrich wrote:

> But I have one pool that I created for our developers for native RADOS development (not used yet). What application should I set for this pool? Do the different strings like "rbd", "cephfs" or "rgw" have semantics behind them, or are they just labels?

The tools are being updated to make sure the application matches what they expect. If you've got custom apps I would just set them to "librados", or perhaps a less reserved word like your company application name. The core cluster doesn't care at all.
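
For example, a minimal sketch for such a pool (the pool name "devpool" is hypothetical):

# "devpool" is a hypothetical pool name; the application string is just a label here.
ceph osd pool application enable devpool librados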

Actions #15

Updated by Greg Farnum over 6 years ago

Nathan Cutler wrote:

> > Nathan, does that apply to your cluster as well?
>
> No - I'm only doing integration testing, so all my clusters are short-lived.

Are you using the "fs new" command or something else? Are you sure it was the CephFS pools? (There was a brief period of time where teuthology was creating an "rbd" pool automatically but not tagging it correctly.)

Actions #16

Updated by Greg Farnum over 6 years ago

Apparently I forgot to update this ticket — the ceph health warning Patrick saw was the result of the just-mentioned teuthology change, which is now fixed.

The empty "pool application" json dump is confusing and my current working theory is that somewhere along the chain the osd dump was encoded without the feature bit indicating that was supported data.

Actions #17

Updated by Greg Farnum over 6 years ago

Gordon Wang wrote:

> Hello Nathan
>
> Thanks so much!
>
> My Ceph cluster encountered the same issue yesterday after upgrading from Jewel to Luminous (12.1.2).

Gordon, did you perhaps upgrade to 12.1.1 on some of your nodes, or along the way?

Actions #18

Updated by Nathan Cutler over 6 years ago

Hi Greg. Technically I'm using teuthology proper to automate the test cases, but I'm not using ceph-cm-ansible or any of the tasks in qa/tasks/, so I doubt the issue is coming from "teuthology" in the wider sense. The CephFS file system is created like this:

ceph osd pool create cephfs_data 128
ceph osd pool application enable cephfs cephfs_data || :
ceph osd pool create cephfs_metadata 128
ceph osd pool application enable cephfs cephfs_metadata || :
ceph fs new cephfs cephfs_metadata cephfs_data

(I added the "osd pool application enable" commands to work around this issue.)

Actions #19

Updated by Nathan Cutler over 6 years ago

Also, my clusters are new deployments of very recent luminous code, no upgrades.

Actions #20

Updated by Greg Farnum over 6 years ago

  • Related to Bug #20959: cephfs application metdata not set by ceph.py added
Actions #21

Updated by Greg Farnum over 6 years ago

  • Status changed from In Progress to Resolved

I'm going to close this one since we fixed up the RBD issue and aren't going to resolve the RC-upgrade problem.

Further work is ongoing at #20959 for the missing JSON output.
