Bug #45596

qa/tasks/cephadm: No cephadm module detected

Added by Varsha Rao almost 4 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Varsha Rao
Category:
teuthology
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
35198
Crash signature (v1):
Crash signature (v2):

Description

Something weird is happening here.
First the mgr is stopped and failed over ('mgr fail a'), and the cephadm module is disabled.

2020-05-08T07:53:17.716 INFO:ceph.mgr.a.smithi003.stdout:May 08 07:53:17 smithi003 podman[39947]: 2020-05-08 07:53:17.404452424 +0000 UTC m=+0.254745655 container died 164e714f7b5017223ced2fe56efb54e8b0076d495c028be9bab85bda91f22c97 (image=quay.io/ceph-ci/ceph:5d96f0c9612029b065cea7c34cd161b174878f8c, name=ceph-8f4e11de-9100-11ea-a068-001a4aab830c-mgr.a)
2020-05-08T07:53:17.717 INFO:ceph.mgr.a.smithi003.stdout:May 08 07:53:17 smithi003 podman[39947]: 2020-05-08 07:53:17.467673383 +0000 UTC m=+0.317966394 container stop 164e714f7b5017223ced2fe56efb54e8b0076d495c028be9bab85bda91f22c97 (image=quay.io/ceph-ci/ceph:5d96f0c9612029b065cea7c34cd161b174878f8c, name=ceph-8f4e11de-9100-11ea-a068-001a4aab830c-mgr.a)
2020-05-08T07:53:17.717 INFO:ceph.mgr.a.smithi003.stdout:May 08 07:53:17 smithi003 podman[39947]: 164e714f7b5017223ced2fe56efb54e8b0076d495c028be9bab85bda91f22c97
2020-05-08T07:53:17.997 INFO:teuthology.orchestra.run.smithi003:> true
2020-05-08T07:53:18.013 INFO:teuthology.orchestra.run.smithi003:> sudo pkill -f 'journalctl -f -n 0 -u ceph-8f4e11de-9100-11ea-a068-001a4aab830c@mgr.a.service'
2020-05-08T07:53:18.083 INFO:ceph.mgr.a.smithi003.stdout:May 08 07:53:17 smithi003 podman[26049]: 2020-05-08 07:53:17.983398302 +0000 UTC m=+134.198961367 container remove 164e714f7b5017223ced2fe56efb54e8b0076d495c028be9bab85bda91f22c97 (image=quay.io/ceph-ci/ceph:5d96f0c9612029b065cea7c34cd161b174878f8c, name=ceph-8f4e11de-9100-11ea-a068-001a4aab830c-mgr.a)
2020-05-08T07:53:18.083 INFO:ceph.mgr.a.smithi003.stdout:May 08 07:53:17 smithi003 bash[26046]: debug
2020-05-08T07:53:18.083 INFO:ceph.mgr.a.smithi003.stdout:May 08 07:53:17 smithi003 systemd[1]: Stopped Ceph mgr.a for 8f4e11de-9100-11ea-a068-001a4aab830c.
2020-05-08T07:53:18.203 DEBUG:teuthology.orchestra.run:got remote process result: None
2020-05-08T07:53:18.204 INFO:tasks.cephadm.mgr.a:Stopped mgr.a
2020-05-08T07:53:18.205 INFO:teuthology.orchestra.run.smithi003:> true
2020-05-08T07:53:18.251 INFO:teuthology.orchestra.run.smithi003:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early mgr fail a
2020-05-08T07:53:18.713 INFO:teuthology.orchestra.run.smithi003:> true

Then the mgr is restarted and the orchestrator command fails because the cephadm module is not detected.

2020-05-08T07:53:19.453 INFO:teuthology.orchestra.run.smithi003:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early mgr module disable cephadm
2020-05-08T07:53:19.964 INFO:ceph.mon.a.smithi003.stdout:May 08 07:53:19 smithi003 bash[25807]: audit 2020-05-08T07:53:18.700715+0000 mon.a (mon.0) 256 : audit [INF] from='client.? 172.21.15.3:0/3482683540' entity='client.admin' cmd='[{"prefix": "mgr fail", "who": "a"}]': finished
2020-05-08T07:53:19.965 INFO:ceph.mon.a.smithi003.stdout:May 08 07:53:19 smithi003 bash[25807]: cluster 2020-05-08T07:53:18.700760+0000 mon.a (mon.0) 257 : cluster [DBG] mgrmap e14: no daemons active (since 0.0557753s)
2020-05-08T07:53:19.965 INFO:ceph.mon.a.smithi003.stdout:May 08 07:53:19 smithi003 bash[25807]: audit 2020-05-08T07:53:19.085697+0000 mon.a (mon.0) 258 : audit [DBG] from='client.? 172.21.15.3:0/2781800809' entity='client.admin' cmd=[{"prefix": "mgr module ls"}]: dispatch
2020-05-08T07:53:20.727 INFO:teuthology.orchestra.run.smithi003:> true
2020-05-08T07:53:20.744 INFO:teuthology.orchestra.run.smithi003:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early mgr module disable iostat
2020-05-08T07:53:20.964 INFO:ceph.mon.a.smithi003.stdout:May 08 07:53:20 smithi003 bash[25807]: audit 2020-05-08T07:53:19.810781+0000 mon.a (mon.0) 259 : audit [INF] from='client.? 172.21.15.3:0/3588794767' entity='client.admin' cmd=[{"prefix": "mgr module disable", "module": "cephadm"}]: dispatch
2020-05-08T07:53:21.736 INFO:teuthology.orchestra.run.smithi003:> true
2020-05-08T07:53:21.753 INFO:teuthology.orchestra.run.smithi003:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early mgr module disable dashboard
2020-05-08T07:53:21.964 INFO:ceph.mon.a.smithi003.stdout:May 08 07:53:21 smithi003 bash[25807]: audit 2020-05-08T07:53:20.710746+0000 mon.a (mon.0) 260 : audit [INF] from='client.? 172.21.15.3:0/3588794767' entity='client.admin' cmd='[{"prefix": "mgr module disable", "module": "cephadm"}]': finished
2020-05-08T07:53:21.964 INFO:ceph.mon.a.smithi003.stdout:May 08 07:53:21 smithi003 bash[25807]: cluster 2020-05-08T07:53:20.710801+0000 mon.a (mon.0) 261 : cluster [DBG] mgrmap e15: no daemons active (since 2s)
2020-05-08T07:53:21.965 INFO:ceph.mon.a.smithi003.stdout:May 08 07:53:21 smithi003 bash[25807]: audit 2020-05-08T07:53:21.110849+0000 mon.a (mon.0) 262 : audit [INF] from='client.? 172.21.15.3:0/1093881759' entity='client.admin' cmd=[{"prefix": "mgr module disable", "module": "iostat"}]: dispatch
2020-05-08T07:53:22.731 INFO:tasks.cephadm.mgr.a:Restarting mgr.a (starting--it wasn't running)...

2020-05-08T07:53:35.900 INFO:ceph.mgr.a.smithi003.stdout:May 08 07:53:35 smithi003 bash[40619]: debug 2020-05-08T07:53:35.522+0000 7f1e08ae5700 -1 no module 'cephadm'
2020-05-08T07:53:35.901 INFO:ceph.mon.a.smithi003.stdout:May 08 07:53:35 smithi003 bash[25807]: audit 2020-05-08T07:53:33.861586+0000 mgr.a (mgr.14247) 5 : audit [DBG] from='client.14262 -' entity='client.admin' cmd=[{"prefix": "nfs cluster create", "type": "cephfs", "clusterid": "test", "target": ["mon-mgr", ""]}]: dispatch
2020-05-08T07:53:35.901 INFO:ceph.mon.a.smithi003.stdout:May 08 07:53:35 smithi003 bash[25807]: audit 2020-05-08T07:53:34.470014+0000 mon.a (mon.0) 296 : audit [INF] from='mgr.14247 172.21.15.3:0/57223522' entity='mgr.a' cmd='[{"prefix": "osd pool create", "pool": "nfs-ganesha"}]': finished
2020-05-08T07:53:35.901 INFO:ceph.mon.a.smithi003.stdout:May 08 07:53:35 smithi003 bash[25807]: cluster 2020-05-08T07:53:34.470056+0000 mon.a (mon.0) 297 : cluster [DBG] osdmap e24: 3 total, 3 up, 3 in
2020-05-08T07:53:35.901 INFO:ceph.mon.a.smithi003.stdout:May 08 07:53:35 smithi003 bash[25807]: audit 2020-05-08T07:53:34.473610+0000 mon.a (mon.0) 298 : audit [INF] from='mgr.14247 172.21.15.3:0/57223522' entity='mgr.a' cmd=[{"prefix": "osd pool application enable", "pool": "nfs-ganesha", "app": "nfs"}]: dispatch
2020-05-08T07:53:35.902 INFO:teuthology.orchestra.run.smithi003.stderr:Error ENOENT: Module not found

http://qa-proxy.ceph.com/teuthology/varsha-2020-05-08_07:35:51-rados-wip-varsha-testing-distro-basic-smithi/5034105/
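For illustration only (this snippet is not part of the qa suite): the ENOENT above is what a test sees when it issues an orchestrator-backed command like 'nfs cluster create' while the cephadm mgr module is disabled. A minimal Python sketch, assuming the standard ceph CLI and the "enabled_modules" key name in the 'ceph mgr module ls' JSON output:

import json
import subprocess

def cephadm_enabled():
    """Best-effort check that the cephadm mgr module is currently enabled."""
    out = subprocess.check_output(["ceph", "mgr", "module", "ls", "--format", "json"])
    # "enabled_modules" key name is assumed from the mgr "module ls" JSON output.
    return "cephadm" in json.loads(out).get("enabled_modules", [])

if not cephadm_enabled():
    # This is the state shown above: orchestrator-backed commands such as
    # "nfs cluster create" return "Error ENOENT: Module not found".
    raise RuntimeError("cephadm mgr module is disabled")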

Actions #1

Updated by Varsha Rao almost 4 years ago

This failure can also be seen in this test:
http://pulpito.ceph.com/varsha-2020-05-18_10:25:58-rados-wip-integrate-cephadm-with-volume-nfs-distro-basic-smithi/5066842/

The test passed here because I enabled cephadm again.
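A hedged sketch of that workaround (the commands are standard ceph CLI; the polling loop and timeout are my own, not taken from the qa task): re-enable the module after the mgr restart and wait until the orchestrator answers before creating the NFS cluster.

import subprocess
import time

def reenable_cephadm(timeout=60):
    """Re-enable the cephadm mgr module and wait until the orchestrator answers."""
    subprocess.check_call(["ceph", "mgr", "module", "enable", "cephadm"])
    deadline = time.time() + timeout
    while time.time() < deadline:
        # "ceph orch status" only succeeds once the orchestrator backend is reachable.
        if subprocess.call(["ceph", "orch", "status"]) == 0:
            return
        time.sleep(2)
    raise TimeoutError("cephadm orchestrator did not respond after re-enable")

reenable_cephadm()
# The same command that failed in the description (type "cephfs", cluster id "test").
subprocess.check_call(["ceph", "nfs", "cluster", "create", "cephfs", "test"])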

Actions #2

Updated by Sebastian Wagner almost 4 years ago

Right, the mgr gets restarted:

teuthology.run_tasks:Running task cephfs_test_runner...
teuthology.orchestra.run.smithi003:> true
teuthology.orchestra.run.smithi003:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early osd dump --format=json
ceph.mon.a.smithi003.stdout:May 08 07:53:15 smithi003 bash[25807]: audit 2020-05-08T07:53:15.034316+0000 mon.a (mon.0) 239 : audit [INF] from='mgr.14142 172.21.15.3:0/125380827' entity='mgr.a' cmd=[{"prefix":"config-key set","key":"mgr/cephadm/spec.mds.1","val":"{\"create\": \"2020-05-08T07:53:15.034002\", \"spec\": {\"placement\": {\"count\": 2}, \"service_id\": \"1\", \"service_name\": \"mds.1\", \"service_type\": \"mds\"}}"}]: dispatch
ceph.mon.a.smithi003.stdout:May 08 07:53:15 smithi003 bash[25807]: audit 2020-05-08T07:53:15.036083+0000 mon.a (mon.0) 240 : audit [INF] from='mgr.14142 172.21.15.3:0/125380827' entity='mgr.a' cmd='[{"prefix":"config-key set","key":"mgr/cephadm/spec.mds.1","val":"{\"created\": \"2020-05-08T07:53:15.034002\", \"spec\": {\"placement\": {\"count\": 2}, \"service_id\": \"1\", \"service_name\": \"mds.1\", \"service_type\": \"mds\"}}"}]': finished
ceph.mon.a.smithi003.stdout:May 08 07:53:15 smithi003 bash[25807]: audit 2020-05-08T07:53:15.037441+0000 mon.a (mon.0) 241 : audit [INF] from='mgr.14142 172.21.15.3:0/125380827' entity='mgr.a' cmd=[{"prefix": "config set", "who": "mds.1", "name": "mds_join_fs", "value": "1"}]: dispatch
ceph.mon.a.smithi003.stdout:May 08 07:53:15 smithi003 bash[25807]: audit 2020-05-08T07:53:15.039909+0000 mon.a (mon.0) 242 : audit [INF] from='mgr.14142 172.21.15.3:0/125380827' entity='mgr.a' cmd='[{"prefix": "config set", "who": "mds.1", "name": "mds_join_fs", "value": "1"}]': finished
ceph.mon.a.smithi003.stdout:May 08 07:53:15 smithi003 bash[25807]: audit 2020-05-08T07:53:15.040708+0000 mon.a (mon.0) 243 : audit [INF] from='mgr.14142 172.21.15.3:0/125380827' entity='mgr.a' cmd=[{"prefix": "auth get-or-create", "entity": "mds.1.smithi003.pehqbc", "caps": ["mon", "profile mds", "osd", "allow rwx", "mds", "allow"]}]: dispatch
ceph.mon.a.smithi003.stdout:May 08 07:53:15 smithi003 bash[25807]: audit 2020-05-08T07:53:15.042618+0000 mon.a (mon.0) 244 : audit [INF] from='mgr.14142 172.21.15.3:0/125380827' entity='mgr.a' cmd='[{"prefix": "auth get-or-create", "entity": "mds.1.smithi003.pehqbc", "caps": ["mon", "profile mds", "osd", "allow rwx", "mds", "allow"]}]': finished
ceph.mon.a.smithi003.stdout:May 08 07:53:15 smithi003 bash[25807]: audit 2020-05-08T07:53:15.043260+0000 mon.a (mon.0) 245 : audit [DBG] from='mgr.14142 172.21.15.3:0/125380827' entity='mgr.a' cmd=[{"prefix": "config generate-minimal-conf"}]: dispatch
ceph.mon.a.smithi003.stdout:May 08 07:53:15 smithi003 bash[25807]: audit 2020-05-08T07:53:15.044105+0000 mon.a (mon.0) 246 : audit [DBG] from='mgr.14142 172.21.15.3:0/125380827' entity='mgr.a' cmd=[{"prefix": "config get", "who": "mds.1.smithi003.pehqbc", "key": "container_image"}]: dispatch
teuthology.orchestra.run.smithi003:> true
teuthology.orchestra.run.smithi003:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early osd pool get device_health_metrics pg_num
teuthology.orchestra.run.smithi003.stdout:pg_num: 1
teuthology.orchestra.run.smithi003:> true
teuthology.orchestra.run.smithi003:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early osd dump --format=json
teuthology.orchestra.run.smithi003.stdout:
teuthology.orchestra.run.smithi003.stdout:{"epoch":22,"fsid":"8f4e11de-9100-11ea-a068-001a4aab830c","created":"2020-05-08T07:50:57.493161+0000","modified":"2020-05-08T07:52:42.465961+0000","last_up_change":"2020-05-08T07:52:41.465316+0000","last_in_change":"2020-05-08T07:52:41.465316+0000","flags":"sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit","flags_num":5799936,"flags_set":["pglog_hardlimit","purged_snapdirs","recovery_deletes","sortbitwise"],"crush_version":8,"full_ratio":0.94999998807907104,"backfillfull_ratio":0.89999997615814209,"nearfull_ratio":0.85000002384185791,"cluster_snapshot":"","pool_max":1,"max_osd":3,"require_min_compat_client":"luminous","min_compat_client":"jewel","require_osd_release":"pacific","pools":[{"pool":1,"pool_name":"device_health_metrics","create_time":"2020-05-08T07:51:08.676406+0000","flags":1,"flags_names":"hashpspool","type":1,"size":3,"min_size":2,"crush_rule":0,"object_hash":2,"pg_autoscale_mode":"off","pg_num":1,"pg_placement_num":1,"pg_placement_num_target":1,"pg_num_target":1,"pg_num_pending":1,"last_pg_merge_meta":{"source_pgid":"0.0","ready_epoch":0,"last_epoch_started":0,"last_epoch_clean":0,"source_version":"0'0","target_version":"0'0"},"last_change":"12","last_force_op_resend":"0","last_force_op_resend_prenautilus":"0","last_force_op_resend_preluminous":"0","auid":0,"snap_mode":"selfmanaged","snap_seq":0,"snap_epoch":0,"pool_snaps":[],"removed_snaps":"[]","quota_max_bytes":0,"quota_max_objects":0,"tiers":[],"tier_of":-1,"read_tier":-1,"write_tier":-1,"cache_mode":"none","target_max_bytes":0,"target_max_objects":0,"cache_target_dirty_ratio_micro":400000,"cache_target_dirty_high_ratio_micro":600000,"cache_target_full_ratio_micro":800000,"cache_min_flush_age":0,"cache_min_evict_age":0,"erasure_code_profile":"","hit_set_params":{"type":"none"},"hit_set_period":0,"hit_set_count":0,"use_gmt_hitset":true,"min_read_recency_for_promote":0,"min_write_recency_for_promote":0,"hit_set_grade_decay_rate":0,"hit_set_search_last_n":0,"grade_table":[],"stripe_width":0,"expected_num_objects":0,"fast_read":false,"options":{"pg_num_min":1},"application_metadata":{"mgr_devicehealth":{}}}],"osds":[{"osd":0,"uuid":"fc7238e9-7c0c-43a8-9bdb-6fbd64a1dd1e","up":1,"in":1,"weight":1,"primary_affinity":1,"last_clean_begin":0,"last_clean_end":0,"up_from":10,"up_thru":10,"down_at":0,"lost_at":0,"public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.3:6802","nonce":3721421418},{"type":"v1","addr":"172.21.15.3:6803","nonce":3721421418}]},"cluster_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.3:6804","nonce":3721421418},{"type":"v1","addr":"172.21.15.3:6805","nonce":3721421418}]},"heartbeat_back_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.3:6808","nonce":3721421418},{"type":"v1","addr":"172.21.15.3:6809","nonce":3721421418}]},"heartbeat_front_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.3:6806","nonce":3721421418},{"type":"v1","addr":"172.21.15.3:6807","nonce":3721421418}]},"public_addr":"172.21.15.3:6803/3721421418","cluster_addr":"172.21.15.3:6805/3721421418","heartbeat_back_addr":"172.21.15.3:6809/3721421418","heartbeat_front_addr":"172.21.15.3:6807/3721421418","state":["exists","up"]},{"osd":1,"uuid":"cafa5809-bbde-4125-bd28-c4b36f434f51","up":1,"in":1,"weight":1,"primary_affinity":1,"last_clean_begin":0,"last_clean_end":0,"up_from":16,"up_thru":21,"down_at":0,"lost_at":0,"public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.3:6810","nonce":431372452},{"type":"v1","addr":"172.21.15.3:6811","nonce":431372452}]},"cluster_addrs":{"addrvec":[{"t
ype":"v2","addr":"172.21.15.3:6812","nonce":431372452},{"type":"v1","addr":"172.21.15.3:6813","nonce":431372452}]},"heartbeat_back_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.3:6816","nonce":431372452},{"type":"v1","addr":"172.21.15.3:6817","nonce":431372452}]},"heartbeat_front_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.3:6814","nonce":431372452},{"type":"v1","addr":"172.21.15.3:6815","nonce":431372452}]},"public_addr":"172.21.15.3:6811/431372452","cluster_addr":"172.21.15.3:6813/431372452","heartbeat_back_addr":"172.21.15.3:6817/431372452","heartbeat_front_addr":"172.21.15.3:6815/431372452","state":["exists","up"]},{"osd":2,"uuid":"d490049d-7375-4755-835c-387571dec1b4","up":1,"in":1,"weight":1,"primary_affinity":1,"last_clean_begin":0,"last_clean_end":0,"up_from":21,"up_thru":0,"down_at":0,"lost_at":0,"public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.3:6818","nonce":3872854968},{"type":"v1","addr":"172.21.15.3:6819","nonce":3872854968}]},"cluster_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.3:6820","nonce":3872854968},{"type":"v1","addr":"172.21.15.3:6821","nonce":3872854968}]},"heartbeat_back_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.3:6824","nonce":3872854968},{"type":"v1","addr":"172.21.15.3:6825","nonce":3872854968}]},"heartbeat_front_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.3:6822","nonce":3872854968},{"type":"v1","addr":"172.21.15.3:6823","nonce":3872854968}]},"public_addr":"172.21.15.3:6819/3872854968","cluster_addr":"172.21.15.3:6821/3872854968","heartbeat_back_addr":"172.21.15.3:6825/3872854968","heartbeat_front_addr":"172.21.15.3:6823/3872854968","state":["exists","up"]}],"osd_xinfo":[{"osd":0,"down_stamp":"0.000000","laggy_probability":0,"laggy_interval":0,"features":4540138292837744639,"old_weight":0,"last_purged_snaps_scrub":"2020-05-08T07:52:05.414664+0000","dead_epoch":0},{"osd":1,"down_stamp":"0.000000","laggy_probability":0,"laggy_interval":0,"features":4540138292837744639,"old_weight":0,"last_purged_snaps_scrub":"2020-05-08T07:52:22.372890+0000","dead_epoch":0},{"osd":2,"down_stamp":"0.000000","laggy_probability":0,"laggy_interval":0,"features":4540138292837744639,"old_weight":0,"last_purged_snaps_scrub":"2020-05-08T07:52:39.696420+0000","dead_epoch":0}],"pg_upmap":[],"pg_upmap_items":[],"pg_temp":[],"primary_temp":[],"blacklist":{"172.21.15.3:0/4218317686":"2020-05-09T07:51:31.456074+0000","172.21.15.3:6801/747294320":"2020-05-09T07:51:31.456074+0000","172.21.15.3:6800/747294320":"2020-05-09T07:51:31.456074+0000","172.21.15.3:0/343613801":"2020-05-09T07:51:31.456074+0000","172.21.15.3:0/321731460":"2020-05-09T07:51:17.451065+0000","172.21.15.3:0/1524061237":"2020-05-09T07:51:17.451065+0000","172.21.15.3:6800/570309461":"2020-05-09T07:51:17.451065+0000","172.21.15.3:6801/570309461":"2020-05-09T07:51:17.451065+0000","172.21.15.3:0/1326871292":"2020-05-09T07:51:31.456074+0000","172.21.15.3:0/2357622241":"2020-05-09T07:51:17.451065+0000"},"erasure_code_profiles":{"default":{"crush-failure-domain":"osd","k":"2","m":"1","plugin":"jerasure","ruleset-failure-domain":"osd","technique":"reed_sol_van"}},"removed_snaps_queue":[],"new_removed_snaps":[],"new_purged_snaps":[],"crush_node_flags":{},"device_class_flags":{}}
ceph.mon.a.smithi003.stdout:May 08 07:53:16 smithi003 bash[25807]: audit 2020-05-08T07:53:15.033546+0000 mgr.a (mgr.14142) 59 : audit [DBG] from='client.14224 -' entity='client.admin' cmd=[{"prefix": "orch apply mds", "fs_name": "1", "target": ["mon-mgr", ""]}]: dispatch
ceph.mon.a.smithi003.stdout:May 08 07:53:16 smithi003 bash[25807]: cephadm 2020-05-08T07:53:15.033991+0000 mgr.a (mgr.14142) 60 : cephadm [INF] Saving service mds.1 spec with placement count:2
ceph.mon.a.smithi003.stdout:May 08 07:53:16 smithi003 bash[25807]: cephadm 2020-05-08T07:53:15.043747+0000 mgr.a (mgr.14142) 61 : cephadm [INF] Deploying daemon mds.1.smithi003.pehqbc on smithi003
ceph.mon.a.smithi003.stdout:May 08 07:53:16 smithi003 bash[25807]: cluster 2020-05-08T07:53:15.527635+0000 mgr.a (mgr.14142) 62 : cluster [DBG] pgmap v62: 1 pgs: 1 active+clean; 0 B data, 576 KiB used, 265 GiB / 268 GiB avail
ceph.mon.a.smithi003.stdout:May 08 07:53:16 smithi003 bash[25807]: audit 2020-05-08T07:53:15.900953+0000 mon.a (mon.0) 247 : audit [DBG] from='client.? 172.21.15.3:0/3739932791' entity='client.admin' cmd=[{"prefix": "osd dump", "format": "json"}]: dispatch
ceph.mon.a.smithi003.stdout:May 08 07:53:16 smithi003 bash[25807]: audit 2020-05-08T07:53:16.252179+0000 mon.a (mon.0) 248 : audit [DBG] from='client.? 172.21.15.3:0/2236254010' entity='client.admin' cmd=[{"prefix": "osd pool get", "pool": "device_health_metrics", "var": "pg_num"}]: dispatch
tasks.cephfs.filesystem.ceph_manager:[{'pool': 1, 'pool_name': 'device_health_metrics', 'create_time': '2020-05-08T07:51:08.676406+0000', 'flags': 1, 'flags_names': 'hashpspool', 'type': 1, 'size': 3, 'min_size': 2, 'crush_rule': 0, 'object_hash': 2, 'pg_autoscale_mode': 'off', 'pg_num': 1, 'pg_placement_num': 1, 'pg_placement_num_target': 1, 'pg_num_target': 1, 'pg_num_pending': 1, 'last_pg_merge_meta': {'source_pgid': '0.0', 'ready_epoch': 0, 'last_epoch_started': 0, 'last_epoch_clean': 0, 'source_version': "0'0", 'target_version': "0'0"}, 'last_change': '12', 'last_force_op_resend': '0', 'last_force_op_resend_prenautilus': '0', 'last_force_op_resend_preluminous': '0', 'auid': 0, 'snap_mode': 'selfmanaged', 'snap_seq': 0, 'snap_epoch': 0, 'pool_snaps': [], 'removed_snaps': '[]', 'quota_max_bytes': 0, 'quota_max_objects': 0, 'tiers': [], 'tier_of': -1, 'read_tier': -1, 'write_tier': -1, 'cache_mode': 'none', 'target_max_bytes': 0, 'target_max_objects': 0, 'cache_target_dirty_ratio_micro': 400000, 'cache_target_dirty_high_ratio_micro': 600000, 'cache_target_full_ratio_micro': 800000, 'cache_min_flush_age': 0, 'cache_min_evict_age': 0, 'erasure_code_profile': '', 'hit_set_params': {'type': 'none'}, 'hit_set_period': 0, 'hit_set_count': 0, 'use_gmt_hitset': True, 'min_read_recency_for_promote': 0, 'min_write_recency_for_promote': 0, 'hit_set_grade_decay_rate': 0, 'hit_set_search_last_n': 0, 'grade_table': [], 'stripe_width': 0, 'expected_num_objects': 0, 'fast_read': False, 'options': {'pg_num_min': 1}, 'application_metadata': {'mgr_devicehealth': {}}}]
teuthology.orchestra.run.smithi003:> true
teuthology.orchestra.run.smithi003:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early osd pool get device_health_metrics pg_num
teuthology.orchestra.run.smithi003.stdout:pg_num: 1
tasks.cephadm.mgr.a:Stopping mgr.a...

Do you have any idea why that happens?

Actions #3

Updated by Varsha Rao almost 4 years ago

mgr log

2020-05-08T07:53:16.835+0000 7f4114862700 10 mgr.server ms_handle_authentication  session 0x55eba3d80120 client.admin has caps allow * 'allow *'
2020-05-08T07:53:16.835+0000 7f4114862700  1 --2- [v2:172.21.15.3:6800/1589529919,v1:172.21.15.3:6801/1589529919] >>  conn(0x55eba943f800 0x55eba9686400 secure :-1 s=AUTH_ACCEPTING_SIGN pgs=0 cs=0 l=1 rx=0x55eba4232a50 tx=0x55eba99af6d0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-05-08T07:53:16.836+0000 7f4114862700  1 --2- [v2:172.21.15.3:6800/1589529919,v1:172.21.15.3:6801/1589529919] >>  conn(0x55eba943f800 0x55eba9686400 secure :-1 s=SESSION_ACCEPTING pgs=0 cs=0 l=1 rx=0x55eba4232a50 tx=0x55eba99af6d0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-05-08T07:53:16.836+0000 7f4114862700  1 --2- [v2:172.21.15.3:6800/1589529919,v1:172.21.15.3:6801/1589529919] >> 172.21.15.3:0/1228161486 conn(0x55eba943f800 0x55eba9686400 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55eba4232a50 tx=0x55eba99af6d0).ready entity=client.14232 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2020-05-08T07:53:17.005+0000 7f4114862700  1 -- [v2:172.21.15.3:6800/1589529919,v1:172.21.15.3:6801/1589529919] >> 172.21.15.3:0/1228161486 conn(0x55eba943f800 msgr2=0x55eba9686400 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 54
2020-05-08T07:53:17.005+0000 7f4114862700  1 -- [v2:172.21.15.3:6800/1589529919,v1:172.21.15.3:6801/1589529919] >> 172.21.15.3:0/1228161486 conn(0x55eba943f800 msgr2=0x55eba9686400 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2020-05-08T07:53:17.005+0000 7f4114862700  1 --2- [v2:172.21.15.3:6800/1589529919,v1:172.21.15.3:6801/1589529919] >> 172.21.15.3:0/1228161486 conn(0x55eba943f800 0x55eba9686400 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55eba4232a50 tx=0x55eba99af6d0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2020-05-08T07:53:17.005+0000 7f4114862700  1 --2- [v2:172.21.15.3:6800/1589529919,v1:172.21.15.3:6801/1589529919] >> 172.21.15.3:0/1228161486 conn(0x55eba943f800 0x55eba9686400 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55eba4232a50 tx=0x55eba99af6d0).stop
2020-05-08T07:53:17.005+0000 7f4114862700  1 -- [v2:172.21.15.3:6800/1589529919,v1:172.21.15.3:6801/1589529919] reap_dead start
2020-05-08T07:53:23.577+0000 7f1e3a35b040  1 -- 172.21.15.3:0/435947524 shutdown_connections 
2020-05-08T07:53:23.577+0000 7f1e3a35b040  1 --2- 172.21.15.3:0/435947524 >> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] conn(0x55fcdbb4cc00 0x55fcdbb6ba80 unknown :-1 s=CLOSED pgs=154 cs=0 l=1 rx=0 tx=0).stop
2020-05-08T07:53:23.577+0000 7f1e3a35b040  1 -- 172.21.15.3:0/435947524 >> 172.21.15.3:0/435947524 conn(0x55fcdbb4c800 msgr2=0x55fcdbc24800 unknown :-1 s=STATE_NONE l=0).mark_down
2020-05-08T07:53:23.577+0000 7f1e3a35b040  1 -- 172.21.15.3:0/435947524 shutdown_connections 
2020-05-08T07:53:23.577+0000 7f1e3a35b040  1 -- 172.21.15.3:0/435947524 wait complete.
2020-05-08T07:53:23.578+0000 7f1e3a35b040  0 set uid:gid to 167:167 (ceph:ceph)
2020-05-08T07:53:23.578+0000 7f1e3a35b040  0 ceph version 16.0.0-1393-g5d96f0c9612 (5d96f0c9612029b065cea7c34cd161b174878f8c) pacific (dev), process ceph-mgr, pid 1
2020-05-08T07:53:23.578+0000 7f1e3a35b040  0 pidfile_write: ignore empty --pid-file
2020-05-08T07:53:23.578+0000 7f1e3a35b040  1  Processor -- start
2020-05-08T07:53:23.578+0000 7f1e3a35b040  1 --  start start
2020-05-08T07:53:23.578+0000 7f1e3a35b040  4 mgr init Registered monc callback
2020-05-08T07:53:23.579+0000 7f1e3a35b040  1 --2-  >> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] conn(0x55fcdbb4cc00 0x55fcdc84e000 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).connect
2020-05-08T07:53:23.579+0000 7f1e3a35b040  1 --  --> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] -- mon_getmap magic: 0 v1 -- 0x55fcdbbaedc0 con 0x55fcdbb4cc00
2020-05-08T07:53:23.579+0000 7f1e2b88d700  1 --2-  >> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] conn(0x55fcdbb4cc00 0x55fcdc84e000 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-05-08T07:53:23.579+0000 7f1e2b88d700  1 --2-  >> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] conn(0x55fcdbb4cc00 0x55fcdc84e000 unknown :-1 s=HELLO_CONNECTING pgs=0 cs=0 l=0 rx=0 tx=0).handle_hello peer v2:172.21.15.3:3300/0 says I am v2:172.21.15.3:37882/0 (socket says 172.21.15.3:37882)
2020-05-08T07:53:23.579+0000 7f1e2b88d700  1 -- 172.21.15.3:0/57223522 learned_addr learned my addr 172.21.15.3:0/57223522 (peer_addr_for_me v2:172.21.15.3:0/0)
2020-05-08T07:53:23.579+0000 7f1e2b88d700  1 -- 172.21.15.3:0/57223522 --> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] -- mon_subscribe({config=0+,mgrmap=0+,monmap=0+}) v3 -- 0x55fcdbb76820 con 0x55fcdbb4cc00
2020-05-08T07:53:23.579+0000 7f1e2b88d700  1 --2- 172.21.15.3:0/57223522 >> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] conn(0x55fcdbb4cc00 0x55fcdc84e000 secure :-1 s=AUTH_CONNECTING_SIGN pgs=0 cs=0 l=0 rx=0x55fcdbb8c000 tx=0x55fcdbba2eb0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-05-08T07:53:23.579+0000 7f1e2b88d700  1 --2- 172.21.15.3:0/57223522 >> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] conn(0x55fcdbb4cc00 0x55fcdc84e000 secure :-1 s=SESSION_CONNECTING pgs=0 cs=0 l=0 rx=0x55fcdbb8c000 tx=0x55fcdbba2eb0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-05-08T07:53:23.579+0000 7f1e2b88d700  1 --2- 172.21.15.3:0/57223522 >> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] conn(0x55fcdbb4cc00 0x55fcdc84e000 secure :-1 s=READY pgs=155 cs=0 l=1 rx=0x55fcdbb8c000 tx=0x55fcdbba2eb0).ready entity=mon.0 client_cookie=f93b18c2c8c4c327 server_cookie=0 in_seq=0 out_seq=0
2020-05-08T07:53:23.579+0000 7f1e2b88d700  1 --2- 172.21.15.3:0/57223522 >> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] conn(0x55fcdbb4cc00 0x55fcdc84e000 secure :-1 s=THROTTLE_DONE pgs=155 cs=0 l=1 rx=0x55fcdbb8c000 tx=0x55fcdbba2eb0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-05-08T07:53:23.579+0000 7f1e27885700  1 -- 172.21.15.3:0/57223522 <== mon.0 v2:172.21.15.3:3300/0 1 ==== mon_map magic: 0 v1 ==== 183+0+0 (secure 0 0 0) 0x55fcdbb962c0 con 0x55fcdbb4cc00
2020-05-08T07:53:23.579+0000 7f1e27885700  1 -- 172.21.15.3:0/57223522 --> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] -- auth(proto 2 2 bytes epoch 0) v1 -- 0x55fcdbb95500 con 0x55fcdbb4cc00
2020-05-08T07:53:23.579+0000 7f1e2b88d700  1 --2- 172.21.15.3:0/57223522 >> [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] conn(0x55fcdbb4cc00 0x55fcdc84e000 secure :-1 s=THROTTLE_DONE pgs=155 cs=0 l=1 rx=0x55fcdbb8c000 tx=0x55fcdbba2eb0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-05-08T07:53:23.579+0000 7f1e27885700  1 -- 172.21.15.3:0/57223522 <== mon.0 v2:172.21.15.3:3300/0 2 ==== config(25 keys) v1 ==== 1066+0+0 (secure 0 0 0) 0x55fcdbb94900 con 0x55fcdbb4cc00
2020-05-08T07:53:23.579+0000 7f1e26082700 10 mgr operator() config_callback: auth_debug : true
2020-05-08T07:53:23.579+0000 7f1e26082700 10 mgr operator() config_callback: container_image : quay.io/ceph-ci/ceph:5d96f0c9612029b065cea7c34cd161b174878f8c
2020-05-08T07:53:23.579+0000 7f1e26082700 10 mgr operator() config_callback: debug_asserts_on_shutdown : true

mds log

2020-05-08T07:53:17.369+0000 7f11bf1fe5c0  0 set uid:gid to 167:167 (ceph:ceph)
2020-05-08T07:53:17.369+0000 7f11bf1fe5c0  0 ceph version 16.0.0-1393-g5d96f0c9612 (5d96f0c9612029b065cea7c34cd161b174878f8c) pacific (dev), process ceph-mds, pid 1
2020-05-08T07:53:17.369+0000 7f11bf1fe5c0  1 main not setting numa affinity
2020-05-08T07:53:17.370+0000 7f11bf1fe5c0 -1 MDS id 'mds.1.smithi003.pehqbc' is invalid. MDS names may not start with a numeric digit.
2020-05-08T07:53:29.146+0000 7fcf012cb5c0  0 set uid:gid to 167:167 (ceph:ceph)
2020-05-08T07:53:29.146+0000 7fcf012cb5c0  0 ceph version 16.0.0-1393-g5d96f0c9612 (5d96f0c9612029b065cea7c34cd161b174878f8c) pacific (dev), process ceph-mds, pid 1
2020-05-08T07:53:29.146+0000 7fcf012cb5c0  1 main not setting numa affinity
2020-05-08T07:53:29.146+0000 7fcf012cb5c0 -1 MDS id 'mds.1.smithi003.pehqbc' is invalid. MDS names may not start with a numeric digit.

The mds log is similar to the mds log of the passing test.

I don't know the reason, but the mgr gets killed and restarted after deploying the mds here.
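As an aside on the mds log above, a purely illustrative toy check (the helper below is not from the ceph source; it only mirrors the naming visible in the log): the daemon id is formed as "<service_id>.<host>.<random suffix>", so the fs/service id "1" yields "1.smithi003.pehqbc", which the MDS rejects because MDS names may not start with a digit.

def mds_daemon_name_ok(service_id, host, suffix):
    """Toy check mirroring the mds complaint above: names may not start with a digit."""
    daemon_id = "{}.{}.{}".format(service_id, host, suffix)
    return not daemon_id[0].isdigit()

assert not mds_daemon_name_ok("1", "smithi003", "pehqbc")  # the rejected name above
assert mds_daemon_name_ok("a", "smithi003", "pehqbc")      # a non-numeric id is accepted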

Actions #5

Updated by Varsha Rao almost 4 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Varsha Rao
  • Pull request ID set to 35198
Actions #6

Updated by Kefu Chai almost 4 years ago

  • Status changed from Fix Under Review to Resolved