Project

General

Profile

Actions

Bug #20705

closed

repair_test fails due to race with osd start

Added by Sage Weil almost 7 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-07-20T06:42:07.295 INFO:tasks.repair_test:starting repair test for erasure code
2017-07-20T06:42:07.295 INFO:tasks.repair_test:doing put
2017-07-20T06:42:07.296 INFO:teuthology.orchestra.run.smithi195:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage rados --cluster ceph -p repair_pool_3 put repair_test_obj /etc/hosts'
2017-07-20T06:42:08.021 INFO:tasks.repair_test:corrupting object
2017-07-20T06:42:08.021 INFO:tasks.repair_test:remove the hinfo_key
2017-07-20T06:42:08.021 INFO:teuthology.orchestra.run.smithi195:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --format=json osd map repair_pool_3 repair_test_obj'
2017-07-20T06:42:08.246 INFO:teuthology.orchestra.run.smithi195.stdout:
2017-07-20T06:42:08.246 INFO:teuthology.orchestra.run.smithi195.stdout:{"epoch":57,"pool":"repair_pool_3","pool_id":10,"objname":"repair_test_obj","raw_pgid":"10.3df68405","pgid":"10.0","up":[5,4,0],"up_primary":5,"acting":[5,4,0],"acting_primary":5}
2017-07-20T06:42:08.247 INFO:teuthology.orchestra.run.smithi195:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd dump --format=json'
2017-07-20T06:42:08.485 INFO:teuthology.orchestra.run.smithi195.stdout:
2017-07-20T06:42:08.486 INFO:teuthology.orchestra.run.smithi195.stdout:{"epoch":57,"fsid":"34d091ce-d14b-42d2-b01b-2c77020a2c2b","created":"2017-07-20 06:34:06.336079","modified":"2017-07-20 06:42:07.200393","flags":"noscrub,nodeep-scrub,sortbitwise","crush_version":1,"full_ratio":0.000000,"backfillfull_ratio":0.000000,"nearfull_ratio":0.000000,"cluster_snapshot":"","pool_max":10,"max_osd":6,"require_min_compat_client":"unknown","min_compat_client":"jewel","require_osd_release":"kraken","pools":[{"pool":1,"pool_name":"rbd","flags":1,"flags_names":"hashpspool","type":1,"size":2,"min_size":1,"crush_rule":0,"object_hash":2,"pg_num":8,"pg_placement_num":8,"crash_replay_interval":0,"last_change":"12","last_force_op_resend":"0","last_force_op_resend_preluminous":"0","auid":0,"snap_mode":"selfmanaged","snap_seq":0,"snap_epoch":0,"pool_snaps":[],"removed_snaps":"[]","quota_max_bytes":0,"quota_max_objects":0,"tiers":[],"tier_of":-1,"read_tier":-1,"write_tier":-1,"cache_mode":"none","target_max_bytes":0,"target_max_objects":0,"cache_target_dirty_ratio_micro":400000,"cache_target_dirty_high_ratio_micro":600000,"cache_target_full_ratio_micro":800000,"cache_min_flush_age":0,"cache_min_evict_age":0,"erasure_code_profile":"","hit_set_params":{"type":"none"},"hit_set_period":0,"hit_set_count":0,"use_gmt_hitset":true,"min_read_recency_for_promote":0,"min_write_recency_for_promote":0,"hit_set_grade_decay_rate":0,"hit_set_search_last_n":0,"grade_table":[],"stripe_width":0,"expected_num_objects":0,"fast_read":false,"options":{},"application_metadata":{}},{"pool":10,"pool_name":"repair_pool_3","flags":1,"flags_names":"hashpspool","type":3,"size":3,"min_size":3,"crush_rule":1,"object_hash":2,"pg_num":1,"pg_placement_num":1,"crash_replay_interval":0,"last_change":"56","last_force_op_resend":"0","last_force_op_resend_preluminous":"0","auid":0,"snap_mode":"selfmanaged","snap_seq":0,"snap_epoch":0,"pool_snaps":[],"removed_snaps":"[]","quota_max_bytes":0,"quota_max_objects":0,"tiers":[],"tier_of":-1,"read_tier":-1,"write_tier":-1,"cache_mode":"none","target_max_bytes":0,"target_max_objects":0,"cache_target_dirty_ratio_micro":400000,"cache_target_dirty_high_ratio_micro":600000,"cache_target_full_ratio_micro":800000,"cache_min_flush_age":0,"cache_min_evict_age":0,"erasure_code_profile":"default","hit_set_params":{"type":"none"},"hit_set_period":0,"hit_set_count":0,"use_gmt_hitset":true,"min_read_recency_for_promote":0,"min_write_recency_for_promote":0,"hit_set_grade_decay_rate":0,"hit_set_search_last_n":0,"grade_table":[],"stripe_width":8192,"expected_num_objects":0,"fast_read":false,"options":{},"application_metadata":{}}],"osds":[{"osd":0,"uuid":"72fcebcc-b944-45b3-8c6f-eace2d3c688f","up":1,"in":1,"weight":1.000000,"primary_affinity":1.000000,"last_clean_begin":0,"last_clean_end":0,"up_from":9,"up_thru":36,"down_at":0,"lost_at":0,"public_addr":"172.21.15.195:6800/1132005","cluster_addr":"172.21.15.195:6801/1132005","heartbeat_back_addr":"172.21.15.195:6802/1132005","heartbeat_front_addr":"172.21.15.195:6803/1132005","state":["exists","up"]},{"osd":1,"uuid":"1bd4321d-1121-4d0c-b732-4aab7a2ce9a0","up":1,"in":1,"weight":1.000000,"primary_affinity":1.000000,"last_clean_begin":0,"last_clean_end":0,"up_from":10,"up_thru":50,"down_at":0,"lost_at":0,"public_addr":"172.21.15.195:6804/1132157","cluster_addr":"172.21.15.195:6805/1132157","heartbeat_back_addr":"172.21.15.195:6806/1132157","heartbeat_front_addr":"172.21.15.195:6807/1132157","state":["exists","up"]},{"osd":2,"uuid":"b72c6b53-be32-4cb8-b926-63174e76c407","up":1,"in":1,"weight":1.000000,"primary_affinity":1.000000,"last_clean_begin":0,"last_clean_end":0,"up_from":11,"up_thru":12,"down_at":0,"lost_at":0,"public_addr":"172.21.15.195:6808/1132318","cluster_addr":"172.21.15.195:6809/1132318","heartbeat_back_addr":"172.21.15.195:6810/1132318","heartbeat_front_addr":"172.21.15.195:6811/1132318","state":["exists","up"]},{"osd":3,"uuid":"e52ef3bb-a277-4956-aa82-640c19ded198","up":1,"in":1,"weight":1.000000,"primary_affinity":1.000000,"last_clean_begin":0,"last_clean_end":0,"up_from":5,"up_thru":44,"down_at":0,"lost_at":0,"public_addr":"172.21.15.111:6801/18098","cluster_addr":"172.21.15.111:6802/18098","heartbeat_back_addr":"172.21.15.111:6803/18098","heartbeat_front_addr":"172.21.15.111:6804/18098","state":["exists","up"]},{"osd":4,"uuid":"8b0944b6-aefa-4a7d-8ce2-9bdb945c59b7","up":1,"in":1,"weight":1.000000,"primary_affinity":1.000000,"last_clean_begin":0,"last_clean_end":0,"up_from":6,"up_thru":28,"down_at":0,"lost_at":0,"public_addr":"172.21.15.111:6805/18262","cluster_addr":"172.21.15.111:6806/18262","heartbeat_back_addr":"172.21.15.111:6807/18262","heartbeat_front_addr":"172.21.15.111:6808/18262","state":["exists","up"]},{"osd":5,"uuid":"23f7cba1-8118-4045-965a-5a206530681e","up":1,"in":1,"weight":1.000000,"primary_affinity":1.000000,"last_clean_begin":0,"last_clean_end":0,"up_from":8,"up_thru":55,"down_at":0,"lost_at":0,"public_addr":"172.21.15.111:6809/18432","cluster_addr":"172.21.15.111:6810/18432","heartbeat_back_addr":"172.21.15.111:6811/18432","heartbeat_front_addr":"172.21.15.111:6812/18432","state":["exists","up"]}],"osd_xinfo":[{"osd":0,"down_stamp":"0.000000","laggy_probability":0.000000,"laggy_interval":0,"features":2305244844532236283,"old_weight":0},{"osd":1,"down_stamp":"0.000000","laggy_probability":0.000000,"laggy_interval":0,"features":2305244844532236283,"old_weight":0},{"osd":2,"down_stamp":"0.000000","laggy_probability":0.000000,"laggy_interval":0,"features":2305244844532236283,"old_weight":0},{"osd":3,"down_stamp":"0.000000","laggy_probability":0.000000,"laggy_interval":0,"features":2305244844532236283,"old_weight":0},{"osd":4,"down_stamp":"0.000000","laggy_probability":0.000000,"laggy_interval":0,"features":2305244844532236283,"old_weight":0},{"osd":5,"down_stamp":"0.000000","laggy_probability":0.000000,"laggy_interval":0,"features":2305244844532236283,"old_weight":0}],"pg_upmap":[],"pg_upmap_items":[],"pg_temp":[],"primary_temp":[],"blacklist":{},"erasure_code_profiles":{"default":{"crush-failure-domain":"osd","k":"2","m":"1","plugin":"jerasure","ruleset-failure-domain":"osd","technique":"reed_sol_van"}}}
2017-07-20T06:42:08.486 INFO:teuthology.orchestra.run.smithi195:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --format=json osd map repair_pool_3 repair_test_obj'
2017-07-20T06:42:08.644 INFO:teuthology.orchestra.run.smithi195.stdout:
2017-07-20T06:42:08.644 INFO:teuthology.orchestra.run.smithi195.stdout:{"epoch":57,"pool":"repair_pool_3","pool_id":10,"objname":"repair_test_obj","raw_pgid":"10.3df68405","pgid":"10.0","up":[5,4,0],"up_primary":5,"acting":[5,4,0],"acting_primary":5}
2017-07-20T06:42:08.644 DEBUG:tasks.ceph.osd.5:waiting for process to exit
2017-07-20T06:42:08.645 INFO:teuthology.orchestra.run:waiting for 300
2017-07-20T06:42:08.658 INFO:tasks.ceph.osd.5:Stopped
2017-07-20T06:42:08.659 INFO:tasks.ceph.ceph_manager.ceph:object=$(sudo adjust-ulimits ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --journal-path /var/lib/ceph/osd/ceph-5/journal --pgid 10.0s0 --op list |grep '"oid":"repair_test_obj"')
sudo adjust-ulimits ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --journal-path /var/lib/ceph/osd/ceph-5/journal  --pgid 10.0s0 "$object" rm-attr hinfo_key
2017-07-20T06:42:08.659 INFO:teuthology.orchestra.run.smithi111:Running: 'bash -e -x -c \'object=$(sudo adjust-ulimits ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --journal-path /var/lib/ceph/osd/ceph-5/journal --pgid 10.0s0 --op list |grep \'"\'"\'"oid":"repair_test_obj"\'"\'"\')\nsudo adjust-ulimits ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --journal-path /var/lib/ceph/osd/ceph-5/journal  --pgid 10.0s0 "$object" rm-attr hinfo_key\''
2017-07-20T06:42:08.825 INFO:teuthology.orchestra.run.smithi111.stderr:++ sudo adjust-ulimits ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --journal-path /var/lib/ceph/osd/ceph-5/journal --pgid 10.0s0 --op list
2017-07-20T06:42:08.827 INFO:teuthology.orchestra.run.smithi111.stderr:++ grep '"oid":"repair_test_obj"'
2017-07-20T06:42:08.827 INFO:teuthology.orchestra.run.smithi111.stderr:+ object='["10.0s0",{"oid":"repair_test_obj","key":"","snapid":-2,"hash":1039565829,"max":0,"pool":10,"namespace":"","shard_id":0,"max":0}]'
2017-07-20T06:42:08.827 INFO:teuthology.orchestra.run.smithi111.stderr:+ sudo adjust-ulimits ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --journal-path /var/lib/ceph/osd/ceph-5/journal --pgid 10.0s0 '["10.0s0",{"oid":"repair_test_obj","key":"","snapid":-2,"hash":1039565829,"max":0,"pool":10,"namespace":"","shard_id":0,"max":0}]' rm-attr hinfo_key
2017-07-20T06:42:08.827 INFO:tasks.ceph.osd.5:Restarting daemon
2017-07-20T06:42:08.827 INFO:teuthology.orchestra.run.smithi111:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f --cluster ceph -i 5'
2017-07-20T06:42:08.894 INFO:tasks.ceph.osd.5:Started
2017-07-20T06:42:08.894 INFO:teuthology.orchestra.run.smithi111:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 0 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok dump_ops_in_flight'
2017-07-20T06:42:08.936 INFO:tasks.ceph.osd.5.smithi111.stdout:starting osd.5 at - osd_data /var/lib/ceph/osd/ceph-5 /var/lib/ceph/osd/ceph-5/journal
2017-07-20T06:42:08.953 INFO:tasks.ceph.osd.5.smithi111.stderr:2017-07-20 06:42:08.958566 7faf5d877c80 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-07-20T06:42:08.962 INFO:tasks.ceph.osd.5.smithi111.stderr:2017-07-20 06:42:08.967541 7faf5d877c80 -1 osd.5 57 log_to_monitors {default=true}
2017-07-20T06:42:08.965 INFO:tasks.ceph.osd.5.smithi111.stderr:2017-07-20 06:42:08.970053 7faf5d877c80 -1 osd.5 57 mon_cmd_maybe_osd_create fail: 'you must complete the upgrade and 'ceph osd require-osd-release luminous' before using crush device classes': (1) Operation not permitted
2017-07-20T06:42:08.994 INFO:tasks.repair_test:scrubbing
...
2017-07-20T06:42:10.633 INFO:teuthology.orchestra.run.smithi195.stderr:Error EAGAIN: pg 10.0 primary osd.5 not up
2017-07-20T06:42:10.645 ERROR:teuthology.run_tasks:Saw exception from tasks.

/a/sage-2017-07-20_05:37:32-rados-wip-sage-testing2-distro-basic-smithi/1424147

Related issues 1 (0 open1 closed)

Copied to RADOS - Backport #22818: jewel: repair_test fails due to race with osd startResolvedNathan CutlerActions
Actions #1

Updated by Sage Weil almost 7 years ago

  • Status changed from 12 to Fix Under Review
Actions #2

Updated by Kefu Chai almost 7 years ago

  • Status changed from Fix Under Review to Resolved
  • Assignee set to Sage Weil
Actions #3

Updated by David Zafman about 6 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to jewel
Actions #4

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #22818: jewel: repair_test fails due to race with osd start added
Actions #5

Updated by Nathan Cutler about 6 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF