Bug #55407 (closed)

quincy OSDs fail to boot and crash

Added by Gonzalo Aguilar Delgado almost 2 years ago. Updated almost 2 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: Security
Target version: -
% Done: 0%
Source: Community (user)
Tags: -
Backport: -
Regression: No
Severity: 1 - critical
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(RADOS): OSD
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

I have a cluster with pacific. One of the OSDs started to crash...

So I zapped the disk and recreated it. I found that it no longer boots... It crashes the same way the other OSD was crashing...

epoch 31
fsid 9028f4da-0d77-462b-be9b-dbdf7fa57771
last_changed 2021-11-01T21:39:34.621391+0000
created 2014-04-19T08:45:51.315721+0000
min_mon_release 16 (pacific)
election_strategy: 1
0: [v2:172.16.0.100:3300/0,v1:172.16.0.100:6789/0] mon.red-compute
1: [v2:172.16.0.119:3300/0,v1:172.16.0.119:6789/0] mon.blue-compute
2: [v2:172.16.99.10:3300/0,v1:172.16.99.10:6789/0] mon.cadet-compute

-161> 2022-04-21T20:15:36.649+0000 7f184dff3640 10 monclient: _finish_auth 0
-160> 2022-04-21T20:15:36.649+0000 7f184dff3640 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2022-04-21T20:15:06.652659+0000)
-159> 2022-04-21T20:15:36.649+0000 7f184dff3640 10 monclient: _send_mon_message to mon.cadet-compute at v2:172.16.99.10:3300/0
-158> 2022-04-21T20:15:36.649+0000 7f186e6f05c0 5 monclient: authenticate success, global_id 39712085
-157> 2022-04-21T20:15:36.649+0000 7f186e6f05c0 10 monclient: wait_auth_rotating waiting for 30
-156> 2022-04-21T20:15:36.649+0000 7f184dff3640 10 monclient: handle_config config(12 keys) v1
-155> 2022-04-21T20:15:36.649+0000 7f184dff3640 10 monclient: handle_monmap mon_map magic: 0 v1
-154> 2022-04-21T20:15:36.649+0000 7f184dff3640 10 monclient: got monmap 31 from mon.cadet-compute (according to old e31)
-153> 2022-04-21T20:15:36.649+0000 7f186a966640 4 set_mon_vals no callback set
-152> 2022-04-21T20:15:36.649+0000 7f184dff3640 10 monclient: dump:
epoch 31
fsid 9028f4da-0d77-462b-be9b-dbdf7fa57771
last_changed 2021-11-01T21:39:34.621391+0000
created 2014-04-19T08:45:51.315721+0000
min_mon_release 16 (pacific)
election_strategy: 1
0: [v2:172.16.0.100:3300/0,v1:172.16.0.100:6789/0] mon.red-compute
1: [v2:172.16.0.119:3300/0,v1:172.16.0.119:6789/0] mon.blue-compute
2: [v2:172.16.99.10:3300/0,v1:172.16.99.10:6789/0] mon.cadet-compute
-151> 2022-04-21T20:15:36.649+0000 7f184dff3640 10 monclient: _finish_auth 0
-150> 2022-04-21T20:15:36.649+0000 7f184dff3640 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-04-21T20:15:06.653651+0000)
-149> 2022-04-21T20:15:36.649+0000 7f186e6f05c0 10 monclient: wait_auth_rotating done
-148> 2022-04-21T20:15:36.777+0000 7f186e6f05c0 10 start_mon_command cmd=[{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["3"]}]
-147> 2022-04-21T20:15:36.777+0000 7f186e6f05c0 10 monclient: _send_command 1 [{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["3"]}]
-146> 2022-04-21T20:15:36.777+0000 7f186e6f05c0 10 monclient: _send_mon_message to mon.cadet-compute at v2:172.16.99.10:3300/0
-145> 2022-04-21T20:15:39.545+0000 7f18217fa640 5 osd.3 1655897 heartbeat osd_stat(store_statfs(0xe8359f7000/0x0/0xe8e0db6000, data 0x7272479d/0xa542d000, compress 0x0/0x0/0x0, omap 0x4d3, meta 0x5f8fb2d), peers [] op hist [])
-144> 2022-04-21T20:15:39.645+0000 7f184cff1640 10 monclient: tick
-143> 2022-04-21T20:15:39.645+0000 7f184cff1640 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-04-21T20:15:09.647002+0000)
-142> 2022-04-21T20:15:39.725+0000 7f184dff3640 10 monclient: handle_mon_command_ack 1 [{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["3"]}]
-141> 2022-04-21T20:15:39.725+0000 7f184dff3640 10 monclient: _finish_command 1 = system:0 osd.3 already set to class hdd. set-device-class item id 3 name 'osd.3' device_class 'hdd': no change. set osd(s) to class 'hdd'
-140> 2022-04-21T20:15:39.725+0000 7f186e6f05c0 10 start_mon_command cmd=[{"prefix": "osd crush create-or-move", "id": 3, "weight":0.9097, "args": ["host=red-compute", "root=default"]}]
-139> 2022-04-21T20:15:39.725+0000 7f186e6f05c0 10 monclient: _send_command 2 [{"prefix": "osd crush create-or-move", "id": 3, "weight":0.9097, "args": ["host=red-compute", "root=default"]}]
-138> 2022-04-21T20:15:39.725+0000 7f186e6f05c0 10 monclient: _send_mon_message to mon.cadet-compute at v2:172.16.99.10:3300/0
-137> 2022-04-21T20:15:39.837+0000 7f184dff3640 10 monclient: handle_mon_command_ack 2 [{"prefix": "osd crush create-or-move", "id": 3, "weight":0.9097, "args": ["host=red-compute", "root=default"]}]
-136> 2022-04-21T20:15:39.837+0000 7f184dff3640 10 monclient: _finish_command 2 = system:0 create-or-move updated item name 'osd.3' weight 0.9097 at location {host=red-compute,root=default} to crush map
-135> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 0 osd.3 1655897 done with init, starting boot process
-134> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 10 monclient: _renew_subs
-133> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 10 monclient: _send_mon_message to mon.cadet-compute at v2:172.16.99.10:3300/0
-132> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 1 osd.3 1655897 start_boot
-131> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 10 monclient: _send_mon_message to mon.cadet-compute at v2:172.16.99.10:3300/0
-130> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 1 osd.3 1655897 maybe_override_max_osd_capacity_for_qos default_iops: 315.00 cur_iops: 227.69. Skip OSD benchmark test.
-129> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 1 osd.3 1655897 maybe_override_options_for_qos: Changing recovery/backfill/sleep settings for QoS
-128> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command status hook 0x5599e294be30
-127> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command flush_journal hook 0x5599e294be30
-126> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_ops_in_flight hook 0x5599e294be30
-125> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command ops hook 0x5599e294be30
-124> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_blocked_ops hook 0x5599e294be30
-123> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_historic_ops hook 0x5599e294be30
-122> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_historic_slow_ops hook 0x5599e294be30
-121> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_historic_ops_by_duration hook 0x5599e294be30
-120> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_op_pq_state hook 0x5599e294be30
-119> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_blocklist hook 0x5599e294be30
-118> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_watchers hook 0x5599e294be30
-117> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_recovery_reservations hook 0x5599e294be30
-116> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_scrub_reservations hook 0x5599e294be30
-115> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command get_latest_osdmap hook 0x5599e294be30
-114> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command set_heap_property hook 0x5599e294be30
-113> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command get_heap_property hook 0x5599e294be30
-112> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_objectstore_kv_stats hook 0x5599e294be30
-111> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_scrubs hook 0x5599e294be30
-110> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command calc_objectstore_db_histogram hook 0x5599e294be30
-109> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command flush_store_cache hook 0x5599e294be30
-108> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_pgstate_history hook 0x5599e294be30
-107> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command compact hook 0x5599e294be30
-106> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command get_mapped_pools hook 0x5599e294be30
-105> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command smart hook 0x5599e294be30
-104> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command list_devices hook 0x5599e294be30
-103> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command send_beacon hook 0x5599e294be30
-102> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_osd_network hook 0x5599e294be30
-101> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_pool_statfs hook 0x5599e294be30
-100> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command setomapval hook 0x5599e8b8aef0
-99> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command rmomapkey hook 0x5599e8b8aef0
-98> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command setomapheader hook 0x5599e8b8aef0
-97> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command getomap hook 0x5599e8b8aef0
-96> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command truncobj hook 0x5599e8b8aef0
-95> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command injectdataerr hook 0x5599e8b8aef0
-94> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command injectmdataerr hook 0x5599e8b8aef0
-93> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command set_recovery_delay hook 0x5599e8b8aef0
-92> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command injectfull hook 0x5599e8b8aef0
-91> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command bench hook 0x5599e294be30
-90> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command cluster_log hook 0x5599e294be30
-89> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command flush_pg_stats hook 0x5599e294be30
-88> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command heap hook 0x5599e294be30
-87> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command debug dump_missing hook 0x5599e294be30
-86> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command debug kick_recovery_wq hook 0x5599e294be30
-85> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command cpu_profiler hook 0x5599e294be30
-84> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command dump_pg_recovery_stats hook 0x5599e294be30
-83> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command reset_pg_recovery_stats hook 0x5599e294be30
-82> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command cache drop hook 0x5599e294be30
-81> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command cache status hook 0x5599e294be30
-80> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command scrub_purged_snaps hook 0x5599e294be30
-79> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command scrubdebug hook 0x5599e294be30
-78> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command pg hook 0x5599e294be30
-77> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command pg hook 0x5599e294be30
-76> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command pg hook 0x5599e294be30
-75> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command pg hook 0x5599e294be30
-74> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command pg hook 0x5599e294be30
-73> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command query hook 0x5599e294be30
-72> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command mark_unfound_lost hook 0x5599e294be30
-71> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command list_unfound hook 0x5599e294be30
-70> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command scrub hook 0x5599e294be30
-69> 2022-04-21T20:15:39.837+0000 7f186e6f05c0 5 asok(0x5599e2718530) register_command deep_scrub hook 0x5599e294be30
-68> 2022-04-21T20:15:39.849+0000 7f184dff3640 4 mgrc handle_mgr_map Got map version 3237
-67> 2022-04-21T20:15:39.849+0000 7f184dff3640 4 mgrc handle_mgr_map Active mgr is now [v2:172.16.0.100:6826/17066,v1:172.16.0.100:6827/17066]
-66> 2022-04-21T20:15:39.849+0000 7f184dff3640 4 mgrc reconnect Starting new session with [v2:172.16.0.100:6826/17066,v1:172.16.0.100:6827/17066]
-65> 2022-04-21T20:15:39.849+0000 7f186c169640 10 monclient: get_auth_request con 0x7f17f80293c0 auth_method 0
-64> 2022-04-21T20:15:39.853+0000 7f184dff3640 4 mgrc handle_mgr_configure stats_period=5
-63> 2022-04-21T20:15:39.853+0000 7f184dff3640 4 mgrc handle_mgr_configure updated stats threshold: 5
-62> 2022-04-21T20:15:39.981+0000 7f184dff3640 10 monclient: handle_get_version_reply finishing 1 version 1655911
-61> 2022-04-21T20:15:39.981+0000 7f1869152640 5 osd.3 1655897 heartbeat osd_stat(store_statfs(0xe8359f7000/0x0/0xe8e0db6000, data 0x7272479d/0xa542d000, compress 0x0/0x0/0x0, omap 0x4d3, meta 0x5f8fb2d), peers [] op hist [])
-60> 2022-04-21T20:15:39.981+0000 7f18527fc640 -1 osd.3 1655897 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
-59> 2022-04-21T20:15:39.981+0000 7f18527fc640 1 osd.3 1655897 set_numa_affinity setting numa affinity to node 0 cpus 0-19
-58> 2022-04-21T20:15:40.005+0000 7f18527fc640 10 monclient: _send_mon_message to mon.cadet-compute at v2:172.16.99.10:3300/0
-57> 2022-04-21T20:15:40.645+0000 7f184cff1640 10 monclient: tick
-56> 2022-04-21T20:15:40.645+0000 7f184cff1640 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-04-21T20:15:10.647148+0000)
-55> 2022-04-21T20:15:40.657+0000 7f185b7fe640 1 osd.3 1655897 tick checking mon for new map
-54> 2022-04-21T20:15:40.657+0000 7f185b7fe640 10 monclient: _renew_subs
-53> 2022-04-21T20:15:40.657+0000 7f185b7fe640 10 monclient: _send_mon_message to mon.cadet-compute at v2:172.16.99.10:3300/0
-52> 2022-04-21T20:15:40.657+0000 7f184dff3640 3 osd.3 1655897 handle_osd_map epochs [1655898,1655911], i have 1655897, src has [1491297,1655911]
-51> 2022-04-21T20:15:41.225+0000 7f184dff3640 3 osd.3 1655911 handle_osd_map epochs [1655912,1655912], i have 1655911, src has [1491297,1655912]
-50> 2022-04-21T20:15:41.241+0000 7f18357fa640 1 osd.3 1655912 state: booting -> active
-49> 2022-04-21T20:15:41.245+0000 7f186c96a640 10 monclient: get_auth_request con 0x7f17dc024d10 auth_method 0
-48> 2022-04-21T20:15:41.245+0000 7f186b968640 10 monclient: get_auth_request con 0x7f17dc001f30 auth_method 0
-47> 2022-04-21T20:15:41.245+0000 7f186c96a640 10 monclient: get_auth_request con 0x7f17dc00a5c0 auth_method 0
-46> 2022-04-21T20:15:41.245+0000 7f186c169640 10 monclient: get_auth_request con 0x7f17dc030770 auth_method 0
-45> 2022-04-21T20:15:41.245+0000 7f186c96a640 10 monclient: get_auth_request con 0x7f17dc0364a0 auth_method 0
-44> 2022-04-21T20:15:41.245+0000 7f186b968640 10 monclient: get_auth_request con 0x7f17dc0102f0 auth_method 0
-43> 2022-04-21T20:15:41.245+0000 7f186c169640 10 monclient: get_auth_request con 0x7f17dc0278d0 auth_method 0
-42> 2022-04-21T20:15:41.245+0000 7f186b968640 10 monclient: get_auth_request con 0x7f17dc007aa0 auth_method 0
-41> 2022-04-21T20:15:41.245+0000 7f186c169640 10 monclient: get_auth_request con 0x7f17dc039060 auth_method 0
-40> 2022-04-21T20:15:41.245+0000 7f186c96a640 10 monclient: get_auth_request con 0x7f17dc0134a0 auth_method 0
-39> 2022-04-21T20:15:41.245+0000 7f186b968640 10 monclient: get_auth_request con 0x7f17dc033330 auth_method 0
-38> 2022-04-21T20:15:41.245+0000 7f186c169640 10 monclient: get_auth_request con 0x7f17dc00d750 auth_method 0
-37> 2022-04-21T20:15:41.245+0000 7f186c96a640 10 monclient: get_auth_request con 0x7f17dc01bda0 auth_method 0
-36> 2022-04-21T20:15:41.245+0000 7f186c169640 10 monclient: get_auth_request con 0x7f17dc004960 auth_method 0
-35> 2022-04-21T20:15:41.245+0000 7f186b968640 10 monclient: get_auth_request con 0x7f17dc019220 auth_method 0
-34> 2022-04-21T20:15:41.245+0000 7f186c169640 10 monclient: get_auth_request con 0x7f17dc016040 auth_method 0
-33> 2022-04-21T20:15:41.245+0000 7f18217fa640 5 osd.3 1655912 heartbeat osd_stat(store_statfs(0xe8359f7000/0x0/0xe8e0db6000, data 0x7272479d/0xa542d000, compress 0x0/0x0/0x0, omap 0x4d3, meta 0x5f8fb2d), peers [0,2,4,5,6,7,9,10,11,12] op hist [])
-32> 2022-04-21T20:15:41.245+0000 7f186b968640 10 monclient: get_auth_request con 0x7f17dc021ba0 auth_method 0
-31> 2022-04-21T20:15:41.245+0000 7f186c96a640 10 monclient: get_auth_request con 0x7f17dc02d600 auth_method 0
-30> 2022-04-21T20:15:41.245+0000 7f186b968640 10 monclient: get_auth_request con 0x7f17dc02aa40 auth_method 0
-29> 2022-04-21T20:15:41.245+0000 7f186c169640 10 monclient: get_auth_request con 0x7f17dc01efe0 auth_method 0
-28> 2022-04-21T20:15:41.245+0000 7f186c96a640 10 monclient: get_auth_request con 0x7f18640035e0 auth_method 0
-27> 2022-04-21T20:15:41.245+0000 7f186b968640 10 monclient: get_auth_request con 0x7f186402b0f0 auth_method 0
-26> 2022-04-21T20:15:41.245+0000 7f186c169640 10 monclient: handle_auth_request added challenge on 0x7f1864039130
-25> 2022-04-21T20:15:41.249+0000 7f186c96a640 10 monclient: handle_auth_request added challenge on 0x7f186400d570
-24> 2022-04-21T20:15:41.249+0000 7f18377fe640 2 osd.3 1655912 ms_handle_reset con 0x7f1864039130 session 0x7f185c055040
-23> 2022-04-21T20:15:41.249+0000 7f18377fe640 2 osd.3 1655912 ms_handle_reset con 0x7f186400d570 session 0x7f1864034e30
-22> 2022-04-21T20:15:41.249+0000 7f186c96a640 10 monclient: get_auth_request con 0x7f186405e430 auth_method 0
-21> 2022-04-21T20:15:41.249+0000 7f186b968640 10 monclient: get_auth_request con 0x7f1864034520 auth_method 0
-20> 2022-04-21T20:15:41.249+0000 7f186c169640 10 monclient: get_auth_request con 0x7f18600bf040 auth_method 0
-19> 2022-04-21T20:15:41.249+0000 7f186c96a640 10 monclient: get_auth_request con 0x7f18600968d0 auth_method 0
-18> 2022-04-21T20:15:41.249+0000 7f186b968640 10 monclient: get_auth_request con 0x7f1864020430 auth_method 0
-17> 2022-04-21T20:15:41.249+0000 7f186c169640 10 monclient: get_auth_request con 0x7f1864019420 auth_method 0
-16> 2022-04-21T20:15:41.249+0000 7f186c169640 10 monclient: get_auth_request con 0x7f1860096420 auth_method 0
-15> 2022-04-21T20:15:41.337+0000 7f186b968640 10 monclient: handle_auth_request added challenge on 0x7f1864075470
-14> 2022-04-21T20:15:41.337+0000 7f186c169640 10 monclient: handle_auth_request added challenge on 0x7f186404d930
-13> 2022-04-21T20:15:41.337+0000 7f186c96a640 10 monclient: handle_auth_request added challenge on 0x7f186404c920
-12> 2022-04-21T20:15:41.337+0000 7f186b968640 10 monclient: handle_auth_request added challenge on 0x7f1864086920
-11> 2022-04-21T20:15:41.365+0000 7f186c169640 10 monclient: handle_auth_request added challenge on 0x7f1864074be0
-10> 2022-04-21T20:15:41.365+0000 7f186b968640 10 monclient: handle_auth_request added challenge on 0x7f186408d480
-9> 2022-04-21T20:15:41.633+0000 7f184e7f4640 5 rocksdb: commit_cache_size High Pri Pool Ratio set to 0.421053
-8> 2022-04-21T20:15:41.633+0000 7f184e7f4640 5 rocksdb: commit_cache_size High Pri Pool Ratio set to 0.153846
-7> 2022-04-21T20:15:41.633+0000 7f184e7f4640 5 bluestore.MempoolThread(0x5599e2842528) _resize_shards cache_size: 1020054732 kv_alloc: 436207616 kv_used: 53315448 kv_onode_alloc: 159383552 kv_onode_used: 1430760 meta_alloc: 314572800 meta_used: 59936 data_alloc: 100663296 data_used: 12288
-6> 2022-04-21T20:15:41.645+0000 7f184cff1640 10 monclient: tick
-5> 2022-04-21T20:15:41.645+0000 7f184cff1640 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-04-21T20:15:11.647266+0000)
-4> 2022-04-21T20:15:41.649+0000 7f1853fff640 10 monclient: _send_mon_message to mon.cadet-compute at v2:172.16.99.10:3300/0
-3> 2022-04-21T20:15:41.773+0000 7f186c96a640 10 monclient: handle_auth_request added challenge on 0x7f1864001260
-2> 2022-04-21T20:15:41.773+0000 7f186c169640 10 monclient: handle_auth_request added challenge on 0x7f1864016320
-1> 2022-04-21T20:15:42.457+0000 7f186b968640 10 monclient: handle_auth_request added challenge on 0x7f18640a18c0
0> 2022-04-21T20:15:42.461+0000 7f186c96a640 -1 *** Caught signal (Segmentation fault) **
in thread 7f186c96a640 thread_name:msgr-worker-0
ceph version 17.1.0 (c675060073a05d40ef404d5921c81178a52af6e0) quincy (dev)
1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f186e93f520]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 rbd_pwl
0/ 5 journaler
0/ 5 objectcacher
0/ 5 immutable_obj_cache
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/ 5 rgw_datacache
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 fuse
2/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
1/ 5 prioritycache
0/ 5 test
0/ 5 cephfs_mirror
0/ 5 cephsqlite
0/ 5 seastore
0/ 5 seastore_onode
0/ 5 seastore_odata
0/ 5 seastore_omap
0/ 5 seastore_tm
0/ 5 seastore_cleaner
0/ 5 seastore_lba
0/ 5 seastore_cache
0/ 5 seastore_journal
0/ 5 seastore_device
0/ 5 alienstore
1/ 5 mclock
2/-2 (syslog threshold)
99/99 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
7f18217fa640 / osd_srv_heartbt
7f1834ff9640 / ceph-osd
7f18357fa640 / cfin
7f18377fe640 / ms_dispatch
7f184cff1640 / safe_timer
7f184dff3640 / ms_dispatch
7f184e7f4640 / bstore_mempool
7f184fff7640 / rocksdb:low0
7f18527fc640 / fn_anonymous
7f1853fff640 / safe_timer
7f185b7fe640 / safe_timer
7f1869152640 / io_context_pool
7f186a966640 / io_context_pool
7f186b167640 / admin_socket
7f186b968640 / msgr-worker-2
7f186c169640 / msgr-worker-1
7f186c96a640 / msgr-worker-0
7f186e6f05c0 / ceph-osd
max_recent 10000
max_new 10000
log_file /var/lib/ceph/crash/2022-04-21T20:15:42.463486Z_abab8a8b-c1df-4a92-a818-3d68829edf1e/log
--- end dump of recent events ---
Segmentation fault (core dumped)


Files

ceph-crash.log.bz2 (43.5 KB), Gonzalo Aguilar Delgado, 04/25/2022 05:50 PM
Actions #1

Updated by Neha Ojha almost 2 years ago

  • Status changed from New to Need More Info

Did you see the same segmentation fault in quincy and pacific? Were you testing a custom build of ceph (17.1.0 is a dev build)? Can you reproduce the issue on 17.2.0?

Actions #2

Updated by Gonzalo Aguilar Delgado almost 2 years ago

Neha Ojha wrote:

Did you see the same segmentation fault in quincy and pacific? Were you testing a custom build of ceph (17.1.0 is a dev build)? Can you reproduce the issue on 17.2.0?

Nope... Pacific fails to start with:

2022-04-22T13:34:41.679+0000 7fd5798ed080 1 bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
2022-04-22T13:34:41.679+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) close
2022-04-22T13:34:41.971+0000 7fd5798ed080 0 osd.3:3.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-04-22T13:34:41.971+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) open path /var/lib/ceph/osd/ceph-3/block
2022-04-22T13:34:41.995+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) open size 1000204886016 (0xe8e0db6000, 932 GiB) block_size 4096 (4 KiB) rotational discard not supported
2022-04-22T13:34:41.995+0000 7fd5798ed080 1 bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
2022-04-22T13:34:41.995+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) close
2022-04-22T13:34:42.331+0000 7fd5798ed080 0 osd.3:4.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bluestore(/var/lib/ceph/osd/ceph-3) _mount path /var/lib/ceph/osd/ceph-3
2022-04-22T13:34:42.371+0000 7fd5798ed080 0 bluestore(/var/lib/ceph/osd/ceph-3) _open_db_and_around read-only:0 repair:0
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) open path /var/lib/ceph/osd/ceph-3/block
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) open size 1000204886016 (0xe8e0db6000, 932 GiB) block_size 4096 (4 KiB) rotational discard not supported
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bdev(0x55fe79708800 /var/lib/ceph/osd/ceph-3/block) open path /var/lib/ceph/osd/ceph-3/block
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bdev(0x55fe79708800 /var/lib/ceph/osd/ceph-3/block) open size 1000204886016 (0xe8e0db6000, 932 GiB) block_size 4096 (4 KiB) rotational discard not supported
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-3/block size 932 GiB
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bluefs mount
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bluefs _init_alloc shared, id 1, capacity 0xe8e0db6000, block size 0x10000
2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluefs _replay 0x11000: stop: unrecognized op 12
2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluefs mount failed to replay log: (5) Input/output error
2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluestore(/var/lib/ceph/osd/ceph-3) _open_bluefs failed bluefs mount: (5) Input/output error
2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluestore(/var/lib/ceph/osd/ceph-3) _open_db failed to prepare db environment:
2022-04-22T13:34:42.419+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) close
2022-04-22T13:34:42.687+0000 7fd5798ed080 -1 osd.3 0 OSD:init: unable to mount object store
2022-04-22T13:34:42.687+0000 7fd5798ed080 -1 ** ERROR: osd init failed: (5) Input/output error

Actions #3

Updated by Igor Fedotov almost 2 years ago

2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluefs _replay 0x11000: stop: unrecognized op 12

@Gonzalo, AFAIU you're trying to downgrade the OSD from Q to P, right? The above error is apparently caused by such a downgrade: Quincy has got one more op in the BlueFS log which is not recognized by Pacific... Just FYI
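
A quick way to see this from the command line, if it helps (just a sketch, assuming fsck walks the same BlueFS log that a normal mount replays; the path is the one from this ticket):

# run against the stopped OSD, once with the pacific build of ceph-bluestore-tool
# and once with the quincy build
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-3
# expected under this theory:
#  - pacific build: stops at the same "bluefs _replay ... unrecognized op 12"
#  - quincy build:  replays the BlueFS log without complaint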

Actions #4

Updated by Gonzalo Aguilar Delgado almost 2 years ago

Igor Fedotov wrote:

2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluefs _replay 0x11000: stop: unrecognized op 12

@Gonzalo, AFAIU you're trying to downgrade the OSD from Q to P, right? The above error is apparently caused by such a downgrade: Quincy has got one more op in the BlueFS log which is not recognized by Pacific... Just FYI

There are two issues here. Let me explain.

The OSD is on pacific, and I used the ./ceph-objectstore-tool built from source because I had to test the solution given in #53729; unfortunately that build is quincy... So I suppose that the tool made the switch.

I thought: no problem, because the host where the OSD was running (under docker with pacific) has ubuntu jammy, so I can boot the same OSD, this time on bare metal. And I ran into the first problem.

This one:

-7> 2022-04-21T20:15:41.633+0000 7f184e7f4640  5 bluestore.MempoolThread(0x5599e2842528) _resize_shards cache_size: 1020054732 kv_alloc: 436207616 kv_used: 53315448 kv_onode_alloc: 159383552 kv_onode_used: 1430760 meta_alloc: 314572800 meta_used: 59936 data_alloc: 100663296 data_used: 12288
-6> 2022-04-21T20:15:41.645+0000 7f184cff1640 10 monclient: tick
-5> 2022-04-21T20:15:41.645+0000 7f184cff1640 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-04-21T20:15:11.647266+0000)
-4> 2022-04-21T20:15:41.649+0000 7f1853fff640 10 monclient: _send_mon_message to mon.cadet-compute at v2:172.16.99.10:3300/0
-3> 2022-04-21T20:15:41.773+0000 7f186c96a640 10 monclient: handle_auth_request added challenge on 0x7f1864001260
-2> 2022-04-21T20:15:41.773+0000 7f186c169640 10 monclient: handle_auth_request added challenge on 0x7f1864016320
-1> 2022-04-21T20:15:42.457+0000 7f186b968640 10 monclient: handle_auth_request added challenge on 0x7f18640a18c0
0> 2022-04-21T20:15:42.461+0000 7f186c96a640 -1 *** Caught signal (Segmentation fault) **

Now I built 17.2.0 to see if I can boot it... Will report back...

Actions #5

Updated by Gonzalo Aguilar Delgado almost 2 years ago

Ok. This is the situation:

1.- OSD built from scratch on pacific (docker pull ceph/daemon:latest-pacific).
2.- Used ./ceph-objectstore-tool built from scratch (:NitzanMordhai/ceph.git wip-nitzan-pglog-dups-not-trimmed), version 17.0.0-11722-gb40738fffcc (b40738fffcc55824da2bbbbae704c3bb268ceddd) quincy.
3.- Booted ceph-osd with pacific -> error "unrecognized op 12".
4.- Booted ceph-osd with quincy, ceph version 17.0.0-11722-gb40738fffcc (b40738fffcc55824da2bbbbae704c3bb268ceddd) quincy.
Crashed with handle_auth_request (posted first).
5.- Booted the ceph-osd that comes with ubuntu jammy (ceph version 17.1.0 (c675060073a05d40ef404d5921c81178a52af6e0) quincy (dev)).
Crashed as posted in the previous comment.
6.- Booted ceph-osd built from source, v17.2.0 (ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)).
Crashed like this, I can post the whole log...

-16> 2022-04-22T21:12:38.525+0000 7f7c6334d640 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_release_alloc(sync) 0x55bed7425500 []
-15> 2022-04-22T21:12:38.525+0000 7f7c59339640 30 osd.3 pg_epoch: 1657369 pg[9.fa( v 1492034'3544171 lc 0'0 (1492034'3542171,1492034'3544171] local-lis/les=1389434/1389436 n=512 ec=1492072/417 lis/c=1389434/1389434 les/c/f=1389436/1389436/11759 sis=1655034) [0,6,13] r=-1 lpr=1657329 pi=[1389434,1655034)/13 crt=1492034'3544171 mlcod 0'0 unknown mbc={}] lock
-14> 2022-04-22T21:12:38.525+0000 7f7c59339640 20 osd.3 op_wq(0) _process empty q, waiting
-13> 2022-04-22T21:12:38.525+0000 7f7c58b38640 30 osd.3 pg_epoch: 1657369 pg[9.a6( v 1492033'3891036 lc 0'0 (1492033'3889036,1492033'3891036] local-lis/les=1389434/1389436 n=521 ec=1492072/417 lis/c=1389434/1389434 les/c/f=1389436/1389436/11759 sis=1655034) [0,13,6] r=-1 lpr=1657329 pi=[1389434,1655034)/14 crt=1492033'3891036 mlcod 0'0 unknown mbc={}] lock
-12> 2022-04-22T21:12:38.525+0000 7f7c58b38640 30 osd.3 pg_epoch: 1657369 pg[9.ba( v 1492034'3544171 lc 0'0 (1492034'3542171,1492034'3544171] local-lis/les=1389434/1389436 n=512 ec=1492072/417 lis/c=1389434/1389434 les/c/f=1389436/1389436/11759 sis=1655034) [0,6,13] r=-1 lpr=1657329 pi=[1389434,1655034)/13 crt=1492034'3544171 mlcod 0'0 unknown mbc={}] lock
-11> 2022-04-22T21:12:38.525+0000 7f7c58b38640 20 osd.3 op_wq(1) _process empty q, waiting
-10> 2022-04-22T21:12:38.533+0000 7f7c6334d640 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_state_proc txc 0x55bed7425800 kv_submitted
-9> 2022-04-22T21:12:38.533+0000 7f7c6334d640 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_state_proc txc 0x55bed7425800 finishing
-8> 2022-04-22T21:12:38.533+0000 7f7c6334d640 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_release_alloc(sync) 0x55bed7425800 []
-7> 2022-04-22T21:12:38.533+0000 7f7c58337640 30 osd.3 pg_epoch: 1657369 pg[9.bb( v 1492033'1478345 lc 0'0 (1492033'1476345,1492033'1478345] local-lis/les=1389428/1389430 n=541 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389430/1389430/11759 sis=1656105) [4,13,9] r=-1 lpr=1657329 pi=[1389428,1656105)/9 crt=1492033'1478345 mlcod 0'0 unknown mbc={}] lock
-6> 2022-04-22T21:12:38.533+0000 7f7c58337640 20 osd.3 op_wq(2) _process empty q, waiting
-5> 2022-04-22T21:12:38.549+0000 7f7c65351640 10 monclient: tick
-4> 2022-04-22T21:12:38.549+0000 7f7c65351640 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-04-22T21:12:08.553167+0000)
-3> 2022-04-22T21:12:38.565+0000 7f7c6f365640 10 osd.3 1657369 tick
-2> 2022-04-22T21:12:38.565+0000 7f7c6f365640 10 osd.3 1657369 do_waiters -- start
-1> 2022-04-22T21:12:38.565+0000 7f7c6f365640 10 osd.3 1657369 do_waiters -- finish
0> 2022-04-22T21:12:38.565+0000 7f7c6f365640 20 osd.3 1657369 tick last_purged_snaps_scrub 2022-04-21T21:38:33.125086+0000 next 2022-04-23T07:45:20.937586+0000
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 rbd_pwl
0/ 5 journaler
0/ 5 objectcacher
0/ 5 immutable_obj_cache
0/ 5 client
99/99 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/ 5 rgw_datacache
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 compressor
10/10 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 fuse
2/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
5/ 5 prioritycache
0/ 5 test
0/ 5 cephfs_mirror
0/ 5 cephsqlite
0/ 5 seastore
0/ 5 seastore_onode
0/ 5 seastore_odata
0/ 5 seastore_omap
0/ 5 seastore_tm
0/ 5 seastore_t
0/ 5 seastore_cleaner
0/ 5 seastore_lba
0/ 5 seastore_lba_details
0/ 5 seastore_cache
0/ 5 seastore_journal
0/ 5 seastore_device
0/ 5 alienstore
1/ 5 mclock
2/-2 (syslog threshold)
99/99 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
7f7c57335640 / tp_osd_tp
7f7c57b36640 / tp_osd_tp
7f7c58337640 / tp_osd_tp
7f7c58b38640 / tp_osd_tp
7f7c59339640 / tp_osd_tp
7f7c6234b640 / cfin
7f7c62b4c640 / bstore_kv_sync
7f7c6334d640 / bstore_kv_final
7f7c65351640 / safe_timer
7f7c66353640 / ms_dispatch
7f7c66b54640 / bstore_mempool
7f7c6c35f640 / safe_timer
7f7c6f365640 / safe_timer
max_recent 10000
max_new 10000
log_file /var/lib/ceph/crash/2022-04-22T21:12:38.505233Z_28466242-4600-4fae-9c7e-6b2246b75871/log
--- end dump of recent events ---
Aborted (core dumped)
Actions #6

Updated by Gonzalo Aguilar Delgado almost 2 years ago

I saw the stack trace. This time v17.2.0, the latest.


28/1389429 n=488 ec=417/417 lis/c=1389428/1389428 les/c/f=1389429/1389429/11759 sis=1655876) [4,10,13] r=-1 lpr=1657253 pi=[1389428,1655876)/9 crt=1492033'2724365 mlcod 0'0 unknown NOTIFY mbc={}] enter Start
   -90> 2022-04-22T21:22:42.756+0000 7f6743ecd640  1 osd.3 pg_epoch: 1657369 pg[9.12( v 1492033'2724365 lc 0'0 (1492033'2722365,1492033'2724365] local-lis/les=1389428/1389429 n=488 ec=417/417 lis/c=1389428/1389428 les/c/f=1389429/1389429/11759 sis=1655876) [4,10,13] r=-1 lpr=1657253 pi=[1389428,1655876)/9 crt=1492033'2724365 mlcod 0'0 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
   -89> 2022-04-22T21:22:42.756+0000 7f6743ecd640  5 osd.3 pg_epoch: 1657369 pg[9.12( v 1492033'2724365 lc 0'0 (1492033'2722365,1492033'2724365] local-lis/les=1389428/1389429 n=488 ec=417/417 lis/c=1389428/1389428 les/c/f=1389429/1389429/11759 sis=1655876) [4,10,13] r=-1 lpr=1657253 pi=[1389428,1655876)/9 crt=1492033'2724365 mlcod 0'0 unknown NOTIFY mbc={}] exit Start 0.000010 0 0.000000
   -88> 2022-04-22T21:22:42.756+0000 7f6743ecd640  5 osd.3 pg_epoch: 1657369 pg[9.12( v 1492033'2724365 lc 0'0 (1492033'2722365,1492033'2724365] local-lis/les=1389428/1389429 n=488 ec=417/417 lis/c=1389428/1389428 les/c/f=1389429/1389429/11759 sis=1655876) [4,10,13] r=-1 lpr=1657253 pi=[1389428,1655876)/9 crt=1492033'2724365 mlcod 0'0 unknown NOTIFY mbc={}] enter Started/Stray
   -87> 2022-04-22T21:22:42.760+0000 7f6743ecd640 -1 /home/gaguilar/ceph/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f6743ecd640 time 2022-04-22T21:22:42.758474+0000
/home/gaguilar/ceph/ceph/src/os/bluestore/BlueStore.cc: 14136: FAILED ceph_assert(!c)

 ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x128) [0x560a37e111dd]
 2: ./ceph-osd(+0x581399) [0x560a37e11399]
 3: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x1f2c) [0x560a384aa75c]
 4: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2e0) [0x560a384aff80]
 5: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0xad) [0x560a37f67cbd]
 6: (OSD::dispatch_context(PeeringCtx&, PG*, std::shared_ptr<OSDMap const>, ThreadPool::TPHandle*)+0x1ae) [0x560a37f0f10e]
 7: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x2c7) [0x560a37f37ad7]
 8: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x5b) [0x560a381d481b]
 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x993) [0x560a37f288e3]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x414) [0x560a38600cf4]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x560a38603254]
 12: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f67614b3b43]
 13: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f6761545a00]

   -86> 2022-04-22T21:22:42.768+0000 7f6743ecd640 -1 *** Caught signal (Aborted) **
 in thread 7f6743ecd640 thread_name:tp_osd_tp

 ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f6761461520]
 2: pthread_kill()
 3: raise()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x182) [0x560a37e11237]
 6: ./ceph-osd(+0x581399) [0x560a37e11399]
 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x1f2c) [0x560a384aa75c]
 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2e0) [0x560a384aff80]
 9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0xad) [0x560a37f67cbd]
 10: (OSD::dispatch_context(PeeringCtx&, PG*, std::shared_ptr<OSDMap const>, ThreadPool::TPHandle*)+0x1ae) [0x560a37f0f10e]
 11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x2c7) [0x560a37f37ad7]
 12: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x5b) [0x560a381d481b]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x993) [0x560a37f288e3]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x414) [0x560a38600cf4]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x560a38603254]
 16: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f67614b3b43]
 17: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f6761545a00]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

   -85> 2022-04-22T21:22:42.804+0000 7f674e6e2640 10 monclient: _renew_subs
   -84> 2022-04-22T21:22:42.804+0000 7f674e6e2640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
   -83> 2022-04-22T21:22:42.804+0000 7f67446ce640  5 osd.3 pg_epoch: 1657409 pg[9.bb( v 1492033'1478345 lc 0'0 (1492033'1476345,1492033'1478345] local-lis/les=1389428/1389430 n=541 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389430/1389430/11759 sis=1656105) [4,13,9] r=-1 lpr=1657369 pi=[1389428,1656105)/9 crt=1492033'1478345 mlcod 0'0 unknown mbc={}] exit Reset 58.340556 42 0.000421
   -82> 2022-04-22T21:22:42.804+0000 7f67446ce640  5 osd.3 pg_epoch: 1657409 pg[9.bb( v 1492033'1478345 lc 0'0 (1492033'1476345,1492033'1478345] local-lis/les=1389428/1389430 n=541 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389430/1389430/11759 sis=1656105) [4,13,9] r=-1 lpr=1657369 pi=[1389428,1656105)/9 crt=1492033'1478345 mlcod 0'0 unknown mbc={}] enter Started
   -81> 2022-04-22T21:22:42.804+0000 7f67446ce640  5 osd.3 pg_epoch: 1657409 pg[9.bb( v 1492033'1478345 lc 0'0 (1492033'1476345,1492033'1478345] local-lis/les=1389428/1389430 n=541 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389430/1389430/11759 sis=1656105) [4,13,9] r=-1 lpr=1657369 pi=[1389428,1656105)/9 crt=1492033'1478345 mlcod 0'0 unknown mbc={}] enter Start
   -80> 2022-04-22T21:22:42.804+0000 7f67446ce640  1 osd.3 pg_epoch: 1657409 pg[9.bb( v 1492033'1478345 lc 0'0 (1492033'1476345,1492033'1478345] local-lis/les=1389428/1389430 n=541 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389430/1389430/11759 sis=1656105) [4,13,9] r=-1 lpr=1657369 pi=[1389428,1656105)/9 crt=1492033'1478345 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Stray
   -79> 2022-04-22T21:22:42.804+0000 7f67446ce640  5 osd.3 pg_epoch: 1657409 pg[9.bb( v 1492033'1478345 lc 0'0 (1492033'1476345,1492033'1478345] local-lis/les=1389428/1389430 n=541 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389430/1389430/11759 sis=1656105) [4,13,9] r=-1 lpr=1657369 pi=[1389428,1656105)/9 crt=1492033'1478345 mlcod 0'0 unknown mbc={}] exit Start 0.000008 0 0.000000
   -78> 2022-04-22T21:22:42.804+0000 7f67446ce640  5 osd.3 pg_epoch: 1657409 pg[9.bb( v 1492033'1478345 lc 0'0 (1492033'1476345,1492033'1478345] local-lis/les=1389428/1389430 n=541 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389430/1389430/11759 sis=1656105) [4,13,9] r=-1 lpr=1657369 pi=[1389428,1656105)/9 crt=1492033'1478345 mlcod 0'0 unknown mbc={}] enter Started/Stray
   -77> 2022-04-22T21:22:42.804+0000 7f67456d0640  5 osd.3 pg_epoch: 1657409 pg[9.d2( v 1492033'2724365 (1492033'2722365,1492033'2724365] local-lis/les=1389428/1389429 n=488 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389429/1389429/11759 sis=1655876) [4,10,13] r=-1 lpr=1657369 pi=[1389428,1655876)/9 crt=1492033'2724365 lcod 0'0 mlcod 0'0 unknown mbc={}] exit Reset 58.412648 42 0.000472
   -76> 2022-04-22T21:22:42.804+0000 7f67456d0640  5 osd.3 pg_epoch: 1657409 pg[9.d2( v 1492033'2724365 (1492033'2722365,1492033'2724365] local-lis/les=1389428/1389429 n=488 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389429/1389429/11759 sis=1655876) [4,10,13] r=-1 lpr=1657369 pi=[1389428,1655876)/9 crt=1492033'2724365 lcod 0'0 mlcod 0'0 unknown mbc={}] enter Started
   -75> 2022-04-22T21:22:42.804+0000 7f67456d0640  5 osd.3 pg_epoch: 1657409 pg[9.d2( v 1492033'2724365 (1492033'2722365,1492033'2724365] local-lis/les=1389428/1389429 n=488 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389429/1389429/11759 sis=1655876) [4,10,13] r=-1 lpr=1657369 pi=[1389428,1655876)/9 crt=1492033'2724365 lcod 0'0 mlcod 0'0 unknown mbc={}] enter Start
   -74> 2022-04-22T21:22:42.804+0000 7f67456d0640  1 osd.3 pg_epoch: 1657409 pg[9.d2( v 1492033'2724365 (1492033'2722365,1492033'2724365] local-lis/les=1389428/1389429 n=488 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389429/1389429/11759 sis=1655876) [4,10,13] r=-1 lpr=1657369 pi=[1389428,1655876)/9 crt=1492033'2724365 lcod 0'0 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Stray
   -73> 2022-04-22T21:22:42.804+0000 7f67456d0640  5 osd.3 pg_epoch: 1657409 pg[9.d2( v 1492033'2724365 (1492033'2722365,1492033'2724365] local-lis/les=1389428/1389429 n=488 ec=1492072/417 lis/c=1389428/1389428 les/c/f=1389429/1389429/11759 sis=1655876) [4,10,13] r=-1 lpr=1657369 pi=[1389428,1655876)/9 crt=1492033'2724365 lcod 0'0 mlcod 0'0 unknown mbc={}] exit Start 0.000014 0 0.000000
   -72> 2022-04-22T21:22:42.804+0000 7f67456d0640  5 osd.3 pg_epoch: 1657409 pg[9.d2( v 1492033'2724365 (1492033'2722365,1492033'2724365] local-lis/les=1389428/1389

Actions #7

Updated by Gonzalo Aguilar Delgado almost 2 years ago

I managed to reproduce it...

I install an OSD with the pacific version. Then I let it run for a while (10 min or so) and stop the OSD. I use the ceph-objectstore-tool from quincy to trim the dups... Since I ran it with quincy, something is changed in the bluefs layer and it doesn't boot again in pacific.

I run the OSD with quincy, and after logging in to the mons it crashes. Just that. Hope it helps. 100% reproducible.
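
In shell form the sequence is roughly this (a sketch assembled from the commands elsewhere in this ticket; the device, OSD id and keyring path are the ones I used, and the exact quincy ceph-objectstore-tool invocation is the one from #53729, omitted here):

# 1. fresh OSD under pacific (created as in this ticket); let it run ~10 minutes, then stop it
ceph-volume lvm zap /dev/sda
ceph-volume raw prepare --bluestore --data /dev/sda
# ...start osd.3 with the pacific container (ceph/daemon:latest-pacific), wait, then stop it...

# 2. run the quincy-built ceph-objectstore-tool against the stopped OSD
#    (the pg_log dups trimming operation from #53729; exact arguments as in that ticket)

# 3. try to start the OSD again
ceph-osd --cluster ceph --foreground -i 3 -k /var/lib/ceph/osd/ceph-3/keyring
#    with a pacific ceph-osd: "bluefs _replay ... unrecognized op 12", mount fails
#    with a quincy ceph-osd:  crashes shortly after authenticating to the mons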

Actions #8

Updated by Gonzalo Aguilar Delgado almost 2 years ago

The situation is even worse. Any OSD created with ceph version 17.1.0 (c675060073a05d40ef404d5921c81178a52af6e0) quincy (dev) and added to the cluster will crash.

I will check if I missed something in the release notes, but it should not crash.
The log I attach is from just after adding the OSD to the cluster.

Actions #9

Updated by Gonzalo Aguilar Delgado almost 2 years ago

Gonzalo Aguilar Delgado wrote:

Neha Ojha wrote:

Did you see the same segmentation fault in quincy and pacific? Were you testing a custom build of ceph (17.1.0 is a dev build)? Can you reproduce the issue on 17.2.0?

Nope... Pacific fails to start with:

2022-04-22T13:34:41.679+0000 7fd5798ed080 1 bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
2022-04-22T13:34:41.679+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) close
2022-04-22T13:34:41.971+0000 7fd5798ed080 0 osd.3:3.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-04-22T13:34:41.971+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) open path /var/lib/ceph/osd/ceph-3/block
2022-04-22T13:34:41.995+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) open size 1000204886016 (0xe8e0db6000, 932 GiB) block_size 4096 (4 KiB) rotational discard not supported
2022-04-22T13:34:41.995+0000 7fd5798ed080 1 bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
2022-04-22T13:34:41.995+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) close
2022-04-22T13:34:42.331+0000 7fd5798ed080 0 osd.3:4.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bluestore(/var/lib/ceph/osd/ceph-3) _mount path /var/lib/ceph/osd/ceph-3
2022-04-22T13:34:42.371+0000 7fd5798ed080 0 bluestore(/var/lib/ceph/osd/ceph-3) _open_db_and_around read-only:0 repair:0
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) open path /var/lib/ceph/osd/ceph-3/block
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) open size 1000204886016 (0xe8e0db6000, 932 GiB) block_size 4096 (4 KiB) rotational discard not supported
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bdev(0x55fe79708800 /var/lib/ceph/osd/ceph-3/block) open path /var/lib/ceph/osd/ceph-3/block
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bdev(0x55fe79708800 /var/lib/ceph/osd/ceph-3/block) open size 1000204886016 (0xe8e0db6000, 932 GiB) block_size 4096 (4 KiB) rotational discard not supported
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-3/block size 932 GiB
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bluefs mount
2022-04-22T13:34:42.371+0000 7fd5798ed080 1 bluefs _init_alloc shared, id 1, capacity 0xe8e0db6000, block size 0x10000
2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluefs _replay 0x11000: stop: unrecognized op 12
2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluefs mount failed to replay log: (5) Input/output error
2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluestore(/var/lib/ceph/osd/ceph-3) _open_bluefs failed bluefs mount: (5) Input/output error
2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluestore(/var/lib/ceph/osd/ceph-3) _open_db failed to prepare db environment:
2022-04-22T13:34:42.419+0000 7fd5798ed080 1 bdev(0x55fe79708400 /var/lib/ceph/osd/ceph-3/block) close
2022-04-22T13:34:42.687+0000 7fd5798ed080 -1 osd.3 0 OSD:init: unable to mount object store
2022-04-22T13:34:42.687+0000 7fd5798ed080 -1 ** ERROR: osd init failed: (5) Input/output error

WARN: Please disregard this output since it's a different error. It is caused by inadvertently upgrading bluefs by using ceph-objectstore-tool to repair the OSD.

Actions #10

Updated by Radoslaw Zarzynski almost 2 years ago

Hello Gonzalo!
Just a quick note from a bug scrub: we don't support mixing the tool from a newer release with OSDs from an older one.
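
If it helps, a quick way to check what is actually in play before pointing any offline tool at an OSD (just a sketch; osd id 3 is the one from this ticket):

ceph versions                             # releases reported by the running cluster daemons
ceph-osd --version                        # release of the local ceph-osd that will mount this store
ceph osd metadata 3 | grep ceph_version   # release osd.3 reported the last time it booted (if it registered)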

Actions #11

Updated by Gonzalo Aguilar Delgado almost 2 years ago

It doesn't matter. This is just a side effect. I mean... The bug is not caused by the tool.
The bug is caused because the OSD (quincy) tries to connect to a pacific cluster and crashes...

I can create the OSD from zero and the situation is the same: the OSD will not boot.

Actions #12

Updated by Neha Ojha almost 2 years ago

Gonzalo Aguilar Delgado wrote:

It doesn't matter. This is just a side effect. I mean... The bug is not caused by the tool.
The bug is caused because the OSD (quincy) tries to connect to a pacific cluster and crashes...

Could you please clarify what pacific cluster means here? What was the output of ceph versions?
From my understanding there are mixed versions involved here, but I'd like to fully understand which combination causes the issue for you.

I can create the OSD from zero and the situation is the same: the OSD will not boot.

Actions #13

Updated by Gonzalo Aguilar Delgado almost 2 years ago

It seems I messed up everything... Let me start over.

I have a ceph cluster that has been running for a loooong time. Recently it was upgraded to pacific and I ran into trouble (which I will not describe here); the cluster came up with 12 OSDs on 4 different hosts, and I can say it is up and running.

ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 9.63698 root default
-2 3.81898 host blue-compute
0 hdd 1.00000 osd.0 up 1.00000 1.00000
2 hdd 1.00000 osd.2 up 0.95001 1.00000
4 hdd 1.81898 osd.4 up 1.00000 1.00000
-5 2.36800 host cadet-compute
1 hdd 0.03000 osd.1 up 0.90002 1.00000
5 hdd 0.03999 osd.5 up 0.90002 1.00000
7 hdd 0.03000 osd.7 up 0.90002 1.00000
11 hdd 0.45000 osd.11 up 1.00000 1.00000
13 hdd 1.81799 osd.13 up 1.00000 1.00000
-6 0.45000 host cobalt-compute
12 hdd 0.45000 osd.12 up 1.00000 1.00000
-3 3.00000 host red-compute
3 hdd 0 osd.3 down 0 1.00000
6 hdd 1.00000 osd.6 up 1.00000 1.00000
9 hdd 1.00000 osd.9 up 1.00000 1.00000
10 hdd 1.00000 osd.10 up 1.00000 1.00000

Because of the troubles I had, I added a new OSD, called osd.3, to one of the hosts (the one shown down in the tree above).
The host is running:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04 LTS"

That means it comes with the pacific version by default, and I cannot downgrade.

ceph-osd --version
ceph version 17.1.0 (c675060073a05d40ef404d5921c81178a52af6e0) quincy (dev)

And I need this osd.3 to not fully join the cluster. It's just a way to restore blocks I backed up in the past (but don't worry about this detail). So I reweighted it to 0. The point is that I can:

  • ceph osd purge 3 --force
  • ceph-volume lvm zap /dev/sda
  • /usr/bin/dd if=/dev/zero of=/dev/sda bs=1M count=100 conv=fsync
  • ceph-volume raw prepare --bluestore --data /dev/sda
  • ceph-osd --cluster ceph --default-log-to-stderr=true --err-to-stderr=true --default-log-to-file=false --foreground -i 3 -k /var/lib/ceph/osd/ceph-3/keyring
  • ceph osd crush reweight osd.3 0
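
Annotated, that list is (same commands and ids; only the comments are added for clarity):

ceph osd purge 3 --force             # remove osd.3 from the cluster (osd map, crush map, auth key)
ceph-volume lvm zap /dev/sda         # wipe the LVM/bluestore metadata on the disk
/usr/bin/dd if=/dev/zero of=/dev/sda bs=1M count=100 conv=fsync    # zero the start of the disk as well
ceph-volume raw prepare --bluestore --data /dev/sda                # lay down a fresh bluestore OSD
ceph-osd --cluster ceph --default-log-to-stderr=true --err-to-stderr=true \
  --default-log-to-file=false --foreground -i 3 -k /var/lib/ceph/osd/ceph-3/keyring   # run it in the foreground
ceph osd crush reweight osd.3 0      # keep it from taking any data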

Basically zap disk, and redo the OSD anytime.

Every time I need to. Well, this mostly works. But every time I do this it crashes on the second boot. I mean, it boots, I Ctrl+C after it has joined the cluster and it's up and working, and the second time I boot it, it crashes...

The crash can be seen in the original post, and in comment #4...

In comment #6 it also breaks, but this time with the latest version of ceph built from scratch.

What I am reporting is that maybe it's not a good idea to mix OSD versions, but at least it should not crash. I would expect the OSD to join the cluster... It has nothing, no data. Just a clean OSD ready to be populated. So why does it crash...

Actions #14

Updated by Gonzalo Aguilar Delgado almost 2 years ago

Hi! Today I've just built master again.

Rebuilt the OSD and on first boot:
...
-79> 2022-05-16T09:10:21.707+0000 7fb1f1728640 10 monclient: _renew_subs
-78> 2022-05-16T09:10:21.707+0000 7fb1f1728640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-77> 2022-05-16T09:10:21.707+0000 7fb1f5730640 3 osd.3 1739257 handle_osd_map epochs [1739258,1739297], i have 1739257, src has [1491297,1739460]
-76> 2022-05-16T09:10:21.751+0000 7fb1f1728640 10 monclient: _renew_subs
-75> 2022-05-16T09:10:21.751+0000 7fb1f1728640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-74> 2022-05-16T09:10:21.751+0000 7fb1f5730640 3 osd.3 1739297 handle_osd_map epochs [1739298,1739337], i have 1739297, src has [1491297,1739460]
-73> 2022-05-16T09:10:21.807+0000 7fb1f1728640 10 monclient: _renew_subs
-72> 2022-05-16T09:10:21.807+0000 7fb1f1728640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-71> 2022-05-16T09:10:21.811+0000 7fb1f5730640 3 osd.3 1739337 handle_osd_map epochs [1739338,1739377], i have 1739337, src has [1491297,1739460]
-70> 2022-05-16T09:10:21.859+0000 7fb1f1728640 10 monclient: _renew_subs
-69> 2022-05-16T09:10:21.859+0000 7fb1f1728640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-68> 2022-05-16T09:10:21.859+0000 7fb1f5730640 3 osd.3 1739377 handle_osd_map epochs [1739378,1739417], i have 1739377, src has [1491297,1739460]
-67> 2022-05-16T09:10:21.903+0000 7fb1f1728640 10 monclient: _renew_subs
-66> 2022-05-16T09:10:21.903+0000 7fb1f1728640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-65> 2022-05-16T09:10:21.915+0000 7fb1f5730640 3 osd.3 1739417 handle_osd_map epochs [1739418,1739457], i have 1739417, src has [1491297,1739460]
-64> 2022-05-16T09:10:21.967+0000 7fb1f1728640 10 monclient: _renew_subs
-63> 2022-05-16T09:10:21.967+0000 7fb1f1728640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-62> 2022-05-16T09:10:21.967+0000 7fb1f5730640 3 osd.3 1739457 handle_osd_map epochs [1739458,1739460], i have 1739457, src has [1491297,1739460]
-61> 2022-05-16T09:10:21.991+0000 7fb1f1728640 5 osd.3 1739460 heartbeat osd_stat(store_statfs(0xe7a54d6000/0x0/0xe8e0db6000, data 0xab32b8c9/0xf770e000, compress 0x0/0x0/0x0, omap 0x0, meta 0x441d0000), peers [] op hist [])
-60> 2022-05-16T09:10:21.991+0000 7fb1f1728640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-59> 2022-05-16T09:10:21.991+0000 7fb1f1728640 10 monclient: _renew_subs
-58> 2022-05-16T09:10:21.991+0000 7fb1f1728640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-57> 2022-05-16T09:10:22.027+0000 7fb1f5730640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-56> 2022-05-16T09:10:22.031+0000 7fb1f5730640 1 osd.3 1739460 start_boot
-55> 2022-05-16T09:10:22.031+0000 7fb1f5730640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-54> 2022-05-16T09:10:22.031+0000 7fb1f5730640 10 monclient: handle_get_version_reply finishing 2 version 1739460
-53> 2022-05-16T09:10:22.031+0000 7fb201194640 5 osd.3 1739460 heartbeat osd_stat(store_statfs(0xe7a54d6000/0x0/0xe8e0db6000, data 0xab32b8c9/0xf770e000, compress 0x0/0x0/0x0, omap 0x0, meta 0x441d0000), peers [] op hist [])
-52> 2022-05-16T09:10:22.039+0000 7fb1f472e640 10 monclient: tick
-51> 2022-05-16T09:10:22.039+0000 7fb1f472e640 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-05-16T09:09:52.041726+0000)
-50> 2022-05-16T09:10:22.039+0000 7fb1f9f39640 -1 osd.3 1739460 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
-49> 2022-05-16T09:10:22.039+0000 7fb1f9f39640 1 osd.3 1739460 set_numa_affinity setting numa affinity to node 0 cpus 0-19
-48> 2022-05-16T09:10:22.063+0000 7fb1f9f39640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-47> 2022-05-16T09:10:22.411+0000 7fb1f5f31640 5 prioritycache tune_memory target: 2147483648 mapped: 378167296 unmapped: 335536128 heap: 713703424 old mem: 1020054731 new mem: 1020054731
-46> 2022-05-16T09:10:22.499+0000 7fb1f3f2d640 4 mgrc reconnect Starting new session with [v2:172.16.99.10:6842/11647,v1:172.16.99.10:6843/11647]
-45> 2022-05-16T09:10:22.499+0000 7fb202196640 10 monclient: get_auth_request con 0x561b0af07c00 auth_method 0
-44> 2022-05-16T09:10:22.503+0000 7fb1f5730640 4 mgrc ms_handle_reset ms_handle_reset con 0x561b0af07c00
-43> 2022-05-16T09:10:22.503+0000 7fb1f5730640 4 mgrc reconnect Terminating session with v2:172.16.99.10:6842/11647
-42> 2022-05-16T09:10:22.503+0000 7fb1f5730640 4 mgrc reconnect waiting to retry connect until 2713.683826s
-41> 2022-05-16T09:10:22.847+0000 7fb1fe742640 1 osd.3 1739460 tick checking mon for new map
-40> 2022-05-16T09:10:23.039+0000 7fb1f472e640 10 monclient: tick
-39> 2022-05-16T09:10:23.039+0000 7fb1f472e640 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-05-16T09:09:53.041842+0000)
-38> 2022-05-16T09:10:23.059+0000 7fb1f5f31640 5 rocksdb: commit_cache_size High Pri Pool Ratio set to 0.4
-37> 2022-05-16T09:10:23.059+0000 7fb1f5f31640 5 rocksdb: commit_cache_size High Pri Pool Ratio set to 0.168421
-36> 2022-05-16T09:10:23.059+0000 7fb1f5f31640 5 bluestore.MempoolThread(0x561b09d1cb38) _resize_shards cache_size: 1020054731 kv_alloc: 398458880 kv_used: 24928 kv_onode_alloc: 167772160 kv_onode_used: 8276112 meta_alloc: 339738624 meta_used: 13100328 data_alloc: 104857600 data_used: 0
-35> 2022-05-16T09:10:23.099+0000 7fb202196640 10 monclient: handle_auth_request added challenge on 0x561b0e47cc00
-34> 2022-05-16T09:10:23.099+0000 7fb203198640 10 monclient: handle_auth_request added challenge on 0x561b0e47d000
-33> 2022-05-16T09:10:23.131+0000 7fb1f5730640 10 monclient: _renew_subs
-32> 2022-05-16T09:10:23.131+0000 7fb1f5730640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-31> 2022-05-16T09:10:23.131+0000 7fb1f5730640 3 osd.3 1739460 handle_osd_map epochs [1739461,1739461], i have 1739460, src has [1491297,1739461]
-30> 2022-05-16T09:10:23.151+0000 7fb1f1728640 1 osd.3 1739461 state: booting -> active
-29> 2022-05-16T09:10:23.151+0000 7fb202997640 10 monclient: get_auth_request con 0x561b1d576400 auth_method 0
-28> 2022-05-16T09:10:23.151+0000 7fb202997640 10 monclient: get_auth_request con 0x561b298b0c00 auth_method 0
-27> 2022-05-16T09:10:23.151+0000 7fb202997640 10 monclient: get_auth_request con 0x561b1d577c00 auth_method 0
-26> 2022-05-16T09:10:23.151+0000 7fb202997640 10 monclient: get_auth_request con 0x561b1a61a000 auth_method 0
-25> 2022-05-16T09:10:23.151+0000 7fb202997640 10 monclient: get_auth_request con 0x561b1a61b800 auth_method 0
-24> 2022-05-16T09:10:23.151+0000 7fb202997640 10 monclient: get_auth_request con 0x561b1a61ac00 auth_method 0
-23> 2022-05-16T09:10:23.151+0000 7fb202997640 10 monclient: get_auth_request con 0x561b1d577000 auth_method 0
-22> 2022-05-16T09:10:23.187+0000 7fb1f5730640 3 osd.3 1739461 handle_osd_map epochs [1739461,1739461], i have 1739461, src has [1491297,1739461]
-21> 2022-05-16T09:10:23.411+0000 7fb1f5f31640 5 prioritycache tune_memory target: 2147483648 mapped: 378462208 unmapped: 335241216 heap: 713703424 old mem: 1020054731 new mem: 1020054731
-20> 2022-05-16T09:10:23.499+0000 7fb1f3f2d640 4 mgrc reconnect Starting new session with [v2:172.16.99.10:6842/11647,v1:172.16.99.10:6843/11647]
-19> 2022-05-16T09:10:23.547+0000 7fb1fb73c640 10 monclient: _send_mon_message to mon.blue-compute at v2:172.16.0.119:3300/0
-18> 2022-05-16T09:10:23.595+0000 7fb202196640 10 monclient: get_auth_request con 0x561b1d576800 auth_method 0
-17> 2022-05-16T09:10:23.595+0000 7fb203198640 10 monclient: get_auth_request con 0x561b1d577800 auth_method 0
-16> 2022-05-16T09:10:23.595+0000 7fb202196640 10 monclient: get_auth_request con 0x561b1d577400 auth_method 0
-15> 2022-05-16T09:10:23.595+0000 7fb203198640 10 monclient: get_auth_request con 0x561b1a61a800 auth_method 0
-14> 2022-05-16T09:10:23.595+0000 7fb202196640 10 monclient: get_auth_request con 0x561b218fe000 auth_method 0
-13> 2022-05-16T09:10:23.595+0000 7fb202196640 10 monclient: get_auth_request con 0x561b0e47d800 auth_method 0
-12> 2022-05-16T09:10:23.595+0000 7fb203198640 10 monclient: get_auth_request con 0x561b0e47dc00 auth_method 0
-11> 2022-05-16T09:10:23.595+0000 7fb202196640 10 monclient: get_auth_request con 0x561b1a61a400 auth_method 0
-10> 2022-05-16T09:10:23.595+0000 7fb202196640 10 monclient: get_auth_request con 0x561b1a61bc00 auth_method 0
-9> 2022-05-16T09:10:23.595+0000 7fb203198640 10 monclient: get_auth_request con 0x561b1a61b400 auth_method 0
-8> 2022-05-16T09:10:23.595+0000 7fb203198640 10 monclient: get_auth_request con 0x561b1d576c00 auth_method 0
-7> 2022-05-16T09:10:23.595+0000 7fb203198640 10 monclient: get_auth_request con 0x561b1d576000 auth_method 0
-6> 2022-05-16T09:10:23.599+0000 7fb203198640 10 monclient: get_auth_request con 0x561b0af07c00 auth_method 0
-5> 2022-05-16T09:10:23.599+0000 7fb202196640 10 monclient: get_auth_request con 0x561b1a61b000 auth_method 0
-4> 2022-05-16T09:10:23.599+0000 7fb1f5730640 4 mgrc handle_mgr_configure stats_period=5
-3> 2022-05-16T09:10:23.699+0000 7fb203198640 10 monclient: handle_auth_request added challenge on 0x561b1fe40400
-2> 2022-05-16T09:10:23.699+0000 7fb202997640 10 monclient: handle_auth_request added challenge on 0x561b1fe40c00
-1> 2022-05-16T09:10:23.699+0000 7fb202196640 10 monclient: handle_auth_request added challenge on 0x561b1fe40800
0> 2022-05-16T09:10:23.711+0000 7fb202997640 -1 *** Caught signal (Segmentation fault) **
in thread 7fb202997640 thread_name:msgr-worker-1

ceph version 17.0.0-12154-g6f78f2f42ea (6f78f2f42eaa5f84d25bf175563df9717f8c4203) quincy (dev)
1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fb2044a8520]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 rbd_pwl
0/ 5 journaler
0/ 5 objectcacher
0/ 5 immutable_obj_cache
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/ 5 rgw_datacache
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 fuse
2/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
1/ 5 prioritycache
0/ 5 test
0/ 5 cephfs_mirror
0/ 5 cephsqlite
0/ 5 seastore
0/ 5 seastore_onode
0/ 5 seastore_odata
0/ 5 seastore_omap
0/ 5 seastore_tm
0/ 5 seastore_t
0/ 5 seastore_cleaner
0/ 5 seastore_lba
0/ 5 seastore_lba_details
0/ 5 seastore_cache
0/ 5 seastore_journal
0/ 5 seastore_device
0/ 5 seastore_backref
0/ 5 alienstore
1/ 5 mclock
0/ 5 cyanstore
2/-2 (syslog threshold)
99/99 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
7fb1e5f11640 / osd_srv_heartbt
7fb1f1728640 / cfin
7fb1f1f29640 / bstore_kv_sync
7fb1f3f2d640 / safe_timer
7fb1f472e640 / safe_timer
7fb1f5730640 / ms_dispatch
7fb1f5f31640 / bstore_mempool
7fb1f6732640 / rocksdb:high0
7fb1f6f33640 / rocksdb:low1
7fb1f7734640 / rocksdb:low0
7fb1f9f39640 / fn_anonymous
7fb1fb73c640 / safe_timer
7fb1fe742640 / safe_timer
7fb201194640 / io_context_pool
7fb202196640 / msgr-worker-2
7fb202997640 / msgr-worker-1
7fb203198640 / msgr-worker-0
max_recent 10000
max_new 10000
log_file /var/lib/ceph/crash/2022-05-16T09:10:23.706211Z_777e42fd-25c5-4efc-a27e-b02c6eb3110a/log
--- end dump of recent events ---
Violación de segmento (`core' generado)  [Segmentation fault (core dumped)]
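As the NOTE in the dump above says, the single raw libc frame can only be interpreted against the matching binary. A hedged sketch of how that might be done on this node (the core-file path is an assumption; the crash id is taken from the log_file path shown in the dump, and `ceph crash info` only works if ceph-crash has posted the report):

objdump -rdS bin/ceph-osd > ceph-osd.dis          # disassemble with interleaved source, as the NOTE suggests
gdb bin/ceph-osd /path/to/core                    # or open the generated core directly against the same binary
(gdb) thread apply all bt                         # full backtraces for every thread, including msgr-worker-1
ceph crash info 2022-05-16T09:10:23.706211Z_777e42fd-25c5-4efc-a27e-b02c6eb3110a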

Last commit:
Merge: 12d0955b5fc 89430b1c39d
Author: Samuel Just <>
Date: Wed May 11 19:06:53 2022 -0700

root@red-compute:/mnt/ceph-recovery# LD_LIBRARY_PATH=lib/ bin/ceph-osd -d --cluster ceph --id 3 --default-log-to-stderr=true --err-to-stderr=true --default-log-to-file=false --foreground --setuser ceph --setgroup ceph --version
ceph version 17.0.0-12154-g6f78f2f42ea (6f78f2f42eaa5f84d25bf175563df9717f8c4203) quincy (dev)
root@red-compute:/mnt/ceph-recovery# ls

Hope it helps... I cannot boot any new OSD.

Actions #15

Updated by Radoslaw Zarzynski almost 2 years ago

This looks like something new and unrelated to the other crashes in this ticket, so I created a new one: https://tracker.ceph.com/issues/55698.

Actions #16

Updated by Gonzalo Aguilar Delgado almost 2 years ago

Radoslaw Zarzynski wrote:

This looks like something new and unrelated to the other crashes in this ticket, so I created a new one: https://tracker.ceph.com/issues/55698.

Thank you very much!

Actions #17

Updated by Radoslaw Zarzynski almost 2 years ago

  • Status changed from Need More Info to Rejected

Closing this ticket. The new crash is tracked independently (https://tracker.ceph.com/issues/55698).

