Bug #59241


[crimson] OSD logs with debug 20/20 are not captured

Added by Harsh Kumar about 1 year ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags: crimson
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 54813
Crash signature (v1):
Crash signature (v2):

Description

The captured OSD logs do not contain enough debugging information even when the debug level is set to 20/20.
The same is not observed for the mgr log on the same cluster.
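
For reference, debug levels like the ones in the dump below are normally raised through the centralized config; a minimal sketch of the likely commands (standard ceph CLI; assumed, not taken from the report):

# ceph config set osd debug_osd 20/20
# ceph config set mgr debug_mgr 20/20
# ceph config set global log_to_file true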

# ceph config dump
2023-03-31T01:05:26.037+0000 7fe84e08f700 -1 WARNING: the following dangerous and experimental features are enabled: crimson
2023-03-31T01:05:26.038+0000 7fe84e08f700 -1 WARNING: the following dangerous and experimental features are enabled: crimson
WHO     MASK                LEVEL     OPTION                                                      VALUE                                                                                              RO
global                      basic     container_image                                             quay.ceph.io/ceph-ci/ceph@sha256:b667b57bdc127d1992370596b283f9a9992248e5d4f4861a26e85538e5611454  * 
global                      advanced  enable_experimental_unrecoverable_data_corrupting_features  crimson                                                                                              
global                      basic     log_to_file                                                 true                                                                                                 
global                      advanced  mon_cluster_log_to_file                                     true                                                                                                 
global                      advanced  osd_pool_default_pg_autoscale_mode                          off                                                                                                  
mon                         advanced  auth_allow_insecure_global_id_reclaim                       false                                                                                                
mon                         advanced  osd_pool_default_crimson                                    true                                                                                                 
mon                         advanced  public_network                                              10.1.240.0/23                                                                                      * 
mgr                         advanced  debug_mgr                                                   20/20                                                                                                
mgr                         advanced  mgr/cephadm/container_init                                  True                                                                                               * 
mgr                         advanced  mgr/cephadm/migration_current                               5                                                                                                  * 
mgr                         advanced  mgr/dashboard/ALERTMANAGER_API_HOST                         http://host.containers.internal:9093                                                               * 
mgr                         advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY                        false                                                                                              * 
mgr                         advanced  mgr/dashboard/GRAFANA_API_URL                               https://host.containers.internal:3000                                                              * 
mgr                         advanced  mgr/dashboard/PROMETHEUS_API_HOST                           http://host.containers.internal:9095                                                               * 
mgr                         advanced  mgr/dashboard/ssl_server_port                               8443                                                                                               * 
mgr                         advanced  mgr/orchestrator/orchestrator                               cephadm                                                                                              
osd                         advanced  debug_osd                                                   20/20                                                                                                
osd     host:dell-r640-079  basic     osd_memory_target                                           43298420872                                                                                          
osd     host:dell-r640-083  basic     osd_memory_target                                           45803788834                                                                                          
osd     host:dell-r640-084  basic     osd_memory_target                                           44372132113                                                                                          
osd                         advanced  osd_memory_target_autotune                                  true                                                                                                 

Crimson image used - https://shaman.ceph.com/repos/ceph/main/c3f1eeebbd5e7805720d5f665b68bdca160a5855/crimson/275574/

MGR log showing debug level 20 entries -

2023-03-31T01:09:17.696+0000 7f9ffcb22700 15 mgr get_health_checks getting health checks for volumes
2023-03-31T01:09:17.696+0000 7f9ffcb22700 10 mgr update_delta_stats  v19734
2023-03-31T01:09:17.696+0000 7f9ffcb22700 10 mgr.server operator() 65 pgs: 65 active+clean; 30 GiB data, 45 GiB used, 1.3 TiB / 1.3 TiB avail
2023-03-31T01:09:17.696+0000 7f9ffcb22700 10 mgr.server operator() 0 health checks
2023-03-31T01:09:17.696+0000 7f9ffcb22700 20 mgr.server operator() health checks:
{}

2023-03-31T01:09:17.696+0000 7f9ffcb22700  0 log_channel(cluster) log [DBG] : pgmap v19734: 65 pgs: 65 active+clean; 30 GiB data, 45 GiB used, 1.3 TiB / 1.3 TiB avail
2023-03-31T01:09:17.696+0000 7f9ffcb22700 20 mgr.server adjust_pgs
2023-03-31T01:09:17.696+0000 7f9ffcb22700 10 mgr.server operator() creating_or_unknown 0 max_creating 1024 left 1024
2023-03-31T01:09:17.696+0000 7f9ffcb22700 20 mgr.server operator() misplaced_ratio 0 degraded_ratio 0 inactive_pgs_ratio 0 unknown_pgs_ratio 0; target_max_misplaced_ratio 0.05
2023-03-31T01:09:17.699+0000 7f9fc5d3c700  0 [rbd_support DEBUG root] PerfHandler: tick
2023-03-31T01:09:17.699+0000 7f9fc5d3c700 10 ceph_option_get mgr_stats_period found: 5
2023-03-31T01:09:17.790+0000 7f9febb20700 10 mgr.server handle_report from 0x56526e234c00 mon.dell-r640-083
2023-03-31T01:09:17.790+0000 7f9febb20700 20 mgr.server handle_report updating existing DaemonState for mon.dell-r640-083
2023-03-31T01:09:17.790+0000 7f9febb20700 20 mgr update loading 0 new types, 0 old types, had 119 types, got 1150 bytes of data
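
One way to cross-check whether the running OSD actually applied the setting (a sketch, assuming the crimson OSD answers tell commands the way the classic OSD does; osd.0 is an arbitrary example daemon):

# ceph config get osd.0 debug_osd
# ceph tell osd.0 config get debug_osd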

Cluster status -

[ceph: root@dell-r640-079 /]# ceph -s
2023-03-31T01:10:12.300+0000 7f4ff4096700 -1 WARNING: the following dangerous and experimental features are enabled: crimson
2023-03-31T01:10:12.300+0000 7f4ff4096700 -1 WARNING: the following dangerous and experimental features are enabled: crimson
  cluster:
    id:     86890d16-cf04-11ed-8f60-78ac443b3b3c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum dell-r640-079,dell-r640-083,dell-r640-084 (age 10h)
    mgr: dell-r640-079.fomsxp(active, since 10h), standbys: dell-r640-084.dxijhp
    osd: 9 osds: 9 up (since 10h), 9 in (since 10h)

  data:
    pools:   3 pools, 65 pgs
    objects: 78.06k objects, 30 GiB
    usage:   45 GiB used, 1.3 TiB / 1.3 TiB avail
    pgs:     65 active+clean

[ceph: root@dell-r640-079 /]#

Cephadm version -

# yum list | grep cephadm
cephadm.noarch                                         2:18.0.0-3168.gc3f1eeeb.el8                                 @ceph-noarch                     
ceph-mgr-cephadm.noarch                                2:18.0.0-3168.gc3f1eeeb.el8                                 ceph-noarch

# cephadm shell -- ceph -v
Inferring fsid 86890d16-cf04-11ed-8f60-78ac443b3b3c
Inferring config /var/lib/ceph/86890d16-cf04-11ed-8f60-78ac443b3b3c/mon.dell-r640-079/config
Using ceph image with id '39c6c7c7a15b' and tag 'c3f1eeebbd5e7805720d5f665b68bdca160a5855-crimson' created on 2023-03-29 22:11:49 +0000 UTC
quay.ceph.io/ceph-ci/ceph@sha256:b667b57bdc127d1992370596b283f9a9992248e5d4f4861a26e85538e5611454
ceph version 18.0.0-3168-gc3f1eeeb (c3f1eeebbd5e7805720d5f665b68bdca160a5855) reef (dev)

Cluster logs - https://drive.google.com/drive/folders/1jrCeceEepaPIl5JYxsSZ61fgaUI5ap1U?usp=sharing

#1

Updated by Aishwarya Mathuria 12 months ago

  • Assignee set to Aishwarya Mathuria
#2

Updated by Aishwarya Mathuria 11 months ago

With debug_osd set to 20, I can see debug logs from my vstart cluster:

DEBUG 2023-05-23 17:28:33,172 [shard 0] osd -  pg_epoch 7 pg[1.4( empty local-lis/les=6/7 n=0 ec=6/6 lis/c=6/6 les/c/f=7/7/0 sis=6) [0] r=0 lpr=6 crt=0'0 mlcod 0'0 active+clean PeeringState::prepare_stats_for_publish reporting purged_snaps []
DEBUG 2023-05-23 17:28:33,172 [shard 0] osd -  pg_epoch 7 pg[1.4( empty local-lis/les=6/7 n=0 ec=6/6 lis/c=6/6 les/c/f=7/7/0 sis=6) [0] r=0 lpr=6 crt=0'0 mlcod 0'0 active+clean PeeringState::prepare_stats_for_publish publish_stats_to_osd 7:7
DEBUG 2023-05-23 17:28:33,172 [shard 0] osd - log is not dirty
DEBUG 2023-05-23 17:28:33,172 [shard 0] osd - do_peering_event handling epoch_sent: 7 epoch_requested: 7 AllReplicasRecovered for pg: 1.1
INFO  2023-05-23 17:28:33,172 [shard 0] osd -  pg_epoch 7 pg[1.1( empty local-lis/les=6/7 n=0 ec=6/6 lis/c=6/0 les/c/f=7/0/0 sis=6) [0] r=0 lpr=6 crt=0'0 mlcod 0'0 active exit Started/Primary/Active/Activating 0.124752 2 0.001637
INFO  2023-05-23 17:28:33,172 [shard 0] osd - Exiting state: Started/Primary/Active/Activating, entered at 1684843113.048149, 0.001637133 spent on 2 events
INFO  2023-05-23 17:28:33,172 [shard 0] osd -  pg_epoch 7 pg[1.1( empty local-lis/les=6/7 n=0 ec=6/6 lis/c=6/0 les/c/f=7/0/0 sis=6) [0] r=0 lpr=6 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Recovered
INFO  2023-05-23 17:28:33,172 [shard 0] osd - Entering state: Started/Primary/Active/Recovered
DEBUG 2023-05-23 17:28:33,173 [shard 0] osd -  pg_epoch 7 pg[1.1( empty local-lis/les=6/7 n=0 ec=6/6 lis/c=6/0 les/c/f=7/0/0 sis=6) [0] r=0 lpr=6 crt=0'0 mlcod 0'0 active PeeringState::needs_recovery is recovered
DEBUG 2023-05-23 17:28:33,173 [shard 0] osd -  pg_epoch 7 pg[1.1( empty local-lis/les=6/7 n=0 ec=6/6 lis/c=6/0 les/c/f=7/0/0 sis=6) [0] r=0 lpr=6 crt=0'0 mlcod 0'0 active PeeringState::update_calc_stats actingset 0 upset 0 acting_recovery_backfill 0
DEBUG 2023-05-23 17:28:33,173 [shard 0] osd -  pg_epoch 7 pg[1.1( empty local-lis/les=6/7 n=0 ec=6/6 lis/c=6/0 les/c/f=7/0/0 sis=6) [0] r=0 lpr=6 crt=0'0 mlcod 0'0 active PeeringState::update_calc_stats shard 0 primary objects 0 missing 0
DEBUG 2023-05-23 17:28:33,173 [shard 0] osd -  pg_epoch 7 pg[1.1( empty local-lis/les=6/7 n=0 ec=6/6 lis/c=6/0 les/c/f=7/0/0 sis=6) [0] r=0 lpr=6 crt=0'0 mlcod 0'0 active PeeringState::update_calc_stats object_location_counts {}

I'm going through the logs provided to see why debug logs were not showing up.
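
For anyone reproducing this, a crimson vstart cluster can be brought up roughly like this (a sketch based on the crimson developer workflow; the exact flags used on Aishwarya's cluster are an assumption):

# cd build
# MON=1 MGR=1 OSD=1 MDS=0 RGW=0 ../src/vstart.sh -n -x --crimson
# ./bin/ceph tell osd.0 config set debug_osd 20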

#3

Updated by Harsh Kumar 11 months ago

As Sam informed me previously, the way we enable logging in crimson differs from bluestore.
Could you please share the command to enable debug logging in crimson?

Also, I believe the debug level no longer varies up to 20 in crimson; please correct me if this is wrong.
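
For context on the level question: crimson is built on Seastar, whose logger exposes five levels (trace, debug, info, warn, error) rather than the classic 0-20 scale; the DEBUG/INFO prefixes in the excerpt in #2 are those Seastar levels. A quick way to check whether any debug-level lines made it into a captured OSD log (a sketch; the path assumes a cephadm deployment with this cluster's fsid):

# grep -cE '^(DEBUG|TRACE) ' /var/log/ceph/86890d16-cf04-11ed-8f60-78ac443b3b3c/ceph-osd.*.log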

#4

Updated by Matan Breizman 2 months ago

  • Status changed from New to Resolved
  • Pull request ID set to 54813
