Bug #48993

closed

cephadm: 'mgr stat' and/or 'pg dump' output truncated

Added by Sage Weil about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Two symptoms:

- cephadm bootstrap 'ceph mgr dump' output is truncated (always at 180224 bytes, oddly)
example: https://pulpito.ceph.com/sage-2021-01-25_17:38:16-rados:cephadm-wip-sage2-testing-2021-01-25-1011-distro-basic-smithi/5828176

- teuthology 'ceph pg dump' output is truncated somewhere past 200k
example: https://pulpito.ceph.com/sage-2021-01-25_16:36:16-rados:cephadm:thrash-wip-sage-testing-2021-01-23-1326-distro-basic-smithi/5828009

Both result in JSON parse failures.
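
To illustrate why a cut-off dump surfaces as a JSON parse failure, here is a minimal sketch. The document it builds is synthetic (it is not real 'ceph pg dump' output), and the helper name `parse_dump` is hypothetical. The only figure taken from this report is the 180224-byte offset.

```python
import json


def parse_dump(raw: str):
    """Return (parsed, None) on success, or (None, message) if raw is not valid JSON."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        # exc.pos is where the parser gave up; for a cut-off document it lies
        # at or before the end of the data that was actually received.
        return None, f"{exc.msg} (char {exc.pos}, {len(raw)} chars received)"


# Synthetic stand-in for a large dump-style document (an assumption, not real output).
full = json.dumps(
    {"pg_stats": [{"pgid": f"1.{i:x}", "state": "active+clean"} for i in range(10000)]},
    indent=2,
)
truncated = full[:180224]  # cut at the offset reported for 'ceph mgr dump'

ok_obj, ok_err = parse_dump(full)
bad_obj, bad_err = parse_dump(truncated)
print(len(full), ok_err)
print(len(truncated), bad_err)
```

Any strict prefix of a JSON object is itself invalid, so the parser always rejects the truncated text. The exact error message depends on where the cut happens to land.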


Related issues 3 (0 open, 3 closed)

Related to Orchestrator - Bug #49016: find multiple coredumps of conmon (Resolved)
Has duplicate Orchestrator - Bug #49000: JSONDecodeError when wait_for_mgr_restart() (Duplicate)
Has duplicate Orchestrator - Bug #49076: cephadm: Bootstrapping fails: json.decoder.JSONDecodeError: Expecting value: line 3217 column 25 (char 114688) (Duplicate)

Actions #1

Updated by Sage Weil about 3 years ago

Current theory: the common denominator is the Ubuntu 18.04 version of podman, which is currently:

2021-01-25T17:04:04.118 INFO:teuthology.orchestra.run.smithi097.stdout:Get:4 https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_18.04  conmon 2.0.24~1 [32.1 kB]
2021-01-25T17:04:04.119 INFO:teuthology.orchestra.run.smithi097.stdout:Get:5 https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_18.04  containers-golang 0.6.0~1 [4,020 B]
2021-01-25T17:04:04.119 INFO:teuthology.orchestra.run.smithi097.stdout:Get:6 https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_18.04  containers-image 5.8.1~1 [24.3 kB]
2021-01-25T17:04:04.121 INFO:teuthology.orchestra.run.smithi097.stdout:Get:7 https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_18.04  containers-common 1.2.1~2 [13.6 kB]
2021-01-25T17:04:04.122 INFO:teuthology.orchestra.run.smithi097.stdout:Get:8 https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_18.04  crun 0.16~2 [220 kB]
2021-01-25T17:04:04.218 INFO:teuthology.orchestra.run.smithi097.stdout:Get:9 https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_18.04  containernetworking-plugins 0.8.7~1 [6,360 kB]
2021-01-25T17:04:04.736 INFO:teuthology.orchestra.run.smithi097.stdout:Get:10 https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_18.04  podman-plugins 1.1.1~1 [1,035 kB]
2021-01-25T17:04:04.930 INFO:teuthology.orchestra.run.smithi097.stdout:Get:11 https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_18.04  podman 2.2.1~4 [18.5 MB]

All of the failures seem to happen on ubuntu_18.04_podman.yaml. 18.04 with docker and 20.04 with podman seem unaffected.

This sounds like it might be the problem: https://github.com/containers/podman/issues/5046#issuecomment-600330600 ... except that someone there claims it was resolved in conmon 2.0.14, and we are still seeing our problem with 2.0.24.

Actions #2

Updated by Sage Weil about 3 years ago

The reproducer from that other bug doesn't work on 20.04, so this looks like a different bug. However, I can reproduce it on Ubuntu 18.04 (and not on Fedora).

working fedora box:

[root@gnit ~]# podman run --rm --entrypoint /bin/bash quay.ceph.io/ceph-ci/ceph:d8ced8b53927915a604da2faa03905eff41f1774 -c 'dd if=/dev/zero count=500000 bs=1' > /tmp/test ; ls -al /tmp/test ; podman -v ; grep PRETTY /etc/os-release 
500000+0 records in
500000+0 records out
500000 bytes (500 kB, 488 KiB) copied, 0.727671 s, 687 kB/s
-rw-r--r-- 1 root root 500000 Jan 25 17:51 /tmp/test
podman version 2.2.1
PRETTY_NAME="Fedora 32 (Server Edition)" 

bad ubuntu box (note the changing size of /tmp/test):

root@teuthology:~# podman run --rm --entrypoint /bin/bash quay.ceph.io/ceph-ci/ceph:d8ced8b53927915a604da2faa03905eff41f1774 -c 'dd if=/dev/zero count=500000 bs=1' > /tmp/test ; ls -al /tmp/test ; podman -v ; grep PRETTY /etc/os-release 
-rw-r--r-- 1 root root 497008 Jan 25 23:52 /tmp/test
podman version 2.2.1
PRETTY_NAME="Ubuntu 18.04.4 LTS" 
root@teuthology:~# podman run --rm --entrypoint /bin/bash quay.ceph.io/ceph-ci/ceph:d8ced8b53927915a604da2faa03905eff41f1774 -c 'dd if=/dev/zero count=500000 bs=1' > /tmp/test ; ls -al /tmp/test ; podman -v ; grep PRETTY /etc/os-release 
-rw-r--r-- 1 root root 2115 Jan 25 23:52 /tmp/test
podman version 2.2.1
PRETTY_NAME="Ubuntu 18.04.4 LTS" 
root@teuthology:~# podman run --rm --entrypoint /bin/bash quay.ceph.io/ceph-ci/ceph:d8ced8b53927915a604da2faa03905eff41f1774 -c 'dd if=/dev/zero count=500000 bs=1' > /tmp/test ; ls -al /tmp/test ; podman -v ; grep PRETTY /etc/os-release 
-rw-r--r-- 1 root root 134809 Jan 25 23:52 /tmp/test
podman version 2.2.1
PRETTY_NAME="Ubuntu 18.04.4 LTS" 
root@teuthology:~# podman run --rm --entrypoint /bin/bash quay.ceph.io/ceph-ci/ceph:d8ced8b53927915a604da2faa03905eff41f1774 -c 'dd if=/dev/zero count=500000 bs=1' > /tmp/test ; ls -al /tmp/test ; podman -v ; grep PRETTY /etc/os-release 
-rw-r--r-- 1 root root 1928 Jan 25 23:52 /tmp/test
podman version 2.2.1
PRETTY_NAME="Ubuntu 18.04.4 LTS" 
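
The manual runs above can be repeated with a small script. This is a sketch under assumptions: it captures stdout through a pipe rather than redirecting to /tmp/test, which may not exercise exactly the same path as the shell redirect, and the helper names `stdout_size` and `short_runs` are hypothetical. `LOCAL_CMD` is a podman-free control that writes the same number of bytes.

```python
import subprocess
import sys

# Command from the reproducer above; the image tag is the one quoted in this report.
PODMAN_CMD = [
    "podman", "run", "--rm", "--entrypoint", "/bin/bash",
    "quay.ceph.io/ceph-ci/ceph:d8ced8b53927915a604da2faa03905eff41f1774",
    "-c", "dd if=/dev/zero count=500000 bs=1",
]

# Control: the same 500000 zero bytes written by a local process, no container involved.
LOCAL_CMD = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'\\0' * 500000)"]


def stdout_size(cmd):
    """Run cmd and return the number of bytes it wrote to stdout."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    return len(proc.stdout)


def short_runs(cmd, expected, trials=5):
    """Return (trial, size) pairs for runs whose stdout was not `expected` bytes long."""
    results = []
    for trial in range(trials):
        size = stdout_size(cmd)
        if size != expected:
            results.append((trial, size))
    return results
```

Calling `short_runs(PODMAN_CMD, 500000)` on a host with podman installed should return an empty list when no output is lost. Any entries in the list give the trial number and the byte count actually received.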

Actions #4

Updated by Kefu Chai about 3 years ago

  • Has duplicate Bug #49000: JSONDecodeError when wait_for_mgr_restart() added
Actions #5

Updated by Kefu Chai about 3 years ago

I have the same issue when testing Ubuntu 20.04 + conmon 2.0.24~1:

- /a/kchai-2021-01-26_13:24:13-rados:cephadm-wip-kefu-testing-2021-01-26-1857-distro-basic-smithi/5830522
- /a/kchai-2021-01-26_13:24:13-rados:cephadm-wip-kefu-testing-2021-01-26-1857-distro-basic-smithi/5830523
- /a/kchai-2021-01-26_13:24:13-rados:cephadm-wip-kefu-testing-2021-01-26-1857-distro-basic-smithi/5830527

Actions #6

Updated by Kefu Chai about 3 years ago

  • Related to Bug #49016: find multiple coredumps of conmon added
Actions #7

Updated by Deepika Upadhyay about 3 years ago

/ceph/teuthology-archive/yuriw-2021-01-28_19:54:33-rados-wip-yuri4-testing-2021-01-28-0959-octopus-distro-basic-smithi/5835938/teuthology.log

Actions #8

Updated by Sebastian Wagner about 3 years ago

  • Related to Bug #49076: cephadm: Bootstrapping fails: json.decoder.JSONDecodeError: Expecting value: line 3217 column 25 (char 114688) added
Actions #9

Updated by Sebastian Wagner about 3 years ago

  • Related to deleted (Bug #49076: cephadm: Bootstrapping fails: json.decoder.JSONDecodeError: Expecting value: line 3217 column 25 (char 114688))
Actions #10

Updated by Sebastian Wagner about 3 years ago

  • Has duplicate Bug #49076: cephadm: Bootstrapping fails: json.decoder.JSONDecodeError: Expecting value: line 3217 column 25 (char 114688) added
Actions #11

Updated by Sebastian Wagner about 3 years ago

  • Status changed from In Progress to Resolved