Bug #9381

closed

"jerasure load dlopen(/usr/lib64/ceph/erasure-code/libec_lrc.so)" error in upgrade:dumpling-firefly-x-master-distro-basic-vps suite

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Monitor
Target version: -
% Done: 100%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Per Josh's analysis:

"looking at one of the ones that timed out waiting to be healthy: http://pulpito.ceph.com/teuthology-2014-09-06_17:08:01-upgrade:dumpling-firefly-x-master-distro-basic-vps/470461/
teuthology.log reports that 1 mon is down
and it's mon.a
the end of mon.a's log has an error: -1 load: jerasure load dlopen(/usr/lib64/ceph/erasure-code/libec_lrc.so): /usr/lib64/ceph/erasure-code/libec_lrc.so: cannot open shared object file: No such file or directory
no crash, but maybe that failure caused the mon to exit" 

Log is in http://qa-proxy.ceph.com/teuthology/teuthology-2014-09-06_17:08:01-upgrade:dumpling-firefly-x-master-distro-basic-vps/470461/teuthology.log

Error from vpm019/log/ceph-mon.a.log.gz :

vpm019/log/ceph-mon.a.log.gz:75490040-2014-09-06 21:16:11.934638 7f532445e700 15 mon.a@0(leader).mds e10 _note_beacon mdsbeacon(4500/a up:active seq 89 v10) v2 noting time
vpm019/log/ceph-mon.a.log.gz:75490174-2014-09-06 21:16:11.934646 7f532445e700  1 -- 10.214.138.58:6789/0 --> 10.214.138.58:6808/19385 -- mdsbeacon(4500/a up:active seq 89 v10) v2 -- ?+0 0x1e0b9c0 con 0x1d9d960
vpm019/log/ceph-mon.a.log.gz:75490346-2014-09-06 21:16:19.749711 7f37967407a0  0 ceph version 0.84-1029-g7d8fe2d (7d8fe2d994a673f2187bf99ac8e20df6a0cd2514), process ceph-mon, pid 20161
vpm019/log/ceph-mon.a.log.gz:75490493:2014-09-06 21:16:19.992684 7f37967407a0 -1 load: jerasure load dlopen(/usr/lib64/ceph/erasure-code/libec_lrc.so): /usr/lib64/ceph/erasure-code/libec_lrc.so: cannot open shared object file: No such file or directory
^C
teuthology@teuthology:/a/teuthology-2014-09-06_17:08:01-upgrade:dumpling-firefly-x-master-distro-basic-vps/470461/remote$ zgrep "cannot open shared object file"  vpm019/log/ceph-mon.a.log.gz -a20
vpm019/log/ceph-mon.a.log.gz:2014-09-06 21:16:11.743071 7f532445e700  1 mon.a@0(leader).paxos(paxos active c 4268..4827) is_readable now=2014-09-06 21:16:11.743072 lease_expire=2014-09-06 21:16:16.739306 has v0 lc 4827
vpm019/log/ceph-mon.a.log.gz:2014-09-06 21:16:11.743083 7f532445e700 10 mon.a@0(leader).osd e1292 preprocess_query mon_command({"prefix": "osd pool create", "pool": "test-rados-api-vpm050-10162-59", "pool_type":"erasure", "pg_num":8, "pgp_num":8, "erasure_code_profile":"testprofile"} v 0) v1 from client.4681 10.214.138.113:0/59010162
vpm019/log/ceph-mon.a.log.gz:2014-09-06 21:16:11.743142 7f532445e700  7 mon.a@0(leader).osd e1292 prepare_update mon_command({"prefix": "osd pool create", "pool": "test-rados-api-vpm050-10162-59", "pool_type":"erasure", "pg_num":8, "pgp_num":8, "erasure_code_profile":"testprofile"} v 0) v1 from client.4681 10.214.138.113:0/59010162
vpm019/log/ceph-mon.a.log.gz:2014-09-06 21:16:11.743258 7f532445e700  1 mon.a@0(leader).osd e1292 implicitly use ruleset named after the pool: test-rados-api-vpm050-10162-59
vpm019/log/ceph-mon.a.log.gz:2014-09-06 21:16:11.743496 7f532445e700 10 mon.a@0(leader).osd e1292 should_propose
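
The zgrep above can be approximated with a short script when several monitor logs need checking. This is only a sketch: the function names are hypothetical, and the pattern is assumed from the single error line quoted in this report, so other Ceph versions may word the message differently.

```python
import gzip
import re

# Pattern assumed from the error line quoted in this report; the capture
# group is the path of the shared object that dlopen() could not find.
DLOPEN_ERROR = re.compile(
    r"load dlopen\((?P<path>[^)]+)\): .*cannot open shared object file"
)


def missing_plugins(lines):
    """Return the sorted, de-duplicated plugin paths named in dlopen errors."""
    found = set()
    for line in lines:
        match = DLOPEN_ERROR.search(line)
        if match:
            found.add(match.group("path"))
    return sorted(found)


def missing_plugins_in_gz(log_path):
    """Same as missing_plugins(), reading a gzip-compressed log file."""
    with gzip.open(log_path, "rt", errors="replace") as handle:
        return missing_plugins(handle)
```

For the log in this report, `missing_plugins_in_gz("vpm019/log/ceph-mon.a.log.gz")` would be expected to list the libec_lrc.so path.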
Actions #1

Updated by Yuri Weinstein over 9 years ago

More from Josh:

Also, 'ceph pg dump --format json' failed. It's the same root cause, but in this case all 3 mons went down from errors loading libec_lrc.so, so the 'ceph pg dump' timed out,
i.e. http://pulpito.ceph.com/teuthology-2014-09-07_17:08:02-upgrade:dumpling-firefly-x-master-distro-basic-vps/471368/
Actions #2

Updated by Yuri Weinstein over 9 years ago

Actions #3

Updated by Loïc Dachary over 9 years ago

  • Status changed from New to Duplicate
Actions #4

Updated by Loïc Dachary over 9 years ago

  • Category set to Monitor
  • Status changed from Duplicate to 12
  • Assignee set to Loïc Dachary
  • Priority changed from Normal to High

It does not look like a duplicate after all. It fails when preloading the lrc erasure code plugin.

Actions #5

Updated by Loïc Dachary over 9 years ago

  • Status changed from 12 to In Progress
Actions #6

Updated by Loïc Dachary over 9 years ago

ceph-mon and the plugins are in the ceph package. However, the lrc and isa plugins need to be explicitly mentioned, and that was not done.
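
A quick way to confirm on an installed host which plugin files the package actually shipped might look like the sketch below. The helper name is hypothetical, and the `libec_<name>.so` naming is an assumption extrapolated from the single path in the error message.

```python
from pathlib import Path


def missing_plugin_files(plugin_dir, plugin_names):
    """Return the names in plugin_names that have no libec_<name>.so file.

    Assumption: every plugin follows the libec_<name>.so naming seen in the
    error message for the lrc plugin.
    """
    directory = Path(plugin_dir)
    return [
        name
        for name in plugin_names
        if not (directory / f"libec_{name}.so").is_file()
    ]
```

For example, `missing_plugin_files("/usr/lib64/ceph/erasure-code", ["jerasure", "lrc", "isa"])` would be expected to report `lrc` on a host showing the failure described here.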

Actions #7

Updated by Loïc Dachary over 9 years ago

  • Status changed from In Progress to Fix Under Review
Actions #8

Updated by Loïc Dachary over 9 years ago

  • % Done changed from 0 to 90
Actions #9

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Pending Backport

merged to master

Actions #10

Updated by Yuri Weinstein over 9 years ago

Note for re-testing:

Same issues on suite:upgrade:dumpling-giant-x

For example http://qa-proxy.ceph.com/teuthology/teuthology-2014-09-11_16:05:01-upgrade:dumpling-giant-x:parallel-master-distro-basic-vps/478515/teuthology.log

Visibly, tests fail with the error:

"CommandFailedError: Command failed on vpm078 with status 1: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd dump --format=json'" 

Actions #11

Updated by Loïc Dachary over 9 years ago

This is because giant is at tag v0.85, which does not include the fix. The fix is in the giant branch, though, so it will work when the next giant release candidate is available.

2014-09-11T16:59:55.839 INFO:teuthology.orchestra.run.vpm078:Running: 'sudo yum -y install ceph-debuginfo-0.85 ceph-radosgw-0.85 ceph-test-0.85 ceph-devel-0.85 ceph-0.85 ceph-fuse-0.85 rest-bench-0.85 libcephfs_jni1-0.85 libcephfs1-0.85 python-ceph-0.85'
2014-09-11T17:03:02.577 INFO:teuthology.task.install:config contains sha1|tag|branch, removing those keys from override
2014-09-11T17:03:02.577 INFO:teuthology.task.install:remote ubuntu@vpm068.front.sepia.ceph.com config {'branch': 'giant'}
2014-09-11T17:03:02.578 INFO:teuthology.orchestra.run.vpm068:Running: 'sudo lsb_release -is'
2014-09-11T17:03:04.436 DEBUG:teuthology.misc:System to be installed: CentOS
2014-09-11T17:03:04.436 INFO:teuthology.task.install:Upgrading ceph rpm packages: ceph-debuginfo, ceph-radosgw, ceph-test, ceph-devel, ceph, ceph-fuse, rest-bench, libcephfs_jni1, libcephfs1, python-ceph
2014-09-11T17:03:04.436 INFO:teuthology.orchestra.run.vpm068:Running: 'arch'
2014-09-11T17:03:04.467 INFO:teuthology.orchestra.run.vpm068:Running: 'lsb_release -is'
2014-09-11T17:03:04.579 INFO:teuthology.orchestra.run.vpm068:Running: 'lsb_release -rs'
2014-09-11T17:03:04.624 INFO:teuthology.task.install:config is {'project': 'ceph', 'branch': 'giant'}
2014-09-11T17:03:04.624 INFO:teuthology.task.install:Host vpm068 is: CentOS 6.5 x86_64
2014-09-11T17:03:04.624 INFO:teuthology.orchestra.run.vpm068:Running: 'arch'
2014-09-11T17:03:04.717 INFO:teuthology.orchestra.run.vpm068:Running: 'lsb_release -is'
2014-09-11T17:03:04.827 INFO:teuthology.orchestra.run.vpm068:Running: 'lsb_release -rs'
2014-09-11T17:03:04.936 INFO:teuthology.task.install:config is {'project': 'ceph', 'branch': 'giant'}
2014-09-11T17:03:04.937 INFO:teuthology.orchestra.run.vpm068:Running: 'sudo lsb_release -is'
2014-09-11T17:03:05.055 DEBUG:teuthology.misc:System to be installed: CentOS
2014-09-11T17:03:05.055 INFO:teuthology.task.install:Repo base URL: http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/ref/giant
2014-09-11T17:03:05.056 INFO:teuthology.orchestra.run.vpm068:Running: 'wget -q -O- http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/ref/giant/version'
2014-09-11T17:03:05.371 INFO:teuthology.orchestra.run.vpm068:Running: 'sudo rpm -ev ceph-release'
2014-09-11T17:03:06.036 INFO:teuthology.orchestra.run.vpm068:Running: 'sudo rpm -Uv http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/ref/giant/noarch/ceph-release-1-0.el6.noarch.rpm'
2014-09-11T17:03:06.668 INFO:teuthology.orchestra.run.vpm068:Running: 'arch'
2014-09-11T17:03:06.694 INFO:teuthology.orchestra.run.vpm068:Running: 'lsb_release -is'
2014-09-11T17:03:06.803 INFO:teuthology.orchestra.run.vpm068:Running: 'lsb_release -rs'
2014-09-11T17:03:06.846 INFO:teuthology.task.install:config is {'project': 'ceph', 'branch': 'giant'}
2014-09-11T17:03:06.846 INFO:teuthology.orchestra.run.vpm068:Running: "sudo sed -i -e ':a;N;$!ba;s/enabled=1\\ngpg/enabled=1\\npriority=1\\ngpg/g' -e 's;ref/[a-zA-Z0-9_]*/;ref/giant/;g' /etc/yum.repos.d/ceph.repo" 
2014-09-11T17:03:06.945 INFO:teuthology.orchestra.run.vpm068:Running: 'sudo yum clean all'
2014-09-11T17:03:08.182 INFO:teuthology.orchestra.run.vpm068.stdout:Loaded plugins: fastestmirror, priorities
2014-09-11T17:03:08.288 INFO:teuthology.orchestra.run.vpm068.stdout:Cleaning repos: Ceph Ceph-noarch base centos6-apache-ceph centos6-fcgi-ceph
2014-09-11T17:03:08.288 INFO:teuthology.orchestra.run.vpm068.stdout:              : centos6-misc-ceph centos6-qemu-ceph ceph-source epel extras
2014-09-11T17:03:08.289 INFO:teuthology.orchestra.run.vpm068.stdout:              : rpmforge updates
2014-09-11T17:03:08.289 INFO:teuthology.orchestra.run.vpm068.stdout:Cleaning up Everything
2014-09-11T17:03:08.328 INFO:teuthology.orchestra.run.vpm068.stdout:Cleaning up list of fastest mirrors
2014-09-11T17:03:08.346 INFO:teuthology.orchestra.run.vpm068:Running: 'sudo yum -y install ceph-debuginfo-0.85 ceph-radosgw-0.85 ceph-test-0.85 ceph-devel-0.85 ceph-0.85 ceph-fuse-0.85 rest-bench-0.85 libcephfs_jni1-0.85 libcephfs1-0.85 python-ceph-0.85'
2014-09-11T17:06:42.937 INFO:teuthology.run_tasks:Running task print...
2014-09-11T17:06:42.937 INFO:teuthology.task.print:**** done install.upgrade
2014-09-11T17:06:42.937 INFO:teuthology.run_tasks:Running task ceph.restart...
2014-09-11T17:06:48.937 INFO:tasks.ceph.mon.a:Stopped
2014-09-11T17:06:48.937 INFO:tasks.ceph.mon.a:Restarting daemon
2014-09-11T17:06:48.937 INFO:teuthology.orchestra.run.vpm078:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f -i a'
2014-09-11T17:06:49.000 INFO:tasks.ceph.mon.a:Started
2014-09-11T17:06:51.857 INFO:tasks.ceph.mon.a.vpm078.stderr:2014-09-11 20:06:51.856791 7f6fc03407a0 -1 load: jerasure load dlopen(/usr/lib64/ceph/erasure-code/libec_lrc.so): /usr/lib64/ceph/erasure-code/libec_lrc.so: cannot open shared object file: No such file or directory
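
To see which version a run actually pinned, the `yum -y install` line in teuthology.log can be parsed. The sketch below uses a hypothetical helper and assumes the last hyphen separates package name from version, which holds for the lines quoted here but not for full RPM name-version-release strings.

```python
import re

# Captures the package list from a logged "yum -y install ..." command,
# stopping at the closing quote that teuthology puts around the command.
YUM_INSTALL = re.compile(r"yum -y install (?P<packages>[^'\"]+)")


def pinned_versions(log_line):
    """Map package name to pinned version for one teuthology log line.

    Assumption: each token looks like <name>-<version>, split on the last
    hyphen. Tokens without a hyphen are skipped.
    """
    match = YUM_INSTALL.search(log_line)
    if not match:
        return {}
    versions = {}
    for token in match.group("packages").split():
        name, separator, version = token.rpartition("-")
        if separator:
            versions[name] = version
    return versions
```

Applied to the install line in the excerpt above, every package maps to 0.85, consistent with the run using the v0.85 tag rather than the tip of the giant branch.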

Actions #12

Updated by Loïc Dachary over 9 years ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 90 to 100

All rpm packages were eventually updated.
