Bug #9153

erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:

100%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-17_11:30:03-upgrade:dumpling-firefly-x-master-distro-basic-vps/431048/

ceph-osd.0.log.gz:2014-08-17 18:40:08.383475 7f068a8e0700 -1 *** Caught signal (Segmentation fault) **
ceph-osd.0.log.gz:     0> 2014-08-17 18:40:08.383475 7f068a8e0700 -1 *** Caught signal (Segmentation fault) **
ceph-osd.1.log.gz:2014-08-17 18:38:07.937758 7fd207819700 -1 *** Caught signal (Segmentation fault) **
ceph-osd.1.log.gz:     0> 2014-08-17 18:38:07.937758 7fd207819700 -1 *** Caught signal (Segmentation fault) **
537187608:2014-08-17 18:40:08.383475 7f068a8e0700 -1 *** Caught signal (Segmentation fault) **
537187693- in thread 7f068a8e0700
537187717-
537187718- ceph version 0.80.5-164-gcc4e625 (cc4e6258d67fb16d4a92c25078a0822a9849cd77)
537187795- 1: ceph-osd() [0x9b58c1]
537187821- 2: (()+0xf710) [0x7f06a3e24710]
537187854- 3: (memcpy()+0x15b) [0x7f06a2d4daab]
537187892- 4: (jerasure_matrix_dotprod()+0xc8) [0x7f067fd11618]
537187946- 5: (jerasure_matrix_encode()+0x75) [0x7f067fd11865]
537187999- 6: (ErasureCodeJerasureReedSolomonVandermonde::jerasure_encode(char**, char**, int)+0x21) [0x7f067fd294b1]
537188107- 7: (ErasureCodeJerasure::encode_chunks(std::set<int, std::less<int>, std::allocator<int> > const&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<int const, ceph::buffer::list> > >*)+0x607) [0x7f067fd2a807]

2014-08-17T15:59:24.501 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosWatchNotify.WatchNotifyTest
2014-08-17T15:59:26.096 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosWatchNotify.WatchNotifyTest (1595 ms)
2014-08-17T15:59:27.472 INFO:teuthology.orchestra.run.vpm096:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-08-17T15:59:27.688 INFO:tasks.workunit.client.0.vpm046.stdout:[----------] 1 test from LibRadosWatchNotify (1595 ms total)
2014-08-17T15:59:28.895 INFO:tasks.workunit.client.0.vpm046.stdout:
2014-08-17T15:59:28.895 INFO:tasks.workunit.client.0.vpm046.stdout:[----------] 1 test from LibRadosWatchNotifyEC
2014-08-17T15:59:28.897 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 18 pgs degraded; 18 pgs stale; 18 pgs stuck stale; 18 pgs stuck unclean; recovery 23/348 objects degraded (6.609%)
2014-08-17T15:59:29.897 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 39, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/ceph.py", line 1086, in restart
    healthy(ctx=ctx, config=None)
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/ceph.py", line 994, in healthy
    remote=mon0_remote,
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 790, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 127, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy'reached maximum tries (150) after waiting for 900 seconds
2014-08-17T15:59:29.938 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 98, in next
    result = self.results.get()
  File "/usr/lib/python2.7/dist-packages/gevent/queue.py", line 190, in get
    return waiter.get()
  File "/usr/lib/python2.7/dist-packages/gevent/hub.py", line 321, in get
    return get_hub().switch()
  File "/usr/lib/python2.7/dist-packages/gevent/hub.py", line 164, in switch
    return greenlet.switch(self)
GreenletExit
archive_path: /var/lib/teuthworker/archive/teuthology-2014-08-17_11:30:03-upgrade:dumpling-firefly-x-master-distro-basic-vps/431048
branch: master
description: upgrade:dumpling-firefly-x/parallel/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml}
  3-firefly-upgrade/firefly.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml
  test_rbd_api.yaml test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-type.yaml
  6-final-workload/{ec-readwrite.yaml rados-snaps-few-objects.yaml rados_loadgenmix.yaml
  rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_s3tests.yaml rgw_swift.yaml}
  distros/rhel_6.5.yaml}
email: ceph-qa@ceph.com
job_id: '431048'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-08-17_11:30:03-upgrade:dumpling-firefly-x-master-distro-basic-vps
nuke-on-error: true
os_type: rhel
os_version: '6.5'
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        osd heartbeat grace: 100
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
    sha1: 5045c5cb4c880255a1a5577c09b89d4be225bee9
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 5045c5cb4c880255a1a5577c09b89d4be225bee9
  rgw:
    default_idle_timeout: 1200
  s3tests:
    branch: master
  workunit:
    sha1: 5045c5cb4c880255a1a5577c09b89d4be225bee9
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
suite: upgrade:dumpling-firefly-x
suite_branch: master
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_master
targets:
  ubuntu@vpm046.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA36VeOFXt2Oe/FC8faHGyepp8rDHtaSV3y3PUJNJzTcAUYDTI5qmiMYtnVZMEnIGjhldz1dhCyf3MxswhjBiExbtqRXgiDKv2CGtg8JiZUio6Xxbeen4YbkEjgW7UpZiLTCD00cnL+UPkNquQUMphIkRnnwuVMMk4XgpRNSuiZovADs9tmnkl/8JfHaHqlhfd2zsY4dcbIHTPKuzGtvGT8hzWq94aIjnSlUvlQPeAbXWdfAp/zTes2MWYuRNq9addTX81SmcwfECn/XuZXk0RPGVkGYw5I3de05JpT+I1uCehba/Yah2vkdDKnSPONPJOdSYMbcmI+NNyKWlw9/y+jQ==
  ubuntu@vpm057.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA0T6oCd2r50UyKxBU9u+jbIH7WGeL8Asl9Fvwb0cE7Oo5QU+rHerWldMOSuS/nXp+zmIKnK51aLjJlayhFfbtQIq13lue/jvmVtSJZNuQaI77EPNt8zGpLJ1SjX/ThLYgMVwTjcXNumiKdmU6RH0npFrNKSd/Vn59C43r1Qz9EqOe2jjsRZP1yTqvMuRFZu5vklvAiaGoAsQbCt/yMA15BrNNhEdar8/n0sejR/a4ooyDzOkY0tQH0gUOHNQQinz1BAK6oGry5RYaHmmtTpJtLJxcqffURdru5wleRCuQWrv28cD8BP6ivAfF0Tr6bABvsa9DaYf3mx26WggPZ3uD4w==
  ubuntu@vpm096.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq1lqBHy10TXfEAFVLchqI/l474vB7Cgm6kCxewYuYySJQJ+Kre/Os4uFrsGNqykEx5AzAiUPZnYtwxLkBa9fgcTtdr48t93jIALAla1kz3F6p3WbB1ioRuQun66T18ZWHeDcTuI1bDhFX02ZOyf78fX/q1ddhnUZIMNaez5Rm57F4njTEq5pxUExeLmFnVfKJ6gswY93/HBoCAmizZrmihzMC8UzMV4q27j2AoZdcDj5bfbvswcrGdI/+Zg4TsnsZlhZKh7eLD+u+nfmxl8OkCKn2nlgksCrglE2JtxqtCqoW5IjSfaUS99w+Hd+eNVoIKtrXxdZ4bKdjfVZTmaShQ==
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- print: '**** done dumpling install'
- ceph:
    fs: xfs
- parallel:
  - workload
- print: '**** done parallel'
- install.upgrade:
    client.0:
      branch: firefly
    mon.a:
      branch: firefly
    mon.b:
      branch: firefly
- print: '**** done install.upgrade'
- ceph.restart: null
- print: '**** done restart'
- parallel:
  - workload2
  - upgrade-sequence
- print: '**** done parallel'
- install.upgrade:
    client.0: null
- print: '**** done install.upgrade client.0 to the version from teuthology-suite
    arg'
- rados:
    clients:
    - client.0
    ec_pool: true
    objects: 500
    op_weights:
      append: 45
      delete: 10
      read: 45
      write: 0
    ops: 4000
- rados:
    clients:
    - client.1
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- workunit:
    clients:
      client.1:
      - rados/load-gen-mix.sh
- mon_thrash:
    revive_delay: 20
    thrash_delay: 1
- workunit:
    clients:
      client.1:
      - rados/test.sh
- workunit:
    clients:
      client.1:
      - cls/test_cls_rbd.sh
- workunit:
    clients:
      client.1:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- rgw:
  - client.1
- s3tests:
    client.1:
      rgw_server: client.1
- swift:
    client.1:
      rgw_server: client.1
teuthology_branch: master
tube: vps
upgrade-sequence:
  sequential:
  - install.upgrade:
      mon.a: null
  - print: '**** done install.upgrade mon.a to the version from teuthology-suite arg'
  - install.upgrade:
      mon.b: null
  - print: '**** done install.upgrade mon.b to the version from teuthology-suite arg'
  - ceph.restart:
      daemons:
      - mon.a
      - mon.b
      - mon.c
      wait-for-healthy: true
  - sleep:
      duration: 60
  - ceph.restart:
      daemons:
      - osd.0
      - osd.1
      - osd.2
      - osd.3
      wait-for-healthy: true
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
  - sleep:
      duration: 60
  - exec:
      mon.a:
      - ceph osd crush tunables firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.4690
workload:
  sequential:
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done rados/test.sh &  cls'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh'
workload2:
  sequential:
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done #rados/test.sh and cls 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh 2'
description: upgrade:dumpling-firefly-x/parallel/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml}
  3-firefly-upgrade/firefly.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml
  test_rbd_api.yaml test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-type.yaml
  6-final-workload/{ec-readwrite.yaml rados-snaps-few-objects.yaml rados_loadgenmix.yaml
  rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_s3tests.yaml rgw_swift.yaml}
  distros/rhel_6.5.yaml}
duration: 5281.546662092209
failure_reason: '''wait_until_healthy''reached maximum tries (150) after waiting for
  900 seconds'
flavor: basic
owner: scheduled_teuthology@teuthology
success: false

Related issues

Related to Ceph - Feature #9167: erasure-code: check plugin version when loading it Resolved 08/19/2014
Related to Ceph - Bug #9170: erasure-code: preload erasure code plugins Resolved 08/19/2014
Related to Ceph - Bug #9186: erasure-code: conditionally preload isa plugin Duplicate 08/20/2014
Related to Ceph - Bug #9273: mon doesn't preload ec plugins; triggers crash in upgrade tests Resolved 08/28/2014

Associated revisions

Revision 9b802701 (diff)
Added by Loic Dachary over 9 years ago

erasure-code: preload the jerasure plugin

Load the jerasure plugin when ceph-osd starts to avoid the following
scenario:

  • ceph-osd-v1 is running but has not yet loaded jerasure
  • ceph-osd-v2 is being installed, which takes time: the files
    are installed before ceph-osd is restarted
  • ceph-osd-v1 is required to handle an erasure coded placement group and
    loads jerasure (the v2 version, which is not API compatible)
  • ceph-osd-v1 calls the v2 jerasure plugin, ends up referencing an
    unexpected part of the code and crashes

Although this problem shows up in the context of teuthology, it is unlikely
to happen on a real cluster because it involves upgrading immediately after
installing and running an OSD. Once the fix is backported to firefly, the
problem will not even happen in teuthology tests, because the upgrade from
firefly to master will use the firefly version that includes it.

While it would be possible to walk the plugin directory and preload
whatever it contains, that would not work for plugins such as jerasure,
which load other plugins depending on the CPU features, or for plugins
such as isa, which only work on specific CPUs.

http://tracker.ceph.com/issues/9153 Fixes: #9153

Backport: firefly
Signed-off-by: Loic Dachary <>
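
A minimal sketch of that idea, assuming a dlopen-based loader (the path, function name, and call site are illustrative, not the actual Ceph code): load the plugin eagerly while ceph-osd starts, so the copy mapped into the process is the one the running binary was built against, whatever a later package upgrade puts on disk.

// Sketch only -- illustrative names, not the Ceph implementation.
#include <dlfcn.h>

#include <iostream>
#include <stdexcept>
#include <string>

// Load libec_<name>.so eagerly. Once mapped, replacing the file on disk no
// longer affects this process; a later dlopen() of the same path normally
// returns the already-loaded handle instead of re-reading the file.
static void preload_erasure_code_plugin(const std::string &dir,
                                        const std::string &name) {
  const std::string path = dir + "/libec_" + name + ".so";
  if (!dlopen(path.c_str(), RTLD_NOW | RTLD_GLOBAL))
    throw std::runtime_error("preload of " + path + " failed: " + dlerror());
}

int main() {
  // Hypothetical call site: early in ceph-osd startup, before any placement
  // group work can trigger a lazy plugin load.
  try {
    preload_erasure_code_plugin("/usr/lib64/ceph/erasure-code", "jerasure");
  } catch (const std::exception &e) {
    std::cerr << e.what() << std::endl;
    return 1;
  }
  std::cout << "jerasure preloaded" << std::endl;
  return 0;
}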

Revision 164f1a19 (diff)
Added by Loic Dachary over 9 years ago

erasure-code: preload the jerasure plugin

Load the jerasure plugin when ceph-osd starts to avoid the following
scenario:

  • ceph-osd-v1 is running but has not yet loaded jerasure
  • ceph-osd-v2 is being installed, which takes time: the files
    are installed before ceph-osd is restarted
  • ceph-osd-v1 is required to handle an erasure coded placement group and
    loads jerasure (the v2 version, which is not API compatible)
  • ceph-osd-v1 calls the v2 jerasure plugin, ends up referencing an
    unexpected part of the code and crashes

Although this problem shows up in the context of teuthology, it is unlikely
to happen on a real cluster because it involves upgrading immediately after
installing and running an OSD. Once the fix is backported to firefly, the
problem will not even happen in teuthology tests, because the upgrade from
firefly to master will use the firefly version that includes it.

While it would be possible to walk the plugin directory and preload
whatever it contains, that would not work for plugins such as jerasure,
which load other plugins depending on the CPU features, or for plugins
such as isa, which only work on specific CPUs.

http://tracker.ceph.com/issues/9153 Fixes: #9153

Backport: firefly
Signed-off-by: Loic Dachary <>
(cherry picked from commit 9b802701f78288ba4f706c65b853415c69002d27)

Conflicts:
src/test/erasure-code/test-erasure-code.sh
src/common/config_opts.h

Revision 8d7e77b9 (diff)
Added by Loic Dachary over 9 years ago

erasure-code: preload the jerasure plugin variant (sse4,sse3,generic)

Preloading the jerasure plugin dlopens the plugin that is in charge of
selecting the variant optimized for the CPU (sse4, sse3, generic). The
variant plugin itself is not loaded, because that happens not at load()
but when the factory() method is called.

The JerasurePlugin::preload method is modified to call the factory()
method to load jerasure_sse4 or jerasure_sse3 or jerasure_generic as a
side effect.

Indirectly loading another plugin in the factory() method is error prone
and should be moved to the load() method instead. This change should be
done in a separate commit.

http://tracker.ceph.com/issues/9153 Fixes: #9153

Signed-off-by: Loic Dachary <>

Revision efc8bfd1 (diff)
Added by Loic Dachary over 9 years ago

erasure-code: jerasure preloads the plugin variant

The variant selection depending on the available CPU features is
encapsulated in a helper. The helper is used in the factory() method and
in the load() method.

The factory() method may load a variant that is not the default, for
benchmark purposes. Such a variant is not preloaded by the load() method,
and upgrading while running may be problematic. However, a non-standard
variant is only used for benchmarking, and upgrades in that context are
not a concern.

http://tracker.ceph.com/issues/9153 Fixes: #9153

Signed-off-by: Loic Dachary <>
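
A rough sketch of the shape the two revisions above describe (class, method, and variant names are illustrative, not the actual Ceph source; __builtin_cpu_supports is a GCC/Clang x86 builtin standing in for the plugin's CPU probing): one helper decides which variant matches the CPU, and both load(), used for preloading, and factory(), used when a pool needs a coder, go through it, so the preloaded variant is the same one later I/O will use.

// Sketch only -- not the Ceph implementation.
#include <iostream>
#include <string>

class JerasurePluginSketch {
public:
  // Single place that maps CPU features to a variant plugin name.
  std::string variant() const {
    if (__builtin_cpu_supports("sse4.2")) return "jerasure_sse4";
    if (__builtin_cpu_supports("ssse3"))  return "jerasure_sse3";
    return "jerasure_generic";
  }

  // Called when the plugin is preloaded at daemon startup.
  int load()    { return open_variant(variant()); }

  // Called when an erasure coded pool actually needs a coder.
  int factory() { return open_variant(variant()); }

private:
  int open_variant(const std::string &name) {
    // The real plugin dlopen()s libec_<name>.so here; this sketch only
    // reports which variant would be loaded.
    std::cout << "would load libec_" << name << ".so" << std::endl;
    return 0;
  }
};

int main() {
  JerasurePluginSketch plugin;
  plugin.load();  // preloading pulls in the same variant factory() would use
  return 0;
}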

History

#1 Updated by Sage Weil over 9 years ago

  • Assignee set to Loïc Dachary

Loic, can you take a look?

#2 Updated by Loïc Dachary over 9 years ago

Ack

#3 Updated by Loïc Dachary over 9 years ago

  • Subject changed from osd crash in upgrade:dumpling-firefly-x-master-distro-basic-vps suite to erasure-code: jerasure_matrix_dotprod segmentation fault
  • Category set to OSD
  • Status changed from New to In Progress

As soon as VPS machines are available, lock three and run the job again, hoping to reproduce it:

teuthology-lock --lock-many 3 --machine-type vps --owner loic@dachary.org --os-type rhel --os-version 6.5

#4 Updated by Loïc Dachary over 9 years ago

Got three VPS with RHEL 6.5 installed; running the job on them with "nuke-on-error" disabled.

#5 Updated by Loïc Dachary over 9 years ago

The stack trace is bizarre: ECUtil::decode calls ErasureCodeJerasure::encode_chunks, which makes no sense because a) decoding does not involve encoding and b) ECUtil::decode does not call the *_chunks methods.

 4: (jerasure_matrix_dotprod()+0xc8) [0x7f067fd11618]
 5: (jerasure_matrix_encode()+0x75) [0x7f067fd11865]
 6: (ErasureCodeJerasureReedSolomonVandermonde::jerasure_encode(char**, char**, int)+0x21) [0x7f067fd294b1]
 7: (ErasureCodeJerasure::encode_chunks(std::set<int, std::less<int>, std::allocator<int> > const&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<int const, ceph::buffer::list> > >*)+0x607) [0x7f067fd2a807]
 8: (ECUtil::decode(ECUtil::stripe_info_t const&, std::tr1::shared_ptr<ceph::ErasureCodeInterface>&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<int const, ceph::buffer::list> > >&, ceph::buffer::list*)+0x34c) [0x9b16bc]
 9: (CallClientContexts::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x270) [0x967d90]
 10: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x963719]
 11: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x6c) [0x95260c]

#6 Updated by Loïc Dachary over 9 years ago

The upgrade sequence

  • dumpling
  • firefly -> installs and loads the jerasure plugin
  • master -> installs an updated jerasure plugin (the *_chunks methods are only found in master, not in firefly) while the OSD is using it

Could it be that the running OSD process gets to use the new jerasure plugin as soon as it is installed?
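
That is what dlopen() makes possible: a process that defers loading a shared object until it is first needed maps whatever file is on disk at that moment. A minimal sketch of the suspected race (the path and the timing are illustrative, not taken from the logs):

// Sketch of the suspected race -- not Ceph code. The daemon runs for a while
// without touching the plugin; meanwhile the package manager replaces
// libec_jerasure.so on disk with an ABI-incompatible version.
#include <dlfcn.h>

#include <unistd.h>

#include <cstdio>

int main() {
  // ... the OSD serves replicated pools, never touching erasure code ...
  sleep(300);  // during this window, the upgrade swaps the plugin file

  // First request on an erasure coded pool triggers the lazy load: this maps
  // the *new* plugin into the *old* process.
  void *handle = dlopen("/usr/lib64/ceph/erasure-code/libec_jerasure.so",
                        RTLD_NOW | RTLD_GLOBAL);
  if (!handle) {
    std::fprintf(stderr, "dlopen: %s\n", dlerror());
    return 1;
  }
  // Any call through the old ErasureCodeInterface into the new plugin can now
  // end up in the wrong code and segfault, as in the backtrace above.
  return 0;
}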

#7 Updated by Loïc Dachary over 9 years ago

If the ceph-libs package is upgraded before the ceph package, it is entirely possible that the shared library is replaced while the ceph-osd daemon runs. The plugin version is the same, so the file will be replaced.

[ubuntu@vpm179 ~]$ repoquery -q -l ceph-libs
/usr/lib/ceph/erasure-code
/usr/lib/ceph/erasure-code/libec_example.so
/usr/lib/ceph/erasure-code/libec_example.so.0
/usr/lib/ceph/erasure-code/libec_example.so.0.0.0
/usr/lib/ceph/erasure-code/libec_fail_to_initialize.so
/usr/lib/ceph/erasure-code/libec_fail_to_initialize.so.0
/usr/lib/ceph/erasure-code/libec_fail_to_initialize.so.0.0.0
/usr/lib/ceph/erasure-code/libec_fail_to_register.so
/usr/lib/ceph/erasure-code/libec_fail_to_register.so.0
/usr/lib/ceph/erasure-code/libec_fail_to_register.so.0.0.0
/usr/lib/ceph/erasure-code/libec_hangs.so
/usr/lib/ceph/erasure-code/libec_hangs.so.0
/usr/lib/ceph/erasure-code/libec_hangs.so.0.0.0
/usr/lib/ceph/erasure-code/libec_jerasure.so
/usr/lib/ceph/erasure-code/libec_jerasure.so.2
/usr/lib/ceph/erasure-code/libec_jerasure.so.2.0.0
/usr/lib/ceph/erasure-code/libec_missing_entry_point.so
/usr/lib/ceph/erasure-code/libec_missing_entry_point.so.0
/usr/lib/ceph/erasure-code/libec_missing_entry_point.so.0.0.0
/usr/lib/librados.so.2
/usr/lib/librados.so.2.0.0
/usr/lib/librbd.so.1
/usr/lib/librbd.so.1.0.0
/usr/lib/rados-classes
/usr/lib/rados-classes/libcls_hello.so
/usr/lib/rados-classes/libcls_kvs.so
/usr/lib/rados-classes/libcls_lock.so
/usr/lib/rados-classes/libcls_log.so
/usr/lib/rados-classes/libcls_rbd.so
/usr/lib/rados-classes/libcls_refcount.so
/usr/lib/rados-classes/libcls_replica_log.so
/usr/lib/rados-classes/libcls_rgw.so
/usr/lib/rados-classes/libcls_statelog.so
/usr/lib/rados-classes/libcls_user.so
/usr/lib/rados-classes/libcls_user.so.1
/usr/lib/rados-classes/libcls_user.so.1.0.0
/usr/lib/rados-classes/libcls_version.so
/usr/share/doc/ceph-libs-0.81.0
/usr/share/doc/ceph-libs-0.81.0/COPYING
/usr/lib64/ceph/erasure-code
/usr/lib64/ceph/erasure-code/libec_example.so
/usr/lib64/ceph/erasure-code/libec_example.so.0
/usr/lib64/ceph/erasure-code/libec_example.so.0.0.0
/usr/lib64/ceph/erasure-code/libec_fail_to_initialize.so
/usr/lib64/ceph/erasure-code/libec_fail_to_initialize.so.0
/usr/lib64/ceph/erasure-code/libec_fail_to_initialize.so.0.0.0
/usr/lib64/ceph/erasure-code/libec_fail_to_register.so
/usr/lib64/ceph/erasure-code/libec_fail_to_register.so.0
/usr/lib64/ceph/erasure-code/libec_fail_to_register.so.0.0.0
/usr/lib64/ceph/erasure-code/libec_hangs.so
/usr/lib64/ceph/erasure-code/libec_hangs.so.0
/usr/lib64/ceph/erasure-code/libec_hangs.so.0.0.0
/usr/lib64/ceph/erasure-code/libec_jerasure.so
/usr/lib64/ceph/erasure-code/libec_jerasure.so.2
/usr/lib64/ceph/erasure-code/libec_jerasure.so.2.0.0
/usr/lib64/ceph/erasure-code/libec_missing_entry_point.so
/usr/lib64/ceph/erasure-code/libec_missing_entry_point.so.0
/usr/lib64/ceph/erasure-code/libec_missing_entry_point.so.0.0.0
/usr/lib64/librados.so.2
/usr/lib64/librados.so.2.0.0
/usr/lib64/librbd.so.1
/usr/lib64/librbd.so.1.0.0
/usr/lib64/rados-classes
/usr/lib64/rados-classes/libcls_hello.so
/usr/lib64/rados-classes/libcls_kvs.so
/usr/lib64/rados-classes/libcls_lock.so
/usr/lib64/rados-classes/libcls_log.so
/usr/lib64/rados-classes/libcls_rbd.so
/usr/lib64/rados-classes/libcls_refcount.so
/usr/lib64/rados-classes/libcls_replica_log.so
/usr/lib64/rados-classes/libcls_rgw.so
/usr/lib64/rados-classes/libcls_statelog.so
/usr/lib64/rados-classes/libcls_user.so
/usr/lib64/rados-classes/libcls_user.so.1
/usr/lib64/rados-classes/libcls_user.so.1.0.0
/usr/lib64/rados-classes/libcls_version.so
/usr/share/doc/ceph-libs-0.81.0
/usr/share/doc/ceph-libs-0.81.0/COPYING

#8 Updated by Loïc Dachary over 9 years ago

  • Status changed from In Progress to Fix Under Review

#9 Updated by Loïc Dachary over 9 years ago

It looks like the ceph-libs package is not upgraded, which explains the core dump: master cannot successfully load and run a jerasure plugin from firefly because the ErasureCodeInterface interface changed.

2014-08-17T15:23:55.819 INFO:teuthology.run_tasks:Running task install.upgrade...
2014-08-17T15:23:55.819 INFO:teuthology.task.install:project ceph config {'mon.a': {'branch': 'firefly'}, 'mon.b': {'branch': 'firefly'}, 'client.0': {'branch': 'firefly'}} overrides {'sha1': '5045c5cb4c880255a1a5577c09b89d4be225bee9'}
2014-08-17T15:23:55.820 INFO:teuthology.task.install:extra packages: []
2014-08-17T15:23:55.820 INFO:teuthology.task.install:config contains sha1|tag|branch, removing those keys from override
2014-08-17T15:23:55.820 INFO:teuthology.task.install:remote ubuntu@vpm046.front.sepia.ceph.com config {'branch': 'firefly'}
2014-08-17T15:23:55.820 INFO:teuthology.orchestra.run.vpm046:Running: 'sudo lsb_release -is'
2014-08-17T15:23:56.264 DEBUG:teuthology.misc:System to be installed: RedHatEnterpriseServer
2014-08-17T15:23:56.264 INFO:teuthology.task.install:Upgrading ceph rpm packages: ceph-debuginfo, ceph-radosgw, ceph-test, ceph-devel, ceph, ceph-fuse, rest-bench, libcephfs_jni1, libcephfs1, python-ceph

#10 Updated by Loïc Dachary over 9 years ago

The ceph-libs package is obsolete and the jerasure plugin now lives in the ceph package. The problem does not come from the lack of upgrading.

#11 Updated by Loïc Dachary over 9 years ago

Trying a manual upgrade

[ubuntu@vpm179 ~]$ cat /etc/yum.repos.d/ceph.repo
[ceph-6-repo]
name=Ceph
baseurl=http://gitbuilder.ceph.com/ceph-rpm-rhel6_5-x86_64-basic/sha1/5045c5cb4c880255a1a5577c09b89d4be225bee9/x86_64/
gpgcheck=0
enabled=1
[ubuntu@vpm179 ~]$ yum info ceph
Loaded plugins: priorities
centos6-apache-ceph                                                                                                                                      |  951 B     00:00     
centos6-fcgi-ceph                                                                                                                                        |  951 B     00:00     
centos6-misc-ceph                                                                                                                                        |  951 B     00:00     
centos6-qemu-ceph                                                                                                                                        |  951 B     00:00     
epel                                                                                                                                                     | 3.0 kB     00:00     
267 packages excluded due to repository priority protections
Available Packages
Name        : ceph
Arch        : i686
Version     : 0.81.0
Release     : 5.el6
Size        : 19 M
Repo        : epel
Summary     : User space components of the Ceph file system
URL         : https://ceph.com/
License     : LGPLv2
Description : Ceph is a distributed network file system designed to provide excellent
            : performance, reliability, and scalability.

Name        : ceph
Arch        : x86_64
Version     : 0.83
Release     : 777.g5045c5c.el6
Size        : 12 M
Repo        : ceph-6-repo
Summary     : User space components of the Ceph file system
URL         : http://ceph.com/
License     : GPL-2.0
Description : Ceph is a massively scalable, open-source, distributed
            : storage system that runs on commodity hardware and delivers object,
            : block and file system storage.
[ubuntu@vpm179 ~]$ yum install ceph
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package ceph.x86_64 0:0.83-777.g5045c5c.el6 will be installed
--> Processing Dependency: librbd1 = 0.83-777.g5045c5c.el6 for package: ceph-0.83-777.g5045c5c.el6.x86_64
--> Processing Dependency: librados2 = 0.83-777.g5045c5c.el6 for package: ceph-0.83-777.g5045c5c.el6.x86_64
--> Processing Dependency: libcephfs1 = 0.83-777.g5045c5c.el6 for package: ceph-0.83-777.g5045c5c.el6.x86_64
--> Processing Dependency: ceph-common = 0.83-777.g5045c5c.el6 for package: ceph-0.83-777.g5045c5c.el6.x86_64
--> Processing Dependency: python-ceph for package: ceph-0.83-777.g5045c5c.el6.x86_64
--> Processing Dependency: libcephfs.so.1()(64bit) for package: ceph-0.83-777.g5045c5c.el6.x86_64
--> Running transaction check
---> Package ceph-common.x86_64 0:0.83-777.g5045c5c.el6 will be installed
---> Package libcephfs1.x86_64 0:0.83-777.g5045c5c.el6 will be installed
---> Package librados2.x86_64 0:0.67.10-5.gc7948af.el6 will be updated
---> Package librados2.x86_64 0:0.83-777.g5045c5c.el6 will be an update
---> Package librbd1.x86_64 0:0.67.10-5.gc7948af.el6 will be updated
---> Package librbd1.x86_64 0:0.83-777.g5045c5c.el6 will be an update
---> Package python-ceph.x86_64 0:0.83-777.g5045c5c.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================================================================================================================
 Package                                  Arch                                Version                                            Repository                                Size
================================================================================================================================================================================
Installing:
 ceph                                     x86_64                              0.83-777.g5045c5c.el6                              ceph-6-repo                               12 M
Installing for dependencies:
 ceph-common                              x86_64                              0.83-777.g5045c5c.el6                              ceph-6-repo                              5.3 M
 libcephfs1                               x86_64                              0.83-777.g5045c5c.el6                              ceph-6-repo                              1.6 M
 python-ceph                              x86_64                              0.83-777.g5045c5c.el6                              ceph-6-repo                               69 k
Updating for dependencies:
 librados2                                x86_64                              0.83-777.g5045c5c.el6                              ceph-6-repo                              1.5 M
 librbd1                                  x86_64                              0.83-777.g5045c5c.el6                              ceph-6-repo                              351 k

Transaction Summary
================================================================================================================================================================================
Install       4 Package(s)
Upgrade       2 Package(s)

Total download size: 21 M
Is this ok [y/N]: y
Is this ok [y/N]: y
Downloading Packages:
(1/6): ceph-0.83-777.g5045c5c.el6.x86_64.rpm                                                                                                             |  12 MB     00:01     
(2/6): ceph-common-0.83-777.g5045c5c.el6.x86_64.rpm                                                                                                      | 5.3 MB     00:00     
(3/6): libcephfs1-0.83-777.g5045c5c.el6.x86_64.rpm                                                                                                       | 1.6 MB     00:00     
(4/6): librados2-0.83-777.g5045c5c.el6.x86_64.rpm                                                                                                        | 1.5 MB     00:00     
(5/6): librbd1-0.83-777.g5045c5c.el6.x86_64.rpm                                                                                                          | 351 kB     00:00     
(6/6): python-ceph-0.83-777.g5045c5c.el6.x86_64.rpm                                                                                                      |  69 kB     00:00     
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                            11 MB/s |  21 MB     00:01     
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Updating   : librados2-0.83-777.g5045c5c.el6.x86_64                                                                                                                       1/8 
  Updating   : librbd1-0.83-777.g5045c5c.el6.x86_64                                                                                                                         2/8 
  Installing : python-ceph-0.83-777.g5045c5c.el6.x86_64                                                                                                                     3/8 
  Installing : ceph-common-0.83-777.g5045c5c.el6.x86_64                                                                                                                     4/8 
  Installing : libcephfs1-0.83-777.g5045c5c.el6.x86_64                                                                                                                      5/8 
  Installing : ceph-0.83-777.g5045c5c.el6.x86_64                                                                                                                            6/8 
  Cleanup    : librbd1-0.67.10-5.gc7948af.el6.x86_64                                                                                                                        7/8 
  Cleanup    : librados2-0.67.10-5.gc7948af.el6.x86_64                                                                                                                      8/8 
  Verifying  : librbd1-0.83-777.g5045c5c.el6.x86_64                                                                                                                         1/8 
  Verifying  : ceph-common-0.83-777.g5045c5c.el6.x86_64                                                                                                                     2/8 
  Verifying  : ceph-0.83-777.g5045c5c.el6.x86_64                                                                                                                            3/8 
  Verifying  : librados2-0.83-777.g5045c5c.el6.x86_64                                                                                                                       4/8 
  Verifying  : python-ceph-0.83-777.g5045c5c.el6.x86_64                                                                                                                     5/8 
  Verifying  : libcephfs1-0.83-777.g5045c5c.el6.x86_64                                                                                                                      6/8 
  Verifying  : librbd1-0.67.10-5.gc7948af.el6.x86_64                                                                                                                        7/8 
  Verifying  : librados2-0.67.10-5.gc7948af.el6.x86_64                                                                                                                      8/8 

Installed:
  ceph.x86_64 0:0.83-777.g5045c5c.el6                                                                                                                                           

Dependency Installed:
  ceph-common.x86_64 0:0.83-777.g5045c5c.el6                libcephfs1.x86_64 0:0.83-777.g5045c5c.el6                python-ceph.x86_64 0:0.83-777.g5045c5c.el6               

Dependency Updated:
  librados2.x86_64 0:0.83-777.g5045c5c.el6                                                librbd1.x86_64 0:0.83-777.g5045c5c.el6                                               

Complete!

#12 Updated by Loïc Dachary over 9 years ago

Here is the part of the teuthology log dealing with the upgrade, which is immediately followed by a core dump from osd.0 and osd.1

2014-08-17T15:34:18.837 INFO:teuthology.task.sequential:In sequential, running task install.upgrade...
2014-08-17T15:34:18.837 INFO:teuthology.task.install:project ceph config {'mon.a': None} overrides {'sha1': '5045c5cb4c880255a1a5577c09b89d4be225bee9'}
2014-08-17T15:34:18.838 INFO:teuthology.task.install:extra packages: []
2014-08-17T15:34:18.838 INFO:teuthology.task.install:remote ubuntu@vpm096.front.sepia.ceph.com config {'sha1': '5045c5cb4c880255a1a5577c09b89d4be225bee9'}
2014-08-17T15:34:18.838 INFO:teuthology.orchestra.run.vpm096:Running: 'sudo lsb_release -is'
2014-08-17T15:34:18.979 DEBUG:teuthology.misc:System to be installed: RedHatEnterpriseServer
2014-08-17T15:34:18.979 INFO:teuthology.task.install:Upgrading ceph rpm packages: ceph-debuginfo, ceph-radosgw, ceph-test, ceph-devel, ceph, ceph-fuse, rest-bench, libcephfs_jni1, libcephfs1, python-ceph
2014-08-17T15:34:18.979 INFO:teuthology.orchestra.run.vpm096:Running: 'arch'
2014-08-17T15:34:19.012 INFO:teuthology.orchestra.run.vpm096:Running: 'lsb_release -is'
2014-08-17T15:34:19.126 INFO:teuthology.orchestra.run.vpm096:Running: 'lsb_release -rs'
2014-08-17T15:34:19.171 INFO:teuthology.task.install:config is {'project': 'ceph', 'sha1': '5045c5cb4c880255a1a5577c09b89d4be225bee9'}
2014-08-17T15:34:19.171 INFO:teuthology.task.install:Host vpm096 is: RedHatEnterpriseServer 6.5 x86_64
2014-08-17T15:34:19.171 INFO:teuthology.orchestra.run.vpm096:Running: 'arch'
2014-08-17T15:34:19.197 INFO:teuthology.orchestra.run.vpm096:Running: 'lsb_release -is'
2014-08-17T15:34:19.307 INFO:teuthology.orchestra.run.vpm096:Running: 'lsb_release -rs'
2014-08-17T15:34:19.350 INFO:teuthology.task.install:config is {'project': 'ceph', 'sha1': '5045c5cb4c880255a1a5577c09b89d4be225bee9'}
2014-08-17T15:34:19.350 INFO:teuthology.orchestra.run.vpm096:Running: 'sudo lsb_release -is'
2014-08-17T15:34:19.383 INFO:teuthology.orchestra.run.vpm046.stderr:stat: cannot stat `/home/ubuntu/cephtest/mnt.0': No such file or directory
2014-08-17T15:34:19.383 INFO:teuthology.orchestra.run.vpm046:Running: 'mkdir -- /home/ubuntu/cephtest/mnt.0'
2014-08-17T15:34:19.420 INFO:tasks.workunit:Created dir /home/ubuntu/cephtest/mnt.0
2014-08-17T15:34:19.420 INFO:teuthology.orchestra.run.vpm046:Running: 'cd -- /home/ubuntu/cephtest/mnt.0 && mkdir -- client.0'
2014-08-17T15:34:19.467 DEBUG:teuthology.misc:System to be installed: RedHatEnterpriseServer
2014-08-17T15:34:19.467 INFO:teuthology.task.install:Repo base URL: http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/sha1/5045c5cb4c880255a1a5577c09b89d4be225bee9
2014-08-17T15:34:19.467 INFO:teuthology.orchestra.run.vpm096:Running: 'wget -q -O- http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/sha1/5045c5cb4c880255a1a5577c09b89d4be225bee9/version'
2014-08-17T15:34:19.519 INFO:teuthology.orchestra.run.vpm046:Running: "mkdir -- /home/ubuntu/cephtest/workunit.client.0 && git archive --remote=git://ceph.newdream.net/git/ceph.git firefly:qa/workunits | tar -C /home/ubuntu/cephtest/workunit.client.0 -x -f- && cd -- /home/ubuntu/cephtest/workunit.client.0 && if test -e Makefile ; then make ; fi && find -executable -type f -printf '%P\\0' >/home/ubuntu/cephtest/workunits.list" 
2014-08-17T15:34:19.610 INFO:teuthology.orchestra.run.vpm096:Running: 'sudo rpm -ev ceph-release'
2014-08-17T15:34:20.113 INFO:teuthology.orchestra.run.vpm096:Running: 'sudo rpm -Uv http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/sha1/5045c5cb4c880255a1a5577c09b89d4be225bee9/noarch/ceph-release-1-0.el6.noarch.rpm'
2014-08-17T15:34:20.269 INFO:teuthology.orchestra.run.vpm096:Running: 'arch'
2014-08-17T15:34:20.291 INFO:tasks.workunit.client.0.vpm046.stdout:for d in direct_io fs ; do ( cd $d ; make all ) ; done
2014-08-17T15:34:20.294 INFO:teuthology.orchestra.run.vpm096:Running: 'lsb_release -is'
2014-08-17T15:34:20.298 INFO:tasks.workunit.client.0.vpm046.stdout:make[1]: Entering directory `/home/ubuntu/cephtest/workunit.client.0/direct_io'
2014-08-17T15:34:20.298 INFO:tasks.workunit.client.0.vpm046.stdout:cc -Wall -Wextra -D_GNU_SOURCE direct_io_test.c -o direct_io_test
2014-08-17T15:34:20.406 INFO:teuthology.orchestra.run.vpm096:Running: 'lsb_release -rs'
2014-08-17T15:34:20.448 INFO:teuthology.task.install:config is {'project': 'ceph', 'sha1': '5045c5cb4c880255a1a5577c09b89d4be225bee9'}
2014-08-17T15:34:20.449 INFO:teuthology.orchestra.run.vpm096:Running: "sudo sed -i -e ':a;N;$!ba;s/enabled=1\\ngpg/enabled=1\\npriority=1\\ngpg/g' -e 's;ref/[a-zA-Z0-9_]*/;sha1/5045c5cb4c880255a1a5577c09b89d4be225bee9/;g' /etc/yum.repos.d/ceph.repo" 
2014-08-17T15:34:20.546 INFO:teuthology.orchestra.run.vpm096:Running: 'sudo yum clean all'
2014-08-17T15:34:20.576 INFO:tasks.workunit.client.0.vpm046.stdout:cc -Wall -Wextra -D_GNU_SOURCE test_sync_io.c -o test_sync_io
2014-08-17T15:34:20.636 INFO:tasks.workunit.client.0.vpm046.stdout:cc -Wall -Wextra -D_GNU_SOURCE test_short_dio_read.c -o test_short_dio_read
2014-08-17T15:34:20.684 INFO:tasks.workunit.client.0.vpm046.stdout:make[1]: Leaving directory `/home/ubuntu/cephtest/workunit.client.0/direct_io'
2014-08-17T15:34:20.686 INFO:tasks.workunit.client.0.vpm046.stdout:make[1]: Entering directory `/home/ubuntu/cephtest/workunit.client.0/fs'
2014-08-17T15:34:20.686 INFO:tasks.workunit.client.0.vpm046.stdout:cc -Wall -Wextra -D_GNU_SOURCE test_o_trunc.c -o test_o_trunc
2014-08-17T15:34:20.731 INFO:tasks.workunit.client.0.vpm046.stdout:make[1]: Leaving directory `/home/ubuntu/cephtest/workunit.client.0/fs'
2014-08-17T15:34:20.832 INFO:tasks.workunit:Running workunits matching rados/test.sh on client.0...
2014-08-17T15:34:20.832 INFO:tasks.workunit:Running workunit rados/test.sh...
2014-08-17T15:34:20.832 INFO:teuthology.orchestra.run.vpm046:Running: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=firefly TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.0/rados/test.sh'
2014-08-17T15:34:20.887 INFO:tasks.workunit.client.0.vpm046.stderr:+ ceph_test_rados_api_aio
2014-08-17T15:34:21.104 INFO:tasks.workunit.client.0.vpm046.stdout:Running main() from gtest_main.cc
2014-08-17T15:34:21.104 INFO:tasks.workunit.client.0.vpm046.stdout:[==========] Running 62 tests from 2 test cases.
2014-08-17T15:34:21.104 INFO:tasks.workunit.client.0.vpm046.stdout:[----------] Global test environment set-up.
2014-08-17T15:34:21.104 INFO:tasks.workunit.client.0.vpm046.stdout:[----------] 31 tests from LibRadosAio
2014-08-17T15:34:21.105 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.SimpleWrite
2014-08-17T15:34:21.809 INFO:teuthology.orchestra.run.vpm096.stdout:Loaded plugins: priorities
2014-08-17T15:34:21.993 INFO:teuthology.orchestra.run.vpm096.stdout:Cleaning repos: Ceph Ceph-noarch centos6-apache-ceph centos6-fcgi-ceph
2014-08-17T15:34:21.993 INFO:teuthology.orchestra.run.vpm096.stdout:              : centos6-misc-ceph centos6-qemu-ceph ceph-source epel local
2014-08-17T15:34:21.993 INFO:teuthology.orchestra.run.vpm096.stdout:              : rhel-6-repo
2014-08-17T15:34:21.993 INFO:teuthology.orchestra.run.vpm096.stdout:Cleaning up Everything
2014-08-17T15:34:22.154 INFO:teuthology.orchestra.run.vpm096:Running: 'sudo yum -y install ceph-debuginfo-0.83 ceph-radosgw-0.83 ceph-test-0.83 ceph-devel-0.83 ceph-0.83 ceph-fuse-0.83 rest-bench-0.83 libcephfs_jni1-0.83 libcephfs1-0.83 python-ceph-0.83'
2014-08-17T15:34:24.081 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.SimpleWrite (2973 ms)
2014-08-17T15:34:24.081 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.SimpleWritePP
2014-08-17T15:34:30.748 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.SimpleWritePP (6669 ms)
2014-08-17T15:34:30.748 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.WaitForSafe
2014-08-17T15:34:35.324 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.WaitForSafe (4575 ms)
2014-08-17T15:34:35.324 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.WaitForSafePP
2014-08-17T15:34:38.432 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.WaitForSafePP (3108 ms)
2014-08-17T15:34:38.432 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.RoundTrip
2014-08-17T15:34:42.861 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.RoundTrip (4428 ms)
2014-08-17T15:34:42.861 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.RoundTrip2
2014-08-17T15:34:45.999 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.RoundTrip2 (3139 ms)
2014-08-17T15:34:45.999 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.RoundTripPP
2014-08-17T15:34:49.554 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.RoundTripPP (3555 ms)
2014-08-17T15:34:49.554 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.RoundTripPP2
2014-08-17T15:34:52.679 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.RoundTripPP2 (3124 ms)
2014-08-17T15:34:52.679 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.RoundTripAppend
2014-08-17T15:34:58.678 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.RoundTripAppend (5999 ms)
2014-08-17T15:34:58.678 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.RoundTripAppendPP
2014-08-17T15:35:04.393 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.RoundTripAppendPP (5715 ms)
2014-08-17T15:35:04.393 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.IsComplete
2014-08-17T15:35:08.605 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.IsComplete (4213 ms)
2014-08-17T15:35:08.605 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.IsCompletePP
2014-08-17T15:35:18.239 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.IsCompletePP (9633 ms)
2014-08-17T15:35:18.239 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.IsSafe
2014-08-17T15:35:22.460 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.IsSafe (4222 ms)
2014-08-17T15:35:22.460 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.IsSafePP
2014-08-17T15:35:25.596 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.IsSafePP (3136 ms)
2014-08-17T15:35:25.596 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.ReturnValue
2014-08-17T15:35:30.148 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.ReturnValue (4551 ms)
2014-08-17T15:35:30.148 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.ReturnValuePP
2014-08-17T15:35:36.744 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.ReturnValuePP (6596 ms)
2014-08-17T15:35:36.744 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.Flush
2014-08-17T15:35:41.510 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.Flush (4767 ms)
2014-08-17T15:35:41.511 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.FlushPP
2014-08-17T15:35:46.514 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.FlushPP (5003 ms)
2014-08-17T15:35:46.514 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.FlushAsync
2014-08-17T15:35:52.453 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.FlushAsync (5939 ms)
2014-08-17T15:35:52.453 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.FlushAsyncPP
2014-08-17T15:35:57.296 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.FlushAsyncPP (4843 ms)
2014-08-17T15:35:57.296 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.RoundTripWriteFull
2014-08-17T15:36:02.400 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.RoundTripWriteFull (5104 ms)
2014-08-17T15:36:02.400 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.RoundTripWriteFullPP
2014-08-17T15:36:09.884 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.RoundTripWriteFullPP (7485 ms)
2014-08-17T15:36:09.884 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.SimpleStat
2014-08-17T15:36:14.485 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.SimpleStat (4600 ms)
2014-08-17T15:36:14.485 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.SimpleStatPP
2014-08-17T15:36:18.890 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.SimpleStatPP (4407 ms)
2014-08-17T15:36:18.890 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.SimpleStatNS
2014-08-17T15:36:23.457 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.SimpleStatNS (4566 ms)
2014-08-17T15:36:23.457 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.SimpleStatPPNS
2014-08-17T15:36:28.299 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.SimpleStatPPNS (4841 ms)
2014-08-17T15:36:28.299 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.StatRemove
2014-08-17T15:36:33.356 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.StatRemove (5059 ms)
2014-08-17T15:36:33.357 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.StatRemovePP
2014-08-17T15:36:37.380 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.StatRemovePP (4023 ms)
2014-08-17T15:36:37.380 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.OmapPP
2014-08-17T15:36:41.379 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.OmapPP (3998 ms)
2014-08-17T15:36:41.379 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.MultiWrite
2014-08-17T15:36:45.848 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.MultiWrite (4470 ms)
2014-08-17T15:36:45.848 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAio.MultiWritePP
2014-08-17T15:36:49.781 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAio.MultiWritePP (3932 ms)
2014-08-17T15:36:49.781 INFO:tasks.workunit.client.0.vpm046.stdout:[----------] 31 tests from LibRadosAio (148673 ms total)
2014-08-17T15:36:49.781 INFO:tasks.workunit.client.0.vpm046.stdout:
2014-08-17T15:36:49.781 INFO:tasks.workunit.client.0.vpm046.stdout:[----------] 31 tests from LibRadosAioEC
2014-08-17T15:36:49.781 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAioEC.SimpleWrite
2014-08-17T15:37:01.065 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAioEC.SimpleWrite (11285 ms)
2014-08-17T15:37:01.065 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAioEC.SimpleWritePP
2014-08-17T15:37:18.250 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAioEC.SimpleWritePP (17184 ms)
2014-08-17T15:37:18.250 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAioEC.WaitForSafe
2014-08-17T15:37:21.565 INFO:teuthology.task.sequential:In sequential, running task print...
2014-08-17T15:37:21.565 INFO:teuthology.task.print:**** done install.upgrade mon.a to the version from teuthology-suite arg
2014-08-17T15:37:21.566 INFO:teuthology.task.sequential:In sequential, running task install.upgrade...
2014-08-17T15:37:21.566 INFO:teuthology.task.install:project ceph config {'mon.b': None} overrides {'sha1': '5045c5cb4c880255a1a5577c09b89d4be225bee9'}
2014-08-17T15:37:21.566 INFO:teuthology.task.install:extra packages: []
2014-08-17T15:37:21.566 INFO:teuthology.task.install:remote ubuntu@vpm057.front.sepia.ceph.com config {'sha1': '5045c5cb4c880255a1a5577c09b89d4be225bee9'}
2014-08-17T15:37:21.567 INFO:teuthology.orchestra.run.vpm057:Running: 'sudo lsb_release -is'
2014-08-17T15:37:22.327 DEBUG:teuthology.misc:System to be installed: RedHatEnterpriseServer
2014-08-17T15:37:22.327 INFO:teuthology.task.install:Upgrading ceph rpm packages: ceph-debuginfo, ceph-radosgw, ceph-test, ceph-devel, ceph, ceph-fuse, rest-bench, libcephfs_jni1, libcephfs1, python-ceph
2014-08-17T15:37:22.327 INFO:teuthology.orchestra.run.vpm057:Running: 'arch'
2014-08-17T15:37:22.360 INFO:teuthology.orchestra.run.vpm057:Running: 'lsb_release -is'
2014-08-17T15:37:22.469 INFO:teuthology.orchestra.run.vpm057:Running: 'lsb_release -rs'
2014-08-17T15:37:22.513 INFO:teuthology.task.install:config is {'project': 'ceph', 'sha1': '5045c5cb4c880255a1a5577c09b89d4be225bee9'}
2014-08-17T15:37:22.513 INFO:teuthology.task.install:Host vpm057 is: RedHatEnterpriseServer 6.5 x86_64
2014-08-17T15:37:22.513 INFO:teuthology.orchestra.run.vpm057:Running: 'arch'
2014-08-17T15:37:22.605 INFO:teuthology.orchestra.run.vpm057:Running: 'lsb_release -is'
2014-08-17T15:37:22.713 INFO:teuthology.orchestra.run.vpm057:Running: 'lsb_release -rs'
2014-08-17T15:37:22.753 INFO:teuthology.task.install:config is {'project': 'ceph', 'sha1': '5045c5cb4c880255a1a5577c09b89d4be225bee9'}
2014-08-17T15:37:22.754 INFO:teuthology.orchestra.run.vpm057:Running: 'sudo lsb_release -is'
2014-08-17T15:37:22.869 DEBUG:teuthology.misc:System to be installed: RedHatEnterpriseServer
2014-08-17T15:37:22.870 INFO:teuthology.task.install:Repo base URL: http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/sha1/5045c5cb4c880255a1a5577c09b89d4be225bee9
2014-08-17T15:37:22.870 INFO:teuthology.orchestra.run.vpm057:Running: 'wget -q -O- http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/sha1/5045c5cb4c880255a1a5577c09b89d4be225bee9/version'
2014-08-17T15:37:23.450 INFO:teuthology.orchestra.run.vpm057:Running: 'sudo rpm -ev ceph-release'
2014-08-17T15:37:24.158 INFO:teuthology.orchestra.run.vpm057:Running: 'sudo rpm -Uv http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/sha1/5045c5cb4c880255a1a5577c09b89d4be225bee9/noarch/ceph-release-1-0.el6.noarch.rpm'
2014-08-17T15:37:24.322 INFO:teuthology.orchestra.run.vpm057:Running: 'arch'
2014-08-17T15:37:24.347 INFO:teuthology.orchestra.run.vpm057:Running: 'lsb_release -is'
2014-08-17T15:37:24.454 INFO:teuthology.orchestra.run.vpm057:Running: 'lsb_release -rs'
2014-08-17T15:37:24.559 INFO:teuthology.task.install:config is {'project': 'ceph', 'sha1': '5045c5cb4c880255a1a5577c09b89d4be225bee9'}
2014-08-17T15:37:24.559 INFO:teuthology.orchestra.run.vpm057:Running: "sudo sed -i -e ':a;N;$!ba;s/enabled=1\\ngpg/enabled=1\\npriority=1\\ngpg/g' -e 's;ref/[a-zA-Z0-9_]*/;sha1/5045c5cb4c880255a1a5577c09b89d4be225bee9/;g' /etc/yum.repos.d/ceph.repo" 
2014-08-17T15:37:24.653 INFO:teuthology.orchestra.run.vpm057:Running: 'sudo yum clean all'
2014-08-17T15:37:25.239 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAioEC.WaitForSafe (6990 ms)
2014-08-17T15:37:25.239 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAioEC.WaitForSafePP
2014-08-17T15:37:26.551 INFO:teuthology.orchestra.run.vpm057.stdout:Loaded plugins: priorities
2014-08-17T15:37:26.725 INFO:teuthology.orchestra.run.vpm057.stdout:Cleaning repos: Ceph Ceph-noarch centos6-apache-ceph centos6-fcgi-ceph
2014-08-17T15:37:26.725 INFO:teuthology.orchestra.run.vpm057.stdout:              : centos6-misc-ceph centos6-qemu-ceph ceph-source epel local
2014-08-17T15:37:26.725 INFO:teuthology.orchestra.run.vpm057.stdout:              : rhel-6-repo
2014-08-17T15:37:26.725 INFO:teuthology.orchestra.run.vpm057.stdout:Cleaning up Everything
2014-08-17T15:37:26.876 INFO:teuthology.orchestra.run.vpm057:Running: 'sudo yum -y install ceph-debuginfo-0.83 ceph-radosgw-0.83 ceph-test-0.83 ceph-devel-0.83 ceph-0.83 ceph-fuse-0.83 rest-bench-0.83 libcephfs_jni1-0.83 libcephfs1-0.83 python-ceph-0.83'
2014-08-17T15:37:32.326 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAioEC.WaitForSafePP (7087 ms)
2014-08-17T15:37:32.326 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAioEC.RoundTrip
2014-08-17T15:37:44.224 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAioEC.RoundTrip (11898 ms)
2014-08-17T15:37:44.224 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAioEC.RoundTrip2
2014-08-17T15:37:51.050 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAioEC.RoundTrip2 (6826 ms)
2014-08-17T15:37:51.050 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAioEC.RoundTripPP
2014-08-17T15:38:00.845 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAioEC.RoundTripPP (9795 ms)
2014-08-17T15:38:00.845 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAioEC.RoundTripPP2
2014-08-17T15:38:08.288 INFO:tasks.ceph.osd.1.vpm096.stderr:daemon-helper: command crashed with signal 11
2014-08-17T15:40:01.498 INFO:tasks.workunit.client.0.vpm046.stdout:[       OK ] LibRadosAioEC.RoundTripPP2 (120652 ms)
2014-08-17T15:40:01.498 INFO:tasks.workunit.client.0.vpm046.stdout:[ RUN      ] LibRadosAioEC.RoundTripAppend
2014-08-17T15:40:08.695 INFO:tasks.ceph.osd.0.vpm096.stderr:daemon-helper: command crashed with signal 11
2014-08-17T15:40:39.685 INFO:teuthology.task.sequential:In sequential, running task print...

#13 Updated by Loïc Dachary over 9 years ago

Here is a possible scenario:

  • ceph-osd-0.80.5 is running but has not yet loaded jerasure
  • ceph-osd-0.83 is being installed, which takes time: the files are installed before ceph-osd is restarted
  • ceph-osd-0.80.5 is required to handle an erasure coded placement group and loads jerasure (the 0.83 version, which is not API compatible)
  • ceph-osd-0.80.5 calls the 0.83 jerasure plugin, ends up referencing an unexpected part of the code and crashes

The osd.0 log corroborates this theory

2014-08-17 18:33:41.829136 7f06a4e7a7a0  0 ceph version 0.80.5-164-gcc4e625 (cc4e6258d67fb16d4a92c25078a0822a9849cd77), process ceph-osd, pid 8230
...
2014-08-17 18:36:56.245697 7f0692aed700 10 ErasureCodePluginSelectJerasure: generic plugin
...
2014-08-17 18:40:08.383475 7f068a8e0700 -1 *** Caught signal (Segmentation fault) **

while teuthology.log shows (assuming 15:00 in the teuthology log corresponds to 18:00 in the osd log):
2014-08-17T15:33:41.862 INFO:tasks.ceph.osd.0.vpm096.stdout:starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
...
2014-08-17T15:34:18.837 INFO:teuthology.task.sequential:In sequential, running task install.upgrade...
...
2014-08-17T15:34:22.154 INFO:teuthology.orchestra.run.vpm096:Running: 'sudo yum -y install ceph-debuginfo-0.83 ceph-radosgw-0.83 ceph-test-0.83 ceph-devel-0.83 ceph-0.83 ceph-fuse-0.83 rest-bench-0.83 libcephfs_jni1-0.83 libcephfs1-0.83 python-ceph-0.83'
...
2014-08-17T15:40:08.695 INFO:tasks.ceph.osd.0.vpm096.stderr:daemon-helper: command crashed with signal 11

#15 Updated by Loïc Dachary over 9 years ago

  • Subject changed from erasure-code: jerasure_matrix_dotprod segmentation fault to erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race

#16 Updated by Loïc Dachary over 9 years ago

Stopping the daemons may not be the brightest idea because of http://tracker.ceph.com/issues/8849. Pre-loading the plugins would be better.

#18 Updated by Loïc Dachary over 9 years ago

  • % Done changed from 0 to 80

#20 Updated by Loïc Dachary over 9 years ago

The teuthology upgrade tests fail consistently with the same problem. Backporting the fix to firefly seems to be the only way to unblock them.

#22 Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Resolved

#24 Updated by Loïc Dachary over 9 years ago

  • Status changed from Resolved to In Progress

Preloading jerasure is not enough: the plugin selects another plugin to load depending on the CPU features (jerasure SSE4, etc.). That variant must also be preloaded, otherwise the preload is incomplete.

#25 Updated by Loïc Dachary over 9 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

#26 Updated by Loïc Dachary over 9 years ago

  • Status changed from Resolved to In Progress

Preloading must also be done in the mon, for the exact same reasons as in the osd.

#27 Updated by Loïc Dachary over 9 years ago

  • Status changed from In Progress to Resolved
