Bug #6613

samba is crashing in teuthology

Added by Greg Farnum over 10 years ago. Updated almost 6 years ago.

Status: Closed
Priority: Low
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: samba
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

At the end of the run:

2013-10-22T07:23:13.890 DEBUG:teuthology.orchestra.run:Running [10.214.132.37]: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.1/client.1/tmp'
2013-10-22T07:23:45.597 INFO:teuthology.task.workunit:Stopping suites/dbench.sh on client.1...
2013-10-22T07:23:45.598 DEBUG:teuthology.orchestra.run:Running [10.214.132.37]: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.1'
2013-10-22T07:23:45.652 DEBUG:teuthology.parallel:result is None
2013-10-22T07:23:45.652 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1d77090>
2013-10-22T07:23:45.652 INFO:teuthology.task.cifs-mount:Unmounting cifs clients...
2013-10-22T07:23:45.653 DEBUG:teuthology.orchestra.run:Running [10.214.132.37]: 'sudo umount /home/ubuntu/cephtest/mnt.1'
2013-10-22T07:23:45.801 DEBUG:teuthology.orchestra.run:Running [10.214.132.37]: "rmdir -- /home/ubuntu/cephtest/mnt.1 2>&1 | grep 'Device or resource busy'" 
2013-10-22T07:23:45.834 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1d75250>
2013-10-22T07:23:45.834 INFO:teuthology.task.samba:Stopping smbd processes...
2013-10-22T07:23:45.835 DEBUG:teuthology.task.samba.smbd.0:waiting for process to exit
2013-10-22T07:23:45.942 INFO:teuthology.task.samba.smbd.0.err:[10.214.132.37]: daemon-helper: command crashed with signal 15
2013-10-22T07:23:45.951 ERROR:teuthology.task.samba:Saw exception from smbd.0
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-master/teuthology/task/samba.py", line 174, in task
    d.stop()
  File "/home/teuthworker/teuthology-master/teuthology/task/ceph.py", line 36, in stop
    run.wait([self.proc])
  File "/home/teuthworker/teuthology-master/teuthology/orchestra/run.py", line 286, in wait
    proc.exitstatus.get()
  File "/usr/lib/python2.7/dist-packages/gevent/event.py", line 223, in get
    raise self._exception
CommandFailedError: Command failed on 10.214.132.37 with status 1: 'sudo daemon-helper kill nostdin /usr/local/samba/sbin/smbd -F'
2013-10-22T07:23:45.952 ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x1d75250>

The latest two:
http://qa-proxy.ceph.com/teuthology/teuthology-2013-10-21_23:01:15-fs-master-testing-basic-plana/63864/
http://qa-proxy.ceph.com/teuthology/teuthology-2013-10-21_23:01:15-fs-master-testing-basic-plana/63872/


Related issues

Duplicated by CephFS - Bug #7012: smbd crash during cifs + dbench Duplicate 12/16/2013

History

#1 Updated by Greg Farnum over 10 years ago

This is happening regularly on dumpling and next, but I don't think I've seen it on cuttlefish. We've clearly done something to break Samba that we'll need to dig into. Based solely on the timing, "mds: fix infinite loop of MDCache::populate_mydir()." is probably worth looking into (although I can't think of any mechanism by which it would matter).

http://qa-proxy.ceph.com/teuthology/teuthology-2013-10-25_19:01:14-fs-dumpling-testing-basic-plana/68766/
http://qa-proxy.ceph.com/teuthology/teuthology-2013-10-25_23:01:10-fs-master-testing-basic-plana/69214/
http://qa-proxy.ceph.com/teuthology/teuthology-2013-10-26_23:01:16-fs-next-testing-basic-plana/70399/

#2 Updated by Zheng Yan over 10 years ago

tail of client log:
---
2013-10-22 08:05:27.405155 7ff1167fc700 20 client.4105 trim_cache size 0 max 0
2013-10-22 08:05:27.405157 7ff1167fc700 10 client.4105 unmounting: trim pass, size still 0+0
2013-10-22 08:05:27.405258 7ff132702740 2 client.4105 unmounted.
2013-10-22 08:05:27.405270 7ff132702740 1 client.4105 shutdown
2013-10-22 08:05:27.406122 7ff132702740 1 -- 10.214.131.3:0/953954433 mark_down 0x7ff1346ef720 -- 0x7ff1346ef4c0
2013-10-22 08:05:27.406245 7ff132702740 1 -- 10.214.131.3:0/953954433 mark_down 0x7ff1346c6060 -- 0x7ff1346c5e00
2013-10-22 08:05:27.406461 7ff132702740 1 -- 10.214.131.3:0/953954433 mark_down 0x7ff134673e70 -- 0x7ff134673c10
2013-10-22 08:05:27.414962 7ff132702740 1 -- 10.214.131.3:0/953954433 mark_down_all
2013-10-22 08:05:27.415332 7ff132702740 1 -- 10.214.131.3:0/953954433 shutdown complete.
2013-10-22 08:05:27.415738 7ff132702740 20 client.4105 trim_cache size 0 max 0

There is no error in the client log, and signal 15 is SIGTERM, so I don't think samba crashed.
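
For reference, a minimal sketch of how a SIGTERM-terminated process looks from the parent's side. This is plain Python `subprocess` on a POSIX host, not teuthology or daemon-helper code; the sleeping child is only a stand-in for smbd.

```python
import signal
import subprocess
import sys

# Stand-in for a long-running daemon: a child that sleeps for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Ask it to stop with SIGTERM, as a test harness would when tearing down.
child.send_signal(signal.SIGTERM)
returncode = child.wait()

# On POSIX, a negative return code -N means "terminated by signal N".
terminating_signal = signal.Signals(-returncode)
print(returncode, terminating_signal.name)
```

On Linux, SIGTERM is signal number 15, so a child stopped this way reports termination by signal 15 even though nothing went wrong inside it.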

#3 Updated by Zheng Yan over 10 years ago

  • Status changed from New to Need More Info

#5 Updated by Greg Farnum over 10 years ago

  • Priority changed from Normal to High

#6 Updated by Zheng Yan over 10 years ago

teuthology reports that smbd crashed with SIGTERM, but SIGTERM is the signal that was sent to stop smbd. There is no error in the log and no core dump file, so this is impossible to diagnose.
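
One way to tell "died from the signal we sent" apart from a real failure is sketched below. This is an illustration only: `stop_daemon` and its return labels are hypothetical names, and the sketch does not reflect how teuthology or daemon-helper actually decide the exit status. It assumes a POSIX host.

```python
import signal
import subprocess
import sys


def stop_daemon(proc, sig=signal.SIGTERM, timeout=10.0):
    """Send `sig` to `proc` and classify how it exited (hypothetical helper).

    Returns "clean-exit" if the process exited with status 0,
    "expected-signal" if it was terminated by the signal we sent,
    and "failed" for anything else (another signal or a non-zero status).
    """
    proc.send_signal(sig)
    returncode = proc.wait(timeout=timeout)
    if returncode == 0:
        return "clean-exit"
    if returncode == -int(sig):
        return "expected-signal"
    return "failed"


if __name__ == "__main__":
    # Stand-in for smbd: a child that sleeps until it is signalled.
    sleeper = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print(stop_daemon(sleeper))
```

With a classification like this, only the "failed" case would need to be raised as a test error; the "expected-signal" case matches what the log above shows.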

#7 Updated by Sage Weil about 10 years ago

  • Priority changed from High to Urgent

#8 Updated by Greg Farnum about 10 years ago

  • Priority changed from Urgent to Low

Demoting priority on samba.

#10 Updated by Greg Farnum over 9 years ago

Still happening
/a/teuthology-2014-09-22_23:14:01-samba-giant-testing-basic-multi/50607

#12 Updated by Zheng Yan almost 9 years ago

We need to update our samba repository.

#15 Updated by Greg Farnum almost 9 years ago

  • Status changed from Need More Info to 7

Merged into ceph-qa-suite master as of commit:18c35bf23b59a39cd5c574de89e1a52333c57874. We'll need to let it run through a bunch of nightlies to see if this stops popping up, and if it does we'll want to backport that commit to the other nightly branches that are still run.

#16 Updated by Yuri Weinstein almost 9 years ago

  • Regression set to No

Seen again in release validation for firefly v0.80.9.

Run: http://pulpito.ceph.com/ubuntu-2015-06-08_11:05:13-samba-firefly---basic-multi/
Jobs: ['925254', '925255', '925256', '925257']
Logs for one: http://qa-proxy.ceph.com/teuthology/ubuntu-2015-06-08_11:05:13-samba-firefly---basic-multi/925254/

2015-06-08T12:08:11.940 INFO:teuthology.orchestra.run.burnupi13.stderr:Testing device=0xfe
2015-06-08T12:08:11.958 INFO:teuthology.orchestra.run.burnupi13.stderr:Testing device=0xff
2015-06-08T12:08:12.015 INFO:teuthology.orchestra.run.burnupi13.stdout:time: 2015-06-08 12:08:12.013067
2015-06-08T12:08:12.015 INFO:teuthology.orchestra.run.burnupi13.stdout:success: scan-ioctl
2015-06-08T12:08:19.980 DEBUG:teuthology.parallel:result is None
2015-06-08T12:08:19.980 DEBUG:teuthology.run_tasks:Unwinding manager samba
2015-06-08T12:08:19.980 INFO:tasks.samba:Stopping smbd processes...
2015-06-08T12:08:19.981 DEBUG:tasks.samba.smbd.0:waiting for process to exit
2015-06-08T12:08:22.848 INFO:tasks.samba.smbd.0.burnupi13.stderr:daemon-helper: command crashed with signal 15
2015-06-08T12:08:31.981 ERROR:tasks.samba:Saw exception from smbd.0
Traceback (most recent call last):
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/samba.py", line 190, in task
    d.stop()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/daemon.py", line 45, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 401, in wait
    proc.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed on burnupi13 with status 1: 'sudo daemon-helper kill nostdin /usr/local/samba/sbin/smbd -F'
2015-06-08T12:08:31.982 ERROR:teuthology.run_tasks:Manager failed: samba
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/samba.py", line 190, in task
    d.stop()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/daemon.py", line 45, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 401, in wait
    proc.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed on burnupi13 with status 1: 'sudo daemon-helper kill nostdin /usr/local/samba/sbin/smbd -F'
2015-06-08T12:08:31.982 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2015-06-08T12:08:31.982 INFO:teuthology.orchestra.run.plana49:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format json'
2015-06-08T12:08:32.193 INFO:teuthology.orchestra.run.plana49.stderr:2015-06-08 12:08:32.192143 7f13255c8700  1 -- :/0 messenger.start
2015-06-08T12:08:32.194 INFO:teuthology.orchestra.run.plana49.stderr:2015-06-08 12:08:32.193018 7f13255c8700  1 -- :/1019053 --> 10.214.132.29:6791/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f132002f6e0 con 0x7f132002f370
2015-06-08T12:08:32.194 INFO:teuthology.orchestra.run.plana49.stderr:2015-06-08 12:08:32.193427 7f131cd88700  1 -- 10.214.132.29:0/1019053 learned my addr 10.214.132.29:0/1019053

#17 Updated by Yuri Weinstein over 8 years ago

  • Release set to firefly
  • ceph-qa-suite samba added

#19 Updated by Patrick Donnelly almost 6 years ago

  • Status changed from 7 to Closed

Closing as stale.
