Bug #6613

samba is crashing in teuthology

Added by Greg Farnum almost 6 years ago. Updated over 1 year ago.

Status: Closed
Priority: Low
Assignee: -
Category: -
Target version: -
Start date: 10/22/2013
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: samba
Component(FS):
Labels (FS):
Pull request ID:

Description

At the end of the run:

2013-10-22T07:23:13.890 DEBUG:teuthology.orchestra.run:Running [10.214.132.37]: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.1/client.1/tmp'
2013-10-22T07:23:45.597 INFO:teuthology.task.workunit:Stopping suites/dbench.sh on client.1...
2013-10-22T07:23:45.598 DEBUG:teuthology.orchestra.run:Running [10.214.132.37]: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.1'
2013-10-22T07:23:45.652 DEBUG:teuthology.parallel:result is None
2013-10-22T07:23:45.652 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1d77090>
2013-10-22T07:23:45.652 INFO:teuthology.task.cifs-mount:Unmounting cifs clients...
2013-10-22T07:23:45.653 DEBUG:teuthology.orchestra.run:Running [10.214.132.37]: 'sudo umount /home/ubuntu/cephtest/mnt.1'
2013-10-22T07:23:45.801 DEBUG:teuthology.orchestra.run:Running [10.214.132.37]: "rmdir -- /home/ubuntu/cephtest/mnt.1 2>&1 | grep 'Device or resource busy'" 
2013-10-22T07:23:45.834 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1d75250>
2013-10-22T07:23:45.834 INFO:teuthology.task.samba:Stopping smbd processes...
2013-10-22T07:23:45.835 DEBUG:teuthology.task.samba.smbd.0:waiting for process to exit
2013-10-22T07:23:45.942 INFO:teuthology.task.samba.smbd.0.err:[10.214.132.37]: daemon-helper: command crashed with signal 15
2013-10-22T07:23:45.951 ERROR:teuthology.task.samba:Saw exception from smbd.0
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-master/teuthology/task/samba.py", line 174, in task
    d.stop()
  File "/home/teuthworker/teuthology-master/teuthology/task/ceph.py", line 36, in stop
    run.wait([self.proc])
  File "/home/teuthworker/teuthology-master/teuthology/orchestra/run.py", line 286, in wait
    proc.exitstatus.get()
  File "/usr/lib/python2.7/dist-packages/gevent/event.py", line 223, in get
    raise self._exception
CommandFailedError: Command failed on 10.214.132.37 with status 1: 'sudo daemon-helper kill nostdin /usr/local/samba/sbin/smbd -F'
2013-10-22T07:23:45.952 ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x1d75250>

The latest two:
http://qa-proxy.ceph.com/teuthology/teuthology-2013-10-21_23:01:15-fs-master-testing-basic-plana/63864/
http://qa-proxy.ceph.com/teuthology/teuthology-2013-10-21_23:01:15-fs-master-testing-basic-plana/63872/


Related issues

Duplicated by fs - Bug #7012: smbd crash during cifs + dbench Duplicate 12/16/2013

History

#1 Updated by Greg Farnum almost 6 years ago

This is happening regularly on dumpling and next, but I don't think I've seen it on cuttlefish. We've clearly done something to break Samba that we'll need to dig into. Based solely on the timing, "mds: fix infinite loop of MDCache::populate_mydir()." is probably worth looking into (although I can't think of any mechanism by which it would matter).

http://qa-proxy.ceph.com/teuthology/teuthology-2013-10-25_19:01:14-fs-dumpling-testing-basic-plana/68766/
http://qa-proxy.ceph.com/teuthology/teuthology-2013-10-25_23:01:10-fs-master-testing-basic-plana/69214/
http://qa-proxy.ceph.com/teuthology/teuthology-2013-10-26_23:01:16-fs-next-testing-basic-plana/70399/

#2 Updated by Zheng Yan almost 6 years ago

tail of client log:
---
2013-10-22 08:05:27.405155 7ff1167fc700 20 client.4105 trim_cache size 0 max 0
2013-10-22 08:05:27.405157 7ff1167fc700 10 client.4105 unmounting: trim pass, size still 0+0
2013-10-22 08:05:27.405258 7ff132702740 2 client.4105 unmounted.
2013-10-22 08:05:27.405270 7ff132702740 1 client.4105 shutdown
2013-10-22 08:05:27.406122 7ff132702740 1 -- 10.214.131.3:0/953954433 mark_down 0x7ff1346ef720 -- 0x7ff1346ef4c0
2013-10-22 08:05:27.406245 7ff132702740 1 -- 10.214.131.3:0/953954433 mark_down 0x7ff1346c6060 -- 0x7ff1346c5e00
2013-10-22 08:05:27.406461 7ff132702740 1 -- 10.214.131.3:0/953954433 mark_down 0x7ff134673e70 -- 0x7ff134673c10
2013-10-22 08:05:27.414962 7ff132702740 1 -- 10.214.131.3:0/953954433 mark_down_all
2013-10-22 08:05:27.415332 7ff132702740 1 -- 10.214.131.3:0/953954433 shutdown complete.
2013-10-22 08:05:27.415738 7ff132702740 20 client.4105 trim_cache size 0 max 0

There is no error in the client log, and signal 15 is SIGTERM, so I don't think samba actually crashed.
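As a quick check of the signal number, here is a minimal Python sketch. It is not taken from teuthology or daemon-helper; the sleeping child process is only a stand-in for smbd, and it assumes a POSIX system. It shows that signal 15 is SIGTERM and that a child stopped by it reports a negative return code rather than a normal exit status:

```python
import signal
import subprocess
import sys


def run_and_terminate():
    """Start a long-running child, send it SIGTERM, and return its return code."""
    # Stand-in for a daemon such as smbd: a Python child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(signal.SIGTERM)
    child.wait(timeout=10)
    # On POSIX, a negative return code -N means "terminated by signal N".
    return child.returncode


if __name__ == "__main__":
    print("SIGTERM is signal", int(signal.SIGTERM))
    print("child return code:", run_and_terminate())
```

A process that ends this way was terminated on request, which looks the same to a naive wrapper as one killed by any other signal.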

#3 Updated by Zheng Yan almost 6 years ago

  • Status changed from New to Need More Info

#5 Updated by Greg Farnum over 5 years ago

  • Priority changed from Normal to High

#6 Updated by Zheng Yan over 5 years ago

teuthology reports that smbd crashed with signal SIGTERM, but only after teuthology itself sent SIGTERM to smbd. There is no error in the log and no coredump file, so it's impossible to diagnose.
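To illustrate the distinction, here is a hypothetical sketch of how a supervisor could tell a requested stop from a real crash. The `classify_exit` helper and its messages are assumptions made for illustration; this is not how daemon-helper or teuthology is implemented:

```python
import signal


def classify_exit(returncode, sent_signal=None):
    """Hypothetical helper: label how a supervised child process ended.

    returncode follows the subprocess convention on POSIX:
    >= 0 is a normal exit status, -N means terminated by signal N.
    sent_signal is the signal the supervisor itself delivered, if any.
    """
    if returncode >= 0:
        return "exited with status %d" % returncode
    died_from = -returncode
    if sent_signal is not None and died_from == int(sent_signal):
        # The child died from the very signal we sent: an expected stop.
        return "stopped by requested signal %d" % died_from
    return "crashed with signal %d" % died_from


if __name__ == "__main__":
    print(classify_exit(-15, signal.SIGTERM))
    print(classify_exit(-11, signal.SIGTERM))
```

Under this scheme, a child that dies from the SIGTERM the supervisor sent is reported as stopped, while death by any other signal is still reported as a crash.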

#7 Updated by Sage Weil over 5 years ago

  • Priority changed from High to Urgent

#8 Updated by Greg Farnum over 5 years ago

  • Priority changed from Urgent to Low

Demoting priority on samba.

#10 Updated by Greg Farnum almost 5 years ago

Still happening:
/a/teuthology-2014-09-22_23:14:01-samba-giant-testing-basic-multi/50607

#12 Updated by Zheng Yan over 4 years ago

We need to update our samba repository.

#15 Updated by Greg Farnum over 4 years ago

  • Status changed from Need More Info to Testing

Merged into ceph-qa-suite master as of commit:18c35bf23b59a39cd5c574de89e1a52333c57874. We'll need to let it run through a bunch of nightlies to see if this stops popping up, and if it does we'll want to backport that commit to the other nightly branches that are still run.

#16 Updated by Yuri Weinstein about 4 years ago

  • Regression set to No

Seeing this in validation of the firefly v0.80.9 release.

Run: http://pulpito.ceph.com/ubuntu-2015-06-08_11:05:13-samba-firefly---basic-multi/
Jobs: ['925254', '925255', '925256', '925257']
Logs for one: http://qa-proxy.ceph.com/teuthology/ubuntu-2015-06-08_11:05:13-samba-firefly---basic-multi/925254/

2015-06-08T12:08:11.940 INFO:teuthology.orchestra.run.burnupi13.stderr:Testing device=0xfe
2015-06-08T12:08:11.958 INFO:teuthology.orchestra.run.burnupi13.stderr:Testing device=0xff
2015-06-08T12:08:12.015 INFO:teuthology.orchestra.run.burnupi13.stdout:time: 2015-06-08 12:08:12.013067
2015-06-08T12:08:12.015 INFO:teuthology.orchestra.run.burnupi13.stdout:success: scan-ioctl
2015-06-08T12:08:19.980 DEBUG:teuthology.parallel:result is None
2015-06-08T12:08:19.980 DEBUG:teuthology.run_tasks:Unwinding manager samba
2015-06-08T12:08:19.980 INFO:tasks.samba:Stopping smbd processes...
2015-06-08T12:08:19.981 DEBUG:tasks.samba.smbd.0:waiting for process to exit
2015-06-08T12:08:22.848 INFO:tasks.samba.smbd.0.burnupi13.stderr:daemon-helper: command crashed with signal 15
2015-06-08T12:08:31.981 ERROR:tasks.samba:Saw exception from smbd.0
Traceback (most recent call last):
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/samba.py", line 190, in task
    d.stop()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/daemon.py", line 45, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 401, in wait
    proc.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed on burnupi13 with status 1: 'sudo daemon-helper kill nostdin /usr/local/samba/sbin/smbd -F'
2015-06-08T12:08:31.982 ERROR:teuthology.run_tasks:Manager failed: samba
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/samba.py", line 190, in task
    d.stop()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/daemon.py", line 45, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 401, in wait
    proc.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed on burnupi13 with status 1: 'sudo daemon-helper kill nostdin /usr/local/samba/sbin/smbd -F'
2015-06-08T12:08:31.982 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2015-06-08T12:08:31.982 INFO:teuthology.orchestra.run.plana49:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format json'
2015-06-08T12:08:32.193 INFO:teuthology.orchestra.run.plana49.stderr:2015-06-08 12:08:32.192143 7f13255c8700  1 -- :/0 messenger.start
2015-06-08T12:08:32.194 INFO:teuthology.orchestra.run.plana49.stderr:2015-06-08 12:08:32.193018 7f13255c8700  1 -- :/1019053 --> 10.214.132.29:6791/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f132002f6e0 con 0x7f132002f370
2015-06-08T12:08:32.194 INFO:teuthology.orchestra.run.plana49.stderr:2015-06-08 12:08:32.193427 7f131cd88700  1 -- 10.214.132.29:0/1019053 learned my addr 10.214.132.29:0/1019053

#17 Updated by Yuri Weinstein almost 4 years ago

  • Release set to firefly
  • ceph-qa-suite samba added

#19 Updated by Patrick Donnelly over 1 year ago

  • Status changed from Testing to Closed

Closing as stale.
