tests: Fix Daemon Killing to actually notice when a daemon dies

Lots of the GR topotests kill daemons in order to test code
that deals with crashing daemons.  Under heavy system load
it was noticed that we would send a kill command and, if told
to wait, sleep 2 seconds, send a second kill command and call
it good without checking that the daemon had exited.  This
caused problems when subsequent json commands got errors like
`lost connection to daemon`, because the daemon only finally
shut down some time later due to the load.
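The following small simulation makes the race concrete. It is a sketch under stated assumptions, not the topotest code itself. `SlowDaemons` is a hypothetical fake process table in which a killed pid keeps answering "alive" for a few more checks, standing in for a daemon that exits late on a loaded system. `kill_once_then_sleep` is a hypothetical helper that mirrors the old sequence: kill, sleep 2 seconds once, kill again, and assume the daemon is gone.

```python
import time
from typing import Callable, Dict, Iterable


class SlowDaemons:
    """Hypothetical fake process table: after being killed, each pid keeps
    reporting "alive" for a configured number of is_alive() checks, to
    mimic a daemon that exits late because the system is under load."""

    def __init__(self, lag: Dict[int, int]) -> None:
        self.lag = dict(lag)  # pid -> remaining "still alive" answers
        self.killed = set()

    def send_kill(self, pid: int) -> None:
        self.killed.add(pid)

    def is_alive(self, pid: int) -> bool:
        if pid not in self.killed:
            return True
        if self.lag[pid] > 0:
            self.lag[pid] -= 1
            return True
        return False


def kill_once_then_sleep(
    pids: Iterable[int],
    is_alive: Callable[[int], bool],
    send_kill: Callable[[int], None],
    sleep: Callable[[float], None] = time.sleep,
    interval: float = 2.0,
) -> None:
    """Hypothetical mirror of the pre-fix flow: one fixed wait, no re-check."""
    pid_list = list(pids)
    for pid in pid_list:
        send_kill(pid)
    if any(is_alive(pid) for pid in pid_list):
        sleep(interval)  # a single fixed wait
        for pid in pid_list:
            send_kill(pid)  # second kill, then return without checking again


if __name__ == "__main__":
    daemons = SlowDaemons({101: 0, 102: 3})
    # sleep is stubbed out so the demo runs instantly
    kill_once_then_sleep([101, 102], daemons.is_alive, daemons.send_kill,
                         sleep=lambda s: None)
    print(daemons.is_alive(102))  # True: pid 102 outlived the single wait
```

Any command that talks to pid 102 after this function returns would hit the same kind of `lost connection to daemon` failure described above.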

Modify the daemon-killing function to notice when a daemon
was not actually killed and, if told to wait, keep waiting
(re-sending the kill) until it really exits.
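The corrected control flow can be sketched as a loop that re-checks liveness after every wait instead of trusting one fixed sleep. This is a minimal illustration, not the topotest implementation. `kill_until_gone` is a hypothetical helper whose `still_running` list plays the role of the `numRunning` counter. `SlowDaemons` is a hypothetical fake process table that reports a killed pid as alive for a few more checks, simulating a slow exit under load. The `while` loop in the diff has no upper bound, so the `max_rounds` cap here is an added assumption: a daemon that never exits raises `TimeoutError` instead of hanging the test. The `wait` flag is omitted for brevity.

```python
import time
from typing import Callable, Dict, Iterable, List


class SlowDaemons:
    """Hypothetical fake process table: a killed pid keeps reporting
    "alive" for a configured number of is_alive() checks."""

    def __init__(self, lag: Dict[int, int]) -> None:
        self.lag = dict(lag)  # pid -> remaining "still alive" answers
        self.killed = set()

    def send_kill(self, pid: int) -> None:
        self.killed.add(pid)

    def is_alive(self, pid: int) -> bool:
        if pid not in self.killed:
            return True
        if self.lag[pid] > 0:
            self.lag[pid] -= 1
            return True
        return False


def kill_until_gone(
    pids: Iterable[int],
    is_alive: Callable[[int], bool],
    send_kill: Callable[[int], None],
    sleep: Callable[[float], None] = time.sleep,
    interval: float = 2.0,
    max_rounds: int = 30,  # assumption of this sketch, not in the commit
) -> int:
    """Kill every pid, then keep sleeping and re-killing until none survive.
    Returns how many wait rounds were needed."""
    pid_list = list(pids)
    for pid in pid_list:
        send_kill(pid)
    # still_running stands in for the numRunning counter in the diff.
    still_running: List[int] = [pid for pid in pid_list if is_alive(pid)]
    rounds = 0
    while still_running:  # cf. `while wait and numRunning > 0`
        if rounds >= max_rounds:
            raise TimeoutError("still running: {}".format(still_running))
        sleep(interval)
        rounds += 1
        for pid in still_running:
            send_kill(pid)
        # Drop pids that are really gone (the diff's `numRunning -= 1`).
        still_running = [pid for pid in still_running if is_alive(pid)]
    return rounds


if __name__ == "__main__":
    daemons = SlowDaemons({101: 0, 102: 3})
    # sleep is stubbed out so the demo runs instantly
    rounds = kill_until_gone([101, 102], daemons.is_alive, daemons.send_kill,
                             sleep=lambda s: None)
    print(rounds)  # 3
```

Because the exit condition is "nothing is still running", the loop simply takes more rounds on a slower machine rather than returning early.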

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp 2021-11-29 19:33:48 -05:00
parent 31ccdb903f
commit c9f92703bc

@@ -1859,7 +1859,7 @@ class Router(Node):
                             self.cmd("kill -9 %s" % daemonpid)
                             if pid_exists(int(daemonpid)):
                                 numRunning += 1
-                        if wait and numRunning > 0:
+                        while wait and numRunning > 0:
                             sleep(
                                 2,
                                 "{}: waiting for {} daemon to be stopped".format(
@@ -1883,7 +1883,11 @@ class Router(Node):
                                             )
                                         )
                                         self.cmd("kill -9 %s" % daemonpid)
-                                    self.cmd("rm -- {}".format(d.rstrip()))
+                                    if daemonpid.isdigit() and not pid_exists(
+                                        int(daemonpid)
+                                    ):
+                                        numRunning -= 1
+                                    self.cmd("rm -- {}".format(d.rstrip()))
                     if wait:
                         errors = self.checkRouterCores(reportOnce=True)
                         if self.checkRouterVersion("<", minErrorVersion):