AMF is extended to handle termination and instantiation with respect to
instantiation level for the following scenarios as well:
- SU restart
- termination/instantiation errors during component/SU restart
- instantiation error during cluster start up
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1352 fd59a12c-fef9-0310-b244-a6a79926bd2f
- improves handling of errors caused by the INSTANTIATE or CLEANUP command
while recovering with component_restart or su_restart
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1347 fd59a12c-fef9-0310-b244-a6a79926bd2f
- AMF handles a component report of injurious health.
- AMF handles saAmfHealthcheckConfirm() SA_AIS_ERR_FAILED_OPERATION
as follows: if a recovery is already ongoing, AMF does nothing. If
no recovery is in progress, AMF invokes the recovery action
specified by the component when the health check was started. If
that recommendation was SA_AMF_NO_RECOMMENDATION,
then AMF uses the configured recovery action for the component
(saAmfCompRecoveryOnError). If this recommendation is also
SA_AMF_NO_RECOMMENDATION, then AMF performs a component restart or a
component/SU fail over, depending on the values of
saAmfCompDisableRestart and saAmfSUFailover.
- Handling of cleanup of a component and health check response hardened.
- Time supervision of the clc-cli CLEANUP command, and checking of its
return value.
- Handle 'recommended recovery' specified by a component in an error
report. The recovery actions that can currently be chosen are
component restart and node fail over.
- The attribute saAmfCompDisableRestart is now recognized, which means
that if the component specifies 'Component restart' and restart is
disabled, then the SU in which the component is contained shall fail over.
- The attribute saAmfSUFailover is not recognized. An SU will always
fail over as a single entity.
- A component can report an error on a component other than itself.
- Implementation of 'Instantiation Level' according to chapter 3.9.2 in the
AMF specification.
- Implementation of the escalation levels: component restart, SU
restart, SU fail over and Node fail over.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1321 fd59a12c-fef9-0310-b244-a6a79926bd2f
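The recovery selection described in the entries above (component recommendation, then saAmfCompRecoveryOnError, then a default that depends on saAmfCompDisableRestart and saAmfSUFailover) can be sketched roughly as below. This is only an illustration, not the code in exec/: the type and function names (recovery_t, struct comp_cfg, choose_recovery) are hypothetical, and the exact default when both recommendations are SA_AMF_NO_RECOMMENDATION is an assumption read from the description.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SA Forum recommended-recovery values. */
typedef enum {
    REC_NONE,                 /* do nothing (recovery already ongoing) */
    REC_NO_RECOMMENDATION,    /* models SA_AMF_NO_RECOMMENDATION */
    REC_COMPONENT_RESTART,
    REC_COMPONENT_FAILOVER,
    REC_SU_FAILOVER,
    REC_NODE_FAILOVER
} recovery_t;

/* Hypothetical view of the component configuration attributes. */
struct comp_cfg {
    recovery_t recovery_on_error; /* models saAmfCompRecoveryOnError */
    bool disable_restart;         /* models saAmfCompDisableRestart */
    bool su_failover;             /* models saAmfSUFailover */
};

/* Pick the recovery action for a failed health check or error report. */
recovery_t choose_recovery(bool recovery_in_progress,
                           recovery_t recommended,
                           const struct comp_cfg *cfg)
{
    recovery_t rec = recommended;

    /* A recovery is already ongoing: do nothing. */
    if (recovery_in_progress)
        return REC_NONE;

    /* Fall back to the configured action if the component gave none. */
    if (rec == REC_NO_RECOMMENDATION)
        rec = cfg->recovery_on_error;

    /* Still nothing: default to a restart (assumption). */
    if (rec == REC_NO_RECOMMENDATION)
        rec = REC_COMPONENT_RESTART;

    /* Restart disabled: escalate to a fail over instead. Whether the
     * whole SU or only the component fails over is assumed to follow
     * saAmfSUFailover when no explicit recommendation was given. */
    if (rec == REC_COMPONENT_RESTART && cfg->disable_restart) {
        bool explicit_restart =
            recommended == REC_COMPONENT_RESTART ||
            cfg->recovery_on_error == REC_COMPONENT_RESTART;
        if (explicit_restart || cfg->su_failover)
            return REC_SU_FAILOVER;
        return REC_COMPONENT_FAILOVER;
    }
    return rec;
}
```

Keeping the decision in one pure function makes each fallback step easy to exercise in a unit test without a running cluster.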
that are not directly associated with the failing over SU's active assignments
in other SUs
* Improvement of Node fail over to handle removal of those standby assignments
that are not directly associated with the failing over Node's SU active assignments
in other SUs.
* Improvement of SU fail over to handle SI assignments to spare SUs
* Improvement of Node fail over to handle SI assignments to spare SUs
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1285 fd59a12c-fef9-0310-b244-a6a79926bd2f
- a new function sync_request() that can be called by a user to execute
synchronization on request of a specified service.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1280 fd59a12c-fef9-0310-b244-a6a79926bd2f
One of type 'AMF invoked' and one of type 'component invoked'. The testamf1.c
code was also restructured slightly at the same time.
Changes in amf.conf to complement testamf1
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1274 fd59a12c-fef9-0310-b244-a6a79926bd2f
using CUnit. With this patch, amf.c can handle a full totem send queue.
This is not easily reproducible with function tests.
amf.c is also prepared for further component testing with this patch.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1272 fd59a12c-fef9-0310-b244-a6a79926bd2f
A mechanism to defer and recall simultaneous
events in the state machines for amf_cluster,
amf_application and amf_sg.
The implication of this defer and recall mechanism is
that it's now possible to recover from e.g. several
simultaneous SU failures in an ordered, serialized manner.
The events that can be deferred/recalled so far are
SG_FAILOVER_NODE_EV, SG_START_EV, SG_FAILOVER_SU_EV,
CLUSTER_SYNC_READY_EV, APPLICATION_START_EV and
APPLICATION_ASSIGN_WORKLOAD_EV.
Files involved:
Index: exec/amfnode.c
Index: exec/amfsg.c
Index: exec/amfutil.c
Index: exec/amfapp.c
Index: exec/amfcomp.c
Index: exec/amfcluster.c
Index: exec/amf.h
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1266 fd59a12c-fef9-0310-b244-a6a79926bd2f
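The defer and recall idea described above can be illustrated with a small FIFO attached to a state machine: an event that arrives while the machine is busy is queued, and the oldest queued event is replayed once the current one completes. This is a minimal sketch under assumptions, not the code in exec/amfutil.c; struct fsm, fsm_deliver() and fsm_done() are hypothetical names, and only the event names are taken from the entry above.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16

/* Event names from the commit message; the numeric values are arbitrary. */
enum sg_event { SG_FAILOVER_NODE_EV, SG_START_EV, SG_FAILOVER_SU_EV };

/* Hypothetical state machine with a ring buffer of deferred events. */
struct fsm {
    bool busy;                          /* true while an event is handled */
    enum sg_event deferred[QUEUE_CAPACITY];
    size_t head;                        /* index of oldest deferred event */
    size_t count;                       /* number of deferred events */
    enum sg_event last_handled;         /* recorded for illustration only */
    int handled_count;
};

/* Deliver an event: handle it now if idle, otherwise defer it.
 * Returns 0 on success, -1 if the deferred queue is full. */
int fsm_deliver(struct fsm *f, enum sg_event ev)
{
    if (f->busy) {
        if (f->count == QUEUE_CAPACITY)
            return -1;
        f->deferred[(f->head + f->count) % QUEUE_CAPACITY] = ev;
        f->count++;
        return 0;
    }
    f->busy = true;            /* start handling; completion comes later */
    f->last_handled = ev;
    f->handled_count++;
    return 0;
}

/* Called when handling completes: go idle, then recall the oldest
 * deferred event (if any) so events are processed one at a time. */
void fsm_done(struct fsm *f)
{
    f->busy = false;
    if (f->count > 0) {
        enum sg_event ev = f->deferred[f->head];
        f->head = (f->head + 1) % QUEUE_CAPACITY;
        f->count--;
        fsm_deliver(f, ev);
    }
}
```

With this shape, two SU failures reported back to back result in two recoveries run one after the other, in arrival order, rather than interleaved.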
and joins the cluster quickly (within one second by default), the config
change messages will not indicate that the node left and rejoined. The patch
introduces a short delay in main() to make sure the token_timeout expires.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1259 fd59a12c-fef9-0310-b244-a6a79926bd2f