Commit Graph

17 Commits

Author SHA1 Message Date
Steven Dake
cb154572a2 Patch from Renaud to report some broken Solaris porting from past.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1353 fd59a12c-fef9-0310-b244-a6a79926bd2f
2007-03-06 16:18:44 +00:00
Hans Feldt
06b87e1322 Was writing to random mem using an uninitialized pointer
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1351 fd59a12c-fef9-0310-b244-a6a79926bd2f
2007-01-25 08:19:38 +00:00
Steven Dake
aec3f25bc8 Display the names of the configuration files used by openais.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1350 fd59a12c-fef9-0310-b244-a6a79926bd2f
2007-01-23 17:39:43 +00:00
Lon Hohberger
8f87e5f413 This patch contains:
- AMF handles a component's report of injurious health.

- AMF handles saAmfHealthcheckConfirm() with SA_AIS_ERR_FAILED_OPERATION as
the result. If a recovery is already ongoing, AMF does nothing. If no
recovery is in progress, AMF invokes the recovery action specified by the
component when the health check was started. If that recommendation was
SA_AMF_NO_RECOMMENDATION, AMF uses the recovery action configured for the
component (saAmfCompRecoveryOnError). If this is also
SA_AMF_NO_RECOMMENDATION, AMF performs a component restart or a
component/SU failover, depending on the values of saAmfCompDisableRestart
and saAmfSUFailover.

- Hardened handling of component cleanup and of health check responses.

- Time supervision and return-value checking of the clc-cli CLEANUP command.

- Handle the 'recommended recovery' specified by a component in an error
report. The recovery actions implemented so far are component restart and
node failover.

- The attribute saAmfCompDisableRestart is now recognized. If the component
specifies 'Component restart' and restart is disabled, the SU containing
the component fails over.

- The attribute saAmfSUFailover is not recognized. An SU always fails over
as a single entity.

- A component can report an error on a component other than itself.

- Implementation of 'Instantiation Level' according to chapter 3.9.2 of the
AMF specification.
- Implementation of the escalation levels: component restart, SU restart,
SU failover and node failover.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1321 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-12-11 05:37:07 +00:00
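The recovery fallback chain described in commit 8f87e5f413 can be sketched as a single selection function. This is a minimal illustration, not the openais code. The enums `recovery_t` and `action_t` and the function `select_recovery()` are hypothetical stand-ins for the SA_AMF_* recommendation values. Two mappings are assumptions based on my reading of the commit text: when no recommendation is available at all, the sketch defaults to component restart, and component failover maps to SU failover.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local stand-ins for the SA_AMF_* recovery recommendations. */
typedef enum {
    REC_NO_RECOMMENDATION,
    REC_COMPONENT_RESTART,
    REC_COMPONENT_FAILOVER,
    REC_NODE_FAILOVER
} recovery_t;

/* Hypothetical result type: the action AMF ends up taking. */
typedef enum {
    ACT_NONE,
    ACT_COMPONENT_RESTART,
    ACT_SU_FAILOVER,
    ACT_NODE_FAILOVER
} action_t;

/*
 * healthcheck_rec: recommendation given when the health check was started
 * configured_rec:  the component's saAmfCompRecoveryOnError
 * disable_restart: the component's saAmfCompDisableRestart
 */
action_t select_recovery(bool recovery_ongoing, recovery_t healthcheck_rec,
                         recovery_t configured_rec, bool disable_restart)
{
    recovery_t rec;

    if (recovery_ongoing)
        return ACT_NONE;               /* a recovery is already in progress */

    rec = healthcheck_rec;
    if (rec == REC_NO_RECOMMENDATION)
        rec = configured_rec;          /* fall back to the configured action */
    if (rec == REC_NO_RECOMMENDATION)
        rec = REC_COMPONENT_RESTART;   /* assumed default, subject to disable_restart */

    switch (rec) {
    case REC_COMPONENT_RESTART:
        /* restart disabled: the containing SU fails over instead */
        return disable_restart ? ACT_SU_FAILOVER : ACT_COMPONENT_RESTART;
    case REC_COMPONENT_FAILOVER:
        /* saAmfSUFailover is ignored: the SU fails over as a single entity */
        return ACT_SU_FAILOVER;
    case REC_NODE_FAILOVER:
        return ACT_NODE_FAILOVER;
    default:
        assert(0 && "unexpected recommendation");
        return ACT_NONE;
    }
}
```

Keeping the whole chain in one pure function makes each fallback step easy to test in isolation from the AMF state machines.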
Hans Feldt
3d68074945 - Use of sync_request() in SYNC service
- sync_abort() callback implemented



git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1317 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-12-04 14:28:40 +00:00
Hans Feldt
4e7e222aea * Improvement of SU failover to handle removal of those standby assignments
in other SUs that are not directly associated with the failing-over SU's
active assignments.
* Improvement of node failover to handle removal of those standby assignments
in other SUs that are not directly associated with the active assignments of
the failing-over node's SUs.

* Improvement of SU failover to handle SI assignments to spare SUs.

* Improvement of node failover to handle SI assignments to spare SUs.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1285 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-10-27 09:58:59 +00:00
Lon Hohberger
f983d37a2f Patch contains:
A mechanism to defer and recall simultaneous
events in the state machines for amf_cluster,
amf_application and amf_sg.

The implication of this defer and recall mechanism is
that it is now possible to recover from, e.g., several
simultaneous SU failures in an ordered, serialized manner.

The events that can be deferred/recalled so far are
SG_FAILOVER_NODE_EV, SG_START_EV, SG_FAILOVER_SU_EV,
CLUSTER_SYNC_READY_EV, APPLICATION_START_EV and
APPLICATION_ASSIGN_WORKLOAD_EV.

Files involved:

Index: exec/amfnode.c
Index: exec/amfsg.c
Index: exec/amfutil.c
Index: exec/amfapp.c
Index: exec/amfcomp.c
Index: exec/amfcluster.c
Index: exec/amf.h



git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1266 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-10-13 10:18:04 +00:00
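The defer-and-recall idea from commit f983d37a2f can be illustrated with a small FIFO attached to a state machine. This is a sketch under stated assumptions, not the amf_sg implementation. The type `sg_t`, the functions `sg_event()` and `sg_complete()`, and the fixed queue size `MAX_DEFERRED` are hypothetical. The model works as follows: an event that arrives while the state machine is busy is queued, and finishing the current activity recalls the oldest queued event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEFERRED 16  /* hypothetical fixed queue capacity */

typedef enum { SG_START_EV, SG_FAILOVER_SU_EV, SG_FAILOVER_NODE_EV } sg_event_t;

typedef struct {
    bool busy;                          /* an activity (e.g. a failover) is ongoing */
    sg_event_t deferred[MAX_DEFERRED];  /* ring buffer of deferred events */
    size_t head, count;
    int handled;                        /* number of events actually processed */
    sg_event_t last;                    /* most recently processed event */
} sg_t;

static void sg_process(sg_t *sg, sg_event_t ev)
{
    sg->busy = true;   /* stays busy until sg_complete() is called */
    sg->handled++;
    sg->last = ev;
}

/* Deliver an event: process it now, or defer it if the machine is busy. */
bool sg_event(sg_t *sg, sg_event_t ev)
{
    if (sg->busy) {
        if (sg->count == MAX_DEFERRED)
            return false;              /* queue full, caller must handle */
        sg->deferred[(sg->head + sg->count) % MAX_DEFERRED] = ev;
        sg->count++;
        return true;
    }
    sg_process(sg, ev);
    return true;
}

/* The current activity finished: recall the oldest deferred event, if any. */
void sg_complete(sg_t *sg)
{
    assert(sg->busy);
    sg->busy = false;
    if (sg->count > 0) {
        sg_event_t ev = sg->deferred[sg->head];
        sg->head = (sg->head + 1) % MAX_DEFERRED;
        sg->count--;
        sg_process(sg, ev);
    }
}
```

With this model, two SU failures reported back to back are handled one after the other in arrival order instead of interleaving.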
Lon Hohberger
5126acc37a The patch contains:
The instantiation of a component is performed with some new steps:

 1. The SU invokes the Comp to instantiate.
 2. The Comp multicasts a new event, MESSAGE_REQ_EXEC_AMF_COMPONENT_INSTANTIATE.
 3. The Comp receives the new event MESSAGE_REQ_EXEC_AMF_COMPONENT_INSTANTIATE.
 4. If the Comp is within an SU hosted on this node, the component invokes
    the clc_cli instantiate script to start the component and starts a timer
    to supervise the start and registration of the component.
 5. If the instantiation timer expires before the component has registered
    itself, the Comp multicasts a new event,
    MESSAGE_REQ_EXEC_AMF_COMPONENT_INSTANTIATE_TMO.
 6. The Comp receives the MESSAGE_REQ_EXEC_AMF_COMPONENT_INSTANTIATE_TMO event.
 7. The Comp presence state is set to SA_AMF_PRESENCE_INSTANTIATION_FAILED.
 8. When all components are in presence state SA_AMF_PRESENCE_INSTANTIATED or
    SA_AMF_PRESENCE_INSTANTIATION_FAILED, the start or restart continues with
    the assignment of workload.

 This implementation means that the instantiation procedure never waits
 endlessly for a registration: each component instantiation ends in either
 a failure or a success.

 Hardening of the cluster start use case:

 1. A clearer separation of responsibilities between amf_cluster and
    amf_application.
 2. A clearer interface and separation between amf_main (amf.c) and
    amf_cluster.
 3. A clearer interface and separation between amf_cluster and amf_node.
 4. A clearer separation of responsibilities between amf_node and
    amf_application.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1251 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-10-04 05:24:14 +00:00
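Steps 5 to 8 of commit 5126acc37a can be modeled with two small helpers: one that marks a timed-out component as failed, and one that decides whether the SU may proceed to workload assignment. This is a hedged sketch. The enum `presence_t` is a hypothetical local subset that mirrors the SaAmfPresenceStateT values named in the commit. The functions `comp_instantiate_tmo()` and `su_instantiation_done()` are illustrative names and do not come from the openais sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local subset mirroring SA_AMF_PRESENCE_* values. */
typedef enum {
    PRESENCE_UNINSTANTIATED,
    PRESENCE_INSTANTIATING,
    PRESENCE_INSTANTIATED,
    PRESENCE_INSTANTIATION_FAILED
} presence_t;

/* Steps 6-7: on the INSTANTIATE_TMO event, a component that still has not
 * registered is marked as failed. */
void comp_instantiate_tmo(presence_t *comp)
{
    if (*comp == PRESENCE_INSTANTIATING)
        *comp = PRESENCE_INSTANTIATION_FAILED;
}

/* Step 8: workload assignment may continue only when every component has
 * either registered (INSTANTIATED) or timed out (INSTANTIATION_FAILED). */
bool su_instantiation_done(const presence_t *comps, size_t n)
{
    assert(comps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (comps[i] != PRESENCE_INSTANTIATED &&
            comps[i] != PRESENCE_INSTANTIATION_FAILED)
            return false;  /* still waiting for registration or timeout */
    }
    return true;
}
```

Because every component eventually reaches one of the two terminal states, `su_instantiation_done()` is guaranteed to become true. This is the "never waits endlessly" property the commit describes.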
Hans Feldt
2e86a93d45 Two configuration attributes for SG objects were not handled
correctly by the config parser. 


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1249 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-09-29 11:23:00 +00:00
Hans Feldt
da1fc7acd2 - "No need for DNS or /etc/hosts"
The call to gethostbyaddr() has been removed. This has been replaced by a
protocol where each node multicasts its hostname (obtained with gethostname()).

- "Logical AMF nodes"

The AMF node name is no longer a hostname. Instead, the saAmfNodeClmNode
configuration attribute of the AMF node holds the hostname. This config
attribute is now mandatory. The change to the amf.conf file shows the required changes.

- Some other AMF sync bug fixes



git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1236 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-09-05 07:31:28 +00:00
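Commit da1fc7acd2 replaces gethostbyaddr() with a protocol in which each node multicasts its own hostname. A receiver-side table can then map the saAmfNodeClmNode hostname to a node ID. The sketch below shows both ends with the multicast itself left out. `struct hostname_msg`, `hostname_msg_init()`, `hostname_msg_deliver()`, `hostname_to_nodeid()` and the size limits are all hypothetical; only gethostname() is a real POSIX call.

```c
#define _POSIX_C_SOURCE 200112L  /* for gethostname() */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define HOSTNAME_MAX 256  /* hypothetical buffer size */
#define MAX_NODES 32      /* hypothetical cluster size limit */

/* Hypothetical payload that each node multicasts. */
struct hostname_msg {
    uint32_t nodeid;
    char hostname[HOSTNAME_MAX];
};

static struct hostname_msg nodes[MAX_NODES];  /* nodeid -> hostname table */
static int node_count;

/* Sender: fill the message with this node's hostname. */
int hostname_msg_init(struct hostname_msg *msg, uint32_t my_nodeid)
{
    msg->nodeid = my_nodeid;
    if (gethostname(msg->hostname, sizeof(msg->hostname)) != 0)
        return -1;
    /* POSIX does not guarantee termination if the name was truncated */
    msg->hostname[sizeof(msg->hostname) - 1] = '\0';
    return 0;
}

/* Receiver: add or update the entry for the sending node. */
int hostname_msg_deliver(const struct hostname_msg *msg)
{
    int i;

    assert(msg != NULL);
    for (i = 0; i < node_count; i++)
        if (nodes[i].nodeid == msg->nodeid)
            break;
    if (i == node_count) {
        if (node_count == MAX_NODES)
            return -1;
        node_count++;
    }
    nodes[i] = *msg;
    nodes[i].hostname[HOSTNAME_MAX - 1] = '\0';
    return 0;
}

/* Resolve an AMF node's saAmfNodeClmNode value (a hostname) to a node ID. */
int hostname_to_nodeid(const char *hostname, uint32_t *nodeid)
{
    for (int i = 0; i < node_count; i++) {
        if (strcmp(nodes[i].hostname, hostname) == 0) {
            *nodeid = nodes[i].nodeid;
            return 0;
        }
    }
    return -1;
}
```

No DNS or /etc/hosts lookup appears anywhere in this flow. A hostname can only be resolved after its owner has announced itself.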
Hans Feldt
634d196c47 1. Improvement of the use case 'AMF node leaves spontaneously'
2. Improvement of the use case 'AMF node join'
3. Improvement to manage more than one SG within an application
4. Improvement to manage an arbitrary number of CSI assignments associated
   with a CSI
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1232 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-08-28 12:44:15 +00:00
Hans Feldt
4a61871344 AMF sync #2
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1220 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-08-17 13:25:01 +00:00
Hans Feldt
98dfb95e26 - New sync state machine, implemented and described in amf.c
- One AMF node reads the AMF config file (IMM style)
- One AMF node syncs the other AMF nodes
- One AMF object is serialized and sent as one message
- Serialization/deserialization of most objects is trivial (memcpy),
except for component and csi-attributes objects, which have variable-size
arrays/strings.
- Depth-first AMF object tree traversal preserves relations when syncing
- Ordered lists of SUs and SIs
- Constructors/destructors per class
- Serializers/deserializers per class
- A configuration change changes the sync state
- Sync callbacks execute the sync
- "Use case" tracing for sync using the SYNCTRACE macro (trace6)
- The sync master is initially the winner of a timeout race. If the
master leaves the cluster, the node with the lowest node ID becomes the new master.
- amf_malloc implements an AMF central malloc routine with error handling.



git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1200 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-08-11 12:28:10 +00:00
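Commit 98dfb95e26 notes that most objects serialize with a plain memcpy, while objects with variable-size strings need extra handling. A common way to handle them is a length prefix, sketched below. `struct attr_obj`, `attr_serialize()` and `attr_deserialize()` are hypothetical. The wire layout `[id][len][bytes]` is an assumption, not the format used by amf.c. The sketch also assumes that sender and receiver share the same byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-size object, e.g. a CSI attribute with a string value. */
struct attr_obj {
    uint32_t id;
    char *value;  /* NUL-terminated, variable length */
};

/* Serialize as [id][len][value bytes]; returns a malloc'ed buffer. */
void *attr_serialize(const struct attr_obj *a, size_t *size)
{
    uint32_t len;
    char *buf;

    assert(a->value != NULL);
    len = (uint32_t)strlen(a->value);
    *size = 2 * sizeof(uint32_t) + len;
    buf = malloc(*size);
    if (buf == NULL)
        return NULL;
    memcpy(buf, &a->id, sizeof(uint32_t));
    memcpy(buf + sizeof(uint32_t), &len, sizeof(uint32_t));
    memcpy(buf + 2 * sizeof(uint32_t), a->value, len);  /* no NUL on the wire */
    return buf;
}

/* Deserialize; the caller frees a->value on success. */
int attr_deserialize(const void *buf, size_t size, struct attr_obj *a)
{
    const char *p = buf;
    uint32_t len;

    if (size < 2 * sizeof(uint32_t))
        return -1;
    memcpy(&a->id, p, sizeof(uint32_t));
    memcpy(&len, p + sizeof(uint32_t), sizeof(uint32_t));
    if (size - 2 * sizeof(uint32_t) < len)
        return -1;  /* truncated message */
    a->value = malloc((size_t)len + 1);
    if (a->value == NULL)
        return -1;
    memcpy(a->value, p + 2 * sizeof(uint32_t), len);
    a->value[len] = '\0';  /* restore the terminator */
    return 0;
}
```

A fixed-size object with no pointers could skip all of this and be sent with a single memcpy of the struct.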
Hans Feldt
62bc733e2e - Error escalation improved, SU failover recovery action added
- Most runtime attributes in the information model are calculated at runtime
  from more fundamental information (improves consistency)
- sg_assign_si can now recalculate workloads considering existing
  assignments
- Logging improvements, similar to what is required as notification in
  AMF spec.
- CLC-CLI INSTANTIATE now exits aisexec when it fails (should later be
  sent as an NTF alarm)
- CLC-CLI CLEANUP correctly handles already terminated processes
- testamf1.c printouts removed for normal operation
- Iterator functions for SI/CSI assignments 



git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1108 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-07-07 08:04:01 +00:00
Hans Feldt
154a857c3b AMF changes:
- Revised cluster start 
- Includes Steven's "amf invalid write patch"
- Includes "components not started with 0.76" patch
- New timer API use backed out of AMF (temporary)



git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1091 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-06-27 08:49:07 +00:00
Steven Dake
01afe82393 32/64/mixed endian support for checkpoint service.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1074 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-06-21 21:15:16 +00:00
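Commit 01afe82393 gives no details, but a common way to support mixed-endian clusters works like this. The sender writes a marker in its native byte order, and the receiver byte-swaps every field when the marker arrives swapped. Fixed-width fields keep the layout identical on 32- and 64-bit hosts. The sketch below illustrates that general technique only. `struct ckpt_msg_header`, its fields, `ENDIAN_MARKER` and `ckpt_msg_header_fixup()` are hypothetical and are not taken from the checkpoint service.

```c
#include <assert.h>
#include <stdint.h>

#define ENDIAN_MARKER 0xff22u  /* hypothetical marker, written in sender byte order */

/* Fixed-width fields with explicit padding: same layout on 32- and 64-bit hosts. */
struct ckpt_msg_header {
    uint16_t endian_marker;
    uint16_t pad;
    uint32_t size;
    uint64_t offset;
};

static uint16_t swab16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

static uint32_t swab32(uint32_t x)
{
    return ((x & 0xffu) << 24) | ((x & 0xff00u) << 8) |
           ((x >> 8) & 0xff00u) | (x >> 24);
}

static uint64_t swab64(uint64_t x)
{
    return ((uint64_t)swab32((uint32_t)x) << 32) | swab32((uint32_t)(x >> 32));
}

/* Receiver: convert the header in place if the sender's byte order differs. */
void ckpt_msg_header_fixup(struct ckpt_msg_header *h)
{
    if (h->endian_marker == ENDIAN_MARKER)
        return;  /* same byte order, nothing to do */
    assert(h->endian_marker == swab16(ENDIAN_MARKER));
    h->endian_marker = ENDIAN_MARKER;
    h->size = swab32(h->size);
    h->offset = swab64(h->offset);
}
```

Converting on receipt only when needed means that homogeneous clusters pay no byte-swapping cost.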
Hans Feldt
e993689ac5 Refactoring of AMF into several files (based on the classes in the
information model). A central header file (amf.h) keeps all the definitions
and prototypes needed.

New things apart from that:
- Doxygen HTML documentation can be generated for AMF, e.g. each file has a description
- saAmfHAStateGet() now works
- component-invoked healthchecks implemented (but not tested)



git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1071 fd59a12c-fef9-0310-b244-a6a79926bd2f
2006-06-20 06:45:16 +00:00