
Saturday, January 14, 2012

Best practices in configuring fault-policies

This post covers some best practices to keep in mind when designing fault policies.


JCA binding level retry execution within fault policy retries


If you are using retry actions on adapters with both JCA-level retries for the outbound direction and a retry action in the fault policy file for outbound failures, the JCA-level (or binding level) retries are executed within the fault policy retries.
Let's say you have configured the following JCA binding properties in composite.xml

      <property name="jca.retry.count">2</property>
      <property name="jca.retry.interval">5</property>
      <property name="jca.retry.backoff">2</property>

and assume the fault policies file contains the following retry action for remoteFault on the reference binding component:

      <Action id="ora-retry">
        <retry>
          <retryCount>3</retryCount>
          <retryInterval>10</retryInterval>
          <exponentialBackoff/>
          <retryFailureAction ref="ora-human-intervention"/>
        </retry>
      </Action>
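
For this action to take effect, the policy needs a condition that maps remoteFault to it, and the policy has to be bound to the reference. A minimal sketch of both files follows. The policy id RetryPolicy and the reference name WriteOrder are hypothetical placeholders, and the Actions section is abbreviated to a comment:

```xml
<!-- fault-policies.xml (sketch); RetryPolicy is a hypothetical policy id -->
<faultPolicies xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <faultPolicy version="2.0.1" id="RetryPolicy">
    <Conditions>
      <!-- route remoteFault to the ora-retry action -->
      <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
                 name="bpelx:remoteFault">
        <condition>
          <action ref="ora-retry"/>
        </condition>
      </faultName>
    </Conditions>
    <Actions>
      <!-- the ora-retry action shown above goes here, together with
           the ora-human-intervention action it refers to -->
    </Actions>
  </faultPolicy>
</faultPolicies>
```

```xml
<!-- fault-bindings.xml (sketch); WriteOrder is a hypothetical reference name -->
<faultPolicyBindings version="2.0.1"
                     xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <reference faultPolicy="RetryPolicy">
    <name>WriteOrder</name>
  </reference>
</faultPolicyBindings>
```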

Now, when a remote fault is returned for the reference binding component, the following retry sequence happens:

    * Fault policy retry 1:
          o  JCA retry 1 (with 5 seconds interval)
          o  JCA retry 2 (with 10 seconds interval)
    * Fault policy retry 2:
          o  JCA retry 1 (with 5 seconds interval)
          o  JCA retry 2 (with 10 seconds interval)
    * Fault policy retry 3:
          o  JCA retry 1 (with 5 seconds interval)
          o  JCA retry 2 (with 10 seconds interval)

As a reminder, if you intend to retry only through fault policies, do not configure JCA-level retries in composite.xml.
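
For orientation, the JCA-level retry properties discussed in this section sit inside the binding.jca element of a reference in composite.xml. The fragment below is only a sketch: the reference name, interface and .jca file name are hypothetical placeholders.

```xml
<!-- composite.xml (fragment); WriteOrder and its files are hypothetical names -->
<reference name="WriteOrder">
  <interface.wsdl interface="http://example.com/WriteOrder#wsdl.interface(Write_ptt)"/>
  <binding.jca config="WriteOrder_file.jca">
    <!-- leave out the jca.retry.* properties here if the fault policy
         should be the only retry mechanism -->
  </binding.jca>
</reference>
```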

Best practices in Configuring ora-retry fault policy

When you configure a fault policy to recover instances with the ora-retry action and the number of specified instance retries is exceeded, the instance is marked as open.faulted (in-flight state). The instance remains active and keeps running.
Marking instances as open.faulted ensures that no instances are lost, but it also means that retries are attempted again even after the retry count is exhausted, which is usually not what you want at that point.
The best practice is to configure another fault handling action to follow the ora-retry action in the fault policy file, such as one of the following:
•    Configure an ora-human-intervention action to manually perform instance recovery from Oracle Enterprise Manager Fusion Middleware Control Console.
•    Configure an ora-terminate action to close the instance (mark it as closed.faulted) and never retry again.
For example:
      <Action id="ora-retry">
        <retry>
          <retryCount>3</retryCount>
          <retryInterval>3</retryInterval>
          <exponentialBackoff/>
          <retryFailureAction ref="ora-human-intervention"/>
        </retry>
      </Action>
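
The action named in retryFailureAction must itself be defined in the Actions section of the same policy. A sketch of the two follow-up actions listed above, using the action ids that appear in this post:

```xml
<!-- Hands the faulted instance over for manual recovery in Enterprise Manager -->
<Action id="ora-human-intervention">
  <humanIntervention/>
</Action>

<!-- Terminates the instance so that it is not retried again -->
<Action id="ora-terminate">
  <abort/>
</Action>
```

To close the instance instead of waiting for manual recovery, point retryFailureAction at ora-terminate.
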
However, if you do not set an action to be performed after an ora-retry action in the fault policy file and the number of instance retries is exceeded, the instance remains marked as open.faulted, and recovery again attempts to handle the instance.
For example, if no action is defined in the following fault policy file after ora-retry:
      <Action id="ora-retry">
        <retry>
          <retryCount>2</retryCount>
          <retryInterval>2</retryInterval>
          <exponentialBackoff/>
        </retry>
      </Action>
The following actions are performed:
•    The invoke activity is attempted (using the above-mentioned fault policy code to handle the fault).
•    Two retries are attempted at increasing intervals (after two seconds, then after four seconds).
•    If all retry attempts fail, the following actions are performed:
o    A detailed fault error message is logged in the audit trail
o    The instance is marked as open.faulted (in-flight state)
o    The instance is picked up and the invoke activity is re-attempted
o    The fault is rethrown to the system fault handler
•    Recovery may also fail. In that case, the invoke activity is re-executed. Additional audit messages are logged.

Message rejection handlers

Messages that error out before being posted to the service infrastructure are referred to as rejected messages. For example, the Oracle File Adapter picks up a file containing data in text format and tries to translate it to XML (using NXSD). If the translation fails, the message is rejected and is not posted to the target composite.
Rejected messages are stored in the database (in the rejected_message table) by default. If you do not configure a message rejection handler, the default rejection message handler takes over and stores them on the file system: it writes the payload and properties of each message to a predefined location in WLS_HOME. Currently, Oracle SOA Suite does not provide the capability to resubmit rejected messages, so resubmission is your responsibility. Without a dedicated handler, rejected messages may go unnoticed, which is not good practice. I recommend configuring one to handle and resubmit them.
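
As an illustration, a rejection handler can be defined as a fault policy whose fault name is the inbound service name in the rejected-messages namespace. The sketch below assumes a hypothetical inbound service called FileIn and a hypothetical target directory. The file name pattern follows the style used in the Oracle adapter documentation, so check it against your version before relying on it.

```xml
<!-- fault-policies.xml (sketch); FileIn and /tmp/rejectedmsgs are hypothetical -->
<faultPolicies xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <faultPolicy version="2.0.1" id="RejectedMessages">
    <Conditions>
      <!-- the local part of the fault name is the inbound service name -->
      <faultName xmlns:rjm="http://schemas.oracle.com/sca/rejectedmessages"
                 name="rjm:FileIn">
        <condition>
          <action ref="writeToFile"/>
        </condition>
      </faultName>
    </Conditions>
    <Actions>
      <!-- write each rejected message to a directory you monitor -->
      <Action id="writeToFile">
        <fileAction>
          <location>/tmp/rejectedmsgs</location>
          <fileName>rejected_%ID%_%TIMESTAMP%.xml</fileName>
        </fileAction>
      </Action>
    </Actions>
  </faultPolicy>
</faultPolicies>
```

```xml
<!-- fault-bindings.xml (sketch): bind the policy to the inbound service -->
<faultPolicyBindings version="2.0.1"
                     xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <service faultPolicy="RejectedMessages">
    <name>FileIn</name>
  </service>
</faultPolicyBindings>
```

Writing to a known directory keeps rejected messages visible, and a separate process can pick them up from there for resubmission.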


JCA adapters and fault-policies


In a nutshell, the standard practices for JCA error handling are:

Error type                 Best practice
Inbound, retryable         Use JCA-level retries in composite.xml
Inbound, non-retryable     Use fault-policy message rejection handlers
Outbound, retryable        Use JCA-level retries in composite.xml
Outbound, non-retryable    Use fault policies

4 comments:

  1. good content..Thanks for sharing.

  2. Hi. Good article, thanks. How to configure a fault policy for a JMS Consumer Composite *not* to write rejected messages to the filesystem ? Thanks. OracleDigga.

    Replies
    1. Answer to my question about fault policy here :
      http://oracle-aia-11g.blogspot.fr/

    2. Direct link :
      http://oracle-aia-11g.blogspot.fr/2013/10/oracle-aia-11g-et-les-fault-policies.html
