Wednesday, September 12, 2018

Managing Long Running Jobs In RedPrairie/JDA

Overview

We often have jobs that perform long-running tasks, and there is no easy way to see the work they are performing at a given time.  The only option is to look at the MOCA Console or a similar view, which provides a very low-level window into the MOCA command or SQL being executed.  Historical information is not easy to view either.

While this write-up describes the solution in the context of RedPrairie/JDA, the concept is universal and can be applied to any system where long-running operations are implemented.

Our Solution

At Oracular we have developed a universal approach to this problem, which we apply in all of our solutions, including RedPrairie/JDA jobs.  We view any long-running operation as follows:
  • Each instance of execution is a job.
  • The job executes one or more modules.
  • The module executes one or more actions.
  • The start and end of every job, module, and action is recorded with a timestamp in a separate commit context, so it is visible outside the transaction right away.
  • In case of an error, the error information is recorded as well.
  • All of this information is kept in a database table so it can be referenced later.
  • DDAs are available so that end users can view this information.
A table called usr_ossi_job_log records this information.  It has the following structure:
Column          Description
Job #           A new number for every job
Module #        A new number for every module within the job
Action #        A new number for every action within the module
Who             A string identifying the user or other context executing the job
Job Id          Job name
Module Id       Module name
Action Id       Action name
Context Data    Additional data
Started On      Start date/time
Ended On        End date/time
Error Code      Error code, if an error occurs
Error Message   Error message, if an error occurs
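In database terms, the table might look something like the sketch below.  The names uc_ossi_job_seq, uc_ossi_module_seq, uc_ossi_action_seq, uc_ossi_err_code, and uc_ossi_err_descr appear in the code later in this post; the remaining column names and all of the types are illustrative guesses (shown as Oracle DDL), so check your actual usr_ossi_job_log definition:

    create table usr_ossi_job_log
    (
       uc_ossi_job_seq      number,           /* Job # */
       uc_ossi_module_seq   number,           /* Module # */
       uc_ossi_action_seq   number,           /* Action # */
       uc_ossi_who          varchar2(100),    /* Who (hypothetical name) */
       uc_ossi_job_id       varchar2(100),    /* Job Id (hypothetical name) */
       uc_ossi_module_id    varchar2(100),    /* Module Id (hypothetical name) */
       uc_ossi_action_id    varchar2(100),    /* Action Id (hypothetical name) */
       uc_ossi_ctxt_data    varchar2(4000),   /* Context Data (hypothetical name) */
       uc_ossi_start_dte    date,             /* Started On */
       uc_ossi_end_dte      date,             /* Ended On */
       uc_ossi_err_code     varchar2(100),    /* Error Code */
       uc_ossi_err_descr    varchar2(4000)    /* Error Message */
    )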

So as a job progresses, it keeps updating this table in a separate commit context.  This allows users to know exactly how far along a given instance of the job is, and since we also have historical data, we can reasonably predict when the job will end.
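For example, a support user could watch a running instance with a query along these lines (the column names are illustrative guesses, not the actual usr_ossi_job_log definition, so adjust them to your schema):

    [select uc_ossi_module_seq, uc_ossi_action_seq,
            uc_ossi_module_id, uc_ossi_action_id,
            uc_ossi_start_dte, uc_ossi_end_dte
       from usr_ossi_job_log
      where uc_ossi_job_seq = 12345   /* hypothetical job# to inspect */
      order by uc_ossi_module_seq, uc_ossi_action_seq]

Rows whose end date/time is still null are in flight; comparing the elapsed times of completed rows against prior runs gives the basis for predicting the finish time.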

Making It Happen!

To make it all come together we have created a set of MOCA commands and MOCA functions that allow our code to be simple and at the same time provide information to this table.  A typical such job will be structured as follows:
/*
 * Adds a row to the table with new job#, job-id, additional information, 
 * and who information.  The start date/time is logged as well.  module 
 * and action are set to % (literal).  Commits this data in a separate 
 * commit context
 */
publish data
where uc_ossi_job_seq = ossi__register_job ( 'JOB-ID', 'SOME-INFO', 
                        'WHO-INFO' )
|
try
{
    /* 
     * Some step(s) during the job.
     * Adds a row to the table for the job#, gets a new module#, and 
     * returns it.  The row has the module name and also additional 
     * information as needed.  Action is set to % (literal).
     * Additional information can come in handy, for example, to log 
     * progress when we are processing several rows of a domain, like 
     * processing orders in a wave.  You could log something like 1/n 
     * there so that you can see how far along the job is.  This is 
     * committed in a separate commit context.
     */
    publish data
    where uc_ossi_module_seq = ossi__register_module ( @uc_ossi_job_seq, 
                               'MODULE NAME', 'MORE DATA' )
    |
    try
    {
        /* 
         * This module will have 1 or more actions - each coded like 
         * this block.  We do not have to use the action concept - it 
         * becomes useful when we can divide a module further.  For 
         * example, the module could be an order and the action could 
         * be a pick.  The idea is the same as for a module - here we 
         * can give the action name and additional information.
         * Additional information can be used to provide specific 
         * details and also 1/n type of data.  This is committed in a 
         * separate commit context.
         */
        publish data 
        where uc_ossi_action_seq = ossi__register_action ( 
                                   @uc_ossi_job_seq, @uc_ossi_module_seq, 
                                   'ACTION NAME', 'MORE DATA' )
        |
        try
        {
            /* MOCA commands that do the work */
        }
        finally
        {
            /*
             * uc_ossi_job_seq, uc_ossi_module_seq, and uc_ossi_action_seq 
             * are in scope, so that row is updated with the end timestamp 
             * and error information.
             * This is committed in a separate commit context.
             */
            complete ossi job log
            where uc_ossi_err_code  = @?
            and   uc_ossi_err_descr = @!
        }
    }
    finally
    {
        /*
         * uc_ossi_job_seq and uc_ossi_module_seq are in scope, so that 
         * row is updated with the end timestamp and error information.
         * This is committed in a separate commit context.
         */
        complete ossi job log
        where uc_ossi_err_code  = @?
        and   uc_ossi_err_descr = @!
    }
}
finally
{
    /*
     * Updates the row in the table for the job itself (added when the 
     * job was registered).  Sets the end time and also the error 
     * information, if applicable.
     * This is committed in a separate commit context.
     */
    complete ossi job
    where uc_ossi_err_code  = @?
    and   uc_ossi_err_descr = @!
} 

So as this job progresses through its modules and actions, we publish its progress to this table - so we know exactly how far along it is.  We also know historically how long the job has taken and how long the various modules and actions have taken.

The data structure is generic and will work for any type of long running operation.
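To illustrate the kind of historical analysis this enables, average module durations for a job could be computed with something like the following.  The column names are illustrative guesses (adjust to your actual usr_ossi_job_log definition), and the date arithmetic assumes an Oracle backend, where subtracting two dates yields days:

    [select uc_ossi_module_id,
            count(*) num_runs,
            avg((uc_ossi_end_dte - uc_ossi_start_dte) * 86400) avg_secs
       from usr_ossi_job_log
      where uc_ossi_job_id = 'MY-JOB'
        and uc_ossi_module_id <> '%'     /* skip the job-level row */
        and uc_ossi_action_id = '%'      /* module-level rows only */
        and uc_ossi_end_dte is not null
      group by uc_ossi_module_id
      order by avg_secs desc]

A module whose recent runs sit well above its historical average is the natural first place to look when performance deteriorates.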

End User View

The users can view all of the jobs in a simple-to-use front end.  The first screen provides a dropdown listing all the jobs defined in the system:

Then it shows a view that indicates how many times the job has run and the times of the latest execution:
By going to the Job Log Detail Display, we can see the progress of the latest execution.  Here the users can see the progress of the currently executing job as well.
To see previous executions, go to Job Executions tab:

Summary

The users can view all of their jobs using this approach.  The data it provides is extremely valuable for support: we can detect if a job is stuck and monitor the progress of long-running operations.  We can easily predict when a job will end and objectively determine whether performance has deteriorated.  In case of performance degradation, we can pinpoint the exact operation that is the culprit.
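A stuck job, for example, shows up as a row that started long ago but never ended.  A monitoring check could be as simple as the query below (again, the column names are illustrative guesses, and the sysdate arithmetic assumes an Oracle backend):

    [select uc_ossi_job_seq, uc_ossi_job_id, uc_ossi_start_dte
       from usr_ossi_job_log
      where uc_ossi_module_id = '%'              /* job-level rows only */
        and uc_ossi_end_dte is null
        and uc_ossi_start_dte < sysdate - 1/24   /* running for over an hour */
      order by uc_ossi_start_dte]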
