Sunday, October 21, 2012

Improved Archiving for RedPrairie

Overview of the RedPrairie standard solution

An archive instance is a complete RedPrairie installation.  The data from the primary instance is copied to the archive instance using the MOCA remote execution capability.  The setup is defined through the “Database Archive Maintenance” application, which is run on the primary instance.
RedPrairie uses a policy on the primary instance to define the location of the archive instance.

  •  Use “INSTALLED” to indicate whether archiving is installed on the system.
  •  Use “REMOTE-HOST” to specify the host and port of the archive environment.
  •  Use “LOCAL-ARCHIVE” to archive data to tables in the same instance, with “TABLE-PREFIX” determining the archive table names.  This option is rarely used because the existing applications cannot then be run against the archived data.

The setup is generally as follows:
1.  Specify the main table that is being archived.  This is the actual table name.
2.  Specify the duplicate action.  This tells the system what to do when a row already exists on the archive:
    a.  Ignore the error and continue, i.e. do not error out and do not replace the value.  Typical for data that is only ever added.
    b.  Error out and roll back.
    c.  Replace the current data and continue.
3.  Specify the “List Command”.  This command lists the rows from the primary data source that need to be archived.  Its filter ensures that we select the proper candidates, for example by checking statuses and date ranges.
4.  Use “Post Archive Command” to specify the work that needs to be done once this object has been processed; for example, we may purge the rows that were archived.
5.  Use “Details” to specify dependent data that needs to be archived.  Each detail also specifies a list command.  The “@” parameters refer to the variables returned by the parent row from the “List Command” above (see the sketch below).
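For example, a main-table “List Command” and a matching “Details” command might look as follows.  This is a hypothetical sketch: the shipment tables, the status check, and the 90-day cutoff are illustrative, not part of any standard setup.

    Main table list command (completed shipments older than 90 days):
        [ select * from shipment
           where shpsts = 'C'
             and adddte < sysdate - 90 ]

    Detail list command (@ship_id comes from the parent shipment row):
        [ select * from shipment_line
           where ship_id = '@ship_id' ]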
The above data is stored in the archdr and arcdtl tables.
The data is processed as follows:
1.  Fetch the rows from the main table using the “List Command”.  This may return multiple rows.
2.  For each row returned:
    a.  Move the row to the archive environment.
    b.  For each “Detail” object:
        i.   Execute its “List Command”.  This command runs in the context of the main-table row fetched above.
        ii.  For each row returned, move the row to the archive environment.

Issues with the standard solution

The above approach is quite sophisticated and works in some cases.  We often see the following types of issues:
  1. It processes one row at a time.  This implies that if we need to move 200,000 rows (a typical number for some of the large tables), we will require 600,000 independent operations: 200,000 to read the data, 200,000 to move the data, and 200,000 to delete the data from the source.
  2. The logic to move the dependent data uses the same row-by-row approach.  For each row in the header table, the data for each detail table is moved one row at a time.
  3. The solution does not provide visibility into the archive process.  The user cannot find out how archiving ran, how many rows were processed, or what the status was.
  4. If archiving fails for a period, or if it is turned on some time after go-live, archiving some of the larger tables like dlytrn becomes extremely difficult.

Archiving++ by Oracular

Our solution keeps the best of the RedPrairie solution and focuses on optimizing it for high-volume scenarios.  The key part of our approach is to use database capabilities to transfer the data.  While moving a million rows one row at a time is a significant task, doing it using set-based database operations is trivial and fast.  Our solution uses the following techniques:
  • We will use database features to connect multiple database instances.  For example, on Oracle we will use “Database Links” and on SQLServer we will use “Linked Servers”.  Our preferred approach is to co-locate the archive database in the same instance as the online database, which improves performance even further.
  • We will use the “insert into <target> select * from <source>” technique for moving the data (see the sketch after this list).
  • We will use a single “delete” with a where clause rather than a delete that processes the data one row at a time.
  • For updating the archive instance, we will do it in two steps:
    • Delete the target rows.
    • Insert the new data.
  • We will use correlated queries between the two instances.
  • We will have generic, central components to do the work for any table.
  • We will utilize the information in the database data dictionary to streamline the process of moving the data from the primary instance to the archive instance.
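On Oracle, for example, moving and purging 90 days' worth of dlytrn rows can each be a single statement.  A sketch, assuming a database link named arc_db to the archive instance and identical table structures on both sides (both assumptions, not givens):

    -- Move the rows to the archive instance in one set-based insert
    insert into dlytrn@arc_db
    select * from dlytrn
     where trndte < sysdate - 90;

    -- Purge the source with one set-based delete
    delete from dlytrn
     where trndte < sysdate - 90;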
Our solution will employ the same database objects and applications as the RedPrairie standard solution.  This will allow the users to easily switch from the RedPrairie jobs to the jobs that we provide.

Implementation Details

All of archiving can be a single job or a small set of jobs
Our solution allows for setting up a single job that will run a set of pre-defined archives, as opposed to running each archive one at a time.

Archive Job Number (uc_ossi_arcnum)
Our solution introduces a new concept of an “Archive Job Number”.  Whenever archiving starts, we will allocate a new archive job number, and all of the data archived in that run will be mapped to that number.

uc_ossi_arcnum Column
All tables that are to be archived using this approach need a new column called “uc_ossi_arcnum”.  It will contain the “Archive Job Number” of the last job that archived that specific row.
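Adding the column is a one-time step per archived table, along these lines (Oracle syntax; the shipment table is just an example):

    alter table shipment add uc_ossi_arcnum number(12);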

uc_ossi_job_log Table
We will use this table to log the progress of an archive job.  It will provide visibility to the progress of the archive process.  It can be used to compare the performance of archiving over time.  This table will have the following columns.
    Column Name           Type                 Comments
    uc_ossi_job_seq       Number(12), PK 1     A unique number for every execution of the job.
    uc_ossi_job_log_seq   Number(12), PK 2     A job can have several steps; this identifies the step.
    uc_ossi_job_id        Varchar2(200)        The name of the job.  For archiving, the job name as defined in RedPrairie schedule agent operations.
    uc_ossi_job_num       Varchar2(200)        A number identifying an instance of the job.  For archiving, the archive job number, i.e. the uc_ossi_arcnum value.
    uc_ossi_job_module    Varchar2(200)        Job module.  For archiving, the archive name (as defined in the archdr table).
    uc_ossi_job_action    Varchar2(200)        Job action for a given job module.  For archiving, the table name being processed, prefixed with “MARK-” or “ARC-” based on the type of work being performed.
    uc_ossi_job_start_dt  Date/Time            Start date/time of the operation.
    uc_ossi_job_end_dt    Date/Time            End date/time of the operation.
    uc_ossi_job_cmpl_cd   Varchar2(20)         Completion status: SUCC or FAIL.
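In Oracle terms the table could be created along the following lines.  This is a sketch derived from the column list above; the constraint name is illustrative:

    create table uc_ossi_job_log
    (
        uc_ossi_job_seq      number(12)     not null,
        uc_ossi_job_log_seq  number(12)     not null,
        uc_ossi_job_id       varchar2(200),
        uc_ossi_job_num      varchar2(200),
        uc_ossi_job_module   varchar2(200),
        uc_ossi_job_action   varchar2(200),
        uc_ossi_job_start_dt date,
        uc_ossi_job_end_dt   date,
        uc_ossi_job_cmpl_cd  varchar2(20),
        constraint uc_ossi_job_log_pk
            primary key ( uc_ossi_job_seq, uc_ossi_job_log_seq )
    );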
This table closely follows the Oracle concepts of “client_info”, “module”, and “action” that are used to monitor sessions.  We have stored the information in a de-normalized fashion to simplify querying.  For a typical run the data would look like:

    Seq  Log_seq  Id     Num  Module     Action             Start_dt         End_dt           Cmpl_Cd
    10   1        JOB01  72   %          %                  1/1/09 00:00:00  1/1/09 00:10:00  SUCC
    10   2        JOB01  72   ARCHIVE01  %                  1/1/09 00:00:00  1/1/09 00:08:00  SUCC
    10   3        JOB01  72   ARCHIVE01  MARK-SHIPMENT      1/1/09 00:00:00  1/1/09 00:01:00  SUCC
    10   4        JOB01  72   ARCHIVE01  MARK-SHIPMENT_INS  1/1/09 00:01:00  1/1/09 00:02:00  SUCC
    10   5        JOB01  72   ARCHIVE01  MARK-MANFST        1/1/09 00:02:00  1/1/09 00:03:00  SUCC
    10   6        JOB01  72   ARCHIVE01  MARK-PCKWRK        1/1/09 00:03:00  1/1/09 00:04:00  SUCC
    10   7        JOB01  72   ARCHIVE01  ARC-SHIPMENT       1/1/09 00:04:00  1/1/09 00:05:00  SUCC
    10   8        JOB01  72   ARCHIVE01  ARC-SHIPMENT_INS   1/1/09 00:05:00  1/1/09 00:06:00  SUCC
    10   9        JOB01  72   ARCHIVE01  ARC-MANFST         1/1/09 00:06:00  1/1/09 00:07:00  SUCC
    10   10       JOB01  72   ARCHIVE01  ARC-PCKWRK         1/1/09 00:07:00  1/1/09 00:08:00  SUCC
    10   11       JOB01  72   ARCHIVE02  %                  1/1/09 00:08:00  1/1/09 00:10:00  SUCC
    10   12       JOB01  72   ARCHIVE02  MARK-BOMHDR        1/1/09 00:08:00  1/1/09 00:09:00  SUCC
    10   13       JOB01  72   ARCHIVE02  ARC-BOMHDR         1/1/09 00:09:00  1/1/09 00:10:00  SUCC

We will commit the entries in this table right away (without impacting the active transaction).  This will allow users to query the table to see the current status of the activity.  The table has summary rows for the whole job and for each archive, which can be used to compare performance over time.
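On Oracle, committing the log rows right away without touching the caller's archive transaction can be done with an autonomous transaction.  A minimal sketch, assuming the uc_ossi_job_log table above; the procedure name and its parameter list are hypothetical:

    create or replace procedure uc_ossi_log_step
    (
        p_job_seq     in number,
        p_job_log_seq in number,
        p_job_id      in varchar2,
        p_job_num     in varchar2,
        p_module      in varchar2,
        p_action      in varchar2
    )
    is
        -- The commit below applies only to this logging insert,
        -- not to the caller's in-flight archive transaction.
        pragma autonomous_transaction;
    begin
        insert into uc_ossi_job_log
            ( uc_ossi_job_seq, uc_ossi_job_log_seq, uc_ossi_job_id,
              uc_ossi_job_num, uc_ossi_job_module, uc_ossi_job_action,
              uc_ossi_job_start_dt )
        values
            ( p_job_seq, p_job_log_seq, p_job_id,
              p_job_num, p_module, p_action, sysdate );
        commit;
    end;
    /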

If we encounter an error, we will update the corresponding row in the above table with “FAIL”.  The error details will be available in the uc_ossi_job_err_log table.  That table refers back to uc_ossi_job_log, so we will know exactly which step failed and how.
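A minimal sketch of what uc_ossi_job_err_log could look like; the column names below are hypothetical, and only the reference back to uc_ossi_job_log is taken from the description above:

    create table uc_ossi_job_err_log
    (
        uc_ossi_job_seq     number(12)     not null,  -- points at uc_ossi_job_log
        uc_ossi_job_log_seq number(12)     not null,  -- identifies the failed step
        uc_ossi_err_text    varchar2(4000)            -- error message detail
    );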

Users can call the API for these tables from their own jobs so that the information from those jobs is available in the same fashion.

Application to View the job history
This DDA will allow the user to query the data from the uc_ossi_job_log table.  The user will be able to look at the errors for a given job from here as well.

Policy for storing archive parameters
The standard RedPrairie implementation hard-codes the filter criteria within the “List Command”.  Our solution will use the following policy structure instead.

Policy Code = USR-OSSI-ARC.  Warehouse Id will always be “----”.
    Policy Variable = MISCELLANEOUS
        Policy Value = DBLINK
            Set rtstr1 to the name of the database link.
    Policy Variable = DEFAULT
        Policy Value = arbitrary text, e.g. NUM_DAYS_FOR_SHIPMENTS
            Set rtstr1, rtnum1, etc.
    Policy Variable = DEFAULT-<specific archive> (from the archdr table)
        Policy Value = arbitrary text, e.g. NUM_DAYS_FOR_SHIPMENTS
            Set rtstr1, rtnum1, etc.

Before executing the archiving logic for a specific archive, we will fetch the data from this policy and publish the various parameters onto the stack.  For the example above, a variable called “ARC_PARM_NUM_DAYS_FOR_SHIPMENTS” will be available on the stack, and the various “List Commands” can then use it.

A value defined under the policy variable for a specific archive will override the DEFAULT value.
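For illustration, a “List Command” in our solution that consumes such a parameter might look like the following sketch.  The shipment table, the status filter, and the @cur_arcnum variable (standing for the archive job number published on the stack) are illustrative assumptions:

    [ update shipment
         set uc_ossi_arcnum = @cur_arcnum
       where shpsts = 'C'
         and adddte < sysdate - @ARC_PARM_NUM_DAYS_FOR_SHIPMENTS ]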

Archiving Logic
1.  Generate a new archive job number.
2.  For each of the archives that needs to be executed:
    a.  Verify that the “uc_ossi_arcnum” column exists on the main table.  If not, raise an error.
    b.  Use the “List Command” to mark the rows that need to be archived.  This command uses an “update” statement with a “where” clause, which is much faster than selecting the data and updating by primary key.  The rows are stamped with the archive job number generated above.
    c.  For each object in the “Details”:
        i.   Verify that the “uc_ossi_arcnum” column exists on the table.  If not, raise an error.
        ii.  Use the “List Command” to mark the rows (same as step 2.b).  The command uses a correlated update against the main table with suitable join criteria; a single update accomplishes this rather than scanning the primary table rows one at a time.
    d.  By this time we will have marked the rows in the primary table and all of the dependent tables with the archive job number.  Now we go through all of these tables and transfer the marked rows to the archive instance.
        We will use Database Links (on Oracle) or Linked Servers (on SQLServer) to move this data.  We will handle the various “Duplicate Actions” as follows (a combined SQL sketch appears after this list):
        <blank>: Do not archive.
        Error and Rollback: Check whether any of the rows marked with the current archive number already exist on the archive instance; if so, raise an error and do not proceed.  The check uses “select from <primary table> where uc_ossi_arcnum = <current archive#> and exists (select 1 from <archive table> where <primary key join>)” syntax.
        Ignore and Continue: Add the data on the archive instance for the current archive job, adding only the rows that do not already exist there.  This uses “insert into <archive table> (<column list>) select <column list> from <primary table> where uc_ossi_arcnum = <current archive#> and not exists (select 1 from <archive table> where <primary key join>)” syntax.
        Replace and Continue: Add the data on the archive instance for the current archive job, replacing any rows that already exist there.  In database operations an update is more expensive than a delete plus an insert, so we will first delete such rows on the archive instance and then add them again, using correlated delete and insert statements.

    e.  Run the “Post Archive” command.  This command typically purges the main table and the dependent data.  If that is the goal, this command does not need to be used; instead the “purge_flg” may be checked.
    f.  If “purge_flg” was checked, delete the rows from the tables that were identified earlier (step 2.d).
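Putting steps 2.b through 2.f together, the generated statements for a hypothetical shipment archive might take roughly the following shape.  This is a sketch, not the generated code: the shipment and shipment_line tables, the ship_id join, the status and date filter, the arc_db database link, and archive number 72 are all illustrative, and the transfer shows the “Ignore and Continue” variant only.

    -- Step 2.b: mark candidate rows in the main table with the archive number
    update shipment
       set uc_ossi_arcnum = 72
     where shpsts = 'C'
       and adddte < sysdate - 90;

    -- Step 2.c: mark dependent rows with one correlated update per detail table
    update shipment_line sl
       set sl.uc_ossi_arcnum = 72
     where exists ( select 1
                      from shipment s
                     where s.ship_id = sl.ship_id
                       and s.uc_ossi_arcnum = 72 );

    -- Step 2.d: transfer the marked rows over the database link
    -- ("Ignore and Continue": skip rows already on the archive instance;
    --  the same pattern repeats for each detail table)
    insert into shipment@arc_db ( ship_id, shpsts, adddte, uc_ossi_arcnum )
    select s.ship_id, s.shpsts, s.adddte, s.uc_ossi_arcnum
      from shipment s
     where s.uc_ossi_arcnum = 72
       and not exists ( select 1
                          from shipment@arc_db a
                         where a.ship_id = s.ship_id );

    -- Step 2.f: purge the marked rows from the primary instance
    delete from shipment_line where uc_ossi_arcnum = 72;
    delete from shipment      where uc_ossi_arcnum = 72;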

The users will still use the “Database Archive Maintenance” application.  The setup will be as follows:


For each field, the behavior in our solution is described below, with the comparison to the standard RedPrairie solution in parentheses.

Archive Table: Name of the database table being archived.  (Same.)

Maximum Rows: Our approach is to move the data in a single transaction; since we use set-based database operations, we believe this limit is not needed.  (Our solution works as if this were set to 0.)

Duplicate Action: The RedPrairie-provided values of “Error and Rollback”, “Ignore and Continue”, and “Replace and Continue” are respected.  If the field is left blank, we will not archive the data at all; this can be used in conjunction with “Purge” to simply purge the data.  (Generally the same; the handling of a blank entry is different.)

Status File: We provide a more elaborate monitoring environment, so this field does not apply.  (Does not apply to our solution.)

List Command: The command is used to mark the rows rather than to list them.  We allow a blank in this field when specifying the “Details”; if it is blank, we assume that the table is related to the main table on the main table’s primary key and that the main table’s primary key columns exist in this table as well.  (Our solution uses the field differently.)

Purge?: The data from the main table and the detail tables is purged.

Custom Columns

Use Improved Archiving?: A flag called uc_ossi_arc_flg will be added to the archdr table.  Archives that use this technique should set this flag to True.

Job Set: A new text field called uc_ossi_job_set will be added to the archdr table.  Several archives can specify the same value here; a single job may then be scheduled to run all of those archives.  See step 2 in “Archiving Logic” above.

Standard Command Override
RedPrairie uses the “archive production data” command to archive the data.  We will override this command to look at the “uc_ossi_arc_flg” column for the passed-in archive.  If the flag is set, we will use the improved archiving technique described here; otherwise the command will default to the standard RedPrairie logic.

Summary

Archiving and purging are critical for every RedPrairie installation.  Some of the RedPrairie tables grow rapidly, and if the data is not archived properly it can lead to significant performance degradation.  We expect a manifold performance improvement from this technique.
Customers will find the improved job visibility useful as well.  Users will be able to see details about each job execution, and because we stamp the archived rows with the specific job that moved them, troubleshooting becomes easier.

Copyright Notice
  •  RedPrairie is a registered trademark of RedPrairie (http://www.redprairie.com).
  •  Oracular is a registered trademark of Oracular (http://www.oracular.com).
  •  This document contains information about the RedPrairie software.  It is assumed that the recipient has legal access to the said software.
