Wednesday, December 30, 2015

Integrator In-Memory Database - Makes Integration Even Easier

Overview

Recent RedPrairie Integrator versions include an embedded H2 database that opens up new possibilities for efficient and simple integration projects.  The basic components provided are:
  • create internal table
  • insert into internal table
  • select from internal table
    • This command takes a table_name parameter - but we can pass a complete join clause there as well.
  • release internal tables
In some respects the idea is similar to the caching discussed in another post - but the added capability that helps in integration is the ability to apply complete SQL operations to the stored data.
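
As a quick illustration, the entire lifecycle can be exercised from a MOCA console. The following is only a minimal sketch - the table name uc_demo_tbl and the published columns are invented for illustration, while the parameter names (table_name, res, select_stmt) follow the examples later in this post:

    /*
     * Build a one-row recordset, load it as an internal table,
     * query it with SQL, and finally release all internal tables.
     */
    publish data
     where ordnum = 'ORD001'
       and ordqty = 5
    >> uc_demo_res
    |
    create internal table
     where table_name = 'uc_demo_tbl'
       and res = @uc_demo_res
    |
    select from internal table
     where select_stmt = "select ordnum, ordqty from uc_demo_tbl"
       and table_name = 'uc_demo_tbl'
    |
    release internal tables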

As of this writing, this functionality is not widely known and is primarily used internally by RedPrairie Integrator itself - to abstract the inbound IFD as a table and for the "Expression on Child Segment" IFD field class - but it can come in handy in other cases as well.

Complex Mapping Scenario

If the inbound structure closely matches the RedPrairie concepts, then nothing special is needed - but in some cases the structures can be quite different. In my experience the various SAP IDocs present the data in a large number of mutually connected segments.

Consider a delivery IDoc in SAP next to the corresponding order in WMS: the SAP view is a lot more verbose, and it essentially has to be collapsed down to an order and its order lines.

How do we handle this mapping in standard Integrator?

This type of mapping is not a problem for Integrator - it has been able to handle it since the earliest versions. Some of the tools available are:
  • Ability to access any inbound IFD field as an EO Column
  • Create EO segments at the same level as inbound IFD
  • Using "Expression on Child Segment" concept on the final result IFD.
Some less experienced implementers end up creating intermediate tables to address this type of mapping - that is not needed. We can map it in one shot, i.e. the inbound IFD goes via EO creation to the result IFD, which matches the target WMS order structure.

The result is that the integration works, but it can be a resource hog. Every time an EO segment instance is created, Integrator has to push a subset of the inbound IFD to the internal database (it used to be actual database tables; now it is the embedded H2 database), and the system then allows SQL to be executed against it. As described above, the traditional solution would have required several EO segments parallel to the various inbound IFD segments.

Once these additional EO segments have been created, the final result IFD will use "Expression on Child Segment" to create the final fields. Each of these IFD fields requires Integrator to push the EO segment data to the internal database so that SQL can be applied. Let's say you have 10 order line fields that have to be created in this fashion - that requires the EO data to be pushed to the internal database 10 times for the same order line. As you can see, this can quickly become a performance issue.

Internal Database to the rescue

At a very basic level, the above discussion highlights the key deficiency in Integrator: it does not allow joining data from multiple inbound segments, so creating the intermediate EO segments is a way to circumvent that. Using the internal database we can overcome this and simply load the inbound IFD into the internal database. Afterwards we can join the data so that the EO output already matches the result IFD.
The basic difference is that we introduce a high-level segment that loads all of the relevant inbound IFD segments into the internal database as tables, and at the end we release those tables. The retrieve method implementation of UC_LOAD_HOST_ORD will utilize the "create internal table" and "insert into internal table" commands.
    First we need to create a component that returns a resultset for each inbound IFD segment that we are interested in. For example, let's call that component "get usr sap order segments as rs". It could be implemented as:
    /*
     * During integrator execution ifd_data_ptr is on the stack,
     * so the inbound data does not need to be parsed again.  But
     * in case it is not on the stack, e.g. when testing from a
     * console, we get it via the standard integrator command.
     */
    if ( @ifd_data_ptr#onstack )
       hide stack variable where name = 'ifd_data_seq'
    else
       sl_get ifd_data_ptr
    |
    {
       /*
        * If an inbound IFD segment is optional - we still need
        * to create a placeholder recordset for it
        * (see the gotchas below).
        */
        ... create empty recordsets
        |
        /*
         * Now fetch the inbound IFD data into a
         * recordset.  For example, if an inbound IFD
         * segment is called MYSAPHDR - we will say:
         */
       {
          {
             sl_list ifd_data_fld
             where ifd_seg_id = 'MYSAPHDR'
             >> uc_sap_mysaphdr
          }
       }
       |
       ... and so on.  Get every relevant inbound IFD
       ... segment into a recordset.
       ... This step is extremely efficient as it
       ... is simply looking at the already-parsed
       ... view of the inbound IFD in memory.
       ... If an inbound IFD segment was optional, e.g.
       ... order notes - then you still need
       ... to create a valid recordset with 1 row.
    }
    
    Now we can create another command, say "create usr sap order tables", as follows:
    get usr sap order segments as rs
    |
    {
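       /*
        * Start clean - release any previously created tables.
        * Note that this also releases the table that integrator
        * itself created for the inbound IFD (see the gotchas
        * below).
        */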
       release internal tables
       ;
       create internal table
       where table_name = 'uc_sap_mysaphdr'
       and res = @uc_sap_mysaphdr
       ;
       ... so on for all tables
    }
    
The UC_RELEASE_HOST_ORD segment will simply call the "release internal tables" command.

With that in place, ORDER_HDR, ORDER_DETAIL, etc. will no longer utilize the generated retrieve method implementation that says "where sl_data_fk = :sl_data_pk"; instead they will call a MOCA command that utilizes "select from internal table" to get data from multiple inbound IFD segments using joins - which is extremely efficient.
    This will utilize the tables created earlier and access the data from there. The complete SQL language is available, including join and group by clauses:
    {
       select from internal table
       where select_stmt =
         "select ... any columns from any table ...."
       and table_name = 
         " uc_sap_mysaphdr join .. left outer join .." 
    }
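
    For instance, a retrieve method for ORDER_DETAIL could join the header and line tables loaded earlier. In this sketch the table names follow the examples above, while the SAP column names (vbeln, posnr, lfimg) and the join condition are invented for illustration:

    {
       select from internal table
       where select_stmt =
         "select hdr.vbeln ordnum, dtl.posnr ordlin, dtl.lfimg ordqty " ||
         "from uc_sap_mysaphdr hdr " ||
         "join uc_sap_mysapdtl dtl on dtl.ifd_data_seq = hdr.ifd_data_seq"
       and table_name =
         "uc_sap_mysaphdr hdr join uc_sap_mysapdtl dtl"
    }

    Because the join happens once inside H2, the 10 order line fields from the earlier example can be produced by a single SQL statement instead of 10 separate pushes of the EO data.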
    

Some Gotchas

  • Note that the internal tables are created directly from the recordset.  If the recordset is empty, the table cannot be created because the column information is not available.  That is why we need to create a placeholder recordset in that case.  The "sl_list ifd_data_fld" command returns ifd_data_seq and ifd_data_dtl_seq in the recordset, so you can create the placeholder recordset with these values set to 0.  That way the recordset is not empty, but when you access it for a particular ifd_data_seq you will not get a row - which is what we wanted (see the sketch after this list).
  • If this functionality is used, then all segments below UC_LOAD_HOST_ORD must use it, and not the traditional "sl_data_fk = :sl_data_pk" retrieve methods.  This is because when creating our tables we called "release internal tables", and that also released the table that integrator had created for that SQL to work.
  • Referring to the first gotcha - note that in MOCA, if you assign one recordset to another, it just copies the pointer and does not clone the data.  This implies that if you set multiple recordsets to the same "empty" recordset, then even though you think of them as two different recordsets, they are really the same one.  If you move to the last row in one, you also move to the last row in the other.  Where this causes an issue is that after you create the internal table from one, the next one will not work.  You will need to call ".reset" on the recordset before using it again (see the sketch below).
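
A minimal sketch of the first and third gotchas, assuming a hypothetical optional notes segment. The variable names uc_empty_rs, uc_sap_mysapnot1, and uc_sap_mysapnot2 are invented for illustration, and the inline groovy block assumes the recordset is exposed there as a MocaResults object whose reset() method rewinds it:

    /*
     * Gotcha 1: a placeholder recordset for an optional segment.
     * The dummy key values of 0 will never match a real
     * ifd_data_seq, so lookups correctly return no rows.
     */
    publish data
     where ifd_data_seq = 0
       and ifd_data_dtl_seq = 0
    >> uc_empty_rs
    |
    /*
     * Gotcha 3: both variables below point at the SAME underlying
     * recordset - the assignment copies the pointer, not the data.
     */
    publish data
     where uc_sap_mysapnot1 = @uc_empty_rs
       and uc_sap_mysapnot2 = @uc_empty_rs
    |
    create internal table
     where table_name = 'uc_sap_mysapnot1'
       and res = @uc_sap_mysapnot1
    |
    /*
     * Creating the first table moved the shared recordset past
     * its last row, so rewind it before reusing it.
     */
    [[ uc_sap_mysapnot2.reset() ]]
    |
    create internal table
     where table_name = 'uc_sap_mysapnot2'
       and res = @uc_sap_mysapnot2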

Summary

The basic idea is that we should try to match the event output to the RedPrairie order model via this capability.  That implies pushing the inbound data to in-memory tables once and then joining them as needed to create the final result set.  At a very high level the approach is similar to a two-step approach of first loading the data into a temporary staging area and then logging another event from that - but this achieves the same result in a single step and is more optimal, both in performance and in manageability.

