Thursday, January 10, 2019

Blockchain For Supply Chain - Reality or Just Another Fad?

Overview

Every few years we see enthusiasm generated by a certain innovation, as if that single solution will solve all of the problems in the world.  XML was supposed to solve all the problems with EDI data exchange.  Open Source was supposed to render Oracle, Microsoft, and IBM obsolete.  NoSQL databases were supposed to be so much better than the legacy and cumbersome relational databases that RDBMS became a dirty word.

All of these innovations added substantial value - but their hype was overblown, and as a result several organizations launched useless initiatives to force a concept into their organization.  For example, they would XMLise their EDI by putting the whole EDI document as-is into an XML structure.  Core, 101-level concepts like transactions have to be defended.  We have to re-educate folks that NoSQL databases have their place, but there is nothing legacy about the Oracles, SQL Servers, or DB2s of the world - and that a database transaction is still a fundamental concept when creating real-world solutions.

Now we have another fad - blockchain.  Suddenly all concepts like SaaS, RDBMS, etc. are legacy, and even the earth's spin needs to be managed using blockchain!

In this paper, I will focus on the hype as it relates to supply chain.  Specifically, there is a lot of buzz built on the premise: "Hey, Walmart wants its suppliers to use blockchain - or Walmart has adopted blockchain - so we must do something with it or else ...."

Basic Terminology

First, let's understand at a high level what a blockchain is:
  • It is a chain of blocks where each block contains certain elements that enhance its resilience:
    • Data: The main data in the block
    • Previous Block Hash Value: The hash value of the previous block in the chain
    • Current Block Hash Value: The hash value of the current block, computed over all of its contents, i.e. the data and the previous block hash value
  • Being a chain makes it resilient: you must always process the chain from the beginning, validating the integrity of each block via its own hash value and the previous block's hash value, until you get to the data you are interested in.
  • The hash function is supposed to be "hard" for the computer; e.g. in the case of bitcoin, it is a requirement that the resulting value begin with a certain number of 0s.  As there is no direct way of producing that many 0s, the algorithm must keep retrying the hash function with slight modifications of the data until it gets a conforming hash value.
  • If you attempted to corrupt a specific block, you would render its "Current Block Hash Value" incorrect.  If you fixed that as well, the subsequent blocks would no longer point to this block correctly.
  • Various well-known techniques are used to broadcast these contents to a network.
  • With a large enough network and a "hard" hashing technique, corrupting the chain becomes close to impossible.
If you want to understand these concepts in detail and work through examples of building one, I suggest the YouTube channel Simply Explained - Savjee.
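The chain structure, "hard" hashing, and tamper detection described above can be sketched in a few lines of Python.  This is a minimal illustration, not a production design - the block fields and the difficulty of two leading zeros are chosen just for the example:

```python
import hashlib

def block_hash(data, prev_hash, nonce):
    """Hash the block's contents together with the previous block's hash."""
    return hashlib.sha256(f"{data}|{prev_hash}|{nonce}".encode()).hexdigest()

def mine_block(data, prev_hash, difficulty=2):
    """Retry with different nonces until the hash starts with `difficulty` zeros."""
    nonce = 0
    while True:
        h = block_hash(data, prev_hash, nonce)
        if h.startswith("0" * difficulty):
            return {"data": data, "prev_hash": prev_hash, "nonce": nonce, "hash": h}
        nonce += 1

def chain_is_valid(chain):
    """Walk the chain from the beginning, re-checking every hash and link."""
    for i, block in enumerate(chain):
        if block_hash(block["data"], block["prev_hash"], block["nonce"]) != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = mine_block("genesis", "0" * 64)
second = mine_block("ship 10 pallets", genesis["hash"])
chain = [genesis, second]
print(chain_is_valid(chain))   # True

genesis["data"] = "ship 999 pallets"  # tamper with an earlier block
print(chain_is_valid(chain))   # False - the corruption is detected
```

Note that fixing the tampered block's own hash would not help either: the next block's prev_hash would then no longer match, exactly as described above.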

As you can see, the concept of a blockchain itself is quite simple.  A basic Data Structures student will tell you that she could implement it using lower-level constructs like arrays or linked lists.  For persistence it could be implemented using files, NoSQL databases, or the "legacy and obsolete" RDBMS technologies.  It is a concept, not a solution!

What makes it tick!

We need to understand some of its fundamental properties to assess its value.
  • It is resilient against hacking if and only if a sufficiently large number of nodes keep a copy of the database.
  • It is resilient because, by definition, we never update the data.  Once data is added, it can only be undone via an additional transaction.
  • It is resilient because you always traverse the chain from the beginning - so you can detect any foul play.
By definition one chain works on homogeneous data - for example, bitcoin is a blockchain dealing with a specific digital currency, not with court records.

If we were to apply this concept generally, we need to look at it abstractly and ask what the real value is.  It is not "blockchain" as such, but that:
  • We have a distributed data store with a sufficiently large number of nodes holding a complete copy of the database.
  • And because of that, no central authority is needed.
  • And it is a type of data structure that does not allow "updates" and "deletes" - a new transaction must be defined to undo a previous transaction.  This protocol limits the chances of foul play.
With that understanding - is it really a new wheel?  We understand distributed databases quite well already; DNS is a well-known example.  Application clusters are deployed on millions of servers across the world using various types of distributed database techniques.  The logic for making the data immutable is certainly interesting and innovative and could prove to be a valuable technique - but the concept of implementing immutable data sets is not new.  Most "legacy" databases have allowed this level of control over their "arcane" data stores for decades.  But this approach of enforcing immutability via hash values can certainly be quite valuable!
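The "no updates or deletes" rule can be illustrated with a tiny append-only ledger sketch in Python.  The entry shape and field names here are hypothetical, purely for illustration:

```python
ledger = []  # append-only: entries are never updated or deleted

def post(entry_id, amount, reverses=None):
    """Append a new entry; a correction references the entry it undoes."""
    ledger.append({"id": entry_id, "amount": amount, "reverses": reverses})

def balance():
    return sum(e["amount"] for e in ledger)

post("t1", 100)
post("t2", 40)
# "t2" was a mistake: we cannot delete it, so we post an offsetting entry.
post("t3", -40, reverses="t2")

print(balance())    # 100
print(len(ledger))  # 3 - the full history, including the mistake, survives
```

The key property is that the correction leaves an audit trail: the mistaken entry and its reversal are both permanently visible.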


Then what is all this hype about "Walmart embraced blockchain - so should we"?  Here are some details about Walmart's initiative:
  • It is not a public network but a private network.
  • It is owned and operated by a central authority called IBM.  They are using the IBM Food Trust solution.
Under the hood they could certainly be using blockchain concepts, but how does that translate into the hype about blockchains solving supply chain problems?  If IBM used linked lists in a certain algorithm, should a whole fad start around linked lists and how hash tables and arrays are now obsolete?

Supply Chain Optimization Problems - Myths and Realities

The often-quoted scenario for blockchain solving this made-up crisis is: "I as an individual should know where that box of Tylenol came from, so I can see its whole history from beginning to end."

I fail to see how this statement - which is certainly a laudable and achievable goal - relates to blockchain.  Have we recently tracked a FedEx package?  Does it not provide end-to-end tracking?  Is that capability somehow dependent on the underlying data structure being a blockchain?

This goal depends more on the premise that the actors involved in getting that Tylenol bottle to you are willing to share those details with you.  The Walmart hype certainly does not give you, as a consumer, that level of visibility.  If a large number of actors involved in the supply chain were willing to share that data in the public domain, then a host of solutions could come into being - and most likely the winners would be based on the dreaded and "obsolete" high-end RDBMSs of the world.

The basic building blocks of this level of visibility have been around for decades.  The EDI framework provides specific transaction sets for supply chain visibility.  So the real challenge is not that we suddenly figured out this domain - the issue is that the industry has not agreed to publish these documents into the public domain; rather, they are shared on private networks.

So for that Tylenol bottle in your hand, all companies involved - from manufacturing to transportation to distribution to retail - already have all of the data needed to answer your question, without requiring blockchains.  But they have decided to keep it private and proprietary.  If, for example, through industry consensus or government regulation, each of the EDI transactions were published to public networks, your query could be answered without requiring any specific underlying data structure.

Potential Case Study

A use case often presented concerns the movement of goods: when they reach a certain point, certain logic could execute - for example, paying for the transportation of the goods.

So let us work with that.  We are all quite familiar with receiving packages via UPS, FedEx, or USPS.  All of them today provide end-to-end visibility of goods movement on their networks.  We can track a package from the shipment facility all the way to my house - but currently I need to go to each carrier's specific website to track it.

So let us say they all agree to provide this visibility generically, so that I can see the movement without having to go to the specific vendor's site.  What needs to happen to make that possible?  Is it blockchain or something else?

We just need to implement blockchain!

The first question is: what is the unit of the blockchain?  Is there one chain for each tracking number?  Or is there one global blockchain containing billions and billions of nodes?  Or something in the middle?  Can a solution be developed without the industry coming up with some standards?

So "we just need to implement blockchain because Walmart did" will not work here.  The carriers will need to sit together and agree on protocols for how the data is to be exchanged - and then a solution can be developed.

Does this problem require blockchain?

If the industry comes up with standard definitions of tracking numbers, service levels, etc., a solution can be developed.  The solution cannot really be one global ledger, as there may be trillions of nodes to traverse.  It needs to be a solution where the unit is a tracking number, so a person can quickly get to it and see its history.  The requirement of this network is not that it must be a blockchain, but that:
  • It is an add-only list.
  • The unit is a tracking number.
  • Various participants should be able to add data to it - for example, the various contractors who may be carrying the package, and eventually the final delivery.
  • Logic may be executed when certain conditions become true.
  • The list must be in the public domain and not on proprietary networks.
Given these requirements, a blockchain is certainly one option - but that is an implementation strategy.  Compare DNS: the protocol defines the data, but one site may keep it in a file while another keeps it in a database.  There is nothing fundamental in the protocol that requires a particular implementation - as long as the protocols are respected, we have a solution.

So in that world, for example, the shipper would announce, "I have given this tracking number to a truck driver."  The truck driver would then announce, "I am on the way."  The airplane loader would announce the loading onto the airplane, and so on.  Eventually the delivery van driver would announce that the package has been delivered.  If a signature is captured, it would be announced as well.
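Such announcements might look something like the following sketch, keyed by tracking number.  The actors, statuses, and field names are purely illustrative - an actual system would need the industry-agreed protocol discussed above:

```python
events = {}  # tracking number -> append-only list of announcements

def announce(tracking_no, actor, status):
    """Each participant appends its announcement to that tracking number's history."""
    events.setdefault(tracking_no, []).append({"actor": actor, "status": status})

announce("1Z999", "shipper", "tendered to truck driver")
announce("1Z999", "truck driver", "in transit")
announce("1Z999", "airplane loader", "loaded on aircraft")
announce("1Z999", "delivery van driver", "delivered")

# Anyone can replay the full history for one tracking number:
for e in events["1Z999"]:
    print(e["actor"], "-", e["status"])
```

Because the unit is a tracking number rather than one global ledger, a reader gets to the relevant history directly, without traversing trillions of unrelated entries.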

With a standard, well-understood protocol in place, various competing nodes receive each announcement, add it to their respective databases, and then announce to the network that they have added it.

To protect against foul play, difficult hashing is just one technique - DNS, for example, works quite well without it.  Distributed databases have various mechanisms to safeguard against it.

So in this world, one node may implement this distributed database as a blockchain, while another may have an Oracle backend, and still another could use MySQL.  The problem is not one of technology but of protocols and standards.

Conclusion

It is important to separate hype from reality.  Blockchains are an interesting concept that today's computer professionals need to understand.  In supply chain, if visibility is the goal, the problem is not the technology platform but the mindset of an industry that currently wants to keep several realities close to the vest rather than share them with the world.

So, for example, Walmart has not pushed its data to the public domain, because doing so would publicize its vendors and its volumes.  That could compromise its advantage over the competition.

We must never force a solution just to say "we did it!"  While understanding concepts is important, using the right tool in the right place is essential.  Blockchain is a technique - is it supply chain's savior?  Not really!

Thursday, January 3, 2019

Huge Trace Files - Not a problem!

Overview

JDA/RedPrairie has a very useful feature that allows for detailed tracing of a server component.  It provides information about every server command executed, its parameters, and all SQL statements that are run.  But that also ends up being an issue, as the traces often reach several gigabytes, and analyzing them becomes a problem.

Our Solution

Oracular (http://www.oracular.com) has created a client application that allows for executing MOCA commands.  It can be downloaded from http://autoupdate.oracular.com/mocaclient/ .  We have incorporated an advanced Trace Viewer into this application for analyzing such trace files.  It builds on some of the concepts described in my earlier blog post.

Database Trace Option

When you have our MOCA Client, you can access the "MOCA Log" pulldown menu:

As described in the earlier blog post, we push the trace into a set of database tables.  On this screen you can press the button to create the tables on the instance; this needs to be done only once.
Now you can choose the trace file to load from the instance by pressing the insert button.  There is no upper limit on the size of the trace file: you can process a multi-gigabyte trace, and the client will give you progress updates.  Once the trace file has been parsed, the parsed view is stored in a set of database tables, so re-analyzing it does not require re-parsing.

The parser views the trace file from three high-level perspectives:
  1. Activity.  This is from the point of view of a client that accessed a service provided by MOCA.
  2. Commands.  These are the various MOCA commands that were executed.
  3. SQLs.  These are the SQL statements that were executed.

Activity View

As you select a trace file from the top grid, the various Activities will be displayed in the order of execution along with the time spent on each:

You may sort the grid as you see fit.  Each row here represents an interaction between a client (e.g. an RF or GUI user) and the MOCA server.  Being able to view this is a huge benefit when trying to make sense of huge trace files, as the interesting area may be just one of several activities.  Note that the client makes several calls to the MOCA server for housekeeping activities, which creates a lot of noise to filter through.  You can right-click on any row here and export just that part of the trace file:

The exported trace file can be processed in any way you like, i.e. in a trace viewer of your choice.

Here you have two tabs, "SQL" and "Commands", for further analysis:


You can choose a tab and then press the "Analysis" button.

SQL Analysis View

You can perform the analysis on the whole trace file or on a certain activity; you decide that via the filter section:

This shows a grid of all SQL statements within the selected context.  The SQLs shown are the ones with bind variables, which means that like SQL statements are grouped together.  This is extremely helpful when looking at a trace from a performance-tuning point of view:

You can sort the data as you see fit.  For example, when looking for performance improvements, sorting by tot_elapsed can be quite helpful.  There is also a summary section that separates the tracing overhead from the execution overhead.  The above example highlights a single SQL statement that took 8 seconds.  It also shows that the second one is executed very frequently: even though each execution was sub-second (0 ms to 101 ms), together they add up to 1.5 seconds, which can be significant for a transaction like an inventory movement.
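Grouping by bind variables amounts to normalizing each SQL statement - replacing literal values with placeholders - so that structurally identical statements collapse into one row whose execution counts and elapsed times are summed.  A rough sketch of the idea (not the actual tool's parser; the regexes are simplistic and the table and column names are made up):

```python
import re
from collections import defaultdict

def normalize(sql):
    """Replace literals with a placeholder so like statements group together."""
    sql = re.sub(r"'[^']*'", ":b", sql)   # string literals
    sql = re.sub(r"\b\d+\b", ":b", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()

stats = defaultdict(lambda: {"count": 0, "tot_elapsed": 0})

def record(sql, elapsed_ms):
    key = normalize(sql)
    stats[key]["count"] += 1
    stats[key]["tot_elapsed"] += elapsed_ms

# Two executions of the "same" statement with different literal values:
record("SELECT * FROM invlod WHERE lodnum = 'L001'", 101)
record("SELECT * FROM invlod WHERE lodnum = 'L002'", 3)

key = normalize("SELECT * FROM invlod WHERE lodnum = 'X'")
print(stats[key])   # {'count': 2, 'tot_elapsed': 104}
```

This is why many individually fast executions can surface as a significant tot_elapsed once they are summed under one normalized statement.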

You can right click on any row here to see further options:
  • Show all executions shows each individual execution.  As you can see below, it provides a summary view along with the time for each execution.  It also provides contextual information about the command that executed it and the line number in the trace file.

  • Explain query shows an execution plan.  This is very useful because the explain plan really depends on the query with bind variables; a common mistake is to analyze queries with the bind variables replaced by literals.  This provides a simple mechanism to get the plan for the query as it actually ran.


Command Analysis View

Like the SQL analysis, you can perform command-level analysis as well.  You can analyze the whole trace or focus on an activity.  Once you analyze, you see similar output, but from the point of view of MOCA commands:

You can see a summary view and similar statistics for the various commands within the analysis context.  You can, for example, focus on a command that is executed too often, causing overall performance degradation.


Trace View Option

You also have the traditional option of viewing the trace file as a tree view or detailed view.  Here too we have several advantages over competing options:
  • We can look at part of a huge trace
As mentioned above, when we right-click on an activity we can create a trace for just that portion.  That sub-trace is opened in the trace viewer:


  • Ability to filter out noise
When we open a trace file, we get an option to filter out some typical parts of the trace to make it easier to view.
This reduces the resulting view significantly, as the common housekeeping commands in MCS and MOCA are ignored.  You can also specify additional commands to ignore.  This further reduces the size by removing some other parts of the trace.
  • Tree View and Detailed View in side by side panes
The above enhancements allow us to view very large trace files in this mode as well.  We then see the contents in two panes, where the right side shows the trace as a tree and the left side as text:

  • View Arguments in a tab
We can highlight a command in the tree view to see the arguments available to it in the Arguments tab:

  • Easily see FLOW messages
MOCA code may emit explicit trace messages of the FLOW type.  These provide explicit hints that can be valuable in making sense of a complex command.  The "Flow Messages" tab shows such messages for the selected node of the tree:

Conclusion

The RedPrairie/JDA server tracing option is very valuable and provides a tool for in-depth analysis and troubleshooting.  Complex use cases become difficult due to the size of the trace file.  Our approach provides a solution that lets us focus on the problem and eliminate the noise.  Our consultants are also users of our tools, which allows us to improve them based on their feedback.  If you have any questions or comments, please contact us.