Thursday, January 10, 2019

Blockchain For Supply Chain - Reality or Just Another Fad!

Overview

Every few years we see enthusiasm generated by certain innovations, as if a single solution will solve all of the problems we have in the world. XML was supposed to solve all the problems with EDI data exchange. Open Source was supposed to render Oracle, Microsoft, and IBM obsolete. NoSQL databases were supposed to be so much better than legacy and cumbersome relational databases that RDBMS became a dirty word.

All of these innovations added substantial value - but their hype was overblown, and as a result several organizations launched useless initiatives to force a concept into their organization. For example, they would XMLise their EDI by putting the whole EDI document as-is into an XML structure. Core 101-level concepts like transactions now have to be defended. We have to re-educate folks that NoSQL databases have their place, but there is nothing legacy about the Oracles, SQL Servers, or DB2s of the world, and that a database transaction is still a fundamental concept when creating real-world solutions.

Now we have another fad - blockchain. Supposedly all concepts like SaaS, RDBMS, etc. are legacy, and the earth's spin needs to be managed using blockchain!

In this paper, I will focus on the hype as it relates to supply chain.  Specifically there is a lot of buzz with the premise that "Hey Walmart wants its suppliers to use blockchain - or that Walmart has adopted blockchain - so we must do something with it or else ...."

Basic Terminology

First let's understand at a high level what a blockchain is:
  • It is a chain of blocks where each block contains certain elements that enhance its resilience:
    • Data: This is the main data in the block.
    • Previous Block Hash Value: This is the hash value of the previous block in the chain.
    • Current Block Hash Value: This is the hash value of the current block. It is computed by taking into account all contents, i.e. the data and the previous block hash value.
  • Being a chain makes it resilient, i.e. you must always process the chain from the beginning and while processing you validate the integrity of each block via its own hash value and previous block hash value - until you get to the data you are interested in.
  • Producing a valid hash is supposed to be "hard" for the computer, e.g. in the case of bitcoin the resulting value is required to have a certain number of 0s at the beginning. As there is no direct way of getting that many 0s, the algorithm needs to keep trying the hash function with slight modifications of the data until it gets an acceptable hash value.
  • If you attempted to corrupt a specific block you may render the "Current Block Hash Value" as incorrect.  If you fixed that as well then subsequent blocks will no longer be pointing to this block correctly.
  • Various well-known techniques are used to broadcast these contents to a network. 
  • With a large enough network and "hard" hashing technique corrupting the chain becomes close to impossible
If you want to understand these concepts in detail and work through examples of creating a blockchain, I suggest the YouTube channel Simply Explained - Savjee.
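To make the terminology above concrete, here is a minimal sketch in Python. It is an illustration only, not how any production blockchain is built: the function names (`block_hash`, `add_block`, `is_valid`), the use of SHA-256 over a JSON payload, and the all-zero hash for the first block are my own assumptions. It shows the three elements of a block and why corrupting one block is detectable.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder "previous hash" for the first block


def block_hash(data, previous_hash):
    """Hash all contents of a block: its data plus the previous block's hash."""
    payload = json.dumps({"data": data, "previous_hash": previous_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    """Append a block that points at the current last block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "data": data,
        "previous_hash": previous_hash,
        "hash": block_hash(data, previous_hash),
    })


def is_valid(chain):
    """Walk the chain from the beginning, checking both hashes of every block."""
    expected_previous = GENESIS_HASH
    for block in chain:
        if block["previous_hash"] != expected_previous:
            return False  # this block does not point at the block before it
        if block["hash"] != block_hash(block["data"], block["previous_hash"]):
            return False  # the block's contents no longer match its own hash
        expected_previous = block["hash"]
    return True


chain = []
for entry in ["first entry", "second entry", "third entry"]:
    add_block(chain, entry)
print(is_valid(chain))  # True

# Corrupt the middle block: its stored hash no longer matches its contents.
chain[1]["data"] = "tampered entry"
print(is_valid(chain))  # False

# "Fix" the corrupted block's own hash: now the next block no longer points at it.
chain[1]["hash"] = block_hash(chain[1]["data"], chain[1]["previous_hash"])
print(is_valid(chain))  # False
```

Note that this sketch leaves out the network and the "hard" hashing entirely. On a single machine an attacker could simply recompute every later hash, which is why those two ingredients matter.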

As you can see, the concept of blockchain itself is quite simple. A basic Data Structures student will tell you that she could implement this using any lower level concept like arrays, linked lists, etc. For persistence it could be implemented using files, NoSQL databases, or the legacy and obsolete RDBMS technologies. It is a concept, not a solution!
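The "hard" hashing requirement is equally small when written down. The sketch below is a simplification under my own assumptions: a hypothetical `mine` function, SHA-256, a counter (nonce) appended to the data as the "slight modification", and a fixed number of leading zero hex digits. Bitcoin's real rules differ in detail, so treat this only as an illustration of the trial-and-error idea.

```python
import hashlib


def mine(data, previous_hash, difficulty=3):
    """Try successive nonces until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        candidate = f"{previous_hash}|{data}|{nonce}".encode("utf-8")
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1  # slightly modify the input and try again


nonce, digest = mine("first entry", "0" * 64)
print(nonce, digest)
```

Each additional required zero multiplies the expected number of attempts by 16 in this hex-based version, which is what makes rewriting a long chain expensive.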

What makes it tick!

We need to understand some fundamental concepts about it to consider its value.
  • It is resilient against hacking if and only if a sufficiently large number of nodes keep a copy of the database.
  • It is resilient because by definition we never update the data. Once data is added, it can only be undone via an additional transaction.
  • It is resilient because you always traverse the chain from the beginning - so that you can detect any foul play.
By definition one chain works on homogeneous data - for example bitcoin is a blockchain dealing with a special digital currency and not with court records.

If we were to apply this concept generally, we need to look at it abstractly and see where the real value lies. That value is not blockchain itself but that:
  • We have a distributed data store with sufficiently large number of nodes with the complete copy of the database.
  • And because of that - no central authority is needed.
  • And it is a type of data structure that does not allow "updates" and "deletes" - a new transaction must be defined to undo a previous transaction. This protocol limits the chances of foul play.
With that understanding - is it really the new wheel? We understand distributed databases quite well already. DNS is a well-known example. Application clusters are deployed on millions of servers across the world using various types of distributed database techniques. The logic for making the chain immutable is certainly very interesting and innovative and could prove to be a valuable technique - but the concept of implementing immutable data sets is not new. Most legacy databases allow this level of control on their arcane data stores - and they have been doing it for decades. But this approach of implementing it via hash values can certainly be quite valuable!
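As one illustration of a "legacy" database enforcing the no-update, no-delete rule, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. SQLite is chosen only because it ships with Python; the `ledger` table, its columns, and the trigger names are hypothetical, and other databases offer their own mechanisms (for example permissions or triggers) to the same end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ledger (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    account  TEXT NOT NULL,
    amount   INTEGER NOT NULL,
    reverses INTEGER REFERENCES ledger(id)  -- set when this row undoes another
);
CREATE TRIGGER ledger_no_update BEFORE UPDATE ON ledger
BEGIN
    SELECT RAISE(ABORT, 'ledger is append-only');
END;
CREATE TRIGGER ledger_no_delete BEFORE DELETE ON ledger
BEGIN
    SELECT RAISE(ABORT, 'ledger is append-only');
END;
""")

conn.execute("INSERT INTO ledger (account, amount) VALUES ('acme', 100)")

# Any attempt to change history is rejected by the database itself.
try:
    conn.execute("UPDATE ledger SET amount = 0 WHERE id = 1")
except sqlite3.DatabaseError as exc:
    print("rejected:", exc)

# The only way to undo entry 1 is a new, compensating entry.
conn.execute("INSERT INTO ledger (account, amount, reverses) VALUES ('acme', -100, 1)")
balance = conn.execute(
    "SELECT SUM(amount) FROM ledger WHERE account = 'acme'"
).fetchone()[0]
print(balance)  # 0
```

What this does not give you is protection against whoever administers the database, which is exactly the part that depends on many independent nodes rather than on the data structure.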


Then what is all this hype about "Walmart embracing blockchain - so should we"? Here are some details about Walmart's initiative:
  • It is not a public network - but a private network
  • It is owned and operated by a central authority called IBM. They are using the IBM Food Trust solution.
Under the hood they could certainly be using blockchain concepts, but how does that translate to this hype about blockchains solving supply chain problems? If IBM used linked lists in a certain algorithm - should a whole fad start around linked lists and how hash tables and arrays are now obsolete?

Supply Chain Optimization Problems - Myths and Realities

The often quoted scenario for blockchain solving this made-up crisis is that "I as an individual should know where that box of Tylenol came from so I could see its whole history from beginning to end".

I fail to see how this statement - which is certainly a laudable and achievable goal - relates to blockchain. Have we recently tracked a FedEx package? Does it not provide end to end tracking? Is that concept somehow dependent on the underlying data structure being a blockchain?

This goal depends more on whether the actors involved in getting that Tylenol bottle to you are willing to share those details with you. The Walmart initiative certainly does not give you, as a consumer, that level of visibility. If a large number of actors involved in the supply chain were willing to share that data in the public domain - then a host of solutions could come into being - and most likely the winners would be based on the dreaded and obsolete high-end RDBMSs of the world.

The basic building blocks of this level of visibility have been around for decades. The EDI framework provides specific transaction sets for supply chain visibility. So the real challenge is not that we suddenly figured out this domain - the issue is that the industry has not agreed to publish these documents into the public domain; rather, they are shared in private networks.

So for that Tylenol bottle in your hand - all companies involved, from manufacturing, to transportation, to distribution, to retail, have all of the data needed to answer your question without requiring blockchains. But they have decided to keep it private and proprietary. If, for example, through industry consensus or government regulation each of the EDI transactions were published to public networks - your query could be answered without requiring a specific underlying data structure.

Potential Case Study

A potential use case often presented is the movement of goods: when they reach a certain point, certain logic could execute, for example to pay for the transportation of the goods.

So let us work on that - we are all quite familiar with receiving packages via UPS, FedEx, or USPS. All of them today provide end to end visibility of the goods movement on their network. I can track a package from the shipment facility all the way to my house - but currently I would need to go to their specific website to track it.

So let us say they all agree to provide this visibility generically, so that I can see the movement without having to go to the specific vendor's site - what needs to happen to make that possible? Is it blockchain or something else?

We just need to implement blockchain!

The first question is - what is the unit of the blockchain? Is there one chain for each tracking#? Or is there one global blockchain containing billions and billions of blocks? Or something in the middle? Can a solution be developed without the industry coming up with some standards?

So "we just need to implement blockchain because Walmart did so" will not work here. The carriers will need to sit together and come up with some protocols about how the data is to be exchanged, and then a solution can be developed.

Does this problem require blockchain?

If the industry comes up with a standard definition of tracking numbers, service levels, etc., a solution can be developed. The solution cannot really be a global ledger, as there may be trillions of blocks to go through. It needs to be a solution where the unit is a tracking#, so that a person can quickly get to it and see its history. The requirement of this network is not that it must be a blockchain but that:
  • It is an add-only list
  • Unit is tracking number
  • Various participants should be able to add data to it for example the various contractors who may be carrying it and eventually the final delivery
  • Logic may be executed when certain conditions become true 
  • The list must be in the public domain and not on proprietary networks
Given these requirements a blockchain is certainly one option - but that is an implementation strategy. Take DNS as a comparison: one site may keep its data in a file, while another may choose to keep it in a database. There is nothing fundamental in the protocol that requires a certain implementation - as long as the protocol is respected, we have a solution.
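The requirements listed above can be sketched without any blockchain at all. The following is a minimal in-memory illustration under my own assumptions: the class names `TrackingEvent` and `TrackingLog`, the status strings, and the tracking number `TRK-001` are all hypothetical, and a real system would also need the industry-agreed protocol, persistence, and authentication of participants.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event cannot be modified once created
class TrackingEvent:
    participant: str
    status: str


class TrackingLog:
    """Add-only event lists, one per tracking number."""

    def __init__(self):
        self._events = defaultdict(list)
        self._rules = []  # (condition, action) pairs

    def on(self, condition, action):
        """Register logic to run when an added event satisfies `condition`."""
        self._rules.append((condition, action))

    def add(self, tracking_number, event):
        """Append an event; there is deliberately no update or delete method."""
        self._events[tracking_number].append(event)
        for condition, action in self._rules:
            if condition(event):
                action(tracking_number, event)

    def history(self, tracking_number):
        """Return a read-only copy of the history for one tracking number."""
        return tuple(self._events.get(tracking_number, ()))


payments = []
log = TrackingLog()
# Example rule: once a package is delivered, trigger payment for transportation.
log.on(lambda event: event.status == "DELIVERED",
       lambda number, event: payments.append(number))

log.add("TRK-001", TrackingEvent("shipper", "HANDED_TO_DRIVER"))
log.add("TRK-001", TrackingEvent("truck driver", "IN_TRANSIT"))
log.add("TRK-001", TrackingEvent("delivery van driver", "DELIVERED"))

print([event.status for event in log.history("TRK-001")])
print(payments)  # ['TRK-001']
```

Because the unit is the tracking number, looking up one package never requires walking through everyone else's history.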

So in that world, for example, a shipper will announce that I have given this tracking number to a truck driver.  Truck driver will then announce that I am on the way.  Airplane loader will announce the loading on the airplane and so on.  Eventually the delivery van driver will announce that it has been delivered.  If signature is captured, it will be announced as well.

With a standard and well-understood protocol in place, various competing nodes receive that announcement, add it to their respective databases, and then announce to the network that they have added it.

To protect against foul play, difficult hashing is just one technique - for example DNS works quite well without it. Distributed databases have various techniques to safeguard against this.

So in this world - one of the nodes may implement this distributed database as a blockchain, while another may have an Oracle backend and still another could use MySQL. The problem is not one of technology but of protocols and standards.
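To illustrate that point, here is a small sketch in which two nodes honor the same protocol with different storage behind it. Everything in it is an assumption made for illustration: the `TrackingNode` protocol and its two methods, the `InMemoryNode` and `SqlNode` classes, and SQLite standing in for whatever database a participant prefers. A third node could just as well keep its copy as a blockchain.

```python
import sqlite3
from typing import Protocol


class TrackingNode(Protocol):
    """The agreed protocol: what every node must support, not how it stores data."""

    def announce(self, tracking_number: str, status: str) -> None: ...
    def history(self, tracking_number: str) -> list[str]: ...


class InMemoryNode:
    """A node that keeps its copy in plain Python lists."""

    def __init__(self):
        self._events = {}

    def announce(self, tracking_number, status):
        self._events.setdefault(tracking_number, []).append(status)

    def history(self, tracking_number):
        return list(self._events.get(tracking_number, []))


class SqlNode:
    """A node that keeps its copy in a relational database."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE events ("
            "seq INTEGER PRIMARY KEY, tracking_number TEXT, status TEXT)"
        )

    def announce(self, tracking_number, status):
        self._conn.execute(
            "INSERT INTO events (tracking_number, status) VALUES (?, ?)",
            (tracking_number, status),
        )

    def history(self, tracking_number):
        rows = self._conn.execute(
            "SELECT status FROM events WHERE tracking_number = ? ORDER BY seq",
            (tracking_number,),
        )
        return [status for (status,) in rows]


def broadcast(nodes: list[TrackingNode], tracking_number: str, status: str) -> None:
    """Deliver one announcement to every node on the (simulated) network."""
    for node in nodes:
        node.announce(tracking_number, status)


nodes = [InMemoryNode(), SqlNode()]
for status in ["HANDED_TO_DRIVER", "LOADED_ON_AIRPLANE", "DELIVERED"]:
    broadcast(nodes, "TRK-001", status)

# Both nodes answer the same query identically despite different backends.
print(nodes[0].history("TRK-001") == nodes[1].history("TRK-001"))  # True
```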

Conclusion

It is important to separate hype from reality. Blockchain is an interesting concept that needs to be understood by today's computer professionals. In supply chain, if visibility is the goal, the problem is not the technology platform but the mindset of the industry, where companies currently want to keep several realities close to themselves and not share them with the world.

So for example Walmart did not push their data to the public domain, because doing so would publicize their vendors and also their volumes. This could compromise their potential advantage over the competition.

We must never force a solution just to say "we did it!". While understanding concepts is important, using the right tool in the right place is essential. Blockchain is a technique - is it supply chain's savior? Not really!