Segal's Law and two-way data sync
Which System is Right?
Data syncing is a constantly underestimated problem. It seems very easy on the face of it. If you own and control System A, it can be easy to ingest data into A from System B. Then someone requests the ability to sync A and B bi-directionally. The managers or executives think it should be as easy as playing a Reverse Uno card!
It is not, and the main issue is one of data constraints. In any two-way sync of systems, what should happen if System A does something that is ‘invalid’ in System B? At that point, they are out of sync, and what happens after that? Do you ignore the difference, alert the user, or do the ‘reasonable’ thing?
There is also the reverse case. What if System B does something that System A (your system) views as invalid? The same questions apply, but often the result is a hack to allow the input into System A.
Then what happens if you have multiple modifications in A and B that don’t sync across? Which system is right at that point? “The last change wins” is the easy answer, but often you only know when an object as a whole last changed, not when each field in that object changed. A modified Segal’s law can apply here.
“A man with a system knows what is correct. A man with two systems is never sure.”
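To make the “last change wins” problem concrete, here is a minimal Python sketch. The `Record` type, its field names, and the integer timestamps are all hypothetical and only stand in for whatever the two systems actually expose. It assumes each system reports a single object-level timestamp and nothing per field.

```python
from dataclasses import dataclass


@dataclass
class Record:
    fields: dict
    updated_at: int  # object-level timestamp only; no per-field history


def last_write_wins(a: Record, b: Record) -> Record:
    """Keep whichever whole object changed most recently."""
    return a if a.updated_at >= b.updated_at else b


# Both systems started from {"name": "Ada", "city": "Paris"}.
a = Record({"name": "Ada L.", "city": "Paris"}, updated_at=10)  # A changed the name
b = Record({"name": "Ada", "city": "Lyon"}, updated_at=12)      # B changed the city

merged = last_write_wins(a, b)
print(merged.fields)
```

System B’s version wins because it is newer, so the name change made in System A is silently discarded even though the two edits never touched the same field. Merging field by field would avoid that, but only if both systems record when each field changed.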
System A needs to have as many options, actions, and constraints as System B. For every ‘action’ of System B, there needs to be a reaction from System A. In other words, if you want a “complete” two-way sync that is “seamless,” what you are really saying is “I want to recreate System B on top of our System A.” That can be a significant undertaking, but there are three ways to address it.
The first is to implement a full bi-directional sync. You become an expert at understanding System B. You know everything it can and cannot do. You subscribe to their changelogs and start a business partnership with them.
I think this would only work if you are a larger company. There needs to be some buffer or a large enough revenue stream from other places to support this effort. I honestly think that buying the other company would be a better solution, since the result can be more holistic.
Another answer is adding restrictions. Have a set of actions where the reaction is just ‘not syncable.’ For example, System B has multiple complex field types like Country. The Country field has a list of valid options like “France” or “South Korea.” If you try an invalid option, System B will return an error code. You can exclude the Country field from the sync and only sync simple fields like a String or Number. This way, there is less complexity and there are fewer responses to handle, because you have freed your program from worrying about a whole set of reactions.
If you want to sync with something, restrict the options. Maybe only specific data rows or fields are synced. Perhaps you can only read specific fields but not write them.
Most bi-directional and ‘seamless’ syncs I have seen rely on restrictions. They are seamless if you squint and ignore some gotchas. ‘Seamless’ is mainly a marketing term that people take too literally.
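One way to express such restrictions is an explicit policy that lists which fields are synced and in which direction. The sketch below is only an illustration: `SYNC_POLICY`, the field names, and the two helper functions are hypothetical, and anything not listed in the policy (such as Country) is simply treated as ‘not syncable.’

```python
from enum import Enum


class Direction(Enum):
    READ_ONLY = "read_only"    # System A accepts the value from B but never writes it back
    READ_WRITE = "read_write"  # synced in both directions


# Hypothetical policy: only simple String/Number fields are listed.
# "country" is deliberately absent, so it is never synced.
SYNC_POLICY = {
    "name": Direction.READ_WRITE,
    "employee_count": Direction.READ_WRITE,
    "created_by": Direction.READ_ONLY,
}


def outbound_payload(record: dict) -> dict:
    """Fields System A is allowed to push to System B."""
    return {k: v for k, v in record.items()
            if SYNC_POLICY.get(k) is Direction.READ_WRITE}


def inbound_payload(record: dict) -> dict:
    """Fields System A accepts from System B (anything named in the policy)."""
    return {k: v for k, v in record.items() if k in SYNC_POLICY}


record = {"name": "Acme", "employee_count": 12,
          "created_by": "importer", "country": "Atlantis"}
print(outbound_payload(record))
print(inbound_payload(record))
```

Keeping the policy in one place also gives you something concrete to document: customers can see exactly which fields the ‘seamless’ sync covers and which it does not.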
Open the APIs
Opening APIs is really clever since it combines the above two options and offloads the cost to a third party. The issue with restrictions is that you have to make choices, and a customer might have made a different choice than you. With an API, the customer or a third party builds the sync that fits their own choices. You need good documentation and clear result codes for System A, and they can request more features, but only if those features make sense for what System A already does.
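As a rough illustration of what “result codes” could look like, here is a small Python sketch of a write endpoint for System A. Everything in it is an assumption made for the example: the `ResultCode` values, the `write_record` function, and the field and country lists are hypothetical, not an existing API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResultCode(Enum):
    OK = "ok"
    UNKNOWN_FIELD = "unknown_field"
    INVALID_VALUE = "invalid_value"


@dataclass
class WriteResult:
    code: ResultCode
    field: Optional[str] = None  # which field caused the problem, if any
    message: str = ""


# Hypothetical constraints that System A enforces on incoming writes.
KNOWN_FIELDS = {"name", "country"}
VALID_COUNTRIES = {"France", "South Korea"}


def write_record(payload: dict) -> WriteResult:
    """Validate a write and say exactly why it was rejected."""
    for field, value in payload.items():
        if field not in KNOWN_FIELDS:
            return WriteResult(ResultCode.UNKNOWN_FIELD, field,
                               f"'{field}' is not exposed by System A")
        if field == "country" and value not in VALID_COUNTRIES:
            return WriteResult(ResultCode.INVALID_VALUE, field,
                               f"'{value}' is not a valid country")
    return WriteResult(ResultCode.OK)


print(write_record({"name": "Acme", "country": "France"}).code)
print(write_record({"country": "Atlantis"}).code)
```

A specific code plus the offending field lets whoever writes the integration decide how to react to a rejected write, which is the decision you no longer have to make for them.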
I presented three options for bi-directional syncs:
- Implement a full bi-directional sync
- Add Constraints
- Open up APIs
Of course, these options can be mixed. What is painful is when people assume this is an easy problem when it might be a very hard one. In any case, you can use constraints and open APIs to simplify the code, deliver faster, and improve the overall experience within the company and for your customers.