One of the pieces of Darwino code that I'm most proud of is the replication engine. While a majority of our customers see it as a Domino-to-JSON replication engine, it goes far beyond that. In reality, it can replicate between virtually any data sources.
It is a true two-way, multi-point replication engine, borrowing some ideas from IBM Domino but going beyond its venerable ancestor in multiple places.
The main idea is that any data set can be represented as a set of decorated JSON objects. Decorations include metadata (e.g., data security, replication attributes...) and binary attachments. The latter are stored apart from the JSON itself for performance reasons.
As an example, a Domino document can be transformed to a JSON object where every Domino document item is converted to a JSON attribute in the object, along with the attachments.
A relational table can also be transformed to a set of JSON objects or arrays, one per row. Even more interesting, a set of rows coming from multiple tables can be grouped to create a consistent, atomic JSON object. This way, the replication granularity is not limited to a row, but can be a whole business object even if the physical storage splits it into multiple records.
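To make the idea of a business object concrete, here is a minimal sketch in Python. It is not Darwino code: the table contents, the `group_business_objects` helper and the field names are all made up for illustration. It simply shows rows from two tables being grouped into one JSON object per parent row.

```python
import json
from collections import defaultdict

# Hypothetical rows, as they might come from two relational tables.
orders = [
    {"order_id": 1, "customer": "ACME"},
    {"order_id": 2, "customer": "Globex"},
]
order_lines = [
    {"order_id": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "sku": "B-200", "qty": 1},
    {"order_id": 2, "sku": "A-100", "qty": 5},
]

def group_business_objects(parents, children, key, child_field):
    """Attach each child row to its parent, giving one JSON object per parent."""
    by_parent = defaultdict(list)
    for row in children:
        # Drop the join key from the child: the parent already carries it.
        child = {k: v for k, v in row.items() if k != key}
        by_parent[row[key]].append(child)
    return [dict(parent, **{child_field: by_parent[parent[key]]})
            for parent in parents]

documents = group_business_objects(orders, order_lines, "order_id", "lines")
print(json.dumps(documents[0], indent=2))
```

Each element of `documents` can then be replicated as a single unit, whatever the number of physical rows behind it.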
Darwino's engine is mostly made of two components:
- A universal replication engine
This engine does *not* know about the data sources. It contains all the replication logic and accesses the data through a generic connector API.
- Connectors
A connector encapsulates the physical access to the data, via an API. Its API provides a "JSON" view of the data that will be consumed by the engine.
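The split between the engine and the connectors can be pictured with the following Python sketch. The `Connector` class, its two methods and the `pull` function are hypothetical names chosen for this example, not the actual Darwino connector API. The point is only that the engine function never touches a data source directly.

```python
from abc import ABC, abstractmethod

class Connector(ABC):
    """Hypothetical connector contract: exposes a data source as JSON documents."""

    @abstractmethod
    def changed_since(self, sequence):
        """Return (doc_id, sequence, json_doc) tuples modified after `sequence`."""

    @abstractmethod
    def save(self, doc_id, json_doc):
        """Write one JSON document to the underlying store."""

class MemoryConnector(Connector):
    """Toy connector backed by a dict; a real one would wrap Domino, an RDBMS..."""

    def __init__(self):
        self.docs = {}      # doc_id -> (sequence, json_doc)
        self.sequence = 0

    def changed_since(self, sequence):
        return [(i, s, d) for i, (s, d) in sorted(self.docs.items()) if s > sequence]

    def save(self, doc_id, json_doc):
        self.sequence += 1
        self.docs[doc_id] = (self.sequence, dict(json_doc))

def pull(source, target, last_sequence=0):
    """One-way pass of the engine: it only ever talks to the Connector API."""
    highest = last_sequence
    for doc_id, sequence, doc in source.changed_since(last_sequence):
        target.save(doc_id, doc)
        highest = max(highest, sequence)
    return highest

src, dst = MemoryConnector(), MemoryConnector()
src.save("a", {"title": "Hello"})
src.save("b", {"title": "World"})
checkpoint = pull(src, dst)              # copies both documents
src.save("a", {"title": "Hello again"})
checkpoint = pull(src, dst, checkpoint)  # copies only the modified one
```

Supporting a new data source then means writing another `Connector` subclass, while `pull` stays unchanged.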
As of today, the Darwino replication engine comes with the following ready-to-use connector implementations:
- Darwino JSON store, running atop RDBMS
- IBM Domino
- FlowBuilder XML store, used by ProjExec
- HTTP to wrap a local connector and use HTTP/HTTPS as the transport
Some other connectors have been started:
- Pure RDBMS, used for example to replicate to DashDB
- MS SharePoint List
We are also waiting for a first beta release of IBM LiveGrid to replicate its data to your mobile devices and get a true offline experience.
Writing a basic connector is a pretty easy operation. Then, the connector can be enhanced incrementally to provide fine-tuned capabilities (better conflict detection...).
Darwino replication strengths
While the concept of replication is easy to understand, many details make Darwino unique so far:
The data can be transformed in both directions during the replication process. These transformations can happen at the basic field level (e.g., transforming a Domino name to its LDAP counterpart) or at a more global level, like grouping a set of multiple Domino fields into a JSON array. This way, the replicated JSON is clean, and it does not carry any legacy tricks from the source platform.
The transformation can obviously be coded, in Java or Groovy, but Darwino already comes with a pretty large set of standard transformations you can apply to any data. Thus, setting up the transformation is mostly about scripting the predefined transformers, unless you have more advanced needs.
One of the benefits of data transformation is data normalization: you can ensure that the fields in a JSON document are converted to the expected data type. Domino is well known to have inconsistent fields across a whole database, as they evolve over time. Darwino can normalize them.
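The three kinds of transformation mentioned above (field conversion, field grouping, type normalization) could look like this in a few lines of Python. This is only an illustration: Darwino's own transformers are scripted in Java or Groovy, and every function and field name below is an assumption made for the example.

```python
def domino_name_to_ldap(name):
    """Turn a canonical Domino name (slash-separated) into a comma-separated DN."""
    return ",".join(part.strip() for part in name.split("/"))

def to_number(value):
    """Normalize a value that may have been stored as text or as a number."""
    if isinstance(value, (int, float)):
        return value
    number = float(value)
    return int(number) if number.is_integer() else number

def group_fields(doc, prefix, target):
    """Collect fields like Phone_1, Phone_2... into a single JSON array."""
    keys = sorted((k for k in doc if k.startswith(prefix)),
                  key=lambda k: int(k[len(prefix):]))
    result = {k: v for k, v in doc.items() if k not in keys}
    result[target] = [doc[k] for k in keys]
    return result

# Hypothetical per-field configuration: field name -> transformer.
FIELD_TRANSFORMERS = {"Owner": domino_name_to_ldap, "Amount": to_number}

def transform(doc):
    """Apply the field-level transformers, then the document-level grouping."""
    doc = {k: FIELD_TRANSFORMERS.get(k, lambda v: v)(v) for k, v in doc.items()}
    return group_fields(doc, "Phone_", "phones")

source = {
    "Owner": "CN=John Doe/OU=Sales/O=ACME",
    "Amount": "42",                 # stored as text in some old documents
    "Phone_1": "555-0100",
    "Phone_2": "555-0101",
}
clean = transform(source)
```

Here `clean` holds an LDAP-style owner, a numeric `Amount` and a single `phones` array. A two-way setup would pair each transformer with its inverse for the opposite direction.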
Sometimes you want to limit the data replicated to a target system. It can be because of storage requirements (e.g., your mobile device can only handle a subset of the data), or security (e.g., you only want some data to be available on the cloud).
Darwino handles selective replication by providing selection formulas based on the data. It also provides a unique feature called "Business Replication", where related documents can be grouped and replicated as a whole.
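As a rough illustration, a selection formula can be seen as a predicate applied to each document, and group-based selection as "if one document of a group matches, take the whole group". The Python sketch below follows that reading. It is an interpretation, not the actual Darwino formula language or its "Business Replication" implementation, and the function and field names are hypothetical.

```python
def select_documents(documents, formula):
    """Plain selective replication: keep the documents the formula accepts."""
    return [doc for doc in documents if formula(doc)]

def select_business_objects(documents, formula, group_key):
    """Select whole groups: if one document matches, take all its relatives."""
    matching_groups = {doc[group_key] for doc in documents if formula(doc)}
    return [doc for doc in documents if doc[group_key] in matching_groups]

documents = [
    {"id": "p1", "project": "alpha", "type": "project", "region": "EU"},
    {"id": "t1", "project": "alpha", "type": "task"},
    {"id": "p2", "project": "beta", "type": "project", "region": "US"},
    {"id": "t2", "project": "beta", "type": "task"},
]

def in_europe(doc):
    return doc.get("region") == "EU"

alone = [d["id"] for d in select_documents(documents, in_europe)]
grouped = [d["id"] for d in select_business_objects(documents, in_europe, "project")]
```

With the plain formula only `p1` is selected, because the task does not carry the region. With the grouped selection, `p1` and its task `t1` travel together.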
Selective replication poses a problem: how to propagate deletions as the data used to select the documents no longer exists, or how to remove a document from the target when the formula no longer selects that document? Well, we solved this problem with an efficient and reliable method that I'll be happy to talk about around a beer :-) This is a pretty complex problem.
Transaction and error recovery
If the target database supports transactions, then the engine can take advantage of them. If the replication fails for any reason, then it can roll back to the initial state and, if needed, try again. But this is not enough: when you have a large data set to replicate, you might not want to restart from scratch when you have already replicated 98% of the data. For that purpose, the engine features commit points. If it fails at any time, it only restarts from the latest successful commit point.
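The commit point idea can be sketched with Python's built-in `sqlite3` module standing in for the target database. The table layout, the `replicate` function and the `fail_at` switch are assumptions made for this example, not Darwino internals. Each batch is committed together with a checkpoint, so a failure only loses the batch in progress.

```python
import sqlite3

def replicate(conn, changes, batch_size=2, fail_at=None):
    """Apply `changes` in batches; each batch and its checkpoint commit together."""
    start = conn.execute("SELECT position FROM checkpoint").fetchone()[0]
    for offset in range(start, len(changes), batch_size):
        batch = changes[offset:offset + batch_size]
        try:
            for index, (doc_id, body) in enumerate(batch, start=offset):
                if index == fail_at:
                    raise ConnectionError("simulated failure")
                conn.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)",
                             (doc_id, body))
            conn.execute("UPDATE checkpoint SET position = ?",
                         (offset + len(batch),))
            conn.commit()        # commit point: data and checkpoint together
        except ConnectionError:
            conn.rollback()      # undo only the batch in progress
            return False
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE checkpoint (position INTEGER)")
conn.execute("INSERT INTO checkpoint VALUES (0)")
conn.commit()

changes = [(f"doc{i}", f"body {i}") for i in range(5)]
first = replicate(conn, changes, fail_at=3)      # fails inside the second batch
saved_after_failure = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
second = replicate(conn, changes)                # resumes at the checkpoint
saved_at_end = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
print(first, saved_after_failure, second, saved_at_end)  # False 2 True 5
```

The second run does not resend `doc0` and `doc1`: it starts from the position stored by the last successful commit.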
Conflict detection is one of the areas where Darwino excels. It has multiple conflict detection systems, going way beyond Domino. A particular connector might only implement a subset of them, depending on the data source capabilities. If this happens, the engine degrades gracefully and deals with what is available.
The existing mechanisms include: the last modification date, a sequence id (similar to Domino or some RDBMS), a change string carrying the update history, an update id...
Once a conflict is detected, an API is called and the developer can choose to execute a predefined behavior (ignore, last wins, create a copy data set...) or do a custom resolution like a data merge.
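Here is one possible way to picture detection based on an update history, followed by a pluggable resolution. This Python sketch is a simplified model: the history format (a list of update ids, oldest first), the document layout and the function names are assumptions, not the way Darwino encodes its change string or exposes its conflict API.

```python
def compare_histories(local, remote):
    """Classify two update histories (lists of update ids, oldest first)."""
    common = min(len(local), len(remote))
    if local[:common] != remote[:common]:
        return "conflict"        # both sides diverged from a shared ancestor
    if len(local) == len(remote):
        return "same"
    return "remote_newer" if len(remote) > len(local) else "local_newer"

def last_wins(local_doc, remote_doc):
    """Predefined behavior: keep the copy with the latest modification date."""
    return max(local_doc, remote_doc, key=lambda d: d["modified"])

def merge_fields(local_doc, remote_doc):
    """Custom resolution: union of the fields, the latest copy wins on clashes."""
    older, newer = sorted((local_doc, remote_doc), key=lambda d: d["modified"])
    return {**older, **newer}

def replicate_document(local_doc, remote_doc, on_conflict=last_wins):
    """Pick the copy to keep, calling the handler only on a real conflict."""
    state = compare_histories(local_doc["history"], remote_doc["history"])
    if state == "conflict":
        return on_conflict(local_doc, remote_doc)
    return remote_doc if state == "remote_newer" else local_doc

local = {"history": ["u1", "u2"], "modified": "2017-10-18T10:00:00Z",
         "title": "Draft", "owner": "jdoe"}
remote = {"history": ["u1", "u3"], "modified": "2017-10-18T11:00:00Z",
          "title": "Final"}

kept = replicate_document(local, remote)                   # last wins
merged = replicate_document(local, remote, merge_fields)   # custom merge
```

With `last_wins`, the remote copy is kept and the local `owner` field is lost. With `merge_fields`, the result has the remote `title` and still carries `owner`.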
Finally, Darwino has an interesting mechanism to deal with record deletion. If the source database has deletion stubs, it will effectively use them. Otherwise, it provides some helpers to find the deleted records, in particular within relational tables.
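When a relational table has no deletion stubs, one simple way to find deleted records is to compare the keys known from the last replication with the keys still present in the table. The Python sketch below shows that brute-force comparison with `sqlite3`. It is a guess at what such a helper could do, not the Darwino implementation, and `find_deleted` is a hypothetical name.

```python
import sqlite3

def find_deleted(conn, table, key_column, known_keys):
    """Return the keys seen at the last replication that left the table since."""
    # Identifiers cannot be bound as SQL parameters, so `table` and
    # `key_column` must come from trusted configuration, never user input.
    rows = conn.execute(f"SELECT {key_column} FROM {table}")
    current = {row[0] for row in rows}
    return sorted(set(known_keys) - current)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "ACME"), (2, "Globex"), (3, "Initech")])
known_keys = [1, 2, 3]                    # keys replicated during the last run

conn.execute("DELETE FROM customers WHERE id = 2")
deleted = find_deleted(conn, "customers", "id", known_keys)
```

After the `DELETE`, `deleted` contains the key of the missing row, which can then be propagated to the target as a deletion.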
The whole Darwino replication engine is designed to handle very large datasets. The Domino connector is mostly written in "C", thus calling the native "C" API. Not only does this allow critical features that are not available with the regular back-end classes, but it also provides high-performance data replication. On my laptop, it replicates ~400 complex documents/second, including attachments and rich text conversion. For simpler documents, with only basic data, it reaches the range of 1000-2000 a second! In many cases, you can get your existing NSF replicated in minutes.
Domino data accuracy
A replication engine only makes sense if it is high fidelity. From a Domino perspective, the Darwino replication should act like a regular Domino replication, maintaining the same data (sequence id, creation and modification dates, list of authors...). We can do that thanks to the "C" API.
Darwino can replicate most of the Domino field types: string, number, name, rich text, MIME, date/time... with maximum precision. For example, it deals particularly well with date/time values and time zones.
Any Domino document can be replicated, including design elements or profile documents. If you choose to replicate the database ACL document, it will then be used by the Darwino runtime to apply the same security. Talking about security, we also handle the reader/author fields.
Data can be aggregated during replication: multiple sources of data can be replicated into one single target database, thus allowing a more global view of the whole data set. Think about it: you can replicate many NSFs into a single set of relational tables and then run global queries on top of these tables. Isn't that a dream? Darwino liberates your Domino data!
As you see, the Darwino replication engine is a very complete engine that is flexible enough to handle many data sources. Out of the box, it can replicate your databases as is. But it also gives you a lot of options to select and transform your data.