Feed aggregator

The Non-Jigsaw Improvements of Java 9

Javalobby Syndicated Feed - Wed, 04-Jan-17 00:01

Although the flagship feature of Java 9 is modularity, a large number of other enhancements are planned for this release. This article provides an overview of the features that are scheduled for the Java 9 release but are not as famous and glorious as Jigsaw.

Reactive Streams

Reactive Streams are a contract for asynchronous stream processing with non-blocking back pressure. Publisher and Subscriber are the two key concepts in the Reactive Streams specification. A Publisher is a producer of items (and related control messages) that are received by Subscribers, all in a non-blocking way. Back pressure allows the amount of in-flight data to be controlled; that is, it regulates the transfer between a slow publisher and a fast consumer, or a fast publisher and a slow consumer.

Categories: Java

6 reasons for native Android development

codecentric Blog - Tue, 03-Jan-17 23:00

2016 was yet again a successful year for the mobile device market. The operating systems Android and iOS together reached a combined market share of 99.3%.

(Source: www.idc.com/promo/smartphone-market-share/os)

It sounds promising to develop cross-platform apps and share certain components between them to reduce code duplication. Based on experience with the Xamarin.Android platform, this post shows a few reasons why this may not be such a good idea after all.

About Xamarin.Android

The concept of the Xamarin.Android platform is promising at first sight. Combining an awesome language like C# with a cross-platform managed runtime (Mono) should enable developers to focus more on producing new features instead of maintaining code for Android and iOS separately.

C# provides language-level features for some Java and Android specialties (e.g., manifest generation using annotations, implicit casts via generic methods, background threads via the async keyword), and sharing code via the Mono runtime seems to make mobile development more efficient for developers.

I think this approach has some disadvantages, which drove me to abandon Xamarin.Android and proceed with pure Android development. Although the title says “native development”, this post is not a rant against hybrid development, but a plea for vanilla Android, which has become much easier than it was a few years ago.

Reason #1: The vast ecosystem of native libraries

Since the beginning of Android, the number of libraries developed for or enhanced to support Android has grown steadily. Only a small part of them has been ported to C# and is available as NuGet packages for the Xamarin platform.

Reason #2: The direct vendor support by Google

The community and the vendor support around Android are far more extensive than for Xamarin.Android. Bugs can be created and discussed using the Android bug tracker (https://source.android.com/source/life-of-a-bug.html).
The newest Android SDK versions are available as soon as Google releases them; there is no need to wait for someone to port them, as is the case with Xamarin.Android.

The well-known discussion about who is responsible for a bug takes place between the Android project and the developer; there is no third party like Xamarin involved.

Reason #3: Stackoverflow AKA developer love

Stackoverflow is *the* source for problem solutions in IT. Currently (as of 27.12.2016) there are 931,615 questions tagged with Android and only 18,590 tagged with Xamarin.

Reason #4: Android Studio

Since Google dropped Eclipse, switched to the IntelliJ platform, and started offering Android Studio, the IDE has become much better. Although I am using IntelliJ IDEA and the Android plugin is available for it, I still use Android Studio separately because it is such a good adaptation to the Android developer use case.

My favorite features are:

  • the incredibly awesome layout designer
  • the extensive amount of lint rules for code and layout optimization
  • Instant Run

Reason #5: Tools

The tools coming with the Android SDK are very well integrated in Android Studio. Additionally, Android uses the uniform, transparently documented, extensible build system Gradle, which can also use the extensive Maven repositories as a source for third-party libraries. Developer productivity is very good, because everything fits in nicely and works well together.

Additionally, Android Studio is available for Windows and macOS. So far, for Xamarin.Android you have had to use Xamarin Studio on macOS and Visual Studio on Windows. But this may change in the near future: https://www.visualstudio.com/vs/visual-studio-mac/

Reason #6: Startup time and size of the apps

Apps developed with Xamarin.Android use the Mono runtime to execute the code written in C#. This runtime has to be booted every time the app starts, in contrast to the JVM in Android, which is always running. This results in increased deployment times while developing, in comparison to Android Studio's Instant Run.

The Mono runtime and parts of Mono.Android have to be bundled with the app, which leads to a larger application size.

Conclusion

A few of these reasons are of course a matter of personal taste, but I think native app development is more cost effective than is commonly believed. What experience do you have with Xamarin and Android? Share your thoughts and leave a comment!

The post 6 reasons for native Android development appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

If You Wrote Java in 2016, Here Are the Trends You Couldn't Have Missed

Javalobby Syndicated Feed - Tue, 03-Jan-17 22:01

See more posts like this at Takipi.

There are a lot of trending topics when it comes to code, and trying to keep up with everything that’s going on is a full-time job on its own. If you’re wondering how to separate the wheat from the chaff, we’ve gone ahead and done the work for you.

Categories: Java

Learn Drools (Part 7): Salience

Javalobby Syndicated Feed - Tue, 03-Jan-17 05:31

Let's quickly summarize how Drools works. Rules are written in .drl files, facts or POJOs are inserted into the Knowledge Base, and then the rules are applied to each fact. If the "When" part of a rule matches a fact, the "Then" part will execute.

Having said that, one question just popped into my mind. If multiple rules match a fact, in which order will they be fired?

Categories: Java

Gradle Goodness: Getting Project Info Into Rule-Based Model Configuration

Javalobby Syndicated Feed - Tue, 03-Jan-17 04:01

Rule-based model configuration in Gradle allows us to have a graph of objects with dependencies that are resolved by Gradle. To make this work, Gradle needs to know about the objects in this model space. The model space is populated with objects of our own and with objects from Gradle.

At the time of writing this blog post, we cannot interact with the Gradle Project object in our rule-based model configuration. It is not officially part of the model space. This might, and probably will, change in the future, and the Project objects managed by Gradle will be part of the model space.

Categories: Java

Developing a Geospatial Webservice With Kotlin and Spring Boot [Video]

Javalobby Syndicated Feed - Tue, 03-Jan-17 02:01

As described in this announcement on the Spring blog, it is now easy to create a Spring Boot application using Kotlin.

Thanks to a sample geospatial messenger application, we will show how Spring Boot and Kotlin share the same pragmatic, innovative, and opinionated mindset to allow you to build simple but powerful projects.

Categories: Java

Topic Modeling of the codecentric Blog Articles

codecentric Blog - Tue, 03-Jan-17 00:30

The major part of big data is unstructured data. When an organization wants to leverage its data or external information from social media with the goal of making better business decisions, one challenge is to retrieve important information from unstructured text documents written in natural language. The main goal of techniques from natural language processing (NLP) is to turn text into structured data that can be used for further analysis.

A particular example of NLP is probabilistic topic modeling, which seeks to discover common topics in a collection of documents. Unsupervised machine learning algorithms have been developed to find such topics, which can be used for organizing and managing the collection of documents. Topic models allow us to address interesting data-science-related questions concerning, for instance, recommendations: “What articles are most relevant for a certain topic?”, and clustering: “What are newly published articles discussing and how similar are two articles?”. The derived topics can also be viewed as a dimensionality reduction and can be used as features for subsequent machine learning tasks (feature engineering).

In this article, we present results from topic modeling of the codecentric blog. The topics are used to analyze the blog content and how it changes over time. Of course, one could argue that authors usually assign their blog posts to a category and might use additional tags that give hints about their content. But when no such labels are available in a very large collection of documents, or if one wants to obtain a more objective clustering, topic modeling is an appropriate tool.

We perform the analysis using Apache Spark with its Python API in a Jupyter Notebook, which you may download here. Spark allows us to build a scalable machine learning (ML) pipeline containing latent Dirichlet allocation (LDA) topic modeling from its machine learning library (MLlib). A small Spark cluster can be easily set up, as described in this post. Another advantage of using Spark is that the developed prototypes of a data product can be easily translated to a production environment.

This post is organized in five sections:

LDA Topic Model
Data Preprocessing
Model Training and Evaluation
Results
Summary and Conclusion

The first three rather technical sections describe some theoretical concepts of LDA topic modeling as well as the implementation of data preprocessing and model training. Some readers might want to directly jump to the Results section.

LDA Topic Model

In natural language processing, a probabilistic topic model describes the semantic structure of a collection of documents, the so-called corpus. Latent Dirichlet allocation (LDA) is one of the most popular and successful models to discover common topics as a hidden structure of the collection of documents. According to the LDA model, text documents are represented by mixtures of topics. This means that a document concerns one or multiple topics in different proportions. A topic can be viewed as a cluster of similar words. More formally, the model assumes each topic to be characterized by a distribution over a fixed vocabulary, and each text document to be generated by a distribution of topics.

The basic assumption of LDA is that the documents have been generated in a two-step random process rather than having been written by a human. The generative process for a document consisting of N words is as follows. The most important model parameter is the number of topics k that has to be chosen in advance. In the first step, the mixture of topics is generated according to a Dirichlet distribution of k topics. Second, from the previously determined topic distribution, a topic is randomly chosen, which then generates a word from its distribution over the vocabulary. The second step is repeated for the N words of the document. Note that LDA is a bag-of-words model and the order of words appearing in the text as well as the order of the documents in the collection is neglected.
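
To make this two-step process concrete, the following toy simulation (not part of the original analysis; all sizes and Dirichlet parameters are made up) generates one document with NumPy:

import numpy as np

# Toy simulation of the LDA generative process (all parameters are made up).
rng = np.random.default_rng(42)
k, vocab_size, N = 3, 10, 8        # topics, vocabulary size, words per document

# Each topic is a distribution over the fixed vocabulary.
topic_word = rng.dirichlet(np.ones(vocab_size) * 0.1, size=k)

# Step 1: draw the document's mixture of topics from a Dirichlet distribution.
doc_topics = rng.dirichlet(np.ones(k) * 0.5)

# Step 2: for each of the N words, pick a topic, then a word from that topic.
words = []
for _ in range(N):
    z = rng.choice(k, p=doc_topics)             # topic assignment
    words.append(rng.choice(vocab_size, p=topic_word[z]))

print(doc_topics, words)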

When starting with a collection of documents and considering the reverse direction of the generative process, LDA topic modeling is the method to infer which topics might have generated the collection of documents. Because there is no exact solution for these distributions, probabilistic topic models comprise a suite of algorithms that have been developed to estimate the topic distributions from a corpus of text documents. Further details about LDA can be found in the original paper by Blei et al. or in this nice review about probabilistic topic models.

Data Preprocessing

We follow a typical workflow of data preparation for natural language processing (NLP). Textual data is transformed into numerical feature vectors required as input for the LDA machine learning algorithm. A similar approach is described in a recent blog post about spam detection.

A MySQL table of the blog posts is loaded into a Spark DataFrame using JDBC; an additional Spark submit argument contains the MySQL Connector jar file.

# read from mysql table, only use published posts sorted by date
df_posts = ((spark.read.format("jdbc")
 .option("url", "jdbc:mysql://localhost/ccblog")
 .option("driver", "com.mysql.jdbc.Driver")
 .option("dbtable", "wp_2_posts")
 .option("user", "*****")
 .option("password", "**********")
 .load()
 ).filter("post_type == 'post'").filter("post_status == 'publish'")
 .sort("post_date"))

From the post content, we first have to extract the text that is decorated with various HTML tags. A beautiful Python library to achieve this is BeautifulSoup. An example raw text is shown in the notebook for the first entry of the post content. The textual data extracted from the HTML file is then normalized by removing punctuation and other special characters and by converting to lowercase. A so-called tokenizer splits the sentences into words (tokens) that are separated by whitespace. These operations on the Spark DataFrame columns are performed via Spark’s user-defined functions (UDF).

extractText = udf(
 lambda d: BeautifulSoup(d, "lxml").get_text(strip=False), StringType())
removePunct = udf(
 lambda s: re.sub(r'[^a-zA-Z0-9]', r' ', s).strip().lower(), StringType())
 
# normalize the post content (remove html tags, punctuation and lower case..)
df_posts_norm = df_posts.withColumn("text", removePunct(extractText(df_posts.post_content)))
 
# breaking text into words 
tokenizer = RegexTokenizer(inputCol="text", outputCol="words", 
                           gaps=True, pattern=r'\s+', minTokenLength=2)
df_tokens = tokenizer.transform(df_posts_norm)

The RegexTokenizer is an example of a Spark transformer. Inspired by the concept of scikit-learn, transformers and estimators can be connected to a pipeline, i.e., a machine learning workflow comprising the various stages of preprocessing, feature generation, and model training and evaluation.
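
As a small illustration of this pipeline concept (a sketch that is not taken from the notebook), the tokenizer defined above could be chained with a stop word remover and fitted in one go:

from pyspark.ml import Pipeline
from pyspark.ml.feature import StopWordsRemover

# Minimal sketch: chain transformers/estimators into one pipeline and fit it.
prepPipeline = Pipeline(stages=[
    tokenizer,
    StopWordsRemover(inputCol="words", outputCol="filtered")
])
df_prepared = prepPipeline.fit(df_posts_norm).transform(df_posts_norm)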

Language identification

We only want to analyze English blog posts and have to identify the language, since no such tag is available in our data set. A simple classification between English and German as the primary language is achieved by comparing the fractions of stop words in the text. Stop words are the most common words of a given language, such as “a”, “of”, “the”, “and” in English. Lists of stop words for different languages are provided by NLTK. The fraction of English stop words in a given article is obtained by counting the number of English stop words that appear at least once in the text, divided by the total number of stop words in the list. Similarly, we calculate the fraction of German stop words and decide which language an article mainly uses by the larger of the two fractions.

from nltk.corpus import stopwords
englishSW = set(stopwords.words('english'))
germanSW = set(stopwords.words('german'))
 
nEngSW = len(englishSW)
nGerSW = len(germanSW)
 
RatioEng = udf(lambda l: len(set(l).intersection(englishSW)) / nEngSW)
RatioGer = udf(lambda l: len(set(l).intersection(germanSW)) / nGerSW)
 
df_tokens_en = (df_tokens.withColumn("ratio_en", RatioEng(df_tokens['words']))
                         .withColumn("ratio_ge", RatioGer(df_tokens['words']))
                         .withColumn("Eng", col('ratio_en') > col('ratio_ge'))
                         .filter('Eng'))

Filtering out stop words and stemming

The last preprocessing steps are filtering out the English stop words, as these common words presumably do not help in identifying meaningful topics, and stemming the words such that, for instance, “test”, “tests”, “tested”, and “testing” are all reduced to their word stem “test”. The list of stop words is expanded by moreStopWords, which we collect manually as follows: after having trained an LDA model, we inspect the topics and identify additional stop words, which are filtered out for the subsequent model training. This procedure is repeated as long as stop words appear in the lists of top words.

swRemover = StopWordsRemover(inputCol=tokenizer.getOutputCol(), outputCol="filtered")
swRemover.setStopWords(swRemover.getStopWords() + moreStopWords)
 
df_finalTokens = swRemover.transform(df_tokens_en)
 
# Stemming
from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer("english", ignore_stopwords=False)
udfStemmer = udf(lambda l: [stemmer.stem(s) for s in l], ArrayType(StringType()))
 
df_finalTokens = df_finalTokens.withColumn("filteredStemmed",
                                           udfStemmer(df_finalTokens["filtered"]))

Feature generation

The feature vectors are then generated following a simple bag-of-words approach using Spark’s CountVectorizer. Each document is represented as a vector of counts, the length of which is given by the number of words in the vocabulary, which we set to 2500. The CountVectorizer is an estimator that generates a model from which the tokenized documents are transformed into count vectors. Words have to appear in at least two different documents and at least four times in a document to be taken into account.

cv = CountVectorizer(inputCol="filteredStemmed", outputCol="features", vocabSize=2500, minDF=2, minTF=4)
 
cvModel = cv.fit(df_finalTokens)
 
countVectors = (cvModel
                .transform(df_finalTokens)
                .select("ID", "features").cache())
 
cvModel.save("path/to/model/file")

Model Training and Evaluation

The Spark implementation of LDA allows online variational inference as a method for learning the model. Data is incrementally processed in small batches, which allows scaling to very large data sets that might even arrive in a streaming fashion.

df_training, df_testing = countVectors.randomSplit([0.9, 0.1], 1)
 
numTopics = 20 # number of topics
 
lda = LDA(k = numTopics, seed = 1, optimizer="online", optimizeDocConcentration=True,
 maxIter = 50,           # number of iterations
 learningDecay = 0.51,   # kappa, learning rate
 learningOffset = 64.0,  # tau_0, larger values downweigh early iterations
 subsamplingRate = 0.05, # mini batch fraction 
 )
 
ldaModel = lda.fit(df_training)
 
logPerplexity = ldaModel.logPerplexity(df_testing)
 
ldaModel.save("path/to/lda/model")

In general, the data set is split into a training set and a testing set in order to evaluate the model performance via a measure such as the perplexity, i.e., a measure of how well the word counts of the test documents are represented by the topics' word distributions. However, we find it more useful to evaluate the model manually by looking at the resulting topics and the corresponding distributions of words. A good result is obtained by training a 20-topic LDA model on the entire corpus of the English codecentric blog articles. Using a more quantitative performance measure would allow hyper-parameter tuning. A grid search for optimal parameters such as the number of topics is facilitated by Spark's pipeline concept. The ML models are saved for later usage.
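
As a rough sketch of such a hyper-parameter search (not part of the original notebook), one could train one model per candidate number of topics on the training split and compare the held-out perplexities:

# Sketch: compare held-out perplexity for several candidate numbers of topics.
candidate_ks = [10, 15, 20, 25]
results = []
for k in candidate_ks:
    model = LDA(k=k, seed=1, optimizer="online", maxIter=50).fit(df_training)
    results.append((k, model.logPerplexity(df_testing)))

# Lower perplexity means the held-out word counts are better represented.
for k, perplexity in sorted(results, key=lambda t: t[1]):
    print(k, perplexity)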

Results

In the following, we present results of a 20-topic model trained on the entire data set of English codecentric blog articles that were published up to and including November 2016. A visualization of the distribution of words for the two top topics is given by the word clouds in Fig.1 and Fig.2. The size of a word corresponds to its relative weight; words having a large weight are more often generated by this topic. With the top words and by inspection of some documents discussing a given topic, it is often possible to manually assign somewhat summarizing labels to the topics. The topics that correspond to the word clouds in Fig.1 and Fig.2 are labeled “Agility” and “Testing”, respectively. Note that some words are reduced to non-word stems like “stori” or “softwar”.
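
Such word clouds can, for instance, be generated from the per-topic word weights of the trained model; the following sketch uses the wordcloud package and assumes, purely for illustration, that topic index 10 corresponds to “Agility”:

from wordcloud import WordCloud

# Sketch: build a word cloud for one topic from describeTopics() and the
# CountVectorizer vocabulary (topic index 10 is assumed to be "Agility").
vocab = cvModel.vocabulary
topicRow = ldaModel.describeTopics(maxTermsPerTopic=50).collect()[10]
weights = {vocab[i]: w for i, w in zip(topicRow.termIndices, topicRow.termWeights)}

WordCloud(background_color="white").generate_from_frequencies(weights).to_file("wordcloud_agility.png")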


Figure 1. Word cloud of the topic labeled “Agility“.


Figure 2. Word cloud of the topic labeled “Testing“.

Labeling of topics and identifying top documents

The twelve most meaningful topics of our 20-topic model are listed in Tab.1. These topics are selected by hand, and “meaningful” is of course quite a subjective measure; we exclude, for instance, topics in which two very different themes appear. For each topic, we suggest a label that summarizes what the topic is about and provide the top words in the order of their probability to be generated. In order to identify the top document for a given topic, we order the documents by their probability to discuss that topic. The top document is defined as the document having the largest contribution from the given topic compared to all other documents.
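
A minimal sketch of how the top document for one topic can be determined, reusing the countVectors DataFrame and the trained ldaModel from above (topic index 10 is only an illustrative assumption):

from pyspark.sql.functions import col, udf
from pyspark.sql.types import DoubleType

# Sketch: order all documents by their probability for one topic of interest.
topicIdx = 10  # e.g. the topic labeled "Agility"
topicProb = udf(lambda v: float(v[topicIdx]), DoubleType())

topDocsForTopic = (ldaModel
                   .transform(countVectors)
                   .withColumn("topicProb", topicProb("topicDistribution"))
                   .orderBy(col("topicProb").desc())
                   .select("ID", "topicProb"))

topDocsForTopic.show(3)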

Table 1. The twelve top topics of a 20-topic model trained on all English codecentric blog posts.
Topic | Label | Top words | Top document
0 | Testing | test, file, application, server, project | Testing JavaScript on various platforms with Karma and SauceLabs, Ben Ripkens
1 | DevOps | build, plugin, run, imag, maven | How to enter a Docker container, Alexander Berresch
2 | Memory Management | java, gc, time, jvm, memory | Useful JVM Flags – Part 2 (Flag Categories and JIT Compiler Diagnostics), Patrick Peschlow
3 | Data/Search | data, index, field, query, operator | Big Data – What to do with it? (Part 1 of 2), Jan Malcomess
4 | Reactive Systems | state, node, system, cluster, data | A Map of Akka, Heiko Seeberger
5 | Math | method, latex, value, point, parameter | The Machinery behind Machine Learning – Part 1, Stefan Kühn
6 | Spring | spring, public, class, configure, batch | Boot your own infrastructure – Extending Spring Boot in five steps, Tobias Flohre
7 | Frontend | module, type, grunt, html, import | Elm Friday: Imports (Part VIII), Bastian Krol
8 | Database | mongodb, document, id, db, name | Spring Batch and MongoDB, Tobias Trelle
9 | Functional Programming | function, name, var, node, call | Functional JavaScript using Lo-Dash, an underscore.js alternative, Ben Ripkens
10 | Agility | team, develop, time, agile, work | What Agile Software Development has in common with Sailing, Thomas Jaspers
11 | Mobile App | app, notif, object, return, null | New features in iOS 10 Notifications, Martin Berger

Next we determine the number of documents having the same main topic. Remember that a document usually concerns several topics in different proportions. The main topic of a document is defined as the topic with the largest probability.

getMainTopicIdx = udf(lambda l: int(numpy.argmax([float(x) for x in l])), IntegerType())
 
countTopDocs = (ldaModel
                .transform(countVectors)
                .select(getMainTopicIdx("topicDistribution").alias("idxMainTopic"))
                .groupBy("idxMainTopic").count().sort("idxMainTopic"))

For each document in our data set, we identify the topic index for which the probability is the largest, i.e., the main topic. Grouping by the topic index, counting, and sorting results in the counts of documents per topic plotted in Fig.3. The most discussed topics in the entire collection of blog articles are topic 0 – “Testing”, topic 6 – “Spring”, and topic 10 – “Agility”.


Figure 3. For each topic, we count the number of documents that discuss the topic with the largest probability (main topic). Only the 12 most meaningful topics of the 20-topic model are shown.

Evolution of blog content over time

How many blog articles were published on a specific topic during one year? This question is addressed in Fig.4, illustrating for the top topics, “Testing”, “Spring”, and “Agility”, the number of documents that discuss the topic with the largest probability as a function of time. At first glance, it appears that “Agility” became less important after a hype in 2009, as seen by the red line in Fig.4. However, another explanation would be that in later years, agile methodologies are not exclusively discussed as a main topic in a document but rather co-appear with other topics in smaller proportions. A growing number of articles is dedicated to both the topics “Spring” and “Testing”, with some oscillations for the latter. It might also be interesting to look at the number of documents that discuss a specified topic with a probability larger than some threshold value, rather than considering only the largest probability as in Fig.4. However, we do not go into detail here and only provide a glimpse of possible analyses.
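
A sketch of how these per-year counts can be computed, assuming df_posts still provides the ID and post_date columns and reusing getMainTopicIdx from above:

from pyspark.sql.functions import year

# Sketch: per year, count how many documents have a given main topic.
docTopics = (ldaModel
             .transform(countVectors)
             .select("ID", getMainTopicIdx("topicDistribution").alias("mainTopic")))

countsPerYear = (docTopics
                 .join(df_posts.select("ID", "post_date"), "ID")
                 .withColumn("year", year("post_date"))
                 .groupBy("year", "mainTopic").count()
                 .sort("year", "mainTopic"))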


Figure 4. Time evolution of the number of documents with the same main topic. Results are shown for the three top topics obtained from the LDA model trained on the entire data set.

Evolution of topics over time

Another interesting question is at what time topics appear or disappear and how the words representing a topic change over time. For the results in Fig.4 only a single LDA model was trained on the entire data set. The resulting topic distribution is fixed and does not change over time. In order to study the evolution of topics over time in a systematic way, machine learning researchers have developed dynamic topic models.

Here, we take a simpler approach and investigate how the distribution of topics changes over time. Several LDA models are trained, each on the blog articles of a specific year together with the articles from all previous years. Thus, we obtain topic distributions for the collections of documents published during the years 2008-2010, 2008-2011, …, 2008-2016. We then try to identify the same topics, which might contain different words. In principle, this approach allows us to predict next year's topics given all the articles from the previous years. Without going into details, we present as an example in Tab.2 the top ten words for the topic “Agility” from different LDA models trained with data up to and including consecutive years.
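
A sketch of how these cumulative models could be trained, again assuming the post_date column is available (the per-slice preprocessing and stop word iteration are omitted):

from pyspark.sql.functions import year

# Sketch: train one LDA model per cumulative slice of articles (2008 up to lastYear).
vectorsWithDate = countVectors.join(df_posts.select("ID", "post_date"), "ID")

yearlyModels = {}
for lastYear in range(2010, 2017):
    slice_df = (vectorsWithDate
                .filter(year("post_date") <= lastYear)
                .select("ID", "features"))
    yearlyModels[lastYear] = LDA(k=numTopics, seed=1, optimizer="online",
                                 maxIter=50).fit(slice_df)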

Table 2. Top ten words for the topic “Agility” from different LDA models trained on blog articles up to and including the given year. For each year, the words are listed in decreasing order of their probability to be generated.

2010: agil, scrum, team, project, develop, manag, stori, point, sprint, meet
2011: team, develop, project, scrum, agil, manag, time, product, softwar, test
2012: team, agil, develop, scrum, product, softwar, project, manag, continu, stage
2013: agil, team, role, session, product, develop, manag, peopl, plan, time
2014: develop, agil, team, scrum, work, time, session, product, peopl, softwar
2015: develop, work, agil, time, team, softwar, test, problem, code, point
2016: team, develop, time, agil, work, project, product, softwar, scrum, problem

As can be seen in Fig.5, the probability of top words to appear in a text about “Agility” changes over time. For instance, there is a slight decrease in the use of the words “agile” and “scrum” in the period from 2010 until 2016.


Figure 5. Time evolution of some words in the topic “Agility”. The weights of the words, shown as a function of time, correspond to the probability to appear in a document about agility.

The topic distribution of this article

In order to test the trained LDA topic model, we now predict the topics for the present article. We use the LDA model trained on the entire data set and apply it to the text of this article as written up to (but not including) this paragraph. As a result, we obtain the topic distribution depicted in Fig.6 as a pie chart. The two main topics, with about 20 percent each, are “Functional Programming” and “Data/Search”, which is quite appropriate. All other topics, having less than 5 percent probability, are collected in the “Other” part.
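
A sketch of how such a prediction can be made for a new, already preprocessed document, reusing the saved cvModel and ldaModel (the token list below is only a stand-in for the real article text):

# Sketch: predict the topic distribution of one new, already tokenized,
# stop-word-filtered and stemmed document.
df_newDoc = spark.createDataFrame(
    [(9999, ["topic", "model", "lda", "spark", "document", "cluster"])],
    ["ID", "filteredStemmed"])

topicDist = (ldaModel
             .transform(cvModel.transform(df_newDoc))
             .select("topicDistribution")
             .first()[0])
print(topicDist)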


Figure 6. The topic distribution for this article predicted by the LDA model trained on the entire dataset of all English codecentric blog articles.

Summary and Conclusion

In this article, we analyze the content of the codecentric blog by means of Spark's implementation of LDA topic modeling. Data preprocessing steps necessary for NLP are described. Training a 20-topic model on all blog posts allows us to identify a number of meaningful topics. Some exploratory investigations of the time evolution of the blog content and the topics are performed using different LDA models trained on articles up to a specified year. We thereby obtain hints on how topics and words have changed over time. In the last part, we successfully predict the topics of the present blog article.

In a follow-up post, it would be interesting to use the German blog posts and see whether the topics depend on the language. It might also be worthwhile to compare LDA with, e.g., non-negative matrix factorization and more elaborate (dynamic) topic models, or with different features such as tf-idf. Further insight into how topics tend to co-occur could be gained by modeling the connections between topics in a graph in order to study the relations between different topics. As a concluding remark, note that topic modeling is not restricted to text documents but can also be applied to other unstructured data such as images or video clips, e.g., video behavior mining, where visual features are interpreted as words.

References

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3, 993-1022, 2003.

David M. Blei. “Probabilistic Topic Models.” Communications of the ACM 55(4), 77-84, 2012.

Matthew Hoffman, Francis R. Bach, and David M. Blei. “Online Learning for Latent Dirichlet Allocation.” Advances in Neural Information Processing Systems, 2010.

David M. Blei and John D. Lafferty. “Dynamic Topic Models.” Proceedings of the 23rd International Conference on Machine Learning, ACM, 113-120, 2006.

The post Topic Modeling of the codecentric Blog Articles appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

The Essential Java 9 Feature You Probably Never Heard Of

Javalobby Syndicated Feed - Tue, 03-Jan-17 00:01

For more articles like this, visit Takipi.

Java 9 is set to release in July 2017, and it will come with a list of new and revised features, methods, and other elements. In our search through the JDK Enhancement Proposals (JEPs), we came across JEP 266. It holds some interesting updates to CompletableFuture, other concurrency updates, and support for Reactive Streams, which caught our attention.

Categories: Java

Spring Boot Starters

Javalobby Syndicated Feed - Mon, 02-Jan-17 22:01

Ho, ho, hooo! It looks like all members of the Infinispan community have been nice, and Santa brought you Spring Boot Starters!

This will make you even more productive and your code less verbose!

Categories: Java

Java Bullshifier: Generate Massive Random Code Bases

Javalobby Syndicated Feed - Mon, 02-Jan-17 05:31

For more articles like this, visit Takipi.

It's the command line tool you’ve been waiting for. Or not. After all, it’s pretty esoteric. Either way, it’s pretty useful to some and an amusing utility to others. Bullshifier is an internal OverOps tool developed by David Levanon and Hodaya Gamliel. It’s used in order to test some of our monitoring capabilities over ridiculously large code bases, with transactions that go thousands of calls deep, over thousands of classes, and end up with exceptions.

Categories: Java

Enhancing JAX-RS Project Analysis With JavaDoc [Video]

Javalobby Syndicated Feed - Mon, 02-Jan-17 04:01

Recently, I released version 0.12 of the JAX-RS Analyzer. As a main improvement, the JavaDoc of JAX-RS resources will now be taken into account for the Swagger JSON output.

The Maven plugin usage is updated to:

Categories: Java

Java and the Blockchain [Slides]

Javalobby Syndicated Feed - Mon, 02-Jan-17 02:01

I spoke last month at the Sydney JVM Meetup about Ethereum and web3j. The slides from the talk are available below.

Java and the Blockchain: Introducing Web3j from Conor Svensson


Categories: Java

Finding Inner Peace With the Liskov Substitution Principle

Javalobby Syndicated Feed - Mon, 02-Jan-17 00:01

We’re successfully handling divergent requirements by conforming to SRP. We’ve made our systems extensible by unleashing the power of OCP. Everything at WooMinus seems calm. But your life couldn’t be further from that. What happened?! What does this have to do with Liskov Substitution Principle? Read on to see how it will bring peace to your life.

King Benedictus and Real Politics

It’s late in the evening. Your wife asked you to go to the shop and buy a liter of milk and, if there are eggs, to get ten. So, you’re coming back carrying 10 liters of milk; the night appears so peaceful. Suddenly, a black van drives by and stops. Two big guys in hoodies punch you and force you into the car. It’s all black. A strangely familiar voice begins to whisper…

Categories: Java

The State of Java EE at Java Day Kiev

Javalobby Syndicated Feed - Sun, 01-Jan-17 22:01

Java Day Kiev took place Oct. 14-15. Led by the Ukrainian JUG, it is one of the most significant developer events in Ukraine. The event attracted a bevy of world-class speakers including Burr Sutter, Ivar Grimstad, Sebastian Daschner, Ruslan Sinitskiy and Edson Yanaga. Java EE had an excellent showing at the event including my own talks. The organizers had invited me in previous years, but I could not go to Ukraine due to Oracle's overly conservative travel restrictions. This year was my opportunity for redemption, so it was important for me to attend. I suggest others do the same to support Ukrainian developers when they need us most.

The organizers were very kind to arrange a special session on the current state of Java EE with the Ukraine JUG the day before the conference. Ivar and I led the full-house session. We talked about Java EE 7 adoption, the importance of Java EE to the ecosystem, and the forward plans for Java EE 8 as well as Java EE 9 that Oracle shared at JavaOne 2016. We also talked about the key MicroProfile initiative that aims to bring a collaborative, vendor-neutral approach to microservices in the Java ecosystem. The heart of the talk covers the key features Java EE 8 will bring in 2017, such as HTTP/2, a complete security API overhaul, even stronger JSON support, support for HTML5 Server-Sent Events (SSE), CDI 2, more reactive programming support, more pruning, and Java SE 8 alignment. The current slides for the talk are available online:


Categories: Java

Running Spring Boot Apps on Windows with Ansible

codecentric Blog - Sun, 01-Jan-17 19:00

There are times when you have to use a Windows box instead of your accustomed Linux machine to run your Spring Boot app on. Maybe you have to call some native libraries that rely on an underlying Windows OS, or there's some other reason. But using the same Continuous Integration (CI) tools we are used to should be non-negotiable!

Windows? No problem, but not without beloved Ansible!

No matter how – it's fine if we have to use Windows to run our app on. But we shouldn't accept being forced to give up on our principles of modern software development, like Continuous Integration (CI) and Continuous Deployment (CD), or the automation of recurring tasks like setting up servers and bringing our apps to life on them.

In our current CI pipeline we have a Jenkins instance building and testing our Spring Boot apps, and we use Ansible to provision our (Linux) machines so that we can deploy and run our apps on them. Why not just do the same with those Windows boxes?

Sounds like a dream? Ansible was this Unix/SSH thing, right?! How could that work with Windows? Our Jenkins runs on Linux – and this should somehow be capable of managing Windows environments?


Well, that's possible, and there's a way to use Ansible here.

Categories: Agile, Java, TDD & BDD

JDK 8: Lessons Learned With Lambdas and Streams [Video]

Javalobby Syndicated Feed - Sun, 01-Jan-17 04:01

Recorded at SpringOne Platform 2016.

Speaker: Simon Ritter, Azul

Categories: Java

10 Productive Ways to Use Spring Boot [Video]

Javalobby Syndicated Feed - Sat, 31-Dec-16 22:01

Recorded at SpringOne Platform 2016.

Speakers: Stéphane Nicoll, Brian Clozel

Categories: Java

Let’s build a Spotify GraphQL Server – Part 1

codecentric Blog - Sat, 31-Dec-16 16:01

GitHub built a GraphQL API server. You can write your own, too. This article shows how to write a GraphQL Server for Spotify:

  • How a simple Express Javascript server with the GraphQL endpoint can be set up
  • Steps for implementing the proxy for retrieving / serving the data – spotify-graphql-server on Github

[Screenshot: simple Spotify client]

What is GraphQL?

The main motivation for developing GraphQL was the need for efficient and flexible client-server communication:

  • GraphQL was built by Facebook as an advanced interface for the specific communication requirements of mobile clients
  • GraphQL is a query language
  • GraphQL works as a central data provider (i.e., a single endpoint for all data)

See more details on graphql.org

As an example, let’s build a simple Spotify server, focusing on a mobile client with only minimal data needs.

Instead of having the client fetch all the needed data from these different endpoints,

https://api.spotify.com/v1/search?type=artist
https://api.spotify.com/v1/artists/{id}
https://api.spotify.com/v1/artists/{id}/albums
https://api.spotify.com/v1/albums/{id}/tracks

we will load the data by fetching it from a single GraphQL endpoint.

  • Only one request: each HTTP request in mobile communication is expensive because of the higher latency / ping times.
  • Only the minimum data is transferred: this saves bandwidth because it avoids over-fetching unneeded data, compared to the full REST response.

To achieve this, we will

  • have a Javascript Express server, with Facebook’s reference implementation of graphql-js,
  • add a GraphQL schema
  • fetch the data from the Spotify API endpoint and aggregate them (asynchronously) on our server

Let’s start our project with a simple server setup

import express from 'express';
import expressGraphQL from 'express-graphql';
import schema from './data/schema';

const app = express ();
app.use('/graphql', expressGraphQL(req => ({
    schema,
    graphiql: true,
    pretty: true
})));

app.set('port', 4000);
let http = require('http');
let server = http.createServer(app);
server.listen(app.get('port'));

This still needs a schema:

GraphQL Schema

“HTTP is commonly associated with REST, which uses “resources” as its core concept. In contrast, GraphQL’s conceptual model is an entity graph” (http://graphql.org/learn/serving-over-http)

A GraphQL schema consists of a list of type definitions.

A minimalistic schema needs a root node (the Query type), which provides (possibly indirect) access to all other data nodes in the graph.

Schema definition variant 1

In the following implementation, the query type has just one field, hi, which simply resolves to the string 'Hello world!':

// after adding these imports:
import {
    GraphQLSchema,
    GraphQLObjectType,
    GraphQLString as StringType,
} from 'graphql';

// minimalistic schema

const schema = new GraphQLSchema({
    query: new GraphQLObjectType({
        name: 'Query',
        fields: {
            hi: {
                type: StringType,
                resolve: () => 'Hello world!'
            }
        }
    })
});

Let’s use the schema from above in our simple express server, start it with the command

babel-node server.js

and run a simple query with curl:

curl 'http://localhost:4000/graphql' \
     -H 'content-type: application/json' \
     -d '{"query":"{hi}"}'

If everything worked fine, we should get a JSON response where the query result can be found in the data property; any error information would be found in an optional errors property of the response object.

{
  "data": {
    "hi": "Hello world!"
  }
}

Because graphiql was enabled on server start, you can simply point your browser to

http://localhost:4000/graphql?query={hi} and get the following page:


[Screenshot: GraphiQL running the {hi} query]

If we add some descriptions, GraphQL allows introspection. Let’s add schema descriptions and see how it works:

import {
    GraphQLSchema,
    GraphQLObjectType,
    GraphQLString as StringType,
} from 'graphql';

const schema = new GraphQLSchema({
    query: new GraphQLObjectType({
        name: 'Query',

        description: 'The root of all queries.',

        fields: {
            hi: {
                type: StringType,

                description: 'Just returns "Hello world!"',

                resolve: () => 'Hello world!'
            }
        }
    })
});

We get code completion and additional Swagger-like API documentation for free, which is always up to date!

[Screenshot: GraphiQL with the inline schema documentation]

We can fetch the built-in schema, just as graphiql does in the background:

curl 'http://localhost:4000/graphql' \
     -H 'content-type: application/json' \
  -d '{"query": "{__schema { types { name, fields { name, description, type {name} } }}}"}'

which gives us a long JSON response… hard to read (for humans).

To get a more readable form, let’s create another representation using printSchema from graphql:

import { printSchema } from 'graphql';
import schema from './schema';

console.log(printSchema(schema));

This prints this format:

# The root of all queries.
type Query {
  # Just returns "Hello world!"
  hi: String
}

You might recognize that this is the same format that is used when defining types in flowtype (type annotations for JavaScript).

Schema definition variant 2

This can even be used as a shorter, alternative approach to set up the schema:

import { buildSchema } from 'graphql';

const schema = buildSchema(`
#
# "The root of all queries:"
#
type Query {
  # Just returns "Hello world!"
  hi: String
}
`);

Then we just need to define all resolvers in the rootValue object, like the hi() function:

app.use('/graphql', expressGraphQL(req => ({
    schema,
    rootValue: {
      hi: () => 'Hello world!'
    },
    graphiql: true,
    pretty: true
})));

Resolvers can take arguments and can return a Promise. This is great because it allows all data fetching to run asynchronously! But let’s postpone further discussion of timing aspects for now.

We will see this later in our example, in the queryArtists resolver of Query.

Let’s develop the real schema.

“The most basic components of a GraphQL schema are object types, which just represent a kind of object you can fetch from your service, and what fields it has.” (from Schemas and Types)

import { buildSchema } from 'graphql';

const schema = buildSchema(`
#
# Let's start simple.
# Here we only use a little information from Spotify API
# from e.g. https://api.spotify.com/v1/artists/3t5xRXzsuZmMDkQzgOX35S
# This should be extended on-demand (only, when needed)
#
type Artist {
  name: String
  image_url: String
  albums: [Album]
}
# could also be a single
type Album {
  name: String
  image_url: String
  tracks: [Track]
}
type Track {
  name: String
  preview_url: String
  artists: [Artist]
  track_number: Int
}

# The "root of all queries."

type Query {
  # artists whose name contains the given string
  queryArtists(byName: String = "Red Hot Chili Peppers"): [Artist]
}
`);

This graphql schema cheatsheet by Hafiz Ismail is a great help here.

Here we defined concrete Types, and also their relations, building the structure of our entity graph:

  1. Uni-directional connections, e.g. the albums of an Artist: to specify this relation, we just define it as a field whose type is an array of Albums. The artists themselves are found via the queryArtists field with its byName parameter. So, starting from the top query, any node in our complete entity graph can be reached. For example, let’s fetch all tracks of all albums of a specific artist with this query:
    {
    queryArtists(byName:"Red Hot Chili Peppers") { 
      albums {
        name
        tracks {
          name
          artists { name }
        }
      }
    }
    }
    
  2. Even though GraphQL can only deliver tree-like data, the query can also fetch data from a cyclic graph, as you can see in the Track’s artists field above, similar to fetching the followers of all followers of a Twitter user…

Resolvers – how to fill with data

Quick start with a mocking server

Because the schema in GraphQL provides a lot of information, it can be directly used to model sample data for running a mock server!

To demonstrate this, we use mockServer from the graphql-tools library and create dummy data dynamically:

import { mockServer } from 'graphql-tools';

let counter = 0;
const myMockServer = mockServer(schema, {
    String: () => 'loremipsum ' + (counter++),
    Album: () => {
        return {
            name: () => { return 'Album One' }
        };
    }
});

myMockServer.query(`{
  queryArtists(byName: "Marilyn Manson") {
    name
    albums {
      name
      tracks {
        name
        artists { name }
      }
    }
  }
}`).then(result => console.log(JSON.stringify(result, null, 2)));

You can try it yourself with our source project via:

npm run simpletest

Result:

{
 "data": {
  "queryArtists": {
   "name": "loremipsum 1",
   "albums": [
    {
     "name": "Album One",
     "tracks": [
      {
       "name": "loremipsum 3",
       "artists": [
        {
         "name": "loremipsum 4"
        },
        {
         "name": "loremipsum 5"
        }, 
        "..." ] } ] } ] } } }

See Apollo’s graphql-tools mocking documentation for how to tweak it in more detail.

Of course, we could use more sophisticated mock data, but this is already quite usable if we want to start client-side development on top of our schema!

Retrieving and serving real data

To get real data, we need to implement these resolver functions, just like the hi function above. These functions will be used to retrieve data from any data source. We can use arguments to access further query parameters.

In our case, let’s start with the query for an artist by name:

We just have the queryArtists resolver implementation:

//...
const app = express();
app.use('/graphql', expressGraphQL(req => ({
    schema,
    rootValue: {
        queryArtists: (queryArgs) => fetchArtistsByName(queryArgs)
    }
    // ...
})));

We could have added the resolvers to the schema definition, as we did in the first variant above, but for less complex schemas like this one I prefer the second variant. It lets me separate the logic for data fetching from the type definitions.

Any ‘field’ in rootValue corresponds to a ‘top query’ with the same name. Currently, we only have queryArtists.

Different kinds of resolvers:

  • it can be any constant value/object: e.g. "Hello world."
  • it may be a function: e.g. new Date()
  • it may be a function returning a Promise which gets resolved asynchronously: e.g. fetch("from_url")
  • it can be omitted if the value can be derived automatically from a property of the parent object with the same name: in fact, the artist fields returned from fetchArtistsByName are used directly.

This allows us to use and integrate all the powerful libraries out there without much effort! It’s easy to use mongoose or any GitHub, Twitter, or other client libs!

Here we have our own small implementation for fetching information via the Spotify REST API.

In this resolver, we use the argument byName:

{
    rootValue: {
        queryArtists: ({ byName }) => {
            // using ES6 destructuring which is shorter and allows
            // default values
            return fetchArtistsByName(byName);
        }
    }
}

Just to get an impression, here we can see how to query the REST API of Spotify:

import fetch from 'node-fetch';

const fetchArtistsByName = (name) => {
  return fetch(`https://api.spotify.com/v1/search?q=${name}&type=artist`)
    .then((response) => {
        return response.json();
    })
    .then((data) => {
        return data.artists.items || [];
    })
    .then((data) => {
        return data.map(artistRaw => toArtist(artistRaw));
    });
};

From the information in the raw JSON response, we create the list of Artists via toArtist(). Reminder: any fields with the same name as in the schema will be used automatically!

const toArtist = (raw) => {
    return {
        // fills with raw data (by ES6 spread operator):
        ...raw,

        // This needs extra logic: defaults to an empty string, if there is no image
        // else: just takes URL of the first image
        image_url: raw.images[0] ? raw.images[0].url : '',

        // .. needs to fetch the artist's albums:
        albums: (args, object) => {
            // this is similar to fetchArtistsByName()
            // returns a Promise which gets resolved asynchronously !
            let artistId = raw.id;
            return fetchAlbumsOfArtist(artistId); // has to be implemented, too ...
        }
    };
};

Summary:

We created our own GraphQL server, which loads basic information from the Spotify API for us. Now we can start fetching artists’ data in GraphiQL. This should work with our local server at http://localhost:4000/graphql?query=%7B%0A%20%20queryArtis…. or at https://spotify-graphql-server.herokuapp.com/graphql?query=…. which gives you this result:

[Screenshot: GraphiQL showing the artists query result]

I will update the published version at spotify-graphql-server. It is based on the sources of spotify-graphql-server on Github, so you can play around with the latest version and the latest features developed in this blog series. Have fun!

In this article we already saw these main advantages of GraphQL:

  • It allows us to specify exactly which data we need (no “over-fetching”)
  • It lets us extend the schema driven by the requirements of the client (think of the open-closed principle)
  • It defines a contract that always allows verifying a query against the schema definition (this also works at build time!)

In the next blog posts we will find out how to

  • build our own client, backed by free libraries from the awesome GraphQL ecosystem
  • use standard Express features for authentication for personalized info (like playlists) …
  • improve performance by caching on the server side using dataLoader
  • adapt the schema for Relay support and use Relay on the client side,
    which can even be used in a React Native mobile client!
  • check out mutations for write access, too.

The post Let’s build a Spotify GraphQL Server – Part 1 appeared first on codecentric AG Blog.

Categories: Agile, Java, TDD & BDD

40 Tips and Tricks for Spring in IntelliJ IDEA [Video]

Javalobby Syndicated Feed - Sat, 31-Dec-16 04:01

Recorded at SpringOne Platform 2016.

Speakers: Stephane Nicoll and Yann Cebron, JetBrains

Categories: Java
