An introduction to the Logging framework (a.k.a. System.out.println is evil)

Motivation

One of the first lines a programmer writes in a new language is surely “Hello World”. In Java you can write to the console or to the error stream quite easily with a simple System.out.println("Hello World") or System.err.println("Hello Error"). Great! When the code grows, bugs creep in and make life a bit harder. At this point programmers should definitely start a deep and loving relationship with the debugger that is delivered with the IDE instead of using System.out/err.println() as a debug method. Nevertheless, there are cases where a debugger cannot (or can hardly) be applied:

  1. The code runs in the IDE but not if started directly. – What the hell’s going wrong?
  2. Handling of exceptions. An exception indicates a state that should not have happened, so it should probably be logged.
  3. The code is deployed to someone else and you cannot attach the debugger to his/her machine.

At either point, beginners tend to use System.out/err.println() to trace the execution path. While this might be okay if the only one using the code is the programmer alone, it can be very annoying if you are working in a team: if you forget to remove the debug messages, you’re polluting someone else’s console output. Even worse: if the code is deployed to a client who reports an error, you cannot raise/lower debug levels or simply enable/disable debugging. Do you really want to send a “debug version”? (No, you don’t.)
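Just to give an idea of the alternative, here is a minimal sketch using the JDK’s own java.util.logging (log4j and friends follow the same idea): every message gets a level, and the threshold can be raised or lowered at deployment time without touching the code.

import java.util.logging.Logger;

public class HelloLogging {

    // the usual pattern: one logger per class, named after the class
    private static final Logger LOG = Logger.getLogger(HelloLogging.class.getName());

    public static void main(String[] args) {
        LOG.info("Hello World");          // instead of System.out.println
        LOG.severe("Hello Error");        // instead of System.err.println
        LOG.fine("entering parser ...");  // suppressed at the default INFO level
    }
}

No debug pollution on your teammates’ consoles, and at the client’s site you can enable the fine-grained messages via a logging configuration instead of shipping a “debug version”.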

Continue reading An introduction to the Logging framework (a.k.a. System.out.println is evil)

Knowing: A Generic Data Analysis Application

We got another demo accepted:

Knowing: A Generic Data Analysis Application

Thomas Bernecker, Franz Graf, Hans-Peter Kriegel, Nepomuk Seiler, Christoph Türmer, Dieter Dill
To appear at 15th International Conference on Extending Database Technology (2012)
March 27-30, 2012, Berlin, Germany

Abstract:

Extracting knowledge from data is, in most cases, not restricted to the analysis itself but accompanied by preparation and post-processing steps. Handling data coming directly from the source, e.g. a sensor, often requires preconditioning like parsing and removing irrelevant information before data mining algorithms can be applied to analyze the data. Stand-alone data mining frameworks in general do not provide such components since they require a specified input data format. Furthermore, they are often restricted to the available algorithms or a rapid integration of new algorithms for the purpose of quick testing is not possible. To address this shortcoming, we present the data analysis framework Knowing, which is easily extendible with additional algorithms by using an OSGi compliant architecture. In this demonstration, we apply the Knowing framework to a medical monitoring system recording physical activity. We use the data of 3D accelerometers to detect activities and perform data mining techniques and motion detection to classify and evaluate the quality and amount of physical activities. In the presented use case, patients and physicians can analyze the daily activity processes and perform long term data analysis by using an aggregated view of the results of the data mining process. Developers can integrate and evaluate newly developed algorithms and methods for data mining on the recorded database.

BibTeX

@INPROCEEDINGS{BerGraKriSeietal12,
  AUTHOR     = {T. Bernecker and F. Graf and H.-P. Kriegel and N. Seiler and C. Tuermer and D. Dill},
  TITLE      = {Knowing: A Generic Data Analysis Application},
  BOOKTITLE  = {Proceedings of the 15th International Conference on Extending Database Technology (EDBT), Berlin, Germany},
  YEAR       = {2012}
}

More information will be published on the official publication site at the LMU.

Research Idea: Evaluation of Traffic Lane Detection with OpenStreetMap GPS Data

I am soon leaving university, so the time for pure research will soon be over. Unfortunately I still have some ideas for possible research. As just trying to get them out of my head has not worked out yet, I’ll try to write them down – maybe someone finds them interesting enough for a Bachelor’s/Master’s thesis or something like that …

Introduction

OpenStreetMap creates and provides free geographic data such as street maps to anyone who wants them. The project was started because most maps you think of as free actually have legal or technical restrictions on their use, holding people back from using them in creative, productive, or unexpected ways. The OpenStreetMap approach is comparable to Wikipedia, where everyone can contribute content. In OpenStreetMap, registered users can edit the map directly by using different editors or indirectly by providing ground truth data in terms of GPS tracks following paths or roads. A recent study shows that the difference between OpenStreetMap’s street network coverage for car navigation in Germany and a comparable proprietary dataset was only 9% in June 2011.

In 2010, Yihua Chen and John Krumm published a paper at ACM GIS about “Probabilistic Modeling of Traffic Lanes from GPS Traces”. Chen and Krumm apply Gaussian Mixture Models (GMMs) to a data set of 55 shuttle vehicles driving between the Microsoft corporate buildings in the Seattle area. The vehicles were tracked for an average of 12.7 days, resulting in about 20 million GPS points. By applying their algorithm to this data, they were able to infer lane structures from the given GPS tracks.

Adding and validating lane attributes completely manually is a rather tedious task for humans – especially for data sets the size of OpenStreetMap. Therefore it should be evaluated whether the proposed algorithm can be applied to OpenStreetMap data in order to infer and/or validate lane attributes on existing data in an automatic or semi-automatic way.
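To illustrate the core idea (a toy sketch, not Chen and Krumm’s actual implementation): if you project the GPS points of a road segment onto the axis perpendicular to the road, every lane should appear as one component in the distribution of the lateral offsets. A simple 1D EM fit can then recover the lane centers:

import java.util.Arrays;
import java.util.Random;

public class LaneGmm {
    public static void main(String[] args) {
        // hypothetical lateral offsets in meters: two lanes centered
        // at -1.75 m and +1.75 m around the road's center line
        Random rnd = new Random(42);
        double[] x = new double[2000];
        for (int i = 0; i < x.length; i++)
            x[i] = (i % 2 == 0 ? -1.75 : 1.75) + rnd.nextGaussian() * 0.4;

        int k = 2;                                  // assumed number of lanes
        double[] w = {0.5, 0.5}, mu = {-1.0, 1.0}, var = {1.0, 1.0};
        double[][] resp = new double[x.length][k];

        for (int iter = 0; iter < 50; iter++) {
            // E-step: how responsible is each component for each point?
            for (int i = 0; i < x.length; i++) {
                double sum = 0;
                for (int j = 0; j < k; j++) {
                    double d = x[i] - mu[j];
                    resp[i][j] = w[j] * Math.exp(-0.5 * d * d / var[j])
                               / Math.sqrt(2 * Math.PI * var[j]);
                    sum += resp[i][j];
                }
                for (int j = 0; j < k; j++) resp[i][j] /= sum;
            }
            // M-step: re-estimate weight, mean and variance per component
            for (int j = 0; j < k; j++) {
                double n = 0, mean = 0, v = 0;
                for (int i = 0; i < x.length; i++) { n += resp[i][j]; mean += resp[i][j] * x[i]; }
                mean /= n;
                for (int i = 0; i < x.length; i++) { double d = x[i] - mean; v += resp[i][j] * d * d; }
                w[j] = n / x.length; mu[j] = mean; var[j] = v / n;
            }
        }
        System.out.println("estimated lane centers: " + Arrays.toString(mu));
    }
}

The real research question is of course harder: OpenStreetMap tracks come from heterogeneous devices with unknown accuracy, so estimating the number of components and handling the noise is where the thesis work would start.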

Continue reading Research Idea: Evaluation of Traffic Lane Detection with OpenStreetMap GPS Data

SRTM Plugin for OpenStreetMap

One of the features of our TrafficMining project at the LMU was to use additional attributes in routing queries. Standard routing queries usually just use time and distance for calculating the fastest/shortest routes. In order to have an additional attribute, we decided to use elevation data, as this can be an issue if you want to take fuel cost into account or if you’re planning a bike tour (instead of crossing a hill, you might want to consider a longer tour that avoids crossing the hill).

The problem was that data nodes from OpenStreetMap are defined mostly by id, latitude and longitude, which is perfectly sufficient for painting 2D maps and for standard routing queries. As elevation is not added to OpenStreetMap data directly (and it is also not intended to be added to the OSM database), a component was needed that parses both NASA SRTM data and OSM data files in order to combine the data.
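For illustration, a minimal sketch of the SRTM side (class and method names are made up, this is not the plugin’s API): an SRTM3 tile like N48E011.hgt covers 1°×1° and simply contains 1201×1201 big-endian 16-bit elevation samples, stored row by row from the north-west corner.

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

class SrtmTile {
    private static final int SIZE = 1201;           // SRTM3: 1201 x 1201 samples
    private final short[] samples = new short[SIZE * SIZE];
    private final int latBase, lonBase;             // south-west corner, N48E011 -> 48, 11

    SrtmTile(String file, int latBase, int lonBase) throws IOException {
        this.latBase = latBase;
        this.lonBase = lonBase;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            for (int i = 0; i < samples.length; i++)
                samples[i] = in.readShort();        // big-endian signed 16 bit, meters
        }
    }

    // nearest-neighbor lookup for a coordinate inside this tile
    short elevationAt(double lat, double lon) {
        int row = (int) Math.round((latBase + 1 - lat) * (SIZE - 1)); // rows run north to south
        int col = (int) Math.round((lon - lonBase) * (SIZE - 1));
        return samples[row * SIZE + col];
    }
}

With something like this, the remaining work is to pick the right tile for each OSM node and attach the sample to the node.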

In the first version, we parsed the SRTM data and added it directly to the nodes of the OSM graph. During one refactoring, we decided to integrate Osmosis into the project in order to be able to read PBF files (another OpenStreetMap file format). During this integration we decided to separate the SRTM parsing into a separate module so that other people can make use of it as well. The plugin was open sourced some time ago at Google Code as the “osmosis-srtm-plugin” under an LGPL license.


TrafficMining Project goes open source

Quite some time ago I wrote about a little demo that was published at SIGMOD 2010 and SSTD 2011 (see post1 and post2).

The TrafficMining project could be described shortly as:

An academic framework for routing algorithms based on OpenStreetMap data. Actually this framework is not intended to replace current routing applications but to provide an easy to use GUI for testing and developing new routing algorithms on real OpenStreetMap data.

Well, what makes this worth a post is the fact that we finally switched development over to Google Code with a discussion group at Google Groups.
Google Code has the major advantages of a Mercurial repository, an issue tracker, easy code reviews and an improved possibility to contribute code. If you just want to follow the development, join the Google group or keep a bookmark to the project’s update feed.

By the way: the PAROS and MARiO downloads can be found there in the downloads section.

Linking API and Sources to your IDE’s JARs (Part 2)

I tend to upload my complete project into version control. This includes the sources, tests, JARs and also the nbproject directory where NetBeans stores the project configuration. By doing so, I can check out the project on a different machine and start quickly without having to configure the project.

Sources and API docs of external libraries are not committed, as they are not required for compiling. I usually keep sources and docs in a separate place outside my project (let’s say <userdir>javaLibs...).

When I check out the project on a different machine, I can code right away, but I have neither the sources nor the API docs. Even worse: as I’ve committed the whole project including the configuration, I have also committed the nbproject/project.properties file which stores the paths to the sources and docs. This is not a problem as long as the paths are the same on all machines. But when a new contributor wants to join in, (s)he either has to use the same directory structure (and possibly the same OS) or has to overwrite the settings. Neither is very desirable.
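The direction the solution takes (details in the rest of the post): NetBeans also reads nbproject/private/private.properties, and the private folder is exactly the part that is not meant to go into version control. So machine-specific paths can live there, for example (library name and paths made up):

# nbproject/private/private.properties - not committed
source.reference.someLib.jar=/home/alice/javaLibs/someLib-src.zip
javadoc.reference.someLib.jar=/home/alice/javaLibs/someLib-javadoc.zip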

Continue reading Linking API and Sources to your IDE’s JARs (Part 2)

The Java7 bug … does it really affect you?

I was really happy about Java 7 being finally delivered by Oracle – and really disappointed that there are 3 severe bugs that can either crash the JVM (ok, bad, but – well) or silently produce wrong results (ohoh!). To excuse Oracle: the bugs were found very shortly before the release, so there was no time to fix them – okay, really bad luck. Well – I don’t have Java 7 in production now, so I’m fine.

But as there’s so much fuss about it, I dug a bit into the topic: the three evil bugs are the ones with the IDs 7070134, 7044738 and 7068051.
All three bugs are in the states Fix-Available and Fix-Delivered. So we just need to wait for the next Java update, right?

Wait: all three bugs “only” concern the server VM. Of course, this is bad for the people who want to use Java 7 on their servers right now. But if you just work on the client side and don’t use the server VM – well, then you just don’t have to care about it.
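And if you do have to run the server VM right now: the workaround that was circulated at the time (e.g. by the Apache Lucene developers, whose code triggered the bugs) is to disable the responsible loop optimizations, at the cost of some performance:

java -server -XX:-UseLoopPredicate -jar myApp.jar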


How to create Memory Leaks by using Inner Classes.

The most recent Java Specialists Newsletter finally convinced me to start this post that I was having in mind for quite some time.

One of the really huge advantages of Java is that you almost do not have to care about cleaning up your memory, as the Garbage Collector usually does this for you as soon as objects are no longer referenced. Usually this works so well that you really don’t have to care about anything! But once in a while you may observe something like a memory leak. Some people then call the Garbage Collector explicitly – which is usually just a bad idea and often doesn’t help either, so that the “leak” remains. The better solution in this case is profiling, so that you can see why some objects are not cleaned up.

A nice source for memory leaks can be the use of anonymous inner classes. Assume the following class where you want to compute something and return a Result object that implements an interface:

interface Result {}

class Outer {
    int[] data;
    public Outer(int s) { data = new int[s]; }
    // the anonymous inner class implicitly references this Outer instance
    Result getResult() { return new Result() {}; }
}

So if you call new Outer(1).getResult(), you will still have an instance of Outer in memory even though you did not keep an explicit reference to it. As explained in the Java Specialists Newsletter, each instance of an anonymous inner class always keeps a reference to the instance of the enclosing class! This is not a big deal as long as

  • you don’t keep a lot of data in the Outer instance,
  • the lifetime of the Result object is short, or
  • you won’t create a lot of results anyway.

Let’s have an example. If you execute

List<Result> l = new ArrayList<>();
int i = 0;
while (true) {
    l.add(new Outer(0).getResult()); // only the Result objects stay explicitly referenced
    System.out.println(i++);
}

with the above classes and without memory constraints (no restrictive -Xmx), this will run for quite some time, because you are only holding two class references (Outer, Result), one field (the empty data array) and an implicit reference from Result to Outer. That makes a total of 48 bytes on my Win7 64-bit machine (according to this measurement).

Now change the parameter in the constructor of Outer from 0 to 100000 and execute the code again. In my case I get an OutOfMemoryError after a bit more than 2000 created instances, as now each iteration suddenly consumes 400,048 bytes (48 bytes as before + 100,000 * 4 bytes for the int array) – even though we only keep explicit references to the Result objects!

So – the next time you create an inner class – you might have a brief look at the outer class as well and think about memory consumption and lifetime.
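If the Result objects have to live longer than their creator, one way out is a static nested class (or a plain top-level class), which keeps no implicit reference – a minimal sketch reusing the Result interface from above:

class Outer2 {
    int[] data;
    public Outer2(int s) { data = new int[s]; }

    // a *static* nested class holds no implicit reference to Outer2,
    // so the Outer2 instance (and its int array) can be collected
    private static class StaticResult implements Result {}

    Result getResult() { return new StaticResult(); }
}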

MARiO: Multi Attribute Routing in Open Street Map

Yeah, I got a new publication accepted at the Symposium on Spatial and Temporal Databases (SSTD) 2011 dealing with OpenStreetMap data (using the JXMapKit and JXMapViewer).

MARiO: Multi Attribute Routing in Open Street Map

Franz Graf, Hans-Peter Kriegel, Matthias Schubert, Matthias Renz

Published at Symposium on Spatial and Temporal Databases (SSTD) 2011
Conference Date: August 24th – 26th, 2011
Conference Location: Minneapolis, MN, USA.

Abstract:

In recent years, the Open Street Map (OSM) project collected a large repository of spatial network data containing a rich variety of information about traffic lights, road types, points of interest etc. Formally, this network can be described as a multi-attribute graph, i.e. a graph considering multiple attributes when describing the traversal of an edge. In this demo, we present our framework for Multi Attribute Routing in Open Street Map (MARiO). MARiO includes methods for preprocessing OSM data by deriving attribute information and integrating additional data from external sources. There are several routing algorithms already available and additional methods can be easily added by using a plugin mechanism. Since routing in a multi-attribute environment often results in large sets of potentially interesting routes, our graphical frontend allows various views to interactively explore query results.


BibTeX

@INPROCEEDINGS{GraKriRenSch11,
  AUTHOR      = {F. Graf and H.-P. Kriegel and M. Renz and M. Schubert},
  TITLE       = {{MARiO}: Multi Attribute Routing in Open Street Map},
  BOOKTITLE   = {Proceedings of the 12th International Symposium on Spatial and Temporal Databases (SSTD), Minneapolis, MN, USA},
  YEAR        = {2011}
}

Robust Segmentation of Relevant Regions in Low Depth of Field Images

Great, we got accepted (as a poster) at ICIP 2011 with the paper “Robust Segmentation of Relevant Regions in Low Depth of Field Images”:

Low depth of field (DOF) is an important technique to emphasize the object of interest (OOI) within an image. When viewing a low depth of field image, the viewer implicitly segments the image into regions of interest and non-regions of interest, which has a major impact on the perception of the image. Thus, robust algorithms for the detection of the OOI in low DOF images provide valuable information for subsequent image processing and image retrieval. In this paper we propose a robust and parameterless algorithm for the fully automatic segmentation of low depth of field images. We compare our method with three similar methods and show the superior robustness even though our algorithm does not require any parameters to be set by hand. The experiments are conducted on a real world data set with high and low depth of field images. (Abstract from the paper)

The work is the result of a collaboration with Michael Weiler. We extended his diploma thesis and produced an improved segmentation algorithm for low depth of field images. Compared to the three competing algorithms, ours is a bit slower, but at least it works. The other algorithms turned out to be extremely unstable and/or sensitive to parameters.

On the project site you can find

  • an online demo,
  • the test images,
  • the masks,
  • the NetBeans project including the full Java source code for our algorithm and the reimplementation of the comparison partners (of course we had to re-implement, as we didn’t even get binaries – as usual).

So if you plan to do some image segmentation, just go there, download the stuff and cite our work 😉