Research Idea: Evaluation of Traffic Lane Detection with OpenStreetMap GPS Data

I am leaving university soon, and with that the time for pure research will be over. Unfortunately, I still have some ideas for possible research. Since I have not managed to get them out of my head, I’ll try to write them down; maybe someone finds them interesting enough for a Bachelor’s or Master’s thesis or something like that …


OpenStreetMap creates and provides free geographic data such as street maps to anyone who wants them. The project was started because most maps you think of as free actually have legal or technical restrictions on their use, holding people back from using them in creative, productive, or unexpected ways. The OpenStreetMap approach is comparable to Wikipedia, where everyone can contribute content. In OpenStreetMap, registered users can edit the map directly using various editors, or indirectly by providing ground-truth data in the form of GPS tracks that follow paths or roads. A recent study shows that the difference between OpenStreetMap’s street network coverage for car navigation in Germany and that of a comparable proprietary dataset was only 9% in June 2011.

In 2010, Yihua Chen and John Krumm published a paper at ACM GIS titled “Probabilistic Modeling of Traffic Lanes from GPS Traces”. Chen and Krumm apply Gaussian mixture models (GMMs) to a data set of 55 shuttle vehicles driving between Microsoft corporate buildings in the Seattle area. The vehicles were tracked for an average of 12.7 days, resulting in about 20 million GPS points. By applying their algorithm to this data, they were able to infer lane structures from the given GPS tracks.
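To make the idea more concrete, here is a minimal Python sketch of the core step as I would approach it: GPS points near a road are reduced to signed lateral offsets from the road centerline (this projection is assumed to have happened already), a one-dimensional Gaussian mixture is fitted to the offsets with EM, and the number of components, read as the number of lanes, is chosen by the Bayesian information criterion (BIC). This is not a reproduction of the model in the paper; BIC selection, the names `fit_gmm_1d` and `estimate_lanes`, and the synthetic data are assumptions for illustration. The synthetic noise (0.5 m) is deliberately small; real consumer GPS error is often on the order of a lane width, which is why many traces are needed in practice.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k, iters=100, min_var=1e-3):
    """Fit a k-component 1D Gaussian mixture with plain EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(xs)
    xs_sorted = sorted(xs)
    # Deterministic initialization: means at evenly spaced quantiles.
    means = [xs_sorted[int((j + 0.5) * n / k)] for j in range(k)]
    mu0 = sum(xs) / n
    var0 = max(sum((x - mu0) ** 2 for x in xs) / n, min_var)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var
            )
    log_lik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in xs
    )
    return weights, means, variances, log_lik


def estimate_lanes(offsets, max_lanes=4):
    """Hypothetical helper: pick the component count (= lane count) with the lowest BIC."""
    n = len(offsets)
    best = None
    for k in range(1, max_lanes + 1):
        _, means, _, log_lik = fit_gmm_1d(offsets, k)
        # Free parameters: k means, k variances, k - 1 mixture weights.
        bic = -2.0 * log_lik + (3 * k - 1) * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, k, sorted(means))
    return best[1], best[2]


# Synthetic example: two lanes whose centers are 3.5 m apart (made-up data).
rng = random.Random(42)
offsets = [rng.gauss(-1.75, 0.5) for _ in range(200)] + [rng.gauss(1.75, 0.5) for _ in range(200)]
lanes, centers = estimate_lanes(offsets)
print(lanes, [round(c, 2) for c in centers])
```

The estimated centers could then be compared with, or used to suggest, a `lanes` value on the corresponding OpenStreetMap way.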

Adding and validating lane attributes completely manually is a rather tedious task for humans, especially for data sets like OpenStreetMap. Therefore, it should be evaluated whether the proposed algorithm can be applied to OpenStreetMap data in order to infer and/or validate lane attributes on existing data automatically or semi-automatically.


As a first step, the algorithm from the publication mentioned above should be reimplemented and tested. Afterwards, it should be determined how many traces in a region are needed to obtain a certain level of confidence and quality.
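One way to make “how many traces are needed” concrete is a classic sample-size calculation. The sketch below assumes that each trace contributes one independent lateral offset (for example, the mean offset of its points on the street) that scatters with standard deviation sigma meters around the true lane center. Under a normal approximation, the number of traces needed for a confidence interval of half-width E is n = ceil((z · sigma / E)²), with z = 1.96 for roughly 95% confidence. The value sigma = 4 m, the tolerances, and the function names are hypothetical; real traces are not fully independent (the same person may drive the same route many times), so an actual evaluation would measure this empirically on OpenStreetMap traces.

```python
import math
import random


def traces_needed(sigma_m, tolerance_m, z=1.96):
    """Number of independent traces so that the normal-approximation confidence
    interval for a lane-center offset has half-width <= tolerance_m.
    z = 1.96 corresponds to roughly 95% confidence."""
    return math.ceil((z * sigma_m / tolerance_m) ** 2)


def empirical_coverage(n_traces, sigma_m, tolerance_m, true_offset=1.75, trials=2000, seed=0):
    """Simulate many surveys of n_traces traces each and return the fraction in which
    the mean offset lands within tolerance_m of the true lane center."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        offsets = [rng.gauss(true_offset, sigma_m) for _ in range(n_traces)]
        if abs(sum(offsets) / n_traces - true_offset) <= tolerance_m:
            hits += 1
    return hits / trials


# Hypothetical numbers: per-trace offsets scatter by 4 m around the lane center.
n = traces_needed(sigma_m=4.0, tolerance_m=1.0)
print(n)  # 62
print(empirical_coverage(n, sigma_m=4.0, tolerance_m=1.0))  # should be close to 0.95
```

Because n grows with the square of sigma / E, halving the tolerance roughly quadruples the number of traces (62 vs. 246 with these assumed numbers).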

If GPS data obtained from OpenStreetMap yields reasonable results, the algorithm should be integrated into a JOSM plugin so that its results can be displayed as an overlay on the current data. Features for this plugin could be:

  • Detect regions with enough data coverage so that the algorithm can be applied
  • Option for filtering relevant traces: simple filter criteria could select traces based on their speed so that, for example, data recorded by pedestrians is excluded (pedestrians hopefully never follow lanes directly!).
  • Apply the algorithm along selected roads or, if it is fast enough, fully automatically.
  • Display the algorithm’s results to the user in an overlay in the JOSM plugin, so that the user can decide with one click whether to assign the detected lane attributes to the given road.
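The first two features (detecting regions with enough coverage and filtering out slow traces) can be sketched in a few lines. A real plugin would be written in Java against JOSM’s plugin API; the Python below only illustrates the logic and does not use JOSM at all. The 20 km/h median-speed threshold, the 15 m buffer around a road segment, and all names are hypothetical choices (a fast cyclist would still pass this filter), and the distance calculation uses a local flat-earth approximation that is only adequate for short segments.

```python
import math
import statistics
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass
class TrackPoint:
    lat: float
    lon: float
    t: float  # timestamp in seconds


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def median_speed_kmh(trace):
    """Median speed between consecutive points, skipping non-increasing timestamps."""
    speeds = [
        haversine_m(a.lat, a.lon, b.lat, b.lon) / (b.t - a.t) * 3.6
        for a, b in zip(trace, trace[1:])
        if b.t > a.t
    ]
    return statistics.median(speeds) if speeds else 0.0


def is_vehicle_trace(trace, min_median_kmh=20.0):
    """Crude pedestrian filter: keep traces whose median speed reaches the threshold."""
    return median_speed_kmh(trace) >= min_median_kmh


def point_segment_distance_m(p, a, b):
    """Approximate distance in meters from point p to segment a-b ((lat, lon) tuples),
    using a local equirectangular projection centered on a."""
    cos_lat = math.cos(math.radians(a[0]))

    def to_xy(q):
        return (math.radians(q[1] - a[1]) * cos_lat * EARTH_RADIUS_M,
                math.radians(q[0] - a[0]) * EARTH_RADIUS_M)

    px, py = to_xy(p)
    bx, by = to_xy(b)
    len2 = bx * bx + by * by
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, (px * bx + py * by) / len2))
    return math.hypot(px - t * bx, py - t * by)


def coverage(traces, seg_a, seg_b, buffer_m=15.0):
    """Number of traces with at least one point within buffer_m of the segment."""
    return sum(
        1 for tr in traces
        if any(point_segment_distance_m((pt.lat, pt.lon), seg_a, seg_b) <= buffer_m for pt in tr)
    )


# Made-up traces along a road running east at latitude 48.0, one point per second.
car = [TrackPoint(48.0, 11.0 + 0.0002 * i, float(i)) for i in range(11)]       # ~54 km/h
walker = [TrackPoint(48.0, 11.0 + 0.00002 * i, float(i)) for i in range(11)]   # ~5 km/h
far_car = [TrackPoint(48.01, 11.0 + 0.0002 * i, float(i)) for i in range(11)]  # ~1.1 km north
road = ((48.0, 11.0), (48.0, 11.01))
traces = [car, walker, far_car]
vehicle_traces = [tr for tr in traces if is_vehicle_trace(tr)]
print(coverage(traces, *road), coverage(vehicle_traces, *road))  # 2 1
```

A region would then count as “covered enough” once the filtered coverage of its road segments exceeds whatever trace count the evaluation step found necessary.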


The aim of the work should be a runnable demo application that could be submitted to appropriate conferences.
Depending on the novelty of the resulting technique (which need not be a simple reimplementation of the mentioned paper), a scientific paper could also be pursued.


5 thoughts on “Research Idea: Evaluation of Traffic Lane Detection with OpenStreetMap GPS Data”

  1. I can see how this could be applicable in some scenarios; however, GIS analysts have largely found workarounds for some of these problems. For lane delineation, a simple remote sensing query usually does the trick. As for traffic speed modeling, well, that’s a little tricky. If you live in a “developed” country, rest assured that there are thousands of sensors watching you from the beginning to the end of your drive. However, if you do not live in an area with such a fanatic persuasion for tracking its traffic, then this algorithm would work well in theory. The biggest hang-up, though, would be that first bullet point (which, might I add, is the hang-up for virtually every GIS project I have ever had the misfortune of carrying). Finding sufficient data will be challenging… and more importantly, finding free data will be practically impossible. Great post, and keep the research ideas coming! Maybe you’ll encourage me to do the same!

    1. Thanks for the comment! I’ll try to explain in more detail why I posted this idea:

      Well, I’m an OpenStreetMap fan and also very critical of several research algorithms. During my PhD, I’ve seen so many algorithms that only perform well on the “original” data set. As soon as the data distribution (or other characteristics) changes, several algorithms perform badly or become completely useless. Thus it’s worth reevaluating some algorithms under more general circumstances with different kinds of data. If an algorithm performs well (at least under some circumstances), wouldn’t it be cool if it didn’t just drown in the huge number of publications but were actually applied to assist people who are doing a lot of work manually? I have been following the OpenStreetMap community for quite a while now, and I’m always impressed by the huge amount of data that has been generated, mostly manually. And I very often think: “OMG, there are algorithms and techniques that would make mapping a little bit easier if only someone would bring all that stuff from a paper into a plug-in so that non-researchers could just USE it”.
      So you see: I want to combine supporting OpenStreetMap with evaluating (or even extending) research.

      Regarding the data situation, which definitely IS problematic in several use cases: at least here in Germany there are a lot of GPS traces, at least in major cities. Even in some smaller towns there could/should be enough data for at least some streets where the algorithm could be applied in order to verify lane attributes in the OpenStreetMap data (which is also freely available).
      The OpenStreetMap stats page currently lists 2,701,369,969 uploaded GPS points worldwide.

      As the coverage of GPS points is certainly not equal in all regions, it would be quite interesting to know how many GPS traces (or points) are needed for a street in order to obtain a certain confidence. This information would also be interesting for people who don’t want to add or modify OpenStreetMap data actively but just want to contribute by having their GPS running in the car.

      So to recapitulate: on one side, there is plenty of freely available data, real use cases, and volunteer mappers who need helpful applications; on the other side, there are possibly working algorithms. Combining the two could result in even better algorithms, which could even be evaluated by crowdsourcing. It would help the volunteer mappers and (if done reasonably) the student / PhD candidate if a paper or demo can be made out of the results. And at least a rejection will hardly contain “too little data, unrealistic data set, artificial problem, unrealistic use case” 😉

      I’d be happy about any feedback.

      1. To elaborate on that algorithm problem: isn’t it a good thing that we encourage geographically unique algorithms, considering Tobler’s law? It is convenient for us to have patterned formulas like NDVIs that are so easily vicarious, but for such a geographically vulnerable variable as traffic patterns (or disease, crime, education, etc.), should we not consider, or even assume, that the algorithms are to be used as a starting point? I often find myself having to remind colleagues of the very spatial and continuous nature of geography, which is so convenient to overlook in a time of programmable/application algorithms that need to be one-size-fits-all. I assumed you were speaking of traffic algorithms, although it’s just as likely you were speaking of published algorithms in general that you’ve come across.

        I would absolutely agree that greater access for users to contribute data easily, efficiently and accurately would be the best thing that could happen to any study; however, there are very big interfering variables that would invariably come along: economic strain (the cost of technology to the user), convenience (of sharing one’s data), privacy (self-explanatory), and outreach (i.e., education and general know-how of the user).

        I currently work in GIS in the transportation field, and I can tell you from experience that we have toiled over several of the problems you mentioned in your response. For example, my homework for the weekend is to research the designated “car sizes” that sensors assume in traffic density counts. For instance, a wireless magnetic sensor of brand X will count each passing 8 feet of mass as 1 car, with absolutely no consideration for 18-wheelers. The research has certainly put the traffic data we pay so much for in perspective, and further solidified the need for widespread adoption of GPS data.
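A small worked example shows how much such a rule can distort counts. It assumes a hypothetical counter that registers one “car” per full 8 feet of passing vehicle (and at least one per vehicle), with illustrative lengths of about 15 ft for a passenger car and about 70 ft for a tractor-trailer; real sensors may round or threshold differently.

```python
def length_based_count(vehicle_lengths_ft, ft_per_car=8.0):
    """Hypothetical length-based counter: every full ft_per_car feet of a passing
    vehicle counts as one 'car', and every vehicle counts at least once."""
    return sum(max(1, int(length // ft_per_car)) for length in vehicle_lengths_ft)


# Illustrative traffic: nine ~15 ft passenger cars and one ~70 ft tractor-trailer.
traffic = [15.0] * 9 + [70.0]
print(len(traffic), length_based_count(traffic))  # 10 17
```

Under these assumptions, ten actual vehicles are reported as 17 cars, and the single truck alone accounts for 8 of them.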

        Out of curiosity, what was your PhD/dissertation focus?

        1. Indeed, I was speaking of algorithms in general, not just traffic algorithms.
          The algorithm described in the paper seems quite interesting. Sure, it won’t be the ultimate solution, but, as you wrote, it is a starting point towards more insight into the requirements of this algorithm and, later on, a more general application of it. That’s why I’d love to see a JOSM plugin with which you can just “play around” in arbitrary regions of OpenStreetMap and gain more insight into the algorithm and the data. Don’t get me wrong: I don’t want to bash the algorithm. I just want to see it applied to OSM data in order to either a) see that it works, in which case we could think about a usable interface for end users, or b) see that it does not work, and then dig deeper into the problems and try to enhance/modify/adapt it to work on this kind of data.

          Btw: the title of my dissertation is “Data and Knowledge Engineering in Medical Image and Sensor Data”. It’s composed of 3 major parts: medical imaging (computed tomography), medical sensor data, and indexing high-dimensional features as they are used in computer vision in general.

  2. For me, it would be more important to have a plugin that just combines GPS traces without specific lanes. As you said, in Germany we have a high density of GPS tracks, and if you download them, it’s often really hard to see other things (like the background).
    I’m really looking forward to such a plugin. It would be nice if you could change some parameters manually (detection range/speed/direction).
    This way you could adjust such a plugin to your needs:
    –> highway with a lot of tracks –> low detection range / high speed / separating directions
    –> path with only a few tracks –> higher range / lower speeds / directions not relevant
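The parameter presets suggested above could be represented as in the following Python sketch, which also shows one building block: separating trace steps by driving direction relative to the road’s bearing. The preset values, the 90° cut-off, and all names are hypothetical; detection range and minimum speed are carried in the presets for a merging step that is not shown here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeParams:
    detection_range_m: float
    min_speed_kmh: float
    separate_directions: bool


# Hypothetical presets mirroring the two examples above.
PRESETS = {
    "highway": MergeParams(detection_range_m=10.0, min_speed_kmh=40.0, separate_directions=True),
    "path": MergeParams(detection_range_m=25.0, min_speed_kmh=0.0, separate_directions=False),
}


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, clockwise from north (0..360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(x, y)) % 360.0


def angle_diff_deg(a, b):
    """Smallest absolute difference between two bearings, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def split_by_direction(steps, road_bearing, params):
    """steps: list of ((lat1, lon1), (lat2, lon2)) trace steps.
    Returns (forward, backward); without direction separation everything is 'forward'."""
    if not params.separate_directions:
        return list(steps), []
    forward, backward = [], []
    for a, b in steps:
        heading = bearing_deg(a[0], a[1], b[0], b[1])
        target = forward if angle_diff_deg(heading, road_bearing) <= 90.0 else backward
        target.append((a, b))
    return forward, backward


east = ((48.0, 11.0), (48.0, 11.001))
west = ((48.0, 11.001), (48.0, 11.0))
print(split_by_direction([east, west], 90.0, PRESETS["highway"]))
print(split_by_direction([east, west], 90.0, PRESETS["path"]))
```

Keeping the presets as plain data would make it easy to expose them as user-editable settings, as suggested above.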

