Visualizing Dataflow

Posted by Mike Haller on Monday, June 21. 2010 at 22:49 in Java
I've recently been experimenting with visualizing dataflow in Java applications. There seem to be plenty of tools for inspecting control flow, but so far I have had no luck finding one that can visualize the amount and type of data flowing through complex systems.

Not being an AOP guru, I wondered whether there was something else I could use: something unobtrusive that can be applied to existing systems. The first thing I'm trying is Java's debugging API, namely JDI, to automatically step through a program and record method entries.

Method parameters represent data flow into methods: all parameter values (ObjectReferences actually, no primitives currently) are recorded as a dataflow relationship between the caller and the receiver based on the call stack.

Assume the following little example program:
public class Application {
   public void doSomething() {
      Person person = new Person("Chuck Norris");
      Order order = new Order();
      order.setBuyer(person.getName());
   }
}

The program is started in debug mode, so the VM suspends and waits for a debugger to connect. Next, I start my tool, which connects to the debug port via JDI and registers debugging hooks. Once these hooks are configured, the VM is resumed and the threads begin to spawn. After the main method is bootstrapped (it contains only new Application();), the tool receives the low-level debug events, stops all threads, records parameter values, resolves object references, sends out high-level events and visualizes those high-level events using a graph viewer.
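The steps above could look roughly like the following with JDI. This is a minimal sketch, not the actual tool from this post: the class name MethodEntryRecorder, the port 8000 and the exclusion filters are assumptions made for illustration, and the target VM is assumed to have been started with -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.IncompatibleThreadStateException;
import com.sun.jdi.ObjectReference;
import com.sun.jdi.StackFrame;
import com.sun.jdi.ThreadReference;
import com.sun.jdi.Value;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.event.Event;
import com.sun.jdi.event.EventSet;
import com.sun.jdi.event.MethodEntryEvent;
import com.sun.jdi.event.VMDeathEvent;
import com.sun.jdi.event.VMDisconnectEvent;
import com.sun.jdi.request.EventRequest;
import com.sun.jdi.request.MethodEntryRequest;
import java.util.Map;

// Hypothetical sketch: attaches to a suspended VM and logs object arguments on method entry.
final class MethodEntryRecorder {

    /** Looks up the socket attach connector that ships with the JDK. */
    static AttachingConnector socketAttachConnector() {
        for (AttachingConnector c : Bootstrap.virtualMachineManager().attachingConnectors()) {
            if ("com.sun.jdi.SocketAttach".equals(c.name())) {
                return c;
            }
        }
        throw new IllegalStateException("No socket attach connector available");
    }

    public static void main(String[] args) throws Exception {
        AttachingConnector connector = socketAttachConnector();
        Map<String, Connector.Argument> arguments = connector.defaultArguments();
        arguments.get("hostname").setValue("localhost");
        arguments.get("port").setValue("8000"); // assumed port, must match the target VM
        VirtualMachine vm = connector.attach(arguments);

        // Register the debugging hook: report every method entry outside the JDK classes.
        MethodEntryRequest request = vm.eventRequestManager().createMethodEntryRequest();
        request.addClassExclusionFilter("java.*");
        request.addClassExclusionFilter("sun.*");
        request.setSuspendPolicy(EventRequest.SUSPEND_ALL); // stop all threads per event
        request.enable();
        vm.resume();

        boolean connected = true;
        while (connected) {
            EventSet events = vm.eventQueue().remove(); // blocks until events arrive
            for (Event event : events) {
                if (event instanceof MethodEntryEvent) {
                    record((MethodEntryEvent) event);
                } else if (event instanceof VMDeathEvent || event instanceof VMDisconnectEvent) {
                    connected = false;
                }
            }
            if (connected) {
                events.resume(); // let the suspended threads continue
            }
        }
    }

    /** Logs each object argument as a flow from the calling object to the receiver. */
    static void record(MethodEntryEvent event) throws IncompatibleThreadStateException {
        ThreadReference thread = event.thread();
        StackFrame callee = thread.frame(0);
        ObjectReference receiver = callee.thisObject(); // null for static or native methods
        ObjectReference caller = thread.frameCount() > 1 ? thread.frame(1).thisObject() : null;
        for (Value value : callee.getArgumentValues()) {
            if (value instanceof ObjectReference) { // primitives are skipped, as in the post
                System.out.println(describe(caller) + " -> " + describe(receiver)
                        + " via " + event.method().name() + ": " + value);
            }
        }
    }

    static String describe(ObjectReference ref) {
        return ref == null ? "<static>" : ref.referenceType().name() + "@" + ref.uniqueID();
    }
}
```

Suspending all threads on every method entry is the simplest way to read a consistent call stack, and it is also a large part of why this approach is so slow.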

The output (rendered with the JUNG graph visualization library) is then displayed on screen. You can watch objects being created and the data flow evolving in "realtime" (debugging is very slow, hence the quotes).

Dataflow visualization in a simple Java program using JPDA/JDI debugging techniques

For the simple example, it looks nice. The green dots are application code; the orange dots are java.* classes. The little orange dot in the bottom right corner is probably the class's string constant pool; just ignore it for now.

You can see that the example data (the constant string Chuck Norris) flows into a Person object and also somehow flows into the Order object. Note that the diagram does not show that the Application was involved in any way, e.g. that the application was in control when the data was copied from the Person object into the Order object. This may seem a little odd at first, since as humans we tend to see the Person object as the sole owner of the String Chuck Norris. But inside the virtual machine, the String of course owns itself, since it is a normal Object and lives in the constant pool.
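The point that the String "owns itself" can be checked with a reference comparison. The following is a small sketch based on the example classes; the wrapper class SharedStringDemo and the getBuyer() getter are additions made only for this illustration.

```java
// Hypothetical demo: the same String instance is referenced by Person and Order.
final class SharedStringDemo {

    static final class Person {
        private final String name;
        Person(String name) { this.name = name; }
        String getName() { return name; }
    }

    static final class Order {
        private String buyer;
        void setBuyer(String buyer) { this.buyer = buyer; }
        String getBuyer() { return buyer; } // getter added for this demo only
    }

    public static void main(String[] args) {
        Person person = new Person("Chuck Norris");
        Order order = new Order();
        order.setBuyer(person.getName());

        // No copy is made: both fields point at the very same String object.
        System.out.println(person.getName() == order.getBuyer()); // prints "true"
        // String literals are interned, so the literal is that same object again.
        System.out.println(order.getBuyer() == "Chuck Norris");   // prints "true"
    }
}
```

Neither object holds a private copy of the data, which is why the graph shows the string as a vertex of its own.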

I'm not sure whether it would be better to show data as edges instead of vertices. But if data objects like the string are represented as standalone vertices, they can be referenced multiple times and their value can flow into other objects. The more often a value flows from one object to another, the more weight can be put into the edge's stroke. You can see this in the example above: the edge between "Chuck Norris" and the Person object is thicker than the one to the Order object because the value flows along it more often.

The Person accesses the String object for the first time in its constructor (1):

public class Person {
   private final String name;
   public Person(String name) {
      this.name = name; // (1)
   }
   public String getName() {
      return name; // (2)
   }
}


and then a second time when its getter method accesses the value (2) to return it to the application, which forwards it to the setter method of the Order. Since the setter method of the Order is called only once, it is the sole and last access (3) from the Order to "Chuck Norris" and is therefore drawn as only a thin line.

public class Order {
   private String buyer;
   public void setBuyer(String buyer) {
      this.buyer = buyer; // (3)
   }
}
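The edge weighting described above can be sketched as a simple counter per (data, target) pair. This is an illustration under assumptions rather than the tool's real data model: the class DataflowGraph, its methods and the linear weight-to-stroke mapping are all made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: counts how often a data vertex flows into a target vertex.
final class DataflowGraph {

    // data vertex -> (target vertex -> number of recorded accesses)
    private final Map<String, Map<String, Integer>> edges = new LinkedHashMap<>();

    /** Records one access of the given data by the given target object. */
    void recordFlow(String data, String target) {
        edges.computeIfAbsent(data, k -> new LinkedHashMap<>()).merge(target, 1, Integer::sum);
    }

    /** Returns the number of recorded accesses, or 0 if there is no such edge. */
    int weight(String data, String target) {
        return edges.getOrDefault(data, Map.of()).getOrDefault(target, 0);
    }

    /** Maps the weight to a stroke width; the linear scaling is an arbitrary choice. */
    float strokeWidth(String data, String target) {
        return (float) weight(data, target);
    }

    public static void main(String[] args) {
        DataflowGraph graph = new DataflowGraph();
        graph.recordFlow("Chuck Norris", "Person"); // (1) constructor
        graph.recordFlow("Chuck Norris", "Person"); // (2) getter
        graph.recordFlow("Chuck Norris", "Order");  // (3) setter

        System.out.println("Person: " + graph.weight("Chuck Norris", "Person")); // prints "Person: 2"
        System.out.println("Order: " + graph.weight("Chuck Norris", "Order"));   // prints "Order: 1"
    }
}
```

With these counts the Person edge gets twice the stroke width of the Order edge, matching the thick and thin lines in the diagram.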


The main problem with using JDI, besides the fact that it was not meant for this kind of processing, is that it is very unstable. The VM crashes regularly:

# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000006dd098a0, pid=10416, tid=5180
#
# JRE version: 6.0_16-b01
# Java VM: Java HotSpot(TM) 64-Bit Server VM (14.2-b01 mixed mode windows-amd64 )
# Problematic frame:
# V  [jvm.dll+0x4798a0]
#
# An error report file with more information is saved as:
# E:\workspaces\workspace\sandbox-debugging\hs_err_pid10416.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

Program exited:1


How sad. Perhaps I should check out AspectJ for my toy project instead of the debugger API.

Patrick
Woah, that looks awesome. I wonder if the graphs can still be read on real-world code (i.e. code that contains more than 2 classes).
Have you tried whether it works better with a more up-to-date HotSpot version? (Or a 32-bit client VM?)

About

My name is Mike Haller and I'm a software developer and architect at Bosch Software Innovations in Germany. I love programming, playing games and reading books. I like good food, making photos and learning and mentoring about the craftsmanship of commercial software development.