Visualizing the Twitter Social Network

Back in July, BreakingPoint added support for the Twitter API to our Application Simulator. To develop this AppSim protocol, I referenced the API documentation, wrote up a small client, and captured network packets. Looking at how API clients and the Twitter servers behave makes developing a realistic simulation of Twitter pretty simple. One question we commonly hear, especially concerning our Security and Application Simulator components, is, "How do you verify correctness?" I want to give one specific example of this by talking about a side project I worked on recently.

Once a month, I attend a local meeting of people interested in computer security. The format for the meetings allows anyone to give a talk, typically between five and fifteen minutes in length. I always wait until a few days before the meetings to start thinking about prospective topics. Since I'd been working on some Trac tickets related to Twitter, I thought I might do a short talk on mapping out the Twitter network. My original goal was to find all the cool security celebrities as a way to find all the cool links everyone is talking about.

Twitter is like most social networks in that it tracks 'friend' relationships between users. I thought it would be interesting to visualize some of the friend/follower relationships on the site. I started to write yet another Twitter client. After about 10 minutes, I saw that I was reimplementing the same code I'd already written for AppSim, so I decided I might as well leverage that code. This is one way to address the question of application correctness: if I take the code in the BreakingPoint product that implements the Twitter API, and it can really communicate with Twitter servers, then I would call our implementation correct. It's also a good way to make sure I don't have any bugs in our Twitter implementation.

I started implementing a directed graph in Ruby. About 15 seconds later I started looking for a library for doing directed graphs. I found RGL, the Ruby Graph Library. There's a ton of usefulness in that library, and I know I'm only scratching the surface of how I could use it. I hadn't even started thinking about visualization yet, but I found that RGL supports DOT files, the input format used by GraphViz. That further convinced me that RGL was going to be great to work with. I had about a day before the meeting, and didn't want to write any layout code. I just wanted something done quickly (and with the least amount of effort on my part).
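To give a sense of what "a directed graph that can be written out as a DOT file" involves, here is a minimal sketch in plain Ruby. It deliberately does not use RGL and does not reproduce RGL's API; the `TinyDigraph` class and its method names are hypothetical, made up for illustration only.

```ruby
require 'set'

# Hypothetical stand-in for a graph library: each node maps to the
# set of nodes it points at.
class TinyDigraph
  def initialize
    @adjacency = Hash.new { |hash, node| hash[node] = Set.new }
  end

  # Add a directed edge from -> to. Duplicate edges are ignored
  # because the targets are stored in a Set.
  def add_edge(from, to)
    @adjacency[from] << to
    @adjacency[to] # touch the target so it is registered as a node
    self
  end

  def nodes
    @adjacency.keys
  end

  def edges
    @adjacency.flat_map { |from, targets| targets.map { |to| [from, to] } }
  end

  # Render the graph in the DOT language that GraphViz reads.
  def to_dot(name = 'twitter')
    lines = ["digraph #{name} {"]
    edges.each { |from, to| lines << "  \"#{from}\" -> \"#{to}\";" }
    lines << '}'
    lines.join("\n")
  end
end
```

Writing the result of `to_dot` to a file and running something like `dot -Tpng graph.dot -o graph.png` hands the layout work to GraphViz, which is the point of going through DOT in the first place.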

I wrapped the AppSim code that implements the Friends/Followers Twitter API calls, and started putting the list of Twitter users into a queue. I then dequeue the first user from the front of the queue, get her friends and followers, add them to the queue, and repeat.
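The loop described above is essentially a breadth-first crawl. A rough sketch of the idea, not the actual AppSim code: `fetch_relations` is a hypothetical callable standing in for the wrapped Friends/Followers calls, and the `max_users` cap is an added assumption so the crawl stops somewhere.

```ruby
require 'set'

# fetch_relations is a hypothetical stand-in for the wrapped API calls.
# Given a user name it must return { friends: [...], followers: [...] }.
def crawl(start_user, fetch_relations, max_users: 50)
  queue     = [start_user]
  seen      = Set.new([start_user])
  edges     = []
  processed = 0

  until queue.empty? || processed >= max_users
    user = queue.shift # dequeue from the front of the queue
    processed += 1
    relations = fetch_relations.call(user)

    # An edge points from the follower to the account being followed.
    relations[:friends].each do |friend|
      edges << [user, friend]
      queue << friend if seen.add?(friend) # add? returns nil if already seen
    end
    relations[:followers].each do |follower|
      edges << [follower, user]
      queue << follower if seen.add?(follower)
    end
  end

  edges.uniq
end
```

The `seen` set keeps a user from being queued twice, which matters because the same account shows up as both a friend and a follower all over the network.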

This graph shows the social network from the perspective of the @BreakingPoint Twitter account. This graph adds new nodes in whatever order they are returned in the API calls.

Ok, so there are some users that show prominently, but it's all a jumble and hard to really extract anything useful.

My next method was to order insertion into the queue by the number of followers each friend had. This technique had an interesting effect: the user I start graphing at (again, @BreakingPoint) remains in the center of the graph. You'll notice that most of the edges are directed out of the @BreakingPoint user. This is because I process all the friends first, then all the followers. One attribute of this view is that people we follow who have a large following are prominent. It's a nice side effect of how GraphViz lays out the graph.
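One simple way to get that ordering is to sort each batch of users by follower count before it goes into the queue. The sketch below assumes each user is a hash with a `:followers_count` key; that shape, and the helper name `enqueue_by_popularity`, are illustrative assumptions rather than the real code.

```ruby
# Append users to the queue, most-followed first.
# Each user is assumed to look like { name: 'alice', followers_count: 1200 }.
def enqueue_by_popularity(queue, users)
  queue.concat(users.sort_by { |user| -user[:followers_count] })
end

queue = []
friends = [
  { name: 'small',  followers_count: 40 },
  { name: 'huge',   followers_count: 90_000 },
  { name: 'medium', followers_count: 750 }
]
enqueue_by_popularity(queue, friends)
```

Calling it once for the friends and then once for the followers keeps the "friends first, then followers" processing order while sorting within each group.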

The biggest problem with doing this is that you end up with many of the same users populating graphs, even when starting from different initial users. When you order by the number of followers, eventually someone in your network is following @CNN or @the_Onion, and graphs from different runs start looking very similar in terms of which users are prominent. Also, once you hit a user with a follower count in the tens of thousands, progress in mapping the network slows, because the API requires you to retrieve followers in batches of 100. If anyone in the network you're graphing follows, say, @BarackObama, you should go to your local Alamo Drafthouse and catch a movie while you wait.
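The slowdown is easy to put a number on. With followers coming back 100 at a time, the request count is the follower count divided by 100, rounded up. A small sketch (the helper name `requests_needed` is made up for illustration):

```ruby
# Batch size taken from the post: followers are returned 100 at a time.
PAGE_SIZE = 100

# Number of API requests needed to page through every follower,
# using integer ceiling division.
def requests_needed(follower_count, page_size = PAGE_SIZE)
  (follower_count + page_size - 1) / page_size
end

requests_needed(101)    # => 2
requests_needed(25_000) # => 250
```

A single account with 25,000 followers costs 250 round trips before the crawl can move on to the next user in the queue.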

It became obvious that filtering is key if you want to get any interesting results from the data.

Here is a graph generated by limiting the users included in the graph to those with between one hundred and one thousand followers. I have also modified the graph to show our biggest followers.
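A filter like that can be a one-liner applied before users are added to the graph. This is only a sketch: the `:followers_count` key and the helper name `filter_by_followers` are assumptions for illustration, and the bounds are treated as inclusive.

```ruby
# Keep only users whose follower count falls inside the given range.
# Each user is assumed to look like { name: 'alice', followers_count: 1200 }.
def filter_by_followers(users, range = 100..1000)
  users.select { |user| range.cover?(user[:followers_count]) }
end

candidates = [
  { name: 'quiet',   followers_count: 12 },
  { name: 'engaged', followers_count: 640 },
  { name: 'news',    followers_count: 85_000 }
]
kept = filter_by_followers(candidates)
```

Dropping the very large accounts also sidesteps the paging cost of fetching their followers, since they never get queued in the first place.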

An image like this gives you an idea of the most potentially influential followers you have. If you were a marketer like @KyleFlaherty, you could use this information to start trying to influence the biggest influencers that follow you. In the marketing world, word-of-mouth is a goldmine. This seems especially true in social networks where your friends are hand-selected.

This is just a first step into the topic of data visualization for me. I have a feeling I'm going to have to come up with something better than GraphViz for visualization. I've also had a request to turn this into a web application and make it available to a wider audience. I'd like to use the comments to gauge demand for this tool. If it looks like people would find this useful, I might just try to get it cleaned up and make a simple version available.

The number one thing people have said they'd want is interactivity. I think I'll go get working on that. I have tons of features I'd like to see, and I still don't have a talk for the upcoming meeting.

Update: the first thumbnail and linked image were corrected.

Posted by Todd Manning (2008/09/30 18:00:00 GMT+0)

GraphML

Posted by pedantic gnome at 2008-10-05 12:52
You should check out GraphML and yEd from yWorks.com. They handle large graphs quite well and also have a number of different layout algorithms.

interested

Posted by Laban Johnson at 2008-10-31 01:58
I'm interested in social network visualization but beyond twitter - thinking of applying this for understanding how the social structure of large cities and political groups work