Finding duplicate code in C#, VB.Net, ASPX, Ruby, Python, Java, C, C++, ActionScript, or XAML

Whether you realize it or not, you need a tool that finds duplicate source code in your applications. In fact, if you’ve never used one before, you probably don’t realize how much you need an automated solution to this problem. It’s nearly impossible to manually locate the types of duplicate code that such a tool can easily bring to the surface. Even if you think you’re intimately aware of an application’s code base, every line of code you write contains the potential to awaken the duplicate code dragon.

To combat the problem, we have Atomiq – what I consider to be the best solution for finding duplicate/similar code in C#, VB.Net, ASPX, Ruby, Python, Java, C, C++, ActionScript, and XAML.

UPDATE 11/9/2010 10:46:26 AM by AD: The promotional code that used to appear on this post has been removed.

How can I make such bold claims? Well, for one, because I know I always write awesome code, and yet it’s astonishing how frequently Atomiq says, “No, Alex, you do not always write awesome code.” :)

Oh, and it happens to be the only code similarity finder I could find that was easy to use for my purposes. You might say nothing else duplicates the Atomiq experience! *SNORT*

Atomiq doesn’t really need this much of an introduction. If you haven’t done so already, it’s an easy, safe, and incredibly eye-opening experience to run Atomiq against any of your projects. Here’s how to get some immediate gratification:

  1. Download Atomiq from http://getatomiq.com. (There’s no real need to spend much time on their website if you just want some immediate gratification.)
  2. Get some immediate gratification.

That’s pretty much it, as long as we ignore the part where Atomiq points and laughs at all the duplicate code it found! :)

I was able to quickly gain an extensive understanding of where my duplicate code existed with minimal knowledge of how to use Atomiq. After I finished wiping up the tears, I was able to begin the gratifying process of fixing things.

A personal example

If you’ve read my posts about using delegates to eliminate duplicate code and using IDisposable responsibly, then this example is going to look familiar to you. To be honest, it was Atomiq that led me to the delegate-based design in those posts!

Below you will find two methods that do two different things with an OdbcDataReader. Don’t spend a lot of time trying to figure out what they do.

The first bit of code returns a list of first names from a database.

[Image: the GetFirstNames method]

The second bit of code sends an email notification to a list of email addresses in the same database.

[Image: the NotifyPeople method]

It’s pretty typical code you might find in any application that connects to a database. You can tell the author had good intentions, but it’s not hard to think of a few simple things we could do to make the code better.

But it’s probably not a good idea to just dive in and start refactoring!!!

If you looked at the code above, you might have noticed that the two methods follow very similar patterns. Though they do two different things, they’re not that different from each other. In other words, there’s a lot of duplicate code. There might be a hundred instances of that pattern in your application, and it would be very difficult to find them all manually.
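It’s worth seeing why a tool can do this so much faster than a person. Below is a minimal sketch, in Python, of the simplest kind of duplicate finder: it strips the whitespace from each line, uses every run of `window` consecutive non-blank lines as a dictionary key, and reports any run that shows up in more than one place. This illustrates the general idea only; it is not a description of how Atomiq works internally, and the function name `find_duplicate_blocks`, the default window size, and the sample file contents are all made up for the example.

```python
from collections import defaultdict


def find_duplicate_blocks(files, window=4):
    """Return the locations of every run of `window` lines that appears more than once.

    `files` maps a file name to its source text. Lines are stripped of leading and
    trailing whitespace and blank lines are skipped, so re-indented copies still match.
    Each result is a list of (file name, first line number) pairs.
    """
    seen = defaultdict(list)  # normalized run of lines -> where each copy starts
    for name, text in files.items():
        # Remember the original 1-based line numbers, then drop blank lines.
        numbered = [(n, line.strip()) for n, line in enumerate(text.splitlines(), start=1)]
        numbered = [(n, s) for n, s in numbered if s]
        for start in range(len(numbered) - window + 1):
            run = numbered[start:start + window]
            key = tuple(s for _, s in run)  # tuples are hashable, so they work as dict keys
            seen[key].append((name, run[0][0]))
    return [locations for locations in seen.values() if len(locations) > 1]


files = {
    "a.cs": "open();\nread();\nclose();\nlog();\n",
    "b.cs": "setup();\n    open();\n    read();\n    close();\n",
}
print(find_duplicate_blocks(files, window=3))  # [[('a.cs', 1), ('b.cs', 2)]]
```

Because overlapping windows are reported separately, a long copied block shows up as several overlapping matches. A real tool would merge those into one larger region, and would typically compare tokens rather than whole lines.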

Before you attempt to refactor, you might want to use a tool like Atomiq to find all of the duplicate code patterns in your code. Knowing where all of those patterns are will help you make better refactoring decisions.

For example, if we look at the NotifyPeople method from above in Atomiq, we can see from the red lines that there are two other places that have the same pattern as lines 103-115 and 120-131.

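[Image: Atomiq’s view of the NotifyPeople method, with red lines pointing to two other places that share the pattern in lines 103-115 and 120-131]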

Closer inspection shows that, indeed, one of the places in our code that duplicates that pattern is the GetFirstNames method from above. Again, my two previous blog posts, using delegates to eliminate duplicate code and using IDisposable responsibly, go into detail explaining how I chose to solve this particular problem in one of my projects.
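The code from those posts isn’t reproduced here, so the following is only a sketch of the general idea, in Python, with the standard library’s sqlite3 module standing in for ODBC. A function parameter plays the role of a C# delegate, and `contextlib.closing` plays the role of `using` with IDisposable. The `people` table, its columns, and the names `for_each_row` and `send_email` are hypothetical.

```python
import sqlite3
from contextlib import closing


def for_each_row(conn, sql, handle_row, params=()):
    """Run `sql` and pass every result row to `handle_row`.

    This helper owns the boilerplate both methods used to repeat: creating the
    cursor, executing the query, looping over the rows, and closing the cursor
    even if `handle_row` raises (the IDisposable part).
    """
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql, params)
        for row in cursor:
            handle_row(row)


def get_first_names(conn):
    names = []
    for_each_row(conn, "SELECT first_name FROM people", lambda row: names.append(row[0]))
    return names


def notify_people(conn, send_email):
    # `send_email(address, body)` is a hypothetical stand-in for real mail code.
    for_each_row(
        conn,
        "SELECT email FROM people",
        lambda row: send_email(row[0], "You have been notified."),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "ann@example.com"), ("Bob", "bob@example.com")],
)
print(get_first_names(conn))
notify_people(conn, lambda to, body: print("Sending to", to))
```

The cursor handling and the row loop now live in one place, so each caller supplies only the part that’s actually different: what to do with a row.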

Digging deeper

I’m not going to go into great detail on how to use Atomiq, as its smart developers have already graciously done that on getatomiq.com. It’s not massively complicated software, so getting to know everything it can do is a quick exercise. I do suggest watching the introductory videos on the homepage if you’re sitting on the toilet with nothing else to do.

The entire Atomiq user interface isn’t exactly what I would call “typical”, so spending some time on their website will help you get the most out of the tool. Because of that unusual interface, Atomiq also has a few features that you might not discover without the website’s help.

Constructive criticism

This wouldn’t be a proper review if I didn’t throw my opinion in the air and wave it like I just don’t care, now would it?

Here’s my bullet list of “other” notes I took during my review of Atomiq.

  • I’m not sure why they don’t include the option to download and run a standard installer for Atomiq, but I don’t necessarily find this to be a problem. I chose to xcopy the Atomiq exe out to c:\Utilities\Atomiq.
  • If you don’t configure the analyzer’s settings appropriately, the Atomiq user interface might not show anything useful. When this happens, it’s easy to assume you didn’t do something right.
    [Image: Atomiq results with poorly configured analyzer settings]
  • Minor detail, but I like to look at change logs. At the time of this writing, there isn’t one on the website or included with the Atomiq application.
  • The entire user interface isn’t exactly what I would call “standard”. That doesn’t mean it’s a bad or unusable interface, but I’m typically the kind of guy who likes things to look and work the way they do by default. I mean, I never even changed my MySpace theme from the default skin, for crying out loud! :) It just catches me off guard when I can’t press ALT+F, N to start a new project, for example.
  • When you create a new project in Atomiq, the first thing you must do is pick the directory that contains your source code. The Pick button shows a pretty standard directory picker, but I really wish I could get to the directory I want more quickly by pasting its path from my clipboard. I’m quite ninja-like when it comes to wrangling a computer, so it’s sometimes easier for me to copy the path from another application than to locate the directory with this user interface. But now that I think about it, I don’t think wrangling is something ninjas do.
    [Image: Atomiq’s directory picker]
  • It would be great if Atomiq had an MRU (most recently used) list to help me quickly reopen the files I use often.
  • It doesn’t appear that Atomiq is able to find duplicate code patterns that vary only by magic numbers, variable/class names, or syntactic variations. Using a tool like Atomiq most effectively sometimes requires gently massaging your code beforehand. For example, the screenshot below shows that Atomiq doesn’t find any similarities between NotifyPeople above and NotifyPeople2. Yet they are, for all practical purposes, identical.
    [Image: Atomiq finding no similarities between NotifyPeople and NotifyPeople2]

    If everyone on your team uses roughly the same coding and naming guidelines, you might not have to worry about that problem much.

    Here’s another example that I think Atomiq might be able to shed some light on one day. There’s clearly a very important similarity between lines 166 and 177 below.
    [Image: code in which lines 166 and 177 are similar]
    Maybe the developers of Atomiq could provide some kind of clue that there’s an opportunity to perform the following “extract method” refactoring:
    [Image: the same code after the suggested “extract method” refactoring]
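Tools that do catch renamed copies generally compare a normalized token stream rather than raw text: every identifier becomes one placeholder, every literal another, and whitespace and comments disappear, so two fragments that differ only in names or magic numbers produce identical streams. Here’s a minimal sketch of that idea using Python’s own `tokenize` module on Python source; handling NotifyPeople and NotifyPeople2 the same way would require a C# tokenizer instead. It is not a description of Atomiq’s internals, and the helper names are made up for illustration.

```python
import io
import keyword
import tokenize


def normalized_tokens(source):
    """Tokenize Python source and replace every identifier and literal with a placeholder.

    Keywords, operators, punctuation, and indentation changes are kept, so the
    result still reflects the code's structure; comments and line breaks are dropped.
    """
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            result.append("ID")  # variable, function, and class names
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            result.append("LIT")  # magic numbers and string literals
        elif tok.type in (tokenize.NAME, tokenize.OP):
            result.append(tok.string)  # keywords, operators, and punctuation stay as-is
        elif tok.type in (tokenize.INDENT, tokenize.DEDENT):
            # Record block structure so nesting differences still count.
            result.append("INDENT" if tok.type == tokenize.INDENT else "DEDENT")
    return result


def is_renamed_clone(a, b):
    """True if two snippets differ only in names, literals, comments, or spacing."""
    return normalized_tokens(a) == normalized_tokens(b)


original = '''
for address in addresses:
    send_mail(address, "Meeting at 10")
'''
renamed = '''
for recipient in recipients:  # same logic, different names
    notify(recipient, "Lunch at 12")
'''
print(is_renamed_clone(original, renamed))  # True
```

The trade-off is that unrelated code with the same shape can also match, so normalizing this aggressively finds more clones at the cost of more false positives.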

Still a winner

Prior to finding Atomiq, I’d never used a code similarity finder. These days, it’s something I use often and can’t imagine living without! It’s such a simple, useful tool that I don’t know why anyone wouldn’t want to use it on their own projects.

Most of the time, you’ll likely use Atomiq as a detective tool. It usually finds little pieces of a pattern that, upon closer inspection, turn out to be part of much bigger, more important patterns. So even though Atomiq can’t perform miracles, the tremendous satisfaction that comes with deleting tons of duplicate code from your application is worth many times its $30 price tag.

  • Dale

    An alternative is PMD. PMD is distributed with CPD, the Copy Paste Detector, which locates duplicate code. PMD is a free, open source project. Out of the box, CPD supports Java, JSP, C, C++, Fortran and PHP code; however, other languages are easy to add. In fact, the PMD plugin for jEdit supports all of the 200+ languages known to jEdit.

  • http://alexdresko.myopenid.com/ alexdresko

    I’ll give PMD a try if I can ever find time to figure out how to find and install the plugins I would need to make PMD useful for me. I played a little with the Java Web Start version of PMD, but it doesn’t support C#. My, oh my, what a difference it would have made if PMD supported C# out of the box like that.

  • http://www.semanticdesigns.com/Clone ira Baxter

    Another alternative is the Semantic Designs CloneDR. It uses compiler-quality parsing to extract code structure, which means it can find duplicate code in spite of whitespace changes, differences in comments or their content, name changes, inserted or deleted statements, or even inserted or deleted blocks of statements.

    You can see a variety of clone detection reports at the website.

  • john

    “ALEX” coupon does not work. You have any discounts running now?

  • http://alexdresko.myopenid.com/ alexdresko

    I’m looking into it, John. Will let you know what I find.

  • http://alexdresko.myopenid.com/ alexdresko

    The Atomiq developers have just informed me that the promotional code has expired, so I’ve removed it from the post.

  • Asha

    hello there

    Can you give links to the most commonly used clone detection tools/software? I am preparing a research paper, and these tools will help me with my statistical analysis. I have already searched, but I found no appropriate results.