The key difference, to me, is that integration tests reveal whether a feature is working or broken, since they stress the code in a scenario close to reality. They invoke one or more software methods or features and check that they behave as expected.
By contrast, a unit test that tests a single method relies on the (often wrong) assumption that the rest of the software is working correctly, because it explicitly mocks every dependency.
Hence, when a unit test for a method implementing some feature is green, it does not mean the feature is working.
Say you have a method somewhere like this:
public SomeResults DoSomething(SomeInput someInput) {
    var someResults = [Do your job with someInput];
    Log.TrackTheFactYouDidYourJob();
    return someResults;
}
DoSomething is very important to your customer: it's a feature, the only thing that matters. That's why you usually write a Cucumber specification asserting it: you wish to verify and communicate whether the feature is working or not.
Feature: To be able to do something
  In order to do something
  As someone
  I want the system to do this thing

Scenario: A sample one
  Given this situation
  When I do something
  Then what I get is what I was expecting for
No doubt: if the test passes, you can assert you are delivering a working feature. This is what you can call Business Value.
If you want to write a unit test for DoSomething, you should pretend (using some mocks) that the rest of the classes and methods are working (that is, that all the dependencies the method uses are working correctly) and assert that your method works.
In practice, you do something like:
public SomeResults DoSomething(SomeInput someInput) {
    var someResults = [Do your job with someInput];
    FakeAlwaysWorkingLog.TrackTheFactYouDidYourJob(); // Using a mock Log
    return someResults;
}
You can do this with Dependency Injection, a Factory Method, any mocking framework, or just by extending the class under test.
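As a minimal sketch of the Dependency Injection option (the ILog interface, SomethingDoer class and FakeAlwaysWorkingLog are hypothetical names, not part of the original code; SomeInput and SomeResults are just placeholder types):

public class SomeInput { }      // placeholder types standing in for the real ones
public class SomeResults { }

public interface ILog {
    void TrackTheFactYouDidYourJob();
}

public class SomethingDoer {
    private readonly ILog _log;

    // The Log dependency is injected, so a test can swap in a fake.
    public SomethingDoer(ILog log) { _log = log; }

    public SomeResults DoSomething(SomeInput someInput) {
        var someResults = new SomeResults(); // stands in for [Do your job with someInput]
        _log.TrackTheFactYouDidYourJob();
        return someResults;
    }
}

// A fake built to never break, used by unit tests only.
public class FakeAlwaysWorkingLog : ILog {
    public void TrackTheFactYouDidYourJob() { } // do nothing, never fail
}

In production you pass the real Log implementation to the constructor; in the unit test you pass FakeAlwaysWorkingLog.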
Suppose there's a bug in Log.TrackTheFactYouDidYourJob().
Fortunately, the Gherkin spec will find it and your end-to-end tests will fail.
The feature won't work, because Log is broken, not because [Do your job with someInput] is not doing its job. And, by the way, [Do your job with someInput] is that method's sole responsibility.
Also, suppose Log is used in 100 other features, in 100 other methods of 100 other classes.
Yep, 100 features will fail. But, fortunately, 100 end-to-end tests are failing as well and revealing the problem. And, yes: they are telling the truth.
It's very useful information: I know I have a broken product. It's also very confusing information: it tells me nothing about where the problem is. It tells me the symptom, not the root cause.
Yet, DoSomething's unit test is green, because it's using a fake Log, built to never break. And, yes: it's clearly lying. It's communicating that a broken feature is working. How can that be useful?
(If DoSomething()'s unit test fails, be sure: [Do your job with someInput] has some bugs.)
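To make this concrete, here is a minimal sketch of such a unit test, using xUnit and the hypothetical SomethingDoer / FakeAlwaysWorkingLog from the sketch above; it stays green no matter how broken the real Log is, because the fake replaces it:

using Xunit;

public class SomethingDoerTests {
    [Fact]
    public void DoSomething_ValidInput_ReturnsSomeResults() {
        var sut = new SomethingDoer(new FakeAlwaysWorkingLog()); // the real Log never enters this test

        var results = sut.DoSomething(new SomeInput());

        Assert.NotNull(results); // asserts only on what [Do your job with someInput] produces
    }
}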
Suppose this is a system with a broken class:

A single bug will break several features, and several integration tests will fail.

On the other hand, the same bug will break just one unit test.

Now, compare the two scenarios:
- All your features using the broken Log are red
- All your unit tests are green; only the unit test for Log is red
Actually, unit tests for all modules using a broken feature are green because, by using mocks, they have removed their dependencies. In other words, they run in an ideal, completely fictional world. And this is the only way to isolate bugs and hunt them down. Unit testing means mocking. If you aren't mocking, you aren't unit testing.
The difference
Integration tests tell you what's not working, but they are of no use in guessing where the problem might be.
Unit tests are the only tests that tell you exactly where the bug is. To extract this information, they must run the method in a mocked environment, where all other dependencies are assumed to work correctly.
That's why I think your sentence "Or is it just a unit test that spans 2 classes" is somewhat misplaced. A unit test should never span two classes.
This reply is basically a summary of what I wrote here: Unit tests lie, that's why I love them.
Update (Jul 2021)
It's been quite a while since my original answer (almost 12 years), and best practices have changed a lot in that time. So I feel inclined to update my answer and offer readers different naming strategies.
Many comments and answers point out that the naming strategy I proposed in my original answer is not resistant to refactoring and ends up producing names that are difficult to understand, and I fully agree.
In recent years, I have ended up using a more human-readable naming scheme, where the test name describes what we want to test, along the lines described by Vladimir Khorikov.
Some examples would be:
Add_credit_updates_customer_balance
Purchase_without_funds_is_not_possible
Add_affiliate_discount
As you can see, it's quite a flexible scheme, but the most important thing is that by reading the name you know what the test is about, without including technical details that may change over time.
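For instance, a couple of those names as actual tests might look like this (a minimal sketch using xUnit; the Customer class and its API are invented here, just enough to make the names concrete):

using Xunit;

// Hypothetical domain class, only here to give the test names something to exercise.
public class Customer {
    public decimal Balance { get; private set; }
    public void AddCredit(decimal amount) => Balance += amount;
    public bool Purchase(decimal price) {
        if (price > Balance) return false; // a purchase without funds is not possible
        Balance -= price;
        return true;
    }
}

public class CustomerTests {
    [Fact]
    public void Add_credit_updates_customer_balance() {
        var customer = new Customer();
        customer.AddCredit(10m);
        Assert.Equal(10m, customer.Balance);
    }

    [Fact]
    public void Purchase_without_funds_is_not_possible() {
        var customer = new Customer();
        Assert.False(customer.Purchase(5m));
    }
}

The names read like plain sentences about the behavior, so they survive refactorings of the underlying methods.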
To name the projects and test classes, I still adhere to the scheme from my original answer.
Original answer (Oct 2009)
I like Roy Osherove's naming strategy. It's the following:
[UnitOfWork_StateUnderTest_ExpectedBehavior]
It puts every piece of information you need into the method name, in a structured manner.
The unit of work can be as small as a single method, or as large as a class or even multiple classes. It should represent everything that is to be tested in this test case and is under your control.
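For example, applied to a hypothetical bank-account class (all names are invented for illustration; tests written with xUnit), the scheme yields names like these:

using System;
using Xunit;

// Minimal hypothetical class, just to give the test names something to exercise.
public class BankAccount {
    public decimal Balance { get; private set; }
    public BankAccount(decimal balance) { Balance = balance; }
    public void Withdraw(decimal amount) {
        if (amount > Balance) throw new InvalidOperationException("Insufficient funds");
        Balance -= amount;
    }
}

// Method names follow [UnitOfWork_StateUnderTest_ExpectedBehavior].
public class BankAccountTests {
    [Fact]
    public void Withdraw_AmountLargerThanBalance_ThrowsInvalidOperationException() {
        var account = new BankAccount(50m);
        Assert.Throws<InvalidOperationException>(() => account.Withdraw(100m));
    }

    [Fact]
    public void Withdraw_ValidAmount_ReducesBalance() {
        var account = new BankAccount(50m);
        account.Withdraw(20m);
        Assert.Equal(30m, account.Balance);
    }
}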
For assemblies, I use the typical .Tests ending, which I think is quite widespread, and the same for classes (ending with Tests):
[NameOfTheClassUnderTestTests]
Previously, I used Fixture as the suffix instead of Tests, but I think the latter is more common, so I changed the naming strategy.
Best Solution
Unit test: Specify and test one point of the contract of a single method of a class. This should have a very narrow and well-defined scope. Complex dependencies and interactions with the outside world are stubbed or mocked.
Integration test: Test the correct inter-operation of multiple subsystems. There is whole spectrum there, from testing integration between two classes, to testing integration with the production environment.
Smoke test (aka sanity check): A simple integration test where we just check that when the system under test is invoked it returns normally and does not blow up.
Regression test: A test that was written when a bug was fixed. It ensures that this specific bug will not occur again (see the sketch after these definitions). The full name is "non-regression test". It can also be a test made prior to changing an application, to make sure the application provides the same outcome afterwards.
To this, I will add:
Acceptance test: Test that a feature or use case is correctly implemented. It is similar to an integration test, but with a focus on the use case to be provided rather than on the components involved.
System test: Tests a system as a black box. Dependencies on other systems are often mocked or stubbed during the test (otherwise it would be more of an integration test).
Pre-flight check: Tests that are repeated in a production-like environment, to alleviate the 'builds on my machine' syndrome. Often this is realized by doing an acceptance or smoke test in a production-like environment.
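As a purely hypothetical illustration of a regression test (the Cart class and the bug are invented for this example): suppose Total() used to be computed with Aggregate, which throws on an empty sequence, so an empty cart crashed the checkout. After the fix, a test pins the corrected behavior down so that this specific bug cannot silently come back:

using System.Linq;
using Xunit;

// Hypothetical code under test: the fixed version uses Sum(), which returns 0 for an empty cart.
public class Cart {
    private readonly decimal[] _prices;
    public Cart(params decimal[] prices) { _prices = prices; }
    public decimal Total() => _prices.Sum();
}

public class CartRegressionTests {
    // Regression test: guards against the old empty-cart crash coming back.
    [Fact]
    public void Total_of_empty_cart_is_zero() {
        Assert.Equal(0m, new Cart().Total());
    }
}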