Saturday, August 22, 2009

A few notes on TestNG data providers

TestNG has a notion of a Data Provider, which is a method annotated with the @DataProvider annotation and returning a result of type Object[][]. Such methods can be used to supply sets of arguments to test methods (you can see the official explanation here). They have proved very useful for me, as I'm doing TDD a lot and often :). I want to mention just two of the possible usages, which have become an integral part of my testing style.

The first one is a godsend if you like to do some basic (or not so basic) argument checks at the start of a method (and also want to test that the checks really work). As an example, suppose we have methodA on some class under test which takes three String arguments. Before I started to use Data Providers I used to have lots of tests like the following:

@Test(expectedExceptions = IllegalArgumentException.class)
public void methodAFailNullArg1() {
    testInstance.methodA(null, "arg2", "arg3");
}

@Test(expectedExceptions = IllegalArgumentException.class)
public void methodAFailNullArg2() {
    testInstance.methodA("arg1", null, "arg3");
}

@Test(expectedExceptions = IllegalArgumentException.class)
public void methodAFailNullArg3() {
    testInstance.methodA("arg1", "arg2", null);
}
This is very annoying, but if you're doing TDD you have no choice but to write this kind of test along with the others (at least I think I don't have a choice :) )

Now look how this would look with Data Providers:

@DataProvider
private Object[][] methodABadArgsProvider() {
    return new Object[][] {
        {null, "arg2", "arg3"},
        {"arg1", null, "arg3"},
        {"arg1", "arg2", null}
    };
}

@Test(expectedExceptions = IllegalArgumentException.class, dataProvider = "methodABadArgsProvider")
public void methodABadArgs(String arg1, String arg2, String arg3) {
    testInstance.methodA(arg1, arg2, arg3);
}
I'm using the latter approach starting from two argument sets. The main advantage is that the concern of testing bad arguments for methodA is addressed in one test instead of three, so you always have less semantic clutter. And starting from 3-4 argument sets you also have less syntactic clutter, which is even better!

The second use case is using Data Providers to supply test instances. This can be very useful when you're going to test an interface contract or some common abstract superclass functionality across several different implementations.
For example, you can have a SetTest class intended to test functionality common to all sets (for example, that they cannot contain the same element more than once). It could look like this:

public class SetTest {

    @DataProvider
    private Object[][] instanceProvider() {
        return new Object[][] {
            {new HashSet<String>()}, {new TreeSet<String>()}
        };
    }

    @Test(dataProvider = "instanceProvider")
    public void onlyOneSameElement(Set<String> testInstance) {
        testInstance.add("string");
        testInstance.add("string");
        assertEquals(testInstance.size(), 1);
    }
}
Surely the uses of Data Providers aren't limited to the cases mentioned.
Also, I currently don't know whether JUnit has a similar feature (the last version I worked with, about a year ago, was 4.3, and I can't remember anything like this in it)

Tuesday, August 18, 2009

Hunting bugs

The last several weeks were a bit nervous. Our customers reported several similar bugs and we had absolutely no idea where they came from. Fortunately, the cause of the bugs now seems to have been found (though not yet fixed), thanks to extensive logging (the thing I talked about in the first blog post).

I will use some domain-specific terms here, like the Driver entity & the getStatus() operation, but I think it should not make understanding harder (if you have a different opinion, write a comment and I'll try to explain anything that is unclear)

The bug pattern was (as we discovered it):
  • A driver performed the closeOrder() operation, which should have changed his status plus the status of some order: the driver should have been set to FREE status, the order to ARCHIVED status.
  • Concurrently, the driver's mobile client executed the background request getStatus() (which is executed automatically once every half a minute). This should have set only the LAST_POLLING flag on the Driver entity.
This resulted in a very strange combination: the driver remained BUSY while the order was successfully set to ARCHIVED status, which means the system was put into an inconsistent state. At first we thought the getStatus() command could not have any relation to the fact that the driver somehow remained in (or was put back into) BUSY status, because it only changes one unrelated attribute, LAST_POLLING. But we were wrong, because of two things:

The first one is the more general one: the transaction isolation level (READ COMMITTED) was insufficient for the transactions performed by the operations mentioned. This made a race condition between two threads executing different transactions possible. As a result, the thread executing getStatus() was able to:
  • see the driver in BUSY status (because it managed to read the driver's state before closeOrder() committed)
  • write the driver's status (because it managed to commit after(!) closeOrder() committed)
But now one can ask: how did an operation which should only have written LAST_POLLING manage to overwrite the driver's status? This is because of the second thing, which is related to the default Hibernate settings:
By default, for any given entity, Hibernate executes updates involving not only the fields changed during the Hibernate session, but all entity fields! This allows it to prepare update statements at application startup, but it also makes the application more fragile against race conditions. So I want to warn everyone about this property of Hibernate's default strategy. It can be changed by enabling dynamic updates for the entity (dynamic-update="true" in XML descriptors, or dynamicUpdate = true on Hibernate's extension @Entity annotation).
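To make this failure mode concrete, here is a minimal plain-Java simulation of it (no Hibernate involved; the map stands in for the database row, and all names and timestamps are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class LostUpdateDemo {
    // the "database" row for one Driver entity
    static Map<String, String> dbRow = new HashMap<>();

    static String simulate() {
        dbRow.put("status", "BUSY");
        dbRow.put("lastPolling", "10:00:00");

        // getStatus() thread reads the row while closeOrder() is still running
        Map<String, String> staleSnapshot = new HashMap<>(dbRow);

        // closeOrder() commits: the driver is now FREE
        dbRow.put("status", "FREE");

        // getStatus() commits after closeOrder(). With Hibernate's default
        // strategy the UPDATE lists *all* columns, so the stale snapshot is
        // written back wholesale, not just the one field that changed:
        staleSnapshot.put("lastPolling", "10:00:30");
        dbRow.putAll(staleSnapshot); // full-row update

        return dbRow.get("status");
    }

    public static void main(String[] args) {
        System.out.println("final status: " + simulate()); // BUSY - FREE was lost
    }
}
```

With dynamic updates the second commit would have written only the lastPolling column, and the driver would have stayed FREE.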

Setting dynamic-update to true only lowers the chance of a race condition between different threads but does not eliminate it. So now we are working hard on making Hibernate optimistic locking possible for the Driver entity. The main difficulty here is making some common operations able to recover from OptimisticLockException (where recovery means being able to try the same sequence of actions again).
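The recovery loop we are aiming for can be sketched in plain Java roughly like this (a hedged sketch: OptimisticLockException is stubbed out here, and the real version would open a fresh transaction and re-read entity state on each attempt):

```java
public class RetryDemo {
    // stand-in for the persistence provider's OptimisticLockException
    static class OptimisticLockException extends RuntimeException {}

    interface Operation {
        void run();
    }

    // Retries the whole sequence of actions when a version check fails.
    static void runWithRetry(Operation op, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                op.run(); // must re-read fresh entity state on every attempt
                return;
            } catch (OptimisticLockException e) {
                if (attempt == maxAttempts) {
                    throw e; // give up, let the caller see the conflict
                }
                // otherwise loop and try the same actions again
            }
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // fails twice (simulating concurrent updates), succeeds on the 3rd try
        runWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) throw new OptimisticLockException();
        }, 5);
        System.out.println("succeeded after " + calls[0] + " attempts");
    }
}
```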

Sunday, August 16, 2009

Easy VCS

I discovered for myself (as I think many others did in recent years) the benefits of using distributed version control systems for home-grown projects. Mercurial (which I'm using now) is one that is really fast to grasp and to use. The initial setup is extremely simple (I'll describe the Windows one):
  1. Download it
  2. Run hg init inside the project folder you're going to version control
  3. [Optional step] Put a .hgignore file into your project folder (see this page)
  4. Run hg add to mark all project files to be managed by Mercurial
  5. Run hg commit to commit your first revision
And that is it! Sure, you will have to tune it a bit afterward (e.g. set the HGEDITOR env variable to use your favorite editor for commit messages) and learn some of its commands, but if you're going to use it for personal projects their number will be really, really small: something like hg log, hg update, hg remove, hg rename (as I'm talking about personal projects, I suppose most of you will not want to branch your project... but if you do, you will have to learn more commands :P)

Thursday, August 13, 2009

Cool stuff

A pair of interesting things for today that I discovered for myself.
The first one seems to be an awesome project from Mozilla Labs called Bespin, a browser-based IDE focused on collaborative development which, even at this very early stage of development, can already boast several nice ideas implemented. You can watch the video & play around by following this link:
https://bespin.mozilla.com/
The second one is the Theory of Patches, on which the quite original distributed VCS named darcs is based.

Tuesday, August 11, 2009

Order by in Hibernate named queries

Another annoying thing I discovered today with Hibernate concerns putting a dynamic Order By clause into named queries. You can't order by a column name passed as a parameter: if you put something like ' order by :columnName' into your query and call setParameter("columnName", "id"), it won't work, because the id value will be put into the prepared statement quoted. Even worse, the query 'select * from Order o order by :column :direction' will result in an error - Hibernate won't even be able to parse it.
On one of the forums I found a 'nice' workaround suggesting putting column numbers instead of column names; this seems like a very inconvenient solution, and anyway it won't solve the problem with the sort direction.
If anyone reading this blog (is anyone here?! :)) ) knows a better alternative, I'll be glad to hear it. For now I'm marking named queries as a 'useless' feature in our project.
Update: as suggested in the comments, there is a solution (quite a simple one, I should admit), which is to wrap the query string retrieved from the named query into another query call, like this:

Query q = s.createQuery(s.getNamedQuery("query.without.order.by")
        .getQueryString() + " order by " + sortByColumnName);
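One caveat worth adding: since the column name is now concatenated straight into the query string, it should be checked against a whitelist first, or you open yourself up to SQL injection. A hypothetical helper (the allowed column names are made up for illustration):

```java
import java.util.Set;

public class SortColumns {
    // whitelist of columns clients may sort by (illustrative names)
    private static final Set<String> ALLOWED = Set.of("id", "name", "creationDate");

    static String checkedOrderBy(String column, boolean ascending) {
        if (!ALLOWED.contains(column)) {
            throw new IllegalArgumentException("Unknown sort column: " + column);
        }
        return " order by " + column + (ascending ? " asc" : " desc");
    }

    public static void main(String[] args) {
        System.out.println(checkedOrderBy("id", true));
    }
}
```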

Thursday, August 6, 2009

Hibernate query filter vs load/get

The ETaxi project uses Hibernate as its JPA provider (which is nothing surprising :) ). We are using Hibernate Query Filters, one nuance of which I'm going to discuss here; so first, some short information about them.

A Query Filter consists of a filter definition declaring the filter name and parameters, filter metainformation on an entity defining what SQL string will be appended to the where clauses of queries for that entity, and code that enables the filter by invoking:

Filter filter = session.enableFilter("filterName");
filter.setParameter("key1", value1);
...
filter.setParameter("keyN", valueN);
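For readers who haven't seen the definition side, a filter declaration on an entity looks roughly like this with Hibernate's mapping annotations (a sketch only - the entity, column and parameter names are made up for illustration, not taken from our project):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Filter;
import org.hibernate.annotations.FilterDef;
import org.hibernate.annotations.ParamDef;

@Entity
@FilterDef(name = "customerFilter",
           parameters = @ParamDef(name = "customerId", type = "long"))
@Filter(name = "customerFilter", condition = "customer_id = :customerId")
public class Driver {
    @Id
    private Long id;
    // ...
}
```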


The purpose of using them is to make our life a bit easier. Our application has multiple working areas, each of which should be accessible only by the corresponding customer. So customer-specific entities are filtered by default, and we don't need to put the customer id explicitly into the list of our DAO method parameters. For example, a method which would otherwise look like:

public interface DriverDao {
    public List<Driver> findAll(Customer customer);
}

looks instead like:

public interface DriverDao {
    public List<Driver> findAll();
}

Thus, when filters are off it is possible to retrieve all drivers; when they are on, only those belonging to the current customer. The benefits we get from using filters are shorter queries, fewer parameters on DAO methods, and it's harder to forget to include filtering by customer in some query (and have one customer see another customer's drivers/users/orders etc., which means very angry customers).

But there is at least one annoying 'but': query filters don't apply to the results of the Session.get() and Session.load() methods (which try to locate objects in the cache before querying the database). So when you want filters to work you need to use a query finding entities by id instead of the get/load methods - and now your session cache no longer works for a simple find-by-id!

Today, after analyzing one of the typical requests to our app, I discovered that the current user is queried five times during it! Sure, this can partially be helped by redesigning user lookups inside the application, but that way we lose one of the benefits of Hibernate (because we are trying to use the other one consistently).

Until recently I didn't even understand why load()/get() cannot make use of filters, but now it is clear: suppose you turned a filter off and loaded some entity which would normally be filtered out, then turned the filter on and tried to load() that entity by id. How could a simple cache lookup know whether a particular entity should be filtered or not? (Considering that an applied filter is only a piece of SQL appended to queries, possibly containing parameters that are also filled in when the JDBC statement is prepared.)
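So a find-by-id that must respect the filters ends up as a one-row query instead of get()/load() - roughly like this (a simplified fragment; the session handling and the Driver entity are our own, but the variable names here are illustrative):

```java
// goes through the query path, so enabled filters are applied
Driver driver = (Driver) session
        .createQuery("from Driver d where d.id = :id")
        .setParameter("id", driverId)
        .uniqueResult(); // null if filtered out or absent
```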

So query filters turned out to be one of those Hibernate features that have to be paid for, and for which it is hard to predict how you will have to pay (if you don't have experience using them). I hope this post will help someone make a more conscious decision about whether or not to use them.

Tuesday, August 4, 2009

Logging and AspectJ

Posting the first message in this blog. As I'm writing here as a programmer targeting other programmers, let the first topic be about logging and aspect-oriented programming.

In the majority of available sources, logging appears to be the favorite example of applying AOP. So some time ago we decided to improve logging in our ETaxi project (which I will probably describe in some later posts). We already had some simple aspects logging user access to certain high-level methods directly corresponding to URLs. Over time this turned out to be insufficient to identify the sources of some strange problems that arose in production environments.
So now we have 3 different choices to further apply logging to our project:
  1. Try to generalize all logging and perform it using aspects
  2. Use logging embedded into main program logic
  3. Somehow merge 1 and 2
Today I tried to analyze (and predict the consequences of) using the different approaches for us.
As I'm pretty new to AOP (I've been using it for only half a year), the following analysis could surely be full of mistakes, so I'd appreciate comments.

Pros of 1:
  • Logging is decoupled from the main logic so main logic flow is much easier to read
  • Logging is concentrated and not scattered throughout the project which means it is easier to employ some common policies
Cons of 1:

The cons are based on the fact that the pros are valid only when you can identify, in some general form, all the points of interest where logging should be applied. As an example, consider the following Java code snippet:
private Order order;
private Driver driver;
private OrderService service;

public void run() {
    if (isDriverBusy(driver)) {
        // the 1st log I want to move to aspects
        log.info(order + " skipped. Driver " + driver + " is busy");
        return;
    }

    // ..some other processing, probably with other method exit points

    service.scheduleFurtherProcessing(order);
    // the 2nd log I want to move to aspects
    log.info("Further processing scheduled for " + order);
}
In this code example I want to log two things:
  • If isDriverBusy(driver) returns true, I want to log that processing of the order is finished because the driver is busy.
  • After service.scheduleFurtherProcessing(order), I want to log that the current processing is finished and the next iteration is scheduled.
The first thing is difficult to handle in some generic way with AspectJ logging, because after the execution of some method I want to directly expose part of the object's state in the log message. I don't mean it is impossible, no! It is very possible, but you will not be able to do it with some simple tracing aspect which traces method entrances/exits - and this is the problem.
If you want to solve the problem you will need one of the following:
  • Write an advice that dumps the state of this on method entry/exit - then you will be able to view the necessary information, BUT you will have severe clutter in your log messages, because in most situations you need only a tiny part of the state to be dumped.
  • Write an advice & pointcut which identifies and solves this particular problem - you will end up with tons of specific advices & pointcuts, which is, first, much more code than simply putting in log messages, and second, makes your aspects very fragile under refactoring, when you can accidentally lose some of your logging and not notice it.
  • Employ some 'clever' idea. For example, you can mark the fields that should be dumped during a class's method executions with an annotation and dump only them. But this binds your code to the aspects. Also, methods which receive the state as a parameter will dump the corresponding objects twice, which is annoying. This option is different in that the main problem is probably with my ability to identify a 'really clever' idea; here I'd appreciate any help (that does not suffer from the problems I mentioned).
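The third bullet's "mark fields with an annotation" idea can be sketched in plain Java like this (reflection only, no AspectJ; in the real thing the advice would call dump() on this rather than main doing it, and all names are made up):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class DumpDemo {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Dumped {}

    static class Order {
        @Dumped String id = "order-42";
        String internalCache = "not interesting"; // not marked, so not dumped
    }

    // Dumps only the fields marked @Dumped, keeping log lines small.
    static String dump(Object o) {
        StringBuilder sb = new StringBuilder(o.getClass().getSimpleName());
        try {
            for (Field f : o.getClass().getDeclaredFields()) {
                if (f.isAnnotationPresent(Dumped.class)) {
                    f.setAccessible(true);
                    sb.append(' ').append(f.getName()).append('=').append(f.get(o));
                }
            }
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump(new Order()));
    }
}
```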
The second thing is similar in that if I want to do it I need a specific pointcut and advice targeting the end of the call to service.scheduleFurtherProcessing(order) in the flow of the run() method, which, as I already said, means writing too much code for a simple log message.

Cons of 2

are reverse of Pros of 1

Pros of 2
  • Ability to easily match messages written to log with actual code
  • Debugging is simpler (at least when using IDEA)
  • Runtime performance is better, with no need to use reflection to dump anything
  • Compile time is shorter the fewer aspects are used in the project
Analyzing the cons and pros given above, we decided to use the third approach, mixing the two others together. The idea is to reduce the clutter of in-place logging with aspects, but use explicit messages whenever necessary. Thus we lose the ability to centralize all logging in one place, but are able to have fine control without too much (fragile) effort. For example, we decided to accompany explicit logging with support for a @Log annotation: methods marked with that annotation will be logged on entry and exit by a corresponding aspect.
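To show the shape of the @Log idea without dragging in AspectJ, here is a self-contained sketch using a JDK dynamic proxy in place of the aspect (in the project the same enter/exit logic would live in an @Around advice; all names here are illustrative, and the sink list stands in for log.info):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class LogProxyDemo {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Log {}

    interface OrderService {
        @Log
        void scheduleFurtherProcessing(String order);
    }

    // Wraps target so that methods marked @Log report entry and exit.
    static <T> T logged(Class<T> iface, T target, List<String> sink) {
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface},
                (proxy, method, args) -> {
                    boolean log = method.isAnnotationPresent(Log.class);
                    if (log) sink.add("enter " + method.getName());
                    Object result = method.invoke(target, args);
                    if (log) sink.add("exit " + method.getName());
                    return result;
                }));
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        OrderService service = logged(OrderService.class,
                order -> sink.add("scheduling " + order), sink);
        service.scheduleFurtherProcessing("order-42");
        sink.forEach(System.out::println);
    }
}
```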

That's all. Please leave your suggestions, corrections and improvements in the comments.