With a clustered index the rows are stored physically on the disk in the same order as the index. Therefore, there can be only one clustered index per table.
With a non-clustered index there is a second list that has pointers to the physical rows. You can have many non-clustered indices, although each new index will increase the time it takes to write new records.
It is generally faster to read from a clustered index if you want to get back all the columns, because you do not have to go to the index first and then to the table.
Writing to a table with a clustered index can be slower if the data needs to be rearranged.
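As a small illustration (a sketch in JPA terms; the entity and index names are made up for this example): in e.g. MySQL/InnoDB the primary key becomes the clustered index that the rows are physically ordered by, while any extra index you declare is a non-clustered secondary index pointing back to those rows.

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Index;
    import javax.persistence.Table;

    // Sketch: the primary key (person_id) ends up as the clustered index in e.g. InnoDB,
    // i.e. the rows are physically ordered by it. The extra index on last_name is a
    // non-clustered secondary index: a separate structure pointing back to the rows.
    @Entity
    @Table(name = "people",
           indexes = @Index(name = "idx_people_last_name", columnList = "last_name"))
    public class Person {
        @Id
        private Long personId;
        private String lastName;
    }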
Normally, when you look into optimistic locking, you also use a library like Hibernate or another JPA implementation with @Version support. An example could read like this:
    @Entity
    public class Person {
        @Id
        @GeneratedValue
        private Long personId;
        private String firstName;
        private String lastName;
        private int age;
        @OneToOne
        private Color favoriteColor;
        @Version
        private Long version;
    }
Obviously there is no point in adding a @Version annotation if you are not using a framework which supports it.
The DDL could then be:

    CREATE TABLE people (
        person_id BIGINT AUTO_INCREMENT PRIMARY KEY,
        first_name VARCHAR(100) NOT NULL,
        last_name VARCHAR(100) NOT NULL,
        age INT NOT NULL,
        color_id BIGINT NOT NULL,  # say we also have a colors table and people has a 1:1 relationship with it
        version BIGINT NOT NULL,
        FOREIGN KEY (color_id) REFERENCES colors (color_id)
    );
What happens with the version?
- Every time before you store the entity, you check whether the version stored in the database is still the version you know.
- If it is, you store your data with the version incremented by one.
To get both steps done without risking another process changing the data in between, this is normally handled with a statement like:

    UPDATE people SET last_name = 'married', version = 2 WHERE person_id = 42 AND version = 1;
After executing the statement you check whether you updated a row or not. If you did, nobody else changed the data since you read it; otherwise somebody else changed the data in the meantime. In that case you will normally receive an OptimisticLockException from the library you are using.
This exception should cause all changes to be rolled back and the process of changing the value to be restarted, since the condition upon which the entity was to be updated may no longer hold.
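Under the hood this boils down to checking the update count. A minimal sketch in plain JDBC (the method and parameter names are placeholders; a JPA provider does the equivalent for you):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import javax.persistence.OptimisticLockException;

    public class OptimisticUpdate {

        // Succeeds only if the row still has the version we read earlier.
        static void updateLastName(Connection conn, long personId,
                                   String newLastName, long expectedVersion) throws SQLException {
            String sql = "UPDATE people SET last_name = ?, version = version + 1 "
                       + "WHERE person_id = ? AND version = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, newLastName);
                ps.setLong(2, personId);
                ps.setLong(3, expectedVersion); // the version read together with the entity
                if (ps.executeUpdate() == 0) {
                    // No row matched: somebody else changed (or deleted) it in the meantime.
                    throw new OptimisticLockException("Person " + personId + " was modified concurrently");
                }
            }
        }
    }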
So no collision:
- Process A reads Person
- Process A writes Person thereby incrementing version
- Process B reads Person
- Process B writes Person thereby incrementing version
Collision:
- Process A reads Person
- Process B reads Person
- Process A writes Person thereby incrementing version
- Process B receives an exception when trying to save as the version changed since Person was read
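The same collision, sketched with two JPA EntityManagers standing in for the two processes (the EntityManagerFactory emf and the setLastName setter are assumptions for this example):

    import javax.persistence.EntityManager;
    import javax.persistence.EntityManagerFactory;

    public class CollisionDemo {

        static void collide(EntityManagerFactory emf) {
            EntityManager emA = emf.createEntityManager(); // Process A
            EntityManager emB = emf.createEntityManager(); // Process B

            Person personA = emA.find(Person.class, 42L);  // A reads Person (version 1)
            Person personB = emB.find(Person.class, 42L);  // B reads Person (version 1)

            emA.getTransaction().begin();
            personA.setLastName("married");
            emA.getTransaction().commit();                 // A writes, version becomes 2

            emB.getTransaction().begin();
            personB.setLastName("divorced");
            try {
                emB.getTransaction().commit();             // B writes with stale version 1
            } catch (RuntimeException e) {
                // With Hibernate this is typically a RollbackException caused by an
                // OptimisticLockException: B must re-read and retry.
                System.out.println("Process B lost the race: " + e);
            }
        }
    }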
If Color is another entity, you should add a version column there following the same scheme.
What isn't Optimistic Locking?
- Optimistic Locking is not a magic way to merge conflicting changes. It will just prevent processes from accidentally overwriting changes made by another process.
- Optimistic Locking is not actually a real database lock. It just works by comparing the value of the version column. You don't prevent other processes from accessing any data, so expect to get OptimisticLockExceptions.
What column type to use as the version?
If many different applications access your data, you may be best off using a column that is automatically updated by the database, e.g. for MySQL:

    version TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

This way the applications implementing optimistic locking will also notice changes made by dumb applications.
If you update entities more often than the resolution of TIMESTAMP (or the Java interpretation of it) allows, this approach can fail to detect certain changes. Also, if you let Java generate the new TIMESTAMP, you need to ensure that all machines running your applications are in perfect time-sync.
If all of your applications can be altered, an integer, long, ... version is normally a good solution, as it will never suffer from differently set clocks ;-)
There are other scenarios. You could, for example, use a hash or even a randomly generated String every time a row is changed. The important thing is that you don't repeat values while any process is holding the data for local processing or in a cache, as that process would not be able to detect the change by looking at the version column.
As a last resort you may use the values of all fields as the version. While this will be the most expensive approach in most cases, it is a way to get similar results without changing the table structure. If you use Hibernate, there is the @OptimisticLocking annotation to enforce this behavior. Use @OptimisticLocking(type = OptimisticLockType.ALL) on the entity class to fail if any column changed since you read the entity, or @OptimisticLocking(type = OptimisticLockType.DIRTY) to fail only when another process also changed the fields you changed.
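A minimal sketch of how that could look on the entity (Hibernate-specific annotations; Hibernate typically combines the DIRTY and ALL types with @DynamicUpdate):

    import org.hibernate.annotations.DynamicUpdate;
    import org.hibernate.annotations.OptimisticLockType;
    import org.hibernate.annotations.OptimisticLocking;
    import javax.persistence.Entity;
    import javax.persistence.Id;

    // Sketch: versionless optimistic locking, no version column needed.
    @Entity
    @OptimisticLocking(type = OptimisticLockType.DIRTY) // or OptimisticLockType.ALL
    @DynamicUpdate // the DIRTY/ALL types rely on dynamically generated UPDATE statements
    public class Person {
        @Id
        private Long personId;
        private String firstName;
        private String lastName;
        private int age;
    }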
Best Solution
Optimistic Locking is a strategy where you read a record, take note of a version number (other methods to do this involve dates, timestamps or checksums/hashes) and check that the version hasn't changed before you write the record back. When you write the record back, you filter the update on the version to make sure it's atomic (i.e. the record hasn't been updated between the version check and the write to disk) and update the version in one hit.
If the record is dirty (i.e. it has a different version to yours), you abort the transaction and the user can restart it.
This strategy is most applicable to high-volume systems and three-tier architectures where you do not necessarily maintain a connection to the database for your session. In this situation the client cannot actually maintain database locks as the connections are taken from a pool and you may not be using the same connection from one access to the next.
Pessimistic Locking is when you lock the record for your exclusive use until you have finished with it. It has much better integrity than optimistic locking but requires you to be careful with your application design to avoid deadlocks. To use pessimistic locking you need either a direct connection to the database (as would typically be the case in a two-tier client-server application) or an externally available transaction ID that can be used independently of the connection.
In the latter case you open the transaction with the TxID and then reconnect using that ID. The DBMS maintains the locks and allows you to pick the session back up through the TxID. This is how distributed transactions using two-phase commit protocols (such as XA or COM+ Transactions) work.
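For contrast, a minimal sketch of pessimistic locking via JPA (assumes an open EntityManager and the Person entity with a setLastName setter; LockModeType.PESSIMISTIC_WRITE typically translates to SELECT ... FOR UPDATE):

    import javax.persistence.EntityManager;
    import javax.persistence.LockModeType;

    public class PessimisticExample {

        // The row is locked for exclusive use until the transaction ends.
        static void marry(EntityManager em, long personId) {
            em.getTransaction().begin();
            try {
                // Acquires a row lock; other transactions trying to lock the same row block.
                Person person = em.find(Person.class, personId, LockModeType.PESSIMISTIC_WRITE);
                person.setLastName("married");
                em.getTransaction().commit(); // the lock is released here
            } catch (RuntimeException e) {
                em.getTransaction().rollback();
                throw e;
            }
        }
    }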