Java Persistence/Identity and Sequencing

=Identity=

An object id (OID) is something that uniquely identifies an object. Within a JVM this is typically the object's pointer. In a relational database table, a row is uniquely identified in its table by its primary key. When persisting objects to a database you need a unique identifier for the objects, this allows you to query the object, define relationships to the object, and update and delete the object. In JPA the object id is defined through the  annotation or   element and should correspond to the primary key of the object's table.

Strange behavior, unique constraint violation.

 * You must never change the id of an object. Doing so will cause errors, or strange behavior depending on your JPA provider.  Also do not create two objects with the same id, or try persisting an object with the same id as an existing object.  If you have an object that may be existing use the     API, do not use   for an existing object, and avoid relating an un-managed existing object to other managed objects.

No primary key.

 * See No Primary Key.

=Sequencing= An object id can either be a natural id or a generated id. A natural id is one that occurs in the object and has some meaning in the application. Examples of natural ids include email addresses, phone numbers, and social insurance numbers. A generated id (also known as a surrogate id) is one that is generated by the system. A sequence number in JPA is a sequential id generated by the JPA implementation and automatically assigned to new objects. The benefits of using sequence numbers are that they are guaranteed to be unique, allow all other data of the object to change, are efficient values for querying and indexes, and can be efficiently assigned. The main issue with natural ids is that everything always changes at some point; even a person's social insurance number can change. Natural ids can also make querying, foreign keys and indexing less efficient in the database.

In JPA an  can be easily assigned a generated sequence number through the   annotation, or   element.

Sequence Strategies
There are several strategies for generating unique ids. Some strategies are database agnostic and others make use of built-in databases support.

JPA provides support for several strategies for id generation defined through the  enum values: TABLE, SEQUENCE and IDENTITY.

The choice of which sequence strategy to use is important as it affects performance, concurrency and portability.

Table sequencing
Table sequencing uses a table in the database to generate unique ids. The table has two columns, one stores the name of the sequence, the other stores the last id value that was assigned. There is a row in the sequence table for each sequence object. Each time a new id is required the row for that sequence is incremented and the new id value is passed back to the application to be assigned to an object. This is just one example of a sequence table schema, for other table sequencing schemas see Customizing.

Table sequencing is the most portable solution because it just uses a regular database table, so unlike sequence and identity can be used on any database. Table sequencing also provides good performance because it allows for sequence pre-allocation, which is extremely important to insert performance, but can have potential concurrency issues.

In JPA the  annotation or   element is used to define a sequence table. The TableGenerator defines a  for the column used to store the name of the sequence,   for the column used to store the last id allocated, and   for the value to store in the name column (normally the sequence name).

Error when allocating a sequence number.

 * Errors such as "table not found", "invalid column" can occur if you do not have a SEQUENCE table defined in your database, or its schema does not match what you have configured, or what your JPA provider is expecting by default. Ensure you create the sequence table correctly, or configure your   to match the table that you created, or let your JPA provider create you tables for you (most JPA provider support schema creation).  You may also get an error such as "sequence not found", this means you did not create a row in the table for your sequence.  You must insert an initial row in the sequence table for your sequence with the initial id (i.e.  ), or let your JPA provider create your schema for you.

Deadlock or poor concurrency in the sequence table.

 * See concurrency issues.

Sequence objects
Sequence objects use special database objects to generate ids. Sequence objects are only supported in some databases, such as Oracle, DB2, and Postgres. Usually, a SEQUENCE object has a name, an INCREMENT, and other database object settings. Each time the  is selected the sequence is incremented by the INCREMENT.

Sequence objects provide the optimal sequencing option, as they are the most efficient and have the best concurrency, however they are the least portable as most databases do not support them. Sequence objects support sequence preallocation through setting the INCREMENT on the database sequence object to the sequence preallocation size.

In JPA the  annotation or   element is used to define a sequence object. The SequenceGenerator defines a  for the name of the database sequence object, and an   for the sequence preallocation size or sequence object INCREMENT.

Error when allocating a sequence number.

 * Errors such as "sequence not found", can occur if you do not have a SEQUENCE object defined in your database. Ensure you create the sequence object, or let your JPA provider create your schema for you (most JPA providers support schema creation).  When creating your sequence object, ensure the sequence's   matches your  's  .  The DDL to create a sequence object depends on the database, for Oracle it is,.

Invalid, duplicate or negative sequence numbers.

 * This can occur if your sequence object's  does not match your  .  This results in the JPA provider thinking it got back more sequences than it really did, and ends up duplicating values, or with negative numbers.  This can also occur on some JPA providers if your sequence object's   is 0 instead of a value equal or greater to the.

Identity sequencing
Identity sequencing uses special IDENTITY columns in the database to allow the database to automatically assign an id to the object when its row is inserted. Identity columns are supported in many databases, such as MySQL, DB2, SQL Server, Sybase, and PostgreSQL. Oracle supports IDENTITY columns as of Oracle 12c. If using an older version, it is possible to simulate them using sequence objects and triggers.

Although identity sequencing seems like the easiest method to assign an id, they have several issues. One is that since the id is not assigned by the database until the row is inserted the id cannot be obtained in the object until after commit or after a flush call. Identity sequencing also does not allow for sequence preallocation, so can require a select for each object that is inserted, potentially causing a major performance problem, so in general are not recommended.

In JPA there is no annotation or element for identity sequencing as there is no additional information to specify. Only the 's strategy needs to be set to.

PostgreSQL Serial Columns
Sequence objects are database constructs and can be instantiated as distinct objects as needed. They provide incremental values when their "next" method is called. Note that the increments can be values greater than one, for example incremented by two's, three's etc. An 'on insert' trigger can be constructed for each table that calls a sequence object created for that specific table primary key asking for the next value and inserting it into the table with the new record. Identity datatypes in products like MySQL or SQL Server are encapsulated types that do the same thing without requiring the set up of triggers and sequences (although PostgreSQL at least has automated much of this with its Serial and Big Serial pseudo datatypes (these actually create an int/bigint column and run a "macro" creating the sequence and 'on insert' trigger when the CREATE TABLE statement is executed.)).

null is inserted into the database, or error on insert.

 * This typically occurs because the @Id was not configured to use an @GeneratedValue(strategy=GenerationType.IDENTITY). Ensure it is configured correctly.  It could also be that your JPA provider does not support identity sequencing on the database platform that you are using, or you have not configured your database platform.  Most providers require that you set the database platform through a persistence.xml property, most provider also allow you to customize your own platform if it is not directly supported. It may also be that you did not set your primary key column in your table to be an identity type.

Object's id is not assigned after persist.

 * Identity sequencing requires the insert to occur before the id can be assigned, so it is not assigned on persist like other types of sequencing. You must either call   on the current transaction, or call   on the  .  It may also be that you did not set your primary key column in your table to be an identity type.

Child's id is not assigned from parent on persist.

 * A common issue is that the generated  is part of a child object's   through a   or   mapping.  In this case, because JPA requires that the child define a duplicate   mapping for the , its   will be inserted as null.  One solution to this is to mark the   on the   mapping in the child as  , and define the   or   using a normal   this will ensure the foreign key field is populated by the   or   not the  .  Another option is to first persist the parent, then call   before persisting the child.

Poor insert performance.

 * Identity sequencing does not support sequence preallocation, so requires a select after each insert, in some cases doubling the insert cost. Consider using a sequence table, or sequence object to allow sequence preallocation.

Lost latest free id.

 * MySQL bug 199 causes its autoincrement counter to be lost on restart. So if the last entity is removed and the MySQL server restarted, the same id will be re-used and thus not be unique.

=Advanced=

Composite Primary Keys
A composite primary key is one that is made up of several columns in the table. A composite primary key can be used if no single column in the table is unique. It is generally more efficient and simpler to have a one-column primary key, such as a generated sequence number, but sometimes a composite primary key is desirable and unavoidable.



Composite primary keys are common in legacy database schemas, where cascaded keys can sometimes be used. This refers to a model where dependent objects' key definitions include their parents' primary key; for example, 's primary key is ,  's primary key is composed of a   and a  ,  's primary key is composed of  ,  , and  , and so on. Although this generally does not match object-oriented design principles, some DBA's prefer this model. Difficulties with the model include the restriction that employees cannot switch departments, that foreign-key relationships become more complex, and that all primary-key operations (including queries, updates, and deletes) are less efficient. However, each department has control over its own employee IDs, and if needed the database  table can be partitioned based on the   or , as these are included in every query.

Other common usages of composite primary keys include many-to-many relationships where the join table has additional columns, so the table itself is mapped to an object whose primary key consists of the pair of foreign-key columns and dependent or aggregate one-to-many relationships where the child object's primary key consists of its parent's primary key and a locally unique field.

There are two methods of declaring a composite primary key in JPA,  and.

Id Class
An  defines a separate Java class to represent the primary key. It is defined through the  annotation or   XML element. The  must define an attribute (field/property) that mirrors each   attribute in the entity. It must have the same attribute name and type. When using an  you still require to mark each   attribute in the entity with.

The main purpose of the  is to be used as the structure passed to the     and   API. Some JPA products also use the  as a cache key to track an object's identity. Because of this, it is required (depending on JPA product) to implement an  and   method on the. Ensure that the  method checks each part of the primary key, and correctly uses   for objects and   for primitives. Ensure that the  method will return the same value for two equal objects.
 * TopLink / EclipseLink : Do not require the implementation of  or   in the id class.

Embedded Id
An  defines a separate   Java class to contain the entities primary key. It is defined through the  annotation or   XML element. The 's   class must define each id attribute for the entity using   mappings. All attributes in the 's   are assumed to be part of the primary key.

The  is also used as the structure passed to the     and   API. Some JPA products also use the  as a cache key to track an object's identity. Because of this, it is required (depending on JPA product) to implement an  and   method on the. Ensure that the  method checks each part of the primary key, and correctly uses   for objects and   for primitives. Ensure that the  method will return the same value for two equal objects.
 * TopLink / EclipseLink : Do not require the implementation of  or   in the id class.

Primary Keys through OneToOne and ManyToOne Relationships
A common model is to have a dependent object share the primary key of its parent. In the case of a  the child's primary key is the same as the parent, and in the case of a   the child's primary key is composed of the parent's primary key and another locally unique field.

JPA 1.0 does not allow  on a   or , but JPA 2.0 does.

One of the biggest pitfalls when working with composite primary keys comes with the implementation of associations to entity classes whose tables have multi-column primary keys (using the @JoinColumns annotation). Many JPA implementations may throw seemingly inconsistent exceptions when not specifying referencedColumnName for every @JoinColumns annotation, which JPA requires for composite foreign keys (even if all referenced column names are equal to the ones in the referencing table). See http://download.oracle.com/javaee/5/api/javax/persistence/JoinColumns.html

JPA 1.0
Unfortunately JPA 1.0 does not handle this model well, and things become complicated, so to make your life a little easier you may consider defining a generated unique id for the child. JPA 1.0 requires that all  mappings be   mappings, so if your   comes from a foreign key column through a   or   mapping, you must also define a     mapping for the foreign key column. The reason for this is in part that the  must be a simple object for identity and caching purposes, and for use in the   or the     API.

Because you now have two mappings for the same foreign key column you must define which one will be written to the database (it must be the  one), so the   or   foreign key must be defined to be read-only. This is done through setting the  attributes   and   to false, or by using the   instead of the.

A side effect of having two mappings for the same column is that you now have to keep the two in synch. This is typically done through having the set method for the  attribute also set the   attribute value to the target object's id. This can become very complicated if the target object's primary key is a, in this case you must ensure that the target object's id has been assigned before relating the two objects.

Some times I think that JPA primary keys would be much simpler if they were just defined on the entity using a collection of s instead of mixing them up with the attribute mapping. This would leave you free to map the primary key field in any manner you desired. A generic  could be used to pass the primary key to   methods, and it would be the JPA provider's responsibility for hashing and comparing the primary key correctly instead of the user's. But perhaps for simple singleton primary key models the JPA model is more straight forward.


 * TopLink / EclipseLink : Allow the primary key to be specified as a list of columns instead of using  mappings.  This allows   and   mapping foreign keys to be used as the primary key without requiring a duplicate mapping.  It also allows the primary key to be defined through any other mapping type.  This is set through using a   and the     API.


 * Hibernate / Open JPA / EclipseLink (as of 1.2): Allows the  annotation to be used on a   or   mapping.

JPA 2.0
Defining an  for a   or   in JPA 2.0 is much simpler. The  annotation or   XML attribute can be added to a   or   mapping. The  used for the object will be derived from the target object's. If the  is a single value, then the source object's   is the same as the target object's. If it is a composite, then the   will contain the     attributes, and the target object's   as the relationship value. If the target object also has a composite, then the source object's   will contain the target object's.

Concurrency and Deadlocks
One issue with table sequencing is that the sequence table can become a concurrency bottleneck, even causing deadlocks. If the sequence ids are allocated in the same transaction as the insert, this can cause poor concurrency, as the sequence row will be locked for the duration of the transaction, preventing any other transaction that needs to allocate a sequence id. In some cases the entire sequence table or the table page could be locked causing even transactions allocating other sequences to wait or even deadlock. If a large sequence pre-allocation size is used this becomes less of an issue, because the sequence table is rarely accessed. Some JPA providers use a separate (non-JTA) connection to allocate the sequence ids in, avoiding or limiting this issue. In this case, if you use a JTA data-source connection, it is important to also include a non-JTA data-source connection in your persistence.xml.

Guaranteeing Sequential Ids
Table sequencing also allows for truly sequential ids to be allocated. Sequence and identity sequencing are non-transactional and typically cache values on the database, leading to large gaps in the ids that are allocated. Typically this is not an issue and desired to have good performance, however if performance and concurrency are less of a concern, and true sequential ids are desired then a table sequence can be used. By setting the  of the sequence to 1 and ensuring the sequence ids are allocated in the same transaction of the insert, you can guarantee sequence ids without gaps (but generally it is much better to live with the gaps and have good performance).

Running Out of Numbers
One paranoid delusional fear that programmers frequently have is running out of sequence numbers. Since most sequence strategies just keep incrementing a number it is unavoidable that you will eventually run out. However as long a large enough numeric precision is used to store the sequence id this is not an issue. For example if you stored your id in a NUMBER(5) column, this would allow 99,999 different ids, which on most systems would eventually run out. However if you store your id in a NUMBER(10) column, which is more typical, this would store 9,999,999,999 ids, or one id each second for about 300 years (longer than most databases exist). But perhaps your system will process a lot of data, and (hopefully) be around a very long time. If you store your id in a NUMBER(20) this would be 99,999,999,999,999,999,999 ids, or one id each millisecond for about 3,000,000,000 years, which is pretty safe.

But you also need to store this id in Java. If you store the id in a Java int, this would be a 32 bit number, which is 4,294,967,296 different ids (well actually 2,147,483,648 positive ids), or one id each second for about 100 years. If you instead use a long, this would be a 64 bit number, which is 9,223,372,036,854,775,808 different ids, or one id each millisecond for about 300,000,000 years, which is pretty safe. I would recommend using a long vs an int however, as I have seen int ids run out on large databases before (what happens is they become negative until they wrap around to 0 and start getting constraint errors).

Customizing
JPA supports three different strategies for generating ids, however there are many other methods. Normally the JPA strategies are sufficient, so you would only use a different method in a legacy situation.

Sometimes the application has an application specific strategy for generating ids, such as prefixing ids with the country code, or branch number. There are several ways to integrate a customize ids generation strategy, the simplest is just define the id as a normal id and have the application assign the id value when the object is created.

Some JPA products provide additional sequencing and id generation options, and configuration hooks.


 * TopLink, EclipseLink : Several additional sequencing options are provided. A   allows a single column table to be used.  A   allows for custom SQL or stored procedures to be used.  An API also exists to allow a user to supply their own code for allocating ids.


 * Hibernate : A GUID id generation options is provided through the  annotation.

Primary Keys through Triggers
A database table can be defined to have a trigger that automatically assign its' primary key. Generally this is normally not a good idea (although some DBAs may think it is), and it is better to use a JPA provider generated sequence id, or assign the id in the application. The main issue with the id being assigned in a trigger is that the application and object require this value back. For non-primary key values assigned through triggers it is possible to refresh the object after committing or flushing the object to obtain the values back. However this is not possible for the id, as the id is required to refresh an object.

If you have an alternative way to select the id generated by the trigger, such as selecting the object's row using another unique field, you could issue this  select after the insert to obtain the id and set it back in the object. You could perform this select in a JPA event. Some JPA providers may not allow/like a query execution during an event, they also may not pick up a change to an object during an event callback, so there may be issues with doing this. Also some JPA providers may not allow the primary key to be un-assigned/null when not using a, so you may have issues. Some JPA providers have built-in support for returning values assigned in a trigger (or stored procedure) back into the object.


 * TopLink / EclipseLink : Provide a  that allows for any field values including the primary key to be returned from the database after an insert or update.  This is defined through the ,   annotations, or the  ,   XML elements in the eclipselink-orm.xml.

Primary Keys through Events
If the application generates its own id instead of using a JPA, it is sometimes desirable to perform this id generation in a JPA event, instead of the application code having to generate and set the id. In JPA this can be done through the event.

No Primary Key
Sometimes your object or table has no primary key. The best solution in this case is normally to add a generated id to the object and table. If you do not have this option, sometimes there is a column or set of columns in the table that make up a unique value. You can use this unique set of columns as your id in JPA. The JPA  does not always have to match the database table primary key constraint, nor is a primary key or a unique constraint required.

If your table truly has no unique columns, then use all of the columns as the id. Typically when this occurs the data is read-only, so even if the table allows duplicate rows with the same values, the objects will be the same anyway, so it does not matter that JPA thinks they are the same object. The issue with allowing updates and deletes is that there is no way to uniquely identify the object's row, so all of the matching rows will be updated or deleted.

If your object does not have an id, but its table does, this is fine. Make the object an  object, embeddable objects do not have ids. You will need a  that contains this   to persist and query it.

/Mapping