Thursday, February 6, 2014

Spring Data on GAE - Part 2 - Datastore Key

In the last post we saw that it is straightforward to use Spring Data on the GAE platform. However, because of the way the GAE datastore is designed, we may see unexpected transactional behavior.

Assume that we define a simple entity with a primary key of Long type:

@Entity
public class Player {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id; 

    // other properties skipped

    public Player() {
    }

    // setters and getters skipped
}

If we update multiple Player instances in the same transaction and immediately query the result, the changes may not all become visible at the same time. For example, we may see that Player A has been updated but Player B has not. If we keep querying the database, however, all the changes eventually appear.

This happens because in the GAE datastore each entity has an optional ancestor path. Entities that share the same root ancestor belong to the same entity group, and the entity group is the unit within which the GAE datastore can guarantee transactionality.

1. GAE Datastore Entity Key


According to the datastore design, each entity has a primary key composed of the following three elements (see the GAE datastore documentation):
  • The entity's kind
  • An identifier, which can be either
    • a key name string
    • an integer ID
  • An optional ancestor path locating the entity within the Datastore hierarchy
When we use DataNucleus JPA to persist our entities to the datastore, it uses the simple class name (i.e. "Player") as the entity's kind. For the POJO defined above, it also generates a Long identifier for us. However, the ancestor path is left empty.

Therefore, each Player instance is in its own entity group, and updates that span multiple entity groups may not become visible at the same time.
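To make the entity-group rule concrete, here is a small framework-free model. SimpleKey is a hypothetical class, not com.google.appengine.api.datastore.Key: it keeps only a kind, an integer id and an optional parent (key names are left out for brevity), and it treats the root of the ancestor path as the entity group.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified model of a datastore key; NOT the GAE Key class.
final class SimpleKey {
    final String kind;      // e.g. "Player"
    final long id;          // integer identifier
    final SimpleKey parent; // null when the ancestor path is empty

    SimpleKey(String kind, long id, SimpleKey parent) {
        this.kind = Objects.requireNonNull(kind);
        this.id = id;
        this.parent = parent;
    }

    // The entity group is identified by the root of the ancestor path.
    SimpleKey root() {
        SimpleKey k = this;
        while (k.parent != null) {
            k = k.parent;
        }
        return k;
    }

    static boolean sameEntityGroup(SimpleKey a, SimpleKey b) {
        return a.root().equals(b.root());
    }

    // Renders the full path from the root, e.g. "Parent(10)/Player(1)".
    @Override
    public String toString() {
        List<String> parts = new ArrayList<>();
        for (SimpleKey k = this; k != null; k = k.parent) {
            parts.add(k.kind + "(" + k.id + ")");
        }
        Collections.reverse(parts);
        return String.join("/", parts);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SimpleKey)) {
            return false;
        }
        SimpleKey other = (SimpleKey) o;
        return kind.equals(other.kind) && id == other.id
                && Objects.equals(parent, other.parent);
    }

    @Override
    public int hashCode() {
        return Objects.hash(kind, id, parent);
    }

    public static void main(String[] args) {
        // Long-id Players as generated above: empty ancestor path.
        SimpleKey a = new SimpleKey("Player", 1, null);
        SimpleKey b = new SimpleKey("Player", 2, null);
        System.out.println(sameEntityGroup(a, b)); // false: two separate groups

        // Players created under a common Parent share one entity group.
        SimpleKey root = new SimpleKey("Parent", 10, null);
        SimpleKey c = new SimpleKey("Player", 1, root);
        SimpleKey d = new SimpleKey("Player", 2, root);
        System.out.println(sameEntityGroup(c, d)); // true
        System.out.println(c);                     // Parent(10)/Player(1)
    }
}
```

The first pair corresponds to the Long-id Player entity above; the second pair is the layout the rest of this post sets up with a common Parent.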

2. Use GAE Primary Key Type


One way to overcome the issue is to put all Player entities into the same entity group by giving them a common ancestor. This can be done by using the com.google.appengine.api.datastore.Key class as the primary key instead of Long. However, I prefer not to do this because it makes the entity class very Google-specific.

Instead, I use the DataNucleus extension recommended in the book Programming Google App Engine.

First, I change the primary key of the Player entity from Long to String and add the DataNucleus annotation (@org.datanucleus.api.jpa.annotations.Extension) so that DataNucleus generates the GAE key for me and stores it as an encoded string. This still depends on DataNucleus, but I think that is better than depending on a Google-specific class.

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Extension(vendorName = "datanucleus", key = "gae.encoded-pk", value = "true")
private String id;

I also need to add properties to the Player entity that identify its ancestor, which is represented by a new Parent entity class:

@Basic
@Extension(vendorName = "datanucleus", key = "gae.parent-pk", value = "true")
private String parentKey;

@ManyToOne
private Parent parent;

public String getParentKey() {
    return parentKey;
}

public void setParentKey(String parentKey) {
    this.parentKey = parentKey;
}

@com.fasterxml.jackson.annotation.JsonIgnore
public Parent getParent() {
    return parent;
}

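public void setParent(Parent parent) {
    this.parent = parent;
}

// Constructor used by initDB() below; name and rank are among the
// properties skipped earlier. JPA still needs the no-arg constructor.
public Player(String name, String rank, Parent parent) {
    this.name = name;
    this.rank = rank;
    this.parent = parent;
}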


Parent.java
@Entity
public class Parent {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Extension(vendorName = "datanucleus", key = "gae.encoded-pk", value = "true")
    String key;

    @OneToMany(cascade = CascadeType.ALL, mappedBy = "parent")
    private List<Player> players;

    // setters and getters skipped 
} 


Note that the @JsonIgnore annotation is added to the parent getter to avoid infinite recursion when creating the JSON string (a Player references its Parent, whose players list references the Player again).

3. Save the Parent and Player Objects


To create a parent, we can either use the EntityManager API or define a Spring Data repository to implement the DAO for us. Here I use a Spring Data repository.

public interface ParentRepository extends JpaRepository<Parent, String> {
}


The common parent object can be created in a utility method like the following. It returns the Parent object that we need when creating Player objects.

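// parent (a Parent) and parentKey (a com.google.appengine.api.datastore.Key)
// are fields of the enclosing service; parentDao is the ParentRepository above
public Parent getParent() {
    List<Parent> parents = parentDao.findAll();
    if (parents.isEmpty()) {
        // first call: create the common entity group root
        parent = parentDao.save(new Parent());
    } else {
        parent = parents.get(0);
    }

    // decode the encoded-pk string into a datastore Key
    parentKey = KeyFactory.stringToKey(parent.getKey());

    return parent;
}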
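The find-or-create behavior of getParent() can be checked without App Engine. The following framework-free sketch mirrors its logic; Parent, InMemoryParentRepository and the generated key format are hypothetical stand-ins for the real entity, the ParentRepository and the datastore-assigned key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, framework-free sketch of the find-or-create logic in getParent().
final class ParentLookup {

    // Stand-in for the Parent entity; the key is assigned on save.
    static final class Parent {
        String key;
    }

    // Stand-in for ParentRepository, keeping only findAll() and save().
    static final class InMemoryParentRepository {
        private final List<Parent> store = new ArrayList<>();
        private long nextId = 1;

        List<Parent> findAll() {
            return new ArrayList<>(store);
        }

        Parent save(Parent p) {
            if (p.key == null) {
                // new entity: mimic a datastore-generated key and store it
                p.key = "Parent(" + nextId++ + ")";
                store.add(p);
            }
            return p;
        }
    }

    private final InMemoryParentRepository parentDao;

    ParentLookup(InMemoryParentRepository parentDao) {
        this.parentDao = parentDao;
    }

    // Returns the single entity-group root, creating it on the first call only.
    Parent getParent() {
        List<Parent> parents = parentDao.findAll();
        if (parents.isEmpty()) {
            return parentDao.save(new Parent());
        }
        return parents.get(0);
    }

    public static void main(String[] args) {
        InMemoryParentRepository repo = new InMemoryParentRepository();
        ParentLookup lookup = new ParentLookup(repo);
        Parent first = lookup.getParent();
        Parent second = lookup.getParent();
        System.out.println(first == second);       // true: the root is reused
        System.out.println(repo.findAll().size()); // 1
        System.out.println(first.key);             // Parent(1)
    }
}
```

Every call after the first returns the same root, which is what puts all players into one entity group. Like the original method, this sketch does not guard against two concurrent first calls each creating a parent.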


In the previous post we created a Spring MVC controller class that maps the URL /service/players/initDB to the following method:

@RequestMapping(value = "/initDB", method = RequestMethod.GET)
@Transactional(readOnly = false, isolation = Isolation.READ_COMMITTED)
public ResponseEntity<String> initDB() {
    dataService.initDB();
    return new ResponseEntity<String>("Players inserted to database", HttpStatus.OK);
}


The following method creates the players. Note that we need to set the parent property so that DataNucleus can set the parentKey string for us.

public ResponseEntity<String> initDB() {
    // get the common entity group parent
    Parent p = getParent();

    // insert test player data; every player gets the same parent
    List<Player> players = new ArrayList<Player>();
    players.add(new Player("Snoopy", "9p", p));
    players.add(new Player("Wookstock", "9p", p));
    players.add(new Player("Charlie", "1d", p));
    players.add(new Player("Lucy", "4d", p));
    players.add(new Player("Sally", "5d", p));
    playerRepository.save(players);
    return new ResponseEntity<String>("5 players inserted into database", HttpStatus.OK);
}



That's it. If we initialize the database through this URL and then request /service/players (see the last post for the Spring MVC implementation), the complete list of players is shown immediately:

[
{"id":"ahJhbmd1bGFyLXNwcmluZy1nYWVyJgsSBlBhcmVudBiAgICAgICACgwLEgZQbGF5ZXIYgICAgICAkAgM","parentKey":"ahJhbmd1bGFyLXNwcmluZy1nYWVyEwsSBlBhcmVudBiAgICAgICACgw","name":"Sally","rank":"5d"},
{"id":"ahJhbmd1bGFyLXNwcmluZy1nYWVyJgsSBlBhcmVudBiAgICAgICACgwLEgZQbGF5ZXIYgICAgICA4AgM","parentKey":"ahJhbmd1bGFyLXNwcmluZy1nYWVyEwsSBlBhcmVudBiAgICAgICACgw","name":"Snoopy","rank":"9p"},
{"id":"ahJhbmd1bGFyLXNwcmluZy1nYWVyJgsSBlBhcmVudBiAgICAgICACgwLEgZQbGF5ZXIYgICAgICA4AkM","parentKey":"ahJhbmd1bGFyLXNwcmluZy1nYWVyEwsSBlBhcmVudBiAgICAgICACgw","name":"Charlie","rank":"1d"},
{"id":"ahJhbmd1bGFyLXNwcmluZy1nYWVyJgsSBlBhcmVudBiAgICAgICACgwLEgZQbGF5ZXIYgICAgICA4AoM","parentKey":"ahJhbmd1bGFyLXNwcmluZy1nYWVyEwsSBlBhcmVudBiAgICAgICACgw","name":"Wookstock","rank":"9p"},
{"id":"ahJhbmd1bGFyLXNwcmluZy1nYWVyJgsSBlBhcmVudBiAgICAgICACgwLEgZQbGF5ZXIYgICAgICA4AsM","parentKey":"ahJhbmd1bGFyLXNwcmluZy1nYWVyEwsSBlBhcmVudBiAgICAgICACgw","name":"Lucy","rank":"4d"}
]

We can also query a particular player with the URL /service/players/{id}, using one of the id strings shown above. The id string is in fact the encoded primary key, which can be converted to a Google Key object with the com.google.appengine.api.datastore.KeyFactory.stringToKey(String) method.
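To see what the encoded string contains, it can also be taken apart by hand. The sketch below is a debugging aid only and rests on an assumption: that the string is web-safe Base64 of the datastore's internal protocol-buffer key reference (application id in field 13, path in field 14, each path element holding a kind plus an integer id or a key name). That layout is not a public API, although it matches the strings in the output above; real code should call KeyFactory.stringToKey. The class name EncodedKeyDump is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Debugging aid only: relies on the (undocumented) protocol-buffer layout of
// encoded keys as observed in the output above. Use KeyFactory.stringToKey in real code.
final class EncodedKeyDump {

    // Reads a base-128 varint at pos[0] and advances pos[0] past it.
    private static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Reads a length-prefixed UTF-8 string at pos[0].
    private static String readString(byte[] buf, int[] pos) {
        int len = (int) readVarint(buf, pos);
        String s = new String(buf, pos[0], len, StandardCharsets.UTF_8);
        pos[0] += len;
        return s;
    }

    // Returns "app:Kind(id)/Kind(id)/..." with the root ancestor first.
    static String describe(String encoded) {
        byte[] buf = Base64.getUrlDecoder().decode(encoded); // web-safe, unpadded
        int[] pos = {0};
        String app = "";
        List<String> path = new ArrayList<>();
        while (pos[0] < buf.length) {
            int tag = (int) readVarint(buf, pos);
            if (tag == 0x6A) {                // field 13: application id
                app = readString(buf, pos);
            } else if (tag == 0x72) {         // field 14: path
                int len = (int) readVarint(buf, pos);
                int end = pos[0] + len;
                String kind = null;
                String idOrName = null;
                while (pos[0] < end) {
                    int t = (int) readVarint(buf, pos);
                    if (t == 0x0B) {          // start of a path element
                        kind = null;
                        idOrName = null;
                    } else if (t == 0x12) {   // element kind
                        kind = readString(buf, pos);
                    } else if (t == 0x18) {   // integer id
                        idOrName = Long.toString(readVarint(buf, pos));
                    } else if (t == 0x22) {   // key name
                        idOrName = "'" + readString(buf, pos) + "'";
                    } else if (t == 0x0C) {   // end of a path element
                        path.add(kind + "(" + idOrName + ")");
                    } else {
                        throw new IllegalArgumentException("unexpected path tag " + t);
                    }
                }
            } else {
                // other fields (e.g. a namespace) are not handled in this sketch
                throw new IllegalArgumentException("unexpected tag " + tag);
            }
        }
        return app + ":" + String.join("/", path);
    }

    public static void main(String[] args) {
        // Sally's id from the JSON output above
        System.out.println(describe(
            "ahJhbmd1bGFyLXNwcmluZy1nYWVyJgsSBlBhcmVudBiAgICAgICACgwLEgZQbGF5ZXIYgICAgICAkAgM"));
        // prints angular-spring-gae:Parent(5629499534213120)/Player(4573968371548160)
    }
}
```

The decoded path shows the Parent ancestor followed by the Player itself, and the last element's number is the Long id part of the key.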

The primary key is now about 80 characters long. Because it encodes the full key (kind, id and ancestor path), it can locate ANY datastore entity. That is more than we need when we already know we are looking for a Player object. I would prefer to query a player using only the Long id part of the key, and I will discuss how to do that with a custom Spring Data repository in the next part.

The source can be found at GitHub (tagged v0.2).


