
a suggestion for efficient and scalable counters in Datastore

As I've mentioned before, I'm trying to migrate urlBorg to Google AppEngine. urlBorg needs to count many things, such as clicks on a short URL, so I really need a scalable and efficient way to implement counters. This is not as trivial as it sounds in the Google AppEngine environment.

This post is actually the result of a good discussion held here

Here is the code I've come up with. An example usage would be as simple as adding a line like (where page_id is a unique string identifying each page)

Acc(page_id).inc()

in each one of your pages. Getting the total count is as simple as

Acc(page_id).val()
(Due to the way the total count is calculated, this may not give accurate results if you are in the middle of a traffic spike, but it's good enough for web analytics usage)

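import datetime
import random

from google.appengine.ext import db

# One entity per "counter instance"; all instances of a counter share the same cluster name.
class AccVals(db.Model):
       cluster = db.StringProperty(required=True)
       count = db.IntegerProperty(required=True)
       updated = db.DateTimeProperty(auto_now=True)
       rand = db.FloatProperty()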

class Acc():
   def __init__(self, name, init=0):
           self.__sec = 0.1
           self.__name = name
           self.__init = init

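   def inc(self):
           # Increment one counter instance inside a transaction.
           def trans(key):
                   obj = AccVals.get(key)
                   obj.count += 1
                   obj.put()
                   self.__val = obj.count

           # Pick a random existing instance of this counter.
           q = db.Query(AccVals).filter('cluster =', self.__name).filter('rand >', random.random()).get()
           if q:
                   if datetime.datetime.now() - q.updated < datetime.timedelta(0, self.__sec):
                           # Updated too recently: create a new instance to avoid contention.
                           obj = AccVals(cluster=self.__name, count=self.__init, rand=random.random())
                           key = obj.put()
                   else:
                           key = q.key()
           else:
                   # No instance matched: create the first one. rand=1.0 is greater than
                   # any random.random() value, so later queries always find it.
                   obj = AccVals(cluster=self.__name, count=self.__init, rand=1.0)
                   key = obj.put()

           db.run_in_transaction(trans, key)
           # Note: this is the count of the updated instance, not the total.
           return self.__val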

   def val(self):
           total = 0
           q = AccVals.all()
           q.filter('cluster =',self.__name)
           for r in q:
                   total += r.count
           return total

It behaves relatively well and looks like it can scale no matter how much traffic or how many traffic spikes you have.

If you look into it, you will see that a "counter instance" is chosen at random. You may be tempted to use the instance that was updated least recently ( order('updated').get() ), but it turns out that when you have a traffic spike (or a spike in whatever your counters count) the indexes are not updated soon enough, and the query returns the records that were updated most recently :-) Selecting a random instance seems to cost little in low traffic and works much better in high traffic. I've also seen that after a while, you end up with roughly the number of counter instances required to handle the traffic of the specific counter with few transaction collisions.
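To make the random selection easier to follow, here is a minimal in-memory sketch of the same idea. It does not touch the Datastore at all: `InMemoryAcc`, its `shards` list, and the explicit `now` argument are hypothetical names made up for this illustration, and the `_pick` helper assumes that a query with a `rand >` filter returns the matching instance with the smallest `rand` first.

```python
import random


class InMemoryAcc:
    """Toy, in-memory stand-in for the Datastore-backed Acc class (hypothetical)."""

    def __init__(self, name, sec=0.1):
        self.name = name
        self.sec = sec
        # Each counter instance is a dict with 'count', 'updated' and 'rand'.
        self.shards = []

    def _pick(self, r):
        # Assumption: mimic filter('rand >', r).get() by returning the
        # instance with the smallest rand that is still greater than r.
        candidates = [s for s in self.shards if s['rand'] > r]
        return min(candidates, key=lambda s: s['rand']) if candidates else None

    def inc(self, now):
        shard = self._pick(random.random())
        if shard is None:
            # No instance yet: create the catch-all instance with rand=1.0.
            shard = {'count': 0, 'updated': now, 'rand': 1.0}
            self.shards.append(shard)
        elif now - shard['updated'] < self.sec:
            # The picked instance was updated too recently: add a new one.
            shard = {'count': 0, 'updated': now, 'rand': random.random()}
            self.shards.append(shard)
        shard['count'] += 1
        shard['updated'] = now
        return shard['count']

    def val(self):
        # The total is the sum over all instances of this counter.
        return sum(s['count'] for s in self.shards)


random.seed(0)

# 1000 hits in one second: instances multiply to absorb the spike.
burst = InMemoryAcc('page-1')
for i in range(1000):
    burst.inc(now=i * 0.001)

# 1000 hits, one per second: a single instance is enough.
quiet = InMemoryAcc('page-2')
for i in range(1000):
    quiet.inc(now=i * 1.0)

print(len(burst.shards), burst.val())
print(len(quiet.shards), quiet.val())
```

In this toy model the quiet counter never needs more than one instance, while the busy one grows extra instances on its own. Real Datastore behaviour (index lag, transaction retries) is not modelled here.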

There is one interesting point: the value of self.__sec. I set it to 0.1 seconds, but this is just a value that looked good after some tests. I have the impression that this value is related to some kind of "global AppEngine constant", measuring the time it takes for a transaction to complete and safely propagate to the rest of the infrastructure. I guess this varies, depending on the resource allocation done for a specific app. Could someone from the AppEngine development team give us some insight on this?
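To get a feel for how this threshold interacts with traffic, here is a small toy simulation. It is only a sketch under simplifying assumptions: `simulate_shards` and all of its parameters are hypothetical, hits arrive at perfectly regular intervals, and nothing here measures real AppEngine transaction latency.

```python
import random


def simulate_shards(sec, hits_per_second, duration=10.0, seed=0):
    """Return how many counter instances a toy model creates for a given threshold."""
    rng = random.Random(seed)
    shards = []  # each entry is [rand, last_updated]
    for i in range(int(hits_per_second * duration)):
        now = i / hits_per_second
        r = rng.random()
        candidates = [s for s in shards if s[0] > r]
        if not candidates:
            # First hit: create the catch-all instance with rand=1.0.
            shards.append([1.0, now])
            continue
        shard = min(candidates)  # smallest rand greater than r
        if now - shard[1] < sec:
            # Picked instance is still "hot": create another instance.
            shards.append([rng.random(), now])
        else:
            shard[1] = now
    return len(shards)


for sec in (0.005, 0.1, 1.0):
    print(sec, simulate_shards(sec, hits_per_second=100))
```

With 100 evenly spaced hits per second, a threshold shorter than the gap between hits never creates a second instance, while a one-second threshold creates a new instance on every hit during the first second. Where the right value lies for a real app is exactly the open question above.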

As I've mentioned before, I'm a Python newbie, so use the code above at your own risk :-)

Please post your comments here, so that they are all in one place.