Key-Value Storage using MemcacheDB

What is the Entity-Attribute-Value model (aka key-value storage)?

Key-value storage is also known as the Entity-Attribute-Value (EAV) model. It is used where the number of attributes (properties) that could describe an entity is very large, but the number of attributes actually used for any given entity is modest.

In database terms, here is what an Entity-Attribute-Value table would look like for storing a user profile.

id  user_id  key          value
1   101      screen_name  john
2   101      first_name   John
3   101      last_name    Smith

The table has one row for each attribute-value pair. In practice, we prefer to separate values by data type, so that the database can perform type validation checks and support proper indexing. So programmers tend to create separate EAV tables for strings, real and integer numbers, dates, long text and BLOBs.
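To make the table above concrete, here is a minimal sketch in Python using the standard sqlite3 module. SQLite is used for illustration only, and the table name user_profile and the two helper functions are assumptions of mine, not part of any real schema.

```python
import sqlite3

def build_profile_table(conn):
    """Create the (hypothetical) EAV table and load the three sample rows."""
    conn.execute(
        """
        CREATE TABLE user_profile (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            "key"   TEXT NOT NULL,
            value   TEXT
        )
        """
    )
    conn.executemany(
        'INSERT INTO user_profile (id, user_id, "key", value) VALUES (?, ?, ?, ?)',
        [
            (1, 101, "screen_name", "john"),
            (2, 101, "first_name", "John"),
            (3, 101, "last_name", "Smith"),
        ],
    )

def load_profile(conn, user_id):
    """Collapse all attribute rows of one user into a single dict."""
    cursor = conn.execute(
        'SELECT "key", value FROM user_profile WHERE user_id = ?', (user_id,)
    )
    return dict(cursor.fetchall())
```

Reading a whole profile is a single query by user_id, and adding a new attribute is just one more row, with no change to the schema.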

The benefits of such structure are:

  1. Flexibility: there is no limit on the attributes used to describe an entity, and no schema redesign is needed.
  2. The storage is efficient on sparse data.
  3. Easy to put the data into an XML format for interchange.

There are also some important drawbacks:

  1. No real use of data types
  2. Awkward use of database constraints
  3. There are several problems in querying such a structure.
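The querying problem is easiest to see with an example. The sketch below again assumes SQLite and a hypothetical user_profile table, and adds a made-up second user (102) so that the filter has something to exclude. Finding a user by first and last name, a plain WHERE clause on a conventional table, needs one self-join per attribute here.

```python
import sqlite3

def make_demo_db():
    """Build an in-memory EAV table with two users (user 102 is invented for the demo)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE user_profile '
        '(id INTEGER PRIMARY KEY, user_id INTEGER, "key" TEXT, value TEXT)'
    )
    conn.executemany(
        'INSERT INTO user_profile (user_id, "key", value) VALUES (?, ?, ?)',
        [
            (101, "first_name", "John"),
            (101, "last_name", "Smith"),
            (102, "first_name", "John"),
            (102, "last_name", "Doe"),
        ],
    )
    return conn

# One alias of the same table per attribute we filter on.
FIND_BY_FULL_NAME = """
SELECT fn.user_id
FROM user_profile AS fn
JOIN user_profile AS ln ON ln.user_id = fn.user_id
WHERE fn."key" = 'first_name' AND fn.value = ?
  AND ln."key" = 'last_name'  AND ln.value = ?
ORDER BY fn.user_id
"""

def find_by_full_name(conn, first_name, last_name):
    """Return the ids of users matching both attributes."""
    return [row[0] for row in conn.execute(FIND_BY_FULL_NAME, (first_name, last_name))]
```

Every extra search criterion adds another join, and because value is a plain text column the database cannot check types or enforce constraints per attribute.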

What is MemcacheDB

MemcacheDB is a distributed key-value storage system designed for persistence. It is very fast and reliable, supports transactions and replication, and uses Berkeley DB as its persistent storage engine.

Why is it better than a database?

  1. Faster: there is no SQL engine on top of MemcacheDB
  2. Designed for concurrency and for millions of requests
  3. Optimized for small data

MemcacheDB is suitable for messaging, metadata storage, identity management (accounts, profiles, preferences, etc.), indexes, counters, flags, etc.
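As an illustration of the profile use case, here is a rough sketch of how a user profile could be stored as a single value. It assumes the server is addressed with the memcached text protocol (set and get commands), and it only builds and parses the raw bytes, so nothing here talks to a real server. The key format user:101 and the choice of JSON as the serialization are my own assumptions.

```python
import json

def build_set(key, value, flags=0, exptime=0):
    """Build a text-protocol 'set' command storing value as JSON."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(payload)}\r\n".encode("ascii")
    return header + payload + b"\r\n"

def build_get(key):
    """Build a text-protocol 'get' command for a single key."""
    return f"get {key}\r\n".encode("ascii")

def parse_get_response(raw):
    """Parse the reply to a single-key get; return the decoded value or None on a miss."""
    if raw == b"END\r\n":
        return None
    # Reply layout: VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
    header, rest = raw.split(b"\r\n", 1)
    _tag, _key, _flags, size = header.split(b" ")
    payload = rest[: int(size)]
    return json.loads(payload.decode("utf-8"))
```

The whole profile travels as one value under one key, which is why this kind of storage fits data that is read and written as a unit rather than queried by attribute.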

The main features of MemcacheDB are:

Storage, replication and recovery

Berkeley DB stores data quickly and easily without the overhead found in other databases. Read more about Berkeley DB here

MemcacheDB supports replication using master and slave nodes. The exact deployment design must be chosen according to your application's needs. A MemcacheDB environment consists of three things:

One problem lies with the log files, which record your transactions: over time they accumulate a lot of data, which makes recovery painfully slow. To address this, MemcacheDB has a checkpoint. The checkpoint empties the in-memory cache, writes a checkpoint record, flushes the logs and writes a list of open database files.
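Why a checkpoint shortens recovery can be shown with a toy model. The class below is purely illustrative and is not how Berkeley DB is implemented: the dicts and the list stand in for the database file, the in-memory cache and the transaction log, and all names are invented for this sketch.

```python
class ToyStore:
    """Toy write-ahead log with checkpoints (an illustration, not Berkeley DB)."""

    def __init__(self):
        self.data_file = {}      # stands in for the on-disk database file
        self.cache = {}          # in-memory changes not yet written to the file
        self.log = []            # stands in for the transaction log
        self.checkpoint_lsn = 0  # log position recorded by the last checkpoint

    def set(self, key, value):
        self.log.append((key, value))  # record the change in the log first
        self.cache[key] = value

    def checkpoint(self):
        self.data_file.update(self.cache)   # write cached changes to the file
        self.cache.clear()                  # empty the in-memory cache
        self.checkpoint_lsn = len(self.log) # remember where recovery may start

    def recover(self):
        """Return what a crash recovery would rebuild, and how many log records it replays."""
        state = dict(self.data_file)
        to_replay = self.log[self.checkpoint_lsn:]
        for key, value in to_replay:
            state[key] = value
        return state, len(to_replay)
```

Without a checkpoint, recovery has to replay every record ever logged. After a checkpoint, only the records written since then need replaying.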

Berkeley DB also allows hot backups and uses gzip and tar to compress the backup.

Monitoring

MemcacheDB has a lot of built-in commands for monitoring, such as:

What I liked most about Memcached is that you can use telnet to connect to the running process and issue commands from the command prompt. The same is true for MemcacheDB.
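The same thing you would type in a telnet session can be scripted. The sketch below assumes the memcached-style stats command, whose reply is a series of "STAT name value" lines terminated by END. The host and port are whatever your own instance listens on, and the function names are mine.

```python
import socket

def parse_stats(raw):
    """Turn a raw 'stats' reply into a dict of name -> value strings."""
    stats = {}
    for line in raw.decode("ascii").split("\r\n"):
        if line == "END":
            break
        if line.startswith("STAT "):
            _tag, name, value = line.split(" ", 2)
            stats[name] = value
    return stats

def fetch_stats(host, port, timeout=2.0):
    """Send 'stats' to a running server and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        raw = b""
        while not raw.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            raw += chunk
    return parse_stats(raw)
```

Values are kept as strings on purpose, since not every statistic is a number.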

Besides the built-in memcached functions, the Berkeley DB engine comes with its own stats command:

db_stat, with the options: -c locking statistics, -l logging statistics, -m cache statistics, -r replication statistics, -t transaction statistics.
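If you want to collect these statistics from a script, a small wrapper like the one below could do it. It is only a sketch: it assumes the Berkeley DB utility is installed under the name db_stat (some distributions add a version to the name) and that the environment directory is passed with the -h option. The helper names and the example path are hypothetical.

```python
import subprocess

# Flags as listed above, keyed by a readable name.
DB_STAT_FLAGS = {
    "locking": "-c",
    "logging": "-l",
    "cache": "-m",
    "replication": "-r",
    "transaction": "-t",
}

def db_stat_command(env_home, kind, binary="db_stat"):
    """Build the argument list for one kind of statistics."""
    return [binary, "-h", env_home, DB_STAT_FLAGS[kind]]

def run_db_stat(env_home, kind):
    """Run the utility and return its text output (raises if the command fails)."""
    result = subprocess.run(
        db_stat_command(env_home, kind),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, run_db_stat("/path/to/env", "cache") would print the cache statistics for the environment stored in that directory.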

Overall I liked what I saw of this alternative, and I think it is the most suitable solution for storing user profiles and user data that doesn’t need to be queried. When you need to scale, this is for sure a very reliable solution. Have fun!

Further reading

Homepage: http://memcachedb.org

Mailing list: http://groups.google.com/group/memcachedb

Comments

One Response to “Key-Value Storage using MemcacheDB”

  1. Filip on April 7th, 2009 2:49 am

    Readers of this article should also take a look at Starling here: http://rubyforge.org/projects/starling/