There is a lot of innovation happening right now in the alternative data storage space. People working on these projects have already started a NOSQL community, and those who are not part of it yet are trying to come up with schema-less approaches on top of relational databases.
There are a few things I am concerned about though. It looks like each of these solutions is inventing its own API and using its own protocol (be it memcached(-like), Protocol Buffers, Thrift, completely custom, etc.). I am not sure what the adoption status of these solutions is right now, but I believe that over time these inconsistencies will become extremely expensive. It is probably still early, but I really wish the NOSQL guys would start talking sooner rather than later about common APIs and protocols. (n.b. I am aware that it is almost impossible to expose the whole functionality of these systems through a common API, but I’m pretty sure it will be possible to find the common points).
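To make the "common points" idea a bit more concrete, here is a minimal sketch of what a lowest-common-denominator key-value interface might look like. Everything in it is hypothetical: the `KeyValueStore` name, the `get`/`put`/`delete` methods, the `copy_key` helper and the in-memory backend are assumptions made for illustration, not the API of any existing project.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueStore(ABC):
    """Hypothetical common interface; an assumption, not a real project's API."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the stored value, or None if the key is absent."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        """Store value under key, overwriting any previous value."""

    @abstractmethod
    def delete(self, key: str) -> bool:
        """Remove key; return True if it existed."""


class InMemoryStore(KeyValueStore):
    """Toy backend standing in for a real store behind the same interface."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None


def copy_key(src: KeyValueStore, dst: KeyValueStore, key: str) -> bool:
    """Client code written once against the interface, whatever the backend."""
    value = src.get(key)
    if value is None:
        return False
    dst.put(key, value)
    return True
```

The point of the sketch is only that application code such as `copy_key` would not need to change when the backend does. Store-specific features (secondary indexes, versioning, map/reduce and so on) would still live outside such a shared core.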
I also think that anyone looking into this field will have quite a hard time figuring out what their best option is. I know that the NOSQL people are doing their best to add documentation and provide valuable help on their user groups, but there seems to be an almost complete lack of information on recommended usage scenarios. There may also be misconceptions about what "commodity hardware" means to different people.
I usually don’t trust benchmarks (micro or not), but I have to admit that VPork is an interesting and possibly very useful initiative:
With the wide range of distributed, non relational databases out there it is hard to know which one to choose. One part of the puzzle is of course performance. Personally I’m interested in low response times.
Here is my short TODO list on how to make things better:
Is there anything else you’d add to this list?