From an interview with Steve Souders published on ☞ O’Reilly Radar:
The first thing I recommend website owners do is get a handle, get an idea of the overall page load time. And then the second thing I say is break that into two parts: the back-end and the front-end. And if you find that like most websites, your back-end time is less than 10 or 20 percent, then you’re correct in focusing on these front-end best practices. If your back-end time is 30 to 50 percent or more, then you should really start on looking at your back-end architecture.
And in case you are running a media company, you should also check the part of the interview focused on the degradation of performance due to non-optimized ad servers.
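As a rough illustration of the rule of thumb Souders describes, here is a minimal sketch. The function names and the sample timings are hypothetical, and the handling of the 20 to 30 percent band is my own assumption, since the quote does not cover it.

```python
def backend_share(backend_ms: float, total_ms: float) -> float:
    """Return the fraction of the total page load time spent on the back-end."""
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    return backend_ms / total_ms


def where_to_focus(backend_ms: float, total_ms: float) -> str:
    """Apply the thresholds from the quote to one measured page load."""
    share = backend_share(backend_ms, total_ms)
    if share < 0.20:
        return "front-end"   # back-end is under 10-20% of the total
    if share >= 0.30:
        return "back-end"    # back-end is 30-50% or more of the total
    return "unclear"         # assumption: the quote says nothing about 20-30%


# Hypothetical measurements: 300 ms of back-end time in a 3000 ms page load.
print(where_to_focus(300, 3000))
```

The point of the split is simply to measure first and only then decide which half of the stack deserves the optimization effort.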
Good application architectures are supposed to transcend their platform, so you might find this book useful. It is also available for free in PDF format.
After writing about MonetDB and InfoBright, the MySQL Performance guys are now taking a look at ☞ Tokyo Tyrant, the network interface of the NOSQL Tokyo Cabinet solution.
Their observations span 3 articles (☞ part 1, ☞ part 2 and ☞ part 3) and cover subjects like data durability (the D in ACID) and read and write performance.
And if I mentioned Tokyo Cabinet, I should point you to a presentation given by Ilya Grigorik (of PostRank): ☞ Lean & Mean Tokyo Cabinet Recipes
Then, if you have another 57 minutes, you can watch the O’Reilly Webcast: Tokyo Cabinet in One Hour:
Protocol buffers (or ☞ protobuf) are a way of encoding structured data in an efficient yet extensible format. The project was released by Google, which uses it internally for RPC protocols and file formats. A while back, Facebook released ☞ Thrift, which serves the same purpose. And people have spent time trying to figure out which one outperforms the other.
InfoQ has published an article about ☞ BERT, a new format that now powers GitHub’s backend and differs from the two above.
And there is also ☞ BSON the format introduced by MongoDB, which is a binary-encoded serialization of JSON-like documents.
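To give a feel for what a "binary-encoded serialization of JSON-like documents" looks like on the wire, here is a minimal sketch of a BSON encoder that handles only string and 32-bit integer fields. The function name `encode_bson` is hypothetical and this is not the MongoDB driver API; it only follows the basic layout from the BSON specification (little-endian length prefix, typed elements, trailing null byte).

```python
import struct


def encode_bson(doc: dict) -> bytes:
    """Encode a flat dict of str/int32 values as a BSON document (sketch only)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"          # element name is a C string
        if isinstance(value, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # type 0x02: length-prefixed UTF-8 string (length includes the null)
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # type 0x10: 32-bit signed integer, little-endian
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")
    # total size = 4-byte length prefix + body + 1 terminating null byte
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


print(encode_bson({"hello": "world"}).hex())
```

Unlike protobuf or Thrift, no separate schema file is involved: field names and type tags travel with every document, which is what makes it "JSON-like".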
This is just a quick thought I had this morning: KV (key-value) storage solutions excel at item-based read/write throughput, but struggle with anything that involves range queries. Column-based storage solutions probably do not offer the same read/write throughput, but they have a better chance of offering range queries.
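A toy sketch of the intuition, not a model of any particular product: a hash-based store answers a single-key lookup directly but has to scan every key to answer a range query, while a store that keeps its keys ordered can jump to the start of the range. The class names `HashStore` and `SortedStore` are hypothetical.

```python
import bisect


class HashStore:
    """Hash-based key-value store: fast get/put, range needs a full scan."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def range(self, lo, hi):
        # No key ordering is available, so every key has to be inspected.
        return sorted((k, v) for k, v in self._data.items() if lo <= k < hi)


class SortedStore:
    """Store that keeps keys ordered: slower inserts, cheap range queries."""

    def __init__(self):
        self._keys = []
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)   # keep the key list sorted
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def range(self, lo, hi):
        # Binary search for both ends, then read a contiguous slice.
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return [(k, self._data[k]) for k in self._keys[start:end]]


stores = (HashStore(), SortedStore())
for store in stores:
    for i in range(10):
        store.put(f"user:{i:03d}", i)
print(stores[1].range("user:003", "user:006"))
```

Both stores return the same answer; the difference is how much work each one does to get there as the number of keys grows.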
I’ll probably have to check this by looking at some of the solutions included in the NOSQL reference.
Meanwhile, what do you think? Are there any other upfront ‘advantages’?
Everyone interested in learning or applying REST should definitely start by reading Roy Fielding’s Architectural Styles and the Design of Network-based Software Architectures ☞, which focuses on the rationale behind the design of the modern Web architecture and how it differs from other architectural styles.
Over time, I’ve put together a list of articles that introduced me to REST and helped me learn the concepts and how to apply them. I’m not going to include all my Delicious tagged bookmarks here, but only quote the ones that I think will give you a good head start.
This article written by Stefan Tilkov talks about the five key principles of REST:
- Give every “thing” an ID
- Link things together
- Use standard methods
- Resources with multiple representations
- Communicate statelessly
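The five principles above can be sketched in a few lines. This is only an illustration under my own assumptions: the `ORDERS` data, the URI layout, and the `handle` function are hypothetical, and a real service would sit behind an actual HTTP server.

```python
import json

# Hypothetical data: one order that belongs to customer 7.
ORDERS = {"42": {"item": "book", "customer": "7"}}


def handle(method: str, uri: str, accept: str = "application/json"):
    """Return (status, content_type, body) for a single self-contained request.

    Nothing is remembered between calls: everything needed to answer is in
    the arguments (communicate statelessly).
    """
    prefix = "/orders/"
    if not uri.startswith(prefix):
        return 404, "", ""
    order_id = uri[len(prefix):]              # every "thing" has an ID (its URI)
    order = ORDERS.get(order_id)
    if order is None:
        return 404, "", ""
    if method != "GET":                       # standard methods only; this
        return 405, "", ""                    # sketch implements just GET
    if accept == "text/plain":                # multiple representations
        return 200, "text/plain", f"order {order_id}: {order['item']}"
    representation = {
        "item": order["item"],
        "links": {                            # link things together
            "self": uri,
            "customer": f"/customers/{order['customer']}",
        },
    }
    return 200, "application/json", json.dumps(representation)


print(handle("GET", "/orders/42"))
print(handle("GET", "/orders/42", accept="text/plain"))
```

The client never builds the customer URI itself; it follows the link found in the representation it was given.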
Another great article from Stefan Tilkov:
Invariably, learning about REST means that you’ll end up wondering just how applicable the concept really is for your specific scenario. And given that you’re probably used to entirely different architectural approaches, it’s only natural that you start doubting whether REST, or rather RESTful HTTP, really works in practice, or simply breaks down once you go beyond introductory, “Hello, World”-level stuff.
Mark Baker’s article is about the “Link things together” principle (formally known as hypermedia):
[…] there’s a sub-constraint that goes by the unwieldly name of “Hypermedia as the engine of application state”, which is arguably the most important constraint of REST in the sense that it alone provides the bulk of the “shape” of RESTful systems as we know them.
Once you are used to the REST concepts, the next thing that might prove useful is to see real examples. Gregor Roth’s article is a good intro to designing your RESTful HTTP app:
Applications which implement a dedicated architecture style will use the same patterns and other architectural elements such as caching or distribution strategies in the same way.
When people start trying out REST, they usually start looking around for examples – and not only find a lot of examples that claim to be “RESTful”, or are labeled as a “REST API”, but also dig up a lot of discussions about why a specific service that claims to do REST actually fails to do so.
There is also the excellent book written by Leonard Richardson and Sam Ruby: RESTful Web Services: Web Services for the real world ☞:
You’ve built web sites that can be used by humans. But can you also build web sites that are usable by machines? That’s where the future lies, and that’s what this book shows you how to do. Today’s web service technologies have lost sight of the simplicity that made the Web successful. This book explains how to put the “Web” back into web services with REST, the architectural style that drives the Web.
If there are other resources that you consider fundamental to learning and applying REST, please drop a comment.
Disclaimer: Most of the linked articles have been published on InfoQ, but their inclusion here is based on my learning experience only.
My apologies for the typos in the post. If you find more please do let me know.
The article shows some awesome write performance on Redis:
The benchmarks I’ve done were for applications which is very update intensive with updates being pretty much random single row updates which are hard to batch. With MySQL/Innodb I got server being able to handle some 30.000 updates/sec on 16 core server with replication being able to handle 10.000 updates/sec. This was using about 5 cores so you could probably get 4 MySQL instances on this server and get up to 100K updates/sec with up to 40K updates/sec being able to replicate.
With Redis I got about 3 times more updates/sec – close to 100.000 updates/sec with about 1.5 core being used. I have not tried running multiple instances and I’m not sure the network and TCP stack would scale linearly in this case but anyway we’re speaking about hundreds of thousands of updates/sec.
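Raw updates/sec hides how differently the two systems use the CPU. The back-of-the-envelope calculation below uses only the approximate figures quoted above ("some 30.000" on about 5 cores, "close to 100.000" on about 1.5 cores); the function name is hypothetical and the result is only as precise as those rounded numbers.

```python
def updates_per_core(updates_per_sec: float, cores_used: float) -> float:
    """Throughput normalized by the number of CPU cores actually used."""
    return updates_per_sec / cores_used


# Approximate figures taken from the quoted benchmark.
mysql_per_core = updates_per_core(30_000, 5)      # MySQL/InnoDB, ~5 cores
redis_per_core = updates_per_core(100_000, 1.5)   # Redis, ~1.5 cores

ratio = redis_per_core / mysql_per_core
print(f"MySQL: {mysql_per_core:.0f} updates/sec per core")
print(f"Redis: {redis_per_core:.0f} updates/sec per core")
print(f"Redis does roughly {ratio:.0f}x more updates per core")
```

So the "about 3 times more updates/sec" headline becomes roughly an order of magnitude once it is normalized per core.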
Maybe a bit old, but definitely worth the read. Make sure you check the comment thread too, as it adds a lot of details.
In case you read Romanian (or you’d like to test the Google Translate service), here is another analysis done for a Ruby on Rails app ☞.
Update: MongoDB 1.0.0 is production ready and available for download ☞.
MongoDB is getting closer to the 1.0 release. From mongodb:
MongoDB 0.9.10 has been released. This release fixes a few minor bugs in 0.9.9 in preperation for 1.0. Please give it a try and let us know if there are any issues.
Downloads: http://www.mongodb.org/display/DOCS/Downloads
Jira change log: http://jira.mongodb.org/browse/SERVER/fixforversion/10035
Git change log: http://mongo-db.appspot.com/changelog/mongo/0.9.10
From Evan Weaver (@evan):
Ok, here are the common Cassandra misconceptions, and their sources, gleaned from experience and talking to various people.
Some say that the best way to understand the Cassandra data model is to look at the Twitter model mapped in Cassandra:

An interesting remark about what commodity hardware means:
When I first heard the term, I (and many others I know) considered this to be on the order of desktop-grade machines - the machines I’d purchased were Dual Core 2+ Ghz Dell Desktops (purchased on eBay for $350 a piece). Well, you can definitely do certain tasks within the framework with these types of machines, but an ideal configuration consists of something much stronger - server-grade, quad core, 8Gigs of RAM, etc. HBase (particularly if you are going to do a lot of writes), needs really good Machine IO. If you are going to try to use machines with slow drives and controllers, it might be possible if you have a ton of datanodes, but not as advisable on smaller clusters.