I'll try to answer your question, but I'll start with something that may look strange at first: if you are not interested in Redis internals you should not care about how data types are implemented internally. This is for a simple reason: for every Redis operation you'll find the time complexity in the documentation and, if you have the set of operations and the time complexity, the only other thing you need is some clue about memory usage (and because we do many optimizations that may vary depending on data, the best way to get these latter figures are doing a few trivial real world tests).
But since you asked, here is the underlying implementation of every Redis data type.
- Strings are implemented using a C dynamic string library so that we don't pay (asymptotically speaking) for allocations in append operations. This way we have O(N) appends, for instance, instead of having quadratic behavior.
- Lists are implemented with linked lists.
- Sets and Hashes are implemented with hash tables.
- Sorted sets are implemented with skip lists (a peculiar type of balanced trees).
But when lists, sets, and sorted sets are small in number of items and size of the largest values, a different, much more compact encoding is used. This encoding differs for different types, but has the feature that it is a compact blob of data that often forces an O(N) scan for every operation. Since we use this format only for small objects this is not an issue; scanning a small O(N) blob is cache oblivious so practically speaking it is very fast, and when there are too many elements the encoding is automatically switched to the native encoding (linked list, hash, and so forth).
But your question was not really just about internals, your point was What type to use to accomplish what?.
Strings
This is the base type of all the types. It's one of the four types but is also the base type of the complex types, because a List is a list of strings, a Set is a set of strings, and so forth.
A Redis string is a good idea in all the obvious scenarios where you want to store an HTML page, but also when you want to avoid converting your already encoded data. So for instance, if you have JSON or MessagePack you may just store objects as strings. In Redis 2.6 you can even manipulate this kind of object server side using Lua scripts.
Another interesting usage of strings is bitmaps, and in general random access arrays of bytes, since Redis exports commands to access random ranges of bytes, or even single bits. For instance check this good blog post: Fast Easy real time metrics using Redis.
Lists
Lists are good when you are likely to touch only the extremes of the list: near tail, or near head. Lists are not very good to paginate stuff, because random access is slow, O(N).
So good uses of lists are plain queues and stacks, or processing items in a loop using RPOPLPUSH with same source and destination to "rotate" a ring of items.
Lists are also good when we want just to create a capped collection of N items where usually we access just the top or bottom items, or when N is small.
Sets
Sets are an unordered data collection, so they are good every time you have a collection of items and it is very important to check for existence or size of the collection in a very fast way. Another cool thing about sets is support for peeking or popping random elements (SRANDMEMBER and SPOP commands).
Sets are also good to represent relations, e.g., "What are friends of user X?" and so forth. But other good data structures for this kind of stuff are sorted sets as we'll see.
Sets support complex operations like intersections, unions, and so forth, so this is a good data structure for using Redis in a "computational" manner, when you have data and you want to perform transformations on that data to obtain some output.
Small sets are encoded in a very efficient way.
Hashes
Hashes are the perfect data structure to represent objects, composed of fields and values. Fields of hashes can also be atomically incremented using HINCRBY. When you have objects such as users, blog posts, or some other kind of item, hashes are likely the way to go if you don't want to use your own encoding like JSON or similar.
However, keep in mind that small hashes are encoded very efficiently by Redis, and you can ask Redis to atomically GET, SET or increment individual fields in a very fast fashion.
Hashes can also be used to represent linked data structures, using references. For instance check the lamernews.com implementation of comments.
Sorted Sets
Sorted sets are the only other data structures, besides lists, to maintain ordered elements. You can do a number of cool stuff with sorted sets. For instance, you can have all kinds of Top Something lists in your web application. Top users by score, top posts by pageviews, top whatever, but a single Redis instance will support tons of insertion and get-top-elements operations per second.
Sorted sets, like regular sets, can be used to describe relations, but they also allow you to paginate the list of items and to remember the ordering. For instance, if I remember friends of user X with a sorted set I can easily remember them in order of accepted friendship.
Sorted sets are good for priority queues.
Sorted sets are like more powerful lists where inserting, removing, or getting ranges from the the middle of the list is always fast. But they use more memory, and are O(log(N)) data structures.
Conclusion
I hope that I provided some info in this post, but it is far better to download the source code of lamernews from http://github.com/antirez/lamernews and understand how it works. Many data structures from Redis are used inside Lamer News, and there are many clues about what to use to solve a given task.
Sorry for grammar typos, it's midnight here and too tired to review the post ;)
First, you may want to read the Redis benchmark page. It provides a good summary of the main points to check to tune Redis.
Even supposing you do not use pipelining, 300 GETs in 150 ms is not that efficient. It means the average latency is 500 us. However, it actually depends on the size of your objects. Larger objects, more latency. On my very old 2 GHz AMD box, I can measure 150 us latencies for small objects (a few bytes).
To quickly check the average latency of the Redis instance, you can use:
$ redis-cli --latency
Be sure to use a recent Redis version (not 2.4) to get this option.
Note: 2.4 is quite old now, use Redis 2.6 - compile your own Redis version if needed, it is really straightforward.
To quickly run a benchmark to study latency, you can launch:
$ redis-benchmark -q -n 10000 -c 1 -d average_size_of_your_objects_in_bytes
It runs with a unique connection and no pipelining, so the latency can be deduced from throughput. Try to compare the result of these benchmarks to the figures measured with your application.
There are a number of points you may want to check:
- Which Redis client library do you use? With which development language? For some scripting languages, you need to install the hiredis module to get an efficient client.
- Is your machine a VM? On which OS?
- Are the connections to Redis persistent? (i.e. you are not supposed to connect/disconnect at each HTTP request of your app server).
Why is it better with memcached? Well, a single memcached instance is certainly more scalable, and may be more responsive than a single Redis instance, since it may run on multiple threads. Redis is fast, but single-threaded - the execution of all the commands is serialized. So when a command is on-going for a connection, all the other clients have to wait - a bad latency on a given command will also impacts all the pending commands. Generally, at low throughput, performance are comparable though.
At 1000 q/s (a low throughput by Redis or memcached standards), I would say it is more probable your problem is on client-side (i.e. choice of the client library, connection/disconnection, etc ...), than with Redis server itself.
Finally I should mention that if you generate a number of Redis queries per HTTP request, consider pipelining the commands you send to Redis as far as possible. It is really a key point to develop efficient Redis applications.
If your application servers are on the same box as Redis, you can also use unix domain sockets instead of the TCP loopback to connect to Redis. It slightly improves performance (up to 50% more throughput when pipelining is NOT used).
Best Answer
The main reason I see today as an use-case for memcached over Redis is the superior memory efficiency you should be able to get with plain HTML fragments caching (or similar applications). If you need to store different fields of your objects in different memcached keys, then Redis hashes are going to be more memory efficient, but when you have a large number of key -> simple_string pairs, memcached should be able to give you more items per megabyte.
Other things which are good points about memcached:
I believe that Redis as a cache makes more and more sense as people move towards intelligent caching or when they try to preserve structure of the cached data via Redis data structures.
Comparison between Redis LRU and memcached LRU.
Both memcached and Redis don't perform real LRU evictions, but only an approximation of that.
Memcache eviction is per-size class and depends on the implementation details of its slab allocator. For example if you want to add an item which fits in a given size class, memcached will try to remove expired / not-recently-used items in that class, instead to try a global attempt to understand what is the object, regardless of its size, which is the best candidate.
Redis instead tries to pick a good object as a candidate for eviction when the
maxmemory
limit is reached, looking at all the objects, regardless of the size class, but is only able to provide an approximately good object, not the best object with the greater idle time.The way Redis does this is by sampling a few objects, picking the one which was idle (not accessed) for the longest time. Since Redis 3.0 (currently in beta) the algorithm was improved and also takes a good candidates pools across evictions, so the approximation was improved. In the Redis documentation you can find a description and graphs with details about how it works.
Why memcached has a better memory footprint than Redis for simple string -> string maps.
Redis is a more complex piece of software, so values in Redis are stored in a way more similar to objects in a high level programming language: they have associated type, encoding, reference counting for memory management. This makes Redis internal structure good and manageable, but has an overhead compared to memcached which only deals with strings.
When Redis starts to be more memory efficient
Redis is able to store small aggregate data types in a special memory saving way. For example a small Redis Hash representing an object, is stored internally not with an hash table, but as a binary unique blob. So setting multiple fields per object into an hash is more efficient than storing N separated keys into memcached.
You can, actually, store an object into memcached as a single JSON (or binary-encoded) blob, but contrary to Redis, this will not allow you to fetch or update independent fields.
The advantage of Redis in the context of intelligent caching.
Because of Redis data structures, the usual pattern used with memcached of destroying objects when the cache is invalidated, to recreate it from the DB later, is a primitive way of using Redis.
For example, imagine you need to cache the latest N news posted into Hacker News in order to populate the "Newest" section of the site. What you do with Redis is to take a list (capped to M items) with the newest news inserted. If you use another store for your data, and Redis as a cache, what you do is to populate both the views (Redis and the DB) when a new item is posted. There is no cache invalidation.
However the application can always have logic so that if the Redis list is found to be empty, for example after a startup, the initial view can be re-created from the DB.
By using intelligent caching it is possible to perform caching with Redis in a more efficient way compared to memcached, but not all the problems are suitable for this pattern. For example HTML fragments caching may not benefit from this technique.