Tracking change and innovation in the enterprise software development community
Posted by Srini Penchikala on Oct 06, 2008 10:30 AM
Memcached is a distributed memory object caching system used in dynamic web applications to alleviate database load. It is used to speed up dynamic database-driven websites by caching data and objects in memory to reduce the number of times the database must be read. Memcached is based on a hashmap storing key/value pairs. The daemon is written in C, but clients can be written in any language and talk to the daemon via the memcached protocol. But it does not provide any redundancy (e.g. via replication of its hashmap entries); when a server S is stopped or crashes, all key/value pairs hosted by S are lost.
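As a rough illustration of what "talking to the daemon via the memcached protocol" involves, the sketch below frames a `set` and a `get` request in memcached's text protocol (`set <key> <flags> <exptime> <bytes>` followed by the data block, each terminated by CRLF). The class and method names are hypothetical, and the sketch only builds the request bytes; it does not open a socket or parse replies.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of memcached or JGroups): builds text-protocol requests.
final class MemcachedTextProtocolSketch {

    // Frames a storage request: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n".
    static byte[] encodeSet(String key, int flags, int exptimeSeconds, byte[] data) {
        String header = "set " + key + " " + flags + " " + exptimeSeconds
                + " " + data.length + "\r\n";
        byte[] headerBytes = header.getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headerBytes, 0, headerBytes.length);
        out.write(data, 0, data.length); // the value must already be marshalled to bytes
        out.write('\r');
        out.write('\n');
        return out.toByteArray();
    }

    // Frames a retrieval request: "get <key>\r\n".
    static byte[] encodeGet(String key) {
        return ("get " + key + "\r\n").getBytes(StandardCharsets.US_ASCII);
    }
}
```

Every value crosses the wire as an opaque byte block, so a Java client has to serialize objects before a `set` and deserialize them after a `get`.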
Bela Ban, JGroups and Clustering Team Lead at JBoss, recently wrote a JGroups-based implementation of memcached which allows Java clients to access memcached directly. The implementation is written completely in Java and has a few advantages over the memcached framework:
The main class to start the JGroups memcached implementation is org.jgroups.demos.MemcachedServer. It creates an L1 cache (if configured), an L2 cache (that's the default hashmap storing all entries), and a MemcachedConnector. The API is very simple and includes the following caching methods:
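To make the L1/L2 arrangement more concrete, here is a minimal sketch of a two-level lookup with per-entry caching time. It is not the JGroups API: the class `TwoLevelCacheSketch`, its method names, and the injected clock are all hypothetical, and the L2 map here is a plain local map standing in for the distributed hashmap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the JGroups API: a front (L1) cache over a backing (L2) map.
final class TwoLevelCacheSketch {

    // A value plus its absolute expiry time in ms; 0 means "never expires".
    private static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> l1 = new ConcurrentHashMap<>();
    private final Map<String, Entry> l2 = new ConcurrentHashMap<>(); // stand-in for the distributed map
    private final LongSupplier clock; // injected so expiry can be tested deterministically

    TwoLevelCacheSketch(LongSupplier clock) { this.clock = clock; }

    // Stores the value in L2; a caching time <= 0 means the entry never expires.
    void put(String key, String value, long cachingTimeMillis) {
        long expiresAt = cachingTimeMillis <= 0 ? 0 : clock.getAsLong() + cachingTimeMillis;
        l2.put(key, new Entry(value, expiresAt));
        l1.remove(key); // drop any stale front-cache copy
    }

    // Looks in L1 first, falls back to L2, and promotes L2 hits into L1.
    String get(String key) {
        Entry e = l1.get(key);
        if (e == null) {
            e = l2.get(key);
            if (e != null) l1.put(key, e);
        }
        if (e == null) return null;
        if (e.expiresAt != 0 && clock.getAsLong() >= e.expiresAt) {
            remove(key); // expired: evict from both levels
            return null;
        }
        return e.value;
    }

    void remove(String key) {
        l1.remove(key);
        l2.remove(key);
    }

    // Exposed only so the promotion behaviour can be observed.
    boolean inL1(String key) { return l1.containsKey(key); }
}
```

A caller would construct it with `System::currentTimeMillis` as the clock; the first `get` after a `put` is served from L2 and later ones from L1 until the entry expires.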
InfoQ spoke with Bela Ban about the motivation behind the JGroups implementation of memcached. He said that the JGroups implementation of memcached allows them to experiment with a distributed cache and see how the various caching strategies fit into JBoss Clustering. He also explained how this new memcached implementation compares with the JBossCache caching framework:
We see caching as a continuum between distributing data across multiple nodes (hosts) in a cluster (without redundancy) and fully replicating data (total replication of every data item to every cluster node). Between distribution and total replication, we have buddy replication, which replicates data to a few selected backup nodes. This can be compared to RAID, where RAID 0 has no redundancy (distribution), RAID 0+1 has full redundancy and RAID 5 has partial redundancy.
Currently, the PartitionedHashMap in JGroups provides distribution, and JBossCache provides total replication and partial replication (with Buddy Replication). The idea is to let the user define K *per data item* they place into the cluster. This ranges from K=0, which means distribution (but if a node which hosts one or more stripes crashes, the data is gone), to K=X (where X < N), which is like RAID 5, to K=N, which is total replication.
The memcached implementation in JGroups is a first step to experiment with K=0, which is pure data distribution without redundancy. This will eventually make it into JBossCache.
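The "K per data item" idea can be sketched with a simple hash ring: each key gets a primary owner plus up to K backup nodes. This is only an illustration under stated assumptions, not how JGroups or JBossCache place data. The class `ReplicaPlacementSketch` is hypothetical, K is interpreted as the number of extra copies (capped by the cluster size), and CRC32 is used purely as a convenient hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch: choose a primary owner plus up to k backups for a key.
final class ReplicaPlacementSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ReplicaPlacementSketch(List<String> nodes) {
        for (String node : nodes) ring.put(hash(node), node);
    }

    // k = 0 gives pure distribution (one copy); a large k degenerates to total replication.
    List<String> ownersFor(String key, int k) {
        int copies = Math.min(k + 1, ring.size());
        List<String> owners = new ArrayList<>(copies);
        // Walk clockwise from the first node at or after the key's hash...
        for (String node : ring.tailMap(hash(key), true).values()) {
            if (owners.size() == copies) return owners;
            owners.add(node);
        }
        // ...then wrap around to the start of the ring if more copies are needed.
        for (String node : ring.values()) {
            if (owners.size() == copies) break;
            owners.add(node);
        }
        return owners;
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

With four nodes, `ownersFor(key, 0)` returns a single owner, so losing that node loses the entry, while `ownersFor(key, 3)` returns all four nodes.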
Where does the memcached implementation fit in the JBoss Application Server modules?
It will be part of the Clustering subsystem, provided by JBossCache. Note that our implementation is really written with "Java" clients in mind, so we don't have to use that terribly inefficient memcached protocol, with the marshalling/unmarshalling/copying overhead.
Talking about the typical use cases for the JGroups implementation of memcached, Bela said:
The server side code (e.g. servlets) running in a JBoss or Tomcat cluster, which accesses a DB and needs a cache to speed up things and remove a DB bottleneck. The other use case is similar, but instead of accessing a DB, access is to the file system. For example, an HTML page caching server (Squid comes to mind).
Are there any plans to introduce memcached into JBoss Application Server in the future?
Absolutely. The Data Partitioning feature will allow users to configure caching according to their needs. So having something like a distributed cache is not a new feature in itself, but a matter of JBossCache configuration. The cool thing is that this can be dynamic, so developers can decide which redundancy features (none=distribution, full=total replication or partial) they want per data item they put into JBossCache.
Regarding the future direction of the project in terms of new features, Bela listed the things that are on the to-do list:
The JGroups implementation of memcached and its library dependencies can be downloaded from the project's SourceForge website. Below is the command to launch the program:
java -jar memcached-jgroups.jar
Bela is looking for feedback from the community. He said this is an experimental feature, but it will become a supported feature of JBossCache, and community input will have a great influence on its direction.
I'm a little unclear what this has to do with memcached actually. I think it would be interesting to see support for the terribly inefficient memcached protocol that the rest of the world seems to like so much. If the ASCII protocol is so terrible and inefficient, then maybe the new binary protocol? Then you could open this thing up to non-Java clients. Then you've got some real comparison to memcached and its versatility. That would be interesting.
My worry of the month lately is how Java handles multi-gig heaps when dealing with cache-like semantics (i.e. lots of long-lived objects, and lots of short-lived objects living long enough to get out of eden due to TTL or expiration, only to die).
One of the real selling points of the C memcached is the simple and robust slab memory allocator. Does anybody have any info on how Java's GC compares? I could imagine some cases in which it could do better, since it doesn't have to actually grab and free, but in general, I worry about triggering stop-the-world pauses constantly due to cache churn.
I agree 100%. I know Danga had started an effort to provide a binary protocol, but so far no results.
Yes, the slab allocator is certainly a prominent feature of memcached. Coincidentally, we had this discussion on the JGroups dev mailing list some weeks ago too. I copied the relevant section below:
Correct, but that's a feature of Java versus C in general, and not
PartitionedHashMap in particular.
memcached uses something similar to a buddy memory allocation scheme
([1]), which is great, but they need to make sure they don't waste
memory. For instance, if you always allocate pages of 500 bytes, then
this mechanism is not the best, because the smaller pages won't get
used, and the larger pages are wasted, unless they get fragmented into
smaller ones.
I'll take the stance that, unless you know exactly what the avg size of
your app's memory requirements is, the OS does a better job at
allocating memory and in addition you'll benefit from future
improvements in the mem allocator code of your OS.
memcached probably shines when you know exactly what the memory pages
sizes are and you change the src to accommodate that.
I'd also claim that even with GC, this is very useful, because the few
GC cycles are a good tradeoff against having to go to the DB.
Note that we could implement something like memcached's memory
allocator, too: grab direct memory (ByteBuffer.allocateDirect() /
MappedByteBuffer), divide it up into lists of fixed sizes (buddy pools)
and then use those buffers to store data. Direct memory is allocated
outside of the Java heap, so it will never get garbage collected, but
TBH I'm not sure this is a good idea. I mean, we're coming back to Java
versus C here. There's a reason I switched and a big part is garbage
collection and the avoidance of dangling pointers.
I've attached the doc describing memcached's memory allocation strategy.
[1] en.wikipedia.org/wiki/Buddy_memory_allocation
Hanson Char wrote:
> One of the major benefits of using the native memcached is that unlike
> a JVM, GC (full GC in particular) can be entirely avoided. Wouldn't
> that benefit be lost if a memcached impl is done entirely in Java ?
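The direct-memory idea floated in the mailing-list excerpt above (grab off-heap memory, carve it into fixed-size buffers, reuse them) can be sketched as follows. This is a minimal illustration with a single size class; the class `DirectSlabPoolSketch` is hypothetical and is neither memcached's allocator nor anything in JGroups.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a pool of fixed-size chunks carved out of one direct buffer.
final class DirectSlabPoolSketch {
    private final int chunkSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    DirectSlabPoolSketch(int chunkSize, int chunkCount) {
        this.chunkSize = chunkSize;
        // One off-heap allocation up front; the chunks below are views into it.
        ByteBuffer slab = ByteBuffer.allocateDirect(chunkSize * chunkCount);
        for (int i = 0; i < chunkCount; i++) {
            slab.limit((i + 1) * chunkSize);
            slab.position(i * chunkSize);
            free.push(slab.slice()); // independent position/limit, shared memory
        }
    }

    // Returns a free chunk, or null when the pool is exhausted
    // (a real cache would evict an old entry instead).
    synchronized ByteBuffer acquire() {
        ByteBuffer chunk = free.poll();
        if (chunk != null) chunk.clear();
        return chunk;
    }

    // Hands a chunk back for reuse; no memory is freed or garbage collected.
    synchronized void release(ByteBuffer chunk) {
        if (!chunk.isDirect() || chunk.capacity() != chunkSize) {
            throw new IllegalArgumentException("buffer does not belong to this pool");
        }
        free.push(chunk);
    }

    synchronized int available() { return free.size(); }
}
```

The trade-off raised in the excerpt still applies: values smaller than the chunk size waste the remainder, and the caller takes on manual release discipline that the garbage collector would otherwise handle.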
I'm sure the JVM guys have explored plain slab allocation before. The trouble is how do you handle fragmentation in a multi-threaded environment without compaction? That's probably why there are so many GC settings to accommodate different application requirements.
A few months ago, I had written a small blog entry related to allocators - javaforu.blogspot.com/2008/05/memory-dont-forge...
You might also find this interesting - blogs.sun.com/jonthecollector/entry/our_collectors.
JVMs also have thread-local allocation buffers (TLABs), conceptually similar to the arena allocators found in Google's TCMalloc and other such malloc alternatives.
Ashwin.