    Copyright IBM Corporation 2012 Trademarks

Introducing Riak, Part 2: Integrating Riak as a heavy-duty caching server for web applications

Using Riak as a caching server to help alleviate the load on application and database servers

    Simon Buckle

    Independent Consultant

    Freelance

    Skill Level: Intermediate

    Date: 15 May 2012

This article is Part 2 of a two-part series about Riak, a highly scalable, distributed data store written in Erlang and based on Dynamo, Amazon's high availability key-value store. For websites with heavy loads, a scalable caching solution can lighten the load on the application and database servers. This particularly applies to data that is read often but updated only occasionally. Explore an in-depth example of an online betting site and how you can use Riak to implement a caching solution. You also will learn to integrate Riak with an existing website and look at other Riak features, such as search and how to use Riak to serve user requests directly. You will need a working Riak cluster if you want to follow along with the examples. You can find the steps for setting up a cluster locally in Part 1 of this series.

Introduction

Certain types of data exhibit access patterns that lend themselves to caching. For example, online betting sites have an interesting load characteristic: odds and bet slips get requested often but are updated relatively infrequently.

These situations need a highly scalable system with the following characteristics to cope with the demands of high loads:

• The system acts as a reliable cache to reduce demand on the application servers and database
• Cached items are searchable so you can update or invalidate them
• Any solution is easily integrated into an existing site

Riak is a good choice for such a solution.

Riak is not the only candidate for implementing such a caching solution; many different caches are available. A popular one is memcached; however, unlike Riak, memcached doesn't provide any kind of data replication, meaning that if the server holding a particular item goes down, that item becomes unavailable. Redis, another popular key/value store that could be used as a cache, supports replication through a master-slave configuration. Riak, by contrast, has no concept of a master node, which makes the system more resilient to failure.

Website integration

Any solution needs to be easily integrated into an existing website. This is important because it might not be possible, or even desirable, to migrate all of your existing data into Riak. As mentioned previously, certain types of data lend themselves to caching, particularly, in the case of a key/value store, data that you access by primary key. That is the kind of data that is most suitable to migrate to Riak.

As mentioned in Part 1 of this series on Riak, a number of client libraries are available in languages such as PHP, Ruby, and Java; the libraries provide an API that makes integrating with Riak very simple. In this example, I demonstrate the use of the PHP library to show how to integrate Riak with an existing website.

Figure 1 shows the set-up to consider for this example. I left out details such as load balancing, firewall, and so on. The servers themselves, in this case, are just simple front-end boxes with a LAMP stack installed.

I will assume that Riak is only used internally (it's not accessible from the outside) and that it runs in a non-hostile environment, so there are no security-related issues such as authentication. This is not as bad an assumption as it might seem, as Riak does not have any built-in authorization anyway; you really should delegate authentication and the like to the application.

    Figure 1. A simple website integration

What follows is a basic example of how you might integrate Riak into your existing website. You will create a simple form that, when submitted, uses the PHP client to store an object in Riak based on the values that were entered in the form.

Figure 2 shows an example of a simple form that an administrator might use to create a bet entry in the system. Create this form in HTML and have it do a POST to the PHP script in Listing 1; you can use a similar form in the source code that accompanies this article as a starting point. The "key" field entered in the form will be used as the key to store the object under in the bucket.

    Figure 2. Example form for creating a bet

Listing 1 has example PHP code that shows how to use the PHP client library to integrate with Riak. Change the path to the PHP client library (specified in require_once) to wherever you have installed it. In this case, I just put it in the same directory as the PHP script. By default, all the client libraries expect Riak to be available on port 8098.

    Listing 1. Example PHP code for integrating with Riak

Save the code to a PHP file (call it whatever you like) and upload it and the form to some location on your website, for example, http://www.yoursite.com/riak-test.php. Fill out the example form and submit it. To verify that it worked, retrieve the item directly from Riak using the key you entered in the form to create the item (see Listing 2).

    Listing 2. Retrieving the item from Riak

    $ curl -i http://localhost:8098/riak/odds/...

    { "odds":"", "description":"" }

Although this integration example used the PHP client, the approach is similar for other languages or application frameworks such as Java or Ruby on Rails.
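To illustrate that the approach carries over to other languages, here is a minimal Python sketch of the same store step, using only the standard library and Riak's HTTP interface rather than a client library. It relies on the /riak/<bucket>/<key> URL form shown in Listing 2 and the default port 8098; the helper names (object_url, build_store_request) and the sample key and values are hypothetical. The request is only prepared here; nothing is sent unless you pass it to urllib.request.urlopen against a running node.

```python
import json
from urllib.parse import quote
from urllib.request import Request

RIAK_BASE = "http://localhost:8098/riak"  # default HTTP port, as in Listing 2


def object_url(bucket, key, base=RIAK_BASE):
    """Build the URL of an object from its bucket name and key."""
    return "{}/{}/{}".format(base, quote(bucket, safe=""), quote(key, safe=""))


def build_store_request(bucket, key, odds, description):
    """Prepare (but do not send) an HTTP PUT that stores a bet as JSON."""
    body = json.dumps({"odds": odds, "description": description}).encode("utf-8")
    return Request(
        object_url(bucket, key),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical bet; pass the request to urllib.request.urlopen() to send it.
request = build_store_request("odds", "race-1", "5/1", "horse racing")
```

A GET on the same URL (object_url("odds", "race-1")) then retrieves the stored item, just as the curl command in Listing 2 does.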

Serving requests directly

In addition to using the client libraries to integrate Riak into your current set-up, it's possible to serve user requests directly from Riak, using it as a simple HTTP engine. To demonstrate this, I will create a simple demo to show how you can request pages directly from Riak.

Download the source code for this article. Make sure Riak is running, then execute the script load.sh. This script copies all the HTML and JavaScript files into a bucket called demo. This example uses the JavaScript client.

To view the demo, open up this URL in your browser: http://localhost:8098/riak/demo/demo.html

If you enter some values in the form to create a bet and you submit the form, a JSON object is stored in Riak. The properties of the object will correspond to the fields in the form. You will be redirected to a page that displays the value of the object you just created.

Listing 3 shows the code for creating the object from the values you entered. The values key, odds, and description come from the values entered into the form.

Listing 3. Example use of the JavaScript client library in Riak

    // Open the "odds" bucket, then create or update the object stored under the key.
    client.bucket("odds", function(bucket) {
      var key = $('#key').val();
      bucket.get_or_new(key, function(status, object) {
        object.contentType = 'application/json';
        object.body = { 'odds': $('#odds').val(), 'description': $('#desc').val() };
        // Store the object; on success, redirect to its URL in Riak.
        object.store(function(status, object, request) {
          if (status == 'ok') {
            window.location = "http://localhost:8098/riak/odds/" + key;
          } else {
            alert("Failed to create object.");
          }
        });
      });
    });

As mentioned previously, I assume that Riak is running in a trusted environment. In this case there's no security issue from adding pages that store and retrieve items in Riak; however, you don't want to expose this kind of functionality to the Internet at large without having some form of authentication in place.

Although it's a simple example, it gives you an idea of how Riak can serve page requests directly. You could, for example, include data stored in Riak directly in your existing web pages, either by using a technique such as JSONP or cross-origin resource sharing (AJAX requests are otherwise restricted by the same-origin policy to the server the page resides on) or by proxying requests through your servers to Riak to fetch the required data.

    Using Riak as a cache

Caches are used to provide fast access to data. If requested data is contained in the cache (cache hit), the application can serve the request quickly by reading the value from the cache, which is comparatively quicker than retrieving the value from a database. If something is not in the cache (cache miss), then the application typically has to hit the database to retrieve the data. Generally, the more requests that you can serve from the cache, the faster the system will be. Riak has a number of features that make it a good choice for implementing a caching solution.
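The hit-and-miss behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dictionary stands in for the cache (in a real set-up that role would be played by Riak), and the CacheAside class, the load_from_db function, and the sample data are all hypothetical.

```python
class CacheAside:
    """Try the cache first; on a miss, load from the database and cache the result."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # any dict-like store (stand-in for Riak)
        self.load_from_db = load_from_db  # callable: key -> value
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1                # cache hit: no database access
            return self.cache[key]
        self.misses += 1                  # cache miss: go to the database
        value = self.load_from_db(key)
        self.cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached item so that the next read reloads it."""
        self.cache.pop(key, None)


# Hypothetical database table of odds, keyed by event.
database = {"race-1": {"odds": "5/1", "description": "horse racing"}}
db_calls = []


def load_from_db(key):
    db_calls.append(key)                  # record each trip to the database
    return database[key]


odds = CacheAside(cache={}, load_from_db=load_from_db)
first = odds.get("race-1")                # miss: goes to the database
second = odds.get("race-1")               # hit: served from the cache
```

After the two reads, db_calls contains a single entry: only the first request reached the database.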

One such feature of Riak is its pluggable storage back-end; the storage back-end determines how the data is stored. There are several available, but I'm not going to cover them all here (see Resources for more information). The default storage back-end is Bitcask, an Erlang application that provides an API for storing and retrieving data backed by a hash table, which provides fast access to data; the data itself is persisted to disk.

One back-end is perhaps more relevant for this article: the Memory back-end. The Memory back-end uses an in-memory table to store all of its data (internally it uses Erlang's ets tables) and, when enabled, makes Riak behave like an LRU cache with timed expiry. The advantage of using an in-memory store is that it is significantly faster than going to disk to retrieve the data. Because the data is stored in memory (it's not persisted), if a node goes down, the data stored in that node is lost. When you use Riak as a cache this is less of an issue than it would be if you used Riak as your primary data store, because the application can always retrieve the data from the database. In addition, Riak replicates the data across several nodes in the cluster, so it will still be available.
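To make "an LRU cache with timed expiry" concrete, here is a minimal single-process Python sketch of that behaviour. It is not how the Memory back-end is implemented; it only illustrates the two eviction rules. The LruTtlCache class is hypothetical, and max_items (a count of entries) is a simplification standing in for a memory limit. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """Bounded cache: evicts the least recently used item, expires items by age."""

    def __init__(self, max_items, ttl_seconds, clock=time.monotonic):
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value), oldest first

    def put(self, key, value):
        self._items.pop(key, None)
        self._items[key] = (self.clock() + self.ttl_seconds, value)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict the least recently used

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]             # timed expiry
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return value


# A fake clock makes the example deterministic.
now = [0.0]
cache = LruTtlCache(max_items=2, ttl_seconds=10, clock=lambda: now[0])
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now the most recently used item
cache.put("c", 3)   # over capacity: "b", the least recently used, is evicted
```

Advancing the fake clock past the time-to-live (for example, now[0] = 10) makes the remaining entries expire on their next read.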

Riak ships with the Memory back-end included. To use the Memory back-end, open app.config for each node in the cluster, locate the property storage_backend and change it from riak_kv_bitcask_backend to riak_kv_memory_backend. Now add the code in Listing 4 to the end of the file.

    Listing 4. Using the Memory back-end

    {memory_backend, [

    {max_memory, 4096}, %% 4GB of memory

    {ttl, 86400} %% Time in seconds

    ]}

Change the values to whatever is appropriate for your set-up. Restart the nodes in the cluster.

It's also possible to run multiple storage back-ends within a Riak cluster. This is useful because it means you can use different back-ends for different buckets. For example, you could configure a bucket (let's call it cache) to use the Memory back-end, and the other buckets (those that should persist the data) to use, say, Bitcask.
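As a rough sketch of what the per-bucket assignment might look like from the application side, the Python snippet below prepares a request that sets a bucket's properties over HTTP. It rests on several assumptions that you should check against the documentation for your Riak version: that the nodes are already configured with multiple named back-ends, that one of them is an in-memory back-end with the hypothetical name memory_cache, and that a bucket is pointed at a named back-end through a "backend" bucket property. The function name build_backend_request is also hypothetical, and nothing is sent unless you pass the request to urllib.request.urlopen.

```python
import json
from urllib.request import Request


def build_backend_request(bucket, backend_name, base="http://localhost:8098/riak"):
    """Prepare (but do not send) a PUT that sets a bucket's back-end property."""
    # Assumption: the bucket property is called "backend" and refers to a
    # back-end name defined in the node configuration.
    body = json.dumps({"props": {"backend": backend_name}}).encode("utf-8")
    return Request(
        "{}/{}".format(base, bucket),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# "memory_cache" is a hypothetical back-end name chosen for this example.
request = build_backend_request("cache", "memory_cache")
```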

Now that you have Riak set up to behave like a cache, you need some way to access the data in the cluster, either to update it or to invalidate it before its expiry time.

Looking for something?

As you have already seen, to retrieve data stored in Riak when using the HTTP interface, you construct a URL consisting of the bucket name and the key of the object you want to retrieve, then do an HTTP GET on that URL. This is perfectly adequate when you know what the key is! However, sometimes you either don't know the key of the object you want to retrieve, or you want to retrieve a set of objects satisfying certain criteria. Then you need a way to search for objects held in the cluster.

You have already seen how to query data by running a Map/Reduce job over documents that are stored in the cluster. The time taken to execute the query will, in general, be proportional to the number of documents in the cluster; the more documents, the longer it takes to query those documents. This is not a problem for queries that are not time sensitive. By this, I mean queries where the user does not expect to get a reply instantly. For something like search, it's not feasible to (dynamically) search all of the documents every time; it could take minutes or hours to get the results back!

Fortunately, Riak already has a solution to this problem: Riak Search. Riak Search provides the functionality you need to search documents stored across your cluster. The subject of search is too broad to cover in any depth in this article, but at a high level it works like this: Documents are tokenized (Riak Search uses standard Lucene analysers) and added to an inverted index. This index is then queried based on the search terms a user enters. As new documents are added, they too are tokenized and added to the index.
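The tokenize-then-index idea can be sketched in plain Python. This is a toy, in-memory illustration of an inverted index, not Riak Search's implementation: the InvertedIndex class and the sample documents are hypothetical, tokens are split on whitespace, and the lowercasing step is my own simplification.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase tokens on whitespace (a very simple analyser)."""
    return text.lower().split()


class InvertedIndex:
    """Maps each term to the set of document keys that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document keys

    def add(self, key, text):
        for term in tokenize(text):
            self._postings[term].add(key)

    def search(self, query):
        """Return the keys of documents containing every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = InvertedIndex()
index.add("race-1", "Horse racing at Ascot")
index.add("match-7", "Football cup final")
index.add("race-2", "Greyhound racing at Romford")
```

A query only touches the postings for its terms, so index.search("racing") finds both racing documents without scanning the football one; that is why lookups stay fast as the number of documents grows.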

Riak Search is disabled by default, so you need to enable it before you can use it. For each node in your cluster, open up rel/riakN/etc/app.config, locate the property riak_search and set it to true. You will need to restart the nodes in the cluster.

Riak allows you to specify the name of a function to run before and after a document is added to a bucket through the use of pre- and post-commit hooks. For example, you might want to check that a document has particular required fields before adding it to the cluster. Before a document can be searched, it needs to be indexed. To do this, install a pre-commit hook on the bucket where the documents are stored by running the following command:

    $ rel/riak/bin/search-cmd install

This will install a pre-commit hook riak_search_kv_hook on the bucket. Now, whenever a document is added to that bucket, it is analyzed and added to the index. The whitespace analyser is the default analyser; it processes characters into tokens based on whitespace, which then get indexed. A number of different analysers are available and you can also define your own.

In many cases, Riak Search knows how to index your data. For example, out-of-the-box, if a JSON object is added to a bucket, the value of each property will be indexed and can be queried using the property name in the query string. See the search example in Listing 5. For more complicated structures it's possible to define your own schema that tells Riak Search how to index your data.

Once you have some documents indexed, you need to be able to issue queries against them. One way is to run a query from the Erlang shell. For example, the query in Listing 5 searches the odds bucket for all bets that are related to horse racing; you do this by querying the description property of the stored item.

    Listing 5. Searching the odds bucket for bets related to horse racing

    $ rel/riak/bin/riak attach

    search:search(, ).

In addition, Riak Search provides a Solr-compatible HTTP API for searching documents. Apache Solr is a popular enterprise search server with a REST-like API. Because the API is compatible with Solr, it should be possible to switch out Solr (if you use it) and use Riak Search to power your searches instead. For example, to search for the odds for a particular event using the Solr interface, you would do something like this:

    $ curl "http://localhost:8098/solr/odds/select?start=0&q=description:horse"
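If you issue such searches from application code rather than from curl, the query URL can be assembled programmatically. The following Python sketch builds the same kind of URL as the command above; the function name solr_select_url and its parameters are hypothetical, and only the URL shape shown in this article (/solr/<index>/select with start and q parameters) is assumed.

```python
from urllib.parse import urlencode


def solr_select_url(index, field, term, start=0, host="localhost", port=8098):
    """Build a Solr-style select URL that queries one field for one term."""
    # safe=":" keeps the field:term separator readable instead of encoding it.
    query = urlencode({"start": start, "q": "{}:{}".format(field, term)}, safe=":")
    return "http://{}:{}/solr/{}/select?{}".format(host, port, index, query)


url = solr_select_url("odds", "description", "horse")
```

Letting urlencode build the query string means that terms containing spaces or other reserved characters are escaped for you.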

With search set up, you now can locate items in the data store without knowing the primary key of the items you are looking for.

    Conclusion

Riak's ability to scale and reliably replicate data, plus other features such as search, makes it an ideal choice to implement a caching solution for heavy-load sites. You can easily integrate it into an existing site. With its ability to serve requests directly, you can use Riak to reduce, or even eliminate, the load on the application and database servers.

Downloads

    Description          Name                   Size  Download method
    Article source code  riakpt2sourcecode.zip  85KB  HTTP
Resources

Learn

• Part 1: The language-independent HTTP API: Store and retrieve data using Riak's HTTP interface (Simon Buckle, developerWorks, March 2012): Read this introduction to Riak that covers the basics of storing and retrieving items in Riak using its HTTP API. (http://www.ibm.com/developerworks/library/os-riak1/index.html)
• Read the Riak Search wiki page to learn more about how it works. (http://wiki.basho.com/Riak-Search.html)
• See what storage back-ends Riak provides and how they differ from each other. (http://wiki.basho.com/Storage-Backends.html)
• Get a list of available client libraries for integrating with Riak. (http://wiki.basho.com/Client-Libraries.html)
• See Basic Cluster Setup and Building a Development Environment for more detailed information on setting up a 3-node cluster. (http://wiki.basho.com/Building-a-Development-Environment.html)
• Read Google's MapReduce: Simplified Data Processing on Large Clusters. (http://research.google.com/archive/mapreduce.html)
• Read Introduction to programming in Erlang (Martin Brown, developerWorks, May 2011) and learn about Erlang and how its functional programming style compares with other programming paradigms such as imperative, procedural and object-oriented programming. (http://www.ibm.com/developerworks/opensource/library/os-erlang1/index.html)
• Read Amazon's Dynamo paper on which Riak is based. Highly recommended! (http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html)
• See the article How To Analyze Apache Logs to learn how you can use Riak to process your server logs. (http://www.simonbuckle.com/2011/08/27/analyzing-apache-logs-with-riak/)
• Get an explanation of vector clocks and why they are easier to understand than you might think. (http://basho.com/blog/technical/2010/01/29/why-vector-clocks-are-easy/)
• Find a good explanation of vector clocks and more detailed information on link walking on the Riak wiki. (http://wiki.basho.com/Vector-Clocks.html, http://wiki.basho.com/Links-and-Link-Walking.html)
• The Project Gutenberg site is a great resource if you need some text resources for experimenting. (http://www.gutenberg.org/)
• The Open Source developerWorks zone provides a wealth of information on open source tools and using open source technologies. (http://www.ibm.com/developerworks/opensource/)
• developerWorks Web development specializes in articles covering various web-based solutions. (http://www.ibm.com/developerworks/web/)
• Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics. (http://www.ibm.com/developerworks/offers/techbriefings/events.html)
• Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM products and tools, as well as IT industry trends. (http://www.ibm.com/developerworks/offers/techbriefings/index.html)
• Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers. (http://www.ibm.com/developerworks/offers/lp/demos/index.html)
• Follow developerWorks on Twitter, or subscribe to a feed of Linux tweets on developerWorks. (http://www.twitter.com/developerworks/)

Get products and technologies

• Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently. (http://www.ibm.com/developerworks/downloads/index.html, http://www.ibm.com/developerworks/downloads/soasandbox/index.html)

Discuss

• Check out developerWorks blogs and get involved in the developerWorks community. (http://www.ibm.com/developerworks/blogs/)
• Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis. (http://www.ibm.com/developerworks/community)
About the author

Simon Buckle

Simon Buckle is an independent consultant. His interests include distributed systems, algorithms, and concurrency. He has a master's degree in Computing from Imperial College, London. Check out his website at simonbuckle.com.

    Copyright IBM Corporation 2012

    (www.ibm.com/legal/copytrade.shtml)

    Trademarks

    (www.ibm.com/developerworks/ibm/trademarks/)
