<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" ><channel><title>LINUX For You &#187; Technology</title> <atom:link href="http://www.linuxforu.com/category/previews/technology/feed/" rel="self" type="application/rss+xml" /><link>http://www.linuxforu.com</link> <description>The Complete Magazine on Open Source</description> <lastBuildDate>Tue, 31 Jan 2012 17:22:40 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=</generator> <xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" /> <item><title>Voices Across the Digital Divide &#8212; Using Audio Portals to Connect Communities</title><link>http://www.linuxforu.com/2012/01/digital-divide-audio-portals-connect-communities/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=digital-divide-audio-portals-connect-communities</link> <comments>http://www.linuxforu.com/2012/01/digital-divide-audio-portals-connect-communities/#comments</comments> <pubDate>Tue, 31 Jan 2012 12:08:24 +0000</pubDate> <dc:creator>Arjun Venkatraman</dc:creator> <category><![CDATA[Blogs]]></category> <category><![CDATA[For You & Me]]></category> <category><![CDATA[Overview]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[audio portal]]></category> <category><![CDATA[broadband]]></category> <category><![CDATA[Cathedral And The Bazaar]]></category> <category><![CDATA[cell phones]]></category> <category><![CDATA[CGNet]]></category> <category><![CDATA[citizen media]]></category> <category><![CDATA[communication technology]]></category> <category><![CDATA[content management 
system]]></category> <category><![CDATA[Democracy]]></category> <category><![CDATA[Developing country]]></category> <category><![CDATA[Digital Divide]]></category> <category><![CDATA[free speech]]></category> <category><![CDATA[FreedomFone]]></category> <category><![CDATA[GitHub]]></category> <category><![CDATA[GSM]]></category> <category><![CDATA[healthcare]]></category> <category><![CDATA[India]]></category> <category><![CDATA[Interactive Voice Response]]></category> <category><![CDATA[Internet penetration]]></category> <category><![CDATA[Internet users]]></category> <category><![CDATA[IVR]]></category> <category><![CDATA[journalists]]></category> <category><![CDATA[LFY January 2012]]></category> <category><![CDATA[mainstream media]]></category> <category><![CDATA[mass media]]></category> <category><![CDATA[MIT]]></category> <category><![CDATA[mobile phones]]></category> <category><![CDATA[open medium]]></category> <category><![CDATA[PBX system]]></category> <category><![CDATA[SMS]]></category> <category><![CDATA[Speech-recognition technology]]></category> <category><![CDATA[Swara]]></category> <category><![CDATA[text-to-speech system]]></category> <category><![CDATA[voice recognition]]></category> <category><![CDATA[Web frameworks]]></category> <category><![CDATA[Web interface]]></category> <category><![CDATA[Web platforms]]></category><guid isPermaLink="false">http://www.linuxforu.com/?p=9464</guid> <description><![CDATA[Human beings are the only species on earth with the ability to communicate complex ideas through language. 
Speaking and listening have been the basis of human society since people started living in communities....]]></description> <content:encoded><![CDATA[<p><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/Opening-Image.jpg?d9c344" alt="A communication channel" title="A communication channel" width="350" height="240" class="alignright size-full wp-image-9469" /><div class="introduction">Human beings are the only species on earth with the ability to communicate complex ideas through language. Speaking and listening have been the basis of human society since people started living in communities. In fact, the words &#8220;community&#8221; and &#8220;communication&#8221; share a common etymology.</div><p>Democracy, as a system, is completely dependent on communication, to the extent that when communication breaks down, so does the democratic process. In order for a group of people to participate equally in democracy, they must necessarily share a communication platform, where they can share not just facts, but also views and opinions. Small wonder then, that free speech is prized and cherished by all democracies, and coveted by citizens of almost all countries that are yet to become  democracies.</p><p>One of the fundamental requirements of free speech and participation in democracy is the availability of a free, open medium and platform of communication that is equally accessible by all members of the democratic community. Almost every culture in the world has a concept of a central community gathering place, where people gather after a day&#8217;s work, to talk and share information.</p><p>In India, this is typically the village <em>chaupal</em>, in West Kalimantan (erstwhile Borneo), Indonesia, it&#8217;s called a <em>ruai</em>. In Afghanistan, it may be called a <em>chaikhana</em>. 
These community structures have traditionally provided the common platform and free medium for communication.</p><p>This type of platform is structured like a circle, and the free medium is air. In a circular structure, everyone has an equal say, because everyone has equal access to the medium and equal reach to every other member of the platform. No special equipment is required to use this medium; ears and a mouth will typically suffice. These structures provided a way for people to voice their opinion, share their concerns and find solutions to conflict through dialogue.</p><p>After the industrial revolution and the dawn of the corporation, mass media began to play this role in people&#8217;s lives. Newspapers, radio and television became the new media that people used. These media had a much wider reach and they seemed like the perfect democratic tool. However, these media have a structural problem that prevents them from being truly democratic. By virtue of corporate and editorial hierarchy, these media are structured like a triangle (Figure 1).</p><div id="attachment_9468" class="wp-caption aligncenter" style="width: 590px"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/Fig-1-590x411.jpg?d9c344" alt="Media communication" title="Media communication" width="590" height="411" class="size-large wp-image-9468" /><p class="wp-caption-text">Figure 1: Media communication</p></div><p>News, in this model, travels downwards from an elite minority that determines what content is &#8220;newsworthy&#8221; to the community. The community typically cannot relate the incoming news to their own lives, and either becomes disenfranchised by virtue of lack of representation, or assumes the media version of facts to be true, and that they themselves are an anomaly. 
At the very least, this influences their participation in democracy, and at worst, they are rendered voiceless in that most fundamental democratic process &#8212; debate.</p><p>This hierarchical model of modern commercial media requires profits for the media organisation to continue to run. This means that news needs to sell. If a newspaper cannot generate advertising revenue, it will soon shut down. Obviously, with profit as the first imperative, relevance of the content to the community and their feedback must become secondary. Moreover, there is an incentive to prevent communication technology from reaching its true potential. For example, if community radio became fully deregulated, would commercial radio or, for that matter, television, stand a chance?</p><p>This skewed set of incentives, and the resulting policies and actions, has led to several communities across the world, particularly in the developing world, becoming alienated from, and disenfranchised by, mainstream society. These communities are particularly susceptible to coercion and this might partly explain the escalating violence in the world today.</p><p>This conundrum should be quite familiar to open source enthusiasts, since the basic principles involved are much the same as the ones in the open source vs closed source software debate. To draw a parallel from <em>The Cathedral And The Bazaar</em>, mainstream media follows the cathedral model, while community platforms are more like bazaars. Both paradigms have their value and importance in the structure of society at large. 
However, in the context of media, the cathedral or top-down model appears to have reached its limits of effectiveness &#8212; and, in my opinion, has passed the point of diminishing returns.</p><p>The growth of user-generated content on the Internet over the last decade is a clear indicator that as connectivity improves, people are increasingly eager to directly voice their opinions and concerns without the need of mainstream media as an intermediary, particularly since in the real world, no intermediary is perfectly impartial.</p><h2>The developing world</h2><p>In the developing world, this uprising of citizen media has been stunted by the uneven distribution of resources, such as infrastructure, connectivity and literacy. While connectivity in the developed world has allowed the blogosphere to become a political force to contend with, most developing countries have an Internet penetration of less than 10 per cent, typically concentrated in urban areas.</p><p>Even where connectivity exists, the vast majority of users are only just starting to view the Internet as anything more than email and instant messaging. In many of these countries, even as economies have opened up and globalisation has settled in, entire communities are still disconnected from the rest of the world, primarily because they do not represent a market segment worthy of media representation.</p><p>Mainstream media in these countries typically focus on urban issues that relate to economic and political decision makers, rather than the vox populi.</p><p>In several of these countries, however, innovation is now taking place to bridge this gap by other means. While Internet penetration remains low, the use of mobile phones is a different story altogether. Most of the developing world has far outpaced the developed world in terms of mobile phone adoption and versatility of usage. Even in places where people earn less than a dollar a day, cell phones are ubiquitous. 
A medium that uses voice, the oldest mode of communication known to man, amplified by several orders of magnitude, so as to cover unimaginable distances, is as irresistible to a Gond tribal in Chhattisgarh, India, as it is to a street food vendor in Jakarta, Indonesia.</p><p>Recognising the potential of this medium, several groups are now actively engaged in developing technology to allow people to use their voice to connect themselves and their communities to the rest of the world. One of the first tools of this new age of innovation is the audio portal.</p><h2>An audio portal?</h2><p>An audio portal (Figure 2) is essentially a website with a lot of audio content that can be accessed both through the Web and by phone.</p><div id="attachment_9467" class="wp-caption aligncenter" style="width: 590px"><a href="http://cdn.linuxforu.com/wp-content/uploads/2012/01/Fig-2.jpg?d9c344"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/Fig-2-590x535.jpg?d9c344" alt="An audio portal" title="An audio portal" width="590" height="535" class="size-large wp-image-9467" /></a><p class="wp-caption-text">Figure 2: An audio portal</p></div><p>The Web interface usually resembles a blog, and several audio portals do use a blog layout. The phone interface is an IVR (Interactive Voice Response) system, where users press keys to navigate through menus and content. In more advanced IVR systems, voice recognition may be used, though this is still limited to the well-documented accents of the English language.</p><p>Behind the scenes, the platform will also provide an interface to manage posts. 
Early implementations of audio portals tended to rely on specialised moderation consoles, which have media-previewing capabilities as well as functionality for moderators to add metadata, such as a summary and title, to the content to make it friendlier to users on the Web.</p><p>Users will typically call the IVR interface to record and listen to content using their cell phones, while Web users will access the website interface to listen to the audio posts using a browser, and leave comments in text, which then may or may not be converted to audio using a text-to-speech system.</p><p>People who own the latest Android or iPhone may find the idea of an IVR interface to browse content somewhat counter-intuitive, since it makes no sense to call in and scroll through a set of menus, particularly with an irritatingly monotonic voice rattling out instructions all the time, when you can simply open the Web page on your cell phone&#8217;s browser, and read.</p><p>The graph in Figure 3 may help clarify why a purely visual interface is not adequate to reach the majority of the world.</p><div id="attachment_9470" class="wp-caption aligncenter" style="width: 590px"><a href="http://cdn.linuxforu.com/wp-content/uploads/2012/01/Figure-3.png?d9c344"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/Figure-3-590x402.png?d9c344" alt="Cell phone and broadband users" title="Cell phone and broadband users" width="590" height="402" class="size-large wp-image-9470" /></a><p class="wp-caption-text">Figure 3: Cell phone and broadband users</p></div><p>The percentage of Internet users, even among the mobile phone users of the world, is a fraction of the percentage of people using their phones purely for voice and SMS. While mobile Internet use is, and will continue to be, on the rise, the bulk of the world will continue to be on voice for some time to come.</p><p>This is also historically consistent, since most societies have far stronger oral traditions than written ones. 
Voice captures much more than simply language. Tone, quality and emotion are all interwoven in the spoken word. If a picture is worth a thousand written words, then a spoken word counts for at least a few hundred&#8230; not to mention that drawing an attractive picture takes considerably more skill than speaking!</p><p>What makes mobile phones particularly attractive as a medium, though, is their two-way nature. With radio and television, though the reach may be much wider than that of mobile phones, the ability to respond immediately to what you hear or see &#8212; on the same platform, at the same level as the source, which is extremely valuable in fostering dialogue &#8212; is missing.</p><p>The audio portal concept caters equally to every cell phone, whether a mass-market handset or a smartphone, which helps level the playing field. Most importantly, audio portals use technology, skills and other resources that are available now, as opposed to those that require extensive &#8220;capacity building&#8221; exercises. This is probably the reason why audio portals, as a tool, find more favour with grassroots workers and members of the community than with technology evangelists and academia.</p><h2>The technology</h2><p>Audio portals utilise relatively simple technology, most of which has been around in the open source world for some time. An audio portal will typically consist of a phone interface (either fixed-line or mobile) connected, via an IVR running on a soft switch or software PBX system, to a content-management system (usually backed by a database) and a Web front-end. Two examples of audio portal platforms are Swara and FreedomFone.</p><h2>Swara</h2><p>Swara is an open source project, originally developed as a research project by students and professors at MIT to augment the outreach and activities of CGNet, a people&#8217;s discussion group working with indigenous communities in central India. 
CGNet was started by veteran journalist Shubhranshu Choudhary, who returned to central India, where he grew up, to find it torn by violence. Probing to find the reason for the conflict, he quickly realised that open, accessible community media would be a key component of any solution. Given that Internet penetration in the region is less than 1 per cent, and community radio is limited by regulation, the next best medium for a community platform was the mobile phone.</p><p>The first pilot of Swara was deployed in Bengaluru in February 2010, for use by indigenous communities in Chhattisgarh and neighbouring states. Today, the pilot receives over 300 calls a day, and the team is working on building the platform out as an open source project for deployment in other locations. The first replica of the project went live in Indonesia in December 2011.</p><p>Swara combines the Asterisk PBX system with the LoudBlog audio blogging platform, with the integration written in Python. The tested interfaces are GSM gateways (Topex Mobilink, etc) and fixed lines (PRI/BRI) using a Digium telephony card.</p><h2>FreedomFone</h2><p>FreedomFone was developed by Alberto Escudero Pascual and Louise Berthilson of IT46, a Swedish IT consultancy, for the Kubatana Trust in Zimbabwe. It was created for many of the same reasons that Swara was developed in India, namely the lack of impartial and open commercial media, and the need for local and community-level reporting. The FreedomFone pilot, a weekly audio magazine called Inzwa, has been running in Zimbabwe since July 2009, and received over 2,500 calls between July and September 2009. FreedomFone&#8217;s team is also working on developing the platform as a user-friendly DIY IVR kit, and is keen on replicating the model in other areas.</p><p>FreedomFone uses the FreeSWITCH soft switch to interface with telephony devices such as the Mobigater and Office Router GSM gateways. 
The content management system is written in CakePHP, and FreedomFone additionally uses the Cepstral speech synthesis system for text-to-speech conversions. The stated objective is to create a purely phone-accessible platform.</p><h2>Deployment 101</h2><p>Both platforms have an almost identical design, as would most audio portal software. This is analogous to how traditional websites are built, with the choice of platform being similar to the choice between different Web frameworks. Just as you will find lots of different opinions and preferences for Web platforms among Web designers, you will find that the few implementers of audio portals are just as varied in their preferences for platforms. This usually depends on which platform the implementer is most familiar with &#8212; and if you are implementing your own, one is essentially as good as the other.</p><p>The key question, irrespective of which platform you use, is one of deployment strategy. At present, most implementations of audio portals as community media platforms are centralised instances deployed by a single organisation or group, with a specific agenda (such as news, healthcare or governance).</p><h2>Centralised function-oriented deployment</h2><p>Centralised, function-oriented deployments require content of a certain quality and, as a result, must usually be moderated. Speech-recognition technology, particularly in the area of automatic transcription, is still far from accurate. As a result, moderating a function-specific audio portal is still a manual job, for the most part.</p><p>Typically, audio portal moderators will need to listen to each message and summarise and/or transcribe it. Beyond transcription, there may be more work to do to improve the quality of the content for the specific purpose of the deployment, like sound quality clean-ups and edits, fact verification (if journalism is the function, for example) and categorisation. 
All of this work is further exacerbated in a centralised deployment, since all incoming calls come to the same central hub (see Figure 4).</p><div id="attachment_9466" class="wp-caption aligncenter" style="width: 590px"><a href="http://cdn.linuxforu.com/wp-content/uploads/2012/01/Fig-4.jpg?d9c344"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/Fig-4-590x624.jpg?d9c344" alt="Centralised deployments" title="Centralised deployments" width="590" height="624" class="size-large wp-image-9466" /></a><p class="wp-caption-text">Figure 4: Centralised deployments</p></div><p>In India, and other countries where long-distance call charges are higher than local call charges, centralised platforms also suffer from an added cost element, since all callers must call the central number, regardless of their own locations.</p><h2>Hyperlocal deployments</h2><p>An alternative model is a hyperlocal community-oriented one. In this model, an instance of the platform is deployed at the community level and maintained by community members. Such community-level audio portals could be used as voice-based bulletin boards. By managing the size of the user base, and ensuring a manageable user adoption rate by limiting publicity to word of mouth, communities could eliminate the need for moderation by making sure everyone on the platform was known by the others and therefore accountable to the community.</p><p>Several communities can then choose to link their platforms, either by sharing content, or by simply listening to each other. This will eventually lead to an organically expanding network, where people can choose which deployments they want to subscribe to, much in the same way as Internet users subscribe to different forums and websites. This would also ease the burden on centralised deployments already in existence, since they could then simply trawl the community bulletin boards for usable content, rather than filter out unusable content on their own incoming stream. 
As you can see from Figure 5, the hyperlocal model offers more avenues for collaboration and the cross-fertilisation of ideas between communities than the centralised model.</p><div id="attachment_9465" class="wp-caption aligncenter" style="width: 590px"><a href="http://cdn.linuxforu.com/wp-content/uploads/2012/01/Fig-5.jpg?d9c344"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/Fig-5-590x632.jpg?d9c344" alt="The advantage of hyperlocal deployments" title="The advantage of hyperlocal deployments" width="590" height="632" class="size-large wp-image-9465" /></a><p class="wp-caption-text">Figure 5: The advantage of hyperlocal deployments</p></div><p>A word of caution: this approach is still experimental, and needs several more deployments before it can be considered a best practice. However, for communities interested in improving their information access and level of participation in mainstream society, this is a very worthwhile experiment to take on. Both systems described here can be installed on a mid-range notebook computer.</p><p>The software is all open source and free for non-commercial use. Mobile interfaces like GSM gateways and mobile ATAs (analogue telephone adapters) are relatively cheap &#8212; a Matrix SETU ATA 211G would cost roughly US$ 120, and the Mobigater is priced at about US$ 50. The total cost of setting up a local IVR installation and running it for a year, including the cost of connectivity, is typically less than US$ 200.</p><p>Of course, the most important thing to remember while setting up an alternative communication platform is that while technology will certainly provide the tools, the key to success is to build a strong community around your platform, and quickly demonstrate to the community the value of participating. 
This is where most of the hard work lies.</p><p>It would be interesting to see how well the open source community in India takes to these projects and how quickly the hyperlocal model can be tested with several more installations.</p><h5>References</h5><ul><li><a href="https://github.com/mojolab/swara">Swara Project on GitHub</a></li><li><a href="http://swara.mojolab.org/">Swara Community</a></li><li><a href="http://cgnetswara.org/">CGNet Swara</a></li><li><a href="http://freedomfone.org/">FreedomFone Project</a></li></ul>]]></content:encoded> <wfw:commentRss>http://www.linuxforu.com/2012/01/digital-divide-audio-portals-connect-communities/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>NewSQL &#8212; The New Way to Handle Big Data</title><link>http://www.linuxforu.com/2012/01/newsql-handle-big-data/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=newsql-handle-big-data</link> <comments>http://www.linuxforu.com/2012/01/newsql-handle-big-data/#comments</comments> <pubDate>Mon, 30 Jan 2012 07:38:19 +0000</pubDate> 
<dc:creator>Prasanna Venkatesh</dc:creator> <category><![CDATA[Developers]]></category> <category><![CDATA[Overview]]></category> <category><![CDATA[Sysadmins]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[451 Group]]></category> <category><![CDATA[acid]]></category> <category><![CDATA[analytics systems]]></category> <category><![CDATA[big data]]></category> <category><![CDATA[CouchDB]]></category> <category><![CDATA[data availability]]></category> <category><![CDATA[data management]]></category> <category><![CDATA[data management solutions]]></category> <category><![CDATA[data processing]]></category> <category><![CDATA[data storage]]></category> <category><![CDATA[database systems]]></category> <category><![CDATA[database technology]]></category> <category><![CDATA[databases]]></category> <category><![CDATA[enterprise applications]]></category> <category><![CDATA[exabytes]]></category> <category><![CDATA[Facebook]]></category> <category><![CDATA[Google]]></category> <category><![CDATA[high availability]]></category> <category><![CDATA[information management systems]]></category> <category><![CDATA[LFY January 2012]]></category> <category><![CDATA[LinkedIn]]></category> <category><![CDATA[Matthew Aslett]]></category> <category><![CDATA[MySQL]]></category> <category><![CDATA[NewSQL]]></category> <category><![CDATA[NoSQL]]></category> <category><![CDATA[NoSQL solutions]]></category> <category><![CDATA[OLTP]]></category> <category><![CDATA[performance requirements]]></category> <category><![CDATA[processing power]]></category> <category><![CDATA[RDBMS]]></category> <category><![CDATA[real-time transactions]]></category> <category><![CDATA[Relational Database]]></category> <category><![CDATA[Twitter]]></category> <category><![CDATA[web applications]]></category><guid isPermaLink="false">http://www.linuxforu.com/?p=9359</guid> <description><![CDATA[Big data, big data, big data! 
This term has been dominating information management for a while, leading to enhancements in systems, primarily databases, to handle this revolution. Though there are many alternative information...]]></description> <content:encoded><![CDATA[<p><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/NewSQL-Distribution.jpg?d9c344" alt="NewSQL Distribution" title="NewSQL Distribution" width="300" height="313" class="alignright size-full wp-image-9366" /><div class="introduction">Big data, big data, big data! This term has been dominating information management for a while, leading to enhancements in systems, primarily databases, to handle this revolution. Though there are many alternative information management systems available for users, in this article, we share our perspective on a new type, termed NewSQL, which caters to the growing data in OLTP systems.</div><p>Everywhere we look in today&#8217;s data management landscape, the volume of information is soaring. According to one estimate, the data created in 2010 is about 1,200 exabytes, and will grow to nearly 8,000 exabytes by 2015, with the Internet/Web being the primary data driver and consumer.</p><p>This growth is outpacing the growth of storage capacity, leading to the emergence of information management systems where data is stored in a distributed way, but accessed and analysed as if it resides on a single machine (Figure 1).</p><div id="attachment_9363" class="wp-caption aligncenter" style="width: 500px"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/NewSQL-Distribution-Fig.1.jpg?d9c344" alt="Distributed data storage" title="Distributed data storage" width="500" height="310" class="size-full wp-image-9363" /><p class="wp-caption-text">Figure 1: Distributed data storage</p></div><p>Besides resolving the data size problem, these solutions also need to cater to massive performance requirements to ensure timeliness of data processing. 
Unfortunately, an increase in processing power does not directly translate to faster data access, triggering a rethink of existing database systems.</p><p>To understand the enormity of data volumes, let&#8217;s look at a couple of figures: Facebook needs to store 135 billion messages a month. Twitter has to store 7 TB of data per day, with the prospect of this requirement doubling multiple times per year. The criticality of data and the continuity of data availability have become more important than ever. We expect data to be available 24&#215;7 and from everywhere.</p><p>This brings in the third dimension of high availability and durability. So we have to factor in high availability without any single point of failure, which has traditionally been a telecom forte. These challenges have created a new wave of database processing solutions, which manage data in both structured and unstructured ways.</p><p>Legacy information management systems are characterised by monolithic relational databases, disk/tape-based stores (as memory in huge quantities is scarce or expensive), vendor lock-in and limited enterprise-level scalability. With the advent of the cloud, new data management solutions are emerging to handle distributed (relational/non-relational) content on open platforms at the speed of a mouse-click.</p><h2>NoSQL</h2><p>One of the key advances in resolving the &#8220;big-data&#8221; problem has been the emergence of NoSQL as an alternative database technology. NoSQL (sometimes expanded to &#8220;not only SQL&#8221;) is a broad class of DBMSs that differ significantly from the classic RDBMS model. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally.</p><p>NoSQL solutions have carved a niche market for themselves as key-value stores, document databases, graph databases and big-tables. Use cases for these solutions in Web applications are a dime a dozen. 
They are used everywhere &#8212; in Web-based social interactions like online gaming and networking, in revenue-maximising decisions like ad offerings, and in basic operations like searching the Internet. Figure 2 shows the architecture of a well-known NoSQL database (CouchDB), which is written in Erlang.</p><div id="attachment_9365" class="wp-caption aligncenter" style="width: 464px"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/NewSQL-Cassandra-Fig.2.jpg?d9c344" alt="CouchDB architecture" title="CouchDB architecture" width="464" height="505" class="size-full wp-image-9365" /><p class="wp-caption-text">Figure 2: CouchDB architecture</p></div><p>But is NoSQL the answer to all problems? In spite of the technological advantages offered by NoSQL solutions, RDBMS users in enterprises are reluctant to switch to them. Why?</p><p>Traditional RDBMSs are deployed primarily in enterprises, for everything the enterprise does, whether storing customer information, internal financials, employee information or anything else. Is such data growing? Yes, so big-data issues are relevant to them. So why not NoSQL?</p><p>Even though there is a variety of NoSQL offerings, they are typically characterised by a lack of SQL support and non-adherence to the ACID (Atomicity, Consistency, Isolation and Durability) properties. (Here, we will not go into the technical reasons for this.)</p><p>So while NoSQL could help enterprises manage large distributed data, enterprises cannot afford to lose the ACID properties &#8212; which were the key reason for choosing an RDBMS in the first place. 
Also, NoSQL solutions don&#8217;t provide the SQL support that most current enterprise applications require, which pushes enterprises further away from NoSQL.</p><p>Hence, a new set of data-management solutions is emerging to address large-data OLTP concerns, without sacrificing ACID and SQL interfaces.</p><h2>Why is traditional OLTP insufficient?</h2><p>We have already discussed the fact that there is an explosion in the OLTP data space. For example, the growing use of social networking sites (like Facebook and LinkedIn) leads to larger OLTP requirements. Each user of such sites requires credentials and user profile information to be stored, generally in some OLTP database, even though the actual user data is stored in a NoSQL data-store. Facebook&#8217;s 750 million users require a very large OLTP database.</p><p>While growing OLTP data is a direct contributor, business requirements also force non-OLTP data to be managed as OLTP. Let&#8217;s look at the example of how analytics (which is the quintessential OLAP application) has led to the scalability requirements of OLTP.</p><h2>Case study: Real-time analytics</h2><p>In earlier days, analytics were performed on historic data with specialised data warehouse solutions, mostly using an ETL (Extract, Transform and Load) approach. Data extracted from OLTP systems was fed to data warehouse systems capable of handling voluminous data. 
Real-time analytics is an approach that enables business users to get up-to-the-minute data by directly accessing business operational systems or feeding real-time transactions to analytics systems.</p><p>For example, previously Google Analytics analysed past performance &#8212; but recently, Google launched <em>Google Analytics: Real Time</em>, which shows a set of new reports on what&#8217;s happening on their site, as it happens.</p><p>Even though traditional OLTP systems provide ACID, they are not well equipped to handle the volume of data seen because of new innovative business scenarios like real-time analytics. A combination of traditional OLTP systems and analytics systems might under-utilise the analytics systems, due to scalability limitations and the performance of traditional OLTP systems.</p><h2>Welcome NewSQL</h2><p>To address big-data OLTP business scenarios that neither traditional OLTP systems nor NoSQL systems address, alternative database systems have evolved, collectively named NewSQL systems. This term was coined by the 451 Group, in their <a href="https://www.451research.com/report-long?icid=1651">now famous report, &#8220;NoSQL, NewSQL and Beyond&#8221;</a>.</p><p>The term NewSQL was used to categorise these new alternative database systems. 451 Group&#8217;s senior analyst, Matthew Aslett, clarified the meaning of the term NewSQL (in his blog) as follows: &#8220;NewSQL is our shorthand for the various new scalable/high-performance SQL database vendors. We have previously referred to these products as &#8220;ScalableSQL&#8221; to differentiate them from the incumbent relational database products. Since this implies horizontal scalability, which is not necessarily a feature of all products, we adopted the term NewSQL in the new report. And to clarify, like NoSQL, NewSQL is not to be taken too literally: the new thing about the NewSQL vendors is the vendor, not the SQL. 
NewSQL is a set of various new scalable/high-performance SQL database vendors (or databases). These vendors have designed solutions to bring the benefits of the relational model to the distributed architecture, and improve the performance of relational databases to an extent that the scalability is no longer an issue.&#8221;</p><p>Figure 3 is an architectural example of one of the NewSQL solutions (dbShards).</p><div id="attachment_9364" class="wp-caption aligncenter" style="width: 590px"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/NewSQL-dbshards-overview-Fig.3-590x354.jpg?d9c344" alt="dbShards" title="dbShards" width="590" height="354" class="size-large wp-image-9364" /><p class="wp-caption-text">Figure 3: dbShards</p></div><h2>Technical characteristics of NewSQL solutions</h2><ol><li>SQL as the primary mechanism for application interaction.</li><li>ACID support for transactions.</li><li>A non-locking concurrency control mechanism, so that real-time reads do not conflict with writes and cause them to stall.</li><li>An architecture providing much higher per-node performance than is available from traditional RDBMS solutions.</li><li>A scale-out, shared-nothing architecture, capable of running on a large number of nodes without suffering bottlenecks.</li></ol><p>The expectation is that NewSQL systems will be about 50 times faster than traditional OLTP RDBMSs.</p><p>As with NoSQL, there are several categories of NewSQL solutions. Let us look at these categories and their characteristics.</p><h2>NewSQL categorisation</h2><p>Categorisation is based on the different approaches adopted by vendors to preserve the SQL interface, and address the scalability and performance concerns of traditional OLTP solutions.</p><ol><li><strong>New databases:</strong> These NewSQL systems are designed from scratch to achieve scalability and performance. Of course, some (hopefully minor) changes to application code will be required, and data migration is still needed. 
One of the key considerations in improving performance is making main memory or newer storage media (flash/SSD) the primary data store. Solutions can be software-only (VoltDB, NuoDB and Drizzle) or supported as an appliance (Clustrix, Translattice). Examples of offerings are Clustrix, NuoDB and Translattice (commercial); and VoltDB, Drizzle, etc. (open source).</li><li><strong>New MySQL storage engines:</strong> MySQL is part of the LAMP stack and is used extensively in OLTP. To overcome MySQL&#8217;s scalability problems, a set of storage engines has been developed, including Xeround, Akiban, MySQL NDB Cluster, GenieDB, Tokutek, etc. The good part is the use of the MySQL interface, but the downside is that data migration from other databases (including old MySQL) is not supported. Examples of offerings are Xeround, GenieDB and Tokutek (commercial); and Akiban, MySQL NDB Cluster and others (open source).</li><li><strong>Transparent clustering:</strong> These solutions retain the OLTP databases in their original format, but provide a pluggable feature to cluster transparently, to ensure scalability. Another approach is to provide transparent sharding to improve scalability. Schooner MySQL, Continuent Tungsten and ScalArc follow the former approach, whereas ScaleBase and dbShards follow the latter approach. Both approaches allow reuse of existing skillsets and ecosystem, and avoid the need to rewrite code or perform any data migration. Examples of offerings are ScalArc, Schooner MySQL, dbShards and ScaleBase (commercial); and Continuent Tungsten (open source).</li></ol><h2>Summing up</h2><p>Given the continuing trend of data growth in OLTP systems, a new generation of solutions is required to handle it. DBMSs have to be rearchitected from scratch to meet this demand. However, unlike NoSQL, these DBMSs have to cater to applications already written for an earlier generation of RDBMS. 
Hence, drastic interface changes, like throwing out SQL or making extensive changes to the data schema, are out of the question.</p><p>A new generation of information management systems, termed NewSQL systems, caters to this trend and these constraints. NewSQL is apt for businesses that are planning to:</p><ol><li>migrate existing applications to adapt to new trends of data growth,</li><li>develop new applications on highly scalable OLTP systems, and</li><li>rely on existing knowledge of OLTP usage.</li></ol><p>We look forward to your comments on this introductory article, and we also plan a series of in-depth expert articles on NewSQL database technology.</p><div id="crp_related"><h5>Related Posts:</h5><ul><li><a href="http://www.linuxforu.com/2011/07/database-demands-of-peta-scale-computing/" rel="bookmark" class="crp_title">The Database Demands of Peta-scale Computing</a></li><li><a href="http://www.linuxforu.com/2011/02/up-close-and-personal-with-nosql/" rel="bookmark" class="crp_title">Up Close and Personal with NoSQL</a></li><li><a href="http://www.linuxforu.com/2012/01/importance-of-in-memory-databases/" rel="bookmark" class="crp_title">The Importance of In-memory Databases</a></li><li><a href="http://www.linuxforu.com/2011/05/databases-in-era-of-cloud-computing-and-big-data/" rel="bookmark" class="crp_title">Databases in the Era of Cloud Computing and Big Data</a></li><li><a href="http://www.linuxforu.com/2010/02/enterprise-db-score-a-goal-with-postgres/" rel="bookmark" class="crp_title">Score a Goal With Postgres!</a></li></ul></div>Tags: <a href="http://www.linuxforu.com/tag/451-group/" title="451 Group" rel="tag">451 Group</a>, <a href="http://www.linuxforu.com/tag/acid/" title="acid" rel="tag">acid</a>, <a href="http://www.linuxforu.com/tag/analytics-systems/" title="analytics systems" rel="tag">analytics systems</a>, <a href="http://www.linuxforu.com/tag/big-data/" title="big data" rel="tag">big data</a>, <a href="http://www.linuxforu.com/tag/couchdb/" 
title="CouchDB" rel="tag">CouchDB</a>, <a href="http://www.linuxforu.com/tag/data-availability/" title="data availability" rel="tag">data availability</a>, <a href="http://www.linuxforu.com/tag/data-management/" title="data management" rel="tag">data management</a>, <a href="http://www.linuxforu.com/tag/data-management-solutions/" title="data management solutions" rel="tag">data management solutions</a>, <a href="http://www.linuxforu.com/tag/data-processing/" title="data processing" rel="tag">data processing</a>, <a href="http://www.linuxforu.com/tag/data-storage/" title="data storage" rel="tag">data storage</a>, <a href="http://www.linuxforu.com/tag/database-systems/" title="database systems" rel="tag">database systems</a>, <a href="http://www.linuxforu.com/tag/database-technology/" title="database technology" rel="tag">database technology</a>, <a href="http://www.linuxforu.com/tag/databases/" title="databases" rel="tag">databases</a>, <a href="http://www.linuxforu.com/tag/enterprise-applications/" title="enterprise applications" rel="tag">enterprise applications</a>, <a href="http://www.linuxforu.com/tag/exabytes/" title="exabytes" rel="tag">exabytes</a>, <a href="http://www.linuxforu.com/tag/facebook/" title="Facebook" rel="tag">Facebook</a>, <a href="http://www.linuxforu.com/tag/google/" title="Google" rel="tag">Google</a>, <a href="http://www.linuxforu.com/tag/high-availability/" title="high availability" rel="tag">high availability</a>, <a href="http://www.linuxforu.com/tag/information-management-systems/" title="information management systems" rel="tag">information management systems</a>, <a href="http://www.linuxforu.com/tag/lfy-january-2012/" title="LFY January 2012" rel="tag">LFY January 2012</a>, <a href="http://www.linuxforu.com/tag/linkedin/" title="LinkedIn" rel="tag">LinkedIn</a>, <a href="http://www.linuxforu.com/tag/matthew-aslett/" title="Matthew Aslett" rel="tag">Matthew Aslett</a>, <a href="http://www.linuxforu.com/tag/mysql/" title="MySQL" 
rel="tag">MySQL</a>, <a href="http://www.linuxforu.com/tag/newsql/" title="NewSQL" rel="tag">NewSQL</a>, <a href="http://www.linuxforu.com/tag/nosql/" title="NoSQL" rel="tag">NoSQL</a>, <a href="http://www.linuxforu.com/tag/nosql-solutions/" title="NoSQL solutions" rel="tag">NoSQL solutions</a>, <a href="http://www.linuxforu.com/tag/oltp/" title="OLTP" rel="tag">OLTP</a>, <a href="http://www.linuxforu.com/tag/performance-requirements/" title="performance requirements" rel="tag">performance requirements</a>, <a href="http://www.linuxforu.com/tag/processing-power/" title="processing power" rel="tag">processing power</a>, <a href="http://www.linuxforu.com/tag/rdbms/" title="RDBMS" rel="tag">RDBMS</a>, <a href="http://www.linuxforu.com/tag/real-time-transactions/" title="real-time transactions" rel="tag">real-time transactions</a>, <a href="http://www.linuxforu.com/tag/relational-database/" title="Relational Database" rel="tag">Relational Database</a>, <a href="http://www.linuxforu.com/tag/twitter/" title="Twitter" rel="tag">Twitter</a>, <a href="http://www.linuxforu.com/tag/web-applications/" title="web applications" rel="tag">web applications</a><br /> ]]></content:encoded> <wfw:commentRss>http://www.linuxforu.com/2012/01/newsql-handle-big-data/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>The Importance of In-memory Databases</title><link>http://www.linuxforu.com/2012/01/importance-of-in-memory-databases/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=importance-of-in-memory-databases</link> <comments>http://www.linuxforu.com/2012/01/importance-of-in-memory-databases/#comments</comments> <pubDate>Mon, 30 Jan 2012 04:48:42 +0000</pubDate> <dc:creator>Prasanna Venkatesh</dc:creator> <category><![CDATA[Developers]]></category> <category><![CDATA[Overview]]></category> <category><![CDATA[Sysadmins]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[CSQL]]></category> <category><![CDATA[data 
management]]></category> <category><![CDATA[data mining]]></category> <category><![CDATA[data structure/algorithm]]></category> <category><![CDATA[database management systems]]></category> <category><![CDATA[database systems]]></category> <category><![CDATA[IBM]]></category> <category><![CDATA[IMDB]]></category> <category><![CDATA[in-memory computing]]></category> <category><![CDATA[in-memory databases]]></category> <category><![CDATA[in-memory technology]]></category> <category><![CDATA[India]]></category> <category><![CDATA[LFY January 2012]]></category> <category><![CDATA[MonetDB]]></category> <category><![CDATA[National Research Institute]]></category> <category><![CDATA[Oracle]]></category> <category><![CDATA[RDBMS]]></category> <category><![CDATA[Real-time analytics]]></category> <category><![CDATA[real-time applications]]></category> <category><![CDATA[real-time performance]]></category> <category><![CDATA[simplified storage algorithms]]></category> <category><![CDATA[web applications]]></category><guid isPermaLink="false">http://www.linuxforu.com/?p=9341</guid> <description><![CDATA[It has been predicted that in-memory computing will be one of the Top 10 technologies of 2012. In-memory databases (IMDBs) are a critical part of this paradigm. Through this introductory article, let&#8217;s get...]]></description> <content:encoded><![CDATA[<p><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/IN-Memory-DB.jpg?d9c344" alt="In-memory database" title="In-memory database" width="300" height="300" class="alignright size-full wp-image-9346" /><div class="introduction">It has been predicted that in-memory computing will be one of the Top 10 technologies of 2012. In-memory databases (IMDBs) are a critical part of this paradigm. Through this introductory article, let&#8217;s get acquainted with the basics of IMDBs. 
We will look at what they are, why they are developed, and the key differences between IMDB and traditional disk DBs.</div><p>Gartner has predicted in-memory computing to be one of the top 10 strategic technologies of 2012. In-memory computing is expected to have a disruptive impact on the data warehousing domain in the coming two years. There have been a series of such products released in the last two years, one of the most famous being SAP&#8217;s HANA. Real-time analytics and sub-second response times for enterprise applications require high-performance data management systems. This, in turn, has led to a surge in the importance of IMDBs.</p><p>Network bandwidth has increased dramatically and multi-core processors are available even in mobile phones. However, disk I/O speed has not been increasing at the same rate, which has crippled traditional databases. The first step towards high-performance databases is IMDBs. With ever-growing RAM size and the ability to address more RAM (with 64-bit address spaces), IMDBs are in vogue.</p><h2>What is an in-memory database?</h2><p>Wikipedia defines an in-memory database as, &#8220;An in-memory database (IMDB; also, main memory database system or MMDB) is a database management system that primarily relies on main memory for computer data storage. It is contrasted with database management systems that employ a disk storage mechanism.&#8221;</p><p>Margaret H Eich has given a very simple definition of an IMDB/MMDB: a database whose primary data store is main memory.</p><p>IMDBs are architected and designed differently from traditional RDBMSs. They have simplified algorithms and mechanisms, which are built with the awareness that all data is going to be in RAM. 
Figure 1 is a simple illustration of the differences.</p><div id="attachment_9345" class="wp-caption aligncenter" style="width: 590px"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/IMDB-vs-Disk-DB-Fig.1-590x296.png?d9c344" alt="IMDB vs Disk DB" title="IMDB vs Disk DB" width="590" height="296" class="size-large wp-image-9345" /><p class="wp-caption-text">Figure 1: IMDB vs Disk DB</p></div><p>IMDBs are also different from simplified storage algorithms like hash-tables or trees, in that IMDBs are &#8220;databases&#8221; or RDBMSs. Most IMDBs are ACID-compliant relational databases offering SQL. They have all the properties of a traditional RDBMS, but are tuned for data to be in-memory.</p><h2>Getting deeper into the IMDB</h2><p>IMDBs are not a new phenomenon. The great DB scientist Jim Gray conceptualised this technology 30 years ago and also predicted that in the 2000s these technologies would become widely accepted. Almost from the early 1980s, IMDBs have existed in the telecom and defense domains. However, these were built and maintained as internal components. IBM built one of the first in-memory engines (IMS/VS FastPath) way back in 1978.</p><p>The first significant commercial IMDB offering was TimesTen (later acquired by Oracle). Since then, almost all top DB vendors boast an IMDB product &#8212; like IBM&#8217;s SolidDB, Sybase&#8217;s ASE, etc.</p><h2>Myths about IMDBs</h2><p>Consider the following assertions:</p><ul><li>Given the same amount of RAM, disk DBs can perform at the same speed as IMDBs (by using caching technology).</li><li>If a RAM disk is created and a traditional disk DB is deployed on it, it delivers the same performance as an in-memory database.</li><li>If the system crashes, all the data stored in IMDBs will be lost.</li><li>SSDs and flash storage are getting better and better. 
Using these technologies along with traditional disk DBs yields the same performance as an in-memory database.</li><li>Since RAM size is limited, the size of an IMDB is also limited.</li><li>Finally, IMDBs are not special; they are traditional disk DBs made to run on RAM.</li></ul><p>Naturally, all of these are myths. It is close to impossible to create an in-memory DB from a traditional disk database by just changing the OS or hardware environment.</p><p>So, internally, how different is an IMDB from a traditional disk DB?</p><p>IMDBs are architected and designed keeping in mind the fact that all data is in memory. This actually leads to a much simpler design than that of disk DBs. There are six areas of difference:</p><ol><li><em>Query optimisation:</em> In disk DBs, I/O cost dominates optimisation decisions. In IMDBs, however, there is no such dominant factor, which makes query optimisation very tricky. This is generally solved by assuming constant costs and falling back on rule-based optimisation.</li><li><em>Indexing:</em> More memory-friendly data structures and algorithms are used for indexing. While most disk DBs use the B-Tree as their primary index structure, IMDBs tend to use the T-Tree.</li><li><em>Internal data representation:</em> Compactness of representation is the dominant concern for IMDBs. With all data being in memory, IMDBs tend to use direct memory pointers heavily. This is very typical of the IMDB memory page, index data or relation representations.</li><li><em>Durability and recovery:</em> Contrary to popular belief, IMDBs are durable. They use algorithms similar to those of disk DBs for persistence. However, buffer management, which is the biggest performance bottleneck for disk DBs, is eliminated. During database loading, IMDBs tend to take a bit more time as they have to load the complete data set into memory. 
Hence, recovery is a bit slower.</li><li><em>Access methodology:</em> Generally, disk DBs offer client-server access over sockets as the primary access method. However, with disk I/O out of the picture, socket communication becomes the bottleneck if it is the only access method an IMDB offers. Hence, most IMDBs tend to offer shared-memory access as the primary method. In a few cases, JDBC/ODBC interfaces are also supported.</li><li><em>Concurrency control:</em> Due to their inherent processing speed, IMDBs can take coarser locks and do less to persist them. Disk DBs, by contrast, take finer locks and take elaborate measures to persist them.</li></ol><p>Figure 2 shows a typical architecture for an in-memory database.</p><div id="attachment_9342" class="wp-caption aligncenter" style="width: 590px"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/IMDB-Architecture-Fig.2-590x417.png?d9c344" alt="IMDB architecture" title="IMDB architecture" width="590" height="417" class="size-large wp-image-9342" /><p class="wp-caption-text">Figure 2: IMDB architecture</p></div><h2>Typical applications of IMDBs</h2><p>IMDBs are applicable in all domains that require real-time performance and very low latency. Four domains typically use IMDBs: telecom, financial segments, enterprises, and e-commerce and Web applications. In these spaces, IMDBs are used in a variety of applications (refer to Figure 3).</p><div id="attachment_9344" class="wp-caption aligncenter" style="width: 590px"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/IMDB-Usecases-Fig.3-590x298.png?d9c344" alt="IMDB usecases" title="IMDB usecases" width="590" height="298" class="size-large wp-image-9344" /><p class="wp-caption-text">Figure 3: IMDB use cases</p></div><h2>Key offerings in the IMDB space</h2><p>The IMDB space is dominated by commercial players. Some of the most important ones are Oracle TimesTen, IBM SolidDB, Sybase ASE, ENEA Polyhedra and McObject ExtremeDB. There are also some notable open source solutions. 
Let us take a look at two of the FOSS IMDBs &#8212; CSQL and MonetDB.</p><h3>CSQL</h3><p>CSQL is an open source main-memory high-performance RDBMS developed in India. It is one of the fastest open source IMDBs. It is designed to provide high performance on simple SQL queries and DML statements that involve only one table. It supports only a limited feature set, covering what most real-time applications use: <code>INSERT</code>, <code>UPDATE</code> and <code>DELETE</code> on a single table, and <code>SELECT</code> with local predicates on a single table.</p><p>It provides multiple interfaces such as JDBC, ODBC and other SQL APIs. CSQL offers atomicity, consistency and isolation. It is typically recommended for use as a cache for existing disk-based commercial databases.</p><h3>MonetDB</h3><p>MonetDB is an open source high-performance DBMS developed at the National Research Institute for Mathematics and Computer Science in the Netherlands. It was designed to provide high performance on complex queries against large databases, e.g., combining tables with hundreds of columns and millions of rows.</p><p>MonetDB is one of the first database systems to focus its query optimisation effort on exploiting CPU caches. Development of MonetDB started in 1979 and it became an open source project in 2003. MonetDB has been successfully applied in high-performance applications for data mining, OLAP, GIS, XML Query, and text and multimedia retrieval.</p><p>How do IMDBs change the rules of the game? While existing applications and schemas can benefit directly from the performance improvement an IMDB brings, the full benefit is not achieved unless certain things are handled carefully. 
Areas that require significant changes at both the conceptual and implementation levels are application design, database schema design and data design (partitioning of data).</p><h2>Application design</h2><p>Getting the performance benefits offered by an IMDB means redesigning applications to exploit the specific strengths of in-memory technology. The main way to achieve this is to push work that is currently done in the application layer down to the database.</p><p>This not only allows developers to take advantage of special operations offered by the DBMS, but also reduces the amount of data that must be transferred between application layers. This can lead to substantial performance improvements and can open up new application areas.</p><h2>Database schema design</h2><p>IMDB is beneficial if the data fits into a single database. This requires efforts to conserve space, so more elaborate normalisation procedures are not suitable. Also, using very precise and apt data types saves storage space. Reduced redundancy, carefully formed columns, precise index creation and efficient data management will help IMDB yield better performance.</p><h2>Data design</h2><p>One of the key changes when moving from traditional disk DBs to in-memory DBs is the space available, so data partitioning and storage assume great significance. IMDB requires as much related data as possible in a single process space. Too much distribution will introduce network I/O in the processing, thereby degrading performance.</p><h2>Summing up</h2><p>IMDBs, combined with various current hardware trends, have the ability to change the performance of enterprise and other applications drastically. This, in turn, will result in tremendous value generation for businesses. 
Such enhanced performance will also foster the evolution of innovative applications and services.</p><p>Do send in your feedback and queries, which can be addressed in our forthcoming articles in the IMDB series.<div id="crp_related"><h5>Related Posts:</h5><ul><li><a href="http://www.linuxforu.com/2012/01/newsql-handle-big-data/" rel="bookmark" class="crp_title">NewSQL &#8212; The New Way to Handle Big Data</a></li><li><a href="http://www.linuxforu.com/2011/07/database-demands-of-peta-scale-computing/" rel="bookmark" class="crp_title">The Database Demands of Peta-scale Computing</a></li><li><a href="http://www.linuxforu.com/2009/02/improve-multi-os-computer-performance-through-cross-swapping/" rel="bookmark" class="crp_title">Improve Multi-OS Computer Performance through Cross Swapping</a></li><li><a href="http://www.linuxforu.com/2012/01/postgres-xc-database-clustering-solution/" rel="bookmark" class="crp_title">Postgres-XC &#8212; A PostgreSQL Clustering Solution</a></li><li><a href="http://www.linuxforu.com/2011/08/comprehensive-lamp-guide-part-2-mysql/" rel="bookmark" class="crp_title">The Comprehensive LAMP Guide &#8212; Part 2 (MySQL)</a></li></ul></div>Tags: <a href="http://www.linuxforu.com/tag/csql/" title="CSQL" rel="tag">CSQL</a>, <a href="http://www.linuxforu.com/tag/data-management/" title="data management" rel="tag">data management</a>, <a href="http://www.linuxforu.com/tag/data-mining/" title="data mining" rel="tag">data mining</a>, <a href="http://www.linuxforu.com/tag/data-structurealgorithm/" title="data structure/algorithm" rel="tag">data structure/algorithm</a>, <a href="http://www.linuxforu.com/tag/database-management-systems/" title="database management systems" rel="tag">database management systems</a>, <a href="http://www.linuxforu.com/tag/database-systems/" title="database systems" rel="tag">database systems</a>, <a href="http://www.linuxforu.com/tag/ibm/" title="IBM" rel="tag">IBM</a>, <a href="http://www.linuxforu.com/tag/imdb/" 
title="IMDB" rel="tag">IMDB</a>, <a href="http://www.linuxforu.com/tag/in-memory-computing/" title="in-memory computing" rel="tag">in-memory computing</a>, <a href="http://www.linuxforu.com/tag/in-memory-databases/" title="in-memory databases" rel="tag">in-memory databases</a>, <a href="http://www.linuxforu.com/tag/in-memory-technology/" title="in-memory technology" rel="tag">in-memory technology</a>, <a href="http://www.linuxforu.com/tag/india/" title="India" rel="tag">India</a>, <a href="http://www.linuxforu.com/tag/lfy-january-2012/" title="LFY January 2012" rel="tag">LFY January 2012</a>, <a href="http://www.linuxforu.com/tag/monetdb/" title="MonetDB" rel="tag">MonetDB</a>, <a href="http://www.linuxforu.com/tag/national-research-institute/" title="National Research Institute" rel="tag">National Research Institute</a>, <a href="http://www.linuxforu.com/tag/oracle/" title="Oracle" rel="tag">Oracle</a>, <a href="http://www.linuxforu.com/tag/rdbms/" title="RDBMS" rel="tag">RDBMS</a>, <a href="http://www.linuxforu.com/tag/real-time-analytics/" title="Real-time analytics" rel="tag">Real-time analytics</a>, <a href="http://www.linuxforu.com/tag/real-time-applications/" title="real-time applications" rel="tag">real-time applications</a>, <a href="http://www.linuxforu.com/tag/real-time-performance/" title="real-time performance" rel="tag">real-time performance</a>, <a href="http://www.linuxforu.com/tag/simplified-storage-algorithms/" title="simplified storage algorithms" rel="tag">simplified storage algorithms</a>, <a href="http://www.linuxforu.com/tag/web-applications/" title="web applications" rel="tag">web applications</a><br /> ]]></content:encoded> <wfw:commentRss>http://www.linuxforu.com/2012/01/importance-of-in-memory-databases/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Postgres-XC &#8212; A PostgreSQL Clustering 
Solution</title><link>http://www.linuxforu.com/2012/01/postgres-xc-database-clustering-solution/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=postgres-xc-database-clustering-solution</link> <comments>http://www.linuxforu.com/2012/01/postgres-xc-database-clustering-solution/#comments</comments> <pubDate>Fri, 27 Jan 2012 13:09:40 +0000</pubDate> <dc:creator>Ashutosh Bapat</dc:creator> <category><![CDATA[Developers]]></category> <category><![CDATA[Overview]]></category> <category><![CDATA[Sysadmins]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[cluster system]]></category> <category><![CDATA[clustering solution]]></category> <category><![CDATA[data storage]]></category> <category><![CDATA[database]]></category> <category><![CDATA[database clusters]]></category> <category><![CDATA[database servers]]></category> <category><![CDATA[disk cache]]></category> <category><![CDATA[EnterpriseDB]]></category> <category><![CDATA[high availability]]></category> <category><![CDATA[LFY January 2012]]></category> <category><![CDATA[Postgres-XC]]></category> <category><![CDATA[PostgreSQL]]></category> <category><![CDATA[relational database system]]></category> <category><![CDATA[scalability]]></category> <category><![CDATA[scalability problems]]></category> <category><![CDATA[SQL]]></category><guid isPermaLink="false">http://www.linuxforu.com/?p=9308</guid> <description><![CDATA[What started with a simple relational database system, is expanding its horizons by developing new technology that satiates the ever-increasing need for more data storage, greater transaction throughput and higher availability. 
Using a...]]></description> <content:encoded><![CDATA[<p><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/postgre-xc-ist-image-590x267.jpg?d9c344" alt="PostgreSQL Clustering" title="PostgreSQL Clustering" width="590" height="267" class="aligncenter size-large wp-image-9314" /><div class="introduction">What started as a simple relational database system is expanding its horizons by developing new technology that satiates the ever-increasing need for more data storage, greater transaction throughput and higher availability. Using a cluster to solve these scalability problems is a current trend. This article talks about Postgres-XC, a clustering solution based on the popular PostgreSQL RDBMS.</div><p>A cluster is a collection of commodity components that provide scalability and availability at a low cost to the consumer. A database cluster is a collection of database servers that store and process data using commodity hardware, satisfying the need for more data storage, higher throughput and high availability. Postgres-XC is such a database cluster system; it is based on PostgreSQL, and follows the same open source model.</p><p>The Postgres-XC project began in 2009, through a collaboration between NTT and EnterpriseDB. The goal was to build an open source clustering solution based on PostgreSQL with 100 per cent compatible client APIs. Having PostgreSQL-compatible APIs allows existing PostgreSQL applications to use Postgres-XC with little (or no) change. The licensing terms of this project are the same as those of PostgreSQL.</p><h2>Postgres-XC architecture</h2><p>Postgres-XC is a write-scalable, synchronous multi-master, transparent PostgreSQL clustering solution based on a shared-nothing architecture. It is a collection of tightly coupled database components, which can be installed on one or more physical or virtual machines. 
The components do not share any resources such as disk, cache or memory.</p><ul><li><em>Write-scalability</em> means that Postgres-XC can be configured with as many database servers as needed, so that it can handle more writes than a single PostgreSQL server.</li><li><em>Multi-master</em> implies that clients can connect to multiple database servers, and that each database server provides a single, consistent, cluster-wide view of the database.</li><li><em>Synchronous</em> means that a write from any of the masters is immediately visible to other transactions running on other masters.</li><li><em>Transparent</em> means that applications do not have to worry about how the data is stored internally across multiple database servers.</li></ul><div id="attachment_9310" class="wp-caption aligncenter" style="width: 590px"><a href="http://cdn.linuxforu.com/wp-content/uploads/2012/01/postgres-xc-01.png?d9c344"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/postgres-xc-01-590x479.png?d9c344" alt="Postgres-XC architecture" title="Postgres-XC architecture" width="590" height="479" class="size-large wp-image-9310" /></a><p class="wp-caption-text">Figure 1: Postgres-XC architecture</p></div><p>Figure 1 gives the architectural overview of Postgres-XC with its three main components:</p><ol><li><em>Global Transaction Manager (GTM)</em> gathers and manages information about transactional activities in Postgres-XC, issues global transaction identifiers to transactions (to maintain a consistent view of the database on all nodes), and provides ACID properties. It also provides support for other global data, such as sequences and timestamps. It stores no user data, only control information.</li><li><em>Coordinators</em> (masters) provide a point of contact for the application/client. They are responsible for parsing and executing queries from the clients, and returning the results (if needed). 
They do not store any user data themselves, but gather the data from datanodes, with the help of SQL queries fired through a PostgreSQL-native interface. The coordinators also process the data if required and even manage the two-phase commit. Although coordinators do not store user data, they use the catalogue data to parse queries, resolve symbols, plan queries, locate data, etc.</li><li><em>Datanodes</em> store user data and catalogues. The datanodes execute the queries received from the coordinator and return results to the coordinator.</li></ol><h2>Distribution of data and scalability</h2><p>Postgres-XC allows two ways of storing the tables on the datanodes:</p><ol><li><strong>Distributed tables:</strong> A table is distributed on a given set of datanodes using strategies like hash, round-robin, or modulo partitioning. Every row of a distributed table resides on a single datanode. Multiple rows can be modified or written in parallel to various datanodes; we can also read the rows from various datanodes in parallel. Performance is greatly improved by parallel writes and reads from different datanodes.</li><li><strong>Replicated tables:</strong> A table is replicated on a given set of datanodes using statement-level replication, which performs better than log-based replication, since the size of the logs that must be shipped is much greater than the size of the statement. In the case of a replicated table, a row in the table resides on each datanode on which the table is replicated. Any modifications to the row must be duplicated to each replicated copy. Since all the data in the table is available on a single datanode, the coordinator can gather all the data from a single node and in some cases, act as a proxy between the client application and the datanode. 
This allows multiple queries on the same table to be directed to different datanodes, thus balancing the load and increasing the read throughput.</li></ol><p>Figures 2 and 3 depict the read and write concepts for distributed and replicated tables, respectively.</p><div id="attachment_9311" class="wp-caption aligncenter" style="width: 590px"><a href="http://cdn.linuxforu.com/wp-content/uploads/2012/01/postgres-xc-02.png?d9c344"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/postgres-xc-02-590x265.png?d9c344" alt="Distributed tables" title="Distributed tables" width="590" height="265" class="size-large wp-image-9311" /></a><p class="wp-caption-text">Figure 2: Distributed tables</p></div><div id="attachment_9312" class="wp-caption aligncenter" style="width: 590px"><a href="http://cdn.linuxforu.com/wp-content/uploads/2012/01/postgres-xc-03.png?d9c344"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/postgres-xc-03-590x239.png?d9c344" alt="Replicated tables" title="Replicated tables" width="590" height="239" class="size-large wp-image-9312" /></a><p class="wp-caption-text">Figure 3: Replicated tables</p></div><h2>High availability</h2><p>To achieve high availability, one needs data redundancy, component redundancy and automatic failover. In Postgres-XC, data redundancy can be achieved by using the PostgreSQL native replication with Hot Standby for datanodes. Since each coordinator is a master (capable of writing data) and is capable of reading writes performed by any other coordinator instantaneously, every coordinator is capable of replacing any other, should that coordinator fail. GTM-standby acts as a redundant component for GTM. 
However, third-party tools are required for automatic failover of all three types of components.</p><h2>Performance evaluation</h2><p>Initial transaction throughput measurements carried out using the DBT-1 benchmark have demonstrated significant throughput scalability, as shown in Figure 4.</p><div id="attachment_9313" class="wp-caption aligncenter" style="width: 590px"><img src="http://cdn.linuxforu.com/wp-content/uploads/2012/01/postgres-xc-04-590x405.jpg?d9c344" alt="Performance evaluation" title="Performance evaluation" width="590" height="405" class="size-large wp-image-9313" /><p class="wp-caption-text">Figure 4: Performance evaluation</p></div><p>The figure plots the Scaling Factor versus the Number of Servers in Postgres-XC. The Scaling Factor is the ratio of the number of transactions completed per unit time by Postgres-XC to that completed by PostgreSQL. A Server comprises a coordinator and a datanode running on a single machine. This benchmark demonstrated an improvement in throughput of approximately six times when using 10 servers.</p><h2>Release management &#038; development processes</h2><p>The Postgres-XC project is hosted on <a href="http://postgres-xc.sourceforge.net/">SourceForge</a>. The Postgres-XC team tries to release a minor version of Postgres-XC every three to four months to ensure that the latest Postgres-XC features are available to users. The team also tries to make the latest PostgreSQL features available in Postgres-XC by doing frequent merges with the latest stable release of PostgreSQL.</p><p>The latest release (0.9.6) of Postgres-XC supports most of the SQL syntax and features of PostgreSQL 9.1. The team is currently working on the first major release, Postgres-XC 1.0, due in 2012, with maximum PostgreSQL compliance. 
Other features, such as the dynamic addition and removal of components, global deadlock detection and global constraint support, will be targeted for major releases after 1.0.</p><p>The development team follows the open source development model, where the issues, features or any other development-related items are discussed on the public mailing list: <code>postgres-xc-developers@lists.sourceforge.net</code>. The <code>postgres-xc-general@lists.sourceforge.net</code> mailing list is used to discuss other Postgres-XC matters and to ask for help with Postgres-XC.</p><h3>Wish to contribute?</h3><p>The Postgres-XC team needs help with feature development, bug fixing, creating installers and distribution packages, testing, and evaluation of Postgres-XC on real applications. To be part of the Postgres-XC community, feel free to contact the Postgres-XC team on the appropriate mailing list.</p><div id="crp_related"><h5>Related Posts:</h5><ul><li><a href="http://www.linuxforu.com/2010/02/enterprise-db-score-a-goal-with-postgres/" rel="bookmark" class="crp_title">Score a Goal With Postgres!</a></li><li><a href="http://www.linuxforu.com/2011/06/telecom-service-provider-handles-huge-volumes-of-data-using-postgresql/" rel="bookmark" class="crp_title">A Telecom Service Provider Handles Huge Volumes of Data Using PostgreSQL</a></li><li><a href="http://www.linuxforu.com/2010/06/open-source-tools-in-gis/" rel="bookmark" class="crp_title">Open Source Tools in GIS</a></li><li><a href="http://www.linuxforu.com/2011/08/it-service-provider-turns-to-postgresql-for-scalability/" rel="bookmark" class="crp_title">IT Service Provider Turns to PostgreSQL for Scalability</a></li><li><a href="http://www.linuxforu.com/2011/03/a-foss-lovers-tryst-with-postgresql/" rel="bookmark" class="crp_title">A FOSS Lover&#8217;s Tryst With PostgreSQL</a></li></ul></div>Tags: <a href="http://www.linuxforu.com/tag/cluster-system/" title="cluster system" rel="tag">cluster system</a>, <a 
href="http://www.linuxforu.com/tag/clustering-solution/" title="clustering solution" rel="tag">clustering solution</a>, <a href="http://www.linuxforu.com/tag/data-storage/" title="data storage" rel="tag">data storage</a>, <a href="http://www.linuxforu.com/tag/database/" title="database" rel="tag">database</a>, <a href="http://www.linuxforu.com/tag/database-clusters/" title="database clusters" rel="tag">database clusters</a>, <a href="http://www.linuxforu.com/tag/database-servers/" title="database servers" rel="tag">database servers</a>, <a href="http://www.linuxforu.com/tag/disk-cache/" title="disk cache" rel="tag">disk cache</a>, <a href="http://www.linuxforu.com/tag/enterprisedb/" title="EnterpriseDB" rel="tag">EnterpriseDB</a>, <a href="http://www.linuxforu.com/tag/high-availability/" title="high availability" rel="tag">high availability</a>, <a href="http://www.linuxforu.com/tag/lfy-january-2012/" title="LFY January 2012" rel="tag">LFY January 2012</a>, <a href="http://www.linuxforu.com/tag/postgres-xc/" title="Postgres-XC" rel="tag">Postgres-XC</a>, <a href="http://www.linuxforu.com/tag/postgresql/" title="PostgreSQL" rel="tag">PostgreSQL</a>, <a href="http://www.linuxforu.com/tag/relational-database-system/" title="relational database system" rel="tag">relational database system</a>, <a href="http://www.linuxforu.com/tag/scalability/" title="scalability" rel="tag">scalability</a>, <a href="http://www.linuxforu.com/tag/scalability-problems/" title="scalability problems" rel="tag">scalability problems</a>, <a href="http://www.linuxforu.com/tag/sql/" title="SQL" rel="tag">SQL</a><br /> ]]></content:encoded> <wfw:commentRss>http://www.linuxforu.com/2012/01/postgres-xc-database-clustering-solution/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>A Low-cost Platform for PC-interfaced Science Experiments called 
expEYES</title><link>http://www.linuxforu.com/2011/07/expeyes-low-cost-platform-pc-interfaced-science-experiments/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=expeyes-low-cost-platform-pc-interfaced-science-experiments</link> <comments>http://www.linuxforu.com/2011/07/expeyes-low-cost-platform-pc-interfaced-science-experiments/#comments</comments> <pubDate>Thu, 30 Jun 2011 18:42:17 +0000</pubDate> <dc:creator>Jalaja Ramanunni</dc:creator> <category><![CDATA[For You & Me]]></category> <category><![CDATA[Overview]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[Ajith Kumar]]></category> <category><![CDATA[developers kit]]></category> <category><![CDATA[Educational Resources]]></category> <category><![CDATA[electronics department]]></category> <category><![CDATA[engineering students]]></category> <category><![CDATA[experiment platform]]></category> <category><![CDATA[inexpensive hardware]]></category> <category><![CDATA[Inter University Accelerator Centre]]></category> <category><![CDATA[IUAC]]></category> <category><![CDATA[LFY July 2011]]></category> <category><![CDATA[low cost computer]]></category> <category><![CDATA[particle accelerator]]></category> <category><![CDATA[Phoenix]]></category> <category><![CDATA[python]]></category> <category><![CDATA[Python library]]></category> <category><![CDATA[science experiments]]></category> <category><![CDATA[sensor elements]]></category> <category><![CDATA[software framework]]></category><guid isPermaLink="false">http://www.linuxforu.com/?p=5009</guid> <description><![CDATA[There is nothing new about computer-interfaced science experiments, but how often do you come across an experiment platform that costs as little as Rs 3,000? Inter University Accelerator Centre (IUAC) has developed one...]]></description> <content:encoded><![CDATA[<p><img src="http://cdn.linuxforu.com/wp-content/uploads/2011/07/ResearchMain-590x306.jpg?d9c344" alt="Research time!" title="Research time!" 
width="590" height="306" class="aligncenter size-large wp-image-8094" /><div class="introduction">There is nothing new about computer-interfaced science experiments, but how often do you come across an experiment platform that costs as little as Rs 3,000? Inter University Accelerator Centre (IUAC) has developed one such framework that can also be used as an electronics developers&#8217; kit.</div><p>Schools and colleges often focus on theory, forgetting that practical experiments teach students much more. Members of the Inter University Accelerator Centre (IUAC), an autonomous research institute that provides research facilities to universities, realised the need for experiments in education when teachers and students from other colleges visited their centre and tried to use a particle accelerator. They were not familiar with computer-controlled experiments that use machines to collect data.</p><p>Dr Ajith Kumar, a scientist at IUAC, wished to provide teachers and students with exposure to this technology and started designing a low-cost device that measured and collected data, and displayed the results graphically on the computer screen.</p><p>Along with members of the electronics department, he developed Phoenix expEYES &#8212; a combination of hardware and a software framework for computer-interfaced science experiments that doesn&#8217;t require the user to get into the details of electronics or computer programming.</p><p>&#8220;We decided to create something on a smaller scale and introduce the idea of a low-cost computer-interfaced platform for science experiments,&#8221; comments Dr Kumar.</p><h2>A platform for experiments</h2><p>The equipment places emphasis on leveraging the power of personal computers for experiment control, data acquisition and processing. Phoenix expEYES enables one to carry out a variety of experiments using the same inexpensive hardware with different sensor elements and software. 
The design also enables the platform to be used as an electronics developers&#8217; kit, since it can be used to learn and practise microcontroller programming. Engineering students can play with the platform or use it as a case study to design similar applications. Since the code and design are open, one can get any details required from IUAC&#8217;s website.</p><h2>What sets it apart</h2><p>Dr Kumar emphasises, &#8220;Our main aim was to make the equipment affordable to educational institutes. Phoenix expEYES is priced at around Rs 3,000, and a similar product in the market would cost you over Rs 1,00,000.&#8221;</p><p>The team managed to keep the price low by using low-cost hardware components that were readily available. Another cost-cutting factor is that the project runs on free and open source software, mostly written in the Python programming language.</p><p>One of the unique features of Phoenix is its division of labour. The microcontroller collects real-time data from the experiment, while the software running on the PC takes care of complex processing and graphical representation of data on the computer screen.</p><p>Phoenix&#8217;s hardware consists of an Atmel ATmega32 microcontroller running a C program, a 12-bit resolution analogue-to-digital converter (ADC), a 12-bit digital-to-analogue converter, variable gain amplifiers, several oscillators and a programmable constant-current source. It is interfaced to a computer that runs Python.</p><p>&#8220;For the software, we have chosen Python, as it has a huge collection of libraries and programs for scientific computation. Several GUI programs have already been written to perform various experiments. A user can conduct several experiments using the GUI programs provided, but designing new experiments would require accessing the hardware accordingly. 
Hardware features can be accessed by entering codes that are well documented from the Python library,&#8221; Dr Kumar explains.</p><p>Phoenix&#8217;s clear-cut architecture is one of its USPs, enabling teachers to develop new experiments with the help of the library. Dr Kumar elaborates, &#8220;School teachers are usually not inclined to learn about electronics or micro-level C-programming, but this is usually required when communicating with the hardware. With Phoenix, all they need to do is write one line of code that is readily available in the Python Library, and they will get the required data. For example, if you want to capture the frequency from a guitar, you can type a code on the computer and send a request to the hardware unit. The computer&#8217;s monitor will display the frequency while you remain uninvolved with the programming details.&#8221;</p><p>The communication takes place between the microcontroller in the hardware unit and the computer running Python through a USB connection.</p><p>Dr Kumar adds, &#8220;The earlier units that we designed had parallel ports but this kind of interfacing lost popularity. Hence we migrated to the USB interface. It also helped that Python already had a library to communicate to the serial port.&#8221;</p><h2>Overcoming challenges</h2><p>Ensuring a low price point for Phoenix was one of the biggest challenges before Dr Kumar and his team while designing the platform. Getting high-resolution components at low prices was a challenge in itself and required a lot of research. For example, the IUAC team went to different markets to compare prices, easy availability and resolutions before selecting ADCs. 
They decided to use a product from Microchip, which was available for Rs 150.</p><p>&#8220;Apart from affordability, we looked for components that could be sourced easily and repaired by the users in case of the damage that is common in school and college laboratories,&#8221; reasons Dr Kumar.</p><p>The IUAC team also worked on the flexibility of the framework by writing the software in a generic manner. It is easy to take the hardware and write simple software for it, but that makes the platform rigid: only a limited number of experiments could then be done on it.</p><p>Dr Kumar illustrates, &#8220;We can easily design platforms that work for five experiments, but for the sixth one, the user should know all the details of hardware design and coding. This makes the architecture rigid. In Phoenix, a lot of time and effort went into making the architecture flexible. Its generic nature allows software that supports time measurement to be used to measure acceleration due to gravity and the velocity of sound.&#8221;</p><h2>What lies ahead</h2><p>IUAC&#8217;s goal is to help engineering students develop their own projects rather than purchase them from professionals in the field. It has approached NCERT and CBSE to make Phoenix a part of every school&#8217;s lab activities.</p><p>&#8220;As of now, the only explanation given for AC and DC is that AC means alternating current and DC means direct current. 
Most of the teachers cannot answer a simple query like, &#8216;What is the nature of voltage on a three-point mains socket?&#8217; We hope that Phoenix will help them improve by changing their practices, and making students and teachers more aware of what is in their books,&#8221; Dr Kumar concludes.<div id="crp_related"><h5>Related Posts:</h5><ul><li><a href="http://www.linuxforu.com/2011/06/open-hardware-sparks-innovation-in-robotics/" rel="bookmark" class="crp_title">Open Hardware Sparks Innovation in Robotics</a></li><li><a href="http://www.linuxforu.com/2012/01/building-image-processing-embedded-systems-using-python-part-1/" rel="bookmark" class="crp_title">Building Image Processing Embedded Systems using Python, Part 1</a></li><li><a href="http://www.linuxforu.com/2011/05/nit-c-students-use-postgresql-to-design-their-own-video-file-repository/" rel="bookmark" class="crp_title">NIT-C Students Use PostgreSQL to Design Their Own Video Repository</a></li><li><a href="http://www.linuxforu.com/2011/06/generic-hardware-access-in-linux/" rel="bookmark" class="crp_title">Device Drivers, Part 7: Generic Hardware Access in Linux</a></li><li><a href="http://www.linuxforu.com/2010/08/ami-bets-on-open-source-for-the-embedded-space/" rel="bookmark" class="crp_title">AMI bets on open source for the embedded space</a></li></ul></div>Tags: <a href="http://www.linuxforu.com/tag/ajith-kumar/" title="Ajith Kumar" rel="tag">Ajith Kumar</a>, <a href="http://www.linuxforu.com/tag/developers-kit/" title="developers kit" rel="tag">developers kit</a>, <a href="http://www.linuxforu.com/tag/educational-resources/" title="Educational Resources" rel="tag">Educational Resources</a>, <a href="http://www.linuxforu.com/tag/electronics-department/" title="electronics department" rel="tag">electronics department</a>, <a href="http://www.linuxforu.com/tag/engineering-students/" title="engineering students" rel="tag">engineering students</a>, <a 
href="http://www.linuxforu.com/tag/experiment-platform/" title="experiment platform" rel="tag">experiment platform</a>, <a href="http://www.linuxforu.com/tag/inexpensive-hardware/" title="inexpensive hardware" rel="tag">inexpensive hardware</a>, <a href="http://www.linuxforu.com/tag/inter-university-accelerator-centre/" title="Inter University Accelerator Centre" rel="tag">Inter University Accelerator Centre</a>, <a href="http://www.linuxforu.com/tag/iuac/" title="IUAC" rel="tag">IUAC</a>, <a href="http://www.linuxforu.com/tag/lfy-july-2011/" title="LFY July 2011" rel="tag">LFY July 2011</a>, <a href="http://www.linuxforu.com/tag/low-cost-computer/" title="low cost computer" rel="tag">low cost computer</a>, <a href="http://www.linuxforu.com/tag/particle-accelerator/" title="particle accelerator" rel="tag">particle accelerator</a>, <a href="http://www.linuxforu.com/tag/phoenix/" title="Phoenix" rel="tag">Phoenix</a>, <a href="http://www.linuxforu.com/tag/python/" title="python" rel="tag">python</a>, <a href="http://www.linuxforu.com/tag/python-library/" title="Python library" rel="tag">Python library</a>, <a href="http://www.linuxforu.com/tag/science-experiments/" title="science experiments" rel="tag">science experiments</a>, <a href="http://www.linuxforu.com/tag/sensor-elements/" title="sensor elements" rel="tag">sensor elements</a>, <a href="http://www.linuxforu.com/tag/software-framework/" title="software framework" rel="tag">software framework</a><br /> ]]></content:encoded> <wfw:commentRss>http://www.linuxforu.com/2011/07/expeyes-low-cost-platform-pc-interfaced-science-experiments/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>The Database Demands of Peta-scale Computing</title><link>http://www.linuxforu.com/2011/07/database-demands-of-peta-scale-computing/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=database-demands-of-peta-scale-computing</link> 
<comments>http://www.linuxforu.com/2011/07/database-demands-of-peta-scale-computing/#comments</comments> <pubDate>Thu, 30 Jun 2011 18:40:29 +0000</pubDate> <dc:creator>Saravanan Chidambaram</dc:creator> <category><![CDATA[Features]]></category> <category><![CDATA[Open Gurus]]></category> <category><![CDATA[Overview]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[access times]]></category> <category><![CDATA[adverse impact]]></category> <category><![CDATA[analytics tools]]></category> <category><![CDATA[bank transactions]]></category> <category><![CDATA[big data]]></category> <category><![CDATA[Cassandra]]></category> <category><![CDATA[cloud computing]]></category> <category><![CDATA[Daniel Abadi]]></category> <category><![CDATA[data cloud]]></category> <category><![CDATA[data tables]]></category> <category><![CDATA[data volumes]]></category> <category><![CDATA[database access]]></category> <category><![CDATA[database analytics product]]></category> <category><![CDATA[database systems]]></category> <category><![CDATA[database technology]]></category> <category><![CDATA[dbms]]></category> <category><![CDATA[DBMS Technologies]]></category> <category><![CDATA[Gordon Moore]]></category> <category><![CDATA[Intel]]></category> <category><![CDATA[LFY July 2011]]></category> <category><![CDATA[Moore's Law]]></category> <category><![CDATA[multi-core processor]]></category> <category><![CDATA[multi-core processors]]></category> <category><![CDATA[NoSQL]]></category> <category><![CDATA[NVRAM]]></category> <category><![CDATA[OLAP]]></category> <category><![CDATA[OLTP]]></category> <category><![CDATA[OnLine Analytical Processing]]></category> <category><![CDATA[OnLine Transaction Processing]]></category> <category><![CDATA[partitioning]]></category> <category><![CDATA[persistent stores]]></category> <category><![CDATA[peta-scale computing]]></category> <category><![CDATA[query access]]></category> <category><![CDATA[query processing]]></category> 
<category><![CDATA[query results]]></category> <category><![CDATA[RDBMS]]></category> <category><![CDATA[Relational databases]]></category> <category><![CDATA[semiconductor chips]]></category> <category><![CDATA[traditional databases]]></category> <category><![CDATA[Vertica]]></category><guid isPermaLink="false">http://www.linuxforu.com/?p=5007</guid> <description><![CDATA[Let&#8217;s take a brief look at the challenges for next-generation databases. In the previous article we discussed that as data volumes grow towards the peta-scale and beyond, most traditional databases find it difficult...]]></description> <content:encoded><![CDATA[<p><img src="http://cdn.linuxforu.com/wp-content/uploads/2011/07/db-cloud1-590x206.jpg?d9c344" alt="Demands of Peta-scale Computing" title="Demands of Peta-scale Computing" width="590" height="206" class="aligncenter size-large wp-image-8062" /><div class="introduction">Let&#8217;s take a brief look at the challenges for next-generation databases.</div><p>In the <a href="http://www.linuxforu.com/2011/05/databases-in-era-of-cloud-computing-and-big-data/" title="Databases in the Era of Cloud Computing and Big Data">previous article</a> we discussed that as data volumes grow towards the peta-scale and beyond, most traditional databases find it difficult to scale up to the needs of cloud computing and Big Data. While Big Data drives the need for databases to handle huge volumes of data, cloud computing requires dynamic scalability.</p><h2>Dynamic scalability</h2><p>Dynamic scalability, at first glance, appears simple to achieve in databases with the &#8220;Shared Nothing&#8221; architecture. This architecture splits the database into multiple partitions, with one partition per server. Therefore, it should be possible to dynamically scale by adding another server and repartitioning the database across all servers. 
However, database partitioning is an important factor in the performance of database access.</p><p>Database queries frequently access related sets of information. If a frequently accessed related set of data gets partitioned across multiple servers, data from multiple partitions needs to be accessed and moved between servers before the query results can be returned &#8212; a process known as data shipping. This can have a large adverse impact on query access times.</p><p>Data partitioning needs to take data access patterns into account in order to reduce data shipping costs. Hence, traditional &#8220;Shared Nothing&#8221; relational databases typically find it difficult to achieve dynamic scalability: repartitioning on the fly risks splitting frequently co-accessed data across servers.</p><p>The limited ability of the traditional RDBMS to scale dynamically led to the NoSQL movement, and the development of special-purpose data-stores (since the term database is typically used to represent an ACID-compliant traditional database, we use the term data-store for the NoSQL databases). These are persistent data stores, typically built as replicated data tables, which trade off strict ACID semantics for improved scalability.</p><p>These systems are characterised by eventual consistency (which we discussed in <a href="http://www.linuxforu.com/2011/05/databases-in-era-of-cloud-computing-and-big-data/" title="Databases in the Era of Cloud Computing and Big Data">an earlier article</a>) instead of the strict ACID consistency of RDBMSs. While NoSQL data-stores such as Bigtable, Cassandra or SimpleDB have shown excellent performance, it is not yet clear whether they can scale to the demands of the next decade&#8217;s peta-scale computing.</p><h2>Big data challenges</h2><p>Over the past few decades, there have been two major camps of database technology. 
The first one is that of OnLine Transaction Processing (OLTP) workloads, and the second that of OnLine Analytical Processing (OLAP) workloads.</p><p>An OLTP workload typically consists of a number of concurrent transactions that perform insertions, deletions and updates of the data stored in the database. OLTP workloads are both read- and write-intensive. Popular examples that we encounter in everyday life are bank transactions (deposits and withdrawals) and credit-card transactions.</p><p>OLAP workloads, on the other hand, typically perform complicated and compute-intensive analysis on the data stored in the database. For example, the CEO of a supermarket chain wants to know the total volume of sales from each store over the Christmas holiday period. This requires OLAP query processing.</p><p>OLAP queries are typically read-intensive. An OLTP transaction typically accesses one or a few records, but it accesses all fields of the accessed records. On the other hand, OLAP queries access a large number of records of the database, but only access one, or a very few, of the fields of each record.</p><p>The differences in the very nature of OLTP and OLAP workloads demand certain design differences in the databases for each of these workloads. An example of these differences is the row-oriented record layout of Row-Store Databases, versus the columnar layout of Column-Store Databases.</p><p>In row stores, all the fields of the record are stored contiguously, with one record completely laid out before another record starts. In columnar stores, the values of each column are laid out contiguously, across all records.</p><p>Recall that OLTP typically operates on one or very few records, and hence benefits from the row layout. On the other hand, OLAP workloads access a large number of records, accessing only one or a small number of columns in each record. Hence they benefit from the columnar layout. 
With a column-store architecture, a DBMS needs only to read the values of the columns required for processing a given query, and can avoid bringing into memory irrelevant fields that are not needed for the current query.</p><p>In warehouse environments where typical queries involve aggregates performed over large numbers of data items, a column store has a sizeable performance advantage over a row store. A number of popular column-oriented databases have been developed, such as <a href="http://www.vertica.com/">Vertica</a>; other new designs, such as <a href="http://www.voltdb.com/">VoltDB</a>, instead target OLTP workloads with an in-memory row store.</p><p>As mentioned earlier, OLAP is read-intensive; the database performance needs to be optimised for read operations, but at the same time, it needs to support database updates while maintaining ACID consistency. Some modern databases try to optimise for both, by providing both read- and write-optimised stores.</p><p>C-Store is the academic project on which the popular database analytics product, Vertica, is based. C-Store provides both a read-optimised column store and an update/insert-oriented writeable store, connected by a tuple mover.</p><div id="attachment_8064" class="wp-caption aligncenter" style="width: 400px"><img src="http://cdn.linuxforu.com/wp-content/uploads/2011/07/c-store.jpeg?d9c344" alt="C-store" title="C-store" width="400" height="213" class="size-full wp-image-8064" /><p class="wp-caption-text">C-store</p></div><p>C-Store supports a small Writeable Store (WS) component, which provides extremely fast insert and update operations. It also supports a large Read-optimised Store (RS), which is well optimised for query operations. 
RS supports only a very restricted form of insert, namely the batch movement of records from WS to RS, a task that is performed by the tuple mover.</p><p>So far, the buzz in the database industry has been about &#8220;Big Data&#8221;, which denotes the huge volumes of data expected to be generated by peta-scale computing, but it is not just the volume of data that matters. After all, DBMS and data analytics are not just about raw data; they are about converting data into information, which is then converted by analytics tools into business insights. Therefore, the term &#8220;Big Data&#8221; implicitly covers the following requirements as well:</p><ol><li>Any future database system should be able to handle huge volumes of data (as denoted by &#8220;Big Data&#8221; needs).</li><li>It should be able to process and query the data at great speeds (hence the need for fast data). This requirement brings forth the need for complex programming models that can query and process large data sets, such as Map-Reduce.</li><li>It should be able to analyse the data and convert it into business intelligence using complex analytics (hence the need for deep analytics).</li></ol><p>Therefore, the challenge for future database systems is not just about big data, but about handling big data that requires fast query responses and deep analytics.</p><h2>Taking advantage of emerging hardware</h2><p>It is not only the database industry that is undergoing many changes to address the twin challenges of big data and cloud computing. The hardware industry is at an inflection point today, as we discuss below.</p><p>In 1965, Gordon Moore, who went on to co-found Intel, observed that the number of transistors on a chip was doubling roughly every year; in 1975 he revised this to a doubling about every two years. This is popularly referred to as Moore&#8217;s Law. For many decades, this exponential growth in the number of transistors on the chip has roughly translated into improvements in processor performance. 
Moore&#8217;s Law enabled processor performance, more specifically single-thread performance, to double roughly every 18 months &#8212; thereby allowing software developers to deliver increasingly complex functionality and greater performance without having to rewrite their code.</p><p>Moore&#8217;s Law itself continues to hold good even now. However, the diminishing returns from hardware techniques for extracting instruction-level parallelism (ILP), and the limits that power consumption and heat dissipation place on clock frequencies, have forced chip manufacturers to look towards multi-core processor designs, instead of the traditional single-core designs, to improve overall system performance.</p><p>With multiple cores on a single chip executing simultaneously, it is possible to meet the throughput demands of an application while keeping the processor clock frequencies moderate, to contain power consumption and the heat generated. So, what do multi-core processors mean for future database systems?</p><p>Multi-core systems require database systems that can extract greater parallelism, in both intra- and inter-query processing. This requires Massively Parallel Processing (MPP) databases, which support &#8220;Shared Nothing&#8221; architectures.</p><p>While parallel databases are known to scale to hundreds of nodes, they have not been shown to scale to thousands of nodes and beyond. 
On the other hand, non-traditional database architectures, like data stores supporting Map-Reduce, have been scaled to thousands of nodes with extremely high availability and fault resiliency.</p><p>Hence, the question that needs to be answered is: Are massively parallel databases the right solution for tomorrow&#8217;s multi-core era, or are Map-Reduce data stores?</p><p>A detailed comparison of the relative merits of these two systems appears in the paper, &#8220;A Comparison of Approaches to Large-Scale Data Analysis&#8221; [<a href="http://www.cse.nd.edu/~dthain/courses/cse40771/spring2010/benchmarks-sigmod09.pdf">PDF</a>].</p><p>Current state-of-the-art approaches also combine massively parallel databases with the Hadoop architecture, as in HadoopDB, which is described in the paper, &#8220;HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads&#8221; [<a href="http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf">PDF</a>].</p><p>Another interesting direction for the future of database technology is to take advantage of the emerging GPU Many-Core architectures. There has been work on off-loading many of the data query operators to GPU architectures. Techniques for mapping many database operations to GPUs are discussed in the paper, &#8220;Fast Computation of Database Operations using Graphics Processors&#8221; [<a href="http://www.cs.unc.edu/~lin/db.pdf">PDF</a>].</p><p>The most important hardware innovation for future databases is, of course, the introduction of Non-Volatile RAM (NVRAM). Databases have traditionally used the buffer pool in memory to speed up database operations, while also maintaining the ACID properties.</p><p>The introduction of NVRAM takes away the need for the buffer pool, and requires a database redesign. 
While current databases have been designed for optimising disk reads, which are much slower than main-memory accesses, this will change drastically with NVRAM, since RAM is going to be persistent. Therefore, newer database systems explicitly designed for NVRAM are the need of the day.</p><p>While we have been discussing the challenges for future databases, I would like to bring to the reader&#8217;s attention an informative blog on state-of-the-art database design &#8212; <a href="http://dbmsmusings.blogspot.com/">dbmsmusings.blogspot.com</a> &#8212;  which is maintained by Prof Daniel Abadi, who is a pioneer in modern database design. His insightful articles on database research are a must-read for database programmers.<div id="crp_related"><h5>Related Posts:</h5><ul><li><a href="http://www.linuxforu.com/2012/01/newsql-handle-big-data/" rel="bookmark" class="crp_title">NewSQL &#8212; The New Way to Handle Big Data</a></li><li><a href="http://www.linuxforu.com/2011/05/databases-in-era-of-cloud-computing-and-big-data/" rel="bookmark" class="crp_title">Databases in the Era of Cloud Computing and Big Data</a></li><li><a href="http://www.linuxforu.com/2011/02/up-close-and-personal-with-nosql/" rel="bookmark" class="crp_title">Up Close and Personal with NoSQL</a></li><li><a href="http://www.linuxforu.com/2012/01/importance-of-in-memory-databases/" rel="bookmark" class="crp_title">The Importance of In-memory Databases</a></li><li><a href="http://www.linuxforu.com/2011/05/a-simple-guide-to-database-design-in-mysql/" rel="bookmark" class="crp_title">A Simple Guide to Database Design in MySQL</a></li></ul></div>Tags: <a href="http://www.linuxforu.com/tag/access-times/" title="access times" rel="tag">access times</a>, <a href="http://www.linuxforu.com/tag/adverse-impact/" title="adverse impact" rel="tag">adverse impact</a>, <a href="http://www.linuxforu.com/tag/analytics-tools/" title="analytics tools" rel="tag">analytics tools</a>, <a 
href="http://www.linuxforu.com/tag/bank-transactions/" title="bank transactions" rel="tag">bank transactions</a>, <a href="http://www.linuxforu.com/tag/big-data/" title="big data" rel="tag">big data</a>, <a href="http://www.linuxforu.com/tag/cassandra/" title="Cassandra" rel="tag">Cassandra</a>, <a href="http://www.linuxforu.com/tag/cloud-computing/" title="cloud computing" rel="tag">cloud computing</a>, <a href="http://www.linuxforu.com/tag/daniel-abadi/" title="Daniel Abadi" rel="tag">Daniel Abadi</a>, <a href="http://www.linuxforu.com/tag/data-cloud/" title="data cloud" rel="tag">data cloud</a>, <a href="http://www.linuxforu.com/tag/data-tables/" title="data tables" rel="tag">data tables</a>, <a href="http://www.linuxforu.com/tag/data-volumes/" title="data volumes" rel="tag">data volumes</a>, <a href="http://www.linuxforu.com/tag/database-access/" title="database access" rel="tag">database access</a>, <a href="http://www.linuxforu.com/tag/database-analytics-product/" title="database analytics product" rel="tag">database analytics product</a>, <a href="http://www.linuxforu.com/tag/database-systems/" title="database systems" rel="tag">database systems</a>, <a href="http://www.linuxforu.com/tag/database-technology/" title="database technology" rel="tag">database technology</a>, <a href="http://www.linuxforu.com/tag/dbms/" title="dbms" rel="tag">dbms</a>, <a href="http://www.linuxforu.com/tag/dbms-technologies/" title="DBMS Technologies" rel="tag">DBMS Technologies</a>, <a href="http://www.linuxforu.com/tag/gordon-moore/" title="Gordon Moore" rel="tag">Gordon Moore</a>, <a href="http://www.linuxforu.com/tag/intel/" title="Intel" rel="tag">Intel</a>, <a href="http://www.linuxforu.com/tag/lfy-july-2011/" title="LFY July 2011" rel="tag">LFY July 2011</a>, <a href="http://www.linuxforu.com/tag/moores-law/" title="Moore&#039;s Law" rel="tag">Moore&#039;s Law</a>, <a href="http://www.linuxforu.com/tag/multi-core-processor/" title="multi-core processor" 
rel="tag">multi-core processor</a>, <a href="http://www.linuxforu.com/tag/multi-core-processors/" title="multi-core processors" rel="tag">multi-core processors</a>, <a href="http://www.linuxforu.com/tag/nosql/" title="NoSQL" rel="tag">NoSQL</a>, <a href="http://www.linuxforu.com/tag/nvram/" title="NVRAM" rel="tag">NVRAM</a>, <a href="http://www.linuxforu.com/tag/olap/" title="OLAP" rel="tag">OLAP</a>, <a href="http://www.linuxforu.com/tag/oltp/" title="OLTP" rel="tag">OLTP</a>, <a href="http://www.linuxforu.com/tag/online-analytical-processing/" title="OnLine Analytical Processing" rel="tag">OnLine Analytical Processing</a>, <a href="http://www.linuxforu.com/tag/online-transaction-processing/" title="OnLine Transaction Processing" rel="tag">OnLine Transaction Processing</a>, <a href="http://www.linuxforu.com/tag/partitioning/" title="partitioning" rel="tag">partitioning</a>, <a href="http://www.linuxforu.com/tag/persistent-stores/" title="persistent stores" rel="tag">persistent stores</a>, <a href="http://www.linuxforu.com/tag/peta-scale-computing/" title="peta-scale computing" rel="tag">peta-scale computing</a>, <a href="http://www.linuxforu.com/tag/query-access/" title="query access" rel="tag">query access</a>, <a href="http://www.linuxforu.com/tag/query-processing/" title="query processing" rel="tag">query processing</a>, <a href="http://www.linuxforu.com/tag/query-results/" title="query results" rel="tag">query results</a>, <a href="http://www.linuxforu.com/tag/rdbms/" title="RDBMS" rel="tag">RDBMS</a>, <a href="http://www.linuxforu.com/tag/relational-databases/" title="Relational databases" rel="tag">Relational databases</a>, <a href="http://www.linuxforu.com/tag/semiconductor-chips/" title="semiconductor chips" rel="tag">semiconductor chips</a>, <a href="http://www.linuxforu.com/tag/traditional-databases/" title="traditional databases" rel="tag">traditional databases</a>, <a href="http://www.linuxforu.com/tag/vertica/" title="Vertica" rel="tag">Vertica</a><br 
/> ]]></content:encoded> <wfw:commentRss>http://www.linuxforu.com/2011/07/database-demands-of-peta-scale-computing/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Shared Computing &#8212; Where PCI Performs and USB Lags</title><link>http://www.linuxforu.com/2011/06/shared-computing-where-pci-performs-usb-lags/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=shared-computing-where-pci-performs-usb-lags</link> <comments>http://www.linuxforu.com/2011/06/shared-computing-where-pci-performs-usb-lags/#comments</comments> <pubDate>Tue, 31 May 2011 18:53:55 +0000</pubDate> <dc:creator>LFY Bureau</dc:creator> <category><![CDATA[For You & Me]]></category> <category><![CDATA[Overview]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[computing environment]]></category> <category><![CDATA[desktop virtualisation solutions]]></category> <category><![CDATA[graphics driver]]></category> <category><![CDATA[HP]]></category> <category><![CDATA[Intel]]></category> <category><![CDATA[LFY June 2011]]></category> <category><![CDATA[nvidia]]></category> <category><![CDATA[PCI]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[server-based computing]]></category> <category><![CDATA[shared computing]]></category> <category><![CDATA[ubuntu]]></category> <category><![CDATA[USB]]></category> <category><![CDATA[virtual desktops]]></category> <category><![CDATA[VLC Player]]></category><guid isPermaLink="false">http://www.linuxforu.com/?p=8825</guid> <description><![CDATA[This article explores the evolution of shared computing, and evaluates the value proposition that a shared computing environment can offer to organisations. 
Also included are the test results conducted to gauge the efficacy...]]></description> <content:encoded><![CDATA[<p><img src="http://cdn.linuxforu.com/wp-content/uploads/2011/06/pci-usb.jpg?d9c344" alt="PCI/USB" title="PCI/USB" width="250" height="333" class="alignright size-full wp-image-8829" /><div class="introduction">This article explores the evolution of shared computing, and evaluates the value proposition that a shared computing environment can offer to organisations. Also included are the results of tests conducted to gauge the efficacy of two methods (PCI and USB) that can be used for desktop virtualisation.</div><p>The genesis of shared computing dates back to the time when mainframes fed dumb clients. Then came the era of server-based computing, when these mainframes were replaced by servers and the earlier dumb clients by full-fledged PCs. Over time, thin clients have gained traction in many server-based computing setups &#8212; where most, if not all, applications are run on the server.</p><p>This server-based computing was intended to provide the same advantages as mainframe computing, while mitigating the cost and environmental factors &#8212; but it created a completely different set of disadvantages, the key one being the constrained user experience with limited desktop interface performance, especially when graphical applications are used. Apart from this, thin clients, which can be called scaled-down versions of PCs, invariably require special customisation and administration, expensive high-end server components, and so on.</p><p>Another option that has emerged to overcome these disadvantages, while still tapping the benefits of server-based computing, has been the zero client, based on the concept of desktop virtualisation. The concept behind these virtual desktops is to optimally use the power of the PC, which remains under-utilised in most cases, as most of us use a small fraction of its power. 
Desktop virtualisation enables a single PC to be &#8220;virtualised&#8221; (or shared) by many users &#8212; where all users get to experience their own individual computing session.</p><h2>Relevance of desktop virtualisation for organisations</h2><p>It is important to evaluate the suitability of desktop virtualisation for organisations of different sizes. There are a number of reasons and advantages that CTOs of companies of all sizes and markets can consider before adopting desktop virtualisation.</p><p>Shared computing or desktop virtualisation is an ideal solution for companies moving to cloud and Web-based computing. It works fine for CIOs seeking to better manage PC proliferation and costs. It is ideal for education and public sector institutions seeking to provide desktop access to all constituencies at a low initial cost of purchase, and even for those who want to achieve sustainability through low on-going costs.</p><p>Other benefits of desktop virtualisation include ease of centralised management, enhanced security as a result of limited access to critical infrastructure, low cost of entry, and dramatically reduced life-cycle and maintenance costs. And, what if we use open source operating systems along with desktop virtualisation?</p><p>This makes the solution even more compelling from the cost perspective.</p><h2>Desktop virtualisation methods</h2><p>To explore further, we decided to try out two popular methods of desktop virtualisation &#8212; USB and PCI. 
To evaluate how these two methods fare, we performed a test using the available USB and PCI solutions to assess the difference in client PC performance, with the server configuration kept the same. Here are the server configuration details:</p><ul><li>Computer system used: HP MS6000 extended configuration</li><li>Operating system: Ubuntu 10.04</li><li>Processor: Intel Core 2 Quad Q9500</li><li>Chipset: Intel Q43 Express</li><li>RAM: 6 GB DDR3 (1333 MHz)</li><li>Hard disk drive: 500 GB SATA 3.0 Gb/s</li><li>Removable media: 16x SATA DVD writer drive</li><li>Graphics: Integrated Intel Graphics Media Accelerator 4500 (Intel Graphics Media Accelerator driver installed using Nvidia graphics driver 96.43.18 for Ubuntu 10.04.)</li></ul><p>We installed Ubuntu 10.04 on the server, then installed the desktop virtualisation software required to connect client machines to the server. We also installed VLC Player. After configuring the server, we connected a total of three clients, first using USB and then PCI.</p><p>We then recorded readings for server performance while playing HD video on VLC &#8212; initially on the first client; then we connected the second client and played the same video on both client machines. 
Subsequently, we recorded the performance of the server with three clients connected to it and playing the same HD video.</p><table border="0"><thead><tr><td>Desktop Virtualisation Method</td><td>Test No.</td><td>Test Case</td><td>CPU Usage (%)</td><td>Load Average</td><td>Memory Used (GB)</td></tr></thead><tbody><tr><td rowspan="4"><strong>USB</strong></td><td>1</td><td>Host PC (server) without any client</td><td>1-3%</td><td>0.2</td><td>0.9</td></tr><tr><td>2</td><td>With single client viewing HD video on VLC</td><td>20-25%</td><td>1.9</td><td>2.4</td></tr><tr><td>3</td><td>With two clients viewing HD video on VLC</td><td>45-50%</td><td>2.3</td><td>3.1</td></tr><tr><td>4</td><td>With three clients viewing HD video on VLC</td><td>95%</td><td>4.0</td><td>4.9</td></tr><tr><td rowspan="4"><strong>PCI</strong></td><td>1</td><td>Host PC (server) without any client</td><td>1-3%</td><td>0.2</td><td>0.9</td></tr><tr><td>2</td><td>With single client viewing HD video on VLC</td><td>14%</td><td>0.9</td><td>1.1</td></tr><tr><td>3</td><td>With two clients viewing HD video on VLC</td><td>24%</td><td>1.4</td><td>1.15</td></tr><tr><td>4</td><td>With three clients viewing HD video on VLC</td><td>35-36%</td><td>1.7</td><td>1.3</td></tr></tbody></table><p>The readings in the above table show the server load rising far more steeply with USB than with PCI: with three clients playing HD video, CPU usage reached 95 per cent and memory use 4.9 GB over USB, against 35-36 per cent and 1.3 GB over PCI. Here are a few further inferences:</p><ul><li>PCI is technically more reliable than USB for networking. You are less likely to experience dropped connections or a sluggish response time to your modem when using PCI instead of USB.</li><li>Ethernet cables can reach a longer distance than USB cables. 
A single Ethernet cable can run 10 metres (about 33 feet) on a direct connection with PCI, while USB cable runs are limited to approximately 5 metres (16 feet).</li></ul><p>The test demonstrates that PCI is better suited for network computing, while USB works well for peripheral connectivity.</p><h2>In a nutshell</h2><p>According to recent studies, desktop virtualisation solutions result in 75 per cent lower maintenance costs and 90 per cent lower energy costs. The low entry and reduced lifecycle cost of desktop virtualisation is turning the old economics of PC purchasing and maintenance on its head.</p><p>Amongst the different variants of virtual desktop devices, PCI scores significantly over USB in terms of performance, optimum utilisation of resources and user experience. This indeed makes shared computing using PCI a superior value proposition to USB for organisations that want to enable wider access to computing at a low cost.</p><div class="imagecredit">Feature image courtesy: <a href="http://www.flickr.com/photos/outletpro/3867191297/">outletpro</a>. 
Reused under the terms of CC-BY 2.0 License.</div><div id="crp_related"><h5>Related Posts:</h5><ul><li><a href="http://www.linuxforu.com/2009/09/sneak-into-your-thumb-drive-from-the-cloud/" rel="bookmark" class="crp_title">Sneak Into Your Thumb Drive from the Cloud</a></li><li><a href="http://www.linuxforu.com/2009/01/virtual-microsoft/" rel="bookmark" class="crp_title">Virtual Microsoft</a></li><li><a href="http://www.linuxforu.com/2011/07/linux-preferred-web-hosting-choice/" rel="bookmark" class="crp_title">What Makes Linux the Preferred Web Hosting Choice?</a></li><li><a href="http://www.linuxforu.com/2011/06/olive-healthcare-slashes-costs-by-migrating-to-desktop-linux/" rel="bookmark" class="crp_title">Olive Healthcare Slashes IT Costs by Migrating to Desktop Linux</a></li><li><a href="http://www.linuxforu.com/2011/06/graphics-using-xlib-1/" rel="bookmark" class="crp_title">Graphics Using Xlib, Part 1</a></li></ul></div>Tags: <a href="http://www.linuxforu.com/tag/computing-environment/" title="computing environment" rel="tag">computing environment</a>, <a href="http://www.linuxforu.com/tag/desktop-virtualisation-solutions/" title="desktop virtualisation solutions" rel="tag">desktop virtualisation solutions</a>, <a href="http://www.linuxforu.com/tag/graphics-driver/" title="graphics driver" rel="tag">graphics driver</a>, <a href="http://www.linuxforu.com/tag/hp/" title="HP" rel="tag">HP</a>, <a href="http://www.linuxforu.com/tag/intel/" title="Intel" rel="tag">Intel</a>, <a href="http://www.linuxforu.com/tag/lfy-june-2011/" title="LFY June 2011" rel="tag">LFY June 2011</a>, <a href="http://www.linuxforu.com/tag/nvidia/" title="nvidia" rel="tag">nvidia</a>, <a href="http://www.linuxforu.com/tag/pci/" title="PCI" rel="tag">PCI</a>, <a href="http://www.linuxforu.com/tag/performance/" title="performance" rel="tag">performance</a>, <a href="http://www.linuxforu.com/tag/server-based-computing/" title="server-based computing" rel="tag">server-based computing</a>, <a 
href="http://www.linuxforu.com/tag/shared-computing/" title="shared computing" rel="tag">shared computing</a>, <a href="http://www.linuxforu.com/tag/ubuntu/" title="ubuntu" rel="tag">ubuntu</a>, <a href="http://www.linuxforu.com/tag/usb/" title="USB" rel="tag">USB</a>, <a href="http://www.linuxforu.com/tag/virtual-desktops/" title="virtual desktops" rel="tag">virtual desktops</a>, <a href="http://www.linuxforu.com/tag/vlc-player/" title="VLC Player" rel="tag">VLC Player</a><br /> ]]></content:encoded> <wfw:commentRss>http://www.linuxforu.com/2011/06/shared-computing-where-pci-performs-usb-lags/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Databases in the Era of Cloud Computing and Big Data</title><link>http://www.linuxforu.com/2011/05/databases-in-era-of-cloud-computing-and-big-data/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=databases-in-era-of-cloud-computing-and-big-data</link> <comments>http://www.linuxforu.com/2011/05/databases-in-era-of-cloud-computing-and-big-data/#comments</comments> <pubDate>Sat, 30 Apr 2011 18:38:07 +0000</pubDate> <dc:creator>Saravanan Chidambaram</dc:creator> <category><![CDATA[Features]]></category> <category><![CDATA[Open Gurus]]></category> <category><![CDATA[Overview]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[acid]]></category> <category><![CDATA[ACID vs BASE]]></category> <category><![CDATA[Amazon]]></category> <category><![CDATA[Amazon EC2]]></category> <category><![CDATA[App Engine]]></category> <category><![CDATA[AT&T]]></category> <category><![CDATA[BASE]]></category> <category><![CDATA[big data]]></category> <category><![CDATA[CAP theorem]]></category> <category><![CDATA[Cassandra]]></category> <category><![CDATA[cloud computing]]></category> <category><![CDATA[cloud computing services]]></category> <category><![CDATA[DaaS]]></category> <category><![CDATA[data management]]></category> <category><![CDATA[data revolution]]></category> 
<category><![CDATA[data structures]]></category> <category><![CDATA[database research]]></category> <category><![CDATA[database systems]]></category> <category><![CDATA[emerging technologies]]></category> <category><![CDATA[Eric Brewster]]></category> <category><![CDATA[Google]]></category> <category><![CDATA[Google App Engine]]></category> <category><![CDATA[I/O]]></category> <category><![CDATA[IBM]]></category> <category><![CDATA[JSON]]></category> <category><![CDATA[LFY May 2011]]></category> <category><![CDATA[LinkedIn]]></category> <category><![CDATA[media streaming]]></category> <category><![CDATA[Microsoft]]></category> <category><![CDATA[momentum]]></category> <category><![CDATA[MySQL]]></category> <category><![CDATA[non volatile memory]]></category> <category><![CDATA[NoSQL]]></category> <category><![CDATA[OLAP]]></category> <category><![CDATA[Oracle]]></category> <category><![CDATA[platform architectures]]></category> <category><![CDATA[programming models]]></category> <category><![CDATA[programming paradigm]]></category> <category><![CDATA[Query Language]]></category> <category><![CDATA[Rackspace]]></category> <category><![CDATA[RDBMS]]></category> <category><![CDATA[Relational databases]]></category> <category><![CDATA[SaaS]]></category> <category><![CDATA[Salesforce.com]]></category> <category><![CDATA[software stacks]]></category> <category><![CDATA[SQL]]></category> <category><![CDATA[Twitter]]></category> <category><![CDATA[Voldemort]]></category> <category><![CDATA[Werner Vogel]]></category> <category><![CDATA[workloads]]></category><guid isPermaLink="false">http://www.linuxforu.com/?p=4366</guid> <description><![CDATA[We take a look at the directions in which databases are evolving, driven by the twin factors of the &#8220;Cloud&#8221; and &#8220;Big Data&#8221;. 
Let&#8217;s start with a quick look at cloud computing, and...]]></description> <content:encoded><![CDATA[<p><img class="aligncenter size-large wp-image-8059" title="Big data" src="http://cdn.linuxforu.com/wp-content/uploads/2011/05/db-cloud-590x286.jpg?d9c344" alt="Big data" width="590" height="286" /></p><div class="introduction">We take a look at the directions in which databases are evolving, driven by the twin factors of the &#8220;Cloud&#8221; and &#8220;Big Data&#8221;. Let&#8217;s start with a quick look at cloud computing, and discuss the Big Data explosion, focusing on its impact on database systems. We will trace the evolution of databases from simple flat-file to enterprise RDBMS and NoSQL databases. We will briefly touch on the NoSQL movement, CAP theorem and ACID vs BASE semantics. The second part will uncover the challenges for next-generation databases, and see how database research aims to address these needs, using emerging technologies like non-volatile memory and the many-core revolution.</div><p>If asked, &#8220;What are the two mega-trends in the computing industry likely to be in the coming decade?&#8221;, most IT industry folks would name &#8220;Big Data&#8221; and &#8220;Cloud Computing&#8221; as the driving forces shaping the industry today. The computing world is shifting from enterprise-centric to data-centric workloads driven by the &#8220;Big Data&#8221; revolution, while cloud computing is becoming mainstream, reinventing utility/elastic computing as the new mantra for the IT industry.</p><p>The influence of these two paradigms is driving the revolution in different fields of the computing industry, such as processor/platform architectures, programming models, programming languages and software stacks, to name just a few. 
After all, no one would have paid attention to &#8220;Map-Reduce&#8221; as a programming paradigm, if it were not for Web 2.0 and the shift towards cloud computing.</p><p>The field of databases is no exception, and has been influenced heavily by these two driving forces. Cloud computing is driving the momentum towards making the database available as a service on the cloud. &#8220;Big Data&#8221; is changing the traditional ways in which data is stored, accessed and manipulated, with the arrival of the NoSQL movement and domain-specific databases, which move computing closer to the data. And if you ask the Internet-savvy the cheeky question, &#8220;What is the connection between Cassandra, Voldemort and Dynamo?&#8221; (No, this is not related to Harry Potter!), you can expect the majority to give you the correct answer &#8212; that they are all specialised NoSQL databases.</p><p>The fact that databases are no longer the domain of the few specialised expert programmers signifies the shift of databases from enterprise-centric workloads to delivering on the needs of social media networks, collaborative computing, massively multi-player online gaming, etc.</p><p>Cloud computing is transforming the way data is stored, retrieved and served. Computing resources like servers, storage, network and applications (including databases) are hosted, and made available as cloud services, for a price. Cloud platforms have evolved to offer many IT needs as online services, without having to invest in expensive data centres and worry about the hassle of managing them. Cloud platforms virtually eliminate the need to own an expensive data centre.</p><p>Today, those managing cloud platforms offer to design, develop, test, deploy and host apps in the cloud environment, with impressively economical cost models. 
Amazon EC2, Google App Engine, Microsoft Azure, Appistry CloudIQ, AppScale, AT&amp;T Synaptic and Rackspace are just a few of the offerings.</p><p>There are various classes of cloud computing services, such as:</p><ol><li><strong>SaaS:</strong> In simple terms, Software as a Service is a software delivery model wherein business application software is hosted on the Internet/cloud centrally, and can serve multiple customers &#8220;on demand&#8221;.</li><li><strong>PaaS:</strong> Platform as a Service provides all the systems and environments encompassing the software development life cycle, including development, testing, deployment and the hosting of business applications. Examples include Google App Engine and Microsoft&#8217;s Azure.</li><li><strong>IaaS:</strong> With Infrastructure as a Service, infrastructure, in terms of computing resources and operating systems, is also offered as a service.</li></ol><p>Hosting business applications on the cloud also entails maintaining and manipulating the data that those applications use.</p><p>Initially, it was left to developers to install, manage and use their choice of database instance on the cloud, along with the burden of all the database administration tasks. The advantage of this is that you choose your own database and have full control over how the data is managed.</p><p>In order to reduce the burden on the users of their cloud offerings, many PaaS vendors today have started offering database services on the cloud. All physical database administration tasks, such as backup, recovery, managing the logs, etc., are managed by the cloud provider. The responsibility for logical administration of the database, including table tuning and query optimisation, rests with the developer.</p><p>The choice of databases is also typically limited to established relational products such as MySQL or Oracle RDBMS. Examples include Amazon&#8217;s Relational Database Service (RDS), <a href="http://www.joyent.com/">Joyent</a>&#8217;s MySQL and Microsoft&#8217;s SQL Azure. 
Salesforce.com offers Database.com, which is actually built on Oracle&#8217;s Real Application Clusters. Though Database.com itself does not support traditional SQL, the query language it offers is built upon SQL, and is known as &#8220;Salesforce Object Query Language&#8221; (SOQL), with limited querying capabilities. SOQL queries are internally translated to SQL by its system.</p><p>While these are cloud offerings of database services, their underlying technology is still traditional SQL-based database technology, not specifically reinvented for the cloud.</p><p>Of late, there has also been considerable buzz around offering &#8220;Database as a Service (DaaS)&#8221;, wherein an organisation&#8217;s database needs are met by database offerings on the cloud. There are two use-case scenarios:</p><ol><li>A single large organisation that has many individual databases, which can be migrated to a private cloud for the organisation.</li><li>Small and medium organisations that outsource their data management needs to a public cloud provider, which caters to multiple such businesses.</li></ol><p>A true DaaS offering should satisfy certain requirements such as:</p><ul><li>Freeing the end developer/user from database administration, tuning and maintenance activities, while offering high performance, availability and fault tolerance, as well as advanced features like snapshots, analytics and time travel.</li><li>Elasticity, or the ability to dynamically adjust to changing workloads. Elasticity is required to meet user SLAs while minimising the cloud provider&#8217;s infrastructure, power and administration expenses.</li><li>Security and privacy guarantees, and a pay-as-per-usage pricing model.</li></ul><p>Today, there aren&#8217;t any true DaaS offerings that satisfy all these requirements. 
Therefore, these cloud-computing needs will drive the next generation of database evolution.</p><h2>Big Data explosion and its impact on databases</h2><p>The term &#8220;Big Data&#8221; is used to represent the explosive growth in online data, which has significantly outpaced the increases in CPU processing power, memory and storage capacity over the last few years. For instance, the amount of online data indexed by Google has grown from 5 exabytes in 2002 (1 exabyte is equal to 1 million trillion bytes) to 280 exabytes in 2009, numbers that are well beyond the processing capabilities of any single relational database.</p><p>This explosion in data is not just limited to the Web, but has also occurred in the enterprise. Where data was earlier being generated from simple in-house data entry feeds, database management now has to cater to data from multiple external data sources such as customers, GPS, mobile devices, the general public, point-of-sale devices, sensors and so on.</p><p>There are new kinds of data, such as Web pages, digitised content such as books and records, music, videos, photos, satellite images, scientific data, messages, tweets and sensor data, each with different data-processing requirements.</p><p>Traditionally, databases only needed to cater to enterprise-centric workloads such as online transaction processing (OLTP) and online analytical processing (OLAP). However, Big Data has ushered in a whole new set of data-centric workloads, such as Web search, massively multi-player online games, online message systems like Twitter, sensor networks, social network analysis, media streaming, photo processing, etc. The data management needs of all these data types cannot be met by traditional database architectures. 
These data-centric workloads have different characteristics in the following areas:</p><ol><li>Response time requirements &#8212; such as real-time versus non-real-time.</li><li>Data types:<ul><li>Structured data that fits in well with traditional RDBMS schemas.</li><li>Semi-structured data, like XML or email.</li><li>Fully unstructured data, such as binary or sensor data.</li></ul></li><li>Processing complexity:<ul><li>Simple data operations, such as aggregate, sort or upload/download, with a low compute-to-data-access ratio.</li><li>Medium compute complexity operations on data, such as pattern matching, search or encryption.</li><li>Complex processing, such as video encoding/decoding, analytics, prediction, etc.</li></ul></li></ol><p>Big Data has brought forth the issue of &#8220;database as the bottleneck&#8221; for many of these data-centric workloads, due to their widely varying requirements. The inability of the traditional RDBMS to scale up to massive data sets led to alternatives such as Data Sharding and Scale-Out Architectures, and subsequently to the NoSQL movement, which we will discuss later.</p><h2>Database evolution from the 1960s onwards</h2><p>There has been a huge evolution from the simple systems of the 1960s to what we have today. Let&#8217;s look at some of the stages in this evolution.</p><h3>Flat and hierarchical databases</h3><p>In a flat model, the data is stored as records and delimiters in a simple file. 
In the hierarchical data model, data is organised into a tree-like structure using parent/child relationships, in which each parent can have many children but each child has only one parent (see <a href="http://en.wikipedia.org/wiki/Tree_data_structure">this Wikipedia article</a>).</p><p>These models were the precursors to relational databases. They offered no support for querying, and database administration was ad hoc, left to the individual maintainer to take care of without any software support.</p><h3>Relational databases</h3><p>A relational database is a set of relations such that the data satisfies the predicates which describe the constraints on the possible values and their combinations. It provides a declarative method for specifying data and queries. The RDBMS software takes care of describing the data structures for storing the data, as well as the retrieval procedures.</p><p>Relations are represented as tables in the database. A table describes a specific entity type, and all attributes of a specific record are listed under an entity type. Each individual record is represented as a row, and an attribute as a column. This is the relational database model as proposed by Codd in 1970.</p><p>The relational model was the first database model to be described in formal mathematical terms, using relational algebra. The relational database model went on to become the de facto standard for enterprise database management systems over the following decades, until the late nineties.</p><h3>Object-oriented databases</h3><p>In the mid-eighties, object-oriented databases were proposed, in order to allow greater programming flexibility by allowing objects to be directly stored in databases. A relational database stores data as flat relations, whereas the behaviour of objects and the interconnections between them cannot be represented easily in the relational form.</p><p>OODBMS were intended to address this shortcoming. In an OODBMS, application data is represented by persistent objects that match the objects used in the programming language. 
However, object-oriented databases were not very successful, since they were more focused on addressing programmers&#8217; needs than the business intelligence needs of the organisation.</p><h3>Columnar databases</h3><p>These store the values of each attribute together as a column, as opposed to storing complete records as rows on disk. This results in large I/O savings for analytical and data-warehousing queries, which largely access only a subset of the columns. Note that columnar databases are relational, and support ACID semantics, as well as providing SQL support.</p><p>A popular columnar database that offers state-of-the-art analytical capabilities is Vertica. It is based on C-Store, a column-oriented academic database research project described in this paper [<a href="http://db.csail.mit.edu/projects/cstore/vldb.pdf">PDF</a>].</p><p>Vertica has decoupled its Read Store (optimised for read-only accesses) from the Write Store (optimised for high-performance updates and inserts) to evolve a hybrid model that offers excellent scalability. The high degree of compression that can be achieved due to the nature of columnar data, combined with the rest of the architecture, provides powerful analytical capabilities (<a href="http://www.vertica.com/resources/white-papers">white papers</a>).</p><p>There have been several commercial and open source RDBMS products, including IBM&#8217;s DB2, Oracle Database, Microsoft SQL Server, MySQL and many others. Historically, relational databases have provided transaction processing with the clarity that emerged from their formal mathematical models, and an elegant way of storing/retrieving data using SQL. 
When the reality of planetary-scale data management kicked in with the Big Data explosion, combined with the need for massive Web capabilities fuelled by Web 2.0, the industry felt the need for alternatives to traditional RDBMS.</p><p>Implementations of RDBMSs are generally tuned for their specific category of usage, such as OLTP, data warehousing or decision support. Big Data applications are characterised by mostly read accesses, the need for quick retrieval of query results even if the results are not fully complete, streaming media requests, textual search across large numbers of documents, and 24x7 Web requests with varying data access patterns. RDBMSs are not tuned for such Big Data application characteristics.</p><h2>Architectural shifts on data-sharing models</h2><p>A number of techniques have been proposed to address the changing needs of data management driven by Big Data and the Cloud. These include:</p><ol><li>Data replication, which creates multiple copies of the databases. The copies can be read-only, with one master copy where updates occur before being propagated to the copies; alternatively, the copies can be read-write, which adds the complexity of keeping the multiple copies consistent.</li><li>Memory caching of frequently accessed data, as popularised by the memcached architecture.</li><li>From the traditional &#8220;Shared Everything Scale-up&#8221; architecture, the focus shifted to &#8220;Shared Nothing Scale-out&#8221; architectures. The shared-nothing architecture uses independent nodes as the building blocks, each of which stores, maintains and serves its own portion of the data, with replication across nodes. Database sharding is a method of horizontal partitioning, which typically splits a database&#8217;s data among many nodes running separate database instances, with the application&#8217;s data replicated via synchronisation. 
Shared-disk clustered databases, such as Oracle RAC, use a different model to achieve scalability, based on a &#8220;shared-everything&#8221; architecture that relies upon high-speed connections between servers. The dynamic scalability required for cloud database offerings still remains elusive in both these approaches. &#8220;Shared-nothing&#8221; architectures require time-consuming and disruptive data rebalancing when nodes are added/deleted. While node addition/deletion is faster in the &#8220;Shared-everything&#8221; architecture, it has scaling issues as node counts increase.</li></ol><p>Note that using the above techniques seriously impacts the maintenance of the ACID properties satisfied by traditional RDBMS engines. For instance, maintaining multiple read/write copies of the database impacts data consistency. Data sharding makes atomicity harder to guarantee, since a single transaction may span several shards. Schema changes are time-consuming, and need to be propagated to multiple nodes in such data replicated/sharded/partitioned architectures.</p><p>Various SQL operations, such as joins across partitions, cannot easily be implemented at the database layer, since the database is partitioned; they need to be implemented in the application middleware layer. Therefore, traditional RDBMS semantics and distributed databases that can scale to the needs of Big Data and the Cloud have conflicting requirements.</p><h3>Concurrency control</h3><p>Relational data stores typically implement pessimistic concurrency control, meaning that records are locked while they are updated, and conflicting updates are serialised. 
The locking overhead associated with pessimistic concurrency control presents significant challenges for any light-weight implementation that needs to meet the high-performance requirements of massive data stores.</p><p>In contrast, many of the non-relational database implementations lean towards optimistic concurrency control, with a relaxed consistency model.</p><h3>Shifts in the programming paradigm</h3><p>The non-relational data models allow processing to scale massively and run in parallel. Map-Reduce (MR) techniques allow processing of partitioned subproblems in a distributed manner, with a greater degree of efficiency. There are libraries available in several forms that allow distributed processing of the map and reduce operations, with map operations executed in parallel.</p><p>The map and reduce functions are both defined in terms of key/value pairs, which makes them well suited to operating on top of NoSQL data stores. The framework also provides inherent fault tolerance, since partial map results that are lost can be recomputed.</p><p>When does a static schema not make sense? Or, when does it need to be dynamic?</p><p>A database schema refers to the organisation of data to create a blueprint of how a database will be constructed. It specifies, based on the database administrator&#8217;s knowledge of possible applications, the facts that can enter the database, or those of interest to the possible end-users.</p><p>There is an incredible amount of diversity in terms of how data is structured in the Big Data era. The requirements for many data collections are different, and it could be a daunting task to attempt to unify all types of collections with a single schema. Instead of imposing a &#8220;one size fits all&#8221; schema at the database level, the administrators of such collections often look for flexibility in schemas. 
When a schema cannot be fixed in advance during the design phase, a traditional SQL-based relational database is less likely to be appropriate.</p><p>The widely varying requirements of data management under Big Data and cloud computing resulted in the industry looking for alternatives to RDBMS. This led to the growth of non-relational distributed databases. These non-relational distributed database systems, which vary widely in their design, have come to be referred to by the term &#8220;NoSQL&#8221;, signifying that they are different from the traditional relational database systems that support a structured query language.</p><h2>What is NoSQL?</h2><p>NoSQL is usually expanded as &#8220;Not-Only-SQL&#8221;. The term refers to a growing number of non-relational, distributed data stores that typically do not attempt to provide ACID guarantees. NoSQL databases may not require fixed table schemas, and they typically scale horizontally. NoSQL architecture often provides weak consistency guarantees and restricted transactional support.</p><p>The origins of the NoSQL movement can be traced to what is now known as the <a href="http://en.wikipedia.org/wiki/CAP_theorem">CAP theorem</a>. Recall that in order to handle massive datasets, databases turned to &#8220;Shared Nothing&#8221; partitioned systems. 
In 2000, Eric Brewer made the following conjecture in his PODC conference keynote talk [<a href="http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf" class="broken_link">PDF</a>]:</p><blockquote><p>It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:</p><ol><li>Consistency (all nodes see the same data at the same time)</li><li>Availability (node failures do not prevent survivors from continuing to operate)</li><li>Partition tolerance (the system continues to operate despite arbitrary message loss)</li></ol></blockquote><p>According to the theorem (Brewer&#8217;s conjecture was formally proved by Seth Gilbert and Nancy Lynch in 2002), a distributed system can satisfy any two of these guarantees at the same time, but not all three. Therefore, the &#8220;Shared Nothing&#8221; partitioned database architectures moved towards non-relational databases, which sacrificed consistency in order to provide high availability, scalability and partition tolerance. These partitioned non-relational databases moved from the traditional ACID semantics supported by RDBMS to BASE semantics.</p><p>BASE stands for &#8220;Basically Available, Soft state, Eventually consistent&#8221;. Under the BASE semantics, it is enough for the database to eventually be in a consistent state. ACID is pessimistic, and forces consistency at the end of every transaction. BASE is optimistic, and accepts that the database consistency will be in a state of flux. In simple terms, eventual consistency means that while a database may be inconsistent at certain points of time, it will eventually become consistent; i.e., eventually, all database nodes will receive the latest consistent updates. This relaxed consistency allows BASE systems to provide high scalability.</p><p>A detailed discussion of ACID versus BASE semantics, and how correctness can still be maintained under BASE semantics, can be found in this <a href="http://queue.acm.org/detail.cfm?id=1394128">ACM Queue article</a>. 
Incidentally, it was this article that helped popularise the term &#8220;BASE&#8221; for such partitioned database architectures.</p><h2>Types of NoSQL systems</h2><p>Several NoSQL systems employ a distributed architecture, similar to the &#8220;Shared-Nothing&#8221; model. The simplest implementations use associative arrays or key-value pairs. Often, they are implemented with distributed hash tables (DHTs), multi-dimensional tables, etc. A wide variety of distinct families exist under the NoSQL movement. Let&#8217;s look at these families now.</p><h3>Simple key-value store</h3><p>This is a schema-less data store that lets the application store and manage its data, most often with minimal data-type support. Dynamo (Amazon) and Voldemort (LinkedIn) use a key-value distributed hash table to implement their databases.</p><p>Amazon&#8217;s SimpleDB provides simple key/value pairs, stored in a distributed hash table, and interoperates well with the rest of Amazon&#8217;s cloud building blocks. Dynamo is Amazon&#8217;s highly available key-value store, a non-relational design with a simpler data model, massive scaling abilities and extraordinarily high availability requirements. It allows applications to relax their consistency guarantees, under certain failure scenarios. The techniques used to make Dynamo work and perform well include consistent hashing, vector clocks, Merkle trees and gossip (a distributed information-sharing approach).</p><p>Dynomite is an open source implementation of Dynamo, written in Erlang. A detailed description of Dynamo is available in the paper co-authored by Werner Vogels [<a href="http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf">PDF</a>].</p><h3>Table-oriented NoSQL data stores</h3><p>These are similar to key-value stores, but define the value as a set of columns. An example of a table-oriented NoSQL store is Google&#8217;s BigTable. 
One of the leaders in this space, BigTable is integrated with Google&#8217;s cloud computing platform (Google App Engine), which offers a simplified SQL-like query language called GQL.</p><p>BigTable and its clones are implemented as sparse, multi-dimensional sorted maps, instead of simple key-value stores. The row, column and timestamp are used to index data. Atomicity is guaranteed at the row level (updates to a single row are always transactional). Locality groups provide the ability to combine different columns, both to impose access control and to give hints about data that is typically accessed together.</p><p>Hypertable and HBase are two open source clones of BigTable.</p><p>One of the key differences between these implementations is the underlying file system that actually stores the data. BigTable uses Google&#8217;s proprietary file system (the Google File System, or GFS), while the open source clones use the Hadoop Distributed File System (HDFS) or the Kosmos File System (KFS), which are open source implementations modelled on the Google File System. <a href="http://hbase.apache.org/">HBase</a>, in turn, provides additional database capabilities, map/reduce integration, etc.</p><h3>Document-oriented data stores</h3><p>These NoSQL key-value data stores are highly structured, and include complex self-defining objects as their values. Many of the popular document databases, such as MongoDB or CouchDB, use JavaScript Object Notation (JSON).</p><p><a href="http://couchdb.apache.org/">CouchDB</a> defines a basic key/value storage mechanism to store documents in JSON format. It allows the creation of persistent views on top of this data, similar to database tables, which can be queried. The storage engine for CouchDB does support ACID properties, and CouchDB is intended to be a distributed fault-tolerant system. 
It can also scale down, and hence can be used efficiently for mobile computing.</p><p>A similar project is <a href="http://www.mongodb.org/">MongoDB</a>, which does not provide ACID properties, but has powerful querying capabilities.</p><h3>Graph databases</h3><p>A graph database is a kind of NoSQL database that uses graph structures with nodes, edges, and properties to represent and store information. Nodes represent entities, and properties are information that relates to nodes. Edges connect nodes to other nodes or to properties, and represent the relationships between them. Most of the important information is really stored in the edges. Compared with relational databases, graph databases are often faster for associative data sets, scale more naturally to large data sets, and are more suitable for changing data/evolving schemas. Conversely, relational databases are typically faster at performing the same operation on large numbers of data elements. Examples of graph databases are <a href="http://www.neo4j.org/">Neo4j</a> and <a href="http://www.hypergraphdb.org/">HyperGraphDB</a>.</p><p>Various NoSQL implementations differ considerably on how consistency is achieved, what data types are supported, the structure and nature of how the data is stored, and the support of languages/libraries/interfaces for access. Implementations embrace what is best for them by considering how their targeted class of Web services would need to scale.</p><p>They also vary in terms of whether the key-values are stored in memory, on disk, or both; hierarchically or in tabular form; as pairs or triples; with a single value or multiple values per key, etc. 
A detailed discussion of NoSQL databases can be found in the paper &#8220;No Relation: The Mixed Blessings of Non-Relational Databases&#8221;, available <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.169.1113">here</a>.</p><h2>What’s around the corner for databases?</h2><p>The question that has frequently been asked is whether the NoSQL movement signifies the end of traditional RDBMS.</p><p>While the NoSQL movement has helped to answer the initial needs of massive data sets of Web 2.0, traditional OLTP and OLAP enterprise applications still depend on RDBMS. Therefore, we expect that both NoSQL and traditional RDBMS would continue to coexist for the next decade.</p><p>Regarding the question on whether today&#8217;s databases satisfy the challenges of data management posed by the Cloud and Big Data, we should look back at database history for our answer. Recall that the RDBMS revolution enabled the conversion of business data into information, driving business processes and business intelligence. It is not sufficient for NoSQL databases to support high availability, scalability and low latency needs. 
They also need to provide state-of-the-art analytics, which can power business intelligence while being elastic to fit the cloud environment.</p><p>The opinion widely shared by many in the database industry is that, &#8220;The real database revolution driven by Big Data and the Cloud is just around the corner.&#8221; So we do have a lot of interesting innovations to look forward to in the database space over the next few years.</p>]]></content:encoded> <wfw:commentRss>http://www.linuxforu.com/2011/05/databases-in-era-of-cloud-computing-and-big-data/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Layer 7 Load Balancers</title><link>http://www.linuxforu.com/2011/04/layer-7-load-balancers/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=layer-7-load-balancers</link> <comments>http://www.linuxforu.com/2011/04/layer-7-load-balancers/#comments</comments> <pubDate>Thu, 31 Mar 2011 18:41:48 +0000</pubDate> <dc:creator>Prashant Phatak</dc:creator> <category><![CDATA[Concepts]]></category> <category><![CDATA[Overview]]></category> <category><![CDATA[Sysadmins]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[broken web]]></category> <category><![CDATA[BSD]]></category> <category><![CDATA[Cisco]]></category> <category><![CDATA[Citrix]]></category> <category><![CDATA[Crossroads. 
Ultra Monkey]]></category> <category><![CDATA[data-centre infrastructure]]></category> <category><![CDATA[DNS]]></category> <category><![CDATA[Fedora]]></category> <category><![CDATA[http]]></category> <category><![CDATA[LFY April 2011]]></category> <category><![CDATA[load balance]]></category> <category><![CDATA[load balancers]]></category> <category><![CDATA[load-balancing]]></category> <category><![CDATA[load-balancing device]]></category> <category><![CDATA[LVS]]></category> <category><![CDATA[open source load-balancing solutions]]></category> <category><![CDATA[page requests]]></category> <category><![CDATA[Red Hat]]></category> <category><![CDATA[request processing]]></category> <category><![CDATA[resource limit]]></category> <category><![CDATA[server farm]]></category> <category><![CDATA[server performance]]></category> <category><![CDATA[SSL]]></category> <category><![CDATA[unix]]></category> <category><![CDATA[virtual server]]></category> <category><![CDATA[web portals]]></category> <category><![CDATA[Web requests]]></category> <category><![CDATA[Web servers]]></category> <category><![CDATA[web services infrastructure]]></category><guid isPermaLink="false">http://www.linuxforu.com/?p=9250</guid> <description><![CDATA[Linux has proven itself as a rock-solid operating system platform for industry-leading software appliances and applications, one of which is for load-balancing. As global Internet traffic increases, it demands an increased throughput from...]]></description> <content:encoded><![CDATA[<p><img src="http://cdn.linuxforu.com/wp-content/uploads/2011/04/load-balance.jpg?d9c344" alt="Load balancing?" title="Load balancing?" width="300" height="239" class="alignright size-full wp-image-9252" /><div class="introduction">Linux has proven itself as a rock-solid operating system platform for industry-leading software appliances and applications, one of which is for load-balancing. 
As global Internet traffic increases, it demands an increased throughput from the existing infrastructure.</p><p>It is crucial to deliver content fast; this is especially true for businesses whose only interface with clients is their Web portals. Load balancers add great value in this case, and also provide multiple other functionalities. This article explains new trends in this well-known product category, which are not adequately explored by IT managers and systems administrators.</p></div><p>Why is there a need for load balancers? While managing a Web services infrastructure, Web administrators often find it a challenge to cope with increased website hits, while maintaining high availability of the servers. This situation gets even tougher when a new Web application or functionality is released, attracting more users per day.</p><p>Optimisation of server performance is thus a continuous job. Consider a Web server hosting a site running a few applications. When the site gains more users, there are many more page requests. Serving each request uses a definite amount of CPU, memory and network resources. Adding powerful resources can only solve the problem to some extent, while introducing other challenges. When the Web server hits the ceiling in terms of its resource limit, it starts dropping Web requests, which results in a bad user experience &#8212; a &#8220;broken&#8221; Web page.</p><p>And if the Web server goes down, for some reason, the entire site becomes non-functional. This can certainly result in a loss of reputation, and in some cases, also a monetary loss for the organisation. To preempt such situations, IT management teams must deploy load-balancing solutions in the data-centre infrastructure. We will soon discuss how a load balancer can not only distribute traffic, but also help ease network operations tasks.</p><h2>How does a load balancer work?</h2><p>First-generation balancing devices were implemented around BSD UNIX versions. 
Newer balancing products typically take the form of an appliance running a Linux distribution; some enterprise-grade appliances use Red Hat or similar Linux flavours.</p><div id="attachment_9253" class="wp-caption aligncenter" style="width: 590px"><img src="http://cdn.linuxforu.com/wp-content/uploads/2011/04/Typical-load-balancer-setup-590x379.jpg?d9c344" alt="Typical load balancer setup" title="Typical load balancer setup" width="590" height="379" class="size-large wp-image-9253" /><p class="wp-caption-text">Figure 1: Typical load balancer setup</p></div><p>Functionally, a load balancer can balance traffic by distributing it among two or more servers. Figure 1 shows a typical Web farm configuration with a load-balancing device that acts as a front-end server to handle all Web requests. Each silo hosts a different set of applications, whereas all servers in a given silo host identical applications.</p><p>From the configuration point of view, the device is configured with two separate IP ranges. One, whose addresses are called virtual servers, is used to handle incoming traffic, and the other is used to connect to the nodes under its control. Thus, it forms an agent service between the requesting client and the responding server. It also acts on the requests intelligently, based on the configured rules, to choose a recipient node with the least workload at that particular time.</p><p>Rules define how a request should be handled, and also how to handle special conditions such as node preference, session management, etc. The load-balancing device then makes a separate TCP connection with the recipient Web server, and forwards the requests to it, while it keeps track of the request processing.</p><p>In the technical sense, a load balancer balances underlying TCP connections, rather than actual Web requests. It is a misconception that a load balancer checks resource utilisation (such as CPU, memory, etc.) on a controlled server. 
In reality, it simply checks the network response time of a server, which is a result of the server&#8217;s overall resource utilisation. Since it acts as a catalyst in improving the scalability of a server farm, it maintains data for each node under its control, like the number of requests processed in history, the response time by each host for requests, the fault trend of each host, etc.</p><p>In earlier days, load balancing solutions were implemented around simple round-robin techniques, which did help distribute the load, but did not provide fault tolerance features, since they lacked the necessary intelligence. In today&#8217;s advanced data centres, load balancers are used to effectively distribute traffic for Web servers, databases, queue managers, DNS servers, email and SMTP traffic, and almost all applications which use IP traffic. Balancing DNS servers helps distribute DNS queries to servers that are dispersed geographically, which is useful for disaster-recovery implementations.</p><h2>Using load balancers to achieve fault tolerance</h2><p>In a server farm, servers often experience downtime due to unforeseen resource failure or scheduled maintenance. These resource failures can be at the hardware level or simply at the software application level. In a business-critical infrastructure, such situations should be transparent, never affecting the user. As discussed earlier, since the balancing device maintains separate TCP connectivity with the controlled node, it can be further used to achieve fault tolerance.</p><p>A configurable &#8220;heart beat&#8221;, called a monitor, is maintained by the balancer with each node. This can be a simple ICMP ping, or an FTP/HTTP connection to retrieve data. Upon an appropriate response from the node, the load balancer becomes aware that the node is live, and marks it as an active participant eligible for the balancing process. 
If the server or its application resource fails, the balancer waits for a certain period of time for a &#8220;heart beat&#8221; from the node; if none arrives, it marks that node as a non-participant, and removes it from the silo.</p><p>Once marked thus, the load balancer doesn&#8217;t send traffic to that node. However, it still keeps polling to see if the node is back online, and if found to be so, marks the node as an active participant again and starts sending traffic to it. If a fault situation occurs while the request is being transferred to a node, modern load balancers are capable of detecting that too, and taking (configurable) corrective action.</p><p>This feature can also be used by the operations team for maintenance purposes. A service instance can be configured on a node &#8212; for example, a separate Web instance running under a separate IP address, with a dummy page on it. A monitor can be configured to access that page periodically. If the server is to be taken offline for maintenance purposes, the operations person can stop the dummy site, which results in the server being marked as a non-participant.</p><p>It can then be shut down or have other administration work done on it. Once maintenance is completed, the dummy site service can be started again, bringing the server back into the silo. This feature can be further extended by configuring many such monitors at the application level that can be reported upon in a dashboard via a network monitoring product, for an operations admin view.</p><h2>Layer 7 load balancing</h2><p>Earlier versions of load balancers used to work at the OSI model Layer 2 (link aggregation), or Layer 4 (transport level, based on IP address and TCP/UDP port). Since the requests flow through the balancing devices, it made sense to read into the requests at Layer 7, to bring additional flexibility in balancing techniques. 
Adding such flexibility offers higher scalability, better manageability and high availability.</p><p>Layer 7 load balancing primarily operates on the following three techniques:</p><ol><li>URL parsing</li><li>HTTP header interception</li><li>Cookie interception</li></ol><p>Typically, a Layer 7 rule structure looks somewhat like the one shown below. However, the exact syntax varies for each vendor and device model. As seen in the example, a request is first parsed based on the virtual directory being accessed, then by a particular cookie field&#8217;s content, and is finally sent to a default pool, if the first two conditions are not matched.</p><pre class="brush: text; gutter: false; first-line: 1">{
if (http_qstring, &quot;/&quot;) = &quot;mydir&quot;
   sendto server_pool1
else {
   if cookie(&quot;mycookie&quot;) contains &quot;logon_cookie&quot;
      sendto server_pool2
   else {
      sendto server_pool3
   }
}
}</pre><p>Since the request is intercepted and interpreted at Layer 7, there is far more scope for making intelligent traffic-distribution decisions. Rules can be configured to distribute traffic based on a field in the HTTP header, the source IP address or custom cookie fields, to name just a few.</p><p>For example, if the incoming request is from a smartphone, it can be sent to servers hosting mobile applications. If the request is for a URL that hosts a simple HTML-based site, it can be routed to an economical server farm. If a login cookie is not present in the request, it can be sent to a login server, so that other busy servers are not loaded further.</p><p>As the Layer 7 rules bring programmability to balancing techniques, they can also be used for the benefit of the technology operations staff. When a roll-out of a newer version is planned in an existing database server farm, the new set of servers can be configured as a separate pool to perform migration mock-tests, and can be brought online once the tests are passed.</p><p>In case the roll-out experiences problems, merely switching pools back to the original settings can achieve a rollback with minimum downtime. As another example, many mission-critical Web farms need to maintain legacy server operating systems for stability reasons, while new applications demand the latest and greatest platforms. In such cases, separate server pools can be configured for new applications, and traffic distribution can be achieved by checking Web request URLs at Layer 7.</p><p>Load balancing at Layer 7 also helps improve the return on investment (RoI) of an IT infrastructure. Consider a Web portal which caters to a high volume of users with Web pages that are content rich, with JavaScript and images. Since the scripts and images don&#8217;t change very often, these can be treated as static content, and hosted on a separate set of servers. 
As a result, the Web servers running important business logic use fewer resources, which means that we can accommodate more users per server, or host more applications per server, and thus reduce the effective cost of hosting. This also shows that a carefully configured Layer 7 load balancer can achieve higher application throughput on a given data-centre infrastructure footprint.</p><h2>Additional features in a load balancer appliance</h2><p>Besides powerful traffic distribution features, most industry-grade modern load balancers also come with features that offload additional tasks from the managed nodes, or other infrastructure components. SSL negotiation is one such feature: the balancer handles heavy volumes of SSL handshaking, which would otherwise take a performance toll on Web servers. Another great feature is cookie persistence, which makes a client&#8217;s requests stick to a particular server, in order to maintain a stateful session with it.</p><p>Many newer load balancers provide admin features such as traffic monitoring and TCP buffering; security features such as content filtering, and an intrusion detection firewall; and also performance-based features such as HTTP caching, HTTP compression, etc. Since a load-balancing device is a front-end component in a server farm, it comes equipped with high-speed network ports, such as Gigabit Ethernet and fibre connections.</p><h2>Open source load-balancing solutions</h2><p>Multiple vendors provide industry-grade enterprise load-balancing solutions, such as F5 Networks (BIG-IP), Citrix NetScaler, Cisco, Coyote Point, etc. These devices are rich in features, provide flexible rule programmability, and exhibit high performance throughput &#8212; but they do come with a price tag and support cost.</p><p>For those who are interested in FOSS, there are multiple solutions available on the Linux platform, which offer features from simple load balancing to full-featured appliance-grade products. 
Let&#8217;s look at three such &#8216;most wanted&#8217; solutions.</p><p><a href="http://www.linuxforu.com/2009/05/balancing-traffic-across-data-centres-using-lvs/" title="Balancing Traffic Across Data Centres Using LVS">LVS (Linux Virtual Server)</a> is one famous solution, which has proved to be industry-grade software, and can be used to build highly scalable and available Linux cluster servers to cater to high volumes of Web requests. It comes with <a href="http://www.linux-vs.org">ample documentation</a>, which helps build a load-balanced farm, step by step.</p><p><a href="http://www.ultramonkey.org/">Ultra Monkey</a> is another interesting solution, which provides failover features in addition to basic load balancing: if one load-balancer device fails, the other can take over to provide device-level fault tolerance. It supports multiple Linux flavours such as Fedora, Debian, etc.</p><p>Another powerful, but lesser-known implementation is <a href="http://crossroads.e-tunity.com">Crossroads for Linux</a>, which is a TCP-based load balancer providing a very basic form of traffic distribution. The beauty of this product is that its source code can be easily modified to serve just one task, such as DNS or Web balancing, without any bells and whistles &#8212; thus achieving a very high performance for that single purpose.</p><p>Configuring Layer 7 rules on a load balancer is an art, and needs a deep understanding of networking protocols and server operations. Features of load balancers can also be used as an aid to the operations and maintenance tasks.</p><div class="imagecredit">Feature image courtesy: <a href="http://www.flickr.com/photos/72213316@N00/5531444032/">Frank Kovalchek</a>. 
Reused under the terms of CC-BY 2.0 License.</div><div id="crp_related"><h5>Related Posts:</h5><ul><li><a href="http://www.linuxforu.com/2009/05/balancing-traffic-across-data-centres-using-lvs/" rel="bookmark" class="crp_title">Balancing Traffic Across Data Centres Using LVS</a></li><li><a href="http://www.linuxforu.com/2009/02/building-a-highly-available-web-server-cluster/" rel="bookmark" class="crp_title">Building A Highly-Available Web Server Cluster</a></li><li><a href="http://www.linuxforu.com/2011/04/cnn-ibn%e2%80%99s-rocking-affair-with-postgresql/" rel="bookmark" class="crp_title">CNN-IBN’s Rocking Affair with PostgreSQL</a></li><li><a href="http://www.linuxforu.com/2009/03/building-a-highly-available-nginx-reverse-proxy-using-heartbeat/" rel="bookmark" class="crp_title">Building A Highly Available Nginx Reverse-Proxy Using Heartbeat</a></li><li><a href="http://www.linuxforu.com/2009/01/glassfish-part-1-architecture-community-et-al/" rel="bookmark" class="crp_title">GlassFish, Part 1: Architecture, Community, et al.</a></li></ul></div>Tags: <a href="http://www.linuxforu.com/tag/broken-web/" title="broken web" rel="tag">broken web</a>, <a href="http://www.linuxforu.com/tag/bsd/" title="BSD" rel="tag">BSD</a>, <a href="http://www.linuxforu.com/tag/cisco/" title="Cisco" rel="tag">Cisco</a>, <a href="http://www.linuxforu.com/tag/citrix/" title="Citrix" rel="tag">Citrix</a>, <a href="http://www.linuxforu.com/tag/crossroads-ultra-monkey/" title="Crossroads. Ultra Monkey" rel="tag">Crossroads. 
Ultra Monkey</a>, <a href="http://www.linuxforu.com/tag/data-centre-infrastructure/" title="data-centre infrastructure" rel="tag">data-centre infrastructure</a>, <a href="http://www.linuxforu.com/tag/dns/" title="DNS" rel="tag">DNS</a>, <a href="http://www.linuxforu.com/tag/fedora/" title="Fedora" rel="tag">Fedora</a>, <a href="http://www.linuxforu.com/tag/http/" title="http" rel="tag">http</a>, <a href="http://www.linuxforu.com/tag/lfy-april-2011/" title="LFY April 2011" rel="tag">LFY April 2011</a>, <a href="http://www.linuxforu.com/tag/load-balance/" title="load balance" rel="tag">load balance</a>, <a href="http://www.linuxforu.com/tag/load-balancers/" title="load balancers" rel="tag">load balancers</a>, <a href="http://www.linuxforu.com/tag/load-balancing/" title="load-balancing" rel="tag">load-balancing</a>, <a href="http://www.linuxforu.com/tag/load-balancing-device/" title="load-balancing device" rel="tag">load-balancing device</a>, <a href="http://www.linuxforu.com/tag/lvs/" title="LVS" rel="tag">LVS</a>, <a href="http://www.linuxforu.com/tag/open-source-load-balancing-solutions/" title="open source load-balancing solutions" rel="tag">open source load-balancing solutions</a>, <a href="http://www.linuxforu.com/tag/page-requests/" title="page requests" rel="tag">page requests</a>, <a href="http://www.linuxforu.com/tag/red-hat/" title="Red Hat" rel="tag">Red Hat</a>, <a href="http://www.linuxforu.com/tag/request-processing/" title="request processing" rel="tag">request processing</a>, <a href="http://www.linuxforu.com/tag/resource-limit/" title="resource limit" rel="tag">resource limit</a>, <a href="http://www.linuxforu.com/tag/server-farm/" title="server farm" rel="tag">server farm</a>, <a href="http://www.linuxforu.com/tag/server-performance/" title="server performance" rel="tag">server performance</a>, <a href="http://www.linuxforu.com/tag/ssl/" title="SSL" rel="tag">SSL</a>, <a href="http://www.linuxforu.com/tag/unix/" title="unix" rel="tag">unix</a>, <a 
href="http://www.linuxforu.com/tag/virtual-server/" title="virtual server" rel="tag">virtual server</a>, <a href="http://www.linuxforu.com/tag/web-portals/" title="web portals" rel="tag">web portals</a>, <a href="http://www.linuxforu.com/tag/web-requests/" title="Web requests" rel="tag">Web requests</a>, <a href="http://www.linuxforu.com/tag/web-servers/" title="Web servers" rel="tag">Web servers</a>, <a href="http://www.linuxforu.com/tag/web-services-infrastructure/" title="web services infrastructure" rel="tag">web services infrastructure</a><br /> ]]></content:encoded> <wfw:commentRss>http://www.linuxforu.com/2011/04/layer-7-load-balancers/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Score a Goal With Postgres!</title><link>http://www.linuxforu.com/2010/02/enterprise-db-score-a-goal-with-postgres/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=enterprise-db-score-a-goal-with-postgres</link> <comments>http://www.linuxforu.com/2010/02/enterprise-db-score-a-goal-with-postgres/#comments</comments> <pubDate>Mon, 01 Feb 2010 11:00:24 +0000</pubDate> <dc:creator>Vanisha Joseph</dc:creator> <category><![CDATA[CXOs]]></category> <category><![CDATA[Features]]></category> <category><![CDATA[Technology]]></category><guid isPermaLink="false">http://www.linuxforu.com/?p=2826</guid> <description><![CDATA[Shrinking budgets and performance-hungry business applications are making organisations sway the open source way when choosing database management systems. EnterpriseDB, the enterprise Postgres company, is scoring consecutive goals on reliability, performance, scalability and cost comparisons against both its open source and proprietary opponents.]]></description> <content:encoded><![CDATA[<p>Most football freaks would agree on José Mourinho being the iconic coach of this decade, having consistently shown his ability at taking strong raw talent and transforming it into championship-calibre teams. 
After ensuring FC Porto was decorated with medals of the Portuguese Liga, Cup of Portugal, UEFA Cup and UEFA Champions League, he went on to guide Chelsea through two consecutive Premier League titles and helped Inter win the Supercoppa Italiana and National Serie A titles. In 2004, EnterpriseDB, a solution provider of an open source relational database management system, performed similar magic with PostgreSQL, a leading open source enterprise-class relational database management system (RDBMS). PostgreSQL was a promising star in the database market when it was originally built twenty years ago; EnterpriseDB helped transform the world-class technology of PostgreSQL into an enterprise-class product with the release of Postgres Plus Standard Server and Postgres Plus Advanced Server.</p><p>“Postgres Plus is an open source database typically deployed in high profile, mission-critical applications. Our open source products provide capabilities that address a range of enterprise requirements (application development, enhanced performance, runtime management, usability, and scalability), which enable organisations to build and deploy applications that solve critical business problems. We also provide 24&#215;7 ‘follow the sun’ technical support, training and professional services needed to support all phases of evaluation, development, deployment, and ongoing production of mission critical systems,” says Ashish Mehra, director India operations, EnterpriseDB Software India.</p><p>Based on open source PostgreSQL, which is developed by the largest independent RDBMS community in the world, EnterpriseDB’s Postgres Plus Standard Server and Postgres Plus Advanced Server are products suited for transaction-intensive and mixed-load applications. 
Typically, Postgres Plus is deployed as an online transaction processing (OLTP) database in support of enterprise applications such as enterprise resource planning (ERP), customer relationship management (CRM), supply chain management (SCM) and other applications that require databases to support high concurrency, performance, scalability, and security.</p><p>But with multiple players (both proprietary and open source) in the RDBMS market, what makes Postgres Plus products a cut above the rest?</p><table style="background-color: #cccccc; font-size: 12px; font-family: Verdana,Arial,Helvetica,sans-serif;" border="0" align="justify"><tbody><tr><td><a href="http://cdn.linuxforu.com/wp-content/uploads/temp-uploads/2010/02/Ashish-office1.JPG"><img src="http://cdn.linuxforu.com/wp-content/uploads/temp-uploads/2010/02/Ashish-office1.JPG" alt="" width="278" height="298" /></a></td><td>“While the total cost of ownership of a 16-CPU (four servers, four CPUs/server) Oracle configuration is $1,261,600 over three years, the TCO of a comparably configured Postgres Plus Advanced Server deployment is $215,760. This would mean a TCO savings of $1,045,840, or 83 per cent.” <strong>Ashish Mehra, director India operations, EnterpriseDB Software India</strong></td></tr></tbody></table><h2>Easy on the pocket</h2><p>Being open source software, Postgres Plus is cost-effective compared to its strong proprietary counterparts like Oracle, Microsoft, and IBM. Postgres Plus Advanced Server includes a breakthrough suite of compatibility technologies that run Oracle applications with little or no changes at a much lower cost. “The most significant difference between the two databases is around cost. While the total cost of ownership of a 16-CPU (four servers, four CPUs/server) Oracle configuration is $1,261,600 over three years, the TCO of a comparably configured Postgres Plus Advanced Server deployment is $215,760. 
This would mean a TCO savings of $1,045,840, or 83 per cent,” says Mehra.</p><p>NTT, a Japan-headquartered telecommunications company, couldn’t agree more. They switched from commercial RDBMS products for their telecom business operation and back office systems to PostgreSQL and have realised significant cost benefits. “We originally estimated savings of $20-30 million over five years by deploying PostgreSQL to in-house systems. But after adding Postgres Plus Advanced Server to our options, we now estimate the savings to double,” testifies Takeshi Tachi, senior manager, NTT Open Source Software Center, Nippon Telegraph and Telephone Corporation.</p><p>InMobi Global Media Adnetwork, formerly mKhoj Solutions, which opted for EnterpriseDB in 2007, saw similar savings. “We chose Postgres Plus to solve our pain areas, namely, scalability and support issues. Postgres Plus was one-tenth the cost of other RDBMS. Further, we have seen huge reductions in our IT costs thereafter,” says Mohit Saxena, VP (technology), InMobi Global Media Adnetwork.</p><table style="background-color: #cccccc; font-size: 12px; font-family: Verdana,Arial,Helvetica,sans-serif;" border="0" align="justify"><tbody><tr><td><img src="http://cdn.linuxforu.com/wp-content/uploads/temp-uploads/2010/02/Mohit_Headshot.jpg?d9c344" alt="" width="425" height="461" /></td><td><p>“We chose Postgres Plus to solve our pain areas, namely, scalability and support issues. Postgres Plus was one-tenth the cost of other RDBMS. Further, we have seen huge reductions in our IT costs thereafter.”</p><div><p><strong>Mohit Saxena, VP (technology), InMobi Global Media Adnetwork</strong></p></div></td></tr></tbody></table><p>Apart from the most obvious way of reducing IT expenditures by saving on commercial licensing fees, EnterpriseDB offers numerous strategies for organisations to cut costs. 
New line of business (LOB) application development and deployment, replication for reporting and business intelligence, and migrating non-mission critical and mission critical applications away from high-priced commercial databases are some leading strategies. “Query, reporting and BI activities degrade the performance of OLTP applications. By offloading those queries to lower-cost non-production databases, Postgres Plus’ replication capabilities can provide real-time data warehousing at a fraction of the cost of Oracle’s replication solutions. This eliminates OLTP performance problems while ensuring timely delivery of critical information to business stakeholders,” says Mehra. For example, FTD, the worldwide leader in floral-related products and services, implemented the Postgres Plus Advanced Server Replication Server module to improve performance of their production systems while still meeting the needs of their vendor network. “By moving their vendor-facing order tracking system away from their Oracle-based production environment and onto replicated systems running on Postgres Plus Advanced Server, the performance of their production systems increased more than 400 per cent during the next peak ordering season – all while reducing the cost of operating this system by more than 80 per cent,” says Mehra.</p><h2>Rich features</h2><p>The fundamental features of Postgres Plus Standard Server and Advanced Server distinguish them from their primary open source competitors, like MySQL and Ingres. “PostgreSQL contains a single unified storage engine capable of performing extremely fast for all load types: OLTP (with full ACID support), reporting and mixed usage. MySQL, on the other hand, has pluggable storage engines specialised for particular types of usage; this type of configuration can result in bottlenecks as applications grow and change. Subqueries, too, are poorly optimised in MySQL. 
Further, user-defined data types in Postgres Plus give users more options for customised solutions than MySQL, also building room for database-enforced encryption,” says Mehra.</p><p>The Oracle compatibility features found in Postgres Plus Advanced Server distinguish it from its distant cousin Ingres, another strong open source database competitor. “EnterpriseDB has added powerful and convenient Oracle features to Postgres Plus Advanced Server that are standard fare in the minds of many DBAs and developers, such as: function packages, dynamic runtime instrumentation, query optimisation hints, Oracle SQL extensions, explicit transaction controls, and data dictionary views,” says Mehra.</p><p>Further, Postgres Plus Standard Server and Postgres Plus Advanced Server come with many productivity tools like Postgres Studio (graphical tool for database and cluster creation/maintenance, SQL environment, and more), EDB SQL (SQL command line environment), EDB Loader (bulk data loader with error handling), DBA Management Server (for DBAs to handle monitoring, job scheduling, SQL terminal, software update management), DBA Monitoring Console (for resource usage and management), GridSQL Monitoring Console for distributed data, and the Oracle Replication console.</p><h2>Performance enhancer</h2><p>EnterpriseDB’s Postgres Plus products have multiple features across many facets of the database to help improve performance. While GridSQL partitions data across multiple machines and transparently performs queries, Asynchronous Pre-Fetch can optimise regular index scans and bitmap index scans by issuing concurrent I/O requests to RAID (Redundant Array of Inexpensive Disks) hardware on Linux systems. 
In addition, Multi-Version Concurrency Control (MVCC) allows for high performance in applications that are both read and write intensive.</p><p>The best example of performance enhancement using Postgres Plus is hi5, a leading social networking site that deployed OLTP PostgreSQL installations, running on hundreds of servers. “The system supports the data transactions of more than 56 million active users each month. In June 2008, the company delivered more than 18.5 billion page views that were supported by PostgreSQL, serving nearly 11 million visitors to the site every day,” boasts Mehra of Postgres Plus.</p><p>NTT, too, saw better performance when they deployed Postgres Plus for transactional telecom business applications. “We deployed PostgreSQL for years, but earlier versions resulted in performance degradation in long-time operations requiring periodic data-vacuum operation. After PostgreSQL 8.3 was released in February 2008, the performance degradation was completely resolved, and EnterpriseDB has contributed significantly to this improvement,” says Tachi. Saxena adds: “Postgres Plus is the heart of our growth. With every release we have been setting new performance benchmarks. Postgres has helped us achieve even [previously thought] unachievable levels. This needs to be seen in the light of the fact that we are yet to use the capacity of Postgres Plus to its fullest.”</p><h2>Scalability</h2><p>EnterpriseDB solutions are highly scalable. Postgres Plus Advanced Server with the GridSQL configuration allows a single database to be partitioned across multiple commodity hardware machines for expandability. “InfiniteCache is a Postgres Plus Advanced Server feature providing an infinitely expandable cache across commodity hardware providing flexible growth management. 
The compression technology allows multi-gigabyte databases to reside in memory for lightning-fast performance,” says Mehra.</p><p>Vouching for it, Saxena says: “InMobi is in a business where our traffic and our volumes are bound to grow exponentially. This is why we chose Postgres Plus and have faced no scalability issues so far. Today, we have two data centres in India and US. We plan to expand to Japan and see no problem with regard to managing databases so far.”</p><h2>Strong technical base</h2><p>Postgres comes with a strong technical base and support. The Postgres community provides robust and well-tested fixes and enhancements across multiple platforms. Postgres user communities (like User Groups) provide a wealth of collaborative information to users, ensuring vendor independence and a strong ecosystem. This is where Postgres surpasses competitors like Ingres, which has no community outside of the Ingres Corporation.</p><h2>Foolproof support</h2><p>EnterpriseDB’s foolproof support has resolved the migration woes traditionally faced by organisations adopting open source RDBMS. EnterpriseDB offers a wide range of support options for Postgres, from free assets such as tutorials, product documentation and technical whitepapers to for-fee training, services, consulting, and subscription offerings. “Our training classes are developed and run by Postgres community’s leading contributors, like Bruce Momjian. Our Jump Start and Packaged Services offerings, like RemoteDBA and Architectural Health Checks, enable organisations to quickly develop skills and insight for fast realisation of the benefits of Postgres Plus,” says Mehra. “Migration involves two things – customisation and support. Migration is very easy with EnterpriseDB as they have plug-ins for all our existing databases. Further, their support is unmatchable. We use remote support from them,” testifies Saxena.</p><p>NTT, too, would vouch for it. 
Though their initial migration from a commercial DBMS to PostgreSQL was a struggle, migration to Postgres Plus Advanced Server thereafter was a cakewalk. “Our estimates show that the total effort of [our] system migration project using Postgres Plus could be halved compared to the migration using the plain PostgreSQL,” says Tachi. The smooth experience has made NTT Japan look at migrating legacy accounting systems from commercial OS and DBMS to Linux and Postgres Plus Advanced Server. “We estimated this migration to be troublesome and not compensate the licensing cost saving before, but now we feel it is worth to try by deploying Advanced Server (Postgres Plus),” says Tachi.</p><h2>Road ahead!</h2><p>Today, EnterpriseDB employs over 100 associates worldwide, supporting more than 300 customers. Its business partners include Red Hat, Synnex, Compiere, Tomax, Contegix, Thunip, Elastra, immixGroup, Fujitsu, IBM, Continuent, and many other IT bigwigs. With EnterpriseDB winning consecutive trophies for Postgres, we might see them perform the same magic that Red Hat did for Linux. 
History sure does repeat itself!</p>]]></content:encoded> <wfw:commentRss>http://www.linuxforu.com/2010/02/enterprise-db-score-a-goal-with-postgres/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> </channel> </rss>
