
Archive for the ‘Cloud computing’ Category

Developing for the Cloud in the Cloud: BigData Development with Docker in AWS

Why you may need it?

I am a developer, and I work daily in Integrated Development Environments (IDEs), such as IntelliJ IDEA or Eclipse. These IDEs are desktop applications. Since the advent of Google Documents, I have seen more and more people moving their work from desktop versions of Word or Excel to the cloud using an online equivalent of a word processor or a spreadsheet application.

There are obvious reasons for using the cloud to keep your work. Today, some web applications are no longer at a significant disadvantage in functionality compared to traditional desktop business applications. The content is available wherever there is a web browser, and these days, that’s almost everywhere. Collaboration and sharing are easier, and losing files is less likely.

Unfortunately, these cloud advantages are not as common in the world of software development as they are for business applications. There are some attempts to provide an online IDE, but they are nowhere close to traditional IDEs.

That is a paradox: while we are still bound to our desktops for daily coding, the software is now spawned on multiple servers. Developers need to work with things they can no longer keep on their own computers. Indeed, laptops are no longer increasing their processing power; having more than 16GB of RAM on a laptop is rare and expensive, and newer devices, tablets for example, have even less.

However, even if it is not yet possible to replace classic desktop applications for software development, it is possible to move your entire development desktop to the cloud. The day I realized it was no longer necessary to have all my software on my laptop, and noticed the availability of web versions of terminals and VNC, I moved everything to the cloud. Eventually, I developed a build kit for creating that environment in an automated way.

Developer in the cloud

What is the cloud about for a developer? Developing in it, of course!

In this article I present a set of scripts to build a cloud-based development environment for Scala and big data applications, running with Docker in Amazon AWS. It comprises a web-accessible desktop with the IntelliJ IDE; Spark, Hadoop and Zeppelin as services; and command line tools like a web-based SSH, SBT and Ammonite. The kit is freely available on GitHub, and I describe here the procedure for using it to build your instance. You can build your environment and customize it to your particular needs. It should not take you more than 10 minutes to have it up and running.

What is in the “BigDataDevKit”?

My primary goal in developing the kit was that my development environment should be something I can simply fire up, with all the services and servers I work with, and then destroy them when they are no longer needed. This is especially important when you work on different projects, some of them involving a large number of servers and services, as when you work on big data projects.

My ideal cloud-based environment should:

  • Include all the usual development tools, most importantly a graphical IDE.
  • Have the servers and services I need at my fingertips.
  • Be easy and fast to create from scratch, and expandable to add more services.
  • Be entirely accessible using only a web browser.
  • Optionally, allow access with specialized clients (VNC client and SSH client).

Leveraging modern cloud infrastructure and software, the power of modern browsers, a widespread availability of broadband, and the invaluable Docker, I created a development environment for Scala and big data development that, for the better, replaced my development laptop.

Currently, I can work at any time, either from a MacBook Pro, a Surface Tablet, or even an iPad (with a keyboard), although admittedly the last option is not ideal. All these devices are merely clients; the desktop and all the servers are in the cloud.

Docker and Amazon AWS!

My current environment is built using the following online services:

  • Amazon Web Services for the servers.
  • GitHub for storing the code.
  • Dropbox to save files.

I also use a couple of free services, like DuckDNS for dynamic IP addresses and Let’s Encrypt to get a free SSL certificate.

In this environment, I currently have:

  • A graphical desktop with IntelliJ IDEA, accessible via a web browser.
  • Web accessible command line tools like SBT and Ammonite.
  • Hadoop for storing files and running MapReduce jobs.
  • Spark Job Server for scheduled jobs.
  • Zeppelin for a web-based notebook.

Most importantly, the web access is fully encrypted with HTTPS, for both web-based VNC and SSH, and there are multiple safeguards to avoid losing data, a concern that is, of course, important when you do not “own” the content on your physical hard disk. Note that getting a copy of all your work on your computer is automatic and very fast. If you lose everything because someone stole your password, you have a copy on your computer anyway, as long as you configured everything correctly.

Using a Web Based Development Environment with AWS and Docker

Now, let’s start describing how the environment works. When I start work in the morning, the first thing I do is log into the Amazon Web Services console, where I see all my instances. Usually, I have many development instances configured for different projects, and I keep the unused ones turned off to keep the bill down. After all, I can only work on one project at a time. (Well, sometimes I work on two.)

Screen 1

So, I select the instance I want, start it, and wait a little or go grab a cup of coffee. It’s not so different from turning on your computer. It usually takes a matter of seconds to have the instance up and running. Once I see the green icon, I open a browser and go to a well-known URL: https://msciab.duckdns.org/vnc.html. Note that this is my URL; when you create a kit, you will create your own unique URL.

Since AWS assigns a new IP to each machine when you start, I configured a dynamic DNS service, so you can always use the same URL to access your server, even if you stop and restart it. You can even bookmark it in your browser. Furthermore, I use HTTPS, with valid keys to get total protection of my work from sniffers, in case I need to manage passwords and other sensitive data.

Screen 2

Once loaded, the system will welcome you with a web VNC client, NoVNC. Simply log in and a desktop appears. I intentionally use a minimal desktop, just a menu with applications, and my only luxury is a virtual desktop (since I open a lot of windows when I develop). For mail, I still rely on other applications, nowadays mostly other browser tabs.

In the virtual machine, I have what I need to develop big data applications. First and foremost, there is an IDE. In the build, I use the IntelliJ IDEA Community Edition. Also, there is the SBT build tool and a Scala REPL, Ammonite.

Screen 3

The key features of this environment, however, are services deployed as containers in the same virtual machine. In particular, I have:

  • Zeppelin, the web notebook for using Scala code on the fly and doing data analysis (http://zeppelin:8080).
  • The Spark Job Server, to execute and deploy Spark jobs with a REST interface (http://sparkjobserver:8080).
  • An instance of Hadoop for storing and retrieving data from the HDFS (http://hadoop:50070).

Note that these URLs are fixed, but they are accessible only from within the virtual environment. You can see their web interfaces in the following screenshot.

Screen 4

Each service runs in a separate Docker container. Without becoming too technical, you can think of this as three separate servers inside your virtual machine. The beauty of using Docker is you can add services, and even add two or three virtual machines. Using Amazon containers, you can scale your environment easily.

Last, but not least, you have a web terminal available. Simply access your URL with HTTPS and you will be welcomed with a terminal in a web page.

Screen 5

In the screenshot above you can see that I list the containers, which are the three servers plus the desktop. This command line shell gives you access to the virtual machine holding the containers, allowing you to manage them. It’s as if your servers are “in the Matrix” (virtualized within containers), but this shell gives you an escape outside the “Matrix” to manage the servers and the desktop. From here, you can restart the containers, access their filesystems and perform other manipulations allowed by Docker. I will not discuss Docker in detail here, but there is a vast amount of documentation on the Docker website.

How to set up your instance

Do you like this so far, and do you want your own instance? It is easy and cheap. You can get it for merely the cost of the virtual machine on Amazon Web Services, plus the storage. The kit in the current configuration requires 4GB of RAM to get all the services running. If you are careful to use the virtual machine only when you need it, and you work, say, 160 hours a month, a virtual machine at current rates will cost 160 x $0.052, or about $8.32 per month. You also have to add the cost of storage. I use around 30GB, but everything altogether can be kept under $10.
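
The arithmetic above can be sketched in a few lines. The hours and the hourly rate come from the text; the storage price per GB is an assumed placeholder rather than a quoted AWS price, so substitute the current rate for your region.

```python
def monthly_cost(hours, hourly_rate, storage_gb=0, storage_rate_per_gb=0.0):
    """Estimate a month's bill: compute time plus provisioned storage."""
    compute = hours * hourly_rate
    storage = storage_gb * storage_rate_per_gb
    return round(compute + storage, 2)


# Figures from the text: 160 hours at $0.052 per hour, compute only.
print(monthly_cost(160, 0.052))

# ASSUMPTION: $0.05 per GB-month is a made-up placeholder storage price.
print(monthly_cost(160, 0.052, storage_gb=30, storage_rate_per_gb=0.05))
```

Keeping the storage rate as a parameter makes it easy to re-run the estimate whenever prices or your disk size change.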

However, this does not include the cost of an optional Dropbox Pro account, should you want to back up more than 2GB of code. This costs another $15 per month, but it provides important safety for your data. Also, you will need a private repository, either a paid GitHub account or another service, such as Bitbucket, which offers free private repositories.

I want to stress that if you use it only when you need it, it is cheaper than a dedicated server. Yes, everything mentioned here can be set up on a physical server, but since I work with big data I need a lot of other AWS services, so I think it is logical to have everything in the same place.

Let’s see how to do the whole setup.

Prerequisites

Before starting to build a virtual machine, you need to register with the following four services:

  • Amazon Web Services
  • DuckDNS
  • Dropbox
  • Let’s Encrypt

The only one you need your credit card for is Amazon Web Services. DuckDNS is entirely free, while Dropbox gives you 2GB of free storage, which can be enough for many tasks. Let’s Encrypt is also free, and it is used internally when you build the image to sign your certificate. Besides these, I recommend a repository hosting service too, like GitHub or Bitbucket, if you want to store your code. However, it is not required for the setup.

To start, navigate to the GitHub BigDataDevKit repository.

Screen 6

Scroll the page and copy the script shown in the image in your text editor of choice:

Screen 7

This script is needed to bootstrap the image. You have to change it and provide values for the parameters. Carefully change the text within the quotes. Note that you cannot use characters like the quote itself, the backslash or the dollar sign in the password, unless you escape them. This problem is relevant only for the password. If you want to play it safe, avoid quotes, dollar signs and backslashes altogether.
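
As a quick sanity check before pasting a password into the script, something like the following sketch could flag the characters mentioned above. It is not part of the kit; the function names are hypothetical, and treating both single and double quotes as risky is an assumption, since the text only says "the quote itself".

```python
# Characters the text warns about: quotes, the backslash and the dollar sign.
# ASSUMPTION: both single and double quotes are treated as risky here.
RISKY_CHARS = set("'\"\\$")


def risky_characters(password):
    """Return the sorted list of risky characters found in the password."""
    return sorted(set(password) & RISKY_CHARS)


def is_safe_password(password):
    """True when the password is non-empty and has no risky characters."""
    return bool(password) and not risky_characters(password)


print(is_safe_password("correct-horse-7"))
print(risky_characters("pa$$word"))
```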

The PASSWORD parameter is a password you choose to access the virtual machine via a web interface. The EMAIL parameter is your email, and will be used when you register an SSL certificate. You will be required to provide your email, and it is the only requirement for getting a free SSL Certificate from Let’s Encrypt.

To get the values for TOKEN and HOST, go to the DuckDNS site and log in. You will need to choose an unused hostname.

Screen 8

Look at the image to see where you have to copy the token and where you have to add your hostname. You must click on the “add domain” button to reserve the hostname.
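
To make the role of TOKEN and HOST more concrete, here is a minimal sketch of how a dynamic DNS update request can be put together. The URL layout follows DuckDNS's documented HTTP interface as far as I know it; the function name is hypothetical, and the kit's bootstrap script may well perform the update differently. The sketch only builds the URL and sends nothing.

```python
from urllib.parse import urlencode


def duckdns_update_url(host, token, ip=None):
    """Build an update URL; when ip is omitted the service is expected
    to use the address the request comes from."""
    params = {"domains": host, "token": token}
    if ip is not None:
        params["ip"] = ip
    return "https://www.duckdns.org/update?" + urlencode(params)


# Placeholder values only: use your own HOST and TOKEN.
print(duckdns_update_url("YOURHOST", "YOUR-TOKEN"))
```

Because the token is the only credential in that request, treat it like a password.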

Configuring your instance

Assuming you have all the parameters and have edited the script, you are ready to launch your instance. Log in to the Amazon Web Services management interface, go to the EC2 Instances panel and click on “Launch Instance”.

Screen 9

On the first screen, you will choose an image. The script is built around Amazon Linux, and no other option is supported. Select Amazon Linux, the first option in the QuickStart list.

Screen 10

On the second screen, choose the instance type. The kit runs multiple services and needs at least 4GB of memory, so I recommend you select the t2.medium instance. You could trim it down, using the t2.small if you shut down some services, or even the t2.micro if you only want the desktop.
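
The choice can be reduced to "the smallest instance with enough memory", sketched below. The memory figures are the published sizes for the t2 family as I recall them; check the current AWS instance table before relying on them. The function name is hypothetical.

```python
# Memory per instance type in GB (t2 family); verify against current AWS docs.
T2_MEMORY_GB = {"t2.micro": 1, "t2.small": 2, "t2.medium": 4, "t2.large": 8}


def smallest_instance_for(required_gb):
    """Return the t2 type with the least memory that still fits the need."""
    candidates = [
        (memory, name)
        for name, memory in T2_MEMORY_GB.items()
        if memory >= required_gb
    ]
    if not candidates:
        raise ValueError(f"no t2 instance in the table has {required_gb}GB")
    return min(candidates)[1]


# The full kit needs 4GB according to the text.
print(smallest_instance_for(4))
```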

Screen 11

On the third screen, click “Advanced Details” and paste the script you configured in the previous step. I also recommend you enable protection against termination, so that with an accidental termination you won’t lose all your work.

Screen 12

The next step is to configure the storage. The default for an instance is 8GB, which is not enough to contain all the images we will build. I recommend increasing it to 20GB. Also, while it is not needed, I suggest adding another block device of at least 10GB. The script will mount the second block device as a data folder. You can make a snapshot of its contents, terminate the instance, then recreate it using the snapshot and recover all your work. Furthermore, a custom block device is not lost when you terminate the instance, so you have double protection against accidental loss of your data. To increase your safety even further, you can back up your data automatically with Dropbox.

Screen 13

The fifth step is naming the instance. Pick your own name. The sixth step offers a way to configure the firewall. By default only SSH is available, but we also need HTTPS, so do not forget to add a rule opening HTTPS as well. You could open HTTPS to the world, but it’s better to open it only to your IP to prevent others from accessing your desktop and shell, even though they are still protected with a password.
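
"Only to your IP" means a source range that contains exactly one address. The sketch below shows how such a range is written in CIDR notation; the dictionary is a hypothetical, simplified description of a rule, not the format used by the AWS console or API, and 203.0.113.7 is a documentation address standing in for your own.

```python
import ipaddress


def single_host_rule(ip, port=443):
    """Describe an inbound rule that admits one address on one TCP port."""
    address = ipaddress.ip_address(ip)  # raises ValueError on bad input
    prefix = 32 if address.version == 4 else 128
    return {"protocol": "tcp", "port": port, "source": f"{address}/{prefix}"}


# HTTPS (port 443) restricted to a single example address.
print(single_host_rule("203.0.113.7"))
```

The same helper with port=5900 describes the kind of rule you would want for native VNC access.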

Once done with this last configuration, you can launch the instance. You will notice that the initialization can take quite a few minutes the first time since the initialization script is running and it will also do some lengthy tasks like generating an HTTPS certificate with Let’s Encrypt.

Screen 14

When the management console eventually shows the instance as “running” and no longer “initializing”, you are ready to go.

Assuming all the parameters are correct, you can navigate to https://YOURHOST.duckdns.org.

Replace YOURHOST with the hostname you chose. Do not forget that it is an HTTPS site, not HTTP, so you must write https:// in the URL; this is what keeps your connection to the server encrypted. The site will present a valid certificate from Let’s Encrypt. If there are problems getting the certificate, the initialization script will generate a self-signed certificate. You will still be able to connect with an encrypted connection, but the browser will warn you that it is an unknown site and that the connection is insecure. It should not happen, but you never know.

Screen 15

Assuming everything is working, you then access the web terminal, Butterfly. You can log in using the user app and the password you put in the setup script.

Once logged in, you have a bootstrapped virtual machine, which also includes Docker and other goodies, such as an Nginx frontend, Git, and the Butterfly web terminal. Now, you can complete the setup by building the Docker images for your development environment.

Next, type the following commands:

git clone https://github.com/sciabarra/BigDataDevKit
cd BigDataDevKit
sh build.sh

The last command will also ask you to type a password for the desktop access. Once done, it will start to build the images. Note that the build will take about 10 minutes, but you can see what is happening because everything is shown on the screen.

Once the build is complete, you can also install Dropbox with the following command:

/app/.dropbox-dist/dropboxd

The system will show a link you must click to enable Dropbox. You need to log into Dropbox and then you are done. Whatever you put in the Dropbox folder is automatically synced between all your Dropbox instances.

Once done, you can restart the virtual machine, and access your environment at the https://YOURHOST.duckdns.org/vnc.html URL.
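
Since the two entry points differ only by their path, a tiny helper can produce both from your hostname. This is just a convenience sketch with a hypothetical function name, assuming the web terminal is served at the root and the desktop under /vnc.html, as described in this article.

```python
def kit_urls(host, domain="duckdns.org"):
    """Return the web terminal and web desktop URLs for a given hostname."""
    base = f"https://{host}.{domain}"
    return {"terminal": base, "desktop": base + "/vnc.html"}


# Replace YOURHOST with the hostname you reserved.
for name, url in kit_urls("YOURHOST").items():
    print(name, url)
```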

You can stop your machine and restart it when you resume work. The access URL stays the same. This way, you will pay only for the time you are using it, plus a monthly extra for the storage used.

Preserving your data

The following discussion requires some knowledge of how Docker and Amazon work. If you do not want to understand the details, just keep in mind the following simple rule: in the virtual machine, there is an /app/Dropbox folder available. Whatever you place in /app/Dropbox is preserved, and everything else is disposable and can go away. To improve safety further, also store your precious code in a version control system.

Now, if you do want to understand this, read on. If you followed my directions in the virtual machine creation, the virtual machine is protected from termination, so you cannot destroy it accidentally. If you expressly decide to terminate it, the primary volume will be destroyed. All the Docker images will be lost, including all the changes you made.

However, since the folder /app/Dropbox is mounted as a Docker Volume for containers, it is not part of Docker images. In the virtual machine, the folder /app is mounted in the Amazon Volume you created, which is also not destroyed even when you expressly terminate the virtual machine. To remove the volume, you have to remove it expressly.

Do not confuse Docker volumes, which are logical Docker entities, with Amazon Volumes, which are somewhat physical entities. What happens is that the /app/Dropbox Docker volume is placed inside the /app Amazon volume.

The Amazon Volume is not automatically destroyed when you terminate the virtual machine, so whatever is placed in it will be preserved, until you also expressly destroy the volume. Furthermore, whatever you put in the Docker volume is stored outside of the container, so it is not destroyed when the container is destroyed. If you enabled Dropbox, as recommended, all your content is copied to the Dropbox servers, and to your hard disk if you sync Dropbox with your computer(s). Also, it is recommended that the source code be stored in a version control system.

So, if you place your work in a version control system under the Dropbox folder, all of the following must happen for you to lose your data:

  • You expressly terminate your virtual machine.
  • You expressly remove the data volume from the virtual machine.
  • You expressly remove the data from Dropbox, including the history.
  • You expressly remove the data from the version control system.

I hope your data is safe enough.
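
The reasoning behind that list is a logical AND: the data is gone only when every safeguard has been removed. A small illustrative model, with hypothetical names, makes that explicit.

```python
from dataclasses import dataclass


@dataclass
class Safeguards:
    """Each flag records that one layer of protection has been removed."""
    vm_terminated: bool = False
    volume_removed: bool = False
    dropbox_purged: bool = False
    vcs_removed: bool = False

    def data_lost(self):
        """Data is gone only when every single layer has been removed."""
        return all(
            (self.vm_terminated, self.volume_removed,
             self.dropbox_purged, self.vcs_removed)
        )


# Terminating the machine and deleting its volume is still not enough.
print(Safeguards(vm_terminated=True, volume_removed=True).data_lost())
```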

I keep a virtual machine for each project, and when I finish, I keep the unused virtual machines turned off. Of course, I have all my code on GitHub and backed up in Dropbox. Furthermore, when I stop working on a project, I take a snapshot of the Amazon Web Services block before removing the virtual machine entirely. This way, whenever a project resumes, for example for maintenance, all I need to do is start a new virtual machine using the snapshot. All my data goes back in place, and I can resume working.

Optimizing access

First, if you have direct internet access, not mediated by a proxy, you can use native SSH and VNC clients. Direct SSH access is important if you need to copy files in and out of the virtual machine. However, for file sharing, you should consider Dropbox as a simpler alternative.

The VNC web access is invaluable, but sometimes, it can be slower than a native client. You have access to the VNC server on the virtual machine using port 5900. You must expressly open it because it is closed by default. I recommend that you only open it to your IP address, because the internet is full of “robots” that scan the internet looking for services to hook into, and VNC is a frequent target of those robots.
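
After opening the port in the firewall, you can verify from your own computer that it is reachable before configuring a native client. The following is a generic TCP reachability check, not something shipped with the kit; the function name is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (replace YOURHOST with your hostname):
#     port_is_open("YOURHOST.duckdns.org", 5900)
```

If this returns False from your machine, the firewall rule is the first thing to check.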

Conclusion

This article explains how you can leverage modern cloud technology to implement an effective development environment. While a machine in the cloud cannot be a complete replacement for your working computer or a laptop, it is good enough for doing development work when it is important to have access to the IDE. In my experience, with current internet connections, it is fast enough to work with.

Being in the cloud, server access and manipulation is faster than having them locally. You can quickly increase (or decrease) memory, fire up another environment, create an image, and so on. You have a datacenter at your fingertips, and when you work with big data projects, well, you need robust services and lots of space. That is what the cloud provides.

The original article was written by MICHELE SCIABARRA – FREELANCE SOFTWARE ENGINEER @ TOPTAL and can be read here.


Cloud Security: Is it Safe Enough for You?

Cloud computing is a service that is increasing rapidly in popularity, and companies are expanding to match that demand. Forbes reports that 42 percent of major IT decision makers are planning to increase spending on cloud computing. That spending is expected to reach $32 billion for the year 2015.

Yet while the services are popular, there have been some major breaches in the last few years. While some of them haven’t directly affected you and others only really affected celebrities, it makes one ask questions. Are cloud services really safe? Are they something you can trust your most sensitive data with?

“Cloud computing” by Dynamicwork under CC BY-SA 3.0

This is a basic graphic of what cloud computing is all about, but as you can see there are quite a few things that go in and out of the cloud. Do you want your data to be included?

Pros and Cons of the Situation Today

The first realization you have to make is that your data is going to be out of your control in a fundamental way. You can make sure there isn’t a leak on your end, but if someone hacks the servers, it is out of your hands. It also isn’t encouraging that cloud service providers have profit in mind as their top priority. This means that they might trim the budget or try to cut corners when it comes to security. They often look to prepare for the next threat, not three threats ahead, as they should.

Back to a simple question: can you trust them? Take a look at the following pros and cons and decide whether it is right for you.

Pros

  • Most hackers attacking cloud systems are only interested in going after major industries and data centers with valuable information. They are unlikely to want your individual information or want to comb through all of your documents to find it.
    • While it is easier to attack an endpoint, such as a user, the price an average hacker could get for your information is simply not worth the time expenditure that the hacker would put in.
  • While governments certainly have not stopped their surveillance in the past year or two, the recent disclosure of the extensive surveillance programs conducted by certain countries, such as PRISM, has forced them to scale back their ambitions. This means that you are less likely to be spied upon during your use of cloud services.
  • Even if it is just to maintain a good reputation, all of the major players in cloud computing and storage will agree that security is an important issue. This means that they will be competing with each other for the title of “most secure.”

Cons

  • In 2014, Dropbox changed its terms of use to stop class action lawsuits, and then gave everyone 30 days to opt out of arbitration.
  • Note that this Dropbox document is currently (as of August 2015) about 2,000 words, but still very few people read it.
  • Nearly every major service has had one problem or another during its lifetime.
    • This is only counting the ones we know about.

How to Make Your Decision

Now knowing all that you do, take the following steps to determine whether you should use cloud services or an alternative option.

  • Determine your needs. Do you have a lot of items or documents to store, or are you working from home and have vital business interests to protect? If you have some sensitive data, it might be best stored on a flash drive where it is under your control.
  • Determine the service you might use. Try to research more about that company. Read any agreements beforehand and then sleep on it for a day or two. Compare services in terms of both security record and storage space.
  • Consider the costs. If you want to store a great deal of data that you don’t plan on using too often, then you might be better off getting a form of physical storage instead of paying for a cloud service.
  • Are you going to be sharing files or data with people? If so, then cloud solutions are probably best for you. Some services such as Dropbox are optimized for sharing, but you do need to be careful about who you share your folders with. Experts across the board say that human error is the number one cause of data leaks.

Additional Considerations

In addition to the above factors and tips for deciding whether the security measures put in place by cloud services are enough, there are a few other things that you should know about as well.

  • Cloud security on your end is highly dependent on the general security of your online accounts and your computer. If you have a quality password, such as “Tr!yzxp176,” and have an up-to-date computer with the best anti-malware you can get, then you will have a significantly lower chance of experiencing a breach. Take this into account when making your decision (or follow these tips anyway, as they are universally helpful).
  • If you are going to be using cloud services, then you should acquire the services of a Virtual Private Network for your computer. This service will connect your computer to an offsite secure server using an encrypted connection. This will protect you from surveillance and data interception on unprotected public networks. As an example of how this works, imagine that you are using cloud services in a café. Normally a hacker could open their laptop half a dozen tables away, start up a small device, and intercept either the files being transferred or the username and password data for the account you are using. With a VPN your data will safely travel through an encrypted “tunnel” that is created around your connection, allowing no one access over the network.

Cloud computing is said to be secure, but remember that there is always a way to break in, even if it hasn’t been invented yet.

 

Conclusion

Have you come to a decision yet? I hope that you have, and while these tips can help you and give you all the facts that you need to make a decision, only you can make the final call about whether to use cloud services or not. Thank you for reading, and regardless of your decision may you never have to deal with a leak of your personal information.

 

About the Author: Cassie Phillips is an internet security specialist who likes to take a particular focus on the individual and how internet security can affect their day to day life. She loves to blog and is glad that she can share this important information about cloud computing with you.

Cloud computing vision

Cloud computing is the latest buzz on the Internet these days.
What does it mean to us, and where is the future of Cloud computing going?

Some background
In the mid 90’s, we had Citrix, with its vision for server-based computing.
It works similarly to the mainframe idea that came a couple of decades earlier: you put all your resources on one server, and thin clients connect to receive resources.
A couple of years later, we had a new buzzword, ASP (Application service provider), which according to Wikipedia is a business that provides computer-based services to customers over a network.
A few years later, ASP changed its name to SaaS (Software as a service), which is also referred to as software on demand.
In between, we had VMware, which presented server virtualization (or at least its most famous implementation) to the world.

What is Cloud Computing?
According to Wikipedia, Cloud computing is Internet-based computing, whereby shared resources, software, and information are provided to computers and other devices on demand, like the electricity grid.
The idea of Cloud computing enables customers to avoid investing money in hardware and network equipment and, instead, to rent usage from a third-party provider.

Cloud computing has the following key features:

  • Agility improves with users’ ability to rapidly and inexpensively re-provision technological infrastructure resources.
  • Cost is claimed to be greatly reduced.
  • Device and location independence enable users to access systems using a web browser regardless of their location or what device they are using (e.g., PC, mobile).
  • Multi-tenancy enables sharing of resources and costs across a large pool of users.
  • Reliability is improved if multiple redundant sites are used, which makes well designed cloud computing suitable for business continuity and disaster recovery.
  • Scalability via dynamic (“on-demand”) provisioning of resources on a fine-grained, self-service basis near real-time, without users having to engineer for peak loads.
  • Maintenance is easier, since cloud computing applications don’t have to be installed on each user’s computer.
  • Metering means that cloud computing resource usage should be measurable and metered per client and application on a daily, weekly, monthly, and annual basis.
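
To illustrate the metering feature in the list above, here is a minimal sketch of summing usage per client and application. The record layout, the client names and the unit of usage are all hypothetical; real providers define their own metering dimensions.

```python
from collections import defaultdict


def meter_usage(records):
    """Sum usage per (client, application) pair.

    Each record is a (client, application, units) tuple; the layout and
    the unit are hypothetical.
    """
    totals = defaultdict(float)
    for client, application, units in records:
        totals[(client, application)] += units
    return dict(totals)


sample = [("acme", "crm", 2.5), ("acme", "crm", 1.5), ("globex", "mail", 4.0)]
print(meter_usage(sample))
```

Adding a date field to each record and to the grouping key would give the daily, weekly or monthly breakdowns mentioned in the list.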

The confusion point and vision
People tend to confuse companies moving their data centers and applications toward the cloud with actual Cloud computing providers.
A real Cloud computing provider is built from large-scale data centers around the world.
Each rack is built from cheap (to manufacture) hot-swappable hardware – it’s time to say goodbye to 1U-4U servers from all major vendors (HP, IBM, DELL, SUN, etc).
Each blade has a many-core CPU (4-core, 6-core and above), with a lot of memory (as much as the hardware supports).
Each blade is connected to a large-scale storage grid.
Everything must be redundant – you must be able to add new racks on-demand, without affecting any customer.
Servers, network equipment and storage devices must be configured in active-active clusters.
Data should be replicated on the fly between data centers across the world, in-order to provide 24/7 availability.
Guest operating systems must be able to move between physical servers transparently, as VMware introduced in its VMotion technology.
Server maintenance should be performed on a scheduled basis. Since everything is transparent to the customer, firmware upgrades, patch management and software/application upgrades will not affect any customer.
The hardware/network/storage layer should be separated from the application layer, so that current SaaS companies will be able to integrate their current applications to the cloud era, and work transparently with Cloud computing infrastructure.

Cloud computing Achilles’ heel
The thing that drives most people off the cloud is security.
Customers can’t physically protect their hardware, since they don’t own it.
Customers have trouble protecting their data, since everything is built on virtual machines connected to shared virtual storage.
I hope that in the near future information security professionals will be able to close this gap and offer customers transparent, cheap and secure solutions.