
What To Look Out For In Software Development NDAs

The demand for technical talent, and the ease with which information can be shared, have increased entrepreneurs’ reliance on business relationships with outsiders. It has never been easier for an entrepreneur to find, meet, communicate and eventually enter into some sort of business relationship with an individual or company that is otherwise not associated with the business.


Moreover, sky-high valuations and fairytale overnight success stories have fueled the notion that even a basic idea can be worth millions, if not billions, in a relatively short time. In light of these factors, you might presume that Non-Disclosure Agreements (NDAs) have been widely accepted in the tech world as a means to protect sensitive and potentially valuable information from theft and abuse. Not so fast.

Before jumping into the debate, though, it helps to have a quick understanding of what an NDA is, what one looks like and, eventually, what to look out for if you’re asked to sign one as a freelance software developer.

What Is A Non-Disclosure Agreement?

An NDA is exactly what its name implies — a legal agreement between two or more parties that (i) defines certain confidential information that will be disclosed and (ii) imposes a legal obligation on the receiving party to keep that information confidential. NDAs are most commonly used when a business relationship between two companies or individuals requires the sharing of confidential information.

For example:

Company A, a local retailer, has hired ABC IT Co. to build an online inventory and order management system. To build the system, Company A must provide ABC IT Co. with a list of Company A’s suppliers and certain pricing information. Before disclosing its supplier list and pricing information, Company A asks ABC IT Co. to sign an NDA forbidding ABC IT Co. from disclosing or using Company A’s confidential information.

If a party to an NDA breaches the agreement, by disclosing or using confidential information for example, the other party to the NDA may sue the breaching party for monetary damages (compensation for lost profits or business), injunctive relief (a court order requiring the breaching party to refrain from taking some action) or specific performance (a court order requiring that the breaching party take some specified action).

So What Do Software Development NDAs Look Like?

NDAs are negotiated legal agreements that can be as simple or as complex as the parties desire. An NDA can be a one page fill-in-the-blank form or a lengthy document drafted from scratch to reflect the unique circumstances of the parties’ relationship, the different negotiating leverage of each party, and the nature of the information that will be disclosed.


Although there is no such thing as a one-size-fits-all software NDA, for purposes of this overview, and to understand generally how NDAs work, it’s important to appreciate the three “main-event” provisions that are common to all NDAs.

(a) The Definition of Confidential Information:

The definition of “Confidential Information” will set forth the type of disclosed information that is subject to the limitations on use and disclosure and, importantly, the type of disclosed information that is not subject to such limitations.

(b) The Term of the Recipient’s Obligations:

The term of an NDA sets forth the time limit on the parties’ obligations. The term of an NDA may be measured in days, weeks, months or years depending on the circumstances of the relationship and the nature of the disclosed information.

(c) The Limitation on Use and Disclosure:

This provision will describe what a recipient party may do and what a recipient party may not do with disclosed information that falls within the definition of Confidential Information. This provision will almost certainly forbid disclosure of Confidential Information, but may also limit the use of Confidential Information and, in some cases, require that the recipient take certain affirmative steps to protect the confidentiality of Confidential Information.

The NDA Debate: Should You Ask For An NDA? Should You Sign One?

Although NDAs have been around for as long as there has been information worth protecting, the high-tech startup boom has thrust their use into the limelight and sparked a debate as to their value. As an industry that is highly dependent on data and constantly evolving technology, one would think that the high-tech startup world would embrace the use of software development NDAs. To understand why that isn’t the case, and to better gauge whether you should ask for an NDA or sign one presented to you, consider the following:

NDAs Are Often Unilateral

NDAs are unilateral when the business relationship requires that only one party disclose confidential information (rather than a mutual exchange of information by each party).

A startup seeks to hire an engineer to build its mobile app and has asked the engineer to sign an NDA. The startup will disclose information to the engineer, but the relationship does not require the engineer to provide confidential information to the startup. The NDA will be unilateral and will impose legal obligations, and potential liability, on the engineer only.

Because only one party is exchanging confidential information, only one party (the recipient party) has a legal obligation to comply with and, as such, only the recipient party is subject to potential liability. What an entrepreneur might view as a means by which to protect an idea, an NDA recipient might view as a one-sided contract.

Entrepreneurs Often Overstate the Need for an NDA

There are surely circumstances where NDAs make sense. Customer lists, pricing information, proprietary formulas and algorithms might have intrinsic value that is best protected by an NDA. Many argue, however, that some entrepreneurs are NDA trigger happy and think that every idea is worthy of legal protection. Ideas though, it is argued, are rarely new and, moreover, often have no value without execution.

NDAs should be asked for only when there is something worth protecting, and many argue that an idea alone does not warrant asking for an NDA. Finally, those most often asked to sign software NDAs – investors and engineers – rarely have any interest in stealing an idea when doing so would likely ruin any professional goodwill and reputation they’ve earned in their respective professional communities.

NDAs Indicate Mistrust

In a perfect world, business would be business and would never be personal. In reality, though, business is often about perception. What might be “just a contract” to an entrepreneur asking for a software development NDA may be perceived by the person being asked to sign it as an indication of mistrust and a questioning of personal integrity. Because NDAs are most often requested at the outset of a business relationship, signaling mistrust and calling professional integrity into question can start the relationship off on the wrong foot, even if that wasn’t the intention. Perception is powerful.

This issue is less of a concern for business relationships where both parties will be disclosing confidential information and, thus, an NDA will be bilateral and both parties subject to legal obligation. Outside of strategic joint ventures, partnerships, mergers and similar arrangements, however, bilateral exchanges are rare and unilateral NDAs are much more common.

NDAs Can Limit An Information Recipient’s Ability To Earn A Living

As discussed earlier, an NDA defines a set of information that is to be considered “Confidential Information” and then specifies what a recipient may and may not do with that information during the term of the NDA. Whether an NDA is three pages or three-hundred pages, no contract can predict and plan for every possible circumstance and this limitation often works against the recipient of disclosed information.

What if, after signing an NDA, an engineer is asked to build a similar product or to execute a similar but technically different idea? Is using similar code on a different application a violation of the NDA’s non-use provision? What if the engineer learned new skills during the engagement? Can the engineer use those skills for another client? Can the engineer list the client on his or her resume?

There is a real concern that signing even one NDA, whether as an engineer, an investor or otherwise, can drastically shrink one’s pool of potential business. At worst, signing an NDA might foreclose a person’s ability to work on even slightly related projects. At best, signing an NDA complicates future business development efforts as every new opportunity requires a time consuming analysis of conflicts and liability under each and every NDA that the person may be subject to.

Enforcement Isn’t Cheap

The whole point of entering into an NDA is to have some legal remedy if the recipient party discloses confidential information in violation of the agreement. An NDA gives a disclosing party a basis to file a lawsuit seeking money damages and/or a court order against the breaching party. What many NDA proponents don’t fully appreciate, however, is the cost of enforcement.

Filing a lawsuit can be extremely costly and time consuming. A lawsuit for breach of contract will very likely require hiring a lawyer to gather evidence, assess possible legal claims, file the initial complaint and supporting documents, depose the allegedly breaching party and any witnesses and related parties, and argue the case before a judge. Lawsuits can take years, and lawyers typically charge by the hour. Before asking for an NDA, one should assess whether the information to be protected is more valuable than the potential cost of enforcement.

Though the above factors have contributed to a move away from NDAs in the startup world, NDAs are not without their value. Whether you should ask for an NDA before disclosing information, or agree to sign one if you’re on the receiving end of the equation, depends on the particular circumstances of the intended business relationship and each party’s motivation to enter into the relationship. The more valuable the relationship is to a party, the less leverage that party has to negotiate for or against the use of an NDA. The less valuable the relationship is to a party, the more leverage that party has to get its way or walk away. This push and pull is at the heart of all negotiations: the party with the better “Best Alternative to a Negotiated Agreement (BATNA)” has the upper hand.

If You Must Have An NDA…

So what if you’re an engineer and the opportunity to work on a particular project outweighs the risk of signing a software development NDA? What if you’re a startup and the intrinsic value of your information justifies the need for an NDA, despite the difficulty of finding an engineer that will sign one? If you have to sign an NDA, or if you must ask for one, what are some of the things to look out for and consider?


As is always the case, I strongly suggest seeking the guidance of a competent and licensed attorney. Contracts can get complex quickly, and legal rights and obligations shouldn’t be left to “winging it.” As you’re finding an attorney, though, you can start by reviewing some of the NDA’s main operative provisions. The following are a few preliminary things you might consider when presented with or requesting an NDA:

1. Definition of Confidential Information

Recall that this provision defines the type of disclosed information that is subject to the confidentiality obligations of the NDA and, as such, it should reflect the nature of the business relationship and that of the information to be disclosed.

If you’re a disclosing party, you’ll likely ask for a broad definition of Confidential Information to cover everything that might be disclosed to the receiving party during the course of the relationship. If you’re a receiving party, however, you might resist this request and seek instead to narrow the definition to include only specifically designated information such as, for example, written information that is marked “Confidential.” Regardless of where the negotiations come out, the parties should think carefully about striking the right balance between a definition of Confidential Information that is too broad (and thus extremely restrictive to the recipient party), on the one hand, and, on the other hand, too narrow (thus minimizing the protective effect to the disclosing party).

Though it’s important to determine the information that is to be held in confidence, it is equally important to “carve-out” certain information that is not subject to the confidentiality provisions. Common examples of such carve-outs include information that is or becomes publicly available and information that is lawfully known before entry into the business relationship.

2. Term of Confidentiality

The term of an NDA should reflect the nature of the parties’ business relationship and the nature of the information to be disclosed. If the relationship is limited to a one-year engagement, it might not make sense for the term of the NDA to extend too far after termination of the relationship. Similarly, certain types of information become less valuable or sensitive over time. Financial statements, for example, may be particularly valuable at and immediately after the time they are prepared, but probably don’t accurately reflect a company’s financial health months or years after their preparation. If information is of a type that decreases in value or sensitivity over time, a long term is likely not necessary.

3. Disclosure to Representatives

As discussed throughout this article, NDAs are typically signed by a single disclosing party and a single recipient party. The problem, though, is that a recipient party may not always work alone and, rather, may from time to time need to disclose information protected by an NDA to such recipient party’s colleagues, employees or representatives in order to carry out the terms of the business relationship.

David Developer has signed an NDA with BigCo to create a mobile app for BigCo.

During the project, David needs to enlist the help of his colleague, Peter Programmer, to write some code in a language that David is less familiar with. Peter has not signed an NDA. Can David disclose information to Peter so that Peter can assist with the project?

Rather than go through the hassle of signing a new NDA for each new person to whom information needs to be disclosed during the course of a project, or trying to predict ahead of time every person to whom information may need to be disclosed, the parties to an NDA may include a representatives provision addressing permitted disclosures to certain defined persons.

The representatives provision is straightforward from a drafting perspective and is simply a definition of “Representatives” that specifies the persons or classes of persons to whom confidential information may be disclosed. A recipient party will likely want the definition to be broad and inclusive of any person with whom the recipient party may collaborate. The disclosing party, of course, will likely want to keep the definition of Representatives as narrow as possible to permit the project to move forward, on the one hand, while maintaining the protections of the software development NDA, on the other. Finally, the disclosing party will very likely wish to include a clause providing that, prior to any disclosure of confidential information to a Representative, the recipient party inform such Representative of the confidential nature of the information and of the terms of the NDA. A representatives clause may look something like the following:

During the Term of this Agreement, the Recipient Party will not disclose the Confidential Information to any person other than the Representatives, provided that, prior to any such disclosure to a Representative, the Recipient Party informs such Representative of the confidential nature of the information and the terms of this Agreement. “Representatives” shall include the employees, independent contractors, partners, agents and other third parties that are or may be engaged by the Recipient Party for purposes of the Project.

4. Non-Disclosure v. Non-Use

This is a big one. As mentioned earlier, NDAs will almost always include a prohibition on disclosure of Confidential Information. Some software NDAs, however, will also prohibit or limit use of Confidential Information. For example:

The Recipient Party agrees that, during the Term of this Agreement, the Recipient Party will not (i) disclose the Confidential Information to any person other than its Representatives and (ii) use the Confidential Information for any purpose other than for those purposes directly related to the Project.

Depending on the term of the NDA and the type of information disclosed, restriction on use may not be an issue. If the term is particularly long, however, or the definition of Confidential Information particularly broad, the “use prohibition” may be extraordinarily restrictive on the recipient party. For example, consider the following definition of Confidential Information:

“Confidential Information” includes (i) all information furnished by the Disclosing Party to the Recipient Party, whether furnished before or after the date of this Agreement, whether oral or written, and regardless of the manner in which it was furnished, and (ii) all analyses, compilations, forecasts, studies, interpretations, documents, code and similar work product prepared by the Recipient Party or its Representatives in connection with the Project.

What this means is that, for as long as the NDA is in effect, the Recipient Party cannot disclose or use any information that the Disclosing Party made available to the Recipient Party or any information prepared in connection with the particular Project. Without any carve-outs or qualifications, these clauses could be incredibly limiting.

An engineer signs an NDA which includes the two provisions set out above. During the course of the Project, the engineer learns a new way of putting together common strings of code. The new method could be considered work product that was prepared in connection with the Project and, as such, the engineer may be prohibited from using the method in future projects during the term of the NDA.

Before hearing Startup A’s pitch, an investor signs an NDA which includes provisions similar to those set out above. During the pitch, Startup A reveals its most recent financial statements and its strategy for growth. The investor does not invest. A few months later, the investor is approached by a similar startup, Startup B, and asked to attend a pitch. The investor may be precluded from investing in Startup B as doing so might involve use of information learned during Startup A’s pitch, even if only remembered by the investor.

The above examples are admittedly extreme, but are used to stress the point that the combination of a broad definition of Confidential Information, an unnecessarily long term, and restrictions on use can be paralyzing. Additionally, these are by no means the only red-flags that can sneak into an NDA and what might be a red-flag for one NDA may be perfectly tolerable for a different business relationship.


So What Do I Do…Specifically?

Though you might now have a better understanding of what an NDA is, what a software development NDA might look like, and why many in the tech world are reluctant to sign them, you might still be wondering what, specifically, you should do when on the receiving end of an NDA. There is no substitute for the advice of a competent attorney, but, with an understanding of the concepts discussed in this article, you can approach the first read of an NDA armed with some knowledge as to what is most important to watch for:

  • Is this a bilateral or unilateral NDA? Will both parties be disclosing information? If so, are the parties subject to identical limitations and requirements?
  • How broad, or narrow, is the definition of Confidential Information?
  • How long are the obligations in effect? Does the term of the NDA match the nature of the business relationship and the information to be disclosed?
  • Am I only prohibited from disclosing the Confidential Information, or disclosing and using the Confidential Information?
  • Am I permitted to disclose the information to my employees and colleagues who may assist with the project?
  • Is this relationship valuable enough to assume a legal obligation that can be enforced in a court?

Finally, the above considerations, and this write-up generally, are not solely for the benefit of those who may be asked to sign an NDA. Certainly, a recipient party should consider very carefully an NDA’s provisions before signing, but a party considering asking for an NDA, too, would be wise to consider these factors.

NDAs, like most contracts, have the most value, and are therefore most likely to be signed, when both parties are comfortable with the balance of risks managed by the NDA and the benefit to be realized by the underlying contractual relationship. By considering the perspective of the recipient party, a party asking for an NDA may be better able to tailor the scope of an NDA to match the business relationship and present to the recipient party a fair and balanced agreement.

Though the information in this write-up should give you a good starting point, there is a lot to consider when asking for or presented with an NDA. A competent attorney can work with both parties to draft an NDA that is protective to the disclosing party, without being overly restrictive to the recipient party, and help move the parties towards a mutually beneficial business relationship.

If you want to learn more about legal issues faced by startups and developers, I suggest you check out Startup Law Hacks as well.

Disclaimer: the contents of this article were written and are made available solely as general information and for educational purposes and not to provide specific legal advice of any kind or to establish an attorney-client relationship. This article should not be used as a substitute for competent legal advice from an attorney licensed in your jurisdiction. This article has been written by Bret Stancil in his individual capacity and the views and opinions expressed herein are his own.

Source: Toptal

Integrating Facebook Authentication in AngularJS App with Satellizer

With the arrival of feature-rich front-end frameworks such as AngularJS, more and more logic is being implemented on the front-end, such as data manipulation/validation, authentication, and more. Satellizer, an easy-to-use token-based authentication module for AngularJS, simplifies the process of implementing an authentication mechanism in AngularJS. The library comes with built-in support for Google, Facebook, LinkedIn, Twitter, Instagram, GitHub, Bitbucket, Yahoo, Twitch, and Microsoft (Windows Live) accounts.


In this article, we will build a very simple webapp, similar to the one here, which allows you to log in and see the current user’s information.

Authentication vs Authorization

These are 2 scary words that you often encounter once your app starts integrating a user system. According to Wikipedia:

Authentication is the act of confirming the truth of an attribute of a single piece of data (a datum) claimed true by an entity.

Authorization is the function of specifying access rights to resources related to information security and computer security in general and to access control in particular.

In layman’s terms, let’s take the example of a blog website with some people working on it. The bloggers write articles and the manager validates the content. Each person can authenticate (log in) to the system, but their rights (authorization) are different, so the blogger cannot validate content whereas the manager can.

Why Satellizer

You can create your own authentication system in AngularJS by following some tutorials such as this very detailed one: JSON Web Token Tutorial: An Example in Laravel and AngularJS. I suggest reading this article as it explains JWT (JSON Web Token) very well, and shows a simple way to implement authentication in AngularJS using directly the local storage and HTTP interceptors.

So why Satellizer? The principal reason is that it supports a handful of social network logins such as Facebook, Twitter, etc. Nowadays, especially for websites used on mobile, typing a username and password is quite cumbersome, and users expect to be able to use your website with little hindrance by using social logins. As integrating the SDK of each social network and following its documentation is quite repetitive, it would be nice to support these social logins with minimal effort.

Moreover, Satellizer is an active project on GitHub. Active is key here, as these SDKs change quite frequently and you don’t want to have to re-read their documentation every now and then (anyone working with the Facebook SDK knows how annoying that is).

AngularJS App with Facebook Login

This is where things start to become interesting.

We will build a web app that has regular login/register (i.e. using username, password) mechanism and supports social logins as well. This webapp is very simple as it has only 3 pages:

  • Home page: anyone can see
  • Login page: to enter username/password
  • Secret page: that only logged in users can see

For backend, we will use Python and Flask. Python and the framework Flask are quite expressive so I hope porting the code to other languages/frameworks will not be very hard. We will, of course, use AngularJS for front-end. And for the social logins, we will integrate with Facebook only as it is the most popular social network at this time.

Let’s start!

Step #1: Bootstrap Project

Here is how we will structure our code:

- app.py
- static/
	- index.html
	- main.js
	- bower.json
	- partials/
		- login.tpl.html
		- home.tpl.html
		- secret.tpl.html

All the back-end code is in app.py. The front-end code is put in static/ folder. By default, Flask will automatically serve the contents of static/ folder. All the partial views are in static/partials/ and handled by the ui.router module.

To start coding the back-end, we’ll need Python 2.7.* and install the required libraries using pip. You can of course use virtualenv to isolate a Python environment. Below is the list of required Python modules to put in requirements.txt:

Flask==0.10.1
PyJWT==1.4.0
Flask-SQLAlchemy==1.0
requests==2.7.0

To install all these dependencies:

pip install -r requirements.txt

In app.py we have some initial code to bootstrap Flask (import statements are omitted for brevity):

app = Flask(__name__)

@app.route('/')
def index():
    return flask.redirect('/static/index.html')

if __name__ == '__main__':
    app.run(debug=True)
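For reference, the imports and setup omitted above (and used by the later snippets) would look roughly like the following sketch. The database URI and the way the token secret is loaded are assumptions, not taken from the original code:

import os
import json
from datetime import datetime, timedelta

import flask
import jwt
import requests
from flask import Flask, request, jsonify
from flask_sqlalchemy import SQLAlchemy
from jwt import DecodeError, ExpiredSignature

# app = Flask(__name__) is created as shown above, then:
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///app.db'  # assumed; point this at your own database
app.config['TOKEN_SECRET'] = os.environ.get('TOKEN_SECRET', 'change-me')  # assumed name; use a strong secret
db = SQLAlchemy(app)

# Once the models (defined in the next step) exist, create the tables once:
# db.create_all()

With these in place, the later snippets (the User model, the signup endpoint, and the /user endpoint) have everything they reference.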

Next we init bower and install AngularJS and ui.router:

bower init # here you will need to answer some questions. When in doubt, just hit enter :)
bower install angular angular-ui-router --save # install and save these dependencies into bower.json

Once these libraries are installed, we need to include AngularJS and ui-router in index.html and create routings for 3 pages: home, login, and secret.

<body ng-app="DemoApp">

<a ui-sref="home">Home</a>
<a ui-sref="login">Login</a>
<a ui-sref="secret">Secret</a>
<div ui-view></div>

<script src="bower_components/angular/angular.min.js"></script>
<script src="bower_components/angular-ui-router/release/angular-ui-router.min.js"></script>
<script src="main.js"></script>
</body>

Below is the code that we need in main.js to configure routing:

var app = angular.module('DemoApp', ['ui.router']);

app.config(function ($stateProvider, $urlRouterProvider) {
  $stateProvider
    .state('home', {
      url: '/home',
      templateUrl: 'partials/home.tpl.html'
    })
    .state('secret', {
      url: '/secret',
      templateUrl: 'partials/secret.tpl.html',
    })
    .state('login', {
      url: '/login',
      templateUrl: 'partials/login.tpl.html'
    });
  $urlRouterProvider.otherwise('/home');

});

At this point if you run the server python app.py, you should have this basic interface at http://localhost:5000

The links Home, Login, and Secret should work at this point and show the content of the corresponding templates.

Congratulations, you just finished setting up the skeleton! If you encounter any errors, please check out the code on GitHub

Step #2: Login and Register

At the end of this step, you’ll have a webapp that you can register/login using email and password.

The first step is to configure the back-end. We need a User model and a way to generate a JWT token for a given user. The User model shown below is really simplified and does not perform even basic checks, such as whether the email field contains an “@” or whether the password contains at least 6 characters.

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(100), nullable=False)
    password = db.Column(db.String(100))

    def token(self):
        payload = {
            'sub': self.id,
            'iat': datetime.utcnow(),
            'exp': datetime.utcnow() + timedelta(days=14)
        }
        token = jwt.encode(payload, app.config['TOKEN_SECRET'])
        return token.decode('unicode_escape')

We use the jwt module in Python to generate the payload part of the JWT. The iat and exp fields correspond to the timestamps at which the token is issued and at which it expires, respectively. In this code, the token will expire in two weeks.
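As noted above, the model skips even basic validation. A rough sketch of such checks (my own addition, not from the original code), matching the examples mentioned earlier, might look like this:

def validate_signup(email, password):
    # Very rough sanity checks: the email must contain "@" and the
    # password must be at least 6 characters long.
    if "@" not in email:
        raise ValueError("email must contain '@'")
    if len(password) < 6:
        raise ValueError("password must be at least 6 characters long")

In a real application you would also avoid storing passwords in plain text, for example by hashing them with werkzeug.security.generate_password_hash, which ships with Flask.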

After the User model is created, we can add the “login” and “register” endpoints. The code for both is quite similar, so here I will just show the “register” part. Please note that, by default, Satellizer will call the endpoints /auth/login and /auth/signup for “login” and “register” respectively.

@app.route('/auth/signup', methods=['POST'])
def signup():
    data = request.json

    email = data["email"]
    password = data["password"]

    user = User(email=email, password=password)
    db.session.add(user)
    db.session.commit()

    return jsonify(token=user.token())
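The matching /auth/login endpoint is not shown here, but a minimal sketch could look like the following. It assumes the same simplified plain-text password storage as the model above; a real application would compare password hashes instead:

@app.route('/auth/login', methods=['POST'])
def login():
    data = request.json

    # Look the user up by email and compare the (plain-text) password.
    user = User.query.filter_by(email=data["email"]).first()
    if user is None or user.password != data["password"]:
        return jsonify(error="Wrong email or password"), 401

    # Return the same kind of JWT as the signup endpoint.
    return jsonify(token=user.token())

Since Satellizer posts to /auth/login by default, no extra front-end configuration is needed for this route.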

Let’s check the signup endpoint using curl first:

curl localhost:5000/auth/signup -H "Content-Type: application/json" -X POST -d '{"email":"[email protected]","password":"xyz"}'

The result should look like this:

{
  "token": "very long string…."
}

Now that the back-end part is ready, let’s attack the front-end! First, we need to install satellizer and add it as a dependency in main.js:

bower install satellizer --save

Add satellizer as dependency:

var app = angular.module('DemoApp', ['ui.router', 'satellizer']);

Login and signup with Satellizer are actually quite simple in comparison to all the setup until now:

$scope.signUp = function () {
    $auth
      .signup({email: $scope.email, password: $scope.password})
      .then(function (response) {
        // set the token received from server
        $auth.setToken(response);
        // go to secret page
        $state.go('secret');
      })
      .catch(function (response) {
        console.log("error response", response);
      })
  };

If you have any difficulty setting up the code, you can take a look at the code on GitHub.

Step #3: But Secret View Is Not Really Secret, Because Anyone Can See It!

Yes, that is correct! Until now, anyone can go to secret page without logging in.

It’s time to add some interceptor in AngularJS to make sure that if someone goes to secret page and if this user is not logged in, they will be redirected to the login page.

First, we should add a flag requiredLogin to distinguish secret page from other ones.

    .state('secret', {
      url: '/secret',
      templateUrl: 'partials/secret.tpl.html',
      controller: 'SecretCtrl',
      data: {requiredLogin: true}
    })

The “data” part will be used in the $stateChangeStart event which is fired each time the routing changes:

app.run(function ($rootScope, $state, $auth) {
  $rootScope.$on('$stateChangeStart',
    function (event, toState) {
      var requiredLogin = false;
      // check if this state need login
      if (toState.data && toState.data.requiredLogin)
        requiredLogin = true;
      
      // if yes and if this user is not logged in, redirect him to login page
      if (requiredLogin && !$auth.isAuthenticated()) {
        event.preventDefault();
        $state.go('login');
      }
    });
});

Now, the user cannot go directly to the secret page without logging in. Hooray!

As usual, the code of this step can be found here.

Step #4: It’s Time to Get Something Really Secret!

At this moment, there’s nothing really secret in the secret page. Let’s put something personal there.

This step starts by creating an endpoint in the back-end that is only accessible to an authenticated user, i.e., a request carrying a valid token. The /user endpoint below returns the user_id and email of the user corresponding to the token.

@app.route('/user')
def user_info():
    # the token is put in the Authorization header
    if not request.headers.get('Authorization'):
        return jsonify(error='Authorization header missing'), 401
    
    # this header looks like this: “Authorization: Bearer {token}”
    token = request.headers.get('Authorization').split()[1]
    try:
        payload = jwt.decode(token, app.config['TOKEN_SECRET'])
    except DecodeError:
        return jsonify(error='Invalid token'), 401
    except ExpiredSignature:
        return jsonify(error='Expired token'), 401
    else:
        user_id = payload['sub']
        user = User.query.filter_by(id=user_id).first()
        if user is None:
            return jsonify(error='Should not happen ...'), 500
        return jsonify(id=user.id, email=user.email), 200
    return jsonify(error="never reach here..."), 500

Again, we make use of the module jwt to decode the JWT token included in the ‘Authorization’ header and to handle the case when the token is expired or not valid.

Let’s test this endpoint using curl. First, we need to get a valid token:

curl localhost:5000/auth/signup -H "Content-Type: application/json" -X POST -d '{"email":"[email protected]","password":"xyz"}'

Then with this token:

curl localhost:5000/user -H "Authorization: Bearer {put the token here}"

Which gives this result:

{
  "email": "[email protected]",
  "id": 1
}

Now we need to include this endpoint in the Secret Controller. This is quite simple as we just need to call the endpoint using the regular $http module. The token is automatically inserted to the header by Satellizer, so we don’t need to bother with all the details of saving the token and then putting it in the right header.

  getUserInfo();

  function getUserInfo() {
    $http.get('/user')
      .then(function (response) {
        $scope.user = response.data;
      })
      .catch(function (response) {
        console.log("getUserInfo error", response);
      })
  }

Finally, we have something truly personal in the secret page!

The code of this step is on GitHub.

Step #5: Facebook Login with Satellizer

A nice thing about Satellizer, as mentioned at the beginning, is it makes integrating social login a lot easier. At the end of this step, users can login using their Facebook account!

The first thing to do is to create an application on the Facebook developers page in order to get an application_id and a secret code. Please follow developers.facebook.com/docs/apps/register to create a Facebook developer account if you don’t have one already, and create a website app. After that, you will have the application ID and application secret as in the screenshot below.

Once the user chooses to connect with Facebook, Satellizer will send an authorization code to the endpoint /auth/facebook. With this authorization code, the back-end can retrieve an access token from Facebook’s /oauth endpoint, which allows it to call the Facebook Graph API to get user information such as location, user_friends, user email, etc.

We also need to keep track of whether a user account is created with Facebook or through regular signup. To do so, we add facebook_id to our User model.

facebook_id = db.Column(db.String(100)) 

The Facebook secret is configured via the environment variable FACEBOOK_SECRET, which we add to app.config.

app.config['FACEBOOK_SECRET'] = os.environ.get('FACEBOOK_SECRET')

So to launch the app.py, you should set this env variable:

FACEBOOK_SECRET={your secret} python app.py

Here is the method which handles Facebook logins. By default Satellizer will call the endpoint /auth/facebook.

@app.route('/auth/facebook', methods=['POST'])
def auth_facebook():
    access_token_url = 'https://graph.facebook.com/v2.3/oauth/access_token'
    graph_api_url = 'https://graph.facebook.com/v2.5/me?fields=id,email'

    params = {
        'client_id': request.json['clientId'],
        'redirect_uri': request.json['redirectUri'],
        'client_secret': app.config['FACEBOOK_SECRET'],
        'code': request.json['code']
    }

    # Exchange authorization code for access token.
    r = requests.get(access_token_url, params=params)
    # use json.loads instead of urlparse.parse_qsl
    access_token = json.loads(r.text)

    # Step 2. Retrieve information about the current user.
    r = requests.get(graph_api_url, params=access_token)
    profile = json.loads(r.text)

    # Step 3. Create a new account or return an existing one.
    user = User.query.filter_by(facebook_id=profile['id']).first()
    if user:
        return jsonify(token=user.token())

    u = User(facebook_id=profile['id'], email=profile['email'])
    db.session.add(u)
    db.session.commit()
    return jsonify(token=u.token())

To send requests to the Facebook server, we use the handy requests module. Now the difficult part on the back-end is done. On the front-end, adding Facebook login is quite simple. First, we need to tell Satellizer our Facebook application ID by adding this code to the app.config function:

$authProvider.facebook({
    clientId: {your facebook app id},
    // by default, the redirect URI is http://localhost:5000
    redirectUri: 'http://localhost:5000/static/index.html'
  });

To login using Facebook, we can just call:

$auth.authenticate('facebook')

As usual, you can check the code on GitHub

At this time, the webapp is complete in terms of functionality. The user can login/register using regular email and password or by using Facebook. Once logged in, the user can see his secret page.

Make a Pretty Interface

The interface is not very pretty at this point, so let’s add a little bit of Bootstrap for the layout and the angular toaster module to handle an error message nicely, such as when login fails.

The code for this beautifying part can be found here.

Conclusion

This article shows a step-by-step integration of Satellizer in a (simple) AngularJS webapp. With Satellizer, we can easily add other social logins such as Twitter, LinkedIn, and more. The code on the front-end stays much the same as in this article. However, the back-end varies, as social network SDKs have different endpoints with different protocols. You can take a look at https://github.com/sahat/satellizer/blob/master/examples/server/python/app.py which contains examples for Facebook, GitHub, Google, LinkedIn, Twitter and Bitbucket. When in doubt, you should take a look at the documentation on https://github.com/sahat/satellizer.

This article was written by Son Nguyen Kim, a Toptal freelance developer.

9 Essential System Security Interview Questions

1. What is a pentest?

“Pentest” is short for “penetration test”, and involves having a trusted security expert attack a system for the purpose of discovering, and repairing, security vulnerabilities before malicious attackers can exploit them. This is a critical procedure for securing a system, as the alternative method for discovering vulnerabilities is to wait for unknown agents to exploit them. By this time it is, of course, too late to do anything about them.

In order to keep a system secure, it is advisable to conduct a pentest on a regular basis, especially when new technology is added to the stack, or vulnerabilities are exposed in your current stack.


2. What is social engineering?

“Social engineering” refers to the use of humans as an attack vector to compromise a system. It involves fooling or otherwise manipulating human personnel into revealing information or performing actions on the attacker’s behalf. Social engineering is known to be a very effective attack strategy, since even the strongest security system can be compromised by a single poor decision. In some cases, highly secure systems that cannot be penetrated by computer or cryptographic means, can be compromised by simply calling a member of the target organization on the phone and impersonating a colleague or IT professional.

Common social engineering techniques include phishing, clickjacking, and baiting, although several other tricks are at an attacker’s disposal. Baiting with foreign USB drives was famously used to introduce the Stuxnet worm into Iran’s uranium enrichment facilities, damaging the nation’s ability to produce nuclear material.

For more information, a good read is Christopher Hadnagy’s book Social Engineering: The Art of Human Hacking.

3. You find PHP queries overtly in the URL, such as /index.php?page=userID. What would you then be looking to test?

This is an ideal situation for injection and querying. If we know that the server is using a database such as SQL with a PHP controller, it becomes quite easy. We would be looking to test how the server reacts to multiple different types of requests, and what it throws back, looking for anomalies and errors.

One example could be code injection. If the server is not using authentication and evaluating each user input, one could simply try /index.php?arg=1;system('id') and see if the host returns unintended data.
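To make the idea concrete, here is a small, self-contained illustration of the underlying problem using Python and sqlite3 (my own sketch, not part of the original answer); the same principle applies to the PHP/SQL setup described above:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

page = "1 OR 1=1"  # attacker-controlled value taken from the URL

# Vulnerable: the value is concatenated straight into the query, so the
# injected "OR 1=1" becomes part of the SQL and every row is returned.
print(conn.execute("SELECT * FROM users WHERE id = " + page).fetchall())

# Safer: a parameterized query treats the value as data, not as SQL,
# so the crafted input matches nothing.
print(conn.execute("SELECT * FROM users WHERE id = ?", (page,)).fetchall())

When testing, this is exactly the kind of difference in behavior between expected and crafted inputs that you are looking for.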

4. You find yourself in an airport in the depths of a foreign superpower. You’re out of mobile broadband and don’t trust the Wi-Fi. What do you do? Further, what are the potential threats from open Wi-Fi networks?

Ideally, you want all of your data to pass through an encrypted connection. This would usually entail tunneling via SSH into whatever outside service you need, over a virtual private network (VPN). Otherwise, you’re vulnerable to all manner of attacks, from man-in-the-middle attacks to captive portal exploitation, and so on.

5. What does it mean for a machine to have an “air gap”? Why are air gapped machines important?

An air gapped machine is simply one that cannot connect to any outside agents, from the highest level, the internet, down to the lowest, such as an intranet or even Bluetooth.

Air gapped machines are isolated from other computers, and are important for storing sensitive data or carrying out critical tasks that should be immune from outside interference. For example, a nuclear power plant should be operated from computers that are behind a full air gap. For the most part, real world air gapped computers are usually connected to some form of intranet in order to make data transfer and process execution easier. However, every connection increases the risk that outside actors will be able to penetrate the system.


6. You’re tasked with setting up an email encryption system for certain employees of a company. What’s the first thing you should be doing to set them up? How would you distribute the keys?

The first task is to do a full clean and make sure that the employees’ machines aren’t compromised in any way. This would usually involve something along the lines of a selective backup. One would take only the very necessary files from one computer and copy them to a clean replica of the new host. We give the replica an internet connection and watch for any suspicious outgoing or incoming activity. Then one would perform a full secure erase on the employee’s original machine, to delete everything right down to the last data tick, before finally restoring the backed up files.

The keys should then be given out by transferring them over wire through a machine or device with no other connections, importing any necessary .p7s email certificate files into a trusted email client, then securely deleting any trace of the certificate on the originating computer.

The first step, cleaning the computers, may seem long and laborious. Theoretically, if you are 100% certain that the machine is in no way affected by any malicious scripts, then of course there is no need for such a process. However in most cases, you’ll never know this for sure, and if any machine has been backdoored in any kind of way, this will usually mean that setting up secure email will be done in vain.

7. You manage to capture email packets from a sender that are encrypted through Pretty Good Privacy (PGP). What are the most viable options to circumvent this?

First, one should be considering whether to even attempt circumventing the encryption directly. Decryption is nearly impossible here unless you already happen to have the private key. Without this, your computer will be spending multiple lifetimes trying to decrypt a 2048-bit key. It’s likely far easier to simply compromise an end node (i.e. the sender or receiver). This could involve phishing, exploiting the sending host to try and uncover the private key, or compromising the receiver to be able to view the emails as plain text.

8. What makes a script fully undetectable (FUD) to antivirus software? How would you go about writing a FUD script? 

A script is FUD to an antivirus when it can infect a target machine and operate without being noticed on that machine by that AV. This usually entails a script that is simple, small, and precise.

To know how to write a FUD script, one must understand what the targeted antivirus is actually looking for. If the script contains events such as Hook_Keyboard(), File_Delete(), or File_Copy(), it’s very likely it will be picked up by antivirus scanners, so these events are not used. Further, FUD scripts will often mask function names with common names used in the industry, rather than naming them things like fToPwn1337(). A talented attacker might even break up his or her files into smaller chunks, and then hex edit each individual file, thereby making it even more unlikely to be detected.

As antivirus software becomes more and more sophisticated, attackers become more sophisticated in response. Antivirus software such as McAfee is much harder to fool now than it was 10 years ago. However, there are talented hackers everywhere who are more than capable of writing fully undetectable scripts, and who will continue to do so. Virus protection is very much a cat and mouse game.

9. What is a “Man-in-the-Middle” attack?

A man-in-the-middle attack is one in which the attacker secretly relays and possibly alters the communication between two parties who believe they are directly communicating with each other. One example is active eavesdropping, in which the attacker makes independent connections with the victims and relays messages between them to make them believe they are talking directly to each other over a private connection, when in fact the entire conversation is controlled by the attacker, who even has the ability to modify the content of each message. Often abbreviated to MITM, MitM, or MITMA, and sometimes referred to as a session hijacking attack, it has a strong chance of success if the attacker can impersonate each party to the satisfaction of the other. MITM attacks pose a serious threat to online security because they give the attacker the ability to capture and manipulate sensitive information in real-time while posing as a trusted party during transactions, conversations, and the transfer of data. This is straightforward in many circumstances; for example, an attacker within reception range of an unencrypted WiFi access point, can insert himself as a man-in-the-middle.

This article is from Toptal.

Credit Card Hacks and How To Avoid Them

Preface

If you know me, or have read my previous post, you know that I worked for a very interesting company before joining Toptal. At this company, our payment provider processed transactions in the neighborhood of $500k per day. Part of my job was to make our provider PCI-DSS compliant—that is, compliant with the Payment Card Industry – Data Security Standard.

It’s safe to say that this wasn’t a job for the faint of heart. At this point, I’m pretty intimate with Credit Cards (CCs), Credit Card fraud and web security in general. After all, our job was to protect our users’ data, to prevent it from being hacked, stolen or misused.

You could imagine my surprise when I saw Bennett Haselton’s 2007 article on Slashdot: Why Are CC Numbers Still So Easy to Find?. In short, Haselton was able to find Credit Card numbers through Google, firstly by searching for a card’s first eight digits in “nnnn nnnn” format, and later using some advanced queries built on number ranges. For example, he could use “4060000000000000..4060999999999999” to find all the 16 digit Primary Account Numbers (PANs) from CHASE (whose cards all begin with 4060). By the way: here’s a full list of Issuer ID numbers.

At the time, I didn’t think much of it, as Google immediately began to filter the types of queries that Bennett was using. When you tried to Google a range like that, Google would serve up a page that said something along the lines of “You’re a bad person”.

This is Google’s response to those trying to figure out how to find credit card numbers online.

About six months ago, while I was reminiscing with an old friend, this credit card number hack came to mind again. Soon after, I discovered something alarming. Not terribly alarming, but certainly alarming, so I notified Google and waited. After a month without a response, I notified them again to no avail.

With a minor tweak on Haselton’s old trick, I was able to Google Credit Card numbers, Social Security numbers, and any other sensitive information of interest.

Yesterday, some friends of mine (buhera.blog.hu and _2501) brought a more recent Slashdot post to my attention: Credit Card Numbers Still Google-able.

The article’s author, again Bennett Haselton, who wrote the original article back in 2007, claims that credit card numbers can still be Googled. You can’t use the number range query hack, but it still can be done. Instead of using simple ranges, you need to apply specific formatting to your query. Something like: “1234 5678” (notice the space in the middle). A lot of hits come up for this query, but very few are of actual interest. Among the contestants are phone numbers, zip-codes, and such. Not extremely alarming. But here comes the credit card hack twist.

The “Methodology”

I was curious whether it was still possible to get credit card numbers online the way we could in 2007. Like any good engineer, I usually approach things using a properly construed and intelligent plan that needs to be perfectly executed with the utmost precision. If you have tried that method, you might know that it can fail really hard, in which case your careful planning and effort go to waste.

In IT we have a tendency to over-intellectualize, even when it isn’t exactly warranted. I have seen my friends and colleagues completely break applications using seemingly random inputs. Their success rate was stunning and the effort they put into it was close to zero. That’s when I learned that to open a door, sometimes you just have to knock.

The Credit Card “Hack”

The previous paragraph was a cleverly disguised attempt to make me look like less of an idiot when I show off my “elite hacking skills”. Oops.

First, I tried several range-query-based approaches. Then, I looked at advanced queries and pretty much anything you might come up with in an hour or so. None of them yielded significant results.

And then I had a crazy idea.

What if there was a mismatch between the filtering engine and the actual back-end? What if the message I got from Google (“You are a bad person”) wasn’t from the back-end itself, but instead from a designated filtering engine Google had implemented to censor queries like mine?

It would make a lot of sense from an architectural perspective. And bugs like that are pretty common—we see them in ITSEC all the time, particularly in IDS/IPS solutions, but also in common software. There’s a filtering procedure that processes data and only gives it to the back-end if it thinks the data is acceptable/non-malicious. However, the back-end and the filtering server almost never parse the input in exactly the same way. Thus, a seemingly valid input can go through the filter and wreak havoc on the back-end, effectively bypassing the filter.

You can usually trigger this type of behavior by providing your input in various encodings. For example: instead of using decimal numbers (0-9), how about converting them to hexadecimal or octal or binary? Well, guess what…

Search for this and Google will tell you that you’re a bad person: “4060000000000000..4060999999999999”

Search for this and Google will be happy to oblige: “0xe6c8c69c9c000..0xe6d753e6ecfff”.

The only thing you need to do is to convert credit card numbers from decimal to hexadecimal. That’s it.

The results include…

  • Humongous CSV files filled with potentially sensitive information.

With this simple credit card hack, a major privacy invasion is waiting to happen.

  • Faulty e-commerce log files.

These faulty e-commerce log files make credit card lookup easy.

  • Sensitive information shared on hacker sites (and even Facebook).

How to hack credit cards is as simple as using hexadecimal.

It’s truly scary stuff.

I know this bug won’t inspire any security research, but there you have it. Google made this boo-boo and neglected to even write me back. Well, it happens. I don’t envy the security folks at the big G, though. They must have a lot of stuff to look out for. I’m posting about this credit card number hack here because:

  1. It’s relatively low impact.
  2. Anyone who’s interested and motivated will have figured this out by now.
  3. To quote Haselton, if the big players aren’t taking responsibility and acting on these exploits, then “the right thing to do is to shine a light on the problem and insist that they fix it as soon as possible”.

This trick can be used to look up phone numbers, SSNs, TFNs, and more. And, as Bennett wrote, these numbers are much much harder to change than your Credit Card, for which you can simply call your bank and cancel the card.

Sample Queries

WARNING: Do NOT Google your own credit card number in full!

Look for any CC PAN starting with 4060: 4060000000000000..4060999999999999 → 0xe6c8c69c9c000..0xe6d753e6ecfff

Some Hungarian phone numbers from the provider ‘Telenor’? No problem: 36200000000..36209999999 → 0x86db02a00..0x86e48c07f

Look for SSNs. Thankfully, these don’t return many meaningful results: 100000000..999999999 → 0x5f5e100..0x3b9ac9ff

There are many, many more.

If you find anything very alarming, or if you’re curious about credit card hacking, please leave it in the comments or contact me by email at [email protected] or on Twitter at @synsecblog. Calling the police is usually futile in these cases, but it might be worth a try. The given merchant or the card provider is usually more keen to address the issue.

Where to Go From Here

Well, Google obviously has to fix this, possibly with the help of the big players like Visa and Mastercard. In fact, Haselton provides a number of interesting suggestions in the two articles linked above.

What you need to do, however (and why I’ve written this post), is spread the word. Credit Card fraud is a big industry, and simple awareness can save you from becoming a victim. Further, if you have an e-commerce site or handle any credit card processing, please make sure that you’re secure. PCI-DSS is a good guideline, but it is far from perfect. Plus, it is always a good idea to Google your site with the “site:mysite.com” advanced query, looking for sensitive numbers. There’s a very, very slim chance that you’ll find anything—but if you do, you must act on it immediately.

Also, a bit of friendly advice: You should never give out your credit card information to anyone. My advice would be to use PayPal or a similar service whenever possible.

And a few general tips: don’t download things you didn’t ask for, don’t open spam emails, and remember that your bank will never ask for your password.

By the way: If you think there’s no one stupid enough to fall for these credit card hacking techniques or give away their credit card information on the internet, have a look at @NeedADebitCard.

Stay safe people!

This article was written by Gergely Kalman, a Toptal Security Specialist.

Scaling Scala: How to Dockerize Using Kubernetes

Kubernetes is the new kid on the block, promising to help deploy applications into the cloud and scale them more quickly. Today, when developing for a microservices architecture, it’s pretty standard to choose Scala for creating API servers.

Microservices are replacing classic monolithic back-end servers with multiple independent services that communicate among themselves and have their own processes and resources.

If there is a Scala application in your plans and you want to scale it into a cloud, then you are at the right place. In this article I am going to show step-by-step how to take a generic Scala application and implement Kubernetes with Docker to launch multiple instances of the application. The final result will be a single application deployed as multiple instances, and load balanced by Kubernetes.

All of this will be implemented by simply importing the Kubernetes source kit in your Scala application. Please note, the kit hides a lot of complicated details related to installation and configuration, but it is small enough to be readable and easy to understand if you want to analyze what it does. For simplicity, we will deploy everything on your local machine. However, the same configuration is suitable for a real-world cloud deployment of Kubernetes.

Scale Your Scala Application with Kubernetes

Be smart and sleep tight, scale your Docker with Kubernetes.

What is Kubernetes?

Before going into the gory details of the implementation, let’s discuss what Kubernetes is and why it’s important.

You may have already heard of Docker. In a sense, it is like a lightweight virtual machine.

Docker gives the advantage of deploying each server in an isolated environment, very similar to a stand-alone virtual machine, without the complexity of managing a full-fledged virtual machine.

For these reasons, it is already one of the more widely used tools for deploying applications in the cloud. A Docker image is easy and fast to build and duplicate, much more so than a traditional virtual machine image from VMware, VirtualBox, or Xen.

Kubernetes complements Docker, offering a complete environment for managing dockerized applications. By using Kubernetes, you can easily deploy, configure, orchestrate, manage, and monitor hundreds or even thousands of Docker applications.

Kubernetes is an open source tool developed by Google that has been adopted by many other vendors. It is available natively on the Google Cloud Platform, but other vendors have adopted it for their cloud services too: it can be found on Amazon AWS, Microsoft Azure, Red Hat OpenShift, and other cloud platforms. We can say it is well positioned to become a standard for deploying cloud applications.

Prerequisites

Now that we covered the basics, let’s check if you have all the prerequisite software installed. First of all, you need Docker. If you are using either Windows or Mac, you need the Docker Toolbox. If you are using Linux, you need to install the particular package provided by your distribution or simply follow the official directions.

We are going to code in Scala, which is a JVM language. You need, of course, the Java Development Kit and the Scala build tool SBT installed and available in the global path. If you are already a Scala programmer, chances are you have those tools installed already.

If you are using Windows or Mac, Docker will by default create a virtual machine named default with only 1GB of memory, which can be too small for running Kubernetes. In my experience, I had issues with the default settings. I recommend that you open the VirtualBox GUI, select your default virtual machine, and change the memory to at least 2048MB.
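If you prefer the command line to the GUI, the same change can be made with VirtualBox’s own VBoxManage tool, as in this sketch (assuming the Docker Toolbox machine is named default; it must be stopped while you change its memory):

# stop the VM, raise its memory to 2GB, then start it again
docker-machine stop default
VBoxManage modifyvm default --memory 2048
docker-machine start default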

VirtualBox memory settings

The Application to Clusterize

The instructions in this tutorial can apply to any Scala application or project. For this article to have some “meat” to work on, I chose an example used very often to demonstrate a simple REST microservice in Scala, built with Akka HTTP. I recommend you try to apply the source kit to the suggested example before attempting to use it on your own application. I have tested the kit against the demo application, but I cannot guarantee that there will be no conflicts with your code.

So first, we start by cloning the demo application:

git clone https://github.com/theiterators/akka-http-microservice

Next, test if everything works correctly:

cd akka-http-microservice
sbt run

Then, access http://localhost:9000/ip/8.8.8.8, and you should see something like the following image:

Akka HTTP microservice is running

Adding the Source Kit

Now, we can add the source kit with some Git magic:

git remote add ScalaGoodies https://github.com/sciabarra/ScalaGoodies
git fetch --all
git merge ScalaGoodies/kubernetes

With that, you have the demo including the source kit, and you are ready to try. Or you can even copy and paste the code from there into your application.

Once you have merged or copied the files in your projects, you are ready to start.

Starting Kubernetes

Once you have downloaded the kit, you need to download the necessary kubectl binary by running:

bin/install.sh

This installer is smart enough (hopefully) to download the correct kubectl binary for OSX, Linux, or Windows, depending on your system. Note, the installer worked on the systems I own. Please do report any issues, so that I can fix the kit.

Once you have installed the kubectl binary, you can start the whole Kubernetes in your local Docker. Just run:

bin/start-kube-local.sh

The first time it is run, this command will download the images of the whole Kubernetes stack, and a local registry needed to store your images. It can take some time, so please be patient. Also note, it needs direct access to the internet. If you are behind a proxy, that will be a problem, as the kit does not support proxies. To solve it, you would have to configure tools like Docker, curl, and so on to use the proxy; it is complicated enough that I recommend getting temporary unrestricted access instead.

Assuming you were able to download everything successfully, to check if Kubernetes is running fine, you can type the following command:

bin/kubectl get nodes

The expected answer is:

NAME        STATUS    AGE
127.0.0.1   Ready     2m

Note that age may vary, of course. Also, since starting Kubernetes can take some time, you may have to invoke the command a couple of times before you see the answer. If you do not get errors here, congratulations, you have Kubernetes up and running on your local machine.

Dockerizing Your Scala App

Now that you have Kubernetes up and running, you can deploy your application in it. In the old days, before Docker, you had to deploy an entire server for running your application. With Kubernetes, all you need to do to deploy your application is:

  • Create a Docker image.
  • Push it in a registry from where it can be launched.
  • Launch the instance with Kubernetes, which will take the image from the registry.

Luckily, it is way less complicated than it looks, especially if you are using the SBT build tool, like many do.

In the kit, I included two files containing all the necessary definitions to create an image able to run Scala applications, or at least what is needed to run the Akka HTTP demo. I cannot guarantee that it will work with every other Scala application, but it is a good starting point, and should work for many different configurations. The files to look at for building the Docker image are:

docker.sbt
project/docker.sbt

Let’s have a look at what’s in them. The file project/docker.sbt contains the command to import the sbt-docker plugin:

addSbtPlugin("se.marcuslonnberg" % "sbt-docker" % "1.4.0")

This plugin manages the building of the Docker image with SBT for you. The Docker definition is in the docker.sbt file and looks like this:

imageNames in docker := Seq(ImageName("localhost:5000/akkahttp:latest"))

dockerfile in docker := {
  val jarFile: File = sbt.Keys.`package`.in(Compile, packageBin).value
  val classpath = (managedClasspath in Compile).value
  val mainclass = mainClass.in(Compile, packageBin).value.getOrElse(sys.error("Expected exactly one main class"))
  val jarTarget = s"/app/${jarFile.getName}"
  val classpathString = classpath.files.map("/app/" + _.getName)
    .mkString(":") + ":" + jarTarget
  new Dockerfile {
    from("anapsix/alpine-java:8")
    add(classpath.files, "/app/")
    add(jarFile, jarTarget)
    entryPoint("java", "-cp", classpathString, mainclass)
  }
}

To fully understand the meaning of this file, you need to know Docker well enough to read a Dockerfile. However, we are not going into its details here, because you do not need to understand it thoroughly to build the image.

The beauty of using SBT for building the Docker image is that SBT will take care of collecting all the files for you.

Note the classpath is automatically generated by the following command:

val classpath = (managedClasspath in Compile).value

In general, it is pretty complicated to gather all the JAR files to run an application. Using SBT, the Docker file will be generated with add(classpath.files, "/app/"). This way, SBT collects all the JAR files for you and constructs a Dockerfile to run your application.

The other commands gather the missing pieces to create the Docker image. The image will be built on top of an existing image suited to running Java programs (anapsix/alpine-java:8, available on Docker Hub). The remaining instructions add the files needed to run your application, and finally, by specifying an entry point, we tell Docker how to run it. Note also that the name starts with localhost:5000 on purpose, because localhost:5000 is where I installed the registry in the start-kube-local.sh script.

Building the Docker Image with SBT

To build the Docker image, you can ignore all the details of the Dockerfile. You just need to type:

sbt dockerBuildAndPush

The sbt-docker plugin will then build a Docker image for you, downloading all the necessary pieces from the internet, and then push it to the Docker registry that was started earlier, together with Kubernetes, on localhost. So, all you need to do is wait a little bit more to have your image cooked and ready.

Note, if you experience problems, the best thing to do is to reset everything to a known state by running the following commands:

bin/stop-kube-local.sh
bin/start-kube-local.sh

Those commands should stop all the containers and restart them correctly to get your registry ready to receive the image built and pushed by sbt.

Starting the Service in Kubernetes

Now that the application is packaged in a container and pushed to a registry, we are ready to use it. Kubernetes uses command lines and configuration files to manage the cluster. Since command lines can become very long, and to make the steps easy to replicate, I am using the configuration files here. All the samples in the source kit are in the folder kube.

Our next step is to launch a single instance of the image. A running image is called, in the Kubernetes language, a pod. So let’s create a pod by invoking the following command:

bin/kubectl create -f kube/akkahttp-pod.yml
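I am not reproducing the kit’s kube/akkahttp-pod.yml here, but a minimal pod manifest along these lines would do the job (the label key and value are my own assumption; whatever you pick must match the selector used by the service later on):

apiVersion: v1
kind: Pod
metadata:
  name: akkahttp
  labels:
    app: akkahttp                             # assumed label, matched by the service selector
spec:
  containers:
    - name: akkahttp
      image: localhost:5000/akkahttp:latest   # the image pushed by sbt dockerBuildAndPush
      ports:
        - containerPort: 9000                 # the port Akka HTTP binds to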

You can now inspect the situation with the command:

bin/kubectl get pods

You should see:

NAME                   READY     STATUS    RESTARTS   AGE
akkahttp               1/1       Running   0          33s
k8s-etcd-127.0.0.1     1/1       Running   0          7d
k8s-master-127.0.0.1   4/4       Running   0          7d
k8s-proxy-127.0.0.1    1/1       Running   0          7d

The status can actually differ; for example, it can show “ContainerCreating” for a few seconds before it becomes “Running”. You can also get a status like “Error” if, for example, you forgot to build and push the image beforehand.

You can also check if your pod is running with the command:

bin/kubectl logs akkahttp

You should see an output ending with something like this:

[DEBUG] [05/30/2016 12:19:53.133] [default-akka.actor.default-dispatcher-5] [akka://default/system/IO-TCP/selectors/$a/0] Successfully bound to /0:0:0:0:0:0:0:0:9000

Now you have the service up and running inside the container. However, the service is not yet reachable. This behavior is part of the design of Kubernetes. Your pod is running, but you have to expose it explicitly. Otherwise, the service is meant to be internal.

Creating a Service

Creating a service and checking the result is a matter of executing:

bin/kubectl create -f kube/akkahttp-service.yaml
bin/kubectl get svc

You should see something like this:

NAME               CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
akkahttp-service   10.0.0.54                  9000/TCP   44s
kubernetes         10.0.0.1     <none>        443/TCP    3m

Note that the port can be different. Kubernetes allocated a port for the service and started it. If you are using Linux, you can directly open the browser and type http://10.0.0.54:9000/ip/8.8.8.8 to see the result. If you are using Windows or Mac with Docker Toolbox, the IP is local to the virtual machine that is running Docker, and unfortunately it is still unreachable.

I want to stress here that this is not a problem of Kubernetes, rather it is a limitation of the Docker Toolbox, which in turn depends on the constraints imposed by virtual machines like VirtualBox, which act like a computer within another computer. To overcome this limitation, we need to create a tunnel. To make things easier, I included another script which opens a tunnel on an arbitrary port to reach any service we deployed. You can type the following command:

bin/forward-kube-local.sh akkahttp-service 9000

Note that the tunnel does not run in the background; you have to keep the terminal window open for as long as you need it, and close it when you no longer need the tunnel. While the tunnel is running, you can open http://localhost:9000/ip/8.8.8.8 and finally see the application running in Kubernetes.

Final Touch: Scale

So far we have “simply” put our application in Kubernetes. While it is an exciting achievement, it does not add too much value to our deployment. We’re saved from the effort of uploading and installing on a server and configuring a proxy server for it.

Where Kubernetes shines is in scaling. You can deploy two, ten, or one hundred instances of our application by only changing the number of replicas in the configuration file. So let’s do it.

We are going to stop the single pod and start a deployment instead. So let’s execute the following commands:

bin/kubectl delete -f kube/akkahttp-pod.yml
bin/kubectl create -f kube/akkahttp-deploy.yaml

Next, check the status with bin/kubectl get pods. Again, you may have to try a couple of times, because the deployment can take some time to complete:

NAME                                   READY     STATUS    RESTARTS   AGE
akkahttp-deployment-4229989632-mjp6u   1/1       Running   0          16s
akkahttp-deployment-4229989632-s822x   1/1       Running   0          16s
k8s-etcd-127.0.0.1                     1/1       Running   0          6d
k8s-master-127.0.0.1                   4/4       Running   0          6d
k8s-proxy-127.0.0.1                    1/1       Running   0          6d

Now we have two pods, not one. This is because the configuration file I provided contains the value replicas: 2, and the two different pod names are generated by the system. I am not going into the details of the configuration files, because the scope of the article is simply an introduction for Scala programmers to jump-start into Kubernetes.

Anyhow, there are now two pods active. What is interesting is that the service is the same as before. We configured the service to load balance between all the pods labeled akkahttp. This means we do not have to redeploy the service, but we can replace the single instance with a replicated one.
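To make this concrete, here is a sketch of such a deployment. It is not the kit’s actual kube/akkahttp-deploy.yaml (which, being written for an older Kubernetes, most likely uses an older apiVersion), and the app: akkahttp label is the same assumption as in the pod example above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: akkahttp-deployment
spec:
  replicas: 2                  # the number of pods Kubernetes keeps running
  selector:
    matchLabels:
      app: akkahttp
  template:
    metadata:
      labels:
        app: akkahttp          # the service load balances across pods carrying this label
    spec:
      containers:
        - name: akkahttp
          image: localhost:5000/akkahttp:latest
          ports:
            - containerPort: 9000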

We can verify this by launching the proxy again (if you are on Windows and you have closed it):

bin/forward-kube-local.sh akkahttp-service 9000

Then, we can try to open two terminal windows and see the logs for each pod. For example, in the first type:

bin/kubectl logs -f akkahttp-deployment-4229989632-mjp6u

And in the second type:

bin/kubectl logs -f akkahttp-deployment-4229989632-s822x

Of course, edit the command line accordingly with the values you have in your system.

Now, try to access the service with two different browsers. You should see the requests split between the multiple available servers, like in the following image:

Kubernetes in action

Conclusion

Today we barely scratched the surface. Kubernetes offers a lot more possibilities, including automated scaling and restart, incremental deployments, and volumes. Furthermore, the application we used as an example is very simple, stateless, with the various instances not needing to know about each other. In the real world, distributed applications do need to know about each other, and need to change configurations according to the availability of other servers. Indeed, Kubernetes offers a distributed key-value store (etcd) to allow different applications to communicate with each other when new instances are deployed. However, this example is purposefully small and simplified to help you get going, focusing on the core functionality. If you follow the tutorial, you should be able to get a working environment for your Scala application on your machine without being confused by a large number of details and without getting lost in complexity.

This article was written by Michele Sciabarra, a Toptal Scala developer.

Rethinking Authentication And Biometric Security, The Toptal Way

Toptal is a vast network of tech talent and we currently boast the biggest distributed workforce in the industry. This is a source of pride for many Toptalers, especially our hard-working dev team. Why? Because we make it appear so easy and seamless, and we do it every single day. While a traditional tech company is bound to have a vast infrastructure (loads of office space, servers, standardized equipment, abundant physical and cyber security resources, and so on), we don’t.

We rely on off-the-shelf technology and services. Traditional companies struggle to cope with a small number of BYOD users, but here at Toptal, all our hardware is BYOD. The problem with our platform-agnostic approach and the reliance on a distributed network is self-evident: How can we ensure and maintain security?

It was never easy, but we like a good challenge, and like to stay one step ahead. That’s why we set about designing multiple authentication and onboarding procedures last year. We used the first quarter of 2016 for trials and pilots, and they were encouraging. As a result, we decided to announce the results of our trials and unveil our rollout plans.

By the end of the third quarter, all Toptalers will be acquainted with our new solutions, and if all goes well, they will start using them by the end of the year.

The Challenge

How do we make sure everyone logged onto our network is who they claim they are? Most of our team members have never met in real life, yet they collaborate on a daily basis. What if someone’s security has been compromised? Or, what if a disgruntled member decides to undermine the network?

We settled on a twofold approach to addressing these concerns:

  • Including a set of personal reliability tests to our screening process.
  • Introducing a new layer of biometric security.

What sort of tests will we institute? Our approach was inspired by the Personnel Reliability Program (PRP), created by the U.S. Department of Defense. The program is designed to identify personnel with the highest degree of reliability, taking into account their prior conduct, trustworthiness, behavior, and allegiance. PRP compliance will be evaluated continuously by our newly formed Internal Security Division (ISD), staffed by military intelligence veterans from Israel and Bosnia.

Security starts with personnel. If you can’t trust your people, all the tech in the world won’t make a difference.

Platform access will be limited to individuals who meet stringent PRP criteria; however, failure to meet these standards will not be grounds for termination or demotion. It will merely reflect the individual’s lack of suitability for certain roles, restricting their access to confidential information.

To ensure continuous compliance, every Toptal member will be required to sign a new non-disclosure agreement and undergo evaluation. The agreement will include provisions covering the treatment of confidential information and outline a set of sanctions for individuals in violation of said agreement.

Since we are a distributed network, we will also rely on input from our members. Our existent monthly TopTeam reports will be expanded to include a personal reliability questionnaire. In other words, each network member will be able to report suspect coworkers or behavior via an anonymous evaluation form.

Lt. Col. David Finci, Head of Toptal’s Internal Security Division, explains the decision to include anonymous ‘tips’:

“Our goal is not to encourage dissent and create friction among team members, but we are convinced this is vital to ensuring personal reliability. We must allow network members to scrutinize the professional performance and personal integrity of their coworkers. Otherwise our ability to source actionable, time-sensitive information would be compromised.”

Network members with full PRP clearance will be issued security tokens and one-time pads to ensure encryption should the integrity of our network be compromised. They will also receive ID cards featuring a scannable QR code and/or barcode.

Biometric ID card

Use of these security measures will be mandatory, and loss or theft of ID cards will be taken seriously. Fortunately, these cards will be an interim solution and will be phased out as soon as our new security platform is deemed ready. We expect an early 2017 release.

Biometrics: Imperfect Marriage Of Convenience

We started experimenting with quasi-biometric security last year, quite by accident. After one Toptaler decided to tattoo our logo on their arm, we realized this approach could be employed for QR codes. Nobody wants to carry around yet another card in their wallet, and QR codes are relatively small and so they can be easily tattooed, or even engraved on fingernails.

No, of course we won’t ask developers to tattoo our logo or QR code. Not yet anyway, that’s Phase Two.

You may be wondering whether or not we are serious, and the answer is obviously no. However, Graham’s tattoo gave us a good idea: Why not use biometric technology, backed by off-the-shelf tracking solutions?

We are already moving towards a passwordless future, and Toptal wants to be on the cutting edge. Why burden people with passwords, silly QR codes, two-factor authentication, or security tokens, if we can ensure superior security without any of them?

There have been attempts at this before, using personal technology such as smartphones and fingerprint scanners, but these techniques aren’t bulletproof. (In the case of smartphone fingerprint scanners, they can be beaten by a simple inkjet printer or knife.)

Smartphone fingerprint scanners can be beaten by an inkjet printer, or a frustrated Tim Roth with a meat cleaver.

Besides, using smartphones for authentication opens up a Pandora’s box of other issues.

Bluetooth LE: Rendering Personal Security Bulletproof And Seamless

A lost phone is a recipe for disaster, and with all due respect for all the anti-theft and anti-loss technology out there, much of it doesn’t work well, or requires user input to do its magic. Besides, why rely solely on smartphones when we need to authenticate people on their office hardware?

A lost phone is called a lost phone for a reason, because the user is unaware that it’s lost to begin with. If you wake up and realize you lost your phone last night, it’s too late. That said, if you have a habit of waking up at strange places without your phone, or any recollection of the night before, you should also be on the lookout for kidney theft.

This is where it gets interesting. Security tokens and dongles work, but they’re a pain to carry around, and they have a habit of getting lost at the worst possible moment. That is why we planned for our ID cards to be a temporary measure, only active for 9 months or so. We intend to replace them with inexpensive, wearable Bluetooth devices.

Yes, Toptalers will be required to carry them on their person at all times, but this won’t be a problem. Bluetooth LE is a killer technology, at least in terms of power consumption, and these devices can be secured with relative ease, providing a new layer of authentication (we can’t discuss the details due to NDA restrictions).

We initially tried a number of cheap fitness trackers and anti-loss tags to prove our approach was feasible. It worked, but these off-the-shelf devices were not ideally suited to our needs, so we set about designing our own, which proved to be surprisingly easy.

Enter The Toptal TopBand

We reached out to a number of reputable Chinese OEMs for consultation and technical input. We provided them with the specs, they provided us with their quote and a shipping date. Yes, it was that simple, and yes, we were pleasantly surprised.

We are currently in the process of going through several different Toptal TopBand designs and form factors, as well as working on the software side. These devices will not only interface with your phone and computer as wireless security tokens, they will also track your work and sleep habits.

Why? Because they can. They are based on hardware used in fitness trackers, so we didn’t need to reinvent the wheel and design the hardware from scratch. In fact, it would cost more to remove unnecessary features and sensors than to use off-the-shelf solutions.

Here are the specifications of our initial product:

  • Bluetooth 4.0 chip manufactured by Dialog
  • Accelerometer from ADI
  • 50mAh lithium polymer battery by Sony, 40-day battery life
  • Vibration assembly, three LED UI, notification speaker
  • Dimensions: 8mm x 15mm x 35mm (estimate)
  • Weight: 8g (estimate, without strap or clip-on)

We have not finalized the design yet, so the physical dimensions are just estimates. We are still in the process of deciding whether to use aluminium or polycarbonate for the housing, or a combination of both (we want it to look insanely cool). Either way, the device will be IP67 weather resistant, so you don’t even have to take it off when you hit the shower.

This is why we are convinced the device won’t be a nuisance. It’s tiny, you don’t have to charge it every other day, it can be worn as a standard fitness tracker on the wrist or carried on a keychain, and it can even fit in your wallet (as an added bonus, it can be used to alert users if they misplace their wallet or keys).

Of course, you could just pair it to your computer as wireless security device and forget about these features, but where’s the fun in that?

Here is what the TopBand brings to the table, allowing users to:

  • Secure their hardware by limiting access to our platform if the TopBand is not paired and in range of the device.
  • Locate misplaced phones, or vice versa (use a phone to find the TopBand).
  • Receive notifications, via vibration and audio alarms.
  • Collect physical activity data, which can be used to prevent burnout and keep track of your work habits (when used as a wearable).

The last point may prove controversial, but might be useful in some circumstances. For example, it will allow your team members to know whether or not you are awake and working, and it’s perfect for time tracking. Naturally, Toptal will not collect or use this data without prior consent. It’s there for your convenience; use it to improve your health and boost productivity.

Toptal Pet Project

While we were tinkering with the prototypes, a few Toptalers decided to create a potential spin-off, a pet project of sorts, and when we say “pet project,” we literally mean pet project. A lot of our people are obsessed with their four-legged friends, so they set about using our hardware in ways we did not expect: they turned the TopBand into a pet tracker.

The hardware was ready, so all it took was some tweaked code. We encouraged them to test the device on their pets; the data collected would prove valuable if only to ensure that unethical developers couldn’t cheat the system by mounting the TopBand on their cat and telling everyone they are at home, hard at work.

Pet-specific functionality is still being tested, but the results are encouraging. For the time being, the devices monitor basic activity, check whether or not your pet is asleep, and vibrate if your pet strays out of range. It sounds a bit more humane than those nasty electric shock collars, doesn’t it?

It may sound weird, but there is nothing to worry about. We are assured pets will love our Bluetooth implants. And so will our developers.

Since cats and dogs come in all shapes and sizes, the biggest problem is sensor calibration, which the team is working on. The device was tested on a few cats, including a morbidly obese Italian feline, and dogs ranging from Jack Russells to Akita Inus.

Beyond that, we cannot reveal many details, and here is why: our developers have turned their pet project into a serious endeavour. They approached a few potential investors and secured funding for a limited commercial rollout (also scheduled for 2017), but this is just the first step towards a full pet product line.

Our team is already working on the next generation pet tracker, based on proprietary hardware, with wireless charging and the ability to be used as a subdermal implant.

Sounds Geeky, But Your Pets Will Love It

Subdermal implants have a bad reputation, but most of it is unjustified and peddled by conspiracy cranks. If you ask any pet professional, they will tell you that animals larger than a rat don’t even notice them, and in fact, they tend to be safer and more comfortable than most smart collars. Microchipping is already a widely supported practice globally to minimize stray pet populations; this just takes it one step further.

Until now, subdermal implants were limited to rudimentary RFID functionality and this limited their appeal. This isn’t a swipe at RFID tech; a lot of legit companies are working on RFID implants, and Dangerous Things is one startup that stands out in terms of innovation.

However, Qi wireless charging assemblies are getting smaller and cheaper with each new generation. This, obviously, allows engineers to design feature-packed implants because they can afford to use more battery power for sensors and always-on Bluetooth connectivity.

Unfortunately, we are still not there, and the first prototypes won’t be ready until 2018 at the earliest. Our hardware partners also informed us they won’t be able to conduct animal trials in mainland China, due to the country’s strict and inflexible animal rights legislation.

See? Does that look like one happy pussycat or what?

Therefore, the devices will be tested in Cambodia. We were assured the research would be ethical, so there’s nothing to worry about. Our team is eager to try out the implants on their own pets, and they wouldn’t dream of doing anything that would put their furry bundles of joy at risk.

If It’s Good Enough For My Dog…

This is where we hit a minor snag. Thanks to Toptal’s gung-ho culture, two of our team members volunteered to have the implants tested on them, not just their pets. While it’s still too early for human trials, it goes to show that people might not mind using subdermal implants, provided they can trust the technology. Since these individuals played a pivotal role in the development of our TopBand, they are eager to prove the concept. We are told it sort of gets under your skin after a while.

We need human subjects to test wireless charging and a few other features, as attempts to train cats and dogs to sit in one place for hours are unlikely to work. We settled on an alternative approach for the pilot stage, whereby the animals could still move around and recharge their implants, but this involves strapping a big powerbank and Qi charger mat to the animals. As an interim solution, we plan to make good use of ‘cat condo’ cages and catnip to prove the concept over the course of a few hours.

Don’t worry, Big Brother won’t be watching you. Since we are a distributed network, everyone will be watching you!

Human trials are still a long way off, and they require more planning and regulatory oversight. While this approach works for new drugs, we don’t have the time or resources necessary for clinical trials. However, our volunteers agreed to sign a waiver and have the implants installed anyway. Since this could create legal issues in the EU or US, they managed to find a small, Brazilian plastic surgery clinic willing to do the job. The clinic also offered a generous discount on gynecomastia procedures.

Toptal is looking for more volunteers, and there is no doubt in my mind that we will find them. After all, Google managed to find thousands of people eager to pay $1,500 for a useless wearable, only to stop development months later, and they still called it a success! These brave Explorers didn’t even mind being called Glassholes by the interwebs.

As one Toptal volunteer put it:

“I’d rather have an implant the size of an avocado in my groin, than Google Glass on my face!”

Note: No cats were harmed in the making of this post.

This article was written by Nermin Hajdarbegovic, a Toptal editor.

Getting Started with Docker: Simplifying Devops

If you like whales, or are simply interested in quick and painless continuous delivery of your software to production, then I invite you to read this introductory Docker Tutorial. Everything seems to indicate that software containers are the future of IT, so let’s go for a quick dip with the container whales Moby Dock and Molly.

Docker, represented by a logo with a friendly looking whale, is an open source project that facilitates deployment of applications inside of software containers. Its basic functionality is enabled by resource isolation features of the Linux kernel, but it provides a user-friendly API on top of them. The first version was released in 2013, and it has since become extremely popular and is being widely used by many big players such as eBay, Spotify, Baidu, and more. In its latest funding round, Docker landed a huge $95 million.

Transporting Goods Analogy

The philosophy behind Docker could be illustrated with the following simple analogy. In the international transportation industry, goods have to be transported by different means like forklifts, trucks, trains, cranes, and ships. These goods come in different shapes and sizes and have different storage requirements: sacks of sugar, milk cans, plants, etc. Historically, it was a painful process that depended on manual intervention at every transit point for loading and unloading.

It all changed with the uptake of intermodal containers. As they come in standard sizes and are manufactured with transportation in mind, all the relevant machinery can be designed to handle them with minimal human intervention. The additional benefit of sealed containers is that they can preserve the internal environment, like temperature and humidity, for sensitive goods. As a result, the transportation industry can stop worrying about the goods themselves and focus on getting them from A to B.

And here is where Docker comes in and brings similar benefits to the software industry.

How Is It Different from Virtual Machines?

At a quick glance, virtual machines and Docker containers may seem alike. However, their main differences will become apparent when you take a look at the following diagram:

Applications running in virtual machines, apart from the hypervisor, require a full instance of the operating system and any supporting libraries. Containers, on the other hand, share the operating system with the host. The hypervisor is comparable to the container engine (represented as Docker in the image) in the sense that it manages the lifecycle of the containers. The important difference is that the processes running inside the containers are just like the native processes on the host, and do not introduce any overhead associated with hypervisor execution. Additionally, applications can reuse the libraries and share data between containers.

As both technologies have different strengths, it is common to find systems combining virtual machines and containers. A perfect example is a tool named Boot2Docker described in the Docker installation section.

Docker Architecture

Docker Architecture

At the top of the architecture diagram there are registries. By default, the main registry is the Docker Hub which hosts public and official images. Organizations can also host their private registries if they desire.

On the right-hand side we have images and containers. Images can be downloaded from registries explicitly (docker pull imageName) or implicitly when starting a container. Once the image is downloaded it is cached locally.

Containers are the instances of images – they are the living, running things. There can be multiple containers running based on the same image.

At the centre, there is the Docker daemon, responsible for creating, running, and monitoring containers. It also takes care of building and storing images. Finally, on the left-hand side there is the Docker client. It talks to the daemon via an HTTP-based API; a Unix socket is used when both are on the same machine, but remote management over the network is also possible.
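As a small illustration of that last point, the client can be pointed at a remote daemon with the -H flag. This is only a sketch, assuming a daemon that has been explicitly configured to listen on TCP port 2375 (that is not the default, for good security reasons):

# list containers on a remote Docker host instead of the local one
docker -H tcp://remote-host:2375 ps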

Installing Docker

For the latest instructions you should always refer to the official documentation.

Docker runs natively on Linux, so depending on the target distribution it could be as easy as sudo apt-get install docker.io. Refer to the documentation for details. Normally in Linux, you prepend the Docker commands with sudo, but we will skip it in this article for clarity.

As the Docker daemon uses Linux-specific kernel features, it isn’t possible to run Docker natively in Mac OS or Windows. Instead, you should install an application called Boot2Docker. The application consists of a VirtualBox Virtual Machine, Docker itself, and the Boot2Docker management utilities. You can follow the official installation instructions for MacOS and Windows to install Docker on these platforms.

Using Docker

Let us begin this section with a quick example:

docker run phusion/baseimage echo "Hello Moby Dock. Hello Molly."

We should see this output:

Hello Moby Dock. Hello Molly.

However, a lot more has happened behind the scenes than you may think:

  • The image ‘phusion/baseimage’ was downloaded from Docker Hub (if it wasn’t already in the local cache)
  • A container based on this image was started
  • The command echo was executed within the container
  • The container was stopped when the command exited

On the first run, you may notice a delay before the text is printed on screen. If the image had been cached locally, everything would have taken a fraction of a second. Details about the last container can be retrieved by running docker ps -l:

CONTAINER ID		IMAGE					COMMAND				CREATED			STATUS				PORTS	NAMES
af14bec37930		phusion/baseimage:latest		"echo 'Hello Moby Do		2 minutes ago		Exited (0) 3 seconds ago		stoic_bardeen 

Taking the Next Dive

As you can tell, running a simple command within Docker is as easy as running it directly on a standard terminal. To illustrate a more practical use case, throughout the remainder of this article, we will see how we can utilize Docker to deploy a simple web server application. To keep things simple, we will write a Java program that handles HTTP GET requests to ‘/ping’ and responds with the string ‘pong\n’.

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

public class PingPong {

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/ping", new MyHandler());
        server.setExecutor(null);
        server.start();
    }

    static class MyHandler implements HttpHandler {
        @Override
        public void handle(HttpExchange t) throws IOException {
            String response = "pong\n";
            t.sendResponseHeaders(200, response.length());
            OutputStream os = t.getResponseBody();
            os.write(response.getBytes());
            os.close();
        }
    }
}

Dockerfile

Before jumping in and building your own Docker image, it’s a good practice to first check if there is an existing one in the Docker Hub or any private registries you have access to. For example, instead of installing Java ourselves, we will use an official image: java:8.

To build an image, first we need to decide on a base image we are going to use. It is denoted by the FROM instruction. Here, it is an official image for Java 8 from the Docker Hub. We are going to copy our Java file into the image by issuing a COPY instruction. Next, we are going to compile it with RUN. The EXPOSE instruction denotes that the image will be providing a service on a particular port. ENTRYPOINT is the instruction we want to execute when a container based on this image is started, and CMD indicates the default parameters we are going to pass to it.

FROM java:8
COPY PingPong.java /
RUN javac PingPong.java
EXPOSE 8080
ENTRYPOINT ["java"]
CMD ["PingPong"]

After saving these instructions in a file called “Dockerfile”, we can build the corresponding Docker image by executing:

docker build -t toptal/pingpong .

The official documentation for Docker has a section dedicated to best practices regarding writing Dockerfile.

Running Containers

When the image has been built, we can bring it to life as a container. There are several ways we could run containers, but let’s start with a simple one:

docker run -d -p 8080:8080 toptal/pingpong

where -p [port-on-the-host]:[port-in-the-container] denotes the ports mapping on the host and the container respectively. Furthermore, we are telling Docker to run the container as a daemon process in the background by specifying -d. You can test if the web server application is running by attempting to access ‘http://localhost:8080/ping’. Note that on platforms where Boot2docker is being used, you will need to replace ‘localhost’ with the IP address of the virtual machine where Docker is running.

On Linux:

curl http://localhost:8080/ping

On platforms requiring Boot2Docker:

curl $(boot2docker ip):8080/ping

If all goes well, you should see the response:

pong

Hurray, our first custom Docker container is alive and swimming! We could also start the container in an interactive mode -i -t. In our case, we will override the entrypoint command so we are presented with a bash terminal. Now we can execute whatever commands we want, but exiting the container will stop it:

docker run -i -t --entrypoint="bash" toptal/pingpong

There are many more options available for starting up containers. Let us cover a few more. For example, if we want to persist data outside of the container, we could share the host filesystem with the container by using -v. By default, the access mode is read-write, but it can be changed to read-only by appending :ro to the intra-container volume path. Volumes are particularly important when we need to use security information like credentials and private keys inside containers, as these shouldn’t be stored in the image. Additionally, volumes can prevent the duplication of data, for example by mapping your local Maven repository into the container to save you from downloading the Internet twice.

Docker also has the capability of linking containers together. Linked containers can talk to each other even if none of their ports are exposed. This is achieved with --link other-container-name. Below is an example combining the parameters mentioned above:

docker run -p 9999:8080 \
    --link otherContainerA --link otherContainerB \
    -v /Users/$USER/.m2/repository:/home/user/.m2/repository \
    toptal/pingpong

Other Container and Image Operations

Unsurprisingly, the list of operations that one could apply to the containers and images is rather long. For brevity, let us look at just a few of them:

  • stop – Stops a running container.
  • start – Starts a stopped container.
  • commit – Creates a new image from a container’s changes.
  • rm – Removes one or more containers.
  • rmi – Removes one or more images.
  • ps – Lists containers.
  • images – Lists images.
  • exec – Runs a command in a running container.

The last command could be particularly useful for debugging purposes, as it lets you connect to a terminal of a running container:

docker exec -i -t <container-id> bash

Docker Compose for the Microservice World

If you have more than just a couple of interconnected containers, it makes sense to use a tool like docker-compose. In a configuration file, you describe how to start the containers and how they should be linked with each other. Irrespective of the number of containers involved and their dependencies, you could have all of them up and running with one command: docker-compose up.
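To give you a taste, here is a hypothetical docker-compose.yml for our pingpong image in the classic compose syntax (the service name is an assumption, not something shipped with this article):

pingpong:
  image: toptal/pingpong
  ports:
    - "8080:8080"
  # links:
  #   - some-other-container   # linking works just like --link on the command line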

Docker in the Wild

Let’s look at three stages of project lifecycle and see how our friendly whale could be of help.

Development

Docker helps you keep your local development environment clean. Instead of having multiple versions of different services installed such as Java, Kafka, Spark, Cassandra, etc., you can just start and stop a required container when necessary. You can take things a step further and run multiple software stacks side by side avoiding the mix-up of dependency versions.

With Docker, you can save time, effort, and money. If your project is very complex to set up, “dockerise” it. Go through the pain of creating a Docker image once, and from this point everyone can just start a container in a snap.

You can also have an “integration environment” running locally (or on CI) and replace stubs with real services running in Docker containers.

Testing / Continuous Integration

With Dockerfile, it is easy to achieve reproducible builds. Jenkins or other CI solutions can be configured to create a Docker image for every build. You could store some or all images in a private Docker registry for future reference.

With Docker, you only test what needs to be tested and take the environment out of the equation. Performing tests on a running container can help keep things much more predictable.

Another interesting feature of having software containers is that it is easy to spin up additional machines with an identical development setup. It can be particularly useful for load testing of clustered deployments.

Production

Docker can be a common interface between developers and operations personnel, eliminating a source of friction. It also encourages the same image/binaries to be used at every step of the pipeline. Moreover, being able to deploy a fully tested container without environment differences helps ensure that no errors are introduced in the build process.

You can seamlessly migrate applications into production. Something that was once a tedious and flaky process can now be as simple as:

docker stop container-id; docker run new-image

And if something goes wrong when deploying a new version, you can always quickly roll back or switch to another container:

docker stop container-id; docker start other-container-id

… guaranteed not to leave any mess behind or leave things in an inconsistent state.

Summary

A good summary of what Docker does is included in its very own motto: Build, Ship, Run.

  • Build – Docker allows you to compose your application from microservices, without worrying about inconsistencies between development and production environments, and without locking into any platform or language.
  • Ship – Docker lets you design the entire cycle of application development, testing, and distribution, and manage it with a consistent user interface.
  • Run – Docker offers you the ability to deploy scalable services securely and reliably on a wide variety of platforms.

Have fun swimming with the whales!

Part of this work is inspired by the excellent book Using Docker by Adrian Mouat.

This article was written by Radek Ostrowski, a Toptal Java developer.

Developing for the Cloud in the Cloud: BigData Development with Docker in AWS

Why you may need it

I am a developer, and I work daily in Integrated Development Environments (IDE), such as Intellij IDEA or Eclipse. These IDEs are desktop applications. Since the advent of Google Documents, I have seen more and more people moving their work from desktop versions of Word or Excel to the cloud using an online equivalent of a word processor or a spreadsheet application.

There are obvious reasons for using the cloud to keep your work. Today, compared to traditional desktop business applications, some web applications have no significant disadvantage in functionality. The content is available wherever there is a web browser, and these days, that’s almost everywhere. Collaboration and sharing are easier, and losing files is less likely.

Unfortunately, these cloud advantages are not as common in the world of software development as they are for business applications. There are some attempts to provide an online IDE, but they are nowhere close to traditional IDEs.

That is a paradox; while we are still bound to our desktops for daily coding, the software itself is now spawned across multiple servers. Developers need to work with resources they can no longer keep on their own computers. Indeed, laptops are no longer increasing their processing power; having more than 16GB of RAM on a laptop is rare and expensive, and newer devices, tablets for example, have even less.

However, even if it is not yet possible to replace classic desktop applications for software development, it is possible to move your entire development desktop to the cloud. The day I realized it was no longer necessary to have all my software on my laptop, and noticed the availability of web versions of terminals and VNC, I moved everything to the cloud. Eventually, I developed a build kit for creating that environment in an automated way.

Developer in the cloud

What is the cloud about for a developer? Developing in it, of course!

In this article I present a set of scripts to build a cloud-based development environment for Scala and big data applications, running with Docker in Amazon AWS, and comprising a web-accessible desktop with the IntelliJ IDE, Spark, Hadoop, and Zeppelin as services, plus command line tools like web-based SSH, SBT, and Ammonite. The kit is freely available on GitHub, and I describe here the procedure for using it to build your own instance. You can build your environment and customize it to your particular needs. It should not take you more than 10 minutes to have it up and running.

What is in the “BigDataDevKit”?

My primary goal in developing the kit was that my development environment should be something I can simply fire up, with all the services and servers I work with, and then destroy them when they are no longer needed. This is especially important when you work on different projects, some of them involving a large number of servers and services, as when you work on big data projects.

My ideal cloud-based environment should:

  • Include all the usual development tools, most importantly a graphical IDE.
  • Have the servers and services I need at my fingertips.
  • Be easy and fast to create from scratch, and expandable to add more services.
  • Be entirely accessible using only a web browser.
  • Optionally, allow access with specialized clients (VNC client and SSH client).

Leveraging modern cloud infrastructure and software, the power of modern browsers, a widespread availability of broadband, and the invaluable Docker, I created a development environment for Scala and big data development that, for the better, replaced my development laptop.

Currently, I can work at any time, either from a MacBook Pro, a Surface Tablet, or even an iPad (with a keyboard), although admittedly the last option is not ideal. All these devices are merely clients; the desktop and all the servers are in the cloud.

Docker and Amazon AWS!

My current environment is built using following online services:

  • Amazon Web Services for the servers.
  • GitHub for storing the code.
  • Dropbox to save files.

I also use a couple of free services, like DuckDNS for dynamic IP addresses and Let’s Encrypt to get a free SSL certificate.

In this environment, I currently have:

  • A graphical desktop with IntelliJ IDEA, accessible via a web browser.
  • Web accessible command line tools like SBT and Ammonite.
  • Hadoop for storing files and running MapReduce jobs.
  • Spark Job Server for scheduled jobs.
  • Zeppelin for a web-based notebook.

Most importantly, the web access is fully encrypted with HTTPS, for both web-based VNC and SSH, and there are multiple safeguards to avoid losing data, a concern that is, of course, important when you do not “own” the content on your physical hard disk. Note that getting a copy of all your work on your computer is automatic and very fast. If you lose everything because someone stole your password, you have a copy on your computer anyway, as long as you configured everything correctly.

Using a Web Based Development Environment with AWS and Docker

Now, let’s start describing how the environment works. When I start work in the morning, the first thing I do is log into the Amazon Web Services console, where I see all my instances. Usually, I have many development instances configured for different projects, and I keep the unused ones turned off to save on billing. After all, I can only work on one project at a time. (Well, sometimes I work on two.)

Screen 1

So, I select the instance I want, start it, and wait a little or go grab a cup of coffee. It’s not so different from turning on your computer. It usually takes a few seconds for the instance to be up and running. Once I see the green icon, I open a browser and go to a well known URL: https://msciab.duckdns.org/vnc.html. Note, this is my URL; when you create your kit, you will create your own unique URL.

Since AWS assigns a new IP to each machine when you start it, I configured a dynamic DNS service, so you can always use the same URL to access your server, even if you stop and restart it. You can even bookmark it in your browser. Furthermore, I use HTTPS, with valid keys, to get total protection of my work from sniffers, in case I need to manage passwords and other sensitive data.
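For reference, a DuckDNS update boils down to a single HTTP call like the sketch below, with placeholder values for the subdomain and token (the kit may automate this differently, for example from a boot script):

# point YOURSUBDOMAIN.duckdns.org at the caller's current public IP
curl "https://www.duckdns.org/update?domains=YOURSUBDOMAIN&token=YOURTOKEN&ip="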

Screen 2

Once loaded, the system will welcome you with a web-based VNC client, noVNC. Simply log in and a desktop appears. I use a minimal desktop, intentionally: just a menu with applications, and my only luxury is a virtual desktop (since I open a lot of windows when I develop). For mail, I still rely on other applications, nowadays mostly other browser tabs.

In the virtual machine, I have what I need to develop big data applications. First and foremost, there is an IDE. In the build, I use the IntelliJ IDEA Community Edition. Also, there is the SBT build tool and a Scala REPL, Ammonite.

Screen 3

The key features of this environment, however, are services deployed as containers in the same virtual machine. In particular, I have:

  • Zeppelin, the web notebook for running Scala code on the fly and doing data analysis (http://zeppelin:8080).
  • The Spark Job Server, to deploy and execute Spark jobs through a REST interface (http://sparkjobserver:8080).
  • An instance of Hadoop for storing and retrieving data from HDFS (http://hadoop:50070).

Note that these URLs are fixed, but they are only accessible from within the virtual environment. You can see their web interfaces in the following screenshot.

Screen 4

Each service runs in a separate Docker container. Without getting too technical, you can think of this as three separate servers inside your virtual machine. The beauty of using Docker is that you can add services, and even spread them across two or three virtual machines; using Amazon’s container support, you can scale the environment easily.

Last, but not least, you have a web terminal available. Simply access your URL with HTTPS and you will be welcomed with a terminal in a web page.

Screen 5

In the screenshot above, you can see me list the containers: the three servers plus the desktop. This command-line shell gives you access to the virtual machine hosting the containers, allowing you to manage them. It’s as if your servers are “in the Matrix” (virtualized within containers), but this shell gives you an escape hatch outside the “Matrix” to manage the servers and the desktop. From here, you can restart the containers, access their filesystems, and perform any other manipulation Docker allows. I will not discuss Docker in detail here, but there is a vast amount of documentation on the Docker website.

How to set up your instance

Do you like what you have seen so far and want your own instance? It is easy and cheap. You can get it for little more than the cost of the virtual machine on Amazon Web Services, plus the storage. The kit in its current configuration requires 4GB of RAM to run all the services. If you are careful to use the virtual machine only when you need it, and you work, say, 160 hours a month, a virtual machine at current rates will cost 160 x $0.052, or about $8 per month. You then have to add the cost of storage; I use around 30GB, but everything altogether can be kept under $10 a month.

However, this does not include the cost of an (optional) Dropbox Pro account, should you want to back up more than 2GB of code. That costs another $15 per month, but it provides important safety for your data. You will also need a private repository, either a paid GitHub plan or another service, such as Bitbucket, which offers free private repositories.

I want to stress that if you use it only when you need it, it is cheaper than a dedicated server. Yes, everything mentioned here can be set up on a physical server, but since I work with big data I need a lot of other AWS services, so I think it is logical to have everything in the same place.

Let’s see how to do the whole setup.

Prerequisites

Before starting to build a virtual machine, you need to register with the following four services:

  • Amazon Web Services
  • DuckDNS
  • Dropbox
  • Let’s Encrypt

The only one you need your credit card for is Amazon Web Services. DuckDNS is entirely free, while Dropbox gives you 2GB of free storage, which is enough for many tasks. Let’s Encrypt is also free, and it is used internally when you build the image to sign your certificate. Besides these, I recommend a repository hosting service too, like GitHub or Bitbucket, if you want to store your code; however, it is not required for the setup.

To start, navigate to the GitHub BigDataDevKit repository.

Screen 6

Scroll the page and copy the script shown in the image into your text editor of choice:

Screen 7

This script is needed to bootstrap the image. You have to edit it and provide values for its parameters, carefully changing only the text within the quotes. Note that you cannot use characters like the quote itself, the backslash, or the dollar sign in the password unless you escape them. This restriction is relevant only for the password, so if you want to play it safe, simply avoid quotes, dollar signs, and backslashes.

The PASSWORD parameter is the password you choose for accessing the virtual machine through the web interface. The EMAIL parameter is your email address, which is used when registering the SSL certificate; providing it is the only requirement for getting a free SSL certificate from Let’s Encrypt.

To get the values for TOKEN and HOST, go to the DuckDNS site and log in. You will need to choose an unused hostname.

Screen 8

Look at the image to see where you have to copy the token and where you have to add your hostname. You must click on the “add domain” button to reserve the hostname.

Configuring your instance

Assuming you have all the parameters and have edited the script, you are ready to launch your instance. Log in to the Amazon Web Services management interface, go to the EC2 Instances panel and click on “Launch Instance”.

Screen 9

On the first screen, you will choose an image. The script is built around Amazon Linux, so there are no other options available: select Amazon Linux, the first option in the Quick Start list.

Screen 10

On the second screen, choose the instance type. Given how much software is running, with multiple services you need at least 4GB of memory, so I recommend the t2.medium instance. You could trim it down to a t2.small if you shut down some services, or even a t2.micro if you only want the desktop.

Screen 11

On the third screen, click “Advanced Details” and paste the script you configured in the previous step. I also recommend enabling protection against termination, so that an accidental termination won’t wipe out all your work.

Screen 12

The next step is to configure the storage. The default for an instance is 8GB, which is not enough to hold all the images we will build; I recommend increasing it to 20GB. Also, while it is not strictly needed, I suggest adding another block device of at least 10GB. The script will mount the second block device as a data folder. You can make a snapshot of its contents, terminate the instance, then recreate it from the snapshot and recover all your work. Furthermore, a custom block device is not deleted when you terminate the instance, so you have double protection against accidental loss of your data. To increase safety even further, you can back up your data automatically with Dropbox.

Screen 13

The fifth step is naming the instance; pick whatever you like. The sixth step lets you configure the firewall. By default only SSH is open, but we also need HTTPS, so do not forget to add a rule opening HTTPS as well. You could open HTTPS to the world, but it is better to restrict it to your own IP address, to prevent others from reaching your desktop and shell, even though both are still protected by a password.

Once done with this last configuration, you can launch the instance. The initialization can take quite a few minutes the first time, since the bootstrap script is running and performs some lengthy tasks, such as generating an HTTPS certificate with Let’s Encrypt.

Screen 14

When the management console eventually shows the instance as “running”, and no longer “initializing”, you are ready to go.

Assuming all the parameters are correct, you can navigate to https://YOURHOST.duckdns.org.

Replace YOURHOST with the hostname you chose, and do not forget it is an HTTPS site, not HTTP, so you must write https:// in the URL; your connection to the server is encrypted. The site will also present a valid certificate from Let’s Encrypt. If there is a problem getting the certificate, the initialization script will generate a self-signed certificate instead. You will still be able to connect over an encrypted connection, but the browser will warn you that it is an unknown site and the connection is insecure. It should not happen, but you never know.

Screen 15

Assuming everything is working, you can then access the web terminal, Butterfly. Log in with the user app and the password you set in the setup script.

Once logged in, you have a bootstrapped virtual machine that also includes Docker and other goodies, such as an Nginx frontend, Git, and the Butterfly web terminal. Now you can complete the setup by building the Docker images for your development environment.

Next, type the following commands:

git clone https://github.com/sciabarra/BigDataDevKit
cd BigDataDevKit
sh build.sh

The last command will also ask you to type a password for desktop access. Once done, it will start building the images. Note that the build takes about 10 minutes, but you can see what is happening because everything is shown on the screen.

Once the build is complete, you can also install Dropbox with the following command:

/app/.dropbox-dist/dropboxd

The system will show a link you must click to enable Dropbox. You need to log into Dropbox and then you are done. Whatever you put in the Dropbox folder is automatically synced between all your Dropbox instances.

Once done, you can restart the virtual machine and access your environment at the https://YOURHOST.duckdns.org/vnc.html URL.

You can stop your machine and restart it when you resume work; the access URL stays the same. This way, you pay only for the time you are actually using it, plus a monthly charge for the storage you use.

Preserving your data

The following discussion requires some knowledge of how Docker and Amazon work. If you do not want to dig into the details, just keep in mind this simple rule: in the virtual machine there is an /app/Dropbox folder; whatever you place in /app/Dropbox is preserved, and everything else is disposable and can go away. To improve safety further, also store your precious code in a version control system.

Now, if you do want to understand the details, read on. If you followed my directions when creating the virtual machine, it is protected from termination, so you cannot destroy it accidentally. If you expressly decide to terminate it, the primary volume will be destroyed, and all the Docker images will be lost, including all the changes you made.

However, since the folder /app/Dropbox is mounted as a Docker volume for the containers, it is not part of the Docker images. In the virtual machine, the folder /app is mounted on the Amazon volume you created, which is not destroyed even when you expressly terminate the virtual machine; to remove that volume, you have to delete it explicitly.

Do not confuse Docker volumes, which are a logical Docker entity, with Amazon volumes, which are a somewhat physical entity. What happens is that the /app/Dropbox Docker volume is placed inside the /app Amazon volume.

The Amazon Volume is not automatically destroyed when you terminate the virtual machine, so whatever is placed in it will be preserved, until you also expressly destroy the volume. Furthermore, whatever you put in the Docker volume is stored outside of the container, so it is not destroyed when the container is destroyed. If you enabled Dropbox, as recommended, all your content is copied to the Dropbox servers, and to your hard disk if you sync Dropbox with your computer(s). Also, it is recommended that the source code be stored in a version control system.

So, if you keep your work in a version control system under the Dropbox folder, then to lose your data all of the following must happen:

  • You expressly terminate your virtual machine.
  • You expressly remove the data volume from the virtual machine.
  • You expressly remove the data from Dropbox, including the history.
  • You expressly remove the data from the version control system.

I hope your data is safe enough.

I keep a virtual machine for each project, and when I finish, I keep the unused virtual machines turned off. Of course, I have all my code on GitHub and backed up in Dropbox. Furthermore, when I stop working on a project, I take a snapshot of the Amazon Web Services block device before removing the virtual machine entirely. This way, whenever a project resumes, for example for maintenance, all I need to do is start a new virtual machine using the snapshot. All my data goes back in place, and I can resume working.

Optimizing access

First, if you have direct internet access, not mediated by a proxy, you can use native SSH and VNC clients. Direct SSH access is important if you need to copy files in and out of the virtual machine. However, for file sharing, you should consider Dropbox as a simpler alternative.

The VNC web access is invaluable, but it can sometimes be slower than a native client. You can reach the VNC server on the virtual machine on port 5900, but you must open it explicitly because it is closed by default. I recommend opening it only to your own IP address, because the internet is full of bots scanning for services to hook into, and VNC is a frequent target.

Conclusion

This article explains how you can leverage modern cloud technology to implement an effective development environment. While a machine in the cloud cannot be a complete replacement for your working computer or a laptop, it is good enough for doing development work when it is important to have access to the IDE. In my experience, with current internet connections, it is fast enough to work with.

Because the servers are in the cloud, accessing and manipulating them is faster than managing them locally. You can quickly increase (or decrease) memory, fire up another environment, create an image, and so on. You have a datacenter at your fingertips, and when you work on big data projects, you need robust services and lots of space. That is what the cloud provides.

The original article was written by MICHELE SCIABARRA – FREELANCE SOFTWARE ENGINEER @ TOPTAL and can be read here.

If you’d like to learn more about Toptal designers or hire one, check this out.

Home Smart Home: Domesticating the Internet of Things

The following article is a guest post from Toptal. Toptal is an elite network of freelancers that enables businesses to connect with the top 3% of software engineers and designers in the world.

The smart home technology boom is upon us. Despite lucrative projections for the market and ever-increasing numbers of connected devices, we have yet to witness much social impact from consumer adoption in the home. As a potential tipping point looms, there are several debates surrounding privacy, integration and other technical issues. Yet there seems to be less speculation about why consumers still haven’t bought into the hype, or about how domestic life has improved. Considering how personal the home is, should it be concerning that those advertising these products discuss quality of life less than data, energy and ‘security’? Is the adoption of the Internet of Things into our homes inevitable, or is it already here?

Smart Home Interface

Somewhere in the Near Future

The smart person returns to their certified ‘Internet of Things’ smart home after a long day at work. The smart security system senses that the smart person is alone and initiates the ‘Friday Night In’ sequence. Inside, an intercom with a standardized motherly voice suggests that the smart person might want to order in tonight. The smart person unloads their things in the kitchen, where the smart stove displays a selection of take-out rather than its default recipe guide. Following the arrival of the food, the smart person retreats to the living room to wind down and watch some TV in their underwear. The smart TV prepares a selection of Netflix marathons categorized by mood. The smart person chooses: ‘Looking to be cheered up? Comedy Playlist’. Before starting the show, the smart person reviews a set of graphs that display the data from activity and diet throughout the day. A list of tips for smart living is generated at the bottom, one of which reads that, based on the number of consecutive nights the smart person has spent alone, they might consider investigating a selection of popular dating sites instead of watching TV tonight. At the slip of a thumb the smart person OKs the request and instantly a set of profiles is displayed, each chosen from a generated list of the smart person’s tracked preferences. Suddenly, a flurry of pings and messages from other stay-at-home hopefuls fills the screen. The smart home intercom repeats aloud ‘You’ve got mail!’. The smart person fumbles for the remote and, oops, the TV snaps a selfie in response to the flood of pings. Their image, sitting in their underwear eating noodles, appears briefly on the screen before being whisked off into the ether. The flood of messages doubles, only to freeze the system, causing the smart home to reboot. The house goes dark. The now blank screen of the smart TV reflects the image of the smart person again, finally alone.

Home Smart Home

With all the debate and headlines regarding the Internet of Things, the number of connected devices, and market valuations, is there anyone left asking what will happen to the home once Smart Homes take over? The keeping of a home is one of the oldest traditions, if not the oldest, that we have as humans. Does the Smart Home mean the end of the home as we know it?

The home is the original place where we build our identity and mark our place in the world – the original profile. Each generation has formed its radical dwellings as its respective mark on the world. We can now look back into those past homes as windows into the lives of those generations, their values and ambitions. What do our Smart Homes tell us about ourselves? Or perhaps instead, what are they telling everyone else?

Smart Home History

A Brief History of the Automated Home

The process towards the automated home began almost two centuries ago. When we first plugged our homes in, the light bulb gave us the night; no longer was man confined to the limits of the sun. The technology offered liberation from the natural hours of the day. Later, appliances replaced tools, and everything that moved, or could move, became battery powered. The first generation of the automated home advertised better performance in exchange for leisure: more time for the family, or affording the once-confined housewife the chance to pursue her career as well. The automated home liberated us from the need to maintain it.

Now, the technological trend continues to carry us through the next generation into a new domesticity. Although there will always be laments for what has passed, perhaps change isn’t so bad. If there is a new liberation perhaps it is the freedom to stay home. The freedom to sit and allow the world to visit us inside. Freedom from the outdoors. Freedom from each other. Yet, this time, is the freedom coming at a cost?

The home is the original security device – the original firewall. But now, as we allow the entire world to float through our walls and into our homes, have we deflated the entire meaning of our home that has stood for millennia? We speak of security and privacy now in the context of technical systems and hardware. But have we forgotten the origin of what privacy meant? In the spaces where we were once the most intimate, by inviting the world in we are becoming the most exposed. To adopt the Smart Home, must we forfeit the home?

Given the worldwide acceptance that privacy is dead, these thoughts may sound obsolete to the developers and web designers reading this. The point, though, is to fundamentally question what the Smart Home is offering us in exchange for what we must give it. Effectively, is the trade worth it? Will domestic life improve as it did during the first generations of automation? Or, how do we ensure, especially as the community that may be taking part in that change, that some amount of domesticity is salvaged?

Home Alone With the Internet of Things

The State of the Internet of Things

The Internet of Things is one of those monster hot-topic terms: when we hear it, we know it is significant, and yet we may know much less about its tangible effects. We may hear estimates of 200 billion devices being connected in the near future, or that the market’s value is projected at 80 billion dollars. The numbers have about as much significance as knowing that the earth is 93 million miles away from the sun. They’re very important numbers, yet most of us don’t understand the specifics or deal with them daily. All that most people want to know, in most cases, is how this thing is going to change, or improve, our lives.

What is keeping smart home technology from being adopted with the same ubiquity as smartphones today? Consider the early phases of smartphone technology. What caused them to make the leap from a niche device to being fastened to the hip of nearly every person and their grandparent?

Smartphones existed for about a decade before the market saw a significant boom. The release of the iPhone in 2007 generated a major shift, as Apple was the first to design and market the device for everyday use. The elements that the iPhone introduced or improved seem to be subtle interface adjustments, and yet they were able to catalyze a major shift. So, what might be the critical tipping point, or product, for smart homes?

The Issue With Niche Products

Every day there seem to be more smart home products to outfit your home with. Yet with each new application there is a new device, with a new remote, that might connect with yet another new app on your smartphone. No matter how potentially helpful any of these devices might be, they appear novel and excessive without being associated with a greater purpose.

Recently, Yves Behar released the designs for a connected garden tracking system called Edyn. The system contains two products, one to monitor the soil, and another to respond to the tracked data and irrigate the garden as necessary. With the data, the app can recommend which plants might be the easiest to grow, and what the produce might need in order to flourish.

Seems pretty handy – but is it necessary enough to become widely adopted, or to really change the way we maintain our yards? Especially considering that for many, gardening is a pleasurable activity; people often garden as a meditative tool. In fact, a recent study suggested that the happier someone is, the more they garden, and the more someone gardens, the happier they are. So what exactly is Edyn suggesting it can improve in a process that got along fine without it? If gardening becomes easier, what else is to be gained in terms of time in the day? Perhaps now the smart gardener can fit in another Netflix episode rather than do the watering themselves. Hopefully one day they won’t even have to lift a green thumb at all! Imagine all the Netflix that could be watched.

Now, I don’t mean to pick on Edyn too much, but my point is that this tool is representative of most smart home technology. A lot of it would be nice to have I suppose, but it doesn’t quite seem worth the cost and the trouble.

So What’s Next

The issue with the clunky interaction between multiple apps is now clear to the market, and there is a definite push to see who can develop the ‘hub’ for all smart home devices. Some of the key players so far are SmartThings and Wink; Apple is supposedly generating momentum through Apple TV, and Amazon has its Echo device.

However, so far there is no real front-runner, nor have many homes really begun to adopt the technology. What seems to be lacking, as it was before the iPhone, is the proper interface that can relieve smart home technology of all the headaches that get in the way of convenience.

Most of the processes that these technologies are looking to improve are not exactly major burdens – turning off lights, playing music, etc. Thus, if the experience is hardly more enjoyable, the new technology won’t be adopted.

One product, Josh.ai, has recognized this need for cohesion between devices and focuses its manifesto on interfacing. Josh.ai anticipates the need for programs to develop and build over time along with the user. Eventually, certain commands become memorized, such that Josh.ai will know that every morning it should slowly raise the lights, turn on the morning news, and prepare a cup of coffee. The interface takes any work out of choosing between apps, and instead responds to voice commands to manage all connected devices. Josh.ai is advertised as your home’s best friend. You even talk to Josh.ai as if it’s your best friend.

Josh.ai bases its product around the use of voice rather than any other interface. This may be a critical move for smart home technology. Josh.ai’s platform essentially asks how can the user program their home with as little work as possible? The hope is that even complex processes like morning routines might be programmed by voice, then memorized for future instances. “Hey Josh, when I get home can you turn up the lights, set the oven to 300, and maybe put on something like Frank Sinatra? I’m bringing a friend home!”

If Josh.ai is one of the more promising potential hubs to move the Smart Home trend into its next stage, what does its manifesto tell us about the ambitions we can expect from the technology in the near future?

The phrase ‘make your life easier and more productive’ occurs again and again in these manifestos. But if most of what these hubs offer is the autonomous control of small things like lights and music, how much easier will our lives get? Can this at all compare to the transition from brooms to vacuums?

The three essential points of Josh.ai are: thoughtless energy saving, continuous awareness of devices, and a more networked system. How do these new values compare to past technologies that significantly liberated the average person? In the manifestos of most of these hub devices, there is hardly any language that appears human at all.

Although the benefits of some of these products seem underwhelming, the issue is that even the smallest thing poses a significant risk. Take Nest, for example: what could be more harmless than a thermostat? Yet already, in its relatively short life, Nest has been caught selling information about home fire history to insurance companies.

Even if the Smart Home Hub were your ‘best friend’, do you want your best friend to know everything about your preferences? How much do we even allow our actual best friends to know about us? Do we tell them that we’re lonely? Whose profiles we look at online? Even if our best friend were really good at keeping secrets, would we tell them? About that one time?

Anytime someone cries out against the inevitable, they come across as a cranky soapboxer. But this is an odd transition, where there seems to be little gained from most of these products, and yet there is such significant risk in what might be lost if the technologies are adopted. Doesn’t it feel strange that in the last century our parents and grandparents marched for privacy, and here we are eagerly handing it back?

 

Persisting Data Across Page Reloads: Cookies, IndexedDB and Everything In-Between

The following article is a guest post from Toptal. Toptal is an elite network of freelancers that enables businesses to connect with the top 3% of software engineers and designers in the world.

Suppose I’m visiting a web site. I right-click on one of the navigation links and select to open the link in a new window. What should happen? If I’m like most users, I expect the new page to have the same content as if I had clicked the link directly. The only difference should be that the page appears in a new window. But if your web site is a single-page application (SPA), you may see weird results unless you’ve carefully planned for this case.

Recall that in an SPA, a typical navigation link is often a fragment identifier, starting with a hash mark (#). Clicking the link directly does not reload the page, so all the data stored in JavaScript variables are retained. But if I open the link in a new tab or window, the browser does reload the page, reinitializing all the JavaScript variables. So any HTML elements bound to those variables will display differently, unless you’ve taken steps to preserve that data somehow.

Persisting Data Across Page Reloads: Cookies, IndexedDB and Everything In-Between


There’s a similar issue if I explicitly reload the page, such as by hitting F5. You may think I shouldn’t ever need to hit F5, because you’ve set up a mechanism to push changes from the server automatically. But if I’m a typical user, you can bet I’m still going to reload the page. Maybe my browser seems to have repainted the screen incorrectly, or I just want to be certain I have the very latest stock quotes.

APIs May Be Stateless, Human Interaction Is Not

Unlike an internal request via a RESTful API, a human user’s interaction with a web site is not stateless. As a web user, I think of my visit to your site as a session, almost like a phone call. I expect the browser to remember data about my session, in the same way that when I call your sales or support line, I expect the representative to remember what was said earlier in the call.

An obvious example of session data is whether I’m logged in, and if so, as which user. Once I go through a login screen, I should be able to navigate freely through the user-specific pages of the site. If I open a link in a new tab or window and I’m presented with another login screen, that’s not very user friendly.

Another example is the contents of the shopping cart in an e-commerce site. If hitting F5 empties the shopping cart, users are likely to get upset.

In a traditional multi-page application written in PHP, session data would be stored in the $_SESSION superglobal array. But in an SPA, it needs to be somewhere on the client side. There are four main options for storing session data in an SPA:

  • Cookies
  • Fragment identifier
  • Web storage
  • IndexedDB

Four Kilobytes of Cookies

Cookies are an older form of web storage in the browser. They were originally intended to store data received from the server in one request and send it back to the server in subsequent requests. But from JavaScript, you can use cookies to store just about any kind of data, up to a size limit of 4 KB per cookie. AngularJS offers the ngCookies module for managing cookies. There is also a js-cookie package that provides similar functionality in any framework.

Keep in mind that any cookie you create will be sent to the server on every request, whether it’s a page reload or an Ajax request. But if the main session data you need to store is the access token for the logged-in user, you want this sent to the server on every request anyway. It’s natural to try to use this automatic cookie transmission as the standard means of specifying the access token for Ajax requests.

You may argue that using cookies in this manner is incompatible with RESTful architecture. But in this case it is just fine as each request via the API is still stateless, having some inputs and some outputs. It’s just that one of the inputs is being sent in a funny way, via a cookie. If you can arrange for the login API request to send the access token back in a cookie also, then your client side code hardly needs to deal with cookies at all. Again, it’s just another output from the request being returned in an unusual way.

Cookies offer one advantage over web storage. You can provide a “keep me logged in” checkbox on the login form. The expected semantics are that if I leave it unchecked, I will remain logged in if I reload the page or open a link in a new tab or window, but I am guaranteed to be logged out once I close the browser. This is an important safety feature if I’m using a shared computer. As we’ll see later, web storage does not support this behavior.
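As a rough sketch of that distinction, using the js-cookie API mentioned earlier (Cookies.set and Cookies.get; the cookie name is just a placeholder), a persistent cookie gets an explicit expiry while a session cookie gets none:

// "Keep me logged in" checked: a persistent cookie that survives closing the browser.
// Unchecked: a session cookie, which disappears when the browser is closed.
function rememberToken(token, keepMeLoggedIn) {
  if (keepMeLoggedIn) {
    Cookies.set('token', token, { expires: 14 }); // expires in 14 days
  } else {
    Cookies.set('token', token);                  // no expiry: session cookie
  }
}

var savedToken = Cookies.get('token'); // undefined once the cookie is gone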

So how might this approach work in practice? Suppose you’re using LoopBack on the server side. You’ve defined a Person model, extending the built-in User model, adding the properties you want to maintain for each user. You’ve configured the Person model to be exposed over REST. Now you need to tweak server/server.js to achieve the desired cookie behavior. Below is server/server.js, starting from what was generated by slc loopback, with the marked changes:

var loopback = require('loopback');
var boot = require('loopback-boot');

var app = module.exports = loopback();

app.start = function() {
  // start the web server
  return app.listen(function() {
    app.emit('started');
    var baseUrl = app.get('url').replace(/\/$/, '');
    console.log('Web server listening at: %s', baseUrl);
    if (app.get('loopback-component-explorer')) {
      var explorerPath = app.get('loopback-component-explorer').mountPath;
      console.log('Browse your REST API at %s%s', baseUrl, explorerPath);
    }
  });
};

// start of first change
app.use(loopback.cookieParser('secret'));
// end of first change

// Bootstrap the application, configure models, datasources and middleware.
// Sub-apps like REST API are mounted via boot scripts.
boot(app, __dirname, function(err) {
  if (err) throw err;

  // start of second change
  app.remotes().after('Person.login', function (ctx, next) {
    if (ctx.result.id) {
      var opts = {signed: true};
      if (ctx.req.body.rememberme !== false) {
        opts.maxAge = 1209600000;
      }
      ctx.res.cookie('authorization', ctx.result.id, opts);
    }
    next();
  });
  app.remotes().after('Person.logout', function (ctx, next) {
    ctx.res.cookie('authorization', '');
    next();
  });
  // end of second change

  // start the server if `$ node server.js`
  if (require.main === module)
    app.start();
});

The first change configures the cookie parser to use ‘secret’ as the cookie signing secret, thereby enabling signed cookies. You need to do this because although LoopBack looks for an access token in either of the cookies ‘authorization’ or ‘access_token’, it requires that such a cookie be signed. Actually, this requirement is pointless. Signing a cookie is intended to ensure that the cookie hasn’t been modified. But there’s no danger of you modifying the access token. After all, you could have sent the access token in unsigned form, as an ordinary parameter. Thus, you don’t need to worry about the cookie signing secret being hard to guess, unless you’re using signed cookies for something else.

The second change sets up some postprocessing for the Person.login and Person.logout methods. For Person.login, you want to take the resulting access token and send it to the client as the signed cookie ‘authorization’ also. The client may add one more property to the credentials parameter, rememberme, indicating whether to make the cookie persistent for 2 weeks. The default is true. The login method itself will ignore this property, but the postprocessor will check it.

For Person.logout, you want to clear out this cookie.

You can see the results of these changes right away in the StrongLoop API Explorer. Normally after a Person.login request, you would have to copy the access token, paste it into the form at the top right, and click Set Access Token. But with these changes, you don’t have to do any of that. The access token is automatically saved as the cookie ‘authorization’, and sent back on each subsequent request. When the Explorer is displaying the response headers from Person.login, it omits the cookie, because JavaScript is never allowed to see Set-Cookie headers. But rest assured, the cookie is there.

On the client side, on a page reload you would check whether the cookie ‘authorization’ exists. If so, you need to update your record of the current userId. Probably the easiest way to do this is to store the userId in a separate cookie on successful login, so you can retrieve it on a page reload.
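A minimal sketch of that flow, again with js-cookie and with hypothetical names for the extra cookie and the login response field:

// After a successful login, remember which user this session belongs to.
function onLoginSuccess(response) {
  Cookies.set('userId', response.userId); // hypothetical response shape
}

// On page load, restore the session only if the server-set cookie is still there.
var currentUserId = Cookies.get('authorization') ? Cookies.get('userId') : null;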

The Fragment Identifier

As I’m visiting a web site that has been implemented as an SPA, the URL in my browser’s address bar might look something like “https://example.com/#/my-photos/37”. The fragment identifier portion of this, “#/my-photos/37”, is already a collection of state information that could be viewed as session data. In this case, I’m probably viewing one of my photos, the one whose ID is 37.

You may decide to embed other session data within the fragment identifier. Recall that in the previous section, with the access token stored in the cookie ‘authorization’, you still needed to keep track of the userId somehow. One option is to store it in a separate cookie. But another approach is to embed it in the fragment identifier. You could decide that while I’m logged in, all the pages I visit will have a fragment identifier beginning with “#/u/XXX”, where XXX is the userId. So in the previous example, the fragment identifier might be “#/u/59/my-photos/37” if my userId is 59.
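Extracting the userId back out of such a fragment is a one-liner. Here is a sketch assuming the “#/u/XXX/…” convention just described:

// Returns the userId embedded in a "#/u/<id>/..." fragment, or null if there is none.
function userIdFromFragment() {
  var match = window.location.hash.match(/^#\/u\/(\d+)(\/|$)/);
  return match ? Number(match[1]) : null;
}

// For "https://example.com/#/u/59/my-photos/37" this returns 59.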

Theoretically, you could embed the access token itself in the fragment identifier, avoiding any need for cookies or web storage. But that would be a bad idea. My access token would then be visible in the address bar. Anyone looking over my shoulder with a camera could take a snapshot of the screen, thereby gaining access to my account.

One final note: it is possible to set up an SPA so that it doesn’t use fragment identifiers at all. Instead it uses ordinary URLs like “http://example.com/app/dashboard” and “http://example.com/app/my-photos/37”, with the server configured to return the top level HTML for your SPA in response to a request for any of these URLs. Your SPA then does its routing based on the path (e.g. “/app/dashboard” or “/app/my-photos/37”) instead of the fragment identifier. It intercepts clicks on navigation links, and uses History.pushState() to push the new URL, then proceeds with routing as usual. It also listens for popstate events to detect the user clicking the back button, and again proceeds with routing on the restored URL. The full details of how to implement this are beyond the scope of this article. But if you use this technique, then obviously you can store session data in the path instead of the fragment identifier.
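In outline, the interception might look like the following sketch; the route() function and the data-internal attribute are hypothetical stand-ins for whatever your router actually uses:

// Intercept clicks on app-internal links and route on the path instead of the hash.
document.addEventListener('click', function (event) {
  var link = event.target.closest('a[data-internal]');
  if (!link) return;
  event.preventDefault();
  history.pushState(null, '', link.getAttribute('href')); // e.g. "/app/my-photos/37"
  route(window.location.pathname);
});

// The back and forward buttons fire popstate; route again on the restored URL.
window.addEventListener('popstate', function () {
  route(window.location.pathname);
});

function route(path) {
  // dispatch to the appropriate view for the given path
}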

Web Storage

Web storage is a mechanism for JavaScript to store data within the browser. Like cookies, web storage is separate for each origin. Each stored item has a name and a value, both of which are strings. But web storage is completely invisible to the server, and it offers much greater storage capacity than cookies. There are two types of web storage: local storage and session storage.

An item of local storage is visible across all tabs of all windows, and persists even after the browser is closed. In this respect, it behaves somewhat like a cookie with an expiration date very far in the future. Thus, it is suitable for storing an access token in the case where the user has checked “keep me logged in” on the login form.

An item of session storage is only visible within the tab where it was created, and it disappears when that tab is closed. This makes its lifetime very different from that of any cookie. Recall that a session cookie is still visible across all tabs of all windows.
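The storage API itself is tiny. As a sketch of the local-versus-session distinction (the key name and the function are placeholders):

// Local storage: visible in every tab and window, survives closing the browser.
// Session storage: visible only in this tab, gone as soon as the tab is closed.
function saveToken(token, keepMeLoggedIn) {
  if (keepMeLoggedIn) {
    localStorage.setItem('accessToken', token);
  } else {
    sessionStorage.setItem('accessToken', token);
  }
}

// Reading back: prefer the persistent copy, then the per-tab one.
var saved = localStorage.getItem('accessToken') || sessionStorage.getItem('accessToken');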

If you use the AngularJS SDK for LoopBack, the client side will automatically use web storage to save both the access token and the userId. This happens in the LoopBackAuth service in js/services/lb-services.js. It will use local storage, unless the rememberMe parameter is false (normally meaning the “keep me logged in” checkbox was unchecked), in which case it will use session storage.

The result is that if I log in with “keep me logged in” unchecked, and I then open a link in a new tab or window, I won’t be logged in there. Most likely I’ll see the login screen. You can decide for yourself whether this is acceptable behavior. Some might consider it a nice feature, where you can have several tabs, each logged in as a different user. Or you might decide that hardly anyone uses shared computers any more, so you can just omit the “keep me logged in” checkbox altogether.

So how would the session data handling look if you decide to go with the AngularJS SDK for LoopBack? Suppose you have the same situation as before on the server side: you’ve defined a Person model, extending the User model, and you’ve exposed the Person model over REST. You won’t be using cookies, so you won’t need any of the changes described earlier.

On the client side, somewhere in your outermost controller, you probably have a variable like $scope.currentUserId which holds the userId of the currently logged in user, or null if the user is not logged in. Then to handle page reloads properly, you just include this statement in the constructor function for that controller:

$scope.currentUserId = Person.getCurrentId();

It’s that easy. Add ‘Person’ as a dependency of your controller, if it isn’t already.

IndexedDB

IndexedDB is a newer facility for storing large amounts of data in the browser. You can use it to store data of any JavaScript type, such as an object or array, without having to serialize it. All requests against the database are asynchronous, so you get a callback when the request is completed.

You might use IndexedDB to store structured data that’s unrelated to any data on the server. An example might be a calendar, a to-do list, or saved games that are played locally. In this case, the application is really a local one, and your web site is just the vehicle for delivering it.

At present, Internet Explorer and Safari only have partial support for IndexedDB. Other major browsers support it fully. One serious limitation at the moment, though, is that Firefox disables IndexedDB entirely in private browsing mode.

As a concrete example of using IndexedDB, let’s take the sliding puzzle application by Pavol Daniš, and tweak it to save the state of the first puzzle, the Basic 3×3 sliding puzzle based on the AngularJS logo, after each move. Reloading the page will then restore the state of this first puzzle.

I’ve set up a fork of the repository with these changes, all of which are in app/js/puzzle/slidingPuzzle.js. As you can see, even a rudimentary usage of IndexedDB is quite involved. I’ll just show the highlights below. First, the function restore gets called during page load, to open the IndexedDB database:

/*
 * Tries to restore game
 */
this.restore = function(scope, storekey) {
    this.storekey = storekey;
    if (this.db) {
        this.restore2(scope);
    }
    else if (!window.indexedDB) {
        console.log('SlidingPuzzle: browser does not support indexedDB');
        this.shuffle();
    }
    else {
        var self = this;
        var request = window.indexedDB.open('SlidingPuzzleDatabase');
        request.onerror = function(event) {
            console.log('SlidingPuzzle: error opening database, ' + request.error.name);
            scope.$apply(function() { self.shuffle(); });
        };
        request.onupgradeneeded = function(event) {
            event.target.result.createObjectStore('SlidingPuzzleStore');
        };
        request.onsuccess = function(event) {
            self.db = event.target.result;
            self.restore2(scope);
        };
    }
};

The request.onupgradeneeded event handles the case where the database doesn’t exist yet. In this case, we create the object store.

Once the database is open, the function restore2 is called, which looks for a record with a given key (which will actually be the constant ‘Basic’ in this case):

/*
 * Tries to restore game, once database has been opened
 */
this.restore2 = function(scope) {
    var transaction = this.db.transaction('SlidingPuzzleStore');
    var objectStore = transaction.objectStore('SlidingPuzzleStore');
    var self = this;
    var request = objectStore.get(this.storekey);
    request.onerror = function(event) {
        console.log('SlidingPuzzle: error reading from database, ' + request.error.name);
        scope.$apply(function() { self.shuffle(); });
    };
    request.onsuccess = function(event) {
        if (!request.result) {
            console.log('SlidingPuzzle: no saved game for ' + self.storekey);
            scope.$apply(function() { self.shuffle(); });
        }
        else {
            scope.$apply(function() { self.grid = request.result; });
        }
    };
}

If such a record exists, its value replaces the grid array of the puzzle. If there is any error in restoring the game, we just shuffle the tiles as before. Note that grid is a 3×3 array of tile objects, each of which is fairly complex. The great advantage of IndexedDB is that you can store and retrieve such values without having to serialize them.

We use $apply to inform AngularJS that the model has been changed, so the view will be updated appropriately. This is because the update is happening inside a DOM event handler, so AngularJS wouldn’t otherwise be able to detect the change. Any AngularJS application using IndexedDB will probably need to use $apply for this reason.

After any action that would change the grid array, such as a move by the user, the function save is called which adds or updates the record with the appropriate key, based on the updated grid value:

/*
 * Tries to save game
 */
this.save = function() {
    if (!this.db) {
        return;
    }
    var transaction = this.db.transaction('SlidingPuzzleStore', 'readwrite');
    var objectStore = transaction.objectStore('SlidingPuzzleStore');
    var request = objectStore.put(this.grid, this.storekey);
    request.onerror = function(event) {
        console.log('SlidingPuzzle: error writing to database, ' + request.error.name);
    };
    request.onsuccess = function(event) {
        // successful, no further action needed
    };
}

The remaining changes are to call the above functions at the appropriate times. You can review the commit showing all of the changes. Note that we call restore only for the basic puzzle, not for the three advanced puzzles. We exploit the fact that the three advanced puzzles have an api attribute, so for those we just do the normal shuffling.

What if we wanted to save and restore the advanced puzzles also? That would require some restructuring. In each of the advanced puzzles, the user can adjust the image source file and the puzzle dimensions. So we’d have to enhance the value stored in IndexedDB to include this information. More importantly, we’d need a way to update them from a restore. That’s a bit much for this already lengthy example.

Conclusion

In most cases, web storage is your best bet for storing session data. It’s fully supported by all major browsers, and it offers much greater storage capacity than cookies.

You would use cookies if your server is already set up to use them, or if you need the data to be accessible across all tabs of all windows, but you also want to ensure it will be deleted when the browser is closed.

You already use the fragment identifier to store session data that’s specific to that page, such as the ID of the photo the user is looking at. While you could embed other session data in the fragment identifier, this doesn’t really offer any advantage over web storage or cookies.

Using IndexedDB is likely to require a lot more coding than any of the other techniques. But if the values you’re storing are complex JavaScript objects that would be difficult to serialize, or if you need a transactional model, then it may be worthwhile. Source: Toptal.