Posts Tagged ‘web development’
Getting the Most Out of Your PHP Log Files: A Practical Guide
It could rightfully be said that logs are one of the most underestimated and underutilized tools at a freelance PHP developer’s disposal. Despite the wealth of information they can offer, it is not uncommon for logs to be the last place a developer looks when trying to resolve a problem.
In truth, PHP log files should in many cases be the first place to look for clues when problems occur. Often, the information they contain could significantly reduce the amount of time spent pulling out your hair trying to track down a gnarly bug.
But perhaps even more importantly, with a bit of creativity and forethought, your log files can be leveraged to serve as a valuable source of usage information and analytics. Creative use of log files can help answer questions such as: What browsers are most commonly being used to visit my site? What’s the average response time from my server? What was the percentage of requests to the site root? How has usage changed since we deployed the latest updates? And much, much more.
This article provides a number of tips on how to configure your log files, as well as how to process the information that they contain, in order to maximize the benefit that they provide.
Although this article focuses on logging for PHP developers, much of the information presented here is fairly “technology agnostic” and is relevant to other languages and technology stacks as well.
Note: This article presumes basic familiarity with the Unix shell. For those lacking this knowledge, an Appendix is provided that introduces some of the commands needed for accessing and reading log files on a Unix system.
Our PHP Log File Example Project
As an example project for discussion purposes in this article, we will take Symfony Standard as a working project and we’ll set it up on Debian 7 Wheezy with rsyslogd, nginx, and PHP-FPM.
composer create-project symfony/framework-standard-edition my "2.6.*"
This quickly gives us a working test project with a nice UI.
Tips for Configuring Your Log Files
Here are some pointers on how to configure your log files to help maximize their value.
Error Log Configuration
Error logs represent the most basic form of logging; i.e., capturing additional information and detail when problems occur. So, in an ideal world, you would want there to be no errors and for your error logs to be empty. But when problems do occur (as they invariably do), your error logs should be one of the first stops you make on your debugging trail.
Error logs are typically quite easy to configure.
For one thing, all error and crash messages can be logged in the error log in exactly the same format in which they would otherwise be presented to a user. With some simple configuration, the end user will never need to see those ugly error traces on your site, while devops will still be able to monitor the system and review these error messages in all their gory detail. Here’s how to set up this kind of logging in PHP:
log_errors = On
error_reporting = E_ALL
error_log = /path/to/my/error/log
Two more settings are important to include in the PHP configuration of a live site, to keep that level of error detail from being presented to users:
display_errors = Off
display_startup_errors = Off
System Log (syslog) Configuration
There are many generally compatible implementations of the syslog daemon in the open source world, including:
- syslogd and sysklogd – most often seen on BSD family systems, CentOS, Mac OS X, and others
- syslog-ng – default for modern Gentoo and SuSE builds
- rsyslogd – widely used on the Debian and Fedora families of operating systems
(Note: In this article, we’ll be using rsyslogd for our examples.)
The basic syslog configuration is generally adequate for capturing your log messages in a system-wide log file (normally /var/log/syslog; might also be /var/log/messages or /var/log/system.log depending on the distro you’re using).
The system log provides several log facilities, eight of which (LOG_LOCAL0 through LOG_LOCAL7) are reserved for user-deployed projects. Here, for example, is how you might set up LOG_LOCAL0 to write to four separate log files, based on logging level (i.e., error, warning, info, debug):
# /etc/rsyslog.d/my.conf
local0.err /var/log/my/err.log
local0.warning /var/log/my/warning.log
local0.info -/var/log/my/info.log
local0.debug -/var/log/my/debug.log
Now, whenever you write a log message to the LOG_LOCAL0 facility, the error messages will go to /var/log/my/err.log, warning messages will go to /var/log/my/warning.log, and so on. Note, though, that the syslog daemon filters messages for each file based on the rule of “this level and higher”. So, in the example above, all error messages will appear in all four configured files, warning messages will appear in all but the error log, info messages will appear in the info and debug logs, and debug messages will only go to debug.log.
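The “this level and higher” rule is easy to get backwards, so here is a small shell sketch that models it for the four selectors above. The function name files_for_level is made up for illustration, and nothing here talks to rsyslog; it only encodes the standard syslog severity numbers (err=3, warning=4, info=6, debug=7, where a lower number means more severe).

```shell
# Hypothetical helper: lists which of the four files configured above
# would receive a message of the given level.
files_for_level() {
  case "$1" in
    err)     sev=3 ;;
    warning) sev=4 ;;
    info)    sev=6 ;;
    debug)   sev=7 ;;
    *) echo "unknown level: $1" >&2; return 1 ;;
  esac
  out=""
  # A selector such as local0.warning matches its own severity and
  # everything more severe, i.e. every severity number <= its own.
  while read -r threshold file; do
    if [ "$sev" -le "$threshold" ]; then
      out="$out $file"
    fi
  done <<'EOF'
3 err.log
4 warning.log
6 info.log
7 debug.log
EOF
  echo "${out# }"
}

files_for_level warning   # prints: warning.log info.log debug.log
```

On a real system, the same check can be made with the standard logger utility, for example logger -p local0.warning -t mytest 'hello', followed by a look at which files under /var/log/my/ gained a line.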
One additional important note: the - signs before the info and debug level files in the above configuration file example indicate that rsyslog should not sync those files to disk after every write, so logging those messages does not block on disk I/O. This is typically fine (and even recommended in most situations) for info and debug logs, but it’s best to have writes to the error log (and most probably the warning log as well) be synchronous.
In order to shut down a less important level of logging (e.g., on a production server), you may simply redirect related messages to /dev/null (i.e., to nowhere):
local0.debug /dev/null # -/var/log/my/debug.log
One specific customization that is useful, especially to support some of the PHP log file parsing we’ll be discussing later in this article, is to use tab as the delimiter character in log messages. This can easily be done by adding the following file in /etc/rsyslog.d:
# /etc/rsyslog.d/fixtab.conf
$EscapeControlCharactersOnReceive off
And finally, don’t forget to restart the syslog daemon after you make any configuration changes in order for them to take effect:
service rsyslog restart
Server Log Configuration
Unlike application logs and error logs that you can write to, server logs are exclusively written to by the corresponding server daemons (e.g., web server, database server, etc.) on each request. The only “control” you have over these logs is to the extent that the server allows you to configure its logging functionality. Though there can be a lot to sift through in these files, they are often the only way to get a clear sense of what’s going on “under the hood” with your server.
Let’s deploy our Symfony Standard example application on an nginx environment with a MySQL storage backend. Here’s the nginx host config we will be using:
server {
    server_name my.log-sandbox;
    root /var/www/my/web;

    location / {
        # try to serve file directly, fallback to app.php
        try_files $uri /app.php$is_args$args;
    }

    # DEV
    # This rule should only be placed on your development environment
    # In production, don't include this and don't deploy app_dev.php or config.php
    location ~ ^/(app_dev|config)\.php(/|$) {
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_split_path_info ^(.+\.php)(/.*)$;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param HTTPS off;
    }

    # PROD
    location ~ ^/app\.php(/|$) {
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_split_path_info ^(.+\.php)(/.*)$;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param HTTPS off;
        # Prevents URIs that include the front controller. This will 404:
        # http://domain.tld/app.php/some-path
        # Remove the internal directive to allow URIs like this
        internal;
    }

    error_log /var/log/nginx/my_error.log;
    access_log /var/log/nginx/my_access.log;
}
With regard to the last two directives above: access_log represents the general requests log, while error_log is for errors, and, as with application error logs, it’s worth setting up extra monitoring to be alerted to problems so you can react quickly.
Note: This is an intentionally oversimplified nginx config file that is provided for example purposes only. It pays almost no attention to security and performance and shouldn’t be used as-is in any “real” environment.
This is what we get in /var/log/nginx/my_access.log after typing http://my.log-sandbox/app_dev.php/ in the browser and hitting Enter.
192.168.56.1 - - [26/Apr/2015:16:13:28 +0300] "GET /app_dev.php/ HTTP/1.1" 200 6715 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
192.168.56.1 - - [26/Apr/2015:16:13:28 +0300] "GET /bundles/framework/css/body.css HTTP/1.1" 200 6657 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
192.168.56.1 - - [26/Apr/2015:16:13:28 +0300] "GET /bundles/framework/css/structure.css HTTP/1.1" 200 1191 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
192.168.56.1 - - [26/Apr/2015:16:13:28 +0300] "GET /bundles/acmedemo/css/demo.css HTTP/1.1" 200 2204 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
192.168.56.1 - - [26/Apr/2015:16:13:28 +0300] "GET /bundles/acmedemo/images/welcome-quick-tour.gif HTTP/1.1" 200 4770 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
192.168.56.1 - - [26/Apr/2015:16:13:28 +0300] "GET /bundles/acmedemo/images/welcome-demo.gif HTTP/1.1" 200 4053 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
192.168.56.1 - - [26/Apr/2015:16:13:28 +0300] "GET /bundles/acmedemo/images/welcome-configure.gif HTTP/1.1" 200 3530 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
192.168.56.1 - - [26/Apr/2015:16:13:28 +0300] "GET /favicon.ico HTTP/1.1" 200 6518 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
192.168.56.1 - - [26/Apr/2015:16:13:30 +0300] "GET /app_dev.php/_wdt/e50d73 HTTP/1.1" 200 13265 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
This shows that, to serve one page, the browser actually performs 9 HTTP calls. 7 of those are requests for static content, which are plain and lightweight, but they still consume network resources, and this is what can be optimized with sprites and minification techniques.
While those optimizations are to be discussed in another article, what’s relevant here is that we can log requests for static content separately by using another location directive for them:
location ~ \.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js)$ {
access_log /var/log/nginx/my_access-static.log;
}
Remember that nginx location performs simple regular expression matching, so you can include as many static content extensions as you expect to dispatch on your site.
Parsing such logs is no different than parsing application logs.
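As a rough illustration, assuming the default combined access log format shown above (where the seventh whitespace-separated field is the request path), awk can split static from dynamic requests in an existing log. The function name count_static and the shortened extension list are placeholders.

```shell
# Hypothetical helper: count requests for static files in an access log
# that uses nginx's default "combined" format.
count_static() {
  awk '
    # $7 is the request path, e.g. /bundles/framework/css/body.css
    $7 ~ /\.(jpg|jpeg|gif|png|ico|css|js)$/ { n_static++ }
    END { printf "%d of %d requests were for static files\n", n_static, NR }
  ' "$1"
}

# Usage: count_static /var/log/nginx/my_access.log
```

Run against the nine access log lines shown earlier, this should report 7 of 9. Note that the pattern is anchored at the end of the path, so a request such as /style.css?v=2 would not be counted as static.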
Other Logs Worth Mentioning
Two other PHP logs worth mentioning are the debug log and data storage log.
The Debug Log
Another convenient thing about nginx logs is the debug log. We can turn it on by replacing the error_log line of the config with the following (this requires an nginx binary built with debug support, i.e., with the --with-debug option):
error_log /var/log/nginx/my_error.log debug;
Apache and most other web servers offer an equivalent setting (in Apache, for example, the LogLevel directive).
And incidentally, debug logs are not related to error logs, even though they are configured in the error_log directive.
Although the debug log can indeed be verbose (a single nginx request, for example, generated 127KB of log data!), it can still be very useful. Wading through a log file may be cumbersome and tedious, but it can often quickly provide clues and information that greatly help accelerate the debugging process.
In particular, the debug log can really help with debugging nginx configurations, especially the most complicated parts, like location matching and rewrite chains.
Of course, debug logs should never be enabled in a production environment. The amount of space they use and the amount of information they store mean a lot of I/O load on your server, which can degrade the whole system’s performance significantly.
Data Storage Logs
Another type of server log (useful for debugging) is data storage logs. In MySQL, you can turn them on by adding these lines:
[mysqld]
general_log = 1
general_log_file = /var/log/mysql/query.log
These logs simply contain a list of queries run by the system while serving database requests in chronological order, which can be helpful for various debugging and tracing needs. However, they should not stay enabled on production systems, since they will generate extra unnecessary I/O load, which affects performance.
Writing to Your Log Files
PHP itself provides functions for opening, writing to, and closing log files (openlog(), syslog(), and closelog(), respectively).
There are also numerous logging libraries for the PHP developer, such as Monolog (popular among Symfony and Laravel users), as well as various framework-specific implementations, such as the logging capabilities incorporated into CakePHP. Generally, libraries like Monolog not only wrap syslog() calls, but also allow using other backend functionality and tools.
Here’s a simple example of how to write to the log:
<?php
openlog(uniqid(), LOG_ODELAY, LOG_LOCAL0);
syslog(LOG_INFO, 'It works!');
Our call here to openlog:
- configures PHP to prepend a unique identifier to each system log message within the script’s lifetime
- sets it to delay opening the syslog connection until the first syslog() call has occurred
- sets LOG_LOCAL0 as the default logging facility
Here’s what the contents of the log file would look like after running the above code:
# cat /var/log/my/info.log
Mar 2 00:23:29 log-sandbox 54f39161a2e55: It works!
Maximizing the Value of Your PHP Log Files
Now that we’re all good with theory and basics, let’s see how much we can get from logs making as few changes as possible to our sample Symfony Standard project.
First, let’s create the scripts src/log-begin.php (to properly open and configure our logs) and src/log-end.php (to log information about successful completion). Note that, for simplicity, we’ll just write all messages to the info log.
# src/log-begin.php
<?php
define('START_TIME', microtime(true));
openlog(uniqid(), LOG_ODELAY, LOG_LOCAL0);
syslog(LOG_INFO, 'BEGIN');
syslog(LOG_INFO, "URI\t{$_SERVER['REQUEST_URI']}");
$browserHash = substr(md5($_SERVER['HTTP_USER_AGENT']), 0, 7);
syslog(LOG_INFO, "CLIENT\t{$_SERVER['REMOTE_ADDR']}\t{$browserHash}");
# src/log-end.php
<?php
syslog(LOG_INFO, "DISPATCH TIME\t" . round(microtime(true) - START_TIME, 2));
syslog(LOG_INFO, 'END');
And let’s require these scripts in app.php:
<?php
require_once(dirname(__DIR__) . '/src/log-begin.php');
syslog(LOG_INFO, "MODE\tPROD");
# original app.php contents
require_once(dirname(__DIR__) . '/src/log-end.php');
For the development environment, we want to require these scripts in app_dev.php as well. The code to do so would be the same as above, except we would set the MODE to DEV rather than PROD.
We also want to track what controllers are being invoked, so let’s add one more line in Acme\DemoBundle\EventListener\ControllerListener, right at the beginning of the ControllerListener::onKernelController() method:
syslog(LOG_INFO, "CONTROLLER\t" . get_class($event->getController()[0]));
Note that these changes total a mere 15 extra lines of code, but can collectively yield a wealth of information.
Analyzing the Data in Your Log Files
For starters, let’s see how many HTTP requests are required to serve each page load.
Here’s the info in the logs for one request, based on the way we’ve configured our logging:
Mar 3 12:04:20 log-sandbox 54f58724b1ccc: BEGIN
Mar 3 12:04:20 log-sandbox 54f58724b1ccc: URI /app_dev.php/
Mar 3 12:04:20 log-sandbox 54f58724b1ccc: CLIENT 192.168.56.1 1b101cd
Mar 3 12:04:20 log-sandbox 54f58724b1ccc: MODE DEV
Mar 3 12:04:23 log-sandbox 54f58724b1ccc: CONTROLLER Acme\DemoBundle\Controller\WelcomeController
Mar 3 12:04:25 log-sandbox 54f58724b1ccc: DISPATCH TIME 4.51
Mar 3 12:04:25 log-sandbox 54f58724b1ccc: END
Mar 3 12:04:25 log-sandbox 54f5872967dea: BEGIN
Mar 3 12:04:25 log-sandbox 54f5872967dea: URI /app_dev.php/_wdt/59b8b6
Mar 3 12:04:25 log-sandbox 54f5872967dea: CLIENT 192.168.56.1 1b101cd
Mar 3 12:04:25 log-sandbox 54f5872967dea: MODE DEV
Mar 3 12:04:28 log-sandbox 54f5872967dea: CONTROLLER Symfony\Bundle\WebProfilerBundle\Controller\ProfilerController
Mar 3 12:04:29 log-sandbox 54f5872967dea: DISPATCH TIME 4.17
Mar 3 12:04:29 log-sandbox 54f5872967dea: END
So now we know that each page load is actually served with two HTTP requests.
Actually, there are two points worth mentioning here. First, two requests per page load is specific to running Symfony in dev mode (which I have done throughout this article). You can identify dev mode calls by searching for /app_dev.php/ URL chunks. Second, “two requests” refers only to the requests that reach the Symfony app. As we saw earlier in the nginx access logs, there are actually more HTTP calls, some of which are for static content.
OK, now let’s surf a bit on the demo site (to build up the data in the log files) and let’s see what else we can learn from these logs.
How many requests were served in total since the beginning of the logfile?
# grep -c BEGIN info.log
10
Did any of them fail (did the script shut down without reaching the end)?
# grep -c END info.log
10
We see that the number of BEGIN and END records match, so this tells us that all of the calls were successful. (If the PHP script had not completed successfully, it would not have reached execution of the src/log-end.php script.)
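Matching counts only say that nothing is missing overall. When the counts do differ, the unique ID makes it possible to name the request that never finished. Here is a minimal awk sketch, assuming the line layout shown above (ID in the fifth whitespace-separated field, marker in the sixth); unfinished_requests is a made-up name.

```shell
# Hypothetical helper: print the IDs of requests that logged BEGIN
# but never logged END.
unfinished_requests() {
  awk '
    { id = $5; sub(/:$/, "", id) }     # strip the trailing colon
    $6 == "BEGIN" { pending[id] = 1 }
    $6 == "END"   { delete pending[id] }
    END { for (id in pending) print id }
  ' "$1"
}

# Usage: unfinished_requests /var/log/my/info.log
```

Keep in mind that a request still being processed at the moment the command runs will also show up in this list.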
What was the percentage of requests to the site root?
# grep -cE "\s/app_dev.php/$" info.log
2
This tells us that there were 2 page loads of the site root. Since we previously learned that (a) there are 2 requests to the app per page load and (b) there were a total of 10 HTTP requests, the percentage of requests to the site root was 40% (i.e., 2×2/10).
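The arithmetic above can be scripted. The sketch below repeats the same reasoning, including the assumption of two application requests per page load in dev mode; root_share and its second parameter are illustrative names, and because shell arithmetic is integer-only the result is truncated rather than rounded.

```shell
# Hypothetical helper: share of application requests attributable to
# loads of the site root, following the reasoning above.
root_share() {
  log=$1
  per_load=${2:-2}   # app requests per page load (2 in dev mode)
  # grep -c exits non-zero when the count is 0, hence "|| true".
  total=$(grep -c 'BEGIN$' "$log" || true)
  root=$(grep -cE '[[:space:]]/app_dev\.php/$' "$log" || true)
  if [ "$total" -eq 0 ]; then
    echo "no requests logged" >&2
    return 1
  fi
  echo "$(( 100 * root * per_load / total ))%"
}

# Usage: root_share info.log
```

With the counts above (2 root page loads, 10 requests in total), this prints 40%.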
Which controller class is responsible for serving requests to site root?
# grep -E "\s/$|\s/app_dev.php/$" info.log | head -n1
Mar 3 12:04:20 log-sandbox 54f58724b1ccc: URI /app_dev.php/
# grep 54f58724b1ccc info.log | grep CONTROLLER
Mar 3 12:04:23 log-sandbox 54f58724b1ccc: CONTROLLER Acme\DemoBundle\Controller\WelcomeController
Here we used the unique ID of a request to check all log messages related to that single request. We thereby were able to determine that the controller class responsible for serving requests to site root is Acme\DemoBundle\Controller\WelcomeController.
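The two-step lookup above can be folded into a single pass. This sketch (again assuming the ID is the fifth field and the marker the sixth; uri_controllers is a hypothetical name) remembers each request’s URI and prints it next to the controller logged under the same ID.

```shell
# Hypothetical helper: list each distinct URI together with the
# controller that served it, matched up through the request ID.
uri_controllers() {
  awk '
    $6 == "URI"        { uri[$5] = $7 }
    $6 == "CONTROLLER" { print uri[$5] "\t" $7 }
  ' "$1" | LC_ALL=C sort -u
}

# Usage: uri_controllers /var/log/my/info.log
```

Because the URI line is always written before the CONTROLLER line of the same request, one pass over the file is enough.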
Which clients with IPs of subnet 192.168.0.0/16 have accessed the site?
# grep CLIENT info.log | cut -d":" -f4 | cut -f2 | sort | uniq
192.168.56.1
As expected in this simple test case, only my host computer has accessed the site. This is of course a very simplistic example, but the capability that it demonstrates (of being able to analyse the sources of the traffic to your site) is obviously quite powerful and important.
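Strictly speaking, the pipeline above lists every client address and leaves the subnet check to the reader. For a /16 that falls on an octet boundary, a plain prefix match is enough. The sketch below assumes the same log layout and uses a made-up function name; subnets that do not align with octets would need real address arithmetic.

```shell
# Hypothetical helper: unique client IPs that start with a dotted prefix.
# $1: log file, $2: prefix such as "192.168."
clients_in_subnet() {
  awk -v prefix="$2" '
    # index() does a literal (non-regex) match; 1 means "at the start"
    $6 == "CLIENT" && index($7, prefix) == 1 { print $7 }
  ' "$1" | sort -u
}

# Usage: clients_in_subnet info.log 192.168.
```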
How much of the traffic to my site has been from Firefox?
Having 1b101cd as the hash of my Firefox User-Agent, I can answer this question as follows:
# grep -c 1b101cd info.log
8
# grep -c CLIENT info.log
10
Answer: 80% (i.e., 8/10)
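Rather than checking one hash at a time, a single pass can tally every browser hash. This is a sketch under the same layout assumptions (the hash is the last field of each CLIENT line); browser_shares is an illustrative name.

```shell
# Hypothetical helper: requests and percentage share per browser hash,
# most frequent first. Output columns are tab-separated.
browser_shares() {
  awk '
    $6 == "CLIENT" { count[$NF]++; total++ }
    END {
      for (hash in count)
        printf "%s\t%d\t%.0f%%\n", hash, count[hash], 100 * count[hash] / total
    }
  ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Usage: browser_shares /var/log/my/info.log
```

Mapping a hash back to a browser still requires knowing which User-Agent string produced it, as in the Firefox example above.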
What is the percentage of requests that yielded a “slow” response?
For purposes of this example, we’ll define “slow” as taking 5 seconds or more to provide a response. Accordingly:
# grep "DISPATCH TIME" info.log | grep -cE "\s[0-9]{2,}\.|\s[5-9]\."
2
Answer: 20% (i.e., 2/10)
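The regular expression above works by inspecting the digits before the decimal point, but a numeric comparison is easier to adapt to other thresholds. Here is a minimal sketch, assuming DISPATCH TIME lines as written by src/log-end.php (the value is the last field); slow_requests and its default limit are illustrative, and the comparison is “greater than or equal” to mirror what the regex matches.

```shell
# Hypothetical helper: how many requests took at least N seconds.
# $1: log file, $2: threshold in seconds (default 5)
slow_requests() {
  awk -v limit="${2:-5}" '
    $6 == "DISPATCH" && $7 == "TIME" {
      n++
      if ($NF + 0 >= limit + 0) slow++
    }
    END { if (n) printf "%d of %d (%.0f%%)\n", slow, n, 100 * slow / n }
  ' "$1"
}

# Usage: slow_requests info.log      (threshold 5 s)
#        slow_requests info.log 2.5  (threshold 2.5 s)
```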
Did anyone ever supply GET parameters?
# grep URI info.log | grep \?
No. Symfony Standard uses only URL slugs, so this also tells us that no one has tried to probe the site with query-string parameters.
These are just a handful of relatively rudimentary examples of the ways in which log files can be creatively leveraged to yield valuable usage information and even basic analytics.
Other Things to Keep in Mind
Keeping Things Secure
Another heads-up concerns security. You might think that logging requests is a good idea, and in most cases it indeed is. However, it’s important to be extremely careful about removing any potentially sensitive user information before storing it in the log.
Fighting Log File Bloat
Since log files are text files to which you always append information, they are constantly growing. Since this is a well-known issue, there are some fairly standard approaches to controlling log file growth.
The easiest is to rotate the logs. Rotating logs means:
- Periodically replacing the log with a new empty file for further writing
- Storing the old file for history
- Removing files that have “aged” sufficiently to free up disk space
- Making sure the application can write to the logs uninterrupted when these file changes occur
The most common solution for this is logrotate, which ships pre-installed with most *nix distributions. Let’s see a simple configuration file for rotating our logs:
/var/log/my/debug.log
/var/log/my/info.log
/var/log/my/warning.log
/var/log/my/err.log
{
    rotate 7
    daily
    missingok
    notifempty
    delaycompress
    compress
    sharedscripts
    postrotate
        invoke-rc.d rsyslog rotate > /dev/null
    endscript
}
Another, more advanced approach is to make rsyslogd itself write messages into files, dynamically created based on current date and time. This would still require a custom solution for removal of older files, but lets devops manage timeframes for each log file precisely. For our example:
$template DynaLocal0Err, "/var/log/my/error-%$NOW%-%$HOUR%.log"
$template DynaLocal0Info, "/var/log/my/info-%$NOW%-%$HOUR%.log"
$template DynaLocal0Warning, "/var/log/my/warning-%$NOW%-%$HOUR%.log"
$template DynaLocal0Debug, "/var/log/my/debug-%$NOW%-%$HOUR%.log"
local0.err -?DynaLocal0Err
local0.info -?DynaLocal0Info
local0.warning -?DynaLocal0Warning
local0.debug -?DynaLocal0Debug
This way, rsyslog will create an individual log file each hour, and there won’t be any need for rotating them and restarting the daemon. Here’s how log files older than 5 days can be removed to complete this solution:
find /var/log/my/ -type f -mtime +5 -print0 | xargs -0 rm
Remote Logs
As the project grows, parsing information from logs gets more and more resource hungry. This not only means creating extra server load; it also means creating peak load on the CPU and disk drives at the times when you parse logs, which can degrade server response time for users (or in a worst case can even bring the site down).
To solve this, consider setting up a centralized logging server. All you need for this is another box with UDP port 514 (default) open. To make rsyslogd listen to connections, add the following line to its config file (the imudp input module must also be loaded, with $ModLoad imudp):
$UDPServerRun 514
Having this, setting up the client is then as easy as:
*.* @HOSTNAME:514
(where HOSTNAME is the host name of your remote logging server).
Conclusion
While this article has demonstrated some of the creative ways in which log files can offer way more valuable information than you may have previously imagined, it’s important to emphasize that we’ve only scratched the surface of what’s possible. The extent, scope, and format of what you can log is almost limitless. This means that – if there’s usage or analytics data you want to extract from your logs – you simply need to log it in a way that will make it easy to subsequently parse and analyze. Moreover, that analysis can often be performed with standard Linux command line tools like grep, sed, or awk.
Indeed, PHP log files are a most powerful tool that can be of tremendous benefit.
Resources
Code on GitHub: https://github.com/isanosyan/toptal-blog-logs-post-example
Appendix: Reading and Manipulating Log Files in the Unix Shell
Here is a brief intro to some of the more common *nix command line tools that you’ll want to be familiar with for reading and manipulating your log files.
- cat is perhaps the simplest one. It prints the whole file to the output stream. For example, the following command will print logfile1 to the console:
  cat logfile1
- The > character allows the user to redirect output, for example into another file. It opens the target stream in write mode (which means wiping the target’s contents). Here’s how we replace the contents of tmpfile with the contents of logfile1:
  cat logfile1 > tmpfile
- >> redirects output and opens the target stream in append mode. The current contents of the target file will be preserved, and new lines will be added to the bottom. This will append the contents of logfile1 to tmpfile:
  cat logfile1 >> tmpfile
- grep filters a file by some pattern and prints only matching lines. The command below will only print the lines of logfile1 containing the Bingo message:
  grep Bingo logfile1
- cut prints the contents of a single column (numbered starting from 1). By default, it treats tab characters as the delimiters between columns. For example, if you have a file full of timestamps in the format YYYY-MM-DD HH:MM:SS, this will allow you to print only the years:
  cut -d"-" -f1 logfile1
- head displays only the first lines of a file
- tail displays only the last lines of a file
- sort sorts lines in the output
- uniq filters out adjacent duplicate lines (which is why it is usually run after sort)
- wc counts lines, words, and bytes (or only lines when used with the -l flag)
- | (i.e., the “pipe” symbol) supplies output from one command as input to the next. Pipe is very convenient for combining commands. For example, here’s how we can find the months of 2014 that occur within a set of timestamps:
  grep -E "^2014" logfile1 | cut -d"-" -f2 | sort | uniq
Here we first match lines against the regular expression “starts with 2014”, then cut out the months. Finally, we use the combination of sort and uniq to print each occurrence only once.
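To see that pipeline in action without touching a real log, the sketch below feeds it a few timestamps through a here-document. The sample data is invented purely for illustration, and months_of_2014 is just a convenient wrapper name.

```shell
# Same pipeline as above, reading from standard input instead of logfile1.
months_of_2014() {
  grep -E "^2014" | cut -d"-" -f2 | sort | uniq
}

months_of_2014 <<'EOF'
2014-03-02 00:23:29
2013-12-31 23:59:59
2014-03-03 12:04:20
2014-11-15 08:00:00
EOF
```

This prints 03 and 11, each on its own line: the 2013 line is dropped by grep, and the two March entries collapse into one.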
Scaling Scala: How to Dockerize Using Kubernetes
Kubernetes is the new kid on the block, promising to help deploy applications into the cloud and scale them more quickly. Today, when developing for a microservices architecture, it’s pretty standard to choose Scala for creating API servers.
If there is a Scala application in your plans and you want to scale it into a cloud, then you are at the right place. In this article I am going to show step-by-step how to take a generic Scala application and implement Kubernetes with Docker to launch multiple instances of the application. The final result will be a single application deployed as multiple instances, and load balanced by Kubernetes.
All of this will be implemented by simply importing the Kubernetes source kit in your Scala application. Please note, the kit hides a lot of complicated details related to installation and configuration, but it is small enough to be readable and easy to understand if you want to analyze what it does. For simplicity, we will deploy everything on your local machine. However, the same configuration is suitable for a real-world cloud deployment of Kubernetes.
What is Kubernetes?
Before going into the gory details of the implementation, let’s discuss what Kubernetes is and why it’s important.
You may have already heard of Docker. In a sense, it is a lightweight virtual machine.
A Docker image is easy and fast to build and to duplicate, much more so than a traditional virtual machine image such as one for VMware, VirtualBox, or Xen. For these reasons, Docker is already one of the more widely used tools for deploying applications in clouds.
Kubernetes complements Docker, offering a complete environment for managing dockerized applications. By using Kubernetes, you can easily deploy, configure, orchestrate, manage, and monitor hundreds or even thousands of Docker applications.
Kubernetes is an open source tool developed by Google and has been adopted by many other vendors. Kubernetes is available natively on the Google cloud platform, but other vendors have adopted it for their cloud services too. It can be found on Amazon AWS, Microsoft Azure, RedHat OpenShift, and other cloud platforms. We can say it is well positioned to become a standard for deploying cloud applications.
Prerequisites
Now that we have covered the basics, let’s check if you have all the prerequisite software installed. First of all, you need Docker. If you are using either Windows or Mac, you need the Docker Toolbox. If you are using Linux, you need to install the particular package provided by your distribution or simply follow the official directions.
We are going to code in Scala, which is a JVM language. You need, of course, the Java Development Kit and the Scala build tool SBT installed and available in the global path. If you are already a Scala programmer, chances are you have those tools already installed.
If you are using Windows or Mac, Docker will by default create a virtual machine named default with only 1GB of memory, which can be too small for running Kubernetes. In my experience, I had issues with the default settings. I recommend that you open the VirtualBox GUI, select your virtual machine default, and change the memory to at least 2048MB.
The Application to Clusterize
The instructions in this tutorial can apply to any Scala application or project. For this article to have some “meat” to work on, I chose an example used very often to demonstrate a simple REST microservice in Scala, called Akka HTTP. I recommend you try to apply the source kit to the suggested example before attempting to use it on your application. I have tested the kit against the demo application, but I cannot guarantee that there will be no conflicts with your code.
So first, we start by cloning the demo application:
git clone https://github.com/theiterators/akka-http-microservice
Next, test if everything works correctly:
cd akka-http-microservice
sbt run
Then, open http://localhost:9000/ip/8.8.8.8, and you should see something like the following image:
Adding the Source Kit
Now, we can add the source kit with some Git magic:
git remote add ScalaGoodies https://github.com/sciabarra/ScalaGoodies
git fetch --all
git merge ScalaGoodies/kubernetes
With that, you have the demo including the source kit, and you are ready to try. Or you can even copy and paste the code from there into your application.
Once you have merged or copied the files in your projects, you are ready to start.
Starting Kubernetes
Once you have downloaded the kit, we need to download the necessary kubectl binary by running:
bin/install.sh
This installer is smart enough (hopefully) to download the correct kubectl binary for OSX, Linux, or Windows, depending on your system. Note, the installer worked on the systems I own. Please do report any issues, so that I can fix the kit.
Once you have installed the kubectl binary, you can start the whole Kubernetes stack in your local Docker. Just run:
bin/start-local-kube.sh
The first time it is run, this command will download the images of the whole Kubernetes stack, and a local registry needed to store your images. It can take some time, so please be patient. Also note that it needs direct access to the internet. If you are behind a proxy, it will be a problem, as the kit does not support proxies. To solve it, you have to configure tools like Docker, curl, and so on to use the proxy. It is complicated enough that I recommend getting temporary unrestricted access.
Assuming you were able to download everything successfully, to check if Kubernetes is running fine, you can type the following command:
bin/kubectl get nodes
The expected answer is:
NAME        STATUS    AGE
127.0.0.1   Ready     2m
Note that age may vary, of course. Also, since starting Kubernetes can take some time, you may have to invoke the command a couple of times before you see the answer. If you do not get errors here, congratulations, you have Kubernetes up and running on your local machine.
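Because the command may need to be repeated until the node is up, the retry can be scripted. The following is a minimal sketch in Python, not part of the source kit: it assumes the bin/kubectl path and the three-column output shown above, and the names all_nodes_ready and wait_for_nodes are hypothetical.

```python
import subprocess
import time


def all_nodes_ready(output: str) -> bool:
    """Return True if the table lists at least one node and every node is Ready."""
    lines = [line.split() for line in output.strip().splitlines()]
    rows = lines[1:]  # skip the NAME/STATUS/AGE header
    return bool(rows) and all(len(row) >= 2 and row[1] == "Ready" for row in rows)


def wait_for_nodes(kubectl: str = "bin/kubectl", attempts: int = 10, delay: float = 5.0) -> bool:
    """Poll `kubectl get nodes` until every node reports Ready or attempts run out."""
    for _ in range(attempts):
        result = subprocess.run(
            [kubectl, "get", "nodes"], capture_output=True, text=True
        )
        if result.returncode == 0 and all_nodes_ready(result.stdout):
            return True
        time.sleep(delay)  # Kubernetes may still be starting; wait and retry
    return False
```

Calling wait_for_nodes() from the project directory returns True as soon as the node shows up as Ready, and False if it never does within the given number of attempts.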
Dockerizing Your Scala App
Now that you have Kubernetes up and running, you can deploy your application in it. In the old days, before Docker, you had to deploy an entire server for running your application. With Kubernetes, all you need to do to deploy your application is:
- Create a Docker image.
- Push it to a registry from where it can be launched.
- Launch the instance with Kubernetes, which will take the image from the registry.
Luckily, it is way less complicated than it looks, especially if you are using the SBT build tool like many do.
In the kit, I included two files containing all the necessary definitions to create an image able to run Scala applications, or at least what is needed to run the Akka HTTP demo. I cannot guarantee that it will work with any other Scala applications, but it is a good starting point, and should work for many different configurations. The files to look for building the Docker image are:
docker.sbt
project/docker.sbt
Let’s have a look at what’s in them. The file project/docker.sbt contains the command to import the sbt-docker plugin:
addSbtPlugin("se.marcuslonnberg" % "sbt-docker" % "1.4.0")
This plugin manages the building of the Docker image with SBT for you. The Docker definition is in the docker.sbt file and looks like this:
// Tag the image for the local registry started by the kit
imageNames in docker := Seq(ImageName("localhost:5000/akkahttp:latest"))

dockerfile in docker := {
  // The application JAR produced by the package task
  val jarFile: File = sbt.Keys.`package`.in(Compile, packageBin).value
  // All dependency JARs managed by SBT
  val classpath = (managedClasspath in Compile).value
  val mainclass = mainClass.in(Compile, packageBin).value.getOrElse(sys.error("Expected exactly one main class"))
  val jarTarget = s"/app/${jarFile.getName}"
  // Classpath as seen inside the container: every dependency plus the application JAR
  val classpathString = classpath.files.map("/app/" + _.getName)
    .mkString(":") + ":" + jarTarget
  new Dockerfile {
    from("anapsix/alpine-java:8")
    add(classpath.files, "/app/")
    add(jarFile, jarTarget)
    entryPoint("java", "-cp", classpathString, mainclass)
  }
}
To fully understand this file, you need to know Docker reasonably well. However, we are not going into the details of the Docker definition file, because you do not need to understand it thoroughly to build the image: SBT will take care of collecting all the files for you.
Note the classpath is automatically generated by the following command:
val classpath = (managedClasspath in Compile).value
In general, it is pretty complicated to gather all the JAR files to run an application. Using SBT, the Dockerfile will be generated with add(classpath.files, "/app/"). This way, SBT collects all the JAR files for you and constructs a Dockerfile to run your application.
The other commands gather the missing pieces to create a Docker image. The image will be built on top of an existing image suited to running Java programs (anapsix/alpine-java:8, available on the internet in the Docker Hub). Other instructions add the remaining files needed to run your application. Finally, by specifying an entry point, we can run it. Note also that the image name starts with localhost:5000 on purpose, because localhost:5000 is where I installed the registry in the start-kube-local.sh script.
Building the Docker Image with SBT
To build the Docker image, you can ignore all the details of the Dockerfile. You just need to type:
sbt dockerBuildAndPush
The sbt-docker plugin will then build a Docker image for you, downloading all the necessary pieces from the internet, and then push it to the Docker registry on localhost that was started earlier together with Kubernetes. So, all you need to do is wait a little while to have your image cooked and ready.
Note, if you experience problems, the best thing to do is to reset everything to a known state by running the following commands:
bin/stop-kube-local.sh
bin/start-kube-local.sh
Those commands should stop all the containers and restart them correctly to get your registry ready to receive the image built and pushed by sbt.
Starting the Service in Kubernetes
Now that the application is packaged in a container and pushed to a registry, we are ready to use it. Kubernetes uses the command line and configuration files to manage the cluster. Since command lines can become very long, and to make the steps reproducible, I am using configuration files here. All the samples in the source kit are in the folder kube.
Our next step is to launch a single instance of the image. A running image is called, in the Kubernetes language, a pod. So let’s create a pod by invoking the following command:
bin/kubectl create -f kube/akkahttp-pod.yml
You can now inspect the situation with the command:
bin/kubectl get pods
You should see:
NAME                   READY     STATUS    RESTARTS   AGE
akkahttp               1/1       Running   0          33s
k8s-etcd-127.0.0.1     1/1       Running   0          7d
k8s-master-127.0.0.1   4/4       Running   0          7d
k8s-proxy-127.0.0.1    1/1       Running   0          7d
The status can actually be different, for example “ContainerCreating”; it can take a few seconds before it becomes “Running”. You can also get another status like “Error” if, for example, you forgot to create the image beforehand.
You can also check if your pod is running with the command:
bin/kubectl logs akkahttp
You should see an output ending with something like this:
[DEBUG] [05/30/2016 12:19:53.133] [default-akka.actor.default-dispatcher-5] [akka://default/system/IO-TCP/selectors/$a/0] Successfully bound to /0:0:0:0:0:0:0:0:9000
Now you have the service up and running inside the container. However, the service is not yet reachable. This behavior is part of the design of Kubernetes. Your pod is running, but you have to expose it explicitly. Otherwise, the service is meant to be internal.
Creating a Service
Creating a service and checking the result is a matter of executing:
bin/kubectl create -f kube/akkahttp-service.yaml
bin/kubectl get svc
You should see something like this:
NAME               CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
akkahttp-service   10.0.0.54                  9000/TCP   44s
kubernetes         10.0.0.1     <none>        443/TCP    3m
Note that the port can be different. Kubernetes allocated a port for the service and started it. If you are using Linux, you can directly open the browser and type http://10.0.0.54:9000/ip/8.8.8.8 to see the result. If you are using Windows or Mac with Docker Toolbox, the IP is local to the virtual machine that is running Docker, and unfortunately it is still unreachable.
I want to stress here that this is not a problem of Kubernetes, rather it is a limitation of the Docker Toolbox, which in turn depends on the constraints imposed by virtual machines like VirtualBox, which act like a computer within another computer. To overcome this limitation, we need to create a tunnel. To make things easier, I included another script which opens a tunnel on an arbitrary port to reach any service we deployed. You can type the following command:
bin/forward-kube-local.sh akkahttp-service 9000
Note that the tunnel will not run in the background: you have to keep the terminal window open for as long as you need it, and close it when you do not need the tunnel anymore. While the tunnel is running, you can open http://localhost:9000/ip/8.8.8.8 and finally see the application running in Kubernetes.
Final Touch: Scale
So far we have “simply” put our application in Kubernetes. While it is an exciting achievement, it does not add much value to our deployment yet: it merely saves us the effort of uploading and installing the application on a server and configuring a proxy server for it.
Where Kubernetes shines is in scaling. You can deploy two, ten, or one hundred instances of our application by only changing the number of replicas in the configuration file. So let’s do it.
We are going to stop the single pod and start a deployment instead. So let’s execute the following commands:
bin/kubectl delete -f kube/akkahttp-pod.yml
bin/kubectl create -f kube/akkahttp-deploy.yaml
Next, check the status with bin/kubectl get pods. Again, you may need to try a couple of times, because the deployment can take some time to complete:
NAME                                   READY     STATUS    RESTARTS   AGE
akkahttp-deployment-4229989632-mjp6u   1/1       Running   0          16s
akkahttp-deployment-4229989632-s822x   1/1       Running   0          16s
k8s-etcd-127.0.0.1                     1/1       Running   0          6d
k8s-master-127.0.0.1                   4/4       Running   0          6d
k8s-proxy-127.0.0.1                    1/1       Running   0          6d
Now we have two pods, not one, with two different names generated by the system. This is because the configuration file I provided contains the value replicas: 2. I am not going into the details of the configuration files, because the scope of the article is simply an introduction for Scala programmers to jump-start into Kubernetes.
Anyhow, there are now two pods active. What is interesting is that the service is the same as before. We configured the service to load balance between all the pods labeled akkahttp. This means we do not have to redeploy the service, but we can replace the single instance with a replicated one.
We can verify this by launching the tunnel again (if you are on Windows or Mac and you have closed it):
bin/forward-kube-local.sh akkahttp-service 9000
Then, we can open two terminal windows and watch the logs for each pod. For example, in the first, type:
bin/kubectl logs -f akkahttp-deployment-4229989632-mjp6u
And in the second, type:
bin/kubectl logs -f akkahttp-deployment-4229989632-s822x
Of course, adjust the command lines to match the pod names on your system.
Now, try to access the service with two different browsers. You should see the requests split between the available servers, like in the following image:
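The same check can also be driven from a script instead of two browsers. This is a minimal sketch in Python, assuming the tunnel from above is listening on localhost:9000; the function names service_url and send_requests are hypothetical. Each request opens a fresh connection, so the log windows should show the requests arriving at the pods, although how evenly they are spread is up to Kubernetes.

```python
import urllib.request


def service_url(ip: str, host: str = "localhost", port: int = 9000) -> str:
    """Build the URL of the demo's IP-lookup endpoint."""
    return f"http://{host}:{port}/ip/{ip}"


def send_requests(count: int = 10, ip: str = "8.8.8.8") -> list:
    """Send `count` GET requests, one connection each, and return the HTTP status codes."""
    statuses = []
    for _ in range(count):
        with urllib.request.urlopen(service_url(ip), timeout=5) as response:
            statuses.append(response.status)
    return statuses
```

With the tunnel and both log windows open, send_requests(10) produces ten lookups that can be followed in the pod logs.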
Conclusion
Today we barely scratched the surface. Kubernetes offers a lot more possibilities, including automated scaling and restart, incremental deployments, and volumes. Furthermore, the application we used as an example is very simple: it is stateless, and the various instances do not need to know about each other. In the real world, distributed applications do need to know about each other, and need to change configurations according to the availability of other servers. Indeed, Kubernetes offers a distributed keystore (etcd) to allow different applications to communicate with each other when new instances are deployed. However, this example is purposefully small and simplified to help you get going, focusing on the core functionalities. If you follow the tutorial, you should be able to get a working environment for your Scala application on your machine without being confused by a large number of details and getting lost in the complexity.
This article was written by Michele Sciabarra, a Toptal Scala developer.
Getting Started with Docker: Simplifying Devops
If you like whales, or are simply interested in quick and painless continuous delivery of your software to production, then I invite you to read this introductory Docker Tutorial. Everything seems to indicate that software containers are the future of IT, so let’s go for a quick dip with the container whales Moby Dock and Molly.
Docker, represented by a logo with a friendly looking whale, is an open source project that facilitates deployment of applications inside software containers. Its basic functionality is enabled by resource isolation features of the Linux kernel, but it provides a user-friendly API on top of them. The first version was released in 2013, and it has since become extremely popular and is widely used by many big players such as eBay, Spotify, Baidu, and more. In its last funding round, Docker landed a huge $95 million.
Transporting Goods Analogy
The philosophy behind Docker can be illustrated with the following simple analogy. In the international transportation industry, goods have to be transported by different means like forklifts, trucks, trains, cranes, and ships. These goods come in different shapes and sizes and have different storage requirements: sacks of sugar, milk cans, plants, etc. Historically, it was a painful process depending on manual intervention at every transit point for loading and unloading.
It all changed with the uptake of intermodal containers. As they come in standard sizes and are manufactured with transportation in mind, all the relevant machinery can be designed to handle them with minimal human intervention. The additional benefit of sealed containers is that they can preserve the internal environment, like temperature and humidity, for sensitive goods. As a result, the transportation industry can stop worrying about the goods themselves and focus on getting them from A to B.
And here is where Docker comes in and brings similar benefits to the software industry.
How Is It Different from Virtual Machines?
At a quick glance, virtual machines and Docker containers may seem alike. However, their main differences will become apparent when you take a look at the following diagram:
Applications running in virtual machines, apart from the hypervisor, require a full instance of the operating system and any supporting libraries. Containers, on the other hand, share the operating system with the host. The hypervisor is comparable to the container engine (represented as Docker in the image) in the sense that the engine manages the lifecycle of the containers. The important difference is that the processes running inside the containers are just like native processes on the host, and do not introduce any of the overhead associated with hypervisor execution. Additionally, applications can reuse libraries and share data between containers.
As both technologies have different strengths, it is common to find systems combining virtual machines and containers. A perfect example is a tool named Boot2Docker described in the Docker installation section.
Docker Architecture
At the top of the architecture diagram there are registries. By default, the main registry is the Docker Hub which hosts public and official images. Organizations can also host their private registries if they desire.
On the right-hand side we have images and containers. Images can be downloaded from registries explicitly (docker pull imageName) or implicitly when starting a container. Once the image is downloaded, it is cached locally.
Containers are the instances of images – they are the living thing. There could be multiple containers running based on the same image.
At the centre, there is the Docker daemon responsible for creating, running, and monitoring containers. It also takes care of building and storing images. Finally, on the left-hand side there is a Docker client. It talks to the daemon via HTTP. Unix sockets are used when the client is on the same machine as the daemon, but remote management is possible via the HTTP-based API.
Installing Docker
For the latest instructions you should always refer to the official documentation.
Docker runs natively on Linux, so depending on the target distribution it could be as easy as sudo apt-get install docker.io. Refer to the documentation for details. Normally in Linux, you prepend the Docker commands with sudo, but we will skip it in this article for clarity.
As the Docker daemon uses Linux-specific kernel features, it isn’t possible to run Docker natively in Mac OS or Windows. Instead, you should install an application called Boot2Docker. The application consists of a VirtualBox Virtual Machine, Docker itself, and the Boot2Docker management utilities. You can follow the official installation instructions for MacOS and Windows to install Docker on these platforms.
Using Docker
Let us begin this section with a quick example:
docker run phusion/baseimage echo "Hello Moby Dock. Hello Molly."
We should see this output:
Hello Moby Dock. Hello Molly.
However, a lot more has happened behind the scenes than you may think:
- The image ‘phusion/baseimage’ was downloaded from Docker Hub (if it wasn’t already in the local cache)
- A container based on this image was started
- The command echo was executed within the container
- The container was stopped when the command exited
On first run, you may notice a delay before the text is printed on screen. If the image had been cached locally, everything would have taken a fraction of a second. Details about the last container can be retrieved by running docker ps -l:
CONTAINER ID   IMAGE                      COMMAND                 CREATED         STATUS                     PORTS   NAMES
af14bec37930   phusion/baseimage:latest   "echo 'Hello Moby Do   2 minutes ago   Exited (0) 3 seconds ago           stoic_bardeen
Taking the Next Dive
As you can tell, running a simple command within Docker is as easy as running it directly on a standard terminal. To illustrate a more practical use case, throughout the remainder of this article, we will see how we can utilize Docker to deploy a simple web server application. To keep things simple, we will write a Java program that handles HTTP GET requests to ‘/ping’ and responds with the string ‘pong\n’.
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

public class PingPong {

    public static void main(String[] args) throws Exception {
        // Listen on port 8080 with the default connection backlog
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/ping", new MyHandler());
        server.setExecutor(null); // use the default executor
        server.start();
    }

    static class MyHandler implements HttpHandler {
        @Override
        public void handle(HttpExchange t) throws IOException {
            byte[] response = "pong\n".getBytes(StandardCharsets.UTF_8);
            // The length passed here must be the byte count of the body
            t.sendResponseHeaders(200, response.length);
            try (OutputStream os = t.getResponseBody()) {
                os.write(response);
            }
        }
    }
}
Dockerfile
Before jumping in and building your own Docker image, it’s a good practice to first check if there is an existing one in the Docker Hub or any private registries you have access to. For example, instead of installing Java ourselves, we will use an official image: java:8.
To build an image, first we need to decide on a base image we are going to use. It is denoted by the FROM instruction. Here, it is an official image for Java 8 from the Docker Hub. We are going to copy our Java file into the image by issuing a COPY instruction. Next, we are going to compile it with RUN. The EXPOSE instruction denotes that the image will be providing a service on a particular port. ENTRYPOINT specifies the command to execute when a container based on this image is started, and CMD indicates the default parameters we are going to pass to it.
FROM java:8
COPY PingPong.java /
RUN javac PingPong.java
EXPOSE 8080
ENTRYPOINT ["java"]
CMD ["PingPong"]
After saving these instructions in a file called “Dockerfile”, we can build the corresponding Docker image by executing:
docker build -t toptal/pingpong .
The official documentation for Docker has a section dedicated to best practices for writing Dockerfiles.
Running Containers
When the image has been built, we can bring it to life as a container. There are several ways we could run containers, but let’s start with a simple one:
docker run -d -p 8080:8080 toptal/pingpong
where -p [port-on-the-host]:[port-in-the-container] maps a port on the host to a port in the container. Furthermore, we are telling Docker to run the container as a daemon process in the background by specifying -d. You can test if the web server application is running by attempting to access ‘http://localhost:8080/ping’. Note that on platforms where Boot2Docker is being used, you will need to replace ‘localhost’ with the IP address of the virtual machine where Docker is running.
On Linux:
curl http://localhost:8080/ping
On platforms requiring Boot2Docker:
curl $(boot2docker ip):8080/ping
If all goes well, you should see the response:
pong
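The same check can be scripted, for example as a smoke test after starting the container. The sketch below is in Python only to keep it short; is_pong and check_ping are hypothetical helper names, and the default base URL assumes the port mapping used above (on Boot2Docker, pass the virtual machine’s IP address instead of localhost).

```python
import urllib.request


def is_pong(status: int, body: bytes) -> bool:
    """Return True for the response the PingPong server is expected to give."""
    return status == 200 and body == b"pong\n"


def check_ping(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> bool:
    """Request /ping from a running container and verify the answer."""
    with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as response:
        return is_pong(response.status, response.read())
```

Calling check_ping() while the container is running should return True; a connection error means the container is not reachable at that address.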
Hurray, our first custom Docker container is alive and swimming! We could also start the container in interactive mode with -i -t. In our case, we will override the entrypoint command so we are presented with a bash terminal. Now we can execute whatever commands we want, but exiting the container will stop it:
docker run -i -t --entrypoint="bash" toptal/pingpong
There are many more options available for starting containers. Let us cover a few more. For example, if we want to persist data outside of the container, we could share the host filesystem with the container by using -v. By default, the access mode is read-write, but it can be changed to read-only by appending :ro to the intra-container volume path. Volumes are particularly important when we need to use sensitive information such as credentials and private keys inside the containers, which shouldn’t be stored in the image. Additionally, volumes can prevent the duplication of data, for example by mapping your local Maven repository into the container to save you from downloading the Internet twice.
Docker also has the capability of linking containers together. Linked containers can talk to each other even if none of the ports are exposed. This can be achieved with --link other-container-name. Below is an example combining the parameters mentioned above:
docker run -p 9999:8080 \
    --link otherContainerA --link otherContainerB \
    -v /Users/$USER/.m2/repository:/home/user/.m2/repository \
    toptal/pingpong
Unsurprisingly, the list of operations that one could apply to the containers and images is rather long. For brevity, let us look at just a few of them:
- stop – Stops a running container.
- start – Starts a stopped container.
- commit – Creates a new image from a container’s changes.
- rm – Removes one or more containers.
- rmi – Removes one or more images.
- ps – Lists containers.
- images – Lists images.
- exec – Runs a command in a running container.
The last command can be particularly useful for debugging purposes, as it lets you connect to a terminal of a running container:
docker exec -i -t <container-id> bash
Docker Compose for the Microservice World
If you have more than just a couple of interconnected containers, it makes sense to use a tool like docker-compose. In a configuration file, you describe how to start the containers and how they should be linked with each other. Irrespective of the number of containers involved and their dependencies, you could have all of them up and running with one command: docker-compose up.
Docker in the Wild
Let’s look at three stages of project lifecycle and see how our friendly whale could be of help.
Development
Docker helps you keep your local development environment clean. Instead of having multiple versions of different services installed such as Java, Kafka, Spark, Cassandra, etc., you can just start and stop a required container when necessary. You can take things a step further and run multiple software stacks side by side avoiding the mix-up of dependency versions.
With Docker, you can save time, effort, and money. If your project is very complex to set up, “dockerise” it. Go through the pain of creating a Docker image once, and from this point everyone can just start a container in a snap.
You can also have an “integration environment” running locally (or on CI) and replace stubs with real services running in Docker containers.
Testing / Continuous Integration
With Dockerfile, it is easy to achieve reproducible builds. Jenkins or other CI solutions can be configured to create a Docker image for every build. You could store some or all images in a private Docker registry for future reference.
With Docker, you only test what needs to be tested and take environment out of the equation. Performing tests on a running container can help keep things much more predictable.
Another interesting feature of having software containers is that it is easy to spin up slave machines with an identical development setup. This can be particularly useful for load testing of clustered deployments.
Production
Docker can be a common interface between developers and operations personnel, eliminating a source of friction. It also encourages the same image/binaries to be used at every step of the pipeline. Moreover, being able to deploy a fully tested container without environment differences helps to ensure that no errors are introduced in the build process.
You can seamlessly migrate applications into production. Something that was once a tedious and flaky process can now be as simple as:
docker stop container-id; docker run new-image
And if something goes wrong when deploying a new version, you can always quickly roll back or switch to another container:
docker stop container-id; docker start other-container-id
… guaranteed not to leave any mess behind or leave things in an inconsistent state.
Summary
A good summary of what Docker does is included in its very own motto: Build, Ship, Run.
- Build – Docker allows you to compose your application from microservices, without worrying about inconsistencies between development and production environments, and without locking into any platform or language.
- Ship – Docker lets you design the entire cycle of application development, testing, and distribution, and manage it with a consistent user interface.
- Run – Docker offers you the ability to deploy scalable services securely and reliably on a wide variety of platforms.
Have fun swimming with the whales!
Part of this work is inspired by an excellent book Using Docker by Adrian Mouat.
This article was written by RADEK OSTROWSKI, a Toptal Java developer.
The Role of WebRTC Technology In Online Security
WebRTC technology is rather new (spearheaded by Google in 2012 through the World Wide Web Consortium). It is a free project that provides browsers with Real-Time Communications. The technology is now widely used in live help customer support solutions, webinar platforms, chat rooms for dating, etc. Yet there are surprisingly few solutions that use it for enhanced security, even though the technology offers great opportunities in this field.
WebRTC opens great opportunities in secure communications online
WebRTC uses the peer-to-peer method to create a communication channel between subscribers. No data is transferred to any server along the way, which is a great advantage because it ensures the confidentiality of the transmitted information.
The majority of modern communication services work through a central server. This means that the entire history is stored on the server and third parties can gain access to it.
Using WebRTC technology, the security provider Privatoria.net developed a solution for confidential online communication in 2013. The main difference is the absence of data transfer to the server: only the subscribers’ web browsers are used.
The chat service lets users exchange messages by establishing a direct, peer-to-peer connection between their browsers.
To create a communication channel between subscribers, it is enough to get a one-time key and pass it to the called subscriber by any available means of communication. Once the communication session is over, the history is deleted, and the browser is closed, all correspondence between the subscribers disappears from the system.
In that case, no one can gain access to the content of the communication.
A user will benefit from:
- Secure text messaging
- Secure Voice Call
- Secure Video Call
- Secure Data Transfer
As not all browsers support WebRTC, the Secure Chat solution works only in Google Chrome, Opera, and Mozilla Firefox. The developers are currently working on a beta application for Android, which will be available in the Google Play Market within the next month.
This gives all of us a good chance to communicate securely online today.
The Basics of Secure Web Development
The internet has contributed a great deal to commerce around the world in the last decade, and with a whole new generation of people breaking into the online world, we’re starting to see just what computers are capable of accomplishing, particularly when there is malicious intent on the other side of the keyboard.
Hackers and crackers are one of the biggest threats the world has ever experienced; they can take your money, your products or even destroy your business from the inside out – and you’ll never see them do it, as they might not leave a trace at all. That is the terrifying truth about the internet: in most cases, those with the skills to take what they want also have the skills to hide themselves from detection. So what can you do to stop them?
The easiest way of protecting your website is to ensure that your business has a securely developed website. Secure web development is a complex area, and most likely something you will need professional help to implement fully. It is worth noting, though, that there are three different levels of security to take into consideration for your website, and thus three different levels that need to be securely developed in order to protect your business.
Consider these levels almost like doors. If your website were a business property, you would have three ways into the top secret bits: a front door, a side door and a back door.
The front door is the user interface: the bit of the website that you yourself work with. Now, the web developer might have made you a big magnificent door, lovely and secure – the sort of user interface that lets you manage your stock, orders, customers and all of the individual aspects of your business effortlessly without giving anything up. However, if your passwords aren’t secure, it’s the equivalent of putting a rubbish, rusty old lock on that lovely secure door – completely pointless and insecure. Easy access. This is the first place a hacker is going to look – why would they waste their time hunting down and trying to exploit tiny weaknesses in the back door if they could open the front door with one little shove?
Change your passwords regularly, select passwords that use upper case, lower case, numbers and punctuation. Do not use the same password for everything.
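As an illustration of the password advice above, here is a minimal sketch in Python of a check a sign-up form might run. The function name meets_password_policy and the minimum length of 12 characters are assumptions made for this example, not recommendations taken from the article.

```python
import string


def meets_password_policy(password: str, min_length: int = 12) -> bool:
    """Check for upper case, lower case, digits, punctuation, and a minimum length."""
    return (
        len(password) >= min_length  # min_length is an assumed threshold
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )
```

A check like this only covers the mix of characters; it cannot tell whether the same password is being reused elsewhere.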
The side door is the programming. The code used to construct your website puts everything in place and says who can do what and when; everything is controlled with the code, so an opening here can cause big problems if a hacker finds it. There are a number of different potential security risks when it comes to the code. First, there are bugs: little faults with the website that occur when something didn’t go quite as planned or something was missed in the development stage. They always happen, and there isn’t a single piece of software that doesn’t have bugs; the secure ones are just those that resolve the bugs as soon as they’re found, which stops them from being exploited.
Another risk to that side door is an injection, which is sort of like a fake key. This is something some of the smarter hackers can accomplish by injecting their own instructions into your system when it sends off a command or query; in effect, they can intercept your command or query. For example, let’s say you perform a simple query from PHP that fetches the products from the database when your user selects a product category. Normally this sort of script would be accessed through the URL with a category ID.
Let’s say you ran a regular SQL SELECT query filtered on the category ID taken from the URL. The tail of the query string might look something like:
c.category_id = ' . $_GET['cat'] . ' LIMIT 10';
Now, obviously this example suggests that the clever programmer has included a limit to prevent what is going to happen next, but this won’t protect him. Poor clever programmer is about to be outsmarted.
First of all, the only thing the hacker needs to do is find your product list page and ask for everything, for example:
Yourwebsite.com/productlist.php?cat=1 or 1=1--
Doesn’t look like anything special, right? Well, with this alone the hacker can now see every single one of your products. Depending on how secure your website is, this might let them find faults in the products, but it’s probably still not that dangerous, right? Well, what if they did this:
/productlist.php?cat=1 or error!;#$
Yep, bet you’re horrified now, because the broken query will typically produce an error message that reveals which DBMS and version you are running, and sometimes exposes your table and column names. Not dangerous enough for you? Once the tables and columns are revealed, the hacker can move on to attacking the user table, all thanks to exploiting a weakness in the products query.
/productlist.php?cat=1 and (select count(*) from users) > 0
Creating a new query inside the existing one means that they don’t need to set up their own database connection; they’re using yours. They have access to your database now, and they’re using it to find your user table, which can progress to finding out how many users you have and even reading the information within the user table. I’m quite sure I don’t need to spell out why having access to your user database is such a bad thing.
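To make the mechanics concrete, here is a rough sketch of the three requests above hitting a query built by string concatenation. It is only an illustration: an in-memory SQLite database stands in for the real one, and the table names, column names and the findProductsUnsafe() helper are all made up for the demo.

```php
<?php
// Illustrative only: in-memory SQLite stands in for the real database,
// and every table, column and function name here is hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER, name TEXT)');
$db->exec("INSERT INTO products (category_id, name) VALUES (1, 'Kettle'), (1, 'Toaster'), (2, 'Lamp'), (3, 'Desk')");
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)');
$db->exec("INSERT INTO users (email) VALUES ('someone@example.com')");

// VULNERABLE: the request value is pasted straight into the SQL text.
function findProductsUnsafe(PDO $db, string $cat): array
{
    $sql = 'SELECT name FROM products c WHERE c.category_id = ' . $cat . ' LIMIT 10';
    return $db->query($sql)->fetchAll(PDO::FETCH_COLUMN);
}

// What the page expects: only the products in category 1.
$honest = findProductsUnsafe($db, '1');

// "or 1=1" is always true, so the category filter matches every row.
$everything = findProductsUnsafe($db, '1 or 1=1');

// The subquery only runs without an error if a "users" table exists,
// so getting results back confirms the attacker's guess.
$probe = findProductsUnsafe($db, '1 and (select count(*) from users) > 0');

// Deliberately broken input: the database's own error message leaks out.
try {
    findProductsUnsafe($db, '1 or error!;#$');
    $leak = '';
} catch (PDOException $e) {
    $leak = $e->getMessage();
}

printf("honest: %d, everything: %d, probe: %d\n", count($honest), count($everything), count($probe));
```

In each case the attacker never touches your code or your connection; they only change the text that your own code glues into the query.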
So, if you want to avoid injections you need to validate every bit of input data, reduce the amount of information shown when an error occurs, limit the database permissions so that PHP queries can’t pull any more information than they absolutely need, and use parameters in your queries.
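As a minimal sketch of what that advice can look like in practice, the snippet below validates the category ID, binds it as a parameter with PDO, and keeps error details out of the page. It assumes PDO is available; the in-memory SQLite database, the table layout and the findProductsByCategory() helper are hypothetical stand-ins, not a finished implementation.

```php
<?php
// Keep error details out of the page and write them to the log instead.
ini_set('display_errors', '0');
ini_set('log_errors', '1');

// Illustrative only: in-memory SQLite and hypothetical table/column names.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER, name TEXT)');
$db->exec("INSERT INTO products (category_id, name) VALUES (1, 'Kettle'), (1, 'Toaster'), (2, 'Lamp')");

// Hypothetical helper: validate first, then bind the value as a parameter.
function findProductsByCategory(PDO $db, $rawCategory): array
{
    // Validate: accept only a positive whole number, reject everything else.
    $categoryId = filter_var(
        $rawCategory,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );
    if ($categoryId === false) {
        return [];
    }

    // Parameterise: the value travels separately from the SQL text,
    // so it can never be read as part of the query itself.
    $stmt = $db->prepare('SELECT name FROM products WHERE category_id = :cat LIMIT 10');
    $stmt->bindValue(':cat', $categoryId, PDO::PARAM_INT);
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$normal = findProductsByCategory($db, '1');        // an ordinary request
$attack = findProductsByCategory($db, '1 or 1=1'); // rejected by validation

printf("normal: %d, attack: %d\n", count($normal), count($attack));
```

On a real page the second argument would be something like $_GET['cat'] ?? ''. Limiting database permissions happens on the database side (for example, a login that can only read the products table), so it isn’t shown here.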
Finally, the back door. This is the server. You need to ensure that the server you use to host your information and website is secure. There have been a number of cases where highly secure websites were eventually hacked by first hacking a much lower security website that shared the host server. If you want to avoid this you can consider a dedicated server for your website. You should also consider keeping to hosting companies that offer support and security as part of the hosting package. Ask them what software their servers are running; this will give you an idea of how regularly they are updated, and up-to-date servers are the most secure. Older software has had longer to be exploited, so more of its weaknesses are already known to hackers.
Kate Critchlow is a young and enthusiastic writer with a particular interest in technology, covering everything from secure development to the latest gadget releases.