Wednesday, April 12, 2017

Docker Container DNS Problem - The Fix

The DNS Problem

I recently ran into a problem with my containers not being able to resolve names to IP addresses. I found this problem when I was installing Jenkins in a container. When I got to the step to install plugins, Jenkins said it was not connected to the internet. So I did a couple of checks.

First I got the Container ID from docker.
# docker ps
CONTAINER ID        IMAGE                                                                                 COMMAND                  CREATED             STATUS                  PORTS                 NAMES
91aa53f4642a        jenkins@sha256:c0cac51cbd3af8947e105ec15aa4bcdbf0bd267984d8e7be5663b5551bbc5f4b   "/bin/tini -- /usr..."   5 hours ago         Up 5 hours
Notice the Container ID is 91aa53f4642a. Now you can attach to the container and run any system command you want.
# docker exec -it 91aa53f4642a /bin/bash
jenkins@91aa53f4642a:/$ ping www.google.com
The ping command came back with a "not found" error. Next I checked whether I could actually reach an external IP address.
jenkins@91aa53f4642a:/$ ping 8.8.8.8
When I ran this I got replies from the remote host. So I had internet connectivity, but no name resolution.
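A quick way to confirm what DNS configuration the container actually received is to look at the resolv.conf that Docker generated for it. This is just a diagnostic step; what you see will depend on your host setup.
# docker exec -it 91aa53f4642a cat /etc/resolv.conf
If there is no usable external nameserver listed here, name resolution inside the container will fail even though raw IP connectivity works.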

The Fix

Turns out this is a known problem with Docker 1.11 and on. When the resolv.conf is created for the container, it does its best to handle inter-container name and service resolution, but it does not handle external name resolution. To make this happen, the host machine of the Docker container must start dockerd with the --dns option set to an external DNS server like 8.8.8.8.
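As a side note, if you only need to fix a single container rather than every container on the host, docker run also accepts a --dns flag. A quick sketch, with the image name and port mapping purely illustrative:
# docker run --dns=8.8.8.8 -p 8082:8080 jenkins
The rest of this post covers the host-wide fix, which is what you want when many containers are affected.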

First you have to find out how dockerd is getting started. If you are using Linux, this is probably managed by systemd. For CentOS you can find it by using the systemctl command.
# systemctl show docker.service | grep Fragment
FragmentPath=/usr/lib/systemd/system/docker.service
Now look at the docker.service file and find the ExecStart line.
/usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target firewalld.service
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd 
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
#TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
[Install]
WantedBy=multi-user.target
Now edit this file and change the ExecStart to include the --dns option.
ExecStart=/usr/bin/dockerd --dns=8.8.8.8 
This tells all of the containers started on this host to use 8.8.8.8 as the secondary DNS server after the inter-container DNS service. Now that you have made the change, you need to reload systemd and restart the Docker daemon.
# systemctl daemon-reload
# systemctl restart docker.service
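Alternatively, instead of editing the systemd unit file, recent Docker versions let you set DNS servers in /etc/docker/daemon.json. I haven't tested this on every version, so treat it as an alternative sketch and check that your Docker version supports it:
# cat /etc/docker/daemon.json
{
  "dns": ["8.8.8.8"]
}
# systemctl restart docker.service
Either way, the daemon has to be restarted to pick up the change.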
That is all you need to do. Now you can check things out by running ping in the container again.
# docker exec -it 91aa53f4642a /bin/bash
jenkins@91aa53f4642a:/$ ping www.google.com
Hope this helps you out with this problem.



DWP

Tuesday, April 11, 2017

Fault-Tolerant Jenkins with Docker Swarm

Installing Docker Swarm

I have chosen to use CentOS 7 for my cluster of machines. So these instructions are for CentOS 7.
First I need to install docker on all of the machines in the cluster.

Set up yum so it can see the latest docker packages.
# sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

Next install docker onto each machine in your cluster.
# sudo yum install docker-ce
Once you have installed docker on every node in your cluster you can now set up your swarm. First you have to choose which machines will be your manager(s).

On one of the managers you need to initialize the swarm.
# docker swarm init
If your machine has more than one network, you will need to specify the IP address to use for the manager.
# docker swarm init --advertise-addr 172.16.0.100
Swarm initialized: current node (a2anz4z0mpb0vmcly5ksotfo1) is now a manager.
To add a worker to this swarm, run the following command:
    docker swarm join \
        --token SWMT....wns 172.16.0.100:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.


In this example the IP address of the swarm manager is 172.16.0.100. Now on each worker node I just run the join command as given in the output of the init command.
# docker swarm join --token SWMT...wns 172.16.0.100:2377
If you want to add another master then you run the command
# docker swarm join-token manager
It will tell you exactly what you need to do.
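At this point it is worth sanity-checking the cluster from a manager node. Every node you joined should show up with a Ready status:
# docker node ls
If a worker is missing here, re-run the join command on that node and check that port 2377 on the manager is reachable from the worker.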

Setting up Jenkins in your Swarm

Now this is the easy part. Sort of. With Docker swarm and services this has just gotten much easier. There are several Docker images available with Jenkins already installed, so it is best if we just use one of them. The most popular is "jenkins". Go figure. With the image name in hand, all we need to do is start a service in the swarm. We can simply write a small compose file and we will be set.
# docker-jenkins.yaml
version: '3'
services:
  jenkins:
    image: jenkins
    ports:
      - "8082:8080"
      - "50000:50000"
    environment:
      JENKINS_OPTS: --prefix=/jenkins
    deploy:
      placement:
        constraints: [node.role == manager]
    volumes:
      - $PWD/docker/jenkins:/var/jenkins_home
There are a couple of things to note. 
  • Ports - First, the ports are mapped to 50000 and 8082. These are external ports and will be accessible outside of the container.
  • Environment - You can set any Jenkins options on this line and on any following environment lines.
  • Volumes - This gives us the ability to "mount" a directory from the host machine into the container, so if the container goes down we still have our Jenkins installation. You will need to create the directory using
# mkdir -p ~/docker/jenkins && chmod 777 ~/docker/jenkins
If you don't do this, you will have problems with Jenkins coming up.

Now it is time to actually start the service.
# docker stack deploy -c docker-jenkins.yaml build
Creating network build_default
Creating service build_jenkins

Two things were created when the deploy was run: a default network "build_default" and the service "build_jenkins". Notice that all of the artifacts created begin with "build_". The default network is created when a network is not specified.
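You can confirm that the service came up by listing the services in the stack. The REPLICAS column should read 1/1 once the container is running:
# docker stack services build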

Now you should be able to access the jenkins web site at
http://172.16.0.100:8082/jenkins

Jenkins now requires a password when you install. You can find the password in the secrets directory in the docker/jenkins base directory.
# cat ~/docker/jenkins/secrets/initialAdminPassword
asldfkasdlfkjlasdfj23iwrh

Cut and paste this into the setup page in your browser and you will be set and ready to go.

Debugging Tools

Here are a couple of things I found useful when I was setting up the environment.

# docker ps
CONTAINER ID        IMAGE                                                                             COMMAND                  CREATED             STATUS              PORTS                 NAMES
91aa53f4642a        jenkins@sha256:c0cac51cbd3af8947e105ec15aa4bcdbf0bd267984d8e7be5663b5551bbc5f4b   "/bin/tini -- /usr..."   5 hours ago         Up 5 hours          8080/tcp, 50000/tcp   build_jenkins.1.abu55c8tybjwrsd35ouaor1d2

This shows the containers that are currently running, including the containers that are running the services. I found that some of the containers never started up, so I was trying to find out what happened. I ran the following command:
# docker service ps build_jenkins
ID            NAME                 IMAGE           NODE               DESIRED STATE  CURRENT STATE        ERROR                      PORTS
abu55c8tybjw  build_jenkins.1      jenkins:latest  node0.intel.local  Running            Running 5 hours ago
nac73zp1gc68   \_ build_jenkins.1  jenkins:latest  node0.intel.local  Shutdown           Failed 5 hours ago   "task: non-zero exit (1)"
xyrmzvx1pnnp   \_ build_jenkins.1  jenkins:latest  node0.intel.local  Shutdown           Failed 5 hours ago   "task: non-zero exit (1)"
phycp5ypp61o   \_ build_jenkins.1  jenkins:latest  node0.intel.local  Shutdown           Failed 5 hours ago   "task: non-zero exit (1)"
o3ewixv3hvcy   \_ build_jenkins.1  jenkins:latest  node0.intel.local  Shutdown           Failed 5 hours ago   "task: non-zero exit (1)"
This shows the tasks for the service and their status, including earlier attempts that failed before the containers came up.
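If the task list alone does not explain a failure, inspecting the service definition can help. The --pretty flag gives human-readable output instead of JSON:
# docker service inspect --pretty build_jenkins
A likely culprit for repeated "non-zero exit" failures like the ones above is the jenkins_home directory permissions mentioned earlier, which is exactly the kind of thing this output helps you rule in or out.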

Friday, April 7, 2017

KubeCon 2017 Europe

KubeCon was held in Berlin this spring. As this is a developer-focused conference, it was most definitely a tee-shirt conference. Intel had a small booth where we ran continuous demos of Secure Clear Containers and Kubernetes Federation, and Intel was a Diamond Sponsor of the event. The big announcement was the release of Kubernetes 1.6 with its added features:

  • Rolling updates with DaemonSets
  • Beta release of Kubernetes Federation
  • Improved networking functionality and tools
  • Improved scheduling
  • Storage Improvements
  • New kubeadm tool for enterprise customers.

The biggest buzz around the show was default networking, storage, and security. Typically Kubernetes chooses configurability over convention, which leads to longer setup time and variability in deployments, especially around networking and storage. Security is a hot topic with all container technologies, not just Kubernetes.

One of the biggest complaints about Kubernetes is that it is hard to get up and running, especially around network configuration. With 1.6, some network aspects come configured out of the box. For example, etcd comes installed and configured (service discovery), CNI is now integrated with CRI by default, and a standard bridge plugin has been validated with the combination. This decreases the setup time and variability seen in previous releases. These are welcome changes in the distro.

Another big issue with Kubernetes, and containers in general, is the lack of storage support. Kubernetes is taking a cue from OpenStack here and is supporting more Software Defined Storage options. Kubernetes gives you the ability to plug in to Ceph, Swift, Lustre, and other basic storage sub-systems, but the project is not planning on supporting a storage solution itself. The announcement at KubeCon was an increased focus on Persistent Volumes. It will be interesting to see how a focus in this area will change the community from compute focused to complete-solution focused. Time will tell if it takes.

I worked the booth for two days and attended sessions that were standing room only, and it was good to interact with developers and hear their problems and concerns about working in the data center. There was interest in the Kubernetes Federation demo, which was somewhat problematic but gave plenty of talking points. The Secure Clear Containers demo got lots of traffic and buzz. Many of the conversations were around security, as it is still a major problem with containers in general, and everyone was looking for what was available in the security area.

On a personal note, I got the opportunity to meet a long-lost cousin from the Pulsipher/Pulsifer side of my family. He was excited to see another Pulsipher and had thought he was the last of his family out there. It was fun to share family stories, and he got to hear about our common ancestor who came to the Americas in the 1640s. He is also a great technical contact, as he works for Spotify as the Director of Security in their data center.

DWP

Sunday, April 2, 2017

Moving Docker Compose to Docker Stack

Ok. I am finally moving from Docker Compose to Docker Stack. It has been a while since I updated my Docker environment, and I am very happy with the direction that Docker has moved. They have moved in the direction that I personally have been promoting. Check out my blog on services in multiple environments: Multiple Environment Development.

Swarm Concepts

The first thing I did was read up on the changes in concepts between compose and stack. Docker introduced the new concepts of Stack, Service, and Task. The easiest way to think of it is that a Stack consists of several services, networks, and volumes. A Stack can represent a complex application that has multiple services.

A Service can have a set of replicas, each consisting of an image running in a container and tasks that are run on the container. A Service has state. This is where things are different between compose and stack. Compose launches all of the containers, runs the tasks, and then forgets about them. Stack keeps track of the state of the containers even after they have been launched. This means that if a container that correlates with a service goes down, Swarm will launch another one in its place based on policies. Basically, your application will be kept up by Swarm: built-in HA, load balancing, and business continuity.

When you specify a service you specify:

  • the port where the swarm will make the service available outside the swarm
  • an overlay network for the service to connect to other services in the swarm
  • CPU and memory limits and reservations
  • a rolling update policy
  • the number of replicas of the image to run in the swarm
Notice the word Swarm here. You must have a docker swarm before you use services and stacks. 
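To make that list concrete, here is a sketch of how those settings map onto a version 3 compose file. The service name, network name, image, and numbers are all made up for illustration:
version: '3'
services:
  web:
    image: mycompany/web:1.0
    ports:
      - "80:80"
    networks:
      - backend
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '0.50'
          memory: 256M
      update_config:
        parallelism: 1
        delay: 10s
networks:
  backend:
    driver: overlay
Each bullet above maps to a key here: ports, networks, resources, update_config, and replicas.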

Practical differences

Compose files can be used for Stack deployments. But there are a couple of things to watch out for.
  • "buid" is not supported in stack you have to build with docker build
  • "external_links" is not supported in stack this is covered by links and external hosts.
  • "env_file" is not supported in stack you have to specify each environment variable with "environment"
Wow! That was a problem for me, because my compose file had build directives for my project and used env_file to pass in environment variables. Now I had to make changes to get things working the way they did before.

"build" Alternative

Simply put, stack services only take images. That means you must build your image before you deploy or update your stack. So instead of just specifying the build in the service definition, you must call docker build before calling docker stack deploy.

File: docker-compose.yaml

etsy-web:
  build: .
  expose:
    - 80
    - 8080
    - 1337
  links:
    - etsy-mongo
    - etsy-redis
  ports:
    - "1337:1337"
    - "80:80"
  command: npm start 

And to launch my containers then I just call

# docker-compose up

To make this work properly, we need to remove the build line above and replace it with an image key.

etsy-web:
  image: etsy-web
  expose:
    - 80
    - 8080
    - 1337
  links:
    - etsy-mongo
    - etsy-redis
  ports:
    - "1337:1337"
    - "80:80"
  command: npm start

Then you have to build and tag the etsy-web image first, and then deploy the stack (here I name the stack "etsy"):

# docker build -t etsy-web .
# docker stack deploy --compose-file docker-compose.yaml etsy

So it is that easy. Change one key in your yaml file and you can be up and running.
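One caveat worth flagging: docker build only puts the image on the node where you ran it. In a single-node swarm that is fine, but in a multi-node swarm the other nodes cannot pull a local-only image, so in practice you tag and push the image to a registry that every node can reach. The registry address below is a placeholder:
# docker build -t myregistry.example.com/etsy-web:1.0 .
# docker push myregistry.example.com/etsy-web:1.0
Then reference the full image name in the compose file instead of the bare etsy-web.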

env_file alternative

With stack you specify the environments using the environment tag.
This can be using dictionary or array formats.

Dictionary

environment: 
  PASSWORD: qwerty
  USERNAME: admin

or Array

environment:
  - PASSWORD=qwerty
  - USERNAME=admin

Also take note that the environment variables in the docker-compose.yaml file override any environment variables defined in the Dockerfile for the container. Additionally, environment variables can be passed on the command line when calling "docker stack". These environment variables override the environment variables in both the docker-compose.yaml file and the Dockerfile.
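To make the precedence concrete, suppose the image's Dockerfile sets a default (a made-up example):
ENV USERNAME=guest
and the docker-compose.yaml sets:
environment:
  USERNAME: admin
The running container will see USERNAME=admin; the compose file wins over the Dockerfile.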

external_links alternative

"stack" uses link and external hosts to establish service names that are looked up when the containers are launched. This is a change from before when a changes to /etc/hosts was changed for each container to establish container connectivity. See Docker Service Deiscovery.


Benefits of Stack

Even though there are changes to some of the yaml file format and some additional command line options, the benefits of having services and stacks over bare containers are huge. I can now have a managed stack that keeps my services up and running based on policies that I have established for each service.

DWP

Cloud Aware Application Development in Multiple Environments.

With the shift from traditional Client Server application software to Cloud Aware applications, many software engineers have found themselves dusting off old system administration books from college. With multiple services running on multiple machines or containers, software engineers have to be able to manage their applications across more and more complex environments. As I have been talking to some of my customers, I have found common pain points in managing these complex applications:
  • Consistency between environments
  • Single point of failure services
  • Differing environment requirements  (Not all environments are created equal)
  • Managing multiple environments across multiple clouds
All of these factors and many more can lead to time wasted, applications being released into production before their time, or worst of all unhappy software engineers.

DevOps to the rescue?

Wouldn't it be nice if software engineers just worried about their application and its code, instead of all of the environments that it has to run on? In some places that is exactly what happens. Developers develop on their local laptops or in a development cloud, then check in their code and it moves to production. DevOps cleans up any problems with applications by untangling single-instance bottlenecked services, out-of-sync versions of centralized services, or adding load balancing services to the front end or back end of the application. The app developers have no clue what mess they have caused with their code changes or with a new version of a service that they are using. Somehow we need to make sure that the application developer stays connected to the application architecture but disconnected from the complexity of managing multiple environments.

Single Definition Multiple Environments

Working on my Local machine

One approach that I have been looking at is having the ability to define my application as a set of service templates. In this example I have a simple Node JS application that uses Redis and MongoDB. If I use a YAML format, it might look something like this.

MyApp:
  Services:
    web: NodeJS
      ports: 80
      links: mqueue, database
    mqueue: Redis
      ports: 6789
    database: MongoDB
      ports: 25678, 31502


So with this definition I would like to deploy my application on my local box, using VirtualBox. I put this yaml file in the home directory of the application. This should be very familiar to those of you who have used docker-compose. Now I should be able to launch my application on my local machine using a command similar to docker-compose.
$ caade up
After a couple of minutes my multi-service application is running on my local laptop.
I can change the application code and even make changes to the services that I need to work with.

Working in a Development Cloud

Now that I have it running on my laptop I want to make sure that I can run it in a cloud. Most organizations work with development clouds. Typically development clouds are not as big as production and test clouds but give the developer a good place to try out new code and debug problems found in production and test environments. Ideally the developer should use the same application definition and just point to another environment to launch the application.
$ caade up --env=Dev
This launches the same application in the development environment, which could be an OpenStack, VMware, or Kubernetes based SDI solution. The developer really does not care how the infrastructure gets provisioned, just that it is done quickly and reliably. On quick inspection we see a slight difference in the services that are running in the development cloud: there is another instance of the NodeJS service running. This comes from the service definition of the NodeJS service, which is defined to have multiple instances in the development cloud and only one instance in the local environment.

NodeJS.yml - Service Definition
NodeJS:
  Local:
    web:
      image: node-3.0.2
      port: 1337
  Dev:
    web:
      image: node-3.0.2
      port: 1337
    worker:
      image: node-3.0.2
      port: 1338
      cardinality: 3
  Test: …
  Prod: …
This definition is produced by the service and stack developer, not the application developer. So the service can be reused by several developers and can be defined for different environments (Local, Dev, Test, & Production). This ensures that services are defined for the different requirements of the environments. For example, the production NodeJS might have an NGINX load balancer on the front end for serving up NodeJS web services for each user logged in. The key is that this is defined for the Service that is reused. This increases reusability and quality at the same time.

Working in the Test Cloud

Now that I have tried my application in the development cloud, it is time to run it through a series of tests before it gets pushed to production. This is just as easy for the developer as working in the development cloud.
$ caade up --env=Test
$ caade run --env=Test --exec runTestSuites
We launch the environment and then run the test suites in that environment. When the environment launches, you can see additional instances of the same services we saw in the development cloud. Additionally, there is a new service running in the environment: the Perf Monitor service, which monitors the performance of the services while the tests are running. Where did the definition of this service come from? It came from the application stack definition. This definition, just like the service definition, allows the application to have a different service landscape for each environment. But the software developer still sees them as the same. That is to say, code should not change based on the environment that is running the application. This decouples the application from the environment and frees up the software developer to focus on code and not environments.

What about Production 

The ultimate goal, of course, is to get the application into production. Some organizations, the smart ones, don't let developers publish directly into production without some gates to pass through. So instead of just calling "caade up --env=Prod" we have a publish mechanism that versions the application, its configurations, and supporting services.
$ caade publish --version=1.0.2
In this case the application is published and tagged with version 1.0.2. Once the application is published, it will launch the environment if it is not currently running. If it is running, it will "upgrade the service" to the new version. The upgrade process will be covered in another blog. Needless to say, it allows for rolling updates with minimal or no downtime. As you can see, additional services have been added, and some removed, relative to the test environment.

Happy "Coder" Happy Company

The software engineer in this story focuses on writing software, not on the environment. Services are being reused from application to application. Environment requirements are being met with service and application definitions. Stack and service developers are focusing on writing services for reuse instead of fixing application developers' code. Now your company can run fast and deploy quality products into production.