Several Linux technologies form the foundation for building and running a container process on your system: namespaces, control groups (cgroups), seccomp, and SELinux.
Namespaces provide a layer of isolation for containers by giving each container what appears to be its own instance of system resources, such as the filesystem, hostname, and network stack. This limits what a process can see, and therefore restricts the resources it can reach.
The Linux kernel has several namespaces that Docker uses when creating a container:
[nivedv@homelab ~]$ docker container run alpine ping 188.8.131.52
[nivedv@homelab ~]$ sudo lsns -p 29413
        NS TYPE   NPROCS   PID USER COMMAND
4026531835 cgroup    299     1 root /usr/lib/systemd/systemd --switched...
4026531837 user      278     1 root /usr/lib/systemd/systemd --switched...
4026533105 mnt         1 29413 root ping 184.108.40.206
4026533106 uts         1 29413 root ping 220.127.116.11
4026533107 ipc         1 29413 root ping 18.104.22.168
4026533108 pid         1 29413 root ping 22.214.171.124
4026533110 net         1 29413 root ping 126.96.36.199
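The same namespace identifiers that lsns reports can also be read directly from /proc. The following is a minimal sketch, not part of Docker or util-linux: the helper name list_namespaces is made up for illustration, and it assumes a Linux host with /proc mounted. Inspecting a process owned by another user typically needs root, as with the sudo lsns call above.

```shell
#!/bin/sh
# Print each namespace a process belongs to, as exposed under /proc/<PID>/ns.
# list_namespaces is a hypothetical helper name used only in this sketch.
list_namespaces() {
    ns_pid="${1:-$$}"    # default to the current shell's PID
    for ns_entry in /proc/"$ns_pid"/ns/*; do
        # Each entry is a symlink whose target looks like "mnt:[4026531840]".
        printf '%-20s %s\n' "$(basename "$ns_entry")" "$(readlink "$ns_entry")"
    done
}

list_namespaces    # namespaces of the current shell
```

Two processes that print the same identifier for a given type share that namespace; a containerized process would show different identifiers for mnt, uts, ipc, pid, and net than the host shell does.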
The mounts visible to a process are listed at the /proc/<PID>/mounts location in your Linux system.
[nivedv@homelab ~]$ docker container run -it --name nived alpine sh
/ # hostname
9c9a5edabdd6
/ #
[nivedv@homelab ~]$ sudo unshare -u sh
sh-5.0# hostname isolated.hostname
sh-5.0# hostname
isolated.hostname
sh-5.0#
sh-5.0# exit
exit
[nivedv@homelab ~]$ hostname
homelab.redhat.com
[root@demo /]# ipcmk -M 10M
Shared memory id: 0
[root@demo /]# ipcmk -M 20M
Shared memory id: 1
[root@demo /]#
[root@demo /]# ipcs

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0xd1df416a 0          root       644        10485760   0
0xbd487a9d 1          root       644        20971520   0

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
[nivedv@homelab ~]$ docker container run --rm -it alpine sh
/ # ping 188.8.131.52
PING 184.108.40.206 (220.127.116.11): 56 data bytes
64 bytes from 18.104.22.168: seq=0 ttl=119 time=21.643 ms
64 bytes from 22.214.171.124: seq=1 ttl=119 time=20.940 ms
^C
[root@homelab ~]# ip link show veth84ea6fc
veth84ea6fc@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
Control groups (cgroups):
Cgroups are fundamental building blocks of a container. They are responsible for allocating and limiting the resources that containers use, such as CPU, memory, and network I/O. The container engine automatically creates a cgroup filesystem entry for each controller type.
[root@homelab ~]# lscgroup | grep docker
cpuset:/docker
net_cls,net_prio:/docker
cpu,cpuacct:/docker
hugetlb:/docker
devices:/docker
freezer:/docker
memory:/docker
perf_event:/docker
blkio:/docker
pids:/docker
The container runtime sets the cgroup values for each container when the container is started, and all of this information is stored under /sys/fs/cgroup/*/docker. The following command lets the container use at most 50,000 microseconds of CPU time in every 100,000-microsecond scheduling period (half a CPU), and sets the soft and hard memory limits to 500M and 1G respectively.
[root@homelab ~]# docker container run -d --name test-cgroups --cpus 0.5 --memory 1G --memory-reservation 500M httpd
[root@homelab ~]# lscgroup cpu,cpuacct:/docker memory:/docker
cpu,cpuacct:/docker/
cpu,cpuacct:/docker/c3503ac704dafea3522d3bb82c77faff840018e857a2a7f669065f05c8b2cc84
memory:/docker/
memory:/docker/c3503ac704dafea3522d3bb82c77faff840018e857a2a7f669065f05c8b2cc84
[root@homelab c....c84]# cat cpu.cfs_period_us
100000
[root@homelab c....c84]# cat cpu.cfs_quota_us
50000
[root@homelab c....c84]# cat memory.soft_limit_in_bytes
524288000
[root@homelab c....c84]# cat memory.limit_in_bytes
1073741824
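To see where the numbers in those cgroup files come from, the arithmetic can be reproduced by hand. This is only a sketch: cpus_to_quota and size_to_bytes are hypothetical helper names, not Docker commands, and it assumes the 100000-microsecond CFS period shown in cpu.cfs_period_us above and binary size multiples (1M = 1024 * 1024 bytes).

```shell
#!/bin/sh
# Reproduce the cgroup values that result from --cpus and --memory style flags.
# Both function names are hypothetical helpers for this illustration.

# CPU quota in microseconds: the --cpus value times the CFS period.
cpus_to_quota() {
    awk -v cpus="$1" -v period_us="${2:-100000}" \
        'BEGIN { printf "%d\n", cpus * period_us }'
}

# Convert a size such as 500M or 1G to bytes; plain numbers pass through.
size_to_bytes() {
    case "$1" in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}

cpus_to_quota 0.5     # 50000      (cpu.cfs_quota_us)
size_to_bytes 500M    # 524288000  (memory.soft_limit_in_bytes)
size_to_bytes 1G      # 1073741824 (memory.limit_in_bytes)
```

The results match the values read back from the container's cgroup directory: a quota of 50000 against a period of 100000 is what gives the container half a CPU.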
Seccomp stands for secure computing mode. It is a Linux kernel feature that restricts the set of system calls an application is allowed to make. Docker's default seccomp profile disables around 44 of the 300+ available syscalls.
The idea is to give a container access only to the resources it actually needs. For example, if the container has no reason to change the clock on your host machine, it has no use for the clock_adjtime and clock_settime syscalls, and it makes sense to block them. Similarly, you would not want containers to change kernel modules, so there is no need for them to call the create_module or delete_module syscalls.
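As an illustration of how such a restriction is expressed, the sketch below writes a small custom profile in Docker's seccomp JSON format that rejects the two clock syscalls. The file name deny-clock.json is arbitrary, and the profile is deliberately simplistic: it allows everything else, so it is far more permissive than Docker's default profile and is meant only to show the shape of a rule, not to be used as-is.

```shell
#!/bin/sh
# Write an illustrative seccomp profile (hypothetical file name: deny-clock.json).
# NOTE: defaultAction is ALLOW here to keep the example short; Docker's own
# default profile works the other way round and only allows listed syscalls.
cat > deny-clock.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["clock_settime", "clock_adjtime"],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
EOF

# Applying it needs a running Docker daemon, so it is left commented out:
# docker container run --rm -it --security-opt seccomp=deny-clock.json alpine sh
```

In practice you would more likely start from Docker's default profile and adjust it, since passing a custom profile replaces the default rather than adding to it.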
SELinux stands for Security-Enhanced Linux. If you are running a Red Hat distribution on your hosts, then SELinux is enabled by default. SELinux lets you limit an application so that it can access only its own files, and prevents other processes from accessing them. So, if an application is compromised, the number of files it can affect or control is limited. SELinux does this by assigning contexts to files and processes and by defining policies that enforce what a process is able to see and change.
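An SELinux context is a colon-separated string of the form user:role:type:level, and it is the type field that policies for containers mostly key on. The sketch below splits a context into those fields; parse_selinux_context is a hypothetical helper name and the sample label is illustrative rather than taken from a real host. On an SELinux-enabled system, ps -eZ and ls -Z display the contexts of processes and files.

```shell
#!/bin/sh
# Split an SELinux context into user, role, type, and level.
# parse_selinux_context is a hypothetical helper used only in this sketch.
parse_selinux_context() {
    printf '%s\n' "$1" | {
        # The level (for example s0:c123,c456) may itself contain colons,
        # so the last variable receives the whole remainder of the line.
        IFS=: read -r se_user se_role se_type se_level
        printf 'user=%s role=%s type=%s level=%s\n' \
            "$se_user" "$se_role" "$se_type" "$se_level"
    }
}

parse_selinux_context 'system_u:system_r:container_t:s0:c123,c456'
# user=system_u role=system_r type=container_t level=s0:c123,c456
```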
SELinux policies for containers are defined by the container-selinux package. By default, containers run with the container_t label and are allowed to read and execute files under the /usr directory and to read most content from the /etc directory. The files under /var/lib/containers have the label
Originally posted @ Medium