Sunday, December 18, 2016

Process mating season 101 - Fork and Clone

Yeah, I know, the title is awesome, isn't it? Come on! At least it should be, if you understand what fork() and clone() do in the context of Linux syscalls.

These two calls are the ones in charge of creating life on your system, basically spawning new processes. We'll have a look at the fundamental differences between them, some examples to understand them, and some interesting facts.

That said, before moving forward I'd like to outline the following:
  • Syscalls are the interface the kernel exposes to user space processes; put another way, they are the way processes access kernel services.
  • Processes usually don't use syscalls directly; instead, they use them through wrapper functions provided by glibc (in GNU/Linux at least). These wrapper functions provide a sort of abstraction layer, handling syscall parameters, return values (for example, returning -1 and setting errno on failure) and other details.
Note: by the way, I've already talked a bit about syscalls here, in case you want to have a look.

Fork, the processes mitosis process

[off topic] I'm getting extremely good at writing titles![/off topic]

Pretty much like eukaryotic cell replication, the result of a fork call is a new, almost identical process known as the child process. The "almost" is key here: some properties will be different, for example (for an exhaustive list, please have a look at this):
  • The PID will be different: the kernel assigns a new, unused PID to the child process. The child's parent PID (PPID) will be the PID of the process that called fork() (kind of makes sense, doesn't it?).
  • The child process doesn't inherit timers or memory locks.
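  • Others (for example, the child's set of pending signals starts out empty, and its resource usage and CPU time counters are reset to zero).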
On the other side of the "almost" we have:
  • The child process will have an exact copy of its parent's entire virtual address space (fork() is implemented using copy-on-write pages, so the only penalty it incurs is the time and memory required to duplicate the parent's page tables and to create a unique task structure for the child).
  • The child process inherits copies of structures like open file descriptors, open directory streams, etc.
Right after the fork() call there will be two different processes (although they still share some resources), and they can follow two different code paths. This will be easier to understand with a simple example.

Simple fork() example


The code is self-explanatory (or at least I tried to make it so), so I won't explain it in detail:
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>
#include <string.h>

int main(void)
{
        pid_t pid,ppid,childpid;//pid_t is the type returned by getpid(), getppid() and fork()
        int keep_it;
        pid=getpid();//Get process ID
        ppid=getppid();//Get parent process ID

        childpid=fork();//fork() returns the child PID on the parent's code path and 0 on the child's. On failure returns -1
        //From this point on, there are 2 processes running, unless fork failed of course :D.
        keep_it=errno;
        if(childpid==-1)//Check if fork() failed
        {
                printf("Fork failed due to \"%s\"\n",strerror(keep_it));//print the system error that corresponds to errno
                return -1;
        }

        if(childpid==0)//Here is where the paths change, it could be done differently.
        {//Child code path here
                printf("Child process: \nPID\tPPID\n%d\t%d\n",pid,ppid);
                pid=getpid();
                ppid=getppid();
                printf("Child process: \nPID\tPPID\n%d\t%d\n",pid,ppid);
                sleep(5);
        }
        else
        {//Parent code path here
                sleep(10);
                printf("Parent process: \nPID\tPPID\n%d\t%d\n",pid,ppid);
                printf("Parent process: child PID was %d\n",childpid);
        }
        return 1;
}

Let's run it to see what happens:
juan@test:~/clone_fork$ gcc -o fork_simple_example fork_simple_example.c
juan@test:~/clone_fork$ ./fork_simple_example
Child process:
PID     PPID
3213    2965
Child process:
PID     PPID
3214    3213
Parent process:
PID     PPID
3213    2965
Parent process: child PID was 3214
juan@test:~/clone_fork$
In a different shell I also captured the processes with ps:
juan@test:~$ ps axo stat,user,comm,ppid,pid|grep fork
S+   juan     fork_simple_exa  2965  3213
S+   juan     fork_simple_exa  3213  3214
juan@test:~$
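So, what do we see in both outputs? The child process printed its PID and PPID twice, just to show that the first time, the values in those variables were actually the ones collected by its parent before the fork() call. The second time, after calling getpid() and getppid() itself, the child reports its own PID (3214) and its parent's PID (3213), which matches what ps shows. The parent, in turn, learns the child's PID from the return value of fork().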

Preserved file descriptors example


As mentioned before, certain kernel structures are copied into the new child process; one of them is the set of open file descriptors. Let's see that in action with an example using a pipe.

Note: a pipe is an Inter-Process Communication (IPC) mechanism. You can think of it as simply that, a pipe with two ends: one you write to and one you read from. For more details, please have a look at this.

The code is:
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#define SIZE 250

int main(void)
{
        char buffer[SIZE];
        int keep_it,aux;
        pid_t childpid;
        ssize_t count;
        int my_pipe[2];//my_pipe[] will keep two FDs, my_pipe[1] to write into the pipe and my_pipe[0] to read from the pipe.

        aux=pipe(my_pipe);//Note how the pipe is created BEFORE the fork() call
        keep_it=errno;//Save errno right away, before any other call can overwrite it
        if(aux==-1)//Check if pipe() failed
        {
                printf("Pipe failed due to \"%s\"\n",strerror(keep_it));//print the system error that corresponds to errno
                return -1;
        }

        childpid=fork();//fork() returns the child PID on the parent's code path and 0 on the child's. On failure returns -1
        //From this point on, there are 2 processes running, unless fork failed of course :D.
        keep_it=errno;
        if(childpid==-1)//Check if fork() failed
        {
                printf("Fork failed due to \"%s\"\n",strerror(keep_it));//print the system error that corresponds to errno
                return -1;
        }

        if(childpid==0)//Here is where the paths change, it could be done differently.
        {//Child code path here
                close(my_pipe[0]);//On the child process we can close the read end of the pipe
                printf("Hi, this is the child process, insert message here (:P less than %d letters please): ",SIZE);
                fflush(stdout);//Make sure the prompt is shown before fgets() blocks
                if(fgets(buffer,sizeof(buffer),stdin)==NULL)
                        buffer[0]='\0';//On EOF or error, send an empty message
                count=write(my_pipe[1],buffer,strlen(buffer)+1);//Send the string and its '\0', not the whole buffer
                if(count==-1)
                        perror("write");
                else
                        printf("message sent to parent process.\n");
                close(my_pipe[1]);
        }
        else
        {//Parent code path here
                close(my_pipe[1]);//On the parent process we can close the write end of the pipe
                count=read(my_pipe[0],buffer,SIZE-1);
                if(count>0)
                {
                        buffer[count]='\0';//Make sure the string is terminated
                        printf("Parent process received message: %s",buffer);
                }
                close(my_pipe[0]);
                wait(NULL);//Wait for the child so it isn't left behind as a zombie
        }
        return 0;
}

This code is a bit more complex, but the comments should help.

You can see how the pipe was created in the parent process (the pipe() call, made before fork()) and yet it was used in the child process (the write() call) without any problems! Also worth noting: the pipe requires two file descriptors, one to read from the pipe (stored in my_pipe[0]) and one to write into it (stored in my_pipe[1]). After the fork() call, since the child and parent processes each have copies of these open file descriptors, each one can safely close the end it won't use, leaving a unidirectional inter-process communication channel (child -> PIPE -> parent).

A funny fact

 
At this point I was tempted to run strace to show how the fork syscall was being used, and I noticed something. This is the strace output of running the first simple example:
juan@test:~/clone_fork$ strace ./fork_simple_example
execve("./fork_simple_example", ["./fork_simple_example"], [/* 22 vars */]) = 0
brk(0)                                  = 0x16ba000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa76a34a000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=95253, ...}) = 0
mmap(NULL, 95253, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fa76a332000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P \2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1840928, ...}) = 0
mmap(NULL, 3949248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fa769d65000
mprotect(0x7fa769f1f000, 2097152, PROT_NONE) = 0
mmap(0x7fa76a11f000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ba000) = 0x7fa76a11f000
mmap(0x7fa76a125000, 17088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fa76a125000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa76a331000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa76a32f000
arch_prctl(ARCH_SET_FS, 0x7fa76a32f740) = 0
mprotect(0x7fa76a11f000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7fa76a34c000, 4096, PROT_READ) = 0
munmap(0x7fa76a332000, 95253)           = 0
getpid()                                = 4981
getppid()                               = 4978
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa76a32fa10) = 4982
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({10, 0}, Child process:
PID     PPID
4981    4978
Child process:
PID     PPID
4982    4981
{4, 995479188})      = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=4982, si_status=1, si_utime=0, si_stime=0} ---
restart_syscall(<... resuming interrupted call ...>
) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa76a349000
write(1, "Parent process: \n", 17Parent process:
)      = 17
write(1, "PID\tPPID\n", 9PID    PPID
)              = 9
write(1, "4981\t4978\n", 104981 4978
)            = 10
write(1, "Parent process: child PID was 49"..., 35Parent process: child PID was 4982
) = 35
exit_group(1)                           = ?
+++ exited with 1 +++
juan@test:~/clone_fork$

Do you see any fork() call there? ... Exactly, there's no fork call!!! But I said fork is a Linux syscall and blah blah blah, right? Well, worry not, I wasn't lying :D Everything I said is true, however...

As the fork(2) man page puts it: "Since version 2.3.3, rather than invoking the kernel's fork() system call, the glibc fork() wrapper that is provided as part of the NPTL threading implementation invokes clone(2) with flags that provide the same effect as the traditional system call. (A call to fork() is equivalent to a call to clone(2) specifying flags as just SIGCHLD.)"

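That's why we see a clone call instead! (The extra CLONE_CHILD_SETTID and CLONE_CHILD_CLEARTID flags in the strace output are added by glibc so NPTL can keep track of the child's thread ID; they don't change the fork-like behavior.)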

Now my brain needs some rest, so I'll finish this post here. Any feedback will be more than welcome!

In the next post I'll describe clone(), and we'll see some examples to better understand how it differs from fork().