Ningún detalle es anecdótico...: Process mating season 102

It's been a while since I posted Process mating season 101, where I reviewed how processes are created in Linux systems and showed examples of the behavior of the fork() syscall. At the very end of that post I mentioned that glibc no longer uses the fork() syscall when you call the fork() wrapper; it calls the clone() syscall instead. You can still use the old fork syscall if you invoke it directly, but clone provides the same features and many more!

So what's the difference between clone and fork?

Since you already know what to expect from fork() (because you have read Process mating season 101), I will only talk about clone now. Similarly to fork, clone creates processes... wait..., what? Yes, the peculiarity of clone() is that the new child process can share parts of its execution context, such as the memory space, with its parent process (things like the stack and CPU registers are not shared). This feature of clone() is what Linux uses to implement threads: basically, several processes running in the same memory space.

To give a bit more context on how clone() works, I think it is worth mentioning the following concepts:

tgid: thread group ID. This ID identifies the thread group a task belongs to. It is also what we know as the PID: since kernel 2.4, the getpid() function actually returns the TGID.
tid: thread ID. This is a unique identifier for a given task; clone() returns the TID of the newly created process/thread. You can obtain the TID of the current task with the gettid() function.
When clone() is called with the CLONE_THREAD flag, the new task is created under the same TGID and a new unique TID is assigned to the thread (task). On the contrary, when clone() is called without the CLONE_THREAD flag, the new task is placed in a new thread group whose TGID is the TID of the new task (remember that TIDs are unique system-wide).

Let's see this in a classic example you may find on your system right now:

juan@test:~$ ps -eLF |grep "PID\|rsyslog"
UID        PID  PPID   LWP  C NLWP    SZ   RSS PSR STIME TTY          TIME CMD
syslog     568     1   568  0    4 65534  3040   1 18:41 ?        00:00:00 rsyslogd
syslog     568     1   570  0    4 65534  3040   0 18:41 ?        00:00:00 rsyslogd
syslog     568     1   571  0    4 65534  3040   1 18:41 ?        00:00:00 rsyslogd
syslog     568     1   572  0    4 65534  3040   0 18:41 ?        00:00:00 rsyslogd
juan      2358  1938  2358  0    1  3987  2168   0 19:53 pts/1    00:00:00 grep --color=auto PID\|rsyslog
juan@test:~$

rsyslogd is our lovely syslog service, and it seems to be running 4 LWPs (lightweight processes, aka threads). Please note that LWP is an alias for TID, and NLWP is the number of threads under a particular TGID, which as we said before is the PID value as well :D. A bit tricky, right?

Let me rephrase that, putting it in chronological order:

rsyslogd's parent is process 1, as we can see from the PPID column. This means that at some point in time the init process fork()ed itself (actually fork+execve :D), giving birth to a new task (rsyslogd) with TID 568 (LWP), which was placed under a new thread group identified by PID 568.
Later on, rsyslogd TID 568 decided more threads were necessary to take care of our logs and spawned a few more tasks (this time using clone(CLONE_THREAD)). This way three new threads were spawned, TIDs 570, 571 and 572, all of them of course under the same thread group 568.

I hope it is clearer now; you can't say I haven't tried! If by any chance you don't believe any of this about PID being TGID and LWP being TID, I'd strongly suggest having a look at man ps.

Show me an example in C god damn it!!

Well... I have to be honest, I thought it would be as simple as it was for the previous post, but once again I was wrong xD. That said, I came up with an extremely simple example anyway, just for the sake of writing some C and proving to myself I'm not thaaaat lazy.

The example doesn't do anything meaningful; it only proves that threads can access the memory of the process that spawned them.

Here goes the code; again, it is heavily commented:

#define _GNU_SOURCE //Needed so that <sched.h> declares clone() and the CLONE_* flags
#include <stdio.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sys/syscall.h>

#define STACK_SIZE (1024*1024)
#define THREADS 5

int myFunction(void *arg);

int main(int argc, char *argv[])
{
        int aux, i, childtid, array[THREADS];
        char *stack_low, *stack_high;

        for(i=0;i<THREADS;i++)
        {
                *(array+i)=0;//Initializing array with zero, position by position, just for the fun of it

                //Allocate some bytes for the cloned child, since STACK
                // grows downward we want to know the last memory position
                // to use it on clone().
                stack_low=malloc(STACK_SIZE);
                stack_high=stack_low+STACK_SIZE;

                //When CLONE_THREAD is set, CLONE_SIGHAND and CLONE_VM have to be set as well, have a look at https://linux.die.net/man/2/clone.
                childtid=clone(myFunction,(void *)stack_high,CLONE_THREAD|CLONE_VM|CLONE_SIGHAND,(void *)(array+i));
                aux=errno;
                if(childtid == -1)
                {
                        printf("Clone failed: \"%s\"\n",strerror(aux));
                        return 2;
                }
        }
        sleep(2);//We just wait... synching the threads is a pain...

        //We print all the TID values stored in the array by every thread
        for(i=0;i<THREADS;i++)
        {
                printf("Spawned thread %d\n",*(array+i));
        }
        return 0;
}

// This function is used to run every new thread, it receives a pointer to the place where the thread should store its TID
int myFunction(void *arg)
{
        int *aux;
        aux=(int *)arg;
        *aux=syscall(SYS_gettid);//Stores the thread TID in the memory position passed as argument.
        pause();
        return 0;//Not reached while the thread stays paused, but the function is declared to return int
}

As you can see, clone() requires a few more parameters than fork() in order to create life in your system. Essentially these:

A pointer to the function the thread will execute. In the example the function is called myFunction (I'm great at naming stuff, right?) and all it does is write the TID of the thread to the memory position that comes in the arg argument.
The next parameter is the stack. As I mentioned before, the stack is one of those things that can't be shared among threads, simply because it holds the current execution context, local variables, etc. So you need to allocate some memory to act as the stack of the new thread. In this case I'm reserving 1024*1024 bytes; there is nothing fancy about the number, smaller ones just caused segmentation faults xD. Another interesting thing here is that, because the stack grows downward, you have to pass the highest memory address as the argument, not the lowest as you usually would.
The third parameter holds the flags that define clone()'s behavior. Since we want it to spawn a process with shared context I set CLONE_THREAD, and the other 2 (CLONE_VM and CLONE_SIGHAND) have to be there as well because CLONE_THREAD requires them (more details in the clone man page).
Parameter number 4 is the argument that will be passed to the function given in parameter number 1. In this example it is just the memory position where the thread should store its TID.

So reading the code you can kind of understand what's going to happen:

5 threads should be spawned, unless something goes terribly wrong during the clone call.
Each thread should write its TID in a particular position of array[]; for example, the first spawned thread should write its TID in array[0], the second thread in array[1], and so on. NOTE: array[] is placed in the stack of the leader thread, yet the threads can access it!
After writing the TID in that memory position the threads will just pause themselves, to prevent them from exiting and terminating the rest of the threads, including the leader thread.
After spawning the 5 threads, the leader thread just sleeps for 2 seconds, waiting for the threads to hopefully execute, and then prints the results. Sending signals between threads turned out to be way harder than I thought, so I gave up and went down the easy road: just wait for it.

No big deal, but now let's see if that works:

juan@test:~/clone_fork$ gcc -o clone_test_simplified clone_test_simplified.c
juan@test:~/clone_fork$ ./clone_test_simplified
Spawned thread 2934
Spawned thread 2935
Spawned thread 2936
Spawned thread 2937
Spawned thread 2938
juan@test:~/clone_fork$

Indeed it did!!! (The fun part here is that you have no idea how many times I had to compile this stuff to make it work xD.)

For the fun of it, I increased the sleep time and captured the output of ps as I did with rsyslogd before so we can have a look at the threads created and the other values. Here we have the output:

juan@test:~/clone_fork$ gcc -o clone_test_simplified clone_test_simplified.c
juan@test:~/clone_fork$ ./clone_test_simplified &
[1] 2945
juan@test:~/clone_fork$ ps -eLF |grep "PID\|clone"
UID        PID  PPID   LWP  C NLWP    SZ   RSS PSR STIME TTY          TIME CMD
juan      2945  1938  2945  0    6  2334   628   1 22:04 pts/1    00:00:00 ./clone_test_simplified
juan      2945  1938  2946  0    6  2334   628   0 22:04 pts/1    00:00:00 ./clone_test_simplified
juan      2945  1938  2947  0    6  2334   628   1 22:04 pts/1    00:00:00 ./clone_test_simplified
juan      2945  1938  2948  0    6  2334   628   0 22:04 pts/1    00:00:00 ./clone_test_simplified
juan      2945  1938  2949  0    6  2334   628   0 22:04 pts/1    00:00:00 ./clone_test_simplified
juan      2945  1938  2950  0    6  2334   628   0 22:04 pts/1    00:00:00 ./clone_test_simplified
juan      2952  1938  2952  0    1  3987  2268   1 22:04 pts/1    00:00:00 grep --color=auto PID\|clone
juan@test:~/clone_fork$ Spawned thread 2946
Spawned thread 2947
Spawned thread 2948
Spawned thread 2949
Spawned thread 2950

[1]+  Done                    ./clone_test_simplified
juan@test:~/clone_fork$

So now NLWP says there are 6 threads under TGID 2945, and you can see how 5 of the LWPs match the ones printed by the binary right afterwards; these are the ones spawned by the leader thread, TID 2945.

Wrap up

Processes in Linux are created by either the fork() or the clone() syscall; not only do they have different names, they provide different features as well. fork results in two identical copies of the original process, while clone (when used with CLONE_THREAD) instead creates what is usually called a lightweight process, which shares part of the execution context with its creator process.

The main difference between processes created with fork and "threads" created with clone is that threads share the same memory space, so communication between them is way easier; since page tables are shared, cache and TLB usage improves too. However, you have to be careful when dealing with shared memory spaces and use proper locking mechanisms, unless you want to have some headaches. On the contrary, processes spawned with fork run in completely separate and isolated memory spaces, so communication between them is more expensive (pipes, shared memory, other IPC mechanisms).

Bibliography

https://linux.die.net/man/2/clone
http://man7.org/linux/man-pages/man2/syscall.2.html
http://blog.man7.org/
http://www.google.com

Tuesday, February 21, 2017