viernes, 19 de agosto de 2016

Strace 101 - stracing your stuff!

It's been a while since the last blog post now, like 4 months went by! I remeber once I thought I could write one post per week xD, that did not work at all hahaha. Anyway..., the last few days had the chance to spend some time working with strace and perf, so I decided to write something about it here as well.

Strace? What the heck is it?

Strace is a nice debugging/troubleshooting tool for Linux, that helps you identify the syscalls a particular process is using and the signals a proess receives (more details here). Syscalls are basically the interface to access to kernel services. Therefore knowing what particular syscalls a program is issuing and what the results of these calls are is really interesting when debugging some software problems.

But how is it possible that a User space process is able to see the syscalls another User space process issues? Well, there's a kernel feature called ptrace that makes that possible (yeah... ptrace is a system call xD). By definition ptrace is:

The ptrace() system call provides a means by which one process (the "tracer") may observe and control the execution of another process (the "tracee"), and examine and change the tracee's memory and registers. It is primarily used to implement breakpoint debugging and system call tracing. 

There are basically 2 ways you can strace a process to see its syscalls, you can either launch the process using strace or you can attach strace to a running process (under certain conditions :D).

How does it work? Show me the money!

I strongly believe the best way to explain something is by showing working examples, so there we go... Lets see how many syscalls (and which ones) the classic "Hello World" issues. The C code is:
#include <stdio.h>
int main()
{
     printf("Hello World\n");
     return 0;
}
Compile and run:
juan@juan-VirtualBox:~$ gcc -o classic classic.c 
juan@juan-VirtualBox:~$ ./classic 
Hello World
juan@juan-VirtualBox:~$ 
now we run it using strace instead:
juan@juan-VirtualBox:~$ strace ./classic
execve("./classic", ["./classic"], [/* 60 vars */]) = 0
brk(0)  = 0x7f0000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0362d17000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=85679, ...}) = 0
mmap(NULL, 85679, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0362d02000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P \2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1840928, ...}) = 0
mmap(NULL, 3949248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0362732000
mprotect(0x7f03628ec000, 2097152, PROT_NONE) = 0
mmap(0x7f0362aec000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ba000) = 0x7f0362aec000
mmap(0x7f0362af2000, 17088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0362af2000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0362d01000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0362cff000
arch_prctl(ARCH_SET_FS, 0x7f0362cff740) = 0
mprotect(0x7f0362aec000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ) = 0
mprotect(0x7f0362d19000, 4096, PROT_READ) = 0
munmap(0x7f0362d02000, 85679) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0 
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0362d16000
write(1, "Hello World\n", 12Hello World
) = 12
exit_group(0) = ?
+++ exited with 12 +++
juan@juan-VirtualBox:~$
The ouput provides one system call per line, and each line includes the syscall name, the parameters used to invoke it and the result returned (after the =). As we can see the simplest piece of code you can imagine will actually make use of many syscalls. I will only comment on a few of them (probably in a next post we can dive deep there :D):
  • execve (the first one), just before this call strace forked itself and the child process called ptrace syscall allowing strace parent to trace him. So after all that, the process runs execve changing its running code to our classic C program.
  • brk changes the location of the program break, which defines the end of the process's data segment. Increasing it has the effect of allocating memory to the process; or decreasing the break deallocates memory. In this case is called with 0 as increment, which makes the call return the current program break (0x7f0000).
  • we see a couple of mmap syscalls mapping memory regions as anonymous and also some mapping library libc.so.6. This is the Dynamic Linker doing its job and adding all the necessary libraries to the process memory space.
  • there are also a few open syscalls, opening files like /etc/ld.so.cache where it can find a list of the available system libraries.
  • just before finishing we see a write syscall, sending our classic "Hello World" to file descriptor 1, also known as STDOUT (standard output). Since both strace and classic are sending the standard output to the console we can see how the colided in lines 29 and 30.
  • the last call in place was exit_group, it's the equivalent to exit syscall but it terminates not only the calling thread but all the threads in the thread group (this particular example was single threaded).
This should provide a fair idea of what a particular piece of software does and which kernel services it's accessing to. However, sometimes we don't want to go into so much details, but instead we would like to see a summary. We can easily get that with -c flag:
juan@juan-VirtualBox:~$ strace -c ./classic 
Hello World
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         1           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         2           open
  0.00    0.000000           0         2           close
  0.00    0.000000           0         3           fstat
  0.00    0.000000           0         8           mmap
  0.00    0.000000           0         4           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         1           brk
  0.00    0.000000           0         3         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    28         3 total
juan@juan-VirtualBox:~$ 
the output shows the list of syscalls issued by the process (and its threads if there were more than 1), the number of times each one was called, the number of times they failed, and the CPU time they consumed (kernel space time aka system time). With this output we can easily identify that 3 syscalls returned failed states, if we go back to the full strace output we'll see that lines 4, 6 and 11 returned ENOENT "No such file or directory".

I mentioned before that is also possible to strace an already running processes, so lets take a look at that. First we have to identify a process as target, in this case nc with PID 8739:
juan@juan-VirtualBox:~$ ps aux|grep -i nc
root       954  0.0  0.0  19196  2144 ?        Ss   01:33   0:04 /usr/sbin/irqbalance
juan      2055  0.0  0.2 355028  8396 ?        Ssl  01:33   0:00 /usr/lib/at-spi2-core/at-spi-bus-launcher --launch-immediately
juan      8739  0.0  0.0   9132   800 pts/0    S    11:52   0:00 nc -l 9999
juan      2585  0.0  0.0  24440  1964 ?        S    01:33   0:00 dbus-launch --autolaunch 0c0058daf07f369dd9b0d1605654eff1 --binary-syntax --close-stderr
juan      9477  0.0  0.0  15948  2304 pts/2    R+   14:03   0:00 grep --color=auto -i nc
juan@juan-VirtualBox:~$ 
now lets try to attach strace to it in order to inspect the syscalls:
juan@juan-VirtualBox:~$ strace -p 8739
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
juan@juan-VirtualBox:~$ 
interesting :D. Turns out that by default Ubuntu doesn't allow ptrace_attach feature, why? we'll see an example later :P. Ptrace has different scopes (4 actually, 0 to 3, being 3 the most restrictive one):

A PTRACE scope of "0" is the more permissive mode.  A scope of "1" limits PTRACE only to direct child processes (e.g. "gdb name-of-program" and "strace -f name-of-program" work, but gdb's "attach" and "strace -fp $PID" do not). The PTRACE scope is ignored when a user has CAP_SYS_PTRACE, so "sudo strace -fp $PID" will work as before.

if we take a look at the current ptrace_scope value will see we have scope 1:
juan@juan-VirtualBox:~$ cat /proc/sys/kernel/yama/ptrace_scope
1
juan@juan-VirtualBox:~$ 
At this point we have two options, we either try with sudo or we enable scope 1 system wide by changing /proc/sys/kernel/yama/ptrace_scope (this might be dangerous). Lets go with sudo now:
juan@juan-VirtualBox:~$ sudo strace -p 8739
Process 8739 attached
accept(3, {sa_family=AF_INET, sin_port=htons(34404), sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
close(3) = 0
poll([{fd=4, events=POLLIN}, {fd=0, events=POLLIN}], 2, 4294967295) = 1 ([{fd=4, revents=POLLIN}])
read(4, "Hello World through NC\n", 2048) = 23
write(1, "Hello World through NC\n", 23) = 23
poll([{fd=4, events=POLLIN}, {fd=0, events=POLLIN}], 2, 4294967295) = 1 ([{fd=4, revents=POLLIN}])
read(4, "", 2048) = 0
shutdown(4, SHUT_RD) = 0
close(4) = 0
close(3) = -1 EBADF (Bad file descriptor)
close(3) = -1 EBADF (Bad file descriptor)
exit_group(0) = ?
+++ exited with 0 +++
juan@juan-VirtualBox:~$ 
It worked indeed! Lets do a brief review of the syscalls:

  • First an accept call. It extracts the first connection request on the queue of pending connections for the listening socket, 3, creates a new connected socket, and returns a new file descriptor referring to that socket , 4. The newly created socket is not in the listening state. The original socket 3 is unaffected by this call.
  • Then close call closes the listening socket 3.
  • The next call was poll, it waits for one of a set of file descriptors to become ready to perform I/O. In this case we can see it waits for fd 4 (the recent socket created due to the incoming connection) and fd 0 (the standard input).
  • Right after the poll we sea the read call, reading 23 bytes out of fd 4 using a 2048 bytes buffer.
  • After finishing reading, nc uses write to send the received bytes to fd 1, the standard output.
  • Then nc polls again for any extra data coming in, and this time the call returns empty as can be seen on the next read call returning 0. This poll is probably triggered by the connection being finished on the other side.
  • Shutdown call shuts down all or part of a full dupplex connection on a given socket, in this case the one pointed by fd 4. 
  • Then we have 3 close calls, the first one closes the file descriptor used by the socket created by the accept call, while the next two calls try to close a fd that has already been closed on line 4 (fd 3), which is kind of weird and could be a bug.

Why ptrace could be dangerous?

Usually debugging tools are like two edged swords, right? Well, ptrace is no exception to that rule. Having access to the interface between user space and kernel space of a process can leak some important information, like credentials.

Lets see an extremely simple example. My virtual machine has vsftpd 3.0.2 running, so lets capture the credentials of a system user that logs into the FTP service. In this case will set a few extra flags on strace in order to make things easier:

  • -f will Trace child processes as they are created by currently traced processes as a result of the fork system call.
  • -eread -ewrite are two filters to tell strace to only record read and write syscalls.
  • -o sets an output file where the syscalls will be recorded.
So lets strace:
juan@juan-VirtualBox:~$ sudo strace -f -e trace=read,write -o output -p $(pidof vsftpd)
Process 10040 attached
Process 10280 attached
Process 10281 attached
Process 10282 attached
^CProcess 10040 detached
juan@juan-VirtualBox:~$ 
we can see vsftpd forked a couple of times while we connected to it using the ftp client. Now lets take a look at the content of output file:
juan@juan-VirtualBox:~$ cat output 
10280 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\n\0\0\0\0"..., 4096) = 3533
10280 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\v\0\0\0\v\0\0\0\0"..., 4096) = 2248
10280 read(4,  
10281 read(4, "# /etc/nsswitch.conf\n#\n# Example"..., 4096) = 507
10281 read(4, "", 4096)                 = 0
10281 read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\23\0\0\0\0\0\0"..., 832) = 832
10281 read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240!\0\0\0\0\0\0"..., 832) = 832
10281 read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\"\0\0\0\0\0\0"..., 832) = 832
10281 write(3, "Sat Aug 20 17:06:02 2016 [pid 10"..., 65) = 65
10281 write(0, "220 (vsFTPd 3.0.2)\r\n", 20) = 20
10281 read(0, "USER juan\r\n", 11)      = 11
10281 write(0, "331 Please specify the password."..., 34) = 34
10281 read(0, "PASS MyPassw0rd\r\n", 15)  = 15
10281 write(5, "\1", 1)                 = 1
10280 <... read resumed> "\1", 1)       = 1
10281 write(5, "\4\0\0\0", 4 
10280 read(4,  
10281 <... write resumed> )             = 4
10280 <... read resumed> "\4\0\0\0", 4) = 4
10281 write(5, "juan", 4 
10280 read(4,  
...
10282 write(0, "230 Login successful.\r\n", 23) = 23
10282 read(0, "SYST\r\n", 6)            = 6
10282 write(0, "215 UNIX Type: L8\r\n", 19) = 19
10282 read(0, "QUIT\r\n", 6)            = 6
10282 write(0, "221 Goodbye.\r\n", 14)  = 14
10282 +++ exited with 0 +++
10280 <... read resumed> 0x7ffe4994960f, 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
10280 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=10282, si_status=0, si_utime=0, si_stime=0} ---
10280 +++ killed by SIGSYS +++
10040 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=10280, si_status=SIGSYS, si_utime=0, si_stime=1} ---
juan@juan-VirtualBox:~$ 
As we anticipated, we can see both username and password on lines 12 and 14. Ok, yes, this is FTP and we could have captured the keys also with a simple network capture, but this is just an example :D of how strace (ptrace actually) can be used to leak sensitive information.

I hope this was interesting, at least it was for me xD. Here I list some interesting links I found on the way:

http://linux.die.net/man/1/strace
http://linux.die.net/man/2/ptrace
http://man7.org/linux/man-pages/man2/syscalls.2.html
https://www.kernel.org/doc/Documentation/security/Yama.txt