Saturday, November 06, 2021

__stack_chk_fail(): What It Means

Recently I had a “stack smashing” incident to debug at work. It turned out to be a little more complicated than the example I'm about to show you, but at the bottom it was the same. Here's a silly example program.
collin@collin-t450:~/stack-chk$ pr -tn smash.c
    1	#include <stdio.h>
    2	#include <string.h>
    3	
    4	/*
    5	 * Bad programming practice
    6	 */
    7	static void
    8	oops(char const *buf)
    9	{
   10		char local[10];		/* if strlen(buf) > 9, then */
   11		strcpy(local, buf);	/* this line could smash the stack. */
   12		printf("%s\n", local);
   13	}
   14	
   15	/*
   16	 * this provides a level of indirection.
   17	 */
   18	static void
   19	doit(char const *buf)
   20	{
   21		oops(buf);
   22	}
   23	
   24	int
   25	main(int argc, char **argv)
   26	{
   27		char *msg = "hi there";
   28		if (argc > 1 && argv[1] && *argv[1]) {
   29			msg = argv[1];
   30		}
   31		doit(msg);
   32		return 0;
   33	}
collin@collin-t450:~/stack-chk$ 
So, main calls doit, passing either a short string—“hi there”—or a string of indeterminate length provided on the command line.

In turn, doit passes that same string to oops, which blindly copies it into a fixed-length buffer, local (line 11). This is a very bad practice because strcpy can overrun the destination (i.e. it can write past the end of local) if the source string (buf) is too long.

We compile it like this:

collin@collin-t450:~/stack-chk$ make smash
cc -fstack-protector-all -Wall -Werror -g    smash.c   -o smash
collin@collin-t450:~/stack-chk$ 
That -fstack-protector-all says to insert the stack-protector (or stack checking) code into every routine. This is a really good idea, and you should always have it in your makefiles.

Now if we run the program with a short string, all is well, but if the string is longer than about 9 bytes, bad things happen:

collin@collin-t450:~/stack-chk$ ./smash
hi there
collin@collin-t450:~/stack-chk$ ./smash hello
hello
collin@collin-t450:~/stack-chk$ ulimit -c unlimited       ←so we can get a coredump in case of abort
collin@collin-t450:~/stack-chk$ ./smash good-morning
good-morning
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
collin@collin-t450:~/stack-chk$ 
What is “stack smashing”, and how does the code tell that it’s happened? Let’s run gdb on the crash dump and see.
collin@collin-t450:~/stack-chk$ gdb smash core
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
…copyright, GPL, hints, etc. here
Reading symbols from smash...done.
[New LWP 18026]
Core was generated by `./smash good-morning'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fcddd774535 in __GI_abort () at abort.c:79
#2  0x00007fcddd7cb508 in __libc_message (action=<optimized out>, 
    fmt=fmt@entry=0x7fcddd8d607b "*** %s ***: %s terminated\n")
    at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007fcddd85c80d in __GI___fortify_fail_abort (
    need_backtrace=need_backtrace@entry=false, 
    msg=msg@entry=0x7fcddd8d6059 "stack smashing detected") at fortify_fail.c:28
#4  0x00007fcddd85c7c2 in __stack_chk_fail () at stack_chk_fail.c:29
#5  0x0000556ce7f921a4 in oops (buf=0x7ffcbf978577 "good-morning") at smash.c:13
#6  0x0000556ce7f921cd in doit (buf=0x7ffcbf978577 "good-morning") at smash.c:21
#7  0x0000556ce7f9224d in main (argc=2, argv=0x7ffcbf977e68) at smash.c:31
(gdb) 
Right. gdb’s “bt” command displays a backtrace; the above shows main calling doit calling oops, which called __stack_chk_fail. The numbers on the left are the frame numbers at the time the crash dump was taken.

I’ll belabor the maybe-obvious for a bit before continuing. Each frame is a record of where the caller expects to resume execution, when/if the callee returns; that is, the caller’s return-address is pushed onto the stack and then the machine begins executing the callee, in the new frame.

Let's see how __stack_chk_fail was called.

(gdb) f 5
#5  0x0000556ce7f921a4 in oops (buf=0x7ffcbf978577 "good-morning") at smash.c:13
13	}
(gdb) disass oops
Dump of assembler code for function oops:
   0x0000564f056b5155 <+0>:	push   %rbp
   0x0000564f056b5156 <+1>:	mov    %rsp,%rbp
   0x0000564f056b5159 <+4>:	sub    $0x30,%rsp
   0x0000564f056b515d <+8>:	mov    %rdi,-0x28(%rbp)
   0x0000564f056b5161 <+12>:	mov    %fs:0x28,%rax       put magic value → %rax
   0x0000564f056b516a <+21>:	mov    %rax,-0x8(%rbp)     stash %rax; → %rbp-8  
   0x0000564f056b516e <+25>:	xor    %eax,%eax
   0x0000564f056b5170 <+27>:	mov    -0x28(%rbp),%rdx
   0x0000564f056b5174 <+31>:	lea    -0x12(%rbp),%rax
   0x0000564f056b5178 <+35>:	mov    %rdx,%rsi
   0x0000564f056b517b <+38>:	mov    %rax,%rdi
   0x0000564f056b517e <+41>:	callq  0x564f056b5030 <strcpy@plt>
   0x0000564f056b5183 <+46>:	lea    -0x12(%rbp),%rax
   0x0000564f056b5187 <+50>:	mov    %rax,%rdi
   0x0000564f056b518a <+53>:	callq  0x564f056b5040 <puts@plt>
   0x0000564f056b518f <+58>:	nop
   0x0000564f056b5190 <+59>:	mov    -0x8(%rbp),%rax                          fetch saved magic value
   0x0000564f056b5194 <+63>:	xor    %fs:0x28,%rax                            xor vs real magic
   0x0000564f056b519d <+72>:	je     0x564f056b51a4 <oops+79>                 jump if saved still matches real magic
   0x0000564f056b519f <+74>:	callq  0x564f056b5050 <__stack_chk_fail@plt>    saved value got corrupted; abort
=> 0x0000564f056b51a4 <+79>:	leaveq 
   0x0000564f056b51a5 <+80>:	retq   
End of assembler dump.
(gdb) 
The “=>” in the left-hand margin shows what we were about to execute in the frame—that is, the return point from calling __stack_chk_fail. But how did we decide to call it?

Let's go back to the beginning of oops. At the <+12> location, we move %fs:0x28 into %rax. What is %fs:0x28? I'm deducing from the usage that it holds a magic value (on x86-64 Linux with glibc, this is the per-thread stack guard, often called a “canary”), which we store into %rbp-0x8, uh, I mean -0x8(%rbp), at <+21>.

Then, at <+59>, we read -0x8(%rbp) into %rax; we xor it with %fs:0x28 at <+63>. If they are equal, the xor at +63 will set %rax to zero; then the je (“jump if equal”) at +72 sends us to the leaveq instruction. But if they are not equal, we call __stack_chk_fail.

To summarize, then, at the beginning of the routine, we store %fs:0x28 into %rbp-0x8; just before returning, we load the (64-bit) word in %rbp-0x8 and compare it to %fs:0x28. If it matches, we’re good, but if not, we call __stack_chk_fail. This stack checking code is inserted into every function—provided that

  • you use the compiler option -fstack-protector-all
  • the function can return (i.e., it doesn’t consist only of a no-break, no-return infinite loop)
  • the function isn’t inlined or otherwise optimized away (which holds if, e.g., you compile with -O0 or the function isn’t declared static)

So what is at %rbp-0x8 here?

(gdb) x/xg $rbp-0x8
0x7ffcbf977d18:	0x88a84adec300676e
(gdb) 
Alert readers may note that the low-order 3 bytes of the above (i.e., the 00676e) match the tail end of the string provided on the command line: “ng\0”. This is the effect of the bad programming practice: local is a 10-byte buffer, but strcpy wrote 13 bytes into it (the 12 characters of “good-morning” plus the terminating NUL).
(gdb) info locals
local = "good-morni"
(gdb) p sizeof local
$1 = 10
(gdb) x/s local
0x7ffcbf977d0e:	"good-morning"
(gdb)
So by writing past the end of the 10-byte buffer “local[]”, we stomped (with “ng\0”) on the magic value used for stack check.

Now let’s have a look at the value(s) of %fs:0x28 stored elsewhere, starting one level “up,” that is, with oops’s caller:

(gdb) up
#6  0x0000556ce7f921cd in doit (buf=0x7ffcbf978577 "good-morning") at smash.c:21
21		oops(buf);
(gdb) x/8i doit
   0x556ce7f921a6 <doit>:	push   %rbp
   0x556ce7f921a7 <doit+1>:	mov    %rsp,%rbp
   0x556ce7f921aa <doit+4>:	sub    $0x20,%rsp
   0x556ce7f921ae <doit+8>:	mov    %rdi,-0x18(%rbp)
   0x556ce7f921b2 <doit+12>:	mov    %fs:0x28,%rax       put magic value → %rax
   0x556ce7f921bb <doit+21>:	mov    %rax,-0x8(%rbp)     stash %rax; → %rbp-8
   0x556ce7f921bf <doit+25>:	xor    %eax,%eax
   0x556ce7f921c1 <doit+27>:	mov    -0x18(%rbp),%rax
(gdb) x/xg $rbp-8
0x7ffcbf977d48:	0x88a84adec3f40c00
(gdb) 
Now let's try one more.
(gdb) up
#7  0x0000556ce7f9224d in main (argc=2, argv=0x7ffcbf977e68) at smash.c:31
31		doit(msg);
(gdb) x/8i main
   0x556ce7f921e4 <main>:	push   %rbp
   0x556ce7f921e5 <main+1>:	mov    %rsp,%rbp
   0x556ce7f921e8 <main+4>:	sub    $0x20,%rsp
   0x556ce7f921ec <main+8>:	mov    %edi,-0x14(%rbp)
   0x556ce7f921ef <main+11>:	mov    %rsi,-0x20(%rbp)
   0x556ce7f921f3 <main+15>:	mov    %fs:0x28,%rax       put magic value → %rax
   0x556ce7f921fc <main+24>:	mov    %rax,-0x8(%rbp)     stash %rax; → %rbp-8  
   0x556ce7f92200 <main+28>:	xor    %eax,%eax
(gdb) x/xg $rbp-8
0x7ffcbf977d78:	0x88a84adec3f40c00
(gdb) 
Now compare the above vs. what we had in $rbp-8 in frame 5:
0x7ffcbf977d78: 0x88a84adec3f40c00 ← frame 7
0x7ffcbf977d48: 0x88a84adec3f40c00 ← frame 6
0x7ffcbf977d18: 0x88a84adec300676e ← frame 5
Identical except for the low-order 3 bytes. The values of %fs:0x28 stored by main and doit match; the value stored by oops no longer does. And that’s how the program knew there really was stack smashing.

A few more points

  • The stack checking code doesn’t always catch overruns. It did in this case because the variable named local was immediately below (i.e., lower memory address) the spot where the magic value was stashed away, and we overran local by a few bytes. But if we did something nastier in oops, like
    local[59] = 'x';
    then oops’s magic value would not have been disturbed. Probably doit’s magic value would have been detectably corrupted, and the backtrace would have shown doit, not oops, calling __stack_chk_fail.
  • If local had been allocated via malloc(3) with that size, rather than being an on-stack variable, buffer overruns might be detected by bug-catching code in malloc or free, rather than code surrounding a call to __stack_chk_fail.
  • As alluded to earlier, if function(s) are declared static and the file is compiled with optimization, the corruption may occur in an “interior” or lower-level routine (a callee of a callee of…) but the stack-checking code may be present in only the caller. This is in fact what happened when I added “-O2” to the compilation command for smash.c:
    collin@collin-t450:~/stack-chk$ cc -fstack-protector-all -Wall -Werror -g -O2   smash.c   -o smash
    collin@collin-t450:~/stack-chk$ ./smash good-morning
    good-morning
    *** stack smashing detected ***: <unknown> terminated
    Aborted (core dumped)
    collin@collin-t450:~/stack-chk$ gdb smash core
    ...
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from smash...done.
    [New LWP 16201]
    Core was generated by `./smash good-morning'.
    Program terminated with signal SIGABRT, Aborted.
    #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
    50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
    (gdb) bt
    #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
    #1  0x00007f3d06aea535 in __GI_abort () at abort.c:79
    #2  0x00007f3d06b41508 in __libc_message (action=<optimized out>, 
        fmt=fmt@entry=0x7f3d06c4c07b "*** %s ***: %s terminated\n")
        at ../sysdeps/posix/libc_fatal.c:181
    #3  0x00007f3d06bd280d in __GI___fortify_fail_abort (
        need_backtrace=need_backtrace@entry=false, 
        msg=msg@entry=0x7f3d06c4c059 "stack smashing detected")
        at fortify_fail.c:28
    #4  0x00007f3d06bd27c2 in __stack_chk_fail () at stack_chk_fail.c:29
    #5  0x000055a71abc70d9 in main (argc=<optimized out>, argv=<optimized out>)
        at smash.c:32
    (gdb)     

Thursday, November 04, 2021

Animations with gifsicle... plus complications

Some years ago, I discovered the excellent gifsicle, which I've used many times to create animated GIFs, like this sunrise at Zabriskie Point. Today I did a slightly(?) more difficult one, made harder because I was snapping away with a phone, and about halfway through I changed the phone's orientation, switching from “portrait” (tall) to “landscape” (wide) images. It also didn't help that I know only the basics of image files. The good news, though, is that ImageMagick’s amazing convert(1) program has every capability we need to give gifsicle what it needs for a reasonable-looking animation. But I’m getting ahead of myself.

At first, I did what came naturally: downloaded the photos, converted them all to "gif"s, and told gifsicle to create the animated gif. Well, it was a disaster. Not a real disaster (no animals were harmed), but the animation started out in portrait mode and then switched to… Bad. The original files didn't tell me that though!

collin@collin-t450:/tmp/iCloud Photos$ file IMG_*.JPG | sed -e 's/Exif.*one 6,//' -e 's/xresolution.*precision 8,//'
IMG_0733.JPG: JPEG image data,  orientation=upper-right,  3264x2448, components 3
IMG_0734.JPG: JPEG image data,  orientation=upper-right,  3264x2448, components 3
IMG_0735.JPG: JPEG image data,  orientation=upper-right,  3264x2448, components 3
IMG_0736.JPG: JPEG image data,  orientation=upper-right,  3264x2448, components 3
IMG_0737.JPG: JPEG image data,  orientation=upper-right,  3264x2448, components 3
IMG_0738.JPG: JPEG image data,  orientation=upper-right,  3264x2448, components 3
IMG_0739.JPG: JPEG image data,  orientation=upper-right,  3264x2448, components 3
IMG_0740.JPG: JPEG image data,  orientation=upper-left,  3264x2448, components 3
IMG_0741.JPG: JPEG image data,  orientation=upper-left,  3264x2448, components 3
IMG_0742.JPG: JPEG image data,  orientation=upper-left,  3264x2448, components 3
IMG_0743.JPG: JPEG image data,  orientation=upper-left,  3264x2448, components 3
IMG_0744.JPG: JPEG image data,  orientation=upper-left,  3264x2448, components 3
collin@collin-t450:/tmp/iCloud Photos$ 
The “sed ...” removes a bunch of “TIFF image data...” stuff that was the same for all the files; I wanted the above output to be more readable. The thing I want you to notice is that all the image files are supposedly 3264x2448 pixels. Now there is a clue in the “orientation=” stuff, but as I said, I know only very basic stuff about these things.

Now part of my process (i.e., when doing what came naturally) was to convert these JPEG files to GIFs. I think I did something like

for F in IMG*JPG; do convert "$F" "${F%.JPG}.gif"; done
After that, the switch from portrait to landscape became more obvious:
collin@collin-t450:/tmp/iCloud Photos$ file IMG*gif
IMG_0733.gif: GIF image data, version 89a, 2448 x 3264
IMG_0734.gif: GIF image data, version 89a, 2448 x 3264
IMG_0735.gif: GIF image data, version 89a, 2448 x 3264
IMG_0736.gif: GIF image data, version 89a, 2448 x 3264
IMG_0737.gif: GIF image data, version 89a, 2448 x 3264
IMG_0738.gif: GIF image data, version 89a, 2448 x 3264
IMG_0739.gif: GIF image data, version 89a, 2448 x 3264
IMG_0740.gif: GIF image data, version 89a, 3264 x 2448    ←landscape begins here
IMG_0741.gif: GIF image data, version 89a, 3264 x 2448
IMG_0742.gif: GIF image data, version 89a, 3264 x 2448
IMG_0743.gif: GIF image data, version 89a, 3264 x 2448
IMG_0744.gif: GIF image data, version 89a, 3264 x 2448
collin@collin-t450:/tmp/iCloud Photos$ 

Now the good news is that when I switched from portrait to landscape, Sheri's head remained roughly centered and about the same distance from the top of the image. To cut to the chase, I wrote a shell “one-liner” like this:
collin@collin-t450:/tmp/iCloud Photos$ for F in IMG_07*gif; do NEW=new${F#IMG_}; if file $F | grep "3264 x"; then C='2448x2448+408+0!' ; else C='2448x2448+0+0!'; fi; convert $F -crop $C +repage -resize 50% -remap IMG_0733.gif $NEW; done
which I'll explain tersely because it's time to go do something with the lovely Carol.

  1. NEW=new${F#IMG_} sets $NEW to $F with the leading IMG_ replaced by new, so IMG_0733.gif becomes new0733.gif, for example.
  2. we check for landscape (those 3264 x 2448 images), and change the cropping parameter to fit the orientation of the original; that's what the C= stuff is.
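  3. we need +repage to make the “canvas size” fit the image boundaries. As far as I can tell, -crop leaves the original canvas size and offset recorded in the image, and +repage resets them to match the cropped pixels. If I don't do it, everything looks weird.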
  4. -remap is so that all the new files will share the same colormap. Because gifsicle requires that.
Then, to make the animation,
collin@collin-t450:/tmp/iCloud Photos$ gifsicle -d 20 new*gif -o foo.gif
You can see the result on Facebook if you’re Sheri’s “friend” there.