Refactoring for Smaller Code
Everyone loves to preach about refactoring. How you should refactor, why you should refactor, when you should refactor, etc. But, does anyone actually measure the effect of refactoring on their code? Some people do, but I'm willing to bet that a larger portion of the crowd don't. They worry about their trivial asthetic concerns and leave the compiler to its own devices in the hopes that it will magically produce better code.
Consider the case of developing an embedded application. It's one area where refactoring can substantially benfit or harm the end goal: to keep the code small and fast. With this post, I'm going to write very simple program, and then refactor it, measuring the results along the way.
Now, keep in mind the fact that optimization by the compiler may produce varied results with different compiler versions and may perform differently across different architectures. YMMV, but you can't deny that sometimes the compiler needs a helping hand. :P
Remember, the way you structure your code will impact how the compiler
generates and optimizes that code. For the record, the code and results
below were written and compiled on an x86_64
machine with the
following version of GCC:
$ gcc --version gcc (Gentoo 4.8.3 p1.1, pie-0.5.9) 4.8.3 Copyright (C) 2013 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The Program
Our victim for this evening will be a very simple, and quickly implemented. "echo" server. You connect to it, and it echoes back whatever you send it.
Let's look at how big this program is:
$ gcc -O0 -Wall -o echo-O0 echo.c $ gcc -O2 -Wall -o echo-O2 echo.c $ gcc -Os -Wall -o echo-Os echo.c $ size echo-O* text data bss dec hex filename 3086 688 48 3822 eee echo-O0 2879 664 48 3591 e07 echo-O2 2775 664 48 3487 d9f echo-Os $ readelf -s echo-O* | grep ' FUNC ' | grep -v ' UND ' | grep main 60: 00000000004009fd 665 FUNC GLOBAL DEFAULT 12 main 56: 0000000000400860 608 FUNC GLOBAL DEFAULT 12 main 56: 0000000000400860 513 FUNC GLOBAL DEFAULT 12 main
| Optimzation Level | .text Size | functions Size | | ----------------- | ---------- | -------------- | | -O0 (None) | 3086 | 665 | | -O2 (Speed) | 2879 | 608 | | -Os (Size) | 2775 | 513 |
This will be our baseline measurement for comparing changes during our refactoring. Optimization Level refers to what level of optimization we instructed the compiler to use, .text Size is the size of the resulting binary's .text section where our code is stored. functions Size is the total size of all functions we've written.
Refactor #1: Move the socket setup code into a separate function
Some people would see the code above and start complaining that the
main()
function is too big. Well, let's move the socket setup
code into its own function and see what happens.
Now, the main()
function is smaller, thus we've acheived our asthetic
goal, right?
Now, let's measure the difference:
| Optimzation Level | .text Size | functions Size | Deltas | | ----------------- | ---------- | -------------- | ----------- | | -O0 (None) | 3158 | 699 | +72 / +34 | | -O2 (Speed) | 2959 | 637 | +80 / +29 | | -Os (Size) | 2857 | 512 | +82 / -1 |
"WTF? It got bigger?", you say? Well, by God, it did. That extra function only saved us 1 byte with -Os! It cost us otherwise, and inflated the .text section. All functions come with a little bit of overhead, but this is a little excessive.
How can we avoid that? Simple, make add the static qualifier to the function. GCC's optimizer should then automatically inline this new function at -O2 or -Os, since it's static and only called from one place. But, how much difference, if any, does that make?
| Optimzation Level | .text Size | functions Size | Deltas | | ----------------- | ---------- | -------------- | ----------- | | -O0 (None) | 3158 | 699 | +72 / +34 | | -O2 (Speed) | 2879 | 601 | 0 / -7 | | -Os (Size) | 2775 | 512 | 0 / -1 |
For -O0, there's no change. At -O2 we save 7 bytes in total, and the only difference with -Os is that we avoid bloating the .text section. The bottom line is that moving this code out into a separate function was not worth it.
Note that just specifying inline by itself will have no effect because the function is, by definition, global.
Refactor #2: Move the work loop into its own function
Since we've proven that moving the socket setup code is't worth it, let's try something that actually makes sense.
| Optimzation Level | .text Size | functions Size | Deltas | | ----------------- | ---------- | -------------- | ----------- | | -O0 (None) | 3094 | 642 | +8 / -23 | | -O2 (Speed) | 2895 | 620 | +16 / -17 | | -Os (Size) | 2759 | 485 | -16 / -28 |
Now, there's a bit of savings on function size across the board, and the .text section only became marginally larger (smaller in the case of -Os) due to its alignment (16-bytes.)
Refactor #3: Make the work loop smaller
Building on our recent success, let's see if we can do better by
reducing the size of the work loop. We expect the work loop to run
at least once, so we'll replace the while
loop with a
do { } while()
loop to reflect that notion, for readability.
Then, we'll move all of the basic checks (e.g. checking for EOF,
checking for '\n', etc.) into the while(...)
expression.
| Optimzation Level | .text Size | functions Size | Deltas | | ----------------- | ---------- | -------------- | ----------- | | -O0 (None) | 3094 | 636 | +8 / -29 | | -O2 (Speed) | 2839 | 567 | -40 / -41 | | -Os (Size) | 2767 | 497 | -8 / -16 |
Interestingly, this is worse at -Os but far better at -O0 and -O2.
Let's turn our attention back to the main()
function for now.
Refactor #4: Replace duplicate returns with goto
Going back to the original code, our program runs in a coninuous loop.
We never want it to exit. So, we can make EXIT_SUCCESS
(0) the
default return code (exiting is itself an error) and refactor all those
duplicate return
statements in main()
by using our old, and
often misunderstood friend: goto
.
| Optimzation Level | .text Size | functions Size | Deltas | | ----------------- | ---------- | -------------- | ----------- | | -O0 (None) | 3038 | 625 | -48 / -40 | | -O2 (Speed) | 2823 | 559 | -56 / -49 | | -Os (Size) | 2759 | 486 | -16 / -27 |
Hell, look at that. Our smallest sizes yet! In fact, the gain for -Os is only 1 byte bigger than in Refactor #2. This means we've done something right.
The lesson to be learned here is two-fold:
- It's less work to return '0' instead of a non-zero value.
- Using
goto
instead of duplicatingreturn
produces a sizeable savings.
Refactor #5: Reduce the depth of nesting in main()
Let's see if our old pal goto
can help us by reducing the depth of
nesting in main()
. Too much nesting is often a readability nightmare,
but does it actually have any affect on GCC's optimizer? Let's replace
our while
loops and find out. I've also simplfied the work loop
here, since replacing the main while(1)
loop with goto doesn't
effect any change at all.
| Optimzation Level | .text Size | functions Size | Deltas | | ----------------- | ---------- | -------------- | ----------- | | -O0 (None) | 3022 | 601 | -64 / -64 | | -O2 (Speed) | 2839 | 569 | -40 / -39 | | -Os (Size) | 2727 | 459 | -48 / -54 |
Interestingly, we see the largest gains in -O0 and -Os yet, although -O2 was slightly worse off. We're definitely on the right track!
Refactor #6: Move the work loop back out into its own function.
We got some gains from moving the work loop out before, and it's better for readability. Let's see if now, we gain anything from it.
| Optimzation Level | .text Size | functions Size | Deltas | | ----------------- | ---------- | -------------- | ----------- | | -O0 (None) | 3062 | 601 | -24 / -64 | | -O2 (Speed) | 2839 | 572 | -40 / -36 | | -Os (Size) | 2727 | 455 | -48 / -58 |
-O0 stays the same function-wise, but the .text section grew by 40 bytes, -O2 is worse by 3 bytes, and -Os improved by 4 bytes; with no change in section size.
Taking everything into account, I'd say this is the best variant so far. In preparing to write this blog entry, I experiemented with and tested over 25 different variations of our little echo server, to present the changes which had the most profound impact to you, the reader.
With this variant, we achieved code that is ~11% smaller with -Os, 9.62% smaller with -O0, and 5.9% smaller with -O2.
Conclusion
By measuring the effect of each refactoring step, it was simple to see that while some things may be better asthetically, they can have a profoundly negative impact on the generated code, if you're aiming for small code that is. The biggest lesson in that is that it's much more important to remove duplicate code than to appease purely asthetic concerns. You can also see that sometimes it doesn't really matter. ;)
Of course, this is a mere toy program, and there are tons of flags which can be passed to GCC to affect the behavior of the optimizer. I didn't want to get into all that here, but instead wanted to give an idea of how "suface-level" changes can effect the code generated by GCC. If I cared to generate a statically-linked executable, I'd go a little deeper into tweaking the amount of crap imported from libc, etc. But, I wanted to keep this short and simple.
Plenty of people would say that in this day and age, optimizing for size is an arcane art best left upto the hackers, tinkerers, and embedded developers of the world; and that the focus of refactoring efforts should be solely on readability and "coding standards."
I have to completely disagree with thier viewpoint. I say that it all depends on what your needs are. That's why you're writing code, isn't it? Because you need something. Money from your employer, a solution to an interesting problem, it all comes back to need. So, if you need small code, you should write code with that idea at the forefront of your mind.
By the way, if anyone has any ideas of how to continue slimming down this little program, beyond the obvious "rewrite it in assembly," feel free to share in the comment box below. :P