Information Assurance
Bugtraq Analysis
Buffer Overflow Vulnerabilities in Gaim
Bugtraq Email: 12 x Gaim remote overflows
Link:
http://www.securityfocus.com/archive/1/351235
Gaim is an AOL Instant Messenger client written originally for Linux,
but which has since been ported to several other operating systems.
This Bugtraq advisory was of particular interest to me because I use
Gaim for instant messaging on my Linux machine. The sheer number of
vulnerabilities presented was quite disconcerting. Normally, in a
widely used program such as Gaim, one buffer overflow presents a large
potential problem, but the existence of 12 such bugs seems
unthinkable. In addition, the email provided a very good description
of the problems, such that it was quite easy to see why there was a
potential problem in each situation.
For my presentation, I focused on just one of the vulnerabilities
disclosed, which occurs in the gaim_url_parse() function. I will
present the problem, show how it could be exploited, explain how it
could be fixed, and discuss how it might be possible to prevent such
problems in the future.
Here is the relevant code snippet as presented in the email:
gboolean
gaim_url_parse(const char *url, char **ret_host, int *ret_port,
               char **ret_path)
{
    char scan_info[255];
    char port_str[5];
    int f;
    const char *turl;
    char host[256], path[256];
    int port = 0;
    /* hyphen at end includes it in control set */
    static char addr_ctrl[] = "A-Za-z0-9.-";
    static char port_ctrl[] = "0-9";
    static char page_ctrl[] = "A-Za-z0-9.~_/:*!@&%%?=+^-";
    ...
    g_snprintf(scan_info, sizeof(scan_info),
               "%%[%s]:%%[%s]/%%[%s]", addr_ctrl,
               port_ctrl, page_ctrl);
    f = sscanf(url, scan_info, host, port_str, path);  /* <-- [10] */
    ...
The two lines of interest here are the calls to g_snprintf() and
sscanf(), the real problem being the call to sscanf(). The call to
g_snprintf() is really just setup for the call to sscanf(), and as
such it only really needs to be understood in terms of how it works
with the sscanf() call.
The sscanf() function is a standard C library routine that reads
formatted input from a string. Let's say that I have a string that
contains some input, say "Zico 20", and I want to parse this into
two data items, one with the string "Zico" and one with the number 20.
I would write code like this:
char name[256];
int age;
char *data;
...
sscanf(data, "%s %d", name, &age);
This may look a little complicated, but it's really quite simple. As
the first argument, sscanf() takes the input string it will parse.
The second argument is the format of the input string. "%s" denotes a
string in the input, while "%d" denotes an integer. Of course, there
are several other conversion specifiers that define how the
function will parse the input. This call is like the C++ iostream
library call of:
cin >> name >> age;
except that it reads input from a string, not from the user's input at
the terminal.
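As a minimal, compilable sketch of the call above: the helper name parse_person() is hypothetical, and the field width of 255 is an addition that keeps the read inside the 256-byte name array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse "<name> <age>" from data.
 * Returns the number of fields sscanf() converted (2 on success). */
int parse_person(const char *data, char name[256], int *age)
{
    assert(data != NULL && name != NULL && age != NULL);
    /* %255s reads at most 255 non-space characters, leaving room for '\0'. */
    return sscanf(data, "%255s %d", name, age);
}
```

Calling parse_person("Zico 20", name, &age) should fill name with "Zico" and age with 20. The return value tells the caller how many fields were actually converted.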
Because a complex formatting string for the sscanf() instruction can
be unwieldy, the Gaim authors chose to break up the call into two
parts, one call to generate the formatting string, and the second to
parse the string. In this case, the call to g_snprintf() always
generates the same formatting string, and does not rely on any user
input. To be precise, the formatting string it generates will always
be:
"%[A-Za-z0-9.-]:%[0-9]/%[A-Za-z0-9.~_/:*!@&%%?=+^-]"
When used with sscanf(), this format string will read one or more
letters, digits, dots, or hyphens, followed by a colon, followed by
one or more digits, followed by a slash, followed by one or more
alphanumeric characters and certain special symbols. So if it read
the string "www.cs.georgetown.edu:80/~clay", it would parse this into
three strings, "www.cs.georgetown.edu", "80", and "~clay".
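The parse described above can be tried out with a small sketch. It uses the standard snprintf() in place of g_snprintf(), and the function name split_url_demo() and the 256-byte port buffer are assumptions made for the illustration. Because the conversions are unbounded, the sketch refuses input of 256 characters or more.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the scanset format the same way the snippet does and applies it
 * to url. Returns the number of fields matched. Illustration only: every
 * output buffer must hold 256 bytes, and url must be shorter than that. */
int split_url_demo(const char *url, char host[256], char port_str[256],
                   char path[256])
{
    static const char addr_ctrl[] = "A-Za-z0-9.-";
    static const char port_ctrl[] = "0-9";
    static const char page_ctrl[] = "A-Za-z0-9.~_/:*!@&%%?=+^-";
    char scan_info[255];

    /* No field can exceed 255 characters if the whole input is this short. */
    assert(url != NULL && strlen(url) < 256);

    snprintf(scan_info, sizeof(scan_info), "%%[%s]:%%[%s]/%%[%s]",
             addr_ctrl, port_ctrl, page_ctrl);
    return sscanf(url, scan_info, host, port_str, path);
}
```

For "www.cs.georgetown.edu:80/~clay" this should return 3 with the three strings listed above. A URL without a port, such as "www.example.com/index", stops matching at the missing colon and returns 1.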
The problem with this, however, occurs when the input is too big to
fit in the corresponding arrays. In the gaim_url_parse() function,
the host variable is declared as an array of 256 characters, which
leaves room for 255 characters plus the terminating null byte. So what
happens if a malicious person manipulates the IM protocol so as to
send Gaim a URL with a host that is longer than 255 characters? The
sscanf() function will continue reading the string, and write past the
end of the array. This will then start to overwrite other data on the
stack, such as other variables or arrays declared in the function.
At the very least, a malformed string would probably corrupt the value
of other variables in the function.
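To make the size arithmetic concrete without actually overflowing anything, here is a rough sketch with hypothetical helper names. It counts how many characters the host scanset would accept, then checks whether that run plus the null byte fits in a buffer of a given size. It assumes the default C locale, where isalnum() accepts exactly A-Z, a-z, and 0-9.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts the leading characters of url that the "A-Za-z0-9.-" scanset
 * would accept, i.e. how many characters an unbounded %[...] conversion
 * would copy into host before stopping. */
size_t host_run_length(const char *url)
{
    size_t n = 0;

    assert(url != NULL);
    while (isalnum((unsigned char)url[n]) || url[n] == '.' || url[n] == '-')
        n++;
    return n;
}

/* Nonzero if copying that run plus the terminating '\0' would not fit
 * in a buffer of host_size bytes. */
int host_would_overflow(const char *url, size_t host_size)
{
    return host_run_length(url) + 1 > host_size;
}
```

With host_size set to 256, a host of 255 characters still fits, while 256 or more would write past the end of the array.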
The real risk of this kind of buffer overflow, however, is much more
serious. When you declare a local variable or array in a function,
the program reserves room for these local variables on the stack, a
section of memory at the end of a program's usable memory space. But
the stack is also used for storing the return addresses of functions.
When you call a function in C, you jump to the new function, but also
push the address of the instruction following the call on to the
stack, so that when you return from the function, the processor knows
where to continue execution. However, this also means that if you
overwrite data on the stack, you're not only overwriting data, you may
also be overwriting the return address for the current function. So
when the function finishes execution, it won't return to the correct
instruction. At the very least, this can cause the program to jump to
an incorrect address and crash. But, if a clever sequence of bytes is
written, it is possible to tell the program to jump to an address that
we've overwritten with our own instructions. We could then execute any
code on the remote machine with the same access privileges as the
program. And, now that such exploits are well known, we don't even
have to be particularly clever in writing the exploit code, as there
is exploit code freely available that will, for example, execute a
shell on the remote machine. This would effectively give us full
access to that machine, or at least however much access the person
running Gaim has.
(For a better description of buffer overflows, see the link on the IA
website to the
"Smashing the Stack for Fun and Profit" article.)
There is some good news about this particular buffer overflow,
however. The setup of the formatting string only allows, at the most,
the characters "A-Za-z0-9.~_/:*!@&%%?=+^-" to be written to the path
variable, and an even smaller set to the host variable. This means
that even if we can overwrite the stack, we have to overwrite it with
just these characters, making it difficult to exploit this overflow.
However, even if we couldn't execute arbitrary code, we could still
easily crash the program, and the potential exists that someone could
devise malicious code that uses only these characters.
So, now that we know the problem, what can be done about it? The
solution in this case is very simple. The sscanf() function, in
addition to specifying how to read the input, can also specify the
maximum number of characters to read into any particular field of the
input. After the % character in the formatting string, we simply
write the maximum number of characters for that field. Because
sscanf() also stores a terminating null byte, this width must be at
most one less than the size of the array. So when we generate the
formatting string with the g_snprintf() call, the overflow would be
prevented by using the following code:
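/* widths are one less than each array: host[256], port_str[5], path[256];
   a five-digit port would need port_str[6] and a width of 5 */
g_snprintf(scan_info, sizeof(scan_info),
"%%255[%s]:%%4[%s]/%%255[%s]", addr_ctrl,
port_ctrl, page_ctrl);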
Or, if we didn't want to hardcode values into the code, we could just
use:
g_snprintf(scan_info, sizeof(scan_info),
"%%%d[%s]:%%%d[%s]/%%%d[%s]", (int)(sizeof(host) - 1),
addr_ctrl, (int)(sizeof(port_str) - 1), port_ctrl,
(int)(sizeof(path) - 1), page_ctrl);
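Now, if a malicious person sends an oversized URL to Gaim, sscanf()
stops filling each array at its limit. The oversized field is cut off
(and the rest of the URL will most likely fail to match the format),
but no data on the stack will be overwritten.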
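The bounded parse can be sketched as a standalone function. This is only an illustration under stated assumptions: it uses the standard snprintf() instead of g_snprintf(), the name split_url_bounded() and the enum constants are hypothetical, and the buffer sizes are copied from the snippet's declarations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Buffer sizes as declared in the snippet (the names are hypothetical). */
enum { HOST_SIZE = 256, PORT_SIZE = 5, PATH_SIZE = 256 };

/* Parses url with a field width on every conversion.
 * Returns the number of fields matched (3 for a complete URL). */
int split_url_bounded(const char *url, char host[HOST_SIZE],
                      char port_str[PORT_SIZE], char path[PATH_SIZE])
{
    static const char addr_ctrl[] = "A-Za-z0-9.-";
    static const char port_ctrl[] = "0-9";
    static const char page_ctrl[] = "A-Za-z0-9.~_/:*!@&%%?=+^-";
    char scan_info[255];

    assert(url != NULL);
    /* Each width is the buffer size minus one, leaving room for '\0'. */
    snprintf(scan_info, sizeof(scan_info), "%%%d[%s]:%%%d[%s]/%%%d[%s]",
             HOST_SIZE - 1, addr_ctrl,
             PORT_SIZE - 1, port_ctrl,
             PATH_SIZE - 1, page_ctrl);
    return sscanf(url, scan_info, host, port_str, path);
}
```

A well-formed URL still parses into three fields. A host of 300 characters is cut off at 255, the colon no longer matches, and the function returns 1, so a caller that checks the return value can reject the URL.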
So, in a broader sense, what can be done about this problem? These
kinds of buffer overflows have been well understood for more than 10
years, yet people still make the same programming mistakes that allow
for such exploitation. The only real solution is educating
programmers to know that when you make any call to scanf() or sscanf()
that reads input to a string, you have to specify a maximum size for
the string.
On the plus side, it does seem like progress is being made in this
direction. Look at the call to g_snprintf(). The function snprintf()
(in this case g_snprintf(), which is just the GLib library's version
of the function) is well known as a safe replacement for sprintf().
The sprintf() function is kind of the opposite of the sscanf()
function: it writes formatted data to a string. But, just like before,
if the length of
the string we're going to write is bigger than the size of the array
to which it is being written, it can overflow the stack and lead to
the same kind of exploit. So it has become a somewhat
well-established principle of good programming that instead of using
the sprintf() function, you should always use the snprintf() function,
which allows you to specify a maximum length of the output string,
such that it will never write data beyond this length. And that is
exactly what the authors of Gaim did. The irony is that the call to
g_snprintf() is not based on user input, and therefore there is no
potential to overwrite the stack. The call will always generate the
same string, which will always have the same length, and a malicious
person could not exploit this to execute arbitrary code on the system.
But despite this, the Gaim authors still use g_snprintf(). And
although it's unnecessary, I would not say this is a bad thing,
as it is just reinforcing the habit that programmers should use this
function instead of its less-safe counterparts. All that needs to
happen, then, is that programmers need to develop the same mindset
about specifying maximum string lengths when using scanf() or
sscanf(). If we can do this, then we will go a long way towards
decreasing the number of buffer overflow exploits present in our
applications.
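The snprintf() habit described above can be sketched as follows. The example uses the standard C snprintf() rather than g_snprintf(), and the helper format_greeting() is hypothetical. It also checks the return value, which reports how long the full output would have been, so the caller can tell whether the text was truncated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "Hello, <name>!" into out (capacity n bytes).
 * Returns 1 if the whole message fit, 0 if it was truncated or failed. */
int format_greeting(char *out, size_t n, const char *name)
{
    int needed;

    assert(out != NULL && n > 0 && name != NULL);
    /* snprintf() never writes more than n bytes, including the '\0'. */
    needed = snprintf(out, n, "Hello, %s!", name);
    return needed >= 0 && (size_t)needed < n;
}
```

With an 8-byte buffer and the name "Zico", the output is cut to "Hello, " and the function returns 0. With a 32-byte buffer the full "Hello, Zico!" is written and it returns 1.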