- Here's a cat program (call it cat1.c) written
in C that aims to emulate the UNIX cat command.
#include <stdio.h>

int main(void)
{
    int c;    /* int, not char: getchar returns EOF as an out-of-range int */

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
    return 0;
}
This program uses the getchar and putchar functions
declared in stdio.h (i.e., they belong to the standard C
library rather than to the core language) to read from
standard input and write to standard output.
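One detail worth spelling out: getchar returns an int, not a char, so
that EOF (a negative value) can be told apart from every valid byte.
If the result is stored in a char, a data byte such as 0xFF can be
mistaken for EOF on platforms where char is signed, and EOF can never
be detected where char is unsigned. A minimal sketch, using a
hypothetical helper count_bytes (not part of the programs here) and
getc on an arbitrary stream so that it can be tried on a file:

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining on a stream.
   The result of getc is kept in an int so that EOF cannot be
   confused with a data byte such as 0xFF. */
long count_bytes(FILE *in)
{
    long n = 0;
    int c;                      /* int, not char */

    while ((c = getc(in)) != EOF)
        n++;
    return n;
}
```

Calling count_bytes(stdin) from main would count the bytes arriving on
standard input, including any 0xFF bytes in a binary file.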
- Here's a different program (cat2.c) that uses
fread and fwrite calls (again, part of
stdio.h) to achieve the same effect.
#include <stdio.h>

int main(void)
{
    char c[1];
    size_t i;    /* fread returns the number of items actually read */

    i = fread(c, 1, 1, stdin);
    while (i > 0) {
        fwrite(c, 1, 1, stdout);
        i = fread(c, 1, 1, stdin);
    }
    return 0;
}
To learn more about these functions, consult their manual
pages (e.g., man fread).
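The value returned by fread deserves a closer look: it is the number
of items read, and a return of 0 can mean either end of file or an
error. The sketch below, built around a hypothetical helper copy_bytes
(an illustration, not one of the cat programs), shows one way to check
both calls and to tell the two cases apart with ferror:

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out one byte at a time and
   returns the number of bytes copied, or -1 on a read or write
   error. */
long copy_bytes(FILE *in, FILE *out)
{
    char c;
    long total = 0;

    /* With an item count of 1, fread returns 1 on success and
       0 on end of file or error. */
    while (fread(&c, 1, 1, in) == 1) {
        if (fwrite(&c, 1, 1, out) != 1)
            return -1;          /* write failed */
        total++;
    }
    /* ferror distinguishes a read error from a normal end of file. */
    return ferror(in) ? -1 : total;
}
```

Calling copy_bytes(stdin, stdout) from main would behave like the
program above, with the addition of error reporting.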
- Finally, here's a third variant (cat3.c) that bypasses
stdio and invokes the UNIX system calls read and write directly:
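#include <unistd.h>    /* declares read and write */

int main(void)
{
    char c;
    ssize_t i;    /* read returns a byte count, 0 at end of file, -1 on error */

    i = read(0, &c, 1);      /* 0 is standard input */
    while (i > 0) {
        write(1, &c, 1);     /* 1 is standard output */
        i = read(0, &c, 1);
    }
    return 0;
}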
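The loop in this program ignores the value returned by write. For a
one-byte write that is mostly harmless, but in general write may
transfer fewer bytes than requested (on a pipe or socket, for
instance) or fail with EINTR when a signal arrives. A common remedy,
sketched here with a hypothetical helper write_all, is to loop until
everything has been written:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write until all len bytes
   are written. Returns 0 on success, -1 on error (errno is set). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;               /* skip the part already written */
        len -= (size_t)n;
    }
    return 0;
}
```

Inside the loop of cat3.c, write_all(1, &c, 1) could stand in for the
bare write call.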
- Let us time these programs using the UNIX (bash) command
time, like so (well, first we have to compile them!):
gcc cat1.c -o cat1
time ./cat1 < bigfile > /dev/null
gcc cat2.c -o cat2
time ./cat2 < bigfile > /dev/null
gcc cat3.c -o cat3
time ./cat3 < bigfile > /dev/null
Here, bigfile is a really big file, created
by us for testing purposes. For each cat program,
we are reading from standard input (redirected through the
< operator) and shunting the output to a black hole
(/dev/null). You will notice that the last
program is the slowest even though it uses
UNIX system calls directly, whereas the other two rely on
library functions that invoke those same system
calls indirectly and yet run much faster! How do
we explain this anomaly?
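To reproduce these timings you need a bigfile of your own. One way to
create it is sketched below; the helper name make_file, the filler
text, and the size you pass in are all assumptions made for
illustration, not the way the original test file was produced.

```c
#include <stdio.h>

/* Hypothetical helper: creates (or overwrites) the file at path
   with nbytes bytes of repeating printable text.
   Returns 0 on success, -1 on failure. */
int make_file(const char *path, long nbytes)
{
    static const char line[] =
        "the quick brown fox jumps over the lazy dog\n";
    const long linelen = (long)(sizeof line - 1);
    long written = 0;
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    while (written < nbytes) {
        long chunk = nbytes - written;  /* bytes still missing */
        if (chunk > linelen)
            chunk = linelen;
        if (fwrite(line, 1, (size_t)chunk, f) != (size_t)chunk) {
            fclose(f);
            return -1;
        }
        written += chunk;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling make_file("bigfile", 100L * 1024 * 1024) from main would
produce a file of 100 MiB; pick a size large enough that the slowest
program runs for at least a few seconds.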
- There are two reasons. First, every system call, powerful as it
is, forces a switch from the program (user mode) into the operating
system kernel and back, since system calls are executed by the OS
itself. This introduces a lot of overhead. Second, cat3 transfers
only one character per system call. The stdio functions avoid this
by buffering internally: when input comes from a file, they typically
fetch a large block with a single read and then hand out characters
from memory. We can do the same ourselves by reading multiple
characters at a time. So, here are two programs
(cat4.c and cat5.c) that do buffering and that
also take an extra argument to indicate the size of the buffer.
First, cat4.c:
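#include <stdlib.h>    /* atoi, malloc, free */
#include <unistd.h>    /* read, write */

int main(int argc, char **argv)
{
    int bufsize;
    char *c;
    ssize_t i;

    if (argc < 2) return 1;          /* the buffer size is required */
    bufsize = atoi(argv[1]);
    if (bufsize <= 0) return 1;      /* reject zero, negative, or non-numeric sizes */
    c = malloc(bufsize * sizeof(char));
    if (c == NULL) return 1;

    i = 1;
    while (i > 0) {
        i = read(0, c, bufsize);     /* up to bufsize bytes per system call */
        if (i > 0) write(1, c, i);
    }
    free(c);
    return 0;
}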
Then, cat5.c:
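#include <stdio.h>     /* fread, fwrite */
#include <stdlib.h>    /* atoi, malloc, free */

int main(int argc, char **argv)
{
    int bufsize;
    char *c;
    size_t i;

    if (argc < 2) return 1;          /* the buffer size is required */
    bufsize = atoi(argv[1]);
    if (bufsize <= 0) return 1;      /* reject zero, negative, or non-numeric sizes */
    c = malloc(bufsize * sizeof(char));
    if (c == NULL) return 1;

    i = 1;
    while (i > 0) {
        i = fread(c, 1, bufsize, stdin);    /* up to bufsize bytes per call */
        if (i > 0) fwrite(c, 1, i, stdout);
    }
    free(c);
    return 0;
}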
Try running these programs with a small buffer size and then
gradually increasing it (the size is in bytes). For example:
gcc cat4.c -o cat4
time ./cat4 1 < bigfile > /dev/null
time ./cat4 64 < bigfile > /dev/null
time ./cat4 4096 < bigfile > /dev/null
...
gcc cat5.c -o cat5
time ./cat5 1 < bigfile > /dev/null
...
Notice that beyond a certain buffer size, the performance of the
two programs is comparable, because the cost of each call is spread
over many bytes.
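To see why the buffer size matters, it can help to count system calls
rather than time them. The sketch below assumes a hypothetical helper
count_reads (not one of the cat programs) that drains a file
descriptor with a given buffer size and reports how many read calls
returned data:

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: reads fd to end of file using a buffer of
   bufsize bytes and returns the number of read calls that
   returned data, or -1 on error. */
long count_reads(int fd, size_t bufsize)
{
    char *buf;
    long calls = 0;
    ssize_t n;

    if (bufsize == 0)
        return -1;
    buf = malloc(bufsize);
    if (buf == NULL)
        return -1;
    while ((n = read(fd, buf, bufsize)) > 0)
        calls++;                /* one system call per iteration */
    free(buf);
    return n < 0 ? -1 : calls;
}
```

For a regular file of 1,048,576 bytes, a buffer size of 1 would need
1,048,576 such calls, while a buffer size of 4096 would typically need
only 256. Each call pays the same fixed cost of entering and leaving
the kernel, which is why the timings improve so sharply at first and
then level off.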
- Now try timing the regular UNIX cat command.
What can you infer?