==============================================================================
C-Scene Issue #2
CGI in C a starters tutorial
Name: Brent York
Handle: The Dragon (ThaDragon on IRC)
Email: york@nbnet.nb.ca
Affils: Coder for Nuclear Winter Entertainment
Coder for Spheringer technologies.
Coder for Henry and York Development "Development with thought."
Editor of Cscene Magazine
Organizer of the OS Developers Information Network (ODIN).
Desc: CGI in C a starters tutorial
==============================================================================
Many web developers haven't really experienced coding CGI, and if they
have, it's usually in a language such as Perl. The nice thing about CGI is
that its interface is common (hence Common Gateway Interface), and therefore
it's a standard across all languages.
CGI only requires that you use a language that can write to STDOUT, which
means that pretty much any language can be used for writing CGI. For fast
execution, though, C is IDEAL, and fast execution is usually a must for
databases, small or large, and for other types of CGI.
In this article I will give you a background on CGI, apply that
background to C, and then walk you through developing CGI applications.
We will develop two of them: a text based counter, and a program that shows
you how to get input from the HTTPD so you can use it in your own programs.
So without further ado, let's get on with the article.
=============================================================================
The common gateway interface (CGI)
==================================
The common gateway interface, or CGI as people know it, is a
standard put forth by NCSA for allowing more interaction with web pages.
All programs are run on the server side, unlike Java applets, and pretty
much any language can be used for developing CGI.
The CGI interface for output merely consists of a Content-type
line, which contains a MIME type/subtype description followed by two
newlines, and which tells the browser what is coming to it. The CGI
interface as it applies to input consists of a set of environment variables
which can be retrieved and used by the CGI application, thereby allowing us
to get information from the client.
This probably means nothing to you yet, but as we go on it will.
===============
Output with CGI
===============
CGI output is quite simple. It consists of a Content-type line
with a MIME type/subtype that tells the HTTPD what type of data is
coming, so that it can parse it and return the proper data to the client.
However, on the CGI end of things, the type/subtype you output
has to be adhered to quite strictly. This means that if you say you're
going to send text/plain, you send plain text, just as text/html means
you're sending HTML.
Some common MIME types/subtypes are as follows:
text/plain
text/html
image/gif
image/jpeg
There are others, but I won't list them here, as these are the ones we are
concerned with.
Several headers are valid in CGI output. They are as follows:
Content-type - A header used to tell the HTTPD what type of data to
expect so it can parse it and output things properly.
Example: "Content-type: text/plain"
Location - A header used to refer the HTTPD to another location for
the proper document, often used in things like Microsoft's
pull down page referral lists etc.
Example:
"Location: http://dwi.netc.com/CScene/CS1/CS101.html"
Content-length - A header used to tell the HTTPD the size, in bytes, of
the data being sent to it.
Example: "Content-length: 1024"
Expires - Used to say when the data stops being valid, so that a
cached copy is not shown after that point. The value is a
weekday, day-month-year and Hour:Minute:Second, given in
GMT in a 24 hour format.
Example: "Expires: Wednesday, 02-May-12 23:59:59 GMT"
Content-encoding - Used to specify the encoding of a document. Valid
values for this are x-gzip (.gz) and x-compress (.Z).
x-zip (.zip) is sometimes listed too, but it is not a
standard value, so don't rely on it.
Example: "Content-encoding: x-gzip"
Note that each header goes on its own line, and that the last header is
followed by a blank line. That is why a lone header ends in two newlines
(\n\n).
The only header we will be occupied with in this article is Content-type,
with the types text/html and text/plain. Our counter and our variable
display program both use text/plain.
So, basically, the CGI output consists of a header, two newlines,
and your appropriate data, and that's it. Simple, eh?
Example:
sayhi.c /* Compile to sayhi.cgi */
==================================================
#include <stdio.h> /* Standard IO routines */
int main(void) {
    printf("Content-type: text/html\n\n"); /* header, then a blank line */
    printf("Hi from the CGI!\n");
    return 0;
}
This simply sends "Hi from the CGI!" to the browser as HTML. You could also
add text formatting tags; in fact you could generate your entire page from
the CGI, however it would be a waste of time to do so for a static page.
So continuing on, you've now got a small grip on CGI and how it
works, so let's describe the input.
============================================================
Input from the environment with the common gateway interface
============================================================
Input with CGI usually (but not always) requires environment
variables, which can be gotten with a call to getenv() for each
environment variable you want.
There are quite a few variables to choose from, each giving you
valuable information that you might be able to use to your advantage.
We will cover each of them here, and possibly some you might not find in
NCSA's own documentation (woo!). This is mainly because we will be using
them all in our program to display them.
I won't go into how getenv() works; you can check your helpfile,
C book or manpage for that. I will however list the variables that you
can access:
SERVER_SOFTWARE - This obviously holds the software name and version of the
server you are running on. For example "NCSA 1.0"
SERVER_NAME - This holds the server's hostname, DNS alias, or IP address,
as it would appear in self referencing URLs.
GATEWAY_INTERFACE - This holds the revision of the CGI specification to which
this server complies and understands. Format is
CGI/revision.
SERVER_PROTOCOL - This holds the name and revision of the information
protocol this request came in with.
Format is protocol/revision
SERVER_PORT - The port the server listens on for connections, usually
80, but it's best to check this if your CGI relies on it
because it doesn't have to be.
REQUEST_METHOD - The method with which the request was made, for the HTTP
protocol, this is "GET", "HEAD", or "POST".
PATH_INFO - Scripts can be accessed by their virtual pathname,
followed by extra information at the end of this path.
The extra information is sent as PATH_INFO. This
information should be decoded by the server if it comes
as a URL before it is passed to the CGI script.
PATH_TRANSLATED - The server provides a translated version of PATH_INFO,
which takes the path and does any virtual to physical
mapping to it. It is then stored in this environment
variable.
SCRIPT_NAME - This is a virtual path to any script being executed, used
for self referencing URLs.
QUERY_STRING - Any information following a ? in the URL which
referred to this script. It should not be decoded in any
fashion when it gets to you, which means of course you'll
have to decode it. This is *GREAT* for search engines ;}.
REMOTE_HOST - This holds the hostname of the remote host, which is
the host of the person calling the script.
If the server doesn't have the information this is not set
(getenv() returns NULL) and you should use REMOTE_ADDR,
which holds its IP, instead.
REMOTE_ADDR - The IP address of the remote host making the request.
AUTH_TYPE - If the server supports authentication, and the script
is protected, this is the protocol specific method
used to validate the user.
REMOTE_USER - If the server supports authentication, and the
script is protected, this is the username they have
authenticated as.
REMOTE_IDENT - If the server supports the RFC 931 identification
protocol, then this variable will be set to the name of
the user that it retrieved from the remote host.
Usage of this variable should be limited to logging only
and is not suggested for authentication purposes, as
identification can be faked easily.
CONTENT_TYPE - For queries which have attached information, such as HTTP
POST and PUT, this is the content type of the data. For
form data it's usually application/x-www-form-urlencoded.
CONTENT_LENGTH - For queries which have attached information, such as HTTP
POST and PUT, this is the content length of the
data.
HTTP_ACCEPT - The MIME types which the client will accept, as given by
the HTTP headers. Each item in this list is separated by
commas.
HTTP_USER_AGENT - The browser the client is using to send the request.
General format is software/version library/version, but
it can pretty much be anything.
HTTP_REFERER - The URL of the document that referred you to the script.
This of course will be nothing if you happen to just
access the script directly instead of accessing it from an
HTML document.
All of these variables return the information given in their descriptions
through the use of getenv().
Above and beyond that, the HTTPD has its own encoding for input, which you
have to handle yourself. It's pretty simple to handle and only has a few
quirks. Basically you don't need to know any of this until you cover forms
and CGI, which will be my next installment of this document in the next
CScene. The reason I leave it till the next CScene is that I don't have
the time to cover forms in this document; there is a HUGE plethora of
information that involves forms and it's a tutorial in itself =}.
===========================================================
C programming as it applies to the common gateway interface
===========================================================
C programs are EXCELLENT for CGI because C is a fast compiled
language, and a C program usually doesn't take up as much RAM as Perl or
other interpreted languages. Above and beyond that, C programs allow for a
bit of security in that they are compiled, so someone who gets hold of your
CGI program can't simply read its source (which is possible with a script).
So as it applies to CGI, C is an excellent way to go. C is
completely capable of handling CGI output and input, although it's
sometimes harder in C to handle input, but it's never a complete and total
disaster.
C is also perfect because it can open binary files and print the
data from them to stdout, which is EXTREMELY useful when making things
like picture based counters.
==============================================
Developing A text based C/CGI counter program.
==============================================
In developing a text based CGI counter program we will encounter
a few quirks of CGI, and therefore it's a great way to start programming
in it.
The first quirk is that with text you can't have its output appear at the
end of a page without SSI (Server Side Includes). This is fine, however,
as we will write it, put it in an A HREF, and access it as a link so
you can see it work.
So basically the sequence we want is to print the header out to STDOUT
with printf, followed by two newline characters. We are printing out text,
so we will go with text/plain so there's no translation by the server.
Other things we need are file opening, input, and output, as well
as a way to increment the counter. This should be left to you, as you
should know C and/or C++ well enough before you ever tackle CGI. Protocol
interfacing is *NEVER* a task for a newbie.
So without further ado, tcount.c
----------->8 Cut 8<----------tcount.c ------------>8 Cut 8<------------
#include <stdio.h> /* Standard IO routines */
int main(void) {
    FILE *data_ptr;
    int count;
    /* Print the header */
    printf("Content-type: text/plain\n\n");
    /* Open the data file; if you can't, say so and exit with errlevel 1 */
    if (!(data_ptr=fopen("tcount.dat","r"))) {
        printf("Error opening tcount.dat for reading!\n");
        printf("Error 001: Exiting.\n");
        return 1;
    }
    /* The datafile opened fine, so read it. Treat anything that is not
       a number as a count of zero. */
    if (fscanf(data_ptr,"%d",&count)!=1) count=0;
    fclose(data_ptr);
    printf("%d\n",++count);          /* increment the counter and print it */
    /* Open the same file for writing; if you can't, say so and exit with
       errorlevel 2. */
    if (!(data_ptr=fopen("tcount.dat","w"))) {
        printf("Error opening tcount.dat for writing!\n");
        printf("Error 002: Exiting.\n");
        return 2;
    }
    fprintf(data_ptr,"%d\n",count);  /* write the new access count
                                        to the datafile */
    fclose(data_ptr);                /* flush and close the datafile */
    return 0;                        /* Exit without error */
}
This is about as simple as a counter CGI gets. Compile it with:
gcc -o tcount.cgi tcount.c (or whatever is appropriate for your compiler).
Then comes the fun part of setting it up. chmod it so it is executable by
everyone (chmod 755 tcount.cgi), make a datafile called tcount.dat that says
"0" followed by a newline, and chmod that so it is readable and writable by
everyone (chmod 666 tcount.dat). Stick both in your cgi-bin dir (wherever
that may be) and use the following HTML to test it.
<!-- Test for tcount.cgi -->
<Html>
<Body>
<A Href="/cgi-bin/tcount.cgi">See the counter!</A>
</Body>
</Html>
It should work perfectly =}.
And now for input using CGI, isn't this document wonderful ? ;}
========================================
Input using the Common Gateway Interface
========================================
Given the above variables, we can get some input from the user
for things like search engines, forms, and a few other things. We won't go
into forms right now, so basically we won't get into the GET or POST way
of things.
All we will cover is how to get the variables and print them from
a CGI program. But with this you could use the QUERY_STRING variable with
your graphical counter CGI to implement things like number sets etc.
This will be an exercise for you. I won't cover using the QUERY_STRING
variable with the counter, but I will cover it with the CGI program I'm
going to write.
Basically we will be outputting a Content-type: text/plain header and
then outputting the names of the environment variables and what they contain.
We will however use the QUERY_STRING in our HTML so you can see what's
going on with it =}.
So basically here's our program
-------->8 Cut 8<------- showvars.c ------->8 Cut 8<-------
#include <stdio.h>
#include <stdlib.h> /* getenv() */
const char *evars[]={"SERVER_SOFTWARE", "SERVER_NAME", "SERVER_PROTOCOL",
                     "SERVER_PORT",
                     "GATEWAY_INTERFACE", "REQUEST_METHOD",
                     "PATH_INFO", "PATH_TRANSLATED", "SCRIPT_NAME",
                     "QUERY_STRING",
                     "REMOTE_HOST", "REMOTE_ADDR", "REMOTE_USER",
                     "REMOTE_IDENT",
                     "AUTH_TYPE", "CONTENT_TYPE", "CONTENT_LENGTH",
                     "HTTP_ACCEPT", "HTTP_USER_AGENT", "HTTP_REFERER"};
int main(void) {
    const int numvars=sizeof evars/sizeof evars[0];
    int i;
    printf("Content-type: text/plain\n\n");
    for (i=0;i<numvars;i++) {
        /* getenv() returns NULL for a variable the server did not set */
        const char *value=getenv(evars[i]);
        printf("%s = %s\n", evars[i], value ? value : "(not set)");
    }
    return 0;
}
Note that the things you did for the text counter MUST be done for this
as well. That means compile it (to evars.cgi, the name the test page uses)
and make it world executable. There are no data files this time, so there
is nothing else to chmod.
The HTML for the test for this is:
<Html>
<Body>
<A Href="/cgi-bin/evars.cgi?Testing_Testing_1_2_3">See the environment vars!</a>
</Body>
</Html>
Note the ?Testing_Testing_1_2_3... This will appear in QUERY_STRING... I
think you can see the possibilities ;}.
=======
Summary
=======
In this document you have learned how to get basic input from the
user and output many things from your CGI to the HTTPD and, in turn, to
the remote host accessing the document.
This is enough information to begin writing useful CGI. However,
I'm sure you would like to know more. Therefore, what I plan on doing is
another article covering forms and CGI in the next issue of CScene.
Finally, in a third installment two CScenes from now, I shall finish off
the entire CGI tutorial set with CGI and graphics.
Be on the lookout for them, as they supplement this article and should give
you almost everything you ever needed to know to write good CGI.
Thank you for listening to my rants and raves.
You may contact me at york@nbnet.nb.ca if you have any questions or
concerns about this document.
The Dragon
Brent York
C Scene Official Web Site :
http://cscene.oftheinter.net
C Scene Official Email :
cscene@mindless.com
This page is Copyright © 1997 By
C Scene. All Rights Reserved