Preface, or - who is this for?
This document is meant to
give people who already know how to program in C the
knowledge necessary to write Unix programs that use the network (actually, the
Internet). It is supposed to save you all the time it took me to learn how to do
this, due to the lack of decent online documentation about the subject.
The idea is to explain only the really necessary information for writing
client and server applications, leaving less "urgent" information for the
appendices, and even less important information for the "see also" part. By the
way, I'm not providing an Index, because usually indexed documents scare me
away, and I want this document to look friendly to *me*, hoping it will also
look friendly to you.
OK, let's get down to business.
(yeah, I know I promised to avoid an index, but an un-detailed one is necessary)
(Skip this if you know what the Internet is, what protocols it uses, what kind of addresses are used over the Internet, etc.)
The Internet is a computer communication network. Every computer connected to the Internet is also known as a "host", so we could say that the Internet's role is to allow hosts to talk amongst themselves. I assume you are already familiar with the Internet as a user of programs such as 'Telnet', 'Ftp', 'Irc' and others. Let's first discuss Internet addresses a little, before we talk about the Internet protocols and various programming aspects regarding network uniformity.
(Skip this if you know what IP addresses are and what ports on the Internet are).
An Internet address (or an IP address, or an IP number) is a number made of 4 bytes (numbers between 0 and 255), written in what is called 'dot notation'. For example, "128.0.46.1" is a valid Internet address. Such an address identifies a computer which is connected to the Internet. Note that a computer might have more than one such address, if it has more than one physical connection to the Internet (such as having two Ethernet cards connected. What's Ethernet? Read the appendices, or ignore this).
However, when normal human beings use the Internet, they usually use human-readable (oh, really?) addresses of the form "uunet.uu.net" or "wuarchive.wustl.edu". A system in the Internet called "The Domain Name System", or DNS for short, is responsible for translating between human-readable addresses and IP addresses (again, read the appendices for information about DNS). You will not have to know anything about how DNS works in order to use it in your programs, so do not worry about it now.
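To show how little of DNS a program has to see, here is a minimal sketch of a name lookup. It assumes a POSIX system and uses the standard getaddrinfo() call; the helper name lookup_ipv4 is made up for this example, and it is not necessarily the call used later in this tutorial.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Look up the first IPv4 address of `name` and write it, in dot notation,
 * into `out`. Returns 0 on success, -1 on failure.
 * lookup_ipv4 is a hypothetical helper name, not a library function. */
int lookup_ipv4(const char *name, char *out, size_t out_len)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* 4-byte addresses only, as in the text */
    hints.ai_socktype = SOCK_STREAM;

    /* The resolver library does all the DNS work behind this one call. */
    if (getaddrinfo(name, NULL, &hints, &res) != 0 || res == NULL)
        return -1;

    struct sockaddr_in *sin = (struct sockaddr_in *)res->ai_addr;
    const char *ok = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)out_len);

    freeaddrinfo(res);
    return ok != NULL ? 0 : -1;
}
```

Calling lookup_ipv4("uunet.uu.net", buf, sizeof(buf)) would fill buf with a dotted address if the name resolves; a string that is already in dot notation is accepted as well.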
OK. We said that IP numbers identify a computer. Well, usually there is more than one program running on a given computer that wants to use the network. For this purpose, the Internet people made up an extension to the IP address, called a port number. Each communications address is made up of an IP number AND a port number. Port numbers can be any number between 1 and 65535, although ports below 1024 are reserved for privileged programs, so you will use port numbers of 1024 and above unless you have superuser privileges on your machine (also stated sometimes as "having the root password").
For our purposes, we will use addresses of the form: 114.58.1.6:6072 where the "114.58.1.6" part is the IP number, and the "6072" part is the port number. Remember this for later usage.
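As a rough sketch of how such an address might be turned into something a C program can use, the function below splits "IP:port" text and fills a struct sockaddr_in. The name parse_endpoint is invented for this example; inet_pton() and htons() are the standard POSIX calls.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse text of the form "a.b.c.d:port" into a sockaddr_in.
 * Returns 0 on success, -1 if the text is malformed.
 * parse_endpoint is a hypothetical helper name. */
int parse_endpoint(const char *text, struct sockaddr_in *out)
{
    char ip[INET_ADDRSTRLEN];
    const char *colon = strchr(text, ':');
    if (colon == NULL)
        return -1;

    /* Copy the part before the colon: the IP number. */
    size_t ip_len = (size_t)(colon - text);
    if (ip_len == 0 || ip_len >= sizeof(ip))
        return -1;
    memcpy(ip, text, ip_len);
    ip[ip_len] = '\0';

    /* The part after the colon: the port number, 1..65535. */
    char *end;
    long port = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || port < 1 || port > 65535)
        return -1;

    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons((uint16_t)port);   /* stored in network byte order */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;
    return 0;
}
```

With this, parse_endpoint("114.58.1.6:6072", &addr) fills addr with the IP number and the port from the example above.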
(Skip this if you know what IP, TCP and UDP are).
You have probably heard the term "TCP/IP" and wondered, or had some vague idea, about what it means. Let's make things clearer:
The Internet is a network. In order to talk on a network, you need some kind of a "language". That language is also called "a protocol". The Internet has many kinds of protocols used to talk on it, in a manner called "layering".
Layering means that instead of defining one protocol that does everything, which would be very hard to design and implement, the tasks are divided between several protocols, sitting on top of each other.
What does this mean? Think about it as sending letters: you write your letter on paper, and then put it in an envelope and write the address on it. The postman doesn't care WHAT you wrote in your letter, as long as you wrote the correct address on the envelope.
The same thing is done when layering protocols. If one protocol contains the data sent and knows how to make sure the data is correct, then a lower protocol will contain the address to be used, and will know how to transfer the data to the correct target. The lower protocol does not understand the format of the data in the upper protocol, and the upper protocol doesn't have to know how to actually transfer the data. This way, each protocol becomes much simpler to design and test. Furthermore, if we want to use the same protocol for writing the data, but send the data on a different network, we only need to replace the protocol that knows how to transfer the data over the network, not the whole set of protocols.
(By the way, this sort of packing up of several protocols on top of each other is called encapsulation.)
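Here is a toy sketch of the letter-and-envelope idea in C. Both "protocols" are invented for illustration and are not the real IP or TCP formats: the upper layer prefixes the data with a 2-byte length, and the lower layer prefixes whatever it is handed with a 4-byte destination address, without looking inside.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Upper layer (the letter): prefix the data with its length, 2 bytes. */
size_t upper_wrap(const uint8_t *data, uint16_t len, uint8_t *out)
{
    out[0] = (uint8_t)(len >> 8);
    out[1] = (uint8_t)(len & 0xFF);
    memcpy(out + 2, data, len);
    return (size_t)len + 2;
}

/* Lower layer (the envelope): prefix opaque bytes with a 4-byte address.
 * It never inspects `inner`, just as the postman never reads the letter. */
size_t lower_wrap(const uint8_t dest[4], const uint8_t *inner,
                  size_t inner_len, uint8_t *out)
{
    memcpy(out, dest, 4);
    memcpy(out + 4, inner, inner_len);
    return inner_len + 4;
}

/* Receiving side: open the envelope, get the address and the inner bytes. */
const uint8_t *lower_unwrap(const uint8_t *pkt, size_t len,
                            uint8_t dest[4], size_t *inner_len)
{
    if (len < 4)
        return NULL;
    memcpy(dest, pkt, 4);
    *inner_len = len - 4;
    return pkt + 4;
}

/* Receiving side: read the upper header and check it against what arrived. */
const uint8_t *upper_unwrap(const uint8_t *inner, size_t inner_len,
                            uint16_t *len)
{
    if (inner_len < 2)
        return NULL;
    *len = (uint16_t)((inner[0] << 8) | inner[1]);
    if ((size_t)*len + 2 > inner_len)
        return NULL;                 /* header claims more data than we have */
    return inner + 2;
}
```

Swapping the network would mean replacing only lower_wrap and lower_unwrap; the upper pair would stay as it is.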
One other important notion about the Internet is that it forms something known as a "packet switching network". This means that every message sent out is divided into small amounts of information, called packets. The different protocols send the data in packets, which might get divided into smaller packets while they travel to the target host, due to "electrical" limitations of physical networks. The target machine will eventually combine the small packets (also known as fragments) and build the original message again.
Packets are what allow several connections to use the same physical network simultaneously, in a manner transparent to the network users. No machine will take over a line completely, even if it needs to send a large message. Instead, it will send the message in small fragments, allowing other machines to send their packets too.
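The idea of splitting a message and rebuilding it can be sketched in a few lines of C. This is only an illustration under made-up assumptions (the struct fragment layout and the tiny 8-byte fragment size are invented here); it is not how IP actually lays out its fragments.

```c
#include <stddef.h>
#include <string.h>

#define FRAG_PAYLOAD 8   /* tiny on purpose; real networks carry far more */

struct fragment {
    size_t offset;                     /* where this piece belongs */
    size_t length;                     /* bytes used in data[] */
    unsigned char data[FRAG_PAYLOAD];
};

/* Split a message into fragments. Returns how many were produced,
 * or 0 if `max` fragments are not enough (or the message is empty). */
size_t split_message(const unsigned char *msg, size_t len,
                     struct fragment *frags, size_t max)
{
    size_t count = 0;
    for (size_t off = 0; off < len; off += FRAG_PAYLOAD) {
        if (count == max)
            return 0;
        size_t n = len - off < FRAG_PAYLOAD ? len - off : FRAG_PAYLOAD;
        frags[count].offset = off;
        frags[count].length = n;
        memcpy(frags[count].data, msg + off, n);
        count++;
    }
    return count;
}

/* Rebuild the message. Fragments may arrive in any order, because each
 * one carries its own offset. Returns the number of bytes written,
 * or 0 if a fragment does not fit in `out`. */
size_t join_fragments(const struct fragment *frags, size_t count,
                      unsigned char *out, size_t out_cap)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (frags[i].offset + frags[i].length > out_cap)
            return 0;
        memcpy(out + frags[i].offset, frags[i].data, frags[i].length);
        total += frags[i].length;
    }
    return total;
}
```

Because every fragment is small, other machines can slip their own packets onto the line between two of ours.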
Let us now name some of the Internet protocols, and briefly explain each one of them:
UDP is another protocol that is placed on top of IP. It is used for services that do not require a long-lived connection between programs, or that send only a small amount of data at a time. The 'talk' program uses the UDP protocol.
UDP adds little more than port numbers to the functionality IP gives, so the programmer needs to worry about checking for errors in messages (caused by line noise and such), about messages that get lost, about making sure the data sent using UDP arrives in the right order (which is not automatically achieved, due to IP's nature of not having a constantly open connection), and such.
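To give a feeling for the extra work this leaves to the programmer, here is a minimal sketch of an application-level header carrying a sequence number and a simple checksum. The header layout, the checksum, and the function names are all assumptions made up for this example; a real program would also put the header fields in network byte order before sending.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header an application could put at the start of each
 * UDP message. This is not part of UDP itself. */
struct msg_header {
    uint32_t seq;        /* sender's running message number */
    uint16_t checksum;   /* simple sum of the payload bytes */
};

/* A deliberately simple checksum: the sum of all bytes, modulo 65536. */
uint16_t simple_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return (uint16_t)(sum & 0xFFFF);
}

/* Receiver-side check. Returns 1 if the message is intact and is the
 * next one expected; returns 0 if it is damaged, duplicated, or out of
 * order. *next_expected advances only on success. */
int accept_message(const struct msg_header *h, const unsigned char *payload,
                   size_t len, uint32_t *next_expected)
{
    if (simple_checksum(payload, len) != h->checksum)
        return 0;                      /* damaged on the way */
    if (h->seq != *next_expected)
        return 0;                      /* duplicate or out of order */
    (*next_expected)++;
    return 1;
}
```

What to do with a rejected message (ask for it again, buffer it, drop it) is a design decision this sketch leaves open.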
There are various other protocols used in conjunction with IP, such as ARP, RARP, ICMP, SNMP and others, but those won't be dealt with in this document. Look at the 'See Also' part to find pointers to articles discussing those protocols.
(Skip this if you already know what byte ordering means, and what 'well known ports' across the Internet are)
It is important to understand that protocols need to define some low-level details, in order to be able to talk to each other. We will discuss two such aspects here, in order to understand the example programs given later on.
How numbers should be kept in a computer is an old argument amongst different computer manufacturers.
As all computers divide memory into bytes (or octets) of information, each 8 bits long, there is no problem dealing with byte-sized numbers. The problem arises when we use larger numbers: short integers (2 bytes long) and long integers (4 bytes long). Suppose we have a short integer number, FE4Ch (that is, FE4C in hexadecimal notation). Suppose also that we say this number is kept at memory address 100h. This could mean one of two things. Let's draw them out:
          ---------------
Address:  | 100h | 101h |
          ---------------
Contents: | FEh  | 4Ch  |
          ---------------

          ---------------
Address:  | 100h | 101h |
          ---------------
Contents: | 4Ch  | FEh  |
          ---------------
In the first form, also called 'Big Endian', the Most Significant Byte (MSB) is kept in the lower address, while the Least Significant Byte (LSB) is kept in the higher address.
In the second form, also called 'Little Endian', the MSB is kept in the higher address, while the LSB is kept in the lower address.
Different computers use different byte ordering (or different endianness), usually depending on the type of CPU they have. The same problem arises when using a long integer: which word (2 bytes) should be kept first in memory? The least significant word, or the most significant word?
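If you are curious which form your own machine uses, a small sketch like the following can tell you. It stores the same FE4Ch value used above and looks at which byte landed in the lower address. The function name is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine is Big Endian, 0 if it is Little Endian.
 * host_is_big_endian is a hypothetical helper name. */
int host_is_big_endian(void)
{
    uint16_t value = 0xFE4C;
    unsigned char bytes[2];

    /* Copy the number's bytes out in memory order: bytes[0] is the
     * byte at the lower address. */
    memcpy(bytes, &value, sizeof(value));

    return bytes[0] == 0xFE;   /* MSB first means Big Endian */
}
```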
In a network protocol, however, there must be a predetermined byte and word ordering. The IP protocol defines what is called 'the network byte order' (which is Big Endian), and it must be kept in all packets sent across the Internet. The programmer on a Unix machine is not spared from having to deal with this kind of information, and we'll see how the translation of byte orders is solved when we get down to programming.
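As a small preview, and only as a sketch, here is how the standard htons(), htonl(), ntohs() and ntohl() calls can be used to put numbers into a buffer in network byte order and read them back. The put_ and get_ helper names are invented for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit number into buf in network byte order. */
void put_u16(unsigned char *buf, uint16_t host_value)
{
    uint16_t net = htons(host_value);   /* host to network, short */
    memcpy(buf, &net, sizeof(net));
}

/* Read a 16-bit number that was stored in network byte order. */
uint16_t get_u16(const unsigned char *buf)
{
    uint16_t net;
    memcpy(&net, buf, sizeof(net));
    return ntohs(net);                  /* network to host, short */
}

/* The same pair for 32-bit numbers. */
void put_u32(unsigned char *buf, uint32_t host_value)
{
    uint32_t net = htonl(host_value);   /* host to network, long */
    memcpy(buf, &net, sizeof(net));
}

uint32_t get_u32(const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof(net));
    return ntohl(net);                  /* network to host, long */
}
```

On a Big Endian machine these calls change nothing; on a Little Endian machine they swap the bytes. Either way, the buffer ends up with the most significant byte first.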
When we want two programs to talk to each other across the Internet, we have to find a way to initiate the connection. So at least one of the 'partners' in the conversation has to know where to find the other one. This is done by letting one partner know the address (IP number + port number) of the other side.
However, a problem could arise if one side's address is randomly taken over by a third program. Then we'll be in real trouble. In order to avoid that, there are some port numbers which are reserved for specific purposes on any computer connected to the Internet. Such ports are reserved for programs such as 'Telnet', 'Ftp' and others.
These port numbers are specified in the file /etc/services on any decent Unix machine. Following is an excerpt from that file:
daytime 13/tcp
daytime 13/udp
netstat 15/tcp
qotd 17/tcp quote
chargen 19/tcp ttytst source
chargen 19/udp ttytst source
ftp-data 20/tcp
ftp 21/tcp
telnet 23/tcp
smtp 25/tcp mail
Read that file to find that Telnet, for example, uses port 23, and Ftp uses port 21. Note that for each kind of service, not only a port number is given, but also a protocol name (usually TCP or UDP). Note also that two services may use the same port number, provided that they use different protocols. This is possible because different protocols have what is called different address spaces: port 23 of a machine in the TCP protocol address space is not equivalent to port 23 on the same machine in the UDP protocol address space.
So how does this relate to you? When you write your very own programs that need to 'chat' over the Internet, you will need to pick an unused port number that will be used to initiate the connection.
(Skip this if you already know what clients and servers are, and what are the relations between them).
This section will discuss the most common type of interaction across the Internet - the Client and Server model. Note that this discussion is relevant to other types of networks too, and a few examples will be mentioned along the way.
We will first explain what the client-server model is, then detail the roles of the client, the roles of the server, and give examples of some famous servers and clients used.
(Skip this if you know what the client-server model basically means)
The client-server model is used to divide the work of Internet programs into two parts. One part knows how to do a certain task, or to give a certain service. This part is called the Server. The other part knows how to talk to a user, and connect that user to the server. This part is called the Client. One server may give service to many different clients, either simultaneously, or one after the other (the server designer decides upon that). On the other hand, a Client talks to a single user at a time, although it might talk to several servers, if its nature requires that. There are other such complex possibilities, but we will discuss only clients that talk to a single server.
(Skip this if you already know what clients are supposed to do)
A client's main feature is giving a convenient user interface, hiding the details of how the server 'talks' from the user. Today, people are trying to write mostly graphical clients, using windows, pop-up menus and other such fancy stuff. We will leave this to someone else to explain, and concentrate on the networking part. The client needs to first establish a connection with the server, given its address. After the connection is established, the client needs to be able to do two things: receive commands from the user and send them to the server (or deal with them locally), and receive information from the server and show it to the user.
This forms the basic loop a client performs:
get the server's address
form a working address that can be used to talk over Internet.
connect to the server
while (not finished) do:
    wait until there's either information from the server, or from the user.
    If (information from server) do
        parse information
        show to user, update local state information, etc.
    else {we've got a user command}
        parse command
        send to server, or deal with locally.
done
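The "while (not finished)" part of this loop can be sketched in C with the standard select() call, which waits on several descriptors at once. This is a minimal sketch under stated assumptions: the function names are made up, "parse" is reduced to passing bytes along unchanged, and the socket is assumed to be already connected.

```c
#include <sys/select.h>
#include <sys/socket.h>   /* the server descriptor is a connected socket */
#include <unistd.h>

/* Write all of buf, retrying on short writes. 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* server_fd: socket already connected to the server.
 * user_in / user_out: where the user types and reads, for example
 * STDIN_FILENO and STDOUT_FILENO.
 * Returns 0 when the user or the server is done, -1 on error. */
int client_loop(int server_fd, int user_in, int user_out)
{
    char buf[512];
    int finished = 0;

    while (!finished) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(server_fd, &readable);
        FD_SET(user_in, &readable);
        int max_fd = server_fd > user_in ? server_fd : user_in;

        /* Wait until there is information from the server or the user. */
        if (select(max_fd + 1, &readable, NULL, NULL, NULL) < 0)
            return -1;

        if (FD_ISSET(server_fd, &readable)) {
            ssize_t n = read(server_fd, buf, sizeof(buf));
            if (n < 0)
                return -1;
            if (n == 0)
                finished = 1;                   /* server closed */
            else if (write_all(user_out, buf, (size_t)n) < 0)
                return -1;                      /* "show to user" */
        }
        if (FD_ISSET(user_in, &readable)) {
            ssize_t n = read(user_in, buf, sizeof(buf));
            if (n < 0)
                return -1;
            if (n == 0)
                finished = 1;                   /* user is done */
            else if (write_all(server_fd, buf, (size_t)n) < 0)
                return -1;                      /* "send to server" */
        }
    }
    return 0;
}
```

A real client would parse what it reads in both branches instead of forwarding it as is, and would decide which user commands to handle locally.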
By the end of this tutorial you will be able to write such clients.
(Skip this if you already know what servers are supposed to do)
A server's main feature is to accept requests from clients, handle them, and send the results back to the clients. We will discuss two kinds of servers: a single-client server, and a multi-client server.
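These are servers that talk to a single client at a time. They need to be able to accept a connection request from a client, receive that client's requests and send back the results, and close the connection when the client is done (or is not authorized).
This forms the main loop a single-client server performs: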
bind a port on the computer, so Clients will be able to connect
forever do:
    listen on the port for connection requests.
    accept an incoming connection request
    if (this is an authorized Client)
        while (connection still alive) do:
            receive request from client
            handle request
            send results of request, or error messages
        done
    else
        abort the connection
done
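Here is one way the loop above might look in C, as a sketch and not a finished server. The function names are invented, "handle request" is replaced by simply echoing the bytes back, and the authorization check is left as a comment. Unlike the outline, listen() is called once before the loop, which is how the standard socket calls are normally used.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Serve one connected client until it closes the connection.
 * As a stand-in for real request handling, every request is echoed. */
void serve_client(int conn_fd)
{
    char buf[512];
    ssize_t n;

    while ((n = read(conn_fd, buf, sizeof(buf))) > 0) {  /* receive request */
        ssize_t sent = 0;
        while (sent < n) {                               /* send results */
            ssize_t w = write(conn_fd, buf + sent, (size_t)(n - sent));
            if (w <= 0)
                return;
            sent += w;
        }
    }
}

/* Bind `port`, then accept and serve clients one at a time, forever.
 * Returns -1 only if the setup fails. */
int run_single_client_server(unsigned short port)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* any local address */
    addr.sin_port = htons(port);                /* network byte order */

    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listen_fd, 5) < 0) {
        close(listen_fd);
        return -1;
    }

    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0)
            continue;
        /* An "is this an authorized Client" check would go here. */
        serve_client(conn_fd);
        close(conn_fd);
    }
}
```

While serve_client() is busy with one client, any other client that connects simply waits in the listen queue until accept() is called again.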
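A multi-client server talks to several clients at the same time, so it must watch both for new connection requests and for requests from clients that are already connected. This forms the main loop a multi-client server performs: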
bind a port on the computer, so Clients will be able to connect
listen on the port for connection requests.
forever do:
    wait for either new connection requests, or requests from existing Clients.
    if (this is a new connection request)
        accept connection
        if (this is an un-authorized Client)
            close the connection
    else if (this is a connection close request)
        close the connection
    else { this is a request from an existing Client connection}
        receive request from client
        handle request
        send results of request, or error messages
done
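One pass of this "forever" loop can be sketched with select() watching the listening socket and all connected clients together. This is only one possible design, under stated assumptions: the struct and function names are invented, requests are echoed back as a stand-in for real handling, and the listening socket is assumed to be already bound and listening.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* State kept between passes of the loop (a made-up layout). */
struct server_state {
    int listen_fd;     /* bound, listening socket */
    fd_set clients;    /* descriptors of connected clients */
    int max_fd;        /* highest descriptor in use */
};

void server_init(struct server_state *s, int listen_fd)
{
    s->listen_fd = listen_fd;
    FD_ZERO(&s->clients);
    s->max_fd = listen_fd;
}

/* One pass of the loop body. Returns 0, or -1 if select() fails. */
int server_step(struct server_state *s)
{
    fd_set readable = s->clients;
    FD_SET(s->listen_fd, &readable);

    /* Wait for new connection requests or requests from existing clients. */
    if (select(s->max_fd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;

    for (int fd = 0; fd <= s->max_fd; fd++) {
        if (!FD_ISSET(fd, &readable))
            continue;

        if (fd == s->listen_fd) {                /* new connection request */
            int conn = accept(s->listen_fd, NULL, NULL);
            if (conn < 0)
                continue;
            if (conn >= FD_SETSIZE) {            /* cannot track it */
                close(conn);
                continue;
            }
            /* An authorization check would go here. */
            FD_SET(conn, &s->clients);
            if (conn > s->max_fd)
                s->max_fd = conn;
        } else {
            char buf[512];
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n <= 0 || write(fd, buf, (size_t)n) < 0) {
                close(fd);                       /* close request or error */
                FD_CLR(fd, &s->clients);
            }
            /* Otherwise the request was "handled" by echoing it back. */
        }
    }
    return 0;
}
```

The server itself would call server_init() once and then call server_step() in an endless loop. No client is kept waiting while another one is idle, because the server only reads from descriptors that select() reported as ready.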
(Skip this if you already know about too many server types, or if you're not interested in knowing about them)
In this section we will give short descriptions of some "famous" servers and clients that are used daily over the Internet, and over some other famous kinds of networks. This is simply an illustrative section, which can be safely skipped by a reader in a hurry.
This document is copyright (c) 1998-2002 by guy keren.
The material
in this document is provided AS IS, without any expressed or implied warranty,
or claim of fitness for a particular purpose. Neither the author nor any
contributers shell be liable for any damages incured directly or indirectly by
using the material contained in this document.
permission to copy this
document (electronically or on paper, for personal or organization internal
use) or publish it on-line is hereby granted, provided that the document is
copied as-is, this copyright notice is preserved, and a link to the original
document is written in the document's body, or in the page linking to the copy
of this document.
Permission to make translations of this document is
also granted, under these terms - assuming the translation preserves the
meaning of the text, the copyright notice is preserved as-is, and a link to
the original document is written in the document's body, or in the page
linking to the copy of this document.
For any questions about the
document and its license, please contact
the author.