2003-MMLAB-TR-05

7/31/2019 2003-MMLAB-TR-05


ATHENS UNIVERSITY OF ECONOMICS AND BUSINESS

DEPARTMENT OF COMPUTER SCIENCE

Peer-To-Peer Wireless Network Confederation

Traffic Logging Subsystem

Name: Pantelis Frangoudis

Student Number: 3990108

Supervisor: Prof. G.C. Polyzos

Supervising Assistant: E.C. Efstathiou

Athens, September 2003


Summary

This document is the report for my diploma thesis project. It deals with the Peer-To-Peer Wireless Network Confederation (P2PWNC) and, in particular, with an Administrative Domain's Local Traffic Logging/Accounting and Monitoring System. These terms are discussed more thoroughly in the following chapters.

As stated in [1], [2], [3] and [4], a Peer-To-Peer Wireless Network Confederation (P2PWNC) is a community of WLAN Administrative Domains (ADs) that offer network access to each other's registered users. An AD provides Internet services to P2PWNC users of other ADs to compensate for the services that its own registered users enjoy from other ADs when they roam. This roaming scheme is decentralized: each AD decides for itself how many resources it contributes to the P2PWNC system. An AD is composed of several modules, such as the WLAN control module, the user authentication module and the P2PWNC module.

This document deals with the traffic logging and analysis subsystem, which can be considered a part of the WLAN Control Module of an AD. The information this subsystem makes available can, however, be used by other modules too. The traffic logging subsystem is divided into two parts. The first is a network packet capturing, analysis and logging daemon. It captures the packets that pass through the network interface to which a wireless LAN access point is connected and analyzes them. The purpose of this analysis is to gather aggregate traffic statistics and application layer protocol information about the P2PWNC users.

The second part of the system is an XML-based statistics retrieval and exchange protocol. It is a client-server protocol designed for the retrieval and exchange of the statistics that the packet logging and analysis daemon generates. Clients do not have direct access to the database where the statistics are stored. Instead, they issue properly formed requests (XML documents) to a server, which, in turn, queries the database and returns the results to the clients in messages of a protocol-specified format. Apart from the specification of the protocol, the implementation of a typical server and a typical client is presented and discussed.

As this is the first version of the system, further improvement is expected to be possible or needed in several areas. Future work must be done on the performance of the packet logging and analysis daemon as well as on its traffic analysis capabilities. As far as statistics retrieval, exchange and presentation are concerned, security issues are of greater importance. Also, extending the statistics exchange protocol and developing servers and clients with more functionality and operating system independence would enhance the statistics retrieval and presentation capabilities.


Table of Contents

Chapter 1

INTRODUCTION 6

1.1 About Peer-To-Peer Networking 6

1.2 Peer-To-Peer Wireless Network Confederation 6

1.2.1 System Overview and Terminology 6

1.2.2 Modules and Subsystems 8

1.2.2.1 WLAN Control Module .. 8

1.2.2.2 Authentication and User Identification Module .. 8

1.2.2.3 Local AD Services Module . 8

1.2.2.4 Internet Connectivity Module .. 8

1.2.2.5 P2PWNC Management Module .. 8

1.2.2.6 Local P2PWNC Policy Module .. 8

1.2.3 Administrative Domain's Local Traffic Accounting and Monitoring System 9

Chapter 2

TRAFFIC LOGGING AND ANALYSIS SUBSYSTEM 10

2.1 System Testbed 10

2.1.1 Hardware 10

2.1.2 Operating System . 10

2.1.3 Other software ... 10

2.2 Tools Used .. 10

2.3 Logging Subsystem General Architecture .. 11

2.3.1 Login state 12

2.3.2 Connected State 13

2.3.3 Logout state 13

2.3.4 Communication with the authentication module .. 14

2.3.4.1 Interprocess Communication (IPC) in Linux Environments 14

2.3.4.2 IPC implementation in the Traffic Logging Subsystem 15

2.3.4.3 Shared Memory Allocation and Handling . 21

2.4 Traffic logging and analysis subsystem implementation 25

2.4.1 Packet Capturing Systems . 25

2.4.2 Libpcap architecture and principles .. 26

2.4.3 Libpcap Advantages .. 28

2.4.4 Libpcap Performance Issues 29

2.4.5 Data Storage 29

2.4.5.1 Available Information . 29

2.4.5.2 MySQL Database Scheme .. 30

2.5 The Packet Capturing and Analysis Daemon ... 36

2.5.1 Packet Capturing Daemon Architecture . 36

2.5.1.1 Protocol Header Stripping 36

2.5.1.2 User Packet Matching .. 38

2.5.1.3 Application Layer Protocol Determination .. 40

2.5.2 Application Layer Protocol-specific Statistics Extraction .. 43

2.5.2.1 Connection Tracking Principles .. 43

2.5.2.2 HTTP Tracking 45

2.5.2.2.1 Typical HTTP Scenario 45


2.5.2.2.2. HTTP Request Tracking Algorithm 46

2.5.2.3 FTP Tracking .. 53

2.5.2.3.1 Typical FTP Scenario 53

2.5.2.3.2 FTP Connection Tracking Algorithm 55

2.5.2.4 SMTP Tracking 59

2.5.2.4.1 Typical SMTP Scenario 59

2.5.2.4.2 SMTP Connection Tracking Algorithm 60

2.5.2.5 Pop3 Tracking . 64

2.5.2.5.1 Typical POP3 Scenario 64

2.5.2.5.2 POP3 Connection Tracking Algorithm 66

2.6 Demonstration .. 73

Chapter 3

AN XML BASED PROTOCOL for STATISTICS RETRIEVAL and EXCHANGE 75

3.1 Introduction 75

3.2 System Testbed 75

3.2.1 Hardware . 75

3.2.2 Operating System ... 76

3.2.3 Other software . 76

3.3 Tools Used . 76

3.4 Protocol Description .. 76

3.4.1 Introduction 76

3.4.2 ABNF Specification 77

3.4.3 Protocol semantics .. 80

3.4.3.1 REQUEST 80

3.4.3.1.1 Request version and header information 80

3.4.3.1.2 Request fields ... 80

3.4.3.1.3 Request criteria . 80

3.4.3.2 RESPONSE . 81

3.4.3.2.1 Response version and header Information .. 81

3.4.3.2.2 Response pairs .. 82

3.4.4 Client functions .. 82

3.4.4.1 Login Function .. 82

3.4.4.2 Logout Function .. 83

3.4.4.3 Password changing function ... 84

3.4.4.4 Statistics retrieval function 85

3.4.5 Server functions 86

3.4.5.1 Server sequence of actions . 87

3.4.5.2 Login Function 87

3.4.5.3 Logout Function . 88

3.4.5.4 Password changing function .. 88

3.4.5.5 Statistics retrieval function . 89

3.4.6 Security Issues 91

3.4.6.1 Protocol Specifications 91

3.4.6.2 Suggestions about security .. 92

3.4.7 Protocol usage scenario .. 93

3.4.8 Advantages and disadvantages of this approach 96

3.5 Client and Server Implementation 97


3.5.1 Introduction . . 97

3.5.2 XML parsing ... 98

3.5.3 Server implementation 101

3.5.3.1 Architecture of the server program . 101

3.5.3.2 Login and Logout Functions ... 102

3.5.3.3 Password changing function 104

3.5.3.4 Statistics retrieval function 106

3.5.4 Client Implementation 112

3.5.4.1 Login Function . 113

3.5.4.2 Logout Function .. 113

3.5.4.3 Password changing function 113

3.5.4.4 Statistics retrieval function . 114

3.6 Demonstration ... 115

Chapter 4

FUTURE WORK. 123

4.1 Traffic Logging and Analysis Subsystem 123

4.1.1 Performance issues 123

4.1.2 Traffic Analysis .. 124

4.2 Statistics Exchange Protocol . 124

4.2.1 Security issues . 124

4.2.2 Protocol extension .. 124

4.2.3 Presentation and monitoring issues . 125

Conclusions ... 126

APPENDIX 128

Installation of the programs 128

Setting up the database .... 129

Configuration ... 129

Execution and usage . 130

REFERENCES .... 132

LINKS .... 133


Figures

Figure 1. P2PWNC Architecture . 7

Figure 2. Administrative Domains Linux Box .. 11

Figure 3. Traffic Logging Subsystem High Level View 12

Figure 4. Login State 13

Figure 5. Logout State . 14

Figure 6. Subsystems IPC communication using Shared Memory 16

Figure 7. shmAppendUser() Determining new nodes position .. 22

Figure 8. removeFromList() shifting nodes back . 24

Figure 9. After removeFromList() . 24

Figure 10. Libpcap architecture .. 27

Figure 11. Packet Capturing Daemon Architecture .....36

Figure 12. Ethernet Packet Format .. 38

Figure 13. FTP command reply sequence 55

Figure 14. POP3 command reply sequence 66

Figure 15. FTP stored statistics 73

Figure 16. SMTP stored statistics 74

Figure 17. Statistics Client and Server Architecture and Interconnection with the

Traffic Logging Subsystem .....75

Figure 18. XML Parser Benchmark Test . 98

Figure 19. Login Form .116

Figure 20. General Statistics 117

Figure 21. Aggregate traffic statistics 117

Figure 22. HTTP statistics . .118

Figure 23. FTP statistics .. 119

Figure 24. SMTP statistics 119

Figure 25. POP3 statistics 120

Figure 26. Password changing form 121

Figure 27. Connected administrator information .121

Figure 28. About box 122


Chapter 1

INTRODUCTION

1.1 About Peer-To-Peer Networking

A Peer-To-Peer (P2P) network is a network composed of autonomous and equivalent entities. This is a fairly old concept in the area of computer networks and communications. A pure P2P system is decentralized; that is, there is no central entity coordinating communication and interaction between peers. Today there are numerous examples of the peer-to-peer network model. Most of them, like Kazaa (www.kazaa.com) and Gnutella, are file sharing systems.

1.2 Peer-To-Peer Wireless Network Confederation

1.2.1 System Overview and Terminology

A Peer-To-Peer Wireless Network Confederation (P2PWNC) is a community of WLAN Administrative Domains (ADs) that offer network access to each other's registered users. It is a decentralized roaming scheme: there is no central coordinating entity, nor are there any bilateral contracts to control the parties' behavior. The peers of this network are the different Administrative Domains, which are autonomous as to the amount of resources they contribute and their level of participation in the confederation. The main goal of the P2PWNC system is to provide ubiquitous network access to its members. For an AD, the ability of its registered users to roam and enjoy free network access outweighs the cost (in resources, performance, etc.) of providing access to visitors, that is, to users registered to other ADs of the P2PWNC.

As mentioned before, the main goal of this system is ubiquitous, cheap, fast and secure access to network resources and particularly to Internet services. Wireless LANs provide the best way to achieve this, because wireless infrastructure makes it easier to create a high-coverage network quickly and cost-effectively. A P2PWNC, whose components are WLAN Administrative Domains, incorporates the IEEE 802.11 technology. This is a standard growing steadily in popularity. It offers easy access to the network as well as security (a topic that is the subject of much discussion). Furthermore, it is relatively cheap and fast to deploy.

Being a pure P2P system, the P2PWNC lacks a central controlling entity that could enforce a uniform policy among peers as to the usage and availability of resources and participation. As in other peer-to-peer systems like Gnutella, the problem of free riding is encountered. That is, an AD's members can consume many of the other ADs' resources, while this AD offers few resources to other ADs' members. The equivalent case in a file sharing P2P system like Kazaa would be a user who downloads many files from other users while sharing very few of his own. The solution to this problem is to offer incentives for peers to contribute to the system by imposing community-wide rules. Such rules can control a peer's behavior. Rule breaking would result in some sort of punishment, while rule compliance would be beneficial both for the peer (who would be rewarded) and for the whole community (it would function more regularly and the system would be used more efficiently). Rule enforcement is based on a distributed accounting model, which is briefly described below.

Although peers (ADs) are autonomous as to the usage and availability of their resources, the system imposes distributed constraint structures so that peers have an incentive to conform to the community rules. Every time a peer receives or offers service, messages are exchanged and distributed accounting records are updated. The messages exchanged include signed receipts that prove the provision of the service. Forging the global accounting statistics of the system is therefore made harder. A peer can easily deduce the rate of consumption of any other peer by inspecting and aggregating these receipts. Although forging the statistics remains possible, distributed accounting makes it possible to gather an aggregated opinion about a peer, by querying other peers about the services they offered to it or received from it, and thus to assess its reputation.

The combination of the peer-to-peer nature of the P2PWNC and the set of community-wide rules described above offers the system the following advantages over other solutions:

- Scalability

- Decentralization

- Flexibility and low complexity

- Economic efficiency

The architecture of a P2PWNC is shown in Figure 1.

[Figure 1. P2PWNC Architecture. The diagram shows three Administrative Domains (AD White, AD Grey and AD Black) in an AP network view, together with their members. Legend: AD = Administrative Domain.]


1.2.2 Modules and Subsystems

An Administrative Domain consists of several modules, each with its own functionality. These modules are:

1.2.2.1 WLAN Control Module

The WLAN control module manages the network of Access Points (APs) and shapes traffic coming from, or destined to, APs (and ultimately User Agents, UAs). It consists of the bandwidth control, the traffic logging and other local subsystems. The bandwidth control subsystem is responsible for allocating portions of the available bandwidth to the users visiting the AD and requesting services (both users of this AD and visitors from other ADs of the P2PWNC). The traffic logging and monitoring subsystem is described in detail in this document.

1.2.2.2 Authentication and User Identification Module

This module checks UA credentials and then decides what services the UA is authorized to access. The decision is enforced by the WLAN control module. Users registered to the ADs of the P2PWNC have a username that is unique within their domain. Usernames have the format user@domain. Every time a user wishes to use the Internet services or other facilities made available by an Administrative Domain, an authentication/identification process takes place. If the user has the necessary credentials (certificates or a username-password pair) to be served, he is assigned a dynamic IP address. After the authentication / login process has been successfully carried out, the user is identified by the assigned IP address. While he is logged in, each packet traversing the router with this IP address as its source / destination is considered to be sent / received by this user. After he has logged out, the IP address that has been used to identify him is released (and can therefore be assigned to another user) and is of no further significance for user accounting purposes.

1.2.2.3 Local AD Services Module

This module offers other local services (PSTN VoIP gateway, web cache, etc.).

1.2.2.4 Internet Connectivity Module

The Internet Connectivity module, as its name implies, is the Administrative Domain's gateway to the Internet.

1.2.2.5 P2PWNC Management Module

This module implements all the high-level Peer-To-Peer functionality of the system (generic service provision, rules enforcement, etc.).

1.2.2.6 Local P2PWNC Policy Module

This module encapsulates the strategy of an AD as a participant in a P2PWNC (the amount of resources offered to visitors, the request rate allowed for its own members, etc.).

The above modules can be divided into those that would exist in any typical WLAN AD, even if it was not participating in a P2PWNC (User Authentication, WLAN Control, Internet Connectivity, Local Services), and those that have to do with communication between peers and are a distinctive characteristic of a member of a Peer-To-Peer Wireless Network Confederation (P2PWNC Management Module, Local P2PWNC Policy Module).

1.2.3 Administrative Domain's Local Traffic Accounting and Monitoring System

This is the AD subsystem that this document deals with in the following chapters. It is composed of two parts. The first is the traffic logging and analysis and local user accounting system. The second is an XML-based protocol for the retrieval and exchange of the statistics generated by the logging/accounting subsystem.


Chapter 2

TRAFFIC LOGGING AND ANALYSIS SUBSYSTEM

The traffic logging subsystem is in fact a packet capturing and analysis daemon. This daemon is responsible for capturing packets from a defined network device that belongs to the router and analyzing them, so that it can gather aggregate traffic statistics as well as application-specific information about the P2PWNC users. The information is grouped by application protocol. For example, for the HTTP protocol the statistics available include the HTTP requests users have made, and in particular the host name, the request method (GET, POST, etc.), the user agent (e.g. Mozilla, Microsoft IE) and the request URI. In a similar way, the traffic logging daemon can track information about other application level protocols (FTP, SMTP, POP3). Data is stored in a MySQL database.
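As an illustration of the kind of HTTP information listed above, the following is a minimal sketch of how the request method, request URI, host name and user agent could be pulled out of a captured payload. It is not the daemon's actual code: the names http_request_info and parse_http_request and the field sizes are hypothetical, and the sketch assumes that the whole request header arrives in a single payload buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one HTTP request; the field sizes are arbitrary. */
struct http_request_info {
    char method[16];
    char uri[256];
    char host[128];
    char user_agent[128];
};

/* Copy len bytes (truncated to fit) from src into dst and terminate dst. */
static void copy_field(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len >= dst_size)
        len = dst_size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Case-insensitive test: does the line [p, p+len) start with prefix? */
static int has_prefix(const char *p, size_t len, const char *prefix)
{
    size_t n = strlen(prefix);
    if (len < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)p[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    return 1;
}

/* Copy the header value that follows a "Name:" prefix of length skip. */
static void copy_value(char *dst, size_t dst_size,
                       const char *line, const char *line_end, size_t skip)
{
    const char *v = line + skip;
    while (v < line_end && (*v == ' ' || *v == '\t'))
        v++;
    copy_field(dst, dst_size, v, (size_t)(line_end - v));
}

/* Returns 0 on success, -1 if the payload has no well-formed request line. */
int parse_http_request(const char *payload, size_t len,
                       struct http_request_info *out)
{
    assert(payload != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    const char *end = payload + len;
    const char *eol = memchr(payload, '\n', len);
    if (eol == NULL)
        return -1;

    /* Request line: METHOD SP URI SP VERSION */
    const char *sp1 = memchr(payload, ' ', (size_t)(eol - payload));
    if (sp1 == NULL || sp1 == payload)
        return -1;
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(eol - (sp1 + 1)));
    if (sp2 == NULL || sp2 == sp1 + 1)
        return -1;
    copy_field(out->method, sizeof out->method, payload, (size_t)(sp1 - payload));
    copy_field(out->uri, sizeof out->uri, sp1 + 1, (size_t)(sp2 - sp1 - 1));

    /* Header lines, up to the empty line that ends the header block. */
    for (const char *p = eol + 1; p < end; p = eol + 1) {
        eol = memchr(p, '\n', (size_t)(end - p));
        if (eol == NULL)
            eol = end;
        const char *text_end = (eol > p && eol[-1] == '\r') ? eol - 1 : eol;
        size_t n = (size_t)(text_end - p);
        if (n == 0)
            break;
        if (has_prefix(p, n, "Host:"))
            copy_value(out->host, sizeof out->host, p, text_end, 5);
        else if (has_prefix(p, n, "User-Agent:"))
            copy_value(out->user_agent, sizeof out->user_agent, p, text_end, 11);
        if (eol == end)
            break;
    }
    return 0;
}
```

A caller would pass the TCP payload of a captured packet and, on a return value of 0, store the four fields in the corresponding database record. Requests whose header is split across several TCP segments are not handled by this sketch.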

2.1 System Testbed

The traffic logging subsystem requires the following in order to work properly.

2.1.1 Hardware

The system was developed and tested on an Intel Pentium III (800 MHz) computer with 256 Mbytes of RAM. It was also tested on an Intel Celeron (500 MHz) with 64 Mbytes of RAM. These systems included an 802.11b Access Point and a Network Interface Card for Internet connectivity. The experiments included two Compaq N610c laptop machines, with 512 Mbytes of RAM and a ZoomAir 4100 802.11b card in ad hoc mode.

2.1.2 Operating System

The operating system used was RedHat Linux 8.0. However, the packet logging subsystem was also tried on earlier as well as more recent versions of RedHat (7.3, 9.0) and on Mandrake Linux. The system has been tested on kernel version 2.4.18-14. The Compaq laptops were running RedHat 9.0, kernel 2.4.21 (see LINKS [11]).

2.1.3 Other software

The system must have MySQL installed for the data storage. Version 4.0.15 was used, although earlier versions work fine, too. Also, libpcap version 0.7.1 (or later) is needed by the packet capturing daemon.

2.2 Tools Used

The packet capturing daemon is libpcap-based. Therefore, the libpcap version 0.7.1 packages had to be installed (see LINKS section, [2]).

The MySQL version used was 4.0.15 ([1], LINKS section). Compiling programs with MySQL support requires that the packages MySQL-devel-4.0.15-0 and MySQL-client-4.0.15-0 are installed. However, the system works properly with both earlier and later versions of these packages.


The programs were implemented in the C language and the compiler used was gcc.

The editor used was mainly KWrite.

The debugger used was gdb.

For database viewing, phpMyAdmin was used ([7]). phpMyAdmin is a web interface to MySQL written in PHP.

2.3 Logging Subsystem General Architecture

The logging subsystem is located on a Linux box that functions as the Administrative Domain's router. This Linux box is responsible for the control of the AD and includes the authentication module, the traffic shaping module, etc. It has two network interfaces. One is connected to the 802.11b access point and the other is a network card that serves as the AD's gateway to the Internet. Users within range of the wireless access point send and receive packets through it. These packets are routed to, or have arrived from, the Internet gateway network card.

The packet capturing daemon captures every single packet that the network interface

that is being watched sends or receives. However, not every packet captured is

important for the system. We only care about packets that are sent or received by

users registered to the P2PWNC and are online at the moment. This means that there

has to be a mechanism of distinguishing which packets are important for accounting

reasons and should be further analyzed and which should be ignored by the system.

The above are shown in the next figure, which describes at a high level what is

happening when a user sends a network packet.

[Figure 2. Administrative Domain's Linux Box. The diagram shows the wireless users, the access point (AP), the Linux box (routing, accounting, traffic shaping, etc.) and the Internet gateway.]


The example of Figure 3 involved a user registered to an AD of the P2PWNC who was generating network traffic through the AD. The system determined that a captured packet belonged to him, and the next step was to analyze the packet further (its header and possibly its payload) and update the database statistics. For example, if it was an FTP packet, the system would increment the total FTP upload statistics by the length of this packet, and if the packet payload carried extra information about the FTP connection (e.g. the user's FTP account name or his password), this information would be extracted and written to the database. Capturing a packet that is to be received by the user is an almost identical case. If the daemon captures a packet that is not found to belong to any online registered user, the packet is ignored and no further analysis takes place.
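The source / destination check described above can be sketched as follows. This is only an illustration under stated assumptions: the online_user structure, its byte counters and the function account_packet are hypothetical names, IP addresses are compared as dotted-decimal strings, and the online users are assumed to be available as a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-user record; the counters stand in for database updates. */
struct online_user {
    char username[200];
    char ipaddr[20];
    unsigned long bytes_sent;
    unsigned long bytes_received;
};

/*
 * Attribute a captured packet to an online user.
 * Returns the index of the matching user, or -1 if the packet belongs to
 * no online registered user and should therefore be ignored.
 */
int account_packet(struct online_user *users, size_t count,
                   const char *src_ip, const char *dst_ip,
                   unsigned long length)
{
    assert(src_ip != NULL && dst_ip != NULL);
    assert(users != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(users[i].ipaddr, src_ip) == 0) {
            users[i].bytes_sent += length;       /* packet sent by the user */
            return (int)i;
        }
        if (strcmp(users[i].ipaddr, dst_ip) == 0) {
            users[i].bytes_received += length;   /* packet destined to the user */
            return (int)i;
        }
    }
    return -1;
}
```

A return value of -1 corresponds to the "packet ignored" branch. Any other value identifies the user whose statistics would be updated and whose packet would be passed on to protocol-specific analysis.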

There are three discrete user states that can be distinguished in this system, the

login state, the connected state and the logout state.

2.3.1 Login state

The term "login state" refers to the first step taken by a user who wishes to use the Internet services provided by an AD of the P2PWNC. During the login state, user authentication / identification takes place. The user to be connected makes a login request, sending his credentials (a username-password pair or a certificate of some kind). If he meets the requirements to log in, he is assigned a (dynamic) IP address. After the successful IP assignment, the authentication module has to make the necessary updates in the database. Namely, it has to update the newly logged-in user's database record with the assigned IP address, the MAC address of the user's network interface (an 802.11 wireless network interface card) and a timestamp. The authentication module also has to notify the packet capturing module of the login event. For this purpose, there is a dynamic list of the online users, which resides in a memory segment shared by the two modules (the way the authentication module keeps the packet capturing daemon informed of the users who are logged in at any moment is discussed thoroughly in a following section). From now on, the user is identified by the username - IP address pair and he enters the connected state. What takes place during the login state is shown in Figure 4.

[Figure 3. Traffic Logging Subsystem High Level View. Flow shown: a user (user@domain, IP xxx.xxx.xxx.xxx) sends a packet with source IP xxx.xxx.xxx.xxx; the packet is captured; an IP src/dst check follows; if the packet belongs to a registered user who is online, it goes to packet analysis and the database statistics / info are updated; otherwise the packet is ignored.]

2.3.2 Connected State

The connected state is the state the user enters as soon as he has successfully passed the login state. At this time, the user can be identified by the username - IP address pair. During the connected state the user can make use of the Internet services and facilities an Administrative Domain offers. User accounting starts at the moment he has logged in and stops when the user exits the connected state. During this state, every packet the user sends / receives is captured and examined. After the examination and the extraction of any useful information, the user statistics are updated, as mentioned before.

2.3.3 Logout state

This is the last state a user can enter. When a user issues a logout request or, more frequently, when a user moves out of the coverage area of the Administrative Domain's WLAN, he enters the logout state. What the system has to do in this case is similar to what it does in the login state. The database record that shows the user's status (online / offline), his temporary IP address and his MAC address has to be updated, and so does the online users list. The user status is set to offline and the node referring to him in the online users list is removed. These actions are shown in Figure 5.

[Figure 4. Login State. The diagram shows a user (username@somedomain) sending a login request with credentials to the Authentication Module, which updates the database (DB) and the online users list in shared memory; the Traffic Logging Subsystem reads the online users list.]


2.3.4 Communication with the authentication module

The previous section made clear that, in order for the system to function properly, a means of communication between the authentication and the traffic logging modules is needed. It is crucial that the packet capturing daemon is notified of login/logout events, so that it knows which users are logged in at any moment. For as long as the user is in the connected state, he is identified by the pair consisting of his username (which has the format user@domain) and the dynamic IP address that has been assigned to him by the authentication module.

What the system needs, then, is a way of interprocess communication between the process that logs users in and the one responsible for user accounting. The authentication module must have a way of sending the traffic logging module information about any change in the state of users. The minimum information needed is the username, the (assigned) IP address of the user and an indicator of his new state (logged in or logged out). The section that follows briefly discusses communication between processes in a Linux environment.

2.3.4.1 Interprocess Communication (IPC) in Linux Environments

There are numerous ways of achieving IPC in a Linux system. Some of them are the following.

- Signals

- Pipes

- Message Queues

- Sockets

- Threads

- Shared Memory

Signals are events that may be delivered to a process by the same or a different process. Usually, signals are used to notify a process of an exceptional event. Examples of signals are SIGINT, which is sent to a process when a user presses Ctrl+C; SIGTERM, which is delivered to a process when a user kills it; SIGUSR1 and SIGUSR2, which are user defined; and SIGSEGV, which is raised when there is a memory violation in the process (segmentation fault).

[Figure 5. Logout State. The diagram shows a logout notification for a user (username@somedomain, IP address) reaching the Authentication Module, which updates the database (DB) and the online users list in shared memory (setting the user offline and removing him from the online users list); the Traffic Logging Subsystem reads the online users list.]


Pipes can be regarded as files, which can be named or unnamed. A pipe is a one-way

communication channel between two processes. Unnamed pipes are used mainly for

communication between a parent and a child (or forked) process. Named pipes are

more appropriate for communication between different programs that share the same

file system.

Sockets are a more general and efficient way of IPC than pipes. They can be

considered as logical files that can achieve two-way communication. Usually, sockets

are used in network and distributed programming. Some socket types are used for

communication between kernel and userspace (e.g. netlink sockets).

Message Queues are similar to pipes. However, they allow messages to be tagged with specific message types, so messages of different types can be exchanged. Unlike sockets, they can only be used for communication between processes running on the same machine. Message queues and pipes were mainly used in older UNIX systems, and their use in modern programs has started to be abandoned.

Threads, which are in fact lightweight processes, allow several flows of execution to share the fundamental parts of a process, that is, its code, data, open files and signal handlers, while each thread keeps its own stack. There are both user-level and kernel-level threads.

Finally, a way of achieving IPC in Linux and UNIX environments is shared memory. As its name implies, shared memory is a memory segment to which more than one process can have access. The shared memory segment is created by one process, and other processes can access it, provided that they have access privileges to that segment. This is a fast way of IPC and it is appropriate for cases where processes need to use a shared resource (memory).
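As a rough illustration of the mechanism, the sketch below creates, attaches and releases a System V shared memory segment with shmget, shmat, shmdt and shmctl. The wrapper names segment_create, segment_attach and segment_release are hypothetical and are not taken from the subsystem's code. The sketch uses IPC_PRIVATE for simplicity; two unrelated programs would instead have to agree on a common key (for example one derived with ftok), which is an assumption left outside this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a segment of `size` bytes, readable and writable by its owner.
 * Returns the segment identifier, or -1 on failure. */
int segment_create(size_t size)
{
    assert(size > 0);
    return shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
}

/* Map the segment into the address space of the calling process.
 * Returns the address of the mapping, or NULL on failure. */
void *segment_attach(int shmid)
{
    void *addr = shmat(shmid, NULL, 0);
    return (addr == (void *)-1) ? NULL : addr;
}

/* Detach the mapping and mark the segment for removal.
 * Returns 0 if both steps succeed, -1 otherwise. */
int segment_release(int shmid, void *addr)
{
    int rc = shmdt(addr);
    if (shmctl(shmid, IPC_RMID, NULL) == -1)
        rc = -1;
    return rc;
}
```

Data written through one attachment is immediately visible through any other attachment of the same segment, which is what makes this mechanism suitable for a list that one process updates and another one reads. Access still has to be coordinated by the cooperating processes themselves.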

2.3.4.2 IPC implementation in the Traffic Logging Subsystem

The IPC method chosen in this system is shared memory. This approach was considered more appropriate because it was relatively simple to implement and closer to the nature of the problem. That is, the two processes need to share the same resource, which is the list of the online users. This list resides in a memory segment that is accessible by both processes.

Signals were not suitable for this communication, because they could not carry all the information needed from one process to the other. They could only notify the traffic logging module of a login / logout event, but could not convey the name and the IP address of the user the event referred to.

The authentication and the traffic logging modules do not communicate directly. Instead, there is a middle level between the two subsystems, as illustrated in Figure 6.


shmhandleuser is in fact the process that implements the middle level between the two modules. Every time a login / logout event takes place, the authentication module must call this process, which adds the user to, or removes him from, the list of online users (which is in the shared memory segment).

The presence of this middle level keeps the two subsystems as independent of one another as possible. Every time a user logs in and is assigned an IP address, or a user logs out and his IP address is released, the authentication module can call the external program shmhandleuser via system(), an exec-family function or a similar call. The arguments of shmhandleuser are:

- the P2PWNC user's username

- the user's IP address (newly assigned or released IP)

- a flag (0 or 1) indicating whether a logout or a login event has taken place.

For example, if the authentication module program was written in C, a call of the

following format would be issued:

    /* authentication program code */
    /* ... */

    system("shmhandleuser username@domain xxx.xxx.xxx.xxx 1");

    /* ... */
    /* more code */

The above piece of code adds the user username@domain with the assigned IP address xxx.xxx.xxx.xxx to the list of online users (it is assumed that the authentication module has already carried out all database updates concerning the new user).
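Since the username and the IP address end up on a shell command line when system() is used, an authentication module would probably want to validate them first. The following is a minimal sketch of such a wrapper, not part of the original system: the function names build_notify_command and notify_traffic_logger and the set of accepted characters are assumptions, and shmhandleuser is assumed to be found through PATH.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Accept only characters that are harmless on a shell command line. */
static int is_safe_argument(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (!isalnum(c) && c != '@' && c != '.' && c != '-' && c != '_')
            return 0;
    }
    return 1;
}

/*
 * Build "shmhandleuser <username> <ip> <flag>" in buf, where the flag is
 * 1 for a login event and 0 for a logout event.
 * Returns 0 on success, -1 on unsafe input or if buf is too small.
 */
int build_notify_command(char *buf, size_t size, const char *username,
                         const char *ip, int logged_in)
{
    assert(buf != NULL && username != NULL && ip != NULL);
    if (!is_safe_argument(username) || !is_safe_argument(ip))
        return -1;
    int n = snprintf(buf, size, "shmhandleuser %s %s %d",
                     username, ip, logged_in ? 1 : 0);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Run the command. Returns the result of system(), or -1 if the
 * command could not be built. */
int notify_traffic_logger(const char *username, const char *ip, int logged_in)
{
    char cmd[512];
    if (build_notify_command(cmd, sizeof cmd, username, ip, logged_in) != 0)
        return -1;
    return system(cmd);
}
```

With these helpers, a login event would be reported with a call such as notify_traffic_logger("user@domain", "10.0.0.5", 1), and the matching logout with the flag set to 0. Calling an exec-family function directly would avoid the shell altogether and is an equally valid design.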

The authentication program could, of course, have been written in another programming language. In that case, the equivalent call should be issued. The generality and independence of this approach lie in the fact that the communication module is not bound to the authentication module. Therefore, even if the authentication module was rewritten from scratch, no changes would be needed in the middle level. The only thing the programmer would have to do would be to include the shmhandleuser call in his code every time a login / logout event takes place. This approach also enables the creation of a central entity that can control the whole AD system. One could create a controlling module that coordinates the authentication, the traffic logging and the other AD modules. In such a case, for example, the controlling module could search the database where user information is stored on a regular basis (e.g. every second) to find out whether a new user has logged into the system or whether logouts have taken place. After that, the controlling module would issue shmhandleuser calls for every user that has entered or exited the system.

[Figure 6. Subsystems IPC communication using Shared Memory. The diagram shows the Authentication Module calling the shmhandleuser function, which updates the list of the online users in the shared memory segment; the Traffic Logging Daemon accesses the same segment.]

The list of the online users, as may have become clear, is located in a memory segment that is shared between the packet capturing daemon and the shmhandleuser program.

This list is implemented as a kind of linked list. It is composed of nodes which have the following format:

struct usrnode {
    /* User List Node */
    char username[200];
    char ipaddr[20];
    int count;
    int updated;
    int pos;
    struct usrnode *next;
};

typedef struct usrnode unode;

usrnode: This struct is a node of the online users list data structure

username: User's username (usually of the format: username@domain)

ipaddr: The assigned IP address

count: Number of nodes currently in the list. This field only makes sense for the head of the list

pos: Position of the node in the list (zero for the head)

next: Pointer to the next node of the list.

The above structure, as well as the list handling functions, are defined and implemented in the usrlist.h file.

If the pcap daemon program wishes to find out whether a newly captured packet was sent by an online user or is addressed to one, it has to search the user list to check whether the packet's source or destination IP address matches that of any user in the list. This way the packet capturing daemon always knows which users are online and sees any change in a user's state (online / offline) as soon as the list is updated.
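The lookup described above can be sketched as follows. This is an illustration only: the function classify_packet and the direction enum are hypothetical names, the IP addresses are assumed to be available as dotted strings, and the head node is assumed to be counted in count, as described later in this document.

```c
#include <string.h>

/* Copy of the node layout described above, repeated for self-containment. */
struct usrnode {
    char username[200];
    char ipaddr[20];
    int count;
    int updated;
    int pos;
    struct usrnode *next;
};
typedef struct usrnode unode;

enum direction { DIR_NONE = 0, DIR_UPLOAD = 1, DIR_DOWNLOAD = 2 };

/*
 * Hypothetical lookup: scans the consecutive nodes that follow the dummy
 * head and reports whether the packet was sent by an online user (upload)
 * or is addressed to one (download). *user receives the matching node,
 * or NULL when neither address belongs to an online user.
 */
enum direction classify_packet(unode *head, const char *src_ip,
                               const char *dst_ip, unode **user)
{
    int i;

    *user = NULL;
    if (!head) {
        return DIR_NONE;
    }
    for (i = 1; i < head->count; i++) {   /* index 0 is the dummy head */
        if (!strcmp(head[i].ipaddr, src_ip)) {
            *user = &head[i];
            return DIR_UPLOAD;
        }
        if (!strcmp(head[i].ipaddr, dst_ip)) {
            *user = &head[i];
            return DIR_DOWNLOAD;
        }
    }
    return DIR_NONE;
}
```

The returned direction tells the caller which per-user counter (upload or download) the packet length should be added to.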

As mentioned before, two processes have access to the shared memory block. Of these, only shmhandleuser writes to the block during normal operation. The packet capturing daemon (the packet_cap process) only reads from that memory area: it searches the list of users located there and does not modify it. The other process is the one that adds and removes nodes. As a result, synchronization problems are less serious, because the two processes do not try to write to the same segment at the same time.

The way the two processes work in terms of the shared memory is as follows.

- The packet capturing daemon (packet_cap) first creates the shared memory segment:

/*creating mem segment*/
mid = shmget(M_KEY, MAXUSERS*sizeof(unode), IPC_CREAT|PERMS);
if (mid == -1) {
    printf("ERROR GETTING MEM..\n");
    exit(1);
}

[ code taken from packet_cap.c ]

The above function (shmget) takes three parameters. The first is the key of the shared memory segment. The second is the size of the shared memory block to be allocated. In this case, the block must be large enough to hold the maximum number of users our system permits (MAXUSERS, which in our case has the value 2000). The third argument contains the flags that control access to the shared memory block. IPC_CREAT indicates that shmget must create the shared memory segment if it does not already exist, whereas PERMS defines the access rights to the block (in our case, 0666).

Then, packet_cap must map the shared memory block to its own address space. This is achieved with the following call:

usrListHead = (unode*)shmat(mid, NULL, 0);

usrListHead is a pointer to a unode struct which is declared as static in another part of the program (static unode* usrListHead;) and represents the head of the users list residing in the shared memory. usrListHead is in fact a dummy node, used for access to the users list. Its username and ipaddr members have no value. The most important thing is that the count member of usrListHead reports the number of nodes in the list. Also, its pos member, which indicates the position of the node in the list, has a zero value. A call to the shmat function attaches the shared memory segment identified by mid to the address space of the calling process and returns a pointer to that memory area.
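The creation and attachment steps can be combined into one routine that also checks the result of shmat, which reports failure by returning (void *)-1 rather than NULL. The sketch below is hypothetical: the function name create_user_list is not taken from packet_cap.c, and the initial value of count assumes that the dummy head itself is counted as a node.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define MAXUSERS 2000

struct usrnode {
    char username[200];
    char ipaddr[20];
    int count;
    int updated;
    int pos;
    struct usrnode *next;
};
typedef struct usrnode unode;

/*
 * Hypothetical routine: creates (or opens) the segment, attaches it and
 * initialises the dummy head, discarding any previous list contents.
 * Returns the head pointer, or NULL on failure; *mid_out receives the
 * segment identifier.
 */
unode *create_user_list(key_t key, int perms, int *mid_out)
{
    int mid;
    void *mem;
    unode *head;

    mid = shmget(key, MAXUSERS * sizeof(unode), IPC_CREAT | perms);
    if (mid == -1) {
        perror("shmget");
        return NULL;
    }
    mem = shmat(mid, NULL, 0);
    if (mem == (void *)-1) {      /* shmat signals failure with (void *)-1 */
        perror("shmat");
        return NULL;
    }
    head = (unode *)mem;
    memset(head, 0, sizeof(*head));
    head->count = 1;              /* assumption: the head counts as a node */
    head->pos = 0;
    *mid_out = mid;
    return head;
}
```

A caller would use it as `usrListHead = create_user_list(M_KEY, PERMS, &mid);` and terminate if the result is NULL.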

Following that, the packet_cap process has to search the database to check if there are any online users. It issues the following SQL query:

SELECT u_username, u_ip_addr FROM users WHERE u_online='y'

The above fields are self-explanatory. The packet capturing daemon then checks the query results and adds the users that are found online to the online users list (in the shared memory). This requires a call to the shmAppendUser function, which is declared in the usrlist.h file.

shmAppendUser(usrListHead, usrListHead + usrListHead->count,
              row[0], row[1]);

The above function adds a new user at the end of the online users list, in the shared memory block. The first argument of the function is the pointer to the head of the user list (usrListHead), as described before. The second parameter is the exact position in the shared memory block where the new node will be placed (the new node has to be placed inside the shared memory block and, in particular, at the end of the list). Because usrListHead is a unode pointer, adding usrListHead->count to it already advances by usrListHead->count * sizeof(unode) bytes. The third parameter is the new node's username and the fourth the new node's ipaddr. The above is the only case when the packet capturing daemon actually writes to the shared memory segment. In all other cases, the daemon only reads. The reason why the packet capturing program has to check the database for online users on its startup is that the authentication module may already be running at the moment the traffic logging module starts up. This means that users may have already been assigned an IP address (as said before, the database updates concerning the users' status are a task the authentication module is responsible for carrying out).

- Every time a login or logout event takes place, shmhandleuser is called by the authentication module. shmhandleuser obtains a descriptor of the shared memory segment where the online users list is located, in a similar way as the packet capturing program did:

mid = shmget(M_KEY, MAXUSERS*sizeof(unode), PERMS);
if (mid == -1) {
    printf("ERROR GETTING MEM..\n");
    exit(1);
}

[ code taken from shmhandleuser.c ]

mid is the shared memory descriptor returned by the shmget function. The parameters this function takes were described above. It should be noted that the third argument of the function contains only the access permissions to the memory block, while in the previous case there was also the flag IPC_CREAT, which indicated that shmget was creating the memory segment.

Then, the shmhandleuser program must obtain a pointer to the shared memory area. This is achieved by calling the shmat function, which was also described above. After attaching the memory block to its address space, shmhandleuser decides what to do with the specified user. According to the flag given as its last argument, the program either adds the user to the online users list or removes him from it. This is shown in the next code fragment:

if (atoi(argv[3])) {
    shmAppendUser(mem, mem + mem->count, argv[1], argv[2]);
}
else {
    removeFromList(mem, argv[1]);
}

[ code taken from shmhandleuser.c ]

If the third argument is non-zero, the user with the username given by the first command line argument and the ipaddr given by the second is appended to the users list.

If the third argument (flag) is zero, the user is removed from the list. The function that removes users is removeFromList. Its first parameter is the pointer to the shared memory block (the head of the list) and the second one is the username field of the node that is to be removed.
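The fragment above trusts its command line arguments. A minimal sketch of an argument check that a shmhandleuser-like program could run first is given below. The struct user_event and the function parse_event_args are hypothetical names, and the length limits are taken from the username[200] and ipaddr[20] fields of the node structure.

```c
#include <string.h>

/* Hypothetical container for one login / logout event. */
struct user_event {
    const char *username;
    const char *ipaddr;
    int login;              /* 1 = login, 0 = logout */
};

/*
 * Hypothetical argument check.
 * Expects: argv[1] = username, argv[2] = IP address, argv[3] = "0" or "1".
 * Returns 0 on success and -1 on malformed input.
 */
int parse_event_args(int argc, char *argv[], struct user_event *ev)
{
    if (argc != 4 || !ev) {
        return -1;
    }
    if (strlen(argv[1]) == 0 || strlen(argv[1]) >= 200) {
        return -1;          /* must fit in username[200] */
    }
    if (strlen(argv[2]) == 0 || strlen(argv[2]) >= 20) {
        return -1;          /* must fit in ipaddr[20] */
    }
    if (strcmp(argv[3], "0") != 0 && strcmp(argv[3], "1") != 0) {
        return -1;          /* the flag must be exactly 0 or 1 */
    }
    ev->username = argv[1];
    ev->ipaddr = argv[2];
    ev->login = (argv[3][0] == '1');
    return 0;
}
```

Rejecting over-long strings at this point matters because the list handling code copies them into fixed-size fields with strcpy.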


Finally, shmhandleuser must detach the shared memory segment it has attached to its address space. This is achieved by a call to shmdt:

/*detaching mem block..*/
shmdt(mem);

The parameter shmdt takes is the pointer to the shared memory block, which was acquired by the shmat call.

- Having discussed what happens on packet_cap startup and on every login / logout event, it is time to discuss what actions the packet capturing daemon has to perform when it terminates. Normal program termination takes place when the packet capturing daemon is sent the SIGINT or SIGTERM signal, that is, when a user presses Ctrl-C or issues the kill %processid command. In such a case, the signal handling function is called. Its prototype is:

void termhandler(int sig);

The parameter sig is the signal code of the received signal (SIGINT or SIGTERM). The operations termhandler performs, as far as the shared memory is concerned, are shown in the next piece of code:

shmdt(usrListHead);
if (shmctl(mid, IPC_RMID, NULL)) {
    printf("ERROR REMOVING MEM...\n");
    exit(1);
}
else {
    printf("Shared memory segment removed successfully. mid: %d\n", mid);
}

[ code taken from packet_cap.c ]

First the shared memory block is detached from the process's address space and then the block (with the mid descriptor) is removed by calling the function shmctl with the flag IPC_RMID as the second argument.

In case of abnormal program termination (e.g. a SIGSEGV signal), there is a utility program called killmem which removes the shared memory segment created by the packet capturing program with the shmget call.
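The termhandler shown above performs the cleanup inside the signal handler itself. A different and commonly used technique, sketched below, is to let the handler only set a flag and to run the cleanup (shmdt, shmctl, closing the database handle) from the normal program flow once the flag is seen. This is not the method used in packet_cap.c, and the names on_terminate, install_term_handlers and termination_requested are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the main program flow. */
static volatile sig_atomic_t stop_requested = 0;

static void on_terminate(int sig)
{
    (void)sig;
    stop_requested = 1;   /* only async-signal-safe work happens here */
}

/* Installs the handler for SIGINT and SIGTERM; returns 0 on success. */
int install_term_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) == -1) {
        return -1;
    }
    if (sigaction(SIGTERM, &sa, NULL) == -1) {
        return -1;
    }
    return 0;
}

/* Returns non-zero once a termination signal has been received. */
int termination_requested(void)
{
    return stop_requested != 0;
}
```

The motivation for this design is that functions such as printf and exit are not guaranteed to be safe when called from a signal handler, whereas assigning to a volatile sig_atomic_t variable is.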

In the above functions, we made use of the variables M_KEY and PERMS. These variables are static:

static int M_KEY;
static int PERMS;

Their values are read from the packet capturing daemon's configuration file (packet_cap.conf). The function that reads the information stored in the configuration file and assigns values to the appropriate variables is called parseConf. It is a void function with the following prototype:

void parseConf();

This function searches for the configuration file in a defined path:

#define PACKET_CAP_CONFFILE "/etc/packet_cap.conf"

If the configuration file is not in this location, then the parseConf function looks for a file named packet_cap.conf in the current working directory (typically the directory from which the application was started). This is shown in the next piece of code, which is in the body of the parseConf function:

if (!(f = fopen(PACKET_CAP_CONFFILE, "r+"))) {
    if (!(f = fopen("packet_cap.conf", "r+"))) {
        printf("ERROR OPENING CONF FILE\n");
        return;
    }
}
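Once the file is open, the settings have to be extracted from it. The syntax of packet_cap.conf is not shown in this document, so the sketch below simply assumes lines of the form NAME=value with '#' starting a comment line. Both this format and the function name parse_conf_stream are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for an assumed NAME=value configuration format.
 * Fills *m_key and *perms when the corresponding lines are present and
 * returns the number of recognised settings.
 */
int parse_conf_stream(FILE *f, int *m_key, int *perms)
{
    char line[256];
    int found = 0;

    while (fgets(line, sizeof(line), f)) {
        char *eq;

        if (line[0] == '#' || line[0] == '\n') {
            continue;               /* comment or empty line */
        }
        eq = strchr(line, '=');
        if (!eq) {
            continue;               /* not a NAME=value line */
        }
        *eq = '\0';                 /* split the line into name and value */
        if (!strcmp(line, "M_KEY")) {
            *m_key = (int)strtol(eq + 1, NULL, 0);
            found++;
        } else if (!strcmp(line, "PERMS")) {
            /* base 0 lets a leading 0 select octal, e.g. 0666 */
            *perms = (int)strtol(eq + 1, NULL, 0);
            found++;
        }
    }
    return found;
}
```

Under these assumptions, a configuration file containing the two lines `M_KEY=1234` and `PERMS=0666` would yield a key of 1234 and the access rights 0666 mentioned earlier.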


2.3.4.3 Shared Memory Allocation and Handling

In the above section there was much discussion about communication between the traffic logging and the authentication modules via the middle level implemented by the shmhandleuser process. There was a detailed description of what happens when the packet capturing program starts and terminates, as well as the steps taken when a login / logout event takes place. However, little was mentioned about the user list implementation and the way the shared memory list handling functions work.

The user list, as mentioned before, consists of usrnode (or unode) structures. It could be considered a linked list, but its implementation depends on the nature of shared memory and thus differs a lot from typical linked list implementations.

The main difference from typical linked list implementations is that the valid address space for a node of the list is limited to the shared memory segment created by the shmget call and attached to the process's address space by shmat. Therefore, great care must be taken so that every node is located inside the shared memory block allocated for the processes.

A call to the malloc function for a new node pointer (unode*) would return an address inside the address space of the calling process, but certainly outside the shared memory area. Obviously, this node would not be accessible by the other process sharing the resource (user list), as its address would lie outside the memory to which the second process has access.

A solution to this problem is the following.

The program responsible for the creation of the shared memory segment (packet_cap, in our case) allocates space for MAXUSERS (consecutive) unode structs. MAXUSERS is the maximum number of users allowed by the traffic logging subsystem to be logged in at the same time. When a new user has to be added to the list, a call to shmAppendUser is issued. The first parameter is the user list's head pointer. The second is a unode pointer which points to the place in memory where the new node will be placed. In order to ensure that the new node's address is inside the allocated shared memory block, we pass as the second parameter of shmAppendUser the address of the first empty place in the shared memory segment where a unode can be placed. The address of this position is:

usrListHead + usrListHead->count;

where usrListHead is the pointer to the head of the list, a dummy node which stores the number of nodes in the list (usrListHead->count). Since C pointer arithmetic on a unode pointer is already scaled by the size of the pointed-to type, this expression refers to the position usrListHead->count * sizeof(unode) bytes past the head. Figure 7 illustrates this method.

Note that the list's head is included in the number of nodes in the list. It is also clear that, by using this method of adding nodes, the nodes of the list will always be placed in consecutive positions. However, when removing a node, all nodes that are located after it must be shifted back so that no empty ones exist before the end of the list (and the existing nodes remain in consecutive positions). The code of the node appending function follows.

void shmAppendUser(unode* usrlist, unode* newnode, char un[],
                   char ipaddr[]) {
    unode* cur;

    if (!usrlist) {
        return;
    }
    if (usrFindInListByName(usrlist, un)) {
        /*
           if the node is already in the list then return
        */
        return;
    }
    /* last node currently in the list (pointer arithmetic is in units of unode) */
    cur = usrlist + (usrlist->count - 1);
    strcpy(newnode->username, un);
    strcpy(newnode->ipaddr, ipaddr);
    newnode->updated = 0;
    newnode->next = NULL;
    cur->next = newnode;
    usrlist->count++;
    newnode->pos = usrlist->count - 1;
}

[ code taken from usrlist.h ]

[Figure 7. shmAppendUser() - Determining the new node's position. The segment holds the head, unode 1, ..., the last unode; the new node is placed usrListHead->count * sizeof(unode) bytes past usrListHead.]
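The append step relies on the caller to pass a valid slot address and on the strings fitting into the fixed-size fields. A hedged sketch of a bounds-checked variant is shown below. The function append_user_checked is hypothetical and not part of usrlist.h; it derives the slot from head->count itself and assumes, as stated above, that the head is counted as a node.

```c
#include <string.h>

#define MAXUSERS 2000

struct usrnode {
    char username[200];
    char ipaddr[20];
    int count;
    int updated;
    int pos;
    struct usrnode *next;
};
typedef struct usrnode unode;

/*
 * Hypothetical bounds-checked variant of the append step.
 * Returns the new node, or NULL if the list is full, a string does not
 * fit, or the user is already present.
 */
unode *append_user_checked(unode *head, const char *un, const char *ip)
{
    unode *node;
    int i;

    if (!head || head->count < 1 || head->count >= MAXUSERS) {
        return NULL;              /* no head, or no free slot left */
    }
    if (strlen(un) >= sizeof(head->username) ||
        strlen(ip) >= sizeof(head->ipaddr)) {
        return NULL;              /* would overflow a fixed-size field */
    }
    for (i = 1; i < head->count; i++) {
        if (!strcmp(head[i].username, un)) {
            return NULL;          /* already in the list */
        }
    }
    node = head + head->count;    /* first free slot, in units of unode */
    memset(node, 0, sizeof(*node));
    strcpy(node->username, un);
    strcpy(node->ipaddr, ip);
    node->pos = head->count;
    node->next = NULL;
    head[head->count - 1].next = node;
    head->count++;
    return node;
}
```

Checking head->count against MAXUSERS keeps the new node inside the block allocated by shmget, which is the property the whole scheme depends on.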

In order to remove a node from the list, one has to call the removeFromList function, which is the following.

int removeFromList(unode* usrlist, char username[]) {
    unode* cur;
    int i = 0;
    int j = 0;

    if (!usrlist) {
        return 0;
    }
    cur = usrlist + 1; /* first node after the dummy head */
    for (i = 1; i < usrlist->count; i++) {
        if (!strcmp(cur->username, username)) {
            /* shift all following nodes back by one position */
            for (j = i; j < usrlist->count - 1; j++) {
                memcpy(cur, cur + 1, sizeof(unode));
                cur->pos = j;
                cur->next = cur + 1;
                cur++;
            }
            /* cur now points to the last slot, which is no longer used */
            bzero(cur, sizeof(unode));
            (cur - 1)->next = NULL;
            usrlist->count--;
            return 1; //found and removed
        }
        cur++;
    }
    return 0; //not found
}



As said before, after removing a node, the nodes that are located after it have to be shifted back by one position so that no empty nodes can be found before the end of the list. The following figures demonstrate the removal of unode2.

[Figure 8. removeFromList() shifting nodes back. List before the shift: head, unode1, unode2, unode3, unode4.]

After the node removal the list will be in the state shown in the figure below.

[Figure 9. After removeFromList(). List after the removal: head, unode1, unode3, unode4.]

In this implementation of the list of online users, search by user name and by IP address are supported. The search by user name is implemented in the usrFindInListByName function. This function takes as parameters the head of the list and the username of the user. The code for this function is the following.

unode* usrFindInListByName(unode* usrlist, char username[]) {
    unode* cur;
    int i = 0;

    cur = usrlist;
    for (i = 0; i < usrlist->count; i++) {
        if (!strcmp(cur->username, username)) {
            /* User found */
            return cur;
        }
        /* move to the next node, sizeof(unode) bytes forward */
        cur++;
    }
    /* user not found */
    return NULL;
}


The function that performs the search based on the IP address of the user is similar. The difference lies in the search criterion: in the IP based search, the IP address given as an argument is compared to the ipaddr field of the list's unode structures.

A closer look at the above pieces of code reveals that list traversal is not performed by following the next pointers. In fact, knowing that nodes are stored consecutively and that they have a fixed length, we can determine the next node's position by moving the pointer that points to the current node sizeof(unode) bytes forward.

This list implementation resembles a static array of unode structures. It cannot be considered fully dynamic, as its length cannot exceed MAXUSERS members and its nodes are stored consecutively. In the typical linked list implementation, nodes can be physically located anywhere within the address space of the process that has created the list, and list traversal is implemented by following the next pointers of the nodes. In this implementation, though, the typical way of traversing the list is supported too.

2.4 Traffic logging and analysis subsystem implementation

This section describes the traffic logging subsystem's implementation in detail. Issues of design and development will be discussed.

2.4.1 Packet Capturing Systems

In the Linux world there are numerous attempts ([5], [6], [7], [8], [9], [10], [12]) to create traffic logging systems based on packet capturing and filtering. The need for such systems is as old as computer networks. Usually, traffic logging and monitoring systems are used for accounting, for network security and for diagnosing problems in the network's operation.


Some traffic logging systems or architectures are the following:

- SYSLOG: Kernel level logging via the iptables LOG target, whose messages are recorded through syslog. It can log some information in files of a specific format and it is a relatively old way of logging, which records little about network traffic and more about the operating system state.

- ULOG: (Links [10], [12]) Userspace logging via the iptables ULOG target. This provides more functionality than SYSLOG. It can do more refined logging and it is much more flexible. In order to make it work, the administrator of the Linux system has to give the appropriate iptables commands. Then, by running a daemon (the creator of ULOG, Harald Welte, has written such a daemon program, called ulogd), the traffic is filtered according to the filtering criteria specified in the iptables commands and it is logged and analyzed. Ulogd lets the administrator choose the level of packet analysis by loading the appropriate plugins. Also, the administrator can configure ulogd to log data to numerous kinds of output, including MySQL or PostgreSQL databases. Ulogd makes use of netlink sockets for the communication between kernel and userspace. Kernel / userspace switching is the biggest drawback of ulogd, as it suffers from severe packet loss in high speed networks. Furthermore, because it is a logging method that has emerged recently and is not yet widespread, there is not much information about it on the web. Also, its documentation is relatively poor. Finally, ulogd does not offer a straightforward way of examining the packet payload. One has to develop his own interpreter plugin to do further packet analysis.

- Direct kernel level logging using mmap. This works as follows. First, the program maps (via mmap) the network interface on a circular buffer. Then, a loop begins, in which packets are read and analyzed and the extracted information is stored.

- Network Monitoring via SNMP. SNMP (Simple Network Management Protocol) is a protocol used for (relatively) low level monitoring of the traffic of a network interface. It uses MIBs (Management Information Bases) which describe the information to be monitored (e.g. IP traffic volume, open ports, etc.).

- Libpcap based userspace logging. Libpcap is a system-independent interface for user-level packet capturing. It provides a portable framework for low-level network monitoring. This is the base for the packet capturing and analysis daemon described in this document.

2.4.2 Libpcap architecture and principles

Libpcap supports a packet filtering mechanism based on the architecture of the BSD Packet Filter (BPF). BPF is described in the 1993 Winter Usenix paper "The BSD Packet Filter: A New Architecture for User-level Packet Capture" (see reference [9]). The figure below (Figure 10) shows how libpcap works.

The BPF packet filter is a human readable expression which sets the packet filtering criteria. For example, if the BPF filter is "tcp", then only TCP traffic will be captured. The filter can contain more detailed expressions, including other protocols or port numbers (e.g. "tcp port 21"). If no filter is specified, then all traffic will be captured. The BPF filter is then compiled by libpcap into a filter program, which is applied to the intercepted packets.

[Figure 10. Libpcap architecture. Elements shown: Userspace Application, BPF, BPF Driver, Protocol Stack (IP, TCP, ...), Ethernet Device Driver, O.S. Kernel, Packet Copy.]

A program using libpcap generally has to take the following steps:

- First, it has to determine the network interface that is to be watched. The network interface name can be defined as a string (e.g. dev = "eth0") or we can let pcap provide us with the name of an interface. This can be achieved with the pcap_lookupdev function. Its prototype is:

char* pcap_lookupdev(char* errbuf)

- Then, it has to initialize pcap by calling the function pcap_open_live. This is the function where we actually tell pcap which network device is to be sniffed. The pcap_open_live prototype is as follows:

pcap_t *pcap_open_live(char *device, int snaplen,
                       int promisc, int to_ms, char *errbuf)

device: the device to be sniffed
snaplen: maximum number of bytes to capture per packet
promisc: if set to TRUE, the interface is set in promiscuous mode
to_ms: read timeout in milliseconds
errbuf: buffer to store error messages

- The following step is to set the expression to be used for traffic filtering. That is, we have to specify a rule set according to which packets will be filtered. For example, we may want to examine only packets going to port 21 or only TCP packets. The set of rules must be converted to a format that pcap can understand. This task is performed by the function pcap_compile. This function's prototype is as follows:

int pcap_compile(pcap_t *p, struct bpf_program *fp,
                 char *str, int optimize, bpf_u_int32 netmask)

The above function compiles the string str into a filter program, pointed to by fp. fp is a pointer to a bpf_program struct and is filled in by pcap_compile. The next step is to apply the filter. The function pcap_setfilter is responsible for applying the filter.

- Then, we tell pcap to enter its primary execution loop. Every time a new packet is sniffed, a previously defined callback function is called. Packet analysis and data logging take place in this callback function. The call that tells pcap to enter that loop is pcap_loop, one of whose parameters is the callback function described above.

- The final step is to close the pcap session. However, the loop described in the above step runs indefinitely (it stops only in case of an error). The solution implemented in this packet capturing daemon is to perform the session closing (as well as other tasks that are unrelated to pcap, such as freeing global pointers, closing the database handle, etc.) when the program receives the SIGINT or SIGTERM signal. That is, if the program runs in the background and the user/administrator issues a Linux kill %processid command, the process receives the SIGTERM signal, so the above tasks are performed before the program exits.

2.4.3 Libpcap Advantages

One of the most important advantages of libpcap is its high portability. Libpcap programs can be easily ported to most Unix/Linux systems, as well as Windows (the equivalent for Win32 is winpcap).

Libpcap offers a low level mechanism for capturing packets. Programs can get a copy of the packet that was intercepted at the Network Interface Card. Then, after stripping it of its Ethernet, IP and TCP headers (which can be used for accounting purposes), the program can deduce application level protocol information by examining the actual packet's payload. This is harder to achieve using other methods of traffic logging. For example, ULOG/ulogd does not offer packet payload examination with its default modules. One has to develop his own plugins and embed them in the existing ulogd body (ulogd is extensible by plugins for packet interpretation and data output to files and databases). Libpcap's way of doing this is much more straightforward. The same problem is encountered when using SNMP: it was not designed for detailed application-level monitoring.
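To illustrate the header stripping mentioned above, the following sketch locates the TCP payload inside a captured frame. It is not taken from the daemon's sources: the function name tcp_payload is hypothetical, and the code assumes an Ethernet II frame without a VLAN tag carrying IPv4.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN     14       /* Ethernet II header, no VLAN tag */
#define ETHERTYPE_IPV4  0x0800
#define IP_PROTO_TCP    6

/*
 * Returns a pointer to the TCP payload inside an Ethernet II / IPv4 frame
 * and stores its length in *payload_len. Returns NULL if the frame is not
 * TCP over IPv4 or is too short.
 */
const uint8_t *tcp_payload(const uint8_t *frame, size_t len,
                           size_t *payload_len)
{
    const uint8_t *ip;
    const uint8_t *tcp;
    size_t ip_hlen, tcp_hlen, ip_total;

    if (!frame || !payload_len || len < ETH_HDR_LEN + 20) {
        return NULL;
    }
    if (((frame[12] << 8) | frame[13]) != ETHERTYPE_IPV4) {
        return NULL;
    }
    ip = frame + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4) {
        return NULL;                          /* not IP version 4 */
    }
    ip_hlen = (size_t)(ip[0] & 0x0F) * 4;     /* header length in bytes */
    if (ip_hlen < 20 || ip[9] != IP_PROTO_TCP) {
        return NULL;
    }
    ip_total = ((size_t)ip[2] << 8) | ip[3];  /* IP total length field */
    if (ip_total < ip_hlen + 20 || ETH_HDR_LEN + ip_total > len) {
        return NULL;
    }
    tcp = ip + ip_hlen;
    tcp_hlen = (size_t)(tcp[12] >> 4) * 4;    /* TCP data offset in bytes */
    if (tcp_hlen < 20 || ip_hlen + tcp_hlen > ip_total) {
        return NULL;
    }
    *payload_len = ip_total - ip_hlen - tcp_hlen;
    return tcp + tcp_hlen;
}
```

The pointer returned here is where an application level parser (for HTTP, FTP, SMTP or POP3) would start reading, while the IP total length is the kind of value that can feed the per-user volume counters.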

Furthermore, libpcap tends to be a standard and well-tested way of packet capturing. There are lots of traffic logging, accounting and monitoring applications which are based on libpcap and are widely used. The most well known are probably tcpdump and Ethereal. Other similar applications are ntop, Snort, etc. The fact that there are both Microsoft Windows and UNIX/Linux versions of most of the above products is further evidence of libpcap's portability.


2.4.4 Libpcap Performance Issues

Libpcap applications run at user level. As a result, data has to be exchanged with the kernel, where the packet is intercepted. This can prove quite costly. Libpcap data exchange between the kernel and user applications is carried out via system calls, which are time consuming. Another serious cause of delay is the fact that the packet has to be copied multiple times from the moment it is actually intercepted until the moment a copy of it has been delivered to the userland packet capturing application. These delays result in packet loss. Packet loss, in turn, results in lost statistics or erroneous information about the network traffic.

The problem of packet loss is more severe in high speed networks. However, as far as an AD of the P2PWNC is concerned, the network interface whose traffic has to be logged is the other end of an 802.11b access point, which has a relatively low bitrate compared to wired networks. In particular, 802.11b was designed for 11 Mbit/sec, although the actual bitrate offered is usually about half of that (or less). Taking into consideration that this bitrate will be divided among a number of users (according to the traffic shaping / bandwidth control policy) and that probably not all of it will be in use, the problems our libpcap-based packet capturing daemon will face will be less important.

2.4.5 Data Storage

The medium selected for the storage of the traffic statistics is a MySQL database. There are numerous reasons that support the choice of this database system. First of all, it is free for non-commercial use. Second, it is a well tested database system, with known efficiency and speed. Third, a C API is available, which is very well documented. Also, compatibility with other modules of the AD (e.g. the authentication module) made the use of MySQL preferable.

2.4.5.1 Available Information

The aim of the traffic logging subsystem was to provide the system with network traffic statistics of the P2PWNC registered users. The information that is of significance to the system is IP traffic and application-level protocol information per user. The following table lists the statistics that are logged, grouped by the network protocol they refer to.

Network Protocol    Available Statistics

IP                  Total IP Upload Volume (Bytes)
                    Total IP Download Volume (Bytes)
                    Total HTTP/HTTPS Upload Volume (Bytes)
                    Total HTTP/HTTPS Download Volume (Bytes)
                    Total FTP Upload Volume (Bytes)
                    Total FTP Download Volume (Bytes)
                    Total SMTP Upload Volume (Bytes)
                    Total SMTP Download Volume (Bytes)
                    Total POP3 Upload Volume (Bytes)
                    Total POP3 Download Volume (Bytes)
                    Total TELNET Upload Volume (Bytes)
                    Total TELNET Download Volume (Bytes)
                    Total SSH Upload Volume (Bytes)
                    Total SSH Download Volume (Bytes)

FTP                 FTP Host the User Connected
                    FTP User Account (FTP User Name)
                    FTP Account Password
                    FTP Connection Count (to the above host using the specified user account)

HTTP                HTTP Request Method
                    HTTP Request Host
                    HTTP Request URI
                    HTTP Request User Agent

SMTP                SMTP Sender
                    SMTP Receiver
                    SMTP Subject

POP3                POP3 Server
                    POP3 Account (user name)
                    POP3 Password (for the above account)
                    Using APOP (true if the user uses the APOP authentication scheme)
                    POP3 Connection Count (to the above server using the above user name)
                    POP3 Message Subject
                    POP3 Message Sender
                    POP3 Message Length
                    POP3 Message ID
                    Times the above POP3 message has been retrieved by the user
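One simple way to fill the per-protocol volume counters of the IP group is to classify each packet by the well-known server port it uses. The sketch below is only an illustration of that idea: whether the daemon classifies traffic exactly this way is an assumption, and the names app_proto, user_counters, proto_from_port and account_packet are hypothetical.

```c
#include <stdint.h>

enum app_proto {
    P_OTHER = 0, P_HTTP, P_FTP, P_SMTP, P_POP3, P_TELNET, P_SSH, P_COUNT
};

/* Hypothetical in-memory counters for one user, all in bytes. */
struct user_counters {
    uint64_t total_ul;
    uint64_t total_dl;
    uint64_t proto_ul[P_COUNT];
    uint64_t proto_dl[P_COUNT];
};

/* Maps a well-known server port to one of the logged protocols. */
enum app_proto proto_from_port(uint16_t port)
{
    switch (port) {
    case 80:
    case 443: return P_HTTP;    /* HTTP / HTTPS */
    case 20:
    case 21:  return P_FTP;     /* FTP data / control */
    case 25:  return P_SMTP;
    case 110: return P_POP3;
    case 23:  return P_TELNET;
    case 22:  return P_SSH;
    default:  return P_OTHER;
    }
}

/*
 * Adds one packet of ip_len bytes to the counters. server_port is the port
 * on the remote side; upload is non-zero when the user sent the packet.
 */
void account_packet(struct user_counters *c, uint16_t server_port,
                    uint32_t ip_len, int upload)
{
    enum app_proto p = proto_from_port(server_port);

    if (upload) {
        c->total_ul += ip_len;
        c->proto_ul[p] += ip_len;
    } else {
        c->total_dl += ip_len;
        c->proto_dl[p] += ip_len;
    }
}
```

Port-based classification is an approximation. For example, passive mode FTP data connections use arbitrary ports and would be counted under P_OTHER by this sketch.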

2.4.5.2 MySQL Database Schema

The schema of the database that stores the above information is the following:

#

# `admin` table structure

#

CREATE TABLE admin (

adm_username varchar(80) NOT NULL default '',

adm_pass varchar(80) NOT NULL default '',

adm_logged_in enum('y','n') NOT NULL default 'n',

adm_real_name varchar(80) NOT NULL default '',

adm_ipaddr varchar(30) NOT NULL default '',

adm_last_login date NOT NULL default '0-0-0-0',

PRIMARY KEY (adm_username)

) TYPE=MyISAM;

# --------------------------------------------------------

#

# `ftp` table structure

#

CREATE TABLE ftp (

f_username varchar(80) NOT NULL default '',

f_ftp_host varchar(80) NOT NULL default '',

f_ftp_user_name varchar(80) NOT NULL default '',

f_ftp_pass varchar(30) NOT NULL default '',

f_ftp_c_count int(11) NOT NULL default '0',

PRIMARY KEY (f_username,f_ftp_host,f_ftp_user_name)

) TYPE=MyISAM;

# --------------------------------------------------------

#

# `http` table structure

#

CREATE TABLE http (

h_id bigint(20) NOT NULL auto_increment,

h_username varchar(255) default NULL,

h_host varchar(255) default NULL,

h_method int(11) default NULL,

h_uri varchar(255) default NULL,

h_user_agent varchar(255) default NULL,

PRIMARY KEY (h_id)

) TYPE=MyISAM;

# --------------------------------------------------------

#

# `pop3` table structure

#

CREATE TABLE pop3 (

p_id varchar(80) NOT NULL default '',

p_username varchar(80) NOT NULL default '',

p_pop3_srv varchar(40) NOT NULL default '',

p_pop3_user varchar(40) NOT NULL default '',

p_sender varchar(255) NOT NULL default '',

p_msg_subject varchar(255) NOT NULL default '',

p_date varchar(255) NOT NULL default '',

p_msg_length bigint(20) NOT NULL default '0',

p_times_retrieved int(11) NOT NULL default '0',

PRIMARY KEY (p_id,p_username,p_pop3_user)

) TYPE=MyISAM;

# --------------------------------------------------------

#

# `pop3_users` table structure

#

CREATE TABLE pop3_users (

pu_username varchar(80) NOT NULL default '',

pu_pop3_srv varchar(80) NOT NULL default '',

pu_pop3_username varchar(80) NOT NULL default '',

pu_pop3_pass varchar(40) NOT NULL default '',

pu_using_apop tinyint(4) NOT NULL default '0',

pu_pop3_conn_count bigint(20) NOT NULL default '0',

PRIMARY KEY (pu_username,pu_pop3_srv,pu_pop3_username)

) TYPE=MyISAM;


# --------------------------------------------------------

#

# `smtp` table structure

#

CREATE TABLE smtp (
sm_username varchar(255) NOT NULL default '',

sm_smtp_from varchar(255) NOT NULL default '',

sm_smtp_to varchar(255) NOT NULL default '',

sm_subject varchar(255) NOT NULL default ''

) TYPE=MyISAM;

# --------------------------------------------------------

#

# `user_stats` table structure

#

CREATE TABLE user_stats (
ust_username varchar(80) NOT NULL default '',

ust_real_name varchar(80) NOT NULL default '',

ust_domain varchar(80) NOT NULL default '',

ust_online enum('y','n') NOT NULL default 'y',

ust_priv enum('y','n') NOT NULL default 'y',

ust_total_ul bigint(20) NOT NULL default '0',

ust_total_dl bigint(20) NOT NULL default '0',

ust_total_http_ul bigint(20) NOT NULL default '0',

ust_total_http_dl bigint(20) NOT NULL default '0',

ust_total_ftp_ul bigint(20) NOT NULL default '0',

ust_total_ftp_dl bigint(20) NOT NULL default '0',

ust_total_smtp_ul bigint(20) NOT NULL default '0',

ust_total_smtp_dl bigint(20) NOT NULL default '0',

ust_total_telnet_ul bigint(20) NOT NULL default '0',

ust_total_telnet_dl bigint(20) NOT NULL default '0',

ust_total_pop3_ul bigint(20) NOT NULL default '0',

ust_total_pop3_dl bigint(20) NOT NULL default '0',

ust_total_ssh_ul bigint(20) NOT NULL default '0',

ust_total_ssh_dl bigint(20) NOT NULL default '0',

PRIMARY KEY (ust_username)

) TYPE=MyISAM;

# --------------------------------------------------------

#

# `users` table structure

#

CREATE TABLE users (

u_username varchar(255) NOT NULL default '',

u_real_name varchar(255) NOT NULL default '',

u_ip_addr varchar(255) NOT NULL default '',

u_mac_addr varchar(255) NOT NULL default '',

u_online enum('y','n') NOT NULL default 'n',

PRIMARY KEY (u_username)

) TYPE=MyISAM;

Running this MySQL script creates the database where the traffic logging subsystem stores the information it extracts from the captured packets. A brief description of the tables of the above database follows.


- Table ftp

In this table the system stores information about the FTP connections the user has made.

f_username: The P2PWNC user name of the user
f_ftp_host: The FTP host the user connects to
f_ftp_user_name: The user name used to connect to the FTP host
f_ftp_pass: The most recent password for this FTP account that has been captured by the system
f_ftp_c_count: The number of connections the user has made to the specified host using the f_ftp_user_name account name

The primary key of this table is the triplet (f_username, f_ftp_host, f_ftp_user_name). Every record of this table represents the connections a user makes to a specific FTP account.
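As an illustration of where values such as f_ftp_user_name and f_ftp_pass can come from: FTP clients send their credentials in clear text on the control connection, using the USER and PASS commands. The sketch below extracts the argument of such a command from a packet payload. The function name ftp_command_arg and its signature are assumptions made for this example and are not taken from the daemon's source; the match is case-sensitive for brevity.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the argument of an FTP control command such as "USER alice\r\n"
 * into out (always NUL-terminated on success). verb is the command name,
 * e.g. "USER" or "PASS". The payload need not be NUL-terminated; len is
 * its length in bytes. Returns 1 if the payload starts with the command,
 * 0 otherwise.
 */
int ftp_command_arg(const char *payload, size_t len, const char *verb,
                    char *out, size_t out_size)
{
    size_t vlen = strlen(verb);
    size_t i, n = 0;

    if (out_size == 0 || len < vlen + 1)
        return 0;
    if (strncmp(payload, verb, vlen) != 0 || payload[vlen] != ' ')
        return 0;

    /* Copy up to the end of the line, truncating if out is too small. */
    for (i = vlen + 1; i < len && payload[i] != '\r' && payload[i] != '\n'; i++) {
        if (n + 1 < out_size)
            out[n++] = payload[i];
    }
    out[n] = '\0';
    return 1;
}
```

Calling ftp_command_arg(payload, len, "USER", buf, sizeof buf) on a control-connection payload would then yield the account name that is stored in f_ftp_user_name, and the same call with "PASS" would yield the password.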

- Table http

In this table information about the HTTP requests a user has made is stored.

h_id: An auto-increment field that identifies the table records
h_username: The P2PWNC user name
h_host: The host to which the user has made this HTTP request
h_method: The HTTP request method
h_uri: The URI of the request
h_user_agent: The user agent that was used to issue the HTTP request

The primary key of this table is the h_id field. Each record of this table represents a single HTTP request. The field h_method stands for the request method (HEAD, GET, POST, etc.). It takes integer values that stand for method types. The system can deal only with the GET, POST and HEAD request types. The method codes are defined in the file httplist.h.
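The mapping from a request line to an integer method code could look roughly like the sketch below. The enum values and the function name http_method_code are hypothetical and chosen only for illustration; the actual codes are the ones defined in httplist.h, which is not reproduced here.

```c
#include <string.h>

/* Hypothetical method codes; the real values are defined in httplist.h. */
enum http_method {
    HTTP_METHOD_UNKNOWN = 0,
    HTTP_METHOD_GET     = 1,
    HTTP_METHOD_POST    = 2,
    HTTP_METHOD_HEAD    = 3
};

/*
 * Map the start of a NUL-terminated HTTP request line
 * (e.g. "GET /index.html HTTP/1.1") to a method code.
 */
enum http_method http_method_code(const char *request_line)
{
    if (strncmp(request_line, "GET ", 4) == 0)
        return HTTP_METHOD_GET;
    if (strncmp(request_line, "POST ", 5) == 0)
        return HTTP_METHOD_POST;
    if (strncmp(request_line, "HEAD ", 5) == 0)
        return HTTP_METHOD_HEAD;
    return HTTP_METHOD_UNKNOWN;   /* any other method is not handled */
}
```

Storing a small integer rather than the method string keeps the h_method column compact, at the cost of needing the code table to interpret a record.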

- Table pop3_users

This table stores data about the POP3 connections users make to specific POP3 accounts.

pu_username: The P2PWNC user name
pu_pop3_srv: The POP3 server the user has connected to
pu_pop3_username: The username of the POP3 account
pu_pop3_pass: The most recent password for this account that has been captured
pu_using_apop: Set to 1 if the user is applying the APOP authentication method for this account
pu_pop3_conn_count: Number of connections this P2PWNC user has made to this account

The primary key of this table is the triplet (pu_username, pu_pop3_srv, pu_pop3_username). Every record of this table represents the connections a user makes to a specific POP3 account.

- Table pop3

This is the second database table that deals with the POP3 protocol. Whereas the

previous table includes information about POP3 accounts and connections, this table

stores information about the messages themselves.

p_id: The message ID of the e-mail message, as found in the mail header
p_username: The P2PWNC user name
p_pop3_srv: The POP3 server the user has connected to (same as in the pop3_users table)
p_pop3_user: The username of the POP3 account (same as in the pop3_users table)
p_sender: The sender of the e-mail
p_msg_subject: The subject of the e-mail
p_date: The date the mail was sent (as found in the mail header)
p_msg_length: Message length, if available. If it is not available, the value 1 is stored in the database
p_times_retrieved: How many times this message has been retrieved by the P2PWNC user

According to the table definition above, the primary key of this table consists of the fields p_id, p_username and p_pop3_user. This means that every record of this table refers to a specific message (with message ID p_id), as retrieved by the P2PWNC user p_username through the POP3 account p_pop3_user.

- Table smtp

This table holds information about mail messages sent via SMTP.

sm_username: P2PWNC user name

sm_smtp_from: Sender of the mail (e-mail address)

sm_smtp_to: Receiver of the mail (e-mail address)

sm_subject: Mail subject (if available)

- Table users

This table stores user identification information and is supposed to be updated by the authentication module. In particular, each time a user logs in to the system, the authentication module is supposed to update the record that refers to that user with the newly assigned dynamic IP address and the user's MAC address, and to set the field u_online to 'y'. Also, when the packet capturing daemon starts up, it executes the following SQL statement:

SELECT u_username, u_ip_addr FROM users WHERE u_online='y'

as described in a previous chapter, to determine which users are already logged in to the system.

u_username: The P2PWNC user name
u_real_name: The user's real name
u_ip_addr: The IP address the user has been assigned. This value is of significance mainly when the user is online
u_mac_addr: The MAC address of the user's NIC
u_online: A flag indicating the user's state. It has the value 'y' when the user is online. Otherwise it has the value 'n'.

The primary key of this table is the user's P2PWNC user name (u_username field).

- Table user_stats

This table, apart from some user identification information, includes the aggregate IP traffic statistics. That is, it stores the volume of the user's traffic by protocol.

ust_username: P2PWNC username
ust_real_name: User's real name
ust_domain: Administrative Domain the user is registered to
ust_online: Flag indicating whether the user is online
ust_priv: Flag indicating whether the user has extra privileges. This field is not used by the system and exists only for possible future use.
ust_total_ul: Total volume of the uploaded IP traffic the user has caused
ust_total_dl: Total volume of the downloaded IP traffic of the user
ust_total_http_ul: Total volume of the user's HTTP uploads
ust_total_http_dl: Total volume of the user's HTTP downloads
ust_total_ftp_ul: Total volume of the user's FTP uploads
ust_total_ftp_dl: Total volume of the user's FTP downloads
ust_total_smtp_ul: Total volume of the user's SMTP uploads
ust_total_smtp_dl: Total volume of the user's SMTP downloads
ust_total_telnet_ul: Total volume of the user's TELNET uploads
ust_total_telnet_dl: Total volume of the user's TELNET downloads
ust_total_pop3_ul: Total volume of the user's POP3 uploads
ust_total_pop3_dl: Total volume of the user's POP3 downloads
ust_total_ssh_ul: Total volume of the user's SSH uploads
ust_total_ssh_dl: Total volume of the user's SSH downloads

Just like the users table, the primary key of this table is the P2PWNC username (ust_username). The volume of traffic is always calculated in bytes. As one can see, the primary keys of the tables users and user_stats have the same semantics, which means that the two tables could be merged. However, for reasons of compatibility with the other modules of an AD and of independence between the modules, having two separate tables was preferred. As described before, it is the duty of the authentication module to update the database with the user's state (online / offline) and identification information. However, the authentication module does not deal with traffic logging issues, and the traffic logging module only reads identification and user state information. That is why the users table is bound to the authentication module, while the user_stats table is bound to the traffic logging and accounting module.
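The per-protocol byte counters of user_stats can be pictured as an in-memory structure that is incremented for every packet attributed to a user. The following is only a sketch under stated assumptions: the type names, the enum proto values and the function traffic_add are invented for this example and do not come from the daemon's source.

```c
#include <stdint.h>

/* Protocols that have their own pair of columns in user_stats. */
enum proto {
    PROTO_HTTP, PROTO_FTP, PROTO_SMTP, PROTO_TELNET, PROTO_POP3, PROTO_SSH,
    PROTO_OTHER               /* any other IP traffic: counted in totals only */
};

/* Hypothetical in-memory mirror of the counters of one user_stats record. */
struct user_traffic {
    uint64_t total_ul;            /* ust_total_ul */
    uint64_t total_dl;            /* ust_total_dl */
    uint64_t ul[PROTO_OTHER];     /* ust_total_<protocol>_ul */
    uint64_t dl[PROTO_OTHER];     /* ust_total_<protocol>_dl */
};

/* Add the size of one packet (in bytes) to the matching counters. */
void traffic_add(struct user_traffic *t, enum proto p, int is_upload,
                 uint64_t bytes)
{
    if (is_upload) {
        t->total_ul += bytes;
        if (p < PROTO_OTHER)
            t->ul[p] += bytes;
    } else {
        t->total_dl += bytes;
        if (p < PROTO_OTHER)
            t->dl[p] += bytes;
    }
}
```

Keeping the totals separate from the per-protocol counters matches the table layout, where ust_total_ul and ust_total_dl cover all of the user's IP traffic and not only the six tracked protocols.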

- Table admin

This table holds information useful for the traffic statistics server, which is described in detail in the third chapter of the document.

adm_username: Administrator's username
adm_pass: Administrator's password
adm_logged_in: A flag indicating whether the administrator is logged in using a client program
adm_real_name: The administrator's real name
adm_ipaddr: The current IP address of the client the administrator is using
adm_last_login: The date of the last time the administrator logged in

This table has nothing to do with the packet capturing daemon program. The data it stores only concern the statistics exchange server and client programs, which will be discussed in Chapter 3. One thing that should be mentioned now is that the primary key of this table is the field adm_username. This implies that more than one person can have administrative rights on the statistics server / client.

As a remark on the database scheme, it should be mentioned that the table fields are named after the table they belong to. Specifically, each field name starts with an abbreviation of the table name, followed by an underscore (for example, the fields of the ftp table start with f_ and those of the user_stats table start with ust_).


2.5 The Packet Capturing and Analysis Daemon

2.5.1 Packet Capturing Daemon Architecture

Figure 11 demonstrates the operations that are performed on a captured packet. It can be considered a flow diagram of the callback function that the libpcap-based daemon calls every time a packet is captured. These operations will now be described in detail.
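The TCP port check of Figure 11 can be sketched as follows. The function names are hypothetical, the ports are expected in host byte order, and the rule used to pick the port (destination port for packets sent by the user, source port for packets received by the user) is an assumption that the well-known port identifies the server side of the connection.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Name of the application protocol for a well-known TCP port, following
 * the port list of Figure 11, or NULL if the port is not tracked.
 */
const char *tcp_port_protocol(uint16_t port)
{
    switch (port) {
    case 20: case 21:  return "FTP";
    case 22:           return "SSH";
    case 23:           return "TELNET";
    case 25:           return "SMTP";
    case 80: case 443: return "HTTP";   /* HTTP and HTTPS are grouped in Figure 11 */
    case 110:          return "POP3";
    default:           return NULL;
    }
}

/*
 * Classify a TCP segment: a packet sent by the user is matched on its
 * destination port, a packet received by the user on its source port.
 */
const char *classify_segment(int sent_by_user, uint16_t src_port,
                             uint16_t dst_port)
{
    return tcp_port_protocol(sent_by_user ? dst_port : src_port);
}
```

For example, a segment sent by a user from an ephemeral port to port 110 would be classified as POP3, while a segment to an untracked port yields NULL and would only count towards the user's total traffic.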

2.5.1.1 Protocol Header Stripping

As shown in Figure 11, the first step is to strip the packet of its Ethernet, IP and TCP headers. The remainder is the payload of the packet (the data that is useful for application-level statistics extraction). The code treats each header as having a fixed length, equal to the size of the corresponding header structure, which makes extraction easier; this holds for IP and TCP headers that carry no options. The following code illustrates this.

const struct ethhdr *ethernet; /* The Ethernet header */
const struct iphdr *ip;        /* The IP header */
const struct tcphdr *tcp;      /* The TCP header */
const char *payload;           /* Packet payload */

int size_ethernet = sizeof(struct ethhdr);
int size_ip = sizeof(struct iphdr);
int size_tcp = sizeof(struct tcphdr);

/* Stripping headers (assumes IP and TCP headers without options) */
ethernet = (struct ethhdr *)(packet);
ip = (struct iphdr *)(packet + size_ethernet);
tcp = (struct tcphdr *)(packet + size_ethernet + size_ip);
payload = (const char *)(packet + size_ethernet + size_ip + size_tcp);

[Figure 11. Packet Capturing Daemon Architecture. Flow diagram: a captured packet is stripped of its Ethernet, IP and TCP headers; the packet's IP addresses are checked against the online users list in shared memory; a packet that does not belong to an online user is ignored; for a packet sent or received by a user, the TCP source or destination port is checked (20/21 FTP, 22 SSH, 23 TELNET, 25 SMTP, 80/443 HTTP/HTTPS, 110 POP3); the resulting protocol statistics are stored in the DB.]


packet is a pointer to the packet captured (which is handled as an unsigned char

array). The definitions of the Ethernet, IP and TCP headers are located in the

linux/if_ether.h, netinet/ip.h and netinet/tcp.h files.

A brief explanation of the above method of extracting packet headers follows.

As mentioned, the packet is handled as an array of unsigned chars. Packet manipulation takes place in the body of the callback function that is passed as a parameter to pcap_loop. In this program, the call to pcap_loop has the following format:

pcap_loop(handle, 0, (pcap_handler)updateUserStats, NULL);

handle refers to the pcap handle acquired by the call to pcap_open_live (the pcap session opening function).

updateUserStats is our packet handling callback function. Such a callback function

has a specified prototype. In this case, the definition of updateUserStats is as follows:

void updateUserStats(u_char *args, const struct pcap_pkthdr *header,

const u_char *packet)

What the packet capturing daemon is most interested in is the packet parameter. This parameter is an unsigned char array containing all of the sniffed packet data. It is a collection of other structures (protocol headers and packet payload) rather than a string. In fact, it is the serialised version of these structures.

In order to strip the packet of its headers, the program has to perform some type casts. As mentioned before, packet is a pointer to the start of the packet. Making use of the fact that the Ethernet, IP and TCP header structures defined in linux/if_ether.h, netinet/ip.h and netinet/tcp.h are of fixed size, we can acquire pointers to the start of the Ethernet, IP and TCP headers. First, we get a pointer to the beginning of the Ethernet header, which coincides with the start of the captured packet. This pointer is cast to a pointer to struct ethhdr:

ethernet = (struct ethhdr*)(packet);

The IP header starts immediately after the end of the Ethernet header. Therefore, the IP header begins exactly sizeof(struct ethhdr) bytes after the start of the Ethernet packet. In a similar way, we can calculate the position of the TCP header (for a TCP/IP packet) by adding the sizes of the Ethernet and IP header structures (in bytes) to the packet start pointer and casting the result to a pointer to struct tcphdr. Finally, we can find the exact position in the packet where the payload data start in the same way and cast that pointer to const char *, the type of payload.

ip = (struct iphdr *)(packet + sizeof(struct ethhdr));
tcp = (struct tcphdr *)(packet + sizeof(struct ethhdr) + sizeof(struct iphdr));
payload = (const char *)(packet + sizeof(struct ethhdr) + sizeof(struct iphdr) + sizeof(struct tcphdr));

The next figure demonstrates the format of an Ethernet packet and how one can

perform the above steps to get the protocol headers out of the captured packet.
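The fixed offsets used above are exact only when neither the IP nor the TCP header carries options. As a hedged alternative, the sketch below computes the payload offset from the length fields stored in the headers themselves: the IP header length (the ihl field of struct iphdr) and the TCP data offset (the doff field of struct tcphdr), both expressed in 32-bit words. It works on raw bytes so that it does not depend on platform headers; the function name tcp_payload_offset and the constant ETH_HEADER_LEN are assumptions made for this example, and untagged Ethernet II frames carrying IPv4 are assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14   /* untagged Ethernet II header */

/*
 * Return the offset of the TCP payload inside a captured Ethernet frame,
 * or -1 if the frame is too short or does not carry IPv4/TCP.
 * caplen is the number of captured bytes available in frame.
 */
long tcp_payload_offset(const uint8_t *frame, size_t caplen)
{
    size_t ip_off = ETH_HEADER_LEN;
    size_t ip_len, tcp_off, tcp_len;

    if (caplen < ip_off + 20)                      /* minimal IPv4 header */
        return -1;
    if (frame[12] != 0x08 || frame[13] != 0x00)    /* EtherType 0x0800 = IPv4 */
        return -1;
    if ((frame[ip_off] >> 4) != 4)                 /* IP version */
        return -1;

    ip_len = (size_t)(frame[ip_off] & 0x0F) * 4;   /* IHL, in 32-bit words */
    if (ip_len < 20 || frame[ip_off + 9] != 6)     /* protocol 6 = TCP */
        return -1;

    tcp_off = ip_off + ip_len;
    if (caplen < tcp_off + 20)                     /* minimal TCP header */
        return -1;

    tcp_len = (size_t)(frame[tcp_off + 12] >> 4) * 4;   /* data offset */
    if (tcp_len < 20 || caplen < tcp_off + tcp_len)
        return -1;

    return (long)(tcp_off + tcp_len);
}
```

With option-free headers this returns 54, the same offset as the sizeof-based calculation; when options are present, the payload starts later by the size of those options.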

2.5.1.2 User Packet Matching

After the protocol headers have been extracted, the system has to determine whether the packet was sent or received by a registered online user. This is achieved by checking the packet's source and destination IP addresses.
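As a rough illustration of this check, the sketch below looks up an IP address in a simple linked list of online users. The struct layout and the function name find_user_by_ip are assumptions made for this example; the daemon's own unode type and its usrFindInListByIp function are the ones referred to in the description that follows.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list node; the daemon's real unode type is not shown here. */
struct user_node {
    char username[81];          /* P2PWNC user name */
    char ip_addr[16];           /* dotted-quad IPv4 address */
    struct user_node *next;
};

/*
 * Return the online user whose IP address equals ip, or NULL if no user
 * in the list has that address.
 */
const struct user_node *find_user_by_ip(const struct user_node *head,
                                        const char *ip)
{
    for (; head != NULL; head = head->next) {
        if (strcmp(head->ip_addr, ip) == 0)
            return head;
    }
    return NULL;
}
```

The lookup is performed twice per packet, once with the source address and once with the destination address; a NULL result for both means the packet does not belong to any online user.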

The matching between users and packets is carried out as follows:

First, pointers to the unode structures that will point to the sender and the receiver of

the intercepted packet (if they can be found in the online users list) must be declared.

unode usrsnd; /* packet sender */
unode usrrcv; /* packet receiver */

Then, given the source and destination IP addresses of the packet, the program searches the online users list (which is in the shared memory segment) to find out whether either address corresponds to the IP address of an online user. If such users are located in the list, the packet capturing program goes on to log statistics of the traffic they cause. If no such user is found, then both the usrsnd and usrrcv pointers have a NULL value. The calls to the function that searches the users list are the following.

usrsnd = usrFindInListByIp(usrListHead, (char *)inet_ntoa(ip->ip_src.s_addr));

usrrcv = usrFindInListByIp(usrListHead,

[Figure: layout of a captured packet: Ethernet header (sizeof(struct ethhdr)), IP header (sizeof(struct iphdr)), TCP header (sizeof(struct tcphdr)), packet payload]
