5.3. SteelScript NetShark Tutorial¶
This tutorial will show you how to interface with a NetShark using SteelScript for Python. This tutorial assumes a basic understanding of Python (if not, see the Beginner’s Guide to Python). In addition, you should be somewhat familiar with NetShark and packet analysis via Packet Analyzer (formerly Pilot). If you have never used NetShark and Packet Analyzer, it may be helpful to first read Background on the NetShark architecture.
The tutorial has been organized so you can follow it sequentially. Throughout the examples, you will be expected to fill in details specific to your environment. These will be called out using a dollar sign $<name> – for example, $host indicates that you should fill in the host name or IP address of a NetShark appliance.
Whenever you see >>>, this indicates an interactive session using the Python shell. The command that you are expected to type follows the >>>, and the result of the command follows it. Any lines starting with a # are comments that describe what is happening. In many cases the exact output will depend on your environment, so it may not match precisely what you see in this tutorial.
5.3.1. NetShark Objects¶
As with any Python code, the first step is to import the module(s) we intend to use. The SteelScript code for working with NetShark appliances resides in a module called steelscript.netshark.core. The main class in this module is NetShark. This object represents a connection to a NetShark appliance.
To start, launch Python from the shell or command line:
$ python
Python 2.7.3 (default, Apr 19 2012, 00:55:09)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Once in the python shell, let’s create a NetShark object:
>>> from steelscript.netshark.core import NetShark
>>> from steelscript.common import UserAuth
>>> sk = NetShark('$host', auth=UserAuth('$username', '$password'))
The first argument is the hostname or IP address of the NetShark appliance. The second argument is a named parameter and identifies the authentication method to use – in this case, simple username/password is used.
As soon as the NetShark object is created, a connection is established to the NetShark appliance, and the authentication credentials are validated. If the username and password are not correct, you will immediately see an exception.
The sk object is the basis for all communication with the NetShark appliance, whether that is running views, checking configuration, or simply retrieving information about the appliance. Let's take a look at some basic information about the NetShark appliance that we just connected to:
>>> info = sk.get_serverinfo()
>>> info['version']
'10.0'
# Returns the uptime in nanoseconds
>>> info['uptime']
1747209615759000
# Returns the local_time in nanoseconds since Jan 1, 1970
>>> info['local_time']
1355803849818051000
# Utility function to convert this time into a datetime object...
>>> from steelscript.common.timeutils import nsec_string_to_datetime
# Now let's see what that time really is...
>>> str(nsec_string_to_datetime(info['local_time']))
'2012-12-18 04:10:49.818051+00:00'
# Let's see the entire info structure
>>> info
{'build_time': 'Nov 27 2012 16:10:50',
'build_type': 'Final',
'build_variant': 'TurboCap',
'has_webui': True,
'hostname': 'netshark',
'internal_version': '10.0.1005.0241',
'local_time': 1355803849818051000,
'protocol_version': '4.0',
'start_time': 1354056640202292000,
'system_type': 'Linux',
'timezone': '-0800',
'uptime': 1747209615759000,
'version': '10.0',
'view_version': '3.5',
'webui_SSL': True,
'webui_port': 443,
'webui_root_path': '/'}
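The nanosecond fields above can also be interpreted with nothing but the standard library. This is an illustrative sketch using the sample values shown (not part of the SteelScript API, which provides nsec_string_to_datetime for this purpose):

```python
from datetime import datetime, timedelta

# Sample values from the serverinfo structure above
local_time_ns = 1355803849818051000   # nanoseconds since Jan 1, 1970
uptime_ns = 1747209615759000          # uptime in nanoseconds

# utcfromtimestamp() expects seconds, so divide the nanosecond values
local_time = datetime.utcfromtimestamp(local_time_ns / 1e9)
uptime = timedelta(seconds=uptime_ns / 1e9)

print(local_time.strftime("%Y-%m-%d %H:%M:%S"))   # 2012-12-18 04:10:49
print("up {0} days".format(uptime.days))          # up 20 days
```

This agrees with the nsec_string_to_datetime result shown earlier, and shows that the uptime value corresponds to roughly 20 days.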
Before moving on, exit the python interactive shell:
>>> [Ctrl-D]
$
5.3.2. Views¶
Let’s create our first script. We’re going to write a simple script that creates and applies a view on a trace file uploaded to our NetShark appliance.
This script will use packets in a pcap file. To start, download a copy of tutorial.pcap and save it in a new directory. In the same directory as the pcap file, create a file called view.py and insert the following code:
import pprint
from steelscript.netshark.core import NetShark
from steelscript.common import UserAuth
from steelscript.netshark.core import Value, Key

# Fill these in with appropriate values
host = '$host'
username = '$username'
password = '$password'

# Open a connection to the appliance and authenticate
sk = NetShark(host, auth=UserAuth(username, password))

# First, upload our pcap file, if it's not already present
if not sk.exists('/{0}/tutorial.pcap'.format(username)):
    homedir = sk.get_dir(username)
    tracefile = homedir.upload_trace_file('tutorial.pcap', 'tutorial.pcap')
else:
    # If it's already there, just grab a handle to it
    tracefile = sk.get_file('/{0}/tutorial.pcap'.format(username))

# Now create a view on this tracefile. Start by selecting the columns of interest
columns = [Key(sk.columns.ip.address),
           Value(sk.columns.generic.packets),
           Value(sk.columns.generic.bytes)]

view = sk.create_view(tracefile, columns, name="tutorial view")

# Retrieve the data
data = view.get_data(aggregated=True)

# Close the view
view.close()

# Print the output to the screen
printer = pprint.PrettyPrinter(2)
printer.pprint(data)
Be sure to fill in appropriate values for $host, $username and $password. Run this script as follows and you should see something like the following:
$ python view.py
[ { 'p': 388,
    't': datetime.datetime(2012, 12, 18, 12, 41, 33, 808202, tzinfo=tzutc()),
    'vals': [ ['11.1.1.90', 384, 255208],
              ['173.194.75.106', 13, 1025],
              ['216.34.181.45', 48, 39486],
              ['184.31.179.172', 263, 195713],
              ['74.125.226.220', 10, 1446],
              ['208.70.199.49', 16, 2960],
              ['23.66.231.51', 3, 198],
              ['23.66.231.41', 7, 1253],
              ['74.125.226.219', 16, 10555],
              ['11.1.1.100', 2, 100],
              ['204.93.70.150', 6, 2472]]}]
Let’s take a closer look at what this script is doing. The first few lines are simply importing a few libraries that we’ll be using:
import pprint
from steelscript.netshark.core import NetShark
from steelscript.common import UserAuth
from steelscript.netshark.core import Value, Key
Next, we create a NetShark object that establishes our connection to the target appliance:
# Open a connection to the appliance and authenticate
sk = NetShark(host, auth=UserAuth(username, password))
This next section ensures that the pcap file that we want to analyze is up on the appliance.
# First, upload our pcap file, if it's not already present
if not sk.exists('/{0}/tutorial.pcap'.format(username)):
    homedir = sk.get_dir(username)
    tracefile = homedir.upload_trace_file('tutorial.pcap', 'tutorial.pcap')
else:
    # If it's already there, just grab a handle to it
    tracefile = sk.get_file('/{0}/tutorial.pcap'.format(username))
At this point, the variable tracefile is a handle to the trace file ‘tutorial.pcap’ that is now present on the filesystem of the NetShark appliance.
Next, we’re going to actually create a view. The first step is to select the set of columns that we’re interested in collecting:
columns = [Key(sk.columns.ip.address),
           Value(sk.columns.generic.packets),
           Value(sk.columns.generic.bytes)]
NetShark supports numerous columns, and any column can be either a key column or a value column. Each row of data will be aggregated according to the set of key columns selected. The value columns define the set of additional data to collect per row. In this example, we are asking to collect total packets and bytes for each IP address seen in the pcap file.
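To make the key/value distinction concrete, here is a small standalone sketch of the kind of aggregation the view performs. The packet records are hypothetical stand-ins (no NetShark required): rows are grouped by the key column (IP address) and the value columns (packets, bytes) are summed per group:

```python
from collections import defaultdict

# Hypothetical packet records: (ip_address, bytes) pairs standing in
# for packets parsed from a capture
packets = [('11.1.1.90', 1500), ('11.1.1.90', 60),
           ('173.194.75.106', 78), ('11.1.1.90', 40)]

# Aggregate by the key column (IP address), summing the value columns
totals = defaultdict(lambda: [0, 0])   # ip -> [packet count, byte count]
for ip, nbytes in packets:
    totals[ip][0] += 1
    totals[ip][1] += nbytes

for ip, (pkts, nbytes) in sorted(totals.items()):
    print("{0}\t{1} packets\t{2} bytes".format(ip, pkts, nbytes))
```

Each distinct key value becomes one row, just as each IP address becomes one entry in the view output below.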
Now create the view:
view = sk.create_view(tracefile, columns, name="tutorial view")
The first argument is the packet source. When creating a view, the packet source can be one of four types of source objects: Interfaces, Trace Files, Capture Jobs and Trace Clips. A packet source can be live (e.g. a NetShark capture port) or offline (e.g. a Trace Clip). General information about packet sources can be found in the NetShark Glossary. See Packet Source Objects for details on how to work with the various source types as objects.
We can now use the view object to get data:
# Retrieve the data
data = view.get_data(aggregated=True)
5.3.3. Data Objects¶
The data object returned by the get_data() method contains the key and value columns requested, but also returns a few additional fields of metadata.
First, edit view.py and comment out the line that closes the view – add a ‘#’ in front of view.close():
# Close the view
# view.close()
Now rerun the python script, but pass the -i argument to python to drop into an interactive shell after running the script. This will allow us to inspect the data that was returned:
$ python -i view.py
[ { 'p': 388,
    't': datetime.datetime(2012, 12, 18, 12, 41, 33, 808202, tzinfo=tzutc()),
    'vals': [ ['11.1.1.90', 384, 255208],
              ['173.194.75.106', 13, 1025],
              ['216.34.181.45', 48, 39486],
              ['184.31.179.172', 263, 195713],
              ['74.125.226.220', 10, 1446],
              ['208.70.199.49', 16, 2960],
              ['23.66.231.51', 3, 198],
              ['23.66.231.41', 7, 1253],
              ['74.125.226.219', 16, 10555],
              ['11.1.1.100', 2, 100],
              ['204.93.70.150', 6, 2472]],
    'value_count': 11}]
$
We are now back at the python prompt, but all the variables assigned in the script are available to use for inspection.
First of all, note that the data object itself is a list of length 1:
>>> type(data)
<type 'list'>
>>> len(data)
1
Each element in the list is called a sample. We only have a single sample in this output – we’ll cover more about samples later. A sample has 3 fields in it:
- p – number of packets processed
- t – timestamp of the beginning of the sample interval
- vals – the key and value columns that were requested when the view was created
For this output, there is only one sample. In the sample interval, 388 packets were processed. The sample interval started at 12:41:33.808202 on Dec 18, 2012.
The get_data() method supports a number of additional options that allow us to change how the data is returned. For example, we can ask for the data to be sorted by bytes, the third column (index 2 starting from 0):
>>> data = view.get_data(aggregated=True, sortby=2)
# Look only at the 'vals' for the first and only sample (index 0)
>>> printer.pprint(data[0]['vals'])
[ ['11.1.1.90', 384, 255208],
  ['184.31.179.172', 263, 195713],
  ['216.34.181.45', 48, 39486],
  ['74.125.226.219', 16, 10555],
  ['208.70.199.49', 16, 2960],
  ['204.93.70.150', 6, 2472],
  ['74.125.226.220', 10, 1446],
  ['23.66.231.41', 7, 1253],
  ['173.194.75.106', 13, 1025],
  ['23.66.231.51', 3, 198],
  ['11.1.1.100', 2, 100]]
Or sort by packets (index 1), in ascending order:
>>> data = view.get_data(aggregated=True, sortby=1, sorttype="ascending")
>>> printer.pprint(data[0]['vals'])
[ ['11.1.1.100', 2, 100],
  ['23.66.231.51', 3, 198],
  ['204.93.70.150', 6, 2472],
  ['23.66.231.41', 7, 1253],
  ['74.125.226.220', 10, 1446],
  ['173.194.75.106', 13, 1025],
  ['208.70.199.49', 16, 2960],
  ['74.125.226.219', 16, 10555],
  ['216.34.181.45', 48, 39486],
  ['184.31.179.172', 263, 195713],
  ['11.1.1.90', 384, 255208]]
Note that the list of columns has the same order as requested when the view was created.
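The same orderings can also be produced client-side with Python's built-in sorted(). This sketch uses a few rows in the same [ip, packets, bytes] shape as a sample's 'vals' list:

```python
# Rows in the same shape as a sample's 'vals': [ip, packets, bytes]
rows = [['11.1.1.90', 384, 255208],
        ['173.194.75.106', 13, 1025],
        ['216.34.181.45', 48, 39486]]

# Equivalent of sortby=2 (descending by bytes, the server default order)
by_bytes = sorted(rows, key=lambda r: r[2], reverse=True)

# Equivalent of sortby=1, sorttype="ascending" (ascending by packets)
by_packets = sorted(rows, key=lambda r: r[1])

print([r[0] for r in by_bytes])
print([r[0] for r in by_packets])
```

Sorting on the server is usually preferable for large views, but client-side sorting is handy when you want several orderings of one result without re-querying.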
5.3.4. Aggregated or Not¶
Notice that with each call to get_data(), we are passing the argument aggregated=True. This argument indicates that we are not interested in time-series data; we only care about the Key() columns that were used to create the view. But what happens if you set aggregated=False?
Normally, all data on the NetShark appliance is collected in time intervals, and get_data() will return that data by time. This is what happens when aggregated=False. The time interval must be set when you create the view; by default it is 1 second.
Let’s see what the output looks like when it’s not aggregated. From the interactive shell, call get_data() again with aggregated=False:
>>> data = view.get_data(aggregated=False)
The output should look like this:
>>> len(data)
6
>>> printer.pprint(data)
[ { 'p': 15,
    't': datetime.datetime(2012, 12, 18, 12, 41, 33, 808202, tzinfo=tzutc()),
    'vals': [['11.1.1.90', 13, 1025], ['173.194.75.106', 13, 1025]]},
  { 'p': 289,
    't': datetime.datetime(2012, 12, 18, 12, 41, 34, 808202, tzinfo=tzutc()),
    'vals': [ ['11.1.1.90', 289, 193299],
              ['216.34.181.45', 44, 39222],
              ['184.31.179.172', 203, 139597],
              ['74.125.226.220', 10, 1446],
              ['208.70.199.49', 6, 1028],
              ['23.66.231.51', 3, 198],
              ['23.66.231.41', 7, 1253],
              ['74.125.226.219', 16, 10555]]},
  { 'p': 60,
    't': datetime.datetime(2012, 12, 18, 12, 41, 35, 808202, tzinfo=tzutc()),
    'vals': [['184.31.179.172', 60, 56116], ['11.1.1.90', 60, 56116]]},
  { 'p': 3,
    't': datetime.datetime(2012, 12, 18, 12, 41, 36, 808202, tzinfo=tzutc()),
    'vals': [['11.1.1.90', 2, 380], ['208.70.199.49', 2, 380]]},
  { 'p': 11,
    't': datetime.datetime(2012, 12, 18, 12, 41, 37, 808202, tzinfo=tzutc()),
    'vals': [ ['11.1.1.100', 2, 100],
              ['11.1.1.90', 10, 2904],
              ['208.70.199.49', 2, 332],
              ['204.93.70.150', 6, 2472]]},
  { 'p': 10,
    't': datetime.datetime(2012, 12, 18, 12, 41, 39, 808202, tzinfo=tzutc()),
    'vals': [ ['208.70.199.49', 6, 1220],
              ['11.1.1.90', 10, 1484],
              ['216.34.181.45', 4, 264]]}]
Whereas before data was a list of length one, it now has multiple samples. Each sample provides a snapshot of the key and value columns requested for one interval starting at the time indicated by t.
Looking in detail at the second sample:
>>> data[1]
{'p': 289,
 't': datetime.datetime(2012, 12, 18, 12, 41, 34, 808202, tzinfo=tzutc()),
 'vals': [['11.1.1.90', 289, 193299],
          ['216.34.181.45', 44, 39222],
          ['184.31.179.172', 203, 139597],
          ['74.125.226.220', 10, 1446],
          ['208.70.199.49', 6, 1028],
          ['23.66.231.51', 3, 198],
          ['23.66.231.41', 7, 1253],
          ['74.125.226.219', 16, 10555]]}
>>> from steelscript.common.timeutils import *
>>> data[1]['t'].strftime("%x %X")
'12/18/12 12:41:34'
From this, we can tell that the sample covers the time from 12:41:34 to 12:41:35 (to be precise, it actually covers 12:41:34.808202 to 12:41:35.808202). Within that interval, 289 packets were processed, and host 11.1.1.90 was involved in every one of those packets, accounting for 193,299 bytes.
Let’s take a look at the time range covered for each sample using a little Python magic.
>>> for sample in data:
...     print "Start: {0}, processed {1} packets".format(sample['t'].strftime("%x %X"), sample['p'])
...
Start: 12/18/12 12:41:33, processed 15 packets
Start: 12/18/12 12:41:34, processed 289 packets
Start: 12/18/12 12:41:35, processed 60 packets
Start: 12/18/12 12:41:36, processed 3 packets
Start: 12/18/12 12:41:37, processed 11 packets
Start: 12/18/12 12:41:39, processed 10 packets
Note
Do not type in the leading ... for the second and third lines above. After typing the first line (for sample in data:) and pressing enter, Python will prompt you with ... for additional commands to be executed for each iteration of the for loop. You must type the 4 leading spaces before print. At the end of the second line, pressing enter will prompt again with ..., indicating that you may enter additional commands. In this case we are done, so just press enter again and Python will execute the for loop. See Dive Into Python – 2.5: Indenting Code for more information.
Notice that 12:41:38 is missing? This is not a bug – it just means that there were no packets in the trace file during that sample interval, so there is no data to show.
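A gap like this is easy to detect programmatically. This is a hedged sketch (plain datetimes stand in for the 't' field of each sample) that reports any interval with no data, assuming the default 1-second delta:

```python
from datetime import datetime, timedelta

# Sample start times, standing in for sample['t'] in the output above
times = [datetime(2012, 12, 18, 12, 41, 33),
         datetime(2012, 12, 18, 12, 41, 34),
         datetime(2012, 12, 18, 12, 41, 35),
         datetime(2012, 12, 18, 12, 41, 36),
         datetime(2012, 12, 18, 12, 41, 37),
         datetime(2012, 12, 18, 12, 41, 39)]

# Walk consecutive samples and collect any missing 1-second intervals
delta = timedelta(seconds=1)
gaps = []
for prev, cur in zip(times, times[1:]):
    t = prev + delta
    while t < cur:
        gaps.append(t)
        t += delta

print(gaps)   # the empty 12:41:38 interval
```

Running this over the sample times above reports exactly one gap, the 12:41:38 interval with no packets.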
Before continuing on, exit from the Python shell:
>>> [Ctrl-D]
$
5.3.5. Processing View Data¶
Ok, now let’s enhance the script to do a bit more:
- compute average packet size (bytes / packets)
- select hosts sending small packets (< 100 bytes)
- for each host, print out the protocols in use
Open up view.py and add a new import to the top of the file:
from steelscript.netshark.core import Value, Key
from steelscript.netshark.core.filters import * # <--- Add this line
import pprint
Next, uncomment the line that closes the view, delete the three lines that print the data, and replace them with the following code:
# Close the view
view.close()

# Print the output to the screen          # <--- delete these three lines
# printer = pprint.PrettyPrinter(2)
# printer.pprint(data)

# Compute avg bytes/packet, and resort    # <--- add the rest of this to the script
rows = data[0]['vals']
filtered_rows = [row for row in rows if (row[2] / row[1]) < 100]

print "{0} Hosts are sending small packets (avg size < 100 bytes)".format(len(filtered_rows))
for row in filtered_rows:
    print "{0}\t{1} bytes/pkt".format(row[0], row[2] / row[1])

# Now create a new view that breaks out the protocol / port for each host above
columns = [Key(sk.columns.ip.protocol_name),
           Value(sk.columns.generic.packets),
           Value(sk.columns.generic.bytes)]

for row in filtered_rows:
    filters = [NetSharkFilter('ip.address="{0}"'.format(row[0]))]
    view = sk.create_view(tracefile, columns, filters, name="tutorial view - ip {0}".format(row[0]))
    data = view.get_data(aggregated=True)
    view.close()

    print "\nHost {0}".format(row[0])
    for pp_row in data[0]['vals']:
        print "{0}\t{1} bytes/pkt".format(pp_row[0], pp_row[2] / pp_row[1])
Save your changes and rerun the script (without the -i this time):
$ python view.py
3 Hosts are sending small packets (avg size < 100 bytes)
173.194.75.106 78 bytes/pkt
23.66.231.51 66 bytes/pkt
11.1.1.100 50 bytes/pkt
Host 173.194.75.106
TCP 78 bytes/pkt
Host 23.66.231.51
TCP 66 bytes/pkt
Host 11.1.1.100
ICMP 50 bytes/pkt
This script now runs a total of 4 views: the first view collects bytes and packets per IP address, and the subsequent views collect bytes and packets per protocol for an individual IP address using a NetSharkFilter:
filters = [NetSharkFilter('ip.address="{0}"'.format(row[0]))]
A NetSharkFilter allows you to form complex expressions using operators and various fields within a packet.
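Since the filter expression is just a string, building filters in Python is plain string formatting. This hedged sketch constructs per-host expressions like the one in the script above; the ip_filter helper is hypothetical (only the ip.address field is taken from the example):

```python
# Hypothetical helper: build a NetShark filter expression string for one host.
# Only ip.address is taken from the example above; other field names would
# follow the NetShark filter syntax.
def ip_filter(address):
    return 'ip.address="{0}"'.format(address)

hosts = ['173.194.75.106', '23.66.231.51', '11.1.1.100']
expressions = [ip_filter(h) for h in hosts]

print(expressions[0])   # ip.address="173.194.75.106"
```

Each expression would then be wrapped in a NetSharkFilter and passed in the filters list when creating a view, as in the script.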
5.3.6. Existing Views¶
In the above examples, we have always created a new view from scratch, then closed that view when we were done. Often, a view may be created and running for a longer period of time. For example, a live view is continually being updated as new traffic is received. Views may also be created using the Packet Analyzer application.
If there are already open views on the NetShark appliance, we can access them with the get_open_views() method. Start up a new Python shell and let’s give this a try:
$ python
Python 2.7.3 (default, Apr 19 2012, 00:55:09)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from steelscript.netshark.core import NetShark
>>> from steelscript.common import UserAuth
>>> sk = NetShark('$host', auth=UserAuth('$username', '$password'))
>>> views = sk.get_open_views()
>>> views
[<View source="fs/admin/noon.cap" title="Bandwidth Over Time">,
 <View source="fs/admin/noon.cap" title="TCP Flags by Protocol Over Time">,
 <View source="fs/admin/tutorial.pcap">]
Note
Your appliance will likely show a different set of open views. You should at least see the tutorial.pcap view in the list.
This method returns a list of objects, one representing each open view. We can get information about the time interval covered by the view with the get_timeinfo() method:
>>> view = views[0]
>>> view.get_timeinfo()
{'delta': 1000000000, 'end': 1195590918719742000, 'start': 1195590481719742000}
This method returns a struct with 3 fields:
- start and end indicate the timestamps of the first and last samples covered by the view
- delta specifies the interval of time covered by a single sample in nanoseconds (defaults to 1 second)
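These nanosecond values can be turned into something readable with only the standard library. This sketch uses the sample get_timeinfo() output above:

```python
from datetime import datetime

# Sample get_timeinfo() result from above
timeinfo = {'delta': 1000000000,
            'end': 1195590918719742000,
            'start': 1195590481719742000}

# utcfromtimestamp() expects seconds, so divide the nanosecond values
start = datetime.utcfromtimestamp(timeinfo['start'] / 1e9)
end = datetime.utcfromtimestamp(timeinfo['end'] / 1e9)

# With a fixed delta, the covered span divides evenly into samples
nsamples = (timeinfo['end'] - timeinfo['start']) // timeinfo['delta']

print("View covers {0} -> {1} ({2} one-second samples)".format(start, end, nsamples))
```

For this sample view, the span works out to 437 one-second samples.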
For a view applied to a trace clip or a static file, the start and end times will be fixed for a particular view. For a view applied to a capture port or the virtual device associated with a capture job, the end time will be regularly updated as new packets arrive and are processed.
The views created above by the view.py script had only a single output associated with them. In general, a view may have multiple outputs, and each output has the same basic structure – it contains a list of samples as described above.
In this example, we are looking at the “TCP Flags by Protocol Over Time” view, which has separate outputs for the different flags that can appear in TCP headers. In Packet Analyzer, these outputs show up as separate graphs. In SteelScript, there is a separate output object for each output; we can get at them with the all_outputs() method:
>>> view.all_outputs()
[<view output OUID_Fin>,
<view output OUID_Psh>,
<view output OUID_Urg>,
<view output OUID_Ack>,
<view output OUID_Rst>,
<view output OUID_Syn>]
Note
The number of outputs returned depends on the view selected. The view created above as part of the tutorial has only a single output, at index 0. If the view you have selected does not have 6 outputs, adjust the array index below.
Let’s use the helper routine viewutils.print_data() to print the data in the view to the console:
>>> from steelscript.netshark.core.viewutils import print_data
# Grab the 6th output corresponding to the SYN flag, adjust as necessary
# for the view selected
>>> output = view.all_outputs()[5]
>>> output
<view output OUID_Syn>
>>> print_data(output.get_legend(), output.get_data())
Time Protocol Packets
2012/05/10 12:31:37.502796 http 1
2012/05/10 12:31:51.502796 http 1
2012/05/10 12:32:51.502796 http 1
2012/05/10 12:32:56.502796 http 1
2012/05/10 12:33:16.502796 https 1
2012/05/10 12:33:27.502796 http 2
2012/05/10 12:33:51.502796 http 1
...
Note that the method View.get_data() simply calls the get_data() function on the first output of a view. As such, the following are equivalent:
# Retrieving the data for the first output:
>>> output0 = view.get_output(0)
>>> data0 = output0.get_data()
# Equivalent shortcut from the view object:
>>> data = view.get_data()