% Source: planning.tex, from ./papers/Security_Primer/primer.tar.Z
% (Bits:30007245 EUUGD6: Sikkerheds distributionen)

\section{Pre-Planning your Incident Handling}

\subsection{Goals}

Despite your best plans to avoid incidents, they may very well occur.
Proper planning can reduce their severity, cost and inconvenience
levels.  There are about a half dozen different goals that one can have
while handling an incident.  

\begin{enumerate}

\item Maintain and restore data.
%	back ups



\item Maintain and restore service.

\item Figure out how it happened.

\item Avoid future incidents and escalation.

\item Avoid looking foolish.
%	outside world -- press see appendix
%	inside world (people who should have been called)

\item Find out who did it.

\item  Punish the attackers.


\end{enumerate}


The order shown above is what I believe the order of priorities
generally should be.  Of course, in a real situation there are many
reasons why this ordering might not be appropriate, and we will discuss
the whens and whys of changing our priorities in the next section.  

For any given site, one can expect that a standard goal prioritization
can be developed.  This should be done in advance.  There is nothing
so terrible as being alone in a {\em cold\/} machine room at 4 on a
Sunday morning trying to decide whether to shut down the last hole to
protect the system or try to get a phone trace done to catch the
attacker.  It is similarly difficult to decide in the middle of a
disaster whether you should shut down a system to protect the existing
data or do everything you can to continue to provide service.

No one who is handling the technical side of an incident wants to make
these policy decisions without guidance in the middle of a disaster.
One can be sure that these decisions will be replayed and re-analyzed
by a dozen ``Monday Morning Quarterbacks'' who will explain what
should have been done but could not be bothered to make up a set of
guidelines before.


Let us look at each of these goals in a little more detail.


\subsubsection{Maintaining and restoring data}

To me, the user data is of paramount importance.  Anything else is
generally replaceable.  You can buy more disk drives, more computers,
more electrical power.  If you lose the data, through a security incident
or otherwise, it is gone.  

Of course, if the computer is controlling a physical device, there may
be more than just data at stake.  For example, the most important goal
for the computer in a pacemaker is to get the next pulse out on time.

In terms of the protection of user data, there is {\em nothing\/} that
can take the place of a good back-up strategy.  During the week that this
chapter was written, three centers that I work with suffered
catastrophic data loss.  Two of the three were from air conditioning
problems, one from programmer error.  At all three centers, there were
machines with irreplaceable scientific data that had never been backed
up in their lives.

Many backup failures are caused by more subtle problems than these.
Still, it is instructive to note that many sites {\em never\/} make a
second copy of their data.  This means that any problem, from a
defective disk drive, to a water main break, to a typing mistake when
updating system software, can spell disaster.

If the primary goal is that of maintaining and restoring data, the
first thing to do during an incident needs to be to check when the
most recent backup was completed.  If it was not done very recently,
an immediate full system dump {\em must\/} be made and the system must
be shut down until it is done.  Of course, one can't trust this dump as
the attacker may have already modified the system.
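
As a minimal sketch of the check just described, the following Python
fragment decides whether an immediate dump is called for.  The function
name {\tt needs\_immediate\_dump}, the one-day threshold and the dates
are assumptions made for illustration only; each site has to choose its
own limit and its own way of recording when the last backup completed.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: how old the last full backup may be before
# an immediate dump is required.  Each site must choose its own value.
MAX_BACKUP_AGE = timedelta(days=1)


def needs_immediate_dump(last_backup, now, max_age=MAX_BACKUP_AGE):
    """Return True if no backup exists or the last one is too old."""
    if last_backup is None:
        return True
    return now - last_backup > max_age


if __name__ == "__main__":
    now = datetime(1990, 6, 3, 4, 0)       # illustrative: early on a Sunday
    last = datetime(1990, 5, 27, 23, 0)    # illustrative: about a week earlier
    if needs_immediate_dump(last, now):
        print("Shut down and take a full dump before continuing.")
    else:
        print("A recent backup exists; continue with the incident plan.")
```

The decision itself is trivial; the point is that the threshold is
written down in advance rather than argued about during the incident.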


\subsubsection{Maintaining and restoring service}

Second to maintaining the data, maintaining service is important.  Users
have probably come to rely on the computing center and will not be
pleased if they can't continue to use it as planned. 

\subsubsection{Figuring out how it happened}

This is by far the most interesting part of the problem and in
practice seems to take precedence over all of the others.  It of course
strongly conflicts with the two preceding goals.  

By immediately making a complete copy of the system after the attack,
one can analyze it at one's leisure.  This means that we don't need to
worry about normal use destroying evidence or about the attacker
re-entering to destroy evidence of what happened.

Ultimately, one may never be able to determine how it happened.  One
may find several ways that it ``could have happened,'' presenting a
number of things to fix.


\subsubsection{Avoiding Future Incidents and Escalation}

This needs to be an explicit goal and often is not realized until much
too late.  To avoid future incidents one of course should fix the
problem that first occurred and remove any new security
vulnerabilities that were added either by the attackers or by the
system staff while trying to figure out what was going on.

Beyond this, one needs to prevent turning a casual attacker who may
not be caught into a dedicated opponent, to prevent enticing other
attackers, and to prevent others in one's organization and related
organizations from being forced to introduce restrictions that would
be neither popular nor helpful.

\subsubsection{Avoiding looking foolish}

Another real world consideration that I had not expected to become an
issue is one of image management.  In practice, it is important not to
look foolish in the press, an issue that we will discuss more fully in
\press.  Also, it is important for the appropriate people within the
organization to be briefed on the situation.  It is embarrassing to find
out about an incident in one's own organization from a reporter's phone
call.

\subsubsection{Finding out who did it}

This goal is often overemphasized.  There is definitely a value in
knowing who the attacker was so that one can debrief him and
discourage him from doing such things in the future. 

In the average case, it takes more effort to determine the attacker's
identity than it is worth, unless one plans to prosecute him.


\subsubsection{Punishing the attackers}

The merits of this goal have been seriously debated in the past few
years.  As a practical matter, it is very difficult to get enough
evidence to prosecute someone, and there have been very few successful
prosecutions.  If this is one of the goals, very careful record keeping
needs to be done at all times during the investigation, and solving the
problem will be slowed down as one waits for phone traces and various
court orders.



\subsection{Backups}
It should be clear that accomplishing most of the goals requires having
extra copies of the data that is stored on the system.  These extra
copies are called ``Backups'' and are generally stored on magnetic tape. 

Let us consider two aspects of keeping backup copies of your data.
First, we will look at why this is important and what the backups are
used for, and then we will examine the characteristics of a good backup
strategy. 

\subsubsection{Why We Need Back Ups} 

Good back ups are needed for four types of reasons.  The first three of
these are not security related per se, though an insufficient back up
strategy will lead to problems with these first three as well. 

If a site does not have a reliable back up system, when an incident
occurs, one must seriously consider immediate shutdown of the system
so as not to endanger the user data.

\begin{description}

\item[User Errors.]  Every once in a while, a user deletes a file or
overwrites data and then realizes that he needs it back.  In some
operating systems, ``undelete'' facilities or version numbering is
enough to protect him, if he notices his mistake quickly enough.
Sometimes he doesn't notice the error for a long time, or deletes all
of the versions, or expunges them and then wants the data back.

If there is no backup system at all, the user's data is just plain
lost.  If there is a perfect backup system, he is quickly able to
recover from his mistake.  If there is a poor back up system, his data
may be recovered in a corrupted form or with incorrect permissions set
on it.  

There have been cases where back up systems returned data files to be
publicly writeable and obvious problems have ensued from it.
Perhaps as seriously, there are sites that have stored all of the back
up data in a publicly readable form, including the data that was
protected by the individual user.
	
\item[System Staff Errors.]  Just as users make mistakes, staff
members do as well.  In doing so, they may damage user files, system
files or both.  Unless there is a copy of the current system files, the
staff must restore the system files from the original distribution and
then rebuild all of the site specific changes.  This is an error prone
process, and often the site specific changes include removing
unwanted debugging features that pose security vulnerabilities.

\item[Hardware/Software Failures.]  Hardware occasionally fails.  If
the only copy of the data is on a disk that has become unreadable, it
is lost.  Software occasionally fails.  Given a serious enough error,
it can make a disk unreadable.

\item[Security Incidents.]  In this document, our main concern is with
security incidents.  In determining what happened and correcting it,
backups are essential.

Basically, one would like to return every file to its state before the
incident, except for those that are being modified to prevent future
incidents.  Of course, to do this, one needs a copy to restore from.
Naively, one would think that using the modification date would allow
us to tell which files need to be updated.  This is of course not the
case.  A clever attacker will modify the system clock and/or the
timestamps on files to prevent this.

In many attacks, at least one of the following types of files is modified.

\begin{itemize}
\item The system binary that controls logging in.
\item The system authorization file that lists the users and their privileges.
\item The system binary that controls one or more daemons.
\item The accounting and auditing files.
\item Users' startup files and permission files.
\item The system directory walking binary.
\end{itemize}


\end{description}
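
Since modification dates cannot be trusted, one alternative is to
compare the contents of critical files against checksums recorded
before the incident.  The following Python fragment is a minimal sketch
of that idea, not a description of any particular tool.  The function
names, the choice of SHA-256 from Python's {\tt hashlib} module, and
the assumption that the baseline is kept somewhere the attacker cannot
reach (for example, on write-protected media) are all mine.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths):
    """Map each path to its digest; missing files map to None."""
    return {p: file_digest(p) if os.path.isfile(p) else None for p in paths}


def changed_files(baseline, current):
    """Return the sorted paths whose digest differs from the baseline."""
    return sorted(p for p in baseline if baseline[p] != current.get(p))
```

A baseline taken with {\tt snapshot} while the system is known to be
clean can later be compared with a fresh snapshot.  A file whose
contents were replaced is reported even if its timestamp was reset, and
so is a file that has disappeared.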

Now that we understand why we need back ups in order to recover, let
us look at how to form a back up strategy.

\subsubsection{How to form a Back Up Strategy that Works}

There are a few basic rules that provide for a good backup strategy. 


\begin{itemize}
\item Every file that one cares about must be included.

\item The copies must be in non-volatile form.  While having two
copies of each file, one on each of two separate disk drives, is good
for protection from simple hardware failures, it is no defense from
an intelligent attacker who will modify both copies, or from a clever
system staffer who saves time by modifying them both at once.

\item Long cycles.  It may take weeks or months to notice a mistake.
A system that reuses the same tape every week will have destroyed the
data before the error is noticed.

\item Separate tapes.  Overwriting the existing backup before having
the new one completed is an accident waiting to happen. 

\item Verified backups.  It is necessary to make sure that one can
read the tapes back in.  One site with a programming bug in its back
up utility had a store room filled with unreadable tapes!




\end{itemize}
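
The ``verified backups'' rule can be illustrated with a short Python
sketch.  It writes a small archive with the standard {\tt tarfile}
module and then reads it back, comparing every file with the original.
An in-memory archive stands in for the tape here, and the function
names are hypothetical; a real site would verify against its own dump
format and media.

```python
import io
import tarfile


def make_archive(files):
    """Pack a {name: bytes} mapping into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def verify_archive(archive_bytes, files):
    """Read the archive back; return the names that are missing or differ."""
    bad = []
    try:
        with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tar:
            for name, data in files.items():
                try:
                    member = tar.extractfile(name)
                except KeyError:          # name is not in the archive
                    member = None
                if member is None or member.read() != data:
                    bad.append(name)
    except tarfile.TarError:
        # The archive as a whole is unreadable: nothing can be trusted.
        return sorted(files)
    return sorted(bad)
```

An empty result from {\tt verify\_archive} means that every file could
be read back intact.  Anything else should be treated as a failed
backup, before the tape goes into the store room.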




\subsection{Forming a Plan}

While the first major section (avoidance) contained a lot of standard
solutions to standard problems, planning requires a great deal more
thought and consideration.  A great deal of this is list making.

\begin{description}

\item[Call Lists.]  If a system staffer suspects that a security
incident is happening right now, whom should he call?  

And if he gets no answer on that line? 

What if the people on the call list are no longer employees or have
long since died?

What if it is Christmas Day or Sunday morning?


\item[Time--Distance.]  How long will it take for the people who are
called to arrive?

 What should be done until they get there?

\item[Things a User Notices.]  If a user notices something odd, whom
should he tell?  

How does he know this?

\item[Threats and Tips.]  What should your staffers do if they receive
a threat or a tip-off about a breakin?

\item[Press.]  What should a system staffer do when he receives a
call from the press asking about an incident that he himself doesn't
know about?

What about when there is a real incident underway?

\item[Shutting Down.]  Under what circumstances should the center be
shut down or removed from the net?

Who can make this decision?

When should service be restored?

\item[Prosecution.]  Under what circumstances do you plan to prosecute?

\item[Timestamps.]  How can you tell that the timestamps have been altered?

What should you do about it?

Would running NTP (the Network Time Protocol) help?

\item[Informing the Users.] What do you tell the users about all this?

\item[List Logistics.] How often do you update the incident plan?

How does your system staff learn about it?
\end{description}
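
One way to keep a call list from going stale is to record when each
entry was last confirmed and to review the old ones regularly.  The
following Python sketch shows the idea.  The {\tt Contact} record, the
ninety-day limit and the function name are assumptions for
illustration, not a recommendation of any particular interval.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Contact:
    name: str             # placeholder names only; not real people
    phone: str
    last_verified: date   # when someone last confirmed this entry


def stale_contacts(call_list, today, max_age=timedelta(days=90)):
    """Return the contacts whose details have not been confirmed recently."""
    return [c for c in call_list if today - c.last_verified > max_age]
```

Running such a check on a schedule answers at least part of the
question of how often the plan is updated: an entry that nobody has
confirmed within the limit is flagged before it is needed at 4 on a
Sunday morning.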

\subsection{Tools to have on hand}

\begin{itemize}
\item File Differencing Tools
\item Netwatcher
\item Spying tools
\item Backup Tapes
\item Blank Tapes
\item Notebooks
\end{itemize}
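
As an illustration of what a file differencing tool does, here is a
minimal Python sketch that compares a live directory tree against a
trusted copy, such as one restored from backup.  It compares file
contents byte by byte, using {\tt filecmp.cmp} with
{\tt shallow=False}, rather than relying on sizes and timestamps.  The
function name and the returned triple are my own choices.

```python
import filecmp
import os


def compare_trees(trusted, live):
    """Compare two directory trees byte by byte.

    Returns (changed, missing, added) as sorted lists of paths relative
    to the two roots.
    """
    def relative_files(root):
        found = set()
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                found.add(os.path.relpath(full, root))
        return found

    old, new = relative_files(trusted), relative_files(live)
    changed = sorted(
        p for p in old & new
        if not filecmp.cmp(os.path.join(trusted, p),
                           os.path.join(live, p), shallow=False)
    )
    return changed, sorted(old - new), sorted(new - old)
```

Files reported as changed or added are candidates for closer
inspection.  Files reported as missing may have been deleted by the
attacker.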

\subsection{Sample Scenarios to Work on in Groups}

In order to understand what goal priorities you have for your center
and as a general exercise in planning, let us consider a number of
sample problems.  Each of these is a simplified version of a real
incident.  What would be appropriate to do if a similar thing
happened at your center?  Each new paragraph indicates new
information that is received later.

\begin{itemize}
\item A system programmer notices that at midnight each night, someone
makes 25 attempts to guess a username--password combination.

Two weeks later, he reports that each night it is the same
username--password combination. 

\item A system programmer gets a call reporting that a major
underground cracker newsletter is being distributed from the
administrative machine at his center to five thousand sites in the US
and Western Europe.

Eight weeks later, the authorities call to inform you the information
in one of these newsletters was used to disable ``911'' in a major
city for five hours.

\item A user calls in to report that he can't login to his account at
3 in the morning on a Saturday.  The system staffer can't login
either.  After rebooting to single user mode, he finds that the password
file is empty.

By Monday morning, your staff determines that a number of privileged
file transfers took place between this machine and a local university.

Tuesday morning a copy of the deleted password file is found on the
university machine along with password files for a dozen other
machines. 

A week later you find that your system initialization files had been
altered in a hostile fashion.

\item You receive a call saying that a breakin to a government lab
occurred from one of your center's machines.  You are requested to
provide accounting files to help track down the attacker.

A week later you are given a list of machines at your site that have
been broken into.

\item A user reports that the last login time/place on his account
aren't his.

Two weeks later you find that your username space isn't unique and
that unauthenticated logins are allowed between machines based
entirely on username.

\item A guest account is suddenly using four CPU hours per day when
before it had just been used for mail reading.

You find that the extra CPU time has been going into password cracking.

You find that the password file isn't one from your center.

You determine which center it is from.

\item You hear reports of a computer virus that paints trains on CRTs.

You login to a machine at your center and find such a train on your
screen. 

You look in the log and find no notation of such a feature being
added. 

You notice that five attempts were made to install it within an hour
of each other before the current one.

Three days later you learn that it was put up by a system
administrator locally who had heard nothing about the virus scare or
about your asking about it.

\item You notice that your machine has been broken 
into.  

You find that nothing is damaged.

A high school student calls up and apologizes for doing it.

\item An entire disk partition of data is deleted.  Mail is bouncing
because the mail utilities were on that partition.

When you restore the partition, you find that a number of system
binaries have been changed.  You also notice that the system date is
wrong.  It is off by 1900 years.

\item A reporter calls up asking about the breakin at your center.
You haven't heard of any such breakin.

Three days later you learn that there was a breakin.  The center
director had his wife's name as a password. 


\item A change in system binaries is detected.

The day that they are corrected, they are changed again.  

This repeats itself for some weeks.
\end{itemize}
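
A pattern like the one in the first scenario, a burst of failed logins
at the same hour each night, is easy to miss by eye but easy to count.
The following Python sketch assumes that login records are already
available as (timestamp, username, success) tuples.  That record
format, the function names and the threshold of 20 are hypothetical;
real accounting files differ from system to system.

```python
from collections import Counter
from datetime import datetime


def failed_attempts_by_hour(records):
    """Count failed logins per (date, hour).

    Each record is a (timestamp, username, succeeded) tuple.
    """
    counts = Counter()
    for when, _user, succeeded in records:
        if not succeeded:
            counts[(when.date(), when.hour)] += 1
    return counts


def suspicious_hours(records, threshold=20):
    """Return the (date, hour) pairs with at least `threshold` failures."""
    counts = failed_attempts_by_hour(records)
    return sorted(key for key, n in counts.items() if n >= threshold)
```

If the same hour turns up night after night, the attempts are probably
automated, which is itself a useful piece of information when deciding
how to respond.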