DataMuseum.dk presents historical artifacts from the history of: DKUUG/EUUG Conference tapes
Length: 16760 (0x4178) Types: TextFile Names: »planning.tex«
└─⟦4f9d7c866⟧ Bits:30007245 EUUGD6: Sikkerheds distributionen └─⟦36857feb3⟧ »./papers/Security_Primer/primer.tar.Z« └─⟦5c5f5f2d8⟧ └─⟦this⟧ »planning.tex«
\section{Pre-Planning your Incident Handling}
\subsection{Goals}
Despite your best plans to avoid incidents, they may very well occur.
Proper planning can reduce their severity, cost and inconvenience.
There are about half a dozen different goals that one can have while
handling an incident.
\begin{enumerate}
\item Maintain and restore data. % back ups
\item Maintain and restore service.
\item Figure out how it happened.
\item Avoid future incidents and escalation.
\item Avoid looking foolish.
% outside world -- press see appendix
% inside world (people who should have been called)
\item Find out who did it.
\item Punish the attackers.
\end{enumerate}
The order shown above is, I believe, the order of priorities that should
generally apply. Of course, in a real situation there are many reasons why
this ordering might not be appropriate, and we will discuss when and why
to change these priorities in the next section.

For any given site, a standard prioritization of goals can be developed,
and this should be done in advance. There is nothing so terrible as being
alone in a {\em cold\/} machine room at 4 a.m. on a Sunday trying to
decide whether to close the last hole to protect the system, or to leave
it open long enough to get a phone trace and catch the attacker. It is
similarly difficult to decide in the middle of a disaster whether you
should shut down a system to protect the existing data or do everything
you can to continue providing service. No one who is handling the
technical side of an incident wants to make these policy decisions
without guidance in the middle of a disaster. One can be sure that these
decisions will be replayed and re-analyzed by a dozen ``Monday Morning
Quarterbacks'' who will explain what should have been done, but who could
not be bothered to draw up a set of guidelines beforehand.

Let us look at each of these goals in a little more detail.

\subsubsection{Maintaining and restoring data}
To me, the user data is of paramount importance.
Anything else is generally replaceable. You can buy more disk drives,
more computers, more electrical power. If you lose the data, through a
security incident or otherwise, it is gone. Of course, if the computer is
controlling a physical device, there may be more than just data at stake.
For example, the most important goal for the computer in a pacemaker is
to get the next pulse out on time.

In terms of the protection of user data, there is {\em nothing\/} that
can take the place of a good backup strategy. During the week that this
chapter was written, three centers that I work with suffered catastrophic
data loss: two from air conditioning problems, one from programmer error.
At all three centers, there were machines with irreplaceable scientific
data that had never been backed up in their lives. Many backup failures
are caused by more subtle problems than these. Still, it is instructive
to note that many sites {\em never\/} make a second copy of their data.
This means that any problem, from a defective disk drive to a water main
break to a typing mistake when updating system software, can spell
disaster.

If the primary goal is that of maintaining and restoring data, the first
thing to do during an incident is to check when the most recent backup
was completed. If it was not made very recently, an immediate full system
dump {\em must\/} be made, and the system must be kept out of service
until the dump is done. Of course, this dump cannot be fully trusted, as
the attacker may already have modified the system.

\subsubsection{Maintaining and restoring service}
Second only to maintaining the data, maintaining service is important.
Users have probably come to rely on the computing center and will not be
pleased if they cannot continue to use it as planned.

\subsubsection{Figuring out how it happened}
This is by far the most interesting part of the problem, and in practice
it seems to take precedence over all of the others. Of course, it
strongly conflicts with the two preceding goals.
One way to reduce this conflict is to make a complete copy of the system
immediately after the attack, so that it can be analyzed at one's
leisure. We then need not worry about normal use destroying evidence, or
about the attacker re-entering to destroy evidence of what happened.
Ultimately, one may never be able to determine how it happened. One may
instead find several ways that it ``could have happened'', each
presenting something to fix.

\subsubsection{Avoiding Future Incidents and Escalation}
This needs to be an explicit goal, and often its importance is not
realized until much too late. To avoid future incidents, one of course
should fix the problem that first occurred and remove any new security
vulnerabilities that were added, either by the attackers or by the system
staff while trying to figure out what was going on. Beyond this, one
needs to avoid turning a casual attacker who may not be caught into a
dedicated opponent, to avoid enticing other attackers, and to keep others
in one's organization and related organizations from being forced to
introduce restrictions that would be neither popular nor helpful.

\subsubsection{Avoiding looking foolish}
Another real-world consideration, one that I had not expected to become
an issue, is image management. In practice, it is important not to look
foolish in the press, an issue that we will discuss more fully in \press.
It is also important for the appropriate people within the organization
to be briefed on the situation. It is embarrassing to find out about an
incident in one's own organization from a reporter's phone call.

\subsubsection{Finding out who did it}
This goal is often overemphasized. There is definite value in knowing
who the attacker was, so that one can debrief him and discourage him
from doing such things in the future. In the average case, however,
determining the attacker's identity takes more effort than it is worth,
unless one plans to prosecute him.
\subsubsection{Punishing the attackers}
The merits of this goal have been seriously debated in the past few
years. As a practical matter, it is very difficult to gather enough
evidence to prosecute someone, and there have been very few successful
prosecutions. If this is one of the goals, very careful records need to
be kept at all times during the investigation, and solving the problem
will be slowed down while one waits for phone traces and various court
orders.

\subsection{Backups}
It should be clear that accomplishing most of these goals requires having
extra copies of the data that is stored on the system. These extra copies
are called ``backups'' and are generally stored on magnetic tape. Let us
consider two aspects of keeping backup copies of your data. First, we
will look at why this is important and what the backups are used for;
then we will examine the characteristics of a good backup strategy.

\subsubsection{Why We Need Backups}
Good backups are needed for four kinds of reasons. The first three are
not security related per se, though an insufficient backup strategy will
cause problems in these cases as well. If a site does not have a reliable
backup system, then when an incident occurs one must seriously consider
shutting the system down immediately so as not to endanger the user data.
\begin{description}
\item[User Errors.] Every once in a while, a user deletes a file or
overwrites data and then realizes that he needs it back. In some
operating systems, ``undelete'' facilities or version numbering are
enough to protect him, if he notices his mistake quickly enough.
Sometimes he doesn't notice the error for a long time, or deletes all of
the versions, or expunges them and then wants the data back. If there is
no backup system at all, the user's data is simply lost. If there is a
perfect backup system, he can recover quickly from his mistake.
If there is a poor backup system, his data may be recovered in a
corrupted form or with incorrect permissions set on it. There have been
cases where backup systems restored data files as publicly writable, and
obvious problems have ensued. Perhaps just as seriously, some sites have
stored all of their backup data in publicly readable form, including data
that individual users had protected.
\item[System Staff Errors.] Just as users make mistakes, staff members do
as well. In doing so, they may damage user files, system files or both.
Unless there is a copy of the current system files, the staff must
restore the system files from the original distribution and then redo
all of the site-specific changes. This is an error-prone process, and the
site-specific changes often include removing unwanted debugging features
that pose security vulnerabilities; any such change that is forgotten
reopens the hole.
\item[Hardware/Software Failures.] Hardware occasionally fails. If the
only copy of the data is on a disk that has become unreadable, it is
lost. Software occasionally fails as well; given a serious enough error,
it can make a disk unreadable.
\item[Security Incidents.] In this document, our main concern is with
security incidents. In determining what happened and correcting it,
backups are essential. Basically, one would like to return every file to
its state before the incident, except for those that are being modified
to prevent future incidents. Of course, to do this, one needs a copy to
restore from. Naively, one might think that the modification dates would
tell us which files need to be restored. This is not the case: a clever
attacker will modify the system clock and/or the timestamps on files to
prevent this. In many attacks, at least one of the following types of
files is modified.
\begin{itemize}
\item The system binary that controls logging in.
\item The system authorization file that lists the users and their privileges.
\item The system binary that controls one or more daemons.
\item The accounting and auditing files.
\item Users' startup files and permission files.
\item The system directory-walking binary.
\end{itemize}
\end{description}
Now that we understand why we need backups in order to recover, let us
consider how to make them.

\subsubsection{How to Form a Backup Strategy that Works}
There are a few basic rules that provide for a good backup strategy.
\begin{itemize}
\item Every file that one cares about must be included.
\item The copies must be in non-volatile form. While keeping two copies
of each file, one on each of two separate disk drives, protects against
simple hardware failures, it is no defense against an intelligent
attacker who will modify both copies, or against a clever system staffer
who saves time by modifying both at once.
\item Long cycles. It may take weeks or months to notice a mistake. A
system that reuses the same tape every week will have overwritten the
good copy before the error is noticed.
\item Separate tapes. Overwriting the existing backup before the new one
is complete is an accident waiting to happen.
\item Verified backups. It is necessary to make sure that one can read
the tapes back in. One site with a programming bug in its backup utility
had a storeroom filled with unreadable tapes!
\end{itemize}

\subsection{Forming a Plan}
While the first major section (avoidance) contained many standard
solutions to standard problems, planning requires a great deal more
thought and consideration. Much of it is list making.
\begin{description}
\item[Call Lists.] If a system staffer suspects that a security incident
is happening right now, whom should he call? What if there is no answer
on that line? What if the people on the call list are no longer employees
or have long since died? What if it is Christmas Day or Sunday morning?
\item[Time--Distance.] How long will it take for the people who are
called to arrive? What should be done until they get there?
\item[If a User Notices.] If a user notices something odd, whom should
he tell? How does he know whom to tell?
\item[Threats and Tips.] What should your staffers do if they receive a
threat or a tip-off about a break-in?
\item[Press.] What should a system staffer do when he receives a call
from the press asking about an incident that he himself doesn't know
about? What about when there is a real incident underway?
\item[Shutting Down.] Under what circumstances should the center be shut
down or removed from the net? Who can make this decision? When should
service be restored?
\item[Prosecution.] Under what circumstances do you plan to prosecute?
\item[Timestamps.] How can you tell whether timestamps have been altered?
What should you do about it? Would running NTP (the Network Time
Protocol) help?
\item[Informing the Users.] What do you tell the users about all this?
\item[List Logistics.] How often do you update the incident plan? How
does your system staff learn about it?
\end{description}

\subsection{Tools to have on hand}
\begin{itemize}
\item File differencing tools
\item Netwatcher
\item Spying tools
\item Backup tapes
\item Blank tapes
\item Notebooks
\end{itemize}

\subsection{Sample Scenarios to Work on in Groups}
In order to understand what goal priorities you have for your center, and
as a general exercise in planning, let us consider a number of sample
problems. Each of these is a simplified version of a real incident. What
would be appropriate to do if a similar thing happened at your center?
Each new paragraph indicates new information that is received later.
\begin{itemize}
\item A system programmer notices that at midnight each night, someone
makes 25 attempts to guess a username--password combination.

Two weeks later, he reports that each night it is the same
username--password combination.
\item A system programmer gets a call reporting that a major underground
cracker newsletter is being distributed from the administrative machine
at his center to five thousand sites in the US and Western Europe.

Eight weeks later, the authorities call to inform you that information in
one of these newsletters was used to disable ``911'' in a major city for
five hours.
\item A user calls in to report that he can't log in to his account at
3 a.m. on a Saturday. The system staffer can't log in either. After
rebooting to single-user mode, he finds that the password file is empty.

By Monday morning, your staff determines that a number of privileged file
transfers took place between this machine and a local university.

Tuesday morning, a copy of the deleted password file is found on the
university machine, along with password files for a dozen other
machines.

A week later, you find that your system initialization files have been
altered maliciously.
\item You receive a call saying that a break-in at a government lab was
made from one of your center's machines. You are asked to provide
accounting files to help track down the attacker.

A week later, you are given a list of machines at your site that have
been broken into.
\item A user reports that the last login time and place recorded for his
account aren't his.

Two weeks later, you find that your username space isn't unique and that
unauthenticated logins are allowed between machines based entirely on
username.
\item A guest account is suddenly using four CPU hours per day, when
before it had been used only for reading mail.

You find that the extra CPU time has been going into password cracking.

You find that the password file being cracked isn't one from your center.

You determine which center it is from.
\item You hear reports of a computer virus that paints trains on CRTs.

You log in to a machine at your center and find such a train on your
screen.

You look in the log and find no notation of such a feature being added.

You notice that five attempts were made to install it, within an hour of
each other, before the current one.

Three days later, you learn that it was put up by a local system
administrator who had heard nothing about the virus scare or about your
inquiries.
\item You notice that your machine has been broken into.

You find that nothing is damaged.

A high school student calls up and apologizes for doing it.
\item An entire disk partition of data is deleted. Mail is bouncing
because the mail utilities were on that partition.

When you restore the partition, you find that a number of system
binaries have been changed.

You also notice that the system date is wrong: it is off by 1900 years.
\item A reporter calls up asking about the break-in at your center. You
haven't heard of any such break-in.

Three days later, you learn that there was a break-in. The center
director had used his wife's name as his password.
\item A change in system binaries is detected.

The day they are corrected, they are changed again.

This repeats for several weeks.
\end{itemize}
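
The last scenario above, like the earlier point that modification times
cannot be trusted after an attack, comes down to detecting changed files
by their contents rather than their timestamps. One way a file
differencing tool could do this is sketched below in Python. This is an
illustration, not a tool prescribed by this primer. It records a checksum
of every regular file under a directory on a known-good system, and later
reports files whose checksums changed, appeared, or disappeared. The
function names \texttt{file\_digest}, \texttt{build\_baseline} and
\texttt{compare} are hypothetical, and the use of SHA-256 as the checksum
is our assumption.

```python
# Minimal content-based file integrity check (illustrative sketch only).
# Assumptions: Python 3.9+, SHA-256 as the checksum, and hypothetical
# helper names; this is not a tool described in the primer.
import hashlib
import os
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_baseline(root: Path) -> dict[str, str]:
    """Map each regular file under root (as a relative path) to its digest.

    Symbolic links and special files (devices, FIFOs) are skipped, and
    modification times are deliberately ignored.
    """
    baseline: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            if full.is_file() and not full.is_symlink():
                baseline[str(full.relative_to(root))] = file_digest(full)
    return baseline


def compare(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files whose contents changed, that were added, or that vanished."""
    common = baseline.keys() & current.keys()
    return {
        "changed": sorted(p for p in common if baseline[p] != current[p]),
        "added": sorted(current.keys() - baseline.keys()),
        "missing": sorted(baseline.keys() - current.keys()),
    }
```

The comparison ignores timestamps entirely, so resetting a file's
modification time does not hide a change. The sketch is only as
trustworthy as its inputs, however. The baseline would need to be stored
on write-protected media such as a backup tape. The checker itself would
need to run from trusted media, since the directory-walking binary and
other system programs may themselves have been modified.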