DataMuseum.dk presents historical artifacts from the history of: DKUUG/EUUG Conference tapes
This is an automatic "excavation" of a thematic subset of
See our Wiki for more about DKUUG/EUUG Conference tapes. Excavated with: AutoArchaeologist - Free & Open Source Software.
Length: 14375 (0x3827)
Types: TextFile
Names: »savenews.c«
└─⟦9ae75bfbd⟧ Bits:30007242 EUUGD3: Starter Kit
└─⟦95173b3df⟧ »EurOpenD3/news/bnews.2.11/misc.tar.Z«
└─⟦ff4664b96⟧
└─⟦this⟧ »misc/archive/savenews.c«
/*
* savenews filename [filename ...]
*
* Savenews is a program designed to clean up and compact a
* usenet archive. It will take the filename(s) given to it as arguments
* and save them in a netnews archive (defined by SAVENEWS, default is
* /usr/spool/savenews).
*
* This program was set up to do two main things:
*
* 1) compact out the useless parts of the message, specifically the lines
* in the header that don't serve a useful purpose in an archive. This
* is done by removing all but the following header lines: From, Date,
* Newsgroups, Subject, and Message-ID, and seems to save an average of
* 500 bytes an article.
*
* 2) keep the quadratic nature of unix(TM AT&T Bell labs) directory searches
* from making your life miserable. Storing a raw archive of
* net.unix-wizards is a silly thing to do, for example. What I do is
* create a one-level set of subdirectories to keep any one directory from
* getting too large; this program is currently set so that there
* are enough directories to keep the total number of files in any one
* directory below about 150 in the largest parts of my archive. The
* algorithm I use is atoi(Message-ID) % HASHVAL, with HASHVAL being
* prime (a leading '-' on the ID is skipped instead of taking abs()).
* This quick and dirty hash gives you directories numbered
* 0 to HASHVAL-1, with about the same number of files in each
* given a random distribution of Message-ID numbers (not bad, in
* reality).
*
* The program will add the name of the file and the subject line of the
* article to a logfile in subdirectory LOGS; the logfile is named after
* the newsgroup.
*
* As currently written, an article will be saved only to the first
* newsgroup in the Newsgroups header line. This means that something
* posted to 'net.sources,net.flame' will end up in net.sources, but that
* something posted to 'net.flame,net.sources' will end up in net.flame.
* I consider this a feature. Others may disagree.
*
* If an article is saved that has the same Message-ID as one already
* in the archive, then it will be saved by appending the character '_'
* and the smallest integer (starting at 1) that makes the filename
* unique. You can then
* use ls or find to look for these and see if they are duplicates (and
* remove them) or if they are simply botches by some other site (it does
* happen, unfortunately).
*
* This program will do intelligent things if given a non-news article,
* such as nothing. Don't push it, though -- I haven't tried it on
* special devices, symbolic links, and other weirdies and it is likely
* to throw up on some of them since I didn't feel like protecting someone
* from trying to archive /dev (if tar can consider this a feature, so can
* I...)
*
* This program uses the 4.2 Directory routines (libndir). If you don't
* run 4.2, get ahold of a copy of the compatibility library for your
* system and use it, or hack up do_dir and is_dir to get around it
* if you believe in messing around with primitive hacks (I LIKE libndir)
*
* General usage: every so often run the program with
* 'savenews /usr/spool/oldnews'. Look through /usr/spool/savenews
* for duplicated articles and remove them, and then copy all of the
* stuff to tape. Remove everything except the LOGS directory, so that
* people can use grep to look for things in the archive. It should be
* easy to get things back off of tape and make the archive useful this
* way. Thinking about it, if you can't use the archive, you might as well
* not have it, which is why this program got written (I needed something
* out of my archive, and it took me a week to find it).
*
* This program is designed to run under 2.10.2, but should work under any
* B news system. Anyone else is on their own. This is in
* the public domain by the kindness of my employer, national
* semiconductor, but neither I nor national make any guarantee that it
* will work, that we will support this program, or even admit that it
* exists. This is called a disclaimer, and means that if you use this
* program, you are on your own. It DOES, however, pass lint cleanly, which
* is more than I can say for most stuff posted to the net. Feel free to
* fix, break, enhance, change, or do anything to this program except
* claim it to be your own (unless, of course, you break it...). Passing
* enhancements back to me would be nice, too.
*
* chuq von rospach, national semiconductor (nsc!chuqui)
*
*/
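/*
 * Illustration (not part of the original program): a minimal sketch of
 * the bucket arithmetic described above, assuming non-negative numeric
 * Message-ID prefixes. demo_bucket_counts is a hypothetical helper that
 * tallies how many IDs in [first, last] fall into each of nbuckets
 * buckets using the same "id % nbuckets" rule. Any run of consecutive IDs
 * whose length is a multiple of the bucket count fills every bucket
 * equally: IDs 1000 through 1369 (370 IDs) over 37 buckets put exactly
 * 10 files in each directory.
 */
static void demo_bucket_counts(long first, long last, int nbuckets,
    int counts[])
{
	long id;
	int b;

	for (b = 0; b < nbuckets; b++)
		counts[b] = 0;
	for (id = first; id <= last; id++)
		counts[id % nbuckets]++;	/* bucket = id mod bucket count */
}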
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/dir.h>
#include <ctype.h>
#define FALSE 0
#define TRUE 1
#define HASHVAL 37 /* hash value for sub-dirs. Prime number! */
#define NUMDIRS 1024 /* number of dirs that can be pushed */
#define SAVENEWS "/usr/spool/savenews" /* home of the archive */
#define LOGFILE "LOGS" /* subdir in SAVENEWS to save logs in */
#define JOBLOG "joblog" /* where log of this job is put */
#define DIRMODE 0755 /* mkdir with this mode */
#define COPYBUF 8192 /* block read/write buffer size */
char *Progname; /* name of the program for Eprintf */
char line[BUFSIZ]; /* general purpose line buffer */
#define NUM_HEADERS 5 /* number of headers we are saving */
#define GROUP_HEADER 1 /* where Newsgroup will be found */
#define SUBJECT_HEADER 2 /* where Subject will be found */
#define MESSAGE_HEADER 3 /* where Message-ID will be found */
char header_data[NUM_HEADERS][BUFSIZ];
char *headers[NUM_HEADERS] =
{
"From:",
"Newsgroups:",
"Subject:",
"Message-ID:",
"Date:"
};
long num_saved = 0; /* number of articles saved */
FILE *logfp; /* file pointer to joblog file */
char *rindex(), *strcat(), *pop_dir(), *strcpy(), *strsave(), *index();
main(argc,argv)
int argc;
char *argv[];
{
register int i;
char joblogfile[BUFSIZ];
char *dirname;
/*
* This removes any preceding pathname so that
* anything printed out by Eprintf has just the
* program name and not where it came from
*/
if ((Progname = rindex(argv[0],'/')) == NULL)
Progname = argv[0];
else
Progname++;
if (argc == 1) {
fprintf(stderr,"Usage: %s file [file ...]\n",Progname);
exit(1);
}
sprintf(joblogfile,"%s/%s",SAVENEWS,JOBLOG);
if ((logfp = fopen(joblogfile,"w")) == NULL)
fprintf(stderr,"Can't open %s, logging suspended\n",joblogfile);
for (i = 1 ; i < argc; i++) { /* process each parameter */
register int rc;
if ((rc = is_dir(argv[i])) == -1)
continue;
else if (rc == TRUE)
do_dir(argv[i]);
else
save_file(argv[i]);
}
while((dirname = pop_dir()) != NULL) {
do_dir(dirname); /* process whatever is left on dirstack */
}
printf("Total articles saved was %ld\n",num_saved);
exit(0);
}
do_dir(dname) /* process a directory, push other directories on stack */
/* to be handled recursively later */
char *dname;
{
DIR *dirp;
struct direct *dp;
char fullname[BUFSIZ];
if ((dirp = opendir(dname)) == NULL) {
Eprintf("can't opendir %s\n",dname);
return;
}
for (dp = readdir(dirp); dp != NULL; dp = readdir(dirp)) {
register int rc;
if(dp->d_namlen == 2 && !strcmp(dp->d_name,"..")
|| (dp->d_namlen == 1 && !strcmp(dp->d_name,".")))
continue; /* skip . and .. */
sprintf(fullname,"%s/%s",dname,dp->d_name);
if((rc = is_dir(fullname)) == -1)
continue;
else if (rc == TRUE)
push_dir(fullname);
else
save_file(fullname);
}
closedir(dirp);
}
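/*
 * Illustration (not part of the original program): d_namlen is a BSD
 * extension; POSIX only guarantees d_name in the directory entry. A
 * minimal sketch of a portable "." / ".." test that compares the name
 * alone, using the hypothetical helper demo_is_dot_entry.
 */
#include <string.h>

static int demo_is_dot_entry(const char *name)
{
	/* true for exactly "." or "..", false for dot-files like ".newsrc" */
	return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}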
is_dir(name)
char *name;
{
struct stat sbuf;
if (stat(name,&sbuf) == -1) {
Eprintf("can't stat '%s'\n",name);
return(-1);
}
return(((sbuf.st_mode & S_IFMT) == S_IFDIR) ? TRUE : FALSE); /* mask first: S_IFBLK shares the S_IFDIR bit */
}
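/*
 * Illustration (not part of the original program): why a bare
 * "st_mode & S_IFDIR" test is fragile. In the traditional encoding
 * S_IFDIR is 0040000 and S_IFBLK is 0060000, so a block device also has
 * the S_IFDIR bit set. Masking with S_IFMT first, or using the POSIX
 * S_ISDIR macro as this hypothetical helper does, avoids the false match.
 */
#include <sys/types.h>
#include <sys/stat.h>

static int demo_mode_is_dir(mode_t mode)
{
	return S_ISDIR(mode);	/* same as (mode & S_IFMT) == S_IFDIR */
}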
/* VARARGS */
Eprintf(s1,s2,s3,s4,s5,s6,s7,s8,s9)
char *s1,*s2,*s3,*s4,*s5,*s6,*s7,*s8,*s9;
{
if (logfp == NULL)
return;
fprintf(logfp,"%s: ",Progname);
fprintf(logfp,s1,s2,s3,s4,s5,s6,s7,s8,s9);
fflush(logfp);
}
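/*
 * Illustration (not part of the original program): Eprintf declares the
 * format and eight further arguments all as char *, the old lint VARARGS
 * trick, which is not guaranteed to work under ANSI C. A minimal sketch of
 * the same logger with <stdarg.h>; demo_logf is a hypothetical name, and it
 * takes the stream and program name as parameters instead of using the
 * globals. It returns the number of characters written, or -1.
 */
#include <stdarg.h>
#include <stdio.h>

static int demo_logf(FILE *fp, const char *prog, const char *fmt, ...)
{
	va_list ap;
	int n1, n2;

	if (fp == NULL)		/* like Eprintf: no log file, say nothing */
		return -1;
	n1 = fprintf(fp, "%s: ", prog);
	va_start(ap, fmt);
	n2 = vfprintf(fp, fmt, ap);
	va_end(ap);
	fflush(fp);
	if (n1 < 0 || n2 < 0)
		return -1;
	return n1 + n2;
}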
/*
* quick and dirty stack routines.
*
* push_dir(name) char *name;
* stores the given string in the stack
* char *pop_dir()
* returns a string from the stack, or NULL if none.
*/
static char *dirstack[NUMDIRS];
static int lastdir = 0;
static char pop_name[BUFSIZ];
push_dir(name)
char *name;
{
if (lastdir >= NUMDIRS) {
Eprintf("push_dir overflow!\n");
return;
}
dirstack[lastdir] = strsave(name);
if (dirstack[lastdir] == NULL)
{
Eprintf("malloc failed!\n");
return;
}
lastdir++;
}
char *pop_dir()
{
if(lastdir == 0)
return(NULL);
lastdir--;
strcpy(pop_name,dirstack[lastdir]);
free(dirstack[lastdir]); /* release the copy made by strsave */
dirstack[lastdir] = NULL;
return(pop_name);
}
char *strsave(s)
char *s;
{
char *p, *malloc();
if ((p = malloc((unsigned)strlen(s)+1)) != NULL)
strcpy(p,s);
return(p);
}
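/*
 * Illustration (not part of the original program): a variant of the
 * directory stack that keeps fixed-size copies instead of malloc'd ones,
 * so there is nothing to free. The names demo_push, demo_pop,
 * DEMO_STACK_MAX and DEMO_NAME_MAX are hypothetical and the capacities
 * are arbitrary. The string returned by demo_pop is only valid until the
 * next demo_push, which reuses that slot.
 */
#include <string.h>

#define DEMO_STACK_MAX	4
#define DEMO_NAME_MAX	256

static char demo_stack[DEMO_STACK_MAX][DEMO_NAME_MAX];
static int demo_top = 0;

/* Returns 0 on success, -1 on overflow or an over-long name. */
static int demo_push(const char *name)
{
	if (demo_top >= DEMO_STACK_MAX || strlen(name) >= DEMO_NAME_MAX)
		return -1;
	strcpy(demo_stack[demo_top++], name);
	return 0;
}

/* Returns the most recently pushed name, or NULL when the stack is empty. */
static const char *demo_pop(void)
{
	if (demo_top == 0)
		return NULL;
	return demo_stack[--demo_top];
}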
save_file(name) /* save the article in the archive */
char *name;
{
FILE *fp, *ofp, *fopen(), *output_file();
register int i, nc;
char diskbuf[COPYBUF];
Eprintf("saving '%s'\n",name);
if ((fp = fopen(name,"r")) == NULL) {
Eprintf("can't open\n");
return;
}
if ((fgets(line,BUFSIZ,fp) == NULL)) {
Eprintf("0 length file\n");
fclose(fp);
return;
}
if (!start_header(line)) {
Eprintf("not a news article\n");
fclose(fp);
return;
}
read_header(fp);
if ((ofp = output_file()) == NULL) {
Eprintf("Can't save\n");
fclose(fp);
return;
}
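for (i = 0; i < NUM_HEADERS; i++)
if (header_data[i][0] != '\0') /* a missing header would emit a blank line and end the header early */
fprintf(ofp,"%s\n",header_data[i]);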
fputc('\n',ofp);
while ((nc = fread(diskbuf,sizeof(char),COPYBUF,fp)) != 0)
fwrite(diskbuf,sizeof(char),nc,ofp); /* copy body of article */
fclose(ofp);
fclose(fp);
num_saved++;
return;
}
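/*
 * Illustration (not part of the original program): save_file copies the
 * article body with fread/fwrite but does not check for short writes. A
 * minimal sketch of the same block copy that reports errors, using the
 * hypothetical helper demo_copy_body. It returns the number of bytes
 * copied, or -1 on a read or write error.
 */
#include <stdio.h>

static long demo_copy_body(FILE *in, FILE *out)
{
	char buf[8192];		/* same size as COPYBUF */
	size_t nr;
	long total = 0;

	while ((nr = fread(buf, 1, sizeof buf, in)) > 0) {
		if (fwrite(buf, 1, nr, out) != nr)
			return -1;	/* disk full or other write error */
		total += (long)nr;
	}
	if (ferror(in))
		return -1;
	return total;
}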
start_header(s) /* see if this is the start of a news article */
char *s;
{
/*
* If this is coming from B news, the first line will 'always' be
* Relay-Version (at least, on my system). Your mileage may vary.
*/
if (!strncmp(s,"Relay-Version:",14))
return(TRUE);
/*
* If you are copying a section of archive already archived by
* savenews, then the first line will be From (unless you changed
* the headers data structure, then it's up to you...)
*/
if (!strncmp(s,"From:",5))
return(TRUE);
return(FALSE);
}
/*
* By the time we get here, the first line will already be read in and
* checked by start_header(). If we are re-copying a savenews archive
* (which happens when you decide to play with HASHVAL, trust me) then
* we need to save the From line, so we can't just throw it away. Hence
* the funky looking do-while setup instead of something a bit more
* straightforward
*/
read_header(fp)
FILE *fp;
{
register int i;
for (i = 0; i < NUM_HEADERS; i++)
header_data[i][0] = '\0'; /* remove last articles data */
do {
char *cp;
if (line[0] == '\n') /* there is always a blank line after the header */
return;
for (i = 0 ; i < NUM_HEADERS; i++) {
if (!strncmp(headers[i],line,strlen(headers[i]))) {
strcpy(header_data[i],line);
if (cp = index(header_data[i],'\n'))
*cp = '\0'; /* eat newlines */
}
}
} while (fgets(line,BUFSIZ,fp) != NULL);
}
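/*
 * Illustration (not part of the original program): the header filter
 * used by read_header, reduced to a single predicate. The hypothetical
 * helper demo_keep_header returns 1 when a header line starts with one of
 * the five names savenews keeps, and 0 otherwise. Like the original, the
 * comparison is case-sensitive, so "Message-Id:" would not match.
 */
#include <string.h>

static int demo_keep_header(const char *hline)
{
	static const char *keep[] = {
		"From:", "Newsgroups:", "Subject:", "Message-ID:", "Date:"
	};
	size_t i;

	for (i = 0; i < sizeof keep / sizeof keep[0]; i++)
		if (strncmp(hline, keep[i], strlen(keep[i])) == 0)
			return 1;	/* prefix match, as in read_header */
	return 0;
}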
FILE *output_file() /* generate the name in the archive */
{
int hashval, copy = 0;
FILE *fp, *fopen();
char *p, newsgroup[BUFSIZ], message_id[BUFSIZ];
char shortname[BUFSIZ], filename[BUFSIZ], filename2[BUFSIZ];
/* get the first newsgroup */
p = index(header_data[GROUP_HEADER],':'); /* move past Newsgroups */
if (!p) {
Eprintf("Invalid newsgroups\n");
return(NULL);
}
p++; /* skip the colon */
while (isspace(*p))
p++; /* skip whitespace */
strcpy(newsgroup,p);
if (p = index(newsgroup,','))
*p= '\0'; /* newsgroup now only has one name in it */
/* get the message-id */
p = index(header_data[MESSAGE_HEADER],':');
if (!p) {
Eprintf("Invalid message-id\n");
return(NULL);
}
p++; /* skip the colon */
while (isspace(*p))
p++; /* skip whitespace */
if (*p == '<' || *p == '(')
p++;
if (*p == '-') /* make negative article id numbers positive (hack) */
p++;
strcpy(message_id,p);
if (p = index(message_id,'.')) /* trim off the .UUCP if any */
*p = '\0';
else if (p = index(message_id,'>')) /* or get the closing bracket */
*p = '\0';
else if (p = index(message_id,')')) /* or get the closing paren */
*p = '\0';
if (p = index(message_id,'@')) /* change nnn@site */
*p = '.'; /* to nnn.site */
/* generate the hash value for the subdirectory */
hashval = atoi(message_id) % HASHVAL;
/* setup the filename to save to */
sprintf(shortname,"%s/%d/%s",newsgroup,hashval,message_id);
sprintf(filename,"%s/%s",SAVENEWS,shortname);
while (exists(filename)) { /* make it unique if necessary */
sprintf(shortname,"%s/%d/%s_%d",newsgroup,hashval,message_id,++copy);
sprintf(filename,"%s/%s",SAVENEWS,shortname);
}
strcpy(filename2,filename); /* chop off the filename, */
if (p = rindex(filename2,'/')) /* since we don't want to */
*p = '\0'; /* pass it to makeparents */
makeparents(filename2);
if ((fp = fopen(filename,"w")) == NULL) {
Eprintf("Can't open %s for output\n",filename);
return(NULL);
}
log(newsgroup,shortname);
return(fp);
}
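/*
 * Illustration (not part of the original program): the archive path that
 * output_file derives, as a pure function with no file-system access.
 * demo_archive_path is a hypothetical name; it takes the header values
 * (without the "Newsgroups:" / "Message-ID:" prefixes) and follows the
 * same steps: keep the first newsgroup, drop a leading '<' or '(' and then
 * a leading '-', cut at the first '.' (or failing that at '>' or ')'),
 * turn '@' into '.', and bucket by the leading number mod nbuckets. The
 * digit loop stands in for atoi and assumes the number fits in a long.
 * For example, "net.sources,net.flame" with "<123@nsc.UUCP>" and 37
 * buckets maps to "net.sources/12/123.nsc".
 */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static int demo_archive_path(const char *groups, const char *msgid,
    int nbuckets, char *out, size_t outlen)
{
	char group[256], id[256], *p;
	long num = 0;
	int r;

	if (nbuckets <= 0)
		return -1;
	while (isspace((unsigned char)*groups))
		groups++;
	if (strlen(groups) >= sizeof group)
		return -1;
	strcpy(group, groups);
	if ((p = strchr(group, ',')) != NULL)
		*p = '\0';		/* first newsgroup only */

	while (isspace((unsigned char)*msgid))
		msgid++;
	if (*msgid == '<' || *msgid == '(')
		msgid++;
	if (*msgid == '-')
		msgid++;		/* the "make negative ids positive" hack */
	if (strlen(msgid) >= sizeof id)
		return -1;
	strcpy(id, msgid);
	if ((p = strchr(id, '.')) != NULL || (p = strchr(id, '>')) != NULL
	    || (p = strchr(id, ')')) != NULL)
		*p = '\0';
	if ((p = strchr(id, '@')) != NULL)
		*p = '.';		/* nnn@site becomes nnn.site */

	for (p = id; isdigit((unsigned char)*p); p++)
		num = num * 10 + (*p - '0');

	r = snprintf(out, outlen, "%s/%ld/%s", group, num % nbuckets, id);
	return (r < 0 || (size_t)r >= outlen) ? -1 : 0;
}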
exists(name)
char *name;
{
struct stat sbuf;
if (stat(name,&sbuf) == -1) {
return(FALSE);
}
return(TRUE);
}
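/*
 * Illustration (not part of the original program): the "_N" suffix loop
 * from output_file, with the exists() check replaced by a caller-supplied
 * predicate so it can be exercised without touching the disk.
 * demo_unique_name, demo_taken_fn and demo_in_list are hypothetical. The
 * bare name is tried first, then _1, _2, ... in order; the return value
 * is the suffix used (0 for none), or -1 if the name does not fit.
 */
#include <stdio.h>
#include <string.h>

typedef int (*demo_taken_fn)(const char *name, void *ctx);

static int demo_unique_name(const char *base, demo_taken_fn taken,
    void *ctx, char *out, size_t outlen)
{
	int copy = 0;
	int r = snprintf(out, outlen, "%s", base);

	while (r >= 0 && (size_t)r < outlen && taken(out, ctx))
		r = snprintf(out, outlen, "%s_%d", base, ++copy);
	return (r < 0 || (size_t)r >= outlen) ? -1 : copy;
}

/* Example predicate: ctx is a NULL-terminated array of names in use. */
static int demo_in_list(const char *name, void *ctx)
{
	const char **p;

	for (p = ctx; *p != NULL; p++)
		if (strcmp(*p, name) == 0)
			return 1;
	return 0;
}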
makeparents(name) /* recursively make parent directories */
char *name;
{
char *p, buf[BUFSIZ];
if (exists(name))
return;
strcpy(buf,name);
if (!(p = rindex(buf,'/'))) {
Eprintf("makeparents failed!\n");
return;
}
*p = '\0';
makeparents(buf);
mkdir(name,DIRMODE);
}
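/*
 * Illustration (not part of the original program): makeparents recurses
 * toward the root and creates directories on the way back, so the
 * shallowest missing directory is created first. This hypothetical helper,
 * demo_dirs_to_create, lists that creation order without calling mkdir,
 * assuming (as on a normal file system) that every ancestor of an
 * existing directory also exists. existing is a NULL-terminated list of
 * directories already present; out receives one path per line.
 */
#include <string.h>

static int demo_dirs_to_create(const char *path, const char **existing,
    char *out, size_t outlen)
{
	size_t len = strlen(path), i, used = 0;
	const char **e;

	if (outlen == 0)
		return -1;
	out[0] = '\0';
	for (i = 1; i <= len; i++) {
		if (path[i] != '/' && path[i] != '\0')
			continue;
		/* path[0..i) is a candidate directory */
		for (e = existing; *e != NULL; e++)
			if (strlen(*e) == i && strncmp(*e, path, i) == 0)
				break;
		if (*e != NULL)
			continue;	/* already there, nothing to create */
		if (used + i + 2 > outlen)
			return -1;	/* no room for prefix, '\n' and NUL */
		memcpy(out + used, path, i);
		used += i;
		out[used++] = '\n';
		out[used] = '\0';
	}
	return 0;
}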
log(group,name) /* write to the logfile */
char *group, *name;
{
char *subject, logfile[BUFSIZ];
FILE *ofp, *fopen();
/* get the subject */
subject = index(header_data[SUBJECT_HEADER],':');
if (!subject) {
Eprintf("Invalid subject, no log entry\n");
return;
}
subject++; /* skip the colon */
while (isspace(*subject))
subject++; /* skip whitespace */
/* generate the place where it goes */
sprintf(logfile,"%s/%s",SAVENEWS,LOGFILE);
makeparents(logfile);
strcat(logfile,"/");
strcat(logfile,group);
if ((ofp = fopen(logfile,"a")) == NULL)
{
Eprintf("open failed on %s\n",logfile);
return;
}
fprintf(ofp,"%s\t%s\n", name, subject);
fclose(ofp);
}
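/*
 * Illustration (not part of the original program): the line that log()
 * appends to LOGS/<newsgroup>, built in memory instead of written to a
 * file. demo_log_line is a hypothetical helper; it skips past the first
 * ':' of the Subject header and any following white space, as log() does,
 * then joins the archive name and the subject with a tab.
 */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static int demo_log_line(const char *name, const char *subject_header,
    char *out, size_t outlen)
{
	const char *subject = strchr(subject_header, ':');
	int r;

	if (subject == NULL)
		return -1;	/* log() writes no entry in this case either */
	subject++;
	while (isspace((unsigned char)*subject))
		subject++;
	r = snprintf(out, outlen, "%s\t%s\n", name, subject);
	return (r < 0 || (size_t)r >= outlen) ? -1 : 0;
}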