|
DataMuseum.dkPresents historical artifacts from the history of: DKUUG/EUUG Conference tapes |
This is an automatic "excavation" of a thematic subset of
See our Wiki for more about DKUUG/EUUG Conference tapes Excavated with: AutoArchaeologist - Free & Open Source Software. |
top - metrics - downloadIndex: P T
Length: 15692 (0x3d4c) Types: TextFile Names: »PROJECTS«
└─⟦a05ed705a⟧ Bits:30007078 DKUUG GNU 2/12/89 └─⟦d53cfc7b2⟧ »./gcc-1.35.tar.Z« └─⟦90f628c1d⟧ └─⟦this⟧ »gcc-1.35/PROJECTS«
0. Improved efficiency. * Parse and output array initializers an element at a time, freeing storage after each, instead of parsing the whole initializer first and then outputting. This would reduce memory usage for large initializers. 1. Better optimization. * Constants in unused inline functions It would be nice to delay output of string constants so that string constants mentioned in unused inline functions are never generated. Perhaps this would also take care of string constants in dead code. The difficulty is in finding a clean way for the RTL which refers to the constant (currently, only by an assembler symbol name) to point to the constant and cause it to be output. * More cse The techniques for doing full global cse are described in the red dragon book. It is likely to be slow and use a lot of memory, but it might be worth offering as an additional option. It is probably possible to extend cse to a few very frequent cases without so much expense. For example, it is not very hard to handle cse through if-then statements with no else clauses. Here's how to do it. On reaching a label, notice that the label's use-count is 1 and that the last preceding jump jumps conditionally to this label. Now you know it is a simple if-then statement. Remove from the hash table all the expressions that were entered since that jump insn and you can continue with cse. It is probably not hard to handle cse from the end of a loop around to the beginning, and a few loops would be greatly sped up by this. * Support more general tail-recursion among different functions. This might be possible under certain circumstances, such as when the argument lists of the functions have the same lengths. Perhaps it could be done with a special declaration. You would need to verify in the calling function that it does not use the addresses of any local variables and does not use setjmp. * Put short statics vars at low addresses and use short addressing mode? Useful on the 68000/68020 and perhaps on the 32000 series, provided one has a linker that works with the feature. This is said to make a 15% speedup on the 68000. This brings to mind Hayes' changes for Stanford MIPS. * Detect dead stores into memory? A store into memory is dead if it is followed by another store into the same location; and, in between, there is no reference to anything that might be that location (including no reference to a variable address). * Loop optimization. Strength reduction and iteration variable elimination could be smarter. They should know how to decide which iteration variables are not worth making explicit because they can be computed as part of an address calculation. Based on this information, they should decide when it is desirable to eliminate one iteration variable and create another in its place. It should be possible to compute what the value of an iteration variable will be at the end of the loop, and eliminate the variable within the loop by computing that value at the loop end. When a loop has a simple increment that adds 1, instead of jumping in after the increment, decrement the loop count and jump to the increment. This allows aob insns to be used. * Using constraints on values. Many operations could be simplified based on knowledge of the minimum and maximum possible values of a register at any particular time. These limits could come from the data types in the tree, via rtl generation, or they can be deduced from operations that are performed. For example, the result of an `and' operation one of whose operands is 7 must be in the range 0 to 7. Compare instructions also tell something about the possible values of the operand, in the code beyond the test. Value constraints can be used to determine the results of a further comparison. They can also indicate that certain `and' operations are redundant. Constraints might permit a decrement and branch instruction that checks zeroness to be used when the user has specified to exit if negative. * Smarter reload pass. The reload pass as currently written can reload values only into registers that are reserved for reloading. This means that in order to use a register for reloading it must spill everything out of that register. It would be straightforward, though complicated, for reload1.c to keep track, during its scan, of which hard registers were available at each point in the function, and use for reloading even registers that were free only at the point they were needed. This would avoid much spilling and make better code. * Change the type of a variable. Sometimes a variable is declared as `int', it is assigned only once from a value of type `char', and then it is used only by comparison against constants. On many machines, better code would result if the variable had type `char'. If the compiler could detect this case, it could change the declaration of the variable and change all the places that use it. * Order of subexpressions. It might be possible to make better code by paying attention to the order in which to generate code for subexpressions of an expression. * More code motion. Consider hoisting common code up past conditional branches or tablejumps. * Trace scheduling. This technique is said to be able to figure out which way a jump will usually go, and rearrange the code to make that path the faster one. * Distributive law. The C expression *(X + 4 * (Y + C)) compiles better on certain machines if rewritten as *(X + 4*C + 4*Y) because of known addressing modes. It may be tricky to determine when, and for which machines, to use each alternative. Some work has been done on this, in combine.c. * Jump-execute-next. Many recent machines have jumps which optionally execute the following instruction before the instruction jumped to, either conditionally or unconditionally. To take advantage of this capability requires a new compiler pass that would reorder instructions when possible. After reload may be a good place for it. On some machines, the result of a load from memory is not available until after the following instruction. The easiest way to support these machines is to output each RTL load instruction as two assembler instructions, the second being a no-op. Putting useful instructions after the load instructions may be a similar task to putting them after jump instructions. * Pipeline scheduling. On many machines, code gets faster if instructions are reordered so that pipelines are kept full. Doing the best possible job of this requires knowing which functional units each kind of instruction executes in and how long the functional unit stays busy with it. Then the goal is to reorder the instructions to keep many functional units busy but never feed them so fast they must wait. * Can optimize by changing if (x) y; else z; into z; if (x) y; if z and x do not interfere and z has no effects not undone by y. This is desirable if z is faster than jumping. * For a two-insn loop on the 68020, such as foo: movb a2@+,a3@+ jne foo it is better to insert dbeq d0,foo before the jne. d0 can be a junk register. The challenge is to fit this into a portable framework: when can you detect this situation and still be able to allocate a junk register? * For the 80387 floating point, perhaps it would be possible to use 3 or 4 registers in the stack to hold register variables. (It would be necessary to keep track of how those slots move in the stack as other pushes and pops are done.) This is probably very tricky, but if you are a GCC wizard and you care about the speed of floating point on an 80386, you might want to work on it. 2. Simpler porting. Right now, describing the target machine's instructions is done cleanly, but describing its addressing mode is done with several ad-hoc macro definitions. Porting would be much easier if there were an RTL description for addressing modes like that for instructions. Tools analogous to genflags and genrecog would generate macros from this description. There would be one pattern in the address-description file for each kind of addressing, and this pattern would have: * the RTL expression for the address * C code to verify its validity (since that may depend on the exact data). * C code to print the address in assembler language. * C code to convert the address into a valid one, if it is not valid. (This would replace LEGITIMIZE_ADDRESS). * Register constraints for all indeterminates that appear in the RTL expression. 3. Other languages. Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are desirable. Pascal, Modula-2 and Ada require the implementation of functions within functions. Some of the mechanisms for this already exist. 4. Generalize the machine model. * Some new compiler features may be needed to do a good job on machines where static data needs to be addressed using base registers. * Some machines have two stacks in different areas of memory, one used for scalars and another for large objects. The compiler does not now have a way to understand this. 5. Precompilation of header files. In the future, many programs will use thousands of lines of header files. Compiling the headers might be slower than compiling the guts of any one source file. Here is a scheme for precompiling header files to make compilation faster for a sequence of headers which is often used. A precompiled header corresponds to a sequence of header files. The preprocessor recognizes when the input starts with a sequence of `#include' commands and searches a data base for a precompiled header corresponding to that sequence. The modtimes of all these files are stored in the data base so that one can tell whether the precompiled header is still valid. For robustness, each directory should have its own collection of precompiled headers and its own data base of them. Probably each precompiled header would be a file and the data base would be one more file. The data base records the entire collection of predefined macros and their definitions, except for __FILE__, __LINE__ and __DATE__, for each precompiled header. If this collection does not match the setup at the start of the current compilation (including the results of -D and -U switches), the precompiled header is inapplicable. It might be possible to have distinct precompiled headers for the same sequence of header files but different collections of predefined macros. The state of any option that affects macro processing, such as -ansi or -traditional, must also be recorded, and the precompiled header is valid only if these options match. The precompiled header contains an ordered series of strings. Some strings are marked "unconditional"; these must be compiled each time the precompiled header is used. Other strings are have keys, which are identifiers. A string with keys must be compiled if at least one of its keys is mentioned in the input. The order these strings appear in the precompiled header is called their intrinsic order. The C preprocessor reads in the precompiled header file and scan all the strings, making for each key an entry in the same symbol table used for macros, pointing at a list of all the strings for which it is a key. Each string must have a flag (one flag per string, not one per key per string). The same code in `rescan' that detects references to macros would detect a reference to a key and flag all of the strings that it belongs to as needing to be output. Each of these strings is immediately recursively macroexpanded (i.e. `rescan' is called), but the output from this is discarded. The expansion is to detect any other keys mentioned in the string, and to define any macros for which the string contains a #define. The key's symbol table entry is be deleted to save time if the key is encountered again, and to avoid an infinite recursion. The unconditional strings are macroexpanded with `rescan' (but the output is discarded) at some time before anything is actually output. At the end of compilation, before any of the actual input text is output, the list of strings is scanned in the intrinsic order, and each string that is unconditional or flagged is output verbatim, except that any #define lines are discarded. Precompiled headers would be constructed by explicit request with a special tool. The first step is to run cpp on the sequence of header files' contents. This would use a new option that would cause all #define lines to be output unchanged as well as defining the macro. The second step is to divide the output into strings, some keyed and some unconditional. This division is done without changing the order of the text being divided up. JNC@lcs.mit.edu has some ideas on this subject also. 6. Better documentation of how GCC works and how to port it. Here is an outline proposed by Allan Adler. I. Overview of this document II. The machines on which GCC is implemented A. Prose description of those characteristics of target machines and their operating systems which are pertinent to the implementation of GCC. i. target machine characteristics ii. comparison of this system of machine characteristics with other systems of machine specification currently in use B. Tables of the characteristics of the target machines on which GCC is implemented. C. A priori restrictions on the values of characteristics of target machines, with special reference to those parts of the source code which entail those restrictions i. restrictions on individual characteristics ii. restrictions involving relations between various characteristics D. The use of GCC as a cross-compiler i. cross-compilation to existing machines ii. cross-compilation to non-existent machines E. Assumptions which are made regarding the target machine i. assumptions regarding the architecture of the target machine ii. assumptions regarding the operating system of the target machine iii. assumptions regarding software resident on the target machine iv. where in the source code these assumptions are in effect made III. A systematic approach to writing the files tm.h and xm.h A. Macros which require special care or skill B. Examples, with special reference to the underlying reasoning IV. A systematic approach to writing the machine description file md A. Minimal viable sets of insn descriptions B. Examples, with special reference to the underlying reasoning V. Uses of the file aux-output.c VI. Specification of what constitutes correct performance of an implementation of GCC A. The components of GCC B. The itinerary of a C program through GCC C. A system of benchmark programs D. What your RTL and assembler should look like with these benchmarks E. Fine tuning for speed and size of compiled code VII. A systematic procedure for debugging an implementation of GCC A. Use of GDB i. the macros in the file .gdbinit for GCC ii. obstacles to the use of GDB a. functions implemented as macros can't be called in GDB B. Debugging without GDB i. How to turn off the normal operation of GCC and access specific parts of GCC C. Debugging tools D. Debugging the parser i. how machine macros and insn definitions affect the parser E. Debugging the recognizer i. how machine macros and insn definitions affect the recognizer ditto for other components VIII. Data types used by GCC, with special reference to restrictions not specified in the formal definition of the data type IX. References to the literature for the algorithms used in GCC