@setfilename ../info/strings
@node Strings and Characters, Lists, Numbers, Top
@comment  node-name,  next,  previous,  up
@chapter Strings and Characters

@cindex strings
@cindex character arrays
@cindex characters
@cindex bytes

  Strings are used to send messages to users, to hold extracts from
buffers, and for many other purposes.  The print names of symbols are
strings.  Because strings are so important, a large number of functions
are provided expressly for manipulating them.  In Emacs Lisp,
programmers often use strings, but they seldom use characters, except
when defining keymaps.

  The length of a string (like any array) is fixed, and cannot be
altered.  In particular, strings in Lisp are @emph{not} terminated by
any particular character, such as @sc{ASCII} code 0, as they are in C.  This
means that any character, including the null character (@sc{ASCII} code 0),
is a valid element of a string.@refill
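
  As a small illustration of the point above, the following sketch uses
a string whose middle element is the null character, written with the
octal escape @samp{\000}.  The string literal is only an example; the
null character is counted and accessed like any other element.

@example
(length "a\000b")
     @result 3
(aref "a\000b" 1)
     @result 0
@end example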

  Since strings are considered arrays, you can operate on them with the
general array functions.  (@xref{Sequences Arrays Vectors}.)  For example, you can
access or change individual characters in a string by using the
@code{aref} and @code{aset} functions (@pxref{Arrays}).
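
  For instance, the following sketch reads and changes elements of a
string with @code{aref} and @code{aset}.  The variable name @code{str}
is chosen only for illustration.  Note that @code{aset} changes the
string in place and that the length of the string stays the same.

@example
(setq str (make-string 3 ?a))
     @result "aaa"
(aref str 0)
     @result 97
(aset str 1 ?b)
     @result 98
str
     @result "aba"
(length str)
     @result 3
@end example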

  Although strings are arrays, each element of a string requires only 8
bits, whereas each element of a vector requires a minimum of 32.  This
means that a string takes up much less memory than a vector of the same
length.

  @xref{Text}, for information about functions that display strings or
copy them into buffers.

  @xref{Character Type}, and @ref{String Type}, for information about
the syntax of characters and strings.

@menu
* Predicates for Strings::
* Creating Strings::
* Comparison of Characters and Strings::
* Conversion of Characters and Strings::
* Formatting Strings::
* Character Case::
@end menu

@node Predicates for Strings, Creating Strings, Strings and Characters, Strings and Characters
@section The Predicates for Strings

For more information about general sequence and array predicates,
see @ref{Sequences Arrays Vectors}, and @ref{Arrays}.

@defun stringp object
  This function returns @code{t} if @var{object} is a string, @code{nil}
otherwise.
@end defun

@defun char-or-string-p object
  This function returns @code{t} if @var{object} is a string or a
character (i.e., an integer), @code{nil} otherwise.
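
  The following short sketch contrasts the two predicates.  The
arguments are arbitrary examples: a string, a character, and a symbol.

@example
(stringp "abc")
     @result t
(stringp ?a)
     @result nil
(char-or-string-p "abc")
     @result t
(char-or-string-p ?a)
     @result t
(char-or-string-p 'abc)
     @result nil
@end example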
@end defun

@node Creating Strings, Comparison of Characters and Strings, Predicates for Strings, Strings and Characters
@section Creating Strings

  The following functions return a string.  For example, in
@file{emacs/lisp/undigest.el}, the @code{make-string} function is used
to generate a string of similar characters to use in searching for the
separator line in a mail digest.  In the @code{dired-uncompress}
function in @file{emacs/lisp/dired.el}, the @code{substring} function is
used to extract a filename without its @samp{.Z} extension, whereas in
@code{dired-compress}, the @code{concat} function is used to append
@samp{.Z} to a filename.

@defun make-string integer character
  This function returns a string made up of @var{integer} repetitions of
@var{character}.

@example
(make-string 5 ?x)
     @result "xxxxx"
(make-string 0 ?x)
     @result ""
@end example

  Compare this function to @code{char-to-string}, @code{make-vector},
and @code{make-list}.  (@xref{Conversion of Characters and Strings},
@ref{Vectors}, and @ref{Building Cons Cells and Lists}.)

@end defun

@defun substring string start &optional end
  This function returns a new string that consists of those characters
from @var{string} in the range from (and including) the character at the
index @var{start} up to (but excluding) the character at the index
@var{end}.  The first character is numbered zero.

@group
@example
(substring "abcdefg"  0 3)
     @result "abc"
@end example
@end group

@noindent
In the example, the index for @samp{a} is 0, the index for @samp{b} is 1,
and the index for @samp{c} is 2.  Thus, three letters, @samp{abc}, are
copied from the full string.  The index @samp{3} marks the character
position up to which the substring is copied.  The character whose index
is three is actually the fourth character in the string.

  Note that an index designates the position just before a character,
so that index 0 lies just before the first character.  This is exactly
analogous to the location of point in a buffer, which lies just before
the character that the cursor appears on top of, except that the
position of the first character in a buffer is 1, but the index of the
first character in a string is 0.  (@xref{Positions}.)

  A negative number counts from the end of the string, so that
@minus{}1 is the index of the last character of the string.

@group
@example
(substring "abcdefg" -3 -1)
     @result "ef"
@end example
@end group

@noindent
In this example, the index for @samp{e} is @minus{}3, the index for
@samp{f} is @minus{}2, and the index for @samp{g} is @minus{}1.  More
precisely, the starting index, @minus{}3, falls between the letters
@samp{d} and @samp{e}, and the ending index, @minus{}1, falls between
the letters @samp{f} and @samp{g}.  That is why only the two characters
between the two index positions, @samp{e} and @samp{f}, are copied.

When @code{nil} is used for @var{end}, the index falls after the last
character in the string.  Thus:

@group
@example
(substring "abcdefg" -3  nil)
     @result "efg"
@end example
@end group

  When given 0 for @var{start} and no @var{end} argument, the
@code{substring} function returns a string containing all the
characters of the string passed to it as an argument.

@group
@example
(substring "abcdefg" 0)
     @result "abcdefg"
@end example
@end group

@noindent
(But don't use @code{substring} to copy a whole string; use
@code{copy-sequence} instead.)
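
  A minimal sketch of copying a whole string with @code{copy-sequence}
follows.  The variable names @code{original} and @code{copy} are chosen
only for illustration.  The copy has the same contents as the original
but is a distinct object.

@example
(setq original "abcdefg")
     @result "abcdefg"
(setq copy (copy-sequence original))
     @result "abcdefg"
(equal original copy)
     @result t
(eq original copy)
     @result nil
@end example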

@cindex args-out-of-range error
  A @code{wrong-type-argument} error results if @var{start} is not an
integer, or if @var{end} is neither an integer nor @code{nil}.  An
@code{args-out-of-range} error results if @var{start} indicates a
character following @var{end}, or if either integer is out of range for
the string.@refill
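
  The following sketch shows one way to observe these errors, by
trapping them with @code{condition-case}.  The symbols @code{bad-range}
and @code{bad-type} are hypothetical markers returned by the handlers;
they are not part of @code{substring}.

@example
(condition-case nil
    (substring "abcdefg" 5 2)
  (args-out-of-range 'bad-range))
     @result bad-range
(condition-case nil
    (substring "abcdefg" 0 10)
  (args-out-of-range 'bad-range))
     @result bad-range
(condition-case nil
    (substring "abcdefg" "1")
  (wrong-type-argument 'bad-type))
     @result bad-type
@end example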
@end defun

@defun concat &rest sequences
@cindex copying sequences
@cindex copying strings
@cindex concatenation
  This function returns a new string consisting of the characters in the
arguments passed to it.  The arguments are not changed.  The arguments
may be strings, lists of numbers, vectors of numbers, or integers.  If
no arguments are passed to @code{concat}, the function returns an empty
string.

@example
(concat "abc" "-def")
     @result "abc-def"
(concat "abc" -123 (list 120 (+ 256 121)) [122])
     @result "abc-123xyz"
(concat "The " "quick brown " "fox.")
     @result "The quick brown fox."
(concat)
     @result ""
@end example

  The @code{concat} function always constructs a new string which is
not @code{eq} to any existing string.
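
  As a sketch of what this means in practice, the following example
passes a single non-empty string to @code{concat}.  The variable names
@code{original} and @code{copy} are chosen only for illustration.  The
result has the same characters as the argument but is a different
object.

@example
(setq original "abc")
     @result "abc"
(setq copy (concat original))
     @result "abc"
(equal original copy)
     @result t
(eq original copy)
     @result nil
@end example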

@cindex integer to decimal
  As a special feature, when an argument is an integer (not a sequence
of integers), it is converted to a string of digits making up the
decimal print representation of the integer.

@group
@example
(concat 137)
     @result "137"
(concat 54 321)
     @result "54321"
@end example
@end group

  For information about other concatenation functions, see
@code{mapconcat} in @ref{Mapping Functions}, @code{vconcat} in
@ref{Vectors}, and @code{append} in @ref{Building Cons Cells and Lists}.

@end defun

@node Comparison of Characters and Strings, Conversion of Characters and Strings, Creating Strings, Strings and Characters
@section Comparison of Characters and Strings

@cindex string equality

@defun char-equal character1 character2
  This function returns @code{t} if the arguments represent the same
character, @code{nil} otherwise.  This is done by comparing two integers
modulo 256.

@example
(char-equal ?x ?x)
     @result t
(char-to-string (+ 256 ?x))
     @result "x"
(char-equal ?x  (+ 256 ?x))
     @result t
@end example
@end defun

@defun string= string1 string2
  This function returns @code{t} if the characters of the two strings
match exactly.  Case is significant.  @code{string-equal} is the same as
@code{string=}.

@example
(string= "abc" "abc")
     @result t
(string= "abc" "ABC")
     @result nil
(string= "ab" "ABC")
     @result nil
@end example
@end defun

@defun string-equal string1 string2
  @code{string-equal} is another name for @code{string=}.
@end defun

@cindex lexical comparison
@defun string< string1 string2
@c (findex string< causes problems for permuted index!!)
  This function compares two strings a character at a time.  It finds
the first pair of corresponding characters, if any, that do not match.
If the numeric value of the @sc{ASCII} code of the character from
@var{string1} is less than that of the character from @var{string2},
then this function returns @code{t}; if it is greater, this function
returns @code{nil}.  If the two strings match entirely, the value is
@code{nil}.

  Lower case letters have higher numeric values in the @sc{ASCII}
character set than their upper case counterparts; digits and many
punctuation characters have lower numeric values than upper case
letters.

@group
@example
(string< "abc" "abd")
     @result t
(string< "abd" "abc")
     @result nil
(string< "123" "abc")
     @result t
@end example
@end group
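
  Because the comparison uses @sc{ASCII} codes, upper case letters sort
before lower case letters, and digits sort before both.  The strings in
the following sketch are arbitrary examples chosen to show this.

@example
(string< "ABC" "abc")
     @result t
(string< "abc" "ABC")
     @result nil
(string< "Zebra" "apple")
     @result t
(string< "9" "A")
     @result t
@end example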

  When the strings have different lengths, and they match
up to the length of @var{string1}, then the result is @code{t}.  If they
match up to the length of @var{string2}, the result is @code{nil}.
A string without any characters in it is the smallest possible string.

@group
@example
(string< "" "abc")
     @result t
(string< "ab" "abc")
     @result t
(string< "abc" "")
     @result nil
(string< "abc" "ab")
     @result nil
(string< "" "")
     @result nil                   
@end example
@end group
@end defun

@defun string-lessp string1 string2
@code{string-lessp} is another name for @code{string<}.
@end defun

@node Conversion of Characters and Strings, Formatting Strings, Comparison of Characters and Strings, Strings and Characters
@comment  node-name,  next,  previous,  up
@section Conversion of Characters and Strings

@cindex conversion
  Characters and strings may be converted into each other and into
integers.  @code{format} and @code{prin1-to-string}
(@pxref{Output Functions}) may also be used to convert Lisp objects into
strings.  @code{read-from-string} (@pxref{Input Functions}) may be used
to ``convert'' a string representation of a Lisp object into an object.
Also @code{concat}, @code{append}, and @code{vconcat} perform conversion
of an integer to decimal representation as a special feature.@refill
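
  The following sketch illustrates conversion in both directions with
@code{prin1-to-string} and @code{read-from-string}.  The value of
@code{read-from-string} is a cons cell whose @sc{car} is the object
read, so the example takes the @sc{car} of the result.  The list
@code{(a b c)} is an arbitrary example.

@example
(prin1-to-string 123)
     @result "123"
(prin1-to-string '(a b c))
     @result "(a b c)"
(car (read-from-string "(a b c)"))
     @result (a b c)
@end example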

  @xref{Documentation}, for a description of the functions
@code{single-key-description} and @code{text-char-description}, which
return a string representing the Emacs standard notation of the
argument character.  These functions are used primarily for printing
help messages.@refill

@defun char-to-string character
@cindex character to string
  This function returns a new string with a length of one character.
The value of @var{character}, modulo 256, is used to initialize the
element of the string.

This function is similar to @code{make-string} with an integer argument
of 1.  (@xref{Creating Strings}.)

@example
(char-to-string ?x)
     @result "x"
(char-to-string (+ 256 ?x))
     @result "x"
(make-string 1 ?x)
     @result "x"
@end example
@end defun

@defun string-to-char string
@cindex string to character
  This function returns the first character in @var{string}.  Since a
character is an integer, the value is printed as the decimal number of
the character's code.  (@xref{Character Type}.)

If the string is empty, the function returns 0, which is identical to
the result if the string begins with the null character, @sc{ASCII} code 0.

@example
(string-to-char "ABC")
     @result 65
(string-to-char "xyz")
     @result 120
(string-to-char "")
     @result 0
(string-to-char "\000")
     @result 0
@end example
@end defun

@defun int-to-string integer
@cindex integer to string
@cindex integer to decimal
  This function returns a string consisting of the digits of
@var{integer}, base ten.  When passed a positive integer as an argument,
this function returns a string with no sign.  When passed a negative
integer, the function returns a string with a leading minus sign.

@example
(int-to-string 256)
     @result "256"
(int-to-string -23)
     @result "-23"
@end example
@end defun

@defun string-to-int string
@cindex string to integer
  This function returns the integer value of the characters in @var{string},
read as a number in base ten.

  The string is read starting from (and including) the first character,
and it is read until a non-digit is encountered.  If the first character
is not a digit, this function returns 0.

@example
(string-to-int "256")
     @result 256
(string-to-int "25 is a perfect square.")
     @result 25
(string-to-int "X256")
     @result 0
@end example
@end defun

@node Formatting Strings, Character Case, Conversion of Characters and Strings, Strings and Characters
@comment  node-name,  next,  previous,  up
@section Formatting Strings
@cindex formatting strings
@cindex strings, formatting them

The @code{message} and @code{error} functions use conversion
specification sequences that are identical to those described here for
@code{format}.

@defun format string &rest objects
@cindex formatting
  This function returns a new string that is made up from the characters
in @var{string} with any conversion specification sequences in it
replaced by the print representations of the matching @var{objects}.

  A conversion specification sequence is a string of characters
beginning with a @samp{%}.  For example, if there is a @samp{%s} in
@var{string}, the @code{format} function does not copy the @samp{%s}
into the result as such, but substitutes the print representation of
one of the other arguments in its place.

@group
@example
(format "The value of fill-column is %d." fill-column)
     @result "The value of fill-column is 72."
@end example
@end group

  If @var{string} contains more than one conversion specification, the
conversion specifications are matched in order with successive
arguments.  Thus, the first conversion specification in @var{string} is
matched with the first argument that follows @var{string}; the second
conversion specification is matched with the second argument that
follows @var{string}, and so on.
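
  As a short sketch of this matching, the following call contains three
conversion specifications, which are matched in order with the three
arguments that follow the format string.  The arguments are arbitrary
examples.

@example
(format "%s has %d letters; in hex, that is %x."
        "specification" 13 13)
     @result "specification has 13 letters; in hex, that is d."
@end example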

Any extra conversion specifications (those for which there are no
matching arguments) have unpredictable behavior.  Any extra objects
are ignored.

Although conversion specifications are supposed to specify the types of
the objects with which they are matched, no error is reported if there
is a mismatch between the type specified by the conversion specification
and the type of its corresponding object.

@cindex @samp{%} in format
  The character @samp{%} begins a conversion specification.  The
characters following it indicate how the object should be represented.
The characters recognized are described below:

@table @samp
@item s
@cindex print representation
@cindex object to string
Replace the specification with the print representation of the object.
If there is no corresponding object, the empty string is used.

@item o
@cindex integer to octal
Replace the specification with the base eight representation of an
integer.

@item d
@cindex integer to decimal
Replace the specification with the base ten representation of an
integer.

@item x
@cindex integer to hexadecimal
Replace the specification with the base sixteen representation of an
integer.

@item c
@cindex character to string
Replace the specification with the print representation of a character.

@item %
A single @samp{%} is placed in the string.

@end table

@cindex invalid format error
  Any other conversion character results in an @samp{Invalid format
operation} error.

  Here are several examples:

@group
@example
(format "The name of this buffer is %s." (buffer-name))
     @result "The name of this buffer is strings.texinfo."

(format "In hash notation, this buffer is %s." (current-buffer))
     @result "In hash notation, this buffer is #<buffer strings.texinfo>."

(format "The octal value of 18 is %o, and the hex value is %x."
        18 18)
     @result "The octal value of 18 is 22, and the hex value is 12."
@end example
@end group
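
  The examples above do not use @samp{%c} or @samp{%%}.  The following
sketch, with arbitrary arguments, shows the same character formatted
first as a character and then as a decimal integer, and a literal
percent sign produced by @samp{%%}.

@example
(format "The character %c has the code %d." ?x ?x)
     @result "The character x has the code 120."
(format "%d%% done" 50)
     @result "50% done"
@end example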

@cindex numeric prefix
@cindex field width
@cindex padding
  All the specification characters allow an optional numeric prefix
between the @samp{%} and the character.  The optional numeric prefix
defines the minimum width for the object.  If the print representation
of the object contains fewer characters than this, then it is padded.
The padding character is normally a space, but if the numeric prefix
starts with a zero, zeros are used for padding. 

  The padding is on the left if the prefix is positive (or starts with
zero) and on the right if the prefix is negative.

@example
(format "%06d will be padded on the left with zeros" 123)
     @result "000123 will be padded on the left with zeros"
(format "%-6d will be padded on the right" 123)
     @result "123    will be padded on the right"
@end example

  No matter what the prefix, nothing in the print representation will
be truncated.  This allows the programmer to specify spacing exactly
without knowing how many characters there are in the object's print
representation.

  In the following three examples, the specifications @samp{%7s} and
@samp{%-7s} call for a minimum width of 7.  In the first case, the word
inserted in place of @samp{%7s} has only 3 letters, so 4 blank spaces
are inserted for padding.  In the second case, the word
@samp{specification} is 13 letters wide but is not truncated.  In the
third case, @samp{%-7s} puts the padding on the right.  (This does not
work in version 18, but does work in version 19.)

@example 
(format "The word `%7s' actually has %d letters in it." "foo" 
        (length "foo"))
     @result "The word `    foo' actually has 3 letters in it."  

(format "The word `%7s' actually has %d letters in it."
        "specification" 
        (length "specification")) 
     @result "The word `specification' actually has 13 letters in it."  

(format "The word `%-7s' actually has %d letters in it." "foo" 
        (length "foo"))
     @result "The word `foo    ' actually has 3 letters in it."  
;; @r{%-7s fails to work in version 18, but does work in version 19.}
;; @r{In version 18, padding is not inserted.}
@end example
@end defun

@node Character Case, , Formatting Strings, Strings and Characters
@comment node-name, next, previous, up 
@section Character Case

@cindex upper case 
@cindex lower case 
@cindex character case 
The character case functions change the case of both single characters
and characters within strings.  The functions only convert alphabetic
characters (the letters @samp{A} through @samp{Z} and @samp{a} through
@samp{z}); other characters are not converted.  The functions do not
modify the strings which are passed to them as arguments.
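
  As a brief sketch of the last point, the following example converts
the value of a variable and then inspects the variable again.  The
variable name @code{name} and its value are chosen only for
illustration.

@example
(setq name "Emacs Lisp")
     @result "Emacs Lisp"
(upcase name)
     @result "EMACS LISP"
(downcase name)
     @result "emacs lisp"
name
     @result "Emacs Lisp"
@end example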

  The examples below use the characters @samp{X} and @samp{x}, which
have @sc{ASCII} values 88 and 120 respectively.

@defun downcase string-or-char
  This function converts a character or a string to lower case.

  When the argument to @code{downcase} is a string, the function creates
a new string in which each letter in the argument that is upper case is
converted to lower case.@refill

  When the argument to @code{downcase} is a character, @code{downcase}
returns the corresponding lower case character if the argument is an
upper case letter, or the character itself if not.  The value is an
integer, so it is printed as the decimal number of the @sc{ASCII} code
for the character.

@example
(downcase "The cat in the hat")
     @result "the cat in the hat"

(downcase ?X)
     @result 120
@end example
@end defun

@defun upcase string-or-char
  This function converts a character or a string to upper case.

  When the argument to @code{upcase} is a string, the function creates a
new string in which each letter in the argument that is lower case is
converted to upper case.@refill

  When the argument to @code{upcase} is a character, @code{upcase}
returns the corresponding upper case character if the argument is a
lower case letter, or the character itself if not.  The value is an
integer, so it is printed as the decimal number of the @sc{ASCII} code
for the character.

@example
(upcase "The cat in the hat")
     @result "THE CAT IN THE HAT"

(upcase ?x)
     @result 88
@end example
@end defun

@defun capitalize string-or-char
@cindex capitalization
  This function capitalizes strings or characters.  If
@var{string-or-char} is a string, the function returns a string in
which each word of @var{string-or-char} is capitalized.  This means
that the first character of each word is converted to upper case and
the rest are converted to lower case.

  A word is any sequence of consecutive characters that are assigned to
the @code{word} category in the current syntax table
(@pxref{Syntax Classes}).

  When the argument to @code{capitalize} is a character,
@code{capitalize} returns the upper case of that character if it is a
letter, or the character itself if not.  The value is an integer, so it
is printed as the decimal number of the @sc{ASCII} code for the
character.

@example
(capitalize "The cat in the hat")
     @result "The Cat In The Hat"

(capitalize "THE 77TH-HATTED CAT")
     @result "The 77th-Hatted Cat"

(capitalize ?x)
     @result 88
@end example
@end defun