C string handling


In computer programming, a C string is a character string stored as an array containing the characters and terminated with a null character ('\0', called NUL in ASCII). The name refers to the C programming language, which uses this null-terminated string representation. Alternative names are null-terminated string and ASCIIZ (note that C strings do not imply the use of ASCII).

The length of a C string is found by searching for the (first) NUL byte. This can be slow, as it takes O(n) (linear) time with respect to the string length. It also means that a string cannot contain a NUL, because the first NUL encountered marks its end.
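The linear scan described above can be sketched as follows. The name my_strlen is a hypothetical stand-in used only for illustration; real implementations of strlen are usually more heavily optimised, but they still have to examine every byte up to the terminator.

```c
#include <stddef.h>

/* Hypothetical helper (not a library function): counts the bytes that
 * precede the first NUL, which is the result strlen is specified to give. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {  /* one comparison per byte: O(n) */
        n++;
    }
    return n;
}
```

Because the scan stops at the first NUL, any bytes stored after it are never seen and do not count towards the length.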

Definitions

The term string is used in C to describe a contiguous sequence of characters terminated by and including the first null byte.[1] A common misconception is that a string is an array, because string literals are converted to arrays during the compilation (or translation) phase.[2] A string ends at the first NUL byte, so an array or string literal that contains a null byte before the last byte contains a string, or possibly several strings, but is not itself a string.[3]
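The distinction between an array and the strings it contains can be illustrated with a short sketch. The names two_strings and count_strings are hypothetical, and the helper assumes that the last byte of the buffer is a NUL.

```c
#include <stddef.h>
#include <string.h>

/* An 8-byte array: 'a','b','c','\0','d','e','f','\0'.
 * It holds two strings, "abc" and "def", but is not itself one string. */
static const char two_strings[] = "abc\0def";

/* Hypothetical helper: counts the strings stored back to back in
 * buf[0..size). Assumes buf[size - 1] is a NUL byte. */
size_t count_strings(const char *buf, size_t size)
{
    size_t count = 0;
    size_t pos = 0;
    while (pos < size) {
        pos += strlen(buf + pos) + 1;  /* skip one string and its terminator */
        count++;
    }
    return count;
}
```

Here sizeof two_strings reports the size of the whole array, while strlen(two_strings) only measures the first string.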

The term pointer to a string is used in C to describe a pointer to the initial (lowest addressed) byte of a string.[1] Because strings are passed to functions in C by pointer, documentation (including this page) will often use the term string when the correct term is pointer to a string.

The term length of a string is used in C to describe the number of bytes preceding the null character.[1] strlen is a standardised function commonly used to determine the length of a string.

Overview of functions

Most of the functions that operate on C strings are declared in the string.h header (cstring in C++). This header contains declarations of functions and types used not only for handling C strings but also for various memory handling tasks; the name is thus something of a misnomer.

Functions declared in string.h are extremely popular because, as part of the C standard library, they are guaranteed to work on any platform that supports C. However, some security issues exist with these functions, such as buffer overflows, leading programmers to prefer safer, possibly less portable variants. Also, the string functions only work with character encodings made of bytes, such as ASCII and UTF-8. In historical documentation the term "character" was often used instead of "byte", which if followed literally would mean that multi-byte encodings such as UTF-8 were not supported. The BSD documentation has been fixed to make this clear, but POSIX, Linux, and Windows documentation still uses "character" in many places. Character encodings made up of code units larger than bytes, such as UTF-16, are generally handled through the functions in wchar.h.
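The difference between bytes and characters matters for UTF-8 text. The sketch below assumes well-formed UTF-8 input; utf8_code_points is a hypothetical helper, not a standard function. It counts every byte that is not a UTF-8 continuation byte (those of the form 10xxxxxx).

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a well-formed UTF-8 string
 * by skipping continuation bytes. strlen would count every byte instead. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;  /* this byte starts a new code point */
        }
    }
    return count;
}
```

For the string "caf\xc3\xa9" (the word "café" encoded as UTF-8), strlen reports five bytes, while this helper reports four code points.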

Constants and types

  • NULL - a macro expanding to the null pointer constant; that is, a constant representing a pointer value which is guaranteed not to be a valid address of an object in memory
  • size_t - an unsigned integer type which is the type of the result of the sizeof operator

Functions

String manipulation
  • strcpy - copies one string to another
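  • strncpy - writes exactly n bytes to a string, copying from the source and padding with null bytes if the source is shorter than n; the result is not null-terminated if the source is n bytes or longer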
  • strcat - appends one string to another
  • strncat - appends no more than n bytes from one string to another
  • strxfrm - transforms a string according to the current locale
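The behaviour of strncpy is a common source of confusion: it pads short sources with null bytes but leaves the destination unterminated when the source does not fit. A minimal sketch of one way to guard against the latter is shown below; copy_truncating is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dst_size bytes),
 * truncating if necessary and always null-terminating the result. */
void copy_truncating(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return;                        /* no room even for the terminator */
    }
    strncpy(dst, src, dst_size - 1);   /* may leave dst unterminated... */
    dst[dst_size - 1] = '\0';          /* ...so terminate explicitly */
}
```

With a four-byte destination and the source "abcdef", this leaves "abc" in the destination. Whether silent truncation is acceptable depends on the application.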
String examination
  • strlen - returns the length of a string
  • strcmp - compares two strings
  • strncmp - compares specific number of bytes in two strings
  • strcoll - compares two strings according to the current locale
  • strchr - finds the first occurrence of a byte
  • strrchr - finds the last occurrence of a byte
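  • strspn - finds the first occurrence of a byte not in a set of bytes, returning its index (the length of the initial segment made up only of bytes in the set)
  • strcspn - finds the first occurrence of a byte in a set of bytes, returning its index (the length of the initial segment made up only of bytes not in the set)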
  • strpbrk - finds the first occurrence of a byte in a set of bytes
  • strstr - finds the first occurrence of a substring
  • strtok - finds the next occurrence of a token
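As an illustration of how strspn and strcspn complement each other, the following sketch locates the first blank-delimited word in a string without modifying it. The helper name first_word and the choice of space and tab as delimiters are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first word in s and
 * stores the word's length in *len. The word is not null-terminated. */
const char *first_word(const char *s, size_t *len)
{
    s += strspn(s, " \t");     /* skip the leading run of blanks */
    *len = strcspn(s, " \t");  /* count bytes up to the next blank or the end */
    return s;
}
```

Unlike strtok, this approach does not write to the input string and keeps no hidden state between calls.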
Miscellaneous
  • strerror - returns a pointer to a string containing an error message derived from the error code passed in as errnum.[4] The strerror function is not required to be reentrant.
Memory manipulation
  • memset - fills a buffer with a byte repeated
  • memcpy - copies one buffer to another
  • memmove - copies one buffer to another, possibly overlapping, buffer
  • memcmp - compares two buffers
  • memchr - finds the first occurrence of a byte in a buffer
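The difference between memcpy and memmove matters whenever the source and destination overlap, for example when deleting bytes from the middle of a string in place. The sketch below uses a hypothetical helper, erase_bytes, and assumes that pos + count does not exceed the length of the string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes count bytes starting at index pos from the
 * string s, in place. Assumes pos + count <= strlen(s). */
void erase_bytes(char *s, size_t pos, size_t count)
{
    size_t len = strlen(s);
    /* Source and destination overlap, so memmove is required here;
     * memcpy has undefined behaviour for overlapping buffers.
     * The + 1 moves the terminating NUL along with the tail. */
    memmove(s + pos, s + pos + count, len - pos - count + 1);
}
```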

Numeric conversions

The C standard library contains several functions for numeric conversions. They are all declared in the stdlib.h header (cstdlib in C++).

  • atof - converts a string to a floating-point value
  • atoi, atol, atoll (C99/C++11) - convert a string to an integer
  • strtof (C99/C++11), strtod, strtold (C99/C++11) - convert a string to a floating-point value
  • strtol, strtoll (C99/C++11) - convert a string to a signed integer
  • strtoul, strtoull (C99/C++11) - convert a string to an unsigned integer
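The strto* functions report where parsing stopped and signal out-of-range values through errno, which the ato* functions do not. A minimal sketch of a checked conversion built on strtol follows; parse_long is a hypothetical helper, and treating trailing characters as an error is a policy choice made for this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses a base-10 long from s.
 * Returns 1 and stores the value in *out on success, 0 on failure. */
int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;                      /* strtol only sets errno on error */
    long value = strtol(s, &end, 10);
    if (end == s) {
        return 0;                   /* no digits were consumed */
    }
    if (*end != '\0') {
        return 0;                   /* trailing characters after the number */
    }
    if (errno == ERANGE) {
        return 0;                   /* value does not fit in a long */
    }
    *out = value;
    return 1;
}
```

By comparison, atoi("12abc") simply yields 12 and gives the caller no way to detect the trailing characters.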

Popular extensions

  • memccpy - SVID, POSIX - copies up to specified number of bytes between two memory areas, which must not overlap, stopping when a given byte is found[5]
  • mempcpy - GNU - a variant of memcpy returning a pointer to the byte following the last written byte
  • strcat_s - ISO/IEC WDTR 24731 - a variant of strcat that checks for errors, such as destination buffer being too small, before copying
  • strcpy_s - ISO/IEC WDTR 24731 - a variant of strcpy that checks for errors, such as destination buffer being too small, before copying
  • strdup - POSIX - allocates and duplicates a string
  • strerror_r - POSIX.1 - a variant of strerror that is thread-safe
  • strlcpy - a variant of strcpy that truncates the copied string if the destination is too small[6]
  • strlcat - a variant of strcat that truncates the appended string if the destination is too small[6]
  • strsignal - POSIX:2008 - returns a string representation of a signal code; not thread-safe[7]
  • strtok_r - POSIX - a variant of strtok that is thread-safe[8]
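Because these extensions are not available everywhere, the truncating-copy semantics described for strlcpy can be illustrated with a portable sketch. The function below, sketch_strlcpy, is a hypothetical re-implementation written for this example and is not taken from any particular library; it follows the documented contract of copying at most size - 1 bytes, always terminating when size is non-zero, and returning the length of the source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of strlcpy-style semantics (not a library function).
 * Returns strlen(src); a return value >= size means truncation occurred. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);
    if (size > 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dst, src, n);   /* copy what fits */
        dst[n] = '\0';         /* always terminate */
    }
    return src_len;
}
```

A caller can detect truncation by comparing the return value with the destination size, which is the design choice that distinguishes this interface from strncpy.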

Criticism

strcat_s and strcpy_s attracted considerable criticism because, even though they are defined in the ISO/IEC WDTR 24731 standard, they are currently supported only by Microsoft Visual C++. Some have speculated that the warning messages produced by Microsoft's compilers, which suggest that programmers use these functions instead of the standard ones, are an attempt by Microsoft to lock developers into its platform.[9][10]

strlcpy and strlcat have been criticised on the grounds that they create more problems than they solve[11] and that they lack documentation.[12] Consequently, they have not been included in Linux, even though several other operating systems, notably OpenBSD, FreeBSD, Solaris, and Mac OS X, implement them.

References

  1. "The C99 standard draft + TC3". Section 7.1.1p1. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf. Retrieved 7 January 2011.
  2. "The C99 standard draft + TC3". Section 6.4.5p7. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf. Retrieved 7 January 2011.
  3. "The C99 standard draft + TC3". Section 6.4.5 footnote 66. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf. Retrieved 7 January 2011.
  4. strerror
  5. "memccpy". Pubs.opengroup.org. http://pubs.opengroup.org/onlinepubs/009695399/functions/memccpy.html. Retrieved 9 November 2011.
  6. Todd C. Miller; Theo de Raadt (1999). "strlcpy and strlcat - consistent, safe, string copy and concatenation". USENIX '99. http://www.gratisoft.us/todd/papers/strlcpy.html.
  7. "strsignal". Pubs.opengroup.org. http://pubs.opengroup.org/onlinepubs/9699919799/functions/strsignal.html. Retrieved 9 November 2011.
  8. "strtok". Pubs.opengroup.org. http://pubs.opengroup.org/onlinepubs/009695399/functions/strtok.html. Retrieved 9 November 2011.
  9. Danny Kalev. "They're at it again". InformIT. http://www.informit.com/blogs/blog.aspx?uk=Theyre-at-it-again. Retrieved 10 November 2011.
  10. "Security Enhanced CRT, Safer Than Standard Library?". http://fsfoundry.org/codefreak/2008/09/15/security-crt-safer-than-standard-library/. Retrieved 10 November 2011.
  11. libc-alpha mailing list, selected messages from 8 August 2000 thread: 53, 60, 61
  12. Antill, James. "Security with string APIs: Security relevant things to look for in a string library API". http://www.and.org/vstr/security#strncpy-ex7. Retrieved 10 November 2011.

Wikimedia Foundation. 2010.
