Getfline specification
Note: This is a revised specification for getfline. The original
specification is at gflspec0.html.
There is an implementation at
https://richardhartersworld.com/cri_a/source_code/getfline/.
Synopsis:

    #include "getfline.h"

Code inserted by getfline.h:

    #include <stdio.h>

    enum gfl_flags {
        gfl_clean = 0x1,
        gfl_eofok = 0x2,
        gfl_cut   = 0x4,
        gfl_trunc = 0x8,
        gfl_omit  = 0x10,
        gfl_exit  = 0x20,
        gfl_log   = 0x40,
        gfl_nomax = 0x80
    };

    enum gfl_errors {
        gfl_ok = 0,
        gfl_stream,
        gfl_maxlen,
        gfl_flags,
        gfl_long,
        gfl_badeof,
        gfl_io,
        gfl_storage,
        gfl_corrupt
    };

    struct gfl_cb {
        void *          pvt;
        size_t          length;
        enum gfl_errors errno;
        size_t          maxlen;
        unsigned int    flags;
    };

    char * getfline (FILE* fptr, struct gfl_cb * cb);
    int gfl_terminate ( struct gfl_cb * cb);
Description

Function getfline reads lines from a stream. Each time it is called it fetches the next line until there are none left. Getfline has fairly elaborate error detection and reporting facilities. It also permits minimizing storage allocation and deallocation.

The user may choose either to get “clean” copies of each line or “transient” copies of each line, depending on the source of the storage for the line. In clean copy mode each line is a separate storage element allocated by malloc; the user code is responsible for freeing the storage for the line. In transient copy mode the storage for the lines comes from a buffer maintained by getfline; the line persists until the next call to getfline. Getfline has the entire responsibility for managing the storage for transient copies; it is an error to try to free the storage for a transient copy.

Getfline calling sequence

Argument fptr is the stream from which to read characters. It must have previously been opened with fopen. fptr==NULL is an argument error; other invalid values for fptr produce an I/O error.

Argument cb is the getfline “control block”. If cb is NULL getfline will run in clean copy mode with no bounds check and will ignore a missing final EOL. This is the default mode; use it when simplicity is more important than efficiency and error checking. The control block can be changed between calls to getfline, including switching back and forth from a NULL control block.
Getfline return

If a line was read, getfline returns a pointer to a buffer containing the line. The line will be terminated with a 0 character and will have space for at least one additional character. (The extra space is there in case the user wants to add an EOL character.)
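The spare byte can be used as in the following minimal sketch. The helper append_eol is hypothetical and not part of getfline; it assumes that the terminating 0 sits at index length (the value reported in the control block) and that the spare byte immediately follows it.

```c
#include <stddef.h>

/* Hypothetical helper, not part of getfline.
 * Assumes line[length] is the terminating 0 and line[length + 1]
 * is the one spare byte that the specification promises. */
static char *append_eol(char *line, size_t length)
{
    line[length] = '\n';      /* overwrite the old terminator with an EOL */
    line[length + 1] = '\0';  /* re-terminate in the spare byte */
    return line;
}
```

With a control block in use, a call might look like append_eol(line, cb.length) before the line is written out.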
Function gfl_terminate

Gfl_terminate is there for a special case. Ordinarily getfline will clean up its private data after the last line is extracted. However the user may break out of the read loop before the last line is read. If so, gfl_terminate should be called to clean up the private data. Failure to do so is not fatal; however it will leak some memory. A return of zero means success; a nonzero return is an error.
Structure gfl_cb

The gfl_cb structure is used to pass data to and from getfline. It has two input fields, maxlen and flags. Maxlen is a bound on the length of the line that will be returned. The flags word holds flags that control the course of processing.

There are two output fields, length and errno. Length is the length of the returned line, not counting the terminating 0. Errno is set when getfline does not return a line. Errno is 0 if the termination was normal and nonzero if there was an error.

There is one private field, pvt, that is an opaque pointer to a hidden structure created by getfline. It is used to hold state data that persists from one call to the next.

If a control block is being used it should be populated as follows:

    struct gfl_cb cb = {0,0,0,maxlen,flags};

where maxlen is the maximum line length permitted, and flags is set as the or of the selected flags. Maxlen may be 0 if the nomax flag is selected. Fields two and three (outputs length and errno) don’t need to be initialized; however, setting them to zero might be good practice. The first field, the opaque pointer pvt, MUST be NULL when the first line is read from the file; failure to properly initialize pvt may produce unfortunate behaviour.
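As an alternative to the positional initializer, a C99 designated initializer names the two input fields and leaves the rest zeroed, so pvt starts out NULL. This is only a sketch: the declarations are copied from the synopsis so that it compiles on its own (real code would include getfline.h), and the helper gfl_cb_init is a hypothetical name, not part of getfline.

```c
#include <stddef.h>

/* Copied from the synopsis so the sketch is self-contained. */
enum gfl_flags {
    gfl_clean = 0x1, gfl_eofok = 0x2, gfl_cut = 0x4, gfl_trunc = 0x8,
    gfl_omit = 0x10, gfl_exit = 0x20, gfl_log = 0x40, gfl_nomax = 0x80
};
enum gfl_errors {
    gfl_ok = 0, gfl_stream, gfl_maxlen, gfl_flags, gfl_long,
    gfl_badeof, gfl_io, gfl_storage, gfl_corrupt
};
/* Note: the field is called errno, so this sketch deliberately does not
 * include <errno.h>, where errno is defined as a macro. */
struct gfl_cb {
    void *pvt;
    size_t length;
    enum gfl_errors errno;
    size_t maxlen;
    unsigned int flags;
};

/* Hypothetical helper: build a control block ready for the first call. */
static struct gfl_cb gfl_cb_init(size_t maxlen, unsigned int flags)
{
    /* Fields not named here (pvt, length, errno) are zero-initialized. */
    struct gfl_cb cb = { .maxlen = maxlen, .flags = flags };
    return cb;
}
```

For example, gfl_cb_init(80, gfl_cut | gfl_eofok) yields a block with maxlen 80, both flags set, and a NULL pvt.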
Usage modes

Getfline can produce either “clean copies” or “transient copies”. Clean copy mode will be used if there is a NULL control block pointer or if the gfl_clean flag is set; when a control block is supplied without gfl_clean, transient copy mode is used.

In clean copy mode getfline allocates a new buffer for each line that is read. The user is responsible for freeing the storage for the lines. Clean copy mode is appropriate when we want to keep part or all of the file in memory.

In transient copy mode getfline reuses the line buffer; the previous contents, if any, may be overwritten. Getfline handles buffer storage management; the user does not have to allocate or free space for lines. Transient copy mode is appropriate when the contents of a line are immediately processed. Transient copy mode may be more efficient than clean copy mode because it minimizes the number of calls to malloc and free.

When getfline returns a NULL pointer it also frees any buffers that it has allocated.
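Because a transient copy is only valid until the next call, code that occasionally needs to keep a line has to duplicate it first. A minimal sketch follows; keep_line is a hypothetical helper, not part of getfline, and it assumes the caller passes the length reported in the control block.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of getfline: duplicate a transient line
 * so that it survives the next call to getfline.
 * Returns NULL on allocation failure; the caller frees the copy. */
static char *keep_line(const char *line, size_t length)
{
    char *copy = malloc(length + 1);
    if (copy != NULL) {
        memcpy(copy, line, length);
        copy[length] = '\0';
    }
    return copy;
}
```

If most lines need to be kept, setting gfl_clean is likely the simpler choice, since getfline then hands over ownership of each buffer directly.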
Anomalous lines

There are two kinds of anomalous lines that can occur: a prematurely terminated last line (i.e., one that lacks an EOL (\n) marker), and lines that are longer than maxlen. A prematurely terminated last line will be treated as an error unless the gfl_eofok flag is set. Getfline will not check for long lines if the gfl_nomax flag is set.

If getfline is checking for long lines it provides four different ways to handle “long” lines: they can be treated as errors, they can be omitted, they can be truncated, or they can be cut into pieces that are maxlen bytes long. The default is to treat “long” lines as errors. One of the other three choices can be selected by setting one of the following three flags: gfl_omit, gfl_trunc, or gfl_cut. Only one of these flags can be set; setting more than one is an argument error.
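The rule that at most one long-line flag may be set can be checked before calling getfline. The sketch below is an assumption about how a caller might do this, not a description of getfline's own check; the enum is copied from the synopsis and long_line_flags_ok is a hypothetical name.

```c
/* Copied from the synopsis so the sketch is self-contained. */
enum gfl_flags {
    gfl_clean = 0x1, gfl_eofok = 0x2, gfl_cut = 0x4, gfl_trunc = 0x8,
    gfl_omit = 0x10, gfl_exit = 0x20, gfl_log = 0x40, gfl_nomax = 0x80
};

/* Hypothetical check: returns 1 if at most one of gfl_omit, gfl_trunc
 * and gfl_cut is set in flags, 0 otherwise. */
static int long_line_flags_ok(unsigned int flags)
{
    unsigned int policy = flags & (gfl_omit | gfl_trunc | gfl_cut);
    /* A value with zero or one bit set satisfies policy & (policy - 1) == 0. */
    return (policy & (policy - 1u)) == 0;
}
```

Flags outside the three long-line choices, such as gfl_eofok or gfl_log, are masked off and do not affect the result.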
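Setting flags

Getfline uses an enumerated type, gfl_flags, for the various control flags. The general plan is that each separate flag is a different single bit. The flags field is an unsigned int constructed by or-ing the desired flags together. For example:

    cb.flags = gfl_eofok | gfl_cut | gfl_log;

The flags that can be selected are:

    gfl_clean   produce clean copies (a separately allocated buffer per line)
    gfl_eofok   accept a last line that lacks an EOL marker
    gfl_cut     cut long lines into pieces that are maxlen bytes long
    gfl_trunc   truncate long lines
    gfl_omit    omit long lines
    gfl_exit    call exit when an error is detected
    gfl_log     write an error message to stderr when an error is detected
    gfl_nomax   do not check for long lines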
Responses to errors by getfline

When getfline detects an error or when it reaches an EOF it returns a NULL line. If there was no control block in the calling sequence it is up to the user to determine whether or not there was an error. The detectable errors in this mode are a memory allocation fault, an I/O error while attempting to read the file, or a NULL file pointer. To disambiguate, first check for a NULL file pointer, since feof and ferror must not be called on one. If the pointer is not NULL, check for an EOF with feof (normal termination). If that fails, check for an I/O error with ferror. If there was no I/O error assume that there was an allocation failure.

When there is a control block in the calling sequence getfline sets the errno field. If the gfl_log flag is set it will also write an error message to stderr. If the gfl_exit flag is set it will call exit. The possible values for the errno field are the members of enum gfl_errors given in the synopsis: gfl_ok (0, normal termination), gfl_stream, gfl_maxlen, gfl_flags, gfl_long, gfl_badeof, gfl_io, gfl_storage, and gfl_corrupt.
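For the case without a control block, the disambiguation can be sketched with standard stdio calls only. The enum null_cause and the function classify_null_return are hypothetical names introduced for illustration. The sketch tests for a NULL stream before anything else, because feof and ferror must not be called on a NULL pointer.

```c
#include <stdio.h>

/* Hypothetical result type for this sketch; not part of getfline. */
enum null_cause {
    cause_null_stream,   /* fptr itself was NULL (argument error) */
    cause_eof,           /* normal termination */
    cause_io_error,      /* the stream's error indicator is set */
    cause_alloc_assumed  /* none of the above: assume allocation failure */
};

/* Call after getfline(fptr, NULL) has returned NULL. */
static enum null_cause classify_null_return(FILE *fptr)
{
    if (fptr == NULL)
        return cause_null_stream;
    if (feof(fptr))
        return cause_eof;
    if (ferror(fptr))
        return cause_io_error;
    return cause_alloc_assumed;
}
```

The last case is an inference by elimination, as the text says, so it is best treated as "probably out of memory" rather than as a certainty.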
If there is an error the last line, if any, will be lost. However the stream will not be closed and the file pointer will point at the last character read.
Sample usage

Here are a couple of usage examples. For each we assume the following includes and declarations:

    #include "getfline.h"
    ...
    FILE *fptr;
    char *line = 0;
    struct gfl_cb cb = {0,0,0,0,0};

The first illustrates not using a control block. It produces clean copies. In this example we are reading a file called somefile.txt. Each line is passed to a processing function that takes ownership of the line and the responsibility for freeing its storage.

    fptr = fopen("somefile.txt","r");
    while ((line = getfline(fptr,NULL)) != NULL) {
        process_line(line);
    }
    /* Error checks here if desired */
    if (fptr) fclose(fptr);

Example two illustrates transient copy mode. In example two we are reading input from stdin and writing it to stdout. However, in lines longer than 80 characters we insert new line characters every 80 characters.

    cb.maxlen = 80;
    cb.flags = gfl_cut;
    while ((line = getfline(stdin,&cb)) != NULL) {
        fprintf(stdout,"%s\n",line);
    }
    /* errno is nonzero only if getfline stopped because of an error */
    if (cb.errno) {/* do error stuff */}

This page was last updated November 23, 2007.