Richard Harter’s World
Site map
November 2007
Mathcomp
Getfline design doc
email

Getfline specification

Note: This is a revised specification for getfline. The original specification is at gflspec0.html.

There is an implementation at http://richardhartersworld.com/cri_a/source_code/getfline/.
The directory contains two files, getfline.h and getfline.c, that implement this description.

Synopsis:

#include "getfline.h"

Code inserted by getfline.h
#include <stdio.h>
enum gfl_flags {
    gfl_clean = 0x1,  gfl_eofok = 0x2,  gfl_cut   = 0x4,
    gfl_trunc = 0x8,  gfl_omit  = 0x10, gfl_exit  = 0x20,
    gfl_log   = 0x40, gfl_nomax = 0x80
};
enum gfl_errors {
    gfl_ok = 0, gfl_stream, gfl_maxlen, gfl_flags,
    gfl_long,   gfl_badeof, gfl_io,     gfl_storage, 
    gfl_corrupt   
};
struct gfl_cb {
    void            * pvt;
    size_t            length;
    enum gfl_errors   errno;
    size_t            maxlen;
    unsigned int      flags;
};
char * getfline      (FILE* fptr, struct gfl_cb * cb);
int    gfl_terminate (            struct gfl_cb * cb);

Description

Function getfline reads lines from a stream. Each time it is called it fetches the next line until there are none left. Getfline has fairly elaborate error detection and reporting facilities. It also permits minimizing storage allocation and deallocation.

The user may choose either to get “clean” copies of each line or “transient” copies of each line, depending on the source of the storage for the line. In clean copy mode each line is a separate storage element allocated by malloc; the user code is responsible for freeing the storage for the line. In transient copy mode the storage for the lines comes from a buffer maintained by getfline; the line persists until the next call to getfline. Getfline has the entire responsibility for managing the storage for transient copies; it is an error to try to free the storage for a transient copy.

Getfline calling sequence

Argument fptr is the stream from which to read characters. It must have previously been opened with fopen. fptr==NULL is an argument error; other invalid values for fptr produce an I/O error.

Argument cb is the getfline “control block”. If cb is NULL getfline will run in clean copy mode with no bounds check and will ignore a missing final EOL. This is the default mode; use it when simplicity is more important than efficiency and error checking.

The control block can be changed between calls to getfline, including switching back and forth from a NULL control block.

Getfline return

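If a line was read, getfline returns a pointer to a buffer containing the line. The line will be terminated with a 0 character, and the buffer will have space for at least one additional character. (The extra space is there in case the user wants to add an EOL character.) If no line was read, either because of EOF or because of an error, getfline returns NULL.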

Function gfl_terminate

Gfl_terminate is there for a special case. Ordinarily getfline will clean up its private data after the last line is extracted. However, the user may break out of the read loop before the last line is read. If so, gfl_terminate should be called to clean up the private data. Failure to do so is not fatal; however, it will leak some memory. A return value of zero means success; a nonzero value indicates an error.

Structure gfl_cb

The gfl_cb structure is used to pass data to and from getfline. It has two input fields, maxlen and flags. Maxlen is a bound on the length of the line that will be returned. The flags word holds flags that control the course of processing.

There are two output fields, length and errno. Length is the length of the returned line, not counting the terminating 0. Errno is set when getfline does not return a line. Errno is 0 if the termination was normal and nonzero if there was an error.

There is one private field, pvt, that is an opaque pointer to a hidden structure created by getfline. It is used to hold state data that persists from one call to the next.

If a control block is being used it should be populated as follows:

    struct gfl_cb cb = {0,0,0,maxlen,flags};
where maxlen is the maximum line length permitted, and flags is set as the OR of the selected flags. Maxlen may be 0 if the gfl_nomax flag is selected. Fields two and three (outputs length and errno) don't need to be initialized; however, setting them to zero might be good practice. The first field, the opaque pointer pvt, MUST be NULL when the first line is read from the file; failure to properly initialize pvt may produce unfortunate behaviour.
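As a minimal sketch of the initialization just described, C99 designated initializers name the two input fields and leave every other field zeroed, including pvt. The declarations are repeated from the synopsis only so that the fragment stands alone; in real code they come from getfline.h. The helper name gfl_cb_make is hypothetical and is not part of getfline.

```c
#include <stddef.h>

/* Declarations repeated from the synopsis; normally supplied by getfline.h. */
enum gfl_flags {
    gfl_clean = 0x1,  gfl_eofok = 0x2,  gfl_cut   = 0x4,
    gfl_trunc = 0x8,  gfl_omit  = 0x10, gfl_exit  = 0x20,
    gfl_log   = 0x40, gfl_nomax = 0x80
};
enum gfl_errors {
    gfl_ok = 0, gfl_stream, gfl_maxlen, gfl_flags,
    gfl_long,   gfl_badeof, gfl_io,     gfl_storage,
    gfl_corrupt
};
struct gfl_cb {
    void            * pvt;
    size_t            length;
    enum gfl_errors   errno;
    size_t            maxlen;
    unsigned int      flags;
};

/* Hypothetical helper: build a control block ready for the first read.
 * Fields not named in the initializer (pvt, length, errno) are zeroed. */
static struct gfl_cb gfl_cb_make(size_t maxlen, unsigned int flags)
{
    struct gfl_cb cb = { .maxlen = maxlen, .flags = flags };
    return cb;
}
```

A call such as gfl_cb_make(80, gfl_cut | gfl_log) would then give a block equivalent to {0,0,0,80,gfl_cut|gfl_log}.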

Usage modes

Getfline can produce either “clean copies” or “transient copies”. Clean copy mode will be used if there is a NULL control block pointer or if the gfl_clean flag is set; when a control block is supplied, transient copy mode is the default.

In clean copy mode getfline allocates a new buffer for each line that is read. The user is responsible for freeing the storage for the lines. Clean copy mode is appropriate when we want to keep part or all of the file in memory.

In transient copy mode getfline reuses the line buffer; the previous contents, if any, may be overwritten. Getfline handles buffer storage management; the user does not have to allocate or free space for lines. Transient copy mode is appropriate when the contents of a line are immediately processed. Transient copy mode may be more efficient than clean copy mode because it minimizes the number of calls to malloc and free.
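Because a transient line is only valid until the next call, a program that wants to keep an occasional line while otherwise running in transient mode has to copy it first. The following is a small sketch of such a copy; keep_line is a hypothetical helper, not part of getfline, and it assumes the caller passes the length reported in the control block.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of getfline: make a private copy of a
 * transient line so that it survives the next call to getfline.
 * `length` is the line length without the terminating 0, as reported
 * in the length field of the control block.
 * Returns NULL if allocation fails; the caller frees the copy.
 */
static char *keep_line(const char *line, size_t length)
{
    char *copy = malloc(length + 1);
    if (copy != NULL) {
        memcpy(copy, line, length);
        copy[length] = '\0';
    }
    return copy;
}
```

Inside a transient-mode read loop this would be called as keep_line(line, cb.length) for the lines worth keeping.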

When getfline returns a NULL pointer it also frees any buffers that it has allocated.

Anomalous lines

There are two kinds of anomalous lines that can occur: a prematurely terminated last line (i.e., one that lacks an EOL (\n) marker), and lines that are longer than maxlen. A prematurely terminated last line will be treated as an error unless the gfl_eofok flag is set.

Getfline will not check for long lines if the gfl_nomax flag is set. If getfline is checking for long lines it provides four different ways to handle “long” lines; they can be treated as errors, they can be omitted, they can be truncated, or they can be cut into pieces that are maxlen bytes long.

The default is to treat “long” lines as errors. One of the other three choices can be selected by setting one of the following three flags: gfl_omit, gfl_trunc, or gfl_cut. Only one of these flags can be set; setting more than one is an argument error.

Setting flags

Getfline uses an enumerated type, gfl_flags, for the various control flags. The general plan is that each separate flag is a different single bit. The flags field is an unsigned int constructed by OR-ing the desired flags together. For example:

    cb.flags = gfl_eofok | gfl_cut | gfl_log;
The flags that can be selected are:
  • gfl_clean: Select this flag to produce clean copies; the default is to produce transient copies.
  • gfl_eofok: Select this flag to accept a prematurely terminated final line. The default is to treat it as an error.
  • gfl_cut: Select this flag to chop long lines into pieces of length maxlen.
  • gfl_trunc: Select this flag to truncate long lines.
  • gfl_omit: Select this flag to ignore long lines.
  • gfl_nomax: Select this flag to skip checking for long lines. The default is to check.
  • gfl_exit: Select this flag if you want getfline to call exit if there is an error.
  • gfl_log: Select this flag if you want getfline to write an error message to stderr if there is an error.
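The rules for a legal flags word (no bits outside the defined flags, and at most one of the three long-line options) can be checked before calling getfline. The following is only a sketch of such a pre-check under those two rules; flags_look_legal is a hypothetical name, and the check inside getfline itself may differ.

```c
/* Flag values repeated from the synopsis; normally supplied by getfline.h. */
enum gfl_flags {
    gfl_clean = 0x1,  gfl_eofok = 0x2,  gfl_cut   = 0x4,
    gfl_trunc = 0x8,  gfl_omit  = 0x10, gfl_exit  = 0x20,
    gfl_log   = 0x40, gfl_nomax = 0x80
};

/*
 * Hypothetical pre-check, not part of getfline: returns 1 if the flags
 * word looks legal under the two rules above, 0 otherwise.
 */
static int flags_look_legal(unsigned int flags)
{
    const unsigned int all = gfl_clean | gfl_eofok | gfl_cut | gfl_trunc |
                             gfl_omit  | gfl_exit  | gfl_log | gfl_nomax;
    const unsigned int long_opts = flags & (gfl_cut | gfl_trunc | gfl_omit);

    if (flags & ~all)
        return 0;   /* stray bits outside the defined flags */
    if (long_opts & (long_opts - 1))
        return 0;   /* more than one long-line option selected */
    return 1;
}
```

The expression x & (x - 1) clears the lowest set bit of x, so it is nonzero exactly when two or more long-line bits are set.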

Responses to errors by getfline

When getfline detects an error or when it reaches an EOF it returns a NULL line. If there was no control block in the calling sequence it is up to the user to determine whether or not there was an error. The detectable errors in this mode are a memory allocation fault, an I/O error while attempting to read the file, or a NULL file pointer.

To disambiguate, first check for a NULL file pointer, since feof and ferror must not be called with one. Otherwise check for an EOF with feof (normal termination). If that fails, check for an I/O error with ferror. If there was no I/O error, assume that there was an allocation failure.
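The checks can be collected into one small function. This is a sketch only: the enum and the function name why_null are hypothetical and not part of getfline. It tests the pointer before anything else because feof and ferror cannot be passed a NULL stream.

```c
#include <stdio.h>

/* Hypothetical names for the possible reasons; not part of getfline. */
enum null_cause { cause_eof, cause_null_stream, cause_io, cause_alloc };

/*
 * Hypothetical helper: call only after getfline(fptr, NULL) has
 * returned NULL, to work out why.
 */
static enum null_cause why_null(FILE *fptr)
{
    if (fptr == NULL) return cause_null_stream; /* must precede feof/ferror */
    if (feof(fptr))   return cause_eof;         /* normal termination */
    if (ferror(fptr)) return cause_io;          /* read error on the stream */
    return cause_alloc;                         /* by elimination */
}
```

The last case is a deduction rather than a direct test, so it is only as reliable as the assumption that nothing else can make getfline return NULL in this mode.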

When there is a control block in the calling sequence getfline sets the errno field. If the gfl_log flag is set it will also write an error message to stderr. If the gfl_exit flag is set it will call exit. The possible values for the errno field are:

  • gfl_ok: This says that there was no error. It has the integer value 0. The test for no error is (!cb.errno).
  • gfl_stream: This says that the file pointer was a NULL pointer or was otherwise detectably in error. Getfline checks that the file pointer has not changed from one read to the next.
  • gfl_maxlen: This says that the gfl_nomax flag was not set and that maxlen was 0.
  • gfl_flags: This says that the flags word was illegal. This can either be because there were stray bits in the word or because there was more than one “long line” option selected.
  • gfl_long: This says that there was a long line and that getfline wasn’t told what to do with it.
  • gfl_badeof: This says the file was prematurely terminated and the gfl_eofok flag was not set.
  • gfl_io: This says that there was an I/O error while attempting to read a line.
  • gfl_storage: This says that there was a storage allocation error. It’s probably time to die.
  • gfl_corrupt: This says that the private data in getfline is corrupt. This may be the result of a usage error, e.g., reusing a control block without properly initializing it.
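A caller that does not set gfl_log may still want readable diagnostics. The following sketch maps each code in the list above to a short text; gfl_error_text is a hypothetical helper, not part of getfline, and the wording of the messages is an assumption based on the descriptions above.

```c
/* Error codes repeated from the synopsis; normally supplied by getfline.h. */
enum gfl_errors {
    gfl_ok = 0, gfl_stream, gfl_maxlen, gfl_flags,
    gfl_long,   gfl_badeof, gfl_io,     gfl_storage,
    gfl_corrupt
};

/* Hypothetical helper, not part of getfline: short text for each code. */
static const char *gfl_error_text(enum gfl_errors e)
{
    switch (e) {
    case gfl_ok:      return "no error";
    case gfl_stream:  return "bad or changed file pointer";
    case gfl_maxlen:  return "maxlen is 0 and gfl_nomax is not set";
    case gfl_flags:   return "illegal flags word";
    case gfl_long:    return "long line with no long-line option selected";
    case gfl_badeof:  return "final line lacks an EOL and gfl_eofok is not set";
    case gfl_io:      return "I/O error while reading";
    case gfl_storage: return "storage allocation failure";
    case gfl_corrupt: return "private data is corrupt";
    }
    return "unknown error code";  /* value outside the enumeration */
}
```

After a read loop ends, a call such as fprintf(stderr, "%s\n", gfl_error_text(cb.errno)) would report the outcome.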

If there is an error, the last line, if any, will be lost. However, the stream will not be closed and the file pointer will point at the last character read.

Sample usage

Here are a couple of usage examples. For each we assume the following includes and declarations:

#include "getfline.h"
...
    FILE *fptr;
    char *line = 0;
    struct gfl_cb cb = {0,0,0,0,0};
The first illustrates not using a control block. It produces clean copies. In this example we are reading a file called somefile.txt. Each line is passed to a processing function that takes ownership of the line and the responsibility for freeing its storage.
    fptr = fopen("somefile.txt","r");
    while ((line = getfline(fptr,NULL)) != NULL) {
        process_line(line);
    }
    /* Error checks here if desired */
    if (fptr) fclose(fptr);
Example two illustrates transient copy mode. In example two we are reading input from stdin and writing it to stdout. However, in lines longer than 80 characters we insert newline characters every 80 characters.
    cb.maxlen = 80;
    cb.flags = gfl_cut;
    while ((line = getfline(stdin,&cb)) != NULL) {
        fprintf(stdout,"%s\n",line);
    }
    if (cb.errno) {/* do error stuff */}


This page was last updated November 23, 2007.
