Dickinson — C library for time series operations
Dickinson is a C library containing various utilities, mainly for time
series operations. The Python pthelma library is partly a front end to
Dickinson.
ts - Time series operations
The ts module contains utilities for time series management. It
has two data types: record for time series records,
and timeseries for time series. Memory management and
other operations on these two data types are performed by the
accompanying functions.
Basic operations
-
struct ts_record
Represents a time series record.
-
long_time_t int timestamp
- Number of seconds since 1 January 1970.
-
int null
- Boolean, indicating whether the value is missing.
-
double value
- The value of the time series record; irrelevant if
null is true.
-
char *flags
- A pointer to a string holding flags as space separated ASCII
words.
-
struct timeseries
- Represents a time series. Contains some internal attributes which
you should not attempt to access directly; instead, you should
create and destroy timeseries objects using
ts_create() and ts_free(), and use the functions
described below to insert, delete, and retrieve records, and
otherwise manipulate timeseries objects.
-
int ts_append_record(struct timeseries *ts, long_time_t timestamp, int null, double value, const char *flags, int *recindex, char **errstr)
- Append a record to the specified time series. Returns nonzero on
error, setting errstr to a static error message; the return value
is an appropriate errno. Returns in recindex the actual index
after adding the record.
-
void ts_clear(struct timeseries *ts)
- Delete all time series records.
-
struct timeseries *ts_create()
- Create and return a time series object. What it actually does is
malloc a memory block enough to hold the timeseries, which
it initializes with 0 records and a nonexistent memory block for
records. The memory block for records will automatically be
initialized when records are added to the time series. Returns NULL if
malloc() returns NULL.
-
int ts_delete_item(struct timeseries *ts, int index)
- Delete the record at index. Return index or -1 if such index
does not exist.
-
struct ts_record *ts_delete_records(struct timeseries *ts, struct ts_record *r1, struct ts_record *r2)
- Delete all records from r1 to r2 (inclusive), which must be
valid pointers to existing records within ts. Return r1 or
NULL if there is any error in the supplied pointers,
including r2*<*r1.
-
int ts_delete_record(struct timeseries *ts, long_time_t timestamp)
- Delete the record that has time stamp tm. Frees the memory
occupied by the record and the associated flag string, and shifts
following records as needed. Returns -1 if no such record exist or
the index of the record deleted.
-
void ts_free(struct timeseries *ts)
- Destroy a time series object. It frees all memory occupied by the
flag strings, the records, and the structure itself. Only call this
function if the object has been created by ts_create(); do
not call it if the object is an automatic or static variable, since
in that case it will attempt to free memory that has not been
dynamically allocated.
-
struct ts_record ts_get_item(struct timeseries *ts, int index)
- Return the record at index. If such a record does not
exist, a segmentation violation is likely.
-
struct ts_record *ts_get_next(struct timeseries *ts, long_time_t timestamp)
- Return first record with date >= timestamp, or NULL if such a
record does not exist.
-
struct ts_record *ts_get_prev(struct timeseries *ts, long_time_t timestamp)
- Return last record with date <= timestamp, or NULL if such a
record does not exist.
-
int ts_get(struct timeseries *ts, long_time_t timestamp)
- Return the record with given timestamp, or NULL if no such record
exists.
-
int ts_get_next_i(struct timeseries *ts, long_time_t timestamp)
-
int ts_get_prev_i(struct timeseries *ts, long_time_t timestamp)
-
int ts_get_i(struct timeseries *ts, long_time_t timestamp)
- These functions are the same as the ones without the _i suffix, except
that they return an index instead of a pointer to a
ts_record, and -1 if the record is not found.
-
int ts_insert_record(struct timeseries *ts, long_time_t timestamp, int null, double value, const char *flags, int allow_existing, int *recindex, char **errstr)
- Insert a record to the specified time series. Returns nonzero on
error, setting errstr to a static error message; the return value
is an appropriate errno. Returns in recindex the actual index
after adding the record. If a record with the specified timestamp
already exists, it returns an error, except if allow_existing is
nonzero, in which case the existing record is overwritten.
-
int ts_length(struct timeseries *ts)
- Return the number of records of the time series.
-
int ts_merge(struct timeseries *ts1, struct timeseries *ts2, char **errstr)
- Merge ts2 into ts1. The two time series must not have any
common timestamps, and after merging ts2 records must be
consecutive in ts1 (i.e. there must be no intermixing of
records). Returns 0 on success, or an appropriate errno on error,
in which case it also sets errstr to an appropriate error
message.
-
double ts_min(struct timeseries *ts, long_time_t start_date, long_time_t end_date)
-
double ts_max(struct timeseries *ts, long_time_t start_date, long_time_t end_date)
-
double ts_average(struct timeseries *ts, long_time_t start_date, long_time_t end_date)
-
double ts_sum(struct timeseries *ts, long_time_t start_date, long_time_t end_date)
Return minimum, maximum, average, or sum of the time series in the
specified interval. Use LLONG_MIN and LLONG_MAX
as the start_date and end_date to return the value for the
entire time series.
If the value cannot be computed (e.g. because the time series
does not have any not-null values in the specified interval),
these functions return NAN.
-
int ts_merge_anyway(struct timeseries *ts1, struct timeseries *ts2, char **errstr)
- Merge ts2 into ts1. ts1 records with timestamps that exist in
ts2 are overwritten. ts2 records can be interspersed with ts1
records. Returns 0 on success, or an appropriate errno on error,
in which case it also sets errstr to an appropriate error
message.
-
int ts_readline(char *line, struct timeseries *ts, char **errstr)
Read a comma delimited line of input and insert that record in
the time series.
The line must have the format datestr,value,flags,
where value is a floating point number (using a dot as the
decimal separator, regardless of system settings), and flags is
string of space separated ASCII words; value and flags can be
empty. datestr is the date in one of the date formats accepted by
parsedatestring(). If a record with that date already
exists in the time series, it is replaced; otherwise, a new record
is inserted in the appropriate position. Returns 0 on success, or
an appropriate errno on error, in which case it also sets errstr
to an appropriate error message.
-
int ts_readfile(FILE* fp, struct timeseries *ts, int *errline, char **errstr)
- Read data from FILE* fp stream, by using the ts_readline function.
-
int ts_readfromstring(char *string, struct timeseries *ts, int *errline, char **errstr)
- Read data from a string containing time series records separated by
line feeds, or carriage returns, or both. ts_readline is used for
string parsing of each line (time series record).
-
int ts_set_item(struct timeseries *ts, int index, int null, double value, const char *flags, char **errstr)
- Set the time series record at index. A record with that index
must exist, or an error is returned. Returns 0 on success, or an
appropriate errno on error, in which case errstr is also set to
an appropriate error message.
-
int ts_writeline(struct ts_record *r, int precision, char *str, size_t max_length)
Converts the record pointed to by r to an ASCII representation
for including in a file format, and writes that representation,
including a terminating null byte, to string str of size
max_length. precision is an integer indicating the required
value precision, in number of decimal digits; precision can be
-9999, meaning to use “%G” as the printf formatting string.
Returns the number of characters written to str, not including
the null byte. This number is at most max_length minus 1. If
writing the result would exceed that number, then it returns zero,
in which case the contents of str are undefined.
-
struct timeseries_list
Contains two members, the number of timeseries n (an
int), and a pointer to a timeseries array,
ts, which is normally dynamically allocated. Use the
following functions to play with timeseries_list:
-
struct timeseries_list *tsl_create(void)
-
void tsl_free(struct timeseries_list *tsl)
-
int tsl_append(struct timeseries_list *tsl, struct timeseries *t)
-
int tsl_delete(struct timeseries_list *tsl, int index)
These functions perform dynamic memory allocation of
timeseries_list objects. tsl_create()
creates and returns a :ctype`timeseries_list` object
containing zero elements, or NULL if insufficient
memory. tsl_free() frees such an object.
tsl_append() and tsl_delete() append or delete
an element, returning zero or an appropriate errno on
insufficient memory or invalid argument.
Important
These functions handle memory allocation of the
timeseries_list object and its contained array of
pointers to timeseries objects, but does not touch
the timeseries objects themselves. It is the
caller’s responsibility to allocate and free the
timeseries objects.
Extended operations
-
int ts_identify_events(const struct timeseries_list *ts, struct interval range, int reverse, double start_threshold, double end_threshold, int ntimeseries_start_threshold, int ntimeseries_end_threshold, long_time_t time_separator, struct interval_list *events, char **errstr)
- This function is intended to find precipitation events in ts,
which is supposed to be a set of spatially proximate time series,
but it can also be used to find any kind of event where the value
of a time series goes beyond a threshold, such as events of heat
or cold. An event is defined as a time interval at the start of
which there is a value at least start_threshold in at least
ntimeseries_start_threshold time series, at the end of which
there is a value less than end_threshold in at least all but
ntimeseries_end_threshold time series, and separated by at least
time_separator from the nearest similar event. Only the interval
specified by range is examined, and all time series should have
the same time stamps within that interval. If reverse is
nonzero, then the function finds events that are smaller than the
thresholds instead of greater (e.g. cold events). The events are
returned in events, which must have been allocated with
il_create() by the caller and must also be freed by the
caller. Returns 0 on success, or an approriate errno on
error, in which case it also sets errstr to an appropriate error
message.
dates - Date utilities
-
long_time_t
- This type is like time_t, but is guaranteed to be at least
64 bits, therefore ensuring that it spans many years.
-
struct interval
- Contains two long_time_t members, start_date and end_date.
-
struct interval_list
Contains two members, the number of intervals n (an
int), and a pointer to a interval array,
intervals, which is normally dynamically allocated. Use the
following functions to play with interval_list:
-
struct interval_list *il_create(void)
-
void il_free(struct interval_list *intrvls)
-
int il_append(struct interval_list *intrvls, long_time_t start_date, long_time_t end_date)
-
int il_delete(struct interval_list *intrvls, int index)
- These functions perform dynamic memory allocation of
interval_list objects. il_create()
creates and returns a :ctype`interval_list` object
containing zero elements, or NULL if insufficient
memory. il_free() frees such an object.
il_append() and il_delete() append or delete
an element, returning zero or an appropriate errno on
insufficient memory or invalid argument.
-
void add_minutes(struct tm *tm, int mins)
- Increases or decreases tm by the specified number of minutes.
-
void igmtime(long_time_t gm_time, struct tm *tm)
- Do the same thing as the time.h gmtime() function,
except using a long_time_t value (gm_time) instead of the
standard time_t. The result is written in the tm.
-
int is_leap_year(int y)
- Return nonzero of y is a leap year. Not that this is a macro and
may evaluate y multiple times.
-
int month_days(int mon, int year)
- Return number of days in specified month (0 to 11) of specified
year.
-
int parsedatestring(const char *s, struct tm *tm, char **errmsg)
- Parse supplied string and set tm to the parsed date. s must be
in one of the following formats: %Y-%m-%d %H:%M, %Y-%m-%d
%H:%M:00, %Y-%m-%d %H:%M:00:00, %Y-%m-%d %H, %Y-%m-%d,
%Y-%m, %Y. A slash may also be used instead of a hyphen as
the date separator, a “T” instead of a space as the date/time
separator, and a full stop instead of a colon as the time
separator. Returns nonzero on error, setting errmsg to a static
error message. The return value is EINVAL if supplied
string is not a valid date, or ENOMEM on insufficient
memory.
-
int tmcmp(struct tm *tm1, struct tm *tm2)
- Return -1, 0, or 1 if tm1 is less than, equal to, or greater than
tm2. Uses minute precision.
strings - string utilities
-
char *strip(char *s)
- Strip leading and trailing whitespace from s in place, and
return s.
csv - operations with CSV files
The word “quote” thereafter means the double-quote character, ".
Unfortunately there is no universally accepted CSV standard, and not
all applications behave the same. The definition we accept here is
this: a field is a sequence of zero or more characters; fields are
delimited by commas; leading and trailing white space characters are
preserved; fields cannot contain newline characters; fields can begin
and end with quotes, in which case they may contain commas; inside a
quoted field, quotes are designated by double quotes; a field is
considered to be quoted if it begins with a quote and ends with the
character sequence ", (quote followed by comma) or "\n (quote
followed by newline), or "\0 (quote ends the string), provided the
end quote is not the second character of a double quote; if no such
field ending sequence can be found on the line, the field is
considered unquoted; single quotes inside a quoted string are ignored.
-
char *csvtok(char **stringp)
csvtok() assumes that stringp points to a line from a
CSV file. It finds the first item in the string, modifies it, if it
is quoted, by converting double quotes to single quotes, terminates
it with ‘0’ (by overwriting the delimiting comma or the end quote,
or some character before those if the item has shrinked because of
double quote interpretation) and updates stringp to point past
the item. If there is no comma in stringp, or if the entire
stringp is quoted, csvtok() sets stringp to
NULL. If stringp is NULL, csvtok()
does nothing.
csvtok() returns the beginning of the field, which is the
original value of stringp, unless the field is quoted, in which
case it is the original value incremented. If stringp is
NULL, csvtok() returns NULL.
-
char *csvquote(const char *s)
- csvquote() is like strdup(), except that if the
original string contains commas or quotes, the returned string is
quoted as needed in order to be a CSV field; that is, a leading and
trailing quote is added, and any other quotes are converted into
double quotes. Like strdup(), it returns a dynamically
allocated string, or NULL on insufficient memory.