libMVL
Mappable vector library
libMVL.c File Reference

Core libMVL functions.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdarg.h>
#include <fcntl.h>
#include <alloca.h>
#include "libMVL.h"

Functions

LIBMVL_CONTEXT * mvl_create_context (void)
 Create MVL context.

void mvl_free_context (LIBMVL_CONTEXT *ctx)
 Release memory associated with MVL context.

const char * mvl_strerror (LIBMVL_CONTEXT *ctx)
 Obtain description of error code.

LIBMVL_OFFSET64 mvl_write_vector (LIBMVL_CONTEXT *ctx, int type, LIBMVL_OFFSET64 length, const void *data, LIBMVL_OFFSET64 metadata)
 Write complete MVL vector.

LIBMVL_OFFSET64 mvl_start_write_vector (LIBMVL_CONTEXT *ctx, int type, LIBMVL_OFFSET64 expected_length, LIBMVL_OFFSET64 length, const void *data, LIBMVL_OFFSET64 metadata)
 Begin write of MVL vector. This is only needed if the vector has to be written in parts, for example due to memory constraints.

void mvl_rewrite_vector (LIBMVL_CONTEXT *ctx, int type, LIBMVL_OFFSET64 base_offset, LIBMVL_OFFSET64 idx, long length, const void *data)
 Write more data to an MVL vector that has been previously created with mvl_start_write_vector().

LIBMVL_OFFSET64 mvl_indexed_copy_vector (LIBMVL_CONTEXT *ctx, LIBMVL_OFFSET64 index_count, const LIBMVL_OFFSET64 *indices, const LIBMVL_VECTOR *vec, const void *data, LIBMVL_OFFSET64 data_length, LIBMVL_OFFSET64 metadata, LIBMVL_OFFSET64 max_buffer)
 Write MVL vector that contains data at specific indices. The indices can repeat, and can themselves be stored in a memory mapped MVL file.

LIBMVL_OFFSET64 mvl_write_concat_vectors (LIBMVL_CONTEXT *ctx, int type, long nvec, const long *lengths, void **data, LIBMVL_OFFSET64 metadata)
 Write complete MVL vector concatenating data from many vectors or arrays.

LIBMVL_OFFSET64 mvl_write_string (LIBMVL_CONTEXT *ctx, long length, const char *data, LIBMVL_OFFSET64 metadata)
 Write a single C string. In particular, this is handy for providing metadata tags.

LIBMVL_OFFSET64 mvl_write_cached_string (LIBMVL_CONTEXT *ctx, long length, const char *data)
 Write a single C string if it has not been written before; otherwise return the offset of the previously written object. In particular, this is handy for providing metadata tags.

LIBMVL_OFFSET64 mvl_write_packed_list (LIBMVL_CONTEXT *ctx, long count, const long *str_size, unsigned char **str, LIBMVL_OFFSET64 metadata)
 Write an array of strings as a packed list data type. This is convenient for storing many different strings.

LIBMVL_OFFSET64 mvl_get_character_class_offset (LIBMVL_CONTEXT *ctx)
 Get offset to metadata describing the R-style character class (an array of strings). This is convenient for writing columns of strings to be analyzed with R: just provide this offset as the metadata field of mvl_write_packed_list().
 
void mvl_add_directory_entry (LIBMVL_CONTEXT *ctx, LIBMVL_OFFSET64 offset, const char *tag)
 Add an entry to the top level directory of MVL file.

void mvl_add_directory_entry_n (LIBMVL_CONTEXT *ctx, LIBMVL_OFFSET64 offset, const char *tag, LIBMVL_OFFSET64 tag_size)
 Add an entry to the top level directory of MVL file.

LIBMVL_OFFSET64 mvl_write_directory (LIBMVL_CONTEXT *ctx)
 Write out MVL file directory with the entries collected so far. If this is called multiple times, only the most recently written directory is retrieved when the MVL file is opened. It is an error to write out an empty directory.

LIBMVL_NAMED_LIST * mvl_create_named_list (int size)
 Allocate and initialize structure for LIBMVL_NAMED_LIST.

void mvl_free_named_list (LIBMVL_NAMED_LIST *L)
 Free structure for LIBMVL_NAMED_LIST.

void mvl_recompute_named_list_hash (LIBMVL_NAMED_LIST *L)
 Recompute named list hash.

long mvl_add_list_entry (LIBMVL_NAMED_LIST *L, long tag_length, const char *tag, LIBMVL_OFFSET64 offset)
 Add entry to LIBMVL_NAMED_LIST. The entry is always appended to the end.

LIBMVL_OFFSET64 mvl_find_list_entry (LIBMVL_NAMED_LIST *L, long tag_length, const char *tag)
 Find existing entry inside LIBMVL_NAMED_LIST. If several identically named entries exist, this function returns the most recently added value. The hash table is used if present.

LIBMVL_NAMED_LIST * mvl_create_R_attributes_list (LIBMVL_CONTEXT *ctx, const char *R_class)
 Create R-style attribute list for the class given by R_class, for example "data.frame".

LIBMVL_OFFSET64 mvl_write_attributes_list (LIBMVL_CONTEXT *ctx, LIBMVL_NAMED_LIST *L)
 Write out R-style attribute list.

LIBMVL_OFFSET64 mvl_write_named_list (LIBMVL_CONTEXT *ctx, LIBMVL_NAMED_LIST *L)
 Write out named list. In R, this would be read back as a list.

LIBMVL_OFFSET64 mvl_write_named_list2 (LIBMVL_CONTEXT *ctx, LIBMVL_NAMED_LIST *L, char *cl)
 Write out named list. In R, this would be read back as a list with the class attribute set to cl.

LIBMVL_OFFSET64 mvl_write_named_list_as_data_frame (LIBMVL_CONTEXT *ctx, LIBMVL_NAMED_LIST *L, int nrows, LIBMVL_OFFSET64 rownames)
 Write out named list in the style of R data frames. All entries of L are assumed to be vectors with the same number of elements.

LIBMVL_NAMED_LIST * mvl_read_attributes_list (LIBMVL_CONTEXT *ctx, const void *data, LIBMVL_OFFSET64 data_size, LIBMVL_OFFSET64 metadata_offset)
 Read back MVL attributes list, typically used to describe metadata. This function also initializes the hash table for fast access. It does not check that the offsets stored in the returned LIBMVL_NAMED_LIST data structure are valid; that should be done by the code that uses those offsets.

LIBMVL_NAMED_LIST * mvl_read_named_list (LIBMVL_CONTEXT *ctx, const void *data, LIBMVL_OFFSET64 data_size, LIBMVL_OFFSET64 offset)
 Read back MVL named list. This function also initializes the hash table for fast access.
 
void mvl_open (LIBMVL_CONTEXT *ctx, FILE *f)
 Prepare context for writing to file f.

void mvl_close (LIBMVL_CONTEXT *ctx)
 Write out MVL file directory and postamble, and close file.

LIBMVL_OFFSET64 mvl_find_directory_entry (LIBMVL_CONTEXT *ctx, const char *tag)
 Find entry in MVL file directory.

void mvl_load_image (LIBMVL_CONTEXT *ctx, LIBMVL_OFFSET64 length, const void *data)
 Initialize MVL context to operate on the memory mapped area data.

int mvl_hash_indices (LIBMVL_OFFSET64 indices_count, const LIBMVL_OFFSET64 *indices, LIBMVL_OFFSET64 *hash, LIBMVL_OFFSET64 vec_count, LIBMVL_VECTOR **vec, void **vec_data, LIBMVL_OFFSET64 *vec_data_length, int flags)
 Compute 64-bit hashes of vector values. The array hash[] is passed in and receives the result of the computation.

int mvl_hash_range (LIBMVL_OFFSET64 i0, LIBMVL_OFFSET64 i1, LIBMVL_OFFSET64 *hash, LIBMVL_OFFSET64 vec_count, LIBMVL_VECTOR **vec, void **vec_data, LIBMVL_OFFSET64 *vec_data_length, int flags)
 Compute 64-bit hashes of vector values. The array hash[] is passed in and receives the result of the computation.

LIBMVL_OFFSET64 mvl_compute_hash_map_size (LIBMVL_OFFSET64 hash_count)
 Compute suggested size of hash map given the number of entries to hash. Hash map size should always be a power of 2.

HASH_MAP * mvl_allocate_hash_map (LIBMVL_OFFSET64 max_index_count)
 Create HASH_MAP structure.

void mvl_free_hash_map (HASH_MAP *hash_map)
 Free allocated HASH_MAP.

void mvl_compute_hash_map (HASH_MAP *hm)
 Compute hash map. This assumes that the hm->hash array has been populated with hm->hash_count hashes computed with mvl_hash_indices().

LIBMVL_OFFSET64 mvl_hash_match_count (LIBMVL_OFFSET64 key_count, const LIBMVL_OFFSET64 *key_hash, HASH_MAP *hm)
 Find count of matches between hashes of two sets.

int mvl_find_matches (LIBMVL_OFFSET64 key_indices_count, const LIBMVL_OFFSET64 *key_indices, LIBMVL_OFFSET64 key_vec_count, LIBMVL_VECTOR **key_vec, void **key_vec_data, LIBMVL_OFFSET64 *key_hash, LIBMVL_OFFSET64 indices_count, const LIBMVL_OFFSET64 *indices, LIBMVL_OFFSET64 vec_count, LIBMVL_VECTOR **vec, void **vec_data, HASH_MAP *hm, LIBMVL_OFFSET64 *key_last, LIBMVL_OFFSET64 pairs_size, LIBMVL_OFFSET64 *key_match_indices, LIBMVL_OFFSET64 *match_indices)
 Compute pairs of merge indices. This is similar to the JOIN operation in SQL.

void mvl_find_groups (LIBMVL_OFFSET64 indices_count, const LIBMVL_OFFSET64 *indices, LIBMVL_OFFSET64 vec_count, LIBMVL_VECTOR **vec, void **vec_data, HASH_MAP *hm)
 Transform a HASH_MAP into a list of groups, similar to the GROUP BY clause in SQL.
 
void mvl_extend_partition (LIBMVL_PARTITION *el, LIBMVL_OFFSET64 nelem)
 Increase storage of previously allocated partition.

void mvl_find_repeats (LIBMVL_PARTITION *el, LIBMVL_OFFSET64 count, LIBMVL_VECTOR **vec, void **data)
 Compute list of extents describing stretches of data with identical values.

void mvl_init_partitiion (LIBMVL_PARTITION *el)
 Initialize freshly allocated partition structure.

void mvl_free_partition_arrays (LIBMVL_PARTITION *el)
 Free arrays of previously allocated partition. This function does not free the structure itself.

void mvl_init_extent_list (LIBMVL_EXTENT_LIST *el)
 Initialize freshly allocated extent list structure.

void mvl_free_extent_list_arrays (LIBMVL_EXTENT_LIST *el)
 Free arrays of previously allocated extent list. This function does not free the structure itself.

void mvl_extend_extent_list (LIBMVL_EXTENT_LIST *el, LIBMVL_OFFSET64 nelem)
 Increase storage of previously allocated extent list.

void mvl_init_extent_index (LIBMVL_EXTENT_INDEX *ei)
 Initialize freshly allocated extent index structure.

void mvl_free_extent_index_arrays (LIBMVL_EXTENT_INDEX *ei)
 Free arrays of previously allocated extent index. This function does not free the structure itself.

int mvl_compute_extent_index (LIBMVL_EXTENT_INDEX *ei, LIBMVL_OFFSET64 count, LIBMVL_VECTOR **vec, void **data, LIBMVL_OFFSET64 *data_length)
 Compute an extent index.

LIBMVL_OFFSET64 mvl_write_extent_index (LIBMVL_CONTEXT *ctx, LIBMVL_EXTENT_INDEX *ei)
 Write extent index to MVL file.

int mvl_load_extent_index (LIBMVL_CONTEXT *ctx, void *data, LIBMVL_OFFSET64 data_size, LIBMVL_OFFSET64 offset, LIBMVL_EXTENT_INDEX *ei)
 Load extent index from memory mapped MVL file.

void mvl_compute_vec_stats (const LIBMVL_VECTOR *vec, LIBMVL_VEC_STATS *stats)
 Compute vector statistics, such as a bounding box.

void mvl_normalize_vector (const LIBMVL_VECTOR *vec, const LIBMVL_VEC_STATS *stats, LIBMVL_OFFSET64 i0, LIBMVL_OFFSET64 i1, double *out)
 Normalize vector.
 

Detailed Description

Core libMVL functions.

Definition in file libMVL.c.

Function Documentation

◆ mvl_add_directory_entry()

void mvl_add_directory_entry (LIBMVL_CONTEXT *ctx,
LIBMVL_OFFSET64 offset,
const char *tag
)

Add an entry to the top level directory of MVL file.

Parameters
ctx: MVL context pointer that has been initialized for writing
offset: directory entry value, typically an offset pointing to a previously written MVL object
tag: C string describing the directory entry. Tags may repeat, in which case the last written entry is retrieved first.

Definition at line 835 of file libMVL.c.

◆ mvl_add_directory_entry_n()

void mvl_add_directory_entry_n (LIBMVL_CONTEXT *ctx,
LIBMVL_OFFSET64 offset,
const char *tag,
LIBMVL_OFFSET64 tag_size
)

Add an entry to the top level directory of MVL file.

Parameters
ctx: MVL context pointer that has been initialized for writing
offset: directory entry value, typically an offset pointing to a previously written MVL object
tag: string describing the directory entry. Tags may repeat, in which case the last written entry is retrieved first.
tag_size: length of tag

Definition at line 859 of file libMVL.c.

◆ mvl_add_list_entry()

long mvl_add_list_entry (LIBMVL_NAMED_LIST *L,
long tag_length,
const char *tag,
LIBMVL_OFFSET64 offset
)

Add entry to LIBMVL_NAMED_LIST. The entry is always appended to the end.

Parameters
L: pointer to previously allocated LIBMVL_NAMED_LIST
tag_length: size of tag
tag: string identifying the entry; tags can repeat
offset: 64-bit value
Returns
index of entry inside named list

Definition at line 1001 of file libMVL.c.

◆ mvl_allocate_hash_map()

HASH_MAP * mvl_allocate_hash_map (LIBMVL_OFFSET64 max_index_count)

Create HASH_MAP structure.

This creates a default HASH_MAP structure with all members allocated as new arrays. In some situations, such as to save memory, it is possible to reuse existing arrays by setting hm->flags appropriately. In that case, do not use this constructor; create the structure manually instead.

Parameters
max_index_count: expected number of entries to hash
Returns
pointer to allocated HASH_MAP structure

Definition at line 2094 of file libMVL.c.

◆ mvl_close()

void mvl_close (LIBMVL_CONTEXT *ctx)

Write out MVL file directory and postamble, and close file.

Parameters
ctx: MVL context pointer

Definition at line 1347 of file libMVL.c.

◆ mvl_compute_extent_index()

int mvl_compute_extent_index (LIBMVL_EXTENT_INDEX *ei,
LIBMVL_OFFSET64 count,
LIBMVL_VECTOR **vec,
void **data,
LIBMVL_OFFSET64 *data_length
)

Compute an extent index.

Parameters
ei: a pointer to extent index structure
count: the number of LIBMVL_VECTORs considered as columns in a table
vec: an array of pointers to LIBMVL_VECTORs considered as columns in a table
data: an array of pointers to memory mapped areas those LIBMVL_VECTORs derive from. This allows computing hashes from vectors drawn from different MVL files.
data_length: an array of lengths of the memory mapped areas those LIBMVL_VECTORs derive from
Returns
an integer error code, or 0 on success

Definition at line 2645 of file libMVL.c.

◆ mvl_compute_hash_map()

void mvl_compute_hash_map (HASH_MAP *hm)

Compute hash map. This assumes that the hm->hash array has been populated with hm->hash_count hashes computed with mvl_hash_indices().

Parameters
hm: a pointer to HASH_MAP structure

Definition at line 2135 of file libMVL.c.

◆ mvl_compute_hash_map_size()

LIBMVL_OFFSET64 mvl_compute_hash_map_size (LIBMVL_OFFSET64 hash_count)

Compute suggested size of hash map given the number of entries to hash. Hash map size should always be a power of 2.

Parameters
hash_count: expected number of items to hash
Returns
suggested hash map size

Definition at line 2075 of file libMVL.c.
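A power-of-2 size lets a bucket be selected with a bitwise AND, hash & (size - 1), instead of a slower modulo. The sketch below is a hypothetical helper, not libMVL's code; the factor-of-two headroom (keeping the table at most half full) is an assumption, and mvl_compute_hash_map_size() may use a different policy. uint64_t stands in for LIBMVL_OFFSET64.

```c
#include <stdint.h>

/* Hypothetical helper, not the libMVL implementation: return the smallest
 * power of two that is at least 2 * hash_count. The 2x headroom is an
 * assumed load-factor policy chosen for illustration. */
uint64_t suggest_hash_map_size(uint64_t hash_count)
{
    uint64_t target = hash_count * 2; /* leave room for roughly 50% load */
    uint64_t size = 1;
    while (size < target)
        size <<= 1;                   /* doubling keeps the result a power of 2 */
    return size;
}
```

With such a size, the bucket for a hash h is simply h & (size - 1).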

◆ mvl_compute_vec_stats()

void mvl_compute_vec_stats (const LIBMVL_VECTOR *vec,
LIBMVL_VEC_STATS *stats
)

Compute vector statistics, such as a bounding box.

Parameters
vec: a pointer to LIBMVL_VECTOR
stats: a pointer to previously allocated LIBMVL_VEC_STATS structure

Definition at line 2809 of file libMVL.c.

◆ mvl_create_context()

LIBMVL_CONTEXT * mvl_create_context (void)

Create MVL context.

Returns
A pointer to allocated LIBMVL_CONTEXT structure

Definition at line 150 of file libMVL.c.

◆ mvl_create_named_list()

LIBMVL_NAMED_LIST * mvl_create_named_list (int size)

Allocate and initialize structure for LIBMVL_NAMED_LIST.

Parameters
size: number of entries to preallocate; this can be set to a large value if the final size of the named list is known in advance
Returns
pointer to structure for LIBMVL_NAMED_LIST

Definition at line 927 of file libMVL.c.

◆ mvl_create_R_attributes_list()

LIBMVL_NAMED_LIST * mvl_create_R_attributes_list (LIBMVL_CONTEXT *ctx,
const char *R_class
)

Create R-style attribute list for the class given by R_class, for example "data.frame".

Parameters
ctx: MVL context pointer that has been initialized for writing
R_class: string identifying the R class, such as "data.frame"
Returns
pointer to LIBMVL_NAMED_LIST with allocated parameters

Definition at line 1082 of file libMVL.c.

◆ mvl_extend_extent_list()

void mvl_extend_extent_list (LIBMVL_EXTENT_LIST *el,
LIBMVL_OFFSET64 nelem
)

Increase storage of previously allocated extent list.

Parameters
el: extent list structure
nelem: make sure it can contain at least this many elements

Definition at line 2573 of file libMVL.c.

◆ mvl_extend_partition()

void mvl_extend_partition (LIBMVL_PARTITION *el,
LIBMVL_OFFSET64 nelem
)

Increase storage of previously allocated partition.

Parameters
el: partition structure
nelem: make sure it can contain at least this many elements

Definition at line 2451 of file libMVL.c.

◆ mvl_find_directory_entry()

LIBMVL_OFFSET64 mvl_find_directory_entry (LIBMVL_CONTEXT *ctx,
const char *tag
)

Find entry in MVL file directory.

Parameters
ctx: MVL context pointer
tag: character string identifying the entry
Returns
offset into the file that the entry points to

Definition at line 1384 of file libMVL.c.

◆ mvl_find_groups()

void mvl_find_groups (LIBMVL_OFFSET64 indices_count,
const LIBMVL_OFFSET64 *indices,
LIBMVL_OFFSET64 vec_count,
LIBMVL_VECTOR **vec,
void **vec_data,
HASH_MAP *hm
)

This function transforms a HASH_MAP into a list of groups, similar to the GROUP BY clause in SQL.

The original HASH_MAP describes groups of rows with identical hashes. However, there is a (remote) possibility of collision, where different rows have the same hash. This function resolves this ambiguity. After the call, hm->hash_map becomes invalid, but hm->first and hm->next describe groups of exactly identical rows.

Parameters
indices_count: number of elements in the indices array
indices: an array of indices used to create HASH_MAP hm
vec_count: the number of LIBMVL_VECTORs considered as columns in a table
vec: an array of pointers to LIBMVL_VECTORs considered as columns in a table
vec_data: an array of pointers to memory mapped areas those LIBMVL_VECTORs derive from. This allows computing hashes from vectors drawn from different MVL files.
hm: a HASH_MAP previously computed with mvl_compute_hash_map()

Definition at line 2382 of file libMVL.c.

◆ mvl_find_list_entry()

LIBMVL_OFFSET64 mvl_find_list_entry (LIBMVL_NAMED_LIST *L,
long tag_length,
const char *tag
)

Find existing entry inside LIBMVL_NAMED_LIST. If several identically named entries exist, this function returns the most recently added value. The hash table is used if present.

Parameters
L: pointer to previously allocated LIBMVL_NAMED_LIST
tag_length: size of tag
tag: string identifying the entry; tags can repeat
Returns
entry value

Definition at line 1048 of file libMVL.c.
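The "most recently added value wins" rule can be illustrated with a toy append-only list searched from the end. This is a sketch under stated assumptions: the toy_named_list type, its fixed capacity, and the TOY_NOT_FOUND marker are all hypothetical, and unlike mvl_find_list_entry() it never uses a hash table, only a linear scan.

```c
#include <stdint.h>
#include <string.h>

#define TOY_LIST_MAX 16
#define TOY_NOT_FOUND UINT64_MAX /* hypothetical "no such entry" marker */

/* Hypothetical stand-in for LIBMVL_NAMED_LIST: parallel arrays of tags and values. */
typedef struct {
    long count;
    const char *tag[TOY_LIST_MAX];
    long tag_length[TOY_LIST_MAX];
    uint64_t offset[TOY_LIST_MAX];
} toy_named_list;

/* Append an entry at the end; returns its index, or -1 if the toy list is full. */
long toy_add_entry(toy_named_list *L, long tag_length, const char *tag, uint64_t offset)
{
    if (L->count >= TOY_LIST_MAX)
        return -1;
    L->tag[L->count] = tag;
    L->tag_length[L->count] = tag_length;
    L->offset[L->count] = offset;
    return L->count++;
}

/* Scan from the end so that, among duplicate tags, the latest entry is returned. */
uint64_t toy_find_entry(const toy_named_list *L, long tag_length, const char *tag)
{
    for (long i = L->count - 1; i >= 0; i--) {
        if (L->tag_length[i] == tag_length &&
            memcmp(L->tag[i], tag, (size_t)tag_length) == 0)
            return L->offset[i];
    }
    return TOY_NOT_FOUND;
}
```

For example, after adding "name" -> 10 and later "name" -> 30, a lookup of "name" yields 30.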

◆ mvl_find_matches()

int mvl_find_matches (LIBMVL_OFFSET64 key_indices_count,
const LIBMVL_OFFSET64 *key_indices,
LIBMVL_OFFSET64 key_vec_count,
LIBMVL_VECTOR **key_vec,
void **key_vec_data,
LIBMVL_OFFSET64 *key_hash,
LIBMVL_OFFSET64 indices_count,
const LIBMVL_OFFSET64 *indices,
LIBMVL_OFFSET64 vec_count,
LIBMVL_VECTOR **vec,
void **vec_data,
HASH_MAP *hm,
LIBMVL_OFFSET64 *key_last,
LIBMVL_OFFSET64 pairs_size,
LIBMVL_OFFSET64 *key_match_indices,
LIBMVL_OFFSET64 *match_indices
)

Compute pairs of merge indices. This is similar to the JOIN operation in SQL.

This function takes two table-like sets of vectors as input. All vectors within a set must have the same number of elements. Two index arrays specify the rows to consider in each set. The function then finds pairs of indices where the rows are identical.

The output is returned in a pair of preallocated arrays, key_match_indices and match_indices. The pairs are arranged in stretches of identical "key" rows, which are described by the key_last array.

Parameters
key_indices_count: number of entries in the key_indices array
key_indices: an array with indices into the "key" table-like vector set
key_vec_count: number of vectors in the "key" table set
key_vec: an array of vectors in the "key" table set
key_vec_data: an array of pointers to memory mapped areas those "key" vectors derive from. This allows computing hashes from vectors drawn from different MVL files.
key_hash: an array of hashes of "key" vectors computed with mvl_hash_indices()
indices_count: number of entries in the indices array
indices: an array with indices into the "main" table-like vector set
vec_count: number of vectors in the "main" table set
vec: an array of vectors in the "main" table set
vec_data: an array of pointers to memory mapped areas those "main" vectors derive from. This allows computing hashes from vectors drawn from different MVL files.
hm: a previously computed HASH_MAP of the "main" table set
key_last: an output array of size key_indices_count that describes stretches of matches with identical "key" rows. Thus for "key" row i, the corresponding stretch is key_last[i-1] to key_last[i]-1.
pairs_size: the size of the allocated key_match_indices and match_indices arrays. This value can be computed with mvl_hash_match_count().
key_match_indices: an array of "key" indices from each pair
match_indices: an array of "main" indices from each pair
Returns
0 if everything went well, otherwise a negative error code

Definition at line 2306 of file libMVL.c.
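Reading the output of a join means walking the stretches recorded in key_last. The sketch below assumes, as the description implies but does not state for i == 0, that the first stretch starts at position 0. The helper names key_stretch() and print_matches() are hypothetical, and uint64_t stands in for LIBMVL_OFFSET64.

```c
#include <stdint.h>
#include <stdio.h>

/* Half-open range [*start, *stop) of pair positions produced for "key" row i.
 * Assumption: the stretch for i == 0 starts at 0. */
void key_stretch(const uint64_t *key_last, uint64_t i, uint64_t *start, uint64_t *stop)
{
    *start = (i == 0) ? 0 : key_last[i - 1];
    *stop = key_last[i];
}

/* Print every (key index, main index) pair, grouped by "key" row. */
void print_matches(uint64_t key_indices_count, const uint64_t *key_last,
                   const uint64_t *key_match_indices, const uint64_t *match_indices)
{
    for (uint64_t i = 0; i < key_indices_count; i++) {
        uint64_t start, stop;
        key_stretch(key_last, i, &start, &stop);
        for (uint64_t j = start; j < stop; j++)
            printf("key %llu <-> main %llu\n",
                   (unsigned long long)key_match_indices[j],
                   (unsigned long long)match_indices[j]);
    }
}
```

An empty stretch (start == stop) simply means that "key" row had no matches.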

◆ mvl_find_repeats()

void mvl_find_repeats (LIBMVL_PARTITION *el,
LIBMVL_OFFSET64 count,
LIBMVL_VECTOR **vec,
void **data
)

Compute list of extents describing stretches of data with identical values.

Parameters
el: pointer to previously allocated LIBMVL_PARTITION structure
count: number of vectors in vec
vec: array of vectors with identical number of elements
data: mapped data areas (needed to compare strings)

Definition at line 2469 of file libMVL.c.
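Finding stretches of identical values reduces to recording where each run of equal consecutive elements begins. This sketch handles a single int column only (mvl_find_repeats() compares several vectors at once, including strings), and the "run starts plus a trailing sentinel" layout is an assumption, not necessarily how LIBMVL_PARTITION stores its offsets.

```c
#include <stddef.h>

/* Record the start of every run of identical consecutive values in x[0..n).
 * starts must have room for n + 1 entries; a final sentinel equal to n closes
 * the last run, so run k spans starts[k] .. starts[k+1]-1.
 * Returns the number of runs. */
size_t find_runs(const int *x, size_t n, size_t *starts)
{
    size_t nruns = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || x[i] != x[i - 1])
            starts[nruns++] = i; /* value changed: a new run begins here */
    }
    starts[nruns] = n;
    return nruns;
}
```

Note that a value appearing in two separated stretches (such as 1 in {1, 1, 2, 2, 2, 1}) produces two runs; grouping non-adjacent equal values is a different operation.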

◆ mvl_free_context()

void mvl_free_context (LIBMVL_CONTEXT *ctx)

Release memory associated with MVL context.

Parameters
ctx: pointer to context previously allocated with mvl_create_context()

Definition at line 186 of file libMVL.c.

◆ mvl_free_extent_index_arrays()

void mvl_free_extent_index_arrays (LIBMVL_EXTENT_INDEX *ei)

Free arrays of previously allocated extent index. This function does not free the structure itself.

Parameters
ei: a pointer to LIBMVL_EXTENT_INDEX structure

Definition at line 2611 of file libMVL.c.

◆ mvl_free_extent_list_arrays()

void mvl_free_extent_list_arrays (LIBMVL_EXTENT_LIST *el)

Free arrays of previously allocated extent list. This function does not free the structure itself.

Parameters
el: a pointer to LIBMVL_EXTENT_LIST structure

Definition at line 2557 of file libMVL.c.

◆ mvl_free_hash_map()

void mvl_free_hash_map (HASH_MAP *hash_map)

Free allocated HASH_MAP.

Parameters
hash_map: a pointer to previously allocated HASH_MAP structure

Definition at line 2118 of file libMVL.c.

◆ mvl_free_named_list()

void mvl_free_named_list (LIBMVL_NAMED_LIST *L)

Free structure for LIBMVL_NAMED_LIST.

Parameters
L: pointer to previously allocated LIBMVL_NAMED_LIST

Definition at line 949 of file libMVL.c.

◆ mvl_free_partition_arrays()

void mvl_free_partition_arrays (LIBMVL_PARTITION *el)

Free arrays of previously allocated partition. This function does not free the structure itself.

Parameters
el: a pointer to LIBMVL_PARTITION structure

Definition at line 2533 of file libMVL.c.

◆ mvl_get_character_class_offset()

LIBMVL_OFFSET64 mvl_get_character_class_offset (LIBMVL_CONTEXT *ctx)

Get offset to metadata describing the R-style character class (an array of strings). This is convenient for writing columns of strings to be analyzed with R: just provide this offset as the metadata field of mvl_write_packed_list().

Parameters
ctx: MVL context pointer that has been initialized for writing
Returns
an offset into the file, suitable for specifying as MVL object metadata

Definition at line 819 of file libMVL.c.

◆ mvl_hash_indices()

int mvl_hash_indices (LIBMVL_OFFSET64 indices_count,
const LIBMVL_OFFSET64 *indices,
LIBMVL_OFFSET64 *hash,
LIBMVL_OFFSET64 vec_count,
LIBMVL_VECTOR **vec,
void **vec_data,
LIBMVL_OFFSET64 *vec_data_length,
int flags
)

Compute 64-bit hashes of vector values. The array hash[] is passed in and receives the result of the computation.

Integer values are hashed by value, so that 100 produces the same hash whether it is stored as INT32 or INT64.

Floats and doubles are trickier: we can guarantee that the hash of a float promoted to a double is the same as the hash of the original float, but not the reverse.

Parameters
indices_count: total number of indices
indices: an array of indices into the provided vectors
hash: a previously allocated array of length indices_count that the computed hashes will be written into
vec_count: the number of LIBMVL_VECTORs considered as columns in a table
vec: an array of pointers to LIBMVL_VECTORs considered as columns in a table
vec_data: an array of pointers to memory mapped areas those LIBMVL_VECTORs derive from. This allows computing hashes from vectors drawn from different MVL files.
vec_data_length: an array of lengths of the memory mapped areas those LIBMVL_VECTORs derive from
flags: flags specifying whether to initialize or finalize the hash

Definition at line 1882 of file libMVL.c.
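The value-based hashing rules above can be illustrated by widening every integer to 64 bits and every float to double before mixing. This is a sketch, not libMVL's hash: the splitmix64 finalizer used as the mixer is a swapped-in choice, and the hash_int32/hash_int64/hash_float/hash_double helpers are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* splitmix64 finalizer, used here purely for illustration; libMVL's actual
 * mixing function may differ. It is a bijection on 64-bit values. */
static uint64_t mix64(uint64_t z)
{
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Integers are widened to 64 bits first, so INT32 100 and INT64 100 agree. */
uint64_t hash_int32(int32_t v) { return mix64((uint64_t)(int64_t)v); }
uint64_t hash_int64(int64_t v) { return mix64((uint64_t)v); }

/* Floating-point values are hashed through their double bit pattern. */
uint64_t hash_double(double v)
{
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits); /* reinterpret bits without aliasing issues */
    return mix64(bits);
}

/* Promoting float to double is exact, so hash_float(f) == hash_double((double)f). */
uint64_t hash_float(float v) { return hash_double((double)v); }
```

The reverse fails for doubles that are not exactly representable as floats: 0.1 as a double and (float)0.1 have different bit patterns after promotion, so their hashes differ.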

◆ mvl_hash_match_count()

LIBMVL_OFFSET64 mvl_hash_match_count (LIBMVL_OFFSET64 key_count,
const LIBMVL_OFFSET64 *key_hash,
HASH_MAP *hm
)

Find count of matches between hashes of two sets.

This function is useful for finding an upper bound on the number of possible matches, so that arrays for the result can be allocated or the computation planned in some other way.

Parameters
key_count: number of key hashes
key_hash: an array of key hashes to query
hm: a pointer to HASH_MAP structure
Returns
number of matches

Definition at line 2202 of file libMVL.c.
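To make the roles of mvl_compute_hash_map() and mvl_hash_match_count() concrete, here is a toy chained hash map: a power-of-2 bucket array holds the first row of each chain and a next[] array links rows sharing a bucket. The toy_hash_map layout and TOY_NONE terminator are hypothetical and do not mirror the real HASH_MAP fields. Because only hashes are compared, the count is an upper bound: rows with equal hashes may still differ.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_NONE UINT64_MAX /* hypothetical end-of-chain marker */

typedef struct {
    uint64_t hash_count;    /* number of rows hashed */
    uint64_t hash_map_size; /* number of buckets; must be a power of 2 */
    const uint64_t *hash;   /* hash[i] for row i */
    uint64_t *hash_map;     /* bucket -> first row in its chain */
    uint64_t *next;         /* row -> next row in the same bucket */
} toy_hash_map;

/* Chain every row into the bucket selected by the low bits of its hash. */
int toy_compute_hash_map(toy_hash_map *hm)
{
    uint64_t mask = hm->hash_map_size - 1;
    hm->hash_map = malloc(hm->hash_map_size * sizeof *hm->hash_map);
    hm->next = malloc((hm->hash_count ? hm->hash_count : 1) * sizeof *hm->next);
    if (!hm->hash_map || !hm->next)
        return -1; /* caller should still call toy_free_hash_map() */
    for (uint64_t b = 0; b < hm->hash_map_size; b++)
        hm->hash_map[b] = TOY_NONE;
    for (uint64_t i = 0; i < hm->hash_count; i++) {
        uint64_t b = hm->hash[i] & mask;
        hm->next[i] = hm->hash_map[b]; /* push row i onto the front of the chain */
        hm->hash_map[b] = i;
    }
    return 0;
}

/* Count (key, row) pairs whose full 64-bit hashes are equal. */
uint64_t toy_hash_match_count(uint64_t key_count, const uint64_t *key_hash,
                              const toy_hash_map *hm)
{
    uint64_t mask = hm->hash_map_size - 1, count = 0;
    for (uint64_t k = 0; k < key_count; k++)
        for (uint64_t i = hm->hash_map[key_hash[k] & mask]; i != TOY_NONE; i = hm->next[i])
            if (hm->hash[i] == key_hash[k]) /* skip bucket-only collisions */
                count++;
    return count;
}

void toy_free_hash_map(toy_hash_map *hm)
{
    free(hm->hash_map);
    free(hm->next);
}
```

Comparing the full hash inside the chain matters: two different hashes can land in the same bucket (for example 3 and 99 with 8 buckets) without counting as a match.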

◆ mvl_hash_range()

int mvl_hash_range (LIBMVL_OFFSET64 i0,
LIBMVL_OFFSET64 i1,
LIBMVL_OFFSET64 *hash,
LIBMVL_OFFSET64 vec_count,
LIBMVL_VECTOR **vec,
void **vec_data,
LIBMVL_OFFSET64 *vec_data_length,
int flags
)

Compute 64-bit hashes of vector values. The array hash[] is passed in and receives the result of the computation.

Integer values are hashed by value, so that 100 produces the same hash whether it is stored as INT32 or INT64.

Floats and doubles are trickier: we can guarantee that the hash of a float promoted to a double is the same as the hash of the original float, but not the reverse.

Parameters
i0: starting index to hash
i1: first index not to hash
hash: a previously allocated array of length (i1-i0) that the computed hashes will be written into
vec_count: the number of LIBMVL_VECTORs considered as columns in a table
vec: an array of pointers to LIBMVL_VECTORs considered as columns in a table
vec_data: an array of pointers to memory mapped areas those LIBMVL_VECTORs derive from. This allows computing hashes from vectors drawn from different MVL files.
vec_data_length: an array of lengths of the memory mapped areas those LIBMVL_VECTORs derive from
flags: flags specifying whether to initialize or finalize the hash

Definition at line 1984 of file libMVL.c.

◆ mvl_indexed_copy_vector()

LIBMVL_OFFSET64 mvl_indexed_copy_vector (LIBMVL_CONTEXT *ctx,
LIBMVL_OFFSET64 index_count,
const LIBMVL_OFFSET64 *indices,
const LIBMVL_VECTOR *vec,
const void *data,
LIBMVL_OFFSET64 data_length,
LIBMVL_OFFSET64 metadata,
LIBMVL_OFFSET64 max_buffer
)

Write MVL vector that contains data at specific indices. The indices can repeat, and can themselves be stored in a memory mapped MVL file.

Parameters
ctx: MVL context pointer that has been initialized for writing
index_count: number of indices to process; this determines the length of the new vector
indices: array of indices into vector vec
vec: a pointer to a fully formed MVL vector, such as one from a mapped MVL file
data: pointer to data of the previously mapped MVL file
data_length: length of data of the previously mapped MVL file
metadata: an optional offset to previously written metadata. Specify LIBMVL_NO_METADATA if not needed.
max_buffer: maximum size of the buffer holding in-flight data. We recommend setting this to at least 10 MB for efficiency.
Returns
an offset into the file, suitable for adding to MVL file directory, or to other MVL objects

Definition at line 473 of file libMVL.c.
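The role of max_buffer can be pictured as a gather loop that stages at most max_buffer bytes before each write. This is a simplified, hypothetical analogue for a plain double array written to a stdio stream; it does not produce an MVL vector, does not read indices from a mapped file, and assumes every index is in range.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Gather src[indices[k]] for k = 0..index_count-1 and write the values to out,
 * staging at most max_buffer bytes at a time. Indices may repeat and are
 * assumed to be valid. Returns 0 on success, -1 on error or if max_buffer
 * cannot hold even one element. */
int indexed_copy_doubles(FILE *out, uint64_t index_count, const uint64_t *indices,
                         const double *src, size_t max_buffer)
{
    size_t chunk = max_buffer / sizeof(double); /* elements per staged batch */
    if (chunk == 0)
        return -1;
    if (chunk > index_count)
        chunk = index_count ? (size_t)index_count : 1;
    double *buf = malloc(chunk * sizeof *buf);
    if (!buf)
        return -1;
    for (uint64_t k = 0; k < index_count; k += chunk) {
        size_t n = (index_count - k < chunk) ? (size_t)(index_count - k) : chunk;
        for (size_t j = 0; j < n; j++)
            buf[j] = src[indices[k + j]]; /* random-access gather */
        if (fwrite(buf, sizeof *buf, n, out) != n) {
            free(buf);
            return -1;
        }
    }
    free(buf);
    return 0;
}
```

A larger buffer means fewer, bigger writes, which is the efficiency trade-off behind the 10 MB recommendation.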

◆ mvl_init_extent_index()

void mvl_init_extent_index (LIBMVL_EXTENT_INDEX *ei)

Initialize freshly allocated extent index structure.

Parameters
ei: a pointer to LIBMVL_EXTENT_INDEX structure

Definition at line 2601 of file libMVL.c.

◆ mvl_init_extent_list()

void mvl_init_extent_list (LIBMVL_EXTENT_LIST *el)

Initialize freshly allocated extent list structure.

Parameters
el: a pointer to LIBMVL_EXTENT_LIST structure

Definition at line 2545 of file libMVL.c.

◆ mvl_init_partitiion()

void mvl_init_partitiion (LIBMVL_PARTITION *el)

Initialize freshly allocated partition structure.

Parameters
el: a pointer to LIBMVL_PARTITION structure

Definition at line 2524 of file libMVL.c.

◆ mvl_load_image()

void mvl_load_image (LIBMVL_CONTEXT *ctx,
LIBMVL_OFFSET64 length,
const void *data
)

Initialize MVL context to operate on the memory mapped area data.

Parameters
ctx: MVL context pointer
length: size of memory mapped data, in bytes
data: pointer to the beginning of the memory mapped area

Definition at line 1399 of file libMVL.c.

◆ mvl_normalize_vector()

void mvl_normalize_vector (const LIBMVL_VECTOR *vec,
const LIBMVL_VEC_STATS *stats,
LIBMVL_OFFSET64 i0,
LIBMVL_OFFSET64 i1,
double *out
)

Normalize vector.

This function converts numeric vectors into normalized double precision values. Indices i0 and i1 specify the stretch of indices to normalize, which allows very long vectors to be processed in pieces.

Parameters
vec: a pointer to LIBMVL_VECTOR
stats: previously allocated LIBMVL_VEC_STATS structure
i0: start index of the stretch to process
i1: index one past the end of the stretch to process
out: array of normalized entries of size i1-i0. The first entry corresponds to index i0.

Definition at line 2960 of file libMVL.c.
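Computing a bounding box and then normalizing a stretch [i0, i1) can be sketched for a plain double array as below. The exact formula used by mvl_normalize_vector() is not given here; this sketch assumes a center/scale mapping of [min, max] onto [-1, 1], and the toy_vec_stats type is a hypothetical stand-in for LIBMVL_VEC_STATS.

```c
#include <stddef.h>

/* Hypothetical stand-in for LIBMVL_VEC_STATS. */
typedef struct {
    double min, max;      /* bounding box of the data */
    double center, scale; /* assumed normalization parameters */
} toy_vec_stats;

/* One pass for the bounding box; center/scale then map [min, max] to [-1, 1]. */
void toy_compute_stats(const double *x, size_t n, toy_vec_stats *s)
{
    s->min = s->max = (n > 0) ? x[0] : 0.0;
    for (size_t i = 1; i < n; i++) {
        if (x[i] < s->min) s->min = x[i];
        if (x[i] > s->max) s->max = x[i];
    }
    s->center = 0.5 * (s->min + s->max);
    /* Constant or empty input: use scale 1 to avoid dividing by zero. */
    s->scale = (s->max > s->min) ? 2.0 / (s->max - s->min) : 1.0;
}

/* Normalize the stretch [i0, i1); out[0] corresponds to x[i0]. */
void toy_normalize(const double *x, const toy_vec_stats *s,
                   size_t i0, size_t i1, double *out)
{
    for (size_t i = i0; i < i1; i++)
        out[i - i0] = (x[i] - s->center) * s->scale;
}
```

Because the statistics are computed once over the whole vector, calling toy_normalize() on consecutive stretches gives the same result as one call over the full range.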

◆ mvl_open()

void mvl_open (LIBMVL_CONTEXT *ctx,
FILE *f
)

Prepare context for writing to file f.

Parameters
ctx: MVL context pointer
f: pointer to a previously opened stdio.h FILE structure

Definition at line 1338 of file libMVL.c.

◆ mvl_read_attributes_list()

LIBMVL_NAMED_LIST * mvl_read_attributes_list (LIBMVL_CONTEXT *ctx,
const void *data,
LIBMVL_OFFSET64 data_size,
LIBMVL_OFFSET64 metadata_offset
)

Read back MVL attributes list, typically used to describe metadata. This function also initializes the hash table for fast access. It does not check that the offsets stored in the returned LIBMVL_NAMED_LIST data structure are valid; that should be done by the code that uses those offsets.

Parameters
ctx: MVL context pointer
data: memory mapped data
data_size: size of memory mapped data
metadata_offset: metadata offset pointing to the previously written attributes
Returns
NULL if there is no metadata, otherwise a LIBMVL_NAMED_LIST populated with attributes

Definition at line 1191 of file libMVL.c.

◆ mvl_read_named_list()

LIBMVL_NAMED_LIST * mvl_read_named_list (LIBMVL_CONTEXT *ctx,
const void *data,
LIBMVL_OFFSET64 data_size,
LIBMVL_OFFSET64 offset
)

Read back MVL named list. This function also initializes the hash table for fast access.

Parameters
ctx: MVL context pointer
data: memory mapped data
data_size: size of memory mapped data
offset: offset into data where the LIBMVL_NAMED_LIST begins
Returns
NULL on error, otherwise LIBMVL_NAMED_LIST

Definition at line 1250 of file libMVL.c.

◆ mvl_recompute_named_list_hash()

void mvl_recompute_named_list_hash (LIBMVL_NAMED_LIST *L)

Recompute named list hash.

Parameters
L: pointer to previously allocated LIBMVL_NAMED_LIST

Definition at line 964 of file libMVL.c.

◆ mvl_rewrite_vector()

void mvl_rewrite_vector (LIBMVL_CONTEXT *ctx,
int type,
LIBMVL_OFFSET64 base_offset,
LIBMVL_OFFSET64 idx,
long length,
const void *data
)

Write more data to an MVL vector that has been previously created with mvl_start_write_vector().

Parameters
ctx: MVL context pointer that has been initialized for writing
type: MVL data type
base_offset: the offset returned by mvl_start_write_vector()
idx: index of the first element pointed to by data
length: number of elements to write
data: pointer to data

Definition at line 452 of file libMVL.c.

◆ mvl_start_write_vector()

LIBMVL_OFFSET64 mvl_start_write_vector (LIBMVL_CONTEXT *ctx,
int type,
LIBMVL_OFFSET64 expected_length,
LIBMVL_OFFSET64 length,
const void *data,
LIBMVL_OFFSET64 metadata
)

Begin write of MVL vector. This is only needed if the vector has to be written in parts, for example due to memory constraints.

Parameters
ctx: MVL context pointer that has been initialized for writing
type: MVL data type
expected_length: number of elements in the fully written vector
length: number of elements to write
data: pointer to data
metadata: an optional offset to previously written metadata. Specify LIBMVL_NO_METADATA if not needed.
Returns
an offset into the file, suitable for adding to MVL file directory, or to other MVL objects

Definition at line 369 of file libMVL.c.
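The split between mvl_start_write_vector() and mvl_rewrite_vector() can be illustrated without an MVL file. In the sketch below, fake_vector, start_write() and rewrite() are hypothetical in-memory stand-ins that only mimic the documented parameter roles: expected_length fixes the final size, and idx is the index of the first element being written. For testability the whole source array is passed in; in real use each chunk would typically be produced on demand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a vector reserved in an MVL file. */
typedef struct {
    double *elems;
    long expected_length;
} fake_vector;

/* Mimics the documented roles of mvl_start_write_vector(): reserve room for
   expected_length elements and store the first `length` of them. */
static int start_write(fake_vector *v, long expected_length, long length, const double *data)
{
    assert(length >= 0 && length <= expected_length);
    v->elems = calloc(expected_length > 0 ? expected_length : 1, sizeof(double));
    if (v->elems == NULL) return -1;
    v->expected_length = expected_length;
    memcpy(v->elems, data, length * sizeof(double));
    return 0;
}

/* Mimics mvl_rewrite_vector(): `idx` is the index of the first element in `data`. */
static void rewrite(fake_vector *v, long idx, long length, const double *data)
{
    assert(idx >= 0 && idx + length <= v->expected_length);
    memcpy(v->elems + idx, data, length * sizeof(double));
}

/* Write n values in chunks of at most `chunk` elements. */
static int write_in_chunks(fake_vector *v, const double *src, long n, long chunk)
{
    long first = n < chunk ? n : chunk;
    long idx;
    if (start_write(v, n, first, src) != 0) return -1;
    for (idx = first; idx < n; idx += chunk) {
        long len = (n - idx) < chunk ? (n - idx) : chunk;
        rewrite(v, idx, len, src + idx);   /* next chunk starts at element idx */
    }
    return 0;
}
```

With the real functions, the first chunk would go through mvl_start_write_vector() and each later chunk through mvl_rewrite_vector(), passing the returned base_offset together with the same idx arithmetic.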

◆ mvl_strerror()

const char * mvl_strerror (LIBMVL_CONTEXT *ctx)

Obtain a description of the error code stored in the context.

Parameters
ctx: pointer to a context previously allocated with mvl_create_context()
Returns
pointer to a C string whose memory is owned by the context

Definition at line 213 of file libMVL.c.

◆ mvl_write_attributes_list()

LIBMVL_OFFSET64 mvl_write_attributes_list (LIBMVL_CONTEXT *ctx, LIBMVL_NAMED_LIST *L)

Write out an R-style attribute list.

Parameters
ctx: MVL context pointer that has been initialized for writing
L: previously created attributes list
Returns
an offset into the file, suitable for use as vector metadata

Definition at line 1096 of file libMVL.c.

◆ mvl_write_cached_string()

LIBMVL_OFFSET64 mvl_write_cached_string (LIBMVL_CONTEXT *ctx, long length, const char *data)

Write a single C string if it has not been written before; otherwise, return the offset of the previously written object. This is particularly handy for providing metadata tags.

Parameters
ctx: MVL context pointer that has been initialized for writing
length: string length. Set to -1 to have it computed automatically.
data: string data
Returns
an offset into the file, suitable for adding to the MVL file directory or to other MVL objects

Definition at line 701 of file libMVL.c.
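The difference between mvl_write_string() and mvl_write_cached_string() is that the latter returns the earlier offset when an identical string was already written. The following sketch models that behaviour on a hypothetical append-only buffer; fake_file, the fixed-size cache and the linear search are assumptions made for brevity, and libMVL may organize its cache differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CACHED 64

/* Hypothetical append-only "file" plus a cache of strings already written. */
typedef struct {
    char data[4096];
    long used;                       /* bytes written so far */
    const char *cached[MAX_CACHED];  /* pointers into data */
    long cached_len[MAX_CACHED];
    long cached_off[MAX_CACHED];
    int ncached;
} fake_file;

/* Append bytes unconditionally, like a plain string write; returns the offset or -1. */
static long write_string(fake_file *f, long length, const char *s)
{
    long off = f->used;
    assert(s != NULL);
    if (length < 0) length = (long)strlen(s);   /* -1 means "compute the length" */
    if (f->used + length > (long)sizeof(f->data)) return -1;
    memcpy(f->data + off, s, length);
    f->used += length;
    return off;
}

/* Write `s` only if identical bytes were not written through this function before. */
static long write_cached_string(fake_file *f, long length, const char *s)
{
    int i;
    long off;
    if (length < 0) length = (long)strlen(s);
    for (i = 0; i < f->ncached; i++)
        if (f->cached_len[i] == length && memcmp(f->cached[i], s, length) == 0)
            return f->cached_off[i];         /* reuse the earlier object */
    off = write_string(f, length, s);
    if (off >= 0 && f->ncached < MAX_CACHED) {
        f->cached[f->ncached] = f->data + off;
        f->cached_len[f->ncached] = length;
        f->cached_off[f->ncached] = off;
        f->ncached++;
    }
    return off;
}
```

Repeated metadata tags such as a class name then cost storage only once, since every later request returns the same offset.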

◆ mvl_write_concat_vectors()

LIBMVL_OFFSET64 mvl_write_concat_vectors (LIBMVL_CONTEXT *ctx, int type, long nvec, const long *lengths, void **data, LIBMVL_OFFSET64 metadata)

Write complete MVL vector concatenating data from many vectors or arrays.

Parameters
ctx: MVL context pointer that has been initialized for writing
type: MVL data type
nvec: number of arrays to concatenate
lengths: array of lengths of the individual vectors
data: array of pointers to vector data
metadata: an optional offset to previously written metadata. Specify LIBMVL_NO_METADATA if not needed
Returns
an offset into the file, suitable for adding to the MVL file directory or to other MVL objects

Definition at line 636 of file libMVL.c.
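Concatenation places the elements of data[0], data[1], and so on one after another, so the written vector has the sum of lengths[] as its length. The sketch below builds that element sequence in memory for double-precision arrays; concat_doubles() is a hypothetical helper, not part of libMVL, and it does not write anything to a file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build the element sequence a concatenated vector holds:
   data[0][0..lengths[0]-1], then data[1][...], and so on.
   Returns a malloc'ed array and stores the total length in *total. */
static double *concat_doubles(long nvec, const long *lengths, double **data, long *total)
{
    long i, n = 0, pos = 0;
    double *out;
    for (i = 0; i < nvec; i++) {
        assert(lengths[i] >= 0);
        n += lengths[i];                      /* total length = sum of parts */
    }
    out = malloc((n > 0 ? n : 1) * sizeof(double));
    if (out == NULL) return NULL;
    for (i = 0; i < nvec; i++) {
        memcpy(out + pos, data[i], lengths[i] * sizeof(double));
        pos += lengths[i];                    /* next part starts right after */
    }
    *total = n;
    return out;
}
```

Writing the parts directly with mvl_write_concat_vectors() avoids allocating this combined buffer, which is the main reason to prefer it over concatenating first and calling mvl_write_vector().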

◆ mvl_write_directory()

LIBMVL_OFFSET64 mvl_write_directory (LIBMVL_CONTEXT *ctx)

Write out the MVL file directory with the entries collected so far. If this function is called multiple times, only the most recently written directory is retrieved when the MVL file is opened. It is an error to write out an empty directory.

Parameters
ctx: MVL context pointer that has been initialized for writing
Returns
an offset into the file where the directory was written

Definition at line 880 of file libMVL.c.

◆ mvl_write_extent_index()

LIBMVL_OFFSET64 mvl_write_extent_index (LIBMVL_CONTEXT *ctx, LIBMVL_EXTENT_INDEX *ei)

Write an extent index to the MVL file.

Definition at line 2695 of file libMVL.c.

◆ mvl_write_named_list()

LIBMVL_OFFSET64 mvl_write_named_list (LIBMVL_CONTEXT *ctx, LIBMVL_NAMED_LIST *L)

Write out a named list. In R, this would be read back as a list.

Parameters
ctx: MVL context pointer that has been initialized for writing
L: previously created named list
Returns
an offset into the file, suitable for adding to the MVL file directory or to other MVL objects

Definition at line 1119 of file libMVL.c.

◆ mvl_write_named_list2()

LIBMVL_OFFSET64 mvl_write_named_list2 (LIBMVL_CONTEXT *ctx, LIBMVL_NAMED_LIST *L, char *cl)

Write out a named list. In R, this would be read back as a list with its class attribute set to the value of cl.

Parameters
ctx: MVL context pointer that has been initialized for writing
L: previously created named list
cl: character string describing the list class
Returns
an offset into the file, suitable for adding to the MVL file directory or to other MVL objects

Definition at line 1141 of file libMVL.c.

◆ mvl_write_named_list_as_data_frame()

LIBMVL_OFFSET64 mvl_write_named_list_as_data_frame (LIBMVL_CONTEXT *ctx, LIBMVL_NAMED_LIST *L, int nrows, LIBMVL_OFFSET64 rownames)

Write out a named list in the style of an R data frame. All entries of L are assumed to be vectors with the same number of elements.

Parameters
ctx: MVL context pointer that has been initialized for writing
L: previously created named list
nrows: number of elements in each entry of L. Note that packed lists should have length nrows+1
rownames: offset of previously written names of individual rows. Set to 0 to omit.
Returns
an offset into the file, suitable for adding to the MVL file directory or to other MVL objects

Definition at line 1164 of file libMVL.c.
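Because mvl_write_named_list_as_data_frame() assumes consistent column lengths, it can be useful to validate them before writing. The hypothetical helper below checks each column against nrows, or nrows+1 for packed lists as noted in the nrows description; column_info and check_data_frame_columns() are illustrative names, not libMVL types.

```c
#include <assert.h>

/* Hypothetical description of one column about to be written. */
typedef struct {
    const char *name;
    long length;        /* number of stored elements */
    int is_packed_list; /* nonzero for packed lists of strings */
} column_info;

/* Return the index of the first column whose length does not match nrows
   (nrows + 1 for packed lists), or -1 if all columns are consistent. */
static long check_data_frame_columns(const column_info *cols, long ncols, long nrows)
{
    long i;
    assert(nrows >= 0);
    for (i = 0; i < ncols; i++) {
        long expected = cols[i].is_packed_list ? nrows + 1 : nrows;
        if (cols[i].length != expected) return i;
    }
    return -1;
}
```

A caller would run this check first and pass nrows to mvl_write_named_list_as_data_frame() only when it returns -1.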

◆ mvl_write_packed_list()

LIBMVL_OFFSET64 mvl_write_packed_list (LIBMVL_CONTEXT *ctx, long count, const long *str_size, unsigned char **str, LIBMVL_OFFSET64 metadata)

Write an array of strings as a packed list. This is a convenient way to store many different strings.

Parameters
ctx: MVL context pointer that has been initialized for writing
count: number of strings to store
str_size: array of lengths of individual strings. If this is NULL, string lengths are computed automatically. In addition, any individual string length set to -1 is also computed automatically.
str: pointer to an array of strings
metadata: an optional offset to previously written metadata. Specify LIBMVL_NO_METADATA if not needed
Returns
an offset into the file, suitable for adding to the MVL file directory or to other MVL objects

Definition at line 785 of file libMVL.c.
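A packed list stores many strings compactly. The sketch below assumes a layout of all characters back to back plus count+1 boundary offsets, which is consistent with the nrows+1 length noted for packed lists in data frames; the offsets here are relative to the start of an in-memory buffer, and libMVL's on-disk encoding may differ. It mirrors the documented str_size rules: a NULL array, or an entry of -1, means the length is taken from strlen(). packed_list and pack_strings() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory packed list: all strings back to back in `chars`,
   string i occupying chars[offsets[i]] .. chars[offsets[i+1]-1]. */
typedef struct {
    char *chars;
    long *offsets;   /* count + 1 entries */
    long count;
} packed_list;

static int pack_strings(packed_list *p, long count, const long *str_size, unsigned char **str)
{
    long i, total = 0;
    assert(count >= 0);
    p->offsets = malloc((count + 1) * sizeof(long));
    if (p->offsets == NULL) return -1;
    p->offsets[0] = 0;
    for (i = 0; i < count; i++) {
        /* NULL str_size, or an entry of -1, means "use strlen" */
        long len = (str_size == NULL || str_size[i] == -1)
                       ? (long)strlen((const char *)str[i]) : str_size[i];
        total += len;
        p->offsets[i + 1] = total;   /* end of string i = start of string i+1 */
    }
    p->chars = malloc(total > 0 ? total : 1);
    if (p->chars == NULL) { free(p->offsets); return -1; }
    for (i = 0; i < count; i++)
        memcpy(p->chars + p->offsets[i], str[i], p->offsets[i + 1] - p->offsets[i]);
    p->count = count;
    return 0;
}
```

Storing boundaries instead of terminators lets a string contain any bytes and makes its length a single subtraction, offsets[i+1] - offsets[i].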

◆ mvl_write_string()

LIBMVL_OFFSET64 mvl_write_string (LIBMVL_CONTEXT *ctx, long length, const char *data, LIBMVL_OFFSET64 metadata)

Write a single C string. This is particularly handy for providing metadata tags.

Parameters
ctx: MVL context pointer that has been initialized for writing
length: string length. Set to -1 to have it computed automatically.
data: string data
metadata: an optional offset to previously written metadata. Specify LIBMVL_NO_METADATA if not needed
Returns
an offset into the file, suitable for adding to the MVL file directory or to other MVL objects

Definition at line 689 of file libMVL.c.

◆ mvl_write_vector()

LIBMVL_OFFSET64 mvl_write_vector (LIBMVL_CONTEXT *ctx, int type, LIBMVL_OFFSET64 length, const void *data, LIBMVL_OFFSET64 metadata)

Write complete MVL vector.

Parameters
ctx: MVL context pointer that has been initialized for writing
type: MVL data type
length: number of elements to write
data: pointer to data
metadata: an optional offset to previously written metadata. Specify LIBMVL_NO_METADATA if not needed
Returns
an offset into the file, suitable for adding to the MVL file directory or to other MVL objects

Definition at line 319 of file libMVL.c.