TODO: A bit about the Utility Modules...


DATeS Utility Modules


dates_utility

add_filter_to_filters_list(new_filter_name, new_filter_src_languages, new_filter_src_path, filters_list_relative_dir='src/filters_forest', filters_list_file_name='filters_list.txt', backup_old_file=True)

Add a new filter to the list of implemented filters in DATeS.

Parameters:
  • new_filter_name – name of the new filter to add to list
  • new_filter_src_languages – list containing source code languages.
  • new_filter_src_path – directory to add the new filters’ source code.
  • filters_list_relative_dir – relative path of the directory where the filters_list file exists; the path must be relative to DATES_ROOT_PATH.
  • filters_list_file_name – name of the file containing filters’ list.
  • backup_old_file – save a copy of the file with extension .bk appended to the end of the file name.
Returns:

None

add_model_to_models_list(new_model_name, new_model_src_languages, new_model_src_path, new_model_dimension, models_list_relative_dir='src/models_forest', models_list_file_name='models_list.txt', backup_old_file=True)

Add a new model to the list of implemented models in DATeS.

Parameters:
  • new_model_name – name of the new model to add to list
  • new_model_src_languages – list containing source code languages.
  • new_model_src_path – directory to add the new models’ source code.
  • new_model_dimension – dimension of the model (0, 1, 2, 3, ...)
  • models_list_relative_dir – relative path of the directory where the models_list file exists; the path must be relative to DATES_ROOT_PATH.
  • models_list_file_name – name of the file containing models’ list.
  • backup_old_file – save a copy of the file with extension .bk appended to the end of the file name.
Returns:

None

add_scheme_to_hybrid_schemes_list(new_hybrid_scheme_name, new_hybrid_scheme_src_languages, new_hybrid_scheme_src_path, hybrid_schemes_list_relative_dir='src/hybrid_schemes_forest', hybrid_schemes_list_file_name='hybrid_schemes_list.txt', backup_old_file=True)

Add a new hybrid_scheme to the list of implemented hybrid_schemes in DATeS.

Parameters:
  • new_hybrid_scheme_name – name of the new hybrid_scheme to add to list
  • new_hybrid_scheme_src_languages – list containing source code languages.
  • new_hybrid_scheme_src_path – directory to add the new hybrid_schemes’ source code.
  • hybrid_schemes_list_relative_dir – relative path of the directory where the hybrid_schemes_list file exists; the path must be relative to DATES_ROOT_PATH.
  • hybrid_schemes_list_file_name – name of the file containing hybrid_schemes’ list.
  • backup_old_file – save a copy of the file with extension .bk appended to the end of the file name.
Returns:

None

add_scheme_to_variational_schemes_list(new_variational_scheme_name, new_variational_scheme_src_languages, new_variational_scheme_src_path, variational_schemes_list_relative_dir='src/variational_schemes_forest', variational_schemes_list_file_name='variational_schemes_list.txt', backup_old_file=True)

Add a new variational_scheme to the list of implemented variational_schemes in DATeS.

Parameters:
  • new_variational_scheme_name – name of the new variational_scheme to add to list
  • new_variational_scheme_src_languages – list containing source code languages.
  • new_variational_scheme_src_path – directory to add the new variational_schemes’ source code.
  • variational_schemes_list_relative_dir – relative path of the directory where the variational_schemes_list file exists; the path must be relative to DATES_ROOT_PATH.
  • variational_schemes_list_file_name – name of the file containing variational_schemes’ list.
  • backup_old_file – save a copy of the file with extension .bk appended to the end of the file name.
Returns:

None

add_smoother_to_smoothers_list(new_smoother_name, new_smoother_src_languages, new_smoother_src_path, smoothers_list_relative_dir='src/smoothers_forest', smoothers_list_file_name='smoothers_list.txt', backup_old_file=True)

Add a new smoother to the list of implemented smoothers in DATeS.

Parameters:
  • new_smoother_name – name of the new smoother to add to list
  • new_smoother_src_languages – list containing source code languages.
  • new_smoother_src_path – directory to add the new smoothers’ source code.
  • smoothers_list_relative_dir – relative path of the directory where the smoothers_list file exists; the path must be relative to DATES_ROOT_PATH.
  • smoothers_list_file_name – name of the file containing smoothers’ list.
  • backup_old_file – save a copy of the file with extension .bk appended to the end of the file name.
Returns:

None

clean_executable_files(root_dir=None, rm_extensions=['.o', '.out', '.pyc', '.exe'])

Remove executable files generated during execution in all subdirectories under the passed root_dir.

Parameters:
  • root_dir – directory under which to search recursively for executable files. If None is passed, DATES_ROOT_PATH is used.
  • rm_extensions – list containing all extensions to search for and remove. If only one type is passed, it can be a string. Be careful with this list when making use of Python extensions.
Returns:

None
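The recursive cleanup described above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the DATeS implementation: the function name clean_files_by_extension is hypothetical, and unlike the documented function it returns the list of removed paths so that the effect is easy to inspect.

```python
import os


def clean_files_by_extension(root_dir, rm_extensions=(".o", ".out", ".pyc", ".exe")):
    """Recursively delete files under root_dir whose extension is listed.

    Hypothetical helper; returns the removed paths for illustration only.
    """
    # A single extension may be passed as a plain string.
    if isinstance(rm_extensions, str):
        rm_extensions = (rm_extensions,)
    removed = []
    for dir_path, _dir_names, file_names in os.walk(root_dir):
        for file_name in file_names:
            # splitext returns (root, ext) with ext including the leading dot.
            if os.path.splitext(file_name)[1] in rm_extensions:
                full_path = os.path.join(dir_path, file_name)
                os.remove(full_path)
                removed.append(full_path)
    return removed
```

Note that '.pyc' and similar entries also match files produced by Python itself, which is the reason for the caution about the extensions list.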

formulate_list_file_header(schemes_type, line_length=120)

Return a string containing the header of any of the files containing lists of (models, filters,...)

Returns:

file_header

get_model_source_path(model_name, full_path=True)

Retrieve the path of the source code directory given the model name.

Parameters:
  • model_name – string containing model name
  • full_path – if True, the full path is returned; otherwise the path relative to DATES_ROOT_PATH is returned
Returns:

path of the model source files

Return type:

model_src_path

prepare_model_files(model_name, working_dir_rel_path='EXPERIMENT_RUN/', subdir_name='model_src')

Copy the necessary model files to the working directory.

Parameters:
  • model_name – name of the model; it must be in the table of models in the models list file
Returns:

full path of the model source files directory for the experiment run

Return type:

target_dir

query_yes_no(message, default='yes')

Terminal-based query: Y/N. This keeps asking until a valid yes/no is passed by user.

Parameters:
  • message – a string printed on the terminal to the user.
  • default – the presumed answer, e.g. if the user just hits <Enter>. It must be “yes” (the default), “no”, or None (in the latter case, an answer is required of the user).
Returns:

The “answer” return value is True for “yes” or False for “no”.
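The prompting loop described above can be sketched as follows. This is a minimal illustration, not the DATeS source: the name ask_yes_no and the input_fn argument are assumptions introduced here so that the loop can be exercised without a terminal.

```python
def ask_yes_no(message, default="yes", input_fn=input):
    """Keep asking until a valid yes/no answer is given; return True or False.

    input_fn is a hypothetical hook (defaults to the built-in input).
    """
    valid = {"yes": True, "y": True, "no": False, "n": False}
    if default not in ("yes", "no", None):
        raise ValueError("default must be 'yes', 'no', or None")
    prompt = {"yes": " [Y/n] ", "no": " [y/N] ", None: " [y/n] "}[default]
    while True:
        answer = input_fn(message + prompt).strip().lower()
        # An empty answer (just <Enter>) falls back to the default, if any.
        if answer == "" and default is not None:
            return valid[default]
        if answer in valid:
            return valid[answer]
        print("Please respond with 'yes' or 'no' (or 'y' or 'n').")
```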

read_filters_list(return_filter_full_path=True, filters_list_relative_dir='src/filters', filters_list_file_name='filters_list.txt')

Retrieve the list of implemented filters in DATeS.

Parameters:
  • return_filter_full_path – if True, the filters’ paths are returned as full paths; otherwise relative paths are returned.
  • filters_list_relative_dir – relative path of the directory where the filters_list file exists; the path must be relative to DATES_ROOT_PATH.
  • filters_list_file_name – name of the file containing filters’ list.
Returns:

  • the first is a list containing filters names,
  • the second is a list of lists containing source code languages,
  • the third is a list containing filters paths; full paths are returned by default.

Return type:

Three lists

read_hybrid_schemes_list(return_hybrid_scheme_full_path=True, hybrid_schemes_list_relative_dir='src/hybrid_schemes', hybrid_schemes_list_file_name='hybrid_schemes_list.txt')

Retrieve the list of implemented hybrid_schemes in DATeS.

Parameters:
  • return_hybrid_scheme_full_path – if True, the hybrid_schemes’ paths are returned as full paths; otherwise relative paths are returned.
  • hybrid_schemes_list_relative_dir – relative path of the directory where the hybrid_schemes_list file exists; the path must be relative to DATES_ROOT_PATH.
  • hybrid_schemes_list_file_name – name of the file containing hybrid_schemes’ list.
Returns:

  • the first is a list containing hybrid_schemes names,
  • the second is a list of lists containing source code languages,
  • the third is a list containing hybrid_schemes paths; full paths are returned by default.

Return type:

Three lists

read_models_list(return_model_full_path=True, models_list_relative_dir='src/Models_Forest', models_list_file_name='models_list.txt')

Retrieve the list of implemented models in DATeS.

Parameters:
  • return_model_full_path – if True, the models’ paths are returned as full paths; otherwise relative paths are returned.
  • models_list_relative_dir – relative path of the directory where the models_list file exists; the path must be relative to DATES_ROOT_PATH.
  • models_list_file_name – name of the file containing models’ list.
Returns:

  • the first is a list containing models names,
  • the second is a list of lists containing source code languages,
  • the third is a list containing models paths; full paths are returned by default.

Return type:

Three lists

read_smoothers_list(return_smoother_full_path=True, smoothers_list_relative_dir='src/smoothers', smoothers_list_file_name='smoothers_list.txt')

Retrieve the list of implemented smoothers in DATeS.

Parameters:
  • return_smoother_full_path – if True, the smoothers’ paths are returned as full paths; otherwise relative paths are returned.
  • smoothers_list_relative_dir – relative path of the directory where the smoothers_list file exists; the path must be relative to DATES_ROOT_PATH.
  • smoothers_list_file_name – name of the file containing smoothers’ list.
Returns:

  • the first is a list containing smoothers names,
  • the second is a list of lists containing source code languages,
  • the third is a list containing smoothers paths; full paths are returned by default.

Return type:

Three lists

read_variational_schemes_list(return_variational_schemes_full_path=True, variational_schemes_list_relative_dir='src/variational_schemes', variational_schemes_list_file_name='variational_schemes_list.txt')

Retrieve the list of implemented variational_schemes in DATeS.

Parameters:
  • return_variational_schemes_full_path – if True, the variational_schemes’ paths are returned as full paths; otherwise relative paths are returned.
  • variational_schemes_list_relative_dir – relative path of the directory where the variational_schemes_list file exists; the path must be relative to DATES_ROOT_PATH.
  • variational_schemes_list_file_name – name of the file containing variational_schemes’ list.
Returns:

  • the first is a list containing variational_schemes names,
  • the second is a list of lists containing source code languages,
  • the third is a list containing variational_schemes paths; full paths are returned by default.

Return type:

Three lists

_utility_configs

A module providing functions that handle configurations files; this includes reading, writing, and validating them.

aggregate_configurations(configs, def_configs, copy_configurations=True)

Blindly (and recursively) combine the two dictionaries. One-way copying: from def_configs to configs only. Add default configurations to the passed configs dictionary

Parameters:
  • configs
  • def_configs
  • copy_configurations
Returns:

configs
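The one-way, recursive merge described above can be sketched as below. This is an illustration of the idea rather than the DATeS code: the function name aggregate is hypothetical, and the handling of a None configs argument is an assumption.

```python
import copy


def aggregate(configs, def_configs, copy_configurations=True):
    """Recursively add entries of def_configs that are missing from configs.

    Existing entries in configs are never overwritten (one-way copying).
    """
    if configs is None:  # assumption: treat None as an empty dictionary
        configs = {}
    for key, def_value in def_configs.items():
        if key not in configs:
            # Deep-copy defaults so later edits do not leak into def_configs.
            configs[key] = copy.deepcopy(def_value) if copy_configurations else def_value
        elif isinstance(configs[key], dict) and isinstance(def_value, dict):
            # Both sides are dictionaries: descend one level.
            aggregate(configs[key], def_value, copy_configurations)
    return configs
```

With copy_configurations=False the merged dictionary shares mutable default values with def_configs, which is cheaper but means that changing one changes the other.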

read_assimilation_configurations(config_file_name='setup.inp', config_file_relative_dir=None)

Properly read the assimilation configurations from passed configurations file

Parameters:
  • config_file_name – name of the configurations file
  • config_file_relative_dir – relative directory where the configurations file <config_file_name> can be found. If this is None, the root directory of DATeS [$DATES_ROOT_PATH] is used.
Returns:

  • assimilation_configs
  • model_configs
  • inout_configs

validate_assimilation_configurations(assimilation_configs=None, def_assimilation_configs=None, model_configs=None, def_model_configs=None, inout_configs=None, def_inout_configs=None, copy_configurations=True)

Configurations are validated against passed defaults. If defaults are empty, passed configurations are returned as is. If only default configs are passed, they are copied to corresponding configs dict

Parameters:
  • model_configs – a dictionary containing model configurations. This should be obtained from a configurations file.
  • def_model_configs – default model configurations
  • assimilation_configs – a dictionary containing assimilation configurations.
  • def_assimilation_configs – default assimilation configurations
  • inout_configs – a dictionary containing input/output configurations.
  • def_inout_configs – default input/output configurations
  • copy_configurations – if True, a deep copy of the default dict is returned rather than a reference. This is relevant only if the passed configs are None.
Returns:

  • valid_model_configs – validated model configurations
  • valid_assimilation_configs – validated assimilation configurations
  • valid_inout_configs – validated input/output configurations

write_dates_configs_template(file_name='assimilation_configs_template.inp', directory=None)

Generate a configurations template file in the given path. Note that upon reading configurations later, a validation process for entries should be called.

Parameters:
  • file_name – name of the file to save configurations file template to,
  • directory – relative/full path of the directory to save the template in; if directory is not given, the file is written in the current working directory

write_dicts_to_config_file(file_name, out_dir, dicts, sections_headers)

Write one or more dictionaries (passed as a list) to a configuration file.

Parameters:
  • file_name
  • out_dir
  • dicts
  • sections_headers

_utility_data_assimilation

A module providing functions that handle DA functionalities; such as ensemble propagation, and inflation, etc.

calculate_localization_coefficients(radius, distances, method='Gauss')

Evaluate the spatial decorrelation coefficients based on the passed vector of distances and the method.

Parameters:
  • radius – decorrelation radius
  • distances – vector containing distances based on which decorrelation coefficients are calculated.
  • method – localization method. Supported methods: 1- Gaussian (‘Gauss’) 2- Gaspari_Cohn 3- Cosine 4- Cosine_squared 5- Exp3 6- Cubic 7- Quadro 8- Step 9- None
Returns:

a vector containing decorrelation coefficients.

Return type:

coefficients
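As an illustration of how decorrelation coefficients depend on distance, the sketch below evaluates two of the listed methods with NumPy. The exact scaling used inside DATeS is not stated here, so the formulas are assumptions: the Gaussian case uses exp(-d^2 / (2 radius^2)), and the Gaspari-Cohn case uses the standard fifth-order piecewise rational function with support 2 * radius. The function name is hypothetical.

```python
import numpy as np


def localization_coefficients(radius, distances, method="Gauss"):
    """Decorrelation coefficients for a vector of distances (illustrative)."""
    d = np.abs(np.atleast_1d(np.asarray(distances, dtype=float)))
    if method == "Gauss":
        return np.exp(-0.5 * (d / radius) ** 2)
    if method == "Gaspari_Cohn":
        r = d / radius
        coeffs = np.zeros_like(r)  # zero beyond twice the radius
        inner = r <= 1.0
        outer = (r > 1.0) & (r < 2.0)
        ri = r[inner]
        coeffs[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                         - (5.0 / 3.0) * ri**2 + 1.0)
        ro = r[outer]
        coeffs[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                         + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0
                         - 2.0 / (3.0 * ro))
        return coeffs
    raise ValueError("method not covered by this sketch: %r" % (method,))
```

Both functions equal 1 at zero distance; the Gaspari-Cohn function has compact support, which is why it is often preferred when covariances are to be set exactly to zero beyond a cutoff.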

calculate_rmse(first, second, vec_size=None)

Calculate the root mean squared error between two vectors of the same type. No exceptions are handled explicitly here

Parameters:
  • first
  • second
  • vec_size – length of each of the two vectors
Returns:

rmse
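For plain numeric vectors, the quantity above can be sketched in a few lines of NumPy. This assumes array-like inputs of equal length; DATeS state vectors are not used here, and the function name rmse is chosen only for this example.

```python
import numpy as np


def rmse(first, second):
    """Root mean squared error between two equally sized numeric vectors."""
    diff = np.asarray(first, dtype=float) - np.asarray(second, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```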

covariance_trace(ensemble, model=None, row_var=False, ddof=1)

Evaluate the trace of the covariance matrix given an ensemble of states.

Parameters:
  • ensemble – a list of model states, or a Numpy array. If it is a Numpy array, each column is taken as a state (row_var=False).
  • model – model object. Needed if the passed ensemble is a list of model states.
  • row_var – active only if ensemble is a Numpy nd-array. By default, each row is a state variable; set to True if each column is a state variable.
  • ddof – degrees of freedom; the variance is corrected by dividing by sample_size - ddof
Returns:

the trace of the covariance matrix of the ensemble. This is the sum of the ensemble-based variances of the state variables.

Return type:

trace
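Because the trace is the sum of the per-variable variances, it can be computed without forming the covariance matrix. The sketch below assumes a NumPy array with one state per column; the function name is hypothetical and the list-of-model-states case is not covered.

```python
import numpy as np


def covariance_trace_from_array(ensemble, ddof=1):
    """Trace of the ensemble covariance; each column of ensemble is a state."""
    ens = np.asarray(ensemble, dtype=float)
    # Variance of every state variable (row) across ensemble members (columns).
    return float(np.sum(np.var(ens, axis=1, ddof=ddof)))
```

For a state of size n and N members this costs O(n N) operations and memory, compared with O(n^2 N) for building the full covariance matrix first.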

create_synthetic_observations(model, observation_checkpoints, reference_checkpoints, reference_trajectory)

Create synthetic observations given a reference trajectory.

ensemble_covariance_dot_state(ensemble, in_state, model=None)

Given an ensemble of states (list of state vectors), evaluate the effect of the ensemble-based covariance matrix on the passed state vector.

Parameters:
  • ensemble – a list of model states. If it is a Numpy array, each column is taken as a state (row_var=False).
  • in_state – state vector
Returns:

The result of multiplying the ensemble-based covariance matrix by the given state.

Return type:

out_state
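The product can be formed without building the covariance matrix: with A the matrix of ensemble anomalies (members minus the ensemble mean) and N the ensemble size, the result is A (A^T x) / (N - 1). The sketch below assumes a NumPy array with one state per column and the unbiased 1 / (N - 1) scaling; the function name is hypothetical.

```python
import numpy as np


def covariance_dot_state(ensemble, in_state):
    """Multiply the ensemble-based covariance by in_state, matrix-free."""
    ens = np.asarray(ensemble, dtype=float)   # shape (state_size, ensemble_size)
    x = np.asarray(in_state, dtype=float)     # shape (state_size,)
    ensemble_size = ens.shape[1]
    anomalies = ens - ens.mean(axis=1, keepdims=True)
    # Two thin matrix-vector products instead of one state_size^2 matrix.
    return anomalies @ (anomalies.T @ x) / (ensemble_size - 1)
```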

inflate_ensemble(ensemble, inflation_factor, in_place=True)

Apply inflation on an ensemble of states
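A common form of (multiplicative) inflation scales each member's deviation from the ensemble mean by the inflation factor. Whether DATeS applies exactly this form is an assumption; the sketch below works on a NumPy array with one state per column, returns a new array rather than modifying in place, and uses a hypothetical function name.

```python
import numpy as np


def inflate_about_mean(ensemble, inflation_factor):
    """Scale ensemble anomalies by inflation_factor, keeping the mean fixed."""
    ens = np.asarray(ensemble, dtype=float)   # shape (state_size, ensemble_size)
    mean = ens.mean(axis=1, keepdims=True)
    return mean + inflation_factor * (ens - mean)
```

Under this form the ensemble mean is unchanged and every ensemble variance is multiplied by the square of the inflation factor.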

propagate_ensemble(ensemble, model, checkpoints, in_place=True)

Given a list (ensemble) of model states, use the model object to propagate each state forward in time and create/modify the ensemble based on the flag in in_place.

Parameters:
  • ensemble – list of model state vectors. Exceptions should be handled more carefully

random_orthonormal_matrix(dimension)

Generates a random orthonormal matrix Q such that: Q * Q^T = I, where I is the identity matrix of size dimension x dimension

Parameters:
  • dimension – size of the random orthonormal matrix to be generated
Returns:

the random orthonormal matrix; Q is of size (dimension x dimension)

Return type:

Q
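One standard way to draw such a matrix is the QR factorization of a matrix with independent standard normal entries. Whether DATeS uses this construction is not stated here; the sketch below is an assumption-laden illustration, and the function name and the rng argument are hypothetical.

```python
import numpy as np


def random_orthonormal(dimension, rng=None):
    """Random orthonormal matrix Q (Q @ Q.T = I) via QR of a Gaussian matrix."""
    rng = np.random.default_rng() if rng is None else rng
    q, r = np.linalg.qr(rng.standard_normal((dimension, dimension)))
    # Flip column signs so that the diagonal of R is positive; this removes
    # the sign ambiguity of the QR factorization.
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs
```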

random_orthonormal_mean_preserving_matrix(dimension)

Generates a random orthonormal mean-preserving matrix Q, such that: Q * Q^T = I, where I is the identity matrix of size dimension x dimension, and Q * II = 0, where II is a column vector of ones, and 0 is a column vector of zeros.

Parameters:
  • dimension – size of the random orthonormal matrix to be generated
Returns:

the random orthonormal matrix; Q is of size (dimension x dimension)

Return type:

Q

rank_hist(ensembles_repo, reference_repo, first_var=0, last_var=None, var_skp=1, draw_hist=False, hist_type='relfreq', first_time_ind=0, last_time_ind=None, time_ind_skp=1, hist_title=None, hist_max_height=None, font_size=None)

Calculate the rank statistics of the true solution/observations w.r.t an ensemble of states/observations

Parameters:
  • ensembles_repo – an ensemble of model states (or model observations). A numpy array of size (state/observation size, ensemble size, time instances)
  • reference_repo – a list of reference states (or model observations) A numpy array of size (state/observation size, time instances)
  • first_var – initial index in the reference states to evaluate ranks at
  • last_var – last index in the reference states to evaluate ranks at
  • var_skp – number of skipped variables to reduce correlation effect
  • draw_hist – If True, a rank histogram is plotted, and a figure handle is returned, None is returned otherwise
  • hist_type – ‘freq’ vs ‘relfreq’: Frequency vs Relative frequencies for plotting. Used only when ‘draw_hist’ is True.
  • first_time_ind – initial index in the time dimension to evaluate ranks at
  • last_time_ind – last index in the time dimension to evaluate ranks at
  • time_ind_skp – number of skipped time instances to reduce correlation effect
  • hist_title – histogram plot title (if given), and ‘draw_hist’ is True.
Returns:

  • ranks_freq – frequencies of the rank of truth among ensemble members
  • ranks_rel_freq – relative frequencies of the rank of truth among ensemble members
  • bins_bounds – bounds of the bar plot
  • fig_hist – a matplotlib.pyplot figure handle of the rank histogram plot
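The core of the rank statistics can be sketched as follows: for each variable and time instance, the rank of the reference value is the number of ensemble members below it, and the ranks are then counted. This sketch assumes the array shapes given above, ignores ties, variable/time skipping, and plotting, and uses a hypothetical function name.

```python
import numpy as np


def rank_frequencies(ensembles_repo, reference_repo):
    """Counts and relative frequencies of the rank of the reference values."""
    ens = np.asarray(ensembles_repo, dtype=float)   # (size, ensemble size, times)
    ref = np.asarray(reference_repo, dtype=float)   # (size, times)
    ensemble_size = ens.shape[1]
    # Rank = number of members strictly below the reference: 0..ensemble_size.
    ranks = np.sum(ens < ref[:, np.newaxis, :], axis=1)
    ranks_freq = np.bincount(ranks.ravel(), minlength=ensemble_size + 1)
    ranks_rel_freq = ranks_freq / ranks_freq.sum()
    return ranks_freq, ranks_rel_freq
```

A roughly flat histogram suggests that the reference is statistically indistinguishable from the ensemble members, while a U-shaped histogram is usually read as a sign of an under-dispersed ensemble.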

_utility_file_IO

A module providing utility functions required to handle files IO operations.

cleanup_directory(directory_name, parent_path, backup_existing=True)

Try to find the directory name under the parent path, i.e. parent_path/directory. If the directory does not exist, create it; otherwise either delete its contents or back them up. If backup_existing is True and zip_backup is true, the backup is archived as a *.zip file.

Parameters:
  • directory_name
  • parent_path
  • backup_existing

get_list_of_subdirectories(root_dir)

Retrieve a list of sub-directories.

Parameters:
  • root_dir – directory under which the list of sub-directories is constructed.
Returns:

a list containing the subdirectories under the given root_dir; None is returned if root_dir has no subdirectories.

Return type:

subdirs_list

read_ensemble_states(ensemble_size, ensemble_file_prefix='ensemble_mem', ensemble_file_ext='.dat', ensemble_relative_dir='ensemble_out/')

Read ensemble states and return an np.ndarray (two-dimensional) with each ensemble member stored in a column.

Parameters:
  • ensemble_size – number of ensemble members to read from files
  • ensemble_file_prefix – all files are named as <ensemble_file_prefix>_number.dat
  • ensemble_relative_dir – relative directory containing the ensemble file(s)
Returns:

two-dimensional np.ndarray containing the ensemble members.

try_file_name(directory, file_prefix, extension)

Try to find a suitable file name file_prefix_<number>.<extension>

Parameters:
  • directory
  • file_prefix
  • extension
Returns:

file_name
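The search for a free name of the form file_prefix_<number>.<extension> can be sketched as a simple loop. The starting number (0 here) and the handling of a leading dot in the extension are assumptions, and the function name is hypothetical.

```python
import os


def next_free_file_name(directory, file_prefix, extension):
    """Return the first file_prefix_<number>.<extension> not present in directory."""
    ext = "." + extension.lstrip(".")   # accept both 'dat' and '.dat'
    number = 0
    while True:
        candidate = f"{file_prefix}_{number}{ext}"
        if not os.path.exists(os.path.join(directory, candidate)):
            return candidate
        number += 1
```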

zip_dir(path, output_location=None, save_full_path=False)

Backup a directory in a zip archive in the given ‘output_location’. If an archive with the same name in the output_location exists, a proper number-suffix will replace the archive name.

Parameters:
  • path – the path of the directory to zip. All files and subdirectories in the leaf directory will be archived.
  • output_location – where to save the zip archive
  • save_full_path – if True, the whole path will be traced while archiving.

_utility_machine_learning

A module providing utility functions required to handle machine learning related operations.

GMM_clustering(Ensemble, num_comp, covariance_type, inf_criteria)

Standard Gaussian Mixture model with EM

Parameters:
  • Ensemble
  • num_comp
  • covariance_type
  • inf_criteria
Returns:

  • gmm_model
  • opt_inf_criterion
  • optimal_covar_type

VBGMM_clustering(Ensemble, num_comp, covariance_type, inf_criteria, alpha=1.0, random_state=None, thresh=None, tol=0.001, verbose=False, min_covar=None, n_iter=10, params='wmc', init_params='wmc')

Variational Gaussian Mixture model with EM

Parameters:
  • Ensemble
  • num_comp
  • covariance_type
  • inf_criteria
  • alpha=1.0
  • random_state=None
  • thresh=None
  • tol=0.001
  • verbose=False
  • min_covar=None
  • n_iter=10
  • params='wmc'
  • init_params='wmc'
Returns:

  • gmm_model
  • opt_inf_criterion
  • optimal_covar_type

generate_gmm_model_info(ensemble, clustering_model='gmm', cov_type=None, inf_criteria='aic', number_of_components=None, min_number_of_components=None, max_number_of_components=None, min_number_of_points_per_component=1, invert_uncertainty_param=False, verbose=False)

Build the best Gaussian mixture model fitting to an ensemble of states.

Parameters:
  • ensemble – A two dimensional numpy array with each row representing an ensemble member
  • clustering_model
  • cov_type
  • inf_criteria
  • number_of_components
  • min_number_of_components
  • max_number_of_components
  • min_number_of_points_per_component
  • invert_uncertainty_param
Returns:

  • optimal_model
  • converged
  • lables
  • weights
  • means
  • covariances
  • precisions
  • optimal_covar_type

_utility_optimization

A module providing functions that handle optimization related functionalities; this includes validating gradient, etc.

validate_gradient(state, gradient, func, *func_params, **fd_type_and_output)

Validate gradient of a scalar function

Parameters:
  • state – state vector at which gradient of the objective function is evaluated
  • gradient – exact gradient vector of ‘func’, evaluated at ‘state’
  • func – objective function to differentiate
  • func_params – (tuple) of function parameters passed to ‘func’
  • fd_type_and_output – (dictionary) of named arguments. Supported are: - ‘FD_type’: type of the finite difference scheme: ‘left’, ‘right’, ‘central’ - ‘screen_output’: If True, results are printed to screen before return.
Returns:

  • Grad_FD – gradient vector evaluated using finite difference approximation
  • Rel_ERR – vector containing relative errors of the exact vs. approximate gradient
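The check can be sketched as a component-wise finite difference approximation compared against the supplied gradient. This is an illustration only: the function name, the eps step size, and passing the scheme as the keyword fd_type are assumptions, and the state is taken to be a plain NumPy vector.

```python
import numpy as np


def fd_gradient_check(state, gradient, func, *func_params, fd_type="central", eps=1e-6):
    """Return (finite difference gradient, relative errors vs. the given gradient)."""
    x = np.asarray(state, dtype=float)
    exact = np.asarray(gradient, dtype=float)
    grad_fd = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps   # perturb one component at a time
        if fd_type == "central":
            grad_fd[i] = (func(x + step, *func_params) - func(x - step, *func_params)) / (2.0 * eps)
        elif fd_type == "right":
            grad_fd[i] = (func(x + step, *func_params) - func(x, *func_params)) / eps
        elif fd_type == "left":
            grad_fd[i] = (func(x, *func_params) - func(x - step, *func_params)) / eps
        else:
            raise ValueError("fd_type must be 'left', 'right', or 'central'")
    # Guard against division by zero where the exact gradient vanishes.
    rel_err = np.abs(exact - grad_fd) / np.maximum(np.abs(exact), np.finfo(float).eps)
    return grad_fd, rel_err
```

The central scheme needs two function evaluations per component but its truncation error is second order in eps, whereas the one-sided schemes are first order.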

_utility_stat

A module providing functions that handle statistics-related operations; this includes generating random vectors, etc.

add_ensemble(first_ensemble, second_ensemble, in_place=True)

Add two ensembles. If in_place is True, the first is overwritten.

Parameters:
  • first_ensemble
  • second_ensemble
  • in_place
Returns:

result_vectors_list

ensemble_mean(vectors_list)

Given a list of State or Observation vectors, the mean is evaluated and returned

Parameters:
  • vectors_list – a list of state or observation vectors
Returns:

a vector of the same type as the entries of vectors_list, containing the mean of the objects in vectors_list.

Return type:

ens_average

ensemble_precisions(ensemble, sample_based=True, return_state_vector=False)

Calculate the ensemble-based statevector precisions (variance reciprocals).

Parameters:
  • ensemble
  • sample_based
  • return_state_vector

ensemble_variances(ensemble, sample_based=True, return_state_vector=False)

Calculate the ensemble-based statevector variances.

Parameters:
  • ensemble
  • sample_based
  • return_state_vector
Returns:

_raw_vector_ref

generate_ensemble(ensemble_size, ensemble_average, noise_model)

Create a list/ensemble of states of size ensemble_size with noise vectors created using noise_model centered around ensemble_average.
Parameters:
  • ensemble_size
  • ensemble_average
  • noise_model
Returns:

perturbed_ensemble

mvn_rand_vec(vec_size)

Function generating a standard normal random vector with values truncated at -/+3

Parameters:
  • vec_size
Returns:

randn_vec
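One way to obtain a standard normal vector with all values inside [-3, 3] is to redraw the entries that fall outside the bounds. Whether DATeS redraws or simply clips such entries is not stated here, so the redrawing strategy, the function name, and the bound and rng arguments in this sketch are all assumptions.

```python
import numpy as np


def truncated_standard_normal(vec_size, bound=3.0, rng=None):
    """Standard normal vector whose entries are redrawn until |value| <= bound."""
    rng = np.random.default_rng() if rng is None else rng
    vec = rng.standard_normal(vec_size)
    out_of_bounds = np.abs(vec) > bound
    while out_of_bounds.any():
        # Redraw only the offending entries; roughly 0.3% of draws for bound=3.
        vec[out_of_bounds] = rng.standard_normal(int(out_of_bounds.sum()))
        out_of_bounds = np.abs(vec) > bound
    return vec
```

Redrawing keeps the shape of the normal density inside the bounds, whereas clipping would pile a small probability mass exactly at -3 and +3.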

_utility_url

A module providing classes and functions that handle url-related functionalities; such as downloading files, etc.

class URLDownload(link=None, file_name=None, checksum='md5', download_immediately=False)

Download a file. A simple class to download files using urllib

Parameters:
  • link
  • file_name
  • checksum
  • download_immediately

Methods

__init__(link=None, file_name=None, checksum='md5', download_immediately=False)

download(file_link=None, file_name=None, print_summary=False, return_summary=False)

Download a file from the web, showing progress and hash.

Parameters:
  • file_link
  • file_name
  • print_summary
  • return_summary

hook(*data)

This hook function will be called once on establishment of the network connection and once after each block read thereafter. The hook will be passed three arguments; a count of blocks transferred so far, a block size in bytes, and the total size of the file. The third argument may be -1 on older FTP servers which do not return a file size in response to a retrieval request.
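A hook with this three-argument shape can be sketched as below. The function name progress_hook is hypothetical, and returning the message is done only so the behavior is easy to inspect; a hook passed to urllib's urlretrieve is called for its side effects.

```python
def progress_hook(block_count, block_size, total_size):
    """Print download progress from (blocks so far, block size, total size)."""
    downloaded = block_count * block_size
    if total_size > 0:
        # The last block may overshoot the file size, so cap at 100%.
        fraction = min(downloaded / total_size, 1.0)
        message = f"\rDownloaded {fraction:.1%}"
    else:
        # Total size unknown (e.g. -1): report the byte count only.
        message = f"\rDownloaded {downloaded} bytes"
    print(message, end="", flush=True)
    return message


# Possible usage with the standard library (not executed here):
#   import urllib.request
#   urllib.request.urlretrieve(link, file_name, reporthook=progress_hook)
```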