Title: | End-to-End Automated Machine Learning and Model Evaluation |
---|---|
Description: | Single unified interface for end-to-end modelling of regression, categorical and time-to-event (survival) outcomes. Models created using familiar are self-containing, and their use does not require additional information such as baseline survival, feature clustering, or feature transformation and normalisation parameters. Model performance, calibration, risk group stratification, (permutation) variable importance, individual conditional expectation, partial dependence, and more, are assessed automatically as part of the evaluation process and exported in tabular format and plotted, and may also be computed manually using export and plot functions. Where possible, metrics and values obtained during the evaluation process come with confidence intervals. |
Authors: | Alex Zwanenburg [aut, cre] |
Maintainer: | Alex Zwanenburg <[email protected]> |
License: | EUPL |
Version: | 1.5.0 |
Built: | 2025-02-20 05:33:12 UTC |
Source: | https://github.com/alexzwanenburg/familiar |
This method aggregates variable importance from one or more vimpTable objects.
aggregate_vimp_table(x, aggregation_method, rank_threshold = NULL, ...) ## S4 method for signature 'list' aggregate_vimp_table(x, aggregation_method, rank_threshold = NULL, ...) ## S4 method for signature 'character' aggregate_vimp_table(x, aggregation_method, rank_threshold = NULL, ...) ## S4 method for signature 'vimpTable' aggregate_vimp_table(x, aggregation_method, rank_threshold = NULL, ...) ## S4 method for signature 'NULL' aggregate_vimp_table(x, aggregation_method, rank_threshold = NULL, ...) ## S4 method for signature 'experimentData' aggregate_vimp_table(x, aggregation_method, rank_threshold = NULL, ...)
x |
Variable importance ( |
aggregation_method |
Method used to aggregate variable importance. The available methods are described in the feature selection methods vignette. |
rank_threshold |
Rank threshold used within several aggregation methods. See the feature selection methods vignette for more details. |
... |
Unused parameters. |
A vimpTable object with aggregated variable importance data.
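To give a feel for what aggregating variable importance involves, the following is a minimal base R sketch of mean-rank aggregation across data subsets. It does not call familiar and is not its implementation: the feature names and ranks are made up, and the methods that familiar actually offers are those described in the feature selection methods vignette.

```r
# Hypothetical feature ranks from three data subsets (lower = more important).
# These values are illustrative and were not produced by familiar.
rank_tables <- list(
  data.frame(feature = c("age", "stage", "size"), rank = c(1, 2, 3)),
  data.frame(feature = c("age", "stage", "size"), rank = c(2, 1, 3)),
  data.frame(feature = c("age", "stage", "size"), rank = c(1, 3, 2))
)

# Stack the tables and average the rank of each feature over the subsets.
all_ranks <- do.call(rbind, rank_tables)
aggregated <- aggregate(rank ~ feature, data = all_ranks, FUN = mean)

# Convert the mean ranks back to an overall ranking and sort by it.
aggregated$aggregate_rank <- rank(aggregated$rank, ties.method = "min")
aggregated <- aggregated[order(aggregated$aggregate_rank), ]
```

Other aggregation schemes differ mainly in the summary function that replaces the mean, and some additionally use a rank threshold to decide which ranks count.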
Creates a dataObject object from input data. Input data can be a data.frame or data.table, a path to such tables on a local or network drive, or a path to tabular data that may be converted to these formats.
In addition, a familiarEnsemble or familiarModel object can be passed along to check whether the data are formatted correctly, e.g. by checking the levels of categorical features, whether all expected columns are present, etc.
as_data_object(data, ...) ## S4 method for signature 'dataObject' as_data_object(data, object = NULL, ...) ## S4 method for signature 'data.table' as_data_object( data, object = NULL, sample_id_column = waiver(), batch_id_column = waiver(), series_id_column = waiver(), development_batch_id = waiver(), validation_batch_id = waiver(), outcome_name = waiver(), outcome_column = waiver(), outcome_type = waiver(), event_indicator = waiver(), censoring_indicator = waiver(), competing_risk_indicator = waiver(), class_levels = waiver(), exclude_features = waiver(), include_features = waiver(), reference_method = waiver(), check_stringency = "strict", ... ) ## S4 method for signature 'ANY' as_data_object( data, object = NULL, sample_id_column = waiver(), batch_id_column = waiver(), series_id_column = waiver(), ... )
data |
A |
... |
Unused arguments. |
object |
A |
sample_id_column |
(recommended) Name of the column containing
sample or subject identifiers. See If unset, every row will be identified as a single sample. |
batch_id_column |
(recommended) Name of the column containing batch or cohort identifiers. This parameter is required if more than one dataset is provided, or if external validation is performed. In familiar any row of data is organised by four identifiers:
|
series_id_column |
(optional) Name of the column containing series
identifiers, which distinguish between measurements that are part of a
series for a single sample. See If unset, rows which share the same batch and sample identifiers but have a different outcome are assigned unique series identifiers. |
development_batch_id |
(optional) One or more batch or cohort
identifiers to constitute data sets for development. Defaults to all, or
all minus the identifiers in |
validation_batch_id |
(optional) One or more batch or cohort
identifiers to constitute data sets for external validation. Defaults to
all data sets except those in |
outcome_name |
(optional) Name of the modelled outcome. This name will
be used in figures created by If not set, the column name in |
outcome_column |
(recommended) Name of the column containing the
outcome of interest. May be identified from a formula, if a formula is
provided as an argument. Otherwise an error is raised. Note that |
outcome_type |
(recommended) Type of outcome found in the outcome column. The outcome type determines many aspects of the overall process, e.g. the available feature selection methods and learners, but also the type of assessments that can be conducted to evaluate the resulting models. Implemented outcome types are:
If not provided, the algorithm will attempt to obtain outcome_type from contents of the outcome column. This may lead to unexpected results, and we therefore advise to provide this information manually. Note that |
event_indicator |
(recommended) Indicator for events in |
censoring_indicator |
(recommended) Indicator for right-censoring in
|
competing_risk_indicator |
(recommended) Indicator for competing
risks in |
class_levels |
(optional) Class levels for |
exclude_features |
(optional) Feature columns that will be removed
from the data set. Cannot overlap with features in |
include_features |
(optional) Feature columns that are specifically
included in the data set. By default all features are included. Cannot
overlap with |
reference_method |
(optional) Method used to set reference levels for categorical features. There are several options:
|
check_stringency |
Specifies stringency of various checks. This is mostly:
|
You can specify settings for your data manually, e.g. the column for sample identifiers (sample_id_column). This prevents you from having to change the column name externally. If you provide a familiarModel or familiarEnsemble for the object argument, any parameters you provide take precedence over parameters specified by the object.
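The precedence rule can be pictured with plain lists. This is only an illustration in base R, assuming hypothetical setting names and values; it does not show how familiar stores or resolves settings internally.

```r
# Hypothetical settings stored with a previously trained model.
object_settings <- list(
  sample_id_column = "patient_id",
  outcome_column = "survival_time",
  outcome_type = "survival"
)

# Hypothetical setting supplied manually for a new data set.
user_settings <- list(sample_id_column = "subject")

# modifyList() overwrites matching elements of the first list with those of
# the second, so manually provided values win over the stored ones.
resolved <- modifyList(object_settings, user_settings)
```

Settings that are not provided manually, such as the outcome type here, are retained from the stored values.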
A dataObject object.
Creates a familiarCollection object from familiarData, familiarEnsemble or familiarModel objects.
as_familiar_collection( object, familiar_data_names = NULL, collection_name = NULL, ... ) ## S4 method for signature 'familiarCollection' as_familiar_collection( object, familiar_data_names = NULL, collection_name = NULL, ... ) ## S4 method for signature 'familiarData' as_familiar_collection( object, familiar_data_names = NULL, collection_name = NULL, ... ) ## S4 method for signature 'familiarEnsemble' as_familiar_collection( object, familiar_data_names = NULL, collection_name = NULL, ... ) ## S4 method for signature 'familiarModel' as_familiar_collection( object, familiar_data_names = NULL, collection_name = NULL, ... ) ## S4 method for signature 'list' as_familiar_collection( object, familiar_data_names = NULL, collection_name = NULL, ... ) ## S4 method for signature 'character' as_familiar_collection( object, familiar_data_names = NULL, collection_name = NULL, ... ) ## S4 method for signature 'ANY' as_familiar_collection( object, familiar_data_names = NULL, collection_name = NULL, ... )
object |
|
familiar_data_names |
Names of the dataset(s). Only used if the |
collection_name |
Name of the collection. |
... |
Arguments passed on to
|
A data argument is expected if the object argument is a familiarEnsemble object or one or more familiarModel objects.
A familiarCollection object.
Creates a familiarData object from familiarEnsemble or familiarModel objects.
as_familiar_data(object, ...) ## S4 method for signature 'familiarData' as_familiar_data(object, ...) ## S4 method for signature 'familiarEnsemble' as_familiar_data(object, name = NULL, ...) ## S4 method for signature 'familiarModel' as_familiar_data(object, ...) ## S4 method for signature 'list' as_familiar_data(object, ...) ## S4 method for signature 'character' as_familiar_data(object, ...) ## S4 method for signature 'ANY' as_familiar_data(object, ...)
object |
A |
... |
Arguments passed on to
|
name |
Name of the |
The data argument is required if familiarEnsemble or familiarModel objects are provided.
A familiarData object.
Creates a familiarEnsemble object from familiarModel objects.
as_familiar_ensemble(object, ...) ## S4 method for signature 'familiarEnsemble' as_familiar_ensemble(object, ...) ## S4 method for signature 'familiarModel' as_familiar_ensemble(object, ...) ## S4 method for signature 'list' as_familiar_ensemble(object, ...) ## S4 method for signature 'character' as_familiar_ensemble(object, ...) ## S4 method for signature 'ANY' as_familiar_ensemble(object, ...)
object |
A |
... |
Unused arguments. |
A familiarEnsemble object.
Extract model coefficients
coef(object, ...) ## S4 method for signature 'familiarModel' coef(object, ...)
object |
A familiarModel object. |
... |
additional arguments passed to |
This method extends the coef S3 method. For some models, coef requires information that is trimmed from the model. In this case, a copy of the model coefficients is stored with the model and returned.
Coefficients extracted from the model in the familiarModel object, if any.
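The idea of keeping a copy of the coefficients alongside a trimmed model can be sketched with a small S4 class. The class name trimmedModel and its slot are hypothetical and are not part of familiar; the sketch only shows how an S4 method can extend coef while the default behaviour for ordinary models is kept.

```r
library(methods)

# Hypothetical container for a model whose fitted object was trimmed, with a
# copy of the coefficients stored separately.
setClass(
  "trimmedModel",
  representation(stored_coefficients = "numeric")
)

# Defining an S4 method for coef creates an S4 generic from stats::coef.
# Objects of other classes still dispatch to the original function.
setMethod("coef", "trimmedModel", function(object, ...) {
  object@stored_coefficients
})

# Fit an ordinary linear model and store a copy of its coefficients.
fit <- lm(dist ~ speed, data = cars)
trimmed <- new("trimmedModel", stored_coefficients = coef(fit))

# Returns the stored copy, even though no fitted model is attached.
trimmed_coefficients <- coef(trimmed)
```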
The dataObject class is used to resolve the issue of keeping track of pre-processing status and data loading inside complex workflows, e.g. nested predict functions inside a calibration function.
data
NULL or data table containing the data. This is the data which will be read and used.
preprocessing_level
character indicating the level of pre-processing already conducted.
outcome_type
character, determines the outcome type.
data_column_info
Object containing column information.
delay_loading
logical. Allows delayed loading of data, which enables data parsing downstream without additional workflow complexity or memory utilisation.
perturb_level
numeric. This is the perturbation level for data which has not been loaded. Used for data retrieval by interacting with the run table of the accompanying model.
load_validation
logical. This determines which internal data set will be loaded. If TRUE, the validation data will be loaded, whereas FALSE loads the development data.
aggregate_on_load
logical. Determines whether data is aggregated after loading.
sample_set_on_load
NULL or vector of sample identifiers to be loaded.
An experimentData object contains information concerning the experiment. These objects can be used to instantiate multiple experiments using the same iterations, feature information and variable importance.
experimentData objects are primarily used to improve reproducibility, since these allow for training models on a shared foundation.
experiment_setup
Contains information regarding the experimental setup that is used to generate the iteration list.
iteration_list
List of iteration data that determines which instances are assigned to training, validation and test sets.
feature_info
Feature information objects. Only available if the experimentData object was generated using the precompute_feature_info or precompute_vimp functions.
vimp_table_list
List of variable importance table objects. Only available if the experimentData object was created using the precompute_vimp function.
project_id
Identifier of the project that generated the experimentData object.
familiar_version
Version of the familiar package used to create this experimentData.
precompute_data_assignment, precompute_feature_info, precompute_vimp
Extract and export all data from a familiarCollection.
export_all(object, dir_path = NULL, aggregate_results = waiver(), ...) ## S4 method for signature 'familiarCollection' export_all(object, dir_path = NULL, aggregate_results = waiver(), ...) ## S4 method for signature 'ANY' export_all(object, dir_path = NULL, aggregate_results = waiver(), ...)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
... |
Arguments passed on to
|
Data, such as model performance and calibration information, is usually collected from a familiarCollection object. However, you can also provide one or more familiarData objects, which are internally converted to a familiarCollection object. It is also possible to provide a familiarEnsemble or one or more familiarModel objects, together with the data set from which the exported data are computed prior to export. Paths to files containing any of these objects can also be provided.
All parameters aside from object and dir_path are only used if object is not a familiarCollection object, or a path to one.
A list of data.tables (if dir_path is not provided), or nothing, as all data is exported to csv files.
Extract and export ROC and Precision-Recall curves for models in a familiarCollection.
export_auc_data( object, dir_path = NULL, aggregate_results = TRUE, export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' export_auc_data( object, dir_path = NULL, aggregate_results = TRUE, export_collection = FALSE, ... ) ## S4 method for signature 'ANY' export_auc_data( object, dir_path = NULL, aggregate_results = TRUE, export_collection = FALSE, ... )
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection object. However, you can also provide one or more familiarData objects, which are internally converted to a familiarCollection object. It is also possible to provide a familiarEnsemble or one or more familiarModel objects, together with the data set from which the exported data are computed prior to export. Paths to files containing any of these objects can also be provided.
All parameters aside from object and dir_path are only used if object is not a familiarCollection object, or a path to one.
ROC curve data are exported for individual and ensemble models. For ensemble models, a credibility interval for the ROC curve is determined using bootstrapping for each metric. In case of multinomial outcomes, ROC-curves are computed for each class, using a one-against-all approach.
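The one-against-all approach for multinomial outcomes can be sketched in base R. The observed classes and predicted probabilities below are made up, and the area under the ROC curve is computed with the rank-sum (Mann-Whitney) identity. This is an illustration, not necessarily how familiar computes it.

```r
# Area under the ROC curve from the rank-sum (Mann-Whitney) identity.
auc_roc <- function(is_positive, score) {
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  ranks <- rank(score)  # ties receive average ranks
  (sum(ranks[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical observed classes and predicted class probabilities.
observed <- c("a", "a", "b", "b", "c", "c")
probabilities <- cbind(
  a = c(0.7, 0.5, 0.2, 0.3, 0.1, 0.2),
  b = c(0.2, 0.3, 0.6, 0.3, 0.2, 0.5),
  c = c(0.1, 0.2, 0.2, 0.4, 0.7, 0.3)
)

# One-against-all: each class in turn is the positive class, and all other
# classes together form the negative class.
one_vs_all_auc <- sapply(colnames(probabilities), function(class_level) {
  auc_roc(observed == class_level, probabilities[, class_level])
})
```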
A list of data.tables (if dir_path is not provided), or nothing, as all data is exported to csv files.
Extract and export calibration and goodness-of-fit tests for data in a familiarCollection.
export_calibration_data( object, dir_path = NULL, aggregate_results = TRUE, export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' export_calibration_data( object, dir_path = NULL, aggregate_results = TRUE, export_collection = FALSE, ... ) ## S4 method for signature 'ANY' export_calibration_data( object, dir_path = NULL, aggregate_results = TRUE, export_collection = FALSE, ... )
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection object. However, you can also provide one or more familiarData objects, which are internally converted to a familiarCollection object. It is also possible to provide a familiarEnsemble or one or more familiarModel objects, together with the data set from which the exported data are computed prior to export. Paths to files containing any of these objects can also be provided.
All parameters aside from object and dir_path are only used if object is not a familiarCollection object, or a path to one.
Calibration tests are performed based on expected (predicted) and observed outcomes. For all outcomes, calibration-in-the-large and calibration slopes are determined. Furthermore, for all but survival outcomes, a repeated, randomised grouping Hosmer-Lemeshow test is performed. For survival outcomes, the Nam-D'Agostino and Greenwood-Nam-D'Agostino tests are performed.
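One common way to obtain an intercept (calibration-in-the-large) and a calibration slope is a linear fit of observed against expected values. The base R sketch below uses made-up grouped data and is an assumption about the general technique, not a reproduction of familiar's internal computation. A well-calibrated model has an intercept close to 0 and a slope close to 1.

```r
# Hypothetical grouped data: mean predicted probability (expected) and
# observed event fraction (observed) in each of five groups.
calibration_data <- data.frame(
  expected = c(0.1, 0.3, 0.5, 0.7, 0.9),
  observed = c(0.15, 0.25, 0.35, 0.45, 0.55)
)

# Linear fit of observed against expected values.
fit <- lm(observed ~ expected, data = calibration_data)

# An intercept away from 0 indicates systematic over- or underestimation,
# and a slope below 1 indicates predictions that are too extreme.
calibration_intercept <- unname(coef(fit)["(Intercept)"])
calibration_slope <- unname(coef(fit)["expected"])
```

In this example the observed fractions vary only half as strongly as the expected values, so the slope is well below 1.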
A list of data.tables (if dir_path is not provided), or nothing, as all data is exported to csv files.
Extract and export calibration information (e.g. baseline survival) for data in a familiarCollection.
export_calibration_info( object, dir_path = NULL, aggregate_results = TRUE, export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' export_calibration_info( object, dir_path = NULL, aggregate_results = TRUE, export_collection = FALSE, ... ) ## S4 method for signature 'ANY' export_calibration_info( object, dir_path = NULL, aggregate_results = TRUE, export_collection = FALSE, ... )
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection object. However, you can also provide one or more familiarData objects, which are internally converted to a familiarCollection object. It is also possible to provide a familiarEnsemble or one or more familiarModel objects, together with the data set from which the exported data are computed prior to export. Paths to files containing any of these objects can also be provided.
All parameters aside from object and dir_path are only used if object is not a familiarCollection object, or a path to one.
Currently only baseline survival is exported as supporting calibration information. See export_calibration_data for export of direct assessment of calibration, including calibration and goodness-of-fit tests.
A data.table (if dir_path is not provided), or nothing, as all data is exported to csv files.
Extract and export confusion matrices for models in a familiarCollection.
export_confusion_matrix_data( object, dir_path = NULL, export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' export_confusion_matrix_data( object, dir_path = NULL, export_collection = FALSE, ... ) ## S4 method for signature 'ANY' export_confusion_matrix_data( object, dir_path = NULL, export_collection = FALSE, ... )
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection object. However, you can also provide one or more familiarData objects, which are internally converted to a familiarCollection object. It is also possible to provide a familiarEnsemble or one or more familiarModel objects, together with the data set from which the exported data are computed prior to export. Paths to files containing any of these objects can also be provided.
All parameters aside from object and dir_path are only used if object is not a familiarCollection object, or a path to one.
Confusion matrices are exported for individual and ensemble models.
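For readers unfamiliar with the structure of a confusion matrix, the base R sketch below cross-tabulates made-up observed and predicted classes. The layout of the tables that familiar exports may differ.

```r
# Hypothetical observed and predicted classes for six samples.
class_levels <- c("no", "yes")
observed <- factor(c("yes", "yes", "no", "no", "yes", "no"), levels = class_levels)
predicted <- factor(c("yes", "no", "no", "yes", "yes", "no"), levels = class_levels)

# Rows are observed classes, columns are predicted classes.
confusion <- table(observed = observed, predicted = predicted)

# Correct predictions lie on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)
```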
A list of data.tables (if dir_path is not provided), or nothing, as all data is exported to csv files.
Extract and export decision curve analysis data in a familiarCollection.
export_decision_curve_analysis_data( object, dir_path = NULL, aggregate_results = TRUE, ... ) ## S4 method for signature 'familiarCollection' export_decision_curve_analysis_data( object, dir_path = NULL, aggregate_results = TRUE, ... ) ## S4 method for signature 'ANY' export_decision_curve_analysis_data( object, dir_path = NULL, aggregate_results = TRUE, ... )
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection object. However, you can also provide one or more familiarData objects, which are internally converted to a familiarCollection object. It is also possible to provide a familiarEnsemble or one or more familiarModel objects, together with the data set from which the exported data are computed prior to export. Paths to files containing any of these objects can also be provided.
All parameters aside from object and dir_path are only used if object is not a familiarCollection object, or a path to one.
Decision curve analysis data is computed for categorical outcomes, i.e. binomial and multinomial, as well as survival outcomes.
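Decision curve analysis is usually expressed in terms of net benefit at a range of threshold probabilities. The base R sketch below computes the standard net benefit for a binomial outcome from made-up data. The function and variable names are illustrative only, and the quantities exported by familiar may be defined or scaled differently.

```r
# Net benefit at a threshold probability:
#   true positives / n - false positives / n * threshold / (1 - threshold)
net_benefit <- function(observed, probability, threshold) {
  predicted_positive <- probability >= threshold
  n <- length(observed)
  true_positives <- sum(predicted_positive & observed == 1)
  false_positives <- sum(predicted_positive & observed == 0)
  true_positives / n - false_positives / n * threshold / (1 - threshold)
}

# Hypothetical observed outcomes (1 = event) and predicted probabilities.
observed <- c(1, 1, 0, 0)
probability <- c(0.9, 0.4, 0.6, 0.1)

# Evaluate the model at two threshold probabilities.
thresholds <- c(0.2, 0.5)
model_net_benefit <- sapply(thresholds, function(threshold) {
  net_benefit(observed, probability, threshold)
})
```

A decision curve plots such values against the threshold, typically next to the reference strategies of intervening for all samples and for none.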
A list of data.tables (if dir_path is not provided), or nothing, as all data is exported to csv files.
Extract and export feature expressions for the features in a familiarCollection.
export_feature_expressions( object, dir_path = NULL, evaluation_time = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' export_feature_expressions( object, dir_path = NULL, evaluation_time = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' export_feature_expressions( object, dir_path = NULL, evaluation_time = waiver(), export_collection = FALSE, ... )
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
evaluation_time |
One or more time points that are used to create the
outcome columns in expression plots. If not provided explicitly, this
parameter is read from settings used at creation of the underlying
|
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection object. However, you can also provide one or more familiarData objects, which are internally converted to a familiarCollection object. It is also possible to provide a familiarEnsemble or one or more familiarModel objects, together with the data set from which the exported data are computed prior to export. Paths to files containing any of these objects can also be provided.
All parameters aside from object and dir_path are only used if object is not a familiarCollection object, or a path to one.
Feature expressions are computed by standardising each feature, so that its sample mean is 0 and its standard deviation is 1.
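Standardisation of this kind can be reproduced in base R with scale(). The feature names and values below are made up; the sketch only shows the transformation itself, not the export.

```r
# Hypothetical numeric features for four samples.
features <- data.frame(
  age = c(40, 50, 60, 70),
  size = c(1.0, 2.5, 2.0, 4.5)
)

# Subtract the sample mean of each feature and divide by its sample
# standard deviation.
expressions <- as.data.frame(scale(features, center = TRUE, scale = TRUE))

# Verify the result: means are (numerically) 0 and standard deviations 1.
feature_means <- colMeans(expressions)
feature_sds <- sapply(expressions, sd)
```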
A data.table (if dir_path is not provided), or nothing, as all data is exported to csv files.
Extract and export mutual correlation between features in a familiarCollection.
export_feature_similarity( object, dir_path = NULL, aggregate_results = TRUE, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), export_dendrogram = FALSE, export_ordered_data = FALSE, export_clustering = FALSE, export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' export_feature_similarity( object, dir_path = NULL, aggregate_results = TRUE, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), export_dendrogram = FALSE, export_ordered_data = FALSE, export_clustering = FALSE, export_collection = FALSE, ... ) ## S4 method for signature 'ANY' export_feature_similarity( object, dir_path = NULL, aggregate_results = TRUE, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), export_dendrogram = FALSE, export_ordered_data = FALSE, export_clustering = FALSE, export_collection = FALSE, ... )
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
feature_cluster_method |
The method used to perform clustering. These are
the same methods as for the
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_linkage_method |
The method used for agglomerative clustering in
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_cluster_cut_method |
The method used to divide features into
separate clusters. The available methods are the same as for the
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_similarity_threshold |
The threshold level for pair-wise
similarity that is required to form feature clusters with the If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
export_dendrogram |
Add dendrogram in the data element objects. |
export_ordered_data |
Add feature label ordering to data in the data element objects. |
export_clustering |
Add clustering information to data. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection object. However, you can also provide one or more familiarData objects, which are internally converted to a familiarCollection object. It is also possible to provide a familiarEnsemble or one or more familiarModel objects, together with the data set from which the exported data are computed prior to export. Paths to files containing any of these objects can also be provided.
All parameters aside from object and dir_path are only used if object is not a familiarCollection object, or a path to one.
A list containing a data.table (if dir_path
is not provided), or
nothing, as all data is exported to csv
files.
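The clustering parameters above (cluster method, linkage method, cut method and similarity threshold) can be illustrated with a minimal base R sketch. This is not familiar's internal implementation: the similarity metric (absolute Pearson correlation), the average linkage and the threshold value of 0.9 are assumptions chosen for illustration only.

```r
# Illustrative only: base R, not familiar's internal implementation.
set.seed(1)
n <- 100
x1 <- rnorm(n)
features <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),  # nearly a copy of x1
  x3 = rnorm(n),
  x4 = rnorm(n)
)

# Pair-wise similarity: absolute Pearson correlation (assumed metric).
similarity <- abs(cor(features))

# Agglomerative clustering on the dissimilarity 1 - similarity,
# using average linkage (assumed linkage method).
tree <- hclust(as.dist(1 - similarity), method = "average")

# Cut the tree at a fixed height, so that features that merge at a
# similarity above the (hypothetical) threshold share a cluster.
similarity_threshold <- 0.9
clusters <- cutree(tree, h = 1 - similarity_threshold)
print(clusters)
```

In this sketch the near-duplicate features x1 and x2 fall into one cluster, whereas the independent features x3 and x4 each form their own.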
Extract and export feature selection variable importance from a familiarCollection.
export_fs_vimp(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  export_collection = FALSE,
  ...
)
## S4 method for signature 'familiarCollection'
export_fs_vimp(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  export_collection = FALSE,
  ...
)
## S4 method for signature 'ANY'
export_fs_vimp(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  export_collection = FALSE,
  ...
)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
aggregation_method |
(optional) The method used to aggregate variable importances over different data subsets, e.g. bootstraps. The following methods can be selected:
|
rank_threshold |
(optional) The threshold used to define the subset of highly important features. If not set, this threshold is determined by maximising the variance in the occurrence value over all features over the subset size. This parameter is only relevant for |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data, such as model performance and calibration information, is
usually collected from a familiarCollection
object. However, you can also
provide one or more familiarData
objects, that will be internally
converted to a familiarCollection
object. Paths to the previous files can
also be provided.
Unlike other export functions, export using familiarEnsemble
or
familiarModel
objects is not possible. This is because feature selection
variable importance is not stored within familiarModel
objects.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
Variable importance is based on the ranking produced by feature selection
routines. In case feature selection was performed repeatedly, e.g. using
bootstraps, feature ranks are first aggregated using the method defined by
the aggregation_method
, some of which require a rank_threshold
to
indicate a subset of most important features.
Information concerning highly similar features that form clusters is provided as well. This information is based on consensus clustering of the features. This clustering information is also used during aggregation to ensure that co-clustered features are only taken into account once.
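The idea of aggregating feature ranks over repeated feature selection runs can be sketched in base R. The two aggregation rules shown here (mean rank, and occurrence within a rank threshold) are illustrative assumptions and are not claimed to match the methods offered by aggregation_method. The feature names, rank values and threshold are hypothetical.

```r
# Illustrative only: base R, not familiar's aggregation code.
# Rows are features, columns are bootstraps; entries are ranks (1 = best).
ranks <- matrix(
  c(1, 2, 1,
    2, 1, 3,
    3, 4, 2,
    4, 3, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("age", "stage", "volume", "sex"), paste0("boot_", 1:3))
)

# Mean-rank aggregation: average the rank of each feature over bootstraps.
mean_rank <- rowMeans(ranks)

# Occurrence-based aggregation: the fraction of bootstraps in which a
# feature is ranked within the (hypothetical) rank threshold.
rank_threshold <- 2
occurrence <- rowMeans(ranks <= rank_threshold)

aggregated <- data.frame(
  feature = rownames(ranks),
  mean_rank = mean_rank,
  occurrence = occurrence
)
aggregated <- aggregated[order(aggregated$mean_rank), ]
print(aggregated)
```

Only the occurrence-based rule depends on the threshold, which mirrors the statement that some aggregation methods require a rank_threshold and others do not.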
A data.table (if dir_path
is not provided), or nothing, as all data
is exported to csv
files.
Extract and export model hyperparameters from models in a familiarCollection.
export_hyperparameters(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'familiarCollection'
export_hyperparameters(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'ANY'
export_hyperparameters(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data, such as model performance and calibration information, is
usually collected from a familiarCollection
object. However, you can also
provide one or more familiarData
objects, that will be internally
converted to a familiarCollection
object. It is also possible to provide a
familiarEnsemble
or one or more familiarModel
objects together with the
data from which data is computed prior to export. Paths to the previous
files can also be provided.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
Many model hyperparameters are optimised using sequential model-based
optimisation. The extracted hyperparameters are those that were selected to
construct the underlying models (familiarModel
objects).
A data.table (if dir_path
is not provided), or nothing, as all data
is exported to csv
files. In case of the latter, hyperparameters are
summarised.
Extract and export individual conditional expectation data.
export_ice_data(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'familiarCollection'
export_ice_data(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'ANY'
export_ice_data(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection
object.
However, you can also provide one or more familiarData
objects, that will
be internally converted to a familiarCollection
object. It is also
possible to provide a familiarEnsemble
or one or more familiarModel
objects together with the data from which data is computed prior to export.
Paths to the previous files can also be provided.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
A list of data.tables (if dir_path
is not provided), or nothing, as
all data is exported to csv
files.
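For readers unfamiliar with individual conditional expectation (ICE), the following base R sketch shows the general concept and how a partial dependence curve follows from averaging ICE curves. It does not use familiar. The linear model, the simulated data and the grid of feature values are assumptions made for illustration.

```r
# Illustrative only: base R, not familiar's implementation.
set.seed(1)
n <- 50
data <- data.frame(x1 = runif(n), x2 = runif(n))
data$y <- 2 * data$x1 + data$x2^2 + rnorm(n, sd = 0.1)
model <- lm(y ~ x1 + I(x2^2), data = data)

# Grid of values for the feature of interest (x1).
grid <- seq(0, 1, length.out = 11)

# ICE: for every sample, replace x1 by each grid value and predict.
# The result has one row per sample and one column per grid value.
ice <- sapply(grid, function(value) {
  modified <- data
  modified$x1 <- value
  predict(model, newdata = modified)
})

# Partial dependence: the average of the ICE curves at each grid value.
partial_dependence <- colMeans(ice)
print(round(partial_dependence, 2))
```

Each row of ice is one ICE curve. Because partial dependence is just their column-wise mean, it can be derived from stored ICE data without calling the model again.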
Extract and export metrics for model performance of models in a familiarCollection.
export_model_performance(
  object,
  dir_path = NULL,
  aggregate_results = FALSE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'familiarCollection'
export_model_performance(
  object,
  dir_path = NULL,
  aggregate_results = FALSE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'ANY'
export_model_performance(
  object,
  dir_path = NULL,
  aggregate_results = FALSE,
  export_collection = FALSE,
  ...
)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection
object.
However, you can also provide one or more familiarData
objects, that will
be internally converted to a familiarCollection
object. It is also
possible to provide a familiarEnsemble
or one or more familiarModel
objects together with the data from which data is computed prior to export.
Paths to the previous files can also be provided.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
Performance of individual and ensemble models is exported. For ensemble models, a credibility interval is determined using bootstrapping for each metric.
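As a rough illustration of a bootstrap-based interval for a performance metric, the base R sketch below computes a percentile interval for the root mean squared error. The metric, the number of bootstraps, the 95% level and the percentile method are all assumptions. The exact bootstrap scheme used by familiar may differ.

```r
# Illustrative only: percentile bootstrap in base R.
set.seed(1)
n <- 200
observed <- rnorm(n)
predicted <- observed + rnorm(n, sd = 0.5)

rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))

# Point estimate on the full dataset.
point_estimate <- rmse(observed, predicted)

# Recompute the metric on bootstrap resamples of the samples.
n_bootstraps <- 500
bootstrap_values <- replicate(n_bootstraps, {
  index <- sample.int(n, size = n, replace = TRUE)
  rmse(observed[index], predicted[index])
})

# 95% percentile interval of the bootstrap distribution.
interval <- quantile(bootstrap_values, probs = c(0.025, 0.975))
print(c(estimate = point_estimate, interval))
```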
A list of data.tables (if dir_path
is not provided), or nothing, as
all data is exported to csv
files.
Extract and export model-based variable importance from a familiarCollection.
export_model_vimp(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  export_collection = FALSE,
  ...
)
## S4 method for signature 'familiarCollection'
export_model_vimp(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  export_collection = FALSE,
  ...
)
## S4 method for signature 'ANY'
export_model_vimp(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  export_collection = FALSE,
  ...
)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
aggregation_method |
(optional) The method used to aggregate variable importances over different data subsets, e.g. bootstraps. The following methods can be selected:
|
rank_threshold |
(optional) The threshold used to define the subset of highly important features. If not set, this threshold is determined by maximising the variance in the occurrence value over all features over the subset size. This parameter is only relevant for |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data, such as model performance and calibration information, is
usually collected from a familiarCollection
object. However, you can also
provide one or more familiarData
objects, that will be internally
converted to a familiarCollection
object. It is also possible to provide a
familiarEnsemble
or one or more familiarModel
objects together with the
data from which data is computed prior to export. Paths to the previous
files can also be provided.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
Variable importance is based on the ranking produced by model-specific
variable importance routines, e.g. permutation for random forests. If such a
routine is absent, variable importance is based on the feature selection
method that led to the features included in the model. In case multiple
models (familiarModel
objects) are combined, feature ranks are first
aggregated using the method defined by the aggregation_method
, some of
which require a rank_threshold
to indicate a subset of most important
features.
Information concerning highly similar features that form clusters is provided as well. This information is based on consensus clustering of the features that were used in the signatures of the underlying models. This clustering information is also used during aggregation to ensure that co-clustered features are only taken into account once.
A data.table (if dir_path
is not provided), or nothing, as all data
is exported to csv
files.
Extract and export partial dependence data.
export_partial_dependence_data(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'familiarCollection'
export_partial_dependence_data(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'ANY'
export_partial_dependence_data(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection
object.
However, you can also provide one or more familiarData
objects, that will
be internally converted to a familiarCollection
object. It is also
possible to provide a familiarEnsemble
or one or more familiarModel
objects together with the data from which data is computed prior to export.
Paths to the previous files can also be provided.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
A list of data.tables (if dir_path
is not provided), or nothing, as
all data is exported to csv
files.
Extract and export permutation variable importance from a familiarCollection.
export_permutation_vimp(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'familiarCollection'
export_permutation_vimp(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'ANY'
export_permutation_vimp(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data, such as permutation variable importance and calibration
information, is usually collected from a familiarCollection
object.
However, you can also provide one or more familiarData
objects, that will
be internally converted to a familiarCollection
object. It is also
possible to provide a familiarEnsemble
or one or more familiarModel
objects together with the data from which data is computed prior to export.
Paths to the previously mentioned files can also be provided.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
Permutation variable importance assesses the improvement in model performance due to a feature. For this purpose, the performance of the model is measured as normal, and is measured again with a dataset where the values of the feature in question have been randomly permuted. The difference between both performance measurements is the permutation variable importance.
In familiar, this basic concept is extended in several ways:
Point estimates of variable importance are based on multiple (21) random permutations. The difference between model performance on the normal dataset and the median performance measurement of the randomly permuted datasets is used as permutation variable importance.
Confidence intervals for the ensemble model are determined using bootstrap methods.
Permutation variable importance is assessed for any metric specified using
the metric
argument.
Permutation variable importance can take into account similarity between features and permute similar features simultaneously.
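The basic procedure described above, including the median over 21 random permutations, can be sketched in base R. This is a simplified illustration, not familiar's implementation: the linear model, the root mean squared error metric and the helper names rmse and permutation_vimp are hypothetical, and bootstrap confidence intervals and simultaneous permutation of similar features are left out.

```r
# Illustrative only: base R, not familiar's implementation.
set.seed(1)
n <- 200
data <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
data$y <- 3 * data$x1 + rnorm(n)  # x2 carries no signal
model <- lm(y ~ x1 + x2, data = data)

# Hypothetical helper: performance metric (lower is better).
rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

# Hypothetical helper: permutation importance of a single feature.
permutation_vimp <- function(model, data, feature, n_permutations = 21) {
  baseline <- rmse(model, data)
  permuted <- replicate(n_permutations, {
    shuffled <- data
    shuffled[[feature]] <- sample(shuffled[[feature]])
    rmse(model, shuffled)
  })
  # RMSE is an error metric, so importance is the increase in error
  # from the baseline to the median over the permuted datasets.
  median(permuted) - baseline
}

vimp <- sapply(c("x1", "x2"), permutation_vimp, model = model, data = data)
print(vimp)
```

Permuting the informative feature x1 raises the error considerably, whereas permuting the uninformative feature x2 leaves it nearly unchanged.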
A data.table (if dir_path
is not provided), or nothing, as all data
is exported to csv
files.
Extract and export the values predicted by single and ensemble models in a familiarCollection.
export_prediction_data(object, dir_path = NULL, export_collection = FALSE, ...)
## S4 method for signature 'familiarCollection'
export_prediction_data(object, dir_path = NULL, export_collection = FALSE, ...)
## S4 method for signature 'ANY'
export_prediction_data(object, dir_path = NULL, export_collection = FALSE, ...)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data, such as model performance and calibration information, is
usually collected from a familiarCollection
object. However, you can also
provide one or more familiarData
objects, that will be internally
converted to a familiarCollection
object. It is also possible to provide a
familiarEnsemble
or one or more familiarModel
objects together with the
data from which data is computed prior to export. Paths to the previous
files can also be provided.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
Both single and ensemble predictions are exported.
A list of data.tables (if dir_path
is not provided), or nothing, as
all data is exported to csv
files.
Extract and export sample risk group stratification and associated tests for data in a familiarCollection.
export_risk_stratification_data(
  object,
  dir_path = NULL,
  export_strata = TRUE,
  time_range = NULL,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'familiarCollection'
export_risk_stratification_data(
  object,
  dir_path = NULL,
  export_strata = TRUE,
  time_range = NULL,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'ANY'
export_risk_stratification_data(
  object,
  dir_path = NULL,
  export_strata = TRUE,
  time_range = NULL,
  export_collection = FALSE,
  ...
)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
export_strata |
Flag that determines whether the raw data or strata are exported. |
time_range |
Time range for which strata should be created. If |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection
object.
However, you can also provide one or more familiarData
objects, that will
be internally converted to a familiarCollection
object. It is also
possible to provide a familiarEnsemble
or one or more familiarModel
objects together with the data from which data is computed prior to export.
Paths to the previous files can also be provided.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
Three tables are exported in a list:
data
: Contains the assigned risk group for a given sample, along with
its reported survival time and censoring status.
hr_ratio
: Contains the hazard ratio between different risk groups.
logrank
: Contains the results from the logrank test between different
risk groups.
A list of data.tables (if dir_path
is not provided), or nothing, as
all data is exported to csv
files.
Extract and export cut-off values for risk group stratification by models in a familiarCollection.
export_risk_stratification_info(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'familiarCollection'
export_risk_stratification_info(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'ANY'
export_risk_stratification_info(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  export_collection = FALSE,
  ...
)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection
object.
However, you can also provide one or more familiarData
objects, that will
be internally converted to a familiarCollection
object. It is also
possible to provide a familiarEnsemble
or one or more familiarModel
objects together with the data from which data is computed prior to export.
Paths to the previous files can also be provided.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
Stratification cut-off values are determined when creating a model, using
one of several methods set by the stratification_method
parameter. These
values are then used to stratify samples in any new dataset. The available
methods are:
median
(default): The median predicted value in the development cohort
is used to stratify the samples into two risk groups.
fixed
: Samples are stratified based on the sample quantiles of the
predicted values. These quantiles are defined using the
stratification_threshold
parameter.
optimised
: Use maximally selected rank statistics to determine the
optimal threshold (Lausen and Schumacher, 1992; Hothorn et al., 2003) to
stratify samples into two optimally separated risk groups.
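The median and fixed methods listed above can be illustrated with a small base R sketch. It is not familiar's implementation: the simulated predicted values, the tertile quantiles, the helper name stratify, and the assumption that a higher predicted value means higher risk are all illustrative. The optimised method is not shown.

```r
# Illustrative only: base R, not familiar's implementation.
set.seed(1)
development <- rnorm(100)  # predicted values in the development cohort
validation <- rnorm(50)    # predicted values in a new dataset

# "median": a single cut-off, giving two risk groups.
median_cutoff <- median(development)

# "fixed": cut-offs at chosen sample quantiles (hypothetical values).
fixed_cutoffs <- quantile(development, probs = c(1 / 3, 2 / 3))

# Hypothetical helper: cut-offs are derived from the development cohort
# and reused unchanged to stratify samples in any new dataset.
stratify <- function(predicted, cutoffs, labels) {
  cut(predicted, breaks = c(-Inf, cutoffs, Inf), labels = labels)
}

two_groups <- stratify(validation, median_cutoff, c("low", "high"))
three_groups <- stratify(validation, fixed_cutoffs, c("low", "moderate", "high"))
print(table(two_groups))
print(table(three_groups))
```

The point of the sketch is that cut-offs are never re-estimated on the new data, so the risk groups keep the same meaning across datasets.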
A data.table (if dir_path
is not provided), or nothing, as all data
is exported to csv
files.
Lausen, B. & Schumacher, M. Maximally Selected Rank Statistics. Biometrics 48, 73 (1992).
Hothorn, T. & Lausen, B. On the exact distribution of maximally selected rank statistics. Comput. Stat. Data Anal. 43, 121–137 (2003).
Extract and export similarity between samples in a familiarCollection.
export_sample_similarity(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  sample_limit = waiver(),
  sample_cluster_method = waiver(),
  sample_linkage_method = waiver(),
  export_dendrogram = FALSE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'familiarCollection'
export_sample_similarity(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  sample_limit = waiver(),
  sample_cluster_method = waiver(),
  sample_linkage_method = waiver(),
  export_dendrogram = FALSE,
  export_collection = FALSE,
  ...
)
## S4 method for signature 'ANY'
export_sample_similarity(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  sample_limit = waiver(),
  sample_cluster_method = waiver(),
  sample_linkage_method = waiver(),
  export_dendrogram = FALSE,
  export_collection = FALSE,
  ...
)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
aggregate_results |
Flag that signifies whether results should be aggregated for export. |
sample_limit |
(optional) Set the upper limit of the number of samples that are used during evaluation steps. Cannot be less than 20. This setting can be specified per data element by providing a parameter
value in a named list with data elements, e.g.
This parameter can be set for the following data elements:
|
sample_cluster_method |
The method used to perform clustering based on
distance between samples. These are the same methods as for the
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
sample_linkage_method |
The method used for agglomerative clustering in
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
export_dendrogram |
Add dendrogram in the data element objects. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection
object.
However, you can also provide one or more familiarData
objects, that will
be internally converted to a familiarCollection
object. It is also
possible to provide a familiarEnsemble
or one or more familiarModel
objects together with the data from which data is computed prior to export.
Paths to the previous files can also be provided.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
A list containing a data.table (if dir_path
is not provided), or
nothing, as all data is exported to csv
files.
Extract and export univariate analysis data of features for data in a familiarCollection.
export_univariate_analysis_data(
  object,
  dir_path = NULL,
  p_adjustment_method = waiver(),
  export_collection = FALSE,
  ...
)
## S4 method for signature 'familiarCollection'
export_univariate_analysis_data(
  object,
  dir_path = NULL,
  p_adjustment_method = waiver(),
  export_collection = FALSE,
  ...
)
## S4 method for signature 'ANY'
export_univariate_analysis_data(
  object,
  dir_path = NULL,
  p_adjustment_method = waiver(),
  export_collection = FALSE,
  ...
)
object |
A |
dir_path |
Path to folder where extracted data should be saved. |
p_adjustment_method |
(optional) Indicates type of p-value that is
shown. One of |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
Data is usually collected from a familiarCollection
object.
However, you can also provide one or more familiarData
objects, that will
be internally converted to a familiarCollection
object. It is also
possible to provide a familiarEnsemble
or one or more familiarModel
objects together with the data from which data is computed prior to export.
Paths to the previous files can also be provided.
All parameters aside from object
and dir_path
are only used if object
is not a familiarCollection
object, or a path to one.
Univariate analysis includes the computation of p and q-values, as well as robustness (in case of repeated measurements). p-values are derived from Wald's test.
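To make the relation between p-values and q-values concrete, the base R sketch below fits one single-feature regression model per feature, takes the p-value of the coefficient test, and then corrects for multiple testing. It does not reproduce familiar's univariate analysis: the linear model, the helper name wald_p_value and the Benjamini-Hochberg adjustment are assumptions. For a linear model, the coefficient test reported by summary() is a t-test, used here as a stand-in for a Wald-type test.

```r
# Illustrative only: base R, not familiar's implementation.
set.seed(1)
n <- 200
features <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n)
)
outcome <- 0.5 * features$x1 + rnorm(n)  # only x1 is associated

# Hypothetical helper: p-value of the slope in a single-feature model.
wald_p_value <- function(x, y) {
  fit <- lm(y ~ x)
  summary(fit)$coefficients["x", "Pr(>|t|)"]
}

p_value <- sapply(features, wald_p_value, y = outcome)

# Multiple-testing correction; Benjamini-Hochberg is an assumed choice.
q_value <- p.adjust(p_value, method = "BH")
print(data.frame(p_value, q_value))
```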
A data.table (if dir_path
is not provided), or nothing, as
all data is exported to csv
files.
End-to-end, automated machine learning package for creating trustworthy and interpretable machine learning models. Familiar supports modelling of regression, categorical and time-to-event (survival) outcomes. Models created using familiar are self-containing, and their use does not require additional information such as baseline survival, feature clustering, or feature transformation and normalisation parameters. In addition, a novelty or out-of-distribution detector is trained simultaneously and contained with every model. Model performance, calibration, risk group stratification, (permutation) variable importance, individual conditional expectation, partial dependence, and more, are assessed automatically as part of the evaluation process and exported in tabular format and plotted, and may also be computed manually using export and plot functions. Where possible, metrics and values obtained during the evaluation process come with confidence intervals.
Maintainer: Alex Zwanenburg [email protected] (ORCID)
Authors:
Steffen Löck
Other contributors:
Stefan Leger [contributor]
Iram Shahzadi [contributor]
Asier Rabasco Meneghetti [contributor]
Sebastian Starke [contributor]
Technische Universität Dresden [copyright holder]
German Cancer Research Center (DKFZ) [copyright holder]
Useful links:
Report bugs at https://github.com/alexzwanenburg/familiar/issues
A familiarCollection object aggregates data from one or more familiarData objects.
name
Name of the collection.
data_sets
Name of the individual underlying datasets.
outcome_type
Outcome type for which the collection was created.
outcome_info
Outcome information object, which contains information concerning the outcome, such as class levels.
fs_vimp
Variable importance data collected by feature selection methods.
model_vimp
Variable importance data collected from model-specific algorithms implemented by models created by familiar.
permutation_vimp
Data collected for permutation variable importance.
hyperparameters
Hyperparameters collected from created models.
hyperparameter_data
Additional data concerning hyperparameters. This is currently not used.
required_features
The set of features required for complete reproduction, i.e. with imputation.
model_features
The set of features that are required for using the model, but without imputation.
learner
Learning algorithm(s) used for data in the collection.
fs_method
Feature selection method(s) used for data in the collection.
prediction_data
Model predictions for the data in the collection.
confusion_matrix
Confusion matrix information for the data in the collection.
decision_curve_data
Decision curve analysis data for the data in the collection.
calibration_info
Calibration information, e.g. baseline survival in the development cohort.
calibration_data
Model calibration data collected from data in the collection.
model_performance
Collection of model performance data for data in the collection.
km_info
Information concerning risk-stratification cut-off values for data in the collection.
km_data
Kaplan-Meier survival data for data in the collection.
auc_data
AUC-ROC and AUC-PR data for data in the collection.
ice_data
Individual conditional expectation data for data in the collection. Partial dependence data are computed on the fly from these data.
univariate_analysis
Univariate analysis results of data in the collection.
feature_expressions
Feature expression values for data in the collection.
feature_similarity
Feature similarity information for data in the collection.
sample_similarity
Sample similarity information for data in the collection.
data_set_labels
Labels for the different datasets in the collection. See get_data_set_names and set_data_set_names.
learner_labels
Labels for the different learning algorithms used to create the collection. See get_learner_names and set_learner_names.
fs_method_labels
Labels for the different feature selection methods used to create the collection. See get_fs_method_names and set_fs_method_names.
feature_labels
Labels for the features in this collection. See get_feature_names and set_feature_names.
km_group_labels
Labels for the risk strata in this collection. See get_risk_group_names and set_risk_group_names.
class_labels
Labels of the response variable. See get_class_names and set_class_names.
project_id
Identifier of the project that generated this collection.
familiar_version
Version of the familiar package.
familiarCollection objects collect data from one or more familiarData objects. These objects are important, as all plotting and export functions use them. One can also supply familiarModel, familiarEnsemble and familiarData objects as arguments for these methods, because familiar internally converts these into familiarCollection objects prior to executing the method.
A familiarData object is created by evaluating familiarEnsemble or familiarModel objects on a dataset. Multiple familiarData objects are aggregated in a familiarCollection object.
name
Name of the dataset, e.g. training or internal validation.
outcome_type
Outcome type of the data used to create the object.
outcome_info
Outcome information object, which contains additional information concerning the outcome, such as class levels.
fs_vimp
Variable importance data collected from feature selection methods.
model_vimp
Variable importance data collected from model-specific algorithms implemented by models created by familiar.
permutation_vimp
Data collected for permutation variable importance.
hyperparameters
Hyperparameters collected from created models.
hyperparameter_data
Additional data concerning hyperparameters. This is currently not used.
required_features
The set of features required for complete reproduction, i.e. with imputation.
model_features
The set of features that are required for using the model or ensemble of models, but without imputation.
learner
Learning algorithm used to create the model or ensemble of models.
fs_method
Feature selection method used to determine variable importance for the model or ensemble of models.
pooling_table
Run table for the data underlying the familiarData object. Used internally.
prediction_data
Model predictions for a model or ensemble of models for the underlying dataset.
confusion_matrix
Confusion matrix for a model or ensemble of models, based on the underlying dataset.
decision_curve_data
Decision curve analysis data for a model or ensemble of models, based on the underlying dataset.
calibration_info
Calibration information, e.g. baseline survival in the development cohort.
calibration_data
Calibration data for a model or ensemble of models, based on the underlying dataset.
model_performance
Model performance data for a model or ensemble of models, based on the underlying dataset.
km_info
Information concerning risk-stratification cut-off values.
km_data
Kaplan-Meier survival data for a model or ensemble of models, based on the underlying dataset.
auc_data
AUC-ROC and AUC-PR data for a model or ensemble of models, based on the underlying dataset.
ice_data
Individual conditional expectation data for features included in a model or ensemble of models, based on the underlying dataset. Partial dependence data are computed on the fly from these data.
univariate_analysis
Univariate analysis of the underlying dataset.
feature_expressions
Feature expression values of the underlying dataset.
feature_similarity
Feature similarity information of the underlying dataset.
sample_similarity
Sample similarity information of the underlying dataset.
is_validation
Signifies whether the underlying data forms a validation dataset. Used internally.
generating_ensemble
Name of the ensemble that was used to generate the familiarData object.
project_id
Identifier of the project that generated the familiarData object.
familiar_version
Version of the familiar package.
familiarData objects contain information obtained by evaluating a single model or single ensemble of models on a dataset.
Most attributes of the familiarData object are objects of the familiarDataElement class. This (super-)class is used to allow for standardised aggregation and processing of evaluation data.
data
Evaluation data, typically a data.table or list.
identifiers
Identifiers of the data, e.g. the generating model name, learner, etc.
detail_level
Sets the level at which results are computed and aggregated.
ensemble: Results are computed at the ensemble level, i.e. over all models in the ensemble. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the model performance of the ensemble model for each bootstrap.
hybrid (default): Results are computed at the level of models in an ensemble. This means that, for example, bias-corrected estimates of model performance are directly computed using the models in the ensemble. If there are at least 20 trained models in the ensemble, performance is computed for each model, in contrast to ensemble, where performance is computed for the ensemble of models. If there are fewer than 20 trained models in the ensemble, bootstraps are created so that at least 20 point estimates can be made.
model: Results are computed at the model level. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the performance of the model for each bootstrap.
Note that each level of detail has a different interpretation for bootstrap confidence intervals. For ensemble and model these are the confidence intervals for the ensemble and an individual model, respectively. That is, the confidence interval describes the range where an estimate produced by a respective ensemble or model trained on a repeat of the experiment may be found with the probability of the confidence level. For hybrid, it represents the range where any single model trained on a repeat of the experiment may be found with the probability of the confidence level. By definition, confidence intervals obtained using hybrid are at least as wide as those for ensemble. hybrid offers the correct interpretation if the goal of the analysis is to assess the result of a single, unspecified, model.
Some child classes do not use this parameter.
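The difference between the ensemble and hybrid detail levels can be illustrated with a small base-R sketch. This is not familiar's internal code: the use of mtcars, the linear models, and the helper rmse are assumptions made purely for illustration.

```r
# Base-R illustration only; not familiar's implementation.
set.seed(1)
n_models <- 20

# Split mtcars into a development part and a validation part.
dev_idx <- sample(nrow(mtcars), 22)
dev <- mtcars[dev_idx, ]
val <- mtcars[-dev_idx, ]

# Hypothetical helper: root mean squared error.
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# An "ensemble" of 20 linear models, each trained on a bootstrap of dev.
models <- lapply(seq_len(n_models), function(i) {
  lm(mpg ~ wt + hp, data = dev[sample(nrow(dev), replace = TRUE), ])
})

# hybrid-like: one point estimate per model in the ensemble.
hybrid_rmse <- vapply(
  models,
  function(m) rmse(val$mpg, predict(m, val)),
  numeric(1)
)

# ensemble-like: average the predictions of all models first, then
# bootstrap the validation data to obtain 20 point estimates.
ensemble_prediction <- rowMeans(sapply(models, predict, newdata = val))
ensemble_rmse <- vapply(seq_len(n_models), function(i) {
  idx <- sample(nrow(val), replace = TRUE)
  rmse(val$mpg[idx], ensemble_prediction[idx])
}, numeric(1))

summary(hybrid_rmse)
summary(ensemble_rmse)
```

In this sketch, the spread of hybrid_rmse reflects variation between individual models, whereas the spread of ensemble_rmse reflects sampling variation of the data for a single ensemble prediction.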
estimation_type
Sets the type of estimation that should be possible. This has the following options:
point: Point estimates.
bias_correction or bc: Bias-corrected estimates. A bias-corrected estimate is computed from (at least) 20 point estimates, and familiar may bootstrap the data to create them.
bootstrap_confidence_interval or bci (default): Bias-corrected estimates with bootstrap confidence intervals (Efron and Hastie, 2016). The number of point estimates required depends on the confidence_level parameter, and familiar may bootstrap the data to create them.
Some child classes do not use this parameter.
confidence_level
(optional) Numeric value for the level at which confidence intervals are determined. If bootstraps are used to determine the confidence intervals, familiar uses a rule of thumb to determine the number of required bootstraps.
bootstrap_ci_method
Method used to determine bootstrap confidence intervals (Efron and Hastie, 2016). The following methods are implemented:
percentile (default): Confidence intervals obtained using the percentile method.
bc: Bias-corrected confidence intervals.
Note that the standard method is not implemented because this method is often not suitable due to non-normal distributions. The bias-corrected and accelerated (BCa) method is not implemented yet.
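The two interval methods named above can be sketched in base R following their textbook definitions (Efron and Hastie, 2016). This is a minimal illustration, not familiar's own code; the simulated data, the statistic (a mean), and the choice of 400 bootstraps are arbitrary assumptions.

```r
# Textbook percentile and bias-corrected (BC) bootstrap intervals.
# Illustration only; data and number of bootstraps are arbitrary.
set.seed(42)
x <- rnorm(50, mean = 10, sd = 2)

point_estimate <- mean(x)
boot_estimates <- replicate(400, mean(sample(x, replace = TRUE)))

confidence_level <- 0.95
alpha <- 1 - confidence_level

# Percentile method: quantiles of the bootstrap distribution.
percentile_ci <- quantile(
  boot_estimates,
  probs = c(alpha / 2, 1 - alpha / 2),
  names = FALSE
)

# BC method: shift the quantile levels by the bias-correction term z0,
# derived from the fraction of bootstrap estimates below the point estimate.
z0 <- qnorm(mean(boot_estimates < point_estimate))
bc_probs <- pnorm(2 * z0 + qnorm(c(alpha / 2, 1 - alpha / 2)))
bc_ci <- quantile(boot_estimates, probs = bc_probs, names = FALSE)

percentile_ci
bc_ci
```

When the bootstrap distribution is centred on the point estimate, z0 is close to zero and both methods give nearly the same interval.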
value_column
Identifies column(s) in the data attribute presenting values.
grouping_column
Identifies column(s) in the data attribute presenting identifier columns for grouping during aggregation. Familiar will automatically assign items from the identifiers attribute to the data and this attribute when combining multiple familiarDataElements of the same (child) class.
is_aggregated
Defines whether the object was aggregated.
Efron, B. & Hastie, T. Computer Age Statistical Inference. (Cambridge University Press, 2016).
A familiarEnsemble object contains one or more familiarModel objects.
name
Name of the familiarEnsemble object.
model_list
List of attached familiarModel objects, or paths to these objects. Familiar attaches familiarModel objects when required.
outcome_type
Outcome type of the data used to create the object.
outcome_info
Outcome information object, which contains additional information concerning the outcome, such as class levels.
data_column_info
Data information object containing information regarding identifier column names and outcome column names.
learner
Learning algorithm used to create the models in the ensemble.
fs_method
Feature selection method used to determine variable importance for the models in the ensemble.
feature_info
List of objects containing feature information, e.g., name, class levels, transformation, normalisation and clustering parameters.
required_features
The set of features required for complete reproduction, i.e. with imputation.
model_features
The combined set of features that is used to train the models in the ensemble.
novelty_features
The combined set of features that is used to train all novelty detectors in the ensemble.
run_table
Run table for the data used to train the ensemble. Used internally.
calibration_info
Calibration information, e.g. baseline survival in the development cohort.
model_dir_path
Path to folder containing the familiarModel objects. Can be updated using the update_model_dir_path method.
auto_detach
Flag used to determine whether models should be detached from the ensemble after use, or not. Used internally.
settings
A copy of the evaluation configuration parameters used at model creation. These are used as default parameters when evaluating the ensemble to create a familiarData object.
project_id
Identifier of the project that generated the underlying familiarModel object(s).
familiar_version
Version of the familiar package.
A familiarHyperparameterLearner object is a self-contained model that can be applied to predict optimisation scores for a set of hyperparameters.
Hyperparameter learners are used to infer the optimisation score for sets of hyperparameters. These are then used to either infer utility using acquisition functions or to generate summary scores to identify the optimal model.
name
Name of the familiarHyperparameterLearner object.
learner
Algorithm used to create the hyperparameter learner.
target_learner
Algorithm for which the hyperparameters are being learned.
target_outcome_type
Outcome type of the learner for which hyperparameters are being modeled. Used to determine the target hyperparameters.
optimisation_metric
One or more metrics used to generate the optimisation score.
optimisation_function
Function used to generate the optimisation score.
model
The actual model trained using the specific algorithm, e.g. an isolation forest from the isotree package.
target_hyperparameters
The names of the hyperparameters that are used to train the hyperparameter learner.
project_id
Identifier of the project that generated the familiarHyperparameterLearner object.
familiar_version
Version of the familiar package.
package
Name of package(s) required to execute the hyperparameter learner itself, e.g. laGP.
package_version
Version of the packages mentioned in the package attribute.
Superclass for model performance objects.
metric
Performance metric.
outcome_type
Type of outcome being predicted.
name
Name of the performance metric.
value_range
Range of the performance metric. Can be half-open.
baseline_value
Value of the metric for trivial models, e.g. models that always predict the median value, the majority class, or the mean hazard, etc.
higher_better
States whether higher metric values correspond to better predictive model performance (e.g. accuracy) or not (e.g. root mean squared error).
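The roles of baseline_value and higher_better can be shown with a short base-R sketch. The data and the helper is_better are hypothetical and only illustrate the idea of comparing a model against a trivial predictor.

```r
# Baseline values of two metrics for trivial models (illustration only).

# Regression: a trivial model that always predicts the median.
y <- c(1, 2, 3, 4, 10)
baseline_rmse <- sqrt(mean((y - median(y))^2))

# Classification: a trivial model that always predicts the majority class.
classes <- c("a", "a", "a", "b", "c")
baseline_accuracy <- max(table(classes)) / length(classes)

# Hypothetical helper: does a metric value beat a reference value,
# given the direction in which the metric improves?
is_better <- function(value, reference, higher_better) {
  if (higher_better) value > reference else value < reference
}

is_better(0.80, baseline_accuracy, higher_better = TRUE)   # accuracy
is_better(2.50, baseline_rmse, higher_better = FALSE)      # RMSE
```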
A familiarModel object is a self-contained model that can be applied to generate predictions for a dataset. familiarModel objects form the parent class of learner-specific child classes.
name
Name of the familiarModel object.
model
The actual model trained using a specific algorithm, e.g. a random forest from the ranger package, or a LASSO model from glmnet.
outcome_type
Outcome type of the data used to create the object.
outcome_info
Outcome information object, which contains additional information concerning the outcome, such as class levels.
feature_info
List of objects containing feature information, e.g., name, class levels, transformation, normalisation and clustering parameters.
data_column_info
Data information object containing information regarding identifier column names and outcome column names.
hyperparameters
Set of hyperparameters used to train the model.
hyperparameter_data
Information generated during hyperparameter optimisation.
calibration_model
One or more models used to recalibrate the model output. Currently only used by some models.
novelty_detector
A familiarNoveltyDetector object that can be used to detect out-of-distribution samples.
learner
Learning algorithm used to create the model.
fs_method
Feature selection method used to determine variable importance for the model.
required_features
The set of features required for complete reproduction, i.e. with imputation.
model_features
The set of features that is used to train the model.
novelty_features
The set of features that is used to train all novelty detectors in the ensemble.
calibration_info
Calibration information, e.g. baseline survival in the development cohort.
km_info
Data concerning stratification into risk groups.
run_table
Run table for the data used to train the model. Used internally.
settings
A copy of the evaluation configuration parameters used at model creation. These are used as default parameters when evaluating the model (technically, familiarEnsemble) to create a familiarData object.
is_trimmed
Flag that indicates whether the model, stored in the model slot, has been trimmed.
trimmed_function
List of functions whose output has been captured prior to trimming the model.
messages
List of warning and error messages generated during training.
project_id
Identifier of the project that generated the familiarModel object.
familiar_version
Version of the familiar package.
package
Name of package(s) required to execute the model itself, e.g. ranger or glmnet.
package_version
Version of the packages mentioned in the package attribute.
A familiarNoveltyDetector object is a self-contained model that can be applied to generate out-of-distribution predictions for instances in a dataset.
Note that these objects do not contain any data concerning outcome, as this is not relevant for (prospective) out-of-distribution detection.
name
Name of the familiarNoveltyDetector object.
learner
Learning algorithm used to create the novelty detector.
model
The actual novelty detector trained using a specific algorithm, e.g. an isolation forest from the isotree package.
feature_info
List of objects containing feature information, e.g., name, class levels, transformation, normalisation and clustering parameters.
data_column_info
Data information object containing information regarding identifier column names.
conversion_parameters
Parameters used to convert raw output to statistical probability of being out-of-distribution. Currently unused.
hyperparameters
Set of hyperparameters used to train the detector.
required_features
The set of features required for complete reproduction, i.e. with imputation.
model_features
The set of features that is used to train the detector.
run_table
Run table for the data used to train the detector. Used internally.
is_trimmed
Flag that indicates whether the detector, stored in the model slot, has been trimmed.
trimmed_function
List of functions whose output has been captured prior to trimming the model.
project_id
Identifier of the project that generated the familiarNoveltyDetector object.
familiar_version
Version of the familiar package.
package
Name of package(s) required to execute the detector itself, e.g. isotree.
package_version
Version of the packages mentioned in the package attribute.
The familiarVimpMethod class is the parent class for all variable importance methods in familiar.
outcome_type
Outcome type of the data to be evaluated using the object.
hyperparameters
Set of hyperparameters for the variable importance method.
vimp_method
The character string indicating the variable importance method.
multivariate
Flags whether the variable importance method is multivariate vs. univariate.
outcome_info
Outcome information object, which contains additional information concerning the outcome, such as class levels.
feature_info
List of objects containing feature information, e.g., name, class levels, transformation, normalisation and clustering parameters.
required_features
The set of features to be assessed by the variable importance method.
package
Name of the package(s) required to execute the variable importance method.
run_table
Run table for the data to be assessed by the variable importance method. Used internally.
project_id
Identifier of the project that generated the familiarVimpMethod object.
A featureInfo object contains information for a single feature. This information is used to check data prospectively for consistency and for data preparation. These objects are, for instance, attached to a familiarModel object so that data can be pre-processed in the same way as the development data.
name
Name of the feature, which by default is the column name of the feature.
set_descriptor
Character string describing the set to which the feature belongs. Currently not used.
feature_type
Describes the feature type, i.e. factor or numeric.
levels
The class levels of categorical features. This is used to check prospective datasets.
ordered
Specifies whether the categorical feature is ordered.
distribution
Five-number summary (numeric) or class frequency (categorical).
data_id
Internal identifier for the dataset used to derive the feature information.
run_id
Internal identifier for the specific subset of the dataset used to derive the feature information.
in_signature
Specifies whether the feature is included in the model signature.
in_novelty
Specifies whether the feature is included in the novelty detector.
removed
Specifies whether the feature was removed during pre-processing.
removed_unknown_type
Specifies whether the feature was removed during pre-processing because the type was neither factor nor numeric.
removed_missing_values
Specifies whether the feature was removed during pre-processing because it contained too many missing values.
removed_no_variance
Specifies whether the feature was removed during pre-processing because it did not contain more than 1 unique value.
removed_low_variance
Specifies whether the feature was removed during pre-processing because the variance was too low. Requires applying low_variance as a filter_method.
removed_low_robustness
Specifies whether the feature was removed during pre-processing because it lacks robustness. Requires applying robustness as a filter_method, as well as repeated measurement.
removed_low_importance
Specifies whether the feature was removed during pre-processing because it lacks relevance. Requires applying univariate_test as a filter_method.
fraction_missing
Specifies the fraction of missing values.
robustness
Specifies robustness of the feature, if measured.
univariate_importance
Specifies the univariate p-value of the feature, if measured.
transformation_parameters
Details parameters for power transformation of numeric features.
normalisation_parameters
Details parameters for (global) normalisation of numeric features.
batch_normalisation_parameters
Details parameters for batch normalisation of numeric features.
imputation_parameters
Details parameters or models for imputation of missing values.
cluster_parameters
Details parameters for forming clusters with other features.
required_features
Details features required for clustering or imputation.
familiar_version
Version of the familiar package.
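As a rough illustration of the kind of information held in the distribution, levels and fraction_missing slots, and of how stored levels allow prospective checks, consider the following base-R sketch. The list feature_summary and the example values are hypothetical and do not reflect the actual featureInfo implementation.

```r
# Sketch of feature information derived from development data and a
# prospective consistency check. Illustration only.
dev_numeric <- c(2.1, 3.4, NA, 5.0, 4.2, 3.9)
dev_factor <- factor(
  c("low", "high", "mid", "low", "high", "low"),
  levels = c("low", "mid", "high")
)

feature_summary <- list(
  # Five-number summary; fivenum() drops missing values by default.
  numeric_distribution = fivenum(dev_numeric),
  fraction_missing = mean(is.na(dev_numeric)),
  # Class levels and class frequency of the categorical feature.
  levels = levels(dev_factor),
  class_frequency = table(dev_factor)
)

# Prospective check: values in new data that were never seen during
# development.
new_values <- c("low", "extreme", "mid")
unknown_levels <- setdiff(unique(new_values), feature_summary$levels)
unknown_levels
```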
A featureInfo object contains information for a single feature. Some information, for example concerning clustering and transformation, contains various parameters that allow for applying the data transformation correctly. These are stored in featureInfoParameters objects.
featureInfoParameters is normally a parent class for specific classes, such as featureInfoParametersTransformation.
name
Name of the feature, which by default is the column name of the feature. Typically used to correctly assign the data.
complete
Flags whether the parameters have been completely set.
familiar_version
Version of the familiar package.
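The idea of storing parameters so that a transformation can later be applied in the same way can be sketched in base R. The functions fit_normalisation and apply_normalisation, and the shift and scale fields, are hypothetical names for this illustration; only name and complete mirror the slots listed above.

```r
# Sketch: derive normalisation parameters from development data once,
# then reuse them on new data. Illustration only.
fit_normalisation <- function(x, name) {
  list(
    name = name,
    shift = mean(x, na.rm = TRUE),
    scale = sd(x, na.rm = TRUE),
    complete = TRUE
  )
}

apply_normalisation <- function(x, parameters) {
  # Refuse to transform data with incompletely set parameters.
  stopifnot(isTRUE(parameters$complete))
  (x - parameters$shift) / parameters$scale
}

dev_age <- c(40, 50, 60, 70, 80)
age_parameters <- fit_normalisation(dev_age, name = "age")

# New data are normalised with the stored development parameters,
# not with their own mean and standard deviation.
new_age <- c(60, 90)
normalised_age <- apply_normalisation(new_age, age_parameters)
normalised_age
```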
Outcome classes in familiarCollection objects can have custom names for export and plotting. This function retrieves the currently assigned names.
## S4 method for signature 'familiarCollection'
get_class_names(x)
x |
A familiarCollection object. |
Labels convert internal class names to the requested label at export or when plotting. Labels can be changed using the set_class_names method.
An ordered array of class labels.
familiarCollection for information concerning the familiarCollection class.
set_class_names for updating the name and ordering of classes.
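Conceptually, converting internal class names to display labels resembles relabelling a factor. The base-R sketch below shows that analogy with made-up class names; it is not how familiar implements labels internally.

```r
# Base-R analogy for mapping internal class names to display labels.
internal_classes <- c("0", "1")
class_labels <- c("benign", "malignant")

predicted <- c("1", "0", "1", "1")

# levels fixes the internal ordering; labels supplies the names used
# at export or when plotting.
labelled <- factor(predicted, levels = internal_classes, labels = class_labels)
labelled
```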
Datasets in familiarCollection objects can have custom names for export and plotting. This function retrieves the currently assigned names.
## S4 method for signature 'familiarCollection'
get_data_set_names(x)
x |
A familiarCollection object. |
Labels convert internal naming of data sets to the requested label at export or when plotting. Labels can be changed using the set_data_set_names method.
An ordered array of dataset name labels.
familiarCollection for information concerning the familiarCollection class.
set_data_set_names for updating the name of datasets and their ordering.
Features in familiarCollection objects can have custom names for export and plotting. This function retrieves the currently assigned names.
## S4 method for signature 'familiarCollection'
get_feature_names(x)
x |
A familiarCollection object. |
Labels convert internal naming of features to the requested label at export or when plotting. Labels can be changed using the set_feature_names method.
An ordered array of feature labels.
familiarCollection for information concerning the familiarCollection class.
set_feature_names for updating the name and ordering of features.
Feature selection methods in familiarCollection objects can have custom names for export and plotting. This function retrieves the currently assigned names.
## S4 method for signature 'familiarCollection'
get_fs_method_names(x)
x |
A familiarCollection object. |
Labels convert internal naming of feature selection methods to the requested label at export or when plotting. Labels can be changed using the set_fs_method_names method.
An ordered array of feature selection method name labels.
familiarCollection for information concerning the familiarCollection class.
set_fs_method_names for updating the name of feature selection methods and their ordering.
Learners in familiarCollection objects can have custom names for export and plotting. This function retrieves the currently assigned names.
## S4 method for signature 'familiarCollection'
get_learner_names(x)
x |
A familiarCollection object. |
Labels convert internal naming of learners to the requested label at export or when plotting. Labels can be changed using the set_learner_names method.
An ordered array of learner name labels.
familiarCollection for information concerning the familiarCollection class.
set_learner_names for updating the name of learners and their ordering.
Risk groups in familiarCollection objects can have custom names for export and plotting. This function retrieves the currently assigned names.
## S4 method for signature 'familiarCollection'
get_risk_group_names(x)
x |
A familiarCollection object. |
Labels convert internal naming of risk groups to the requested label at export or when plotting. Labels can be changed using the set_risk_group_names method.
An ordered array of risk group labels.
familiarCollection for information concerning the familiarCollection class.
set_risk_group_names for updating the name and ordering of risk groups.
This method retrieves and parses variable importance tables from their respective vimpTable objects.
get_vimp_table(x, state = "ranked", ...)
## S4 method for signature 'list'
get_vimp_table(x, state = "ranked", ...)
## S4 method for signature 'character'
get_vimp_table(x, state = "ranked", ...)
## S4 method for signature 'vimpTable'
get_vimp_table(x, state = "ranked", ...)
## S4 method for signature 'NULL'
get_vimp_table(x, state = "ranked", ...)
## S4 method for signature 'experimentData'
get_vimp_table(x, state = "ranked", ...)
## S4 method for signature 'familiarModel'
get_vimp_table(x, state = "ranked", data = NULL, as_object = FALSE, ...)
x |
Variable importance ( |
state |
State of the returned variable importance table. This affects what contents are shown, and in which format. The variable importance table can be returned with the following states:
Internally, the variable importance table will go through each state, i.e. a variable importance table in the initial state will be decoded, declustered and then ranked prior to returning the variable importance table. |
... |
Unused arguments. |
data |
Internally used argument for use with |
as_object |
Internally used argument for use with |
A data.table with variable importance scores and, with state="ranked", the respective ranks.
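To make the relation between scores and ranks concrete, the following base-R sketch ranks a small, made-up table of importance scores. It assumes that a higher score means a more important feature and that ties share the lowest rank; both are assumptions of this illustration, not documented behaviour of get_vimp_table.

```r
# Illustration of turning importance scores into ranks.
# Assumes higher score = more important; ties share the minimum rank.
vimp <- data.frame(
  name = c("age", "stage", "volume", "sex"),
  score = c(0.12, 0.45, 0.45, 0.03)
)

vimp$rank <- rank(-vimp$score, ties.method = "min")

# Order the table by rank for presentation.
vimp <- vimp[order(vimp$rank), ]
vimp
```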
This function creates an empty configuration xml file in the directory specified by dir_path. This provides an alternative to the use of input arguments for familiar.
get_xml_config(dir_path)
dir_path |
Path to the directory where the configuration file should be
created. The directory should exist, and no file named |
Nothing. A file named config.xml is created in the directory indicated by dir_path.
## Not run:
# Creates a config.xml file in the working directory
get_xml_config(dir_path=getwd())
## End(Not run)
An outcome information object stores data concerning an outcome. This is used to prospectively check data.
name
Name of the outcome, inherited from the original column name by default.
outcome_type
Type of outcome.
outcome_column
Name of the outcome column in data.
levels
Specifies class levels of categorical outcomes.
ordered
Specifies whether categorical outcomes are ordered.
reference
Class level used as reference.
time
Maximum time, as set by the time_max configuration parameter.
censored
Censoring indicators for survival outcomes.
event
Event indicators for survival outcomes.
competing_risk
Indicators for competing risks in survival outcomes.
distribution
Five-number summary (numeric outcomes), class frequency (categorical outcomes), or survival distributions.
data_id
Internal identifier for the dataset used to derive the outcome information.
run_id
Internal identifier for the specific subset of the dataset used to derive the outcome information.
transformation_parameters
Parameters used for transforming numeric outcomes. Currently unused.
normalisation_parameters
Parameters used for normalising numeric outcomes. Currently unused.
This method creates precision-recall curves based on data in a familiarCollection object.
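For readers unfamiliar with the underlying quantities, the points of a precision-recall curve can be computed from predicted scores and observed labels as in the base-R sketch below. The scores and labels are made up, and the sketch does not reproduce how familiar computes or plots these curves.

```r
# Points of a precision-recall curve from scores and binary labels.
# Illustration only; data are made up.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
labels <- c(1, 1, 0, 1, 0, 0)  # 1 = positive class

# Sort instances by decreasing score and lower the threshold stepwise.
ordering <- order(scores, decreasing = TRUE)
sorted_labels <- labels[ordering]

true_positives <- cumsum(sorted_labels)
false_positives <- cumsum(1 - sorted_labels)

precision <- true_positives / (true_positives + false_positives)
recall <- true_positives / sum(labels)

pr_curve <- data.frame(
  threshold = scores[ordering],
  precision = precision,
  recall = recall
)
pr_curve
```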
plot_auc_precision_recall_curve( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_n_breaks = 5, x_breaks = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_auc_precision_recall_curve( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_n_breaks = 5, x_breaks = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' plot_auc_precision_recall_curve( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_n_breaks = 5, x_breaks = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... )
object |
|
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where the plots of precision-recall
curves are saved. Output is saved in the
|
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
color_by |
(optional) Variables used to determine fill colour of plot
objects. The variables cannot overlap with those provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
ggtheme |
(optional) |
discrete_palette |
(optional) Palette to use to color the different
plot elements in case a value was provided to the |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
x_n_breaks |
(optional) Number of breaks to show on the x-axis of the
plot. |
x_breaks |
(optional) Break points on the x-axis of the plot. |
y_n_breaks |
(optional) Number of breaks to show on the y-axis of the
plot. |
y_breaks |
(optional) Break points on the y-axis of the plot. |
conf_int_style |
(optional) Confidence interval style. See details for allowed styles. |
conf_int_alpha |
(optional) Alpha value to determine transparency of confidence intervals or, alternatively, other plot elements with which the confidence interval overlaps. Only values between 0.0 (fully transparent) and 1.0 (fully opaque) are allowed. |
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates area under the precision-recall curve plots.
Available splitting variables are: fs_method, learner, data_set and positive_class. By default, the data is split by fs_method and learner, with faceting by data_set and colouring by positive_class.
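As a rough illustration of what a precision-recall curve contains, the following base R sketch computes precision and recall at each decision threshold. The scores and labels are made-up values, and this is not necessarily how familiar computes the curve internally.

```r
# Hypothetical predicted probabilities and observed classes (1 = positive).
score <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
label <- c(1, 1, 0, 1, 0, 0)

# Sort samples by decreasing score; each prefix corresponds to one threshold.
ord <- order(score, decreasing = TRUE)
tp <- cumsum(label[ord] == 1)  # true positives at each threshold
fp <- cumsum(label[ord] == 0)  # false positives at each threshold

precision <- tp / (tp + fp)
recall <- tp / sum(label == 1)

pr_curve <- data.frame(
  threshold = score[ord],
  recall = recall,
  precision = precision
)
print(pr_curve)
```

Each row of pr_curve is one point on the curve; the plot function draws such points per positive_class.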
Available palettes for discrete_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
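The palette sources named above can be inspected with base R. The sketch below lists the available palette names and builds a custom palette; the specific colours are arbitrary choices for illustration, and the assumption is that such a character vector can be supplied as discrete_palette.

```r
# Named palettes shipped with grDevices (palette.pals() needs R >= 4.0.0,
# hcl.pals() needs R >= 3.6.0).
head(grDevices::palette.pals())
head(grDevices::hcl.pals())

# A custom palette mixing a named colour and hexadecimal RGB strings.
custom_palette <- c("steelblue", "#E69F00", "#009E73")

# Check that the named colour is known to R and that every entry parses.
stopifnot("steelblue" %in% grDevices::colors())
rgb_values <- grDevices::col2rgb(custom_palette)
print(rgb_values)
```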
Bootstrap confidence intervals of the precision-recall curve (if present) can be shown using various styles set by conf_int_style:
ribbon (default): confidence intervals are shown as a ribbon with an opacity of conf_int_alpha around the point estimate of the precision-recall curve.
step: confidence intervals are shown as a step function around the point estimate of the precision-recall curve.
none: confidence intervals are not shown. The point estimate of the precision-recall curve is shown as usual.
Labelling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL or list of plot objects, if dir_path is NULL.
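A minimal usage sketch, assuming a familiarCollection object already exists. The helper names save_pr_curves and preview_pr_curves, and the arguments collection and out_dir, are hypothetical; only arguments from the usage shown above are passed on.

```r
# Hypothetical helper: `collection` is assumed to be an existing
# familiarCollection object and `out_dir` a writable directory.
save_pr_curves <- function(collection, out_dir) {
  familiar::plot_auc_precision_recall_curve(
    object = collection,
    dir_path = out_dir,                    # plots are written to disk
    split_by = c("fs_method", "learner"),  # one figure per combination
    facet_by = "data_set",
    color_by = "positive_class",
    conf_int_style = "step"
  )
}

# Hypothetical helper: with dir_path = NULL the plot objects are returned
# instead of being written to a directory.
preview_pr_curves <- function(collection) {
  familiar::plot_auc_precision_recall_curve(
    object = collection,
    dir_path = NULL,
    conf_int_style = "none"
  )
}
```

The explicit split_by, facet_by and color_by values mirror the defaults described above, so they could be omitted.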
This method creates receiver operating characteristic curves based on data in a familiarCollection object.
plot_auc_roc_curve( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_n_breaks = 5, x_breaks = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_auc_roc_curve( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_n_breaks = 5, x_breaks = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' plot_auc_roc_curve( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_n_breaks = 5, x_breaks = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... )
object |
|
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where the plots of receiver
operating characteristic curves are saved to. Output is saved in the
|
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
color_by |
(optional) Variables used to determine fill colour of plot
objects. The variables cannot overlap with those provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
ggtheme |
(optional) |
discrete_palette |
(optional) Palette to use to color the different
plot elements in case a value was provided to the |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
x_n_breaks |
(optional) Number of breaks to show on the x-axis of the
plot. |
x_breaks |
(optional) Break points on the x-axis of the plot. |
y_n_breaks |
(optional) Number of breaks to show on the y-axis of the
plot. |
y_breaks |
(optional) Break points on the y-axis of the plot. |
conf_int_style |
(optional) Confidence interval style. See details for allowed styles. |
conf_int_alpha |
(optional) Alpha value to determine transparency of confidence intervals or, alternatively, other plot elements with which the confidence interval overlaps. Only values between 0.0 (fully transparent) and 1.0 (fully opaque) are allowed. |
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates area under the ROC curve plots.
Available splitting variables are: fs_method, learner, data_set and positive_class. By default, the data is split by fs_method and learner, with faceting by data_set and colouring by positive_class.
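For orientation, the following base R sketch builds an ROC curve from made-up scores without ties and integrates it with the trapezoidal rule. It illustrates the quantity being plotted and is not taken from familiar's implementation.

```r
# Hypothetical predicted probabilities and observed classes (1 = positive).
score <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
label <- c(1, 1, 0, 1, 0, 0)

# Lower the threshold one sample at a time, starting from the origin (0, 0).
ord <- order(score, decreasing = TRUE)
tpr <- c(0, cumsum(label[ord] == 1) / sum(label == 1))  # sensitivity
fpr <- c(0, cumsum(label[ord] == 0) / sum(label == 0))  # 1 - specificity

# Area under the curve by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)
```

The result equals the fraction of positive-negative pairs in which the positive sample receives the higher score.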
Available palettes for discrete_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
Bootstrap confidence intervals of the ROC curve (if present) can be shown using various styles set by conf_int_style:
ribbon (default): confidence intervals are shown as a ribbon with an opacity of conf_int_alpha around the point estimate of the ROC curve.
step: confidence intervals are shown as a step function around the point estimate of the ROC curve.
none: confidence intervals are not shown. The point estimate of the ROC curve is shown as usual.
Labelling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL or list of plot objects, if dir_path is NULL.
This method creates calibration plots from calibration data stored in a familiarCollection object. For these figures, the expected (predicted) values are plotted against the observed values. A well-calibrated model should be close to the identity line.
plot_calibration_data( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), x_label_shared = "column", y_label = waiver(), y_label_shared = "row", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, show_density = TRUE, show_calibration_fit = TRUE, show_goodness_of_fit = TRUE, density_plot_height = grid::unit(1, "cm"), width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_calibration_data( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), x_label_shared = "column", y_label = waiver(), y_label_shared = "row", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, show_density = TRUE, show_calibration_fit = TRUE, show_goodness_of_fit = TRUE, density_plot_height = grid::unit(1, "cm"), width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... 
) ## S4 method for signature 'familiarCollection' plot_calibration_data( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), x_label_shared = "column", y_label = waiver(), y_label_shared = "row", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, show_density = TRUE, show_calibration_fit = TRUE, show_goodness_of_fit = TRUE, density_plot_height = grid::unit(1, "cm"), width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... )
object |
|
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created calibration
plots are saved to. Output is saved in the |
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
color_by |
(optional) Variables used to determine fill colour of plot
objects. The variables cannot overlap with those provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
ggtheme |
(optional) |
discrete_palette |
(optional) Palette to use to color the different
data points and fit lines in case a non-singular variable was provided to
the |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
x_label_shared |
(optional) Sharing of x-axis labels between facets. One of three values:
|
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
y_label_shared |
(optional) Sharing of y-axis labels between facets. One of three values:
|
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
x_range |
(optional) Value range for the x-axis. |
x_n_breaks |
(optional) Number of breaks to show on the x-axis of the
plot. |
x_breaks |
(optional) Break points on the x-axis of the plot. |
y_range |
(optional) Value range for the y-axis. |
y_n_breaks |
(optional) Number of breaks to show on the y-axis of the
plot. |
y_breaks |
(optional) Break points on the y-axis of the plot. |
conf_int_style |
(optional) Confidence interval style. See details for allowed styles. |
conf_int_alpha |
(optional) Alpha value to determine transparency of confidence intervals or, alternatively, other plot elements with which the confidence interval overlaps. Only values between 0.0 (fully transparent) and 1.0 (fully opaque) are allowed. |
show_density |
(optional) Show point density in top margin of the
figure. If |
show_calibration_fit |
(optional) Specifies whether the calibration-in-the-large and
calibration slope are annotated in the plot. If |
show_goodness_of_fit |
(optional) Specifies whether the results of
goodness of fit tests are annotated in the plot. If |
density_plot_height |
(optional) Height of the density plot. The height
is 1 cm by default. Height is expected to be a grid unit (see |
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates a calibration plot for each model in each dataset. Any data used for calibration (e.g. baseline survival) is obtained during model creation.
Available splitting variables are: fs_method, learner, data_set, evaluation_time (survival analysis only) and positive_class (multinomial endpoints only). By default, separate figures are created for each combination of fs_method and learner, with faceting by data_set.
Calibration in survival analysis is performed at set time points so that survival probabilities can be computed from the model, and compared with observed survival probabilities. This is done differently depending on the underlying model. For Cox proportional hazards regression models, the baseline survival (of the development samples) is used, whereas accelerated failure time models (e.g. Weibull) and survival random forests can be used to directly predict survival probabilities at a given time point. For survival analysis, evaluation_time is an additional facet variable (by default).
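For a Cox proportional hazards model, the standard relation between baseline survival and an individual's predicted survival at a time point is S(t | x) = S0(t)^exp(lp), where lp is the linear predictor. The base R sketch below applies this relation to made-up values; the variable names are illustrative, and familiar's internal computation may differ in detail.

```r
# Hypothetical baseline survival probability at the evaluation time point,
# as estimated from the development samples.
baseline_survival <- 0.80

# Hypothetical linear predictors (log relative hazards) for three samples.
linear_predictor <- c(-0.5, 0.0, 0.8)

# Predicted survival probability at the evaluation time point.
predicted_survival <- baseline_survival^exp(linear_predictor)
print(predicted_survival)
```

A higher linear predictor means a higher hazard and therefore a lower predicted survival probability. These predictions are the expected values compared against observed survival in the calibration plot.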
Calibration for multinomial endpoints is performed in a one-against-all manner. This yields calibration information for each individual class of the endpoint. For such endpoints, positive_class is an additional facet variable (by default).
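The one-against-all idea can be sketched in base R as follows: each class in turn is treated as the positive class, and its predicted probability is paired with a 0/1 indicator of whether that class was observed. The data and the name calibration_input are made up for illustration.

```r
# Hypothetical observed classes and predicted class probabilities.
observed_class <- factor(c("a", "b", "c", "a", "b"), levels = c("a", "b", "c"))
predicted <- matrix(
  c(0.7, 0.2, 0.1,
    0.2, 0.6, 0.2,
    0.1, 0.3, 0.6,
    0.5, 0.4, 0.1,
    0.3, 0.5, 0.2),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, levels(observed_class))
)

# One table per positive class: expected probability versus 0/1 outcome.
calibration_input <- lapply(levels(observed_class), function(positive_class) {
  data.frame(
    positive_class = positive_class,
    expected = predicted[, positive_class],
    observed = as.integer(observed_class == positive_class)
  )
})
names(calibration_input) <- levels(observed_class)
print(calibration_input$a)
```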
Calibration plots have a density plot in the margin, which shows the density of the plotted points, ordered by the expected probability or value. For binomial and multinomial outcomes, the densities for positive and negative classes are shown separately. Note that this information is only provided when color_by is not used as a splitting variable (i.e. one calibration plot per facet).
Calibration plots are annotated with the intercept and the slope of a linear model fitted to the sample points. A well-calibrated model has an intercept close to 0.0 and a slope of 1.0. Intercept and slope are shown with their respective 95% confidence intervals. In addition, goodness-of-fit tests may be shown. For most endpoints these are based on the Hosmer-Lemeshow (HL) test, but for survival endpoints both the Nam-D'Agostino (ND) and the Greenwood-Nam-D'Agostino (GND) tests are shown. Note that this information is only annotated when color_by is not used as a splitting variable (i.e. one calibration plot per facet).
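The intercept and slope annotation can be approximated with an ordinary linear model in base R. The sketch below uses simulated expected and observed values; it shows the general technique only, and the exact fitting procedure used by familiar is not assumed.

```r
set.seed(1)

# Hypothetical calibration points: expected values with small random error.
expected <- seq(0.05, 0.95, by = 0.1)
observed <- expected + rnorm(length(expected), sd = 0.03)

# Linear model of observed on expected values.
fit <- stats::lm(observed ~ expected)

# Intercept and slope with their 95% confidence intervals.
estimates <- cbind(
  estimate = stats::coef(fit),
  stats::confint(fit, level = 0.95)
)
print(estimates)
```

For a well-calibrated model, the interval for the intercept should cover 0.0 and the interval for the slope should cover 1.0.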
Available palettes for discrete_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
Labelling methods such as set_risk_group_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL or list of plot objects, if dir_path is NULL.
Hosmer, D. W., Hosmer, T., Le Cessie, S. & Lemeshow, S. A comparison of goodness-of-fit tests for the logistic regression model. Stat. Med. 16, 965–980 (1997).
D’Agostino, R. B. & Nam, B.-H. Evaluation of the Performance of Survival Analysis Models: Discrimination and Calibration Measures. in Handbook of Statistics vol. 23 1–25 (Elsevier, 2003).
Demler, O. V., Paynter, N. P. & Cook, N. R. Tests of calibration and goodness-of-fit in the survival setting. Stat. Med. 34, 1659–1680 (2015).
This method creates confusion matrices based on data in a familiarCollection object.
plot_confusion_matrix( object, draw = FALSE, dir_path = NULL, split_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, rotate_x_tick_labels = waiver(), show_alpha = TRUE, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_confusion_matrix( object, draw = FALSE, dir_path = NULL, split_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, rotate_x_tick_labels = waiver(), show_alpha = TRUE, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' plot_confusion_matrix( object, draw = FALSE, dir_path = NULL, split_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, rotate_x_tick_labels = waiver(), show_alpha = TRUE, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... )
object |
|
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created confusion
matrices are saved. Output is saved in the |
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
ggtheme |
(optional) |
discrete_palette |
(optional) Palette used to colour the confusion matrix. The colour depends on whether each cell of the confusion matrix is on the diagonal (observed outcome matched expected outcome) or not. |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
rotate_x_tick_labels |
(optional) Rotate tick labels on the x-axis by
90 degrees. Defaults to |
show_alpha |
(optional) Interpreting confusion matrices is made easier
by setting the opacity of the cells.
|
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates confusion matrix plots.
Available splitting variables are: fs_method, learner and data_set. By default, the data is split by fs_method and learner, with faceting by data_set.
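The content of a confusion matrix can be illustrated with base R. In this sketch the observed and expected (predicted) classes are made-up values; the diagonal cells count samples where the observed outcome matched the expected outcome.

```r
# Hypothetical observed and expected (predicted) classes.
class_levels <- c("a", "b", "c")
observed <- factor(c("a", "a", "b", "b", "c", "c"), levels = class_levels)
expected <- factor(c("a", "b", "b", "b", "c", "a"), levels = class_levels)

# Cross-tabulate observed against expected classes.
confusion <- table(observed = observed, expected = expected)
print(confusion)

# Fraction of samples on the diagonal (correct predictions).
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```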
Available palettes for discrete_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
Labelling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL or list of plot objects, if dir_path is NULL.
This method creates decision curves based on data in a familiarCollection object.
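As background, a decision curve plots net benefit against the threshold probability. The base R sketch below uses the commonly cited definition, net benefit = TP/n - FP/n * pt / (1 - pt), on made-up data. The function name net_benefit and all values are illustrative, and this is not claimed to be familiar's exact computation.

```r
# Hypothetical predicted probabilities and observed outcomes (1 = event).
probability <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
outcome <- c(1, 1, 0, 1, 0, 0)

# Net benefit of intervening on every sample at or above the threshold.
net_benefit <- function(threshold) {
  treat <- probability >= threshold
  tp <- sum(treat & outcome == 1)
  fp <- sum(treat & outcome == 0)
  n <- length(outcome)
  tp / n - fp / n * threshold / (1 - threshold)
}

# Evaluate the curve at a few threshold probabilities.
thresholds <- c(0.25, 0.5, 0.75)
decision_curve <- data.frame(
  threshold = thresholds,
  net_benefit = vapply(thresholds, net_benefit, numeric(1))
)
print(decision_curve)
```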
plot_decision_curve( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_decision_curve( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' plot_decision_curve( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... )
object |
|
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created decision
curve plots are saved to. Output is saved in the |
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
color_by |
(optional) Variables used to determine fill colour of plot
objects. The variables cannot overlap with those provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
ggtheme |
(optional) |
discrete_palette |
(optional) Palette to use to color the different
plot elements in case a value was provided to the |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
x_range |
(optional) Value range for the x-axis. |
x_n_breaks |
(optional) Number of breaks to show on the x-axis of the
plot. |
x_breaks |
(optional) Break points on the x-axis of the plot. |
y_range |
(optional) Value range for the y-axis. |
y_n_breaks |
(optional) Number of breaks to show on the y-axis of the
plot. |
y_breaks |
(optional) Break points on the y-axis of the plot. |
conf_int_style |
(optional) Confidence interval style. See details for allowed styles. |
conf_int_alpha |
(optional) Alpha value to determine transparency of confidence intervals or, alternatively, other plot elements with which the confidence interval overlaps. Only values between 0.0 (fully transparent) and 1.0 (fully opaque) are allowed. |
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates plots for decision curves.
Available splitting variables are: fs_method, learner, data_set and positive_class (categorical outcomes) or evaluation_time (survival outcomes). By default, the data is split by fs_method and learner, with faceting by data_set and colouring by positive_class or evaluation_time.
Available palettes for discrete_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
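The palette sources named above can be inspected directly in base R. The following is a minimal sketch that lists the available palette names and builds a custom palette from colour names and hexadecimal RGB strings. The variable names (custom_palette and so on) are illustrative assumptions, not part of familiar.

```r
# Palette names known to grDevices.
hcl_names <- grDevices::hcl.pals()      # requires R >= 3.6.0
pal_names <- grDevices::palette.pals()  # requires R >= 4.0.0

# Draw four colours from a named palette of each kind.
viridis_cols <- grDevices::hcl.colors(n = 4, palette = "Viridis")
okabe_cols <- grDevices::palette.colors(n = 4, palette = "Okabe-Ito")

# A custom palette (hypothetical example) may mix colour names and
# hexadecimal RGB strings.
custom_palette <- c("steelblue", "#E69F00", "firebrick", "#009E73")

# Check that every named colour is listed by grDevices::colors().
named <- custom_palette[!grepl("^#", custom_palette)]
all_named_valid <- all(named %in% grDevices::colors())
```

A vector such as custom_palette is the kind of value the text describes for a user-specified palette.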
Bootstrap confidence intervals of the decision curve (if present) can be shown using various styles set by conf_int_style:
ribbon (default): confidence intervals are shown as a ribbon with an opacity of conf_int_alpha around the point estimate of the decision curve.
step: confidence intervals are shown as a step function around the point estimate of the decision curve.
none: confidence intervals are not shown. The point estimate of the decision curve is shown as usual.
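To make the point estimate and its bootstrap confidence interval concrete, the sketch below computes net benefit as defined by Vickers and Elkin (2006) over a range of threshold probabilities, together with a percentile bootstrap interval. It uses simulated data and base R only. The function net_benefit, the percentile method and the number of bootstraps are assumptions for illustration and may differ from what familiar does internally.

```r
# Net benefit at one threshold probability:
# TP / n - FP / n * threshold / (1 - threshold).
net_benefit <- function(outcome, probability, threshold) {
  n <- length(outcome)
  predicted_positive <- probability >= threshold
  true_positive <- sum(predicted_positive & outcome == 1)
  false_positive <- sum(predicted_positive & outcome == 0)
  true_positive / n - false_positive / n * threshold / (1 - threshold)
}

set.seed(19)
# Simulated (hypothetical) predicted probabilities and binary outcomes.
probability <- runif(200)
outcome <- rbinom(200, size = 1, prob = probability)

# Point estimate of the decision curve.
thresholds <- seq(0.05, 0.95, by = 0.05)
point_estimate <- sapply(
  thresholds, net_benefit,
  outcome = outcome, probability = probability
)

# Percentile bootstrap (an assumption): resample the samples with replacement
# and recompute the curve. Rows are thresholds, columns are bootstraps.
bootstrap_estimates <- replicate(200, {
  index <- sample.int(length(outcome), replace = TRUE)
  sapply(
    thresholds, net_benefit,
    outcome = outcome[index], probability = probability[index]
  )
})

# Lower and upper bounds of the 95% interval at each threshold.
conf_int <- apply(bootstrap_estimates, 1, quantile, probs = c(0.025, 0.975))
```

The two rows of conf_int are the bounds that a ribbon or step style would draw around point_estimate.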
Labelling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL or list of plot objects, if dir_path is NULL.
Vickers, A. J. & Elkin, E. B. Decision curve analysis: a novel method for evaluating prediction models. Med. Decis. Making 26, 565–574 (2006).
Vickers, A. J., Cronin, A. M., Elkin, E. B. & Gonen, M. Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers. BMC Med. Inform. Decis. Mak. 8, 53 (2008).
Vickers, A. J., van Calster, B. & Steyerberg, E. W. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res 3, 18 (2019).
This method creates a heatmap based on data stored in a familiarCollection object. Features in the heatmap are ordered so that more similar features appear together.
plot_feature_similarity( object, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), draw = FALSE, dir_path = NULL, split_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, gradient_palette = NULL, gradient_palette_range = NULL, x_label = waiver(), x_label_shared = "column", y_label = waiver(), y_label_shared = "row", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, y_range = NULL, y_n_breaks = 3, y_breaks = NULL, rotate_x_tick_labels = waiver(), show_dendrogram = c("top", "right"), dendrogram_height = grid::unit(1.5, "cm"), width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_feature_similarity( object, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), draw = FALSE, dir_path = NULL, split_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, gradient_palette = NULL, gradient_palette_range = NULL, x_label = waiver(), x_label_shared = "column", y_label = waiver(), y_label_shared = "row", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, y_range = NULL, y_n_breaks = 3, y_breaks = NULL, rotate_x_tick_labels = waiver(), show_dendrogram = c("top", "right"), dendrogram_height = grid::unit(1.5, "cm"), width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... 
) ## S4 method for signature 'familiarCollection' plot_feature_similarity( object, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), draw = FALSE, dir_path = NULL, split_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, gradient_palette = NULL, gradient_palette_range = NULL, x_label = waiver(), x_label_shared = "column", y_label = waiver(), y_label_shared = "row", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, y_range = NULL, y_n_breaks = 3, y_breaks = NULL, rotate_x_tick_labels = waiver(), show_dendrogram = c("top", "right"), dendrogram_height = grid::unit(1.5, "cm"), width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... )
object |
A |
feature_cluster_method |
The method used to perform clustering. These are
the same methods as for the
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_linkage_method |
The method used for agglomerative clustering in
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_cluster_cut_method |
The method used to divide features into
separate clusters. The available methods are the same as for the
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_similarity_threshold |
The threshold level for pair-wise
similarity that is required to form feature clusters with the If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created performance
plots are saved to. Output is saved in the |
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
ggtheme |
(optional) |
gradient_palette |
(optional) Sequential or divergent palette used to colour the similarity or distance between features in a heatmap. |
gradient_palette_range |
(optional) Numerical range used to span the
gradient. This should be a range of two values, e.g. |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
x_label_shared |
(optional) Sharing of x-axis labels between facets. One of three values:
|
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
y_label_shared |
(optional) Sharing of y-axis labels between facets. One of three values:
|
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
y_range |
(optional) Value range for the y-axis. |
y_n_breaks |
(optional) Number of breaks to show on the y-axis of the
plot. |
y_breaks |
(optional) Break points on the y-axis of the plot. |
rotate_x_tick_labels |
(optional) Rotate tick labels on the x-axis by
90 degrees. Defaults to |
show_dendrogram |
(optional) Show dendrogram around the main panel.
Can be A dendrogram can only be drawn from cluster methods that produce
dendrograms, such as By default, a dendrogram is drawn to the top and right of the panel. |
dendrogram_height |
(optional) Height of the dendrogram. The height is
1.5 cm by default. Height is expected to be grid unit (see |
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates feature similarity heatmaps.
Available splitting variables are: fs_method, learner, and data_set. By default, the data is split by fs_method and learner, with faceting by data_set.
Note that similarity is determined based on the underlying data. Hence the ordering of features may differ between facets, and tick labels are maintained for each panel.
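The ordering of features by similarity can be illustrated with a short base R sketch. It assumes absolute Pearson correlation as the similarity measure and average linkage for agglomerative clustering. Both are illustrative choices, as are the simulated features; the actual behaviour depends on the clustering settings described in the arguments above.

```r
set.seed(7)
# Simulated (hypothetical) features: feature_a and feature_b are strongly
# related, feature_c is independent noise.
a <- rnorm(100)
features <- data.frame(
  feature_a = a,
  feature_b = a + rnorm(100, sd = 0.1),
  feature_c = rnorm(100)
)

# Pairwise similarity and the corresponding distance.
similarity <- abs(cor(features))
distance <- as.dist(1 - similarity)

# Agglomerative clustering. The dendrogram order places similar features
# next to each other.
clustering <- hclust(distance, method = "average")
feature_order <- colnames(features)[clustering$order]

# Similarity matrix reordered as it would appear in a heatmap.
ordered_similarity <- similarity[feature_order, feature_order]
```

Because the order is derived from the data, running the same steps on a different dataset can produce a different feature order, which is why tick labels are kept for each panel.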
Available palettes for gradient_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
Labelling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL or list of plot objects, if dir_path is NULL.
This method creates individual conditional expectation plots based on data in a familiarCollection object.
plot_ice( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, gradient_palette = NULL, gradient_palette_range = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = NULL, plot_sub_title = NULL, caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, novelty_range = NULL, value_scales = waiver(), novelty_scales = waiver(), conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, ice_default_alpha = 0.6, n_max_samples_shown = 50L, show_ice = TRUE, show_pd = TRUE, show_novelty = TRUE, anchor_values = NULL, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_ice( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, gradient_palette = NULL, gradient_palette_range = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = NULL, plot_sub_title = NULL, caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, novelty_range = NULL, value_scales = waiver(), novelty_scales = waiver(), conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, ice_default_alpha = 0.6, n_max_samples_shown = 50L, show_ice = TRUE, show_pd = TRUE, show_novelty = TRUE, anchor_values = NULL, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... 
) ## S4 method for signature 'familiarCollection' plot_ice( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, gradient_palette = NULL, gradient_palette_range = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, novelty_range = NULL, value_scales = waiver(), novelty_scales = waiver(), conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, ice_default_alpha = 0.6, n_max_samples_shown = 50L, show_ice = TRUE, show_pd = TRUE, show_novelty = TRUE, anchor_values = NULL, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... )
object |
|
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created individual
conditional expectation plots are saved to. Output is saved in the
|
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
color_by |
(optional) Variables used to determine fill colour of plot
objects. The variables cannot overlap with those provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
ggtheme |
(optional) |
discrete_palette |
(optional) Palette to use to colour the different
plot elements in case a value was provided to the |
gradient_palette |
(optional) Sequential or divergent palette used to colour the raster in 2D individual conditional expectation or partial dependence plots. This argument is not used for 1D plots. |
gradient_palette_range |
(optional) Numerical range used to span the
gradient for 2D plots. This should be a range of two values, e.g. |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
x_range |
(optional) Value range for the x-axis. |
x_n_breaks |
(optional) Number of breaks to show on the x-axis of the
plot. |
x_breaks |
(optional) Break points on the x-axis of the plot. |
y_range |
(optional) Value range for the y-axis. |
y_n_breaks |
(optional) Number of breaks to show on the y-axis of the
plot. |
y_breaks |
(optional) Break points on the y-axis of the plot. |
novelty_range |
(optional) Numerical range used to span the range of
novelty values. This determines the size of the bubbles in 2D, and
transparency of lines in 1D. This should be a range of two values, e.g.
|
value_scales |
(optional) Sets scaling of predicted values. This parameter has several options:
For 1D plots, this option is ignored if the |
novelty_scales |
(optional) Sets scaling of novelty values, similar to
the
|
conf_int_style |
(optional) Confidence interval style. See details for allowed styles. |
conf_int_alpha |
(optional) Alpha value to determine transparency of confidence intervals or, alternatively, other plot elements with which the confidence interval overlaps. Only values between 0.0 (fully transparent) and 1.0 (fully opaque) are allowed. |
ice_default_alpha |
(optional) Default transparency (value) of sample lines in a 1D plot. When novelty is shown, this is the transparency corresponding to the least novel points. The confidence interval alpha value is scaled by this value. |
n_max_samples_shown |
(optional) Maximum number of samples shown in an individual conditional expectation plot. Defaults to 50. These samples are randomly picked from the samples present in the ICE data, but the same samples are consistently picked. Partial dependence is nonetheless computed from all available samples. |
show_ice |
(optional) Sets whether individual conditional expectation plots should be created. |
show_pd |
(optional) Sets whether partial dependence plots should be created. Note that if an anchor is set for a particular feature, its partial dependence cannot be shown. |
show_novelty |
(optional) Sets whether novelty is shown in plots. |
anchor_values |
(optional) A single value or a named list or array of
values that are used to centre the individual conditional expectation plot.
A single value is valid if and only if only a single feature is assessed.
Otherwise, values Has no effect if the plot is not shown, i.e.
|
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates individual conditional expectation plots. These plots come in two varieties, namely 1D and 2D. 1D plots show the predicted value as a function of a single feature, whereas 2D plots show the predicted value as a function of two features.
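As a rough illustration of what a 1D plot contains, the sketch below computes individual conditional expectation (ICE) curves and the partial dependence curve for one feature in base R. The data are simulated, and the names x1, x2 and model are hypothetical. A linear model stands in for any fitted model, and centring on the first grid point is only one possible anchoring choice. This is not familiar's implementation.

```r
set.seed(3)
# Simulated (hypothetical) data and a simple stand-in model.
data <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
data$y <- 2 * data$x1 - data$x2 + rnorm(50, sd = 0.2)
model <- lm(y ~ x1 + x2, data = data)

# Grid of values for the feature of interest.
grid <- seq(min(data$x1), max(data$x1), length.out = 20)

# ICE: for every sample, vary x1 over the grid while keeping x2 fixed.
# Rows are grid points, columns are samples.
ice <- sapply(seq_len(nrow(data)), function(ii) {
  sample_data <- data[rep(ii, length(grid)), , drop = FALSE]
  sample_data$x1 <- grid
  predict(model, newdata = sample_data)
})

# Partial dependence: the average of all ICE curves at each grid point.
partial_dependence <- rowMeans(ice)

# Anchoring (assumed here at the first grid point): subtract each curve's
# value at the anchor so that all curves start at zero.
anchored_ice <- sweep(ice, 2, ice[1, ], FUN = "-")
```

A 2D plot follows the same idea with a grid over two features, so each sample yields a surface rather than a curve.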
Available splitting variables are: feature_x, feature_y (2D only), fs_method, learner, data_set and positive_class (categorical outcomes) or evaluation_time (survival outcomes). By default, for 1D ICE plots the data are split by feature_x, fs_method and learner, with faceting by data_set, positive_class or evaluation_time. If only partial dependence is shown, positive_class and evaluation_time are used to set colours instead. For 2D plots, by default the data are split by feature_x, fs_method and learner, with faceting by data_set, positive_class or evaluation_time. The color_by argument cannot be used with 2D plots, and attempting to do so causes an error. Attempting to specify feature_x or feature_y for color_by will likewise result in an error, as multiple features cannot be shown in the same facet.
The splitting variables indicated by color_by are coloured according to the discrete_palette parameter. This parameter is therefore only used for 1D plots. Available palettes for discrete_palette and gradient_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
Bootstrap confidence intervals of the partial dependence plots can be shown using various styles set by conf_int_style:
ribbon (default): confidence intervals are shown as a ribbon with an opacity of conf_int_alpha around the point estimate of the partial dependence.
step: confidence intervals are shown as a step function around the point estimate of the partial dependence.
none: confidence intervals are not shown. The point estimate of the partial dependence is shown as usual.
Note that when bootstrap confidence intervals were computed, they were also computed for individual samples in individual conditional expectation plots. To avoid clutter, only point estimates for individual samples are shown.
Labelling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL or list of plot objects, if dir_path is NULL.
This function creates Kaplan-Meier survival curves from stratification data stored in a familiarCollection object.
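For readers unfamiliar with the estimator behind these curves, the following is a minimal base R sketch of the Kaplan-Meier product-limit estimate on a small toy dataset. The function km_estimate and the vectors time and event are hypothetical names introduced for illustration. This is not familiar's implementation, which works from stratification data stored in the collection.

```r
# Kaplan-Meier estimate: at each distinct event time, multiply the running
# survival probability by (1 - events / number at risk).
km_estimate <- function(time, event) {
  event_times <- sort(unique(time[event == 1]))
  n_risk <- sapply(event_times, function(t) sum(time >= t))
  n_event <- sapply(event_times, function(t) sum(time == t & event == 1))
  data.frame(
    time = event_times,
    n_risk = n_risk,
    n_event = n_event,
    survival = cumprod(1 - n_event / n_risk)
  )
}

# Toy data (hypothetical): event = 1 is an observed event, 0 is censored.
time <- c(2, 3, 3, 5, 8, 9)
event <- c(1, 1, 0, 1, 0, 1)
km <- km_estimate(time, event)
```

Censored samples (event = 0) reduce the number at risk at later times without causing a drop in the curve, which is why they are commonly marked separately on the plot.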
plot_kaplan_meier( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, linetype_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, combine_legend = TRUE, ggtheme = NULL, discrete_palette = NULL, x_label = "time", x_label_shared = "column", y_label = "survival probability", y_label_shared = "row", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = c(0, 1), y_n_breaks = 5, y_breaks = NULL, confidence_level = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, censoring = TRUE, censor_shape = "plus", show_logrank = TRUE, show_survival_table = TRUE, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_kaplan_meier( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, linetype_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, combine_legend = TRUE, ggtheme = NULL, discrete_palette = NULL, x_label = "time", x_label_shared = "column", y_label = "survival probability", y_label_shared = "row", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = c(0, 1), y_n_breaks = 5, y_breaks = NULL, confidence_level = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, censoring = TRUE, censor_shape = "plus", show_logrank = TRUE, show_survival_table = TRUE, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... 
) ## S4 method for signature 'familiarCollection' plot_kaplan_meier( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, linetype_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, combine_legend = TRUE, ggtheme = NULL, discrete_palette = NULL, x_label = "time", x_label_shared = "column", y_label = "survival probability", y_label_shared = "row", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = c(0, 1), y_n_breaks = 5, y_breaks = NULL, confidence_level = NULL, conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, censoring = TRUE, censor_shape = "plus", show_logrank = TRUE, show_survival_table = TRUE, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... )
object |
|
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created figures are
saved to. Output is saved in the |
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
color_by |
(optional) Variables used to determine fill colour of plot
objects. The variables cannot overlap with those provided to the |
linetype_by |
(optional) Variables that are used to determine the
linetype of lines in a plot. The variables cannot overlap with those
provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
combine_legend |
(optional) Flag to indicate whether the same legend
is to be shared by multiple aesthetics, such as those specified by
|
ggtheme |
(optional) |
discrete_palette |
(optional) Palette to use to color the different
risk strata in case a non-singular variable was provided to the |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
x_label_shared |
(optional) Sharing of x-axis labels between facets. One of three values:
|
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
y_label_shared |
(optional) Sharing of y-axis labels between facets. One of three values:
|
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
x_range |
(optional) Value range for the x-axis. |
x_n_breaks |
(optional) Number of breaks to show on the x-axis of the
plot. |
x_breaks |
(optional) Break points on the x-axis of the plot. |
y_range |
(optional) Value range for the y-axis. |
y_n_breaks |
(optional) Number of breaks to show on the y-axis of the
plot. |
y_breaks |
(optional) Break points on the y-axis of the plot. |
confidence_level |
(optional) Confidence level for the strata in the plot. |
conf_int_style |
(optional) Confidence interval style. See details for allowed styles. |
conf_int_alpha |
(optional) Alpha value to determine transparency of confidence intervals or, alternatively, other plot elements with which the confidence interval overlaps. Only values between 0.0 (fully transparent) and 1.0 (fully opaque) are allowed. |
censoring |
(optional) Flag to indicate whether censored samples should be indicated on the survival curve. |
censor_shape |
(optional) Shape used to indicate censored samples on
the survival curve. Available shapes are documented in the |
show_logrank |
(optional) Specifies whether the results of a log-rank
test to assess differences between the risk strata are annotated in the
plot. A log-rank test can only be shown when |
show_survival_table |
(optional) Specifies whether a survival table is
shown below the Kaplan-Meier survival curves. Survival in the risk strata
is assessed for each of the breaks in |
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from number of facets and the inclusion of survival tables. |
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates a Kaplan-Meier survival plot based on risk group stratification by the learners.
familiar does not determine what units the x-axis has or what kind of survival the y-axis represents. It is therefore recommended to provide x_label and y_label arguments.
Available splitting variables are: fs_method, learner, data_set, risk_group and stratification_method. By default, separate figures are created for each combination of fs_method and learner, with faceting by data_set and colouring of the strata in each individual plot by risk_group.
Available palettes for discrete_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
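As a rough illustration of the palette options above, the sketch below resolves each kind of palette to colour values using only base grDevices functions. It does not call familiar itself; the variable names and the choice of three risk groups are assumptions made for the example.

```r
# Number of strata to colour (assumed for this example).
n_groups <- 3L

# Palette listed by grDevices::palette.pals() (R >= 4.0.0).
named_palette <- grDevices::palette.colors(n = n_groups, palette = "Okabe-Ito")

# Palette listed by grDevices::hcl.pals() (R >= 3.6.0).
hcl_palette <- grDevices::hcl.colors(n = n_groups, palette = "Viridis")

# One of the classic grDevices palette functions.
classic_palette <- grDevices::rainbow(n_groups)

# A custom palette mixing colour names and hexadecimal RGB strings.
custom_palette <- c("firebrick", "#1F77B4", "darkgreen")

# Colour names should be among those listed by grDevices::colors().
names_are_known <- all(c("firebrick", "darkgreen") %in% grDevices::colors())
```

A palette name or a custom character vector like custom_palette is the kind of value the text describes for discrete_palette.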
Greenwood confidence intervals of the Kaplan-Meier curve can be shown using various styles set by conf_int_style:
ribbon (default): confidence intervals are shown as a ribbon with an opacity of conf_int_alpha around the point estimate of the Kaplan-Meier curve.
step: confidence intervals are shown as a step function around the point estimate of the Kaplan-Meier curve.
none: confidence intervals are not shown. The point estimate of the Kaplan-Meier curve is shown as usual.
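To make the Greenwood intervals mentioned above concrete, the following base R sketch computes a Kaplan-Meier estimate with plain (untransformed) Greenwood confidence limits. This is not familiar's implementation: the function name km_greenwood, the toy data, and the use of untransformed limits clipped to [0, 1] are assumptions for illustration only.

```r
# Kaplan-Meier estimate with plain Greenwood confidence intervals.
# Hypothetical helper; assumes the last observation is not an event for
# every sample still at risk (otherwise the variance term is undefined).
km_greenwood <- function(time, event, confidence_level = 0.95) {
  # Distinct times at which at least one event occurred.
  event_times <- sort(unique(time[event == 1]))

  # Number at risk and number of events at each event time.
  n_at_risk <- vapply(event_times, function(t) sum(time >= t), numeric(1))
  n_events <- vapply(
    event_times, function(t) sum(time == t & event == 1), numeric(1))

  # Product-limit estimator of the survival probability.
  survival <- cumprod(1 - n_events / n_at_risk)

  # Greenwood's formula for the standard error of the estimate.
  variance_term <- cumsum(n_events / (n_at_risk * (n_at_risk - n_events)))
  std_error <- survival * sqrt(variance_term)

  # Two-sided normal quantile for the requested confidence level.
  z <- stats::qnorm(1 - (1 - confidence_level) / 2)

  data.frame(
    time = event_times,
    survival = survival,
    lower = pmax(0, survival - z * std_error),
    upper = pmin(1, survival + z * std_error)
  )
}

# Toy data: follow-up times and event indicators (1 = event, 0 = censored).
follow_up <- c(1, 2, 3, 4, 5)
status <- c(1, 0, 1, 1, 0)
km <- km_greenwood(follow_up, status)
```

The lower and upper columns correspond to what a ribbon or step style would draw around the survival column.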
Labelling methods such as set_risk_group_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL or list of plot objects, if dir_path is NULL.
This method creates plots that show model performance from the data stored in a familiarCollection object. This method may create several types of plots, as determined by plot_type.
plot_model_performance( object, draw = FALSE, dir_path = NULL, split_by = NULL, x_axis_by = NULL, y_axis_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, plot_type = NULL, ggtheme = NULL, discrete_palette = NULL, gradient_palette = NULL, gradient_palette_range = waiver(), x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, rotate_x_tick_labels = waiver(), y_range = NULL, y_n_breaks = 5, y_breaks = NULL, width = waiver(), height = waiver(), units = waiver(), annotate_performance = NULL, export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_model_performance( object, draw = FALSE, dir_path = NULL, split_by = NULL, x_axis_by = NULL, y_axis_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, plot_type = NULL, ggtheme = NULL, discrete_palette = NULL, gradient_palette = NULL, gradient_palette_range = waiver(), x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, rotate_x_tick_labels = waiver(), y_range = NULL, y_n_breaks = 5, y_breaks = NULL, width = waiver(), height = waiver(), units = waiver(), annotate_performance = NULL, export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' plot_model_performance( object, draw = FALSE, dir_path = NULL, split_by = NULL, x_axis_by = NULL, y_axis_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, plot_type = NULL, ggtheme = NULL, discrete_palette = NULL, gradient_palette = NULL, gradient_palette_range = waiver(), x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, rotate_x_tick_labels = waiver(), y_range = NULL, y_n_breaks = 5, y_breaks = NULL, width = waiver(), height = waiver(), units = waiver(), annotate_performance = NULL, export_collection = FALSE, ... )
object |
|
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created performance
plots are saved to. Output is saved in the |
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
x_axis_by |
(optional) Variable plotted along the x-axis of a plot.
The variable cannot overlap with variables provided to the |
y_axis_by |
(optional) Variable plotted along the y-axis of a plot.
The variable cannot overlap with variables provided to the |
color_by |
(optional) Variables used to determine fill colour of plot
objects. The variables cannot overlap with those provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
plot_type |
(optional) Type of plot to draw. This is one of The choice for |
ggtheme |
(optional) |
discrete_palette |
(optional) Palette to use to color the different
plot elements in case a value was provided to the |
gradient_palette |
(optional) Sequential or divergent palette used to
color the raster in |
gradient_palette_range |
(optional) Numerical range used to span the
gradient. This should be a range of two values, e.g. |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
rotate_x_tick_labels |
(optional) Rotate tick labels on the x-axis by
90 degrees. Defaults to |
y_range |
(optional) Value range for the y-axis. |
y_n_breaks |
(optional) Number of breaks to show on the y-axis of the
plot. |
y_breaks |
(optional) Break points on the y-axis of the plot. |
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
annotate_performance |
(optional) Indicates whether performance in
heatmaps should be annotated with text. Can be |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function plots model performance based on empirical bootstraps, using various plot representations.
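The idea of summarising performance from bootstraps can be sketched in base R as follows. This is only an illustration under stated assumptions: the model, the root mean squared error metric, the 200 bootstrap samples, and the percentile interval are choices made for the example and are not taken from familiar.

```r
set.seed(42)

# Simple model whose performance is assessed (assumed for the example).
model <- stats::lm(mpg ~ wt, data = mtcars)

# Root mean squared error as the performance metric.
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Evaluate the metric on bootstrap resamples of the data.
bootstrap_values <- replicate(200L, {
  index <- sample.int(nrow(mtcars), replace = TRUE)
  resampled <- mtcars[index, ]
  rmse(resampled$mpg, stats::predict(model, newdata = resampled))
})

# Point estimate and a percentile confidence interval.
point_estimate <- stats::median(bootstrap_values)
interval <- stats::quantile(bootstrap_values, probs = c(0.025, 0.975))
```

The vector bootstrap_values is the kind of distribution that a boxplot or violin plot would display, whereas a heatmap or barplot would show a summary such as point_estimate.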
Available splitting variables are: fs_method, learner, data_set, evaluation_time (survival outcome only) and metric. The default for heatmap is to split by metric, facet by data_set and evaluation_time, position learner along the x-axis and fs_method along the y-axis. The color_by argument is not used. The only valid options for x_axis_by and y_axis_by are learner and fs_method.
For other plot types (barplot, boxplot and violinplot), the default depends on the number of learners and feature selection methods:
one feature selection method and one learner: the default is to split by metric, and have data_set along the x-axis.
one feature selection method and multiple learners: the default is to split by metric, facet by data_set and have learner along the x-axis.
multiple feature selection methods and one learner: the default is to split by metric, facet by data_set and have fs_method along the x-axis.
multiple feature selection methods and learners: the default is to split by metric, facet by data_set, colour by fs_method and have learner along the x-axis.
If applicable, additional faceting is performed for evaluation_time.
Available palettes for discrete_palette and gradient_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
Labelling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL or list of plot objects, if dir_path is NULL.
This method creates partial dependence plots based on data in a familiarCollection object.
plot_pd( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, gradient_palette = NULL, gradient_palette_range = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, novelty_range = NULL, value_scales = waiver(), novelty_scales = waiver(), conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, show_novelty = TRUE, anchor_values = NULL, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_pd( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, gradient_palette = NULL, gradient_palette_range = NULL, x_label = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, novelty_range = NULL, value_scales = waiver(), novelty_scales = waiver(), conf_int_style = c("ribbon", "step", "none"), conf_int_alpha = 0.4, show_novelty = TRUE, anchor_values = NULL, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... )
object |
|
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created individual
conditional expectation plots are saved to. Output is saved in the
|
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
color_by |
(optional) Variables used to determine fill colour of plot
objects. The variables cannot overlap with those provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
ggtheme |
(optional) |
discrete_palette |
(optional) Palette to use to colour the different
plot elements in case a value was provided to the |
gradient_palette |
(optional) Sequential or divergent palette used to colour the raster in 2D individual conditional expectation or partial dependence plots. This argument is not used for 1D plots. |
gradient_palette_range |
(optional) Numerical range used to span the
gradient for 2D plots. This should be a range of two values, e.g. |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
x_range |
(optional) Value range for the x-axis. |
x_n_breaks |
(optional) Number of breaks to show on the x-axis of the
plot. |
x_breaks |
(optional) Break points on the x-axis of the plot. |
y_range |
(optional) Value range for the y-axis. |
y_n_breaks |
(optional) Number of breaks to show on the y-axis of the
plot. |
y_breaks |
(optional) Break points on the y-axis of the plot. |
novelty_range |
(optional) Numerical range used to span the range of
novelty values. This determines the size of the bubbles in 2D, and
transparency of lines in 1D. This should be a range of two values, e.g.
|
value_scales |
(optional) Sets scaling of predicted values. This parameter has several options:
For 1D plots, this option is ignored if the |
novelty_scales |
(optional) Sets scaling of novelty values, similar to
the
|
conf_int_style |
(optional) Confidence interval style. See details for allowed styles. |
conf_int_alpha |
(optional) Alpha value to determine transparency of confidence intervals or, alternatively, other plot elements with which the confidence interval overlaps. Only values between 0.0 (fully transparent) and 1.0 (fully opaque) are allowed. |
show_novelty |
(optional) Sets whether novelty is shown in plots. |
anchor_values |
(optional) A single value or a named list or array of
values that are used to centre the individual conditional expectation plot.
A single value is valid if and only if only a single feature is assessed.
Otherwise, values Has no effect if the plot is not shown, i.e.
|
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates partial dependence plots. These plots come in two varieties, namely 1D and 2D. 1D plots show the predicted value as function of a single feature, whereas 2D plots show the predicted value as a function of two features.
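The 1D case described above can be sketched in base R: for each value on a grid, the feature is set to that value for every sample and the predictions are averaged. The helper name partial_dependence_1d, the linear model, and the grid of ten values are assumptions for this example and do not reflect how familiar computes or stores these data.

```r
# Simple model with two features (assumed for the example).
model <- stats::lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: average prediction when `feature` is fixed at each
# grid value while all other features keep their observed values.
partial_dependence_1d <- function(model, data, feature, grid) {
  vapply(grid, function(value) {
    data[[feature]] <- value
    mean(stats::predict(model, newdata = data))
  }, numeric(1))
}

# Grid spanning the observed range of the feature of interest.
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)
pd_wt <- partial_dependence_1d(model, mtcars, "wt", wt_grid)

# For a linear model the curve is a straight line whose slope equals the
# fitted coefficient of the feature.
pd_slope <- diff(pd_wt) / diff(wt_grid)
```

A 2D plot would repeat the same averaging over a grid of value pairs for two features.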
Available splitting variables are: feature_x, feature_y (2D only), fs_method, learner, data_set and positive_class (categorical outcomes) or evaluation_time (survival outcomes). By default, for 1D ICE plots the data are split by feature_x, fs_method and learner, with faceting by data_set, positive_class or evaluation_time. If only partial dependence is shown, positive_class and evaluation_time are used to set colours instead. For 2D plots, by default the data are split by feature_x, fs_method and learner, with faceting by data_set, positive_class or evaluation_time. The color_by argument cannot be used with 2D plots, and attempting to do so causes an error. Attempting to specify feature_x or feature_y for color_by will likewise result in an error, as multiple features cannot be shown in the same facet.
The splitting variables indicated by color_by are coloured according to the discrete_palette parameter. This parameter is therefore only used for 1D plots. Available palettes for discrete_palette and gradient_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
Bootstrap confidence intervals of the partial dependence plots can be shown using various styles set by conf_int_style:
ribbon (default): confidence intervals are shown as a ribbon with an opacity of conf_int_alpha around the point estimate of the partial dependence.
step: confidence intervals are shown as a step function around the point estimate of the partial dependence.
none: confidence intervals are not shown. The point estimate of the partial dependence is shown as usual.
Labelling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL or list of plot objects, if dir_path is NULL.
This function plots the data on permutation variable importance stored in a familiarCollection object.
plot_permutation_variable_importance( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = "feature", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, conf_int_style = c("point_line", "line", "bar_line", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_permutation_variable_importance( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = "feature", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, conf_int_style = c("point_line", "line", "bar_line", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'familiarCollection' plot_permutation_variable_importance( object, draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, discrete_palette = NULL, x_label = waiver(), y_label = "feature", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, conf_int_style = c("point_line", "line", "bar_line", "none"), conf_int_alpha = 0.4, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... )
object |
|
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created figures are
saved to. Output is saved in the |
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
color_by |
(optional) Variables used to determine fill colour of plot
objects. The variables cannot overlap with those provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
ggtheme |
(optional) |
discrete_palette |
(optional) Palette used to fill the bars in case a
non-singular variable was provided to the |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
x_range |
(optional) Value range for the x-axis. |
x_n_breaks |
(optional) Number of breaks to show on the x-axis of the
plot. |
x_breaks |
(optional) Break points on the x-axis of the plot. |
conf_int_style |
(optional) Confidence interval style. See details for allowed styles. |
conf_int_alpha |
(optional) Alpha value to determine transparency of confidence intervals or, alternatively, other plot elements with which the confidence interval overlaps. Only values between 0.0 (fully transparent) and 1.0 (fully opaque) are allowed. |
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates a horizontal barplot that lists features by the estimated model improvement over that of a dataset where the respective feature is randomly permuted.
The following splitting variables are available for split_by, color_by and facet_by:
fs_method: feature selection methods.
learner: learners.
data_set: data sets.
metric: the model performance metrics.
evaluation_time: the evaluation times (survival outcomes only).
similarity_threshold: the similarity threshold used to identify groups of features to permute simultaneously.
By default, the data is split by fs_method, learner and metric, faceted by data_set and evaluation_time, and coloured by similarity_threshold.
Available palettes for discrete_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
Labelling methods such as set_fs_method_names or set_feature_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
Bootstrap confidence intervals (if present) can be shown using various styles set by conf_int_style:
point_line (default): confidence intervals are shown as lines, on which the point estimate is likewise shown.
line: confidence intervals are shown as lines, but the point estimate is not shown.
bar_line: confidence intervals are shown as lines, with the point estimate shown as a bar plot with the opacity of conf_int_alpha.
none: confidence intervals are not shown. The point estimate is shown as a bar plot.
For metrics where lower values indicate better model performance, more negative permutation variable importance values indicate features that are more important. Because this may cause confusion, values obtained for these metrics are mirrored around 0.0 for plotting (but not any tabular data export).
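The sign convention described above can be illustrated with a small base R sketch of permutation variable importance for a metric where lower is better (root mean squared error). The model, the helper permutation_importance, and the 20 permutation repeats are assumptions for the example; familiar's own procedure, including simultaneous permutation of similar features, is not reproduced here.

```r
set.seed(1)

# Simple model and a lower-is-better metric (assumed for the example).
model <- stats::lm(mpg ~ wt + hp, data = mtcars)
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Performance on the original, unpermuted data.
baseline <- rmse(mtcars$mpg, stats::predict(model, newdata = mtcars))

# Hypothetical helper: metric on the original data minus the average metric
# after randomly permuting one feature.
permutation_importance <- function(feature, n_repeats = 20L) {
  permuted_scores <- replicate(n_repeats, {
    permuted <- mtcars
    permuted[[feature]] <- sample(permuted[[feature]])
    rmse(mtcars$mpg, stats::predict(model, newdata = permuted))
  })
  baseline - mean(permuted_scores)
}

# Raw values are negative for important features, because permuting them
# increases the error.
raw_importance <- vapply(c("wt", "hp"), permutation_importance, numeric(1))

# Mirroring around 0.0 makes larger plotted values mean greater importance.
mirrored_importance <- -raw_importance
```

In this sketch raw_importance corresponds to what a tabular export would contain, and mirrored_importance to what would be drawn in the plot.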
NULL or list of plot objects, if dir_path is NULL.
This method creates a heatmap based on data stored in a familiarCollection object. Features in the heatmap are ordered so that more similar features appear together.
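Ordering a heatmap so that similar features appear together can be sketched with hierarchical clustering in base R. The distance measure (one minus the absolute Spearman correlation), the average linkage, and the chosen mtcars columns are assumptions for this example; the actual methods are controlled by arguments such as feature_cluster_method and feature_linkage_method, whose defaults are not shown here.

```r
# Normalised feature data (columns chosen for the example).
features <- scale(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])

# Feature dissimilarity: strongly correlated features are close together.
feature_distance <- stats::as.dist(
  1 - abs(stats::cor(features, method = "spearman")))

# Hierarchical clustering of features and of samples.
feature_tree <- stats::hclust(feature_distance, method = "average")
sample_tree <- stats::hclust(stats::dist(features), method = "average")

# Reorder rows and columns by the dendrogram leaf order.
feature_order <- colnames(features)[feature_tree$order]
ordered_data <- features[sample_tree$order, feature_order]
```

The matrix ordered_data has the layout a clustered heatmap would show, with feature_tree and sample_tree providing the dendrograms along the axes.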
plot_sample_clustering( object, feature_cluster_method = waiver(), feature_linkage_method = waiver(), sample_cluster_method = waiver(), sample_linkage_method = waiver(), sample_limit = waiver(), draw = FALSE, dir_path = NULL, split_by = NULL, x_axis_by = NULL, y_axis_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, gradient_palette = NULL, gradient_palette_range = waiver(), outcome_palette = NULL, outcome_palette_range = waiver(), x_label = waiver(), x_label_shared = "column", y_label = waiver(), y_label_shared = "row", legend_label = waiver(), outcome_legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 3, x_breaks = NULL, y_range = NULL, y_n_breaks = 3, y_breaks = NULL, rotate_x_tick_labels = waiver(), show_feature_dendrogram = TRUE, show_sample_dendrogram = TRUE, show_normalised_data = TRUE, show_outcome = TRUE, dendrogram_height = grid::unit(1.5, "cm"), outcome_height = grid::unit(0.3, "cm"), evaluation_times = waiver(), width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, verbose = TRUE, ... 
) ## S4 method for signature 'ANY' plot_sample_clustering( object, feature_cluster_method = waiver(), feature_linkage_method = waiver(), sample_cluster_method = waiver(), sample_linkage_method = waiver(), sample_limit = waiver(), draw = FALSE, dir_path = NULL, split_by = NULL, x_axis_by = NULL, y_axis_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, gradient_palette = NULL, gradient_palette_range = waiver(), outcome_palette = NULL, outcome_palette_range = waiver(), x_label = waiver(), x_label_shared = "column", y_label = waiver(), y_label_shared = "row", legend_label = waiver(), outcome_legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 3, x_breaks = NULL, y_range = NULL, y_n_breaks = 3, y_breaks = NULL, rotate_x_tick_labels = waiver(), show_feature_dendrogram = TRUE, show_sample_dendrogram = TRUE, show_normalised_data = TRUE, show_outcome = TRUE, dendrogram_height = grid::unit(1.5, "cm"), outcome_height = grid::unit(0.3, "cm"), evaluation_times = waiver(), width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, verbose = TRUE, ... 
) ## S4 method for signature 'familiarCollection' plot_sample_clustering( object, feature_cluster_method = waiver(), feature_linkage_method = waiver(), sample_cluster_method = waiver(), sample_linkage_method = waiver(), sample_limit = waiver(), draw = FALSE, dir_path = NULL, split_by = NULL, x_axis_by = NULL, y_axis_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, ggtheme = NULL, gradient_palette = NULL, gradient_palette_range = waiver(), outcome_palette = NULL, outcome_palette_range = waiver(), x_label = waiver(), x_label_shared = "column", y_label = waiver(), y_label_shared = "row", legend_label = waiver(), outcome_legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 3, x_breaks = NULL, y_range = NULL, y_n_breaks = 3, y_breaks = NULL, rotate_x_tick_labels = waiver(), show_feature_dendrogram = TRUE, show_sample_dendrogram = TRUE, show_normalised_data = TRUE, show_outcome = TRUE, dendrogram_height = grid::unit(1.5, "cm"), outcome_height = grid::unit(0.3, "cm"), evaluation_times = waiver(), width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, verbose = TRUE, ... )
object |
A |
feature_cluster_method |
The method used to perform clustering. These are
the same methods as for the
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_linkage_method |
The method used for agglomerative clustering in
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
sample_cluster_method |
The method used to perform clustering based on
distance between samples. These are the same methods as for the
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
sample_linkage_method |
The method used for agglomerative clustering in
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
sample_limit |
(optional) Set the upper limit of the number of samples that are used during evaluation steps. Cannot be less than 20. This setting can be specified per data element by providing a parameter
value in a named list with data elements, e.g.
This parameter can be set for the following data elements:
|
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created performance
plots are saved. Output is saved in the |
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
x_axis_by |
(optional) Variable plotted along the x-axis of a plot.
The variable cannot overlap with variables provided to the |
y_axis_by |
(optional) Variable plotted along the y-axis of a plot.
The variable cannot overlap with variables provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
ggtheme |
(optional) |
gradient_palette |
(optional) Sequential or divergent palette used to colour the similarity or distance between features in a heatmap. |
gradient_palette_range |
(optional) Numerical range used to span the
gradient. This should be a range of two values, e.g. |
outcome_palette |
(optional) Sequential ( |
outcome_palette_range |
(optional) Numerical range used to span the
gradient of numeric ( |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
x_label_shared |
(optional) Sharing of x-axis labels between facets. One of three values:
|
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
y_label_shared |
(optional) Sharing of y-axis labels between facets. One of three values:
|
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
outcome_legend_label |
(optional) Label to provide to the legend for
outcome data. If NULL, the legend will not have a name. By default,
|
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
x_range |
(optional) Value range for the x-axis. |
x_n_breaks |
(optional) Number of breaks to show on the x-axis of the
plot. |
x_breaks |
(optional) Break points on the x-axis of the plot. |
y_range |
(optional) Value range for the y-axis. |
y_n_breaks |
(optional) Number of breaks to show on the y-axis of the
plot. |
y_breaks |
(optional) Break points on the y-axis of the plot. |
rotate_x_tick_labels |
(optional) Rotate tick labels on the x-axis by
90 degrees. Defaults to |
show_feature_dendrogram |
(optional) Show feature dendrogram around
the main panel. Can be If a position is specified, it should be appropriate with regard to the
A dendrogram can only be drawn from cluster methods that produce
dendrograms, such as |
show_sample_dendrogram |
(optional) Show sample dendrogram around the
main panel. Can be If a position is specified, it should be appropriate with regard to the
A dendrogram can only be drawn from cluster methods that produce
dendrograms, such as |
show_normalised_data |
(optional) Flag that determines whether the
data shown in the main heatmap is normalised using the same settings as
within the analysis ( Categorical variables are plotted to span 90% of the entire numerical value range, i.e. the levels of categorical variables with 2 levels are represented at 5% and 95% of the range, with 3 levels at 5%, 50%, and 95%, etc. |
show_outcome |
(optional) Show outcome column(s) or row(s) in the
graph. Can be If a position is specified, it should be appropriate with regard to the
The outcome data will be drawn between the main panel and the sample dendrogram (if any). |
dendrogram_height |
(optional) Height of the dendrogram. The height is
1.5 cm by default. Height is expected to be a grid unit (see |
outcome_height |
(optional) Height of an outcome data column/row. The
height is 0.3 cm by default. Height is expected to be a grid unit (see
|
evaluation_times |
(optional) Times at which the event status of
time-to-event survival outcomes are determined. Only used for |
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
verbose |
Flag to indicate whether feedback should be provided for the plotting. |
... |
Arguments passed on to
|
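The placement of categorical levels described under show_normalised_data can be illustrated with a little arithmetic. The sketch below is not taken from the package: level_positions is a hypothetical helper, and the even spacing of four or more levels is an assumption extrapolated from the 2-level and 3-level cases given in the text.

```r
# Hypothetical helper: positions of categorical levels within a numeric
# value range, spanning the central 90% of that range.
level_positions <- function(n_levels, value_range = c(0, 1)) {
  stopifnot(n_levels >= 2, length(value_range) == 2)
  # Fractions run from 5% to 95% of the range, evenly spaced (assumption
  # for more than 3 levels).
  fractions <- seq(from = 0.05, to = 0.95, length.out = n_levels)
  value_range[1] + fractions * diff(value_range)
}

level_positions(2)  # 5% and 95% of the range
level_positions(3)  # 5%, 50% and 95% of the range
```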
This function generates heatmaps of the data, in which features and samples are ordered by clustering.
Available splitting variables are: fs_method, learner, and data_set.
By default, the data is split by fs_method, learner and data_set, since the number of samples will typically differ between data sets, even for the same feature selection method and learner.
The x_axis_by and y_axis_by arguments determine what data are shown along which axis. Each argument takes either feature or sample, and the two arguments must differ. By default, features are shown along the x-axis and samples along the y-axis.
Note that similarity is determined based on the underlying data. Hence the ordering of features may differ between facets, and tick labels are maintained for each panel.
Available palettes for gradient_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
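The palette sources named above can be inspected with base R alone. The following is a minimal sketch that only uses grDevices. The name custom_palette and the colours chosen for it are illustrative assumptions, not package defaults.

```r
# Named palettes; availability depends on the R version, as noted above.
if (getRversion() >= "4.0.0") print(head(grDevices::palette.pals()))
if (getRversion() >= "3.6.0") print(head(grDevices::hcl.pals()))

# A custom palette mixing colour names and hexadecimal RGB strings
# (illustrative choice of colours).
custom_palette <- c("navy", "white", "#B2182B")

# col2rgb() raises an error for entries that R cannot interpret as a
# colour, so this doubles as a validity check.
rgb_values <- grDevices::col2rgb(custom_palette)

# The classic palette functions return hexadecimal strings themselves.
heat_palette <- grDevices::heat.colors(5)
```

A vector such as custom_palette would then presumably be supplied through the gradient_palette argument.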
Labelling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL, or a list of plot objects if dir_path is NULL.
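A call might look as follows. This is a sketch under assumptions: collection stands for a familiarCollection object created elsewhere, draw_sample_clustering is a hypothetical wrapper, and only arguments listed in the usage section above are set. The wrapper is defined but not called here, because it needs the familiar package and a real collection.

```r
# Hypothetical wrapper around plot_sample_clustering.
# `collection` is assumed to be a familiarCollection object.
draw_sample_clustering <- function(collection) {
  familiar::plot_sample_clustering(
    object = collection,
    x_axis_by = "feature",                    # features along the x-axis
    y_axis_by = "sample",                     # samples along the y-axis
    dendrogram_height = grid::unit(2, "cm"),  # default is 1.5 cm
    outcome_height = grid::unit(0.5, "cm"),   # default is 0.3 cm
    draw = TRUE,                              # draw the plot
    dir_path = NULL                           # return plot objects instead of saving
  )
}
```

Because dir_path is NULL, the call is expected to return a list of plot objects instead of writing files.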
This function plots the univariate analysis data stored in a familiarCollection object.
plot_univariate_importance( object, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), draw = FALSE, dir_path = NULL, p_adjustment_method = waiver(), split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, show_cluster = TRUE, ggtheme = NULL, discrete_palette = NULL, gradient_palette = waiver(), x_label = waiver(), y_label = "feature", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, significance_level_shown = 0.05, width = waiver(), height = waiver(), units = waiver(), verbose = TRUE, export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_univariate_importance( object, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), draw = FALSE, dir_path = NULL, p_adjustment_method = waiver(), split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, show_cluster = TRUE, ggtheme = NULL, discrete_palette = NULL, gradient_palette = waiver(), x_label = waiver(), y_label = "feature", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, significance_level_shown = 0.05, width = waiver(), height = waiver(), units = waiver(), verbose = TRUE, export_collection = FALSE, ... 
) ## S4 method for signature 'familiarCollection' plot_univariate_importance( object, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), draw = FALSE, dir_path = NULL, p_adjustment_method = waiver(), split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, show_cluster = TRUE, ggtheme = NULL, discrete_palette = NULL, gradient_palette = waiver(), x_label = waiver(), y_label = "feature", legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, x_range = NULL, x_n_breaks = 5, x_breaks = NULL, significance_level_shown = 0.05, width = waiver(), height = waiver(), units = waiver(), verbose = TRUE, export_collection = FALSE, ... )
object |
A |
feature_cluster_method |
The method used to perform clustering. These are
the same methods as for the
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_linkage_method |
The method used for agglomerative clustering in
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_cluster_cut_method |
The method used to divide features into
separate clusters. The available methods are the same as for the
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_similarity_threshold |
The threshold level for pair-wise
similarity that is required to form feature clusters with the If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created figures are
saved. Output is saved in the |
p_adjustment_method |
(optional) Indicates type of p-value that is
shown. One of |
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
color_by |
(optional) Variables used to determine fill colour of plot
objects. The variables cannot overlap with those provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
show_cluster |
(optional) Show which features were clustered together. |
ggtheme |
(optional) |
discrete_palette |
(optional) Palette used to fill the bars in case a
non-singular variable was provided to the |
gradient_palette |
(optional) Palette to use for filling the bars in
case the |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
x_range |
(optional) Value range for the x-axis. |
x_n_breaks |
(optional) Number of breaks to show on the x-axis of the
plot. |
x_breaks |
(optional) Break points on the x-axis of the plot. |
significance_level_shown |
Position(s) to draw vertical lines indicating a significance level, e.g. 0.05. Can be NULL to not draw anything. |
width |
(optional) Width of the plot. A default value is derived from the number of facets. |
height |
(optional) Height of the plot. A default value is derived from the number of features and the number of facets. |
units |
(optional) Plot size unit. Either |
verbose |
Flag to indicate whether feedback should be provided for the plotting. |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates a horizontal barplot with the length of the bars corresponding to the 10-logarithm of the (multiple-testing corrected) p-value or q-value.
Features are assessed univariately using one-sample location t-tests after fitting a suitable regression model. The fitted model coefficient and the covariance matrix are then used to compute a p-value.
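The general idea of deriving a p-value from a fitted coefficient and its covariance matrix can be sketched in base R. This is an illustration only and not the package's internal code: the linear model, the simulated data, the helper univariate_p, and the Benjamini-Hochberg correction are all assumptions made for the example.

```r
set.seed(1)
n <- 50
# Simulated features; only `a` is related to the outcome.
features <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
outcome <- 2 * features$a + rnorm(n)

# Hypothetical helper: fit a regression model for one feature and test
# whether its coefficient differs from zero.
univariate_p <- function(x, y) {
  fit <- stats::lm(y ~ x)
  beta <- stats::coef(fit)[["x"]]
  # Standard error from the diagonal of the covariance matrix.
  se <- sqrt(stats::vcov(fit)["x", "x"])
  t_stat <- beta / se
  2 * stats::pt(-abs(t_stat), df = stats::df.residual(fit))
}

p_values <- vapply(features, univariate_p, numeric(1), y = outcome)

# Multiple-testing correction (Benjamini-Hochberg chosen as an example).
adjusted <- stats::p.adjust(p_values, method = "BH")

# Base-10 logarithm of the corrected values, the quantity that bar
# length is based on.
log10(adjusted)
```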
The following splitting variables are available for split_by, color_by and facet_by:
fs_method: feature selection methods
learner: learners
data_set: data sets
Unlike for plots of feature ranking in feature selection and after modelling (as assessed by model-specific routines), clusters of features are now found during creation of underlying familiarData objects, instead of through consensus clustering. Hence, clustering results may differ due to differences in the underlying datasets.
Available palettes for discrete_palette and gradient_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
Labelling methods such as set_fs_method_names or set_feature_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL, or a list of plot objects if dir_path is NULL.
This function plots variable importance data obtained during feature selection or after training a model, which are stored in a familiarCollection object.
plot_variable_importance( object, type, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), aggregation_method = waiver(), rank_threshold = waiver(), draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, show_cluster = TRUE, ggtheme = NULL, discrete_palette = NULL, gradient_palette = waiver(), x_label = "feature", rotate_x_tick_labels = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) ## S4 method for signature 'ANY' plot_variable_importance( object, type, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), aggregation_method = waiver(), rank_threshold = waiver(), draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, show_cluster = TRUE, ggtheme = NULL, discrete_palette = NULL, gradient_palette = waiver(), x_label = "feature", rotate_x_tick_labels = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... 
) ## S4 method for signature 'familiarCollection' plot_variable_importance( object, type, feature_cluster_method = waiver(), feature_linkage_method = waiver(), feature_cluster_cut_method = waiver(), feature_similarity_threshold = waiver(), aggregation_method = waiver(), rank_threshold = waiver(), draw = FALSE, dir_path = NULL, split_by = NULL, color_by = NULL, facet_by = NULL, facet_wrap_cols = NULL, show_cluster = TRUE, ggtheme = NULL, discrete_palette = NULL, gradient_palette = waiver(), x_label = "feature", rotate_x_tick_labels = waiver(), y_label = waiver(), legend_label = waiver(), plot_title = waiver(), plot_sub_title = waiver(), caption = NULL, y_range = NULL, y_n_breaks = 5, y_breaks = NULL, width = waiver(), height = waiver(), units = waiver(), export_collection = FALSE, ... ) plot_feature_selection_occurrence(...) plot_feature_selection_variable_importance(...) plot_model_signature_occurrence(...) plot_model_signature_variable_importance(...)
object |
A |
type |
Determine what variable importance should be shown. Can be
|
feature_cluster_method |
The method used to perform clustering. These are
the same methods as for the
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_linkage_method |
The method used for agglomerative clustering in
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_cluster_cut_method |
The method used to divide features into
separate clusters. The available methods are the same as for the
If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
feature_similarity_threshold |
The threshold level for pair-wise
similarity that is required to form feature clusters with the If not provided explicitly, this parameter is read from settings used at
creation of the underlying |
aggregation_method |
(optional) The method used to aggregate variable importances over different data subsets, e.g. bootstraps. The following methods can be selected:
|
rank_threshold |
(optional) The threshold used to define the subset of highly important features. If not set, this threshold is determined by maximising the variance in the occurrence value over all features over the subset size. This parameter is only relevant for |
draw |
(optional) Draws the plot if TRUE. |
dir_path |
(optional) Path to the directory where created figures are
saved. Output is saved in the |
split_by |
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables. |
color_by |
(optional) Variables used to determine fill colour of plot
objects. The variables cannot overlap with those provided to the |
facet_by |
(optional) Variables used to determine how and if facets of
each figure appear. In case the |
facet_wrap_cols |
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead. |
show_cluster |
(optional) Show which features were clustered together. Currently not available in combination with variable importance obtained during feature selection. |
ggtheme |
(optional) |
discrete_palette |
(optional) Palette to use for colouring bar plots,
in case a non-singular variable was provided to the |
gradient_palette |
(optional) Palette to use for filling the bars in
case the |
x_label |
(optional) Label to provide to the x-axis. If NULL, no label is shown. |
rotate_x_tick_labels |
(optional) Rotate tick labels on the x-axis by
90 degrees. Defaults to |
y_label |
(optional) Label to provide to the y-axis. If NULL, no label is shown. |
legend_label |
(optional) Label to provide to the legend. If NULL, the legend will not have a name. |
plot_title |
(optional) Label to provide as figure title. If NULL, no title is shown. |
plot_sub_title |
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown. |
caption |
(optional) Label to provide as figure caption. If NULL, no caption is shown. |
y_range |
(optional) Value range for the y-axis. |
y_n_breaks |
(optional) Number of breaks to show on the y-axis of the
plot. |
y_breaks |
(optional) Break points on the y-axis of the plot. |
width |
(optional) Width of the plot. A default value is derived from the number of facets and the number of features. |
height |
(optional) Height of the plot. A default value is derived
from number of facets, and the length of the longest feature name (if
|
units |
(optional) Plot size unit. Either |
export_collection |
(optional) Exports the collection if TRUE. |
... |
Arguments passed on to
|
This function generates a barplot based on variable importance of features.
The only allowed values for split_by, color_by or facet_by are fs_method and learner, but note that learner has no effect when plotting variable importance of features acquired during feature selection.
Available palettes for discrete_palette and gradient_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
Labelling methods such as set_feature_names or set_fs_method_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
NULL, or a list of plot objects if dir_path is NULL.
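The convenience wrappers listed in the usage section can be combined to compare variable importance from feature selection with that from the trained models. The sketch below rests on assumptions: collection is a familiarCollection object created elsewhere, plot_importance_overview is a hypothetical helper, and the wrappers are assumed to forward their arguments to plot_variable_importance. The helper is defined but not called here.

```r
# Hypothetical helper that collects both variable importance plots.
# `collection` is assumed to be a familiarCollection object.
plot_importance_overview <- function(collection) {
  list(
    # Variable importance determined during feature selection.
    feature_selection = familiar::plot_feature_selection_variable_importance(
      object = collection,
      draw = FALSE
    ),
    # Variable importance determined after training the models.
    model = familiar::plot_model_signature_variable_importance(
      object = collection,
      draw = FALSE
    )
  )
}
```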
Creates data assignment.
precompute_data_assignment( formula = NULL, data = NULL, experiment_data = NULL, cl = NULL, experimental_design = "fs+mb", verbose = TRUE, ... )
formula |
An R formula. The formula can only contain feature names and
dot ( Use of the formula interface is optional. |
data |
A
All data is expected to be in wide format, and ideally has a sample
identifier (see In case paths are provided, the data should be stored as |
experiment_data |
Experimental data may be provided in the form of |
cl |
Cluster created using the This parameter has no effect if the |
experimental_design |
(required) Defines what the experiment looks
like, e.g.
The different components are linked using Different subsampling methods can be used in conjunction with the basic workflow components:
As shown in the example above, sampling algorithms can be nested. Though neither variable importance is determined nor models are learned
within The simplest valid experimental design is |
verbose |
Indicates verbosity of the results. Default is TRUE, and all messages and warnings are returned. |
... |
Arguments passed on to
|
This is a thin wrapper around summon_familiar
, and functions like
it, but automatically skips computation of variable importance, learning
and subsequent evaluation steps.
The function returns an experimentData
object, which can be used to
warm-start other experiments by providing it to the experiment_data
argument.
An experimentData
object.
Creates data assignment and subsequently extracts feature information such as normalisation and clustering parameters.
precompute_feature_info( formula = NULL, data = NULL, experiment_data = NULL, cl = NULL, experimental_design = "fs+mb", verbose = TRUE, ... )
formula |
An R formula. The formula can only contain feature names and
dot ( Use of the formula interface is optional. |
data |
A
All data is expected to be in wide format, and ideally has a sample
identifier (see In case paths are provided, the data should be stored as |
experiment_data |
Experimental data may be provided in the form of |
cl |
Cluster created using the This parameter has no effect if the |
experimental_design |
(required) Defines what the experiment looks
like, e.g.
The different components are linked using Different subsampling methods can be used in conjunction with the basic workflow components:
As shown in the example above, sampling algorithms can be nested. Though neither variable importance is determined nor models are learned
within The simplest valid experimental design is This argument is ignored if the |
verbose |
Indicates verbosity of the results. Default is TRUE, and all messages and warnings are returned. |
... |
Arguments passed on to
|
This is a thin wrapper around summon_familiar
, and functions like
it, but automatically skips computation of variable importance, learning
and subsequent evaluation steps.
The function returns an experimentData
object, which can be used to
warm-start other experiments by providing it to the experiment_data
argument.
An experimentData
object.
Creates data assignment, extracts feature information and subsequently computes variable importance.
precompute_vimp( formula = NULL, data = NULL, experiment_data = NULL, cl = NULL, experimental_design = "fs+mb", fs_method = NULL, fs_method_parameter = NULL, verbose = TRUE, ... )
formula |
An R formula. The formula can only contain feature names and
dot ( Use of the formula interface is optional. |
data |
A
All data is expected to be in wide format, and ideally has a sample
identifier (see In case paths are provided, the data should be stored as |
experiment_data |
Experimental data may be provided in the form of |
cl |
Cluster created using the This parameter has no effect if the |
experimental_design |
(required) Defines what the experiment looks
like, e.g.
The different components are linked using Different subsampling methods can be used in conjunction with the basic workflow components:
As shown in the example above, sampling algorithms can be nested. The simplest valid experimental design is This argument is ignored if the |
fs_method |
(required) Feature selection method to be used for
determining variable importance. More than one feature selection method can be chosen. The experiment will then be repeated for each feature selection method. Feature selection methods determine the ranking of features. Actual selection of features is done by optimising the signature size model hyperparameter during the hyperparameter optimisation step. |
fs_method_parameter |
(optional) List of lists containing parameters for feature selection methods. Each sublist should have the name of the feature selection method it corresponds to. Most feature selection methods do not have parameters that can be set. Please refer to the vignette on feature selection methods for more details. Note that if the feature selection method is based on a learner (e.g. lasso regression), hyperparameter optimisation may be performed prior to assessing variable importance. |
verbose |
Indicates verbosity of the results. Default is TRUE, and all messages and warnings are returned. |
... |
Arguments passed on to
|
This is a thin wrapper around summon_familiar
, and functions like
it, but automatically skips learning and subsequent evaluation steps.
The function returns an experimentData
object, which can be used to
warm-start other experiments by providing it to the experiment_data
argument. Variable importance may be retrieved from this object using the
get_vimp_table
and aggregate_vimp_table
methods.
An experimentData
object.
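The warm-start workflow described above can be sketched as a two-step function. This is an illustration under stated assumptions, not a verified recipe: `outcome_column`, `outcome_type` and `parallel` are assumed to be configuration settings passed on through `...`, the `"Species"` column and `"multinomial"` outcome type are hypothetical choices for a data set such as `iris`, and `"mim"` is used as the feature selection method only because it is named elsewhere in this manual. The function is defined but not called, because it requires the familiar package and can take a while to run.

```r
# Sketch only: requires the familiar package when called.
compute_vimp_sketch <- function(data) {
  # Step 1: create the data assignment once.
  assignment <- familiar::precompute_data_assignment(
    data = data,
    experimental_design = "fs+mb",
    outcome_column = "Species",    # assumed setting passed through `...`
    outcome_type = "multinomial",  # assumed setting passed through `...`
    parallel = FALSE               # assumed setting passed through `...`
  )

  # Step 2: warm-start the variable importance computation by providing the
  # experimentData object to the experiment_data argument.
  vimp_data <- familiar::precompute_vimp(
    data = data,
    experiment_data = assignment,
    fs_method = "mim",
    outcome_column = "Species",
    outcome_type = "multinomial",
    parallel = FALSE
  )

  # Step 3: retrieve variable importance from the experimentData object.
  familiar::get_vimp_table(vimp_data)
}

# Example call (not run): compute_vimp_sketch(iris)
```

Reusing the same experimentData object across calls keeps the data assignment identical between experiments, which is the purpose of warm-starting.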
get_vimp_table
, aggregate_vimp_table
Applies the model or ensemble of models to the data and returns the resulting predictions.
predict(object, ...)
## S4 method for signature 'familiarModel'
predict(
  object,
  newdata,
  type = "default",
  time = NULL,
  dir_path = NULL,
  ensemble_method = "median",
  stratification_threshold = NULL,
  stratification_method = NULL,
  percentiles = NULL,
  ...
)
## S4 method for signature 'familiarEnsemble'
predict(
  object,
  newdata,
  type = "default",
  time = NULL,
  dir_path = NULL,
  ensemble_method = "median",
  stratification_threshold = NULL,
  stratification_method = NULL,
  percentiles = NULL,
  ...
)
## S4 method for signature 'familiarNoveltyDetector'
predict(object, newdata, type = "novelty", ...)
## S4 method for signature 'list'
predict(
  object,
  newdata,
  type = "default",
  time = NULL,
  dir_path = NULL,
  ensemble_method = "median",
  stratification_threshold = NULL,
  stratification_method = NULL,
  percentiles = NULL,
  ...
)
## S4 method for signature 'character'
predict(
  object,
  newdata,
  type = "default",
  time = NULL,
  dir_path = NULL,
  ensemble_method = "median",
  stratification_threshold = NULL,
  stratification_method = NULL,
  percentiles = NULL,
  ...
)
object |
A familiar model or ensemble of models that should be used for prediction. This can also be a path to the ensemble model, one or more paths to models, or a list of models. |
... |
to be documented. |
newdata |
Data for which the models make predictions. |
type |
Type of prediction made. The following values are directly supported:
Other values for type are passed to the fitting method of the actual
underlying model. For example for generalised linear models ( |
time |
Time at which the response ( |
dir_path |
Path to the folder containing the models. Ensemble objects
are stored with the models detached. In case the models were moved since
creation, |
ensemble_method |
Method for ensembling predictions from models for the same sample. Available methods are:
|
stratification_threshold |
Threshold value(s) used for stratifying
instances into risk groups. If this parameter is specified,
|
stratification_method |
Selects the stratification method from which the
threshold values should be selected. If the model or ensemble of models
does not contain thresholds for the indicated method, an error is returned.
In addition this argument is ignored if a |
percentiles |
Currently unused. |
This method is used to predict values for instances specified by the
newdata
argument, using the model or ensemble of models specified by the object
argument.
A data.table
with predicted values.
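The roles of ensemble_method and stratification_threshold can be illustrated with base R. The sketch below uses hypothetical predictions from three models, combines them per sample with the median (the default value of ensemble_method in the usage above), and then assigns risk groups with a single hypothetical threshold. The group labels and the handling of values exactly equal to the threshold are assumptions and may differ from what familiar does.

```r
# Hypothetical predictions: rows are samples, columns are models in an ensemble.
predictions <- matrix(
  c(0.20, 0.30, 0.10,
    0.70, 0.90, 0.80,
    0.40, 0.60, 0.50),
  nrow = 3,
  byrow = TRUE,
  dimnames = list(
    c("sample_1", "sample_2", "sample_3"),
    c("model_1", "model_2", "model_3")
  )
)

# Combine the predictions of all models for the same sample using the median.
ensemble_prediction <- apply(predictions, 1, stats::median)

# Stratify samples into two risk groups using a hypothetical threshold.
threshold <- 0.45
risk_group <- cut(
  ensemble_prediction,
  breaks = c(-Inf, threshold, Inf),
  labels = c("low", "high")
)

print(data.frame(ensemble_prediction, risk_group))
```

The median is less sensitive than the mean to a single model that produces an extreme prediction, which is one reason it is a common default for ensembling.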
Tabular exports and figures created from a familiarCollection object can be customised by providing names for outcome classes.
## S4 method for signature 'familiarCollection'
set_class_names(x, old = NULL, new = NULL, order = NULL)
x |
A familiarCollection object. |
old |
(optional) Set of old labels to replace. |
new |
Set of replacement labels. The number of replacement labels should
be equal to the number of provided old labels or the full number of labels.
If a subset of labels is to be replaced, both |
order |
(optional) Ordered set of replacement labels. This is used to provide the order in which the labels should be placed, which affects e.g. levels in a plot. If the ordering is not explicitly provided, the old ordering is used. |
Labels convert the internal naming for class levels to the requested
label at export or when plotting. This enables customisation of class
names. Currently assigned labels can be found using the
get_class_names
method.
A familiarCollection object with updated labels.
* familiarCollection for information concerning the familiarCollection class.
* get_class_names for obtaining currently assigned class names.
Tabular exports and figures created from a familiarCollection object can be customised by setting data labels.
## S4 method for signature 'familiarCollection'
set_data_set_names(x, old = NULL, new = NULL, order = NULL)
x |
A familiarCollection object. |
old |
(optional) Set of old labels to replace. |
new |
Set of replacement labels. The number of replacement labels should
be equal to the number of provided old labels or the full number of labels.
If a subset of labels is to be replaced, both |
order |
(optional) Ordered set of replacement labels. This is used to provide the order in which the labels should be placed, which affects e.g. levels in a plot. If the ordering is not explicitly provided, the old ordering is used. |
Labels convert internal naming of data sets to the requested label
at export or when plotting. Currently assigned labels can be found using
the get_data_set_names
method.
A familiarCollection object with custom names for the data sets.
* familiarCollection for information concerning the familiarCollection class.
* get_data_set_names for obtaining currently assigned labels.
Tabular exports and figures created from a familiarCollection object can be customised by providing names for features.
## S4 method for signature 'familiarCollection'
set_feature_names(x, old = NULL, new = NULL, order = NULL)
x |
A familiarCollection object. |
old |
(optional) Set of old labels to replace. |
new |
Set of replacement labels. The number of replacement labels should
be equal to the number of provided old labels or the full number of labels.
If a subset of labels is to be replaced, both |
order |
(optional) Ordered set of replacement labels. This is used to provide the order in which the labels should be placed, which affects e.g. levels in a plot. If the ordering is not explicitly provided, the old ordering is used. |
Labels convert the internal naming for features to the requested
label at export or when plotting. This enables customisation without
redoing the analysis with renamed input data. Currently assigned labels can
be found using the get_feature_names
method.
A familiarCollection object with updated labels.
* familiarCollection for information concerning the familiarCollection class.
* get_feature_names for obtaining currently assigned feature names.
Tabular exports and figures created from a familiarCollection object can be customised by providing names for the feature selection methods.
## S4 method for signature 'familiarCollection'
set_fs_method_names(x, old = NULL, new = NULL, order = NULL)
x |
A familiarCollection object. |
old |
(optional) Set of old labels to replace. |
new |
Set of replacement labels. The number of replacement labels should
be equal to the number of provided old labels or the full number of labels.
If a subset of labels is to be replaced, both |
order |
(optional) Ordered set of replacement labels. This is used to provide the order in which the labels should be placed, which affects e.g. levels in a plot. If the ordering is not explicitly provided, the old ordering is used. |
Labels convert the internal naming for feature selection methods to
the requested label at export or when plotting. This enables the use of
more specific naming, e.g. changing mim
to Mutual Information
Maximisation
. Currently assigned labels can be found using the
get_fs_method_names
method.
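The relabelling semantics of old, new and order can be illustrated with a base R analogy. The sketch below is not familiar's implementation and does not operate on a familiarCollection object. It only shows how a set of old labels maps onto replacement labels and how an explicit order determines the level ordering used in, for example, a plot. The method name `other_method` and its label are hypothetical.

```r
# Internal names as they might appear in exported data (partly hypothetical).
fs_methods <- c("mim", "other_method", "mim")

# Old labels and their replacements, analogous to the old and new arguments.
old <- c("mim", "other_method")
new <- c("Mutual Information Maximisation", "Other Method")

# Requested ordering of the replacement labels, analogous to the order argument.
display_order <- c("Other Method", "Mutual Information Maximisation")

# Map each internal name onto its label and apply the requested level order.
labelled <- factor(new[match(fs_methods, old)], levels = display_order)

print(labelled)
print(levels(labelled))
```

With a familiarCollection object, the corresponding call would follow the usage shown above, for example with old set to "mim" and new set to "Mutual Information Maximisation".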
A familiarCollection object with updated labels.
* familiarCollection for information concerning the familiarCollection class.
* get_fs_method_names for obtaining currently assigned labels.
Tabular exports and figures created from a familiarCollection object can be customised by providing names for the learners.
## S4 method for signature 'familiarCollection'
set_learner_names(x, old = NULL, new = NULL, order = NULL)
x |
A familiarCollection object. |
old |
(optional) Set of old labels to replace. |
new |
Set of replacement labels. The number of replacement labels should
be equal to the number of provided old labels or the full number of labels.
If a subset of labels is to be replaced, both |
order |
(optional) Ordered set of replacement labels. This is used to provide the order in which the labels should be placed, which affects e.g. levels in a plot. If the ordering is not explicitly provided, the old ordering is used. |
Labels convert the internal naming for learners to the requested
label at export or when plotting. This enables the use of more specific
naming, e.g. changing random_forest_rfsrc
to Random Forest
.
Currently assigned labels can be found using the get_learner_names
method.
A familiarCollection object with custom labels for the learners.
* familiarCollection for information concerning the familiarCollection class.
* get_learner_names for obtaining currently assigned labels.
Tabular exports and figures created from a familiarCollection object can be customised by providing names for risk groups in survival analysis.
## S4 method for signature 'familiarCollection'
set_risk_group_names(x, old = NULL, new = NULL, order = NULL)
x |
A familiarCollection object. |
old |
(optional) Set of old labels to replace. |
new |
Set of replacement labels. The number of replacement labels should
be equal to the number of provided old labels or the full number of labels.
If a subset of labels is to be replaced, both |
order |
(optional) Ordered set of replacement labels. This is used to provide the order in which the labels should be placed, which affects e.g. levels in a plot. If the ordering is not explicitly provided, the old ordering is used. |
Labels convert the internal naming for risk groups to the requested
label at export or when plotting. This enables customisation of risk group
names. Currently assigned labels can be found using the
get_risk_group_names
method.
A familiarCollection object with updated labels.
* familiarCollection for information concerning the familiarCollection class.
* get_risk_group_names for obtaining currently assigned risk group labels.
summary
produces model summaries.
summary(object, ...)
## S4 method for signature 'familiarModel'
summary(object, ...)
object |
a familiarModel object |
... |
additional arguments passed to |
This method extends the summary
S3 method. For some models
summary
requires information that is trimmed from the model. In this case
a copy of summary data is stored with the model, and returned.
Depends on underlying model. See the documentation for the particular models.
Perform end-to-end machine learning and data analysis
summon_familiar( formula = NULL, data = NULL, experiment_data = NULL, cl = NULL, config = NULL, config_id = 1L, verbose = TRUE, .stop_after = "evaluation", ... )
formula |
An R formula. The formula can only contain feature names and
dot ( Use of the formula interface is optional. |
data |
A
All data is expected to be in wide format, and ideally has a sample
identifier (see In case paths are provided, the data should be stored as |
experiment_data |
Experimental data may be provided in the form of |
cl |
Cluster created using the This parameter has no effect if the |
config |
List containing configuration parameters, or path to an All parameters can also be set programmatically. These supersede any arguments derived from the configuration list. |
config_id |
Identifier for the configuration in case the list or |
verbose |
Indicates verbosity of the results. Default is TRUE, and all messages and warnings are returned. |
.stop_after |
Variable for internal use. |
... |
Arguments passed on to
|
Nothing. All output is written to the experiment directory. If the experiment directory is in a temporary location, a list with all familiarModel, familiarEnsemble, familiarData and familiarCollection objects will be returned.
Storey, J. D. A direct approach to false discovery rates. J. R. Stat. Soc. Series B Stat. Methodol. 64, 479–498 (2002).
Shrout, P. E. & Fleiss, J. L. Intraclass correlations: uses in assessing rater reliability. Psychol. Bull. 86, 420–428 (1979).
Koo, T. K. & Li, M. Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropr. Med. 15, 155–163 (2016).
Yeo, I. & Johnson, R. A. A new family of power transformations to improve normality or symmetry. Biometrika 87, 954–959 (2000).
Box, G. E. P. & Cox, D. R. An analysis of transformations. J. R. Stat. Soc. Series B Stat. Methodol. 26, 211–252 (1964).
Raymaekers, J., Rousseeuw, P. J. Transforming variables to central normality. Mach Learn. (2021).
Park, M. Y., Hastie, T. & Tibshirani, R. Averaged gene expressions for regression. Biostatistics 8, 212–227 (2007).
Tolosi, L. & Lengauer, T. Classification with correlated features: unreliability of feature ranking and solutions. Bioinformatics 27, 1986–1994 (2011).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007)
Kaufman, L. & Rousseeuw, P. J. Finding groups in data: an introduction to cluster analysis. (John Wiley & Sons, 2009).
Muellner, D. fastcluster: fast hierarchical, agglomerative clustering routines for R and Python. J. Stat. Softw. 53, 1–18 (2013).
Rousseeuw, P. J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
Langfelder, P., Zhang, B. & Horvath, S. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24, 719–720 (2008).
McFadden, D. Conditional logit analysis of qualitative choice behavior. in Frontiers in Econometrics (ed. Zarembka, P.) 105–142 (Academic Press, 1974).
Cox, D. R. & Snell, E. J. Analysis of binary data. (Chapman and Hall, 1989).
Nagelkerke, N. J. D. A note on a general definition of the coefficient of determination. Biometrika 78, 691–692 (1991).
Meinshausen, N. & Buehlmann, P. Stability selection. J. R. Stat. Soc. Series B Stat. Methodol. 72, 417–473 (2010).
Haury, A.-C., Gestraud, P. & Vert, J.-P. The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. PLoS One 6, e28210 (2011).
Wald, R., Khoshgoftaar, T. M., Dittman, D., Awada, W. & Napolitano,A. An extensive comparison of feature ranking aggregation techniques in bioinformatics. in 2012 IEEE 13th International Conference on Information Reuse Integration (IRI) 377–384 (2012).
Hutter, F., Hoos, H. H. & Leyton-Brown, K. Sequential model-based optimization for general algorithm configuration. in Learning and Intelligent Optimization (ed. Coello, C. A. C.) 6683, 507–523 (Springer Berlin Heidelberg, 2011).
Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & de Freitas, N. Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE 104, 148–175 (2016)
Srinivas, N., Krause, A., Kakade, S. M. & Seeger, M. W. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. IEEE Trans. Inf. Theory 58, 3250–3265 (2012)
Kaufmann, E., Cappé, O. & Garivier, A. On Bayesian upper confidence bounds for bandit problems. in Artificial intelligence and statistics 592–600 (2012).
Jamieson, K. & Talwalkar, A. Non-stochastic Best Arm Identification and Hyperparameter Optimization. in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (eds. Gretton, A. & Robert, C. C.) vol. 51 240–248 (PMLR, 2016).
Gramacy, R. B. laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R. Journal of Statistical Software 72, 1–46 (2016)
Sparapani, R., Spanbauer, C. & McCulloch, R. Nonparametric Machine Learning and Efficient Computation with Bayesian Additive Regression Trees: The BART R Package. Journal of Statistical Software 97, 1–66 (2021)
Davison, A. C. & Hinkley, D. V. Bootstrap methods and their application. (Cambridge University Press, 1997).
Efron, B. & Hastie, T. Computer Age Statistical Inference. (Cambridge University Press, 2016).
Lausen, B. & Schumacher, M. Maximally Selected Rank Statistics. Biometrics 48, 73 (1992).
Hothorn, T. & Lausen, B. On the exact distribution of maximally selected rank statistics. Comput. Stat. Data Anal. 43, 121–137 (2003).
This is the default theme used for plots created by familiar. The theme uses
ggplot2::theme_light
as the base template.
theme_familiar( base_size = 10, base_family = "", base_line_size = 0.5, base_rect_size = 0.5 )
base_size |
Base font size in points. Size of other plot text elements is based off this. |
base_family |
Font family used for text elements. |
base_line_size |
Base size for line elements, in points. |
base_rect_size |
Base size for rectangular elements, in points. |
A complete plotting theme.
Train models using familiar. Evaluation is not performed.
train_familiar( formula = NULL, data = NULL, experiment_data = NULL, cl = NULL, experimental_design = "fs+mb", learner = NULL, hyperparameter = NULL, verbose = TRUE, ... )
formula |
An R formula. The formula can only contain feature names and
dot ( Use of the formula interface is optional. |
data |
A
All data is expected to be in wide format, and ideally has a sample
identifier (see In case paths are provided, the data should be stored as |
experiment_data |
Experimental data may be provided in the form of |
cl |
Cluster created using the This parameter has no effect if the |
experimental_design |
(required) Defines what the experiment looks
like, e.g.
The different components are linked using Different subsampling methods can be used in conjunction with the basic workflow components:
As shown in the example above, sampling algorithms can be nested. The simplest valid experimental design is This argument is ignored if the |
learner |
(required) Name of the learner used to develop a model. A
sizeable number of learners is supported in |
hyperparameter |
(optional) List, or nested list containing
hyperparameters for learners. If a nested list is provided, each sublist
should have the name of the learner method it corresponds to, with list
elements being named after the intended hyperparameter, e.g.
All learners have hyperparameters. Please refer to the vignette on learners for more details. If no parameters are provided, sequential model-based optimisation is used to determine optimal hyperparameters. Hyperparameters provided by the user are never optimised. However, if more than one value is provided for a single hyperparameter, optimisation will be conducted using these values. |
verbose |
Indicates verbosity of the results. Default is TRUE, and all messages and warnings are returned. |
... |
Arguments passed on to
|
This is a thin wrapper around summon_familiar
, and functions like
it, but automatically skips all evaluation steps. Only a single learner is
allowed.
One or more familiarModel objects.
Updates the model directory path of a familiarEnsemble
object.
update_model_dir_path(object, dir_path, ...)
## S4 method for signature 'familiarEnsemble'
update_model_dir_path(object, dir_path)
## S4 method for signature 'ANY'
update_model_dir_path(object, dir_path)
object |
A |
dir_path |
Path to the directory where models are stored. |
... |
Unused arguments. |
Ensemble models created by familiar are often written to a directory on a local drive or network. In such cases, the actual models are detached, and paths to the models are stored instead. When the models are moved from their original location, they can no longer be found and attached to the ensemble. This method allows for pointing to the new directory containing the models.
A familiarEnsemble
object.
Provides backward compatibility for familiar objects exported to a file. This mitigates compatibility issues when working with files that become outdated as new versions of familiar are released, e.g. because slots have been removed.
update_object(object, ...)
## S4 method for signature 'familiarModel'
update_object(object, ...)
## S4 method for signature 'familiarEnsemble'
update_object(object, ...)
## S4 method for signature 'familiarData'
update_object(object, ...)
## S4 method for signature 'familiarCollection'
update_object(object, ...)
## S4 method for signature 'vimpTable'
update_object(object, ...)
## S4 method for signature 'familiarNoveltyDetector'
update_object(object, ...)
## S4 method for signature 'featureInfo'
update_object(object, ...)
## S4 method for signature 'featureInfoParametersTransformationPowerTransform'
update_object(object, ...)
## S4 method for signature 'experimentData'
update_object(object, ...)
## S4 method for signature 'list'
update_object(object, ...)
## S4 method for signature 'ANY'
update_object(object, ...)
object |
A |
... |
Unused arguments. |
An up-to-date version of the respective S4 object.
Calculate variance-covariance matrix for a model
vcov(object, ...)
## S4 method for signature 'familiarModel'
vcov(object, ...)
object |
a familiarModel object |
... |
additional arguments passed to |
This method extends the vcov
S3 method. For some models vcov
requires information that is trimmed from the model. In this case a copy of
the variance-covariance matrix is stored with the model, and returned.
Variance-covariance matrix of the model in the familiarModel object, if any.
A vimpTable object contains information concerning variable importance of one or more features. These objects are created during feature selection.
vimpTable objects exist in various states. These states are generally incremental, i.e. one cannot turn a declustered table into the initial version. Some methods, such as aggregation, internally do some state reshuffling.
This object replaces the ad-hoc lists with information that were used in versions prior to familiar 1.2.0.
vimp_table
Table containing features with corresponding scores.
vimp_method
Method used to compute variable importance scores for each feature.
run_table
Run table for the data from which variable importance was computed. Used internally.
score_aggregation
Method used to aggregate the score of contrasts for each categorical feature, if any.
encoding_table
Table used to relate categorical features to their contrasts, if any. Not used for all variable importance methods.
cluster_table
Table used to relate original features with features after clustering. Variable importance is determined after feature processing, which includes clustering.
invert
Determines whether increasing score corresponds to increasing
(FALSE
) or decreasing rank (TRUE
). Used internally to determine how
ranks should be formed.
project_id
Identifier of the project that generated the vimpTable object.
familiar_version
Version of the familiar package used to create this table.
state
State of the variable importance table. The object can have the following states:
initial
: initial state, directly after the variable importance table is
filled.
decoded
: depending on the variable importance method, the initial
variable importance table may contain the scores of individual contrasts
for categorical variables. When decoded, data in the encoding_table
attribute has been used to aggregate scores from all contrasts into a
single score for each feature.
declustered
: variable importance is determined from fully processed
features, which includes clustering. This means that a single feature in
the variable importance table may represent multiple original features.
When a variable importance table has been declustered, all clusters have
been turned into their constituent features.
reclustered
: When the table is reclustered, features are replaced by
their respective clusters. This state is used when updating the cluster
table to ensure that it fits the local context. This prevents issues when
attempting to aggregate or apply variable importance tables in data with
different feature preprocessing, and as a result, different clusters.
ranked
: The scores have been used to create ranks, with lower ranks
indicating better features.
aggregated
: Scores and ranks from multiple variable importance tables
were aggregated.
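The effect of the invert slot on the ranked state can be sketched in base R. The helper below is a hypothetical illustration, not the function familiar uses internally. It follows the description above: with invert set to FALSE, increasing score corresponds to increasing rank, and with invert set to TRUE, increasing score corresponds to decreasing rank. Lower ranks indicate better features. The tie handling shown (`ties.method = "min"`) is an assumption.

```r
# Hypothetical helper that converts scores to ranks.
rank_scores <- function(score, invert = FALSE) {
  # invert = FALSE: the lowest score receives rank 1.
  # invert = TRUE: the highest score receives rank 1.
  if (invert) score <- -score
  rank(score, ties.method = "min")
}

# Hypothetical variable importance scores.
scores <- c(feature_a = 0.8, feature_b = 0.1, feature_c = 0.5)

# Higher scores are better, e.g. for a score where larger means more important.
print(rank_scores(scores, invert = TRUE))

# Lower scores are better, e.g. for a p-value-like score.
print(rank_scores(scores, invert = FALSE))
```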
get_vimp_table
, aggregate_vimp_table
This function is functionally identical to the ggplot2::waiver()
function and
creates a waiver object. A waiver object is an otherwise empty object that
serves the same purpose as NULL
, i.e. as placeholder for a default value.
Because NULL
can sometimes be a valid input argument, it cannot
be used to signal that an internal default value should be used.
waiver()
waiver object
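The distinction between NULL and a waiver object can be illustrated with a small base R sketch. The constructor below mirrors the structure of the ggplot2 waiver (an empty list with class "waiver"). The names `make_waiver`, `is_waiver` and `resolve_title` are hypothetical and are not part of familiar.

```r
# Hypothetical constructor: an otherwise empty object with class "waiver".
make_waiver <- function() structure(list(), class = "waiver")

# Hypothetical test for waiver objects.
is_waiver <- function(x) inherits(x, "waiver")

# A function where NULL is a valid input (no title), so a waiver is needed
# to request the internal default.
resolve_title <- function(title = make_waiver()) {
  if (is_waiver(title)) {
    return("Default title")
  }
  # NULL and user-provided values are passed through unchanged.
  title
}

print(resolve_title())          # internal default is used
print(resolve_title(NULL))      # NULL is respected as "no title"
print(resolve_title("Custom"))  # user-provided value is kept
```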