solution column in mc_manhattan_plot() when extended solutions data frame has no MC labels
weights matrix
merge.data_list()
as.list() for dist_fns_list, clust_fns_list, and data_list objects
generate_settings_matrix needed paste0
print.solutions_df() misprinted the number of observations in the solutions data frame
merge_dls() is superseded by merge.data_lists()
ext_solutions_df manipulation won't drop summary_features and features attributes
estimate_nclust_given_graph is more resilient to floating point errors through a tryCatch statement during eigengap quality assignment
estimate_nclust_given_graph is more resilient to floating point errors through a tryCatch loop updating eigenvalue scaling
dplyr_row_slice() functions for classes solutions_df and ext_solutions_df
extend_solutions()
extend_solutions was not assigning feature types properly during p-value calculations
rbind.ext_solutions_df now takes the ... parameter before the reset_indices parameter to avoid errors during calls with unnamed parameters.
rbind.solutions_df now takes the ... parameter before the reset_indices parameter to avoid errors during calls without named parameters.
snf_config object made weights matrix lose its class
(class list) -> (class data_list, list)
(class data.frame) -> solutions data frame (class solutions_df, data.frame)
(class data.frame) -> extended solutions data frame (class ext_solutions_df, data.frame)
(class data.frame) -> (class ext_solutions_df, data.frame)
(class list) -> distance functions list (class dist_fns_list, list)
(class list) -> clustering functions list (class clust_fns_list, list)
(class matrix, array) -> (class weights_matrix, matrix, array)
generate_data_list() -> data_list()
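The reordering of ... ahead of reset_indices in the rbind methods noted above follows from how R matches arguments: unnamed arguments fill the formals that precede ... by position, while formals after ... can only be set by name. The sketch below illustrates this with two hypothetical stand-in functions (bind_old and bind_new are not metasnf functions, and the earlier parameter order is assumed from the entry's wording).

```r
# Hypothetical stand-ins for illustration only; these are not the metasnf methods.

# Assumed old ordering: a formal placed ahead of `...` captures the first
# unnamed argument by position.
bind_old <- function(reset_indices = FALSE, ...) {
    list(reset_indices = reset_indices, n_inputs = length(list(...)))
}

# New ordering: `...` comes first, so every unnamed argument is collected as an
# input and reset_indices can only be supplied by name.
bind_new <- function(..., reset_indices = FALSE) {
    list(reset_indices = reset_indices, n_inputs = length(list(...)))
}

df1 <- data.frame(x = 1)
df2 <- data.frame(x = 2)

old <- bind_old(df1, df2)  # df1 is matched to reset_indices, only df2 reaches `...`
new <- bind_new(df1, df2)  # both data frames reach `...`
```

With the second ordering, a call such as bind_new(df1, df2, reset_indices = TRUE) still works, because the flag is passed by name.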
get_cluster_df(), get_clusters(), and get_cluster_solutions() are now all superseded by custom transposition of solutions_df class objects (i.e., simply call t())
generate_settings_matrix(), generate_distance_metrics_list(), generate_weights_matrix(), and generate_clust_algs_list() are now all superseded by the single function snf_config() and the snf_config class object it produces
split_vector, either by adjusted_rand_index_heatmap() or shiny_annotator(), solutions_df and ext_solutions_df class objects can be annotated with their meta cluster labels using the function label_meta_clusters(). This is necessary prior to usage of get_representative_solutions().
as.data.frame()
batch_snf no longer changes the output structure from a solutions data frame to a list of a solutions data frame and a similarity matrix list. Instead, the similarity matrix list is added to the solutions data frame as an attribute and can be extracted using the function sim_mats_list().
calculate_coclustering() function
print() functions have been defined for all major metasnf objects.
Last update before CRAN submission.
set.seed prior to generate_settings_matrix instead.
estimate_nclust_given_graph() occasionally yielded incorrect estimates of the number of clusters as a result of improper scaling in metasnf v0.7.0. The scaling should be corrected now.
mc_manhattan_plot() with a data list containing duplicate feature names
mc_manhattan_plot() parameter rep_solution replaced with more accurate name extended_solutions_matrix (solutions matrix with _pval columns)
SNFtool::estimateNumberOfClustersGivenGraph() could occasionally error out when calculating eigenvectors (eigengap heuristic) for a Laplacian with floating point values that were too small. The adapted function estimate_nclust_given_graph() slightly scales up the Laplacian to reduce the risk of encountering this error (presumably without any change to the resulting cluster number estimate)
get_matrix_order has arguments allowing users to control which distance metric and agglomerative hierarchical clustering methods are used to sort matrices
get_complete_uids quickly pulls UIDs of observations with complete data from a list of dataframes
extend_solutions doesn't crash on multi-feature target lists
generate_data_list()
remove_missing parameter for generate_data_list allowing subjects with incomplete data to remain in the data list
lp_solutions_matrix error message when training set is not subset of full data list
generate_data_list list elements now are named after their components
merge_data_lists functionality to horizontally merge data lists
extend_solutions() will no longer crash when a data_list has the UID column in non-first position.
generate_data_list() enforces the UID column to be in first position of each dataframe.
auto_plot() will automatically generate bar and/or jitter plots showing how features in a data_list/target_list are distributed across a single cluster solution
shiny_annotator() function can be used to identify indices of meta clusters within an adjusted_rand_index_heatmap
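On the Laplacian scaling mentioned above for estimate_nclust_given_graph(): multiplying a matrix by a positive constant multiplies every eigenvalue by that constant, so the position of the largest gap between sorted eigenvalues should not move. The base R sketch below checks this on a toy matrix. It is a simplified stand-in, not the package's implementation; the matrix values, the helper eigengap_index(), and scale_factor are all hypothetical.

```r
# Toy symmetric affinity matrix (illustrative values only): three tightly
# connected nodes and one weakly attached node.
W <- matrix(c(0, 1, 1,   0,
              1, 0, 1,   0,
              1, 1, 0,   0.1,
              0, 0, 0.1, 0), nrow = 4, byrow = TRUE)

# Unnormalized graph Laplacian: degree matrix minus affinity matrix
L <- diag(rowSums(W)) - W

# Hypothetical helper: position of the largest gap between sorted eigenvalues
eigengap_index <- function(m) {
    ev <- sort(eigen(m, symmetric = TRUE, only.values = TRUE)$values)
    which.max(diff(ev))
}

scale_factor <- 1e3  # hypothetical constant, chosen only for illustration

idx_original <- eigengap_index(L)
idx_scaled <- eigengap_index(scale_factor * L)
```

Both indices agree here, which is consistent with the entry's expectation that scaling leaves the cluster number estimate unchanged.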
adjusted_rand_index_heatmap() now has a split_vector parameter that will slice a heatmap into meta clusters
rename_dl() can be used to rename features in a data_list
manhattan_plot has been split into var_manhattan_plot (key variable - all variables), esm_manhattan_plot (cluster solutions in an extended solutions matrix to all variables), and mc_manhattan_plot (like esm_manhattan_plot, but at the meta-cluster level)
get_representative_solutions extracts max-ARI solutions from an extended solutions matrix based on a split_vector containing meta cluster boundaries
batch_nmi calculates NMI scores (see https://branchlab.github.io/metasnf/articles/nmi_scores.html)
extend_solutions will only calculate p-value summary measures (min/max/mean) for data_list passed in as a target_list parameter, but will also accept and calculate p-values for a data_list passed in through the data_list parameter
adjusted_rand_index_heatmap and assoc_pval_heatmap have updated parameters to improve ease of use and flexibility (including easier colour control)
get_clustered_subs has been removed (does the same thing as get_cluster_df)
get_cluster_pval deprecated for calc_assoc_pval
generate_data_list() and its corresponding functions
remove_signal has been renamed to linear_adjust to better reflect its function
summarize_distance_metrics_list has been shortened to summarize_dml
correlation_pval_heatmap has been renamed to assoc_pval_heatmap
calc_om_aris has been renamed to calc_aris
extend_solutions p-value calculation warnings are now suppressed
_pval instead of a mix of p_val, pval, and p.
pval_select, p_val_select, top_oms_per_cluster, check_subj_orders_for_lp, get_p, chi_sq_pval
pval_summaries, which would calculate min/max/mean p-values, has been replaced with summarize_pvals
train_test_assign now provides results as a named list of subject vectors instead of a data.frame. keep_split function has been removed accordingly.
sort_subjects parameter added to generate_data_list to allow for sorting of subjects in the data_list
extend_solutions can now also be parallelized (see ?extend_solutions)
remove_signal function has sig_digs parameter that can be used to restrict how many significant figures are returned in the resulting residuals
calc_om_aris is now MUCH faster after removing excessive calls to as.numeric and enabling parallel processing with future.apply. Thanks for the idea, Alper.
extend_solutions to better handle extreme p-values (e.g. infinity)
p_val_select with pval_select which can also return negative-log p-values
generate_data_list correctly errors when components are only partially named (resolves https://github.com/BRANCHlab/metasnf/issues/10)
lp_row function has been replaced by lp_solutions_matrix. The new function is order agnostic: full data lists can be constructed without any restriction on how training and testing set subjects are sorted. Subjects present in the provided solutions matrix to propagate are assumed to be the training subjects.
calc_om_aris now has progress parameter. When set to TRUE and used in conjunction with progressr::with_progress(), a progress bar is shown for the calculations. Learn more with ?calc_om_aris.
grepl instead of grep used in extend_solutions to reduce errors when no chi-squared warning occurs
keep_split will preserve observations that were assigned a split but were not present in the dataframe being split. Instead of being removed, those observations will have NA values.
fraction_clustered_together crashing when a cluster was assigned to only a single observation
fraction_clustered_together not running due to bracket typo when evaluating length of the data_list
correlation_pval_heatmap function can have significance stars disabled with significance_stars parameter
estimateNumberOfClustersGivenGraph has been used up to this point without specifying a parameter for NUMC. Consequently, final similarity matrices clustered with the default methods (spectral clustering based on eigen-gap or rotation cost heuristics) could not produce more than 5 clusters. The default functions have been updated to span 2 clusters to 10 clusters. Users will likely see different clustering results because of this change. To replicate the behaviour of default spectral clustering prior to v0.3.0, users should copy the following code prior to the batch_snf command:

    clust_algs_list <- generate_clust_algs_list(
        "spectral_eigen" = spectral_eigen_classic,
        "spectral_rot" = spectral_rot_classic
    )
    # Adapt below as necessary
    solutions_matrix <- batch_snf(
        data_list,
        settings_matrix,
        clust_algs_list = clust_algs_list
    )
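On the switch from grep to grepl in extend_solutions noted above: one plausible reason (an assumption, since the entry does not show the original code) is that grep() returns an empty integer vector when nothing matches, which cannot be used as a condition, whereas grepl() always returns one logical value per input. The base R sketch below shows the difference; warning_text and the search pattern are hypothetical.

```r
# Hypothetical message that contains no chi-squared warning
warning_text <- "some unrelated message"

hits_grep <- grep("Chi-squared", warning_text)    # integer(0) when nothing matches
hits_grepl <- grepl("Chi-squared", warning_text)  # FALSE when nothing matches

# Using the grep() result directly as a condition raises
# "argument is of length zero"; tryCatch() records that here.
grep_condition <- tryCatch(
    if (hits_grep) "matched" else "not matched",
    error = function(e) "error"
)

# The grepl() result is always a valid condition for a single input string.
grepl_condition <- if (hits_grepl) "matched" else "not matched"
```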
fisher_exact_pval function to avoid "FEXACT" error (like here https://github.com/Lagkouvardos/Rhea/issues/17). Impact on results is expected to be negligible.
remove_signal() enables correcting a data_list linearly for confounders / unwanted signal. Vignette is available: https://branchlab.github.io/metasnf/articles/confounders.html.
batch_snf() has new parameter automatic_standard_normalize to switch out the default numeric distance measures (euclidean) with standard normalized variants.
NEWS.md file to track changes to the package.