nf-core/drugresponseeval
Pipeline for testing drug response prediction models in a statistically and biologically sound way.
Define the models and baselines to be tested.
Model to be tested.
string
NaiveDrugMeanPredictor
Model to be tested. See the documentation for a list of pre-implemented models. Can be multiple models separated by ','.
Baselines to be tested.
string
NaiveMeanEffectsPredictor
Baselines to be tested. See the documentation for a list of available models. Randomization and robustness tests are not run for baselines. The NaiveMeanEffectsPredictor is always included.
Define where the pipeline should find input data and save output data.
Run name for the pipeline. The subdirectory in results will be named like this.
string
my_run
You will need to set a run identifier for the pipeline. This is used to create a unique output directory for each run.
Name of the dataset. Pre-supplied datasets are CTRPv2, CTRPv1, CCLE, GDSC1, GDSC2, TOYv1, TOYv2.
string
CTRPv2
Name of the dataset used for the pipeline. This can be either one of the provided datasets ('GDSC1', 'GDSC2', 'CCLE', 'CTRPv1', 'CTRPv2', 'TOYv1', 'TOYv2'), in which case the dataset with the fitted curves is downloaded, or a custom dataset name pointing either to raw viability measurements for automatic curve fitting or to pre-fit data (see the 'no_refitting' option; not recommended, because differences in fitting procedures can make datasets less comparable).
The output directory where the results will be saved. Default is results/
string
results
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
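As a rough check of which addresses the pattern above accepts, the following Python sketch applies it with the standard `re` module. It assumes Python's regex dialect treats the expression the same way the schema validator does; the helper name `is_accepted_email` is hypothetical.

```python
import re

# Pattern copied from the parameter description above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_accepted_email(address: str) -> bool:
    """Hypothetical helper: True if the address satisfies the pattern."""
    return EMAIL_PATTERN.match(address) is not None


print(is_accepted_email("jane.doe@example.org"))     # ordinary address
print(is_accepted_email("jane.doe@example.museum"))  # 6-letter final label
print(is_accepted_email("jane+tag@example.org"))     # '+' is not in the allowed set
```

The pattern limits the final domain label to 2 to 5 letters and does not allow '+' before the '@', so some otherwise valid addresses are rejected.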
Define the mode in which the pipeline will be run.
Test mode(s) in which to run the pipeline: LPO (Leave-random-Pairs-Out), LCO (Leave-Cell-line-Out), LTO (Leave-Tissue-Out), or LDO (Leave-Drug-Out).
string
LCO
^((LPO|LCO|LTO|LDO)?,?)*(?<!,)$
Which tests to run (LPO=Leave-random-Pairs-Out, LCO=Leave-Cell-line-Out, LTO=Leave-Tissue-Out, LDO=Leave-Drug-Out). Can be a list of test runs e.g. 'LPO,LCO,LTO,LDO' to run all tests. Default is LCO.
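The test-mode pattern can be tried out before launching a run. This is a minimal Python sketch, assuming Python's `re` module interprets the expression like the schema validator does; the helper name `is_valid_test_mode` is hypothetical.

```python
import re

# Pattern copied verbatim from the parameter description above.
TEST_MODE_PATTERN = re.compile(r"^((LPO|LCO|LTO|LDO)?,?)*(?<!,)$")


def is_valid_test_mode(value: str) -> bool:
    """Hypothetical helper: True if the value satisfies the pattern."""
    return TEST_MODE_PATTERN.match(value) is not None


print(is_valid_test_mode("LCO"))              # single mode
print(is_valid_test_mode("LPO,LCO,LTO,LDO"))  # all four modes
print(is_valid_test_mode("LCO,"))             # trailing comma
print(is_valid_test_mode("LXO"))              # unknown mode
```

The negative lookbehind at the end of the pattern is what rejects a value that ends in a comma.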
Options for randomization.
Randomization mode for the pipeline.
string
None
^(None|(?:SVR[CD]|SVC[CD])(,(?:SVR[CD]|SVC[CD]))*)$
Which randomization tests to run in addition to the normal run. Default is None, which means no randomization tests are run. Modes: SVCC, SVRC, SVCD, SVRD. Can be a list of randomization tests, e.g. 'SVCC,SVCD' to run two tests. SVCC (Single View Constant for Cell Lines): one experiment is done for every cell line view the model uses (e.g. gene expression, mutation, ...); for each experiment, one cell line view is held constant while the others are randomized. SVRC (Single View Random for Cell Lines): one experiment is done for every cell line view the model uses; for each experiment, one cell line view is randomized while the others are held constant. SVCD and SVRD apply the same two schemes to the drug views.
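To illustrate how a value for this parameter could be checked and split into individual modes, here is a small Python sketch. It assumes Python's `re` module matches the schema pattern the same way the validator does; `parse_randomization_modes` is a hypothetical helper, not part of the pipeline.

```python
import re

# Pattern copied from the parameter description above.
RANDOMIZATION_PATTERN = re.compile(
    r"^(None|(?:SVR[CD]|SVC[CD])(,(?:SVR[CD]|SVC[CD]))*)$"
)


def parse_randomization_modes(value: str) -> list[str]:
    """Hypothetical helper: validate the value and return the requested modes."""
    if RANDOMIZATION_PATTERN.match(value) is None:
        raise ValueError(f"invalid randomization mode: {value!r}")
    # 'None' means that no randomization tests are requested.
    return [] if value == "None" else value.split(",")


print(parse_randomization_modes("None"))
print(parse_randomization_modes("SVCC,SVRD"))
```

A misspelled mode or a trailing comma raises `ValueError` in this sketch, because the pattern only admits the four mode names separated by single commas.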
Randomization type for the pipeline.
string
Type of randomization to use. Choose from "permutation" or "invariant". Default is "permutation".
Options for robustness.
Number of trials to run for the robustness test
integer
Number of trials to run for the robustness test. Default is 0, which means no robustness test is run. In the robustness test, the model is trained repeatedly with different random seeds to see how stable it is.
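The idea of repeating training with different seeds can be sketched in Python as follows. This is only an illustration under stated assumptions: `toy_train_and_score` is a stand-in that returns a made-up score, and using the trial index as the seed is an assumption, since the seeding scheme is not described here.

```python
import random
import statistics


def toy_train_and_score(seed: int) -> float:
    """Stand-in for training a model with a given seed; returns a fake score."""
    rng = random.Random(seed)
    return 1.0 + rng.gauss(0.0, 0.05)


def run_robustness_trials(train_and_score, n_trials: int) -> list[float]:
    """Run one training per trial, each with a different seed (assumed: 0..n-1)."""
    return [train_and_score(seed) for seed in range(n_trials)]


scores = run_robustness_trials(toy_train_and_score, n_trials=5)
if len(scores) > 1:
    # A small spread across seeds suggests a stable model.
    print(f"mean={statistics.mean(scores):.3f}, stdev={statistics.stdev(scores):.3f}")
```

With `n_trials=0` the list of scores is empty, which mirrors the default of running no robustness test.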
Options for data input.
Path to the data directory.
string
data
Path to the data directory. The downloaded data will be exported here. If you supply custom data, it goes here, too.
The name of the drug response measure to use.
string
Column of the response dataset in which the drug response is stored.
Datasets for cross-study prediction.
string
^(?:|(?:GDSC[12]|CCLE|CTRPv[12]|TOYv[12])(,(?:GDSC[12]|CCLE|CTRPv[12]|TOYv[12]))*)$
List of datasets to use to evaluate predictions across studies. Can be a combination like 'CTRPv1,CCLE'. Default is empty string which means no cross-study datasets are used.
Link to the latest Zenodo version of the dataset.
string
https://zenodo.org/records/15533857/files/
^https://zenodo.org/records/[0-9]+/files/$
Link to the Zenodo dataset from where pre-supplied datasets like CTRPv2 are downloaded.
Additional options for the pipeline.
Disable refitting of the response curves. False by default (= refitting enabled): measures calculated with CurveCurator are used instead of the original measures reported by the authors for the available datasets, and custom raw viability data is fitted automatically with CurveCurator. Set this flag to disable refitting.
boolean
By default, measures calculated by CurveCurator (by re-fitting the response curves, see 'measure' option for details) are used for available datasets, which allows better comparability between datasets. When providing a custom dataset (see 'dataset_name' option), we expect a csv-formatted file at <path_data>/<dataset_name>/<dataset_name>_raw.csv (also see 'path_data' option) containing the raw response data. We fit the curves by default with CurveCurator to provide fair comparison to our other available datasets. The fitted data will then be stored at <path_data>/<dataset_name>/<dataset_name>.csv. If you want to disable this option, set the flag.
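The file locations described above can be derived from the two option values. The following Python sketch shows one way to build them; the function name `custom_dataset_paths` and the dataset name `MyDataset` are hypothetical, and `data` is used because it is the default of the 'path_data' option.

```python
from pathlib import PurePosixPath


def custom_dataset_paths(path_data: str, dataset_name: str) -> dict:
    """Hypothetical helper: expected locations of the raw and fitted CSV files."""
    base = PurePosixPath(path_data) / dataset_name
    return {
        # Input expected for a custom dataset: raw response data.
        "raw": base / f"{dataset_name}_raw.csv",
        # Where the fitted data is stored after curve fitting.
        "fitted": base / f"{dataset_name}.csv",
    }


paths = custom_dataset_paths("data", "MyDataset")
print(paths["raw"])
print(paths["fitted"])
```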
Optimization metric for the pipeline.
string
Optimization metric for the pipeline. All models will minimize (MSE, RMSE, MAE)/maximize (R^2, Pearson, Spearman, Kendall) this metric calculated on the validation set. Default is RMSE.
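Whether a metric is minimized or maximized determines which validation score counts as best. A minimal Python sketch of that rule is shown below; the helper `select_best` and the configuration names are hypothetical, and the metric strings are spelled as in the text above, which may differ from the values the pipeline accepts.

```python
# Direction of each metric, as listed in the description above.
MINIMIZE = {"MSE", "RMSE", "MAE"}
MAXIMIZE = {"R^2", "Pearson", "Spearman", "Kendall"}


def select_best(metric: str, validation_scores: dict) -> str:
    """Hypothetical helper: name of the configuration with the best score."""
    if metric in MINIMIZE:
        return min(validation_scores, key=validation_scores.get)
    if metric in MAXIMIZE:
        return max(validation_scores, key=validation_scores.get)
    raise ValueError(f"unknown metric: {metric!r}")


# Made-up validation scores for two hyperparameter configurations.
print(select_best("RMSE", {"config_a": 1.2, "config_b": 0.9}))
print(select_best("Pearson", {"config_a": 0.8, "config_b": 0.5}))
```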
Number of cross-validation splits.
integer
10
Number of cross-validation splits. Default is 10.
Response transformation
string
Transformation to apply to the response variable. Possible values: None, standard, minmax, robust.
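The option names suggest the conventional scaling schemes. The Python sketch below assumes those conventional meanings (z-score for standard, rescaling to [0, 1] for minmax, median and interquartile range for robust); how the pipeline implements them is not stated here, and `transform_response` is a hypothetical helper.

```python
import statistics


def transform_response(values, method=None):
    """Hypothetical helper; assumes the values are not all identical."""
    if method is None or method == "None":
        return list(values)
    if method == "standard":
        # Subtract the mean and divide by the population standard deviation.
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        return [(v - mean) / std for v in values]
    if method == "minmax":
        # Rescale so that the smallest value is 0 and the largest is 1.
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    if method == "robust":
        # Subtract the median and divide by the interquartile range.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        return [(v - median) / (q3 - q1) for v in values]
    raise ValueError(f"unknown transformation: {method!r}")


responses = [1, 2, 3, 4, 5]
print(transform_response(responses, "minmax"))
print(transform_response(responses, "robust"))
```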
Model checkpoint directory
string
TEMPORARY
Directory to save model checkpoints.
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
Less common options for the pipeline, typically set in a config file.
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.
Send plain-text email instead of HTML.
boolean
Do not use coloured log outputs.
boolean
Incoming hook URL for messaging service
string
Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.
Whether to validate parameters against the schema at runtime.
boolean
true
Base URL or local path to location of pipeline test dataset files
string
https://raw.githubusercontent.com/nf-core/test-datasets/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
string
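For reference, the date pattern yyyy-MM-dd_HH-mm-ss corresponds to the `strftime` format `%Y-%m-%d_%H-%M-%S` in Python. The sketch below produces a suffix of that shape; the function name `trace_report_suffix` is hypothetical.

```python
from datetime import datetime


def trace_report_suffix(now: datetime) -> str:
    """Hypothetical helper: format a timestamp as yyyy-MM-dd_HH-mm-ss."""
    return now.strftime("%Y-%m-%d_%H-%M-%S")


print(trace_report_suffix(datetime(2024, 1, 2, 3, 4, 5)))  # 2024-01-02_03-04-05
```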