| ... | @@ -22,8 +22,9 @@ Options: |
... | @@ -22,8 +22,9 @@ Options: |
|
|
-o, --outlier FLOAT outlier threshold in standard deviations.
|
|
|
--help Show this message and exit.
|
|
|
```
|
|
|
|
## Options further explanation
|
|
|
|
|
|
|
|
**outlier threshold**
|
|
### -o, --outlier, **Outlier Threshold**
|
|
|
|
|
|
|
|
This input field relates to outlier detection for the numerical variables of the incoming dataset. To detect outliers in a numerical variable, the Data Quality Control tool first calculates the **mean** and the **standard deviation** from the valid values of that column, and then calculates the upper and lower limits with the formula:
|
|
|
|
|
|
| ... | @@ -64,6 +65,18 @@ Options: |
... | @@ -64,6 +65,18 @@ Options: |
|
|
--help Show this message and exit.
|
|
|
```
|
|
|
|
|
|
|
|
|
## Options further explanation
|
|
|
|
|
|
|
|
### --max_levels
|
|
|
|
|
|
|
|
In the infer option section, we give the number of rows that the tool will use for the schema inference. We also declare the maximum number of categories that a `nominal` MIPType variable can have.
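As a rough illustration of how a row sample and a category limit could interact, here is a minimal sketch. It is not the tool's actual inference code: the function name `infer_column_type`, its parameters, and the `"numerical"` and `"text"` labels are assumptions made for this example. Only the `nominal` type and the idea of a maximum number of categories come from the text above.

```python
def infer_column_type(values, sample_rows=100, max_levels=10):
    """Guess a column type from the first `sample_rows` values (illustrative only)."""
    # Only the first `sample_rows` rows take part in the inference;
    # empty cells are ignored.
    sample = [v for v in values[:sample_rows] if v not in ("", None)]
    if not sample:
        return "text"

    # If every sampled value parses as a number, treat the column as numerical.
    try:
        for v in sample:
            float(v)
        return "numerical"
    except (TypeError, ValueError):
        pass

    # Assumption: a non-numerical column with at most `max_levels`
    # distinct values is treated as nominal, otherwise as free text.
    if len(set(sample)) <= max_levels:
        return "nominal"
    return "text"


colours = ["red", "green", "red", "blue"]
print(infer_column_type(colours, sample_rows=4, max_levels=3))  # nominal
```

Under this assumption, a smaller row sample makes inference faster but can hide categories that only appear later in the file, so a column may be labelled `nominal` from the sample even though the full data exceeds the category limit.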
|
|
|
|
|
|
|
|
### --cde_file, --threshold, **CDE Dictionary support**
|
|
|
|
|
|
|
|
If we choose the Data Catalogue's Excel file as the output, the tool can suggest CDE variables for each column of the incoming dataset. This option is available only when a CDE dictionary is provided. The dictionary is an Excel file that contains information about all the CDE variables that are or will be included in the MIP (it will be available in the Data Catalogue in the near future).
|
|
|
|
|
|
|
|
The tool calculates a similarity measure for each column based on column name similarity (80%) and value range similarity (20%). The similarity measure takes values between 0 and 1. With the **similarity threshold** option we can define the minimum similarity between an incoming column and a CDE variable that must be met for the tool to suggest that CDE variable as a possible correspondence. The tool stores the CDE suggestions in the Excel file in the column named **CDE** and stores the corresponding concept path in the column **conceptPath**.
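The weighting and threshold described above can be sketched as follows. This is a hedged illustration, not the tool's implementation: the source does not say how name similarity or value range similarity is computed, so this sketch assumes `difflib.SequenceMatcher` for names and an interval-overlap ratio for ranges. The function names, the dictionary keys, and the sample CDE entries are all hypothetical.

```python
from difflib import SequenceMatcher

NAME_WEIGHT = 0.8   # weight of column name similarity (from the text above)
RANGE_WEIGHT = 0.2  # weight of value range similarity (from the text above)


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def range_similarity(r1, r2):
    """Overlap of two (low, high) intervals divided by their combined span (assumed metric)."""
    overlap = min(r1[1], r2[1]) - max(r1[0], r2[0])
    span = max(r1[1], r2[1]) - min(r1[0], r2[0])
    if span <= 0:
        # Both intervals are single points: similar only if identical.
        return 1.0 if tuple(r1) == tuple(r2) else 0.0
    return max(overlap, 0.0) / span


def suggest_cde(column, cdes, threshold):
    """Return (code, conceptPath, score) of the best CDE at or above `threshold`, else None."""
    best = None
    for cde in cdes:
        score = (NAME_WEIGHT * name_similarity(column["name"], cde["code"])
                 + RANGE_WEIGHT * range_similarity(column["range"], cde["range"]))
        if score >= threshold and (best is None or score > best[2]):
            best = (cde["code"], cde["conceptPath"], score)
    return best


# Hypothetical dictionary entries, for illustration only.
cdes = [
    {"code": "age_value", "range": (0, 120), "conceptPath": "/root/demographics/age_value"},
    {"code": "gender", "range": (0, 1), "conceptPath": "/root/demographics/gender"},
]
column = {"name": "age", "range": (0, 100)}
print(suggest_cde(column, cdes, threshold=0.5))
```

Because both component similarities lie between 0 and 1 and the weights sum to 1, the combined score also stays between 0 and 1. Raising the threshold yields fewer but more reliable suggestions; with a high enough threshold the sketch returns `None` and no CDE would be suggested for that column.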
|
|
|
|
|
|
|
## Supported schema formats
|
|
|
|
|
|
|
|
The schema can be saved in two formats:
|
| ... | @@ -71,9 +84,10 @@ The schema could be saved in two formats: |
... | @@ -71,9 +84,10 @@ The schema could be saved in two formats: |
|
|
1. Frictionless spec JSON
|
|
|
2. Data Catalogue's spec Excel (xlsx) file, which can be used for creating a new CDE pathology version.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# DICOM MRI metadata validation
|
|
|
|
|
|
| ... | | ... | |