3-2. Data Science Methodology Week 2

2020. 5. 19. 22:59 | Self-Development/Coursera

Sorting data is not part of the Data Understanding stage.

 

Data preparation - Cleansing data 

Transforming data in the data preparation phase is the process of getting the data into a state where it may be easier to work with.

The Data Preparation stage is in fact the most time-consuming phase of a data science project.

The target variable was congestive heart failure (CHF) readmission within 30 days following discharge from CHF hospitalization.

 

 

# Week 2 - (1) Quiz Solutions

 

Q. The Data Understanding stage refers to the stage of removing redundant data.

-> False

 

Q. In the case study, working through the Data Preparation stage, it was revealed that the initial definition was not capturing all of the congestive heart failure admissions that were expected, based on clinical experience.

-> False

It was while working through the Data Understanding stage that the initial definition was found to be incomplete.

 

Q. The Data Preparation stage involves correcting invalid values and addressing outliers.

-> True

- The Data Preparation stage involves addressing missing values.

- The Data Preparation stage involves removing duplicate data.

- The Data Preparation stage involves properly formatting the data.
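The cleansing tasks listed above can be sketched in a few lines of Python. This is only an illustration, not the course's actual procedure: the records, the field names (`patient_id`, `age`, `sex`), the valid age range, and the choice of median imputation are all assumptions made for the example.

```python
from statistics import median

# Hypothetical patient records; every field name and value is made up.
raw = [
    {"patient_id": "p001", "age": "67", "sex": " F "},
    {"patient_id": "p002", "age": None, "sex": "m"},    # missing age
    {"patient_id": "p001", "age": "67", "sex": " F "},  # exact duplicate
    {"patient_id": "p003", "age": "-4", "sex": "M"},    # invalid age
    {"patient_id": "p004", "age": "81", "sex": "f"},
]


def prepare(records, min_age=0, max_age=120):
    """Deduplicate, format, flag invalid values, and fill missing ages."""
    # 1. Remove duplicate records, keeping the first occurrence.
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    cleaned = []
    for r in unique:
        # 2. Format: parse age as int, normalize whitespace and case.
        age = int(r["age"]) if r["age"] is not None else None
        # 3. Treat out-of-range ages as missing (assumed valid range).
        if age is not None and not (min_age <= age <= max_age):
            age = None
        cleaned.append({
            "patient_id": r["patient_id"].upper(),
            "age": age,
            "sex": r["sex"].strip().upper(),
        })

    # 4. Fill missing ages with the median of the valid ones (one simple option).
    fill = median(r["age"] for r in cleaned if r["age"] is not None)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned


clean = prepare(raw)
```

Whether to impute, drop, or flag a missing or invalid value depends on the problem; the median here is just a placeholder choice.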

 

Q. During the Data Preparation stage, clients and stakeholders aggregate the data and merge them from different sources, enabling data scientists to use clean data in the analysis.

-> False

- During the Data Preparation stage, data scientists define the variables to be used in the model.

- During the Data Preparation stage, data scientists determine the timing of events.

- During the Data Preparation stage, data scientists aggregate the data and merge them from different sources.

- During the Data Preparation stage, data scientists identify missing data.
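As a rough sketch of the points above, the following merges two hypothetical sources (a patient table and an admissions log), aggregates to one row per patient, and uses the timing of events to define a target variable. The data, the field names, and the helper `build_patient_table` are assumptions for illustration, not the case study's real schema.

```python
from collections import defaultdict
from datetime import date

# Two hypothetical sources; all names and values are made up.
patients = {"P1": {"age": 67}, "P2": {"age": 81}}
admissions = [
    {"patient_id": "P1", "admitted": date(2020, 1, 5), "discharged": date(2020, 1, 10)},
    {"patient_id": "P1", "admitted": date(2020, 1, 30), "discharged": date(2020, 2, 3)},
    {"patient_id": "P2", "admitted": date(2020, 3, 1), "discharged": date(2020, 3, 4)},
]


def build_patient_table(patients, admissions, window_days=30):
    """Merge both sources into one row per patient with derived variables."""
    # Group admissions by patient (the aggregation step).
    by_patient = defaultdict(list)
    for a in admissions:
        by_patient[a["patient_id"]].append(a)

    rows = []
    for pid, info in patients.items():
        stays = sorted(by_patient.get(pid, []), key=lambda a: a["admitted"])
        # Timing of events: was any stay followed by another admission
        # within `window_days` of its discharge?
        readmitted = any(
            0 <= (nxt["admitted"] - cur["discharged"]).days <= window_days
            for cur, nxt in zip(stays, stays[1:])
        )
        rows.append({
            "patient_id": pid,
            **info,  # merge in the patient-level attributes
            "n_admissions": len(stays),
            "readmitted_30d": readmitted,
        })
    return rows


table = build_patient_table(patients, admissions)
```

Here `P1` is readmitted 20 days after the first discharge, so `readmitted_30d` is `True` for `P1` and `False` for `P2`.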

 

Q. The Data Preparation stage is a very iterative and complicated stage that cannot be accelerated through automation.

-> False

 


 

# Modeling

ROC stands for receiver operating characteristic. The ROC curve was originally developed to detect enemy aircraft on radar.

-> The ROC curve is a useful diagnostic tool for determining the optimal classification model.

-> By plotting the true-positive rate against the false-positive rate for different values of the relative misclassification cost, the ROC curve can be used to select the optimal model.
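To make the true-positive rate and false-positive rate concrete, here is a minimal sketch that computes ROC points and the area under the curve by hand. Note the swap: it sweeps a score threshold, which is the more common way to trace the curve, rather than the relative misclassification cost mentioned above. The labels, the scores, and the function names `roc_points` and `auc` are made up for the example.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct score threshold."""
    pos = sum(y_true)          # number of actual positives
    neg = len(y_true) - pos    # number of actual negatives
    points = [(0.0, 0.0)]      # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        # Predict positive when the score is at least the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Hypothetical labels (1 = positive) and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
area = auc(points)
```

A curve closer to the top-left corner (area closer to 1) indicates a better classifier, while an area near 0.5 is no better than random guessing. Comparing curves this way is how candidate models can be ranked.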

 
