Clean data
Clean Data.
Script to clean data from dataset and create new excel sheet with data.
Can be ran from command line
python3 clean_data.py {input_file} {output_file}
Or import clean_data function
from clean_data import clean_data
clean_data(data_frame)
Clean data.
Takes data_frame object and cleans data by using 'applymap' to apply a lamda function to all DataFrame values
1st Lambda function converts values to strings, replaces instances of ND with empty string, and strips whitespace and quotation marks.
2nd Lambda function converts values that should be numeric back to numeric values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data_frame(object) |
Pandas DataFrame object |
required |
Returns:
| Type | Description |
|---|---|
object
|
clean_data_frame(object) Pandas DataFrame object |
Source code in report_generator/location_formatter/clean_data.py
44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 | |
create_data_frame(path_to_dataset)
Create data frame.
Loads excel file and returns Pandas DataFrame obj
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path_to_data_set(str) |
file path string |
required |
Returns:
| Name | Type | Description |
|---|---|---|
data_frame |
object
|
Pandas DataFrame object |
Source code in report_generator/location_formatter/clean_data.py
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 | |
main(input_file_name, output_file_name)
Clean data main.
Takes input_file_name and output_file_name. Loads data from input_file Cleans data Saves as output_file
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input_file_name |
required | ||
output_file_name |
required |
Source code in report_generator/location_formatter/clean_data.py
74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 | |