From the course: Data Cleaning in Python Essential Training
Challenge: ETL
(upbeat techno music) - [Instructor] So we have some traffic information: the IP of the incoming request, the time, the path on the server, such as /images, the status, which is the HTTP status code, and the size of the returned data. You need to create an ETL from this CSV file into an sqlite3 database. You should drop and report invalid rows. The IP should be a valid IP address; see the ipaddress module. The time must not be in the future. The path cannot be empty. The status must be a valid HTTP status code; see the http.HTTPStatus enumeration for the list of status codes. And the size cannot be negative or empty. At the end of the ETL, report the percentage of bad rows, and fail the ETL if more than 5% of the rows are bad.
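One way the rules in this challenge could be sketched in Python is shown here. This is not the instructor's solution. The transcript does not show the CSV, so the column names (ip, time, path, status, size), the ISO 8601 time format without a time zone, the traffic table name, and the helper names row_error and etl are all assumptions made for illustration.

```python
import csv
import sqlite3
from datetime import datetime
from http import HTTPStatus
from ipaddress import ip_address

# Assumed column names; the transcript does not show the CSV header.
COLUMNS = ("ip", "time", "path", "status", "size")
MAX_BAD_RATIO = 0.05  # fail the ETL above 5% bad rows


def row_error(row, now):
    """Return a short reason if the row breaks a rule, else None."""
    try:
        ip_address(row["ip"])
    except ValueError:
        return "invalid IP"
    try:
        # Assumes naive ISO 8601 timestamps, e.g. 2023-01-01T00:00:00.
        when = datetime.fromisoformat(row["time"])
    except ValueError:
        return "invalid time"
    if when > now:
        return "time in the future"
    if not row["path"]:
        return "empty path"
    try:
        HTTPStatus(int(row["status"]))
    except ValueError:
        return "invalid status"
    try:
        if int(row["size"]) < 0:
            return "negative size"
    except ValueError:
        return "invalid size"
    return None


def etl(csv_file, conn, now=None):
    """Load valid rows into a `traffic` table and return the bad-row ratio."""
    now = now or datetime.now()
    good, bad = [], 0
    # Line 1 is assumed to be the header, so data starts at line 2.
    for line_no, raw in enumerate(csv.DictReader(csv_file), start=2):
        # Treat missing fields as empty strings and trim whitespace.
        row = {key: (raw.get(key) or "").strip() for key in COLUMNS}
        reason = row_error(row, now)
        if reason:
            bad += 1
            print(f"line {line_no}: dropped ({reason})")
            continue
        good.append((row["ip"], row["time"], row["path"],
                     int(row["status"]), int(row["size"])))

    total = len(good) + bad
    ratio = bad / total if total else 0.0
    print(f"{ratio:.2%} bad rows ({bad} of {total})")
    if ratio > MAX_BAD_RATIO:
        raise ValueError(f"too many bad rows: {ratio:.2%}")

    # Write only after the threshold check; `with conn` commits on success.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS traffic "
            "(ip TEXT, time TEXT, path TEXT, status INTEGER, size INTEGER)"
        )
        conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?, ?)", good)
    return ratio
```

In this sketch all rows are validated before anything is written, so a run that exceeds the 5% threshold leaves the database untouched. It could be called with something like `etl(open("traffic.csv", newline=""), sqlite3.connect("traffic.db"))`, where both file names are placeholders.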