Database Design and Analysis

The following database project will create an educational attainment “demand” forecast for the state of California for years greater than 2010. The demand forecast is the expected number of population who have obtained a certain level of education. The population is divided into age groups and education attainment is divided into different levels. The population of each group is estimated for each year up to year 2050. Implement the following steps to obtain and educational demand forecast for the state of California. The files can be downloaded below.

Create a ca_pop schema in your MySQL database.
Using your ca_pop schema, create an educational_attainment table which columns match the columns in the Excel spreadsheet ca_pop_educational_attainment.csv.
Using your ca_pop schema, create a pop_proj table which columns match the columns in the Excel spreadsheet pop_proj_1970_2050.csv.
Using the data loading technique for a csv file, load the data in ca_pop_educational_attainment.csv into the table educational_attainment.
Using the data loading technique for a csv file you learned in Module 1, load the data in pop_proj_1970_2050.csv into the table pop_proj.
Write a query to select the total population in each age group.
Use the query from Step 6 as a subquery to find each type of education attained by the population in that age group and the fraction of the population of that age group that has that educational attainment. Label the fraction column output as coefficient. For instance, the fraction of the population in age group 00 – 17 who has an education attainment of Bachelor’s degree or higher is 0.0015, which is the coefficient.
Create a demographics table from the SQL query from Step 7.
Create a query on the pop_proj table which shows the population count by date_year and age.
Use that query from Step 9 as a subquery and join it to the demographics table using the following case statement:

demographics.age =
when temp_pop.age < 18 then ’00 to 17′
when temp_pop.age > 64 then ’65 to 80+’
else ’18 to 64′

“temp_pop” is an alias for the subquery. Use the following calculation for the demand output:

round(sum(temp_pop.total_pop * demographics.coefficient)) as demand

Output the demand grouped by year and education level.
Write each query you used in Steps 1 – 8 in a text file. If a query produced a result set, then list the first ten rows of each row set after the query. Bundle your lessons learned report, your queries and your query results text file, and your MySQL query explanations from both before and after adding table indexes into a single zip file and submit that zip file as your final portfolio.


  • pop_proj_1970_2050.csv




0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *