Recognizing whether you are asking an inferential question or a prediction question is critical to choosing a model and algorithm. Let’s build awareness of the difference.
Inference:
- Estimate the association between the response and key variables while adjusting for confounders and interactions.
- Usually only a few variables strongly affect the response (think of the Pareto principle).
- Estimating how each of these variables is associated with the response is the main goal of the model.
- Sensitivity analysis can check how robust those estimated associations are.
Prediction:
- Develop the model that best predicts the response.
- Use all available information, not just a few key variables (no variable is favored).
- Little focus on how individual variables are associated with the outcome.
- The goal is simply a model form with high accuracy and low error.
Another angle: when doing data analysis, imagine that the data come from a “data generating process”. Inference refers to learning about the structure of this process from the data, while prediction means being able to forecast new data that come from the process. The two can go hand in hand.
Here is an example. Say you want to estimate the rate at which house value increases with square footage. This is estimating how square footage is associated with house value, which is inference. Prediction, by contrast, would be calculating a house’s value from many variables (number of bedrooms, number of baths, square footage, etc.).
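The house example can be sketched on synthetic data. Everything here is an assumption for illustration: the hypothetical `make_houses` generator and its coefficients (150 per square foot, and so on), and the hand-rolled `ols` helper, which solves the least-squares normal equations instead of calling a statistics library. The square-footage-only fit answers the inference question; the held-out comparison answers the prediction question.

```python
import math
import random
import statistics


# Hypothetical housing data; the coefficients are invented for illustration:
#   value = 50,000 + 150*sqft + 10,000*bedrooms + 15,000*baths + noise
def make_houses(n, rng):
    rows, values = [], []
    for _ in range(n):
        sqft = rng.uniform(800, 3500)
        beds = rng.randint(1, 5)
        baths = rng.randint(1, 4)
        noise = rng.gauss(0, 20_000)
        rows.append([sqft, beds, baths])
        values.append(50_000 + 150 * sqft + 10_000 * beds + 15_000 * baths + noise)
    return rows, values


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def rmse(coefs, rows, y):
    """Root mean squared error of a linear model's predictions."""
    preds = [coefs[0] + sum(b * v for b, v in zip(coefs[1:], r)) for r in rows]
    return math.sqrt(statistics.fmean((p - t) ** 2 for p, t in zip(preds, y)))


rng = random.Random(7)
train_x, train_y = make_houses(1000, rng)
test_x, test_y = make_houses(1000, rng)

# Inference: how is house value associated with square footage?
sqft_only = ols([[r[0]] for r in train_x], train_y)
print(f"value increase per extra sq ft: {sqft_only[1]:.0f}")

# Prediction: use every variable and judge the model by held-out error.
all_vars = ols(train_x, train_y)
err_sqft = rmse(sqft_only, [[r[0]] for r in test_x], test_y)
err_all = rmse(all_vars, test_x, test_y)
print(f"held-out RMSE, sqft only: {err_sqft:,.0f}")
print(f"held-out RMSE, all variables: {err_all:,.0f}")
```

In this simulation the features are drawn independently, so the sqft-only slope is already a fair estimate of the per-square-foot rate; with correlated features it would need the confounder adjustment described in the inference bullets.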
Caution: with some models you can make sensible predictions, yet the model doesn’t lead to useful insights (e.g. black-box models). Sophisticated ensembling models can produce good predictions but are hard to explain or understand. At the opposite end are models like linear regression, where you know that increasing one variable by a certain amount changes the estimated response or outcome (e.g. house value) by a fixed amount, holding the other variables constant.