Regression…

Regression is not a topic that I originally spent much time thinking about, but as I became more involved with analyzing large datasets, I found myself learning more about it so I could extract the most out of a dataset. I have noticed one thing as I have worked with data scientists from different backgrounds.
Those who come from a statistics or math background are very concerned with coefficient estimation, while people from an applied science background concentrate mainly on overall model quality. Part of this might be due to how the two groups use the model after fitting: those with a more theoretical background are interested in calculating probabilities and odds from the model, and those with an applied background are interested in reproducibility and forecasting.
Here are some important skills when performing linear regression that I learned from the theoretical scientists:
- Incorporating data types into models
  * Continuous
  * Binary
  * Categorical
  * Non-linear
- Evaluate Model with Residuals
  * Normality Assumption Check - Plot Y estimated versus Y known, and check the residuals themselves with a histogram or normal Q-Q plot
  * Independence of Residuals - Plot Residuals versus Y estimated
  * Verify Constant Variance (Homoscedasticity) - Plot Residuals versus Y estimated
  * Check for Multicollinearity - Calculate the Variance Inflation Factor (VIF)
- Feature Diagnostics
  * Forward Selection
  * Backward Selection
  * Stepwise Regression
  * Use Subject Matter Expert (SME) Input to find important factors
- Once each feature shows significance, perform additional analysis
  * Calculate DFBETAS, DFFITS, Cook’s D
  * Calculate R^2, Mean Absolute Error (MAE), Mean Squared Error (MSE)
  * Plot Results
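Several of the items in this list can be sketched in a few lines of NumPy. The example below is only an illustration under stated assumptions: the data are synthetic, and the variable names (`temp`, `treated`, `region`) are made up for the example. It encodes continuous, binary, categorical, and non-linear terms into one design matrix, fits ordinary least squares, and then computes the fit metrics, the VIF for each feature, and Cook's D for each observation. Statistics libraries such as statsmodels offer these diagnostics ready-made; they are written out here to show what each quantity is.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic inputs (names and values are hypothetical, for illustration only).
temp = rng.normal(20.0, 5.0, n)                      # continuous
treated = rng.integers(0, 2, n)                      # binary, coded 0/1
region = rng.choice(["north", "south", "west"], n)   # categorical, 3 levels

# Categorical -> dummy columns. One level ("north") is dropped as the
# reference so the dummies are not collinear with the intercept.
dummies = np.column_stack([(region == lv).astype(float) for lv in ["south", "west"]])

# Non-linear effect -> add a squared term (centered to limit collinearity).
temp_c = temp - temp.mean()
X = np.column_stack([np.ones(n), temp_c, temp_c**2, treated, dummies])
names = ["intercept", "temp", "temp^2", "treated", "south", "west"]

# Synthetic response with known coefficients plus noise.
y = (3.0 + 0.8 * temp_c - 0.05 * temp_c**2 + 2.0 * treated
     + 1.5 * (region == "south") + rng.normal(0.0, 1.0, n))

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
resid = y - y_hat          # residuals and y_hat are what the diagnostic plots use

# Overall fit metrics.
mse = np.mean(resid**2)
mae = np.mean(np.abs(resid))
r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other columns."""
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    r = X[:, j] - others @ coef
    r2_j = 1.0 - np.sum(r**2) / np.sum((X[:, j] - X[:, j].mean())**2)
    return 1.0 / (1.0 - r2_j)

# Skip column 0 (the intercept); it has no VIF.
vifs = {names[j]: vif(X, j) for j in range(1, X.shape[1])}

# Cook's D from the leverages (diagonal of the hat matrix).
p = X.shape[1]
h = np.diag(X @ np.linalg.pinv(X))
s2 = np.sum(resid**2) / (n - p)
cooks_d = resid**2 / (p * s2) * h / (1.0 - h)**2

print(dict(zip(names, np.round(beta, 3))))
print(f"R^2={r2:.3f}  MAE={mae:.3f}  MSE={mse:.3f}")
print({k: round(v, 2) for k, v in vifs.items()})
print("largest Cook's D at row", int(np.argmax(cooks_d)))
```

Plotting `resid` against `y_hat` gives the residual plot used for the constant variance check, and the rows with the largest `cooks_d` values are the ones worth inspecting by hand.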
In my engineering experience, we would also perform a sensitivity assessment. This could be done by running a large number of input combinations through the model or by constructing extreme case studies.
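Both approaches to the sensitivity assessment can be sketched as follows. This is a minimal illustration, not a standard recipe: the two inputs `x1` and `x2`, their ranges, and the fitted model are all synthetic assumptions. The first part pushes many sampled inputs through the fitted model and also varies one input at a time; the second part evaluates the model at every combination of minimum and maximum inputs.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical inputs and response, synthetic for illustration only.
X_raw = np.column_stack([rng.uniform(0, 10, n), rng.uniform(50, 100, n)])
y = 1.0 + 2.0 * X_raw[:, 0] - 0.1 * X_raw[:, 1] + rng.normal(0, 0.5, n)

def design(x):
    """Prepend an intercept column to a 2-D array of inputs."""
    x = np.atleast_2d(x)
    return np.column_stack([np.ones(len(x)), x])

beta, *_ = np.linalg.lstsq(design(X_raw), y, rcond=None)

def predict(x):
    """Predictions of the fitted linear model for one or more input rows."""
    return design(x) @ beta

lo, hi = X_raw.min(axis=0), X_raw.max(axis=0)

# Approach 1a: push many sampled inputs through the model and look at the spread.
samples = rng.uniform(lo, hi, size=(10_000, 2))
preds = predict(samples)

# Approach 1b: move one input across its observed range, others held at their mean.
base = X_raw.mean(axis=0)
effect = {}
for j, name in enumerate(["x1", "x2"]):
    low, high = base.copy(), base.copy()
    low[j], high[j] = lo[j], hi[j]
    effect[name] = float(predict(high)[0] - predict(low)[0])

# Approach 2: extreme case studies, every combination of min/max inputs.
corners = np.array(list(itertools.product(*zip(lo, hi))))
corner_preds = predict(corners)

print("sampled prediction range:", preds.min(), preds.max())
print("one-at-a-time effects:", effect)
for c, p in zip(corners, corner_preds):
    print("extreme case", c, "->", p)
```

For a purely linear model the extreme cases bound the sampled predictions, so the sampling adds little. It becomes more informative once the model includes non-linear or interaction terms.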