Robust Compensation Modeling Using Support Vector Regression (SVR)
Client
The client, an enterprise human resources department, needed a mathematical model to benchmark and predict fair salaries across their corporate hierarchy. With 10 defined organizational levels ranging from Business Analyst (Level 1) to CEO (Level 10), the objective was to establish a data-driven compensation curve. The model needed to account for the exponential leap in executive pay without allowing those extreme outliers to artificially inflate the salary bands of mid-level managers.
Challenge
Modeling this dataset presented three classic machine learning challenges:
Severe Non-Linearity: Salary growth was far from linear, with a 22.2x multiplier between the entry-level salary ($45,000) and the CEO salary ($1,000,000).
The Outlier Effect: Executive compensation at Levels 9 and 10 spiked dramatically. Standard linear models failed to capture this curve, while high-degree polynomial models overfit by memorizing the data, which let the $1M CEO salary distort the predicted curve for every other level.
Small Data Size: With only 10 distinct position levels available for training, high-variance algorithms would struggle to generalize.
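As a rough illustration of the non-linearity described above, the sketch below computes the pay multiplier and fits an ordinary least-squares line in plain Python. Only the $45,000 and $1,000,000 endpoints come from this case study; the eight intermediate salaries are assumed placeholder values chosen to mimic a steep executive curve, so the printed score is illustrative only.

```python
# Position levels 1..10. Only the first and last salaries are from the
# case study; the values in between are ASSUMED for illustration.
levels = list(range(1, 11))
salaries = [45_000, 50_000, 60_000, 80_000, 110_000,
            150_000, 200_000, 300_000, 500_000, 1_000_000]

# Ratio between the CEO salary and the entry-level salary.
multiplier = salaries[-1] / salaries[0]

# Ordinary least-squares line: salary = intercept + slope * level.
n = len(levels)
mean_x = sum(levels) / n
mean_y = sum(salaries) / n
sxx = sum((x - mean_x) ** 2 for x in levels)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(levels, salaries))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in levels]

# Coefficient of determination for the straight-line fit.
ss_res = sum((y - p) ** 2 for y, p in zip(salaries, predicted))
ss_tot = sum((y - mean_y) ** 2 for y in salaries)
r2 = 1 - ss_res / ss_tot

print(f"CEO / entry-level multiplier: {multiplier:.1f}x")
print(f"Linear fit R^2 on the assumed data: {r2:.4f}")
print(f"Linear prediction for Level 1: {predicted[0]:,.0f}")
```

With these assumed figures the straight line is pulled upward by the top two levels, so its prediction for Level 1 falls below zero, which is one concrete way a linear benchmark can become unusable.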
Goal
Workflow Overview:
Feature Scaling: Because SVR is sensitive to the magnitude of its inputs, I applied standard scaling to both the position levels and the salary figures so that the model worked with inputs and targets on comparable scales.
Kernel Evaluation: I tested several kernels, each of which implicitly maps the data into a higher-dimensional feature space where a linear function can fit it.
Linear Kernel: Underfitted the curve ($R^2 = 0.4538$).
Sigmoid Kernel: Failed to capture the underlying trend ($R^2 = 0.3874$).
Polynomial Kernel: Captured the curve better but risked volatility ($R^2 = 0.6454$).
RBF Selection & Optimization: I ultimately selected the Radial Basis Function (RBF) kernel, which models complex, non-linear relationships smoothly.
Strategic Outlier Suppression: I tuned the SVR hyperparameters specifically so that the model treated the Level 10 CEO salary as an anomaly.
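The workflow above can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the project's actual code. The intermediate salaries are assumed placeholder values (only the Level 1 and Level 10 figures appear in this write-up), and each `SVR` uses scikit-learn's default hyperparameters because the tuned values are not stated here. The scores it prints should therefore not be expected to match the figures reported above.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Levels 1..10 as a single-feature matrix. Intermediate salaries are
# ASSUMED for illustration; only the endpoints come from the case study.
levels = np.arange(1, 11, dtype=float).reshape(-1, 1)
salaries = np.array([45_000, 50_000, 60_000, 80_000, 110_000,
                     150_000, 200_000, 300_000, 500_000, 1_000_000],
                    dtype=float)

# Standard-scale both the feature and the target, as in the workflow.
x_scaler = StandardScaler()
y_scaler = StandardScaler()
X = x_scaler.fit_transform(levels)
y = y_scaler.fit_transform(salaries.reshape(-1, 1)).ravel()


def fit_and_score(kernel):
    """Fit an SVR with default hyperparameters and score it in dollars."""
    model = SVR(kernel=kernel)
    model.fit(X, y)
    pred_scaled = model.predict(X)
    # Undo the target scaling so predictions are back in salary units.
    pred = y_scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()
    return model, pred, r2_score(salaries, pred)


results = {k: fit_and_score(k) for k in ("linear", "sigmoid", "poly", "rbf")}

for kernel, (_, pred, r2) in results.items():
    print(f"{kernel:>8}: R^2 = {r2:.4f}, Level 10 prediction = {pred[-1]:,.0f}")
```

Scaling the target as well as the feature matters here because SVR's `epsilon` tolerance and `C` penalty are expressed in target units; on raw dollar amounts the default values would be far too small to be meaningful.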
Result
The implementation showed that a perfect statistical score isn't always the best business solution. The SVR model prioritized robustness over memorization:
Avoided Overfitting: While a degree-4 polynomial regression achieved a near-perfect $R^2$ of 0.9974, its curve was warped by the $1M outlier. My optimized SVR model deliberately discounted the CEO outlier to protect the integrity of the curve.
Stabilized the Mid-Market: By allowing the SVR algorithm to "miss" the CEO salary (predicting ~$400k instead of $1M), the model generated accurate, realistic, and smooth salary predictions for Levels 1 through 8.
Handled Small Data Well: The RBF kernel captured the complex, exponential-like trend using only 10 data points, showing that Support Vector Regression can be effective on small datasets.
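One way to check how strongly a single point steers a polynomial fit is to refit without it and compare. The sketch below does this for a degree-4 polynomial using NumPy. It is an illustrative diagnostic and not the analysis performed for the client, and the intermediate salaries are again assumed placeholder values.

```python
import numpy as np

# Intermediate salaries are ASSUMED for illustration; only the Level 1
# and Level 10 figures come from the case study.
levels = np.arange(1, 11, dtype=float)
salaries = np.array([45_000, 50_000, 60_000, 80_000, 110_000,
                     150_000, 200_000, 300_000, 500_000, 1_000_000],
                    dtype=float)


def r_squared(actual, predicted):
    """Coefficient of determination."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Degree-4 polynomial fitted to all ten levels.
coeffs_all = np.polyfit(levels, salaries, deg=4)
fit_all = np.polyval(coeffs_all, levels)
r2_all = r_squared(salaries, fit_all)

# Same model refitted with the CEO (Level 10) point left out.
coeffs_no_ceo = np.polyfit(levels[:-1], salaries[:-1], deg=4)
fit_no_ceo = np.polyval(coeffs_no_ceo, levels[:-1])

# How far each Level 1..9 fitted value moves because of the CEO point.
shift = fit_all[:-1] - fit_no_ceo

print(f"Degree-4 R^2 with all levels: {r2_all:.4f}")
for level, delta in zip(levels[:-1], shift):
    print(f"Level {int(level)}: fitted value shifts by {delta:,.0f}")
```

A high $R^2$ alongside large shifts at the lower levels would indicate that the score is being earned by chasing the outlier rather than by describing the bulk of the hierarchy.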



