With the increasing amount of strong-motion data, Ground Motion Prediction Equation (GMPE) developers are able to quantify empirical site amplification functions (δS2S_s) from GMPE residuals for use in site-specific Probabilistic Seismic Hazard Assessment. In this study, we first derive a GMPE for 5% damped pseudo-spectral acceleration (in g) from active shallow crustal earthquakes in Japan with 3.4 ≤ M_w ≤ 7.3 and 0 ≤ R_JB ≤ 600 km. Using a k-means spectral clustering technique, we then classify our estimated δS2S_s (T = 0.01-2 s) at 588 well-characterized sites into 8 site clusters with distinct mean site amplification functions and a within-cluster site-to-site variability about 50% smaller than the overall dataset variability (φ_S2S). Following an evaluation of existing schemes, we propose a revised data-driven site classification characterized by kernel density distributions of V_S30, V_S10, H_800, and predominant period (T_G) of the site clusters.
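To illustrate why grouping sites by the shape of their amplification functions can reduce site-to-site variability, the following minimal sketch clusters synthetic site-term curves and compares the pooled standard deviation before and after clustering. It is an assumption-laden illustration, not the method of this study: it uses plain Lloyd's k-means rather than spectral clustering, and the three curve shapes, the five period samples, the noise level, and the helper names `kmeans` and `pooled_std` are all hypothetical.

```python
import math
import random


def kmeans(curves, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on equal-length curves (lists of floats)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(curves, k)]
    labels = [-1] * len(curves)
    for _ in range(n_iter):
        # Assignment step: nearest center in squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(c, centers[j])))
            for c in curves
        ]
        # Update step: each center becomes the mean curve of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, new_labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


def pooled_std(curves, refs):
    """Standard deviation of curve-minus-reference, pooled over sites and periods."""
    sq = [(a - r) ** 2 for c, ref in zip(curves, refs) for a, r in zip(c, ref)]
    return math.sqrt(sum(sq) / len(sq))


# Synthetic site terms (hypothetical values): 3 shapes x 20 sites, 5 periods each.
rng = random.Random(1)
shapes = [
    [0.4, 0.3, 0.0, -0.2, -0.3],   # amplification at short periods
    [-0.3, -0.1, 0.2, 0.4, 0.1],   # amplification at longer periods
    [0.0, 0.0, 0.0, 0.0, 0.0],     # flat, reference-like response
]
curves = [[v + rng.gauss(0.0, 0.1) for v in shape]
          for shape in shapes for _ in range(20)]

labels, centers = kmeans(curves, k=3)

# Variability about the overall mean curve versus about each cluster's mean curve.
overall_mean = [sum(col) / len(curves) for col in zip(*curves)]
phi_overall = pooled_std(curves, [overall_mean] * len(curves))
phi_within = pooled_std(curves, [centers[lab] for lab in labels])

print(f"overall: {phi_overall:.3f}  within-cluster: {phi_within:.3f}")
```

For any partition whose centers are the member means, the within-cluster sum of squares cannot exceed the total sum of squares about the overall mean, so the second number is never larger than the first. How much smaller it is depends on how distinct the cluster shapes are relative to the scatter around them.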