Connection Request Spam [LinkedIn Machine Learning Interview Question]


At LinkedIn, maintaining the integrity of people's professional networks is of utmost importance, so there is a model in place that identifies spammy connection requests.

Your team has spent the last two months prototyping a new model, which you hope performs better than the current one because it incorporates a new 3rd-party dataset/API that helps flag IP addresses and browser fingerprints as risky.

The vendor built its dataset/API by helping thousands of other companies with their spam and fraud issues, and has aggregated data about bad actors across the internet to create an almost global internet blacklist. However, using this 3rd-party IP address/browser fingerprinting data would cost $2M per year if LinkedIn puts the model into production.

How would you determine if it's worth buying access to this 3rd-party solution?
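
One way to frame the question is as a break-even calculation against the $2M annual fee. The sketch below is only an illustration of that framing, not a full answer: the function names, the per-request dollar values, and the volumes are all hypothetical placeholders that would have to come from an offline evaluation or an A/B test of the prototype against the current model.

```python
# Annual fee stated in the question.
ANNUAL_LICENSE_COST = 2_000_000


def annual_net_value(extra_spam_blocked, value_per_spam_blocked,
                     extra_legit_blocked, cost_per_legit_blocked,
                     license_cost=ANNUAL_LICENSE_COST):
    """Net yearly value of the new model relative to the current one.

    All inputs are hypothetical estimates:
      extra_spam_blocked     - additional spam requests caught per year
      value_per_spam_blocked - dollar value of blocking one spam request
      extra_legit_blocked    - additional legitimate requests wrongly blocked
      cost_per_legit_blocked - dollar cost of one wrongly blocked request
    """
    benefit = extra_spam_blocked * value_per_spam_blocked
    harm = extra_legit_blocked * cost_per_legit_blocked
    return benefit - harm - license_cost


def break_even_spam_blocked(value_per_spam_blocked, extra_legit_blocked,
                            cost_per_legit_blocked,
                            license_cost=ANNUAL_LICENSE_COST):
    """Extra spam requests per year needed for the purchase to pay for itself."""
    total_cost = license_cost + extra_legit_blocked * cost_per_legit_blocked
    return total_cost / value_per_spam_blocked


if __name__ == "__main__":
    # Placeholder inputs, chosen only to show the arithmetic.
    print(annual_net_value(10_000_000, 0.5, 100_000, 5.0))
    print(break_even_spam_blocked(0.5, 100_000, 5.0))
```

The hard part in practice is estimating the inputs, especially the dollar value of a blocked spam request and the cost of a false positive; the arithmetic only makes explicit which quantities the decision depends on.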