Making Web Scraping Ethically Compliant and Less Intrusive
By Neil Emeigh, CEO & Founder — Rayobyte
More and more companies are diving into the web scraping space and selling proxies. In fact, Proxyway reveals that proxy traffic increased between 60% and 70% in 2021. This growth is easy to explain: digital marketers and researchers in data science need massive amounts of information, and web scraping is the most effective way for them to get it.
Ethical questions around web scraping center on residential proxies
Why does web scraping necessitate the sale of residential proxies? The internet is a boundless source of data, but web-scraping detection software often blocks anyone attempting to harvest a significant amount of information through one IP address.
To prevent this, providers enable web scrapers to conceal their online identities and funnel the information they want through intermediary devices called residential proxies. When our customers connect through proxies, they do their work under new IP addresses, making it less likely that sites will block them.
I’ve been using proxies for years. When I first entered the space, I ran into plenty of providers with poor uptime and even worse customer service. From the very beginning, I could see these providers were not interested in running an upstanding operation that was safe for customers, so I made it my goal to hold the industry to a higher standard.
As proxy providers, we have a massive impact on what this market looks like. Unfortunately, the unethical tactics of many providers have given our industry a bad reputation. Two concerns must be considered as we attempt to define ethical web scraping: first, we must promote the ethical acquisition of residential proxies; and second, we must promote the ethical usage of those proxies.
How to promote the ethical acquisition of residential proxies
Ethical acquisition primarily applies to residential proxies. These are the IP addresses that come from regular people’s laptops, cell phones, and other devices.
Residential IP addresses are valuable because they are virtually impossible to ban. The problem is that providers frequently trick users into signing away their IP addresses through long, complicated terms of service that give providers the right to use those addresses in any way they choose. Even worse, some providers use a script to take IPs without users ever knowing.
We have set the industry standard for obtaining residential proxies ethically, and we hope other providers will follow. First, we ensure that everyone in our residential proxy pool is aware that we are using their IP address by reminding them once every month. At any time, they can opt out with no questions asked. We make our presence less intrusive for participants by only employing devices that are connected to Wi-Fi, plugged in or over 50% power, and not in current use. Finally, we compensate the people in our residential proxy pool to ensure that everyone involved receives a benefit. A truly ethical approach means that no one is fooled or gives away something for free.
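The device conditions described above (on Wi-Fi, plugged in or over 50% power, not in current use) can be expressed as a simple eligibility check. The following is only an illustrative sketch, not Rayobyte's actual implementation: the `DeviceState` class, its field names, and the `is_eligible` function are hypothetical, and it assumes "over 50% power" means a battery level strictly above 50.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceState:
    """Hypothetical snapshot of a participant's device (field names are assumptions)."""
    on_wifi: bool
    plugged_in: bool
    battery_percent: int  # 0-100
    in_use: bool


def is_eligible(device: DeviceState) -> bool:
    """Return True only if the device meets all three conditions from the text."""
    # "Plugged in or over 50% power": either condition satisfies the power rule.
    has_power = device.plugged_in or device.battery_percent > 50
    # The device must also be on Wi-Fi and must not be in active use.
    return device.on_wifi and has_power and not device.in_use


if __name__ == "__main__":
    idle_phone = DeviceState(on_wifi=True, plugged_in=False, battery_percent=80, in_use=False)
    busy_laptop = DeviceState(on_wifi=True, plugged_in=True, battery_percent=100, in_use=True)
    print(is_eligible(idle_phone))
    print(is_eligible(busy_laptop))
```

Keeping the rule in one small function means a device that fails any single condition is simply skipped, so the owner never notices a cost in data, battery, or responsiveness.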
In this industry, finding IP sources is not difficult, but finding quality IP sources and obtaining them ethically is. After all, providers employ shady acquisition tactics because those tactics are easier. In the long run, however, I firmly believe it is better to go the extra mile and find users willing to let you use their IP as a proxy, keep them in the know, and compensate them fairly.
Promoting ethical usage of residential proxies
Ethical usage is all about maintaining control over how people use proxies. To the average person, proxies bring hackers and botnets to mind.
The truth is that web scraping boils down to research: it is the same as looking at websites, social media pages, and search engine listings. The only catch is that, instead of a person viewing pages one at a time, a bot scales the process and shares the relevant data. All search engines are built on scraping, and it is hard to find an e-commerce company that is not benefiting from this method of data collection.
We need to change the public’s perception of web scraping, and that begins with prohibiting users from sketchy activities. We protect our residential proxy pool by refusing to sell their IPs outright and stringently vetting the companies that will be using them. Each of our customers must demo its product and pass vetting to show it is a legitimate business.
We also employ rigorous automated and manual checks to confirm no one is accessing private data in shady ways with our proxies. First, we define the use cases under which users can access proxies, and we employ proprietary monitoring software to shut down use cases that do not comply with those standards. In addition, our 24/7 technical team performs manual spot checks to keep tabs on risky behavior. Finally, we set preventative measures, such as restricting our customers’ accounts to the domains they said they would access.
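The last preventative measure, restricting an account to the domains the customer declared, can be sketched as an allowlist check on each outgoing request. This is a minimal illustration under stated assumptions, not our proprietary monitoring software: the function name `is_request_allowed` and the `approved_domains` parameter are hypothetical, and treating subdomains of an approved domain as allowed is an assumption made for the example.

```python
from urllib.parse import urlsplit


def is_request_allowed(url: str, approved_domains: set[str]) -> bool:
    """Return True if the URL's host is an approved domain or one of its subdomains."""
    host = urlsplit(url).hostname  # None when the URL has no scheme/host part
    if host is None:
        return False
    host = host.lower().rstrip(".")
    for domain in approved_domains:
        domain = domain.lower()
        # Exact match, or a subdomain such as "shop.example.com" (assumed policy).
        if host == domain or host.endswith("." + domain):
            return True
    return False


if __name__ == "__main__":
    approved = {"example.com"}  # domains the customer said they would access
    print(is_request_allowed("https://shop.example.com/item?id=1", approved))
    print(is_request_allowed("https://example.com.evil.net/", approved))
```

Matching on the parsed hostname rather than on the raw URL string avoids look-alike hosts such as `example.com.evil.net` slipping through a naive substring test.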
We believe in ethical data collection. We also believe that proxies are a valuable part of our world’s tech infrastructure. They let individuals reclaim privacy and security online, and they allow companies of any size to take advantage of big data. Most proxy providers set limits to ensure their scraping is legal, but we need a movement toward obtaining and using residential proxies ethically throughout the entire process.