Threat Intelligence Through Web Scraping

Bright Data CEO Or Lenchner discusses how security teams use public Web data collection networks to protect their organizations from digital risks.

December 19, 2022


Threat intelligence plays a key role in the safety and security of an organization’s online activity, and it is a determining factor in upholding the integrity of the organization’s internal infrastructure.

But to assess possible threats across the cybersecurity landscape at scale, security teams need data and, more importantly, public Web data.

Web data helps security operators better understand the vulnerabilities that may exist within their own systems, the threats that could originate within the networks of outside organizations, and the risks that could target their organization from elsewhere on the Web.

Public Web Data in Security Research, Intelligence, and Testing

In practice, security teams use public Web data to automate checks that detect possible malware, phishing links, various forms of fraud, information leaks, and counterfeiting schemes.
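The article does not describe how these checks are implemented. As a minimal sketch, assuming URLs have already been collected, the Python below applies a few widely used URL heuristics: IP-literal hosts, embedded credentials, punycode labels, deep subdomain nesting, missing TLS, and a watched brand name appearing on a host the brand does not own. The `WATCHED_BRANDS` list, the `examplebank` name, and the `url_red_flags` function are hypothetical, and the flags are signals for triage rather than verdicts.

```python
from __future__ import annotations

import ipaddress
from urllib.parse import urlsplit

# Hypothetical brand names to watch for on hosts the brand does not own.
WATCHED_BRANDS = ("examplebank",)


def url_red_flags(url: str) -> list[str]:
    """Return illustrative red flags for a URL; an empty list is not proof of safety."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    flags = []
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary hostnames
        flags.append("ip-literal-host")
    except ValueError:
        pass
    if "@" in parts.netloc:  # e.g. https://examplebank.com@evil.example/ really goes to evil.example
        flags.append("userinfo-in-url")
    if any(label.startswith("xn--") for label in host.split(".")):
        flags.append("punycode-label")
    if host.count(".") >= 4:
        flags.append("deep-subdomain-nesting")
    if parts.scheme != "https":
        flags.append("no-tls")
    for brand in WATCHED_BRANDS:
        owned = host == f"{brand}.com" or host.endswith(f".{brand}.com")
        if brand in host and not owned:
            flags.append(f"brand-in-foreign-host:{brand}")
    return flags


print(url_red_flags("http://user@192.168.0.1/login"))
# ['ip-literal-host', 'userinfo-in-url', 'no-tls']
```

Each check is cheap enough to run over every collected URL; anything flagged can then be routed to slower checks, such as actually visiting the page.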

Essentially, it gives security operators the visibility they need to detect, address, and prevent live threats or intrusions by mapping out the security gaps that attackers could exploit in various Web-based systems.

Web Data Collection Networks

To obtain this visibility, security firms and operators use public Web data collection networks to gather large amounts of information, or Web data, which they then use to identify, monitor, and assess threats in real time.

These Web data collection networks (including IP proxy networks) provide an isolated environment in which to work out how to keep digital risks from reaching the organization, without compromising the integrity of its internal infrastructure.

In order to accomplish this, security operators route requests through Web data collection networks to assess the risk of potentially malicious websites or URLs.

The responses return information, or Web data, describing how the domain reacted to each request. Security teams use that data to assess the threat (or lack of one) and take action to mitigate it before it ever reaches their Web-based systems.
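This request-and-response cycle can be sketched with Python’s standard library. The example assumes a generic HTTP(S) forward proxy at a hypothetical `PROXY_URL`; real collection networks have their own endpoints and authentication schemes, which are not shown. `build_proxied_opener` and `probe` are illustrative names, and the report records only a few reactions (status code, final URL after redirects, content type) that an analyst might examine.

```python
import urllib.error
import urllib.request

# Hypothetical forward-proxy endpoint; real collection networks supply their own
# hostnames, ports, and credentials.
PROXY_URL = "http://proxy.example.net:8080"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def probe(url: str, opener: urllib.request.OpenerDirector, timeout: float = 10.0) -> dict:
    """Request a suspect URL via the proxy and record how the server reacted."""
    report = {"requested": url, "status": None, "final_url": None,
              "content_type": None, "error": None}
    try:
        with opener.open(url, timeout=timeout) as resp:
            report["status"] = resp.status
            report["final_url"] = resp.geturl()  # differs from url after redirects
            report["content_type"] = resp.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers still describe the server's behavior.
        report["status"] = err.code
        report["final_url"] = err.geturl()
    except OSError as err:
        # URLError (DNS failure, refused connection) and timeouts end up here.
        report["error"] = str(err)
    return report
```

A call such as `probe("http://suspicious.example/", build_proxied_opener(PROXY_URL))` sends the request out through the proxy rather than directly from the analyst’s own address; how much isolation that actually provides depends on how the client and proxy are deployed.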

In effect, this method acts like a firewall that opens one-way access to information: because requests are routed away from the organization’s internal systems, its internal network stays protected.

Security Use Cases

Scraping for Possible Malware and Phishing Schemes Targeting US Banks

The security departments of some of the leading US banks use public Web data collection networks to gather information about possible online threat actors and examine malware.

Additionally, they use Web scraping techniques to continually and automatically scan the public Web for potentially malicious websites or links. For example, security teams can automatically identify phishing sites that attempt to steal sensitive client or company data, such as usernames, passwords, or credit card numbers.
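One heuristic sometimes applied to scraped pages is to look for login forms that collect a password but submit it to a different host than the one serving the page. The sketch below assumes the page HTML has already been fetched and uses only Python’s built-in `html.parser`; `CredentialFormFinder` and `offsite_credential_forms` are hypothetical names, and a real phishing detector would combine this with many other signals.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CredentialFormFinder(HTMLParser):
    """Record each form's action and whether it contains a password field."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: list[list] = []  # entries are [action, has_password]
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append([attrs.get("action") or "", False])
            self._in_form = True
        elif (tag == "input" and self._in_form
              and (attrs.get("type") or "").lower() == "password"):
            self.forms[-1][1] = True

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def offsite_credential_forms(page_url: str, html: str) -> list[str]:
    """Return absolute targets of password forms that submit to another host."""
    finder = CredentialFormFinder()
    finder.feed(html)
    page_host = urlsplit(page_url).hostname
    offsite = []
    for action, has_password in finder.forms:
        if not has_password:
            continue
        target = urljoin(page_url, action)  # resolves relative actions like "/login"
        if urlsplit(target).hostname != page_host:
            offsite.append(target)
    return offsite
```

Legitimate sites sometimes post logins to a separate authentication host, so a hit here is a reason to look closer rather than a conclusion.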

From then on, when an email arrives on the organization’s network or someone visits a website, the security team already knows the risk parameters attached to it.
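Knowing the risk parameters in advance implies a fast lookup step at delivery time. As a minimal sketch, assuming the scraping pipeline has already produced a host-to-verdict table (the `KNOWN_RISK` entries, `lookup`, and `triage_email` are all hypothetical), incoming email text can be triaged by extracting its URLs and checking each host, then each of its parent domains, against that table:

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

# Hypothetical verdicts produced ahead of time by the scraping pipeline.
KNOWN_RISK = {
    "examp1ebank-login.example.net": "phishing",
    "bad.example.org": "malware",
}

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def lookup(host: str) -> str:
    """Check the host, then each parent domain, against the risk table."""
    labels = host.split(".")
    for i in range(len(labels) - 1):  # stop before the bare top-level domain
        candidate = ".".join(labels[i:])
        if candidate in KNOWN_RISK:
            return KNOWN_RISK[candidate]
    return "unknown"


def triage_email(body: str) -> dict[str, str]:
    """Map each URL host found in an email body to its precomputed verdict."""
    verdicts = {}
    for raw in URL_PATTERN.findall(body):
        url = raw.rstrip(".,;:!?)")  # drop sentence punctuation after the URL
        host = (urlsplit(url).hostname or "").lower()
        if host:
            verdicts[host] = lookup(host)
    return verdicts
```

Because the expensive scraping happens beforehand, the check at delivery time is only a few dictionary lookups per link.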

Web Scraping for Cybersecurity Firms

A number of cybersecurity firms use Web data collection in order to assess the risk of different domains for malware and fraud.

They generate or purchase lists of potentially malicious domains, then route DNS and other requests to each of these links, servers, or websites to see how they respond.
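The article does not say how these domain lists are generated. One common approach, shown here as an assumption rather than the firms’ actual method, is to enumerate typosquatting variants of a protected domain and check which of them resolve in DNS. The Python below uses the local resolver through `socket.getaddrinfo`; routing the lookups through a collection network is not shown. `LOOKALIKES`, `lookalike_candidates`, and `resolves` are hypothetical, and the simple `rpartition` split assumes a single-label suffix such as `.com`.

```python
from __future__ import annotations

import socket

# Hypothetical lookalike substitutions (letter -> digit that resembles it).
LOOKALIKES = {"o": "0", "l": "1", "i": "1", "e": "3"}


def lookalike_candidates(domain: str) -> set[str]:
    """Generate simple typosquatting variants; illustrative, not exhaustive."""
    name, _, suffix = domain.rpartition(".")  # assumes a one-label suffix like "com"
    variants = set()
    for i in range(len(name)):  # omit one character
        variants.add(f"{name[:i]}{name[i + 1:]}.{suffix}")
    for i in range(len(name) - 1):  # swap two adjacent characters
        variants.add(f"{name[:i]}{name[i + 1]}{name[i]}{name[i + 2:]}.{suffix}")
    for src, dst in LOOKALIKES.items():  # replace every occurrence of a letter
        if src in name:
            variants.add(f"{name.replace(src, dst)}.{suffix}")
    variants.discard(domain)  # swapping identical neighbors reproduces the original
    return variants


def resolves(domain: str) -> bool:
    """True if the name currently resolves via the local resolver."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False
```

Running `[d for d in sorted(lookalike_candidates("examplebank.com")) if resolves(d)]` would list variants that already resolve; a resolving lookalike is a lead for further checks, not proof of malice.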

This lets cybersecurity firms approach possibly malicious websites the way a “victim,” or real user, would, and observe how each site targets an unsuspecting visitor so they can properly assess the risk.
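Visiting a site “as a victim” usually means presenting the request headers an ordinary browser would send. A related check, offered as an illustration rather than the firms’ documented method, is to fetch a page once with browser-like headers and once as an openly identified scanner, then compare the two responses, since malicious sites sometimes show harmless content to scanners (a technique known as cloaking). The header values, `make_request`, `looks_cloaked`, and the 0.6 similarity threshold are all hypothetical.

```python
import difflib
import urllib.request

# Hypothetical header profiles: an ordinary browser visitor versus an openly
# identified scanner.
VICTIM_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"),
    "Accept-Language": "en-US,en;q=0.9",
}
SCANNER_HEADERS = {"User-Agent": "example-security-scanner/1.0"}


def make_request(url: str, headers: dict) -> urllib.request.Request:
    """Build a request that presents the given header profile."""
    return urllib.request.Request(url, headers=headers)


def looks_cloaked(victim_body: str, scanner_body: str, threshold: float = 0.6) -> bool:
    """Flag pages whose content differs sharply between the two profiles."""
    similarity = difflib.SequenceMatcher(None, victim_body, scanner_body).ratio()
    return similarity < threshold
```

Pages that render content with JavaScript, or that vary it for legitimate reasons such as language or A/B testing, can trip this simple text comparison, so the threshold would need tuning against known-good sites.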

Threat Research and Mitigation

Threat intelligence firms use public Web data collection networks to tap into sources such as hacker and app forums, public social media channels, and blogs to find new leads on potential threats.
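As a minimal sketch of turning scraped posts into leads, assuming the post text has already been collected, the Python below counts CVE identifiers across posts and flags posts that mention terms tied to the protected organization. `WATCH_TERMS` and `extract_leads` are hypothetical; a production pipeline would typically also need language handling, deduplication, and analyst review.

```python
from __future__ import annotations

import re
from collections import Counter

CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
# Hypothetical terms tied to the organization being protected.
WATCH_TERMS = ("examplebank", "example bank")


def extract_leads(posts: list[str]) -> dict:
    """Count CVE IDs across scraped posts and note which posts mention watch terms."""
    cves: Counter[str] = Counter()
    mentions = []
    for index, post in enumerate(posts):
        for cve in CVE_PATTERN.findall(post):
            cves[cve.upper()] += 1  # normalize "cve-..." and "CVE-..."
        lowered = post.lower()
        if any(term in lowered for term in WATCH_TERMS):
            mentions.append(index)
    return {"cves": cves, "mentions": mentions}
```

A CVE that suddenly appears in many posts, or a post naming the organization, is the kind of lead an analyst would then investigate by hand.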

This collection of Web data is foundational to their intelligence insights, which they then share with a wide range of customers looking to bolster their own security operations.

Key Takeaways

Overall, integrating with Web data collection networks improves an organization’s visibility into digital threats across the vast online landscape, and its ability to deal with them in real time.

This is particularly important given the shift to remote work and the broader expansion of online operations, strategies, and services since the onset of the coronavirus pandemic, which has further lengthened the list of risks and pathways that can disrupt organizational security.

So while the task facing security teams has become increasingly difficult, Web data collection networks have turned this once complicated ordeal into a much more manageable one by making automation possible. Automation helps teams cover more sources of information and, in turn, identify more risks, while protecting the integrity of their own internal systems.

About the Author

Or Lenchner is CEO of Bright Data, a position he has held since July 2018. Over the past few years, under his leadership, the company has expanded its product offerings to include first-of-its-kind automated solutions that enable more than 15,000 customers to collect and receive public data in a matter of minutes. Before joining Bright Data, Lenchner founded and managed several Web-based businesses, developing digital assets and online marketing programs.
