Open source intelligence (OSINT) is the act of gathering and analyzing publicly available data for intelligence purposes.
What is open source data?
Open source data is any information that is readily available to the public or can be made available by request. OSINT sources can include:
- Newspaper and magazine articles, as well as media reports
- Academic papers and published research
- Books and other reference materials
- Social media activity
- Census data
- Telephone directories
- Court filings
- Arrest records
- Public trading data
- Public surveys
- Location context data
- Breach or compromise disclosure information
- Publicly shared cyberattack indicators, such as IP addresses, domains or file hashes
- Certificate or domain registration data
- Application or system vulnerability data
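As a rough illustration of how the publicly shared indicators in the list above might be sorted by type before analysis, here is a minimal Python sketch. The function name `classify_indicator` and the patterns are assumptions made for this example; real feeds may use other formats and need stricter validation.

```python
import ipaddress
import re

# Hex digest lengths for common hash types (assumption: only these three matter here).
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
HEX_RE = re.compile(r"^[0-9a-fA-F]+$")
# Simplified domain pattern for illustration; it does not cover internationalized names.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$"
)


def classify_indicator(value: str) -> str:
    """Return 'ip', 'md5', 'sha1', 'sha256', 'domain' or 'unknown'."""
    value = value.strip()
    try:
        # Accepts both IPv4 and IPv6 literals.
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    if HEX_RE.match(value) and len(value) in HASH_LENGTHS:
        return HASH_LENGTHS[len(value)]
    if DOMAIN_RE.match(value):
        return "domain"
    return "unknown"
```

Sorting indicators this way is only a first step; whether a given address or hash is actually malicious still depends on the context it was published in.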
While most open source data is accessed via the open internet and may be indexed by a search engine like Google, it can also be found in more closed forums that search engines do not index. This unindexed content is often called the deep web. Though most deep web content is inaccessible to general users because it sits behind a paywall or requires a login, it is still considered part of the public domain.
It is also important to note that there is often a tremendous amount of secondary data that can be leveraged from each open source of information. For example, social media accounts can be mined for personal information, such as a user’s name, birthdate, family members and place of residence. In addition, the file metadata from specific posts can reveal further information, such as where the post was made, the device used to create the file and the author of the file.
How is open source data used?
In the context of OSINT, intelligence refers to the extraction and analysis of public data to gain insights, which are then used to improve decision making and inform activity. Traditionally, OSINT was a technique used by the national security and law enforcement communities. However, in recent years it has also become a foundational capability within cybersecurity.
OSINT and Cybersecurity
In the cybersecurity realm, intelligence researchers and analysts leverage open source data to better understand the threat landscape and help defend organizations and individuals from known risks within their IT environment.
OSINT Use Cases in Cybersecurity
Within cybersecurity, there are two common use cases for OSINT:
- Measuring the risk to your own organization
- Understanding the actor, tactics and targets
Measuring Your Own Risk
Penetration testing (aka pen testing, security validation, threat surface assessment or ethical hacking) is the simulation of a real-world cyberattack in order to test an organization’s cybersecurity capabilities and expose vulnerabilities. The purpose of penetration testing is to identify weaknesses and vulnerabilities within the IT environment and remediate them before they are discovered and exploited by a threat actor.
While there are many different types of penetration testing, the three most common within the context of OSINT are:
- External Pen Testing: Assesses your internet-facing systems to determine if there are exploitable vulnerabilities that expose data or allow unauthorized access from the outside world. The test includes system identification, enumeration, vulnerability discovery and exploitation.
- Threat Surface Assessment: Also known as an attack surface analysis, this is about mapping out what parts of a system need to be reviewed and tested for security vulnerabilities. The point of attack surface analysis is to understand the risk areas in an application, to make developers and security specialists aware of what parts of the application are open to attack, to find ways of minimizing this, and to notice when and how the attack surface changes and what this means from a risk perspective.
- Web Application Pen Test: Evaluates your web application using a three-phase process: reconnaissance, wherein the security team discovers information such as the operating system, services and resources in use; discovery, wherein the security analysts attempt to identify vulnerabilities, such as weak credentials, open ports or unpatched software; and exploitation, wherein the team leverages the discovered vulnerabilities to gain unauthorized access to sensitive data.
Understanding the Actor, Tactics and Targets
Open source data is one of many types of data leveraged by cybersecurity teams as part of a comprehensive threat intelligence capability to understand the actor behind an attack.
Threat intelligence is the process through which collected data is analyzed to understand a threat actor’s motives, targets and attack behaviors. Threat intelligence includes the use of open source data and combines it with closed data sources, such as internal telemetry, data gathered from the dark web, and other external sources to gather a more complete picture of the threat landscape.
On its own, open source data generally lacks the context needed to make it meaningful to security teams. For example, a single post on a public message board may not provide any useful information to cybersecurity teams. However, by viewing this activity within the context of a broader collection and threat intelligence framework, it is possible to attribute the activity to a known adversary group, adding depth and color to the group's profile that can be used to defend the organization from this specific threat actor.
OSINT: A Two-way Street
Open source information is available to everyone. That means it can also be used for nefarious purposes by threat actors and adversary groups just as easily as it is accessed by cybersecurity professionals or the intelligence community.
One of the most common reasons cybercriminals leverage OSINT is for social engineering purposes. They will often gather personal information of potential victims via social media profiles or other online activity to create a profile of the individual that can then be used to customize phishing attacks. OSINT can also be leveraged for detection evasion. For instance, by reviewing publicly disclosed intelligence, threat actors learn where organizations are likely to put up lines of defense and can look for alternate methods of attack.
Another common technique used by hackers is Google hacking, which is also sometimes referred to as Google dorking. Google hacking involves using Google’s search engine and applications to run highly specific command searches that will identify system vulnerabilities or sensitive information. For example, a cybercriminal can execute a file search for documents that contain the phrase “sensitive but unclassified information.” They can also leverage tools to scan for any misconfigurations or security gaps in a website’s code. These vulnerabilities can then be exploited as a point of entry for future ransomware or malware attacks.
Attackers are also known for influencing Google searches by setting up a network of fake websites that contain unreliable open source data. Adversaries put this misinformation out in the wild to mislead web crawlers and readers or trick them into distributing malware.
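To make the idea of a highly specific search concrete, the following Python sketch assembles a query from the widely documented `site:`, `filetype:` and `intitle:` operators. It is a minimal example meant for auditing a domain you are authorized to assess; the function names, the `example.com` placeholder and the search URL format are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_query(
    phrase: Optional[str] = None,
    site: Optional[str] = None,
    filetype: Optional[str] = None,
    intitle: Optional[str] = None,
) -> str:
    """Combine optional search operators into a single query string."""
    parts = []
    if site:
        parts.append(f"site:{site}")          # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to one file type
    if intitle:
        parts.append(f'intitle:"{intitle}"')  # require words in the page title
    if phrase:
        parts.append(f'"{phrase}"')           # exact-phrase match
    return " ".join(parts)


def search_url(query: str) -> str:
    """URL-encode the query for use in a browser (URL format is an assumption)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_search_query(
    phrase="sensitive but unclassified", site="example.com", filetype="pdf"
)
print(query)
print(search_url(query))
```

Running the same kind of query against your own domains is one way to find documents that were indexed unintentionally before an adversary does.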
Perhaps the biggest challenge associated with OSINT is managing the truly staggering amount of public data, which grows daily. Because humans cannot possibly manage so much information, organizations must automate data collection and analysis and leverage mapping tools to help visualize and connect data points more clearly.
With the help of machine learning and artificial intelligence, OSINT tools can assist practitioners in gathering and storing large quantities of data. These tools can also find significant links and patterns among different pieces of information.
Further, organizations must develop a clear underlying strategy that defines which data sources they want to collect from. This will help avoid overwhelming the system with information of limited value or questionable reliability. To that end, organizations must clearly define their goals and objectives as they pertain to open source intelligence.
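One way to picture such a strategy is a small filter that keeps only records from approved sources above a reliability threshold and drops duplicates. The sketch below is an assumption-laden illustration: the `Record` fields, the reliability score and the source names are hypothetical and not part of any specific tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Record:
    source: str         # where the item was collected (hypothetical label)
    reliability: float  # 0.0 to 1.0, an analyst-assigned score (assumption)
    text: str


def filter_records(
    records: Iterable[Record],
    allowed_sources: Set[str],
    min_reliability: float = 0.5,
) -> List[Record]:
    """Keep records from allowed sources, above the threshold, without duplicates."""
    seen = set()
    kept = []
    for record in records:
        if record.source not in allowed_sources:
            continue  # source is outside the defined collection strategy
        if record.reliability < min_reliability:
            continue  # questionable reliability
        key = (record.source, record.text.strip().lower())
        if key in seen:
            continue  # same text already kept from this source
        seen.add(key)
        kept.append(record)
    return kept


sample = [
    Record("court_filings", 0.9, "Case filed against Example Corp"),
    Record("anonymous_forum", 0.2, "Rumor about Example Corp"),
    Record("court_filings", 0.9, "case filed against example corp "),
    Record("unlisted_blog", 0.8, "Unvetted claim"),
]
kept = filter_records(sample, allowed_sources={"court_filings", "anonymous_forum"})
print([r.text for r in kept])
```

The threshold and the allowlist are exactly the kind of decisions the organization's goals and objectives should drive.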
OSINT Collection Techniques
Broadly speaking, collection of open source intelligence falls into two categories: passive collection and active collection.
- Passive collection combines all available data into one easily accessible location. With the help of machine learning (ML) and artificial intelligence (AI), threat intelligence platforms can assist in managing and prioritizing this data, as well as dismissing some data points based on rules defined by the organization.
- Active collection uses a variety of investigative techniques to identify specific information. Active data collection can be used ad-hoc to supplement cyber threat profiles identified by the passive data tools or to otherwise support a specific investigation. Commonly known OSINT collection tools include domain or certificate registration lookups to identify the owner of certain domains. Public malware sandboxing to scan applications is another example of OSINT collection.
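A domain registration lookup of the kind mentioned above can be sketched with the plain WHOIS protocol (RFC 3912): open a TCP connection to port 43, send the query followed by CRLF, and read until the server closes the connection. The code below is a minimal sketch; the default server `whois.iana.org`, the function names and the simple `key: value` parser are assumptions, and real WHOIS output varies widely between registries.

```python
import socket
from typing import Dict, List


def whois_query(domain: str, server: str = "whois.iana.org",
                port: int = 43, timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str) -> Dict[str, List[str]]:
    """Collect 'key: value' lines, skipping blanks and comment lines."""
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("%", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            fields.setdefault(key.strip().lower(), []).append(value)
    return fields


# Illustrative input only, not real registry output.
sample_response = (
    "% comment line\n"
    "refer: whois.example-registry.test\n"
    "\n"
    "domain: EXAMPLE\n"
    "nserver: A.EXAMPLE\n"
    "nserver: B.EXAMPLE\n"
)
print(parse_whois(sample_response))
```

In practice you would call `whois_query("example.com")` and pass the result to `parse_whois`; a `refer` field, where present, typically names the registry server to query next.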
While there is a tremendous amount of publicly available information that can be leveraged by cybersecurity professionals, the sheer volume of OSINT data, which is dispersed across many different sources, can make it difficult for security teams to extract key data points. In addition, it is important that the high-value, relevant information gathered through OSINT activity is then integrated with cybersecurity tools and systems.
The OSINT framework is a methodology that integrates data, processes, methods, tools and techniques to help the security team identify information about an adversary or their actions quickly and accurately.
An OSINT framework can be used to:
- Establish the digital footprint of a known threat
- Gather all available intelligence about an adversary’s activity, interests, techniques, motivation and habits
- Categorize data by source, tool, method or goal
- Identify opportunities to enhance the existing security posture through system recommendations
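The categorization step in the list above can be sketched as a simple grouping of findings by one attribute. The finding fields (`source`, `tool`, `method`, `goal`) mirror the list, but the dictionary layout, the sample values and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def categorize(findings: Iterable[dict], key: str) -> Dict[str, List[dict]]:
    """Group findings by the given attribute; missing values go to 'uncategorized'."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding.get(key, "uncategorized")].append(finding)
    return dict(groups)


# Hypothetical findings for illustration only.
findings = [
    {"source": "certificate_logs", "method": "passive", "value": "mail.example.com"},
    {"source": "social_media", "method": "passive", "value": "employee role listing"},
    {"source": "whois", "method": "active", "value": "registrant organization"},
    {"source": "paste_site", "value": "leaked hostname"},
]
by_method = categorize(findings, "method")
print({name: len(items) for name, items in by_method.items()})
```

The same function can regroup the data by `source`, `tool` or `goal`, which is the main reason to keep findings in one uniform structure.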
Issues with Open Source Intelligence
OSINT is regularly utilized by intelligence communities, as well as national security teams and law enforcement to protect organizations and society from threats of all kinds.
However, as noted above, OSINT can also be leveraged just as easily for nefarious reasons by cybercriminals and other threat actors. Further, OSINT has raised debate in recent years as to how information within the public domain can be used safely and responsibly. Some of the most prevalent issues include:
Publicly available information is perfectly legal to access, analyze and distribute. However, attackers can also use it to support or advance illegal activities, for example by seeding misleading or malicious data into certain communities. Hacktivists in particular are known to distribute data publicly to influence public opinion.
While a great deal of information is available online, people and companies must use such information ethically. When using OSINT, practitioners must ensure they are doing so for legitimate purposes and that information is not used to exploit, harass, ostracize or harm others.
A shocking amount of information on private individuals is available in the public domain. In cobbling together information from social media profiles, online activity, public records and other sources, it is possible to develop a detailed profile of a person’s habits, interests and behaviors. While much of the data available about individual consumers has been shared by consumers themselves, they often did so without fully understanding the implications of such activity. The debate rages on as to what information brands and companies should be able to gather and store when consumers use their services, visit their stores or interact online — and how they are able to use the information in the future.