If you work in cybersecurity, you can’t ignore the warning signs of impending threats. New cyberthreats, new adversaries, and new hacking tools emerge every day, each more sophisticated than the last. The volume of cybersecurity data is skyrocketing, burying professionals in noise. Meanwhile, the number of defenders is dwindling as the dark side grows stronger, backed by organized crime and government funding. What could save the world? Superheroes, sure, but behind every successful superhero is a tech sidekick, and for cyberthreat hunters, that sidekick is a data scientist.
This is the twelfth post in our blog series “Rise of the Threat Hunter.” By now, you’re well acquainted with our superheroes, the Threat Hunters. To learn more about the series and catch up on previous posts, check out our introduction to the series or read last week’s post, “Equipping Threat Hunters: Advanced Analytics and AI, Part 1.”
Data scientists among threat hunters
While, at first glance, data scientists may seem disconnected from threat hunters, their complementary skills can elevate a threat hunting team from good to great. Cyber threat hunting involves more than technical skills and procedural compliance: it requires an analytical, investigative mindset and a creative approach. These same skills are essential for a successful data scientist, which is one reason data scientists and threat hunters work so well together.
Data scientists spend their days analyzing data, uncovering hidden patterns and insights that traditional methods can’t find. They bring diverse expertise in computer science, mathematics, statistics, and machine learning. Collaboration between threat hunters and data scientists is essential for effective cyber threat investigations: threat hunters identify what to look for, while data scientists determine how to extract those signals from vast and complex data. Well-formulated problems, combined with carefully cleaned and prepared data, can surface insights that direct searches and queries of the raw data cannot reach. This can mean the difference between missing a compromise and catching it early in its progression, before any damage is done.
Skilled operator or simple button pusher
Cybersecurity tools have changed. Continuous development has dramatically improved their speed and efficiency by incorporating the latest technologies, such as machine learning, intelligent asset discovery, and entity resolution. However, these same advancements have made the work of stopping threats more sophisticated and complex.
As a result, security tools have become less intuitive and less self-explanatory. This added complexity raises concerns that already busy threat hunters will go from being experienced operators of advanced tools to mere button pushers. This isn’t because they don’t want to learn their tools, but because most threat hunters don’t have time to specialize in any single tool in their security stack.
Combining data scientists and threat hunters in a cyber defense team could solve this problem. Data scientists not only know a variety of methods to extract insights from data, but also deeply understand the common weaknesses and pitfalls of these methods. More importantly, they understand the weaknesses of the data on which these methods are applied and how to overcome them.
Consider prompt engineering for large language models. Grammar, length, tone, and sentence structure can mean the difference between getting a relevant answer and getting no answer at all. Even worse than no answer is a wrong answer presented as fact, known as a hallucination. Data scientists can work with threat hunters to iteratively design effective prompts, combining the hunters’ domain knowledge with their own understanding of the data. As a team, threat hunters and data scientists can efficiently generate high-quality results.
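As a minimal sketch of the kind of structured prompt a threat hunter and data scientist might iterate on together, the Python function below assembles a prompt from explicit parts: role, task, rules, evidence, and output format. The class `HuntPrompt`, the function `build_hunt_prompt`, their fields, and the sample log lines are hypothetical illustrations, not a prescribed template or any product’s API. The rule asking the model to answer “insufficient evidence” is one common way to discourage guessing; it reduces, but does not eliminate, the risk of hallucination.

```python
from dataclasses import dataclass, field


@dataclass
class HuntPrompt:
    """Hypothetical prompt structure; field names are illustrative, not a standard."""
    role: str
    task: str
    evidence: list[str]
    constraints: list[str] = field(default_factory=list)
    output_format: str = "A bulleted list of findings, each citing the evidence line numbers it uses."


def build_hunt_prompt(p: HuntPrompt) -> str:
    """Assemble the prompt in a fixed section order so successive iterations are easy to compare."""
    # Number the evidence so the model can cite it and reviewers can check each claim.
    evidence = "\n".join(f"[{i}] {line}" for i, line in enumerate(p.evidence, start=1))
    # Grounding rules come first; use-case-specific constraints are appended after them.
    rules = [
        "Use only the evidence provided below.",
        'If the evidence is insufficient, answer exactly "insufficient evidence".',
        *p.constraints,
    ]
    rules_text = "\n".join(f"- {rule}" for rule in rules)
    return (
        f"Role: {p.role}\n\n"
        f"Task: {p.task}\n\n"
        f"Rules:\n{rules_text}\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Output format: {p.output_format}"
    )


# Hypothetical log lines used only to show the assembled prompt.
prompt = build_hunt_prompt(HuntPrompt(
    role="You are assisting a threat hunter who is reviewing Windows process logs.",
    task="Identify commands that suggest account or privilege reconnaissance.",
    evidence=[
        "host1 user=jsmith cmd='whoami /priv'",
        "host1 user=jsmith cmd='net group \"Domain Admins\" /domain'",
    ],
    constraints=["Keep the answer under 150 words."],
))
print(prompt)
```

Keeping each element in its own field makes it easy to change one thing at a time (tone, length, rules, or output format) and compare the results across iterations.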
The Entity Resolution Problem
Another example of effective teamwork between threat hunters and data scientists is entity resolution: associating all the events and behaviors of a single entity with that entity. The problem has two parts. First, we need to define what constitutes an entity, including choosing the right level of granularity and the method for separating one entity from another. Second, we need to maximize the association between each entity and all of its activities recorded across various data sources and entity representations. While this may seem straightforward, how both parts are solved directly affects the quality of the results the data produces.
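The short Python sketch below illustrates the second part of the problem under stated assumptions: account identifiers arrive in several spellings (`DOMAIN\user`, `user@domain`, a SID), a normalization step handles the spellings it can, and a hypothetical alias table (for example, one exported from a directory service) links identifiers that normalization alone cannot. A union-find (disjoint-set) structure then merges aliases transitively. The function names, sample events, and SID are made up for illustration.

```python
from collections import defaultdict


def normalize(identifier: str) -> str:
    """Reduce common account spellings to one form: lowercase, no domain prefix or UPN suffix."""
    ident = identifier.strip().lower()
    if "\\" in ident:  # DOMAIN\user -> user
        ident = ident.split("\\", 1)[1]
    if "@" in ident:  # user@domain -> user
        ident = ident.split("@", 1)[0]
    return ident


class UnionFind:
    """Disjoint-set structure that merges identifiers known to refer to one entity."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Deterministic root: the lexicographically smaller key becomes the entity ID.
            self.parent[max(ra, rb)] = min(ra, rb)


def resolve(events, alias_pairs):
    """Group events by resolved entity; alias_pairs link identifiers normalization can't."""
    uf = UnionFind()
    for a, b in alias_pairs:
        uf.union(normalize(a), normalize(b))
    grouped = defaultdict(list)
    for event in events:
        grouped[uf.find(normalize(event["user"]))].append(event)
    return dict(grouped)


# Hypothetical events from three data sources that spell the same user differently.
events = [
    {"source": "ad", "user": "CORP\\jsmith", "action": "logon"},
    {"source": "vpn", "user": "jsmith@corp.example.com", "action": "connect"},
    {"source": "edr", "user": "S-1-5-21-1004", "action": "process_start"},
    {"source": "edr", "user": "CORP\\adm-jsmith", "action": "powershell"},
]
# Hypothetical directory mapping: this SID belongs to jsmith.
aliases = [("S-1-5-21-1004", "jsmith")]
entities = resolve(events, aliases)
print({name: len(evts) for name, evts in entities.items()})
```

Note that the administrator’s `adm-jsmith` account stays a separate entity unless an alias pair links it to `jsmith`. Whether to add that link is exactly the granularity decision that the first part of the problem asks for.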
Let’s say an organization has a system administrator with two accounts: a standard user account for day-to-day tasks and a privileged administrative account. Should we join these two accounts under one umbrella or keep them separate? From a threat hunter’s perspective, both accounts belong to the same employee, so joining them seems natural. From a data science and behavioral analysis perspective, however, the two accounts have different functions, run different processes, hold different permissions, transfer different volumes of data, and even show different activity patterns. Furthermore, for the best results when comparing behavior against peers, the administrative account should be compared with other administrative accounts, and the standard account with other standard user accounts. Running PowerShell scripts that add or remove users or escalate user privileges is perfectly normal for an administrator but extremely abnormal for a standard user.
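To make the peer-group argument concrete, here is a minimal Python sketch using made-up weekly counts of user-management PowerShell executions. It scores each account with a leave-one-out z-score: how far the account’s count sits above the mean of the other accounts, measured in standard deviations. The threshold of 3 and the floor on the standard deviation are illustrative choices, not tuned values, and this is one simple peer-comparison technique rather than a description of any product’s analytics.

```python
from statistics import mean, pstdev

# Hypothetical weekly counts of PowerShell executions that add/remove users or
# change privileges, keyed by account, with each account's type (its peer group).
ACCOUNTS = {
    "adm-jsmith": ("admin", 12), "adm-akhan": ("admin", 9),
    "adm-lwong": ("admin", 13), "adm-pdiaz": ("admin", 11),
    "jsmith": ("standard", 0), "akhan": ("standard", 1), "lwong": ("standard", 0),
    "pdiaz": ("standard", 0), "mlee": ("standard", 0), "tnguyen": ("standard", 6),
}


def flag_outliers(accounts, by_peer_group=True, threshold=3.0, min_std=1.0):
    """Flag accounts whose count is far above their peers' (leave-one-out z-score)."""
    flagged = []
    for name, (group, count) in accounts.items():
        # Peers: every other account, optionally restricted to the same account type.
        peers = [c for other, (g, c) in accounts.items()
                 if other != name and (g == group or not by_peer_group)]
        # Floor the spread so near-constant peer groups don't produce huge scores.
        spread = max(pstdev(peers), min_std)
        if (count - mean(peers)) / spread > threshold:
            flagged.append(name)
    return flagged


print(flag_outliers(ACCOUNTS, by_peer_group=True))   # ['tnguyen']
print(flag_outliers(ACCOUNTS, by_peer_group=False))  # []
```

With these toy numbers, comparing each account only against its own account type flags the standard account `tnguyen`, while pooling all accounts together flags nothing: the administrators’ routine activity inflates the baseline and hides the one standard user behaving like an administrator.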
Integrating Data Science into the Threat Research Process
At this point, I hope you’re convinced that your threat hunting team needs data scientists. You have several options for acquiring one: you can hire a data scientist, use external data science consultants or paid services, or cultivate talent from among your current cybersecurity team members. Once your multidisciplinary team is assembled, it can start by tackling one use case at a time. A team with diverse expertise is an advantage, because ideas for solutions and use cases will come from both the threat hunters and the data scientists on your cyber defense team.
In my experience working with our threat hunting team, threat hunters typically describe the use cases they encounter in the field, along with the methods they use to address them. Once threat hunters have explained the intricacies of a use case, the methods used to detect it, and the limitations of currently available tools, data scientists can dive into the data. They consider various aspects of the problem and the available data, sometimes approaching it like a math or logic problem. This creates a foundation for brainstorming new detection methods, testing and validating them, and ultimately integrating them permanently into the frontline defenders’ arsenal.
A member of the data science team may also propose a new use case based on new algorithms, computational methods, or data sources. When this happens, the data scientists share their findings with the threat hunters, and together they assess whether the findings have value and how they could improve current solutions. For example, the data science team might notice that command-line inputs exhibit patterns similar to natural language and suggest that the threat hunters use language-based models to discover reconnaissance attacks. Through close collaboration with the threat hunters, the idea is tested and evaluated over multiple iterations, producing a new and effective method for discovering command-line reconnaissance.
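As a toy illustration of treating command lines like language, the Python sketch below trains an add-one-smoothed bigram model on hypothetical baseline sessions (each a list of command names) and scores new sessions by their average surprise, in bits per transition. It is a deliberately simple stand-in for the language-based models mentioned above, not the method the team developed; a real deployment would need far more data, richer tokenization of command arguments, and validation with threat hunters. A high score only means a session is unfamiliar relative to the baseline, which makes it a candidate for a hunter’s review rather than a verdict.

```python
import math
from collections import Counter


class BigramCommandModel:
    """Add-one-smoothed bigram language model over the commands in a session."""

    def __init__(self, sessions: list[list[str]]) -> None:
        self.bigrams: Counter = Counter()
        self.contexts: Counter = Counter()
        self.vocab = {"</s>"}  # tokens that can follow another; "<s>" only starts a session
        for session in sessions:
            tokens = ["<s>", *session, "</s>"]
            self.vocab.update(session)
            self.contexts.update(tokens[:-1])  # how often each token precedes another
            self.bigrams.update(zip(tokens, tokens[1:]))

    def surprise(self, session: list[str]) -> float:
        """Average negative log2 probability per transition; higher means less familiar."""
        tokens = ["<s>", *session, "</s>"]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen commands
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + v)
            total -= math.log2(p)
        return total / (len(tokens) - 1)


# Hypothetical baseline sessions for one group of users.
baseline = [
    ["cd", "git", "git", "code"],
    ["cd", "dir", "git", "python"],
    ["cd", "dir", "python"],
    ["dir", "type", "python"],
]
model = BigramCommandModel(baseline)

benign = ["cd", "dir", "python"]
recon = ["whoami", "net", "net", "ipconfig", "systeminfo", "nltest"]
print(f"{model.surprise(benign):.2f}")  # 1.84
print(f"{model.surprise(recon):.2f}")   # 3.08
```

Because every transition in the reconnaissance-style session is unseen in the baseline, each one receives only the smoothed minimum probability, so its average surprise is noticeably higher than that of a routine session.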
Conclusion
Data scientists help threat hunting teams work smarter, not harder, and when properly integrated and engaged, they can reduce the burden on our threat hunting superheroes. Ideas that bounce back and forth between team members over multiple cycles naturally grow, improving coverage of the vast array of cybersecurity use cases. That flow of ideas should never stop, because the race to secure an organization is never over and the defense team is never “done.” That’s the challenge and the beauty of the job.
Learn more about OpenText cybersecurity
Ready to equip your threat hunting team with products, services, and training to protect your most valuable and sensitive information? Discover our cybersecurity portfolio, a modern set of complementary security solutions that give threat hunters and security analysts 360-degree visibility into endpoints and network traffic to proactively identify, triage, and investigate anomalous and malicious behavior.