Although automation and machine learning (ML) have been used in information security for almost two decades, experimentation in this area continues non-stop. Security professionals must combat increasingly sophisticated cyber threats and a growing number of attacks without a significant increase in budget or staff. On the positive side, AI significantly reduces the workload of security analysts, while speeding up many phases of incident management, from detection to response. However, a number of seemingly obvious areas of ML application are underperforming.
AI-powered cyber threat detection
To oversimplify, there are two basic – and long-tested – ways to apply ML:
- Attack detection. By training AI on examples of phishing emails, malicious files, and dangerous application behaviors, we can achieve an acceptable level of threat detection. The main pitfall is that this area is highly dynamic, with attackers constantly inventing new methods of disguise, so the model requires frequent retraining to maintain its effectiveness. This in turn requires a labeled dataset: a large collection of recent, verified examples of malicious behavior. An algorithm trained this way will not be effective against fundamentally novel attacks, and it struggles with attacks that rely entirely on legitimate IT tools (living off the land, or LotL). Despite these limitations, most infosec vendors use this method, which is very effective for email analysis, phishing detection, and identifying certain classes of malware (see the first sketch after this list). That said, it promises neither complete automation nor 100% reliability.
- Anomaly detection. By training the AI on “normal” server and workstation activity, we can identify deviations from this norm, for example when an accountant suddenly starts performing administrative actions on the email server. The pitfalls here are that this method requires (a) collecting and storing large amounts of telemetry, and (b) regularly retraining the AI to keep up with changes in the IT infrastructure. Even then, there will be many false positives (FPs), and attack detection is still not guaranteed. Anomaly detection must also be tailored to each specific organization, so using such a tool requires people highly skilled in cybersecurity, data analysis, and ML, and these invaluable employees must provide 24/7 system support. The second sketch after this list illustrates the approach.
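To make the first approach concrete, here is a minimal sketch of a supervised detector trained on labeled examples, using scikit-learn. The toy corpus, features, and model choice are illustrative assumptions, not a production pipeline:

```python
# A minimal sketch of the supervised approach: train a classifier on
# labeled examples (phishing vs. legitimate emails). The corpus below
# is a hypothetical placeholder for recent, verified samples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = [
    "Your account is locked, verify your password here",
    "Quarterly report attached, see section 3",
    "Urgent: confirm your banking details now",
    "Lunch meeting moved to 1pm tomorrow",
]
labels = [1, 0, 1, 0]  # 1 = phishing, 0 = legitimate

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(emails, labels)

# Score a new message; in production this model would be retrained
# regularly as attackers change their wording and disguises.
print(model.predict_proba(["Please verify your password immediately"])[0][1])
```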
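And a minimal sketch of the second approach, again under stated assumptions: the per-host telemetry features are hypothetical, and a real deployment would learn each organization's own baseline:

```python
# A minimal sketch of the anomaly-detection approach: learn "normal"
# activity for a host and flag deviations. Features are hypothetical
# aggregates (events/hour, admin actions/hour, hosts contacted).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic baseline: an accountant's workstation under normal load.
normal_activity = rng.normal(loc=[120, 0.1, 5], scale=[15, 0.3, 1], size=(500, 3))

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal_activity)

# The same user suddenly performing administrative actions en masse.
suspicious = np.array([[130, 25.0, 40]])
print(detector.predict(suspicious))  # -1 = anomaly, 1 = normal
```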
The philosophical conclusion we can draw so far is that AI excels at routine tasks where the domain and the characteristics of the objects change slowly and rarely: writing coherent texts, recognizing dog breeds, and so on. Where a human adversary actively works against the training data, a statically configured AI gradually becomes less and less effective. Analysts end up refining the AI instead of writing threat-detection rules: the nature of the work changes but, contrary to a common misconception, no labor is saved. Moreover, the desire to improve AI threat detection and increase the number of true positives (TPs) inevitably leads to an increase in the number of FPs, which directly increases human workload. Conversely, pushing FPs down to near zero also reduces the number of TPs, thereby increasing the risk of missing a cyberattack; the sketch below illustrates this trade-off.
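This TP/FP tension is the familiar classification-threshold trade-off. A small synthetic illustration (the scores and labels below are invented purely for demonstration):

```python
# Illustration of the TP/FP trade-off: lowering the alert threshold
# catches more attacks (TPs) but floods analysts with FPs; raising it
# does the opposite. Scores and labels here are synthetic.
import numpy as np

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=10_000)  # 1 = real attack
scores = np.clip(labels * 0.35 + rng.normal(0.4, 0.2, size=10_000), 0, 1)

for threshold in (0.3, 0.5, 0.7, 0.9):
    alerts = scores >= threshold
    tp = np.sum(alerts & (labels == 1))
    fp = np.sum(alerts & (labels == 0))
    fn = np.sum(~alerts & (labels == 1))
    print(f"threshold={threshold:.1f}  TP={tp}  FP={fp}  missed attacks={fn}")
```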
As a result, AI has a place in the detection toolbox, but not as a silver bullet that can solve all cybersecurity detection problems, nor can it operate completely autonomously.
AI as a partner of the SOC analyst
AI cannot be entirely responsible for searching for cyber threats, but it can reduce human workload by independently analyzing simple SIEM alerts and assisting analysts in other cases:
- Filtering false positives. Having been trained on SIEM alerts and analyst verdicts, the AI can filter FPs quite reliably: in our Kaspersky MDR solution, this reduces the SOC workload by approximately 25%. Check out our next article for more details on this “automatic analysis” implementation.
- Prioritization of alerts. The same ML engine does not just filter FPs; it also assesses the likelihood that a detected event indicates serious malicious activity. Such critical alerts are then passed to experts for priority analysis. Alternatively, the “threat probability” can be displayed as a visual indicator, helping the analyst prioritize the most important alerts. A sketch of such a triage layer follows this list.
- Anomaly detection. AI can quickly alert on anomalies in the protected infrastructure by tracking phenomena such as an increase in the number of alerts, a sharp increase or decrease in the telemetry flow from certain sensors, or changes in its structure (the second sketch after this list shows a simple version of this check).
- Detection of suspicious behavior. Although searching for arbitrary anomalies in a network causes significant difficulties, certain scenarios lend themselves well to automation, and in these cases ML outperforms static rules. Examples include detecting unauthorized use of accounts from unusual subnets, detecting abnormal access to file servers, and hunting for pass-the-ticket attacks.
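Here is a hedged sketch of the filtering-and-prioritization idea, not the actual Kaspersky MDR implementation: a classifier trained on historical analyst verdicts scores each alert, auto-closes confident FPs, and escalates the rest. Features, thresholds, and data are all invented for illustration:

```python
# Sketch of ML-assisted triage: score SIEM alerts against historical
# analyst verdicts, then route them by estimated threat probability.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
# Hypothetical features per alert: severity, asset criticality, etc.
X_history = rng.random((2_000, 5))
verdicts = (X_history[:, 0] + X_history[:, 1] > 1.2).astype(int)  # 1 = true positive

triage = GradientBoostingClassifier().fit(X_history, verdicts)

new_alerts = rng.random((5, 5))
p_malicious = triage.predict_proba(new_alerts)[:, 1]
for i, p in enumerate(p_malicious):
    if p < 0.05:
        print(f"alert {i}: auto-closed as probable FP (p={p:.2f})")
    elif p > 0.8:
        print(f"alert {i}: escalated to analyst as high priority (p={p:.2f})")
    else:
        print(f"alert {i}: queued for standard review (p={p:.2f})")
```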
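The telemetry-flow check, in turn, can be as simple as a rolling statistical baseline per sensor. The window size and threshold below are illustrative assumptions:

```python
# Flag a sensor whose hourly event count deviates sharply from its
# recent rolling baseline (a sudden spike or drop in telemetry flow).
import numpy as np

def flow_anomalies(counts, window=24, z_threshold=4.0):
    """Return indices where the event count deviates from the rolling mean."""
    counts = np.asarray(counts, dtype=float)
    flagged = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        mu, sigma = baseline.mean(), baseline.std() + 1e-9
        if abs(counts[t] - mu) / sigma > z_threshold:
            flagged.append(t)
    return flagged

hourly_counts = [100 + i % 7 for i in range(72)] + [900]  # sudden spike at the end
print(flow_anomalies(hourly_counts))
```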
Large language models in cybersecurity
As the hottest trending topic in AI, large language models (LLMs) have also been widely tested by information security companies. Leaving aside cybercriminal applications such as generating phishing emails and malware with GPT, we note these interesting (and numerous) experiments in applying LLMs to routine tasks:
- Generating detailed descriptions of cyber threats
- Writing incident investigation reports
- Fuzzy search in data archives and logs via chat
- Generating tests, test cases, and code for fuzzing
- Initial analysis of decompiled source code during reverse engineering
- Deobfuscation and explanation of long command lines (our MDR service already uses this technology; a sketch follows this list)
- Generating tips and tricks for writing detection rules and scripts
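As an example of the deobfuscation use case, here is a hedged sketch assuming the OpenAI Python SDK; the model name, prompt, and truncated command line are placeholders, and the model's explanation still needs analyst verification:

```python
# Sketch of LLM-assisted command-line deobfuscation. Nothing here is
# executed; the LLM only explains what the command appears to do.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

obfuscated = "powershell -enc SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQA..."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You are a SOC assistant. Decode and explain command "
                    "lines step by step; do not execute anything."},
        {"role": "user", "content": f"Explain what this command does:\n{obfuscated}"},
    ],
)
print(response.choices[0].message.content)
```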
Most of the linked documents and articles describe niche implementations or scientific experiments, so they do not provide a measurable performance assessment. Moreover, the available research on the performance of qualified employees assisted by LLMs shows mixed results. Therefore, such solutions should be implemented slowly and in stages, with a preliminary assessment of the savings potential, and with detailed tracking of the time invested and the quality of the results.