Mitre Att&ck as Context
Introduction:
A common theme for science fiction authors, and these days for policymakers and think tanks, is how humans will work with machines as the machines begin to surpass us across many dimensions.
In cybersecurity humans and their systems are at a crossroads, their limitations daily exposed by ever more innovative, aggressive, well-funded, and AI-wielding attackers.
The humans are burned out, dissatisfied with their cybersecurity careers and their day-to-day work.
And the machines are all too often either too literal, a set of if-then statements packaged into thickets of rules, or too brittle, finding useful patterns in the noise only when carefully tuned, oiled, and interpreted by overstretched humans with rare expertise.
For years those of us building tools for defenders have been building point solutions for new attack vectors, hoping that we could catch up with the ever-widening and ever more complex attack surfaces of enterprises, utilities, governments, service providers, and all of us humans.
We have built DeepTempo and our LogLMs to buttress the eroding foundations of cybersecurity. Our results thus far show that LogLMs are both more accurate and more adaptable than rules-based and traditional ML-based incident identification.
In this blog, I turn my attention to another dimension: explainability. We must bring the humans along with us and give the professionals protecting us all a way to interpret the results of these deep learning models. Otherwise, our human defenders will not trust the insights of the models.
As one of our first advisors, Chris Bates, who helped build SentinelOne as CISO and chief trust officer, reminds us:
“A smart black box could be useful — but it risks becoming a curiosity.”
We won’t buttress our defenses’ foundations with only curiosity.
Explainability: by design
If you get the data model wrong — a rewrite is in your future.
Technical debt is inevitable, but the wrong data layer and data model will quickly bankrupt even the most overfunded start-up.
So before we built our LogLMs — and in so doing created a new vocabulary and approach to tokenization — we thought a lot about several key attributes including explainability.
Explainability is built into the foundations of our Tempo LogLM. Much as the construct of a sentence can be useful in certain LLMs, we use human-interpretable sequences within our model. When our Tempo LogLM suggests that a particular set of events is worrisome, it gives you a UID that maps back to a conversation between entities in your environment. In effect, it shows which sentence seems to have a grammatical error, and the model saves that conversation. This enables a user experience our early users enjoy. It fits very well into their existing SIEMs and allows them to use those SIEMs to add a lot of relevant context, including external and internal threat intelligence.
You can see the value of this UID tying back to the underlying data on our free-to-try Snowflake Tempo NativeApp. You can see our partner and user exploring this capability in the following demo video:
https://medium.com/media/20e20a4796f38786882dbe4ff9a8253f/href
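To make the idea of a UID tying back to the underlying data concrete, here is a minimal Python sketch. It is only an illustration: the record layout, the field names (`uid`, `src`, `dst`, `dst_port`, `bytes`), and the helpers `index_by_uid` and `events_for_alert` are hypothetical and are not Tempo's actual schema or API.

```python
from collections import defaultdict

# Hypothetical flow-log records. Each one carries the UID of the sequence
# (the "conversation" between entities) that it belongs to.
flow_logs = [
    {"uid": "seq-001", "src": "10.0.0.5", "dst": "10.0.0.9", "dst_port": 25, "bytes": 1200},
    {"uid": "seq-001", "src": "10.0.0.9", "dst": "10.0.0.5", "dst_port": 25, "bytes": 300},
    {"uid": "seq-002", "src": "10.0.0.7", "dst": "203.0.113.4", "dst_port": 443, "bytes": 98000},
]


def index_by_uid(records):
    """Group raw log records by the UID of the sequence they belong to."""
    index = defaultdict(list)
    for record in records:
        index[record["uid"]].append(record)
    return index


def events_for_alert(alert, index):
    """Return the raw events behind a flagged sequence, or [] if the UID is unknown."""
    return index.get(alert["uid"], [])


index = index_by_uid(flow_logs)

# Hypothetical alert: a flagged sequence identified only by its UID.
alert = {"uid": "seq-002", "score": 0.97}
for event in events_for_alert(alert, index):
    print(event["src"], "->", event["dst"], "port", event["dst_port"], event["bytes"], "bytes")
```

In a SIEM the same lookup would more likely be a join on the UID column, which is also where additional context such as threat intelligence could be attached.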
We also realized that deep learning could help us to solve the explainability challenges of deep learning. We decided to use deep learning to translate the insights of the model back into the language of the security operations center. So far we have done this translation in two ways:
- Entity resolution and grouping
- Mitre Att&ck mapping
I’ll leave the internals of entity resolution and grouping for a later date; we have an engineering-focused blog on the way and are hopeful that these blogs will help others consider building and adapting their own purpose-built models. The short summary is that foundation models like our Tempo learn the meaning of the nouns within each sentence of the communications they examine. Just as an LLM can distinguish between a queen ruling a country and the most powerful piece in chess, so too can our LogLMs learn which sort of mail server might be in a given sequence and inform their expectations about the behavior of that mail server based on experience with hundreds of thousands of mail servers. Concretely, we tag our sequences with the types of entities included, helping our end users to understand what they are looking at: for example, is it email servers in general, or just a particular few email servers, that are behaving strangely?
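The tagging step can be pictured with a small Python sketch. This is not how Tempo infers entity types: the hard-coded `entity_types` table stands in for what the model would learn, and the names `tag_sequence` and `summarize_flagged` are hypothetical. It only shows how type tags on flagged sequences could help a user see which kinds of entities are involved.

```python
from collections import Counter

# Stand-in for model-inferred entity types (assumption: a hard-coded table).
entity_types = {
    "10.0.0.5": "mail_server",
    "10.0.0.9": "mail_server",
    "10.0.0.7": "workstation",
}


def tag_sequence(sequence, types):
    """Return the sorted, distinct entity types that appear in one sequence."""
    return sorted({types.get(entity, "unknown") for entity in sequence["entities"]})


def summarize_flagged(flagged_sequences, types):
    """Count how many flagged sequences involve each entity type."""
    counts = Counter()
    for sequence in flagged_sequences:
        counts.update(tag_sequence(sequence, types))
    return counts


# Hypothetical flagged sequences, each listing the entities in the conversation.
flagged = [
    {"uid": "seq-001", "entities": ["10.0.0.5", "10.0.0.9"]},
    {"uid": "seq-002", "entities": ["10.0.0.7", "203.0.113.4"]},
    {"uid": "seq-003", "entities": ["10.0.0.5", "10.0.0.7"]},
]

for sequence in flagged:
    print(sequence["uid"], tag_sequence(sequence, entity_types))
print(summarize_flagged(flagged, entity_types))
```

Grouping the flagged sequences by tag is what lets an analyst tell whether the odd behavior is concentrated in one type of entity or spread across several.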
Mitre Att&ck mapping:
Today we are announcing that our Tempo LogLM now maps concerning anomalies to the most likely Mitre Att&ck sequence.
Using only network logs, which themselves are of course metadata, the model can see whether a particular type of reconnaissance is occurring, or perhaps lateral movement, or even exfiltration. Starting today, our Tempo NativeApp on Snowflake adds the closest Mitre Att&ck match or matches to all stored sequences and to the underlying embeddings, which are massively smaller than the underlying logs.
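One generic way to attach the "closest" label to an embedding is a nearest-neighbor lookup by cosine similarity. The Python sketch below illustrates that general technique only; this post does not describe how Tempo performs its mapping, and the three-dimensional reference vectors, the `reference` table, and the `closest_labels` helper are made-up assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy reference embeddings, one per label (assumption: real embeddings
# would have many more dimensions and come from labeled examples).
reference = {
    "reconnaissance": [0.9, 0.1, 0.0],
    "lateral movement": [0.1, 0.9, 0.1],
    "exfiltration": [0.0, 0.2, 0.9],
}


def closest_labels(embedding, refs, top_k=1):
    """Return the top_k labels whose reference vectors are most similar to the embedding."""
    ranked = sorted(refs, key=lambda name: cosine(embedding, refs[name]), reverse=True)
    return ranked[:top_k]


# Hypothetical embedding of a stored sequence.
sequence_embedding = [0.05, 0.3, 0.8]
print(closest_labels(sequence_embedding, reference, top_k=2))
```

Returning more than one label with `top_k` mirrors the "match or matches" wording: a sequence can sit close to several attack behaviors at once.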
This allows security operations teams to run workflows and investigations informed in part by their understanding of Mitre Att&ck. Many organizations have prepared plans to respond to particular attack methods. They can now invoke these plans quickly once the alert is fired from Tempo to their SIEM, cutting minutes or hours off their mean time to respond.
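As a sketch of how a mapped alert could trigger a prepared plan, consider the Python example below. The alert shape, the `attack_labels` field, the `select_playbooks` helper, and the playbook text are all hypothetical placeholders, not a Tempo or SIEM API; a real deployment would use the organization's own response plans and its SIEM's automation hooks.

```python
# Hypothetical response plans keyed by the attack behavior they address.
PLAYBOOKS = {
    "lateral movement": "Isolate affected hosts and review authentication logs",
    "exfiltration": "Block the destination and preserve flow records",
}
DEFAULT_PLAYBOOK = "Route to an analyst for manual triage"


def select_playbooks(alert):
    """Pick the prepared plans matching an alert's labels, or fall back to manual triage."""
    labels = alert.get("attack_labels", [])
    matched = [PLAYBOOKS[label] for label in labels if label in PLAYBOOKS]
    return matched or [DEFAULT_PLAYBOOK]


# Hypothetical alert as it might arrive in a SIEM, already carrying its labels.
alert = {"uid": "seq-002", "attack_labels": ["exfiltration", "lateral movement"]}
for step in select_playbooks(alert):
    print(alert["uid"], "->", step)
```

The fallback matters: an alert with no recognized label still reaches a human instead of being dropped.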
We are in the process of providing an additional free way to try out Tempo, focused on Mitre Att&ck mapping, and we will shortly be open-sourcing the raw materials we used to add these capabilities. Stay tuned.
Conclusion:
The future is here now. The need to help the humans to keep up with the machines has already arrived.
In cybersecurity we need to get much more intelligent — immediately — since we know our attackers are too often winning and are increasingly using LLMs to do so. Sometimes it seems like the dystopian future of Leave the World Behind is unfolding on our front pages.
The collective wisdom embedded in Mitre Att&ck — the many thousands of hours of effort that went into collecting and designing this taxonomy — offers a way for us to use deep learning to explain the insights of deep learning-based LogLMs.
Try it out today. As always — you can find Tempo on the Snowflake NativeApp marketplace where we even make available an example data set for initial evaluation.
https://app.snowflake.com/marketplace/listing/GZTYZOYXHP3/deeptempo-cybersecurity-tempo
We hope you’ll see that collective defense via deep learning is useful today as a cost-effective way to provide more defense in depth, making better sense out of streams of logs than traditional rules-based indicators or brittle ML models allow.
Anomalies are not Enough was originally published in DeepTempo on Medium, where people are continuing the conversation by highlighting and responding to this story.
The post Anomalies are not Enough appeared first on Security Boulevard.