Making Threat Graph Extensible: Leveraging the Intermediate Representation to Generate Go Code (Part 2 of 2)

In our earlier post, Making Threat Graph Extensible: Leveraging a DSL to Improve Data Ingestion (Part 1 of 2), we explored how and why CrowdStrike leverages HCL as a domain-specific language (DSL) in creating CrowdStrike Threat Graph®, our purpose-built graph database. We also reviewed our DSL specification and how it is converted to an intermediate representation for further processing.

In Part 2, we discuss how Go code is generated from the intermediate representation.

Code Generation

Code generation is the final step of the DSL implementation. Most validations, such as checking the types of variables and constants defined and used in userlib functions and the properties of vertices and edges, are performed during conversion to the intermediate representation. We can therefore move on to converting HCL and HIL into Go code by traversing the HCL and HIL abstract syntax trees (ASTs).

Here we review a few examples of HCL and HIL AST conversion to Go code.

The following HCL block …

var "processID" {
  key     = field.ProcessId
  onError = action.Error
  action  = "${len(processID) == 0 || processID == badProcessID ? break : continue}"
}

… will generate the Go code below, given that badProcessID is a constant defined before this var block:

processID, err := event.GetField("ProcessId")
if err != nil {
  return err
}

if len(processID) == 0 || processID == badProcessID {
  return nil
}

The first step in generating the above code is processing the key and onError fields, which produce the field extraction and the first conditional (the error check). This is a straightforward mapping and does not require an AST traversal.

The second step is parsing the HIL AST of the action expression and translating its ternary conditional into Go: the break branch becomes an early return nil, and the continue branch falls through to the next statement. The code for that process is below:

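// Import paths assume HCL v2.
import (
  "fmt"
  "strings"

  "github.com/hashicorp/hcl/v2"
  "github.com/hashicorp/hcl/v2/hclsyntax"
)

// TernaryOperator holds the parts of an action such as "${cond ? break : continue}".
type TernaryOperator struct {
  GoCondition string // condition rendered as Go source
  TrueStmt    string // keyword in the true branch, e.g. "break"
  FalseStmt   string // keyword in the false branch, e.g. "continue"
}

func UnwrapTernaryOp(expr hcl.Expression) (*TernaryOperator, error) {
  op := &TernaryOperator{}
  switch e := expr.(type) {
  case *hclsyntax.TemplateWrapExpr:
    // A string holding a single interpolation, "${...}", parses as a TemplateWrapExpr.
    switch conditional := e.Wrapped.(type) {
    case *hclsyntax.ConditionalExpr:
      goCond, err := ToGoCondition(conditional.Condition)
      if err != nil {
        return nil, err
      }
      op.GoCondition = goCond

      if op.TrueStmt, err = branchName(conditional.TrueResult); err != nil {
        return nil, err
      }
      if op.FalseStmt, err = branchName(conditional.FalseResult); err != nil {
        return nil, err
      }
    default:
      return nil, fmt.Errorf("unknown expression in ternary op - %T", conditional)
    }
  default:
    return nil, fmt.Errorf("expected an interpolated ternary expression, got %T", e)
  }
  return op, nil
}

// branchName returns the bare keyword (such as break or continue) used as a ternary branch.
func branchName(expr hclsyntax.Expression) (string, error) {
  traversal, ok := expr.(*hclsyntax.ScopeTraversalExpr)
  if !ok {
    return "", fmt.Errorf("unsupported ternary branch - %T", expr)
  }
  return traversal.Traversal.RootName(), nil
}

// ToGoCondition recursively walks an HCL expression AST and renders it as Go source.
func ToGoCondition(condition hclsyntax.Expression) (string, error) {
  stmt := ""
  switch s := condition.(type) {
  case *hclsyntax.LiteralValueExpr:
    var err error
    // cty is the dynamic type library used by HCL; ctyValueToGo (not shown)
    // renders a cty.Value as a Go literal.
    stmt, _, err = ctyValueToGo(s.Val)
    if err != nil {
      return "", err
    }
  case *hclsyntax.ScopeTraversalExpr:
    // A reference to a variable or constant, such as processID or badProcessID.
    stmt = s.Traversal.RootName()
  case *hclsyntax.FunctionCallExpr:
    args := make([]string, len(s.Args))
    for i, arg := range s.Args {
      var err error
      args[i], err = ToGoCondition(arg)
      if err != nil {
        return "", err
      }
    }
    stmt = fmt.Sprintf("%s(%s)", s.Name, strings.Join(args, ", "))
  case *hclsyntax.BinaryOpExpr:
    op := ""
    switch s.Op {
    case hclsyntax.OpEqual:
      op = "=="
    case hclsyntax.OpNotEqual:
      op = "!="
    case hclsyntax.OpGreaterThan:
      op = ">"
    case hclsyntax.OpGreaterThanOrEqual:
      op = ">="
    case hclsyntax.OpLessThan:
      op = "<"
    case hclsyntax.OpLessThanOrEqual:
      op = "<="
    case hclsyntax.OpModulo:
      op = "%"
    case hclsyntax.OpLogicalOr:
      op = "||"
    case hclsyntax.OpLogicalAnd:
      op = "&&"
    default:
      return "", fmt.Errorf("unknown binary operator (result type %s) in ternary expression", s.Op.Type.GoString())
    }

    lhs, err := ToGoCondition(s.LHS)
    if err != nil {
      return "", err
    }

    rhs, err := ToGoCondition(s.RHS)
    if err != nil {
      return "", err
    }

    stmt = fmt.Sprintf("%s %s %s", lhs, op, rhs)
  case *hclsyntax.TemplateExpr:
    // Concatenate the rendered parts of the template.
    val := ""
    for _, p := range s.Parts {
      out, err := ToGoCondition(p)
      if err != nil {
        return "", err
      }
      val += out
    }
    stmt = val
  case *hclsyntax.UnaryOpExpr:
    op := ""
    switch s.Op {
    case hclsyntax.OpNegate:
      op = "-"
    case hclsyntax.OpLogicalNot:
      op = "!"
    default:
      return "", fmt.Errorf("unknown unary operator (result type %s)", s.Op.Type.GoString())
    }

    out, err := ToGoCondition(s.Val)
    if err != nil {
      return "", err
    }

    stmt = fmt.Sprintf("%s%s", op, out)
  default:
    return "", fmt.Errorf("unknown expression type: %T", s)
  }

  return stmt, nil
}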

Although the code looks complex, the idea is simple: it recursively walks the AST and concatenates the Go fragment produced for each node to form the final conditional statement.
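To see the recursive technique without the HCL dependency, here is a self-contained sketch over a tiny, hypothetical expression tree (Lit, Var, Call, Binary) standing in for the hclsyntax node types. It renders the example condition and then maps the ternary branches to Go, assuming, as in the generated code shown earlier, that break becomes an early return nil and continue emits nothing.

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a simplified, hypothetical stand-in for hclsyntax.Expression.
type Expr interface{}

// Lit is a literal already rendered as Go source.
type Lit struct{ Value string }

// Var is a variable or constant reference.
type Var struct{ Name string }

// Call is a function call such as len(x).
type Call struct {
	Name string
	Args []Expr
}

// Binary is a binary operation such as a == b.
type Binary struct {
	Op       string
	LHS, RHS Expr
}

// toGo walks the tree depth-first and concatenates the Go fragment for each node.
func toGo(e Expr) (string, error) {
	switch n := e.(type) {
	case Lit:
		return n.Value, nil
	case Var:
		return n.Name, nil
	case Call:
		args := make([]string, len(n.Args))
		for i, a := range n.Args {
			s, err := toGo(a)
			if err != nil {
				return "", err
			}
			args[i] = s
		}
		return fmt.Sprintf("%s(%s)", n.Name, strings.Join(args, ", ")), nil
	case Binary:
		lhs, err := toGo(n.LHS)
		if err != nil {
			return "", err
		}
		rhs, err := toGo(n.RHS)
		if err != nil {
			return "", err
		}
		return fmt.Sprintf("%s %s %s", lhs, n.Op, rhs), nil
	default:
		return "", fmt.Errorf("unknown expression type: %T", e)
	}
}

// branchToGo maps a ternary branch keyword to the Go statement it becomes.
func branchToGo(keyword string) (string, error) {
	switch keyword {
	case "break":
		return "return nil", nil // stop processing this event
	case "continue":
		return "", nil // fall through to the next statement
	default:
		return "", fmt.Errorf("unknown branch keyword %q", keyword)
	}
}

// renderTernary emits an if statement for "cond ? trueKw : falseKw".
func renderTernary(cond Expr, trueKw, falseKw string) (string, error) {
	c, err := toGo(cond)
	if err != nil {
		return "", err
	}
	t, err := branchToGo(trueKw)
	if err != nil {
		return "", err
	}
	f, err := branchToGo(falseKw)
	if err != nil {
		return "", err
	}
	if f == "" {
		return fmt.Sprintf("if %s {\n  %s\n}\n", c, t), nil
	}
	return fmt.Sprintf("if %s {\n  %s\n} else {\n  %s\n}\n", c, t, f), nil
}

func main() {
	// len(processID) == 0 || processID == badProcessID
	cond := Binary{Op: "||",
		LHS: Binary{Op: "==", LHS: Call{Name: "len", Args: []Expr{Var{"processID"}}}, RHS: Lit{"0"}},
		RHS: Binary{Op: "==", LHS: Var{"processID"}, RHS: Var{"badProcessID"}},
	}
	code, err := renderTernary(cond, "break", "continue")
	if err != nil {
		panic(err)
	}
	fmt.Print(code)
}
```

Like the original, this sketch adds no parentheses around operands, so it relies on the tree's nesting agreeing with Go's operator precedence; a production generator might parenthesize nested binary operations to be safe.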

The final generated handler for the above DSL looks like this:

type ProcessHandler struct {
  graphStore *GraphStore // API to interact with Threat Graph
}

// Constructors and helper functions omitted for brevity

func (h *ProcessHandler) ProcessEvent(ctx context.Context, event Event) error {
  if event == nil {
    return nil
  }

  const (
    badProcessID = "-1" // string, to match the string value returned for ProcessId
  )

  processID, err := event.GetField("ProcessId")
  if err != nil {
    return err
  }
  if len(processID) == 0 || processID == badProcessID {
    return nil
  }

  host, err := event.GetField("Hostname")
  if err != nil {
    return nil
  }

  user, err := userlib.GetUser()
  if err != nil {
    return err
  }

  processVertex := NewVertex("Process", processID)
  processVertex.AddProperty("process_id", processID)
  processVertex.AddProperty("timestamp", time.Now())

  userVertex := NewVertex("User", user)
  userVertex.AddProperty("timestamp", time.Now())

  hostVertex := NewVertex("Host", host)
  hostVertex.AddProperty("timestamp", time.Now())

  userToHostOutEdge := NewEdge(userVertex, UserHostEdge, DirectionOut, hostVertex)
  userToHostOutEdge.AddProperty("timestamp", time.Now())

  userToHostInEdge := NewEdge(userVertex, HostUserEdge, DirectionIn, hostVertex)
  userToHostInEdge.AddProperty("timestamp", time.Now())

  userToProcessOutEdge := NewEdge(userVertex, UserProcessEdge, DirectionOut, processVertex)
  userToProcessOutEdge.AddProperty("timestamp", time.Now())

  err = h.graphStore.SaveVertices(userVertex, hostVertex, processVertex)
  if err != nil {
    return err
  }

  err = h.graphStore.SaveEdges(userToHostOutEdge, userToHostInEdge, userToProcessOutEdge)
  if err != nil {
    return err
  }

  return nil
}

Below is a visual representation of the graph mutations produced by the above DSL spec.

[Figure: process flow chart of the graph mutations]

And here is the data representation of the graph:

Vertex ID     Type             Time                   Adjacent ID   Properties
proc:1234     ProcessVertex    2021-01-01T07:11:00Z   -             <binary blob of props>
user:jsmith   UserVertex       2021-01-01T05:10:00Z   -             <binary blob of props>
user:jsmith   UserHostEdge     2021-01-01T05:09:00Z   host:DC-123   <binary blob of props>
user:jsmith   UserProcessEdge  2021-01-01T05:08:00Z   proc:1234     <binary blob of props>
host:DC-123   HostVertex       2021-01-01T05:08:00Z   -             <binary blob of props>
host:DC-123   HostUserEdge     2021-01-01T05:08:00Z   user:jsmith   <binary blob of props>

Migration

All new handlers in Threat Graph will be written in our DSL. Existing handlers, however, must be migrated to the DSL to keep the process consistent and maintainable. One way to verify that code generated from the DSL produces the same graph mutations as a handwritten handler is to pass the same events, with the same values, to both handlers, then record and compare the mutations each one produces. This gives us confidence that we have full parity before we cut over to the handlers generated from the DSL.

Challenges

As happens with many big projects, our team encountered several challenges during this process. These included: 

Migration: When migrating from handwritten handlers to HCL handlers, it is important to consider both the parity (functionality) of the graph mutations and the performance of the generated code. Performance plays an equal or even greater role in deciding whether to deploy a DSL-generated handler. To that end, we collected a baseline performance profile of the handwritten handlers and compared it with that of the DSL-generated handlers. This allowed our team to achieve a 30% performance gain by improving processing time and reducing memory allocations.
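A baseline comparison of this kind can be sketched with Go's testing.Benchmark, which runs a benchmark function outside go test and reports time and allocations per operation. The two handler functions below are hypothetical placeholders that compute the same result with and without a per-call allocation; in practice you would benchmark the real handwritten and generated handlers on the same recorded events, for example with go test -bench and -benchmem.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps the compiler from optimizing the benchmarked work away.
var sink int

// handwrittenHandler is a hypothetical placeholder: it allocates a slice per call.
func handwrittenHandler(event string) int {
	return len(strings.Split(event, ","))
}

// generatedHandler is a hypothetical placeholder: same result, no slice allocation.
func generatedHandler(event string) int {
	return strings.Count(event, ",") + 1
}

// compare benchmarks both handlers on the same input.
func compare(event string) (hand, gen testing.BenchmarkResult) {
	hand = testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = handwrittenHandler(event)
		}
	})
	gen = testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = generatedHandler(event)
		}
	})
	return hand, gen
}

func main() {
	hand, gen := compare("ProcessId=1234,Hostname=DC-123,User=jsmith")
	fmt.Printf("handwritten: %d ns/op, %d allocs/op\n", hand.NsPerOp(), hand.AllocsPerOp())
	fmt.Printf("generated:   %d ns/op, %d allocs/op\n", gen.NsPerOp(), gen.AllocsPerOp())
}
```

Running both benchmarks on identical input is what makes the comparison meaningful; a parity check on outputs should pass before the numbers are trusted.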

Performance and scalability: When operating at scale, any tool added to the arsenal should improve team velocity as well as overall system scalability, performance and maintainability. By standardizing code generation, we were able to create a stable code base that gives our team greater predictability and better performance.

Tuning: Our team continuously tunes the generated code to reduce memory allocations and improve the handler's processing time. One of the major changes we made was caching the known event fields and properties on the event. This lets us create them in the generated code rather than looping over them at run time.
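The field-caching idea can be illustrated with a hypothetical event layout in which fields are stored as a slice of name/value pairs in a schema-defined order. A generic lookup has to loop over the slice on every call; generated code, which knows the schema at generation time, can instead use a precomputed index. The Event type, the field order, and the idxProcessID constant are assumptions for illustration only; the post does not describe Threat Graph's actual event representation.

```go
package main

import "fmt"

// Field is one name/value pair in a hypothetical event.
type Field struct {
	Name  string
	Value string
}

// Event keeps fields in a fixed, schema-defined order (an assumption of this sketch).
type Event struct {
	Fields []Field
}

// GetField is the generic lookup: a linear scan on every call.
func (e *Event) GetField(name string) (string, error) {
	for _, f := range e.Fields {
		if f.Name == name {
			return f.Value, nil
		}
	}
	return "", fmt.Errorf("field %q not found", name)
}

// idxProcessID is a hypothetical index the generator would emit from the known schema.
const idxProcessID = 0

// processIDFromGenerated is what generated code could use instead of
// GetField("ProcessId"): a direct, bounds-checked index with no loop.
func processIDFromGenerated(e *Event) (string, error) {
	if idxProcessID >= len(e.Fields) || e.Fields[idxProcessID].Name != "ProcessId" {
		return "", fmt.Errorf("event does not match the expected schema")
	}
	return e.Fields[idxProcessID].Value, nil
}

func main() {
	ev := &Event{Fields: []Field{{"ProcessId", "1234"}, {"Hostname", "DC-123"}}}
	a, _ := ev.GetField("ProcessId")
	b, _ := processIDFromGenerated(ev)
	fmt.Println(a, b) // 1234 1234
}
```

The schema check keeps the fast path safe: if an event ever arrives in an unexpected layout, the generated accessor reports an error instead of returning the wrong field.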

Balance of features in a DSL: Finding the right balance between the features the DSL offers and the features that are actually needed is important to maintaining the tool's overall functionality. Adding too many features can turn the DSL into a new programming language, which the team would have to learn and which could discourage contributions from other teams.

Uncertainty: Our decision to use HCL as our DSL came after careful consideration. As part of our process, we built prototypes in YAML and JSON and put them through our review process to evaluate simplicity, readability and ease of use. Ultimately, HCL offered the best overall performance and required the fewest toolchain modifications.

Success

To recap, below are the three main things our team was able to achieve through a DSL:  

  • Simplify the process of new contributions to the Threat Graph data model
  • Create a Threat Graph handler that is intuitive, performant and safe  
  • Reduce the time and resources needed from the Threat Graph team to validate extensions of the graph

A DSL and HCL were unexpected choices for our team, but they ultimately proved to be the best option. Our openness to all possibilities and our comprehensive decision-making process have reaped huge rewards for our team, our organization and our customers.
