Automate Decisions using Apache Spark

What is Spark?

Apache Spark is an analytics engine for large-scale data processing, with powerful built-in modules for data streaming, SQL, machine learning and graph processing.

Spark is a strong choice of analytics engine for large-scale data processing. It achieves high performance for both batch and streaming data through in-memory processing; the project has claimed that workloads can run up to 100x faster than on Hadoop MapReduce. Spark provides APIs in Java, Scala, Python and R, and has an optimized engine that supports general execution graphs. It includes powerful libraries for SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming, all of which can be combined in the same application.

What is a Business Rules Management System (BRMS)?

A Business Rules Management System (BRMS) is a technology used to capture decision logic as business rules, which are then automated across applications. Conventional applications embed business logic as code inside the application. With a BRMS, the rules are externalized and managed separately from the application code. This allows the same business logic to be used by multiple applications and to be modified independently of them.
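To make the idea of externalized rules concrete, here is a minimal sketch in Python. It is not how any particular BRMS stores or evaluates rules; the rule format, the RULES_JSON text, the OPERATORS table and the decide function are all hypothetical names invented for illustration. The point is only that the rules are data that can change without touching the application code.

```python
import json

# Hypothetical rule set kept as data, outside the application logic.
# A real BRMS would hold this in a managed rule repository, not a string.
RULES_JSON = """
[
  {"name": "reject_minor", "field": "age", "op": "lt", "value": 18, "decision": "REJECT"},
  {"name": "refer_large_amount", "field": "amount", "op": "gt", "value": 10000, "decision": "REFER"}
]
"""

# Comparison operators that a rule may name (an assumption of this sketch).
OPERATORS = {
    "lt": lambda a, b: a < b,
    "gt": lambda a, b: a > b,
    "eq": lambda a, b: a == b,
}


def load_rules(text):
    """Parse the externalized rule definitions."""
    return json.loads(text)


def decide(request, rules, default="APPROVE"):
    """Return the decision of the first matching rule, else the default."""
    for rule in rules:
        compare = OPERATORS[rule["op"]]
        field = rule["field"]
        if field in request and compare(request[field], rule["value"]):
            return {"decision": rule["decision"], "rule": rule["name"]}
    return {"decision": default, "rule": None}


rules = load_rules(RULES_JSON)
print(decide({"age": 17, "amount": 500}, rules))
print(decide({"age": 40, "amount": 50000}, rules))
```

Any application that can load the same rule text gets the same decisions, and editing a threshold means editing the rule data only.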

Examples of BRMS

When working with large data sets or performing analytics, there is sometimes a need to call a rules engine. The decision engines above provide powerful decisioning capabilities for large data. Let us look at the architecture.

Integrating Spark with BRMS

IBM Operational Decision Manager (ODM) can be embedded in a MapReduce-style Spark application. All the features of ODM remain available, while Spark provides high scalability through cluster deployment. In a cluster deployment, multiple rule engines can run in parallel across multiple JVMs. The Spark and ODM integration architecture can be found here.
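The parallelism described above can be pictured with a small sketch: one engine instance per partition, reused for every request in that partition. RuleEngine here is a hypothetical stand-in with a toy rule, not the ODM API, and plain Python lists stand in for Spark partitions so the example runs without a cluster. With the Java or Scala API the engine would live in the executor JVM; Python is used only for brevity.

```python
class RuleEngine:
    """Hypothetical stand-in for a rule engine session that is costly to create."""

    instances_created = 0

    def __init__(self):
        RuleEngine.instances_created += 1

    def execute(self, request):
        # Toy rule, purely illustrative: approve amounts up to 10000.
        return {"id": request["id"], "approved": request["amount"] <= 10000}


def decide_partition(requests):
    """Create one engine for the partition and reuse it for each request."""
    engine = RuleEngine()
    for request in requests:
        yield engine.execute(request)


# With PySpark the same function could be passed to RDD.mapPartitions:
#   decisions_rdd = requests_rdd.mapPartitions(decide_partition)
# Here two plain lists play the role of two partitions.
partitions = [
    [{"id": 1, "amount": 500}, {"id": 2, "amount": 20000}],
    [{"id": 3, "amount": 9000}],
]
decisions = [d for part in partitions for d in decide_partition(iter(part))]
print(decisions)
```

Creating the engine once per partition rather than once per record is a common design choice when session setup is expensive; whether it applies to a given ODM deployment is an assumption to verify.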

Decisioning process from Spark

  • Read the decision requests into a Resilient Distributed Dataset (RDD) from a file or any other store.
  • Apply a map transformation to the request RDD, using a function that executes the decision service for each request.
  • Inside that function, either invoke the ODM REST service or establish a POJO session.
  • Create a decision RDD that holds the request and the response for each decision made.
  • Save the decision RDD to HDFS or another persistent store.

The response from the decision service can then be used for post-processing.
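The steps above can be sketched as a single map function. This is a minimal illustration under stated assumptions: call_decision_service is a hypothetical placeholder for the ODM REST call or POJO session, with a toy rule inside, and run_local uses a plain Python list in place of an RDD so the sketch runs without Spark.

```python
import json


def call_decision_service(request):
    """Hypothetical placeholder for the decision service call.

    A real job would invoke the ODM REST service or a POJO session here;
    this toy rule only approves amounts up to 10000.
    """
    return {"approved": request["amount"] <= 10000}


def decide(line):
    """Map function: one JSON request line in, one JSON decision line out."""
    request = json.loads(line)
    response = call_decision_service(request)
    # Keep request and response together so each decision can be traced.
    return json.dumps({"request": request, "response": response}, sort_keys=True)


def run_local(lines):
    """Apply the map function to an in-memory list instead of an RDD."""
    return [decide(line) for line in lines]


# With PySpark the same function could be used roughly as follows
# (input_path and output_path are placeholders):
#   sc.textFile(input_path).map(decide).saveAsTextFile(output_path)
for decision in run_local(['{"id": 1, "amount": 500}', '{"id": 2, "amount": 20000}']):
    print(decision)
```

Each output line carries both the request and the response, which is what makes the saved decisions usable for post-processing later.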