Real-Time Recommendations with Machine Learning in Clickstream Processing Pipeline

The Nussknacker Blog

Overview: Bridging Data Streams and E-commerce

While modern recommendation systems leverage vast datasets to predict preferences, seamlessly integrating these systems into existing platforms poses challenges. Businesses need real-time data processing and scalable machine learning solutions to meet these demands.

In this article, we demonstrate how Nussknacker.io, paired with Snowplow, a popular open-source event tracking system, can transform raw event data into actionable insights. Using Shopify as an example, we showcase how Nussknacker processes clickstreams and app events in real time. With Nussknacker’s MLflow component, pre-trained machine learning models can be integrated into the streaming process, enabling businesses to deploy custom models for dynamic, personalized recommendations.

By combining real-time data streams with advanced machine learning, Nussknacker helps businesses exceed customer expectations, driving engagement and boosting sales. Nussknacker Designer simplifies complex data processes, making them accessible even for teams without coding expertise or deep knowledge of streaming and Flink. As we delve into this use case, we’ll explore how Nussknacker, Snowplow, and custom machine learning models come together to elevate e-commerce personalization.

Use Case: Real-Time Recommendations

We aim to deliver real-time personalized product recommendations in Shopify by following these steps:

  • Capture User Interactions: Use the Snowplow tracker to collect clickstream data from the Shopify store and send it to the Nussknacker Cloud HTTP endpoint for processing.
  • Process Events: Parse incoming event data in Nussknacker, group products viewed by users within a specific time window, and prepare data for ML model input.
  • Generate Recommendations: Utilize the MLflow component in Nussknacker to feed grouped product data into a machine learning model for real-time recommendations.
  • Send Recommendations to Shopify: Use the Nussknacker HTTP component to update Shopify’s backend with personalized recommendations via the Shopify API.
  • Display Recommendations: Leverage the Shopify Storefront API and custom JavaScript in Liquid templates to fetch and render recommendations directly on product pages.

This use case demonstrates how Nussknacker efficiently handles real-time event data, integrates user-supplied machine learning models, and enhances e-commerce platforms like Shopify with personalized recommendations. With this overview in mind, let’s delve deeper into each step to see how we can implement this solution.

NOTE: If you want to recreate the entire scenario step by step, go to the full version of this blog post on our website.

Capturing Shopify Events with Snowplow

To kickstart our real-time recommendation system, we establish a reliable flow of user interaction data from our Shopify store to Nussknacker using Snowplow. Snowplow is a well-known solution for capturing user interactions, offering open-source trackers that we can easily leverage. We start with a newly created Shopify store, then configure an HTTP endpoint in our Nussknacker Cloud instance to receive events directly from the Snowplow tracker integrated into the store. This setup allows us to capture detailed user behavior in real time, providing the critical data needed to power our recommendation engine.

Stream Processing Made Easy with Nussknacker

With our Shopify store sending events to Nussknacker Cloud via Snowplow, we’re ready to harness Nussknacker’s capabilities to process these events. The events collected by the HTTP endpoint are stored in Kafka, which serves as the source for our Nussknacker scenario. Although we don’t utilize predefined Snowplow event schemas, Nussknacker’s flexible JSON parsing allows us to handle the incoming data effectively. We leverage standard Nussknacker components like Filter and Variable to parse and validate the events, extracting essential information such as user identifiers, viewed products, and timestamps. In addition, we include a debugging sink, one of several methods Nussknacker offers for this purpose, to monitor the scenario and confirm that it is working correctly.

This data processing setup ensures that only relevant data is passed to our machine learning model, laying the groundwork for generating accurate and personalized recommendations.
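
To make the parse-and-validate step more concrete, here is a minimal Python sketch of the kind of logic the Filter and Variable components express in the scenario. It is only an illustration, not the scenario itself: the field names `user_id`, `page_url`, and `timestamp` are assumptions made for this example and are not taken from the actual Snowplow payload, and we assume that product pages can be recognized by `/products/` in the URL.

```python
import json
from typing import Optional


def extract_product_view(raw: str) -> Optional[dict]:
    """Return user id, product slug, and timestamp, or None if the event is unusable.

    Field names ("user_id", "page_url", "timestamp") are hypothetical and
    chosen for illustration only.
    """
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed JSON is filtered out
    if not isinstance(event, dict):
        return None

    user_id = event.get("user_id")
    url = event.get("page_url")
    timestamp = event.get("timestamp")

    # Keep only complete events that point at a product page.
    if not user_id or timestamp is None or not isinstance(url, str):
        return None
    if "/products/" not in url:
        return None

    # Take the path segment after /products/ and drop any query string.
    slug = url.split("/products/", 1)[1].split("?", 1)[0].strip("/")
    if not slug:
        return None

    return {"user_id": user_id, "product_slug": slug, "timestamp": timestamp}
```

Events that fail any check are simply dropped, which mirrors the role of a filter in front of the aggregation and model steps.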

Incorporating Machine Learning for Dynamic Recommendations

Personalized product recommendations based on the products a user has viewed require a robust machine learning model capable of capturing both short-term and long-term user preferences. We chose SLi-Rec (Short-term and Long-term preference Integrated RECommender system), a deep learning-based framework designed to enhance personalized recommendations by modeling users’ sequential behavior. SLi-Rec uses self-attention mechanisms to effectively capture both long-term and short-term user preferences.

Key Features of SLi-Rec:

  • Self-Attention Mechanism: Identifies relationships between items in a user’s interaction sequence, enhancing the representation of user preferences.
  • Fusion of Long- and Short-Term Interests: Balances immediate trends with established patterns for better recommendation accuracy.
  • Efficient Sequential Modeling: Processes user interaction sequences dynamically, improving predictions for the next item of interest.

In the realm of recommender systems, two paradigms are most popular today: general recommenders and sequential recommenders.

General recommenders, such as factorization-based collaborative filtering methods, aim to learn users’ long-term preferences, which are presumed to be static or change slowly over time. While these systems can provide decent recommendations, they often fail to reflect users’ recent behaviors. They also require periodic retraining on collected historical data to account for any changes in users’ preferences.

Sequential recommenders, on the other hand, strive to capture the variability of user behaviors influenced by evolving interests, demands, or global trends. These recommenders operate on sequences of user actions, meaning any changes in preferences and the order of those actions both influence the provided recommendations. This attention to both short-term and long-term interests makes sequential recommenders superior in use cases requiring real-time product recommendations.

Given our goal of delivering real-time, personalized recommendations in our Shopify store, SLi-Rec’s ability to integrate both short-term and long-term user interests makes it an ideal choice. By leveraging SLi-Rec within our Nussknacker scenario, we can dynamically respond to users’ immediate behaviors while also considering their historical preferences, thereby enhancing the overall shopping experience.

A Word About Machine Learning In Nussknacker

To integrate our machine learning model into the Nussknacker processing pipeline, we utilize Nussknacker’s support for model inference through its MLflow component. This feature allows us to incorporate machine learning models directly into our streaming data flows, enabling real-time predictions.

Using the MLflow component, we select a specific model from the MLflow Model Registry. The MLflow Model Registry is a centralized repository that manages the lifecycle of ML models, providing versioning and easy deployment options. Within Nussknacker, we assign the required input parameters to the model, aligning it with the data we’ve extracted from our events.
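
For readers who have not worked with the Model Registry before, the sketch below shows how a registered model version is typically addressed and loaded with plain MLflow in Python. It is not a description of what Nussknacker does internally; the model name `shopify_slirec` and the helper function names are hypothetical, and the loading function needs the `mlflow` package and a configured tracking server to actually run.

```python
def registry_model_uri(name: str, version: int) -> str:
    """Build an MLflow Model Registry URI of the form models:/<name>/<version>."""
    return f"models:/{name}/{version}"


def load_registered_model(name: str, version: int):
    """Load a registered model as a generic Python function model.

    The import is kept local so that the URI helper above works even
    where mlflow is not installed.
    """
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(registry_model_uri(name, version))


# Hypothetical model name and version, for illustration only.
uri = registry_model_uri("shopify_slirec", 1)
```

Pinning an explicit version in the URI, rather than always taking the newest one, keeps a deployed scenario tied to a known model until someone deliberately changes it.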

During the scenario deployment to the Flink runtime environment, Nussknacker retrieves the selected model from the registry and deploys it within the Nussknacker ML runtime, a distributed execution environment optimized for Python-based machine learning tasks. This specialized runtime ensures models developed in Python integrate smoothly, providing both performance and scalability for continuous inference when using the model in stream processing.

The Flink job, orchestrated by Nussknacker Designer, communicates directly with the deployed model in the ML runtime to perform inference. With this setup, our streaming application delivers real-time predictions based on incoming data, maintaining low latency and enabling rapid, data-driven decision-making.

By leveraging Nussknacker’s MLflow component, we effectively bring our machine learning models into the streaming context, allowing us to provide dynamic, personalized recommendations to users as they interact with our Shopify store.

For a deeper dive into how this process works, you can read Łukasz Jędrzejewski’s blog post. In the section “MLflow model inference simplified with Nussknacker ML runtime”, he explains the integration in detail, showcasing how Nussknacker simplifies model inference in streaming applications.

Preparing and Registering the Recommendation Model

To utilize our selected recommendation model within Nussknacker, we first need to train it with the relevant data. We’ve provided a comprehensive Jupyter notebook and detailed instructions to guide you through setting up your environment on Azure Databricks, training the model, and registering it in MLflow. Once the model is trained and registered, it’s ready to be integrated into our Nussknacker scenario. This integration enables us to turn aggregated user interactions into real-time personalized product recommendations, leveraging the power of machine learning.

Using Nussknacker ML Component in Real-Time Data Processing

With the trained model prepared, we focus on embedding it into our Nussknacker scenario. The integration involves combining real-time data aggregation with machine learning inference to create a dynamic and responsive system.

First, we group product interactions for each user using Nussknacker’s Sliding Window aggregation component. This step collects product views of a single user over a defined time period (e.g., five minutes) and organizes them into structured data, including product slugs and timestamps. These aggregated events represent the user’s recent interactions, which serve as the input for the recommendation model.
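
The idea behind this aggregation can be sketched in a few lines of Python. This is a simplified, single-process illustration and not how the Sliding Window component is implemented on Flink; the class name is hypothetical, timestamps are assumed to be in seconds, and events for a given user are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class SlidingWindowViews:
    """Keeps each user's product views from the last `window_seconds`."""

    def __init__(self, window_seconds: int = 300) -> None:
        self.window_seconds = window_seconds
        # One queue of (timestamp, product_slug) pairs per user.
        self._views: Dict[str, Deque[Tuple[int, str]]] = defaultdict(deque)

    def add(self, user_id: str, product_slug: str, timestamp: int) -> Dict[str, List]:
        """Record a view and return the user's current window contents."""
        views = self._views[user_id]
        views.append((timestamp, product_slug))

        # Drop views that fell out of the window ending at the newest event.
        cutoff = timestamp - self.window_seconds
        while views and views[0][0] <= cutoff:
            views.popleft()

        return {
            "product_slugs": [slug for _, slug in views],
            "timestamps": [ts for ts, _ in views],
        }
```

With a five-minute window, views by the same user at seconds 0, 100, and 350 leave only the last two in the window after the third event, because the first one is more than 300 seconds old by then.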

Next, we integrate the machine learning model using the MLflow component. This allows us to seamlessly connect to the model hosted in the MLflow registry (in our case, Azure Databricks) and utilize the model’s signature to correctly assign the aggregated data variables as inputs. During runtime, the model processes this input to generate personalized product recommendations in real time.

This powerful setup demonstrates how Nussknacker simplifies the integration of machine learning models into streaming data processes, enabling us to deliver a more engaging shopping experience to our customers. In the next step, we’ll bring these recommendations full circle by sending them back to Shopify for display, completing the loop from data collection to customer engagement.

Sending Recommendations Back to Shopify

With the machine learning model integrated into our Nussknacker scenario, we now turn our attention to sending personalized recommendations back to Shopify. This step ensures that the insights generated by our scenario are made accessible to users, enhancing their shopping experience.

Before sending the recommendations, we refine the model’s output through a post-processing step. By limiting the number of recommended products to a maximum of 10, we ensure the suggestions remain concise and relevant. This highlights Nussknacker’s flexibility in adapting raw model inference results to meet specific business needs.
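
As a rough illustration of this post-processing step, the Python sketch below keeps the ten highest-scoring products. The shape of the model output, a list of (product slug, score) pairs, is an assumption made for this example; the actual output of the model may look different.

```python
from typing import List, Tuple

MAX_RECOMMENDATIONS = 10


def limit_recommendations(
    scored: List[Tuple[str, float]], limit: int = MAX_RECOMMENDATIONS
) -> List[str]:
    """Return at most `limit` product slugs, highest score first."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [slug for slug, _ in ranked[:limit]]
```

If the model returns fewer than ten products, the function returns all of them in ranked order.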

To handle the transmission of recommendations, we utilize Nussknacker’s versatile HTTP component. This general-purpose tool seamlessly integrates with Shopify’s Admin API, even though the API relies on GraphQL. The HTTP component provides a straightforward and efficient way to communicate with Shopify’s backend, showcasing its adaptability for various integration scenarios.

While the recommendations need to be stored for retrieval, the storage backend could be any database or system that fits the business requirements. For this use case, we chose Shopify’s Metaobjects to simplify the setup. These Metaobjects associate the recommended products with a user ID, making them easy to manage and retrieve. However, this approach could be replaced with other storage solutions depending on the desired architecture.
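
Because a GraphQL call is an ordinary HTTP POST with a JSON body, a general-purpose HTTP client is enough to talk to the Admin API. The Python sketch below builds such a request without sending it. It rests on several assumptions: we use the `metaobjectUpsert` mutation here for illustration, and the article does not state which mutation the scenario actually calls. The Metaobject type `recommendation`, the field key `products`, the `user-<id>` handle format, and the function name are all hypothetical. Please check the mutation shape against the current Shopify Admin API reference before relying on it.

```python
import json
import urllib.request
from typing import List

# Mutation text as we understand the Admin GraphQL API; verify before use.
UPSERT_MUTATION = """
mutation UpsertRecommendations($handle: MetaobjectHandleInput!, $metaobject: MetaobjectUpsertInput!) {
  metaobjectUpsert(handle: $handle, metaobject: $metaobject) {
    metaobject { id handle }
    userErrors { field message }
  }
}
"""


def build_upsert_request(
    shop_domain: str,
    api_version: str,
    access_token: str,
    user_id: str,
    product_slugs: List[str],
) -> urllib.request.Request:
    """Build (but do not send) a POST request storing recommendations for one user."""
    variables = {
        # Hypothetical Metaobject type and handle format.
        "handle": {"type": "recommendation", "handle": f"user-{user_id}"},
        # Hypothetical field key; the list is stored as a JSON string.
        "metaobject": {
            "fields": [{"key": "products", "value": json.dumps(product_slugs)}]
        },
    }
    body = json.dumps({"query": UPSERT_MUTATION, "variables": variables})
    return urllib.request.Request(
        url=f"https://{shop_domain}/admin/api/{api_version}/graphql.json",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Shopify-Access-Token": access_token,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. An upsert keyed by user means each new set of recommendations overwrites the previous one for that user, so the store always holds one current entry per user.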

With this, we have completed the technical flow: capturing user interactions, generating personalized recommendations using a machine learning model, and sending them back to Shopify for storage. This integration demonstrates the full potential of combining Nussknacker’s streaming capabilities with Shopify’s API to deliver real-time personalization.

The next step is to make these recommendations visible to customers. In the upcoming section, we’ll focus on how to display the recommendations within your Shopify store.

Rendering Recommendations in the Shopify Store

The final step in our journey is to display personalized recommendations directly within the Shopify storefront. By injecting custom JavaScript into a Liquid template of the Shopify theme, we can dynamically fetch and render recommendations using Shopify’s Storefront API.

This approach is chosen for its simplicity, making it ideal for showcasing the integration. While it introduces a small delay due to Shopify’s metastore refresh, this lag is entirely on Shopify’s side and not related to Nussknacker’s real-time processing capabilities.

For production use cases, alternative methods or platforms might be preferred for storing and fetching recommendations to minimize delay. However, for the purposes of this demonstration, this method effectively highlights how Nussknacker can integrate real-time machine learning insights with Shopify to deliver personalized user experiences.

Benefits of an Integrated Streaming Solution

Nussknacker empowers businesses to enhance e-commerce with real-time data processing and machine learning. By integrating easily with tools like Snowplow and MLflow, it offers a seamless way to handle complex data streams and deliver actionable insights.

Seamless Integration with Clickstream Data

Nussknacker captures and processes clickstream data from Snowplow, simplifying data collection and enabling businesses to focus on insights instead of infrastructure.

Real-Time Machine Learning

By allowing the integration of pre-trained models from platforms like Databricks, Nussknacker makes it simple to incorporate externally trained and managed models, enabling instant predictions and real-time personalization.

Flexible and Intuitive

Robust tools for data manipulation and a low-code interface allow businesses to adapt scenarios quickly, focusing on strategy rather than technical details.

Real-Time Recommendations

Nussknacker enables instant, personalized recommendations, improving customer engagement and driving conversions.

Conclusion: Transforming E-commerce with Real-Time Streaming and ML

Real-time data processing and machine learning are now essential for e-commerce success. Nussknacker simplifies this integration, enabling businesses to:

  • Capture Insights Seamlessly: Easily process user interactions using Snowplow.
  • Deliver Instant Predictions: Use real-time ML for personalization without complex infrastructure.
  • Adapt Quickly: Modify and scale scenarios to meet business needs.

With Nussknacker, businesses can act on data instantly, reduce development complexity, and enhance customer experiences. It provides a powerful foundation for integrating real-time streaming and machine learning into modern e-commerce platforms.

Thank you for reading! We hope this inspires you to explore Nussknacker for your real-time data processing and machine learning needs. Feel free to contact us if you have any questions or need support.
