The New Stack Podcast

Hazelcast and the Benefits of Real Time Data

Episode Notes

In this latest podcast from The New Stack, we interview Manish Devgan, chief product officer for Hazelcast, which offers a real time stream processing engine. This interview was recorded at KubeCon+CloudNativeCon, held last October in Detroit.

 

"'Real time' means different things to different people, but it's really a business term," Devgan explained. In the business world, time is money, and the more quickly you can make a decision, using the right data, the more quickly one can take action.

 

Although we have many "batch-processing" systems, the data itself rarely comes in batches, Devgan said. "A lot of times I hear from customers that are using a batch system, because those are the things which are available at that time.

 

But data is created in real time: sensors, your machines, espionage data, or even customer data, right when customers are transacting with you."

 

What is a Real Time Data Processing Engine?

 

A real time data processing engine can analyze data as it is coming in from the source. This is different from traditional approaches that store the data first, then analyze it later. Bank loans are one example of where this difference matters.

 

With a real time data processing engine in place, a bank can offer a loan to a customer using an automated teller machine (ATM) in real time, Devgan suggested.  "As the data comes in, you can actually take action based on context of the data," he argued.

 

Such a loan app may combine real time data from the customer with historical data stored in a traditional database. Hazelcast can join historical data with real time data to make workloads like this possible.
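To make the pattern concrete, here is a minimal sketch, not taken from the interview, of a Hazelcast stream processing pipeline written in Java. It reads a hypothetical Kafka topic of ATM transactions (data in motion) and enriches each event with a stored credit score held in a Hazelcast map (data at rest); the topic name, map name, and credit-score threshold are assumptions for illustration, and the Kafka connector module must be on the classpath.

import java.util.Map;
import java.util.Properties;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.kafka.KafkaSources;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;

public class LoanOfferPipeline {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Hypothetical Kafka connection details for the incoming ATM transactions.
        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "localhost:9092");
        kafkaProps.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.LongDeserializer");
        kafkaProps.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.DoubleDeserializer");

        Pipeline pipeline = Pipeline.create();
        pipeline
                // Data in motion: customer id -> withdrawal amount, read from Kafka.
                .readFrom(KafkaSources.<Long, Double>kafka(kafkaProps, "atm-transactions"))
                .withoutTimestamps()
                // Data at rest: look up the customer's stored credit score in a map
                // and decide, per event, whether to make a real time loan offer.
                .mapUsingIMap("credit-scores",
                        (Map.Entry<Long, Double> txn) -> txn.getKey(),
                        (Map.Entry<Long, Double> txn, Integer score) ->
                                score != null && score > 700
                                        ? "offer a loan to customer " + txn.getKey()
                                        : "no offer for customer " + txn.getKey())
                .writeTo(Sinks.logger());

        hz.getJet().newJob(pipeline).join();
    }
}

In a real deployment the enrichment map would be kept in sync with the bank's system of record; the point is that the lookup happens inline, per event, while the transaction is still in flight.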

 

In this interview, we also discussed the merits of Kafka, the benefits of using a managed service rather than running an application in house, Hazelcast's users, and features in the latest release of the Hazelcast platform.

Episode Transcription

Colleen Coll  0:08  

Welcome to this special edition of The New Stack Makers: On the Road. We're here at KubeCon North America with discussions from the show floor, where technologists give you their expertise and insights to help you with your everyday work.

 

Joab Jackson  0:27  

Hello. In this latest edition of The New Stack Makers podcast, we're going to find out everything we need to know about Hazelcast, a real time data processing solution. With us we have Manish Devgan. He is the chief product officer at Hazelcast. Welcome.

 

Manish Devgan  0:51  

Thank you. Thank you for having me.

 

Joab Jackson  0:52  

First of all, what does the Chief Product Officer do?

 

Manish Devgan  0:57  

I run products. So basically deciding on the product strategy, and making sure that we are delivering solutions to our customers and solving their problems.

 

Joab Jackson  1:09  

So what is Hazelcast?

 

Manish Devgan  1:12  

Hazelcast is a real time stream processing engine. And we have a platform which allows you to basically power the real time economy. There are a lot of use cases which we go after where, you know, time is money. And it's very important for these customers to take real time actions on insights they're gathering on fresh data. What is real time data? It's interesting, real time means different things to different people, but it's really a business term. It basically is, time is money, you know, for certain customers. 30 milliseconds is something which is real time. For others, one second is good enough. It's really a business term about whether you want to take action on insights in that moment which is important for that business.

 

Joab Jackson  2:03  

So the data itself, in this real time scenario, where does the data come from?

 

Manish Devgan  2:11  

Well, the data is coming in as streams. And data is never created in batches. I mean, a lot of times I hear from customers that they are using a batch system, because those are the things which were available at that time. But data is created in real time: sensors, your machines, espionage data, or even customer data, right when customers are transacting with you. They're transacting in real time. So the transactional data is being created in real time.

 

Joab Jackson  2:39  

So I imagine that your customers have many sources of real time data. What does a processing engine do?

 

Manish Devgan  2:49  

Yeah, so we are able to analyze data as it's coming in. With a lot of the solutions in the market, you know, you actually have to store incoming data first and then analyze it. But here, you can actually analyze as the data is coming in. So you can analyze data in motion as well as data at rest. It's really a solution which allows you to analyze streaming data, even before it may be persisted. A typical use case might be, you know, a bank is using our solution at an ATM, where customers are transacting, and the bank is able to offer them real time loans. And they've seen that if they're able to actually meet a customer's expectation in real time, in that window of opportunity, they're able to increase their loan origination by 400%. That happens because you're able to interact with your customer in real time, while they're shopping or when they're banking. But could you do that with a database? You can do that in a database, but not in that moment, right? You would have to first store that stream coming in and then analyze it. Here you're able to take actions at a much richer level, in the sense that you have customer data coming in and you have to look at customer history, going back to the example about real time offers. As the data comes in, you can actually take action based on the context of the data which may be stored. So you're able to join data in motion and data at rest. Right now a customer may be transacting at an ATM machine, but you may want to look up the history of the customer, which may be stored, you know, to provide that context for you to generate a loan. And that's one of the things which we recently announced: the ability to do a join between data in motion and data at rest.
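As a rough illustration of the join Devgan describes here, the sketch below uses Hazelcast SQL from Java to join a streaming source with a stored map; the mapping names, columns, and connection options are assumptions for the example rather than details from the interview.

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.sql.SqlResult;
import com.hazelcast.sql.SqlRow;
import com.hazelcast.sql.SqlService;

public class StreamJoinSketch {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        SqlService sql = hz.getSql();

        // Data in motion: expose a hypothetical Kafka topic of ATM transactions to SQL.
        sql.execute(
                "CREATE MAPPING atm_transactions (customer_id BIGINT, amount DECIMAL) "
              + "TYPE Kafka "
              + "OPTIONS ('keyFormat' = 'bigint', 'valueFormat' = 'json-flat', "
              + "'bootstrap.servers' = 'localhost:9092')");

        // Data at rest: expose a Hazelcast map holding stored customer history.
        sql.execute(
                "CREATE MAPPING customers (__key BIGINT, name VARCHAR, credit_score INT) "
              + "TYPE IMap "
              + "OPTIONS ('keyFormat' = 'bigint', 'valueFormat' = 'json-flat')");

        // Join each incoming event with the stored record to add context in the moment.
        try (SqlResult result = sql.execute(
                "SELECT t.customer_id, c.name, c.credit_score, t.amount "
              + "FROM atm_transactions AS t "
              + "JOIN customers AS c ON c.__key = t.customer_id")) {
            for (SqlRow row : result) {
                System.out.println(row.getObject("name") + " is transacting " + row.getObject("amount"));
            }
        }
    }
}

Whether it is expressed in SQL or through the pipeline API, the effect is the same: the stored record supplies context for each event while the event is still fresh.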

 

Joab Jackson  4:50  

So the data at rest could be from any database or data warehouse or anything like that?

 

Manish Devgan  4:58  

Yeah, it could be in any database. But for you to make that decision in that 30 milliseconds, while the customer is banking or while that customer is transacting, you need high speed, low latency access to data. A technology which allows you to not only process streams but also do context enrichment based on the data which is stored is gold, right? That's where you can really provide value to your customer.

 

Joab Jackson  5:26  

I know with a standard database, I get all the SQL commands. And if I want to delve more, I can write a prepared statement. What is the range of commands I could use with Hazelcast?

 

Manish Devgan  5:37  

We are SQL-first by design. We make sure that developers who are building solutions on top of the platform are able to use SQL for both data in motion and data at rest. So we support SQL for all operations, including creating data, updating data, and querying as well.

 

Joab Jackson  5:55  

What sort of tools do you have for developers?

 

Manish Devgan  5:58  

We have APIs. Developers typically use APIs. I mean, as a developer you could use SQL, but also you could use Java; we have C++, C#, and many other languages supported directly for accessing streaming data.
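For readers who have not used the clients he mentions, here is a minimal, hypothetical Java client example; the cluster address and map name are placeholders, and the C++, C#, and other clients follow a similar shape.

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class HazelcastClientSketch {
    public static void main(String[] args) {
        // Point the client at a running cluster; "dev" is the default cluster name.
        ClientConfig config = new ClientConfig();
        config.setClusterName("dev");
        config.getNetworkConfig().addAddress("127.0.0.1:5701");

        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);

        // Read and write stored data through the map API; SQL queries can be run
        // against the same cluster via client.getSql().
        IMap<Long, Integer> creditScores = client.getMap("credit-scores");
        creditScores.put(42L, 715);
        System.out.println("score for customer 42 = " + creditScores.get(42L));

        client.shutdown();
    }
}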

 

Joab Jackson  6:14  

Now, when I hear the term streaming data, I hear quite a bit about Kafka, and there are various Kafka offerings saying, well, we can just handle streaming data. In what ways is Hazelcast superior to a Kafka solution?

 

Manish Devgan  6:28  

A lot of our customers use Kafka to ingest data into the platform. Where we shine is when you want to do streaming analytics, when you want to do analytics on that streaming data. And that's where they use us in conjunction with Kafka. That's kind of where we interact with Kafka.

 

Joab Jackson  6:46  

So in this case, you're just using Kafka to bring the data in?

 

Manish Devgan  6:51  

Yes, bringing data in, or basically moving data around. When you're talking about taking action on the data streaming in, that's where they use our solution to do stream processing, and then context enrichment from the data we store in our platform.

 

Joab Jackson  7:06  

This is all happening in working memory?

 

Manish Devgan  7:09  

We're memory first. So when the latency requires that you store data in memory, then we do that. But we also have a tiered storage model, where you can decide on using things like NVMe SSDs, so you can actually store data in a tiered way. That way, you can decide whether the hot data resides in memory while the not-so-hot data may reside in another tier. So we provide flexibility in how to store data. And that's another recent enhancement: in release 5.2, we basically have tiered storage available. So you can actually store the data in a hybrid way, in memory as well as on disk or SSD, that is, NVMe.

 

Joab Jackson  7:57  

So we talked about the developer experience. What does this look like for the admin who has to set up, maintain, and run it?

 

Manish Devgan  8:05  

A lot of our customers use the platform in a more self hosted way; we run on Kubernetes. So we actually have a discovery plugin and operators for people who are managing the cluster to use. We also recently announced a serverless offering, which is a managed service, a cloud managed service. And that basically takes the operational complexity away from you. So now our customers don't have to worry about setting up clusters or dynamically scaling up or shrinking; we do that automatically. You basically are working on your business use case on top of our platform. So if you're worried about the operational complexity, or this heavy lifting you have to do, you can basically use our managed service. Our managed service is called Viridian. That's a service which you can sign up for and start to work on your solution.

 

Joab Jackson  8:56  

And so how does that work? Is it all done through APIs?

 

Manish Devgan  9:00  

It's all done through APIs. So basically, you know, with one click you start the cluster. And then you interact with the cluster using an API or even SQL, right? Java APIs or SQL.

 

Joab Jackson  9:14  

How am I charged for that? Is it monthly, or is it by usage?

 

Manish Devgan  9:19  

We have different pricing models for that. We have a serverless tier, which is a free tier, where we actually give you two gigabytes of storage free. But as you move further, you pay per gigabyte. And then we are also working on a metric of request units, where you basically get charged as you provision more data and as you provision more servers.

 

Joab Jackson  9:44  

The two gigs, is that working memory?

 

Manish Devgan  9:47  

Yeah, that's working memory. Yeah.

 

Joab Jackson  9:49  

So for the self hosted option, do you provide the hardware? Do I buy the hardware?

 

Manish Devgan  9:55  

Yeah, we are basically a software company. So you can run on commodity hardware, or you can run on Azure or GCP or AWS; it really depends on what cloud you want to run on. But we provide you the hooks you need to run our platform, if you're running it self hosted.

 

Joab Jackson  10:13  

What sorts of companies or industries use Hazelcast the most?

 

Manish Devgan  10:20  

Yeah, so we are primarily in three verticals, I would say. One is definitely financial services. So a lot of the big banks or the credit card companies; you can imagine the use cases are around fast data access and stream processing. It is about credit card fraud, and transaction and payment processing. The second big industry is retail and e-commerce. And the third is health services. So healthcare is becoming the third biggest vertical for us.

 

Joab Jackson  10:50  

Healthcare? They need real time data?

 

Manish Devgan  10:53  

Real time data, real time notifications. A lot of the use cases we have are around monitoring of devices, monitoring patient beds, expensive equipment which they want to monitor. And also just in basic health care, you know, you need real time notifications for the doctors, you know, for providing the best patient care.

 

Joab Jackson  11:13  

Now, when I hear "in memory," I think it's expensive.

 

Manish Devgan  11:17  

You know, actually the hardware has changed quite a bit in the last few years. And that's one of the reasons why we have invested in tiered storage, where, you know, you can store data and have predictable low latency in an SSD environment, right? So we can be really cost optimized for your use case. If your use case requires sub-millisecond, we can actually provide that on an SSD based mechanism. If you're looking at microseconds or a single millisecond, then yeah, you have to go to a full in memory kind of solution. So there is flexibility there. And that's how we optimize.

 

Joab Jackson  11:53  

The enterprise contract, is that on a per year basis, or per usage?

 

Manish Devgan  11:59  

We basically charge pay as you go, or pay as you provision; it's a fairly flexible model there. But for certain use cases, we also have custom pricing. So based on your use case, you know, we kind of price that out for you.

 

Joab Jackson  12:11  

Are there any other aspects of the recent release you would want to highlight for us?

 

Manish Devgan  12:16  

Yeah, I think the recent release is really big for us, because we are inching towards becoming the system of record for all real time applications. The tiered storage is one step in that direction, combining streaming with storage. And that's becoming very, very fundamental for any kind of real time application. Because, you know, data is always created in streams; it's not created in batches. So we are trying to bring together streaming and storage, compute and storage, to be part of the data platform.

 

Joab Jackson  12:47  

With streaming data, is that different for the developers? Are there concepts that they have to think about?

 

Manish Devgan  12:54  

They have to think about, you know, if data is being generated in a stream, I should be able to analyze that in streams as well. You don't have to store the data and then use a batch system to process it. So that's, I think, a fundamental shift in the developer mindset: I am ingesting data fast, and actually, I can analyze and take action on the data fast as well. So that's the fundamental shift. But I think as far as the programmatic APIs go, it's SQL, but

 

Joab Jackson  13:23  

You're given a time window. Yes. Anything else about the 5.2 release you want to share?

 

Manish Devgan  13:29  

The only thing I would say is that if you want to try out our next release, you know, go to Hazelcast and look at the Viridian option. You can sign up for free, kick-start the platform, and see the benefits you get from real time data processing.

 

Joab Jackson  13:46  

Okay. All right. Well, thank you so much for taking the time to get us up to speed. And thank you, listeners and viewers, for checking in. This concludes another episode of The New Stack podcast.

 

Alex Williams  14:00  

Thanks for listening. If you liked the show, please rate and review us on Apple Podcasts, Spotify, or wherever you get your podcasts. That's one of the best ways you can help us grow this community, and we really appreciate your feedback. You can find the full video version of this episode on YouTube. Search for The New Stack, and don't forget to subscribe so you never miss any new videos. Thanks for joining us, and see you soon.

 

Transcribed by https://otter.ai