In this episode
What’s next for a technology that delivers near-perfect results? Step up and aim for The Next Big Think! This is exactly what Narrativa, a TCS COIN™ Accelerator startup that produces automated content for various industries, aspires to do with natural language generation (NLG).
David Llorente, the founder and CEO of the company, explains how NLG-enabled content is almost perfect, and how the next step is to write more complex stories in less time. In this episode of The Next Big Think! witness how David's vision of 'a new way to create content at scale' comes alive and goes on to make businesses more productive and future-ready. Tune in to realize the power of NLG, a technology that can not only analyze complex, non-structured data but can also work alongside other industries to infuse more transparency into governmental activities and build better societies.
Kevin Benedict: Welcome to The Next Big Think! In this podcast, we give a shout out to the future. I'm your host, Kevin Benedict, a partner and futurist here at TCS. And I want to thank each of you for listening today. My guest today is David Llorente, he's the CEO and co-founder of Narrativa. David, thank you for joining us today.
David Llorente: Hey, Kevin, thanks for having me.
Kevin Benedict: So this weekend, we're celebrating our Labor Day, kind of the celebration of the end of summer. Now I see from your profile and your background, David, that you've lived all over the world. So how do you pick which holidays to keep?
David Llorente: Well, actually, I haven't had a proper holiday for many years. But I try to find places from my childhood that help me relax and, you know, be outdoors. That's the main rule. So I spent time in Spain and Estonia this summer, as you said.
Kevin Benedict: That's a beautiful part of the world. So, you've… you've experienced living in at least six different countries? How has that experience helped you as a CEO of Narrativa?
David Llorente: Well, in general, it helped me to understand people and different cultures better, and also to understand that there are many different answers to the same question. That was very important for me, because obviously Narrativa is a multicultural team, not to mention that we have customers around the globe. So it helps me understand how to interact with different cultures, and not to judge people or cultures, but to try to understand the reasoning behind certain habits and so on. It's been interesting for me, and it definitely helped me build my character.
Kevin Benedict: You know, when I first started traveling internationally, David, I thought it was so fascinating. My experience as a young person up to that point had only been in the US, before I started traveling all over the globe professionally. And then you see how the same challenge, even something as simple as traffic flow, is handled differently all over the world. It broadens your mind about how the same problem has many different solutions.
David Llorente: That's right. That's right. I'm also happy to understand that some things might be a problem in certain countries, but not in other countries. Right. That's very interesting, yeah.
Kevin Benedict: Absolutely. So I imagine, you know, having traveled everywhere, lived in so many different places, and now having offices in so many places as well, that the different languages must play a role in helping you better understand the whole process of natural language generation. How do you find that experience helpful?
David Llorente: Yeah, well, language helps us understand, first, that having a common working language, which is English in our case, is super useful. But living abroad also helped me understand how important local languages are. We think of English as today's lingua franca, but there are many, many other languages, and if you actually pay attention to those languages, you'll discover many interesting markets for natural language generation.
Kevin Benedict: So before we get further into our podcast today, David, let's just define NLG, or natural language generation, because we're going to be using that phrase a lot today. So, what's your definition? What does it mean to somebody who's not familiar with this space?
David Llorente: Yeah, well, basically, natural language generation is a subcategory of natural language processing. So when we talk about natural language generation, we are always also talking about natural language processing. What's specific about natural language generation is that the goal of the models and systems we build is to actually generate language, natural language. That's the difference, because in natural language processing you can use models to extract entities, classify text, find synonyms, or do paraphrasing. But by using several of these technologies and models together, you actually achieve natural language generation.
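David's point, that NLG emerges from chaining several NLP capabilities together, can be pictured with a deliberately tiny sketch. Everything here (the regexes, the company name, the sentence pattern) is invented for illustration; it is not Narrativa's actual pipeline:

```python
import re

def extract_entities(text):
    """Toy NLP step: pull a company name and numbers out of raw text."""
    companies = re.findall(r"\b[A-Z][a-z]+Corp\b", text)
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    return companies, numbers

def generate_sentence(company, pct):
    """Toy NLG step: realize the extracted facts as fluent text."""
    direction = "rose" if pct >= 0 else "fell"
    return f"{company} shares {direction} {abs(pct):.1f}% today."

# Chaining an extraction model and a realization step = minimal NLG.
companies, numbers = extract_entities("AcmeCorp 3.2")
sentence = generate_sentence(companies[0], numbers[0])
print(sentence)  # AcmeCorp shares rose 3.2% today.
```

The real systems David describes replace both toy functions with learned models, but the shape, extraction feeding generation, is the same.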
Kevin Benedict: Thank you for filling us in there, because I don't want to, you know, bury people in acronyms that they're unfamiliar with. Let's talk about what was your original vision? Where did that vision come from for creating Narrativa? And has that vision changed over the years?
David Llorente: So, I wanted to develop a new way to create content at scale, because if you think about how much content is available right now, it's crazy. And how much data... it's just amazing, the amount of data and content produced every day. If you put these two things together, the data, to be able to actually generate stories from it, and the content, to be able to teach or train your systems, your AI, to write, then you have the combination. From the technological point of view, everything is there; the mix is there. But obviously the market is still taking notice. It's a big change, a substantial change in how processes are carried out and how companies organize roles. So the market is still adapting. But definitely my vision was to find and develop a new way to create content.
Kevin Benedict: Can you give us before we move on, David? Can you give us just a handful of examples?
David Llorente: Yeah, well, we started in media, basically; we started generating news automatically, and we still do. For example, the Wall Street Journal is using our content. The Boston Globe. Outside of the US, we have customers in Dubai, three or four customers in Spain, and we even have customers in Saudi Arabia. We create around 1.2 million articles every month, covering many different kinds of stories. But we also do customer communications and newsletters, we automate clinical study reports, we automate financial reports, and we do product descriptions for e-commerce. So think about any area where you actually have data and a large need to create content; this is where Narrativa steps in and helps those companies.
Kevin Benedict: So the sweet spot, if I understand it right, David, and correct me if I'm wrong, is data-rich environments where there's a lot of data coming in very quickly, and there's value in getting that raw data out there within an understood context. Is that the sweet spot?
David Llorente: Correct. You know, think about this: we didn't invent the need for content; everybody was already creating it. That doesn't change. What changes is the way the content is created. Any industry that is currently creating content now has a way to do it much faster, at scale, and definitely much cheaper than doing it manually. When you generate content manually, you have to think: is this content going to be interesting? How many readers will it have? What impact will it make? Now there is a new way. You no longer have to think so much about how many readers a piece of content will reach, because you can create a piece of content for one person; maybe a story is interesting only for you. It's done automatically, at scale. This changes everything about content creation; the content strategy changes dramatically. Take long-tail content for media: usually nobody created stories manually for very specific niches with low value. Now it's different; you can do it. But it's not fair to say it's only about the long tail. We're also talking about premium content. Think about clinical study reports for pharma companies. This is the document you create to send to the FDA to get your drug or vaccine approved. That's not long-tail content. But by doing it automatically, you can do it much faster, because you create the first draft for the medical writers. Second, you can reduce the need for reviews, because the machine is not going to make mistakes with the data; machines are good with data.
And third, you can free up the medical writers' time to actually perform all the other activities related to submitting the documentation. In the end, it's a win-win situation for everybody. So it helps with the long tail, but also with content that is critical for organizations.
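One way to picture "content at scale" and machine-accurate first drafts is a realization function looped over structured records, one draft per row. This is a toy sketch; the field names and trial figures are invented for illustration, not Narrativa's real clinical-report system:

```python
def draft_report(row):
    # Deterministic realization: the numbers in the text always match the
    # data, which is why the machine "doesn't make mistakes with the data".
    return (f"In the {row['arm']} arm, {row['n']} participants were enrolled; "
            f"{row['events']} adverse events were reported "
            f"({100 * row['events'] / row['n']:.1f}% of participants).")

trial_data = [
    {"arm": "treatment", "n": 200, "events": 12},
    {"arm": "placebo",   "n": 198, "events": 9},
]

# One draft per record: the same function scales to a million rows a month.
drafts = [draft_report(r) for r in trial_data]
for d in drafts:
    print(d)
```

Because the text is computed from the data, a reviewer only needs to check the wording once, not the figures in every draft.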
Kevin Benedict: That's fascinating, David. So walk us through some of the differences in the technologies involved in natural language understanding, natural language generation, and natural language processing, at as high a level as you can for us non-techies. Just walk us through what technologies make that all possible.
David Llorente: Okay. So, we use different technologies, but the main ones are machine learning and deep learning. Why? Traditionally, natural language generation was done in a very programmatic way. I think everybody remembers the templates you could build in Microsoft Word: you build a template and then you fill the gaps with data. That's it. That's a programmatic approach, and it requires you to know all the possible scenarios in advance, because you need to program each of them. The difference with our approach, using deep learning and machine learning, is that we teach the system how to write by ingesting hundreds of thousands, or millions, of articles, documents, and customer communications. The system learns the style. For that we do clustering and classification of sentences, and the system understands the content of those sentences by extracting entities within an ontology. All of these are machine learning and deep learning models that help us teach the system how to write. And then there is the other aspect, data analysis, because the machine alone may know how to write but not what to write. The best example is GPT-3, the language model OpenAI released about a year ago. It's amazing how it generates content. But the problem is that while it can maintain a conversation, if you give the model a spreadsheet full of numbers, it's not going to be able to write a report, because it doesn't understand the context the way older approaches do. So in the last stage, everything is about data.
We need to understand the data, and for that we analyze it. We also use rules and statistical models, but again, it's mainly machine learning and deep learning.
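The "programmatic" template approach David contrasts with his own can be sketched in a few lines, and its limitation is visible immediately: every scenario (gain, loss, no change) must be enumerated by hand before the system can write about it. The domain and field names are invented for illustration:

```python
def template_nlg(data):
    """Old-style templated generation: each scenario is hand-programmed.

    A scenario nobody anticipated simply cannot be expressed, which is
    the limitation learned, ML-based generation removes.
    """
    change = data["change"]
    if change > 0:
        return f"{data['team']} gained {change} points."
    if change < 0:
        return f"{data['team']} dropped {abs(change)} points."
    return f"{data['team']} held steady."

print(template_nlg({"team": "Madrid", "change": 3}))  # Madrid gained 3 points.
```

A learned system instead induces these patterns, and many more, from example texts, rather than having a programmer enumerate them.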
Kevin Benedict: That is fascinating. So when you look at everything involved in being able to produce this natural language generation, what really are the hardest components, the hardest things to achieve?
David Llorente: Well, the hardest thing is to make sure that the machine is actually coherent and understands the context. That's very hard, because you need to make sure the output is consistent, that the system always behaves in a similar way. When I say behaves, I mean the output always follows the same structure. That's very important, because you don't want many different styles or structures in the content; it really doesn't help the readers. And the context is probably the hardest part. You need an ontology behind the domain for the system, for the AI, to really understand the context. Building that ontology is incredibly hard. Again, we use machine learning to help us build these ontologies, but they change a lot from one domain to another, and in some domains, like clinical study reports for pharma, these ontologies are really large and very complex. This is where the AI needs the most help.
Kevin Benedict: So, when we're talking about NLG, what are the outputs that you're typically generating? Are they mostly text, mostly article? Or can they be used to populate a chat bot conversation, or to generate synthetic speech or animation or avatars? You know, can it be used in all those areas?
David Llorente: All of those areas. We generate text, but we also generate graphics. And obviously, if you want to generate voice, the first step is to generate text. The latest thing we're doing, and something you didn't mention, is using our technology to generate synthetic datasets. Why? Certain industries, like pharma, but also finance, or even military and security, are not very keen to share data to train AI models. It's very hard. So what we're doing now is using our technology to build synthetic datasets: they recreate real datasets, but they are not real; they are fake. And why do we do that? Because for the machine it doesn't matter whether a dataset is real or not, as long as it's coherent within the domain. So we are able to build new datasets to train AI models, and once those models are used with real data, they perform very well.
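The synthetic-dataset idea can be illustrated in a few lines: generate fake records whose individual values are random but whose internal relationships respect the domain's constraints, which is the "coherence" David says the machine actually needs. The schema and ranges below are invented for illustration:

```python
import random

def synthetic_patients(n, seed=0):
    """Generate fake but internally coherent patient records.

    No value comes from a real person, yet each record obeys domain
    constraints (ages in range, systolic pressure above diastolic),
    so a model trained on it still learns realistic structure.
    """
    rng = random.Random(seed)  # fixed seed => reproducible dataset
    rows = []
    for i in range(n):
        age = rng.randint(18, 90)
        diastolic = rng.randint(60, 95)
        systolic = diastolic + rng.randint(20, 50)  # coherence constraint
        rows.append({"id": f"P{i:04d}", "age": age,
                     "systolic": systolic, "diastolic": diastolic})
    return rows

data = synthetic_patients(3)
```

Real synthetic-data systems learn these constraints from real data rather than hard-coding them, but the principle, coherent-yet-fake records that are safe to share, is the same.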
Kevin Benedict: So, if we were to look at the industry of NLG. And we're looking at the maturity of that industry, on a scale from one to 10, where one is brand new, and 10 is fully mature. Where would you put NLG on that scale today?
David Llorente: I think broadly we're between three and four.
Kevin Benedict: Oh, wow.
David Llorente: Yeah. That means there are a handful of companies specializing in NLG, and in the industries adopting NLG, there are still very few companies really embracing it at scale. However, in the last 12 months we have seen a dramatic change. We see exponential growth now across many industries; the benefits of NLG have been proven, and NLG has been understood. And as I mentioned before, organizations are reshaping themselves, and COVID is obviously one of the main reasons this is happening. They are trying to be more proactive, to reduce costs and increase output, and also to find new business models and new revenue streams. The way they do it is by integrating different technologies, among them NLG, to bring in new revenues, reduce costs, and be more flexible. So I can say we are in a really, really good moment for the industry, with a great future ahead of us.
Kevin Benedict: You know, it's fascinating, because part of my profession here at TCS is studying emerging technologies, current and future scenarios, and how changes in societal influences and economic and geopolitical issues shape possible futures. This pandemic we're all experiencing has really changed the trajectory of technology adoption in so many industries, and as you're sharing with us today, interest in natural language generation has also been positively impacted. Let me move our conversation over to biases, David. When your systems are looking at content and converting it into text, it seems key to make sure there are no biases coming out of the system, intentional or unintentional, that might change the context and meaning of what people are reading. Is that an issue, and how do you approach and deal with it?
David Llorente: So, it is definitely an issue. First, I have to say that the bias comes from ourselves: when we train the models, we use text that, in general, is generated by humans. And when you train a model and look at the output, it's often really biased, especially on certain topics. I'm not saying this affects the pharma or finance industries much, because they're very factual, but media, for example, it does affect, and it can have a big impact on the results. Sometimes the bias is unintentional, but sometimes it's intentional. What we always do is make sure we have mechanisms in place to actually detect it. And what I mentioned before about synthetic datasets plays a role here, because you can build those datasets to avoid these biases in most cases. I have to say it's obviously impossible to be completely bias-free. But then there is another angle: some customers, or potential customers, might ask you to write content that is biased. In that case, our decision as a company is not to do it. The machines need to write content factually, as objectively as possible. We should not allow the machines to push any particular point of view, because that's not their goal: the machines need to be factual, not interpret the data to support certain opinions.
Kevin Benedict: That's interesting, because, as you pointed out, so many of these systems are trained on past data. And if there are historic biases in that data, you don't want to just grab those and project them forward into the future through your system. You want to identify those biases in the past data and try to change them in some positive way going forward.
David Llorente: That's correct. We have some models that help us detect subjectivity and biases. They are not perfect, but they are definitely very helpful, especially for media, and in general for all industries. In the process of designing the stories, we always have somebody, an expert from the industry, helping us validate the results, and one of the things they help us validate is that there are no biases, no slanted angles on certain stories. Subjectivity is something we cannot afford to step into, because then we enter a very different kind of industry, which is fake news and all that, which is disgusting in our opinion.
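The subjectivity screening David mentions can be pictured with a toy filter: flag any draft sentence containing loaded, opinionated words before it ever reaches a reader. The word list below is invented for illustration; the real models are trained classifiers, not hand-made lists:

```python
# Toy subjectivity screen. Real systems use trained subjectivity models;
# this hand-made word list just illustrates the idea of a pre-publication gate.
SUBJECTIVE = {"disastrous", "heroic", "shocking", "brilliant"}

def flag_subjective(sentences):
    """Return the sentences that contain a loaded word and need human review."""
    flagged = []
    for s in sentences:
        words = {w.strip(".,!?").lower() for w in s.split()}
        if words & SUBJECTIVE:
            flagged.append(s)
    return flagged

candidates = [
    "Revenue rose 4% in the third quarter.",
    "The company made a disastrous decision.",
]
print(flag_subjective(candidates))  # ['The company made a disastrous decision.']
```

As in Narrativa's workflow, the point is not to auto-fix the flagged drafts but to route them to a human expert for review.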
Kevin Benedict: Right. So, what industries do you see right now being kind of the most fertile markets for you guys?
David Llorente: Life sciences is the big one: pharma, hospitals, manufacturers of vaccines. They really, really want to get to market ASAP. Once the clinical trial is done, they want to get the vaccine or the drug approved and go to market. Think about it: they spend billions developing drugs, and the process of developing a drug and going through the trials can take five to ten years. You can't afford to spend another year on the approval of the drug; you want to be able to generate the report in a matter of weeks, send it for approval, get it approved, and get to market. So that's the main industry for us. Then obviously we have finance and insurance, and also the more traditional areas like media, gaming, and e-commerce. Those are the main industries where we see big traction. Logistics is coming too; we see a lot of interest there as they start adopting these technologies.
Kevin Benedict: So, you know, I've taken a look at a lot of different emerging technologies over the years, and it seems like, and I can't think of a case where it's not true, that anytime a new emerging technology pops up, it can be used for either good or bad. There's always somebody out there with nefarious intent who figures out a way to use it for bad purposes. Is it the same with natural language generation?
David Llorente: Absolutely, absolutely. Technology is a facilitator, and AI is exponential: what you can do with AI, you cannot do with any other technology. Think about self-driving cars; that's a problem you can only solve with AI. And natural language generation, again, you can only solve with AI. And you do this at scale: you build it into software, so you can replicate it as much as you want, and there is no cost of distribution. It's a huge, huge step forward, for good or for bad. But I always say that we don't have to be afraid of a technology just because it can be used for bad. What you need to understand is that it is going to be used for bad, and you need to make sure you also use it for good, to fight back. Fake news can be fought with the same technology. I think NLG is incredibly powerful, and its use is growing exponentially every day. I have to say we have seen some bad uses of NLG, and we have received some bad proposals ourselves. But again, we truly believe AI will stay on the side of the good guys.
Kevin Benedict: David, let's talk about the future. Where do you see natural language generation? How do you see the evolving over the next five years?
David Llorente: You know, I think when it comes to the quality of the output, I don't think it's going to improve much, because it's already at an incredible level. When you read a story written by a machine, you can't figure out who, or what, wrote it. They are nearly perfect. What is going to change dramatically is that we will be able to write more complex stories. And the process to design and implement the models that write these stories is going to change dramatically: it will require less human intervention and less time. And again, we're going to be able to analyze much more complex data, even non-structured data. So I think this is going to be a revolution for companies, and there will be many more industries and organizations involved, even government institutions. Think about it: we keep talking about transparency and accountability of governments and institutions, and there is so much data. Why don't we transform all this data into text, into something we can actually read and understand, so we can see where the money is spent, where the funds come from, how certain provinces are performing? All these things will help us truly bring transparency and accountability to our societies, and that is going to help us have more democratic countries, freer countries, and, I think, better societies in general. I think NLG, in combination with other technologies, is going to help us there, because the excuse is always: there's so much data, what do we do with it? Now there is something that is going to help you.
And it's your responsibility to actually use that technology to improve your company, your government, your country, your society in general. That's my vision: I think NLG, and our industry, is definitely going to help build better societies.
Kevin Benedict: David, thank you so much for taking the time in your evening here before your son's soccer game.
David Llorente: Football game. No, rugby, a rugby game.
Kevin Benedict: Oh, a rugby game. Rugby.
David Llorente: Yeah.
Kevin Benedict: All right. Well, thank you for giving us some time here, pulling the curtains aside, and opening our eyes to this world of natural language generation. To me, it's fascinating, incredible. So thank you for sharing your experiences, your knowledge, and what you're accomplishing there at Narrativa. Thank you so much.
David Llorente: Thank you, Kevin. Thank you so much, it has been a pleasure.