AI


Introduction

The last few years have been a dream run for Artificial Intelligence enthusiasts and machine learning professionals. These technologies have evolved from a niche into the mainstream, and are impacting millions of lives today. Some countries now have dedicated AI ministers and budgets to make sure they stay relevant in this race.

The same holds for data science professionals. A few years back, you would have been comfortable knowing a few tools and techniques. Not anymore! There is so much happening in this domain, and so much to keep pace with, that it feels mind-boggling at times.

This is why I thought of taking a step back and looking at the developments in some of the key areas of Artificial Intelligence from a data science practitioner’s perspective. What were the breakthroughs? What happened in 2018 and what can be expected in 2019? Read this article to find out!

P.S. As with any forecast, these are my own takes, based on my attempt to connect the dots. If you have a different perspective, I would love to hear it. Do let me know what you think might change in 2019.

Areas we’ll cover in this article

  • Natural Language Processing (NLP)
  • Computer Vision
  • Tools and Libraries
  • Reinforcement Learning
  • AI for Good – A Move Towards Ethical AI

Natural Language Processing (NLP)


Making machines parse words and sentences has always seemed like a dream. There are way too many nuances and aspects of a language that even humans struggle to grasp at times. But 2018 has truly been a watershed moment for NLP.

We saw one remarkable breakthrough after another: ULMFiT, ELMo, OpenAI’s Transformer and Google’s BERT, to name a few. The successful application of transfer learning (reusing a model pretrained on a large corpus as the starting point for a new task) to NLP has blown open the door to potentially unlimited applications. Our podcast with Sebastian Ruder further cemented our belief in how far his field has come in recent times. As a side note, that’s a must-listen podcast for all NLP enthusiasts.

Let’s look at some of these key developments in a bit more detail. And if you’re looking for a place to get started with NLP, make sure you head over to this ‘NLP using Python‘ course. It’s as good a place as any to start your text-fuelled journey!

 

ULMFiT

Designed by Sebastian Ruder and fast.ai’s Jeremy Howard, ULMFiT was the first framework that got the NLP transfer learning party started this year. For the uninitiated, it stands for Universal Language Model Fine-Tuning. Jeremy and Sebastian have truly put the word Universal in ULMFiT – the framework can be applied to almost any NLP task!

The best part about ULMFiT and the subsequent frameworks we’ll see soon? You don’t need to train models from scratch! These researchers have done the hard bit for you – take their learning and apply it in your own projects. ULMFiT outperformed state-of-the-art methods in six text classification tasks.
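
To make the "don’t train from scratch" idea concrete, here is a minimal sketch in plain Python. It is not ULMFiT itself: the `PRETRAINED` vectors, the tiny training set and every function name are made up for illustration, and the sketch uses the simplest flavour of transfer learning (a frozen feature extractor with a small trainable classifier on top) rather than ULMFiT’s own fine-tuning procedure.

```python
import math

# Hypothetical "pretrained" word vectors standing in for a pretrained language model.
# In a real project these would come from a pretrained model, not be typed by hand.
PRETRAINED = {
    "great": [0.9, 0.1],
    "love": [0.8, 0.2],
    "awful": [0.1, 0.9],
    "hate": [0.2, 0.8],
    "movie": [0.5, 0.5],
    "film": [0.5, 0.5],
}


def encode(text):
    """Frozen feature extractor: average the pretrained vectors of known words."""
    vectors = [PRETRAINED[w] for w in text.lower().split() if w in PRETRAINED]
    if not vectors:
        return [0.0, 0.0]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(text, weights, bias):
    """Probability that the text is positive, according to the trained head."""
    x = encode(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_head(examples, epochs=500, lr=0.5):
    """Train only a logistic-regression head; the pretrained vectors never change."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = encode(text)
            error = predict(text, weights, bias) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# A tiny, made-up labelled set: 1 = positive review, 0 = negative review.
train = [("great movie", 1), ("love film", 1), ("awful movie", 0), ("hate film", 0)]
weights, bias = train_head(train)

positive_score = predict("love movie", weights, bias)
negative_score = predict("awful film", weights, bias)
```

The point of the sketch is the division of labour: the expensive knowledge sits in the pretrained part, and only the small head is learned from your own (much smaller) dataset.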

You can read this excellent tutorial by Prateek Joshi on how to get started with ULMFiT for any text classification problem.

 

ELMo

Want to take a guess at what ELMo stands for? It’s short for Embeddings from Language Models. Pretty creative, eh? Apart from its name resembling the famous Sesame Street character, ELMo grabbed the attention of the ML community as soon as it was released.

ELMo uses language models to obtain embeddings for each word while also considering the context in which the word appears in the sentence or paragraph. Context is a crucial aspect of NLP that most earlier word-embedding approaches failed to capture. ELMo uses bi-directional LSTMs to create the embeddings. Don’t worry if that sounds like a mouthful – check out this article to get a really simple overview of what LSTMs are and how they work.
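
The difference between a static and a context-aware embedding is easy to see in a toy example. The sketch below is only an illustration under loud assumptions: the `STATIC` vectors are invented, and the "contextual" function simply mixes a word’s vector with its left and right neighbours. ELMo itself learns this mixing with bi-directional LSTMs, which the sketch does not attempt to reproduce.

```python
# Hypothetical static word vectors; every value here is made up for illustration.
STATIC = {
    "river": [1.0, 0.0],
    "money": [0.0, 1.0],
    "bank": [0.5, 0.5],
    "the": [0.0, 0.0],
    "by": [0.0, 0.0],
    "in": [0.0, 0.0],
}


def static_embedding(tokens, index):
    """A static embedding ignores the sentence: same word, same vector."""
    return STATIC[tokens[index]]


def contextual_embedding(tokens, index, window=3):
    """Mix a word's vector with its neighbours on both sides (toy stand-in for a biLSTM)."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    context = [STATIC[t] for t in left + right]
    own = STATIC[tokens[index]]
    if not context:
        return own
    context_mean = [sum(col) / len(context) for col in zip(*context)]
    return [0.5 * o + 0.5 * c for o, c in zip(own, context_mean)]


sentence_a = ["bank", "by", "the", "river"]   # "bank" gets its meaning from the right
sentence_b = ["money", "in", "the", "bank"]   # "bank" gets its meaning from the left

static_a = static_embedding(sentence_a, 0)
static_b = static_embedding(sentence_b, 3)
context_a = contextual_embedding(sentence_a, 0)
context_b = contextual_embedding(sentence_b, 3)
```

Here `static_a` and `static_b` are identical, while `context_a` leans towards the "river" direction and `context_b` towards the "money" direction. That is the behaviour contextual embeddings are after, whatever model produces them.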

Like ULMFiT, ELMo significantly improves the performance of a wide variety of NLP tasks, like sentiment analysis and question answering. Read more about it here.

 

Google’s BERT

Quite a few experts have claimed that the release of BERT marks a new era in NLP. Following ULMFiT and ELMo, BERT really blew away the competition with its performance. As the original paper states, “BERT is conceptually simple and empirically powerful”.

BERT obtained state-of-the-art results on 11 (yes, 11!) NLP tasks. Check out their results on the SQuAD benchmark:

SQuAD v1.1 Leaderboard (Oct 8th 2018)

  Rank                      System    Test EM    Test F1
  1st Place Ensemble        BERT      87.4       93.2
  2nd Place Ensemble        nlnet     86.0       91.7
  1st Place Single Model    BERT      85.1       91.8
  2nd Place Single Model    nlnet     83.5       90.1
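
If the EM and F1 columns are new to you: EM (exact match) checks whether the predicted answer equals the reference answer after light normalization, and F1 measures token overlap between the two. Below is a simplified sketch of both for a single prediction and a single reference. The function names and the example strings are my own; the official evaluation also takes the best score over several reference answers and averages over the whole dataset, which is left out here.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up example: the prediction contains the answer plus extra words.
em = exact_match("the Eiffel Tower in Paris", "Eiffel Tower")
f1 = f1_score("the Eiffel Tower in Paris", "Eiffel Tower")
```

In this example the prediction is not an exact match, but it still earns partial credit on F1 because it contains both reference tokens. That is why the F1 column in the leaderboard is always at least as high as the EM column.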

Interested in getting started? You can use either the PyTorch implementation or Google’s own TensorFlow code to try and replicate the results on your own machine.

I’m fairly certain you are wondering what BERT stands for at this point. 

 It’s Bidirectional Encoder Representations from Transformers. Full marks if you got it right the first time.

 

Facebook’s PyText

How could Facebook stay out of the race? They have open-sourced their own deep learning NLP framework called PyText. It was released earlier this week so I have yet to experiment with it, but the early reviews are extremely promising. According to research published by Facebook, PyText has led to a 10% increase in the accuracy of conversational models and reduced the training time as well.

PyText is actually behind a few of Facebook’s own products, like Facebook Messenger. So working with it adds some real-world value to your own portfolio (apart from the invaluable knowledge you’ll gain, obviously).

You can try it out yourself by downloading the code from this GitHub repo.

 

Google Duplex

If you haven’t heard of Google Duplex yet, where have you been?! Sundar Pichai knocked it out of the park with this demo and it has been in the headlines ever since:



Since this is a Google product, there’s a slim chance of them open sourcing the code behind it. But wow! That’s a pretty awesome audio processing application to showcase. Of course it raises a lot of ethical and privacy questions, but that’s a discussion for later in this article. For now, just revel in how far we have come with ML in recent years.

 

Who better than Sebastian Ruder himself to provide a handle on where NLP is headed in 2019? Here are his thoughts:

  1. Pretrained language model embeddings will become ubiquitous; it will be rare to have a state-of-the-art model that is not using them
  2. We’ll see pretrained representations that can encode specialized information which is complementary to language model embeddings. We will be able to combine different types of pretrained representations depending on the requirements of the task
  3. We’ll see more work on multilingual applications and cross-lingual models. In particular, building on cross-lingual word embeddings, we will see the emergence of deep pretrained cross-lingual representations

Computer Vision


This is easily the most popular field right now in the deep learning space. I feel like we have plucked most of the low-hanging fruit of computer vision and are already in the refining stage. Whether it’s image or video, we have seen a plethora of frameworks and libraries that have made computer vision tasks a breeze.

We at Analytics Vidhya spent a lot of time this year working on democratizing these concepts. Check out our computer vision specific articles here, covering topics from object detection in videos and images to lists of pretrained models to get your deep learning journey started.

Here’s my pick of the best developments we saw in CV this year.

And if you’re curious about this wonderful field (actually going to become one of the hottest jobs in the industry soon), then go ahead and start your journey with our ‘Computer Vision using Deep Learning’ course.

The Release of BigGANs

Ian Goodfellow designed GANs in 2014, and the concept has spawned multiple and diverse applications since. Year after year we see the original concept being tweaked to fit a practical use case. But one thing has remained fairly consistent till this year – images generated by machines were fairly easy to spot. There would always be some inconsistency in the frame which made the distinction fairly obvious.

But that boundary has started to blur in recent months. And with the creation of BigGANs, it could be removed permanently. Check out the below images generated using this method:


Unless you take a microscope to it, you won’t be able to tell if there’s anything wrong with that collection. Concerning or exciting? I’ll leave that up to you, but there’s no doubt GANs are changing the way we perceive digital images (and videos).

For the data scientists out there, these models were trained on the ImageNet dataset first and then on the JFT-300M data to showcase that they transfer well from one set to the other. I would also like to direct you to the GAN Dissection page – a really cool way to visualize and understand GANs.

 

Fast.ai’s Model being Trained on ImageNet in 18 Minutes

This was a really cool development. There is a very common belief that you need a ton of data along with heavy computational resources to perform proper deep learning tasks. That includes training a model from scratch on the ImageNet dataset. I understand that perception – most of us thought the same before a few folks at fast.ai found a way to prove all of us wrong.

Their model reached an accuracy of 93% in an impressive 18 minutes. The hardware they used, detailed in their blog post, consisted of 16 public AWS cloud instances, each with 8 NVIDIA V100 GPUs (128 GPUs in total). They built the algorithm using the fastai and PyTorch libraries.


The total cost of putting the whole thing together came out to be just $40! Jeremy has described their approach, including techniques, in much more detail here. A win for everyone!
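
It is worth doing the back-of-the-envelope arithmetic on those numbers. The sketch below uses only the figures quoted above (16 instances, 8 GPUs each, 18 minutes, $40); the hourly rate it prints is simply what those figures imply, not a price quoted by AWS or fast.ai.

```python
# Figures taken from the text above.
instances = 16
gpus_per_instance = 8
minutes = 18
total_cost_usd = 40.0

total_gpus = instances * gpus_per_instance      # GPUs working in parallel
instance_hours = instances * minutes / 60       # billed machine time
gpu_hours = total_gpus * minutes / 60           # total GPU time consumed

# Derived value: what the $40 figure implies per instance-hour (an inference, not a quoted price).
implied_rate_per_instance_hour = total_cost_usd / instance_hours

print(f"{total_gpus} GPUs, {instance_hours:.1f} instance-hours, {gpu_hours:.1f} GPU-hours")
print(f"Implied rate: ${implied_rate_per_instance_hour:.2f} per instance-hour")
```

In other words, the 18 minutes of wall-clock time hide roughly 38 GPU-hours of compute. The achievement is as much about spreading the work efficiently across many machines as it is about raw speed.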

 

NVIDIA’s vid2vid technique

Image processing has come on in leaps and bounds in the last 4-5 years, but what about video? Translating methods from a static frame to a dynamic one has proved to be a little tougher than most imagined. Can you take a video sequence and predict what will happen in the next frame? It had been explored before but the published research had been vague at best.

NVIDIA decided to open source their approach earlier this year, and it was met with widespread praise. The goal of their vid2vid approach is to learn a mapping function from a source video (for example, a sequence of semantic segmentation masks) to a photorealistic output video that precisely depicts the content of the source.


You can try out their PyTorch implementation available on their GitHub here.

 

As I mentioned earlier, we might see modifications rather than inventions in 2019. It might feel like more of the same – self-driving cars, facial recognition algorithms, virtual reality, etc. Feel free to disagree with me here and add your point of view – I would love to know what else we can expect next year that we haven’t already seen.

Drones, pending political and government approvals, might finally get the green light in the United States (India is far behind there). Personally, I would like to see a lot of the research being implemented in real-world scenarios. Conferences like CVPR and ICML showcase the latest in this field, but how close are those projects to being used in reality?

Visual question answering and visual dialog systems could finally make their long-awaited debut soon. These systems currently lack the ability to generalize, but the expectation is that we’ll see an integrated multi-modal approach soon.

Self-supervised learning came to the forefront this year, and I am willing to bet it will be used in far more studies next year. It’s a really cool line of learning – the labels are determined directly from the data we input, rather than by spending time labelling images manually. Fingers crossed!
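
Here is a small sketch of what "labels determined from the data" can look like in practice. It uses one common pretext task, predicting how much an image has been rotated, as an example of my own choosing; the tiny 2x2 "image" and the function names are purely illustrative.

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def make_rotation_examples(image):
    """Turn one unlabelled image into four (input, label) pairs.

    The label is the number of 90-degree turns applied, so it comes from the
    transformation itself rather than from a human annotator.
    """
    examples = []
    current = [list(row) for row in image]
    for turns in range(4):
        examples.append((current, turns))
        current = rotate90(current)
    return examples


# A made-up 2x2 "image" with no human-provided label.
unlabelled_image = [
    [1, 2],
    [3, 4],
]
examples = make_rotation_examples(unlabelled_image)
labels = [label for _, label in examples]
```

A model trained to predict these rotation labels has to learn something about what objects look like the right way up. Those learned features can then be reused for a downstream task where labelled data is scarce.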


