Image Recognition and "Probable" Job Invasion by GPT-4

What to expect with the "multimodal" GPT-4

Mar 17, 2023

Image Recognition: Multi-modality of GPT-4

Generative A.I. is stronger than ever. GPT-4 came out recently, and the media's anticipation added to the hype. Speculation and expectations around GPT-4 have only grown since its release. But this may be the first time, in my opinion, that both the critics and the supporters of this new (revolutionary) technology agree on one thing: it will be a game changer of great magnitude.

Videos like these are flooding my Twitter feed:

Rowan Cheung @rowancheung

I just watched GPT-4 turn a hand-drawn sketch into a functional website. This is insane.

Yosarian2 @YosarianTwo

Holy shit. GPT-4, on it's own; was able to hire a human TaskRabbit worker to solve a CAPACHA for it and convinced the human to go along with it.

Tanishq Mathew Abraham @iScienceLuvr

I got to try GPT-4's multimodal capabilities and it's quite impressive! A quick thread of examples... Let's start out with solving a CAPTCHA, no big deal

Garrett Scott 🕳 @thegarrettscott

Last night I made a website that uses GPT-4 to code any arcade game you can think of and let you play it instantly. Here's a demo of The Infinite Arcade. If people like it, I'll publish the site later today or tomorrow!

There is no denying that the multimodality of GPT-4 makes it special. But will it be efficient? Will it handle and respond to images as well as it does with text? According to OpenAI's research blog, this is what they said (highlighted by me):

Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs. Furthermore, it can be augmented with test-time techniques that were developed for text-only language models, including few-shot and chain-of-thought prompting.

However, I can’t be sure how ‘similar’ its capability with images would be to its capability with text. Text-based inputs are relatively easy for an A.I. to process. After all, text is contextual data, and contextual data makes for the best datasets to train an A.I. Processing images, however, is not at all straightforward, because images constitute ‘visual’ data. Image processing for A.I. requires extensive procedures: the more sophisticated the methodology, the better.

To do this, the images are first converted into numerical, ‘contextual’ data that existing A.I. models can work on. The following image by V7 Labs gives the gist of this process.

Here, a computer interprets a digital image as a 2D or 3D matrix, where each value (pixel) in the matrix represents an amplitude, known as the “intensity” of the pixel.
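To make the matrix idea concrete, here is a minimal Python sketch. It only illustrates how an image can be stored as numbers; it is not how GPT-4 processes images, which OpenAI has not described in detail. The tiny images, their pixel values, and the helper names (`shape`, `normalise`, `flatten`) are all made up for this example.

```python
# A tiny 3x3 grayscale "image": a 2D matrix where each value is the
# intensity of one pixel (0 = black, 255 = white). Values are made up.
gray_image = [
    [0, 128, 255],
    [64, 192, 32],
    [255, 0, 128],
]

# A colour image adds a third dimension: each pixel holds three
# intensities, one per channel (red, green, blue). Values are made up.
rgb_image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]


def shape(matrix):
    """Return the dimensions of a nested-list matrix, e.g. (3, 3)."""
    dims = []
    while isinstance(matrix, list):
        dims.append(len(matrix))
        matrix = matrix[0]
    return tuple(dims)


def normalise(image):
    """Scale the 8-bit intensities (0-255) of a 2D image to 0.0-1.0."""
    return [[pixel / 255 for pixel in row] for row in image]


def flatten(image):
    """Unroll a 2D image into one flat list of numbers."""
    return [pixel for row in image for pixel in row]


print(shape(gray_image))  # (3, 3)
print(shape(rgb_image))   # (2, 2, 3)
print(flatten(normalise(gray_image)))
```

Once an image is a plain list of numbers like this, it can be fed to a model in much the same way as any other numerical input, which is the conversion the paragraph above describes.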

The point is, no matter how advanced the A.I. gets, it will not be as accurate with images as it is with text (at least for now). Even OpenAI said in their research blog that “Image inputs are still a research preview and not publicly available.” But with time and regular testing, these machines can get ‘better’, if not the best, at recognising images.

However, I do agree that GPT-4's existing ability to interpret images and draw information from them is remarkable. It is the best iteration of the GPT model to date. GPT-3 has 175 billion parameters. OpenAI has not disclosed the size of GPT-4, and the widely circulated figure of 100 trillion parameters is an unconfirmed rumour.

So with a model this capable, GPT-4 would ace accuracy, right? Most probably, yes. See the data below (by OpenAI) on how it aced many standardised tests:

Jobs that A.I. would replace - According to A.I.

You and I have heard this countless times. Would A.I. replace humans? Would A.I. replace our jobs? The saga keeps on going. And no matter how much we debate this, there is no denying that A.I. is at least ‘capable’ of affecting our work, even if it can’t replace us right now. I even wrote a post where I highlighted how A.I. is taking over jobs in the US:

Creative Block

Nightmare Came True: AI is ALREADY replacing Jobs in US

The technology that was supposed to make our lives easier and better is now making a ‘life’ for itself. Well, it turns out that it’s also making some of us redundant and obsolete. And all this was done by this survey. Business Leaders Favors AI: Stats shows…

Aditya Anil

Personally, I feel this is inevitable. But this is not the first time: think about when all of us carried cameras and a personal diary to keep notes of our work. With the advent of ‘smartphones’, all of these are now available at our fingertips.

But since I am testing GPT-4's capabilities, why not ask it what it thinks it is capable of? (so many ‘it’... I guess that's what happens when you treat an A.I. as your ‘living’ assistant)

This post that I found on LinkedIn is quite interesting:

This single picture is circulating all over the internet. As I write this, the A.I. community is already filled with discussions about this single output. Keeping aside the ironic aspect of this post, I wonder how true it is.

There is no way I am going to believe this response to the fullest. The word “replace” seems misleading. I think a more appropriate word would be “affect”, since most of these professions cannot be fully “replaced” by A.I. bots, but can surely be “affected” by A.I. on a large scale.

But how large would the impact be?

Let’s assume GPT-4 is right, and these jobs would for sure be replaced by it. What would the impact be? I plotted these job titles against their annual salaries, and here’s what I found:

Interestingly, the data suggests that marketing-related jobs would be highly impacted (this includes social media experts, marketers and content specialists). Jobs that require individual human interaction (tutors, clerks, legal, etc.), on the other hand, seem to be less impacted.

Some concrete inferences from this chart are as follows:

The highest paying job is Social Media Manager with an average salary of $70,287.

The lowest paying job is Travel Agent with an average salary of $33,128.

The average salary is $45,000.

The median salary is $44,198.
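For anyone who wants to reproduce this kind of summary, here is a minimal Python sketch using the standard `statistics` module. The function name `summarise_salaries` is made up for this example, and the dictionary holds only the two salaries quoted above, because the rest of the chart is not reproduced in the text.

```python
from statistics import mean, median


def summarise_salaries(salaries):
    """Return the highest, lowest, mean and median of a {job: salary} dict."""
    highest = max(salaries, key=salaries.get)
    lowest = min(salaries, key=salaries.get)
    return {
        "highest": (highest, salaries[highest]),
        "lowest": (lowest, salaries[lowest]),
        "mean": mean(salaries.values()),
        "median": median(salaries.values()),
    }


# Only these two figures are quoted in the post. Add the remaining
# job titles and salaries from the chart to get the full summary.
salaries = {
    "Social Media Manager": 70287,
    "Travel Agent": 33128,
}

summary = summarise_salaries(salaries)
print(summary["highest"])  # ('Social Media Manager', 70287)
print(summary["lowest"])   # ('Travel Agent', 33128)
print(summary["mean"], summary["median"])
```

With just these two entries, the mean and median will not match the figures listed above, which are based on every job in the chart.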

While annual salary may not be the best criterion for predicting the impact, it does give an idea of how much is 'at stake' if A.I. were to replace these jobs. From another perspective, if I were to believe that the data used to train GPT-4 is reliable, then it makes sense that the chatbot arrived at the list above after analysing past trends. But how reliable is this prediction? I don’t know. Only time will tell.

What to expect now?

As I have said countless times before, Generative A.I. is for sure going to be a game changer, and it will certainly impact the way we work.

I am not jumping to conclusions on whether this ‘revolutionary tech’ is for good or not. We have more to find out. And since GPT-4 is not fully available (image inputs are still a research preview), it wouldn't be right to assess its capabilities just yet. And with many critics saying that GPT-4 will have the same hallucination problem as its predecessors, I remain sceptical about the ‘accuracy’ that this new model claims to offer.

In OpenAI’s own words: “[GPT-4] still is not fully reliable, it ‘hallucinates’ facts and makes reasoning errors”

For now, I’ll leave the readers with a question -

“What do you think, can A.I. really take our place, now that it seems it can work with images effortlessly?”