
Describe Images Using OpenAI and Next JS

In this tutorial we’ll be building a Next JS app that takes a user-uploaded image and describes its contents back to us with the OpenAI API.

15 min
January 26, 2024
Chris Held
Development Lead

Most applications of the OpenAI API are essentially slimmed-down versions of ChatGPT, and Vercel even has hooks to make that pattern easy to implement. But what about images?

In this tutorial we’ll be building a Next JS app that takes a user uploaded image and describes its contents back to us.

*An animated image showing the app describing a photograph of a landscape*



Setup


To start, you’ll need to sign up for an OpenAI account if you haven’t already. From there you’ll need to create a new API key [here](https://platform.openai.com/api-keys). Copy this key somewhere safe; we will be using it soon.


At the time of this post the model we need is only available to paid users, but you can prepay as little as $1 to gain access. [Here](https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4) is more information about limitations and how to prepay for vision access.


Once you’ve created an account, we can set up our Next JS app using the CLI:


   -- CODE line-numbers language- --

   <!--

         npx create-next-app

   -->


All of the defaults are fine for this project. Next, we’ll want to install a few libraries to help us work with the OpenAI API:


   -- CODE line-numbers language- --

   <!--

         npm i ai openai

   -->


Now, create a new file named `.env.local` at the root of your project and add the following:


   -- CODE line-numbers language-js --

   <!--

         OPENAI_API_KEY=the key you created earlier

   -->


Call the Open AI API


Now we’re ready to write some code. Let’s start with the OpenAI call. Inside the `app` folder, create a directory called `lib`, then create a file called `classifier.ts`. This is where the logic for classifying our images will go.


   -- CODE line-numbers language-js --

   <!--

         

         import { OpenAI } from "openai";

         import { OpenAIStream } from "ai";

         

         // create a new OpenAI client using our key from earlier

         const openAi = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

         

         export const classifyImage = async (file: File) => {

           // encode our file as a base64 string so it can be sent in an HTTP request

           const encoded = await file

             .arrayBuffer()

             .then((buffer) => Buffer.from(buffer).toString("base64"));

         

           // create an OpenAI request with a prompt

           const completion = await openAi.chat.completions.create({

             model: "gpt-4-vision-preview",

             messages: [

               {

                 role: "user",

                 content: [

                   {

                     type: "text",

                     text: "Describe this image as if you were David Attenborough. Provide as much detail as possible.",

                   },

                   {

                     type: "image_url",

                     image_url: {

                       url: `data:image/jpeg;base64,${encoded}`,

                     },

                   },

                 ],

               },

             ],

             stream: true,

             max_tokens: 1000,

           });

         

           // stream the response

           return OpenAIStream(completion);

         };

   -->


There is a lot of code here, so let’s break it down a little.


   -- CODE line-numbers language-js --

   <!--

         export const classifyImage = async (file: File) => {

           const encoded = await file

             .arrayBuffer()

             .then((buffer) => Buffer.from(buffer).toString("base64"));

   -->


Here we can see our function is taking in a file and then encoding it into a base64 string. This makes it possible for us to send it in an HTTP request, which we’ll need in order to get it over to Open AI.
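As a minimal illustration of the encoding step on its own (using Node’s `Buffer`, which is available in Next JS server-side code), any binary data can round-trip through base64 like this:

```typescript
// A minimal sketch of the base64 round trip, outside the File API.
// Node's Buffer is available in Next JS server-side code.
const bytes = new Uint8Array([72, 105, 33]); // the bytes for "Hi!"

// encode binary data as a base64 string, just like classifyImage does
const base64 = Buffer.from(bytes).toString("base64");
console.log(base64); // "SGkh"

// decode it back to prove nothing was lost
console.log(Buffer.from(base64, "base64").toString("utf-8")); // "Hi!"
```

The same `Buffer.from(buffer).toString("base64")` call works whether the input is a `Uint8Array`, an `ArrayBuffer`, or a string.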


   -- CODE line-numbers language-js --

   <!--

           const completion = await openAi.chat.completions.create({

             model: "gpt-4-vision-preview",

             messages: [

               {

                 role: "user",

                 content: [

                   {

                     type: "text",

                     text: "Describe this image as if you were David Attenborough. Provide as much detail as possible.",

                   },

                   {

                     type: "image_url",

                     image_url: {

                       url: `data:image/jpeg;base64,${encoded}`,

                     },

                   },

                 ],

               },

             ],

             stream: true,

             max_tokens: 1000,

           });

   -->


This uses the `openai` npm module to generate a completion, an object that represents a request to OpenAI. The `messages` array is where we tell the AI what we need it to do.

In this case we’ve told it to describe an image, providing the base64-encoded string from earlier as our image data. We’ve also set `stream` to true because we want to replicate ChatGPT’s pattern of writing text as it comes in; this request will take a while, and streaming gives us as good of a user experience as possible. We also set a `max_tokens` value as a sort of sanity check to make sure our request doesn’t get too expensive (both computationally and on your wallet).


   -- CODE line-numbers language-js --

   <!--

         return OpenAIStream(completion);

   -->


To complete the function we call Vercel AI’s OpenAIStream function, which takes a completion and handles streaming the response.


Create an API endpoint


In order to actually call this function with a file, we’re going to create an API route. This will make it callable from one of our client components via a `fetch` call. To do that, create an `app/api/classify` directory, then create a `route.ts` file inside it so Next JS’s file-based routing will pick it up. After you’ve created the file, paste in the following code:


   -- CODE line-numbers language-js --

   <!--

         import { classifyImage } from "@/app/lib/classifier";

         import { NextResponse, NextRequest } from "next/server";

         import { StreamingTextResponse } from "ai";

         

         // Set the runtime to edge for best performance

         export const runtime = "edge";

         

         // add a listener to POST requests

         export async function POST(request: NextRequest) {

           // read our file from request data

           const data = await request.formData();

           const file: File | null = data.get("file") as unknown as File;

         

           if (!file) {

             return NextResponse.json(

               { message: "File not present in body" },

               { status: 400, statusText: "Bad Request" }

             );

           }

         

           //call our classify function and stream to the client

           const response = await classifyImage(file);

           return new StreamingTextResponse(response);

          }

   -->


Let’s take a closer look at what we’re doing here:


   -- CODE line-numbers language-js --

   <!--

         export async function POST

   -->


This tells Next to register a POST route at /api/classify.


   -- CODE line-numbers language-js --

   <!--

         const data = await request.formData();

           const file: File | null = data.get("file") as unknown as File;

         

           if (!file) {

             return NextResponse.json(

               { message: "File not present in body" },

               { status: 400, statusText: "Bad Request" }

             );

           }

         

          const response = await classifyImage(file);

         return new StreamingTextResponse(response);

   -->


Here we’re getting the file from the form data sent along with the POST request and doing some validation to make sure it’s present. If the file is there, we send it along to the function we created earlier, then lean on another tool from Vercel’s `ai` package, `StreamingTextResponse`, to send the response back to the client as a stream.
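If we wanted to harden this endpoint a bit, we could also check the file’s type and size before classifying it. This is just a sketch; `MAX_BYTES` and `isValidUpload` are hypothetical names, not part of the tutorial code:

```typescript
// Hypothetical extra validation we might run before calling classifyImage.
// MAX_BYTES and isValidUpload are illustrative names only.
const MAX_BYTES = 5 * 1024 * 1024; // 5 MB cap to keep requests cheap

// A minimal shape so the check works with anything exposing size and type,
// including the File we pulled out of formData
interface UploadLike {
  size: number;
  type: string;
}

function isValidUpload(file: UploadLike): boolean {
  // our input only accepts image/jpeg, so reject anything else
  return file.type === "image/jpeg" && file.size > 0 && file.size <= MAX_BYTES;
}

console.log(isValidUpload({ size: 1024, type: "image/jpeg" })); // true
console.log(isValidUpload({ size: 1024, type: "text/plain" })); // false
```

A failed check could return the same kind of 400 response we already send when the file is missing.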


Create an upload component


Now that we’ve got our server side implementation wrapped up and have an endpoint, we need to create a mechanism to call it. We’ll create an `ImageClassifier` component to do this. Create a `ui` folder inside the `app` folder and add a new file named `imageClassifier.tsx` in it. Paste the following code inside:


   -- CODE line-numbers language-js --

   <!--

         "use client";

         

         import { useState, FormEvent } from "react";

         

         export default function ImageClassifier() {

           // set up some variables to help manage component state

           const [file, setFile] = useState<File | null>(null);

           const [image, setImage] = useState<string | null>(null);

           const [response, setResponse] = useState("");

           const [submitted, setSubmitted] = useState(false);

           const [inputKey, setInputKey] = useState(new Date().toString());

         

           const onSubmit = async (e: FormEvent<HTMLFormElement>) => {

             e.preventDefault();

             setSubmitted(true);

             // prepare and submit our form

             const formData = new FormData();

             formData.append("file", file as File);

          fetch("/api/classify", {

               method: "POST",

               body: formData,

             }).then((res) => {

               // create a stream from the response

               const reader = res.body?.getReader();

               return new ReadableStream({

                 start(controller) {

                   return pump();

                   function pump(): any {

                     return reader?.read().then(({ done, value }) => {

                       // no more data - exit our loop

                       if (done) {

                         controller.close();

                         return;

                       }

                       controller.enqueue(value);

                       // decode the current chunk and append to our response value

                       const decoded = new TextDecoder("utf-8").decode(value);

                       setResponse((prev) => `${prev}${decoded}`);

                       return pump();

                     });

                   }

                 },

               });

             });

           };

     

           // resets the form so we can upload more images

           const onReset = () => {

             setFile(null);

             setImage(null);

             setResponse("");

             setSubmitted(false);

             setInputKey(new Date().toString());

           };

         

           return (

             <div className="max-w-4xl">

               {image ? (

                 <img

                   src={image}

                   alt="An image to classify"

                   className="mb-8 w-full object-contain"

                 />

               ) : null}

               <form onSubmit={onSubmit}>

                 <input

                   key={inputKey}

                   type="file"

                   accept="image/jpeg"

                   onChange={(e) => {

                     // sets or clears our image and file variables

                     if (e.target.files?.length) {

                       setFile(e.target?.files[0]);

                       setImage(URL.createObjectURL(e.target?.files[0]));

                     } else {

                       setFile(null);

                       setImage(null);

                     }

                   }}

                 />

                 <p className="py-8 text-slate-800">

          {submitted && !response ? "Contacting Sir Attenborough..." : response}

                 </p>

                 <div className="flex flex-row">

                   <button

                     className={`${

                       submitted || !file ? "opacity-50" : "hover:bg-gray-100"

                     } bg-white mr-4 text-slate-800 font-semibold py-2 px-4 border border-gray-400 rounded shadow`}

                     type="submit"

                     disabled={submitted || !file}

                   >

                     Describe

                   </button>

                   <button

                     className="bg-white hover:bg-red-100 text-red-800 font-semibold py-2 px-4 border border-red-400 rounded shadow"

                     type="button"

                     onClick={onReset}

                   >

                     Reset

                   </button>

                 </div>

               </form>

             </div>

           );

         }

   -->


This is our most complex piece of code yet, so let’s break it down:


   -- CODE line-numbers language-js --

   <!--

         "use client"

   -->


This tells Next JS we’re using a Client Component, which lets us use state hooks and event handlers that run in the browser.


   -- CODE line-numbers language-js --

   <!--

           const [file, setFile] = useState<File | null>(null);

           const [image, setImage] = useState<string | null>(null);

           const [response, setResponse] = useState("");

           const [submitted, setSubmitted] = useState(false);

           const [inputKey, setInputKey] = useState(new Date().toString());

   -->

We have a lot of state to look after here. `file` is the file we’re going to eventually be sending to the server, `image` is that file represented as an Object URL for display. `response` will be used to capture our response from the server.

We need to store this as a state variable because it will be coming back in chunks as a streaming response. `submitted` is used as a helper to disable form elements and show loading state, and `inputKey` is used as a way to force React to clear out our input when we reset our form.


Most of the rest of the component is markup, but there are two functions that deserve a closer look (`onReset` simply resets our form values so we will skip that one).


   -- CODE line-numbers language-js --

   <!--

              <input

               key={inputKey}

               type="file"

               accept="image/jpeg"

               onChange={(e) => {

                 if (e.target.files?.length) {

                   setFile(e.target?.files[0]);

                   setImage(URL.createObjectURL(e.target?.files[0]));

                 } else {

                   setFile(null);

                   setImage(null);

                 }

               }}

             />

   -->


The `onChange` function checks whether a file was selected; if it was, it stores the file in state and converts it to an Object URL so we can show a preview to the user.


`onSubmit` is where the file is handled and passed off to the server. If we were only dealing with text, we could leverage Vercel’s [useChat](https://sdk.vercel.ai/docs/api-reference/use-chat) hook and abstract most of this complexity away, but since we’re dealing with binary data we will stream it ourselves. First we get our data ready to submit and send it to our server using `fetch`:


   -- CODE line-numbers language-js --

   <!--

         const formData = new FormData();

         formData.append("file", file as File);

         fetch("/api/classify", {

           method: "POST",

           body: formData,

         }).then((res) => {

   -->


Next we handle streaming:


   -- CODE line-numbers language-js --

   <!--

           const reader = res.body?.getReader();

           return new ReadableStream({

             start(controller) {

               return pump();

               function pump(): any {

                 return reader?.read().then(({ done, value }) => {

                   if (done) {

                     controller.close();

                     return;

                   }

                   controller.enqueue(value);

                   const decoded = new TextDecoder("utf-8").decode(value);

                   setResponse((prev) => `${prev}${decoded}`);

                   return pump();

                 });

               }

             },

           });

   -->


We take advantage of `fetch`'s stream implementation and create a `ReadableStream` to consume the response data as it arrives. To read the stream we define a `pump` function and call it recursively each time we receive a value, until the stream is done. Each chunk is decoded and appended to our response, giving us the same typing effect you see with ChatGPT.
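If the recursive `pump` feels awkward, the same loop can be written with `async`/`await` instead. This is an alternative sketch, not the tutorial’s code; it assumes a runtime with the WHATWG Streams API (Node 18+ or any modern browser):

```typescript
// Alternative to the recursive pump: a plain while loop over the reader.
async function readStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder("utf-8");
  while (true) {
    const { done, value } = await reader.read();
    if (done) break; // no more data, exit the loop
    // decode the chunk and hand it to the caller (e.g. setResponse)
    onChunk(decoder.decode(value, { stream: true }));
  }
}

// Usage inside onSubmit would look something like:
//   fetch("/api/classify", { method: "POST", body: formData }).then((res) =>
//     readStream(res.body!, (text) => setResponse((prev) => prev + text))
//   );
```

Passing `{ stream: true }` to `decode` tells the `TextDecoder` to hold on to any multi-byte character split across chunk boundaries rather than emitting a replacement character.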


Wrapping up


Our last step is to render our `ImageClassifier` component on a page. Open `app/page.tsx` and replace its contents with the following:


   -- CODE line-numbers language-js --

   <!--

         import ImageClassifier from "./ui/imageClassifier";

         

         export default async function Home() {

           return (

             <main className="flex min-h-screen flex-col items-center p-24">

               <ImageClassifier />

             </main>

           );

         }

   -->


To test our app we can run `npm run dev` and head to http://localhost:3000.


Voila! We now have a functioning app that takes an image and describes it.

Though the example was a little silly, there are quite a few interesting applications we could unlock just by adjusting the prompt. And since we’re exposing an API endpoint, you could call it from another client, like a mobile app.
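For instance, any JavaScript client can assemble the same request our component sends. A hedged sketch (`buildClassifyRequest` is a hypothetical helper, not part of the tutorial code; `fetch`, `FormData`, and `Blob` are available in Node 18+ and in browsers):

```typescript
// Hypothetical helper that builds the same request our component sends.
function buildClassifyRequest(file: Blob, baseUrl = "http://localhost:3000") {
  const formData = new FormData();
  formData.append("file", file);
  return {
    url: `${baseUrl}/api/classify`,
    options: { method: "POST" as const, body: formData },
  };
}

// Usage with the dev server running:
//   const { url, options } = buildClassifyRequest(someJpegBlob);
//   const res = await fetch(url, options);
//   // then read res.body as a stream, exactly as in the component above
```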

Interested in learning more about the tools we used?

Here are some resources to help you get started:


- [Vercel AI Docs](https://sdk.vercel.ai/docs)

- [OpenAI API Documentation](https://platform.openai.com/docs/api-reference)

- [ReadableStream MDN Documentation](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream)


You can find a slightly modified version of the source code [here](https://github.com/chris-held/image-classifier). In that version you can also see a non-streaming API route and how the two approaches differ. Thanks for reading and happy coding!
