Screaming headlines like these are hard to escape at the moment. At the same time, the practical use of artificial intelligence-based tools such as ChatGPT, DALL-E, or Midjourney still seems quite distant: more of a theoretical option, perhaps a scenario for the far future. Or is all of this already concretely applicable, especially with a focus on culinary arts and the development of new ideas and concepts?
I spent a long winter’s evening looking at these tools and this very question. The task was clear: generate appetizing-looking food photos, partly based on images from my own pool (and, as a test, on external images from the Internet), partly from pure text input and further refinement.
With a few simple tricks (and a healthy dose of tech-savviness), Midjourney could, I suspect, be used in a context that is quite exciting for the hospitality industry. After all, the tools can do much more than create memes, Simpsons portraits, dystopian Blade Runner scenes, or funny space monsters (though they’re actually pretty good at that).
What exactly are DALL-E and Midjourney?
But first, a quick look at the status quo. Alongside tools such as Stable Diffusion or Imagen from Google, Midjourney and DALL-E are currently the most popular text-to-image programs. What they all have in common is that they create images from text entered in a simple input field. For the image shown above, my process started with a very simple input along the lines of /imagine beef tenderloin in a jacket on mushrooms and beef jus.
To compute these results, the tools rely on quite different kinds of artificial neural networks and on many years of training with huge amounts of mostly freely available data from the Internet.
Through additional inputs and interactions with the interface, one’s own works can be rendered in countless artistic styles and, with a click of the mouse, varied, refined, remixed, enlarged, or regenerated over and over again with different algorithms.
However, there are also features that are not quite so obvious, features that at least touch on a gray area in copyright law and perhaps even go beyond it. This is because, in addition to the user’s own prompts (i.e. text inputs), the systems can also be fed with external information: links to websites, for example, which then shape the style of the next calculation, or user-uploaded images that serve as a rough guide for the desired look or content. Sometimes, by the way, the result is not rough at all, but frighteningly concrete and close to the original. Thus, instead of a style described in text, the visual approach of a photographer or filmmaker is suddenly used as a creative guardrail, and their style is sometimes imitated very authentically (very often, of course, this also goes spectacularly wrong). In the same way, the image of an interior, i.e. a real location, can serve as a (naturally distorted) backdrop for an AI-generated motif. The machines are still a little rough around the edges, but the trend is clear.
Practical benefits of Midjourney & Co for the catering industry?
It becomes interesting when, instead of referencing high-end images, you take your own photos as a basis and let the tool compensate for their potential photographic shortcomings: good light, for example, or the play with sharpness and blur. In this way, you could choose photos of authentic dishes from your own restaurant as the basis, but then have individual ingredients, cooking methods, components or plating swapped out via text command. Or you simply push the look in the direction of another photo but keep the content.
In many cases, this doesn’t really look good at first. With a few additional commands, a variation or two, and a bit of luck, however, you can create images that at least need not hide behind poorly made food photos.
What these generated motifs lack, of course, is any form of authenticity to begin with. In other words, the very component that is so important in social media in general and in gastronomy communication in particular. However, this shortcoming can at least be reduced somewhat by adding real and professionally made pictures of your own location to your command and at least offering the social media guest a familiar ambience around the plate shown. By the way, you can of course also reference your own tableware to create further individual points of reference here.
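As a rough sketch of what adding your own pictures to a command can look like: Midjourney takes reference images as web addresses placed in front of the text part of the prompt. The helper name and the example.com addresses below are my own placeholders, not real files, and the dish text is only an illustration.

```python
def add_reference_images(prompt, image_urls):
    """Put reference image URLs in front of the text prompt.

    Hypothetical helper: it only builds the text that would be typed
    after the /imagine command, it does not talk to Midjourney.
    """
    urls = [u.strip() for u in image_urls if u.strip()]
    for url in urls:
        # Reference images have to be reachable as web links.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not a web address: {url}")
    return " ".join(urls + [prompt.strip()])


# Placeholder addresses standing in for photos of your own location and tableware.
reference_images = [
    "https://example.com/dining-room.jpg",
    "https://example.com/house-plate.jpg",
]
combined = add_reference_images(
    "Beef tenderloin with rosemary, food photo", reference_images
)
print(combined)
```

How strongly the result follows the reference pictures still has to be found out by trial and error.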
So in theory, a lot is already possible. And much of it will certainly have an even more practical use in the future. But what is also immediately noticeable in my numerous sample images is that the AIs unfortunately seldom come out of their air-conditioned basements and, above all, still have no real idea of colors, textures and proportions. In the end, however, it is precisely these details that make for an image that whets the appetite. And clearly, a fillet of beef with rosemary is a sensible and classic composition; a combination with far too many misshapen, clearly oversized olives is rather less so.
By the way, my intention here was not a beef tenderloin; I had actually fed the machine a prompt for a pasta dish. Normally, this could have been refined from here on and the direction corrected again, but I found this result quite impressive as well and let myself drift a bit. By the way, the prompt for Midjourney looked like this:
/imagine Braised lamb, torn olives, pecorino, Champignons, Food photo, On a table of a modern upscale restaurant, Stockholm :: photorealistic, ultra realistic, foreground focus, 8k, volumetric light, filmic, fuji velvia, leica look, ultra detailed
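A prompt like this is easier to vary when the dish and scene on one side and the style keywords on the other are kept as separate lists. The following is only a small sketch of that idea; the function name and the split into two lists are my own assumptions, and the " :: " separator simply mirrors the prompt quoted above.

```python
def build_prompt(subject_terms, style_terms):
    """Join subject and style keywords into one prompt string.

    Hypothetical helper: it reproduces the layout of the prompt above
    (subject part, ' :: ', style part) and nothing more.
    """
    subject = ", ".join(t.strip() for t in subject_terms if t.strip())
    style = ", ".join(t.strip() for t in style_terms if t.strip())
    return f"{subject} :: {style}" if style else subject


subject_terms = [
    "Braised lamb", "torn olives", "pecorino", "Champignons", "Food photo",
    "On a table of a modern upscale restaurant", "Stockholm",
]
style_terms = [
    "photorealistic", "ultra realistic", "foreground focus", "8k",
    "volumetric light", "filmic", "fuji velvia", "leica look", "ultra detailed",
]

# The text that follows the Midjourney command itself.
prompt = build_prompt(subject_terms, style_terms)
print(prompt)
```

Swapping a single ingredient or the location then only means changing one list entry instead of retyping the whole line.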
The very first proposals were far less impressive. Only when I gradually specified my idea did we get closer. To do this, you can select one of the four variants in Midjourney and generate four new proposals based on it. Rinse and repeat.
However, the effort that went into creating these few sample images should not be underestimated, and I was able to approach my little experiment with a very open mind. Of course, this is different in a real scenario. The effort is still significantly less than for any real photo shoot, but it is not negligible either. For really intensive experimentation, the free versions of the tools are not sufficient. Only through refinement, by discarding dead-end branches or adding important keywords, can meaningful results be produced.
Either way, this hints at a highly exciting new technical opportunity that already has enormous potential in many industries and sectors, in fact in all industries where quick initial sketches or visualizations are helpful. Here I am explicitly including culinary arts again, because a first visual idea, a first impression of a new dish or a combination of products, can be rendered quickly this way (with all the limitations and problems that still exist here). Based on such a sketch, further considerations can then be made and discussed. On which plate could the new idea be served? What is the effect of which placement and which proportion? Of course, in the end completely different questions are decisive, first and foremost the taste, but as a first idea this is very helpful.
By the way, the following image is only four clicks away from the first draft just shown. Here I branched off at variant number 2 of the suggestion above. You can still see the contours of the picture that was initially on the table; in the meantime, it has become an atrium. We are still far away from a real dish, but the idea with the raised and structured plate is new, the suggestion to pour a dark jus and a herb oil is not wrong either, and the flowers are quite pretty. Only about the banana chips we still have to talk…
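One way to have that talk with the machine is to add the keywords you liked and explicitly exclude what you did not ask for. Midjourney offers a --no parameter for such exclusions, written at the end of the prompt. The helper below is only a sketch with names of my own choosing, and it is not necessarily how I arrived at the images shown here.

```python
def refine_prompt(prompt, add=(), exclude=()):
    """Append extra keywords and a --no exclusion list to a prompt.

    Hypothetical helper: it only edits the prompt text. Parameters such
    as --no belong at the very end of a Midjourney prompt.
    """
    text = prompt.strip()
    extra = [k.strip() for k in add if k.strip()]
    if extra:
        text = text + ", " + ", ".join(extra)
    unwanted = [k.strip() for k in exclude if k.strip()]
    if unwanted:
        text = text + " --no " + ", ".join(unwanted)
    return text


base = "Braised lamb, torn olives, pecorino, Champignons, Food photo"
refined = refine_prompt(
    base,
    add=["dark jus", "herb oil"],   # suggestions worth keeping
    exclude=["banana chips"],       # the uninvited garnish
)
print(refined)
```

Whether the banana chips really disappear is not guaranteed; an exclusion is a request to the model, not a hard filter.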
In addition to these more graphically oriented tools, new text-generating tools such as ChatGPT are also waiting in the wings to conquer the web and change it permanently. The possible outputs are many and varied. ChatGPT can generate social media posts, write emails or blog posts, draft introductions for the next newsletter, suggest keywords for Google Ads campaigns, or even write recipes.
However, initial experiments here also show that we have certainly not yet reached a reliable level in culinary matters. The machines generate recipe texts that are readable and conform to the formal standard at first glance, but if you take a closer look, you certainly don’t want to cook them, and you naturally miss all the details that a human cook would have provided. You can clearly see the basis on which training was done and that the focus was initially on the absolute basics. In this context, I don’t want to withhold the recipe for the previously generated image from you 😉
Recipe for braised lamb with olives and mushrooms (by OpenAI – at your own risk)
- 500 g lamb meat, cut into small cubes
- 2 tbsp olive oil
- 3 medium onions, thinly sliced
- 2 cloves of garlic, finely chopped
- 200 g mushrooms, cut into slices
- 1 tbsp tomato paste
- 300 ml vegetable broth
- Salt and pepper
- 2 tbsp fresh thyme, chopped
- 4 tbsp olives, chopped
- 1 tbsp fresh oregano, chopped
1. In a large frying pan, heat the olive oil.
2. Add lamb cubes and sauté over medium heat until lightly browned on the surface.
3. Add onions and garlic and fry for another 3-4 minutes.
4. Add mushrooms and tomato paste and fry for another 2-3 minutes.
5. Add vegetable broth and mix everything well. Reduce heat and simmer everything for 8-10 minutes.
6. Add salt, pepper, thyme and oregano and mix well.
7. Add the olives and simmer for another 5-10 minutes.