Recently, Visual 1st held a small panel of scientists and technology gurus called “Visual AI — The One-Click Future and Beyond,” which discussed what role AI-driven tools might play in the visual fields. Hans Hartman, President of Suite 48 Analytics, and Christian Rondeau, CTO of Mediaclip, moderated the panel.
About the panel
Visual 1st described the panel:
“Just like in the early days of mobile photography, it’s not the technology shift itself that matters: it’s the breakthrough in solutions for problems — or unmet needs — we didn’t even know we had.
“Fast-forward from the photo app revolution to today’s surging field of AI-driven tools. What can visual AI really deliver? Solutions for one-click photo curation, one-click auto-layout, one-click photo capture, one-click photo editing — and one-click solutions for what we can’t even fathom yet? And how does this one-click automated future coexist with tools or features that give the user manual control to get exactly what they want?”
The panelists included Appu Shaji, CEO and Chief Scientist at Mobius Labs; Brad Malcolm, CEO and co-founder at EyeQ; and Yair Adato, CTO and co-founder at Bria.
Various approaches to AI
Although never at loggerheads during the panel, the three participants approached AI from different angles. For instance, Shaji finds the idea of non-technical people using AI for problem-solving particularly interesting.
Malcolm cited making corrections in images, as opposed to necessarily making enhancements. He also mentioned that just because AI is “hot” doesn’t mean one should always use it. EyeQ uses AI in its analysis to determine when to use AI and when to use more traditional approaches. Malcolm believes this hybrid approach leverages what each does best.
Adato discussed utilizing Natural Language Processing, or NLP, to create or invent new concepts. NLP focuses on the ability of computers to understand text and spoken words in much the same way human beings can. The idea here is that NLP could help extract information from documents and aid in categorizing and organizing those documents.
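To make that idea concrete, here is a deliberately simplified sketch of document categorization. The category names and keyword lists are hypothetical, and real NLP systems use statistical or neural models rather than keyword counting, but the goal is the same: extract signals from text so documents can be sorted automatically.

```python
from collections import Counter

# Hypothetical categories and keyword vocabularies, purely for illustration.
CATEGORY_KEYWORDS = {
    "photography": {"photo", "camera", "exposure", "lens"},
    "finance": {"invoice", "payment", "budget", "tax"},
}

def categorize(document: str) -> str:
    """Assign a document to the category whose keywords appear most often."""
    words = Counter(document.lower().split())  # missing words count as 0
    scores = {
        category: sum(words[kw] for kw in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    return max(scores, key=scores.get)

print(categorize("The camera lens affects exposure in every photo"))
# → photography
```

A production system would replace the keyword sets with a trained language model, but even this toy version shows how text understanding translates directly into organization.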
Further, as Adato mentioned, AI promises to generate something that was not there before. I have dabbled a bit with Snowpixel, for instance, which lets you describe an image in words and generates an image based on that description.
How to evaluate whether AI works outside of a laboratory?
Rondeau asked how to determine whether AI works “out in the wild,” outside of scientists’ hands. Adato stated that you need specific evaluation strategies.
Malcolm continued, stating that EyeQ uses quantitative mathematical data. However, this needs to be evaluated through qualitative A/B testing with customers. In other words, do people feel that the AI works as promised? This feedback is crucial, as the qualitative data must back up the mathematical data.
Shaji continued by stating that AI attempts to mimic human behavior, and that this has particularly taken root in the past 3-4 years. He stated that the scale is enormous because AI can process millions of images.
However, Shaji pointed out, no system is ever 100% accurate. AI algorithms require extensive training and testing. He stated that AI resides at the intersection of technical and social science, and felt that AI must evolve alongside humans socially. For example, “body shaming,” more common forty years ago, would not be considered appropriate today.
What do they see going forward for AI?
Malcolm’s company is developing AI for corrections on video. He stated that this is now possible due to increases in high-speed processing, mentioning AI Scene Detection, which detects specific scenes.
Shaji again cited the usability of AI. He feels that the flurry of AI applications in the next 5-6 years will mean that AI is being trained by people outside the field, joking that he might put himself out of a job.
Adato continued, stating that he felt that all Fortune 500 companies will be software companies in 10 years, and AI will be in all software companies. He then joked that Shaji’s future is secure. He also mentioned the possibility of converting much of AI to a unified platform.
Part of the discussion centered on TensorFlow, an open-source platform that many software companies use for developing machine learning and AI.
Facebook already leans heavily on AI
Each minute, users upload over 136,000 photos to Facebook. Facebook then uses AI to attempt to “understand” photos and videos. It also uses AI in its Deepfake Detection Challenge (DFDC) to attempt to find deepfake videos. The company also uses DeepText to determine how people use language, slang, abbreviations and more. Sound familiar? It should. These are the very examples of AI that the Visual 1st panel discussed.
Additionally, Facebook uses AI-based translation. It also uses AI for policing content, which I hope improves greatly.
Facebook CEO Mark Zuckerberg has said, “Our goal is to enable computers to understand language more like humans would, instead of rote, machine-like memorization of ones and zeros.”
Facebook enters the metaverse
Facebook recently announced a rebranding of its parent company to Meta, indicating that it is all in on virtual and augmented reality, essentially looking toward social media experienced in an immersive 3D environment. Some, including Zuckerberg, refer to this as the Metaverse. Just to be clear, we should not confuse this with Marvel’s Multiverse of similar alternate universes!
Investors and companies have been interested in new online, multi-dimensional, immersive spaces for a while now. The idea doesn’t seem to be going away, even if so far it has never quite taken root. To be fair, though, VR/AR has never been implemented particularly well either. But with increased processing power and AI, perhaps it might be.
After all, Facebook has announced that it will hire 10,000 workers in the E.U. over the next five years to develop the metaverse. Facebook Reality Labs, its VR/AR research and development team, announced the creation of a special product group to focus on its vision for the metaverse. The company has also owned Oculus since 2014. All roads seem to lead to the metaverse.
If Facebook and others are fully invested in creating an immersive metaverse, they would need to lean heavily on machine learning and AI to take the load off creating and coding. And that’s in addition to their already enormous use of AI for photos, videos and many other areas.
Photographers in the metaverse
The idea of the metaverse, of course, is that it promises more interaction. Photographers might build new communities similar to Facebook groups or message boards. People would earn money and purchase items as digital assets. Users could exchange digital assets as Non-Fungible Tokens, or NFTs, allowing them to “tokenize” their work and sell it in marketplaces throughout the metaverse.
The idea here too is that you would have a verified digital identity traced to a user’s digital wallet. The hope is that people cannot create troll accounts as easily, providing a safer environment.
But even if you don’t wish to do this, photographers may find new opportunities. The company’s vision for the metaverse is that it will not only be available on more immersive virtual reality devices, but accessible across different computing platforms. In other words, anything from VR/AR to computers, mobile devices and gaming consoles.
Given how strongly this metaverse relies on an immersive visual environment, there may be opportunities for photography that we have not thought of yet. Could people experience our night panoramas in some sort of virtual reality? Could there be more of an immersive component while experiencing photos? Could one “enter” a photograph while listening to environmental sounds or music?
Thoughts
Is this good? Is this desirable? Will it even really take root? Do we think Facebook, or Meta, would implement this well?
I don’t have the answers to that. However, my guess is that machine learning and AI will have an enormous part in all this.
For now, my interest in machine learning and AI is perhaps more mundane. I want AI to handle the tedious tasks for me so I can create. If that means getting rid of high-ISO noise and hot pixels, such as what Topaz DeNoise AI is addressing, great! Quick and effective masking in images? Fantastic! Eliminating sensor spots from images like Luminar Neo will do? Sure! Finding photos quickly without keywording? I say yes!
As the panelists at Visual 1st pointed out, the idea is to make products usable. Adato mentioned focusing on trying to do something “small” at first to bring something to production. And that sounds great to me.
Great observation. As was the case with the arrival of smartphone cameras, AI is there to stay and will further democratize photography, while offering opportunities as well as challenges for serious photographers. One point of clarification: Bria’s technology uses a range of AI technologies besides NLP to create synthetic media, including computer vision and GAN (generative adversarial networks).
Thanks for your comments, Hans, and glad you like the observations/opinions. Just so you know, I actually do know that Bria uses computer vision and generative models. However, since I believe Adato only discussed utilizing NLP during the panel, I thought I would just mention that. Hopefully that doesn’t paint Bria as a company only using NLP. I am certain they as well as the others are well-versed in a variety of approaches.
Finally, great job moderating the panel discussion and keeping the conversation flowing.