Deep learning is increasingly changing how people produce, enjoy, and share art, from concept design to real-time entertainment. Artists have always portrayed the world in inventive and creative ways that evolve along with society. Creating novel content is still largely the province of specialists, but recent research on visual artistic stylization has made it easier than ever for ordinary people to participate. Because neural style transfer demonstrated that visual styles can be encoded and manipulated with deep neural networks, much effort has gone into transferring the style of an arbitrary image, or of a specialized domain, onto a content image effectively and efficiently.
Despite producing impressive results, these techniques can only stylize the single perspective that the content image captures. With the rising demand for 3D asset production, research has shifted from single-image stylization toward stylizing 3D content from multi-view input. Previous approaches to 3D representation often start from explicit models (such as meshes, voxels, and point clouds) and then render them differently for multi-view stylization. Although these techniques allow easy control over geometry, they struggle to model and render complex scenes. The recent implicit representation of the neural radiance field (NeRF) meets the demand for a generic representation of varied scenes and objects and considerably improves the quality of novel view synthesis.
Despite NeRF's stronger scene reconstruction capabilities, it is harder to stylize: its color and shape are encoded in a highly implicit volumetric representation of appearance and geometry, parameterized and entangled by dense MLP networks, and must be altered concurrently. Pioneering NeRF stylization efforts have recently produced exciting advances in the appearance style transfer of 3D scenes. However, their style guidance is based only on image references, which, despite being widely used to specify a target style, is not always the best option. In many cases, it can be difficult or impossible to find style images that both reflect the target style and match the source content.
Stylization guided by natural language is now a reality thanks to parallel developments in language-vision models, making it worthwhile to seek a more straightforward, natural, and expressive form of guidance. Recent text-guided stylization works have shown that, in contrast to image-guided methods, short text prompts offer a more natural, expressive, and accessible way to specify the target style. Even so, stylizing the implicit NeRF representation from a plain text prompt remains difficult with current methods.
Geometry and texture modulations can be constrained by learning a latent space, although that process is often arduous and data-dependent. Other attempts explicitly impose style supervision between the text and the rendered views of NeRF in the CLIP embedding space. Mesh guidance and background augmentation have also been proposed to enhance geometry and texture modulation, yet these methods still lack geometry deformations and texture detail. In this study, the researchers introduce NeRF-Art, a novel text-driven NeRF stylization technique. Given a pre-trained NeRF model and a single text prompt, it synthesizes consistent novel views whose appearance and geometry are altered to conform to the specified style.
This is achieved by combining NeRF with a recent large-scale language-vision model (i.e., CLIP), which is not straightforward owing to several difficulties. One may directly apply CLIP's supervision to NeRF by constraining the similarity between the text and the rendered views in the embedding space, but this alone is insufficient to reach the desired style strength. To solve the issue, the researchers develop a CLIP-based contrastive loss that effectively strengthens the stylization by pushing the results away from other styles designated as negative samples and pulling them closer to the target style. They further extend this contrastive constraint into a hybrid global-local framework covering both global structures and local details, ensuring more consistent stylization over the whole scene.
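The contrastive idea can be sketched as an InfoNCE-style objective over CLIP embeddings. The sketch below is a minimal illustration, not the paper's exact formulation: the function names, the temperature value, and the use of plain cosine similarity are assumptions for clarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors (plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def clip_contrastive_loss(render_emb, target_emb, negative_embs, tau=0.07):
    """InfoNCE-style loss: pull the rendered view's CLIP embedding toward
    the target style text and push it away from negative style texts."""
    pos = math.exp(cosine(render_emb, target_emb) / tau)
    neg = sum(math.exp(cosine(render_emb, n) / tau) for n in negative_embs)
    return -math.log(pos / (pos + neg))
```

In the hybrid global-local scheme described above, a loss of this form would be applied both to the embedding of the full rendered view and to embeddings of local patches, so that coarse structure and fine detail are stylized consistently.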
Additionally, the researchers relax the pre-trained NeRF's density constraints to enable geometry stylization alongside appearance, and they introduce a weight regularization that effectively suppresses hazy artifacts and geometry noise as the density field changes. In experiments, they first assess the choice of text description for stylization, then test their approach on diverse styles, demonstrating the efficiency and adaptability of text guidance for NeRF stylization. A user study shows that their method produces the most aesthetically appealing results among comparable approaches. They also extract a mesh from the stylized NeRF to illustrate how their technique modifies geometry, and combine it with other baselines to show that it applies to various NeRF-like models. The code implementation is available on GitHub along with illustrative images.
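The weight regularization can be pictured in terms of the standard NeRF volume-rendering weights along a ray. The sketch below is a generic illustration of one way to penalize hazy, spread-out density (an entropy penalty on the normalized weights); the paper's exact regularizer may differ, and the function names are assumptions.

```python
import math

def render_weights(sigmas, deltas):
    """Standard NeRF volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)),
    where T_i is the accumulated transmittance along the ray."""
    weights, trans = [], 1.0
    for s, d in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-s * d)
        weights.append(trans * alpha)
        trans *= 1.0 - alpha
    return weights

def weight_entropy_reg(weights, eps=1e-10):
    """Entropy of the normalized weight distribution along a ray: low when
    density concentrates at a surface, high for hazy, spread-out density."""
    total = sum(weights) + eps
    probs = [w / total for w in weights]
    return -sum(p * math.log(p + eps) for p in probs)
```

A penalty of this kind, added to the stylization objective, discourages the density field from drifting into semi-transparent "fog" while still allowing genuine geometry deformation.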
Check out the Paper, Project, and GitHub. All credit for this research goes to the researchers on this project. Also, don't forget to join our Reddit page and Discord channel, where we share the latest AI research news, cool AI projects, and more.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.