Evaluation of visual perception in the context of the sensory city through artificial intelligence: A case study of Zand Complex urban spaces, Shiraz

Document Type: Original Article

Authors

1 M.A. in Urban Design, Faculty of Art and Architecture, Shiraz University, Shiraz, Iran

2 Associate Professor, Department of Urban Planning, Faculty of Art and Architecture, Shiraz University, Shiraz, Iran

Abstract

Visual perception is an important dimension of urban experience that directly shapes spatial behavior and the sense of place. In historical-cultural settings such as the Zand Complex in Shiraz, however, objectively assessing visual quality and linking it to citizens’ lived experiences poses significant challenges. The lack of integrated frameworks that combine field-based sensory observations with computational analysis limits the capacity to generate evidence-based design strategies. In the broader paradigm of the sensory city, where all senses are simultaneously engaged, vision is particularly influential, mediating security, comfort, aesthetics, identity, and place-making qualities that enhance urban vitality. From a neuroscientific perspective, vision is a system consisting of the retina, optic nerve, and visual cortex, in which rapid eye movements (saccades) and binocular vision play key roles in visual processing. In urban environments, the primary components of visual perception identified here are light and brightness, color, and facades and the visual environment, which are further influenced by secondary factors such as natural elements, spatial legibility, and behavior. Adequate lighting affects both perceived security and the duration of a user’s presence, while color conveys symbolic and cultural meanings and influences psychological states such as calm or arousal. Facades and the visual environment, from a videoecological perspective, interact with the physiological and cognitive responses of the eye and brain, and visual environments are classified into three types, “comfortable”, “homogeneous”, and “aggressive”, based on their compatibility with visual mechanisms and individual perceptual experience. This classification allows for a more accurate assessment of visual perception and supports urban design that is aligned with human visual processing.
This research develops and tests a framework for assessing visual perception under the sensory city model and applies it to the urban spaces of the Zand Complex. The specific objectives are: redefining the components and processes of visual perception in this context; establishing measurable criteria for recording people's visual experience in urban environments (namely lighting and illumination, facades and visual environment, color and its harmony, natural elements, legibility and spatial recognition, and people's behavior and activity in urban spaces); assessing the visual and functional performance of the study area using qualitative and quantitative methods; and examining the role of artificial intelligence, especially image processing, classification algorithms, and fuzzy logic, in diagnosing and guiding urban design interventions.
This study was conducted using a mixed-methods case study design. Data were collected through two primary methods and one complementary method for computing secondary data: (1) field surveys and sensory walks, (2) in-depth interviews, and (3) AI-based image processing and secondary visual data. The study area was divided into five spatial zones (A–E) to enable detailed analysis of environmental and visual features. Photographs and videos were recorded with professional cameras, and the camera positions were set to match the natural eye level and visual perception of the users. Field surveys and sensory walks were conducted in two seasons (summer and winter) and in both day and night sessions to strengthen reliability and validity. In parallel with the field surveys and sensory walks, qualitative data were collected from 33 in-depth interviews. The audio recordings were transcribed through an automated speech-to-text routine, and the resulting texts were coded and analyzed in ATLAS.ti. AI-based sentiment and content analysis was then integrated with the image processing outputs to create a comprehensive interpretation of visual perception patterns. At the end of each stage of the two primary methods, and for each of their main criteria, the qualitative data were converted into quantitative scores on a Likert scale for each spatial zone and each criterion. Python-based image processing algorithms and AI-based fuzzy logic were used to extract and process secondary data from visual features. The methodological approach demonstrates how interdisciplinary techniques, combining spatial analysis, sensory ethnography, and computational modeling, can increase the accuracy and depth of urban perception studies.
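As a purely illustrative sketch of how Python-based image processing might feed a fuzzy-logic score, the example below computes two simple luminance features from an RGB image and combines them with a small zero-order Sugeno rule base. The abstract does not specify the features, membership functions, or rules actually used, so every function name, breakpoint, and rule output here is a hypothetical assumption rather than the study's method.

```python
import numpy as np

def ramp_down(x, lo, hi):
    """Membership that is 1 at or below lo and falls linearly to 0 at hi."""
    return float(np.clip((hi - x) / (hi - lo), 0.0, 1.0))

def ramp_up(x, lo, hi):
    """Membership that is 0 at or below lo and rises linearly to 1 at hi."""
    return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))

def triangle(x, lo, peak, hi):
    """Triangular membership: 0 outside (lo, hi), 1 at peak."""
    return min(ramp_up(x, lo, peak), ramp_down(x, peak, hi))

def luminance_stats(rgb):
    """Mean and standard deviation of relative luminance, each in [0, 1]."""
    rgb = np.asarray(rgb, dtype=float) / 255.0
    lum = rgb @ np.array([0.2126, 0.7152, 0.0722])  # Rec. 709 luma weights
    return float(lum.mean()), float(lum.std())

def lighting_score(rgb):
    """Weighted-average (zero-order Sugeno) inference; returns a score in [0, 1]."""
    brightness, contrast = luminance_stats(rgb)
    # (firing strength, rule output); all breakpoints and outputs are hypothetical
    rules = [
        (ramp_down(brightness, 0.1, 0.4), 0.0),       # too dark -> poor
        (min(triangle(brightness, 0.2, 0.5, 0.8),
             ramp_up(contrast, 0.02, 0.15)), 1.0),     # adequate light AND some contrast -> good
        (ramp_up(brightness, 0.7, 0.95), 0.4),         # very bright (possible glare) -> mediocre
        (ramp_down(contrast, 0.02, 0.15), 0.3),        # flat, low-contrast scene -> weak
    ]
    total = sum(weight for weight, _ in rules)
    # Fall back to a neutral score if no rule fires
    return sum(weight * out for weight, out in rules) / total if total else 0.5
```

In use, an eye-level photograph loaded as an H x W x 3 array of 0-255 values would be passed to lighting_score; a real rule base would need to be calibrated against the field observations and interview data.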
The findings indicate that visual perception in the Zand Complex in Shiraz is influenced by objective factors (light, color, texture, facade, greenery) and subjective factors (memory and historical identity), and that user behavior and presence play a significant role in it. The analysis revealed significant variation in visual perception quality among the studied zones. Zones E and B showed the highest levels of visual appeal and spatial presence, while zones C and D1 were relatively desirable and scored more favorably than the remaining zones. In zones C and D1, short-term or tactical interventions can significantly enhance the visual quality and sensory experience of users; the lower-scoring zones require more extensive spatial planning and structural redesign to provide a more desirable sensory experience for urban users. Analyses of facades and the visual environment, based on videoecological criteria (Filin, 1997), showed that "aggressive" and "homogeneous" facades were associated with increased visual noise and eye fatigue, which reduce the quality of experience. In-depth interviews revealed that factors such as accessibility, social functions, proximity to landmark elements, and diversity of activities can help mitigate these visual deficiencies. Methodologically, integrating qualitative insights with AI-based image processing increased the accuracy of visual index assessment and helped translate the findings into practical design recommendations. The proposed framework therefore bridges the gap between phenomenological research and computational analysis, providing a replicable model for assessing the sensory dimensions of urban space.
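A zone-by-zone comparison of this kind rests on Likert scores recorded per spatial zone and per criterion. The sketch below shows one simple way such scores could be aggregated and ranked. It assumes a 1-5 scale and equal weighting of the six criteria, neither of which is stated in the abstract, and the zone names and values are placeholders, not the study's data.

```python
from statistics import mean

# The six criteria named in the abstract; the short keys are illustrative labels
CRITERIA = ("lighting", "facades", "color", "nature", "legibility", "behavior")

def zone_means(scores):
    """Average each zone's Likert scores (assumed 1-5) across all criteria."""
    result = {}
    for zone, by_criterion in scores.items():
        values = [by_criterion[c] for c in CRITERIA]
        if not all(1 <= v <= 5 for v in values):
            raise ValueError(f"Likert scores must lie in 1..5 (zone {zone})")
        result[zone] = mean(values)  # equal weighting is an assumption
    return result

def rank_zones(scores):
    """Return zone names ordered from highest to lowest mean score."""
    means = zone_means(scores)
    return sorted(means, key=means.get, reverse=True)

# Placeholder values for two hypothetical zones (not results from the study)
placeholder = {
    "zone_x": {"lighting": 4, "facades": 4, "color": 3,
               "nature": 5, "legibility": 4, "behavior": 4},
    "zone_y": {"lighting": 2, "facades": 3, "color": 2,
               "nature": 3, "legibility": 2, "behavior": 3},
}

ranking = rank_zones(placeholder)
```

A weighted mean or a fuzzy aggregation could replace the plain average if some criteria are judged more influential than others.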
Beyond the immediate case study, this research contributes to the broader discourse on urban design by highlighting how sense-based methods can be complemented by digital technologies to more effectively assess the quality of place in heritage contexts. This framework not only demonstrates the application of AI to urban perception studies, but also suggests pathways for integrating residents’ experiential knowledge with computational models, thereby enriching participatory planning processes. In doing so, it provides a transferable methodology that can be adapted to other historic urban environments facing similar challenges of conservation, adaptation, and placemaking. Finally, this study emphasizes the necessity of multisensory and evidence-based approaches to designing resilient, legible, and engaging urban spaces that balance cultural continuity with contemporary urban needs while promoting long-term sustainability and civic identity.


Pourjafar, M.R., & Alavi Belmana, M. (1391). Extracting criteria for the harmony and disharmony of building facades with the human visual system according to the principles of visual ecology [in Persian]. Iranian Architecture and Urbanism, 3(1). https://doi.org/10.30475/isau.2013.61951
Pourjafar, M.R., Alavi Belmana, M., Fathollahi, Y., & Pourjafar, A. (1390). Introducing videoecology and extracting criteria for the harmony and disharmony of the visual environment with the visual system from videoecology studies conducted on the facades of various buildings [in Persian]. Urban Management, 9(27), 183–196. https://www.sid.ir/fa/VEWSSID/J_pdf/28713902711.pdf
Dehkhoda, A.A. (1377). Loghatnameh [Dictionary] (Vols. 4 and 6) [in Persian]. Retrieved from https://noorlib.ir/book/view/53064
Roe, J., & McCay, L. (1403). Restorative cities: Urban design for mental health and wellbeing (M. Zolanvari, R. Hadaegh, N. Karimkoshteh, Z. Mohaddes, A. Zabolabbasi, & H. Khatapoush, Trans.; 1st ed.) [in Persian]. Takht-e Jamshid.
Shokouhi Dolatabadi, M., & Zarei, Z. (1400). Analysis of sensory richness using the sensewalking and sensory notation techniques (case study: Azadi Park, Shiraz) [in Persian]. Urban Planning Knowledge, 5(3), 153–169. https://doi.org/10.22124/upk.2021.16538.1474
Nasar, J.L. (1394). Environmental psychology and urban design (N. Pourmohammadreza, Trans.). In T. Banerjee & A. Loukaitou-Sideris (Eds.), Urban design: Contemporary concepts and currents (1st ed.) [in Persian]. Tahan.
Nasar, J.L. (1400). The evaluative image of the city (M. Asadi Mahalchali, Trans.; 2nd ed.) [in Persian]. Armanshahr.
 
Adams, M., Cox, T., Croxford, B., Moore, G., Sharples, S., & Refaee, M. (2009). The sensory city. In R. Cooper, G. Evans, & C. Boyko (Eds.), Designing sustainable cities (1st ed., pp. 75–85). John Wiley & Sons.
Alexander, C., Ishikawa, S., Silverstein, M., & Jacobson, M. (1977). A pattern language: Towns, buildings, construction (1st ed.). Oxford Univ. Press.
Ames, M.G. (2006). The social life of snapshots: The past, present, and future of personal photography [Master’s thesis, School of Information, University of California, Berkeley]. Retrieved from https://morganya.org/research/thesis_Ames_snapshots.pdf
Asadia, N.T., Moustafa, Y.M., & Elazzazy, M.M.F. (2023). Environmental perception of urban spaces: Physical versus virtual exploration. Civil Engineering and Architecture, 11(4), 2182–2200. https://doi.org/10.13189/cea.2023.110437
Bengio, Y., Goodfellow, I., & Courville, A. (2017). Deep learning (Vol. 1). MIT Press. Retrieved from https://www.academia.edu/download/62266271/Deep_Learning20200303-80130-1s42zvt.pdf
Bentley, I., McGlynn, S., Smith, G., Alcock, A., & Murrain, P. (1985). Responsive environments: A manual for designers. Routledge. https://doi.org/10.4324/9780080516172
Bishop, C.M., & Nasrabadi, N.M. (2006). Pattern recognition and machine learning (Vol. 4). Springer.
Cassidy, T. (1997). Environmental psychology: Behaviour and experience in context. Psychology Press. https://doi.org/10.4324/9780203940485
Cullen, G. (1961). The concise townscape (1st ed.). Architectural Press.
Dobreva, D. (2024). Ergonomics and video ecology in the pedestrian zone in the city of Varna - is it safe, comfortable and aesthetic? IETI Transactions on Ergonomics and Safety, 8(1). https://doi.org/10.6722/tes.202404_8(1).0002
Filin, V.A. (1997). Videoecology: Good and bad for eyes (in Russian). Retrieved from https://www.videoecology.com/eng.html
Filin, V.A. (2007). Problem of ecology of urban visual environment. Beijing conference. Retrieved from https://www.videoecology.com/s_china.html
Filin, V.A. (2009). Urban visual environment as a social factor. Moscow centre “Videoecology”. Retrieved from https://www.videoecology.com/s_social.html
Gonzalez, R.C., & Woods, R.E. (2007). Digital image processing (3rd ed.). Prentice Hall. Retrieved from https://sde.uoc.ac.in/sites/default/files/sde_videos/
Grütter, J.K. (2020). Basics of perception in architecture (1st ed.). Springer Fachmedien Wiesbaden. https://doi.org/10.1007/978-3-658-31156-8
Hillier, B. (2007). Space is the machine: A configurational theory of architecture. Space Syntax. London, UK. Retrieved from https://discovery.ucl.ac.uk/id/eprint/3881/
Ingold, T. (2000). The perception of the environment: Essays on livelihood, dwelling and skill (1st ed.). Routledge.
Jacobs, J. (1992). The death and life of great American cities (1st ed.). Vintage.
Jaglarz, A. (2023). Perception of color in architecture and urban space. Buildings, 13(8), Article 8. https://doi.org/10.3390/buildings13082000
Karlen, M., & Benya, J. (2004). Lighting design basics (1st ed.). John Wiley & Sons.
Kaufmann, V. (2016). Rethinking the city. Routledge. https://doi.org/10.4324/9781315782768
Kislinger, L., & Kotrschal, K. (2021). Hunters and gatherers of pictures: Why photography has become a human universal. Frontiers in Psychology, 12, 654474. https://doi.org/10.3389/fpsyg.2021.654474
Kitchin, R. (2014). The real-time city? Big data and smart urbanism. GeoJournal, 79(1), 1–14. https://doi.org/10.1007/s10708-013-9516-8
Kozlova, N. (2016). Contemporary facades of multistorey residential buildings in Kiev: Videoecological aspect. Spatium, 36, 24–33. https://doi.org/10.2298/SPAT1636024K
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Lynch, K. (1960). The image of the city (1st ed.). MIT Press.
Merleau-Ponty, M. (1962). Un inédit de Maurice Merleau-Ponty. Revue de Métaphysique et de Morale, 67(4), 401–409. http://www.jstor.org/stable/40900691
Neisser, U. (1986). Cognitive psychology: Classic edition (1st ed.). Psychology Press. https://doi.org/10.4324/9781315736174
Norberg-Schulz, C. (1976). The phenomenon of place. In M. Larice & E. Macdonald (Eds.), The urban design reader (pp. 125–137). Retrieved from https://books.google.com/books/about/
Pallasmaa, J. (2005). The eyes of the skin: Architecture and the senses (2nd ed.). John Wiley & Sons.
Palmer, S.E. (1999). Vision science: Photons to phenomenology (1st ed.). The MIT Press.
Peca Amaral Gomes, A. (2023). Invisible city. A multi-sensory approach to the analysis of urban space [Doctoral thesis, University College London (UCL)]. London. https://discovery.ucl.ac.uk/id/eprint/10164380/
Pink, S. (2011). A multisensory approach to visual methods. In E. Margolis & L. Pauwels (Eds.), The SAGE handbook of visual research methods (pp. 601–614). SAGE Publications, Inc.
Pole, C.J. (Ed.). (2004). Seeing is believing? Approaches to visual research (Vol. 7). Elsevier JAI. https://doi.org/10.1016/S1042-3192(2004)7
Rapoport, A. (1977). Human aspects of urban form: Towards a man-environment approach to urban form and design (1st ed.). Pergamon Press.
Rensink, R.A. (2000). Scene perception. In A. E. Kazdin (Ed.), Encyclopedia of psychology (Vol. 7, pp. 151–155). Oxford University Press.
Rodaway, P. (1994). Sensuous geographies: Body, sense, and place (1st ed.). Routledge. https://doi.org/10.4324/9780203082546
Rutakumwa, R., Mugisha, J.O., Bernays, S., Kabunga, E., Tumwekwase, G., Mbonye, M., & Seeley, J. (2020). Conducting in-depth interviews with and without voice recorders: A comparative analysis. Qualitative Research, 20(5), 565–581. https://doi.org/10.1177/1468794119884806
Simmel, G. (1997). Sociology of the senses. In A. Blaikie, M. Hepworth, & M. Holmes (Eds.), The body: Critical concepts in sociology (Vol. 1, pp. 5–9). Psychology Press.
Simmel, G. (2023). The metropolis and mental life. In W. Longhofer & D. Winchester (Eds.), Social theory re-wired (3rd ed., pp. 11–19). Routledge. https://doi.org/10.4324/9781003320609
Simmons, S.M., Baur, S., Gillis, W., Burns, D., & Pickerill, H. (2022). Optimizing exterior lighting illuminance and spectrum for human, environmental, and economic factors. IOP Conference Series: Earth and Environmental Science, 1099(1), 012047. https://doi.org/10.1088/1755-1315/1099/1/012047
Cowan, A., & Steward, J. (Eds.). (2007). The city and the senses: Urban culture since 1500. Ashgate Publishing. https://doi.org/10.4324/9781315614731
Wang, P., Song, W., Zhou, J., Tan, Y., & Wang, H. (2023). AI-based environmental color system in achieving sustainable urban development. Systems, 11(3), Article 3. https://doi.org/10.3390/systems11030135
Whyte, W.H. (1980). The social life of small urban spaces. Project for Public Spaces. Retrieved from https://libgen.li/edition.php?id=136538133
Zhang, L., & Kim, C. (2023). Chromatics in urban landscapes: Integrating interactive genetic algorithms for sustainable color design in marine cities. Applied Sciences, 13(18), Article 18. https://doi.org/10.3390/app131810306
Zhang, Y., Wang, P., Wei, W., & Wang, Z. (2024). How to construct an urban color system? Taking the historic center of Macau as an example. Buildings, 14(9), 2874. https://doi.org/10.3390/buildings14092874
Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., & Torralba, A. (2017). Scene parsing through ADE20K dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 633–641. http://openaccess.thecvf.com/content_cvpr_2017/html/Zhou_Scene_Parsing_Through_CVPR_2017_paper.html