Notes
Gains in the Reliability of Landscape Simulations Due to Greater Realism or Dynamic Exploration? Differences Between Designers And Non-Designers
Justin Kau (University of Oregon)
Robert Ribe (University of Oregon)
Chris Enright (University of Oregon)
David Hulse (University of Oregon)
Abstract
The reliability of ratings of four different simulations of the same plaza design was tested for perceptions of four qualities. The simulations varied in graphical realism and in static views versus dynamic exploration. Added graphical realism generally contributed most to reliable ratings, while dynamic exploration did not. Simulations employing static, high-realism views elicited more reliable ratings than other types of simulations depicting the same place. People with design training are less reliable at rating the scenic beauty of the same place across different simulation types, perhaps because designers' ratings are more 'over-wrought' and affected by simulations' attributes. Design training does make people more reliable at rating the coherence of environments in ways not sensitive to the quality or mode of simulations, unlike those without such training. Design training tends to bias preference ratings upward when simulations' realism increases, while dynamic exploration biases others' ratings downward. Perceived realism ratings are broadly unreliable among everyone.
Introduction
Issues arise regarding methods and standards for visual simulations of design projects (Cuff & Hooper, 1979; Lovett et al., 2015). These usually focus on the accuracy or validity of simulations in representing what a landscape change will actually look like from appropriate viewpoints (Sheppard, 2001; Daniel & Meitner, 2001). This study focused instead on the reliability of visual landscape simulations. Reliability can indicate the merit of simulations in eliciting perceptions of qualities that many people tend to agree upon. This can indicate their effectiveness at conveying those qualities (Sheppard, 2005). It can warrant claims that simulations are effective by virtue of conveying landscape attributes in ways that stimulate inter-subjectively shared perceptions of particular qualities.
Two factors that can enhance visual landscape simulations are greater representative realism (Lange, 2001; Kingery-Page & Hahn, 2012) and user-directed dynamic exploration (An & Powe, 2015; Ball, Capanni & Watt, 2008; Danahy, 2001). Both entail additional production costs and expertise. These alternatives can produce simulations of similar validity (Stamps, 2010), so the gains they offer in eliciting more reliable perceptions need to be assessed.
Experimental Plan
An experiment tested whether static versus self-directed dynamic simulations, at low or high levels of realism, increase the reliability of qualitative ratings among people with or without design training. Four simulations were produced of the same urban park design, stratified to represent low versus high realism and static views versus dynamic exploration via a game engine model. These were rated by members of the public and by people with design training via an online survey.
Respondents rated three qualities that might be similarly perceived via different visual simulations of the same environment, much as they might be elicited by the actual environment. One was scenic beauty, to query simple, first-blush aesthetic perceptions. A second was preference or liking, as queried by a respondent's desire to use and return again to the plaza. A third was coherence, or how well an environment is fully understood and cognitively navigated or 'hangs together', a synthetic quality that can contribute to more-affective or aesthetic perceptions (Kaplan, 1987). Coherence was selected for testing because higher quality simulations, particularly if experienced by self-directed exploration, may or may not better convey it (Danahy, 2001; Kroh & Gimblett, 1992; Wilson, 1999).
Perceived realism was a fourth quality rated because such ratings can be used to measure the realism of visualizations, and their reliability is little understood (Lange, 2001). This quality was not expected to be perceived similarly across the experimental simulations because they intentionally exhibited a range of graphical realism. The level of agreement, or reliability, of perceived-realism ratings could still be assessed within and between simulations to see how these assessments differ across simulation types.
Methods
Test Design
The study site was an urban plaza known as Kesey Square or Broadway Plaza at the center of Eugene, Oregon. It is a fifth of an acre at a street corner, bounded on the other two sides by windowless walls. A new design was created for this space with attributes useful for this study. It was rich in diverse layers of colorful vegetation around the perimeter and in the center, contained a focal rock feature, offered places to sit including the edges of raised planters, and provided paths to navigate and discover an initially hidden central refuge bench next to a water feature.
The study employed two slightly different versions of this design: one with peripheral benches and one without. These allowed respondents to rate nearly the same landscape twice, with enough difference that they would not feel they were being asked to make exactly the same judgments a second time. Subsequent analysis of the study's ratings data showed no significant difference between these two versions of the design across all the perceptions rated and simulation modalities.
Simulation Methods
The project budget did not allow inclusion of virtual reality simulations. Simulations of static views were generated as 'photos' of a Rhino 3D 5.0 model of the design, with V-Ray software used to make the surfaces, textures and lighting more photorealistic. Simulated 3D models for dynamic exploration were created using the Unity 3D video game engine, with MaterialStudio used to make the surfaces, textures and lighting more photorealistic and PlayMaker software used to enable users' exploration. All objects were scaled against field photos that included distances and objects measured in the field or otherwise known. OpenStreetMap data also assisted accurate scaling of the site base model and distances along its street and building edges. Vegetation was rendered as described below.
Simulating Levels of Realism
Each design was simulated twice for the Rhino static view model and twice again for the Unity dynamic game engine model. Within the static models, one simulation employed low-detail, low-realism representations of tree and shrub types generated with Cinema 4D. These exhibited abstracted forms produced by low polygon counts that enable faster processing (Figure 1). The second static model employed higher-resolution, more realistic, renderings of tree and shrub types with much higher polygon counts using LandsDesign software. Within the dynamic models, the same Cinema 4D low-detail representations of the trees and shrubs were again employed. The higher-detail representations of trees and shrubs for the dynamic model were generated using Speedtree software (Figure 1).
Figure 1. Methods of representing different levels of realism in trees
Static View Sample
Six static views within each Rhino model were captured using a set of image capture setups (Figure 2). These were selected to capture a representative, diverse set of views that together included all of the park. They included views from outside looking into the park, views within the park, and views from inside looking out. All views had paths or sidewalks in the foreground. Two included streets. Two included the discovered feature bench next to the water feature. One included the focal rock feature. The views from the dynamic exploration Unity model, shown in the two right-hand columns of Figure 2, are similar to the adjacent static Rhino model views. The Unity views were not rated by the respondents.
Figure 2. Common views of the simulations across the four modalities
Simulation Modalities
The above procedures yielded four simulation modalities for survey respondents to experience: (1) static two-dimensional views of a low-realism, three-dimensional model; (2) static two-dimensional views of a high-realism, three-dimensional model; (3) dynamic, self-directed exploration of a low-realism, three-dimensional model; and (4) dynamic, self-directed exploration of a high-realism, three-dimensional model. The six representative views of each of these modalities are shown in Figure 2. The low-realism views of the static model are similar to those of the low-realism dynamic model. Between the static versus dynamic high-realism views, the static model is a bit more realistic. This difference in realism is markedly less than that between any of the low-realism versus the high-realism views. The research therefore focused mainly on this latter big difference while interpreting the data with awareness of the likely perceived differences between the two modes of high-realism simulation.
Survey Construction
An on-line survey began with instructions that a sequence of four experiences of different simulations of similar designs of the same park was to follow. The four qualities to be rated after every experience were briefly defined, and respondents were encouraged to judge each independently of the other three. Thereafter the simulated experiences were presented in the order listed above, rather than in random orders, because showing higher-realism modalities before lower-realism ones would provide subjects with a reference level and produce a downward anchoring bias (Tversky & Kahneman, 1974) in rating low-realism simulations, 'leading the witness' regarding the researchers' inquiry. Each simulated experience was followed by a single survey page for making ratings. Respondents were randomly assigned to a protocol that began either with the with-benches design or with the no-benches design, with these alternating thereafter to introduce some randomization of stimuli.
Each of the first two static view simulation experiences consisted of a sequence of the six 'photo' views of a simulation that the respondent scrolled through. The last two dynamic, self-directed simulation experiences were initiated with instructions to explore the simulated park only once for one to two minutes, with encouragement to explore both the edges of the park and well into and around it. The respondent then clicked on a link that took them to an on-line game engine model that began with a brief instruction box explaining how to use the mouse and arrow keys to move about the simulated park and look up and down. The survey only functioned for respondents using computers, not smartphones or tablets, which was made clear in the recruitment of respondents. They explored each game engine model for as long as they wished until they exited and were returned to the survey page for rating that experience.
The four ratings rendered after every experience of a simulation appeared on one survey page. The order of ratings was: (1) beauty, with particular reference to the plants and planting design; (2) coherence, or ease of navigation and discovery of the park's layout and nice places; (3) perceived degree of realism, or ease with which one can grasp a sense of what it would feel and look like if it were actually built; (4) preference, or liking the park enough to return for more visits. For each of these ratings, respondents toggled a pointer to any desired point along a scale from '1' (lowest value) up to '10' (highest value), or they could check a 'don't know' box.
Respondents finished by answering questions querying their age range, gender, and whether they had any formal training in a design profession, identified as a 'gamer', or had visited Kesey Square.
Public Surveys
The survey protocol described above was administered to two respondent subsamples. An initial subsample of 126 respondents was recruited via Qualtrics. These were adult residents of American cities with populations of at least 50,000 who spent at least two minutes responding to the whole survey, entailing four experiences and four ratings of each experience, to eliminate speeded responses just seeking credit for participation from Qualtrics. Respondents were culled from this subsample if they took less than 100 seconds to explore either of the game engine simulations. This latter criterion was derived from pre-test survey trials, where subjects with little or no previous experience navigating video games needed at least that many seconds to explore the model enough to see all elements of the park from at least one vantage point. A few respondents were also culled if they rendered the same rating score for at least 88 percent of all their ratings, suggesting non-conscientious participation. This yielded a final Qualtrics-recruited subsample of 80 respondents.
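A minimal Python sketch of these screening rules is shown below, applied to a hypothetical per-respondent table; the column names and data layout are illustrative assumptions, not the study's actual data structure.

```python
import pandas as pd

# Hypothetical per-respondent table. 'min_explore_seconds' is the shorter of the
# two game-engine explorations; 'ratings' flattens the 16 ratings (4 qualities x
# 4 simulation experiences) each respondent rendered.
responses = pd.DataFrame({
    "total_seconds":       [95, 300, 410],
    "min_explore_seconds": [110, 80, 150],
    "ratings": [[5] * 16, list(range(1, 17)), [7, 8, 6, 9] * 4],
})

def conscientious(row):
    # Share of ratings that equal the respondent's single most common score.
    share_same = pd.Series(row["ratings"]).value_counts(normalize=True).max()
    return (row["total_seconds"] >= 120           # whole survey took >= 2 minutes
            and row["min_explore_seconds"] >= 100  # >= 100 s in each game-engine model
            and share_same < 0.88)                 # not one score for 88%+ of ratings

kept = responses[responses.apply(conscientious, axis=1)]
print(len(kept), "respondents retained")
```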
A further subsample of respondents to the same survey was recruited to enable testing of the perceptions of people with at least some design training. This subsample was recruited via hall posters in the College of Design (COD) at the University of Oregon as well as via brief presentations to architecture, design, digital arts and landscape architecture classes in that college. Only about three fifths of these supplemental respondents indicated in their survey responses that they had design training. It is likely that they included respondents from outside the College who saw the posters or who were indirectly recruited by COD students by word-of-mouth or through social media. All this yielded 52 additional respondents who met the criteria for conscientious participation. The two subsamples combined to yield a final sample of 132.
Reliability Analysis Methods
The comparative validity of the four simulation modalities could not be tested unless the park were actually built and rated by actual visitors to provide criterion variables. This study therefore focused only on the comparative reliability of the modalities. The question was how well any particular simulation elicits similar ratings of some particular quality across multiple viewers, so that one might infer that they are perceiving the simulated environment in similar ways. A less reliable simulation would produce highly variable ratings of a quality, irrespective of their mean value, such that people evidently do not agree upon the degree to which the simulation communicates attributes that elicit perceptions of that quality. A more reliable simulation would convey attributes of the landscape being simulated well enough to elicit similar ratings of a quality across a diversity of people. Such reliability tests may be applied as a test of the virtue of a simulation irrespective of whether these ratings match those of the actual landscape being simulated, which would instead be a validity test.
Three tests of reliability employed in this study are illustrated in Figure 3. Greater within-simulation reliability is indicated if the dispersion of a simulation's ratings for a quality is tighter, as measured by their standard error. An indication of between-simulation reliability is whether two different simulations of the same place produce statistically the same mean ratings for a perceived quality, using a t-test. If two simulations' means for the same perceived quality are significantly different, this suggests the introduction of a perceptual bias between the simulations and low reliability of either or both, without knowing whether either simulation is more valid.
Figure 3. Hypothetical survey results illustrating three measures of two simulations’ reliability
Another measure of between-simulation reliability is the Hedges' g statistic (Cohen, 1988). It can test the likelihood of a difference between mean ratings from two different simulations (Figure 3). If a difference is unlikely, and there is reason to believe the means should be the same because both simulations depict the same landscape, the simulations may actually be equally reliable. Hedges' g pools the standard deviations of the two sets of ratings to estimate how likely it is that the different simulations of the same place are eliciting the same perceptions of a quality.
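To make the computation concrete, the following Python sketch computes Hedges' g for two sets of ratings. The pooled standard deviation and the small-sample correction factor follow the standard definition of the statistic rather than the study's own analysis scripts, and the example ratings are invented.

```python
import numpy as np

def hedges_g(a, b):
    """Hedges' g: bias-corrected standardized difference between two rating sets."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Pool the two sample standard deviations.
    sd_pooled = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
                        / (n1 + n2 - 2))
    d = (a.mean() - b.mean()) / sd_pooled   # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)         # small-sample correction factor
    return j * d

# Hypothetical ratings of the same quality from two simulation modalities.
print(hedges_g([7, 8, 6, 9, 7, 8], [6, 7, 6, 8, 7, 7]))
```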
Data Analysis
All analyses were performed across the subsample of respondents (n = 40) who indicated formal training in a design profession, including seven respondents from the Qualtrics recruitment, and separately across the other respondents (n = 92).
Standard errors and mean ratings were calculated for each unique combination of simulation modality, quality rated, and respondent sample. These measured within-simulation reliability for each such combination, for comparison across simulation modalities.
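A short pandas sketch of this tabulation is shown below, using synthetic ratings and illustrative column names; a tighter standard error of the mean indicates greater within-simulation reliability.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical long-format ratings: one row per respondent x modality rating event.
modalities = ["static_lo", "static_hi", "dynamic_lo", "dynamic_hi"]
ratings = pd.DataFrame({
    "modality": np.repeat(modalities, 30),
    "quality": "beauty",
    "subsample": np.tile(["designer"] * 10 + ["non-designer"] * 20, 4),
    "rating": rng.integers(1, 11, size=120),
})

# Mean and standard error of the mean for each combination of modality,
# quality, and respondent subsample.
summary = (ratings
           .groupby(["modality", "quality", "subsample"])["rating"]
           .agg(["mean", "sem"]))
print(summary)
```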
For each combination of quality rated and respondent sub-sample, all pairs of means across the different simulation modalities were tested for statistically significant differences using repeated-measures, two-tailed t-tests at p = .05. These tested for perceptual biases introduced by switching between simulation modalities.
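A minimal SciPy sketch of one such repeated-measures comparison follows, using hypothetical paired ratings aligned by respondent.

```python
import numpy as np
from scipy import stats

# Hypothetical ratings of one quality (e.g. coherence) by the same respondents
# under two modalities, aligned by respondent.
static_hi = np.array([8, 7, 9, 6, 8, 7, 9, 8])
dynamic_hi = np.array([7, 6, 8, 6, 7, 6, 8, 7])

# Repeated-measures, two-tailed t-test; p < .05 indicates a between-simulation bias.
t_stat, p_value = stats.ttest_rel(static_hi, dynamic_hi)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, biased = {p_value < 0.05}")
```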
All the same pairs of mean ratings tested above for between-simulation perceptual bias (using t statistics) were also separately tested for between-simulation reliability in producing the same mean rating for each quality rated. The Hedges' g statistic performed this test. It estimated the degree of consistency between the patterns of ratings elicited by two simulation modalities in response to the same landscape quality rating question. Three ranges of Hedges' g were interpreted, as suggested by Stamps (2002): consistent rating patterns for g = 0–0.35; significantly inconsistent patterns for g = 0.35–0.50; and substantially inconsistent patterns for g > 0.50.
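The interpretation of the resulting g values can be expressed as a small helper, assuming the hedges_g function sketched earlier and the Stamps (2002) cut-points quoted above.

```python
def classify_consistency(g):
    """Interpret a Hedges' g value using the ranges suggested by Stamps (2002)."""
    g = abs(g)
    if g < 0.35:
        return "consistent rating patterns"
    elif g <= 0.50:
        return "significantly inconsistent patterns"
    return "substantially inconsistent patterns"

# Example: a g of 0.42 falls in the 'significantly inconsistent' range.
print(classify_consistency(0.42))
```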
Results
Independence of Ratings
All the ratings of each quality rendered across all the simulations were combined (n = 5128) and an inter-correlation matrix calculated using Pearson's product-moment correlations. All the values were less than 0.70 except that between scenic beauty and preference (r = 0.82). It was therefore assumed that, except for this one pair, the ratings did not excessively 'load upon' each other in respondents' perceptions and could be considered ratings of largely different perceptions. Preference and scenic beauty perceptions often tend to be highly similar and correlated (Han, 2010), even when rated independently (Purcell, Peron & Berto, 2001), so differences in the reliability of their ratings across simulation modes may be interesting.
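A pandas sketch of this check on a hypothetical pooled ratings table follows; the column names and values are illustrative only.

```python
import pandas as pd

# Hypothetical wide table with one column per rated quality, pooled across
# all simulations and respondents.
pooled = pd.DataFrame({
    "beauty":     [8, 6, 7, 9, 5, 7],
    "coherence":  [7, 5, 6, 8, 6, 7],
    "realism":    [6, 4, 7, 8, 5, 6],
    "preference": [8, 5, 7, 9, 5, 8],
})

# Pearson product-moment inter-correlations among the four rated qualities.
corr = pooled.corr(method="pearson")
print(corr.round(2))
```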
Factor Interaction Effects
Two statistically significant interaction effects were found among study factors by regression analyses across all respondents and simulation modalities. Scenic beauty or preference ratings were each regressed against ratings of coherence and perceived realism and also dummy variables for high simulation realism and static views, with cross effects included. (1) Dynamic exploration was associated with higher scenic beauty or preference ratings only if perceived realism was higher. (2) Higher coherence ratings were more associated with higher preferences than otherwise if simulations were both dynamic and of higher realism.
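A statsmodels sketch of the kind of model described is given below, fit to synthetic data with illustrative variable names; the actual model specification used in the study may differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200

# Hypothetical long-format data, one row per rating event. Column names are
# illustrative, not the study's actual variable names.
df = pd.DataFrame({
    "preference":     rng.integers(1, 11, n),
    "coherence":      rng.integers(1, 11, n),
    "realism_rating": rng.integers(1, 11, n),  # perceived realism rating
    "high_realism":   rng.integers(0, 2, n),   # dummy: high-realism modality
    "static_view":    rng.integers(0, 2, n),   # dummy: static views (dynamic = 0)
})

# Preference regressed on coherence, perceived realism and modality dummies,
# with cross (interaction) effects of the kind reported above included.
model = smf.ols(
    "preference ~ coherence + realism_rating + high_realism + static_view"
    " + realism_rating:static_view + coherence:high_realism:static_view",
    data=df,
).fit()
print(model.params)
```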
General Reliability Results
Figure 4 graphs both the within- and between-simulation reliability results. The former are indicated by the points, with better modalities higher up in the graphs. The latter are indicated by the color of the arrows, with green indicating pairs of means that are most consistent, red the least so, and yellow between. Inspection of all the graphs in Figure 4 suggests several findings.
The patterns of standard errors for both types of respondents in graphs A, B and D are very similar, with the high-realism, static-view modality always most reliable and the high-realism, dynamic modality second best in graphs A and D. Because the graphical realism in the static views was higher than that in the dynamic model, it is reasonable to conclude that both high-realism modalities may be about equally reliable, and more so than both low-realism modalities.
None of the graphs in Figure 4 provides evidence of a clear compound gain in reliability by combining both realism and user-directed dynamic exploration. Low realism produces low reliabilities irrespective of changing between static views versus dynamic exploration.
Preferences exhibit the least variability among within-simulation reliabilities across simulation modalities, perceived realism the most, with scenic beauty and coherence in between (Figure 4). The lowest overall between-simulation reliabilities occurred for perceived realism (4 red and 3 yellow arrows); the second lowest occurred for scenic beauty ratings (3 red and 2 yellow arrows). Between-simulation reliabilities for coherence ratings were quite consistent (2 yellow arrows) and preference ratings were very consistent (all green arrows). Between-simulation reliability tests for the interpretive qualities in graphs A, B and D (particularly for non-designers) exhibit a majority of green arrows, indicating that all simulation modalities are generally quite consistent with each other in eliciting reliable perceptions of the corresponding three qualities.
Figure 4. Graphs of within- and between-simulation reliabilities across all tests
Reliability Between Designer Versus Non-designer Respondents
Non-designers exhibited distinctly more reliability in judging scenic beauty than designers across all simulation modalities (Figure 4A). This was true for within-simulation reliability, indicated by the marked vertical separation of the connected point sets. It was also true for between-simulation reliabilities, indicated by the majority of red and yellow arrows between the designers' points in Figure 4A against the majority of green arrows between the non-designers' points. The same marked difference in within-simulation reliabilities between designers and non-designers was exhibited for coherence ratings (Figure 4B).
Figure 4C shows that non-designers exhibit more within-simulation reliability at rating the perceived realism of various simulations than designers, albeit less obviously than for the other qualities in Figure 4. Between-simulation reliabilities in Figure 4C are predominantly inconsistent across both respondent categories. Realism ratings are sensitive to levels of graphical realism and are wickedly unreliable at measuring this quality.
All between-simulation reliabilities in Figure 4D are green, indicating that all simulation modalities were equally reliable at eliciting preference perceptions. Non-designers again exhibit greater within-simulation reliability than designers in agreeing about preferences.
General Biases Between Simulation Modalities
The graphs in Figure 5 map the mean rating values across test simulations. Red arrows indicate significant differences between mean rating values, by t-tests (p < 0.05), indicating unreliable biases introduced by changing simulation modalities. Green arrows indicate no such differences and correspondingly more between-simulation reliability.
Inspection of all the graphs in Figure 5 shows more red arrows (35) than green (13), indicating common evidence of poor between-simulation reliability. Changing simulation modalities often tends to drive ratings up or down, compared to the preferable case where different simulation modalities would produce statistically the same average rating.
Another general observation is that more realistic simulations biased ratings of all qualities upward, and vice versa, except for coherence in dynamic simulations (Figure 5). Four of the 13 unbiased comparisons were between the two different low-realism simulations and never involved scenic beauty ratings, suggesting non-aesthetic perceptions tend to be more reliably rendered between static and dynamic simulations if these are of low realism. Three of the 13 unbiased comparisons were between static, low-realism versus dynamic, high-realism simulations, suggesting that radically different simulation modalities can be equally reliable.
Figure 5. Graphs of mean rating values across simulation modalities by quality rated
Biases Among Simulations Between Designers and Non-designers
Figure 5A demonstrates that changing simulation modalities tends to bias scenic beauty ratings among both designers and non-designers. Designers tended to see lower levels of scenic beauty than non-designers in low-realism simulations, perhaps because designers 'need' to see high-realism simulations to perceive higher levels of scenic beauty.
A very interesting finding is in Figure 5B. Ratings of coherence by designers were unbiased (green) across all the simulation modalities, while all those among non-designers were biased (red). This suggests design training strengthens people's ability to make the same sense of the configuration of a design irrespective of how it is simulated. People without design training tend to make more or less sense of a landscape depending on how it is simulated. They find less coherence via self-directed explorations of game engine simulations and more coherence in more realistic simulations. This suggests that static view simulations with high realism are the best choice if an important goal is to convey the spatial configuration of a landscape to the general public, consistent with Wilson (1999).
The comparisons of mean ratings in Figure 5C show that simulation modalities with different levels of graphical realism introduce substantial biases in ratings of perceived realism.
The comparisons of mean ratings in Figure 5D show that simulation modalities introduce biases in preference ratings. An exception is that designers tend to be unbiased in preferences between static or dynamic modalities if the level of realism is fixed. Non-designers exhibit biased ratings between static versus dynamic modalities. Designers may be more 'comfortable' in exploring game engine simulations than non-designers, so that they render the same preference ratings as for static views, while non-designers may tend to have a 'preference-aversive' experience in game engine explorations. Non-designers may be more cognitively distracted by the on-line exploration control interface used in this study.
Conclusions
This study allows an assessment of the relative merits of urban landscape simulations based only on their reliability in consistently producing perceptions across observers. It offers no assessment of their validity, i.e. accuracy, in representing the experience of actual landscapes. The study compared respondents' ratings of simulations of the same urban plaza consisting of static views versus user-directed dynamic exploration of digital models. It also studied the impact of low versus high realism upon ratings of both these simulation modalities.
Levels of Landscape Quality Judgments
Different simulation modalities produced large differences in mean ratings of various qualities. The public's ratings of landscape qualities are highly sensitive to the type of simulation experienced (Figure 5). More realistic simulations generally increase the value of ratings. Self-directed exploration of 3D digital landscape models tends to produce equivalent or lower ratings than static views if the realism of the simulations is held constant.
Simple aesthetic perceptions are highly volatile across simulation modalities, albeit a bit less so among designers' ratings of more realistic simulations. More realistic simulations were associated with higher scenic beauty ratings, and vice versa, more so among designers. Designers evidently require high-realism simulations to perceive higher levels of scenic beauty than do non-designers. Comparisons of public perceptions of landscape aesthetics should be made across simulations of similar realism and modality of exploration, but this is not as much of a concern for perceptions of landscape preferences.
Public perceptions of landscape coherence are quite volatile across simulation modalities, and are driven down by self-directed dynamic experiences compared to static views. Higher realism drives coherence perceptions up within either of these types of experience. Design training improves people’s ability to agree more on levels of perceived coherence, irrespective of simulation type, consistent with findings of Dupont, Antrop & Van Eetvelde (2015).
Ratings of perceived realism are unreliable among designers and non-designers alike, and this is amplified by actual differences in simulations' realism. Biases tended to be smallest between different low-realism simulations. More realism comes at the cost of less reliable perceived-realism ratings.
Visual preferences among non-designers are quite volatile across simulation modalities. More realistic and static simulations elicit the highest preferences and vice versa, while other combinations elicit in-between ratings. Designers' preferences are higher for more realistic simulations, but are not affected by switching between static and dynamic simulations if the level of realism stays the same. Designers are thus less likely than non-designers to change their preference perceptions between static and dynamic simulations with the same level of realism.
General Reliability of Qualitative Perceptions
While different simulation modalities tended often to produce significantly different mean ratings of perceived qualities (Figure 5), the Hedges’ g tests here suggest different simulations often can be equally reliable at estimating the true mean of such ratings that would be rendered from experiences of the real landscape. These nuanced, conflicting findings were most true of non-designers’ ratings, as long as these were not of perceived realism (Figure 4). All the Hedges’ g tests, together with the standard error tests, offer no decisive basis for favoring any particular mode of simulation regarding within- or between-simulation mode reliability. Switching from static views to dynamic exploration, within either high or low realism simulations, generally does not introduce substantial differences in the reliability of qualitative perceptions. Switching from low to high realism, within either static or dynamic simulations, tends to significantly but not substantially increase the overall reliability of ratings.
Non-designers are more reliable at judging scenic beauty than are designers across all simulation modalities. This is perhaps because they are more able to instantly appraise this simple affective quality ‘on its face’ while designers may allow other more complex cognitive ideas or graphical issues to confound the apperception of simple scenic affects.
High-realism simulations of static views are the best simulation modality if a goal is to reliably convey the coherence or spatial configuration of landscapes to non-designers. Low-realism or dynamic simulations should be avoided in such cases. Designers exhibit less reliable perceptions of the coherence of simulated landscapes than non-designers, but they are more reliable at rendering such ratings between simulation modalities. Within any one modality, designers' ratings may be affected by perceptions of how simulations might better convey landscape configurations, producing more variable ratings of coherence than non-designers.
Ratings of perceived realism are not a reliable basis for measuring the actual realism of very differently produced simulations, although other options may be scarce. Designers are more reliable than non-designers in agreeing about levels of perceived realism, so asking them to judge this quality may be less problematic, even though their between-simulation-modality ratings are unreliable.
For all qualities rated, non-designers are more reliable in judging landscape preferences than designers across simulation modalities. Designers likely allow graphical considerations to affect ratings of landscape preference in diverse ways, which reduces their ratings’ within-simulation reliability. Non-designers seem to make landscape preference perceptions more simply ‘on their face’ and thereby with more reliability. For either of these types of people, or both combined, adding more realism or adding self-directed exploration to simulations does not improve the reliability of preference perceptions.
Summary Findings
Identifying the best mode of simulation from this study’s findings will focus here only on findings for the general public of non-designers, and only on perceptions of the normatively interpretive qualities tested, i.e. not perceived realism. Recommendations are based only upon evidence of reliability and not validity, and depend upon which perceived quality is prioritized.
If aesthetic perceptions are critical, static, more realistic views are best, with the highest within-simulation reliability (Figure 4A). This simulation's mean scenic beauty rating was significantly different from those of the low-realism simulations, whether static or dynamic (Figure 5A). The dynamic, high-realism simulation had some merit inasmuch as its mean scenic beauty rating was unbiased compared to that of the low-realism static simulation (Figure 5A).
If coherence perceptions are of concern, static and more realistic views are again best. They consistently have the highest within-simulation reliability, and there are significant losses in between-simulation reliability if the modality is switched to self-directed dynamic exploration (Figure 4B). It should be noted that this recommended static and realistic option elicited significantly different mean ratings of coherence compared to all other simulations (Figure 5B), so it could lose merit if any other modality were proved more valid by other studies.
If preference perceptions are important, all simulation modalities have nearly the same within- and between-simulation reliability so they have roughly equal merit. Here too, static and more realistic simulations have slightly more within-simulation reliability (Figure 4D) and produced significantly different mean ratings compared to other options (Figure 5D).
This study finds that static views with high realism are the best simulation modality because this modality proved best for all three qualities assessed above. Higher realism is the most effective factor in eliciting reliable ratings. Dynamic exploration does not contribute to reliable ratings. This result may be a consequence of two weaknesses in this study: (1) the realism of the alternative self-directed, dynamic, high-realism simulation was a bit lower than that of the recommended high-realism, static-views simulation; and (2) dynamic exploration entailed using arrow keys and a mouse, as opposed to a joystick or virtual reality goggles. Further research is needed that improves these experimental conditions.
References
An, K. & Powe, N.A. (2015). Enhancing ‘boundary work’ through the use of virtual reality: exploring the potential within landscape and visual impact assessment. Journal of Environmental Policy and Planning, 17, 673-690.
Ball, J., Capanni, N., Watt, S. (2008). Virtual reality for mutual understanding in landscape planning. International Journal of Social Sciences, 2, 78-88.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, New Jersey: Lawrence Erlbaum Associates Inc.
Cuff, D. & Hooper, K. (1979). Graphic and mental representation of environments. In A. Seidel & S. Danford, eds. Environmental Design: Research, Theory and Application: Proceedings of EDRA 10, pp. 10-17.
Danahy, J.W. (2001). Technology for dynamic viewing and peripheral vision in landscape visualization. Landscape and Urban Planning, 54, 127-138.
Daniel, T.C., Meitner, M.M. (2001). Representational validity of landscape visualizations: the effects of graphical realism on perceived scenic beauty of forest vistas. Journal of Environmental Psychology, 21, 61-72.
Dupont, L., Antrop, M., Van Eetvelde, V. (2015). Does landscape related expertise influence the visual perception of landscape photographs? Implications for participatory landscape planning and management. Landscape and Urban Planning, 141, 68-77.
Han, K.-T. (2010). An exploration of relationships among the responses to natural scenes: scenic beauty, preference, and restoration. Environment and Behavior, 43, 243-270.
Kaplan, S. (1987). Aesthetics, affect, and cognition: Environmental preference from an evolutionary perspective. Environment and Behavior, 19, 3-32.
Kroh, D.P., Gimblett, R.H. (1992). Comparing live experience with pictures in articulating landscape preference. Landscape Research, 17, 58-69.
Lange, E. (2001). The limits of realism: perceptions of virtual landscapes. Landscape and Urban Planning, 54, 163-182.
Lovett, A., Appleton, K., Warren-Kretzschmar, B., Von Haaren, C. (2015). Using 3D visualization methods in landscape planning: an evaluation of options and practical issues. Landscape and Urban Planning, 142, 85-94.
Purcell, T., Peron, E., Berto, R. (2001). Why do preferences differ between scene types? Environment and Behavior, 33, 93-106.
Sheppard, S.R.J. (2001). Guidance for crystal ball gazers: developing a code of ethics for landscape visualization. Landscape and Urban Planning, 54, 183-199.
Sheppard, S.R.J. (2005). Validity, reliability and ethics in visualization. In I.D. Bishop & E. Lange, eds., Visualization in Landscape and Environmental Planning: Technology and Applications. London: Taylor and Francis, pp. 72-91.
Stamps, A.E. (2002). Fractals, skylines, nature and beauty. Landscape and Urban Planning, 60, 163–184.
Stamps, A.E. (2010). Use of static and dynamic media to simulate environments: a meta-analysis. Perceptual and Motor Skills, 111, 355-364.
Tversky, A., Kahneman, D. (1974). Judgment under uncertainty: heuristics and biases. Science, 185, 1124-1131.
Wilson, P.N. (1999). Active exploration of a virtual environment does not promote orientation or memory for objects. Environment and Behavior, 31, 752-763.