1. Introduction
There is increasing interest in understanding tourists’ preferences and viewpoints, given that this information is useful for tourism business research (Brown, 2015; Liutikas, 2017). Thus, there is a need for new preference assessment strategies that directly gather information from tourists. Social media is one of the main and growing sources of tourism-related information (Ayeh, Au & Law, 2013; Boley, Jordan, Kline & Knollenberg, 2018; Munar & Jacobsen, 2014; Xiang, Du, Ma & Fan, 2017).
Recently, social network analysis (SNA) has become a dominant tool for decision making and business research (Batrinca & Treleaven, 2014; Fan & Gordon, 2013; Ordenes et al., 2017). Social network analysis has been defined as a strategy for investigating social structures that combines sociology theory, information theory, and mathematical analysis (Otte & Rousseau, 2002). Similarly, social media analysis (SMA) can be interpreted as a specialized form of SNA that analyses media content generated by users of social media platforms as well as the relationships between this media content within the network of users.
Furthermore, SNA is a powerful tool for tourism research that, at a relatively low cost, can be used to manage and process large datasets of comments, ratings, and shares from different online communities (Brown, 2015; Liu & Liang, 2016; Monterrubio, 2017; Otte & Rousseau, 2002; Schroeder, Pennington-Gray, Kim & Liu-Lastres, 2018; Zeng & Gerritsen, 2014). However, the heterogeneous nature of unsolicited opinions, the complexity of natural language assessment, and differences in the characteristics of the social-data sources hinder the accurate assessment of preferences (Peláez, Cabrera & Vargas, 2018; Rai et al., 2018; Sandberg, Jaradat & Dokoohaki, 2016).
By contrast, solicited data sources, such as direct polling, are typically the preferred means of assessing individual viewpoints. Questionnaires can be constructed such that individual preferences are assessed in detail, whereas SNA requires the assessment of many conversations to compensate for the limited information provided by each individual opinion. Nevertheless, direct polling is often resource-intensive, time-consuming, and geographically limited. These difficulties are particularly marked in tourism research when surveying rural destinations, which are sparsely populated geographic areas situated outside cities and towns (Kelliher, Reinl, Johnson & Joppe, 2018).
We analyze a hybrid approach for preference assessment that combines active polling with passive SNA to build a unified preference metric for tourism research. To this end, we present a novel multiple criteria decision analysis (MCDA) model for preference-extraction from solicited and unsolicited data. We also present a real-world application of the proposed MCDA model for the assessment of preferences in rural destinations.
2. Materials and methods
Many decision-making scenarios involve choices that rely on information provided by peers drawing on their previous experience. The relationships between the perceived information and final decisions are very complex, and thus difficult to model and predict. To overcome these difficulties, previous studies have proposed MCDA methods to model the decision process. In MCDA, criteria are assessed in relation to a target. Specifically, MCDA identifies the relevance of the criteria that lead to choosing one of the target alternatives (Fernández, Bendodo, Sánchez & Cabrera, 2017).
In this regard, we address the issue of rating various rural destinations in Málaga Province (Andalusia, Spain). According to official sources, rural tourism in Málaga Province has increased in the past decade (Turismo y Planificación Costa del Sol S.L.U., 2017). The reasons for this increase are not well documented; however, previous reports suggest that one of the key factors is improvements in the public transport and road systems that lead to small towns. In addition, the number of comments on rural destinations in Málaga in social media has also increased. This study addresses six of these destinations, which account for more than 80% of rural accommodation in this province: Ronda, Antequera, Frigiliana, Alhaurín el Grande, Álora, and Coín.
2.1 Define the social choice problem by identifying plausible alternatives
The first step in the proposed model is to establish a set of comparable alternatives that comprise the target of the MCDA process. Two alternatives are “comparable” if the majority of the study population of individuals consider them to be plausible candidates under the same conditions (e.g. car rental agencies in a given airport, hotels located near the same tourism attraction, neighbouring beaches in a given city, and cities within the same region).
The set of alternatives defines the social choice problem for the proposed MCDA model and can be formally represented as:

$$A = \{a_1, a_2, \ldots, a_k\} \qquad (1)$$

where $A$ represents the set of $k$ comparable alternatives $a_i$.
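In this study, set $A$ corresponds to:

$$A = \{\text{Ronda}, \text{Antequera}, \text{Frigiliana}, \text{Alhaurín el Grande}, \text{Álora}, \text{Coín}\} \qquad (2)$$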
2.2 Identify the criteria that drive decisions
The second step identifies the most relevant criteria when choosing one of the alternatives in set A. In the proposed method, the inclusion requirements for each criterion are as follows:
1. All the alternatives in A must be describable by the criterion.
2. The profile of the alternative described by the criterion must be quantifiable.
Formally, each alternative $a \in A$ has a profile $x^a = (x_1^a, x_2^a, \ldots, x_n^a) \in \mathbb{R}^n$, where $x_j^a$ is the partial assessment of $a$ in relation to criterion $c_j$. From $x^a$ it is possible to estimate an overall measure $M(x^a)$ for each alternative using an aggregation operator $M: \mathbb{R}^n \to \mathbb{R}$ (Peláez et al., 2018).
The following criteria were chosen for this study:
Accommodation.
Public infrastructures.
Cultural highlights.
Natural attractions.
2.3 Assess the criteria
Typical summarization approaches, such as the arithmetic mean, are not suitable for identifying the relevance of decision criteria when there are many opinions originating from multiple individuals with heterogeneous points of view. Therefore, an aggregation procedure is needed (La Red, Doña, Peláez & Fernández, 2011). Aggregation is the mechanism for finding a value that represents the opinions of multiple individuals on a given alternative with “good enough” quality.
The proposed model uses a robust eigenvector-based pairwise voting approach that does not violate democracy when considering multiple decision makers (Vargas, 2016). We consulted 16 experts in tourism from local universities and state-run agencies in the province of Málaga. Each expert ranked the set of criteria such that $x_i > x_j$ means “$x_i$ is preferred to $x_j$”. Subsequently, a pairwise voting matrix $X(\phi)$ was constructed:

$$X(\phi) = \left( \frac{v_{ij}(\phi)}{v_{ji}(\phi)} \right)_{i,j = 1, \ldots, n} \qquad (3)$$

where $v_{ij}(\phi)/v_{ji}(\phi)$ represents the ratio between the number of voters who preferred criterion $i$ to criterion $j$ and the number who preferred criterion $j$ to criterion $i$. Finally, the normalized principal eigenvector of $X(\phi)$ was used to represent the weights of the criteria, as follows:

$$X(\phi)\, W_X = \lambda_{\max} W_X, \qquad W_X = (w_1, w_2, \ldots, w_n), \qquad \sum_{j=1}^{n} w_j = 1 \qquad (4)$$
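As an illustration of this weighting step, the following minimal sketch builds a pairwise voting matrix from expert rankings and approximates its normalized principal eigenvector by power iteration. The function names, the add-one smoothing (used here only to avoid division by zero when a preference is unanimous), and the three example rankings are assumptions made for illustration; they are not taken from the study.

```python
def pairwise_voting_matrix(rankings, n):
    """Build X with X[i][j] = v_ij / v_ji from complete rankings.

    Each ranking lists the n criterion indices, most preferred first.
    """
    # v[i][j] counts the voters who ranked criterion i above criterion j.
    v = [[0] * n for _ in range(n)]
    for ranking in rankings:
        for pos, i in enumerate(ranking):
            for j in ranking[pos + 1:]:
                v[i][j] += 1
    # Assumption: add-one smoothing keeps every ratio finite and positive.
    return [
        [1.0 if i == j else (v[i][j] + 1) / (v[j][i] + 1) for j in range(n)]
        for i in range(n)
    ]


def principal_eigenvector(matrix, iterations=200, tol=1e-12):
    """Approximate the principal eigenvector, normalized to sum to 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [value / total for value in nxt]
        if max(abs(a - b) for a, b in zip(nxt, w)) < tol:
            return nxt
        w = nxt
    return w


# Hypothetical rankings from three experts over four criteria (indices 0..3).
example_rankings = [[0, 2, 3, 1], [2, 0, 3, 1], [0, 3, 2, 1]]
example_matrix = pairwise_voting_matrix(example_rankings, 4)
example_weights = principal_eigenvector(example_matrix)
print(example_weights)
```

Power iteration is used here only because it needs no external libraries; any standard eigenvalue routine would serve the same purpose.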
2.4 Assessment of communications from social media platforms
Automated procedures can be used to obtain communications from open or private sources, such as Twitter or Facebook APIs. We obtained relevant communications related to rural tourism in Málaga province using the capture → extraction → semantic analysis approach, which has proven useful for similar tasks (Peláez et al., 2018). Firstly, communications were acquired from social networks using passive (API) or active (scraping) methods and stored in a data pool. Secondly, a subset of communications related to rural tourism was extracted from the data pool by query-based contextualization. Finally, the topic of the communication related to the criteria was determined using automatic natural language analysis techniques.
Communications were classified into one of four categories corresponding to the criteria. Furthermore, each communication was evaluated as a “good” or a “bad” experience. For example:
“I would love to go to a little house by Frigiliana, combining the rural and the beach for this season” → GOOD ACCOMMODATION at FRIGILIANA.
“Huge traffic jam in the road to Coín, ruining my vacations” → BAD PUBLIC INFRASTRUCTURES at COÍN.
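The extraction and classification steps can be pictured with the toy sketch below. It is only a stand-in: simple keyword matching replaces the automatic natural language analysis used in the study, and the keyword lists, negative cues, and function names are hypothetical.

```python
import re

DESTINATIONS = ["Ronda", "Antequera", "Frigiliana",
                "Alhaurín el Grande", "Álora", "Coín"]

# Hypothetical keyword lists, one per criterion (not from the study).
CRITERION_KEYWORDS = {
    "ACCOMMODATION": {"house", "hotel", "apartment"},
    "PUBLIC INFRASTRUCTURES": {"road", "traffic", "bus"},
    "CULTURAL HIGHLIGHTS": {"museum", "festival", "church"},
    "NATURAL ATTRACTIONS": {"hiking", "river", "mountain"},
}
# Hypothetical cues that mark a communication as a bad experience.
NEGATIVE_CUES = {"ruining", "jam", "awful"}


def find_destination(text):
    """Return the first studied destination mentioned in the text, if any."""
    lowered = text.lower()
    return next((d for d in DESTINATIONS if d.lower() in lowered), None)


def extract(pool):
    """Query-based contextualization: keep texts that mention a destination."""
    return [text for text in pool if find_destination(text) is not None]


def classify(text):
    """Return (polarity, criterion, destination), or None if unclassifiable."""
    destination = find_destination(text)
    tokens = set(re.findall(r"\w+", text.lower()))
    criterion = next(
        (c for c, words in CRITERION_KEYWORDS.items() if tokens & words), None
    )
    if destination is None or criterion is None:
        return None
    polarity = "BAD" if tokens & NEGATIVE_CUES else "GOOD"
    return polarity, criterion, destination.upper()


pool = [
    "I would love to go to a little house by Frigiliana, "
    "combining the rural and the beach for this season",
    "Huge traffic jam in the road to Coín, ruining my vacations",
]
for message in extract(pool):
    print(classify(message))
```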
The final rating of the destination acquired from the evaluated communications was the weighted aggregated mean of the profile set, using the vector $W_X$ for category weights (see Eq. 4), and considering a GOOD experience as a positive value and a BAD experience as a negative value.
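One possible reading of this rating rule is sketched below. The encoding of GOOD as +1 and BAD as -1, the use of the mean signed value per criterion as the profile entry, the treatment of criteria without communications as 0, and the example weights are all assumptions for illustration; the study does not report these details here.

```python
POLARITY_VALUE = {"GOOD": 1, "BAD": -1}  # assumed numeric encoding


def destination_rating(communications, weights):
    """Weighted mean of per-criterion mean polarity.

    communications: list of (criterion, polarity) pairs for one destination.
    weights: dict mapping each criterion to its weight.
    """
    sums = {criterion: 0 for criterion in weights}
    counts = {criterion: 0 for criterion in weights}
    for criterion, polarity in communications:
        sums[criterion] += POLARITY_VALUE[polarity]
        counts[criterion] += 1
    # Assumption: a criterion with no communications contributes 0.
    profile = {
        c: (sums[c] / counts[c] if counts[c] else 0.0) for c in weights
    }
    total_weight = sum(weights.values())
    return sum(weights[c] * profile[c] for c in weights) / total_weight


# Hypothetical weights and classified communications for one destination.
example_weights = {
    "ACCOMMODATION": 0.4,
    "PUBLIC INFRASTRUCTURES": 0.1,
    "CULTURAL HIGHLIGHTS": 0.3,
    "NATURAL ATTRACTIONS": 0.2,
}
example_communications = [
    ("ACCOMMODATION", "GOOD"),
    ("ACCOMMODATION", "GOOD"),
    ("ACCOMMODATION", "GOOD"),
    ("ACCOMMODATION", "BAD"),
    ("PUBLIC INFRASTRUCTURES", "BAD"),
    ("CULTURAL HIGHLIGHTS", "GOOD"),
    ("CULTURAL HIGHLIGHTS", "GOOD"),
]
print(destination_rating(example_communications, example_weights))
```

With this encoding the rating lies between -1 and 1, so it would need rescaling before being compared with a 7-point survey score.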
2.5 Assessment of solicited opinions
The survey was conducted daily from 1 July 2019 to 28 September 2019, thus covering the summer holiday period in Spain. The interviewers used a purpose-built digital polling system. A total of 77 tourists were surveyed after returning by bus from rural vacations to the city of Málaga (the capital of Málaga Province). The inclusion criteria were being aged 18 years or older and having stayed overnight in at least one of the studied destinations.
2.6 Questionnaire design
The variables analyzed in the survey were: gender, age, educational level, country of residence, and length of stay. Tourists who participated were asked to rate the visited destination using a 7-point Likert-type scale. The Likert-type scale ranged from completely dissatisfied to completely satisfied (Andriotis, Agiomirgianakis & Mihiotis, 2008; Vagias, 2006; Zatori, Smith & Puczko, 2018).
3. Results
We acquired a total of 77 direct polls and 403 unsolicited comments using SNA. Table 1 shows the distribution of data entries and Figure 1 shows the proportion of solicited to unsolicited data.
Due to the nature of unsolicited opinions, it was not possible to determine the exact age of the individuals who provided opinions on social networks. On the other hand, the mean age of the tourists who contributed to the survey was 39.97 years. Table 2 shows the descriptive statistics of the age of the participants.
The country of origin for unsolicited communications was inferred using the language used in the post. Most of the participants were Spaniards, comprising 77.9% of the unsolicited sources and 66.5% of the solicited sources. Table 3 shows a cross-tabulation of source and country of origin.
The gender of the posters of unsolicited communications was inferred using the reported public information in the social networks. The gender of the participants was evenly distributed across sources and destinations. Table 4 shows a cross-tabulation of the source and gender of the participants.
According to solicited and unsolicited sources, the most preferred destination was Frigiliana, which comprised 32.5% of the solicited sources and 34.7% of the unsolicited sources. Table 5 shows a cross-tabulation of source vs preferred destinations.
Some of the survey participants reported visiting more than one of the studied destinations. Therefore, 118 ratings were obtained from 77 participants. The highest-rated destination from direct polling was Antequera (6.11), followed by Ronda (6.00). Table 6 shows the descriptive statistics of the ratings obtained from solicited opinions.
Correlation analysis found a strong monotonic association between the ratings obtained from solicited and unsolicited sources (Spearman’s rho = 0.928, p = 0.008). Moreover, linear regression analysis found a strong linear relationship ($R^2$ = 0.788). Table 7 shows the results of Spearman’s correlation analysis and Figure 2 shows a scatter plot of the normalized scores.
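For readers who wish to reproduce this kind of comparison, Spearman’s rho can be computed as the Pearson correlation of the rank-transformed scores, as in the minimal sketch below. The two score lists are illustrative placeholders, not the study’s data, and the sketch does not compute the p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative placeholder scores for six destinations (not the study's data).
solicited_scores = [0.85, 0.83, 0.74, 0.70, 0.66, 0.61]
unsolicited_scores = [0.80, 0.82, 0.71, 0.65, 0.60, 0.55]
print(spearman_rho(solicited_scores, unsolicited_scores))
```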
4. Discussion and conclusion
This article presents a novel MCDA approach to rating the tourist experience. This approach combines solicited information obtained by direct polling with unsolicited information from opinions on social networks.
The rounded ratio of data-points obtained from solicited vs unsolicited information was 19:100. This ratio suggests that it is easier to acquire information on the experience of visitors to selected rural destinations using social network sources, even though the survey was conducted during a 3-month period covering most of the summer season in Spain.
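The rounded ratio follows directly from the two sample sizes reported in the Results (77 direct polls and 403 unsolicited comments):

$$\frac{77}{403} \approx 0.191 \approx \frac{19}{100}, \qquad \frac{403}{77 + 403} = \frac{403}{480} \approx 0.84$$

That is, unsolicited comments account for roughly 84% of all data points collected.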
The experimental results show a strong correlation between the solicited and unsolicited ratings of the most visited rural destinations in Málaga Province, Spain (Spearman’s rho = 0.928, p = 0.008; linear regression $R^2$ = 0.788). These results suggest that the proposed MCDA approach can extract preference information from unsolicited communications, and that this information is strongly associated with the results of direct surveys.
However, this study only evaluated the usability of the proposed MCDA model within the context of a selected group of rural destinations in Spain. There are clear difficulties in conducting direct surveys in such areas. Therefore, future research could include urban destinations, for which the ratio of solicited to unsolicited data-points is likely to be more balanced. Future studies could also use larger samples and study periods covering different seasons of the year.
This study makes three contributions to the literature on tourist experience. Firstly, it overcame some of the limitations of prior studies by considering a novel model for preference-extraction from solicited and unsolicited data. Secondly, it presented empirical evidence using a multiple criteria decision analysis. Thirdly, the results suggest that the proposed MCDA approach can significantly reduce the number of polls required to accurately assess the preferences of visitors to rural destinations, given the difficulties in conducting direct surveys in such areas. From an applied perspective, this research can help tourism researchers to manage large datasets of comments from different online communities.