Concept: Amazon Mechanical Turk
Amazon Mechanical Turk (AMT) is an online crowdsourcing service where anonymous online workers complete web-based tasks for small sums of money. The service has attracted attention from experimental psychologists interested in gathering human subject data more efficiently. However, relative to traditional laboratory studies, many aspects of the testing environment are not under the experimenter’s control. In this paper, we attempt to empirically evaluate the fidelity of the AMT system for use in cognitive behavioral experiments. These types of experiments differ from simple surveys in that they require multiple trials, sustained attention from participants, comprehension of complex instructions, and millisecond accuracy for response recording and stimulus presentation. We replicate a diverse body of tasks from experimental psychology, including the Stroop, Switching, Flanker, Simon, Posner Cuing, attentional blink, subliminal priming, and category learning tasks, using participants recruited via AMT. While most of the replications were qualitatively successful and validated the approach of collecting data anonymously online using a web browser, others revealed disparities between laboratory results and online results. A number of important lessons emerged in the process of conducting these replications that should be of value to other researchers.
SUMMARY: This application note describes a new scalable semi-automatic approach, the Dual Point Decision Process (DP2), for segmentation of 3D structures contained in 3D microscopy images. The segmentation problem is distributed to many individual workers such that each receives only simple questions regarding whether or not two points in an image are placed on the same object. A large pool of micro-labor workers available through Amazon’s Mechanical Turk system provides the labor in a scalable manner. Availability and Implementation: Python-based code for non-commercial use and test data are available in the source archive at http://cytoseg.googlecode.com/files/imageprocessing.tar.gz. CONTACT: firstname.lastname@example.org SUPPLEMENTARY INFORMATION: Test result information, discussion, and a DP2 process flowchart are available at Bioinformatics online.
Autonomous vehicles (AVs) should reduce traffic accidents, but they will sometimes have to choose between two evils, such as running over pedestrians or sacrificing themselves and their passenger to save the pedestrians. Defining the algorithms that will help AVs make these moral decisions is a formidable challenge. We found that participants in six Amazon Mechanical Turk studies approved of utilitarian AVs (that is, AVs that sacrifice their passengers for the greater good) and would like others to buy them, but they would themselves prefer to ride in AVs that protect their passengers at all costs. The study participants disapprove of enforcing utilitarian regulations for AVs and would be less willing to buy such an AV. Accordingly, regulating for utilitarian algorithms may paradoxically increase casualties by postponing the adoption of a safer technology.
The reasons for using electronic nicotine delivery systems (ENDS) are poorly understood and are primarily documented by expensive cross-sectional surveys that use preconceived close-ended response options rather than allowing respondents to use their own words. We passively identify the reasons for using ENDS longitudinally from a content analysis of public postings on Twitter. All English-language public tweets including several ENDS terms (e.g., “e-cigarette” or “vape”) were captured from the Twitter data stream during 2012 and 2015. After excluding spam, advertisements, and retweets, posts indicating a rationale for vaping were retained. The specific reasons for vaping were then inferred based on a supervised content analysis using annotators from Amazon’s Mechanical Turk. During 2012, quitting combustibles was the most cited reason for using ENDS, appearing in 43% (95%CI 39-48) of all reason-related tweets (e.g., “I couldn’t quit till I tried ecigs”) and eclipsing the second most cited reason by more than double. Other frequently cited reasons in 2012 included ENDS’s social image (21%; 95%CI 18-25), use indoors (14%; 95%CI 11-17), flavors (14%; 95%CI 11-17), safety relative to combustibles (9%; 95%CI 7-11), cost (3%; 95%CI 2-5), and favorable odor (2%; 95%CI 1-3). By 2015 the reasons for using ENDS cited on Twitter had shifted. Both quitting combustibles and use indoors significantly declined in mentions to 29% (95%CI 24-33) and 12% (95%CI 9-16), respectively. At the same time, social image increased to 37% (95%CI 32-43) and lack of odor increased to 5% (95%CI 2-5), the former leading all cited reasons in 2015. Our data suggest the reasons people vape are shifting away from cessation and toward social image. The data also show how the ENDS market is responsive to a changing policy landscape. For instance, smoking indoors was less frequently cited in 2015 as indoor smoking restrictions became more common.
Because the data and analytic approach are scalable, adoption of our strategies in the field can inform follow-up survey-based surveillance (so the right questions are asked), interventions, and policies for ENDS.
Incorrect beliefs about memory have wide-ranging implications. We recently reported the results of a survey showing that a substantial proportion of the United States public held beliefs about memory that conflicted with those of memory experts. For that survey, respondents answered recorded questions using their telephone keypad. Although such robotic polling produces reliable results that accurately predict the results of elections, it suffers from four major drawbacks: (1) telephone polling is costly, (2) typically, less than 10 percent of calls result in a completed survey, (3) calls do not reach households without a landline, and (4) calls oversample the elderly and undersample the young. Here we replicated our telephone survey using Amazon Mechanical Turk (MTurk) to explore the similarities and differences in the sampled demographics as well as the pattern of results. Overall, neither survey closely approximated the demographics of the United States population, but they differed in how they deviated from the 2010 census figures. After weighting the results of each survey to conform to census demographics, though, the two approaches produced remarkably similar results: In both surveys, people averaged over 50% agreement with statements that scientific consensus shows to be false. The results of this study replicate our finding of substantial discrepancies between popular beliefs and those of experts and show that surveys conducted on MTurk can produce a representative sample of the United States population that generates results in line with more expensive survey techniques.
Psychological researchers have begun to utilize Amazon’s Mechanical Turk (MTurk) marketplace as a participant pool. Although past work has established that MTurk is well suited to examining individual behavior, pseudo-dyadic interactions, in which participants falsely believe they are interacting with a partner, are a key element of social and cognitive psychology. The ability to conduct such interdependent research on MTurk would increase the utility of this online population for a broad range of psychologists. The present research therefore attempts to qualitatively replicate well-established pseudo-dyadic tasks on MTurk in order to establish the utility of this platform as a tool for researchers. We find that participants do behave as if a partner is real, even when doing so incurs a financial cost, and that they are sensitive to subtle information about the partner in a minimal-groups paradigm, supporting the use of MTurk for pseudo-dyadic research.
Recent research suggests that presenting time intervals as units (e.g., days) or as specific dates can modulate the degree to which humans discount delayed outcomes. Another framing effect involves explicitly stating that choosing a smaller-sooner reward is mutually exclusive to receiving a larger-later reward, thus presenting choices as an extended sequence. In Experiment 1, participants (N = 201) recruited from Amazon Mechanical Turk completed the Monetary Choice Questionnaire in a 2 (delay framing) by 2 (zero framing) design. Regression suggested a main effect of delay, but not zero, framing after accounting for other demographic variables and manipulations. We observed a rate-dependent effect for the date-framing group, such that those with initially steep discounting exhibited greater sensitivity to the manipulation than those with initially shallow discounting. Subsequent analyses suggest these effects cannot be explained by regression to the mean. Experiment 2 addressed the possibility that the null effect of zero framing was due to within-subject exposure to the hidden- and explicit-zero conditions. A new Amazon Mechanical Turk sample completed the Monetary Choice Questionnaire in either hidden- or explicit-zero formats. Analyses revealed a main effect of reward magnitude, but not zero framing, suggesting potential limitations to the generality of the hidden-zero effect.
A substantial body of research documents that exposure to images depicting a “thin ideal” body figure affects women’s state-oriented body satisfaction. However, there is evidence that the societal ideal female body figure is evolving to be not just thin, but also muscular or toned. Therefore, the purpose of this research was to test the effect of exposure to ideal body figures that are both thin and muscular on female state body satisfaction. Researchers recruited female participants (N=366) from an online community (Amazon’s Mechanical Turk) and randomly assigned them to view images in one of four conditions: thin, thin and muscular, thin and hypermuscular, and control (images of cars). Results indicated that state-oriented body satisfaction decreased in the thin condition and the thin and muscular condition, but not in the thin and hypermuscular or control conditions. These findings have implications for clinical initiatives as well as future research.
The reliance on small samples and underpowered studies may undermine the replicability of scientific findings. Large sample sizes may be necessary to achieve adequate statistical power. Crowdsourcing sites such as Amazon’s Mechanical Turk (MTurk) have been regarded as an economical means for achieving larger samples. Because MTurk participants may engage in behaviors that adversely affect data quality, much recent research has focused on assessing the quality of data obtained from MTurk samples. However, participants from traditional campus- and community-based samples may also engage in behaviors that adversely affect the quality of the data they provide. We compare an MTurk, campus, and community sample to measure how frequently participants report engaging in problematic respondent behaviors. We report evidence suggesting that participants from all samples engage in problematic respondent behaviors at comparable rates. Because statistical power is influenced by factors beyond sample size, including data integrity, methodological controls must be refined to better identify and diminish the frequency of participant engagement in problematic respondent behaviors.
Computer- and internet-based questionnaires have become a standard tool in Human-Computer Interaction research and other related fields, such as psychology and sociology. Amazon’s Mechanical Turk (AMT) service is a new method of recruiting participants and conducting certain types of experiments. This study examines whether participants recruited through AMT give different responses than participants recruited through an online forum or recruited directly on a university campus. Moreover, we compare whether a study conducted within AMT results in different responses compared to a study for which participants are recruited through AMT but which is conducted using an external online questionnaire service. The results of this study show that there is a statistically significant difference between results obtained from participants recruited through AMT and the results from participants recruited on campus or through online forums. We do, however, argue that this difference is so small that it has no practical consequence. There was no significant difference between running the study within AMT and running it with an online questionnaire service. This may suggest that AMT is a viable and economical option for recruiting participants and for conducting studies, as setting up and running a study with AMT generally requires less effort and time compared to other frequently used methods. We discuss our findings as well as limitations of using AMT for empirical studies.