Abstract
By creating a retrieval-augmented generation (RAG) pipeline with OpenAI’s GPT-4o, we explored the capability of LLMs to ‘twin’ real respondents in a publicly available survey. By providing the model with context from a particular respondent’s previously answered questions, we achieved accuracy on masked questions far better than random. This opens the door to using LLMs to augment known sentiment, extending surveys beyond the original question set and providing a far more flexible framework for working with qualitative data from real respondents and their ‘digital twins’.
Publish Date
July 1st, 2024
Authors
Destiny Ziebol
Surveys – direct, insightful and expensive.
Collecting survey data at any scale can be very expensive. Costs, quality, and pricing structures vary widely, even within a single provider: Survey Monkey estimates costs ranging from one dollar per response to more than three dollars per response for targeted and balanced respondents, with different pricing for each option. Mechanical Turk uses a ‘set your own rate’ structure, with questionable quality (Aguinis et al., 2020). This research uses the General Social Survey (GSS), funded by the National Science Foundation (NSF), which collects roughly three thousand respondent answer sets every two years on an operating budget of roughly $4 million a year (NSF 20-550: General Social Survey (GSS) Competition | NSF, 2020) (NSF 24-537: General Social Survey Competition | NSF, 2024).
Contact us at info@tickr.com if you would like to explore using these techniques to rapidly and accurately predict consumer attitudes and behaviors.
Data set and model
| Party ID* | Subsample | GSS 2022 |
|---|---|---|
| Strong Democrat | 32% | 17.0% |
| Not very strong Democrat | 25% | 12.9% |
| Strong Republican | 23% | 12.3% |
| Not very strong Republican | 20% | 10.3% |

| Sex | Subsample | GSS 2022 |
|---|---|---|
| Female | 54% | 53.8% |
| Male | 46% | 46.2% |

| Race | Subsample | GSS 2022 |
|---|---|---|
| White | 75% | 72.0% |
| Black | 16% | 16.2% |
| Other | 9% | 11.8% |
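The Party ID subsample column sums to 100% across the four categories, while the GSS 2022 column sums to only 52.5%, presumably because the GSS figures also count respondents outside these four categories. Assuming the subsample was restricted to these categories, rescaling the GSS shares over them recovers the subsample split. The Python sketch below does that arithmetic and turns the shares into per-category quotas for 100 respondents; the function names are illustrative, not part of the original work.

```python
# Sketch: check the subsample's Party ID split against GSS 2022.
# Assumption: the subsample was drawn only from the four partisan categories
# below, so the GSS shares are rescaled to sum to 100% over those categories.

GSS_2022_PARTY_ID = {  # percent of all GSS 2022 respondents, from the table above
    "Strong Democrat": 17.0,
    "Not very strong Democrat": 12.9,
    "Strong Republican": 12.3,
    "Not very strong Republican": 10.3,
}


def renormalize(shares: dict[str, float]) -> dict[str, float]:
    """Rescale percentages so they sum to 100 over the given categories only."""
    total = sum(shares.values())
    return {name: 100.0 * pct / total for name, pct in shares.items()}


def quotas(shares: dict[str, float], n: int) -> dict[str, int]:
    """Round each rescaled share to a whole number of respondents out of n."""
    return {name: round(n * pct / 100.0) for name, pct in renormalize(shares).items()}


print(quotas(GSS_2022_PARTY_ID, 100))
```

With n = 100 this reproduces the 32/25/23/20 split in the Subsample column. Plain rounding happens to sum to 100 here; in general it can miss the target by one or two, and a largest-remainder allocation is the usual fix.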
Each question in the survey is uniquely identified by its Schema ID, which we will be using throughout this post. Question text for individual IDs, and the distribution of raw data associated with them, can be found at the GSS Data Explorer: https://gssdataexplorer.norc.org/. The survey data was preprocessed into clean question text, answer texts, and schema keys that can be easily paired with survey responses.
| Schema | Question | ID_1_Answer |
|---|---|---|
| wrkstat | Last week were you working full time, part time, going to school, keeping house, or what? | working full time |
| wrkslf | (Are/Were) you self employed or (do/did) you work for someone else? | someone else |
| wrkgovt1 | (Are/Were) you employed by the government? (Please consider federal, state, or local government.) | no |
| Schema | Question | ID_1_Answer |
|---|---|---|
| natroad | (… are we spending too much, too little, or about the right amount on) Highways and bridges | about right |
| natsoc | (… are we spending too much, too little, or about the right amount on) Social Security | too little |
| natmass | (… are we spending too much, too little, or about the right amount on) Mass Transportation | about right |
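One way to hold the preprocessed data, as a minimal sketch: assume each respondent’s answers are a mapping from schema ID to answer text, and join them to the cleaned question text on that key. The `QARecord` type, the helper functions, and the `schema: question -> answer` line format are hypothetical choices for illustration, not the post’s actual data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QARecord:
    """One survey item paired with a single respondent's answer (hypothetical type)."""
    schema: str    # GSS schema ID, e.g. "wrkstat"
    question: str  # cleaned question text
    answer: str    # this respondent's answer text


def pair_answers(questions: dict[str, str], answers: dict[str, str]) -> list[QARecord]:
    """Join cleaned question text with a respondent's answers on schema ID.

    Items the respondent did not answer are skipped.
    """
    return [
        QARecord(schema, text, answers[schema])
        for schema, text in questions.items()
        if schema in answers
    ]


def to_context_line(record: QARecord) -> str:
    """Render one record as a line of prompt context (hypothetical format)."""
    return f"{record.schema}: {record.question} -> {record.answer}"


questions = {
    "wrkstat": "Last week were you working full time, part time, going to school, keeping house, or what?",
    "wrkslf": "(Are/Were) you self employed or (do/did) you work for someone else?",
    "wrkgovt1": "(Are/Were) you employed by the government? (Please consider federal, state, or local government.)",
}
respondent_1 = {"wrkstat": "working full time", "wrkslf": "someone else", "wrkgovt1": "no"}

records = pair_answers(questions, respondent_1)
for r in records:
    print(to_context_line(r))
```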
Auxiliary Opinion questions capture the respondent’s general positions and outlook. This set comprises 118 questions on a wide range of topics, such as feelings about people of other races, religious preference, and taxation (question IDs for the auxiliary set are listed in appendix item 4). This latter set is entered into the vector store and searched dynamically to supply context for each target question; the top 46 results are added to the prompt context.
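The composition of the full context can be sketched as follows, assuming the auxiliary IDs arrive already ordered best-first by retrieval and re-ranking. Excluding the target’s own schema ID is an added safeguard for the masked-question setup, not a step the post describes, and `build_context` is a hypothetical name.

```python
# Sketch: assemble the prompt context for one target (masked) question.
# Assumptions: ranked_auxiliary_ids is already ordered best-first by retrieval
# and re-ranking; the target's own schema ID is excluded as a safeguard so the
# answer being predicted cannot leak into the context.

def build_context(
    demographic_ids: list[str],
    ranked_auxiliary_ids: list[str],
    target_id: str,
    k: int = 46,
) -> list[str]:
    """Return context schema IDs: every demographic item, then the top-k
    auxiliary items, skipping the target and any duplicates."""
    context: list[str] = []
    seen = {target_id}
    for schema in demographic_ids:
        if schema not in seen:
            context.append(schema)
            seen.add(schema)
    added = 0
    for schema in ranked_auxiliary_ids:
        if added >= k:
            break
        if schema not in seen:
            context.append(schema)
            seen.add(schema)
            added += 1
    return context


demographics = ["sex", "age", "partyid"]
ranked = ["satfin", "workblks", "workwhts", "helpsick", "relactiv"]
print(build_context(demographics, ranked, target_id="confinan", k=3))
```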
Our pipeline uses the LangChain Python library, which lets us combine prompt templates, a vector store, a reranker, and output formatting for each pipeline segment into an easy-to-use function. Before the vector store search, questions were summarized with OpenAI’s gpt-3.5-turbo-0125 checkpoint, which in our experience performed well at a lower compute cost. For the vector store we used Chroma with OpenAI’s text-embedding-ada-002 embedding model, both chosen for their good performance. Results returned from the vector store were re-ranked with FlashRank (Damodaran, 2023) running the rank-T5-flan reranking model, which we chose for its zero-shot, out-of-sample performance and ease of use. For answer prediction we used gpt-4o-2024-05-13, OpenAI’s state-of-the-art flagship model at the time of writing.
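As an architectural sketch only, the stages above can be written as injected callables so the wiring can be exercised without API keys. This does not use LangChain’s actual classes: `TwinPipeline` and the stage signatures are hypothetical, and the stubs stand in for the model calls named in the paragraph.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical stage signatures. In the pipeline described above these would
# wrap gpt-3.5-turbo-0125 (summarize), a Chroma store with
# text-embedding-ada-002 (retrieve), FlashRank with rank-T5-flan (rerank) and
# gpt-4o-2024-05-13 (predict); here they are plain callables.
Summarize = Callable[[str], str]
Retrieve = Callable[[str], list[str]]
Rerank = Callable[[str, list[str]], list[str]]
Predict = Callable[[str, list[str], Sequence[str]], str]


@dataclass
class TwinPipeline:
    summarize: Summarize
    retrieve: Retrieve
    rerank: Rerank
    predict: Predict
    top_k: int = 46

    def run(self, question: str, answers: Sequence[str]) -> str:
        topic = self.summarize(question)         # condense the question to its topic
        candidates = self.retrieve(topic)        # first-pass vector search on the topic
        ranked = self.rerank(topic, candidates)  # second-pass ordering of candidates
        context = ranked[: self.top_k]           # keep only the best k items
        return self.predict(question, context, answers)


# Stub stages so the wiring can be exercised without any API calls.
pipeline = TwinPipeline(
    summarize=lambda q: "Confidence in banks and financial institutions",
    retrieve=lambda topic: ["relactiv", "satfin", "helpsick"],
    rerank=lambda topic, items: sorted(items),
    predict=lambda q, ctx, answers: f"{answers[0]} | context={','.join(ctx)}",
    top_k=2,
)
print(pipeline.run("confinan question text", ["a great deal", "only some", "hardly any"]))
```

Keeping each stage behind a plain callable makes it easy to swap, for example, the embedding model or retrieval rule without touching the rest of the pipeline.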
RAG Pipeline
Context Development
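Our pipeline begins by selecting a target question. For this example, our target is question ID confinan: ‘(I am going to name some institutions in this country. As far as the people running this institution are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Banks and financial institutions’.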
We initially process this question with our first LLM call, using gpt-3.5-turbo-0125, to extract a concise summary of the question topic using the prompt:
This greatly improves the vector store results by minimizing formatting overlap and disregarding instructions to the respondent, shifting focus to the heart of the question and producing more meaningful embeddings. Our example question text is summarized as ‘Confidence in banks and financial institutions’. Instead of retrieving unrelated questions that are merely about spending or that use the same ‘great deal, only some, hardly any’ scale, we now retrieve questions on similar topics.
The vector store returns the questions in the Auxiliary Opinion set ordered by similarity between each question’s embedding and the embedding of the summarized target text. This ordered set is then passed to the re-ranker, FlashRank running the rank-T5-flan model. Re-ranking is a secondary ordering step: a cross-encoding model scores each returned record against the target text directly, and the results are re-ordered by that score. The top 46 of these re-ordered results are returned.
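The two-stage ordering can be illustrated in plain Python. This is a toy sketch: the 2-dimensional vectors and item texts are made up, cosine similarity plays the role of the embedding search, and a crude word-overlap score stands in for a cross-encoder such as rank-T5-flan, which in reality scores each (query, item) pair with a neural model. `retrieve_then_rerank` and `word_overlap` are hypothetical names.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def word_overlap(query: str, text: str) -> float:
    """Toy pair scorer standing in for a cross-encoder: count shared words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def retrieve_then_rerank(
    query_vec: list[float],
    query_text: str,
    store: dict[str, tuple[list[float], str]],
    pair_score: Callable[[str, str], float],
    first_pass: int,
    top_k: int,
) -> list[str]:
    """Stage 1: rank all items by embedding cosine similarity and keep `first_pass`.
    Stage 2: re-score that shortlist with `pair_score`, which sees the query text
    and item text together, and keep the best `top_k`."""
    stage1 = sorted(store, key=lambda sid: cosine(query_vec, store[sid][0]), reverse=True)
    shortlist = stage1[:first_pass]
    stage2 = sorted(shortlist, key=lambda sid: pair_score(query_text, store[sid][1]), reverse=True)
    return stage2[:top_k]


# Toy data: 2-d "embeddings" and short texts, made up for illustration.
store = {
    "a": ([0.9, 0.1], "satisfied with financial situation"),
    "b": ([0.7, 0.3], "rich or poor, hard-working or lazy"),
    "c": ([0.6, 0.4], "government and financial help with hospital bills"),
    "d": ([0.2, 0.8], "church activities other than services"),
}
query_vec = [1.0, 0.0]
query_text = "Confidence in banks and financial institutions"
print(retrieve_then_rerank(query_vec, query_text, store, word_overlap, first_pass=3, top_k=2))
```

With only the first stage, the top two would be `a` and `b`; re-scoring the shortlist moves `c` to the top and drops `b`.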
Provided below are the top five questions returned by the ranked vector search, when searched for embedding similarity to the summarized question topic ‘Confidence in banks and financial institutions’:
| Question ID | Question Text |
|---|---|
| satfin | We are interested in how people are getting along financially these days. So far as you and your family are concerned, would you say that you are pretty well satisfied with your present financial situation, more or less satisfied, or not satisfied at all? |
| workblks | Now I have some questions about different groups in our society. I’m going to show you a seven-point scale on which the characteristics of people in a group can be rated. In the first statement a score of 1 means that you think almost all of the people in that group are “rich.” A score of 7 means that you think almost everyone in the group are “poor.” A score of 4 means you think that the group is not towards one end or another, and of course you may choose any number In between that comes closest to where you think people in the group stand. The second set of characteristics asks if people in the group tend to be hard-working or if they tend to be lazy. Where would you rate blacks in general on this scale? |
| workwhts | Now I have some questions about different groups in our society. I’m going to show you a seven-point scale on which the characteristics of people in a group can be rated. In the first statement a score of 1 means that you think almost all of the people in that group are “rich.” A score of 7 means that you think almost everyone in the group are “poor.” A score of 4 means you think that the group is not towards one end or another, and of course you may choose any number In between that comes closest to where you think people in the group stand. The second set of characteristics asks if people in the group tend to be hard-working or if they tend to be lazy. Where would you rate whites in general on this scale? |
| helpsick | In general, some people think that it is the responsibility of the government in Washington to see to it that people have help in paying for doctors and hospital bills. Others think that these matters are not the responsibility of the federal government and that people should take care of these things themselves. Where would you place yourself on this scale, or haven’t you made up your mind on this? |
| relactiv | How often do you take part in the activities and organizations of a church or place of worship other than attending services? |
Prompting the Model
The answer set supplied in the prompt’s {answers} placeholder is randomized, so the order of answers differs every time we prompt the LLM chosen for answer prediction (gpt-4o-2024-05-13). This improves the pipeline’s performance by reducing positional bias: the consistent selection of a response based only on its position in the list, such as always picking ‘c’ in a multiple-choice question. After we added answer randomization, the pipeline’s sentiment-matched accuracy increased by 2.95%, while exact-match accuracy against ground truth stayed the same. More information on sentiment-grouped responses can be found in the “Matching Sentiment” section.
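Per-prompt answer shuffling is simple to sketch. The helper names and the lettered `a)`, `b)` layout for the `{answers}` slot are assumptions for illustration; the post does not show its exact prompt formatting.

```python
import random
from typing import Optional, Sequence


def shuffled_answers(answers: Sequence[str], rng: Optional[random.Random] = None) -> list[str]:
    """Return a new list with the answer options in random order.

    The input is left untouched. Drawing a fresh order for every prompt means no
    option sits in a fixed position, which is what counters positional bias.
    """
    rng = rng if rng is not None else random.Random()
    options = list(answers)
    rng.shuffle(options)
    return options


def format_answers(options: Sequence[str]) -> str:
    """Render options as a lettered list for the {answers} slot (hypothetical layout)."""
    return "\n".join(f"{chr(ord('a') + i)}) {opt}" for i, opt in enumerate(options))


confinan_answers = ["a great deal", "only some", "hardly any"]
print(format_answers(shuffled_answers(confinan_answers)))
```

Passing a seeded `random.Random` makes a run reproducible while still varying the order across prompts.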
The model response is then post-processed to remove any errant text or spaces. When a response does not exactly match an entry in the supplied answer list, another call to gpt-3.5-turbo-0125 maps it to the closest allowable answer.
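The post-processing step can be sketched as: normalize, try an exact match, then fall back. The post uses a second gpt-3.5-turbo-0125 call for the fallback; this sketch substitutes `difflib.get_close_matches` from the Python standard library as a local stand-in, and accepts any other resolver (for example, an LLM wrapper) through the hypothetical `fallback` parameter.

```python
import difflib
from typing import Callable, Optional, Sequence


def normalize(text: str) -> str:
    """Trim whitespace and stray quotes/periods, lower-case, collapse inner spaces."""
    return " ".join(text.strip().strip("'\".").lower().split())


def match_answer(
    raw: str,
    allowed: Sequence[str],
    fallback: Optional[Callable[[str, Sequence[str]], str]] = None,
) -> str:
    """Map a raw model response onto one of the allowed answer strings."""
    if not allowed:
        raise ValueError("allowed answer list is empty")
    by_norm = {normalize(a): a for a in allowed}
    key = normalize(raw)
    if key in by_norm:            # 1. exact match after clean-up
        return by_norm[key]
    if fallback is not None:      # 2a. caller-supplied resolver (e.g. an LLM call)
        return fallback(raw, allowed)
    # 2b. local stand-in: closest string by difflib's similarity ratio
    best = difflib.get_close_matches(key, list(by_norm), n=1, cutoff=0.0)
    return by_norm[best[0]]


allowed = ["a great deal", "only some", "hardly any"]
print(match_answer("  Only some. ", allowed))
print(match_answer("hardly any confidence", allowed))
```

Note that difflib compares characters, not meaning, so it is only a rough proxy for the semantic matching an LLM call provides.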
Results

Exact match to ground truth
Results vs. Random

Results vs. Popular choice

Matching Sentiment

Exploring Target Question Performance



| Question ID | Question Text |
|---|---|
| immjobs | There are different opinions about immigrants from other countries living in America. (By “immigrants” we mean people who come to settle in America.) How much do you agree or disagree with each of the following statements? Immigrants take jobs away from people who were born in America |
| fefam | (Please read the following statements and indicate whether you strongly agree, agree, disagree, or strongly disagree with each statement…) It is much better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and family. |
| meovrwrk | Family life often suffers because men concentrate too much on their work. |

| Question ID | Original Context | Alternate Context |
|---|---|---|
| immjobs | [‘hispanic’, ‘fund16’, ‘vetyears’, ‘sexbirth1’, ‘helpblk’, ‘emailmin’, ‘workwhts’, ‘wksup1’, ‘workblks’, ‘union1’, ‘intlblks’, ‘wksub’, ‘pornlaw’, ‘finalter’, ‘life’, ‘notsmart’, ‘relig16’, ‘satfin’, ‘tax’, ‘letin1a’, ‘helpsick’, ‘socrel’, ‘intlwhts’, ‘dwelown’, ‘workhard’, ‘raceacs10’, ‘wksub1’, ‘wwwmin’, ‘thnkself’, ‘wwwhr’, ‘happy’, ‘evstray’, ‘racecen1’, ‘sprtprsn’, ‘reborn’, ‘<mark>popular</mark>’, ‘chldidel’, ‘discaff’, ‘god’, ‘socommun’, ‘raceacs3’, ‘marhisp’, ‘partfull’, ‘othlang’, ‘parsol’, ‘marwht’] | [‘intlhsps’, ‘cohort’, ‘unemp’, ‘raceacs2’, ‘<mark>popular</mark>’, ‘sei10’, ‘union’, ‘raceacs11’, ‘obey’, ‘immfate’, ‘wksup’, ‘zodiac’, ‘colrac’, ‘threaten’, ‘year’, ‘workhsps’, ‘hompop_exp’, ‘gunlaw’, ‘sexeduc’, ‘pasei10’, ‘raceacs13’, ‘sexnow1’, ‘rifle’, ‘raceacs12’, ‘owngun’, ‘savesoul’, ‘raceacs7’, ‘raceacs9’, ‘pistol’] |
| fefam | [‘<mark>hompop_exp</mark>’, ‘raceacs8’, ‘fund16’, ‘helpblk’, ‘masei10’, ‘immfate’, ‘emailmin’, ‘workwhts’, ‘pasei10’, ‘wksup1’, ‘colrac’, ‘marasian’, ‘emailhr’, ‘raceacs12’, ‘<mark>racerank1</mark>’, ‘raceacs4’, ‘hunt’, ‘pornlaw’, ‘relig16’, ‘relpersn’, ‘raceacs7’, ‘dateintv’, ‘relactiv’, ‘<mark>letin1a</mark>’, ‘helpsick’, ‘weekswrk’, ‘raceacs11’, ‘raceacs9’, ‘wordsum’, ‘raceacs6’, ‘colmslm’, ‘zodiac’, ‘<mark>dwelown</mark>’, ‘raceacs2’, ‘sei10’, ‘raceacs10’, ‘wwwmin’, ‘happy’, ‘year’, ‘raceacs5’, ‘raceacs14’, ‘god’, ‘raceacs3’, ‘othlang’, ‘raceacs13’, ‘marwht’] | [‘colath’, ‘socfrend’, ‘satjob’, ‘news’, ‘compuse’, ‘ethnic’, ‘fear’, ‘localnum’, ‘intlblks’, ‘cappun’, ‘pistol’, ‘attend’, ‘pray’, ‘adoptus’, ‘threaten’, ‘socbar’, ‘<mark>hompop_exp</mark>’, ‘sexeduc’, ‘health’, ‘<mark>dwelown</mark>’, ‘sexnow1’, ‘rifle’, ‘reborn’, ‘finrela’, ‘owngun’, ‘life’, ‘<mark>letin1a</mark>’, ‘shotgun’, ‘hispanic’, ‘<mark>racerank1</mark>’] |
| meovrwrk | [‘sexbirth1’, ‘helpblk’, ‘satjob’, ‘fund16’, ‘<mark>workwhts</mark>’, ‘<mark>emailmin</mark>’, ‘pasei10’, ‘workblks’, ‘unemp’, ‘marasian’, ‘<mark>emailhr</mark>’, ‘wksup’, ‘helpoth’, ‘intlblks’, ‘wksub’, ‘racerank1’, ‘raceacs15’, ‘localnum’, ‘pornlaw’, ‘finalter’, ‘tvhours’, ‘relpersn’, ‘raceacs7’, ‘tax’, ‘workhsps’, ‘weekswrk’, ‘wordsum’, ‘intlwhts’, ‘class’, ‘threaten’, ‘workhard’, ‘raceacs10’, ‘savesoul’, ‘wwwmin’, ‘<mark>wwwhr</mark>’, ‘evstray’, ‘sprtprsn’, ‘raceacs5’, ‘reborn’, ‘chldidel’, ‘god’, ‘partfull’, ‘othlang’, ‘parsol’, ‘marwht’, ‘<mark>compuse</mark>’] | [‘<mark>emailhr</mark>’, ‘<mark>emailmin</mark>’, ‘<mark>wwwhr</mark>’, ‘news’, ‘wwwmin’, ‘<mark>compuse</mark>’, ‘<mark>workwhts</mark>’, ‘dwelown16’] |
The alternate context set performed much better for ‘fefam’ and ‘meovrwrk’, the two questions about gender roles, increasing their sentiment-grouped performance by 9.78% and 11.86%, respectively. ‘immjobs’ performed worse with the new context, with sentiment-grouped performance dropping by 4.35%. The differences in context set composition were due entirely to changes in the vector store: BAAI/llm-embedder (Zhang et al., 2023) replaced text-embedding-ada-002 as the embedding model, and a similarity-score cutoff of 0.5 replaced a fixed top-n when returning items from the store. This produced context question sets of variable length that were, overall, smaller and more concise.
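The difference between the two selection rules can be sketched as follows. The scores are toy values in [0, 1], the IDs are placeholders, and treating the 0.5 cutoff as inclusive (>=) is an assumption; the function names are hypothetical.

```python
from typing import Sequence


def top_n(scored: Sequence[tuple[str, float]], n: int) -> list[str]:
    """Fixed-size selection: the n highest-scoring IDs, whatever their scores."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [sid for sid, _ in ranked[:n]]


def above_cutoff(scored: Sequence[tuple[str, float]], cutoff: float = 0.5) -> list[str]:
    """Threshold selection: every ID scoring at least `cutoff`, best first.

    The number of items returned varies with the query and can be zero.
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [sid for sid, score in ranked if score >= cutoff]


scored = [("a", 0.71), ("b", 0.66), ("c", 0.52), ("d", 0.41), ("e", 0.30)]
print(top_n(scored, 4))
print(above_cutoff(scored))
```

Unlike top-n, the cutoff can return very few items, or none at all, which is consistent with the variable-length context sets in the table above (8 items for ‘meovrwrk’ versus 46 under top-n).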
Conclusion
References
Aguinis, H., Villamor, I., & Ramani, R. S. (2020). MTurk research: Review and recommendations. Journal of Management, 47(4), 823–837. https://doi.org/10.1177/0149206320969787
Bontcheva, K., Gorrell, G., & Wessels, B. (2013, June 4). Social media and information overload: Survey results. arXiv.Org. https://arxiv.org/abs/1306.0813
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020, May 28). Language Models are Few-Shot Learners. arXiv.Org. https://arxiv.org/abs/2005.14165
Brox. (n.d.). Brox.ai. Retrieved June 26, 2024, from https://brox.ai/
Chroma. (2024, June 21). The AI-native open-source embedding database. Trychroma.Com. https://www.trychroma.com/
Common Crawl. (n.d.). Common crawl. Open Repository of Web Crawl Data. Retrieved June 26, 2024, from https://commoncrawl.org/
Damodaran, P. (2023). FlashRank, Lightest and Fastest 2nd Stage Reranker for search pipelines. https://doi.org/10.5281/zenodo.10426927
Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2023). Lost in the Middle: How Language Models Use Long Contexts.
National Science Foundation Directorate for Social, Behavioral and Economic Sciences Division of Social and Economic Sciences. (2020, February 27). General Social Survey (GSS) competition. NSF – National Science Foundation. https://new.nsf.gov/funding/opportunities/general-social-survey-competition/nsf20-550/solicitation
National Science Foundation Directorate for Social, Behavioral and Economic Sciences Division of Social and Economic Sciences. (2024, February 7). General Social Survey competition. NSF – National Science Foundation. https://new.nsf.gov/funding/opportunities/general-social-survey-competition/nsf24-537/solicitation
Papers with code – WebText dataset. (n.d.). Papers With Code. Retrieved June 26, 2024, from https://paperswithcode.com/dataset/webtext
Survey Monkey. (n.d.). Discover SurveyMonkey Audience. Survey Monkey. Retrieved June 26, 2024, from https://www.surveymonkey.com/collect/audience/calculator/preview?ut_source=pricing&ut_source2=pricing
Zhang, P., Xiao, S., Liu, Z., Dou, Z., & Nie, J.-Y. (2023). Retrieve Anything To Augment Large Language Models.
Appendix
- Respondent IDs for respondents used in this research. These respondents were subsampled from the larger body of respondents such that Party ID, Sex, and Race reflect the distributions of the parent set.
[28, 52, 188, 254, 302, 313, 345, 382, 428, 502, 517, 518, 531, 558, 571, 606, 654, 665, 667, 672, 707, 746, 749, 753, 764, 799, 848, 862, 864, 936, 1006, 1036, 1054, 1116, 1142, 1229, 1244, 1290, 1325, 1369, 1433, 1455, 1465, 1468, 1487, 1660, 1675, 1751, 1755, 1773, 1798, 1949, 1950, 1960, 2014, 2015, 2073, 2163, 2212, 2228, 2236, 2245, 2282, 2309, 2312, 2408, 2431, 2459, 2467, 2597, 2654, 2657, 2663, 2677, 2695, 2720, 2735, 2795, 2827, 2911, 2921, 2924, 2925, 2948, 3037, 3068, 3081, 3097, 3099, 3121, 3152, 3225, 3250, 3278, 3289, 3344, 3436, 3496, 3500, 3533]
- Test questions (IDs and text). These are the ‘target questions’.

| Schema | Question |
|---|---|
| natroad | (… are we spending too much, too little, or about the right amount on) Highways and bridges |
| natsoc | (… are we spending too much, too little, or about the right amount on) Social Security |
| natmass | (… are we spending too much, too little, or about the right amount on) Mass Transportation |
| natpark | (… are we spending too much, too little, or about the right amount on) Parks and recreation |
| natchld | (… are we spending too much, too little, or about the right amount on) Assistance for childcare |
| natsci | (… are we spending too much, too little, or about the right amount on) Supporting scientific research |
| natenrgy | (… are we spending too much, too little, or about the right amount on) Developing alternative energy sources |
| eqwlth | Some people think that the government in Washington ought to reduce the income differences between the rich and the poor, perhaps by raising the taxes of wealthy families or by giving income assistance to the poor. Others think that the government should not concern itself with reducing this income difference between the rich and the poor. Here is a scale from 1 to 7. Think of a score of 1 as meaning that the government ought to reduce the income differences between rich and poor, and a score of 7 meaning that the government should not concern itself with reducing income differences. What score between 1 and 7 comes closest to the way you feel? |
| cappun | Do you favor or oppose the death penalty for persons convicted of murder? |
| gunlaw | Would you favor or oppose a law which would require a person to obtain a police permit before he or she could buy a gun? |
| affrmact | Some people say that because of past discrimination, Blacks should be given preference in hiring and promotion. Others say that such preference in hiring and promotion of Blacks is wrong because it discriminates against Whites. What about your opinion? Are you for or against preferential hiring and promotion of Blacks? IF FAVORS: Do you favor preference in hiring and promotion strongly or not strongly? IF OPPOSES: Do you oppose preference in hiring and promotion strongly or not strongly? |
| wrkwayup | Do you agree strongly, agree somewhat, neither agree nor disagree, disagree somewhat, or disagree strongly with the following statement: Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without special favors. |
| confinan | (I am going to name some institutions in this country. As far as the people running this institution are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Banks and financial institutions |
| conbus | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Major companies |
| conclerg | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Organized religion |
| coneduc | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Education |
| confed | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Executive branch of the federal government |
| conlabor | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Organized labor |
| conpress | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Press |
| conmedic | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Medicine |
| contv | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) TV |
| conjudge | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) U.S. Supreme Court |
| consci | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Scientific Community |
| conlegis | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Congress |
| conarmy | (I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them?) Military |
| pillok | Do you strongly agree, agree, disagree, or strongly disagree that methods of birth control should be available to teenagers between the ages of 14 and 16 if their parents do not approve? |
| sexeduc | Would you be for or against sex education in the public schools? |
| premarsx | There’s been a lot of discussion about the way morals and attitudes about sex are changing in this country. If a man and a woman have sexual relations before marriage, do you think it is always wrong, almost always wrong, wrong only sometimes, or not wrong at all? |
| teensex | There’s been a lot of discussion about the way morals and attitudes about sex are changing in this country. If a man and a woman have sexual relations before marriage, what if they are in their early teens, say 14 to 16 years old? In that case, do you think sex relations before marriage are always wrong, almost always wrong, wrong only sometimes, or not wrong at all? |
| xmarsex | What is your opinion about a married person having sexual relations with someone other than the marriage partner–is it always wrong, almost always wrong, wrong only sometimes, or not wrong at all? |
| homosex | What about sexual relations between two adults of the same sex–do you think it is always wrong, almost always wrong, wrong only sometimes, or not wrong at all? |
| spanking | Do you strongly agree, agree, disagree, or strongly disagree that it is sometimes necessary to discipline a child with a good, hard spanking? |
| polmurdr | Would you approve of a policeman striking a citizen who was being questioned as a suspect in a murder case? |
| polescap | Would you approve of a policeman striking a citizen who was attempting to escape from custody? |
| fepresch | (Please read the following statements and indicate whether you strongly agree, agree, disagree, or strongly disagree with each statement…) A preschool child is likely to suffer if his or her mother works. |
| fefam | (Please read the following statements and indicate whether you strongly agree, agree, disagree, or strongly disagree with each statement…) It is much better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and family. |
| helppoor | Next, here are issues that some people tell us are important. Some people think that the government in Washington should do everything possible to improve the standard of living of all poor Americans, they are at Point 1 on the scale below. Other people think it is not the government’s responsibility, and that each person should take care of himself, they are at Point 5. Where would you place yourself on this scale, or haven’t you made up your mind on this? |
| helpnot | Some people think that the government in Washington is trying to do too many things that should be left to individuals and private businesses. Others disagree and think that the government should do even more to solve our country’s problems. Still others have opinions somewhere in between. Where would you place yourself on this scale, or haven’t you made up your mind on this? |
| fehire | Now I’m going to read several statements. As I read each one, please tell me whether you strongly agree, agree, neither agree nor disagree: Because of past discrimination, employers should make special efforts to hire and promote qualified women. |
| goodlife | (Please tell me/Please indicate) whether you strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree with the following statement: The way things are in America, people like me and my family have a good chance of improving our standard of living– do you agree or disagree? |
| marhomo | (Do you agree or disagree?) Homosexual couples should have the right to marry one another. |
| meovrwrk | Family life often suffers because men concentrate too much on their work. |
| fechld | To begin, we have some questions about women. To what extent do you agree or disagree…? A working mother can establish just as warm and secure a relationship with her children as a mother who does not work. |
| immcrime | There are different opinions about immigrants from other countries living in America. (By “immigrants” we mean people who come to settle in America.) How much do you agree or disagree with each of the following statements? Immigrants increase crime rates |
| immjobs | There are different opinions about immigrants from other countries living in America. (By “immigrants” we mean people who come to settle in America.) How much do you agree or disagree with each of the following statements? Immigrants take jobs away from people who were born in America |

- Demographic question set, added to all context sets and always supplied to the model. The full context set is composed of these questions and the questions returned from the vector store:
[‘sex’, ‘age’, ‘born’, ‘race’, ‘educ’, ‘degree’, ‘marital’, ‘widowed’, ‘partyid’, ‘polviews’, ‘region’, ‘xnorcsiz’, ‘income’, ‘wrkstat’, ‘vote16’, ‘pres16’, ‘vote20’, ‘pres20’, ‘wrkslf’, ‘wrkgovt1’, ‘wrkgovt2’, ‘occ10’, ‘prestg10’, ‘indus10’, ‘pawrkslf’, ‘pawrkslf2’, ‘paocc10’, ‘papres10’, ‘paind10’, ‘mawrkslf’, ‘mawrkslf2’, ‘maocc10’, ‘mapres10’, ‘sibs’, ‘childs’, ‘agekdbrn’, ‘paeduc’, ‘maeduc’, ‘padeg’, ‘madeg’, ‘dipged’, ‘res16’, ‘reg16’, ‘mobile16’, ‘family16’, ‘mawrkgrw’, ‘incom16’, ‘granborn’, ‘maborn’, ‘paborn’, ‘adults’, ‘unrelat’, ‘earnrs’, ‘size’]

| Schema | Question |
|---|---|
| sex | CODE RESPONDENT’S SEX |
| age | RESPONDENT’S AGE |
| born | Were you born in this country? |
| race | What race do you consider yourself? |
| educ | RESPONDENT’S EDUCATION |
| degree | RESPONDENT’S DEGREE |
| marital | Are you currently married, widowed, divorced, separated, or have you never been married? |
| widowed | IF CURRENTLY MARRIED, SEPARATED, OR DIVORCED: Have you ever been widowed? |
| partyid | Generally speaking, do you usually think of yourself as a Republican, Democrat, Independent, or what? |
| polviews | We hear a lot of talk these days about liberals and conservatives. I’m going to show you a seven-point scale on which the political views that people might hold are arranged from extremely liberal–point 1–to extremely conservative–point 7. Where would you place yourself on this scale? |
| region | REGION OF INTERVIEW |
| xnorcsiz | Size of your location of residence: |
| income | In which of these groups did your total family income, from all sources, fall last year before taxes, that is? |
| wrkstat | Last week were you working full time, part time, going to school, keeping house, or what? |
| vote16 | In 2016, you remember that Hillary Clinton ran for President on the Democratic ticket against Donald Trump for the Republicans. Do you remember for sure whether or not you voted in that election? |
| pres16 | Did you vote for Hillary Clinton or Donald Trump? |
| vote20 | In 2020, you remember that Joe Biden ran for President on the Democratic ticket against Donald Trump for the Republicans. Do you remember for sure whether or not you voted in that election? |
| pres20 | Did you vote for Joe Biden or Donald Trump? |
| wrkslf | (Are/Were) you self employed or (do/did) you work for someone else? |
| wrkgovt1 | (Are/Were) you employed by the government? (Please consider federal, state, or local government.) |
| wrkgovt2 | (Are/Were) you employed by a private employer (including non-profit organizations)? |
| occ10 | RESPONDENT’S OCCUPATION |
| prestg10 | PRESTIGE OF RESPONDENT’S OCCUPATION (0 to 100, higher is more prestigious) |
| indus10 | RESPONDENT’S INDUSTRY |
| pawrkslf | Was your [father/stepfather/male relative you were living with when you were 16] an employee, self-employed without employees, or self-employed with employees |
| pawrkslf2 | Was he self-employed or did he work for someone else |
| paocc10 | FATHER’S OCCUPATION |
| papres10 | PRESTIGE OF RESPONDENT’S FATHER’S OCCUPATION (0 to 100, higher is more prestigious) |
| paind10 | FATHER’S INDUSTRY |
| mawrkslf | At this job, was [mother/stepmother/female relative you were living with when you were 16] an employee, self-employed without employees, or self-employed with employees? |
| mawrkslf2 | At this job, was (your mother/your stepmother/the female relative you were living with when you were 16) an employee, self-employed without employees, or self-employed with employees? |
| maocc10 | MOTHER’S OCCUPATION |
| mapres10 | PRESTIGE OF RESPONDENT’S MOTHER’S OCCUPATION (0 to 100, higher is more prestigious) |
| sibs | How many brothers and sisters did you have? Please count those born alive, but no longer living, as well as those alive now. Also include stepbrothers and stepsisters, and children adopted by your parents. |
| childs | How many children have you ever had? Please count all that were born alive at any time (including any you had from a previous marriage). |
| agekdbrn | How old were you when your first child was born? |
| paeduc | What is the highest grade in elementary school or high school that your father finished and got credit for? |
| maeduc | What is the highest grade in elementary school or high school that your mother finished and got credit for? |
| padeg | RESPONDENT’S FATHER’S (SUBSTITUTE FATHER’S) DEGREE |
| madeg | RESPONDENT’S MOTHER’S (SUBSTITUTE MOTHER’S) DEGREE |
| dipged | Was that a diploma awarded by your high school at graduation at the end of 12th grade, a GED awarded after you took a test, or something else? |
| res16 | Which of these categories comes closest to the type of place you were living in when you were 16 years old? in a small city or town (under 50,000), in a medium-size city (50,000-250,000), in a large city (over 250,000), in a suburb near a large city, in open country but not on a farm, farm |
| reg16 | In what state or foreign country were you living when you were 16 years old? |
| mobile16 | IF STATE NAMED IS SAME STATE R. LIVES IN NOW, ASK MOBILE16: When you were 16 years old, were you living in this same (city/town/county) as you live in now? |
| family16 | Were you living with both your own mother and father around the time you were 16? |
| mawrkgrw | Did your mother ever work for pay for as long as a year, while you were growing up? |
| incom16 | Thinking about the time when you were 16 years old, compared with American families in general then, would you say your family income was: far below average, below average, average, above average, or far above average? (PROBE: Just your best guess.) |
| granborn | (Were all of your four grandparents born in this country?…) IF NO: How many were born outside the United States? |
| maborn | Was (your mother/ your stepmother/ the female relative you were living with at 16) born in this country? |
| paborn | Was (your father/ your stepfather/ the male relative you were living with at 16) born in this country? |
| adults | Household Size and Composition: NUMBER OF MEMBERS OVER 17 YEARS OLD. 18+ YEARS |
| unrelat | (Is everyone in the household related to you in some way?…) IF NO: How many persons in the household are not related to you in any way? |
| earnrs | Just thinking about your family now–those people in the household who are related to you . . . How many persons in the family (including yourself) earned any money in [the last year] from any job or employment? |
| size | SIZE OF PLACE: Size of place in thousands is a 4 digit code which provides actual size of place of interview. Remember when using this code to add 3 zeros. Listed below are the frequencies for gross population categories. |

- Auxiliary question set: these questions are embedded and added to a vector store, searched against a generated summary of the target question, and then re-ranked by our reranker model. The top 46 are then combined with the core demographic set to create the full question-answer context fed to the model. Full question text is not provided here for space.
[‘ethregion54’, ‘pornlaw’, ‘ethregion22’, ‘relactiv’, ‘chldidel’, ‘vetyears’, ‘sei10’, ‘ethregion4’, ‘ethregion65’, ‘satfin’, ‘ethregion24’, ‘ethregion84’, ‘ethregion15’, ‘ethregion29’, ‘ethregion86’, ‘ethworld2’, ‘ethregion96’, ‘ethregion23’, ‘socrel’, ‘shotgun’, ‘finrela’, ‘ethworld6’, ‘wksub’, ‘marasian’, ‘ethregion82’, ‘socbar’, ‘raceacs5’, ‘notsmart’, ‘ethregion36’, ‘ethregion58’, ‘ethregion2’, ‘news’, ‘ethregion59’, ‘ethregion31’, ‘ethregion7’, ‘ethworld1’, ‘ethregion12’, ‘year’, ‘ethregion9’, ‘fund16’, ‘workhard’, ‘helpblk’, ‘ethregion89’, ‘ethregion94’, ‘parsol’, ‘liveblks’, ‘finalter’, ‘ethregion93’, ‘raceacs9’, ‘adoptus’, ‘ethregion39’, ‘health’, ‘discaff’, ‘marwht’, ‘tax’, ‘ethregion5’, ‘rifle’, ‘ethregion76’, ‘raceacs6’, ‘ethregion1’, ‘ethregion19’, ‘hunt1’, ‘colrac’, ‘ethregion16’, ‘ethregion50’, ‘union1’, ‘hunt’, ‘ethregion45’, ‘sexbirth1’, ‘ethworld3’, ‘workblks’, ‘afraidof’, ‘attend’, ‘ethregion37’, ‘ethregion27’, ‘ethregion52’, ‘ethregion8’, ‘ethregion14’, ‘ethregion74’, ‘intlhsps’, ‘ethregion60’, ‘tvhours’, ‘pray’, ‘ethworld5’, ‘ethregion68’, ‘ethregion64’, ‘ethregion20’, ‘wwwmin’, ‘localnum’, ‘ethregion72’, ‘raceacs16’, ‘dateintv’, ‘ethregion10’, ‘ethregion49’, ‘ethregion43’, ‘zodiac’, ‘othlang’, ‘wksub1’, ‘ethnic’, ‘ethregion42’, ‘masei10’, ‘relpersn’, ‘wksup’, ‘wksup1’, ‘raceacs2’, ‘hompop_exp’, ‘intlwhts’, ‘ethregion87’, ‘intlblks’, ‘poorserv’, ‘raceacs1’, ‘racecen1’, ‘hispanic’, ‘reborn’, ‘workwhts’, ‘ethregion80’, ‘ethregion66’, ‘immfate’, ‘ethregion90’, ‘sexnow1’, ‘compuse’, ‘ethregion25’, ‘weekswrk’, ‘ethworld9’, ‘ethregion35’, ‘ethregion3’, ‘popular’, ‘pistol’, ‘cohort’, ‘ethregion79’, ‘life’, ‘ethregion70’, ‘relig16’, ‘dwelown’, ‘raceacs14’, ‘marblk’, ‘ethregion48’, ‘ethregion83’, ‘god’, ‘ethregion30’, ‘helpoth’, ‘fear’, ‘ethregion41’, ‘savesoul’, ‘raceacs3’, ‘ethregion44’, ‘ethregion77’, ‘socommun’, ‘colath’, ‘emailhr’, ‘racerank1’, ‘happy’, ‘ethregion67’, ‘ethregion91’, ‘ethregion13’, ‘raceacs13’, ‘ethregion73’, ‘relig’, ‘ethregion40’, ‘obey’, ‘marhisp’, ‘raceacs8’, ‘sprtprsn’, ‘emailmin’, ‘ethregion33’, ‘raceacs12’, ‘ethregion46’, ‘workhsps’, ‘unemp’, ‘thnkself’, ‘ethregion85’, ‘raceacs15’, ‘ethregion56’, ‘wordsum’, ‘raceacs7’, ‘union’, ‘raceacs10’, ‘colmslm’, ‘helpsick’, ‘ethregion51’, ‘class’, ‘owngun’, ‘ethworld8’, ‘ethregion57’, ‘raceacs4’, ‘ethworld7’, ‘socfrend’, ‘ethregion32’, ‘letin1a’, ‘ethregion47’, ‘ethregion75’, ‘ethregion78’, ‘ethregion11’, ‘ethregion17’, ‘dwelown16’, ‘ethregion6’, ‘ethregion55’, ‘disrspct’, ‘ethregion26’, ‘pasei10’, ‘ethregion92’, ‘ethregion71’, ‘ethregion38’, ‘ethregion53’, ‘ethregion18’, ‘ethregion81’, ‘ethregion69’, ‘ethregion62’, ‘satjob’, ‘ethregion34’, ‘ethregion88’, ‘ethregion63’, ‘raceacs11’, ‘ethregion21’, ‘partfull’, ‘ethworld4’, ‘evstray’, ‘wwwhr’, ‘threaten’, ‘ethregion61’, ‘coninc’, ‘ethregion28’]
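The auxiliary retrieval step described above can be sketched in Python as follows. The steps are: embed the auxiliary questions into a store, search them against a generated summary of the masked target question, rerank the hits, keep the top 46, and put the core demographic answers first. This is a minimal, self-contained sketch under stated assumptions, not the pipeline used in this research. The bag-of-words `embed` function stands in for a real embedding model. The Jaccard-overlap `rerank_score` is a hypothetical stand-in for the reranker model. `first_stage_k` is an assumed first-stage cutoff. The question wordings and answers in the example are illustrative only.

```python
import math
import re
from collections import Counter
from typing import Optional

# Each GSS variable name maps to (question text, this respondent's answer).
QA = dict[str, tuple[str, str]]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for the reranker model: Jaccard overlap of word sets."""
    q, d = set(embed(query)), set(embed(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def build_store(auxiliary: QA) -> dict[str, Counter]:
    """Toy 'vector store': one precomputed embedding per auxiliary question."""
    return {var: embed(question) for var, (question, _) in auxiliary.items()}


def build_context(core: QA, auxiliary: QA, store: dict[str, Counter], target_summary: str,
                  masked: Optional[str] = None, first_stage_k: int = 100,
                  final_k: int = 46) -> tuple[list[str], str]:
    """Retrieve, rerank, and assemble the question-answer context for one respondent."""
    query_vec = embed(target_summary)
    # Stage 1: similarity search against the summary, never returning the masked question.
    pool = sorted((var for var in store if var != masked),
                  key=lambda var: cosine(query_vec, store[var]), reverse=True)
    candidates = pool[:first_stage_k]
    # Stage 2: rerank the survivors and keep the top final_k (46 in the write-up).
    candidates.sort(key=lambda var: rerank_score(target_summary, auxiliary[var][0]), reverse=True)
    selected = candidates[:final_k]
    # Core demographic answers always come first, then the retrieved auxiliary answers.
    blocks = [f"Q: {q}\nA: {a}" for q, a in core.values()]
    blocks += [f"Q: {auxiliary[v][0]}\nA: {auxiliary[v][1]}" for v in selected]
    return selected, "\n\n".join(blocks)


# Illustrative data only: wordings are abbreviated and the answers are made up.
core = {"childs": ("How many children have you ever had?", "2")}
auxiliary = {
    "owngun": ("Do you happen to have in your home any guns or revolvers?", "Yes"),
    "pistol": ("Is it a pistol, shotgun, rifle, or what?", "Pistol"),
    "zodiac": ("Astrological sign of respondent", "Leo"),
    "tvhours": ("On the average day, about how many hours do you personally watch television?", "2"),
}
store = build_store(auxiliary)
summary = "Whether the respondent has a gun or pistol in the home"
selected, context = build_context(core, auxiliary, store, summary,
                                  masked="pistol", first_stage_k=3, final_k=2)
print(selected)  # ['owngun', 'zodiac']
print(context)
```

In this toy run, the similarity search ranks `tvhours` above `zodiac`, but the overlap-based reranker reverses them, so `zodiac` makes the final cut. This illustrates the usual reason for a two-stage design: a cheap similarity search narrows the pool, and a second scorer reorders the survivors before the cutoff is applied.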