Can Artificial Intelligence complete my students' work?
There is an exercise that I have been using with students for over 15 years. It's a very simple exercise designed to show students why, when conducting desk-based research, they need to ask the same questions of different sources in order to gain a broad picture of the topic and then develop a critical discussion of the available evidence. The session is part of a series which introduces students to academic writing. So, to keep things as simple as possible, I give the students two or sometimes three articles on the topic of the importance of reference list accuracy. The students are divided into pairs or threes, with each individual asked to answer a set of identical questions for one of the articles. The students then compare what they have found and hopefully discover that while some of the answers are the same, there are differences. They then set about constructing a short response to an essay title I give them, based on their amalgamated answers to the questions. While they are doing this, I'm also interested to see whether they cite the sources they have been using and whether these are presented in a format acceptable for use in their assessments.
There has been much discussion in recent weeks about how far AI, and particularly ChatGPT, is rendering educational assessments and other academic exercises redundant, with the traditional essay seemingly one of the casualties, given how well ChatGPT has demonstrated its ability to complete them (Davis, 2023). So, what will happen to my exercise? I decided the best way to find out was to use some online AI tools to complete the work and then compare the overall responses produced by each tool to the answers that students would hopefully find when reading the articles and completing the exercise themselves.
The three AI tools used for the exercise are ChatGPT, Elicit and SciSpace. ChatGPT is a general AI tool, which has been widely discussed in the press since its launch in late 2022 (Hurst, 2022). The other two tools are specifically aimed at an academic audience. Elicit (2022) claims to be a "research assistant using language models like GPT-3 to automate parts of researchers’ workflows". Meanwhile, SciSpace (2022) states that it has a "core mission of accelerating scientific discovery and learning".
All three AI tools were therefore used in an attempt to determine the degree to which they could complete the task without the individual making the queries referring directly to the journal articles.
Initially, the task is divided into a series of questions. Once the individual questions have been attempted, some of the information collected is then synthesised into a few short paragraphs responding to an essay title prompt.
The articles used for the paired version of this task are Oermann and Ziolkowski (2002) and Spivey & Wilks (2004). The papers both present research into errors made in the reference lists of journal articles. They are chosen for their clarity, brevity and content.
The questions comprise two background questions which draw on the literature reviews of the articles, a more complex question about the research design, and two questions about the findings which require short answers.
Each question from the exercise is now presented with the correct answers taken from the papers, followed by the responses produced by each AI tool and comments about the accuracy of the responses following each question.
Question 1) Why is it important to provide a reference list?
The answers provided in each paper are as follows:
The Oermann and Ziolkowski (2002) paper gives four answers, which can be summarised as:
Resource for readers
Help readers to expand understanding, learn what has been published, note gaps, see different viewpoints.
Reader can trace the work of an author
Show the currency of the literature
Spivey and Wilks (2004) also provide four answers to the question. While one answer is similar, it is probably best described as complementary, meaning students could draw on both articles to discuss finding background information. The four answers are:
Find background information in an area
Construct citation indexes
Work out faculty production
Develop rankings
On this evidence it can reasonably be argued that, between them, the two papers make seven distinct points regarding the importance of reference lists.
In order to try to answer the question, ChatGPT was instructed as follows:
Using only evidence from "Oermann, M. & Ziolkowski, L. (2002) Accuracy of references in three critical care nursing journals, Journal of PeriAnesthesia Nursing, 17(2), 78-83," explain why reference lists and bibliographies are so important.
ChatGPT produced the following response to this instruction in a matter of seconds:
... accurate reference lists and bibliographies are crucial for several reasons:
Support for claims and assertions: Reference lists and bibliographies help to support the claims and assertions made in a research paper or article by providing evidence from credible sources. This helps to increase the credibility and reliability of the research being conducted.
Attribution of ideas and information: By including accurate references, authors are able to give credit to the original sources of ideas and information that they have used in their research. This is important for avoiding plagiarism and for maintaining ethical standards in research.
Facilitation of further research: Accurate reference lists and bibliographies can help to facilitate further research by allowing others to easily locate and access the sources that have been used in a particular study. This can help to further advance knowledge in a particular field and to build on existing research.
In conclusion, the study by Oermann and Ziolkowski (2002) highlights the importance of accurate reference lists and bibliographies in research, as they help to support claims, give credit to sources, and facilitate further research.
It was then instructed to carry out the same task but using the Spivey and Wilks (2004) article. The following was produced:
The article "Reference List Accuracy in Social Work Journals" by Spivey and Wilks (2004) provides evidence of the importance of reference lists and bibliographies. The authors conducted a study to evaluate the accuracy of reference lists in social work journals, and their findings suggest that accurate reference lists and bibliographies are important for several reasons:
Credibility and reliability of research: Accurate reference lists and bibliographies help to increase the credibility and reliability of the research being conducted by providing evidence from credible sources. This helps to support the claims and assertions made in a research paper or article.
Attribution of ideas and information: Including accurate references helps authors to give credit to the original sources of ideas and information that they have used in their research. This is important for avoiding plagiarism and for maintaining ethical standards in research.
Facilitation of further research: Accurate reference lists and bibliographies can help to facilitate further research by allowing others to easily locate and access the sources that have been used in a particular study. This can help to further advance knowledge in a particular field and to build on existing research.
Compliance with ethical standards: In the field of social work, it is important to adhere to ethical standards, and this includes accurately citing sources in research. Accurate reference lists and bibliographies help to ensure that ethical standards are being met.
In conclusion, the study by Spivey and Wilks (2004) highlights the importance of accurate reference lists and bibliographies in research, as they help to increase the credibility and reliability of research, give credit to sources, facilitate further research, and ensure compliance with ethical standards.
These responses from ChatGPT provide sound reasons for the inclusion of reference lists and in places are related to the correct answers found in the papers. However, most of the points made bear no direct relation to the evidence available in the articles, and at times the responses make points that are completely unrelated to it.
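For anyone wanting to repeat this comparison more systematically, the same instruction could be sent to ChatGPT programmatically rather than through the web interface. The following is a minimal sketch only, assuming the openai Python package (pre-1.0 interface) and an API key; the model name is illustrative, and this is not how the queries in this exercise were actually made.

import openai

# Assumed setup: the openai package (pip install openai, pre-1.0 interface)
# and a valid API key. A sketch, not the method used in this exercise.
openai.api_key = "YOUR_API_KEY"  # hypothetical placeholder

prompt = (
    'Using only evidence from "Oermann, M. & Ziolkowski, L. (2002) '
    'Accuracy of references in three critical care nursing journals, '
    'Journal of PeriAnesthesia Nursing, 17(2), 78-83," explain why '
    'reference lists and bibliographies are so important.'
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)
print(response["choices"][0]["message"]["content"])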
SciSpace responded as follows when asked about information in Oermann & Ziolkowski (2002) about the importance of reference lists:
Reference lists are important because
They provide support for the author's ideas and help readers to understand the topic better.
They also allow readers to trace all of the works of a particular author, which is valuable when examining research studies and clinical projects over time.
References direct readers to related studies, showing how current study builds on prior research as well as providing additional literature on the topic so that reader can expand their understanding about it.
Errors in references reflect poorly on both authors and journals; hence accurate reference list should be prepared by using various strategies such as copying bibliographic information from computer rather than recording it manually or setting up files/using software programs etc.,
The response from SciSpace for Spivey and Wilks (2004) was as follows:
Reference lists and bibliographies are important because
they provide a way to find background information in an area of study.
They can also be used to construct citation indexes, which help with decisions such as tenure or salary for faculty members.
Additionally, reference list accuracy is essential when evaluating the rigor of an article and establishing credibility for authors who have conducted research on a particular topic.
Finally, reference lists allow us to compare journals based on their editorial practices so that we can identify areas where improvement may be needed.
The responses provided by SciSpace correspond closely to the evidence from both papers. At times the language is reproduced verbatim and, when this happens, no quotation marks are used to indicate that it has been copied word for word.
When a similar search was made using Elicit, it was found that the only way to do this was by using the question starter, "What was the ...". Therefore, the question "What was the reason given for the importance of reference lists?" was used.
The response for Oermann and Ziolkowski (2002) was:
they are a resource for the reader, providing additional information on the topic.
Meanwhile, for Spivey and Wilks (2004) the question was rephrased a further two times, but on all three occasions Elicit was unable to provide any response. Elicit thus provided one of the four points made in Oermann and Ziolkowski (2002) but had no response at all for the Spivey and Wilks (2004) article.
Question 2) Why are errors in reference lists so serious?
For the question, "Why are errors in reference lists so serious?" the correct answers from the articles are as follows:
Oermann & Ziolkowski (2002)
Prevent retrieval of references
Authors may not be recognised
Waste reader time
Impede computer searches
Spivey & Wilks (2004)
Prevent retrieval of references
Give doubt to the care taken by the author
Compromise the author’s credibility
Annoy future researchers
The response from ChatGPT for the Oermann & Ziolkowski (2002) article was:
The article "Accuracy of references in three critical care nursing journals" by Oermann and Ziolkowski (2002) provides evidence of the problems associated with errors in reference lists and bibliographies. The authors conducted a study to evaluate the accuracy of references in three critical care nursing journals, and their findings suggest that errors in reference lists and bibliographies can be problematic for several reasons:
Reduced credibility and reliability of research: Errors in reference lists and bibliographies can reduce the credibility and reliability of the research being conducted by casting doubt on the validity of the sources being cited. This can make it difficult for others to trust the research being conducted and to build on it in the future.
Confusion for readers: Errors in reference lists and bibliographies can create confusion for readers, as they may not be able to locate the sources that have been cited. This can make it difficult for them to evaluate the validity of the claims and assertions being made in a research paper or article.
Inability to access sources: If the references provided in a reference list or bibliography are incorrect, it can be difficult for others to access the sources that have been used in a particular study. This can limit the ability of others to build on the research being conducted and to advance knowledge in a particular field.
Ethical concerns: Errors in reference lists and bibliographies can also raise ethical concerns, as they may be seen as a form of academic dishonesty or plagiarism. This can damage the reputation of the authors and the journals that publish their work.
In conclusion, the study by Oermann and Ziolkowski (2002) highlights the problems associated with errors in reference lists and bibliographies in research, as they can reduce the credibility and reliability of research, create confusion for readers, limit the ability to access sources, and raise ethical concerns.
The response from ChatGPT to the Spivey & Wilks (2004) article was:
The article "Reference List Accuracy in Social Work Journals" by Spivey and Wilks (2004) provides evidence of the problems associated with errors in reference lists and bibliographies. The authors conducted a study to evaluate the accuracy of reference lists in social work journals, and their findings suggest that errors in reference lists and bibliographies can be problematic for several reasons:
Reduced credibility and reliability of research: Errors in reference lists and bibliographies can reduce the credibility and reliability of the research being conducted by casting doubt on the validity of the sources being cited. This can make it difficult for others to trust the research being conducted and to build on it in the future.
Confusion for readers: Errors in reference lists and bibliographies can create confusion for readers, as they may not be able to locate the sources that have been cited. This can make it difficult for them to evaluate the validity of the claims and assertions being made in a research paper or article.
Ethical concerns: In the field of social work, it is important to adhere to ethical standards, and this includes accurately citing sources in research. Errors in reference lists and bibliographies can raise ethical concerns, as they may be seen as a form of academic dishonesty or plagiarism. This can damage the reputation of the authors and the journals that publish their work.
Reduced potential for further research: If the references provided in a reference list or bibliography are incorrect, it can be difficult for others to access the sources that have been used in a particular study. This can limit the ability of others to build on the research being conducted and to advance knowledge in a particular field.
In conclusion, the study by Spivey and Wilks (2004) highlights the problems associated with errors in reference lists and bibliographies in research, as they can reduce the credibility and reliability of research, create confusion for readers, raise ethical concerns, and limit the potential for further research.
The responses from ChatGPT covered the prevention of reference retrieval, which was mentioned by both articles. However, while some of the other answers were broadly related to the correct answers and offered good reasons in themselves, they did not accurately convey the points made in either article.
The responses provided by SciSpace to question 2 were as follows:
For Oermann & Ziolkowski (2002) there were three points made.
Errors in references can lead to a prolonged search of author names, which may prevent locating the referenced articles.
It also prevents giving credit to an author for his or her work and leaves the individual conducting the search open to incorrect information.
Minor errors such as omission or transposition of initials are still problematic when searching bibliographic databases like CINAHL and MEDLINE since it could result in inaccurate results.
While for Spivey & Wilks (2004) four points were made.
Errors in reference lists and bibliographies can lead to confusion when readers are trying to find the original source of information.
They also make it difficult for authors, editors, and reviewers to assess the accuracy of an article's references.
Furthermore, errors may cause a lack of trust from readers who rely on accurate citations as part of their research process.
Finally, inaccurate citation data could result in incorrect conclusions being drawn about certain topics or areas due to misattribution or misinterpretation by researchers relying on these sources.
SciSpace responded with all the points from both papers for this question. As with question 1, the results are at times a verbatim repetition of the language of the original paper, with quotation marks omitted.
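Given that SciSpace reproduces source language verbatim without marking it, a simple check for long shared word sequences between a tool's response and the article text could flag such passages. The sketch below is a rough illustration and was not part of the original exercise; the two sample strings are invented stand-ins rather than the actual texts.

def shared_ngrams(source: str, response: str, n: int = 8) -> set:
    # Return word n-grams (default: eight words) appearing in both texts;
    # any hit is a candidate for unquoted verbatim copying.
    def ngrams(text):
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    return ngrams(source) & ngrams(response)

# Invented stand-in texts, for illustration only.
article_text = "errors in references can lead to a prolonged search of author names"
tool_response = "Errors in references can lead to a prolonged search of author names and more."

for phrase in shared_ngrams(article_text, tool_response):
    print("Possible unquoted copying:", phrase)

A check like this is crude, since it ignores punctuation and paraphrase, but it is enough to show where quotation marks should have been used.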
The responses to question 2 provided by Elicit were as follows:
Oermann & Ziolkowski (2002)
They were the result of mistakes made during the editing process.
Spivey & Wilks (2004)
Authors do not verify citations with the original source.
These responses from Elicit did not reflect the evidence of either article.
Question 3) What exactly did the study research? Provide enough detail so that the study could be reproduced.
Question 3 required the most detailed answers of the five questions. The Oermann and Ziolkowski (2002) study used thirteen issues of three critical care nursing journals published between January and September 2000: the Journal of PeriAnesthesia Nursing, the American Journal of Critical Care, and Critical Care Nurse. A random selection of 10% of the references from each journal, consisting of journal articles, book chapters, books, and websites, was included, with all other sources excluded. Errors were classed as major or minor in line with work by Taylor (1998).
For Spivey and Wilks (2004), all issues from the year 2000 of five social work journals were included in the research: Journal of Social Work Education (JSWE), Clinical Social Work Journal (CSWJ), Social Service Review (SSR), Social Work (SW), and Social Work Research (SWR). Only references to journal articles were included in the sample, with 100 randomly selected from each journal and all other sources omitted. The accuracy of six features of each selected reference was assessed.
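As a rough illustration of Oermann and Ziolkowski's 10% sampling step, a random draw of references might look like the sketch below; the reference list here is a generic stand-in, not data from either study.

import random

# Generic stand-in list; neither study's actual references are reproduced here.
references = [f"reference_{i}" for i in range(1, 1001)]

sample_size = round(len(references) * 0.10)  # 10% of the eligible references
sample = random.sample(references, sample_size)
print(f"Checking {sample_size} of {len(references)} references")

Spivey and Wilks' approach would differ in drawing a fixed 100 references per journal, i.e. random.sample(references, 100), and in both studies the list would first be restricted to the eligible source types before any random selection took place.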
The response to Question 3 from ChatGPT for each paper was as follows:
The research design of "Accuracy of references in three critical care nursing journals" by Oermann and Ziolkowski (2002) involved the evaluation of the accuracy of references in three critical care nursing journals. The following details the steps involved in the research design, which would allow someone to replicate the research exactly:
Sampling: The authors selected three critical care nursing journals, and from each journal, they randomly selected 10 articles published in the previous year. This resulted in a total sample size of 30 articles.
Data Collection: For each article selected, the authors examined the reference list and compared it to the original source material. They recorded any discrepancies or errors in the reference list.
Data Analysis: The authors counted the number of errors in each reference list and calculated the error rate as a percentage of the total number of references. They then calculated the average error rate for each of the three journals and for the overall sample.
Statistical Analysis: The authors used descriptive statistics to summarize the error rates for each of the three journals and for the overall sample. They also calculated 95% confidence intervals for each of the error rates.
Limitations: The authors acknowledged that their sample size was small and may not be representative of all critical care nursing journals. They also acknowledged that their study was limited to the examination of errors in reference lists, and they did not examine other aspects of the articles such as content, writing quality, or methodology.
In conclusion, the research design of "Accuracy of references in three critical care nursing journals" by Oermann and Ziolkowski (2002) involved selecting a sample of 30 articles from three critical care nursing journals, examining the reference lists for errors, calculating error rates, and using descriptive statistics to summarize the results. With this information, someone would be able to replicate the research exactly.
The research design of "Reference List Accuracy in Social Work Journals" by Spivey and Wilks (2004) involved evaluating the accuracy of reference lists in social work journals. The following details the steps involved in the research design, which would allow someone to replicate the research exactly:
Sampling: The authors selected three social work journals, and from each journal, they selected 10 articles published in the previous year. This resulted in a total sample size of 30 articles.
Data Collection: For each article selected, the authors examined the reference list and compared it to the original source material. They recorded any discrepancies or errors in the reference list.
Data Analysis: The authors counted the number of errors in each reference list and calculated the error rate as a percentage of the total number of references. They then calculated the average error rate for each of the three journals and for the overall sample.
Statistical Analysis: The authors used descriptive statistics to summarize the error rates for each of the three journals and for the overall sample. They also calculated 95% confidence intervals for each of the error rates.
Limitations: The authors acknowledged that their sample size was small and may not be representative of all social work journals. They also acknowledged that their study was limited to the examination of errors in reference lists, and they did not examine other aspects of the articles such as content, writing quality, or methodology.
In conclusion, the research design of "Reference List Accuracy in Social Work Journals" by Spivey and Wilks (2004) involved selecting a sample of 30 articles from three social work journals, examining the reference lists for errors, calculating error rates, and using descriptive statistics to summarize the results. With this information, someone would be able to replicate the research exactly.
While some aspects of these responses are correct, the majority of the information provided about both articles is incorrect. Furthermore, the first paragraph of each response following the introduction produces almost identically incorrect information about sampling, stating in both cases that 10 articles were selected from each of three journals to give a total sample of 30 articles. The remaining information is either very general, describing what might be expected from such research, or incorrect.
The responses from SciSpace are as follows with the first paragraph about Oermann and Ziolkowski (2002) and the second about Spivey and Wilks (2004):
The research design involved analyzing the accuracy of references in 3 critical care nursing journals: Journal of PeriAnesthesia Nursing, American Journal of Critical Care, and Critical Care Nurse. • For this purpose 10% (244) citations from each journal were selected randomly for analysis over a period 13 issues during the study period. These 244 references were then examined to determine number and types errors present such as misspelled or omitted author names/initials; non-first page discrepancies etc., which was used to calculate overall error rate i.e 22.9%.
The research design for this study was an exploratory one, using a stratified computer-generated random sample of 500 references from five social work journals. Each reference was verified against the original source material in six fields: article title, author name(s), journal title, pagination, volume and year. Data were collected on each field to determine if there were any errors present or not; these data points then allowed researchers to calculate the overall error rate across all five journals combined as well as individual rates per journal.
The information provided by SciSpace lacked the required detail but was accurate about the research designs. However, when further questions were asked in an attempt to gain more information, SciSpace produced an error, insisting that all references were used when in fact the studies excluded certain types of source. This follow-up query was made twice using different phraseology, but on both occasions and for both papers the outcome was the same.
The responses from Elicit were again brief with just a sentence responding to each. The sentence about Oermann and Ziolkowski (2002) was:
A random sample of 10% of references from each of 3 critical care nursing journals.
The sentence about Spivey and Wilks (2004) was:
an exploratory study that investigated the rate of citation errors in the reference lists of 5 social work journals.
While the information contained in both sentences was accurate, beyond some very basic facts it lacked most of the required detail.
Question 4) What was the reported error rate in the study?
The information required from the papers for question 4 concerned the reported error rates, as follows:
Oermann and Ziolkowski (2002) found 56 references contained errors out of 244 (error rate = 22.9%) with 64 errors in total. Some references had more than one error.
Spivey & Wilks (2004) found 206 references contained errors out of 500 (error rate = 41.2%) with 262 errors in total. Some references had more than one error.
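Both error rates follow from simple arithmetic: the number of references containing errors divided by the number of references checked. A quick check of the figures, as a minimal Python sketch:

def error_rate(with_errors: int, checked: int) -> float:
    # error rate (%) = references containing errors / references checked
    return 100 * with_errors / checked

print(f"Oermann and Ziolkowski (2002): {error_rate(56, 244):.2f}%")  # 22.95%, reported as 22.9%
print(f"Spivey and Wilks (2004): {error_rate(206, 500):.2f}%")       # 41.20%, reported as 41.2%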
The responses generated by ChatGPT for question 4 were as follows.
According to "Accuracy of references in three critical care nursing journals" by Oermann and Ziolkowski (2002), the error rates varied across the three nursing journals that were studied. The error rate for Journal A was 11.9%, the error rate for Journal B was 7.1%, and the error rate for Journal C was 5.5%. The overall error rate across the three journals was 9%. The types of errors found included omissions, incorrect publication years, misspelling of author names, incorrect journal titles, and other discrepancies. The authors suggested that these errors can lead to confusion, inaccuracies, and a loss of credibility in the scholarly work.
According to the findings of "Reference List Accuracy in Social Work Journals" by Spivey and Wilks (2004), the reported error rate for the overall sample of 30 articles was 7.9%. This means that, on average, 7.9% of the references in the reference lists of the articles in the sample were incorrect or had some sort of discrepancy. The authors also calculated the average error rate for each of the three journals included in the sample, but did not report the specific error rates for each journal.
Despite the level of detail provided by ChatGPT, the responses contain no accurate information.
From SciSpace the responses were:
The overall error rate for the references examined in this study was 22.9%. Major errors, such as misspelled or omitted author names and initials, occurred in 19.6% of the references while minor errors including non-first page discrepancies were found to occur in 4.5% of the references.
The reported error rate in the findings of this study was 41.2%. 206 references (out of 500) contained at least one error.
All information provided by SciSpace was accurate. The response for Oermann and Ziolkowski (2002) went into more detail than had been requested but was nonetheless correct. The response for Spivey and Wilks (2004) was as detailed as the question had requested.
The response from Elicit to the question when posed for Oermann and Ziolkowski (2002) was 22.9% which is correct. There was no response concerning Spivey and Wilks (2004).
Question 5) Which Journal in the research had the worst error rate?
The correct answers to the final question, "Which Journal in the research had the worst error rate?" were the Journal of PeriAnesthesia Nursing for Oermann and Ziolkowski (2002) and CSWJ for Spivey and Wilks (2004).
ChatGPT specifically said that neither paper reported which journal had the highest error rate. The responses were:
Unfortunately, "Accuracy of references in three critical care nursing journals" by Oermann and Ziolkowski (2002) did not specify which of the three critical care nursing journals studied had the highest error rate. The authors reported the average error rate for the overall sample of the three journals, which was 6.2%, but did not provide a comparison of the error rates for each of the individual journals.
Unfortunately, "Reference List Accuracy in Social Work Journals" by Spivey and Wilks (2004) did not specify which of the three journals included in the study had the highest error rate. The authors reported the average error rate for the overall sample of 30 articles, which was 7.9%, but did not provide a comparison of the error rates for each of the three journals.
In the case of SciSpace, the Journal of PeriAnesthesia Nursing was correctly identified as the journal with the worst error rate reported by Oermann and Ziolkowski (2002). However, it stated that Spivey and Wilks (2004) did not report which journal had the worst error rate. The same result was produced when 'worst' was replaced by 'highest' in the question. The two responses were:
The Journal of PeriAnesthesia Nursing had the highest error rate among the three critical care nursing journals studied, with an overall error rate of 23.8%. Major errors occurred in 19.9% and minor errors were found to occur in 8% of references from this journal.
The research did not report which journal had the worst error rate. However, it was noted that individual journal error rates ranged from 14% to 50%.
Elicit gave no response when asked about Oermann and Ziolkowski (2002). This was also the case when 'worst' was substituted by 'highest'. However, it correctly identified CSWJ as the journal with the worst rate in Spivey and Wilks (2004).
Conclusion
This exercise shows that none of the three AI tools was able to complete the task with the level of accuracy that would be expected from an individual reading the articles themselves. The most successful tool was SciSpace, which successfully completed three of the five questions. For the third question, part of its answer for Spivey and Wilks (2004) was wrong, and for question five its answer for that paper was completely wrong. It is also concerning that while its responses generally report the correct information, there is no indication, through the use of quotation marks, of which parts are reproduced verbatim.
Elicit was able to give very little detail about either paper. When a response was given it was correct, but across all the possible answers Elicit found fewer than a quarter.
Finally, the tool which prompted this exercise, ChatGPT, proved to be the least accurate of the three. Its responses were long and detailed, but the information provided generally bore little relation to the content of the papers. For the first two questions the points made were sound in themselves but did not reflect the ideas discussed in the papers. For the remaining questions, which concerned the research reported in the papers, the responses were wrong, reporting either incorrect data or erroneous conclusions.
So, for this small investigation into the accuracy of AI tools and their usefulness in this exercise, SciSpace seems to work fairly well. However, there is still a need to verify its responses, given that two of them contained errors.
ChatGPT is not helpful at all for this exercise, given that most of what it produced was not correct. However, had the exercise not required the information to derive from specific sources, the information provided would have been sound. As was the case when ChatGPT was asked to generate a list of references, it seems to produce what it thinks a good answer looks like (see Ireland, 2023). This works when we ask general questions, but when we require more granular detail the responses merely approximate a good answer while being, in reality, completely wrong.
The conclusion that can be drawn is that SciSpace would be a useful tool to help with the completion of this particular exercise, although it is still necessary to verify its responses. In contrast, ChatGPT would not be useful at all for supporting the research in this exercise.
References
Davis, A. (2023, January 30). London school likely to scrap essay homework due to fears of ChatGPT manipulation. Evening Standard.
Elicit (2022). Frequently asked questions: What is Elicit?
Hurst, L. (2022, December 15). ChatGPT: Why the human-like AI chatbot suddenly has everyone talking. Euronews.
Ireland, C. (2023). ChatGPT and the future of educational assessment. https://doi.org/10.6084/m9.figshare.21975833.v1
Oermann, M. H., & Ziolkowski, L. D. (2002). Accuracy of references in three critical care nursing journals. Journal of PeriAnesthesia Nursing, 17(2), 78-83. https://doi.org/10.1016/S1089-9472(02)70016-6
SciSpace (2022). About SciSpace.
Spivey, C. A., & Wilks, S. E. (2004). Reference list accuracy in social work journals. Research on Social Work Practice, 14(4), 281-286. https://doi.org/10.1177/1049731503262131
Taylor, M. K. (1998). The practical effects of errors in reference lists in nursing research journals. Nursing Research, 47(5), 300-303.