Citation analysis


Citation analysis is the examination of the frequency, patterns, and graphs of citations in documents. It uses the directed graph of citations (links from one document to another) to reveal properties of the documents. A typical aim is to identify the most important documents in a collection. A classic example is that of the citations between academic articles and books.[1][2] As another example, judges support their judgements by referring back to judgements made in earlier cases (see citation analysis in a legal context). Patents provide a further example: they cite prior art, that is, earlier patents relevant to the current claim. The digitization of patent data and increasing computing power have led to a community of practice that uses these citation data to measure innovation attributes, trace knowledge flows, and map innovation networks.[3]

Documents can be associated with many other features in addition to citations, such as authors, publishers, and journals, as well as their actual texts. The general analysis of collections of documents is known as bibliometrics, and citation analysis is a key part of that field. For example, bibliographic coupling (two documents share references) and co-citation (two documents are cited together by a later document) are association measures based on citation analysis. The citations in a collection of documents can also be represented in forms such as a citation graph, as pointed out by Derek J. de Solla Price in his 1965 article "Networks of Scientific Papers".[4] This means that citation analysis draws on aspects of social network analysis and network science.
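The two association measures mentioned above can be illustrated with a minimal Python sketch. It assumes the citation data is already available as a mapping from each paper to the set of papers it cites; the paper labels, the `references` dictionary, and both function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the set of papers it cites.
references = {
    "A": {"X", "Y", "Z"},
    "B": {"Y", "Z"},
    "C": {"Z"},
}


def bibliographic_coupling(refs):
    """For every pair of citing papers, count the references they share."""
    strength = {}
    for p, q in combinations(sorted(refs), 2):
        shared = len(refs[p] & refs[q])
        if shared:
            strength[(p, q)] = shared
    return strength


def co_citation(refs):
    """For every pair of cited papers, count how many papers cite both."""
    counts = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return dict(counts)


coupling = bibliographic_coupling(references)
cocited = co_citation(references)
```

In this toy data, papers A and B are coupled through two shared references, while Y and Z are co-cited by two papers. Real citation indexes would need identifier matching and deduplication before counts like these are meaningful.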

An early example of automated citation indexing was CiteSeer, which indexed citations between academic papers. Web of Science is an example of a modern system that covers a wider range of information sources than academic books and articles alone. Today, automated citation indexing[5] has changed the nature of citation analysis research, allowing millions of citations to be analyzed for large-scale patterns and knowledge discovery. Citation analysis tools can be used to compute various impact measures for scholars based on data from citation indices.[6][7][note 1] These measures have various applications, from identifying expert referees to review papers and grant proposals, to providing transparent data in support of academic merit review, tenure, and promotion decisions. Because such decisions allocate limited resources, the resulting competition may lead to ethically questionable behavior aimed at increasing citations.[8][9]

The practice of naively using citation analyses to compare the impact of different scholarly articles, without taking into account other factors that may affect citation patterns, has drawn a great deal of criticism.[10] A recurrent criticism focuses on "field-dependent factors": citation practices vary from one area of science to another, and even between fields of research within a discipline.[11]

Overview

While citation indexes were originally designed for information retrieval, they are increasingly used for bibliometrics and other studies involving research evaluation. Citation data is also the basis of the popular journal impact factor.

There is a large body of literature on citation analysis, sometimes called scientometrics, a term invented by Vasily Nalimov, or more specifically bibliometrics. The field blossomed with the advent of the Science Citation Index, which now covers source literature from 1900 on. The leading journals of the field are Scientometrics, Informetrics, and the Journal of the Association for Information Science and Technology. ASIST also hosts an electronic mailing list called SIGMETRICS at ASIST.[12] This method is undergoing a resurgence based on the wide dissemination of the Web of Science and Scopus subscription databases in many universities, and the universally available free citation tools such as CiteBase, CiteSeerX, Google Scholar, and the former Windows Live Academic (now available with extra features as Microsoft Academic). Methods of citation analysis research include qualitative, quantitative, and computational approaches. The main foci of such scientometric studies have included productivity comparisons, institutional research rankings, journal rankings,[13] establishing faculty productivity and tenure standards,[14] assessing the influence of top scholarly articles,[15] tracing the development trajectory of a science or technology field,[16] and developing profiles of top authors and institutions in terms of research performance.[17]

Legal citation analysis is a citation analysis technique for analyzing legal documents. It facilitates the understanding of inter-related regulatory compliance documents by exploring the citations that connect provisions to other provisions within the same document or between different documents. Legal citation analysis uses a citation graph extracted from a regulatory document, which could supplement E-discovery, a process that leverages technological innovations in big data analytics.[18][19][20][21]

History

In a 1965 paper, Derek J. de Solla Price described the inherent linking characteristic of the SCI as "Networks of Scientific Papers".[4] The links between citing and cited papers became dynamic when the SCI began to be published online. The Social Sciences Citation Index became one of the first databases to be mounted on the Dialog system[22] in 1972. With the advent of the CD-ROM edition, linking became even easier and enabled the use of bibliographic coupling for finding related records. In 1973, Henry Small published his classic work on co-citation analysis, which became a self-organizing classification system that led to document clustering experiments and eventually an "Atlas of Science", later called "Research Reviews".

The topological and graphical nature of the worldwide citation network, an inherent property of the scientific literature, was described by Ralph Garner (Drexel University) in 1965.[23]

The use of citation counts to rank journals was a technique used in the early part of the nineteenth century, but the systematic ongoing measurement of these counts for scientific journals was initiated by Eugene Garfield at the Institute for Scientific Information, who also pioneered the use of these counts to rank authors and papers. In a landmark paper of 1965, he and Irving Sher showed the correlation between citation frequency and eminence by demonstrating that Nobel Prize winners published five times the average number of papers while their work was cited 30 to 50 times the average. Garfield reported this phenomenon in a long series of essays on the Nobel and other prizes. The usual summary measure is known as the impact factor: the number of citations a journal receives in a given year to the articles it published in the previous two years, divided by the number of articles published in those two years. It is widely used, for both appropriate and inappropriate purposes; in particular, the use of this measure alone for ranking authors and papers is quite controversial.
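The impact factor arithmetic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used by any citation index: the function name, its parameters, and the journal figures are all hypothetical, and it assumes the common reading in which citations received in one year are divided by the items published in the two preceding years.

```python
def two_year_impact_factor(citations_by_cited_year, items_by_year, year):
    """Citations received in `year` to items published in the two preceding
    years, divided by the number of items published in those two years."""
    window = (year - 1, year - 2)
    cites = sum(citations_by_cited_year.get(y, 0) for y in window)
    items = sum(items_by_year.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no items published in the two-year window")
    return cites / items


# Hypothetical figures for an imaginary journal.
# Citations made during 2023, keyed by the publication year of the cited item.
citations_in_2023 = {2022: 150, 2021: 250}
# Number of articles the journal published in each year.
items_published = {2022: 80, 2021: 120}

jif = two_year_impact_factor(citations_in_2023, items_published, 2023)
```

With these invented numbers, 400 citations divided by 200 articles gives a value of 2.0. Because the result is an average over a whole journal, it says little about any single paper or author, which is one reason its use for ranking individuals is contested.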

In an early study in 1964 of the use of citation analysis in writing the history of DNA, Garfield and Sher demonstrated the potential for generating historiographs, topological maps of the most important steps in the history of scientific topics. This work was later automated by E. Garfield, A. I. Pudovkin of the Institute of Marine Biology, Russian Academy of Sciences, and V. S. Istomin of the Center for Teaching, Learning, and Technology, Washington State University, and led to the creation of the HistCite[24] software around 2002.

Automatic citation indexing was introduced in 1998 by Lee Giles, Steve Lawrence and Kurt Bollacker[25] and enabled automatic algorithmic extraction and grouping of citations for any digital academic and scientific document. Where previous citation extraction was a manual process, citation measures could now scale up and be computed for any scholarly and scientific field and document venue, not just those selected by organizations such as ISI. This led to the creation of new systems for public and automated citation indexing, the first being CiteSeer (now CiteSeerX), soon followed by Cora, which focused primarily on the field of computer science and information science. These were later followed by large-scale academic domain citation systems such as Google Scholar and Microsoft Academic. Such autonomous citation indexing is not yet perfect in citation extraction or citation clustering, with an error rate estimated by some at 10%, though a careful statistical sampling has yet to be done. This has resulted in place names such as Ann Arbor, Milton Keynes, and Walton Hall being credited as authors with extensive academic output.[26] SCI claims to create automatic citation indexing through purely programmatic methods. Even the older records have a similar magnitude of error.

Citation impact

Citation impact or citation rate is a measure of how many times an academic journal article or book or author is cited by other articles, books or authors.[27][28][29][30][31][32] Citation counts are interpreted as measures of the impact or influence of academic work and have given rise to the field of bibliometrics or scientometrics,[33][34] specializing in the study of patterns of academic impact through citation analysis. The importance of journals can be measured by the average citation rate,[35][32] the ratio of the number of citations to the number of articles published within a given time period and in a given index, such as the journal impact factor or the CiteScore. It is used by academic institutions in decisions about academic tenure, promotion and hiring, and hence also used by authors in deciding which journal to publish in. Citation-like measures are also used in other fields that do ranking, such as Google's PageRank algorithm, software metrics, college and university rankings, and business performance indicators.

Citation analysis for legal documents

Citation analysis for legal documents is an approach to facilitate the understanding and analysis of inter-related regulatory compliance documents by exploring the citations that connect provisions to other provisions within the same document or between different documents. Citation analysis uses a citation graph extracted from a regulatory document, which could supplement E-discovery, a process that leverages technological innovations in big data analytics.[20][21][36]

Citation analysis for plagiarism detection

Citation-based plagiarism detection (CbPD)[37] relies on citation analysis, and is the only approach to plagiarism detection that does not rely on textual similarity.[38] CbPD examines the citation and reference information in texts to identify similar patterns in the citation sequences. As such, this approach is suitable for scientific texts, or other academic documents that contain citations. Citation analysis to detect plagiarism is a relatively young concept. It has not been adopted by commercial software, but a first prototype of a citation-based plagiarism detection system exists.[39] Similar order and proximity of citations in the examined documents are the main criteria used to compute citation pattern similarities. Citation patterns represent subsequences non-exclusively containing citations shared by the documents compared.[38][40] Other factors, including the absolute number or relative fraction of shared citations in the pattern and the probability that citations co-occur in a document, are also considered to quantify the patterns' degree of similarity.[38][40][41][42]
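One way to compare the order of shared citations in two documents is a longest common subsequence over their citation sequences, in the spirit of the "Longest Common Citation Sequence" named in reference [40]. The Python sketch below uses the classic dynamic-programming algorithm and is not the cited authors' implementation; the function name, the sample sequences, and the normalisation by the shorter sequence are assumptions made for illustration.

```python
def longest_common_citation_sequence(a, b):
    """Return one longest common subsequence of two citation sequences,
    using the classic dynamic-programming table."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])

    # Walk back through the table to recover the shared citations in order.
    seq = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            seq.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return seq[::-1]


# Hypothetical citation sequences: references in their order of appearance.
doc_a = ["r1", "r2", "r3", "r4", "r5"]
doc_b = ["r9", "r1", "r3", "r8", "r5"]

shared = longest_common_citation_sequence(doc_a, doc_b)
# Assumed normalisation: fraction of the shorter sequence that is matched.
score = len(shared) / min(len(doc_a), len(doc_b))
```

Here the two documents share the ordered pattern r1, r3, r5. A high score only flags a pair for closer inspection; the other factors listed above, such as how likely the citations are to co-occur anyway, would still need to be weighed.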

Controversies

  • E-publishing: due to the unprecedented growth of electronic resource (e-resource) availability, one of the questions currently being explored is, "how often are e-resources being cited in my field?"[43] For instance, there are claims that online access to computer science literature leads to higher citation rates;[44] however, humanities articles may suffer if not in print.
  • Self-citations: authors have been criticized for gaming the system by accumulating citations through excessive self-citation.[45] For instance, it has been found that men tend to cite themselves more often than women.[46]
  • Citation pollution: the infiltration of retracted or fake research into legitimate research through citation, which negatively affects the validity of that research.[47] Owing to various factors, including the publication race and the concerning rise in unscrupulous business practices related to so-called predatory or deceptive publishers, research quality in general is facing different types of threats.
  • Citation justice and citation bias: Because having others cite a publication helps the original author's career prospects, and because the key works in some fields were published by men, by older scholars, and by white people, there have been calls to promote social justice by deliberately citing publications by people from marginalized backgrounds, or by checking citations for bias before publication.[48]

Notes

  1. ^ Examples include subscription-based tools based on proprietary data, such as Web of Science and Scopus, and free tools based on open data, such as Scholarometer by Filippo Menczer and his team.

References

  1. ^ Rubin, Richard (2010). Foundations of library and information science (3rd ed.). New York: Neal-Schuman Publishers. ISBN 978-1-55570-690-6.
  2. ^ Garfield, E. Citation Indexing - Its Theory and Application in Science, Technology and Humanities Philadelphia:ISI Press, 1983.
  3. ^ Jaffe, Adam; de Rassenfosse, Gaétan (2017). "Patent citation data in social science research: Overview and best practices". Journal of the Association for Information Science and Technology. 68: 1360–1374.
  4. ^ a b Derek J. de Solla Price (July 30, 1965). "Networks of Scientific Papers" (PDF). Science. 149 (3683): 510–515. Bibcode:1965Sci...149..510D. doi:10.1126/science.149.3683.510. PMID 14325149.
  5. ^ Giles, C. Lee; Bollacker, Kurt D.; Lawrence, Steve (1998), "CiteSeer", Proceedings of the third ACM conference on Digital libraries - DL '98, New York: Association for Computing Machinery, pp. 89–98, doi:10.1145/276675.276685, ISBN 978-0-89791-965-4, S2CID 514080
  6. ^ Kaur, Jasleen; Diep Thi Hoang; Xiaoling Sun; Lino Possamai; Mohsen JafariAsbagh; Snehal Patil; Filippo Menczer (2012). "Scholarometer: A Social Framework for Analyzing Impact across Disciplines". PLOS ONE. 7 (9): e43235. Bibcode:2012PLoSO...743235K. doi:10.1371/journal.pone.0043235. PMC 3440403. PMID 22984414.
  7. ^ Hoang, D.; Kaur, J.; Menczer, F. (2010), "Crowdsourcing Scholarly Data", Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, April 26-27th, 2010, Raleigh, NC: US, archived from the original on 2015-04-17, retrieved 2015-08-09
  8. ^ Anderson, M.S. van; Ronning, E.A. van; de Vries, R.; Martison, B.C. (2007). "The perverse effects of competition on scientists' work and relationship". Science and Engineering Ethics. 4 (13): 437–461. doi:10.1007/s11948-007-9042-5. PMID 18030595. S2CID 2994701.
  9. ^ Wesel, M. van (2016). "Evaluation by Citation: Trends in Publication Behavior, Evaluation Criteria, and the Strive for High Impact Publications". Science and Engineering Ethics. 22 (1): 199–225. doi:10.1007/s11948-015-9638-0. PMC 4750571. PMID 25742806.
  10. ^ Bornmann, L.; Daniel, H. D. (2008). "What do citation counts measure? A review of studies on citing behavior". Journal of Documentation. 64 (1): 45–80. doi:10.1108/00220410810844150. hdl:11858/00-001M-0000-0013-7A94-3. S2CID 17260826.
  11. ^ Anauati, Maria Victoria and Galiani, Sebastian and Gálvez, Ramiro H., Quantifying the Life Cycle of Scholarly Articles Across Fields of Economic Research (November 11, 2014). Available at SSRN: https://ssrn.com/abstract=2523078
  12. ^ "The American Society for Information Science & Technology". The Information Society for the Information Age. Retrieved 2006-05-21.
  13. ^ Lowry, Paul Benjamin; Moody, Gregory D.; Gaskin, James; Galletta, Dennis F.; Humpherys, Sean; Barlow, Jordan B.; and Wilson, David W. (2013). "Evaluating journal quality and the Association for Information Systems (AIS) Senior Scholars' journal basket via bibliometric measures: Do expert journal assessments add value?", MIS Quarterly, vol. 37(4), 993–1012. Also, video narrative of this paper: TheAISChannel (Oct 22, 2014). "Information Systems Journal Rankings MISQ 2013". YouTube. Archived from the original on Nov 2, 2023.
  14. ^ Dean, Douglas L; Lowry, Paul Benjamin; and Humpherys, Sean (2011). "Profiling the research productivity of tenured information systems faculty at U.S. institutions", MIS Quarterly, vol. 35(1), pp. 1–15 (ISSN 0276-7783).
  15. ^ Karuga, Gilbert G.; Lowry, Paul Benjamin; and Richardson, Vernon J. (2007). "Assessing the impact of premier information systems research over time", Communications of the Association for Information Systems, vol. 19(7), pp. 115–131 (http://aisel.aisnet.org/cais/vol19/iss1/7)
  16. ^ Liu, John S.; Lu, Louis Y.Y. (2012-03-01). "An integrated approach for main path analysis: Development of the Hirsch index as an example". Journal of the American Society for Information Science and Technology. 63 (3): 528–542. doi:10.1002/asi.21692. ISSN 1532-2890.
  17. ^ Lowry, Paul Benjamin; Karuga, Gilbert G.; and Richardson, Vernon J. (2007). "Assessing leading institutions, faculty, and articles in premier information systems research journals", Communications of the Association for Information Systems, vol. 20(16), pp. 142–203 (http://aisel.aisnet.org/cais/vol20/iss1/16).
  18. ^ Hamou-Lhadj, Abdelwahab; Hamdaqa, Mohammad (2009). "Citation Analysis: An Approach for Facilitating the Understanding and the Analysis of Regulatory Compliance Documents". 2009 Sixth International Conference on Information Technology: New Generations. pp. 278–283. doi:10.1109/ITNG.2009.161. ISBN 978-1-4244-3770-2. S2CID 10083351.
  19. ^ Mohammad Hamdaqa and A. Hamou-Lhadj, "Citation Analysis: An Approach for Facilitating the Understanding and the Analysis of Regulatory Compliance Documents", In Proc. of the 6th International Conference on Information Technology, Las Vegas, US
  20. ^ a b "E-Discovery Special Report: The Rising Tide of Nonlinear Review". Hudson Legal. Archived from the original on 3 July 2012. Retrieved 1 July 2012. by Cat Casey and Alejandra Perez
  21. ^ a b "What Technology-Assisted Electronic Discovery Teaches Us About The Role Of Humans In Technology - Re-Humanizing Technology-Assisted Review". Forbes. Retrieved 1 July 2012.
  22. ^ "Dialog, A Thomson Business". Dialog invented online information services. Retrieved 2006-05-21.
  23. ^ Garner, Ralph; Lunin, Lois; Baker, Lois (1967). "Three Drexel Information Science Research Studies" (PDF). Drexel Press. Archived from the original (PDF) on March 27, 2022. Retrieved August 14, 2022.
  24. ^ Eugene Garfield; A. I. Pudovkin; V. S. Istomin (2002). "Algorithmic Citation-Linked Historiography—Mapping the Literature of Science". Presented at ASIS&T 2002: Information, Connections and Community. 65th Annual Meeting of ASIST in Philadelphia, PA. November 18–21, 2002. Retrieved 2006-05-21.
  25. ^ C.L. Giles, K. Bollacker, S. Lawrence, "CiteSeer: An Automatic Citation Indexing System", DL'98 Digital Libraries, 3rd ACM Conference on Digital Libraries, pp. 89-98, 1998.
  26. ^ Postellon DC (March 2008). "Hall and Keynes join Arbor in the citation indexes". Nature. 452 (7185): 282. Bibcode:2008Natur.452..282P. doi:10.1038/452282b. PMID 18354457.
  27. ^ Garfield, E. (1955). "Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas". Science. 122 (3159): 108–111. Bibcode:1955Sci...122..108G. doi:10.1126/science.122.3159.108. PMID 14385826.
  28. ^ Garfield, E. (1973). "Citation Frequency as a Measure of Research Activity and Performance" (PDF). Essays of an Information Scientist. 1: 406–408.
  29. ^ Garfield, E. (1988). "Can Researchers Bank on Citation Analysis?" (PDF). Essays of an Information Scientist. 11: 354.
  30. ^ Garfield, E. (1998). "The use of journal impact factors and citation analysis in the evaluation of science". 41st Annual Meeting of the Council of Biology Editors.
  31. ^ Moed, Henk F. (2005). Citation Analysis in Research Evaluation. Springer. ISBN 978-1-4020-3713-9.
  32. ^ a b Haustein, S. (2012). Multidimensional Journal Evaluation: Analyzing Scientific Periodicals beyond the Impact Factor. Knowledge and Information. De Gruyter. ISBN 978-3-11-025555-3. Retrieved 2023-06-06.
  33. ^ Leydesdorff, L., & Milojević, S. (2012). Scientometrics. arXiv preprint arXiv:1208.4566.
  34. ^ Harnad, S. (2009). Open access scientometrics and the UK Research Assessment Exercise. Scientometrics, 79(1), 147-156.
  35. ^ Garfield, Eugene (1972-11-03). "Citation Analysis as a Tool in Journal Evaluation". Science. 178 (4060). American Association for the Advancement of Science (AAAS): 471–479. Bibcode:1972Sci...178..471G. doi:10.1126/science.178.4060.471. ISSN 0036-8075. PMID 5079701.
  36. ^ Hamdaqa, M.; A Hamou-Lhadj (2009). "Citation Analysis: An Approach for Facilitating the Understanding and the Analysis of Regulatory Compliance Documents". 2009 Sixth International Conference on Information Technology: New Generations. 2009 Sixth International Conference on Information Technology: New Generations. Las Vegas, NV: IEEE. pp. 278–283. doi:10.1109/ITNG.2009.161. ISBN 978-1-4244-3770-2. S2CID 10083351.
  37. ^ Gipp, Bela (2014), Citation-based Plagiarism Detection, Springer Vieweg Research, ISBN 978-3-658-06393-1
  38. ^ a b c Gipp, Bela; Beel, Jöran (June 2010), "Citation Based Plagiarism Detection - A New Approach to Identifying Plagiarized Work Language Independently", Proceedings of the 21st ACM Conference on Hypertext and Hypermedia (HT'10) (PDF), ACM, pp. 273–274, doi:10.1145/1810617.1810671, ISBN 978-1-4503-0041-4, S2CID 2668037, archived from the original (PDF) on 25 April 2012, retrieved 21 October 2011
  39. ^ Gipp, Bela; Meuschke, Norman; Breitinger, Corinna; Lipinski, Mario; Nürnberger, Andreas (28 July 2013), "Demonstration of Citation Pattern Analysis for Plagiarism Detection", Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (PDF), ACM, p. 1119, doi:10.1145/2484028.2484214, ISBN 9781450320344, S2CID 2106222
  40. ^ a b Gipp, Bela; Meuschke, Norman (September 2011), "Citation Pattern Matching Algorithms for Citation-based Plagiarism Detection: Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence", Proceedings of the 11th ACM Symposium on Document Engineering (DocEng2011) (PDF), ACM, pp. 249–258, doi:10.1145/2034691.2034741, ISBN 978-1-4503-0863-2, S2CID 207190305, archived from the original (PDF) on 25 April 2012, retrieved 7 October 2011
  41. ^ Gipp, Bela; Meuschke, Norman; Beel, Jöran (June 2011), "Comparative Evaluation of Text- and Citation-based Plagiarism Detection Approaches using GuttenPlag", Proceedings of 11th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'11) (PDF), ACM, pp. 255–258, CiteSeerX 10.1.1.736.4865, doi:10.1145/1998076.1998124, ISBN 978-1-4503-0744-4, S2CID 3683238, archived from the original (PDF) on 25 April 2012, retrieved 7 October 2011
  42. ^ Gipp, Bela; Beel, Jöran (July 2009), "Citation Proximity Analysis (CPA) - A new approach for identifying related work based on Co-Citation Analysis", Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI'09) (PDF), International Society for Scientometrics and Informetrics, pp. 571–575, ISSN 2175-1935, archived from the original (PDF) on 13 September 2012, retrieved 7 October 2011
  43. ^ Zhao, Lisa. "How Librarian Used E-Resources--An Analysis of Citations in CCQ." Cataloging & Classification Quarterly 42(1) (2006): 117-131.
  44. ^ Lawrence, Steve. Free online availability substantially increases a paper's impact. Nature volume 411 (number 6837) (2001): 521. Also online at http://citeseer.ist.psu.edu/online-nature01/
  45. ^ Gálvez RH (March 2017). "Assessing author self-citation as a mechanism of relevant knowledge diffusion". Scientometrics. 111 (3): 1801–1812. doi:10.1007/s11192-017-2330-1. S2CID 6863843.
  46. ^ Singh Chawla, Dalmeet (5 July 2016). "Men cite themselves more than women do". Nature. 535 (7611): 212. doi:10.1038/nature.2016.20176. PMID 27414239. S2CID 4395779.
  47. ^ Van Der Walt, Wynand; Willems, Kris; Friedrich, Wernher; Hatsu, Sylvester; Kirstin, Krauss (2020). "Retracted Covid-19 papers and the levels of 'citation pollution': A preliminary analysis and directions for further research". Cahiers de la Documentation - Bladen voor Documentatie. 3 (4). hdl:10962/167732. Retrieved 13 January 2021.
  48. ^ Paul, Pamela (2023-05-04). "A Paper That Says Science Should Be Impartial Was Rejected by Major Journals. You Can't Make This Up". The New York Times. ISSN 0362-4331. Retrieved 2023-05-06.