Social media mining

Social media mining is the process of obtaining big data from user-generated content on social media sites and mobile apps in order to extract actionable patterns, form conclusions about users, and act upon the information, often for the purpose of advertising to users or conducting research. The term is an analogy to the resource extraction process of mining for rare minerals. Mining companies must sift through vast quantities of raw ore to find precious minerals; likewise, social media mining requires human data analysts and automated software programs to sift through massive amounts of raw social media data in order to discern patterns and trends relating to social media usage, online behaviours, sharing of content, connections between individuals, online buying behaviour, and more. These patterns and trends are of interest to companies, governments and not-for-profit organizations, which can use them to design strategies or to introduce new programs, products, processes or services.

Social media mining uses a range of basic concepts from computer science, data mining, machine learning and statistics. Social media miners develop algorithms suitable for investigating massive files of social media data. Social media mining is based on theories and methodologies from social network analysis, network science, sociology, ethnography, optimization and mathematics. It encompasses the tools to formally represent, measure and model meaningful patterns from large-scale social media data.[1] In the 2010s, major corporations, governments and not-for-profit organizations engaged in social media mining to obtain data about customers, clients and citizens.

Background

As defined by Kaplan and Haenlein,[2] social media is the "group of internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content." There are many categories of social media including, but not limited to, social networking (Facebook or LinkedIn), microblogging (Twitter), photo sharing (Flickr, Instagram, Photobucket, or Picasa), news aggregation (Google Reader, StumbleUpon, or Feedburner), video sharing (YouTube or MetaCafe), livecasting (Ustream or Twitch), virtual worlds (Kaneva), social gaming (World of Warcraft), social search (Google, Bing, or Ask.com), and instant messaging (Google Talk, Skype, or Yahoo! Messenger).

The first social media website was introduced by GeoCities in 1994. It enabled users to create their own homepages without having a sophisticated knowledge of HTML coding. The first social networking site, SixDegrees.com, was introduced in 1997.[3] Since then, many other social media sites have been introduced, each providing service to millions of people. These individuals form a virtual world in which individuals (social atoms), entities (content, sites, etc.) and interactions (between individuals, between entities, between individuals and entities) coexist. Social norms and human behavior govern this virtual world. By understanding these social norms and models of human behavior and combining them with the observations and measurements of this virtual world, one can systematically analyze and mine social media.

Social media mining is the process of representing, analyzing, and extracting meaningful patterns from data in social media, resulting from social interactions. It is an interdisciplinary field encompassing techniques from computer science, data mining, machine learning, social network analysis, network science, sociology, ethnography, statistics, optimization, and mathematics. Social media mining faces grand challenges such as the big data paradox, obtaining sufficient samples, the noise removal fallacy, and the evaluation dilemma.

Social media mining represents the virtual world of social media in a computable way, measures it, and designs models that can help us understand its interactions. In addition, social media mining provides the necessary tools to mine this world for interesting patterns, analyze information diffusion, study influence and homophily, provide effective recommendations, and analyze novel social behavior in social media.

Uses

Social media mining is used across several fields, including business development, social science research, health services, and education.[4][5] Once the collected data has been processed through social media analytics, the results can be applied in these fields. Often, companies use the patterns of connectivity that pervade social networks, such as assortativity: the social similarity between users that is induced by influence, homophily, and reciprocity and transitivity.[6] These forces are then measured via statistical analysis of the nodes and the connections between these nodes.[4] Social analytics also uses sentiment analysis, because social media users often relay positive or negative sentiment in their posts.[7] This provides important social information about users' emotions on specific topics.[8][9][10]
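
As a rough illustration of how such measurements might be computed, the following minimal Python sketch calculates reciprocity and transitivity for a small directed "follows" graph and scores posts with a tiny word list. The user names, the edge list, the word lists and the function names are all hypothetical and chosen only for the example; real analyses typically rely on much larger data sets and on more robust sentiment models than a fixed lexicon.

```python
from itertools import combinations

# Hypothetical toy data: (follower, followed) pairs in a directed graph.
FOLLOWS = {
    ("ana", "ben"), ("ben", "ana"),
    ("ben", "cho"), ("cho", "ben"),
    ("ana", "cho"),
    ("dev", "ana"),
}

def reciprocity(edges):
    """Fraction of directed edges whose reverse edge also exists."""
    if not edges:
        return 0.0
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)

def transitivity(edges):
    """Share of connected triples that are closed into triangles,
    computed on the undirected version of the graph."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    triples = closed = 0
    for nbrs in neighbours.values():
        # Every pair of neighbours of a node forms a connected triple.
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in neighbours[a]:
                closed += 1
    return closed / triples if triples else 0.0

# Hypothetical, deliberately tiny sentiment lexicon.
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"hate", "awful", "sad"}

def sentiment_score(post):
    """Count of positive words minus count of negative words in a post."""
    words = [w.strip(".,!?").lower() for w in post.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

if __name__ == "__main__":
    print(round(reciprocity(FOLLOWS), 3))
    print(round(transitivity(FOLLOWS), 3))
    print(sentiment_score("I love this phone, great camera!"))
```

In this sketch a positive score suggests positive sentiment and a negative score suggests negative sentiment; a score of zero means the lexicon found no signal either way.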

These three patterns have several uses beyond pure analysis. For example, influence can be used to determine the most influential user in a particular network.[4] Companies are interested in this information when deciding whom to hire for influencer marketing. These influencers are identified by recognition, activity generation, and novelty: three requirements that can be measured through the data mined from these sites.[4] Analysts also value measures of homophily: the tendency of two similar individuals to become friends.[6] Users have begun to rely on information about other users' opinions in order to understand diverse subject matter.[7] These analyses can also help create recommendations tailored to individuals.[4] By measuring influence and homophily, online and offline companies are able to suggest specific products to individual consumers and to groups of consumers. Social media networks can use this information themselves to suggest to their users possible friends to add, pages to follow, and accounts to interact with.
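
The two uses described above can be sketched in a few lines of Python. The example below ranks users by follower count, which is only a simple stand-in for the recognition component of influence and not the full set of criteria named in the text, and suggests possible friends by counting shared connections, a common heuristic that is assumed here rather than taken from the cited sources. The edge list and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: (follower, followed) pairs.
FOLLOWS = [
    ("ana", "ben"), ("cho", "ben"), ("dev", "ben"),
    ("ben", "ana"), ("dev", "ana"),
    ("cho", "dev"),
]

def most_followed(edges):
    """Rank users by in-degree (number of followers), highest first."""
    counts = Counter(followed for _, followed in edges)
    return counts.most_common()

def suggest_friends(edges, user, top_n=3):
    """Suggest users who are not yet connected to `user`,
    ranked by the number of connections they share with `user`."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    mine = neighbours.get(user, set())
    scores = Counter()
    for friend in mine:
        for candidate in neighbours[friend]:
            if candidate != user and candidate not in mine:
                scores[candidate] += 1
    # Sort by shared connections (descending), then by name for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

if __name__ == "__main__":
    print(most_followed(FOLLOWS)[0])
    print(suggest_friends(FOLLOWS, "ana"))
```

Counting shared connections treats the follow graph as undirected; a production recommender would usually combine such structural signals with content and behavioural features.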

Perception

Modern social media mining is a controversial practice that has led to large gains in user numbers for tech giants such as Facebook, Inc., Twitter, and Google. Such companies, often grouped under the label "Big Tech", build algorithms that use user input to infer users' preferences and to keep them on the platform for as long as possible. These inputs, which can be as simple as the time spent on a given screen, provide the data being mined, and companies profit heavily by capitalizing on highly accurate predictions about user behavior. The growth of platforms accelerated rapidly once these strategies were put in place; as of 2021, most of the largest platforms averaged over 1 billion active users per month.[11]

Critics of algorithm-driven platforms, such as Tristan Harris and Chamath Palihapitiya, have claimed that certain companies (specifically Facebook) valued growth above all else and ignored the potential negative impacts of these growth engineering tactics.[12]

At the same time, users have created their own forms of data arbitrage, using their own data for content monetization and to become influencers. Users typically have access to a varied set of analytics about the people who interact with them on social media, and can use these as building blocks for their own targeting and growth strategies through ads and posts that cater to their audiences. Influencers also commonly promote products and services for established brands, creating one of the largest digital industries: influencer marketing. Instagram, Facebook, Twitter, YouTube, Google, and others have long given access to platform analytics and allowed third parties to access that information as well, at times without the knowledge of the user whose data is being viewed or bought.[13]

Research

Research areas

Publication venues

Social media mining research articles are published in computer science, social science, and data mining conferences and journals:

Conferences

Conference papers can be found in the proceedings of Knowledge Discovery and Data Mining (KDD), World Wide Web (WWW), Association for Computational Linguistics (ACL), Conference on Information and Knowledge Management (CIKM), International Conference on Data Mining (ICDM), and Internet Measurement Conference (IMC).

Journals

Social media mining research also appears at many data management and database conferences, such as the ICDE Conference, the SIGMOD Conference and the International Conference on Very Large Data Bases.

See also

Methods
Application domains
Companies
Related topics

References

  1. ^ a b c d e f g Zafarani, Reza; Abbasi, Mohammad Ali; Liu, Huan (2014). "Social Media Mining: An Introduction". Retrieved November 15, 2014.
  2. ^ Kaplan, Andreas M.; Haenlein, Michael (2010). "Users of the world, unite! The challenges and opportunities of social media". Business Horizons. 53 (1): 59–68. doi:10.1016/j.bushor.2009.09.003. S2CID 16741539.
  3. ^ "The History of Social Media: 29+ Key Moments". Social Media Marketing & Management Dashboard. November 22, 2018. Retrieved April 21, 2021.
  4. ^ a b c d e Zafarani, R., Ali Abbasi, M., Liu, H., (2014). Social Media Mining. Cambridge University Press. http://dmml.asu.edu/smm.
  5. ^ Singh, Archana (2017). "Mining of Social Media data of University students". Education and Information Technologies. 22 (4): 1515–1526. doi:10.1007/s10639-016-9501-1. S2CID 1761288.
  6. ^ a b Tang, J., Chang, Y., Aggarwal, C., Liu, H., (2016). "A Survey of Signed Network Mining in Social Media". ACM Computing Surveys, 49: 3.
  7. ^ a b Adedoyin-Olowe, M., Gaber, M., & Stahl, F., (2013). "A Survey of Data Mining Techniques for Social Media Analysis."
  8. ^ Laeeq, F., Nafis, T., & Beg, M. (2017). "Sentimental Classification of Social Media using Data Mining." International Journal of Advanced Research in Computer Science, 8: 5.
  9. ^ Ho, Vong Anh; Nguyen, Duong Huynh-Cong; Nguyen, Danh Hoang; Pham, Linh Thi-Van; Nguyen, Duc-Vu; Nguyen, Kiet Van; Nguyen, Ngan Luu-Thuy (2020). "Emotion Recognition for Vietnamese Social Media Text". Computational Linguistics. Communications in Computer and Information Science. Vol. 1215. pp. 319–333. arXiv:1911.09339. doi:10.1007/978-981-15-6168-9_27. ISBN 978-981-15-6167-2. S2CID 208202333.
  10. ^ Nguyen et al. (2020). "Exploiting Vietnamese Social Media Characteristics for Textual Emotion Recognition in Vietnamese." International Conference on Asian Language Processing (IALP), 2020.
  11. ^ McCourt, Abby (April 3, 2018). "Social Media Mining: The Effects of Big Data In the Age of Social Media". Media Freedom & Information Access Clinic. Yale Law School. Retrieved February 25, 2021.
  12. ^ The Social Dilemma (2020). Directed by Jeff Orlowski, Exposure Labs. Netflix, https://www.netflix.com/title/81254224.
  13. ^ Newman, John; Haw Allensworth, Rebecca (January 30, 2021). "The Government Didn't Foresee How Facebook Would Behave". The Atlantic. Retrieved February 15, 2021.
  14. ^ Zarrinkalam, Fattane; Bagheri, Ebrahim (2017). "Event identification in social networks". Encyclopedia with Semantic Computing and Robotic Intelligence. 01 (1): 1630002. arXiv:1606.08521. doi:10.1142/S2425038416300020. S2CID 8484345.
  15. ^ Nurwidyantoro, A.; Winarko, E. (June 1, 2013). "Event detection in social media: A survey". International Conference on ICT for Smart Society. pp. 1–5. doi:10.1109/ICTSS.2013.6588106. ISBN 978-1-4799-0145-6. S2CID 23802901.
  16. ^ "Event Detection from Social Media Data" (PDF). Retrieved May 5, 2017.
  17. ^ "Event Detection in Social Media Data" (PDF). Retrieved May 5, 2017.
  18. ^ Cordeiro, Mário; Gama, João (January 1, 2016). "Online Social Networks Event Detection: A Survey". Solving Large Scale Learning Tasks. Challenges and Algorithms. Lecture Notes in Computer Science. Vol. 9580. Springer International Publishing. pp. 1–41. doi:10.1007/978-3-319-41706-6_1. ISBN 978-3-319-41705-9.
  19. ^ Gasco, Luis; Clavel, Chloé; Asensio, Cesar; De Arcas, Guillermo (March 25, 2019). "Beyond sound level monitoring: Exploitation of social media to gather citizens subjective response to noise". Science of the Total Environment. 658: 69–79. Bibcode:2019ScTEn.658...69G. doi:10.1016/j.scitotenv.2018.12.071. ISSN 0048-9697. PMID 30572215. S2CID 58647430.
  20. ^ Correia, Rion Brattig; Li, Lang; Rocha, Luis M. (2016). "Monitoring Potential Drug Interactions and Reactions Via Network Analysis of Instagram User Timelines". Biocomputing 2016. Vol. 21. pp. 492–503. doi:10.1142/9789814749411_0045. ISBN 978-981-4749-40-4. PMC 4720984. PMID 26776212.
  21. ^ a b Korkontzelos, Ioannis; Nikfarjam, Azadeh; Shardlow, Matthew; Sarker, Abeed; Ananiadou, Sophia; Gonzalez, Graciela H. (2016). "Analysis of the effect of sentiment analysis on extracting adverse drug reactions from tweets and forum posts". Journal of Biomedical Informatics. 62: 148–158. doi:10.1016/j.jbi.2016.06.007. PMC 4981644. PMID 27363901.
  22. ^ a b Wood, Ian B.; Varela, Pedro L.; Bollen, Johan; Rocha, Luis M.; Gonçalves-Sá, Joana (2017). "Human Sexual Cycles are Driven by Culture and Match Collective Moods". Scientific Reports. 7 (1): 17973. arXiv:1707.03959. Bibcode:2017NatSR...717973W. doi:10.1038/s41598-017-18262-5. PMC 5740080. PMID 29269945.
  23. ^ Tang, Jiliang; Tang, Jie; Liu, Huan (2014). "Recommendation in Social Media - Recent Advances and New Frontiers". Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Archived from the original on April 13, 2016. Retrieved November 30, 2014.
  24. ^ Tang, Jiliang; Hu, Xia; Liu, Huan (2013). "Social Recommendation: A Review" (PDF). Social Network Analysis and Mining. 3 (4): 1113–1133. doi:10.1007/s13278-013-0141-9. S2CID 14899273. Archived from the original (PDF) on March 3, 2016. Retrieved November 30, 2014.
  25. ^ Horowitz, Damon; Kamvar, Sepandar (2013). "The Anatomy of a Large-Scale Social Search Engine" (PDF). Proceedings of the 19th International Conference on World Wide Web. ACM. pp. 431–440.
  26. ^ Hu, Xia; Tang, Lei; Tang, Jiliang; Liu, Huan (2013). "Exploiting Social Relations for Sentiment Analysis in Microblogging" (PDF). Proceedings of the 6th ACM International Conference on Web Search and Data Mining. Archived from the original (PDF) on March 4, 2016. Retrieved November 29, 2014.
  27. ^ Hu, Xia; Tang, Jiliang; Gao, Huiji; Liu, Huan (2013). "Unsupervised Sentiment Analysis with Emotional Signals" (PDF). Proceedings of the 22nd International World Wide Web Conference. pp. 607–618. doi:10.1145/2488388.2488442. ISBN 9781450320351. S2CID 6608236. Archived from the original (PDF) on March 4, 2016. Retrieved November 29, 2014.
  28. ^ Ali, K; Dong, H; Bouguettaya, A (2017). "Sentiment Analysis as a Service: A social media based sentiment analysis framework". The 24th IEEE International Conference on Web Services (IEEE ICWS 2017). pp. 660–667.
  29. ^ Shahheidari, S; Dong, H; Daud, R (2013). "Twitter sentiment mining: A multi domain analysis". 2013 Seventh International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS 2013). pp. 144–149.
  30. ^ Hu, Xia; Tang, Jiliang; Zhang, Yanchao; Liu, Huan (2013). "Social Spammer Detection in Microblogging" (PDF). Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Archived from the original (PDF) on March 4, 2016. Retrieved November 29, 2014.
  31. ^ Hu, Xia; Tang, Jiliang; Liu, Huan (2014). "Online Social Spammer Detection" (PDF). Proceedings of the 28th AAAI Conference on Artificial Intelligence. Archived from the original (PDF) on March 28, 2016. Retrieved November 29, 2014.
  32. ^ Hu, Xia; Tang, Jiliang; Liu, Huan (2014). "Leveraging Knowledge across Media for Spammer Detection in Microblogging" (PDF). Proceedings of the 37th Annual ACM SIGIR Conference. Archived from the original (PDF) on March 4, 2016. Retrieved November 29, 2014.
  33. ^ Hu, Xia; Tang, Jiliang; Gao, Huiji; Liu, Huan (2014). "Social Spammer Detection with Sentiment Information" (PDF). Proceedings of the IEEE International Conference on Data Mining. Archived from the original (PDF) on March 3, 2016. Retrieved November 29, 2014.
  34. ^ Tang, Jiliang; Liu, Huan (2012). "Feature Selection with Linked Data in Social Media" (PDF). Proceedings of SIAM International Conference on Data Mining. Archived from the original (PDF) on March 3, 2016. Retrieved November 30, 2014.
  35. ^ Tang, Jiliang; Liu, Huan (2014). "Feature Selection for Social Media Data" (PDF). ACM Transactions on Knowledge Discovery from Data. 8 (4): 1–27. doi:10.1145/2629587. S2CID 15006243. Archived from the original (PDF) on March 3, 2016. Retrieved November 30, 2014.
  36. ^ Tang, Jiliang; Liu, Huan (2012). "Unsupervised Feature Selection for Linked Social Media Data" (PDF). Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Archived from the original (PDF) on March 3, 2016. Retrieved November 30, 2014.
  37. ^ Tang, Jiliang; Liu, Huan (2014). "Unsupervised Feature Selection for Linked Social Media Data" (PDF). IEEE Transactions on Knowledge and Data Engineering. doi:10.1109/TKDE.2014.2320728. S2CID 16142099. Archived from the original (PDF) on March 3, 2016. Retrieved November 30, 2014.
  38. ^ Tang, Jiliang; Liu, Huan (2014). "Trust in Social Computing". Proceedings of the 23rd International World Wide Web Conference. Archived from the original on March 4, 2016. Retrieved November 30, 2014.
  39. ^ Tang, Jiliang; Gao, Huiji; Liu, Huan (2012). "mTrust: Discerning Multi-Faceted Trust in a Connected World" (PDF). The 5th ACM International Conference on Web Search and Data Mining. Archived from the original (PDF) on March 3, 2016. Retrieved November 30, 2014.
  40. ^ Tang, Jiliang; Gao, Huiji; DasSarma, Atish; Liu, Huan (2012). "eTrust: Understanding Trust Evolution in an Online World" (PDF). Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Archived from the original (PDF) on March 4, 2016. Retrieved November 30, 2014.
  41. ^ Tang, Jiliang; Gao, Huiji; Hu, Xia; Liu, Huan (2013). "Exploiting Homophily Effect for Trust Prediction" (PDF). The 6th ACM International Conference on Web Search and Data Mining. Archived from the original (PDF) on March 4, 2016. Retrieved November 30, 2014.
  42. ^ Tang, Jiliang; Hu, Xia; Liu, Huan (2014). "Is Distrust the Negation of Trust? The Value of Distrust in Social Media" (PDF). Proceedings of ACM Hypertext Conference. Archived from the original (PDF) on March 3, 2016. Retrieved November 30, 2014.
  43. ^ Tang, Jiliang; Hu, Xia; Chang, Yi; Liu, Huan (2014). "Predictability of Distrust with Interaction Data" (PDF). ACM International Conference on Information and Knowledge Management. Archived from the original (PDF) on March 3, 2016. Retrieved November 30, 2014.
  44. ^ Tang, Jiliang; Chang, Shiyu; Aggarwal, Charu; Liu, Huan (2015). "Negative Link Prediction in Social Media" (PDF). Proceedings of ACM International Conference on Web Search and Data Mining. arXiv:1412.2723. Bibcode:2014arXiv1412.2723T. Archived from the original (PDF) on September 24, 2015. Retrieved November 30, 2014.
  45. ^ Bruno, Nicola (2011). "Tweet first, verify later? How real-time information is changing the coverage of worldwide crisis events". Oxford: Reuters Institute for the Study of Journalism, University of Oxford. 10: 2010–2011.
  46. ^ Sakaki, Takashi; Okazaki, Makoto; Yutaka, Matsuo (2010). "Earthquake shakes Twitter users: real-time event detection by social sensors". Proceedings of the 19th International Conference on World Wide Web. pp. 851–860.
  47. ^ Mendoza, Marcelo; Poblete, Barbara; Castillo, Carlos (2010). "Twitter under crisis: Can we trust what we RT?". Proceedings of the First Workshop on Social Media Analytics. pp. 71–79.
  48. ^ Kumar, Shamanth; Barbier, Geoffrey; Abbasi, Mohammad Ali; Liu, Huan (2011). "TweetTracker: An Analysis Tool for Humanitarian and Disaster Relief". The 5th International AAAI Conference on Weblogs and Social Media. Archived from the original on December 5, 2014. Retrieved December 1, 2014.
  49. ^ Kumar, Shamanth; Hu, Xia; Liu, Huan (2014). "A behavior analytics approach to identifying tweets from crisis regions". Proceedings of the 25th ACM Conference on Hypertext and Social Media. pp. 255–260.
  50. ^ Gao, Huiji; Tang, Jiliang; Liu, Huan (2012). "Exploring Social-Historical Ties on Location-Based Social Networks" (PDF). Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media. Archived from the original (PDF) on January 22, 2016. Retrieved December 1, 2014.
  51. ^ Gao, Huiji; Tang, Jiliang; Liu, Huan (2012). "Mobile Location Prediction in Spatio-Temporal Context" (PDF). Nokia Mobile Data Challenge Workshop 2012. Archived from the original (PDF) on September 24, 2015. Retrieved December 1, 2014.
  52. ^ Gao, Huiji; Tang, Jiliang; Liu, Huan (2012). "gSCorr: Modeling Geo-Social Correlations for New Check-ins on Location-Based Social Networks" (PDF). Proceedings of the 21st ACM International Conference on Information and Knowledge Management. Archived from the original (PDF) on September 24, 2015. Retrieved December 1, 2014.
  53. ^ Gao, Huiji; Tang, Jiliang; Hu, Xia; Liu, Huan (2013). "Exploring Temporal Effects for Location Recommendation on Location-Based Social Networks" (PDF). Proceedings of the 7th ACM Recommender Systems Conference. pp. 93–100. doi:10.1145/2507157.2507182. ISBN 9781450324090. S2CID 14990290. Archived from the original (PDF) on September 24, 2015. Retrieved December 1, 2014.
  54. ^ Gao, Huiji; Tang, Jiliang; Hu, Xia; Liu, Huan (2014). "Content-Aware Point of Interest Recommendation on Location-Based Social Networks" (PDF). Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. Archived from the original (PDF) on September 24, 2015. Retrieved December 1, 2014.
  55. ^ Gao, Huiji; Tang, Jiliang; Liu, Huan (2014). "Personalized Location Recommendation on Location-based Social Networks" (PDF). Proceedings of the 8th ACM Recommender Systems Conference. Archived from the original (PDF) on September 24, 2015. Retrieved December 1, 2014.
  56. ^ Barbier, Geoffrey; Feng, Zhuo; Gundecha, Pritam; Liu, Huan (2013). "Provenance Data in Social Media". Synthesis Lectures on Data Mining and Knowledge Discovery. 4: 1–84. doi:10.2200/S00496ED1V01Y201304DMK007. S2CID 46794494.
  57. ^ Gundecha, Pritam; Feng, Zhuo; Liu, Huan (2013). "Seeking Provenance of Information in Social Media" (PDF). Proceedings of the 22nd ACM International Conference on Information and Knowledge Management Conference. Archived from the original (PDF) on March 4, 2016. Retrieved December 1, 2014.
  58. ^ Gundecha, Pritam; Barbier, Geoffrey; Tang, Jiliang; Liu, Huan (2014). "User Vulnerability and its Reduction on a Social Networking Site" (PDF). ACM Transactions on Knowledge Discovery from Data. 9 (2): 1–25. doi:10.1145/2630421. S2CID 1200227. Archived from the original (PDF) on March 3, 2016. Retrieved December 1, 2014.
  59. ^ Marozzo, Fabrizio; Bessi, Alessandro (2018), "Analyzing polarization of social media users and news sites during political campaigns", Social Network Analysis and Mining, 8: 1, doi:10.1007/s13278-017-0479-5, S2CID 21257844
