    Tag: machine learning

    Understanding Covid-19 using Twitter NLP


    Rich Larrabee April 20, 2020

    All Rights Reserved © 2023 MediaTech Ventures

    Last week, the United States surpassed Italy as the country with the most deaths from the coronavirus, now making the US the epicenter of the virus. The country has been in a lockdown for several weeks, and while it appears the outbreak is beginning to reach its peak, many unknowns remain ahead of us. What are people thinking and doing in response to the current situation, and what is to come next? 

    In past crises, we had few means of gauging the mood of the American people in real time, but with today’s technology, we can gain some insight into what people are thinking and doing during this unprecedented time.

    One new source of information available to us during the 2020 coronavirus pandemic is the stream of microblogging messages on Twitter.

    According to recent data, there are 30 million daily Twitter users in the United States. Many of their messages also contain geospatial information, so we can pinpoint the location of the sender at the time the message was sent. So, what information might these messages carry, and how can we gain insight into how people are coping as we enter our second month of social distancing and the death count increases daily?

    https://twitter.com/LBHouseMusic/status/1239229888931885059

    With modern Natural Language Processing (NLP) methods, we can analyze this message traffic to gain insight into what people are communicating. But what can we glean from thousands of tweets regarding the coronavirus? How can we summarize the message traffic and pull out general themes from the information?

    One approach is the use of a “topic model,” a probabilistic model that describes each document in a body of text (or corpus) as a mixture of topics, and each topic as a probability distribution over words. Using this method, we can extract general themes from a large body of text, along with a probabilistic distribution of topics.
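One common way to write the idea of a document as a mixture of topics is shown below. The notation is an assumption introduced here for illustration (K topics, a word w, a document d, and a topic assignment z); it does not come from the analysis itself.

```latex
p(w \mid d) \;=\; \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d)
```

Read from right to left: a document first has a distribution over topics, each topic has a distribution over words, and the probability of seeing a word in the document is the topic-weighted sum of the two.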

    Here is an example built from 42K messages sent from the United States on March 15, with the origin of each message confirmed by its geospatial location.

    Natural Language Processing Twitter NLP

    While several different algorithms perform topic modeling, I’ll focus on the Latent Dirichlet Allocation (LDA) algorithm, which is widely used for topic modeling and is visualized here using pyLDAvis. I’ll also explore the Non-Negative Matrix Factorization (NMF) algorithm, which, based on my observations, provided a cleaner set of topics for this data. Both algorithms are unsupervised learning methods that cluster documents for topic analysis; NMF has a reputation for being better at learning compact topics and producing more succinct labels (my goal). Our models use n-grams (sequences of adjacent words that provide context), since particular phrases such as “social distancing” and “toilet paper” are significant.
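To make the n-gram idea concrete, here is a minimal sketch using only the Python standard library. The example tweets, the tiny stop-word list, and the helper names `tokenize` and `ngrams` are all hypothetical and chosen for illustration; the actual preprocessing behind the results in this article may differ.

```python
from collections import Counter

# Tiny hypothetical stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "and", "the", "to", "of", "is", "we", "all", "please"}


def tokenize(text):
    """Lowercase, strip surrounding punctuation, keep alphabetic non-stop words."""
    words = (w.strip(".,!?:;\"'()#@") for w in text.lower().split())
    return [w for w in words if w.isalpha() and w not in STOP_WORDS]


def ngrams(tokens, n_min=1, n_max=3):
    """Return every run of n_min..n_max adjacent tokens as a space-joined string."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


# Made-up example tweets, not taken from the dataset.
tweets = [
    "Please practice social distancing to slow the spread!",
    "We all need to practice social distancing.",
    "The store is out of toilet paper",
]

counts = Counter(gram for tweet in tweets for gram in ngrams(tokenize(tweet)))
print(counts["social distancing"])           # 2
print(counts["practice social distancing"])  # 2
print(counts["toilet paper"])                # 1
```

Because stop words are dropped before the n-grams are formed, “slow the spread” becomes the bigram “slow spread”, which is why the phrase lists in this article read like compressed versions of the original wording.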

    pyLDAvis is a Python library for interactive topic model visualization. It is designed to help users interpret the topics in a topic model that has been fit to a corpus of text data.

    pyLDAvis

    On the top right is the overall term frequency for the “Top-30 Most Salient Terms.” Not surprisingly, “social distancing” is at the top of the list. Even at this early date, the previously unknown term “social distancing” had quickly become a central focus of slowing the spread; the generated word cloud pictured below (Topic 2) further illustrates the public awareness of this phrase.
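For readers wondering how a term ends up among the “most salient,” the sketch below computes one widely cited definition of saliency: the overall probability of a term multiplied by how unevenly that term is spread across topics. That pyLDAvis ranks terms this way is an assumption on our part, and the counts are invented for illustration.

```python
import math


def saliency(topic_term_counts):
    """Score each term; topic_term_counts[t][w] is the count of term w in topic t."""
    total = sum(sum(row) for row in topic_term_counts)
    p_topic = [sum(row) / total for row in topic_term_counts]
    scores = []
    for w in range(len(topic_term_counts[0])):
        term_total = sum(row[w] for row in topic_term_counts)
        if term_total == 0:
            scores.append(0.0)  # a term that never occurs carries no signal
            continue
        p_w = term_total / total
        # Distinctiveness: KL divergence between P(topic | term) and P(topic).
        distinctiveness = 0.0
        for t, row in enumerate(topic_term_counts):
            p_t_given_w = row[w] / term_total
            if p_t_given_w > 0:
                distinctiveness += p_t_given_w * math.log(p_t_given_w / p_topic[t])
        scores.append(p_w * distinctiveness)
    return scores


# Invented counts: two topics (rows) over three terms (columns).
terms = ["social distancing", "coronavirus", "public school"]
counts = [
    [40, 10, 0],   # topic 1
    [0, 10, 40],   # topic 2
]
scores = saliency(counts)
for term, score in zip(terms, scores):
    print(f"{term}: {score:.3f}")
```

In this toy example “coronavirus” appears equally in both topics, so it scores zero even though it is reasonably frequent, while the two topic-specific phrases score highest.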

    On the top left is the “Intertopic Distance Map,” which projects the multidimensional data onto two dimensions. I generated 5 topics, all of which can be seen on the graph. Because these are mathematical models, the topics are labeled with the numbers 1 through 5 (logical topic names will be derived from the numbered topics and their related words), and their placement represents the distance between topics. The significance of a topic is represented by the area of its circle. As you can see, topics 3 and 5 have a large intersection (both relate to testing and the pending pandemic).
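The “distance between topics” can be made concrete with a small sketch. A common choice, and as far as we understand the default in pyLDAvis (treat that as an assumption), is the Jensen-Shannon divergence between the word distributions of two topics, which is then projected onto two dimensions. The topic distributions below are invented for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 add nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical, 1 bit for disjoint."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


# Invented word distributions over a four-term vocabulary.
testing = [0.5, 0.5, 0.0, 0.0]
pandemic = [0.4, 0.4, 0.1, 0.1]
schools = [0.0, 0.0, 0.5, 0.5]

print(js_divergence(testing, testing))   # identical topics
print(js_divergence(testing, pandemic))  # heavily overlapping topics
print(js_divergence(testing, schools))   # topics with no shared terms
```

Two topics that share most of their vocabulary get a small divergence and are drawn close together or overlapping, while topics with no terms in common sit as far apart as the measure allows.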

    The next step is to review some of the discovered topics.

    These visuals are a powerful companion to the LDA algorithm, as you can easily see how the possible topics are grouped, how the related phrases are ranked, and what the related word frequencies are. For the social distancing topic, the NMF algorithm captures the following top 10 phrases: “social distancing, practice social distancing, practice social, slow spread, urge social, urge social distancing, stop spread, message urge, illness share message, illness share”.
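To show what NMF does under the hood, here is a minimal pure-Python sketch of the classic multiplicative-update scheme. It factors a document-by-phrase count matrix into document-by-topic and topic-by-phrase matrices and then lists the highest-weighted phrases per topic. The matrix, the vocabulary, and the helper names are invented for illustration. Real work would normally rely on a library implementation, and nothing here is meant to reproduce the exact settings behind the results above.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error between v and the product w @ h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor v (docs x terms) into non-negative w (docs x k) and h (k x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update H elementwise: H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # Update W elementwise: W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


# Invented counts: four documents (rows) over four phrases (columns).
vocab = ["social distancing", "slow spread", "public school", "sign petition"]
v = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
]

w, h = nmf(v, k=2)
for t, row in enumerate(h):
    ranked = sorted(zip(row, vocab), reverse=True)
    print(f"Topic {t + 1}:", [term for _, term in ranked[:2]])
```

Each row of `h` plays the role of one topic: sorting its weights gives a ranked phrase list of the same kind as the top 10 phrases quoted above.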

    https://twitter.com/KevRincon/status/1239233785763332098

    From here, we’ll drill into one section of the country that has become the epicenter of the virus: New York City.

    Here we consider only the tweets that originated from the NYC area, using geospatial functions to separate them from the larger group. In mid-March, the cases were just beginning to accelerate, and people were petitioning the city government to close the schools to slow the spread. Under immense pressure, Mayor de Blasio closed the nation’s largest public school system several days later. We can also see that Topic 1 is separated from the other topics in the Intertopic Distance Map, making it more distinct.
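One simple way to separate tweets by area is a latitude/longitude bounding box, sketched below. The box coordinates are rough, hypothetical values for New York City, and the `Tweet` class and `in_bbox` helper are illustrative names. The geospatial functions actually used for this analysis are not specified and may well be more precise, for example a polygon test against the city boundary.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    lat: float  # latitude in decimal degrees
    lon: float  # longitude in decimal degrees


# Rough, hypothetical bounding box around New York City.
NYC_BBOX = {"min_lat": 40.48, "max_lat": 40.92, "min_lon": -74.26, "max_lon": -73.70}


def in_bbox(tweet, bbox):
    """Return True when the tweet's coordinates fall inside the bounding box."""
    return (bbox["min_lat"] <= tweet.lat <= bbox["max_lat"]
            and bbox["min_lon"] <= tweet.lon <= bbox["max_lon"])


# Made-up tweets with approximate city-center coordinates.
tweets = [
    Tweet("sign the petition to close public schools", 40.7128, -74.0060),  # New York
    Tweet("another case added today", 34.0522, -118.2437),                  # Los Angeles
]

nyc_tweets = [t for t in tweets if in_bbox(t, NYC_BBOX)]
print([t.text for t in nyc_tweets])
```

Swapping in a different box gives the same filter for another city, which is how a side-by-side comparison of two locations on the same date can be set up.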

    Here Topic 1, “Public Schools,” is the dominant topic, as we see phrases such as “slow spread,” “sign petition,” and “close public,” indicating rising public pressure to close the public school system to slow the spread of the virus. The NYC public school system was shut down later in the week. In this case, the NMF algorithm closely matched the LDA algorithm with the following top 10 phrases: “public school, close public, close public school, sign petition, slow spread, school slow, school slow spread, public school slow, spread sign petition, spread sign”.

    Coronavirus word cloud
    Word Cloud for the Public School

    As a point of comparison, we next look at what people were tweeting in Los Angeles on the same date. Again, we are using geospatial functions to consider only tweets from the LA area. At this point in time, Los Angeles had under 100 cases. In Topic 1, “case” was the dominant word, but the topic also included phrases such as “add case,” “case death,” and “add case death,” indicating people were aware of the escalating cases and the death count from the virus both here and abroad. On this day, Gov. Newsom announced restrictions enacted within the state, as published in the Los Angeles Times.

    Also, Topic 1, “Case,” is more distinct than the others, as seen in the Intertopic Distance Map.

    Natural Language Processing Twitter for Los Angeles

    For this topic, the NMF algorithm captured the following top 10 phrases: “case death, add case, add case death, trump add, trump add case, addition trump, addition trump add, play risk, death play risk, death play”. The context of the references to President Trump was that he had increased the number of cases by downplaying the risk of the virus.

    I hope you’ve seen value in the NLP work demonstrated here. I’m looking for people to collaborate with me, and perhaps sponsor this effort, to capture all the data to date on the virus and publish it to a website for all to view and study. Until then, connect with me here.

    While it is too early to say, I do believe we can gain predictive insights into the spread of the virus and into how we will resume our normal lives once the lockdown is lifted. In my next article, I’ll look at why New York City became the epicenter as compared to cities such as Los Angeles. While New York is much larger, with higher population density, Los Angeles has had a very different trajectory of cases. Why?
