Data Science Capstone

Text mining

Final Project - May 2016

Presented by Ola Lie

Explore

Blogs, news and Twitter

The content of these texts is explored in the Milestone Report

Algorithm

Using the R packages tm and RWeka

  1. Create a corpus and clean the data with tm
  2. Create bi-, tri- and tetragrams with RWeka
  3. In server.R (Shiny):
    • Reduce the user input to its last three words (then two, then one)
    • Match the last three words against the first three words of the tetragrams
    • If there is no match, match the last two words against the first two words of the trigrams
    • If there is still no match, match the last word against the first word of the bigrams
    • Calculate a percentage for each matching prediction
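The back-off search above can be sketched in Python. This is a minimal illustration under stated assumptions, not the app's R code: a regular-expression tokenizer stands in for the tm cleaning, plain `Counter` tables stand in for the RWeka n-grams, and the percentage for each candidate is assumed to be its count's share of all matches at the first n-gram level that matches. The helper names `clean`, `build_ngrams` and `predict` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical helpers: a sketch of the back-off idea, not the app's R code.

def clean(text):
    # Lowercase and keep runs of letters/apostrophes (a rough stand-in for tm cleaning).
    return re.findall(r"[a-z']+", text.lower())

def build_ngrams(tokens, n):
    # Count every run of n consecutive words; keys are word tuples.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def predict(user_input, tables, top=3):
    # tables maps n (2, 3, 4) to a Counter of n-grams.
    words = clean(user_input)
    for n in (4, 3, 2):  # tetragrams first, then back off to trigrams and bigrams
        context = tuple(words[-(n - 1):])  # last three, two or one words
        if len(context) < n - 1:
            continue  # input too short for this n-gram order
        # Candidates are the last words of n-grams whose first n-1 words match.
        matches = Counter({g[-1]: c for g, c in tables[n].items() if g[:-1] == context})
        if matches:
            total = sum(matches.values())
            # Assumption: percentage = candidate's count as a share of all matches.
            return [(w, round(100 * c / total, 1)) for w, c in matches.most_common(top)]
    return []  # no match at any level

corpus = "I want to go home. I want to go out. I want to eat."
tokens = clean(corpus)
tables = {n: build_ngrams(tokens, n) for n in (2, 3, 4)}
print(predict("I want to", tables))  # [('go', 66.7), ('eat', 33.3)]
```

Scanning every n-gram on each query keeps the sketch short; indexing each table by its first n-1 words (for example a dict from prefix to `Counter`) would turn each lookup into a single dictionary access, which helps keep response times low.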

Performance

Response time is under five seconds

The first search may take a little longer while the app wakes up from idle

Visit the Web App