Text mining

Data Science Capstone

Final Project - May 2016

Presented by Ola Lie

Explore

Blogs, news and Twitter

The content of these texts is explored in the Milestone Report

Algorithm

Using tm and RWeka

Create corpus and clean data with tm
Create bi-, tri- and tetragrams with RWeka
In server.R (shiny)
- Trim user input to its last three words (two or one when backing off)
- Search the first three words of the tetragrams
- If there are no matches, search the first two words of the trigrams
- If there are no matches, search the first word of the bigrams
- Calculate the percentage share of each match
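
The lookup steps above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the app's actual server.R: the three n-gram tables are tiny made-up examples (in the app they would come from RWeka), the function name predict_next is hypothetical, and the percentage is assumed to be each match's share of the matching counts.

```r
# Hypothetical toy n-gram tables: names are n-grams, values are counts.
# In the real app these would be built from the corpus, not typed by hand.
tetragrams <- c("thanks for the follow" = 6,
                "thanks for the support" = 3,
                "thanks for the help" = 1)
trigrams   <- c("for the first" = 3, "for the record" = 1)
bigrams    <- c("the same" = 3, "the first" = 1)

# Back off from tetragrams to trigrams to bigrams and return the
# candidate next words with their percentage share of the matches.
predict_next <- function(text, tables = list(tetragrams, trigrams, bigrams)) {
  words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]
  for (tbl in tables) {
    # An n-gram table is searched with the last n - 1 words of the input.
    n_prefix <- length(strsplit(names(tbl)[1], " ")[[1]]) - 1
    if (length(words) < n_prefix) next
    prefix <- paste(tail(words, n_prefix), collapse = " ")
    hits <- tbl[startsWith(names(tbl), paste0(prefix, " "))]
    if (length(hits) > 0) {
      pct <- round(100 * hits / sum(hits), 1)
      names(pct) <- sub(".* ", "", names(hits))  # keep only the last word
      return(sort(pct, decreasing = TRUE))
    }
  }
  numeric(0)  # no match at any level
}

predict_next("Thanks for the")  # follow 60, support 30, help 10
predict_next("ready for the")   # falls back to trigrams: first 75, record 25
```

Stopping at the first level that produces a match keeps the search short, at the cost of ignoring evidence from the shorter n-grams.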

Performance

Response time of less than five seconds

The first search might take a bit longer while the app is waking up

Visit the Web App