Data Gathering

Data Description Data View Data Source
1. Use the Twitter API in R
This text data captures people's attitudes toward reading as expressed on Twitter. It is used to explore how interest in reading changes over time.

(Click to see the output file.)
Click here to get R Code.
2. Use an API for BBC News in Python
This text data captures attitudes toward reading as reported by BBC News. Like the Twitter data, it is used to explore how interest in reading changes over time.

(Click to see the output file.)
Click here to get Python Code.
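As a rough illustration of this step, the sketch below builds a request URL and flattens the returned JSON into rows of text. It assumes the articles come from the NewsAPI `v2/everything` endpoint with `sources=bbc-news`. The page does not name the service that was used, so the endpoint, the query term `reading`, and the helper names `build_url`, `articles_to_rows`, and `fetch_rows` are all assumptions and may differ from the linked code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the NewsAPI "everything" search. The original page does
# not say which service was used, so treat this as a placeholder.
BASE_URL = "https://newsapi.org/v2/everything"


def build_url(api_key, query="reading", source="bbc-news", page_size=100):
    """Return the request URL for articles matching `query` from one source."""
    params = {
        "q": query,
        "sources": source,
        "language": "en",
        "pageSize": page_size,
        "apiKey": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


def articles_to_rows(payload):
    """Flatten the JSON payload into (date, title, description) tuples."""
    rows = []
    for article in payload.get("articles", []):
        rows.append((
            (article.get("publishedAt") or "")[:10],  # keep YYYY-MM-DD only
            article.get("title") or "",
            article.get("description") or "",
        ))
    return rows


def fetch_rows(api_key):
    """Download one page of results; needs network access and a valid key."""
    with urlopen(build_url(api_key)) as response:
        return articles_to_rows(json.load(response))
```

Keeping the publication date next to each text row is what makes a later trend-over-time analysis possible.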
3.1 Get data from Kaggle
This data covers the format, category, price, and rating of different books. Book format and book category are nominal data, price is ratio data, and rating is ordinal data. The question this data will answer is: is there any relationship between a book's price, its rating, and people's preference for different book formats and categories?

https://www.kaggle.com/lukaanicin/book-covers-dataset
3.2 Get data from Kaggle
This data covers book rating, number of reviews, and price. Price and number of reviews are ratio data, and rating is ordinal data. All three are treated as quantitative data. To simplify the analysis, the rating is divided into three classes: high, middle, and low. The question this data will answer is: is there any relationship between a book's price, its rating, and its number of reviews?

https://www.kaggle.com/mandan/amazon-vs-flipkart-book-prices
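The three-class split of the rating described for this dataset can be sketched as follows. The cut points `low_cut=3.0` and `high_cut=4.0` and the function name `rating_class` are hypothetical; the page does not state which thresholds were used.

```python
def rating_class(rating, low_cut=3.0, high_cut=4.0):
    """Map a numeric rating to 'low', 'middle', or 'high'.

    The cut points are hypothetical defaults, not the ones used for this data.
    """
    if rating < low_cut:
        return "low"
    if rating < high_cut:
        return "middle"
    return "high"


# Illustrative values only, not rows from the Kaggle file.
ratings = [2.5, 3.0, 3.9, 4.0, 4.7]
classes = [rating_class(r) for r in ratings]
print(list(zip(ratings, classes)))
```

Each boundary value falls into the upper class here (a rating equal to `high_cut` counts as high). That is a design choice, and it should be made to match whatever rule the actual analysis uses.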
3.3 Get data from Kaggle
This data covers book rating, number of reviews, price, year of publication, and genre. Genre is nominal data, price and number of reviews are ratio data, rating is ordinal data, and year of publication is interval data. The question this data will answer is: is there any relationship between a book's price, rating, number of reviews, genre, and year of publication?

https://www.kaggle.com/sootersaalu/amazon-top-50-bestselling-books-2009-2019
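One way to start on the relationship question for this dataset is a rank correlation, which suits the ordinal rating. The sketch below computes Spearman's coefficient with the standard library only. The choice of Spearman, the helper names, and the sample values are assumptions for illustration; the page does not say which method is used.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if a column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up rows for illustration, not taken from the Kaggle file.
prices = [8, 11, 15, 22, 30]
ratings = [4.7, 4.6, 4.8, 4.3, 4.4]
print(round(spearman(prices, ratings), 3))
```

A coefficient near 0 suggests no monotonic relationship, while values near 1 or -1 suggest a strong increasing or decreasing one. Genre is nominal, so it would need a different treatment, such as comparing groups.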
4. Data on Statista
This data is used to find out whether factors such as income, age, and education level influence people's preference for different book formats and categories, and whether they affect how many hours people read and how much they spend on reading. It is also used to explore how preferences for different book formats change over time and to predict whether e-books will fully replace paper books.

1) Reading habits in the US. https://www.statista.com/topics/3928/reading-habits-in-the-us
2) Book formats in the US. https://www.statista.com/topics/3938/book-formats-in-the-us