Fake NEWS detection using Data Analytics

Is all the news that spreads across the internet true and realistic? Not at all. Fake news has become a serious issue in the digital world. It spreads like wildfire, quickly and without limits, affecting the lives of millions of people. So how can we deal with fake news? It is not as easy as turning to a simple fact-checker: such news is written deliberately, and checking it is typically done story by story. This is where Python can help.

Project Description

Before going deep into the fake news detection project, let’s get familiar with some terms related to this project.

To get statistics about the news, we need to count how often each word appears in a document. One issue with raw word counts is that words like ‘the’ appear many times in almost every document, so their counts carry little meaning in the encoded vector.

One solution is to weight the word frequencies instead of using raw counts. The method used for this is TF-IDF, which stands for “Term Frequency – Inverse Document Frequency”.

  • Term Frequency: It indicates how many times a word appears in a document. A higher value means the word occurs more often in that document.
  • Inverse Document Frequency: IDF measures how significant a term is across the whole collection of documents. Words that occur many times in one document but also appear in many other documents receive a lower weight, while words that are rare across the collection receive a higher one.

In short, TF-IDF is a weighted word-frequency measure that tries to highlight the interesting words. The TF-IDF Vectorizer tokenizes the documents, learns the vocabulary and the inverse document frequency weights, and converts the raw text into a TF-IDF matrix. The same fitted vectorizer can then encode new documents.
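To make the weighting concrete, here is a minimal sketch that computes TF-IDF by hand using only the Python standard library. The three-sentence corpus and the helper name tf_idf are made up for illustration, and the formula is the plain textbook one (relative term frequency times the log of inverse document frequency). scikit-learn's TfidfVectorizer uses a smoothed, normalized variant by default, so its numbers will differ.

```python
import math
from collections import Counter

# Toy corpus for illustration only (not taken from news.csv).
docs = [
    "the president signed the bill",
    "the senator denied the report",
    "aliens signed the secret bill",
]

tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents does each term occur?
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))


def tf_idf(tokens):
    """Return {term: tf * idf} for one tokenized document (hypothetical helper)."""
    counts = Counter(tokens)
    scores = {}
    for term, count in counts.items():
        tf = count / len(tokens)                 # relative term frequency
        idf = math.log(n_docs / doc_freq[term])  # 0 when the term is in every document
        scores[term] = tf * idf
    return scores


scores = tf_idf(tokenized[0])
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {score:.3f}")
```

In this toy corpus ‘the’ occurs in every document, so its weight is zero even though it is the most frequent word in the first sentence, while ‘president’, which occurs in only one document, gets the highest weight.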

Modules used for this Project

  • numpy: NumPy is a package whose name stands for ‘Numerical Python’. It is a library for scientific computing. It provides linear algebra routines, random number generation, Fourier transforms and tools for working with multidimensional arrays. NumPy can also be used as a multidimensional container for generic data. Its core is a high-performance multidimensional array object.
  • Pandas: pandas is an open-source library built on top of NumPy. This means that to run pandas you need to have NumPy installed on your machine. It is used to perform data manipulation in Python. pandas provides an easy and efficient way to slice, merge, concatenate and reshape data.
  • Sklearn: scikit-learn (imported as sklearn) is an open-source Python library that includes a wide range of machine learning, preprocessing, cross-validation and visualization tools.

Project Implementation

The dataset used for this project is news.csv. The dataset has a shape of 7796×4 (rows × columns).

The dataset has four columns: the first identifies the news item, the second and third are the title and the text, and the fourth is the label, which is either FAKE or REAL.
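As a rough illustration of this layout, the sketch below loads a two-row stand-in for news.csv with pandas. The column names id, title, text and label are assumptions (the real header is not shown here), and the rows are invented. For the real file you would pass the path to pd.read_csv instead of the in-memory buffer.

```python
import io

import pandas as pd

# Two-row stand-in for news.csv; column names and rows are assumed for illustration.
sample_csv = io.StringIO(
    "id,title,text,label\n"
    "1,Budget passes,The senate passed the budget on Tuesday.,REAL\n"
    "2,Moon is cheese,Scientists confirm the moon is made of cheese.,FAKE\n"
)

df = pd.read_csv(sample_csv)  # for the real file: pd.read_csv("news.csv")

print(df.shape)                    # (rows, columns)
print(df["label"].value_counts())  # how many FAKE and REAL items
```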

Follow the steps below to complete the project:

  • Make the necessary imports.
  • Read the data into a DataFrame and get the shape of the data.
  • Get the labels from the DataFrame.
  • Split the dataset into training and testing sets.
  • Initialize the TfidfVectorizer with English stop words and a maximum document frequency of 0.7, then fit and transform the training set and transform the test set.
  • Initialize the PassiveAggressiveClassifier, fit it on the vectorized training set and predict on the test set.
  • Finally, print the confusion matrix to see the number of true and false positives and negatives.
  • With these steps, the project reaches an accuracy of 92.82%.
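The steps above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code. The function name run_pipeline, the column names, the 80/20 split, max_iter=50 and the random seed are all assumptions, and the ten-row toy DataFrame only exists so the sketch runs without news.csv. The accuracy it prints says nothing about the 92.82% figure.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split


def run_pipeline(df, text_col="text", label_col="label", seed=7):
    """Train and evaluate a TF-IDF + passive-aggressive model (hypothetical helper)."""
    labels = df[label_col]

    # Hold out 20% of the rows for testing (split ratio is an assumption).
    x_train, x_test, y_train, y_test = train_test_split(
        df[text_col], labels, test_size=0.2, random_state=seed, stratify=labels
    )

    # Drop English stop words and terms found in more than 70% of documents.
    vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
    tfidf_train = vectorizer.fit_transform(x_train)  # learn vocabulary on train only
    tfidf_test = vectorizer.transform(x_test)        # reuse it for the test set

    clf = PassiveAggressiveClassifier(max_iter=50, random_state=seed)
    clf.fit(tfidf_train, y_train)
    y_pred = clf.predict(tfidf_test)

    acc = accuracy_score(y_test, y_pred)
    # Rows are true labels, columns are predictions, in the order FAKE, REAL.
    cm = confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"])
    return acc, cm


# Invented toy data so the sketch is self-contained; use pd.read_csv("news.csv") instead.
toy = pd.DataFrame({
    "text": [
        "Senate passes annual budget after long debate",
        "Central bank holds interest rates steady this quarter",
        "City council approves public transport plan",
        "Health ministry publishes vaccination statistics",
        "Court upholds ruling in trade dispute",
        "Miracle pill melts fat overnight doctors shocked",
        "Secret aliens control world governments insider reveals",
        "Celebrity clone spotted shocking truth exposed",
        "Shocking cure hidden by scientists exposed",
        "Moon landing staged insider reveals shocking secret",
    ],
    "label": ["REAL"] * 5 + ["FAKE"] * 5,
})

acc, cm = run_pipeline(toy)
print(f"Accuracy: {acc:.2%}")
print(cm)
```

Fitting the vectorizer on the training set only, and merely transforming the test set, keeps information from the held-out articles out of the learned vocabulary and IDF weights.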

Software requirements: PyCharm Community Edition.

Programming languages and modules: Python 3, NumPy, pandas, scikit-learn (sklearn).


Skyfi Labs Last Updated: 2022-04-16




