What to take home from the UCSD Big Data Specialization?

A look at the main takeaways from the UCSD Big Data Specialization.


Ever wondered what you would take away from a course before embarking on that journey? So did I.

This post is a brief walkthrough of the core content of the UCSD Big Data Specialization offered through Coursera. Is it worth the time and effort? Well, there is no single right answer; it all depends on you. I will lay out what you can expect to take home before you actually commit, so you can decide for yourself.

Access and enrollment details for this course can be found on Coursera. Read more about the course and its specifics on the Big Data Specialization page.

WHAT IS THIS COURSE ABOUT?

The specialization is offered by the University of California, San Diego as an online, self-paced course module. This blog gives a brief tour of the contents of the course and a comprehensive guide to the capstone project, including how I approached it to derive a feasible solution.

In a nutshell, the course guides us through the basics of using Spark while providing a conceptual understanding of the Hadoop ecosystem as a whole, along with MapReduce. The specialization prepares us to ask the right questions about data, communicate effectively with data, and do basic exploration of large, complex data sets using platforms such as Splunk, KNIME, Spark and neo4j.
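To make the MapReduce idea concrete: in Spark the canonical word count is expressed as `rdd.flatMap(...).map(lambda w: (w, 1)).reduceByKey(add)`. The sketch below reproduces the same map, shuffle and reduce shape in plain, dependency-free Python (the input lines are made up for illustration):

```python
from collections import defaultdict

lines = ["big data big ideas", "data tools for big data"]

# Map step: emit a (word, 1) pair for every word in every line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle step: group the emitted values by key (the word).
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce step: sum the values in each group.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)  # {'big': 3, 'data': 3, 'ideas': 1, 'tools': 1, 'for': 1}
```

The point of the framework is that the map and reduce steps are embarrassingly parallel, so Hadoop or Spark can scatter them across a cluster; only the shuffle requires moving data between machines.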

The outline of the specialization can be broken down as follows:

  1. Course 1 - Introduction to Big Data
  2. Course 2 - Big Data Modeling and Management Systems
  3. Course 3 - Big Data Integration and Processing
  4. Course 4 - Machine Learning With Big Data
  5. Course 5 - Graph Analytics for Big Data
  6. Course 6 - Big Data Capstone Project

Without further ado let’s dive right into the details:

Each component listed above is summarized below to give a rough idea of what I learnt at each step.

1. COURSE 1 - INTRODUCTION TO BIG DATA

This section provides the knowledge to become conversant with big data terminology and the core concepts behind big data problems, applications, and systems. It gets us thinking about how big data might be useful in our daily business or career, while introducing Hadoop, one of the most common frameworks that has made big data analysis easier and more accessible.

The bottom line is that you come away fluent in big data terminology and familiar with the Hadoop framework.

2. COURSE 2 - BIG DATA MODELING AND MANAGEMENT SYSTEMS

In this section we learn which tools are best suited to manage and analyse the types of data identified earlier.

You will get exposure to a range of big data models and the management systems built around them.

3. COURSE 3 - BIG DATA INTEGRATION AND PROCESSING

In simple terms, you will find yourself retrieving, integrating and processing big data sets.

4. COURSE 4 - MACHINE LEARNING WITH BIG DATA

This course provides an overview of machine learning techniques to explore, analyze, and leverage data.

The key takeaway from this endeavour is a working overview of machine learning techniques for exploring and analysing data at scale.

5. COURSE 5 - GRAPH ANALYTICS FOR BIG DATA

This course gives a basic understanding of data network structures and how they change under different conditions.

What will you learn from this course? How to model, query and analyse data as graphs, using tools such as neo4j.

6. COURSE 6 - BIG DATA CAPSTONE PROJECT


After learning all this about big data, it would be a waste not to put it to use. It's time to show yourself and the mentors that you can apply the acquired knowledge in a practical scenario. This is where the capstone project comes in. You will be tasked with handling an entire pipeline for a hypothetical industry-style game known as Catch the Pink Flamingo.

The points above are what I consider the highlights of each course module. The key highlights may vary from person to person, but I have tried to summarize the contents of each course according to the practical weight it carries in terms of big data applications.

A detailed explanation of how I approached the capstone project will follow in another blog post. In this post I have mainly outlined how Spark is used to analyse one component of the big data problem, namely how pySpark was used for a clustering task.
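In pySpark that clustering task is typically done with `pyspark.ml.clustering.KMeans`. To show the algorithm behind that call, here is a minimal, dependency-free Python sketch of k-means on made-up 2-D points (the data and k=2 are purely illustrative, not the capstone's actual data set):

```python
# Minimal k-means: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.
def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # Assignment step: attach each point to the closest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: recompute each center as its cluster's mean.
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers

# Two well-separated blobs, so two clusters emerge immediately.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final = kmeans(points, centers=[(0, 0), (10, 10)])
print(final)  # roughly [(0.33, 0.33), (10.33, 10.33)]
```

Spark's version performs the same assignment and update steps, but distributes the assignment step across the cluster's workers, which is what makes it viable for big data volumes.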

SUMMARY OF THE SESSION

The Big Data Specialization provides a range of ideas on how to identify the V's of big data and how to interpret basic insights through them.

The course also introduces the data tools that can be used to mine big data, among them KNIME, Splunk and pySpark.

Users are taken through a guided case study to better understand how to apply these tools in a big data scenario.

Finally, you utilize the knowledge gained throughout in an industry-grade capstone project and have it reviewed by your peers.