What to take home from the UCSD Big Data Specialization?

A look at the main takeaways from the UCSD Big Data Specialization.


Ever wondered what you would take away from a course before embarking on that journey? So did I.

This post is a brief walkthrough of the core content of the UCSD Big Data Specialization offered through Coursera. Is it worth the time and effort? Well, there is no single right answer; it all depends on you. I will lay out what you can expect to take home before you actually commit, so you can decide for yourself.

Access and enrollment details for this course can be found on Coursera. Read more about the course and its specifics on the Big Data Specialization page.

WHAT IS THIS COURSE ABOUT?

The specialization is offered by the University of California, San Diego as an online, self-paced course module. This blog gives a brief tour of the contents of the course and a comprehensive guide to the capstone project, including how I approached it to derive a feasible solution.

In a nutshell, the course guides us through the basics of using Spark while providing a conceptual understanding of the Hadoop ecosystem as a whole, along with MapReduce. The specialization prepares us to ask the right questions about data, communicate effectively with data, and do basic exploration of large, complex data sets using platforms such as Splunk, KNIME, Spark and neo4j.
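To make the MapReduce idea concrete: in Spark the canonical word count is expressed as `rdd.flatMap(...).map(lambda w: (w, 1)).reduceByKey(add)`. The sketch below reproduces the same map, shuffle and reduce shape in plain, dependency-free Python (the input lines are made up for illustration):

```python
from collections import defaultdict

lines = ["big data big ideas", "data tools for big data"]

# Map step: emit a (word, 1) pair for every word in every line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle step: group the emitted values by key (the word).
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce step: sum the values in each group.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)  # {'big': 3, 'data': 3, 'ideas': 1, 'tools': 1, 'for': 1}
```

The point of the framework is that the map and reduce steps are embarrassingly parallel, so Hadoop or Spark can scatter them across a cluster; only the shuffle requires moving data between machines.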

The outline of the specialization can be broken down as follows:

  1. Course 1 - Introduction to Big Data
  2. Course 2 - Big Data Modeling and Management Systems
  3. Course 3 - Big Data Integration and Processing
  4. Course 4 - Machine Learning With Big Data
  5. Course 5 - Graph Analytics for Big Data
  6. Course 6 - Big Data Capstone Project

Without further ado let’s dive right into the details:

Each component listed above is summarized below to give a rough idea of what I learnt at each step.

1. COURSE 1 - INTRODUCTION TO BIG DATA

This section provides the knowledge to become conversant with big data terminology and the core concepts behind big data problems, applications, and systems. It gets us thinking about how big data might be useful in our daily business or career, while introducing Hadoop, one of the most common frameworks that has made big data analysis easier and more accessible.

The bottom line is that you come away fluent in big data terminology and familiar with the Hadoop framework.

2. COURSE 2 - BIG DATA MODELING AND MANAGEMENT SYSTEMS

In this section we learn which tools are best suited to manage and analyse the types of data identified earlier.

You will get exposure to a range of big data models and the management systems built around them.

3. COURSE 3 - BIG DATA INTEGRATION AND PROCESSING

In simple terms, you will find yourself retrieving, integrating and processing big data sets.

4. COURSE 4 - MACHINE LEARNING WITH BIG DATA

This course provides an overview of machine learning techniques to explore, analyze, and leverage data.

The key takeaway from this endeavour is a working overview of machine learning techniques for exploring and analysing data at scale.

5. COURSE 5 - GRAPH ANALYTICS FOR BIG DATA

This course gives a basic understanding of data network structures and how they change under different conditions.

What will you learn from this course? How to model, query and analyse data as graphs, using tools such as neo4j.

6. COURSE 6 - BIG DATA CAPSTONE PROJECT


After learning all this about big data, it would be a waste not to put it to use. It's time to show yourself and the mentors that you can apply the acquired knowledge in a practical scenario. This is where the capstone project comes in. You will be tasked with handling an entire pipeline for a hypothetical industry-style game known as Catch the Pink Flamingo.

The points above are what I consider the highlights of each course module. The key highlights may vary from person to person, but I have tried to summarize the contents of each course according to the practical weight it carries in terms of big data applications.

A detailed explanation of how I approached the capstone project will follow in another blog post. In this post I have mainly outlined how Spark is used to analyse one component of the big data problem, namely how pySpark was used for a clustering task.
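In pySpark that clustering task is typically done with `pyspark.ml.clustering.KMeans`. To show the algorithm behind that call, here is a minimal, dependency-free Python sketch of k-means on made-up 2-D points (the data and k=2 are purely illustrative, not the capstone's actual data set):

```python
# Minimal k-means: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.
def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # Assignment step: attach each point to the closest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: recompute each center as its cluster's mean.
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers

# Two well-separated blobs, so two clusters emerge immediately.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final = kmeans(points, centers=[(0, 0), (10, 10)])
print(final)  # roughly [(0.33, 0.33), (10.33, 10.33)]
```

Spark's version performs the same assignment and update steps, but distributes the assignment step across the cluster's workers, which is what makes it viable for big data volumes.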

SUMMARY OF THE SESSION

The Big Data Specialization provides a range of ideas on how to identify the V's of big data and how to interpret basic insights through them.

The course also introduces the data tools that can be used to mine big data, among them KNIME, Splunk and pySpark.

Users are taken through a guided case study to better understand how to apply these tools in a big data scenario.

Finally, you utilize the knowledge gained throughout in an industry-grade capstone project and have it reviewed by your peers.