What about the Big Data Specialization by UCSD?

An analytical overview of the UCSD Big Data Specialization


This blog post reviews the UCSD Big Data Specialization: its main objectives, how well the course succeeds in achieving them, and how effectively its content is delivered to the audience of an online course specializing in Big Data.

The course is hosted on Coursera; access and further details can be found on its Big Data Specialization page.

Outline of the review

The contents of this blog post are outlined below:

  1. Objective(s) of the course
  2. Success of the course in achieving its objectives
  3. Student workload aligned with course outline
  4. Perceived strengths and weaknesses of the course
  5. Recommended improvements to be made to the course

Without further ado, let’s dive right into the details:

1. OBJECTIVES OF THE COURSE

The course mainly aims to lay a foundation in the fundamentals of Big Data before students dive into the plethora of information available in this domain. It starts off with real-life examples and the context of the V’s of Big Data (volume, velocity, variety, and so on), which demarcate what makes “Big” data distinct from data in general.

As students, it is only natural that we are interested in identifying the tools and techniques available to tackle this so-called big data, and how they differ from those used in conventional data analysis. The course does a significant job of outlining the variety of tools available in the market for handling big data, in a way that is comprehensible to a newcomer to the world of Big Data. A minor drawback is that it fails to mention Scala, a leading language in the big data ecosystem (and the one Spark itself is written in).

Not only does the course provide an introduction to these tools, it also provides practical exercises that give hands-on experience in using them.

2. SUCCESS OF THE COURSE IN ACHIEVING THE ABOVE OBJECTIVES

The course provides a solid understanding of what big data essentially means, especially for a newcomer to enormously large data sets and the question of how to handle them. Alongside this, it imparts solid knowledge of how big data is stored and later used in analytics to derive insights.

For instance, the course explains data lakes and how the Hadoop ecosystem is used to extract, transform and load big data.
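
The processing model underlying the Hadoop ecosystem is MapReduce. As a rough illustration only (this is not course material, and the input lines are invented), the canonical word-count job can be sketched in plain Python, with the map and reduce stages marked:

```python
from collections import Counter
from itertools import chain

# Toy "extract" stage: lines as they might arrive from a data lake.
lines = ["big data is big", "data is everywhere"]

# Map stage: emit a (word, 1) pair for every word in every line.
mapped = chain.from_iterable(((w, 1) for w in line.split()) for line in lines)

# Shuffle + reduce stage: sum the counts for each distinct word.
counts = Counter()
for word, one in mapped:
    counts[word] += one
```

On a real cluster the map and reduce tasks run in parallel across many machines, with Hadoop handling the shuffle between them; the logic per record, however, is exactly this simple.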

The practical exercises cover tools and techniques that give a significant sense of the scope of Big Data, but given the current state of big data tooling, some of the systems introduced in the course are a bit outdated.

For instance, while Splunk is useful for giving users an overview of basic exploratory data analysis over large volumes of data, its scalability can become a bottleneck in industrial use.

However, the introductory content on Spark can be a valuable knowledge area for a newcomer to the deep end of Big Data. The Spark module gives a brief overview of the packages and libraries generally used for Big Data problems and how their syntax works, so a student such as myself can build on this foundational knowledge going forward. Moreover, Spark (PySpark, to be precise) is among the more frequently used industrial tools for handling big data, which makes it useful to course takers from a long-term standpoint.

3. STUDENT WORKLOAD ALIGNED WITH COURSE OUTLINE

The course provides ample study material, both video tutorials and summarized slide decks, for reading and for memorizing the important takeaway points of each study session for later use.

In addition to these materials, the course also provides a significant number of quizzes and practical assignments to test the knowledge those study sessions impart.

In terms of the workload, the course is self-paced, and the information provided is concise with regard to its objective of introducing the key components of Big Data.

The quizzes in the graph analysis module (using the Neo4j platform) were particularly challenging, leaving room for students to construct their own case and then develop answers based on that hypothetical case.

The Spark practical assignment was a clustering analysis in which users of an online gaming platform had to be clustered based on their usage data. This provided a brief insight into Spark, its syntax, and its use cases.
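
The assignment itself uses Spark, but the clustering idea at its core is k-means. To show just that idea, here is a minimal one-dimensional k-means pass in plain Python; the "minutes played" values and starting centers are invented for illustration:

```python
def kmeans_1d(points, centers, iters=10):
    """Naive 1-D k-means: alternate assignment and mean-update steps."""
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (keep the old center if the cluster is empty).
        centers = [sum(ps) / len(ps) if ps else c
                   for c, ps in clusters.items()]
    return sorted(centers)

usage = [5, 7, 6, 90, 95, 100]        # invented minutes-played values
centers = kmeans_1d(usage, [0.0, 50.0])
# centers -> [6.0, 95.0]: a casual group and a heavy-usage group
```

Spark's MLlib performs the same two alternating steps, only distributed over partitions of a much larger data set.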

KNIME also made for an interesting practical assignment: it offers a drag-and-drop interface for building analysis pipelines, rather than a coding script, for tasks such as the classification exercise used in this course.

Thus, in general, the workload justified the time and effort across all the courses: together they provide an overall understanding of platforms such as Spark, KNIME, Neo4j and Splunk, and of how they can be used with large data sets.

4. PERCEIVED STRENGTHS AND WEAKNESSES OF THE COURSE

The course was significantly strong in disseminating a large volume of knowledge in a concise and simple manner. This is especially true of getting the content across to users new to the field of Big Data. Below I list the strengths and weaknesses of the course from my point of view as a user who has completed it.

4.1. Strengths.

The course module is well-rounded as an introductory door into the universe of Big Data. It opens your mind to the infrastructure and algorithms used, and it at least scratches the surface of the Hadoop framework, so you end up with more material to research and a grasp of the basic concepts of the Big Data universe.

4.2. Weaknesses.

Speaking as a data analyst with only a small-scale understanding of big data environments, I found the course useful, but it does not go very deep into technical details.