This course is intended for aspiring Data Scientists, data analysts that have completed the associate level Data Science and Big Data Analytics course, and computer scientists wanting to learn MapReduce and methods for analyzing unstructured data such as text.
• Completion of the Data Science and Big Data Analytics course
• Proficiency in at least one programming language such as R or Python
Upon successful completion of this course, participants should be able to:
• Develop and execute MapReduce functionality
• Gain familiarity with NoSQL databases and Hadoop Ecosystem tools for
• analyzing large-scale, unstructured data sets
• Develop a working knowledge of Natural Language Processing, Social
• Network Analysis, and Data Visualization concepts
• Use advanced quantitative methods, and apply one of them in a Hadoop • environment
• Apply advanced techniques to real-world datasets in a final lab
Module 1: MapReduce and Hadoop
• Lesson 1: The MapReduce Framework
• Lesson 2: Apache Hadoop
• Lesson 3: Hadoop Distributed File System
• Lesson 4: YARN
Module 2: Hadoop Ecosystem and NoSQL
• Lesson 1: Hadoop Ecosystem
• Lesson 2: Pig
• Lesson 3: Hive
• Lesson 4: NoSQL – Not Only SQL
• Lesson 5: HBase
• Lesson 6: Spark
Module 3: Natural Language Processing
• Lesson 1: Introduction to NLP
• Lesson 2: Text Preprocessing
• Lesson 3: TFIDF
• Lesson 4: Beyond Bag of Words
• Lesson 5: Language Modeling
• Lesson 6: POS Tagging and HMM
• Lesson 7: Sentiment Analysis and Topic Modeling
Module 4: Social Network Analysis
• Lesson 1: Introduction to SNA and Graph Theory
• Lesson 2: Most Important Nodes
• Lesson 3: Communities and Small World
• Lesson 4: Network Problems and SNA Tools
Module 5: Data Science Theory and Methods
• Lesson 1: Simulation
• Lesson 2: Random Forests
• Lesson 3: Multinomial Logistic Regression
Module 6: Data Visualization
• Lesson 1: Perception and Visualization
• Lesson 2: Visualization of Multivariate Data
Click on the following link to see the current Course Schedule
Our minimum class-size is 3 for this course.
If there are no scheduled dates for this course, it can be customized to suit the time and skill needs of clients and it can be held online, at a rented location or at your premises.
Click on the following link below to arrange for a custom course: Enquire about a course date
Data is created constantly, and at an ever-increasing rate. Mobile phones, social media, imaging technologies to determine a medical diagnosis-all these and more create new data, and that must be stored somewhere for some purpose. Devices and sensors automatically generate diagnostic information that needs to be stored and processed in real time. Merely keeping up with this huge influx of data is difficult, but substantially more challenging is analyzing vast amounts of it, especially when it does not conform to traditional notions of data structure, to identify meaningful patterns and extract useful information. These challenges of the data deluge present the opportunity to transform business, government, science, and everyday life.
“Big Data” is data whose scale, diversity and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from.
Although the volume of Big Data tends to attract the most attention, generally the variety and velocity of the data provide a more apt definition of Big Data. (Big Data is sometimes described as having 4 Vs: volume, variety, veracity and velocity.)
Due to its size or structure, Big Data cannot be efficiently analyzed using only traditional databases or methods. Big Data problems require new tools and technologies to store, manage, and realize business benefits.
The players in this field are data analysts, data engineers and data scientists :
An effective data analyst will take the guesswork out of business decisions and help the entire organization thrive. The data analyst must be an effective bridge between different teams by analyzing new data, combining different reports, and translating the outcomes. In turn, this is what allows the organization to maintain an accurate pulse check on its growth.
The data scientist will uncover hidden insights by leveraging both supervised (e.g. classification, regression) and unsupervised learning (e.g. clustering, neural networks, anomaly detection) methods toward their machine learning models. They are essentially training mathematical models that will allow them to better identify patterns and derive accurate predictions.
Data engineers establish the foundation that the data analysts and scientists build upon. Data engineers are responsible for constructing data pipelines and often have to use complex tools and techniques to handle data at scale. Unlike the previous two career paths, data engineering leans a lot more toward a software development skill set. At larger organizations, data engineers can have different focuses such as leveraging data tools, maintaining databases, and creating and managing data pipelines. Whatever the focus may be, a good data engineer allows a data scientist or analyst to focus on solving analytical problems, rather than having to move data from source to source. The data engineer’s mindset is often more focused on building and optimization
CANCELLATION POLICY – There is never a fee for cancelling seven business days before a class for any reason. Data Vision Systems reserves the right to cancel any course due to insufficient registration or other extenuating circumstances. Participants will be advised prior to doing so.