Tools
These tools can help you work with data. Here you will find a brief synopsis of all the major data-related topics and tools, plus some less common ones; books, articles, videos, and other resources for learning more about each topic; and free code samples demonstrating technical topics, usually in the form of notebooks and/or containers.
Table of Contents
- Data Science
- Data Visualization, such as with Tableau, PowerBI, Python or R
- Gathering Data, such as using web scraping and lists of free data sources
How to Use
The Starter Guides are meant to be use-it-when-you-need-it, so if you know what topic you are looking for, dig right in! Otherwise, this page can help you get started.
If you are brand new to dealing with data, start by learning about the data modeling process. If you need to gather your own dataset, the next step is web scraping or searching for an existing dataset. Once you have data, you might need to select and perform an analysis technique.
For projects with a data engineering focus, check out our Containers, SQL, or SLURM guides.
Data Engineering Vs. Data Science: What’s the Difference?
If you are relatively new to dealing with data, refer to the table below to get a feel for the difference between data engineering and data science.
Discipline | Data Engineering | Data Science |
---|---|---|
Languages Used | Any; General-Purpose Languages Most Common, like Python, Java, C++ | Python, R |
What They Do With Data | "Move Data Around"; Collect, Organize, Set Up Databases; Set Up Cloud Systems | "Make Sense of Data"; Analyze, Train Models, Make Visualizations |
Common Backgrounds | Computer Science, IT | Math, Statistics, Computer Science |
Common Tools | Hadoop, NoSQL, Spark, PostgreSQL, Kubernetes, Docker | MapReduce, Keras, PyTorch, JAX, Plotting Packages like ggplot2 |

There are debates about whether data science is data engineering and vice versa, and about whether the two even belong in the same guide. Because each depends on the other in some form, we included both as separate categories. In some organizations, the same people do both. Others split these responsibilities across multiple departments, and still others draw a much starker line between the two than we have here. One thing is clear: how data work is divided depends on the organization you work with, and the jury is still out on the right way to categorize these skills. Nonetheless, the two overlap in many areas.
Data professionals of all stripes should know a mix of data engineering and data science to be successful at their jobs.
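To make the split in the table above a little more concrete, here is a minimal Python sketch that uses only the standard library. It is an illustration under stated assumptions, not a recommended architecture: the sample readings, the `readings` table, and the function names `load_readings` and `summarize` are all hypothetical. The first function does the "move data around" part by organizing raw records into a database, and the second does the "make sense of data" part by computing summary statistics.

```python
import sqlite3
import statistics

# Hypothetical sample readings, invented for illustration only.
RAW_ROWS = [
    ("sensor_a", 20.5),
    ("sensor_a", 21.0),
    ("sensor_b", 19.0),
    ("sensor_b", 23.0),
]


def load_readings(rows):
    """Data engineering side: organize raw records into a database table."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
    conn.execute(
        "CREATE TABLE readings (sensor TEXT NOT NULL, value REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO readings (sensor, value) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def summarize(conn):
    """Data science side: read the stored values back and describe them."""
    sensors = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT sensor FROM readings ORDER BY sensor"
        )
    ]
    summary = {}
    for sensor in sensors:
        values = [
            row[0]
            for row in conn.execute(
                "SELECT value FROM readings WHERE sensor = ?", (sensor,)
            )
        ]
        summary[sensor] = {
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),  # sample standard deviation
        }
    return summary


if __name__ == "__main__":
    conn = load_readings(RAW_ROWS)
    for sensor, stats in summarize(conn).items():
        print(sensor, stats)
    conn.close()
```

In a small project one person might write both functions, which is the overlap described above. In a larger organization, the loading step and the analysis step could easily belong to different teams.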