Are you planning to start a career in data science or thinking of making a switch? Don’t be tempted to jump straight into machine learning, deep learning, or LLM courses. Instead, follow the order recommended below.
In this post, I share an overview of how to get started with data science and why each topic is worth learning:
- Statistics: Start with descriptive statistics, including basic concepts like mean, median, and mode. These give you a rough idea of what the data looks like. Once you understand the data, move on to inferential statistics to draw conclusions from it using methods like hypothesis testing, confidence intervals, and t-tests. You’ll find that most of what you learn in statistics can be implemented in Python and SQL.
Note: Learn a little algebra and calculus as well so that you can understand ML algorithms better.
- Python: Dive into Python. You should be able to solve basic logic and programming problems, because preprocessing data often means structuring your code so that certain conditions are checked before each step runs.
Once you’re comfortable, apply the statistics techniques you’ve learned so far to understand data using libraries like Pandas, NumPy, and Matplotlib.
- SQL: Learn SQL and write queries that fetch results using specific conditions, nested queries, window functions, and more. There will be scenarios where you need to explore the data further, and you can do that with either Python or SQL. SQL is often the better choice because it tends to return results faster and the language is designed to handle complex queries. SQL is also useful with tools like Hive and Spark for handling large datasets.
- Machine learning: Now move on to machine learning (ML). While studying each ML algorithm, try to relate it to the concepts you’ve already learned. Pay attention to the details: they may not matter every time you build a model, but they are expected in interviews and help you solve problems faster. Once you’ve learned an algorithm, look at its implementation in Python and try to apply it using all the knowledge you’ve gained so far.
- Productionize the model: Finally, learn how to put your models into production. This involves understanding how to package models and data into shareable files and deploy them to the cloud so that they run automatically in real time. You will need to learn a little about cloud platforms and about Python’s libraries for productionizing models.
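To make the statistics and Python steps above a bit more concrete, here is a minimal sketch that uses only Python’s standard library. The `orders` numbers are made up for illustration, the function names are my own, and the confidence interval uses a normal approximation (a t-based interval would be slightly wider for a sample this small), so treat it as a starting point rather than a reference implementation.

```python
import math
import statistics

# Hypothetical sample: daily order counts for a small shop (made-up numbers).
orders = [12, 15, 11, 15, 18, 14, 15, 13, 16, 11]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def one_sample_t(values, expected_mean):
    """t statistic for the null hypothesis that the true mean equals expected_mean."""
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - expected_mean) / standard_error


def approx_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean (an assumption, see above)."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    center = statistics.mean(values)
    return center - z * standard_error, center + z * standard_error


summary = describe(orders)
t_stat = one_sample_t(orders, expected_mean=12)
low, high = approx_confidence_interval(orders)

print(summary)
print(f"t = {t_stat:.2f}, approximate 95% CI = ({low:.2f}, {high:.2f})")
```

The descriptive part tells you what the data looks like; the t statistic and the interval are the inferential part, where you ask whether the average differs from a value you expected. Libraries like Pandas and SciPy offer the same calculations with less code once you are comfortable with the ideas.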
Remember, there’s a lot to cover in each topic. Start with the basics of each one, try to understand how the topics connect, and implement what you learn. Implementation is a must.
Note: You can start statistics, Python, and SQL in parallel.
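Since SQL and Python can be learned side by side, here is a small sketch that runs a SQL query from Python using the built-in `sqlite3` module. The `sales` table and its rows are hypothetical, and the query assumes your SQLite build supports window functions (version 3.25 or newer). It combines the three SQL ideas mentioned above: a condition, a nested query, and a window function.

```python
import sqlite3

# In-memory database with a hypothetical sales table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "2024-01", 100),
        ("north", "2024-02", 150),
        ("north", "2024-03", 120),
        ("south", "2024-01", 80),
        ("south", "2024-02", 90),
        ("south", "2024-03", 150),
    ],
)

# Running total per region (window function), restricted to regions whose
# overall sales exceed a threshold (nested query with a condition).
query = """
SELECT region,
       month,
       amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM sales
WHERE region IN (
    SELECT region FROM sales GROUP BY region HAVING SUM(amount) > ?
)
ORDER BY region, month
"""

rows = conn.execute(query, (350,)).fetchall()
for row in rows:
    print(row)

conn.close()
```

The same result could be computed in Pandas, but writing it in SQL is good practice for the cases where the data lives in a database and is too large to pull into memory first.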
I’ll try to share more details about each point in future posts, explaining WHY you should learn each topic. You can easily find articles on the internet that tell you WHAT to learn, but you usually have to figure out the WHY yourself. If you already know WHY you are learning something, it becomes easier to relate the pieces as you go.
Feel free to comment below if you’d like anything specific covered or need more details on anything; I’ll try to cover it in future posts.