Data Science Roadmap, Part 1
We can divide the data science learning journey into five levels. It is not necessary to complete all five levels before entering the world of data science. After clearing one level, you can start working in data science and clear the remaining levels along the way.
Level-1

PYTHON
Python: Implementing data science is not possible without coding knowledge, and Python plays the major role here. Python is also widely used in industry. The question is how much Python you need. You do not have to master the whole language; as a rough guide, about 70% of Python, covering the fundamentals, is enough. Things to learn include:
- basic syntax, such as how to create variables
- if-else statements
- loops
- functions
- object-oriented programming (OOP), including polymorphism and encapsulation
- file handling
- exception handling
- working with libraries
This much Python knowledge is enough to enter the machine learning (ML) and data science world.
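To make the checklist above concrete, here is a minimal sketch that touches each item once. The class names, the `describe` and `read_lines` helpers, and the report file are made up for illustration; only the standard library is used.

```python
import math
import os
import tempfile


class Shape:
    """Base class; subclasses override area() (polymorphism)."""

    def __init__(self, name):
        # Leading underscore marks an internal attribute (encapsulation by convention).
        self._name = name

    @property
    def name(self):
        # Read-only access to the internal attribute.
        return self._name

    def area(self):
        raise NotImplementedError


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


def describe(shapes):
    """Loop plus if-else: label each shape as 'large' or 'small'."""
    lines = []
    for shape in shapes:
        if shape.area() > 10:
            size = "large"
        else:
            size = "small"
        lines.append(f"{shape.name},{size}")
    return lines


def read_lines(path):
    """File handling with exception handling: return [] if the file is missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return [line.strip() for line in f]
    except FileNotFoundError:
        return []


shapes = [Rectangle(2, 3), Circle(2)]   # variables holding objects
report = describe(shapes)

# Write the report to a temporary file, then read it back.
path = os.path.join(tempfile.mkdtemp(), "report.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(report))

loaded = read_lines(path)
missing = read_lines(path + ".does_not_exist")
```

If you can read this snippet and explain why `describe` works for both `Rectangle` and `Circle`, you have the OOP part of the list covered.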
MATHS
Math: Math is a very important skill in the data science learning journey. It broadens your horizons and helps you understand how data works and why algorithms behave the way they do. If you have math knowledge, you can enjoy the actual fun of data science. You don’t have to be an expert in all aspects of math; covering some specific topics is enough. Specific topics to cover include:
- statistics
- probability
- differential calculus
- linear algebra.
There is no need to memorize all math topics, but covering specific topics will help you understand data science-related math concepts.
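As a small illustration of where these four topics show up, the sketch below computes one tiny example per topic using only the Python standard library. The numbers and variable names are made up for the example.

```python
import statistics
from fractions import Fraction

# Statistics: mean and (population) standard deviation of a small sample.
heights = [150, 160, 170, 180, 190]
mean = statistics.mean(heights)
stdev = statistics.pstdev(heights)

# Probability: chance that a fair six-sided die shows an even number.
outcomes = [1, 2, 3, 4, 5, 6]
even = [o for o in outcomes if o % 2 == 0]
p_even = Fraction(len(even), len(outcomes))


# Differential calculus: slope of f(x) = x^2 at x = 3, estimated
# numerically with a central difference. The exact derivative is 2x = 6.
def f(x):
    return x ** 2


h = 1e-5
slope = (f(3 + h) - f(3 - h)) / (2 * h)

# Linear algebra: dot product and matrix-vector product with plain lists.
a = [1, 2, 3]
b = [4, 5, 6]
dot = sum(x * y for x, y in zip(a, b))

matrix = [[1, 0], [0, 2]]
vector = [3, 4]
product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
```

The slope idea is what gradient-based training relies on, and dot products are the core operation inside linear models, which is why these topics come before the ML algorithms.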
TOOLS
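Tools: SQL is important to learn because, by a rough estimate, 60-70% of the data in any business is stored in databases, and SQL is essential for handling this data. Additionally, learn the following Python tools and practices:
- Pandas (Python library for tabular data)
- NumPy (Python library for numerical arrays)
- Matplotlib (Python plotting library)
- Seaborn (statistical plotting library built on Matplotlib)
- Exploratory data analysis (EDA), which is not a tool but the practice of summarizing and plotting a dataset with the tools above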
After completing these three steps of Level 1, you will be fully ready to study data science.
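Here is a rough sketch of how SQL and Pandas fit together in a first look at a dataset. It assumes `pandas` is installed; the `orders` table and its rows are invented for the example, and an in-memory SQLite database stands in for a real company database. Plotting with Matplotlib or Seaborn is left out to keep the sketch short.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database stands in for a real business database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.0), ("alice", 45.0), ("carol", None)],
)
conn.commit()

# SQL: aggregate inside the database.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

# Pandas: pull the raw rows into a DataFrame for exploratory analysis.
df = pd.read_sql_query("SELECT * FROM orders", conn)
conn.close()

# Basic EDA steps: count missing values, summarize, group.
missing_per_column = df.isna().sum()
summary = df["amount"].describe()
mean_by_customer = df.groupby("customer")["amount"].mean()
```

Note that the missing amount for `carol` shows up in `missing_per_column`; spotting gaps like this early is a large part of what EDA is for.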
Level-2

Machine Learning (ML) Algorithms:
Machine Learning (ML) Algorithms: You will experience the actual fun through ML algorithms. The challenge is the sheer number of algorithms available, which makes it hard to decide which ones to learn. There are roughly 15-20 commonly taught algorithms, and learning all of them at once is time-consuming and boring. Therefore, it’s important to focus on the most commonly used and important algorithms first. These include:
- Linear models (supervised ML): These are among the oldest algorithms but are still very widely used. Three algorithms to learn here:
  - Linear regression
  - Logistic regression
  - Support Vector Machines (SVM)
- Tree-based models (supervised ML):
  - Decision tree
  - Gradient boosted trees (gradient boosting)
  - Random forest
- Unsupervised ML:
  - Principal Component Analysis (PCA)
  - K-means
  - DBSCAN
Focus on these algorithms before delving into others.
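To give a feel for how similar these algorithms look in practice, here is a minimal sketch that trains one linear model and one tree-based model, then runs two unsupervised algorithms. It assumes scikit-learn is installed and uses a synthetic dataset from `make_classification` rather than real data; the parameter values are arbitrary choices for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: 300 rows, 6 input columns, binary target.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Supervised ML: every model follows the same fit / score pattern.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out rows

# Unsupervised ML: no target column is used.
X_2d = PCA(n_components=2).fit_transform(X)   # compress 6 columns to 2
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_2d)
```

Because the interface is the same across models, swapping in another algorithm from the list is usually a one-line change; the real learning effort goes into understanding what each algorithm assumes about the data.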
ML Techniques:
ML Techniques: After learning ML algorithms, it may seem that a machine learning project simply starts by applying an algorithm to a dataset. However, when you apply ML algorithms to a practical task, you will run into problems with the data itself. ML techniques are what you use to deal with them. Topics to be covered include:
- Feature Engineering: Feature engineering means transforming the input columns of your dataset so that the data becomes useful to the algorithm, for example:
1. Feature Normalization
2. Feature Standardization
3. Handling Missing Values
4. Handling Outliers
- Hyperparameter Tuning Techniques: The best values for the hyperparameters of an ML algorithm are difficult to guess. Hyperparameter tuning techniques search for these values systematically.
- Data Leakage Avoidance Techniques.
- How to Handle Unbalanced Classes.
All these should be learned within ML techniques. When you learn ML algorithms and ML techniques, you can start solving machine learning problems.
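The techniques above can be combined in one short sketch. It assumes scikit-learn and NumPy are installed, uses synthetic imbalanced data with artificially removed values, and picks median imputation, standardization, a small grid over `C`, and `class_weight="balanced"` purely as illustrative choices. Outlier handling and normalization are left out to keep it short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Unbalanced synthetic data: roughly 90% class 0 and 10% class 1.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)

# Simulate missing values by blanking about 5% of the cells.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# stratify=y keeps the class ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Putting preprocessing inside the pipeline avoids data leakage:
# the imputer and scaler are fitted on training folds only.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),       # handle missing values
    ("scale", StandardScaler()),                         # feature standardization
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Hyperparameter tuning: try several values of C with 5-fold cross-validation.
# F1 is used instead of accuracy because the classes are unbalanced.
search = GridSearchCV(
    pipeline,
    param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)

best_c = search.best_params_["model__C"]
test_f1 = search.score(X_test, y_test)
```

The key design choice is fitting the imputer and scaler inside the cross-validation loop. Fitting them on the full dataset first would let information from the test rows leak into training, which is exactly the problem data leakage avoidance is about.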
PROJECT
So far we have covered coding, math, tools, ML algorithms, and ML techniques.
Now is the perfect time to do a project or two. You can take a dataset on any topic of your choice from Kaggle.com. Then apply your coding and ML knowledge to this dataset: first make the input data usable, then train a model on it and produce the output (predictions).