Data science from scratch: how to start a career

Python.Engineering

Career Trajectory and Career Track

Each stage requires a different set of tools and skills.

Some roles and tracks share competencies, so you can develop in one direction and then either go deeper or switch to another.

In terms of training, this means building a track out of blocks of knowledge and skills; we call these blocks modules.

Read more in the PDF version of the book Data science from scratch.

Data Transformation Layer. ETL specialists turn unstructured data sets into databases (DBs):

Data Engineer - responsible for data integrity and optimal storage;
Database Developer - keeps the database operational;
Database Architect - designs the data storage.
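To make the transformation step concrete, here is a minimal Python sketch of an ETL job, assuming the raw input arrives as free-form log lines. The line format, the `events` table, and the `extract`/`transform`/`load` helpers are hypothetical and chosen only for illustration; the standard-library `sqlite3` module stands in for a production database.

```python
import re
import sqlite3

# Hypothetical raw input: free-form log lines, one of them malformed.
RAW_LINES = [
    "2024-03-01 user=alice action=purchase amount=19.99",
    "2024-03-01 user=bob action=view",
    "garbage line without structure",
    "2024-03-02 user=alice action=purchase amount=5.01",
]

# Assumed line format: an ISO date followed by key=value pairs.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.*)$")


def extract(lines):
    """Yield (date, fields) for lines that match the expected shape."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # drop lines that cannot be parsed
        date, rest = match.groups()
        fields = dict(pair.split("=", 1) for pair in rest.split() if "=" in pair)
        yield date, fields


def transform(records):
    """Normalize records into typed rows; a missing amount becomes 0.0."""
    for date, fields in records:
        yield (
            date,
            fields.get("user", "unknown"),
            fields.get("action", "unknown"),
            float(fields.get("amount", 0.0)),
        )


def load(rows, conn):
    """Create the target table and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(day TEXT, user_name TEXT, event_type TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_LINES)), conn)
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # prints 3
```

The malformed line is silently dropped during extraction; a real job would more likely log or quarantine such lines so that data loss stays visible.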

Data Processing Layer. Specialists at this layer analyze the data collected at the previous layer to extract knowledge and business value:

Analyst - analyzes metrics, conducts experiments, makes predictions.
Data Scientist - develops a data-driven product (e.g., a recommendation system).
BI Specialist - handles visualization, interactive dashboards.
ML Engineer - designs data-driven products and is responsible for their development.

The ML Engineer has the most career tracks: in essence, this role is an algorithm developer. Typical areas include neural networks, voice assistants, object detection (for example, in security), demand forecasting, predictive analytics, and object recognition. Among the more complex areas are GANs (generative adversarial networks) for image manipulation, RL (reinforcement learning) for game strategies, game development, and black-box AI, that is, off-the-shelf ("boxed") artificial intelligence solutions.

What knowledge and skills a data analyst needs

Hard skills

Gather and analyze customer requirements for the problem to be solved and for how results should be presented.
Obtain, cleanse, and transform data.
Interpret data and draw valid conclusions from it.
Develop requirements for software solutions and implement them.
Conduct research and A/B tests.
Know key mathematical methods and basic statistics.
Make sketches and prototypes.
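As an illustration of the A/B testing and basic statistics items above, here is a minimal sketch of a two-proportion z-test, one common way to compare conversion rates between two variants. It assumes a binary outcome (converted or not), independent users, and samples large enough for the normal approximation to hold; the function name and the experiment numbers are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a, conv_b: number of conversions in groups A and B
    n_a, n_b: number of users in groups A and B
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1,000 users per variant.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers the p-value lands a little below 0.05. In practice, the significance level and sample size are usually fixed before the experiment starts rather than after looking at the results.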

Soft skills

Think abstractly.
See the meaning behind metrics and indicators.
Find relationships and form hypotheses.
Have well-developed emotional intelligence.

Tools in demand

All data science professionals need to master spreadsheets as well as tools for accessing and processing data: DBMSs, data warehouses, SQL, and ETL.
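For a sense of what SQL-based data access looks like day to day, here is a minimal sketch using Python's built-in `sqlite3` module and a made-up `orders` table. The query aggregates revenue and order counts per customer, a typical first question an analyst asks of a dataset; the table and its contents are hypothetical.

```python
import sqlite3

# In-memory database with a small, made-up orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 15.0), ("alice", 5.0), ("carol", 40.0)],
)

# Revenue and order count per customer, largest revenue first.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY customer
    ORDER BY revenue DESC
"""
rows = conn.execute(query).fetchall()
for customer, n_orders, revenue in rows:
    print(customer, n_orders, revenue)
```

The same `SELECT ... GROUP BY` pattern carries over to other SQL databases and data warehouses; only the connection code changes.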

BI analytics: BI tools such as Power BI and Tableau, OLAP, and data mining tools such as SAS, R, Weka, Python (for specific tasks only), KNIME, and RapidMiner.

Data scientists and data analysts: visualization and analysis libraries in Python and R, in-depth knowledge of data mining tools, interactive notebooks such as Jupyter and Zeppelin, and automation and deployment tools such as Docker and Airflow.

Data Engineer: in-depth knowledge of ETL processes and of building data pipelines.

Knowledge of SQL and Python is a must, and Java or Scala is preferred. Experience with cloud platforms such as Amazon Web Services or Google Cloud Platform is required, as is experience with big data technologies: Hadoop, Spark, Kafka.
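Pipelining can be illustrated without any big data framework. Below is a minimal Python sketch, assuming records arrive as dictionaries with hypothetical `id` and `value` fields, in which each stage is a generator that consumes the previous stage's output. Orchestrators such as Airflow and engines such as Spark apply the same idea at scale, but the stage functions here are illustrative only and are not part of their APIs.

```python
from typing import Callable, Iterable, Iterator, Sequence

# A stage takes an iterable of records and yields transformed records.
Stage = Callable[[Iterable[dict]], Iterator[dict]]


def drop_incomplete(records: Iterable[dict]) -> Iterator[dict]:
    """Skip records missing required fields (hypothetical schema: id, value)."""
    for r in records:
        if r.get("id") is not None and r.get("value") is not None:
            yield r


def to_float(records: Iterable[dict]) -> Iterator[dict]:
    """Cast the value field to float without mutating the input record."""
    for r in records:
        yield {**r, "value": float(r["value"])}


def deduplicate(records: Iterable[dict]) -> Iterator[dict]:
    """Keep only the first record seen for each id."""
    seen = set()
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            yield r


def run_pipeline(source: Iterable[dict], stages: Sequence[Stage]) -> list:
    """Chain stages lazily: each stage consumes the previous one's output."""
    stream = iter(source)
    for stage in stages:
        stream = stage(stream)
    return list(stream)  # records are pulled through all stages here


raw = [
    {"id": 1, "value": "3.5"},
    {"id": 2, "value": None},
    {"id": 1, "value": "9.9"},
    {"id": 3, "value": "1"},
]
result = run_pipeline(raw, [drop_incomplete, to_float, deduplicate])
print(result)
```

Because every stage is a generator, records flow through one at a time, and adding a step means appending one function to the list.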

Why is data science so interesting? The main reason is the hidden potential for efficiency contained in data. Every company collects data, and analyzing it lets you build better products, attract and retain more target customers, improve business processes, and much more.

Why is data science perceived as a kind of "magic pill"? The basic principle is that data science lets you draw conclusions from the available data rather than from intuition, reducing the bias and prejudice inherent in human judgment.

Business needs create strong demand for specialists. In the United States alone, a shortage of about 190,000 data scientists is expected over the next three years. Job seekers have been quick to respond with growing interest.