A Noob’s perspective — Learning Data Science, ML and AI from scratch Pt.1
I’ve always wanted to learn about machine learning and artificial intelligence, but I kept putting it off until now, when I need to learn it for my thesis project. I know there are a lot of ML tools that do not involve much brain work, but as an engineer I should know, at the very least, how the underlying math and code for ML work. I also want to be able to produce good-quality thesis work.
Intro
I feel like when most people hear anything related to machine learning or artificial intelligence, one of two things comes to mind: either the future looks very exciting, or you fear the rebellion of the machines. One thing that I do know about AI is that it’s really like any other piece of code. It’s a set of instructions that a machine needs to follow. The “magic” of AI is powered by math and statistics, and I suck at both.
Data Science
You could try to define it yourself or look up a definition. However, I like to explain and understand things; just knowing isn’t enough. I’m not saying this to be a pretentious punk. I’m saying it because, if you’re like me, you might spend hours “studying” without absorbing anything, maybe because of undiagnosed ADHD. To solve our problem, I’ll explain an image.
I used to work for an IT company, which typically means I was miserable all the time. In that job I had to help with a project to extract information about support tickets that were registered using the Jira ITSM tool. Basically, we used an AWS Lambda function to pull the information from the Jira API with HTTP requests, and we cleaned the data for our receiving end. After extraction and cleansing, we stored the information in an S3 bucket in JSON format. Finally, we linked that S3 bucket to Amazon QuickSight, which is a business intelligence tool that AWS offers.
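To make the cleaning and storing steps a bit more concrete, here is a rough Python sketch. This is not the code we actually wrote. The field names (`key`, `summary`, `status`, `created`) follow the general shape of a Jira issue as I understand it, and the function names, bucket, and object key are all made up for illustration. The upload uses `boto3`'s `put_object`, and I import it inside the function so the cleaning part runs even without AWS set up.

```python
import json


def clean_ticket(issue):
    """Keep only the fields the dashboard needs from one raw Jira issue."""
    fields = issue.get("fields") or {}
    return {
        "key": issue.get("key"),
        "summary": (fields.get("summary") or "").strip(),
        "status": (fields.get("status") or {}).get("name"),
        "created": fields.get("created"),
    }


def tickets_to_json(issues):
    """Clean every issue and serialize the result as a JSON array."""
    return json.dumps([clean_ticket(issue) for issue in issues])


def upload_to_s3(body, bucket, key):
    """Store the cleaned JSON in S3. Bucket and key are placeholders."""
    import boto3  # lazy import: only needed when we really upload

    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
```

Inside a Lambda handler you would fetch the raw issues over HTTP, call `tickets_to_json(issues)`, and pass the result to `upload_to_s3` with your own bucket name. Keeping the cleaning in a plain function means you can test it on your laptop without touching AWS.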
That solution took maybe a little over a week. We had to tune some things, but it wasn’t impossible to manage. In our case we didn’t have to apply machine learning. It’s also worth knowing that you need to understand your job or the task at hand very well, because you may not need ML at all.
The diagram above shows how it’s important to understand the fundamentals of each field in order to become a data scientist. Let’s check each one.
It’s important to know your field. If you know what data you need, where to get it, and what data is not needed, then you won’t have problems gathering the data that you need to process.
Math and statistics are also very much necessary. Without them, it’s virtually impossible to make models, discover new insights, or use AI with our data. Coding helps us process data quickly and easily; naturally, it lets us work with high volumes of data as well as prepare the data for visualization.
However, coding is not always necessary to create a model and make predictions. In chapter 14 of Sapiens: A Brief History of Humankind, “The Discovery of Ignorance”, author Yuval Noah Harari explains how, in 1744, two Presbyterian clergymen decided to build a life insurance fund that would provide for the widows and children of ministers who died. To do that, they had to take into account how much money the ministers of the church needed to pay in, and how many ministers would die in a given year. Now let’s take a minute to appreciate how Harari does not waste any chance to throw a bit of shade at any form of religious following. Please don’t come at me. I just find what he wrote kinda funny.
“[…] Take note of what the two churchmen did not do. They did not pray to God to reveal the answer. Nor did they search for an answer in the Holy Scriptures or among the works of ancient theologians. Nor did they enter into an abstract philosophical disputation. Being Scots, they were practical types. So they contacted a professor of mathematics from the University of Edinburgh, Colin Maclaurin. […]”
He indeed did not have to, but he still chose to remark that the churchmen didn’t turn to God for answers but to a math professor. Let’s go on. The three of them began collecting data on how many ministers had died over the years and tried to calculate how many would die in a given year. They used breakthroughs in math and statistics that were recent at the time, like the Law of Large Numbers, which was discovered by Jacob Bernoulli.
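If the Law of Large Numbers sounds abstract, here is a tiny Python sketch of the idea. You can’t predict whether one particular minister will die this year, but the more ministers you look at, the closer the observed death rate gets to the real underlying rate. The 3% rate below is a number I made up for illustration, not a figure from the book, and the function name is my own.

```python
import random


def observed_death_rate(n_ministers, true_rate, rng):
    """Simulate one year and return the fraction of ministers who died."""
    deaths = sum(1 for _ in range(n_ministers) if rng.random() < true_rate)
    return deaths / n_ministers


rng = random.Random(1744)  # fixed seed so the run is repeatable
TRUE_RATE = 0.03  # made-up yearly death rate, for illustration only

# With small groups the observed rate jumps around a lot;
# with large groups it settles near TRUE_RATE.
for n in (10, 100, 1_000, 100_000):
    rate = observed_death_rate(n, TRUE_RATE, rng)
    print(f"{n:>7} ministers -> observed rate {rate:.4f}")
```

That stability in large groups is what makes an insurance fund workable: you only need the average to be predictable, not each individual case.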
I’m not going to go in depth on what is in the book; I encourage you to find it and read it. I might have a PDF of this particular chapter, but in Spanish, so if you’d like to read it, please let me know in the comments. Anyway, according to their calculations, by the year 1765 the capital would be £58,348. Note that this was done without computers or coding. When the year came, they had £58,347, just £1 less than predicted. It was very accurate. We have to keep in mind that everything behind the hype or fright about AI and ML comes down to math, statistics, and probability.
Stay tuned for more. Thanks for reading!