Data science makes its way into daily life without people being aware of it. For example, Spotify and Netflix users get recommendations of music, TV shows, and movies based on a database of the media they previously watched or listened to.
Data is generated by ordinary actions like posting a tweet or making an online purchase. Smartphones also collect plenty of data through the many sensors built into them.
“Everything is getting smarter, and this is because we are producing so much data. With data, we get to understand our world so much better,” said Dr. Erika Legara, a data scientist and faculty member of the Analytics, Information, and Operations Department at the Asian Institute of Management.
Speaking at an AIM conference on big data in Makati City, Legara advocated the use of data science to make decisions in government and business, quoting a new saying which likened big data to oil.
“Oil is very valuable, but it’s only valuable if you store it and you refine it and you produce something out of it. Same thing with data. No matter how big your data is, if you don’t store it, if you don’t clean it, if you don’t analyze it, it’s practically useless,” Legara said.
She drew lessons from her five-year stint at the Agency for Science, Technology and Research (A*STAR), under Singapore’s Ministry of Trade and Industry.
One of their projects centered on Singapore’s MRT system. Transport planners gave them data collected through the commuters’ Beep cards, which served as their tickets. The team’s goal was to create a transport simulation tool that would answer the following questions:
What are the critical origin and destination pairs in the MRT system?
How often are people unable to board trains because they are full?
If the number of commuters increases further, will the system reach a point where there will be breakdowns that lead to a catastrophe?
In the end, the data scientists were able to use the information they had to create a platform that could do simulations. This allowed them to see how many people were tapping in and out of each station at any time of the day. They could see how many people were in the trains and how full they were. They could also see how long it took to get from one point to another.
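The kind of tallying described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the A*STAR team's actual platform: the trip records, station names, and function names below are all invented, and times are assumed to be minutes since midnight.

```python
from collections import Counter, defaultdict

# Hypothetical trip records: (origin, destination, tap_in, tap_out).
# Times are minutes since midnight. All values are invented for illustration.
trips = [
    ("A", "C", 480, 505),
    ("A", "C", 485, 512),
    ("B", "C", 490, 502),
    ("A", "B", 1020, 1031),
    ("C", "A", 1025, 1052),
]


def od_pair_counts(trips):
    """Count trips for each (origin, destination) pair."""
    return Counter((origin, dest) for origin, dest, _, _ in trips)


def taps_by_station_hour(trips):
    """Count tap-ins and tap-outs per (station, hour of day)."""
    tap_ins, tap_outs = Counter(), Counter()
    for origin, dest, t_in, t_out in trips:
        tap_ins[(origin, t_in // 60)] += 1
        tap_outs[(dest, t_out // 60)] += 1
    return tap_ins, tap_outs


def mean_travel_minutes(trips):
    """Average travel time in minutes for each (origin, destination) pair."""
    durations = defaultdict(list)
    for origin, dest, t_in, t_out in trips:
        durations[(origin, dest)].append(t_out - t_in)
    return {pair: sum(v) / len(v) for pair, v in durations.items()}


busiest_pair = od_pair_counts(trips).most_common(1)
tap_ins, tap_outs = taps_by_station_hour(trips)
travel_times = mean_travel_minutes(trips)
```

With real data, the busiest pairs would point to the critical origin and destination pairs, while the hourly counts and travel times would feed a fuller simulation of train loads.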
They could use the platform to imagine scenarios – for example, if there was a breakdown in one station, which stations would be affected, as well? Authorities would know where shuttle buses could be sent to fetch people who were stranded.
The platform also enabled them to find out what would happen to travel time if the commuting population increased, and up to what point the current infrastructure could handle the passengers without any new construction being done.
“So this is the power of using data to create empirical models and simulations,” Legara said.
In another project, her former A*STAR colleagues used historical data to improve the system of Singapore’s ports.
“Singapore has a billion-dollar port industry. It’s very important that all the schedules are optimized … the movement of the ships, the docking, the scheduling system, and then once the ship reaches the ports, they also have to make sure that the cranes that would pick up the cargos are optimized,” Legara said.
They also did a project on machine health monitoring.
“Before a machine would break down, they can predict months ahead using artificial intelligence and machine learning. This saves you a lot of money. You don’t need to wait for the machine to actually break down, which could cause a lot of delays in your processes,” Legara explained.
According to her, there are different ways to analyze data sets. Descriptive analytics is when a person looks at the data and describes what he or she sees. Diagnostic analytics is when one tries to figure out why the thing that he or she observed is happening. Predictive analytics comes next – after the diagnosis, one can make predictions.
Machine learning and artificial intelligence have led to prescriptive analytics, where machines do the calculations for the people, then recommend or prescribe certain actions.
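The difference between these kinds of analytics can be illustrated with a small Python sketch. It is a toy example under stated assumptions, not a method Legara described: the ridership figures, the capacity threshold, and the function names are invented, a simple straight-line trend stands in for machine learning, and the diagnostic step is left out because it depends on domain knowledge.

```python
from statistics import mean

# Hypothetical daily ridership in thousands; values invented for illustration.
ridership = [100, 104, 108, 112, 116, 120]
CAPACITY = 125  # assumed system capacity, in thousands of riders


def describe(values):
    """Descriptive analytics: summarize what the data shows."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_trend(values):
    """Fit a least-squares straight line through the values, indexed by day."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def predict(values, days_ahead):
    """Predictive analytics: extend the fitted trend into the future."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + days_ahead)


def prescribe(values, days_ahead, capacity):
    """Prescriptive analytics: turn the forecast into a recommended action."""
    forecast = predict(values, days_ahead)
    return "add capacity" if forecast > capacity else "no action"
```

Calling `prescribe(ridership, 2, CAPACITY)` on this invented data recommends adding capacity, because the trend line passes the assumed threshold two days out.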
Data science is about using data to answer specific questions, and to support decision making, Legara said.
Even the retail industry benefits from data science.
Legara cited an auto-parts distributor in the United Kingdom that hired data scientists to look at 20 million transactions. They were able to identify which specific products to sell at a discounted price at certain times of the year. This increased revenue by millions of British pounds.
And these were just junior data scientists who did the job, Legara pointed out.
But can these examples be replicated in the Philippines?
It has already been done: A*STAR worked with the Manila Water Company to predict the water level of Angat Dam, which supplies Metro Manila’s water.
“You see, if you can predict it but just one week ahead, two weeks ahead, you cannot really do a lot. But what we did was, with deep learning, we were able to help them improve the system so they can predict the level of the Angat Dam height six months beforehand. If it’s six months, then there’s a lot that you can do,” Legara said.
She stressed, “Implementing policy based on intuition alone, sometimes yeah, they work. But they can be very expensive and time-consuming. And catastrophic if they are wrong. We already have the data. I think we have to make use of the data that we have.”