A Simple Guide to Data Mining

In the information age, people of all ages use the internet every day. According to pewresearch.org, 70% of adults ages 18 to 29 used the internet in 2000, and by 2021 that figure had risen to 99%, an increase of 29 percentage points over 21 years. Among adults ages 30 to 49, internet usage was 60% in 2000 and 98% by 2021, an increase of 38 percentage points. Among adults ages 50 to 64, usage was 46% in 2000 and 96% by 2021, an increase of 50 percentage points. Lastly, among adults ages 65 and older, usage was 14% in 2000 and 75% by 2021, an increase of 61 percentage points. For children, according to the National Center for Education Statistics, from 2016 to 2019 internet usage among those ages 3 to 18 started at 87% on home computers and 6% on smartphones.

People use the internet for many reasons, such as shopping, gaming, watching videos, and podcasting, and the data they generate flows freely, like a current in a body of water. That data is unchecked and unfiltered, and the information data miners extract from it can be worth more than oil and gas. This leads into the concept of data mining, why the technique is beneficial to learn, and what types of professions use it.

Data mining is associated with computer science. It is the technique of finding new information in a large quantity of data. Data miners use sophisticated software to analyze large amounts of data quickly and efficiently for corporations and other businesses. The information that is mined is not always useful. Think of data mining as fishing. Fishermen have hundreds of fish they could grab, but they only want a fish of a particular color and size, so they are patient and hook the rare fish. The same can be said of the data miner, but with data. The main goal for data is to be stored so it can be used for a later purpose. This stored data is kept within a system called a database.

There are four types of data mining techniques that professionals use to find new information. The first is “pattern recognition”, which finds similarities across the rows and columns of data. For example, if you are looking for a red truck within the data, you would see “Truck->red”. This would show that trucks tend to be red within that data. The second type is the “Bayesian network”, a graph that models the probabilistic relationships between events, including events that cannot be seen or observed directly. The third type is the “neural network”, which is often used in the health field and biostatistics. This technique is an artificial system, modeled on a brain, that mimics neuron activation in order to solve computational problems. The fourth type is the “classification tree”, which tells the data miner what category the data falls into based on its attributes. For example, given that a truck is green, shiny, and large, the tree can then be used to determine how fast the truck can go.
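The “Truck->red” example can be sketched in a few lines of Python. This is only a minimal illustration, not how commercial data mining software works. The records list is made-up sample data, and rule_stats is a hypothetical helper written for this guide. It reports two common measures for a rule: support (how often a red truck appears in all of the data) and confidence (how often a truck turns out to be red).

```python
# Made-up sample data: each record is one vehicle as (type, color).
records = [
    ("truck", "red"),
    ("truck", "red"),
    ("truck", "green"),
    ("car", "red"),
    ("car", "blue"),
    ("truck", "red"),
]


def rule_stats(rows, vehicle, color):
    """Return (support, confidence) for the rule vehicle -> color."""
    total = len(rows)
    # How many records are of this vehicle type at all.
    vehicle_count = sum(1 for v, _ in rows if v == vehicle)
    # How many records match both the vehicle type and the color.
    both_count = sum(1 for v, c in rows if v == vehicle and c == color)
    support = both_count / total if total else 0.0
    confidence = both_count / vehicle_count if vehicle_count else 0.0
    return support, confidence


support, confidence = rule_stats(records, "truck", "red")
print(f"truck -> red: support={support:.2f}, confidence={confidence:.2f}")
```

With this sample data, three of the four trucks are red, so the rule “Truck->red” holds for most trucks but not all of them. A data miner would look for rules whose confidence is high enough to be useful.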

Illustration of a cart full of paper on a railroad, headed towards a mine with a sign reading “Data mining”. Original illustration by Richard Montenegro.

Data mining is used in nearly every sector, but primarily in the financial, retail, health, social media, and big data fields. According to Investopedia.com, “Data mining can be used by corporations for everything from learning about what customers are interested in or want to buy to fraud detection and spam filtering.” Investopedia.com also states, “social media companies use data mining techniques to commodify their users in order to generate profit.” In addition, Investopedia.com suggests that in retail, data mining is used to give customers loyalty cards that provide access to discounted items and to track purchases, including certain types of items, which are stored in the store’s database. Hospitals use data mining to keep track of patients’ health charts and the medications that patients may need to be prescribed. This data is stored and sent to the pharmacy within the hospital.

Many careers include data mining as part of their responsibilities. These include data scientists, computer scientists, programmers, engineers, data mining specialists, statisticians, biostatisticians, marketers for social media platforms, cryptographers, mathematicians, computational physicists, and data analysts. People who have an interest in science, math, and technology will find these careers fulfilling.

Anyone interested in computer science might consider becoming a computer scientist. Computer scientists use mathematics, data science, data mining, and programming languages such as C/C++, Python, Java, and JavaScript to find patterns in the data being analyzed, with the goal of bringing consumers to their company’s products and generating profit. Technology companies such as Meta, Apple, Samsung, Netflix, and Microsoft are actively seeking computer scientists for this reason. To become a computer scientist, you will typically need a bachelor’s or master’s degree; if you are interested in doing research in a lab environment, the PhD route would be best.

Anyone who is interested in the health field and enjoys statistics might consider becoming a biostatistician. Biostatisticians use mathematics, statistics, computer science, biology, and data mining to collect and analyze large clusters of data and summarize the results through 3D models. The data collected could include medical records and patients’ blood counts, and analyzing it could help identify diseases and cures. To become a biostatistician, you will need a bachelor’s degree in statistics, mathematics, or biostatistics. For those interested in research and lab work, a master’s degree or PhD would be the better fit.

Anyone interested in marketing might consider becoming a digital marketer. Digital marketers specialize in finding trends and patterns to keep consumers on their platforms. Businesses such as Twitter, Snap Inc., TikTok, Meta, Pinterest, eBay, Google, and Amazon use digital marketers to increase sales, develop efficient ad campaigns, and earn money through AdSense. To become a digital marketer, you can earn a bachelor’s degree in marketing or statistics, or you can earn certifications such as Google AdWords, Google Analytics, HubSpot Inbound Marketing, HubSpot Content Marketing, and Digital Marketing Certified Marketing Professional.

Data mining is part of the information age that we currently live in. As technology continues to grow at a rapid pace, data mining will increasingly shape the things we do in terms of security, privacy, and how we handle financial transactions. Data will only grow in importance as time goes on.



