Bridget Kimball, Vice President and Chief Architect at Intuit, will moderate "Gain a Competitive Edge in AI/ML Applications with Good Data" on July 28th, 5 to 6 p.m.
For artificial intelligence and machine learning (AI/ML) to be effective, it is crucial to have good data. Athena is hosting a webinar to help you navigate the increasing complexities of gathering data and deriving insights and value from it faster than ever before. Join Athena at the webinar titled "Gain a Competitive Edge in AI/ML Applications with Good Data" on July 28th, 5 to 6 p.m.
Bridget Kimball, Vice President and Chief Architect at Intuit, will moderate the discussion. She sat down with the Athena Technology Committee to outline some of the core attributes of useful data and how companies can set the stage for quality data acquisition.
When people say that artificial intelligence (AI) and machine learning (ML) require good data, what does that mean? Good quality data is fit for its purpose. Data quality has many dimensions:
Completeness: The data has the expected comprehensiveness. For example, when someone gathers phone numbers, we expect it will include the area code.
Consistency: All systems across the data ecosystem contain the same information.
Accuracy: How accurately does the data reflect the event in question or the real-world object?
Timeliness: Is the data available when required? For instance, real-time clickstream data can show where in the buying process customers are facing challenges, so the company can motivate them to continue through the purchase path.
Validity: The data conforms to the structure of its definition.
Uniqueness: Each data entry is one of a kind.
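A few of these dimensions can be checked automatically. The following is a minimal Python sketch, not a definitive implementation: the record layout, the field names (id, phone, updated), the 10-digit phone rule, and the 30-day freshness window are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Hypothetical customer records; the field names are assumptions for this sketch.
records = [
    {"id": 1, "phone": "415-555-0101", "updated": datetime(2021, 7, 27)},
    {"id": 2, "phone": "555-0102", "updated": datetime(2021, 7, 27)},      # no area code
    {"id": 2, "phone": "415-555-0103", "updated": datetime(2021, 1, 1)},   # duplicate id, stale
]


def check_quality(rows, now, max_age=timedelta(days=30)):
    """Return a dict mapping a quality dimension to the positions of offending rows."""
    issues = {"completeness": [], "uniqueness": [], "timeliness": []}
    seen_ids = set()
    for pos, row in enumerate(rows):
        # Completeness: assume a complete phone number has 10 digits (area code included).
        digits = re.sub(r"\D", "", row["phone"])
        if len(digits) != 10:
            issues["completeness"].append(pos)
        # Uniqueness: flag any row whose id has already been seen.
        if row["id"] in seen_ids:
            issues["uniqueness"].append(pos)
        seen_ids.add(row["id"])
        # Timeliness: flag rows not updated within the allowed window.
        if now - row["updated"] > max_age:
            issues["timeliness"].append(pos)
    return issues


report = check_quality(records, now=datetime(2021, 7, 28))
print(report)
```

In practice the rules would come from the data's own definition rather than from hard-coded assumptions like these, but the idea is the same: each dimension becomes a test that can run every time new data arrives.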
What issues do companies, developers and programmers face when working with data? As they used to say in the '80s, "garbage in, garbage out." A data science model is only half of the equation for creating a structure wherein you can gather data and derive insights from it. When the data quality is high, you can have high confidence in the outputs. If the data quality is sketchy, then you can't trust the results of your models.
What problems arise when the data gathered is poor-quality? Poor data quality makes it impossible to develop accurate insights and models. It also drives down the productivity of the data team. Data workers spend most of their time piecing together and cleaning existing data to raise its quality. Data scientists report that they spend more than 82% of their working time cleaning and preparing data for AI/ML applications.
Not only is that a waste of the company's human resources, but it's also not what data scientists want to be doing. Data scientists should be spending their time developing models. Data clean-up is very tedious, and we should be able to automate a lot of the manual work that is done today.
How do things run when the data quality is good? The data team's productivity correlates directly with the quality of the data. When the data is high-quality, scientists spend time working on the business problem and decision making, not tweaking data. They can increase the effectiveness of marketing and communications activities, for example, because the data helps them accurately target the correct audience. Good data also makes compliance easier, particularly in highly regulated industries such as finance.
How can companies focus on developing better data stores? My advice is to put a focus on data engineering. The right people should be given the time at the start of projects to create robust and effective systems for collecting, curating, and storing data. That is the data team's most important business objective, not just building a data warehouse and dumping everything there.
There is structured data and unstructured data. What are the challenges of both types of data? How would a data scientist approach and manage both types of data? Structured data has clearly defined data types, and its patterns make it easily searchable.
Unstructured data is everything else. It includes things like audio, video, social media postings, text files, email, and text messages. This data is not easily searchable because it's difficult to understand and not organized in a pre-defined manner or data model.
To analyze and use unstructured data, data scientists apply data mining, natural language processing (NLP), and text analytics to find patterns and interpret the information.
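As a very small illustration of finding patterns in unstructured text, the sketch below counts recurring terms across a few free-form messages. It is a simplified stand-in for real text analytics, and the sample messages and stopword list are invented for this example.

```python
import re
from collections import Counter

# Hypothetical free-text support messages (unstructured data).
messages = [
    "Login fails after the latest update",
    "Cannot login, password reset email never arrives",
    "Update broke the export to PDF",
]

# A tiny, assumed stopword list; real projects use a much fuller one.
STOPWORDS = {"the", "to", "after", "never", "a"}


def term_counts(texts):
    """Count how often each non-stopword term appears across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


counts = term_counts(messages)
print(counts.most_common(2))
```

Here "login" and "update" each appear twice, hinting at where customers are running into trouble. Production NLP pipelines go well beyond word counts, but the goal of surfacing patterns from unorganized text is the same.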
What are popular applications and use cases for artificial intelligence and machine learning data? Artificial intelligence and machine learning are already in use all around us. They will continue to refine products and how we interact with them, allow us to make decisions faster, and drive efficiency in repetitive work. Some examples of AI and ML usage today:
Customer Support – AI can use natural language processing to interpret words and data, then apply contextual and reasoning algorithms to generate insights on customer needs.
Competitive Positioning – By analyzing sentiment, you can assess email and social media streams to detect mood, even from photos and videos. Based on sentiment input, systems can create more targeted marketing and interactions.
Autonomous Vehicles – Autonomous vehicles have a lot of sensors they use to understand the environment around them and make decisions. That requires an enormous amount of data collection, matched with extreme processing power, to run models in seconds.
Medicine – Radiologic image evaluation helps a radiologist identify areas of interest to investigate in more detail.
Predictive Modeling – Data can be used to predict the likelihood that a specific event will occur, like the risk of overdrafting your bank account.
Media – Data can be used to analyze incoming news events and identify topics, and AI can automatically generate articles and stories.
Where do you think more innovation needs to be applied to AI and ML? How can it be improved? Some challenges with AI and ML are privacy and bias.
First is privacy. It takes an enormous amount of data to train artificial intelligence models. We have to be vigilant that the data we use is not personally identifiable, and that a person's data isn't used or retained if they don't want it to be.
Second is bias. Data must be used for the reason it was gathered. That reason must be transparent and truthful. Sadly, there have been stories of companies using people's data irresponsibly to create unfair bias in credit evaluations and lending (for example, by gender, all other things being equal). That's not appropriate.
Thank you, Bridget! As businesses demand faster time to insight than ever before to remain competitive, more industries are leveraging an abundant amount of data from a growing number of diverse internal and external data sources with varying degrees of trustworthiness. Join Athena as we learn how to master the art of good data for AI/ML.
Bridget will lead the webinar panel with top industry leaders who have changed how their organizations gather and leverage data to improve products and services. Meet this impressive group of data leaders who help teams effectively perform their work in the world's data-driven environment. CLICK HERE to Register.