How to Build an AI Factory

Here are five important AI stories from the week.

How to Build an AI Factory (ZDNet)

In their soon-to-be-released book, Competing in the Age of AI, Harvard Business School professors Marco Iansiti and Karim Lakhani discuss the four most critical ingredients for building an AI-first company.

  1. Invest in a well-functioning data pipeline to clean and integrate data.

  2. Develop or modify algorithms for your tasks, drawing upon supervised, unsupervised, and/or reinforcement learning, as necessary.

  3. Build an experimentation platform to quickly run tests on new data with new machine learning algorithms and approaches.

  4. Deliver data and predictions via APIs; focus on ease of use for your end consumers. Invest in the cloud and use standard off-the-shelf components for this infrastructure so end consumers can pick up your outputs and use them in their workflows easily and at scale (a minimal serving sketch follows this list).
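A minimal sketch of step 4, assuming a scikit-learn model already serialized to disk and FastAPI as the off-the-shelf serving layer; the endpoint path, file name, and feature fields here are hypothetical, not from the book:

```python
# Hypothetical prediction service for step 4; "model.joblib", the endpoint path,
# and the feature fields are illustrative assumptions, not from the book.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # a model trained earlier in the pipeline

class Features(BaseModel):
    age: float
    monthly_spend: float

@app.post("/predict")
def predict(features: Features) -> dict:
    # Standard off-the-shelf pieces (FastAPI + joblib) keep the serving layer
    # simple enough for end consumers to call from any workflow, at scale.
    x = [[features.age, features.monthly_spend]]
    return {"prediction": float(model.predict(x)[0])}
```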

John Deere + Artificial Intelligence in Agriculture (IEEE Spectrum)

The agriculture giant John Deere now considers itself a software company. Its equipment serves as a fleet of IoT devices, collecting millions of data points per second; in fact, the company gathers more data than Twitter does. This data is fed into machine learning models to help farmers get better results (i.e., more and better crops with less fuel, seed, fertilizer, etc.). John Deere is also betting on autonomous vehicles for its farming machinery. For a company not traditionally thought of as a technology firm, John Deere employs as many software engineers as it does mechanical and electrical engineers.

How Spotify Shapes Your Music Experience (OneZero)

Spotify logs one terabyte of user data per day and applies machine learning to it to provide a personalized listening experience. That personalization affects what you see on your home screen in the app, your curated playlists such as Discover Weekly, and which song plays next in an auto-play playlist. To collect this user data, Spotify captures what songs you listen to, what you skip, when you listen to music and what type, your location, and what users with music tastes similar to yours listen to. This read does a great job laying out just how much data Spotify captures about its users and how it uses that data to deliver a better, more personalized listening experience.
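Spotify does not publish its exact models, but a toy sketch of the "users with similar tastes" signal (simple user-based collaborative filtering over a play-count matrix) gives a feel for how such data can drive recommendations; the users, songs, and counts below are made up:

```python
# Toy user-based collaborative filtering; made-up data, not Spotify's system.
import numpy as np

songs = ["song_a", "song_b", "song_c", "song_d"]
# Rows are users, columns are songs, values are play counts.
plays = np.array([
    [12, 0, 3, 0],   # you
    [10, 1, 4, 0],   # a listener with similar taste
    [0,  8, 0, 9],   # a listener with different taste
])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

you = plays[0]
# How similar is each other listener to you?
sims = np.array([cosine(you, other) for other in plays[1:]])
# Weight their plays by similarity, then rank songs you have not heard yet.
scores = sims @ plays[1:]
candidates = [(songs[i], scores[i]) for i in range(len(songs)) if you[i] == 0]
candidates.sort(key=lambda pair: pair[1], reverse=True)
print(candidates)  # song_b should rank above song_d
```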

Microsoft Develops AI Bot to Comment on News Articles (Vice)

Using the latest NLP techniques, Microsoft’s bot - known as DeepCom, short for “deep commenter” - automatically reads articles, picks out the most important points, and then generates comments based on those points. These comments are intended to spark debate among real human readers, drawing more and more views for the article. The bot’s “fake” engagement has benign intentions - drawing people in - but this is the type of tech that many people worry will make the problem of fake news on the internet much worse.
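As a loose illustration of the "pick out the key points, then comment" idea, and not Microsoft's actual DeepCom architecture, one could score sentences with TF-IDF and wrap the most salient one in a templated comment:

```python
# Toy "extract the key point, then comment" sketch; illustrative only, not DeepCom.
from sklearn.feature_extraction.text import TfidfVectorizer

article = (
    "The city council approved a new transit plan on Tuesday. "
    "The plan adds three bus lines and extends subway hours. "
    "Critics argue the funding mechanism remains unclear."
)
sentences = [s.strip() for s in article.split(". ") if s.strip()]

# Score each sentence by the sum of its TF-IDF weights as a crude salience measure.
tfidf = TfidfVectorizer().fit_transform(sentences)
salience = tfidf.sum(axis=1).A.ravel()
key_point = sentences[salience.argmax()].rstrip(".")

# A real system generates fluent text; a template stands in for the generator here.
comment = f'Interesting that "{key_point}" - curious what other readers think.'
print(comment)
```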

The (Losing) Fight Against Fake Content (Wired)

Deep learning has led to an explosion of generated content in text, images, video, speech, music, art, and more. Some of this is incredibly beneficial to humans, such as auto-generated text in emails and text messages, auto-captioning of videos, and AI-created music and art. But the same tech is being used for nefarious purposes, including propagating fake news and fake videos, duping humans and leading to widespread misinformation. This article by Wired does a great job explaining just how difficult it has become to tell the fakes from the real content.

More Stories Worth Reading and Watching…

Why Is BERT A Game Changer (Towards Data Science)

Ankur Patel
Facebook's RoBERTa, the Next Best Advance in NLP

Here are five important AI stories from the week.

RoBERTa, the Next Best Advance in NLP (Facebook)

Since Google launched BERT late last year, there have been several improvements along the way, such as OpenAI’s GPT-2 and XLNet. Facebook has launched its own improvement, RoBERTa, which produced state-of-the-art results on the most popular NLP benchmark, GLUE. To build it, Facebook made some adjustments to Google’s BERT architecture but also trained on more data and for longer. Google’s and Facebook’s commitment to BERT matters because BERT is inherently a much more scalable solution: it relies on semi-supervised NLP approaches (pretraining on vast amounts of unlabeled text before fine-tuning on a small labeled set) versus the more mainstream, fully supervised approaches (which require lots and lots of hard-to-come-by LABELED data).
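To see why the pretrain-then-fine-tune recipe needs so little labeled data, here is a hedged sketch that reuses the publicly released roberta-base weights via the Hugging Face transformers library as frozen sentence features; the two example sentences and their sentiment labels are made up:

```python
# Sketch: reuse pretrained RoBERTa weights as frozen features for a tiny labeled task.
# Assumes the `transformers`, `torch`, and `scikit-learn` packages.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")
model.eval()

texts = ["great product, works as advertised", "arrived broken and support never replied"]
labels = [1, 0]  # hypothetical sentiment labels; only a handful are needed

with torch.no_grad():
    enc = tokenizer(texts, padding=True, return_tensors="pt")
    # Use the first-token hidden state as a crude sentence embedding.
    features = model(**enc).last_hidden_state[:, 0, :].numpy()

# The heavy lifting happened during pretraining on unlabeled text;
# the supervised piece on top can stay tiny.
clf = LogisticRegression().fit(features, labels)
print(clf.predict(features))
```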

Amazon Launches DeepRacer, a 1/18th Scale Race Car (Amazon)

To accelerate the field of autonomous driving and reinforcement learning and to integrate programmers more closely with its SageMaker offering, Amazon just launched a $399 miniature racing car. Amazon also launched a racing league for experts and hobbyists to compete.
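Training happens with reinforcement learning in SageMaker, and racers shape the agent's behavior mainly by writing a reward function in Python. A minimal sketch, assuming the params dictionary documented in the DeepRacer console (keys such as track_width and distance_from_center):

```python
# Minimal DeepRacer-style reward function: follow the center line.
# Assumes the params dict the DeepRacer console documents, with keys such as
# 'track_width' and 'distance_from_center'.
def reward_function(params):
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Pay the agent more the closer it stays to the center of the track.
    if distance_from_center <= 0.1 * track_width:
        return 1.0
    if distance_from_center <= 0.25 * track_width:
        return 0.5
    if distance_from_center <= 0.5 * track_width:
        return 0.1
    return 1e-3  # close to the edge, or off the track entirely
```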

Why Machine Learning Projects Fail (Nature)

Google’s Patrick Riley calls for greater scientific rigor in how machine learning models are developed and moved into production. Many times, machine learning works “great” in the lab and then performs poorly in production. Common problems include splitting the training and test sets inappropriately, not explicitly modeling hidden variables (e.g., a seasonality component), and targeting the wrong objective (i.e., the model is developed to answer the wrong problem).
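One of those pitfalls, leaking future information by splitting time-ordered data at random, is easy to illustrate; the sketch below contrasts a random split with a time-based split on synthetic seasonal data:

```python
# Sketch: random vs. time-ordered train/test splits on synthetic seasonal data.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 365
day = np.arange(n)
# A daily signal with a strong seasonal component plus noise.
y = 10 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 1, n)
X = day.reshape(-1, 1)

# Inappropriate for forecasting: future days leak into the training set.
X_train_bad, X_test_bad, y_train_bad, y_test_bad = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Better: train on the past, evaluate on the most recent 20 percent of days,
# and consider modeling the seasonal component explicitly.
cut = int(0.8 * n)
X_train, X_test = X[:cut], X[cut:]
y_train, y_test = y[:cut], y[cut:]
```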

For novices and experts alike, Google has good resources on machine learning.

Weather, A Deluge of Data, and Artificial Intelligence (EOS)

Machine learning is being successfully applied to many data-intensive fields such as autonomous vehicles, image and voice recognition, finance, marketing, and healthcare. Weather is a similar field with similar types of problems - there is too much data and that data needs to be analyzed and inferred from in near real-time to support forecasting. This article does a great job exploring how machine learning could help in forecasting weather better.

Growing Trust in Machines and the Consequences (The New Yorker)

As machines become increasingly ubiquitous both in the household (Siri, Alexa, Nest, etc.) and in the workplace (via machine learning models and software), humans are becoming more trusting of automated decision-making by machines. In other words, performance of these machines is becoming more important than interpretability; even if you don’t know how or why a machine has decided a particular action, you come to accept it. Over time, this could lead to machines in the driver's seat, while humans just tag along for the ride. The New Yorker does a beautiful job discussing this in the article.

More Stories Worth Reading and Watching…

Trends in NLP from ACL 2019 (Mihail Eric)

The Deep Learning Formula for NLP (Explosion AI)

A Primer to All Things NLP (Towards Data Science)

Ludwig & Democratizing Machine Learning for the Masses (Science)

Using Machine Learning in Tableau with Algorithmia (Tableau)

AI in Clinical Development (Nature)

Tesla Promises Large Autonomous Vehicle Roll-out (Axios)

Ankur Patel
Microsoft Bets Big on General AI

Here are five important AI stories from the week.

Microsoft Bets Big on Artificial General Intelligence (The New York Times)

Microsoft is investing $1 billion in OpenAI, the A.I. research lab co-founded by Elon Musk and now headed by Sam Altman, the former president of Y Combinator. OpenAI’s successes to date include releasing a very impressive language model called GPT-2; OpenAI made headlines earlier this year when it chose NOT to release the full model, fearing that bad actors would use it to disseminate false information. OpenAI also designed an AI to beat the world’s best players at a complex strategy-based video game called Dota 2.

According to Sam Altman, OpenAI will focus on building a quantum computer next. What does Microsoft capture from the deal? A portion of profits. Microsoft will also eventually become the sole provider of cloud infrastructure for OpenAI. Fortune also did a great job covering this transaction.

SoftBank Launches Massive $108 Billion Fund to Invest in A.I. (CNBC)

A few years after launching its initial $100 billion Vision Fund, SoftBank is at it again, expecting to raise $108 billion for this second Vision Fund. Other likely investors include Apple, Microsoft, Foxconn, and several major financial giants. Its mission this time: to “facilitate the continued acceleration of the AI (artificial intelligence) revolution through investment in market-leading, tech-enabled growth companies.”

Autonomous Driving Proves Harder Than Expected (The New York Times)

Autonomous vehicles perform very well when they do not encounter abnormal situations, such as cars, cyclists, or pedestrians running red lights, or inclement weather. Although autonomous vehicles perform well in most driving conditions, they struggle with these edge cases. And since driving poorly has potentially fatal consequences, autonomous vehicles cannot yet be trusted on open roads without human supervision. In other words, self-driving cars are almost here, but not quite yet.

Panic over Data Privacy After Russian App Goes Viral (The Washington Post)

Two weeks ago, a photo-transforming app went viral; users were able to upload photos of their faces and see “aged” versions of themselves, courtesy of synthetic image generation powered by machine learning. The app almost magically transformed faces, adding wrinkles and graying hair. But users of the app eventually realized it had been developed by a mysterious Russian firm, creating some paranoia about what would happen to the data they had handed over. Concerns over data privacy are on the rise as users become more aware of how their data is being used.

How StitchFix Uses BERT for AI in Retail (StitchFix)

Google released BERT late last year, and many firms such as StitchFix are rapidly adopting the model for use in their core business. At StitchFix, stylists use notes provided by consumers to find the most suitable clothes. Instead of relying on humans alone, StitchFix has an array of machine learning solutions to narrow down the search for stylists. With BERT, StitchFix is able to extract information from text and automatically map that text to clothes the consumer will like, with considerably less human involvement. Here is just how it all works.
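The post describes StitchFix's own pipeline in detail; as a rough, generic sketch of the underlying idea (embed the client's note and each item description with a pretrained BERT model, then rank items by similarity), with made-up catalog text:

```python
# Generic note-to-item matching sketch with BERT embeddings; made-up catalog text,
# not StitchFix's actual pipeline. Assumes the `transformers` and `torch` packages.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

client_note = "Looking for a light floral dress for a summer wedding"
items = [
    "Sleeveless floral midi dress in a lightweight fabric",
    "Heavy wool winter coat with hood",
    "Casual denim jacket, relaxed fit",
]

def embed(texts):
    with torch.no_grad():
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        # Mean-pool the token embeddings into one vector per text.
        return model(**enc).last_hidden_state.mean(dim=1)

note_vec = embed([client_note])
item_vecs = embed(items)
scores = torch.nn.functional.cosine_similarity(note_vec, item_vecs)
for item, score in sorted(zip(items, scores.tolist()), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {item}")
```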

More Stories Worth Reading and Watching…

Videos from the spaCy IRL Conference (YouTube)

For More on the spaCy IRL Conference (spaCy)

Turn Selfies into Renaissance Art (MIT Technology Review)

Google’s DeepMind Reaches an Important Milestone in Healthcare (Bloomberg)

How AI Could Help with Climate Change (National Geographic)

How NYC Might Use Data and AI to Surveil Cars (The Intercept)

Code Autocompletion With Deep Learning (TabNine)

Transformers Keep Setting New NLP Records (Hacking Semantics)

A Long, Comprehensive Guide to Labeling Data (LightTag)

The Many Forms of AI in Video Games (Hewlett Packard Enterprise)

Ankur Patel
The Future of AI Is Unsupervised

Here are five important AI stories from the week.

The Future of AI Is Unsupervised (MIT Technology Review)

Today’s machine learning applications need a lot of labeled data to perform well, but most of the world’s data is not labeled. For machine learning to advance, algorithms will need to learn from unlabeled data and make sense of the world from pure observation, much like how children learn to operate in the real world after birth without much explicit guidance.

According to Yann LeCun, one of the fathers of deep learning and currently the chief AI scientist at Facebook, the future of machine learning will be driven by unsupervised or self-supervised learning systems. For more, please turn to this article in the MIT Technology Review or my book on unsupervised learning.
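A toy illustration of the self-supervised idea, in which the training labels are manufactured from the unlabeled data itself (here, predicting a held-out word from the rest of its sentence); this is a didactic sketch, not LeCun's proposal:

```python
# Toy self-supervision: training labels are manufactured from unlabeled text itself.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat slept on the rug",
    "the dog slept on the mat",
]

contexts, targets = [], []
for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        # Hide one word and use the rest of the sentence as its context.
        contexts.append(" ".join(words[:i] + words[i + 1:]))
        targets.append(word)

X = CountVectorizer().fit_transform(contexts)
model = LogisticRegression(max_iter=1000).fit(X, targets)
# No human labeled anything; the supervision signal came from the raw text.
print(model.score(X, targets))
```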

The Value Chain in Machine Learning (Medium)

Compared to a few years ago, solutions for many common machine learning tasks have been commoditized; companies have built robust offerings to help developers set up cloud infrastructure for machine learning, acquire data to train their models, clean and prepare that data, apply machine learning algorithms and perform hyper-parameter tuning, and deploy their trained models. The best way for startups to provide value in machine learning is not by trying to reinvent what has already been commoditized but rather by developing domain-specific solutions to high-value business problems. In other words, let’s focus obsessively on solving the business problem, not just on the latest and greatest tech.
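As one concrete example of how commoditized the middle of that value chain has become, hyper-parameter tuning is now a few lines of off-the-shelf scikit-learn (a generic sketch, not tied to any particular vendor):

```python
# Off-the-shelf hyper-parameter tuning: the kind of step that is now commoditized.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 200], "max_depth": [4, None]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```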

An Overview of Machine Learning Applications Today (Analytics Vidhya)

Most people consume machine learning applications throughout the day without ever realizing it. Machine learning is not some high tech that will arrive in the future; it is already here. This article explores machine learning applications in smartphones, transportation, web services, sales and marketing, security, and finance.

Amazon To Spend $700 Million to Retrain Its People (The Wall Street Journal)

Amazon is one of the leaders in machine learning today, and it recognizes just how disruptive the technology will be to the existing labor force. In preparation, Amazon has set aside $700 million to retrain a third of its U.S. workforce — nearly 100,000 people. Non-corporate workers will be transitioned into IT support roles, and non-technical corporate workers will retrain as software engineers.

Companies Are Hungry For Your Face Data (The New York Times)

Companies that build computer vision applications need lots and lots of photos to power their facial recognition technology. Over the past several years, these companies have crawled photos online to build these massive datasets and, in some cases, installed cameras in public spaces to capture this data. This article does a great job exploring just how that data gets collected and used, often without consent from the users.

More Stories Worth Reading and Watching…

Google Releases New Text-Processing Library (InfoQ)

Transfer Learning for NLP (Cloudera Fast Forward Labs)

Unsupervised and Semantic Learning in NLP (Science)

Review of the Hundred-Page Machine Learning Book (Medium)

Ankur Patel