Content to help you keep up with Machine Learning, Deep Learning, Data Science, Software Engineering, Finance, Business, and more
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc. I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered throughout the week). These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 9/27/2023. If you missed last week’s readings, you can find them here.
Important announcement: We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/
The AI Valley newsletter is a resource for people who want to get smarter about AI for Productivity/Business. Get smarter about AI and Tech in 3 minutes — with a touch of humor | Join 71,000+ people from Google, OpenAI, Notion, Apple. Sign up over here.
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or reach out to me. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
Enam Hoque from our lovely cult was involved in creating this exceptional paper. Benchmarks are a great contribution to AI Research/Engineering, and I’m excited to see how this goes. I feel like a proud papa (although most of you are older than I am).
The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning — which distinguish between its many forms — correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.
When GPT-4 first came out, I attracted a lot of flak for saying that LLMs don’t actually understand language (they just form associations between how words are related to each other). This is another great overview.
We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form “A is B”, it will not automatically generalize to the reverse direction “B is A”. This is the Reversal Curse. For instance, if a model is trained on “Olaf Scholz was the ninth Chancellor of Germany”, it will not automatically be able to answer the question, “Who was the ninth Chancellor of Germany?”. Moreover, the likelihood of the correct answer (“Olaf Scholz”) will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if “A is B” occurs, “B is A” is more likely to occur).
We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as “Uriah Hawthorne is the composer of Abyssal Melodies” and showing that they fail to correctly answer “Who composed Abyssal Melodies?”. The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as “Who is Tom Cruise’s mother? [A: Mary Lee Pfeiffer]” and the reverse “Who is Mary Lee Pfeiffer’s son?”. GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. This shows a failure of logical deduction that we hypothesize is caused by the Reversal Curse.
Code is available at: https://github.com/lukasberglund/reversal_curse.
I’m personally not a huge podcast person (this edition will feature a few) but someone recommended this to me, and I fell in love. This episode changed my views on Nietzsche and Dostoevsky (2 of my favorite authors) and really helped me contextualize some of their writings. Massive shoutout to Darryl Cooper for his research here.
There’s a quote always attributed to Winston Churchill — falsely, I think? — that goes something like, “If you’re not a liberal at 20, you have no heart. If you’re not a conservative by 30, you have no brain.” I’ve got a different version that I like a lot better, and it goes, “If you’re not reading Nietzsche at 20, you have no heart. But if you haven’t transitioned to Dostoevsky by 30…” In this episode, I look through the lives and work of the two 19th century existentialist authors, who have a great deal in common, but who, in the end, couldn’t be more different.
A great showcase of how good design and leveraging domain understanding (in this case, knowledge of human biases and shopping behavior) bring more value than more mechanical business ventures. Too many groups try to overengineer on metrics instead of trying to improve user experience. It’s expensive to make a train run quickly; it’s cheap to install WiFi on the train so that people don’t complain about a long train ride.
How did this one retail giant turn shopping for basic items like toothpaste and shower curtains into a form of therapy for a very specific target demo: young suburban moms ???
A recurring feature in my writing, How Money Works puts out absolute bangers when it comes to business and finance. His videos are always insightful and shine a light on the machinations in business.
Companies are under a lot of pressure from investors and customers to reduce their environmental footprint. Carbon offsets were an easy solution that for a while kept stakeholders happy. Unfortunately for the companies patting themselves on the back for their ESG efforts, people eventually realized that these offsets have achieved nothing. Major investors have lost hundreds of millions of dollars in just the last week, and now a multi-billion-dollar speculative market that you weren’t supposed to know about is imploding… Any time you see a company promoting a product as carbon neutral it got that status by buying a security off an unregulated financial market…
So it’s time to learn How Money Works to find out how a well intentioned plan turned into a multi-billion dollar bubble.
Judd Legum put out an incredibly detailed piece about how several groups that pretend to care about child hunger also aggressively lobby against policies that can reduce child hunger.
In 2021, childhood poverty in the United States dropped to 5.2%, the lowest recorded level since measuring began in 2009. According to the U.S. Census Bureau, this historic low was chiefly driven by the federal government’s one-year expansion of the child tax credit (CTC)…
Nevertheless, several companies that participated in the anti-CTC lobbying effort insist that they are committed to ending childhood poverty and often boast of the millions they contribute to the cause.
When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model’s parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
Millions of videos are posted every day, and somehow, my Instagram feed makes me exhale sharply through my nose very often. How is this made possible for billions of their users? In this issue, we’ll be looking at –
- 📸 How does Instagram personalize posts and scale it?
- 🩺 How can LLMs augment the use of specialized medical AI models?
- 🛵 How does Swiggy predict food delivery time?
Large Language Models (LLMs) have issues with document question answering (QA) in situations where the document is unable to fit in the small context length of an LLM. To overcome this issue, most existing works focus on retrieving the relevant context from the document, representing them as plain text. However, documents such as PDFs, web pages, and presentations are naturally structured with different pages, tables, sections, and so on. Representing such structured documents as plain text is incongruous with the user’s mental model of these documents with rich structure. When a system has to query the document for context, this incongruity is brought to the fore, and seemingly trivial questions can trip up the QA system. To bridge this fundamental gap in handling structured documents, we propose an approach called PDFTriage that enables models to retrieve the context based on either structure or content. Our experiments demonstrate the effectiveness of the proposed PDFTriage-augmented models across several classes of questions where existing retrieval-augmented LLMs fail. To facilitate further research on this fundamental problem, we release our benchmark dataset consisting of 900+ human-generated questions over 80 structured documents from 10 different categories of question types for document QA.
When we define a class in Python, it is possible to dynamically add new attributes to its objects during run-time.
However, this is not always recommended because:
- It may lead to bugs if the code assumes all class instances will always have the same attributes.
- It makes it difficult to debug code when objects keep on accumulating new attributes.
- It leads to a conflicting schema, etc.
Can we restrict this dynamicity?
Of course we can!
Defining a slotted class helps us achieve this.
Simply put, it allows us to fix the instance-level attributes a class object can ever possess.
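To make the idea above concrete, here is a minimal sketch of a slotted class. The class name `Point` and its attributes `x` and `y` are made up for illustration and are not from the original piece. It shows that declared attributes can be set and updated as usual, while assigning an undeclared one raises an `AttributeError`.

```python
class Point:
    # Only these instance attributes are allowed (hypothetical example names).
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p.x = 10  # Updating a declared attribute works as usual.

# Adding an attribute that is not listed in __slots__ fails at run-time.
try:
    p.z = 3
    blocked = False
except AttributeError:
    blocked = True

print(blocked)

# A slotted instance has no per-instance __dict__, which is why
# new attributes cannot be attached to it dynamically.
print(hasattr(p, "__dict__"))
```

One caveat worth keeping in mind: if a subclass of a slotted class does not define its own `__slots__`, its instances get a `__dict__` again and the restriction no longer applies to them.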
In this talk, I hope to pull those ideas together, into a unified theory of Engineering strategy, with a particular emphasis on how you can drive strategy even if you’re not the company’s CTO. Another way to think about this talk, is that I hope to “Solve the Engineering Strategy Crisis” that so many people keep emailing me about.
This week I’m summarizing What Predicts Software Developers’ Productivity? by Google developer productivity researchers Emerson Murphy-Hill, Ciera Jaspan, Caitlin Sadowski, and their colleagues. This study aimed to identify the factors that most strongly correlate with productivity, and learn whether these factors differ across companies.
While other similar papers have provided insight into the factors that affect productivity, this paper gives some insight into which factors may be more important to prioritize.
Emerging victorious from the Battles of Plassey and Buxar, The East India Company cements its grip on power in India.
Join Anita and William as they discuss the nature, horrors, and key figures of ‘Company Rule’ in India at its height in the early 19th Century.
Shoutout to Max Kless for the recommendation.
Always a good day when 🎙Patrick Akil drops a new episode.
Beyond Coding is a weekly podcast with fireside chats on tech, entrepreneurship and career journeys. Common topics are: software engineering, leadership, self-improvement and entrepreneurship. Authentic, informative and inspiring. That’s the aim for each episode.
If you find AI Made Simple useful and would like to support my writing- please consider getting a premium subscription to my sister publication Tech Made Simple below. Supporting gives you access to a lot more content and enables me to continue writing. You can use the button below for a special discount for readers of AI Made Simple, which will give you a premium subscription at 50% off forever. This will cost you 400 INR (5 USD) monthly or 4000 INR (50 USD) per year.
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819