[LAM 5] A Short History of Software

This is a long post, and none of it is AI-generated. It was initially written to enable a friend, a brilliant chemist from Cambridge with only an everyday user's perspective on computers, to think in software. It is therefore intended as a map of all the major technologies, companies, business models and memes that have dominated the domain of software since its inception, as I understand them. No attempt is made at completeness, and large sacrifices have been made at the altar of breadth and suitability for a non-technical reader.

San Francisco, Oct 12, 2025 (LA 3)

The story of software begins just under 100 years ago with David Hilbert, a mathematician who formalized a lot of late 19th-century mathematics. In 1928, he posed a problem called the Entscheidungsproblem (“decision problem” in German): is there a mechanical procedure for determining the truth of any given formal mathematical statement? In 1936, Alan Turing constructed a theoretical machine for processing data, showed that if you could solve Hilbert’s problem you could also decide whether such a machine would ever stop on a given input, and then proved that no such general decision procedure can exist.

A byproduct of the construction of the machines in Turing’s paper was the birth of Computer Science. Turing’s construction was long and tedious but (effectively) had three main steps:

  • Define “small machines” that could run deterministic (one input gives exactly one output) functions like addition, multiplication and other manipulations. Explicitly, you can do things like 25124 + 89234, or change the second last digit of a number to 9: “53125” -> “53195”.
  • Show you can compose these small machines by feeding the output of one machine into another machine. Now you can take “machine 1” for addition and “machine 2” for multiplication and can do things like 14512 x (25124 + 89234). This proves his construction can handle a large class of functions.
  • [brilliant] Show that you can encode the definition of a machine (the program) and provide that as input (data) to another machine. This allows you to create one universal machine, called the Universal Turing Machine. This machine can simulate any other machine by taking that machine’s description as input and “behaving like” that machine on whatever further inputs follow.

This third point is the reason why all computers are general purpose: why we don’t have one computer for Excel, a separate one for ChatGPT, and another for reading PDFs and Word documents. But alas, we have skipped ahead 80 years.
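Here is a toy sketch of those three steps in Python. Ordinary functions stand in for Turing’s machines, purely to make the idea concrete; nothing here is a real Turing machine.

```python
# Step 1: "small machines" that compute simple deterministic functions.
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

def set_second_last_digit_to_9(n):
    digits = list(str(n))
    digits[-2] = "9"
    return int("".join(digits))

print(set_second_last_digit_to_9(53125))         # 53195

# Step 2: compose small machines: feed one machine's output into another.
print(multiply(14512, add(25124, 89234)))        # 14512 x (25124 + 89234)

# Step 3: treat a program as data. This tiny "universal machine" takes a
# description of a machine (just its name here) plus inputs, and simulates it.
MACHINES = {"add": add, "multiply": multiply}

def universal_machine(description, *inputs):
    return MACHINES[description](*inputs)

print(universal_machine("add", 25124, 89234))    # behaves like the adder
print(universal_machine("multiply", 3, 7))       # behaves like the multiplier
```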

Starting in the 1940s, engineers converted these theoretical designs into actual working machines. This was also when the transistor was invented, a semiconductor device that implemented the core “atom” of computation inside the “molecules” of the small machines in Step 1 above. Before the transistor these atoms were made with vacuum tubes, but semiconductors were cheap, abundant, easy to work with and much more robust to failures. Interestingly, Sam Altman recently compared the invention of AI to that of the transistor.

In this era, computers were big, expensive and mainly used for military and research applications (computing expensive mathematical functions), built by the likes of IBM. In 1965, Gordon Moore observed that integrated circuits (by then a standardized engineering component of computers and electronics) were packing double the number of transistors roughly every 2 years. Moore’s law sustained for almost 60 years! That’s 2^30, or a 1,000,000,000x increase. This created the microprocessor industry, and the first part-software or pure-software companies like Microsoft (1975), Apple (1976) and Oracle (1977). Interestingly, this was also when the first wave of AI research happened. The idea of Frames was first developed during this period, in the context of human cognition and machine intelligence. People thought artificial intelligence was imminent, and built complicated systems to simulate human experts, efforts that wouldn’t go anywhere because some kind of neural processing was missing.

Software Businesses

Microsoft first created an operating system that ran on top of IBM’s machines. This was licensed to IBM for something like $70 per unit (Microsoft retained the software rights and didn’t give IBM exclusivity). Then they licensed the same operating system to other computer-makers like Dell and Compaq. Bill Gates executed a brilliant strategy here: he would charge these manufacturers a low price ($20-$40), but on all computers shipped rather than just the ones that shipped with Microsoft’s OS. So rather than ship PCs with different operating systems, manufacturers just used the cheaper Microsoft OS on all units. This had the incredible effect of standardizing Microsoft’s OS across the industry. Microsoft effectively benefited from a couple of firsts in one of the best business models in the history of business:

Zero marginal cost of production: Microsoft developed one Windows 98 operating system, and it ran on over 25 million computers. Adobe makes one PDF reader once, and it can run on 1 billion devices without additional “manufacturing cost”. ChatGPT was developed once, and close to a billion people use it every week.

Network Effects: Standardization of Microsoft’s OS gave them incredible advantages with developers building applications. Think of an operating system as the “universal machine” in Step 3 of Turing’s construction above. Developers build the actual programs (atoms) that the operating system “simulates”. By having the entire industry run on Microsoft’s OS, Gates forced all software application developers to make programs that ran on top of Microsoft if they wanted anyone to use their products. Developers building apps on Microsoft means that consumers want to buy PCs that run Microsoft’s OS. It is mind-boggling how great a feedback loop this creates.

Apple ran, and largely still runs, a vertically integrated strategy. They design their own hardware and chips, and tightly integrate their operating system with the hardware to maximize performance. This is why iPhones are very fast and smooth despite having much lower RAM and CPU specs than Android phones today. Apple was also involved in creating a proto-browser of sorts in the late 1980s called HyperCard (there is a video about it).

Browser Wars

Parallel to the development of microprocessors, the US Department of Defense’s research agency (DARPA) was funding research into communication technologies for computers in the 1960s and 70s. This research created standard languages for communication (“protocols”) between computers over phone cables. Researchers from various universities were able to collaborate and email each other using this communication stack. In the 1970s and 1980s lots of researchers were communicating with each other on different proto-internets that spoke different languages. In the 1980s, something called the TCP/IP stack standardized all these protocols so that every computer spoke the same language, and this created the space for the world wide web.

In 1989, Tim Berners-Lee created the first website while at CERN: info.cern.ch. He created the “url” (a standard address), “http” (a protocol for communication between computers and servers), and “html” (your browser renders an html file fetched over http to show any webpage), along with the first browser and the first server to serve that page. This was the birth of the world-wide web. At the end of the day, every protocol or language is a standard adopted by people, and there are significant network-effect dynamics. This is why most of the western world speaks English, and why there is only one World Wide Web (www).

Tim Berners-Lee convinced CERN to open the license for the browser code. They agreed, and browser development was taken up by the community, and in particular by a student at the University of Illinois at Urbana-Champaign. Marc Andreessen built Mosaic, the first widely popular graphical browser, at NCSA in UIUC and launched it with a mailing list post. Andreessen left, started his own company, and made a commercial browser in 1994 called Netscape Navigator. This was a paid browser that quickly gained market share (80% in 1995-1996) and started the browser wars. Andreessen would go on to create, most notably, Andreessen Horowitz, a leading VC firm that grew from $300M in AUM in 2009 to ~$46B in 2025. Interestingly, while Sam Altman compared the AI wave to the invention of the transistor, Marc Andreessen compares it to the invention of the microprocessor.

Anyway, the risk browsers posed for Microsoft was that they threatened to “up-shift” the platform layer. If you imagine a stack with the hardware on the bottom, then the OS above that and finally the applications at the top, the introduction of the browser meant that people spent 80-90% of their time in one application, reducing Microsoft’s relative advantage and making the browser the de-facto platform on which network effects were built. Microsoft licensed the original Mosaic code that had come out of UIUC (via a company called Spyglass), made it into Internet Explorer, and finally bundled it for free with Microsoft Windows. This clawed back a majority of the market by the late 1990s and allowed them to win the first browser war. Netscape sold to AOL for roughly $10B in stock. Microsoft was mostly dominant for the next 10 years until Chrome. More detail on the browser wars is here: link.

Internet Businesses

The Internet was a different kind of wave from AI because it unlocked a new distribution channel: any business could be put “online” to reach more customers. This was Elon’s insight with his first company, Zip2, which put businesses online along with maps and directions. He sold it to Compaq in 1999.

Amazon started with an online bookstore. You could have a much greater selection in an internet bookstore or marketplace since you’re not constrained by physical space. There is a video somewhere with Bezos’ initial idea. He saw that the internet was growing at something like 2,300% a year in 1994 and wanted to find a business plan in the context of things you could sell online. He made a list of 20 different items and realized that the books category was different from all others in one unusual way: there were 3 million different books in print worldwide. This was the greatest number of items in any one category (music was second at around 200,000). So you could make this entire selection available online in a way that would never have been possible before the internet.

A third major business was online payments. As internet marketplaces like eBay and Amazon popped up, fraud was rampant and trust was scarce. PayPal (founded by Peter Thiel and eventually merged with X.com, Elon’s payments company, a name Elon later reused for Twitter) built a business that allowed people to get paid via their email. They found product-market fit on eBay because eBay had millions of small sellers and buyers doing low-trust, small-dollar transactions. They used to do money orders or mail checks, and PayPal provided a much better product. Very quickly sellers put up “I accept PayPal” on their listings and this gave them a convenient way to get paid online. Eventually PayPal was acquired by eBay in 2002, shortly after its IPO.

In the late 1990s Google emerged as the default search engine. Search engines had been around for a while; Google’s core innovation was to rank pages the way academic papers are ranked by citations (ChatGPT will do a good job of explaining this if you want to go deeper). Essentially, if my blog is linked-to by many other blogs it is a good blog and should rank highly in Google’s search results. This was combined with the product breakthrough that you so keenly observed: an extremely simple, no-nonsense search bar.
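If you want to see the citation idea concretely, here is a minimal PageRank-style sketch. The link graph is made up for illustration; a real crawl has billions of pages.

```python
# Each page "votes" for the pages it links to; rank flows along links.
links = {
    "myblog":  ["bigblog"],
    "bigblog": ["news", "myblog"],
    "news":    ["bigblog"],
    "obscure": ["myblog"],
}

pages = list(links)
rank = {p: 1 / len(pages) for p in pages}   # start with equal rank
damping = 0.85

for _ in range(50):                          # repeatedly pass rank along links
    new_rank = {p: (1 - damping) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = damping * rank[page] / len(outgoing)
        for target in outgoing:
            new_rank[target] += share        # a link acts like a citation "vote"
    rank = new_rank

# Pages that many (highly ranked) pages link to end up with the most rank.
print(sorted(rank.items(), key=lambda kv: -kv[1]))
```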

The first wave of Indian internet businesses, in the early 2000s, were also online marketplaces, in categories where liquidity was typically low and constrained by physical space: job listings (naukri.com), matrimonial websites (shaadi.com) and real estate (99acres). If you put a good product online you get much better price discovery than you would in a market with fewer customers. All marketplaces are network-effects businesses, and network effects seem to be the dominant theme in consumer software.

SaaS (early 2000s)

In the early 2000s, lots of enterprise SaaS (software as a service) companies were created. An iconic company in this space is Salesforce, a CRM solution. In the 90s, most software was distributed by installing the application on the buyer’s computer. With the rise of the web browser, a website could be an entire application: no installation needed apart from the browser and an internet connection. Salesforce was among the first to prove that buyers would pay for something like this. Their business model turned software from an upfront license plus low maintenance fees, sold through long sales cycles, into an online self-service subscription with sustained monthly payments. This business-model innovation, combined with zero marginal cost of distribution (the fundamental nature of software), led to an entire category of very profitable SaaS businesses.

Amazon Web Services (2006)

By the mid-2000s, Amazon had built significant networking and hardware infrastructure to serve its own customers. Starting in 2006, they made this available to other companies as Amazon Web Services, starting Cloud as a category and a technology wave in itself. Cloud just means that the application is hosted “elsewhere” (in the cloud), not on the software company’s own premises.

Before AWS, every startup had to buy its own servers and provision for peak demand. Some consumer companies went through a phase of hyper-scaling followed by an eventual drop; they had to provision these resources each time, fund them up-front, and pay through the nose for idle overcapacity when demand fell. AWS’s core innovation was as much a business-model improvement for smaller software companies as a technical one. Now startups didn’t have to plan or buy servers up-front. This turns capex into opex and makes it easy to scale up or down very quickly.

We now just “rent” hardware from AWS and provision software on it, scaling up as and when needed (I pay AWS something like $150/mo for all my products right now since none of them are at scale). This is also an extremely high-margin business for Amazon, contributing a majority of their operating income. Microsoft has its own cloud service called Azure (which OpenAI relies heavily on for its own datacenter and compute requirements).

Facebook (2004)

Facebook is unique and likely not a model to be replicated wholesale, though maybe some parts are. You know the broad story from The Social Network, but Zuckerberg had built something like 10 different products around that time, when the internet was maturing and past its early-majority phase.

There were lots of social media companies at the time, but Facebook seems to have gotten a few things really right:

Rapid acceptance: Within 24 hours, between 1,200 and 1,500 Harvard students had signed up (out of ~6,000 undergrads). By the end of the first week, over half of Harvard undergrads were on the site.

Product: There’s an interview with Mark Pincus (angel investor in FB, and later the founder of Zynga), who had also built social-networking products by 2004. He says Facebook got trust right. His point was that before Facebook, online experiences were low-trust. You didn’t want to put information online or talk to people because everyone was a random anonymous account. I think I can understand what he means.

Nuclear reactor core at Harvard: They always had an extremely strong network of early adopters and evangelists within Harvard: an extremely tight-knit community that, during summer holidays, would go to their home towns and get their brothers, sisters and friends to sign up. Anecdotally, I have a cousin here, Akash, who was studying in Mumbai at that time. He said his friend’s sister was at Harvard. She came to their college to visit and made them create their Facebook profiles. At this college, they were already using Orkut, and apparently she said, “Yeah we used other apps too. But there’s something about Facebook where you just end up spending much more time on it.” Having this kind of strong core to grow from is extremely important for any network-effects business like a messaging app or social media platform. And the Harvard origin carried a lot of prestige. I think even if we go down the education route, this is important: I spoke to ISB, but they will easily follow if something like Wharton or Harvard adopts Frames in any context.

Focus on Revenue: Zuckerberg was not naive on revenue despite the movie. I think within the first year and a half they were profitable. This allows you to raise money and maneuver on your own terms, and I think it’s a very important lesson.

Facebook had incredible growth and numbers, and created an entire field of “growth engineering”: A/B testing in the context of product growth. Even today, there isn’t one version of Facebook or Instagram, there are thousands. Any engineer can add a product feature and test it on a small percentage of users. If the revenue and engagement numbers go up against a control group, it is considered a success and rolled out to more users. It is incredibly scientific behaviour-hacking, and it is why these apps are so addictive. Their business model is simple: they auction ad real estate on their pages to advertisers. More on Facebook in this 6-hour podcast: link.
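A minimal sketch of what that experimentation loop looks like. The user IDs, the 5% rollout and the engagement numbers are all invented for illustration; real systems log metrics from the live app and apply proper statistics before deciding.

```python
import hashlib
import random

def variant_for(user_id: str, experiment: str, rollout: float = 0.05) -> str:
    """Deterministically bucket a user: the same user always sees the same version."""
    h = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(h, 16) % 10_000 / 10_000
    return "new_feature" if bucket < rollout else "control"

# Simulate engagement for each group (in reality this comes from app logs).
random.seed(0)
engagement = {"new_feature": [], "control": []}
for uid in range(100_000):
    group = variant_for(str(uid), "new_feed_ranking")
    base = 10.0 if group == "control" else 10.4     # pretend the feature helps a bit
    engagement[group].append(random.gauss(base, 3.0))

for group, values in engagement.items():
    print(group, round(sum(values) / len(values), 2), "min/day on average")
```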

iPhone (2007)

Steve Jobs announced the iPhone in 2007. The core breakthrough was making a touch keyboard work. Before this, BlackBerry was gaining popularity and most phones had physical button keyboards. The iPhone reclaimed roughly half of the phone’s real estate by making the keyboard appear on screen only when necessary. There is an interesting story of how the engineers making the iPhone discovered the right “hit box” size for touch targets. One engineer created a game with different-sized boxes that you could tap on screen to score points; more hits meant more points. He used that game to figure out the minimum size at which most people could quickly hit the correct boxes.

Apart from this, there is the 2007 keynote launch video where Jobs announces the integration of a phone, a music player and an internet browser, all in one device. This was the beginning of the second major computing device, and the iPhone created the smartphone wave. The smartphone had lots of new capabilities that developers could use to make new apps. Some people used the camera-flash capability to hack together a torch, a feature the iPhone eventually included natively. Other new capabilities of phones, such as GPS, a portable camera and instant messaging, are not native to desktop computers at all. This created a whole ecosystem of apps that used these capabilities, like Uber, Airbnb and WhatsApp. Apple and Google charge 30% platform fees(!) to developers that make money on their platforms.

Facebook was early to this and realized people were moving to smartphones, so they did a big push to build a good app there. There is an account of a discussion between Mark Zuckerberg and Steve Jobs where Zuck explains his vision for apps and experiences on top of Facebook, on top of the iPhone, and so on, and at some point Steve Jobs stops him and says, “I respect you, but we will not let you build a platform on top of ours. 🙂”

This is why Facebook really wants to own the next platform: it has spent something like $60B (which used to be a lot) on Meta AR glasses, and bought Oculus for its VR headsets back in 2014.

Bitcoin And Ethereum (2009-)

Bitcoin comes from cypherpunk/libertarian ideals and is a form of digital cash. One simple way to think about it is:

If I want to pay you Rs. 500,

I can hand you the note and you can be happy since the note is a physical object and you having it means I don’t have it.

If notes were represented by a digital file, in a naive implementation, I could send you a rs500.pdf file. But this is not acceptable because information is exactly copy-able: you can never be sure that I deleted my copy after sending it to you.

So we need a centralized authority to maintain an account book of payments. Banks, and ultimately the central bank, play this role for every currency.

But the problem is that a central bank is also part of the economy and can print a lot of money.

Blockchains allow you to “decentralize” the keeping of the account book, with rules that “everyone in the network” agrees to.

In the case of Bitcoin, this set of rules is that there will only ever be 21,000,000 bitcoins, that miners get paid for solving puzzles (which is what keeps the network from being hacked), and standard rules of payment (you can only spend bitcoin that you have). The reward for solving puzzles is carefully designed as a game in which the dominant strategy for every participant is to work towards whatever maximizes the security and longevity of the Bitcoin network.
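To make the “decentralized account book” idea concrete, here is a toy hash-chained ledger. It leaves out mining, signatures and the peer-to-peer network, which is where the real engineering lives; the point is only that rewriting an old payment breaks every later page of the book.

```python
import hashlib
import json

def block_hash(block):
    """Fingerprint a block: any change to its contents changes the hash."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

chain = [{"payments": [], "prev_hash": "genesis"}]

def add_block(payments):
    # Each new page records the fingerprint of the page before it.
    chain.append({"payments": payments, "prev_hash": block_hash(chain[-1])})

add_block([{"from": "me", "to": "you", "amount": 500}])
add_block([{"from": "you", "to": "shop", "amount": 200}])

def is_valid(chain):
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

print(is_valid(chain))                      # True
chain[1]["payments"][0]["amount"] = 50000   # try to rewrite history
print(is_valid(chain))                      # False: later fingerprints no longer match
```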

Vitalik Buterin was an early proponent of Bitcoin and spent a lot of time blogging about it and improving the design. He wanted to run arbitrary functions on top of Bitcoin, since people were trying to hack together different types of “programs” beyond the simple account book that Bitcoin was designed for, but Bitcoin was not “universal” in the Turing sense (Step 3 above). Ethereum was basically that breakthrough: if you put a Turing-complete machine inside the blockchain, you can run arbitrary programs, not just simple payments. So you can have functions like lending, borrowing and exchanges on top of the core decentralized layer. This is the area called decentralized finance.

Stripe & FinTech (2010-2020)

Stripe and Square started in the early 2010s as payments were modernizing. The founders of Stripe (Irish brothers Patrick and John Collison) had sold software on mobile phones, since Apple made it very easy to do that, but payments on the web were still very painful. So they got bank integrations and offered something called an API (application programming interface): basically a few lines of code that I can include on my website that allow me to get paid very easily. They handle the payments and the integration with the banks and take roughly a 3% fee. They were among the first in this space, backed by Y Combinator, and helped create the entire category of FinTech (financial technology) companies. They are also brilliant, and there is a term in startup culture called the “Collison installation”: it comes from a time when they would meet prospective customers, ask if they wanted to use the product, and when they said “sure”, they’d take the customer’s laptop and install it on their website right there.
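For flavor, this is roughly what those “few lines of code” look like with Stripe’s Python library. The key and order details are placeholders (you need a real secret key from the Stripe dashboard for this to run), and exact parameters vary by integration.

```python
import stripe

stripe.api_key = "sk_test_your_key_here"   # placeholder secret key

# Ask Stripe to collect $20.00 (amounts are in the smallest currency unit).
payment = stripe.PaymentIntent.create(
    amount=2000,
    currency="usd",
    description="Order #1234",
)
print(payment.status)   # e.g. "requires_payment_method" until the buyer pays
```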

They also created one of the first pure-API businesses. The brothers are incredible, and Patrick went on to co-found the Arc Institute to accelerate biology research; he says that as a species we’ve never solved a complex disease, and they want to change that.

Beyond Stripe and Square (a payments processor in the US), there was a wave of consumer internet-banking companies that worked with existing banks as their regulated partners and provided an internet service on top. This allowed better distribution and lower costs. There was also something called “banking-as-a-service”, which includes companies like Brex and Ramp (M2P in India) that provide corporate credit cards and other banking services as SaaS products.

AI: Phase I (2012-2022)

With 60 years of improvements in microprocessors, our computers had gotten super fast by the early 2010s. Moreover, there was a big boom in video games from the 1980s onwards, and this created specialized chips called Graphics Processing Units (GPUs). Instead of doing tasks sequentially, these chips were optimized for parallel processing. If you want to calculate 200*1451 + 59*23590, you can calculate the first product, then the second, then add them, or you can compute the two products in parallel and then add them. If you write software that can take advantage of parallelizable work you get a massive speedup. This was perfect for video games, because what should be shown in one corner of the screen often has nothing to do with what should be shown on other parts of the screen.
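A small illustration of that same sum written sequentially and as a single vector operation. On a GPU (or with a GPU library such as CuPy or JAX), it is the vectorized form that gets spread across many cores at once; plain NumPy here just shows the shape of the idea.

```python
import numpy as np

# Sequential: one multiply at a time, then the sum.
a = [200, 59]
b = [1451, 23590]
total = 0
for x, y in zip(a, b):
    total += x * y

# "Parallel-style": express the whole computation as one vector operation.
# Each elementwise multiply is independent, so hardware can run them together.
total_v = np.dot(np.array(a), np.array(b))

assert total == total_v   # both give 200*1451 + 59*23590 = 1,682,010
print(total_v)
```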

So anyway, GPUs were developed for this, but coincidentally they turned out to be perfect for another type of software program. Since the 1950s and 60s people have been exploring computer models that approximate functions; one of the early use cases was recognizing handwritten digits. Think of a handwritten number 8 in a 400x400 image (pixels): each pixel can be either black or white. An 8 has a certain pattern (or set of patterns) in which the black pixels will be located on that 400 x 400 grid. The goal is to predict, from any given pattern (a list of 400 x 400 = 160,000 numbers), whether that pattern represents the digit 8. This is a function f(x1, x2, …, x160000) -> { yes, no }. If you have labeled examples where certain patterns are labeled as 8 and others are labeled not-8, you can “train” your model to predict in the following way.

[1] Start with a random function → [2] make a (bad) prediction → [3] calculate which direction to adjust your function → [4] make a better prediction → repeat from [2]

This is essentially all that neural networks are, and all that AI is at a functional level. More detail is here if you are interested. Step 1 is what is called a “model”. Step 2 is a prediction (a “generation”, in the case of language models). Step 3 is backpropagation: you propagate the error backward (it uses some basic calculus). Step 4 is an improved model, and you repeat the loop millions of times.
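Here is a minimal, self-contained sketch of that loop: a one-layer model (logistic regression, the simplest cousin of a neural network) learning to label tiny made-up “images” as 8 or not-8. The data is invented and has 4 pixels instead of 160,000, purely to keep it readable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled data: each row is a flattened pixel pattern, y says "is it an 8?"
X = np.array([[1, 0, 1, 1],
              [1, 1, 1, 1],
              [0, 0, 1, 0],
              [0, 1, 0, 0]], dtype=float)
y = np.array([1, 1, 0, 0], dtype=float)

w = rng.normal(size=4)   # [1] start with a random function (random weights)
b = 0.0
lr = 0.5                 # learning rate: how big a step to take each update

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for step in range(1000):
    p = sigmoid(X @ w + b)              # [2] make a prediction
    grad_w = X.T @ (p - y) / len(y)     # [3] which direction to adjust (calculus)
    grad_b = np.mean(p - y)
    w -= lr * grad_w                    # [4] adjust -> a slightly better function
    b -= lr * grad_b                    # ...and repeat

print(np.round(sigmoid(X @ w + b), 2))  # predictions close to [1, 1, 0, 0]
```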

Frank Rosenblatt built the first neural network (the perceptron) in the late 1950s. It did okay on simple recognition tasks, but it was a toy model and would never scale to more complicated functions. The two things that changed after the internet and after 60 years of Moore’s law were that we now had LOTS of data and LOTS of compute. It turns out that is most of what it takes to make these models better. There is something called the “bitter lesson” in AI: people were trying all sorts of fancy algorithms from 2015-2020 to improve the models, but the essential lesson is that general methods that scale with more compute and more data beat clever, hand-crafted human ingenuity.

Professor Fei-Fei Li (who has her own startup now) created the ImageNet dataset and competition. In 2012, Geoffrey Hinton’s lab entered the competition with AlexNet, a “deep learning” (effectively, big) model, and did surprisingly well. This got the attention of the entire machine learning community and truly started the first AI race. OpenAI was started in late 2015.

Things were also happening in AI applied to language processing. An early breakthrough came in 2013 (word2vec), when people realized you can represent words as vectors (sequences of numbers), initialize the values randomly and train them in a way similar to what we have above. This captured surprising semantic properties, like vector(king) - vector(man) + vector(woman) almost equals vector(queen). By 2016 Google had switched Google Translate from its earlier statistical approach to a deep learning model.
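A toy illustration of that vector arithmetic. Real word2vec learns hundreds of dimensions from billions of words; these 2-dimensional vectors are hand-made (dimensions roughly “royalty” and “gender”) just to show the idea.

```python
import numpy as np

vec = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

target = vec["king"] - vec["man"] + vec["woman"]

def cosine(a, b):
    # Similarity between two vectors, ignoring their lengths.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Find the word whose vector is closest to king - man + woman.
best = max(vec, key=lambda w: cosine(vec[w], target))
print(best)  # -> "queen"
```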

In 2017, the Google Brain team published an architecture called the Transformer, which significantly improved the efficiency of neural machine translation (converting Spanish to English, etc.) by parallelizing the core computation using something called an attention mechanism. The architecture was introduced in a seminal but badly written paper called Attention Is All You Need, and it is what eventually led to ChatGPT. I learnt recently that they were inspired by the movie Arrival.
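The heart of the Transformer is small enough to sketch: scaled dot-product attention over a handful of random vectors. In a real model the Q, K and V matrices come from learned projections of the input tokens; here they are random, only to show the shapes and the parallelism.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how much each token "attends" to every other token
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # a weighted mix of the value vectors

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                     # 4 tokens, 8-dimensional vectors
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))

out = attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per token, all computed in parallel
```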

AI: Phase II (2022-)

I am putting this in its own class, because this kind of AI is different for many reasons. It took about five years for the Transformer architecture to mature and deliver results, but in 2022, ChatGPT happened. This was a step-function improvement in reasoning capability, and language models suddenly displayed something like human-level intelligence for the first time. This created several new things:

A human-level thought partner.

A broad industry effort towards AI agents that would automate all human cognitive labor.

A new resource in software development. When cement was invented, you could make fundamentally new kinds of buildings. AI can now be used inside software to do new kinds of things.

Marc Andreessen (Mosaic, Netscape, a16z) has a frame for thinking about startups in a technology wave. In any platform shift, lots of new companies get started. Every company is either in hill-climbing or hill-finding mode. Once a market is found where the technology can deliver new value, everyone in that market turns to hill-climbing mode.

Hills that have been found, as I see it:

  • Consumer AI chatbots (ChatGPT, Claude, Grok), obviously. These are usually built by the big AGI labs.
  • Coding co-pilots: companies that use the APIs of foundation-model companies like OpenAI and Anthropic and embed an AI that generates and explains code right in the code editor, where everyone writes code. (Cursor grew to an $18B valuation in ~2 years, on real revenue of over $300M. I assume Lovable.dev will also announce a $10B-ish funding round soon.)
  • Model inference infrastructure: companies that provide intelligence-as-a-service via an API, like Stripe provides payment capability as an API. Examples are Groq (q, not k) and Cerebras, and they compete with the core AGI labs.
  • Voice AI is very hot right now. Vapi, ElevenLabs, etc. have raised lots of money and seem to be doing well.
  • Customer support chatbots (Sierra, etc.), but this seems more like something that incumbents will do.
  • Consumer AI companions: Character AI is a standout example. Founded by former Google Brain researchers, they had bots that you could create with personalities, traits, voices and photos (just like Frames allows you to create your own, but back in 2022 :’)).

Smaller/questionable hills seem to be:

  • Meeting recorders and transcription: Granola and Cluely have raised big rounds. This is why, when you suggested a meeting recorder, I thought it was a good idea and am integrating one into the browser.
  • Hardware: People are experimenting with new form factors. AI toys, AI lockets, other kinds of hardware.
  • Education: I think education is fundamentally transforming. Some people say ChatGPT is actually the most popular EdTech product out there (a play on categories, but it broadly rings true). There is NotebookLM, but also paid homework-help apps on the iOS App Store that do okay (Gauss AI). Sanjay uses an app called Turbolearn AI that records lectures.

False (or at least not-yet) hills:

  • AI Agents: This is something I am really torn about. I started building AI agents back in 2023. Agents are defined in many ways, but one of them is simply “AI that can do tasks for you autonomously”. I believe this is not the right hill to go down, or to make a business in. It is too hot a space, it does not seem to be working at the moment, and ChatGPT and the other chatbots already seem to be making progress here.
  • Legal work: Harvey and lots of legal-AI companies have gone after legal work. The problem is that LLM-generated output should always be treated as a hypothesis, and in legal work the cost of verifying that output seems to be very high. This problem does not exist for code copilots: you can just run the code and see whether it works. You cannot “run” a contract, at least not yet.

People compare this wave to the internet, some compare it to the transistor, some to the microprocessor, some to the smartphone. One way to think about how big the AI wave is, is to ask at what level the incumbents get disrupted and things get remade. Lots of people are experimenting with new form factors and physical hardware, and the product-experience capability is high enough that lots of software and hardware companies will get replaced. I think this is also a fundamental computer science breakthrough, in the sense that we can now practically “execute” a much larger class of functions than we could ever program by hand before.

In addition, AI coding is one of the most transformative waves to hit software development, and it is changing a lot about how software companies get made. If the cost of producing a good crashes, what kind of structure does that create in the industry? Software itself does not seem to be the primary moat in software businesses (at least once the market hill has been identified and semi-matured), and AI-assisted coding makes distribution, network effects, taste and design even more important. I am ~3x more capable of making products now than I was 2 years ago, before ChatGPT. I would not have been able to make 2 enterprise AI agents, 2 iOS apps, a web app, a PDF reader, and (nearly) a browser fork and customization, across 5-6 different programming languages, without AI changing how code is written and run. Essentially, I feel a lot of agency across my codebase. I’m guessing the best AI developers everywhere feel this.

Super Intelligence (2026?-)

I would be remiss if I did not mention that a good chunk of Silicon Valley, including Elon Musk, Sam Altman and others, has very aggressive timelines for something like “super intelligence”, possibly as soon as 2026. Their main reasoning rests on a kind of training setup where the AI is placed in an environment and improves through self-play. This started to work on top of LLMs in mid-2024 and has done well in AI coding and AI math demos. One of OpenAI’s working definitions is something like “able to do 99% of economically valuable work”, but I think that is flawed because the economy is dynamic relative to the supply of economically valuable work. My definition is: given all the information Einstein had when he came up with relativity, if an AI can independently come up with the theory and suggest the right test to prove it, that would be superintelligence. Unfortunately I do not have the apparatus to test this at the moment :).

There is a long blog post about this, written by a former OpenAI researcher who now runs a hedge fund, that got very famous about a year ago. It’s called Situational Awareness and is very influential in Silicon Valley. But about half of the people within the field do not believe super intelligence is imminent. I will have more information on this by October 15, after spending time in SF.

Thinking In Software

Sorry if this piece was long. The point was to give you all my context and enable you to think in software, and in AI. At the very least it should help you understand where I’m coming from.

Some other points and frames that didn’t fit neatly anywhere in the above narrative:

AI products have OpEx: each LLM API call costs a small amount of money, roughly Rs 0.01-0.05 on average per call. This is different from earlier software, where operational cost (apart from hosting on something like AWS) was effectively zero.
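A back-of-the-envelope version of that cost, with made-up prices and exchange rate just to show the arithmetic; actual provider rates and token counts vary.

```python
# Assumed, illustrative numbers -- not any provider's actual pricing.
price_per_million_input_tokens = 0.15    # USD, assumed
price_per_million_output_tokens = 0.60   # USD, assumed
usd_to_inr = 88                          # assumed exchange rate

input_tokens, output_tokens = 500, 300   # a typical short chat turn
cost_usd = (input_tokens * price_per_million_input_tokens
            + output_tokens * price_per_million_output_tokens) / 1_000_000

print(f"~Rs {cost_usd * usd_to_inr:.3f} per call")   # lands around Rs 0.02
```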

Product-Market-Fit: I mentioned product-market-fit (PMF) a few times above. The definition is what the words say, a time when your product so appropriately fulfills the need of a particular customer or segment that the market pulls it out of you. There is a piece from Andreessen that spells this out, but the broad idea is that every startup has two phases, before PMF and after PMF. Nothing matters if you can’t get to PMF. This is also why I think Perplexity and Cursor are fundamentally different companies. Perplexity doesn’t have PMF and is overvalued at $20B and Cursor has massive PMF and is appropriately/under-valued at $18B.

Growth: People have asked me before, what is a startup? Is it a small business? The answer is given by Paul Graham (founder of Y Combinator, famous startup accelerator). A startup is defined by growth. When startups work, they grow very quickly. Applying the above idea of PMF, this usually happens when the startup reaches PMF. Zero marginal cost of production means if it works (really works) for one customer in a specific market for a specific use case, it likely works for everyone in that use case.

In the beginning, it is much better to have some customers that love you than lots that kind of like you (your product 🙂): this is also YC advice, but it makes sense if you follow PG’s thinking above. You want to iterate your product with a narrow set of customers (ideally a small niche in a large market) and make them love your product so much that you reach PMF; then you scale and go after bigger markets. If you have a lot of people that kind of like you, you are trying to boil the ocean, like I was saying earlier. Maybe okay eventually, not in the early days.