
But why are AI academics depressed at all?

Some thoughts on the piece "Choose Your Weapon: Survival Strategies for Depressed AI Academics"

Recently, Julian Togelius dropped this piece on arXiv and shared strategies for surviving as an AI academic in 2023 on Twitter.


This is a very recent phenomenon: people are increasingly anxious about their role in academia, especially Master’s and Ph.D. students. Some wonder whether their fields are “dead” or effectively solved, while others are desperately searching for tasks that big corporations haven’t yet taken over with their large models. As a fledgling academic myself, I am very optimistic that academics have nothing to worry about, because the recent progress we see from industry is not necessarily research progress. To see this, we have to start with the basics.

First, let’s just go over the dictionary definitions of research and engineering.

Research:

Engineering:

The engineer’s role is to come up with designs that make things work. This can be (and generally is) done artfully, and it does not require knowing the truth of why things work. The researcher’s primary job, by contrast, is to systematically investigate a subject in order to establish facts and reach new conclusions.

However, this sounds very far from what today’s researcher is trying to achieve. Today’s researcher is obsessed with beating the SOTA set by some rival “research” group, which essentially makes them an engineer obsessed with sitting at the top of a benchmark. This worked fine for the last few years of deep learning, until the “Big Brother” AI companies stepped in and obliterated these efforts. That, I think, is the primary cause of depression in academics whose efforts are more engineering-centric.

Don’t get me wrong, I think the AI industry has enabled great scalability and accessibility, but it is driven not by finding out the truth, but by just making things work. Emergence is a property that makes things work, yet the companies that train these big models (DALL-E, GPT, Stable Diffusion) are not even a tad bit interested in how or why it works, or how emergence even happens in these systems. This is the core difference between industry and academia in AI. Emergence just works, and that is the engineer’s haven: scale things with huge amounts of compute and data. Even then, patchwork like RLHF cannot escape the curse of dimensionality combined with the combinatorial nature of the problem.
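To get a feel for the scale mismatch that last sentence points at, here is a back-of-the-envelope sketch. The numbers are my own illustrative assumptions, not measurements of any real system or anything from the paper: even a modest vocabulary and response length give an output space that no realistic amount of human feedback can cover.

```python
# Back-of-the-envelope: how sparsely human feedback covers the output space.
# All numbers below are illustrative assumptions, not real measurements.

from math import log10

vocab_size = 50_000          # assumed tokenizer vocabulary
seq_len = 100                # assumed response length in tokens
feedback_labels = 1_000_000  # assumed number of human preference labels

# Number of distinct token sequences of this length is vocab_size ** seq_len.
# Work in log10 to avoid astronomically large integers.
log_sequences = seq_len * log10(vocab_size)

print(f"log10(#possible sequences) ≈ {log_sequences:.0f}")           # ~470
print(f"log10(#feedback labels)    ≈ {log10(feedback_labels):.0f}")  # 6
# The gap (hundreds of orders of magnitude) is the combinatorial curse referred
# to above: feedback can shape behavior only through generalization, never
# through coverage of the output space.
```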

This is where academics come in: mitigating the burden of compute and data by finding structure in the problem domain. Our brains do not have infinite capacity, nor are they preloaded with everything in the universe, which suggests that we exploit some underlying structure of the world we live in. In my opinion, the structure of intelligence is what AI academics should strive to uncover, analogous to how physicists try to find the structure of matter and its interactions in the universe.
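As a concrete, well-worn example of what exploiting structure buys you, compare a fully connected layer with a convolutional layer on the same image. The sizes below are arbitrary assumptions chosen only to make the comparison concrete:

```python
# Illustrative parameter counts: exploiting spatial structure vs. ignoring it.
# Sizes are arbitrary assumptions, chosen only to make the comparison concrete.

h, w, c_in, c_out = 224, 224, 3, 64   # input image dimensions and output channels
k = 3                                 # convolution kernel size

# Fully connected layer: every output unit sees every input pixel.
dense_params = (h * w * c_in) * (h * w * c_out)

# Convolutional layer: weights are shared across spatial positions
# (translation structure), so only the small kernel is learned.
conv_params = (k * k * c_in) * c_out

print(f"dense: {dense_params:,} parameters")  # roughly 483 billion
print(f"conv:  {conv_params:,} parameters")   # 1,728
# The structural prior (translation equivariance) is what makes the problem
# tractable without industry-scale compute; this is the kind of structure the
# post argues academics should be hunting for.
```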

Again, physicists are not obsessed with building rocket ships or Dyson spheres; that’s for the engineers, with their ingenuity and artfulness. This does not mean that science and engineering are mutually exclusive, but rather that scientists and engineers need different mindsets depending on what they are trying to do (including the realization that they may be on the wrong team). The influx of engineers masquerading as researchers is what has brought about this existential crisis for the modern AI researcher.

With that in mind, here’s my review of the strategies proposed in the paper (please read the paper first if you haven’t - I’m not gonna describe it again here):

  • Giving up: I hope it’s meant to be funny or sarcastic. Giving up on doing high-impact things should be unacceptable to the researcher. But if the researcher believes their goal is to make things work, they are gonna lose to the industry giants anyway. They stand a chance at making a noticeable impact as a researcher only if they strive to find universal truths about intelligence.

  • Try scaling anyway: Good luck! Again, are you being a researcher or an engineer here?

  • Scale down: This is one way the researcher can conduct experiments without worrying about compute. One gotcha is that people tend to invent their own niche in toy-environment land; one should also have an accompanying inductive argument showing that if the problem scales up, the solution does too (see the sketch after this list).

  • Reuse and remaster: Could be useful for understanding properties of these larger models without explicitly running them.

  • Analysis Instead of Synthesis: See 4. Sure, it’s a valid approach to doing research, but is everything worth analyzing?

  • RL! No Data!: RL is essentially “search”, and I fundamentally do not like this subfield because of the sheer number of papers that contribute nothing. RL is brittle, and a lot of papers modify or tailor the environment to make their method look sweeter. This happens comparatively less in computer vision, where the benchmark datasets and evaluation criteria are fixed.

  • Small Models! No Compute!: Again, useful for analyzing and observing the behavior of models without huge amounts of compute. It’s useful for some of the fields the author mentions, but also whenever the approach exploits some structure in the problem.

  • Work on Specialized Application Areas or Domains: This is an escape hatch, and disruptive startups use this idea to stay under the radar and avoid clashing with incumbents (thanks, entrepreneurship classes!).

  • Solve Problems Few Care About (For Now!): Similar to 8, but it also needs to have real-world impact. If few people care about a problem, ask why. As with startups, most fail because they jump in to pursue a “cool idea” without doing the market research on whether people even want it (even if they don’t yet know they do). This needs foresight, a bit of luck, and lots of discussions with the people who might care, about why those problems haven’t been attended to.

  • Try Things that Shouldn’t Work: Very useful for uncovering truths about why these things do not work. This has real-world impact, because provable statements about why things do not work are tremendously useful.

  • Do Things that Have Bad Optics: I perceive these as “fun projects”, oriented more towards the engineer than the researcher. But hey, if you find something that sticks out from the norm, let me know.

  • Start it Up; Spin it Out!: Not related to the goals of a researcher. Next!

  • Collaborate or Jump Ship!: In my opinion, and given how I described the academic’s goals, collaborating with industry is not even necessary. It’s a good thing to work with folks from companies, experience corporate culture, and earn more than a measly PhD stipend, but it is by no means necessary for conducting good research.
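Returning to the “scale down” point above: one lightweight way to back up a toy-environment result is to measure empirically how the method behaves as the problem grows. Here is a minimal sketch, with a hypothetical placeholder solve standing in for whatever method is actually being studied:

```python
# Minimal scaling check for a toy-environment result.
# `solve` is a hypothetical stand-in for the method under study; here it just
# sorts a random instance so the script runs end to end.

import random
import time

def solve(problem):
    # Placeholder "method": replace with the algorithm being evaluated.
    return sorted(problem)

def run_scaling_study(sizes):
    for n in sizes:
        problem = [random.random() for _ in range(n)]
        start = time.perf_counter()
        solve(problem)
        elapsed = time.perf_counter() - start
        print(f"n={n:>9,}: {elapsed:.4f}s")

# If cost (or error) blows up faster than the claimed complexity as n grows,
# the toy result probably will not survive contact with the full-size problem.
run_scaling_study([1_000, 10_000, 100_000, 1_000_000])
```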

I do not like Section 15 of the paper: it paints a picture in which academia is helpless without access to industry-scale resources, and in which active sharing of IP and collaboration are needed on the part of AI companies, which I have argued is not necessary.

Especially this cryptic statement: “On the other hand, a lot could be gained from having academics contributing to your code and training for free.” Like what?

Overall, I really liked this paper and the points it made, although I do not necessarily agree with everything. It did make me think, however, about the roles of academics and engineers in this era of supercharged AI advancements. I’ve heard stupid statements like “industry is leading AI research because of ChatGPT”, “computer vision is dead”, and “eventually academics will find a task that industry doesn’t look at or care about”. If you agree with what I think the role of academics is, there is really no reason to worry about anything. It is tempting to see all the hype and feel the FOMO, but that is the primal-brain urge you gotta resist; make more use of your cortex 😉

Cheers!

This post is licensed under CC BY 4.0 by the author.