I have a confession to make.
I was a CS major in college and took very few advanced math or stats courses. Besides basic calculus, linear algebra, and Probability 101, I took only one machine learning class. It covered very concrete topics — SVMs, decision trees, probabilistic graphical models — that I rarely encounter today.
I joined a machine learning laboratory in college and was mentored by a senior PhD student. We actually had a couple of publications together, though they were nothing but minor architecture changes. Now that I'm in grad school doing AI research full-time, I believed I could continue to get away with zero math and clever lego building. Unfortunately, I have failed to produce anything creative. What's worse, I find it increasingly hard to read some of the latest papers, which probably don't look complicated at all to math-minded students. The gap in my math/stats knowledge is taking a hefty toll on my career.
For example, I'd never heard the terms "Lipschitz" or "Wasserstein distance" before, so I'm unable to digest the Wasserstein GAN paper, let alone invent something like it myself. Same with f-GAN (https://arxiv.org/pdf/1606.00709.pdf) and SELU (https://arxiv.org/pdf/1706.02515.pdf). I don't have the slightest clue what the 100-page SELU proof is doing. The "Normalizing Flow" paper (https://arxiv.org/pdf/1505.05770.pdf) even involves physics (Langevin flow, stochastic differential equations) … each term seems to require a semester-long course to master. I don't even know where to start wrapping my head around it all.
I've thought about potential solutions. The top-down approach is to google each unfamiliar term in the paper. That doesn't work at all, because the explanation of one unknown points to three more unknowns. It's an exponential tree expansion. The alternative bottom-up approach is to read real analysis, functional analysis, and probability theory textbooks. I'd prefer a systematic treatment, but …
Reading takes a huge amount of time. I have the next conference deadline to meet, so I can't simply set aside two months without producing anything. My advisor wouldn't be happy. But if I don't read, my mindless lego building will not yield anything publishable for the next conference. What a chicken-and-egg vicious cycle.

The "utility density" of those 1000-page textbooks is very low. Plenty of pages are not relevant, but I don't have an efficient way to sift them out. I understand that some of the knowledge might be useful some day, but the reward is too sparse to justify my attention budget. The vicious cycle kicks in again.

In an ideal world, I could query an oracle with "Langevin flow". The oracle would return a list of pointers: "Given your current math capability, you should first read chapter 7 of Bishop's PRML book, then chapter 10 of information theory, and then chapter 12 of …". Google is not such an oracle for my purpose.