Dan Goncharov, Francesco Mosconi, and Dave Elliott
Dan Goncharov, Head of AI and Robotics at 42 Silicon Valley, had the idea of gathering information from scientific papers. As the idea gained interest, others joined the project: Francesco Mosconi, a Google Developer Expert and the CEO and co-founder of Zero to Deep Learning; Dave Elliott, from the Google Cloud team; and volunteers including students and alumni from the 42 Network. We interviewed them to find out more about the project.
We also interviewed the 42 Network students and alumni working on the project; you can find that interview here.
Tell us more about the project and what you’re working on.
Francesco: With Dan, Dave, and a group of other people, we got together in a volunteer project to collect and analyze all of the existing scientific literature, almost 4 million scientific papers. The goal is to make it available to researchers and the broader community so they can access information quickly and accurately. Currently, scientific information is partly siloed, and even the parts that are accessible are not easily searchable. We've talked with subject-matter experts who told us about the hurdles they encounter in their work researching new drugs or other biomedical problems, and with that knowledge we set out to solve some of their problems, specifically the problem of retrieving and organizing information.
We took a state-of-the-art approach that combines techniques from information retrieval with machine learning and deep learning, that is, neural networks. We have a team of about 20 people, all volunteers. Some come from the Google Developer Experts (GDE) Machine Learning community and some are from 42, and they're making amazing progress, which we're very happy about.
Google has also been a tremendous partner. Dave represents Google, and they have given us access to some of the fastest computers in the world. They've also sponsored us with credits on their platform so that we have the compute infrastructure to run the project. Recently we've received interest from Stanford and other organizations, so at this stage we're building partnerships to expand the project's reach.
Why do you think this is the most impactful thing you can work on right now?
Dan: The main focus right now is to assist in the research of COVID-19.
Francesco: That's how the project started, and what motivated us in the first place. It's an intersection of what we can do, what we are good at, and what is needed. Probably the things most needed are respirators for ERs and vaccines. Unfortunately, we are not experts in making either of those, so we took the skills we excel at and looked for a project that could be useful to researchers. That's why we interviewed so many of them: we wanted to make sure that what we are building can actually help them.
Dave: From Google's perspective, my role is to work with software developers who are interested in implementing AI in their apps and projects. My team is a group of engineers who go out, talk to developers, and show them how to do things better and faster using our infrastructure, so this is a perfect fit for a collaboration.
We have folks who are really deep in ML, the ML GDEs; we have folks at 42 who are great all-around computational scientists; and we have folks at Google who are experts in specific parts of the infrastructure and methodologies. So we've been bringing in the experts we need to move the project along as quickly as we can. It's been really exciting and interesting to work on this project. COVID-19 is a compelling use case, but it's also an interesting domain: there is now an untenable amount of medical and research information, and it continues to grow at an ever faster pace. At the same time, there are state-of-the-art, breakthrough tools that let you understand that information, identify knowledge, and find latent insights across all of that research.
How are you measuring success?
Dave: At the end of the day, I think it's usage. We'd love to be able to say, "Our tools were used to find the cure for COVID-19." I mean, that's the ultimate goal: to find the vaccine and therapies for COVID-19. That would be the ultimate outcome and the success we're looking for. But more realistically, I think it's usage. We've done a fair amount of work with our intended users. To me, success is getting the dataset out, getting the tools out, and seeing them actually adopted by this target audience.
What tech stack are you using?
Francesco: We're using a bunch of tools. We're definitely using a lot of Google Cloud infrastructure for collaborating and prototyping: Cloud Storage to store our dataset, and a tool called Elasticsearch for ranking documents and retrieving results. TensorFlow is our framework of choice for building neural networks. We're also using Apache Beam and BigQuery, and we will likely use other infrastructure tools for inference, such as Google's AI Platform Prediction.
How do you think Google and other large corporations should continue to contribute to these causes moving forward?
Dave: There's been a lot of media coverage of large companies, and in particular large tech companies, collaborating to help solve the current crisis. I think we [Google] publicly said we're contributing $850 million, and we have a huge number of people working on various aspects of it. There are so many projects inside Google to help with the current crisis that it's hard to even keep track. Even this one I'm only peripherally involved with. It's really driven by our partners, the ML GDEs and 42; I'm just supporting them. This hardly counts from a corporate perspective, but it's certainly something a lot of companies have been taking very seriously, and we're all trying to do what we can.
Francesco: I cannot emphasize enough that although the initiative is ours, Google, and Dave specifically, have been a tremendous help in providing resources. I think the biggest one has been access to information and experts, because we are all volunteers at different stages of our journey toward mastering machine learning and these tools. Dave has been amazing at connecting us with people who can answer our questions.
How have you seen our 42 Silicon Valley students and other students in the industry contribute to this effort?
Dave: They've been fantastic, and terrific at delivering. Dan and the 42 team have really led the dataset collection effort, which is a complicated and challenging process and one of our biggest points of differentiation. They've also been great at building the pipeline.
Find out more information about the project on their website.
published by Kim Alvarez – May 8, 2020