GitHub built a new search engine for code 'from scratch' in Rust

Feb, 09, 2023 Hi-network.com

The Rust programming language continues to grow in popularity and now developer platform GitHub has used it to build its new code-focused search engine, Blackbird. 

Instead of perusing forums for answers, GitHub wants users to use its search engine, which is currently in beta. 

Rust is consistently the most loved (but not the most widely used) programming language among developers, according to developer question-and-answer site Stack Overflow.

As a new project, Blackbird is an interesting reference for Rust, which is more often adopted to build new features in projects originally written in C/C++ and is more popular for systems programming than for building apps. The CTO of Microsoft Azure last year declared that all new projects should be written in Rust rather than C/C++ because of its memory safety features.

But why build a search engine from scratch when GitHub could use another open-source solution, such as Apache Cassandra, Solr, or Elasticsearch?

"At first glance, building a search engine from scratch seems like a questionable decision. Why would you do that? Aren't there plenty of existing, open source solutions out there already? Why build something new?" writes GitHub's Timothy Clem. 

His short answer is that GitHub hasn't found success using general text search products to power code search.

"The user experience is poor, indexing is slow, and it's expensive to host. There are some newer, code-specific open source projects out there, but they definitely don't work at GitHub's scale," he writes. 

GitHub started experimenting with Elasticsearch in 2011, but Clem notes it took "months" to index GitHub's then roughly eight million repositories. Today, GitHub supports about 200 million dynamic code repositories.

GitHub's Blackbird currently supports searching across about 45 million repositories, so it provides only partial coverage, but it still enables code searching across 15 terabytes of code and 15.5 billion documents for programs written in Python, Java, and JavaScript. 

The Rust-written custom search engine, Blackbird, is more efficient and gives GitHub "substantial storage savings via deduplication and guarantees a uniform load distribution across shards", according to Pavel Avgustinov, VP of software engineering at GitHub.  

He argues GitHub's scale means it can't use a Unix 'grep' (global regular expression print) for search. Scanning potentially hundreds of terabytes of code held in memory would simply be too slow, and queries would take too long.
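To make the scale argument concrete, here is a rough back-of-the-envelope sketch in Rust. The corpus size comes from the 115 terabytes cited in the next paragraph; the per-core scan throughput and the core count in the usage note are illustrative assumptions, not figures from GitHub.

```rust
/// Estimated wall-clock seconds for a brute-force, grep-style scan of a corpus.
///
/// All three parameters are inputs to the estimate. The throughput and core
/// count a caller passes in are assumptions, not measured GitHub numbers.
fn linear_scan_seconds(corpus_bytes: f64, bytes_per_sec_per_core: f64, cores: f64) -> f64 {
    // A linear scan reads every byte once, so time = size / total throughput.
    corpus_bytes / (bytes_per_sec_per_core * cores)
}

/// Convert decimal terabytes to bytes (1 TB = 10^12 bytes).
fn terabytes(tb: f64) -> f64 {
    tb * 1e12
}
```

Calling `linear_scan_seconds(terabytes(115.0), 0.5e9, 2048.0)`, that is, a hypothetical 0.5 GB/s per core across 2,048 dedicated cores, gives roughly 112 seconds for a single query. Even with generous hardware assumptions, a scan-everything approach does not reach interactive latency, which is why a precomputed index is needed.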

Clem notes that deduplication and GitHub's approach to indexing reduced the 115 terabytes of content it needed to search to 28 terabytes of unique content. The index itself is now 25 terabytes.
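The two ideas mentioned above, deduplication and an even spread of load across shards, can be illustrated with a small Rust sketch. It is not Blackbird's code: it assumes documents are assigned to shards by a hash of their content, and the function names and the use of the standard library's `DefaultHasher` are hypothetical choices made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

/// Hash of a document's content; a stand-in for a real content ID (assumption).
fn content_id(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Pick a shard from the content hash, so identical content always maps to
/// the same shard and distinct content spreads roughly evenly.
fn shard_for(content: &str, shard_count: u64) -> u64 {
    content_id(content) % shard_count
}

/// Keep one copy of each distinct document and group the survivors by shard.
fn dedup_into_shards<'a>(docs: &[&'a str], shard_count: u64) -> HashMap<u64, Vec<&'a str>> {
    let mut seen: HashSet<&'a str> = HashSet::new();
    let mut shards: HashMap<u64, Vec<&'a str>> = HashMap::new();
    for &doc in docs {
        // `insert` returns false when this exact content was already stored.
        if seen.insert(doc) {
            shards
                .entry(shard_for(doc, shard_count))
                .or_default()
                .push(doc);
        }
    }
    shards
}
```

Given the same file copied into many repositories, `dedup_into_shards` stores it once, which is the kind of saving behind the drop from 115 to 28 terabytes. Keying the shard on content rather than on repository also avoids one very large repository overloading a single shard.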
