Arduino Library Index Health Check
Or, how healthy are the 6,000+ libraries in the Arduino Library Manager? 🤔
Let's do some "Big Data" things, ask ChatGPT, make a couple of thousand API queries, and draw important-looking charts! 😀
(And here is also everything needed to reproduce this yourself)
First, what?
The Arduino world relies heavily on libraries to simplify projects. Somebody once figured out how to read from and write to an SD card, and now you just have to install the SD library (SD lib) and don't have to worry about the details behind it any more. Nice!
The Arduino IDE includes the "Arduino Library Manager", a tool that lets you search for libraries, install them, and update them later on.
This tool has a registry, hosted on GitHub so that everybody can contribute to it: https://github.com/arduino/library-registry/
Second, and?
Arduino has defined a specification for libraries (https://arduino.github.io/arduino-cli/0.35/library-specification/).
It defines things like naming conventions, folder structure and metadata, so that a library can actually be used.
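To make the metadata part a bit more concrete: each library ships a library.properties file with simple key=value lines. Here is a minimal sketch of reading such a file and spotting missing keys. The list of keys checked is only an illustrative subset that I picked, and the sample content is made up, so treat this as a rough idea and not as a replacement for the real checks.

```python
# Illustrative subset of metadata keys; the specification lists more.
EXPECTED_KEYS = ("name", "version", "author", "maintainer", "sentence")


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_keys(props, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent or empty."""
    return [key for key in expected if not props.get(key)]


# Made-up sample content, not taken from a real library.
SAMPLE = """\
name=Example Sensor
version=1.0.0
author=Jane Doe
sentence=Reads a made-up sensor.
"""

print(missing_keys(parse_properties(SAMPLE)))  # ['maintainer']
```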
To make it easy to comply there is also a tool, Arduino Lint, that automatically checks whether a library conforms to the specification.
For every library there is a Lint report where you can see if all is well or not. For the SD library from above, this is the report link: Lint Log
There is even a GitHub Action that can run it automatically on every change.
So you might think that all the Arduino libraries follow it? Hint: no, not even close... 🙁
Let's take a step back
How can we say the Library Index is healthy? If it only contains healthy libraries, right? But what is a "healthy" library?
- It is used by many people
- It is actively maintained
- It follows the specification
Here is my take on measuring these three points:
- It is used by many people -> Stars. More stars on GitHub/GitLab/... suggest that more people use it.
- It is actively maintained -> Last edit timestamp and open issue count. A recent edit and no open issues? All good.
- It follows the specification -> The fewer Lint errors, the better.
Further, if I ask ChatGPT what a "good Arduino library" is, it adds:
- Documentation
- License
- Examples
- (Some more, but I skip those)
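The three measurable points (stars, maintenance, Lint errors) can be put into a small data structure. This is only a sketch of one possible way to turn them into yes/no flags. The class name, the function name and all thresholds (10 stars, 365 days, zero open issues, zero Lint errors) are hypothetical values chosen for illustration, not numbers used for the charts in this post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LibraryStats:
    stars: int
    last_edit: datetime
    open_issues: int
    lint_errors: int


def health_flags(stats, now, min_stars=10, max_age_days=365, max_open_issues=0):
    """Map the three criteria to booleans. All thresholds are hypothetical."""
    recent = (now - stats.last_edit) <= timedelta(days=max_age_days)
    return {
        "used": stats.stars >= min_stars,
        "maintained": recent and stats.open_issues <= max_open_issues,
        # Simplification: "fewer is better" is reduced to "none at all".
        "conforms": stats.lint_errors == 0,
    }


# Made-up numbers for a fictional library.
example = LibraryStats(
    stars=50,
    last_edit=datetime(2023, 10, 1, tzinfo=timezone.utc),
    open_issues=0,
    lint_errors=2,
)
print(health_flags(example, now=datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

Keeping the thresholds as parameters makes it easy to see how sensitive the result is to them.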
Taking a look
To warm up with the data, let's look at some basic things. How many libraries are there exactly, and where are they hosted?
And who are the top 10 library authors?
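Both questions can be answered from the repository URLs alone. A minimal sketch, assuming one URL per line and that the first path segment is the owner or organisation (true for GitHub and GitLab style URLs, maybe not for every host). The function names are my own.

```python
from collections import Counter
from urllib.parse import urlparse


def host_and_owner(url):
    """Split a repository URL into (host, owner)."""
    parsed = urlparse(url.strip())
    segments = [part for part in parsed.path.split("/") if part]
    owner = segments[0] if segments else ""
    return parsed.netloc.lower(), owner


def summarize_urls(urls, top=10):
    """Count libraries per host and return the most frequent owners."""
    pairs = [host_and_owner(url) for url in urls if url.strip()]
    hosts = Counter(host for host, _ in pairs)
    owners = Counter(f"{host}/{owner}" for host, owner in pairs if owner)
    return hosts, owners.most_common(top)
```

To use it on the index, pass the open file: `with open("repositories.txt") as fh: hosts, top_owners = summarize_urls(fh)`.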
Going further, are there duplicate entries?
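from collections import Counter

# Read one repository URL per line and report any that appear more than once
with open("repositories.txt", "r") as file1:
    lines = file1.readlines()
dupes = [item for item, count in Counter(lines).items() if count > 1]
print(f"Dupes: {dupes}")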
Dupes: ['https://github.com/Syncano/syncano-arduino\n', 'https://github.com/thinger-io/ClimaStick\n']
Just 2, let's fix that right away: GitHub Pull Request
And how about broken ones, dead links, 404s? There are some, but they are not so quick to pin down, because of 301s and other responses.
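One way to tell dead links from moved ones is to look at the status code without following redirects. The following is a rough sketch with the standard library only. The category names and the grouping of status codes are my own choice, and some servers answer HEAD requests differently from GET, so the results need a manual second look.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, so a 301 surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def classify_status(code):
    """Group HTTP status codes into a few coarse buckets."""
    if 200 <= code < 300:
        return "ok"
    if code in (301, 302, 303, 307, 308):
        return "moved"
    if code in (404, 410):
        return "gone"
    return "other"


def check_url(url, timeout=10):
    """Send a HEAD request and return the bucket for the response."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # Raised for 3xx (redirects are not followed), 4xx and 5xx.
        return classify_status(err.code)
    except OSError:
        # DNS failures, refused connections, timeouts.
        return "unreachable"
```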
A closer look
How many Lint errors are there?
So LP010 (name too long) is the front-runner, followed by LS008 (name-header mismatch) and LP015 (name contains spaces).
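A ranking like this can be built by counting rule IDs across the stored Lint reports. The sketch below assumes that a report is plain text, that it only mentions rules that were violated, and that library rule IDs look like LP010 or LS008. Each rule is counted once per library. The report format is an assumption on my side and the sample strings in the usage line are invented.

```python
import re
from collections import Counter

# Assumed ID shape for library rules: "L", one of P/S/D, three digits.
RULE_ID = re.compile(r"\bL[PSD]\d{3}\b")


def count_rules(reports):
    """Count in how many reports each rule ID appears."""
    counts = Counter()
    for text in reports:
        # set() so that a rule repeated in one report counts only once.
        counts.update(set(RULE_ID.findall(text)))
    return counts
```

For example, `count_rules(["... LP010 ... LS008 ...", "... LP010 ..."]).most_common()` puts LP010 first with a count of 2.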
Data collection
With over 6,000 libraries to look up, this has to be done with some caution. For example, it requires API keys for GitHub and GitLab, otherwise you will run into rate limits.
The Process:
- Download the Library index file
- Query the Lint Status for each
- Query the Status on the individual hosts
- Store all the information locally
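For the GitHub part, the last two steps could look roughly like the sketch below. It uses the public GitHub REST endpoint for a repository and an API token in the Authorization header. The function names, the output folder and the file naming are hypothetical, and this is not necessarily how the collection for this post was implemented. GitLab and other hosts need their own variant.

```python
import json
import os
import urllib.request
from urllib.parse import urlparse


def github_api_request(repo_url, token):
    """Build an authenticated request for a github.com repository URL."""
    # Raises ValueError if the URL has no owner/repo part.
    owner, repo = urlparse(repo_url.strip()).path.strip("/").split("/")[:2]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def summarize_repo(payload):
    """Keep only the fields needed for the health criteria."""
    return {
        "stars": payload.get("stargazers_count"),
        # Note: GitHub counts open pull requests as open issues here.
        "open_issues": payload.get("open_issues_count"),
        "last_push": payload.get("pushed_at"),
    }


def fetch_and_store(repo_url, token, out_dir="data"):
    """Query one repository and store the summary as a local JSON file."""
    request = github_api_request(repo_url, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        summary = summarize_repo(json.load(response))
    os.makedirs(out_dir, exist_ok=True)
    name = urlparse(repo_url.strip()).path.strip("/").replace("/", "__") + ".json"
    with open(os.path.join(out_dir, name), "w") as fh:
        json.dump(summary, fh)
    return summary
```

Storing one small file per library means an interrupted run can be resumed without spending the API quota twice.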
Now the locally stored info can be used to run queries and draw charts.
All the code and the collected data are uploaded here: GitHub Data Host. So you don't need to run the queries yourself and can go right into plotting.