
I went on a Big Data Spree because of Covid19

by Andrew Magdy Kamal, May 8th, 2020

Too Long; Didn't Read

Covid19 has been the closest we have come to the end of the world besides the fall of the Roman Empire, the bubonic plague, the Spanish flu, the Dark Ages, the Armenian genocide, the Holocaust, World War 2, etc. This led me to create a GitHub orgs page for most of my Covid repos called Cov19 or Coronavirus Tools in order to manage some of the things I created. The full size charts can be viewed here. The data visualization was based off of SARS-COV Data in relation to Shigellosis.


Covid19 has taken the world by storm. People have been panicking and buying toilet paper like there's no tomorrow. Celebrities have been making sure to keep us caught up on the latest videos of them eating cereal. Anxious teens and twenty-year-olds have been extra moody.

Elon Musk's Twitter is on fire. This has been the closest we have come to the end of the world besides the fall of the Roman Empire, the bubonic plague, the Dark Ages, slavery, the Spanish flu, the Armenian genocide, the Holocaust, World War 2, etc. Actual news sites are telling me to rub flower oil where the moon don't shine to get rid of the coronavirus.

Anyways, I knew it was time for me to save the world. I decided to start a big data quest on Covid19. I was originally going to just do some GitHub repos and posts on Devpost. However, I was finally inspired to push my work on Hackernoon given I had these repos sitting for weeks and months, and that people like George Hotz started doing Covid programming livestreams on YouTube.

Now, I originally decided to take my first steps in early March. I learned about a tool called Datasette, and decided to create a Datasette service based off of pre-existing GISAID data for Covid19. This led me to create a GitHub orgs page for most of my Covid repos called Cov19 or Coronavirus Tools in order to manage some of the things I created.
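For anyone curious what feeding a Datasette service looks like, here is a minimal sketch and not my exact pipeline. Datasette serves SQLite files, so the first step is getting a CSV into SQLite. The sample rows, the column names, and the `csv_to_sqlite` helper are all made up for illustration and are not the real GISAID schema.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for an exported metadata CSV.
# Column names and IDs are assumptions, not the real GISAID schema.
SAMPLE_CSV = """accession_id,region,length
EPI_0001,Asia,29903
EPI_0002,Europe,29882
"""


def csv_to_sqlite(csv_text, db_path, table="sequences"):
    """Load CSV text into a SQLite table (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys())

    conn = sqlite3.connect(db_path)
    # Every column is stored as TEXT to keep the sketch simple.
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')

    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()
    return conn


# ":memory:" keeps the demo off disk; use a file path for a real database.
conn = csv_to_sqlite(SAMPLE_CSV, ":memory:")
print(conn.execute('SELECT COUNT(*) FROM "sequences"').fetchone()[0])
```

With a file path such as `cov19.db` instead of `:memory:`, running `datasette serve cov19.db` would then expose the table in the browser.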

Above was the original image for my first Devpost, which primarily focused on announcing the series of tools I created related to Covid19. Now, something worth mentioning is that I was highly interested in distributed computing.

As some may know, I have a distributed computing startup, I'm working on the Decentralized-Internet SDK, and I run a bio-statistical computing pipeline, so I know a thing or two about the market. This was one of the primary reasons why I follow Folding@home and know a lot about some of the work they are doing.

This is why their GitHub was one of the first places I looked for data resources. Folding@home had a list of proteins they were targeting, which I used for the next Datasette services I wanted to create.

That being said, I also wanted to start doing some data visualization stuff. Perhaps I was going on a tangent, but I also started getting bored, and needed to do a data visualization case study for something else anyways. One of the first data visualization studies I did in relation to Covid19 was on RawGraphs.io.

RawGraphs isn't at all complex, and I usually use way more advanced programs or custom code something, but for the data set that I was looking at, this would do. The repo can be seen here, and view #2 above is based off of SARS-COV Data in relation to Shigellosis. The original data was from the data set seen here, which was published by the CDC. Now that being said, this is quite a simple dataset. Let us look at something way more complex.

Data View #1

Data View #2

Data View #3

Data View #4

Data View #5

The above data views are related to GISAID data that was visualized with the program RapidMiner Studio. The full size charts can be viewed here. Now, a few pointers on how I did this data visualization:

1) The data file could have been scrubbed or cleaned for the purpose of the visualization. I didn't scrub or clean the data file to remove things such as authors or URLs, although doing so would have made the data more meaningful by leaving just the core data such as genes, regions, length, sex, accession IDs, etc.

2) These were the genetic epidemiology data sets, and they seemed to agree with what many people already know. The origin host seemed to be a Chinese horseshoe bat prior to human-to-human transmission.
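The scrubbing step I skipped would not have been much work. Here is a rough sketch, assuming the export is a CSV and that the columns to drop are literally named `author` and `url` (both names are assumptions, as is the `scrub_csv` helper):

```python
import csv
import io

# Assumed names of the non-core columns; adjust to the real header.
DROP_COLUMNS = {"author", "url"}


def scrub_csv(csv_text, drop=DROP_COLUMNS):
    """Return the CSV text with the unwanted columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = [c for c in reader.fieldnames if c.lower() not in drop]

    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(
        out, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


print(scrub_csv("gene,region,author,url\nS,Asia,Doe,http://example.org\n"))
```

The cleaned text can then be loaded into the visualization tool in place of the raw export.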

Now, something that is quite interesting is that prior to doing the RapidMiner visualizations, I was really into following the news on the Coronavirus. However, I was also tired of a lot of the misinformation out there.

I was following Covid19 before there were 125,000 cases. In fact, I was following even before that and before the first reported case in the United States. It was when there were 125,000 cases that I first started openly talking about the data.

I predicted it could pass 5 million cases worldwide in a matter of months, and so on. However, lots of this was said in order to calm people down. As of 5/2/2020 8:48PM, there are so far 3.42 million cases.
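To put those numbers in perspective, here is a quick back-of-the-envelope sketch that uses only the three case counts mentioned above. It counts how many times the total doubled between 125,000 and 3.42 million, and how much of a doubling is left before 5 million. It says nothing about how long that takes, since that depends on growth rates I am not modeling here.

```python
import math


def doublings_between(start_cases, end_cases):
    """Number of doublings needed to grow from start_cases to end_cases."""
    return math.log2(end_cases / start_cases)


so_far = doublings_between(125_000, 3_420_000)
remaining = doublings_between(3_420_000, 5_000_000)

print(f"{so_far:.2f} doublings so far, {remaining:.2f} more to reach 5 million")
```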

I made a Udemy course months ago, early on, called the Covid19 survival guide, which was data-centric. My whole point is: 1) Look at severe cases of Covid19 as a key metric over total cases 2) Study herd immunity and practice common sense measures centered around age and the vulnerable rather than policies for all age groups 3) Negative economic offsets can cause death too 4) The media to an extent has over-hyped some data and information. That being said, I am not trying to get political.

It is radical to say open everything up and everyone can go hold hands and sing Kumbaya while a deadly virus is spreading because of the Dow Jones. However, it is also radical to say close everything down and make everyone unemployed, whether Covid could affect them the same way or not, regardless of their civil liberties.

We won't have a vaccine for 2 years? Okay, bro, you just can't go see your loved ones or work for 2 years, it's fine. There are two extremes, and I want to be an unbiased mediator between both of them.

Now that being said, I kind of started getting carried away, and went to the next part of my "big data spree":

As you know, I am into bio-statistical computing, which led me down a rabbit hole. After watching a video on ReasonTV, I mirrored something called Coronope. Coronope was basically a proposal by some biohackers wanting to crowdsource a Coronavirus vaccine. I thought it was interesting, and our startup would do the same if it wasn't for those pesky regulations. However, I still went down the rabbit hole.

This rabbit hole I went down was almost as bad as the conspiracy theorist rabbit hole. I SUGGEST NOBODY DOES THIS. Nah, I'm just kidding. I just started messing around with more software in the "lab". I found a repo on GitHub called VxAfee, whose files I decided to convert and visualize with SnapGene Viewer. One of the visuals is seen above, and you can see the whole project here.

This was just for fun, as further inspection revealed that this combination is highly unlikely to work, and it is for demonstration purposes only. As in... see the power of open source and code, but don't use it. Man, I wish I could do more. Oh wait.

I kind of got bored, so I found out about this game you can download called FoldIt. I even started a repo to keep track of my puzzle scores. However, after day #1, I was like this is taking a really long, long time. Also ain't I supposed to be a "scientist"?

My laptop was getting burning hot really quick too, and since burning one of my computers wasn't on my todo list, it was time to do something else.

That something else was kind of ambitious. It was also kind of silly. Looking back I was like, "wow, how did I think I was going to impress anybody or be that much help compared to the big guys?". Maybe it was better than doing nothing.

Anyways, a "few years ago", I wanted to make an "underwater aqua-man suit". This was literally something I wanted to do for a serious innovation competition where I needed to pitch in front of NASA scientists (I opted out to create Underwater Wireless Telecom networks instead).

Also, I am pretty sure in terms of building the closest thing to an underwater aquaman suit, the legendary Peter Sripol beat me to it. To be honest though, I was being more inspired by the whole artificial gills concept going around in the tech community.

Anyways, I know I am going on a tangent. My old design was also, as you can see, kind of repurposed from a gas mask 3D model on some open asset repo. It was a while ago, and my 3D modeling skills weren't nearly as good as they are now.

You might be saying, "why are you mentioning this in an article about Covid19, on Hackernoon?", well I am getting there. Many people have been doing Open Ventilator concepts, so I kind of wanted to jump on the bandwagon.

My idea was like, hey, this would cost like $350 to build, and likely it can even be lowered to less than $100. If people can do that when there are ventilators that are $18k+, that would be savage. The components list I thought I needed from my research was:

  1. ARDUINO UNO R3 for Airway Control
  2. Multi-Servo Motor Controls for Arduino
  3. Arduino Pan & Tilt Mounting Kits
  4. Airway filters, valves and Lung Bags
  5. CPAP Hose Adapter and Mask Liner
  6. Medical Air Pump device such as Airsense
  7. If not medical airpump device, then airway controller + regular pump
  8. Portable battery or power station type device (dependent on chosen components)

Anyways, there were a few problems. 1) I needed access to a makerspace to build my device. My makerspace lab access had expired and I was limited in going outside at that time. 2) FDA + HIPAA regulations, further testing and other stuff.

This is why it was just a project. My repository with its disclaimers is here and the Devpost I made is here. All in all, I consider that more of a failed attempt to resurrect an old (and ridiculous) concept that never went anywhere.

However, I think the Open-ventilator concept I was working on still has some potential if I have more resources. That may be decided in the future. I think other people like this guy may be further along than me. I also know this isn't necessarily a "big data thing", but listen guys, I was taking a break. Now let us look into even more stuff my mad scientist self was doing.

This is where I decided to take some of my efforts and may have gone a bit too far. I may have really done it this time, guys. Okay, not really. All I did was convert some files from the original data sources I had and view them utilizing SWISS-MODEL's software. You can see the original repo here, and this explanatory Devpost.

I wanted this to basically be a proposal for a drug discovery toolkit using those spike receptor sources. This was actually something small I did, mainly just converting a genomics file. I don't want to credit myself with doing something that impressive.

Yes, I had to convert a PDB file and make it compatible, remove illegal characters and wait literally hours to see the report finally run, but it was literally no big deal. It didn't require that much skill. I got bored, and decided I wanted to do even more.
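For the curious, the "remove illegal characters" part can be sketched roughly like this. It is an illustration under assumptions, not the exact cleanup I ran: it treats anything outside printable ASCII as illegal, and it swaps those characters for spaces instead of deleting them, because PDB records are fixed-width and deleting a character would shift the columns. The `clean_pdb_text` helper is a made-up name.

```python
def clean_pdb_text(text):
    """Replace non-printable-ASCII characters with spaces (assumed rule)."""
    # A leading byte-order mark is an extra character, so drop it outright.
    text = text.lstrip("\ufeff")

    cleaned_lines = []
    for line in text.splitlines():
        # Keep printable ASCII (codes 32..126); pad everything else with a
        # space so the fixed-width PDB columns stay aligned.
        kept = "".join(ch if 32 <= ord(ch) <= 126 else " " for ch in line)
        kept = kept.rstrip()
        if kept:  # skip lines that end up empty
            cleaned_lines.append(kept)
    return "\n".join(cleaned_lines) + "\n"


raw = "HEADER  SPIKE\u00a0PROTEIN\r\nATOM      1  N   MET A   1\ufeff\n\nEND\n"
print(clean_pdb_text(raw))
```

Whatever the viewer or modeling tool actually rejects may differ, so the allowed range is the first thing to adjust.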

The image above is for a chart and analysis I did on TradingView seen here. Now the background story on, "why I did this", is obvious. Lots of people think the economy is crashing. Yes, it did have a crash. It was kind of a steep one.

However, people are reacting as if Caligula or his not so sane nephew Nero came back from the dead to play the fiddle as the world is burning down. The economy isn't crashing as badly as one would expect. When stocks or other asset classes go down by a large chunk, let us say 25 or 30% in weeks, that doesn't mean those businesses are suddenly 25 or 30% smaller.

Most of these drops are unjustified, and people have been panic selling, hence losing money on top of the already bad financial losses.

I decided to come as Mr. Common Sense and look at the charts and be like, listen guys me and other people are starting to see this trend, here is data and wave correlations that look like a positive retracement is happening soon.

Not offering financial advice, but just saying, hey, the economy looks like it may start going back up given people are like, "maybe the apocalyptic hype didn't make much sense". I think the YouTuber Graham Stephan explained panic selling pretty well, too, in this video.

That being said, I have another surprise for those who made it this far in the article.

That is right, I decided to go even further down the big data rabbit hole. A small "distraction" led me to finding this data set by the Qatar Computing Research Institute. I decided to make the first visualization with RapidMiner, and most RapidMiner graphs were rubbish given the simplicity of the data set.

I thought the best RapidMiner visual was the linear bubble chart. The circular diagram in RawGraphs with symbols worked better in my opinion for the simpler things. That being said, this data was updated as of April 8th, 2020 by the time I visualized it. I don't know why I did this.

Perhaps I just wanted to be like, hey, I visualized Twitter data, or hey, I visualized geographic data. I could have done something advanced with some sort of sentiment analysis, ethical AI, or anything but this. Really, the note I am ending on is one-upping what could have been an Excel visualization? This leads me to my next point.

There are a lot more projects I want to do in similar realms, like perhaps a Hantavirus model, forecasting dashboards, research dashboards, etc. Maybe I want to come back to these "things" later.

Ways you can help? 😊

There are many ways you can help me with my projects and research. The stuff you have seen related to Covid recently took like 2% of my time. Okay, I will be generous, and say maybe like 5% of my time. This was a very small side project.

Besides this, I had to manage my Udemy courses (where I recently taught about 15 new subjects), run DigitalCPR, try to start QuantPortal LLC., analyze 60+ stocks, write 80+ pages related to bio-statistical computing for something in the works, and actively manage my SDK. Some weeks I actively work up to 125 hours.

Most of my time is consumed in managing my SDK. One of the ways you can help support me is sponsoring my SDK through Open Collective. Other ways you can help could be through Tidelift or through following Lonero on Twitter or Minds.com.

Also, I don't like being a show boater and being like, oh, sponsor me, follow me, etc. This is why I put this at the very end. Trust me though, everything thus far has been mostly out of pocket and any support can help. Sharing this article shows just as much gratitude, as does even reading some of this and appreciating a bit of the work.

Not gonna claim this deserves any attention at all, as there are people my age likely doing far superior things with their time. That being said, hope you enjoyed this big data spree I went through. Can't wait for you to follow what I may be up to next.