Leaena

Custom APIs and Web Scraping for Science

Jan
31

So my team’s most recent application, Helix, involved genome visualization. We integrated it with the 23andMe API, but still needed a way to find interesting information about specific RSIDs (the identifiers researchers and databases use to refer to specific SNPs, single base-pair variations in DNA). By far the most useful open repository of genetic information is SNPedia, but I needed access to lots of information and a way to make calls for specific SNPs from our app. Basically I needed an API. So, being ever resourceful, I decided to make my own.

Tools for the task were an easy choice. I needed a small, fast server that I could implement a web scraper on. I have always wanted a reason to use BeautifulSoup, and since it’s a Python library I knew it would be easier to build a Python server to run the API endpoints. I chose Flask because of its lightweight nature and how much it reminds me of a Node/Express server at times.

Thankfully there are some really good tutorials for both Flask and BeautifulSoup. My favorites (and the ones I referenced when I hit weirdness) were Designing a RESTful API and Website Scraping with BeautifulSoup. Both of these tutorials said a lot of things better than I could have myself.

For access to my SNPedia API and information on how to use it, check out my project on GitHub.

Week 9: Highs and Lows and… WTF I only have three weeks left??

Jan
26

My week started out fairly average. We were all rolling along on our projects and then I noticed an event on the Hack Reactor Senior calendar. Tuesday, three weeks from this past Tuesday, is Hiring Day. Three weeks?? Not even now, more like two?? Oh, god. And yet, as much of a whirlwind as this has been and as often as I have impostor syndrome, I’m a little excited. I want to see what’s out there for me and find a job and learn and grow and do my instructors proud.

One slight stumbling block for me this week: Hacker in Residence positions. I applied and think I would have been accepted, but I had to bow out. After I sat down and thought about it, I just couldn’t justify being out of work that much longer (even on a stipend). It would have been fun to learn how to teach and spend some more time hacking on personal ideas, but that’s what weekends are for, right?

We also got to demo Helix for the first time. Helix is a gene visualization app that shows you your SNPs (base pairs) from 23andMe that have traits attached to them (according to SNPedia.com). You can search traits or just browse your chromosomes for interesting info. It was built using a private beta framework (called Famo.us) that my team was lucky enough to get to be involved with. We have *fingers crossed* two more opportunities to demo Helix: one more run-through at Hack Reactor and, if all goes well, a private party/meetup for Famo.us.

Another fun thing that came out of Helix was that I got to dust off my Python knowledge. I had wanted to try BeautifulSoup (a Python library for scraping web pages) for a while now, and I needed an easy way to pull RSID information from SNPedia, so I created my own API wrapper! The code is available (including instructions on how to run it on your own) on my GitHub account. It’s a tiny Python/Flask server that only has a couple of endpoints (the ones I really needed), but I’m thinking about expanding it eventually.

And then I got sick. I came down with a cold on Friday and haven’t been to Hack Reactor since. I’ve been working from home, but mostly just trying to sleep, having weird dreams, and sounding pitiful. I’m getting better though and I will definitely be on-point on Monday to work out the last-minute details of Helix before all the demos come crashing around us.

Three more weeks until I graduate! My gift to myself – I’m attending the LAUNCH hackathon with two other women from Hack Reactor the weekend after it’s all over. I just don’t want to get lazy!


Week 8: New Surroundings, Same Routine

Jan
19

Sorry for the delay in this post. My roommate, Ava, was worried about my long hours all week so she wouldn’t let me touch my laptop on Sunday. Saturday night was card games with Hack Reactor peeps so I was out late. It was a jumble of a week and I’m writing this so late that this week is already upon me. So I think this one will be very short.

We started working at Famo.us this past week. They have a beautiful office that was converted from an apartment. It’s weird to go back to Hack Reactor now with their darker rooms and only two bathrooms, but I still miss it something fierce when I’m away. Hack Reactor feels like home, but Famo.us is a nice vacation. Our project is slowly progressing. We have some really neat ideas about gene visualization and if all goes well we’ll get to demo the awesomeness in front of a bunch of people.

Other fun things from this week included a talk on Thursday from the author of Cracking the Coding Interview and Saturday social night where I got my ass handed to me in Marvel vs. Capcom and then made people squirm in Cards Against Humanity.

One final thing: Ava talked me into buying a FitBit! I’ve been meeting all my goals every day and it’s pink so life is pretty amazing. You can find me on Fitbit here.

 


Week 7: Is This What Confidence Feels Like?

Jan
11

Coming back from break was wonderful. I really missed this place and these people and I’m at a point now where I’m excited to walk in the front door of this space. I am really, truly a Software Engineer. I have been for a long time, but it took this place and these people to pull that knowledge out of myself. I started the week with giant hugfests of awesome. It was great to see everyone after two weeks. There was some unexpected lack of (and new growth of) facial hair and general fun stories about hijinks had during our time away. We all quickly felt the glory of being seniors and then were promptly blown away by how awesome the new batch of juniors are.

There wasn’t much time to chat, though: juniors were starting their hell week and we were about to embark on a different sort of hell – Hiring Day Assessments. I was terrified. I’ve decided my brain just needs to have something to focus on being terrified about to function at all – I’m starting to wonder if losing my fear would also diminish my awesomeness. We had all day to finish our assessments, and as I dove in my confidence built. I knew this stuff. I knew it from the times it had been drilled into my head, the moments when I was working on something alone and needed to Google a concept, and those times at the lunch table with my peers discussing wild and crazy new concepts. It rocked to realize how awesome we all are now. Everyone can tell us we are awesome until they’re blue in the face, but it’s moments like that when it clicks for me.

The other moments when it clicks for me are the new, terribly unfunny programming jokes we’ve all started making. It’s getting ridiculous.

After Monday’s stress, we quickly got our hands dirty in our code. Our first round of group projects wrapped this week. I worked with Sara and João to make a custom HTML5 video player plugin that lets users vote on moments in videos and visualizes the resulting data. Our project is called HeatVote. We’re still hacking on it in our “free time”, but its production cycle is officially over. There are a few previous posts on things I worked on for this project and I feel like I have a book more to write about the experience, but time is, as my faithful readers know, very short lately so I’m going to close this book for now.

Our next project period starts on Tuesday. I was fortunate enough to get a client project working with an awesome team to create mobile web apps at Famo.us! I am very excited to dive into unfamiliar territory, learn, and help out a team of super talented people.

D3.js Rollups

Jan
04

Do you have all the data and none of the visuals? Do you just want a pretty, fast way to compare lots of data that centers around maybe just a handful of moments?

D3.js can help you tame all of your data, and the rollup method of d3.nest is especially useful if you have lots of data that you need to combine into just a couple of data points. All it takes is a couple of (pretty long) lines of code and you will have an awesome visual that’s very customizable.

Let’s start with a really straightforward example of a rollup. In all of these examples, I’m using code straight from my HeatVote project, which pulls voting data from our server API as a JSON blob. Here’s an example entry:

{ video_id: 'T-D1KVIuvjA',
  timestamp: 2,
  vote: 1,
  id: 1,
  createdAt: Sat Dec 21 2013 14:55:42 GMT-0800 (PST),
  updatedAt: Sat Dec 21 2013 14:55:42 GMT-0800 (PST) }

Now obviously there are a bunch of these, and technically there are easier ways to do this, but to show off the structure of a rollup, let’s count how many entries we have in our database using a d3 rollup!

var total = d3.nest()
  .rollup(function(d){
    return d.length;
  })
  .entries(data);

Remember, data here is my array of JSON entries, and since we haven’t given nest a key, the d in our rollup function is simply the whole array. This isn’t a very interesting example though, so let’s take a look at something that really shows off the beauty of a d3 rollup.

var averages = d3.nest()
  .key(function(d) {
    return d.timestamp;
  })
  .sortKeys(d3.ascending)
  .rollup(function(d) {
    return d3.mean(d, function(g) {
      return +g.vote;
    });
  })
  .entries(data);

Now there is a lot going on in these few very compact lines, so we’ll go through them one by one, but the result is that averages is an array of objects, each with a key property (one of the unique timestamps) and a value property (the mean of all votes at that timestamp). Depending on your D3 version, the second property may be named values instead; version 3 uses values.

So lets break it down:

  • .key(...) tells nest which property to group by. Each unique value of that property becomes one key.
  • .sortKeys is just a prettiness thing: it sorts my keys into an order (when they’re pulled off the server the only order is by the time they were created in the database).
  • and finally our lovely .rollup(...). Instead of d being an array of the whole data, it’s now an array of only the data for one individual key (so all of the entries with the same timestamp). The inner function d3.mean takes a specific property from every entry for that key and averages the values.
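
To make the three steps above concrete, here is a rough plain-JavaScript sketch of the same idea with no D3 involved. It is only an illustration, not how D3 implements nest: the function name averageByTimestamp and the sampleVotes data are made up, and the -1 vote is an assumption about what a down vote might look like.

```javascript
// Made-up sample data, shaped like the example entry shown earlier.
var sampleVotes = [
  { video_id: 'T-D1KVIuvjA', timestamp: 10, vote: 1 },
  { video_id: 'T-D1KVIuvjA', timestamp: 2, vote: 1 },
  { video_id: 'T-D1KVIuvjA', timestamp: 2, vote: -1 }, // assumed down vote
  { video_id: 'T-D1KVIuvjA', timestamp: 10, vote: 1 },
  { video_id: 'T-D1KVIuvjA', timestamp: 2, vote: 1 }
];

// Hypothetical helper that mimics key + sortKeys + rollup by hand.
function averageByTimestamp(entries) {
  // Step 1 (.key): bucket the entries by their timestamp.
  var groups = {};
  entries.forEach(function (entry) {
    var key = String(entry.timestamp);
    if (!groups[key]) {
      groups[key] = [];
    }
    groups[key].push(entry);
  });

  // Step 2 (.sortKeys): order the keys, comparing them as numbers.
  var keys = Object.keys(groups).sort(function (a, b) {
    return Number(a) - Number(b);
  });

  // Step 3 (.rollup): collapse each bucket into a single number, the mean vote.
  return keys.map(function (key) {
    var bucket = groups[key];
    var sum = bucket.reduce(function (total, entry) {
      return total + Number(entry.vote);
    }, 0);
    return { key: key, value: sum / bucket.length };
  });
}

var averagesSketch = averageByTimestamp(sampleVotes);
console.log(averagesSketch);
```

One thing the sketch does differently on purpose: it sorts the keys as numbers. As far as I know, nest stores keys as strings, so d3.ascending compares them as text and a timestamp of 10 would sort before 2. If that matters for your chart, a numeric comparator passed to .sortKeys is probably what you want.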

And that is d3 rollup in a nutshell. It’s really lovely at coaxing relationships out of your raw data, and you can obviously do a lot more with it than just averaging things. The d3 nest docs are probably the next best place to look to get your hands dirty (.rollup is a method of nest).
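
As one small example of "more than averaging", a rollup callback can return an object rather than a single number. The sketch below is hypothetical (summarizeVotes is not part of HeatVote or D3, and the -1 vote is again an assumed down vote); it is written as a standalone function so it can be tried without D3 loaded.

```javascript
// Hypothetical rollup callback: summarizes one group of vote entries.
function summarizeVotes(group) {
  var sum = group.reduce(function (total, entry) {
    return total + Number(entry.vote);
  }, 0);
  return {
    count: group.length,                        // how many votes in this group
    total: sum,                                 // sum of the vote values
    mean: group.length ? sum / group.length : 0 // guard against an empty group
  };
}

// With D3 loaded, it would take the place of the averaging function, e.g.:
//   d3.nest()
//     .key(function (d) { return d.timestamp; })
//     .rollup(summarizeVotes)
//     .entries(data);

// Standalone check with made-up entries for a single timestamp:
var exampleGroup = [
  { timestamp: 2, vote: 1 },
  { timestamp: 2, vote: 1 },
  { timestamp: 2, vote: -1 }, // assumed down vote
  { timestamp: 2, vote: 1 }
];
var exampleSummary = summarizeVotes(exampleGroup);
console.log(exampleSummary);
```

Each entry in the nest result would then carry the whole summary object, which is handy if one chart needs both the vote count and the average at a given moment.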