MyFitnessPal: Web-scraping tool

What's a carb? A car part? What's a macro? A type of camera lens?

Moderator: Manveer

Post Reply
User avatar
throwinshapes
Registered User
Posts: 301
Joined: Wed May 09, 2018 4:50 am
Location: Dover, NH
Age: 34

MyFitnessPal: Web-scraping tool

#1

Post by throwinshapes » Sat Jan 11, 2020 12:28 pm

I created a tool to scrape the MFP website for your nutrition data, while still maintaining a free account.

Recently, I've been taking a stab at learning VBA, so this tool is very basic. There's no error checking or anything like that, and the script runs very slowly - about six minutes for a month's worth of data.

If anyone wants to help develop this tool into something fancier or get better performance out of it, I'm all for it - as long as you teach me about what you changed :)

Instructions are included in the controls tab:

https://drive.google.com/file/d/1Z-YL3S ... sp=sharing

*It seems to error out occasionally. IDK if I'm hitting the MFP website too often or what.

kingsamj
Registered User
Posts: 101
Joined: Fri Jan 03, 2020 11:46 am
Age: 36

Re: MyFitnessPal: Web-scraping tool

#2

Post by kingsamj » Sun Jan 12, 2020 8:11 pm

Cool tool idea! I've just started using MFP and I've been disappointed with the lack of an easy/free way to get the data you give them.

I'm a complete noob when it comes to Google Sheets and Excel so I wasn't sure how to see the code you wrote. What URL are you accessing to get the data to scrape? From your post and the comment on its performance it sounds like you might be accessing the /food/diary/<username>?date=<date> path directly for each day?

If you're not already doing it, it would probably be much more performant to access /reports/printable_diary/<username>?from=<date1>&to=<date2>, instead, as that will give you the data from a range of dates with only one web query. It might also save users from being blocked by MFP for doing too many queries.
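For reference, the report URL for a date range can be assembled like this (a Python sketch; the path and query parameters come from the post above, and the username and date strings are placeholder assumptions):

```python
# Build the printable-diary report URL for a date range.
# The /reports/printable_diary path and from/to parameters are
# taken from the post above; date format is an assumption.
def report_url(username, start, end):
    base = 'https://www.myfitnesspal.com'
    return f'{base}/reports/printable_diary/{username}?from={start}&to={end}'

print(report_url('jane_doe', '2020-01-01', '2020-01-31'))
```

One query like this replaces up to a month of per-day diary requests.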

User avatar
throwinshapes
Registered User
Posts: 301
Joined: Wed May 09, 2018 4:50 am
Location: Dover, NH
Age: 34

Re: MyFitnessPal: Web-scraping tool

#3

Post by throwinshapes » Mon Jan 13, 2020 5:05 am

kingsamj wrote: Sun Jan 12, 2020 8:11 pm Cool tool idea! I've just started using MFP and I've been disappointed with the lack of an easy/free way to get the data you give them.

I'm a complete noob when it comes to Google Sheets and Excel so I wasn't sure how to see the code you wrote. What URL are you accessing to get the data to scrape? From your post and the comment on its performance it sounds like you might be accessing the /food/diary/<username>?date=<date> path directly for each day?

If you're not already doing it, it would probably be much more performant to access /reports/printable_diary/<username>?from=<date1>&to=<date2>, instead, as that will give you the data from a range of dates with only one web query. It might also save users from being blocked by MFP for doing too many queries.
Ahhhh. I didn't know about those reports! That's so much better. You're right about me using the diary page. The script is slow because it's not only loading each day as a web page, but on top of that, it opens a new instance of Excel to convert the web page to a spreadsheet. Since you can pull a month of data at a time with that printable report, that should improve the speed by about 30x :P

convergentsum
Registered User
Posts: 826
Joined: Thu Mar 01, 2018 3:44 am
Age: 43

Re: MyFitnessPal: Web-scraping tool

#4

Post by convergentsum » Tue Jan 14, 2020 2:25 am

Why not output CSV? It should make your script run faster and, more importantly, be easier to read/debug, and Excel can read CSV no problem.

Also, if their server is starting to seem unhappy, pop an Application.Wait() between web requests.

User avatar
throwinshapes
Registered User
Posts: 301
Joined: Wed May 09, 2018 4:50 am
Location: Dover, NH
Age: 34

Re: MyFitnessPal: Web-scraping tool

#5

Post by throwinshapes » Tue Jan 14, 2020 4:46 am

convergentsum wrote: Tue Jan 14, 2020 2:25 am Why not output csv? Should make your script run faster but more importantly easier to read/debug, and Excel can read csv no problem.
Could you explain how you'd automate opening the web page and converting it to .csv in VBA?

convergentsum
Registered User
Posts: 826
Joined: Thu Mar 01, 2018 3:44 am
Age: 43

Re: MyFitnessPal: Web-scraping tool

#6

Post by convergentsum » Tue Jan 14, 2020 5:53 am

throwinshapes wrote: Tue Jan 14, 2020 4:46 am
convergentsum wrote: Tue Jan 14, 2020 2:25 am Why not output csv? Should make your script run faster but more importantly easier to read/debug, and Excel can read csv no problem.
Could you explain how you'd automate opening the webpage and converting to .csv in VBA?
Hmm, you probably shouldn't listen to me as I don't use VBA. I would do this in python, js (node) or shell (or maybe powershell if I needed windows) script. But in VBA you can do something like

Code: Select all

Set HttpRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
HttpRequest.Open "GET", "http://www.blah.blah/path/to/printable_diary/report"
HttpRequest.Send
But then you've got to parse the html (HttpRequest.ResponseText) and extract the relevant <table>. I dunno how convenient VBA makes this. :(
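For comparison, here's roughly what that parse-and-extract step looks like outside VBA, using only Python's standard library (a toy sketch; the real MFP markup will be messier than this single-table example):

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, one list per <tr>."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th'):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ('td', 'th'):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

html = ('<table><tr><th>Food</th><th>Calories</th></tr>'
        '<tr><td>Oats</td><td>150</td></tr></table>')
parser = TableExtractor()
parser.feed(html)
print(parser.rows)  # [['Food', 'Calories'], ['Oats', '150']]
```

You'd feed it `HttpRequest.ResponseText`'s equivalent (the response body) and pick out the table you want from `rows`.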

User avatar
throwinshapes
Registered User
Posts: 301
Joined: Wed May 09, 2018 4:50 am
Location: Dover, NH
Age: 34

Re: MyFitnessPal: Web-scraping tool

#7

Post by throwinshapes » Tue Jan 14, 2020 8:49 am

convergentsum wrote: Tue Jan 14, 2020 5:53 am Hmm, you probably shouldn't listen to me as I don't use VBA. I would do this in python, js (node) or shell (or maybe powershell if I needed windows) script. But in VBA you can do something like

Code: Select all

 Set HttpRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
    HttpRequest.Open "GET", "http://www.blah.blah/path/to/printable_diary/report"
    HttpRequest.Send
But then you've got to parse the html (HttpRequest.ResponseText) and extract the relevant <table>. I dunno how convenient VBA makes this. :(
Ya. I'm just learning VBA. My background is more MATLAB and some C. The difficult part for this was the web-facing side of things, so getting Application.Open to do the heavy lifting of both opening the page and converting it to a readable data format was enough for me initially. If we can find some nifty VBA way to do it, I'm all for it. The slowness really comes from opening a new instance of Excel each time. But kingsamj's suggestion to pull the 30-day report will reduce the calls to Application.Open by 30x.

FredM
Registered User
Posts: 728
Joined: Wed Jul 18, 2018 9:17 am
Age: 36

Re: MyFitnessPal: Web-scraping tool

#8

Post by FredM » Wed Jan 15, 2020 7:48 am

throwinshapes wrote: Tue Jan 14, 2020 8:49 am
convergentsum wrote: Tue Jan 14, 2020 5:53 am Hmm, you probably shouldn't listen to me as I don't use VBA. I would do this in python, js (node) or shell (or maybe powershell if I needed windows) script. But in VBA you can do something like

Code: Select all

 Set HttpRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
    HttpRequest.Open "GET", "http://www.blah.blah/path/to/printable_diary/report"
    HttpRequest.Send
But then you've got to parse the html (HttpRequest.ResponseText) and extract the relevant <table>. I dunno how convenient VBA makes this. :(
Ya. I'm just learning VBA. My background is more MatLab and some C. The difficult part for this was the web facing side of things, so getting Application.Open to do the heavy lifting for me to both open the page, and convert to a readable data format was enough for me initially. If we can find some nifty VBA way to do it, I'm all for it. The slowness is really from opening a new instance of Excel on time. But kingsamj's suggestion to get the 30 day report will reduce the calls to Application.Open by 30x.
But... why?

As an engineer (by education) who learned VBA for his first job, then became a software engineer because he liked programming so much -- might I suggest you skip my wasted step of learning an inefficient/crappy language?

If you're doing data analysis (even if it's with Excel), and have a background in Matlab, use the Python/NumPy/Pandas stack. It's also the second-best stack for web scraping (Perl is the best).

convergentsum
Registered User
Posts: 826
Joined: Thu Mar 01, 2018 3:44 am
Age: 43

Re: MyFitnessPal: Web-scraping tool

#9

Post by convergentsum » Wed Jan 15, 2020 8:23 am

throwinshapes wrote: Tue Jan 14, 2020 8:49 am
convergentsum wrote: Tue Jan 14, 2020 5:53 am Hmm, you probably shouldn't listen to me as I don't use VBA. I would do this in python, js (node) or shell (or maybe powershell if I needed windows) script. But in VBA you can do something like

Code: Select all

 Set HttpRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
    HttpRequest.Open "GET", "http://www.blah.blah/path/to/printable_diary/report"
    HttpRequest.Send
But then you've got to parse the html (HttpRequest.ResponseText) and extract the relevant <table>. I dunno how convenient VBA makes this. :(
Ya. I'm just learning VBA. My background is more MatLab and some C. The difficult part for this was the web facing side of things, so getting Application.Open to do the heavy lifting for me to both open the page, and convert to a readable data format was enough for me initially. If we can find some nifty VBA way to do it, I'm all for it. The slowness is really from opening a new instance of Excel on time. But kingsamj's suggestion to get the 30 day report will reduce the calls to Application.Open by 30x.
https://www.wiseowl.co.uk/blog/s393/query-table.htm looks like it shows how to query multiple pages without spinning up a fresh Excel instance for each one - just point each query table at a different destination (maybe a new sheet?).
Using another language is more versatile, but carries the significant disadvantage that you have to take on the responsibility for extracting and parsing the data yourself. It's never going to be perfect: web scraping never gets any better than "fragile" (since the guys at the other end might change the way they present the data at any time), so if Excel's automagic is working, I see little advantage to disturbing it!

@FredM I've written precisely one excel macro, to loop through all the worksheets exporting them each to their own csv!

FredM
Registered User
Posts: 728
Joined: Wed Jul 18, 2018 9:17 am
Age: 36

Re: MyFitnessPal: Web-scraping tool

#10

Post by FredM » Wed Jan 15, 2020 1:08 pm

convergentsum wrote: Wed Jan 15, 2020 8:23 am
throwinshapes wrote: Tue Jan 14, 2020 8:49 am
convergentsum wrote: Tue Jan 14, 2020 5:53 am Hmm, you probably shouldn't listen to me as I don't use VBA. I would do this in python, js (node) or shell (or maybe powershell if I needed windows) script. But in VBA you can do something like

Code: Select all

 Set HttpRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
    HttpRequest.Open "GET", "http://www.blah.blah/path/to/printable_diary/report"
    HttpRequest.Send
But then you've got to parse the html (HttpRequest.ResponseText) and extract the relevant <table>. I dunno how convenient VBA makes this. :(
Ya. I'm just learning VBA. My background is more MatLab and some C. The difficult part for this was the web facing side of things, so getting Application.Open to do the heavy lifting for me to both open the page, and convert to a readable data format was enough for me initially. If we can find some nifty VBA way to do it, I'm all for it. The slowness is really from opening a new instance of Excel on time. But kingsamj's suggestion to get the 30 day report will reduce the calls to Application.Open by 30x.
https://www.wiseowl.co.uk/blog/s393/query-table.htm looks like it shows how to query multiple pages without spinning up a fresh excel instance for each one, just point each query table at a different destination (maybe a new sheet?).
Using another language is more versatile, but carries the significant disadvantage that you have to take on the responsibility for extracting & parseing the data yourself. It's never going to be perfect: web scraping never gets any better than "fragile" (since the guys at the other end might change the way they present the data at any time), so if excel's automagic is working, I see little advantage to disturbing it!

@FredM I've written precisely one excel macro, to loop through all the worksheets exporting them each to their own csv!
VBA is hyper fragile. Python and Perl libraries are a lot more robust for web scraping. E.g., see: https://pandas.pydata.org/pandas-docs/v ... _html.html

In addition, Python/Pandas makes it way easier to manipulate/clean the data after you read it into memory.

I spent 4 months learning VBA, and by the second I was a top-10 contributor on Microsoft's VBA forums (points for providing answers). It's certainly better than NOT programming, but I see very little reason not to use more powerful tools if you're just self-learning. If your company won't let you install Python, then fine.

User avatar
throwinshapes
Registered User
Posts: 301
Joined: Wed May 09, 2018 4:50 am
Location: Dover, NH
Age: 34

Re: MyFitnessPal: Web-scraping tool

#11

Post by throwinshapes » Wed Jan 15, 2020 3:21 pm

I'm glad you brought up Python. Really I'm just trying to learn enough VBA to make my life easier when working with Excel. I find there are a lot of times where it's still quicker to throw test data into Excel and get a quick plot going. VBA would help me bridge the gap to Octave, where I could set up tools to apply consistent formatting to charts I use often, for example. Really just to fix some annoyances I have with Excel. The other thing is readability to other engineers - if it's not in Excel, it's inaccessible to them. At least at my current place.

Octave/MatLab have been good to me, but I'm definitely interested in learning Python. Seems like there's a lot of community support for it.

HenryHokie
Registered User
Posts: 10
Joined: Sun Jan 19, 2020 7:25 am

Re: MyFitnessPal: Web-scraping tool

#12

Post by HenryHokie » Tue Jan 21, 2020 8:00 pm

throwinshapes wrote: Wed Jan 15, 2020 3:21 pm I'm glad you brought up Python. Really I'm just trying to learn enough VBA to make my life easier when working with Excel. I find there's a lot of times where its still quicker to throw test data in a Excel and get a quick plot going. VBA would help me bridge the gap to Octave where I could set up tools to code formatting for charts I use often so they have a similar look for example. Really just to fix some annoyances I have with Excel. The other thing is readability to other engineers - if it's not in Excel, its inaccessible to them. At least at my current place.

Octave/MatLab have been good to me, but I'm definitely interested in learning Python. Seems like there's a lot of community support for it.
If your goal is to automate getting the data into Excel, there are definitely ways to do that in Python.

The snip below is a quick python3 script that downloads a single day and writes out a CSV and an Excel file with the output.
Note: This is a proof of concept, so some data munging would need to be done for this to really be usable.

Code: Select all

import requests
import pandas as pd

USERNAME = '<USERNAME>'  # your MFP username
START = '1/20/2020'
END = START  # single day for this proof of concept


def get_mfp_page(username, start, end):
    """Fetch the printable diary report; returns None unless the request succeeds."""
    url = f'https://www.myfitnesspal.com/reports/printable_diary/{username}?from={start}&to={end}'
    page = requests.get(url)
    if page.status_code == 200:
        return page


if __name__ == '__main__':
    page = get_mfp_page(USERNAME, START, END)
    tables = pd.read_html(page.text)  # one DataFrame per HTML <table>
    tables[0].to_csv('output.csv')
    tables[0].to_excel('output.xlsx')
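To pull more than a single day, the same script could loop over month-long windows, per kingsamj's suggestion upthread (a sketch; the M/D/YYYY formatting matches the constants above, and the 30-day window size is an assumption):

```python
from datetime import date, timedelta

def month_windows(start, end):
    """Yield (from, to) date-string pairs covering start..end in
    30-day chunks, formatted M/D/YYYY like the script above."""
    def fmt(d):
        return f'{d.month}/{d.day}/{d.year}'
    cur = start
    while cur <= end:
        stop = min(cur + timedelta(days=29), end)
        yield fmt(cur), fmt(stop)
        cur = stop + timedelta(days=1)

for window in month_windows(date(2020, 1, 1), date(2020, 2, 15)):
    print(window)
```

Each pair can then be passed to `get_mfp_page`, ideally with a short pause between requests so MFP doesn't block you.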

Post Reply