MyFitnessPal: Web-scraping tool
Moderator: Manveer
- throwinshapes
- Registered User
- Posts: 301
- Joined: Wed May 09, 2018 4:50 am
- Location: Dover, NH
- Age: 34
MyFitnessPal: Web-scraping tool
I created a tool to scrape the MFP website for your nutrition data, while still maintaining a free account.
Recently, I've been taking a stab at learning VBA, so this tool is very basic. There's no error checking or anything like that, and the script runs very slowly: about 6 minutes for a month's worth of data.
If anyone wants to help develop this tool into something fancier or get better performance out of it, I'm all for it - as long as you teach me about what you changed.
Instructions are included in the controls tab:
https://drive.google.com/file/d/1Z-YL3S ... sp=sharing
*It seems to error out occasionally. IDK if I'm hitting the MFP website too often or what.
- kingsamj
- Registered User
- Posts: 101
- Joined: Fri Jan 03, 2020 11:46 am
- Age: 36
Re: MyFitnessPal: Web-scraping tool
Cool tool idea! I've just started using MFP and I've been disappointed with the lack of an easy/free way to get the data you give them.
I'm a complete noob when it comes to Google Sheets and Excel, so I wasn't sure how to see the code you wrote. What URL are you accessing to get the data to scrape? From your post and the comment on its performance, it sounds like you might be accessing the /food/diary/<username>?date=<date> path directly for each day?
If you're not already doing it, it would probably be much more performant to access /reports/printable_diary/<username>?from=<date1>&to=<date2>, instead, as that will give you the data from a range of dates with only one web query. It might also save users from being blocked by MFP for doing too many queries.
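In Python terms, building that range URL is a one-liner. The sketch below just assembles the path described above; the exact date format MFP expects is an assumption here, not something confirmed in the thread:

```python
def printable_diary_url(username, start, end):
    # One request covers the whole date range, instead of one request per day.
    # The date format is assumed; adjust to whatever MFP actually expects.
    return ('https://www.myfitnesspal.com/reports/printable_diary/'
            f'{username}?from={start}&to={end}')

printable_diary_url('someuser', '2020-01-01', '2020-01-31')
# builds .../printable_diary/someuser?from=2020-01-01&to=2020-01-31
```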
- throwinshapes
- Registered User
- Posts: 301
- Joined: Wed May 09, 2018 4:50 am
- Location: Dover, NH
- Age: 34
Re: MyFitnessPal: Web-scraping tool
kingsamj wrote: ↑Sun Jan 12, 2020 8:11 pm
Cool tool idea! I've just started using MFP and I've been disappointed with the lack of an easy/free way to get the data you give them.
I'm a complete noob when it comes to Google Sheets and Excel so I wasn't sure how to see the code you wrote. What URL are you accessing to get the data to scrape? From your post and the comment on its performance it sounds like you might be accessing the /food/diary/<username>?date=<date> path directly for each day?
If you're not already doing it, it would probably be much more performant to access /reports/printable_diary/<username>?from=<date1>&to=<date2>, instead, as that will give you the data from a range of dates with only one web query. It might also save users from being blocked by MFP for doing too many queries.

Ahhhh. I didn't know about those reports! That's so much better. You're right that I'm using the diary page. The script is slow because it not only loads each day as a web page, but on top of that, it opens a new instance of Excel to convert the web page to a spreadsheet. If I can pull a month of data at a time with that printable report, that should improve the speed by 30x.
- convergentsum
- Registered User
- Posts: 826
- Joined: Thu Mar 01, 2018 3:44 am
- Age: 43
Re: MyFitnessPal: Web-scraping tool
Why not output csv? It should make your script run faster, but more importantly it's easier to read/debug, and Excel can read csv no problem.
Also, if their server is starting to seem unhappy, pop an Application.Wait() between web requests.
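For what it's worth, the same pacing idea in Python terms. This is a hypothetical sketch: `fetch` stands in for whatever function does the actual request, and the delay plays the role of Application.Wait():

```python
import time

def throttled(fetch, urls, delay_seconds=1.0):
    # Call fetch(url) for each URL, sleeping between calls so the server
    # never sees back-to-back requests (same idea as Application.Wait).
    results = []
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results
```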
- throwinshapes
- Registered User
- Posts: 301
- Joined: Wed May 09, 2018 4:50 am
- Location: Dover, NH
- Age: 34
Re: MyFitnessPal: Web-scraping tool
convergentsum wrote: ↑Tue Jan 14, 2020 2:25 am
Why not output csv? Should make your script run faster but more importantly easier to read/debug, and Excel can read csv no problem.

Could you explain how you'd automate opening the webpage and converting to .csv in VBA?
- convergentsum
- Registered User
- Posts: 826
- Joined: Thu Mar 01, 2018 3:44 am
- Age: 43
Re: MyFitnessPal: Web-scraping tool
throwinshapes wrote: ↑Tue Jan 14, 2020 4:46 am
Could you explain how you'd automate opening the webpage and converting to .csv in VBA?

Hmm, you probably shouldn't listen to me as I don't use VBA. I would do this in a python, js (node) or shell (or maybe powershell if I needed windows) script. But in VBA you can do something like:
Code: Select all
Set HttpRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
HttpRequest.Open "GET", "http://www.blah.blah/path/to/printable_diary/report"
HttpRequest.Send

But then you've got to parse the html (HttpRequest.ResponseText) and extract the relevant <table>. I dunno how convenient VBA makes this.
- throwinshapes
- Registered User
- Posts: 301
- Joined: Wed May 09, 2018 4:50 am
- Location: Dover, NH
- Age: 34
Re: MyFitnessPal: Web-scraping tool
convergentsum wrote: ↑Tue Jan 14, 2020 5:53 am
Hmm, you probably shouldn't listen to me as I don't use VBA. I would do this in a python, js (node) or shell (or maybe powershell if I needed windows) script. But in VBA you can do something like:
Code: Select all
Set HttpRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
HttpRequest.Open "GET", "http://www.blah.blah/path/to/printable_diary/report"
HttpRequest.Send
But then you've got to parse the html (HttpRequest.ResponseText) and extract the relevant <table>. I dunno how convenient VBA makes this.

Ya. I'm just learning VBA. My background is more MATLAB and some C. The difficult part for this was the web-facing side of things, so getting Application.Open to do the heavy lifting for me (both opening the page and converting it to a readable data format) was enough for me initially. If we can find some nifty VBA way to do it, I'm all for it. The slowness is really from opening a new instance of Excel each time. But kingsamj's suggestion to get the 30-day report will reduce the calls to Application.Open by 30x.
- FredM
- Registered User
- Posts: 728
- Joined: Wed Jul 18, 2018 9:17 am
- Age: 36
Re: MyFitnessPal: Web-scraping tool
throwinshapes wrote: ↑Tue Jan 14, 2020 8:49 am
Ya. I'm just learning VBA. My background is more MATLAB and some C. The difficult part for this was the web-facing side of things, so getting Application.Open to do the heavy lifting for me (both opening the page and converting it to a readable data format) was enough for me initially. If we can find some nifty VBA way to do it, I'm all for it.

But.. why?
As an engineer (by education) who learned VBA for his first job, then became a software engineer because he liked programming so much: might I suggest you skip my wasted step of learning an inefficient/crappy language?
If you're doing data analysis (even if it's with Excel) and have a background in MATLAB, use the Python/NumPy/Pandas stack. It's also the second-best stack for web scraping (Perl is the best).
- convergentsum
- Registered User
- Posts: 826
- Joined: Thu Mar 01, 2018 3:44 am
- Age: 43
Re: MyFitnessPal: Web-scraping tool
throwinshapes wrote: ↑Tue Jan 14, 2020 8:49 am
The slowness is really from opening a new instance of Excel each time. But kingsamj's suggestion to get the 30-day report will reduce the calls to Application.Open by 30x.

https://www.wiseowl.co.uk/blog/s393/query-table.htm looks like it shows how to query multiple pages without spinning up a fresh Excel instance for each one; just point each query table at a different destination (maybe a new sheet?).
Using another language is more versatile, but carries the significant disadvantage that you have to take on the responsibility for extracting and parsing the data yourself. It's never going to be perfect: web scraping never gets any better than "fragile" (since the folks at the other end might change the way they present the data at any time), so if Excel's automagic is working, I see little advantage to disturbing it!
@FredM I've written precisely one excel macro, to loop through all the worksheets exporting them each to their own csv!
- FredM
- Registered User
- Posts: 728
- Joined: Wed Jul 18, 2018 9:17 am
- Age: 36
Re: MyFitnessPal: Web-scraping tool
convergentsum wrote: ↑Wed Jan 15, 2020 8:23 am
Using another language is more versatile, but carries the significant disadvantage that you have to take on the responsibility for extracting and parsing the data yourself. It's never going to be perfect: web scraping never gets any better than "fragile" (since the folks at the other end might change the way they present the data at any time), so if Excel's automagic is working, I see little advantage to disturbing it!

VBA is hyper fragile. Python and Perl libraries are a lot more robust for web scraping. E.g., see: https://pandas.pydata.org/pandas-docs/v ... _html.html
In addition, Python/Pandas makes it way easier to manipulate/clean the data after you read it into memory.
I spent 4 months learning VBA, and by the second month I was a top-10 contributor on Microsoft's VBA forums (points for providing answers). It's certainly better than NOT programming, but I see very little reason not to use more powerful tools if you're self-learning. If your company won't let you install Python, then fine.
- throwinshapes
- Registered User
- Posts: 301
- Joined: Wed May 09, 2018 4:50 am
- Location: Dover, NH
- Age: 34
Re: MyFitnessPal: Web-scraping tool
I'm glad you brought up Python. Really I'm just trying to learn enough VBA to make my life easier when working with Excel. I find there are a lot of times where it's still quicker to throw test data into Excel and get a quick plot going. VBA would help me bridge the gap to Octave, where I could set up tools to automate formatting for charts I use often so they have a similar look, for example. Really just to fix some annoyances I have with Excel. The other thing is readability for other engineers: if it's not in Excel, it's inaccessible to them. At least at my current place.
Octave/MatLab have been good to me, but I'm definitely interested in learning Python. Seems like there's a lot of community support for it.
-
- Registered User
- Posts: 10
- Joined: Sun Jan 19, 2020 7:25 am
Re: MyFitnessPal: Web-scraping tool
throwinshapes wrote: ↑Wed Jan 15, 2020 3:21 pm
I'm glad you brought up Python. Really I'm just trying to learn enough VBA to make my life easier when working with Excel. Octave/MatLab have been good to me, but I'm definitely interested in learning Python. Seems like there's a lot of community support for it.

If your goal is to automate getting the data into Excel, there are definitely ways to do that in Python.
The snip below is a quick Python 3 script that downloads a single day and writes out CSV and Excel files with the output.
Note: This is a proof of concept, so some data munging would need to be done for this to really be usable.
Code: Select all
import requests
import pandas as pd

USERNAME = '<USERNAME>'  # replace with your MFP username
START = '1/20/2020'
END = START


def get_mfp_page(username, start, end):
    url = f'https://www.myfitnesspal.com/reports/printable_diary/{username}?from={start}&to={end}'
    page = requests.get(url)
    page.raise_for_status()  # surface HTTP errors instead of silently returning None
    return page


if __name__ == '__main__':
    page = get_mfp_page(USERNAME, START, END)
    tables = pd.read_html(page.text)  # list of DataFrames, one per HTML <table>
    tables[0].to_csv('output.csv')
    tables[0].to_excel('output.xlsx')  # needs openpyxl or xlsxwriter installed
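On the munging point: the real printable_diary table layout isn't shown in this thread, so the cleanup below is purely illustrative (made-up column names and rows), but it shows the usual pattern with tables scraped this way: drop summary rows and coerce text columns to numbers before doing any analysis.

```python
import pandas as pd

# Hypothetical stand-in for one table returned by pd.read_html;
# real MFP column names and rows will differ.
raw = pd.DataFrame({
    'Foods': ['Oatmeal', 'Banana', 'TOTAL:'],
    'Calories': ['150', '105', '255'],
})

# Drop the totals row and convert the calorie column to numbers.
foods = raw[raw['Foods'] != 'TOTAL:'].copy()
foods['Calories'] = pd.to_numeric(foods['Calories'], errors='coerce')
print(foods['Calories'].sum())  # 255
```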