Web scraping: Considerations on Copyright?
Good evening Gitlabers!
As discussed, I will present the first steps of my project on Tuesday. At the same time, I will try to give some info about text mining workflows. I’ve put some general and basic information on the topic of text mining and data into the Wiki. This is a work in progress. You don’t have to read everything, but if you are interested you can (it’s a wiki)!
Please take a look at the repository as well, where I will write a short description of Tuesday’s lesson!
Since I use the method of web scraping (see Wiki!!), a big/medium/small issue has come up: Please read this blog entry and then the terms of use of “Der Spiegel”.
My question to you would be: What problems could arise regarding (e.g.) copyright if I performed web scraping and used aggregated data from texts in the Spiegel archive? Or do you have other ideas why this could be problematic?
(I know it’s short notice, but I would be glad if everybody could take a 5-minute look at the two links before Tuesday’s lesson.) Thank you so much!