download_html_perl

Repository which shows some examples of downloading HTML from a URL
git clone git://git.samirparikh.com/download_html_perl

commit 13db666fb046c8ac62aca95835833e9164507f9c
Author: Samir Parikh <noreply@samirparikh.com>
Date:   Wed,  7 Dec 2022 20:57:27 +0000

add new repo

Diffstat:
A README.md   | 22 ++++++++++++++++++++++
A curl2lwp.pl | 27 +++++++++++++++++++++++++++
A mojo_ua.pl  | 31 +++++++++++++++++++++++++++++++
3 files changed, 80 insertions(+), 0 deletions(-)

diff --git a/README.md b/README.md
@@ -0,0 +1,22 @@
+This repository contains a couple of examples of how to download HTML from a URL
+using two different Perl modules:
+
+* LWP::UserAgent Mojo::UserAgent
+
+Both of these approaches require knowing the Session ID of a valid cookie to
+the site you are trying to download from.  To find the Session ID, first log
+into a site in either Chrome or Firefox and then launch the "Developer Tools" or
+"Web Developer Tools".  Pressing `Ctrl+Shift+I` should work on both browsers.
+
+From there, go to "Storage", "Cookies" and then click on the cookie for the URL
+you want to download.  Note the "value" field where "Name" equals `session`.
+The value of "value" should be something like:
+
+`90ipwx7093le8uu5jjaiva12mdhdfftyb8ig44eydhvimjva9roqwmiutpwzphekeje82qr6469pt71
+9r86gmnp2z5ja4sxjbokvyj8pilaweo17tdcvhidayvzyt4yc`
+
+You will need to provide that in the hash reference for the `cookie_jar`.
+
+Both of these examples are an attempt to replicate the `curl` command:
+
+`curl https://example.com --cookie "session=90ip...pt71" -o "output.txt"`
diff --git a/curl2lwp.pl b/curl2lwp.pl
@@ -0,0 +1,27 @@
+#!/usr/local/bin/perl
+
+# make sure you have installed:
+# LWP::UserAgent
+# LWP::Protocol::https
+
+#
+# taken from:
+# https://corion.net/curl2lwp.psgi
+#
+
+use strict;
+use warnings;
+use LWP::UserAgent;
+
+my $ua = LWP::UserAgent->new( 'send_te' => '0' );
+my $r = HTTP::Request->new(
+    'GET' => 'https://example.com',
+    [
+        'Accept' => '*/*',
+        'User-Agent' => 'curl/7.55.1',
+        'Cookie' => 'session=90ip...pt71'
+    ],
+
+);
+my $res = $ua->request( $r );
+print $res->decoded_content, "\n";
diff --git a/mojo_ua.pl b/mojo_ua.pl
@@ -0,0 +1,31 @@
+#!/usr/local/bin/perl
+
+use strict;
+use warnings;
+use v5.32;
+use Mojo::UserAgent;
+
+#
+# See guides at:
+# https://docs.mojolicious.org/Mojo/UserAgent
+# https://docs.mojolicious.org/Mojo/Cookie/Response
+# https://metacpan.org/pod/Mojo::UserAgent
+#
+
+my $ua = Mojo::UserAgent->new;
+my $url = "https://example.com";
+
+$ua->cookie_jar->add(
+    Mojo::Cookie::Response->new(
+        name => 'session',
+        value => '90ip...pt71',
+        domain => 'example.com',
+        path => '/'
+    )
+);
+
+my $res = $ua->get( $url )->result;
+if ($res->is_success) { say $res->body }
+elsif ($res->is_error) { say $res->message }
+elsif ($res->code == 301) { say $res->headers->location }
+else { say 'Whatever...' }