Integrity
Write
Loading...
Farhad Malik

Farhad Malik

3 years ago

How This Python Script Makes Me Money Every Day

More on Technology

Frank Andrade

Frank Andrade

2 years ago

I discovered a bug that allowed me to use ChatGPT to successfully web scrape. Here's how it operates.

This method scrapes websites with ChatGPT (demo with Amazon and Twitter)

Photo by Mikhail Nilov on Pexels

In a recent article, I demonstrated how to scrape websites using ChatGPT prompts like scrape website X using Python.

But that doesn’t always work.

After scraping dozens of websites with ChatGPT, I realized that simple prompts rarely work for web scraping.

Using ChatGPT and basic HTML, we can scrape any website.

First things first, use ChatGPT's advanced version (Playground)

ChatGPT Playground's enhanced version is needed to scrape websites quickly. It generates code faster and has less constraints.

Here’s how it looks.

The classic ChatGPT display just allows you to type a prompt. Playground is more customizable and generates code faster than the basic version.

No more delays or restrictions.

In this tutorial, we'll write prompts underneath Playground.

How to Use ChatGPT to Scrape Any Website

To show you how to utilize ChatGPT to scrape any website, we'll use subslikescript, which lists movies.

Later in this post, I'll show you how to scrape Amazon and Twitter, but let's start easy.

If we try a basic prompt like “scrape the movie titles on this website: https://subslikescript.com/movies” it won’t scrape anything. Here’s when we need some HTML knowledge to build the right prompt for ChatGPT.Let's scrape the above movie titles. First, check the webpage. Right-click any element and choose Inspect.

Then see below.

We identify the element to scrape (those in the red box). The data elements are ul, a, and scripts-list.

Here's the prompt for movie titles.

Scrape this website: https://subslikescript.com/movies with Python and Beautiful Soup. Locate the element with tag “ul” and class “scripts-list”. Scrape all the “a” elements inside. Get the text attribute and print it.

Code generated:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://subslikescript.com/movies/')

soup = BeautifulSoup(page.content, 'html.parser')

scripts_list = soup.find(class_="scripts-list")

all_a_elements = scripts_list.find_all('a')

for element in all_a_elements:
    print(element.get_text())

It extracts movie titles successfully.

Let's scrape Amazon and Twitter.

ChatGPT's Amazon scraping

Consider scraping Amazon for self-help books. First, copy the Amazon link for self-help books.

Here’s the link I got. Location-dependent connection. Use my link to replicate my results.

Now we'll check book titles. Here's our element.

If we want to extract the book titles, we need to use the tag name spanclass attribute name and a-size-base-plus a-color-base a-text-normalattribute value.

This time I'll use Selenium. I'll add Selenium-specific commands like wait 5 seconds and generate an XPath.

Scrape this website https://www.amazon.com/s?k=self+help+books&sprefix=self+help+%2Caps%2C158&ref=nb_sb_ss_ts-doa-p_2_10 with Python and Selenium.

Wait 5 seconds and locate all the elements with the following xpath: “span” tag, “class” attribute name, and “a-size-base-plus a-color-base a-text-normal” attribute value. Get the text attribute and print them.

Code generated: (I only had to manually add the path where my chromedriver is located).

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

#initialize webdriver
driver = webdriver.Chrome('<add path of your chromedriver>')

#navigate to the website
driver.get("https://www.amazon.com/s?k=self+help+books&sprefix=self+help+%2Caps%2C158&ref=nb_sb_ss_ts-doa-p_2_10")

#wait 5 seconds to let the page load
sleep(5)

#locate all the elements with the following xpath
elements = driver.find_elements(By.XPATH, '//span[@class="a-size-base-plus a-color-base a-text-normal"]')

#get the text attribute of each element and print it
for element in elements:
    print(element.text)

#close the webdriver
driver.close()

It pulls Amazon book titles.

Utilizing ChatGPT to scrape Twitter

Say you wish to scrape ChatGPT tweets. Search Twitter for ChatGPT and copy the URL.

Here’s the link I got. We must check every tweet. Here's our element.

To extract a tweet, use the div tag and lang attribute.

Again, Selenium.

Scrape this website: https://twitter.com/search?q=chatgpt&src=typed_query using Python, Selenium and chromedriver.

Maximize the window, wait 15 seconds and locate all the elements that have the following XPath: “div” tag, attribute name “lang”. Print the text inside these elements.

Code generated: (again, I had to add the path where my chromedriver is located)

from selenium import webdriver
import time

driver = webdriver.Chrome("/Users/frankandrade/Downloads/chromedriver")
driver.maximize_window()
driver.get("https://twitter.com/search?q=chatgpt&src=typed_query")
time.sleep(15)

elements = driver.find_elements_by_xpath("//div[@lang]")
for element in elements:
    print(element.text)

driver.quit()

You'll get the first 2 or 3 tweets from a search. To scrape additional tweets, click X times.

Congratulations! You scraped websites without coding by using ChatGPT.

Amelia Winger-Bearskin

Amelia Winger-Bearskin

3 years ago

Reasons Why AI-Generated Images Remind Me of Nightmares

AI images are like funhouse mirrors.

Google's AI Blog introduced the puppy-slug in the summer of 2015.

Vice / DeepDream

Puppy-slug isn't a single image or character. "Puppy-slug" refers to Google's DeepDream's unsettling psychedelia. This tool uses convolutional neural networks to train models to recognize dataset entities. If researchers feed the model millions of dog pictures, the network will learn to recognize a dog.

DeepDream used neural networks to analyze and classify image data as well as generate its own images. DeepDream's early examples were created by training a convolutional network on dog images and asking it to add "dog-ness" to other images. The models analyzed images to find dog-like pixels and modified surrounding pixels to highlight them.

Puppy-slugs and other DeepDream images are ugly. Even when they don't trigger my trypophobia, they give me vertigo when my mind tries to reconcile familiar features and forms in unnatural, physically impossible arrangements. I feel like I've been poisoned by a forbidden mushroom or a noxious toad. I'm a Lovecraft character going mad from extradimensional exposure. They're gross!

Is this really how AIs see the world? This is possibly an even more unsettling topic that DeepDream raises than the blatant abjection of the images.

When these photographs originally circulated online, many friends were startled and scandalized. People imagined a computer's imagination would be literal, accurate, and boring. We didn't expect vivid hallucinations and organic-looking formations.

DeepDream's images didn't really show the machines' imaginations, at least not in the way that scared some people. DeepDream displays data visualizations. DeepDream reveals the "black box" of convolutional network training.

Some of these images look scary because the models don't "know" anything, at least not in the way we do.

These images are the result of advanced algorithms and calculators that compare pixel values. They can spot and reproduce trends from training data, but can't interpret it. If so, they'd know dogs have two eyes and one face per head. If machines can think creatively, they're keeping it quiet.

You could be forgiven for thinking otherwise, given OpenAI's Dall-impressive E's results. From a technological perspective, it's incredible.

Arthur C. Clarke once said, "Any sufficiently advanced technology is indistinguishable from magic." Dall-magic E's requires a lot of math, computer science, processing power, and research. OpenAI did a great job, and we should applaud them.

Dall-E and similar tools match words and phrases to image data to train generative models. Matching text to images requires sorting and defining the images. Untold millions of low-wage data entry workers, content creators optimizing images for SEO, and anyone who has used a Captcha to access a website make these decisions. These people could live and die without receiving credit for their work, even though the project wouldn't exist without them.

This technique produces images that are less like paintings and more like mirrors that reflect our own beliefs and ideals back at us, albeit via a very complex prism. Due to the limitations and biases that these models portray, we must exercise caution when viewing these images.

The issue was succinctly articulated by artist Mimi Onuoha in her piece "On Algorithmic Violence":

As we continue to see the rise of algorithms being used for civic, social, and cultural decision-making, it becomes that much more important that we name the reality that we are seeing. Not because it is exceptional, but because it is ubiquitous. Not because it creates new inequities, but because it has the power to cloak and amplify existing ones. Not because it is on the horizon, but because it is already here.

Clive Thompson

Clive Thompson

2 years ago

Small Pieces of Code That Revolutionized the World

Few sentences can have global significance.

Photo by Chris Ried on Unsplash

Ethan Zuckerman invented the pop-up commercial in 1997.

He was working for Tripod.com, an online service that let people make little web pages for free. Tripod offered advertising to make money. Advertisers didn't enjoy seeing their advertising next to filthy content, like a user's anal sex website.

Zuckerman's boss wanted a solution. Wasn't there a way to move the ads away from user-generated content?

When you visited a Tripod page, a pop-up ad page appeared. So, the ad isn't officially tied to any user page. It'd float onscreen.

Here’s the thing, though: Zuckerman’s bit of Javascript, that created the popup ad? It was incredibly short — a single line of code:

window.open('http://tripod.com/navbar.html'
"width=200, height=400, toolbar=no, scrollbars=no, resizable=no, target=_top");

Javascript tells the browser to open a 200-by-400-pixel window on top of any other open web pages, without a scrollbar or toolbar.

Simple yet harmful! Soon, commercial websites mimicked Zuckerman's concept, infesting the Internet with pop-up advertising. In the early 2000s, a coder for a download site told me that most of their revenue came from porn pop-up ads.

Pop-up advertising are everywhere. You despise them. Hopefully, your browser blocks them.

Zuckerman wrote a single line of code that made the world worse.

A photo of the cover of “You Are Not Expected To Understand This”; it is blue and lying on its side, with the spine facing the viewer. The editor’s name, Torie Bosch, is in a green monospaced font; the title is in a white monospaced font

I read Zuckerman's story in How 26 Lines of Code Changed the World. Torie Bosch compiled a humorous anthology of short writings about code that tipped the world.

Most of these samples are quite short. Pop-cultural preconceptions about coding say that important code is vast and expansive. Hollywood depicts programmers as blurs spouting out Niagaras of code. Google's success was formerly attributed to its 2 billion lines of code.

It's usually not true. Google's original breakthrough, the piece of code that propelled Google above its search-engine counterparts, was its PageRank algorithm, which determined a web page's value based on how many other pages connected to it and the quality of those connecting pages. People have written their own Python versions; it's only a few dozen lines.

Google's operations, like any large tech company's, comprise thousands of procedures. So their code base grows. The most impactful code can be brief.

The examples are fascinating and wide-ranging, so read the whole book (or give it to nerds as a present). Charlton McIlwain wrote a chapter on the police beat algorithm developed in the late 1960s to anticipate crime hotspots so law enforcement could dispatch more officers there. It created a racial feedback loop. Since poor Black neighborhoods were already overpoliced compared to white ones, the algorithm directed more policing there, resulting in more arrests, which convinced it to send more police; rinse and repeat.

Kelly Chudler's You Are Not Expected To Understand This depicts the police-beat algorithm.

About 25 lines of code that includes several mathematical formula. Alas, it’s hard to redact it in plain text here, since it uses mathematical notation

Even shorter code changed the world: the tracking pixel.

Lily Hay Newman's chapter on monitoring pixels says you probably interact with this code every day. It's a snippet of HTML that embeds a single tiny pixel in an email. Getting an email with a tracking code spies on me. As follows: My browser requests the single-pixel image as soon as I open the mail. My email sender checks to see if Clives browser has requested that pixel. My email sender can tell when I open it.

Adding a tracking pixel to an email is easy:

<img src="URL LINKING TO THE PIXEL ONLINE" width="0" height="0">

An older example: Ellen R. Stofan and Nick Partridge wrote a chapter on Apollo 11's lunar module bailout code. This bailout code operated on the lunar module's tiny on-board computer and was designed to prioritize: If the computer grew overloaded, it would discard all but the most vital work.

When the lunar module approached the moon, the computer became overloaded. The bailout code shut down anything non-essential to landing the module. It shut down certain lunar module display systems, scaring the astronauts. Module landed safely.

22-line code

POODOO    INHINT
    CA  Q
    TS  ALMCADR

    TC  BANKCALL
    CADR  VAC5STOR  # STORE ERASABLES FOR DEBUGGING PURPOSES.

    INDEX  ALMCADR
    CAF  0
ABORT2    TC  BORTENT

OCT77770  OCT  77770    # DONT MOVE
    CA  V37FLBIT  # IS AVERAGE G ON
    MASK  FLAGWRD7
    CCS  A
    TC  WHIMPER -1  # YES.  DONT DO POODOO.  DO BAILOUT.

    TC  DOWNFLAG
    ADRES  STATEFLG

    TC  DOWNFLAG
    ADRES  REINTFLG

    TC  DOWNFLAG
    ADRES  NODOFLAG

    TC  BANKCALL
    CADR  MR.KLEAN
    TC  WHIMPER

This fun book is worth reading.

I'm a contributor to the New York Times Magazine, Wired, and Mother Jones. I've also written Coders: The Making of a New Tribe and the Remaking of the World and Smarter Than You Think: How Technology is Changing Our Minds. Twitter and Instagram: @pomeranian99; Mastodon: @clive@saturation.social.

You might also like

Will Lockett

Will Lockett

3 years ago

Russia's nukes may be useless

Russia's nuclear threat may be nullified by physics.

Putin seems nostalgic and wants to relive the Cold War. He's started a deadly war to reclaim the old Soviet state of Ukraine and is threatening the West with nuclear war. NATO can't risk starting a global nuclear war that could wipe out humanity to support Ukraine's independence as much as they want to. Fortunately, nuclear physics may have rendered Putin's nuclear weapons useless. However? How will Ukraine and NATO react?

To understand why Russia's nuclear weapons may be ineffective, we must first know what kind they are.

Russia has the world's largest nuclear arsenal, with 4,447 strategic and 1,912 tactical weapons (all of which are ready to be rolled out quickly). The difference between these two weapons is small, but it affects their use and logistics. Strategic nuclear weapons are ICBMs designed to destroy a city across the globe. Russia's ICBMs have many designs and a yield of 300–800 kilotonnes. 300 kilotonnes can destroy Washington. Tactical nuclear weapons are smaller and can be fired from artillery guns or small truck-mounted missile launchers, giving them a 1,500 km range. Instead of destroying a distant city, they are designed to eliminate specific positions, bases, or military infrastructure. They produce 1–50 kilotonnes.

These two nuclear weapons use different nuclear reactions. Pure fission bombs are compact enough to fit in a shell or small missile. All early nuclear weapons used this design for their fission bombs. This technology is inefficient for bombs over 50 kilotonnes. Larger bombs are thermonuclear. Thermonuclear weapons use a small fission bomb to compress and heat a hydrogen capsule, which undergoes fusion and releases far more energy than ignition fission reactions, allowing for effective giant bombs. 

Here's Russia's issue.

A thermonuclear bomb needs deuterium (hydrogen with one neutron) and tritium (hydrogen with two neutrons). Because these two isotopes fuse at lower energies than others, the bomb works. One problem. Tritium is highly radioactive, with a half-life of only 12.5 years, and must be artificially made.

Tritium is made by irradiating lithium in nuclear reactors and extracting the gas. Tritium is one of the most expensive materials ever made, at $30,000 per gram.

Why does this affect Putin's nukes?

Thermonuclear weapons need tritium. Tritium decays quickly, so they must be regularly refilled at great cost, which Russia may struggle to do.

Russia has a smaller economy than New York, yet they are running an invasion, fending off international sanctions, and refining tritium for 4,447 thermonuclear weapons.

The Russian military is underfunded. Because the state can't afford it, Russian troops must buy their own body armor. Arguably, Putin cares more about the Ukraine conflict than maintaining his nuclear deterrent. Putin will likely lose power if he loses the Ukraine war.

It's possible that Putin halted tritium production and refueling to save money for Ukraine. His threats of nuclear attacks and escalating nuclear war may be a bluff.

This doesn't help Ukraine, sadly. Russia's tactical nuclear weapons don't need expensive refueling and will help with the invasion. So Ukraine still risks a nuclear attack. The bomb that destroyed Hiroshima was 15 kilotonnes, and Russia's tactical Iskander-K nuclear missile has a 50-kiloton yield. Even "little" bombs are deadly.

We can't guarantee it's happening in Russia. Putin may prioritize tritium. He knows the power of nuclear deterrence. Russia may have enough tritium for this conflict. Stockpiling a material with a short shelf life is unlikely, though.

This means that Russia's most powerful weapons may be nearly useless, but they may still be deadly. If true, this could allow NATO to offer full support to Ukraine and push the Russian tyrant back where he belongs. If Putin withholds funds from his crumbling military to maintain his nuclear deterrent, he may be willing to sink the ship with him. Let's hope the former.

Matthew Royse

Matthew Royse

3 years ago

5 Tips for Concise Writing

Here's how to be clear.

I have only made this letter longer because I have not had the time to make it shorter.” — French mathematician, physicist, inventor, philosopher, and writer Blaise Pascal

Concise.

People want this. We tend to repeat ourselves and use unnecessary words.

Being vague frustrates readers. It focuses their limited attention span on figuring out what you're saying rather than your message.

Edit carefully.

Examine every word you put on paper. You’ll find a surprising number that don’t serve any purpose.” — American writer, editor, literary critic, and teacher William Zinsser

How do you write succinctly?

Here are three ways to polish your writing.

1. Delete

Your readers will appreciate it if you delete unnecessary words. If a word or phrase is essential, keep it. Don't force it.

Many readers dislike bloated sentences. Ask yourself if cutting a word or phrase will change the meaning or dilute your message.

For example, you could say, “It’s absolutely essential that I attend this meeting today, so I know the final outcome.” It’s better to say, “It’s critical I attend the meeting today, so I know the results.”

Key takeaway

Delete actually, completely, just, full, kind of, really, and totally. Keep the necessary words, cut the rest.

2. Just Do It

Don't tell readers your plans. Your readers don't need to know your plans. Who are you?

Don't say, "I want to highlight our marketing's problems." Our marketing issues are A, B, and C. This cuts 5–7 words per sentence.

Keep your reader's attention on the essentials, not the fluff. What are you doing? You won't lose readers because you get to the point quickly and don't build up.

Key takeaway

Delete words that don't add to your message. Do something, don't tell readers you will.

3. Cut Overlap

You probably repeat yourself unintentionally. You may add redundant sentences when brainstorming. Read aloud to detect overlap.

Remove repetition from your writing. It's important to edit our writing and thinking to avoid repetition.

Key Takeaway

If you're repeating yourself, combine sentences to avoid overlap.

4. Simplify

Write as you would to family or friends. Communicate clearly. Don't use jargon. These words confuse readers.

Readers want specifics, not jargon. Write simply. Done.

Most adults read at 8th-grade level. Jargon and buzzwords make speech fluffy. This confuses readers who want simple language.

Key takeaway

Ensure all audiences can understand you. USA Today's 5th-grade reading level is intentional. They want everyone to understand.

5. Active voice

Subjects perform actions in active voice. When you write in passive voice, the subject receives the action.

For example, “the board of directors decided to vote on the topic” is an active voice, while “a decision to vote on the topic was made by the board of directors” is a passive voice.

Key takeaway

Active voice clarifies sentences. Active voice is simple and concise.

Bringing It All Together

Five tips help you write clearly. Delete, just do it, cut overlap, use simple language, and write in an active voice.

Clear writing is effective. It's okay to occasionally use unnecessary words or phrases. Realizing it is key. Check your writing.

Adding words costs.

Write more concisely. People will appreciate it and read your future articles, emails, and messages. Spending extra time will increase trust and influence.

Not that the story need be long, but it will take a long while to make it short.” — Naturalist, essayist, poet, and philosopher Henry David Thoreau

Tim Denning

Tim Denning

3 years ago

I gave up climbing the corporate ladder once I realized how deeply unhappy everyone at the top was.

Restructuring and layoffs cause career reevaluation. Your career can benefit.

Photo by Humberto Chavez on Unsplash

Once you become institutionalized, the corporate ladder is all you know.

You're bubbled. Extremists term it the corporate Matrix. I'm not so severe because the business world brainwashed me, too.

This boosted my corporate career.

Until I hit bottom.

15 months later, I view my corporate life differently. You may wish to advance professionally. Read this before you do.

Your happiness in the workplace may be deceptive.

I've been fortunate to spend time with corporate aces.

Working for 2.5 years in banking social media gave me some of these experiences. Earlier in my career, I recorded interviews with business leaders.

These people have titles like Chief General Manager and Head Of. New titles brought life-changing salaries.

They seemed happy.

I’d pass them in the hallway and they’d smile or shake my hand. I dreamt of having their life.

The ominous pattern

Unfiltered talks with some of them revealed a different world.

They acted well. They were skilled at smiling and saying the correct things. All had the same dark pattern, though.

Something felt off.

I found my conversations with them were generally for their benefit. They hoped my online antics as a writer/coach would shed light on their dilemma.

They'd tell me they wanted more. When you're one position away from CEO, it's hard not to wonder if this next move will matter.

What really displeased corporate ladder chasers

Before ascending further, consider these.

Zero autonomy

As you rise in a company, your days get busier.

Many people and initiatives need supervision. Everyone expects you to know business details. Weak when you don't. A poor leader is fired during the next restructuring and left to pursue their corporate ambition.

Full calendars leave no time for reflection. You can't have a coffee with a friend or waste a day.

You’re always on call. It’s a roll call kinda life.

Unable to express oneself freely

My 8 years of LinkedIn writing helped me meet these leaders.

I didn't think they'd care. Mistake.

Corporate leaders envied me because they wanted to talk freely again without corporate comms or a PR firm directing them what to say.

They couldn't share their flaws or inspiring experiences.

They wanted to.

Every day they were muzzled eroded by their business dream.

Limited family time

Top leaders had families.

They've climbed the corporate ladder. Nothing excellent happens overnight.

Corporate dreamers rarely saw their families.

Late meetings, customer functions, expos, training, leadership days, team days, town halls, and product demos regularly occurred after work.

Or they had to travel interstate or internationally for work events. They used bags and motel showers.

Initially, they said business class flights and hotels were nice. They'd get bored. 5-star hotels become monotonous.

No hotel beats home.

One leader said he hadn't seen his daughter much. They used to Facetime, but now that he's been gone so long, she rarely wants to talk to him.

So they iPad-parented.

You're miserable without your family.

Held captive by other job titles

Going up the business ladder seems like a battle.

Leaders compete for business gains and corporate advancement.

I saw shocking filthy tricks. Leaders would lie to seem nice.

Captives included top officials.

A different section every week. If they ran technology, the Head of Sales would argue their CRM cost millions. Or an Operations chief would battle a product team over support requests.

After one conflict, another began.

Corporate echelons are antagonistic. Huge pay and bonuses guarantee bad behavior.

Overly centered on revenue

As you rise, revenue becomes more prevalent. Most days, you'd believe revenue was everything. Here’s the problem…

Numbers drain us.

Unless you're a closet math nerd, contemplating and talking about numbers drains your creativity.

Revenue will never substitute impact.

Incapable of taking risks

Corporate success requires taking fewer risks.

Risks can cause dismissal. Risks can interrupt business. Keep things moving so you may keep getting paid your enormous salary and bonus.

Restructuring or layoffs are inevitable. All corporate climbers experience it.

On this fateful day, a small few realize the game they’ve been trapped in and escape. Most return to play for a new company, but it takes time.

Addiction keeps them trapped. You know nothing else. The rest is strange.

You start to think “I’m getting old” or “it’s nearly retirement.” So you settle yet again for the trappings of the corporate ladder game to nowhere.

Should you climb the corporate ladder?

Let me end on a surprising note.

Young people should ascend the corporate ladder. It teaches you business skills and helps support your side gig and (potential) online business.

Don't get trapped, shackled, or muzzled.

Your ideas and creativity become stifled after too much gaming play.

Corporate success won't bring happiness.

Find fulfilling employment that matters. That's it.