Ultimate Python Cheat Sheet: Practical Python For Everyday Tasks (part 2)


Table of Contents:

· Working With Scikit-Learn Library (Machine Learning)

· Working With Plotly Library (Interactive Data Visualization)

· Working With Dates and Times

· Working With More Advanced List Comprehensions and Lambda Functions

· Working With Object Oriented Programming

· Working With Decorators

· Working With GraphQL

· Working With Regular Expressions

· Working With Strings

· Working With Web Scraping

Working With Scikit-Learn Library (Machine Learning)

1. Loading a Dataset

To load a built-in dataset for your ML experiments:

from sklearn import datasets
iris = datasets.load_iris()
X, y = iris.data, iris.target

2. Splitting Data into Training and Test Sets

To divide your data, dedicating portions to training and evaluation:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

3. Training a Model

To train an ML model using RandomForestClassifier:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)

4. Making Predictions

To access the model predictions:

predictions = model.predict(X_test)

5. Evaluating Model Performance

To evaluate your model by measuring the accuracy of its predictions:

from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, predictions)
print(f"Model accuracy: {accuracy}")

6. Using Cross-Validation

To estimate performance more reliably with 5-fold cross-validation:

from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
print(f"Cross-validation scores: {scores}")

7. Feature Scaling

To standardize your features (zero mean, unit variance), fitting the scaler on the training data only:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

8. Parameter Tuning with Grid Search

To refine your model’s parameters, seeking the optimal combination:

from sklearn.model_selection import GridSearchCV
param_grid = {'n_estimators': [10, 50, 100], 'max_depth': [None, 10, 20]}
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)

9. Pipeline Creation

To streamline your data processing and modeling steps, crafting a seamless flow:

from sklearn.pipeline import Pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier())
])
pipeline.fit(X_train, y_train)

10. Saving and Loading a Model

To preserve your model:

import joblib
# Saving the model
joblib.dump(model, 'model.joblib')
# Loading the model
loaded_model = joblib.load('model.joblib')

Working With Plotly Library (Interactive Data Visualization)

1. Creating a Basic Line Chart

To create a line chart:

import plotly.graph_objects as go
import plotly.io as pio
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
fig = go.Figure(data=go.Scatter(x=x, y=y, mode='lines'))
pio.show(fig)

2. Creating a Scatter Plot

To create a scatter plot:

fig = go.Figure(data=go.Scatter(x=x, y=y, mode='markers'))
pio.show(fig)

3. Creating a Bar Chart

To create a bar chart:

categories = ['A', 'B', 'C', 'D', 'E']
values = [10, 20, 15, 30, 25]
fig = go.Figure(data=go.Bar(x=categories, y=values))
pio.show(fig)

4. Creating a Pie Chart

To create a Pie Chart:

labels = ['Earth', 'Water', 'Fire', 'Air']
sizes = [25, 35, 20, 20]
fig = go.Figure(data=go.Pie(labels=labels, values=sizes))
pio.show(fig)

5. Creating a Histogram

To create a Histogram:

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
fig = go.Figure(data=go.Histogram(x=data))
pio.show(fig)

6. Creating Box Plots

To create a Box Plot:

data = [1, 2, 2, 3, 4, 4, 4, 5, 5, 6]
fig = go.Figure(data=go.Box(y=data))
pio.show(fig)

7. Creating Heatmaps

To create a heatmap:

import numpy as np
z = np.random.rand(10, 10)  # Generate random data
fig = go.Figure(data=go.Heatmap(z=z))
pio.show(fig)

8. Creating 3D Surface Plots

To create a 3D Surface Plot:

z = np.random.rand(20, 20)  # Generate random data
fig = go.Figure(data=go.Surface(z=z))
pio.show(fig)

9. Creating Subplots

To create a subplot:

from plotly.subplots import make_subplots
fig = make_subplots(rows=1, cols=2)
fig.add_trace(go.Scatter(x=x, y=y, mode='lines'), row=1, col=1)
fig.add_trace(go.Bar(x=categories, y=values), row=1, col=2)
pio.show(fig)

10. Creating Interactive Time Series

To plot a time series:

import pandas as pd
dates = pd.date_range('20230101', periods=5)
values = [10, 11, 12, 13, 14]
fig = go.Figure(data=go.Scatter(x=dates, y=values, mode='lines+markers'))
pio.show(fig)

Working With Dates and Times

1. Getting the Current Date and Time

To get the current date and time:

from datetime import datetime
now = datetime.now()
print(f"Current date and time: {now}")

2. Creating Specific Date and Time

To create a datetime for a specific moment in the past or future:

specific_time = datetime(2023, 1, 1, 12, 30)
print(f"Specific date and time: {specific_time}")

3. Formatting Dates and Times

To format a date and time as a string:

formatted = now.strftime("%Y-%m-%d %H:%M:%S")
print(f"Formatted date and time: {formatted}")

4. Parsing Dates and Times from Strings

To parse a date and time from a string:

date_string = "2023-01-01 15:00:00"
parsed_date = datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S")
print(f"Parsed date and time: {parsed_date}")
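
strptime raises ValueError when the string does not match the format, so input from outside your program is usually parsed inside a try block. A minimal sketch, where parse_or_none is a hypothetical helper name and not part of the standard library:

```python
from datetime import datetime

def parse_or_none(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a datetime, or None if text does not match fmt (hypothetical helper)."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

good = parse_or_none("2023-01-01 15:00:00")
bad = parse_or_none("01/01/2023")
print(good)  # 2023-01-01 15:00:00
print(bad)   # None

# ISO 8601 strings can also be parsed without a format string
iso = datetime.fromisoformat("2023-01-01T15:00:00")
print(iso == good)  # True
```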

5. Working with Time Deltas

To move forward or backward in time by a fixed interval:

from datetime import timedelta
delta = timedelta(days=7)
future_date = now + delta
print(f"Date after 7 days: {future_date}")
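
Subtracting one datetime from another also produces a timedelta, which is handy for measuring elapsed time. A small sketch with two fixed, made-up moments so the output is reproducible:

```python
from datetime import datetime, timedelta

start = datetime(2023, 1, 1, 12, 30)
end = datetime(2023, 1, 8, 18, 0)

elapsed = end - start            # Subtracting datetimes gives a timedelta
print(elapsed.days)              # 7
print(elapsed.total_seconds())   # 624600.0

# timedelta also accepts weeks, hours, minutes, and seconds
print(end - timedelta(weeks=1) > start)  # True
```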

6. Comparing Dates and Times

To compare dates and times:

if specific_time > now:
    print("Specific time is in the future.")
else:
    print("Specific time has passed.")

7. Extracting Components from a Date/Time

To extract the year, month, day, and other components of a datetime:

year = now.year
month = now.month
day = now.day
hour = now.hour
minute = now.minute
second = now.second
print(f"Year: {year}, Month: {month}, Day: {day}, Hour: {hour}, Minute: {minute}, Second: {second}")

8. Working with Time Zones

To work with time zones using timezone-aware datetimes:

from datetime import timezone, timedelta
utc_time = datetime.now(timezone.utc)
print(f"Current UTC time: {utc_time}")
# Converting to a fixed-offset timezone (EST is UTC-5; this ignores daylight saving time)
est = timezone(timedelta(hours=-5), "EST")
est_time = utc_time.astimezone(est)
print(f"Current EST time: {est_time}")

9. Getting the Weekday

To identify the day of the week:

weekday = now.strftime("%A")
print(f"Today is: {weekday}")

10. Working with Unix Timestamps

To convert between datetimes and Unix timestamps (seconds since 1970-01-01 00:00:00 UTC):

timestamp = now.timestamp()
print(f"Current timestamp: {timestamp}")
# Converting a timestamp back to a datetime
date_from_timestamp = datetime.fromtimestamp(timestamp)
print(f"Date from timestamp: {date_from_timestamp}")
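
Without a tz argument, datetime.fromtimestamp returns a naive datetime in the machine's local time, so the result depends on where the code runs. Passing tz=timezone.utc keeps the round trip unambiguous. A short sketch using a fixed moment:

```python
from datetime import datetime, timezone

# Timestamp 0 is the Unix epoch
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.isoformat())  # 1970-01-01T00:00:00+00:00

# Round trip with a timezone-aware datetime
moment = datetime(2023, 1, 1, 12, 30, tzinfo=timezone.utc)
ts = moment.timestamp()
print(ts)  # 1672576200.0
restored = datetime.fromtimestamp(ts, tz=timezone.utc)
print(restored == moment)  # True
```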

Working With More Advanced List Comprehensions and Lambda Functions

1. Nested List Comprehensions

To build a list of lists with nested list comprehensions:

matrix = [[j for j in range(5)] for i in range(3)]
print(matrix)  # Creates a 3x5 matrix

2. Conditional List Comprehensions

To filter elements that meet your criteria:

filtered = [x for x in range(10) if x % 2 == 0]
print(filtered)  # Even numbers from 0 to 9

3. List Comprehensions with Multiple Iterables

To combine elements from multiple iterables in a single comprehension:

pairs = [(x, y) for x in [1, 2, 3] for y in [3, 1, 4] if x != y]
print(pairs)  # Pairs of non-equal elements

4. Using Lambda Functions

To define small anonymous functions for one-off use:

square = lambda x: x**2
print(square(5))  # Returns 25
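
Lambdas are most often passed as short key functions rather than assigned to a name. A small sketch, using an illustrative list of (name, power) tuples that is not from the original text:

```python
# Illustrative data: (name, power) pairs
wizards = [("Merlin", 100), ("Gandalf", 95), ("Morgana", 98)]

# Sort by the second item of each tuple, strongest first
by_power = sorted(wizards, key=lambda w: w[1], reverse=True)
print(by_power)  # [('Merlin', 100), ('Morgana', 98), ('Gandalf', 95)]

# The same key function works with min() and max()
weakest = min(wizards, key=lambda w: w[1])
print(weakest)  # ('Gandalf', 95)
```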

5. Lambda Functions in List Comprehensions

To employ lambda functions within your list comprehensions:

squared = [(lambda x: x**2)(x) for x in range(5)]
print(squared)  # Squares of numbers from 0 to 4

6. List Comprehensions for Flattening Lists

To flatten a nested list, spreading its elements into a single dimension:

nested = [[1, 2, 3], [4, 5], [6, 7]]
flattened = [x for sublist in nested for x in sublist]
print(flattened)
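
The standard library offers an equivalent for the same job. A sketch using itertools.chain.from_iterable, which, like the comprehension above, flattens exactly one level of nesting:

```python
from itertools import chain

nested = [[1, 2, 3], [4, 5], [6, 7]]
flattened = list(chain.from_iterable(nested))
print(flattened)  # [1, 2, 3, 4, 5, 6, 7]

# Deeper nesting is left untouched: only the outer level is removed
deeper = [[1, [2, 3]], [4]]
print(list(chain.from_iterable(deeper)))  # [1, [2, 3], 4]
```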

7. Applying Functions to Elements

To apply a transformation function to each element:

import math
transformed = [math.sqrt(x) for x in range(1, 6)]
print(transformed)  # Square roots of numbers from 1 to 5

8. Using Lambda with Map and Filter

To map and filter lists:

mapped = list(map(lambda x: x**2, range(5)))
filtered = list(filter(lambda x: x > 5, mapped))
print(mapped)    # Squares of numbers from 0 to 4
print(filtered)  # Elements greater than 5

9. List Comprehensions with Conditional Expressions

To choose between two expressions for each element:

conditional = [x if x > 2 else x**2 for x in range(5)]
print(conditional)  # Squares numbers less than or equal to 2, passes others unchanged

10. Complex Transformations with Lambda

To conduct intricate transformations, using lambda functions:

complex_transformation = list(map(lambda x: x**2 if x % 2 == 0 else x + 5, range(5)))
print(complex_transformation)  # Applies different transformations based on even-odd condition

Working With Object Oriented Programming

1. Defining a Class

Creating a class:

class Wizard:
    def __init__(self, name, power):
        self.name = name
        self.power = power
    def cast_spell(self):
        print(f"{self.name} casts a spell with power {self.power}!")

2. Creating an Instance

To create an instance of your class:

merlin = Wizard("Merlin", 100)

3. Invoking Methods

To call a method on an instance of the class:

merlin.cast_spell()

4. Inheritance

Subclassing:

class ArchWizard(Wizard):
    def __init__(self, name, power, realm):
        super().__init__(name, power)
        self.realm = realm
    def summon_familiar(self):
        print(f"{self.name} summons a familiar from the {self.realm} realm.")

5. Overriding Methods

To override a method of the base class:

class Sorcerer(Wizard):
    def cast_spell(self):
        print(f"{self.name} casts a powerful dark spell!")

6. Polymorphism

To interact with different forms through a common interface:

def unleash_magic(wizard):
    wizard.cast_spell()
unleash_magic(merlin)
unleash_magic(Sorcerer("Voldemort", 90))

7. Encapsulation

To use information hiding:

class Alchemist:
    def __init__(self, secret_ingredient):
        self.__secret = secret_ingredient
    def reveal_secret(self):
        print(f"The secret ingredient is {self.__secret}")
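
A double leading underscore does not make an attribute truly private. Python renames it (name mangling) to _ClassName__attribute, which discourages accidental access but does not prevent it. A sketch of what that looks like in practice; reveal_secret returns its text here instead of printing, and the ingredient value is made up:

```python
class Alchemist:
    def __init__(self, secret_ingredient):
        self.__secret = secret_ingredient  # Stored as _Alchemist__secret

    def reveal_secret(self):
        return f"The secret ingredient is {self.__secret}"

alchemist = Alchemist("phoenix feather")

# Outside the class, the unmangled name does not exist
try:
    alchemist.__secret
    hidden = False
except AttributeError:
    hidden = True
print(hidden)  # True

# The mangled name is still reachable, so this is a convention, not security
print(alchemist._Alchemist__secret)  # phoenix feather
```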

8. Composition

To assemble Objects from simpler ones:

class Spellbook:
    def __init__(self, spells):
        self.spells = spells
class Mage:
    def __init__(self, name, spellbook):
        self.name = name
        self.spellbook = spellbook

9. Class Methods and Static Methods

To define methods that belong to the class itself (class methods) or that need neither the class nor an instance (static methods):

class Enchanter:
    @staticmethod
    def enchant(item):
        print(f"{item} is enchanted!")
    @classmethod
    def summon(cls):
        print("A new enchanter is summoned.")
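
A common use of a class method is an alternative constructor, because cls lets it build an instance of whichever class it was called on. A sketch under assumptions: the "name:level" text format, the from_string method, and the level range are all hypothetical and chosen only for illustration:

```python
class Enchanter:
    def __init__(self, name, level):
        self.name = name
        self.level = level

    @classmethod
    def from_string(cls, text):
        """Build an instance from 'name:level' (hypothetical format)."""
        name, level = text.split(":")
        return cls(name, int(level))

    @staticmethod
    def is_valid_level(level):
        """Needs no instance or class state, so it is a static method."""
        return 1 <= level <= 100

circe = Enchanter.from_string("Circe:42")
print(circe.name, circe.level)              # Circe 42
print(Enchanter.is_valid_level(circe.level))  # True
```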

10. Properties and Setters

To elegantly manage access to an entity’s attributes, guiding their use and protection:

class Elementalist:
    def __init__(self, element):
        self._element = element
    @property
    def element(self):
        return self._element
    @element.setter
    def element(self, value):
        if value in ["Fire", "Water", "Earth", "Air"]:
            self._element = value
        else:
            print("Invalid element!")
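
From the caller's side, a property looks like a plain attribute while assignments still pass through the setter. The variant below is a sketch, not the version above: it raises ValueError instead of printing, so callers can react to bad values, and it also routes the constructor through the setter:

```python
class Elementalist:
    VALID = ("Fire", "Water", "Earth", "Air")

    def __init__(self, element):
        self.element = element  # Goes through the setter, so it is validated too

    @property
    def element(self):
        return self._element

    @element.setter
    def element(self, value):
        if value not in self.VALID:
            raise ValueError(f"Invalid element: {value!r}")
        self._element = value

mage = Elementalist("Fire")
mage.element = "Water"        # Accepted by the setter
try:
    mage.element = "Plasma"   # Rejected by the setter
except ValueError as exc:
    error = str(exc)

print(mage.element)  # Water
print(error)         # Invalid element: 'Plasma'
```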

Working With Decorators

1. Basic Decorator

To create a simple decorator that wraps a function:

def my_decorator(func):
    def wrapper():
        print("Something is happening before the function is called.")
        func()
        print("Something is happening after the function is called.")
    return wrapper

@my_decorator
def say_hello():
    print("Hello!")

say_hello()

2. Decorating Functions That Take Arguments

To pass arguments through the wrapper to the decorated function:

def my_decorator(func):
    def wrapper(*args, **kwargs):
        print("Before call")
        result = func(*args, **kwargs)
        print("After call")
        return result
    return wrapper

@my_decorator
def greet(name):
    print(f"Hello {name}")

greet("Alice")

3. Using functools.wraps

To preserve the metadata of the original function when decorating:

from functools import wraps

def my_decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        """Wrapper function"""
        return func(*args, **kwargs)
    return wrapper

@my_decorator
def greet(name):
    """Greet someone"""
    print(f"Hello {name}")

print(greet.__name__)  # Outputs: 'greet'
print(greet.__doc__)   # Outputs: 'Greet someone'
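
To see what wraps actually changes, compare the same decorator written with and without it. The function names below are illustrative:

```python
from functools import wraps

def plain(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def preserving(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain
def greet_plain(name):
    """Greet someone"""
    return f"Hello {name}"

@preserving
def greet_wrapped(name):
    """Greet someone"""
    return f"Hello {name}"

print(greet_plain.__name__)    # wrapper
print(greet_plain.__doc__)     # None
print(greet_wrapped.__name__)  # greet_wrapped
print(greet_wrapped.__doc__)   # Greet someone
```

Both versions behave identically when called; only the metadata seen by help(), debuggers, and logging differs.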

4. Class Decorator

To create a decorator using a class:

class MyDecorator:
    def __init__(self, func):
        self.func = func
    def __call__(self, *args, **kwargs):
        print("Before call")
        result = self.func(*args, **kwargs)
        print("After call")
        return result

@MyDecorator
def greet(name):
    print(f"Hello {name}")

greet("Alice")
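
A class-based decorator is most useful when the decorator needs to keep state between calls. A sketch of a call counter; CountCalls is a hypothetical name, and functools.update_wrapper plays the role that wraps plays for function-based decorators:

```python
from functools import update_wrapper

class CountCalls:
    def __init__(self, func):
        update_wrapper(self, func)  # Copy __name__, __doc__, etc. onto the instance
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.func(*args, **kwargs)

@CountCalls
def greet(name):
    return f"Hello {name}"

greet("Alice")
greet("Bob")
print(greet.calls)     # 2
print(greet.__name__)  # greet
```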

5. Decorator with Arguments

To create a decorator that accepts its own arguments:

def repeat(times):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for _ in range(times):
                func(*args, **kwargs)
        return wrapper
    return decorator

@repeat(3)
def say_hello():
    print("Hello")

say_hello()

6. Method Decorator

To apply a decorator to a method within a class:

def method_decorator(func):
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        print("Method Decorator")
        return func(self, *args, **kwargs)
    return wrapper

class MyClass:
    @method_decorator
    def greet(self, name):
        print(f"Hello {name}")

obj = MyClass()
obj.greet("Alice")

7. Stacking Decorators

To apply multiple decorators to a single function:

@my_decorator
@repeat(2)
def greet(name):
    print(f"Hello {name}")

greet("Alice")

8. Decorator with Optional Arguments

Creating a decorator that works with or without arguments:

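def smart_decorator(arg=None):
    # Used bare (@smart_decorator), arg is the decorated function, not an argument
    label = None if callable(arg) else arg
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if label:
                print(f"Argument: {label}")
            return func(*args, **kwargs)
        return wrapper
    if callable(arg):
        return decorator(arg)
    return decorator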

@smart_decorator
def no_args():
    print("No args")

@smart_decorator("With args")
def with_args():
    print("With args")

no_args()
with_args()

9. Class Method Decorator

To decorate a class method:

class MyClass:
    @classmethod
    @my_decorator
    def class_method(cls):
        print("Class method called")

MyClass.class_method()

10. Decorator for Static Method

To decorate a static method:

class MyClass:
    @staticmethod
    @my_decorator
    def static_method():
        print("Static method called")

MyClass.static_method()

Working With GraphQL

1. Setting Up a GraphQL Client

To set up a GraphQL client with the gql library:

from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport
transport = RequestsHTTPTransport(url='https://your-graphql-endpoint.com/graphql')
client = Client(transport=transport, fetch_schema_from_transport=True)

2. Executing a Simple Query

To execute a query:

query = gql('''
{
  allWizards {
    id
    name
    power
  }
}
''')

result = client.execute(query)
print(result)

3. Executing a Query with Variables

To pass variables to a query:

query = gql('''
query GetWizards($element: String!) {
  wizards(element: $element) {
    id
    name
  }
}
''')
params = {"element": "Fire"}
result = client.execute(query, variable_values=params)
print(result)

4. Mutations

To create and execute a mutation:

mutation = gql('''
mutation CreateWizard($name: String!, $element: String!) {
  createWizard(name: $name, element: $element) {
    wizard {
      id
      name
    }
  }
}
''')
params = {"name": "Gandalf", "element": "Light"}
result = client.execute(mutation, variable_values=params)
print(result)

5. Handling Errors

To handle errors returned by the server for a query:

from gql import gql, Client
from gql.transport.exceptions import TransportQueryError
try:
    result = client.execute(query)
except TransportQueryError as e:
    print(f"GraphQL Query Error: {e}")

6. Subscriptions

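To receive live updates through a subscription (this needs a transport that supports subscriptions, such as WebSockets, rather than the plain HTTP transport configured above):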

subscription = gql('''
subscription {
  wizardUpdated {
    id
    name
    power
  }
}
''')
for result in client.subscribe(subscription):
    print(result)

7. Fragments

To reuse a set of fields across queries with a fragment:

query = gql('''
fragment WizardDetails on Wizard {
  name
  power
}
query {
  allWizards {
    ...WizardDetails
  }
}
''')
result = client.execute(query)
print(result)

8. Inline Fragments

To tailor the response based on the type of the object returned:

query = gql('''
{
  search(text: "magic") {
    __typename
    ... on Wizard {
      name
      power
    }
    ... on Spell {
      name
      effect
    }
  }
}
''')
result = client.execute(query)
print(result)

9. Using Directives

To dynamically include or skip fields in your queries based on conditions:

query = gql('''
query GetWizards($withPower: Boolean!) {
  allWizards {
    name
    power @include(if: $withPower)
  }
}
''')
params = {"withPower": True}
result = client.execute(query, variable_values=params)
print(result)

10. Batching Requests

To combine multiple operations into a single request, reducing network overhead:

from gql import gql, Client, GraphQLRequest
from gql.transport.requests import RequestsHTTPTransport

transport = RequestsHTTPTransport(url='https://your-graphql-endpoint.com/graphql', use_json=True)
client = Client(transport=transport, fetch_schema_from_transport=True)

query1 = gql('query { wizard(id: "1") { name } }')
query2 = gql('query { allSpells { name } }')

# Assumption: gql 3.5 or newer, which provides GraphQLRequest and Client.execute_batch;
# the server must also accept batched requests
results = client.execute_batch([GraphQLRequest(query1), GraphQLRequest(query2)])
print(results)

Working With Regular Expressions

1. Basic Pattern Matching

To find a match for a pattern within a string:

import re
text = "Search this string for patterns."
match = re.search(r"patterns", text)
if match:
    print("Pattern found!")

2. Compiling Regular Expressions

To compile a regular expression for repeated use:

pattern = re.compile(r"patterns")
match = pattern.search(text)

3. Matching at the Beginning or End

To check if a string starts or ends with a pattern:

if re.match(r"^Search", text):
    print("Starts with 'Search'")
if re.search(r"patterns\.$", text):
    print("Ends with 'patterns.'")

4. Finding All Matches

To find all occurrences of a pattern in a string:

all_matches = re.findall(r"\bt\w+", text)  # Finds words starting with 't'
print(all_matches)

5. Search and Replace (Substitution)

To replace occurrences of a pattern within a string:

replaced_text = re.sub(r"string", "sentence", text)
print(replaced_text)
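
re.sub can do more than swap in fixed text: the replacement may refer back to captured groups, or be a function that computes the replacement from each match. A short sketch using the same sample sentence:

```python
import re

text = "Search this string for patterns."

# \1 and \2 refer to the captured groups, so the first two words swap places
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", text, count=1)
print(swapped)  # this Search string for patterns.

# A function receives each match object and returns the replacement text
shouted = re.sub(r"\b\w{7,}\b", lambda m: m.group().upper(), text)
print(shouted)  # Search this string for PATTERNS.
```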

6. Splitting a String

To split a string by occurrences of a pattern:

words = re.split(r"\s+", text)  # Split on one or more spaces
print(words)

7. Escaping Special Characters

To match special characters literally, escape them with a backslash:

escaped = re.search(r"patterns\.", text)  # \. matches a literal period, not any character

8. Grouping and Capturing

To group parts of a pattern and extract their values:

match = re.search(r"(\w+) (\w+)", text)
if match:
    print(match.group())  # The whole match
    print(match.group(1)) # The first group

9. Non-Capturing Groups

To define groups without capturing them:

match = re.search(r"(?:\w+) (\w+)", text)
if match:
    print(match.group(1))  # The first (and only) group

10. Lookahead and Lookbehind Assertions

To match a pattern based on what comes before or after it without including it in the result:

lookahead = re.search(r"\b\w+(?= string)", text)  # Word before ' string'
lookbehind = re.search(r"(?<=Search )\w+", text)  # Word after 'Search '
if lookahead:
    print(lookahead.group())
if lookbehind:
    print(lookbehind.group())

11. Flags to Modify Pattern Matching Behavior

To use flags like re.IGNORECASE to change how patterns are matched:

case_insensitive = re.findall(r"search", text, re.IGNORECASE)
print(case_insensitive)

12. Using Named Groups

To assign names to groups and reference them by name:

match = re.search(r"(?P<first>\w+) (?P<second>\w+)", text)
if match:
    print(match.group('first'))
    print(match.group('second'))
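
Named groups pair well with groupdict(), which returns all of them as a dictionary. A sketch that parses a made-up log line; the log format is an assumption chosen for illustration:

```python
import re

log_line = "2023-01-01 ERROR Disk full"  # Hypothetical log format
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.+)"
)

match = pattern.match(log_line)
fields = match.groupdict() if match else {}
print(fields)
# {'date': '2023-01-01', 'level': 'ERROR', 'message': 'Disk full'}
```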

13. Matching Across Multiple Lines

To match patterns over multiple lines using the re.MULTILINE flag:

multi_line_text = "Start\nmiddle end"
matches = re.findall(r"^m\w+", multi_line_text, re.MULTILINE)
print(matches)

14. Lazy Quantifiers

To match as few characters as possible using lazy quantifiers (*?, +?, ??):

html = "<body><h1>Title</h1></body>"
match = re.search(r"<.*?>", html)
if match:
    print(match.group())  # Matches '<body>'

15. Verbose Regular Expressions

To use re.VERBOSE for more readable regular expressions:

pattern = re.compile(r"""
    \b      # Word boundary
    \w+     # One or more word characters
    \s      # Space
    """, re.VERBOSE)
match = pattern.search(text)

Working With Strings

1. Concatenating Strings

To join strings together:

greeting = "Hello"
name = "Alice"
message = greeting + ", " + name + "!"
print(message)

2. String Formatting with str.format

To insert values into a string template:

message = "{}, {}. Welcome!".format(greeting, name)
print(message)

3. Formatted String Literals (f-strings)

To embed expressions inside string literals (Python 3.6+):

message = f"{greeting}, {name}. Welcome!"
print(message)
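
f-strings also accept a format specification after a colon, which covers most number and alignment formatting. A few common cases, with illustrative values:

```python
price = 1234.5
ratio = 0.256
name = "Alice"

formatted_price = f"{price:,.2f}"   # Thousands separator, two decimals
formatted_ratio = f"{ratio:.1%}"    # Percentage with one decimal
padded_name = f"[{name:>10}]"       # Right-aligned in a field of width 10
zero_padded = f"{42:05d}"           # Zero-padded to five digits
quoted = f"{name!r}"                # repr() of the value

print(formatted_price)  # 1,234.50
print(formatted_ratio)  # 25.6%
print(padded_name)      # [     Alice]
print(zero_padded)      # 00042
print(quoted)           # 'Alice'
```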

4. String Methods — Case Conversion

To change the case of a string:

s = "Python"
print(s.upper())  # Uppercase
print(s.lower())  # Lowercase
print(s.title())  # Title Case

5. String Methods — strip, rstrip, lstrip

To remove whitespace or specific characters from the ends of a string:

s = "   trim me   "
print(s.strip())   # Both ends
print(s.rstrip())  # Right end
print(s.lstrip())  # Left end

6. String Methods — startswith, endswith

To check the start or end of a string for specific text:

s = "filename.txt"
print(s.startswith("file"))  # True
print(s.endswith(".txt"))    # True

7. String Methods — split, join

To split a string into a list or join a list into a string:

s = "split,this,string"
words = s.split(",")        # Split string into list
joined = " ".join(words)    # Join list into string
print(words)
print(joined)

8. String Methods — replace

To replace parts of a string with another string:

s = "Hello world"
new_s = s.replace("world", "Python")
print(new_s)

9. String Methods — find, index

To find the position of a substring within a string:

s = "look for a substring"
position = s.find("substring")  # Returns -1 if not found
index = s.index("substring")    # Raises ValueError if not found
print(position)
print(index)
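
The difference between the two methods matters when the substring is missing. A sketch of both styles of handling it; locate is a hypothetical helper name:

```python
s = "look for a substring"

def locate(text, needle):
    """Return the index of needle in text, or None if absent (hypothetical helper)."""
    position = text.find(needle)
    return position if position != -1 else None

print(locate(s, "substring"))  # 11
print(locate(s, "missing"))    # None

# index() signals a missing substring with an exception instead
try:
    s.index("missing")
    found = True
except ValueError:
    found = False
print(found)  # False

# When only presence matters, the in operator is enough
print("substring" in s)  # True
```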

10. String Methods — Working with Characters

To process individual characters in a string:

s = "characters"
for char in s:
    print(char)  # Prints each character on a new line

11. String Methods — isdigit, isalpha, isalnum

To check if a string contains only digits, alphabetic characters, or alphanumeric characters:

print("123".isdigit())   # True
print("abc".isalpha())   # True
print("abc123".isalnum())  # True

12. String Slicing

To extract a substring using slicing:

s = "slice me"
sub = s[2:7]  # From 3rd to 7th character
print(sub)

13. String Length with len

To get the length of a string:

s = "length"
print(len(s))  # 6

14. Multiline Strings

To work with strings spanning multiple lines:

multi = """Line one
Line two
Line three"""
print(multi)

15. Raw Strings

To treat backslashes as literal characters, useful for regex patterns and file paths:

path = r"C:\User\name\folder"
print(path)

Working With Web Scraping

1. Fetching Web Pages with requests

To retrieve the content of a web page:

import requests

url = 'https://example.com'
response = requests.get(url)
html = response.text

2. Parsing HTML with BeautifulSoup

To parse HTML and extract data:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())  # Pretty-print the HTML

3. Navigating the HTML Tree

To find elements using tags:

title = soup.title.text  # Get the page title
headings = soup.find_all('h1')  # List of all <h1> tags

4. Using CSS Selectors

To select elements using CSS selectors:

articles = soup.select('div.article')  # All <div> elements with class 'article'

5. Extracting Data from Tags

To extract text and attributes from HTML elements:

for article in articles:
    title = article.h2.text  # Text inside the <h2> tag
    link = article.a['href']  # 'href' attribute of the <a> tag
    print(title, link)

6. Handling Relative URLs

To convert relative URLs to absolute URLs:

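from urllib.parse import urljoin
# Collect the href of every link on the page; some of them may be relative
relative_urls = [a['href'] for a in soup.find_all('a', href=True)]
absolute_urls = [urljoin(url, link) for link in relative_urls]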
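
urljoin resolves a link the way a browser would, relative to the page it was found on. A sketch with a made-up base URL showing the main cases:

```python
from urllib.parse import urljoin

base = "https://example.com/blog/post.html"  # Illustrative page URL

sibling = urljoin(base, "next.html")          # Relative to the current directory
rooted = urljoin(base, "/about")              # Relative to the site root
parent = urljoin(base, "../images/a.png")     # Up one directory
absolute = urljoin(base, "https://other.org/")  # Already absolute: returned as-is

print(sibling)   # https://example.com/blog/next.html
print(rooted)    # https://example.com/about
print(parent)    # https://example.com/images/a.png
print(absolute)  # https://other.org/
```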

7. Dealing with Pagination

To scrape content across multiple pages:

base_url = "https://example.com/page/"
for page in range(1, 6):  # For 5 pages
    page_url = base_url + str(page)
    response = requests.get(page_url)
    # Process each page's content

8. Handling AJAX Requests

To scrape data loaded by AJAX requests:

# Find the URL of the AJAX request (using browser's developer tools) and fetch it
ajax_url = 'https://example.com/ajax_endpoint'
data = requests.get(ajax_url).json()  # Assuming the response is JSON

9. Using Regular Expressions in Web Scraping

To extract data using regular expressions:

import re
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', html)

10. Respecting robots.txt

To check robots.txt for scraping permissions:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url('https://example.com/robots.txt')
rp.read()
can_scrape = rp.can_fetch('*', url)
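
To see how the rules are interpreted without touching the network, RobotFileParser can also be fed the file's lines directly through parse(). The rules below are a made-up example, not the contents of any real robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)  # Parse in-memory lines instead of downloading with read()

public_ok = rp.can_fetch("*", "https://example.com/public/page")
private_ok = rp.can_fetch("*", "https://example.com/private/page")
print(public_ok)   # True
print(private_ok)  # False
```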

11. Using Sessions and Cookies

To maintain sessions and handle cookies:

session = requests.Session()
session.get('https://example.com/login')
session.cookies.set('key', 'value')  # Set cookies, if needed
response = session.get('https://example.com/protected_page')

12. Scraping with Browser Automation (selenium Library)

To scrape dynamic content rendered by JavaScript:

from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://example.com')
content = browser.page_source
# Parse and extract data using BeautifulSoup, etc.
browser.quit()

13. Error Handling in Web Scraping

To handle errors and exceptions:

try:
    response = requests.get(url, timeout=5)
    response.raise_for_status()  # Raises an error for bad status codes
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")

14. Asynchronous Web Scraping

To scrape websites asynchronously for faster data retrieval:

import aiohttp
import asyncio

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

urls = ['https://example.com/page1', 'https://example.com/page2']

async def main():
    # Run all fetches concurrently and collect the results in order
    return await asyncio.gather(*(fetch(url) for url in urls))

pages = asyncio.run(main())

15. Data Storage (CSV, Database)

To store scraped data in a CSV file:

import csv

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'URL'])
    for article in articles:
        # articles holds the <div class="article"> tags selected earlier
        writer.writerow([article.h2.text, article.a['href']])
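
When the scraped records are already dictionaries, csv.DictWriter avoids keeping column order in sync by hand. A sketch with made-up rows; an in-memory buffer stands in for the output file so the result can be inspected directly:

```python
import csv
import io

# Illustrative records, as a scraper might collect them
rows = [
    {"title": "First post", "url": "https://example.com/1"},
    {"title": "Second, with comma", "url": "https://example.com/2"},
]

buffer = io.StringIO()  # Stands in for open('output.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=["title", "url"])
writer.writeheader()
writer.writerows(rows)

lines = buffer.getvalue().splitlines()
print(lines[0])  # title,url
print(lines[2])  # "Second, with comma",https://example.com/2
```

Fields that contain commas are quoted automatically, which is the main reason to prefer the csv module over joining strings by hand.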

Original article (accessible only via VPN)
