How to get metadata from webpage link to show preview in PHP

by Vincy. Last modified on March 6th, 2024.

This tutorial explains how to display a webpage preview by extracting metadata from a link using PHP. It uses PHP cURL to fetch the page, then reads the title, description, and image (if any) from the Open Graph metadata.

Method 1: How to use PHP cURL to get meta tags for page preview?

Follow the steps below to show the webpage link preview.

  1. Prepare a PHP cURL request to get the site HTML.
  2. Filter metadata from the cURL response using a regex pattern.
  3. Build the HTML to show the webpage preview in the UI.


1. Prepare a PHP cURL request to get the site HTML.

The getSiteHTMLViaCURL() function sends a PHP cURL request to the webpage link. It returns the site HTML from the cURL response, or false if the request fails.

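<?php
$link = 'http://example.com/';
$siteHTML = getSiteHTMLViaCURL($link);

function getSiteHTMLViaCURL($url)
{
    $ch = curl_init($url);
    // Return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $siteHTML = curl_exec($ch);
    // curl_exec() returns false on failure; read the error state before closing
    $hasError = ($siteHTML === false || curl_errno($ch));
    // Close the handle on both the success and the error path
    curl_close($ch);
    return $hasError ? false : $siteHTML;
}
?>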
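
The function above relies on cURL defaults, so it does not follow redirects and sets no overall transfer timeout. The sketch below is one possible hardened variant, not part of the original example. The function name fetchSiteHTML(), the timeout values, the redirect limit, and the user-agent string are all assumptions chosen for illustration.

```php
<?php
// Hypothetical variant of getSiteHTMLViaCURL(); all option values are illustrative.
function fetchSiteHTML(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,     // but not indefinitely
        CURLOPT_CONNECTTIMEOUT => 5,     // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds, // seconds allowed in total
        // Accept only web URLs, so file:// or ftp:// links are rejected
        CURLOPT_PROTOCOLS      => CURLPROTO_HTTP | CURLPROTO_HTTPS,
        CURLOPT_USERAGENT      => 'LinkPreviewBot/1.0', // assumed name
    ]);

    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    unset($ch); // release the handle

    // Treat transport errors and non-2xx responses as "no HTML available"
    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $body;
}
```

Returning null for every failure keeps the calling code simple: it can skip the preview whenever the result is null, for example with `$siteHTML = fetchSiteHTML($link) ?? '';`.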

2. Filter metadata from the cURL response using a regex pattern

The getMetadataFromHTML() function uses a regex pattern to filter the Open Graph metadata from the site HTML. It returns an array of the matching metadata, keyed by the property name without the og: prefix.

Regex patterns are generally used to check that data follows an expected format. Earlier tutorials used regex to validate email data and to measure password strength.

<?php
function getMetadataFromHTML($siteHTML)
{
    
    // Get the webpage metadata
    $metadataArray = [];
    if (preg_match_all('/<meta property="og:([^"]+)"\s*content="([^"]*)"/i', $siteHTML, $matches)) {
        $metadataArray = array_combine($matches[1], $matches[2]);
    }
    return $metadataArray;
}
$metadataArray = getMetadataFromHTML($siteHTML);
?>
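
The regex above matches only tags written with double quotes and with the property attribute placed directly before content. As an alternative, here is a minimal sketch that reads the same tags with PHP's DOMDocument, assuming the dom extension is enabled. The function name getMetadataFromHTMLViaDOM() and the sample markup are made up for illustration.

```php
<?php
// Hypothetical DOM-based counterpart of getMetadataFromHTML().
function getMetadataFromHTMLViaDOM(string $siteHTML): array
{
    $metadataArray = [];
    if (trim($siteHTML) === '') {
        return $metadataArray; // nothing to parse
    }

    // Collect parser warnings internally; real-world HTML is rarely valid
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($siteHTML);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        // Keep only Open Graph tags and strip the "og:" prefix from the key
        if (stripos($property, 'og:') === 0) {
            $metadataArray[substr($property, 3)] = $meta->getAttribute('content');
        }
    }
    return $metadataArray;
}

// Sample markup with single quotes and reversed attribute order
$sampleHTML = '<html><head>'
    . "<meta content='Example title' property='og:title'>"
    . '<meta property="og:image" content="https://example.com/logo.png" />'
    . '</head><body></body></html>';

$sampleMetadata = getMetadataFromHTMLViaDOM($sampleHTML);
```

The returned array has the same shape as the regex version, so it can be passed to the preview function in the next step unchanged.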

3. Build the HTML to show a webpage preview in the UI

The generatePreviewHTML() function below uses the og:title, og:description, and og:image metadata to form the preview HTML.

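<?php
function generatePreviewHTML($metadataArray)
{
    // Escape every value because it comes from a remote page.
    // The last argument (false) avoids double-encoding existing entities.
    $title = htmlspecialchars($metadataArray['title'] ?? '', ENT_QUOTES, 'UTF-8', false);
    $description = htmlspecialchars($metadataArray['description'] ?? '', ENT_QUOTES, 'UTF-8', false);

    // Generate HTML with webpage link preview
    $previewHTML = '<div class="site-preview">';
    $previewHTML .= '<h1>' . $title . '</h1>';
    $previewHTML .= '<p>' . $description . '</p>';

    // Show the image only if og:image is present
    if (isset($metadataArray['image'])) {
        $image = htmlspecialchars($metadataArray['image'], ENT_QUOTES, 'UTF-8', false);
        $previewHTML .= '<p><img src="' . $image . '" alt="Site Logo"></p>';
    }

    $previewHTML .= '</div>';

    return $previewHTML;
}
?>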
<!DOCTYPE html>
<html>
<head>
    <title>How to get metadata from webpage link to show preview in PHP</title>
    <link rel="stylesheet" type="text/css" href="style.css" />
</head>

<body>
    <div class="phppot-container">
        <?php
        echo generatePreviewHTML($metadataArray);
        ?>
    </div>
</body>
</html>

Method 2: How to create a metadata scraper using JavaScript

This example shows how to scrape a URL using JavaScript. It runs on the client side, in the browser. A client-side script can use JavaScript fetch() or XMLHttpRequest to request a page from a different domain.

Your page is served from domain A, and you will be scraping a URL on domain B. In this scenario, the CORS policy comes into the picture: by default, the browser does not let the script read a cross-origin response.

To lift this restriction, the called domain (B) must declare one of the following.

  1. Cross-domain requests are allowed from any origin (a blanket allowance). This can be done using a .htaccess rule.
  2. Domain A is whitelisted for CORS.
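
On domain B, the second option could be implemented in PHP roughly as shown below. This is a sketch only: the function name getAllowedOriginHeader(), the ALLOWED_ORIGINS list, and the https://domain-a.example origin are hypothetical placeholders.

```php
<?php
// Hypothetical whitelist; replace with the origins you actually trust.
const ALLOWED_ORIGINS = ['https://domain-a.example'];

// Build the CORS response header for a whitelisted origin, or null otherwise.
function getAllowedOriginHeader(?string $origin, array $allowedOrigins): ?string
{
    if ($origin !== null && in_array($origin, $allowedOrigins, true)) {
        return 'Access-Control-Allow-Origin: ' . $origin;
    }
    return null;
}

// The browser sends the calling page's origin in the Origin request header
$corsHeader = getAllowedOriginHeader($_SERVER['HTTP_ORIGIN'] ?? null, ALLOWED_ORIGINS);
if ($corsHeader !== null && !headers_sent()) {
    header($corsHeader);
    // Tell caches that the response differs per requesting origin
    header('Vary: Origin');
}
```

Echoing back only a matching origin keeps the endpoint closed to every other site, unlike the blanket allowance in the first option.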

The below code helps to test your API endpoints during development. You can write a client API, then allow CORS and test it.

get-metadata-via-ajax.html

<html>
<head>
    <meta charset="UTF-8">
    <title>Fetch HTML with CORS</title>
</head>
<body>
<script>
// Example AJAX code accesses remote data using XMLHttpRequest
var url = 'http://example.com';
var xhr = new XMLHttpRequest();

// Append an element to the page. document.write() is avoided because
// calling it after the page has loaded replaces the whole document.
function appendPreviewElement(tagName) {
    return document.body.appendChild(document.createElement(tagName));
}

xhr.onreadystatechange = function () {
    if (xhr.readyState === XMLHttpRequest.DONE) {
        if (xhr.status === 200) {
            var response = xhr.responseText;
            // Parse the meta tags from the response using DOMParser
            var parser = new DOMParser();
            var doc = parser.parseFromString(response, 'text/html');
            var metaTags = doc.querySelectorAll('meta');
            // Display the site preview data from the meta tags
            metaTags.forEach(function (tag) {
                // Open Graph tags carry their key in the "property" attribute
                var property = tag.getAttribute('property');
                var content = tag.getAttribute('content');
                if (property === 'og:title') {
                    appendPreviewElement('h1').textContent = content;
                }
                if (property === 'og:description') {
                    appendPreviewElement('p').textContent = content;
                }
                if (property === 'og:image') {
                    appendPreviewElement('img').src = content;
                }
            });
        } else {
            console.error('Request failed with status:', xhr.status);
        }
    }
};
xhr.open('GET', url, true);
xhr.send();
</script>

</body>

</html>

For testing purposes, you can also enable a browser CORS extension to access the cross-site resource from the remote URL.

Method 3: How to use PHP file_get_contents() to get the metadata

This PHP code is the simplest way to get metadata from the HTML of a webpage link. However, file_get_contents() can open a URL only when the php.ini directive allow_url_fopen is set to 1. Enabling this flag is not considered secure on a production server, so, like Method 2, this method is best kept to a development environment.

file_get_contents.php

<?php

$link = 'http://example.com';

// get HTML using PHP core function
$html = file_get_contents($link);

// Extract the Open Graph metadata tags using regular expression
$ogMetaTags = [];
if (preg_match_all('/<meta property="og:([^"]+)"\s*content="([^"]*)"/i', $html, $matches)) {
    $ogMetaTags = array_combine($matches[1], $matches[2]);
}

?>

<!DOCTYPE html>
<html>

<head>
    <title>How to get metadata from webpage link to show preview in PHP</title>
    <link rel="stylesheet" type="text/css" href="style.css" />
</head>

<body>
    <div class="phppot-container">
        <!-- Create webpage link preview with open graph meta tags -->
        <div class="site-preview">
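            <h2><?php echo htmlspecialchars($ogMetaTags['title'] ?? '', ENT_QUOTES, 'UTF-8', false); ?></h2>
            <!-- Check if 'og:image' is set before including it in the HTML -->
            <?php
            if (isset($ogMetaTags['image'])) {
            ?>
            <p><img src="<?php echo htmlspecialchars($ogMetaTags['image'], ENT_QUOTES, 'UTF-8', false); ?>" alt="Site Logo" width="100%"></p>
            <?php
            }
            ?>
            <p><?php echo htmlspecialchars($ogMetaTags['description'] ?? '', ENT_QUOTES, 'UTF-8', false); ?></p>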
        </div>
    </div>
</body>

</html>
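
file_get_contents() also accepts a stream context, which is one way to set a timeout and a user agent for the request. The following is a minimal sketch under stated assumptions: the helper name getSiteHTMLViaStream(), the 10-second timeout, and the user-agent value are illustrative. It also rejects anything other than http:// and https:// links, so that a user-supplied link cannot be used to read local files.

```php
<?php
// Hypothetical wrapper around file_get_contents(); values are illustrative.
function getSiteHTMLViaStream(string $link, float $timeoutSeconds = 10.0): ?string
{
    // Allow web URLs only; local paths and other wrappers are refused
    $scheme = strtolower((string) parse_url($link, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return null;
    }

    // URL access through file_get_contents() needs allow_url_fopen
    if (!filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return null;
    }

    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'timeout'    => $timeoutSeconds,     // read timeout in seconds
            'user_agent' => 'LinkPreviewBot/1.0', // assumed name
        ],
    ]);

    // Suppress the warning on failure and report it as null instead
    $html = @file_get_contents($link, false, $context);
    return $html === false ? null : $html;
}
```

A caller could then write `$html = getSiteHTMLViaStream($link) ?? '';` and run the same Open Graph regex on the result.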

Method 4: Fetch metadata via 3rd-party API

There are APIs available to fetch the webpage metadata and other data from a link. These APIs usually require you to create an access key from their dashboard.

When you send the encoded webpage link along with the key, the API endpoint returns the metadata to the client.

Most of these APIs are paid services, but many provide free trials so that you can evaluate them first.

The example code below shows how to use a third-party API to get the metadata for a webpage preview.

// Example usage with a website URL
var siteUrl = 'http://example.com/';
// Specify the third-party API URL with your key
// E.g. `https://opengraph.io/api/1.1/site/${encodeURIComponent(siteUrl)}`
// E.g. `https://apiv2.ahrefs.com?token=${apiKey}&target=${encodeURIComponent(siteUrl)}`
fetch('API URL with access key')
    .then(response => response.json())
    .then(data => {
        console.log(data);
    })
    .catch(error => {
        console.error('Error fetching site metadata:', error);
    });
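
If the API is called from PHP instead, the request URL can be built on the server so that the key never reaches the browser. The sketch below shows only the URL-building step. The endpoint https://api.example.com/metadata and the key and url parameter names are hypothetical, so check your provider's documentation for the real ones.

```php
<?php
// Build a request URL for a hypothetical metadata API.
function buildMetadataApiUrl(string $endpoint, string $apiKey, string $siteUrl): string
{
    // http_build_query() percent-encodes the values, including the site URL
    $query = http_build_query(
        ['key' => $apiKey, 'url' => $siteUrl],
        '',
        '&',
        PHP_QUERY_RFC3986
    );
    return $endpoint . '?' . $query;
}

$apiUrl = buildMetadataApiUrl(
    'https://api.example.com/metadata', // hypothetical endpoint
    'YOUR_API_KEY',                     // placeholder key
    'http://example.com/'
);
// $apiUrl is now:
// https://api.example.com/metadata?key=YOUR_API_KEY&url=http%3A%2F%2Fexample.com%2F
```

The resulting URL can be requested with the cURL function from Method 1, and a JSON response decoded with json_decode($response, true).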


Written by Vincy, a web developer with 15+ years of experience and a Masters degree in Computer Science. She specializes in building modern, lightweight websites using PHP, JavaScript, React, and related technologies. Phppot helps you in mastering web development through over a decade of publishing quality tutorials.
