How much do you know about blog systems? Uncovering little-known knowledge (2)

The first part, "How much do you know about blog systems (1)", introduced the history of blogs, my own blog story, and who a blog's audience is. This part continues with the key points of designing a blog's basic features.
Because the article is long, it is split into four parts. The table of contents is as follows:

1 The past and present of "blog"
2 My blog story
3 Who is the audience of a blog?
4 Key points of basic feature design
4.1 Article (Post)
4.2 Comment
4.3 Category
4.4 Tag
4.5 Archive
4.6 Page
4.7 Subscription
4.8 Version control
4.9 Theme and personalization
4.10 Users and permissions
4.11 Plug-ins
4.12 Handling images and attachments
4.13 Sensitive word filtering and comment review
4.14 Static generation
4.15 Notification system
5 Blog protocols and standards
5.1 RSS
5.2 Atom
5.3 OPML
5.4 APML
5.5 FOAF
5.6 BlogML
5.7 OpenSearch
5.8 Pingback
5.9 Trackback
5.10 MetaWeblog
5.11 RSD
5.12 Reader view
6 Knowledge points in designing a blog system
6.1 Should all times really be stored in UTC?
6.2 HTML or Markdown
6.3 MVC or SPA
6.4 Security
7 Concluding remarks

01 Article (Post)

We may read three to five articles, short or long, every day. Articles are the core of a blog system, so the content and quality of blog posts matter a great deal.

So what should the article type be called? Do you use "article" for database table names, variable names, and type names? In school, it seems I was only ever taught the word "article", but the correct term for a blog entry is "post". The difference between the two English words is that a post is a piece written casually, while an article is a carefully crafted paper that others cite and that may be published in an academic journal. When designing a blog system, therefore, try to avoid naming things in code with the word "article". More concretely, a post may contain loose, colloquial expressions; this piece, for example, is a post. An article follows strict written norms, and phrases such as "let us" or "let's take a look" should not appear in it.


A post needs a title, slug, creation time, publication time, modification time, abstract, and content, plus secondary information such as categories, tags, view count, and likes. The slug is a blog-specific concept: it is the human-readable part of a post's URL. For example, the URL of my post "Try the New Azure .NET SDK" is
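As a rough illustration of how a slug might be derived from a title, the hypothetical `make_slug` below lowercases the title, treats everything other than ASCII letters and digits as a separator, and joins the words with hyphens. This rule and the 80-character limit are assumptions for illustration, not Moonglade's actual implementation.

```python
import re

def make_slug(title: str, max_length: int = 80) -> str:
    """Hypothetical slug generator for titles written in English."""
    # Lowercase, then split on every run of characters that are not ASCII letters or digits.
    words = re.split(r"[^a-z0-9]+", title.lower())
    # Drop the empty strings produced by leading or trailing separators.
    slug = "-".join(w for w in words if w)
    # Keep URLs short: truncate, then remove a dangling hyphen.
    return slug[:max_length].rstrip("-")

print(make_slug("Hello World: My First Post!"))  # hello-world-my-first-post
```

Under this rule, a title written in Chinese would produce an empty slug. A real system would therefore need transliteration or a manually entered slug, and it should check that the slug is unique before saving.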
(Figure: Abstract from the article list)

(Picture: meta description tag code)

The abstract can be taken automatically from the first few hundred characters of the post, or the author can be required to write it by hand, as WeChat Official Accounts do. My blog automatically uses the first 400 characters of the post. For SEO reasons, the opening paragraph of my posts is usually a summary. That way, search engine result pages show an accurate preview of the content rather than insignificant UI elements from the page.
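Here is a minimal sketch of automatic abstract extraction. It assumes the post body is stored as HTML and takes the first 400 characters of its plain text. The names `extract_abstract` and `_TextCollector` are hypothetical. The sketch does not skip the contents of `<script>` or `<style>` elements.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

def extract_abstract(html: str, length: int = 400) -> str:
    """Hypothetical helper: the first `length` characters of the post's plain text."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Join text nodes with spaces so adjacent paragraphs don't run together,
    # then collapse all runs of whitespace into single spaces.
    text = " ".join(" ".join(collector.parts).split())
    if len(text) <= length:
        return text
    return text[:length].rstrip() + "..."
```

The same string can be written into the page's meta description tag, which is what search engines tend to show in their previews.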

(Picture: Summary of content identified by Bing search engine)

A post's status is usually one of: draft, published, or recycled (moved to the recycle bin). Readers can only see published posts, and administrators can change a post's status in the admin backend.
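The three statuses can be modeled as an enum, with the reader-facing query filtering on it. The sketch below is hypothetical: `Post`, `PostStatus`, and `visible_to_readers` are illustrative names, and the field list is trimmed down from the elements listed earlier.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

class PostStatus(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    RECYCLED = "recycled"  # soft-deleted: in the recycle bin, restorable by an administrator

@dataclass
class Post:
    title: str
    slug: str
    content: str
    status: PostStatus = PostStatus.DRAFT
    created_utc: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    published_utc: Optional[datetime] = None

    def publish(self) -> None:
        self.status = PostStatus.PUBLISHED
        # Keep the first publish time if the post is later recycled and published again.
        if self.published_utc is None:
            self.published_utc = datetime.now(timezone.utc)

    def recycle(self) -> None:
        self.status = PostStatus.RECYCLED

def visible_to_readers(posts: list[Post]) -> list[Post]:
    """Readers only ever see published posts; drafts and recycled posts stay in the backend."""
    return [p for p in posts if p.status is PostStatus.PUBLISHED]
```

Recycling a post as a status change rather than a row deletion is what makes it restorable from the backend.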

02 Comment

Comments are the main way for authors and readers to interact on a blog. Some blogs require readers to sign in before commenting, while others allow visitors to comment anonymously (as my blog and WordPress do). Requiring sign-in lets you identify your readers and effectively prevents spam comments. However, it also adds an extra step, and users who find it troublesome simply won't comment.

By default, both my blog and WordPress require an administrator to approve comments in the backend before they are displayed. This also effectively blocks spam ads, harassment, and even malicious incitement. If a commenter provides an email address, the administrator can reply to the comment from the backend, and the blog system will notify the commenter by email.

(Picture: Moonglade's comment area)

For technical blogs, consider allowing Markdown in comments. Markdown is particularly popular among programmers and is widely used on GitHub.

Comments need a CAPTCHA or other human verification to stop bots from posting ads. In my experience, however, a CAPTCHA cannot block 100% of spam, because much modern spam is posted by real people. There are dedicated companies, teams, WeChat groups, and so on, and some are even based abroad. You may therefore also need keyword filtering, a paid third-party filtering service, and similar measures.

Also remember to limit the length of comments. Otherwise, some users may post walls of junk text that flood the page.
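The following sketch combines three of the measures described in this section: a length limit, a keyword blocklist, and moderation in which every accepted comment waits for an administrator. The 1,000-character limit and the blocklist entries are hypothetical placeholders, and CAPTCHA and third-party filtering services are left out.

```python
from dataclasses import dataclass
from enum import Enum

MAX_COMMENT_LENGTH = 1000                     # assumed limit; tune for your blog
BLOCKED_KEYWORDS = {"casino", "cheap pills"}  # hypothetical list; real lists are much larger

class CommentStatus(Enum):
    PENDING = "pending"  # waiting for the administrator to review it
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Comment:
    author: str
    content: str
    status: CommentStatus = CommentStatus.PENDING

def submit_comment(author: str, content: str) -> Comment:
    """Validate a new comment; anything that passes still waits for manual review."""
    content = content.strip()
    if not content:
        raise ValueError("comment is empty")
    if len(content) > MAX_COMMENT_LENGTH:
        raise ValueError(f"comment exceeds {MAX_COMMENT_LENGTH} characters")
    lowered = content.lower()
    if any(word in lowered for word in BLOCKED_KEYWORDS):
        # Obvious spam is rejected outright instead of cluttering the review queue.
        return Comment(author, content, CommentStatus.REJECTED)
    return Comment(author, content)

def approved_comments(comments: list[Comment]) -> list[Comment]:
    """Only comments an administrator has approved are rendered under the post."""
    return [c for c in comments if c.status is CommentStatus.APPROVED]
```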

If you don't want to build commenting yourself, you can integrate a third-party comment service. In that case, the blog system does not implement comments at all. Instead, it loads external JavaScript from the third-party service, which "injects" a comment area into the post page. This usually requires the post's URL to stay unchanged (WordPress calls this a permalink).

03 Category

Dividing posts by content, much like creating folders, is categorization. Once posts are categorized, readers can quickly find posts of the same type.

For example, posts about .NET, PHP, and JS belong to a "Development" category, while tech industry news and workplace experience belong to a "Work" category. The categories themselves are entirely up to the user. Categorization can also be many-to-many: a post about developing Angular applications with ASP.NET Core can belong to both ".NET technology" and "Front-end development".

A category needs a title, a description, and a route name. For example, the Microsoft Azure category on my blog has the title "Microsoft Azure", the description "The Best Cloud", and the route name "azure". The title is also displayed in the browser's title bar, which helps SEO. The description supplements the title for readers. For why a route name is needed, see the tag design discussed in the next section.
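A category record and its many-to-many link to posts might be sketched as follows, with the link stored as (post slug, route name) pairs in the same way a database join table would hold them. `Category`, `CategoryIndex`, and the route names used in the usage example are hypothetical. Only the Azure values come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Category:
    title: str       # shown on the page and in the browser title bar, e.g. "Microsoft Azure"
    note: str        # short description for readers, e.g. "The Best Cloud"
    route_name: str  # URL segment, e.g. "azure"

class CategoryIndex:
    """Many-to-many link between posts and categories, like a join table in a database."""

    def __init__(self) -> None:
        self._links: set[tuple[str, str]] = set()  # (post_slug, route_name) pairs

    def assign(self, post_slug: str, category: Category) -> None:
        self._links.add((post_slug, category.route_name))

    def posts_in(self, route_name: str) -> list[str]:
        return sorted(post for post, cat in self._links if cat == route_name)

    def categories_of(self, post_slug: str) -> list[str]:
        return sorted(cat for post, cat in self._links if post == post_slug)

azure = Category("Microsoft Azure", "The Best Cloud", "azure")
```

For example, assigning the same post to a hypothetical `dotnet` category and a `frontend` category makes `categories_of` return both, which is the ASP.NET Core plus Angular case described above.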

(Picture: An article classification of Moonglade blog system)

Categories are also used to generate OPML and per-category RSS/Atom feeds. These are explained in Chapter 5, Blog Protocols and Standards.

04 Tag

The topics a post covers are its tags. Like categories, tags have a many-to-many relationship with posts. Tags work much like keywords: readers can use them to look up posts and quickly find posts with related content.

Tags must account for duplicate meanings. For example, "VS" and "Visual Studio" mean the same thing, as do "VSCode", "VSC", and "Visual Studio Code". When users pick tags, it is best to use autocomplete suggestions that steer them toward existing tags.
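The autocomplete suggestion could work roughly like this. Existing tags that start with what the user has typed are suggested, and a small synonym table maps aliases such as "VSC" to the canonical tag. The synonym table and function names are hypothetical. A real system might instead keep aliases in the database or match on substrings.

```python
# Hypothetical synonym table: alternative spellings mapped to one canonical tag.
TAG_SYNONYMS = {
    "vs": "Visual Studio",
    "vscode": "Visual Studio Code",
    "vsc": "Visual Studio Code",
}

def canonical_tag(raw: str) -> str:
    """Replace a known alias with its canonical tag; otherwise keep what the user typed."""
    return TAG_SYNONYMS.get(raw.strip().lower(), raw.strip())

def suggest_tags(prefix: str, existing: list[str], limit: int = 5) -> list[str]:
    """Autocomplete: existing tags matching the typed prefix, including via known aliases."""
    p = prefix.strip().lower()
    if not p:
        return []
    matches = [t for t in existing if t.lower().startswith(p)]
    # If the prefix matches an alias, offer the canonical tag instead of creating a duplicate.
    for alias, target in TAG_SYNONYMS.items():
        if alias.startswith(p) and target in existing and target not in matches:
            matches.append(target)
    return matches[:limit]
```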

Blog system designers must also think about tag URLs. Using the tag text itself in the URL causes many problems. A tag that is a single English word, such as Excel, is fine, because the URL is simply https://yourblog/tags/excel. With tags such as .NET Core, C#, or Robots.txt, however, things get more interesting. Is https://yourblog/tags/robots.txt requesting a tag, or a file named robots.txt? As the designer, I could programmatically make every parameter of the tags route be treated as a tag name. That seems to solve the problem, but SEO and scanning tools disagree, because they follow many conventions under which such a URL is a request for a file.

Tag text that requires URL encoding makes URLs unreadable, which hurts SEO. Don't assume that modern search engines handle URL encoding well; whether a URL is clean has a big impact on SEO. This is especially true for Chinese tags. When they are fully encoded, the URL becomes very long, which hurts SEO and makes links awkward for bloggers to share. To handle tag URLs, my blog system therefore gives each tag a normalized name that the system generates automatically from the tag text. For example, .NET Core is normalized to dotnet-core, and the resulting URL is https://edi.wang/tags/list/dotnet-core.
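A normalization rule that reproduces the ".NET Core" to "dotnet-core" example might look like the following. The replacement table (`.` to "dot", `#` to "sharp", `+` to "plus") is a hypothetical choice, not Moonglade's documented rule. The sketch only handles ASCII: Chinese tags would need transliteration (for example, pinyin) or a manually assigned name.

```python
import re

# Hypothetical replacements for characters that are meaningful in URLs or file names.
SPECIAL_CHARS = {".": "dot", "#": "sharp", "+": "plus"}

def normalize_tag(name: str) -> str:
    """Turn tag text into a URL-safe name, e.g. '.NET Core' -> 'dotnet-core'."""
    s = name.strip().lower()
    for ch, word in SPECIAL_CHARS.items():
        s = s.replace(ch, word)
    s = re.sub(r"\s+", "-", s)               # whitespace becomes hyphens
    s = re.sub(r"[^a-z0-9-]", "", s)         # drop anything else that is not URL-friendly
    return re.sub(r"-{2,}", "-", s).strip("-")

print(f"https://edi.wang/tags/list/{normalize_tag('.NET Core')}")
```

Because two different tags could normalize to the same name, the system should check for collisions when a tag is created.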

(Picture: Label of Moonglade blog system)

For users, one of the most common mistakes is treating tags as search keywords. For example, someone writing about Visual Studio Code might tag the post with VSCode, VSC, and Visual Studio Code all at once, when only one tag is needed. With too many tags that mean the same thing, readers cannot find all related posts through any single tag, and the same is true for search engines. Using tags well is therefore something both blog designers and users need to pay attention to.

A tag cloud lists a blog's most popular tags. Tags attached to more posts are usually shown in larger type and more prominent colors. A tag cloud is also part of a blogger's personal profile: you can see at a glance which topics the blogger is keen on (Windows Phone, anyone? 0.0).
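The font sizes in a tag cloud can be computed by interpolating linearly between a minimum and a maximum size according to each tag's post count. The em range below is an arbitrary assumption. Many sites use a logarithmic scale instead, so that one very popular tag doesn't shrink all the others.

```python
def tag_cloud(counts: dict[str, int], min_size: float = 0.8, max_size: float = 2.0) -> dict[str, float]:
    """Map each tag to a font size (in em) proportional to how many posts use it."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # Linear interpolation; if every tag has the same count, all get the middle size.
        ratio = (n - lo) / span if span else 0.5
        sizes[tag] = round(min_size + ratio * (max_size - min_size), 2)
    return sizes
```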

05 Archive

An archive organizes blog posts by time (year, month, day). Unlike categories, an archive divides posts purely by time. SEO matters less for archives than for posts, categories, and tags, so apart from dividing URLs by year and month, there is nothing special to watch for.

For example, https://edi.wang/archive/2019/9 lists the posts from September 2019, and https://edi.wang/archive/2019 lists all posts from 2019. The archive mainly lets readers browse by time and see what the blogger was doing at a given point. This feature can increase readers' interest in the blogger, and it also helps present the blogger's public image.
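Building the archive amounts to grouping posts by year and month. The sketch below uses the URL shape from the examples above, with a month number that is not zero-padded. The function names are hypothetical, and real posts would come from the database rather than a list of tuples.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

def build_archive(posts: list[tuple[str, date]]) -> dict[int, dict[int, list[str]]]:
    """Group (slug, publish date) pairs by year, then by month, newest first."""
    grouped: dict[int, dict[int, list[str]]] = defaultdict(lambda: defaultdict(list))
    for slug, published in sorted(posts, key=lambda p: p[1], reverse=True):
        grouped[published.year][published.month].append(slug)
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {year: dict(months) for year, months in grouped.items()}

def archive_url(year: int, month: Optional[int] = None) -> str:
    """URL scheme from the examples in the text: /archive/2019 or /archive/2019/9."""
    base = f"https://edi.wang/archive/{year}"
    return base if month is None else f"{base}/{month}"
```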

(Picture: Archive of Moonglade blog system)

06 Page

Pages are an optional blog feature and are actually closer to CMS functionality. Some content, such as an "About" page, is not suited to being published as a post. Such pages usually have nothing to do with publication time, their content is updated frequently, and their layout is very free-form rather than just text.

Pages usually don't need comments, tags, or categories, but they can have publish and edit times. Like posts, pages also need a well-chosen slug.

(Picture: About page of my blog)

In my blog system, each page can also choose whether to hide the sidebar. Users can write a page's HTML and CSS entirely by hand and add the page to the navigation menu. WordPress handles pages more completely, close to a full CMS.

07 Subscription

Readers mainly subscribe to blogs through feeds (RSS/Atom) and newsletters. A feed is essentially a passive subscription: the client software must request the feed from the server to check for new posts before it can show them. A newsletter is usually pushed to subscribers by email. This, however, requires the developer of the blog system to implement email subscriptions and the administrator to maintain an email service. Subscriptions usually include only the most recently published posts, such as the latest 10 or 20, rather than pushing every post each time and overwhelming the client.
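A minimal RSS 2.0 feed that keeps only the newest posts can be built with the standard library. `FeedPost`, `build_rss`, and the `/post/<slug>` link scheme are hypothetical. A production feed would usually also include `guid` and `description` for each item.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime
from typing import NamedTuple

class FeedPost(NamedTuple):
    title: str
    slug: str
    published: datetime  # expected to be timezone-aware

def build_rss(blog_title: str, blog_url: str, posts: list[FeedPost], limit: int = 10) -> str:
    """Return an RSS 2.0 document that contains only the `limit` newest posts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = blog_title
    ET.SubElement(channel, "link").text = blog_url
    ET.SubElement(channel, "description").text = blog_title
    newest = sorted(posts, key=lambda p: p.published, reverse=True)[:limit]
    for post in newest:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post.title
        # Hypothetical URL scheme for post links.
        ET.SubElement(item, "link").text = f"{blog_url}/post/{post.slug}"
        # RSS 2.0 uses RFC 822 dates, which email.utils.format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(post.published)
    return ET.tostring(rss, encoding="unicode")
```

A per-category feed would be the same function applied to the posts of one category.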

(Picture: Moonglade's RSS/Atom feed)

Subscriptions can usually also be offered per category, so readers interested only in certain categories can follow just those. Some blog systems also provide a feed of post comments, so readers can follow the roasting that goes on in the comments.

For a detailed introduction to RSS and Atom, see sections 5.1 and 5.2.

08 Version control

Blog systems that are closer to a CMS usually provide version control, so users can roll articles or pages back to earlier versions. When designing version control, don't think only about moving forward; you must also support rolling back. Typically, each time a user edits an existing post, a new version is created, similar to a git commit on a file. Blog version control also resembles source code version control in that you can save either the full content of each version or only the delta. Saving the full content is simple and saves a lot of effort later, but it takes more storage space. Saving only the changes saves database space, but the implementation takes considerably more effort.
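To make the "save only the delta" option concrete, the sketch below stores version 0 in full and each later edit as a list of changed line ranges computed with `difflib.SequenceMatcher`. Any version is rebuilt by replaying deltas from the base. `PostHistory`, `make_delta`, and `apply_delta` are hypothetical names, and this is only one of several ways to encode deltas.

```python
import difflib

Delta = list[tuple[int, int, list[str]]]  # (start, end, replacement lines) against the previous version

def make_delta(old: list[str], new: list[str]) -> Delta:
    """Record only the changed line ranges, not the unchanged text."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

def apply_delta(old: list[str], delta: Delta) -> list[str]:
    """Rebuild the next version by splicing each change into the old lines."""
    result: list[str] = []
    pos = 0
    for i1, i2, replacement in delta:
        result.extend(old[pos:i1])  # unchanged lines before this change
        result.extend(replacement)  # inserted/replaced lines (empty for a deletion)
        pos = i2
    result.extend(old[pos:])
    return result

class PostHistory:
    """Stores version 0 in full and each later edit as a delta, like a chain of commits."""

    def __init__(self, content: str) -> None:
        self.base = content.splitlines()
        self.deltas: list[Delta] = []
        self.latest = self.base

    def commit(self, content: str) -> None:
        new = content.splitlines()
        self.deltas.append(make_delta(self.latest, new))
        self.latest = new

    def version(self, n: int) -> str:
        """Roll back (or forward) to version n by replaying deltas from the base."""
        lines = self.base
        for delta in self.deltas[:n]:
            lines = apply_delta(lines, delta)
        return "\n".join(lines)
```

The extra code and the replay cost that grows with history length are the "implementation effort" trade-off described above. Storing full copies would reduce `version` to a simple lookup.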

09 Theme and Personalization

A good blog system usually supports themes; after all, personalization is one of the defining traits of a blog. WordPress has built up a large theme library and also allows custom themes. My blog, by contrast, only supports changing the theme color, and there is still plenty of room for improvement.

10 Users and permissions

Blog systems come in three kinds: personal blogs, team blogs, and blog platforms. A personal blog system usually has a single user (like mine) and needs no permission design or registration. Multi-user blogs need different roles and permissions, such as blog administrators, moderators, writers, and comment administrators. Whether the blog is single-user or multi-user, integrating a mature third-party RBAC solution may be the most efficient choice. Most third-party solutions also support SSO; my blog, for example, supports Azure AD.
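At its core, role-based access control is a lookup from roles to permissions. The role names below follow the examples in the text, but the permission set and the mapping are hypothetical. With SSO, the user's roles would typically come from the identity provider's claims rather than from a hard-coded set.

```python
from enum import Enum, auto

class Permission(Enum):
    WRITE_POST = auto()
    PUBLISH_POST = auto()
    MODERATE_COMMENT = auto()
    MANAGE_SETTINGS = auto()

# Hypothetical role table for a multi-user blog.
ROLE_PERMISSIONS: dict[str, frozenset[Permission]] = {
    "administrator": frozenset(Permission),  # every permission
    "moderator": frozenset({Permission.MODERATE_COMMENT, Permission.PUBLISH_POST}),
    "writer": frozenset({Permission.WRITE_POST}),
    "comment_admin": frozenset({Permission.MODERATE_COMMENT}),
}

def is_allowed(roles: set[str], needed: Permission) -> bool:
    """A user may hold several roles; access is granted if any of them has the permission."""
    return any(needed in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)
```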

11 Plug-ins

Plug-ins extend a blog's features on demand without changing the blog's code. Both WordPress and BlogEngine support plug-ins, but Moonglade does not yet.

(Picture: WordPress Plugin Market)

12 Handling images and attachments

Image format

In 2020, the choice of image format is wide open. Most blogs use JPG, while programmers' blogs mostly use PNG (they are mostly screenshots, after all). Using WebP, as WeChat Official Accounts do, is also a good option, as long as readers' devices support it. BMP is generally not recommended, because its large files are slow to transfer over the network. For the same reason, GIFs should be subject to a size limit.

When the blog system serves images, it must use the correct MIME type to ensure client compatibility. Serving static files directly usually doesn't require bloggers to handle the MIME type manually. Blogs with special image-processing logic (such as my Moonglade), however, need to take care to preserve the image's original MIME type.
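One way to preserve the original MIME type through custom processing is to read it from the file's leading bytes ("magic numbers") instead of trusting the extension. The sketch below covers the formats mentioned above. The function name is hypothetical, and the two-byte BMP signature is short enough that it could occasionally match non-image data.

```python
from typing import Optional

# File signatures of common web image formats.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"BM", "image/bmp"),
]

def sniff_image_mime(data: bytes) -> Optional[str]:
    """Detect the image MIME type from the file header; None if unrecognized."""
    # WebP is a RIFF container: "RIFF" + 4-byte size + "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    for signature, mime in _SIGNATURES:
        if data.startswith(signature):
            return mime
    return None
```

The detected value would then be sent as the `Content-Type` header when the processed image is served.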

Image watermarks

Automatically adding a watermark to uploaded images helps protect copyright. The watermark is usually the blog's address or the blogger's name. When adding it, scale the watermark to the image size so that it doesn't cover important content or hinder reading. For images that are too small, you can choose to skip the watermark.

Also, since a blog may be renamed over time, it is advisable to keep a copy of the original image when adding the watermark, so that the watermark can be updated later.
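Scaling the watermark and skipping small images reduces to a little arithmetic. The sketch below picks a font size so that the text spans about 40% of the image width and caps it at a tenth of the image height. Every threshold, and the 0.6 em average character width, is an assumption for illustration. It does not reproduce the method in my ASP.NET Core article.

```python
from typing import Optional

MIN_WIDTH, MIN_HEIGHT = 300, 100  # assumed thresholds in pixels; smaller images are skipped
CHAR_WIDTH_EM = 0.6               # rough average glyph width for Latin fonts (an assumption)

def watermark_font_size(width: int, height: int, text: str, ratio: float = 0.4) -> Optional[int]:
    """Font size (px) that makes `text` span about `ratio` of the image width, or None to skip."""
    if not text or width < MIN_WIDTH or height < MIN_HEIGHT:
        return None  # nothing to draw, or the watermark would cover too much of a small image
    size = int(width * ratio / (len(text) * CHAR_WIDTH_EM))
    # Cap the text height at a tenth of the image, with a small readable minimum.
    return max(8, min(size, height // 10))
```

In line with the advice above, the original upload would be stored untouched and the watermark drawn on a copy. If the blog is renamed, every watermark can then be regenerated from the originals.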

For specific methods, please refer to my article "ASP.NET Core Watermarking Uploaded Pictures".

Image storage

Where to store images is a question worth pondering. There are generally three options: the file system, the database, and a cloud blob storage service. Moonglade supports the file system and Azure Blob Storage. Each of the three has pros and cons.

The file system's advantage is that serving static files directly from it is the fastest option. But if the image directory sits under the website directory, that directory cannot be read-only, which creates potential security problems. When I was in junior high, for example, it was popular to upload ASP web shells with disguised file extensions to DVBBS forums. Although attacks that upload executables to web servers had largely disappeared by 2020, the risk remains. Even if you hire 007 as your bodyguard, you still need to lock the door at night.

Storing images in the database is the most secure option, and it keeps all of the blog's data in one place, which simplifies management and backup. This was popular more than ten years ago. However, reading and writing images puts extra load on the database, and every image effectively has to be served twice: once from the database to the web server, and again from the web server to the client. It is generally not recommended.

A cloud blob storage service is currently the best fit for this era. Storing images as blobs keeps the server directory read-only and uses the cloud's own security features to restrict abnormal access. It can also speed up image delivery through a CDN. If I must name a drawback, it is that cloud services cost extra money, and being short of money is your problem, not the cloud's.
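Hiding the storage location behind a small interface keeps these options interchangeable. Here is a minimal sketch with a file-system backend that lives outside the web root and rejects path traversal. `ImageStore` and `FileSystemStore` are hypothetical names. A blob backend would implement the same two methods with a cloud SDK (for Azure, the `azure-storage-blob` package) and is not shown.

```python
from pathlib import Path
from typing import Protocol

class ImageStore(Protocol):
    """Storage backend for uploaded images; posts only keep the file name."""

    def save(self, name: str, data: bytes) -> None: ...
    def load(self, name: str) -> bytes: ...

class FileSystemStore:
    """Keeps images in a flat directory outside the web root, so the site folder can stay read-only."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        # Refuse names like "../web.config" or "sub/dir.png" that leave the flat storage directory.
        path = (self.root / name).resolve()
        if path.parent != self.root:
            raise ValueError(f"invalid file name: {name!r}")
        return path

    def save(self, name: str, data: bytes) -> None:
        self._path(name).write_bytes(data)

    def load(self, name: str) -> bytes:
        return self._path(name).read_bytes()
```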


Image hotlink protection (anti-leeching)

As website developers, we sometimes don't want other websites to embed our images directly. In some scenarios, this causes huge bandwidth consumption in our own data center: others use our images, and we pay for them. For example, suppose your site is a.com and hosts an image at http://a.com/facepalm.jpg, and b.com references that image with an img tag. Every time someone views that page on b.com, a request reaches your data center and consumes your resources. A blog can therefore optionally enable hotlink protection. For the specific method, see my article "ASP.NET / Core Website Image Anti-leeching".

Attachments

Programmers' technical blogs usually offer code samples and other attachments for readers to download. Designing an attachment feature is very similar to designing image storage and is entirely feasible. Even so, I recommend that technical blogs host attachments such as code samples on other websites (such as GitHub) for readers to download.

The disadvantages of downloading attachments from your own blog are:

Large files

Web servers and firewall products impose different file size limits, and whoever deploys the blog may not have permission to change those limits. As a result, large attachments may fail to download.

Domain and IP blacklists

Some companies and organizations (especially software companies with strict security standards) block file downloads from domains that are not on a whitelist. You can open web pages on such a domain in a browser, but you cannot download files from it: the firewall allows HTML, CSS, JS, and the like, but not ZIP, EXE, and so on. Readers of programmer blogs are quite likely to work at such companies.

CDN resource consumption

Suppose your attachments are large and numerous, and you put a CDN in front of them as you would for image storage. Depending on the CDN provider's billing model, if you are charged by traffic, attachment downloads may slim down your wallet quite quickly.

The benefits of hosting downloads with a third party (such as GitHub or OneDrive) are:

√ Your files can be shared not only in blog posts but elsewhere too;

√ These third-party services have their own CDNs, so you don't have to worry about draining your own wallet;

√ Many file hosting services offer complete management features such as deletion, recovery, version control, and permissions. Building all of that into your blog system would take a lot of time...

13 Sensitive word filtering and comment review

Blogs inevitably attract some hostile people as well as advertisers, so sensitive word filtering and comment review are usually necessary. If user comments appear under posts without review, they may harm the blogger and the website. Suppose someone posts politically sensitive remarks or non-compliant ads, they are displayed directly without review, and your blog is hosted in mainland China. Your blog is then likely to be shut down for rectification immediately, and you may unlock the programmer's "from beginner to prison" achievement. Don't assume you are safe hosting abroad, either. Some hateful remarks can even attract hackers, who may plant malware on your blog and blackmail you or your readers.

Therefore, I strongly recommend that personal blogs enable sensitive word filtering and comment review. Both WordPress and my Moonglade blog system support sensitive word filtering and comment review.

14 Static generation

Early news systems, blogs, and CMSs used static generation to speed up responses under heavy traffic. The server-rendered page was saved to disk as a real HTML file, and web servers serve static files very efficiently. For content that doesn't change, subsequent visits never hit the database, which greatly reduces the load on the server. By 2020, static generation is no longer the only option; a Redis cache can also cut down frequent database access. For a personal blog with modest traffic, you don't need to work 996 hours adding static generation or Redis, which only increases development and maintenance costs. If you are designing a blog platform, however, static generation or Redis is the better choice.
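Both static files and Redis follow the same cache-aside pattern: serve the stored copy if it is fresh, otherwise render from the database and store the result. Below is a minimal in-memory sketch. A plain dict with expiry times stands in for Redis or files on disk, and `PageCache` is a hypothetical name. The injectable clock only exists to make expiry testable.

```python
import time
from typing import Callable

class PageCache:
    """Cache-aside for rendered pages; an in-memory stand-in for Redis or static HTML files."""

    def __init__(self, ttl_seconds: float = 300, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # key -> (expires_at, html)

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # cache hit: no database access
        html = render()    # cache miss: query the database and render the page
        self._entries[key] = (now + self.ttl, html)
        return html

    def invalidate(self, key: str) -> None:
        """Call when a post is edited so that readers don't keep seeing the old version."""
        self._entries.pop(key, None)
```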

15 Notification system

Blogs usually send notifications to administrators or users by email. However, no specification or protocol requires this; whether a blog uses email for notifications is up to the designer of the blog system.

Notifications usually include:

Notifications to bloggers: new comments, and posts being cited by other blogs (see sections 5.8 and 5.9).

Notifications to users: new posts published (for newsletter subscribers), replies to their comments, and comments approved or rejected.

An email notification system must pay attention to spam and to user privacy.

Sending spam to the blogger is not a big problem in itself. However, check whether the mail system lets readers send emails without the blogger's permission, because that could be abused to send spam and get the server blocked. Some providers, such as Microsoft Azure, have stricter email rules: code on some PaaS services that calls SMTP endpoints directly is blocked outright.

As for privacy, when users give the blog system their email address, you must tell them how it will be used (in the privacy policy or a visible area of the page), or let them opt in to email notifications from the blogger. Another problem is exposure of email addresses, which usually happens with newsletter mass mailings. If all subscribers' addresses go in To or CC, every recipient sees everyone else's address, which invites unsolicited contact and fraud. So use BCC for newsletters or send them individually, and let users unsubscribe.
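With Python's standard library, a BCC newsletter can be built as below. The subscriber addresses go only into the `Bcc` header, the blog's own address is the visible recipient, and an unsubscribe link is included both as the standard `List-Unsubscribe` header and in the body. The function name and the unsubscribe URL are hypothetical.

```python
from email.message import EmailMessage

def build_newsletter(sender: str, subscribers: list[str], subject: str, body: str,
                     unsubscribe_url: str) -> EmailMessage:
    """One newsletter message whose subscriber addresses appear only in Bcc."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = sender                   # the only visible recipient is the blog itself
    msg["Bcc"] = ", ".join(subscribers)  # hidden from every recipient
    msg["Subject"] = subject
    # Standard header (RFC 2369) that many mail clients show as an "Unsubscribe" button.
    msg["List-Unsubscribe"] = f"<{unsubscribe_url}>"
    msg.set_content(f"{body}\n\nUnsubscribe: {unsubscribe_url}")
    return msg
```

When this message is passed to `smtplib.SMTP.send_message`, it is delivered to the Bcc addresses, but the Bcc header itself is not transmitted, so subscribers cannot see each other.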

Moonglade's notification system uses email, but its design is fairly basic. A complete notification system should use a message queue and an event-driven design, together with third-party services. On Azure, for example, you can combine Storage Queue, Function App, and SendGrid to avoid blowing up on the spot when sending large batches of emails.
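The queue-based design decouples "an event happened" from "send the email". Here is a minimal in-process sketch. `queue.Queue` stands in for a cloud queue such as Azure Storage Queue, the injected `send` callable stands in for a mail service such as SendGrid, and `process_batch` plays the role of a periodically triggered worker. All class and method names are hypothetical.

```python
import queue
from dataclasses import dataclass
from typing import Callable

@dataclass
class Notification:
    to: str
    subject: str
    body: str

class NotificationQueue:
    """Web requests only enqueue; a worker sends in bounded batches."""

    def __init__(self, send: Callable[[Notification], None], batch_size: int = 50) -> None:
        self._queue: "queue.Queue[Notification]" = queue.Queue()
        self._send = send
        self.batch_size = batch_size
        self.failed: list[Notification] = []

    def publish(self, n: Notification) -> None:
        """Called from the request path; returns immediately instead of waiting on SMTP."""
        self._queue.put(n)

    def process_batch(self) -> int:
        """Worker step: take at most batch_size messages and return how many were sent."""
        sent = 0
        for _ in range(self.batch_size):
            try:
                n = self._queue.get_nowait()
            except queue.Empty:
                break
            try:
                self._send(n)
                sent += 1
            except Exception:
                self.failed.append(n)  # keep failures for a later retry instead of losing them
        return sent
```

Limiting each batch keeps a large mailing from overwhelming the mail service or tripping its rate limits.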

Tomorrow's part will mainly introduce [blog protocols and standards]. Stay tuned!

Scan the QR code to follow Microsoft MSDN to get more first-hand technical information and official learning materials from Microsoft!