Offline batch processing is common in enterprise systems, and the principles of good design for it differ greatly from those for online request/response processing and streaming. This kind of work is used very heavily in finance and accounting systems, enterprise administration, and data analysis and reporting, so it is a pattern found throughout enterprise IT. This post aims to summarize the basic principles for designing and checking offline batch processing.
Note: The architecture assumed for these basic principles is one in which the batch process runs on active-standby clustered servers.
- The users that run batch jobs must be separate from the regular users used for the OS and the system.
- User rights/privileges must be set correctly and limited to what is actually needed.
- Make sure the user that runs a given batch job has the correct permissions on the resources and files it needs.
- Users and configurations/settings must match between the active node and the failed-over/standby node, and no resource locks may be left hanging between them.
- On fail-over from one machine to another, working states/data must be transferred correctly, within the agreed data time window.
- If the machine that runs batch work must also serve online traffic, make sure the load balancer is aware of the fail-over and can correctly move incoming requests/connections from clients to the machine that has taken over.
- If a heartbeat is used between machines, the heartbeat network should be separated from the service VLAN, the data replication VLAN, and the management VLAN.
- If a heartbeat is used between machines, the heartbeat VLAN should also have a backup network.
- Watch out for batch jobs consuming so much of any machine resource that other work cannot be served or fails.
- Watch out for support/operations staff or systems whose work may block or interfere with batch jobs.
- Batch jobs need validation, monitoring, logging, and health-checking that provide information meaningful, clear, and detailed enough to piece together what the system was doing at any moment, so that a run can be reproduced whenever needed.
- A dashboard showing the status of each link in the chain of batch jobs is essential.
- Everything used to communicate between batches is temporary. The channel, directory, time window, users, and ports for sending and receiving must be agreed on and designed to match: right place, right user, right time.
- The component that manages and dispatches batch jobs should be separate from the component that runs them.
- The fault management process should be separate from the batch itself.
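As a rough illustration of the logging point above, here is a minimal Python sketch of a wrapper that records one structured entry per batch step. The names `run_batch_step` and `count_lines`, and the fields of the record, are assumptions made up for this example rather than part of any particular batch framework.

```python
import json
import time
import uuid


def run_batch_step(name, func, params, log):
    """Run one batch step and append a structured record to `log`.

    The record keeps the step name, parameters, timing, and outcome so that
    a failed or suspicious run can be pieced together and re-run later.
    """
    record = {
        "run_id": str(uuid.uuid4()),
        "step": name,
        "params": params,
        "started_at": time.time(),
    }
    try:
        record["result"] = func(**params)
        record["status"] = "ok"
    except Exception as exc:  # record the failure instead of losing it
        record["status"] = "failed"
        record["error"] = repr(exc)
    record["finished_at"] = time.time()
    log.append(json.dumps(record, sort_keys=True))
    return record


def count_lines(text):
    # Hypothetical stand-in for a real batch step.
    return len(text.splitlines())


log = []
ok = run_batch_step("count_lines", count_lines, {"text": "a\nb\nc"}, log)
bad = run_batch_step("count_lines", count_lines, {"text": None}, log)
```

Because the parameters are stored with each record, a failed step can be replayed with the same inputs. A real system would write these records to durable storage rather than an in-memory list.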
Once the requirements are gathered, we start to analyze them, which leads to Gap Analysis. A gap is essentially a “thing” that we need to have or use in order to deliver a solution in response to the requirements, so a gap can fall into one or more of the following domains:
- IT Context
- Business and Business Process
- Money
- Data
High-level Gap Analysis
We do this analysis to create the Gap Registry. Gap analysis can be done by listing in a table or a mind map if there are not too many requirements, but a medium to large project is likely to have hundreds to thousands of requirements. We list the gaps for each project phase:
- Analysis and Design
- Development, Build, Test
- Deployment and Distribution
- Life
- Migration
- Support and Maintenance
| Gap ID | Requirement ID | Gap Description | Domain(s) | Project Phase(s) |
Each gap could lead to Project Dependencies, and each gap and each dependency leads to Impact(s) and Risk(s) later.
| Dependency ID | Gap ID | Dependency | Description | Domain(s) | Project Phase(s) |
We will use this list of gaps and dependencies that fall into IT Context, Business Process, and Data domains to create Context Diagram.
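The two registries above could be kept in a spreadsheet, but a small Python sketch may make the structure clearer. The class names, field names, and sample rows below are hypothetical; only the columns and the domain names come from the tables and lists in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Gap:
    gap_id: str
    requirement_id: str
    description: str
    domains: set = field(default_factory=set)
    phases: set = field(default_factory=set)


@dataclass
class Dependency:
    dependency_id: str
    gap_id: str
    dependency: str
    description: str
    domains: set = field(default_factory=set)
    phases: set = field(default_factory=set)


# Domains that feed the Context Diagram (everything except Money).
CONTEXT_DOMAINS = {"IT Context", "Business and Business Process", "Data"}


def context_diagram_inputs(gaps, dependencies):
    """Return the gaps and dependencies that touch a context-diagram domain."""
    picked_gaps = [g for g in gaps if g.domains & CONTEXT_DOMAINS]
    picked_deps = [d for d in dependencies if d.domains & CONTEXT_DOMAINS]
    return picked_gaps, picked_deps


# Made-up sample rows, for illustration only.
gaps = [
    Gap("G-1", "R-10", "No nightly settlement feed", {"Data"}, {"Analysis and Design"}),
    Gap("G-2", "R-11", "Extra licence budget", {"Money"}, {"Deployment and Distribution"}),
]
deps = [
    Dependency("D-1", "G-1", "Upstream ledger export",
               "Ledger team must expose the export",
               {"IT Context"}, {"Development, Build, Test"}),
]
picked_gaps, picked_deps = context_diagram_inputs(gaps, deps)
```

Here `picked_gaps` keeps only `G-1`, because `G-2` falls purely into the Money domain.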
High-level Impact Analysis
A gap or dependency could lead to one or more changes (new, modified, or even eliminated) to one or more things in one or more domains.
Then we prepare an Impact Registry to list all high-level impacts, as in the following table:
| Impact ID | Gap ID | Dependency ID | Impact Description | Domain(s) | Project Phase(s) | Note(s) |
We use this spreadsheet to raise impacts to senior management and the project committee. We also use the impacts that fall into the IT Context, Business Process, and Data domains, together with the Context Diagram, to do further detailed impact analysis.
High-level Risks Analysis
This step creates the Risks Registry, which is reviewed with the governance team, stakeholders, and project committee, and is also used in project management.
| Risk ID | Gap ID/Dependency ID | Risk Description | Impact Level (L/M/H) and Description | Likelihood (L/M/H) and Description | Mitigation | Remark(s) |
After review, the approved mitigation of each risk will be added into the next revision of the design for implementation.
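To show how the Risks Registry might be ordered for a review meeting, here is a small Python sketch. The numeric mapping of L/M/H and the "impact times likelihood" score are assumptions chosen for illustration; the post itself only defines the L/M/H columns.

```python
from dataclasses import dataclass

# Assumed numeric weights for the L/M/H levels.
LEVEL = {"L": 1, "M": 2, "H": 3}


@dataclass
class Risk:
    risk_id: str
    source_id: str      # Gap ID or Dependency ID
    description: str
    impact: str         # "L", "M", or "H"
    likelihood: str     # "L", "M", or "H"
    mitigation: str = ""

    def score(self):
        # Assumed scoring rule: impact level times likelihood level.
        return LEVEL[self.impact] * LEVEL[self.likelihood]


def review_order(risks):
    """Return the risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score(), reverse=True)


# Made-up sample rows, for illustration only.
risks = [
    Risk("R-1", "G-1", "Feed arrives late", impact="H", likelihood="L"),
    Risk("R-2", "D-1", "Export format changes", impact="M", likelihood="M"),
    Risk("R-3", "G-1", "No fallback data source", impact="H", likelihood="H"),
]
ordered = review_order(risks)
```

With these sample rows the review would start with `R-3`, then `R-2`, then `R-1`.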
Change Impact Analysis, or Impact Analysis (IA), is the process of identifying the potential consequences of a change, or estimating what needs to be modified to accomplish a change. In software engineering, there are two common types of IA: “Traceability Impact Analysis” and “Dependency Impact Analysis”. The inputs to this process are the requirement or need that causes the change, the definition of that need or requirement, and the current picture of the architecture and its context. The output is the list of impacts or potential impacts.
In the scope of software-intensive system architecture, we usually mean Dependency Impact Analysis (DIA) when we say Impact Analysis, and DIA is the scope of this post; we normally use Traceability Impact Analysis (TIA) as the starting point for a change in existing requirements, scenarios, functionalities, and features. Other types of impact analysis exist, but they are outside the scope of this post.
The static structure, dynamic structure, and quality-attribute views of the architecture are used to identify the potential impacts by reviewing them against the scenario. Once we have the list of impacts, we submit it for review with the team and stakeholders.
At the first (and most important) level, we list only the impacts on the IT system as a whole:
- system logics (both data and processing)
- information architecture
- software architecture
- hardware system architecture
- networks and infrastructure.
Then we drill down to the detail level:
- Detailed software design and implementation
- Detailed OS and software framework or library specifications
- Detailed hardware and driver specifications
Once we have the list of impacts on the IT system, we move on to investigate the second-level impacts on the processes or procedures for managing the IT system:
- Release and distribution process
- Installation, patching and uninstallation procedures
- Backup, restoration, and reconciliation procedures
- Environment/site failover procedure
- QoS and SLA
- Support and troubleshooting procedure
- Monitoring and instrumentation procedure
- Development and QA process/methods
- Etc.
Once we are done with the second-level impact analysis, we can submit the report to the chief architect and enterprise architect, who review it and consult with the business to evaluate and analyse the impacts at service and business level, reviewing them with the business side as necessary.
Our businesses today rely more and more, and more and more transparently, on IT systems without our even knowing what those systems are doing behind the scenes for us. IT systems are therefore increasingly critical to us and our customers, while also becoming more complex and more distributed across time and space. These facts make business operations riskier and risk management harder, so we must have a process to reduce, avoid, and manage the risks from the starting point and throughout the whole lifecycle of the IT systems.
The architect is a very important role in this picture because she is the person who realizes the requirements as a solution, governs it from development through deployment, and supports it until the end-of-life of the system. To do this effectively from the start, we need a step in the design process to review, rethink, and refine the design at least once before releasing it to the production line. In this post we call that step the “Architecture Review Workshop”.
What is it?
The “Architecture Review Workshop” is a lightweight meeting/brainstorming event: a (preferably) face-to-face dialectic discussion that goes over the various design decisions to get, review, and confirm them, and to review and evaluate the designs. The discussions and work in the workshop are for design and architecting only; any peripheral work can be done through remote meetings and teleconferencing.
Because this workshop is meant to be a lightweight process, we should invite only the people directly involved and do just enough, so that we can really focus on thinking and discussing.
Why do we need it?
- Making this important step a visible part of the process means the team and related parties are always aware of it and join the discussion together at one time.
- Because a complex system carries high risks and costs, this workshop is an important opportunity to review, reduce, avoid, and manage those risks and costs.
- This workshop is a big chance to reach an agreed picture for reviewing, committing to, and guaranteeing the quality attributes of the architecture through the design process.
Where is it in the whole architecture design process?
If we look at the IT architecture design process at a high level, we can write it as a few simple steps, more or less like this:
- Project initiation and business missions envision
- Gather and analyse problems, requirements, and context
- Identify the architectural drivers and define the details
- Analyse and create the core concepts and models – business and analysis models
- Identify opportunities of reusing the architectural assets
- Analyse and create candidate schematic designs
- Review design options and decisions, and select the most suitable design e.g. Active Review for Intermediate Designs (ARID), …
- Refine and complete the design, its presentations, and documents
- Formal review and accept/sign-off the design
- Maintain the design and its documents
This list does not cover the organization and team management aspects of the whole architecture team; it focuses on the design work within a project.
What are we going to do?
In this workshop we review, discuss, and think only about the design and the important architectural decisions:
- Review and evaluate the intermediate or schematic designs and select the most suitable design
- Review the important architectural decisions and finalize the answers to those decisions.
- Review the big picture of the system/inter-system and the roadmap and timeline of the architecture.
We will discuss the definition of an “important” architectural decision later. We use the workshop only to review and finalize decisions because the studies and preliminary analysis for each decision, and its possible answers, should have been made and discussed for quite a while before the meeting, so the participants should already have the background knowledge for all or most parts of the problem under review.
What are the outputs?
- The selected or most suitable design.
- The agreed/finalized answers to the important design decisions.
- Optionally, we may also have new or updates in actions or tasks or even questions and issues.
How are we going to do it?
- The architect submits the material for review and works with the Facilitator and Chief Architect to identify the best reviewers.
- The architect prepares a briefing explaining the design, which may also have been reviewed and discussed in a preliminary way with the chief architect, peer architects, and important stakeholders beforehand.
- The architect presents the overview to the reviewers and walks through examples of using the design. Minutes-taker captures questions and answers.
- The reviewers and the architect brainstorm scenarios for using the design to solve problems and requirements.
- If the design performs well under the adopted scenarios and requirements, then it must be agreed that the design has passed the review.
- Minutes-taker records issues, problems, and places where the stakeholders get stuck.
- Set new appointment and distribute the minutes.
How long will it take?
Around one day, or one day and a half at the maximum for one project and one architect.
When should we do it?
At least once before submitting the design to the production line.
How often do we have to do it?
At least once per phase, but as many times as necessary if our system is very complicated or critical.
Who will join this workshop?
Here are the roles that will participate in the workshop:
- Moderator/Timekeeper/Minutes-taker/Facilitator
- Chief Architect/Head of Architect – he is my boss in this case …
- Project/Solution/Design Architect(s)
- Problem Statement/Theme Architect
- Contribute Architect(s)
- Review Architect
- Requirement Owner(s)/Product Manager(s)
- Other stakeholders related to the review
What are the inputs?
The inputs to this process are the following artefacts:
- List of the requirements in the scope of that design phase
- List of the prioritized non-functional requirements and top-five quality attributes
- Schematic architecture design documents (see “What are included in the design and its documents to be used in this workshop?” at the bottom of this post)
What are the tools and resources that we need to use?
- A central online place to post notes and discussions, where every stakeholder can subscribe to change notifications, e.g. Wiki, …
- Meeting room with whiteboard and projector (or virtual meeting facilities that can share presentation over the screen)
- Voice/video recorder + digital camera [Optional]
What are included in the design and its documents to be used in this workshop?
You may argue that the list below seems long, but almost all of us already do these things unintentionally in our own notes while we are designing and thinking. All we need to do is write them down, or copy and paste them into a single presentation or document file, to present them in the meeting:
- Architectural Drivers – E.g. list of architecturally significant functional and non-functional requirements, quality attributes, …
- Architectural Context – E.g. technical governance and corporate IT specifications and standards, compliance checklists, design specifications of the related systems, …
- Gaps Analysis [Optional]
- Impacts Analysis
- Design Assumptions
- Candidate design options and its presentations/documents + detailed IA, DA, R&M specific of each candidate.
- Risk and Mitigation list
- Notes – E.g. memo on the rationales behind design decisions, detailed diagrams, research papers, pending questions and issues
- Design prototypes [Optional]
One of the biggest questions in designing the architecture of a system arises because “simplicity” is a relative thing rather than something defined by a hard-and-fast rule. Given that, I think we can still use some criteria to define a baseline here:
- First of all, it fulfils the MANDATORY functional requirements.
- It also fulfils the MANDATORY non-functional requirements.
- Frequently, we tend to jump right into the problem before knowing the exact threshold of each non-functional requirement.
- Please make sure that we already have the list of prioritized non-functional requirements with the expected values that are really *NEEDED*, not fancy features please.
- There is nothing left to take out; removing anything would break it or leave some mandatory requirement unmet.
- It can be developed, tested, distributed, deployed, maintained, managed, and supported by using typical knowledge, tools, and processes.
- Sometimes, sometimes, the real problem is hidden in politics.
This list does not guarantee that we can get the best or simplest or most optimal design in one shot, but it is a reminder for us to always review it while we are designing something.
Having said that, sometimes we also have to take other criteria into account. Somebody may argue that it depends on the type of system we are designing, e.g. a life-critical or safety-critical system, or a system that directly accesses precious resources. In such situations a small degree of over-engineering may be acceptable or even preferable, and I agree with that to some extent.
I just had a tiresome series of arguments with friends regarding the design of a tool or small system that will be used for directly manipulating the master/reference data in the production environment. The argument was all about which design is more suitable or simpler, Design A or Design B.
What makes this kind of argument complicated is that each person tends to have his or her own view and level of “simplicity”. Although both designs can serve the functional requirements very well, we still had different points of view on the suitable architecture. The problems are hidden in the priorities of the quality attributes.
So I decided to note down this lesson learnt about the quality or non-functional attributes that are most influential on our design:
- Needed availability and resiliency level.
- Needed security, safety, and controllability level.
- Needed level of workloads and concurrency e.g. requests per second, no. of concurrent connections, no. of jobs per batch, …
- Needed performance (e.g. response time, update rate, transactions or batches per second, …) per the expected workload or concurrency level.
The first two are critical in defining the style/pattern/topology used in your architecture, from both static and dynamic views, while the latter two help you define your architecture in more detail, especially from the dynamic views (or the processing and threading view in the 4+1 method).
There are several areas to touch on when we start discussing how to produce a flexible architecture, and we could fill several posts, all the way back to the business side, if we do not put a scope on this dialectic. So I decided to put aside the points regarding the whole product delivery process, development process, architecture review process, and requirement management process, and concentrate on the technical design topics, which are a lot easier to keep practical and to apply to our job immediately:
- Have well-prioritized requirements or scenarios, so we know what the core or value-added functionalities of the service are.
- Clear and concise traceability from scenarios and features down to the chain of system components.
- On/off configurations and scripting for non-core features.
- Stateless and horizontal scalability are still the most preferable.
- No distributed transaction.
- Decoupling the business process logics from the processing or workers.
- Decoupling the service interface from the service implementation.
- Modularity and composability
- Self-containment and isolation of the services/modules.
- Interoperability and open standard protocols.
- Contract-driven design for both data and interfaces.
- Loose-coupling.
- Weak Typing.
- Prefer stateless over stateful connection.
- Fire and forget communication.
- Event and message driven.
- Not so realtime, please.
- Cross-cutting and common concerns.
- Dependency management and injection.
- Make sure to have the complete test stubs.
- Ease of deployment, undeployment, and redeployment is always preferable.
- Virtualization and cloud computing.
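Two of the points above, on/off configuration for non-core features and decoupling the service interface from its implementation, can be sketched together in a few lines of Python. The names `Notifier`, `ListNotifier`, `OrderService`, and the `notify_on_order` flag are hypothetical and exist only for this example.

```python
from typing import Protocol


class Notifier(Protocol):
    """Service interface: callers depend on this, not on an implementation."""

    def send(self, message: str) -> None: ...


class ListNotifier:
    """One possible implementation; it simply collects messages in memory."""

    def __init__(self):
        self.sent = []

    def send(self, message: str) -> None:
        self.sent.append(message)


class OrderService:
    def __init__(self, notifier: Notifier, features: dict):
        self.notifier = notifier
        self.features = features

    def place_order(self, order_id: str) -> str:
        # Core functionality always runs.
        status = f"order {order_id} accepted"
        # Non-core feature, switched on or off by configuration only.
        if self.features.get("notify_on_order", False):
            self.notifier.send(status)
        return status


quiet_notifier = ListNotifier()
quiet = OrderService(quiet_notifier, {"notify_on_order": False})
quiet_status = quiet.place_order("A1")

loud_notifier = ListNotifier()
loud = OrderService(loud_notifier, {"notify_on_order": True})
loud_status = loud.place_order("A2")
```

The same `OrderService` code serves both cases. Swapping `ListNotifier` for another implementation, or flipping the flag, needs no change to the core logic.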
Note: This list is not prioritized in any order because the priorities vary from project to project, but it is a good starting point for reviewing our design and can even be applied immediately.
A flexible architecture gives us a lot of benefits at both creation and modification time. At creation time, we can plan and develop the system in a distributed agile mode, because we can adopt the “Just Enough Design Upfront” concept to analyse the context and create the skeleton architecture or schematic design as a common high-level picture among the distributed teams. We then define the contracts (plus some important specifications) and review them together through interaction simulations for each scenario.
For the architects to create pictures that compose together, requirement/scenario management and synchronization between the product managers are critical.
Once we share the same clear high-level pictures, scenarios, interactions, contracts, and responsibilities, each team can start its work at full speed. Be aware that some teams may have to provide test stubs or mocks to other teams, or even build prototypes, while other teams may not. So the planning and orchestration skills of the project managers are key.
Putting the business management aspect aside, there are areas where IT has a job to do in improving business agility to respond to new markets. There are two ways for the business to respond to new needs: provide brand new products, or provide new flavors of existing products. While proposing completely new products is great, proposing new flavors is also very interesting to do.
What can we do about this? There are some key concepts to keep in mind in responding to this requirement:
- Make the products customizable through configuration as much as possible, not coding or re-deployment.
- The configuration or customization process and tools must be easy and fast, but safe and secure.
- Configurations must be cleanly and easily separated, so that each one can be operated independently.
- Beware of data, logical processing, and physical component dependencies between configurations.
- Beware of the separation and differences in QoS and non-functional policies between configurations.
- If new things or modifications need to be introduced into the system, be sure to keep them loosely coupled with the legacy components.
- Test it with real data, real scenarios, real load, in (very close to) real environment, over real network.
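As a minimal sketch of a "new flavor through configuration", the Python below derives a product flavor from a base configuration without touching the base. The configuration keys, the values, and the `build_flavor` helper are all made up for illustration.

```python
import copy

# Hypothetical base product configuration.
BASE = {
    "region": "default",
    "limits": {"daily": 1000},
    "features": {"cashback": False},
}


def build_flavor(base, overrides):
    """Return a new configuration: a deep copy of `base` with `overrides` applied.

    Dict-valued sections are merged one level deep; other values are replaced.
    The base is never mutated, so flavors stay independent of each other.
    """
    config = copy.deepcopy(base)
    for section, value in overrides.items():
        if isinstance(value, dict) and isinstance(config.get(section), dict):
            config[section].update(value)
        else:
            config[section] = value
    return config


premium = build_flavor(BASE, {"limits": {"daily": 5000}, "features": {"cashback": True}})
```

Because each flavor is its own copy, changing `premium` cannot leak into `BASE` or into another flavor, which is one way to keep configurations cleanly separated.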
I’ve been thinking about this in my free time for more than a month. Our development teams are excited and eager to move to an agile development process, while we, as the architects who take care of many tasks and needs from several stakeholders and projects, have to look after those teams and respond to their demands so that they can run and enjoy their work.
It boils down to the conclusion that we may also need to change our own process or method of working. On the Bangkok side, we have to be able to respond to the needs of the development teams along with other stakeholders while keeping them in balance. So we need to make our method of working more flexible and clear.
Architects have to do the architecture design and govern it through implementation, deployment, support, and maintenance. Developers following the agile manifesto, on the other hand, tend to prefer and trust thinking that is proven through testing and refactoring (to make an “Emergent Design”) over what they call our “Big Design Up Front”.
In order to work with teams that are trying to adopt an agile process, we need some guidelines or rules for deciding which requirements or architectural elements must pass strictly through the architectural design process and which can be left to be solved at development level. Here are the criteria I can think of for the moment: the more an element or requirement matches these concerns, the more it has to pass through the formal architecture design process, prototyping, and governance.
1/ How much this requirement or functionality or module or project is:
- Less clear and concise
- Critical or important
- To access, feed, inject, process, or manipulate something into the critical system or the system that is a single-point-of-failure of a system
- Complex
- Risky
- Cause impacts to others
- Be the foundation work for other works/teams
- Urgent
- Distributed processing
- Heavily connected to many parties such as the integration works
- High cost
- Quality attributes are highly concerned
- Frequently change the responsible product managers
2/ How much the development teams used for this module or system or functionality or requirement are:
- Less skilled
- Less mentally matured
- Less process-wise matured
- Distributed across time-zones and spaces
- Politically divided
- Limited in resources
- Frequently change their staff
3/ Technology and tools
- Less mature
- Low activity (for open-source technology)
- High licensing costs
- Less vendor support
- Low and slow response in their discussion boards (for open-source technology)
- Hard to do maintenance and high TCO
- Unclear between technology choices
- Vendor’s status is uncertain
4/ Budget for your project
- Limited
- Uncertain and need long time for approval process
5/ Project and timeline
- Strict and rigid
- Tight
- Uncertain
- Frequently change project managers
6/ Environment and infrastructure and its management
- Less mature
- Limited
- Distributed
- Concerns in security
- Need long process for changing something
Everything else, I think, we can leave to be solved or tried at development level so that the teams can develop faster and with as much agility as they need, while we join them in the potentially architecture-related refactoring and emergent design discussions. We then gather the results and feedback to do formal design and communication for formal review, and make formal changes in the design as appropriate.
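One rough way to apply the criteria above is a simple tally: the more concerns apply, the stronger the case for the formal design process. The Python sketch below uses a shortened, hypothetical subset of the concerns and an unweighted fraction; neither the keys nor the scoring comes from an established method.

```python
# Hypothetical, shortened subset of the concerns listed above.
CRITERIA = {
    "requirement": ["unclear", "critical", "complex", "risky", "impacts_others"],
    "team": ["less_skilled", "distributed", "limited_resources"],
    "technology": ["immature", "low_vendor_support"],
}


def formal_design_pressure(flags):
    """Return the fraction (0.0 to 1.0) of listed concerns present in `flags`."""
    total = sum(len(items) for items in CRITERIA.values())
    hits = sum(1 for items in CRITERIA.values() for item in items if item in flags)
    return hits / total


pressure = formal_design_pressure({"critical", "complex", "distributed"})
```

A team would still have to agree where the cut-off lies. The number is only a conversation starter, not a decision rule.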
The good signs we already have are that we are starting to do “Incremental PIA” and a self-assessment process on top of the formal PIA and technical governance review process. So I think these ideas could fit together.
Later I will share my idea about the whole process of architectural design iteration for working across smaller development cycles.
What do you think?

